Reliable or Not: A Correlational Study of Automated Scoring and Analytic CAF Indices in a Computer-Based Monological English Speaking Test

Hengzhi Ciel Hu; Harwati Hashim

Published: Dec. 30, 2025

DOI: https://doi.org/10.6035/languagev.8676

Keywords:

computer-assisted assessment, English speaking, automated scoring, CAF, mobile-assisted language learning

Hengzhi Ciel Hu

Faculty of Education (TESL), National University of Malaysia

Harwati Hashim

Abstract

The rise of computer-based testing has transformed language assessment, with computer-based English speaking tests (CBESTs) offering scalable and efficient evaluations. However, the extent to which automated scoring aligns with analytic measures such as complexity, accuracy, and fluency (CAF) remains unclear. This study examines the relationship between automated CBEST scores and CAF indices in the spoken responses of 418 Chinese secondary school students. The results revealed strong correlations between the automated accuracy and fluency scores and their respective analytic indices, demonstrating the system's capacity to assess grammatical accuracy and temporal fluency. However, automated complexity scores correlated only moderately with mean length of clause, with no significant links to lexical diversity. Cross-dimensional correlations suggest potential overlap in the scoring constructs. These findings highlight the strengths and limitations of CBESTs in assessing oral proficiency, emphasizing the need to refine scoring algorithms to improve validity and comprehensiveness in automated language assessment.
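As a rough illustration of the kind of comparison the abstract describes, the sketch below correlates an automated score with an analytic index. It assumes a Pearson coefficient and a words-per-clause definition of mean length of clause; the abstract names neither, and the study may have used a different coefficient (for example Spearman's) or a different operationalisation. The function names and the toy numbers are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def mean_length_of_clause(word_count, clause_count):
    """Words per clause for one spoken response (assumed definition)."""
    if clause_count <= 0:
        raise ValueError("clause_count must be positive")
    return word_count / clause_count


# Hypothetical toy data: (words, clauses) per response and an automated
# complexity score for the same responses. Not taken from the study.
responses = [(42, 7), (55, 8), (30, 6), (61, 8), (48, 7)]
automated_complexity = [3.0, 3.5, 2.5, 4.0, 3.5]

mlc = [mean_length_of_clause(words, clauses) for words, clauses in responses]
r = pearson_r(automated_complexity, mlc)
print(f"r between automated complexity and mean length of clause: {r:.2f}")
```

The same `pearson_r` call could be repeated for each pairing of automated dimension score and analytic index; pairings across dimensions (for example, automated fluency against an accuracy index) are what would expose the construct overlap the abstract mentions.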

How to cite

Ciel Hu, H., & Hashim, H. (2025). Reliable or not: A correlational study of automated scoring and analytic CAF indices in a computer-based monological English speaking test. Language Value, 18(2). https://doi.org/10.6035/languagev.8676

Issue

Vol. 18 No. 2 (2025): Vol. 18.2

Section

Articles

Funding data

Universiti Kebangsaan Malaysia

