Pronunciation assessment

Automatic pronunciation assessment is the use of speech recognition to verify the correctness of pronounced speech,[1][2] as distinguished from manual assessment by an instructor or proctor.[3] It is also called speech verification, pronunciation evaluation, and pronunciation scoring. Its main application is computer-aided pronunciation teaching (CAPT), in which it is combined with computer-aided instruction for computer-assisted language learning (CALL), speech remediation, or accent reduction.

Pronunciation assessment does not determine unknown speech (as in dictation or automatic transcription). Instead, knowing the expected word(s) in advance, it attempts to verify the correctness of the learner's pronunciation and, ideally, their intelligibility to listeners,[4][5] sometimes along with prosodic features that are often inconsequential, such as intonation, pitch, tempo, rhythm, and syllable and word stress.[6] Pronunciation assessment is also used in reading tutoring, for example in Microsoft Teams[7] and in products from Amira Learning.[8] Automatic pronunciation assessment can also be used to help diagnose and treat speech disorders such as apraxia.[9]
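
The difference between verifying expected speech and transcribing unknown speech can be illustrated with a deliberately simplified sketch. It assumes that the expected word has already been converted to a phoneme sequence and that some recognizer has produced the phonemes it heard. The function names, the ARPAbet-style symbols, and the scoring rule are illustrative assumptions, not a description of any particular product or published method. Practical systems generally score acoustic evidence rather than compare discrete symbols.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (operation, expected phoneme, heard phoneme)


def align(expected: List[str], heard: List[str]) -> List[Step]:
    """Align two phoneme sequences by minimum edit distance (Levenshtein)."""
    n, m = len(expected), len(heard)
    # cost[i][j] = edit distance between expected[:i] and heard[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (expected[i - 1] != heard[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover the operations.
    steps: List[Step] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (expected[i - 1] != heard[j - 1])):
            same = expected[i - 1] == heard[j - 1]
            steps.append(("match" if same else "substitution",
                          expected[i - 1], heard[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            steps.append(("deletion", expected[i - 1], ""))  # phoneme left out
            i -= 1
        else:
            steps.append(("insertion", "", heard[j - 1]))  # extra phoneme
            j -= 1
    steps.reverse()
    return steps


def score(expected: List[str], heard: List[str]) -> float:
    """Fraction of expected phonemes that were matched (hypothetical metric)."""
    if not expected:
        raise ValueError("expected phoneme sequence must not be empty")
    matched = sum(1 for op, _, _ in align(expected, heard) if op == "match")
    return matched / len(expected)


if __name__ == "__main__":
    # Expected word "think"; the learner is assumed to have said "sink".
    target = ["TH", "IH", "NG", "K"]
    spoken = ["S", "IH", "NG", "K"]
    for step in align(target, spoken):
        print(step)
    print(score(target, spoken))
```

Because the target sequence is known in advance, the output identifies which expected sounds were substituted, omitted, or added, instead of only returning a transcript. A symbol-level score of this kind measures deviation from the target and says nothing by itself about intelligibility to listeners.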

  1. ^ El Kheir, Yassine; et al. (October 21, 2023), Automatic Pronunciation Assessment — A Review, Conference on Empirical Methods in Natural Language Processing, arXiv:2310.13974, S2CID 264426545
  2. ^ Ehsani, Farzad; Knodt, Eva (July 1998). "Speech technology in computer-aided language learning: Strengths and limitations of a new CALL paradigm". Language Learning & Technology. 2 (1). University of Hawaii National Foreign Language Resource Center; Michigan State University Center for Language Education and Research: 54–73. Retrieved 11 February 2023.
  3. ^ Isaacs, Talia; Harding, Luke (July 2017). "Pronunciation assessment". Language Teaching. 50 (3): 347–366. doi:10.1017/S0261444817000118. ISSN 0261-4448. S2CID 209353525.
  4. ^ Loukina, Anastassia; et al. (September 6, 2015), "Pronunciation accuracy and intelligibility of non-native speech" (PDF), INTERSPEECH 2015, Dresden, Germany: International Speech Communication Association, pp. 1917–1921, only 16% of the variability in word-level intelligibility can be explained by the presence of obvious mispronunciations.
  5. ^ O’Brien, Mary Grantham; et al. (31 December 2018). "Directions for the future of technology in pronunciation research and teaching". Journal of Second Language Pronunciation. 4 (2): 182–207. doi:10.1075/jslp.17001.obr. hdl:2066/199273. ISSN 2215-1931. S2CID 86440885. pronunciation researchers are primarily interested in improving L2 learners' intelligibility and comprehensibility, but they have not yet collected sufficient amounts of representative and reliable data (speech recordings with corresponding annotations and judgments) indicating which errors affect these speech dimensions and which do not. These data are essential to train ASR algorithms to assess L2 learners' intelligibility.
  6. ^ Eskenazi, Maxine (January 1999). "Using automatic speech processing for foreign language pronunciation tutoring: Some issues and a prototype". Language Learning & Technology. 2 (2): 62–76. Retrieved 11 February 2023.
  7. ^ Tholfsen, Mike (9 February 2023). "Reading Coach in Immersive Reader plus new features coming to Reading Progress in Microsoft Teams". Techcommunity Education Blog. Microsoft. Retrieved 12 February 2023.
  8. ^ Banerji, Olina (7 March 2023). "Schools Are Using Voice Technology to Teach Reading. Is It Helping?". EdSurge News. Retrieved 7 March 2023.
  9. ^ Hair, Adam; et al. (19 June 2018). "Apraxia world: A speech therapy game for children with speech sound disorders". Proceedings of the 17th ACM Conference on Interaction Design and Children (PDF). pp. 119–131. doi:10.1145/3202185.3202733. ISBN 9781450351522. S2CID 13790002.