The earliest work on pronunciation assessment avoided measuring genuine listener intelligibility,[10] a shortcoming corrected in 2011 at the Toyohashi University of Technology.[11] Intelligibility measurement has since been included in Pearson's Versant high-stakes English fluency assessment[12] and in mobile apps from 17zuoye Education & Technology,[13] but it was still missing in 2023 products from Google Search,[14] Microsoft,[15] Educational Testing Service,[16] Speechace,[17] and ELSA.[18] Assessing authentic listener intelligibility is essential for avoiding inaccuracies caused by accent bias, especially in high-stakes assessments;[19][20][21] by words with multiple correct pronunciations;[22] and by phoneme coding errors in machine-readable pronunciation dictionaries.[23] In 2022, researchers found that some newer speech-to-text systems, which use end-to-end reinforcement learning to map audio signals directly to words, produce word and phrase confidence scores that correlate closely with genuine listener intelligibility.[24] In the Common European Framework of Reference for Languages (CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels.[25]
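To illustrate how scoring against a single dictionary pronunciation can flag a correct utterance as an error, here is a minimal Python sketch. It assumes a simple phoneme-level Levenshtein scorer, which is a hypothetical stand-in rather than the method of any product named above; the function names `edit_distance` and `phoneme_error_rate` and the ARPAbet-style transcriptions of "either" are illustrative assumptions. The same mechanism applies when a dictionary entry is itself miscoded: a correct pronunciation is then penalized against the wrong reference.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete pa
                curr[j - 1] + 1,           # insert pb
                prev[j - 1] + (pa != pb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(spoken, accepted_variants):
    """Edit distance to the closest accepted variant, divided by that variant's length.

    Hypothetical scorer for illustration only.
    """
    spoken_seq = spoken.split()
    return min(
        edit_distance(spoken_seq, v.split()) / len(v.split())
        for v in accepted_variants
    )


# Illustrative ARPAbet-style transcriptions of "either" (stress marks omitted).
variants = ["IY DH ER", "AY DH ER"]
learner = "AY DH ER"

print(phoneme_error_rate(learner, variants[:1]))  # 0.3333333333333333 (canonical form only)
print(phoneme_error_rate(learner, variants))      # 0.0 (all accepted forms)
```

With only the first variant as the reference, a perfectly acceptable pronunciation receives a one-phoneme error; listing every accepted variant removes the false correction, though it still says nothing about whether a listener would actually understand the word.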
Although there are as yet no industry-standard benchmarks for evaluating pronunciation assessment accuracy, researchers occasionally release evaluation speech corpora that others can use to improve assessment quality.[26][27] Such evaluation databases often emphasize formally unaccented pronunciation rather than genuine intelligibility as measured by blinded listener transcriptions.[5] As of 2024, promising areas of development include articulatory feature extraction[28][29][30] and transfer learning to suppress unnecessary corrections.[31] Other advances under development include "augmented reality" interfaces for mobile devices that use optical character recognition to provide pronunciation training on text found in the user's surroundings.[32][33]
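The difference between formal correctness and intelligibility measured by blinded listener transcriptions can be made concrete. The following minimal Python sketch assumes that a word's intelligibility is the fraction of listeners who transcribe it correctly, then computes the Pearson correlation between those scores and per-word speech recognizer confidence scores, the kind of relationship described for newer speech-to-text systems. The helper names, the naive position-by-position word alignment, and all transcripts and confidence values are hypothetical toy inputs, not data from the cited studies.

```python
from math import sqrt


def word_intelligibility(target_words, listener_transcripts):
    """Fraction of listeners who wrote each target word correctly.

    Simplifying assumption: transcripts are compared word by word by
    position, ignoring case; real studies would align transcripts first.
    """
    scores = []
    for i, word in enumerate(target_words):
        hits = 0
        for transcript in listener_transcripts:
            tokens = transcript.lower().split()
            if i < len(tokens) and tokens[i] == word.lower():
                hits += 1
        scores.append(hits / len(listener_transcripts))
    return scores


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy inputs: one utterance, three blinded listeners.
target = "the cat sat down".split()
transcripts = ["the cat sat down", "the cap sat down", "the cat set down"]
asr_confidence = [0.98, 0.61, 0.70, 0.95]  # made-up per-word recognizer confidences

intelligibility = word_intelligibility(target, transcripts)
print(intelligibility)  # [1.0, 0.6666666666666666, 0.6666666666666666, 1.0]
print(round(pearson(intelligibility, asr_confidence), 2))
```

A real evaluation corpus would align each transcript to the target with an edit-distance alignment rather than by position, and would use many speakers and listeners so that the correlation is meaningful.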
El Kheir, Yassine; et al. (October 21, 2023), "Automatic Pronunciation Assessment -- A Review", Conference on Empirical Methods in Natural Language Processing, arXiv:2310.13974, S2CID 264426545
O’Brien, Mary Grantham; et al. (31 December 2018). "Directions for the future of technology in pronunciation research and teaching". Journal of Second Language Pronunciation. 4 (2): 182–207. doi:10.1075/jslp.17001.obr. hdl:2066/199273. ISSN 2215-1931. S2CID 86440885. "pronunciation researchers are primarily interested in improving L2 learners' intelligibility and comprehensibility, but they have not yet collected sufficient amounts of representative and reliable data (speech recordings with corresponding annotations and judgments) indicating which errors affect these speech dimensions and which do not. These data are essential to train ASR algorithms to assess L2 learners' intelligibility."
Bernstein, Jared; et al. (November 18, 1990), "Automatic Evaluation and Training in English Pronunciation" (PDF), First International Conference on Spoken Language Processing (ICSLP 90), Kobe, Japan: International Speech Communication Association, pp. 1185–1188, retrieved 11 February 2023. "listeners differ considerably in their ability to predict unintelligible words.... Thus, it seems the quality rating is a more desirable... automatic-grading score." (Section 2.2.2.)
Bonk, Bill (25 August 2020). "New innovations in assessment: Versant's Intelligibility Index score". Resources for English Language Learners and Teachers. Pearson English. Archived from the original on 2023-01-27. Retrieved 11 February 2023. "you don't need a perfect accent, grammar, or vocabulary to be understandable. In reality, you just need to be understandable with little effort by listeners."
Gao, Yuan; et al. (May 25, 2018), "Spoken English Intelligibility Remediation with PocketSphinx Alignment and Feature Extraction Improves Substantially over the State of the Art", 2nd IEEE Advanced Information Management, Communication, Electronic and Automation Control Conference (IMCEC 2018), pp. 924–927, arXiv:1709.01713, doi:10.1109/IMCEC.2018.8469649, ISBN 978-1-5386-1803-5, S2CID 31125681
Alnafisah, Mutleb (September 2022), "Technology Review: Speechace", Proceedings of the 12th Pronunciation in Second Language Learning and Teaching Conference (Virtual PSLLT), no. 40, vol. 12, St. Catharines, Ontario, ISSN 2380-9566, retrieved 14 February 2023
^E.g., CMUDICT, "The CMU Pronouncing Dictionary". www.speech.cs.cmu.edu. Retrieved 15 February 2023. Compare "four" given as "F AO R" with the vowel AO as in "caught," to "row" given as "R OW" with the vowel OW as in "oat."