Recognition of L1 and L2 speech: Comparing accuracy and hallucination in human and Whisper transcriptions

Seung-Eun Kim



국제언어인문학회, 인문언어, Vol. 27, No. 1 (June 2025), pp. 51-78 (KCI-listed journal)


Abstract

Both human listeners and Automatic Speech Recognition (ASR) systems tend to struggle more with recognizing second-language (L2) speech than first-language (L1) speech. This study examined how well Whisper (a state-of-the-art ASR system) and L1 English listeners recognize L1 and L2 English under a controlled, homogeneous setting (the same sentences and the same data collection procedures), enabling a direct comparison across listener and talker types. Speech recordings from 67 L2 English talkers and 25 L1 English talkers were embedded in varying levels of background noise, and the resulting transcriptions from Whisper and from human listeners were analyzed. Across both L1 and L2 speech, Whisper achieved higher overall word recognition accuracy than humans, with near-perfect performance in quiet and low-noise conditions. Despite this advantage, Whisper showed higher hallucination rates than humans under loud-noise conditions, with a particularly large gap for L2 speech. Further analysis revealed that Whisper’s hallucination rates remained higher for L2 than for L1 speech even after controlling for accuracy, suggesting that these hallucinations are not merely a byproduct of recognition difficulty but reflect a functional difference in how Whisper processes L1 versus L2 speech. Overall, these findings underscore both the strengths and the limitations of Whisper: its robustness in clean listening conditions, but also its hallucination bias against L2 speech.
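The abstract reports two per-transcript measures, word recognition accuracy and hallucination rate, without defining them on this page. The following is a minimal sketch that assumes two definitions: accuracy is the share of reference words reproduced exactly, and hallucination rate is the share of transcribed words with no counterpart in the reference. Both definitions are illustrative assumptions, and the paper's own definitions (Section 2.2) may differ. The sketch computes the two scores with an alignment from Python's standard `difflib`. The function names `normalize` and `score_transcript` are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    # Lowercase and replace punctuation (except apostrophes) with spaces,
    # so "Mat." and "mat" are treated as the same word.
    cleaned = "".join(
        ch.lower() if ch.isalnum() or ch.isspace() or ch == "'" else " "
        for ch in text
    )
    return cleaned.split()


def score_transcript(reference: str, hypothesis: str) -> dict[str, float]:
    # Hypothetical metrics, not the paper's definitions:
    #   accuracy           = reference words matched exactly / reference words
    #   hallucination_rate = inserted hypothesis words / hypothesis words
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = 0
    inserted = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1          # words transcribed correctly
        elif tag == "insert":
            inserted += j2 - j1         # words aligned to nothing in the reference
    accuracy = matched / len(ref) if ref else 0.0
    hallucination_rate = inserted / len(hyp) if hyp else 0.0
    return {"accuracy": accuracy, "hallucination_rate": hallucination_rate}


if __name__ == "__main__":
    print(score_transcript("The cat sat on the mat.",
                           "the cat sat on the mat thank you for watching"))
```

In this sketch, substituted words count as recognition errors rather than hallucinations. Only added material that aligns to nothing in the reference counts toward the hallucination rate. In the usage example, all six reference words are recovered (accuracy 1.0), and four of the ten output words are unaligned additions (hallucination rate 0.4).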

1. Introduction
2. Methods
2.1. Speech materials
2.2. Word recognition accuracy and hallucination rate
2.3. Statistical analysis
3. Results
3.1. Word recognition accuracy and hallucination rate of L2 speech
3.2. Word recognition accuracy and hallucination rate of L1 speech
3.3. Post-hoc comparison of hallucination rate in L1 vs. L2 speech
4. Discussion
5. Conclusion
References
[Abstract]


Author information

Seung-Eun Kim (김승은), Northwestern University, USA
