TY - GEN
T1 - Using the LARA Little Prince to compare human and TTS audio quality
AU - Akhlaghi, Elham
AU - Auðunardóttir, Ingibjörg Iða
AU - Bączkowska, Anna
AU - Bédi, Branislav
AU - Beedar, Hakeem
AU - Berthelsen, Harald
AU - Chua, Cathy
AU - Cucchiarini, Catia
AU - Habibi, Hanieh
AU - Horváthová, Ivana
AU - Ikeda, Junta
AU - Maizonniaux, Christele
AU - Ní Chiaráin, Neasa
AU - Raheb, Chadi
AU - Rayner, Manny
AU - Sloan, John
AU - Tsourakis, Nikos
AU - Yao, Chunlin
PY - 2022
Y1 - 2022/06//
N2 - A popular idea in Computer Assisted Language Learning (CALL) is to use multimodal annotated texts, with annotations typically including embedded audio and translations, to support second and foreign language (L2) learning through reading. An important question is how to create good-quality audio, which can be done either through human recording or with a Text-To-Speech (TTS) engine. We may reasonably expect TTS to be quicker and easier, but human recording to be of higher quality. Here, we report a study using the open-source LARA platform and ten languages. Samples of audio totalling about five minutes, representing the same four passages taken from LARA versions of Saint-Exupéry’s Le petit prince, were provided for each language in both human and TTS form; the passages were chosen to instantiate the 2×2 cross product of the conditions {dialogue, not-dialogue} and {humour, not-humour}. 251 subjects used a web form to compare human and TTS versions of each item and rate the voices as a whole. For the three languages where TTS did best, English, French and Irish, the evidence from this study and the previous one it extended suggests that TTS audio is now pedagogically adequate and roughly comparable with a non-professional human voice in terms of exemplifying correct pronunciation and prosody. It was, however, still judged substantially less natural and less pleasant to listen to. No clear evidence was found to support the hypothesis that dialogue and humour pose special problems for TTS. All data and software will be made freely available.
KW - Text-To-Speech (TTS)
KW - Evaluation
KW - Multimodality
KW - Reading
KW - Emotion
KW - Computer assisted language learning
KW - CALL
KW - TTS
UR - http://www.lrec-conf.org/proceedings/lrec2022/program.html
UR - https://lrec2022.lrec-conf.org/en/about/
UR - http://www.scopus.com/inward/record.url?scp=85144385319&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 2022 Language Resources and Evaluation Conference, LREC 2022
SP - 2967
EP - 2975
BT - 2022 Language Resources and Evaluation Conference, LREC 2022
A2 - Calzolari, Nicoletta
A2 - Béchet, Frédéric
A2 - Blache, Philippe
A2 - Choukri, Khalid
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Mariani, Joseph
A2 - Mazo, Hélène
A2 - Odijk, Jan
A2 - Piperidis, Stelios
PB - European Language Resources Association
CY - Paris, France
T2 - Language Resources and Evaluation Conference
Y2 - 20 June 2022 through 25 June 2022
ER -