Using the LARA Little Prince to compare human and TTS audio quality

Elham Akhlaghi, Ingibjörg Iða Auðunardóttir, Anna Bączkowska, Branislav Bédi, Hakeem Beedar, Harald Berthelsen, Cathy Chua, Catia Cucchiarini, Hanieh Habibi, Ivana Horváthová, Junta Ikeda, Christele Maizonniaux, Neasa Ní Chiaráin, Chadi Raheb, Manny Rayner, John Sloan, Nikos Tsourakis, Chunlin Yao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

A popular idea in Computer Assisted Language Learning (CALL) is to use multimodal annotated texts to support second and foreign language (L2) learning through reading. The annotations typically include embedded audio and translations. An important question is how to create good-quality audio, which can be done either through human recording or with a Text-To-Speech (TTS) engine. We may reasonably expect TTS to be quicker and easier to produce, but human recordings to be of higher quality. Here, we report a study using the open-source LARA platform and ten languages. For each language, audio samples totalling about five minutes were provided in both human and TTS form. The samples covered the same four passages taken from LARA versions of Saint-Exupéry’s Le petit prince, chosen to instantiate the 2×2 cross product of the conditions {dialogue, not-dialogue} and {humour, not-humour}. A total of 251 subjects used a web form to compare the human and TTS versions of each item and to rate the voices as a whole. For the three languages where TTS did best (English, French and Irish), the evidence from this study and from the earlier study it extends suggests that TTS audio is now pedagogically adequate and roughly comparable with a non-professional human voice in terms of exemplifying correct pronunciation and prosody. It was, however, still judged substantially less natural and less pleasant to listen to. No clear evidence was found to support the hypothesis that dialogue and humour pose special problems for TTS. All data and software will be made freely available.
Original language: English
Title of host publication: 2022 Language Resources and Evaluation Conference, LREC 2022
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Jan Odijk, Stelios Piperidis
Place of Publication: Paris, France
Publisher: European Language Resources Association
Pages: 2967-2975
Number of pages: 9
ISBN (Electronic): 9791095546726
Publication status: Published - Jun 2022
Event: Language Resources and Evaluation Conference - Marseille, France
Duration: 20 Jun 2022 to 25 Jun 2022

Publication series

Name: 2022 Language Resources and Evaluation Conference, LREC 2022

Conference

Conference: Language Resources and Evaluation Conference
Abbreviated title: LREC 2022
Country/Territory: France
City: Marseille
Period: 20/06/22 to 25/06/22

Keywords

  • Text-To-Speech (TTS)
  • Evaluation
  • Multimodality
  • Reading
  • Emotion
  • Computer assisted language learning
  • CALL
