Computational Linguistics: From Text to Speech Technologies

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 April 2025 | Viewed by 4749

Special Issue Editors


Guest Editor: Prof. Dr. Gloria Corpas Pastor
Research Institute on Multilingual Language Technologies, Department of Translation and Interpreting, University of Malaga, 29016 Málaga, Spain
Interests: corpus linguistics; machine interpreting; speech-to-text; translation and interpreting technologies; computational phraseology

Guest Editor: Dr. Tharindu Ranasinghe
UCREL, Lancaster University, Lancaster LA1 4WA, UK
Interests: computational linguistics; natural language processing; machine translation; quality estimation

Special Issue Information

Dear Colleagues,

In recent years, advancements in machine learning, natural language processing, artificial intelligence, and speech synthesis have revolutionized how we communicate with other humans and language-based systems. From virtual assistants to language translation tools, the capabilities of these technologies continue to expand, offering new possibilities for communication, accessibility, and innovation.

This Special Issue serves as a platform to explore the latest research, methodologies, and applications driving the development of text and speech technologies, such as automatic speech recognition, machine interpreting, speech translation, and speech-to-text software, among others. It is intended for researchers, practitioners, and enthusiasts in the fields of computational linguistics, corpus linguistics, natural language processing, and machine learning. We invite research studies based on neural network architectures, large language models, linguistic modeling, AI-driven systems, and the intersection of linguistics and computer science (including multilingual communication). We also invite authors to address the challenges of applying these technologies in practical settings, in low-resource languages, and in specific domains.

Prof. Dr. Gloria Corpas Pastor
Dr. Tharindu Ranasinghe
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence (AI)
  • automatic speech recognition (ASR)
  • machine interpreting (MI)
  • cascaded models
  • end-to-end models
  • speech-to-text (STT) modelling
  • speech translation
  • quality estimation
  • large language models (LLMs)

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (4 papers)


Research

17 pages, 1448 KiB  
Article
Fit for What Purpose? NER Certification of Automatic Captions in English and Spanish
by Pablo Romero-Fresco and Yanou Van Gauwbergen
Appl. Sci. 2025, 15(3), 1387; https://doi.org/10.3390/app15031387 - 29 Jan 2025
Viewed by 611
Abstract
As human and fully automatic live captioning methods coexist and compete against one another, quality analyses and certification become essential. A case in point is LiRICS, the Live Respeaking International Certification Standard created by the Galician Observatory for Media Accessibility (GALMA) to help maintain high international standards in the live captioning profession. Until now, this certification had only been used to assess human captioners. In this paper, it is applied for the first time to automatic captioning (more specifically to Lexi, the automatic software used by the leading captioning company AI-Media) in order to ascertain whether automatic captions have reached an accuracy level that can match that of human captions. After presenting the materials and the methods (NER model), the paper reports on the results of the analysis of Lexi’s English and Spanish automatic captions. With average accuracy rates of 98.56% in English and 98.26% in Spanish, these captions often manage to reach human levels of quality, except when applied to colloquial content featuring several speakers. A final discussion is devoted to a reflection on how automatic and human live captions can coexist as long as the different purposes they serve are considered, namely the access in bulk provided by automatic captions and the curated access offered by human captions.
Figures
  • Figure 1: Formula used by the NER model to calculate accuracy.
  • Figure 2: Bar graph of NER accuracy rate scores, with and without assessment of speaker IDs, for English (red) and Spanish (blue) subtitles.
  • Figure 3: Caption to Transcription Continuum for live captioning.
  • Figure 4: Classification of live captions along the Caption to Transcription Continuum.
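Figure 1 refers to the formula at the heart of the NER model: accuracy = (N − E − R) / N × 100, where N is the number of words in the captions, E the edition errors, and R the recognition errors, each error weighted by severity. The snippet below is a minimal sketch of that calculation, assuming the commonly cited severity weights (minor 0.25, standard 0.5, serious 1); the function name and the sample counts are illustrative, not taken from the paper.

```python
# Minimal sketch of the NER accuracy formula: (N - E - R) / N * 100.
# The severity weights (minor 0.25, standard 0.5, serious 1.0) follow the
# commonly cited NER literature; the sample counts below are invented.

SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}

def ner_accuracy(n_words, edition_errors, recognition_errors):
    """Return the NER accuracy rate in percent.

    n_words            -- number of words in the captions (N)
    edition_errors     -- severity labels of edition errors (E)
    recognition_errors -- severity labels of recognition errors (R)
    """
    e = sum(SEVERITY_WEIGHTS[s] for s in edition_errors)
    r = sum(SEVERITY_WEIGHTS[s] for s in recognition_errors)
    return (n_words - e - r) / n_words * 100

# Hypothetical caption file: 1500 words, a handful of weighted errors.
score = ner_accuracy(
    n_words=1500,
    edition_errors=["standard", "minor", "minor"],
    recognition_errors=["serious", "standard", "minor", "minor"],
)
print(f"NER accuracy: {score:.2f}%")  # 99.80% with these toy counts
```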
24 pages, 432 KiB  
Article
Sequence-to-Sequence Models and Their Evaluation for Spoken Language Normalization of Slovenian
by Mirjam Sepesy Maučec, Darinka Verdonik and Gregor Donaj
Appl. Sci. 2024, 14(20), 9515; https://doi.org/10.3390/app14209515 - 18 Oct 2024
Viewed by 785
Abstract
Sequence-to-sequence models have been applied to many challenging problems, including those in text and speech technologies. Normalization is one of them. It refers to transforming non-standard language forms into their standard counterparts. Non-standard language forms come from different written and spoken sources. This paper deals with one such source, namely speech from the less-resourced highly inflected Slovenian language. The paper explores speech corpora recently collected in public and private environments. We analyze the efficiencies of three sequence-to-sequence models for automatic normalization from literal transcriptions to standard forms. Experiments were performed using words, subwords, and characters as basic units for normalization. In the article, we demonstrate that the superiority of the approach is linked to the choice of the basic modeling unit. Statistical models prefer words, while neural network-based models prefer characters. The experimental results show that the best results are obtained with neural architectures based on characters. Long short-term memory and transformer architectures gave comparable results. We also present a novel analysis tool, which we use for in-depth error analysis of results obtained by character-based models. This analysis showed that systems with similar overall results can differ in the performance for different types of errors. Errors obtained with the transformer architecture are easier to correct in the post-editing process. This is an important insight, as creating speech corpora is a time-consuming and costly process. The analysis tool also incorporates two statistical significance tests: approximate randomization and bootstrap resampling. Both statistical tests confirm the improved results of neural network-based models compared to statistical ones.
Figures
  • Figure 1: Example HTML output with the most common errors (missing conversion marked in red, wrong conversion in green, and unwarranted conversion in blue).
  • Figure 2: Example HTML output with a deleted word error (W-del) due to a deleted space (missing conversion marked in red, unwarranted conversion in blue, and missing word in bright red).
  • Figure 3: Example HTML output with an inserted word error (W-ins) and a deleted word error (W-del).
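The analysis tool described in the abstract incorporates two significance tests, approximate randomization and bootstrap resampling, to compare the statistical and neural normalizers. As a rough illustration of the second test, the sketch below runs a generic paired bootstrap over per-sentence error counts for two systems; the function, data layout, and numbers are assumptions for illustration, not the authors' tool.

```python
import random

def paired_bootstrap(errors_a, errors_b, n_resamples=10_000, seed=0):
    """Paired bootstrap test: estimate how often system A has fewer total
    errors than system B when test sentences are resampled with replacement.

    errors_a, errors_b -- per-sentence error counts for the two systems,
                          aligned on the same test sentences.
    Returns the fraction of resamples in which A is strictly better.
    """
    assert len(errors_a) == len(errors_b)
    rng = random.Random(seed)
    n = len(errors_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample sentence indices
        if sum(errors_a[i] for i in idx) < sum(errors_b[i] for i in idx):
            wins += 1
    return wins / n_resamples

# Toy example: per-sentence errors for a neural (A) and a statistical (B)
# normalizer; the numbers are invented purely for illustration.
neural = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0]
statistical = [1, 2, 0, 3, 1, 1, 0, 4, 2, 1]
print(f"A better in {paired_bootstrap(neural, statistical):.1%} of resamples")
```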
14 pages, 433 KiB  
Article
Automatic Speech Recognition Advancements for Indigenous Languages of the Americas
by Monica Romero, Sandra Gómez-Canaval and Ivan G. Torre
Appl. Sci. 2024, 14(15), 6497; https://doi.org/10.3390/app14156497 - 25 Jul 2024
Cited by 1 | Viewed by 1381
Abstract
Indigenous languages are a fundamental legacy in the development of human communication, embodying the unique identity and culture of local communities in America. The Second AmericasNLP Competition Track 1 of NeurIPS 2022 proposed the task of training automatic speech recognition (ASR) systems for five Indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa’ikhana. In this paper, we describe the fine-tuning of a state-of-the-art ASR model for each target language, using approximately 36.65 h of transcribed speech data from diverse sources enriched with data augmentation methods. We systematically investigate, using a Bayesian search, the impact of the different hyperparameters on the Wav2vec2.0 XLS-R variants of 300 M and 1 B parameters. Our findings indicate that data and detailed hyperparameter tuning significantly affect ASR accuracy, but language complexity determines the final result. The Quechua model achieved the lowest character error rate (CER) (12.14), while the Kotiria model, despite having the most extensive dataset during the fine-tuning phase, showed the highest CER (36.59). Conversely, with the smallest dataset, the Guarani model achieved a CER of 15.59, while Bribri and Wa’ikhana obtained, respectively, CERs of 34.70 and 35.23. Additionally, Sobol’ sensitivity analysis highlighted the crucial roles of freeze fine-tuning updates and dropout rates. We release our best models for each language, marking the first open ASR models for Wa’ikhana and Kotiria. This work opens avenues for future research to advance ASR techniques in preserving minority Indigenous languages.
Figures
  • Figure 1: Sketch of the dataset used for fine-tuning the ASR system, the CNN- and transformer-based wav2vec2.0 architecture, the fine-tuning process, the Bayesian hyperparameter search, and the Sobol’ sensitivity analysis.
  • Figure 2: Character error rates (CERs) for the five Indigenous language models (Kotiria, Wa’ikhana, Bribri, Guarani, and Quechua); lower bars indicate better model performance. The inner panel shows a Sobol’ sensitivity analysis of the hyperparameters tuned during training: orange bars give the total sensitivity (ST) index, green bars the first-order sensitivity (S1) index, and higher bars mark hyperparameters whose correct setting matters more during fine-tuning.
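The evaluation metric throughout this paper is the character error rate (CER), reported per language in Figure 2. For reference, the sketch below computes CER in the standard way, as the character-level edit distance divided by the length of the reference transcription; this is a generic implementation, not code released with the paper, and the example strings are invented.

```python
def levenshtein(ref, hyp):
    """Character-level edit distance via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (0 cost if match)
            ))
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate in percent: edit distance / reference length."""
    return 100 * levenshtein(reference, hypothesis) / max(len(reference), 1)

# Invented reference/hypothesis pair, for illustration only.
ref = "allinmi kachkani"
hyp = "alinmi kachkanii"
print(f"CER = {cer(ref, hyp):.2f}%")  # 12.50% for this toy pair
```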
17 pages, 791 KiB  
Article
Using Transfer Learning to Realize Low Resource Dungan Language Speech Synthesis
by Mengrui Liu, Rui Jiang and Hongwu Yang
Appl. Sci. 2024, 14(14), 6336; https://doi.org/10.3390/app14146336 - 20 Jul 2024
Viewed by 1193
Abstract
This article presents a transfer-learning-based method to improve the synthesized speech quality of the low-resource Dungan language. This improvement is accomplished by fine-tuning a pre-trained Mandarin acoustic model to a Dungan language acoustic model using a limited Dungan corpus within the Tacotron2+WaveRNN framework. Our method begins with developing a transformer-based Dungan text analyzer capable of generating unit sequences with embedded prosodic information from Dungan sentences. These unit sequences, along with the speech features, provide <unit sequence with prosodic labels, Mel spectrograms> pairs as the input of Tacotron2 to train the acoustic model. Concurrently, we pre-trained a Tacotron2-based Mandarin acoustic model using a large-scale Mandarin corpus. The model is then fine-tuned with a small-scale Dungan speech corpus to derive a Dungan acoustic model that autonomously learns the alignment and mapping of the units to the spectrograms. The resulting spectrograms are converted into waveforms via the WaveRNN vocoder, facilitating the synthesis of high-quality Mandarin or Dungan speech. Both subjective and objective experiments suggest that the proposed transfer learning-based Dungan speech synthesis achieves superior scores compared to models trained only with the Dungan corpus and other methods. Consequently, our method offers a strategy to achieve speech synthesis for low-resource languages by adding prosodic information and leveraging a similar, high-resource language corpus through transfer learning.
Figures
  • Figure 1: The framework of Tacotron2+WaveRNN-based Dungan speech synthesis.
  • Figure 2: Procedure of Dungan text analysis.
  • Figure 3: Structure of a Dungan character.
  • Figure 4: The framework of BLSTM_CRF-based Dungan prosodic boundary prediction; the input is a Dungan sentence with prosodic information.
  • Figure 5: The framework of Transformer-based Dungan character-to-unit conversion; the input is a Dungan sentence with prosodic information (left) and its corresponding Pinyin sequence (right), and the output is the Pinyin sequence with prosodic information.
  • Figure 6: Procedure of training the Dungan language acoustic model with transfer learning.
  • Figure 7: Average MOS scores of synthesized Dungan speech with 95% confidence intervals.
  • Figure 8: Average MOS scores of synthesized Mandarin speech with 95% confidence intervals.
  • Figure 9: Average DMOS scores of synthesized Dungan speech with 95% confidence intervals.
  • Figure 10: Average DMOS scores of synthesized Mandarin speech with 95% confidence intervals.
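Figures 7–10 report average MOS and DMOS scores with 95% confidence intervals. The sketch below shows one standard way such intervals are obtained, using a Student's t interval over individual listener ratings; the helper function and the ratings are illustrative assumptions, not the paper's evaluation code.

```python
import math
from statistics import mean, stdev
from scipy import stats

def mos_with_ci(ratings, confidence=0.95):
    """Return (mean MOS, half-width of the t-based confidence interval)."""
    n = len(ratings)
    m = mean(ratings)
    sem = stdev(ratings) / math.sqrt(n)              # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return m, t_crit * sem

# Invented 5-point ratings from 20 listeners for one synthesized utterance.
ratings = [4, 5, 4, 3, 4, 4, 5, 4, 3, 4, 4, 4, 5, 4, 4, 3, 4, 5, 4, 4]
m, half = mos_with_ci(ratings)
print(f"MOS = {m:.2f} ± {half:.2f} (95% CI)")
```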