  • Dr. Eiman Mustafawi is currently an Associate Professor in the Department of English Literature and Linguistics and t...
AFFRICATION IN NORTH ARABIC REVISITED. Eiman Mustafawi, Qatar University. 1. Introduction. One of the characteristics of North Arabic varieties is the affrication of the voiced velar stop [g] to [dʒ] (Johnstone 1967: 2), a process that is generally assumed to be triggered in the ...
In this research, we propose a hybrid approach to acoustic and pronunciation modeling for Arabic speech recognition. The hybrid approach benefits from both vocalized and non-vocalized Arabic resources, motivated by the fact that non-vocalized resources are always more plentiful than vocalized ones. Two baseline speech recognition systems were built, one phonemic and one graphemic. After independent training, the two baseline acoustic models were fused to create a hybrid acoustic model. Pronunciation modeling was also hybrid, generating graphemic as well as phonemic pronunciation variants. Several techniques are proposed to reduce the complexity of the pronunciation model. Experiments were conducted in the large-vocabulary broadcast news domain. The proposed hybrid approach achieved a relative word error rate (WER) reduction of 8.8% to 12.6%, depending on the pronunciation modeling settings and the supervision used in the baseline systems.
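Relative WER reductions are measured against the baseline's own error rate, not as absolute percentage-point drops. A minimal sketch of the arithmetic, using the standard definition WER = (S + D + I) / N for substitutions, deletions, and insertions against an N-word reference; the error counts below are purely illustrative and are not results from the paper:

```python
def word_error_rate(substitutions: int, deletions: int, insertions: int,
                    reference_words: int) -> float:
    """Standard WER: total word-level edit errors divided by reference length."""
    if reference_words <= 0:
        raise ValueError("reference must contain at least one word")
    return (substitutions + deletions + insertions) / reference_words


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative counts only (hypothetical, not taken from the paper).
baseline = word_error_rate(substitutions=120, deletions=50, insertions=30,
                           reference_words=1000)   # 200 errors / 1000 words
hybrid = word_error_rate(substitutions=105, deletions=45, insertions=25,
                         reference_words=1000)     # 175 errors / 1000 words
print(f"relative reduction: {relative_wer_reduction(baseline, hybrid):.1%}")
```

With these hypothetical counts the absolute drop is only 2.5 points (20.0% to 17.5%), while the relative reduction is 12.5%.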
Talk presented at the 47th annual meeting of the North Atlantic Conference on Afroasiatic Linguistics (NACAL), 24-26 June 2019, Paris, France.
Talk presented at Brill's Journal of Afroasiatic Languages and Linguistics International Conference, 14-16 November 2018, Nantes, France.
Poster presented at the 11th Annual Meeting of the Society for the Neurobiology of Language, 20-22 August 2019, Helsinki, Finland.
This study investigates the linguistic attitudes and perceptions of Qatar University students regarding the utility and vitality of the two languages that define the education and communication scenes in Qatar, namely Arabic and English. It also reports on the demographic predictors of these attitudes. A total of 861 students completed a questionnaire divided into thirteen sections: Media Language Preference (MLP); Value and Symbolism of Arabic (VSA); Arabic in Education and Society (AES); Medium of Instruction (MOI); Impact of Al-Jazeera Network (IJN); English in Scientific and Professional Communication (ESPC); Qatari Cultural Identity (QCI); Arabic Books (AB); English in Society and Work (ESW); Language in Workplace (LIW); Arabic in Employment (AE); Status of Arabic (SA); and Manifestations of Sociocultural Identity (MSI). Results showed that Arabic received higher ratings for MLP, VSA, AES, MOI, QCI, and MSI, while English was perceived as more useful than Arabic in ESPC. Correlation...
Poster presented at the 25th Architectures and Mechanisms of Language Processing (AMLaP) meeting, September 2019, Moscow, Russia.
Poster presented at the Society for the Neurobiology of Language conference 2019, Helsinki, Finland.
The Arabic language is characterized by many colloquial varieties that differ significantly from the standard form. In this paper, we propose a state-of-the-art speech recognition system for Levantine Colloquial Arabic (LCA). A fully continuous, context-dependent acoustic model was trained on 50 hours of speech from the BBN DARPA Babylon corpus. Pronunciation modeling was initially grapheme-based because the transcriptions lack diacritic marks. Acoustic model parameters, including the number of senones and Gaussians, were optimized. To improve recognition accuracy, a cross-lingual hybrid acoustic and pronunciation modeling approach is proposed, in which a Modern Standard Arabic (MSA) phoneme-based acoustic model is adapted using a small amount of LCA speech data. The adapted acoustic model was then combined with the initial grapheme-based model to create a hybrid acoustic model.
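In a grapheme-based pronunciation model, each word's "pronunciation" is simply its sequence of written letters, so no diacritized transcriptions are needed. A minimal sketch of building such a lexicon from an undiacritized word list; the function names and the space-separated output format are assumptions for illustration, not the authors' tooling:

```python
from collections.abc import Iterable


def graphemic_pronunciation(word: str) -> list[str]:
    """Treat every written letter of an undiacritized word as one unit."""
    return [ch for ch in word if not ch.isspace()]


def build_graphemic_lexicon(words: Iterable[str]) -> dict[str, str]:
    """Map each distinct word to its space-separated grapheme sequence."""
    return {w: " ".join(graphemic_pronunciation(w)) for w in sorted(set(words))}


# "كتب" (k-t-b, written without short vowels) becomes the three units "ك ت ب".
lexicon = build_graphemic_lexicon(["كتب", "كتب", "قلم"])
for word, pron in lexicon.items():
    print(word, pron)
```

Because the units come straight from the written form, every word in the transcriptions receives a pronunciation even when its vowels are unknown.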
A major problem in dialectal Arabic speech recognition is the sparsity of speech resources. In this paper, we propose a transfer learning framework that jointly uses a large amount of Modern Standard Arabic (MSA) data and a small amount of dialectal Arabic data to improve acoustic and language modeling. We chose the Qatari Arabic (QA) dialect as a typical example of an under-resourced Arabic dialect. A wide-band speech corpus was collected and transcribed from several Qatari TV series and talk shows. A large-vocabulary speech recognition baseline system was built using the QA corpus. The proposed MSA-based transfer learning technique applies orthographic normalization, phone mapping, data pooling, acoustic model adaptation, and system combination. The proposed approach achieves a relative reduction in word error rate (WER) of more than 28%.
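Orthographic normalization collapses spelling variants that writers use inconsistently, so that MSA and dialectal text map onto one shared vocabulary. The abstract does not list the rules used; the sketch below assumes a common scheme from Arabic NLP (stripping short-vowel diacritics and tatweel, unifying hamzated alef forms, alef maqsura, and ta marbuta), and the names are hypothetical:

```python
import re

# Assumed normalization rules (a common scheme; the paper's exact rules are not given).
_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0629": "\u0647",  # ta marbuta -> ha
})

# Harakat/tanween/shadda/sukun (U+064B..U+0652) plus tatweel (U+0640).
_DIACRITICS_AND_TATWEEL = re.compile(r"[\u064B-\u0652\u0640]")


def normalize_arabic(text: str) -> str:
    """Remove diacritics and tatweel, then unify common letter variants."""
    text = _DIACRITICS_AND_TATWEEL.sub("", text)
    return text.translate(_CHAR_MAP)


print(normalize_arabic("إِلى"))  # diacritized "ila" normalized to bare letters
```

Merging ta marbuta with ha is a design choice: it shrinks the vocabulary and reduces sparsity, but it also conflates some distinct words.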
Idrissi, Muralikrishnan et al. (2018). Poster presented at the CUNY Conference on Sentence Processing, University of California, Davis, USA.
We propose a framework for long audio alignment of conversational Arabic speech. Accurate alignments help in many speech processing tasks, such as audio indexing, acoustic model (AM) training for speech recognizers, and audio summarization and retrieval. In this work, we collected more than 1,400 hours of conversational Arabic along with the corresponding non-aligned text transcriptions. Automatic segmentation is applied using a split-and-merge approach. A biased language model (LM) is trained on the corresponding text after a pre-processing stage. Because non-standard Arabic dominates conversational speech, a graphemic pronunciation model (PM) is used. The proposed alignment approach is performed in two passes. First, a generic standard Arabic AM is used along with the biased LM and the graphemic PM in a fast speech recognition pass over the current episode's segments. In the second pass, a more restricted LM is generated for each segment, and unsupervised a...
Empirical research has shown that such single items constitute the majority of the other-language material in most bilingual discourse, so grouping them with the wrong category may obscure the patterns of behavior of the true members of that category. For example, if lone words are categorized with codeswitches, their patterns of behavior may skew those of the true codeswitches, giving rise to theories of codeswitching that account poorly for the data (Ghafar-Samar & Meechan 1998, p. 206). Therefore, it is important to keep the status of lone words ambiguous until their patterns of behavior show similarity to either established loanwords (borrowings) or unambiguous codeswitches (CSs). There are three different views in the field with respect to lone words. The first, as reflected in the work of Mahootian (1993), Eliasson (1990), and Myers-Scotton (1992; 1993), does not distinguish between borrowing and codeswitching and attributes them to the same mechanism. The ...
Textbooks in phonology often draw a distinction between segmental features (e.g., place and manner of articulation) and suprasegmental features (e.g., stress and phrasing). This distinction is useful even in autosegmental models like Articulatory Phonology, because it separates features shared by the different instantiations of a phoneme from those not so shared. In a model like Articulatory Phonology, however, there is no requirement that a segmental feature be synchronous with the other features of the same segment. Classification results from Levantine Arabic show that features of the primary articulator of a fricative are acoustically signaled during frication, whereas features of the secondary articulator are signaled during the preceding and following vowels, suggesting that the definition of the word “segmental” should not require synchronous implementation.
A major problem in dialectal Arabic speech recognition is the sparsity of speech resources. In this paper, a transfer learning framework is proposed that jointly uses a large amount of Modern Standard Arabic (MSA) data and a small amount of dialectal Arabic data to improve acoustic and language modeling. The Qatari Arabic (QA) dialect was chosen as a typical example of an under-resourced Arabic dialect. A wide-band speech corpus was collected and transcribed from several Qatari TV series and talk shows. A large-vocabulary speech recognition baseline system was built using the QA corpus. The proposed MSA-based transfer learning technique applies orthographic normalization, phone mapping, data pooling, acoustic model adaptation, and system combination. The proposed approach achieves a relative reduction in word error rate (WER) of more than 28%.
Poster presented at the Society for the Neurobiology of Language Conference 2019, Helsinki, Finland.
Dissertation Information. Title: An Optimality Theoretic Approach to Variable Consonantal Alternations in Qatari Arabic. Author: Eiman Mustafawi.
... (1996). Awthahu l-masaalik ila alfiyat ibnu maalik, Vol. 4. Beirut: Al-maktabatu l-'asriya. Al-Mansour, Nasser. (1998). ... (1993). Duelling languages. Oxford: Clarendon Press. Myers-Scotton, Carol, Jake, Janice, & Okasha, Maha. (1995). 'Arabic and constraints on codeswitching'. ...
To date, normative psycholinguistic research has mainly focused on establishing norms and producing databases for concrete words using standardized pictures, while abstract words have received much less attention. Understandably, because concrete words can be represented visually, it is straightforward to design picture-naming tasks that elicit verbal identification of pictures representing nouns and verbs, which greatly contributes to language experiments in both theoretical and clinical studies. The present study argues for the equal importance of developing databases for abstract words, since language use is not restricted to picturable, concrete concepts. We provide norms for a set of 165 abstract nouns, 56 abstract verbs, and 109 abstract adjectives, collected from healthy speakers of Arabic. Using rating tasks, norms for imageability, age of acquisition, and familiarity are established. Linguistic factors such as syllable length and phoneme length are also accounted fo...
Standardized pictorial stimuli and predictors of successful picture naming are not readily available for Gulf Arabic. On the basis of data obtained from Qatari Arabic, a variety of Gulf Arabic, the present study provides norms for a set of 319 object pictures and a set of 141 action pictures. Norms were collected from healthy speakers, using a picture-naming paradigm and rating tasks. Norms for naming latencies, name agreement, visual complexity, image agreement, imageability, age of acquisition, and familiarity were established. Furthermore, the database includes other intrinsic factors, such as syllable length and phoneme length. It also includes orthographic frequency values (extracted from Aralex; Boudelaa & Marslen-Wilson, 2010). These factors were then examined for their impact on picture-naming latencies in object- and action-naming tasks. The analysis showed that the primary determinants of naming latencies in both nouns and verbs are (in descending order) image agreement, n...
We used event-related brain potentials (ERPs) to identify the neurophysiological responses of Arabic speakers to full and deflected agreement in plural noun-adjective constructions in Standard Arabic. Under full agreement, which applies only when the plural noun is human, an adjective agrees fully in number and gender with the preceding noun; under deflected agreement, which applies when the noun is non-human, the adjective is systematically marked feminine singular. We recorded grammaticality judgments and ERP responses from 32 speakers of Arabic to sentences violating full and deflected agreement and to their well-formed counterparts. The participants judged all the sentences relatively quickly and accurately, although violations, especially of deflected agreement, were not always deemed ungrammatical. However, the ERP responses show differential processing of human versus non-human violations. Violations of full agreement involving human nouns elicited larger N400 and P600 components than violatio...
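The agreement contrast described above can be stated as a small decision rule. This simplified sketch covers only plural head nouns, as in the study; the class, function, and label names are hypothetical, and it ignores any further variation in Arabic agreement:

```python
from typing import NamedTuple


class AdjectiveFeatures(NamedTuple):
    gender: str  # "masc" or "fem"
    number: str  # "sg" or "pl"


def expected_agreement(noun_is_human: bool, noun_gender: str) -> AdjectiveFeatures:
    """Adjective features required by a plural head noun (simplified)."""
    if noun_is_human:
        # Full agreement, e.g. al-ṭullāb al-mujtahidūn 'the diligent students'.
        return AdjectiveFeatures(gender=noun_gender, number="pl")
    # Deflected agreement, e.g. al-kutub al-jadīda 'the new books'.
    return AdjectiveFeatures(gender="fem", number="sg")


def is_violation(noun_is_human: bool, noun_gender: str,
                 adjective: AdjectiveFeatures) -> bool:
    """True if the adjective's features differ from the expected ones."""
    return adjective != expected_agreement(noun_is_human, noun_gender)


# A masculine plural adjective on a non-human plural noun violates deflected agreement.
print(is_violation(False, "masc", AdjectiveFeatures("masc", "pl")))
```

Stating the rule this way makes the asymmetry explicit: human plurals copy the noun's own features, while non-human plurals always require the same fixed feature bundle.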
Abstract: This paper reports on a pilot study (two experiments) that examined the knowledge of Principles A and B of Binding Theory in two Qatari-Arabic-speaking children using a truth-value judgment task and an act-out task. The results suggest that Principle A ...
In this chapter, a general description and discussion of the phonology of Arabic is presented. First, the sound system of Modern Standard Arabic (MSA) is briefly discussed and compared to that of Classical Arabic (CA) as described by the Classical Arabic grammarians. Differences, or possible differences, between the two are highlighted. The phonemic inventory of MSA is then compared to those of the spoken Arabic varieties, conventionally classified into six main dialect groups from east to west: Gulf Arabic (GA), Iraqi Arabic (IA), Levantine Arabic (LA), Yemeni Arabic (YA), Egyptian Arabic (EA), and Maghrebi Arabic (MA). Comparisons among these varieties are drawn in terms of sound system, syllable structure, and stress patterns. Some sound alternations in different dialects are analyzed within a framework driven by the Obligatory Contour Principle (OCP). Additionally, processes such as assimilation, affrication, lenition, and pharyngealization are discussed. Occasionally, reference is made to specific dialects within the six major groups, and new data are presented from less-studied dialects.
In this paper, a framework for long audio alignment of conversational Arabic speech is proposed. Accurate alignments help in many speech processing tasks, such as audio indexing, acoustic model (AM) training for speech recognizers, and audio summarization and retrieval. We collected more than 1,400 hours of conversational Arabic along with the corresponding human-generated, non-aligned transcriptions. Automatic audio segmentation is performed using a split-and-merge approach. A biased language model (LM) is trained on the corresponding text after a pre-processing stage. Because non-standard Arabic dominates conversational speech, a graphemic pronunciation model (PM) is used. The proposed alignment approach is performed in two passes. First, a generic standard Arabic AM is used along with the biased LM and the graphemic PM in a fast speech recognition pass. In the second pass, a more restricted LM is generated for each audio segment, and unsupervised acoustic model adaptation is applied. The recognizer output is aligned with the processed transcriptions using the Levenshtein algorithm. The proposed approach achieved an initial alignment accuracy of 97.8-99.0%, depending on the amount of disfluency. A confidence scoring metric is proposed to accept or reject the aligner output. Using confidence scores, most misaligned segments could be rejected, raising alignment accuracy to 99.0-99.8% depending on the speech domain and the amount of disfluency.
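The final step pairs recognizer output with the transcript using the Levenshtein (edit-distance) algorithm. A minimal word-level sketch with backtracing follows; the abstract does not define its confidence score, so `match_ratio` is a hypothetical stand-in (the share of aligned positions that match exactly), not the authors' metric:

```python
from typing import Optional

Pair = tuple[Optional[str], Optional[str]]


def align_words(ref: list[str], hyp: list[str]) -> list[Pair]:
    """Levenshtein alignment of reference and hypothesis word sequences.

    Returns (ref_word, hyp_word) pairs; None on the hypothesis side marks a
    deletion, and None on the reference side marks an insertion.
    """
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + sub,  # match / substitution
                             dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1)        # insertion

    # Walk back from the bottom-right cell to recover one optimal path.
    pairs: list[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                pairs.append((ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def match_ratio(pairs: list[Pair]) -> float:
    """Hypothetical confidence proxy: share of positions that match exactly."""
    if not pairs:
        return 0.0
    return sum(1 for r, h in pairs if r is not None and r == h) / len(pairs)


pairs = align_words("a b c d".split(), "a x c".split())
print(pairs)               # [('a', 'a'), ('b', 'x'), ('c', 'c'), ('d', None)]
print(match_ratio(pairs))  # 0.5
```

A segment could then be accepted only when its score clears some threshold (for example, `match_ratio(pairs) >= 0.9`); both the scoring function and the threshold here are assumptions.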
