Heba Salama

Building a spoken Arabic corpus for Egyptian children: data collection and transcription

The present thesis aims to construct a spoken Arabic corpus for Egyptian children. The corpus is special in several ways. It is based on spontaneous conversations between children aged 1;6 to 4 years and their parents and/or the researcher. It is a collection of longitudinal child language data from 10 children. It is a large corpus of about 25,645 words, based on 5 hours of recordings and 330 hours of transcription work, for 6 GB of data. It uses an international platform, the Child Language Data Exchange System (CHILDES), for transcribing and coding the recordings into the CHILDES database. The goals of this thesis are to introduce the data collection and transcription, present the annotation of the children's speech samples, and show certain findings based on the corpus. This work makes five main contributions: (1) It is the first Egyptian corpus for children. (2) It is a corpus that complies with the CHILDES database. (3) All the files are in the standard CHAT transcription format. (4) It implements part-of-speech (POS) tagging, yielding a morphologically annotated corpus of spoken Arabic child language. Linguistic annotation of the corpus gives researchers better means for exploring the development of grammatical constructions and their usage. (5) The transcribed data support morphological analyses, such as mean length of utterance (MLU) counts, as well as lexical analyses, such as frequency (FREQ) counts.

Developing a Framework for a Remote, International Research Collaboration Among Graduate Students: Lessons Learned During the COVID-19 Pandemic

Perspectives of the ASHA Special Interest Groups, 2021

Purpose: This article describes a framework for developing international research collaborations among graduate students. Central to this framework is the utility of institutional and association-based academic mentorship programs in developing collaborative partnerships. We illustrate how the American Speech-Language-Hearing Association's Mentoring Academic Research Careers program served as a vehicle for fostering remote collaboration and provided training experiences for graduate students during the COVID-19 pandemic. Conclusions: This model successfully supported doctoral students in developing an ongoing and sustainable research partnership during a challenging time when in-person networking opportunities were unavailable. This partnership provided a unique pathway for professional development that complemented formal academic training. More broadly, international collaboration experiences such as these provide valuable, skill-based training for all students, such that they...


Lexical Growth in Egyptian Arabic Speaking Children: A corpus Based Study

The Egyptian Journal of Language Engineering, 2017


Building a POS-Annotated Corpus For Egyptian Children

The Egyptian Journal of Language Engineering, 2016


Developing a Framework for a Remote, International Research Collaboration Among Graduate Students: Lessons Learned During the COVID-19 Pandemic

Developing a Framework for a Remote, International Research Collaboration Among Graduate Students: Lessons Learned During the COVID-19 Pandemic, 2021

This article describes a framework for developing international research collaborations among graduate students. Central to this framework is the utility of institutional and association-based academic mentorship programs in developing collaborative partnerships. We illustrate how the American Speech-Language-Hearing Association's Mentoring Academic Research Careers program served as a vehicle for fostering remote collaboration and provided training experiences for graduate students during the COVID-19 pandemic. Conclusions: This model successfully supported doctoral students in developing an ongoing and sustainable research partnership during a challenging time when in-person networking opportunities were unavailable. This partnership provided a unique pathway for professional development that complemented formal academic training. More broadly, international collaboration experiences such as these provide valuable, skill-based training for all students, such that they are better equipped to serve diverse populations and as members of diverse teams. We offer recommendations for others endeavoring to develop international collaboration initiatives for students paired with mentorship.


A Morphologically-Analyzed Corpus for Egyptian Child Language

The main objective of this paper is to develop a MOR grammar, a morphological analyzer built specifically for Egyptian Arabic corpus analysis. In child language studies, morphological analysis is a valuable tool for both researchers and clinicians. This corpus aims to contribute to future language research, child language acquisition theory, and psycholinguistics in general. Hand-tagging is time-consuming and subject to human error. The MOR program for automatic analysis and part-of-speech tagging was therefore introduced into the CHILDES system to address these problems. To date, MOR analysis programs have been constructed for 11 languages. Once a child language corpus has been automatically tagged by MOR, it becomes possible to automate various systems for the assessment and identification of disorders in child language. The annotated corpora will support three kinds of investigation: 1) basic child language developmental research that examines the sequence of acquisition of grammatical morphemes, as well as the various morphosyntactic processes of the language; 2) computational modeling of child language acquisition in Egyptian Arabic; and 3) diagnosis of language differences and disorders through methods such as automation of IPSYN or DSS scores. This Egyptian Arabic child language corpus consists of 8,467 utterances comprising 31,305 word tokens (5,594 word types) from 10 children between the ages of 1;7 and 3;7. This paper discusses methods for preparing data for MOR analysis and developing a MOR grammar. It also outlines rules for controlling allomorphy and morpheme concatenation. The emergence of this new line of work is a step forward in child language research in Egyptian Arabic.
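As a rough illustration of what a MOR-tagged transcript makes possible, the sketch below tallies part-of-speech categories from %mor tier lines. It assumes the general CHAT convention that each token on a %mor tier is written as category|stem, with optional subcategories after ":" and clitics joined by "~". The sample lines and tags are hypothetical English placeholders rather than data from this corpus, and the function name pos_counts is our own.

```python
from collections import Counter


def pos_counts(chat_lines):
    """Tally part-of-speech categories found on %mor dependent tiers."""
    counts = Counter()
    for line in chat_lines:
        if not line.startswith("%mor:"):
            continue  # skip headers (@), speaker tiers (*), and other tiers
        for token in line[len("%mor:"):].split():
            # Clitics are joined with "~"; count each part separately.
            for part in token.split("~"):
                if "|" not in part:
                    continue  # punctuation such as "." or "?" carries no tag
                category = part.split("|", 1)[0]
                # Drop any subcategory after ":" (e.g. "pro:sub" -> "pro").
                counts[category.split(":", 1)[0]] += 1
    return counts


# Hypothetical placeholder transcript, not taken from the corpus.
sample = [
    "*CHI:\tI want cookies .",
    "%mor:\tpro:sub|I v|want n|cookie-PL .",
    "*MOT:\twhere is it ?",
    "%mor:\tadv|where cop|be&3S pro|it ?",
]

for category, n in sorted(pos_counts(sample).items()):
    print(category, n)
```

A tally like this is only a starting point; measures such as IPSYN or DSS need the full morphological analysis on each token, not just its category.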


Child language corpus

Abstract: This paper aims to build a spoken Arabic corpus for Egyptian children. This corpus is special in many ways. It is the first corpus of spoken Arabic for Egyptian children. It is a collection of longitudinal child language data. It is a speech-based corpus transcribed from recordings of spontaneous conversations. The spontaneous speech was transcribed using the CHAT format as described in the CHILDES (Child Language Data Exchange System) database, which provides data in a consistent, fully documented transcription system. The corpus text files were transcribed from 10 children (5 boys and 5 girls) aged 1;6 (one and a half years) to 4 years, with about 5 hours of recordings. We obtained thirty-minute audio recordings of spontaneous speech produced by the children in natural settings. The children were divided into five age groups, each five months older than the previous one. The recordings were made by the transcriber and the parents. The transcripts of spoken interactions provide a vast amount of useful data for linguistic, psychological, and sociological studies of child language. Audio data are presented in WAV file format. Broad phonemic transcription was done manually using CHILDES Unicode and the CHAT transcription codes. Transcription is based on the orthographic conventions of English, using IPA symbols. Approximately 15 GB of audio files were transcribed. The size of the corpus is nearly 25,645 utterances, based on audio files from 10 children.
Key words: child corpus, CHILDES database.



A Morphological Analyzed Corpus

Language Engineering Conference, 2018

A Morphological Analyzed Corpus for Egyptian Child Language
Heba Salama
Phonetics and Linguistics Department, Faculty of Arts, Alexandria University
Abstract: The main focus of the present paper is the construction of a MOR grammar for Egyptian children. We also introduce a morphological analyzer developed specifically for the Egyptian children's corpus. Morphological analysis is crucial for child language studies. The corpus will be an instrument for future investigations of language development, child language acquisition theory, and psycholinguistics in general. Hand-tagging of child transcripts is time-consuming and error-prone. The MOR program for automatic analysis and part-of-speech tagging was introduced into the CHILDES system to address these problems. To date, MOR analysis programs have been constructed for 11 languages, which do not include Arabic; Arabic does not yet have MOR lexicons. One must therefore create a part-of-speech system using the automatic tools provided by CLAN. Once a child language corpus has been automatically tagged by MOR, it becomes possible to automate various systems for assessment and diagnosis of child language. The annotated corpora will support three kinds of investigation: 1) basic child language developmental research that examines the sequence of acquisition of grammatical morphemes and the various morphosyntactic processes of the language; 2) computational modeling of child language acquisition; and 3) diagnosis of language differences and disorders through methods such as automation of IPSYN or DSS scores. The corpus contains 26,700 words, all transcribed in CHAT. The paper discusses methods for preparing data for MOR analysis and developing MOR grammars, and describes the shape of the rules for controlling allomorphy and morpheme concatenation. Finally, the Egyptian morphological analysis is illustrated, followed by discussion and future work. The emergence of this new line of work represents a major step forward for Egyptian child language research.

Key words: Morphosyntax, part of speech tagging, child language, MOR grammar

Building a Morphologically Analyzed Corpus of Egyptian Children's Arabic
Heba Salama
Faculty of Arts, Department of Phonetics and Linguistics, Alexandria University
This study aims to build a morphologically analyzed corpus for Egyptian children. Morphological analysis is very important in the study of child language, and this tool will serve future research on child language acquisition theory and psycholinguistics. Manual morphological analysis takes a great deal of time and effort and is prone to error, so building the automatic morphological analyzer MOR will help overcome these problems. To date, MOR has been built for 11 languages, which do not include Egyptian Arabic, so a morphological analyzer must be created using the tools available in the program. Once a morphological analyzer is in place, many analysis programs that assess and analyze child language can be run. Morphologically analyzed corpora support three basic kinds of analysis: 1) analysis of children's morphosyntactic development; 2) computational modeling of child language acquisition; and 3) diagnosis of language differences and disorders through the tools of the CLAN program. The corpus contains 26,700 words prepared using CHAT. The study covers the methods for preparing the morphological analyzer and the rules used. The emergence of this line of work represents a major step forward in research on Egyptian child language.



Building a POS-annotated corpus for Egyptian children.docx

In this paper, we present an attempt at developing a POS-annotated corpus for Egyptian children. Linguistic annotation of the corpus gives researchers better means for exploring the development of grammatical constructions and their usage. This is an initial annotated corpus for Egyptian children. It implements part-of-speech (POS) tagging, yielding a morphologically annotated corpus of spoken Arabic child language. POS tags are entered manually on the "%mor" (morphology) tiers. Coding language transcripts for computer analysis is a daunting task. It took approximately 170 hours, so manual annotation focused on one particular child. The POS coding process started with purely manual annotation of 2,701 words: 1,380 words for an adult and 1,321 words for the child. Annotating child language proved to be a challenging and time-consuming task. MOR grammars exist for many languages, such as English, French, German, Japanese, Cantonese, and Hebrew, and for these the tags are generated automatically by CLAN's automatic coding system, the MOR program. This is not yet possible for Egyptian Arabic, for two reasons. First, there is no previous work on constructing such a system for Egyptian Arabic. Second, the morphology of Egyptian Arabic is very rich and differs from that of other languages, so their rules cannot be applied to Arabic. The two Arabic studies, of Qatari and Emirati Arabic, used semi-automatic and mini-automatic MOR. Finally, certain linguistic analysis commands are applied using the CLAN software. The analyses include frequency counts, word searches, co-occurrence analyses, mean length of utterance (MLU) counts, and analysis of specified pairs of utterances.
The transcript data support morphological analysis, such as mean length of utterance (MLU) counts; lexical analysis, such as frequency (FREQ) counts; syntactic analysis, such as searching the data for specified combinations of words or complex string patterns (COMBO); and discourse and interactional analysis, such as the analysis of specified pairs of utterances (CHIP).
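To make the MLU and FREQ measures concrete, here is a minimal Python sketch that computes both over CHAT-style main tiers. It is not CLAN itself and does not reproduce CLAN's options: it counts words rather than morphemes, which is a deliberate simplification, and it treats only ".", "?" and "!" as utterance terminators. The transcript lines are hypothetical placeholders, and the function names are our own.

```python
from collections import Counter

TERMINATORS = {".", "?", "!"}


def utterances(chat_lines, speaker="CHI"):
    """Yield the word list of each main-tier utterance by one speaker."""
    prefix = f"*{speaker}:"
    for line in chat_lines:
        if line.startswith(prefix):
            words = [w for w in line[len(prefix):].split()
                     if w not in TERMINATORS]
            if words:
                yield words


def mlu_in_words(chat_lines, speaker="CHI"):
    """Mean length of utterance, counted in words (not morphemes)."""
    utts = list(utterances(chat_lines, speaker))
    if not utts:
        return 0.0
    return sum(len(u) for u in utts) / len(utts)


def freq(chat_lines, speaker="CHI"):
    """Word frequency counts for one speaker."""
    return Counter(w.lower()
                   for u in utterances(chat_lines, speaker)
                   for w in u)


# Hypothetical placeholder transcript, not taken from the corpus.
sample = [
    "@Begin",
    "*CHI:\tmama ball .",
    "*MOT:\twhere is the ball ?",
    "*CHI:\tball there .",
    "*CHI:\tball .",
    "@End",
]

print(round(mlu_in_words(sample), 2))
print(freq(sample).most_common(3))
```

Counting in words keeps the sketch independent of any %mor tier; a morpheme-based MLU would instead read the tagged tier once the corpus has been annotated.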
Key words: POS annotated corpus, CHILDES database


Lexical Growth in Egyptian Arabic Speaking Children: A corpus Based Study

This paper calculates a developmental index of language growth in Egyptian Arabic, based on a corpus of spontaneous speech samples from 10 children (5 boys and 5 girls) aged 1;7 to 4 years. Using 30-minute transcripts of spontaneous speech production, the following properties of the collected data were analysed: vocabulary size and frequency of word use in relation to age (development of types and tokens), and individual differences in vocabulary size. The contribution of the current study lies in the use of vocabulary profile results as potential indicators of developmental language delay. The results provide a new measurement tool for lexical growth at different developmental stages.
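The type and token measures mentioned above can be sketched in a few lines of Python. This is only an illustration of the counting involved, under the assumption that each sample has already been reduced to a list of word forms; the age labels and words are hypothetical placeholders, not results from the study, and vocabulary_profile is our own name.

```python
def vocabulary_profile(samples):
    """Map each age label to (types, tokens, type-token ratio)."""
    profile = {}
    for age, words in samples.items():
        tokens = len(words)        # every running word counts once
        types = len(set(words))    # distinct word forms only
        ratio = types / tokens if tokens else 0.0
        profile[age] = (types, tokens, ratio)
    return profile


# Hypothetical placeholder samples keyed by age (years;months).
samples = {
    "1;7": ["ball", "ball", "mama"],
    "2;6": ["mama", "want", "ball", "want", "juice"],
    "4;0": ["i", "want", "the", "red", "ball", "and", "the", "juice"],
}

for age, (types, tokens, ratio) in vocabulary_profile(samples).items():
    print(f"{age}: types={types} tokens={tokens} TTR={ratio:.2f}")
```

Because the type-token ratio depends on sample size, comparisons across children are more meaningful when the samples are of similar length, as with the fixed 30-minute transcripts described here.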


Building a spoken Arabic corpus for Egyptian children: Data collection and transcription

Alexandria University, 2015

The present thesis aims to construct a spoken Arabic corpus for Egyptian children. The corpus is special in several ways. It is based on spontaneous conversations between children aged 1;6 to 4 years and their parents and/or the researcher. It is a collection of longitudinal child language data from 10 children. It is a large corpus of about 25,645 words, based on 5 hours of recordings and 330 hours of transcription work, for 6 GB of data. It uses an international platform, the Child Language Data Exchange System (CHILDES), for transcribing and coding the recordings into the CHILDES database. The goals of this thesis are to introduce the data collection and transcription, present the annotation of the children's speech samples, and show certain findings based on the corpus. This work makes five main contributions:
(1) It is the first Egyptian corpus for children.
(2) It is a corpus that complies with the CHILDES database.
(3) All the files are in the standard CHAT transcription format.
(4) It implements part-of-speech (POS) tagging, yielding a morphologically annotated corpus of spoken Arabic child language. Linguistic annotation of the corpus gives researchers better means for exploring the development of grammatical constructions and their usage.
(5) The transcribed data support morphological analyses, such as mean length of utterance (MLU) counts, as well as lexical analyses, such as frequency (FREQ) counts.


abstract 1500 english-2.doc

The present thesis aims to construct a spoken Arabic corpus for Egyptian children. The corpus is special in several ways. It is based on spontaneous conversations between children aged 1;6 to 4 years and their parents and/or the researcher. It is a collection of longitudinal child language data from 10 children. It is a large corpus of about 25,645 words, based on 5 hours of recordings and 330 hours of transcription work, for 6 GB of data. It uses an international platform, the Child Language Data Exchange System (CHILDES), for transcribing and coding the recordings into the CHILDES database. The goals of this thesis are to introduce the data collection and transcription, present the annotation of the children's speech samples, and show certain findings based on the corpus. This work makes five main contributions:
(1) It is the first Egyptian corpus for children.
(2) It is a corpus that complies with the CHILDES database.
(3) All the files are in the standard CHAT transcription format.
(4) It implements part-of-speech (POS) tagging, yielding a morphologically annotated corpus of spoken Arabic child language. Linguistic annotation of the corpus gives researchers better means for exploring the development of grammatical constructions and their usage.
(5) The transcribed data support morphological analyses, such as mean length of utterance (MLU) counts; lexical analyses, such as frequency (FREQ) counts; searches for specified combinations of words or character strings (COMBO); and discourse and interaction analyses, such as the analysis of specified pairs of utterances (CHIP).
The building of child corpora grew with advances in computer technology and tools. Child corpora for thirty-two languages around the world have been built and contributed to the CHILDES database. Among the Arab countries, only Qatar and the Emirates have built child language corpora and made them available through websites. No child language corpus yet exists for Egyptian Arabic, which motivated the researcher to build one. There are three reasons to develop child language corpora. First, they are significant for psycholinguistic research. Second, linguistic annotation of the corpora is needed to give researchers better means of exploring the development of grammatical constructions and their usage and of performing linguistic analysis. Third, standardized tests and experiments can teach us a great deal about language disorders, but they are not sufficient in themselves.


A Morphological Analyzed Corpus for Egyptian Child Language

Language Engineering Journal, 2018

The main focus of the present paper is the construction of a MOR grammar for Egyptian children. We also introduce a morphological analyzer developed specifically for the Egyptian children's corpus. Morphological analysis is crucial for child language studies. The corpus will be an instrument for future investigations of language development, child language acquisition theory, and psycholinguistics in general. Hand-tagging of child transcripts is time-consuming and error-prone, and the MOR program for automatic analysis and part-of-speech tagging was introduced into the CHILDES system to address these problems. To date, MOR analysis programs have been constructed for 11 languages. Once a child language corpus has been automatically tagged by MOR, it becomes possible to automate various systems for assessment and diagnosis of child language. The annotated corpora will support three kinds of investigation: 1) basic child language developmental research that examines the sequence of acquisition of grammatical morphemes and the various morphosyntactic processes of the language; 2) computational modeling of child language acquisition; and 3) diagnosis of language differences and disorders through methods such as automation of IPSYN or DSS scores. The corpus contains 26,700 words, all transcribed in CHAT. The paper discusses methods for preparing data for MOR analysis and developing MOR grammars, and describes the shape of the rules for controlling allomorphy and morpheme concatenation. The emergence of this new line of work represents a major step forward for Egyptian child language research.

