
    Yahya Mohamed Elhadj

    The majority of successful automatic speech recognition (ASR) systems model the speech signal probabilistically with hidden Markov models (HMMs). In a standard HMM, state duration probabilities decrease exponentially with time, which fails to describe the temporal structure of speech satisfactorily. Incorporating explicit state duration probability distribution functions (pdfs) into the HMM is a well-known way to overcome this weakness; the resulting model is known as a hidden semi-Markov model (HSMM). Previous papers have confirmed that using HSMMs instead of standard HMMs improves recognition accuracy in many target languages. This paper addresses an important stage of our ongoing work, which aims to construct an accurate Arabic recognizer for teaching and learning purposes. It presents an implementation of an HSMM whose principal goal is to improve the classical HMM's durational behavior. In this implementation, the Gaussian distribution is used to model state durations. Experiments were carried out on a particular Arabic speech corpus collected from recitations of the Holy Quran. Results show an increase in recognition accuracy of around 1%. These results confirm that the system outperforms the baseline HTK when the Gaussian distribution is integrated into the HTK recognizer's back-end.
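
    The contrast between the two duration models can be illustrated with a minimal sketch. It is not the authors' HTK implementation: the function names and the parameter values (self-loop probability 0.8, Gaussian mean 5 frames, standard deviation 1.5 frames, 30-frame cap) are assumptions chosen only for illustration. A standard HMM state with self-loop probability a implies a geometric duration distribution, P(d) = a^(d-1) (1 - a), which always peaks at d = 1, whereas an explicit Gaussian duration model can peak at a typical duration.

```python
import math


def geometric_duration_pmf(self_loop_prob, max_duration):
    """Duration pmf implied by a standard HMM state: P(d) = a^(d-1) * (1 - a)."""
    a = self_loop_prob
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_duration + 1)]


def gaussian_duration_pmf(mean, std, max_duration):
    """Gaussian density sampled at d = 1..max_duration, renormalised to sum to 1."""
    weights = [
        math.exp(-0.5 * ((d - mean) / std) ** 2)
        for d in range(1, max_duration + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: both models have a typical duration near 5 frames
# (the geometric mean duration is 1 / (1 - 0.8) = 5).
hmm_pmf = geometric_duration_pmf(self_loop_prob=0.8, max_duration=30)
hsmm_pmf = gaussian_duration_pmf(mean=5.0, std=1.5, max_duration=30)

# Most probable duration under each model (durations start at 1 frame).
hmm_mode = 1 + hmm_pmf.index(max(hmm_pmf))
hsmm_mode = 1 + hsmm_pmf.index(max(hsmm_pmf))
print("HMM most likely duration:", hmm_mode)
print("HSMM most likely duration:", hsmm_mode)
```

    Even when the two models agree on the average duration, the geometric model assigns its highest probability to the shortest possible duration, which is the behavior the explicit duration pdf is meant to correct.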
    This paper is part of ongoing work aiming to build an accurate recognizer for Arabic sounds for teaching and learning purposes. Early phases of this work were dedicated to developing a dedicated sound database from recitations of the Holy Quran to cover classical Arabic sounds; the speech signals in this database were manually segmented and labelled on three levels: word, phoneme, and allophone. Next, two baseline recognizers were built to validate the speech segmentation at both the phoneme and allophone levels and to test the feasibility of the intended sound recognition task. The current phase develops a more elaborate recognizer by considering the basic sounds and looking for their distinctive features (e.g. duration, energy, etc.) to determine which ones are particularly helpful for identifying the phonological variants of each basic sound. Here, we present the first results obtained so far for basic sound recognition. INTRODUCTION Auto...
    Abstract: Despite the great progress the world is witnessing today in the uses of computers and computer science, computing work directed at serving the Islamic religion, and the Holy Quran in particular, remains very scarce and is not commensurate with the importance of the Quran in the lives of Muslims. The Quran is the eternal miracle of Islam; its teachings call for establishing justice among people, urge them to think and reflect, and offer them solutions and guidance according to a divine method that devotes worship to God alone, Exalted and Most High. God revealed the Quran to be recited, contemplated, and acted upon, and the Prophet, peace be upon him, pointed out its merit and the merit of teaching and learning it, as in the hadith "The best of you are those who learn the Quran and teach it." From this standpoint emerged the idea of the project "Automated Learning of the Holy Quran", which aims to improve methods of computerizing the Holy Quran and its sciences by applying a range of techniques used in information technology systems and artificial intelligence tools. This paper presents a part of this project concerned with building an interactive system for the automated memorization of the Holy Quran, which can be used in educational institutions and in private study circles in mosques or homes. God willing, it will be an aid to those who wish to memorize and study the Book of God but who do not have access to...
    This paper presents the different machine translation approaches. We distinguish two classes: the classical approaches (Direct, Transfer-Based and Interlingua) and the corpus-based approaches (Memory, Example and Statistical). We describe language challenges such as morphology, syntax and structure. We illustrate previous systems for machine translation of foreign (American, British and European) sign languages. We then review the most important works related to Arabic Sign Language (ArSL) machine translation.
    Recently, there has been great interest in Islamic software that tries to harness computers to serve the religion. This has brought about some applications and programs for the Holy Quran and its sciences, Hadith "الحديث" (Prophet’s Tradition) and its methodology, Fiqh "الفقه" (Islamic jurisprudence), and Islamic law in general. However, these computer programs, especially those developed for the noble Quran, are still limited and have focused on direct applications of Information Technology techniques, such as storing, listening, searching, etc., without using more elaborate techniques in the domain. To contribute to the improvement of these efforts, a project for Computerized Teaching of the Holy Quran (CTHQ) has been initiated. It aims to introduce advanced techniques and methodologies to develop an appropriate environment for self-learning of the Holy Quran and its sciences. Different sub-systems are being developed separately and will then be co...
    Abstract: Problem statement: This study presented the development of an Arabic part-of-speech tagger that can be used for analyzing and annotating traditional Arabic texts, especially the Quran text. Approach: It is part of a project related to the computerization of the Holy Quran. One of the main objectives of this project was to build a textual corpus of the Holy Quran. Results: Since an appropriate textual version of the Holy Quran was prepared and morphologically analyzed in other stages of this project, we focused in this work on its annotation by developing and using an appropriate tagger. The developed tagger employed an approach that combines morphological analysis with Hidden Markov Models (HMMs) based on the Arabic sentence structure. The morphological analysis is used to reduce the size of the tag lexicon by segmenting Arabic words into their prefixes, stems and suffixes; this is possible because Arabic is a derivational language. On the other hand, HMM is used to rep...
    This paper is part of ongoing work aiming to build an accurate recognizer for Arabic sounds for teaching and learning purposes. Early phases of this work were dedicated to developing a dedicated sound database from recitations of the Holy Quran to cover classical Arabic sounds; the speech signals in this database were manually segmented and labelled on three levels: word, phoneme, and allophone. Next, two baseline recognizers were built to validate the speech segmentation at both the phoneme and allophone levels and to test the feasibility of the intended sound recognition task. The current phase develops a more elaborate recognizer by considering the basic sounds and looking for their distinctive features (e.g. duration, energy, etc.) to determine which ones are particularly helpful for identifying the phonological variants of each basic sound. Here, we present the first results obtained so far for basic sound recognition.
    International Liaison: Ismail Khalil, Institute of Telecooperation Johannes Kepler, Austria (Chair); Abbas Cheddad, Karolinska Institute, Sweden; Abdallah M'Hamed, Telecom SudParis, France; Adel Al-Jumaily, University of Technology, Sydney, Australia; Ahmad Tariq, University of Technology, Iraq; Ahmed Ferchichi, University of Tunisia, Tunisia; Ahmed M. Zeki, Bahrain University, Bahrain; Atakan Kurt, Fatih University, Istanbul, Turkey; Chandana Withana, Charles Sturt University Study Centre, Australia; Ersan Akyıldız, Middle East Technical University, Turkey; Ezendu Ariwa, London Metropolitan University, UK; Hadj Bourdoucen, Sultan Qaboos University, Oman; Hamad Aedh A. Alreshidi, University of Hail, Saudi Arabia; Husham Jawad Ahmed, Cihan University, Iraq; Ivan Jelinek, Czech Technical University, Czech Republic; Khalid Al-Tahat, Arab Open University, Jordan; Mohammed A. Alam, Naif Arab University for Security Sciences, Saudi Arabia; Mohamed Elammari, University of Benghazi, Libya; Prabhat K. Mah...
    Welcome to this special issue, in which we publish selected papers from the fourth session of the Virtual Workshop on Computer Science and Engineering in Arabic, which was devoted exceptionally to computing in the fields of Islamic Sharia and was held on 3 and 4 December 2012. This session marked an important turning point in the course of the workshop series, which is held regularly twice a year and seeks to promote the use of Arabic in computing fields, not only through its focus on harnessing computer science to serve Sharia disciplines but also through the quality and number of the contributions submitted. The contributions centered on several areas: computing applications in the Quran and its sciences, computing applications in Hadith and its branches, computing applications in Islamic outreach (da'wah), and computing applications in Islamic jurisprudence and legislation in general.
    Recently, there has been great interest in Islamic software that tries to harness computers to serve the religion. This has brought about some applications and programs for the Holy Quran and its sciences, Hadith "الحديث" (Prophet’s Tradition) and its methodology, Fiqh "الفقه" (Islamic jurisprudence), and Islamic law in general. However, these computer programs, especially those developed for the noble Quran, are still limited and have focused on direct applications of Information Technology techniques, such as storing, listening, searching, etc., without using more elaborate techniques in the domain. To contribute to the improvement of these efforts, a project for Computerized Teaching of the Holy Quran (CTHQ) has been initiated. It aims to introduce advanced techniques and methodologies to develop an appropriate environment for self-learning of the Holy Quran and its sciences. Different sub-systems are being developed separately and will then be ...
    An initial Islamic bilingual parallel corpus of Arabic and Sign language was developed in previous research we carried out at Al-Imam University, Saudi Arabia. This corpus is composed of Arabic texts describing basic Islamic topics such as prayer, pilgrimage, fasting, etc. in terms of elements and “functioning”. In this paper, we present our work on refining and enhancing this corpus and its structure; we also present a prototype of a teaching and learning environment that we will continuously improve to provide good educational support for deaf people. This environment will be powered by a hybrid (rule-based and statistical) translation component combined with avatar-based 3D animations, which are currently under development.
    This paper is part of continuing work aiming to build an accurate recognizer for Classical Arabic sounds usable for teaching and learning purposes. Previous efforts focused first on developing a dedicated sound database from recitations of the Holy Quran to cover classical Arabic sounds; the speech signals in this database were manually segmented and labeled on three levels: word, phoneme, and allophone. Next, two baseline recognizers were built to validate the speech segmentation at both the phoneme and allophone levels and to test the feasibility of the intended sound recognition task. The current phase, which is a PhD work, develops a more elaborate recognizer by considering the basic sounds and looking for their distinctive features (e.g. duration, energy, etc.) to determine which ones are particularly helpful for identifying the phonological variants of each basic sound. Here, we present the first results of the basic so...
    ABSTRACT The automatic recognition of spoken words is increasingly common, for dictaphone applications, telephone services or the command of various devices by disabled persons. In the latter case, a high recognition rate is expected on a vocabulary of small to medium size. To achieve this goal, the model must be refined. As a result, both the training stage and the recognition stage for such applications can be very time consuming, and occasional re-training may be needed. Parallelizing them is thus worth considering. In this paper we first present the models we use: the classical hidden Markov model and another model that takes into account the prosody of speech, namely the centisecond two-level hidden Markov model introduced by Meziane [10]. Then two parallelization strategies are detailed: the first one simply shares the vocabulary among the processors, the second one also distributes the model. Experimental results highlight the need for finer load balancing: an a priori load estimation is presented and is used to statically balance the computational load between the processors. Further experiments have been conducted and exhibit efficiencies higher than 65% on an architecture composed of 12 Pentium Pro interconnected via Myrinet. Directions for further improving the parallelization are given.
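
    As a rough illustration of static load balancing from a priori load estimates, the sketch below assigns vocabulary words to processors with a greedy longest-processing-time heuristic. This is a generic heuristic swapped in for illustration, not the estimation or balancing scheme of the paper; the function name, the word labels and the load values are all hypothetical.

```python
import heapq


def balance_statically(estimated_loads, num_processors):
    """Assign each word to the currently least-loaded processor,
    visiting words from the heaviest estimated load to the lightest."""
    # Min-heap of (accumulated load, processor id).
    heap = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_processors)}
    ordered = sorted(estimated_loads.items(), key=lambda kv: kv[1], reverse=True)
    for word, load in ordered:
        total, p = heapq.heappop(heap)
        assignment[p].append(word)
        heapq.heappush(heap, (total + load, p))
    return assignment


# Hypothetical a priori cost estimates per vocabulary word (arbitrary units).
loads = {"w1": 8.0, "w2": 7.0, "w3": 6.0, "w4": 5.0, "w5": 4.0}
assignment = balance_statically(loads, num_processors=2)
per_processor = {
    p: sum(loads[w] for w in words) for p, words in assignment.items()
}
print(assignment)
print(per_processor)
```

    Because the assignment is computed once before training or recognition starts, it adds no communication at run time; its quality depends entirely on how well the a priori estimates match the real cost per word.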
    This paper presents a system for Arabic Part-of-Speech Tagging, which combines morphological analysis with Hidden Markov Model (HMM) and relies on the Arabic sentence structure. On the one hand, the morphological analysis is used to reduce the size of the tags lexicon by segmenting Arabic words into their prefixes, stems, and suffixes due to the fact that Arabic is
    ABSTRACT Advances in science and technology have made it possible for deaf people and people with hearing impairments to be more involved and to have better chances of education, access to information and knowledge, and interaction with society at large. Exploiting these advances to empower hearing-impaired and deaf persons with knowledge is a challenge as much as it is a need. Here, we present a part of our work in a national project to develop an environment for automatic translation from Arabic to Saudi Sign Language using 3D animations. One of the main objectives of this project is to develop a bilingual parallel corpus for automatic translation purposes; avatar-based 3D animations are also to be built. These linguistic resources will be used to support the development of ICT applications for the deaf community. Due to the complexity of this task, the corpus is being developed progressively. In this paper, we present a first part of the corpus, covering a specific topic from the Islamic sphere.
    In this work we study the parallelization of the training phase of an automatic speech recognition system based on hidden Markov models. The vocabulary is uniformly distributed across the processors, but the Markovian network of the application is duplicated on all processors. The proposed parallel algorithms are based on two communication strategies. In the first one, called the regrouping algorithm, communications are delayed until the training of all local sentences is finished. In the second one, called the cutting algorithm, packages of optimal size are first determined and then asynchronous communications are performed after the training of each package. Experimental results show that good performance can be obtained with the second algorithm.
    In this paper, we present a part of our work related to an ongoing ambitious project funded by the Saudi National plan for Sciences and Technologies, to build a virtual translator from Arabic Text to Saudi Sign Language. This project comprises two main parts. The first part concerns the mapping between words (or morphemes) in the input text and their
    ABSTRACT Research demonstrates that deaf individuals are undereducated and that most of them are illiterate or at least semi-illiterate. Educating individuals with disabilities, in general, is a good investment. It not only reduces welfare costs and future dependence; it also reduces current dependence and frees other household members from caring responsibilities, allowing them to increase employment or other productive activities. In this scope, a nationally funded project has been launched to develop a teaching and learning environment for deaf people in Saudi Arabia, using both automatic translation from Arabic to Saudi Sign Language and 3D animation techniques called Avatars. As part of the project, this paper presents the development of educational material that gives deaf people access to vital information by presenting essential knowledge needed in their daily lives in a manner that is easy to grasp and comprehend. Resources for the subject of Islamic Education are collected and indexed by level and depth of information to accommodate the needs of the various types of users targeted by our work.
    We present in this paper an Arabic morpho-syntactic analyzer (Morphar+) built on top of the free Arabic Morphological analyzer (AraMorph). It is known that AraMorph produces a large number of morphological solutions, but little information to select the appropriate morphological solution for words in context. For this purpose, we start characterizing/describing all particles of the Arabic language, broken noun patterns,
    ABSTRACT The task of assigning the correct Part of Speech (POS) to text given its context is not obvious and requires expertise and considerable resources. Automating this task and building tools that can carry it out is crucial to advancing major areas of natural language processing. Only a limited number of Part of Speech taggers currently exist for Arabic, and they are not readily available. In this paper we present an effort to design and build a POS tagger that takes into consideration the richness of the language as well as efficiency in processing volumes of text. The current output of the Light Arabic Part of Speech Tagger (LAPOST) is comparable to that of existing systems, but the tagger is more effective from the processing perspective.