
1 Introduction

Language is the most essential means of communicating and interacting with society; it is the primary way humans understand each other and express their feelings in words. According to the World Health Organization, around 466 million people worldwide have disabling hearing loss, including 34 million children, and studies estimate that by 2050 this number will rise to more than 900 million. Causes include genetic factors, complications at birth, certain infectious diseases, and chronic ear infections. For the Deaf community, Sign Language (SL) is the most widely used of the many languages that allow hearing-impaired individuals to communicate and be understood by their peers and the rest of the community.

In this paper we present ArSign, an Arabic Sign Language translator based on the Leap Motion Controller (LMC) in a mobile setting. The proposed system translates sign language produced by hand gestures into written text, helping individuals understand hearing-impaired people without knowing sign language rules or a specific alphabet. We believe that such an Arabic Sign Language (ArSL) translator can make interaction with deaf people easier and faster.

The remainder of this paper is organized into three sections. The next section describes background research and the most recent related studies in three subsections: sensor-based systems, vision-based systems, and advances in sign language recognition for the Arabic alphabet. Section 3 presents the proposed system and discusses design choices. The conclusion summarizes the presented work and discusses future research directions.

2 Related Works

Sign language is not standardized around the world; examples include American Sign Language (ASL), Arabic Sign Language (ArSL), Greek Sign Language (GSL), and Korean Sign Language (KSL). During the last decades, several research projects have dealt with sign language recognition. Methods for recognizing sign language can be classified into two main categories: sensor-based and vision-based approaches. Sensor-based approaches rely on various sensors attached to the hands, generally mounted on gloves, to capture 3D hand movements. Vision-based approaches, on the other hand, use one or more visual sensors to track hand movements and analyze the performed gestures with motion algorithms.

2.1 Sensors Based Approaches

Sensor-based systems employ a variety of electromechanical devices, incorporating many sensors into gloves to recognize signs [1]. In 2018, Ahmed et al. [2] presented a review of sensory-glove-based systems for sign language recognition published between 2007 and 2017, covering more than 20 proposed systems such as [3,4,5]. Glove-based recognition systems are composed of three main modules (see Fig. 1). Sensors are used to detect finger movements and hand orientation. Preetham et al. [6] proposed a simple glove with five flex sensors attached to the fingers; the glove logs finger displacement and translates it into characters. Shukor et al. [27] developed a wearable smart glove using an embedded accelerometer placed on the back of the palm to capture hand motion and 10 tilt sensors to measure finger motion. This system was tested with American Sign Language, and the reported average accuracy rate was 89%. [7] designed a solution based on a microcontroller board with a Bluetooth connection and five accelerometer sensors, one placed on each finger. An Android application connected to the glove was implemented; the board sends recognized hand movements to the application, which displays the corresponding words. The authors report promising results when evaluating this prototype. Lee et al. [8] proposed adding pressure sensors to a smart hand device for recognizing American Sign Language alphabets. Their wearable device is composed of five flex sensors placed on the fingers, a three-axis inertial motion sensor on the back of the palm, and two pressure sensors on the middle finger. Fusing the signals from the flex and pressure sensors improved the accuracy rate from 65.7% (flex sensors only) to 98.2%. Another recent work, presented by [9] for Indonesian Sign Language, is based on an Android application and is dedicated to hearing-impaired people. In this system, a Bluetooth-based glove was developed with an accelerometer mounted on the index finger and five flex sensors, one mounted on each finger. An Arduino Nano board collects the finger movements and, through a mapping algorithm, identifies Indonesian letters from the recognized gestures. Preliminary results were encouraging, and the authors report a very high recognition rate.

Fig. 1. The main components used by glove-based systems (adapted from [2])

In addition to gloves, the Leap Motion Controller (LMC) has also been widely used as an input sensor for sign language recognition. [10] present a system based on two Leap Motion sensors that collect hand states from different viewing angles. The system recognizes digits (0–9), and the achieved recognition accuracy using both sensors was 93.14%.

From the above, we can see that sensor-based approaches for sign language recognition (SLR) have been widely studied during the last decades, and the promising results encourage researchers to propose even more accurate SLR systems.

2.2 Vision Based Approaches

Vision-based systems use cameras as the primary tools to obtain the necessary input data. This removes the need for sensory gloves and reduces system costs, since cameras are less expensive than sophisticated sensors and are available in most smartphones and laptops. However, embedded vision presents various problems, such as the high computational cost of image processing, the limited field of view that differs between capturing devices, and the need for stereo cameras to obtain good results.

Vision-based solutions use monocular or stereo vision to capture the gesture and analyze the movement with motion algorithms [11,12,13]. Several works have aimed to translate signs into written text using vision-based systems. [14] used Microsoft's Kinect to capture the body skeleton and the 3D trajectory of the motion of both hands in order to recognize words and sentences. [15,16,17,18] also propose using the Kinect as a low-cost capture device for vision-based systems. [19] presented a dynamic Bayesian network model achieving a best accuracy of 94.6% with a five-state hidden node for 48 ASL signs. Kang et al. [20] used depth sensors to recognize 31 American Sign Language signs with a convolutional neural network, obtaining accuracy rates from 83.58% to 85.49% for new users without retraining.

For Japanese Sign Language (JSL), More and Sattar [28] propose using the scale-invariant feature transform (SIFT) to detect dynamic hand gestures by searching for matching key points between the input image and images stored in a database. In another work, [21] propose to recognize Indian Sign Language through a webcam. The hand detected in the image frames is segmented, and the state of the fingers is used to recognize the alphabet. To do that, the number of fingers that are fully open, closed, or semi-closed and the angles between fingers are used as discriminative features for recognition (Fig. 2).

Fig. 2. Processing steps used in vision-based sign recognition systems (adapted from [2])

The effectiveness of vision-based approaches is highly affected by image processing issues (e.g. image filtering, background cancellation, color segmentation, and boundary detection). For instance, uncontrolled backgrounds can greatly degrade movement detection. Hence, as presented in this section, several evaluations of vision-based systems show that such systems suffer from hand occlusion problems and from the confusion of similar gestures.

2.3 Arabic Sign Language Translators

Sign language recognition for the Arabic alphabet (see Fig. 3) has also been studied, and both approaches (i.e. sensor-based and vision-based) have been used to help deaf people communicate. A review of recent work in Arabic SLR is presented in [22]. Ibrahim et al. [11] present SLRS, an automatic visual Sign Language Recognition System that translates Arabic word signs into text. The proposed system is composed of four steps: 1) hand segmentation, 2) hand tracking, 3) feature extraction, and 4) gesture classification. For hand segmentation, a dynamic skin detector is used. A preliminary evaluation was performed on a dataset of 30 isolated Arabic words chosen to cover different occlusion states. Experimental results indicate that the proposed system has a recognition rate of 97% in signer-independent mode.

Fig. 3. Arabic sign alphabets as proposed by [23]

El-Jaber et al. [12] present a solution for Arabic Sign Language gestures using disparity images for user-dependent recognition. Sequences of disparity images are used to segment the body of the user from a non-stationary background in video-based gestures. The spatiotemporal features in the image sequence are represented in two images by accumulating the prediction errors of consecutive segmented images according to the directionality of motion. The methodology was assessed by collecting 50 repetitions of 23 gestures from 4 different users with a Bumblebee XB3 camera; a classification rate of 96.8% and an improvement of 62% are reported. This solution is well suited to handling occlusion. Hidden Markov Models (HMM) [24] and conventional neural networks [25] have also been used to improve the gesture recognition rate for Arabic Sign Language.

In 2018, a fully labeled dataset of Arabic Sign Language (ArSL) images was released [26]. The dataset contains a large number of gestures for all Arabic letters. The authors aim to help researchers investigate and develop automated systems for hearing-impaired people using sensor-based or vision-based approaches and deep learning algorithms.

3 The Proposed System

To recognize the performed gestures, we use the Leap Motion Controller (LMC), a 3D non-contact motion sensor that tracks and detects hands, fingers, bones, and finger-like objects. The LMC Application Programming Interface (API) exposes different features for bones, gestures, hands, and fingers. The LMC is placed in front of the user and connected to the mobile device through an OTG connector. The overall architecture of the proposed system is presented in Fig. 4; it consists of three modules, namely a sensor module to capture hand movements, a processing module to extract the meaning of gestures, and an output module to display or translate the recognized gestures.

Fig. 4. The overall system architecture

Gestures are captured through the LMC. The three-axis palm position and the five fingertip positions are accurately determined and sent to the processing unit. Once a gesture is captured, a simple algorithm compares it with a dataset containing the 28 Arabic sign language gestures previously recorded by the user, as sketched below.
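The listing below is a minimal sketch of this matching step, assuming each captured gesture is flattened into an 18-value feature vector (the three palm coordinates followed by the three coordinates of each of the five fingertips) and compared with the user-recorded templates by nearest-neighbor Euclidean distance. The names SignTemplate and SignMatcher and the distance threshold are illustrative assumptions, not part of a finalized implementation.

```java
import java.util.List;

/** Hypothetical template: one recorded Arabic letter and its 18-value feature vector. */
class SignTemplate {
    final String letter;        // e.g. "ا", "ب", ...
    final double[] features;    // palm (x,y,z) + five fingertip (x,y,z) positions, in millimeters

    SignTemplate(String letter, double[] features) {
        this.letter = letter;
        this.features = features;
    }
}

/** Minimal nearest-neighbor matcher over the 28 user-recorded gesture templates. */
class SignMatcher {
    private final List<SignTemplate> dataset;   // the 28 recorded templates
    private final double maxDistance;           // reject gestures that match no template well

    SignMatcher(List<SignTemplate> dataset, double maxDistance) {
        this.dataset = dataset;
        this.maxDistance = maxDistance;
    }

    /** Returns the closest letter, or null when no template is close enough. */
    String classify(double[] features) {
        String best = null;
        double bestDist = Double.MAX_VALUE;
        for (SignTemplate t : dataset) {
            double d = euclidean(features, t.features);
            if (d < bestDist) {
                bestDist = d;
                best = t.letter;
            }
        }
        return bestDist <= maxDistance ? best : null;
    }

    private static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```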

3.1 Data Acquisition and Features Extraction

Data acquisition is one of the most important processes in such systems. As discussed in the literature review, we use a sensor-based approach built on the Leap Motion Controller (LMC) to capture hand movements (see Fig. 5).

Fig. 5. A hand representation as recognized by the LMC

The LMC is a small USB peripheral device developed by the Leap Motion company, designed to be placed on a physical desktop facing upward or attached to a head-mounted display for Virtual Reality (VR) applications. It uses three infrared LEDs and two high-precision infrared cameras to capture hand information when the hand is within its active range, but it does not provide images of the detected scene. It detects hands and finger-like objects and reports discrete positions and motion. The LMC tracks fingers and hands in its field of view and provides updates as sets, or frames, of data. Each frame contains a list of the basic tracking data that describes the overall motion in the scene. The LMC employs a right-handed Cartesian coordinate system when reporting hand and finger positions, with the origin centered at the top of the device, and it offers a large interaction space of approximately 60 cm above the device.
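As a minimal sketch of this frame-based tracking model, assuming the Leap Motion v2 Java SDK (package com.leapmotion.leap), the following snippet grabs the most recent frame and prints the palm position of the frontmost hand in millimeters, expressed in the right-handed coordinate system described above.

```java
import com.leapmotion.leap.Controller;
import com.leapmotion.leap.Frame;
import com.leapmotion.leap.Hand;
import com.leapmotion.leap.Vector;

public class FramePollingExample {
    public static void main(String[] args) throws InterruptedException {
        Controller controller = new Controller();
        Thread.sleep(1000); // give the tracking service time to connect

        // Each call to frame() returns the most recent set of tracking data.
        Frame frame = controller.frame();
        if (!frame.hands().isEmpty()) {
            Hand hand = frame.hands().frontmost();
            Vector palm = hand.palmPosition(); // millimeters, origin at the top of the LMC
            System.out.printf("Palm at x=%.1f y=%.1f z=%.1f mm%n",
                    palm.getX(), palm.getY(), palm.getZ());
        }
    }
}
```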

Leap Motion provides a complete API for application developers that includes a large number of features for finger, hand, and gesture tracking. In this work we use some of them, as follows:

  • For fingers: for each recognized finger, the API reports its direction, its length and width in millimeters, its position in the Cartesian coordinate system, and its velocity.

  • For hands: the API reports information about each recognized hand, namely the list of fingers attached to it, the type of the hand (left or right), the position of the palm center in millimeters, and the hand velocity.

  • For gestures: some gestures are recognized out of the box, for instance a circle gesture when a finger traces a circle in space, a swipe gesture when a linear movement of a finger is detected, and a key-tap gesture when a downward tapping movement of a finger or tool is performed.

Based on this information, we can build a framework to capture hand and finger movements and understand the meaning of gestures; a sketch of such feature extraction is given below.
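The following listing is a hedged sketch of how these API features could feed the processing module, again assuming the Leap Motion v2 Java SDK; the 18-value feature layout (palm center plus five fingertips) is our own design choice rather than something prescribed by the API.

```java
import com.leapmotion.leap.Controller;
import com.leapmotion.leap.Finger;
import com.leapmotion.leap.Frame;
import com.leapmotion.leap.Gesture;
import com.leapmotion.leap.Hand;
import com.leapmotion.leap.Vector;

public class FeatureExtractor {

    /** Enable the built-in gestures (circle, swipe, key tap) mentioned above. */
    public static void enableBuiltInGestures(Controller controller) {
        controller.enableGesture(Gesture.Type.TYPE_CIRCLE);
        controller.enableGesture(Gesture.Type.TYPE_SWIPE);
        controller.enableGesture(Gesture.Type.TYPE_KEY_TAP);
    }

    /**
     * Builds an 18-value feature vector from the frontmost hand:
     * palm center (x, y, z) followed by the five fingertip positions (x, y, z),
     * all in millimeters. Returns null when no hand is visible.
     */
    public static double[] extractFeatures(Frame frame) {
        if (frame.hands().isEmpty()) {
            return null;
        }
        Hand hand = frame.hands().frontmost();
        double[] features = new double[18];

        Vector palm = hand.palmPosition();
        features[0] = palm.getX();
        features[1] = palm.getY();
        features[2] = palm.getZ();

        int i = 3;
        for (Finger finger : hand.fingers()) {   // the five tracked fingers
            Vector tip = finger.tipPosition();
            features[i++] = tip.getX();
            features[i++] = tip.getY();
            features[i++] = tip.getZ();
            if (i >= features.length) {
                break;                           // defensive: keep at most five fingertips
            }
        }
        return features;
    }
}
```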

3.2 System Implementation

The proposed system is composed of three components, as shown in Fig. 4. The first is the LMC, which captures hand and finger movements. The second is a processing module that converts gestures into letters using an algorithm that compares each recognized gesture with a dataset of recorded gestures. The third module displays the recognized letters on the screen or speaks them through a Text To Speech (TTS) engine. An Android-based application implements these modules. As presented in Fig. 6, when the user starts the application, they are asked to connect the LMC through an OTG connector. After that, the main interface recognizes the hand and begins to write the corresponding text according to the recognized gestures.

Fig. 6. A) The user is asked to connect the LMC to start the interaction. B) The LMC recognizes hand and finger movements and translates them into text. C) The user can save the input text, send it through social media (e.g. Twitter, Facebook, Instagram), or convert it to a voice message through a Text To Speech (TTS) engine.
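The listing below is an illustrative sketch of the third (output) module, using the standard Android TextToSpeech engine to speak the translated text. The helper class name and the Arabic Locale are our own choices, and the availability of an Arabic voice depends on the TTS engine installed on the device.

```java
import android.content.Context;
import android.speech.tts.TextToSpeech;
import java.util.Locale;

/** Minimal wrapper around the Android TTS engine for speaking translated text. */
public class SpeechOutput {

    private TextToSpeech tts;
    private boolean ready = false;

    public SpeechOutput(Context context) {
        tts = new TextToSpeech(context, new TextToSpeech.OnInitListener() {
            @Override
            public void onInit(int status) {
                if (status == TextToSpeech.SUCCESS) {
                    // Request an Arabic voice; stay silent if it is unavailable.
                    int result = tts.setLanguage(new Locale("ar"));
                    ready = (result != TextToSpeech.LANG_MISSING_DATA
                            && result != TextToSpeech.LANG_NOT_SUPPORTED);
                }
            }
        });
    }

    /** Speaks the recognized text, interrupting any previous utterance. */
    public void speak(String translatedText) {
        if (ready && translatedText != null && !translatedText.isEmpty()) {
            tts.speak(translatedText, TextToSpeech.QUEUE_FLUSH, null, "arsign-utterance");
        }
    }

    /** Release the engine when the activity is destroyed. */
    public void shutdown() {
        if (tts != null) {
            tts.stop();
            tts.shutdown();
        }
    }
}
```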

4 Conclusions and Discussion

In this paper, a broad state of the art was presented for both sensor-based and vision-based approaches to sign language recognition. Vision-based systems suffer from image processing issues such as image filtering, background colors, segmentation, and boundary detection. Researchers try to overcome these limitations with more sensitive cameras or new image processing algorithms; a non-vision-based method is therefore an attractive alternative.

In this paper, we propose using the Leap Motion Controller within an Android-based application to convert Arabic Sign Language into written text. To do that, we designed a system with three independent modules to capture gestures, recognize signs, and display the message. The LMC is connected to the smartphone via a USB connector (e.g. an OTG adapter). We expect to be able to recognize the 28 letters of the Arabic alphabet and the 10 digits. The proposed solution offers a set of functions including Arabic sign recognition, Text To Speech (TTS) translation using the Android speech synthesizer engine, and a social media option that allows users to directly share the translated text on their social media accounts.