Hand Sign & Gesture Recognition System
Monu Singh
Department of CSE
Galgotias University
Greater Noida, Uttar Pradesh, India
monu.singh@galgotiasuniversity.edu.in

2023 6th International Conference on Contemporary Computing and Informatics (IC3I) | 979-8-3503-0448-0/23/$31.00 ©2023 IEEE | DOI: 10.1109/IC3I59117.2023.10398116
Abstract— Hand gestures serve as a form of non-verbal communication utilized in sign language, primarily employed by individuals who are deaf or have hearing impairments to communicate with those who do not share the same condition. While numerous sign language generators exist, none of them offer customization options or cost-effectiveness for end users. This proposed work outlines a prototype system named the Hand Sign and Gesture Recognition System Software. The system aims to automatically interpret sign language, facilitating smoother communication between individuals who are hearing and those who are deaf. In particular, the system can be implemented on popular video-sharing platforms like YouTube, which currently lack an auto-text creation feature for gestures and sign language. The system utilizes a real-time hand gesture recognition mechanism that employs shape-based features such as orientation, centroid (center of mass), finger state, and thumb position relative to extended or folded fingers to identify gestures. A convolutional neural network (CNN) supervises the entire feature extraction process. Each frame of the video is captured, allowing the identification and isolation of hands for processing by the CNN. The system achieved a test accuracy of 99.89% by employing CNNs for both training and testing. An advantage of this approach is its independence from external hardware or devices, as it operates solely through the software implementation.

Index terms— Convolutional Neural Network, Multi-class Support Vector, Hand Segmentation Approach, Region of Interest, Hand Gestures

I. INTRODUCTION

Hand gestures are utilized as a non-verbal means of communication within sign language. They are commonly employed by individuals facing challenges in speaking or hearing to interact with those unaffected by such difficulties [1]. Several sign language systems have been developed by various companies, but they lack adaptability and affordability for end users.

This proposal presents the Hand Sign and Gesture Recognition System Software as a prototype solution to enhance interpersonal communication for individuals who are deaf and mute. The system aims to serve as an autonomous sign language translator, particularly applicable to platforms like YouTube where auto-text creation for gestures and sign language is currently unavailable. Gesture recognition research is still in its early stages, and hand gestures play a vital role in nonverbal communication, crucial for everyday interactions. The program demonstrates a real-time system capable of identifying hand gestures and signals based on shape-based parameters such as orientation, centroid (center of mass), finger states, and thumb positioning in relation to extended or folded fingers. The feature extraction process is supervised by a Convolutional Neural Network (CNN). Throughout this process, each video frame is captured, enabling the identification and isolation of hands for processing by the CNN.

The rest of the paper is structured as follows: Section 2 summarizes state-of-the-art schemes; Section 3 describes the proposed approach; and Section 4 presents the outcomes and concludes the work.

II. LITERATURE SURVEY

Systems that comprehend sign language gestures play a crucial role in improving communication abilities for individuals who are hearing impaired or deaf [2][3]. In previous studies, convolutional neural networks (CNNs) were employed to analyze data from input devices after pre-processing. However, the complexity and diversity of the hand motions used in these tests significantly impacted identification rates and accuracy [4-6]. Specifically, Rahim et al. [7] present an efficient approach for hand gesture detection using a CNN and improved data pre-processing techniques, including feature fusion, erosion, hole filling, YCbCr binarization, and RGB color input. Another study, proposed by Islam et al. [8], is based on the Multi-class Support Vector Machine (MCSVM) to enhance productivity and efficiency, especially when dealing with a large number of specimens or gesture types.

Other recent research studies have primarily focused on data pre-processing. In [9], Patel et al. introduced a static hand gesture recognition system for American Sign Language, utilizing a lightweight deep convolutional neural network that allows for easy deployment and transport. Image processing algorithms were employed to assist with background reduction, frame segmentation, feature
extraction, Hand Segmentation Approach (HSA), and glove-based hand motion detection, prioritizing portability, affordability, simplicity, and minimal processing requirements.

III. METHODOLOGY

a. The CNN model will classify the provided image based on the received cascaded image. This vision-based approach eliminates the need for artificial devices in interaction, since all the signals can be performed solely with the hands.
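As an illustration of this capture-and-classify flow, the sketch below crops a square region of interest from a frame and binarizes it after a blur, mirroring the Gaussian blur and threshold step applied later in the pipeline. It is a minimal pure-Python stand-in on a synthetic frame, not the authors' implementation; all function names and values here are ours.

```python
# Per-frame preprocessing sketch: crop a square ROI, blur, then binarize.
# In the real system these would be OpenCV calls (cv2.VideoCapture,
# cv2.GaussianBlur, cv2.threshold); here a nested list stands in for a frame.

def crop_roi(frame, top, left, size):
    """Extract a size x size square region of interest from the frame."""
    return [row[left:left + size] for row in frame[top:top + size]]

def box_blur(img):
    """3x3 mean blur, a simple stand-in for a Gaussian blur filter."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) // len(vals)
    return out

def binarize(img, thresh):
    """Threshold to a binary image: 1 for hand pixels, 0 for background."""
    return [[1 if px >= thresh else 0 for px in row] for row in img]

# Synthetic 6x6 grayscale frame with a bright 2x2 "hand" blob in the centre.
frame = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        frame[y][x] = 255

roi = crop_roi(frame, 1, 1, 4)           # 4x4 ROI around the blob
binary = binarize(box_blur(roi), 100)    # blur, then threshold
```

After blurring and thresholding, only the bright blob survives in the binary image, which is the kind of clean hand silhouette the CNN is then asked to classify.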
b. We began by capturing an image of every frame generated by the computer's camera.

c. Each frame was assigned a Region of Interest (ROI), represented as a square with a blue border.

Throughout our search, we encountered difficulties in finding suitable pre-existing datasets in the desired format, leading us to compile our own dataset. Our dataset creation process involved capturing multiple images using OpenCV, allowing us to obtain a sufficient number of samples for training and testing each ASL sign. The Region of Interest (ROI) for each frame was visually indicated by a square with a blue border.

A. First-tier algorithm

i. Extract features from the frame captured by OpenCV and apply the Gaussian blur filter and threshold to obtain the final image.

ii. If a letter appears in more than 50 frames of the processed image, it is transmitted to the CNN model for prediction and considered for word construction.

B. Second-tier algorithm

i. In the second algorithmic layer, we identify alternative sets of symbols that yield equal results.

ii. We then employ classifiers specifically designed for those sets to differentiate between the categories.

C. Activation Function

For both the convolutional and fully connected neurons in each layer, Rectified Linear Units (ReLU) were utilized. ReLU outputs the maximum of the input value and zero (max(x, 0)). This nonlinearity enables the network to learn more complex features effectively. ReLU also helps address the vanishing gradient problem and accelerates training by reducing computation time.

D. Pooling Layer

We employ max pooling with a pool size of (2, 2) and a ReLU activation function [17] on the input image. This reduces the number of parameters, thereby mitigating over-fitting and computational costs.

E. Dropout Layers

Over-fitting [18] occurs when a network's weights are too closely aligned with the training examples, making it difficult to generalize to new instances. Dropout layers address this issue by randomly setting a fraction of that layer's activations to zero [19][20]. The network can still generate accurate outputs or classifications even if some activations are excluded.

By employing two layers of algorithms, we aim to identify and analyze symbols that are increasingly similar, improving recognition accuracy. The detectable hand sign gestures are provided in Fig. 4, and the major steps taken to recognize hand gestures are shown in Fig. 5.

Figure 4. Possible detectable hand sign gestures

Figure 5. A Framework of Hand Sign & Gesture Recognition System

V. RESULT AND CONCLUSION

We developed a technique to differentiate sign languages based on consistent hand movements, with a focus on Indian Sign Language (ISL) as a case study. Our model employed a back-projection histogram to set the image's histogram. By utilizing convolutional neural networks (CNNs) for training and testing, we achieved a test accuracy of 99.89%. One notable advantage of our approach is its independence from external hardware or devices. Adequate lighting and a well-lit background are important considerations. Additionally, the method is designed for stationary motions only. We created a dataset of gestures representing each letter of the alphabet, A to Z, and the digits 0 to 9. Each image in the dataset has a size of 50x50 pixels.
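The 50-frame rule from the first-tier algorithm, where a letter is sent on for word construction only after persisting across more than 50 processed frames, can be sketched as below. The scripted prediction stream stands in for per-frame CNN outputs, and the function name and counting details are our assumptions, not the paper's code.

```python
def build_word(frame_predictions, min_frames=50):
    """Append a letter to the word once it has been predicted in more
    than min_frames consecutive frames (debounces noisy predictions)."""
    word = []
    current, count = None, 0
    for letter in frame_predictions:
        if letter == current:
            count += 1
        else:
            current, count = letter, 1
        if count == min_frames + 1:   # crossed the "more than 50" bar
            word.append(letter)
    return "".join(word)

# 60 frames of 'H', a 3-frame noisy blip, then 55 frames of 'I'.
stream = ["H"] * 60 + ["X"] * 3 + ["I"] * 55
word = build_word(stream)   # -> "HI"
```

The short "X" blip never reaches the persistence threshold, so only the stable letters are committed, which is the point of requiring more than 50 frames per letter.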
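The layer operations from subsections C, D, and E (ReLU activation, (2, 2) max pooling, and dropout) admit a compact pure-Python sketch. This is illustrative only, not the authors' implementation; the 1/(1 - rate) rescaling in `dropout` is the standard inverted-dropout convention [19], which the paper does not specify.

```python
import random

def relu(x):
    """Rectified Linear Unit: max(x, 0)."""
    return max(x, 0)

def max_pool_2x2(img):
    """Max pooling with a (2, 2) pool: keeps the largest value per 2x2
    block, quartering the number of activations."""
    return [[max(img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1])
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]

def dropout(activations, rate, rng):
    """Randomly zero a fraction `rate` of activations (training only);
    survivors are scaled by 1/(1 - rate), the inverted-dropout form."""
    return [0.0 if rng.random() < rate else a / (1 - rate)
            for a in activations]

feature_map = [[1, -2, 3, 0],
               [4, 5, -6, 7],
               [0, 1, 2, -3],
               [8, -9, 1, 1]]
activated = [[relu(v) for v in row] for row in feature_map]
pooled = max_pool_2x2(activated)   # -> [[5, 7], [8, 2]]

rng = random.Random(7)
dropped = dropout([1.0] * 8, rate=0.5, rng=rng)
```

Negative responses are zeroed by ReLU before pooling, and each dropout output is either 0.0 or the rescaled survivor value, matching the behavior described in subsections C through E.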
Training the model across multiple epochs yielded the best results. Therefore, we ran the software for forty epochs and evaluated accuracy through testing, training, and validation.

In the future, advancements in technology will allow for the combination of these individual letters into meaningful words, streamlining communication. This technique can also be applied to videos on platforms like YouTube and Netflix, where automated text synthesis based on gestures and sign languages is currently unavailable. Furthermore, our technology could be integrated to enhance communication during video conferences. In summary, the proposed hand sign and gesture recognition system has potential applications in both commercial and public-benefit domains, such as utilizing gestures to operate smart devices instead of relying solely on verbal communication.

REFERENCES

[1] Gupta, U., Sharma, S., Jyani, U., Bhardwaj, A., & Sharma, M. (2022, December). Sign Language Detection for Deaf and Dumb students using Deep learning: Dore Idioma. In 2022 2nd International Conference on Innovative Sustainable Computational Technologies (CISCT) (pp. 1-5). IEEE.
[2] Giovanzana, C. (2023). A Shift Correction Algorithm to Improve 4756-4762.
Wearability of an Event-driven Hand Gesture Recognition
Armband (Doctoral dissertation, Politecnico di Torino).
[3] Baptista, J., Santos, V., Silva, F., & Pinho, D. (2023). Domain
Adaptation with Contrastive Simultaneous Multi-Loss Training for
Hand Gesture Recognition. Sensors, 23(6), 3332.
[4] Yang, Y., Wang, H., Jiang, R., Guo, X., Cheng, J., & Chen, Y. (2022).
A review of IoT-enabled mobile healthcare: technologies, challenges,
and future trends. IEEE Internet of Things Journal, 9(12), 9478-9502.
[5] Gupta, U., Sharma, S., Jyani, U., Bhardwaj, A., & Sharma, M. (2022,
December). Sign Language Detection for Deaf and Dumb students
using Deep learning: Dore Idioma. In 2022 2nd International
Conference on Innovative Sustainable Computational Technologies
(CISCT) (pp. 1-5). IEEE.
[6] Zhang, M., Zhou, Z., Wang, T., & Zhou, W. (2023). A lightweight
network deployed on ARM devices for hand gesture recognition. IEEE
Access.
[7] Rahim, M. A., Shin, J., & Islam, M. R. (2019, July). Dynamic hand
gesture based sign word recognition using convolutional neural
network with feature fusion. In 2019 IEEE 2nd International
Conference on Knowledge Innovation and Invention (ICKII) (pp. 221-
224). IEEE.
[8] Islam, M. Z., Hossain, M. S., ul Islam, R., & Andersson, K. (2019,
May). Static hand gesture recognition using convolutional neural
network with data augmentation. In 2019 Joint 8th International
Conference on Informatics, Electronics & Vision (ICIEV) and 2019 3rd
International Conference on Imaging, Vision & Pattern Recognition
(icIVPR) (pp. 324-329). IEEE.
[9] Patel, R., Dhakad, J., Desai, K., Gupta, T., & Correia, S. (2018,
December). Hand gesture recognition system using convolutional
neural networks. In 2018 4th international conference on computing
communication and automation (ICCCA) (pp. 1-6). IEEE.
[10] Mohamed, N., Mustafa, M. B., & Jomhari, N. (2021). A review of the
hand gesture recognition system: Current progress and future
directions. IEEE Access, 9, 157422-157436.
[11] Islam, M. R., Mitu, U. K., Bhuiyan, R. A., & Shin, J. (2018,
September). Hand gesture feature extraction using deep convolutional
neural network for recognizing American sign language. In 2018 4th
International Conference on Frontiers of Signal Processing
(ICFSP) (pp. 115-119). IEEE.
[12] Hatibaruah, D., Talukdar, A. K., & Sarma, K. K. (2020, December). A static hand gesture based sign language recognition system using convolutional neural networks. In 2020 IEEE 17th India Council International Conference (INDICON) (pp. 1-6). IEEE.
[13] Strezoski, G., Stojanovski, D., Dimitrovski, I., & Madjarov, G. (2018). Hand gesture recognition using deep convolutional neural networks. In ICT Innovations 2016: Cognitive Functions and Next Generation ICT Systems (pp. 49-58). Springer International Publishing.
[14] Srivastava, A. K., Gupta, K., Singh, A., & Srivastava, H. (2022). Virtual Graphic Creator by Sensing Hand Motion.
[15] Chen, Y., Luo, B., Chen, Y. L., Liang, G., & Wu, X. (2015, December). A real-time dynamic hand gesture recognition system using Kinect sensor. In 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO) (pp. 2026-2030). IEEE.
[16] Chinimilli, B. T., Anjali, T., Kotturi, A., Kaipu, V. R., & Mandapati, J. V. (2020, June). Face recognition based attendance system using Haar cascade and local binary pattern histogram algorithm. In 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI) (pp. 701-704). IEEE.
[17] Lei, X., Pan, H., & Huang, X. (2019). A dilated CNN model for image classification. IEEE Access, 7, 124087-124095.
[18] Cao, Y., Chen, Z., Belkin, M., & Gu, Q. (2022). Benign overfitting in two-layer convolutional neural networks. Advances in Neural Information Processing Systems, 35, 25237-25250.
[19] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
[20] Skourt, B. A., El Hassani, A., & Majda, A. (2022). Mixed-pooling-dropout for convolutional neural network regularization. Journal of King Saud University - Computer and Information Sciences, 34(8), 4756-4762.