Vigneshkumar - Sign Language Recognition
ABSTRACT
Our study focuses on sign language recognition that lets users communicate seamlessly without requiring any additional devices. We use a Long Short-Term Memory (LSTM) model that leverages action detection. We used dropout layers in both the training and testing stages of our deep learning models in an effort to improve accuracy and avoid overfitting, and this addition substantially increased our system's accuracy. Notably, after extensive training and evaluation, we reached an accuracy rate of 99.35%. This project extends beyond offering a transformative impact on communication dynamics: we aim to streamline interaction not just for the deaf and mute community but for society at large. This innovation transcends language barriers, providing a universal and inclusive mode of expression. The ease with which emotions and messages can be conveyed through gestures fosters a more connected and understanding world, where everyone, regardless of their communication abilities, can actively participate in meaningful interactions.
Keywords: MediaPipe Model, Sign Language Prediction, Long Short-Term Memory, Hand Gesture Recognition.
I.INTRODUCTION
In the rich tapestry of human communication, sign language plays a vital role, particularly for people with hearing and speech impairments. Originating in 17th-century Spain, sign language has evolved into a complex and expressive means of conveying thoughts and emotions. This unique form of communication involves intricate hand movements, facial expressions, lip motions, and body gestures, making it a multi-faceted and nuanced language.
The significance of sign language extends beyond mere communication; it serves as a powerful tool for the cognitive and social development of deaf and mute individuals. As noted in various studies, learning sign language enhances mental, verbal, and signing skills while reducing internal and psychological pressures. Recognizing the importance of integration, efforts have been made to incorporate deaf and mute children into mainstream educational settings, fostering better social and educational experiences. Despite the global presence of over 300 sign languages, each with its own alphabet and regional variations, challenges persist; the absence of universal symbols for names further complicates communication. To bridge these gaps, technologies such as deep learning, convolutional neural networks (CNNs), and depth sensors have been explored for sign language prediction.
This project specifically focuses on real-time sign language prediction using a Long Short-Term Memory (LSTM) model and action prediction. By capturing and analyzing sequences of frames, our system aims to accurately identify and predict sign language gestures. The incorporation of dropout layers in the training process addresses overfitting, resulting in a remarkable accuracy rate of 99.35%.
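As a concrete sketch of the kind of model described here, the following Keras snippet stacks LSTM layers with dropout for classifying sequences of per-frame keypoints. The sequence length, feature dimension, layer widths, and dropout rate are illustrative assumptions, not the exact configuration used in this work.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense

# Illustrative sizes (assumptions): 30 frames per gesture clip and 1662
# keypoint values per frame (roughly the MediaPipe holistic keypoint count),
# with a small set of sign/action classes.
SEQUENCE_LENGTH = 30
NUM_KEYPOINT_FEATURES = 1662
NUM_CLASSES = 3

model = Sequential([
    Input(shape=(SEQUENCE_LENGTH, NUM_KEYPOINT_FEATURES)),
    # Stacked LSTMs read the sequence of per-frame keypoint vectors.
    LSTM(64, return_sequences=True),
    Dropout(0.2),   # dropout between recurrent layers to curb overfitting
    LSTM(128),
    Dropout(0.2),
    Dense(64, activation='relu'),
    Dense(NUM_CLASSES, activation='softmax'),  # one probability per sign class
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
model.summary()
```

In a setup like this, the dropout layers randomly deactivate a fraction of units during training, which is the standard mechanism for discouraging overfitting in stacked recurrent models.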
This work is not just about advancing technology; it is about empowering individuals. It holds the potential to revolutionize communication for the deaf and mute community and for broader society. Beyond the technical intricacies, this project envisions a more inclusive world where expressing oneself through gestures becomes a universal language, fostering understanding and connection across diverse communities. In the following sections, we review related work, describe the proposed system and methodology, present evaluation metrics, and conclude with the broader impact of our efforts.
II.RELATED WORK
There are two main categories of sequence-to-sequence learning methods: Encoder-Decoder
Networks and those based on Connectionist Temporal Classification (CTC). Encoder-Decoder
networks originated from Neural Machine Translation (NMT)[1,10], with early models using a
single Recurrent Neural Network (RNN)[1,11] for both encoding and decoding sequences.
Subsequent improvements involved separating the encoding and decoding tasks into two RNNs, and
attention mechanisms were introduced to address issues in modeling long-term dependencies
between input and output sequences. This success in NMT led to the adoption of encoder-decoder
networks in computer vision applications such as image captioning, activity recognition, and lip reading. The second category, based on CTC [1,12], was proposed by Graves et al. and has been
widely applied in Speech Recognition and Handwriting Recognition. This method is particularly
suitable for tasks with weakly labeled data. In computer vision, CTC has been used for sentence-
level lip reading and action recognition[1,13].
The paper focuses on demonstrating sequence-to-sequence learning techniques in continuous sign
language recognition. This domain is chosen due to the multi-channel nature and the availability of
substantial expert linguistic knowledge. While most sign language recognition research previously
focused on isolated sign samples, recent interest has shifted towards continuous sign language
recognition, especially with the availability of large datasets like RWTH-PHOENIX-Weather-
2014[1,14]. Since continuous datasets lack frame-level annotations, previous work required an
alignment step to locate individual signs in videos. Relevant to this paper is the work by Koller et al. [1,15], which combines deep representations with traditional Hidden Markov Model (HMM)
based temporal modeling.
III.METHODOLOGY
Our hand tracking technology, with applications ranging from gesture recognition to augmented reality effects, employs a straightforward yet effective strategy. Initially, we determine the state of each finger, whether it is straight or curved, by summing the angles at the joints of the hand skeleton. This foundational step precedes any further processing. Once the finger states are identified, a mapping process assigns a set of predefined actions to each combination of finger states. This grouping enables the recognition of basic static gestures with reasonable accuracy. However, our method goes beyond static gestures; it uses a sequence of landmarks to anticipate dynamic motions, presenting an improvement over existing techniques.
To enhance the user experience, we also explore the incorporation of augmented reality features. Specifically, we superimpose these features onto the bones of the hand skeleton, providing a visually engaging and interactive experience. This aligns with the current trend of hand-based augmented reality effects that are gaining popularity. In essence, our method combines the assessment of finger states, dynamic motion anticipation, and augmented reality integration, making it a versatile and impactful technology.
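As a rough illustration of the finger-state step described above, the following Python sketch sums the bend angles at each finger's joints from 21 hand landmarks (indexed as in MediaPipe Hands) and labels each finger straight or curved against a threshold. The landmark indices follow the standard 21-point hand model, but the bend threshold and the example gesture mapping are illustrative assumptions rather than the exact values used in this work.

```python
import numpy as np

# MediaPipe Hands landmark indices for each finger, from base joint to tip
# (the wrist is landmark 0 in the standard 21-point hand model).
FINGER_CHAINS = {
    'thumb':  [1, 2, 3, 4],
    'index':  [5, 6, 7, 8],
    'middle': [9, 10, 11, 12],
    'ring':   [13, 14, 15, 16],
    'pinky':  [17, 18, 19, 20],
}

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def finger_states(landmarks, bend_threshold=40.0):
    """Label each finger 'straight' or 'curved' from 21 (x, y, z) landmarks.

    A perfectly straight finger has 180-degree angles at its joints, so the
    total deviation from 180 degrees is used as the bend measure; the
    40-degree threshold is an illustrative assumption.
    """
    states = {}
    for name, chain in FINGER_CHAINS.items():
        pts = [landmarks[i] for i in chain]
        bend = sum(180.0 - joint_angle(pts[i - 1], pts[i], pts[i + 1])
                   for i in range(1, len(pts) - 1))
        states[name] = 'curved' if bend > bend_threshold else 'straight'
    return states

# Hypothetical mapping from the tuple of finger states (thumb..pinky) to a
# static gesture label; a real system would use a larger, calibrated table.
STATIC_GESTURES = {
    ('straight', 'curved', 'curved', 'curved', 'curved'): 'thumbs_up',
    ('curved', 'straight', 'straight', 'curved', 'curved'): 'peace',
}

# Example usage with landmarks obtained elsewhere (e.g. from MediaPipe Hands):
# gesture = STATIC_GESTURES.get(tuple(finger_states(landmarks).values()), 'unknown')
```

For dynamic motions, the same per-frame landmarks would be buffered into a sequence and passed to the trained sequence model rather than matched against a static table.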
A.ARCHITECTURE DIAGRAM
Fig. 4. Accuracy
Fig. 5. Loss
[13] R. N. Karthika, C. Valliyammai and M. Naveena, "Phish block: a blockchain framework for
phish detection in cloud," Computer Systems Science and Engineering, vol. 44, no. 1, pp. 777–795,
2023.
[14] N. Velmurugan, C. S, G. V and K. S, "Thumbs-Up: A Sanction Probe Software using Machine Learning,"
2022 6th International Conference on Electronics, Communication and Aerospace Technology, Coimbatore,
India, 2022, pp. 1-5, doi: 10.1109/ICECA55336.2022.10009075.