
USING LONG SHORT-TERM MEMORY AND ENHANCED HAND DETECTION FOR ADVANCED
REAL-TIME ASL RECOGNITION


Vigneshkumar V, Abinaya G
Department of IT, Saveetha Engineering College, Chennai, India

ABSTRACT
Our study focuses on sign language recognition that lets users communicate seamlessly without requiring any additional devices. A Long Short-Term Memory (LSTM) model is used to perform action detection. We added dropout layers in both the training and testing stages of our deep learning models to improve accuracy and avoid overfitting, and this addition substantially increased the system's accuracy. Notably, after extensive training and execution, we reached an accuracy rate of 99.35%. The project's impact extends beyond the technology itself: we aim to streamline interaction not just for the deaf and mute community but for society at large. This innovation transcends language barriers, providing a universal and inclusive mode of expression. The ease with which emotions and messages can be conveyed through gestures fosters a more connected and understanding world, where everyone, regardless of their communication abilities, can actively participate in meaningful interactions.
Keywords: MediaPipe model, sign language prediction, Long Short-Term Memory, hand gesture recognition.
I.INTRODUCTION
In the rich tapestry of human communication, sign language plays a vital role, particularly for people who have hearing and speech impairments. Originating in 17th-century Spain, sign language has evolved into a complex and expressive means of conveying thoughts and emotions. This unique form of communication involves intricate hand movements, facial expressions, lip motions, and body gestures, making it a multi-faceted and nuanced language. The significance of sign language extends beyond mere communication; it serves as a powerful tool for the cognitive and social development of deaf and mute individuals. As noted in various studies, learning sign language enhances mental, verbal, and signing skills while reducing internal and psychological pressures. Recognizing the importance of integration, efforts have been made to incorporate deaf and mute children into mainstream educational settings, fostering better social and educational experiences. Despite the global presence of over 300 sign languages, each with its own alphabet and regional variations, challenges persist: the absence of universal symbols for names further complicates communication. To bridge these gaps, technologies such as deep learning, convolutional neural networks (CNN), and depth sensors have been explored for sign language prediction. This project specifically targets real-time sign language prediction using a Long Short-Term Memory (LSTM) model and action prediction. By capturing and analyzing sequences of frames, our system aims to accurately identify and predict sign language gestures. The incorporation of dropout layers in the training process addresses overfitting, resulting in a remarkable accuracy rate of 99.35%.
This work is not just about advancing technology; it is about empowering individuals. It holds the potential to revolutionize communication for the deaf and mute community and for the broader public. Beyond the technical intricacies, this project envisions a more inclusive world where expressing oneself through gestures becomes a universal language, fostering understanding and connection across diverse communities. In the following sections, we delve into related work, the proposed system, methodology, and evaluation metrics, and conclude with the broader impact of our efforts.
II.RELATED WORK
There are two main categories of sequence-to-sequence learning methods: Encoder-Decoder
Networks and those based on Connectionist Temporal Classification (CTC). Encoder-Decoder
networks originated from Neural Machine Translation (NMT)[1,10], with early models using a
single Recurrent Neural Network (RNN)[1,11] for both encoding and decoding sequences.
Subsequent improvements involved separating the encoding and decoding tasks into two RNNs, and
attention mechanisms were introduced to address issues in modeling long-term dependencies
between input and output sequences. This success in NMT led to the adoption of encoder-decoder
networks in computer vision applications such as image captioning, activity recognition, and lip-reading. The second category, based on CTC [1,12], was proposed by Graves et al. and has been widely applied in speech recognition and handwriting recognition. This method is particularly suitable for tasks with weakly labeled data. In computer vision, CTC has been used for sentence-level lip reading and action recognition [1,13].
The paper focuses on demonstrating sequence-to-sequence learning techniques in continuous sign
language recognition. This domain is chosen due to the multi-channel nature and the availability of
substantial expert linguistic knowledge. While most sign language recognition research previously
focused on isolated sign samples, recent interest has shifted towards continuous sign language
recognition, especially with the availability of large datasets like RWTH-PHOENIX-Weather-
2014[1,14]. Since continuous datasets lack frame-level annotations, previous work required an
alignment step to locate individual signs in videos. Relevant to this paper is the work by Koller et al. [1,15], which combines deep representations with traditional Hidden Markov Model (HMM)
based temporal modeling.
III.METHODOLOGY
Our hand tracking technology, with applications ranging from gesture recognition to augmented reality effects, employs a straightforward yet effective strategy. Initially, we focus on determining the state of each finger, whether straight or curled, by summing the angles at the joints of the hand skeleton. This foundational step precedes any further processing. Once the finger states are identified, a mapping process assigns a pair of predefined actions to each finger state. This grouping enables the recognition of basic static gestures with reasonable accuracy. However, our method goes beyond static gestures; it uses a series of landmarks to anticipate dynamic motions, presenting an improvement over existing techniques. To enhance the user experience, we also explore the incorporation of augmented reality features. Specifically, we superimpose these features onto the bones of the hand skeleton, providing a visually engaging and interactive experience, in line with the current trend of hand-based augmented reality effects. In essence, our method combines the assessment of finger states, dynamic motion anticipation, and augmented reality integration, making it a versatile and impactful technology. A minimal sketch of the finger-state step is given below.
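The following minimal Python sketch illustrates the finger-state step under stated assumptions: landmarks follow MediaPipe's 21-point hand layout, the bend of a finger is measured as the sum of its interior joint angles, and the 460-degree threshold is an illustrative choice rather than a value reported in this paper.

import numpy as np

# MediaPipe-style landmark indices for each finger chain, rooted at the wrist (0).
FINGER_CHAINS = {
    "thumb":  [0, 1, 2, 3, 4],
    "index":  [0, 5, 6, 7, 8],
    "middle": [0, 9, 10, 11, 12],
    "ring":   [0, 13, 14, 15, 16],
    "pinky":  [0, 17, 18, 19, 20],
}

def joint_angle(a, b, c):
    """Angle in degrees at point b formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def finger_states(landmarks, straight_threshold=460.0):
    """landmarks: (21, 2) array of x, y coordinates; returns finger -> 'straight' or 'curled'."""
    states = {}
    for name, chain in FINGER_CHAINS.items():
        pts = landmarks[np.array(chain)]
        # Sum of the three interior joint angles: roughly 540 degrees for a straight finger.
        total = sum(joint_angle(pts[i - 1], pts[i], pts[i + 1])
                    for i in range(1, len(pts) - 1))
        states[name] = "straight" if total > straight_threshold else "curled"
    return states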
A.ARCHITECTURE DIAGRAM

Fig. 1. Architecture Diagram


B.DATASET
The dataset for American Sign Language (ASL) was carefully curated. It comprises a hundred distinct signs, each performed three times by a single signer under various lighting conditions, resulting in a dataset of 2,400 images per sign. This dataset was split into a training set of 1,800 records and a test set for evaluation.
Data acquisition used the Lana dataset scene2-front camera, capturing keypoints from the hands, body, and face and storing them as numpy arrays. A deep neural network with LSTM layers was designed for classification, capturing temporal dependencies. The approach predicts actions across sequences of frames rather than individual ones and integrates OpenCV for real-time prediction using a camera. This methodology provides a robust system for American Sign Language recognition. A sketch of how such keypoint sequences might be stored is shown below.
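A minimal sketch of one plausible storage layout, assuming one numpy (.npy) file per frame grouped by sign and recording; the folder name and sequence length are illustrative assumptions, not values taken from the paper.

import os
import numpy as np

DATA_DIR = "ASL_Data"        # hypothetical root folder
SEQ_LEN = 30                 # frames per recording (assumed)

def save_frame(sign, seq_id, frame_id, keypoints):
    """keypoints: 1-D numpy array of concatenated hand/body/face coordinates."""
    out_dir = os.path.join(DATA_DIR, sign, str(seq_id))
    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, f"{frame_id}.npy"), keypoints)

def load_sequence(sign, seq_id):
    """Stack the saved frames of one recording into a (SEQ_LEN, n_features) array."""
    return np.stack([np.load(os.path.join(DATA_DIR, sign, str(seq_id), f"{f}.npy"))
                     for f in range(SEQ_LEN)])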
C.MODULES
I. Using MediaPipe to Pre-process Images:
Uses Google's dataset for human body detection and MediaPipe for multi-modal experiences. Two models handle hand movement recognition and landmark detection, and the detected coordinates are normalized. The 21 hand landmarks lay the basis for LSTM-based sign language prediction; a minimal extraction sketch follows.
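A minimal sketch of the landmark-extraction step using the public mediapipe.solutions.hands interface; the feature layout (flattened x, y pairs for 21 landmarks, zeros when no hand is detected) is an assumption made for illustration.

import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_hand_keypoints(frame_bgr, hands):
    """Return a flat (42,) array of normalized x, y landmark coordinates, or zeros if no hand is found."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)
    if not results.multi_hand_landmarks:
        return np.zeros(21 * 2)
    landmarks = results.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y] for p in landmarks]).flatten()

# Example usage on a single webcam frame:
# with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
#     ok, frame = cv2.VideoCapture(0).read()
#     if ok:
#         features = extract_hand_keypoints(frame, hands)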
II. Normalization and Cleaning of Data:
Combines the hand landmarks and concentrates on the x and y coordinates, divides the dataset into training and validation portions, eliminates null entries, and normalizes the coordinates. A loss function and optimizer are then used to train the model, as outlined in the sketch below.
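A minimal sketch of this cleaning and splitting step; the 80/20 split ratio and the min-max normalization scheme are assumptions made for illustration.

import numpy as np
from sklearn.model_selection import train_test_split

def clean_and_normalize(X, y):
    """X: (n_samples, seq_len, n_features) keypoint sequences; y: (n_samples,) labels."""
    # Drop samples whose keypoints are all zero (no hand detected in any frame).
    keep = ~np.all(X == 0, axis=(1, 2))
    X, y = X[keep], y[keep]
    # Min-max normalize each feature dimension across the whole dataset.
    mins = X.min(axis=(0, 1), keepdims=True)
    maxs = X.max(axis=(0, 1), keepdims=True)
    return (X - mins) / (maxs - mins + 1e-8), y

def split(X, y, val_fraction=0.2, seed=42):
    """Hold out a validation portion for evaluating the trained model."""
    return train_test_split(X, y, test_size=val_fraction, random_state=seed)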
III. ML Algorithm Prediction:
Employs LSTM models and hand pose estimation to classify sign language words in real time; a sketch of the prediction loop follows the figure below.
Fig. 2. A set of landmarks superimposed on the hand in real time
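A minimal sketch of the real-time prediction loop, assuming a sliding window of the last 30 keypoint frames, a saved model file named asl_lstm.h5, and placeholder labels; extract_hand_keypoints is the helper from the pre-processing sketch above.

import cv2
import numpy as np
import mediapipe as mp
from collections import deque
from tensorflow.keras.models import load_model

SEQ_LEN = 30                                   # window length (assumed)
LABELS = ["hello", "thanks", "iloveyou"]       # placeholder sign labels
model = load_model("asl_lstm.h5")              # hypothetical saved model

window = deque(maxlen=SEQ_LEN)
cap = cv2.VideoCapture(0)
with mp.solutions.hands.Hands(max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # extract_hand_keypoints is defined in the pre-processing sketch above.
        window.append(extract_hand_keypoints(frame, hands))
        if len(window) == SEQ_LEN:
            probs = model.predict(np.expand_dims(np.array(window), 0), verbose=0)[0]
            cv2.putText(frame, LABELS[int(np.argmax(probs))], (10, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
        cv2.imshow("ASL recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()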
IV.CONSTRUCTION AND TRAINING OF LSTM
In crafting our proposed model, we leverage TensorFlow [23] and the Keras library [24] to import essential components. The Sequential model forms the backbone, facilitating the creation of a sequential neural network. LSTM and Dense layers play pivotal roles, with LSTM introducing the temporal component crucial for action detection. The incorporation of TensorBoard, an integral part of the TensorFlow package, enables real-time monitoring during neural network training. The initial steps involve establishing a log directory and configuring TensorBoard callbacks. Subsequently, the neural network architecture takes shape. Using the Sequential API, we stack three LSTM layers: a first layer of 64 units with return_sequences=True, an activation function, and a specified input shape, followed by an LSTM layer with 128 units and a subsequent LSTM layer with 64 units, each with pertinent specifications. The architecture further includes two Dense layers with 64 units and activation functions, and a concluding Dense layer for action extraction. This last layer employs three units and the softmax activation function, ensuring a probability distribution over the actions. After segregating the dataset into training and validation sets, the model's efficacy is assessed. To refine model accuracy, dropout layers are strategically introduced between the input and the first hidden layer. Adjustments to the learning rate, momentum, and dropout rates yield incremental enhancements. A sequence of dropout layers is applied to tackle overfitting concerns. With a dropout rate of 20% and weight constraints imposed on the hidden neurons, the final accuracy reaches 98.87%. This comprehensive strategy, featuring dropout regularization, effectively addresses overfitting in deep learning models and ensures resilient performance in both the training and testing phases. A minimal sketch of this architecture appears below.
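A minimal Keras sketch of the architecture described above: stacked LSTM layers of 64, 128, and 64 units, two Dense layers of 64 units, a softmax output over three actions, a 20% dropout layer between the input and the first hidden layer, and a TensorBoard callback. The input shape of 30 frames by 126 keypoint features and the choice of optimizer are assumptions made for illustration.

import os
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout
from tensorflow.keras.callbacks import TensorBoard

log_dir = os.path.join("Logs")                      # log directory for TensorBoard
tb_callback = TensorBoard(log_dir=log_dir)

model = Sequential([
    Input(shape=(30, 126)),                         # 30 frames x 126 keypoint features (assumed)
    Dropout(0.2),                                   # dropout between input and first hidden layer
    LSTM(64, return_sequences=True, activation="relu"),
    LSTM(128, return_sequences=True, activation="relu"),
    LSTM(64, return_sequences=False, activation="relu"),
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dense(3, activation="softmax"),                 # probability distribution over three actions
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])

# Training call (X_train, y_train one-hot encoded; the epoch count is illustrative):
# model.fit(X_train, y_train, epochs=2000, callbacks=[tb_callback])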
Fig. 3. The structure of a Long Short-Term Memory block
V. REVIEW OF EVALUATION METRICS
Our LSTM model undergoes a thorough evaluation process involving the manipulation of 3D datasets using NumPy's reshape function. The model architecture comprises two LSTM layers with 256 units each, followed by a Dense layer. Throughout training, essential parameters such as training accuracy and training loss are meticulously monitored. After training, the model parameters are reported and the model is saved to a file. Visual representations, including epoch-accuracy and epoch-loss plots, provide insight into the model's learning trajectory. To ensure a comprehensive evaluation, we use metrics such as a multi-label confusion matrix, which breaks down the positive and negative predictions for each label. Analyzing the confusion matrix sheds light on the distribution of correct and incorrect predictions. Following an extensive training regimen spanning approximately 2000 epochs and testing with two users, the model achieves an impressive categorical accuracy of 99.35%. Examination of the epoch loss, as illustrated in Figure 4, suggests the potential for further improvement with additional training epochs. The accompanying figure visually captures the feature extraction process and the predicted results from the model. In summary, our model demonstrates remarkable performance with high accuracy, providing profound insights through diverse evaluation metrics. The possibility of further training presents promising opportunities to enhance accuracy and reduce loss, reinforcing the model's efficacy in real-time sign language recognition. A minimal evaluation sketch is given below.
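A minimal evaluation sketch using scikit-learn, comparing predictions on the held-out set against ground truth with a multi-label confusion matrix and an accuracy score; variable names follow the earlier sketches, and y_test is assumed to be one-hot encoded.

import numpy as np
from sklearn.metrics import multilabel_confusion_matrix, accuracy_score

def evaluate(model, X_test, y_test):
    """X_test: (n, seq_len, n_features); y_test: one-hot labels of shape (n, n_classes)."""
    probs = model.predict(X_test, verbose=0)
    y_pred = np.argmax(probs, axis=1)
    y_true = np.argmax(y_test, axis=1)
    # One 2x2 matrix per label: [[TN, FP], [FN, TP]].
    cm = multilabel_confusion_matrix(y_true, y_pred)
    return cm, accuracy_score(y_true, y_pred)

# cm, acc = evaluate(model, X_test, y_test)
# print(f"categorical accuracy: {acc:.4f}")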

Fig. 4. Accuracy
Fig. 5. Loss

Fig. 6. Result of sign language recognition


VI.CONCLUSION
In conclusion, our novel approach to real-time sign language recognition stands at the forefront of
technological advancements in this domain. By leveraging real-time video frames, we have not only
streamlined the recognition process but also achieved notable improvements in data efficiency,
detection speed, and model training duration. The LSTM model, central to our methodology,
exhibited an outstanding accuracy of 99.35%, surpassing benchmarks set by related works. A
significant breakthrough lies in our strategic handling of challenges associated with signer
backgrounds and lighting variations. The implementation of a minimalistic background proved
instrumental in enhancing sign capturing, laying the foundation for testing with three distinct signs.
Our commitment extends beyond conventional recognition, aiming for real-time video-based sign
language identification that adapts to diverse environmental factors and remains independent of
signer-specific attributes. Furthermore, our segmentation enhancements, employing gradient
masking and threshold optimization, contribute to the precision of object recognition. The utilization
of binary gradient masking, with a focus on vertical structuring elements, enhances segmentation
robustness. This work not only represents a technological milestone in the field but also holds
promise for practical applications in dynamic, real-world scenarios. The combination of real-time
efficiency, accuracy, and adaptability positions our model as a significant contribution to the ongoing
evolution of sign language recognition systems. As we continue refining and expanding this
methodology, the potential for its positive impact on accessibility and communication for individuals
using sign language is substantial.
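A minimal OpenCV sketch of the binary gradient masking idea mentioned above, built from edge magnitudes and dilated with a vertical structuring element; the kernel sizes and the gradient threshold are illustrative choices, not values reported in this paper.

import cv2
import numpy as np

def binary_gradient_mask(gray, grad_thresh=30):
    """gray: single-channel uint8 image; returns a binary segmentation mask."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    _, mask = cv2.threshold(magnitude, grad_thresh, 255, cv2.THRESH_BINARY)
    # A vertical structuring element emphasises continuity along vertical edges.
    vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 9))
    mask = cv2.dilate(mask, vertical_kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    return mask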
REFERENCES
[1] S. Sharma, K. Kumar, and N. Singh, “Deep eigen space based ASL recognition system,” IETE
Journal of Research, vol. 68, no. 5, pp. 3798–3808, Sep. 2022, doi:
10.1080/03772063.2020.1780164.
[2] W. Liu, Y. Fan, and Z. ZhongZhang, “RGBD video based human hand trajectory tracking and
gesture recognition system,” Mathematical Problems in Engineering, vol. 2015, 2014, doi:
10.1155/2015/863732.
[3] D. Uebersax, J. Gall, M. Van den Bergh, and L. Van Gool, “Real-time sign language letter and
word recognition from depth data,” in 2011 IEEE International Conference on Computer Vision
Workshops (ICCV Workshops), Nov. 2011, pp. 383–390, doi: 10.1109/ICCVW.2011.6130267.
[4] R. Sharma, R. Khapra, and N. Dahiya, “Sign language gesture recognition,” International
Journal of Recent Research Aspects, vol. 7, no. 2, pp. 14–19, 2020.
[5] A. Halder and A. Tayade, “Real-time vernacular sign language recognition using mediapipe and
machine learning,” International Journal of Research Publication and Reviews, no. 2, pp. 9–17, 2021.
[6] M. Taskiran, M. Killioglu, and N. Kahraman, “A real-time system for recognition of american
sign language by using deep learning,” in 2018 41st International Conference on
Telecommunications and Signal Processing (TSP), Jul. 2018, pp. 1–5, doi:
10.1109/TSP.2018.8441304.
[7] Y. Bin, Z.-M. Chen, X.-S. Wei, X. Chen, C. Gao, and N. Sang, “Structure-aware human pose
estimation with graph convolutional networks,” Pattern Recognition, vol. 106, p. 107410, Oct. 2020,
doi: 10.1016/j.patcog.2020.107410.
[8] B. M. Wilson, “Evaluating and Improving the SEU reliability of artificial neural networks
implemented in SRAM-based FPGAs with TMR,” Thesis, Brigham Young University, 2020.
[9] U. von Agris, J. Zieren, U. Canzler, B. Bauer, and K.-F. Kraiss, “Recent developments in visual
sign language recognition,” Universal Access in the Information Society, vol. 6, no. 4, pp. 323–362,
Feb. 2008, doi: 10.1007/s10209-007-0104-x.
[10] P. Das, T. Ahmed, and M. F. Ali, “Static hand gesture recognition for american sign language
using deep convolutional neural network,” in 2020 IEEE Region 10 Symposium (TENSYMP), 2020,
pp. 1762–1765, doi: 10.1109/TENSYMP50017.2020.9230772.
[11] R. Cristin, K. Suresh Kumar, and Anbhazhagan, "Severity level classification of brain tumor based on MRI images using fractional-chicken swarm optimization algorithm," The Computer Journal, 2021.
[12] J. G. R. Elwin, K. S. Kumar, J. P. Ananth, and R. R. Kumar, "Entropy weighted and kernalized power K-means clustering based lesion segmentation and optimized deep learning for diabetic retinopathy detection," International Journal on Artificial Intelligence Tools, vol. 32, no. 1, p. 2250044, 2023.

[13] R. N. Karthika, C. Valliyammai and M. Naveena, "Phish block: a blockchain framework for
phish detection in cloud," Computer Systems Science and Engineering, vol. 44, no.1, pp. 777–795,
2023.
[14] N. Velmurugan, C. S, G. V and K. S, "Thumbs-Up: A Sanction Probe Software using Machine Learning,"
2022 6th International Conference on Electronics, Communication and Aerospace Technology, Coimbatore,
India, 2022, pp. 1-5, doi: 10.1109/ICECA55336.2022.10009075.
