SignScribe

Project submitted by

Aratrika Debnath
Bijaya Roy
Nirakshi Kundu
Titli Ghosh

September, 2024
Declaration
We hereby declare that this dissertation is the product of our own work,
and we attest that it contains no material that resulted from collaboration,
except where explicitly acknowledged in the text. Furthermore, we confirm
that this project has not been previously submitted, either in part or in its
entirety, to any other University or Institution for the purpose of obtaining
any degree, diploma, or other qualification. All sources used and referenced
in this dissertation are duly credited, and any borrowed ideas or information
are appropriately cited in accordance with academic standards and guidelines.
...........................................
Date: 22-11-2024
Place: SNU, WB
(Aratrika Debnath, Bijaya Roy, Nirakshi Kundu, Titli Ghosh)
Registration Numbers: 210010973991, 210011011594, 210011144913, 210011305972
...............................................
(Dr. Pratik Bhattacharjee)
Assistant Professor
Dept. of Computer Science
Sister Nivedita University
Date: 22-11-2024
Place: SNU, WB
Contents

List of Figures
Acknowledgement
Abstract

1 Introduction
  1.1 Organization of chapters
    1.1.1 Chapter 2: Literature Survey
    1.1.2 Chapter 3: Feasibility study
    1.1.3 Chapter 4: Problem identification
    1.1.4 Chapter 5: Proposed Solution
    1.1.5 Chapter 6: Experimental Results and Discussion
    1.1.6 Chapter 7: Conclusion
    1.1.7 Chapter 8: Future scope

2 Literature Survey
  2.1 Existing Research on CNNs for Feature Extraction
  2.2 Sequence Modeling with RNNs and LSTMs for Gesture Recognition
  2.3 Large-Scale Datasets for Indian Sign Language
  2.4 Existing Challenges and Limitations
  2.5 Proposed Solutions and Unique Contributions

3 Feasibility study

4 Problem identification

5 Proposed Solution
  5.1 Proposed Solution and Methodology
  5.2 Proposed System Features
  5.3 Assumptions
  5.4 System Architecture Diagrams
    5.4.1 ER Diagram
    5.4.2 Gantt Chart
    5.4.3 Data Flow Diagram
    5.4.4 Flowchart
  5.5 Algorithm
  5.6 Summary

6 Conclusion
List of Figures

5.1 ER diagram
5.2 Gantt Chart
5.3 DFD Level 0
5.4 DFD Level 1
5.5 Flowchart
By using advanced computer vision and machine learning models, the system will be capable
of interpreting a wide range of sign language gestures, turning them into text that can be easily
understood by non-signers. This real-time capability will be particularly beneficial in situations
where quick and accurate communication is essential, such as emergency scenarios or day-to-day interactions in public spaces.
Beyond simple communication, the system can be integrated with other technologies to act
as an automatic interpreter, offering real-time translation between sign language and spoken
language. This would make it easier for individuals who are deaf or hard of hearing to engage
in conversations with people who do not know sign language. Additionally, it could function
as a potential SOS tool, enabling those with hearing impairments to quickly convey distress
signals or request help in urgent situations.
By enhancing communication between the hearing and non-hearing population, this system
will contribute to greater inclusivity, accessibility, and empowerment for individuals who rely
on sign language as their primary means of communication. The project has the potential to
improve quality of life for millions of people, reducing social isolation and promoting equal
participation in society.
The chapter will explain the use of CNNs for feature extraction and LSTMs for handling sequential data. Additionally, it will discuss the integration of transfer learning to support variations in sign languages, especially Indian Sign Language (ISL). The solution's potential for plug-and-play compatibility and use as an SOS tool will also be highlighted.
Recurrent Neural Networks (RNNs), particularly LSTMs, are effective in handling sequential data by processing gestures in a time-series format, capturing meaning across multiple frames [1]. Multi-language SLR systems utilizing LSTMs have achieved significant accuracy, though challenges like real-time responsiveness and regional adaptability persist [7].
The Hybrid CNN-BiLSTM SLR model addresses preprocessing challenges by employing
Histogram Difference (HD) for key-frame extraction, combining VGG-19 for spatial feature
extraction with BiLSTM for temporal analysis, achieving an accuracy of 87.67 percent [8].
Another real-time ISL recognition system uses skin segmentation for ROI tracking, paired with
fuzzy c-means clustering for gesture prediction [9]. A pretrained CNN coupled with LSTMs has shown improved spatio-temporal recognition performance [10], while an ISLR
system using Mediapipe Holistic landmarks and LSTM variants highlighted the strengths of
simple and bidirectional LSTMs over stacked models for ISL gesture recognition [11].
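To make the CNN + (Bi)LSTM pattern recurring in these works concrete, a rough Keras sketch of such an architecture is given below. It is not a reproduction of any cited model; the layer sizes, the 30-frame clip length, the 64x64 frame resolution, and the 26-class output are assumptions chosen purely for illustration.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26          # assumed: one class per alphabet gesture
FRAMES, H, W = 30, 64, 64 # assumed clip length and frame resolution

# Per-frame CNN that extracts spatial features from a single frame
cnn = models.Sequential([
    layers.Input(shape=(H, W, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
])

# BiLSTM over the sequence of per-frame feature vectors captures temporal context
inputs = tf.keras.Input(shape=(FRAMES, H, W, 3))
x = layers.TimeDistributed(cnn)(inputs)          # apply the CNN to every frame
x = layers.Bidirectional(layers.LSTM(128))(x)    # model forward and backward dynamics
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])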
A real-time system designed to recognize numbers 0-9 using LSTMs achieved 91.50 percent
accuracy on a dataset of 200,000 samples and is slated for expansion to daily-life gestures [12].
Another work compared CNN and RNN models, with RNNs excelling in temporal dynamics
and CNNs performing well on spatial aspects, emphasizing the importance of combining both
approaches for robust ISL recognition [13].
• Plug-and-Play Compatibility: SignScribe will provide an out-of-the-box solution that can be quickly deployed without extensive configuration.
• Emergency SOS Functionality: In critical situations, SignScribe could serve as an SOS tool by integrating with public CCTV systems monitored by nearby police stations, offering an extra layer of security.
• Cross-Platform Accessibility: The system will be accessible through a variety of media, including mobile apps, smart devices, and web interfaces, encouraging widespread adoption and daily use.
• Interoperability for Wider Adoption: Through design choices that prioritize interoperability, SignScribe aims to foster a more inclusive adoption of SLR technology across different contexts and environments, as shown in Table 2.1.
– An SLR system could assist deaf individuals in accessing public services, whether
at government offices, healthcare facilities, or service counters, enhancing their independence and engagement with society.
– By translating sign language in real-time, this system could also help individuals
more fully participate in social activities, reducing feelings of isolation and fostering
a sense of community.
• Advancements through Machine Learning and AI:
– Using Machine Learning (ML) techniques, including computer vision and natural
language processing (NLP), the system can be designed for high accuracy and adaptability.
– By leveraging deep learning models, this system can recognize various languages,
dialects, and regional variations of sign language, making it widely applicable and
versatile.
– This technology also offers real-time recognition capabilities, crucial for spontaneous conversations and ensuring that interactions flow naturally.
• Application in Diverse Settings:
– Beyond face-to-face interactions, the SLR system could integrate with virtual communication platforms, enabling smooth interaction in remote work settings, online
learning environments, and virtual conferences.
– The system could also be utilized in customer service kiosks, mobile apps, video
conferencing tools, and public spaces, enabling communication across various channels and situations.
• Contributing to Social Equality and Inclusion:
– Developing an SLR system addresses a critical societal need for inclusivity, equality,
and accessibility for the deaf and hard-of-hearing communities.
– This technology stands as a step toward a more equitable society where everyone
has the tools to communicate and engage in everyday life freely and independently,
empowering both deaf and hearing individuals to participate in a more inclusive
world.
5.3 Assumptions
• The target user base is primarily those requiring Indian Sign Language (ISL) support.
• Users will have access to devices with cameras (such as smartphones or webcams) to
capture gesture input.
• Sufficient computational resources, like access to cloud-based GPUs, are available for
model training and testing.
DFD Level 1: The Level 1 DFD provides a detailed breakdown of the main process into
its key sub-processes. It shows how Sign Language Data is first preprocessed to prepare it for
further analysis, such as feature extraction. The extracted features are used to train the model,
which is validated to ensure accuracy and reliability. The validated model is then deployed,
and the system outputs the Recognized Gesture. Figure 5.4 helps in understanding the
sequential flow of data and the key components involved in the system’s operation.
5.4.4 Flowchart
The flowchart visually represents the entire process sequence of the Sign Language Recognition (SLR) system, from the initial step of capturing hand gestures to the final output of
translated text. The first stage involves gesture capture, where the system uses a camera or
sensor to record the user’s sign language gesture. This raw gesture data is then passed on to
the preprocessing stage, where techniques such as image resizing, normalization, and possibly
augmentation are applied to prepare the data for accurate model inference. After preprocessing, the system feeds the processed data into the trained model, which performs inference
using its learned weights to classify the gesture. The model’s output is then translated into
text that corresponds to the recognized gesture. Finally, this translated text is displayed on the
screen, allowing the user to see the real-time interpretation of their sign language gesture. The
flowchart, as shown in Figure 5.5, provides a clear and detailed illustration of these sequential
steps, highlighting how each operation contributes to the overall functionality of the system.
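As an illustration of the capture and preprocessing stages described above, the following is a minimal sketch assuming OpenCV for camera access; the 64x64 target size and the [0, 1] normalization are illustrative assumptions rather than fixed design choices.

import cv2
import numpy as np

def capture_and_preprocess(target_size=(64, 64)):
    """Grab one frame from the default camera and prepare it for the model."""
    cap = cv2.VideoCapture(0)                 # gesture capture via camera
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("Could not read a frame from the camera")
    frame = cv2.resize(frame, target_size)    # resize
    frame = frame.astype(np.float32) / 255.0  # normalize pixel values to [0, 1]
    return np.expand_dims(frame, axis=0)      # add a batch dimension for inference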
5.5 Algorithm
The algorithm below outlines the general workflow of the SignScribe system; a minimal code sketch of these steps follows the list:
• Step 1: Input Capture - Initialize the camera and capture gesture frames in real-time.
• Step 2: Preprocessing -
– Normalize and resize frames.
– Apply data augmentation techniques as needed.
• Step 3: Feature Extraction - Feed each frame into a CNN to extract spatial features.
• Step 4: Sequence Processing - Pass the sequence of extracted features through the LSTM
to interpret temporal dependencies.
• Step 5: Translation - Generate a text output for the recognized gesture sequence.
• Step 6: Display and Integration -
– Display the text output to the user.
– Trigger SOS functionality if a predefined emergency gesture is detected.
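The following is a minimal runtime sketch of Steps 1-6. The model file name, the 30-frame gesture window, the 64x64 frame size, and a label set of 26 alphabet gestures plus an "SOS" gesture are assumptions made purely for illustration.

import cv2
import numpy as np
import tensorflow as tf

# Assumed artifacts: a trained CNN+LSTM model saved as "signscribe_cnn_lstm.h5"
# and a label set of 26 alphabet gestures plus an "SOS" emergency gesture.
model = tf.keras.models.load_model("signscribe_cnn_lstm.h5")
LABELS = [chr(ord("A") + i) for i in range(26)] + ["SOS"]
WINDOW = 30                                   # assumed frames per gesture clip

def preprocess(frame):
    return cv2.resize(frame, (64, 64)).astype(np.float32) / 255.0

cap = cv2.VideoCapture(0)                     # Step 1: input capture
buffer = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    buffer.append(preprocess(frame))          # Step 2: preprocessing
    if len(buffer) == WINDOW:
        clip = np.expand_dims(np.stack(buffer), axis=0)  # shape (1, WINDOW, 64, 64, 3)
        probs = model.predict(clip, verbose=0)[0]        # Steps 3-4: CNN features + LSTM
        text = LABELS[int(np.argmax(probs))]             # Step 5: translate to text
        print("Recognized:", text)                       # Step 6: display
        if text == "SOS":
            print("Emergency gesture detected - trigger SOS workflow")
        buffer.clear()
    cv2.imshow("SignScribe", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):     # press q to quit
        break
cap.release()
cv2.destroyAllWindows()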
5.6 Summary
This proposed solution aims to create an accessible, reliable, and scalable system for real-time sign language translation. By using state-of-the-art deep learning techniques and focusing
on interoperability, the system will enable the hearing-impaired community to communicate
effectively and enhance their access to emergency services when needed.
• Data Collection:
– A comprehensive dataset of sign language gestures has been gathered to support
the development and validation of the recognition model. This dataset serves as
the foundation for accurate and reliable gesture classification, covering various sign
language alphabets.
• Model Training:
– A Convolutional Neural Network (CNN) has been trained using TensorFlow and
Keras to extract and analyze features from sign language gestures. By focusing initially on recognizing individual alphabets, the model establishes a foundational step
in understanding hand shapes and orientations essential to accurate classification.
• Model Development:
– A functional model has been successfully built, marking a significant milestone in
the project. The model’s architecture has been developed using TensorFlow and
Keras, two robust frameworks that provide flexibility and power for deep learning
applications. This model is capable of recognizing individual sign language gestures
based on the input images, with an emphasis on accuracy and real-time performance.
The model's architecture is stored in a .json file, while the weights and trained parameters are saved in an .h5 file. This setup ensures the model can be easily reloaded for further testing or fine-tuning, offering scalability for future enhancements (a minimal sketch of this save-and-reload setup is given below). The successful development of the model signifies a crucial first step towards the goal of building a fully functional Sign Language Recognition system.
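The sketch below illustrates the .json + .h5 save-and-reload setup described above; the file names and the small placeholder network are illustrative assumptions, and depending on the Keras version the weights file may need a .weights.h5 suffix.

import tensorflow as tf
from tensorflow.keras import layers, models

# Placeholder CNN standing in for the trained alphabet-recognition model
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(26, activation="softmax"),   # assumed: one class per alphabet
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Save the architecture to .json and the weights to .h5
with open("model.json", "w") as f:
    f.write(model.to_json())
model.save_weights("model.h5")

# Reload later for further testing or fine-tuning
with open("model.json") as f:
    reloaded = tf.keras.models.model_from_json(f.read())
reloaded.load_weights("model.h5")
reloaded.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])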
These advancements demonstrate the feasibility of the project and set the stage for further
refinement and expansion, including gesture sequence recognition and full integration
with translation features.