Handwriting to Text Conversion: A Robust Approach Using KNIME and AWS Rekognition

Shreeya Kosireddy
Department of Electronics and Communication Engineering,
National Institute of Technology, Warangal,
Warangal, Telangana, India
Email: sk23ecb0f24@student.nitw.ac.in

Ravi Kishore Kodali
Department of Electronics and Communication Engineering,
National Institute of Technology, Warangal,
Warangal, Telangana, India
Email: kishore@nitw.ac.in

Abstract—The digitization of handwritten documents plays a crucial role in modern document management, from educational assessments to archival preservation. This paper presents a solution for handwriting-to-text conversion, aimed at automating time-consuming processes such as examination paper grading and document analysis. The system, implemented on the KNIME analytics platform, integrates with AWS Rekognition, a cloud-based Optical Character Recognition (OCR) service, to extract text from images of handwritten documents stored in an S3 bucket. A custom Python script, utilizing the boto3 library, automates interaction with AWS Rekognition, ensuring efficient text extraction by filtering out low-confidence outputs. KNIME handles preprocessing tasks such as path manipulation and data integration, while join nodes combine the extracted text with relevant metadata, producing a comprehensive, structured output.

The proposed system reduces manual effort, minimizes errors, and saves time, demonstrating strong accuracy in processing diverse handwriting styles. Key applications include digitizing archival records, automating educational assessments, and improving administrative document handling. Future enhancements will focus on improving OCR accuracy for complex handwriting, expanding multilingual support, and integrating advanced analytics. By leveraging cloud-based technologies, this scalable solution highlights the potential of modern tools in transforming document analysis workflows.

I. INTRODUCTION

Handwritten documents are integral to many fields, including education, administration, and historical record-keeping. Despite the growing adoption of digital tools, many processes continue to rely on handwritten records, such as examination grading, administrative file management, and preserving historical documents. These processes are often manual and require significant time and effort, leading to errors and inefficiencies, particularly when dealing with large volumes of data.

Existing OCR technologies primarily excel at recognizing printed text, but they face significant challenges when applied to handwritten content. Factors like varying handwriting styles, image quality, and inconsistent spacing between characters often result in incomplete or inaccurate text extraction. While recent advancements in OCR technology have improved accuracy, there is still a gap in handling complex handwritten documents efficiently and reliably.

This paper proposes a scalable system for handwriting-to-text conversion that integrates AWS Rekognition, a cloud-based OCR tool, with the KNIME analytics platform. AWS Rekognition extracts text from handwritten document images stored in an S3 bucket, while KNIME processes and organizes the data into a structured format. A Python script is used to filter out low-confidence OCR results, ensuring high-quality outputs. This combination of tools addresses the challenges of handwriting digitization and provides a reliable, automated solution.

The proposed system is designed to reduce manual effort and improve efficiency in document processing. Its applications span across various domains, such as digitizing archives, automating grading systems, and streamlining administrative workflows. The modular nature of the workflow allows for easy adaptation and future enhancements, such as support for multiple languages and advanced data analytics.

By leveraging the strengths of cloud-based OCR and data analytics platforms, this solution demonstrates the potential to transform handwritten document management. It not only addresses current limitations but also opens new possibilities for automating and improving document workflows in diverse fields.

II. RELATED WORK

Handwriting recognition has been a challenging area in document digitization due to the variability in handwriting styles and image quality. Traditional OCR systems, like Tesseract, were designed for printed text and struggled with handwriting. These systems relied on rule-based approaches and feature extraction techniques that limited their adaptability to diverse handwriting.

With the emergence of machine learning, neural networks brought significant advancements in OCR accuracy. Connectionist Temporal Classification (CTC) and convolutional neural networks enabled models to handle the spatial dependencies in handwriting. However, these methods required large datasets and computational resources, making them less accessible for small-scale applications.

Cloud-based OCR services, such as AWS Rekognition and Google Cloud Vision, addressed scalability and accessibility
challenges by offering pre-trained models capable of recognizing complex handwriting. However, low-confidence results and challenges with non-standard scripts remained significant obstacles. Recent efforts have focused on combining cloud-based OCR with data analytics platforms, enabling preprocessing and result refinement.

Our system builds on these advancements by integrating AWS Rekognition with KNIME, offering a flexible and scalable workflow. This combination improves accuracy by filtering low-confidence outputs and ensures structured data for downstream applications.

III. METHODOLOGY

A. Introduction to Workflow

This project outlines a process to transform handwritten text into structured digital text by combining AWS Rekognition, a cloud-based Optical Character Recognition (OCR) tool, with KNIME, a platform for data preprocessing, analysis, and visualization. The modular workflow allows customization at each step to meet specific requirements.

As shown in Figure 1, the process consists of several stages: dataset loading, OCR processing, filtering outputs by confidence levels, text conversion into numerical form, and result generation. Each stage ensures the system’s accuracy and reliability.

The modularity of the workflow allows for easy substitution of components. For instance, AWS Rekognition can be replaced with a more advanced OCR tool, or additional preprocessing steps can be incorporated based on the dataset. This flexibility makes the system adaptable and scalable for various applications.

Fig. 1. Complete Workflow of the OCR System in KNIME

B. Accessing the Dataset and Preprocessing

The input consists of images of handwritten text stored in an Amazon S3 bucket. This step involves accessing the dataset, performing basic preprocessing, and removing unsuitable files such as corrupted or empty ones.

Metadata is crucial for tracking and indexing the dataset. Each image is assigned a unique identifier along with metadata, including upload date, source, and document type. This structure improves data management and traceability, making downstream processing more efficient. KNIME nodes are used to automate this preprocessing, ensuring that only clean, structured data is passed to the OCR stage, reducing the potential for errors.

Fig. 2. Accessing the Dataset and Text Preprocessing Workflow

C. Optical Character Recognition (OCR)

The core of the workflow is OCR, where handwritten images are analyzed, and text is extracted using AWS Rekognition. Each detected word or line is assigned a confidence score. A Python script in KNIME filters out text with a confidence score below 50%.

This threshold strikes a balance between retaining useful data and avoiding low-quality results. AWS Rekognition’s parallel processing capabilities make it efficient for large datasets, reducing processing time. The extracted text is then formatted into a structured output for the next stage of the workflow.

Fig. 3. OCR Engine Workflow in KNIME
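
The confidence-based filtering described for the OCR stage can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the function names, the choice to keep only `LINE` detections, and the sample response are assumptions made for the example. The response fields (`TextDetections`, `DetectedText`, `Type`, `Confidence`) follow the shape returned by the Rekognition `DetectText` operation, and the 50% cutoff is the one stated in the text.

```python
from typing import Any

CONFIDENCE_THRESHOLD = 50.0  # percent; the cutoff stated in the paper


def filter_detections(response: dict[str, Any],
                      min_confidence: float = CONFIDENCE_THRESHOLD,
                      detection_type: str = "LINE") -> list[dict[str, Any]]:
    """Keep detections of one type whose confidence meets the threshold."""
    kept = []
    for item in response.get("TextDetections", []):
        if item.get("Type") != detection_type:
            continue  # skip WORD entries when filtering LINE entries, and vice versa
        if item.get("Confidence", 0.0) < min_confidence:
            continue  # drop low-confidence text
        kept.append({"text": item["DetectedText"],
                     "confidence": round(item["Confidence"], 2)})
    return kept


def detect_text_in_s3_image(bucket: str, key: str) -> dict[str, Any]:
    """Call Rekognition DetectText on one S3 object (requires AWS credentials)."""
    import boto3  # imported lazily so the filter above works without AWS access

    client = boto3.client("rekognition")
    return client.detect_text(Image={"S3Object": {"Bucket": bucket, "Name": key}})


# Hand-written stand-in for a DetectText response, for illustration only.
sample_response = {
    "TextDetections": [
        {"DetectedText": "Dear Sir", "Type": "LINE", "Confidence": 93.4},
        {"DetectedText": "Dear", "Type": "WORD", "Confidence": 95.1},
        {"DetectedText": "xq~", "Type": "LINE", "Confidence": 31.7},
    ]
}
high_confidence_lines = filter_detections(sample_response)
```

In a real run, the output of `detect_text_in_s3_image` would be passed to `filter_detections` for each image in the bucket. Keeping the filter separate from the network call lets the threshold be checked or tuned without touching AWS.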
D. Text Vectorization

After text extraction, the next step is text vectorization, where the text is transformed into a numerical format suitable for analysis or machine learning models. Techniques like one-hot encoding or word embeddings are used, depending on the application.

Vectorization prepares the data for advanced tasks such as clustering, summarization, or sentiment analysis. By converting text into numerical vectors, the system leverages mathematical models to gain insights and make predictions.

Fig. 4. Text Vectorization Workflow in KNIME

E. High Confidence Table Visualization

To ensure OCR result quality, KNIME’s Table View node displays detected text alongside its confidence score, categorizing it into “High Confidence” (50% [...]

The table also serves as a diagnostic tool during testing and development, providing immediate feedback on how the system handles different input types and facilitating iterative improvements.

F. Final Results

The final output pairs the original handwritten images with the extracted text, showcasing the system’s ability to accurately digitize handwritten documents. The modular design allows the system to handle a range of use cases, including digitizing archives, processing exam papers, or analyzing handwritten records.

The results can be further enhanced by incorporating Natural Language Processing (NLP) tools for tasks like summarization or grammatical correction. Machine learning can also be added to provide additional insights, such as sentiment analysis or topic modeling, directly from the handwritten data.

Fig. 6. Output: Handwritten Images and Detected Text (Second View)
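
The text vectorization step (Section III-D) mentions one-hot encoding as one option. A minimal sketch of that idea in plain Python is given below, assuming a simple lowercase alphanumeric tokenizer and a vocabulary built from the extracted lines themselves. The function names and sample sentences are hypothetical, and summing one-hot vectors into a count vector (a bag-of-words representation) is a choice made for this example rather than something the paper specifies.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents: list[str]) -> dict[str, int]:
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def one_hot(token: str, vocabulary: dict[str, int]) -> list[int]:
    """Return a vector with a single 1 at the token's index (all zeros if unknown)."""
    vector = [0] * len(vocabulary)
    if token in vocabulary:
        vector[vocabulary[token]] = 1
    return vector


def bag_of_words(text: str, vocabulary: dict[str, int]) -> list[int]:
    """Sum the one-hot vectors of a document's tokens into one count vector."""
    counts = [0] * len(vocabulary)
    for tok in tokenize(text):
        if tok in vocabulary:
            counts[vocabulary[tok]] += 1
    return counts


# Illustrative stand-ins for lines of text returned by the OCR stage.
extracted_lines = ["The exam is on Monday", "The exam hall is open"]
vocab = build_vocabulary(extracted_lines)
vectors = [bag_of_words(line, vocab) for line in extracted_lines]
```

Each extracted line becomes a fixed-length numeric vector over a shared vocabulary, which is the form that clustering or classification models expect. Word embeddings, the other option named in the text, would replace these sparse counts with dense learned vectors.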
Fig. 5. High Confidence Table in KNIME

IV. CONCLUSION

This project presents an efficient and scalable solution for converting handwritten text into structured digital format using AWS Rekognition and KNIME. The system’s modular workflow, which includes preprocessing, OCR extraction, text vectorization, and result generation, ensures accurate and efficient data processing. By leveraging AWS Rekognition’s powerful OCR capabilities and KNIME’s robust data processing environment, the system can handle large volumes of handwritten data, making it suitable for various applications, such as digitizing historical documents, automating exam paper corrections, and analyzing handwritten records.

The system’s flexibility allows easy adjustments to components like OCR tools and preprocessing methods, making it adaptable to diverse use cases. The final results demonstrate the system’s ability to convert handwritten text into digital format, ready for further analysis or integration into other applications. Confidence-based filtering further enhances OCR accuracy by ensuring only high-confidence results are considered for analysis.

Future work could expand the system by integrating advanced NLP techniques for tasks like automatic summarization, grammatical correction, or sentiment analysis. Additionally, machine learning models could be incorporated to further improve OCR accuracy, particularly for challenging handwriting styles. Overall, this project lays a robust foundation for building intelligent systems capable of processing handwritten text for a wide range of practical applications.
V. ACKNOWLEDGMENT
I would like to thank the Department of Electronics and
Communication Engineering, NIT Warangal, for providing
resources and support for this project.
