Machine Learning-Based Grammar Correction System
Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 00 (2024) 000–000
www.elsevier.com/locate/procedia
Abstract
In the context of global integration, the role of business English is becoming increasingly prominent worldwide. In order to help
business English students improve the accuracy of written expressions, this research applied deep learning methods to design and
implement a grammar correction system suitable for business English students. Using large-scale business English texts as corpus
and natural language processing methods, the features of grammar errors were extracted and combined with convolutional neural
networks and recurrent neural networks to achieve error detection and correction. The automatic correction statistics showed that
the system performed excellently in correcting spelling errors, with 180 successful corrections. Grammar errors were successfully
corrected 120 times, although some complex situations still require manual intervention. The experiment showed that the method
used in this research can effectively identify various grammar errors, especially for complex sentences, business-professionalized
English expressions, etc. The findings of this research provide an effective solution for automatic error correction in business
English, which has a positive impact on improving the efficiency of business English students’ language skills.
© 2024 The Authors. Published by ELSEVIER B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 11th International Conference on Applications and Techniques
in Cyber Intelligence
Keywords: Machine Learning, Business English Grammar Correction System, Grammar Errors, Automatic Correction Statistics
1. Introduction
In the era of economic globalization, business English has gradually become an important academic discipline.
However, for business English students, the proper use of business English for effective communication, especially
with correct grammar, is a major challenge. To tackle this issue, machine-learning-based grammar correction
methods have drawn growing attention. This study intends to use advanced deep learning models such as
convolutional neural networks and recurrent neural networks to design and develop a system suitable for correcting
grammar errors made by business English students. The purpose of this system is to improve grammatical accuracy
in discourse, especially in the specific field of business English. A large number of studies have shown that machine
learning has obvious advantages in language processing, grammar correction, and other fields. This study helps to
expand the application of this field, improve the written expression level of non-native speakers, and thus improve
the learning outcomes of business English students.
This article first provides an overview of common grammar error correction systems for international business
English and analyzes several commonly used machine-learning-based grammar error recognition algorithms. It then
describes the deep learning models and algorithms used, including data preprocessing, feature extraction, model
training, and system performance evaluation. The results indicate that the method adopted in this study can
effectively correct various grammar errors, especially in complex business texts. In addition, this research explores
the application prospects of the system in education, business, and other fields.
The article is organized as follows. It begins by discussing the research background and existing findings related
to this topic, highlighting the significance and potential applications of machine learning technology in grammar
error correction. It then elaborates on the data collection process, model design, experimental setup, and evaluation
metrics employed. The research results are analyzed, and potential future research avenues are suggested. By
examining these aspects, the article aims to offer a thorough exploration of the grammar correction system in
business English.
2. Related Work
The study of error correction mechanisms in business English is of great significance for improving the
language accuracy and professional skills of business English learners in the process of business communication.
Lan Keran explored the application of cognitive contextual clue chain model in business English reading [1]. Fu
Weili conducted research and practice on cross-border teaching of Chinese language and business English in
vocational schools [2]. Xu Min studied the modular teaching of business English majors in vocational colleges
based on a blended learning model [3]. Yang Tian explored the reform plan of business English talent training mode
for application-oriented undergraduate colleges in the context of the "Belt and Road Initiative" [4]. Qin Ying
conducted research on the construction and practice of a smart classroom ecosystem for business English in
universities under the background of informatization [5]. However, existing research mostly focuses on error
correction in general English, lacking exploration of specific situations and vocabulary, especially for error
correction in business-specific English. In addition, many systems lack the ability to handle complex sentences or
terms when dealing with business documents.
Therefore, studying machine-learning-based grammar correction methods for business English students has
significant theoretical and practical value in improving the efficiency and accuracy of business-specific
communication. Chen G studied a method for detecting verb form errors in the written English of Chinese learners
based on Link Grammar and Pattern Grammar [6]. Hussein S M explored the correlation between error correction
and grammatical accuracy in second language writing [7]. Kholmurodova D K studied cognitive grammar in English
classrooms [8]. Krismayani N W explored the lexical and grammatical features of Business English [9]. Zhong Y
discussed the correction of English grammar errors by deep learning [10]. However, existing research is mostly
limited to general error correction methods, lacking a deep understanding of specific words and contexts. In addition,
the existing systems need to improve their processing capabilities for non-standard grammar and complex business
vocabulary.
20 Hongwei Ju et al. / Procedia Computer Science 247 (2024) 18–26
3. Method
To establish an effective business English grammar correction system, it is necessary to collect a sufficiently
large business English corpus. This article draws on a business school, an online business English teaching platform,
and public business document resources to collect nearly a million documents, including business reports, emails, and
meeting minutes. On this basis, the generalization performance of the model is further improved through methods
such as semantic substitution and semantic reconstruction.
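One possible reading of semantic substitution is synonym replacement used to enlarge the training data. The sketch below illustrates that reading only; the synonym table, the function name `substitute_synonyms`, and the `rate` parameter are hypothetical and are not taken from the system described in this article.

```python
import random

# Hypothetical synonym table, assumed for illustration only. A real system
# would draw on a much larger lexical resource.
SYNONYMS = {
    "meeting": ["conference"],
    "boss": ["manager", "supervisor"],
    "finished": ["completed"],
}


def substitute_synonyms(sentence, rng, rate=1.0):
    """Return a copy of `sentence` with known words replaced by synonyms.

    Each known word is replaced with probability `rate`. Trailing
    punctuation is preserved; capitalization of replaced words is not.
    """
    out = []
    for token in sentence.split():
        core = token.rstrip(".,;:!?")
        tail = token[len(core):]
        options = SYNONYMS.get(core.lower())
        if options and rng.random() < rate:
            out.append(rng.choice(options) + tail)
        else:
            out.append(token)
    return " ".join(out)


rng = random.Random(0)
augmented = substitute_synonyms("The meeting will be held tomorrow.", rng)
print(augmented)
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which makes it easier to compare models trained on the same augmented corpus.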
To study machine-learning-based business English grammar correction, this article uses a single LSTM (long
short-term memory) model as the baseline model. The actual dataset may be extremely large and complex, so only a
basic example is provided here. The original content, error location, error type, and modified content corresponding
to each sentence number are shown in Table 1. This format facilitates precise recognition and correction of specific
errors in sentences. For example, the corrected form of one sentence is "He is going to the office", in which the
verb phrase is changed to "is going"; in sentence 3, "like" and "apple" should be changed to "likes" and "apples",
respectively. Verbs and nouns can be adjusted accordingly in this way.
Table 1. Original content, error location, error type, and modified content corresponding to sentence numbering
Sentence numbering | Original content | Wrong location | Error type | Modified content
2 | They have not finished yet their homework. | 7-8 | Word order | They have not yet finished their homework.
3 | She like eating apple. | 2 | Verbal form | She likes eating apples.
4 | The meeting will be hold tomorrow. | 6 | Passive voice | The meeting will be held tomorrow.
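Annotation records of the kind listed in Table 1 could be stored and inspected as in the following minimal sketch. The class name `AnnotatedSentence` and the method `token_edits` are hypothetical; the token-level edits are derived here with Python's standard `difflib` module on whitespace tokens, which is an assumption made for illustration and not necessarily how the error locations in Table 1 were produced.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class AnnotatedSentence:
    """One annotation record: an erroneous sentence and its correction."""
    sentence_id: int
    original: str
    error_type: str
    corrected: str

    def token_edits(self):
        """Return (operation, original tokens, corrected tokens) per change."""
        src = self.original.split()
        tgt = self.corrected.split()
        matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
        return [
            (tag, src[i1:i2], tgt[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"
        ]


records = [
    AnnotatedSentence(2, "They have not finished yet their homework.",
                      "Word order", "They have not yet finished their homework."),
    AnnotatedSentence(3, "She like eating apple.",
                      "Verbal form", "She likes eating apples."),
    AnnotatedSentence(4, "The meeting will be hold tomorrow.",
                      "Passive voice", "The meeting will be held tomorrow."),
]

for record in records:
    print(record.sentence_id, record.error_type, record.token_edits())
```

Keeping the original and corrected sentences side by side means the edit operations can be recomputed at any time, so the records remain usable even if the tokenization scheme changes.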
In the process of correcting grammar errors in business English, correct feature extraction is of great
significance for improving the performance of the system [11]. To this end, this study uses natural language
processing to extract features from text. With the word vector representation method, each word in the text is
transformed into a vector that encodes the meaning and context of the word. Syntactic dependency analysis is used
to extract structural features from sentences, helping the model understand the relationships between sentence
components and improving its ability to recognize and correct grammar errors.
Accuracy:
$$\text{Accuracy} = \frac{TP + TN}{\text{Total Predictions}} \tag{1}$$
Here, True Positives (TP) represents correctly recognized syntax errors; True Negatives (TN) represents the
correctly identified non-erroneous parts; Total Predictions is the total number of predictions.
Convolutional neural networks are combined with recurrent neural networks to establish a grammar error
correction model. The convolutional neural networks are mainly used to extract dependency relationships between
sentences, while the recurrent neural networks mainly capture long-range correlations within sentences, such as
subject-verb agreement and tense coherence. During the model training phase, a corpus with error annotations is
used, and the backpropagation algorithm continuously optimizes the model weights, thereby reducing the gap between
predicted results and actual errors.
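The data flow of a convolutional layer followed by a recurrent layer can be sketched in plain Python. This is a toy forward pass under stated assumptions: it uses one convolution filter and a single-unit Elman recurrence instead of an LSTM, and all embeddings and weights are random, untrained stand-ins. The function names and dimensions are hypothetical, and the printed scores carry no linguistic meaning.

```python
import math
import random


def conv1d_relu(embeddings, kernel, bias=0.0):
    """Apply one width-k convolution filter over the token axis, then ReLU.

    `embeddings` is a list of per-token vectors. The sequence is zero-padded
    so that the output holds one feature value per token.
    """
    k, dim = len(kernel), len(embeddings[0])
    pad = k // 2
    zero = [0.0] * dim
    padded = [zero] * pad + embeddings + [zero] * pad
    features = []
    for t in range(len(embeddings)):
        total = bias
        for j in range(k):
            total += sum(w * x for w, x in zip(kernel[j], padded[t + j]))
        features.append(max(0.0, total))
    return features


def recurrent_scores(features, w_in, w_rec, w_out):
    """Run a single-unit Elman recurrence and emit a sigmoid score per token."""
    hidden, scores = 0.0, []
    for value in features:
        # The hidden state carries information from earlier tokens forward.
        hidden = math.tanh(w_in * value + w_rec * hidden)
        scores.append(1.0 / (1.0 + math.exp(-w_out * hidden)))
    return scores


rng = random.Random(0)
tokens = "She like eating apple .".split()
dim, width = 4, 3
# Random stand-ins for learned embeddings and filter weights (untrained).
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in tokens]
kernel = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]

features = conv1d_relu(embeddings, kernel)
scores = recurrent_scores(features, w_in=0.8, w_rec=0.5, w_out=1.5)
for token, score in zip(tokens, scores):
    print(f"{token:8s} {score:.3f}")
```

In a trained model, the weights would be fitted by backpropagation against the annotated corpus, and the per-token scores would be compared with the annotated error positions.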
Recall rate (Recall):
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
False Negatives (FN) refers to grammar errors that the model fails to recognize. The recall rate measures the
model's ability to capture actual syntax errors.
This study tests the effectiveness of the error correction method for business English through a series of
evaluation criteria, including accuracy, recall, and F1 score. The method is then evaluated against existing labeled
test cases to assess the model's ability to recognize various syntax errors. By integrating the prediction results of
multiple deep learning models, the robustness and accuracy of the algorithm can be further improved.
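One simple way to integrate the predictions of several models is per-token majority voting. The article does not state which integration scheme was used, so the sketch below is an assumption; the function name `majority_vote` and the labels "OK" and "ERR" are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token labels from several models by majority vote.

    `predictions` holds one label sequence per model, and all sequences must
    have the same length. Ties go to the label seen first among the models.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all models must label the same number of tokens")
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three models for a four-token sentence.
model_a = ["OK", "ERR", "OK", "ERR"]
model_b = ["OK", "ERR", "OK", "OK"]
model_c = ["OK", "OK", "OK", "ERR"]
ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Using an odd number of models avoids ties for binary labels, so the tie-breaking rule only matters when more than two labels are possible.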
Precision ($P_R$):
$$P_R = \frac{TP}{TP + FP} \tag{3}$$
False Positives (FP) refers to error-free parts that are incorrectly marked as syntax errors. Precision measures
the proportion of the model's predicted syntax errors that are actual errors.
Finally, this study integrates the developed grammar correction system into an online business English teaching
platform and tests it publicly to ensure its practicality.
F1 score ($F1_S$):
$$F1_S = \frac{2 \times P_R \times \text{Recall}}{P_R + \text{Recall}} \tag{4}$$
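The four evaluation metrics defined in this section can be computed directly from the confusion counts TP, TN, FP, and FN, with Total Predictions taken as their sum. The following is a minimal sketch; the function name `classification_metrics` is hypothetical, and the counts in the example are illustrative values, not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, recall, precision, and F1 from confusion counts.

    A ratio with a zero denominator is reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts only (not taken from the experiments).
metrics = classification_metrics(tp=90, tn=880, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because error-free tokens usually far outnumber erroneous ones, accuracy can remain high even when many errors are missed, which is why recall, precision, and F1 are reported alongside it.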
In practical applications, an appropriate development framework such as TensorFlow or PyTorch is selected
based on the complexity of the model to train and deploy the model.
In-depth exploration of feature engineering:
Text feature analysis: Across the three levels of vocabulary, syntax, and semantics, the semantic understanding
ability of the model is improved through methods such as part-of-speech tagging, dependency syntax analysis, and
word vectors.
Error type identification: The types of grammatical errors that the model needs to recognize are specified,
such as subject-predicate inconsistency, improper use of tenses, and improper use of articles.
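To make one of these error types concrete, improper article use such as "a e-mail" can be caught by a simple rule. The sketch below is a rule-based heuristic written for illustration; it is not the neural method described in this article, and the function name `fix_indefinite_articles` is hypothetical.

```python
VOWELS = set("aeiou")


def fix_indefinite_articles(sentence):
    """Make 'a'/'an' agree with the first letter of the following word.

    This spelling-based rule ignores pronunciation, so words such as "hour"
    or "university" are handled incorrectly.
    """
    tokens = sentence.split()
    fixed = []
    for i, token in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if token.lower() in ("a", "an") and nxt[:1].isalpha():
            article = "an" if nxt[0].lower() in VOWELS else "a"
            if token[0].isupper():
                article = article.capitalize()
            fixed.append(article)
        else:
            fixed.append(token)
    return " ".join(fixed)


print(fix_indefinite_articles("I am writing a e-mail to my boss."))
```

The limitation noted in the docstring shows why hand-written rules do not scale: article choice depends on pronunciation and context, which is the kind of pattern a learned model is expected to capture from annotated data.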
4.2. Results
Application examples of machine learning technology in the business English grammar correction system are
shown in Table 2. For the sentence "He is going to the office.", the syntactic dependencies clearly indicate the
subject-verb relationship between "is" and "He", as well as the verb-object relationship between "go" and "to."
These relationships are crucial for the model to recognize and correct verb form errors. In addition, labeling the
error positions and types provides clear training objectives for the model, helping it gradually learn to correct
different grammar errors.
Table 2. Application examples of machine learning techniques in business English grammar correction systems
Sentence numbering | Original sentence | Syntactic Dependency (Example) | Wrong location | Error type | Correcting sentences
4 | The meeting will be hold tomorrow. | (ROOT, will, 2) (be, will, 3) (hold, meeting, 4) (tomorrow, be, 5) | 6 | Passive voice | The meeting will be held tomorrow.
5 | I am writing a e-mail to my boss. | (ROOT, am, 2) (writing, am, 3) (e-mail, a, 4) (to, writing, 5) (boss, my, 7) | 5-6 | Spelling errors | I am writing an e-mail to my boss.
The exploration of performance trends for different versions of models is shown in Figure 1, which includes
model version (V), accuracy, recall, and F1 score.
Fig 1. Performance trends for different versions of models (vertical axis: numerical value)
Figure 1 shows the gradual improvement of key performance indicators such as accuracy, recall, and F1 score for
models from version 1.0 to version 1.5. The accuracy of the initial version 1.0 model is 85%, which increases to 91%
in version 1.5, indicating that the model classifies more items correctly. Over the same versions, the recall rate
increases from 80% to 90%, indicating a significant improvement in the model's ability to identify positive samples.
The continuous enhancement of model performance is mainly driven by algorithm optimization, feature engineering
improvement, parameter fine-tuning, and expanded training datasets. Despite this progress, the model may still face
technical challenges such as overfitting, underfitting, and class imbalance in actual deployment, which need to be
addressed through more advanced techniques and strategies.
The error type distribution recognized by the business English grammar correction system is shown in Figure 2.
Grammar errors are the most common, with 500 records, highlighting their prevalence and difficulty in business
English. Spelling errors are also common, with 400 records. Punctuation and vocabulary usage errors occur less
frequently, at 200 and 150 records respectively, but are still important because they may affect the clarity and
professionalism of business texts. Other errors are the least frequent, with only 50 records, but they still warrant
detailed analysis because they may contain multiple complex types of errors. In terms of proportions, grammar errors
account for 0.385 of all records, confirming their dominant position in error correction. Spelling errors, accounting
for 0.308, are also an important category. Punctuation errors and vocabulary usage errors account for 0.154 and
0.115, respectively; although these proportions are relatively low, such errors have a significant impact on the
clarity and accuracy of business texts. Other errors have the lowest proportion, 0.038, indicating that although
these errors are varied, each individual type occurs infrequently.
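The reported proportions follow from the record counts, which sum to 1300. The short check below reproduces them; the variable names are chosen for illustration.

```python
# Error record counts as reported for Figure 2.
error_counts = {
    "Grammar": 500,
    "Spelling": 400,
    "Punctuation": 200,
    "Vocabulary usage": 150,
    "Other": 50,
}

total = sum(error_counts.values())
# Share of each error type among all records, rounded to three decimals.
proportions = {name: round(count / total, 3) for name, count in error_counts.items()}

print("total records:", total)
for name, share in proportions.items():
    print(f"{name}: {share:.3f}")
```

The shares are fractions of the total rather than percentages, so 0.308 corresponds to 30.8% of all error records.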
Fig 2. Recognition results of error type distribution in Business English grammar correction system
The automatic correction effect is shown in Figure 3. The automatic correction statistics show that the system
performs best on spelling errors, with 180 successful corrections. Grammar errors are successfully corrected 120
times, although some complex situations still require manual intervention. The automatic correction performance for
punctuation errors is acceptable, with 100 successful corrections. However, the number of automatic corrections for
vocabulary usage errors and other errors is relatively low, at only 80 and 40 respectively. This indicates that the
system's automation ability is limited for these types of errors and that automatic correction in these areas needs
further optimization.
The demand for manual intervention across error types reveals specific challenges for automated error
correction systems. Punctuation errors require 40 manual interventions because of their close relationship with text
structure. Grammar errors require 30 interventions, reflecting the challenge of handling complex grammar rules.
Although spelling errors are corrected automatically in most cases, they still require 20 manual interventions in
contexts where a word has several possible meanings. Vocabulary usage errors and other types of errors require the
most manual interventions, 50 and 60 respectively, reflecting that the automatic correction of these errors relies
heavily on contextual information and on the complexity of the processing algorithms.
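The two sets of counts can be combined into a rough automation share per error type. The sketch below assumes that automatic corrections and manual interventions are disjoint counts whose sum is the number of handled cases; the article does not state this, so the resulting ratios are illustrative only. The function name `automation_share` is hypothetical.

```python
# Counts as reported in the text for Figure 3.
auto_corrections = {
    "Spelling": 180, "Grammar": 120, "Punctuation": 100,
    "Vocabulary usage": 80, "Other": 40,
}
manual_interventions = {
    "Spelling": 20, "Grammar": 30, "Punctuation": 40,
    "Vocabulary usage": 50, "Other": 60,
}


def automation_share(auto, manual):
    """Return auto / (auto + manual) for every error type.

    Assumes the two counts are disjoint and together cover all handled cases.
    """
    return {name: auto[name] / (auto[name] + manual[name]) for name in auto}


shares = automation_share(auto_corrections, manual_interventions)
ranking = sorted(shares, key=shares.get, reverse=True)
for name in ranking:
    print(f"{name}: {shares[name]:.3f}")
```

Under this assumption, spelling and grammar errors are the most automated categories, while fewer than half of the errors in the "Other" category are corrected without manual help.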
Fig 3. Automatic correction effect (horizontal axis: error type; categories: Grammar, Spelling, Punctuation, Vocabulary usage, Other)
5. Conclusion
This research aims to establish a machine learning method, based on convolutional neural networks and recurrent
neural networks, for correcting grammatical errors in business English, with the goal of improving the business
communication and language expression abilities of business English students. The main research content includes
large-scale data collection and preprocessing, efficient feature extraction, construction and training of several
deep learning models, and system performance evaluation. The proposed method achieves good results on the evaluation
indicators, and experimental results show that it can effectively identify and correct some uncommon complex grammar
errors. Feedback obtained from business English students in practical application experiments indicates that the
system has a significant effect on improving their business English writing competence. Although the system achieves
satisfactory results in many respects, it also has certain shortcomings. The ability of the existing models to
recognize special grammatical errors, such as those involving appositives and prepositions, still needs to be
improved. In addition, the method requires substantial computational resources and has difficulty meeting the
requirements for real-time feedback. To address these shortcomings, this research considers the following
directions. First, the training samples are expanded and enriched to improve the model's generalization performance
and accuracy. Second, a lightweight model architecture suitable for mobile terminals and resource-scarce application
scenarios is studied. Third, more advanced natural language processing methods, such as Transformers, are studied.
Through continuous learning and improvement, this system is expected to be widely applied in business English
teaching and practice and to help business English students improve their grammar when speaking or writing English
in business contexts.
References
[1] Lan Keran. Research on the Application of Cognitive Context Clue Chain Model in Business English Reading [J]. Journal of Shaoguan
University, 2022, 43(10): 79-84.
[2] Fu Weili, Du Tingting, Zhang Yunjuan. Research and Practice on Cross-disciplinary Teaching of Chinese Language and Business English in
Vocational Schools [J]. Vocational Education, 2020, 19(18): 55-60.
[3] Xu Min. Research on Modular Teaching of Business English Majors in Higher Vocational Education Based on Blended Teaching Model [J].
Journal of Hubei Open Vocational College, 2022, 35(22): 175-177.
[4] Yang Tian, Tian Xuejing. Research on the reform of business English talent training mode of application-oriented undergraduate colleges in
the context of the "Belt and Road" [J]. Journal of Qilu Normal University, 2022, 37(1): 53-57.
[5] Qin Ying. The construction and practice of a smart classroom ecosystem for business English in universities under the background of
informatization [J]. China New Communications, 2023, 25(1): 206-208.
[6] Chen G, Liang M. Verb form error detection in written English of Chinese EFL learners: A study based on Link Grammar and Pattern
Grammar [J]. International Journal of Corpus Linguistics, 2022, 27(2): 139-165.
[7] Hussein S M. The Correlation between Error Correction and Grammar Accuracy in Second Language Writing [J]. International Journal of
Psychosocial Rehabilitation, 2020, 24(5): 2980-2990.
[8] Kholmurodova D K. Cognitive Grammar in English Lessons [J]. Theoretical & Applied Science, 2020, 84(4): 390-392.
[9] Krismayani N W, Suastra I M, Suparwa I N, et al. Lexical and Grammatical Features of Business English [J]. The International Journal of
Social Sciences World (TIJOSSW), 2020, 2(1): 51-64.
[10] Zhong Y, Yue X. On the correction of errors in English grammar by deep learning [J]. Journal of Intelligent Systems, 2022, 31(1): 260-270.
[11] Tsai Y R. Exploring the effects of corpus-based business English writing instruction on EFL learners’ writing proficiency and perception [J].
Journal of Computing in Higher Education, 2021, 33(2): 475-498.