Next Word Prediction Using Machine Learning Techniques: Cybersecurity
Chinmaya Nayak
Department of Computer Science & Engineering, Netaji Subhas University of Technology,
New Delhi, India, chinunayak01@gmail.com, ORCID 0000-0003-2496-6443
Arvind Kumar
Department of Computer Science & Engineering, Netaji Subhas University of Technology,
New Delhi, India, er.arvindkumar@gmail.com, ORCID 0000-0003-2334-4482
Abstract— Long phrases can be tedious to type, but the text prediction technology built into
keyboards makes this easier. Next word prediction is also known as language modelling: the task is
to anticipate the next word in a sequence. It has several applications and is one of the core tasks of
human language technology. The approach also supports letter-by-letter prediction, predicting each
letter as it is typed to build up a word. A long short-term memory (LSTM) network can retain prior
text and anticipate the words that are likely to be useful to the user in completing phrases.
Keywords—LSTMs, Activation function, classification, Next Word
I. INTRODUCTION
Word prediction technologies have been created to help people communicate more easily and to help
those who write more slowly. This research introduces a language-model prototype architecture for
rapid digital communication that forecasts the likely upcoming word given a set of preceding words.
Given only a few starting text fragments, the word prediction technique is tasked with predicting the
word that is likely to follow. By recommending appropriate terms to the user, we aim to make
immediate digital communication easier. Convolutional Neural Networks (CNN), Recurrent Neural
Networks (RNN), and Long Short-Term Memory networks (LSTM) have been proposed through the
research and application of deep learning technologies. For prediction problems, LSTM and other
nonlinear sequence models can handle serialised data effectively. As a standard time-series
forecasting model, an LSTM can exploit long-range temporal features to forecast the subsequent
elements of a sequence, because its memory mechanism accounts for both long-term and short-term
memory. As a result, the suggested words are consistent with the particular user's vocabulary
preferences.
Accordingly, this work proposes an LSTM-based word input prediction model. First, the textual data
set for a particular industry is normalised. Next, an LSTM network is trained on the processed text
to create an industry-specific keyword prediction model, which is then applied to the industry's
input system.
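As an illustration of the proposed pipeline, the sketch below assembles a small next-word model in Keras; the vocabulary size, sequence length, and layer widths are assumed values chosen for illustration and are not taken from the paper.

# A minimal sketch of an LSTM next-word model (hypothetical sizes, Keras API).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 5000   # assumed vocabulary size after normalisation
SEQ_LEN = 10        # assumed number of context words fed to the model

model = Sequential([
    Embedding(VOCAB_SIZE, 64),               # map word ids to dense vectors
    LSTM(128),                                # summarise the context sequence
    Dense(VOCAB_SIZE, activation="softmax"),  # probability of each word being next
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])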
II. LITERATURE REVIEW
The Next Word Prediction model used by previous systems predicts the next word that will follow
the one that came before it. These systems operate using a machine learning method that
is limited in its ability to produce proper syntax. The model in the prior study employed RNN
(Recurrent Neural Network) techniques to predict the next word, with a respectable accuracy of 86%
but occasionally incorrect syntax due to limitations of the RNN model. As is well known, an RNN
applies weights and biases at each time step and a sigmoid function to produce a probabilistic output
for the prediction model. A problem arises when derivatives are applied repeatedly through the
activation function (sigmoid), which leads to computation errors over long-term dependencies.
In RNN approaches, the model's weights and biases do not update correctly as the number of layers
rises. As a result, the model becomes inaccurate, and the more layers are added, the less accurate it
becomes.
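To make this vanishing-gradient argument concrete, the small NumPy calculation below multiplies the sigmoid derivative across many unrolled time steps; it is an illustration only and is not taken from the paper.

# Illustration: repeated sigmoid derivatives shrink the backpropagated signal.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)        # never larger than 0.25

grad = 1.0
for _ in range(50):             # 50 unrolled time steps
    grad *= sigmoid_grad(0.0)   # multiply by the largest possible derivative
print(grad)                     # roughly 8e-31: the weight update has vanished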
Previous studies also included some LSTM work for prediction. The challenge of long-term
dependencies identified in earlier systems means that an RNN model can easily predict "Boy" when
dealing with a short phrase such as: He is a " ". However, if the phrase is long, the system may
forget the context, which creates a long-term dependency issue. Additionally, earlier word prediction
methods based on the RNN model suffer from unidirectionality, which is a further limitation.
In conclusion, the vanishing gradient problem is a difficulty that arises when training an RNN with
backpropagation. Its impact is significant: the weight updates become vanishingly small, and the
model can become completely unusable. For this reason, we choose to employ LSTM (Long
Short-Term Memory) networks, with certain additional characteristics, instead of a plain RNN.
Previous systems operate using a word prediction model, which predicts that the current word will
be followed by an immediately succeeding word. These systems operate using machine learning
techniques that are limited in their ability to produce proper syntax. Multi-window approaches have
been required, and a residual-connected minimal gated unit (MGU), a condensed version of the
LSTM, has also been developed. In order to save training time and increase accuracy, CNN-based
variants attempt to skip a few layers. However, using deep stacks of neural network layers introduces
delay when predicting a large number of words. Models built with LSTM-based algorithms can
handle more information quickly and predict higher-quality results than models built without them.
These developing technologies have been producing more accurate results than the current system
technologies.
In this research we combine an RNN model and LSTMs for prediction. We use the current logical
data rather than all historical data logs, which gives greater precision with shorter execution times.
The model does not require timestamps. Some elementary logic gates (AND, OR, and XOR) are used
to reduce the prior data, and after each stage feedback is provided to improve accuracy.
A recurrent neural network (RNN) performs a level-by-level examination of both the result from the
prior input and the current input. RNNs can process input sequences in a way that conventional
feed-forward neural networks cannot because of their internal representation (memory). The inputs
of a recurrent neural network are linked together across time steps.
Like CNNs and ANNs (artificial neural networks), RNNs have a design that is primarily composed
of three layers: input nodes, hidden (dense) nodes, and output units. These layers operate in order.
The data is first fetched by the input layer, which also performs data pre-processing. Once the data
has been filtered, it is moved to the hidden layers, where the network's activation functions and
algorithms are run in order to extract useful information. Finally, this information is sent to the
output layer.
The main characteristic of an RNN is that every layer shares exactly the same weights and biases,
which makes the control variables into dependent variables. This reduces the number of parameters
and lets the network memorise each prior output by feeding every output into the following hidden
layer, so that all the levels can be linked together into a single recurrent layer with the same weights
and biases across all hidden layers.
The difference between recurrent neural networks and other neural networks is that recurrent neural
networks contain loops for storing and processing information, whereas feed-forward networks do
not. Put another way, recurrent neural networks can connect prior data to the current situation.
Information persistence is a feature of recurrent neural networks that is absent from traditional
neural networks. The core component of an RNN with long-lasting memory is the LSTM. A gated
recurrent unit, or GRU for short, has been shown to be as efficient as an LSTM, and occasionally
even more so, owing to its speed and comparable accuracy.
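Because the LSTM and GRU expose the same layer interface in Keras, the recurrent core of such a model can be swapped directly; the sizes below are illustrative assumptions, not values from the paper.

# Hypothetical sketch: swapping the recurrent core between LSTM and GRU.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense

def build_model(recurrent_cell):
    return Sequential([
        Embedding(5000, 64),
        recurrent_cell(128),                   # LSTM or GRU
        Dense(5000, activation="softmax"),
    ])

lstm_model = build_model(LSTM)
gru_model = build_model(GRU)                   # fewer gates, often faster to train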
S.NO | TECHNIQUES USED | ADVANTAGES | DISADVANTAGES
1 | Simple RNN techniques. | Minimizes the loss function and obtains predicted data in the minimum amount of time. | Does not form correct syntax.
2 | Simple RNN with activation function (Sigmoid): S = 1/(1 + e^-x) | Also helps in minimizing the loss function with derivative and probabilistic rules. | High execution time with a lower accuracy percentage.
3 | Simple RNN with activation function (Tanh): T = (e^x - e^-x)/(e^x + e^-x) | Also helps in minimizing the loss function with derivative and probabilistic rules. | Less execution time with more accuracy than the sigmoid function.
4 | Simple RNN with activation function (ReLU): R = max(0, x) | Also helps in minimizing the loss function, using the derivative rule only. | Less execution time with more accuracy than the sigmoid function and Tanh.
5 | RNN with gradient problem. | Used with an activation function with time stamps to catch or guess the next word. | Only applicable to sequential sentences.
6 | RNN with backpropagation. | Used to build a chain rule for the backpropagation process to obtain output threshold values for a better approach. | Leads to high execution time with less accuracy and memory loss.
7 | RNN with stack-up / paralleling. | Passes the output of the previous node to the present node for prediction and forms more correct syntax than the other techniques. | The overall computational expense keeps growing and can never be justified by any accuracy gain.
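The three activation functions listed in the table can be written directly in NumPy, as the short sketch below shows.

# The activation functions referenced in the table, written with NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # S = 1 / (1 + e^-x), output in (0, 1)

def tanh(x):
    return np.tanh(x)                 # T = (e^x - e^-x) / (e^x + e^-x), output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # R = max(0, x)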
When backpropagation is taken into account, neural networks may suffer from vanishing gradients
as a drawback. The value-updating mechanism is severely impacted, and the model becomes
worthless. As a result, we use the LSTM, which has a hidden layer and a storage cell with three
gates: forget, input, and output.
The forget gate is mainly used to govern what unnecessary information has to be deleted. The input
gate ensures that new information is added to the cell, while the output gate ensures that the cell's
contents are passed on to the next hidden state. Every gate equation uses the sigmoid function, which
ensures that the value is squashed into the range 0 to 1.
The LSTM has a chain-like architecture and is a variant of the RNN; however, it features a four-layer
neural network rather than a single-layer one. The gates in the LSTM architecture can add or delete
information. An LSTM has five structural components:
1. Input gate
2. Forget gate
3. Cell
4. Output gate
5. Hidden state output
The input gate takes the input information, followed by the forget gate, which instructs the cell to
disregard or forget any irrelevant information. It accomplishes this by multiplying the value of the
irrelevant information by zero (or a value close to zero), leaving it with essentially no value. This
information is then returned to the cell, where the output gate determines the output.
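For concreteness, the sketch below implements one step of a standard LSTM cell in NumPy; the stacked-weight layout and variable names are illustrative and are not taken from the paper.

# One step of a standard LSTM cell (illustrative NumPy sketch).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U and b hold the parameters of the four gates stacked together.
    z = W @ x + U @ h_prev + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)            # forget gate: values near 0 drop old memory
    i = sigmoid(i)            # input gate: how much new information to write
    o = sigmoid(o)            # output gate: what to expose as the hidden state
    g = np.tanh(g)            # candidate cell content
    c = f * c_prev + i * g    # updated cell state
    h = o * np.tanh(c)        # updated hidden state output
    return h, c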
A rudimentary LSTM prediction model for our scenario is shown in Fig. 2 below. The model encodes
the input sequence of words describing prior sub-events into an embedding, and then decodes from
that embedding a sequence of words describing a potential future sub-event.
C. ReLU activation function
A rectified linear unit (ReLU) is an activation function that gives a deep learning model the
flexibility to be non-linear and helps address the vanishing gradient problem. It passes through only
the positive part of its input. It is one of the most widely used deep learning activation functions.
The ReLU formula is: f(x) = max(0, x)
B. TEXT ANALYSIS
Text analysis is the systematic process of reading and comprehending human-written text using
computer tools to obtain business insights. Text analysis technology can categorise, sort, and extract
data from texts on its own to find patterns, connections, attitudes, and other useful information.
The Term Document Matrix function was used to create term matrices in order to obtain a count of
term frequency and so determine the rate of occurrence of words.
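As an example of this step, a term-frequency matrix can be produced with scikit-learn's CountVectorizer; the sample sentences are invented, and the paper's Term Document Matrix function may come from a different toolkit.

# Counting term frequencies over a small invented corpus.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the attacker scans the network", "the network logs every scan"]
vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(docs)      # rows = documents, columns = terms
print(vectorizer.get_feature_names_out())
print(matrix.toarray())                      # term frequency per document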
C. TOKENIZATION
The practice of breaking up a large amount of text into tokens is known as tokenization. These tokens
serve as a good starting point for stemming and inflection handling and are particularly helpful for identifying
trends.
Tokenization is one of the essential pre-processing techniques. It simply divides the continuously
flowing text into separate word parts. One really simple method is to split the input on whitespace
and give each word its own identity.
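A minimal tokenization sketch using the Keras Tokenizer is shown below; the corpus sentences are placeholders, not data from the paper.

# Splitting text into tokens and building a word index (placeholder corpus).
from tensorflow.keras.preprocessing.text import Tokenizer

corpus = ["he is a boy", "she writes long phrases every day"]
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)                   # builds the word index
print(tokenizer.word_index)                      # e.g. {'he': 1, 'is': 2, ...}
print(tokenizer.texts_to_sequences(corpus))      # each sentence as word ids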
D. PAD SEQUENCE
It remains challenging to give our neural network inputs of comparable length when converting texts
to numerical values, because not all sentences are the same length. The pad_sequences function is
used to truncate some of the longer sequences and to pad the shorter ones with zeroes.
Furthermore, it can be specified whether padding and truncation should happen at the start or at the
end of a sequence, via the 'pre' and 'post' settings of the padding and truncating arguments; by
default, truncation and padding occur at the beginning of the sequence.
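A short example of this behaviour, assuming the Keras pad_sequences function:

# Padding and truncating variable-length sequences to a fixed length.
from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[4, 7], [4, 9, 2, 11, 3], [5]]
padded = pad_sequences(sequences, maxlen=4, padding="pre", truncating="pre")
print(padded)
# [[ 0  0  4  7]
#  [ 9  2 11  3]
#  [ 0  0  0  5]]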
The entire dataset is divided up into discrete word phrases. A word index is created during
segmentation using a predetermined characteristic number of words.
When we train the model, the accuracy is found to be 92-96% per set of data.
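A rough sketch of how the segmented word sequences can be turned into training samples and fitted is given below; it reuses the tokenizer, corpus, VOCAB_SIZE, SEQ_LEN, and model assumed in the earlier sketches, the epoch count is arbitrary, and the 92-96% accuracy quoted above depends on the actual data set.

# Building prefix -> next-word training samples and fitting the model
# (reuses names assumed in the earlier sketches).
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

samples = []
for line in corpus:
    ids = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(ids)):
        samples.append(ids[: i + 1])              # every prefix predicts its last word

samples = pad_sequences(samples, maxlen=SEQ_LEN + 1)
X, y = samples[:, :-1], to_categorical(samples[:, -1], num_classes=VOCAB_SIZE)
history = model.fit(X, y, epochs=50, verbose=0)   # accuracy depends on the corpus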