
ISSN: 2096-3246
Volume 54, Issue 02, November, 2022

NEXT WORD PREDICTION USING MACHINE LEARNING TECHNIQUES

Chinmaya Nayak
Department of Computer Science & Engineering, Netaji Subhas University of Technology,
New Delhi, India, chinunayak01@gmail.com, 0000-0003-2496-6443

Arvind Kumar
Department of Computer Science & Engineering, Netaji Subhas University of Technology,
New Delhi, India, er.arvindkumar@gmail.com, 0000-0003-2334-4482

Abstract— Long phrases can be tedious to type, but the text prediction technology built into
keyboards makes this easier. Next word prediction is also known as language modelling: the task at
hand is anticipating the next word that will be written. It has several applications and is one of the
main tasks of human language technology. The approach can also operate letter by letter, predicting
the next letter as letters are combined to build a word. Long Short-Term Memory (LSTM) networks
can take the preceding text into account and anticipate the words that are likely to help users
complete their phrases.
Keywords—LSTM, activation function, classification, next word prediction

I. INTRODUCTION
Word prediction technologies have been created to help people communicate more easily and to
assist those who write slowly. This research introduces a prototype language-model architecture for
rapid digital communication that forecasts the likely upcoming word given a set of preceding words.
Given only a few initial text fragments, the word prediction component is tasked with predicting the
word that is most likely to follow. By recommending appropriate terms to the user, we aim to make
instant digital communication easier. Convolutional Neural Networks (CNN), Recurrent Neural
Networks (RNN), and Long Short-Term Memory networks (LSTM) have all been proposed in the
research and application of deep learning. For prediction problems, LSTM and other nonlinear
sequence models can successfully handle serialised data. As a standard time-series forecasting
model, LSTM can exploit long-distance temporal features to forecast subsequent elements of a
sequence, because its memory mechanism accounts for both long-term and short-term memory. As a
result, the suggested words are consistent with the particular user's vocabulary preferences.
This work therefore proposes an LSTM-based word input prediction model. First, the textual data
set for a particular industry is normalised. Next, an LSTM network is trained on the preprocessed
text to create an industry keyword prediction model that is then applied to the industry's input
system.
II. LITERATURE REVIEW
The next word prediction models used by previous systems predict the word that will follow the one
that came before it. These systems operate using machine learning methods that are limited in their
ability to produce proper syntax. The model in the prior study employed RNN (Recurrent Neural
Network) approaches to predict the next word, with a decent accuracy of 86%, but it occasionally
produced incorrect syntax due to issues inherent in the RNN model. As is well known, an RNN uses
weights and biases together with time steps and sigmoid functions to produce a probabilistic output
for the prediction model. A problem arises when the derivatives of the sigmoid activation function are
applied repeatedly during training: the gradients shrink, and this long-term, activation-dependent
behaviour results in computation errors.
In RNN approaches, the model's weights and biases do not update correctly as the number of layers
rises. As a result, the model becomes inaccurate, and the more layers that are affected, the less
accurate the model becomes.
Previous studies also included some LSTM work for prediction. The challenge of long-term
dependencies was identified in earlier systems: an RNN model can easily predict "boy" when
completing a short sentence such as "He is a ___", but if the phrase is long the system may forget
the context, which creates a long-term dependency problem. Additionally, earlier word prediction
methods built on RNNs suffer from unidirectionality, which is another issue.
In short, the vanishing gradient problem is a difficulty that arises when a recurrent network is trained
with backpropagation. It has a significant impact: the weight updates are heavily distorted and the
model can become completely unusable. For this reason we choose to employ LSTM (Long
Short-Term Memory) networks, with some additional characteristics, instead of a plain RNN.
Previous systems operate using a word prediction model which predicts that a word will be followed
by an immediately succeeding word. These systems use machine learning techniques that are limited
in their ability to provide proper syntax. Multi-window models have been proposed, and a
residual-connected minimal gated unit (MGU), a condensed version of the LSTM, has also been
developed. To save training time and increase accuracy, CNN-style skip connections allow a few
layers to be bypassed in this setting. However, stacking many levels of neural network models
introduces delay when predicting a large number of words. Models created with LSTM algorithms
are capable of managing more information quickly and predicting higher-quality outcomes than
models built without these algorithms, and such developing technologies have been producing more
accurate results than the current system technologies.
In this research we use an RNN model and LSTMs together for prediction. The contributions are:
use of current context data as opposed to all historical data logs; greater precision with shorter
execution times; a model that does not require timestamps; use of elementary logic gates (AND, OR
and XOR) to reduce the prior data; and feedback offered after each stage to improve accuracy.

III. THE ALGORITHM


A. RNN(Recurrent Neural Network)
Recurrent neural networks are extensions of feed-forward neural networks equipped with memory.
An RNN is recurrent in that it executes the same operation for every element of the input sequence,
and the output at each step depends on the preceding computation. The decision at each step is based
on a combined examination of the result from the prior input and the current input. RNNs can process
input sequences in a way that feed-forward neural networks cannot because of their internal
representation (memory). The inputs of a recurrent neural network are linked together over time.
Like CNNs and ANNs (artificial neural networks), RNNs have a design that is primarily composed
of three layers: an input layer, a hidden layer, and an output layer. These layers operate in order. The
data is first fetched by the input layer, which also performs data pre-processing. Once the data has
been filtered, it is passed to the hidden layers, where the network's activation functions and
algorithms are run in order to extract useful information. Finally, this information is sent to the
output layer.
The main idea of an RNN is to give every layer exactly the same weights and biases, which turns
what would otherwise be independent parameters into shared ones. This reduces the number of
parameters and lets the network memorise each prior output by feeding it into the following hidden
layer, so that all the layers can be linked together into a single recurrent layer with the same weights
and biases across all hidden steps.
The difference between recurrent neural networks and other neural networks is that recurrent
networks contain loops for storing and processing information, whereas ordinary neural networks do
not. Put another way, recurrent neural networks can connect prior data to the current situation:
information persistence is a feature of recurrent neural networks that is absent from traditional
feed-forward networks. The core component that gives an RNN its long-lasting memory is the LSTM
cell. A gated recurrent unit (GRU) has been shown to be as efficient as an LSTM, and occasionally
even more so, because of its speed and accuracy.

The hidden-state update is:

h_t = f(h_{t-1}, x_t)

Applying the tanh activation function:

h_t = tanh(W_hh h_{t-1} + W_xh x_t)
Figure 1: Basic RNN Structure
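
As an illustration of this update rule, the recurrence can be written in a few lines of NumPy. This is
only a sketch; the hidden size, input size, and random weights below are assumptions chosen for
demonstration, not values from the paper.

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh):
    # One recurrent step: h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    return np.tanh(W_hh @ h_prev + W_xh @ x_t)

# Illustrative dimensions (assumed): hidden size 4, input size 3.
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(4, 4))
W_xh = rng.normal(size=(4, 3))

h = np.zeros(4)                       # initial hidden state h_0
for x in rng.normal(size=(5, 3)):     # a toy sequence of five input vectors
    h = rnn_step(h, x, W_hh, W_xh)    # each output depends on the preceding computation
print(h)
```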


A.1. Basic RNN Techniques


1. Simple RNN techniques.
2. Simple RNN with activation function (Sigmoid).
3. Simple RNN with activation function (Tanh).
4. Simple RNN with activation function (ReLU).
5. RNN with Gradient Problem.
6. RNN with Backpropagation
7. RNN with Stack-Up / paralleling.
8. Recurrent method with the feed-forward network.
9. RNN with Sequence levels.
These are some of the techniques that have been used with RNNs for prediction; their goals and gaps
are discussed in Table 1 below.

1. Simple RNN techniques
   Advantages: Minimizes the loss function and obtains the predicted data in the minimum amount of time.
   Disadvantages: Does not form correct syntax.

2. Simple RNN with activation function (Sigmoid), S = 1 / (1 + e^-x)
   Advantages: Helps in minimizing the loss function with derivatives and probabilistic rules.
   Disadvantages: High execution time with a lower accuracy percentage.

3. Simple RNN with activation function (Tanh), T = (e^x - e^-x) / (e^x + e^-x)
   Advantages: Helps in minimizing the loss function with derivatives and probabilistic rules.
   Disadvantages: Lower execution time and higher accuracy than the sigmoid function.

4. Simple RNN with activation function (ReLU), R = max(0, x)
   Advantages: Helps in minimizing the loss function with the derivative rule only.
   Disadvantages: Lower execution time and higher accuracy than the sigmoid and tanh functions.

5. RNN with gradient problem
   Advantages: Used with an activation function and time stamps to catch or guess the next word.
   Disadvantages: Only applicable to sequential sentences.

6. RNN with backpropagation
   Advantages: Uses the chain rule in the backpropagation process to obtain output threshold values for a better approach.
   Disadvantages: Leads to high execution time with less accuracy and memory loss.

7. RNN with stack-up / paralleling
   Advantages: Passes the output of the previous node to the present node for prediction and produces more correct syntax than other techniques.
   Disadvantages: The extra computation this model requires can never be justified by the accuracy gain.

8. Recurrent method with a feed-forward network
   Advantages: Helps condition the current input on the preceding one with a minimized number of rules and good syntax prediction.
   Disadvantages: Computationally demanding and quite difficult to continue with RNN techniques.

9. RNN with sequence levels
   Advantages: Helps to predict the next word for shorter sentences with high accuracy.
   Disadvantages: Difficult to predict longer sequences because of long-term dependencies, which lead to memory being lost or forgotten while continuing to the next stage.

Table 1: Basic RNN Techniques
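
For reference, the three activation functions listed in Table 1 can be computed directly. The following
is a small NumPy sketch for illustration only; it is not part of the original system.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # S = 1 / (1 + e^-x), output in (0, 1)

def tanh(x):
    return np.tanh(x)                   # T = (e^x - e^-x) / (e^x + e^-x), output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)           # R = max(0, x), zeroes out negative values

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x), sep="\n")
```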


B. LSTM(Long-Short Term Memory)
Since these networks were created for long-term dependencies, their ability to recall information for
an extended period of time without having to relearn it is what distinguishes them from other neural
networks and makes the entire process easier and quicker. This particular sort of RNN has a built-in
memory cell to store data.

When backpropagation is taken into account, plain recurrent networks suffer from the vanishing
gradient problem: the weight-update mechanism is severely affected and the model becomes
worthless. LSTM addresses this with a hidden layer and a storage cell controlled by three gates: a
forget gate, an input gate, and an output gate.

The forget gate governs which information is unnecessary and has to be deleted. The input gate
ensures that new information is added to the cell, while the output gate ensures that the cell's
contents are passed on to the next hidden state. Every gate equation uses the sigmoid function, which
ensures that the gate value is squashed into the range 0 to 1.

An LSTM has a chain-like architecture and is a variant of the RNN; however, each repeating module
contains a four-layer neural network rather than a single layer. The gates in the LSTM architecture
can add or delete any data. An LSTM has five structural components:
1. Input gate
2. Forget gate
3. Cell
4. Output gate
5. Hidden state output

The input gate takes the input information, followed by the forget gate, which instructs the cell to
disregard or forget any irrelevant information. It accomplishes this by multiplying the value of the
irrelevant information by zero, leaving it with no value. This information is then returned to the cell,
where the output gate determines the output.
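
The interaction of the three gates within a single step can be sketched as follows. The dimensions
and random weight initialisation are assumptions chosen purely for illustration; in practice a library
implementation such as tf.keras.layers.LSTM would normally be used.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM step: forget, input, and output gates act on the memory cell.
    z = W @ x_t + U @ h_prev + b                    # stacked pre-activations for all gates
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)    # gate values squashed into (0, 1)
    g = np.tanh(g)                                  # candidate cell contents
    c_t = f * c_prev + i * g                        # forget old memory, add new memory
    h_t = o * np.tanh(c_t)                          # expose part of the cell as the hidden state
    return h_t, c_t

# Toy sizes (assumed): input size 3, hidden size 4.
rng = np.random.default_rng(1)
W, U, b = rng.normal(size=(16, 3)), rng.normal(size=(16, 4)), np.zeros(16)
h, c = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(5, 3)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```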

Figure 2: Basic LSTM Structure

A rudimentary LSTM prediction model for our scenario is shown in Fig. 2. After encoding the input
sequence of words describing prior sub-events into an embedding, the model decodes from that
embedding a sequence of words describing a potential future sub-event.
C. ReLU activation function
A rectified linear unit (ReLU) is an activation function that gives a deep learning model the
flexibility to be non-linear and helps address the vanishing gradient problem. It passes through the
positive part of its argument and outputs zero for negative inputs. It is one of the most widely used
activation functions in deep learning.
The ReLU formula is: f(x) = max(0, x)

Figure 3: Basic ReLU activation function
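
In TensorFlow the same function is available directly as tf.nn.relu or as a layer activation; this is a
brief sketch, and the layer width of 64 is an illustrative assumption.

```python
import tensorflow as tf

# ReLU applied element-wise: negative values become 0, positive values pass through.
x = tf.constant([-3.0, -1.0, 0.0, 2.0, 5.0])
print(tf.nn.relu(x).numpy())                     # [0. 0. 0. 2. 5.]

# ReLU as the activation of a hidden layer (layer width is illustrative).
hidden = tf.keras.layers.Dense(64, activation="relu")
```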

D. Softmax activation function


The Softmax activation function scales numbers (logits) into probabilities. Softmax produces a
vector (say v) containing a probability for each potential outcome, and across all potential outcomes
or classes the probabilities in v add up to one.
The Softmax formula is:
softmax(x_i) = exp(x_i) / Σ_j exp(x_j)


Figure 4: Basic Softmax activation function
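
A numerically stable version of this formula can be sketched as follows (illustrative only, not taken
from the paper's code):

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum logit for numerical stability; the result sums to 1.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative logits
print(softmax(scores))               # approx. [0.659 0.242 0.099], sums to 1
```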

IV. PROPOSED METHODOLOGY


System inputs are first gathered. A tokenizer is used to split the inputs into tokens. The tokenized
inputs are divided into parts that are fed into Model 1, an LSTM. Model 1's LSTM outputs are used
as Model 2's input. Model 2's outputs are passed through the RNN ReLU activation (derivatives).
A further Softmax activation is then applied to the most recent output. Model 3, the prediction
model, is developed after obtaining the final derived output. This model improves accuracy and
allows us to share input from previous nodes.
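
One possible realisation of this pipeline in Keras is sketched below. The vocabulary size, sequence
length, and layer widths are assumed values for illustration and are not those used in the study.

```python
import tensorflow as tf

VOCAB_SIZE = 5000   # assumed vocabulary size
SEQ_LEN = 10        # assumed number of tokens per input sequence

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 100),               # token embeddings
    tf.keras.layers.LSTM(150, return_sequences=True),         # "Model 1": first LSTM stage
    tf.keras.layers.LSTM(150),                                 # "Model 2": second LSTM stage
    tf.keras.layers.Dense(128, activation="relu"),             # ReLU stage on the LSTM output
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),   # next-word probabilities ("Model 3")
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```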
A. DATA PREPROCESSING
These are simple clean-up operations that make the information simpler to use in later phases. The
TensorFlow library is used to help manage this step.
The following pre-processing stages are usually performed (a short sketch follows the list):
1. Removing extra white space
2. Conversion to lower case
3. Eliminating numbers
4. Deleting punctuation
5. Deleting offensive words
6. Taking out foreign words
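
These steps can be expressed with a few standard string operations. The sketch below is illustrative
only; the offensive-word list is a placeholder, and foreign words are approximated here as non-ASCII
tokens.

```python
import re

OFFENSIVE_WORDS = {"badword1", "badword2"}    # placeholder list (assumed)

def clean_text(text: str) -> str:
    text = text.lower()                       # conversion to lower case
    text = re.sub(r"\d+", " ", text)          # eliminate numbers
    text = re.sub(r"[^\w\s]", " ", text)      # delete punctuation
    words = [w for w in text.split()          # collapse extra white space
             if w not in OFFENSIVE_WORDS      # delete offensive words
             and w.isascii()]                 # drop foreign (non-ASCII) words
    return " ".join(words)

print(clean_text("Hello, WORLD!! 123  café"))   # -> "hello world"
```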

B. TEXT ANALYSIS
Text analysis is the systematic process of reading and comprehending human-written text with
computational tools in order to obtain insights. Text analysis technology can categorise, sort, and
extract data from texts on its own to find patterns, connections, attitudes, and other useful information.
A term-document matrix function was used to create term matrices and obtain term-frequency counts,
in order to determine the rate of occurrence of words.
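
An analogous term-document matrix can be built in Python with scikit-learn's CountVectorizer. The
toy corpus below is assumed for illustration; the paper's own tooling may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [                                     # assumed toy corpus for illustration
    "the model predicts the next word",
    "the next word prediction model",
]
vectorizer = CountVectorizer()
tdm = vectorizer.fit_transform(corpus)         # documents x terms count matrix
print(vectorizer.get_feature_names_out())      # vocabulary terms
print(tdm.toarray())                           # per-document term frequencies
```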
C. TOKENIZATION
The practice of breaking up a large amount of text into tokens is known as tokenization. These
tokens serve as a good starting point for stemming and lemmatization and are particularly helpful for
identifying trends.
Tokenization is one of the essential text normalization techniques. It simply divides the continuously
flowing text into separate word units. One very simple method is to split the input on white space
and give each word its own identity.
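
With the TensorFlow/Keras tokenizer this amounts to the following sketch; the toy corpus and the
num_words cap are assumptions for illustration.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

corpus = ["he is a boy", "she is reading a book"]   # assumed toy corpus

tokenizer = Tokenizer(num_words=1000)    # keep at most the 1000 most frequent words
tokenizer.fit_on_texts(corpus)           # build the word index from the corpus
sequences = tokenizer.texts_to_sequences(corpus)

print(tokenizer.word_index)              # e.g. {'is': 1, 'a': 2, 'he': 3, ...}
print(sequences)                         # each sentence as a list of word ids
```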
D. PAD SEQUENCE
When converting texts to numerical values it remains challenging to give our neural network inputs
of comparable length, because not all sentences are the same length. The pad_sequences function is
used to truncate the longer sequences and to pad the shorter ones with zeroes.
Whether padding and truncation happen at the start or at the end of a sequence is controlled by the
padding and truncating arguments ('pre' or 'post'); by default, both truncation and padding occur at
the beginning of the sequence.
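
The corresponding Keras call can be sketched as follows; the maxlen, padding, and truncating values
are illustrative.

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[3, 1, 2, 4], [5, 1, 6, 2, 7]]      # toy token-id sequences (assumed)

# By default padding and truncating happen at the start ('pre');
# passing 'post' moves them to the end of each sequence instead.
padded = pad_sequences(sequences, maxlen=4, padding="pre", truncating="pre")
print(padded)
# [[3 1 2 4]
#  [1 6 2 7]]
```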

Figure 5: Proposed methodology

V. IMPLEMENTATION AND RESULTS


We have met the accuracy and prediction targets set at the initial stage of the project, as presented
below; the accuracy has also been plotted in graphical form.


Figure 6: Tokenization of words

The entire dataset is divided up into discrete word segments. A word index is created during
segmentation using a predetermined characteristic number of words.

Figure 7: Accuracy Level

When we train the model, the accuracy is found to be 92-96% per set of data.
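
An end-to-end illustration of this training and prediction loop, on an assumed toy corpus rather than
the paper's dataset, might look as follows; the sequence length, layer sizes, and seed text are all
illustrative assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

corpus = ["he is a boy", "she is reading a book", "he is reading a story"]   # assumed toy corpus

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1

# Build (prefix, next word) training pairs from every sentence prefix.
X, y = [], []
for seq in tokenizer.texts_to_sequences(corpus):
    for i in range(1, len(seq)):
        X.append(seq[:i])
        y.append(seq[i])
X = pad_sequences(X, maxlen=5)
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Embedding(vocab_size, 16),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=100, verbose=0)   # loss drops markedly within ~100 epochs

def predict_next_word(seed_text):
    seq = pad_sequences(tokenizer.texts_to_sequences([seed_text]), maxlen=5)
    probs = model.predict(seq, verbose=0)[0]                 # softmax over the vocabulary
    return tokenizer.index_word.get(int(np.argmax(probs)), "<unk>")

print(predict_next_word("he is a"))
```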


Figure 8: Accuracy check

Figure 9: Final result of the model


VI. CONCLUSION
On the basis of the presented dataset, the resulting predictive model is fairly accurate. NLP requires
multiple pattern-discovery techniques to be applied in order to remove noisy data. Within around one
hundred epochs the loss decreased greatly. Processing very large files or datasets still requires
considerable efficiency. To improve the model's predictions, however, additional preprocessing steps
and bounded model adjustments are frequently developed.
As a result, the LSTM-based keyword input prediction system developed in this study can be
successfully used in particular sectors and increase user input efficiency. The appropriate industry
language prediction model is created and integrated into the input system of the industry by
processing that industry's text data set and training on the preprocessed text using LSTM.
