
A Project Report

on
STOCK FORECASTING USING MACHINE LEARNING AND DEEP LEARNING
Submitted in partial fulfillment of the requirements for the Degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

By
P. RAJA KUMARI (19FE1A05B7)
M. PREM KUMAR (19FE1A0592)
M. PRAMEELA (20FE5A0508)
M. VENKAT SAI (19FE1A0581)

Under the guidance of

Ms. J. VIDYA, M.Tech


Assistant Professor
Department of Computer Science and Engineering

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


VIGNAN’S LARA INSTITUTE OF TECHNOLOGY & SCIENCE
(Affiliated to Jawaharlal Nehru Technological University Kakinada, Kakinada)
(An ISO 9001:2015 Certified Institution, Approved by AICTE)
NBA Accredited (CSE, IT, ECE, EEE & MECH)
Vadlamudi, Guntur Dist., Andhra Pradesh-522213.
April 2023
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
VIGNAN’S LARA INSTITUTE OF TECHNOLOGY & SCIENCE
(Affiliated to Jawaharlal Nehru Technological University Kakinada, Kakinada)
(An ISO 9001:2015 Certified Institution, Approved by AICTE)
NBA Accredited (CSE, IT, ECE, EEE & MECH)
Vadlamudi, Guntur Dist., Andhra Pradesh-522213.

CERTIFICATE

This is to certify that the project report entitled “STOCK FORECASTING USING MACHINE
LEARNING AND DEEP LEARNING” is a bonafide work done by P. RAJA KUMARI (19FE1A05B7),
M. PREM KUMAR (19FE1A0592), M. PRAMEELA (20FE5A0508), and M. VENKAT SAI (19FE1A0581)
under my guidance and submitted in partial fulfillment of the requirements for the degree of Bachelor of
Technology in COMPUTER SCIENCE AND ENGINEERING from JAWAHARLAL NEHRU
TECHNOLOGICAL UNIVERSITY KAKINADA, KAKINADA. The work embodied in this project
report has not been submitted to any other University or Institution.

Project Guide Head of the Department


Ms. J. VIDYA, M.Tech Dr. K. VENKATESWARA RAO, Ph.D.
Assistant Professor Professor

External Examiner
DECLARATION

We hereby declare that the project report entitled “STOCK FORECASTING USING MACHINE
LEARNING AND DEEP LEARNING” is a record of original work done by us under the guidance of
Ms. J. VIDYA, M.Tech, Assistant Professor of Computer Science and Engineering, and that this project
report is submitted in partial fulfillment of the requirements for the Degree of Bachelor of Technology in
Computer Science and Engineering. The results embodied in this project report have not been submitted
to any other University or Institute.

Signature
Project Members

P. RAJA KUMARI (19FE1A05B7)


M. PREM KUMAR (19FE1A0592)
M. PRAMEELA (20FE5A0508)
M. VENKAT SAI (19FE1A0581)

Place: Vadlamudi
Date:
ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of any task would be incomplete without
mentioning the people whose ceaseless cooperation made it possible, and whose constant guidance and
encouragement crown all efforts with success.

We are grateful to Ms. J. VIDYA, M.Tech, Assistant Professor, Department of Computer Science and
Engineering, for guiding us through this project and for encouraging us right from its beginning till its
successful completion. Every interaction with her was an inspiration.

We thank Dr. K. VENKATESWARA RAO, Professor & HOD, Department of Computer Science and
Engineering, for his support and valuable suggestions.

We also express our thanks to Dr. K. PHANEENDRA KUMAR, Principal, Vignan’s Lara Institute of
Technology & Science for providing the resources to carry out the project.

We also express our sincere thanks to our beloved Chairman, Dr. LAVU RATHAIAH, for providing
support and a stimulating environment for developing the project.

We also express our gratitude to all other teaching staff and lab technicians for their constant support and
advice throughout the project.

Project Members
P. RAJA KUMARI (19FE1A05B7)
M. PREM KUMAR (19FE1A0592)
M. PRAMEELA (20FE5A0508)
M. VENKAT SAI (19FE1A0581)
LIST OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER-1: INTRODUCTION
1.1 Stock Market Prediction
1.2 Comparison of Various Learning Classifiers
1.3 Machine Learning
1.4 Machine Learning Methods
1.5 Applications of Machine Learning
1.6 Deep Learning
1.7 Applications of Deep Learning
1.8 KNN Algorithm
1.9 SVM Algorithm
1.10 Gradient Boosting Algorithm
1.11 LSTM Algorithm

CHAPTER-2: LITERATURE SURVEY
2.1 Journals
2.2 Existing System
2.3 Limitations of Existing Methods

CHAPTER-3: PROPOSED METHOD
3.1 Proposed System
3.1.1 KNN
3.1.2 SVM
3.1.3 LSTM
3.1.4 Gradient Boosting

CHAPTER-4: SYSTEM DESIGN
4.1 System Architecture
4.2 Feasibility Study
4.2.1 Economical Feasibility
4.2.2 Technical Feasibility
4.2.3 Social Feasibility
4.3 System Requirement Specification
4.4 Requirement Analysis
4.4.1 Requirement Specification
4.4.2 Hardware & Software Requirements
4.5 Data Flow Diagram
4.6 UML Diagrams
4.6.1 Use Case Diagram
4.6.2 Class Diagram
4.6.3 Object Diagram
4.6.4 Sequence Diagram
4.6.5 Activity Diagram
4.6.6 State Chart Diagram
4.6.7 Interaction Overview

CHAPTER-5: DEPLOYMENT

CHAPTER-6: SYSTEM TESTING
6.1 Testing
6.2 Types of Tests
6.2.1 Unit Testing
6.2.2 Integration Testing
6.2.3 Functional Testing
6.2.4 System Testing
6.2.5 White Box Testing
6.2.6 Black Box Testing
6.3 Verification and Validation

CHAPTER-7: DATA SETS & RESULTS
7.1 Dataset
7.2 Results

CHAPTER-8: CONCLUSION AND FUTURE WORK

REFERENCES

APPENDIX
ABSTRACT

Stock forecasting has always been a serious obstacle for advisors in financial statistics, as well as for those
who want to engage in the stock market but are indecisive because of the uncertainty of stock volatility;
accordingly, our analysis centers on observing the changes in stock prices and trying to predict them. Four
artificial intelligence algorithms are presented in this paper to anticipate stock market changes using historical
data. The proposed machine learning and deep learning approaches include k-nearest neighbor (KNN),
gradient boosting, support vector machine (SVM), and long short-term memory (LSTM). Plain KNN bases
its assumptions on the nearest neighbors at the centroid of the data points for test instances; this approach
excludes the non-centric data points, which can be statistically significant in the problem of predicting stock
price trends. For this, it is necessary to construct an enhanced model that integrates KNN with a probabilistic
method that utilizes both centric and non-centric data points in the computations of probabilities for the
target instances. Yahoo Finance is employed to obtain the documented data; we gathered the results over
several enterprise data sets, evaluated the overall performance, and discovered that the LSTM algorithm
was the finest predictor.
LIST OF FIGURES

1.1 Artificial Intelligence
1.2 Machine Learning Models
1.3 Supervised Learning
1.4 Unsupervised Learning
1.5 KNN Classifier
1.6 Example of KNN
1.7 SVM Model
1.8 Gradient Boosting Model
1.9 LSTM Model
3.1 New Data Point
3.2 New Data Point to Required Category
3.3 SVM Has Two Data Sets (Blue and Green)
3.4 SVM Hyperplane
3.5 SVM Optimal Hyperplane
3.6 Non-Linear SVM
3.7 Non-Linear SVM (Adding Third Dimension)
3.8 LSTM Simple Neural Network for Classification
3.9.1 LSTM Simple Neural Network for Regression
3.9.2 LSTM Neural Network for Video Classification
3.9.3 Gradient Boosting Architecture
4.1 System Architecture
4.2 Data Flow Diagram
4.3 UML Diagrams
4.4 Use Case Diagram
4.5 The Priority Checking
4.6 Class Diagram
4.7 Object Diagram
4.8 Sequence Diagram
4.9.1 Activity Diagram
4.9.2 State Chart Diagram
4.9.3 Interaction Diagram
7.1 Debt and Equity
7.2 Cash Flow
7.3 Return on Equity
7.4 Price Trend
7.5 Asset Turnover
7.6 Data Set
7.2.1 Home Page
7.2.2 Download Data Set
7.2.3 Correlation for Data
7.2.4 Data Pre-Processing
7.2.5 KNN with Uniform Weights
7.2.6 KNN with Distance Weights
7.2.7 MSE Value of SVM
7.2.8 Run SVM Algorithm
7.2.9 Run Gradient Boosting Algorithm
7.2.10 MSE Value of Gradient Boosting Algorithm
7.2.11 Run LSTM Algorithm
7.2.12 MSE of LSTM Algorithm
7.2.13 Predict the Test Data
7.2.14 Plotting the Prediction for KNN with Distance Weights
7.2.15 Plotting the Prediction for KNN with Uniform Weights
LIST OF ABBREVIATIONS

SVM Support Vector Machine
KNN K-Nearest Neighbor
LSTM Long Short-Term Memory
NNS Nearest Neighbor Search
FLANN Fast Library for Approximate Nearest Neighbors
AI Artificial Intelligence
OCR Optical Character Recognition
ANN Artificial Neural Networks
DL Deep Learning
GBM Gradient Boosting Machine
RNN Recurrent Neural Network
RSI Relative Strength Index
MACD Moving Average Convergence/Divergence Indicator
LR Logistic Regression
RF Random Forest
GUI Graphical User Interface
UCD Use Case Diagram
UML Unified Modelling Language
SysML Systems Modelling Language
DE Debt Equity
CFF Cash Flow from Financing
CED Cash Flow from Issuing Equity (or) Debt
RP Repurchase of Debt and Equity
ROE Return on Equity
MSE Mean Squared Error
CHAPTER 1
INTRODUCTION
1.1 Stock Market Prediction:
Stock prediction is one of the most important problems in the economy. The movement
of the stock market is not clear due to fluctuations in its deciding variables. This study intends
to reduce the exposure associated with trend prediction by employing an artificial
intelligence approach. Stock exchange forecasting remains an opaque and pragmatic
trade; it is unusual for someone to intentionally give away an effective strategy. The main ambition
of this project is to build up experimental awareness of stock exchange prediction. By
understanding how markets behave, stockholders may be able to avert another economic
disaster. This study provides a quantitative and mathematical assessment of cutting-edge
technology. For this prediction, we use four forecasting models: K-Nearest
Neighbor (KNN), Gradient Boosting, Support Vector Machine (SVM), and Long Short-
Term Memory (LSTM). We use these algorithms to anticipate stock movement and
forthcoming values, which will help people invest their wealth for more gain and value a stock
more accurately; using these algorithms to predict stock market trends will also help business
growth and the country’s economy. In this work, we compare all the algorithms with each other
and determine the more applicable algorithm for enormous data sets as well as for limited data
sets, using a combined stock market data set. We have collected these stock market data sets
from Yahoo Finance.
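As an illustration of this data collection step, the snippet below pulls historical prices from Yahoo
Finance. It is a minimal sketch assuming the third-party yfinance Python package (the report names
only Yahoo Finance as the source, not a specific library), with AAPL and the date range as
illustrative placeholders:

# Sketch: fetching historical stock data from Yahoo Finance.
# Assumes the third-party "yfinance" package (pip install yfinance);
# the ticker "AAPL" and the date range are illustrative placeholders.
import yfinance as yf

data = yf.download("AAPL", start="2018-01-01", end="2023-01-01")
print(data.head())          # Open, High, Low, Close, Adj Close, Volume
data.to_csv("stock.csv")    # persist for the preprocessing step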

Analyzing financial data in securities has been an important and challenging issue in the
investment community. Stock price efficiency for publicly listed firms is difficult to achieve due
to the opposing effects of information competition among major investors and the adverse
selection costs imposed by their information advantage. There are two main schools of thought
in analyzing the financial markets. The first approach is known as fundamental analysis. The
methodology used in fundamental analysis evaluates a stock by measuring its intrinsic value
through qualitative and quantitative analysis. This approach examines a company’s financial
reports, management, industry, micro and macro-economic factors. The second approach is
known as technical analysis. The methodology used in technical analysis for forecasting the
direction of prices is through the study of historical market data. Technical analysis uses a variety
of charts to anticipate what is likely to happen. The stock charts include candlestick charts, line
charts, bar charts, point and figure charts, OHLC (open-high-low-close) charts, and mountain
charts. The charts are viewable in different time frames with price and volume. There are many
types of indicators used in the charts, including resistance, support, breakout, trending and
momentum.

Several alternatives to approach this type of problem have been proposed, which range
from traditional statistical modelling to methods based on computational intelligence and
machine learning. Vanstone and Tan surveyed the works in the domain of applying soft
computing to financial trading and investment. They categorized the papers reviewed in the
following areas: time series, optimization, hybrid methods, pattern recognition and classification.
Within the context of financial trading discipline, the survey showed that most of the research
was being conducted in the field of technical analysis. An integrated fundamental and technical
analysis model was examined to evaluate stock price trends by focusing on macroeconomic
analysis. It also analyzed the company’s behaviour and the associated industry within the overall
economy, which in turn provides more information for investors in their investment decisions. A nearest neighbor
search (NNS) method produced an intended result by the use of KNN technique with technical
analysis. This model applied technical analysis on stock market data which include historical
price and trading volume. It applied technical indicators made up of stop loss, stop gain and RSI
filters. The KNN algorithm part applied the distance function on the collected data. This model
was compared with the buy-and-hold strategy by using the fundamental analysis approach.

Fast Library for Approximate Nearest Neighbours (FLANN) is used to perform searches by
choosing the algorithm found to work best among a collection of algorithms in its library.
Majhi et al. examined the FLANN model to predict the S&P 500 indices, and the FLANN model
was established by performing fast approximate nearest neighbour searches in high-dimensional
spaces.

1.2 Comparison of various learning classifiers:


When developing a classifier using various functions from different classifiers, it is
important to compare the performances of the classifiers. Simulation results can provide us with
direct comparison results for the classifiers with a statistical analysis of the objective functions.
The hybrid KNN-probabilistic model was compared with the supervised learning and
classification algorithms, including KNN, LSTM, Gradient Boosting, and SVM.

1.3 Machine Learning:
Machine learning is a subfield of artificial intelligence (AI). The goal of machine learning
generally is to understand the structure of data and fit that data into models that can be understood
and utilized by people. Although machine learning is a field within computer science, it differs

from traditional computational approaches. In traditional computing, algorithms are sets of
explicitly programmed instructions used by computers to calculate or solve problems.

Machine learning algorithms instead allow computers to train on data inputs
and use statistical analysis in order to output values that fall within a specific range. Because of
this, machine learning facilitates computers in building models from sample data in order to
automate decision-making processes based on data inputs.

FIG 1.1 Artificial Intelligence

Any technology user today has benefitted from machine learning. Facial
recognition technology allows social media platforms to help users tag and share photos of friends. Optical
character recognition (OCR) technology converts images of text into movable type.

Machine learning is a continuously developing field. Because of this, there are some
considerations to keep in mind as you work with machine learning methodologies, or analyze the impact
of machine learning processes. We’ll look into the common machine learning methods of supervised and
unsupervised learning, and common algorithmic approaches in machine learning, including the k-nearest
neighbor algorithm, decision tree learning, and deep learning.

1.4 Machine Learning Methods:


Two of the most widely adopted machine learning methods are:

1.Supervised learning

2.Unsupervised learning

FIG 1.2 Machine Learning Models

1. Supervised learning:

In supervised learning, the computer is provided with example inputs that are labeled with their
desired outputs. The purpose of this method is for the algorithm to be able to “learn” by comparing its
actual output with the “taught” outputs to find errors, and modify the model accordingly. Supervised
learning therefore uses patterns to predict label values on additional unlabeled data.

FIG 1.3 Supervised Learning

A common use case of supervised learning is to use historical data to predict statistically likely
future events. It may use historical stock market information to anticipate upcoming fluctuations, or be

employed to filter out spam emails. In supervised learning, tagged photos of dogs can be used as input
data to classify untagged photos of dogs.

2. Unsupervised Learning:
In unsupervised learning, data is unlabeled, so the learning algorithm is left to find commonalities
among its input data. As unlabeled data are more abundant than labeled data, machine learning methods
that facilitate unsupervised learning are particularly valuable.

The goal of unsupervised learning may be as straightforward as discovering hidden patterns


within a dataset, but it may also have a goal of feature learning, which allows the computational machine
to automatically discover the representations that are needed to classify raw data. Unsupervised learning
is commonly used for transactional data.

FIG 1.4 Unsupervised Learning

Without being told a “correct” answer, unsupervised learning methods can look at complex
data that is more expansive and seemingly unrelated in order to organize it in potentially meaningful
ways. Unsupervised learning is often used for anomaly detection including for fraudulent credit card
purchases, and recommender systems that recommend what products to buy next.

1.5 Applications of Machine learning:

1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, digital images, etc. A popular use case of image recognition and face
detection is automatic friend tagging suggestion: Facebook provides us a feature of auto friend
tagging suggestions. Whenever we upload a photo with our Facebook friends, we automatically get
a tagging suggestion with names, and the technology behind this is machine learning's face
detection and recognition algorithm. It is based on the Facebook project named "Deep Face," which
is responsible for face recognition and person identification in the picture.

2. Speech Recognition:

While using Google, we get an option of "Search by voice," it comes under speech recognition,
and it's a popular application of machine learning. Speech recognition is a process of converting voice
instructions into text, and it is also known as "Speech to text", or "Computer speech recognition." At
present, machine learning algorithms are widely used by various applications of speech
recognition. Google assistant, Siri, Cortana, and Alexa are using speech recognition technology to
follow the voice instructions.

3. Traffic prediction:

If we want to visit a new place, we take the help of Google Maps, which shows us the correct path
with the shortest route and predicts the traffic conditions.

It predicts the traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested,
in two ways:

o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time

Everyone who uses Google Maps is helping to make the app better. It takes information from the
user and sends it back to its database to improve performance.

4. Product recommendations:

Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendations to the user. Whenever we search for some
product on Amazon, we start getting advertisements for the same product while surfing the internet
on the same browser, and this is because of machine learning. Google understands the user's interests
using various machine learning algorithms and suggests products as per customer interest. Similarly,
when we use Netflix, we find recommendations for entertainment series, movies, etc., and this is also
done with the help of machine learning.

5. Stock Market trading:

Machine learning is widely used in stock market trading. In the stock market, there is always a
risk of ups and downs in share prices, so machine learning's long short-term memory neural
network is used for the prediction of stock market trends.

1.6 DEEP LEARNING


Deep learning technology is based on artificial neural networks (ANNs). These ANNs constantly
receive learning algorithms and continuously growing amounts of data to increase the efficiency of
training processes. The larger the data volumes are, the more efficient this process is. The training
process is called deep because, as time passes, a neural network covers a growing number of levels.
The deeper this network penetrates, the higher its productivity is. DL algorithms can create new tasks
to solve current ones.

Advantages of Deep Learning

Creating New Features
One of the main benefits of deep learning over various machine learning algorithms is its
ability to generate new features from a limited series of features located in the training dataset.
Therefore, deep learning algorithms can create new tasks to solve current ones. What does this mean
for data scientists working in technological startups? Since deep learning can create features without
human intervention, data scientists can save much time on working with big data by relying on this
technology. It allows them to use more complex sets of features in comparison with traditional
machine learning software.

Advanced Analysis
Due to its improved data processing models, deep learning generates actionable results
when solving data science tasks. While traditional machine learning typically works with labelled
data, deep learning also supports unsupervised learning techniques that allow the system to become
smarter on its own. The capacity to determine the most important features allows deep learning to
efficiently provide data scientists with concise and reliable analysis results.

DEEP LEARNING CHALLENGES

Deep learning is an approach that models human abstract thinking (or at least represents
an attempt to approach it) rather than using it. However, this technology has a set of significant
disadvantages despite all its benefits.

Continuous Input Data Management

In deep learning, the training process is based on analyzing large amounts of data.
However, fast-moving and streaming input data provide little time for ensuring an efficient
training process. That is why data scientists have to adapt their deep learning algorithms so that
neural networks can handle large amounts of continuous input data.

Ensuring Conclusion Transparency

Another important disadvantage of deep learning software is that it is incapable of
providing arguments for why it has reached a certain conclusion. Unlike in the case of traditional
machine learning, you cannot follow an algorithm to find out why your system has decided that
a picture shows a cat, not a dog. To correct errors in DL algorithms, you have to revise the
whole algorithm.

Resource-Demanding Technology

Deep learning is a quite resource-demanding technology. It requires powerful
GPUs (high-performance graphics processing units), large amounts of storage to train the models,
etc. Furthermore, this technology needs more time to train in comparison with traditional
machine learning.

Lack of flexibility

Despite the occasional warnings of AI taking over the world, deep learning algorithms
are pretty simple in their nature. In order to solve a given problem, a deep learning network
needs to be provided with data describing that specific problem, thus rendering the algorithm
ineffective to solve any other problems. This is true no matter how similar they are to the original
problem.

1.7 APPLICATIONS OF DEEP LEARNING:

Self-Driving Cars:
A self-driving car captures images of its surroundings and processes a huge amount of data,
and then decides which actions to take: turn left or right, or stop. Deciding its actions in this way
will further reduce the accidents that happen every year.
Voice Controlled Assistance:
When we talk about voice-controlled assistance, Siri is the first thing that comes to mind.
You can tell Siri whatever you want it to do for you, and it will search for it and display it for
you.
Automatic Image Caption Generation:
Whatever image you upload, the algorithm works in such a way that it generates a caption
accordingly. If you say blue colored eye, it will display a blue-colored eye with a caption at the
bottom of the image.
Automatic Machine Translation:
With the help of automatic machine translation, we are able to convert one language into another
with the help of deep learning.
Advantages:
o It lessens the need for feature engineering.
o It eradicates all those costs that are needless.
o It easily identifies difficult defects.
o It results in the best-in-class performance on problems.

Disadvantages:
o It requires an ample amount of data.
o It is quite expensive to train.
o It does not have strong theoretical groundwork.

1.8 KNN Algorithm:


• K-Nearest Neighbour is one of the simplest machine learning algorithms, based
on the supervised learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases
and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a well-suited
category by using the K-NN algorithm.
• The K-NN algorithm can be used for regression as well as for classification, but mostly it is
used for classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead it stores the dataset, and at the time of classification, it performs an
action on the dataset.
• The KNN algorithm at the training phase just stores the dataset, and when it gets new data,
it classifies that data into a category that is most similar to the new data.

Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we
want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it
works on a similarity measure. Our KNN model will find the features of the new image that are similar
to the cat and dog images, and based on the most similar features, it will put it in either the cat or the
dog category.

FIG 1.5 KNN Classifier

Suppose there are two categories, Category A and Category B, and we have a new data point x1:
in which of these categories will this data point lie? To solve this type of problem, we need a K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a particular data
point. Consider the below diagram:

FIG 1.6 Example of KNN


1.9 SVM Algorithm:

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which
is used for Classification as well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category in the
future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are
called support vectors, and hence the algorithm is termed the Support Vector Machine. Consider the below
diagram in which there are two different categories that are classified using a decision boundary or
hyperplane:

EXAMPLE: SVM can be understood with the example that we used for the KNN classifier.
Suppose we see a strange cat that also has some features of dogs; if we want a model that can
accurately identify whether it is a cat or a dog, such a model can be created by using the SVM
algorithm. We will first train our model with lots of images of cats and dogs so that it can learn about
the different features of cats and dogs, and then we test it with this strange creature. The SVM creates
a decision boundary between these two classes (cat and dog) and chooses the extreme cases (support
vectors) of each class. On the basis of the support vectors, it will classify it as a cat.
Consider the below diagram:

FIG 1.7 SVM Model

1.10 Gradient Boosting Algorithm

Gradient Boosting Machine (GBM) is one of the most popular forward learning ensemble
methods in machine learning. It is a powerful technique for building predictive models for regression
and classification tasks.

GBM helps us to get a predictive model in the form of an ensemble of weak prediction models such as
decision trees. Whenever a decision tree performs as the weak learner, the resulting algorithm is called
gradient-boosted trees.

It enables us to combine the predictions from various learner models and build a final predictive model
with more correct predictions.

FIG 1.8 Gradient Boosting Model

1.11 LSTM Algorithm:

Long Short-Term Memory is a kind of recurrent neural network. In an RNN, the output from the last
step is fed as input to the current step. LSTM was designed by Hochreiter & Schmidhuber. It tackled
the problem of long-term dependencies in RNNs, where an RNN cannot predict a word stored in
long-term memory but can give more accurate predictions from recent information. As the gap
length increases, an RNN does not give efficient performance. LSTM can by default retain
information for a long period of time. It is used for processing, predicting, and classifying on the basis
of time-series data.
Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) that is
specifically designed to handle sequential data, such as time series, speech, and text. LSTM networks
are capable of learning long-term dependencies in sequential data, which makes them well suited for
tasks such as language translation, speech recognition, and time series forecasting.

A traditional RNN has a single hidden state that is passed through time, which can make it
difficult for the network to learn long-term dependencies. LSTMs address this problem by introducing
a memory cell, which is a container that can hold information for an extended period of time. The
memory cell is controlled by three gates: the input gate, the forget gate, and the output gate. These
gates decide what information to add to, remove from, and output from the memory cell.
The input gate controls what information is added to the memory cell. The forget gate controls
what information is removed from the memory cell. And the output gate controls what information is
output from the memory cell. This allows LSTM networks to selectively retain or discard information
as it flows through the network, which allows them to learn long-term dependencies.
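In equation form, the gate updates just described are (the standard LSTM formulation; σ is the logistic
sigmoid, ⊙ element-wise multiplication, x_t the current input, h_{t-1} the previous hidden state, and
c_t the memory cell):

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)        (input gate)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)        (forget gate)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)        (output gate)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t ⊙ tanh(c_t)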

LSTM networks find useful applications in the following areas:

• Language modeling
• Machine translation
• Handwriting recognition
• Image captioning
• Image generation using attention models
• Question answering
• Video-to-text conversion
• Polyphonic music modeling
• Speech synthesis
• Protein secondary structure prediction

FIG 1.9 LSTM Model

CHAPTER 2

LITERATURE SURVEY

2.1 JOURNALS:

PAPER-1: B. Vijayakumar et al., in 2020, calculated results for all the different algorithms and found
that random forest algorithms are the best and most suitable for prediction purposes [1].

PAPER-2: Sneh Kalra et al., in 2019, conducted research on the fluctuation of stock prices in contrast
to the corresponding company's news reports. The number of positive and negative sentiments in news
articles for each day, in addition to the variability of the preceding day's closing price, was utilized for
the forecast [2].

PAPER-3: Aditya Menon et al., in 2019, focused on the analysis of a Bayesian neural model for
forecasting stock performance, prioritizing LSTM techniques for forecasting financial outcomes in line
with current phenomena [3].

PAPER-4: Ashish Sharma et al., in 2017, performed an assessment of regression approaches for stock
prediction utilizing stock market data and determined that regression analysis is employed in the bulk
of stock market trend estimates. In the future, more variables may be employed to enhance
outcomes [4].

PAPER-5: Mu-Yen Chen et al., in 2019, used the long short-term memory (LSTM) deep learning
technique to compute the effect of news stories on stock prices. The researchers believe this research
can forecast stock market trends [5].

PAPER-6: Andrea Picasso et al., in 2019, through a variety of applications and automation techniques,
aimed to combine technical and fundamental analyses for market trend prediction. The sentiment of a
news story is used as input data. Their analysis indicates that using news as the only source of
information is the most troublesome assignment. A significant-feature approach may be appropriate in
the future to solve this issue [6].

PAPER-7: Gangadhar Shobha et al., in 2018, gave a comprehensive introduction to machine learning
strategies that would aid readers in using their equations and concepts. The author addressed three
types of machine learning methodology as well as many types of metrics, including accuracy,
confusion matrix, recall, RMSE, precision, and percentage of errors. Since many individuals are unsure
whether to employ machine learning techniques for prediction or other purposes, the author believes
that this overview might be helpful to those who are new to the field [7].

PAPER-8: Suryoday Basak et al., in 2018, constructed an experimental framework to forecast stock
prices. In this project, the authors utilize two algorithms, a random forest classifier and gradient-boosted
decision trees, and they obtained greater accuracy in comparison to previous research papers. They
might utilize the built-in boosted tree model for short-term data windows in the future [8].

PAPER-9: Arash Negahdari Kia et al., in 2018, noted that, as with stock predictions, several
experiments and models have been created for historical data prediction purposes, such as the HyS3
graph-based semi-supervised model presented in this research, built using the Kruskal-based ConKruG
network graph method. They believe that in the future, social media and Twitter data might be utilized
to predict stocks more accurately using these algorithms [9].

PAPER-10: Bruno Miranda et al., in 2019, worked on the prediction of financial market values by
using the support vector machine (SVM). A survey of bibliographic techniques that concentrate on
text areas for research is also examined [10].

PAPER-11: K. Hiba Sadia et al., in 2019, recommended the random forest algorithm as the finest
algorithm for upcoming stock forecasting. For future work, they believe that adding more variables can
improve outcomes by increasing accuracy and efficiency [11].

PAPER-12: A. Akash et al., in 2019, proposed a technique that uses LS-SVM with the best unbounded
parameter to improve result accuracy by reducing overfitting, along with some technical indicators.
The author also compares the recommended method to an artificial neural network [12].

PAPER-13: Aparna Nayak et al., in 2016, predicted data based on daily live data fetched directly by
the program from the Yahoo Finance website, as well as monthly-based prediction, with daily live
prediction providing a better result than monthly prediction. For future work, they believe that
considering more sentiments in the monthly prediction would be beneficial [13].

PAPER-14: Nuno Oliveira et al., in 2016, wanted to develop a mechanism for stock prediction from
messaging-service data, for stock prices and return indices, as well as other measures such as trading
strategies. They used a large number of Tweets for their analysis. As a result, they determined that
Twitter and blog data were important for forecasting purposes. This conclusion might be expanded by
using additional data sources, such as online site databases and others [14].

PAPER-15: Han Lock Siew et al., in 2017, employed a regression approach in this work to determine
the accuracy of forecasting a stock trend. To carry out this experiment, they used WEKA software,
which is used for data mining and machine learning techniques, as well as a dataset that comprises
heterogeneous values and is used for managing currency values and financial ratios. For forecasting
purposes, the dataset used to calculate stock movement was obtained from Bursa Malaysia. The authors
believed that having a more consistent ordinal data format might enhance forecasts using the regression
approach in the future [15].

PAPER-16: Smruti Rekha Das et al., in 2019, used historical stock prices as the input dataset in this
project; the obtained dataset was converted by applying suitable numerical equations, and the
backpropagation algorithm, Bayesian networks, and two further approaches were used for prediction,
forecasting the future on the time scale of trading days, such as one day ahead. More parameters could
be added to the developed algorithms in the future to produce more precise outcomes [16].

PAPER-17: Dattatray P. Gandhmal et al., in 2019, reviewed the prominent approaches for the
prediction of stock markets, including KNN, SVM, SVR, and many more. For the objective of using
past data, these approaches could be most beneficial. In the future, the researchers will analyze other
studies to determine the finest method for estimation [17].

2.2 EXISTING SYSTEM:

There are various existing systems for stock forecasting, ranging from simple technical analysis tools to
complex machine learning algorithms. Here are some examples:

1. Moving Averages: Moving averages are a widely used technical analysis tool for stock forecasting.
They are calculated by taking the average price of a stock over a certain period of time, and are used to
identify trends and potential price reversals (a short pandas sketch of this and the RSI follows at the end
of this section).
2. Relative Strength Index (RSI): The RSI is a momentum oscillator that measures the speed and
change of price movements. It is used to identify overbought or oversold conditions in a stock, which
can indicate potential price reversals.

3. MACD: The Moving Average Convergence Divergence (MACD) is a technical analysis tool that
combines moving averages with a momentum oscillator. It is used to identify trend reversals and
potential entry and exit points for trades.

4. Artificial Neural Networks (ANNs): ANNs are a type of machine learning algorithm that can
be used for stock forecasting. They are trained on historical stock data and can identify patterns and
relationships that may not be apparent to human analysts.

5. Support Vector Machines (SVMs): SVMs are another type of machine learning algorithm that
can be used for stock forecasting. They are based on the idea of finding the best hyperplane that separates
data into different classes, and can be used to predict stock prices based on historical data.

6. Long Short-Term Memory (LSTM) Networks: LSTMs are a type of neural network that
are particularly well-suited for time-series data, such as stock prices. They can learn long-term
dependencies in the data and can be used for forecasting future stock prices.

Overall, there are many existing systems and tools for stock forecasting, and the choice of which to use
will depend on the specific needs and goals of the user.
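To make the first two tools above concrete, here is a brief pandas sketch computing a simple moving
average and a basic RSI. It is an illustration only: the closing-price series, the 20-day average, and the
14-day RSI window are conventional assumptions, not values this report prescribes.

import pandas as pd

def sma(close: pd.Series, window: int = 20) -> pd.Series:
    # Simple moving average: mean closing price over the window.
    return close.rolling(window).mean()

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    # Basic RSI: average gain vs. average loss, scaled to 0-100.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    return 100 - 100 / (1 + gain / loss)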

2.3 LIMITATIONS OF EXISTING SYSTEM:

While there are many existing systems and tools for stock forecasting, there are also several limitations
to these systems:

Data Quality: The accuracy of stock forecasting systems is highly dependent on the quality and
accuracy of the data used. Historical stock data can be incomplete, inaccurate, or biased, which can
affect the accuracy of the forecasting results.

Complexity: Some stock forecasting systems are very complex and require a high level of expertise
to use effectively. This can limit their accessibility to users who do not have the necessary skills or
resources.

Overfitting: Machine learning algorithms can be prone to overfitting, which occurs when a model is
too complex and fits the training data too closely, resulting in poor generalization performance when
applied to new data.

Market Volatility: The stock market can be highly volatile, and sudden changes in market conditions
can make it difficult to accurately predict future stock prices.

Human Factors: Stock prices are influenced by a wide range of factors, including political events,
economic conditions, and investor sentiment, which can be difficult to predict with accuracy. This can
limit the effectiveness of even the most sophisticated forecasting systems.

Lack of Transparency: Some stock forecasting systems, particularly those based on machine
learning algorithms, can be difficult to interpret and understand. This can make it difficult to identify
the underlying factors driving the forecasting results and can limit the ability of users to adjust or
improve the system.

CHAPTER 3

PROPOSED SYSTEM

3.1 PROPOSED SYSTEM:

Predicting the stock market has been the bane and goal of investors since its existence.
Every day billions of dollars are traded on the exchange, and behind each dollar is an investor hoping
to profit in one way or another. Entire companies rise and fall daily based on the behaviour of the
market. Should an investor be able to accurately predict market movements, it offers a tantalizing
promise of wealth and influence. It is no wonder, then, that the stock market and its associated
challenges find their way into the public imagination every time it misbehaves. The 2008 financial
crisis was no different, as evidenced by the flood of films and documentaries based on the crash. If
there was a common theme among those productions, it was that few people knew how the market
worked or reacted. Perhaps a better understanding of stock market prediction might help in the case of
similar events in the future.

Tehran’s stock market has been greatly popular lately due to the remarkable growth of the main index
in the last decade. The important reason behind that is the privatization of most of the state-owned
firms under the general policies of article 44 of the Iranian constitution. The shares of lately privatized
firms can be bought by ordinary people under particular conditions. The market has some special
features compared to other countries’ stock markets; for example, a dealing price limitation of ±5% of
the opening price for every index in each trading day. This hampers the spread of market shocks,
irregular market fluctuations, political matters, etc. over a particular time and could make the market
smoother. However, the effect of fundamental parameters on the market is considerable, and the task
of predicting future movements is not easy [23]. This study employed stock market groups (that are
important for traders) to investigate the task of predicting future trends. In spite of remarkable progress
in the Tehran stock market in the recent decade, there have not been adequate papers on stock price
predictions and trends via novel machine learning algorithms. However, a paper has been published
recently by Nabipour et al. [23], where they employed tree-based models and deep learning algorithms
to estimate future stock prices from 1 day ahead to 30 days ahead as a regression problem. The
experimental results indicated that LSTM (as the superior model) could successfully predict values
(from the Tehran Stock Exchange) with the lowest error.

In this research, we concentrate on comparing the prediction performance of nine machine learning
models (Decision Tree, Random Forest, Adaboost, XGBoost, SVC, Naïve Bayes, KNN, Logistic
Regression and ANN) and two deep learning methods (RNN and LSTM) to predict stock market
movement. Ten technical indicators are utilized as inputs to our models. Our study includes two
different approaches for inputs, continuous data and binary data, to investigate the effect of
preprocessing; the former uses stock trading data (open, close, high and low values), while the latter
employs a preprocessing step to convert the continuous data to binary data. Each technical indicator
has its specific possibility of up or down movement based on inherent market properties. The
performance of the mentioned models is compared for both approaches with three classification
metrics, and the best tuning parameter for each model (except Naïve Bayes and Logistic Regression)
is reported. All experimental tests are done with ten years of historical data of four stock market groups
(petroleum, diversified financials, basic metals and non-metallic minerals), which are totally crucial for
investors, from the Tehran stock exchange. We believe that this study is a new research paper that
incorporates multiple machine learning and deep learning methods to improve the prediction task of
stock groups’ trend and movement.

3.1.1 KNN ARCHITECTURE:

KNN (k-nearest neighbors) is a machine learning algorithm used for classification and regression tasks.
It is a non-parametric and lazy learning algorithm, meaning it does not make any assumptions about the
underlying data distribution and it does not explicitly learn a model during training.

The architecture of the KNN algorithm is relatively simple. Given a training dataset of input features
and corresponding target labels, the algorithm operates as follows:

1. During the training phase, the algorithm simply stores the input features and their corresponding labels.

2. During the testing phase, the algorithm takes an input feature and identifies the k-nearest neighbors to
that feature in the training dataset, based on a specified distance metric (e.g., Euclidean distance).

3. For classification tasks, the algorithm assigns the majority label of the k-nearest neighbors to the input
feature as its predicted label.

4. For regression tasks, the algorithm assigns the average value of the target labels of the k-nearest
neighbors to the input feature as its predicted value.

The KNN algorithm can be easily implemented and has a low computational cost during training, as it
only stores the training data. However, its performance can be affected by the choice of the k value,
which determines the number of neighbors used to make predictions. Additionally, the algorithm can be
sensitive to the choice of distance metric and can be affected by the curse of dimensionality (i.e., the
performance can degrade as the number of input features increases).

The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of the neighbors


o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these k neighbors, count the number of the data points in each category.
o Step-5: Assign the new data points to that category for which the number of the neighbor is
maximum.
o Step-6: Our model is ready.

Suppose we have a new data point and we need to put it in the required category. Consider the below
image:

FIG 3.1 New Data Point

o Firstly, we will choose the number of neighbors, so we will choose the k=5.

o Next, we will calculate the Euclidean distance between the data points. The Euclidean distance
is the distance between two points, which we have already studied in geometry. It can be
calculated as: d = √((x2 − x1)² + (y2 − y1)²)

o By calculating the Euclidean distance we got the nearest neighbors, as three nearest neighbors in
category A and two nearest neighbors in category B. Consider the below image:

FIG 3.2 New Data Point To Required Category

As we can see the 3 nearest neighbors are from category A, hence this new data point must belong to
category A.

3.1.2 SVM ARCHITECTURE:

Linear SVM:

The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset
that has two tags (green and blue), and the dataset has two features x1 and x2. We want a classifier that
can classify the pair(x1, x2) of coordinates in either green or blue. Consider the below image:

FIG 3.3 SVM has two data sets (blue and green)

As it is a 2-D space, just by using a straight line we can easily separate these two classes. But there
can be multiple lines that can separate these classes. Consider the below image:

FIG 3.4 SVM Hyperplane

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region
is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes.
These points are called support vectors. The distance between the vectors and the hyperplane is called
the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin
is called the optimal hyperplane.

FIG 3.5 SVM Optimal Hyperplane

Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we
cannot draw a single straight line. Consider the below image:

FIG 3.6 Non-Linear SVM

So to separate these data points, we need to add one more dimension. For linear data, we have used
two dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated
as: z = x² + y²

By adding the third dimension, the sample space will become as below image:

FIG 3.7 Non-Linear SVM (adding third dimension)
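The scikit-learn sketch below mirrors this section: a linear kernel finds the maximum-margin
hyperplane for linearly separable data, while the RBF kernel implicitly performs a higher-dimensional
mapping like the z = x² + y² trick just described. The toy points are invented for illustration.

from sklearn.svm import SVC

# Toy 2-D points in two classes (cf. the blue/green example above).
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

# Linear SVM: maximum-margin hyperplane (a straight line in 2-D).
linear_svm = SVC(kernel="linear").fit(X, y)
print(linear_svm.support_vectors_)   # the extreme points that fix the margin

# Non-linear SVM: the RBF kernel maps points to a higher dimension
# implicitly, so no explicit third feature z needs to be computed.
rbf_svm = SVC(kernel="rbf").fit(X, y)
print(rbf_svm.predict([[2, 2]]))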

3.1.3 LSTM ARCHITECTURE:

LSTM Neural Network Architecture

The core components of an LSTM neural network are a sequence input layer and an LSTM layer.
A sequence input layer inputs sequence or time series data into the neural network. An LSTM
layer learns long-term dependencies between time steps of sequence data.

This diagram illustrates the architecture of a simple LSTM neural network for classification. The
neural network starts with a sequence input layer followed by an LSTM layer. To predict class labels,
the neural network ends with a fully connected layer, a softmax layer, and a classification output layer.

FIG 3.8 LSTM Simple Neural Network for Classification

This diagram illustrates the architecture of a simple LSTM neural network for regression. The
neural network starts with a sequence input layer followed by an LSTM layer. The neural network ends
with a fully connected layer and a regression output layer.

FIG 3.9.1 LSTM Simple Neural Network for Regression

This diagram illustrates the architecture of a neural network for video classification. To input
image sequences to the neural network, use a sequence input layer. To use convolutional layers to extract
features, that is, to apply the convolutional operations to each frame of the videos independently, use a
sequence folding layer followed by the convolutional layers, and then a sequence unfolding layer. To
use the LSTM layers to learn from sequences of vectors, use a flatten layer followed by the LSTM and
output layers.

FIG 3.9.2 LSTM Neural network for Video Classification

To create an LSTM network for sequence-to-label classification, create a layer array containing a
sequence input layer, an LSTM layer, a fully connected layer, a softmax layer, and a classification
output layer. Set the size of the sequence input layer to the number of features of the input data. Set the
size of the fully connected layer to the number of classes. You do not need to specify the sequence length.
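As a sketch, the sequence-to-label stack just described (sequence input → LSTM → fully connected →
softmax) can be written as follows in Keras; this is an assumed translation into one common framework,
and the feature count, class count, and hidden size are arbitrary illustrative choices.

from tensorflow import keras
from tensorflow.keras import layers

num_features, num_classes = 5, 3   # assumed sizes, for illustration only

model = keras.Sequential([
    # Sequence input: (time steps, features); the length may vary (None).
    layers.Input(shape=(None, num_features)),
    # LSTM layer: learns long-term dependencies between time steps.
    layers.LSTM(64),
    # Fully connected layer sized to the number of classes, with softmax.
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()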

3.1.4 GRADIENT BOOSTING ARCHITECTURE:

FIG 3.9.3 Gradient Boosting Architecture

Gradient boosting is a machine learning technique used for supervised learning tasks such as regression
and classification. It involves combining multiple weak or simple models, typically decision trees, into
an ensemble model that is stronger than its individual components.

The main idea behind gradient boosting is to sequentially add new models to the ensemble, where each
new model is trained to correct the errors made by the previous models. Specifically, the algorithm
minimizes a loss function that measures the difference between the predicted and actual values or labels,
using gradient descent to iteratively optimize the model parameters.

The "gradient" in gradient boosting refers to the gradient of the loss function with respect to the predicted
values, which is used to update the parameters of each new model in the ensemble. The "boosting" in
gradient boosting refers to the fact that each new model is trained to boost the performance of the
ensemble, rather than being trained independently.

Gradient boosting has become a popular technique due to its ability to produce highly accurate
predictions on a wide range of datasets, including those with complex relationships between the input
features and target variables. However, gradient boosting can also be computationally expensive and
prone to overfitting, especially if the ensemble has too many models or if the training dataset is too
small. Regularization techniques and hyperparameter tuning are commonly used to address these issues.
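A minimal scikit-learn sketch of this sequential idea, in which each new shallow tree is fitted to the
residual errors of the ensemble so far; the synthetic data and hyperparameter values are illustrative
assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data, for illustration only.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = 2 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(0, 0.05, 200)

# 100 shallow trees added one after another; each new tree fits the
# gradient of the squared-error loss (the residuals) of the ensemble.
gbm = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gbm.fit(X, y)
print(gbm.predict(X[:3]))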

ADVANTAGES:-
• Determines the future value of a stock.
• Uses available stock data to gain intelligent analysis.
• Gives accurate predictions.
• Works for large and small capitalizations and in three different markets.

CHAPTER 4
SYSTEM DESIGN
4.1 SYSTEM ARCHITECTURE:

In the proposed system we implement artificial intelligence algorithms to anticipate stock exchange
movements using SVM, Gradient Boosting, KNN, and LSTM.
Here is the suggested system architecture, represented as a stepwise structure. In the first step, we start
by giving raw data to our pipeline, which pre-processes the data using Python libraries; this is also the
feature extraction part. After that, we divide our data into two parts, where 80% of the data is used for
training and the remaining 20% for testing the trained algorithm. After all of this, we obtain our
predicted data.
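A sketch of the 80/20 split described above, using scikit-learn; the arrays below are placeholders for
the pre-processed stock features and targets, and shuffle=False is our assumption to keep the time
series in chronological order.

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholders standing in for the pre-processed stock features/targets.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# 80% of the rows train the model; the remaining 20% are held out for
# testing. shuffle=False keeps the chronological order of the series.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
print(len(X_train), len(X_test))   # 40 10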

FIG 4.1 System Architecture

1. DATA CONGREGATION:
Data congregation is a vital phase and the early stage in design and development. It usually involves
obtaining the required dataset. A diversity of variables must be evaluated while building a dataset
for forecasting purposes. The collection of data also aids in expanding the dataset by including
additional data sources. Our information is primarily based on stock prices from the preceding year.
We first analyze the live data, and then apply the model to the data to adequately estimate the
predictions based on their precision.
2. DATA CREATION:
Noise, missing values, and discrepancies are common in raw data, and data quality influences data
mining outcomes. In this phase the unstructured information is prepared in advance to aid in the
enhancement or simplification of the extraction operations. This is the most difficult phase in the data
extraction process, since it negotiates with the original information and attempts to reconstruct and
change it.
3. DATA COMBINING:
Data integration involves the possibility of analyzing your information or data function, meaning the
information will be combined from several sources to build a suitable stockpile and stored there. It can
merge a large number of data, files, and other items into certain files. As a result, we may face several
challenges throughout information compilation: the mix of techniques might be effective or deceptive,
and records for the same entity may have to be matched across a variety of data sources.
4. DATA CONVERSION:
1. Normalization, in which measurable features are scaled to a common range, such as 0 to 1.
2. Smoothing, in which unwanted information, or noise, that is incidentally included in the data
is removed.
3. Aggregation, in which data is summarized or compiled. For example, daily figures might be
collated to compute monthly subtotals and the total income for the year, typically to build a
coarser-grained view of the data for analysis.
4. Generalization, in which low-level (raw) data is replaced by higher-level concepts through the
use of concept hierarchies. For example, a street attribute may be generalized to a higher-level
concept such as city or region, and numerical values such as age may be mapped to high-level
notions like young, middle-aged, and senior.
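
The normalization step (1) can be sketched with scikit-learn's MinMaxScaler; the sample values below are hypothetical close prices.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([[120.5], [118.2], [131.9], [125.0]])  # hypothetical raw prices
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)    # every value now lies between 0 and 1
print(scaled.ravel())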
5. DATA CLEANING:
The attributes under consideration may not always be available. Because transaction data
depends on client information, certain values cannot be captured and may only be assumed at
login. Relevant information may also fail to be recorded owing to misunderstanding or equipment
failure, data that did not match other recorded data may have been erased, and the history of past
recordings or alterations may be disregarded. Missing data, particularly records with missing
values for some attributes, may need to be inferred. Values may also be wrong because the
collection instruments were faulty, because of human or machine error at data entry, or because
of inaccuracies in data transmission. Cleaning these defects allows the models to make more
accurate predictions based on the current input.
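
A minimal sketch of this cleaning step is shown below; the file name is an assumption, and the sentinel value mirrors the fillna(-99999) call used in the project code in the appendix.

import pandas as pd

df = pd.read_csv("stock_prices.csv")    # hypothetical raw data
df = df.drop_duplicates()               # remove duplicated records
df = df.fillna(value=-99999)            # flag missing values with a sentinel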

4.2 FEASIBILITY STUDY:


The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a
very general plan for the project and some cost estimates. During system analysis, the feasibility study
of the proposed system is carried out, to ensure that the proposed system is not a burden
to the company. For feasibility analysis, some understanding of the major requirements for the
system is essential.
4.2.1 ECONOMICAL FEASIBILITY:
This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and development of the
system is limited, so the expenditures must be justified. The developed system is well within the
budget, which was achieved because most of the technologies used are freely available; only the
customized products had to be purchased.
4.2.2 TECHNICAL FEASIBILITY:
This study is carried out to check the technical feasibility, that is, the technical requirements of
the system. Any system developed must not place a high demand on the available technical resources,
as this would in turn place high demands on the client. The developed system must therefore have
modest requirements, and only minimal or no changes are required to implement this system.
4.2.3 SOCIAL FEASIBILITY:
This aspect of the study checks the level of acceptance of the system by the user, which includes
the process of training the user to use the system efficiently. The user must not feel threatened by the
system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on
the methods employed to educate users about the system and make them familiar with it. Their
confidence must be raised so that they can also offer constructive criticism, which is welcome, as they
are the final users of the system.

4.3 SYSTEM REQUIREMENT SPECIFICATION:


A requirements specification for a software system is a complete description of the behaviour
of the system to be developed. It includes a set of use cases that describe all the interactions the users
will have with the software. In addition to use cases, the SRS also contains non-functional
requirements, which impose constraints on the design or implementation (such as performance
engineering requirements, quality standards, or design constraints).

An SRS is a structured collection of information that embodies the requirements of a system. A business
analyst, sometimes titled systems analyst, is responsible for analyzing the business needs of their
clients and stakeholders to help identify business problems and propose solutions. Within the systems
development life cycle domain, the BA typically performs a liaison function between the business
side of an enterprise and the information technology department or external service providers.
Projects are subject to three sorts of requirements:

• Business requirements describe in business terms what must be delivered or accomplished to
provide value.
• Product requirements describe properties of a system or product (which could be one of several
ways to accomplish a set of business requirements).
• Process requirements describe activities performed by the developing organization. For instance,
process requirements could specify the methodologies that must be followed and the constraints
that the organization must obey.

Product and process requirements are closely linked: process requirements often specify the
activities that will be performed to satisfy a product requirement. For example, a maximum
development cost requirement (a process requirement) may be imposed to help achieve a maximum
sales price requirement (a product requirement), and a requirement that the product be maintainable (a
product requirement) is often addressed by imposing requirements to follow particular development
styles.
PURPOSE:
In systems engineering, a requirement can be a description of what a system must do, referred
to as a functional requirement; this type of requirement specifies something that the delivered system
must be able to do. Another type of requirement specifies something about the system itself, and how
well it performs its functions. Such requirements are often called non-functional requirements,
'performance requirements', or 'quality of service requirements'. Examples of such requirements include
usability, availability, reliability, supportability, testability, and maintainability.

A collection of requirements defines the characteristics or features of the desired system. A
'good' list of requirements avoids, as far as possible, saying how the system should implement the
requirements, leaving such decisions to the system designer. Specifying how the system should be
implemented is called "implementation bias" or "solution engineering". However, implementation
constraints on the solution may validly be expressed by the future owner, for example for required
interfaces to external systems, for interoperability with other systems, and for commonality (e.g. of
user interfaces) with other owned products.

In software engineering, the same meanings of requirements apply, except that the focus of
interest is the software itself.

4.4 REQUIREMENT ANALYSIS:

The project involved analyzing the design of a few applications so as to make the application
more user friendly. To do so, it was important to keep the navigation from one screen to another
well ordered while at the same time reducing the amount of typing the user needs to do. To make
the application more accessible, the browser version had to be chosen so that it is compatible
with most browsers.
4.4.1 REQUIREMENT SPECIFICATION:

1. Functional Requirements

• Graphical user interface for the user.

2. Non-Functional Requirements
The major non-functional requirements of the system are as follows:
• Usability: the system is designed as a completely automated process, hence there is little
or no user intervention.
• Reliability: the system is reliable because of the qualities inherited from the chosen
platform, Python.
• Performance: the system is developed in a high-level language using advanced front-end
and back-end technologies, so it responds to the end user on the client system within a
very short time.

4.4.2 SOFTWARE AND HARDWARE REQUIREMENTS:

Software Requirements

For developing the application, the following software is required:

• Python
• Operating systems supported: Windows
• Technologies and languages used to develop: Python

Hardware Requirements

For developing the application, the following hardware is required:

• Processor: Pentium IV or higher
• RAM: 2 GB
• Hard disk space: minimum 10 GB

4.5 DATA FLOW DIAGRAM :

FIG 4.2 Data Flow Diagram


4.6 UML DIAGRAMS:

A use case diagram in the Unified Modelling Language (UML) is a type of behavioural
diagram defined by and created from a use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented as use
cases), and any dependencies between those use cases. The main purpose of a use case diagram is to
show which system functions are performed for which actor; the roles of the actors in the system can be
depicted. Use case diagrams are formally included in two modelling languages defined by the OMG:
the Unified Modelling Language (UML) and the Systems Modelling Language (SysML).
TYPES OF UML DIAGRAMS:
There are several types of UML diagrams, and each one of them serves a different
purpose. The two broad categories that encompass all other types are behavioural UML diagrams
and structural UML diagrams. As the names suggest, some UML diagrams analyze and depict
the structure of a system or process, whereas others describe the behaviour of the system, its actors,
and its building components. The different types are broken down as follows:

FIG 4.3 Uml Diagrams

4.6.1 USE CASE DIAGRAM :


A cornerstone part of the system is the functional requirements that the system fulfills. Use Case
diagrams are used to analyze the system’s high-level requirements. These requirements are expressed
through different use cases. We notice three main components of this UML diagram:

• Functional requirements: represented as use cases; a verb describing an action.

• Actors: they interact with the system; an actor can be a human being, an organization, or an
internal or external application.

• Relationships between actors and use cases: represented using straight arrows.

The use case diagram of the proposed system is shown
below. The proposed system allows the user to provide the training data and the threshold constant.
Training the neural network requires initialization of the input weights, and the first input weight will be
adjusted until the optimal prediction is achieved.

FIG 4.4 Use Case Diagram

Use Case ID | Use Case Name                  | Primary Actor | Scope | Complexity | Priority
1           | Collect data                   | Admin         | In    | High       | 1
2           | Compute result and performance | Admin         | In    | High       | 1
3           | System update                  | Admin         | In    | High       | 1
4           | View traded exchange           | User          | In    | Medium     | 2
5           | Company Stock                  | User          | In    | Medium     | 2
6           | View predicted outcome         | User          | In    | High       | 1

FIG 4.5 The Priority Checking

Within the circular containers, we express the actions that the actors perform. Such actions are: purchasing
and paying for the stock, checking stock quality, returning the stock or distributing it. As you might have noticed,
use case UML diagrams are good for showing dynamic behaviours between actors within a system, by
simplifying the view of the system and not reflecting the details of implementation.

Use case description:

Use case ID:1

Use case name: Collect data

Description: All required data is available from the Yahoo stock exchange. The admin is able
to collect the data for the system.

Use case ID:2

Use case name: Compute result and performance

Description: The prediction result is handled and generated by the admin. The system will be
built, through which the result of the prediction and the system performance will be analyzed.

Use case ID: 3

Use case name: System update

Description: With changes in the market and in technology, regular updates of the system are
required. Besides this, the predicted stock exchange results and the actual prices will be updated
by the admin on a regular basis.

Use case ID: 4

Use case name: View traded exchange

Description: Company trading held at Yahoo can be viewed by the user.

Use Case ID: 5

Use Case Name: Company Stock

Description: This is an extended feature of the view traded exchange use case; it includes the
stock value of a particular company.

Use Case ID: 6

Use Case Name: View predicted outcome

Description: This use case is the most important in the whole project. The key feature of this
project is to predict the stock value of companies. The predictions will be available in the user
interface, where the viewer can observe them.

4.6.2 CLASS DIAGRAM:


A class UML diagram is the most common diagram type for software documentation. Since most
software being created nowadays is still based on the object-oriented programming paradigm, using
class diagrams to document the software is a common-sense solution, because OOP is based on
classes and the relations between them.
In a nutshell, class diagrams contain classes, along with their attributes (also referred to as data fields)
and their behaviours (also referred to as member functions). More specifically, each class has three
compartments: the class name at the top, the class attributes right below the name, and the class
operations/behaviours at the bottom. The relations between different classes (represented by
connecting lines) make up a class diagram.

FIG 4.6 Class Diagram

The example above shows a basic class diagram. The 'Checking Account' class and the
'Savings Account' class both inherit from the more general class, 'Account'. The inheritance is
shown using the hollow-headed arrow.

4.6.3 OBJECT DIAGRAM:


When dealing with documentation of complex systems, component UML diagrams can help
break down the system into smaller components. Sometimes it is hard to depict the architecture of a
system because it encompasses several departments or employs different technologies.
For example, Lambda architecture is a typical example of a complex architecture that can be
represented using a component UML diagram. Lambda architecture is a data-processing
architecture employed by several companies for storing and processing data in a distributed system.
It is made up of three different layers: the speed layer, the batch layer, and the serving layer.

FIG 4.7 Object Diagram

The image above shows how a component diagram can help us get a simplified top-level view of
a more complex system. The annotations used here are not tailored according to UML standards.

4.6.4 SEQUENCE DIAGRAM:


Sequence diagrams are probably the most important UML diagrams not only in the
computer science community but also as design-level models for business application development.
The sequence diagram numbers the actions starting from the data input through to the optimal prediction,
with arrows showing the sequence of flow of the actions taken to arrive at the optimal prediction.
The messages exchanged between the User and the Application in the sequence diagram are:
1. Download dataset, 2. Correlation of data, 3. Data preprocessing, 4. KNN with uniform weights,
5. KNN with distance weights, 6. KNN accuracy.

FIG 4.8 Sequence Diagram

4.6.5 ACTIVITY DIAGRAM:


An activity diagram is essentially a flowchart showing the flow of control from one activity to
another. Unlike a traditional flowchart, it can model the dynamic functional view of a system. An
activity diagram represents an operation on some classes in the system that results in changes in the
state of the system.

FIG 4.9.1 Activity Diagram
4.6.6 STATE CHART DIAGRAM:
State machine UML diagrams, also referred to as State chart diagrams, are used to describe the
different states of a component within a system. It takes the name state machine because the diagram is
essentially a machine that describes the several states of an object and how it changes based on internal
and external events.

A very simple state machine diagram would be that of a chess game. A typical chess game
consists of moves made by White and moves made by Black. White has the first move and
thus initiates the game. The conclusion of the game can occur regardless of whether it is White's
turn or Black's: the game can end with a checkmate, a resignation, or a draw (different states of
the machine).

FIG 4.9.2 State Chart Diagram

4.6.7 Interaction Overview Diagram:

Interaction overview UML diagrams are probably some of the most complex ones. So far we
have explained what an activity diagram is. Additionally, within the set of behavioural diagrams, we have
a subset of four diagrams, called interaction diagrams:

• Timing diagram
• Sequence diagram
• Communication diagram
• Interaction overview diagram

FIG 4.9.3 Interaction Diagram

Timing diagram:
Timing UML diagrams are used to represent the relations of objects when the centre of
attention rests on time. We are not interested in how the objects interact with or change each other, but
rather in how objects and actors act along a linear time axis.

Each individual participant is represented through a lifeline, which is essentially a line
forming steps as the individual participant transits from one stage to another. The main focus is
on the time duration of events and the changes that occur depending on the duration constraints.
The main components of a timing UML diagram are:

• Lifeline: an individual participant.
• State timeline: a single lifeline can go through different states within a pipeline.
• Duration constraint: a time interval representing the duration necessary for a constraint to be
fulfilled.
• Time constraint: a time interval during which something needs to be fulfilled by the
participant.
• Destruction occurrence: a message occurrence that destroys the individual participant and
depicts the end of that participant's lifeline.

An example of a simplified timing UML diagram is given below.

CHAPTER 5
DEPLOYMENT

Deployment diagrams are used to visualize the relation between software and hardware. To be
more specific, with deployment diagrams we can construct a physical model of how software
components (artifacts) are deployed on hardware components, known as nodes.

A typical simplified deployment diagram for a web application would include:

• Nodes (application server and database server)
• Artifacts (application client and database schema)

The nodes host the artifacts. The database schema runs on the database server and the
application client runs on the application server.

CHAPTER 6
SYSTEM TESTING
6.1 TESTING:

The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality of
components, sub-assemblies, assemblies, and/or a finished product. It is the process of exercising
software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of test, and each test
type addresses a specific testing requirement.

6.2 TYPES OF TESTING :

6.2.1 UNIT TESTING:


Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the application,
and it is done after the completion of an individual unit before integration. This is structural testing
that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at the
component level and test a specific business process, application, and/or system configuration. Unit
tests ensure that each unique path of a business process performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
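
As a sketch of what such a unit test looks like in this project's language, the example below tests a hypothetical pct_change helper resembling the PCT_change feature computed during preprocessing; the helper itself is an assumption for illustration.

import unittest

def pct_change(open_price, close_price):
    # daily percentage change, as used in the preprocessing step
    return (close_price - open_price) / open_price * 100.0

class TestFeatures(unittest.TestCase):
    def test_upward_move(self):
        self.assertAlmostEqual(pct_change(100.0, 110.0), 10.0)

    def test_downward_move(self):
        self.assertAlmostEqual(pct_change(100.0, 95.0), -5.0)

if __name__ == "__main__":
    unittest.main()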

6.2.2 INTEGRATION TESTING:


Integration tests are designed to test integrated software components to determine whether they
actually run as one program. Testing is event driven and is more concerned with the basic outcome of
screens or fields. Integration tests demonstrate that although the components were individually
satisfactory, as shown by successful unit testing, the combination of components is also correct and
consistent. Integration testing is specifically aimed at exposing the problems that arise from the
combination of components.

6.2.3 FUNCTIONAL TESTING:


Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.

Functional testing is centred on the following items:

• Valid input: identified classes of valid input must be accepted.
• Invalid input: identified classes of invalid input must be rejected.
• Functions: identified functions must be exercised.
• Output: identified classes of application outputs must be exercised.
• Systems/procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or special
test cases. In addition, systematic coverage of business process flows, data fields, predefined processes,
and successive processes must be considered for testing. Before functional testing is complete,
additional tests are identified and the effective value of current tests is determined.

6.2.4 SYSTEM TESTING:


System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration oriented system integration test. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.

6.2.5 WHITE BOX TESTING:


White box testing is testing in which the software tester has knowledge of the inner
workings, structure, and language of the software, or at least its purpose. It is used to test
areas that cannot be reached from a black box level.

6.2.6 BLACK BOX TESTING:


Black box testing is testing the software without any knowledge of the inner workings,
structure, or language of the module being tested. Black box tests, like most other kinds of tests, must be
written from a definitive source document, such as a specification or requirements document. It is
testing in which the software under test is treated as a black box: you cannot "see" into it. The test
provides inputs and responds to outputs without considering how the software works.

6.3 VERIFICATION AND VALIDATION:


The testing process is part of a broader subject referring to verification and validation. We have
to acknowledge the system specifications and try to meet the customer's requirements, and for this
sole purpose we have to verify and validate the product to make sure everything is in place.
Verification and validation are two different things: one is performed to ensure that the software
correctly implements a specific functionality, and the other is done to ensure that the customer
requirements are properly met by the end product. Verification of the project was carried out to ensure
that the project met all the requirements and specifications of our project. We made sure that our
project is up to the standard we planned at the beginning of our project development.

CHAPTER 7
DATA SETS & RESULTS
7.1 DATA SETS
The naming conventions for the features in the data sets closely resemble financial
ratio terms, including debt/equity, asset turnover, cash flow, and return on equity. The original
source of data in data set 1 contains numeric values, which were then converted into categorical
values in data set 2.

Debt_Equity, Asset_Turnover, Cash_Flow, and Return_on_Equity are the independent
variables. The values of the independent variables are used to predict the value of the class variable
named Price_Trend. The variable named Price in data set 1 contains numerical values which do not
provide a semantic interpretation of the stocks, since various stocks have various prices, whereas the
variable named Price_Trend in data set 2 contains standardized categorical values which indicate a
positive or negative price trend.
Feature            Data Type
Debt_Equity        Numeric
Asset_Turnover     Numeric
Cash_Flow          Numeric
Return_On_Equity   Numeric
Price              Numeric

Table 1: Data set 1

Feature            Data Type
Debt_Equity        Categorical
Asset_Turnover     Categorical
Cash_Flow          Categorical
Return_On_Equity   Categorical
Price_Trend        Categorical

Table 2: Data set 2
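
A hedged sketch of how data set 2 could be derived from data set 1 is given below; the file name, the tercile bucketing, and the Profit/Loss labelling rule are assumptions for illustration.

import pandas as pd

ds1 = pd.read_csv("dataset1.csv")   # hypothetical file with the Table 1 features

ds2 = pd.DataFrame()
for col in ["Debt_Equity", "Asset_Turnover", "Cash_Flow", "Return_On_Equity"]:
    # bucket each numeric ratio into Low/Medium/High categories
    ds2[col] = pd.qcut(ds1[col], q=3, labels=["Low", "Medium", "High"])

# label the price trend from the change in price: Profit = upward, Loss = downward
ds2["Price_Trend"] = (ds1["Price"].diff() > 0).map({True: "Profit", False: "Loss"})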

Debt_Equity (D/E):
The debt-to-equity (D/E) ratio is calculated by dividing a company's total liabilities by its
shareholder equity. These numbers are available on the balance sheet of a company's financial
statements. The debt-to-equity ratio measures a company's debt relative to the value of its net assets;
a high debt/equity ratio is often associated with high risk, since it means that a company has been
aggressive in financing its growth with debt.

Debt/Equity = Total Liabilities / Total Shareholders' Equity

FIG 7.1 Debt and Equity

Cash Flow:

Cash flow is a measure of a company's financial health in terms of incomings and outgoings
of cash, representing the operating activities of a company.

FIG 7.2 Cash flow

Cash from Operating Activities:

The operating activities on the CFS include any sources and uses of cash from business
activities. In other words, it reflects how much cash is generated from a company's products or
services.

Cash Flow from Investing Activities:

Cash flow from investing activities is one of the sections on the cash flow statement that
reports how much cash has been generated or spent from various investment-related activities in a
specific period. Investing activities include purchases of physical assets, investments in securities,
or the sale of securities or assets.

Negative cash flow is often indicative of a company's poor performance. However, negative
cash flow from investing activities might be due to significant amounts of cash being invested in the
long-term health of the company, such as research and development.

Before analysing the different types of positive and negative cash flows from investing
activities, it's important to review where a company's investment activity falls within its financial
statements.
Cash Flow from Financing Activities:

Cash flow from financing activities (CFF) is a section of a company’s cash flow statement,
which shows the net flows of cash that are used to fund the company. Financing activities include
transactions involving debt, equity, and dividends.

Cash flow from financing activities provides investors with insight into a company’s financial
strength and how well a company's capital structure is managed.
Formula:
CFF = CED − (CD + RP)
where:
CED = cash inflows from issuing equity or debt
CD = cash paid as dividends
RP = repurchase of debt and equity
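
As a worked example with hypothetical figures, a company that raises $100M by issuing equity or debt (CED), pays $20M in dividends (CD), and repurchases $30M of debt and equity (RP) has CFF = 100 − (20 + 30) = $50M.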

Return_on_Equity (ROE):
Return on equity is a measure of profitability based on how much profit a company generates
with each dollar of stockholders' equity. ROE is considered a measure of how effectively
management is using a company’s assets to create profits.

Return on Equity=Net Income/Average Shareholders’ Equity

FIG 7.3 Return on Equity

Price_Trend:

Price trend is the general direction and momentum of a market or of the price of a security. If
the price of a security is moving mainly upward, it is said to be on an upward price trend. The values of
the dependent feature named "Price_Trend" consist of Profit and Loss labels: Profit indicates an
upward price trend, whereas Loss indicates a downward price trend. Data set 2 is structured to
suit the approach of the KNN-probabilistic model.

FIG 7.4 Price Trend

Asset_Turnover:

The asset turnover ratio measures the value of a company's sales or revenues relative to the
value of its assets. The asset turnover ratio can be used as an indicator of the efficiency with which
a company is using its assets to generate revenue.

The higher the asset turnover ratio, the more efficient a company is at generating revenue
from its assets. Conversely, if a company has a low asset turnover ratio, it is not efficiently
using its assets to generate sales.

Asset Turnover = Total Sales / ((Beginning Assets + Ending Assets) / 2)

where:

Total Sales = annual sales total
Beginning Assets = assets at start of year
Ending Assets = assets at end of year
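
The four ratio features defined in this chapter can be computed directly, as in the sketch below; all the figures are hypothetical, and ROE is simplified here to use year-end rather than average shareholder equity.

total_liabilities = 120_000_000.0
shareholder_equity = 80_000_000.0
net_income = 12_000_000.0
total_sales = 200_000_000.0
beginning_assets, ending_assets = 150_000_000.0, 170_000_000.0

debt_equity = total_liabilities / shareholder_equity
return_on_equity = net_income / shareholder_equity
asset_turnover = total_sales / ((beginning_assets + ending_assets) / 2)

print(f"D/E={debt_equity:.2f}, ROE={return_on_equity:.2%}, "
      f"Asset Turnover={asset_turnover:.2f}")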

FIG 7.5 Asset_Turnover

The data is in numerical form, where the first column is the date, the second column is the high
price of that day, and the third column is the low price of that day. The fourth column is the open
price, the initial price of that day, and the fifth column is the close price, the ending price of that
day. The sixth column is the volume, which represents the count of stocks traded that day, and the
seventh column is the adjusted close, the stock's value adjusted for corporate actions.
Everyday price fluctuations are computed using the closing price of the stock in previous data values:
the difference between adjacent days' close prices is determined by deducting the preceding day's
price from today's price. If the obtained value is negative, the movement is down; otherwise, the
movement is upward, as depicted in the figure.
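
The movement labelling described above can be sketched as follows; the file and column names are assumptions, and the first row has no preceding day so its difference is undefined.

import pandas as pd

df = pd.read_csv("Apple_YahooDataSet.csv")   # hypothetical data file
delta = df["Close"].diff()                   # today's close minus yesterday's
df["Movement"] = delta.apply(lambda d: "Down" if d < 0 else "Up")
print(df[["Close", "Movement"]].head())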

FIG 7.6 Data Set

7.2 RESULTS:

FIG 7.2.1 Home Page

FIG 7.2.2 Download Data Set

FIG 7.2.3 Correlation for Data

FIG 7.2.4 Data Pre-processing

FIG 7.2.5 KNN with Uniform Weights

FIG 7.2.6 KNN with Distance Weights

FIG 7.2.7 MSE Value of SVM

FIG 7.2.8 Run SVM Algorithm

FIG 7.2.9 Run Gradient Boosting Regressor Algorithm

FIG 7.2.10 MSE Value of Gradient Boosting Regressor Algorithm

FIG 7.2.11 Run LSTM Algorithm

FIG 7.2.12 MSE of LSTM Algorithm

FIG 7.2.13 Predict the Test Data

FIG 7.2.14 Plotting the Prediction for KNN with Distance Weights

FIG 7.2.15 Plotting the Prediction for KNN with Uniform Weights

CHAPTER 8

CONCLUSION AND FUTURE WORK

In this project, we implemented the LSTM, SVM, Gradient Boosting, and KNN algorithms; the
screens above show the output of each. Since these are forecasting algorithms, we evaluate them with the
Mean Square Error (MSE), which measures the difference between the predicted price and the original
test price, so the lower the MSE the better the predictions. The SVM output shows the original stock
price alongside the SVM predicted price, and there is only a small difference between the two. In the
SVM graph, covering predictions for the next 100 days, the red line represents the original test price and
the green line represents the SVM predicted price; the two lines nearly coincide, and with SVM we got
only a 3% difference between the original and predicted prices. Running the Gradient Boosting Regressor
produces a green prediction line that fully coincides with the red line, so its predicted prices are accurate.
The LSTM prediction is also very close to the original prices, with the green and red lines fully
overlapping except for a minor difference. Comparing the MSE rates of the three algorithms, SVM has
the highest MSE error rate of 3%, gradient boosting has an MSE error of 1.06%, and LSTM has the
lowest MSE error rate of 0.21, which is lower than all the other algorithms. From these results, we can
conclude that LSTM is the most accurate.

REFERENCES

[1]. Naeimuddin, Vijayakumar B. Stock market trend prediction using supervised machine learning
algorithms. IRJET. 2020; e-ISSN: 2395-0056, p-ISSN: 2395-0072. Published online 2020:51-57.
[2]. Kalra S, Prasad JS. Efficacy of news sentiment for stock market prediction. Proc Int Conf Mach
Learn Big Data, Cloud Parallel Comput Trends, Perspectives Prospects (COMITCon) 2019. Published
online 2019:491-496. doi:10.1109/COMITCon.2019.8862265
[3]. Menon A, Singh S, Parekh H. A review of stock market prediction using neural networks. 2019
IEEE Int Conf Syst Comput Autom Networking (ICSCAN) 2019. Published online 2019:1-6.
[4]. Chen MY, Liao CH, Hsieh RP. Modeling public mood and emotion: stock market trend prediction
with anticipatory computing approach. Comput Human Behav. 2019;101(September 2018):402-408.
doi:10.1016/j.chb.2019.03.021
[5]. Sharma A, Bhuiyan D, Singh U. Survey of stock market prediction using machine learning
approach. Proc Int Conf Electron Commun Aerosp Technol (ICECA) 2017. 2017;2017-Janua:506-509.
doi:10.1109/ICECA.2017.8212715
[6]. Picasso A, Merello S, Ma Y, Oneto L, Cambria E. Technical analysis and sentiment embeddings
for market trend prediction. Expert Syst Appl. 2019;135:60-70. doi:10.1016/j.eswa.2019.06.014
[7]. Shobha G, Rangaswamy S. Machine learning. Vol 38. 1st ed. Elsevier B.V.; 2018.
doi:10.1016/bs.host.2018.07.004
[8]. Basak S, Kar S, Saha S, Khaidem L, Dey SR. Predicting the direction of stock market prices using
tree-based classifiers. North Am J Econ Finance. 2019;47(December 2017):552-567.
doi:10.1016/j.najef.2018.06.013
[9]. Kia AN, Haratizadeh S, Shouraki SB. A hybrid supervised semi-supervised graph-based model to
predict one-day ahead movement of global stock markets and commodity prices. Expert Syst Appl.
2018;105:159-173. doi:10.1016/j.eswa.2018.03.037
[10]. Henrique BM, Sobreiro VA, Kimura H. Literature review: machine learning techniques applied
to financial market prediction. Expert Syst Appl. 2019;124:226-251. doi:10.1016/j.eswa.2019.01.012
[11]. Zhou F, Zhou HM, Yang Z, Yang L. EMD2FNN: a strategy combining empirical mode
decomposition and factorization machine-based neural network for stock market trend prediction.
Expert Syst Appl. 2019;115:136-151. doi:10.1016/j.eswa.2018.07.065
[12]. Sirimevan N, Mamalgaha IGUH, Jayasekara C, Mayuran YS, Jayawardena C. Stock market
prediction using machine learning techniques. 2019 Int Conf Adv Comput (ICAC) 2019.
2019;(4):192-197. doi:10.1109/ICAC49085.2019.9103381
[13]. Jadhav AA, Biradar N, Bhaldar H, Mathpati MS, Wadekar R. Design and analysis of triple band
miniaturized antenna for wearable application. International Journal of Innovative Research in
Computer and Communication Engineering. 2019;(March). doi:10.15680/IJIRCCE.2019
[14]. Nayak A, Pai MMM, Pai RM. Prediction models for Indian stock market. Procedia Comput Sci.
2016;89:441-449. doi:10.1016/j.procs.2016.06.096
[15]. Oliveira N, Cortez P, Areal N. The impact of microblogging data for stock market prediction:
using Twitter to predict returns, volatility, trading volume, and survey sentiment indices. Expert Syst
Appl. 2017;73:125-144. doi:10.1016/j.eswa.2016.12.036
[16]. Siew HL, Nordin MJ. Regression techniques for the prediction of stock price trends. ICSSBE
2012 - Proceedings, 2012 Int Conf Stat Sci Bus Eng. 2012;(December):99-103.
doi:10.1109/ICSSBE.2012.6396535
[17]. Das SR, Mishra D, Rout M. Stock market prediction using firefly algorithm with evolutionary
framework optimized feature reduction for OSELM method. Expert Syst Appl X. 2019;4:100016.
doi:10.1016/j.eswax.2019.100016
[18]. Gandhmal DP, Kumar K. Systematic analysis and review of stock market prediction techniques.
Comput Sci Rev. 2019;34:100190. doi:10.1016/j.cosrev.2019.08.001
[19]. Yadav A, Jha CK, Sharan A. Optimizing LSTM for time series prediction in the Indian stock
market. Procedia Comput Sci. 2020;167(2019):2091-2100. doi:10.1016/j.procs.2020.03.257
[20]. Bustos O, Pomares-Quimbaya A. Stock market movement forecast: a systematic review. Expert
Syst Appl. 2020;156. doi:10.1016/j.eswa.2020.113464
[21]. Jahan I, Sajal SZ, Nygard KE. Prediction model using recurrent neural networks. IEEE Int Conf
Electro Inf Technol. 2019;2019-May:390-395. doi:10.1109/EIT.2019.8834336
[22]. Iyer M, Mehra R. A survey on stock market prediction. PDGC 2018 - 2018 5th Int Conf Parallel,
Distrib Grid Comput. Published online 2018:663-668. doi:10.1109/PDGC.2018.8745715
[23]. Hajek P. Forecasting stock market trend using prototype generation classifiers. WSEAS Trans
Syst. 2012;11(12):671-680.
[24]. Golmohammadi K, Zaiane OR, Diaz D. Detecting stock market manipulation using supervised
learning algorithms. DSAA 2014 - Proc 2014 IEEE Int Conf Data Sci Adv Anal. Published online
2014:435-441. doi:10.1109/DSAA.2014.7058109

APPENDIX

from tkinter import messagebox
from tkinter import *
from tkinter import simpledialog
import tkinter
from tkinter import filedialog
from imutils import paths
from tkinter.filedialog import askopenfilename
import pickle
import pandas as pd
import datetime
import pandas_datareader.data as web
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
from matplotlib import style
import matplotlib as mpl
from matplotlib import cm as cm
import math
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
import seaborn as sns
from sklearn.svm import SVR  # SVR regression
from sklearn.metrics import mean_squared_error
import os
from sklearn.ensemble import GradientBoostingRegressor
import webbrowser
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM  # class for LSTM regression
from keras.layers import Dropout

import yfinance as yf
yf.pdr_override()  # route pandas_datareader requests through yfinance
from pandas_datareader import data as pdr

main = tkinter.Tk()
main.title("PREDICTING STOCK MARKET TRENDS USING DEEP LEARNING AND MACHINE LEARNING ALGORITHMS")
main.geometry("1300x1200")

global dataFrame, dfreg
global moving_avg
global dfcomp
global clfknn
global clfknndist
global X, y, X_train, y_train, X_test, y_test, X_pred
global distknn, uniknn, knnunipred, knndistpred
def loadDataset():
    # Load the Apple stock dataset and show basic information in the text box.
    text.delete('1.0', END)
    global dataFrame
    global dfcomp
    start = datetime.datetime(2020, 1, 1)
    end = datetime.datetime(2021, 1, 11)

    dataFrame = pd.read_csv("Yahoo-Finance-Dataset/Apple_YahooDataSet.csv")
    text.insert(END, "Shape of Apple Stock Dataset: " + str(dataFrame.shape) + "\n\n")
    text.insert(END, "Sample of Apple Stock Data: \n" + str(dataFrame.head(2)) + "\n\n")

    # Competitor data could also be fetched live, e.g.:
    # web.DataReader(['AAPL', 'GE', 'GOOG', 'IBM', 'MSFT'], 'yahoo', start=start, end=end)['Adj Close']
    dfcomp = dataFrame
    dfcomp.drop(['Date'], axis=1, inplace=True)
    text.insert(END, "Shape of Apple Competitor Stock Dataset: " + str(dfcomp.shape) + "\n\n")
    text.insert(END, "Sample of Apple Competitor Stock Data: \n" + str(dfcomp.head(2)) + "\n\n")
    text.insert(END, "Dataset Downloaded from Yahoo Finance Dataset\n\n")

def dfcorr():
    # Compute the pairwise correlation of daily percentage returns.
    text.delete('1.0', END)
    global dfcomp
    text.insert(END, "Correlation from Apple Competitor Stock\n\n")
    retscomp = dfcomp.pct_change()
    corr = retscomp.corr()
    text.insert(END, "correlation: \n" + str(corr) + "\n\n")

def dataPreProcess():
    # Build regression features, scale them, and split them into train/test sets.
    text.delete('1.0', END)
    global dataFrame, dfreg
    global X, y, X_train, X_test, y_train, y_test, X_pred

    text.insert(END, "Data PreProcessing for Apple Stock Dataset\n\n")
    dfreg = dataFrame.loc[:, ["Adj Close", "Volume"]]
    dfreg["HL_PCT"] = (dataFrame["High"] - dataFrame["Low"]) / dataFrame["Close"] * 100.0
    dfreg["PCT_change"] = (dataFrame["Close"] - dataFrame["Open"]) / dataFrame["Open"] * 100.0

    # Drop missing values
    dfreg.fillna(value=-99999, inplace=True)
    # We want to separate 1 percent of the data to forecast
    forecast_out = int(math.ceil(0.01 * len(dfreg)))

    # Separating the label here; we want to predict the Adj Close
    forecast_col = 'Adj Close'
    dfreg['label'] = dfreg[forecast_col].shift(-forecast_out)
    X = np.array(dfreg.drop(['label'], axis=1))

    # Scale X so that every feature has the same distribution for regression
    X = preprocessing.scale(X)
    # Finally, find the data series of late X and early X (train) for model generation and evaluation
    X_pred = X[-forecast_out:]
    X = X[:-forecast_out]
    # Separate the label and identify it as y
    y = np.array(dfreg['label'])
    y = y[:-forecast_out]

    text.insert(END, "X labels : \n" + str(X) + "\n\n")
    text.insert(END, "Y labels : \n" + str(y) + "\n\n")
    text.insert(END, "Data splitting into Train and Test\n")
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    text.insert(END, "number of Train Samples : " + str(len(X_train)) + "\n")
    text.insert(END, "number of Test Samples: " + str(len(X_test)) + "\n")
    text.insert(END, "Data Preprocessing Completed\n\n")

def uniformKNN():
    # KNN regression with uniform weights (every neighbour counts equally).
    text.delete('1.0', END)
    global clfknn
    global uniknn
    clfknn = KNeighborsRegressor(n_neighbors=5)
    clfknn.fit(X_train, y_train)
    uniknn = clfknn.score(X_train, y_train)
    text.insert(END, "Accuracy of KNN with Uniform weights : " + str(uniknn * 100) + "\n\n")

def distKNN():
    # KNN regression with distance weights (closer neighbours count more).
    text.delete('1.0', END)
    global clfknndist, knndistpred
    global distknn, knnunipred
    clfknndist = KNeighborsRegressor(n_neighbors=5, weights='distance')
    clfknndist.fit(X_train, y_train)
    distknn = clfknndist.score(X_train, y_train)
    text.insert(END, "Accuracy of KNN with Distance weights : " + str(distknn * 100) + "\n\n")


def predModel():
    # Load a test file chosen by the user and predict with both KNN models.
    text.delete('1.0', END)
    global clfknndist, clfknn, knnunipred, knndistpred
    global X, y, X_train, X_test, y_train, y_test

    filename = filedialog.askopenfilename(initialdir="Yahoo-Finance-Dataset")
    test = pd.read_csv(filename)
    text.insert(END, filename + " test file loaded\n" + str(test.columns) + "\n")
    x_pred = np.array(test.drop(['Unnamed: 0'], axis=1))
    text.insert(END, "test Dataset: \n" + str(x_pred) + "\n\n")

    knndistpred = clfknndist.predict(x_pred)
    text.insert(END, "Predict values for KNN with Dist weights: \n" + str(knndistpred) + "\n\n")

    knnunipred = clfknn.predict(x_pred)
    text.insert(END, "Predict values for KNN with Uni weights: \n" + str(knnunipred) + "\n\n")

def graph():
    # Plot forecasts of both KNN models and compare their accuracies;
    # this assumes dfreg has a date-like index so days can be appended.
    text.delete('1.0', END)
    global uniknn, distknn
    global knnunipred, knndistpred
    global dfreg

    dfreg['Forecast'] = np.nan
    last_date = dfreg.iloc[-1].name
    last_unix = last_date
    next_unix = last_unix + datetime.timedelta(days=1)

    for i in knnunipred:
        next_date = next_unix
        next_unix += datetime.timedelta(days=1)
        dfreg.loc[next_date] = [np.nan for _ in range(len(dfreg.columns) - 1)] + [i]
    dfreg['Adj Close'].tail(500).plot()
    dfreg['Forecast'].tail(500).plot()
    plt.legend(loc=4)
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.savefig('knnUniformPredGraph.png')
    plt.close()

    for i in knndistpred:
        next_date = next_unix
        next_unix += datetime.timedelta(days=1)
        dfreg.loc[next_date] = [np.nan for _ in range(len(dfreg.columns) - 1)] + [i]
    dfreg['Adj Close'].tail(500).plot()
    dfreg['Forecast'].tail(500).plot()
    plt.legend(loc=4)
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.savefig('knnDistPredGraph.png')
    plt.close()

    height = [uniknn, distknn]
    bars = ('KNN with uniform weights Accuracy', 'KNN with distance weights Accuracy')
    y_pos = np.arange(len(bars))
    plt.bar(y_pos, height)
    plt.xticks(y_pos, bars)
    plt.show()

def createTable(original, predict, algorithm):
    # Write an HTML table comparing original and predicted prices.
    output = '<html><body><table border=1 align=left>'
    output += '<tr><th>Original Price</th><th>' + algorithm + ' Predicted Price</th></tr>'
    for i in range(len(original)):
        output += '<tr><td>' + str(original[i]) + '</td><td>' + str(predict[i]) + '</td></tr>'
    output += '</table></body></html>'
    f = open("output.html", "w")
    f.write(output)
    f.close()
    webbrowser.open("output.html", new=1)

def runSVM():
    # Train an SVR model, report its MSE, and plot original vs predicted prices.
    text.delete('1.0', END)
    global X, y, X_train, X_test, y_train, y_test, X_pred

    svr_regression = SVR(C=1.0, epsilon=0.2)
    # training SVR with X and y data
    svr_regression.fit(X_train, y_train)
    # performing prediction on test data
    predict = svr_regression.predict(X_test)
    labels = y_test
    labels = labels[0:100]
    predict = predict[0:100]
    # calculating MSE error
    svr_mse = mean_squared_error(labels, predict)
    text.insert(END, "SVM Mean Square Error : " + str(svr_mse) + "\n\n")
    text.update_idletasks()
    createTable(labels, predict, "SVM")
    # plotting comparison graph between original values and predicted values
    plt.plot(labels, color='red', label='Original Stock Price')
    plt.plot(predict, color='green', label='SVM Predicted Price')
    plt.title('SVM Stock Prediction')
    plt.xlabel('Test Data')
    plt.ylabel('Stock Prediction')
    plt.legend()
    plt.show()

def runGBR():
    # Train a Gradient Boosting Regressor and visualize its predictions.
    global X, y, X_train, X_test, y_train, y_test, X_pred

    gbr_regression = GradientBoostingRegressor()
    # training Gradient Boosting Regressor with X and y data
    gbr_regression.fit(X, y)
    # performing prediction on test data
    predict = gbr_regression.predict(X_test)
    labels = y_test
    labels = labels[0:100]
    predict = predict[0:100]
    # calculating MSE error
    gbr_mse = mean_squared_error(labels, predict)
    text.insert(END, "Gradient Boosting Regressor Mean Square Error : " + str(gbr_mse) + "\n\n")
    text.update_idletasks()
    createTable(labels, predict, "Gradient Boosting Regressor")
    # plotting comparison graph between original values and predicted values
    plt.plot(labels, color='red', label='Original Stock Price')
    plt.plot(predict, color='green', label='Gradient Boosting Regressor Predicted Price')
    plt.title('Gradient Boosting Regressor Stock Prediction')
    plt.xlabel('Test Data')
    plt.ylabel('Stock Prediction')
    plt.legend()
    plt.show()

def runLSTM():
    # Train (or load) a stacked LSTM and compare predictions with actual prices.
    global X, y, X_train, X_test, y_train, y_test, X_pred
    if os.path.exists("model/lstm.txt"):
        with open('model/lstm.txt', 'rb') as file:
            lstm = pickle.load(file)
    else:
        XX = np.reshape(X, (X.shape[0], X.shape[1], 1))
        print(XX.shape)
        lstm = Sequential()
        lstm.add(LSTM(units=50, return_sequences=True, input_shape=(XX.shape[1], XX.shape[2])))
        lstm.add(Dropout(0.2))
        lstm.add(LSTM(units=50, return_sequences=True))
        lstm.add(Dropout(0.2))
        lstm.add(LSTM(units=50, return_sequences=True))
        lstm.add(Dropout(0.2))
        lstm.add(LSTM(units=50))
        lstm.add(Dropout(0.2))
        lstm.add(Dense(units=1))
        lstm.compile(optimizer='adam', loss='mean_squared_error')
        lstm.fit(XX, y, epochs=1000, batch_size=16)
    # the LSTM expects 3D input, so reshape the test features before predicting
    predict = lstm.predict(np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1)))
    labels = y_test
    labels = labels[0:100]
    predict = predict[0:100]
    # calculating MSE error
    lstm_mse = mean_squared_error(labels, predict)
    text.insert(END, "LSTM Mean Square Error : " + str(lstm_mse) + "\n\n")
    text.update_idletasks()
    createTable(labels, predict, "LSTM")
    # plotting comparison graph between original values and predicted values
    plt.plot(labels, color='red', label='Original Stock Price')
    plt.plot(predict, color='green', label='LSTM Predicted Price')
    plt.title('LSTM Stock Prediction')
    plt.xlabel('Test Data')
    plt.ylabel('Stock Prediction')
    plt.legend()
    plt.show()

font = ('times', 16, 'bold')
title = Label(main, text='PREDICTING STOCK MARKET TRENDS USING DEEP LEARNING AND MACHINE LEARNING ALGORITHMS')
title.config(bg='floral white', fg='black')
title.config(font=font)
title.config(height=3, width=120)
title.place(x=0, y=5)

font1 = ('times', 13, 'bold')
uploadButton = Button(main, text="Download Dataset", command=loadDataset)
uploadButton.place(x=700, y=100)
uploadButton.config(font=font1)

corrButton = Button(main, text="Correlation for Data", command=dfcorr)
corrButton.place(x=700, y=150)
corrButton.config(font=font1)

ppButton = Button(main, text="Data Preprocessing", command=dataPreProcess)
ppButton.place(x=700, y=200)
ppButton.config(font=font1)

uniformButton = Button(main, text="Run KNN with Uniform Weights", command=uniformKNN)
uniformButton.place(x=700, y=250)
uniformButton.config(font=font1)

distButton = Button(main, text="Run KNN with Dist Weights", command=distKNN)
distButton.place(x=700, y=300)
distButton.config(font=font1)

svmButton = Button(main, text="Run SVM Algorithm", command=runSVM)
svmButton.place(x=700, y=350)
svmButton.config(font=font1)

dtButton = Button(main, text="Run Gradient Boosting Regressor Algorithm", command=runGBR)
dtButton.place(x=700, y=400)
dtButton.config(font=font1)

lstmButton = Button(main, text="Run LSTM Algorithm", command=runLSTM)
lstmButton.place(x=700, y=450)
lstmButton.config(font=font1)

predButton = Button(main, text="Predict the Test Data", command=predModel)
predButton.place(x=700, y=500)
predButton.config(font=font1)

font1 = ('times', 12, 'bold')
text = Text(main, height=30, width=80)
scroll = Scrollbar(text)
text.configure(yscrollcommand=scroll.set)
text.place(x=10, y=100)
text.config(font=font1)

main.config(bg='light sea green')
main.mainloop()
