SENTIMENT ANALYSIS
2
Department of Computer Engineering, KJSCE, Mumbai
hodcomp@somaiya.edu
3
Department of Computer Engineering, KJSCE, Mumbai
jyothirao@somaiya.edu
ABSTRACT
The Efficient Market Hypothesis is a popular theory about stock prediction. With its failure, much research
has been carried out in the area of stock prediction. This project takes non-quantifiable data
such as financial news articles about a company and predicts its future stock trend using news sentiment
classification. Assuming that news articles have an impact on the stock market, this is an attempt to study
the relationship between news and stock trend. To show this, we created three different classification models
which depict the polarity of news articles as positive or negative. Observations show that Random Forest (RF) and SVM
perform well in all types of testing. Naive Bayes gives good results, but not as good as the other two.
Experiments are conducted to evaluate various aspects of the proposed model, and encouraging results
are obtained in all of the experiments. The accuracy of the prediction model is more than 80%; in
comparison with random labelling of news, which has 50% accuracy, the model has increased the accuracy by
30 percentage points.
KEYWORDS
Text Mining, Sentiment analysis, Naive Bayes, Random Forest, SVM, Stock trends
1. INTRODUCTION
In the finance field, the stock market and its trends are extremely volatile in nature. This attracts
researchers to capture the volatility and predict its next moves. Investors and market analysts
study market behaviour and plan their buy or sell strategies accordingly. As the stock market
produces a large amount of data every day, it is very difficult for an individual to consider all the
current and past information when predicting the future trend of a stock. There are two main methods
for forecasting market trends: technical analysis and fundamental analysis.
Technical analysis considers past price and volume to predict the future trend, whereas
fundamental analysis of a business involves analyzing
its financial data to gain insights. The efficacy of both technical and fundamental analysis is
disputed by the efficient-market hypothesis, which states that stock market prices are essentially
unpredictable.
This research follows the fundamental analysis technique to discover the future trend of a stock by
considering news articles about a company as the prime information, and tries to classify news as
good (positive) or bad (negative). If the news sentiment is positive, the stock price is more likely
to go up; if the news sentiment is negative, the stock price may go down.
This research is an attempt to build a model that predicts news polarity, which may affect
changes in stock trends. In other words, it checks the impact of news articles on stock prices. We
use supervised machine learning for classification, along with other text mining techniques, to
check news polarity. The model should also be able to classify unknown news that was not used to build the
classifier. Three different classification algorithms are implemented to check and improve
classification accuracy. We use the past three years of stock price data and news articles
for Apple.
2. LITERATURE SURVEY
Stock price trend prediction is an active research area, as more accurate predictions are directly
related to higher returns on stocks. Therefore, in recent years, significant effort has been put
into developing models that can predict the future trend of a specific stock or the overall market.
Most of the existing techniques make use of technical indicators. Some researchers have
shown that there is a strong relationship between news articles about a company and its stock
price fluctuations. The following is a discussion of previous research on sentiment analysis of text
data and different classification techniques.
Nagar and Hahsler in their research [1] presented an automated text mining based approach to
aggregate news stories from various sources and create a News Corpus. The Corpus is filtered
down to relevant sentences and analyzed using Natural Language Processing (NLP) techniques.
A sentiment metric, called NewsSentiment, utilizing the count of positive and negative polarity
words is proposed as a measure of the sentiment of the overall news corpus. They have used
various open source packages and tools to develop the news collection and aggregation engine
as well as the sentiment evaluation engine. They also state that the time variation of
NewsSentiment shows a very strong correlation with the actual stock price movement.
Yu et al. [2] present a text mining based framework to determine the sentiment of news articles
and illustrate its impact on energy demand. News sentiment is quantified, presented as a
time series, and compared with fluctuations in energy demand and prices.
J. Bean [3] uses keyword tagging on Twitter feeds about airline satisfaction to score them for
polarity and sentiment. This can provide a quick idea of the prevailing sentiment about airlines
and their customer satisfaction ratings. We have used a sentiment detection algorithm based
on this research.
Shynkevich et al. [4] study how the results of financial forecasting can be improved when
news articles with different levels of relevance to the target stock are used simultaneously. They
use a multiple kernel learning (MKL) technique to partition the information extracted
from five different categories of news articles.
News articles are divided into five categories of relevance to a targeted stock: the stock itself, its sub
industry, industry, group industry and sector, with separate kernels employed to analyze
each one. The experimental results show that the simultaneous usage of five news categories
improves the prediction performance in comparison with methods based on a lower number of
news categories. The findings show that the highest prediction accuracy and return per
trade were achieved for MKL when all five categories of news were utilized with two separate
kernels, of the polynomial and Gaussian types, used for each news category.
3. METHODOLOGY
The following system design is proposed in this project to classify news articles and generate a stock
trend signal.
[Figure: system design. Blocks: News Collection; Document Representation; Polarity Detection Algorithm; Plot time series of past Adj_close price; System Evaluation]
Also, to ignore words that appear in only one or two documents, we apply a minimum
document frequency, keeping only words that appear in at least three documents.
Stemming is also important to reduce redundancy in words. In the stemming process, each
word is replaced by its stem. For example, the words developed,
development and developing are all reduced to the stem develop. Some of the pre-processing
steps are applied before the polarity detection algorithm, and others are applied
after it.
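The minimum document frequency filter and stemming described above can be sketched as follows. This is a minimal illustration, not the implementation used in this project: the stop word list, the suffix list and the function names (`toy_stem`, `tokenize`, `build_vocabulary`) are assumptions, and the stemmer is a crude suffix stripper rather than a standard algorithm such as Porter's.

```python
from collections import Counter

# Hypothetical toy stop word list; the project's own dictionary is not reproduced here.
STOP_WORDS = {"the", "a", "is", "of", "and", "in", "to"}

# Crude suffix stripping for illustration only; this is NOT the Porter stemmer.
SUFFIXES = ("ment", "ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, strip punctuation, drop stop words, then stem."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [toy_stem(w) for w in words if w and w not in STOP_WORDS]


def build_vocabulary(documents, min_df=3):
    """Keep only terms that occur in at least min_df documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return {term for term, df in doc_freq.items() if df >= min_df}


docs = [
    "Apple developed a new phone",
    "Development of the phone is slow",
    "Apple is developing chips",
    "Profits rise",
]
vocabulary = build_vocabulary(docs, min_df=3)
```

In this toy corpus only the stem develop survives the filter, because it is the only term present in three documents.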
Here, we make one assumption: if the score of a document is 0, we label it as
positive, since we treat this implementation as a two-class problem. As a result, we get
a news collection in which each article has a sentiment score and a polarity of positive or negative.
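The labelling rule above can be sketched as a dictionary-based scorer. This is a minimal sketch under stated assumptions: the score is taken to be the count of positive words minus the count of negative words, the two word sets are tiny hypothetical stand-ins for the general and finance-specific dictionaries, and the function names are illustrative.

```python
# Hypothetical miniature dictionaries; the real ones are much larger.
POSITIVE_WORDS = {"gain", "profit", "growth", "strong"}
NEGATIVE_WORDS = {"loss", "decline", "lawsuit", "weak"}


def sentiment_score(text):
    """Assumed scoring rule: positive word count minus negative word count."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    positives = sum(1 for t in tokens if t in POSITIVE_WORDS)
    negatives = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return positives - negatives


def polarity(text):
    """Two-class label; a score of 0 is treated as positive."""
    return "positive" if sentiment_score(text) >= 0 else "negative"


labelled = [
    (headline, sentiment_score(headline), polarity(headline))
    for headline in (
        "Strong growth offsets a small loss.",
        "The company announced a new office",
        "Weak sales and a lawsuit",
    )
]
```

The second headline contains no dictionary words, so its score is 0 and it falls into the positive class under the two-class assumption.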
4. EVALUATION
We tested the models using different testing options so that we can compare each method
under different scenarios. The following are the test options on which we tested our models.
5-fold cross validation
10-fold cross validation
15-fold cross validation
70% Data split
80% Data split
New testing data
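The cross validation and data split options listed above can be sketched as index generators. This is an illustrative sketch only: the helper names `k_fold_indices` and `percentage_split`, the fixed seed and the sample count of 100 are assumptions, and the classifiers themselves are left out.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def percentage_split(n_samples, train_fraction, seed=0):
    """Return (train, test) index lists for a single percentage split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


n_articles = 100  # hypothetical number of labelled news articles
fold_counts = {k: len(list(k_fold_indices(n_articles, k))) for k in (5, 10, 15)}
train_70, test_70 = percentage_split(n_articles, 0.70)
train_80, test_80 = percentage_split(n_articles, 0.80)
```

Each model would be trained on the `train` indices and scored on the `test` indices. The "new testing data" option differs in that the test articles come from outside the labelled collection altogether.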
Figure 4: Time series plot of news sentiment score vs. actual stock price for test dataset
5. CONCLUSION
Finding the future trend of a stock is a crucial task because stock trends depend on a number of
factors. We assumed that news articles and stock price are related to each other, and that news may
be able to move the stock trend. We therefore studied this relationship thoroughly and concluded
that stock trend can be predicted using news articles and previous price history.
As news articles capture sentiment about the current market, we automate this sentiment
detection and, based on the words in the news articles, obtain an overall news polarity. If the
news is positive, we can state that its impact on the market is good, so the stock price is more likely
to go up. If the news is negative, it may push the stock price
down.
We used a polarity detection algorithm to initially label the news and build the training set. For
this algorithm, a dictionary based approach was used. The dictionaries of positive and negative
words were created using general and finance specific sentiment carrying words. Pre-processing
of the text data was also a challenging task. We created our own dictionary for stop word
removal, which also includes finance specific stop words. Based on this data, we implemented
three classification models and tested them under different test scenarios. After comparing their
results, Random Forest worked very well in all test cases, with accuracy ranging from 88% to 92%.
SVM followed with a considerable accuracy of around 86%, and Naive Bayes
reached around 83%. Given any news article, the model can arrive
at a polarity, which in turn predicts the stock trend.
FUTURE WORK
We would like to extend this research by adding data from more companies and checking the prediction
accuracy. For companies where the availability of financial news is a challenge, we would
use Twitter data for similar analysis. We can also incorporate similar strategies into algorithmic
trading.
ACKNOWLEDGEMENTS
The authors would like to thank their guides, teachers, family and friends who supported the
completion of this research project, and everyone who helped, knowingly or
unknowingly, with this project.
REFERENCES
[1] Anurag Nagar, Michael Hahsler, Using Text and Data Mining Techniques to extract Stock
Market Sentiment from Live News Streams, IPCSIT vol. XX (2012) IACSIT Press, Singapore
[2] W.B. Yu, B.R. Lea, and B. Guruswamy, A Theoretic Framework Integrating Text Mining and
Energy Demand Forecasting, International Journal of Electronic Business Management. 2011,
5(3): 211-224
[3] J. Bean, R by example: Mining Twitter for consumer attitudes towards airlines, In Boston
Predictive Analytics Meetup Presentation, 2011
[4] Yauheniya Shynkevich, T.M. McGinnity, Sonya Coleman, Ammar Belatreche, Predicting Stock
Price Movements Based on Different Categories of News Articles, 2015 IEEE Symposium
Series on Computational Intelligence
[5] P. Hofmarcher, S. Theussl, and K. Hornik, Do Media Sentiments Reflect Economic Indices?
Chinese Business Review. 2011, 10(7): 487-492
[6] R. Goonatilake and S. Herath, The volatility of the stock market and news, International
Research Journal of Finance and Economics, 2007, 11: 53-65.
[7] Spandan Ghose Chowdhury, Soham Routh, Satyajit Chakrabarti, News Analytics and Sentiment
Analysis to Predict Stock Price Trends, (IJCSIT) International Journal of Computer Science and
Information Technologies, Vol. 5 (3), 2014, 3595-3604
[8] Robert P. Schumaker, Yulei Zhang, Chun-Neng Huang, Sentiment Analysis of Financial News
Articles
[9] Győző Gidófalvi, Using News Articles to Predict Stock Price Movements, University of
California, San Diego La Jolla, CA 92037, 2001
[10] L. Breiman, Random forests. Machine Learning, 45(1):5-32, 2001
[11] Data Mining Lab 7: Introduction to Support Vector Machines (SVMS)
[12] Joachims T., Text Categorization with Support Vector Machines: Learning with Many Relevant
Features, European Conference on Machine Learning (ECML), Application of Machine
Learning and Data mining in Finance, Chemnitz, Germany, 1998
[13] Kyoung-jae Kim, Financial time series forecasting using support vector machines,
Neurocomputing 55 (2003) 307-319
[14] Pegah Falinouss, Stock Trend Prediction using News articles, The Lulea University of
Technology, 2007
[15] https://en.wikipedia.org/wiki/Support_vector_machine
[16] http://www3.nd.edu/~mcdonald/Word_Lists.html
[17] https://jeffreybreen.wordpress.com/2011/07/04/twitter-text-mining-r-slides/
Authors
Kalyani Joshi
Student of Master of Engineering at K. J.
Somaiya College of Engineering, Mumbai.
Completed Bachelor of Engineering at Pune
University, 2013.
Prof. Bharathi H. N.
Currently working as Head of Department of
Computer Engineering at K. J. Somaiya
College of Engineering, Mumbai.