Algorithmic Trading Bot
ISSN 2229-5518
Abstract- We explore financial machine learning by building an algorithmic trading bot that supports better investment decisions and
seeks out more profitable trades. Traders spend a lot of time continuously monitoring transactions. The trading bot can take over this
monitoring, saving the cost and time of constant supervision and thereby reducing transaction costs. It also greatly reduces the
possibility of human error while placing trades. The bot can be used by institutional investors and large brokerage houses, both for
stocks such as Apple, Microsoft, and Tesla (AAPL, MSFT, TSLA) and for cryptocurrencies such as Bitcoin and Ethereum (BTCUSD, ETHUSD).
We have implemented two machine learning algorithms based on ensemble learning and support vector machines. A random forest regressor
drives the buy-and-hold trading strategy for long-term investments, and a support vector regressor drives the scalping trading strategy
for short-term investments. We also used backtesting to validate the strategies before deployment.
Index Terms— Ensemble learning, financial machine learning, hyperparameter optimization, random forests, regression, support vector
machine, trading algorithms
competition, many traders in the same market use these
machine learning algorithms for the same purpose. So, the
data and can identify the features which indicate an
approaching variation in the bid and market pricing.
Algorithmic trading also reduces human error. Human
traders can be affected by intense market pressure, which
may cloud their judgement and lead to poor market decisions.
Algorithmic trading aims to reduce such errors caused by
psychological and emotional factors.

language processing (NLP) techniques and then using
that information to predict changes in stock prices or
volatility. The algorithms are then applied to a wide
range of texts, including publications from online
newspapers such as financial newsletters and the Wall
Street Journal, as well as television transcripts, radio
broadcasts, and annual reports. Their solution is made
up of two parts: a text understanding component that
fills in simple templates instantly and a statistical
correlation component that analyses the relationship
between this pattern and stock price gains or declines.

Similarities with our solution:
Both solutions are related to financial trading
algorithms, namely the assessment of rapidly
changing sources of information, combining natural
language processing and user trading behaviour to
predict fluctuations in stock price or volatility.
These predictions can be utilised to develop successful
trading strategies.

Differences between the patent and our solution:
In the patent, they used natural language processing
to extract information from news, parsing or pattern
matching on words to identify natural language text
describing activities or announcements of a particular
publicly traded company in order to build the dataset,
whereas in our solution we used the Zipline data portal
interface to plot and chart the pricing data using
Matplotlib and analyse candlestick charts.

B. A kind of Stock Market Forecasting method merged
based on sentiment analysis and HMM (2014)

[2] In this patent, they used a method of sentiment
analysis and HMM, enhancing the accuracy of stock
market forecasting by employing the emotional
tendency information in economic and financial news
webpages. The method has great potential for
application in domains like sentiment analysis, subject
identification, stock market forecasting, and website
content monitoring.

analysis and HMM, whereas in our solution we have
used the Zipline data portal interface to plot and chart
the pricing data using Matplotlib and analyse
candlestick charts.

C. Coordination of algorithms in algorithmic trading
engine (2010)

[3] In this patent, they used a method for optimizing
algorithmic trading by making the chores of initiating
and running algorithms easier while also giving
real-time feedback on the user's automated trade
executions. Preferred implementations of the system
overcome recognised limitations of algorithmic trading
products by
(1) allowing users in financial markets to use a
simplified, intuitive graphical interface to click and drag
complicated, multi-algorithm investment strategies,
(2) allowing users to track informational market
impact costs in real-time, and
(3) automating the classification, management, and
cancellation of algorithms based on user input.

Similarities with our solution:
Both the patent's and our solution's goal is to provide
real-time feedback to traders on both their order
implementations and the market impact they have.
Both solutions try to complement rather than replace
the value a human brings to the trading process,
broadening rather than narrowing the trader's point of
view on the industry and on how their orders affect it,
all through direct visual evidence of changes
IJSER © 2022
http://www.ijser.org
International Journal of Scientific & Engineering Research Volume 13, Issue 7, July-2022 522
simulation indicative order input area generates
feedback for assessing operational characteristics of an
algorithm described as in the algorithm. Along with an
auto hedging option, a scratch quantity is employed.
If the quantity in a market at a counter order's market
price falls below the stated scratch amount, the
counter order's price level decreases.
We use random forests because bagging many decision trees
gives better results and lets us rank the required features
according to their importance. We will perform a regression
analysis using regression trees, each grown with a split
criterion. After the complete evaluation of our RFI, we will
build a trading algorithm (for example a one-pass algorithm or
a linear algorithm, using libraries such as scikit-learn) and then
implement a strategy that exploits the correlation.
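
The bagging idea described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes one-feature decision stumps as base learners, toy price data, and hypothetical helper names (`fit_stump`, `fit_bagged_stumps`, `bagged_predict`). It only shows how bootstrap samples are drawn and how the per-tree predictions are averaged for regression.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on a single feature.

    Returns (threshold, left_mean, right_mean), choosing the threshold
    that minimises the summed squared error of the two leaves.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:
            continue  # the largest value leaves nothing on the right side
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Only one distinct x value in the sample: always predict the mean.
        overall = mean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def bagged_predict(stumps, x):
    """Regression ensembles aggregate by averaging the individual predictions."""
    return mean(predict_stump(s, x) for s in stumps)


# Toy data (assumption): a price that jumps from about 10 to about 20.
days = list(range(1, 11))
prices = [10.0, 10.5, 9.8, 10.2, 10.1, 20.3, 19.9, 20.1, 20.4, 19.7]
forest = fit_bagged_stumps(days, prices, n_trees=25, seed=0)
print(bagged_predict(forest, 2), bagged_predict(forest, 9))
```

A real random forest additionally grows deeper trees and samples a random subset of features at each split; the sketch keeps only the bootstrap-and-average step.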
We will then implement gradient boosting (GBoosting), an ML
technique for regression as well as classification problems.
We will build the model in a stage-wise way, generating each
stage by optimizing an arbitrary differentiable loss function
in Python, and then evaluate the model's performance. At this
step, we are going to introduce risk management, which is very
important because it gives traders signals, based on financial
indicators, for driving trades profitably. In the advanced
trading algorithm, we are going to introduce a scalping
strategy, which will be very helpful to local investors because
it leaves them less prone to risk and is therefore attractive.
Many such advances benefit our project, from setting targets to
seizing opportunities through proper observation of the trend.
The positions we assume are held in real-time trading, which is
intended to ensure profitability. The process we will follow for
risk management is defining goals, measuring risk, and then
designing a system.

as financial data structures as these are essential in designing
any sort of financial machine learning model and hence will
build a bot skeleton. Data preprocessing is followed by
designing a machine learning model, diving deeper into ML,
and training and evaluating our model.

To fetch the data, we install zipline into the packages of the
Python interpreter. To retrieve the publicly available dataset,
we create an account on quandl.com. To reduce basic errors, we
tried using Quantopian-quandl. We use the history window to
obtain a data frame containing the price history; it is available
as a member function of the zipline data portal interface, which
is used internally to answer questions about the data
(df = data_port.get_history_window()). Initializing the data
portal interface requires three mandatory parameters. The first
parameter is asset_finder, a method reference used internally to
resolve assets; here it is a member of the bundle object, which
was initialized previously by calling the bundle loading method
and passing the name of the Quandl dataset. Our second parameter
is the trading calendar, used internally for minute and session
scheduling. The remaining parameters are also taken from the
bundle object.

4 PROPOSED METHODOLOGY
A. Data extraction, preprocessing, and feature selection
To identify the combination of features and strategy
parameters that gives the most accurate model, we use
different segments of the dataset. To fetch the dataset we
use Quandl, which we also use for data preprocessing. We will
also try Quantopian-quandl for a more accurate result.
To obtain an accurate model and to evaluate the final feature
space, we apply the random forest strategy, chosen for its
iterative nature, to all combinations of strategy data, which
gives a statistical indication of the most accurate model. For
feature selection, we use the filter method, which picks up
properties that are intrinsic to the features. Because this
method is faster and less expensive than wrapper techniques,
it is cheaper to use. To measure the linear relationship
between variables and to predict the data, we use a correlation
coefficient. The logic behind using it for feature selection is
that a good feature is correlated with the target, while the
selected variables should remain uncorrelated among
themselves. Information gain techniques are used to calculate
the reduction in entropy obtained by transforming a dataset.
We are also trying other techniques and will settle on the one
whose values are closest to our profitable prediction values.
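
The correlation-based filter method described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used in this work: the feature names, the toy values, and the two thresholds (`min_target_corr`, `max_pair_corr`) are hypothetical. A feature is kept when it is strongly correlated with the target and not nearly collinear with a feature that has already been kept.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_a * var_b)


def filter_select(features, target, min_target_corr=0.5, max_pair_corr=0.9):
    """Rank features by |corr with target|, then drop weak or redundant ones."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if abs(pearson(features[name], target)) < min_target_corr:
            continue  # too weakly related to the target
        if any(abs(pearson(features[name], features[k])) > max_pair_corr
               for k in kept):
            continue  # redundant with a feature that is already kept
        kept.append(name)
    return kept


# Toy data (assumption): next-day price as target, four candidate features.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "close_lag1": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "close_sma": [1.0, 2.1, 3.0, 4.1, 5.0, 6.1],
    "volume": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
selected = filter_select(features, target)
print(selected)
```

Because each feature is scored independently of any model, the filter runs in a single pass over the data, which is why it is cheaper than wrapper techniques that retrain a model for every candidate subset.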
B. Ensemble learning
the number of levels of the decision tree makes it more prone
to overfitting.

We can relate the use of diverse decision trees to the
diversity of financial portfolios. Just as it is always better to
maintain a mixed portfolio across debt and equity funds to
reduce risk, multiple and diverse decision trees lead to
better performance on unseen data. Ensemble learning
aggregates multiple results, leading to more stability and
robustness and to noise reduction. Another advantage of
ensemble learning is that it can capture both linear and
non-linear relationships in the data by using different models
and then forming their ensemble.

(for classification). We shall be using the average method since
ours is a regression model for predicting stock prices.

starting year of data availability is much before Tesla’s
foundation year, making the null values not useful.

E. Splitting criteria

results. But we shall be using the Gini index as it is faster. The
reason is that entropy uses the logarithmic function,
making it more computationally expensive. So, we shall use
multiple CART trees in the random forest.
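
The difference between the two splitting criteria can be made concrete with a small sketch. It assumes, purely for illustration, that the target has been discretised into class labels such as "up" and "down" moves; the function names are hypothetical. The Gini index needs only squares of the class proportions, while entropy needs a logarithm per class, which is the extra cost mentioned above.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((count / n) * log2(count / n)
                for count in Counter(labels).values())


def split_impurity(left, right, criterion):
    """Size-weighted impurity of a candidate split, as used by CART."""
    n = len(left) + len(right)
    return (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)


# Toy node (assumption): two "up" days and two "down" days.
node = ["up", "up", "down", "down"]
print(gini(node), entropy(node))
print(split_impurity(["up", "up"], ["down", "down"], gini))
```

Both criteria are largest for an evenly mixed node and zero for a pure one, so they usually rank candidate splits similarly.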
Trading Strategy

Scalping is a more advanced trading strategy than the one we
used before. Scalping executes trades at very high speed.
The strategy works on the principle of opening and closing
positions rapidly, which limits exposure to the market.
A strict exit strategy is needed so that one huge loss
does not wipe out multiple small profits.

distribution after concatenating the historical returns with the
future returns. In our case, we choose this lowest percentile to
be 5%.

This risk metric triggers an action that is executed
whenever the returns for a particular trading minute fall below
the value at risk. The action sends an exit signal to exit
our trading position.
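
The value-at-risk exit rule described above can be sketched as follows. This is a minimal illustration, not the deployed code: it assumes that the "future returns" are the returns predicted by the model, it uses a linearly interpolated percentile, and the function names and sample returns are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated percentile of values, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def value_at_risk(historical_returns, predicted_returns, q=5.0):
    """Lowest q-th percentile of the concatenated return distribution."""
    combined = list(historical_returns) + list(predicted_returns)
    return percentile(combined, q)


def should_exit(minute_return, var_threshold):
    """Exit signal: the return for this trading minute fell below the VaR."""
    return minute_return < var_threshold


# Toy per-minute returns (assumption).
historical = [-0.03, -0.01, 0.0, 0.01, 0.02]
predicted = [0.005, 0.015, -0.02, 0.01, 0.03]
var_5 = value_at_risk(historical, predicted, q=5.0)
print(var_5, should_exit(-0.03, var_5), should_exit(-0.01, var_5))
```

With a 5% threshold only the worst tail of the distribution triggers an exit, which matches the strict exit policy that scalping requires.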
The graph below shows the price of Tesla stock over time.

The graph below depicts the final portfolio value for the Tesla
stock; as we can see, the final portfolio value was somewhere
around 24 million dollars.

Here we can see there are two colours: red and blue. The red
colour shows the ground truth (the data in the dataset), whereas
the blue colour shows the data that our model predicted.

The graph below shows the algorithm volatility vs benchmark
volatility. Here, our volatility is high, but since higher risk
generally accompanies higher potential profit, we accept it.

Here the output is negative, which means that we have sold the
stock for more than we paid for it, indicating that we have
made a profit.

And this is the final portfolio value for a year, which is around
17 million dollars.
The graph below has two plots: one for price and the other one
for correlation.

SECTION 3 - ADVANCED TRADING ALGORITHM
The graph we see below is the Sharpe ratio evaluation metric
which is a financial metric often used by investors.
The graph below is the bar chart of Sharpe ratios; here most
of our values are on the negative side.
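
For reference, the Sharpe ratio is commonly computed as the mean excess return divided by the standard deviation of the excess returns, scaled to a yearly figure. The sketch below follows that common definition; the daily frequency (252 trading periods per year), the zero risk-free rate, and the sample returns are assumptions made for illustration, not values taken from our backtest.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualised Sharpe ratio of a series of per-period returns.

    risk_free_rate is an annual rate and is spread evenly over the periods.
    """
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    spread = stdev(excess)  # sample standard deviation, needs >= 2 returns
    if spread == 0:
        return 0.0  # no variation: the ratio is undefined, report 0
    return mean(excess) / spread * sqrt(periods_per_year)


# Toy daily returns (assumption).
daily_returns = [0.01, -0.005, 0.007, 0.003, -0.002]
losing_returns = [-r for r in daily_returns]
print(sharpe_ratio(daily_returns), sharpe_ratio(losing_returns))
```

A negative value, as in the bar chart, means the average return of the period was below the risk-free rate.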
The graph below shows that we have only entered trades
without exiting them, which is a sign of error in the long run.
Using ‘clf.best_params_’ we got our best value for estimator_c
as ‘1’, which gave us a good result.

The graph below also depicts the Sharpe ratio, but this time
during backtesting. As we can see, the Sharpe ratio was positive
most of the time. It was only negative in the beginning.

Finally, the image below shows the percentage of returns, i.e.,
we made around 33 per cent profit.
The graph below depicts cumulative returns (defined as the sum
of historical returns), and as we can see the return is very high.

B. Comparison
I.
[1] In this patent they used a method of extracting
information from online news feeds using neural
networks and natural language processing techniques
and then using that information to predict changes in
stock prices or volatility, while in our model we used
random forests and support vector machines for the
same purpose. Random forests and SVMs are
comparatively inexpensive and don’t require a graphics
processor to complete training. A random forest builds
on decision trees, so it remains understandable while
giving better performance. Neural networks, to be
effective, require far more data than the average person
has on hand. Support vector machines and random
forests, on the other hand, require far less input data.
For the sake of performance, a neural network sacrifices
the interpretability of the data to the point of making it
meaningless. Therefore, using random forests and
support vector machines over neural networks is the
better choice.
the dense characteristics in NLP, and thus results in
sentiment analysis and machine translation, whereas
Naive Bayes' results are inconsistent. You don't
anticipate the inputs to be substantially connected,
thus Naive Bayes is more of a generic approach that
only works when we want to categorise a tiny corpus
of data with a relatively limited number of input
attributes.

and the type of distance to be utilized must also be
determined. As we must compute the distance between
each query instance and all training samples, the
computation time is likewise very long. Therefore, the
selection of K, as well as the metric (distance) to use in
KNN, must be carefully calibrated. We used SVM in our
model because outliers are handled better by SVM than
by KNN. Since there are many characteristics and little
training data, SVM outperforms KNN, making our
solution more efficient.

1  Algorithm                           Neural Networks  Naive Bayes  KNN    SVM and Random Forests
2  R2 Value                            0.2567           0.36         0.997  0.9937
3  Support Vector Regression profit %  12%              42%          27%    33%

C. Squareoff - Algo Trading firm

S. no.  Evaluation Metrics  A  B  C  Algorithmic Trading Bot

of 33% was obtained, making our models fit for deployment.
This was possible by implementing a strict exit policy for
short-term investments wherein the bot would make multiple small
profits and exit the position before a huge loss can be inflicted.

7 REFERENCES

[1] https://patents.google.com/patent/US8285619B2/en
[2] https://patents.google.com/patent/CN103778215B/en?q=stock+market&oq=stock+market
[3] https://patents.google.com/patent/US8095455B2/en?q=algorithm+trading&oq=algorithm+trading
[4] https://patents.google.com/patent/JP2017117473A/en?q=algorithm+trading&oq=algorithm+trading
[5] https://squareoff.in/