
Gold Price Forecasting for Investors

This document discusses using machine learning techniques to predict future gold prices. It first provides background on gold markets and importance of price forecasting. It then summarizes random forest regression and linear regression methods that will be used for prediction. Random forest regression is an ensemble bagging method that combines decision trees. Linear regression models a dependent variable as a function of independent variables. The document aims to accurately forecast gold prices to inform investors and institutions using these machine learning algorithms.

Uploaded by

Manya K M
Copyright: © All Rights Reserved

Gold Market Prediction using Machine Learning Techniques
Dr. D. Durga Bhavani (1), Temburu Harshita (2), Palaparthi Hemalatha Carolin (3) and Chanamolu Harsha (4)
(1) Professor, Dept. of Computer Science, CVR College of Engineering, Ranga Reddy, India
Email: drdurgabhavani@gmail.com
(2) Dept. of Computer Science, CVR College of Engineering, Ranga Reddy, India
Email: harshitarao.26@gmail.com
(3) Dept. of Computer Science, CVR College of Engineering, Ranga Reddy, India
Email: hemacarolin721@gmail.com
(4) Dept. of Computer Science, CVR College of Engineering, Ranga Reddy, India
Email: harshachanamolu@gmail.com

Abstract: Gold price plays a significant role in economic and monetary systems. Forecasting the future price trends of gold and other precious metals will be beneficial for money managers and individual investors, both to hedge and to decide when to invest in this commodity. Central banks across the world maintain gold reserves to guarantee the money of their depositors, foreign-debt creditors, and currency holders. They also use gold reserves as a means to control inflation and strengthen their country’s financial standing. As a precious metal, gold is popular for jewelry and ornamentation. In this paper, we predict future gold prices via Linear Regression and Random Forest Regression.

Index Terms: Forecasting; Prediction; Linear Regression; Random Forest Regression; Supervised Learning

I. INTRODUCTION

Throughout history, almost every established culture has used gold to symbolize power, beauty, purity, and accomplishment. Today we continue to use gold for our most significant objects such as wedding rings, Olympic medals, awards, money, crucifixes and ecclesiastical art. No other substance of the same rarity holds a more visible and prominent place in our society. Due to its high value and very limited supply it has long been used as a medium of exchange or money. Early transactions were done using pieces of gold or pieces of silver. The rarity, usefulness, and desirability of gold make it a substance of long-term value. Forecasting its price is a widely explored topic and is of interest to multiple global institutions and to small and large scale investors[1].

In this paper we mainly cover Random Forest Regression and Linear Regression. Random Forest Regression is a Supervised Learning algorithm which uses ensemble learning methods for classification and regression. An Ensemble method is a technique that combines the predictions from multiple machine learning algorithms to make more accurate predictions than any individual model. A model composed of many models is called an Ensemble model.

There are two types of Ensemble learning: Boosting and Bagging. Boosting refers to a group of algorithms that utilize weighted averages to turn weak learners into stronger learners. Boosting is all about “teamwork”: each model that runs dictates what features the next model will focus on. Bootstrap Aggregation (Bagging) is built on bootstrapping, that is, random sampling with replacement. Bootstrap allows us to better understand the bias and the variance with the dataset. Bootstrap involves random sampling of a small subset of data from the dataset. Bagging makes each model run independently and then aggregates the outputs at the end without preference to any model.

Random forest is a bagging technique and not a boosting technique. It illustrates the power of combining many decision trees into one model. Decision trees are computationally expensive to train, carry a big risk of overfitting, and tend to find local optima because they can’t go back after they have made a split. Random forests address these weaknesses. The trees in random forests are run in parallel. There is no interaction between these trees while building the trees[2].

Linear regression generally explains the relationship between an independent or predictor variable and one dependent or criterion variable. A dependent variable is modeled as a function of the independent variable with the corresponding coefficient, along with the constant term.
CVR College of Engineering
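As a rough illustration of the bagging idea described in the Introduction, the sketch below draws bootstrap samples with replacement, fits one model per sample independently, and averages the predictions without preference to any model. It is only a sketch under stated assumptions: the base learner is a one-variable least-squares line y = a + b*x (a constant term plus one coefficient) rather than a decision tree, purely to keep the example short and dependency-free. The function names `fit_line`, `bagged_fit` and `bagged_predict` and the toy data are hypothetical, and the values 10 and 42 simply mirror the `n_estimators` and `random_state` settings quoted later in the paper.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def bagged_fit(xs, ys, n_models=10, seed=42):
    """Fit n_models lines, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    while len(models) < n_models:
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_x = [xs[i] for i in idx]
        if len(set(sample_x)) < 2:
            continue  # a line needs at least two distinct x values
        models.append(fit_line(sample_x, [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Aggregate by plain averaging, with no preference for any model."""
    return mean(a + b * x for a, b in models)


# Toy data, made up for illustration (roughly y = 2 + 3x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.9, 8.2, 10.8, 14.1, 17.0]

models = bagged_fit(xs, ys)
prediction = bagged_predict(models, 6.0)
print(f"{len(models)} models, bagged prediction at x = 6: {prediction:.2f}")
```

Each model in the list is fitted without reference to the others, which is why bagged models can be trained in parallel; a boosting method would instead let each model depend on the errors of the previous one.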


II. RELATED WORK

A. Literature review

Banhi Guha and Gautam Bandyopadhyay, in their paper “Gold Price Forecasting using ARIMA Model”, have used the Auto Regressive Integrated Moving Average model, which belongs to the family of time series models and has been widely used since the evolution of sophisticated statistical software packages. It uses 6 different parameters to accurately predict the price[3].

“Wind Turbine Noise Prediction Using Random Forest Regression” by Gino Iannace, Giuseppe Ciaburro and Amelia Trematerra presents an accurate random forest regression model for predicting wind turbine noise in Southern Italy. The methodology uses a linear regression model as a baseline against which the accuracy of the random forest regressor is compared[4].

Vanitha S and Saravanakumar K’s paper “The usage of gold and the investment analysis based on gold rate in India” gave us the essential insight into the factors affecting the price of gold across the globe. These factors include inflation, global movement, government gold reserves, festive seasons, interest rate trends, the stock market and production costs. Additional factors such as the silver rate and the crude oil rate are also responsible for the fluctuation of the price of gold[5].

“Research on Power Load Forecasting Based on Random Forest Regression” by Na Liu, Yanzhu Hu and Xinbo Ai combines a technique for denoising the data with the random forest regression algorithm to predict power consumption from various input factors. According to the paper, the establishment process of the random forest regression model mainly includes two steps: building a decision tree and forming a random forest. The random forest model is a combined model composed of many regression decision trees. The parameters of each decision tree are independent and identically distributed random variables. The regression decision trees give a prediction for each input sample[6].

Iftikhar ul Sami and Khurum Nazir Junejo’s paper “Predicting Future Gold Rates using Machine Learning Approach” describes the use of artificial neural networks and predicts the prices using performance indicators from Russia, China, and India, as they are the biggest purchasers of gold. Data for this study was collected from January 2005 to September 2016 from various sources. Data for attributes such as oil price, NYSE, the Standard and Poor’s (S&P) 500 index, US bond rates (10 years) and Euro/USD exchange rates were gathered. Data of many government central banks and five large companies that have invested huge amounts in gold have also been collected. The price of precious metals during this period is also included in the analysis[7].

III. PROBLEM STATEMENT

Historically, gold was used for supporting trade transactions around the world besides other modes of payment. Various states maintained and enhanced their gold reserves and were recognized as wealthy and progressive states. In present times, precious metals like gold are held by the central banks of all countries to guarantee repayment of foreign debts, and also to control inflation. Moreover, gold reserves reflect the financial strength of the country. Besides government agencies, various multinational companies and individuals have also invested in gold reserves. In addition to the demand and supply of the commodity in the market, the performance of the world’s leading economies also strongly influences gold rates. The successful application of data mining in highly visible fields like e-business, marketing and retail has led to its application in other industries and sectors. We predict future gold rates using Linear Regression and Random Forest Regression techniques. Accurate forecasting of the gold price helps to foresee future trends. This provides useful information for stakeholders to take the essential actions in order to prevent or mitigate risks, which may lead to financial losses or even bankruptcy.

IV. PROPOSED METHODOLOGIES

A. DataSet

To train the model we have used a historic gold prices time series dataset. It consists of Date and Gold price. The problem of missing values was handled in an appropriate manner to complete the dataset. Gold prices change on a daily basis and are also affected by major world events. The dataset consists of 9771 records. Using a stratified shuffle split, 20% of the data is used as the test set and 80% for training; the train data is reported to have 7328 records and the test data 2443 records (these counts correspond to a 75%/25% division of the 9771 records).

B. Machine Learning Models

The Linear Regression and Random Forest Regression models are fitted with the training set data and the models are tested for accuracy using the test set.

Simple linear regression is a statistical technique that uses a single explanatory variable to predict the outcome of a response variable. The goal of linear regression is to model the linear relationship between the explanatory (independent) variable and the response (dependent) variable.

The equation has the form Y = a + bX, where Y is the dependent variable (that’s the variable that goes on the Y axis), X is the independent variable (i.e. it is plotted on the
X axis), ‘b’ is the slope of the line and ‘a’ is the y-intercept[8].

A Random Forest is an ensemble technique capable of performing both regression and classification tasks with the use of multiple decision trees and a technique called Bootstrap Aggregation, commonly known as bagging. The basic idea behind this is to combine multiple decision trees in determining the final output rather than relying on individual decision trees. Using the data set mentioned, we have used 10 decision trees. Using more decision trees will invariably increase the complexity of the solution and lengthen the training period. Errors are checked for regression and the data is visualized for easy understanding. In future, we intend to improve our results by using ensemble learning and deep learning[9].

Figure 2: Random Forest Regression

C. System Architecture

Initially the data set is imported along with the required modules. Data reduction is performed on the data set if required. The data set is then divided into a Train set and a Test set. The machine learning model is built and tested for accuracy. The resulting prediction is visualized.

Figure 3: System Architecture

D. Procedure (Linear Regression)

The following is the step-wise procedure to train and test the Linear Regression model.

1. The data set is imported and the Price and Date columns are reshaped to (-1,1).
2. The data set is split into the Train set and Test set with test_size = 0.20.
3. The Linear Regression module is imported from sklearn.linear_model.
4. Train the model with the train set and predict for the test set.
5. Plot the predictions and print the accuracy.

E. Procedure (Random Forest Regression)

The following is the step-wise procedure to train and test the Random Forest Regression model.

1. The data set is imported and the Price and Date columns are reshaped to (-1,1).
2. The data set is split into the Train set and Test set with test_size = 0.20.
3. The Random Forest Regression module is imported from sklearn.ensemble and initialized with n_estimators = 10, random_state = 42.
4. Train the model with the train set and predict using the test set.
5. Plot the predictions and print the accuracy.

F. Testing for accuracy

For Linear Regression we have used the R-squared test to assess the accuracy. R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale. After fitting a linear regression model, determine how well the model fits the data.

R-squared evaluates the scatter of the data points around the fitted regression line. It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression. For the same data set, higher R-squared values represent smaller differences between the observed data and the fitted values. R-squared is the percentage of the dependent variable variation that a linear model explains.
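The R-squared measure discussed in this section can be written as 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between observed and predicted values and SS_tot is the sum of squared differences between the observed values and their mean. The following is a minimal sketch of that standard formula, not the authors' code; the function name `r_squared` and the sample values are assumptions made up for illustration.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observations, for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]

# Predictions that hit every observation explain all of the variation.
perfect = r_squared(observed, observed)

# Always predicting the mean explains none of the variation.
baseline = r_squared(observed, [mean(observed)] * len(observed))

# Predictions that are close, but not exact, land in between.
partial = r_squared(observed, [11.0, 12.0, 14.0, 15.0])

print(perfect, baseline, partial)
```

Multiplying the returned fraction by 100 gives the percentage scale used in the text.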
For a least-squares fit scored on its own training data, R-squared is always between 0 and 100%; on a held-out test set it can fall below 0%. 0% represents a model that does not explain any of the variation in the response variable around its mean. The mean of the
dependent variable predicts the dependent variable as well as the regression model. 100% represents a model that explains all of the variation in the response variable around its mean. Usually, the larger the R-squared, the better the regression model fits your observations[10].

For the Random Forest Regression model we have used the score function provided by the model itself: score(self, X, y, sample_weight=None), which returns the coefficient of determination R^2 of the prediction.

V. RESULTS AND DISCUSSION

Figure 3 shows the prediction using Linear Regression. Figure 4 shows the prediction using Random Forest Regression.

Figure 3: Prediction using Linear Regression model

Figure 4: Prediction using Random Forest Regression model

In Figure 4, blue represents the train data and pink represents the test data. It shows the most promising results, with an accuracy (R-squared) of 98.9%. The Linear Regression model shows an accuracy of 85.3%, where blue indicates the test data. Hence, the results show that the Random Forest model has a very high accuracy.

B. Drawbacks and Limitations

Using more decision trees will invariably increase the complexity of the solution and increase the training period, but if there is any change in the training data, it is very likely that at most one or two trees will be affected.

VI. CONCLUSION AND FUTURE WORK

The proposed work is implemented and tested successfully. By measuring the accuracy of the different algorithms, we found that the most suitable algorithm for predicting the market price of gold based on various data points is the Random Forest algorithm.

REFERENCES

[1] Hobart M. King, Ph.D., RPG, “The many uses of Gold”, article published at Geology.com.

[2] Bhanu Yerra, “Predicting tomorrow's gold price”, published at Towards Data Science.

[3] Banhi Guha and Gautam Bandyopadhyay, “Gold Price Forecasting Using ARIMA Model”, published in Journal of Advanced Management Science, Vol. 4, No. 2, March 2016.

[4] Gino Iannace, Giuseppe Ciaburro and Amelia Trematerra, “Wind Turbine Noise Prediction Using Random Forest Regression”, Department of Architecture and Industrial Design, Università degli Studi della Campania Luigi Vanvitelli, 81100 Aversa (CE), Italy.

[5] Vanitha S. and Saravanakumar K., “The usage of gold and the investment analysis based on gold rate in India”, published in the International Journal of Electrical and Computer Engineering (IJECE), Vol. 9, No. 5, October 2019.

[6] Na Liu, Yanzhu Hu and Xinbo Ai, “Research on Power Load Forecasting Based on Random Forest Regression”, published in IOP Conf. Series: Earth and Environmental Science.

[7] Iftikhar ul Sami and Khurum Nazir Junejo, “Predicting Future Gold Rates using Machine Learning Approach”, published in the International Journal of Advanced Computer Science and Applications, Vol. 8, No. 12, 2017.

[8] Sai Shruti Swaminathan, “Linear Regression – Detailed View”, published on Towards Data Science on Feb 26, 2018.

[9] Afroz Chakure, “Random Forest Regression”, published on Towards Data Science on June 29, 2019.

[10] Jim Frost, “How To Interpret R-squared in Regression Analysis”, published on StatisticsByJim.com.
