
Volume 6, Issue 5, May – 2021 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165
LOGISTIC REGRESSION MODEL – A REVIEW

Mrinalini Smita
Assistant Professor,
Intermediate Section, St. Xavier's College, Ranchi, Jharkhand

Abstract:- The method of determining future values of a company's stocks and other financial values is called stock price prediction. The movements of stock prices and stock indices are influenced by many macro-economic variables such as political events, policies of corporate enterprises, general economic conditions, the commodity price index, bank rates, loan rates, foreign exchange rates, investors' expectations, investors' choices and the psychology of stock market investors [Miao et al., 2007]. Hence, developing predictive models for the stock market is a difficult task because of the uncertainty involved in its movements, which is why forecasting models require continuous improvement. Forecasting accuracy is the most important factor in selecting any forecasting method. Financial ratios influence investment decision-making; for this reason, stock market prediction by binary logistic regression, using the relation between financial ratios and stock performance, can enhance an investor's stock price forecasting ability. This paper presents a review of the Logistic Regression Model (LRM).

Keywords:- Stock Price Prediction, Financial Ratios, Logistic Regression Model.

I. INTRODUCTION

LOGISTIC REGRESSION
Regression analysis is one of the most useful and most frequently used statistical methods. The aim of regression methods is to describe the relationship between a response variable and one or more explanatory variables. Among the different regression models, logistic regression plays a particular role [Ngunyi et al., 2014]. Logistic regression extends the ideas of linear regression to situations where the dependent variable Y is categorical: it classifies observations into two or more mutually exclusive classes on the basis of a set of independent variables [Ali et al., 2018]. Logistic regression seeks to:
 MODEL the probability of an event occurring depending on the values of the independent variables, which can be categorical or numerical.
 ESTIMATE the probability that an event occurs for a randomly selected observation versus the probability that it does not occur.
 PREDICT the effect of a series of variables on a binary response variable.
 CLASSIFY observations by estimating the probability that an observation is in a particular category (such as GOOD or POOR performance of a company of the S&P BSE 30 in our case).

OBJECTIVE OF THE STUDY:
 To help investors get an idea of where to invest their money.
 To enable stock brokers and the investing public to make better-informed decisions.

II. LOGISTIC REGRESSION MODEL

The binary logistic regression model is also known as a predictive model. It is used where the dependent variable is dichotomous or binary (0 or 1), such as good/poor or success/failure [Ali et al., 2018]. In logistic regression, the goal is to classify observations into one of the classes.

ASSUMPTIONS OF THE LOGISTIC REGRESSION MODEL:
 The dependent variable should be measured on a dichotomous scale, for example stock performance GOOD or POOR.
 There must be one or more continuous or categorical independent variables.
 The dependent variable should have mutually exclusive and exhaustive categories.
 There should be a linear relationship between any continuous independent variables and the logit transformation of the dependent variable.
 There should be little or no multicollinearity, i.e. the independent variables should not be highly correlated with one another.

Logistic regression is a predictive analysis which extends the idea of linear regression to the situation where the dependent variable is categorical with two levels, such as Y/N, High/Low or Good/Poor, while the predictors can be continuous or dichotomous, just as in regression analysis. Since the probability of an event must lie between 0 and 1, it is impractical to model probabilities with linear regression techniques, because the linear regression model allows the dependent variable to take values greater than 1 or less than 0 [Dutta et al., 2012]. Moreover, calculations using binary linear regression are complex, and its accuracy for classification is low [Navale et al., 2016].
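The point that linear regression produces invalid "probabilities" is easy to demonstrate. In the minimal sketch below (our own toy data, not from the paper), an ordinary least-squares line fitted to 0/1 labels predicts values outside the [0, 1] range at the extremes:

```python
import numpy as np

# Toy 0/1 outcomes against a single predictor
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Ordinary least squares: fit y = b0 + b1*x
# (np.polyfit returns coefficients highest degree first)
b1, b0 = np.polyfit(x, y, 1)

# Predictions at the extremes escape the [0, 1] range
print(b0 + b1 * (-1.0))  # below 0
print(b0 + b1 * 6.0)     # above 1
```

This is exactly the defect the logistic transformation removes, by mapping the linear predictor into (0, 1).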

IJISRT21MAY1050 www.ijisrt.com 1276



Graph 1. Linear Regression model    Graph 2. Logistic Regression model    Graph 3. Graph of Logit
(Graphs depicting the difference between LINEAR regression and LOGISTIC regression.)

In logistic regression, instead of predicting the value of the variable Y from a predictor variable X or several predictor variables Xs, we predict the probability of Y occurring, given known values of the Xs. Hence the logistic regression model is a type of generalised linear model (GLM) that extends the linear regression model by linking the range of real numbers to the 0-1 range.

Classification with linear regression is not possible, as h(x), given by

h(x) = ∑_{i=0}^{N} β_i x_i, where N is the number of predictors,

always yields real values. So we apply another function to the linear function and use the result for classification. That function is the logistic function (sigmoid function), which is an S-shaped curve.

The logistic regression model (logit model) is given below:

p(x_i) = P(y_i = 1 | x_i) = [1 + exp(−X^T β)]^(−1)

where X^T β = β_0 + β_1 x_1 + β_2 x_2 + ⋯ + β_n x_n, x_1, x_2, … are the independent variables and the β_j are the coefficients.

Logistic regression models the relationship of the dichotomous dependent variable to the predictors through odds ratios. The ODDS RATIO for a variable in logistic regression represents how the odds change with a one-unit increase in that variable, holding all other variables constant. In logistic regression, the dependent variable is a log-odds, or logit, which is the natural log of the odds [Dutta et al., 2012]:

ODDS = p(x) / (1 − p(x)) = [exp(−X^T β)]^(−1)

Taking the natural log of both sides:

log[ p(x) / (1 − p(x)) ] = X^T β

The above transformation is known as the logistic transformation (logit) [Ali et al., 2018].

Graph 4: Sigmoid Function

In the graph, as z → ∞, g(z) → 1, and as z → −∞, g(z) → 0, where

g(z) or σ(z) = 1 / (1 + e^(−z))

Just like h(x) in linear regression, in logistic regression for classification we have

h(x) = g(∑ β_i x_i) = g(β^T X)   [matrix notation]

or, equivalently, σ(z) = 1 / (1 + e^(−β^T X)).
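The sigmoid and its inverse, the logit, can be sketched in a few lines of Python (a minimal illustration; the function names are ours, not from the paper):

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the sigmoid: the log-odds log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# As z -> +inf, g(z) -> 1; as z -> -inf, g(z) -> 0
print(sigmoid(0.0))                    # 0.5
print(sigmoid(10.0) > 0.9999)          # True
print(round(logit(sigmoid(1.7)), 6))   # recovers 1.7
```

The round trip logit(sigmoid(z)) = z is what lets the model work on the unbounded log-odds scale while still producing probabilities.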



We can use a linear function of β, pass it through the sigmoid function (the S-shaped curve) and use the result for classification.

The derivative of the sigmoid function, given by

g′(z) = g(z)(1 − g(z)),

is its most attractive feature: it is extremely simple to compute, which makes the sigmoid convenient for classification problems.

III. PARAMETER ESTIMATION

Maximum Likelihood Estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that the observed data become as probable as possible under the fitted model. The goal of maximum likelihood is to find the optimal way to fit a distribution to the data.

In Maximum Likelihood Estimation:
 Consider N samples with labels either 0 or 1.
 For samples labelled "1": estimate β̂ (β-hat) such that p(X) is as close to 1 as possible.
 For samples labelled "0": estimate β̂ (β-hat) such that 1 − p(X) is as close to 1 as possible.

Combining these requirements, we want to find β parameters such that the product of both these quantities is maximal over all elements of the dataset. The function we need to optimize is called the likelihood function. Thus, combining the products:

L(β) = ∏_{i: y_i = 1} p(x_i) · ∏_{i: y_i = 0} (1 − p(x_i))

L(β) = ∏_i p(x_i)^(y_i) · (1 − p(x_i))^(1 − y_i)

Taking logs (here l denotes the log-likelihood):

l(β) = ∑_{i=1}^{n} y_i log(p(x_i)) + (1 − y_i) log(1 − p(x_i))

Substituting p(x_i) = 1 / (1 + exp(−βx_i)):

l(β) = ∑_{i=1}^{n} y_i log( 1 / (1 + exp(−βx_i)) ) + (1 − y_i) log( exp(−βx_i) / (1 + exp(−βx_i)) )

l(β) = ∑_{i=1}^{n} y_i [ log( 1 / (1 + exp(−βx_i)) ) − log( exp(−βx_i) / (1 + exp(−βx_i)) ) ] + log( exp(−βx_i) / (1 + exp(−βx_i)) )

l(β) = ∑_{i=1}^{n} y_i log(exp(βx_i)) + log( exp(−βx_i) / (1 + exp(−βx_i)) · exp(βx_i) / exp(βx_i) )

l(β) = ∑_{i=1}^{n} y_i βx_i + log( 1 / (1 + exp(βx_i)) )

l(β) = ∑_{i=1}^{n} y_i βx_i − log(1 + exp(βx_i))   [transcendental equation]

This is the final form of the likelihood function to be optimized; the goal is to find the value of β that maximizes it. It is a transcendental equation (involving logarithmic and exponential terms) with no closed-form solution, but we can use numerical methods for approximation. To update the β parameters so as to maximize this function, we use the Newton-Raphson method, which converges to the maximum.

IV. NEWTON RAPHSON METHOD FOR PARAMETER ESTIMATION

One of the most common methods of determining the β values for the logistic regression equation, and then making predictions on new data, is the Newton-Raphson method. Newton-Raphson is a deterministic numerical optimization technique; its primary advantage over probabilistic alternatives is that it converges quickly in most cases.

We consider the Newton-Raphson method. Expanding the gradient of l around a point β*:

∇_β l(β) ≈ ∇_β l(β*) + (β − β*) ∇_ββ l(β*)

At the maximum the gradient vanishes, so we set it to zero:

∇_β l(β*) + (β − β*) ∇_ββ l(β*) = 0

β = β* − ∇_β l(β*) / ∇_ββ l(β*)

β^(t+1) = β^t − ∇_β l(β^t) / ∇_ββ l(β^t)   [Newton-Raphson equation]

We compute this update for t iterations, after which it converges to the approximate coefficient vector.
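As a numerical illustration of this update, the sketch below (our own toy data and function names, not from the paper) maximizes the simplified log-likelihood l(β) = ∑ y_i βx_i − log(1 + exp(βx_i)) by iterating the Newton-Raphson step in the matrix form derived below, β^(t+1) = β^t + (X^T W X)^(−1) X^T (Y − Ŷ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(beta, X, y):
    """l(beta) = sum_i [ y_i*(x_i.beta) - log(1 + exp(x_i.beta)) ]."""
    z = X @ beta
    return float(np.sum(y * z - np.log(1.0 + np.exp(z))))

def newton_raphson_fit(X, y, n_iter=25, tol=1e-10):
    """Iterate beta <- beta + (X^T W X)^-1 X^T (y - p), where
    p = sigmoid(X beta) and W = diag(p * (1 - p))."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)            # predicted probabilities Y-hat
        W = np.diag(p * (1.0 - p))       # diagonal weight matrix
        step = np.linalg.solve(X.T @ W @ X, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:   # coefficients have converged
            break
    return beta

# Toy data: intercept column plus one predictor; the classes overlap,
# so the maximum-likelihood estimate is finite.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])

beta = newton_raphson_fit(X, y)
# At the maximum the gradient X^T (y - p) vanishes
grad = X.T @ (y - sigmoid(X @ beta))
print(np.max(np.abs(grad)) < 1e-6)                                     # True
print(log_likelihood(beta, X, y) > log_likelihood(np.zeros(2), X, y))  # True
```

Note that on perfectly separable data the MLE does not exist (the coefficients grow without bound), which is why the toy classes above are made to overlap.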
Now we compute the gradient with respect to β:

∇_β l = ∇_β ∑_{i=1}^{n} [ y_i βx_i − log(1 + exp(βx_i)) ]

∇_β l = ∑_{i=1}^{n} ∇_β [ y_i βx_i − log(1 + exp(βx_i)) ]



∇_β l = ∑_{i=1}^{n} ∇_β[ y_i βx_i ] − ∇_β[ log(1 + exp(βx_i)) ]

∇_β l = ∑_{i=1}^{n} y_i x_i − [ 1 / (1 + exp(βx_i)) ] · exp(βx_i) · x_i

∇_β l = ∑_{i=1}^{n} y_i x_i − [ 1 / (1 + exp(−βx_i)) ] · x_i

∇_β l = ∑_{i=1}^{n} [ y_i − p(x_i) ] x_i   [Gradient vector]

This is the numerator term of the Newton-Raphson equation. Next we compute the denominator term, ∇_ββ l(β^t), called the Hessian matrix: the matrix of second-order derivatives with respect to the coefficients.

∇_ββ l = ∇_β ∑_{i=1}^{n} [ y_i − p(x_i) ] x_i

∇_ββ l = ∑_{i=1}^{n} ∇_β [ y_i − p(x_i) ] x_i

∇_ββ l = ∑_{i=1}^{n} ∇_β [ −p(x_i) ] x_i   [y_i drops out, as it is independent of β]

∇_ββ l = ∑_{i=1}^{n} ∇_β [ −1 / (1 + exp(−βx_i)) ] x_i,   where p(x_i) = 1 / (1 + exp(−βx_i))

∇_ββ l = ∑_{i=1}^{n} [ 1 / (1 + exp(−βx_i)) ]^2 · exp(−βx_i) · (−x_i) · x_i

∇_ββ l = −∑_{i=1}^{n} [ exp(−βx_i) / (1 + exp(−βx_i)) ] · [ 1 / (1 + exp(−βx_i)) ] · x_i^T x_i

∇_ββ l = −∑_{i=1}^{n} p(x_i)(1 − p(x_i)) x_i^T x_i   [Hessian matrix]

Now converting these into matrix notation:

∇_β l = X^T (Y − Ŷ)

∇_ββ l = −X^T P(1 − P) X

∇_ββ l = −X^T W X   [replacing P(1 − P) by the diagonal matrix W]

Substituting these values into the Newton-Raphson equation, we have

β^(t+1) = β^t + (X^T W^(t) X)^(−1) X^T (Y − Ŷ^(t))

Then we execute this update for a number of iterations t until the value of β converges. Once the coefficients have been estimated, we can substitute the values of a feature vector X to estimate the probability of it belonging to a specific class (by choosing a threshold above which it is class 1 (GOOD) and below which it is class 0 (POOR)).

The final logistic regression equation, estimated by maximum likelihood for classifying the dependent variable z given the independent variables x_1, x_2, …, x_n, is:

z = β_0 + β_1 x_1 + β_2 x_2 + ⋯ + β_n x_n + ∈

where z = log( p / (1 − p) ), p = 1 / (1 + e^(−(β_0 + β_1 x_1 + β_2 x_2 + ⋯ + β_n x_n))) is the probability that the outcome is GOOD (1), and ∈ is the error term [Ali et al., 2018].

Hence the three important steps for classification prediction through the logistic regression model are:
1. logit = β_0 + β_1 x_1 + β_2 x_2 + ⋯ + β_n x_n + ∈, where β_0 is the intercept and β_1, β_2, …, β_n are the coefficients to be determined.
2. ODDS = e^(logit)
3. P(Y) = odds / (1 + odds)

MERIT/RELEVANCE OF THE LOGISTIC REGRESSION MODEL:
 The variables may be continuous, discrete, or any combination of both types, and they do not need to have normal distributions.
 The logit function is particularly popular because its results are relatively easy to interpret.
 The contribution of each term in the model (coefficients and intercept) can be interpreted.
 It is possible to test the statistical significance of the coefficients in the model. These statistical tests include the Kolmogorov-Smirnov test, the Hosmer-Lemeshow test and the omnibus test of model coefficients, which can be used to build models incrementally.
 The model helps investors form an opinion about the right time to invest and make better decisions.

DEMERIT/LIMITATION OF THE LOGISTIC REGRESSION MODEL:
 These models can capture only relationships that are linear on the logit scale.

V. CONCLUSION

The logistic regression model is very interesting and important in the field of stock market forecasting. The ultimate goal is to increase the yield from investment by providing useful information to shareholders and potential investors, enabling them to make better decisions regarding their investments. This model overcomes the tedious, expensive and time-consuming process of traditional techniques of prediction, and it can be a stepping stone for future prediction technologies. Moreover, in future work the logistic regression model can be used for comparative studies with other models, such as Artificial Neural Networks and Time-Series models.

REFERENCES



[1]. Miao, K., Chen, F. & Zhao, Z.G. (2007). "Stock price forecast based on bacterial colony RBF neural network". Journal of Qingdao University (Natural Science Edition), 2, 11.
[2]. Ngunyi, A., Mwita, P.N. & Odhiambo, R.O. (2014). "On the estimation of properties of logistic regression parameters". IOSR Journal of Mathematics, 10(4), 57-68. e-ISSN: 2278-5728.
[3]. Ali, S.S., Mubeen, M., Lal, I. & Hussain, A. (2018). "Prediction of stock performance by using logistic regression model: evidence from Pakistan Stock Exchange (PSX)". Asian Journal of Empirical Research, 8(7), 247-258. ISSN (P): 2306-983X, ISSN (E): 2224-4425.
[4]. Navale, G., Dudhwala, N., Jadhav, K., Gabda, P. & Vihangam, B.K. (2016). "Prediction of stock market using Data Mining and Artificial Intelligence". IJESC, 6, 6539-6544. ISSN: 2321-3361.
[5]. Upadhyay, A., Bandyopadhyay, G. & Dutta, A. (2012). "Forecasting stock performance in Indian market using multinomial logistic regression". Journal of Business Studies Quarterly, 3(3), 16-39.

