MULTIPLE LINEAR REGRESSION

CHAPTER ONE

1.0 INTRODUCTION

This project work is based mainly on the principles of the multiple regression model. Here we will be dealing with a regression model that contains more than one independent (regressor) variable and using it to predict the value of the dependent (response) variable.

In our case study, the recovery time of the blood pressure of a patient after surgery depends largely on two factors: the dose of the drug administered to lower the blood pressure, and the average systolic blood pressure reached during the surgery. These are uncontrolled variables on which the recovery time depends. Hence, we will be working with a multiple regression model having two regressor variables.

Chapter two will highlight the works of different authors on regression and correlation analysis. In chapter three a detailed examination of the method used in the estimation of the regression coefficients will be given. Chapter four will dwell on the application of multiple regression analysis in the health sector. Precisely, we will be looking at the recovery time of 53 cases (BP patients) after surgery in Delta State University Teaching Hospital, Oghara, Delta State.

The term regression was first used in the nineteenth century. Regression and correlation analysis involve the study of the relationship between variables, usually the dependent and independent variables. This relationship can exist between the demand and supply of goods, the quality of wears, the pull strength of a wire bond, etc. In particular, the recovery time of a BP patient after surgery is a function of factors such as the dose of the drug administered to stabilize the blood pressure and the average systolic (highest level) blood pressure reached during the surgery.

In carrying out this analysis, secondary data published by the Department of Surgery and Anaesthesia, Delta State University Teaching Hospital, Oghara, on 53 BP patients who underwent surgery were used. During the process of analysis errors are likely to occur, but the model will be able to deal with such errors in measurement.

1.1 WHAT IS REGRESSION ANALYSIS?

Regression analysis is the estimation of unknown values of one variable from the known value(s) of other variable(s). More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent (predictor) variables is altered.

1.2 SOME DEFINITIONS

Independent Variable:

The independent variable is the variable that logically has some effect on the dependent variable. For example, in an experiment the independent variable is the variable that is varied or manipulated by the researcher.

Dependent Variable:

The dependent variable is the variable that is observed or measured

for variation. It cannot be manipulated by the researcher in an

experiment.

Simple Linear Regression:

A linear regression with one independent variable is called simple linear regression. It is represented as

y = β₀ + β₁x + ε    (1.2.1)

where y is the dependent variable, x is the independent variable, β₀ and β₁ are the regression parameters, and ε is the error term.
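To illustrate how the model in (1.2.1) is fitted, a minimal sketch in Python using the closed-form least squares estimates (the data here are made up for illustration, not taken from the project):

```python
# Least-squares fit of y = b0 + b1*x.
# b1 = Sxy / Sxx, b0 = mean(y) - b1*mean(x).
def fit_simple(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx          # slope estimate
    b0 = my - b1 * mx       # intercept estimate
    return b0, b1

# Illustrative data, roughly following y = 2x:
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
b0, b1 = fit_simple(x, y)
```

Here the fitted slope comes out close to 2, matching the pattern built into the made-up data.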

Multiple Linear Regression:

A linear regression with more than one independent variable is called multiple linear regression. It is represented as

y = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ + ε    (1.2.2)

where y is the dependent variable, x₁, x₂, ..., xₖ are the independent variables, β₀, β₁, β₂, ..., βₖ are the regression parameters, and ε is the error term.

1.3 WHAT IS CORRELATION?

Correlation is the degree of relationship existing between two or more variables. The relationship between two variables is called simple correlation, while the relationship between more than two variables is called multiple correlation.

Correlation Coefficient:

This is the degree of linear relationship existing between two or more variables, denoted ρ for the population and r for the sample correlation. Its value lies between -1 and 1, i.e. -1 ≤ r ≤ 1.
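As a sketch, the sample correlation coefficient r can be computed directly from its definition (illustrative Python; the helper name pearson_r is ours):

```python
import math

# Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy),
# always lying in the interval [-1, 1].
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear relationship gives r = 1:
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```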

1.4 TYPES OF CORRELATION

- Positive Correlation: an increase in y is associated with a corresponding increase in x, or a decrease in y is associated with a corresponding decrease in x.

- Negative Correlation: an increase in y is associated with a decrease in x, or a decrease in y is associated with an increase in x.

- Perfect Positive Correlation: all the scatter points lie on a line of positive slope.

- Perfect Negative Correlation: all the scatter points lie on a line of negative slope.

- Zero Correlation: no relationship exists between the variables under study.

1.5 USEFULNESS OF REGRESSION AND CORRELATION ANALYSIS

1. It is used in building empirical models for engineering and scientific data.

2. It is used to influence the outcome of the dependent variable, in cases where there is a causal relationship.

3. Correlation is used mainly to show the degree of linear relationship between variables.

4. It is used for backcasting and forecasting of economic trends.

1.6 OBJECTIVES OF THE STUDY

The objective of this project work is to obtain a regression line for the recovery time of a BP patient after surgery with respect to the dose of the drug administered to stabilize the blood pressure and the average systolic (highest level) blood pressure reached during the surgery, for the 53 patients examined.

1.7 METHODOLOGY

The least squares estimation method will be employed in estimating the parameters of the multiple regression line. This is because the least squares estimation method, unlike other methods of estimation (the Bayesian and Maximum Likelihood Estimation methods), possesses the BLUE properties; that is, it is the best linear unbiased estimator.

To test the regression coefficients, we will make use of the t-statistic and the F-ratio test (ANOVA).
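To illustrate testing a single regression coefficient, a minimal sketch computing the t-statistic t = b₁ / SE(b₁) for the slope of a simple regression (illustrative Python with made-up data; the helper name slope_t_stat is ours, not from the project):

```python
import math

# t-statistic for H0: beta1 = 0 in simple linear regression:
#   t = b1 / SE(b1),  SE(b1) = sqrt( SSE/(n-2) / Sxx ).
def slope_t_stat(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    # residual sum of squares about the fitted line
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1 / se_b1

# Strongly linear illustrative data -> large t, so beta1 is significant.
t = slope_t_stat([1, 2, 3, 4, 5], [2.1, 4.0, 6.2, 7.9, 10.1])
```

The computed t would then be compared against the t-distribution with n − 2 degrees of freedom.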

1.8 SCOPE OF THE STUDY

The scope of this work is centered on multiple regression and correlation analysis. For the purpose of this project we consider the special case of only two regressors (independent variables). Finally, the application of multiple regression and correlation analysis to the recovery time of BP patients after surgery is discussed and analyzed.
CHAPTER TWO

2.1 LITERATURE REVIEW

In this chapter we are going to cite the works of other authors who have contributed to the development of regression and correlation analysis. The term "regression" was first used in the nineteenth century to show that the progeny of exceptional individuals tend on average to be less exceptional than their parents and more like their distant ancestors.

Monga (1972) stated that correlation and regression are statistical tools to measure the degree of sameness and variation. He opined that for every value of X there is a corresponding value of another variable Y, and if there are n observations with the set of pairs (X₁,Y₁), (X₂,Y₂), ..., (Xₙ,Yₙ) constituting a bivariate population, then the following questions can be asked:

1. Is there any relationship between the two variables?

2. Is the equation measurable?

3. What is the degree of relationship, if any, between these variables?

Kreyszig (1970) gave some insights into regression analysis. According to him, in regression analysis one of the two variables, say X, can be regarded as a variable measured without appreciable error, while the other variable, say Y, is a random variable. The variable X is called the controlled variable, and one is interested in the dependent variable Y, which is the unknown variable. He opined that in carrying out the experiment we first select n values X₁, X₂, X₃, ..., Xₙ, where n is the size of the sample, and then observe Y at these values of X, such that we obtain a sample of the form (X₁,Y₁), (X₂,Y₂), ..., (Xₙ,Yₙ). He further stated that in correlation, both X and Y are random in nature; as such, one is interested in the association between them.

Akimbo (2000) defined the correlation coefficient as the degree of linear relationship between two or more variables. It is normally denoted as ρ for the population and r for the sample correlation. The values lie between -1 and 1, that is, -1 ≤ ρ ≤ 1 or -1 ≤ r ≤ 1.

Suppose there are variables X and Y: if an increase in X brings about an increase in Y, then X and Y are said to be positively correlated. Also, if an increase in X brings about a decrease in Y, then X and Y are said to be negatively correlated. He added that regression analysis is concerned with the prediction of the values of the dependent variable Y based on the known values of the independent variables X. When one independent variable is involved we have a simple regression; otherwise we have a multiple regression problem.

Summers (1977) opined that in correlation and regression analysis we are concerned with techniques based on bivariate observations. He gave an example: the height and weight of a person constitute a bivariate observation, but the weight of one person and the weight of another person do not. He stated that the basic distinction between correlation and regression analysis is that in regression the dependent variable Y is a random variable, but the values of the independent variables are assumed to be known without error. He explained the sample correlation coefficient as a dimensionless measure of the degree of association, while R-square, called the coefficient of determination, takes values between 0 and 1.

Iyoha and Ekanem (2004) described the linear regression model as having two components, namely the variables and the parameters. That the model is linear in the parameters implies that the parameters appear in powers not greater than one.

Aldrich (1995) stated that "correlation does not imply causation". This phrase is used in science and statistics to emphasize that correlation between two variables does not automatically imply that one causes the other (though correlation is necessary for causation in the absence of any third, competing causative variable).

Montgomery and Runger (2003) stated that many applications of regression analysis involve situations in which there is more than one regressor variable. A regression model that contains more than one regressor variable is called a multiple regression model. As an example, suppose the effective life of a cutting tool depends on the cutting speed and the tool angle. A multiple regression model that might describe this relationship is

Y = β₀ + β₁X₁ + β₂X₂ + ε,

where Y represents the tool life, X₁ represents the cutting speed, X₂ represents the tool angle, and ε is the random error term. This is a multiple linear regression model with two regressors. The term linear is used because Y is a linear function of the unknown parameters β₀, β₁ and β₂. The parameters β₁ and β₂ are called the partial regression coefficients, because β₁ measures the expected change in Y per unit change in X₁ when X₂ is held constant, and β₂ measures the expected change in Y per unit change in X₂ when X₁ is held constant.

In general, the dependent (response) variable Y may be related to k independent (regressor) variables. The model

Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε

is called a multiple linear regression model with k regressor variables. The parameters βᵢ, i = 0, 1, 2, ..., k, are called the regression coefficients.

According to Strait (1983), simple linear regression analysis is concerned with the problem of predicting and estimating the value of a random variable Y on the basis of a measurement X when certain conditions of linearity can be assured. In many applied problems there are three steps in the analysis:

i. The examination of sample data to determine the validity of an assumption of linear dependence.

ii. The estimation of the regression line with the method of least squares.

iii. The computation of confidence limits to evaluate the goodness of the estimated coefficients.

Kendall (1938) propounded the Kendall rank correlation coefficient, commonly referred to as Kendall's tau (τ) coefficient. It is a statistic used to measure the association between two measurable quantities. A tau test is a non-parametric hypothesis test which uses the coefficient to test for statistical dependence.

The Kendall coefficient is defined as

τ = (number of concordant pairs − number of discordant pairs) / (½ n(n − 1)).

It has the following properties: the denominator is the total number of pairs, so the coefficient must lie in the range -1 ≤ τ ≤ 1.

- If the agreement between the two rankings is perfect (i.e. the two rankings are the same), the coefficient has value 1.

- If X and Y are independent, then we would expect the coefficient to be approximately zero.

The Kendall rank coefficient is often used as a test statistic in statistical hypothesis tests to establish whether two variables may be regarded as statistically dependent.
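The definition above translates directly into a simple O(n²) count over pairs; a minimal illustrative sketch in Python (the helper name kendall_tau is ours):

```python
# Kendall's tau: (concordant - discordant) / (n*(n-1)/2),
# counting every pair (i, j) with i < j.
def kendall_tau(x, y):
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1      # pair ordered the same way in x and y
            elif s < 0:
                discordant += 1      # pair ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)

# Identical rankings give tau = 1; reversed rankings give tau = -1.
tau_same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])
tau_rev = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
```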

Davidson (2003) described regression models as models used in explaining how one or perhaps a few responses depend on other explanatory variables. The idea of regression is the core of much statistical modelling, because the question of what happens to y when x varies is central to many investigations. It is most times needed to predict or control the response, or to gain an understanding of the relationship between the variables. Here he opines that there is always a single response, treated as a random variable, and there may be many explanatory variables, which are non-stochastic. Hence the simplest model involves linear dependence.

Ojameruaye and Oaikhenan (2004) pointed out that a single-equation regression model seeks to explain changes in the values of a dependent variable in terms of other variables, usually denoted Xᵢ, which are called the independent variables. They assumed that the dependent variable is a linear function of the independent variables Xᵢ, for i = 1, 2, ..., n, where n is the sample size.

Cox and Hinkley (1974) worked on "removing correlation". According to them, it is always possible to remove the correlation between random variables with a linear transformation, even if the relationship between the variables is non-linear.

Mosteller (1977) discussed regression in two different cases. In mathematics, given the conditional distribution of Y with density f(y|x), i.e. f of y given x, the regression of Y on X is defined as

y(x) = ∫ y f(y|x) dy.

He then concluded that y is a linear function of x passing through the origin. He also defined regression as the fitting of a function.

Hoel (1971) gave a geometric view of the regression curve. According to him, the regression curve is the locus of the means of the conditional distributions whose densities are given by f(y|x); hence, it is convenient to use the density interpretation of f(y|x).
CHAPTER THREE

METHODOLOGY OF MULTIPLE REGRESSION AND CORRELATION ANALYSIS

3.0 THE LINEAR MULTIPLE REGRESSION MODEL

In general, the linear multiple regression model is defined as

Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε    (3.1.1)

where

Y = the dependent (response) variable,
X's = the independent variables (predictors),
β's = the regression coefficients (parameters),
ε = the random error or disturbance term.

Here we are going to look at a regression model having two independent variables. That is,

Y = β₀ + β₁X₁ + β₂X₂ + ε    (3.1.2)

3.1 ASSUMPTIONS OF THE LINEAR MULTIPLE REGRESSION MODEL

To complete the specification of our model, we need the following assumptions.

a. Normality of ε: the values of each εᵢ are normally distributed with mean 0 and variance σ²; that is, εᵢ ~ N(0, σ²).

b. Zero mean of ε: the random error ε has a zero mean for each Xᵢ; that is, E(εᵢ) = 0.

c. Randomness of ε: the variable ε is a real random variable.

d. Homoscedasticity: the variance of each εᵢ is the same for all the Xᵢ values; that is, the variance of each εᵢ is constant for all Xᵢ.

e. No errors of measurement in the X's: the independent variables are measured without error; that is, the Xᵢ's are fixed.

f. Independence of εᵢ and Xᵢ: every error term is independent of the independent variables; that is, E(εᵢXᵢ) = 0.

g. No perfect multicollinearity of the Xᵢ's: the independent variables are not perfectly linearly correlated.

3.2 ESTIMATION OF REGRESSION PARAMETERS

In estimating the parameters of the regression model, we will make use of the least squares and matrix methods.
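As a sketch of the least squares and matrix methods for the two-regressor model (3.1.2), the estimates solve the normal equations (X'X)b = X'y. The following illustrative Python uses made-up data built to satisfy y = 1 + 2x₁ + 3x₂ exactly (the helper name fit_two_regressors is ours, not from the project):

```python
# Least-squares estimates for Y = b0 + b1*X1 + b2*X2, obtained by
# forming the normal equations (X'X) b = X'y and solving the 3x3
# system with Gaussian elimination (partial pivoting).
def fit_two_regressors(x1, x2, y):
    n = len(y)
    X = [[1.0, a, b] for a, b in zip(x1, x2)]   # design matrix columns: 1, X1, X2
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(3)]
         for i in range(3)]                      # X'X
    v = [sum(X[r][i] * y[r] for r in range(n)) for i in range(3)]  # X'y
    for col in range(3):                         # forward elimination
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    b = [0.0, 0.0, 0.0]                          # back substitution
    for i in range(2, -1, -1):
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, 3))) / A[i][i]
    return b

# Illustrative data generated from y = 1 + 2*x1 + 3*x2:
x1 = [0, 1, 2, 0, 1]
x2 = [0, 0, 1, 1, 2]
y = [1.0, 3.0, 8.0, 4.0, 9.0]
b0, b1, b2 = fit_two_regressors(x1, x2, y)
```

Because the made-up data fit the model exactly, the estimates recover β₀ = 1, β₁ = 2, β₂ = 3.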
CHAPTER FOUR

APPLICATION OF THE MULTIPLE REGRESSION MODEL TO THE

RECOVERY TIME OF PB PATIENTS AFTER SURGERY

4.0 INTRODUCTION

A normal blood pressure is 120/70. A blood pressure reading consists of two numbers: systolic, the higher number (120 is normal), and diastolic, the lower number (70 is normal).

High blood pressure is not a contraindication to surgery but very high blood

pressure can make anesthesia unsafe. Patients with a diastolic of 100 or

greater must be treated before they can undergo elective surgery.

Elevated blood pressure is almost never a reason to delay or cancel surgery.

In some cases, very high blood pressure before surgery cannot be avoided,

and the surgery itself is meant to correct the cause of the high blood pressure.

During surgery, a type of doctor called an Anesthesiologist is responsible for

monitoring the vital signs. In addition to putting the patient to sleep for the

surgery, the Doctor also carefully watches the heart rate, breathing pattern

and blood pressure. By studying the medical history and understanding the

type of surgery being performed, the Anesthesiologist knows what values

each vital sign should have. During the surgery the Anesthesiologist will not
only monitor these values, but will use intravenous drugs to correct them if

they start to deviate from accepted values.

The drugs used to control blood pressure during surgery are all given through

an IV tube, are very fast acting, and extremely effective. Throughout the

surgical procedure, all of the vital signs will be maintained at very close to their

ideal levels.

For elective surgical procedures like cosmetic surgery or vision correction

surgery, the surgeon may want to try to get the patient's blood pressure as

close to normal as possible before proceeding with the surgery. While this is

not strictly necessary, it does reduce the risk of certain surgical

complications. Since the surgery can safely be delayed as long as necessary,

this approach is medically appropriate in these circumstances.

4.1 DATA ANALYSIS

The data used in this project are secondary data. Below are the data showing the recovery time, the log dose of the drug administered to stabilize the BP, and the systolic blood pressure reached during surgery.


CHAPTER FIVE

5.0 SUMMARY

In this project work, we started with an introduction to the general concept of multiple linear regression and correlation analysis, and subsequently discussed the works of various authors on regression and correlation analysis. In our attempt to estimate the regression coefficients (parameters), we used the least squares method, and in testing the statistical significance of these parameter estimates we made use of the t-statistic.

Also, we were able to obtain the coefficient of determination, and in testing the statistical reliability of our model we used the F-statistic, since we were testing for the joint significance of the regression parameters.

Finally, we applied our multiple regression and correlation models to the recovery time of BP patients in Delta State University Teaching Hospital, Oghara, Delta State, which depends on the log dose of the anesthetic drug administered and the BP of the patient before the surgery.


. 5.1 CONCLUSION

In this project work, we have been able to adapt a model that best fits the recovery time of BP patients after surgery with respect to the log dose and BP. As such, we have been able to predict, using our model, the recovery time of a patient given the log dose and the patient's BP.

The first scatter diagram tells us about the effect of the dose of the anesthetic drug used on the recovery time. The second scatter diagram tells us about the effect of the blood pressure achieved during surgery on the time taken to recover to the original level. The greater the drop in the blood pressure, the longer it would take for it to return to its original level. The R-sq value is 20.3%, indicating that 20.3% of the variation in recovery time can be accounted for by the patient's BP before the surgery.

The multiple regression analysis describes the effect of the two explanatory variables acting jointly on the recovery time. R-sq improves to 22.8%, indicating that even though the dose of the drug is an important factor in lowering the blood pressure, the lower the blood pressure achieved during surgery, the longer it takes for it to recover to its normal value.
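The R-sq values quoted above are instances of the coefficient of determination, R² = 1 − SSE/SST. A minimal illustrative sketch (made-up fitted values, not the project's data; the helper name r_squared is ours):

```python
# R^2 = 1 - SSE/SST: the fraction of the variation in y
# accounted for by the fitted values y_hat.
def r_squared(y, y_hat):
    my = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))   # residual sum of squares
    sst = sum((a - my) ** 2 for a in y)                 # total sum of squares
    return 1 - sse / sst

# Illustrative: fitted values close to the observations give R^2 near 1.
r2 = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
```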

An interesting message here is about the individual variability of subjects in responding to the anesthetic drug. For the same dose, those subjects experiencing a greater fall in their blood pressure would take longer for it to return to normal.


BIBLIOGRAPHY
Aldrich, John (1995): "Correlations Genuine and Spurious in Pearson and Yule". Statistical Science 10 (4): 364-376.

Cox, D.R. and Hinkley, D.V. (1974): Theoretical Statistics. Chapman and Hall (Appendix 3). ISBN 0412124203.

Davidson, A.C. (2003): "Statistical Models". Cambridge University Press.

Hoel, P.G. (1971): "Introduction to Mathematical Statistics". Fourth Edition, John Wiley and Sons, Inc.

Kendall, M. (1938): "A New Measure of Rank Correlation". Biometrika 30 (1-2): 81-89.

Kreyszig, Erwin (1970): "Introductory Mathematical Statistics". John Wiley and Sons, Inc., New York.

Monga, G.S. (1972): "Mathematics and Statistics for Economics". Vikas Publishing House Pvt Ltd.

Montgomery, D.C. and Runger, G.C. (2003): "Applied Statistics and Probability for Engineers". John Wiley and Sons, Inc.

Mosteller, Frederick (1977): "Data Analysis and Regression: A Second Course in Statistics". Addison-Wesley Publishing.

Ojameruaye, E.O. and Oaikhenan, H.E. (2004): "A Second Course in Econometrics".

Strait, Peggy Tang (1983): "Probability and Statistics with Applications". Harcourt Brace Jovanovich, Inc.

Summers, G.W., Williams, S.P. and Charles, P.A. (1977): "Basic Statistics: An Introduction". Wadsworth Publishing Company, Inc.

Statistical Records of BP Patients in Delta State. 2015; 14: 53-61.
