CHAPTER ONE
1.0 INTRODUCTION
This project work is based mainly on the principles of the multiple
regression model. Here we will be dealing with a regression model
that contains more than one independent (regressor) variable and
using it to predict the value of the dependent (response) variable.
In our case study, the recovery time of the blood pressure of a patient
after surgery depends largely on two factors: the dose of the drug
administered to lower the blood pressure and the average systolic
blood pressure reached during the surgery.
These are uncontrolled variables on which the recovery time depends.
Hence, we will be working with a multiple regression model
having two regressor variables.
Chapter two will highlight the works of different authors on
regression and correlation analysis. In chapter three a detailed
examination of the method used in the estimation of the regression
coefficients will be given. Chapter four will focus on the
application of multiple regression analysis in the health sector.
Precisely, we will be looking at the recovery time of 53 cases (BP
patients) after surgery in Delta State University Teaching Hospital,
Oghara, Delta State.
The term regression was first used in the nineteenth century.
Regression and correlation analysis involve the study of the
relationship between variables, usually the dependent and
independent variables. This relationship can exist between the
demand and supply of goods, the quality of wares, the pull strength
of a wire bond, etc. In particular, the recovery time of a BP patient
after surgery is a function of factors such as the dose of the drug
administered to stabilize the blood pressure and the average systolic
(highest level) blood pressure reached during the surgery.
In carrying out this analysis, secondary data published by the Department
of Surgery and Anesthesia, Delta State University Teaching Hospital,
Oghara, on 53 BP patients who underwent surgery were used. During the
process of analysis errors are likely to occur, but the model will be
able to deal with such errors in measurement.
1.1 WHAT IS REGRESSION ANALYSIS?
Regression analysis is the estimation of unknown values of one
variable from the known value(s) of other variable(s). More specifically,
regression analysis helps us understand how a typical value of the
dependent variable changes when any one of the independent (predictor)
variables is altered.
1.2 SOME DEFINITIONS
Independent Variable:
The independent variable is the variable that logically has some
effect on the dependent variable. For example, in an experiment the
independent variable is the variable that is varied or manipulated
by the researcher.
Dependent Variable:
The dependent variable is the variable that is observed or measured
for variation. It cannot be manipulated by the researcher in an
experiment.
Simple Linear Regression:
A linear regression with one independent variable is called simple
linear regression. It is represented as

y = β0 + β1x + ε    (1.2.1)

where y is the dependent variable,
x is the independent variable,
β0 and β1 are the regression parameters, and
ε is the error term.
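As a rough illustration only (with made-up numbers, not the project's data), the least-squares estimates of the parameters in (1.2.1) can be computed in a few lines of Python:

```python
# Least-squares fit of the simple linear regression y = b0 + b1*x + e.
# The data below are illustrative only, not the project's data.

def fit_simple(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx             # slope estimate
    b0 = mean_y - b1 * mean_x  # intercept estimate
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
b0, b1 = fit_simple(x, y)
print(round(b0, 2), round(b1, 2))   # prints: 0.09 1.99
```

The slope is the ratio of the corrected sum of cross-products to the corrected sum of squares of x, exactly as in the least-squares method used later in this project.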
Multiple Linear Regression:
A linear regression with more than one independent variable is
called multiple linear regression. It is represented as

y = β0 + β1x1 + β2x2 + ... + βkxk + ε    (1.2.2)

where y is the dependent variable,
x1, x2, ..., xk are the independent variables,
β0, β1, β2, ..., βk are the regression parameters, and ε is the error term.
1.3 . WHAT IS CORRELATION?
Correlation is the degree of relationship existing between two or more
variables. The degree of relationship that exists between two variables
is called simple correlation, while the relationship among more than
two variables is called multiple correlation.
Correlation Coefficient:
This is the degree of linear relationship existing between two or
more variables. It is denoted as ρ for the population and r for the sample
correlation. This value lies between -1 and 1, i.e., -1 ≤ r ≤ 1.
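As an illustrative sketch (with made-up numbers), the sample correlation coefficient r can be computed in Python as:

```python
# Sample (Pearson) correlation coefficient r between two variables.
# Illustrative data only; r always lies between -1 and 1.
import math

def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]          # y increases perfectly with x
r = pearson_r(x, y)
print(r)                      # perfect positive linear relationship: 1.0
```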
1.4 TYPES OF CORRELATION
- Positive Correlation: an increase in y is associated with an
increase in x, or a decrease in y is associated with a
corresponding decrease in x.
- Negative Correlation: an increase in y is associated with a decrease
in x, or a decrease in y is associated with an increase in x.
- Perfect Positive Correlation: all the scatter points lie on a
line of positive slope.
- Perfect Negative Correlation: all the scatter points lie on a
line of negative slope.
- Zero Correlation: no relationship exists between the
variables under study.
1.5 USEFULNESS OF REGRESSION AND CORRELATION
ANALYSIS
1. It is used in building empirical models for engineering and
scientific data.
2. It can be used to influence the outcome of the dependent
variable where a causal relationship exists.
3. Correlation is used mainly to show the degree of linear
relationship between variables.
4. It is used for backcasting and forecasting of economic
trends.
1.6 OBJECTIVES OF THE STUDY
The objective of this project work is to obtain a regression line for
the recovery time of a BP patient after surgery with respect to the
dose of the drug administered to stabilize the blood pressure and the
average systolic (highest level) blood pressure reached during the
surgery, for the 53 patients examined.
1.7 METHODOLOGY
The least squares estimation method will be employed in estimating
the parameters of the multiple regression line. This is because the
least squares estimation method, unlike other methods of estimation
(the Bayesian and maximum likelihood estimation methods),
possesses the BLUE property; that is, it is the best linear unbiased
estimator.
To test the regression coefficients, we will make use of the t-statistic
and the F-ratio test (ANOVA).
1.8 SCOPE OF THE STUDY
The scope of this work is centered on multiple regression and
correlation analysis. For the purpose of this project we are going to
consider the special case of only two regressors (independent
variables). Finally, the application of multiple regression and
correlation analysis to the recovery time of BP patients after surgery
is discussed and analyzed.
CHAPTER TWO
2.1 LITERATURE REVIEW
In this chapter we are going to cite the works of other authors who
have contributed to the development of regression and correlation
analysis. The term "regression" was first used in the nineteenth
century to show that the progeny of exceptional individuals tends on
average to be less exceptional than their parents and more like
their distant ancestors.
Monga (1972) stated that correlation and regression are statistical tools
used to measure the degree of sameness and variation. He opined that if for
every value of X there is a corresponding value of another variable
Y, and there are n observations with the set of pairs (X1, Y1), (X2, Y2), ...,
(Xn, Yn) constituting a bivariate population, then the following questions
can be asked:
1. Is there any relationship between the two variables?
2. Is the relationship measurable?
3. What is the degree of relationship, if any, between these
variables?
Kreyszig (1970) gave some insights into regression analysis. According to
him, in regression analysis one of the two variables, say X, can be
regarded as a variable measured without appreciable error, while the
other variable, say Y, is a random variable. The variable X is called the
controlled variable, and one is interested in the dependent variable Y,
which is the unknown variable. He opined that in carrying out the
experiment, we first select n values X1, X2, X3, ..., Xn, where n is the
size of the sample, and then observe Y at these values of X, such that
we obtain a sample of the form (X1, Y1), (X2, Y2), ..., (Xn, Yn). He
further stated that in correlation, both X and Y are random in nature;
as such, one is interested in the association between them.
Akimbo (2000) defined the correlation coefficient as the degree of linear
relationship between two or more variables. It is normally denoted
as ρ for the population and r for the sample correlation. The values lie
between -1 and 1, that is, -1 ≤ ρ ≤ 1 or -1 ≤ r ≤ 1.
Suppose there are two variables X and Y. If an increase in X brings about
an increase in Y, X and Y are said to be positively correlated. Also, if an
increase in X brings about a decrease in Y, X and Y are said to be
negatively correlated. He added that regression analysis is
concerned with the prediction of the values of the dependent
variable Y based on the known values of the independent variables
X. When one independent variable is involved we have a simple
regression problem; otherwise we have a multiple regression problem.
Summers (1977) opined that in correlation and regression analysis
we are concerned with techniques based on bivariate observations.
He noted, for example, that the weight of one person and the
weight of another person do not constitute a bivariate observation.
He stated that the basic distinction between correlation and regression
analysis is that in regression the dependent variable Y is a random
variable, but the values of the independent variables are assumed to be
known without error. He explained the sample correlation coefficient as a
dimensionless measure of the degree of association, while R-squared,
called the coefficient of determination, takes values between 0 and 1.
Iyoha and Ekanem (2004) described the linear regression model as
having two components, namely, the variables and the parameters.
That the model is linear in the parameters implies that the parameters
appear in powers not greater than one.
Aldrich (1995) stated that "correlation" does not imply causation.
This phrase is used in science and statistics to emphasize that
correlation between two variables does not automatically imply that one
causes the other (though correlation is necessary for causation in
the absence of any third, competing causative variable).
Montgomery and Runger (2003) stated that many applications of
regression analysis involve situations in which there is more than
one regressor variable. A regression model that contains more than
one regressor variable is called a multiple regression model. As an
example, suppose the effective life of a cutting tool depends on the
cutting speed and the tool angle. A multiple regression model that
might describe this relationship is

Y = β0 + β1X1 + β2X2 + ε,

where Y represents the tool life, X1 represents the cutting speed, X2
represents the tool angle, and ε is a random error term. This is a
multiple linear regression model with two regressors. The term
linear is used because Y is a linear function of the unknown
parameters β0, β1 and β2. The parameters β1 and β2 are called the partial
regression coefficients, because β1 measures the expected change in
Y per unit change in X1 when X2 is held constant, and β2 measures
the expected change in Y per unit change in X2 when X1 is held
constant.
In general, the dependent or response variable Y may be related to
k independent or regressor variables. The model

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

is called a multiple linear regression model with k regressor variables.
The parameters βi, i = 0, 1, 2, ..., k, are called the regression coefficients.
According to Strait (1983), simple linear regression analysis is
concerned with the problem of predicting and estimating the value
of a random variable Y on the basis of a measurement X when
certain conditions of linearity can be assumed. In many applied
problems there are three steps in the analysis:
i. The examination of sample data to determine the validity of
an assumption of linear dependence.
ii. The estimation of the regression line by the method of least
squares.
iii. The computation of confidence limits to evaluate the goodness
of the estimated coefficients.
Kendall (1938) propounded the Kendall rank correlation coefficient,
commonly referred to as Kendall's tau (τ) coefficient. It is a statistic
used to measure the association between two measurable quantities.
A tau test is a non-parametric hypothesis test which uses the
coefficient to test for statistical dependence.
The Kendall coefficient is defined as

τ = (number of concordant pairs - number of discordant pairs) / (n(n-1)/2)

It has the following properties: the denominator is the total number
of pairs, so the coefficient must lie in the range -1 ≤ τ ≤ 1.
- If the agreement between the two rankings is perfect (i.e., the two
rankings are the same), the coefficient has value 1.
- If X and Y are independent, then we would expect the coefficient to
be approximately zero.
The Kendall rank coefficient is often used as a statistic in statistical
hypothesis tests to establish whether two variables may be regarded
as statistically dependent.
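The definition above translates directly into code. The following minimal Python sketch counts concordant and discordant pairs (assuming, for simplicity, that neither ranking contains ties):

```python
# Kendall's tau: (concordant - discordant) / (n(n-1)/2),
# for rankings without ties. Illustrative data only.
from itertools import combinations

def kendall_tau(x, y):
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1        # pair ordered the same way in x and y
        elif s < 0:
            discordant += 1        # pair ordered oppositely in x and y
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))   # identical rankings -> 1.0
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))   # reversed rankings -> -1.0
```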
Davidson (2003) described regression models as models used in
explaining how one or perhaps a few responses depend on other
explanatory variables. The idea of regression is at the core of much
statistical modelling, because the question of what happens to Y
when X varies is central to many investigations. It is most times
needed to predict or control the other variables, or to gain an
understanding of the relationship between them. Here he opines that
there is always a single response treated as a random variable, and there
are also many explanatory variables, which are non-stochastic. Hence
the simplest model involves linear dependence.
Ojameruaye and Oaikhenan (2004) pointed out that a single-equation
regression model seeks to explain changes in the values of a dependent
variable in terms of other variables, usually denoted as Xi's, called the
independent variables. They assumed that the dependent variable is a
linear function of the independent variables Xi, for i = 1, 2, ..., n, where
n is the sample size.
Cox and Hinkley (1974) worked on "removing correlation".
According to them, it is always possible to remove the correlation
between random variables with a linear transformation, even if the
relationship between the variables is non-linear.
Mosteller (1977) discussed regression in two different cases. In
mathematical terms, given the conditional distribution of Y with density
f(y|x), i.e. f of y given x, the regression of Y on X is defined as

y(x) = ∫ y f(y|x) dy.

He then concluded that y is a linear function of x passing through the
origin. He also viewed regression as the problem of fitting a function
to data. Hoel (1971) gave a geometric view of the regression curve.
According to him, the regression curve is the locus of the means of the
conditional distributions whose densities are given by f(y|x); hence, it
is convenient to use the density interpretation of f(y|x).
CHAPTER THREE
METHODOLOGY OF MULTIPLE REGRESSION AND CORRELATION
ANALYSIS
3.0 THE LINEAR MULTIPLE REGRESSION MODEL
In general, the linear multiple regression model is defined as

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε    (3.1.1)

where
Y = the dependent (response) variable,
X's = the independent variables (predictors),
β's = the regression coefficients (parameters),
ε = the random error or disturbance term.
Here we are going to look at a regression model
having two independent variables. That is,

Y = β0 + β1X1 + β2X2 + ε    (3.1.2)
3.1 ASSUMPTIONS OF THE LINEAR MULTIPLE
REGRESSION MODEL
To complete the specification of our model, we need the
following assumptions.
a. Normality of ε: the values of each εi are normally
distributed with mean 0 and variance σ²; εi ~ N(0, σ²).
b. Zero mean of ε: the random error ε has a zero mean for
each Xi; that is, E(εi) = 0.
c. Randomness of ε: the variable ε is a real random variable.
d. Homoscedasticity: the variance of each εi is the same for all
the Xi values; that is, the variance of each εi is constant for
all Xi.
e. No errors of measurement in the X's: the independent
variables are measured without error; that is, the Xi's are
fixed.
f. Independence of εi and Xi: every error term εi is
independent of the independent variables; that is,
E(εiXi) = 0.
g. No perfect multicollinearity of the Xi's: the independent
variables are not perfectly linearly correlated.
3.2 ESTIMATION OF REGRESSION PARAMETERS
In estimating the parameters of the regression model, we
will make use of the least squares and matrix methods.
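As a sketch of the matrix method, the least-squares estimates solve the normal equations (X'X)b = X'y. The following standard-library Python example does this for a model with two regressors; the data are fabricated so that the true coefficients (1, 2, 1) are known in advance:

```python
# Least-squares estimation via the normal equations (X'X)b = X'y
# for Y = b0 + b1*X1 + b2*X2. Fabricated data: y = 1 + 2*x1 + 1*x2 exactly.

def transpose_mul(A, B):
    # computes A' * B, where A and B are lists of rows
    return [[sum(A[k][i] * B[k][j] for k in range(len(A)))
             for j in range(len(B[0]))] for i in range(len(A[0]))]

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small square system
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# design matrix columns: intercept, x1, x2
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 5]]
y = [5, 6, 11, 12, 16]

XtX = transpose_mul(X, X)
Xty = [row[0] for row in transpose_mul(X, [[v] for v in y])]
b = solve(XtX, Xty)
print([round(v, 6) for v in b])   # recovers the true coefficients [1.0, 2.0, 1.0]
```

In the project itself, X would hold a column of ones, the log dose, and the systolic BP for the 53 patients, and y the recovery times.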
CHAPTER FOUR
APPLICATION OF THE MULTIPLE REGRESSION MODEL TO THE
RECOVERY TIME OF BP PATIENTS AFTER SURGERY
4.0 INTRODUCTION
A normal blood pressure is 120/70. A blood pressure reading consists of two
numbers: the systolic pressure is the higher number (120 is normal) and the
diastolic pressure is the lower number (70 is normal).
High blood pressure is not a contraindication to surgery, but very high blood
pressure can make anesthesia unsafe. Patients with a diastolic pressure of 100 or
greater must be treated before they can undergo elective surgery.
Elevated blood pressure is almost never a reason to delay or cancel surgery.
In some cases, very high blood pressure before surgery cannot be avoided,
and the surgery itself is meant to correct the cause of the high blood pressure.
During surgery, a doctor called an anesthesiologist is responsible for
monitoring the vital signs. In addition to putting the patient to sleep for the
surgery, the anesthesiologist also carefully watches the heart rate, breathing
pattern and blood pressure. By studying the medical history and understanding
the type of surgery being performed, the anesthesiologist knows what values
each vital sign should have. During the surgery the anesthesiologist will not
only monitor these values, but will use intravenous drugs to correct them if
they start to deviate from accepted values.
The drugs used to control blood pressure during surgery are all given through
an IV tube, are very fast acting, and are extremely effective. Throughout the
surgical procedure, all of the vital signs will be maintained very close to their
ideal levels.
For elective surgical procedures like cosmetic surgery or vision correction
surgery, the surgeon may want to try to get the patient's blood pressure as
close to normal as possible before proceeding with the surgery. While this is
not strictly necessary, it does reduce the risk of certain surgical
complications. Since the surgery can safely be delayed as long as necessary,
this approach is medically appropriate in these circumstances.
4.1 DATA ANALYSIS
The data used in this project are secondary data. Below are the data showing
the recovery time, the log dose of the drug administered to stabilize the BP,
and the systolic blood pressure reached during surgery.
CHAPTER FIVE
5.0 SUMMARY
In this project work, we started with an introduction to the
general concept of multiple linear regression and correlation
analysis, and subsequently discussed the works of various
authors on regression and correlation analysis. In our
attempt to estimate the regression coefficients (parameters),
we used the least squares method, and in testing the
statistical significance of these parameter estimates, we
made use of the t-statistic.
Also, we were able to obtain the coefficient of determination, and
in testing the statistical reliability of our model we used the
F-statistic, since we were testing for the joint significance of
the regression parameters.
Finally, we applied our multiple regression and correlation
models to the recovery time of BP patients in Delta State
University Teaching Hospital, Oghara, Delta State, which
depends on the log dose of the anesthetic drug administered and
the BP of the patient before the surgery.
5.1 CONCLUSION
In this project work, we have been able to fit a model that best
describes the recovery time of BP patients after surgery with respect
to the log dose and BP. As such, we have been able to predict, using
our model, the recovery time of a patient given the log dose and the
patient's BP.
The first scatter diagram tells us about the effect of the dose of the
anesthetic drug used on recovery time.
The second scatter diagram tells us about the effect of the blood pressure
achieved during surgery on the time for recovery to the original level.
The greater the drop in the blood pressure, the longer it takes
to return to its original level. The R-sq value is 20.3%,
indicating that 20.3% of the variation in recovery
time can be accounted for by the patient's BP before the surgery.
The multiple regression analysis describes the effect of the two
explanatory variables acting jointly on the recovery time. R-sq
improves to 22.8%, indicating that even though the dose of the
drug is an important factor in lowering the blood pressure, the
lower the blood pressure achieved during surgery, the
longer it takes to recover to the normal value.
An interesting message here is about the individual variability of
subjects in responding to the anesthetic drug: for the same dose,
those subjects experiencing a greater fall in their blood
pressure take longer for it to return to normal.
BIBLIOGRAPHY
Aldrich, John (1995): "Correlations Genuine and Spurious in
Pearson and Yule". Statistical Science 10 (4): 364-376.
Cox, D.R. and Hinkley, D.V. (1974): "Theoretical Statistics".
Chapman and Hall (Appendix 3).
Davidson, A.C. (2000): "Statistical Models". Cambridge
University Press.
Hoel, P.G. (1971): "Introduction to Mathematical Statistics", Fourth
Edition. John Wiley and Sons, Inc.
Kendall, M. (1938): "A New Measure of Rank Correlation".
Biometrika 30 (1-2): 81-89.
Kreyszig, Erwin (1970): "Introductory Mathematical Statistics".
John Wiley and Sons, Inc., New York.
Monga, G.S. (1972): "Mathematics and Statistics for Economics".
Vikas Publishing House Pvt. Ltd.
Montgomery, D.C. and Runger, G.C. (2003): "Applied Statistics and
Probability for Engineers". John Wiley and Sons, Inc.
Mosteller, Frederick (1977): "Data Analysis and Regression: A Second
Course in Statistics". Addison-Wesley Publishing.
Ojameruaye, E.O. and Oaikhenan, H.E. (2004): "A Second Course in
Econometrics".
Strait, Peggy Tang (1983): "Probability and Statistics with
Applications". Harcourt Brace Jovanovich, Inc.
Summers, W.G., Williams, S.P. and Charles, P.A. (1977): "Basic
Statistics: An Introduction". Wadsworth Publishing
Company, Inc.
Statistical Records of BP Patients in Delta State. 2015; 14: 53-61.