Introduction To Correlation and Regression Analysis PDF

The document discusses correlation and regression analysis techniques. Correlation analysis quantifies the association between two continuous variables, while regression analysis assesses the relationship between an outcome and predictor variables. A sample correlation coefficient between -1 and 1 measures the strength and direction of a linear relationship. The document provides an example calculating the correlation coefficient between gestational age and birth weight using data from 17 infants.

Uploaded by

Azra Mufti

5/6/2016

Multivariable Methods

Introduction to Correlation and Regression Analysis

In this section we will first discuss correlation analysis, which is used to
quantify the association between two continuous variables (e.g., between an
independent and a dependent variable or between two independent variables).
Regression analysis is a related technique to assess the relationship between
an outcome variable and one or more risk factors or confounding variables. The
outcome variable is also called the response or dependent variable, and the
risk factors and confounders are called the predictors, or explanatory or
independent variables. In regression analysis, the dependent variable is
denoted "y" and the independent variables are denoted by "x".

[NOTE: The term "predictor" can be misleading if it is interpreted as the
ability to predict even beyond the limits of the data. Also, the term
"explanatory variable" might give an impression of a causal effect in a
situation in which inferences should be limited to identifying associations.
The terms "independent" and "dependent" variable are less subject to these
interpretations as they do not strongly imply cause and effect.]

Correlation Analysis

In correlation analysis, we estimate a sample correlation coefficient, more
specifically the Pearson Product Moment correlation coefficient. The sample
correlation coefficient, denoted r, ranges between -1 and +1 and quantifies
the direction and strength of the linear association between the two
variables. The correlation between two variables can be positive (i.e., higher
levels of one variable are associated with higher levels of the other) or
negative (i.e., higher levels of one variable are associated with lower levels
of the other).

The sign of the correlation coefficient indicates the direction of the
association. The magnitude of the correlation coefficient indicates the
strength of the association.

For example, a correlation of r = 0.9 suggests a strong, positive association
between two variables, whereas a correlation of r = -0.2 suggests a weak,
negative association. A correlation close to zero suggests no linear
association between two continuous variables.
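
As a rough illustration of how the sign and magnitude of r are read, the sketch below computes r for two small data sets. The numbers are made up for this example and do not come from the module, and the helper name `pearson_r` is an assumption introduced here.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; the (n - 1) factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y tends to rise with x in the first set and fall in the second.
x = [1, 2, 3, 4, 5, 6]
y_up = [2, 3, 5, 4, 6, 7]
y_down = [7, 6, 4, 5, 3, 2]

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(f"r_up = {r_up:.2f}, r_down = {r_down:.2f}")
```

Because `y_down` is `y_up` in reverse order, the two coefficients have the same magnitude and opposite signs: the strength of the association is the same, only the direction differs.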

LISA: [I find this description confusing. You say that the correlation
coefficient is a measure of the "strength of association", but if you think
about it, isn't the slope a better measure of association? We use risk ratios
and odds ratios to quantify the strength of association, i.e., when an
exposure is present, how many times more likely the outcome is. The analogous
quantity in correlation is the slope, i.e., for a given increment in the
independent variable, how many times is the dependent variable going to
increase? And "r" (or perhaps better R-squared) is a measure of how much of
the variability in the dependent variable can be accounted for by differences
in the independent variable. The analogous measure for a dichotomous variable
and a dichotomous outcome would be the attributable proportion, i.e., the
proportion of Y that can be attributed to the presence of the exposure.]

http://sphweb.bumc.bu.edu/otlt/MPHModules/BS/BS704_Multivariable/BS704_Multivariable5.html
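
The contrast the note draws between the slope and r can be sketched numerically. The example below assumes the usual least-squares slope of y on x, Cov(x, y) divided by the variance of x, which this section does not itself define. The data and the function name `r_and_slope` are hypothetical.

```python
import math

def r_and_slope(xs, ys):
    """Return (r, least-squares slope of y on x) using n - 1 denominators."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / math.sqrt(var_x * var_y)
    slope = cov / var_x  # change in y per one-unit change in x
    return r, slope

# Hypothetical data; y_scaled is the same outcome expressed in units 10 times smaller.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_scaled = [10 * v for v in y]

r, slope = r_and_slope(x, y)
r_scaled, slope_scaled = r_and_slope(x, y_scaled)
print(f"original: r = {r:.3f}, r^2 = {r * r:.3f}, slope = {slope:.3f}")
print(f"rescaled: r = {r_scaled:.3f}, r^2 = {r_scaled ** 2:.3f}, slope = {slope_scaled:.3f}")
```

Rescaling y multiplies the slope by 10 but leaves r and r squared unchanged. This is one way to see that the slope describes how much y changes per unit of x, while r squared describes the share of variability in y accounted for by x.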

It is important to note that there may be a nonlinear association between two
continuous variables, but computation of a correlation coefficient does not
detect this. Therefore, it is always important to evaluate the data carefully
before computing a correlation coefficient. Graphical displays are
particularly useful to explore associations between variables.

The figure below shows four hypothetical scenarios in which one continuous
variable is plotted along the X-axis and the other along the Y-axis.
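
A minimal sketch of this point, using made-up values: y below is completely determined by x (y is x squared), yet the correlation coefficient is zero because the relationship is not linear.

```python
import math

# Hypothetical data, symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y depends entirely on x, but not linearly

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of cross-products and squared deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.3f}")
```

The positive and negative cross-products cancel exactly, so r comes out as zero even though a scatter plot of these points would show an obvious U-shaped pattern.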

Scenario 1 depicts a strong positive association (r = 0.9), similar to what we
might see for the correlation between infant birth weight and birth length.
Scenario 2 depicts a weaker association (r = 0.2) that we might expect to see
between age and body mass index (which tends to increase with age).
Scenario 3 might depict the lack of association (r approximately 0) between
the extent of media exposure in adolescence and age at which adolescents
initiate sexual activity.
Scenario 4 might depict the strong negative association (r = -0.9) generally
observed between the number of hours of aerobic exercise per week and percent
body fat.
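
To get a feel for what the four scenarios look like as data, the sketch below simulates pairs with a chosen target correlation. It assumes both variables are standard normal, which is a simplification made only for this example. The function names `simulate_pairs` and `pearson_r` are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_pairs(target_r, n, rng):
    """Draw n (x, y) pairs whose population correlation equals target_r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Mix x with independent noise so that corr(x, y) = target_r.
        y = target_r * x + math.sqrt(1 - target_r ** 2) * noise
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(0)  # fixed seed so repeated runs give the same sample
scenarios = [("Scenario 1", 0.9), ("Scenario 2", 0.2),
             ("Scenario 3", 0.0), ("Scenario 4", -0.9)]
results = {}
for label, target in scenarios:
    xs, ys = simulate_pairs(target, 10_000, rng)
    results[label] = pearson_r(xs, ys)
    print(f"{label}: target r = {target:+.1f}, sample r = {results[label]:+.3f}")
```

The sample coefficients land close to their targets but not exactly on them, a reminder that r computed from data is an estimate.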

Example: Correlation of Gestational Age and Birth Weight

A small study is conducted involving 17 infants to investigate the association
between gestational age at birth, measured in weeks, and birth weight,
measured in grams.

We wish to estimate the association between gestational age and infant birth
weight. In this example, birth weight is the dependent variable and
gestational age is the independent variable. Thus y = birth weight and
x = gestational age. The data are displayed in a scatter diagram in the figure
below.

Each point represents an (x, y) pair (in this case the gestational age,
measured in weeks, and the birth weight, measured in grams). Note that the
independent variable is on the horizontal axis (or X-axis), and the dependent
variable is on the vertical axis (or Y-axis). The scatter plot shows a
positive or direct association between gestational age and birth weight.
Infants with shorter gestational ages are more likely to be born with lower
weights and infants with longer gestational ages are more likely to be born
with higher weights.
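The formula for the sample correlation coefficient is

$$ r = \frac{\mathrm{Cov}(x, y)}{\sqrt{s_x^2 \, s_y^2}} $$

where Cov(x, y) is the covariance of x and y, defined as

$$ \mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} $$

and $s_x^2$ and $s_y^2$ are the sample variances of x and y, defined as

$$ s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1} $$

The variances of x and y measure the variability of the x scores and y scores
around their respective sample means ($\bar{x}$ and $\bar{y}$, considered
separately). The covariance measures the variability of the (x, y) pairs
around the mean of x and mean of y, considered simultaneously.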
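
The three quantities described here can be sketched directly in code. The example assumes the usual sample definitions with n - 1 in the denominator, and r equal to the covariance divided by the square root of the product of the two variances. The function names and the five data pairs are made up for illustration and are not the study data.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """r = Cov(x, y) / sqrt(s_x^2 * s_y^2)."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_variance(xs) * sample_variance(ys)
    )

# Made-up illustrative values, not the 17 infants from the study.
xs = [2.0, 4.0, 6.0, 8.0, 10.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]

print("variance of x:", sample_variance(xs))
print("variance of y:", sample_variance(ys))
print("covariance:   ", sample_covariance(xs, ys))
print("correlation:  ", sample_correlation(xs, ys))
```

For these values the variances are 10 and 2.5 and the covariance is 4, so r = 4 / sqrt(25) = 0.8.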
To compute the sample correlation coefficient, we need to compute the variance
of gestational age, the variance of birth weight and also the covariance of
gestational age and birth weight.

We first summarize the gestational age data. The mean gestational age is:

$$ \bar{x} = \frac{\sum x}{n}, \qquad n = 17 $$

To compute the variance of gestational age, we need to sum the squared
deviations (or differences) between each observed gestational age and the mean
gestational age. The computations are summarized below.

The variance of gestational age is:

$$ s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum (x - \bar{x})^2}{16} $$


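Next, we summarize the birth weight data. The mean birth weight is:

$$ \bar{y} = \frac{\sum y}{n}, \qquad n = 17 $$

The variance of birth weight is computed just as we did for gestational age,
as shown in the table below.

The variance of birth weight is:

$$ s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = \frac{\sum (y - \bar{y})^2}{16} $$

Next we compute the covariance,

$$ \mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} $$

To compute the covariance of gestational age and birth weight, we need to
multiply the deviation from the mean gestational age by the deviation from the
mean birth weight for each participant (i.e., $(x - \bar{x})(y - \bar{y})$).

The computations are summarized below. Notice that we simply copy the
deviations from the mean gestational age and birth weight from the two tables
above into the table below and multiply.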
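
The tabular procedure just described (deviation of x, deviation of y, and their product for each participant) can be sketched as follows. The five (gestational age, birth weight) pairs are hypothetical stand-ins chosen for easy arithmetic. They are not the study's 17 observations.

```python
# Hypothetical (gestational age in weeks, birth weight in grams) pairs.
pairs = [(34.0, 2000.0), (36.0, 2400.0), (38.0, 2900.0),
         (40.0, 3300.0), (42.0, 3400.0)]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# One row per participant: x, y, deviation of x, deviation of y, product.
rows = []
for x, y in pairs:
    dx = x - mean_x
    dy = y - mean_y
    rows.append((x, y, dx, dy, dx * dy))

print(f"{'x':>6} {'y':>8} {'x-mean':>8} {'y-mean':>8} {'product':>10}")
for x, y, dx, dy, product in rows:
    print(f"{x:6.1f} {y:8.1f} {dx:8.1f} {dy:8.1f} {product:10.1f}")

sum_products = sum(row[4] for row in rows)
covariance = sum_products / (n - 1)
print(f"sum of products = {sum_products:.1f}, covariance = {covariance:.1f}")
```

Each deviation column sums to zero, which is a quick check on the arithmetic. Only the product column is summed and divided by n - 1 to give the covariance.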


The covariance of gestational age and birth weight is:

$$ \mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{16} $$

We now compute the sample correlation coefficient:

$$ r = \frac{\mathrm{Cov}(x, y)}{\sqrt{s_x^2 \, s_y^2}} $$

Not surprisingly, the sample correlation coefficient indicates a strong
positive correlation.

As we noted, sample correlation coefficients range from -1 to +1. In practice,
meaningful correlations (i.e., correlations that are clinically or practically
important) can be as small as 0.4 (or -0.4) for positive (or negative)
associations. There are also statistical tests to determine whether an
observed correlation is statistically significant or not (i.e., statistically
significantly different from zero). Procedures to test whether an observed
sample correlation is suggestive of a statistically significant correlation
are described in detail in Kleinbaum, Kupper and Muller [1].
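
As a hedged illustration of what such a test can look like, the sketch below uses one common approach: a t statistic for the null hypothesis of zero correlation, with n - 2 degrees of freedom. This is not necessarily the procedure given in the cited text. The value of r is hypothetical, n = 17 matches the study size, and the critical value 2.131 is the two-sided 5% point of the t distribution with 15 degrees of freedom, taken from standard tables.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

n = 17                 # sample size of the infant study
r_hypothetical = 0.5   # made-up value, not the study's result
T_CRIT_DF15 = 2.131    # two-sided 5% critical value for 15 degrees of freedom

t = correlation_t_statistic(r_hypothetical, n)
significant = abs(t) > T_CRIT_DF15
print(f"t = {t:.3f}, significant at the 5% level: {significant}")
```

With only 17 observations, a correlation of 0.5 is just past the 5% threshold, which shows how strongly the result of such a test depends on sample size as well as on r.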

Content ©2013. All Rights Reserved.
Date last modified: January 17, 2013.
Boston University School of Public Health