Hack 13. Use One Variable to Predict Another

the existence of a relationship between the two makes ACT scores a good candidate as a predictor to guess GPA.

Simple linear regression is the procedure that produces all the values we need to cook up the magic formula that will predict the future. This procedure produces a regression line that we can graph to determine what the future holds [Hack #12], but once we have the formula, we don't actually need to do any graphing to make our guesses.

Cooking Up the Equation

First, examine the recipe for creating the formula (see the "Regression Formula Recipe" sidebar), and then we'll see how to use it with real data. You can clip this recipe out and keep it in the kitchen drawer.

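Regression Formula Recipe

Ingredients

2 samples of data from correlated variables:
    1 criterion variable (the one you want to predict)
    1 predictor variable (the one you will predict with)
1 correlation coefficient of the relationship between the 2 variables
2 sample means
2 sample standard deviations

Container

An empty equation shaped like this:

Criterion = Constant + (Predictor x Weight)

Directions

Calculate the weight by which you will multiply your predictor variable:

Weight = Correlation x (Standard deviation of criterion / Standard deviation of predictor)

Calculate the constant:

Constant = Mean of criterion - (Weight x Mean of predictor)

Fill the regression equation with the weight and constant you just prepared.

Serves

Anyone interested in guessing what would happen if....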
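The recipe can also be followed in code. Here is a minimal Python sketch, assuming the standard simple-regression formulas (weight = correlation x SD of criterion / SD of predictor; constant = criterion mean - weight x predictor mean). The function name `regression_recipe` and the toy data are invented for illustration and are not part of the hack.

```python
from statistics import mean, stdev


def regression_recipe(predictor, criterion):
    """Return (constant, weight) for Criterion = Constant + (Predictor x Weight)."""
    n = len(predictor)
    mean_x, mean_y = mean(predictor), mean(criterion)
    sd_x, sd_y = stdev(predictor), stdev(criterion)  # sample SDs (n - 1)

    # Pearson correlation coefficient between the two samples.
    r = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(predictor, criterion)) / ((n - 1) * sd_x * sd_y)

    weight = r * (sd_y / sd_x)            # step 1 of the directions
    constant = mean_y - weight * mean_x   # step 2 of the directions
    return constant, weight


# Invented toy data lying exactly on the line y = 1 + 2x.
toy_predictor = [1, 2, 3, 4, 5]
toy_criterion = [3, 5, 7, 9, 11]

constant, weight = regression_recipe(toy_predictor, toy_criterion)
print(f"Criterion = {constant:.2f} + (Predictor x {weight:.2f})")
```

Because the toy data fall on a perfect line, the recipe should hand back that same line, which makes it an easy sanity check before trying real data.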

The regression recipe calls for two other ingredients, means and standard deviations for both variables. Here are those statistics for our example:

Variable              ACT scores    GPA
Mean                  20.10         2.98
Standard deviation    2.38          .68

You can review means and standard deviations in "Describe the World Using Just Two Numbers" [Hack #2].

information. Consequently, as each applicant's letter came into

score into the regression formula and predict his GPA. Let's figure out the parts of the regression equation in this example:

By placing all this information into the regression equation format, we get this formula for predicting freshman GPA using ACT scores:

Predicted GPA = -.24 + (ACT Score x .16)

Notice that the constant in this case is a negative number. That's OK.
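The parts of the equation can be worked out from the summary statistics in a few lines of Python. This is only a sketch: the correlation of .55 is an assumption taken from the figure quoted for this single predictor in Hack #14, and the weight is rounded to two decimals before the constant is computed, which appears to be how the text arrives at -.24.

```python
# Summary statistics from the example's table.
mean_act, sd_act = 20.10, 2.38
mean_gpa, sd_gpa = 2.98, 0.68

# Assumption: the ACT-GPA correlation, as quoted in Hack #14.
r = 0.55

# Weight = correlation x (SD of criterion / SD of predictor), rounded.
weight = round(r * (sd_gpa / sd_act), 2)

# Constant = criterion mean - (weight x predictor mean), rounded.
constant = round(mean_gpa - weight * mean_act, 2)

print(f"Predicted GPA = {constant} + (ACT Score x {weight})")
```

Carrying the unrounded weight through instead would give a slightly different constant, so small differences from a statistics package are to be expected.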

Predicting Scores

One applicant, Melissa, has an ACT score of 26. The other applicant (let's call him Bruce) has an ACT score of 14. Using the regression equation we have built, there would be

averages:

For Melissa

Predicted GPA = -.24 + (26 x .16)
Predicted GPA = -.24 + 4.16
Predicted GPA = 3.92

For Bruce

Predicted GPA = -.24 + (14 x .16)
Predicted GPA = -.24 + 2.24
Predicted GPA = 2.00

I hope, for Bruce's sake, there is more than one spot available.
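With the formula in hand, the guessing can be handed to a small function. The sketch below hard-codes the constant (-.24) and weight (.16) from this hack; the function name `predict_gpa` is made up for the example.

```python
def predict_gpa(act_score, constant=-0.24, weight=0.16):
    """Predicted GPA = constant + (ACT score x weight)."""
    return constant + act_score * weight


# The two applicants from the example.
for name, act in [("Melissa", 26), ("Bruce", 14)]:
    print(f"{name}: predicted GPA = {predict_gpa(act):.2f}")
```

Any other applicant's ACT score can be passed in the same way, with no graphing required.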

The two variables in this example, ACT scores and GPA, are on different scales, with ACT scores typically running between 1 and 36 and GPA ranging from 0 to 4.0. Part of the magic of correlational analyses is that the variables can be on all sorts of different scales and it doesn't matter. The predicted outcome somehow knows to be on the scale of the criterion variable. Kind of spooky, huh?
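The spookiness fades a little once you see that the weight carries the conversion between scales: it is built from the ratio of the two standard deviations. The following sketch assumes a correlation of .55 (the figure quoted in Hack #14) and a hypothetical version of the ACT reported on a scale ten times larger. It uses unrounded values, so its output differs slightly from the rounded formula in the text.

```python
def recipe(r, mean_x, sd_x, mean_y, sd_y):
    """Return (constant, weight) from summary statistics."""
    weight = r * (sd_y / sd_x)
    constant = mean_y - weight * mean_x
    return constant, weight


r = 0.55  # assumption: correlation quoted in Hack #14

# ACT on its usual scale.
c1, w1 = recipe(r, 20.10, 2.38, 2.98, 0.68)

# Hypothetical: the same scores multiplied by 10. Rescaling a variable
# multiplies its mean and SD by 10 but leaves the correlation unchanged.
c2, w2 = recipe(r, 201.0, 23.8, 2.98, 0.68)

p1 = c1 + 26 * w1    # a score of 26 on the usual scale
p2 = c2 + 260 * w2   # the same score on the stretched scale
print(round(p1, 4), round(p2, 4))
```

The weight shrinks by a factor of 10 when the predictor is stretched by 10, so the prediction lands on the same GPA either way.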

Why It Works

When two variables correlate with each other, there is overlap in the information they provide. It is as if they share information. Statisticians sometimes use correlational information to talk

If some of the variance in one variable is accounted for by the variance in another variable, it makes sense that smart mathematicians can use one correlated variable to estimate the amount of variance from the mean (or distance from the mean) on another variable. They would have to use numbers that represent the variables' means and variability, and a number that represents the amount of overlap in information. Our regression equation uses all that information by including means, standard deviations, and the correlation coefficient.
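One way to see the "distance from the mean" idea at work is to make the prediction in standard deviation units. In the sketch below, a score's distance from the predictor mean is expressed in standard deviations, shrunk by the correlation, and converted back to the criterion's scale. The correlation of .55 is an assumption taken from Hack #14, and the function names are invented. Unrounded values are used throughout, so the results are close to, but not exactly, those from the rounded formula.

```python
mean_act, sd_act = 20.10, 2.38
mean_gpa, sd_gpa = 2.98, 0.68
r = 0.55  # assumption: correlation quoted in Hack #14


def predict_via_distance(act_score):
    z_act = (act_score - mean_act) / sd_act  # distance from the mean, in SDs
    z_gpa = r * z_act                        # shrunk by the amount of overlap
    return mean_gpa + z_gpa * sd_gpa         # back onto the GPA scale


def predict_via_equation(act_score):
    weight = r * (sd_gpa / sd_act)           # unrounded weight
    constant = mean_gpa - weight * mean_act  # unrounded constant
    return constant + act_score * weight


for act in (14, 26):
    print(act, round(predict_via_distance(act), 3),
          round(predict_via_equation(act), 3))
```

The two functions are the same calculation arranged differently, which is why the regression equation needs exactly those ingredients: two means, two standard deviations, and one correlation.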

Where Else It Works

making predictions. Sometimes, scientists just want to understand a variable and how it operates or how it is distributed in a population. They can do this by looking at how that variable is related to another variable that they know more

Statisticians call simple linear regression simple not because it is easy, but because it uses only one predictor variable. It is simple as compared to complex. Real-life predictions like those in our example usually use many predictors, not just one. The method of predicting a criterion variable using more than one predictor is called multiple regression [Hack #14].

Where It Doesn't Work

There will be error in predictions under three circumstances. First, if the correlation is less than perfect between two variables, the prediction will not be perfectly accurate. Since there are almost never really large relationships between predictors and criteria, let alone perfect 1.0 correlations, real-world applications of regression make lots of mistakes. In the presence of any correlation at all, though, the prediction is more accurate than blind guessing. You can determine the size of your errors with the standard error of estimate [Hack #18].
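For a rough feel of how big those errors are in the ACT example, here is a sketch using one common textbook approximation of the standard error of estimate: the criterion's standard deviation multiplied by the square root of 1 minus the squared correlation. Both the formula choice and the correlation of .55 (quoted in Hack #14) are assumptions here, and Hack #18 may present the calculation differently.

```python
import math

sd_gpa = 0.68  # standard deviation of the criterion, from the table
r = 0.55       # assumption: correlation quoted in Hack #14

# Approximate standard error of estimate: SD_y * sqrt(1 - r^2).
standard_error = sd_gpa * math.sqrt(1 - r ** 2)

print(f"Typical prediction error: about {standard_error:.2f} GPA points")
```

With no correlation at all, this value would equal the full standard deviation of GPA, which is the error you would make by guessing the mean for everyone.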

Second, linear regression assumes that the relationship is linear. This is discussed in "Graph Relationships" [Hack #12] in greater detail, but if the strength of the relationship varies at different points along the range of scores, the regression prediction will make large errors in some cases.

Finally, if the data collected to first establish the values used in the regression equation are not representative of future data,

example, if an applicant presents with an ACT score of 36, the predicted GPA is 5.52. This is an impossible value that does not even fit on the GPA scale, which maxes out at 4.0. Because the past data that was used to establish the prediction formula included few or no ACT scores of 36, the equation was not equipped to deal with such a high score.
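A simple safeguard is to check whether a prediction even lands on the criterion's scale before trusting it. The sketch below reuses the formula from this hack; the names `predict_gpa` and `check_prediction` and the idea of returning an on-scale flag are illustrative assumptions, not a procedure from the text.

```python
GPA_MIN, GPA_MAX = 0.0, 4.0  # the GPA scale described in this hack


def predict_gpa(act_score, constant=-0.24, weight=0.16):
    """Predicted GPA = constant + (ACT score x weight)."""
    return constant + act_score * weight


def check_prediction(act_score):
    """Return the rounded prediction and whether it fits on the GPA scale."""
    predicted = predict_gpa(act_score)
    on_scale = GPA_MIN <= predicted <= GPA_MAX
    return round(predicted, 2), on_scale


for act in (14, 26, 36):
    print(act, check_prediction(act))
```

A flag like this does not repair the equation. It only signals that the score lies outside the range the original data could speak to.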

Hack 14. Use More Than One Variable to Predict Another

The superpowers of predicting the future and seeing the invisible are available to any statistics hackers who feel

and use correlational information to solve problems by using one variable to predict another. For more accurate predictions, though, several predictor variables can be combined in a single regression equation by using the methods of multiple regression.

"Graph Relationships" [Hack #12] discusses the useful prophetic qualities of a regression line. Those procedures allow

performance on assessments never taken, understand

variables. They accomplish these tricks using just a single predictor variable.

"Use One Variable to Predict Another" [Hack #13] presents the

predict future performance. The solution in that hack uses one variable (a standardized test score) to estimate performance on

Often, real-life researchers want to make use of the information found in a bunch of variables, not just one variable, to make predictions or estimate scores. When they want greater accuracy, scientists attempt to find several variables that all appear to be related to the criterion variable of interest (the variable you are trying to predict). They use all this information to produce a multiple regression equation.
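To give a sense of what producing such an equation involves, here is a minimal pure-Python sketch that fits a constant plus one weight per predictor by ordinary least squares (solving the normal equations). Everything in it is an assumption for illustration: the function name, the three made-up predictors, the six invented applicants, and the coefficients used to generate their GPAs. Real analyses would normally rely on a statistics package.

```python
def fit_multiple_regression(rows, y):
    """Least-squares fit; returns [constant, weight_1, weight_2, ...]."""
    # Design matrix with a leading 1 for the constant.
    X = [[1.0, *row] for row in rows]
    k = len(X[0])

    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]

    return [A[i][k] / A[i][i] for i in range(k)]


# Invented predictors for six hypothetical applicants:
# (ACT score, attitude score, essay rating).
applicants = [(26, 80, 4), (14, 55, 2), (20, 70, 3),
              (22, 60, 5), (18, 90, 1), (24, 65, 3)]

# Invented GPAs generated from made-up weights, with no noise, so the
# fit should recover exactly these numbers.
true_constant, true_weights = 0.5, (0.05, 0.01, 0.2)
gpas = [true_constant + sum(w * x for w, x in zip(true_weights, row))
        for row in applicants]

coefficients = fit_multiple_regression(applicants, gpas)
print([round(c, 4) for c in coefficients])
```

The result has the same shape as the single-predictor formula, only with more weights: Criterion = Constant + (Predictor 1 x Weight 1) + (Predictor 2 x Weight 2) and so on.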

Choosing Predictor Variables

Another" [Hack #13] before going further with this hack, just to review the problem at hand and how regression solves it. Here is the equation we built in that hack for using a single predictor,

Predicted GPA = -.24 + (ACT Score x .16)

This single predictor produced a regression equation with output that correlated .55 with the criterion. Pretty good, and pretty accurate, but it could be better.

of precision she could get using the regression line or equation

more accurate result if she could find more variables that

statistician found two other predictor variables that correlated with college performance:

An attitude measure

The quality of a written essay

Perhaps performance on a college attitude survey is collected by the college (scores range between 20 and 100), and is found to

1 to 5 on a personal essay could correlate with college GPA and might be included in the multiple regression equation.

