广义倾向得分匹配法:Stata程序+实例论文 - 图文 下载本文

TheStataJournal(2008)8,Number3,pp.354–373

AStatapackagefortheestimationofthedose–responsefunctionthroughadjustmentfor

thegeneralizedpropensityscore

MichelaBia

LaboratorioRiccardoRevelliCentreforEmploymentStudies

CollegioCarloAlbertoMoncalieri,Italy

michela.bia@laboratoriorevelli.it

AlessandraMatteiDepartmentofStatisticsUniversityofFlorence

Florence,Italymattei@ds.uni?.it

Abstract.Inthisarticle,webrie?yreviewtheroleofthepropensityscoreinestimatingdose–responsefunctionsasdescribedinHiranoandImbens(2004,Ap-pliedBayesianModelingandCausalInferencefromIncomplete-DataPerspectives,73–84).ThenwepresentasetofStataprogramsthatestimatethepropensityscoreinasettingwithacontinuoustreatment,testthebalancingpropertyofthegeneralizedpropensityscore,andestimatethedose–responsefunction.Weillus-tratetheseprogramsbyusingadatasetcollectedbyImbens,Rubin,andSacerdote(2001,AmericanEconomicReview91:778–794).

Keywords:st0150,gpscore,doseresponse,doseresponsemodel,biasremoval,dose–responsefunction,generalizedpropensityscore,weakunconfoundedness

1Introduction

Muchoftheworkonpropensity-scoreanalysishasfocusedoncaseswherethetreat-mentisbinary.Matchingestimatorsforcausale?ectsofabinarytreatmentbasedonpropensityscoreshavealsobeenimplementedinStata(e.g.,BeckerandIchino[2002]andLeuvenandSianesi[2003]).

Inmanyobservationalstudies,thetreatmentmaynotbebinaryorevencategorical.Insuchacase,onemaybeinterestedinestimatingthedose–responsefunctionwherethetreatmentmighttakeonacontinuumofvalues.Forexample,ineconomics,animportantquantityofinterestisthee?ectofaidto?rms(e.g.,BiaandMattei[2007]).Insocioeconomicstudies,onemaybeinterestedinthee?ectoftheamountofalotteryprizeonsubsequentlaborearnings(e.g.,HiranoandImbens[2004]).

HiranoandImbens(2004)developedanextensiontothepropensity-scoremethodinasettingwithacontinuoustreatment.FollowingRosenbaumandRubin(1983)andmostoftheliteratureonpropensity-scoreanalysis,theymakeanunconfoundednessassumption,whichallowsthemtoremoveallbiasesincomparisonsbytreatmentstatusbyadjustingfordi?erencesinasetofcovariates.Thentheyde?neageneralizationofthepropensityscoreforthebinarycase—henceforthlabeledgeneralizedpropensityscore(GPS)—whichhasmanyoftheattractivepropertiesofthebinary-treatmentpropensityscore.

c2008StataCorpLP??

st0150

M.BiaandA.Mattei355

Inthisarticle,webrie?yreviewthemethoddevelopedbyHiranoandImbens(2004),andweprovideasetofStataprogramsthatestimatetheGPS,assesstheadequacyoftheunderlyingassumptionsonthedistributionofthetreatmentvariable,testwhethertheestimatedGPSsatis?esthebalancingproperty,andestimatethedose–responsefunction.FollowingHiranoandImbens(2004),ourStataprogramsaddresstheproblemofestimationandinferencebyusingparametricmodels.

WeillustratetheseprogramswithadatasetcollectedfromImbens,Rubin,andSac-erdote(2001).ThepopulationconsistsofindividualswhowontheMegabuckslotteryinMassachusettsinthemid-1980s.Weapplyourprogramstoestimatetheaveragepo-tentialpost-winninglaborearningsforeachlevelofthelotteryprize(thedose–responsefunction).Althoughtheassignmentoftheprizeisobviouslyrandom,substantialitemandunitnonresponseledtoaselectedsamplewheretheamountoftheprizeisnolongerindependentofbackgroundcharacteristics.Inusingtheseprograms,rememberthattheyonlyallowyoutoreduce,nottoeliminate,thebiasgeneratedbyunobservableconfoundingfactors.Asinthebinary-treatmentcase,theextenttowhichthisbiasisreduceddependscruciallyontherichnessandqualityofthecontrolvariables,onwhichtheGPSiscomputed.

2Thepropensityscorewithcontinuoustreatments

SupposewehavearandomsampleofsizeNfromalargepopulation.Foreachunitiinthesample,weobserveap×1vectorofpretreatmentcovariates,Xi;thetreatmentreceived,Ti;andthevalueoftheoutcomevariableassociatedwiththistreatment,Yi.UsingtheRubincausalmodel(Holland1986)asaframeworkforcausalinference,wede?neasetofpotentialoutcomes,{Yi(t)}t∈T,i=1,...,N,whereTisacontinuoussetofpotentialtreatmentvalues,andYi(t)isarandomvariablethatmapsaparticu-larpotentialtreatment,t,toapotentialoutcome.HiranoandImbens(2004)referto{Yi(t)}t∈Tastheunit-leveldose–responsefunction.Weareinterestedintheaveragedose–responsefunction,μ(t)=E{Yi(t)}.FollowingHiranoandImbens(2004),weas-sumethat{Yi(t)}t∈T,Ti,andXi,i=1,...,N,arede?nedonacommonprobabilityspace;thatTiiscontinuouslydistributedwithrespecttotheLebesguemeasureonT;andthatYi=Yi(Ti)isawell-de?nedrandomvariable.Tosimplifythenotation,wewilldroptheisubscriptinthesequel.

Thepropensityfunctionisde?nedbyHiranoandImbens(2004)astheconditionaldensityoftheactualtreatmentgiventheobservedcovariates.

De?nition2.1(GPS)Letr(t,x)betheconditionaldensityofthetreatmentgiventhecovariates:

r(t,x)=fT|X(t|x)

ThentheGPSisR=r(T,X).

356EstimatingtheGPSandthedose–responsefunction

TheGPShasabalancingpropertysimilartothatofthestandardpropensityscore;thatis,withinstratawiththesamevalueofr(t,x),theprobabilitythatT=tdoesnotdependonthevalueofX:

X⊥I(T=t)|r(t,x)

whereI(·)istheindicatorfunction.HiranoandImbens(2004)showthat,incombina-tionwithasuitableunconfoundednessassumption,thisbalancingpropertyimpliesthatassignmenttotreatmentisunconfounded,giventheGPS.

Theorem2.1(WeakunconfoundednessgiventheGPS)Supposethatassignmenttothetreatmentisweaklyunconfounded,givenpretreatmentvariablesX:

Y(t)⊥T|X

Then,foreveryt,

fT{t|r(t,X),Y(t)}=fT{t|r(t,X)}

Usingthistheorem,HiranoandImbens(2004)showthattheGPScanbeusedtoeliminateanybiasesassociatedwithdi?erencesinthecovariates.

Theorem2.2(BiasremovalwithGPS)Supposethatassignmenttothetreatmentisweaklyunconfounded,givenpretreatmentvariablesX.Then

β(t,r)=E{Y(t)|r(t,X)=r}=E(Y|T=t,R=r)

and

μ(t)=E[β{t,r(t,X)}]

forallt∈T

3Estimationandinference

TheimplementationoftheGPSmethodconsistsofthreesteps.Inthe?rststep,weestimatethescorer(t,x).Inthesecondstep,weestimatetheconditionalexpectationoftheoutcomeasafunctionoftwoscalarvariables,thetreatmentlevelTandtheGPSR:β(t,r)=E(Y|T=t,R=r).Inthethirdstep,weestimatethedose–responsefunction,μ(t)=E[β{t,r(t,X)}],t∈T,byaveragingtheestimatedconditionalexpectation,??{t,r(t,X)},overtheGPSateachlevelofthetreatmentweareinterestedin.β

M.BiaandA.Mattei357

3.1

Modelingtheconditionaldistributionofthetreatmentgiventhecovariates

The?rststepistoestimatetheconditionaldistributionofthetreatmentgiventhecovariates.Weassumethatthetreatment(oritstransformation)hasanormaldistri-butionconditionalonthecovariates:

????

g(Ti)|Xi~Nh(γ,Xi),σ2

(1)

whereg(Ti)isasuitabletransformationofthetreatmentvariable[g(·)maybetheidentityfunction],andh(γ,Xi)isafunctionofcovariateswithlinearandhigher-orderterms,whichdependsonavectorofparameters,γ.Thechoiceofthehigher-ordertermstoincludeisonlydeterminedbytheneedtoobtainanestimateoftheGPSthatsatis?esthebalancingproperty.

Theprogramgpscore.adoestimatestheGPSandteststhebalancingpropertyac-cordingtothefollowingalgorithm:

1.Estimatetheparametersγandσ2oftheconditionaldistributionofthetreatmentgiventhecovariates(1)bymaximumlikelihood.12.Assessthevalidityoftheassumednormaldistributionmodelbyoneofthefollow-inguser-speci?edgoodness-of-?ttests:theKolmogorov–Smirnov,theShapiro–Francia,theShapiro–Wilk,ortheStataskewnessandkurtosistestfornormality.

a.Ifthenormaldistributionmodelisstatisticallydisapproved,informtheuserthattheassumptionofnormalityisnotsatis?ed.Theuserisinvitedtouseadi?erenttransformationofthetreatmentvariableg(Ti).3.EstimatetheGPSas

1??i=√Rexp?2{g(Ti)?h(γ??,Xi)}2σ??2πσ??2whereγ??andσ??2aretheestimatedparametersinstep1.

4.Testthebalancingpropertyandinformtheuserwhetherandtowhatextent

thebalancingpropertyissupportedbythedata.FollowingHiranoandImbens(2004),theprogramgpscore.adotestsforbalancingofcovariatesaccordingtothefollowingscheme:

a.Dividethesetofpotentialtreatmentvalues,T,intoKintervalsaccordingtoauser-speci?edrule,whichshouldbede?nedonthebasisofthesampledis-tributionofthetreatmentvariable.LetG1,...,GKdenotetheKtreatmentintervals.

1.Themodel(1)isspeci?edintheauxiliaryado-?legpscoremodel.ado.

1

????

358EstimatingtheGPSandthedose–responsefunction

b.WithineachtreatmentintervalGk,k=1,...,K,computetheGPSatauser-speci?edrepresentativepoint(e.g.,themean,themedian,oranotherpercentile)ofthetreatmentvariable,whichwedenotebytGk,foreachunit.Letr(tGk,Xi)bethevalueoftheGPScomputedattGk∈Gkforuniti.c.Foreachk,k=1,...,K,blockonthescoresr(tGk,Xi),usingmintervals,de?nedbythequantilesoforderj/m,j=1,...,m?1,oftheGPSevalu-(k)(k)

atedattGk,r(tGk,Xi),i=1,...,N.LetB1,...,BmdenotethemGPSintervalsforthekthtreatmentinterval,Gk.d.WithineachintervalBj,j=1,...,m,calculatethemeandi?erenceofeachcovariatebetweenunitsthatbelongtothetreatmentinterval,Gk,{i:Ti∈

(k)

Gk},andunitsthatareinthesameGPSinterval,{i:r(tGk,Xi)∈Bj},but

/Gk}.belongtoanothertreatmentinterval,{i:Ti∈e.Combinethemdi?erencesinmeans,calculatedinstepd,byusingaweighted

average,withweightsgivenbythenumberofobservationsineachGPSin-(k)

tervalBj,j=1,...,m.Speci?cally,thefollowingweightedaverageiscalculatedforeachofthepcovariatesXl,l=1,...,p:

m1??

NB(k){xl,j(Gk)?xl,j(Gck)}jNj=1(k)

whereNB(k)isthenumberofobservationsintheBj

j

(k)

GPS

interval;xl,j(Gk)

(k)

isthemeanofthecovariateXlforunitsi,suchthatr(tGk,Xi)∈Bjand

??

Ti∈Gk;andxl,j(Gck)isthemeanofthecovariateXlforunitsi,suchthat

(k)

/Gk.Theteststatisticsweusetoevaluatether(tGk,Xi??)∈BjandTi??∈

balancingpropertyarefunctionsofthisweightedaverage.

f.ForeachGk,k=1,...,K,teststatistics(theStudent’ststatisticsortheBayesfactors)arecalculatedandshownintheResultswindow.Finally,themostextremevalueoftheteststatistics(thehighestabsolutevalueoftheStudent’ststatisticsorthelowestvalueoftheBayesfactors)iscomparedwithreferencevalues,andtheuserisinformedoftheextenttowhichthebalancingpropertyissupportedbythedata.

3.2

EstimatingtheconditionalexpectationoftheoutcomegiventhetreatmentandGPS

Inthesecondstage,wemodeltheconditionalexpectationoftheoutcome,Yi,givenTiandRi,asa?exiblefunctionofitstwoarguments.Weusepolynomialapproximationsofordernothigherthanthree.Speci?cally,themostcomplexmodelweconsideris

?{E(Yi|Ti,Ri)}=ψ(Ti,Ri;α)

23+α6·Ri+α7·Ti·Ri=α0+α1·Ti+α2·Ti2+α3·Ti3+α4·Ri+α5·Ri

M.BiaandA.Mattei359

where?(·)isalinkfunctionthatrelatesthepredictor,ψ(Ti,Ri;α),totheconditionalexpectation,E(Yi|Ti,Ri).

Weassumethatthemaine?ectsofTiandRicannotberemovedsothatwehave18possiblesubmodels.Theprogramdoseresponsemodel.adode?nesallthesemodels

??i.When?ttingtheselectedandestimateseachofthembyusingtheestimatedGPS,R

model,theprogramtakesintoaccountthenatureoftheoutcomevariable—whichmaybebinary,categorical(nominalorordinal),orcontinuous—bychoosingtheappropriatelinkfunction.

AsHiranoandImbens(2004)emphasize,thereisnodirectmeaningtotheestimatedcoe?cientsintheselectedmodel,exceptthattestingwhetherallcoe?cientsinvolvingtheGPSareequaltozerocanbeinterpretedasatestofwhetherthecovariatesintroduceanybias.

3.3Estimatingthedose–responsefunction

Thelaststepconsistsofaveragingtheestimatedregressionfunctionoverthescorefunctionevaluatedatthedesiredlevelofthetreatment.Speci?cally,inordertoobtainanestimateoftheentiredose–responsefunction,weestimatetheaveragepotentialoutcomeforeachlevelofthetreatmentweareinterestedinas

NN??1????1???1??????E{Y(t)}=???}β{t,r??(t,Xi)}=ψ{t,r??(t,Xi);αNi=1Ni=1

whereα??isthevectoroftheestimatedparametersinthesecondstage.

Theprogramdoseresponse.adoestimatesthedose–responsefunctionaccordingto

thefollowingalgorithm:

1.EstimatetheGPS,verifythenormalmodelusedfortheGPS,andtestthebalancingpropertycallingtheroutinegpscore.ado.2.Estimatetheconditionalexpectationoftheoutcome,giventhetreatmentandtheGPS,bycallingtheroutinedoseresponsemodel.ado.3.Estimatetheaveragepotentialoutcomeforeachlevelofthetreatmenttheuserisinterestedin.4.Estimatestandarderrorsofthedose–responsefunctionviabootstrapping.25.Plottheestimateddose–responsefunctionand,ifrequested,itscon?denceinter-vals.

2.HiranoandImbens(2004)statethatasymptoticstandarderrorsoftheestimateddose–responsefunctioncouldbecalculatedbyusingexpansionsbasedontheestimatingequations;theseshouldtakeintoaccounttheestimationoftheGPSaswellastheαparameters.Forpracticalreasons,ourprogramusesbootstrapmethodstoobtainstandarderrorsandcon?denceintervalsofthedose–responsefunctionthattakeintoaccountestimationoftheGPSandtheαparameters.

360EstimatingtheGPSandthedose–responsefunction

Someremarksonstep4ofthealgorithmcanbeuseful.Whenbootstrappedstandarderrorsarerequested,byactivatingtheappropriateoption(seesections4and5),thebootstrapencompassesboththeestimationoftheGPSbasedonthespeci?cationgivenbytheuserandtheestimationoftheαparameters.ReestimatingtheGPSandtheαparametersateachreplicationofthebootstrapprocedureallowsustoaccountfortheuncertaintyassociatedwiththeestimationoftheGPSandtheαparameters.

Typically,userswould?rstidentifyatransformationofthetreatmentvariableandaspeci?cationofthefunctionhin(1),satisfyingthenormalityassumptionandthebalancingproperty,respectively(byusing,forinstance,theroutinegpscore.ado),andthenprovideexactlythistransformationandthisspeci?cationintheinputtothepro-gramdoseresponse.ado.

4Syntax

??if????

in????

??

weight,t(varname)gpscore(newvar)

predict(newvar)sigma(newvar)cutpoints(varname)index(string)

??

nqgps(#)ttransf(transformation)normaltest(test)normlevel(#)

??

testvarlist(varlist)test(type)flag(#)detailgpscorevarlist

????????????

inweight,outcome(varname)doseresponsemodeltreatvarGPSvarif

??

cmd(regressioncmd)regtypet(string)regtypegps(type)

??

interaction(#)doseresponsevarlist

??if????

in????

??

weight,outcome(varname)t(varname)

gpscore(newvar)predict(newvar)sigma(newvar)cutpoints(varname)index(string)nqgps(#)doseresponse(newvarlist)??

ttransf(transformation)normaltest(test)normlevel(#)testvarlist(varlist)test(type)flag(#)cmd(regressioncmd)regtypet(type)regtypegps(type)interaction(#)tpoints(vector)npoints(#)delta(#)filename(?lename)bootstrap(string)bootreps(#)

??

analysis(string)analysislevel(#)graph(?lename)detailInthegpscoreanddoseresponsecommands,theargumentvarlistrepresentsthelistofcontrolvariables,whichareusedtoestimatetheGPS.Inthedoseresponsemodelcommand,thevariablelistconsistsofonlytwovariables:thetreatmentvariable(treatvar)andtheGPS(GPSvar).

M.BiaandA.Mattei361

5Options

Wedescribeonlytheoptionsforthedoseresponsecommand,becausetheyincludealltheoptionsforthegpscorecommandandthedoseresponsemodelcommand.There-fore,alltheoptionsdescribedinsections5.1and5.2applytodoseresponse,andwespecify,ifapplicable,whethertheoptionalsoappliestogpscoreordoseresponsemodel.

5.1Required

outcome(varname)(doseresponsemodel)speci?esthatvarnameistheoutcomevari-able.t(varname)(gpscore)speci?esthatvarnameisthetreatmentvariable.gpscore(newvar)(gpscore)speci?esthevariablenamefortheestimatedGPS.predict(newvar)(gpscore)createsanewvariabletoholdthe?ttedvaluesofthetreatmentvariable.sigma(newvar)(gpscore)createsanewvariabletoholdthemaximumlikelihoodesti-mateoftheconditionalstandarderrorofthetreatmentgiventhecovariates.cutpoints(varname)(gpscore)dividesthesetofpotentialtreatmentvalues,T,intointervalsaccordingtothesampledistributionofthetreatmentvariable,cuttingatvarnamequantiles.index(string)(gpscore)speci?estherepresentativepointofthetreatmentvariableatwhichtheGPShastobeevaluatedwithineachtreatmentinterval.stringidenti-?eseitherthemean(string=mean)orapercentile(string=p1,...,p100)ofthetreatment.nqgps(#)(gpscore)speci?esthatthevaluesoftheGPSevaluatedattherepresen-tativepointindex(string)ofeachtreatmentintervalhavetobedividedinto#(#∈{1,...,100})intervals,de?nedbythequantilesoftheGPSevaluatedattherepresentativepointindex(string).doseresponse(newvarlist)speci?esthevariablename(s)fortheestimateddose–responsefunction(s).

(Continuedonnextpage)

362EstimatingtheGPSandthedose–responsefunction

5.2Optional

ttransf(transformation)(gpscore)speci?esthetransformationofthetreatmentvari-ableusedinestimatingtheGPS.Thedefaulttransformationistheidentityfunction.Thesupportedtransformationsarethelogarithmictransformation,ttransf(ln);thezero-skewnesslogtransformation,ttransf(lnskew0);thezero-skewnessBox–Coxtransformation,ttransf(bcskew0);andtheBox–Coxtransformation,ttransf(boxcox).TheBox–Coxtransformation?ndsthemaximumlikelihoodestimatesoftheparametersoftheBox–Coxtransformregressingthetreatmentvariablet(varname)onthecontrolvariableslistedintheinputvariablelist.3normaltest(test)(gpscore)speci?esthegoodness-of-?ttestthatgpscorewillper-formtoassessthevalidityoftheassumednormaldistributionmodelforthetreat-mentconditionalonthecovariates.Bydefault,gpscoreperformstheKolmogorov–Smirnovtest(normaltest(ksmirnov)).PossiblealternativesaretheShapiro–Franciatest,normaltest(sfrancia);theShapiro–Wilktest,normaltest(swilk);andtheStataskewnessandkurtosistestfornormality,normaltest(sktest).normlevel(#)(gpscore)setsthesigni?cancelevelofthegoodness-of-?ttestfornor-mality.Thedefaultisnormlevel(0.05).testvarlist(varlist)(gpscore)speci?esthattheextentofcovariatebalancinghastobeinspectedforeachvariableofvarlist.ThedefaultvarlistconsistsofthevariablesusedtoestimatetheGPS.Thisoptionisusefulwhentherearecategoricalvariablesamongthecovariates.gpscore,whichisaregression-likecommand,requiresthatcategoricalvariablesareexpandedintoindicator(alsocalleddummy)variablesetsandthatonedummy-variablesetisdroppedinestimatingtheGPS.However,thebalancingtestshouldalsobeperformedontheomittedgroup.Thiscanbedonebyusingthetestvarlist(varlist)optionandbylistinginvarlistallthevariables,includingthecompletesetofindicatorvariablesforeachcategoricalcovariate.

3.Theproblemiswhetherthetreatmentvariabletakeszerovalue.Insuchacase,theprogramcontinues,forcingatransformationofthetreatmentvariabletotakeasuitablevalue.Speci?cally,weassumethatln(0)=0,ttransf(0)=?1/λifλ>0,andttransf(0)=ln(0)=0ifλ=0,forttransf=bcskew0orboxcox.Allowingforzerovaluesofthetreatmentimpliesthatuntreatedunitsmightbeincludedinthestudy.BecausetheGPSmethodsaredesignedforanalyzingthee?ectofatreatmentintensity,theyspeci?callyrefertothesubpopulationoftreatedunits.Thisimpliesthatincludinguntreatedunitsmightleadtomisleadingresults.

M.BiaandA.Mattei363

test(type)(gpscore)speci?eswhetherthebalancingpropertyhastobetestedusingeitherastandardtwo-sidedttest(thedefault)oraBayes-factor–basedmethod(test(Bayesfactor)).Theprograminformstheuserifthereissomeevidencethatthebalancingpropertyissatis?ed.Recallthatthetestisperformedforeachsinglevariableintestvarlist(varlist)andforeachtreatmentinterval.Speci?cally,letpbethenumberofcontrolvariablesintestvarlist(varlist),andletKbethenumberofthetreatmentintervals.We?rstcalculatep×Kvaluesoftheteststatistic;thenweselecttheworstvalue(thehighesttvalueinmodulus,orthelowestBayesfactor)andcompareitwithstandardvalues.Table1showsthe“orderofmagnitude”interpretationsoftheteststatisticsweconsider.

Table1.“Orderofmagnitude”interpretationsoftheteststatistics

tvalue|t|<1.282

1.282<|t|<1.645

Bayesfactor(BF)?Evidenceforthebalancingproperty(BP)√

BF>1.00

EvidencesupportstheBP

0.10

1.645<|t|<1.9600.10

|t|>2.576

?

0.01

BF<0.01

DecisiveevidenceagainsttheBP

TheorderofmagnitudeinterpretationsoftheBayesfactorweappliedwereproposedbyJe?reys(1961).

flag(#)(gpscore)speci?esthatgpscoreestimatestheGPSwithoutperformingeitheragoodness-of-?ttestfornormalityorabalancingtest.Thedefault#is1,meaningthatboththenormaldistributionmodelandthebalancingpropertyaretested;thedefaultlevelisrecommended.Weintroducedthisoptionforpracticalreasons.Recallthatdoseresponseestimatesthestandarderrorsofthedose–responsefunctionbyusingbootstrapmethods.Ineachbootstrapiteration,wewanttoreestimatetheGPSwithouttestingeitherthenormalityassumptionorthebalancingproperty.cmd(regressioncmd)(doseresponsemodel)de?nestheregressioncommandtobeusedforestimatingtheconditionalexpectationoftheoutcomegiventhetreatmentandtheGPS.Thedefaultfortheoutcomevariableiscmd(logit)whentherearetwodis-tinctvalues,cmd(mlogit)whenthereare3–5values,andcmd(regress)otherwise.Thesupportedregressioncommandsarelogit,probit,mlogit,mprobit,ologit,oprobit,andregress.

364EstimatingtheGPSandthedose–responsefunction

regtypet(type)(doseresponsemodel)de?nesthemaximumpowerofthetreatmentvariableinthepolynomialfunctionusedtoapproximatethepredictorforthecon-ditionalexpectationoftheoutcomegiventhetreatmentandtheGPS.Thedefault

??;α),isalinearfunctionofthetypeislinear,meaningthatthepredictor,ψ(T,R

treatment.Alternatively,typecanbequadraticorcubic.regtypegps(type)(doseresponsemodel)de?nesthemaximumpoweroftheesti-matedGPSinthepolynomialfunctionusedtoapproximatethepredictorfortheconditionalexpectationoftheoutcomegiventhetreatmentandtheGPS.Thede-??;α),isalinearfunctionoffaulttypeislinear,meaningthatthepredictor,ψ(T,R

theestimatedGPS.Alternatively,typecanbequadraticorcubic.interaction(#)(doseresponsemodel)speci?eswhetherthemodelforthecondi-tionalexpectationoftheoutcomegiventhetreatmentandtheGPShastheinterac-tionbetweentreatmentandGPS.Thedefault#is1,meaningthattheinteractionisincluded.tpoints(vector)speci?esthatdoseresponseestimatestheaveragepotentialoutcomeforeachlevelofthetreatmentinvector.Bydefault,doseresponsecreatesavectorwiththeithelementequaltotheithobservedtreatmentvalue.Thisoptioncannotbeusedwiththenpoints(#)option(seebelow).npoints(#)speci?esthatdoseresponseestimatestheaveragepotentialoutcomeforeachlevelofthetreatmentbelongingtoasetofevenlyspacedvalues,t0,t1,...,t#,thatcovertherangeoftheobservedtreatment.Thisoptioncannotbeusedwiththetpoints(vector)option(seeabove).delta(#)speci?esthatdoseresponsealsoestimatesthetreatment-e?ectfunctioncon-sideringa#-treatmentgap,whichisde?nedasμ(t+#)?μ(t).Thedefault#is0,meaningthatdoseresponseestimatesonlythedose–responsefunction,μ(t).filename(?lename)speci?esthatthetreatmentlevelsspeci?edthroughthe

tpoints(vector)optionorthenpoints(#)option,theestimateddose–responsefunction,and,eventually,theestimatedtreatment-e?ectfunction,alongwiththeirstandarderrors(ifcalculated),bestoredtoanew?lecalled?lename.bootstrap(string)speci?estheuseofbootstrapmethodstoderivestandarderrorsandcon?denceintervals.Bydefault,doseresponsedoesnotapplybootstraptechniques.Insuchacase,nostandarderroriscalculated.Toactivatethisoption,stringshouldbesettoyes.bootreps(#)speci?esthenumberofbootstrapreplicationstobeperformed.Thedefaultisbootreps(50).Thisoptionproducesane?ectonlyifthebootstrap()optionissettoyes.

M.BiaandA.Mattei365

analysis(string)speci?esthatdoseresponseplotstheestimateddose–responsefunc-tion(s)and,eventually,theestimatedtreatment-e?ectfunction(s),alongwiththecorrespondingcon?denceintervalsiftheyarecalculatedwithbootstrapping.Bydefault,doseresponseplotsonlytheestimateddose–responseandtreatmentfunc-tion(s).Inordertoplotcon?denceintervals,stringhastobesettoyes.Iftheusertypesanalysis(no),noplotisshown.analysislevel(#)setsthecon?dencelevelofthecon?denceintervals.Thedefaultisanalysislevel(0.95).graph(?lename)storestheplotsoftheestimateddose–responsefunctionandtheesti-matedtreatmente?ectstoanew?lecalled?lename.Whentheoutcomevariableiscategorical,doseresponsecreatesanew?leforeachcategoryioftheoutcomevariableandnamesit?lenamei.detail(gpscore)displaysmoredetailedoutput.Speci?cally,thisoptionspeci?esthatgpscoreshowstheresultsofthegoodness-of-?ttestfornormality,somesummarystatisticsofthedistributionoftheGPSevaluatedattherepresentativepointofeachtreatmentinterval,andtheresultsofthebalancingtestwithineachtreatmentinterval.Whenthisoptionisspeci?edfordoseresponse,theresultsoftheregressionoftheoutcomeonthetreatmentandtheGPSarealsoshown.

6

Example:TheImbens–Rubin–Sacerdotelotterysam-ple

WeusedatafromthesurveyofMassachusettslotterywinners;thedataaredescribedindetailinImbens,Rubin,andSacerdote(2001).Weareinterestedinestimatingthee?ectoftheprizeamountonsubsequentlaborearnings(fromU.S.SocialSecurityrecords).Althoughthelotteryprizeisobviouslyrandomlyassigned,substantialunitanditemnonresponseledtoaselectedsample,wheretheamountoftheprizeispotentiallycorrelatedwithbackgroundcharacteristicsandpotentialoutcomes.Toremovesuchbiases,wemaketheweakunconfoundednessassumptionspecifyingthat,conditionalonthecovariates,thelotteryprizeisindependentofthepotentialoutcomes.4

Thesampleweuseinthisanalysisisthe“winners”sampleof237individualswhowonamajorprizeinthelottery.Theoutcomeofinterestisyear6(earningssixyearsafterwinningthelottery),andthetreatmentisprize,theprizeamount.Controlvariablesareage,gender,yearsofhighschool,yearsofcollege,winningyear,numberofticketsbought,workstatusafterwinning,andearningssyearsbeforewinningthelottery(withs=1,2,...,6).

WetriedtoreplicatetheresultsproducedbyHiranoandImbens(2004)buthavenotbeenabletonumericallyreplicatealltheirestimatesbecauseofrestrictionsofour

4.Inthiscontext,thenonignorabilityoftheassignmentmechanismisduetothepresenceofnon-response.Therefore,sayingthattheunconfoundednessassumptionallowsustoremoveallbiasesassociatedwithdi?erencesintheobservedcovariatesmeansthatweareimplicitlyassumingthattheoutcomevariableismissingatrandom(Rubin1976).

366EstimatingtheGPSandthedose–responsefunction

programs.Speci?cally,ourprogramsdonotallowustoconsiderafunctionofthetreat-mentvariableorafunctionoftheGPSintheestimationoftheconditionalexpectationoftheoutcome,giventhetreatmentandtheGPS.However,wegetqualitativelysimilarresults.

6.1Outputfromgpscore

We?rstchoosethequantilesofthetreatmentvariabletodividethesampleintogroups.FollowingHiranoandImbens(2004),wedividetherangeofprizesintothreetreatmentintervals,[0–23],(23–80],and(80–485].Thenwerungpscoreusingthespeci?cationappliedbyHiranoandImbens(2004).Theoutputlookslikethefollowing:

.uselotterydataset.dta

.quigeneratecut=23ifprize<=23

.quireplacecut=80ifprize>23&prize<=80.quireplacecut=485ifprize>80

.gpscoreagewmaleownhsowncolltixbotworkthenyearwyearm1yearm2yearm3>yearm4yearm5yearm6,t(prize)gpscore(pscore)predict(hat_treat)sigma(sd)>cutpoints(cut)index(p50)nq_gps(5)t_transf(ln)detailGeneralizedPropensityScore

******************************************************Algorithmtoestimatethegeneralizedpropensityscore******************************************************

Estimationofthepropensityscore

Thelogtransformationofthetreatmentvariableprizeisused

T

1%5%Pu???%

Percentiles1.6094382.2838512.4200122.835211

3.457834.1430084.8754265.1288925.720607

logloglogloglog

Largest5.5987925.7206075.7786436.183716likelihoodlikelihoodlikelihoodlikelihoodlikelihood

=====

Smallest.1301507.13015071.6094381.67818

Obs

SumofWgt.Mean

Std.Dev.VarianceSkewnessKurtosis

--4917.4112-480.91803-348.62357-348.62357

2372373.558185.9553768.9127448-.01658893.452439

initial:feasible:rescale:rescaleeq:Iteration0:

(couldnotbeevaluated)

(outputomitted)Iteration4:loglikelihood=-307.68186

M.BiaandA.Mattei

NumberofobsWaldchi2(13)Prob>chi2

Std.Err..0048563.1351124.060835.0397666.0182546.1645602.0464566.010379.0162758.0166256.0158217.0153635.0110455.4693959.040709

z3.133.240.320.940.240.77-0.030.60-0.760.721.53-1.41-0.454.9321.77

P>|z|0.0020.0010.7520.3490.8120.4400.9750.5500.4490.4720.1260.1590.6510.0000.000

===

23737.220.0004

367

Loglikelihood=-307.68186

T

eq1

agewmaleownhsowncolltixbotworkthen

yearwyearm1yearm2yearm3yearm4yearm5yearm6_conseq2

_cons

.886297.0151905.4379826.0192025.0372805.0043423.1270879-.0014367.0062064-.0123161.0119446.0242245-.0216437-.00500212.315546

Coef.

[95%Conf.Interval].0056724.1731672-.1000319-.0406607-.031436-.1954442-.09249-.014136-.044216-.0206411-.0067855-.0517555-.02665091.395547.806509

.0247086.702798.1384368.1152217.0401206.44962.0896166.0265488.0195839.0445302.0552344.0084682.01664673.235545.9660851

Testfornormalityofthedisturbances

Kolmogorov-Smirnovequality-of-distributionstestNormalDistributionofthedisturbances

One-sampleKolmogorov-Smirnovtestagainsttheoreticaldistribution

normal((res_etreat-r(mean))/sqrt(r(Var)))

SmallergroupDP-valueCorrectedres_etreat:0.0517Cumulative:-0.0420CombinedK-S:0.0517TheassumptionofNormalityis

0.2810.4340.5500.517

statisticallysatisfiedat.05level

Estimatedgeneralizedpropensityscore

1%5%P%

Percentiles.0131817.0869414.1272663.2255553.3536221

Smallest.0003053.0011738.0131817.0163113

Obs

SumofWgt.Mean

Std.Dev.

237237.3196603.1222106.0149354-.77235012.510499

Largest

75%.4343045.450000390%.4481351.4500911Variance95%.4497166.450096Skewness99%.4500911.4501086Kurtosis********************************************Endofthealgorithmtoestimatethegpscore********************************************

368EstimatingtheGPSandthedose–responsefunction

******************************************************************************Thesetofthepotentialtreatmentvaluesisdividedinto3intervalsThevaluesofthegpscoreevaluatedattherepresentativepointofeachtreatmentintervalaredividedinto5intervals

*****************************************************************************************************************************************SummarystatisticsofthedistributionoftheGPSevaluatedattherepresentativepointofeachtreatmentinterval

***********************************************************

VariableObsMeanStd.Dev.MinMax

gps_1Variablegps_2Variable

gps_3

237Obs237Obs237

.262852

Mean.4178101

Mean.1814998

.0956436Std.Dev..0373217Std.Dev..088236

.0583948

Min.2433839

Min.0181741

.4486237

Max.4501224

Max.4141454

******************************************************************************Testthattheconditionalmeanofthepre-treatmentvariablesgiventhe

generalizedpropensityscoreisnotdifferentbetweenunitswhobelongtoaparticulartreatmentintervalandunitswhobelongtoallothertreatmentintervals

******************************************************************************TreatmentIntervalNo1-[1.139000058174133,22.98200035095215]

MeanStandardDifferenceDeviationt-value

agew

maleownhsowncolltixbotworkthenyearwyearm1yearm2yearm3yearm4yearm5yearm6

-.25322.04799.15044.20765.33298.00154.00156.33117.908721.2445.74998.962991.4414

1.814.04246.156.23738.48448.05608.191351.90521.77191.67561.56251.71751.7774

-.139591.1304.96433.87476.68729.0275.00813.17382.51284.74274.47999.5607.81098

M.BiaandA.Mattei

TreatmentIntervalNo2-[23.08799934387207,79.11299896240234]

MeanStandardDifferenceDeviationt-value

agew

maleownhsowncolltixbotworkthenyearwyearm1yearm2yearm3yearm4yearm5yearm6

-.13308-.03419-.2294-.20996-.26933.03013-.32817.51467.23703.41572.46856-.00903-.33587Mean

Difference-1.7504-.04742.34062.23199-.03159-.07006.3672-.63678-.83409-1.2074-1.351-1.6137-2.2111

1.8294.0657.13927.21228.43812.05266.170081.77411.70381.66561.5711.62421.6445StandardDeviation2.3202.06211.1914.28116.56716.07448.226131.94281.83561.73221.59821.87921.8615

-.07275-.52041-1.6471-.98908-.61474.57227-1.9295.2901.13912.24959.29826-.00556-.20423

369

TreatmentIntervalNo3-[82.98699951171875,484.7900085449219]

t-value-.75444-.763421.7796.82512-.0557-.940691.6238-.32777-.45441-.69707-.84534-.8587-1.1878

agewmaleownhsowncolltixbotworkthen

yearwyearm1yearm2yearm3yearm4yearm5yearm6

Accordingtoastandardtwo-sidedt-test:

ModerateevidenceagainstthebalancingpropertyThebalancingpropertyissatisfiedatlevel0.05

Thisoutputisthemostdetailedwecanhavebecausewespeci?edthedetailoption.Whenthisoptionisnotspeci?ed,someinformationisomittedfromtheoutput.Specif-ically,theresultsofthegoodness-of-?ttestfornormality,thesummarystatisticsofthedistributionoftheGPSevaluatedattherepresentativepointofeachtreatmentinterval,andtheresultsofthebalancingtestwithineachtreatmentintervalarenotshown.Insuchacase,theprogramprovidesonlyshortsentencesinformingtheuserwhetherthenormaldistributionmodelandthebalancingpropertyarestatisticallysatis?ed.

370EstimatingtheGPSandthedose–responsefunction

6.2Outputfromdoseresponse

Beforerunningdoseresponse,wehavetodecideaboutthetreatmentlevels,whichestimatetheaveragepotentialoutcome.FollowingHiranoandImbens(2004),wefocusonthevalues10,20,...,100,whichwestoretoa10-dimensionalvectornamedtp(seebelow).Theoutputfromrunningdoseresponseisasfollows:

.uselotterydataset.dta,clear

.quigeneratecut=23ifprize<=23

.quireplacecut=80ifprize>23&prize<=80.quireplacecut=485ifprize>80

.matrixdefinetp=(10\\20\\30\\40\\50\\60\\70\\80\\90\\100)

.doseresponseagewownhsmaletixbotowncollworkthenyearwyearm1yearm2>yearm3yearm4yearm5yearm6,outcome(year6)t(prize)gpscore(pscore)>predict(hat_treat)sigma(sd)cutpoints(cut)index(p50)nq_gps(5)>t_transf(ln)dose_response(dose_response)tpoints(tp)delta(1)

>reg_type_t(quadratic)reg_type_gps(quadratic)interaction(1)bootstrap(yes)>boot_reps(100)filename(\analysis(yes)graph(\detail********************************************ESTIMATEOFTHEGENERALIZEDPROPENSITYSCORE********************************************(outputomitted)Theoutcomevariable``year6′′isacontinuousvariable

Theregressionmodelis:Y=T+T^2+GPS+GPS^2+T*GPS

SourceSSdfMSNumberofobs

F(5,196)

Model2945.927385589.185477Prob>FResidual38378.9633196195.811037R-squared

AdjR-squared

Total41324.8907201205.596471RootMSE

year6

prizeprize_sqpscorepscore_sqprize_pscore

_cons

Coef.-.2254371.0003537-103.3373131.949.549993331.26845

Std.Err..0748156.000166948.3707679.40569.21976616.955419

t-3.012.12-2.141.662.504.50

P>|t|0.0030.0350.0340.0980.0130.000

======2023.010.01220.07130.047613.993

[95%Conf.Interval]-.3729839.0000245-198.7312-24.65021.116583517.55138

-.0778902.0006828-7.943281288.5482.983403144.98552

Bootstrappingofthestandarderrors

...............................................................................>.....................

TheprogramisdrawinggraphsoftheoutputThisoperationmaytakeawhile(filegraph_output.gphsaved)EndoftheAlgorithm

M.BiaandA.Mattei371Theestimatedcoe?cientsoftheregressionoftheoutcome,earningssixyearsafterwinningthelottery,theprize,andthescoreareshownbecausewehaverequiredadetailedoutput.Otherwise,doseresponseprovidesonlyagraphicoutput,suchasthatshownin?gure1.Figure1showsboththeestimateddose–responsefunctionandtheestimatedtreatment-e?ectfunction,whichcanbeinterpretedasaderivate,becausewehavespeci?edatreatmentgapequalto1(delta(1)).OnlyinformationconcerningtheGPSestimationisprovidedwhendetailisnotspeci?edandtheanalysis()optionissettono.Dose Response Function200E[year6(t+1)]?E[year6(t)]020406080Treatment level100?2000?1000100E[year6(t)]500010000150002000025000Treatment Effect Function204060Treatment level80100Dose ResponseUpper boundLow boundTreatment EffectUpper boundLow boundConfidence Bounds at .95 % levelDose response function = Linear predictionConfidence Bounds at .95 % levelDose response function = Linear predictionFigure1.Estimateddose–responsefunction,estimatedderivative,and95%con?dencebands(Continuedonnextpage)372EstimatingtheGPSandthedose–responsefunction

TheresultsgeneratedbydoseresponsearestoredinanewStata?le,whichwehavenamedoutput.This?lehas10observationsand6variables:treatmentlevel,containingthetreatmentlevels,atwhichweestimatetheaveragepotentialoutcome;treatmentlevelplus,containingthe#-shiftedtreatmentlevels,where#isequalto1;doseresponse,theestimateddose–responsefunction;sedoseresponsebs,thestandarderrorsoftheestimateddose–responsefunction;diffdoseresponse,thees-timatedtreatment-e?ectfunction;andsediffdoseresponsebs,thestandarderrorsoftheestimatedtreatment-e?ectfunction.Thegraphicoutputisalsostoredtoanew?le,whichwehavenamedgraphoutput.

7Acknowledgments

WethankFabriziaMealli,GuidoImbens,andKeisukeHiranofortheirinsightfulsug-gestionsanddiscussions,andGuidoImbensandKeisukeHiranoforprovidingthedata.

8References

Becker,S.O.,andA.Ichino.2002.Estimationofaveragetreatmente?ectsbasedonpropensityscores.StataJournal2:358–377.Bia,M.,andA.Mattei.2007.Applicationofthegeneralizedpropensityscore.Eval-uationofpubliccontributionstoPiedmontenterprises.POLISWorkingPaper80,UniversityofEasternPiedmont.Hirano,K.,andG.W.Imbens.2004.Thepropensityscorewithcontinuoustreat-ments.InAppliedBayesianModelingandCausalInferencefromIncomplete-DataPerspectives,ed.A.GelmanandX.-L.Meng,73–84.WestSussex,England:WileyInterScience.Holland,P.W.1986.Statisticsandcausalinference.JournaloftheAmericanStatisticalAssociation8:945–960.Imbens,G.W.,D.B.Rubin,andB.I.Sacerdote.2001.Estimatingthee?ectofunearnedincomeonlaborearnings,savings,andconsumption:Evidencefromasurveyoflotteryplayers.AmericanEconomicReview91:778–794.Je?reys,H.1961.TheoryofProbability.3rded.Oxford:OxfordUniversityPress.Leuven,E.,andB.Sianesi.2003.psmatch2:StatamoduletoperformfullMahalanobisandpropensityscorematching,commonsupportgraphing,andcovariateimbalancetesting.BostonCollegeDepartmentofEconomics,StatisticalSoftwareComponents.Downloadablefromhttp://ideas.repec.org/c/boc/bocode/s432001.html.Rosenbaum,P.R.,andD.B.Rubin.1983.Thecentralroleofthepropensityscoreinobservationalstudiesforcausale?ects.Biometrika70:41–55.Rubin,D.B.1976.Inferenceandmissingdata.Biometrika63:581–592.

M.BiaandA.Mattei

Abouttheauthors

373

MichelaBiaisaresearchassistantatLaboratorioRevelli,CentreforEmploymentStudies,CollegioCarloAlberto,Turin,Italy.

AlessandraMatteiisastatisticsresearchassistantintheDepartmentofStatistics,“GiuseppeParenti”,UniversityofFlorence,Italy.OnFebruary25,2008,shewonacompetitiveexamtoworkasanassistantprofessorofstatisticsinthefacultyofpsychologyattheUniversityofFlorence.