TheStataJournal(2008)8,Number3,pp.354–373
AStatapackagefortheestimationofthedose–responsefunctionthroughadjustmentfor
thegeneralizedpropensityscore
MichelaBia
LaboratorioRiccardoRevelliCentreforEmploymentStudies
CollegioCarloAlbertoMoncalieri,Italy
michela.bia@laboratoriorevelli.it
AlessandraMatteiDepartmentofStatisticsUniversityofFlorence
Florence,Italymattei@ds.uni?.it
Abstract.Inthisarticle,webrie?yreviewtheroleofthepropensityscoreinestimatingdose–responsefunctionsasdescribedinHiranoandImbens(2004,Ap-pliedBayesianModelingandCausalInferencefromIncomplete-DataPerspectives,73–84).ThenwepresentasetofStataprogramsthatestimatethepropensityscoreinasettingwithacontinuoustreatment,testthebalancingpropertyofthegeneralizedpropensityscore,andestimatethedose–responsefunction.Weillus-tratetheseprogramsbyusingadatasetcollectedbyImbens,Rubin,andSacerdote(2001,AmericanEconomicReview91:778–794).
Keywords:st0150,gpscore,doseresponse,doseresponsemodel,biasremoval,dose–responsefunction,generalizedpropensityscore,weakunconfoundedness
1Introduction
Muchoftheworkonpropensity-scoreanalysishasfocusedoncaseswherethetreat-mentisbinary.Matchingestimatorsforcausale?ectsofabinarytreatmentbasedonpropensityscoreshavealsobeenimplementedinStata(e.g.,BeckerandIchino[2002]andLeuvenandSianesi[2003]).
Inmanyobservationalstudies,thetreatmentmaynotbebinaryorevencategorical.Insuchacase,onemaybeinterestedinestimatingthedose–responsefunctionwherethetreatmentmighttakeonacontinuumofvalues.Forexample,ineconomics,animportantquantityofinterestisthee?ectofaidto?rms(e.g.,BiaandMattei[2007]).Insocioeconomicstudies,onemaybeinterestedinthee?ectoftheamountofalotteryprizeonsubsequentlaborearnings(e.g.,HiranoandImbens[2004]).
HiranoandImbens(2004)developedanextensiontothepropensity-scoremethodinasettingwithacontinuoustreatment.FollowingRosenbaumandRubin(1983)andmostoftheliteratureonpropensity-scoreanalysis,theymakeanunconfoundednessassumption,whichallowsthemtoremoveallbiasesincomparisonsbytreatmentstatusbyadjustingfordi?erencesinasetofcovariates.Thentheyde?neageneralizationofthepropensityscoreforthebinarycase—henceforthlabeledgeneralizedpropensityscore(GPS)—whichhasmanyoftheattractivepropertiesofthebinary-treatmentpropensityscore.
c2008StataCorpLP??
st0150
M.BiaandA.Mattei355
Inthisarticle,webrie?yreviewthemethoddevelopedbyHiranoandImbens(2004),andweprovideasetofStataprogramsthatestimatetheGPS,assesstheadequacyoftheunderlyingassumptionsonthedistributionofthetreatmentvariable,testwhethertheestimatedGPSsatis?esthebalancingproperty,andestimatethedose–responsefunction.FollowingHiranoandImbens(2004),ourStataprogramsaddresstheproblemofestimationandinferencebyusingparametricmodels.
WeillustratetheseprogramswithadatasetcollectedfromImbens,Rubin,andSac-erdote(2001).ThepopulationconsistsofindividualswhowontheMegabuckslotteryinMassachusettsinthemid-1980s.Weapplyourprogramstoestimatetheaveragepo-tentialpost-winninglaborearningsforeachlevelofthelotteryprize(thedose–responsefunction).Althoughtheassignmentoftheprizeisobviouslyrandom,substantialitemandunitnonresponseledtoaselectedsamplewheretheamountoftheprizeisnolongerindependentofbackgroundcharacteristics.Inusingtheseprograms,rememberthattheyonlyallowyoutoreduce,nottoeliminate,thebiasgeneratedbyunobservableconfoundingfactors.Asinthebinary-treatmentcase,theextenttowhichthisbiasisreduceddependscruciallyontherichnessandqualityofthecontrolvariables,onwhichtheGPSiscomputed.
2Thepropensityscorewithcontinuoustreatments
SupposewehavearandomsampleofsizeNfromalargepopulation.Foreachunitiinthesample,weobserveap×1vectorofpretreatmentcovariates,Xi;thetreatmentreceived,Ti;andthevalueoftheoutcomevariableassociatedwiththistreatment,Yi.UsingtheRubincausalmodel(Holland1986)asaframeworkforcausalinference,wede?neasetofpotentialoutcomes,{Yi(t)}t∈T,i=1,...,N,whereTisacontinuoussetofpotentialtreatmentvalues,andYi(t)isarandomvariablethatmapsaparticu-larpotentialtreatment,t,toapotentialoutcome.HiranoandImbens(2004)referto{Yi(t)}t∈Tastheunit-leveldose–responsefunction.Weareinterestedintheaveragedose–responsefunction,μ(t)=E{Yi(t)}.FollowingHiranoandImbens(2004),weas-sumethat{Yi(t)}t∈T,Ti,andXi,i=1,...,N,arede?nedonacommonprobabilityspace;thatTiiscontinuouslydistributedwithrespecttotheLebesguemeasureonT;andthatYi=Yi(Ti)isawell-de?nedrandomvariable.Tosimplifythenotation,wewilldroptheisubscriptinthesequel.
Thepropensityfunctionisde?nedbyHiranoandImbens(2004)astheconditionaldensityoftheactualtreatmentgiventheobservedcovariates.
De?nition2.1(GPS)Letr(t,x)betheconditionaldensityofthetreatmentgiventhecovariates:
r(t,x)=fT|X(t|x)
ThentheGPSisR=r(T,X).
356EstimatingtheGPSandthedose–responsefunction
TheGPShasabalancingpropertysimilartothatofthestandardpropensityscore;thatis,withinstratawiththesamevalueofr(t,x),theprobabilitythatT=tdoesnotdependonthevalueofX:
X⊥I(T=t)|r(t,x)
whereI(·)istheindicatorfunction.HiranoandImbens(2004)showthat,incombina-tionwithasuitableunconfoundednessassumption,thisbalancingpropertyimpliesthatassignmenttotreatmentisunconfounded,giventheGPS.
Theorem2.1(WeakunconfoundednessgiventheGPS)Supposethatassignmenttothetreatmentisweaklyunconfounded,givenpretreatmentvariablesX:
Y(t)⊥T|X
Then,foreveryt,
fT{t|r(t,X),Y(t)}=fT{t|r(t,X)}
Usingthistheorem,HiranoandImbens(2004)showthattheGPScanbeusedtoeliminateanybiasesassociatedwithdi?erencesinthecovariates.
Theorem2.2(BiasremovalwithGPS)Supposethatassignmenttothetreatmentisweaklyunconfounded,givenpretreatmentvariablesX.Then
β(t,r)=E{Y(t)|r(t,X)=r}=E(Y|T=t,R=r)
and
μ(t)=E[β{t,r(t,X)}]
forallt∈T
3Estimationandinference
TheimplementationoftheGPSmethodconsistsofthreesteps.Inthe?rststep,weestimatethescorer(t,x).Inthesecondstep,weestimatetheconditionalexpectationoftheoutcomeasafunctionoftwoscalarvariables,thetreatmentlevelTandtheGPSR:β(t,r)=E(Y|T=t,R=r).Inthethirdstep,weestimatethedose–responsefunction,μ(t)=E[β{t,r(t,X)}],t∈T,byaveragingtheestimatedconditionalexpectation,??{t,r(t,X)},overtheGPSateachlevelofthetreatmentweareinterestedin.β
M.BiaandA.Mattei357
3.1
Modelingtheconditionaldistributionofthetreatmentgiventhecovariates
The?rststepistoestimatetheconditionaldistributionofthetreatmentgiventhecovariates.Weassumethatthetreatment(oritstransformation)hasanormaldistri-butionconditionalonthecovariates:
????
g(Ti)|Xi~Nh(γ,Xi),σ2
(1)
whereg(Ti)isasuitabletransformationofthetreatmentvariable[g(·)maybetheidentityfunction],andh(γ,Xi)isafunctionofcovariateswithlinearandhigher-orderterms,whichdependsonavectorofparameters,γ.Thechoiceofthehigher-ordertermstoincludeisonlydeterminedbytheneedtoobtainanestimateoftheGPSthatsatis?esthebalancingproperty.
Theprogramgpscore.adoestimatestheGPSandteststhebalancingpropertyac-cordingtothefollowingalgorithm:
1.Estimatetheparametersγandσ2oftheconditionaldistributionofthetreatmentgiventhecovariates(1)bymaximumlikelihood.12.Assessthevalidityoftheassumednormaldistributionmodelbyoneofthefollow-inguser-speci?edgoodness-of-?ttests:theKolmogorov–Smirnov,theShapiro–Francia,theShapiro–Wilk,ortheStataskewnessandkurtosistestfornormality.
a.Ifthenormaldistributionmodelisstatisticallydisapproved,informtheuserthattheassumptionofnormalityisnotsatis?ed.Theuserisinvitedtouseadi?erenttransformationofthetreatmentvariableg(Ti).3.EstimatetheGPSas
1??i=√Rexp?2{g(Ti)?h(γ??,Xi)}2σ??2πσ??2whereγ??andσ??2aretheestimatedparametersinstep1.
4.Testthebalancingpropertyandinformtheuserwhetherandtowhatextent
thebalancingpropertyissupportedbythedata.FollowingHiranoandImbens(2004),theprogramgpscore.adotestsforbalancingofcovariatesaccordingtothefollowingscheme:
a.Dividethesetofpotentialtreatmentvalues,T,intoKintervalsaccordingtoauser-speci?edrule,whichshouldbede?nedonthebasisofthesampledis-tributionofthetreatmentvariable.LetG1,...,GKdenotetheKtreatmentintervals.
1.Themodel(1)isspeci?edintheauxiliaryado-?legpscoremodel.ado.
1
????
358EstimatingtheGPSandthedose–responsefunction
b.WithineachtreatmentintervalGk,k=1,...,K,computetheGPSatauser-speci?edrepresentativepoint(e.g.,themean,themedian,oranotherpercentile)ofthetreatmentvariable,whichwedenotebytGk,foreachunit.Letr(tGk,Xi)bethevalueoftheGPScomputedattGk∈Gkforuniti.c.Foreachk,k=1,...,K,blockonthescoresr(tGk,Xi),usingmintervals,de?nedbythequantilesoforderj/m,j=1,...,m?1,oftheGPSevalu-(k)(k)
atedattGk,r(tGk,Xi),i=1,...,N.LetB1,...,BmdenotethemGPSintervalsforthekthtreatmentinterval,Gk.d.WithineachintervalBj,j=1,...,m,calculatethemeandi?erenceofeachcovariatebetweenunitsthatbelongtothetreatmentinterval,Gk,{i:Ti∈
(k)
Gk},andunitsthatareinthesameGPSinterval,{i:r(tGk,Xi)∈Bj},but
/Gk}.belongtoanothertreatmentinterval,{i:Ti∈e.Combinethemdi?erencesinmeans,calculatedinstepd,byusingaweighted
average,withweightsgivenbythenumberofobservationsineachGPSin-(k)
tervalBj,j=1,...,m.Speci?cally,thefollowingweightedaverageiscalculatedforeachofthepcovariatesXl,l=1,...,p:
m1??
NB(k){xl,j(Gk)?xl,j(Gck)}jNj=1(k)
whereNB(k)isthenumberofobservationsintheBj
j
(k)
GPS
interval;xl,j(Gk)
(k)
isthemeanofthecovariateXlforunitsi,suchthatr(tGk,Xi)∈Bjand
??
Ti∈Gk;andxl,j(Gck)isthemeanofthecovariateXlforunitsi,suchthat
(k)
/Gk.Theteststatisticsweusetoevaluatether(tGk,Xi??)∈BjandTi??∈
balancingpropertyarefunctionsofthisweightedaverage.
f.ForeachGk,k=1,...,K,teststatistics(theStudent’ststatisticsortheBayesfactors)arecalculatedandshownintheResultswindow.Finally,themostextremevalueoftheteststatistics(thehighestabsolutevalueoftheStudent’ststatisticsorthelowestvalueoftheBayesfactors)iscomparedwithreferencevalues,andtheuserisinformedoftheextenttowhichthebalancingpropertyissupportedbythedata.
3.2
EstimatingtheconditionalexpectationoftheoutcomegiventhetreatmentandGPS
Inthesecondstage,wemodeltheconditionalexpectationoftheoutcome,Yi,givenTiandRi,asa?exiblefunctionofitstwoarguments.Weusepolynomialapproximationsofordernothigherthanthree.Speci?cally,themostcomplexmodelweconsideris
?{E(Yi|Ti,Ri)}=ψ(Ti,Ri;α)
23+α6·Ri+α7·Ti·Ri=α0+α1·Ti+α2·Ti2+α3·Ti3+α4·Ri+α5·Ri
M.BiaandA.Mattei359
where?(·)isalinkfunctionthatrelatesthepredictor,ψ(Ti,Ri;α),totheconditionalexpectation,E(Yi|Ti,Ri).
Weassumethatthemaine?ectsofTiandRicannotberemovedsothatwehave18possiblesubmodels.Theprogramdoseresponsemodel.adode?nesallthesemodels
??i.When?ttingtheselectedandestimateseachofthembyusingtheestimatedGPS,R
model,theprogramtakesintoaccountthenatureoftheoutcomevariable—whichmaybebinary,categorical(nominalorordinal),orcontinuous—bychoosingtheappropriatelinkfunction.
AsHiranoandImbens(2004)emphasize,thereisnodirectmeaningtotheestimatedcoe?cientsintheselectedmodel,exceptthattestingwhetherallcoe?cientsinvolvingtheGPSareequaltozerocanbeinterpretedasatestofwhetherthecovariatesintroduceanybias.
3.3Estimatingthedose–responsefunction
Thelaststepconsistsofaveragingtheestimatedregressionfunctionoverthescorefunctionevaluatedatthedesiredlevelofthetreatment.Speci?cally,inordertoobtainanestimateoftheentiredose–responsefunction,weestimatetheaveragepotentialoutcomeforeachlevelofthetreatmentweareinterestedinas
NN??1????1???1??????E{Y(t)}=???}β{t,r??(t,Xi)}=ψ{t,r??(t,Xi);αNi=1Ni=1
whereα??isthevectoroftheestimatedparametersinthesecondstage.
Theprogramdoseresponse.adoestimatesthedose–responsefunctionaccordingto
thefollowingalgorithm:
1.EstimatetheGPS,verifythenormalmodelusedfortheGPS,andtestthebalancingpropertycallingtheroutinegpscore.ado.2.Estimatetheconditionalexpectationoftheoutcome,giventhetreatmentandtheGPS,bycallingtheroutinedoseresponsemodel.ado.3.Estimatetheaveragepotentialoutcomeforeachlevelofthetreatmenttheuserisinterestedin.4.Estimatestandarderrorsofthedose–responsefunctionviabootstrapping.25.Plottheestimateddose–responsefunctionand,ifrequested,itscon?denceinter-vals.
2.HiranoandImbens(2004)statethatasymptoticstandarderrorsoftheestimateddose–responsefunctioncouldbecalculatedbyusingexpansionsbasedontheestimatingequations;theseshouldtakeintoaccounttheestimationoftheGPSaswellastheαparameters.Forpracticalreasons,ourprogramusesbootstrapmethodstoobtainstandarderrorsandcon?denceintervalsofthedose–responsefunctionthattakeintoaccountestimationoftheGPSandtheαparameters.
360EstimatingtheGPSandthedose–responsefunction
Someremarksonstep4ofthealgorithmcanbeuseful.Whenbootstrappedstandarderrorsarerequested,byactivatingtheappropriateoption(seesections4and5),thebootstrapencompassesboththeestimationoftheGPSbasedonthespeci?cationgivenbytheuserandtheestimationoftheαparameters.ReestimatingtheGPSandtheαparametersateachreplicationofthebootstrapprocedureallowsustoaccountfortheuncertaintyassociatedwiththeestimationoftheGPSandtheαparameters.
Typically,userswould?rstidentifyatransformationofthetreatmentvariableandaspeci?cationofthefunctionhin(1),satisfyingthenormalityassumptionandthebalancingproperty,respectively(byusing,forinstance,theroutinegpscore.ado),andthenprovideexactlythistransformationandthisspeci?cationintheinputtothepro-gramdoseresponse.ado.
4Syntax
??if????
in????
??
weight,t(varname)gpscore(newvar)
predict(newvar)sigma(newvar)cutpoints(varname)index(string)
??
nqgps(#)ttransf(transformation)normaltest(test)normlevel(#)
??
testvarlist(varlist)test(type)flag(#)detailgpscorevarlist
????????????
inweight,outcome(varname)doseresponsemodeltreatvarGPSvarif
??
cmd(regressioncmd)regtypet(string)regtypegps(type)
??
interaction(#)doseresponsevarlist
??if????
in????
??
weight,outcome(varname)t(varname)
gpscore(newvar)predict(newvar)sigma(newvar)cutpoints(varname)index(string)nqgps(#)doseresponse(newvarlist)??
ttransf(transformation)normaltest(test)normlevel(#)testvarlist(varlist)test(type)flag(#)cmd(regressioncmd)regtypet(type)regtypegps(type)interaction(#)tpoints(vector)npoints(#)delta(#)filename(?lename)bootstrap(string)bootreps(#)
??
analysis(string)analysislevel(#)graph(?lename)detailInthegpscoreanddoseresponsecommands,theargumentvarlistrepresentsthelistofcontrolvariables,whichareusedtoestimatetheGPS.Inthedoseresponsemodelcommand,thevariablelistconsistsofonlytwovariables:thetreatmentvariable(treatvar)andtheGPS(GPSvar).
M.BiaandA.Mattei361
5Options
Wedescribeonlytheoptionsforthedoseresponsecommand,becausetheyincludealltheoptionsforthegpscorecommandandthedoseresponsemodelcommand.There-fore,alltheoptionsdescribedinsections5.1and5.2applytodoseresponse,andwespecify,ifapplicable,whethertheoptionalsoappliestogpscoreordoseresponsemodel.
5.1Required
outcome(varname)(doseresponsemodel)speci?esthatvarnameistheoutcomevari-able.t(varname)(gpscore)speci?esthatvarnameisthetreatmentvariable.gpscore(newvar)(gpscore)speci?esthevariablenamefortheestimatedGPS.predict(newvar)(gpscore)createsanewvariabletoholdthe?ttedvaluesofthetreatmentvariable.sigma(newvar)(gpscore)createsanewvariabletoholdthemaximumlikelihoodesti-mateoftheconditionalstandarderrorofthetreatmentgiventhecovariates.cutpoints(varname)(gpscore)dividesthesetofpotentialtreatmentvalues,T,intointervalsaccordingtothesampledistributionofthetreatmentvariable,cuttingatvarnamequantiles.index(string)(gpscore)speci?estherepresentativepointofthetreatmentvariableatwhichtheGPShastobeevaluatedwithineachtreatmentinterval.stringidenti-?eseitherthemean(string=mean)orapercentile(string=p1,...,p100)ofthetreatment.nqgps(#)(gpscore)speci?esthatthevaluesoftheGPSevaluatedattherepresen-tativepointindex(string)ofeachtreatmentintervalhavetobedividedinto#(#∈{1,...,100})intervals,de?nedbythequantilesoftheGPSevaluatedattherepresentativepointindex(string).doseresponse(newvarlist)speci?esthevariablename(s)fortheestimateddose–responsefunction(s).
(Continuedonnextpage)
362EstimatingtheGPSandthedose–responsefunction
5.2Optional
ttransf(transformation)(gpscore)speci?esthetransformationofthetreatmentvari-ableusedinestimatingtheGPS.Thedefaulttransformationistheidentityfunction.Thesupportedtransformationsarethelogarithmictransformation,ttransf(ln);thezero-skewnesslogtransformation,ttransf(lnskew0);thezero-skewnessBox–Coxtransformation,ttransf(bcskew0);andtheBox–Coxtransformation,ttransf(boxcox).TheBox–Coxtransformation?ndsthemaximumlikelihoodestimatesoftheparametersoftheBox–Coxtransformregressingthetreatmentvariablet(varname)onthecontrolvariableslistedintheinputvariablelist.3normaltest(test)(gpscore)speci?esthegoodness-of-?ttestthatgpscorewillper-formtoassessthevalidityoftheassumednormaldistributionmodelforthetreat-mentconditionalonthecovariates.Bydefault,gpscoreperformstheKolmogorov–Smirnovtest(normaltest(ksmirnov)).PossiblealternativesaretheShapiro–Franciatest,normaltest(sfrancia);theShapiro–Wilktest,normaltest(swilk);andtheStataskewnessandkurtosistestfornormality,normaltest(sktest).normlevel(#)(gpscore)setsthesigni?cancelevelofthegoodness-of-?ttestfornor-mality.Thedefaultisnormlevel(0.05).testvarlist(varlist)(gpscore)speci?esthattheextentofcovariatebalancinghastobeinspectedforeachvariableofvarlist.ThedefaultvarlistconsistsofthevariablesusedtoestimatetheGPS.Thisoptionisusefulwhentherearecategoricalvariablesamongthecovariates.gpscore,whichisaregression-likecommand,requiresthatcategoricalvariablesareexpandedintoindicator(alsocalleddummy)variablesetsandthatonedummy-variablesetisdroppedinestimatingtheGPS.However,thebalancingtestshouldalsobeperformedontheomittedgroup.Thiscanbedonebyusingthetestvarlist(varlist)optionandbylistinginvarlistallthevariables,includingthecompletesetofindicatorvariablesforeachcategoricalcovariate.
3.Theproblemiswhetherthetreatmentvariabletakeszerovalue.Insuchacase,theprogramcontinues,forcingatransformationofthetreatmentvariabletotakeasuitablevalue.Speci?cally,weassumethatln(0)=0,ttransf(0)=?1/λifλ>0,andttransf(0)=ln(0)=0ifλ=0,forttransf=bcskew0orboxcox.Allowingforzerovaluesofthetreatmentimpliesthatuntreatedunitsmightbeincludedinthestudy.BecausetheGPSmethodsaredesignedforanalyzingthee?ectofatreatmentintensity,theyspeci?callyrefertothesubpopulationoftreatedunits.Thisimpliesthatincludinguntreatedunitsmightleadtomisleadingresults.
M.BiaandA.Mattei363
test(type)(gpscore)speci?eswhetherthebalancingpropertyhastobetestedusingeitherastandardtwo-sidedttest(thedefault)oraBayes-factor–basedmethod(test(Bayesfactor)).Theprograminformstheuserifthereissomeevidencethatthebalancingpropertyissatis?ed.Recallthatthetestisperformedforeachsinglevariableintestvarlist(varlist)andforeachtreatmentinterval.Speci?cally,letpbethenumberofcontrolvariablesintestvarlist(varlist),andletKbethenumberofthetreatmentintervals.We?rstcalculatep×Kvaluesoftheteststatistic;thenweselecttheworstvalue(thehighesttvalueinmodulus,orthelowestBayesfactor)andcompareitwithstandardvalues.Table1showsthe“orderofmagnitude”interpretationsoftheteststatisticsweconsider.
Table1.“Orderofmagnitude”interpretationsoftheteststatistics
tvalue|t|<1.282
1.282<|t|<1.645
Bayesfactor(BF)?Evidenceforthebalancingproperty(BP)√
BF>1.00
EvidencesupportstheBP
0.10 √ 1.645<|t|<1.9600.10 |t|>2.576 ? 0.01 BF<0.01 DecisiveevidenceagainsttheBP TheorderofmagnitudeinterpretationsoftheBayesfactorweappliedwereproposedbyJe?reys(1961). flag(#)(gpscore)speci?esthatgpscoreestimatestheGPSwithoutperformingeitheragoodness-of-?ttestfornormalityorabalancingtest.Thedefault#is1,meaningthatboththenormaldistributionmodelandthebalancingpropertyaretested;thedefaultlevelisrecommended.Weintroducedthisoptionforpracticalreasons.Recallthatdoseresponseestimatesthestandarderrorsofthedose–responsefunctionbyusingbootstrapmethods.Ineachbootstrapiteration,wewanttoreestimatetheGPSwithouttestingeitherthenormalityassumptionorthebalancingproperty.cmd(regressioncmd)(doseresponsemodel)de?nestheregressioncommandtobeusedforestimatingtheconditionalexpectationoftheoutcomegiventhetreatmentandtheGPS.Thedefaultfortheoutcomevariableiscmd(logit)whentherearetwodis-tinctvalues,cmd(mlogit)whenthereare3–5values,andcmd(regress)otherwise.Thesupportedregressioncommandsarelogit,probit,mlogit,mprobit,ologit,oprobit,andregress. 364EstimatingtheGPSandthedose–responsefunction regtypet(type)(doseresponsemodel)de?nesthemaximumpowerofthetreatmentvariableinthepolynomialfunctionusedtoapproximatethepredictorforthecon-ditionalexpectationoftheoutcomegiventhetreatmentandtheGPS.Thedefault ??;α),isalinearfunctionofthetypeislinear,meaningthatthepredictor,ψ(T,R treatment.Alternatively,typecanbequadraticorcubic.regtypegps(type)(doseresponsemodel)de?nesthemaximumpoweroftheesti-matedGPSinthepolynomialfunctionusedtoapproximatethepredictorfortheconditionalexpectationoftheoutcomegiventhetreatmentandtheGPS.Thede-??;α),isalinearfunctionoffaulttypeislinear,meaningthatthepredictor,ψ(T,R theestimatedGPS.Alternatively,typecanbequadraticorcubic.interaction(#)(doseresponsemodel)speci?eswhetherthemodelforthecondi-tionalexpectationoftheoutcomegiventhetreatmentandtheGPShastheinterac-tionbetweentreatmentandGPS.Thedefault#is1,meaningthattheinteractionisincluded.tpoints(vector)speci?esthatdoseresponseestimatestheaveragepotentialoutcomeforeachlevelofthetreatmentinvector.Bydefault,doseresponsecreatesavectorwiththeithelementequaltotheithobservedtreatmentvalue.Thisoptioncannotbeusedwiththenpoints(#)option(seebelow).npoints(#)speci?esthatdoseresponseestimatestheaveragepotentialoutcomeforeachlevelofthetreatmentbelongingtoasetofevenlyspacedvalues,t0,t1,...,t#,thatcovertherangeoftheobservedtreatment.Thisoptioncannotbeusedwiththetpoints(vector)option(seeabove).delta(#)speci?esthatdoseresponsealsoestimatesthetreatment-e?ectfunctioncon-sideringa#-treatmentgap,whichisde?nedasμ(t+#)?μ(t).Thedefault#is0,meaningthatdoseresponseestimatesonlythedose–responsefunction,μ(t).filename(?lename)speci?esthatthetreatmentlevelsspeci?edthroughthe tpoints(vector)optionorthenpoints(#)option,theestimateddose–responsefunction,and,eventually,theestimatedtreatment-e?ectfunction,alongwiththeirstandarderrors(ifcalculated),bestoredtoanew?lecalled?lename.bootstrap(string)speci?estheuseofbootstrapmethodstoderivestandarderrorsandcon?denceintervals.Bydefault,doseresponsedoesnotapplybootstraptechniques.Insuchacase,nostandarderroriscalculated.Toactivatethisoption,stringshouldbesettoyes.bootreps(#)speci?esthenumberofbootstrapreplicationstobeperformed.Thedefaultisbootreps(50).Thisoptionproducesane?ectonlyifthebootstrap()optionissettoyes. M.BiaandA.Mattei365 analysis(string)speci?esthatdoseresponseplotstheestimateddose–responsefunc-tion(s)and,eventually,theestimatedtreatment-e?ectfunction(s),alongwiththecorrespondingcon?denceintervalsiftheyarecalculatedwithbootstrapping.Bydefault,doseresponseplotsonlytheestimateddose–responseandtreatmentfunc-tion(s).Inordertoplotcon?denceintervals,stringhastobesettoyes.Iftheusertypesanalysis(no),noplotisshown.analysislevel(#)setsthecon?dencelevelofthecon?denceintervals.Thedefaultisanalysislevel(0.95).graph(?lename)storestheplotsoftheestimateddose–responsefunctionandtheesti-matedtreatmente?ectstoanew?lecalled?lename.Whentheoutcomevariableiscategorical,doseresponsecreatesanew?leforeachcategoryioftheoutcomevariableandnamesit?lenamei.detail(gpscore)displaysmoredetailedoutput.Speci?cally,thisoptionspeci?esthatgpscoreshowstheresultsofthegoodness-of-?ttestfornormality,somesummarystatisticsofthedistributionoftheGPSevaluatedattherepresentativepointofeachtreatmentinterval,andtheresultsofthebalancingtestwithineachtreatmentinterval.Whenthisoptionisspeci?edfordoseresponse,theresultsoftheregressionoftheoutcomeonthetreatmentandtheGPSarealsoshown. 6 Example:TheImbens–Rubin–Sacerdotelotterysam-ple WeusedatafromthesurveyofMassachusettslotterywinners;thedataaredescribedindetailinImbens,Rubin,andSacerdote(2001).Weareinterestedinestimatingthee?ectoftheprizeamountonsubsequentlaborearnings(fromU.S.SocialSecurityrecords).Althoughthelotteryprizeisobviouslyrandomlyassigned,substantialunitanditemnonresponseledtoaselectedsample,wheretheamountoftheprizeispotentiallycorrelatedwithbackgroundcharacteristicsandpotentialoutcomes.Toremovesuchbiases,wemaketheweakunconfoundednessassumptionspecifyingthat,conditionalonthecovariates,thelotteryprizeisindependentofthepotentialoutcomes.4 Thesampleweuseinthisanalysisisthe“winners”sampleof237individualswhowonamajorprizeinthelottery.Theoutcomeofinterestisyear6(earningssixyearsafterwinningthelottery),andthetreatmentisprize,theprizeamount.Controlvariablesareage,gender,yearsofhighschool,yearsofcollege,winningyear,numberofticketsbought,workstatusafterwinning,andearningssyearsbeforewinningthelottery(withs=1,2,...,6). WetriedtoreplicatetheresultsproducedbyHiranoandImbens(2004)buthavenotbeenabletonumericallyreplicatealltheirestimatesbecauseofrestrictionsofour 4.Inthiscontext,thenonignorabilityoftheassignmentmechanismisduetothepresenceofnon-response.Therefore,sayingthattheunconfoundednessassumptionallowsustoremoveallbiasesassociatedwithdi?erencesintheobservedcovariatesmeansthatweareimplicitlyassumingthattheoutcomevariableismissingatrandom(Rubin1976). 366EstimatingtheGPSandthedose–responsefunction programs.Speci?cally,ourprogramsdonotallowustoconsiderafunctionofthetreat-mentvariableorafunctionoftheGPSintheestimationoftheconditionalexpectationoftheoutcome,giventhetreatmentandtheGPS.However,wegetqualitativelysimilarresults. 6.1Outputfromgpscore We?rstchoosethequantilesofthetreatmentvariabletodividethesampleintogroups.FollowingHiranoandImbens(2004),wedividetherangeofprizesintothreetreatmentintervals,[0–23],(23–80],and(80–485].Thenwerungpscoreusingthespeci?cationappliedbyHiranoandImbens(2004).Theoutputlookslikethefollowing: .uselotterydataset.dta .quigeneratecut=23ifprize<=23 .quireplacecut=80ifprize>23&prize<=80.quireplacecut=485ifprize>80 .gpscoreagewmaleownhsowncolltixbotworkthenyearwyearm1yearm2yearm3>yearm4yearm5yearm6,t(prize)gpscore(pscore)predict(hat_treat)sigma(sd)>cutpoints(cut)index(p50)nq_gps(5)t_transf(ln)detailGeneralizedPropensityScore ******************************************************Algorithmtoestimatethegeneralizedpropensityscore****************************************************** Estimationofthepropensityscore Thelogtransformationofthetreatmentvariableprizeisused T 1%5%Pu???% Percentiles1.6094382.2838512.4200122.835211 3.457834.1430084.8754265.1288925.720607 logloglogloglog Largest5.5987925.7206075.7786436.183716likelihoodlikelihoodlikelihoodlikelihoodlikelihood ===== Smallest.1301507.13015071.6094381.67818 Obs SumofWgt.Mean Std.Dev.VarianceSkewnessKurtosis - 2372373.558185.9553768.9127448-.01658893.452439 initial:feasible:rescale:rescaleeq:Iteration0: (couldnotbeevaluated) (outputomitted)Iteration4:loglikelihood=-307.68186 M.BiaandA.Mattei NumberofobsWaldchi2(13)Prob>chi2 Std.Err..0048563.1351124.060835.0397666.0182546.1645602.0464566.010379.0162758.0166256.0158217.0153635.0110455.4693959.040709 z3.133.240.320.940.240.77-0.030.60-0.760.721.53-1.41-0.454.9321.77 P>|z|0.0020.0010.7520.3490.8120.4400.9750.5500.4490.4720.1260.1590.6510.0000.000 === 23737.220.0004 367 Loglikelihood=-307.68186 T eq1 agewmaleownhsowncolltixbotworkthen yearwyearm1yearm2yearm3yearm4yearm5yearm6_conseq2 _cons .886297.0151905.4379826.0192025.0372805.0043423.1270879-.0014367.0062064-.0123161.0119446.0242245-.0216437-.00500212.315546 Coef. [95%Conf.Interval].0056724.1731672-.1000319-.0406607-.031436-.1954442-.09249-.014136-.044216-.0206411-.0067855-.0517555-.02665091.395547.806509 .0247086.702798.1384368.1152217.0401206.44962.0896166.0265488.0195839.0445302.0552344.0084682.01664673.235545.9660851 Testfornormalityofthedisturbances Kolmogorov-Smirnovequality-of-distributionstestNormalDistributionofthedisturbances One-sampleKolmogorov-Smirnovtestagainsttheoreticaldistribution normal((res_etreat-r(mean))/sqrt(r(Var))) SmallergroupDP-valueCorrectedres_etreat:0.0517Cumulative:-0.0420CombinedK-S:0.0517TheassumptionofNormalityis 0.2810.4340.5500.517 statisticallysatisfiedat.05level Estimatedgeneralizedpropensityscore 1%5%P% Percentiles.0131817.0869414.1272663.2255553.3536221 Smallest.0003053.0011738.0131817.0163113 Obs SumofWgt.Mean Std.Dev. 237237.3196603.1222106.0149354-.77235012.510499 Largest 75%.4343045.450000390%.4481351.4500911Variance95%.4497166.450096Skewness99%.4500911.4501086Kurtosis********************************************Endofthealgorithmtoestimatethegpscore******************************************** 368EstimatingtheGPSandthedose–responsefunction ******************************************************************************Thesetofthepotentialtreatmentvaluesisdividedinto3intervalsThevaluesofthegpscoreevaluatedattherepresentativepointofeachtreatmentintervalaredividedinto5intervals *****************************************************************************************************************************************SummarystatisticsofthedistributionoftheGPSevaluatedattherepresentativepointofeachtreatmentinterval *********************************************************** VariableObsMeanStd.Dev.MinMax gps_1Variablegps_2Variable gps_3 237Obs237Obs237 .262852 Mean.4178101 Mean.1814998 .0956436Std.Dev..0373217Std.Dev..088236 .0583948 Min.2433839 Min.0181741 .4486237 Max.4501224 Max.4141454 ******************************************************************************Testthattheconditionalmeanofthepre-treatmentvariablesgiventhe generalizedpropensityscoreisnotdifferentbetweenunitswhobelongtoaparticulartreatmentintervalandunitswhobelongtoallothertreatmentintervals ******************************************************************************TreatmentIntervalNo1-[1.139000058174133,22.98200035095215] MeanStandardDifferenceDeviationt-value agew maleownhsowncolltixbotworkthenyearwyearm1yearm2yearm3yearm4yearm5yearm6 -.25322.04799.15044.20765.33298.00154.00156.33117.908721.2445.74998.962991.4414 1.814.04246.156.23738.48448.05608.191351.90521.77191.67561.56251.71751.7774 -.139591.1304.96433.87476.68729.0275.00813.17382.51284.74274.47999.5607.81098 M.BiaandA.Mattei TreatmentIntervalNo2-[23.08799934387207,79.11299896240234] MeanStandardDifferenceDeviationt-value agew maleownhsowncolltixbotworkthenyearwyearm1yearm2yearm3yearm4yearm5yearm6 -.13308-.03419-.2294-.20996-.26933.03013-.32817.51467.23703.41572.46856-.00903-.33587Mean Difference-1.7504-.04742.34062.23199-.03159-.07006.3672-.63678-.83409-1.2074-1.351-1.6137-2.2111 1.8294.0657.13927.21228.43812.05266.170081.77411.70381.66561.5711.62421.6445StandardDeviation2.3202.06211.1914.28116.56716.07448.226131.94281.83561.73221.59821.87921.8615 -.07275-.52041-1.6471-.98908-.61474.57227-1.9295.2901.13912.24959.29826-.00556-.20423 369 TreatmentIntervalNo3-[82.98699951171875,484.7900085449219] t-value-.75444-.763421.7796.82512-.0557-.940691.6238-.32777-.45441-.69707-.84534-.8587-1.1878 agewmaleownhsowncolltixbotworkthen yearwyearm1yearm2yearm3yearm4yearm5yearm6 Accordingtoastandardtwo-sidedt-test: ModerateevidenceagainstthebalancingpropertyThebalancingpropertyissatisfiedatlevel0.05 Thisoutputisthemostdetailedwecanhavebecausewespeci?edthedetailoption.Whenthisoptionisnotspeci?ed,someinformationisomittedfromtheoutput.Specif-ically,theresultsofthegoodness-of-?ttestfornormality,thesummarystatisticsofthedistributionoftheGPSevaluatedattherepresentativepointofeachtreatmentinterval,andtheresultsofthebalancingtestwithineachtreatmentintervalarenotshown.Insuchacase,theprogramprovidesonlyshortsentencesinformingtheuserwhetherthenormaldistributionmodelandthebalancingpropertyarestatisticallysatis?ed. 370EstimatingtheGPSandthedose–responsefunction 6.2Outputfromdoseresponse Beforerunningdoseresponse,wehavetodecideaboutthetreatmentlevels,whichestimatetheaveragepotentialoutcome.FollowingHiranoandImbens(2004),wefocusonthevalues10,20,...,100,whichwestoretoa10-dimensionalvectornamedtp(seebelow).Theoutputfromrunningdoseresponseisasfollows: .uselotterydataset.dta,clear .quigeneratecut=23ifprize<=23 .quireplacecut=80ifprize>23&prize<=80.quireplacecut=485ifprize>80 .matrixdefinetp=(10\\20\\30\\40\\50\\60\\70\\80\\90\\100) .doseresponseagewownhsmaletixbotowncollworkthenyearwyearm1yearm2>yearm3yearm4yearm5yearm6,outcome(year6)t(prize)gpscore(pscore)>predict(hat_treat)sigma(sd)cutpoints(cut)index(p50)nq_gps(5)>t_transf(ln)dose_response(dose_response)tpoints(tp)delta(1) >reg_type_t(quadratic)reg_type_gps(quadratic)interaction(1)bootstrap(yes)>boot_reps(100)filename(\analysis(yes)graph(\detail********************************************ESTIMATEOFTHEGENERALIZEDPROPENSITYSCORE********************************************(outputomitted)Theoutcomevariable``year6′′isacontinuousvariable Theregressionmodelis:Y=T+T^2+GPS+GPS^2+T*GPS SourceSSdfMSNumberofobs F(5,196) Model2945.927385589.185477Prob>FResidual38378.9633196195.811037R-squared AdjR-squared Total41324.8907201205.596471RootMSE year6 prizeprize_sqpscorepscore_sqprize_pscore _cons Coef.-.2254371.0003537-103.3373131.949.549993331.26845 Std.Err..0748156.000166948.3707679.40569.21976616.955419 t-3.012.12-2.141.662.504.50 P>|t|0.0030.0350.0340.0980.0130.000 ======2023.010.01220.07130.047613.993 [95%Conf.Interval]-.3729839.0000245-198.7312-24.65021.116583517.55138 -.0778902.0006828-7.943281288.5482.983403144.98552 Bootstrappingofthestandarderrors ...............................................................................>..................... TheprogramisdrawinggraphsoftheoutputThisoperationmaytakeawhile(filegraph_output.gphsaved)EndoftheAlgorithm M.BiaandA.Mattei371Theestimatedcoe?cientsoftheregressionoftheoutcome,earningssixyearsafterwinningthelottery,theprize,andthescoreareshownbecausewehaverequiredadetailedoutput.Otherwise,doseresponseprovidesonlyagraphicoutput,suchasthatshownin?gure1.Figure1showsboththeestimateddose–responsefunctionandtheestimatedtreatment-e?ectfunction,whichcanbeinterpretedasaderivate,becausewehavespeci?edatreatmentgapequalto1(delta(1)).OnlyinformationconcerningtheGPSestimationisprovidedwhendetailisnotspeci?edandtheanalysis()optionissettono.Dose Response Function200E[year6(t+1)]?E[year6(t)]020406080Treatment level100?2000?1000100E[year6(t)]500010000150002000025000Treatment Effect Function204060Treatment level80100Dose ResponseUpper boundLow boundTreatment EffectUpper boundLow boundConfidence Bounds at .95 % levelDose response function = Linear predictionConfidence Bounds at .95 % levelDose response function = Linear predictionFigure1.Estimateddose–responsefunction,estimatedderivative,and95%con?dencebands(Continuedonnextpage)372EstimatingtheGPSandthedose–responsefunction TheresultsgeneratedbydoseresponsearestoredinanewStata?le,whichwehavenamedoutput.This?lehas10observationsand6variables:treatmentlevel,containingthetreatmentlevels,atwhichweestimatetheaveragepotentialoutcome;treatmentlevelplus,containingthe#-shiftedtreatmentlevels,where#isequalto1;doseresponse,theestimateddose–responsefunction;sedoseresponsebs,thestandarderrorsoftheestimateddose–responsefunction;diffdoseresponse,thees-timatedtreatment-e?ectfunction;andsediffdoseresponsebs,thestandarderrorsoftheestimatedtreatment-e?ectfunction.Thegraphicoutputisalsostoredtoanew?le,whichwehavenamedgraphoutput. 7Acknowledgments WethankFabriziaMealli,GuidoImbens,andKeisukeHiranofortheirinsightfulsug-gestionsanddiscussions,andGuidoImbensandKeisukeHiranoforprovidingthedata. 8References Becker,S.O.,andA.Ichino.2002.Estimationofaveragetreatmente?ectsbasedonpropensityscores.StataJournal2:358–377.Bia,M.,andA.Mattei.2007.Applicationofthegeneralizedpropensityscore.Eval-uationofpubliccontributionstoPiedmontenterprises.POLISWorkingPaper80,UniversityofEasternPiedmont.Hirano,K.,andG.W.Imbens.2004.Thepropensityscorewithcontinuoustreat-ments.InAppliedBayesianModelingandCausalInferencefromIncomplete-DataPerspectives,ed.A.GelmanandX.-L.Meng,73–84.WestSussex,England:WileyInterScience.Holland,P.W.1986.Statisticsandcausalinference.JournaloftheAmericanStatisticalAssociation8:945–960.Imbens,G.W.,D.B.Rubin,andB.I.Sacerdote.2001.Estimatingthee?ectofunearnedincomeonlaborearnings,savings,andconsumption:Evidencefromasurveyoflotteryplayers.AmericanEconomicReview91:778–794.Je?reys,H.1961.TheoryofProbability.3rded.Oxford:OxfordUniversityPress.Leuven,E.,andB.Sianesi.2003.psmatch2:StatamoduletoperformfullMahalanobisandpropensityscorematching,commonsupportgraphing,andcovariateimbalancetesting.BostonCollegeDepartmentofEconomics,StatisticalSoftwareComponents.Downloadablefromhttp://ideas.repec.org/c/boc/bocode/s432001.html.Rosenbaum,P.R.,andD.B.Rubin.1983.Thecentralroleofthepropensityscoreinobservationalstudiesforcausale?ects.Biometrika70:41–55.Rubin,D.B.1976.Inferenceandmissingdata.Biometrika63:581–592. M.BiaandA.Mattei Abouttheauthors 373 MichelaBiaisaresearchassistantatLaboratorioRevelli,CentreforEmploymentStudies,CollegioCarloAlberto,Turin,Italy. AlessandraMatteiisastatisticsresearchassistantintheDepartmentofStatistics,“GiuseppeParenti”,UniversityofFlorence,Italy.OnFebruary25,2008,shewonacompetitiveexamtoworkasanassistantprofessorofstatisticsinthefacultyofpsychologyattheUniversityofFlorence.