--------------------------------------------------------------------------------------------------
      log:  H:\Antoine\TEACH-EC475\stata-log\ps2l.log
 log type:  text
opened on:  12 Nov 2009, 21:29:16

. local myfolder "H:\Antoine\TEACH-EC475\data"
.
. clear
. cd "`myfolder'"
H:\Antoine\TEACH-EC475\data
. set more off
. set mem 700m

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory         700M     max. data space                700.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                               703.163M

. set matsize 700

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory         700M     max. data space                700.000M
    set matsize         700     max. RHS vars in models          3.797M
                                                            -----------
                                                               705.706M

. use lqdata.dta
. desc

Contains data from lqdata.dta
  obs:        10,000
 vars:             7                          17 Oct 2009 23:03
 size:       320,000 (99.9% of memory free)
------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------------------------
uniqid          float  %9.0g
yearcur         float  %9.0g
choice12        float  %9.0g
age             float  %9.0g
race            float  %9.0g
dispy           float  %9.0g
constant        float  %9.0g
------------------------------------------------------------------------------------------------
Sorted by:

.
. /**************/
. /* question 1 */
. /**************/
.
. gen lag_choice12=choice12[_n-1]
(1 missing value generated)
. gen lag_dispy=dispy[_n-1]
(1 missing value generated)
. gen lag_race=race[_n-1]
(1 missing value generated)
.
. /* each individual is followed over 10 years */
. /* look at the boundary btw 1st and 2nd individual */
.
.
. list uniqid yearcur race lag_race in 10/12

     +------------------------------------+
     | uniqid   yearcur   race   lag_race |
     |------------------------------------|
 10. |  10000      1979      1          1 |
 11. |  20000      1970      0          1 |
 12. |  20000      1971      0          0 |
     +------------------------------------+

.
. /* the 2nd individual cannot have two different races */
.
. /**************/
. /* question 2 */
. /**************/
.
. /* solution a (does not use stata panel command) */
. /* process the usual lags but by individual */
. /* this creates one missing value per individual */
.
. cap drop lag_choice12 lag_dispy lag_race /* destroy the variables, printing nothing */
. /* capture executes command, suppressing all its output (including error messages, if any) */
.
.
. so uniqid yearcur
. by uniqid: gen lag_choice12=choice12[_n-1]
(1000 missing values generated)
.
. /* this is valid if the dataset is well-balanced and if there is one observation every year */
. /* we may want to check that the previous observation is indeed in year t-1 */
.
. cap drop lag_choice12
. so uniqid yearcur
. by uniqid: gen lag_choice12=choice12[_n-1] if yearcur==yearcur[_n-1]+1
(1000 missing values generated)
.
. /* As we have to do that over several variables we can define a simple program */
. cap program drop mypanellag /* drop the previous program called mypanellag */
. program define mypanellag
  1.   /* 3 arguments: panelid timeid lagged var */
.   cap drop lag_`3' /* delete the previous lag variable if it exists */
  2.   by `1': gen lag_`3'=`3'[_n-1] if `2'==`2'[_n-1]+1
  3. end
.
. mypanellag uniqid yearcur choice12 /* rk: the program requires the data to be sorted */
(1000 missing values generated)
. mypanellag uniqid yearcur dispy
(1000 missing values generated)
. mypanellag uniqid yearcur race
(1000 missing values generated)
.
. /* solution b use the panel data commands in stata */
.
. /* step 1: define the panel data set */
.
.
. xtset uniqid yearcur /* rk: both dimensions have to be numeric variables */
       panel variable:  uniqid (strongly balanced)
        time variable:  yearcur, 1970 to 1979
                delta:  1 unit
. xtdes /* describe the panel structure */

  uniqid:  10000, 20000, ..., 10000000                       n =       1000
 yearcur:  1970, 1971, ..., 1979                             T =         10
           Delta(yearcur) = 1 unit
           Span(yearcur)  = 10 periods
           (uniqid*yearcur uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        10      10      10        10        10      10      10

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+------------
     1000    100.00  100.00 |  1111111111
 ---------------------------+------------
     1000    100.00         |  XXXXXXXXXX

.
. /* running a loop over variables */
. local mylist dispy race choice12
. for varlist `mylist',noheader: cap drop lag_X \ display "X and lag_X" /* noheader suppresses the
> variable header */
dispy and lag_dispy
race and lag_race
choice12 and lag_choice12
.
. mypanellag uniqid yearcur choice12
(1000 missing values generated)
. mypanellag uniqid yearcur dispy
(1000 missing values generated)
. mypanellag uniqid yearcur race
(1000 missing values generated)
.
. /* using a list of variables */
. local mylist lag_dispy lag_race lag_choice12
. drop `mylist'
.
. sort uniqid yearcur
. gen lag_choice12=l.choice12 /* the l. operator "understands" the panel structure */
(1000 missing values generated)
. gen lag_dispy=l.dispy
(1000 missing values generated)
. gen lag_race=l.race
(1000 missing values generated)
.
. save lqdata_lag.dta, replace /* saving the lag var in a new dataset */
file lqdata_lag.dta saved
.
.
. /**************/
. /* question 3 */
. /**************/
.
. clear
. use gpnl.dta
.
. desc

Contains data from gpnl.dta
  obs:           200
 vars:            10
 size:         7,600 (99.9% of memory free)
------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------------------------
iid             byte   %8.0g
tid             byte   %8.0g
y               float  %9.0g
x1              float  %9.0g
x2              float  %9.0g
z1              float  %9.0g
z2              float  %9.0g
yb              float  %9.0g
x1b             float  %9.0g
x2b             float  %9.0g
------------------------------------------------------------------------------------------------
Sorted by:

.
. sort iid tid
.
.
. local mylist y x1 z1 x1b
. for varlist `mylist',noheader: cap drop lagg_X \ gen lagg_X=X[_n-1]
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
.
. xtset iid tid /* rk: both dimensions have to be numeric variables */
       panel variable:  iid (strongly balanced)
        time variable:  tid, 1 to 4
                delta:  1 unit
. xtdes /* describe the panel structure */

     iid:  1, 2, ..., 50                                     n =         50
     tid:  1, 2, ..., 4                                      T =          4
           Delta(tid) = 1 unit
           Span(tid)  = 4 periods
           (iid*tid uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         4       4       4         4         4       4       4

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
       50    100.00  100.00 |  1111
 ---------------------------+---------
       50    100.00         |  XXXX

.
. /* running a loop over variables */
. for varlist `mylist', noheader: cap drop lag_X \ gen lag_X=l.X \ display "X and lag_X"
(50 missing values generated)
y and lag_y
(50 missing values generated)
x1 and lag_x1
(50 missing values generated)
z1 and lag_z1
(50 missing values generated)
x1b and lag_x1b
.
. save gpnl_lag.dta, replace
file gpnl_lag.dta saved
.
.
. /*****************************************/
. /* Part II: Estimation and specification */
. /*****************************************/
.
. /**************/
. /* question 1 */
. /**************/
.
. clear
. use gpnl_lag.dta
.
. xtset iid tid
       panel variable:  iid (strongly balanced)
        time variable:  tid, 1 to 4
                delta:  1 unit
. xtdes

     iid:  1, 2, ..., 50                                     n =         50
     tid:  1, 2, ..., 4                                      T =          4
           Delta(tid) = 1 unit
           Span(tid)  = 4 periods
           (iid*tid uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         4       4       4         4         4       4       4

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
       50    100.00  100.00 |  1111
 ---------------------------+---------
       50    100.00         |  XXXX

.
. su y yb x1 x1b x2 x2b z1 z2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       200    1.069324    .8219009    -1.3549     2.9008
          yb |       200    1.069322    .6862345     -.6536     2.6231
          x1 |       200    .0561463     .994093     -2.729     2.7635
         x1b |       200    .0561466    .7020076    -1.3706     1.9552
          x2 |       200    .0841514    1.954771    -6.4997     6.5373
-------------+--------------------------------------------------------
         x2b |       200    .0841542    1.681308     -5.412     4.1098
          z1 |       200   -.0831348    .7726174    -2.0182     1.2646
          z2 |       200   -.2542387    1.509842    -3.1094      3.663

.
. corr y yb x1 x1b x2 x2b z1 z2
(obs=200)

             |        y       yb       x1      x1b       x2      x2b       z1       z2
-------------+------------------------------------------------------------------------
           y |   1.0000
          yb |   0.8349   1.0000
          x1 |   0.1219  -0.1026   1.0000
         x1b |  -0.1214  -0.1453   0.7062   1.0000
          x2 |  -0.3721  -0.3169   0.6433   0.7006   1.0000
         x2b |  -0.3076  -0.3684   0.5753   0.8146   0.8601   1.0000
          z1 |   0.3350   0.4012   0.1224   0.1733   0.2077   0.2415   1.0000
          z2 |   0.2408   0.2884   0.2030   0.2874   0.2551   0.2966   0.8087   1.0000

.
. /* a) simple pooled OLS */
. reg y x1 x2 z1 z2

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   195) =   58.63
       Model |  73.3977763     4  18.3494441           Prob > F      =  0.0000
    Residual |  61.0309234   195  .312979094           R-squared     =  0.5460
-------------+------------------------------           Adj R-squared =  0.5367
       Total |    134.4287   199  .675521104           Root MSE      =  .55945

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5205576   .0524162     9.93   0.000     .4171821    .6239331
          x2 |  -.3627779   .0269238   -13.47   0.000    -.4158771   -.3096788
          z1 |   .5156956   .0876549     5.88   0.000     .3428222     .688569
          z2 |  -.0320701   .0454355    -0.71   0.481    -.1216781    .0575379
       _cons |   1.105343    .040372    27.38   0.000     1.025722    1.184965
------------------------------------------------------------------------------

. /* b) basic fixed effect */
. xtreg y x1 x2 z1 z2,fe

Fixed-effects (within) regression               Number of obs      =       200
Group variable: iid                             Number of groups   =        50

R-sq:  within  = 0.7203                         Obs per group: min =         4
       between = 0.2051                                        avg =       4.0
       overall = 0.3608                                        max =         4

                                                F(2,148)           =    190.60
corr(u_i, Xb)  = -0.0198                        Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5339654   .0306479    17.42   0.000     .4734013    .5945295
          x2 |  -.3287642   .0216329   -15.20   0.000    -.3715135   -.2860149
          z1 |  (dropped)
          z2 |  (dropped)
       _cons |    1.06701   .0197076    54.14   0.000     1.028065    1.105954
-------------+----------------------------------------------------------------
     sigma_u |  .61679116
     sigma_e |  .27737809
         rho |  .83178037   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(49, 148) =    13.17             Prob > F = 0.0000

. areg y x1 x2 z1 z2,absorb(iid) /* provides the same result, "absorbing" the individual effects
> */

Linear regression, absorbing indicators         Number of obs      =       200
                                                F(  2,   148)      =    190.60
                                                Prob > F           =    0.0000
                                                R-squared          =    0.9153
                                                Adj R-squared      =    0.8861
                                                Root MSE           =    .27738

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5339654   .0306479    17.42   0.000     .4734013    .5945295
          x2 |  -.3287642   .0216329   -15.20   0.000    -.3715135   -.2860149
          z1 |  (dropped)
          z2 |  (dropped)
       _cons |    1.06701   .0197076    54.14   0.000     1.028065    1.105954
-------------+----------------------------------------------------------------
         iid |   F(49, 148) =     13.168   0.000          (50 categories)

. /* c) between estimator */
. xtreg y x1 x2 z1 z2,be

Between regression (regression on group means)  Number of obs      =       200
Group variable: iid                             Number of groups   =        50

R-sq:  within  = 0.7051                         Obs per group: min =         4
       between = 0.4748                                        avg =       4.0
       overall = 0.5445                                        max =         4

                                                F(4,45)            =     10.17
sd(u_i + avg(e_i.))=  .5229012                  Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .4962882   .1856537     2.67   0.010     .1223624    .8702139
          x2 |  -.3692911    .077317    -4.78   0.000    -.5250155   -.2135667
          z1 |   .5119962   .1657589     3.09   0.003     .1781405    .8458519
          z2 |  -.0251447    .086411    -0.29   0.772    -.1991854    .1488961
       _cons |   1.108707   .0758009    14.63   0.000     .9560364    1.261378
------------------------------------------------------------------------------

. /* d) random effects one factor model */
. xtreg y x1 x2 z1 z2,re

Random-effects GLS regression                   Number of obs      =       200
Group variable: iid                             Number of groups   =        50

R-sq:  within  = 0.7200                         Obs per group: min =         4
       between = 0.4660                                        avg =       4.0
       overall = 0.5420                                        max =         4

Random effects u_i ~ Gaussian                   Wald chi2(4)       =    422.55
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5321243   .0301627    17.64   0.000     .4730064    .5912422
          x2 |  -.3364102   .0201916   -16.66   0.000    -.3759851   -.2968354
          z1 |   .5172175   .1628366     3.18   0.001     .1980636    .8363714
          z2 |  -.0429552   .0835552    -0.51   0.607    -.2067205    .1208101
       _cons |   1.099834   .0749927    14.67   0.000     .9528514    1.246817
-------------+----------------------------------------------------------------
     sigma_u |  .50417358
     sigma_e |  .27737809
         rho |  .76764806   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. /* section 39.6 of lecture notes */
. /* using lambda_i=1-sqrt(sigma2_v/(T_i*sigma2_a+sigma2_v)) and y-lambda_i*ybari ... */
. /* need first to estimate lambda_hat */
. /* this can be done using the between model */
.
. local mylist y x1 x2 z1 z2
. for varlist `mylist', noheader: cap drop avt_X \ by iid: egen avt_X=mean(X)
.
.
.
. /* e) hausman taylor */
. xthtaylor y x1 x2 z1 z2, endog(x2 z2)

Hausman-Taylor estimation                       Number of obs      =       200
Group variable: iid                             Number of groups   =        50

                                                Obs per group: min =         4
                                                               avg =         4
                                                               max =         4

Random effects u_i ~ i.i.d.                     Wald chi2(4)       =    390.70
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
TVexogenous  |
          x1 |   .5339654   .0308308    17.32   0.000     .4735382    .5943926
TVendogenous |
          x2 |  -.3287642    .021762   -15.11   0.000    -.3714169   -.2861116
TIexogenous  |
          z1 |   1.041398   .6239737     1.67   0.095    -.1815683    2.264364
TIendogenous |
          z2 |  -.3773716   .3907571    -0.97   0.334    -1.143242    .3884984
             |
       _cons |   1.057644   .0971932    10.88   0.000     .8671483    1.248139
-------------+----------------------------------------------------------------
     sigma_u |  .56643347
     sigma_e |   .2755227
         rho |  .80866815   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Note:  TV refers to time varying; TI refers to time invariant.

. /* rk: this is the one step version of HT !!!*/
.
. /**************/
. /* question 3 */
. /**************/
.
. /* Hausman test of FE(1b) vs RE-GLS one factor(1d) */
.
. xtreg y x1 x2 z1 z2,fe /* 1b */

Fixed-effects (within) regression               Number of obs      =       200
Group variable: iid                             Number of groups   =        50

R-sq:  within  = 0.7203                         Obs per group: min =         4
       between = 0.2051                                        avg =       4.0
       overall = 0.3608                                        max =         4

                                                F(2,148)           =    190.60
corr(u_i, Xb)  = -0.0198                        Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5339654   .0306479    17.42   0.000     .4734013    .5945295
          x2 |  -.3287642   .0216329   -15.20   0.000    -.3715135   -.2860149
          z1 |  (dropped)
          z2 |  (dropped)
       _cons |    1.06701   .0197076    54.14   0.000     1.028065    1.105954
-------------+----------------------------------------------------------------
     sigma_u |  .61679116
     sigma_e |  .27737809
         rho |  .83178037   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(49, 148) =    13.17             Prob > F = 0.0000

. estimates store FE
.
. xtreg y x1 x2 z1 z2,re /* 1d */

Random-effects GLS regression                   Number of obs      =       200
Group variable: iid                             Number of groups   =        50

R-sq:  within  = 0.7200                         Obs per group: min =         4
       between = 0.4660                                        avg =       4.0
       overall = 0.5420                                        max =         4

Random effects u_i ~ Gaussian                   Wald chi2(4)       =    422.55
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5321243   .0301627    17.64   0.000     .4730064    .5912422
          x2 |  -.3364102   .0201916   -16.66   0.000    -.3759851   -.2968354
          z1 |   .5172175   .1628366     3.18   0.001     .1980636    .8363714
          z2 |  -.0429552   .0835552    -0.51   0.607    -.2067205    .1208101
       _cons |   1.099834   .0749927    14.67   0.000     .9528514    1.246817
-------------+----------------------------------------------------------------
     sigma_u |  .50417358
     sigma_e |  .27737809
         rho |  .76764806   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. estimates store GLS
. hausman FE GLS

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       FE          GLS         Difference          S.E.
-------------+----------------------------------------------------------------
          x1 |    .5339654     .5321243        .0018411        .0054318
          x2 |   -.3287642    -.3364102         .007646        .0077641
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        1.13
                Prob>chi2 =      0.5693

.
. /* BACK TO PS1 */
. /* RUNNING A LOOP ON VARIABLES */
. local varlist y x1 x2
. foreach varname in `varlist' {
  2.   display "the dependent variable is " "`varname'"
  3.   local myexpl=itrim(subinword("`varlist'","`varname'","",.))
  4.   display "`varlist'"
  5.   display "the explanatory variables are " "`myexpl'"
  6.   local myexpl `myexpl'
  7.   reg `varname' `myexpl'
  8.   display "*****************************************"
  9. }
the dependent variable is y
y x1 x2
the explanatory variables are x1 x2

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   55.68
       Model |  48.5497651     2  24.2748825           Prob > F      =  0.0000
    Residual |  85.8789346   197  .435933678           R-squared     =  0.3612
-------------+------------------------------           Adj R-squared =  0.3547
       Total |    134.4287   199  .675521104           Root MSE      =  .66025

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5096367   .0614978     8.29   0.000     .3883583    .6309152
          x2 |   -.323182   .0312745   -10.33   0.000    -.3848578   -.2615062
       _cons |   1.067906   .0467635    22.84   0.000     .9756844    1.160127
------------------------------------------------------------------------------
*****************************************
the dependent variable is x1
y x1 x2
the explanatory variables are y x2

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =  128.13
       Model |  111.185433     2  55.5927166           Prob > F      =  0.0000
    Residual |  85.4705207   197  .433860511           R-squared     =  0.5654
-------------+------------------------------           Adj R-squared =  0.5610
       Total |  196.655954   199  .988220874           Root MSE      =  .65868

------------------------------------------------------------------------------
          x1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |   .5072131   .0612053     8.29   0.000     .3865114    .6279147
          x2 |   .4065144   .0257343    15.80   0.000     .3557643    .4572645
       _cons |  -.5204374    .081012    -6.42   0.000    -.6801995   -.3606754
------------------------------------------------------------------------------
*****************************************
the dependent variable is x2
y x1 x2
the explanatory variables are y x1

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =  160.64
       Model |  471.377667     2  235.688834           Prob > F      =  0.0000
    Residual |  289.026973   197  1.46714199           R-squared     =  0.6199
-------------+------------------------------           Adj R-squared =  0.6160
       Total |   760.40464   199  3.82112885           Root MSE      =  1.2113

------------------------------------------------------------------------------
          x2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |  -1.087674   .1052549   -10.33   0.000    -1.295245   -.8801034
          x1 |   1.374668   .0870231    15.80   0.000     1.203052    1.546285
       _cons |   1.170045   .1410437     8.30   0.000     .8918954    1.448194
------------------------------------------------------------------------------
*****************************************
.
.
.
.
. log close
      log:  H:\Antoine\TEACH-EC475\stata-log\ps2l.log
 log type:  text
closed on:  12 Nov 2009, 21:29:18
------------------------------------------------------------------------------------------------
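
Note on estimator (d). The comment under the random-effects run describes the quasi-demeaning transformation, lambda_i = 1 - sqrt(sigma2_v/(T_i*sigma2_a + sigma2_v)), and the log computes the individual means avt_X but never uses them. The do-file fragment below is a minimal sketch of how that step could be completed, not part of the original session. It assumes that sigma_e and sigma_u reported by xtreg, re play the roles of sigma_v and sigma_a, that the panel is balanced with T = 4 as xtdes reports, and that gpnl_lag.dta is in the working directory. The names lam, s_u, s_e, T, qd_* and qd_cons are hypothetical. With the variance components printed above (sigma_u = .50417358, sigma_e = .27737809), lambda comes out at roughly 0.73.

```stata
/* sketch only: quasi-demeaned OLS as a check on xtreg, re */
use gpnl_lag.dta, clear
xtset iid tid

/* RE fit; the theta option makes xtreg print its own theta for comparison */
xtreg y x1 x2 z1 z2, re theta

/* variance components stored by xtreg, re */
scalar s_u = e(sigma_u)
scalar s_e = e(sigma_e)
scalar T   = 4                 /* assumption: balanced panel, T_i = 4 */
scalar lam = 1 - sqrt(s_e^2/(T*s_u^2 + s_e^2))
display "lambda_hat = " lam

/* X - lambda * (individual mean of X) for every variable in the model */
foreach v of varlist y x1 x2 z1 z2 {
    capture drop avt_`v'
    capture drop qd_`v'
    bysort iid: egen avt_`v' = mean(`v')
    generate qd_`v' = `v' - scalar(lam)*avt_`v'
}
generate qd_cons = 1 - scalar(lam)   /* transformed intercept */

/* OLS on the transformed data, without an extra constant */
regress qd_y qd_x1 qd_x2 qd_z1 qd_z2 qd_cons, noconstant
```

If the assumptions hold, the coefficients from the last regression should be close to those of xtreg y x1 x2 z1 z2, re; the standard errors may differ slightly because of degrees-of-freedom conventions.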