*Set the number of observations and the seed
set obs 1000
set seed 1
*Generate female, experience and lwage
generate female=(uniform()<0.5)
generate experience=4*invchi2(5,uniform())*(1-0.2*female)
generate lwage=2+experience*0.03 ///
-0.1*female+invnormal(uniform())*(0.6-0.2*female)
*Decomposition of the median difference in lwage between men and women.
*We use 100 quantile regressions in the first step and we don't estimate
*the standard errors (both are default values):
rqdeco lwage experience, by(female) quantile(0.5)
*Interpretation: the observed median gender gap is about 31%.
*About 17 percentage points are explained by gender differences in the
*distribution of experience. The remaining 14 percentage points are due
*to differing coefficients between men and women and can be interpreted
*as discrimination.
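*As a rough cross-check (a sketch, not part of rqdeco), the raw medians of
*lwage can be compared by gender. Their difference should be broadly
*similar to the total differential reported above, although rqdeco works
*with the fitted distributions rather than the raw data:
tabstat lwage, by(female) statistics(median)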
*Decomposition of the differences in lwage between men and women at
*each of the 99 percentiles. We estimate 100 quantile regressions in the
*first step and estimate the standard errors by bootstrapping the
*results 100 times. We suppress the printed output:
rqdeco lwage experience, by(female) qlow(0.01) qhigh(0.99) ///
qstep(0.01) vce(boot) reps(100) noprint
*We can find the point estimates in the matrix r(results):
matrix list r(results)
*We can find the standard errors in the matrix r(se):
matrix list r(se)
*We prepare the data to plot the results:
matrix results=r(results)
matrix se=r(se)
svmat results, names(col)
svmat se, names(col)
*We plot the decomposition as a function of the quantile:
twoway (line total_differential quantile) ///
(line characteristics quantile) (line coefficients quantile), ///
title(Decomposition of differences in distribution) ///
ytitle(Log wage effects) xtitle(Quantile) ///
legend(order(1 "Total differential" ///
2 "Effects of characteristics" 3 "Effects of coefficients"))
*Interpretation: the observed gap increases (in absolute value) as we
*move up the wage distribution. In fact, women are favored (positive
*discrimination) at the bottom of the distribution. Both the experience
*distribution and the coefficients are responsible for this pattern:
*the experience distribution is less dispersed for women than for men,
*and so are the residuals. Quantitatively, the second effect is more
*important than the first. These results point to a glass ceiling
*effect for women: the discrimination increases as we move up the
*wage distribution.
*We prepare the data to plot a 95% confidence interval for the
*effects of coefficients (discrimination):
generate lo_coef=coefficients-1.96*se_coefficients
generate hi_coef=coefficients+1.96*se_coefficients
*We plot the effects of coefficients with a 95% confidence interval:
twoway (rarea hi_coef lo_coef quantile, bcolor(gs13) legend(off)) ///
(line coefficients quantile, ///
title(Effects of coefficients (discrimination)) ///
ytitle(Log wage effects) xtitle(Quantile))
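*As a possible follow-up (a sketch, not part of the original example), the
*quantiles at which the 95% interval excludes zero can be flagged and
*counted. The variable name sig_coef is introduced here for illustration:
generate byte sig_coef=(lo_coef>0 | hi_coef<0) if !missing(lo_coef, hi_coef)
tabulate sig_coef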
/********************************************************************
How to calculate the interdecile range (difference between the
9th decile and the 1st decile) and its standard error
*********************************************************************/
*Decomposition of the differences in lwage between men and women at
*each of the 99 percentiles. We estimate 100 quantile regressions in the
*first step and estimate the standard errors by bootstrapping the
*results 100 times.
*We save the bootstrap results in the file "C:/ado/personal/rqdeco_boot".
*We suppress the printed output:
rqdeco lwage experience, by(female) qlow(0.01) qhigh(0.99) ///
qstep(0.01) vce(boot) reps(100) noprint ///
saving("C:/ado/personal/rqdeco_boot", replace)
*The results are saved in the matrix r(results)
mat results=r(results)
*The 90-10 ranges can be calculated using this matrix:
*Q90-q10 for the fitted_1 distribution
sca q90q10_fitted1=results[90,2]-results[10,2]
*Q90-q10 for the counterfactual distribution
sca q90q10_counter=results[90,3]-results[10,3]
*Q90-q10 for the fitted_0 distribution
sca q90q10_fitted0=results[90,4]-results[10,4]
*Decomposition of the q90-q10 range:
*total difference
sca q90q10_tot=q90q10_fitted1-q90q10_fitted0
*coefficients (fitted_1 minus counterfactual)
sca q90q10_coef=q90q10_fitted1-q90q10_counter
*characteristics (counterfactual minus fitted_0)
sca q90q10_char=q90q10_counter-q90q10_fitted0
sca dir
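*A quick sanity check, as a sketch: by construction the two effects should
*add up to the total difference. Rows 10 and 90 are assumed to correspond
*to the quantiles 0.10 and 0.90, which holds for qlow(0.01) and qstep(0.01)
*provided that the first column of the matrix contains the quantile:
display results[10,1] " " results[90,1]
assert reldif(q90q10_coef+q90q10_char, q90q10_tot)<1e-6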
*Standard errors
*open the bootstrap results
preserve
use "C:/ado/personal/rqdeco_boot.dta", clear
*generate the q90q10 ranges for each bootstrap draw
generate q90_q10_fitted1= X1_C1_90- X1_C1_10
generate q90_q10_fitted0= X0_C0_90- X0_C0_10
generate q90_q10_counter= X1_C0_90- X1_C0_10
*generate the effects for each bootstrap draw
generate q90_q10_tot=q90_q10_fitted1-q90_q10_fitted0
generate q90_q10_coef=q90_q10_fitted1-q90_q10_counter
generate q90_q10_char=q90_q10_counter-q90_q10_fitted0
*The simplest way to obtain the standard errors is to take the standard
*deviation of each effect over the bootstrap draws (the standard deviation
*column reported below). Alternatively, confidence intervals may be
*obtained from the percentiles of the effects over the draws.
summarize q90_q10_fitted1 q90_q10_fitted0 q90_q10_counter ///
q90_q10_tot q90_q10_coef q90_q10_char
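*A sketch of the percentile alternative mentioned above: the 2.5th and
*97.5th percentiles of each effect over the bootstrap draws give an
*approximate 95% confidence interval. With only 100 replications these
*bounds are imprecise, so more replications would be advisable:
foreach v of varlist q90_q10_tot q90_q10_coef q90_q10_char {
    _pctile `v', percentiles(2.5 97.5)
    display "`v': [" r(r1) ", " r(r2) "]"
}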