Multiple imputation provides a way to deal with missing values on the variable current net labor income in Euros. It replaces item nonresponse with multiply imputed values, using information about determinants of household income. Five imputations are available in the $PGEN datasets as the variables pgi1labnet-pgi5labnet. The imputations were computed in Stata by multiple imputation using chained equations with predictive mean matching. The procedures were written by Patrick Royston (see Royston 2004, 2005a, 2005b, 2007, 2009) and Ian White (see White, Daniel and Royston 2010; White, Royston and Wood 2011). Predictive mean matching works as follows. A regression model for income is fitted on the observed data. For each observation with missing income, the model finds the non-missing observation whose predicted value is closest to the predicted value of the missing observation. The observed income of this closest observation is then used to impute the missing value. The most important predictor of current net labor income is gross labor income in the previous year. A complete list of the variables used for modelling is available upon request.
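The matching step can be made concrete with a minimal sketch of predictive mean matching using a single predictor. It assumes a linear regression of current net labor income on previous-year gross labor income. The data and function names are hypothetical. The sketch is deliberately simplified: the coefficients are fitted once and the single closest donor is used. Full implementations add randomness (for example, by drawing the regression coefficients) so that repeated imputations differ.

```python
def fit_ols(x, y):
    """Least-squares fit of y on a single predictor x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pmm_impute(x, y):
    """Fill the None entries of y by predictive mean matching on x (simplified sketch)."""
    observed = [i for i, yi in enumerate(y) if yi is not None]
    a, b = fit_ols([x[i] for i in observed], [y[i] for i in observed])
    predicted = [a + b * xi for xi in x]  # predicted mean for every case
    filled = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            # Donor: the observed case whose prediction is closest to case i's prediction.
            donor = min(observed, key=lambda j: abs(predicted[j] - predicted[i]))
            filled[i] = y[donor]  # impute the donor's observed value, not the prediction
    return filled


# Hypothetical data: previous-year gross and current net labor income (Euros).
prev_gross = [30000, 42000, 25000, 51000, 38000, 39000]
net_income = [1700, 2300, 1450, None, 2100, None]
imputed = pmm_impute(prev_gross, net_income)
print(imputed)  # [1700, 2300, 1450, 2300, 2100, 2100]
```

Because only observed incomes are ever copied, every imputed value is a plausible, actually occurring income rather than a model prediction.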
The missing observations were assumed to be missing at random (MAR). The number of imputations was set to m = 5, which yields five imputed values for each missing value of pglabnet. The number of iterations carried out in each prediction model was set to 2000. Sample E&I and the supplementary sample S1 were imputed separately.
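The chained-equations idea behind these settings can be sketched for two incomplete variables. Each iteration imputes one variable from the current values of the other and then does the reverse. The whole procedure is repeated m times, separately within each sample. This is a hypothetical illustration, not the code actually used. It assumes a bootstrap draw of the regression coefficients as the source of between-imputation randomness. It also uses a small number of cycles instead of 2000 and runs on made-up records labelled with the sample names.

```python
import random


def ols(x, y):
    """Least-squares fit of y on a single predictor x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0  # guard against a degenerate bootstrap sample
    return my - slope * mx, slope


def pmm_step(predictor, target, missing, rng):
    """Re-impute target[i] for i in `missing` by PMM on the current predictor values."""
    observed = [i for i in range(len(target)) if i not in missing]
    # Assumption: a bootstrap resample of observed cases stands in for parameter uncertainty.
    boot = [rng.choice(observed) for _ in observed]
    a, b = ols([predictor[i] for i in boot], [target[i] for i in boot])
    pred = [a + b * v for v in predictor]
    for i in missing:
        donor = min(observed, key=lambda j: abs(pred[j] - pred[i]))
        target[i] = target[donor]


def chained_pmm(x, y, n_cycles, rng):
    """Return one completed copy of two incomplete variables via chained equations."""
    x, y = list(x), list(y)
    miss_x = {i for i, v in enumerate(x) if v is None}
    miss_y = {i for i, v in enumerate(y) if v is None}
    obs_x = [v for v in x if v is not None]
    obs_y = [v for v in y if v is not None]
    for i in miss_x:  # crude starting values: random observed draws
        x[i] = rng.choice(obs_x)
    for i in miss_y:
        y[i] = rng.choice(obs_y)
    for _ in range(n_cycles):  # one cycle updates each incomplete variable once
        pmm_step(y, x, miss_x, rng)  # impute x given current y
        pmm_step(x, y, miss_y, rng)  # impute y given current x
    return x, y


def impute_by_sample(records, m=5, n_cycles=20, seed=1):
    """Produce m completed data sets, imputing each sample separately."""
    rng = random.Random(seed)
    completed = [[] for _ in range(m)]
    for sample in sorted({r[0] for r in records}):
        rows = [r for r in records if r[0] == sample]
        x = [r[1] for r in rows]
        y = [r[2] for r in rows]
        for k in range(m):
            xk, yk = chained_pmm(x, y, n_cycles, rng)
            completed[k].extend((sample, a, b) for a, b in zip(xk, yk))
    return completed


# Hypothetical records: (sample, previous-year gross income, current net income).
records = [
    ("E&I", 30000, 1700), ("E&I", 42000, None), ("E&I", 25000, 1450),
    ("E&I", None, 2100), ("E&I", 51000, 2800),
    ("S1", 28000, 1600), ("S1", 36000, None), ("S1", 47000, 2500),
    ("S1", None, 1900),
]
completed = impute_by_sample(records, m=5)
```

Donors are drawn only from the same sample, so in this sketch no information flows between E&I and S1 records.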
Analysing multiply imputed data: Multiply imputed data can be analysed without special methods. However, dedicated tools exist and make the analysis simpler. Below is a short overview of useful tools for several statistical packages. These tools estimate the parameters of a model, such as a regression model, separately on each of the m imputed data sets and then combine the results. The pooled point estimate is the arithmetic mean of the m point estimates. The pooled standard error is the square root of the total variance. The total variance combines the within-imputation variance (the average of the m squared standard errors) with the between-imputation variance (the variance of the m point estimates), with the latter multiplied by (1 + 1/m).
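The combining rules just described (Rubin's rules) fit in a few lines. The following is a minimal sketch. The estimates and standard errors in the usage line are invented for illustration, and `pool_rubin` is a hypothetical helper name, not part of any of the packages listed below.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine m point estimates and their standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # average squared standard error
    between = variance(estimates)                # variance of the m estimates (divisor m - 1)
    total = within + (1 + 1 / m) * between       # total variance
    return q_bar, math.sqrt(total)


# Hypothetical results of the same regression run on m = 5 imputed data sets.
q_hat, se_hat = pool_rubin([0.50, 0.52, 0.48, 0.51, 0.49], [0.10] * 5)
print(round(q_hat, 4), round(se_hat, 4))  # 0.5 0.1015
```

Here the between-imputation spread raises the standard error from 0.1 to about 0.1015, reflecting the extra uncertainty due to the missing values.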
· Stata provides built-in support through its mi commands. For example, mi estimate fits a model on each imputation and combines the results.
· Within SAS, PROC MIANALYZE combines the results of analyses performed on the individual imputed data sets.
· IVEware is a set of routines that can be launched from SAS or run independently using data from many sources. You can use the IVEware module regress to perform multiple imputation analysis.
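The five imputations are stored side by side as the variables pgi1labnet-pgi5labnet, so an analysis is repeated once per variable and the results are then pooled. The sketch below does this by hand for a simple mean. It assumes plain Python records with made-up incomes and treats the data as a simple random sample (no survey weights). It illustrates the combining step that the tools above automate and is not a substitute for them.

```python
import math
from statistics import mean, stdev, variance

# The five imputation variables as named in $PGEN.
IMPUTED_VARS = [f"pgi{k}labnet" for k in range(1, 6)]


def mean_and_se(values):
    """Sample mean and its standard error, assuming simple random sampling."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def pooled_mean(rows):
    """Estimate the mean once per imputation variable, then pool with Rubin's rules."""
    results = [mean_and_se([row[v] for row in rows]) for v in IMPUTED_VARS]
    estimates = [q for q, _ in results]
    m = len(estimates)
    within = mean(se ** 2 for _, se in results)
    between = variance(estimates)
    return mean(estimates), math.sqrt(within + (1 + 1 / m) * between)


# Hypothetical persons; assumed: an observed income is repeated in all five variables.
rows = [
    {v: 1800.0 for v in IMPUTED_VARS},
    {v: 2400.0 for v in IMPUTED_VARS},
    dict(zip(IMPUTED_VARS, [1500.0, 1650.0, 1550.0, 1700.0, 1600.0])),  # imputed case
    {v: 3100.0 for v in IMPUTED_VARS},
]
estimate, std_error = pooled_mean(rows)
print(estimate)  # 2225.0
```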
Royston, P. 2004. Multiple imputation of missing values. Stata Journal 4: 227–241.
Royston, P. 2005a. Multiple imputation of missing values: Update. Stata Journal 5: 188–201.
Royston, P. 2005b. Multiple imputation of missing values: Update of ice. Stata Journal 5: 527–536.
Royston, P. 2007. Multiple imputation of missing values: Further update of ice, with an emphasis on interval censoring. Stata Journal 7: 445–464.
Royston, P. 2009. Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables. Stata Journal 9: 466–477.