Abstract
Background
Missing data are common in survey data. They arise whenever one or more of the sequences of measurements from subjects within a study are incomplete. Various approaches have been proposed to handle missing data. However, most survey studies ignore missing observations during parameter estimation, even though doing so can lead to incorrect estimates and flawed hypothesis tests. This study aimed to compare the effects of ignoring missing observations with those of handling them using multiple imputation methods.
Methods
Different imputation techniques were applied to data on the nutritional status of under-five children in Ethiopia from the 2016 Demographic and Health Survey, and their efficiency was compared. The survey data contained a total of 10,641 under-five children; however, 7,960 of them were used in the multiple imputation analyses. Multiple imputation was used to allow for the uncertainty about the missing data by creating several different plausible imputed datasets, and to provide unbiased and valid estimates of associations based on information taken from the available data.
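As an illustration of how results from several imputed datasets are usually combined, the following is a minimal Python sketch of Rubin's rules. It assumes that each imputed dataset has already been analysed separately and has produced one point estimate and one standard error. The abstract does not state the software or pooling procedure the study used, so the function name `pool_rubin` and all numeric inputs are hypothetical and are not taken from the study.

```python
import math


def pool_rubin(estimates, variances):
    """Pool M per-dataset estimates and their variances with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")

    # Pooled point estimate: the mean of the per-dataset estimates.
    q_bar = sum(estimates) / m
    # Within-imputation variance: the mean of the per-dataset variances.
    w_bar = sum(variances) / m
    # Between-imputation variance: sample variance of the estimates.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance includes the (1 + 1/M) finite-M correction.
    total = w_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total)


# Hypothetical inputs for M = 5 imputed datasets (not values from the study).
estimates = [0.10, 0.12, 0.11, 0.09, 0.13]
std_errors = [0.02, 0.02, 0.02, 0.02, 0.02]

pooled_estimate, pooled_se = pool_rubin(estimates, [se ** 2 for se in std_errors])
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.4f}")
```

The between-imputation term is what carries the uncertainty about the missing values into the pooled standard error: when the imputed datasets disagree more, the pooled standard error grows.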
Results
Among the variables considered in the study, the highest percentage of missing observations was found for the zone variable (58.0%) and the lowest for father’s educational level (4.06%). In the analysis that ignored missing values, the standard errors for the zone variable were 0.05 for stunting, 0.06 for wasting, and 0.05 for underweight. With multiple imputation, the estimates for the zone variable were 0.004 for stunting, 0.003 for wasting, and 0.001 for underweight, showing reduced estimation error. Similarly, using multiple imputation with M = 5 imputations for stunting, wasting, and underweight, the parameter estimates for the zone (third administrative level) variable were 0.003, 0.001, and 0.004, respectively.
Conclusions
Ignoring missing observations in the data analysis produces biased results in statistical hypothesis tests and conclusions. Using Markov chain Monte Carlo (MCMC) imputation approaches to account for missing observations decreased standard errors and yielded valid statistical inferences.
Original language | English |
---|---|
Publisher | Research Square |
Number of pages | 31 |
DOIs | |
Publication status | Published - 3 May 2023 |