Model-based comparison on multiple imputations and missing data analysis on nutritional status among under-five children in Ethiopia

Kenenisa Abdisa Kuse, Dereje Danbe Debeko, Richard Gyan Aboagye, Precious A Duodu, Abdul-Aziz Seidu, Bright Opoku Ahinkorah

Research output: Working paper (Preprint)



Missing data is a common occurrence in survey data. It arises whenever one or more of the measurements recorded for subjects in a study are incomplete. Various approaches have been proposed to handle missing data. However, most survey studies ignore missing observations during parameter estimation, even though doing so can lead to biased estimates and incorrect statistical hypothesis tests. This study aimed to compare the results of ignoring missing observations with those obtained using multiple imputation methods.


Different imputation techniques were applied to data on the nutritional status of under-five children in Ethiopia from the 2016 Ethiopia Demographic and Health Survey, and their efficiency was compared. The survey data contained a total of 10,641 under-five children, of whom 7,960 were included in the multiple imputation analyses. Multiple imputation was used to account for uncertainty about the missing data: it creates several different plausible imputed datasets and, using information taken from the available data, provides unbiased and valid estimates of associations.
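The abstract does not state how the results from the several imputed datasets are combined. A common choice is Rubin's rules, and the Python sketch below assumes that rule. The function name `pool_rubin` is hypothetical, and the coefficients and squared standard errors are made-up values for M = 5 imputations, not results from this study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine M per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, math.sqrt(t)       # pooled estimate and its standard error


# Hypothetical coefficients and squared standard errors from M = 5 imputed datasets
est = [0.10, 0.12, 0.11, 0.09, 0.13]
var = [0.0004, 0.0005, 0.0004, 0.0006, 0.0005]
q, se = pool_rubin(est, var)
```

The between-imputation term `(1 + 1/m) * b` is what carries the extra uncertainty caused by the missing values into the pooled standard error; a single imputed dataset analysed as if it were complete would omit it.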


Among the variables considered in the study, zone had the highest percentage of missing observations (58.0%) and father's educational level the lowest (4.06%). When missing values were ignored, the standard errors for the zone variable were 0.05 for stunting, 0.06 for wasting, and 0.05 for underweight. With multiple imputation, the corresponding estimates for the zone variable were 0.004 for stunting, 0.003 for wasting, and 0.001 for underweight, indicating reduced estimation error. Similarly, with M = 5 imputed datasets, the parameter estimates for the zone (third administrative level) variable were 0.003, 0.001, and 0.004 for stunting, wasting, and underweight, respectively.
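The missingness figures reported above are per-variable proportions of records with no value. A minimal sketch, assuming each record is a dictionary in which `None` (or an absent key) marks a missing value; the function name `missing_percentages`, the field names `zone` and `father_edu`, and the toy records are hypothetical, not EDHS data.

```python
def missing_percentages(records, variables):
    """Percentage of records in which each variable is missing (None or absent)."""
    n = len(records)
    return {
        v: 100.0 * sum(1 for r in records if r.get(v) is None) / n
        for v in variables
    }


# Hypothetical toy records; None marks a missing value.
records = [
    {"zone": None, "father_edu": "primary"},
    {"zone": "Z1", "father_edu": None},
    {"zone": None, "father_edu": "none"},
    {"zone": "Z2", "father_edu": "secondary"},
]
pct = missing_percentages(records, ["zone", "father_edu"])
most = max(pct, key=pct.get)    # variable with the highest share missing
least = min(pct, key=pct.get)   # variable with the lowest share missing
```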


Ignoring missing observations in the data analysis produces biased results, which distort statistical hypothesis tests and conclusions. Accounting for missing observations with Markov chain Monte Carlo (MCMC) imputation reduced the standard errors and yielded valid statistical inferences.
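The abstract names MCMC imputation without specifying the imputation model. One standard form is data augmentation, which alternates an I-step (draw the missing values given the current parameters) with a P-step (draw the parameters from their posterior given the completed data). The Python sketch below is illustrative only and assumes a single continuous outcome `y` with missing entries, a fully observed covariate `x`, a normal linear model, and the noninformative prior p(b0, b1, s2) proportional to 1/s2; the study's actual outcomes and imputation model may differ, and `mcmc_impute` and the toy data are hypothetical.

```python
import random
from statistics import mean


def mcmc_impute(x, y, n_iter=200, seed=1):
    """Return one completed copy of y; None entries in y are treated as missing.

    Assumed model (illustrative): y_i = b0 + b1 * x_i + e_i, e_i ~ N(0, s2),
    x fully observed and not constant, prior p(b0, b1, s2) proportional to 1/s2.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    rng = random.Random(seed)
    miss = [i for i, v in enumerate(y) if v is None]
    start = mean(v for v in y if v is not None)
    y_cur = [start if v is None else v for v in y]  # start from mean imputation
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)

    for _ in range(n_iter):
        # P-step: draw (s2, b0, b1) from the posterior given the completed data.
        ybar = mean(y_cur)
        b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y_cur)) / sxx
        sse = sum((yi - ybar - b1_hat * (xi - xbar)) ** 2 for xi, yi in zip(x, y_cur))
        s2 = sse / rng.gammavariate((n - 2) / 2, 2)  # scaled inverse chi-square, n-2 df
        a = rng.gauss(ybar, (s2 / n) ** 0.5)         # regression line at the mean of x
        b1 = rng.gauss(b1_hat, (s2 / sxx) ** 0.5)    # slope
        b0 = a - b1 * xbar
        # I-step: redraw every missing y from its predictive distribution.
        for i in miss:
            y_cur[i] = rng.gauss(b0 + b1 * x[i], s2 ** 0.5)
    return y_cur


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.0]
# Five independent chains give M = 5 completed datasets.
imputed = [mcmc_impute(x, y, seed=s) for s in range(5)]
```

Each completed dataset would then be analysed separately and the results pooled. In practice one would also check convergence and discard burn-in draws; this sketch handles that only by returning the final state of each chain after `n_iter` iterations.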
Original language: English
Publisher: Research Square
Number of pages: 31
Publication status: E-pub ahead of print (3 May 2023)
