Abstract
The growing use of intelligent devices has led to vast amounts of data being generated by sensors. Large datasets commonly contain many missing values. This is a challenge for data miners because many data analysis methods work well only on complete databases. A traditional approach to handling missing data is to discard instances with missing values and use only the complete cases for analysis. However, research has shown that this approach is impractical, especially when large amounts of data are missing. This has increased the need for strategies that replace missing values with plausible values through imputation. This study presents an imputation strategy called *med.BFMVI* for recovering missing values before training downstream classification models. Experiments simulated missingness rates from 10% to 40% under the missing completely at random (MCAR) and missing at random (MAR) mechanisms, and the performance of the proposed technique was compared against state-of-the-art techniques. Overall, the proposed algorithm achieved better imputation accuracy than the benchmark techniques and yielded significant improvements in downstream learning.
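The abstract does not describe the internals of *med.BFMVI*, so the sketch below does not implement it. Instead it illustrates the kind of experimental setup described: a minimal Python sketch, assuming a numeric table stored as lists of rows, that injects MCAR or MAR missingness at a chosen rate and measures how many rows complete-case deletion would keep. The function names (`mask_mcar`, `mask_mar`, `missing_rate`, `complete_case_fraction`) and the MAR scheme (cells go missing only in rows where an always-observed "driver" column lies above its median) are hypothetical choices for illustration, not the paper's protocol. Under MCAR with per-cell rate p over d features, the expected share of complete rows is (1 − p)^d, so 10% missingness over 10 features keeps only about 35% of rows (0.9^10 ≈ 0.349), and 40% keeps about 0.6% (0.6^10 ≈ 0.006).

```python
import random
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def mask_mcar(data: Sequence[Row], rate: float, seed: int = 0) -> List[Row]:
    """MCAR: every cell is removed with the same probability `rate`,
    independently of any observed or unobserved value."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in data]


def mask_mar(data: Sequence[Row], rate: float, driver_col: int = 0,
             seed: int = 0) -> List[Row]:
    """MAR (one hypothetical scheme): cells in non-driver columns are removed
    only in rows whose fully observed driver value exceeds its median, so
    missingness depends on observed data only. Roughly half the rows are
    eligible, so the per-cell probability is doubled to hit about `rate`
    overall in the non-driver columns (valid for rate <= 0.5)."""
    rng = random.Random(seed)
    driver = sorted(row[driver_col] for row in data)
    n = len(driver)
    median = driver[n // 2] if n % 2 else (driver[n // 2 - 1] + driver[n // 2]) / 2
    p = min(1.0, 2 * rate)
    out: List[Row] = []
    for row in data:
        eligible = row[driver_col] > median
        out.append([
            None if (j != driver_col and eligible and rng.random() < p) else v
            for j, v in enumerate(row)
        ])
    return out


def missing_rate(data: Sequence[Row], cols: Optional[Sequence[int]] = None) -> float:
    """Fraction of missing cells, optionally restricted to the given columns."""
    cells = [row[j] for row in data
             for j in (cols if cols is not None else range(len(row)))]
    return sum(v is None for v in cells) / len(cells)


def complete_case_fraction(data: Sequence[Row]) -> float:
    """Share of rows with no missing cell, i.e. what listwise deletion keeps."""
    if not data:
        return 0.0
    return sum(all(v is not None for v in row) for row in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(42)
    table = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(5000)]
    for r in (0.1, 0.2, 0.3, 0.4):
        mcar = mask_mcar(table, r, seed=1)
        print(f"rate={r:.1f} observed={missing_rate(mcar):.3f} "
              f"complete rows={complete_case_fraction(mcar):.3f} "
              f"expected={(1 - r) ** 10:.3f}")
```

Running the script prints, for each rate from 10% to 40%, the observed missing rate next to the simulated and analytic complete-row shares, which shows how quickly complete-case analysis discards data as missingness grows.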
Field | Value |
---|---|
Original language | English |
Article number | 10256187 |
Pages (from-to) | 102935-102943 |
Number of pages | 9 |
Journal | IEEE Access |
Volume | 11 |
Early online date | 20 Sep 2023 |
DOIs | |
Publication status | Published - 26 Sep 2023 |