Abstract
Noticeable growth in the use of intelligent devices has resulted in the generation of vast amounts of data from sensor devices. With data at this scale, databases commonly contain large numbers of missing values. This challenges data miners because many data analysis methods only work well on complete databases. A traditional approach to handling missing data is to discard instances with missing values and use only complete cases for analysis. However, research has shown that this approach is impractical, especially when large amounts of data are missing. This has increased the need for strategies that replace missing values with plausible values through imputation. In addition, as more sensitive data is generated, research has shown the need for more secure and private approaches to pre-processing data. This thesis proposes imputation strategies called k−BFMVI and med.BFMVI for recovering missing values before training downstream regression and classification models, respectively. An Average Site Mixture (AvSM) model is further developed to simulate secure missing data recovery for IoT applications using IOTA.
Experiments simulated missingness rates from 10% to 40% under the MCAR (missing completely at random) and MAR (missing at random) mechanisms. Missing values were also imputed using benchmark techniques, and their performance was cross-validated on downstream regression and classification tasks. To simulate distributed settings, missing values were also explored across distributed sites, showing variations in the information held at each site for IoT applications using IOTA.
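As an illustration of the experimental setup described above, the following is a minimal sketch of how MCAR and MAR missingness at rates from 10% to 40% might be simulated and then filled with a simple baseline. It is not the thesis's code: the function names (`mask_mcar`, `mask_mar`, `impute_median`), the synthetic Gaussian data, the rank-based MAR rule, and the column-median baseline are all assumptions made for illustration, and the baseline is a generic one rather than k−BFMVI or med.BFMVI.

```python
import numpy as np

def mask_mcar(X, rate, rng):
    """MCAR: every entry is removed with the same probability `rate`."""
    X = X.astype(float).copy()
    X[rng.random(X.shape) < rate] = np.nan
    return X

def mask_mar(X, rate, rng, driver=0, target=1):
    """MAR: entries in column `target` are removed with a probability that
    depends only on the fully observed column `driver` (hypothetical rule)."""
    X = X.astype(float).copy()
    n = len(X)
    # Rank of each row by its driver value, from 1 (smallest) to n (largest).
    ranks = X[:, driver].argsort().argsort() + 1
    # Removal probability grows with the driver rank; its mean equals `rate`.
    p = 2 * rate * ranks / (n + 1)
    X[rng.random(n) < p, target] = np.nan
    return X

def impute_median(X):
    """Generic baseline: replace each missing entry with its column median."""
    med = np.nanmedian(X, axis=0)
    return np.where(np.isnan(X), med, X)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))  # synthetic stand-in for a real dataset
    for rate in (0.1, 0.2, 0.3, 0.4):
        for name, masked in (("MCAR", mask_mcar(X, rate, rng)),
                             ("MAR", mask_mar(X, rate, rng))):
            filled = impute_median(masked)
            rmse = np.sqrt(np.mean((filled - X) ** 2))
            print(f"{name} rate={rate:.1f} "
                  f"missing={np.isnan(masked).mean():.3f} rmse={rmse:.3f}")
```

In this sketch the MAR rate applies to the target column only, so the overall fraction of missing entries is lower than under MCAR at the same nominal rate. A full evaluation along the lines described above would replace the RMSE check with cross-validated performance of the downstream regression or classification model.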
Date of Award: 27 Feb 2023
Supervisors: Richard Hill (Main Supervisor), George Bargiannis (Co-Supervisor) & Muhammad Hussain (Co-Supervisor)