
Imputation-Aware Weather Forecasting Using Machine Learning

  • Johnson Okezie

Student thesis: Master's Thesis

Abstract

In real-world weather time series, handling missing values is critical: inadequate interpolation or imputation can distort the underlying dynamics, shift what is forecast in practice, and ultimately affect downstream model outputs. This thesis examines that operationally significant issue, focusing on daily station data. Prior studies often rely on synthetic settings, single variables, single imputation methods, narrow missingness levels, and limited exploration of Kalman variants, with little end-to-end assessment of how imputation choices propagate to accuracy and realism. This thesis applies machine learning and data mining methods to develop a leakage-safe, time-aware pipeline (causal lags, rolling means, and train-only transforms), and evaluates performance on held-out test splits using RMSE, R², and a smoothness rate (SR) that screens out flat predictions and avoids selecting over-smoothed or excessively jagged outputs. For temperature, replacing Linear Interpolation (LI) with a Kalman Filter Local Linear Trend (KMFLLT) imputation yields practically meaningful gains: RMSE drops by roughly one-third on average, R² rises by ≈ 0.16, and SR increases by ≈ 0.45, restoring day-to-day variability that appears physically plausible. With KMFLLT-imputed temperature features, linear and ensemble learners (LR/GB/RF) perform best, whereas ARIMA is more suitable for persistence-dominated variables such as pressure and wind; for intermittent targets such as rain and snow, an occurrence-intensity framing is preferable to point regression. A season classifier trained only on past-safe features achieves ≈ 0.73 accuracy, indicating a strong embedded regime signal and showing that the pipeline does not disrupt seasonal structure; instead, it preserves and strengthens it.
Overall, this thesis contributes an imputation-aware, station-level forecasting framework that improves accuracy, preserves temporal realism, and supports reliable decision-making, with a reusable pipeline that is operationally viable across forecast desks, IoT systems, and AI services.
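
As an illustration of what "leakage-safe" means for the causal lags, rolling means, and train-only transforms named in the abstract, the following is a minimal, dependency-free sketch. It is not the thesis's implementation: the function names, lag choices, window length, split fraction, and the toy temperature values are all assumptions made for the example.

```python
from statistics import mean, pstdev

def causal_features(series, lags=(1, 2), window=3):
    """Build features for predicting series[t] using only values before t."""
    start = max(max(lags), window)
    rows, targets = [], []
    for t in range(start, len(series)):
        lagged = [series[t - k] for k in lags]   # causal lags
        past_mean = mean(series[t - window:t])   # rolling mean, excludes series[t]
        rows.append(lagged + [past_mean])
        targets.append(series[t])
    return rows, targets

def chronological_split(rows, targets, train_fraction=0.8):
    """Split by time order, never by shuffling."""
    cut = int(len(rows) * train_fraction)
    return (rows[:cut], targets[:cut]), (rows[cut:], targets[cut:])

def fit_scaler(rows):
    """Estimate per-column mean and standard deviation on training rows only."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]       # guard against zero variance
    return mus, sds

def apply_scaler(rows, scaler):
    """Standardize rows with statistics that were fitted elsewhere."""
    mus, sds = scaler
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

# Hypothetical daily temperatures, for illustration only.
temps = [10.0, 11.0, 13.0, 12.0, 14.0, 15.0, 13.0,
         16.0, 17.0, 15.0, 18.0, 19.0, 17.0]

X, y = causal_features(temps)
(X_train, y_train), (X_test, y_test) = chronological_split(X, y)
scaler = fit_scaler(X_train)                     # train-only transform
X_train_s = apply_scaler(X_train, scaler)
X_test_s = apply_scaler(X_test, scaler)          # test reuses train statistics
```

The point of the sketch is that every feature for day t is computed from days before t, and the scaling statistics never see the test period, so the held-out evaluation cannot be contaminated by future information.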
Date of Award: 24 Apr 2026
Original language: English
Supervisor: Abirami Gunasekaran (Main Supervisor) & Gary Allen (Co-Supervisor)
