Prediction of diabetes prescription volumes of various geographies using regression techniques

Research output: Contribution to journal > Article > peer-review


Background: Increasing diabetes prevalence is a major public health concern. In this study we ask whether linked open data can be used to predict prescription volumes of drugs used in the treatment of diabetes across small geographies of England.

Methods: We propose and demonstrate a methodology that uses publicly available open data to infer the geo-spatial distribution of drugs prescribed for diabetes at the Lower Layer Super Output Area (LSOA) level. Multiple datasets are acquired, processed, and linked together, enabling a more in-depth analysis. Combining these linked datasets with published deprivation factors for geographies across England, we build highly predictive regression models.

Results: The trained regression models accurately predict diabetes prescribing volumes from the deprivation indicators of various geographies across England. Models built with data covering the city of Bradford, England, produced a predicted-versus-actual R² of 0.672 using multiple linear regression and 0.775 using the Least Absolute Shrinkage and Selection Operator (LASSO). Median age and air quality factors proved to be significant markers for diabetes prescribing.

Conclusions: The results of this study suggest our methodology is robust and accurate. Such predictive models are useful to health authorities given the rising cost and prevalence of diabetes, and the use of publicly available open data avoids data privacy issues.
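The comparison of multiple linear regression with LASSO described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data. The feature names, coefficients, sample sizes, and penalty value `alpha=0.5` are assumptions chosen for illustration, and the areas are synthetic. Both models are fitted with a small coordinate-descent routine (an assumed, generic solver), where `alpha=0` reduces to ordinary multiple linear regression.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_linear(X, y, alpha=0.0, n_sweeps=200):
    """Coordinate descent for (1/2n)*sum((y - b0 - X.b)^2) + alpha*sum(|b|).

    alpha = 0 gives multiple linear regression; alpha > 0 gives LASSO.
    The intercept b0 is not penalised. Features are assumed to be on
    comparable scales, since the penalty treats all coefficients alike.
    """
    n, p = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    mean_sq = [sum(c * c for c in col) / n for col in columns]
    intercept, coef = 0.0, [0.0] * p
    residual = list(y)  # y minus current prediction (prediction starts at 0)
    for _ in range(n_sweeps):
        # Exact update of the unpenalised intercept.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
        for j, col in enumerate(columns):
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * coef[j]) for c, r in zip(col, residual)) / n
            new = soft_threshold(rho, alpha) / mean_sq[j]
            step = new - coef[j]
            if step != 0.0:
                residual = [r - c * step for c, r in zip(col, residual)]
                coef[j] = new
    return intercept, coef


def predict(X, intercept, coef):
    return [intercept + sum(b * x for b, x in zip(coef, row)) for row in X]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical indicators per area; the last two carry no signal by design.
FEATURES = ["median_age", "air_quality", "income_deprivation", "noise_a", "noise_b"]
TRUE_COEF = [3.0, 2.0, 1.5, 0.0, 0.0]  # assumed, for synthetic data only


def make_areas(n, rng):
    """Generate n synthetic areas with standard-normal indicators."""
    X = [[rng.gauss(0, 1) for _ in FEATURES] for _ in range(n)]
    y = [10.0 + sum(b * x for b, x in zip(TRUE_COEF, row)) + rng.gauss(0, 1)
         for row in X]
    return X, y


rng = random.Random(0)
X_train, y_train = make_areas(200, rng)
X_test, y_test = make_areas(100, rng)

ols_intercept, ols_coef = fit_linear(X_train, y_train, alpha=0.0)
lasso_intercept, lasso_coef = fit_linear(X_train, y_train, alpha=0.5)

ols_r2 = r_squared(y_test, predict(X_test, ols_intercept, ols_coef))
lasso_r2 = r_squared(y_test, predict(X_test, lasso_intercept, lasso_coef))

for name, b_ols, b_lasso in zip(FEATURES, ols_coef, lasso_coef):
    print(f"{name:20s} OLS={b_ols:7.3f}  LASSO={b_lasso:7.3f}")
print(f"held-out R^2: OLS={ols_r2:.3f}  LASSO={lasso_r2:.3f}")
```

In this sketch the L1 penalty shrinks every coefficient and can set uninformative ones to exactly zero, which is why LASSO also acts as a feature selector. The R² values printed here describe only the synthetic data and are not comparable to the figures reported for Bradford.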

Original language: English
Number of pages: 21
Journal: Health Informatics Journal
Issue number: 1
Early online date: 24 Jan 2023
Publication status: E-pub ahead of print - 24 Jan 2023

