A Knowledge Engineering Approach for Designing Data Analysis Pipelines

  • Michalis Georgiou

Student thesis: Doctoral Thesis

Abstract

Automated machine learning (AutoML) systems enable non-technical people to apply machine learning techniques to their problems by building and executing the entire machine learning pipeline. However, typical AutoML systems cannot describe or explain their background process or their models' predictions, which makes the entire procedure difficult to interpret or to explain to non-technical people. This work introduces OntoAML, an AutoML system that utilizes Semantic Web technologies, such as ontology languages and the Semantic Web Rule Language, to build machine learning pipelines for binary classification and regression problems. In addition, OntoAML provides explainable reports and methods that explain either the best machine learning pipeline or that pipeline's predictions, making the entire process interpretable and explainable to non-technical users. OntoAML offers three settings, namely interpretability, performance and speed, chosen according to user preferences. Each OntoAML setting was first evaluated on binary classification and regression tasks, and the best setting was then compared with the state-of-the-art systems TPOT and Auto-Sklearn. OntoAML achieved comparable results in both binary classification and regression tasks, showing that it can be competitive with state-of-the-art AutoML systems.
Date of Award: 5 Aug 2024
Original language: English
Supervisor: Ilias Tachmazidis (Main Supervisor) & Grigoris Antoniou (Co-Supervisor)
