Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology

Pritesh Mistry, Daniel Neagu, Paul R. Trundle, Jonathan D. Vessey

Research output: Contribution to journal (Article)

10 Citations (Scopus)

Abstract

Drug vehicles are chemical carriers that provide beneficial aid to the drugs they bear. Taking advantage of their favourable properties can potentially allow the safer use of drugs that are considered highly toxic. A means for vehicle selection without experimental trial would therefore be of benefit in saving time and money for the industry. Although machine learning is increasingly used in predictive toxicology, to our knowledge there is no reported work in using machine learning techniques to model drug-vehicle relationships for vehicle selection to minimise toxicity. In this paper we demonstrate the use of data mining and machine learning techniques to process, extract and build models based on classifiers (decision trees and random forests) that allow us to predict which vehicle would be most suited to reduce a drug’s toxicity. Using data acquired from the National Institute of Health’s (NIH) Developmental Therapeutics Program (DTP) we propose a methodology using an area under a curve (AUC) approach that allows us to distinguish which vehicle provides the best toxicity profile for a drug and build classification models based on this knowledge. Our results show that we can achieve prediction accuracies of 80 % using random forest models whilst the decision tree models produce accuracies in the 70 % region. We consider our methodology widely applicable within the scientific domain and beyond for comprehensively building classification models for the comparison of functional relationships between two variables.
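
The AUC-based labelling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which curve is integrated, so the sketch assumes each drug-vehicle pair has a curve of dose against the fraction of animals showing toxicity, that all curves for a drug span the same dose range, and that a smaller area means a better toxicity profile. The function names, vehicle names and numbers are hypothetical, made-up values used only for illustration.

```python
from typing import Dict, List, Tuple

# A curve is a list of (dose, toxic_fraction) points.
# ASSUMPTION: this choice of axes is illustrative, not taken from the paper.
Curve = List[Tuple[float, float]]


def trapezoid_auc(curve: Curve) -> float:
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    pts = sorted(curve)  # order points by dose
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


def best_vehicle(curves: Dict[str, Curve]) -> str:
    """Return the vehicle with the smallest AUC.

    ASSUMPTION: smaller area under the toxicity curve is treated as the
    better toxicity profile; curves must cover the same dose range.
    """
    return min(curves, key=lambda vehicle: trapezoid_auc(curves[vehicle]))


# Hypothetical toy data for one drug tested in two vehicles.
curves = {
    "saline": [(0.0, 0.0), (10.0, 0.2), (20.0, 0.6)],
    "tween80": [(0.0, 0.0), (10.0, 0.1), (20.0, 0.3)],
}

auc_by_vehicle = {name: trapezoid_auc(c) for name, c in curves.items()}
label = best_vehicle(curves)  # class label for this drug
```

A label produced this way for each drug could then serve as the training target for a classifier built on drug descriptors, for example scikit-learn's `DecisionTreeClassifier` or `RandomForestClassifier`; that pairing is a suggestion here, since the abstract does not name the software used.
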
Original language: English
Pages (from-to): 2967-2979
Number of pages: 13
Journal: Soft Computing
Volume: 20
Issue number: 8
Early online date: 20 Nov 2015
DOI: 10.1007/s00500-015-1925-9
Publication status: Published - 1 Aug 2016
Externally published: Yes

Fingerprint

Toxicology
Random Forest
Decision trees
Drugs
Prediction
Toxicity
Learning systems
Machine Learning
Model
Model-based
Functional Relationship
Methodology
Data mining
Classifiers
Industry
Minimise
Predict

Cite this

Mistry, Pritesh ; Neagu, Daniel ; Trundle, Paul R. ; Vessey, Jonathan D. / Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology. In: Soft Computing. 2016 ; Vol. 20, No. 8. pp. 2967-2979.
@article{9b93da02065a4f9c9524d12b9097e013,
title = "Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology",
abstract = "Drug vehicles are chemical carriers that provide beneficial aid to the drugs they bear. Taking advantage of their favourable properties can potentially allow the safer use of drugs that are considered highly toxic. A means for vehicle selection without experimental trial would therefore be of benefit in saving time and money for the industry. Although machine learning is increasingly used in predictive toxicology, to our knowledge there is no reported work in using machine learning techniques to model drug-vehicle relationships for vehicle selection to minimise toxicity. In this paper we demonstrate the use of data mining and machine learning techniques to process, extract and build models based on classifiers (decision trees and random forests) that allow us to predict which vehicle would be most suited to reduce a drug’s toxicity. Using data acquired from the National Institute of Health’s (NIH) Developmental Therapeutics Program (DTP) we propose a methodology using an area under a curve (AUC) approach that allows us to distinguish which vehicle provides the best toxicity profile for a drug and build classification models based on this knowledge. Our results show that we can achieve prediction accuracies of 80 {\%} using random forest models whilst the decision tree models produce accuracies in the 70 {\%} region. We consider our methodology widely applicable within the scientific domain and beyond for comprehensively building classification models for the comparison of functional relationships between two variables.",
author = "Pritesh Mistry and Daniel Neagu and Trundle, {Paul R.} and Vessey, {Jonathan D.}",
year = "2016",
month = "8",
day = "1",
doi = "10.1007/s00500-015-1925-9",
language = "English",
volume = "20",
pages = "2967--2979",
journal = "Soft Computing",
issn = "1432-7643",
publisher = "Springer Verlag",
number = "8",
}

TY  - JOUR
T1  - Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology
AU  - Mistry, Pritesh
AU  - Neagu, Daniel
AU  - Trundle, Paul R.
AU  - Vessey, Jonathan D.
PY  - 2016/8/1
Y1  - 2016/8/1
N2  - Drug vehicles are chemical carriers that provide beneficial aid to the drugs they bear. Taking advantage of their favourable properties can potentially allow the safer use of drugs that are considered highly toxic. A means for vehicle selection without experimental trial would therefore be of benefit in saving time and money for the industry. Although machine learning is increasingly used in predictive toxicology, to our knowledge there is no reported work in using machine learning techniques to model drug-vehicle relationships for vehicle selection to minimise toxicity. In this paper we demonstrate the use of data mining and machine learning techniques to process, extract and build models based on classifiers (decision trees and random forests) that allow us to predict which vehicle would be most suited to reduce a drug’s toxicity. Using data acquired from the National Institute of Health’s (NIH) Developmental Therapeutics Program (DTP) we propose a methodology using an area under a curve (AUC) approach that allows us to distinguish which vehicle provides the best toxicity profile for a drug and build classification models based on this knowledge. Our results show that we can achieve prediction accuracies of 80 % using random forest models whilst the decision tree models produce accuracies in the 70 % region. We consider our methodology widely applicable within the scientific domain and beyond for comprehensively building classification models for the comparison of functional relationships between two variables.
U2  - 10.1007/s00500-015-1925-9
DO  - 10.1007/s00500-015-1925-9
M3  - Article
VL  - 20
SP  - 2967
EP  - 2979
JO  - Soft Computing
JF  - Soft Computing
SN  - 1432-7643
IS  - 8
ER  -