The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot

David M. Andrews, Laura M. Broad, Paul J. Edwards, David N. A. Fox, Timothy Gallagher, Stephen L. Garland, Richard Kidd, Joseph B. Sweeney

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

We present a summary of the National Compound Collection (NCC) pilot, which harvested chemical structure data from 746 publicly available PhD theses to create an enhanced database of diverse and interesting (largely organic) molecular entities. The database comprised ∼75 000 structure entries, of which 70% were new to ChemSpider at the time of upload. The dataset was evaluated for structural uniqueness by twelve external drug discovery groups from the pharmaceutical, biotech, academic and not-for-profit sectors. These partners generated the data reported here, comparing the NCC pilot with their in-house compound collections. The proportion of NCC structures considered useful for drug discovery ranged from 5% to 80%, depending on the strictness of the filters used; most interestingly from a drug discovery standpoint, ∼13k NCC compounds (18% of the NCC) passed the filters and were of good diversity. These compounds are quite different from those already present in the screening collections, but not so different that they are no longer considered drug-like. In general, the drug discovery teams would consider these compounds to be high-value molecules for inclusion in their screening collections. This pilot addressed the potential value of unpublished data and explored the practicalities of large-scale data extraction, to inform both retrospective and prospective extraction of chemical data from theses.
Language: English
Pages: 3869-3878
Journal: Chemical Science
Volume: 7
Issue number: 6
DOI: https://doi.org/10.1039/C6SC00264A
Publication status: Published - 23 Feb 2016

Fingerprint

Screening
Pharmaceutical Preparations
Profitability
Molecules
Drug Discovery
compound 18

Cite this

Andrews, D. M., Broad, L. M., Edwards, P. J., Fox, D. N. A., Gallagher, T., Garland, S. L., ... Sweeney, J. B. (2016). The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot. Chemical Science, 7(6), 3869-3878. https://doi.org/10.1039/C6SC00264A
@article{d1bec6b45ebe4c38953870f8996c4091,
title = "The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot",
author = "Andrews, {David M.} and Broad, {Laura M.} and Edwards, {Paul J.} and Fox, {David N. A.} and Timothy Gallagher and Garland, {Stephen L.} and Richard Kidd and Sweeney, {Joseph B.}",
year = "2016",
month = "2",
day = "23",
doi = "10.1039/C6SC00264A",
language = "English",
volume = "7",
pages = "3869--3878",
journal = "Chemical Science",
issn = "2041-6520",
publisher = "Royal Society of Chemistry",
number = "6",

}
