Learning from text-based close call data

Peter Hughes, Miguel Figueres Esteban, Coen Van Gulijk

Research output: Contribution to journal › Article

Abstract

A key feature of big data is the variety of data sources that are available, which include not just numerical data but also image or video data and even free text. GB railways collect a large volume of free-text data daily from railway workers in the form of close call hazard reports: descriptions of instances where an accident could have occurred but did not. These close call reports contain valuable safety information that could be useful in managing safety on the railway, but that information can be lost in the very large volume of data, which is much larger than a human analyst could viably read. This paper describes the application of rudimentary natural language processing (NLP) techniques to uncover safety information from close calls. The analysis has shown that basic information extraction is possible using the rudimentary techniques, but has also identified some limitations that arise when only basic techniques are used. Building on these findings, further research in this area intends to look at how the techniques proven to date can be improved with the use of more advanced NLP techniques coupled with machine learning.
Original language: English
Pages (from-to): 184-198
Number of pages: 15
Journal: Safety and Reliability
Volume: 36
Issue number: 3
Early online date: 29 Nov 2016
DOI: https://doi.org/10.1080/09617353.2016.1252083
Publication status: Published - 2016

Fingerprint

Processing
Learning systems
Hazards
Accidents
Big data

Cite this

Hughes, Peter ; Figueres Esteban, Miguel ; Van Gulijk, Coen. / Learning from text-based close call data. In: Safety and Reliability. 2016 ; Vol. 36, No. 3. pp. 184-198.
@article{6b300c09334c4a5cbea310c45fcc1a24,
title = "Learning from text-based close call data",
abstract = "A key feature of big data is the variety of data sources that are available, which include not just numerical data but also image or video data and even free text. GB railways collect a large volume of free-text data daily from railway workers in the form of close call hazard reports: descriptions of instances where an accident could have occurred but did not. These close call reports contain valuable safety information that could be useful in managing safety on the railway, but that information can be lost in the very large volume of data, which is much larger than a human analyst could viably read. This paper describes the application of rudimentary natural language processing (NLP) techniques to uncover safety information from close calls. The analysis has shown that basic information extraction is possible using the rudimentary techniques, but has also identified some limitations that arise when only basic techniques are used. Building on these findings, further research in this area intends to look at how the techniques proven to date can be improved with the use of more advanced NLP techniques coupled with machine learning.",
keywords = "natural language processing, railway safety, close calls",
author = "Peter Hughes and {Figueres Esteban}, Miguel and {Van Gulijk}, Coen",
year = "2016",
doi = "10.1080/09617353.2016.1252083",
language = "English",
volume = "36",
pages = "184--198",
journal = "Safety and Reliability",
issn = "0961-7353",
number = "3",

}

Hughes, P, Figueres Esteban, M & Van Gulijk, C 2016, 'Learning from text-based close call data', Safety and Reliability, vol. 36, no. 3, pp. 184-198. https://doi.org/10.1080/09617353.2016.1252083

Learning from text-based close call data. / Hughes, Peter; Figueres Esteban, Miguel; Van Gulijk, Coen.

In: Safety and Reliability, Vol. 36, No. 3, 2016, p. 184-198.

TY - JOUR
T1 - Learning from text-based close call data
AU - Hughes, Peter
AU - Figueres Esteban, Miguel
AU - Van Gulijk, Coen
PY - 2016
Y1 - 2016
AB - A key feature of big data is the variety of data sources that are available, which include not just numerical data but also image or video data and even free text. GB railways collect a large volume of free-text data daily from railway workers in the form of close call hazard reports: descriptions of instances where an accident could have occurred but did not. These close call reports contain valuable safety information that could be useful in managing safety on the railway, but that information can be lost in the very large volume of data, which is much larger than a human analyst could viably read. This paper describes the application of rudimentary natural language processing (NLP) techniques to uncover safety information from close calls. The analysis has shown that basic information extraction is possible using the rudimentary techniques, but has also identified some limitations that arise when only basic techniques are used. Building on these findings, further research in this area intends to look at how the techniques proven to date can be improved with the use of more advanced NLP techniques coupled with machine learning.
KW - natural language processing
KW - railway safety
KW - close calls
UR - http://www.tandfonline.com/toc/tsar20/36/3?nav=tocList
U2 - 10.1080/09617353.2016.1252083
DO - 10.1080/09617353.2016.1252083
M3 - Article
VL - 36
SP - 184
EP - 198
JO - Safety and Reliability
JF - Safety and Reliability
SN - 0961-7353
IS - 3
ER -