A key feature of big data is the variety of data sources available, which include not just numerical data but also image data, video data and even free text. The GB railways collect a large volume of free-text data daily from railway workers in the form of close call hazard reports, which describe instances where an accident could have occurred but did not. These close call reports contain valuable safety information that could be useful in managing safety on the railway, but this information can be lost in a volume of data far larger than a human analyst could viably read. This paper describes the application of rudimentary natural language processing (NLP) techniques to uncover safety information from close calls. The analysis has shown that basic information extraction is possible using these rudimentary techniques, but has also identified some limitations that arise when only basic techniques are used. Building on these findings, further research in this area intends to examine how the techniques proven to date can be improved through more advanced NLP techniques coupled with machine learning.
|Number of pages||15|
|Journal||Safety and Reliability|
|Early online date||29 Nov 2016|
|Publication status||Published - 2016|