Real-Time Queries on Large Volumes of Safety Text

Matthew Newall, Coen Van Gulijk

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

It is often necessary to parse large volumes of text when carrying out Safety and Risk Management duties. One example is the Close Call system, operated in the UK to log safety-related incidents on the GB railways. Approximately 300,000 unstructured text reports are added each year. Traditionally, locating and categorizing potential risk indicators in the Close Call text (and in other systems like it) has been a human task. Though steps have been taken towards augmenting this with computer-based analysis, real-time feedback has not been possible. This paper discusses a platform that allows real-time queries on large volumes of text. A novel application of integer-based hashing is applied to n-grams of the text. Using this method in combination with search optimizations such as binary searching (which would be cumbersome or impossible to perform on unmodified text), it can be shown that pattern-matching performance improves by several orders of magnitude compared to brute-force matching, or even to more developed methods such as the Boyer-Moore algorithm.
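The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: hash each n-gram to an integer, keep the hashes sorted, and answer a query with a binary search instead of scanning the raw text. The hash function (CRC32), the whitespace tokenizer, the function names, and the sample reports are all assumptions made for illustration; they are not taken from the paper or from Close Call data.

```python
import bisect
import zlib


def ngram_hash(tokens):
    """Map a sequence of tokens to a 32-bit integer (CRC32 is an assumed choice)."""
    return zlib.crc32(" ".join(tokens).encode("utf-8"))


def tokenize(text):
    """Very simple lowercase whitespace tokenizer (an assumption, not the paper's)."""
    return text.lower().split()


def build_index(reports, n):
    """Return (index, tokenized) where index is a sorted list of
    (hash, report_id, token_offset) entries, one per n-gram."""
    tokenized = [tokenize(r) for r in reports]
    index = []
    for report_id, tokens in enumerate(tokenized):
        for i in range(len(tokens) - n + 1):
            index.append((ngram_hash(tokens[i:i + n]), report_id, i))
    index.sort()  # sorting the integer keys is what makes binary search possible
    return index, tokenized


def query(index, tokenized, phrase, n):
    """Find every (report_id, token_offset) where the n-token phrase occurs."""
    tokens = tokenize(phrase)
    if len(tokens) != n:
        raise ValueError(f"phrase must contain exactly {n} tokens")
    h = ngram_hash(tokens)
    # (h,) sorts before any (h, report_id, offset), so this finds the first match.
    pos = bisect.bisect_left(index, (h,))
    hits = []
    while pos < len(index) and index[pos][0] == h:
        _, report_id, offset = index[pos]
        # Re-check the tokens to rule out hash collisions.
        if tokenized[report_id][offset:offset + n] == tokens:
            hits.append((report_id, offset))
        pos += 1
    return hits


if __name__ == "__main__":
    # Invented example reports, for illustration only.
    reports = [
        "Worker seen crossing the track without lookout",
        "Trip hazard near platform edge",
        "Vehicle left on the track without lights",
    ]
    index, tokenized = build_index(reports, n=2)
    print(query(index, tokenized, "track without", n=2))
```

The index is built once, so each query costs one hash computation plus a logarithmic-time lookup, whereas brute-force or Boyer-Moore matching has to scan the text on every query.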
Language: English
Title of host publication: Proceedings of 29th European Safety and Reliability Conference
Editors: Michael Beer, Enrico Zio
Pages: 1800-1803
Number of pages: 4
Volume: 1
Edition: 1
ISBN (Electronic): 9789811127243
Publication status: Published - 2019
Event: 29th European Safety and Reliability Conference - Leibniz Universität, Hannover, Germany
Duration: 22 Sep 2019 - 26 Sep 2019
Conference number: 29
https://esrel2019.org/#/

Conference

Conference: 29th European Safety and Reliability Conference
Abbreviated title: ESREL 2019
Country: Germany
City: Hannover
Period: 22/09/19 - 26/09/19

Fingerprint

Pattern matching
Risk management
Feedback

Cite this

Newall, M., & Van Gulijk, C. (2019). Real-Time Queries on Large Volumes of Safety Text. In M. Beer, & E. Zio (Eds.), Proceedings of 29th European Safety and Reliability Conference (1 ed., Vol. 1, pp. 1800-1803).
Newall, Matthew ; Van Gulijk, Coen. / Real-Time Queries on Large Volumes of Safety Text. Proceedings of 29th European Safety and Reliability Conference. editor / Michael Beer ; Enrico Zio. Vol. 1 1. ed. 2019. pp. 1800-1803
@inproceedings{5e9606537fac43208464c5a7d8dc0d38,
title = "Real-Time Queries on Large Volumes of Safety Text",
author = "Matthew Newall and {Van Gulijk}, Coen",
year = "2019",
language = "English",
volume = "1",
pages = "1800--1803",
editor = "Michael Beer and Enrico Zio",
booktitle = "Proceedings of 29th European Safety and Reliability Conference",
edition = "1",

}

Newall, M & Van Gulijk, C 2019, Real-Time Queries on Large Volumes of Safety Text. in M Beer & E Zio (eds), Proceedings of 29th European Safety and Reliability Conference. 1 edn, vol. 1, pp. 1800-1803, 29th European Safety and Reliability Conference, Hannover, Germany, 22/09/19.

Real-Time Queries on Large Volumes of Safety Text. / Newall, Matthew; Van Gulijk, Coen.

Proceedings of 29th European Safety and Reliability Conference. ed. / Michael Beer; Enrico Zio. Vol. 1 1. ed. 2019. p. 1800-1803.


TY - GEN

T1 - Real-Time Queries on Large Volumes of Safety Text

AU - Newall, Matthew

AU - Van Gulijk, Coen

PY - 2019

Y1 - 2019

AB - It is often necessary to parse large volumes of text in the process of carrying out Safety and Risk Management duties. One example of this is the Close Call system, operated in the UK to log safety related incidents on the GB railways. Approximately 300,000 unstructured text reports are added each year. Traditionally, locating and categorizing potential risk indicators in the Close Call text (and other systems like it) has been a human task. Though steps have been taken towards augmenting this with computer-based analysis, real-time feedback has not been possible. This paper will discuss a platform which allows real-time queries on large volumes of text. A novel application of Integer based hashing is applied to n-grams of the text. Using this method, in combination with search optimizations such as binary searching (which would be cumbersome or impossible to perform on unmodified text) it can be shown that pattern matching performance is improved by several orders of magnitude when compared to Brute force matching, or even more developed methods such as the Boyer-Moore algorithm.

UR - http://itekcmsonline.com/rps2prod/esrel2019/e-proceedings/html/copyright.html

M3 - Conference contribution

VL - 1

SP - 1800

EP - 1803

BT - Proceedings of 29th European Safety and Reliability Conference

A2 - Beer, Michael

A2 - Zio, Enrico

ER -

Newall M, Van Gulijk C. Real-Time Queries on Large Volumes of Safety Text. In Beer M, Zio E, editors, Proceedings of 29th European Safety and Reliability Conference. 1 ed. Vol. 1. 2019. p. 1800-1803.