HPC and the Big Data challenge

Violeta Holmes, Matthew Newall

Research output: Contribution to journal › Article

Abstract

High performance computing (HPC) and Big Data are technologies vital for advancement in science, business and industry. HPC combines the computing power of supercomputers and computer clusters with parallel and distributed processing techniques to solve complex computational problems. The term Big Data refers to the fact that more data are being produced, consumed and stored than ever before, resulting in datasets that are too large, complex and/or dynamic to be managed and analysed by traditional methods. Access to HPC systems, and the ability to model, simulate and manipulate massive and dynamic data, is now critical for research, business and innovation. This paper presents an overview of HPC and Big Data technology. It outlines the advances in computer technology enabling peta- and exascale and energy-efficient computing, and the Big Data challenges of extracting meaning and new information from the data. As an example of HPC and Big Data synergy in risk analysis, a case study of processing close-call data was conducted using HPC resources at the University of Huddersfield. A parallel program was designed and implemented on the university's Hadoop cluster to speed up the processing of unstructured free-form text records pertaining to close-call railway events, in order to identify potential risks and incidents. The case study demonstrates the benefits of using HPC with parallel programming techniques, and the improvements achieved compared to serial processing on a standard workstation. However, it also highlights challenges in the risk analysis of Big Data that require novel approaches to HPC system and software design.
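To make the close-call case study more concrete, the following is a minimal sketch of the map, shuffle and reduce pattern that Hadoop applies to text records. It is not the authors' program: the risk vocabulary `RISK_TERMS`, the function names and the sample records are all hypothetical, and the sketch runs in a single process, whereas a Hadoop cluster would distribute the map and reduce steps across nodes.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical risk vocabulary; the terms used in the paper are not given here.
RISK_TERMS = {"trespass", "signal", "slip", "trip", "fall", "obstruction"}


def map_record(record):
    """Map step: emit (term, 1) once for each risk term found in one record."""
    tokens = set(re.findall(r"[a-z]+", record.lower()))
    return [(term, 1) for term in sorted(tokens & RISK_TERMS)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_risk_terms(records):
    """Run map, shuffle and reduce over an iterable of free-form text records."""
    mapped = chain.from_iterable(map_record(r) for r in records)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        "Worker had a slip on wet ballast near the signal.",
        "Trespass seen at the crossing; signal reported.",
        "No issues.",
    ]
    print(count_risk_terms(sample))
```

Because each record is mapped independently, the map step can in principle be split across as many workers as there are input partitions, which is the property that a cluster exploits to outperform serial processing.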
Language: English
Pages: 213-224
Number of pages: 12
Journal: Safety and Reliability
Volume: 36
Issue number: 3
Early online date: 29 Nov 2016
DOI: 10.1080/09617353.2016.1252085
Publication status: Published - 2016

Fingerprint

Risk analysis
Processing
Industry
Parallel programming
Computer workstations
Supercomputers
Software design
Big data
Computer systems
Innovation
Systems analysis

Cite this

Holmes, Violeta; Newall, Matthew. / HPC and the Big Data challenge. In: Safety and Reliability. 2016; Vol. 36, No. 3. pp. 213-224.
@article{823f08bc3dba48fc85003dea463a6c95,
title = "HPC and the Big Data challenge",
keywords = "HPC, Big Data, Hadoop, Risk analysis",
author = "Violeta Holmes and Matthew Newall",
year = "2016",
doi = "10.1080/09617353.2016.1252085",
language = "English",
volume = "36",
pages = "213--224",
journal = "Safety and Reliability",
issn = "0961-7353",
number = "3",

}

