The Impact of Different Training Sets on Medical Documents Classification

Roberto Gatta, Mauro Vallati, Berardino De Bari, Mahmut Ozsahin

Research output: Contribution to journal › Conference article

1 Citation (Scopus)

Abstract

Clinical documents stored as unstructured text are a valuable source of information that can be extracted using Information Retrieval techniques. Classification algorithms can be used to organize this large volume of data, but they are usually tested on standardized corpora, which differ significantly from the clinical documents actually found in a modern hospital. As a result, observed performance differs from expected performance. Given these differences, it is unclear what the "right" training set should look like, and how its characteristics affect classification performance. In this paper we present the results of an experimental analysis, conducted on actual clinical documents from a medical department, which aims to evaluate the impact of differently sized and assembled training sets on well-known classification techniques.
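The question raised in the abstract, how training-set size affects a classifier, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifiers or describe the data, so the sketch uses a hand-written multinomial Naive Bayes and synthetic toy documents, and names such as `make_corpus` and `learning_curve` are hypothetical. It is not the authors' experimental setup.

```python
import math
import random
from collections import Counter, defaultdict

# Synthetic vocabulary per class; purely illustrative, not real clinical data.
CLASS_WORDS = {
    "radiology": ["scan", "image", "contrast", "lesion", "ct"],
    "discharge": ["discharged", "home", "follow", "therapy", "stable"],
}
SHARED_WORDS = ["patient", "the", "of", "with", "report", "date"]


def make_document(label, rng, length=12):
    """Build a noisy toy document mixing class-specific and shared words."""
    words = []
    for _ in range(length):
        pool = CLASS_WORDS[label] if rng.random() < 0.5 else SHARED_WORDS
        words.append(rng.choice(pool))
    return words


def make_corpus(n_docs, rng):
    """Return a list of (words, label) pairs with randomly chosen labels."""
    labels = sorted(CLASS_WORDS)
    corpus = []
    for _ in range(n_docs):
        label = rng.choice(labels)
        corpus.append((make_document(label, rng), label))
    return corpus


def train_naive_bayes(corpus):
    """Count documents per class and words per class; collect the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in corpus:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(model, words):
    """Pick the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in words:
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test_set):
    """Fraction of test documents whose predicted label matches the true one."""
    correct = sum(predict(model, words) == label for words, label in test_set)
    return correct / len(test_set)


def learning_curve(sizes, seed=0):
    """Train on the first n documents of one pool, for each n in sizes."""
    rng = random.Random(seed)
    pool = make_corpus(max(sizes), rng)
    test_set = make_corpus(200, rng)  # held-out documents, fixed across sizes
    return {n: accuracy(train_naive_bayes(pool[:n]), test_set) for n in sizes}


if __name__ == "__main__":
    for n, acc in learning_curve([2, 10, 50, 200]).items():
        print(f"training size {n:4d}: accuracy {acc:.2f}")
```

Keeping the test set fixed while only the training prefix grows means any change in accuracy can be attributed to the training set alone. Varying how the pool is assembled (for example, its class balance) would be a natural extension of the same loop.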

Original language: English
Pages (from-to): 1-5
Number of pages: 5
Journal: CEUR Workshop Proceedings
Volume: 1213
Publication status: Published - 17 Aug 2014
Event: 3rd International Workshop on Artificial Intelligence and Assistive Medicine - Prague, Czech Republic
Duration: 18 Aug 2014 - 18 Aug 2014
Conference number: 3

Fingerprint

Information retrieval

Cite this

Gatta, Roberto ; Vallati, Mauro ; De Bari, Berardino ; Ozsahin, Mahmut. / The Impact of Different Training Sets on Medical Documents Classification. In: CEUR Workshop Proceedings. 2014 ; Vol. 1213. pp. 1-5.
@article{52255892a34b4194bb773c8e4078286f,
title = "The Impact of Different Training Sets on Medical Documents Classification",
abstract = "Clinical documents stored as unstructured text are a valuable source of information that can be extracted using Information Retrieval techniques. Classification algorithms can be used to organize this large volume of data, but they are usually tested on standardized corpora, which differ significantly from the clinical documents actually found in a modern hospital. As a result, observed performance differs from expected performance. Given these differences, it is unclear what the {"}right{"} training set should look like, and how its characteristics affect classification performance. In this paper we present the results of an experimental analysis, conducted on actual clinical documents from a medical department, which aims to evaluate the impact of differently sized and assembled training sets on well-known classification techniques.",
author = "Roberto Gatta and Mauro Vallati and {De Bari}, Berardino and Mahmut Ozsahin",
year = "2014",
month = "8",
day = "17",
language = "English",
volume = "1213",
pages = "1--5",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR Workshop Proceedings",

}

The Impact of Different Training Sets on Medical Documents Classification. / Gatta, Roberto; Vallati, Mauro; De Bari, Berardino; Ozsahin, Mahmut.

In: CEUR Workshop Proceedings, Vol. 1213, 17.08.2014, p. 1-5.

TY - JOUR

T1 - The Impact of Different Training Sets on Medical Documents Classification

AU - Gatta, Roberto

AU - Vallati, Mauro

AU - De Bari, Berardino

AU - Ozsahin, Mahmut

PY - 2014/8/17

Y1 - 2014/8/17

AB - Clinical documents stored as unstructured text are a valuable source of information that can be extracted using Information Retrieval techniques. Classification algorithms can be used to organize this large volume of data, but they are usually tested on standardized corpora, which differ significantly from the clinical documents actually found in a modern hospital. As a result, observed performance differs from expected performance. Given these differences, it is unclear what the "right" training set should look like, and how its characteristics affect classification performance. In this paper we present the results of an experimental analysis, conducted on actual clinical documents from a medical department, which aims to evaluate the impact of differently sized and assembled training sets on well-known classification techniques.

UR - http://www.scopus.com/inward/record.url?scp=84920877198&partnerID=8YFLogxK

M3 - Conference article

VL - 1213

SP - 1

EP - 5

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -