Constructing Multiple Domain Taxonomy for Text Processing Tasks

Yihong Zhang, Yongrui Qin, Longkun Guo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In recent years, large volumes of short text data have become easy to collect from platforms such as microblogs and product review sites. Very often the collected short text data spans several domains, which makes effective multi-domain text processing challenging because the domains are hard to distinguish within the text. The concept of a multiple domain taxonomy (MDT) has shown promising performance in processing multi-domain text data. However, an MDT has to be constructed manually, which requires considerable expert knowledge of the relevant domains and is time-consuming. To address these issues, in this paper we introduce a semi-automatic method for constructing an MDT that requires only a small amount of manual input, in combination with an unsupervised method for ranking multi-domain concepts based on semantic relationships learned from unlabeled data. We show that an MDT constructed iteratively with our semi-automatic method achieves higher accuracy than existing methods in domain classification, improving accuracy by up to 11%.
Language: English
Title of host publication: 29th International Conference on Database and Expert Systems Applications (DEXA 2018)
Place of Publication: Cham
Publisher: Springer Verlag
Pages: 501-509
Number of pages: 9
Edition: 1st
ISBN (Electronic): 9783319988122
ISBN (Print): 9783319988115
DOIs: https://doi.org/10.1007/978-3-319-98812-2_46
Publication status: Published - 9 Aug 2018
Event: 29th International Conference on Database and Expert Systems Applications - Regensburg, Germany
Duration: 3 Sep 2018 to 6 Sep 2018
Conference number: 29
http://www.dexa.org/dexa2018 (Link to Conference Website)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer

Conference

Conference: 29th International Conference on Database and Expert Systems Applications
Abbreviated title: DEXA 2018
Country: Germany
City: Regensburg
Period: 3/09/18 to 6/09/18
Internet address: http://www.dexa.org/dexa2018

Fingerprint

Text processing
Taxonomies
Semantics
Processing

Cite this

Zhang, Y., Qin, Y., & Guo, L. (2018). Constructing Multiple Domain Taxonomy for Text Processing Tasks. In 29th International Conference on Database and Expert Systems Applications (DEXA 2018) (1st ed., pp. 501-509). (Lecture Notes in Computer Science). Cham: Springer Verlag. https://doi.org/10.1007/978-3-319-98812-2_46
@inproceedings{18c563e5701744c4834725cdc3d62823,
title = "Constructing Multiple Domain Taxonomy for Text Processing Tasks",
abstract = "In recent years, large volumes of short text data have become easy to collect from platforms such as microblogs and product review sites. Very often the collected short text data spans several domains, which makes effective multi-domain text processing challenging because the domains are hard to distinguish within the text. The concept of a multiple domain taxonomy (MDT) has shown promising performance in processing multi-domain text data. However, an MDT has to be constructed manually, which requires considerable expert knowledge of the relevant domains and is time-consuming. To address these issues, in this paper we introduce a semi-automatic method for constructing an MDT that requires only a small amount of manual input, in combination with an unsupervised method for ranking multi-domain concepts based on semantic relationships learned from unlabeled data. We show that an MDT constructed iteratively with our semi-automatic method achieves higher accuracy than existing methods in domain classification, improving accuracy by up to 11{\%}.",
author = "Yihong Zhang and Yongrui Qin and Longkun Guo",
year = "2018",
month = "8",
day = "9",
doi = "10.1007/978-3-319-98812-2_46",
language = "English",
isbn = "9783319988115",
series = "Lecture Notes in Computer Science",
publisher = "Springer Verlag",
pages = "501--509",
booktitle = "29th International Conference on Database and Expert Systems Applications (DEXA 2018)",
edition = "1st",

}



TY - GEN

T1 - Constructing Multiple Domain Taxonomy for Text Processing Tasks

AU - Zhang, Yihong

AU - Qin, Yongrui

AU - Guo, Longkun

PY - 2018/8/9

Y1 - 2018/8/9

N2 - In recent years, large volumes of short text data have become easy to collect from platforms such as microblogs and product review sites. Very often the collected short text data spans several domains, which makes effective multi-domain text processing challenging because the domains are hard to distinguish within the text. The concept of a multiple domain taxonomy (MDT) has shown promising performance in processing multi-domain text data. However, an MDT has to be constructed manually, which requires considerable expert knowledge of the relevant domains and is time-consuming. To address these issues, in this paper we introduce a semi-automatic method for constructing an MDT that requires only a small amount of manual input, in combination with an unsupervised method for ranking multi-domain concepts based on semantic relationships learned from unlabeled data. We show that an MDT constructed iteratively with our semi-automatic method achieves higher accuracy than existing methods in domain classification, improving accuracy by up to 11%.

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85052786808&origin=resultslist&sort=plf-f&src=s&st1=Constructing+Multiple+Domain+Taxonomy+for+Text+Processing+Tasks&st2=&sid=64909e95f38eba38242c15f9c6964585&sot=b&sdt=b&sl=78&s=TITLE-ABS-KEY%28Constructing+Multiple+Domain+Taxonomy+for+Text+Processing+Tasks%29&relpos=0&citeCnt=0&searchTerm=

U2 - 10.1007/978-3-319-98812-2_46

DO - 10.1007/978-3-319-98812-2_46

M3 - Conference contribution

SN - 9783319988115

T3 - Lecture Notes in Computer Science

SP - 501

EP - 509

BT - 29th International Conference on Database and Expert Systems Applications (DEXA 2018)

PB - Springer Verlag

CY - Cham

ER -
