Constructing Multiple Domain Taxonomy for Text Processing Tasks

Yihong Zhang, Yongrui Qin, Longkun Guo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In recent years, large volumes of short text data can be easily collected from platforms such as microblogs and product review sites. Very often the obtained short text data spans several domains, which makes effective multi-domain text processing difficult because the domains are hard to distinguish within the text. The concept of multiple domain taxonomy (MDT) has shown promising performance in processing multi-domain text data. However, an MDT has to be constructed manually, which requires much expert knowledge about the relevant domains and is time-consuming. To address these issues, in this paper, we introduce a semi-automatic method to construct an MDT that requires only a small amount of manual input, in combination with an unsupervised method for ranking multi-domain concepts based on semantic relationships learned from unlabeled data. We show that the MDT constructed iteratively with our semi-automatic method achieves higher accuracy than existing methods in domain classification, where the accuracy can be improved by up to 11%.
Original language: English
Title of host publication: 29th International Conference on Database and Expert Systems Applications (DEXA 2018)
Place of Publication: Cham
Publisher: Springer Verlag
Pages: 501-509
Number of pages: 9
Edition: 1st
ISBN (Electronic): 9783319988122
ISBN (Print): 9783319988115
DOIs
Publication status: Published - 9 Aug 2018
Event: 29th International Conference on Database and Expert Systems Applications - Regensburg, Germany
Duration: 3 Sep 2018 → 6 Sep 2018
Conference number: 29
http://www.dexa.org/dexa2018 (Link to Conference Website)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer

Conference

Conference: 29th International Conference on Database and Expert Systems Applications
Abbreviated title: DEXA 2018
Country/Territory: Germany
City: Regensburg
Period: 3/09/18 → 6/09/18
Internet address
