Abstract
XML has become the standard way of representing and exchanging data over the World Wide Web. However, XML documents are highly redundant, so they demand large storage capacity and high network bandwidth for transmission. This study designs a system for compressing and querying XML documents (XMLCQ) that compresses an XML document without requiring its schema or DTD, which minimizes the number of auxiliary technologies associated with these documents. XMLCQ first separates the document's data into containers according to the path of each data item from the root to the leaf, then compresses these containers using a back-end compression technique. The compressed file can then be queried with any kind of query; only the required information is decompressed and returned to the user. In several experiments, the query processor part of the system was able to answer different kinds of queries, ranging from simple exact-match queries to complex ones. Furthermore, this paper introduces the idea of retrieving information from more than one compressed XML document.
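The path-based container idea described in the abstract can be illustrated with a minimal sketch. This is not the XMLCQ implementation: the function names (`build_containers`, `compress_containers`, `exact_match`) are hypothetical, `zlib` is an assumed stand-in for the unspecified back-end compressor, and the sketch ignores attributes, mixed content, and the structural information needed to rebuild the document. It also assumes that leaf values contain no newline characters, since a newline is used as the separator inside each container. It shows only that an exact-match query on one path needs to decompress just that path's container.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_containers(xml_text):
    """Group leaf text values by their root-to-leaf path (no schema or DTD needed)."""
    root = ET.fromstring(xml_text)
    containers = defaultdict(list)

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        if len(elem) == 0:
            # Leaf element: store its text in the container for this path.
            containers[path].append((elem.text or "").strip())
        for child in elem:
            walk(child, path)

    walk(root, "")
    return dict(containers)


def compress_containers(containers):
    """Compress each container independently (zlib is an assumed back end)."""
    return {
        path: zlib.compress("\n".join(values).encode("utf-8"))
        for path, values in containers.items()
    }


def exact_match(compressed, path, value):
    """Decompress only the container for `path`; return positions equal to `value`."""
    blob = compressed.get(path)
    if blob is None:
        return []
    values = zlib.decompress(blob).decode("utf-8").split("\n")
    return [i for i, v in enumerate(values) if v == value]
```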
Original language | English |
---|---|
Title of host publication | Information Retrieval Methods for Multidisciplinary Applications |
Publisher | IGI Global |
Chapter | 7 |
Pages | 95-115 |
Number of pages | 21 |
ISBN (Electronic) | 9781466638990 |
ISBN (Print) | 1466638982, 9781466638983 |
Publication status | Published - 30 Apr 2013 |