TopX: efficient and versatile top-k query processing for semistructured data
Publication | |
---|---|
Authors: | Martin Theobald, Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Gerhard Weikum |
Citation: | The VLDB Journal 17 (1): 81-115. 2008. |
Publication type: | Journal article |
Peer-reviewed: | Yes |
Database(s): | |
DOI: | 10.1007/s00778-007-0072-z |
Google Scholar cites: | Citations |
Link(s): | Paper link |
Added by Wikilit team: | Added on initial load |
Abstract
Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.
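The early-termination idea in the abstract can be illustrated with a minimal sketch of the textbook threshold algorithm over score-sorted index lists. This is not the TopX implementation, which the paper describes as an extended variant with its own scheduling and pruning. The function name `threshold_top_k`, the in-memory list layout, and the choice of summation as the monotonic aggregation function are all assumptions made for illustration.

```python
import heapq
from typing import List, Tuple

Posting = Tuple[str, float]  # (item id, local score); hypothetical layout


def threshold_top_k(index_lists: List[List[Posting]], k: int) -> List[Tuple[str, float]]:
    """Textbook threshold algorithm with summation as the aggregation function.

    Each list in index_lists must be sorted by descending local score.
    """
    # Dictionaries stand in for random access into each index list (assumption).
    lookup = [dict(lst) for lst in index_lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap holding the current best k
    max_depth = max((len(lst) for lst in index_lists), default=0)

    for depth in range(max_depth):
        last_scores = []
        for lst in index_lists:
            if depth >= len(lst):
                last_scores.append(0.0)  # exhausted list contributes nothing
                continue
            item, score = lst[depth]  # sorted access
            last_scores.append(score)
            if item in seen:
                continue
            seen.add(item)
            # Random access: fetch the item's score in every list and aggregate.
            total = sum(table.get(item, 0.0) for table in lookup)
            if len(top) < k:
                heapq.heappush(top, (total, item))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, item))
        # No unseen item can score above the sum of the last scores read.
        threshold = sum(last_scores)
        if len(top) == k and top[0][0] >= threshold:
            break  # the top-k is safe, so stop scanning early

    return sorted(((item, s) for s, item in top), key=lambda p: (-p[1], p[0]))


lists = [
    [("d1", 0.9), ("d2", 0.8), ("d3", 0.1)],
    [("d2", 0.7), ("d3", 0.6), ("d1", 0.2)],
]
print(threshold_top_k(lists, 1))
```

With `k = 1` the scan stops after two rows of sorted access, because the best aggregated score already meets the threshold computed from the last scores read. The stopping test is only valid because the aggregation function is monotonic.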
Research questions
"The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia."
Research details
Topics: | Query processing |
Domains: | Computer science |
Theory type: | Analysis, Design and action |
Wikipedia coverage: | Other |
Theories: | "Undetermined" |
Research design: | Experiment |
Data source: | |
Collected data time dimension: | Cross-sectional |
Unit of analysis: | Article |
Wikipedia data extraction: | Clone |
Wikipedia page type: | Article |
Wikipedia language: | Not specified |
Conclusion
"TopX is an efficient and effective search engine for nonschematic XML documents, with the full functionality of XPath Full-Text and supporting also the entire range of text, semistructured, and structured data. It achieves effective ranked retrieval by means of an XML-specific extension of the probabilistic-IR BM25 relevance scoring model, and also leverages thesauri and ontologies for word-sense disambiguation and robust query expansion. It achieves scalability and efficient top-k query processing by means of extended threshold algorithms with specific priority-queue management, judicious scheduling of random accesses to index entries, and probabilistic score predictions for early pruning of top-k candidates."
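For orientation, the classic document-level BM25 term weight that the conclusion refers to can be sketched as follows. This shows only the standard model and not the XML-specific extension described in the paper. The function name `bm25_term_score`, the parameter defaults `k1 = 1.2` and `b = 0.75`, and the non-negative IDF smoothing are assumptions chosen for illustration.

```python
import math


def bm25_term_score(tf: float, df: int, num_docs: int,
                    doc_len: float, avg_doc_len: float,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Classic BM25 weight of one query term in one document (illustrative)."""
    # Inverse document frequency, smoothed so that it never goes negative.
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: extra occurrences add less and less.
    return idf * tf * (k1 + 1.0) / (tf + norm)


def bm25_score(term_stats, num_docs, doc_len, avg_doc_len):
    """Sum of per-term weights; term_stats is a list of (tf, df) pairs."""
    return sum(bm25_term_score(tf, df, num_docs, doc_len, avg_doc_len)
               for tf, df in term_stats)
```

Because the total score is a sum of non-negative per-term weights, it is a monotonic aggregation, which is the property that threshold-style top-k processing relies on.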
Comments
"TopX is an efficient and effective search engine for nonschematic XML documents, with the full functionality of XPath Full-Text and supporting also the entire range of text, semistructured, and structured data."
Further notes
Abstract | Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia. |
Added by wikilit team | Added on initial load |
Collected data time dimension | Cross-sectional |
Comments | TopX is an efficient and effective search engine for nonschematic XML documents, with the full functionality of XPath Full-Text and supporting also the entire range of text, semistructured, and structured data. |
Conclusion | TopX is an efficient and effective search engine for nonschematic XML documents, with the full functionality of XPath Full-Text and supporting also the entire range of text, semistructured, and structured data. It achieves effective ranked retrieval by means of an XML-specific extension of the probabilistic-IR BM25 relevance scoring model, and also leverages thesauri and ontologies for word-sense disambiguation and robust query expansion. It achieves scalability and efficient top-k query processing by means of extended threshold algorithms with specific priority-queue management, judicious scheduling of random accesses to index entries, and probabilistic score predictions for early pruning of top-k candidates. |
Doi | 10.1007/s00778-007-0072-z |
Google scholar url | http://scholar.google.com/scholar?ie=UTF-8&q=%22TopX%3A%2Befficient%2Band%2Bversatile%2Btop-k%2Bquery%2Bprocessing%2Bfor%2Bsemistructured%2Bdata%22 |
Has author | Martin Theobald, Holger Bast, Debapriyo Majumdar, Ralf Schenkel and Gerhard Weikum |
Has domain | Computer science |
Has topic | Query processing |
Issue | 1 |
Pages | 81-115 |
Peer reviewed | Yes |
Publication type | Journal article |
Published in | The VLDB Journal |
Research design | Experiment |
Research questions | The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia. |
Revid | 9,916 |
Theories | Undetermined |
Theory type | Analysis, Design and action |
Title | TopX: efficient and versatile top-k query processing for semistructured data |
Unit of analysis | Article |
Url | http://www.springerlink.com/content/y34h4h0741378kl6/ |
Volume | 17 |
Wikipedia coverage | Other |
Wikipedia data extraction | Clone |
Wikipedia language | Not specified |
Wikipedia page type | Article |
Year | 2008 |