TopX: efficient and versatile top-k query processing for semistructured data
Abstract Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments TopX is an efficient and effective search engine for nonschematic XML documents, with the full functionality of XPath Full-Text and supporting also the entire range of text, semistructured, and structured data.
Conclusion TopX is an efficient and effective search engine for nonschematic XML documents, with the full functionality of XPath Full-Text and supporting also the entire range of text, semistructured, and structured data. It achieves effective ranked retrieval by means of an XML-specific extension of the probabilistic-IR BM25 relevance scoring model, and also leverages thesauri and ontologies for word-sense disambiguation and robust query expansion. It achieves scalability and efficient top-k query processing by means of extended threshold algorithms with specific priority-queue management, judicious scheduling of random accesses to index entries, and probabilistic score predictions for early pruning of top-k candidates.
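Note: the early-termination idea behind threshold algorithms can be illustrated with a minimal sketch of the classic threshold algorithm (TA) over score-sorted index lists. This is not TopX's own extended algorithm: it omits the priority-queue management, random-access scheduling, and probabilistic pruning named above. The function name `threshold_topk`, the list layout, and the use of a plain sum as the monotonic aggregation function are assumptions made for illustration only.

```python
import heapq


def threshold_topk(index_lists, k):
    """Classic threshold algorithm (illustrative sketch, not TopX itself).

    index_lists: one list per query condition, each holding
    (doc_id, score) pairs sorted by descending score.
    Returns (results, depth): the top-k (doc_id, total_score) pairs and
    the number of sorted-access rounds performed before stopping.
    """
    # Hypothetical random-access structure: one dict per index list.
    lookups = [dict(lst) for lst in index_lists]
    max_depth = max((len(lst) for lst in index_lists), default=0)
    seen = set()
    top = []  # min-heap of (total_score, doc_id), at most k entries
    depth = 0
    while depth < max_depth:
        last_scores = []
        for lst in index_lists:
            if depth >= len(lst):
                last_scores.append(0)  # exhausted list contributes nothing
                continue
            doc_id, score = lst[depth]
            last_scores.append(score)
            if doc_id in seen:
                continue
            seen.add(doc_id)
            # Random accesses: fetch this document's score in every list.
            total = sum(lk.get(doc_id, 0) for lk in lookups)
            if len(top) < k:
                heapq.heappush(top, (total, doc_id))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, doc_id))
        depth += 1
        # Upper bound on the total score of any document not yet seen.
        threshold = sum(last_scores)
        if len(top) == k and top[0][0] >= threshold:
            break  # no unseen document can enter the top-k
    results = sorted(((d, s) for s, d in top), key=lambda p: (-p[1], p[0]))
    return results, depth
```

With two lists of four entries each, the sketch can stop after two rounds of sorted access once the k-th best total reaches the threshold, which is the sense in which execution terminates as soon as the top-k results are safely determined.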
Data source Experiment responses  + , Wikipedia pages  +
Doi 10.1007/s00778-007-0072-z +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22TopX%3A%2Befficient%2Band%2Bversatile%2Btop-k%2Bquery%2Bprocessing%2Bfor%2Bsemistructured%2Bdata%22  +
Has author Martin Theobald + , Holger Bast + , Debapriyo Majumdar + , Ralf Schenkel + , Gerhard Weikum +
Has domain Computer science +
Has topic Query processing +
Issue 1  +
Pages 81-115  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in The VLDB Journal +
Research design Experiment  +
Research questions The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.
Revid 11,002  +
Theories Undetermined
Theory type Analysis  + , Design and action  +
Title TopX: efficient and versatile top-k query processing for semistructured data
Unit of analysis Article  +
Url http://www.springerlink.com/content/y34h4h0741378kl6/  +
Volume 17  +
Wikipedia coverage Other  +
Wikipedia data extraction Dump  +
Wikipedia language Not specified  +
Wikipedia page type Article  +
Year 2008  +
Creation date 15 March 2012 20:31:48  +
Categories Query processing  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:31:59  +