TopX: efficient and versatile top-k query processing for semistructured data

Authors: Martin Theobald, Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Gerhard Weikum
Citation: The VLDB Journal 17 (1): 81-115. 2008.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1007/s00778-007-0072-z.
Link(s): http://www.springerlink.com/content/y34h4h0741378kl6/
Added by Wikilit team: Added on initial load
"TopX: efficient and versatile top-k query processing for semistructured data" is a publication by Martin Theobald, Holger Bast, Debapriyo Majumdar, Ralf Schenkel, and Gerhard Weikum.


Abstract

Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.
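
The early-termination idea in the abstract (stop as soon as the k top-ranked results are safely known under a monotonic aggregation function) can be illustrated with the classic threshold algorithm for top-k queries. The sketch below is not the TopX implementation; it assumes summation as the aggregation function, non-negative scores, and small in-memory index lists. The function name `threshold_top_k` and the sample data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_top_k(
    index_lists: List[List[Tuple[str, float]]], k: int
) -> List[Tuple[str, float]]:
    """Classic threshold-algorithm sketch (assumed: sum aggregation, scores >= 0).

    Each index list holds (item_id, score) pairs sorted by descending score.
    """
    if k <= 0:
        return []
    # Random-access tables, one per list; an item missing from a list scores 0.0.
    lookups: List[Dict[str, float]] = [dict(lst) for lst in index_lists]
    top: List[Tuple[float, str]] = []  # min-heap of (total_score, item_id)
    seen = set()
    max_depth = max((len(lst) for lst in index_lists), default=0)
    for depth in range(max_depth):
        last_scores = []
        # Sorted access: read one entry from every list at the current depth.
        for lst in index_lists:
            if depth >= len(lst):
                last_scores.append(0.0)  # an exhausted list cannot add more score
                continue
            item, score = lst[depth]
            last_scores.append(score)
            if item in seen:
                continue
            seen.add(item)
            # Random access: fetch the item's score from all lists.
            total = sum(lookup.get(item, 0.0) for lookup in lookups)
            if len(top) < k:
                heapq.heappush(top, (total, item))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, item))
        # No unseen item can score higher than the sum of the last-read scores.
        threshold = sum(last_scores)
        if len(top) == k and top[0][0] >= threshold:
            break  # the current top-k is final; stop scanning early
    return [(item, total) for total, item in sorted(top, reverse=True)]


# Hypothetical per-term index lists, each sorted by descending score.
list_a = [("d1", 0.9), ("d2", 0.8), ("d3", 0.1)]
list_b = [("d2", 0.7), ("d3", 0.6), ("d1", 0.2)]
result = threshold_top_k([list_a, list_b], k=1)
```

For `k=1` on this sample data the scan stops after the second round of sorted accesses, because the best total found so far already reaches the threshold for all unseen items.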

Research questions

"The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia"

Research details

Topics: Query processing
Domains: Computer science
Theory type: Analysis, Design and action
Wikipedia coverage: Other
Theories: "Undetermined"
Research design: Experiment
Data source:
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Clone
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"TopX is an efficient and effective search engine for nonschematic XML documents, with the full functionality of XPath Full-Text and supporting also the entire range of text, semistructured, and structured data. It achieves effective ranked retrieval by means of an XML-specific extension of the probabilistic-IR BM25 relevance scoring model, and also leverages thesauri and ontologies for word-sense disambiguation and robust query expansion. It achieves scalability and efficient top-k query processing by means of extended threshold algorithms with specific priority-queue management, judicious scheduling of random accesses to index entries, and probabilistic score predictions for early pruning of top-k candidates."

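The conclusion names the probabilistic-IR BM25 model as the basis of TopX's relevance scoring. As a rough illustration, the sketch below computes plain document-level BM25; the XML-specific extension described in the paper is not reproduced here. The parameter defaults `k1=1.2` and `b=0.75`, the non-negative IDF variant, the function name `bm25_scores`, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75
) -> List[float]:
    """Score every tokenized document against the query with plain BM25."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue  # a term absent from the document adds nothing
            df = doc_freq[term]
            # Assumed IDF variant: the +1 inside the log keeps it non-negative.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are damped.
            norm = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus of tokenized documents.
corpus = [["xml", "retrieval", "xml"], ["text", "search"], ["xml", "query"]]
scores = bm25_scores(["xml"], corpus)
```

Per-term scores of this kind are what a top-k engine would store in score-sorted index lists; because summing them is monotonic, threshold-style early termination applies.
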
Comments

"TopX is an efficient and effective search engine for nonschematic XML documents, with the full functionality of XPath Full-Text and supporting also the entire range of text, semistructured, and structured data."


Further notes

Facts about "TopX: efficient and versatile top-k query processing for semistructured data"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Doi: 10.1007/s00778-007-0072-z
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22TopX%3A%2Befficient%2Band%2Bversatile%2Btop-k%2Bquery%2Bprocessing%2Bfor%2Bsemistructured%2Bdata%22
Has author: Martin Theobald, Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Gerhard Weikum
Has domain: Computer science
Has topic: Query processing
Issue: 1
Pages: 81-115
Peer reviewed: Yes
Publication type: Journal article
Published in: The VLDB Journal
Research design: Experiment
Revid: 9,916
Theories: Undetermined
Theory type: Analysis, Design and action
Title: TopX: efficient and versatile top-k query processing for semistructured data
Unit of analysis: Article
Url: http://www.springerlink.com/content/y34h4h0741378kl6/
Volume: 17
Wikipedia coverage: Other
Wikipedia data extraction: Clone
Wikipedia language: Not specified
Wikipedia page type: Article
Year: 2008