BinRank: scaling dynamic authority-based search using materialized subgraphs

From WikiLit
Jump to: navigation, search
Publication (help)
BinRank: scaling dynamic authority-based search using materialized subgraphs
Authors: Heasoo Hwang, Andrey Balmin, Berthold Reinwald, Erik Nijkamp [edit item]
Citation: IEEE Transactions on Knowledge and Data Engineering 22 (8): 1176-1190. 2010.
Publication type: Journal article
Peer-reviewed: Yes
Database(s):
DOI: 10.1109/TKDE.2010.85.
Google Scholar cites: Citations
Link(s): Paper link
Added by Wikilit team: Added on initial load
Search
Article: Google Scholar BASE PubMed
Other scholarly wikis: AcaWiki Brede Wiki WikiPapers
Web search: Bing Google Yahoo!Google PDF
Other:
Services
Format: BibTeX
BinRank: scaling dynamic authority-based search using materialized subgraphs is a publication by Heasoo Hwang, Andrey Balmin, Berthold Reinwald, Erik Nijkamp.


[edit] Abstract

Dynamic authority-based keyword search algorithms, such as ObjectRank and personalized PageRank, leverage semantic link information to provide high quality, high recall search in databases, and the Web. Conceptually, these algorithms require a query-time PageRank-style iterative computation over the full graph. This computation is too expensive for large graphs, and not feasible at query time. Alternatively, building an index of precomputed results for some or all keywords involves very expensive preprocessing. We introduce BinRank, a system that approximates ObjectRank results by utilizing a hybrid approach inspired by materialized views in traditional query processing. We materialize a number of relatively small subsets of the data graph in such a way that any keyword query can be answered by running ObjectRank on only one of the subgraphs. BinRank generates the subgraphs by partitioning all the terms in the corpus based on their co-occurrence, executing ObjectRank for each partition using the terms to generate a set of random walk starting points, and keeping only those objects that receive non-negligible scores. The intuition is that a subgraph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. We demonstrate that BinRank can achieve subsecond query execution time on the English Wikipedia data set, while producing high-quality search results that closely approximate the results of ObjectRank on the original graph. The Wikipedia link graph contains about 108 edges, which is at least two orders of magnitude larger than what prior state of the art dynamic authority-based search systems have been able to demonstrate. Our experimental evaluation investigates the trade-off between query execution time, quality of the results, and storage requirements of BinRank.

[edit] Research questions

"In this paper,we introduce a BinRank system that employs a hybrid approach where query time can be traded off for preprocessing time and storage. BinRank closely approximates ObjectRank scores by running the same ObjectRank algorithm on a small subgraph, instead of the full data graph. The subgraphs are precomputed offline. The precomputation can be parallelized with linear scalability. For example, on the full Wikipedia data set, BinRank can answer any query in less than 1 second, by precomputing about a thousand subgraphs, which takes only about 12 hours on a single CPU."

Research details

Topics: Query processing, Ranking and clustering systems [edit item]
Domains: Computer science [edit item]
Theory type: Design and action [edit item]
Wikipedia coverage: Sample data [edit item]
Theories: "Undetermined" [edit item]
Research design: Experiment [edit item]
Data source: Experiment responses, Wikipedia pages [edit item]
Collected data time dimension: Cross-sectional [edit item]
Unit of analysis: N/A [edit item]
Wikipedia data extraction: Dump [edit item]
Wikipedia page type: Article [edit item]
Wikipedia language: English [edit item]

[edit] Conclusion

"In this paper, we proposed BinRank as a practical solution for scalable dynamic authority-based ranking. It is based on partitioning and approximation using a number of materialized subgraphs. We showed that our tunable system offers a nice trade-off between query time and preprocessing cost. We introduce a greedy algorithm that groups co-occurring terms into a number of bins for which we compute materialized subgraphs. Note that the number of bins is much less than the number of terms. The materialized subgraphs are computed offline by using ObjectRank itself. The intuition behind the approach is that a subgraph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. Our extensive experimental evaluation confirms this intuition."

[edit] Comments


Further notes[edit]

Facts about "BinRank: scaling dynamic authority-based search using materialized subgraphs"RDF feed
AbstractDynamic authority-based keyword search algDynamic authority-based keyword search algorithms, such as ObjectRank and personalized PageRank, leverage semantic link information to provide high quality, high recall search in databases, and the Web. Conceptually, these algorithms require a query-time PageRank-style iterative computation over the full graph. This computation is too expensive for large graphs, and not feasible at query time. Alternatively, building an index of precomputed results for some or all keywords involves very expensive preprocessing. We introduce BinRank, a system that approximates ObjectRank results by utilizing a hybrid approach inspired by materialized views in traditional query processing. We materialize a number of relatively small subsets of the data graph in such a way that any keyword query can be answered by running ObjectRank on only one of the subgraphs. BinRank generates the subgraphs by partitioning all the terms in the corpus based on their co-occurrence, executing ObjectRank for each partition using the terms to generate a set of random walk starting points, and keeping only those objects that receive non-negligible scores. The intuition is that a subgraph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. We demonstrate that BinRank can achieve subsecond query execution time on the English Wikipedia data set, while producing high-quality search results that closely approximate the results of ObjectRank on the original graph. The Wikipedia link graph contains about 108 edges, which is at least two orders of magnitude larger than what prior state of the art dynamic authority-based search systems have been able to demonstrate. Our experimental evaluation investigates the trade-off between query execution time, quality of the results, and storage requirements of BinRank.ults, and storage requirements of BinRank.
Added by wikilit teamAdded on initial load +
Collected data time dimensionCross-sectional +
ConclusionIn this paper, we proposed BinRank as a prIn this paper, we proposed BinRank as a practical solution for scalable dynamic authority-based ranking. It is based

on partitioning and approximation using a number of materialized subgraphs. We showed that our tunable system offers a nice trade-off between query time and preprocessing cost. We introduce a greedy algorithm that groups co-occurring terms into a number of bins for which we compute materialized subgraphs. Note that the number of bins is much less than the number of terms. The materialized subgraphs are computed offline by using ObjectRank itself. The intuition behind the approach is that a subgraph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. Our extensive experimental evaluation confirms this intuition.mental

evaluation confirms this intuition.
Data sourceExperiment responses + and Wikipedia pages +
Doi10.1109/TKDE.2010.85 +
Google scholar urlhttp://scholar.google.com/scholar?ie=UTF-8&q=%22BinRank%3A%2Bscaling%2Bdynamic%2Bauthority-based%2Bsearch%2Busing%2Bmaterialized%2Bsubgraphs%22 +
Has authorHeasoo Hwang +, Andrey Balmin +, Berthold Reinwald + and Erik Nijkamp +
Has domainComputer science +
Has topicQuery processing + and Ranking and clustering systems +
Issue8 +
Pages1176-1190 +
Peer reviewedYes +
Publication typeJournal article +
Published inIEEE Transactions on Knowledge and Data Engineering +
Research designExperiment +
Research questionsIn this paper,we introduce a BinRank systeIn this paper,we introduce a BinRank system that employs

a hybrid approach where query time can be traded off for preprocessing time and storage. BinRank closely approximates ObjectRank scores by running the same ObjectRank algorithm on a small subgraph, instead of the full data graph. The subgraphs are precomputed offline. The precomputation can be parallelized with linear scalability. For example, on the full Wikipedia data set, BinRank can answer any query in less than 1 second, by precomputing about a thousand subgraphs,

which takes only about 12 hours on a single CPU.
takes only about 12 hours on a single CPU.
Revid10,685 +
TheoriesUndetermined
Theory typeDesign and action +
TitleBinRank: scaling dynamic authority-based search using materialized subgraphs
Unit of analysisN/A +
Urlhttp://dx.doi.org/10.1109/TKDE.2010.85 +
Volume22 +
Wikipedia coverageSample data +
Wikipedia data extractionDump +
Wikipedia languageEnglish +
Wikipedia page typeArticle +
Year2010 +