Dynamic link-based ranking over large-scale graph-structured data
Publication (help) | |
---|---|
Dynamic link-based ranking over large-scale graph-structured data | |
Authors: | Heasoo Hwang [edit item] |
Citation: | University of California, San Diego : . 2010. United States, California. |
Publication type: | Thesis |
Peer-reviewed: | Yes |
Database(s): | |
DOI: | Define doi. |
Google Scholar cites: | Not available |
Link(s): | Paper link |
Added by Wikilit team: | Added on initial load |
Search | |
Article: | Google Scholar BASE PubMed |
Other scholarly wikis: | AcaWiki Brede Wiki WikiPapers |
Web search: | Bing Google Yahoo! — Google PDF |
Other: | |
Services | |
Format: | BibTeX |
Contents
[edit] Abstract
Information Retrieval techniques have been the primary means of keyword search in document collections. However, as the amount and the diversity of avail- able semantic connections between objects increase, link-based ranking methods including {ObjectRank} have been proposed to provide high-recall semantic keyword search over graph-structured data. Since a wide variety of data sources can be modeled as data graphs, supporting keyword search over graph-structured data greatly improves the usability of such data sources. However, it is challenging in both online performance and result quality. We first address the performance issue of dynamic authority-based ranking methods such as personalized {PageRank} and {ObjectRank.} Since they dynamically rank nodes in a data graph using an expensive matrix-multiplication method, the online execution time rapidly increases as the size of data graph grows. Over the English Wikipedia dataset of 2007, {ObjectRank} spends 20-40 seconds to compute query-specific relevance scores, which is unacceptable. We introduce a novel approach, {BinRank}, that approximates dynamic link-based ranking scores efficiently. {BinRank} partitions a dictionary into bins of relevant keywords and then constructs materialized subgraphs {(MSGs)} per bin in preprocessing stage. In query time, to produce highly accurate {top-K} results efficiently, {BinRank} uses the {MSG} corresponding to the given keyword, instead of the original data graph. {PageRank} and {ObjectRank} calculate the global importance score and the query-specific authority score of each node respectively by exploiting the link structure of a given data graph. However, both measures favor nodes with high in-degree that may contain popular yet generic content, and thus those nodes are frequently included in {top-K} lists, regardless of given query. We propose a novel ranking measure, Inverse {ObjectRank}, which measures the content-specificity of each node by traversing the semantic links in the data graph in the reverse direction. Then, we allow users to adjust the importance of the three ranking measures (global importance, query-relevance, and content-specificity) to improve the quality of search results.
[edit] Research questions
"We first address the performance issue of dynamic authority-based ranking methods such as personalized PageRank and ObjectRank. Since they dynamically rank nodes in a data graph using an expensive matrix-multiplication method, the online execution time rapidly increases as the size of data graph grows. Over the English Wikipedia dataset of 2007, ObjectRank spends 20-40 seconds to compute query-specific relevance scores, which is unacceptable. We introduce a novel ap- proach, BinRank, that approximates dynamic link-based ranking scores efficiently."
Research details
Topics: | Ranking and clustering systems [edit item] |
Domains: | Computer science [edit item] |
Theory type: | Design and action [edit item] |
Wikipedia coverage: | Sample data [edit item] |
Theories: | "Moreover, we explain
these authority flow metrics from the perspective of information theory. We also elaborate on the combining ranking function and study techniques to weigh the query keywords." [edit item] |
Research design: | Experiment [edit item] |
Data source: | Experiment responses, Wikipedia pages [edit item] |
Collected data time dimension: | Cross-sectional [edit item] |
Unit of analysis: | Article [edit item] |
Wikipedia data extraction: | Dump [edit item] |
Wikipedia page type: | Article [edit item] |
Wikipedia language: | English [edit item] |
[edit] Conclusion
"We first addressed the performance issue of dynamic authority-based rank- ing methods such as personalized PageRank and ObjectRank....The intuition behind the approach is that a subgraph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. Our extensive experimental evaluation confirms this intuition. Since the number of bins is much less than the number of terms, the preprocessing stage of BinRank is performed efficiently. In query time, to produce highly accurate top- K results efficiently, BinRank uses the MSG corresponding to the given keyword, instead of the original data graph. We showed that our tunable system offers a nice trade-off between query time and preprocessing cost.
Then, we presented Inverse ObjectRank, which is a link-based and keyword- specific specificity metric. PageRank and ObjectRank calculate the global impor- tance score and the query-specific authority score of each node respectively by exploiting the link structure of a given data graph. However, both measures fa- vor nodes with high in-degree that may contain popular yet generic content, and thus those nodes are frequently included in top-K lists, regardless of given query. Inverse ObjectRank measures the content-specificity of each node by traversing the semantic links in the data graph in the reverse direction. We built a proto- type of our methods on a bibliographic database, which we made available on the Web. We allowed users to adjust the importance of the three ranking measures (global importance, query-relevance, and content-specificity) to improve the qual- ity of search results. We showed how Inverse ObjectRank is combined with other ranking measures to produce the results list for a keyword query. Our methods have been qualitatively evaluated using a user survey and the bibliography sec- tions of a database textbook. Our experimental evaluation showed that combining ObjectRank with the square root of Inverse ObjectRank produces results of highest quality."
[edit] Comments
Further notes[edit]
Abstract | Information Retrieval techniques have been … Information Retrieval techniques have been the primary means of keyword search in document collections. However, as the amount and the diversity of avail- able semantic connections between objects increase, link-based ranking methods including {ObjectRank} have been proposed to provide high-recall semantic keyword search over graph-structured data. Since a wide variety of data sources can be modeled as data graphs, supporting keyword search over graph-structured data greatly improves the usability of such data sources. However, it is challenging in both online performance and result quality. We first address the performance issue of dynamic authority-based ranking methods such as personalized {PageRank} and {ObjectRank.} Since they dynamically rank nodes in a data graph using an expensive matrix-multiplication method, the online execution time rapidly increases as the size of data graph grows. Over the English Wikipedia dataset of 2007, {ObjectRank} spends 20-40 seconds to compute query-specific relevance scores, which is unacceptable. We introduce a novel approach, {BinRank}, that approximates dynamic link-based ranking scores efficiently. {BinRank} partitions a dictionary into bins of relevant keywords and then constructs materialized subgraphs {(MSGs)} per bin in preprocessing stage. In query time, to produce highly accurate {top-K} results efficiently, {BinRank} uses the {MSG} corresponding to the given keyword, instead of the original data graph. {PageRank} and {ObjectRank} calculate the global importance score and the query-specific authority score of each node respectively by exploiting the link structure of a given data graph. However, both measures favor nodes with high in-degree that may contain popular yet generic content, and thus those nodes are frequently included in {top-K} lists, regardless of given query. We propose a novel ranking measure, Inverse {ObjectRank}, which measures the content-specificity of each node by traversing the semantic links in the data graph in the reverse direction. Then, we allow users to adjust the importance of the three ranking measures (global importance, query-relevance, and content-specificity) to improve the quality of search results. to improve the quality of search results. |
Added by wikilit team | Added on initial load + |
Collected data time dimension | Cross-sectional + |
Conclusion | We first addressed the performance issue o … We first addressed the performance issue of dynamic authority-based rank-
ing methods such as personalized PageRank and ObjectRank....The intuition behind the approach is that a subgraph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. Our extensive experimental evaluation confirms this intuition. Since the number of bins is much less than the number of terms, the preprocessing stage of BinRank is performed efficiently. In query time, to produce highly accurate top- K results efficiently, BinRank uses the MSG corresponding to the given keyword, instead of the original data graph. We showed that our tunable system offers a nice trade-off between query time and preprocessing cost. Then, we presented Inverse ObjectRank, which is a link-based and keyword- specific specificity metric. PageRank and ObjectRank calculate the global impor- tance score and the query-specific authority score of each node respectively by exploiting the link structure of a given data graph. However, both measures fa- vor nodes with high in-degree that may contain popular yet generic content, and thus those nodes are frequently included in top-K lists, regardless of given query. Inverse ObjectRank measures the content-specificity of each node by traversing the semantic links in the data graph in the reverse direction. We built a proto- type of our methods on a bibliographic database, which we made available on the Web. We allowed users to adjust the importance of the three ranking measures (global importance, query-relevance, and content-specificity) to improve the qual- ity of search results. We showed how Inverse ObjectRank is combined with other ranking measures to produce the results list for a keyword query. Our methods have been qualitatively evaluated using a user survey and the bibliography sec- tions of a database textbook. Our experimental evaluation showed that combining ObjectRank with the square root of Inverse ObjectRank produces results of highest quality. tRank produces results of highest quality. |
Conference location | United States, California + |
Data source | Experiment responses + and Wikipedia pages + |
Google scholar url | http://scholar.google.com/scholar?ie=UTF-8&q=%22Dynamic%2Blink-based%2Branking%2Bover%2Blarge-scale%2Bgraph-structured%2Bdata%22 + |
Has author | Heasoo Hwang + |
Has domain | Computer science + |
Has topic | Ranking and clustering systems + |
Peer reviewed | Yes + |
Publication type | Thesis + |
Published in | University of California, San Diego + |
Research design | Experiment + |
Research questions | We first address the performance issue of … We first address the performance issue of dynamic authority-based ranking
methods such as personalized PageRank and ObjectRank. Since they dynamically rank nodes in a data graph using an expensive matrix-multiplication method, the online execution time rapidly increases as the size of data graph grows. Over the English Wikipedia dataset of 2007, ObjectRank spends 20-40 seconds to compute query-specific relevance scores, which is unacceptable. We introduce a novel ap- proach, BinRank, that approximates dynamic link-based ranking scores efficiently.mic link-based ranking scores efficiently. |
Revid | 10,739 + |
Theories | Moreover, we explain
these authority flow metrics from the perspective of information theory. We also elaborate on the combining ranking function and study techniques to weigh the query keywords. |
Theory type | Design and action + |
Title | Dynamic link-based ranking over large-scale graph-structured data |
Unit of analysis | Article + |
Url | http://proquest.umi.com/pqdweb?did=2065693471&Fmt=7&clientId=10306&RQT=309&VName=PQD + |
Wikipedia coverage | Sample data + |
Wikipedia data extraction | Dump + |
Wikipedia language | English + |
Wikipedia page type | Article + |
Year | 2010 + |