Accuracy estimate and optimization techniques for SimRank computation

From WikiLit
Jump to: navigation, search
Publication (help)
Accuracy estimate and optimization techniques for SimRank computation
Authors: Dmitry Lizorkin, Pavel Velikhov, Maxim Grinev, Denis Turdakov [edit item]
Citation: VLDB Journal 19 (1): 45-66. 2010.
Publication type: Journal article
Peer-reviewed: Yes
Database(s):
DOI: 10.1007/s00778-009-0168-8.
Google Scholar cites: Citations
Link(s): Paper link
Added by Wikilit team: Added on initial load
Search
Article: Google Scholar BASE PubMed
Other scholarly wikis: AcaWiki Brede Wiki WikiPapers
Web search: Bing Google Yahoo!Google PDF
Other:
Services
Format: BibTeX
Accuracy estimate and optimization techniques for SimRank computation is a publication by Dmitry Lizorkin, Pavel Velikhov, Maxim Grinev, Denis Turdakov.


[edit] Abstract

The measure of similarity between objects is a very useful tool in many areas of computer science, including information retrieval. SimRank is a simple and intuitive measure of this kind, based on a graph-theoretic model. SimRank is typically computed iteratively, in the spirit of PageRank. However, existing work on SimRank lacks accuracy estimation of iterative computation and has discouraging time complexity. In this paper, we present a technique to estimate the accuracy of computing SimRank iteratively. This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing SimRank. We also present optimization techniques that improve the computational complexity of the iterative algorithm from O(n4) in the worst case to min(O(nl), O(n3/ log2n)), with n denoting the number of objects, and l denoting the number object-to-object relationships. We also introduce a threshold sieving heuristic and its accuracy estimation that further improves the efficiency of the method. As a practical illustration of our techniques, we computed SimRank scores on a subset of English Wikipedia corpus, consisting of the complete set of articles and category links.

[edit] Research questions

"The measure of similarity between objects is a very useful tool in many areas of computer science, including informa- tion retrieval. SimRank is a simple and intuitive measure of this kind, based on graph-theoretic model.

In this paper we present a technique to estimate the ac- curacy of computing SimRank iteratively. This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing SimRank. We also present optimization techniques that improve the com- putational complexity of the iterative algorithm from O(n4) to O(n3) in the worst case. We also introduce a threshold sieving heuristic and its accuracy estimation that further improves the eciency of the method."

Research details

Topics: Ranking and clustering systems [edit item]
Domains: Computer science [edit item]
Theory type: Design and action [edit item]
Wikipedia coverage: Sample data [edit item]
Theories: "Undetermined" [edit item]
Research design: Mathematical modeling [edit item]
Data source: Wikipedia pages [edit item]
Collected data time dimension: Cross-sectional [edit item]
Unit of analysis: Article [edit item]
Wikipedia data extraction: Live Wikipedia [edit item]
Wikipedia page type: Article [edit item]
Wikipedia language: English [edit item]

[edit] Conclusion

"A precise accuracy estimate for SimRank iterative com- putation is established. The estimate reveals that SimRank computation parameters suggested in the original SimRank proposal implied a relatively low accuracy, and the choice for different parameter values is suggested. The accuracy estimate allows a-priori finding out the correct number of iterations required for achieving a desired accuracy. The number of iterations turns out to be independent of input graph characteristics, the fact to benefit scalability.

Experimental results show a 50 times speedup achieved by the optimization techniques for a graph with 10K nodes, and relative improvement in computation time further increases for larger graphs. The experience in computing SimRank scores over the English Wikipedia corpus exhibits practical viability of the approach for relatively large data corpora. We believe that the results presented in the paper would facilitate a wider application of SimRank to computer science techniques, as this similarity measure definitely deserves."

[edit] Comments


Further notes[edit]

Facts about "Accuracy estimate and optimization techniques for SimRank computation"RDF feed
AbstractThe measure of similarity between objects The measure of similarity between objects is a very useful tool in many areas of computer science, including information retrieval. SimRank is a simple and intuitive measure of this kind, based on a graph-theoretic model. SimRank is typically computed iteratively, in the spirit of PageRank. However, existing work on SimRank lacks accuracy estimation of iterative computation and has discouraging time complexity. In this paper, we present a technique to estimate the accuracy of computing SimRank iteratively. This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing SimRank. We also present optimization techniques that improve the computational complexity of the iterative algorithm from O(n4) in the worst case to min(O(nl), O(n3/ log2n)), with n denoting the number of objects, and l denoting the number object-to-object relationships. We also introduce a threshold sieving heuristic and its accuracy estimation that further improves the efficiency of the method. As a practical illustration of our techniques, we computed SimRank scores on a subset of English Wikipedia corpus, consisting of the complete set of articles and category links.mplete set of articles and category links.
Added by wikilit teamAdded on initial load +
Collected data time dimensionCross-sectional +
ConclusionA precise accuracy estimate for SimRank itA precise accuracy estimate for SimRank iterative com-

putation is established. The estimate reveals that SimRank computation parameters suggested in the original SimRank proposal implied a relatively low accuracy, and the choice for different parameter values is suggested. The accuracy estimate allows a-priori finding out the correct number of iterations required for achieving a desired accuracy. The number of iterations turns out to be independent of input graph characteristics, the fact to benefit scalability.

Experimental results show a 50 times speedup achieved by the optimization techniques for a graph with 10K nodes, and relative improvement in computation time further increases for larger graphs. The experience in computing SimRank scores over the English Wikipedia corpus exhibits practical viability of the approach for relatively large data corpora. We believe that the results presented in the paper would facilitate a wider application of SimRank to computer science techniques,

as this similarity measure definitely deserves.
is similarity measure definitely deserves.
Data sourceWikipedia pages +
Doi10.1007/s00778-009-0168-8 +
Google scholar urlhttp://scholar.google.com/scholar?ie=UTF-8&q=%22Accuracy%2Bestimate%2Band%2Boptimization%2Btechniques%2Bfor%2BSimRank%2Bcomputation%22 +
Has authorDmitry Lizorkin +, Pavel Velikhov +, Maxim Grinev + and Denis Turdakov +
Has domainComputer science +
Has topicRanking and clustering systems +
Issue1 +
Pages45-66 +
Peer reviewedYes +
Publication typeJournal article +
Published inVLDB Journal +
Research designMathematical modeling +
Research questionsThe measure of similarity between objects The measure of similarity between objects is a very useful

tool in many areas of computer science, including informa- tion retrieval. SimRank is a simple and intuitive measure of this kind, based on graph-theoretic model.

In this paper we present a technique to estimate the ac- curacy of computing SimRank iteratively. This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing SimRank. We also present optimization techniques that improve the com- putational complexity of the iterative algorithm from O(n4) to O(n3) in the worst case. We also introduce a threshold sieving heuristic and its accuracy estimation that further improves the eciency of the method.urther

improves the eciency of the method.
Revid10,646 +
TheoriesUndetermined
Theory typeDesign and action +
TitleAccuracy estimate and optimization techniques for SimRank computation
Unit of analysisArticle +
Urlhttp://dx.doi.org/10.1007/s00778-009-0168-8 +
Volume19 +
Wikipedia coverageSample data +
Wikipedia data extractionLive Wikipedia +
Wikipedia languageEnglish +
Wikipedia page typeArticle +
Year2010 +