Browse wiki

Jump to: navigation, search
Accuracy estimate and optimization techniques for SimRank computation
Abstract The measure of similarity between objects The measure of similarity between objects is a very useful tool in many areas of computer science, including information retrieval. SimRank is a simple and intuitive measure of this kind, based on a graph-theoretic model. SimRank is typically computed iteratively, in the spirit of PageRank. However, existing work on SimRank lacks accuracy estimation of iterative computation and has discouraging time complexity. In this paper, we present a technique to estimate the accuracy of computing SimRank iteratively. This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing SimRank. We also present optimization techniques that improve the computational complexity of the iterative algorithm from O(n4) in the worst case to min(O(nl), O(n3/ log2n)), with n denoting the number of objects, and l denoting the number object-to-object relationships. We also introduce a threshold sieving heuristic and its accuracy estimation that further improves the efficiency of the method. As a practical illustration of our techniques, we computed SimRank scores on a subset of English Wikipedia corpus, consisting of the complete set of articles and category links.mplete set of articles and category links.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Conclusion A precise accuracy estimate for SimRank itA precise accuracy estimate for SimRank iterative com- putation is established. The estimate reveals that SimRank computation parameters suggested in the original SimRank proposal implied a relatively low accuracy, and the choice for different parameter values is suggested. The accuracy estimate allows a-priori finding out the correct number of iterations required for achieving a desired accuracy. The number of iterations turns out to be independent of input graph characteristics, the fact to benefit scalability. Experimental results show a 50 times speedup achieved by the optimization techniques for a graph with 10K nodes, and relative improvement in computation time further increases for larger graphs. The experience in computing SimRank scores over the English Wikipedia corpus exhibits practical viability of the approach for relatively large data corpora. We believe that the results presented in the paper would facilitate a wider application of SimRank to computer science techniques, as this similarity measure definitely deserves.is similarity measure definitely deserves.
Data source Wikipedia pages  +
Doi 10.1007/s00778-009-0168-8 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Accuracy%2Bestimate%2Band%2Boptimization%2Btechniques%2Bfor%2BSimRank%2Bcomputation%22  +
Has author Dmitry Lizorkin + , Pavel Velikhov + , Maxim Grinev + , Denis Turdakov +
Has domain Computer science +
Has topic Ranking and clustering systems +
Issue 1  +
Pages 45-66  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in VLDB Journal +
Research design Mathematical modeling  +
Research questions The measure of similarity between objects The measure of similarity between objects is a very useful tool in many areas of computer science, including informa- tion retrieval. SimRank is a simple and intuitive measure of this kind, based on graph-theoretic model. In this paper we present a technique to estimate the ac- curacy of computing SimRank iteratively. This technique provides a way to find out the number of iterations required to achieve a desired accuracy when computing SimRank. We also present optimization techniques that improve the com- putational complexity of the iterative algorithm from O(n4) to O(n3) in the worst case. We also introduce a threshold sieving heuristic and its accuracy estimation that further improves the eciency of the method.urther improves the eciency of the method.
Revid 10,646  +
Theories Undetermined
Theory type Design and action  +
Title Accuracy estimate and optimization techniques for SimRank computation
Unit of analysis Article  +
Url http://dx.doi.org/10.1007/s00778-009-0168-8  +
Volume 19  +
Wikipedia coverage Sample data  +
Wikipedia data extraction Live Wikipedia  +
Wikipedia language English  +
Wikipedia page type Article  +
Year 2010  +
Creation dateThis property is a special property in this wiki. 15 March 2012 18:41:21  +
Categories Ranking and clustering systems  + , Computer science  + , Publications with missing comments  + , Publications  +
Modification dateThis property is a special property in this wiki. 30 January 2014 20:20:01  +
hide properties that link here 
  No properties link to this page.
 

 

Enter the name of the page to start browsing from.