Adaptive indexing for content-based search in P2P systems

From WikiLit
Jump to: navigation, search
Publication (help)
Adaptive indexing for content-based search in P2P systems
Authors: Aoying Zhou, Rong Zhang, Weining Qian, Quang Hieu Vu, Tianming Hu [edit item]
Citation: Data and Knowledge Engineering 67 (3): 381-398. 2008.
Publication type: Journal article
Peer-reviewed: Yes
Database(s):
DOI: 10.1016/j.datak.2008.06.013.
Google Scholar cites: Citations
Link(s): Paper link
Added by Wikilit team: Added on initial load
Search
Article: Google Scholar BASE PubMed
Other scholarly wikis: AcaWiki Brede Wiki WikiPapers
Web search: Bing Google Yahoo!Google PDF
Other:
Services
Format: BibTeX
Adaptive indexing for content-based search in P2P systems is a publication by Aoying Zhou, Rong Zhang, Weining Qian, Quang Hieu Vu, Tianming Hu.


[edit] Abstract

One of the major challenges in Peer-to-Peer (P2P) file sharing systems is to support content-based search. Although there have been some proposals to address this challenge, they share the same weakness of using either servers or super-peers to keep global knowledge, which is required to identify importance of terms to avoid popular terms in query processing. As a result, they are not scalable and are prone to the bottleneck problem, which is caused by the high visiting load at the global knowledge maintainers. To that end, in this paper, we propose a novel adaptive indexing approach for content-based search in P2P systems, which can identify importance of terms without keeping global knowledge. Our method is based on an adaptive indexing structure that combines a Chord ring and a balanced tree. The tree is used to aggregate and classify terms adaptively, while the Chord ring is used to index terms of nodes in the tree. Specifically, at each node of the tree, the system classifies terms as either important or unimportant. Important terms, which can distinguish the node from its neighbor nodes, are indexed in the Chord ring. On the other hand, unimportant terms, which are either popular or rare terms, are aggregated to higher level nodes. Such classification enables the system to process queries on the fly without the need for global knowledge. Besides, compared to the methods that index terms separately, term aggregation reduces the indexing cost significantly. Taking advantage of the tree structure, we also develop an efficient search algorithm to tackle the bottleneck problem near the root. Finally, our extensive experiments on both benchmark and Wikipedia datasets validated the effectiveness and efficiency of the proposed method.

[edit] Research questions

"To that end, in this paper, we propose a novel adaptive indexing approach for content-based search in P2P systems, which can identify importance of terms without keeping global knowledge."

Research details

Topics: Other information retrieval topics [edit item]
Domains: Computer science [edit item]
Theory type: Design and action [edit item]
Wikipedia coverage: Sample data [edit item]
Theories: "Undetermined" [edit item]
Research design: Experiment [edit item]
Data source: Archival records, Experiment responses, Wikipedia pages [edit item]
Collected data time dimension: Cross-sectional [edit item]
Unit of analysis: Article [edit item]
Wikipedia data extraction: Dump [edit item]
Wikipedia page type: Article [edit item]
Wikipedia language: Not specified [edit item]

[edit] Conclusion

"As a result, our system avoids several severe problems caused by maintaining such global knowledge. Nevertheless, it still achieves comparable retrieval efficiency as those systems keeping global knowledge. Although the structure of our system is partially based on a tree structure, our search algorithm guarantees no bottleneck at the root or nodes near the root. In addition, we also introduced several techniques to further improve the system performance. Finally, our extensive experiments demonstrated the effectiveness and efficiency of the proposed approach."

[edit] Comments

"Wikipedia pages Documents

We evaluated all systems on three datasets. The first one is the benchmark dataset containing four different types of documents: MED, CISI, CACM, and TIMES used by Smart [2], with 1033, 1460, 3204, 425 documents and 30, 35, 64 and 83 queries, respectively. The two remaining datasets are the 2005 TREC publish spam corpus consisting of 84,053 files obtained from http://plg.uwaterloo.ca/gvcormac/treccorpus/ and the Wikipedia dataset containing 1,000,000 pages downloaded from Wikipedia (http://wikipedia.org), a free multilingual online encyclopedia. Queries of these two datasets are generated randomly from the categories of TREC and Wikipedia."


Further notes[edit]

Facts about "Adaptive indexing for content-based search in P2P systems"RDF feed
AbstractOne of the major challenges in Peer-to-PeeOne of the major challenges in Peer-to-Peer (P2P) file sharing systems is to support content-based search. Although there have been some proposals to address this challenge, they share the same weakness of using either servers or super-peers to keep global knowledge, which is required to identify importance of terms to avoid popular terms in query processing. As a result, they are not scalable and are prone to the bottleneck problem, which is caused by the high visiting load at the global knowledge maintainers. To that end, in this paper, we propose a novel adaptive indexing approach for content-based search in P2P systems, which can identify importance of terms without keeping global knowledge. Our method is based on an adaptive indexing structure that combines a Chord ring and a balanced tree. The tree is used to aggregate and classify terms adaptively, while the Chord ring is used to index terms of nodes in the tree. Specifically, at each node of the tree, the system classifies terms as either important or unimportant. Important terms, which can distinguish the node from its neighbor nodes, are indexed in the Chord ring. On the other hand, unimportant terms, which are either popular or rare terms, are aggregated to higher level nodes. Such classification enables the system to process queries on the fly without the need for global knowledge. Besides, compared to the methods that index terms separately, term aggregation reduces the indexing cost significantly. Taking advantage of the tree structure, we also develop an efficient search algorithm to tackle the bottleneck problem near the root. Finally, our extensive experiments on both benchmark and Wikipedia datasets validated the effectiveness and efficiency of the proposed method.ess and efficiency of the proposed method.
Added by wikilit teamAdded on initial load +
Collected data time dimensionCross-sectional +
CommentsWikipedia pages

Documents

We evaluated alWikipedia pages Documents

We evaluated all systems on three datasets. The first one is the benchmark dataset containing four different types of documents: MED, CISI, CACM, and TIMES used by Smart [2], with 1033, 1460, 3204, 425 documents and 30, 35, 64 and 83 queries, respectively. The two remaining datasets are the 2005 TREC publish spam corpus consisting of 84,053 files obtained from http://plg.uwaterloo.ca/gvcormac/treccorpus/ and the Wikipedia dataset containing 1,000,000 pages downloaded from Wikipedia (http://wikipedia.org), a free multilingual online encyclopedia. Queries of these two datasets are generated randomly from the categories of TREC and Wikipedia.from the categories of TREC and Wikipedia.
ConclusionAs a result, our system avoids several sevAs a result, our system avoids several severe problems caused by maintaining such global knowledge. Nevertheless, it still achieves comparable retrieval efficiency as those systems keeping global knowledge. Although the structure of our system is partially based on a tree structure, our search algorithm guarantees no bottleneck at the root or nodes near the root. In addition, we also introduced several techniques to further improve the system performance. Finally, our extensive experiments demonstrated the effectiveness and efficiency of the proposed approach.s and efficiency of the proposed approach.
Data sourceArchival records +, Experiment responses + and Wikipedia pages +
Doi10.1016/j.datak.2008.06.013 +
Google scholar urlhttp://scholar.google.com/scholar?ie=UTF-8&q=%22Adaptive%2Bindexing%2Bfor%2Bcontent-based%2Bsearch%2Bin%2BP2P%2Bsystems%22 +
Has authorAoying Zhou +, Rong Zhang +, Weining Qian +, Quang Hieu Vu + and Tianming Hu +
Has domainComputer science +
Has topicOther information retrieval topics +
Issue3 +
Pages381-398 +
Peer reviewedYes +
Publication typeJournal article +
Published inData and Knowledge Engineering +
Research designExperiment +
Research questionsTo that end, in this paper, we propose a novel adaptive indexing approach for content-based search in P2P systems, which can identify importance of terms without keeping global knowledge.
Revid10,648 +
TheoriesUndetermined
Theory typeDesign and action +
TitleAdaptive indexing for content-based search in P2P systems
Unit of analysisArticle +
Urlhttp://dx.doi.org/10.1016/j.datak.2008.06.013 +
Volume67 +
Wikipedia coverageSample data +
Wikipedia data extractionDump +
Wikipedia languageNot specified +
Wikipedia page typeArticle +
Year2008 +