A knowledge-based search engine powered by Wikipedia

From WikiLit
Latest revision as of 20:53, January 30, 2014

Publication
A knowledge-based search engine powered by Wikipedia
Authors: David N. Milne, Ian H. Witten, David M. Nichols
Citation: CIKM '07 Proceedings of the sixteenth ACM conference on Conference on information and knowledge management: 445-454. 2007 November 6-9. Lisboa, Portugal. Association for Computing Machinery.
Publication type: Conference paper
Peer-reviewed: Yes
Database(s):
DOI: 10.1145/1321440.1321504
Added by Wikilit team: Added on initial load
A knowledge-based search engine powered by Wikipedia is a publication by David N. Milne, Ian H. Witten, and David M. Nichols.


Abstract

This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide the user as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offers significant advantages over traditional keyword search. It was capable of lending assistance to almost every query issued to it; making their entry more efficient, improving the relevance of the documents they return, and narrowing the gap between expert and novice seekers.

Research questions

"This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide the user as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offers significant advantages over traditional keyword search. It was capable of lending assistance to almost every query issued to it; making their entry more efficient, improving the relevance of the documents they return, and narrowing the gap between expert and novice seekers."

Research details

Topics: Query processing
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Design science, Experiment
Data source: Computer usage logs, Direct observation, Experiment responses, Survey responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: User
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"This paper has introduced Koru, a new search engine that harnesses Wikipedia to provide domain-independent knowledge-based retrieval. Our intuition that Wikipedia could provide a knowledge base that matched both documents and queries has so far been borne out. We have tested it with a varied domain-independent collection of documents and retrieval tasks, and it was able to recognize and lend assistance to almost all queries issued to it, and significantly improve retrieval performance. Koru’s design was also validated, in that it allowed users to apply the knowledge found in Wikipedia to their retrieval process easily, effectively and efficiently. The following quote, given by one participant at the conclusion of their session, summarizes Koru’s performance best: It feels like a more powerful searching method, and allows you to search for topics that you may not have thought of… …it could use some improvements but the ability to graphically turn topics on/off is useful, and the way the system compresses synonymous terms together saves the user from having to search for the variations themselves. The ability to see a list of related terms also makes it easier to refine a search, where as with keyword searching you have to think up related terms yourself."

Comments

"[Wikipedia was tested] with a varied domain independent collection of documents and retrieval tasks, and it was able to recognize and lend assistance to almost all queries issued to it, and significantly improve retrieval performance." p. 453 search results

"Research design" should be "Design science" (the construction of the Koru system is described) and "Experiment" (a user study with 12 subjects was conducted).

"Data source" can (apart from those listed) also be Wikipedia pages, as these are used to construct a thesaurus.

"Unit of analysis" should probably not be "User": the user in question is not a user of Wikipedia but a user of the Koru system. Pages are used for analysis in the construction of the thesaurus.

"Wikipedia page type" is probably "Article", although this is only implicit.

"Wikipedia language" is very likely English, but should be set to "Not specified".

From the discussion on 2013-01-16: "“A knowledge-based search engine powered by Wikipedia”: “Research design”: should also be “Design science”; “Data source”: also “Wikipedia pages” (because of their use in the thesaurus); “Unit of analysis”: keep it as “User” (Wikipedia pages are not studied per se); “Wikipedia page type”: “Article”."


Further notes