A knowledge-based search engine powered by Wikipedia

From WikiLit
Revision as of 20:53, January 30, 2014 by Fnielsen (Talk | contribs) (Text replace - "Collected datatype" to "Data source")

Publication
A knowledge-based search engine powered by Wikipedia
Authors: David N. Milne, Ian H. Witten, David M. Nichols
Citation: CIKM '07 Proceedings of the sixteenth ACM conference on Conference on information and knowledge management: 445-454. 2007 November 6-9. Lisboa, Portugal. Association for Computing Machinery.
Publication type: Conference paper
Peer-reviewed: Yes
Database(s):
DOI: 10.1145/1321440.1321504.
Link(s): http://researchcommons.waikato.ac.nz/handle/10289/5379
Added by Wikilit team: Added on initial load
"A knowledge-based search engine powered by Wikipedia" is a publication by David N. Milne, Ian H. Witten, and David M. Nichols.


Abstract

This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide the user as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offers significant advantages over traditional keyword search. It was capable of lending assistance to almost every query issued to it; making their entry more efficient, improving the relevance of the documents they return, and narrowing the gap between expert and novice seekers.
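
The automatic query expansion mentioned in the abstract can be pictured with a small sketch. This is not Koru's implementation: the thesaurus entries, the function names `build_label_index` and `expand_query`, and the Boolean query syntax are all assumptions made for illustration, and the sketch only matches single-word query terms. It shows one plausible way a thesaurus of topics, synonyms and related topics could turn a keyword query into an expanded query plus a list of suggestions for interactive refinement.

```python
# Toy thesaurus. Every entry here is invented for illustration;
# none of it is taken from Koru or from Wikipedia.
THESAURUS = {
    "automobile": {"synonyms": ["car", "motor car"], "related": ["truck", "engine"]},
    "air pollution": {"synonyms": ["pollution", "smog"], "related": ["emission"]},
}


def build_label_index(thesaurus):
    """Map every label (topic name or synonym, lowercased) to its topic."""
    index = {}
    for topic, entry in thesaurus.items():
        for label in [topic] + entry["synonyms"]:
            index[label.lower()] = topic
    return index


def expand_query(query, thesaurus):
    """Return (expanded Boolean query, related topics to suggest).

    Simplification: only single-word query terms are looked up,
    so multi-word labels are never recognized in the input.
    """
    index = build_label_index(thesaurus)

    # Recognize topics in the query, keeping first-seen order.
    topics = []
    for term in query.lower().split():
        topic = index.get(term)
        if topic is not None and topic not in topics:
            topics.append(topic)

    # Each recognized topic becomes an OR-group of all its labels.
    groups = []
    for topic in topics:
        labels = [topic] + thesaurus[topic]["synonyms"]
        groups.append("(" + " OR ".join(f'"{label}"' for label in labels) + ")")

    # Related topics are offered as suggestions, not added to the query.
    related = {r for topic in topics for r in thesaurus[topic]["related"]}
    return " AND ".join(groups), sorted(related - set(topics))


expanded, suggestions = expand_query("car pollution", THESAURUS)
print(expanded)
print(suggestions)
```

Grouping synonyms under one topic means the user types a single variant and still retrieves documents that use the others, while the related topics stay optional so the user decides whether to broaden the search.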

Research questions

"This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide the user as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offers significant advantages over traditional keyword search. It was capable of lending assistance to almost every query issued to it; making their entry more efficient, improving the relevance of the documents they return, and narrowing the gap between expert and novice seekers."

Research details

Topics: Query processing
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Design science, Experiment
Data source: Computer usage logs, Direct observation, Experiment responses, Survey responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: User
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"This paper has introduced Koru, a new search engine that harnesses Wikipedia to provide domain-independent knowledge-based retrieval. Our intuition that Wikipedia could provide a knowledge base that matched both documents and queries has so far been borne out. We have tested it with a varied domain-independent collection of documents and retrieval tasks, and it was able to recognize and lend assistance to almost all queries issued to it, and significantly improve retrieval performance. Koru’s design was also validated, in that it allowed users to apply the knowledge found in Wikipedia to their retrieval process easily, effectively and efficiently. The following quote, given by one participant at the conclusion of their session, summarizes Koru’s performance best: It feels like a more powerful searching method, and allows you to search for topics that you may not have thought of… …it could use some improvements but the ability to graphically turn topics on/off is useful, and the way the system compresses synonymous terms together saves the user from having to search for the variations themselves. The ability to see a list of related terms also makes it easier to refine a search, where as with keyword searching you have to think up related terms yourself."

Comments

"[Wikipedia was tested] with a varied domain independent collection of documents and retrieval tasks, and it was able to recognize and lend assistance to almost all queries issued to it, and significantly improve retrieval performance." p. 453

search results

"Research design" should be "Design science" (the construction of the Koru system is described) and "Experiment" (a user study with 12 subjects is conducted).

"Data source" can (apart from those listed) also be "Wikipedia pages", as these are used to construct a thesaurus.

"Unit of analysis" should probably not be "User": it is not a user of Wikipedia, but a user of the Koru system. Pages are used for analysis in the construction of the thesaurus.

"Wikipedia page type" is probably, implicitly, "Article".

"Wikipedia language" is very likely English, but should be set to "Not specified".

From discussion on 2013-01-16: “A knowledge-based search engine powered by Wikipedia”: “Research design” should also be “Design science”; “Data source”: also “Wikipedia pages” (because of use in the thesaurus); “Unit of analysis”: keep it as “User” (Wikipedia pages are not studied per se); “Wikipedia page type”: “Article”.


Further notes

Facts about "A knowledge-based search engine powered by Wikipedia"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Conference location: Lisboa, Portugal
Data source: Computer usage logs, Direct observation, Experiment responses, Survey responses, Wikipedia pages
Dates: 6-9
Doi: 10.1145/1321440.1321504
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22A%2Bknowledge-based%2Bsearch%2Bengine%2Bpowered%2Bby%2BWikipedia%22
Has author: David N. Milne, Ian H. Witten, David M. Nichols
Has domain: Computer science
Has topic: Query processing
Month: November
Pages: 445-454
Peer reviewed: Yes
Publication type: Conference paper
Published in: CIKM '07 Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Publisher: Association for Computing Machinery
Research design: Design science, Experiment
Revid: 11,127
Theories: Undetermined
Theory type: Design and action
Title: A knowledge-based search engine powered by Wikipedia
Unit of analysis: User
Url: http://researchcommons.waikato.ac.nz/handle/10289/5379
Wikipedia coverage: Sample data
Wikipedia data extraction: Dump
Wikipedia language: Not specified
Wikipedia page type: Article
Year: 2007