Understanding user's query intent with Wikipedia

From WikiLit
Authors: Jian Hu, Gang Wang, Fred Lochovsky, Jian-Tao Sun, Zheng Chen
Citation: WWW '09: Proceedings of the 18th International Conference on World Wide Web. 2009.
Publication type: Conference paper
Peer-reviewed: Yes
Database(s):
DOI: 10.1145/1526709.1526773
Added by Wikilit team: Added on initial load
Understanding user's query intent with Wikipedia is a publication by Jian Hu, Gang Wang, Fred Lochovsky, Jian-Tao Sun, and Zheng Chen.


Abstract

Understanding the intent behind a user's query can help a search engine automatically route the query to the corresponding vertical search engines to obtain especially relevant content, thus greatly improving user satisfaction. There are three major challenges to the query intent classification problem: (1) intent representation, (2) domain coverage, and (3) semantic interpretation. Current approaches to predicting the user's intent mainly use machine learning techniques. However, meeting all these challenges with statistical machine learning approaches is difficult and often requires considerable human effort. In this paper, we propose a general methodology for the problem of query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. The Wikipedia concepts are used as the intent representation space; thus, each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into the Wikipedia representation space. Compared with previous approaches, our proposed method achieves much better coverage when classifying queries in an intent domain, even though the number of seed intent examples is very small. Moreover, the method is very general and can easily be applied to various intent domains. We demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform quantitative evaluations in comparison with two baseline methods, and the experimental results show that our method significantly outperforms the other methods in each intent domain.
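The abstract represents each intent domain as a set of Wikipedia articles and categories and classifies a query by mapping it into that concept space. The Python sketch below illustrates the idea at toy scale. It is not the authors' algorithm: the `TOY_WIKI` dictionary (standing in for a Wikipedia dump), the greedy title matching, the one-hop category expansion, and the 0.5 coverage threshold are all hypothetical simplifications chosen for illustration.

```python
# Hypothetical toy index standing in for Wikipedia: article title -> categories.
TOY_WIKI = {
    "hotel": {"Hospitality", "Travel"},
    "hostel": {"Hospitality"},
    "flight": {"Air travel", "Travel"},
    "paris": {"Capitals in Europe"},
    "resume": {"Employment"},
    "salary": {"Employment", "Wages"},
    "job interview": {"Employment", "Recruitment"},
}


def map_query_to_concepts(query, index, max_n=3):
    """Map a query to article titles by greedy longest n-gram matching."""
    tokens = query.lower().split()
    concepts, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in index:
                concepts.append(phrase)
                i += n
                break
        else:  # no title starts at token i, so skip that token
            i += 1
    return concepts


def build_domain(seed_queries, index):
    """Intent domain = seed articles plus articles sharing a category with them (one hop)."""
    seeds = {c for q in seed_queries for c in map_query_to_concepts(q, index)}
    categories = set().union(*(index[a] for a in seeds))
    return {a for a, cats in index.items() if a in seeds or cats & categories}


def classify(query, domains, index, threshold=0.5):
    """Return the domain covering the largest share of the query's concepts, or None."""
    concepts = map_query_to_concepts(query, index)
    if not concepts:
        return None
    best, best_score = None, 0.0
    for name, articles in domains.items():
        score = sum(c in articles for c in concepts) / len(concepts)
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else None


# Only a couple of seed queries per domain, as in the abstract's setting.
domains = {
    "travel": build_domain(["cheap hotel", "flight to rome"], TOY_WIKI),
    "job": build_domain(["resume tips"], TOY_WIKI),
}
print(classify("hostel near paris", domains, TOY_WIKI))  # travel
print(classify("average salary", domains, TOY_WIKI))     # job
print(classify("weather today", domains, TOY_WIKI))      # None
```

In this toy run, "hostel" never appears in a seed query, yet the query is routed to travel because that article shares the Hospitality category with the seed article "hotel". This mirrors the coverage argument in the abstract in miniature; a real system would operate over the full Wikipedia article, link, and category graph rather than a hand-made dictionary.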

Research questions

"In this paper, we propose a general methodology to the problem of query intent classification."

Research details

Topics: Query processing
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Main topic
Theories: "Undetermined"
Research design: Case study, Experiment
Data source: Experiment responses
Collected data time dimension: Cross-sectional
Unit of analysis: N/A
Wikipedia data extraction: Dump
Wikipedia page type: Article, Log
Wikipedia language: English

Conclusion

"The Wikipedia concepts are used as the intent representation space, thus, each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified through mapping the query into the Wikipedia representation space. Compared with previous approaches, our proposed method can achieve much better coverage to classify queries in an intent domain even [though] the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. We demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform the quantitative evaluations in comparison with two baseline methods, and the experimental results [show] that our method significantly outperforms other methods in each intent domain."

Comments

"Compared with previous approaches, our proposed method can achieve much better coverage to classify queries in an intent domain even [though] the number of seed intent examples is very small." (p. 471)


Further notes