Unsupervised query segmentation using generative language models and Wikipedia

Authors: Bin Tan, Fuchun Peng
Citation: Proceeding of the 17th international conference on World Wide Web: 347-356. 2008.
Publication type: Conference paper
Peer-reviewed: Yes
DOI: 10.1145/1367497.1367545
Link(s): http://dl.acm.org/citation.cfm?id=1367545
Added by Wikilit team: Added on initial load
Unsupervised query segmentation using generative language models and Wikipedia is a publication by Bin Tan and Fuchun Peng.


Abstract

In this paper, we propose a novel unsupervised approach to query segmentation, an important task in Web search. We use a generative query model to recover a query's underlying concepts that compose its original segmented form. The model's parameters are estimated using an expectation-maximization (EM) algorithm, optimizing the minimum description length objective function on a partial corpus that is specific to the query. To augment this unsupervised learning, we incorporate evidence from Wikipedia. Experiments show that our approach dramatically improves performance over the traditional approach that is based on mutual information, and produces comparable results with a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual information based method (measured by segment F1 on the Intersection test set). EM optimization further improves the performance by 14.3%. Additional knowledge from Wikipedia provides another improvement of 24.3%, adding up to a total of 46% improvement (from 0.530 to 0.774).
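
The improvement figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that each step (7.4%, 14.3%, 24.3%) is expressed relative to the mutual-information baseline of 0.530, which is consistent with the steps summing to the stated 46% total. The intermediate F1 values it prints are derived under that assumption and are not quoted in this record.

```python
# Figures quoted in the abstract (segment F1, Intersection test set).
baseline_f1 = 0.530                  # mutual-information method
final_f1 = 0.774                     # generative model + EM + Wikipedia
step_gains_pct = [7.4, 14.3, 24.3]   # language model, EM, Wikipedia

# Total relative improvement implied by the two endpoint scores.
total_relative_gain_pct = (final_f1 / baseline_f1 - 1) * 100
sum_of_steps_pct = sum(step_gains_pct)

# Assumption: every step is a percentage of the baseline score,
# so the cumulative score after each step is baseline * (1 + running sum).
cumulative_f1 = []
running_pct = 0.0
for gain in step_gains_pct:
    running_pct += gain
    cumulative_f1.append(round(baseline_f1 * (1 + running_pct / 100), 3))

print(round(total_relative_gain_pct, 1))  # 46.0
print(round(sum_of_steps_pct, 1))         # 46.0
print(cumulative_f1)                      # [0.569, 0.645, 0.774]
```

Compounding the three steps multiplicatively would instead give roughly a 52.6% gain, which does not match the reported endpoints, so the additive reading appears to be the intended one.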

Research questions

"In this paper, we propose a novel unsupervised approach to query segmentation, an important task in Web search. We use a generative query model to recover a query's underlying concepts that compose its original segmented form. The model's parameters are estimated using an expectation-maximization (EM) algorithm, optimizing the minimum description length objective function on a partial corpus that is specific to the query. To augment this unsupervised learning, we incorporate evidence from Wikipedia."

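As a rough illustration of the generative view described above, a query can be treated as a sequence of concepts drawn independently from a concept distribution, and the most probable segmentation can be found by dynamic programming. The sketch below is a minimal example under stated assumptions: the function name `best_segmentation`, the `max_len` and `unseen_prob` parameters, and all probabilities in `toy_concept_prob` are hypothetical and chosen for illustration. It does not include the EM estimation, the minimum description length objective, or the Wikipedia evidence used in the paper.

```python
import math

def best_segmentation(words, concept_prob, max_len=4, unseen_prob=1e-9):
    """Return the segmentation maximizing the product of concept probabilities."""
    n = len(words)
    # best[i] holds (log-probability, segments) for the first i words.
    best = [(0.0, [])] + [(-math.inf, [])] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            segment = " ".join(words[start:end])
            prob = concept_prob.get(segment)
            if prob is None:
                # Unknown multi-word segments are skipped;
                # unknown single words fall back to a small floor value.
                if end - start > 1:
                    continue
                prob = unseen_prob
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [segment])
    return best[n][1]

# Hypothetical concept probabilities, invented for this example.
toy_concept_prob = {
    "new york": 0.02,
    "new": 0.01,
    "york": 0.001,
    "times": 0.005,
    "new york times": 0.015,
    "subscription": 0.002,
}

query = "new york times subscription".split()
print(best_segmentation(query, toy_concept_prob))
# ['new york times', 'subscription']
```

Working in log space avoids numerical underflow on longer queries; in an unsupervised setting the concept probabilities would have to be estimated from data rather than supplied by hand as they are here.
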
Research details

Topics: Query processing
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment, Statistical analysis
Data source: Experiment responses, Websites, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: N/A
Wikipedia data extraction: Dump
Wikipedia page type: Article, Log
Wikipedia language: English

Conclusion

"Experiments show that our approach dramatically improves performance over the traditional approach that is based on mutual information, and produces comparable results with a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual information based method (measured by segment F1 on the Intersection test set). EM optimization further improves the performance by 14.3%. Additional knowledge from Wikipedia provides another improvement of 24.3%, adding up to a total of 46% improvement (from 0.530 to 0.774)."

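For context on the baseline mentioned above, one common form of mutual-information segmentation places a break between adjacent words whose pointwise mutual information falls below a threshold. The sketch below illustrates that idea only; the function names, the threshold, and all counts are hypothetical, and the exact baseline configuration used in the paper is not described in this record.

```python
import math

def pmi(word_a, word_b, unigram_count, bigram_count, total):
    """Pointwise mutual information of an adjacent word pair."""
    joint = bigram_count.get((word_a, word_b), 0) / total
    if joint == 0:
        return -math.inf
    p_a = unigram_count[word_a] / total
    p_b = unigram_count[word_b] / total
    return math.log(joint / (p_a * p_b))

def segment_by_mi(words, unigram_count, bigram_count, total, threshold):
    """Start a new segment wherever adjacent-word PMI drops below threshold."""
    if not words:
        return []
    segments = [[words[0]]]
    for prev, curr in zip(words, words[1:]):
        if pmi(prev, curr, unigram_count, bigram_count, total) >= threshold:
            segments[-1].append(curr)
        else:
            segments.append([curr])
    return [" ".join(seg) for seg in segments]

# Hypothetical corpus counts, invented for this example.
toy_total = 1_000_000
toy_unigrams = {"new": 10_000, "york": 2_000, "times": 5_000, "subscription": 1_000}
toy_bigrams = {("new", "york"): 1_800, ("york", "times"): 600, ("times", "subscription"): 1}

query = "new york times subscription".split()
print(segment_by_mi(query, toy_unigrams, toy_bigrams, toy_total, threshold=2.0))
# ['new york times', 'subscription']
```

Because each decision looks only at one pair of neighbouring words, this kind of method cannot weigh a whole segmentation against its alternatives, which is the limitation a generative model over complete segmentations is meant to address.
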
Comments

"the basic generative language model contributes a 7.4% improvement over the mutual information based method (measured by segment F1 on the Intersection test set). EM optimization further improves the performance by 14.3%. Additional knowledge from Wikipedia provides another improvement of 24.3%, adding up to a total of 46% improvement (from 0.530 to 0.774)."

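The scores quoted above are segment F1 values. A minimal sketch of how such a score can be computed for a single query follows, assuming that a predicted segment counts as correct only when its word span exactly matches a reference segment. The helper names `to_spans` and `segment_f1` are hypothetical, and how the paper aggregates scores across the test set is not stated in this record.

```python
def to_spans(segments):
    """Convert a list of segment strings into a set of (start, end) word spans."""
    spans, start = set(), 0
    for seg in segments:
        length = len(seg.split())
        spans.add((start, start + length))
        start += length
    return spans

def segment_f1(predicted, reference):
    """Harmonic mean of segment precision and recall for one query."""
    pred, ref = to_spans(predicted), to_spans(reference)
    correct = len(pred & ref)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)

predicted = ["new york", "times", "subscription"]
reference = ["new york times", "subscription"]
print(round(segment_f1(predicted, reference), 3))  # 0.4
```

In this example only the segment "subscription" matches, giving precision 1/3 and recall 1/2, hence an F1 of 0.4.
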

Further notes

Facts about "Unsupervised query segmentation using generative language models and Wikipedia"
Added by Wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Data source: Experiment responses, Websites, Wikipedia pages
DOI: 10.1145/1367497.1367545
Google Scholar URL: http://scholar.google.com/scholar?ie=UTF-8&q=%22Unsupervised%2Bquery%2Bsegmentation%2Busing%2Bgenerative%2Blanguage%2Bmodels%2Band%2BWikipedia%22
Has author: Bin Tan, Fuchun Peng
Has domain: Computer science
Has topic: Query processing
Pages: 347-356
Peer reviewed: Yes
Publication type: Conference paper
Published in: Proceeding of the 17th international conference on World Wide Web
Research design: Experiment, Statistical analysis
Revid: 11,014
Theories: Undetermined
Theory type: Design and action
Title: Unsupervised query segmentation using generative language models and Wikipedia
Unit of analysis: N/A
URL: http://dl.acm.org/citation.cfm?id=1367545
Wikipedia coverage: Sample data
Wikipedia data extraction: Dump
Wikipedia language: English
Wikipedia page type: Article, Log
Year: 2008