Large-scale named entity disambiguation based on Wikipedia data

From WikiLit
Authors: Silviu Cucerzan
Citation: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague: Association for Computational Linguistics, 2007, pp. 708-716.
Publication type: Conference paper
Peer-reviewed: Yes
Link(s): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.1582
Added by Wikilit team: Yes
Large-scale named entity disambiguation based on Wikipedia data is a publication by Silviu Cucerzan.


Abstract

This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection and Web search results. It describes in detail the disambiguation paradigm employed and the information extraction process from Wikipedia. Through a process of maximizing the agreement between the contextual information extracted from Wikipedia and the context of a document, as well as the agreement among the category tags associated with the candidate entities, the implemented system shows high disambiguation accuracy on both news stories and Wikipedia articles.
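
The agreement-maximization step described in the abstract can be pictured as scoring each candidate entity by how well its Wikipedia-derived profile matches the document. Below is a minimal Python sketch under assumed data structures (candidate profiles with a context term vector and a set of category tags; all names are hypothetical), not the paper's actual implementation:

from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def disambiguate(doc_context: Counter, candidates: list[dict]) -> dict:
    """Pick the candidate whose Wikipedia-derived context vector and
    category tags agree most with the document (weights illustrative)."""
    # Pool of category tags carried by the candidate entities, standing in
    # for the paper's document-level category agreement.
    doc_tags = set().union(*(c["categories"] for c in candidates))

    def score(c: dict) -> float:
        context_agreement = cosine(doc_context, c["context"])
        tag_agreement = len(c["categories"] & doc_tags) / len(doc_tags) if doc_tags else 0.0
        return context_agreement + tag_agreement

    return max(candidates, key=score)

In the paper the agreement is maximized jointly over all entity mentions in a document; the sketch scores one mention at a time for brevity.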

Research questions

"The system discussed in this paper performs both named entity identification and disambiguation. ... The disambiguation component, which constitutes the main focus of the paper, employs a vast amount of contextual and category information automatically extracted from Wikipedia .... We augment the Wikipedia category information with information automatically extracted from Wikipedia list pages and use it in conjunction with the context information in a vectorial model that employs a novel disambiguation method."

Research details

Topics: Information extraction, Other natural language processing topics
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment
Data source: Experiment responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"We presented a large scale named entity disambiguation system that employs a huge amount of information automatically extracted from Wikipedia over a space of more than 1.4 million entities. In tests on both real news data and Wikipedia text, the system obtained accuracies exceeding 91% and 88%."
