Last modified on January 30, 2014, at 20:30

Ranking very many typed entities on Wikipedia

Publication (help)
Ranking very many typed entities on Wikipedia
Authors: Hugo Zaragoza, Henning Rode, Peter Mika, Jordi Atserias, Massimiliano Ciaramita, Giuseppe Attardi [edit item]
Citation: CIKM '07 Proceedings of the sixteenth ACM conference on Conference on information and knowledge management  : 1015-1018. 2007.
Publication type: Conference paper
Peer-reviewed: Yes
Database(s):
DOI: Define doi.
Google Scholar cites: Citations
Link(s):
Added by Wikilit team: Added on initial load
Search
Article: Google Scholar BASE PubMed
Other scholarly wikis: AcaWiki Brede Wiki WikiPapers
Web search: Bing Google Yahoo!Google PDF
Other:
Services
Format: BibTeX
Ranking very many typed entities on Wikipedia is a publication by Hugo Zaragoza, Henning Rode, Peter Mika, Jordi Atserias, Massimiliano Ciaramita, Giuseppe Attardi.


[edit] Abstract

We discuss the problem of ranking very many entities of different types. In particular we deal with a heterogeneous set of types, some being very generic and some very specific. We discuss two approaches for this problem: i) exploiting the entity containment graph and ii) using a Web search engine to compute entity relevance. We evaluate these approaches on the real task of ranking Wikipedia entities typed with a state-of-the-art named-entity tagger. Results show that both approaches can greatly increase the performance of methods based only on passage retrieval.

[edit] Research questions

"We discuss the problem of ranking very many entities of different types. In particular we deal with a heterogeneous set of types, some being very generic and some very specific."

Research details

Topics: Ranking and clustering systems [edit item]
Domains: Computer science [edit item]
Theory type: Design and action [edit item]
Wikipedia coverage: Main topic [edit item]
Theories: "Undetermined" [edit item]
Research design: Statistical analysis [edit item]
Data source: Wikipedia pages [edit item]
Collected data time dimension: Cross-sectional [edit item]
Unit of analysis: Article [edit item]
Wikipedia data extraction: Dump [edit item]
Wikipedia page type: Article [edit item]
Wikipedia language: English [edit item]

[edit] Conclusion

"We have taken the rst steps towards studying the problem of ad-hoc entity ranking in the presence of a large set of heterogeneous entities. We have constructed a realistic test- bed to carry out evaluation of entity ranking models, and we have provided some initial directions of research. With respect to entity containment graphs our results show that it is important to take into account the notion of inverted entity frequency to discount general types. With respect to Web methods we showed that taking into account the rank of the documents in the computation of correlations can yield signi cant improvements in performance"

[edit] Comments

"" With respect to entity containment graphs our results show that it is important to take into account the notion of inverted entity frequency to discount general types. With respect to Web methods we showed that taking into account the rank of the documents in the computation of correlations can yield signi cant improvements in performanc" p. 1018"


Further notes[edit]