Wikitology: a novel hybrid knowledge base derived from Wikipedia
Abstract World knowledge may be available in different forms such as relational databases, triple stores, link graphs, meta-data and free text. Human minds are capable of understanding and reasoning over knowledge represented in different ways and are influenced by different social, contextual and environmental factors. By following a similar model, we have integrated a variety of knowledge sources in a novel way to produce a single hybrid knowledge base i.e., Wikitology, enabling applications to better access and exploit knowledge hidden in different forms. Wikipedia proves to be an invaluable resource for generating a hybrid knowledge base due to the availability and interlinking of structured, semi-structured and un-structured encyclopedic information. However, Wikipedia is designed in a way that facilitates human understanding and contribution by providing interlinking of articles and categories for better browsing and search of information, making the content easily understandable to humans but requiring intelligent approaches for being exploited by applications directly. Research projects like Cyc [61] have resulted in the development of a complex broad coverage knowledge base, however, relatively few applications have been built that really exploit it. In contrast, the design and development of Wikitology KB has been incremental and has been driven and guided by a variety of applications and approaches that exploit the knowledge available in Wikipedia in different ways. This evolution has resulted in the development of a hybrid knowledge base that not only incorporates and integrates a variety of knowledge resources but also a variety of data structures, and exposes the knowledge hidden in different forms to applications through a single integrated query interface.
We demonstrate the value of the derived knowledge base by developing problem specific intelligent approaches that exploit Wikitology for a diverse set of use cases, namely, document concept prediction, cross document co-reference resolution defined as a task in Automatic Content Extraction (ACE) [1], Entity Linking to KB entities defined as a part of Text Analysis Conference - Knowledge Base Population Track 2009 [65] and interpreting tables [94]. These use cases directly serve to evaluate the utility of the knowledge base for different applications and also demonstrate how the knowledge base could be exploited in different ways. Based on our work we have also developed a Wikitology API that applications can use to exploit this unique hybrid knowledge resource for solving real world problems. The different use cases that exploit Wikitology for solving real world problems also contribute to enriching the knowledge base automatically. The document concept prediction approach can predict inter-article and category-links for new Wikipedia articles. Cross document co-reference resolution and entity linking provide a way for specifically linking entity mentions in Wikipedia articles or external articles to the entity articles in Wikipedia and also help in suggesting redirects. In addition to that we have also developed specific approaches aimed at automatically enriching the Wikitology KB by unsupervised discovery of ontology elements using the inter-article links, generating disambiguation trees for entities and estimating the page rank of Wikipedia concepts to serve as a measure of popularity. The set of approaches combined together can contribute to a number of steps in a broader unified framework for automatically adding new concepts to the Wikitology knowledge base.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments Research design: Design science "we can create a hybrid knowledge base from Wikipedia and other related knowledge sources by automatically generating knowledge about the world, effectively supporting a diverse set of common use cases. Wikipedia is originally designed for humans. Applications need to employ intelligent approaches to access and harvest the knowledge available in Wikipedia for solving different problems." p. "We downloaded the Wikipedia XML snapshot of 4 November 2006 and extracted 2,557,939 Wikipedia articles." p. 66
Conclusion In the thesis statement we stated that we can create a hybrid knowledge base from Wikipedia and other related knowledge sources by automatically generating knowledge about the world, effectively supporting a diverse set of common use cases. Wikipedia is originally designed for humans. Applications need to employ intelligent approaches to access and harvest the knowledge available in Wikipedia for solving different problems. In this dissertation, we have presented novel approaches for solving a variety of real world problems by exploiting hybrid knowledge available in different forms such as free text, link graphs, categories and triples in a collaboratively developed knowledge resource, Wikipedia along with other related resources. These novel approaches exploiting Wikipedia have guided and directed the incremental development of the Wikitology knowledge base which unifies a variety of hybrid knowledge sources and representations in an organized way and provides a rich integrated query interface thus enabling applications to better access and exploit knowledge hidden in different forms and in different resources. This dissertation also presents a set of approaches that can generate structured data for automatically enriching Wikipedia and hence the Wikitology knowledge base by predicting inter-article links, categories, redirects, disambiguation trees and discovering ontology elements using the Wikipedia inter-article links. Structured data is often sparse. Unstructured data or free text can complement the structured data to overcome data sparseness in many cases. Text similarity algorithms return ranked results based on relevance which is not the case for structured data. A knowledge base incorporating information available in different forms can better meet the needs of real world applications than one focusing and exposing knowledge in a more restricted way such as through SQL, SPARQL or simple keyword queries.
Exploiting Wikipedia and related knowledge sources to develop a novel hybrid knowledge base brings advantages inherent to Wikipedia. Wikipedia provides a way to allow ordinary people to contribute knowledge as it is familiar and easy to use. This collaborative development process leads to a consensus model that is kept current and up-to-date and is also available in many languages. Incorporating these qualities in knowledge bases like Cyc [61] will be very expensive in terms of time, effort and cost. Efforts like DBpedia, Freebase and Linked Open Data are focused on making knowledge available in structured forms. Wikitology knowledge base can complement existing resources by integrating knowledge available in other forms and providing much more flexible access to knowledge. We have directly demonstrated through our work that we can use world knowledge accessible through Wikitology hybrid knowledge base system to go beyond the level of mere words and can predict the semantic concepts present in documents as well as resolve ambiguity in NER systems by mapping the entities mentioned in documents to unique entities in the real world. Wikitology knowledge base system can provide a way to access and utilize common-sense and background knowledge for solving a variety of real world problems.
Conference location United States, Maryland +
Data source Wikipedia pages  +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Wikitology%3A%2Ba%2Bnovel%2Bhybrid%2Bknowledge%2Bbase%2Bderived%2Bfrom%2BWikipedia%22  +
Has author Zareen Saba Syed +
Has domain Computer science +
Has topic Ontology building +
Peer reviewed Yes  +
Publication type Thesis  +
Published in University of Maryland, Baltimore County +
Research design Other  +
Research questions World knowledge may be available in different forms such as relational databases, triple stores, link graphs, meta-data and free text. Humans are capable of understanding and reasoning over knowledge represented in different ways and are influenced by different social, contextual and environmental factors. By following a similar model, we can integrate a variety of knowledge sources in a novel way to produce a single hybrid knowledge base enabling applications to better access and exploit knowledge hidden in different forms.
Revid 11,106  +
Theories Undetermined
Theory type Design and action  +
Title Wikitology: a novel hybrid knowledge base derived from Wikipedia
Unit of analysis Article  + , Category  +
Url http://proquest.umi.com/pqdweb?did=2157352461&Fmt=7&clientId=10306&RQT=309&VName=PQD  +
Wikipedia coverage Main topic  +
Wikipedia data extraction Dump  +
Wikipedia language English  +
Wikipedia page type Article  + , Information categorization and navigation  +
Year 2010  +
Creation date 15 March 2012 20:36:44  +
Categories Ontology building  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:33:07  +