Sparse relational data sets: issues and an application
Abstract This dissertation comprises three parts. The first part presents a relational approach to building a workbench that supports extracting and processing structured data from unstructured data for querying, and that allows users to query using as much structured data as is currently available. The workbench provides basic operations that can be combined to process data in a "pay-as-you-go" fashion, and a wide table to store the resulting sparse data set, in which most attributes are null for most documents. As a proof of concept, we conducted a case study applying this relational workbench approach to support structured queries over Wikipedia. We present examples of incremental data processing with a series of operations and show that users can pose increasingly sophisticated queries over the results of these operations. Our conclusion from the case study is that while the relational workbench approach is promising and worth investigating, its success relies heavily on good relational database support for sparse data. Unfortunately, most relational database systems are not good at handling sparse data sets. The second part of the dissertation addresses some challenges that arise when managing sparse data in relational database systems. Given recent work showing that sparse data can be stored efficiently using interpreted storage, we show that storing a sparse data set in a single wide table is an effective approach. For querying, we show that keyword search often provides "focused" results because most terms appear in few rows and columns. As for query evaluation...
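The wide-table and keyword-search ideas in the abstract can be illustrated with a minimal Python sketch, assuming a simple in-memory model of interpreted storage in which each row keeps only its non-null attributes as (attribute id, value) pairs, so a null attribute costs nothing. The class `InterpretedWideTable` and its methods are hypothetical illustrations, not the dissertation's implementation or any database system's API. The keyword search reports the (row, column) cells that contain a term, which shows why hits over sparse data tend to be "focused".

```python
from typing import Any, Dict, List, Optional, Tuple


class InterpretedWideTable:
    """Hypothetical sketch: a wide table stored with interpreted storage.

    Each row holds only its non-null attributes as (attribute_id, value)
    pairs; any attribute missing from a row is interpreted as NULL.
    """

    def __init__(self) -> None:
        self.attr_ids: Dict[str, int] = {}   # catalog: attribute name -> id
        self.attr_names: List[str] = []      # catalog: id -> attribute name
        self.rows: List[List[Tuple[int, Any]]] = []

    def _attr_id(self, name: str) -> int:
        # A new attribute only extends the catalog; existing rows are untouched,
        # so the table can keep growing wider in a pay-as-you-go fashion.
        if name not in self.attr_ids:
            self.attr_ids[name] = len(self.attr_names)
            self.attr_names.append(name)
        return self.attr_ids[name]

    def insert(self, record: Dict[str, Any]) -> int:
        # Null (None) values are skipped entirely and never stored.
        pairs = [(self._attr_id(k), v) for k, v in record.items() if v is not None]
        self.rows.append(pairs)
        return len(self.rows) - 1

    def get(self, row_id: int, name: str) -> Optional[Any]:
        aid = self.attr_ids.get(name)
        if aid is None:
            return None
        for a, v in self.rows[row_id]:
            if a == aid:
                return v
        return None  # attribute absent from this row: NULL

    def keyword_search(self, term: str) -> List[Tuple[int, str]]:
        # Return every (row, column) cell whose value contains the term,
        # case-insensitively; only stored (non-null) cells are scanned.
        term = term.lower()
        hits: List[Tuple[int, str]] = []
        for rid, pairs in enumerate(self.rows):
            for aid, v in pairs:
                if term in str(v).lower():
                    hits.append((rid, self.attr_names[aid]))
        return hits


if __name__ == "__main__":
    table = InterpretedWideTable()
    table.insert({"title": "Paris", "country": "France", "river": None})
    table.insert({"title": "Lake Mendota", "state": "Wisconsin"})
    print(table.keyword_search("wisconsin"))  # [(1, 'state')]
```

The design choice here is that a row's width is proportional to its non-null attributes rather than to the number of columns in the table, which is what makes a single very wide table practical for sparse data.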
Added by wikilit team Added on initial load
Collected data time dimension Cross-sectional
Comments The presented method can extract structured data from unstructured data for querying.
Conclusion Our experience in this study demonstrates that our approach exploits existing technology effectively and facilitates the transition from extracting structured data from documents to querying this structured data.
Conference location United States, Wisconsin
Data source Wikipedia pages
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Sparse%2Brelational%2Bdata%2Bsets%3A%2Bissues%2Band%2Ban%2Bapplication%22
Has author Eric Lik Han Chu
Has domain Computer science
Has topic Information extraction
Peer reviewed Yes
Publication type Thesis
Published in The University of Wisconsin - Madison
Research design Case study
Research questions Presents a relational approach to building a workbench that supports extracting and processing structured data from unstructured data for querying, and that allows users to query using as much structured data as is currently available.
Revid 10,949
Theories Undetermined
Theory type Design and action
Title Sparse relational data sets: issues and an application
Unit of analysis Article
Url http://proquest.umi.com/pqdweb?did=1599589551&Fmt=7&clientId=10306&RQT=309&VName=PQD
Wikipedia coverage Case
Wikipedia data extraction Dump
Wikipedia language English
Wikipedia page type Article
Year 2008
Creation date 15 March 2012 20:30:16
Categories Information extraction, Computer science, Publications
Modification date 30 January 2014 20:31:25