Sparse relational data sets: issues and an application

From WikiLit
Publication
Sparse relational data sets: issues and an application
Authors: Eric Lik Han Chu
Citation: The University of Wisconsin - Madison. 2008. United States, Wisconsin.
Publication type: Thesis
Peer-reviewed: Yes
Database(s):
Link(s): Paper link
Added by Wikilit team: Added on initial load
Sparse relational data sets: issues and an application is a publication by Eric Lik Han Chu.


Abstract

This dissertation comprises three parts. The first part presents a relational approach to building a workbench that supports extracting and processing structured data from unstructured data for querying, and that allows users to query using as much structured data as is currently available. The workbench provides basic operations that can be combined to process data in a "pay as you go" fashion, and a wide table to store the resulting sparse data set, in which most attributes are null for most documents. As a proof of concept, we conducted a case study on applying this relational workbench approach to support structured queries over Wikipedia. We present examples of incremental data processing with a series of operations and show that users can pose increasingly sophisticated queries over the results of these operations. Our conclusion from the case study is that while the relational workbench approach is promising and worth investigating, its success relies heavily on good relational database support for sparse data. Unfortunately, most relational database systems are not good at handling sparse data sets. The second part of the dissertation addresses some challenges that arise when managing sparse data in relational database systems. Given recent work showing that sparse data can be stored efficiently using interpreted storage, we show that storing a sparse data set in a single wide table is an effective approach. For querying, we show that keyword search often provides "focused" results because most terms appear in few rows and columns. As for query evaluation
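The storage and keyword-search claims in the abstract can be illustrated with a toy model. This is a minimal sketch, not the thesis's implementation: it assumes a simplified form of interpreted storage in which each row keeps only its non-null attributes as (attribute id, value) pairs, and the class `SparseWideTable` and its methods are hypothetical names. It shows why a single wide table stays compact when most attributes are null, and why a keyword match naturally reports only the few rows and columns in which a term occurs.

```python
from typing import Dict, List, Optional, Tuple


class SparseWideTable:
    """Hypothetical, simplified model of a wide table in interpreted storage:
    each row keeps only its non-null attributes as (attribute_id, value) pairs."""

    def __init__(self) -> None:
        self.attr_ids: Dict[str, int] = {}  # catalog: attribute name -> id
        self.rows: List[List[Tuple[int, str]]] = []

    def _attr_id(self, name: str) -> int:
        # A new attribute only extends the catalog; stored rows are untouched.
        if name not in self.attr_ids:
            self.attr_ids[name] = len(self.attr_ids)
        return self.attr_ids[name]

    def insert(self, values: Dict[str, Optional[str]]) -> int:
        # Null attributes are simply not stored in the row.
        row = [(self._attr_id(k), v) for k, v in values.items() if v is not None]
        self.rows.append(row)
        return len(self.rows) - 1

    def get(self, row_id: int, attr: str) -> Optional[str]:
        # "Interpret" the row: scan its pairs for the requested attribute id.
        attr_id = self.attr_ids.get(attr)
        for a, v in self.rows[row_id]:
            if a == attr_id:
                return v
        return None  # attribute absent from this row, i.e. null

    def keyword_search(self, term: str) -> List[Tuple[int, str]]:
        # Report every (row_id, attribute) cell whose value contains the term.
        names = {i: n for n, i in self.attr_ids.items()}
        needle = term.lower()
        return [(row_id, names[a])
                for row_id, row in enumerate(self.rows)
                for a, v in row if needle in v.lower()]


t = SparseWideTable()
t.insert({"title": "Madison", "state": "Wisconsin"})
t.insert({"title": "Lake Mendota", "type": "lake", "state": None})
t.insert({"title": "Wisconsin River", "type": "river"})
hits = t.keyword_search("wisconsin")  # [(0, 'state'), (2, 'title')]
```

Because an absent attribute costs nothing in a row, adding a new attribute only extends the catalog. The trade-off in this sketch is that reading one attribute means scanning (interpreting) the row's pairs rather than jumping to a fixed offset.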

Research questions

"presents a relational approach to building a workbench that supports extracting and processing structured data from unstructured data for querying, and that allows users to query using as much structured data as is currently available."

Research details

Topics: Information extraction
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Case
Theories: "Undetermined"
Research design: Case study
Data source: Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"Our Experience in this study demonstartes that our approach exploits existing technology effectively and facilitates the transition from extracting structured data from documents to querying this structured data."

Comments

"our represented method could extract structured data from unstructured data for querying"


Further notes

Facts about "Sparse relational data sets: issues and an application"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Conference location: United States, Wisconsin
Data source: Wikipedia pages
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Sparse%2Brelational%2Bdata%2Bsets%3A%2Bissues%2Band%2Ban%2Bapplication%22
Has author: Eric Lik Han Chu
Has domain: Computer science
Has topic: Information extraction
Peer reviewed: Yes
Publication type: Thesis
Published in: The University of Wisconsin - Madison
Research design: Case study
Revid: 10,949
Theories: Undetermined
Theory type: Design and action
Title: Sparse relational data sets: issues and an application
Unit of analysis: Article
Url: http://proquest.umi.com/pqdweb?did=1599589551&Fmt=7&clientId=10306&RQT=309&VName=PQD
Wikipedia coverage: Case
Wikipedia data extraction: Dump
Wikipedia language: English
Wikipedia page type: Article
Year: 2008