Keywords in the mist: automated keyword extraction for very large documents and back of the book indexing
Abstract This research addresses the problem of automatic keyphrase extraction from large documents and back of the book indexing. The potential benefits of automating this process are far-reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back of the book indexes. The dissertation introduces a new methodology to evaluate automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system and the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back of the book indexing system based on the keyphrase extraction system was shown to lead to back of the book indexes closely resembling those created by human experts.
Added by wikilit team Added on initial load
Collected data time dimension Cross-sectional
Conference location United States, Texas
Data source Wikipedia pages
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Keywords%2Bin%2Bthe%2Bmist%3A%2Bautomated%2Bkeyword%2Bextraction%2Bfor%2Bvery%2Blarge%2Bdocuments%2Band%2Bback%2Bof%2Bthe%2Bbook%2Bindexing%22
Has author Andras Csomai
Has domain Computer science
Has topic Information extraction
Peer reviewed Yes
Publication type Thesis
Published in University of North Texas
Research design Statistical analysis
Research questions What are the challenges associated with keyphrase extraction on very long documents? What is the role played by different information theoretic and linguistic features? What is the role played by supervision in the quality of keyphrase extraction and back of the book indexing? Can techniques for keyphrase extraction be used to develop a fully automated back of the book indexing system?
Revid 10,809
Theories In Chapter 5, I present the construction-integration theory of human comprehension…
Theory type Design and action
Title Keywords in the mist: automated keyword extraction for very large documents and back of the book indexing
Unit of analysis N/A
Url http://proquest.umi.com/pqdweb?did=1597616811&Fmt=7&clientId=10306&RQT=309&VName=PQD
Wikipedia coverage Sample data
Wikipedia data extraction Dump
Wikipedia language Not specified
Wikipedia page type Article
Year 2008
Creation date 15 March 2012 20:29:26
Categories Information extraction, Computer science, Publications with missing conclusion, Publications with missing comments, Publications
Modification date 30 January 2014 20:28:45