Extraction, selection and ranking of field association (FA) terms from domain-specific corpora for building a comprehensive FA terms dictionary

Authors: Tshering Dorji, El sayed Atlam, Susumu Yata, Masao Fuketa, Kazuhiro Morita, Junichi Aoe
Citation: Knowledge and Information Systems, April 2010.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1007/s10115-010-0296-x.
Link(s): http://dx.doi.org/10.1007/s10115-010-0296-x
Added by Wikilit team: Added on initial load
Extraction, selection and ranking of field association (FA) terms from domain-specific corpora for building a comprehensive FA terms dictionary is a publication by Tshering Dorji, El sayed Atlam, Susumu Yata, Masao Fuketa, Kazuhiro Morita, Junichi Aoe.


Abstract

Field Association (FA) Terms, words or phrases that serve to identify document fields, are effective in document classification, similar file retrieval and passage retrieval. The problem is the lack of an effective method to extract and select relevant FA Terms in order to build a comprehensive dictionary of FA Terms. This paper presents a new method to extract, select and rank FA Terms from domain-specific corpora using part-of-speech (POS) pattern rules, corpora comparison and modified tf-idf weighting. Experimental evaluation on 21 fields, using 306 MB of domain-specific corpora obtained from English Wikipedia dumps, selected up to 2,517 FA Terms (single and compound) per field at precision of 74–97% and recall of 65–98%, which is better than the traditional methods. The FA Terms dictionary constructed using this method achieved an average accuracy of 97.6% in identifying the fields of 10,077 test documents collected from Wikipedia, the Reuters RCV1 corpus and the 20 Newsgroups data set.
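The abstract names corpora comparison and a modified tf-idf weighting but does not give the formula. As a rough illustration only, the sketch below ranks the terms of one field with a plain tf-idf-style weight in which each field corpus plays the role of a "document", so that terms shared by many fields are down-weighted. The function name `rank_field_terms`, the toy corpora and the exact weight are assumptions made for this example; they are not the authors' method.

```python
import math
from collections import Counter


def rank_field_terms(field_docs, target_field):
    """Rank terms of target_field by a tf-idf-style weight across fields.

    field_docs maps a field name to a list of tokenized documents.
    NOTE: this is a plain tf-idf stand-in (an assumption), not the
    modified weighting described in the paper.
    """
    # Pool term counts over all documents of each field.
    tf = {
        field: Counter(tok for doc in docs for tok in doc)
        for field, docs in field_docs.items()
    }
    n_fields = len(tf)
    total = sum(tf[target_field].values())

    scores = {}
    for term, count in tf[target_field].items():
        # Corpus comparison: in how many field corpora does the term occur?
        fields_with_term = sum(1 for counts in tf.values() if term in counts)
        idf = math.log(n_fields / fields_with_term)
        scores[term] = (count / total) * idf

    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpora, one list of tokenized documents per field.
corpora = {
    "sports": [["goal", "match", "team"], ["goal", "team"]],
    "finance": [["stock", "market", "team"]],
    "medicine": [["dose", "patient"]],
}
ranked = rank_field_terms(corpora, "sports")
for term, weight in ranked:
    print(f"{term}\t{weight:.3f}")
```

In this toy data "team" also occurs in the finance corpus, so it ranks below "goal" and "match", which occur only in the sports corpus. A term that occurs in every field gets a weight of zero.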

Research questions

"Field Association (FA) Terms, words or phrases that serve to identify document fields, are effective in document classification, similar file retrieval and passage retrieval. But the problem lies in the lack of an effective method to extract and select relevant FA Terms to build a comprehensive dictionary of FA Terms. This paper presents a new method to extract, select and rank FA Terms from domain-specific corpora using part-of-speech (POS) pattern rules, corpora comparison and modified tf-idf weighting. Experimental evaluation on 21 fields using 306 MB of domain-specific corpora obtained from English Wikipedia dumps selected up to 2,517 FA Terms (single and compound) per field at precision and recall of 74–97 and 65–98. This is better than the traditional methods. The FA Terms dictionary constructed using this method achieved an average accuracy of 97.6% in identifying the fields of 10,077 test documents collected from Wikipedia, Reuters RCV1 corpus and 20 Newsgroup data set."

Research details

Topics: Other natural language processing topics
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment
Data source: Experiment responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"The novel technique of using FA Terms holds much potential for use in many areas of information retrieval and natural language processing, but one of the major problems today is the lack of a comprehensive FA Terms dictionary. Therefore, we have presented a methodology to extract and select FA Terms effectively to build a comprehensive FA Terms dictionary. The methodology is based on POS pattern rules, corpora comparison and modified tf-idf weighting for selecting domain-relevant terms. Experimental evaluation carried out for 21 different fields using 306 MB of domain-specific corpora obtained from Wikipedia dump selected 22,229 compound FA Terms and 9,005 single FA Terms. The precision and recall were 74–97 and 65–98% respectively. The results show that the proposed methodology is effective for building a comprehensive dictionary of FA Terms."
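The conclusion reports precision and recall per field. This record does not say how the reference set of relevant terms was built, so the snippet below only shows the standard definitions of the two measures applied to a set of selected terms. The helper name `precision_recall` and the example term sets are hypothetical.

```python
def precision_recall(selected, relevant):
    """Return (precision, recall) of selected terms against a reference set.

    precision = |selected ∩ relevant| / |selected|
    recall    = |selected ∩ relevant| / |relevant|
    """
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four selected terms, three reference terms.
selected_terms = {"goal", "match", "team", "market"}
reference_terms = {"goal", "match", "penalty kick"}
p, r = precision_recall(selected_terms, reference_terms)
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the four selected terms appear in the reference set, so precision is 2/4; two of the three reference terms were selected, so recall is 2/3.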

Comments

"we have presented a methodology to extract and select FA Terms effectively to build a comprehensive FA Terms dictionary."

