Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features

Authors: Viola Ganter, Michael Strube
Citation: ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers: 173-176. 2009.
Publication type: Conference paper
Peer-reviewed: Yes
Link(s): http://dl.acm.org/citation.cfm?id=1667636
Added by Wikilit team: Added on initial load
Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features is a publication by Viola Ganter and Michael Strube.


Abstract

We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.

Research questions

"We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features."

Research details

Topics: Computational linguistics
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Main topic
Theories: "Undetermined"
Research design: Experiment
Data source: Experiment responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"The experiments show that the syntactic patterns work better when using a broader notion of hedging tested on manual annotations. When evaluating on Wikipedia weasel tags itself, word frequency and distance to the tag is sufficient. Our approach takes a much broader domain into account than previous work. It can also easily be applied to different languages as the weasel tag exists in more than 20 different language versions of Wikipedia. For a narrow domain, we suggest to start with our approach for deriving a seed set of hedging indicators and then to use a weakly supervised approach. Though our classifiers were trained on data from multiple Wikipedia dumps, there were only a few hundred training instances available. The transient nature of the weasel tag suggests to use the Wikipedia edit history for future work, since the edits faithfully record all occurrences of weasel tags."
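The conclusion's remark that word frequency and distance to the tag suffice can be illustrated with a minimal sketch. It is not the authors' implementation or their scoring formula. The `{{weasel}}` marker, the `window` parameter, the "words shortly before a tag" assumption, and the function names `train` and `hedge_score` are all hypothetical choices made for illustration only.

```python
import re
from collections import Counter

# Hypothetical placeholder marker; real Wikipedia weasel templates differ.
TAG = "{{weasel}}"
TOKEN_RE = re.compile(r"\{\{weasel\}\}|[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens and tag markers."""
    return TOKEN_RE.findall(text.lower())


def train(sentences, window=5):
    """Score each word by the share of its occurrences that fall within
    `window` tokens before a tag (an assumed, simplified weighting)."""
    near, total = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        tag_positions = [i for i, tok in enumerate(tokens) if tok == TAG]
        for i, tok in enumerate(tokens):
            if tok == TAG:
                continue
            total[tok] += 1
            # Count the word as "near" if some tag follows it closely.
            if any(0 < pos - i <= window for pos in tag_positions):
                near[tok] += 1
    return {word: near[word] / total[word] for word in total}


def hedge_score(sentence, scores):
    """Return the highest word score in the sentence (0.0 if none known)."""
    words = [tok for tok in tokenize(sentence) if tok != TAG]
    return max((scores.get(word, 0.0) for word in words), default=0.0)
```

A sentence would then be flagged as hedged when `hedge_score` exceeds a threshold tuned on held-out data; taking the maximum rather than a sum is a design choice of this sketch, made so that sentence length does not inflate the score.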

Comments


Further notes

Facts about "Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features"
Abstract: We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Conclusion: The experiments show that the syntactic patterns work better when using a broader notion of hedging tested on manual annotations. When evaluating on Wikipedia weasel tags itself, word frequency and distance to the tag is sufficient. Our approach takes a much broader domain into account than previous work. It can also easily be applied to different languages as the weasel tag exists in more than 20 different language versions of Wikipedia. For a narrow domain, we suggest to start with our approach for deriving a seed set of hedging indicators and then to use a weakly supervised approach. Though our classifiers were trained on data from multiple Wikipedia dumps, there were only a few hundred training instances available. The transient nature of the weasel tag suggests to use the Wikipedia edit history for future work, since the edits faithfully record all occurrences of weasel tags.
Data source: Experiment responses, Wikipedia pages
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Finding%2Bhedges%2Bby%2Bchasing%2Bweasels%3A%2Bhedge%2Bdetection%2Busing%2BWikipedia%2Btags%2Band%2Bshallow%2Blinguistic%2Bfeatures%22
Has author: Viola Ganter, Michael Strube
Has domain: Computer science
Has topic: Computational linguistics
Pages: 173-176
Peer reviewed: Yes
Publication type: Conference paper
Published in: ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Research design: Experiment
Research questions: We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.
Revid: 10,773
Theories: Undetermined
Theory type: Design and action
Title: Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features
Unit of analysis: Article
Url: http://dl.acm.org/citation.cfm?id=1667636
Wikipedia coverage: Main topic
Wikipedia data extraction: Dump
Wikipedia language: Not specified
Wikipedia page type: Article
Year: 2009