The geographical analog engine: hybrid numeric and semantic similarity measures for U.S. cities
Abstract This dissertation began with the goal to develop a methodology for locating climate change analogs, and quickly turned into a quest for computational means of locating geographical analogs in general. Previous work in geographical analogs either only computed on numeric information, or manually considered qualitative information. Current and emerging technologies, such as electronic document collections, the Internet, and the Semantic Web, make it possible for people and organizations to store millions of books and articles, share them with the world, or even author some themselves. The amount of electronic and online content is expanding at an exponential speed, such that analysts are increasingly overwhelmed by the sheer volumes of accessible information. The dissertation explores techniques from knowledge engineering, artificial intelligence, information sciences, linguistics and cognitive science, and proposes a novel, automatic methodology that computes similarity within online/offline textual information, and graphically and statistically combines the results with those of numeric methods. U.S. cities with populations larger than 25,000 people are selected as a test case. Places are evaluated based on their numeric characteristics in the County and City Data Book and qualitative characteristics from Wikipedia entries. The dissertation recommends a way to convert Wikipedia entries into the Web Ontology Language (OWL) ontologies, which computer algorithms can read, understand and compute. The dissertation initially experiments with Mitra and Wiederhold's semantic measure to quantify similarity between places in the qualitative space. Many shortfalls are identified, and a series of experimental enhancements are explored. The experiments demonstrate that good semantic measures should employ a comprehensive stop-words list and a complete, but succinct vocabulary.
A semantic measure that can recognize synonyms must understand the intended senses of words in a place description. Furthermore, analysts need to be careful with two styles of descriptions: descriptions of places that are (1) created by following a template, or (2) laden with statistical statements can result in falsely high similarity between the places. It is illustrated that scatter plots of numeric similarity scores versus semantic similarity scores can effectively help analysts consider similarity between places in two-space. Analysts can visually observe whether the numeric ranks of places agree with the semantic ranks. The dissertation also shows that the Spearman's rank correlation test and the Kruskal-Wallis test of means can provide statistical confirmation for visual observations. The proposed hybrid methodology enables analysts to automatically discover geographical analogs in ways that strictly numeric methods or manual semantic analysis cannot offer.
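The abstract's check of whether numeric ranks agree with semantic ranks can be illustrated with a small sketch of Spearman's rank correlation. This is not the dissertation's code: the function names `average_ranks` and `spearman_rho` and the example scores are hypothetical, chosen only to show how two lists of similarity scores for the same candidate places might be compared by rank.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean = (len(rx) + 1) / 2  # mean rank, also valid when ties are averaged
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var_x = sum((a - mean) ** 2 for a in rx)
    var_y = sum((b - mean) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical similarity scores of five candidate cities to one target city.
    numeric_scores = [0.91, 0.85, 0.60, 0.42, 0.30]
    semantic_scores = [0.75, 0.80, 0.55, 0.20, 0.35]
    print(f"Spearman rho = {spearman_rho(numeric_scores, semantic_scores):.3f}")
```

A value near 1 would suggest the numeric and semantic rankings largely agree, while a value near 0 would suggest they rank places differently. A significance test on rho, as the abstract mentions, is outside this sketch.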
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments This study develops a method for translating Wikipedia content into a machine-computable format.
Conclusion This study develops a method for translating Wikipedia content into a machine-computable format.
Conference location United States, Pennsylvania +
Data source Experiment responses  + , Wikipedia pages  +
Google scholar url  +
Has author Tawan Banchuen +
Has domain Geography + , Information science +
Has topic Ontology building +
Peer reviewed Yes  +
Publication type Thesis  +
Published in Pennsylvania State University +
Research design Experiment  +
Research questions This study develops a machine for matching similar places based on the qualitative descriptions of those places on Wikipedia.
Revid 10,978  +
Theories Undetermined
Theory type Design and action  +
Title The geographical analog engine: hybrid numeric and semantic similarity measures for U.S. cities
Unit of analysis Article  +
Url  +
Wikipedia coverage Other  +
Wikipedia data extraction Live Wikipedia  +
Wikipedia language English  +
Wikipedia page type Article  +
Year 2008  +
Creation date 15 March 2012 20:30:42  +
Categories Ontology building  + , Geography  + , Information science  + , Publications  +
Modification date 30 January 2014 20:31:44  +