The geographical analog engine: hybrid numeric and semantic similarity measures for U.S. cities

From WikiLit
Publication
Authors: Tawan Banchuen
Citation: Pennsylvania State University, 2008. United States, Pennsylvania.
Publication type: Thesis
Peer-reviewed: Yes
Link(s): Paper link
Added by Wikilit team: Added on initial load
The geographical analog engine: hybrid numeric and semantic similarity measures for U.S. cities is a publication by Tawan Banchuen.


Abstract

This dissertation began with the goal of developing a methodology for locating climate change analogs and quickly turned into a quest for computational means of locating geographical analogs in general. Previous work on geographical analogs either computed only on numeric information or considered qualitative information manually. Current and emerging technologies, such as electronic document collections, the Internet, and the Semantic Web, make it possible for people and organizations to store millions of books and articles, share them with the world, or even author some themselves. The amount of electronic and online content is expanding at an exponential rate, such that analysts are increasingly overwhelmed by the sheer volume of accessible information. The dissertation explores techniques from knowledge engineering, artificial intelligence, information sciences, linguistics and cognitive science, and proposes a novel, automatic methodology that computes similarity within online/offline textual information and combines the results, graphically and statistically, with those of numeric methods. U.S. cities with populations larger than 25,000 are selected as a test case. Places are evaluated based on their numeric characteristics in the County and City Data Book and their qualitative characteristics from Wikipedia entries. The dissertation recommends a way to convert Wikipedia entries into Web Ontology Language (OWL) ontologies, which computer algorithms can read, understand and compute on. The dissertation initially experiments with Mitra and Wiederhold's semantic measure to quantify similarity between places in the qualitative space. Many shortfalls are identified, and a series of experimental enhancements is explored. The experiments demonstrate that good semantic measures should employ a comprehensive stop-word list and a complete but succinct vocabulary.
A semantic measure that can recognize synonyms must understand the intended senses of words in a place description. Furthermore, analysts need to be careful with two styles of description: descriptions that (1) follow a template or (2) are laden with statistical statements can produce falsely high similarity between places. The dissertation illustrates that scatter plots of numeric similarity scores versus semantic similarity scores can effectively help analysts consider similarity between places in two-space. Analysts can visually observe whether the numeric ranks of places agree with the semantic ranks. The dissertation also shows that Spearman's rank correlation test and the Kruskal-Wallis test can provide statistical confirmation for these visual observations. The proposed hybrid methodology enables analysts to automatically discover geographical analogs in ways that strictly numeric methods or manual semantic analysis cannot.
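
The hybrid idea described in the abstract (semantic scores from text, numeric scores from data, and Spearman's rank correlation to check whether the two rankings agree) can be illustrated with a small Python sketch. It is not the dissertation's implementation: Jaccard overlap of stop-word-filtered tokens stands in for Mitra and Wiederhold's measure, and the stop-word list, the place descriptions, the numeric scores and the function names (`tokens`, `jaccard`, `average_ranks`, `spearman_rho`) are all hypothetical.

```python
import re
from typing import List, Sequence, Set

# Hypothetical, deliberately tiny stop-word list (the dissertation calls for a
# comprehensive one).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "on", "to", "with", "its", "for"}


def tokens(text: str) -> Set[str]:
    """Lower-case alphabetic tokens of `text`, with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def jaccard(a: str, b: str) -> float:
    """Stand-in semantic score: Jaccard overlap of the two token sets.
    This is NOT Mitra and Wiederhold's measure."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical target place and candidate analogs (toy text, not Wikipedia).
    target = "college town with a large state university and a football stadium"
    candidates = {
        "Town A": "college town home to a state university and its football team",
        "Town B": "industrial port with a steel mill and shipping docks",
        "Town C": "suburb near a state university with commuter rail",
    }
    # Hypothetical numeric similarity of each candidate to the target.
    numeric = {"Town A": 0.9, "Town B": 0.2, "Town C": 0.5}

    names = sorted(candidates)
    semantic = [jaccard(target, candidates[n]) for n in names]
    rho = spearman_rho([numeric[n] for n in names], semantic)
    print(rho)  # prints 1.0: the numeric and semantic rankings agree
```

Jaccard overlap is used here only because it is short; it cannot recognize synonyms, which the abstract names as a requirement for a good semantic measure. Ties are given average ranks, the usual convention for Spearman's rho.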

Research questions

"This study develops a machine for matching similar places based on qualitative descriptions of the places on Wikipedia."

Research details

Topics: Ontology building
Domains: Geography, Information science
Theory type: Design and action
Wikipedia coverage: Other
Theories: Undetermined
Research design: Experiment
Data source: Experiment responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"This study develops a method for translating Wikipedia content into a machine-computable format."

Comments

"This study develops a method for translating Wikipedia content into a machine-computable format."


Further notes

Facts about "The geographical analog engine: hybrid numeric and semantic similarity measures for U.S. cities"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Comments: This study develops a method for translating Wikipedia content into a machine-computable format.
Conclusion: This study develops a method for translating Wikipedia content into a machine-computable format.
Conference location: United States, Pennsylvania
Data source: Experiment responses and Wikipedia pages
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22The%2Bgeographical%2Banalog%2Bengine%3A%2Bhybrid%2Bnumeric%2Band%2Bsemantic%2Bsimilarity%2Bmeasures%2Bfor%2BU.S.%2Bcities%22
Has author: Tawan Banchuen
Has domain: Geography and Information science
Has topic: Ontology building
Peer reviewed: Yes
Publication type: Thesis
Published in: Pennsylvania State University
Research design: Experiment
Research questions: This study develops a machine for matching similar places based on qualitative descriptions of the places on Wikipedia.
Revid: 10,978
Theories: Undetermined
Theory type: Design and action
Title: The geographical analog engine: hybrid numeric and semantic similarity measures for U.S. cities
Unit of analysis: Article
Url: http://proquest.umi.com/pqdweb?did=1637577661&Fmt=7&clientId=10306&RQT=309&VName=PQD
Wikipedia coverage: Other
Wikipedia data extraction: Live Wikipedia
Wikipedia language: English
Wikipedia page type: Article
Year: 2008