Tree-traversing ant algorithm for term clustering based on featureless similarities
|Authors:||Wilson Wong, Wei Liu, Mohammed Bennamoun|
|Citation:||Data Mining and Knowledge Discovery 15 (3): 349-381. 2007.|
|Publication type:||Journal article|
|Added by Wikilit team:||Added on initial load|
Many conventional methods for concept formation in ontology learning have relied on predefined templates and rules, and on static resources such as WordNet. Such approaches are not scalable, are difficult to port between domains, and are incapable of handling knowledge fluctuations. Their results are also far from desirable. In this paper, we propose a new ant-based clustering algorithm, Tree-Traversing Ant (TTA), for concept formation as part of an ontology learning system. With the help of Normalized Google Distance (NGD) and n° of Wikipedia (n°W) as measures of similarity and distance between terms, we attempt to achieve an adaptable clustering method that is highly scalable and portable across domains. Evaluations with seven datasets show promising results, with an average lexical overlap of 97% and an ontological improvement of 48%. At the same time, the evaluations demonstrated several advantages that are not simultaneously present in standard ant-based and other conventional clustering methods.
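The abstract names NGD as one of the two distance measures but does not reproduce its formula. As a rough illustration, the sketch below follows the commonly cited definition by Cilibrasi and Vitányi, computed from search-engine page counts. The function name `ngd`, its parameter names, and the counts in the usage note are hypothetical; how the authors obtain and cache the counts is not described on this page.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from page counts (illustrative sketch).

    fx, fy : number of pages containing term x, term y
    fxy    : number of pages containing both terms
    n      : assumed total number of indexed pages (normalizing constant)

    NGD(x, y) = (max(log fx, log fy) - log fxy)
                / (log n - min(log fx, log fy))
    """
    # Assumption: terms that never occur, or never co-occur,
    # are treated as infinitely distant.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

With made-up counts, `ngd(100, 100, 10, 10000)` evaluates to about 0.5, while two terms that always co-occur, as in `ngd(1000, 1000, 1000, 1e6)`, have a distance of 0. Because the measure needs only counts, no feature vectors have to be built for the terms, which is presumably the sense of "featureless" in the title.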
"In this paper, we propose a new antbased clustering algorithm, Tree-Traversing Ant (TTA), for concepts formation as part of an ontology learning system.With the help of Normalized GoogleDistance (NGD) and n◦ ofWikipedia (n◦W) as measures for similarity and distance between terms, we attempt to achieve an adaptable clustering method that is highly scalable and portable across domains."
|Theory type:||Analysis, Design and action|
|Theories:||"the TTAs will employ a new measure called n° of Wikipedia (n°W) for quantifying the distance between two terms based on the cross-linking of Wikipedia articles (Wong et al. 2006)."|
|Data source:||Experiment responses, Wikipedia pages|
|Collected data time dimension:||Cross-sectional|
|Unit of analysis:||Article|
|Wikipedia data extraction:||Dump|
|Wikipedia page type:||Article|
"Seven of the most notable strength of the TTA with NGD and n◦W are: – Able to further distinguish hidden structures within clusters; – Flexible in regards to the discovery of clusters; – Capable of identifying and isolating outliers; – Tolerance to differing cluster sizes; – Able to produce consistent results; – Able to identify implicit taxonomic relationships between clusters; and – Inherent capability of coping with synonyms, word senses and the fluctuation in terms usage."
"we have proposed the innovative use of featureless similarity based on Normalized Google Distance (NGD) and n◦ of Wikipedia (n◦W). The use of the two similarity measures as part of a new hybrid clustering algorithm called Tree-Traversing Ant (TTA) demonstrated excellent results during our evaluations."