Overview of the INEX 2008 XML mining track
Abstract We describe here the XML Mining Track at INEX 2008. This track was launched for exploring two main ideas: first identifying key problems for mining semi-structured documents and new challenges of this emerging field and second studying and assessing the potential of machine learning techniques for dealing with generic Machine Learning (ML) tasks in the structured domain i.e. classification and clustering of semi structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with the associated hyperlinks. The participants have developed models using the content information, the internal structure information of the XML documents and also the link information between documents.
Added by wikilit team Added on initial load  +
Collected data time dimension N/A  +
Comments This article is an overview of a collection of models designed for the same purpose; the methodology used by the other authors is experimental. Secondary data (Wikipedia pages and other sources).
Conclusion We have presented here the different models and results obtained during the XML Document Mining Track at INEX 2008. The original idea of this track was to provide simultaneously XML documents with a graph structure. The graph labeling task is a promising task that corresponds to many real applications (classification on the Web, classification on Social networks, ...) and the XML Mining track is a first step to develop new models for text categorization/clustering in a graph structure.
Data source Experiment responses  + , Scholarly articles  +
Doi 10.1007/978-3-642-03761-0_41 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Overview%2Bof%2Bthe%2BINEX%2B2008%2BXML%2Bmining%2Btrack%22  +
Has author Ludovic Denoyer + , Patrick Gallinari +
Has domain Computer science +
Has topic Data mining +
Peer reviewed Yes  +
Publication type Journal article  +
Published in Advances in Focused Retrieval +
Research design Experiment  + , Literature review  +
Research questions We describe here the XML Mining Track at INEX 2008. This track was launched for exploring two main ideas: first identifying key problems for mining semi-structured documents and new challenges of this emerging field and second studying and assessing the potential of machine learning techniques for dealing with generic Machine Learning (ML) tasks in the structured domain i.e. classification and clustering of semi structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with the associated hyperlinks. The participants have developed models using the content information, the internal structure information of the XML documents and also the link information between documents.
Revid 11,178  +
Theories Undetermined
Theory type Design and action  +
Title Overview of the INEX 2008 XML mining track
Unit of analysis N/A  +
Url http://0-portal.acm.org.mercury.concordia.ca/citation.cfm?id=1611913.1611961&coll=DL&dl=GUIDE&CFID=112020990&CFTOKEN=50968312&preflayout=flat  +
Wikipedia coverage Sample data  +
Wikipedia data extraction Secondary dataset  +
Wikipedia language Not specified  +
Wikipedia page type N/A  +
Year 2009  +
Creation date 15 March 2012 20:29:57  +
Categories Data mining  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:30:16  +