Overview of the INEX 2008 XML mining track

Authors: Ludovic Denoyer, Patrick Gallinari
Citation: Advances in Focused Retrieval. 2009.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1007/978-3-642-03761-0_41
Added by Wikilit team: Added on initial load
Overview of the INEX 2008 XML mining track is a publication by Ludovic Denoyer and Patrick Gallinari.


Abstract

We describe here the XML Mining Track at INEX 2008. This track was launched for exploring two main ideas: first identifying key problems for mining semi-structured documents and new challenges of this emerging field and second studying and assessing the potential of machine learning techniques for dealing with generic Machine Learning (ML) tasks in the structured domain i.e. classification and clustering of semi structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with the associated hyperlinks. The participants have developed models using the content information, the internal structure information of the XML documents and also the link information between documents.

Research questions

"We describe here the XML Mining Track at INEX 2008. This track was launched for exploring two main ideas: first identifying key problems for mining semi-structured documents and new challenges of this emerging field and second studying and assessing the potential of machine learning techniques for dealing with generic Machine Learning (ML) tasks in the structured domain i.e. classification and clustering of semi structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with the associated hyperlinks. The participants have developed models using the content information, the internal structure information of the XML documents and also the link information between documents."

Research details

Topics: Data mining
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment, Literature review
Data source: Experiment responses, Scholarly articles
Collected data time dimension: N/A
Unit of analysis: N/A
Wikipedia data extraction: Secondary dataset
Wikipedia page type: N/A
Wikipedia language: Not specified

Conclusion

"We have presented here the different models and results obtained during the XML Document Mining Track at INEX 2008. The original idea of this track was to provide simultaneously XML documents with a graph structure. The graph labeling task is a promising task that corresponds to many real applications (classification on the Web, classification on Social networks, ...) and the XML Mining track is a first step to develop new models for text categorization/ clustering in a graph structure."

Comments

"This article is an overview of a collection of models designed for the same purpose; the methodology used by the other authors is experimental. The data are secondary (Wikipedia pages and other sources)."

