Crossing textual and visual content in different application scenarios

From WikiLit
Authors: Julien Ah-Pine, Marco Bressan, Stephane Clinchant, Gabriela Csurka, Yves Hoppenot, Jean-Michel Renders
Citation: Multimedia Tools and Applications 42 (1): 31-56. 2009.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1007/s11042-008-0246-8
Added by Wikilit team: Added on initial load
Crossing textual and visual content in different application scenarios is a publication by Julien Ah-Pine, Marco Bressan, Stephane Clinchant, Gabriela Csurka, Yves Hoppenot, and Jean-Michel Renders.


Abstract

This paper deals with multimedia information access. We propose two new approaches for hybrid text-image information processing that can be straightforwardly generalized to the more general multimodal scenario. Both approaches fall in the trans-media pseudo-relevance feedback category. Our first method proposes using a mixture model of the aggregate components, considering them as a single relevance concept. In our second approach, we define trans-media similarities as an aggregation of monomodal similarities between the elements of the aggregate and the new multimodal object. We also introduce the monomodal similarity measures for text and images that serve as basic components for both proposed trans-media similarities. We show how one can frame a large variety of problems in order to address them with the proposed techniques: image annotation or captioning, text illustration, and multimedia retrieval and clustering. Finally, we present how these methods can be integrated in two applications: a travel blog assistant system and a tool for browsing Wikipedia that takes into account the multimedia nature of its content.
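The second approach described in the abstract, aggregating monomodal similarities over the elements of a pseudo-relevant aggregate, can be sketched as follows. This is an illustrative reconstruction rather than the paper's exact formulation: the precomputed similarity inputs, the top-k cutoff, and the weighted-sum aggregation are all assumptions made for the sketch.

```python
def transmedia_similarity(query_text_sims, doc_image_sims, k=5):
    """Score one document against a multimodal query via trans-media
    pseudo-relevance feedback: take the query's k most text-similar
    corpus objects (the pseudo-relevant set), then aggregate the
    document's *image* similarities to those same objects, weighted
    by their text-similarity scores.

    query_text_sims : text similarities between the query and each of
                      the n corpus objects (one modality).
    doc_image_sims  : image similarities between the target document
                      and the same n corpus objects (other modality).
    """
    top_k = sorted(range(len(query_text_sims)),
                   key=lambda i: query_text_sims[i], reverse=True)[:k]
    total = sum(query_text_sims[i] for i in top_k)
    # weighted aggregation crosses the modalities: text scores weight
    # image similarities, so a text query can rank images and vice versa
    return sum(query_text_sims[i] / total * doc_image_sims[i] for i in top_k)
```

The same skeleton covers the use cases the paper lists (image annotation, text illustration, multimodal retrieval) by swapping which modality plays the query role.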

Research questions

The research questions field reproduces the paper's abstract, quoted above.

Research details

Topics: Multimedia information retrieval
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "First of all, the theoretical contribution is the extension of the principle of trans-media feedback, into a metric view: the definition of trans-media similarities. As it was shown, these new similarity measures of cross-content enables to find illustrative images for a text, to annotate an image, cluster or retrieve multi-modal objects."
Research design: Design science
Data source: Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: French

Conclusion

"We have presented a framework for accessing multimodal data. First of all, the theoretical contribution is the extension of the principle of trans-media feedback, into a metric view: the definition of trans-media similarities. As it was shown, these new similarity measures of cross-content enables to find illustrative images for a text, to annotate an image, cluster or retrieve multi-modal objects. Moreover, the trans-media similarities are not specific to image and text: they can be applied to any mixture of media ( speech, video, text ) or views of an object. Most importantly, we have shown how these techniques can be used in two use cases: the travel blog assistant system and the multimedia browsing tool. These two applications stress the necessity of cross-media systems, where no monomedia systems can solve the user’s problem, nor address all the different user’s need at the same time."

Comments

"Wikipedia pages: This corpus concerns around 8,500 pages taken from the French Wikipedia corpus. We extracted these pages from the XML dump made in September 2007 and provided by the Wikimedia Foundation.

The part that describes their program for visualizing Wikipedia data involves little experimentation. "Data source" should not be "Experiment responses". "Research design" should be "design science", and perhaps "mathematical modelling", probably not "experiment".

They do not use the whole document, only the title, the free-text image description, and the paragraph where the image is used. Thus the "unit of analysis" is not the full article.

Collected time dimension must be "cross-sectional"."
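The comments note that the corpus was extracted from the September 2007 XML dump. As a rough illustration of that extraction step (not the authors' actual pipeline), pages can be streamed out of a MediaWiki dump with Python's standard library; the element names follow the MediaWiki export schema, and the dump path is hypothetical:

```python
import xml.etree.ElementTree as ET

def iter_pages(dump_path):
    """Yield (title, wikitext) pairs from a MediaWiki XML dump.

    iterparse streams the file so a multi-gigabyte dump never has to
    fit in memory; the {*} wildcard (Python 3.8+) keeps the lookups
    independent of the export-schema version in the XML namespace.
    """
    for _, elem in ET.iterparse(dump_path, events=("end",)):
        if elem.tag.endswith("}page") or elem.tag == "page":
            title = elem.findtext("{*}title", default="")
            text = elem.findtext("{*}revision/{*}text", default="")
            yield title, text
            elem.clear()  # release the subtree we just consumed
```

In practice one would also filter to main-namespace articles and keep only the fields the authors used (title, image descriptions, and the paragraph around each image).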

