Mobile information retrieval with search results clustering: prototypes and evaluations

From WikiLit
Authors: Claudio Carpineto, Stefano Mizzaro, Giovanni Romano, Matteo Snidero
Citation: Journal of the American Society for Information Science and Technology 60 (5): 877-895. 2009. United States, California.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1002/asi.21036.
Added by Wikilit team: Added on initial load
Mobile information retrieval with search results clustering: prototypes and evaluations is a publication by Claudio Carpineto, Stefano Mizzaro, Giovanni Romano, Matteo Snidero.


Abstract

Web searches from mobile devices such as PDAs and cell phones are becoming increasingly popular. However, the traditional list-based search interface paradigm does not scale well to mobile devices due to their inherent limitations. In this article, we investigate the application of search results clustering, used with some success for desktop computer searches, to the mobile scenario. Building on CREDO (Conceptual Reorganization of Documents), a Web clustering engine based on concept lattices, we present its mobile versions Credino and SmartCREDO, for PDAs and cell phones, respectively. Next, we evaluate the retrieval performance of the three prototype systems. We measure the effectiveness of their clustered results compared to a ranked list of results on a subtopic retrieval task, by means of the device-independent notion of subtopic reach time together with a reusable test collection built from Wikipedia ambiguous entries. Then, we make a crosscomparison of methods (i.e., clustering and ranked list) and devices (i.e., desktop, PDA, and cell phone), using an interactive information-finding task performed by external participants. The main finding is that clustering engines are a viable complementary approach to plain search engines both for desktop and mobile searches, especially, but not only, for multitopic informational queries.

Research questions

"In this article, we investigate the application of search results clustering, used with some success for desktop computer searches, to the mobile scenario. Building on CREDO (Conceptual Reorganization of Documents), a Web clustering engine based on concept lattices, we present its mobile versions Credino and SmartCREDO, for PDAs and cell phones, respectively. Next, we evaluate the retrieval performance of the three prototype systems. We measure the effectiveness of their clustered results compared to a ranked list of results on a subtopic retrieval task, by means of the device-independent notion of subtopic reach time together with a reusable test collection built from Wikipedia ambiguous entries. Then, we make a crosscomparison of methods (i.e., clustering and ranked list) and devices (i.e., desktop, PDA, and cell phone), using an interactive information-finding task performed by external participants.

Our study is split in two parts. In the first part, we consider the theoretical retrieval performance of cluster hierarchies and ranked lists, regardless of the specific device used to display and interact with the results. In the second part, we do not assume that there is a predefined model of access to information, as in the first experiment, and we explicitly consider not only the retrieval method (i.e., clustering and ranked list) but also the device (i.e., desktop, PDA, and cell phone)."

Research details

Topics: Ranking and clustering systems
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "In practice, CREDO starts from the cluster placed at the hierarchy root, which is usually described by the query terms and covers all retrieved results, and then iteratively builds two lower levels of the hierarchy. Each level contains the most general of the concepts that are theoretically more specific than the concepts in the preceding level, according to the definition of formal concepts. To increase the utility of the clustering process for subtopic retrieval, the first level is generated using only the terms contained in the title of search results, and the second level using both the title and the snippet.

The CREDO hierarchy is then visualized using a simple folder tree layout. The system initially shows the hierarchy root and the first level of the hierarchy. The user can click on each cluster to see the results associated with it and expand its subclusters (if any). All the documents of one cluster that are not covered by its “children” are grouped in a dummy cluster named “other.”

CREDO does not neatly fit in either of the two classes discussed earlier. Similar to data-centric algorithms, it uses strict single-word indexing. Its monothetic clusters are mostly described by a single word, but they also can accommodate labels with multiple contiguous words, reflecting the causal (or deterministic) associations between words in the given query context. For instance, for the query “metamorphosis” (see Figure 1), CREDO returns some multiple-word concepts such as “hilary duff” and “star trek,” consistent with the fact that in the limited context represented by the results of “metamorphosis,” “hilary” always co-occurs with “duff” and “star” with “trek.”

In CREDO, cluster labeling is integrated with cluster formation by definition because a concept intent is univocally determined by a concept extent, and vice versa. Thus, CREDO builds the cluster structure and cluster descriptions at once. By contrast, these two operations are usually treated separately. The disadvantage of the common approach is that there may be a mismatch between the criterion used to find a common description and that used to group the search results, thus increasing the chance that the contents will not correspond to the labels (or vice versa)."
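
The link between concept extents and intents described in the excerpt above can be illustrated with a small sketch. This is not CREDO's actual code, which the excerpt does not show; it only applies the standard formal-concept definitions to a hypothetical toy set of result titles for the query "metamorphosis". The function names (extent, intent, first_level) and the sample data are assumptions made for illustration, and the second hierarchy level (title plus snippet) is not modeled.

```python
def extent(term_sets, terms):
    """Indices of the results that contain every term in `terms`."""
    return frozenset(i for i, ts in enumerate(term_sets) if terms <= ts)


def intent(term_sets, docs):
    """Terms shared by every result in `docs`."""
    sets = [frozenset(term_sets[i]) for i in docs]
    return frozenset.intersection(*sets) if sets else frozenset()


def first_level(term_sets):
    """Most general concepts strictly below the root (toy version).

    Returns (clusters, other): `clusters` maps each extent to its label,
    and `other` holds the results not covered by any first-level cluster.
    """
    all_docs = frozenset(range(len(term_sets)))
    root_terms = intent(term_sets, all_docs)  # terms present in every result
    vocabulary = set().union(*term_sets) - root_terms

    candidates = {}
    for term in vocabulary:
        docs = extent(term_sets, root_terms | {term})
        # The label is the intent of the extent, so grouping and labeling
        # happen in one step; terms that always co-occur share one label.
        candidates[docs] = intent(term_sets, docs) - root_terms

    # Keep only the maximal extents, i.e. the most general concepts.
    clusters = {
        docs: label
        for docs, label in candidates.items()
        if not any(docs < bigger for bigger in candidates)
    }
    other = all_docs - set().union(*clusters)
    return clusters, other


if __name__ == "__main__":
    # Hypothetical title terms; index = position in the result list.
    titles = [
        {"metamorphosis", "hilary", "duff"},
        {"metamorphosis", "hilary", "duff", "album"},
        {"metamorphosis", "star", "trek"},
        {"metamorphosis", "kafka"},
        {"metamorphosis"},
    ]
    clusters, other = first_level(titles)
    for docs, label in sorted(clusters.items(), key=lambda kv: sorted(kv[0])):
        print(" ".join(sorted(label)), sorted(docs))
    print("other", sorted(other))
```

On this toy input, "hilary" and "duff" have the same extent and therefore end up in a single cluster label, as do "star" and "trek". The result whose title contains only the query term falls into the "other" group. Labels here are unordered term sets, so the contiguity of words in multiple-word labels is not captured.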

Research design: Experiment, Mathematical modeling
Data source: Experiment responses, Websites, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"We have shown that mobile search results clustering is both feasible and effective. In particular, our results support the view that mobile clustering engines can be faster and more accurate than the corresponding mobile search engines, especially for subtopic retrieval tasks. We also found that although mobile retrieval becomes, in general, less effective as the search device gets smaller, the adoption of clustering may help expand the usage patterns beyond mere informational search while mobile."

Comments

"Websites (Twitter), Wikipedia articles"


Further notes

Facts about "Mobile information retrieval with search results clustering: prototypes and evaluations"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Comments: Websites (Twitter), Wikipedia articles
Conference location: United States, California
Data source: Experiment responses, Websites, Wikipedia pages
Doi: 10.1002/asi.21036
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Mobile%2Binformation%2Bretrieval%2Bwith%2Bsearch%2Bresults%2Bclustering%3A%2Bprototypes%2Band%2Bevaluations%22
Has author: Claudio Carpineto, Stefano Mizzaro, Giovanni Romano, Matteo Snidero
Has domain: Computer science
Has topic: Ranking and clustering systems
Issue: 5
Pages: 877-895
Peer reviewed: Yes
Publication type: Journal article
Published in: Journal of the American Society for Information Science and Technology
Research design: Experiment, Mathematical modeling
Revid: 10,874
Theory type: Design and action
Title: Mobile information retrieval with search results clustering: prototypes and evaluations
Unit of analysis: Article
Url: http://dx.doi.org/10.1002/asi.21036
Volume: 60
Wikipedia coverage: Sample data
Wikipedia data extraction: Live Wikipedia
Wikipedia language: English
Wikipedia page type: Article
Year: 2009