Contextual retrieval of single Wikipedia articles to support the reading of academic abstracts
Abstract Google style search engines are currently some of the most popular tools that people use when they are looking for information. People conduct searches for a variety of reasons, although these reasons can generally be distilled down to users being engaged in a task and developing an information need that impedes them from completing that task at a level which is satisfactory to them. The Google style search engine, however, is not always the most appropriate tool for every user task. In this thesis, our approach to search differs from the traditional search engine as we focus on providing support to users who are reading academic abstracts. When people do not understand a passage in the abstract they are reading, they often look for more detailed information or a definition. Presenting them with a list of possibly relevant search results, as a Google style search would, may not immediately meet this information need. In the case of reading, it is logical to hypothesize that users would prefer to receive a single document containing the information that they need. Developed in this thesis are retrieval algorithms that use the abstract being read along with the passage that the user is interested in to retrieve a single highly related article from Wikipedia. The top performing algorithm from the experiments conducted in this thesis is able to retrieve an appropriate article 77% of the time. This algorithm was deployed in a prototype reading support tool, LiteraryMark, in order to investigate the usefulness of such a tool. The results from the user experiment conducted in this thesis indicate that LiteraryMark is able to significantly improve the understanding and confidence levels of people reading abstracts.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Conclusion A-Q1: Does using the terms from the abstract being read to form a context provide high precision retrieval of Wikipedia articles relevant to the text the user highlights? There are two term frequency based algorithms developed in this thesis: • Paragraph Search and Rerank (Para/Rerank) • Phrase Paragraph Augmented Query (Phrase/Para) Both of these algorithms use the terms from the abstract to find similar Wikipedia articles. The terms from the user highlighted text are used to find a relevant article from that set. The top performing algorithm, Phrase/Para, has a precision at rank 1 of 0.77 over the test set used in this thesis. It significantly outperformed the baseline vector space model retrieval algorithm, thus showing that terms from the abstract can be used to form a context providing high precision retrieval of Wikipedia articles. A-Q2: Does using the categories in Wikipedia to form a context provide high precision retrieval of Wikipedia articles relevant to the text the user highlights? There are five category based algorithms developed in this thesis: • Absolute Membership, Phrase Query, Top Ranked (AbsPhrRnk) • Encouraged Membership, Phrase Query, Top Ranked (EncPhrRnk) • Encouraged Membership, Phrase/Para Query, Top Ranked (EncPhrParRnk) • Encouraged Membership, Phrase Query, Category Popularity Threshold (EncPhrThres) • Encouraged Membership, Phrase/Para Query, Category Popularity Threshold (EncPhrParThres) These algorithms conduct context searches to find Wikipedia articles similar to the abstract. These similar articles have their categories extracted, and the ones that are most frequent among them are used to form a context. The highlighted text is then used to search for Wikipedia articles, and those that belong to a context category are re-ranked to the top.
The top performing algorithm, EncPhrParThres, uses encouraged membership and a threshold to select context categories for re-ranking the results from the phrase paragraph augmented query, the highlighted text search. This algorithm significantly outperforms the baseline vector space model, though its parameter settings reduced the impact the context has on the search process. These results provide only weak evidence that Wikipedia categories can be used to create a context for retrieving relevant Wikipedia articles. A-Q3: Does using the links between Wikipedia articles to form a context provide high precision retrieval of Wikipedia articles relevant to the text the user highlights? There are four link based algorithms developed in this thesis: • Authority, Phrase Query (AuthorPhr) • Authority, Phrase/Para Query (AuthorPhrPar) • Hub, Phrase Query (HubPhr) • Hub, Phrase/Para Query (HubPhrPar) Similar to the category based ones, these algorithms perform a context search. The results from this search are mined for links in order to identify authority and hub Wikipedia articles. The top performing algorithm, HubPhrPar, mines the results from the phrase paragraph augmented query search for hub articles that are then re-ranked to the top. It significantly outperforms the baseline vector space model, though its parameter settings reduced the impact the context has on the search process. These results provide only weak evidence that the Wikipedia link structure can be used to create a context for retrieving relevant Wikipedia articles. A-Q4: Between the terms in the abstract, the categories in Wikipedia, and the links between Wikipedia articles, which is the most effective for building a context providing high precision retrieval of Wikipedia articles relevant to the text the user highlights? The top performing contextual retrieval algorithm developed in this thesis, Phrase/Para, uses neither categories nor links to form a context.
Instead, terms from the abstract are combined with terms from the user highlighted text in a search query. Abstract terms are used to search the contents of Wikipedia articles, and highlighted terms are used to search the titles and the beginnings of articles. While the top performing category and link based algorithms have similar performances, this is due to Phrase/Para being incorporated into them and to parameter settings that minimized the influence of the category and link based contexts. 5.1.2 User Experiment The second area of contribution is in the understanding of how users performed when using the reading support tool, LiteraryMark, to retrieve explanatory information related to the academic abstracts they are reading and the text passages they highlight. In this thesis, we posed three user experiment research questions: U-Q1: Do users prefer using the prototype reading tool, LiteraryMark, over a traditional keyword search engine to retrieve Wikipedia articles while reading abstracts? The results from the user experiment conducted in this thesis indicate that for passages of three words or less, there was a strong preference for using LiteraryMark instead of a search engine. For larger passages, however, preferences were more varied; over half the participants either preferred to use a search engine or had no particular preference. U-Q2: Does the use of the prototype reading tool, LiteraryMark, improve the reported level of understanding that users have while reading abstracts? The results from the user experiment conducted in this thesis show that the understanding levels that participants reported significantly improved when they used LiteraryMark. These results provide evidence that LiteraryMark helps people understand academic abstracts. U-Q3: Does the use of the prototype reading tool, LiteraryMark, improve the reported level of confidence that users have in their understanding while reading abstracts?
Similar to the reported understanding levels, the results from the user experiment conducted in this thesis show that the confidence levels that participants reported significantly improved when they used LiteraryMark. These results provide evidence that LiteraryMark improves the confidence that people have in their understanding of the academic abstracts that they are reading. [...] reading tool, LiteraryMark, is a good alternative to the traditional search engine for looking up Wikipedia articles related to short passages while reading. Furthermore, it has identified a user task where the retrieval of a single relevant article is more important than providing a list of possibly relevant search results. People who are reading want to look up explanatory information, like definitions, quickly, without the distractions associated with traditional search engine functions.
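The category-based re-ranking and the precision at rank 1 measure described in the conclusion above can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: the function names, the `top_k` parameter, and the in-memory `categories_of` mapping are assumptions made for the example, and the difference between "absolute" and "encouraged" membership is not modelled because it is not defined in this record.

```python
from collections import Counter


def context_categories(context_results, categories_of, top_k=3):
    """Pick the top_k most frequent categories among the context-search results.

    context_results: article titles judged similar to the abstract (hypothetical input).
    categories_of:   dict mapping an article title to its Wikipedia categories.
    """
    counts = Counter()
    for title in context_results:
        # Count each category once per article.
        counts.update(set(categories_of.get(title, ())))
    return {category for category, _ in counts.most_common(top_k)}


def rerank_by_context(phrase_results, categories_of, context):
    """Move articles that belong to a context category to the top.

    The relative order inside each group is preserved (a stable re-rank).
    """
    def in_context(title):
        return bool(context & set(categories_of.get(title, ())))

    members = [t for t in phrase_results if in_context(t)]
    others = [t for t in phrase_results if not in_context(t)]
    return members + others


def precision_at_1(ranked_by_query, relevant_by_query):
    """Fraction of queries whose top-ranked article is judged relevant."""
    hits = sum(
        1
        for query, ranked in ranked_by_query.items()
        if ranked and ranked[0] in relevant_by_query.get(query, set())
    )
    return hits / len(ranked_by_query)


# Toy data, invented for illustration.
categories_of = {
    "Process scheduling": ["Operating systems"],
    "Virtual memory": ["Operating systems", "Memory management"],
    "Kernel (operating system)": ["Operating systems"],
    "Kernel (algebra)": ["Algebra"],
}
context = context_categories(["Process scheduling", "Virtual memory"], categories_of, top_k=1)
reranked = rerank_by_context(["Kernel (algebra)", "Kernel (operating system)"], categories_of, context)
print(reranked)
```

In this toy run the abstract's context search returns operating systems articles, so the highlighted term "kernel" resolves to the operating system article rather than the algebra one. A precision at rank 1 of 0.77 means that the top-ranked article was appropriate for 77% of the test queries.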
Data source Archival records  + , Experiment responses  + , Wikipedia pages  +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Contextual%2Bretrieval%2Bof%2Bsingle%2BWikipedia%2Barticles%2Bto%2Bsupport%2Bthe%2Breading%2Bof%2Bacademic%2Babstracts%22  +
Has author Christopher Jordan +
Has domain Computer science + , Information science + , Library science +
Has topic Reading support +
Peer reviewed Yes  +
Publication type Thesis  +
Published in Dalhousie University +
Research design Experiment  +
Research questions The goal of this work is to help people understand the abstract that they are reading, not to improve how they search for information. A prototype reading tool has been developed to reduce the efficiency cost of accessing additional explanatory information while reading abstracts. As searching is a disruptive activity for readers, it is unlikely that search tools will help reduce the overall cost of searching no matter how effective they are in retrieving relevant information. A reading tool that automatically retrieves explanatory information related to text that people highlight will allow them to continue reading quickly rather than engage in a Google style search. Our goal is to let users simply highlight the text that they want explanatory information on and present them with a pop-up box that displays a single relevant Wikipedia article. There are four research questions addressed in this part of the thesis: A-Q1 Does using the terms from the abstract being read to form a context provide high precision retrieval of Wikipedia articles relevant to the text the user highlights? A-Q2 Does using the categories in Wikipedia to form a context provide high precision retrieval of Wikipedia articles relevant to the text the user highlights? A-Q3 Does using the links between Wikipedia articles to form a context provide high precision retrieval of Wikipedia articles relevant to the text the user highlights? A-Q4 Between the terms in the abstract, the categories in Wikipedia, and the links between Wikipedia articles, which is the most effective for building a context providing high precision retrieval of Wikipedia articles relevant to the text the user highlights? The aim of the second part of this thesis is to measure the usefulness of the prototype reading tool, LiteraryMark.
This tool is examined for its ability to improve the understanding and confidence levels of users reading academic Computer Science abstracts. As well, user preferences for using it over a traditional search engine are measured: U-Q1 Do users prefer using the prototype reading tool, LiteraryMark, over a traditional keyword search engine to retrieve Wikipedia articles while reading abstracts? U-Q2 Does the use of the prototype reading tool, LiteraryMark, improve the reported level of understanding that users have while reading abstracts? U-Q3 Does the use of the prototype reading tool, LiteraryMark, improve the reported level of confidence that users have in their understanding while reading abstracts?
Revid 10,715  +
Theories Ontologies are often employed to provide the necessary structure in the metadata for describing a document. It is theorized that with comprehensive and accurate metadata representing documents, retrieval can be carried out with high precision and recall. They do not have to be able to formulate a traditional search query. Therefore, this prototypical reading tool should, theoretically, be less disruptive to the user than a traditional search engine.
Theory type Design and action  +
Title Contextual retrieval of single Wikipedia articles to support the reading of academic abstracts
Unit of analysis Article  +
Url http://dl.acm.org/citation.cfm?id=1571201  +
Wikipedia coverage Sample data  +
Wikipedia data extraction Dump  +
Wikipedia language English  +
Wikipedia page type Article  +
Year 2009  +
Creation date 15 March 2012 20:25:38  +
Categories Reading support  + , Computer science  + , Information science  + , Library science  + , Publications with missing comments  + , Publications  +
Modification date 30 January 2014 20:22:45  +