WikiLit:Export

Data can be exported from WikiLit by querying it like any other Semantic MediaWiki installation; each query below has a corresponding Special:Ask URL that returns the results as CSV.

Exporting abstracts as CSV

Get all abstracts from publications

{{#ask: [[Category:Publications]] 
 | ?Abstract
 | format = CSV
 | limit = 1000
 | mainlabel = Page
}}

Use this link to download: CSV
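
The CSV link is a Special:Ask URL in which the query text is dash-encoded: characters outside a small safe set are replaced by "-" followed by their two-digit hex code, so "[" becomes -5B, a space becomes -20, and "?" becomes -3F. A minimal sketch of this encoding, inferred from the URLs on this page rather than taken from the MediaWiki source:

import string

def dash_encode(text):
    # Keep letters, digits and colons; replace everything else with
    # '-' plus the character's two-digit hex code, matching the
    # pattern visible in the Special:Ask URLs below.
    safe = set(string.ascii_letters + string.digits + ':')
    return ''.join(c if c in safe else '-{:02X}'.format(ord(c))
                   for c in text)

print(dash_encode('[[Category:publications]]'))
# -5B-5BCategory:publications-5D-5D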

Python code to work with the data:

import pandas as pd

# Copied from the link above
url = ("http://wikilit.referata.com/wiki/Special:Ask"
       "/-5B-5BCategory:publications-5D-5D/-3FAbstract/"
       "format%3D-20CSV/limit%3D-201000/offset%3D0"
       "/mainlabel%3D-20Page/offset%3D0")

abstracts = pd.read_csv(url)

# Flag abstracts that do not end with a period or that contain
# leftover markup characters, and print their last 50 characters.
for index, (page, abstract) in abstracts.iterrows():
    if isinstance(abstract, str):
        if not abstract.strip().endswith('.') or any(c in abstract for c in '{}\\'):
            print("{} {}\n    {}".format(index, page, abstract[-50:]))

'wikipedia' in titles

Get all titles from publications

{{#ask: [[Category:Publications]] 
 | ?Title
 | format = CSV
 | limit = 1000
 | mainlabel = - 
}}

Use this link to download: CSV

Python code to work with the data:

import pandas as pd
import numpy as np
 
# Copied from the link above
url = ("http://wikilit.referata.com/wiki/Special:Ask/"
       "-5B-5BCategory:publications-5D-5D/-3FTitle/"
       "format%3D-20CSV/limit%3D-201000/mainlabel%3D-20-2D/"
       "offset%3D0")
 
corpus = pd.read_csv(url)
 
# At what rate does the word 'wikipedia' appear in the titles?
np.mean(['wikipedia' in title.lower() for title in corpus.Title])
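
The list comprehension assumes every title is a string. A sketch of the same rate computed with pandas string methods, which skip missing values:

print(corpus.Title.str.lower().str.contains('wikipedia').mean())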

'Collected data time dimension' and year

Should be included:

  • "Added on initial load"
  • "Yes"
  • "No but verified"

Should not be included:

  • "Yes but not verified": Articles that need not be included, e.g., not peer-reviewed.
  • "No"
{{#ask: [[Added by wikilit team::No but verified]] OR 
        [[Added by wikilit team::Added on initial load]] OR 
        [[Added by wikilit team::Yes]] 
 | ?Title
 | ?Year
 | ?Collected data time dimension
 | format = CSV
 | limit = 1000
 | mainlabel = Page
}}

Use this link to download: CSV

import pandas as pd
from pylab import show, ion
 
 
url = ("http://wikilit.referata.com/wiki/Special:Ask/"
       "-5B-5BAdded-20by-20wikilit-20team::No-20but-20verified-5D"
       "-5D-20OR-20-0A-20-20-20-20-20-20-20-20-20-5B-5BAdded-20by"
       "-20wikilit-20team::Added-20on-20initial-20load-5D-5D-20OR"
       "-20-0A-20-20-20-20-20-20-20-20-20-5B-5BAdded-20by-20wikilit"
       "-20team::Yes-5D-5D/-3F%3DPage-23/-3FTitle/-3FYear/"
       "-3FCollected-20data-20time-20dimension/format%3D-20csv/"
       "limit%3D-201000/mainlabel%3DPage/offset%3D0")
 
# Get data from referata WikiLit site
corpus = pd.read_csv(url)
 
# Crosstabulation with plot
pd.crosstab(corpus.Year, corpus["Collected data time dimension"]).plot(kind='bar')
ion()
show()
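
Years with few publications are hard to compare in raw counts; a sketch of the same crosstabulation normalized to row proportions, using pandas' normalize option:

pd.crosstab(corpus.Year, corpus["Collected data time dimension"],
            normalize='index').plot(kind='bar', stacked=True)
show()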

Research details and year

{{#ask: [[Added by wikilit team::No but verified]] OR 
        [[Added by wikilit team::Added on initial load]] OR 
        [[Added by wikilit team::Yes]] 
 | ?Title
 | ?Year
 | ?Publication type
 | ?has topic = Topic
 | ?has domain = Domain
 | ?Theory type
 | ?Wikipedia coverage
 | ?Research design
 | ?Data source 
 | ?Collected data time dimension
 | ?Unit of analysis
 | ?Wikipedia data extraction
 | ?Wikipedia page type
 | ?Wikipedia language
 | format = CSV
 | limit = 1000
 | mainlabel = Page
}}

Use this link to download: CSV

import pandas as pd
from pylab import show, ion
 
 
url = ("http://wikilit.referata.com/w/index.php?"
       "title=Special:Ask&x=-5B-5BAdded-20by-20wikilit-20team"
       "%3A%3ANo-20but-20verified-5D-5D-20OR-20-0A-20-20-20-20"
       "-20-20-20-20-5B-5BAdded-20by-20wikilit-20team%3A%3AAdded"
       "-20on-20initial-20load-5D-5D-20OR-20-0A-20-20-20-20-20-20"
       "-20-20-5B-5BAdded-20by-20wikilit-20team%3A%3AYes-5D-5D%2F"
       "-3F%3DPage-23%2F-3FTitle%2F-3FYear%2F-3FPublication-20type"
       "%2F-3FHas-20topic%3DTopic%2F-3FHas-20domain%3DDomain%2F-3F"
       "Theory-20type%2F-3FWikipedia-20coverage%2F-3FResearch-20design"
       "%2F-3FData-20source%2F-3FCollected-20data-20time-20dimension"
       "%2F-3FUnit-20of-20analysis%2F-3FWikipedia-20data-20extraction"
       "%2F-3FWikipedia-20page-20type%2F-3FWikipedia-20language"
       "&format=%20CSV&limit=%201000&mainlabel=Page&offset=0")
 
# Get data from referata WikiLit site
corpus = pd.read_csv(url)
 
# Crosstabulation with plot
pd.crosstab(corpus.Year, corpus["Collected data time dimension"]).plot(kind='bar')
ion()
show()
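
With all research details in one frame, the same pattern works for any of the requested columns. A sketch for one of them (column name taken from the "?Research design" printout above):

pd.crosstab(corpus.Year, corpus['Research design']).plot(kind='bar')
show()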

Data source: Scholarly articles

Page | Data source | Year | Added by wikilit team
A Wikipedia literature review | Scholarly articles | 2010 | No but verified
Information sharing and social computing: why, what, and where? | Scholarly articles | 2009 | Yes
Mediating at the student-Wikipedia intersection | Scholarly articles | 2010 | Yes
Mining meaning from Wikipedia | Scholarly articles | 2009 | Added on initial load
Overview of the INEX 2008 XML mining track | Experiment responses, Scholarly articles | 2009 | Added on initial load
Researching Wikipedia - current approaches and new directions | Scholarly articles | 2006 | Added on initial load
The visibility of Wikipedia in scholarly publications | Scholarly articles | 2011 | Added on initial load
Value production in a collaborative environment | Scholarly articles | 2012 | Yes
What we know about Wikipedia: a review of the literature analyzing the project(s) | Scholarly articles | 2012 | Yes
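
A sketch of how this subset can be extracted from the corpus frame fetched in the previous section; note that the "Data source" column may list several values per row, as in the INEX row above:

scholarly = corpus[corpus['Data source']
                   .str.contains('Scholarly articles', na=False)]
print(scholarly[['Data source', 'Year']])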