Automating the collection of literature – or, keeping up to date with the MOOC literature
Spoiler: We’ve been toying with automating the collection of literature on MOOCs (and other topics). Interested? Read further.
Researchers use different ways to keep updated with the literature on a topic. On a daily basis, for example, I use Table of Contents (TOC) alerts, RSS feeds, and Google Scholar alerts. Many colleagues have sought to keep track of literature on a topic and share it. For example, danah boyd maintained this list of papers on Twitter and microblogging; Tony Bates shared a copy of the MOOC literature he collected on his blog; Katy Jordan also kept a collection of MOOC literature.
A Google Scholar Alert
The problem with maintaining an updated list of relevant literature on a topic is that it quickly becomes a daunting and time-consuming task, especially for popular topics (like MOOCs or social media or teacher training).
In an attempt to automate the collection and sharing of literature, my research team and I created a Python script that goes through the Google Scholar alert emails that I receive (see above), parses the content of the emails, and places it in an HTML page on my server, from where others can access it. The script runs daily, and any new literature is added to the page.
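To make the parsing step more concrete, here is a minimal sketch of the idea using only the Python standard library. It is not our actual script. It assumes the raw text of an alert email is already in hand (fetching it from a mailbox is left out), and it assumes each paper title in the email is a link carrying a CSS class, here taken to be `gse_alrt_title`. The names `AlertTitleParser`, `extract_entries`, and `render_page` are made up for this illustration.

```python
from email import message_from_string
from html import escape
from html.parser import HTMLParser


class AlertTitleParser(HTMLParser):
    """Collects (title, url) pairs from links marked with the assumed title class."""

    def __init__(self, title_class="gse_alrt_title"):  # class name is an assumption
        super().__init__()
        self.title_class = title_class
        self.entries = []
        self._href = None  # not None only while inside a title link
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.title_class in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace left over from nested tags and line breaks.
            title = " ".join("".join(self._text).split())
            self.entries.append((title, self._href))
            self._href = None


def extract_entries(raw_email):
    """Return (title, url) pairs found in the HTML parts of one alert email."""
    msg = message_from_string(raw_email)
    entries = []
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            parser = AlertTitleParser()
            parser.feed(payload.decode(charset, errors="replace"))
            entries.extend(parser.entries)
    return entries


def render_page(entries):
    """Build a simple HTML list, skipping URLs that were already added."""
    seen = set()
    items = []
    for title, url in entries:
        if url in seen:
            continue
        seen.add(url)
        items.append(f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

In a daily run, something like `render_page(extract_entries(raw_email))` would be called over all alert emails received so far and the result written to a file on the server. Deduplicating by URL is a design choice for this sketch, since the same paper can show up in more than one alert.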
We aren’t there just yet, but here is the output for the MOOC literature going back to November 2012. All 400 pages. I placed it in a Google Document because the HTML file is 2.5 MB (and it’s easier for people to just download it in a format that they prefer).
In theory this is supposed to work quite well, but there are a couple of problems with it:
- The output is only as good as the input. Google Scholar (and its associated alerts) is a black box – meaning there’s no transparency about what is and isn’t indexed.
- It’s automated – which means it’s not clean, and some “MOOC literature” may not really be MOOC literature, because Google Scholar alerts work on keywords in the body of papers/text rather than keywords describing the papers/text.
We plan to make the source code available and describe the installation process so that others can use it for their own literature needs. My question is: How can the output be more helpful to you? Is there anything else that we can do to improve this?