Data Scraping and Other Internet-Based Research Methods

Today I completed CITI Program’s refresher course on Social & Behavioral Research so that I can apply to conduct human subjects research at my current institution. The online course included a unit on internet-based research that I didn’t remember from when I first took the course a few years ago at the University of Iowa. Perhaps it was there, but the unit cites sources published during and after the time I took the first course, so most of the content is new or updated. This unit was especially important to me because I intend to continue my research on sojourning ELTs using internet-based research methods in addition to more traditional methods.

Data Scraping

One research method that I learned about was data scraping, an ugly term that reminds me of scraping tissue samples for DNA testing.  Susanne Webster from My Helpster wrote about data scraping at

Image from The Helpster's article about scraping: Ethernet cords plugged into a hard drive.

For my dissertation, I practiced what the article calls “manual copying” to collect blog data from my participants, who gave their informed consent to be in the study. Instead of relying on the help of overseas freelancers, I did the work myself by copying blog posts from the web and pasting them into MS Word documents, which I saved and password-protected on my hard drive. While I was doing this, I kept thinking that there had to be an easier way to collect this data, but my literature review did not provide any answers except data mining, which was not exactly what I wanted to do. I limited my search for a scraping service because I did not have the resources to pay a third party to do the work.
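As a rough illustration of what an automated alternative to manual copying might look like, here is a minimal Python sketch that pulls the readable text out of a blog page's HTML using only the standard library. It is not what I did for my dissertation; the class name `PostTextExtractor`, the function `extract_post_text`, and the sample HTML are all hypothetical, and the sketch assumes the page has already been downloaded with the blogger's informed consent.

```python
from html.parser import HTMLParser


class PostTextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script and style blocks.

    Hypothetical helper for illustration; not taken from any article cited here.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>
        self._chunks = []      # pieces of visible text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        # One line per text chunk found between tags.
        return "\n".join(self._chunks)


def extract_post_text(html):
    """Return the visible text of an HTML string (assumed to be one blog post)."""
    parser = PostTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample page standing in for a downloaded blog post.
sample_html = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>My Post</h1><p>Hello &amp; welcome.</p>"
    "<script>var x = 1;</script></body></html>"
)
post_text = extract_post_text(sample_html)
```

The returned text could then be written to a file in place of pasting into a Word document. A real blog page would also contain menus, sidebars, and comments, so a sketch like this would still need to be narrowed to the part of the page that holds the post itself.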

Resources on Internet-Based Research Methods

For my dissertation, I borrowed The SAGE Handbook of Online Research Methods from the university library. It is a massive book with nearly 600 pages. Although it was somewhat helpful, it did not help me navigate much of the methodological and ethical messiness of collecting blog data for qualitative research. Perhaps one reason was that it was out of date: it was published in 2008, which means many of the sections were written in 2007 at the latest. To help contextualize, 2007 was the year Facebook was really catching on in the U.S.

The CITI Program training course provided a couple of helpful resources for internet-based research. Both are much more current, and I believe they will continue to be updated as our technology and our use of technology evolve.

Screen shot of Michael Zimmer's website

The first is Michael Zimmer’s website, which can be found at  As you can see in the screen shot above, his blog covers many issues. I just found his webpage today, so I can’t go into too much detail as I’m still learning about it myself.

Banner from UC Berkeley's department of Research Administration and Compliance

The second resource is the University of California Berkeley’s guidance on internet-based research. Just last month (July 2015), they updated their guide at  This is a great reference for those getting started with their internet-based research. It complements many items from CITI Program’s course.

So if you’re interested in my research methods, in conducting research similar to mine, or in collaborating with me, these resources will give you a good idea of how to conduct research online appropriately and ethically. I’d love to hear from anybody who has been doing similar research and has more tips to share.


