Web Scraping For Marketers

MSI (the Marketing Science Institute) publishes working papers on issues of interest to marketers. The institute’s purpose is to bring together academics and practitioners. As such, MSI’s work aims to matter to both groups. I recently read an interesting working paper on web scraping for marketers, written by Johannes Boegershausen, Abhishek Borah, and Andrew Stephen. The paper advocates for more use of web scraping by consumer researchers.

Potential Of Web Scraping For Marketers

The potential of web scraping for marketing is clear. Consumers post loads of information about themselves and their tastes. Can marketers use this information to serve them better? Certainly, academics can hope to better understand what consumers want. Such data analysis has the potential to rival the laboratory experiments that have historically been very popular with consumer behavior researchers.

Given that consumers are posting this information publicly, this might help reassure some people, at least, that scraping the data is no invasion of privacy. Twitter and blogs have proved fascinating sources of information. There are any number of other great sources. You can even find out what marketing academics like me write about in their spare time. As the internet (and my blog) gets bigger there will be more and more interesting stuff to scrape.

Scraping My Blog Could Give Insights Into A Fascinating Topic (Me)

Using Web Scraping In Research

The authors point out a problem with using web scraping in research: you see only what is out there at the point you scrape. The researcher grabs data at a specific time, and the information available will have changed by the next time the data is grabbed. This makes it hard to replicate any findings. Regular changes in data make it critically important to know how the data was collected. Thus, academics must be clear on their collection procedures to mitigate concerns about dubious data. Boegershausen and his colleagues outline some best practices.
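
To make the point about documenting collection concrete, here is a minimal sketch in Python of one way a researcher might log when and what was scraped. This is my own illustration, not a procedure from the paper: the function names (make_snapshot_record, content_changed), the recorded fields, and the example.com URL are all hypothetical, and the page content is a hard-coded string rather than a live download.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_snapshot_record(url, html, collected_at=None):
    """Describe one scraped page: where it came from, when, and a fingerprint.

    Hypothetical helper for illustration; the fields are an assumption
    about what a collection log might usefully contain.
    """
    if collected_at is None:
        # Record the collection time in UTC so logs are comparable.
        collected_at = datetime.now(timezone.utc)
    return {
        "url": url,
        "collected_at": collected_at.isoformat(),
        # A hash of the content lets a later scrape be compared to this one.
        "sha256": hashlib.sha256(html.encode("utf-8")).hexdigest(),
        "n_characters": len(html),
    }


def content_changed(earlier, later):
    """Return True if two snapshot records describe different page content."""
    return earlier["sha256"] != later["sha256"]


if __name__ == "__main__":
    # Stand-in strings; a real study would store the downloaded page text.
    first = make_snapshot_record(
        "https://example.com/reviews", "<p>Great product</p>"
    )
    second = make_snapshot_record(
        "https://example.com/reviews",
        "<p>Great product</p><p>New review</p>",
    )
    print(json.dumps(first, indent=2))
    print("Changed between scrapes:", content_changed(first, second))
```

Keeping a record like this alongside the raw data does not make a scrape repeatable, but it does let a reader see exactly when the data was grabbed and whether a later grab returned the same content.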

Is It Legal?

Given that I am not a lawyer and do not play one on TV (although I’m sure I’d be great at it), I can’t really comment. I am deliberately not giving advice. The authors share some advice, but they aren’t lawyers either (I think). As such, the advice they give is a bit CYA, to be honest.

In sum, web scraping continues to be a legal gray area. Researchers need to be aware of the potential liability from web scraping. In particular, republishing scraped datasets is highly problematic.
Boegershausen, Borah, and Stephen (2021), Appendix 2

I will translate this: ‘don’t sue the authors of this paper if something goes wrong’.

Web scraping happens. It doesn’t seem to cause the world to come to an end in many cases. Still, legality is always something to think about. Do check the terms of any website you are using and get some legal advice if relevant. (I too can give advice that is a bit wet when I want to.)

That said, I would hope people wouldn’t avoid the area for fear of lawyers (Liticaphobia? I googled and this term for fear of lawyers appeared. I have yet to be convinced it is a proper word).

The Future Of Web Scraping In Consumer Research

My perception is that consumer researchers have embraced technology a little less enthusiastically than their marketing strategy or analytically focused colleagues. So it would be fascinating if consumer behavior research were to become the area where the technologically inclined, e.g., faculty trained in or inspired by computer science, ended up making the greatest contribution.

For more on what academic researchers research see here.

Read: Johannes Boegershausen, Abhishek Borah, and Andrew T. Stephen (2021), Fields of Gold: Web Scraping For Consumer Research, https://www.msi.org/working-papers/fields-of-gold-web-scraping-for-consumer-research/