Week 3

Reflection – Week 3

In this week’s workshop, I experimented with web scraping for the first time and tried to collect data from BBC iPlayer. At the beginning, I assumed that scraping would simply “show me the data behind the interface,” as if information on a website were openly available for anyone to take. I quickly realised, however, that online data is not as transparent or accessible as I had imagined.

For my first scrape, I focused on the BBC iPlayer homepage. The only information I managed to collect was the visible interface text: category labels such as “Drama,” “Documentary,” and “Most Popular,” and clickable buttons like “View All” (p1). This reminded me of D’Ignazio and Klein’s idea that “what gets counted counts.” The platform decides which categories matter and, in doing so, shapes my viewing experience before I even click anything. What I scraped is therefore not neutral information but the product of someone else’s classification system.

p1
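
A minimal sketch of what this kind of first scrape amounts to, using only Python's standard library. The HTML snippet is an invented stand-in, not the real BBC iPlayer markup, and the class and function names (VisibleLabelParser, extract_labels) are hypothetical. It collects only the text inside headings, links, and buttons, which is roughly the layer I was able to reach.

```python
from html.parser import HTMLParser


class VisibleLabelParser(HTMLParser):
    """Collect the text found inside headings, links, and buttons."""

    LABEL_TAGS = {"h1", "h2", "h3", "a", "button"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of label tags currently open
        self.labels = []  # collected label text, in page order

    def handle_starttag(self, tag, attrs):
        if tag in self.LABEL_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.LABEL_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when it sits inside one of the label tags.
        if self.depth > 0 and text:
            self.labels.append(text)


# Invented sample markup (an assumption, not the real page).
SAMPLE_HTML = """
<section>
  <h2>Drama</h2>
  <a href="/drama">View All</a>
  <h2>Documentary</h2>
  <button>Most Popular</button>
  <script>var hidden = {"views": 12345};</script>
</section>
"""


def extract_labels(html):
    """Return the visible label text found in an HTML string."""
    parser = VisibleLabelParser()
    parser.feed(html)
    return parser.labels


if __name__ == "__main__":
    print(extract_labels(SAMPLE_HTML))
```

The made-up script content is skipped because it sits outside the label tags. The scraper returns the categories the page chose to display and nothing else.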

Then I tried scraping an individual programme page. I was able to extract the title, description, and episode details, but I still couldn’t access deeper data or anything related to the video itself (p2 p3). This connects to Crawford’s argument that data is never neutral: it is shaped and controlled by infrastructures and institutions. I realised that platforms not only show data selectively but also deliberately restrict access to the rest, through mechanisms such as access-controlled APIs and DRM.

p2 p3

At the same time, this process raised ethical questions. Data being publicly viewable does not mean we have the right to collect, store, or reuse it; even when scraping is technically feasible, questions remain about consent, ownership, and the terms under which data circulates. As researchers, we must recognise that respecting copyright, platform terms, and data reuse agreements is part of research ethics. When handling data in future, I should be transparent about my methods and make sure my approach is legitimate, so as to avoid unnecessary risk and harm. I see this as a form of respect for the individuals and contexts behind the data.
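
One small, practical way to act on this is to check a site's robots.txt rules before scraping. The sketch below uses Python's standard urllib.robotparser. The rules, the example.org URLs, the user agent name, and the is_allowed helper are all hypothetical, chosen only for illustration. Passing this check would not amount to permission: a site's terms of use and copyright still apply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, written inline so no network access is needed.
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
Allow: /
""".strip().splitlines()


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.org/programmes",
                "https://example.org/private/data"):
        print(url, is_allowed(SAMPLE_ROBOTS, "student-research-bot", url))
```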