Welcome to the 18th International AAAI Conference on Web and Social Media!

We’re thrilled to present our tutorial Scraping Reddit the Right Way: A Guide to Legal and Ethical Data Collection with RedditHarbor. This tutorial is co-organised by socius and the Open Data Institute (ODI), featuring:

  • Nick Oh, Principal Researcher at socius

  • Sasha Moriniere, Researcher at the Open Data Institute

  • Jared Keller, Principal Researcher at the Open Data Institute

The tutorial has three parts over the course of two hours. First, we’ll introduce the ODI’s Global Data Infrastructure typology, providing a big-picture view of platform data access for public interest research, from APIs to data trusts and more. Next, we’ll focus specifically on Reddit, discussing how to collect and use Reddit data ethically and legally. Finally, we’ll provide hands-on training for RedditHarbor, a scraping toolkit designed to streamline Reddit data collection.

We hope you find this tutorial to be a valuable learning experience! To promote open science, this documentation is intended as an ongoing reference for both ICWSM 2024 attendees and researchers who couldn’t make it this year. For general inquiries or feedback, please contact research@socius.org. For questions specifically about the ODI’s #GlobalDataInfrastructure initiative, reach out to sasha.moriniere@theodi.org. We’re delighted to help fellow computational social scientists navigate ethical and legal access to platform data, such as Reddit data, even after the conference has concluded.

For ICWSM 2024 attendees

ICWSM attendees MUST set up the prerequisites before attending this tutorial, because the hands-on activities depend on them. If you have any questions or encounter any difficulties along the way, raising a GitHub issue (here) is the fastest way to receive support from our team.

hosted at:

(JUNE 3RD – 6TH 2024)

co-organised by:

socius and the Open Data Institute