Best Practices

In this section, we discuss some best practices 🗺️ for working with Reddit data, from exercising caution with vulnerable subreddit communities to handling user-deleted content ethically.

🚨 Exercising Caution with Vulnerable Populations

If delving into psychological research, exercise heightened caution. Scrutinise subreddit rules, as some carry explicit warnings about direct data collection. For instance, mental health support communities mandate moderator approval for research posts/surveys. While these subreddits may not explicitly prohibit collecting submissions and comments, participants are inherently vulnerable, necessitating robust ethical safeguards. Researchers should judiciously weigh potential benefits against risks of utilising such sensitive data, emphasising informed consent, confidentiality, and community sensitivities. Consider transparently advising moderators on data usage plans to uphold ethical standards.

🔑 Leveraging Approved APIs and Privacy-Focused Tools

Obtain approved API access from Reddit, and ensure data collection adheres to ethical and privacy-safe practices. Leverage RedditHarbor, a tool designed to navigate the grey areas of data scraping and user privacy.

📝 Responsible Publication and Attribution

While publishing research results is permissible, disseminating the data itself or any derivative products based on Reddit data (e.g., models trained using Reddit data) is prohibited. Proper attribution to Reddit (and any third-party tools utilised) is mandatory, coupled with stringent anonymisation of information in published findings. Additionally, provide Reddit with a copy of your research, allowing reasonable advance notice before publication.

⌛ Handling User-Deleted Content Ethically

When users delete their posts or accounts, it's advisable to respect their implicit revocation of consent by removing any stored copies of their data from your research databases. This upholds user privacy expectations and ethical research practices.
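
As a rough illustration of the removal step, the sketch below compares locally stored records against freshly re-fetched copies and drops any that now appear deleted. It is only a minimal sketch: the record layout (plain dictionaries with `id`, `author`, and `body` keys), the `"[deleted]"` placeholder check, and the function names are assumptions made for illustration, not part of any particular tool's API.

```python
from typing import Dict, Iterable, List, Optional

# Assumption: deleted content shows this placeholder (or a missing author)
# when a record is re-fetched. Adjust to match what your collection tool stores.
DELETED_MARKER = "[deleted]"


def is_deleted(record: Dict[str, Optional[str]]) -> bool:
    """Return True if a re-fetched record looks user-deleted."""
    author = record.get("author")
    body = record.get("body")
    return author is None or author == DELETED_MARKER or body == DELETED_MARKER


def purge_deleted(stored: Dict[str, dict], refreshed: Iterable[dict]) -> List[str]:
    """Remove stored rows whose refreshed copy is deleted; return the removed ids."""
    removed = []
    for record in refreshed:
        record_id = record["id"]
        if record_id in stored and is_deleted(record):
            del stored[record_id]
            removed.append(record_id)
    return removed
```

In practice `stored` would be a database table rather than an in-memory dictionary, and the check would run on a schedule so that deletions are honoured soon after they happen.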

🔭 Longitudinal Research Justification

For longitudinal studies involving retention of user-deleted data, clearly articulate the overriding societal benefits and scientific merit. Ensure robust anonymisation, data minimisation, and limited access controls to justify this exceptional use case.
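
One way to approach the anonymisation and data minimisation mentioned above is to replace usernames with keyed hashes and keep only the fields a study actually needs. The sketch below assumes records are plain dictionaries with an `author` key; the `KEEP_FIELDS` selection and the function names are hypothetical. Note that keyed hashing is pseudonymisation rather than full anonymisation: post text can still identify a person, and the secret key must be stored separately under restricted access.

```python
import hashlib
import hmac

# Hypothetical minimal field set; keep only what the research question requires.
KEEP_FIELDS = ("id", "created_utc", "body")


def pseudonymise(username: str, secret_key: bytes) -> str:
    """Map a username to a stable keyed hash (HMAC-SHA256, hex encoded)."""
    return hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256).hexdigest()


def minimise(record: dict, secret_key: bytes) -> dict:
    """Return a copy with only the needed fields and a pseudonymous author."""
    slim = {key: record[key] for key in KEEP_FIELDS if key in record}
    slim["author_pseudonym"] = pseudonymise(record["author"], secret_key)
    return slim
```

Because the same key always yields the same pseudonym, a user's posts can still be linked across time points for longitudinal analysis without storing the username itself.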

🀝 Transparency and Community Engagement#

Foster transparency by openly communicating your research intentions, data handling practices, and privacy safeguards. Consider engaging with relevant Reddit subreddit moderators to build trust and address potential concerns.