Reddit Files Lawsuit Against Perplexity AI Over Alleged Data Scraping Violations

Reddit Escalates Legal Battle Over AI Data Scraping

Social media platform Reddit has filed a federal lawsuit against Perplexity AI and three data providers, alleging they engaged in systematic content scraping without authorization. The complaint, filed in the Southern District of New York, represents the latest escalation in the ongoing battle between content creators and artificial intelligence companies over training data.

Alleged Data Scraping Network Identified

According to the legal filing, Reddit is targeting Perplexity AI along with data providers Oxylabs UAB, AWM Proxy, and SerpApi. The complaint alleges these companies worked together to bypass both Reddit’s and Google’s anti-scraping defenses to harvest Reddit content and related search results. Sources indicate that rather than obtaining proper licensing, Perplexity allegedly chose to purchase what Reddit characterizes as “purloined data.”

Reddit’s Chief Legal Officer Ben Lee stated in an emailed declaration that AI companies’ hunger for quality human-generated content is fueling what he described as an “industrial scale data laundering economy.” Analysts suggest this case highlights the growing tension between AI firms that need training data and content platforms seeking to control and monetize their users’ contributions.

Technical Circumvention Methods Alleged

The lawsuit claims the data providers employed sophisticated methods to evade detection. According to reports, when unable to scrape Reddit directly, the companies allegedly masked their identities, concealed their locations, and disguised their web scrapers to harvest Reddit content through Google Search instead. Lee characterized Oxylabs UAB, AWM Proxy, and SerpApi as textbook examples of illegal data scraping operations.

Reddit’s complaint uses vivid analogies to describe the alleged activities, comparing the data providers to “would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.” The filing also echoes Cloudflare CEO Matthew Prince’s previous characterization of Perplexity as being “more akin to a ‘North Korean hacker’” in its approach to obtaining data.

Legal Claims and Potential Consequences

Reddit contends the defendants violated the Digital Millennium Copyright Act by circumventing technological protections against automated access. The company specifically accuses SerpApi and Oxylabs of violating the DMCA’s prohibition on trafficking in circumvention technology, products, or services. Additional claims include unfair competition, unjust enrichment, and civil conspiracy.

The social media company is seeking both an injunction to stop the alleged scraping activities and monetary damages. This lawsuit follows Reddit’s similar legal action against Anthropic in June, when negotiations for a content licensing agreement reportedly broke down. Industry observers suggest these cases represent a strategic effort by Reddit to establish legal precedents regarding AI training data rights.

Industry Response and Broader Context

A Perplexity spokesperson responded to the allegations by stating the company had not yet received the lawsuit but would “always fight vigorously for users’ rights to freely and fairly access public knowledge.” The spokesperson emphasized that Perplexity’s approach remains “principled and responsible” and that they would not tolerate “threats against openness and the public interest.”

Oxylabs, which advertises itself as “the largest ethical proxy network and advanced scraping solutions empowering the AI industry,” and SerpApi did not immediately respond to requests for comment, according to reports. Google, while not participating in the lawsuit, has reportedly taken measures to prevent automated scraping of its search results.

This case joins a growing number of legal challenges around AI training data. Recent lawsuits include claims against Apple for allegedly using pirated books from the Books3 dataset, allegations against OpenAI for scraping YouTube videos, and The New York Times’ case against Microsoft and OpenAI over news content usage. The outcomes of these cases could significantly shape how AI companies access and use online content.

Broader Implications for AI Industry

Industry analysts suggest these legal battles highlight fundamental questions about data ownership and fair use in the AI era. As companies like Reddit seek to monetize their content through licensing deals, such as Reddit’s agreement with OpenAI, the pressure on AI firms to obtain training data through legitimate channels increases. The resolution of these cases may establish important boundaries around what constitutes permissible scraping of publicly available web content.

According to legal experts following these developments, the outcomes could force AI companies to be more transparent about their data sourcing practices and potentially reshape how the industry approaches content acquisition. With multiple high-profile cases now moving through the courts, 2025 is shaping up to be a pivotal year for establishing legal frameworks around AI training data.