Reddit Files Lawsuit Alleging Perplexity AI Illegally Scraped Content Through Third Parties

Legal Battle Over AI Training Data

Reddit has sued artificial intelligence company Perplexity, alleging that the firm scraped Reddit content without authorization to train its AI systems. According to the complaint, Perplexity worked with three data scraping companies (AWMProxy, Oxylabs, and SerpApi) to access Reddit’s content without permission. Perplexity reportedly maintained a business relationship with at least one of these scraping firms to systematically extract data from Reddit’s platform.

The “Digital Marked Bill” Investigation

What makes this case noteworthy is the method Reddit says it used to catch Perplexity in the act. According to court documents, Reddit created what it describes as a “test post,” the equivalent of a digital “marked bill,” that was accessible only through Google’s search engine and unavailable elsewhere on the internet. The complaint states that within hours of the post’s creation, queries to Perplexity’s “answer engine” returned its exact contents. Analysts suggest this indicates Perplexity was scraping Google search results pages that contained the Reddit content and rapidly incorporating that data into its AI systems.
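The “marked bill” idea can be illustrated with a short Python sketch. This is not Reddit’s actual tooling, which the complaint does not describe in code; the function names `make_canary` and `contains_canary` and the sample strings are hypothetical. The sketch assumes the test content carries a unique, unguessable marker, so that any later appearance of the marker in an answer points back to the one place it was published.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Return a unique, unguessable marker string to embed in a test post."""
    # 8 random bytes -> 16 hex characters; collisions with real text are implausible.
    return f"{prefix}-{secrets.token_hex(8)}"


def contains_canary(answer_text: str, canary: str) -> bool:
    """Return True if the marker appears in the text (case-insensitive match)."""
    return canary.lower() in answer_text.lower()


# Hypothetical example data: a test post and two answers to compare against it.
canary = make_canary()
test_post = f"Test post body containing the marker {canary}."
leaked_answer = f"According to a Reddit post: {test_post}"
clean_answer = "No relevant results were found."

print(contains_canary(leaked_answer, canary))
print(contains_canary(clean_answer, canary))
```

A match only shows that the marked content reached the answer; establishing the route it took, as Reddit alleges it did by limiting the post to Google’s search results, depends on controlling where the marker is published.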

Escalation Despite Warning

According to the complaint, Reddit had previously sent Perplexity a cease-and-desist letter over its scraping activities. Reddit alleges that, rather than reducing its citation of Reddit content after receiving the letter, Perplexity increased its references to the platform. This suggests the relationship between the companies had deteriorated significantly before the dispute reached litigation. Industry observers note that Reddit has established formal agreements with other AI companies for content access, but no such arrangement existed with Perplexity.

Perplexity’s Defense Statement

Perplexity responded to the allegations in a statement provided to The Verge. “Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge,” the company stated. “Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

Broader Implications for AI Industry

The lawsuit comes amid growing tensions between content platforms and AI companies over how training data is sourced. The case highlights ongoing debates about what constitutes fair use of publicly available web content for AI training. Legal experts suggest the outcome could set important precedents for how data scraping is regulated and what permissions AI companies need to access content from platforms like Reddit. As AI systems increasingly rely on vast amounts of web data for training, analysts expect such legal conflicts to become more common across the technology sector.

What Comes Next

The legal proceedings are just beginning, and industry watchers anticipate a complex case that could examine fundamental questions about data ownership and AI development practices. Meanwhile, Reddit’s investigation method, which used traceable content as bait, shows how platforms are developing techniques to detect unauthorized data access. The technology community will be watching closely, as the case could influence how AI companies source training data and how content platforms protect their information.