Reddit Files Lawsuit Against Perplexity AI Over Alleged Data Scraping Practices

Legal Battle Over AI Training Data Intensifies

Reddit has filed a federal lawsuit against Perplexity AI, alleging that the artificial intelligence company illegally scraped user-generated content from its platform to train its AI models, according to court documents filed in New York. The social media platform is seeking monetary damages and a permanent injunction barring further use of its data.

Alleged Data Harvesting Operation

The lawsuit claims Perplexity worked with three data scraping firms (Oxylabs of Lithuania, AWMProxy of Russia, and SerpApi of Texas) to bypass Reddit’s protective measures against unauthorized data collection. The coordinated effort allegedly gave Perplexity access to Reddit’s extensive library of human conversations, content that, per the allegations, the company “desperately needs” to improve the accuracy of its AI models.

Pattern of Legal Action Against AI Firms

This is the second lawsuit Reddit has filed against an AI company in recent months. In June, the platform brought a similar case against Anthropic, another AI startup, over comparable data scraping allegations. Reddit’s Chief Legal Officer, Ben Lee, described the situation as part of a “data laundering economy,” saying AI firms are engaged in an “arms race for quality human content.”

Perplexity’s Response and Defense

Perplexity has publicly denied any wrongdoing. In a statement, the company said its approach “remains principled and responsible as we provide factual answers with accurate AI.” It added that it “will not tolerate threats against openness and the public interest” and plans to vigorously defend its position in court.

Broader Industry Implications

The lawsuit comes amid growing legal challenges to AI companies over their data sourcing practices. Multiple tech giants and content creators have filed similar cases challenging how AI models are trained on existing online content. The case could set important precedents for how user-generated content on social platforms may be used to train AI, legal experts suggest.

Ongoing Legal Proceedings

The case is proceeding through the federal court system, with both parties preparing their legal arguments. The outcome could have significant implications for the AI industry’s access to publicly available online content and may establish clearer boundaries around data scraping, according to industry analysts following the case.