OpenAI Accidentally Deletes Potential Evidence in New York Times Copyright Lawsuit


OpenAI may have accidentally deleted data relevant to its ongoing **copyright lawsuit** with the *New York Times*.

A report from [TechCrunch](https://techcrunch.com/2024/11/20/openai-accidentally-deleted-potential-evidence-in-ny-times-copyright-lawsuit/) indicates that attorneys for the *Times* and co-plaintiff *Daily News* filed a [letter](https://storage.courtlistener.com/recap/gov.uscourts.nysd.612697/gov.uscourts.nysd.612697.328.0.pdf) with the judge overseeing the case, stating that “an entire week’s worth of its experts’ and lawyers’ contributions” was “irretrievably lost.” The problem arose after OpenAI gave the plaintiffs access to two virtual machines so they could search its training data for their copyrighted content. According to the letter, on Nov. 14 OpenAI engineers erased the programs and search result data stored on one of those virtual machines.


The *New York Times* has accused OpenAI, along with Microsoft, which incorporates OpenAI’s technology into its Bing AI chatbot, of copyright infringement, claiming that their models were trained on its content, including paywalled articles, without permission. The lawsuit points to instances of ChatGPT outputs containing “[near-verbatim](https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html)” reproductions of *Times* material. OpenAI disputes these claims, arguing that its models were trained on publicly available data and that this qualifies as fair use under copyright law. The case hinges on whether the *Times* can show that OpenAI’s models used its material without consent, credit, or payment.

Although OpenAI managed to recover most of the deleted information, the “folder structure and file names” were irretrievably lost, leaving the recovered data unusable. As a result, the plaintiffs’ legal team has to redo much of its evidence-gathering work. In their letter, the plaintiffs’ counsel said there was “no reason to believe [the deletion] was intentional” but stressed that OpenAI is “in the best position to search its own datasets.” OpenAI has not publicly disclosed the specifics of its training data.

This lawsuit is one of several [copyright lawsuits](https://mashable.com/article/openai-chatgpt-class-action-lawsuit) filed against OpenAI. In a separate case, a suit brought by Raw Story and AlterNet was recently [dismissed](https://www.reuters.com/legal/litigation/openai-defeats-news-outlets-copyright-lawsuit-over-ai-training-now-2024-11-07/) because the plaintiffs failed to show sufficient harm. Meanwhile, OpenAI has been striking licensing deals with media companies that let it use their content for training and have ChatGPT cite them in its responses. For example, [Adweek](https://www.adweek.com/media/openai-dotdash-meredith-licensing-payment/) recently reported that OpenAI is paying Dotdash Meredith at least $16 million a year to license its content.