Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal


“Meta has treated the so-called ‘public availability’ of shadow datasets as a get out of jail free card, notwithstanding that internal Meta records show every relevant decision-maker at Meta, up to and including its CEO, Mark Zuckerberg, knew LibGen was ‘a dataset we know to be pirated,’” the plaintiffs allege in this motion. (Originally filed in late 2024, the motion is a request to file a third amended complaint.)

In addition to the plaintiffs’ briefs, another filing was unredacted in response to Chhabria’s order: Meta’s opposition to the motion to file an amended complaint. It argues that the authors’ attempts to add claims to the case are an “eleventh-hour gambit based on a false and inflammatory premise,” and denies that Meta waited to reveal crucial information in discovery. Instead, Meta says it first told the plaintiffs in July 2024 that it had used a LibGen dataset. (Because much of the discovery material remains confidential, it is difficult for WIRED to confirm that claim.)

Meta’s argument hinges on its claim that the plaintiffs already knew about the LibGen use and shouldn’t be granted additional time to file a third amended complaint when they had ample time to do so before discovery ended in December 2024. “Plaintiffs knew of Meta’s downloading and use of LibGen and other alleged ‘shadow libraries’ since at least mid-July 2024,” the tech giant’s lawyers argue.

In November 2023, Chhabria granted Meta’s motion to dismiss some of the lawsuit’s claims, including the claim that Meta’s alleged use of the authors’ work to train AI violat...
