Lawsuit Claims Nvidia Used Shadow Libraries for AI Model Training

January 24, 2026 11:00 am

A class-action lawsuit claims that NVIDIA used shadow libraries for AI model training, alleging that the company accessed millions of pirated books. The amended complaint states that NVIDIA executives approved the use of the shadow library Anna’s Archive to speed up AI model training. Reports suggest that competitive pressure in the AI sector led the company to pursue pirated materials as training data.

Internal NVIDIA emails, cited in the class-action lawsuit, show the company sought high-speed access to the Books3 dataset. The dataset included millions of books typically available only through the Internet Archive’s digital lending system. According to the filing, NVIDIA also accessed additional pirated sources such as LibGen, Sci-Hub, and Z-Library to support large language model pre-training.

The authors claim NVIDIA “knowingly downloaded large volumes of copyrighted books” to improve AI performance. The company also allegedly provided corporate AI tools and scripts that let third-party users automate downloads of pirated content. Experts warn that such practices may expose the company to significant copyright infringement claims and other legal consequences.

The complaint emphasizes NVIDIA executives’ approval despite warnings about the illegal nature of these materials. Analysts say the incident highlights ethical challenges in AI training datasets and the need for responsible data sourcing. With roughly 500 terabytes of data involved, the scale of the alleged copyright violation is enormous.

The case may influence how AI workflows are automated in the future and could set stricter standards for pre-training data. It also serves as a reminder that AI model training requires careful handling of copyrighted content. NVIDIA now faces scrutiny from authors, regulators, and the broader tech community.