AI Copyright Ruling: Fair Use Win, Piracy Trouble for Anthropic


Big news just dropped in the world of AI and copyright law. A U.S. federal court recently made a surprising call, ruling that it’s actually okay for companies like Anthropic to train their language models (like Claude) on copyrighted content. But before you think it’s all smooth sailing, there’s a major twist: while the judge called the training “fair use,” Anthropic is in hot water over how it got that data, and let’s just say pirated books are at the heart of it.

Alright, so get this: there was a pretty big legal case recently in the US involving the AI company Anthropic and its language model, Claude.

Here’s the gist of it:

The judge actually ruled that Anthropic’s use of copyrighted material to train Claude was “fair use” under US copyright law. The thinking behind this, according to Judge William Alsup, is that the output from these AI models is “quintessentially transformative.” He even said it’s like a new writer learning from existing works, not to just copy them, but to “turn a hard corner and create something different.” So, in that sense, scraping copyrighted content for training? Generally okay.

But, and this is a huge but, Anthropic is still in serious hot water because of how it got a lot of that copyrighted material. It turns out the company allegedly used thousands of pirated books it just “found” online, rather than actually buying them. The judge made it super clear that Anthropic had “no entitlement to use pirated copies for its central library.”

This whole thing kicked off last summer when authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson sued Anthropic. They claimed Anthropic’s whole business model was based on “large-scale theft of copyrighted works” and that the company was “strip-mining” human expression.

What’s wild is that even Anthropic’s own employees had concerns about using these pirated books.

The company eventually pivoted to buying physical books and digitizing them, which is a big effort. However, the judge ruled that the earlier piracy still needs to be addressed legally. So, while Claude can keep being trained on the authors’ works, Anthropic has to go back to court in December to stand trial over that alleged “large-scale theft.”

Now, personally, as the writer of this article, I’m a bit skeptical about the whole “transformative” argument for AI. AIs like Claude don’t really “understand” texts the way humans do; they’re more like players in a super complex game of word association, trying to produce coherent copy. But, you know, this ruling from the San Francisco federal court could set a big legal precedent for large language models going forward.
