On the heels of the recent rulings on Anthropic and Meta, which put the training of copyrighted content under “fair use” and dismissed (most of) the cases, I wondered how many other lawsuits were still out there, and which ones could have major implications for the AI industry. I also didn’t want to bore myself or you as the reader to death…so what I’ll do is cover some of the categories and high-level findings, then we can go over where the biggest risks are and whether anything warrants concern. Obviously if the courts said, “well, you simply can’t train on anything,” that would mean you should probably short all AI stocks and call it a day.
Also note, I am not a lawyer, and I am drastically simplifying things to get a better understanding. I’m not setting out to make policy here; I just want to be an analyst on what is happening in the industry. With that, let’s cover a few lawsuits in AI in these categories:
Books
Art and Images
Code
Data scraping and Privacy
Antitrust
I’ll try to breeze through this so don’t fall asleep please
It’s worth covering Books first since we just had the recent news.
Books
The Anthropic lawsuit was Bartz et al. v. Anthropic, and the Meta one was Silverman et al. v. Meta (yes, Sarah Silverman, the comedian). These were filed back in 2023, but the Anthropic case was just dismissed under fair use (though the retention of pirated data will still go to trial in December 2025). Meta’s case was also dismissed last week for “using the wrong argument,” so it might come back. There is still another one against OpenAI from the Authors Guild, and a new case over Microsoft’s Megatron. The Microsoft one is interesting because it was filed just after the Anthropic ruling last week. In summary, though: it isn’t illegal to convert copyrighted content into a vector space and then produce outputs from probabilities of how words relate to each other. It would be as if someone watched a movie and created another movie that felt very similar; the act of consuming content and coming up with something new is not illegal. The main condition is that the newly created content needs to be “transformative.” It does become problematic if the output spits out the exact same content, though with web search tools and deep research it may be hard to argue that an output is simply regurgitating the data source it read. And with how the technology works, the model is not storing articles word for word in its weights.
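The “probabilities of how words relate” point can be made concrete with a toy sketch. Real LLMs learn dense neural weights rather than explicit tables, but a tiny bigram model (trained on a made-up sentence here) shows why the training text itself isn’t stored verbatim, only statistics derived from it:

```python
from collections import Counter, defaultdict

# Toy illustration: the "model" keeps conditional probabilities
# P(next word | previous word), not the original text itself.
def train_bigram(text):
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    # Convert raw counts into conditional probabilities
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in counts.items()
    }

model = train_bigram("the cat sat on the mat and the cat slept")
print(model["the"])  # {'cat': 0.666..., 'mat': 0.333...}
```

Sampling from such a model can reproduce familiar phrases by chance, which is exactly the regurgitation concern the lawsuits raise, but the artifact being distributed is the probability table, not the source text.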
It’s also worth noting that the Authors Guild sued Google in the past for digitizing copyrighted books and putting the text online. A proposed $125M settlement, which would have let authors opt out, was rejected in 2011, and Google ultimately prevailed on fair use in 2015. Here with OpenAI, though, the data isn’t being stored as the original; it’s going into vectors, even though the model could potentially output fairly comprehensive snippets of works. This is one of the lawsuits to watch, but it could still end in a settlement, and it will hinge on fair use as well.
Where there could be traction is the piracy angle. If these companies are pirating copies of this data, or using another data source without properly licensing it, that seems to be the hole where the next lawsuits or outcomes come from, but it would not be as significant a blow as being banned from training on this type of content altogether. It’s also worth noting OpenAI does have a mechanism to block certain outputs: sometimes when a particular string appears in an output, it’ll cancel and give an error. Next, let’s move to image generation.
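That cancel-and-error behavior can be sketched as an output guard. This is purely a hypothetical illustration (OpenAI’s actual filter is not public, and the blocklist here is made up): the generated stream is checked against protected strings, and the response is cancelled the moment a match appears.

```python
# Hypothetical output guard: cancel a streaming response if it starts
# reproducing a protected string verbatim. Blocklist is illustrative only.
PROTECTED_SNIPPETS = [
    "it was the best of times, it was the worst of times",
]

def stream_with_guard(tokens):
    emitted = ""
    for tok in tokens:
        emitted += tok
        for snippet in PROTECTED_SNIPPETS:
            if snippet in emitted.lower():
                raise RuntimeError("output blocked: matched protected content")
        yield tok

blocked, out = False, []
try:
    for tok in stream_with_guard(["It was the best of times, ",
                                  "it was the worst of times, ..."]):
        out.append(tok)
except RuntimeError:
    blocked = True
print(blocked)  # True: the stream was cancelled mid-response
```

Note the guard fires mid-stream, which matches the user-visible behavior of an answer cutting off partway through with an error.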
Art and Images
The most notable case here is Getty Images suing Stability AI, which was dismissed in the UK but is still ongoing in the US; Getty is now targeting trademark claims as the next step. Next was Andersen et al. v. Stability AI, Midjourney, and DeviantArt. The DMCA and several other claims were dismissed, but a few things could still be problematic, such as featuring artists’ names without consent and using works without authorization. I believe these companies did give artists the right to opt out, which also moved the needle closer to their favor.
Now one interesting one is the new case from Disney and Universal v. Midjourney. This is still based on unauthorized training as above, but also on the output resembling their IP. This is a bit different from text, where the model can output something similar but “transformed” into a different context; in the image space, you are literally outputting their characters. I think the best outcome here would be a settlement or licensing deal, since that actually allows Disney and Universal to scale with the growth of AI.
Code
Yes, I work in the AI code space, and there’s a heavy lawsuit here: Doe et al. v. GitHub, OpenAI, and Microsoft. It alleged that open-source code was used to train AI models and could be reproduced fairly reliably, which could put users in violation of licenses such as the GPL. At Windsurf, we got around this by simply removing all non-permissively licensed code from training, and we have an attribution filter to make sure we don’t output that kind of code. However, this case is still active, and it remains unclear whether any of the other AI coding assistants account for this. One callout, though: as these models get more powerful, they can find ways to generate their own code to train on, so there is less dependence on open-source code.
Data scraping and privacy
This one had me surprised; there are a lot of lawsuits in motion here, so I’ll try to be brief. There’s the Canadian News Publishers v. OpenAI, which says OpenAI shouldn’t have used their data; the lawsuit challenges Canada’s “fair dealing” copyright doctrine, which is similar to fair use but a lot more restrictive (there is a defined list of permitted purposes). Next up are the OpenAI privacy class actions (Clarkson, Morgan & Morgan), which claim OpenAI and Microsoft took personal data without consent. We also have the EU privacy regulators v. OpenAI, where the Italian data protection authority fined OpenAI €15M and the EU Data Protection Board said that scraping requires transparency and user consent. Finally, there’s Reddit v. Anthropic, where Reddit claims Anthropic broke its terms of service and trained on its data. OpenAI is already paying Reddit for this data, so I can see the outcome being a similar eight-to-nine-figure deal with Anthropic.
To me this could go in any number of directions. It may be that AI companies need to be more transparent on data sources, show more consent than just their terms of service, or come up with better licensing models like I suggested in previous sections.
Antitrust
Finally, there is an inquiry from the US government, the FTC 6(b) Study on AI Mergers / Competition, launched in January 2024, which studies the relationships between OpenAI and Microsoft, Anthropic and Amazon, and Anthropic and Google. It is looking at whether any of these companies use their relationships to gain an edge over others. This could mean sharing data, controlling access to preferential AI partners, or creating agreements that lock AI partners into one provider.
There are also some laws, not lawsuits, that could drastically shape the AI landscape. While California vetoed SB 1047, which would have required audits and kill switches for large models, the state is working on a more comprehensive report on what to do next. Utah rolled out the AI Policy Act, which requires disclosure when generative AI is used, and Tennessee passed the ELVIS Act, which protects artists from unauthorized AI impersonations.
There are also, of course, export laws, which prohibit companies like Nvidia from shipping their best GPUs to certain countries, categorized into Tiers 1 through 3: Tier 1 allies like the UK and Japan are unrestricted, Tier 2 countries like Singapore and Saudi Arabia face caps, and Tier 3 countries like China and Russia are barred from advanced AI GPUs. In response, Nvidia has to create chips like the B30. Oh, and the EU also has the AI Act, a comprehensive legal framework around AI, the full extent of which takes effect in August of next year. AI companies here are worried about compliance overhead and about open-source models being held accountable even when they are public. It also forces AI companies to expose trade secrets about how models are trained and where the data comes from.
Whew, I know lawsuits are not the most exciting part of AI, but it’s probably the topic I have covered least. In summary, you can see each lawsuit progressing slowly, but nothing is shattering the industry yet. The biggest risk is what data can be trained on, but this is slowly becoming less of an issue with licensing and synthetic data. The other is copyright, but that keeps falling under fair use, and the one “out” is being able to block outputs.
Where I think things likely go is more licensing deals between companies, so that they can all grow as the industry keeps growing. It’s a stark contrast to Google and web search, where Google actually funneled traffic to other websites. Now with AI you can simply ask and receive the answer, and that isn’t great for business models that require people to browse through sites full of ads. If I were an AI company, I’d be figuring out how to reduce dependence on other works, and if I were a content creation business, I’d be trying to make deals with AI companies for my data. The best position right now is probably “recent” or “real time” data, since that won’t become obsolete.
Just want to note, I’m probably missing a bunch of cases here, and I am not a lawyer, so take my opinions with a grain of salt. However, many of these lawsuits are ongoing and will shape how the industry operates going forward. I know I have had to scramble behind the scenes to create contingency plans multiple times at Windsurf, and most likely, if you are thinking of joining or running an AI company, you’ll have to do the same. My hope is that these laws don’t slow things down for us (for example, China is probably moving fastest, with the least worry about litigation) or impose the unnecessary burden of extra work. Nevertheless, let’s see how things play out, and continue to think about incentives and alignment with partners.