
Navigating the AI Legal Maze: Perplexity's Predicament
AI Applied
What You'll Learn
- ✓ Perplexity is being sued by major news publishers for using 'retrieval augmented generation' (RAG) to generate summaries of paywalled articles, which the publishers claim is copyright infringement
- ✓ This lawsuit differs from previous AI-related lawsuits, which were more focused on verbatim copying of content, whereas RAG involves retrieving and summarizing content
- ✓ There are challenges in determining what constitutes fair use vs. copyright infringement, especially when other websites may be scraping and republishing paywalled content
- ✓ The episode explores the broader implications for the AI industry's relationship with the media, including efforts by companies like Meta to license content directly from publishers
- ✓ Perplexity has defended itself by claiming it does not train its models on publishers' content, but the issue of how to handle third-party scraping and republishing remains complex
- ✓ The legal battle highlights the tensions between AI innovation and protecting intellectual property rights in the media industry
Episode Chapters
Introduction
The hosts discuss the legal troubles facing Perplexity, an AI company, as it is sued by major news publishers for its use of AI technology.
Perplexity's Use of RAG
The episode explores how Perplexity's use of 'retrieval augmented generation' (RAG) technology differs from previous AI-related lawsuits and the challenges in determining fair use vs. copyright infringement.
Broader Implications
The discussion covers the broader implications of this legal battle for the AI industry's relationship with the media, including efforts by companies like Meta to license content directly from publishers.
Perplexity's Defense
The episode examines Perplexity's defense against the lawsuits and the complexities around third-party scraping and republishing of paywalled content.
Conclusion
The hosts conclude by highlighting how this legal battle underscores the tensions between AI innovation and protecting intellectual property rights in the media industry.
AI Summary
This episode discusses the legal challenges facing Perplexity, an AI company, as it is being sued by several major news publishers like the New York Times, Wall Street Journal, and Chicago Tribune. The key issue is Perplexity's use of 'retrieval augmented generation' (RAG) to generate summaries of paywalled news articles, which the publishers claim is a violation of copyright. The episode explores the nuances of this legal battle, including the differences from previous AI-related lawsuits, the challenges of distinguishing legitimate fair use from copyright infringement, and the broader implications for the AI industry's relationship with the media.
Frequently Asked Questions
What topics are discussed in this episode?
This episode covers the following topics: AI legal challenges, Copyright and fair use, Retrieval augmented generation (RAG), AI-media industry relationships, Perplexity legal issues.
Who should listen to this episode?
This episode is recommended for anyone interested in AI legal challenges, Copyright and fair use, Retrieval augmented generation (RAG), and those who want to stay updated on the latest developments in AI and technology.
Episode Description
In this episode of AI Applied, Conor Grennan and Jaeden dive into the complex legal challenges facing AI companies, focusing on Perplexity's recent lawsuits. They explore the implications of retrieval augmented generation (RAG) and how it compares to Meta's strategic partnerships with news publishers. Tune in to understand the evolving landscape of AI, legal battles, and the ethical considerations shaping the future of technology.
Get the top 40+ AI Models for $20 at AI Box: https://aibox.ai
Conor’s AI Course: https://www.ai-mindset.ai/courses
Conor’s AI Newsletter: https://www.ai-mindset.ai/
Jaeden’s AI Hustle Community: https://www.skool.com/aihustle
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Full Transcript
So Perplexity's in the news, Jaeden, today for all the wrong reasons. You know, Jaeden, a lot of times when we're talking about Perplexity, we're like, oh, look at this insane new stuff they have. And look at this. But the chickens were always coming home to roost, Jaeden. And as you and I were just talking about offline a second ago, everybody's suing them now. And look, no surprise. Right. I mean, I was just writing today because I just saw The New York Times is blaring a headline about how they're suing them. But, you know, I think, is it News Corp also? Is it Wall Street Journal, New York Post, maybe others? It's a ton of, I guess, New York Post and Wall Street Journal sued them last August. New York Times is also suing OpenAI. But, Jaeden, the thing I want to kind of get your take on, because you've been doing a lot of research on this and on your other podcasts as well, you know, how does this one differ also? Because I feel like Aravind Srinivas, who's the CEO, has been pretty explicit early on about, oh, you know what? We bend the rules for growth, essentially. I mean, he was just sort of like flaunting it. You and I had a lot of conversations about this last year. I will say I'm a big paid Perplexity user. I have been since the beginning. Big paid New York Times fan. But Jaeden, what's sort of capturing your attention on this? Because you were sort of like doing a ton on this.
Oh, yeah. I think there's a couple different elements to it. And I think this lawsuit differs slightly from all the other lawsuits, all the other AI lawsuits we've seen over the last number of years. And so I thought this one was kind of interesting. And I will also say we'll cover this in a little bit. But this comes at the same time as Meta is signing commercial AI licensing agreements with a bunch of different publishers to get news essentially built into Meta AI.
So they're going like a very direct publisher route and paying for it because obviously they want to just avoid everything that Perplexity is going through right now. So I think that's smart. We'll break down that deal and what that might look like if Perplexity or others want to play that game, which I think inevitably everyone will be forced to. Spoiler alert. The reason that I think this particular lawsuit is different is when ChatGPT first came out, we saw a lot of publishers suing ChatGPT. And the lawsuits were around the fact that, look, the New York Times' original lawsuit back then was like, look, you're looking at our articles, you're copying and pasting them verbatim, and you're giving them to users. And, you know, that's not allowed, basically. And that's totally fair. That would be a fair complaint. I think when people looked into that lawsuit, they found that, like, that's not actually how ChatGPT behaves by default. And maybe someone was jailbreaking it or hacking it, or they're figuring out a way to say, give me this verbatim, or maybe it was on another website. Anyways, there's like some other.
That was the whole Wirecutter. Yeah.
Yeah. And so this is different. This isn't just the AI model going to the website and getting information per se and summarizing it, because that is technically, I mean, I'm sure there's still some sort of lawsuits and gray area on that, but I think it's going to come out that that is allowed. That is fair use because that's what humans do. Right. I go read the Wall Street Journal and then I go write a LinkedIn post about some interesting takeaways from it. And the Wall Street Journal isn't going to come after me because, like, the news, they're just reporting on a story. They're just reporting on what's happening. I'm reporting on what I learned. So that's okay for fair use, generally speaking. What's not okay is verbatim copying and pasting. And this is what Perplexity is being sued for.
But the reason that they're being sued in this particular case is because of RAG. So essentially it's called retrieval augmented generation. And that is when data is stored, typically in a database, and it is retrieved exactly. Now the culprit here would be if Perplexity really is going to all of these news sites, grabbing their news and storing the articles verbatim in their database, and then giving that to users. If that's truly the case, Perplexity will be in trouble for that because, you know, they shouldn't be taking it. Now, Perplexity has responded to some of the accusations and lawsuits saying that they do not train their models based off of New York Times or, there's another lawsuit coming from the Chicago Tribune. They're like, look, we don't take your guys' stuff per se and stick it into our training data set. I think it's kind of a big no-no topic at this point. But what's interesting is there's a ton of other websites that do a summary of it, which is one thing. Sorry, they do a summary of it, which is one thing. But another thing that they do would be to go and, like, I'm sure there's websites that just go copy and paste the Chicago Tribune and post it. And those might be getting included. So it's kind of tricky if you're going to say, you know what I mean? Like if you're scraping the whole thing and pulling everything in, how do you know some other random quote unquote pirate isn't just going to paywalled, you know, Chicago Tribune stuff and posting it on the internet or posting paragraphs of it on the internet. And it all gets, so like really everything is getting in there, but also how would you know to not allow that? You'd almost have to create a data set of everything the Chicago Tribune does publish and say, like, blacklist this word-for-word text from everything in here. It gets really tricky with the way that these AI models work. So, anyways, the reason this one's different is because it's being sued for using RAG.
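[Editor's sketch] The two ideas described above, retrieving stored passages for a question and blocking passages that match a publisher's text word for word, can be illustrated with a small Python example. This is a toy sketch under stated assumptions, not how Perplexity or any real product works: the function names (`retrieve`, `is_verbatim`), the sample passages, the five-word window, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re

def words(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def shingles(text, n=5):
    """Return the set of all n-word sequences found in the text."""
    w = words(text)
    return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}

def retrieve(query, store, k=2):
    """Retrieval step (toy version): rank stored passages by how many
    distinct words they share with the query and keep the top k."""
    q = set(words(query))
    ranked = sorted(store, key=lambda p: len(q & set(words(p))), reverse=True)
    return ranked[:k]

def is_verbatim(passage, protected, n=5, threshold=0.5):
    """Flag a passage when at least `threshold` of its n-word sequences
    also appear in the protected publisher text (hypothetical cutoff)."""
    s = shingles(passage, n)
    if not s:
        return False
    return len(s & protected) / len(s) >= threshold

# Hypothetical data: one paywalled article and a small scraped store.
protected_article = (
    "The city council voted on Tuesday to approve a new transit budget "
    "after months of heated debate."
)
store = [
    # A pirate site's word-for-word copy of the paywalled article.
    "The city council voted on Tuesday to approve a new transit budget "
    "after months of heated debate.",
    # An independent summary written in different words.
    "Local officials signed off on transit funding this week, ending a "
    "long dispute over the budget.",
    # Unrelated content.
    "A recipe for sourdough bread needs flour, water, salt and time.",
]

protected = shingles(protected_article)
hits = retrieve("What happened with the city council transit budget?", store)
# Only passages that are not near-verbatim copies would be handed to the
# language model for the generation step (not shown here).
allowed = [p for p in hits if not is_verbatim(p, protected)]
```

In this sketch the pirated copy is retrieved first but then filtered out, while the independent summary passes through. A check like this only catches word-for-word reuse; it does nothing about close paraphrase, and it requires having the publisher's full text on hand to compare against, which is the practical difficulty raised in the discussion.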
And you can see there's a ton of other ones, too. We have Reddit who filed a lawsuit in October. The Dow Jones is also suing them. Amazon is threatening them over Comet. So, there's a whole bunch of stuff going on, and there's an interesting one with Comet, but I'll get your take on that.
No, no, this is, okay, so I had missed that part about the RAG. So I think a lot of our audience will understand RAG, which is retrieval augmented generation, but just in case for those of you who don't, because I always hate when people are just like throwing out terms and we all get it. But just if you ever create like a GPT or something like that and you like put your human resources benefits policy in there and then it can just kind of talk to that document and turn it into a talking document, like a Beauty and the Beast style talking book, that's essentially RAG. It's pulling from a specific thing. And, Jaeden, the point that you're bringing up, I hadn't known before or I hadn't seen that. I hadn't read it anyway, which was that Perplexity is doing this in a different way, which is they're using RAG. And that's, you know what it reminds me of, Jaeden, it reminds me of, and I'm a little older than you, so I was probably more in this. I don't know, but like when Napster was out, right? And like what the problem with Napster was that it was, and this is sort of like what changed, what essentially created music streaming. But what Napster did wrong, which allowed it to be sued by everybody, was that it was holding the songs on a massive database. Like it was storing them and allowing people to sort of exchange files. And I may have said that a little wrong and people can correct me. But as I recall, that's where like BitTorrent and things like that started to change the game because it wasn't being stored anywhere. It was just sort of being transferred from one to the other. So that's interesting. And also, Perplexity, what are you doing?
Like you must know that, this is the thing that drives me a little nuts about Perplexity, because people are like, oh, Perplexity is going to get their lunch eaten by Google, because I'm telling you, Perplexity is fantastic. I don't know what model there is. I mean, I use Claude under the hood, but they fine-tune it in such a way that it's fantastic. The user interface is great. They are constantly innovating. They're amazing. You know, Srinivas, the CEO, was saying just early in the beginning, he's like, look, you know, growth at all costs, right? And also, we are still, Jaeden, I don't know if you'd still qualify this as the Wild West. Certainly the last two years were. So I don't know on that thing. But the other point about, your other point was super interesting about, well, what about sort of like some other pirate site just grabbing Washington, you know, Tribune or whatever, I'm sorry, the Chicago Tribune or whatever, and just putting it in, Perplexity sucking that up. Then who's at fault there? Right. Which is really interesting because you're right with New York Times and ChatGPT. I don't know. The other thing, the other complaint, by the way, New York Times is making is this. They also claim that Perplexity made up information falsely attributed to the Times. Come on. I mean, that's so rare. It's just like they're putting it out there and I get it. This is a lawsuit. But it's just so rare that that happens. But I do think that it's interesting about who's sort of like to blame when another site is grabbing and then Perplexity just grabs it. And then it reminds me that I tried this the other day, Jaeden, and this is in the interest of our podcast. Jaeden, I was going to put this on LinkedIn. I'm like, you know what? On LinkedIn, it kind of lives forever. And our podcast, AI Applied, is where Jaeden and I sometimes share like our deepest stuff that we don't tell anybody else. I'm not even just trying to sort of like sell this here.
So, Jaeden, if you find something that's paywalled, right, and like, you know, behind whatever, the FT or something like that, right? And you just grab the headline and anything else you can, the date, anything that you can see, and put that into Perplexity. Perplexity will basically spit out the article. Like I have, I mean, look, I mean, have I ever done that? You know, not for the record, but I'm just saying like, that is a possibility.
So New York Times is coming for you, Conor.
But I'm a paid subscriber to the Times. But like, but what, you know, that's the problem that I think we're facing here is the paywall problem. You can subscribe to both, right? Like I want the New York Times, but am I going to subscribe to everything? No. So what is Perplexity doing? It's allowing you to jump over paywalls. And that's where OpenAI pulled the, you know, the sort of the ripcord very early on saying, okay, whoa, remember those days when you could just go over paywalls? Not doing that anymore. But Perplexity definitely is still allowing you to do that, right?
Yeah, so I mean, I haven't had a ton of experience with that particular scenario. What I will say that I do, if I see like a paywalled article, I just copy and paste the title, post it into Google, and someone else has already read the article and CNBC has written a free one. I'll usually read that. So like I don't know how much different that is if it's not verbatim copy and paste. What I will say is the allegation, so, you know, just allegedly at this point, is that Perplexity's Comet browser is bypassing paywalls and is giving you summaries of the articles. Same thing that you kind of mentioned, but also, in my opinion, same thing sort of that CNBC would do. If the New York Times has an exclusive, everyone else is still going to copy and write a summary of the exclusive based off of exclusive data only available to the New York Times. I mean, they're attributing it, which is great, but I don't know.
I fail to see how it's that much different if it's not copy and pasted, but also AI models have no need to ever copy and paste because they can summarize everything in two seconds. So I don't know. It is a really interesting area. But I guess maybe they can't pull the quotes. Well, I guess you can. You could still share a quote that someone says. I don't know. It is a very interesting gray area. And who is getting away from all of this gray area right now is Meta. You know, this is a company we love to hate sometimes. But to be fair, I think they are doing a lot of things in a pretty reasonable way. And so Meta, right now, they just signed a commercial AI data agreement with publishers to offer real-time news on Meta AI. A couple of things that I think are interesting here with that, one of them being that Meta seems to be the company that a lot of people are accusing of kind of falling behind. And it feels like to me, it doesn't feel so much like Meta is too worried about the lawsuits. It doesn't feel too much to me like Meta has a philanthropic heart that wants to donate to the news organizations. I think Meta genuinely is doing this for user experience. They feel like they're getting left behind and they want cutting-edge news. And the reason for this is because in 2024, Meta killed its news tab. And they also stopped compensating news publishers back in 2022, a program that they had before. So if they really wanted to pay news publishers, they would probably have kept doing it. They obviously don't want to spend the money. But I think they know that LLMs really struggle with this. Here's a quote that I'll get your take on, Conor. But they said, we're committed to making Meta AI more responsive, accurate and balanced. Real-time events can be challenging for current AI systems to keep up with.
But by integrating more and more different types of news sources, our aim is to improve Meta AI's ability to deliver timely and relevant content and information with a wide variety of standpoints and content types. Included in this whole deal is CNN, Fox News, Fox Sports, Le Monde Group, People Inc. and all of their portfolio of media brands, Daily Caller, The Washington Examiner, and USA Today. That is a lot of news. I think you're getting right and left. So I think politically, people aren't going to complain. You get Fox and CNN on there. You get a bunch of other ones. And I think, I don't know, I just think this is going to be a great move for Meta AI to have timely, accurate news. There's going to be no lawsuits. They're allowed to RAG model all of this news so they get to pull it directly and no one's complaining. They can copy and paste it. Like, this seems like a great setup to me for Meta to come back with a really timely AI model.
I remember working with, I did a thing with Adobe back in 2023, and this is when Midjourney was really taking off, and Adobe's kind of sell was essentially, look, we're doing this the right way. We have a, you know, we have a relationship with Getty Images. Like, we're not going to be Midjourney. We're not pirating. And I think the thought, remember this back in June in 2023, Jaeden, whenever it was, like, okay, this is an existential thing. Either Midjourney is going to get taken out completely because it's been doing everything illegal or it's going to grow to be too big to fail. And I sort of think we're seeing the latter. So I am just wondering, the ethical approach, like, great. Is it worth it? It's, I mean, and I know we're sort of like running up against time a little bit here, but like, you know, Anthropic, if Anthropic was just an ethical company, you know, like ethical AI, I don't think they would be doing what they, Anthropic is phenomenal just as a performer.
So I'm just wondering, I think you're absolutely right about Meta probably trying to do the right thing. I think it makes sense. I do wonder where they're going to go with this and sort of like how they're going to, you know, whether this approach is actually going to matter to people or whether it just matters to lawsuits or not. But guys, listen, as we're talking about all these different models, the OpenAIs, the ChatGPT competitors like Meta and everybody else, remember that AI Box, aibox.ai, can give you all of these things for $19 a month. It's one of my favorite things. You just compare model next to model next to model. You can build out your own workflows on it, guys. It is the absolute sleeper hit of this past year. Super proud of Jaeden for building this absolutely unbelievable product. He's gotten just tremendous feedback on it. Go check it out, aibox.ai, and we will see you on the next AI Applied.
Related Episodes

Disney's Billion-Dollar Bet on OpenAI
AI Applied
12m

Exploring GPT 5.2: The Future of AI and Knowledge Work
AI Applied
12m

AI Showdown: OpenAI vs. Google Gemini
AI Applied
14m

Unlocking the Power of Google AI: Gemini & Workspace Studio
AI Applied
12m

Spotting AI Writing: Insights from Wikipedia's Guide
AI Applied
12m

Exploring OpenAI's Latest: ChatGPT Pulse & Group Chats
AI Applied
13m