The AI Daily Brief

Will This OpenAI Update Make AI Agents Work Better?

The AI Daily Brief • Nathaniel Whittemore

Monday, December 15, 2025 • 22m

What You'll Learn

  • OpenAI's GPT-5.2 model is now tied for the lead in the overall Artificial Analysis Intelligence Index, and also tied for first place in coding benchmarks with Gemini 3 Pro.
  • The White House executive order on AI regulation has sparked political tensions, with some Republicans concerned about the potential impact on jobs and the party's message to workers.
  • China is rejecting newly approved U.S. exports of NVIDIA's H200 chips as it pursues semiconductor independence and props up its domestic chip industry.
  • Beijing is preparing a $70 billion package to incentivize domestic chipmaking, which could be the largest state-backed investment in semiconductors ever.
  • The GDPVal benchmark, developed by OpenAI, measures the agentic capabilities of AI models by giving them real-world white-collar tasks with established economic value.
  • Competition among the premier models of the major foundation labs is very tight, and it remains to be seen whether OpenAI can pull ahead with its next release.

Episode Chapters

1

Introduction

The episode covers the latest news and discussions in the AI industry, including OpenAI's model updates, the White House executive order on AI regulations, and the U.S.-China chip competition.

2

White House Executive Order on AI Regulations

The episode discusses the controversial White House executive order that aims to block states from passing their own AI regulations, and the resulting political tensions within the Republican party.

3

OpenAI's GPT-5.2 Model

The episode analyzes the performance of OpenAI's latest GPT-5.2 model on various benchmarks, and how it compares to other leading AI models.

4

U.S.-China Chip Competition

The episode covers China's response to the U.S. allowing the export of NVIDIA's H200 chips, and Beijing's plans for a $70 billion package to incentivize domestic chip making.

5

GDPVal Benchmark

The episode discusses the GDPVal benchmark, developed by OpenAI, which measures the agentic capabilities of AI models by giving them real-world white-collar tasks with established economic value.

6

Conclusion

The episode concludes by highlighting the tight competition between the premier models of all the major foundation labs, and the anticipation for OpenAI's next release.

AI Summary

This episode discusses OpenAI's latest model update, GPT-5.2, and how it compares to other leading AI models in various benchmarks. It also covers the recent White House executive order on AI regulations, which aims to block states from passing their own AI laws, and the resulting political tensions within the Republican party. Additionally, the episode touches on China's response to the U.S. allowing the export of NVIDIA's previous generation H200 chips, and Beijing's plans for a $70 billion package to incentivize domestic chip making.


Topics Discussed

#GPT-5.2 #AI benchmarks #White House AI regulations #U.S.-China chip competition #GDPVal benchmark

Frequently Asked Questions

What is "Will This OpenAI Update Make AI Agents Work Better?" about?

This episode discusses OpenAI's latest model update, GPT-5.2, and how it compares to other leading AI models in various benchmarks. It also covers the recent White House executive order on AI regulations, which aims to block states from passing their own AI laws, and the resulting political tensions within the Republican party. Additionally, the episode touches on China's response to the U.S. allowing the export of NVIDIA's previous generation H200 chips, and Beijing's plans for a $70 billion package to incentivize domestic chip making.

What topics are discussed in this episode?

This episode covers the following topics: GPT-5.2, AI benchmarks, White House AI regulations, U.S.-China chip competition, GDPVal benchmark.

What is key insight #1 from this episode?

OpenAI's GPT-5.2 model is now tied for the lead in the overall Artificial Analysis Intelligence Index, and also tied for first place in coding benchmarks with Gemini 3 Pro.

What is key insight #2 from this episode?

The White House executive order on AI regulations has sparked political tensions, with some Republicans concerned about the potential impact on jobs and the party's message to workers.

What is key insight #3 from this episode?

China is rejecting the U.S. export of NVIDIA's H200 chips, as they aim to achieve semiconductor independence and prop up their domestic chip industry.

What is key insight #4 from this episode?

Beijing is preparing a $70 billion package to incentivize domestic chip making, which could be the largest state-backed investment in semiconductors.

Who should listen to this episode?

This episode is recommended for anyone interested in GPT-5.2, AI benchmarks, White House AI regulations, and those who want to stay updated on the latest developments in AI and technology.

Episode Description

<p>Today’s episode breaks down OpenAI’s quiet adoption of Anthropic’s “skills” mechanism and why it could meaningfully change how AI agents work in practice. The discussion explains what skills are, how progressive disclosure improves efficiency and reliability, and why modular, shareable instruction folders may matter more than building ever-more complex agents. In the headlines: fallout from the White House executive order blocking state AI regulation, GOP infighting over AI policy, Nvidia H200 export approval to China and Beijing’s response, and early benchmark results for GPT-5.2.</p><p><strong>Brought to you by:</strong></p><p>KPMG - Discover how AI is transforming possibility into reality. Tune into the new KPMG &#39;You Can with AI&#39; podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. <a href="https://www.kpmg.us/AIpodcasts">https://www.kpmg.us/AIpodcasts</a></p><p>Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - <a href="https://rovo.com/">https://rovo.com/</a></p><p>AssemblyAI - The best way to build Voice AI apps - <a href="https://www.assemblyai.com/brief">https://www.assemblyai.com/brief</a></p><p>LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/</p><p>Blitzy.com - Go to <a href="https://blitzy.com/">https://blitzy.com/</a> to build enterprise software in days, not months</p><p>Robots &amp; Pencils - Cloud-native AI solutions that power results - <a href="https://robotsandpencils.com/">https://robotsandpencils.com/</a></p><p>The Agent Readiness Audit from Superintelligent - Go to <a href="https://besuper.ai/">https://besuper.ai/</a> to request your company&#39;s agent readiness score.</p><p>The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614</p><p><strong>Interested in sponsoring the show? </strong>sponsors@aidailybrief.ai</p>

Full Transcript

Today on the AI Daily Brief, why OpenAI is adopting the skills mechanism and how it could improve agents. Before that in the headlines, the fallout from the latest White House executive order on AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Rovo, Robots and Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com/AIDailyBrief, or you can subscribe on Apple Podcasts. If you are interested in sponsoring the show, send us a note at sponsors@aidailybrief.ai and we can send you all the information you need. Also at AIDailyBrief.ai, you can find out anything else you might need to know about the podcast. We're going to be doing a few more days of this newsletter test this week before reviewing and seeing what the plan is for January. For now, like I said, you can find all that on aidailybrief.ai. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. Last week, after a lot of behind-the-scenes discourse, some of which spilled into very public acrimony, President Trump signed a highly contentious order attempting to block states from passing their own AI regulations. Now, this is one of those classic debates that's about 100 things at once. To take the administration at face value, this is about creating a single federal rulebook as a necessary step to ensuring the U.S. can win the AI race. But then, of course, underneath that, there are issues of the power relationship between the federal government and states. That's one that's been big here in the U.S. for the last 250 years or so. And there's also the substory of the GOP fracturing around Trump's alliance with AI technology companies. A draft of the order had circulated in late November, sparking outrage on both sides of the aisle.
The executive order that ended up passing on Thursday was substantively identical to the draft. That included the controversial measure of establishing a dedicated task force within the DOJ to start a campaign of litigation against states with their own AI laws. The order also instructed the Commerce Department to withhold federal broadband funding from states that had, in the words of the EO, onerous AI laws. There are three big issues that the EO brings up when it comes to state-level regulations. First, they say that by definition it creates a patchwork of 50 different regulatory regimes, which makes compliance, especially for startups, particularly challenging. Second, the White House claims, quote, state laws are increasingly responsible for requiring entities to embed ideological bias within models. Third, they say state laws sometimes impermissibly regulate beyond state borders, impinging on interstate commerce. Now, of course, the Democratic side of the aisle immediately had a lot to say about this. Scott Wiener, who has been extensively involved in state AI legislation in California, said it's absurd for Trump to think he can weaponize the DOJ and Commerce to undermine those state rights, and that if the Trump administration tries to enforce this ridiculous order, we will see them in court. Senator Brian Schatz has already sponsored a bill that would overturn the order. Schatz drew on the criticism that this order blocks state law and replaces it with nothing, commenting, Congress has a responsibility to get this technology right and quickly, but states must be allowed to act in the public interest in the meantime. Now, as I mentioned before, the order also triggered infighting for Republicans who are worried that AI will be a losing issue in the midterms.
Writes the Washington Post: populist forces within the Republican Party mounted an extensive campaign to derail the action after a draft of the order leaked last month, arguing that fears over AI's potential to automate jobs would undermine the party's message to workers. The Post said a handful of tech leaders neutralized those fears for now, convincing the president, a longtime real estate developer, that burdensome regulation could cripple the industry. White House AI czar David Sacks did take to Twitter slash X to offer some conciliatory words on at least a few of the concerns from the right. He called them the four C's: child safety, communities, creators, and censorship. On child safety, he said preemption would not apply to generally applicable state laws, so state laws requiring online platforms to protect children from online predators or sexually explicit material would remain in effect. On communities, he said AI preemption would not apply to local infrastructure; in short, preemption would not force communities to host data centers they don't want. On creators, he said copyright law is already federal, so there is no need for preemption here. Questions about how copyright law should be applied to AI are already playing out in the courts, and that's where this issue will be decided. And on censorship, he claimed that the biggest threat of censorship is coming from certain blue states; red states can't stop this, and only President Trump's leadership at the federal level can. Still, it does not seem all is resolved when it comes to AI politics on the right. The Post describes a, quote, simmering rift between the populist and tech factions of the Republican Party, with one source saying, it feels like millions of votes across the country just got traded for thousands of VC and tech votes in regions Republicans will never win. Now, moving over to another recent move.
Last week, the president announced that NVIDIA's previous-generation H200 chips would be approved for export, the first time that unmodified Western versions of the chips had been approved in over three years. That news was immediately followed by reports that Beijing was meeting with tech firms and considering how tightly to restrict access. Basically, the strategic consideration for China is how much to allow in these new chips, which could accelerate the output of their labs, versus continuing to focus on their domestic chip industry, which, while potentially slowing down those outputs in the short term, could create long-term resilience and independence. Speaking with Bloomberg on Friday, AI czar David Sacks said, China's rejecting our chips. Apparently they don't want them. And I think the reason for that is they want semiconductor independence. Now, he cited Financial Times reporting here rather than inside communications. Still, the comments highlight that the chip strategy may be too late. The logic of granting access to H200s was largely that the U.S. needs to get ahead of China developing their own advanced chips, and if NVIDIA can flood China with their chips, then that sort of puts the strategy in jeopardy. NVIDIA, meanwhile, said, while we do not yet have results to report, it's clear that three years of overbroad export controls fueled America's foreign competitors and cost U.S. taxpayers billions of dollars. Added Sacks, what you see is China's not taking them because they want to prop up and subsidize Huawei. Part of our calculation in selling not the best but lagging chips to China was that you can take market share away from Huawei. But I think the Chinese government has figured that out, and that's why they're not allowing them. To that point, Bloomberg is reporting that Beijing is preparing a $70 billion package to incentivize domestic chip making.
Final details, including target companies, are still to be determined, but this could be the largest ever state-backed investment in semiconductors. For comparison, $39 billion was allocated to the CHIPS Act subsidies in the U.S., and the EU is currently putting together a $46 billion package for its domestic industry. Moving over to models, GPT-5.2 has been out for a few days and the independent benchmarking results are in. The model is now tied for the lead in the overall Artificial Analysis Intelligence Index, nuzzling up together with Gemini 3 Pro. On their coding index, the model also tied for first place with Gemini 3 Pro, with Claude Opus 4.5 a couple of points behind. Now, for any of you who follow developers on X and see the difference of opinion on Opus 4.5 versus all these models, that is exactly the sort of reason why you need to be skeptical of the overall value of benchmarks. On their agentic index, GPT-5.2 is in second place to Opus 4.5, but slightly ahead of Gemini 3 Pro. Overall, what all these results really show is that with 5.2, OpenAI now has a credible competitor to the other big labs. It is not decidedly and clearly better than the other models, but it is a meaningful bump from GPT-5 and 5.1. Now, recent reporting suggested that Code Red would continue until next year, and these results, I think, help show why. Now, one particularly interesting result was on GDPVal. That benchmark, you might remember, was developed by OpenAI and seeks to measure agentic capabilities by giving models real-world white-collar tasks with established economic value. Unlike some other benchmarks, it measures end-to-end task completion. Artificial Analysis recently developed an independent AI evaluator for the tasks that allows them to include GDPVal in their assessment suite. When OpenAI announced it, they were using real-world experts in addition to an experimental AI assessor.
On that benchmark, 5.2 managed to top the leaderboard, pulling ahead of Opus 4.5 by a decent margin. I think people are still trying to wrap their heads around GDPVal and come to a common-sense understanding of just how valuable the benchmark is. But again, this just further solidifies to me that there is a very tight, clear competition among the premier models of all the major foundation labs. We will see if OpenAI can change that with their next release, which is anticipated in January. For now, however, that is going to do it for today's headlines. I appreciate you listening or watching as always, and until next time, peace.

focused on what it's like to actually drive AI change inside your enterprise, and has case studies, expert panels, and a lot more practical goodness that I hope will be extremely valuable for you as the listener. Search You Can With AI on Apple, Spotify, or YouTube and subscribe today. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS apps so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at rovo.com. AI isn't a one-off project. It's a partnership that has to evolve as the technology does. Robots & Pencils works side-by-side with clients to bring practical AI into every phase.
Automation, personalization, decision support, and optimization. They prove what works through applied experimentation and build systems that amplify human potential. As an AWS-certified partner with global delivery centers, Robots & Pencils combines reach with high-touch service. Where others hand off, they stay engaged, because partnership isn't a project plan, it's a commitment. As AI advances, so will their solutions. That's long-term value. Progress starts with the right partner. Start with Robots & Pencils at robotsandpencils.com/AIDailyBrief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and precompiles code for each task. Blitzy delivers 80-plus percent of the development work autonomously while providing a guide for the final 20 percent of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Welcome back to the AI Daily Brief. Today we're getting a little bit more technical than we normally do, but there's a reason for that. One of the big themes of 2025 was supposed to be AI agents. And while I would argue that that came true, it was a little bit more nuanced than I think people thought it would be going into it. I believe that the expectation was that we would see agents proliferate across the enterprise.
Instead, what we got was, one, coding agents becoming the most important breakout category in AI writ large, and two, a lot of infrastructure- and standards-type work around how we build agents that sets us up for that sort of maturity and proliferation in the years to come. Now, around that, one of the things that's been interesting is to see how companies, even very fiercely competitive companies in the space, have frequently decided over the course of the last year to adopt each other's standards rather than trying to compete around standards. We saw this, of course, with MCP, which, even though it originated with Anthropic, became a standard adopted by Google, OpenAI, and Microsoft to allow LLMs and AI applications to access outside information. And now it appears that something similar might be happening with skills. At the end of last week, a number of folks on Twitter slash X, including Simon Willison, noticed that Anthropic's skills mechanism was starting to show up in the OpenAI ecosystem. So let's talk about what skills are and why this could be a big deal. Back in October, Anthropic introduced agent skills, which they called a new way to build specialized agents using files and folders. And at core, files and folders are what skills are. Specifically, Anthropic writes that skills are organized folders of instructions, scripts, and resources that agents can discover and load dynamically to perform better at specific tasks. The goal is to allow general-purpose agents to become specialized agents in the context of the work that they're doing at the time. And in many ways, when Anthropic introduced this, that seemed to be the goal.
Instead of developers having to build a complicated, balkanized, and fragmented landscape of custom-designed agents for every single different use case, by making capabilities and knowledge composable and accessible on demand, a much less fragmented landscape of generalized agents could access those capabilities and knowledge when needed to become specialized agents. A skill is basically a folder or a directory that contains a file called skill.md. In other words, a markdown file. That file has a name, a description, and instructions. When an agent that has access to skills starts up, it loads the names and descriptions of all installed skills into its system prompt. And then, when a relevant task comes up, Claude can read the full instructions. This is what Anthropic calls progressive disclosure: Claude only loads context when it needs it. In other words, Claude doesn't have to waste a bunch of time loading up all the instructions in each skill. It can just sort through that name and description metadata to figure out which skills it should be accessing for a particular task. So layer one of progressive disclosure is that basic metadata of a name and a description. The second layer of detail is the actual body of the file, with instructions, procedural knowledge, context, whatever it may be. If there is additional content beyond that, it can also be bundled underneath, leading to a third level of progressive disclosure. In that announcement post, Anthropic wrote: as skills grow in complexity, they may contain too much context to fit into a single skill.md, or context that's relevant only in specific scenarios. In these cases, skills can bundle additional files within the skill directory and reference them by name from skill.md. These additional linked files are the third level and beyond of detail, which Claude can choose to navigate and discover only as needed.
In the example they give, which is a comprehensive PDF toolkit for extracting text and tables, the second-layer overview includes a line along these lines: for advanced features, JavaScript libraries, and detailed examples, see reference.md; and if you need to fill out a PDF form, read forms.md and follow its instructions. This is that bundling of additional content. So like I said, sometimes skills are going to include procedural knowledge. Sometimes they're going to include background and context. Sometimes they're going to include code. For example, instead of Claude generating code to extract PDF form fields, a skill might include a Python script that does it reliably. So there are a bunch of theoretical benefits of this system. First, skill files are markdown files, meaning that anyone can write them. This allows for customization without engineering: if you can write instructions for a human, you can write instructions that become part of a skill. The second benefit is efficiency. Progressive disclosure means that context is only loaded when it's needed, so the user isn't burning tokens on irrelevant instructions. There's the composability benefit in the fact that skills stack: you can have multiple skills working together instead of building single-purpose agents. There's reliability. We just mentioned that coding example, and skills can include code that runs deterministically instead of being regenerated every single time. And finally, there's portability. Institutional knowledge gets captured in a format that persists and can be transferred, meaning that new users or agents can access it immediately.
So basically, if the Model Context Protocol is an open standard for allowing LLMs to connect to external tools and data sources in a uniform way, skills are a standard for specialized instructions and context that allow LLMs or agents to perform specialized tasks without the user having to re-explain the process every time. Now, when skills came out, there was a lot of excitement about them. AI engineering thought leader Simon Willison, for example, wrote a post called Claude Skills Are Awesome, Maybe a Bigger Deal Than MCP. Now, Simon's core argument comes down to efficiency and simplicity. Back in October, he wrote: Model Context Protocol has attracted an enormous amount of buzz since its initial release back in November last year. Over time, the limitations of MCP have started to emerge. The most significant is in terms of token usage. GitHub's official MCP on its own famously consumes tens of thousands of tokens of context. And once you've added a few more to that, there's precious little space left for the LLM to actually do useful work. Simon continued: my own interest in MCPs has waned ever since I started taking coding agents seriously. Almost everything I might achieve with an MCP can be handled by a CLI, or command-line interface, instead. LLMs know how to call CLI tool help, which means you don't have to spend many tokens describing how to use them. The model can figure it out later when it needs to. Skills have the exact same advantage, only now I don't even need to implement a new CLI tool. I can drop in a markdown file describing how to do a task efficiently, adding extra scripts only if they'll make things more reliable or efficient. Now, trying to simplify this as much as possible, basically what Simon is saying is that with MCP, you have to build something for Claude to use a tool. With a CLI, Claude can just use tools that already exist. But with skills, Claude can just read instructions you wrote and figure it out.
And indeed, to Simon, as he puts it, the simplicity is the point. He writes: one of the most exciting things about skills is how easy they are to share. I expect many skills will be implemented as a single file. More sophisticated ones will be a folder with a few more. Something I love about the design of skills is that there is nothing at all preventing them from being used with other models. You can grab a skills folder right now, point Codex CLI or Gemini CLI at it, say read pdf/skill.md and then create me a PDF describing this project, and it will work, despite those tools and models having no baked-in knowledge of the skills system. I expect we'll see a Cambrian explosion of skills, which will make this year's MCP rush look pedestrian by comparison. The core simplicity of the skills design is why I'm so excited about it. Now, in retrospect, that looks a little prophetic. Sean Wang, a.k.a. Swyx, wrote: I was skeptical when Simon Willison said that Claude's skills are awesome, maybe a bigger deal than MCP, but early indications are this is correct. He then shared a talk from the recent AI Engineer Code Summit, which he said is the fastest talk to ever pass 100,000 views on the AI Engineer channel. The talk, by the way, was about why we should stop building agents and start building skills. The problem they identified was that intelligent agents lack expertise: genius without experience, as they put it. The solution is a new architecture with skills. A skill, they say, is an expert in a folder, and the new app store for AI are the skills that they can access. The old way, then, is monolithic agents: a separate agent for each domain, hard-coded or prompted in context, which doesn't improve over time. The new way, agents plus skills, is a general agent with many skills packaged in simple, reusable folders that enable continuous and tangible learning. Then at the end of last week, people started to notice skills showing up in the OpenAI ecosystem.
AI techie Arun writes: OpenAI just quietly stole Anthropic's homework, and it's brilliant. OpenAI integrated Anthropic's skills mechanism into ChatGPT and Codex, allowing the models to dynamically manage files like spreadsheets and PDFs. This modular approach to agent capabilities is proving to be a foundational piece of next-gen LLMs. Simon Willison also picked up on this. On Friday, he wrote: OpenAI aren't talking about it yet, but it turns out they've adopted Anthropic's brilliant skills mechanism in a big way. Skills are now live in both ChatGPT and their Codex CLI tool. This was confirmed a couple of days later by Tebow at OpenAI, who wrote: we've added experimental support for skills, and it combines well with GPT-5 too. Already seeing some cool things in the wild that leverage skills in Codex. I think about skills as an extension of agents.md with progressive disclosure. By the way, agents.md was OpenAI's lightweight markdown standard for providing AI coding agents specifically with project-specific instructions, so thinking in a similar domain. Now, in Simon's new post, he wrote: one of the things that most excited me about Anthropic's new skills mechanism back in October is how easy it looked for other platforms to implement. A skill is just a folder with a markdown file and some optional extra resources and scripts, so any LLM with the ability to navigate and read from a file system should be capable of using them. It turns out OpenAI are doing exactly that, with skills support quietly showing up in both their Codex CLI tool and now also in ChatGPT itself. Now, so far, people are just starting to experiment and figure out how they work in OpenAI. But as Simon summed up: when I first wrote about skills in October, I said they're awesome, maybe a bigger deal than MCP. The fact that it's just turned December and OpenAI have already leaned into them in a big way reinforces to me that I called this one correctly. Hold aside Simon's good call.
This, to me, is continued evidence that it matters way more to these foundation lab companies to move at the speed of development than to own the standard. Keyshawn wrote: OpenAI seems comfortable to let Anthropic create standards like MCP and skills, then adopt them later. Skills are wonderfully simple, and I wish all the CLI agents would adopt the pattern. Look, even though 2025 was a big year for agents in a lot of ways, it's still very clear that we are barely scratching the surface of what's possible, and one of the things that will accelerate us heading into 2026 is the common adoption of these mutual standards. So super interesting stuff. Excited to see what people go build with this. For now, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always. And until next time, peace.
