
Why Opus 4.5 Changes Vibe Coding
The AI Daily Brief • Nathaniel Whittemore

Episode Description
Today's episode digs into why Anthropic's surprise launch of Claude Opus 4.5 is landing like a true step-function moment for coding, agentic workflows, and the emerging paradigm of vibe-based software creation, with new benchmarks, early user tests, and developer reactions all pointing to a shift in how real work gets done; plus a quick look at the latest headlines, including the White House's Genesis Mission and Amazon's massive new government-focused AI expansion.

Brought to you by:

KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG "You Can with AI" podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts

Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/

AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief

LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/

Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months.

Robots & Pencils - Cloud-native AI solutions that power results - https://robotsandpencils.com/

The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614

Interested in sponsoring the show? sponsors@aidailybrief.ai
Full Transcript
Today on the AI Daily Brief, the incredible string of model releases continues with Anthropic dropping Claude Opus 4.5. Before that in the headlines, the White House launches the AI Genesis Mission. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Superintelligent, Robots and Pencils, Blitzy, and Rovo. To get an ad-free version of the show, go to patreon.com slash AIDailyBrief, or you can subscribe on Apple Podcasts. And if you are interested in sponsoring the show, we're doing a bunch of wrapping up Q1 right now. Send us a note at sponsors at AIDailyBrief.ai, and I can give you all of the info. And with that, let's dive in. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. Yesterday, you heard about how one AI executive order from the White House had been squashed. Basically, there was a big dust-up with congressional Republicans around the White House's plan to create a task force to go after states that put AI regulations on the books. But as it turns out, that was not the only executive order they had planned. President Trump has now officially signed an executive order to launch a national AI science program known as the Genesis Mission. The text of the order argues that the race for global technology dominance in the development of AI requires a historic national effort comparable in urgency and ambition to the Manhattan Project. This order launches the Genesis Mission as a dedicated, coordinated national effort to unleash a new age of AI-accelerated innovation and discovery that can solve the most challenging problems of the century. Michael Kratsios, the director of the White House Office of Science and Technology Policy, continued that tone during the Monday announcement. He described the Genesis Mission as the largest marshalling of federal scientific resources since the Apollo program. Now, stripping away the superlatives, the Genesis Mission is at its core an initiative to collate scientific knowledge from across the government to enable new AI-driven discoveries. Datasets will be gathered from the National Science Foundation, the National Institute of Standards and Technology, and the National Institutes of Health. The datasets, some of which stretch all the way back to the 1940s, will be cleaned and transformed into machine-readable formats to make them accessible to AI models. The order lays out a two-fold goal: to train scientific foundation models, and to create AI agents to test new hypotheses, automate research workflows, and accelerate scientific breakthroughs. To that end, the Department of Energy and their network of 17 national labs will make their data and compute resources available to research institutions and private sector companies. The order instructs the DOE to, quote, create a closed-loop AI experimentation platform that integrates our nation's world-class supercomputers and unique data assets to generate scientific foundation models and power robotic laboratories. Essentially, this is a major effort to organize the scientific data that's scattered across government agencies and marshal resources in order to drive AI-accelerated scientific discovery. Kratsios again said, Since the 1990s, America's scientific edge has faced growing challenges. He cited declining numbers of drug approvals and research outputs despite soaring scientific budgets.
The Genesis Mission seeks to reverse that trend by, in his words, unifying agencies' scientific efforts and integrating AI as a scientific tool to revolutionize the way science and research are conducted. Datasets and compute infrastructure will be centralized into the American Science and Security Platform to be established by the DOE, who said that once complete, the platform will be, quote, the world's most complex and powerful scientific instrument ever built. It will draw upon the expertise of roughly 40,000 DOE scientists, engineers, and technical staff, alongside private sector innovators, to ensure that the United States leads and builds the technologies that will define the future. The DOE is also tasked with formulating a list of 20 science and technology challenges of national importance to form the initial focus of the Genesis Mission. This potentially includes domains like advanced manufacturing, biotechnology, critical materials, nuclear fission and fusion energy, quantum information science, and semiconductors. The initiative builds on the existing National Artificial Intelligence Research Resource, or NAIRR, which was established in 2020 and brought together federal agencies, including the Department of Defense, NASA, and the National Institutes of Health, with private companies like OpenAI, Google, and Palantir, to form a nationwide research community. Lynn Parker, who co-chaired NAIRR during the Biden admin, said, Government support for AI research builds the foundations for new breakthroughs and helps keep innovation aligned with the public interest. We take for granted that new products appear regularly, but seldom consider the decades of research that made them possible. Without long-term investment, we risk ceding leadership in the technologies that will define our economy, our security, and our daily lives. Now, speaking of the connection between public and private, Amazon announced on Monday that they will spend up to $50 billion to expand their AI and supercomputing facilities for U.S. government customers. The expansion will begin next year and is expected to add a total of 1.3 gigawatts of AI capacity to the AWS regions that service government demand. The expansion will increase capacity for both unclassified and top-secret AWS servers. Said AWS CEO Matt Garman in a press release, our investment in purpose-built government AI and cloud infrastructure will fundamentally transform how federal agencies leverage supercomputing. We're giving agencies expanded access to advanced AI capabilities that will enable them to accelerate critical missions, from cybersecurity to drug discovery. This investment removes the technology barriers that have held government back and further positions America to lead in the AI era. Staying on the chip theme, Meta appears to be preparing to use Google's TPUs in their own data centers. The Information reports that Google has begun pitching large cloud customers, including Meta and large financial institutions, on installing TPUs at their own facilities. Google has made their custom AI chips available through Google Cloud for years, but they've yet to sell TPUs directly to outside customers. Part of the pitch is that they're able to operate the chips with higher security and compliance standards that aren't possible with cloud use. According to sources speaking with The Information, Meta is in talks to order billions of dollars worth of TPUs to install in their data centers in 2027.
If you've been listening over the last week, what's clear is that while Google has been making TPUs for over a decade, the release of Gemini 3 put the chips firmly on people's radar. The new model was trained exclusively on TPUs, leading many to question whether Google's chips could be a viable alternative to NVIDIA's GPUs. The news seems to have moved the stock market, with Bloomberg reporting a 2.7% bump for Google and a 2.7% drop for NVIDIA in overnight markets. Bloomberg analysts wrote, Meta's likely use of Google's TPUs, which are already used by Anthropic, shows third-party providers of large language models are likely to leverage Google as a secondary supplier of accelerator chips for inferencing in the near term. Now, while Google is clearly ramping up to compete, the analysis is still probably getting a little bit ahead of itself. That said, the new report contained a few more crumbs of information on how Google is looking to address the market for AI chips. One of NVIDIA's biggest moats is the CUDA developer ecosystem. As part of The Information's report, they write that Google has developed a new software suite called TPU Command Center that's designed to make TPU compatibility easier to navigate. Ultimately, while it could take Google a number of years to carve out a meaningful share of the AI chip market, NVIDIA is already taking the threat seriously. According to The Information, NVIDIA is following the dealmaking closely and has enticed Anthropic and OpenAI to make large commitments to NVIDIA GPUs. They also wrote that it's possible that NVIDIA will seek to preempt a deal between Google and Meta. Futurum Equities chief market strategist Shay Boloor writes, I know the first instinct is to frame Meta exploring Google TPUs as the start of NVIDIA's pricing power erosion, but that's not what it is. The real story is the velocity of Meta's AI workload curve, as Llama training cycles, video understanding systems, and tens of billions of daily inference calls all smash into the same compute ceiling. Meta is already on pace to spend $100 billion on NVIDIA hardware, and they're still capacity constrained. Adding TPUs doesn't replace the spend, it just sits on top of it. Even if NVIDIA doubled output, Meta would still be short on compute. That's how steep the structural AI capacity shortage actually is. Lastly today, in an interview at the Demo Day of Emerson Collective, the venture and philanthropy fund of Steve Jobs' widow Laurene Powell Jobs, Sam Altman and Jony Ive said that they've nailed the design of their AI device. In possibly the strangest ever description of a consumer device, Altman said, There was an earlier prototype that we were quite excited about, but I did not have any feeling of, I want to pick up that thing and take a bite out of it. And then finally, we got there all of a sudden. Altman said this was Ive's test for knowing when a design is dialed in, when you want to lick it or take a bite out of it or something like that. The pair stayed silent on features, but Altman was excited to describe the vibes of the product. He compared the experience of modern devices to being like walking through Times Square, flashing lights, noises, and the dopamine drip, constantly just dealing with all the little indignities. By comparison, he wants using the OpenAI device to feel more like sitting in the most beautiful cabin by a lake and in the mountains and just sort of enjoying the peace and calm.
Ive added his vibe, commenting, I love solutions that teeter on appearing almost naive in their simplicity, and I also love incredibly intelligent, sophisticated products that you want to touch, that you feel no intimidation with, that you want to use almost carelessly. Altman commented, I hope that when people see it, they say, that's it. The interview added no information on what the device will actually do, but for Altman, the key feature continues to be total contextual awareness. He said, It is so simple, but then AI can just do so much for you that so much can fall away. And the degree to which Jony has chipped away at every little thing that this doesn't need to do or doesn't need to be in there is remarkable. If you feel more rather than less confused, don't worry about it. Substantively, the biggest news was a timeline, with Ive stating the device could be available within two years. But with that, we close today's headlines. Next up, the main episode. Today's episode is brought to you by Superintelligent. Now, for those of you who don't know, who are new here maybe, Superintelligent is actually my company. We started it because every single company we talked to, all the enterprises out there, are trying to figure out what AI can do for them. But most of the advice is super generic, not specific to your company. So what we do is we map your AI and agent opportunities by deploying voice agents to interview your teams about how work works now and how your people would like it to work in the future. The result is an AI action map with high-potential ROI use cases and specific change management needs, basically everything you need to go actually deliver AI value. Go to besuper.ai to learn more. AI isn't a one-off project. It's a partnership that has to evolve as the technology does. Robots and Pencils works side-by-side with clients to bring practical AI into every phase: automation, personalization, decision support, and optimization. They prove what works through applied experimentation and build systems that amplify human potential. As an AWS-certified partner with global delivery centers, Robots and Pencils combines reach with high-touch service. Where others hand off, they stay engaged. Because partnership isn't a project plan, it's a commitment. As AI advances, so will their solutions. That's long-term value. Progress starts with the right partner. Start with Robots and Pencils at robotsandpencils.com slash AI Daily Brief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and precompiles code for each task. Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Meet Rovo, your AI-powered teammate.
Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS apps so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at rovo.com; that's R, O, V as in victory, O, dot com. Welcome back to the AI Daily Brief. The Thanksgiving 2025 parade of models has continued into a new week, this time with the launch of Claude Opus 4.5 from Anthropic. Now, people have been assuming for some time that we were going to get an Opus 4.5. We've obviously had Sonnet 4.5 for a while now, and so people figured that this was in the offing, but there had been a lot less conversation leading up to this around when it was going to come. The big model, of course, that people had been anticipating was Gemini 3, and in many ways this was a wildly understated announcement. And yet, the response has been, in a word, significant. While they may not have hype-posted, Anthropic minces no words in their launch post. Our newest model, Claude Opus 4.5, is available today. It's intelligent, efficient, and the best model in the world for coding, agents, and computer use. It's also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done. So let's talk first about the benchmarks. And it is no accident that the one they choose to put right at the top is SWE-bench Verified. Now, you might remember that in our discussions about Gemini 3, the only major benchmark that they didn't win or at least match was this one. While Sonnet 4.5 was at a 77.2%, Gemini 3 Pro was at 76.2%, not like it was super far behind, but still not technically state of the art. GPT-5.1 was also a little tiny bit ahead of Gemini 3 Pro at 76%, and extended that lead to 77% when they released GPT-5.1 Codex Max in the days following Gemini 3. For a very short time, 5.1 Codex Max was at the top of the SWE-bench Verified chart, but Opus 4.5, at least by the benchmarks, blows it out of the water: 80.9%. Writes Morgan, a 3% lead has never looked so large. And it wasn't just SWE-bench Verified. On the Terminal-Bench 2.0 agentic terminal coding benchmark, 4.5 was meaningfully ahead of all the others as well. On agentic tool use, scaled tool use, and computer use, Opus 4.5 sets a new standard. Now, there were some tests where Opus 4.5 meaningfully lagged behind Gemini 3, such as Humanity's Last Exam, where they were significantly behind both without search and with search. And yet, what everyone was talking about, of course, was the coding results. If you are a regular listener of this show, you will know that the ascendancy of Anthropic this year and the speed with which they are catching up to OpenAI has much to do with them being the preferred AI coding model for developers.
That started with 3.5 and has basically continued unchallenged, although after the release of GPT-5, there have at least been credible competitors. Anthropic seems very clearly to agree with Swyx on the relative importance of coding as compared to all other use cases. A couple times I've referenced Sean's post about what made him decide to go work with Cognition, where he basically pegged coding as the high-value, short-timeline activity. The line, which I've shared a couple of times: code AGI will be achieved in 20% of the time of full AGI and capture 80% of the value of AGI. Whether or not that's true, Anthropic has certainly behaved as such. Now, outside just the standard SWE-bench, there were a couple of other things that people noticed. Igor Kotenkoff points out that while there are ways to overfit towards the SWE-bench Verified benchmark, the more recent SWE-bench Pro is a lot more difficult and connected to the real world, and Opus blows previous models out of the water. Opus gets a 52, where Sonnet 4.5 got 43.6, and GPT-5 got just 36%. On ARC-AGI, Opus 4.5 set a new standard ahead of 5.1 and Gemini 3, and on ARC-AGI-2, they got 37.64% at 240 a task. Already just hours after the release, the people who had early access were also independently verifying some of these results. Bindu Reddy writes, Opus 4.5 tops LiveBench AI and is the world's best agentic model. We can confirm this after testing it over the past few days. Now, interestingly, one of the things that we've seen a lot from labs recently is the people inside the labs really talking up the specifics about what they like about the models. We got a spate of that from Anthropic team members, such as Jake Eaton, who writes, Opus 4.5 is very good at a lot of things, and you should read the benchmarks, the model card, etc. But my favorite thing about working with it these past two weeks is that in conversation, it is somehow more fine-grained. It has a depth and texture that for me was immediately noticeable. It also feels interestingly much more self-contained. Sasha DeMarigny says, The internal response to Opus 4.5 has been a mix of excitement, awe, and surprise, particularly around how good it is at coding. Tariq writes, Opus 4.5 is special. A world record in SWE-bench and OSWorld benchmarks, the best model we've ever had at vision. On Claude Code, I've completely stopped writing code in the IDE. I think there's so much to discover about Opus 4.5. And indeed, some of the most interesting responses from Anthropic's members come from their engineering team. Sholto Douglas writes, I am so excited about this model. First off, the most important eval. Everyone at Anthropic has been posting stories of crazy bugs that Opus found or incredible PRs that it nearly soloed. A couple of our best engineers are hitting the interventions-only phase of coding. Adam Wolf writes, This new model is something else. Since Sonnet 4.5, I've been tracking how long I can get the agent to work autonomously. With Opus 4.5, this is starting to routinely stretch to 20 or 30 minutes. When I come back, the task is often done, simply and idiomatically. Anthropic also talked about how Claude Opus compared on a notoriously difficult candidate exam. In their announcement post, they wrote, We give prospective performance engineering candidates a notoriously difficult take-home exam. We also test new models on this exam as an internal benchmark. Within our prescribed two-hour time limit, Claude Opus 4.5 scored higher than any human candidate ever.
They continue, citing a productivity improvement of at least 100%; the mean self-estimated productivity improvement was 220%. They also popped open the hood a little bit on how they're making Claude even better when it comes to agents. In short, they have a huge emphasis on tools. Indeed, they write, the future of AI agents is one where models work seamlessly across hundreds or thousands of tools. An IDE assistant that integrates Git operations, file manipulation, package managers, testing frameworks, and deployment pipelines. An operations coordinator that connects Slack, GitHub, Google Drive, Jira, company databases, and dozens of MCP servers simultaneously. To build effective agents, they need to work with unlimited tool libraries without stuffing every definition into context up front. Agents also need to be able to call tools from code. Agents also need to learn correct tool usage from examples. Following that, they shared that they were releasing three features to make all of that possible: a tool search tool, which allows Claude to use search tools to access thousands of tools without consuming its context window; programmatic tool calling, which allows Claude to invoke tools in a code execution environment, reducing the impact on the model's context window; and tool use examples, which provide a universal standard for demonstrating how to effectively use a given tool. There's a rough sketch of what this looks like in practice after this paragraph. So again, all of this is telling a very consistent story, which is that Claude is for coding and pushing the frontier of what agents can do. So outside of interacting with the benchmarks, what were people's first impressions? Some were excited and appreciated that there was less hype around this. Nico Christie writes, have to respect Anthropic's commitment to not vague-posting all weekend. This is the most exciting model release since Sonnet 3.5. Leo at SynthWaved writes, Be Anthropic. Pretend Gemini 3 does not exist. Know you're ready to cook it for code anyways. Wait, zero hype posting. Drop new Opus. State-of-the-art for code. State-of-the-art in ARC-AGI. Better than expected. Cost less than old Opus. Be more like Anthropic. On the flip side, Ethan Mollick basically asked why they were burying the lede: I'm not sure why Anthropic keeps doing very low-key launches for fairly major releases and materially important improvements to their services. I kind of think it has to do with their assessment of the specificity of their audience in and among developers. Basically, it's a group of people that they think is going to respond more to having their peers and colleagues tell them about an update rather than getting maximum social distribution because of being loud and hypey. But what about people's early tests? Victor Talen writes, To my surprise, Opus 4.5 one-shotted my hardest calculus problem, tying with Gemini 3. In terms of first impressions, couldn't be more promising, I guess. Ethan Mollick writes, I had early access to Opus 4.5 and it's a very impressive model that seems to be right at the frontier. Big gains in ability to do practical work, like make a PowerPoint from an Excel. Nico again writes, Opus 4.5 is a step-function improvement for spreadsheet work. Extremely hard became doable, doable tasks became easy, and easy tasks are now solved. And yet if there were a few examples of people trying non-coding things, coding is very much where the main excitement lies. Guillermo Rauch, the CEO of Vercel, writes, Opus is on a different level. It's unreasonably good at Next.js and the best model we've tried on v0 to date.
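To make the tool-use discussion above concrete, here is a minimal sketch of Anthropic-style tool calling using the Anthropic Python SDK. The `tools` list and `input_schema` shape follow the documented Messages API; the `input_examples` field and the model ID string are assumptions added purely to illustrate the "tool use examples" idea, not confirmed parameter names, so check Anthropic's docs before relying on them.

```python
# A minimal sketch of tool calling with the Anthropic Python SDK.
# The tools/input_schema format is the standard documented API shape;
# "input_examples" below is a HYPOTHETICAL field illustrating the
# "tool use examples" feature described in the episode, and the model
# ID is an assumption -- verify both against Anthropic's documentation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
    # Hypothetical: worked examples showing the model correct usage.
    "input_examples": [{"city": "Paris", "unit": "celsius"}],
}

response = client.messages.create(
    model="claude-opus-4-5",  # assumed model ID
    max_tokens=1024,
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)

# If the model decided to call the tool, the response contains a tool_use block.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

The tool search tool and programmatic tool calling would, as described, sit on top of this same definition format, letting the model discover and invoke tools without every schema living in the prompt up front.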
Menlo Ventures' Deedy Das writes, Anthropic just dropped the best coding model, Opus 4.5. The coolest thing, he points out, is it does better at SWE-bench Verified without thinking than with 64k reasoning tokens. In other words, a super token-efficient model. Matt Schumer, who didn't have early access, said, First test of Claude Opus 4.5 and I'm already impressed. I asked it for a Colab competitor UI and it quickly pulled together this screen. Definitely better than my similar tests with GPT-5.1 and, shockingly, Gemini 3. More testing to go, but this is a good start. He followed it up: Okay, wow, I'm kind of blown away. In one shot, Opus 4.5 made the UI actually functional, with Python running in the browser. Some, like SuperDario, pointed out that this may not even be the best model that Anthropic has behind the scenes. They write, Good time to remind everyone, Anthropic has a long-standing policy of not significantly pushing the frontier to prevent an arms race. Dario can hit SWE-bench scores at will. Now, whether or not that's true, the fact that there is a lot of chatter like that is, I think, a good reflection of the sentiment in the community. Maybe the most vocally excited about this is Dan Shipper and the team at Every. He writes, Breaking news. Anthropic just dropped Claude Opus 4.5. It is by far the best coding model I've ever used. And here's how Dan describes it: It extends the horizon of what you can vibe code. Explaining, he writes, The current generation of new models, Anthropic's Sonnet 4.5, Google's Gemini 3, or OpenAI's GPT-5.1 Codex Max, can all competently build a minimum viable product in one shot, or fix a highly technical bug autonomously. But eventually, if you keep pushing them to vibe code more, they'd start to trip over their own feet. The code would be convoluted and contradictory, and you'd get stuck in endless bugs. We have not found that limit yet with Opus 4.5. It seems to be able to vibe code forever. Two more observations. Opus 4.5, he says, takes working in parallel to a whole new level. Because it's far better at planning and coding, it can work with more autonomy, meaning you can do more in parallel without breaking anything. One of his teammates worked on 11 different projects in six hours and had good results on all of them. Lastly, he points out it's great at design iteration. Opus 4.5, Dan writes, is incredibly skilled at iterating through a design autonomously using an MCP like Playwright. Previous models would lose the thread after a few cycles or say a design was done when it wasn't. Opus 4.5 is incredible at autonomously iterating until a design is pixel-perfect. Indeed, Dan's team at Every was equally vocal in their love of this model. Kieran Klassen writes, 2023 was GPT-4, 2024 was Sonnet 3.5, 2025 is Opus 4.5. This is the coding model launch I've been waiting for. First time I genuinely believe I can vibe code an entire app end-to-end without touching the implementation details. We haven't found the limit yet. Previous models would eventually trip over their own feet. Convoluted code, contradictory logic, endless bugs. Opus 4.5 just keeps going. If you write code with AI, you need to try this. And I think that this idea is the thing to watch for, to see whether Kieran and Dan's first impressions here, and some of the impressions of the Anthropic team, really play out. That this is, as Kieran puts it, the first time we can vibe code an entire app end to end without touching the implementation details.
It strikes me that if that is the case, that could be the most massive implication of this model. Adam Wolf from Anthropic again wrote, I believe this new model in Claude Code is a glimpse of the future we're hurtling towards, maybe as soon as the first half of next year. Software engineering is done. Soon, we won't bother to check generated code for the same reasons we don't check compiler output. I love programming and it's a little scary to think it might not be a big part of my job, but coding was always the easy part. The hard part is requirements, goals, feedback, figuring out what to build and whether it's working. There's still so much left to do, and plenty the models aren't close to yet. Architecture, systems design, understanding users, coordinating across teams, it's going to continue being fun and very interesting for the foreseeable future. But still, it's not hard to see that that's a fairly big pronouncement. Now, moving back to the realm of the non-speculative, the other thing that captured people's attention about this is that Opus 4.5 is significantly cheaper than Opus 4.1. The cost dropped from $15 to $5 per million input tokens and from $75 to $25 per million output tokens. Indeed, Jeremy from Anthropic points out, one fact people won't realize immediately about Opus 4.5: it's remarkably token efficient. All in, it's often cheaper than Sonnet 4.5 and other models for cost per task success. Simon Willison points out why we probably need to be looking not just at cost per input and output token, but also at token efficiency, when he writes, This is notable. Opus 4.5 is around 60% more expensive than Sonnet, $25 per million output compared to $15 per million output, but if it can use 76% fewer output reasoning tokens for the same complex task, it may end up cheaper. Now, that 76% came from Claude Relations' Alex Albert, who said that on SWE-bench Verified at medium effort, Opus 4.5 beat Sonnet 4.5 while using 76% fewer output tokens. A quick back-of-the-envelope version of that math appears after this paragraph. Look, it's early days, but the first impressions are big. Dan Shipper again sums up, every 6 to 12 months, a model drops that truly shifts the paradigm. Opus 4.5 launched today and that's what it is. Best coding model I've ever used and it's not close. We're never going back. Brian Atwood points out, I said a month or two ago that Anthropic is a vertical AI company and this is what I meant. They rightly identified that coding is the number one use case for LLMs right now and are overwhelmingly focused on it. Meanwhile, others are throwing darts in every conceivable direction, spreading themselves thin. Interestingly, just a couple days ago, Sam Altman posted, It has been amazing to watch the progress of the Codex team. They are beasts. The product and model is already so good and will get much better. I believe they will create the best and most important product in the space and enable so much downstream work. It has been pretty clear for some time now that OpenAI has come around to a similar view of the importance of coding and is very much not content to cede that ground. Summing up, Ethan Mollick writes, The main lesson of the past few weeks is that the big four U.S. labs all seem to have figured out a path forward in continuing the exponential pace of LLM improvement, at least in the near future. More simply put, Andrew Curran writes, AI winter is canceled. Try again next year, Grinch squad. There will, I'm sure, be lots more to discuss around Opus 4.5 as people get deeper into it.
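Here is that token-efficiency arithmetic as a tiny, self-contained sketch. The prices are the per-million-output-token figures quoted above; the 100,000-token task size is an arbitrary assumption chosen purely for illustration.

```python
# Back-of-the-envelope check of Simon Willison's point: Opus 4.5 costs more
# per output token than Sonnet 4.5, but can still be cheaper per task if it
# uses far fewer tokens. Prices are the figures quoted in the episode; the
# task size is a made-up example.
SONNET_PER_TOKEN = 15 / 1_000_000  # $15 per million output tokens
OPUS_PER_TOKEN = 25 / 1_000_000    # $25 per million output tokens

sonnet_tokens = 100_000                    # hypothetical output tokens for one hard task
opus_tokens = sonnet_tokens * (1 - 0.76)   # Alex Albert's 76% reduction figure

sonnet_cost = sonnet_tokens * SONNET_PER_TOKEN
opus_cost = opus_tokens * OPUS_PER_TOKEN

print(f"Sonnet 4.5: ${sonnet_cost:.2f} per task")  # $1.50
print(f"Opus 4.5: ${opus_cost:.2f} per task")      # $0.60
# Despite the higher per-token price, the task comes out roughly 60% cheaper,
# which is exactly the dynamic Willison describes.
```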
But for now, like I said, the Thanksgiving model explosion continues unabated. That's going to do it for today's episode. Appreciate you listening as always. Until next time, peace.