The AI Daily Brief

What 1,250 Professionals Say About Working With AI

The AI Daily Brief • Nathaniel Whittemore

Friday, December 5, 2025 · 28m

What You'll Learn

  • Google has released Gemini 3 DeepThink mode, their most powerful AI model yet, designed for complex math, science, and logic problems.
  • Google has partnered with Replit to bring VibeCoding, a tool for enterprise-level coding, to the market.
  • Opus 4.5 has reportedly solved the CoreBench scientific agent benchmark, a benchmark focused on agentic code execution.
  • The CoreBench team found that Opus 4.5's performance on the benchmark almost doubled when using a new scaffold that uses Claude Code.
  • The CoreBench team is now planning to pivot to a new, undisclosed set of test questions to ensure the questions aren't included in the training data.

Episode Chapters

1

Gemini 3 DeepThink Release

Google has released a powerful new AI model called Gemini 3 DeepThink, designed for complex problem-solving.

2

Google and Replit Partnership

Google has partnered with Replit to bring VibeCoding, a tool for enterprise-level coding, to the market.

3

Opus 4.5 Solves CoreBench

Opus 4.5 has reportedly solved the CoreBench scientific agent benchmark, a benchmark focused on agentic code execution.

AI Summary

This episode of the AI Daily Brief covers the latest developments in the AI industry, including the release of Google's Gemini 3 DeepThink mode, a powerful AI model designed for complex problem-solving. It also discusses Google's partnership with Replit to bring VibeCoding to the enterprise, and the continued hype around Opus 4.5, which has reportedly solved the CoreBench scientific agent benchmark.

Key Points

  1. Google has released Gemini 3 DeepThink mode, their most powerful AI model yet, designed for complex math, science, and logic problems.
  2. Google has partnered with Replit to bring VibeCoding, a tool for enterprise-level coding, to the market.
  3. Opus 4.5 has reportedly solved the CoreBench scientific agent benchmark, a benchmark focused on agentic code execution.
  4. The CoreBench team found that Opus 4.5's performance on the benchmark almost doubled when using a new scaffold that uses Claude Code.
  5. The CoreBench team is now planning to pivot to a new, undisclosed set of test questions to ensure the questions aren't included in the training data.

Topics Discussed

Gemini 3 DeepThink · VibeCoding · Opus 4.5 · CoreBench scientific agent benchmark

Frequently Asked Questions

What is "What 1,250 Professionals Say About Working With AI" about?

This episode of the AI Daily Brief covers the latest developments in the AI industry, including the release of Google's Gemini 3 DeepThink mode, a powerful AI model designed for complex problem-solving. It also discusses Google's partnership with Replit to bring VibeCoding to the enterprise, and the continued hype around Opus 4.5, which has reportedly solved the CoreBench scientific agent benchmark.

What topics are discussed in this episode?

This episode covers the following topics: Gemini 3 DeepThink, VibeCoding, Opus 4.5, CoreBench scientific agent benchmark.

What is key insight #1 from this episode?

Google has released Gemini 3 DeepThink mode, their most powerful AI model yet, designed for complex math, science, and logic problems.

What is key insight #2 from this episode?

Google has partnered with Replit to bring VibeCoding, a tool for enterprise-level coding, to the market.

What is key insight #3 from this episode?

Opus 4.5 has reportedly solved the CoreBench scientific agent benchmark, a benchmark focused on agentic code execution.

What is key insight #4 from this episode?

The CoreBench team found that Opus 4.5's performance on the benchmark almost doubled when using a new scaffold that uses Claude Code.

Who should listen to this episode?

This episode is recommended for anyone interested in Gemini 3 DeepThink, VibeCoding, Opus 4.5, and those who want to stay updated on the latest developments in AI and technology.

Episode Description

<p>Anthropic asked 1,250 professionals how AI is actually changing their work, and the results reveal a blend of optimism, anxiety, and shifting identity—creatives feeling squeezed, scientists wanting trustworthy partners, and most workers hoping to hand off routine tasks while keeping what defines their craft. The episode also looks at how AI-run interviews collapse the old scale-vs-context tradeoff in research and what that means for understanding real-world AI impact. Headlines include Gemini 3 Deep Think, Replit’s enterprise push with Google, Opus 4.5’s benchmark surge, Salesforce’s Agent Force momentum, and Meta’s pivot away from the metaverse.</p><p><strong>Brought to you by:</strong></p><p>KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG &#39;You Can with AI&#39; podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. <a href="https://www.kpmg.us/AIpodcasts">https://www.kpmg.us/AIpodcasts</a></p><p>Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - <a href="https://rovo.com/">https://rovo.com/</a></p><p>AssemblyAI - The best way to build Voice AI apps - <a href="https://www.assemblyai.com/brief">https://www.assemblyai.com/brief</a></p><p>LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/</p><p>Blitzy.com - Go to <a href="https://blitzy.com/">https://blitzy.com/</a> to build enterprise software in days, not months</p><p>Robots &amp; Pencils - Cloud-native AI solutions that power results <a href="https://robotsandpencils.com/">https://robotsandpencils.com/</a></p><p>The Agent Readiness Audit from Superintelligent - Go to <a href="https://besuper.ai/">https://besuper.ai/</a> to request your company&#39;s agent readiness score.</p><p>The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614</p><p><strong>Interested in sponsoring the show? </strong>sponsors@aidailybrief.ai</p>

Full Transcript

Today on the AI Daily Brief, what 1,250 professionals tell us about working with AI. And before that, in the headlines, Gemini 3 Deep Think is now available. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Superintelligent, Rovo, Robots and Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com slash ai daily brief, or you can subscribe on Apple Podcasts. And of course, if you are interested in sponsoring the show, locking in those 2025 rates before they expire, send us a note at sponsors at AIDailyBrief.ai. Now, last note before we dive in, we're doing a bit of a switcheroo today. The headline section is actually a little bit longer than the main episode. There was just enough news that we kind of had to do it that way. So without any further ado, let's dive in. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. Although today is a very jam-packed episode, so I expect it to be a little longer than normal. We kick off today with an exciting one for you model testers out there. Google has released Gemini 3 Deep Think Mode, which is their most powerful version of the new Gemini 3 suite. Now, right now, the new mode is exclusively available to subscribers of Google's AI Ultra Plan, which is their couple hundred dollar a month type of product. Now, as you might imagine then, with the price tag that high, DeepThink is designed to tackle the most complex math, science, and logic problems available. The mode builds on top of Gemini 2.5 DeepThink, and as much as I tend not to care about benchmarks, does claim some impressive performances. 
They claim a state-of-the-art 41% result on Humanity's Last Exam without the use of tools, outperforming GPT-5 Pro at 30.7%, and DeepThink also achieved a 45% result on the ARC-AGI-2 test, more than doubling the performance of GPT-5 Pro to become the new state-of-the-art. Now, it should be noted whenever we talk about ARC-AGI that there are two vectors. There is score and there is cost per task. And while Gemini 3 DeepThink absolutely shattered the previous high score, it did so at a pretty elevated cost of $77 a task. Now, this might go some way to explaining why they're paywalling DeepThink mode behind the most expensive subscription. It's important to note that this is the first time normal users have ever had access to a model this expensive to run. OpenAI never released the preview version of o3 that cost $167 per task to achieve its state-of-the-art performance at the end of last year. DeepThink achieves its state-of-the-art performance by exploring multiple hypotheses at once before delivering a solution, a technique that has been used in research to boost performance but generally hasn't been available to regular users as a standard feature due to the high inference costs. Now, one thing that's not exactly clear yet is what the use case is actually expected to be here. They announced it by showing it generating a dominoes game with complex physics in one shot. Another Googler showed it producing a complex physics simulation of a rubber vase falling on a hard surface, which I think helps clear up one thing. Because it is called DeepThink and we already have deep research, there may be some mental overlap between the two, but they are fundamentally different things. DeepThink is not just a souped-up version of deep research. Instead, it is capable of scientific reasoning. 
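The "explore multiple hypotheses at once, then deliver one" pattern described above can be sketched as simple best-of-n sampling with a scoring step. This is a minimal illustration of the general technique, not Google's implementation, which is not public; `generate()` and `score()` here are hypothetical stand-ins for a model call and a verifier. It also shows why this style of inference is costly: compute scales roughly linearly with the number of hypotheses explored.

```python
# Sketch of multi-hypothesis inference (best-of-n with a scorer).
# generate() and score() are hypothetical stand-ins, NOT real APIs.
import random

def generate(prompt: str, seed: int) -> str:
    # Stand-in for one sampled candidate solution from a model.
    random.seed(seed)
    return f"candidate-{random.randint(0, 999)} for: {prompt}"

def score(candidate: str) -> float:
    # Stand-in for a verifier or reward model ranking candidates.
    return float(sum(ord(c) for c in candidate) % 100)

def deep_think_style(prompt: str, n: int = 8) -> str:
    # Explore n hypotheses, deliver only the highest-scoring one.
    # Cost per task grows with n, which is the tradeoff discussed above.
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=score)

answer = deep_think_style("prove the lemma", n=8)
```

The user only ever sees the single winning answer; the other n−1 hypotheses are paid for but discarded, which is consistent with the high per-task cost noted in the episode.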
Now, in terms of first reactions, a lot of people had the same experience as Hyperbrowser founder Shri Shukani, who wrote, with all their TPUs and GPUs, how TF is Gemini 3 DeepThink overloaded and unusable? This is also the response I got for the first couple of hours after the announcement, although then it cleared up. Victor Talon writes, for those wondering and as expected, Gemini 3 DeepThink solves the stack overflow bug that cost me a few days. The answer is more decisive than Opus 4.5, the only other public model to solve it. Even Gemini 3 Pro fails. It even points the exact location confidently. Takes forever though. I don't have harder tests for now. Most of my benchmarks are saturated. Now I've only had access for less than a day so far as well. Knowing that it sort of probably wasn't the use case, I still gave it a recent business strategy question that I had been both genuinely exploring, but also trying to test GPT-5.1 Thinking versus GPT-5.1 Pro versus Gemini 3 Pro. And I will say at this stage, I don't particularly think that the extra reps of DeepThink are worth it for that type of business strategy question. Basically, I don't think that it particularly added anything more. In fact, I didn't even prefer its response relative to the others. So whereas I have recently been finding myself being willing to take the time for 5.1 Pro on business strategy questions, I think DeepThink might be a little too far and just not right-sized for that particular purpose. In any case, I will continue to experiment with it, making full use of that Ultra account. Next up, we stay in the Google universe, where they have partnered with Replit to bring VibeCoding to the enterprise. The multi-year partnership will see Replit expand their use of Google Cloud services, meaning a deeper integration of Google's AI models, as well as using Google Cloud infrastructure on the back end to enable fully functional Vibe-coded software. 
Apps coded in Replit will also be able to leverage Google Cloud Marketplace in their go-to-market strategy. Replit CEO Amjad Masad said, The goal for us and for Google is to make enterprise Vibe-coding a thing. We want to show the world that these tools are actually going to transform businesses and how people work. Instead of people working in silos, designers only doing design, project managers only writing, now anyone in the company can be entrepreneurial. Richard Serrata, the senior director for Google Cloud, added, It may feel like it, but Replit is no overnight success. Amjad and team built something over time that became the exact right thing for this current moment with builders. In separate comments to CNBC on the state of the AI bubble, Amjad acknowledged that the honeymoon phase for VibeCoding is over. He said, Early on in the year, there was the VibeCoding hype market where everyone's heard about VibeCoding. Everyone wanted to go try it. The tools were not as good as they are today. So I think that burnt a lot of people. So there's a bit of a Vibe coding, I would say, hype slowdown, and a lot of companies that were making money are not making as much money. Amjad noted that earlier in the year, we were getting weekly ARR updates from the Vibe coding companies, and now we're not. That said, new statistics from Ramp suggest that Replit isn't slowing down all that much. The Ramp Economics Lab reported that Replit is currently number one for new customer growth across all software vendors. Google is also up there following the release of Gemini 3 and Nano Banana Pro, sitting at number five for new customer growth and number two for new spend growth. Now, one of my somehow contrarian takes is that I think we are actually way too bearish on vibe coding right now. I tend to think that when we say vibe coding, we are having two entirely different conversations at the same time using the same words. 
There is vibe coding for non-technical people, which is entirely different than vibe coding for software engineers. That shine-coming-off-the-rose type of phenomenon that Amjad was talking about is, I think, specific to Vibe Coding for software engineers. There is a recalibration happening right now among developers around how best to deploy these tools around the autonomy spectrum, all these sort of questions around how you're going to integrate agentic coding into your processes in a way that doesn't just create new problems. However, for the non-technical people, I think we are barely scratching the surface. In particular, I do not think that Vibe Coding has significantly made its way into the business world yet. It's mostly still individual hackers and tinkerers who are discovering that they can build and modify their own websites now without having to use Wix or Squarespace or something like that. I genuinely believe that that is going to change. And I actually think 2026 is going to be a massive growth year for Vibe Coding, but with a very different market audience. One more Google adjacent story. Google's NeoCloud partner Fluidstack is in talks to raise $700 million at a $7 billion valuation. Fluidstack started the year as a relative unknown, but signed multiple data center development deals to jumpstart their business. Google served as the backstop on a pair of deals, pledging to repay debt if Fluidstack defaults. As part of those deals, Fluidstack became one of the first third-party vendors to receive Google's TPUs. Now, that wasn't massive news back in September when the deals were struck, but now that the market narrative views TPUs as a genuine contender to NVIDIA's dominant GPUs, that is changing. Fluidstack also secured the contract to build a gigawatt capacity data center in France as part of President Emmanuel Macron's push for sovereign AI. 
They are additionally the infrastructure partner for Anthropic's $50 billion data center investment announced last month. The new funding round will reportedly be led by Situational Awareness, which is of course the hedge fund started by former OpenAI researcher Leopold Aschenbrenner. Moving to our next story, hype around Opus 4.5 continues to build as the model keeps pushing the limits. Sayash Kapoor, who you may know from the AI as Normal Technology blog, announced that his team are ready to declare that Opus has solved the CoreBench scientific agent benchmark. The benchmark requires agents to reproduce scientific papers when given the code and data from a paper. The agent is scored on its ability to set up the repo from the paper, run the code, and then correctly answer questions about the result. Functionally, it's a benchmark primarily about agentic code execution. CoreBench uses a common agent scaffold called CoreAgent to allow comparison between different models on a level playing field. Opus 4.5 was initially tested using CoreAgent and scored 42%, a solid score but not close to Opus 4.1's leading score of 51%. DeepMind researcher Nicholas Carlini then reached out to the team with a new scaffold that uses Claude Code, and also pointed out some issues with the way the benchmark was being scored. The CoreBench team ran the benchmark again using the Claude Code harness and found that Opus 4.5's performance almost doubled to 78%. Interestingly, a jump of this size was unique to Opus 4.5. Sonnet 4 and 4.5 saw much smaller improvements, and Opus 4.1 actually went backwards. Kapoor wrote, We're unsure what led to this difference. One hypothesis is that the Claude 4.5 series of models is much better tuned to work with Claude Code. Another could be that the lower-level instructions in CoreAgent, which worked well for less capable models, stopped being effective and hinder the model's performance for more capable models. 
The CoreBench team also manually went through their benchmark, weeding out grading errors that Carlini had pointed out. Eight tasks were being incorrectly marked as wrong due to small floating-point errors, and one task was impossible to reproduce due to a dataset being removed from the internet. The team manually scored Opus 4.5's performance at 95% with only two tasks failed. Kapoor wrote, With Opus 4.5 scoring 95%, we're treating CoreBench Hard as solved. The team now plans to pivot to an undisclosed set of test questions for their next benchmark to ensure the questions aren't included in training data. Now, outside the benchmarks, the personal testimonials for 4.5 Opus just continue to roll in. Dan Shipper from Every, who was very bullish to begin with, wrote a new piece going even farther. He said on Twitter that Opus 4.5 had turned coding into an exercise in writing prompts instead of writing code. The NYT's Kevin Roose is also finding Opus 4.5 great for non-coding purposes. He writes, Claude Opus 4.5 is a remarkable model for writing, brainstorming, and giving feedback on written work. It's also fun to talk to and seems almost anti-engagement maxed. The other night I was hitting it with stupid questions at 1am and it said, Kevin, go to bed. Now as for me, I have not yet found myself switching away from GPT-5.1 or Gemini 3 to Opus 4.5 all that often, but with all of this chatter, it seems clear that I'm going to have to give it an even bigger swing. A couple more stories. Like I said, we are on an extended headlines today. A little bit of market and adoption news. Salesforce has delivered a strong revenue forecast on the back of AgentForce adoption. Salesforce said their Q4 revenue would be between $11.1 billion and $11.2 billion, outstripping analysts' forecasts of $10.9 billion. They also said that remaining performance obligations, a measure of future bookings, would increase by about 15% compared to analyst estimates of 10%. 
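The CoreBench grading fix described above, where eight tasks were wrongly marked incorrect because of tiny floating-point differences, reflects a standard lesson in benchmark grading: compare numeric answers with a tolerance, not exact equality. A minimal sketch using Python's `math.isclose`; the actual CoreBench grading code may differ.

```python
import math

def grade_numeric(predicted: float, expected: float,
                  rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    # Exact equality fails on harmless floating-point noise, e.g. from
    # rerunning a paper's code on different hardware or library versions.
    # A relative/absolute tolerance accepts such answers as correct.
    return math.isclose(predicted, expected, rel_tol=rel_tol, abs_tol=abs_tol)

# Classic example of binary floating-point noise:
exact = (0.1 + 0.2) == 0.3                 # False under exact comparison
tolerant = grade_numeric(0.1 + 0.2, 0.3)   # True under tolerant grading
```

With exact comparison, an agent that reproduced a result to a dozen decimal places would still be marked wrong; tolerance-based grading is what separates real failures from numerical noise.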
CEO Marc Benioff credited their AI-focused products, stating, Our AgentForce and Data360 products are the momentum drivers. Active customer accounts for AgentForce have grown 70% quarter over quarter, with many customers now transitioning from the pilot phase to active deployment. Benioff said that they now have over 9,500 paying AgentForce customers. He said, We've delivered incredible results with AgentForce. It's really exceeding our expectations. This is our fastest-growing product ever. Now, one interesting sub-wrinkle that I'm watching with the Salesforce story: a big question that many have is to what extent models get commoditized in the future. Salesforce, for their part, has primarily built on top of OpenAI models since they launched AgentForce in late 2024. However, last week, Benioff posted, I've used ChatGPT every day for three years, just spent two hours on Gemini 3, I'm not going back. The leap is insane. Reasoning, speed, images, video, everything is sharper and faster. It feels like the world just changed again. Then on Thursday, he posted a follow-up. So, interesting things to watch to see how Salesforce thinks about model switching and what that means for the rest of the market. An even bigger market story yesterday, if only a little tangentially related to AI, is that Meta could be giving up on their namesake technology with rumors of deep cuts to the metaverse division. Bloomberg reports that the metaverse group could see budget cuts as high as 30% next year. Their sources said cuts of that magnitude would most likely include layoffs as soon as January of next year. They did caveat that no final decisions have been made, but deep cuts to the Metaverse Group are on the agenda for end-of-year budget planning sessions. Sources said that Zuckerberg has asked for 10% cuts across the board, which has been the standard request for the past few years. 
However, the Metaverse Group was singled out for deeper cuts due to the lack of industry-wide competition over the technology. Now, for most public market investors, it's hard for them to see the metaverse as anything but a massive disappointment, especially relative to the pitch. In 2021, Zuckerberg presented the metaverse with such conviction that he changed the name of the company. Since then, their metaverse group has been nothing short of a cash incinerator. The group has lost more than $70 billion since the metaverse strategy was announced. And thus, unsurprisingly, markets responded well to the idea that Meta would be slashing that particular category of spend. The stock jumped by 5.7% in its largest intraday move since July. Now, while the metaverse group is being slashed, that doesn't necessarily carry over to the parent division, Reality Labs. That broader division is focused on Meta's various AR and VR products and has been going from strength to strength in recent years. The Meta Ray-Bans have been a surprise hit and now define their product category, which is presumably a product category only becoming more important as LLM capabilities catch up to the promise of AI wearables. A Meta spokesperson suggested this strategy pivot is underway, commenting, Within our overall Reality Labs portfolio, we are shifting some of our investment from Metaverse towards AI glasses and wearables given the momentum there. We aren't planning any broader changes than that. The reallocation of resources also aligns with Meta's poaching of veteran Apple UX designer Alan Dye earlier this week. On Wednesday, Zuckerberg announced that Dye would lead a new creative studio within Reality Labs that would focus on design, fashion, and technology. In a post on Threads, Zuckerberg wrote, We're entering a new era where AI glasses and other devices will change how we connect with technology and each other. 
The potential is enormous, but what matters most is making these experiences feel natural and truly centered around people. With this new studio, we're focused on making every interaction thoughtful, intuitive, and built to serve people. So friends, that is the story from this extended headlines edition. But for now, we'll wrap it there and move on to the main episode. Today's episode is brought to you by my company, Superintelligent. Superintelligent is an AI planning platform. And right now, as we head into 2026, the big theme that we're seeing among the enterprises that we work with is a real determination to make 2026 a year of scaled AI deployments, not just more pilots and experiments. However, many of our partners are stuck on some AI plateau. It might be issues of governance. It might be issues of data readiness. It might be issues of process mapping. Whatever the case, we're launching a new type of assessment called Plateau Breaker that, as you probably guessed from that name, is about breaking through AI plateaus. We'll deploy voice agents to collect information and diagnose what the real bottlenecks are that are keeping you on that plateau. From there, we put together a blueprint and an action plan that helps you move right through that plateau into full-scale deployment and real ROI. If you're interested in learning more about Plateau Breaker, shoot us a note, contact at besuper.ai with plateau in the subject line. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS app so no knowledge gets left behind. Rovo runs on the teamwork graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. 
Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate, powered by Atlassian. Get started at rovo.com, that's R-O, V as in victory, O, dot com. AI isn't a one-off project. It's a partnership that has to evolve as the technology does. Robots and Pencils works side by side with clients to bring practical AI into every phase: automation, personalization, decision support, and optimization. They prove what works through applied experimentation and build systems that amplify human potential. As an AWS-certified partner with global delivery centers, Robots & Pencils combines reach with high-touch service. Where others hand off, they stay engaged. Because partnership isn't a project plan, it's a commitment. As AI advances, so will their solutions. That's long-term value. Progress starts with the right partner. Start with Robots and Pencils at robotsandpencils.com slash AI Daily Brief. Welcome back to the AI Daily Brief. In an episode earlier this week, I talked about how I thought that heading into 2026, we were likely to see a lot more studies and research and experiments that were trying to figure out just how much of the current slate of human work AI was actually able to do. We got a McKinsey study a couple of weeks ago that said that up to 57% of tasks could be automated. More recently, we got that MIT Iceberg report, which said that 11.7% of value-generating tasks could be automated. Now, of course, those things have been translated by mainstream media into headlines that 57% of jobs or 12% of jobs are going to be lost. If you want a refutation of why 12% of tasks being able to be automated doesn't mean 12% of jobs going away, listen to yesterday's episode. But alongside those types of studies, what I hope for is that we're also going to get more research around how these things are playing out in practice. 
There is a seismic gap between what AI can theoretically do and what it is actually doing in practice. Now, one of the companies that is most on the spot right now when it comes to providing some amount of that real lived experience information is Anthropic. Yesterday, we looked at some research that they did around their own team where they had interviewed researchers and engineers in August to figure out how Claude and Claude Code were impacting their work. And today we're looking at an even more expanded look at how AI is working in practice with the introduction of Anthropic Interviewer. The TLDR is that Anthropic launched a new research tool and tested it by asking professionals about their experience working with AI. Now, I want to focus mostly on the results and what the professionals actually said more than the tool itself, but it is worth mentioning the tool itself a little bit because, holding aside this specific use case, it potentially represents a broader pattern in how research happens in the future. Now, in their introduction, Anthropic points out that while they recently developed Clio, which is a privacy-preserving system for getting insights from real-world AI use of Claude, there were inherent limits there. As they write, the tool only allowed us to understand what was happening within conversations with Claude. What about what comes afterwards? How are people actually using Claude's outputs? How do they feel about it? What do they imagine the role of AI to be in their future? If we want a comprehensive picture of AI's changing role in people's lives and to center humans in the development of models, we need to ask people directly. Such a project, they noted, would require us to run many hundreds of interviews. Here, we enlisted AI to help us do so. Now, Google's Tao Dong picked up on something interesting about the form factor here. 
He writes: After reading the project blog post and a few transcripts, my initial impression is that we're seeing a new genre of user research, a crossover between surveys and interviews. I'm tempted to call it semi-surveys. It acts like a survey with predefined open-ended questions, but with the ability to ask decent follow-up questions on the fly. While these 10-15 minute sessions weren't particularly deep, they combined the scale of a survey with the flexibility of a moderator. Pairing this with AI analysis allowed the team to identify quantitative patterns and actually explain why they exist. Seems like a fascinating experiment. What's your take? My take is that this is pretty much exactly what we built, or at least a version of it, with Superintelligent. In Superintelligent's audits and assessments, whether they are our agent readiness assessments or our new plateau breaker assessments, one of the key ideas is that surveys are great for scale but bad for context. Interviews are great for context but bad for scale. But with AI, particularly voice AI, you don't have to make that trade-off. And rather than inferring from a small sample, you can just go ask everybody. Now, I don't think that this is some crazy novel insight. Nor do I think that our technology is some stratospheric leap. But I think that this particular pattern of using AI to radically scale information gathering and then speed up information analysis is something that is absolutely going to become de rigueur for all sorts of current research processes. And in so doing, it is going to open up totally new types of research that weren't possible before because of the scale of information that you can collect and analyze. So before we move on to the specific results, I would say if you are thinking about interesting research projects across basically any domain, if they involve talking with people, I believe that you can radically increase your ambition thanks to the new tools that are available. 
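The semi-survey pattern described above, predefined open-ended questions plus model-generated follow-ups, can be sketched as a simple loop. This is a hedged illustration of the general idea, not Anthropic's actual Interviewer tool, whose code is not public; `ask_model()` is a hypothetical stand-in for an LLM call.

```python
# Sketch of a "semi-survey": fixed open-ended questions, each followed
# by one model-generated follow-up. ask_model() is a hypothetical
# stand-in; a real system would call an LLM API and a voice layer here.

QUESTIONS = [
    "How has AI changed your day-to-day work?",
    "Which tasks would you hand off to AI, and which would you keep?",
]

def ask_model(prompt: str) -> str:
    # Stand-in: a real implementation would generate a context-aware
    # follow-up question from the respondent's previous answer.
    return "Can you give a concrete example of that?"

def run_interview(get_answer) -> list:
    transcript = []
    for q in QUESTIONS:
        answer = get_answer(q)
        transcript.append((q, answer))
        # One follow-up per question keeps sessions short, matching the
        # 10-15 minute sessions described in the blog post.
        follow_up = ask_model(f"Question: {q}\nAnswer: {answer}\nFollow-up?")
        transcript.append((follow_up, get_answer(follow_up)))
    return transcript

# Simulated respondent, for demonstration only:
log = run_interview(lambda q: f"(answer to: {q[:20]}...)")
```

Because the question script is fixed, answers aggregate like survey data at scale, while the follow-ups recover some of the context a human moderator would chase, which is exactly the scale-versus-context trade-off the episode says AI dissolves.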
All right, so back to this actual survey of these 1,250 professionals. For our purposes, what I'm most interested in is what they said about working with AI. Now, presumably, this group is probably going to be more enthusiastic than a random sample of 1,250 people. And so I think that that caveat is important. But within that, some of the high-level insights from Anthropic are that, one, people are optimistic about the role that AI plays in their work. Positive sentiment characterized the majority of topics discussed. However, as we'll see, there are a small number of topics that have more relatively pessimistic outlooks. A second insight, which I think is really valuable about how we design systems and think about displacement, people from the general workforce want to preserve tasks that define their professional identity while delegating routine work to AI. They envision futures where routine tasks are automated and their roles shift to overseeing AI systems. Again, I don't think that that's particularly novel, but it's interesting to see that that is how people are starting to think about their role as well. We talk a lot as insiders about this idea of shifting to a model where humans manage AI agents and AI systems, but it's interesting to see that start to come out as an expectation or a goal from individual professionals as well. A third insight, which absolutely resonates with what I'm seeing, is that despite creatives facing peer judgment and anxiety about their future, they are turning to AI to increase their productivity. As Anthropic puts it, they are navigating both the immediate stigma of AI use in creative communities and deeper concerns about economic displacement and the erosion of human creative identity. Lastly, number four, Anthropic writes that scientists want AI partnership but can't yet trust it for core research. 
Scientists uniformly express a desire for AI that could generate hypotheses and design experiments, but at present they confine their actual use to tasks like writing manuscripts or debugging analysis code. So what's interesting here, as opposed to some of these other areas, is that it sounds like scientists want AI to do more or at least be more helpful with their core functions, not just those routine tasks to be automated. So let's look at the visualization. If you're listening to the show, I'll go through this pretty fast. But if you're watching it, the blue-gray represents more pessimistic. The muted yellow represents more optimistic. And you can see across almost every category, optimism mostly beats out pessimism. The one area, it appears to me, where there's relatively more pessimism, at least among the general workforce, is in career adaptation, which makes sense. Now, among creatives, there are a few areas where, again, pessimism takes a little bit more root. In particular, artist displacement actually shows people more pessimistic than optimistic overall. Same with writer displacement. Among scientists, the biggest area that actually saw more pessimism than optimism is around security concerns. And when you dig into the examples they shared, a lot of it reflects broader sentiment that you hear day in and day out on social media. For example, a lot of folks are trying to figure out what parts of their jobs won't be automated, which parts of their skills will be valuable in a future where they assume AI is ubiquitous. For example, a trucking dispatcher said, I'm always trying to figure out things that humans offer to the industry that can't be automated, and really hone in on that aspect, like the personalized human interactions. However, that is not something that I think will be necessary in the long run. I'm still trying to figure out what skills would be good to work on that AI can't take over. Obviously way bigger than just that particular job role.
This question is particularly pertinent, I think, because it's not only something that people should be asking individually, but it's also something that people who are designing upskilling and retraining systems need to be hyper-conscious of. It is not going to be particularly useful if we design a bunch of training programs that just get obviated by GPT-7. Another thing that comes up on the pessimism side is the stigma of using AI. A salesperson, for example, said, I hear from colleagues that they can tell when email correspondence is AI generated and they have a slightly negative regard for the sender. They feel slighted, that the sender is too lazy to send them a personalized note and pushed it onto AI to do it. I think one really interesting question is to what extent that is a temporary transitional feeling, where in the future people will feel like, of course they used AI to write an email, or if that's going to be something that's more persistent. On the optimism side, however, you see tons of reflective comments. People looking to AI to help them manage their time, expand their creativity, reduce their stress by allowing them to focus on the best parts of their job. Overall, 86% of professionals reported that AI saves them time, and 65% said that they were satisfied with the role AI plays in their work. Across different categories of work, there was a pretty similar distribution of frustration and satisfaction, with slightly expanded worry in categories like art, design, and media. Among creative professionals, there is a much bigger band of responses. Designers, for example, see much more frustration than filmmakers, and in many cases, you just see complication even within individual categories. Worry and satisfaction and hope and frustration all sitting alongside one another. In my estimation, this is the type of survey that needs to happen, not once in a while, but on a very regular basis.
I want to see these questions tracked over time, and I want to see that data available to policymakers of all stripes. Now, one really cool thing about this as we wrap: Anthropic is making all of the data available in a public data set that you can download from Hugging Face, with all the participants' approval, of course, meaning that if you are so interested, you can go interact with and run your own analysis on this research as well. Overall, I think these 1,250 professionals tell us a lot of the same story that we've been seeing for years now, a future that has so much opportunity but is fundamentally different, and in that difference, somewhat scary as well. Good job to Anthropic for digging up this real information, but that's going to do it for today's AI Daily Brief. Appreciate you listening or watching as always, and until next time, peace.
