I’m not a coder, and I wouldn’t try to pretend otherwise.
Sure, I’ve spent decades around technology and finance, but my job has always been to understand where things are going. That puts me in the business of spotting inflection points, not writing code.
It also means I can tell the difference between technology that’s merely interesting and technology that’s genuinely changing how work gets done. And the reaction I’m seeing to Anthropic’s latest artificial intelligence tools has me incredibly excited about the latter.
Anthropic calls this product Claude Code.
It runs on the company’s Claude Opus 4.5 model and is designed to let the AI operate inside a real development environment instead of just answering questions in a chat window. Claude Code can read and modify files, reason across a codebase, run tests, debug errors and keep iterating toward a working solution.
What really stands out to me is how people talk about using it.
They say they’re able to leave it running, and when they come back later, they find that the work has already moved several steps forward on its own. If a first attempt doesn’t work, Claude doesn’t just freeze. It fixes what’s broken and keeps going.
This behavior confirms something I’ve been noticing for the past few months.
The core ingredients for artificial general intelligence have started falling into place, and they’re beginning to reinforce each other.
From Talking to Working
When I say general intelligence, I don’t mean consciousness or creativity. I mean AI that can pursue a goal over time, correct its own mistakes and decide what to do next without needing constant direction.
That’s the difference between software that answers questions and software that actually gets work done.
The first ingredient for general intelligence is knowledge.
That’s what fueled the original ChatGPT when it broke through in late 2022. Models trained on vast amounts of text suddenly produced responses good enough that interacting with them felt natural. They could answer questions, explain ideas and generate language well enough to change expectations for artificial intelligence practically overnight.
But those early AI systems were still fundamentally reactive.
They responded to a prompt, produced an answer and then stopped. Every interaction was a fresh start. It was still useful, and often impressive, but it was limited by an inability to carry work forward on its own.
To take the next step, AI needed a dash of the second ingredient: reasoning.

Over the next couple of years, AI kept improving as several pieces got better at the same time. Models got larger and training improved. Systems also became better at following instructions and using tools.
The real inflection, though, came when explicit reasoning entered the picture.
By late 2024, with the release of models like OpenAI’s o1, AI systems became noticeably better at multi-step logic, math and debugging.
That improvement showed up almost immediately.
GitHub’s research found that developers using AI coding assistants completed tasks roughly 30% faster on average, with even larger gains on routine or repetitive work.
And for the first time, these systems weren’t just producing fluent answers. They were reliably working through problems.
But even then, the way people used AI didn’t really change. You asked a question, got an answer and moved on.
But that’s changing now with the addition of a third ingredient: iteration.

This is what’s emerging with tools like Claude Code and other long-horizon agents built to keep working on a problem over extended stretches of time.
These systems don’t just respond and stop. They work through a problem, test the result, notice what broke, revise their approach and continue without being told exactly what to do next.
Generally intelligent people can work autonomously for hours at a time, catching and fixing their own mistakes and figuring out what to do next without constant direction.
For the first time, software is starting to behave the same way.
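To make that concrete, here’s a minimal sketch of the kind of loop these agents run. It’s purely illustrative: the run_tests() and model_propose_fix() functions below are hypothetical stand-ins I’m supplying for this example, not Anthropic’s actual tooling or API.

```python
# A stripped-down illustration of an agent's iterate loop: test, read the
# failures, revise, and repeat until the goal is met or the budget runs out.
# run_tests() and model_propose_fix() are hypothetical stand-ins, not a real API.

def run_tests(codebase: dict) -> list[str]:
    """Hypothetical test harness: returns the names of failing tests."""
    return [name for name, passed in codebase.get("tests", {}).items() if not passed]

def model_propose_fix(codebase: dict, failures: list[str]) -> dict:
    """Hypothetical model call: returns a revised codebase given the failures."""
    revised = dict(codebase)
    # A real agent would edit files here based on the error messages.
    revised["tests"] = {name: True for name in codebase.get("tests", {})}
    return revised

def agent_loop(codebase: dict, max_iterations: int = 10) -> dict:
    """Pursue the goal over multiple rounds instead of answering once and stopping."""
    for _ in range(max_iterations):
        failures = run_tests(codebase)
        if not failures:  # goal reached: everything passes
            return codebase
        codebase = model_propose_fix(codebase, failures)  # self-correct and keep going
    return codebase  # stop once the iteration budget is spent

# Example: two failing tests get worked through without further instructions.
print(agent_loop({"tests": {"test_login": False, "test_checkout": False}}))
```

The details don’t matter. What matters is the shape of the loop: test, diagnose, revise, repeat, without a person driving each step.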
And researchers have been measuring this capability directly. Groups like METR track how long AI systems can reliably pursue a goal without human intervention, and the trend they’re seeing is exponential.

Image: metr.org
The length of tasks these systems can handle has roughly doubled every seven months.
If we trace out the exponential, agents should be able to work reliably to complete tasks that take human experts a full day by 2028, a full year by 2034, and a full century by 2037.
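That extrapolation is easy to sanity-check with back-of-the-envelope math. The sketch below is mine, not METR’s: it assumes an 8-hour working day, a roughly 2,000-hour working year, the “full day by 2028” milestone above as the starting point, and the seven-month doubling time.

```python
from math import log2

# Back-of-the-envelope extrapolation of a ~7-month doubling time in task length.
# Assumptions (mine): a "day" is 8 working hours, a "year" is ~2,000 working
# hours, a "century" is 100 working years, and the one-day milestone is 2028.

DOUBLING_MONTHS = 7
DAY_HOURS = 8
YEAR_HOURS = 2_000
CENTURY_HOURS = 100 * YEAR_HOURS

def years_after_2028(target_hours: float, start_hours: float = DAY_HOURS) -> float:
    """Calendar years of doubling needed to grow from the 2028 baseline to the target."""
    doublings = log2(target_hours / start_hours)
    return doublings * DOUBLING_MONTHS / 12

print(f"Day -> year:    ~{years_after_2028(YEAR_HOURS):.1f} years after 2028")
print(f"Day -> century: ~{years_after_2028(CENTURY_HOURS):.1f} years after 2028")
# Roughly 4.6 and 8.5 years, i.e. the early 2030s for a year-long task and the
# late 2030s for a century, within about a year of the milestones above.
```

Small changes to the assumptions shift those dates by a year or two, but the shape of the curve is the point.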
To be clear, I’m not talking about artificial superintelligence (ASI). That comes later.
What comes first is persistence, error correction and follow-through. Those traits are what will turn our AI tools into something closer to coworkers.
Claude happens to be the clearest example of this right now, but it isn’t alone. OpenAI, Google and others are clearly racing toward the same kind of long-horizon capability.
Check out this recent post from a developer talking about Codex, OpenAI’s system that’s designed for similar long-horizon coding tasks.

But Claude Code stands out today because of its ability to be an interactive, collaborative and conversational partner. And Anthropic’s emphasis on safety and controllability will become even more relevant as systems run longer and with less direct oversight.
When a model works for seconds, mistakes are easy to catch. But when it works for hours, the stakes are a lot higher.
Claude Code is raising the stakes.
Here’s My Take
You don’t have to believe that artificial general intelligence is right around the corner to recognize what’s happening here. AI systems that can plan, execute and revise work over extended timeframes represent a real shift in how labor and productivity scale.
Think of it this way.
The AI applications of 2023 and 2024 were talkers. Some were very sophisticated conversationalists. But their impact was limited because they still needed constant input from people.
The AI applications of 2026 and 2027 will be doers.
They will feel less like software and more like coworkers. Instead of using AI a few times a day, people will run it all day. Multiple agents will work at the same time. And the payoff won’t just be a few saved hours: users will move from doing the work themselves to managing teams of intelligent systems.
In other words, the goal is no longer better answers.
It’s getting real work done.
Tomorrow, I’ll show you how researchers are measuring this shift and why the curve just bent sharply upward.
Regards,
Ian King
Chief Strategist, Banyan Hill Publishing
Editor’s Note: We’d love to hear from you!
If you want to share your thoughts or suggestions about the Daily Disruptor, or if there are any specific topics you’d like us to cover, just send an email to [email protected].
Don’t worry, we won’t reveal your full name in the event we publish a response. So feel free to comment away!