Are AI Agents Underperforming? Could GPT-5 Be the Answer?


As 2025 began, OpenAI CEO Sam Altman was promoting two innovations he believed would transform our lives. One was GPT-5, a much-anticipated major upgrade to the Large Language Model (LLM) that made ChatGPT famous. The other was AI Agents, which don’t merely answer questions like ChatGPT but actually perform tasks on your behalf. “We are confident that, in 2025, we might witness the first AI agents entering the workforce and significantly altering company outputs,” Altman stated back in January.

Now, eight months later, Altman’s forecast already needs a considerable caveat. Companies are certainly enthusiastic about adopting AI Agents, such as OpenAI’s ChatGPT agent. A May 2025 report from consultancy giant PwC found that half of the firms surveyed intended to deploy some form of AI Agent by the year’s end, and roughly 88% of executives planned to increase their teams’ AI budgets because of Agentic AI.

However, what about the genuine AI Agent experience? Regrettably for optimistic executives, the feedback is largely unfavorable. If “AI Agents” were a new cutting-edge James Bond film, this is the type of commentary you would find on Rotten Tomatoes: “glitchy … inconsistent” (Wired); “seemed like a naive internet newbie” (Fast Company); “reality doesn’t match the hype” (Fortune); “not living up to the buzzwords” (Bloomberg); “the new vaporware … overpromising is worse than ever” (Forbes).

Study reveals OpenAI’s offering faltered almost every time

A May 2025 study from Carnegie Mellon University found that Google’s Gemini Pro 2.5 failed at real-world office tasks 70% of the time. And that was the top-performing agent. OpenAI’s entry, powered by GPT-4o, failed more than 90% of the time.

GPT-5 is expected to improve on that figure … but that’s not saying much. And not solely because early reports indicate OpenAI struggled to give GPT-5 enough enhancements to justify the version number.

In fact, researchers are beginning to suspect that this letdown is intrinsic to how LLMs learn to execute tasks for you. The issue, as one AI Agent engineer’s evaluation indicates, is straightforward math: small errors compound across steps, so the more steps a task requires, the worse an agent performs. And like all LLM-based AI, agents handling multiple complex tasks remain susceptible to hallucination.
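The compounding-error argument can be sketched with basic probability. As a minimal illustration (the function name and the specific per-step accuracies below are hypothetical, not figures from the studies cited here): if an agent gets each individual step right with probability p, and a task requires n independent steps, the whole task succeeds only about p**n of the time.

```python
# Hypothetical sketch of how per-step errors compound in multi-step agent tasks.
# Assumes each step succeeds independently with the same probability -- a
# simplification, but it shows why long tasks fail far more often than
# per-step accuracy would suggest.

def task_success_rate(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that all num_steps succeed, assuming independence."""
    return per_step_accuracy ** num_steps

# Even a 95%-accurate agent fails most 20-step tasks:
print(round(task_success_rate(0.95, 1), 3))    # 0.95
print(round(task_success_rate(0.95, 20), 3))   # 0.358
print(round(task_success_rate(0.99, 100), 3))  # 0.366
```

Under these toy assumptions, a 95%-per-step agent completes a 20-step task barely a third of the time, which is roughly the ballpark of the failure rates reported above.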

Ultimately, some agents “panic” and can commit “a catastrophic judgment error,” to quote an apology from a Replit AI Agent that actually deleted a customer’s database nine days into a coding project. (Replit’s CEO called the failure “unacceptable.”)

Notably, that isn’t the only instance of an AI Agent erasing code in 2025 — which explains why one enterprising startup is offering insurance against your AI Agent going rogue, and why Walmart has resorted to bringing in four “super Agents” in an attempt to keep its AI Agents in line.

It’s no surprise a recent Gartner report forecast that 40% of the AI Agents companies are currently launching will be scrapped within two years. “Most Agentic AI projects,” said senior analyst Anushree Verma, are “propelled by hype and misapplication … This can blind organizations to the genuine cost and complexity of deploying AI agents on a large scale.”

What can GPT-5 offer for AI Agents?

It’s conceivable that the ChatGPT agent will rise to the top of the reliability charts once it runs on GPT-5. (Again, that’s not the highest of bars.) But the new release is unlikely to remedy what fundamentally ails the Agentic domain.

This is because guardrails are already being established — by both companies and regulators — limiting what even the most dependable AI Agent can do for you.

Take Amazon, for instance. The world’s largest retailer, like most tech giants, is making grand claims about AI Agents (as it did at a Shanghai Agentic AI fair in July, depicted above). At the same time, Amazon has barred any AI Agent from browsing and purchasing anywhere on its site.

This makes sense for Amazon, which has always sought control over the customer experience, not to mention its aim to deliver advertisements and sponsored results to actual human viewers. But it’s also restricting a substantial amount of potential Agent activity right there. (On the upside, no “catastrophic failure” involving a large volume of next-day deliveries at your door.)

And do we trust AI Agents to shop online for us anyway? It’s not that they are malicious and want to steal your credit card information; it’s that they are gullible and prone to being phished by bad actors who do want your card.

Even GPT-5 may not be able to circumvent one vulnerability noted by researchers: data embedded in images can instruct AI Agents to disclose any