Demand Agents: The Need for Machine Learning
Humans learn through doing. AI is no different. A culture of experimentation and iterative improvement is key to building a production-ready agent.
The Gartner hype cycle is famous for a reason. New software markets evolve along a predictable pattern, which Gartner has spent many years mapping. Every launch follows the same curve, from the initial hype through the trough of disillusionment and eventually to the plateau of productivity. Yet it can be very difficult to estimate where any particular product is on the curve at any given time. Analysts have spent the last 18 months saying that Generative AI has entered the trough of disillusionment, the implication being that expectations have peaked and ROI is just around the corner. Yet board-level AI mandates haven't gone away, and the data centre investment bubble is still in full swing.
For its latest hype cycle estimates, Gartner broke out the different AI use cases into separate technologies, placing each AI product category at a different point on the maturity curve. AI assistants were considered to be approaching the plateau of productivity, while vibe coding and agentic AI are still considered to be in the hype phase. Such a distinction allows analysts to separate the markets where AI has achieved widespread acceptance from those where the technology is still being piloted.
Mature Assistants
Generative AI chatbots are the most mature use case. Broadly speaking, consumers have figured out where they want to use ChatGPT and other LLMs. People are using them much more for search, but only for certain types of queries. According to SEO specialists, AI is primarily used for research and shortlisting, while traditional search engines remain the preferred choice for navigational and transactional queries. The same applies in an enterprise context, where Copilot and Gemini are typically used for information retrieval.
Indeed, user complaints about AI search have grown over recent months. ChatGPT’s recent high-profile ‘code red’ blitz is a sign of a product dealing with slowing growth while trying to juggle conflicting user feedback. Copilot has been attracting the same criticism in an enterprise context. The AI in Microsoft 365 works for some people some of the time, particularly when used to create high-level overviews, but isn't accurate enough for every project. Those limitations have dented trust in all forms of generative AI, and are one of the key factors in the widespread backlash against agentic AI.
Agentic Experiments
Agents have their use cases, particularly in a marketing context, but the technology powering them is experimental. Agentic platforms such as Agentforce were released prematurely and are still being developed in public. SaaS vendors are still adding MCP servers to their products. Meanwhile, executives have realised that replacing SaaS with vibe coded AI apps is a non-starter, even at this early stage. That won’t stop the adoption of agents where needed. The investment bubble means that new AI startups are constantly bringing new ideas to market, some of which look set to catch on.
Two broad lessons have emerged from these early agentic AI pilots. The first is the continued importance of rule-based automation. Many humans are fairly terrible at executing fixed repetitive processes. Generative AI is even worse at it. Both humans and AI will skip key steps through error or ‘just because’. It took AI vendors a long time to accept this reality, but many have developed their products to combine the best traits of deterministic workflows and probabilistic decision making.
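One way to picture that combination is a pipeline where the fixed steps are ordinary code and only the genuinely ambiguous decision is delegated to a model, with a deterministic guardrail on the way out. The sketch below is illustrative only: `classify_with_model` stands in for a real LLM call, and the route names are invented for the example.

```python
# Hypothetical sketch: a fixed pipeline where most steps are deterministic
# and one decision is delegated to a model, then validated before use.

ALLOWED_ROUTES = {"billing", "support", "sales"}


def classify_with_model(text: str) -> str:
    """Stand-in for an LLM call; a real system would query a model API."""
    return "billing" if "invoice" in text.lower() else "support"


def route_ticket(text: str) -> str:
    # Deterministic step 1: normalise the input.
    cleaned = text.strip()
    # Probabilistic step: ask the model for a routing decision.
    route = classify_with_model(cleaned)
    # Deterministic step 2: guardrail -- never accept an unknown route.
    if route not in ALLOWED_ROUTES:
        route = "support"  # safe fallback
    return route


print(route_ticket("Please resend my invoice"))  # billing
```

The point of the guardrail step is that the model can never skip or invent a stage of the process: the workflow's structure stays fixed, and only the judgement call inside it is probabilistic.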
Machine Learning
The second key learning relates to the importance of learning itself. Any AI process needs a feedback loop. Training AI is an iterative process that takes a lot of reinforcement over a long period of time, and that training needs to carry on even after the agent enters production. Unfortunately, over-stretched enterprise tech teams simply don’t have the time to constantly monitor AI outputs and tweak inputs. The data model underlying any agentic process needs to incorporate some form of machine learning, so that the AI can evaluate incorrect decisions and avoid repeating errors.
The difficulty with self-learning AI is that agentic AI is very much a black box. Training relies on constant tweaking of prompts and foundational datasets through endless testing. That makes it difficult to deliver an ROI on AI projects, not least because many companies are still struggling to test AI efficiently at scale, especially for projects that require a human-in-the-loop cycle. AI vendors are aware of this issue, though. Google recently released Antigravity, a new AI developer toolkit, which was widely praised for offering a solution to this exact problem.
Constant Improvement
If agentic AI is to deliver on its potential, then it needs to be more accessible. Releasing a dedicated AI development IDE is a start, but it is only useful for AI developers. Different tools are needed for the data analysts and operations teams working with the AI features within their corporate tech stack. At the moment, those training processes are lacking, which in turn limits confidence in the technology as a whole. The ability to run process tests in bulk and compare output across evaluations is an important step in any workflow. It’s especially important when AI is involved, given how subjective AI output can be. Slop has become a buzzword for a reason.
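Running tests in bulk and comparing output across evaluations can be as simple as scoring two versions of an agent against the same labelled cases. The harness below is a sketch under stated assumptions: the test cases, the two `agent_v*` stand-ins, and the exact-match scoring rule are all invented for illustration, and real evaluations would typically use fuzzier scoring.

```python
# Hypothetical sketch of a bulk evaluation harness: run the same labelled
# cases against two agent versions and compare pass rates side by side.

cases = [
    ("Please resend my invoice", "billing"),
    ("My app keeps crashing", "support"),
]


def agent_v1(text: str) -> str:
    return "support"  # naive baseline: routes everything to support


def agent_v2(text: str) -> str:
    return "billing" if "invoice" in text.lower() else "support"


def score(agent, cases) -> float:
    """Fraction of cases where the agent's output matches the label."""
    passed = sum(agent(query) == expected for query, expected in cases)
    return passed / len(cases)


for name, agent in [("v1", agent_v1), ("v2", agent_v2)]:
    print(f"{name}: {score(agent, cases):.0%} pass rate")
```

Even a harness this small gives an ops team something a one-off chat session cannot: a repeatable, comparable number for each change to the agent.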
Improving output quality is paramount in building confidence, and that work can continue even after production deployment. Business teams have become more realistic about the best uses for the technology, but there is still a significant gap between ambition and reality. Increasingly, these barriers are due to a combination of experience and tooling rather than fundamental issues with the technology. Bridging knowledge gaps takes time, but it will happen eventually. It’s the natural progression of any new technology: it reaches maturity when ops teams become used to working with it. Generative AI is no different.