
You Put “Agentic AI” in the Tender. Now What?

March 7, 2026

There is a phrase that has taken over technology tenders across Southeast Asia in the past 18 months.

It appears in scope of work documents. It appears in evaluation criteria. It appears in press releases celebrating digitisation. It appears in slide decks presented to steering committees by agencies who learned the term two weeks before the presentation.

“Agentic AI.”

Nobody stops the meeting to define it. The vendor nods. The procurement officer nods. The project gets awarded. Somewhere downstream, a team of engineers opens the technical requirements document and begins the quiet process of figuring out how to deliver something that sounds like a magic trick for a budget that was sized for a website redesign.

This article is for the person holding the budget. Not to shame them. To prepare them.

What “Agentic” Actually Means

Before discussing cost, there must be a shared understanding of what is being bought. Because “Agentic AI” is not a product. It is an architecture. And architecture has layers, each with a price tag.

A standard AI chatbot receives a question and returns an answer. That is one language model call. One token transaction. Done.

An agentic AI system is different in a specific and expensive way. It receives a goal. It then breaks that goal into steps, decides which tools it needs to complete each step, calls those tools, evaluates the results, adjusts its plan if something goes wrong and continues until the goal is achieved. Each of those decisions, evaluations and adjustments is a separate language model call. Each call costs money.
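The loop just described is where the cost comes from, and it can be made concrete in a few lines. This is a minimal sketch with stubbed functions; every name here is hypothetical, and in a real system each `llm_*` stub would be a separate, billed model call.

```python
# Minimal sketch of an agentic decision loop. All function names are
# hypothetical stubs; in production, each llm_* call hits a paid LLM API.

def llm_plan(goal):
    # LLM call: break the goal into ordered steps (stubbed)
    return ["look_up_record", "draft_reply"]

def llm_choose_tool(step):
    # LLM call per step: decide which tool fits this step (stubbed)
    return {"look_up_record": "database", "draft_reply": "templater"}[step]

def run_tool(tool, step):
    # Tool layer: the actual integration (database, API, file system, ...)
    return f"{tool} result for {step}"

def llm_evaluate(result):
    # LLM call per step: did the tool result satisfy the step? (stubbed)
    return "result" in result

def run_agent(goal, max_steps=10):
    llm_calls = 1                       # planning call
    plan = llm_plan(goal)
    results = []
    for step in plan[:max_steps]:
        llm_calls += 1                  # tool-selection call
        tool = llm_choose_tool(step)
        result = run_tool(tool, step)
        llm_calls += 1                  # evaluation call
        if llm_evaluate(result):
            results.append(result)
    return results, llm_calls

results, calls = run_agent("answer citizen inquiry")
print(calls)  # 5: one planning call plus two calls per step
```

Even this toy two-step task makes five separate model calls. Add retries, re-planning on failure and longer plans, and the call count per task grows quickly — which is exactly why each decision in the loop shows up on the bill.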

Microsoft describes agentic AI as “an autonomous AI system that plans, reasons and acts to complete tasks with minimal human oversight.” Google Cloud defines it as AI that can “set goals, plan and execute tasks with minimal human intervention.” AWS calls it “proactive rather than reactive.” These are accurate descriptions. They are also descriptions of a system that, at scale, makes hundreds or thousands of individual reasoning calls to complete a single complex task.

The technical components of a production agentic AI system are:

The LLM layer. One or more large language models that serve as the reasoning engine. This is billed per token, where a token is roughly four characters of text. Every time the agent thinks, every time it reads a tool result and every time it generates a plan, tokens are consumed and billed.

The orchestration layer. The code that manages the agent’s decision loop: what tools to call, in what order, how to handle failures, when to retry, when to stop. Built using frameworks like LangChain, LangGraph or Microsoft AutoGen. This layer requires engineers who understand agent architecture, not just AI tools.

The tool layer. The actual systems the agent interacts with: databases, APIs, file systems, external services. Each integration must be built, secured and maintained. A government system talking to five different databases needs five secure integrations, each with its own error handling.

The memory layer. Agents need context across steps. This requires a vector database (Pinecone, Weaviate, pgvector) to store and retrieve relevant information quickly. The more context the agent needs, the more storage and retrieval calls are made. The more calls made, the more tokens are consumed feeding that context into the model.

The guardrails layer. Controls that prevent the agent from taking unintended actions, accessing restricted data, or producing harmful outputs. In a government context handling citizen data, this layer is not optional. It is a compliance requirement. Building it adds significant engineering time and ongoing monitoring cost.

The observability layer. Logging, tracing and monitoring of every agent action. Without this, there is no way to audit what the agent did, debug failures or demonstrate compliance. For public sector deployment, auditability is not a feature. It is a legal necessity.

The human oversight layer. The people who review agent outputs, approve high-risk actions and handle edge cases the agent cannot resolve. These roles are not temporary scaffolding for the rollout. They are permanent. If an agentic AI system handles 1,000 tasks but human staff must verify 30% of the outputs, that verification workforce must be counted as part of the system’s operating cost for as long as the system runs.

A chatbot is one box. An agentic AI system is all seven boxes, connected, monitored and maintained.

The Cost Breakdown Nobody Puts in the Tender

Here is what a production agentic AI deployment actually costs. All figures are verified from industry sources and real deployments.

Layer 1: LLM API Costs

The LLM is the core engine and the most variable cost. Current pricing for leading models as of early 2026, verified against official provider documentation:

GPT-4.1 (OpenAI): USD 2.00 input / USD 8.00 output per million tokens (released April 2025; newer than GPT-4o and better suited to long agentic contexts with its 1M token window)
GPT-4o (OpenAI): USD 2.50 input / USD 10.00 output per million tokens (still widely deployed in production systems)
Claude Sonnet 4.6 (Anthropic): USD 3.00 input / USD 15.00 output per million tokens (current default model as of February 2026, released alongside Claude Opus 4.6)
Gemini 2.5 Flash (Google): USD 0.30 input / USD 2.50 output per million tokens (note: this model’s pricing increased significantly from its 2024 launch rates; budget accordingly)
DeepSeek R1 0528 (DeepSeek): USD 0.40 input / USD 1.75 output per million tokens (updated May 2025 release; the original R1 remains at USD 0.55 input / USD 2.19 output)

A “token” is roughly four characters. A single complex agentic task can consume tens of thousands of tokens through its reasoning loop. A Fuzzy Labs analysis of a production SRE agent found that a single incident resolution consumed approximately 120,000 input tokens and 1,500 output tokens, costing about USD 0.38 per task.

Scale that to 300 tasks per month and the LLM cost alone is USD 114 per month for that one use case. That is cheap. But government agentic AI systems are not built for one use case. A citizen services agent handling document processing, inquiry routing and status updates across multiple departments will run many parallel tasks. According to Azilen, a mid-sized product serving around 1,000 daily users consumes 5 to 10 million tokens per month, which works out to USD 1,000 to USD 5,000 in LLM costs alone. Add multi-step reasoning, retries and longer contexts, and that figure climbs.
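The per-task arithmetic above is worth making explicit. This is a back-of-envelope sketch: the token counts are the Fuzzy Labs figures cited above, and the rates assume Claude Sonnet 4.6 pricing from the table earlier — an assumption on my part, since the original analysis does not name its model.

```python
# Back-of-envelope LLM cost model for one agentic task.
# Token counts: Fuzzy Labs SRE-agent figures quoted above.
# Rates: assumed Claude Sonnet 4.6 pricing (USD 3 in / USD 15 out per
# million tokens) -- the source analysis does not name the model.

INPUT_RATE_PER_M = 3.00    # USD per million input tokens
OUTPUT_RATE_PER_M = 15.00  # USD per million output tokens

input_tokens = 120_000     # consumed across the whole reasoning loop
output_tokens = 1_500
tasks_per_month = 300

cost_per_task = (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
              + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M
monthly_cost = cost_per_task * tasks_per_month

print(f"{cost_per_task:.2f}")  # 0.38
print(f"{monthly_cost:.2f}")   # 114.75
```

Notice where the money goes: at 120,000 input tokens against 1,500 output tokens, almost the entire cost is the context being fed back into the model at every step of the loop, not the text the agent produces.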

A real documented case: a mid-market company deployed a customer support agent with an initial build cost of USD 120,000. Within three months, unoptimised prompts and unlimited conversation depth pushed LLM spending to USD 7,500 per month. Nobody had modelled that in the business case.

Layer 2: Cloud Infrastructure

Compute (Kubernetes cluster, t3.medium baseline): USD 30–150/month

Vector database (Pinecone, Weaviate or pgvector): USD 500–2,500/month

Message queuing and workflow orchestration: USD 50–300/month

Logging and observability (ELK stack or CloudWatch): USD 100–500/month

Security and access management: USD 100–400/month

AWS runs 15 to 22% more expensive than Google Cloud for AI workloads. Azure reserved instances offer up to 42% savings for long-term commitments. These differences matter at scale but are irrelevant if the project is using cloud credits from an MSC grant that expire before the system reaches production.

Infra costs that look small in proof of concept expand sharply in staging and production. One documented example: a supply chain optimisation agent saw infrastructure costs jump from USD 5,000 per month in prototyping to USD 50,000 per month in staging due to unoptimised RAG queries fetching ten times more context than needed.

Layer 3: Build Cost

The one-time cost to design, build and deploy an agentic AI system to production.

Simple single-agent system (one use case, minimal integrations): USD 50,000–80,000
Mid-complexity multi-agent system (3–5 use cases, several integrations): USD 120,000–250,000
Enterprise-grade multi-agent platform (full workflow, compliance, monitoring): USD 250,000–500,000+

These ranges reflect what competent teams charge to build systems that work reliably in production, not demos. A demo can be built in a weekend. A production system handling government data, with auditability, guardrails, error recovery and human oversight workflows, takes three to six months of serious engineering work.

Layer 4: Ongoing Operating Cost

This is the cost that procurement documents rarely include.

Prompt tuning and model maintenance. Agent behaviour drifts when underlying models are updated. When the LLM provider releases a new model version, agent prompts written for the old version may produce different outputs. Someone must monitor this, test it and fix it. This is not a one-time cost. It is a permanent staffing requirement.

Human verification workforce. Studies show that organisations deploying agentic AI need human staff to review a meaningful portion of agent outputs. If the system processes 10,000 tasks per month and 30% require human review, that is 3,000 manual reviews. At 10 minutes each, that is 500 hours of staff time per month. This cost must appear somewhere in the operating budget.
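The review-workload arithmetic in the paragraph above is the line item most often missing from operating budgets, so here it is spelled out, using the same illustrative figures:

```python
# Human verification workload, using the illustrative figures above.
tasks_per_month = 10_000
review_rate = 0.30           # share of outputs needing human review
minutes_per_review = 10

reviews_per_month = round(tasks_per_month * review_rate)
staff_hours_per_month = reviews_per_month * minutes_per_review / 60

print(reviews_per_month)       # 3000 manual reviews
print(staff_hours_per_month)   # 500.0 staff hours per month
```

At roughly 160 working hours per person per month, 500 review hours is about three full-time staff — a permanent headcount line, not a project cost.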

Security and compliance monitoring. Agentic systems that access government databases, citizen data or financial records require continuous security monitoring. Prompt injection attacks, where malicious inputs attempt to hijack agent behaviour, are a documented and growing threat. The OWASP Top 10 for LLM Applications lists this as a primary risk. Without runtime monitoring, the organisation cannot detect or respond to these attacks.

Integration maintenance. Every API the agent calls can change. Government systems are updated. External services are deprecated. Each change can silently break an agent workflow. Someone must own integration maintenance as a named responsibility, not as “we’ll handle it if something breaks.”

A realistic total operating cost for a production government agentic AI system, excluding the initial build, runs between USD 5,000 and USD 25,000 per month depending on scale, model choice and oversight requirements.
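To show how the layers add up to that range, here is an illustrative monthly model. Every line item is an assumption drawn from the ranges quoted earlier in this article (roughly midpoints, plus a notional loaded rate for review staff), not a vendor quote.

```python
# Illustrative monthly operating-cost model. Each figure is an assumed
# midpoint of the ranges cited in this article, not a real deployment.
monthly_costs_usd = {
    "llm_api": 3000,        # midpoint of the Azilen USD 1,000-5,000 range
    "vector_db": 1500,      # Pinecone/Weaviate/pgvector range midpoint
    "compute": 90,          # Kubernetes baseline midpoint
    "queueing": 175,        # message queuing / orchestration midpoint
    "observability": 300,   # logging and monitoring midpoint
    "security": 250,        # access management midpoint
    "human_review": 5000,   # 500 review hours at a notional USD 10/hour
}

total = sum(monthly_costs_usd.values())
print(total)  # 10315 -- inside the USD 5,000-25,000 range above
```

Even with conservative midpoints, the single largest line is the human review workforce, which is precisely the item that rarely appears in the tender.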

What “Agent Washing” Means for Buyers

Gartner, in a June 2025 press release, introduced a term that every procurement officer in Southeast Asia should know.

“Agent washing.”

It means vendors rebranding existing products (chatbots, robotic process automation tools, rule-based workflow software) as agentic AI without delivering genuine autonomous capabilities. Gartner estimates that of the thousands of vendors globally claiming to offer agentic AI, only around 130 actually do.

The other vendors have noticed that tenders now require “Agentic AI” in the scope of work. They have updated their marketing materials accordingly. They have not updated their products.

A chatbot that routes inquiries between departments is not agentic. A form automation tool with an AI label is not agentic. An RPA workflow that calls a language model to summarise a document is not agentic. Agentic AI plans, reasons across steps, takes actions through tools and adapts when those actions produce unexpected results. If the system cannot do all of those things autonomously and reliably, it is not what the tender describes.

The danger for buyers is not that they will be deceived by bad-faith vendors. Most vendor proposals are written in good faith. The danger is that the vendor and the buyer are using the same words to describe different things. Nobody in the room has the technical background to notice.

The Gartner Number That Should Be in Every Business Case

In June 2025, Gartner released a formal prediction based on a poll of 3,412 organisations.

Over 40% of agentic AI projects will be cancelled by the end of 2027 due to escalating costs, unclear business value or inadequate risk controls.

The reason cited by Gartner’s Senior Director Analyst Anushree Verma: “Most agentic AI projects right now are early stage experiments or proof of concepts that are mostly driven by hype and are often misapplied. This can blind organisations to the real cost and complexity of deploying AI agents at scale, stalling projects from moving into production.”

Gartner also placed agentic AI at the Peak of Inflated Expectations on their Hype Cycle, with 2026 predicted to be the year it enters the Trough of Disillusionment. That prediction is already bearing out. A separate analysis found that 70% of developers attempting to deploy AI agents in enterprise environments report significant problems integrating with existing systems. This is not a capability problem. It is an architecture problem. Legacy systems were not built to receive instructions from autonomous software. Connecting them is expensive and slow.

This prediction is not about technology failure. The technology works. The failure mode is procurement and planning. Projects are funded based on demo performance, not production requirements. Budgets are set before cost modelling is done. Timelines are approved without accounting for integration complexity. And when the real costs emerge mid-project, the funding is gone and the scope must be cut, usually starting with the parts that actually make the system safe to deploy.

For government agencies, a cancelled project is not just a budget write-off. It is a public record. It becomes an audit finding. It appears in parliamentary questions. The cost of an underfunded, incomplete agentic AI project is not just the money spent. It is the credibility of every future digital initiative the agency proposes.

What “Readiness” Actually Requires

Before a government agency puts “Agentic AI” into a tender, there are questions that must be answered inside the organisation. Not by the vendor. By the agency itself.

Data readiness. Agentic systems are only as reliable as the data they access. If the underlying databases have inconsistent formats, outdated records or missing fields, the agent will produce unreliable outputs. Research suggests up to 85% of AI projects encounter significant issues related to data quality. Government data systems, accumulated over decades with varying standards, frequently have all three problems. Cleaning and structuring source data before an agentic system can use it reliably is a project in itself.

Process readiness. Agentic AI is most effective when the workflow it is automating is already well-defined, documented and understood. If the human process is inconsistent, full of exceptions and dependent on tribal knowledge, the agent will replicate that inconsistency at scale. The work of defining and standardising the process must happen before the agent is built, not as part of the build.

Skills readiness. According to AWS research, 52% of ASEAN businesses cite skills as the primary barrier to AI adoption. Building and maintaining an agentic AI system requires specialists in LLM prompt engineering, agent framework development, vector database management, API integration and AI security monitoring. These are not roles that exist in most government IT departments today. The organisation must either hire them, develop them through training or accept a long-term dependency on the vendor for all maintenance and changes.

Governance readiness. An agentic system that takes actions on behalf of the government must have clear rules governing what actions it is permitted to take, what data it is permitted to access and what happens when it makes a mistake. These rules must be written before the system is built, not discovered after the first incident. In Malaysia, the Personal Data Protection (Amendment) Act 2024 imposes mandatory breach notification requirements and fines of up to RM 1 million for violations. If an agentic system misconfigures access to citizen data, the legal exposure belongs to the agency, not the vendor.

Budget readiness. The one-time build cost is only the beginning. The questions that must be answered before any tender is issued:

  • What is the monthly LLM API budget for year one, year two and year three?
  • What is the staffing plan for human oversight of agent outputs?
  • What is the process and budget for handling model updates and prompt retuning?
  • What happens to the system when the vendor changes their pricing?
  • Who owns the system after the vendor contract ends?

If these questions do not have answers, the project is not ready to be tendered.

What a Good Tender Actually Looks Like

A well-written agentic AI tender does not specify the technology. It specifies the outcome, the constraints and the accountability.

  • Describes the specific process to be automated, the current volume of work that process handles and the acceptable error rate for automated outputs.
  • Specifies which data sources the system will access and the security classification of that data.
  • Requires vendors to provide a five-year total cost of ownership model, not just an implementation quote.
  • Defines the human oversight requirement explicitly, including who reviews agent outputs, what response time is required and how errors are escalated.
  • Requires a proof of concept against real data before full award, not a demo on curated samples.
  • Includes a knowledge transfer requirement so the agency can maintain the system after the vendor contract ends.

A tender that says “the system shall utilise Agentic AI to improve service delivery” without specifying any of the above is not a tender. It is a wish list with a budget attached.

The Honest Summary

Agentic AI is real. It works. It can genuinely improve public service delivery, reduce processing time for routine tasks and free government staff for work that requires human judgment.

It is also complex, expensive to build, more expensive to operate than a simple application and dependent on data quality and organisational readiness that most agencies have not yet achieved.

The agencies that will get value from agentic AI are the ones that spend three months on requirements before issuing any tender, that include honest cost modelling for operations, not just development, and that start with one well-scoped process rather than agency-wide transformation.

The agencies that will produce the 40% cancellation rate Gartner is predicting are the ones that put the term in the tender because they saw it in someone else’s tender, approve a budget sized for a chatbot and then wonder why the project is stuck in proof of concept two years later.

Pay peanuts, get monkeys. That is not a commentary on vendors. It is a description of what happens when the budget for a system is set before anyone understands what the system costs.

The author is a Technical Lead with experience building backend systems in Laravel and Python. This article is part of a series on the gap between what technology sounds like in a tender and what it costs to deliver in production. All cost figures cited are sourced from published industry research and documented real-world deployments.


Originally published on Medium.
