Anshul Gupta
The Tokenomics of Building AI Systems for Revenue

Anshul Gupta
Co Founder
Driving AI adoption inside a revenue organization is genuinely hard. Reps are skeptical and change management is slow. The path of least resistance, and increasingly the default, is to hand sellers access to a general-purpose AI and let them figure it out. Give everyone Claude or Gemini and encourage them to build workflows, automate the repetitive stuff, and prompt their way to better outputs.
The intent is directionally right. Empowering reps to move faster and think sharper is exactly what AI should do for a revenue team. In the early days, the results look promising on their face: experimentation increases, and adoption metrics look good in the QBR.
We’ve noticed three trends that affect teams building with generalist AI tools.
The first is uncontrolled cost. When reps are free to prompt however they want against a live model, token spend scales with behavior rather than with accounts. A rep who iterates ten times to get a usable email, pastes in full call transcripts as context, and runs three variations of the same outreach has just burned what a well-architected agent would spend on that account in a month. Multiply that across a team, add new use cases, and the bill grows in ways that are nearly impossible to forecast or govern. This is what people mean when they talk about tokenomics. It’s not the cost of running AI, but the cost of running it without architectural guardrails.
The second is a quality problem that compounds quietly. Without standardization or governance over what reps are prompting and what they're sending, there's no QA layer between the model and the prospect. Some reps produce strong outputs. Others accelerate mediocre work at scale. We call this sales slop: AI-generated assets and messaging that move fast without substance. You may be reaching more prospects, but with less intentional work, the kind a human would never send on their own. Adoption metrics look healthy, but pipeline quality suffers.
The third is the absence of any compounding value through feedback loops. You can see that reps are using the tool. You cannot see whether the workflows they've built, the prompts they've crafted, or the outputs they've sent are actually driving outcomes. Creation automation, at best, frees up time. At worst, it accelerates low-quality work in ways that are invisible until the results show up in the numbers, by which point the causal chain is long gone.
These problems are predictable consequences of a specific architectural gap: there is no persistent, governed layer sitting between the model and the rep. Using general-purpose AI tools means building on a stateless architectural foundation. Each interaction starts from scratch, context is assembled on the fly, and nothing that happens compounds into organizational intelligence.
What Stateless GTM AI Actually Looks Like
When most teams build AI for GTM, they default to a familiar pattern:
A rep asks a question about an account.
The system fetches relevant records from the CRM, retrieves call transcripts, pulls enrichment data, assembles all of it into a context window, and sends it to the model.
The model reasons over that context and returns an answer.
This works for the first query. The problem is that it works exactly the same way for the second query. And the third. And every query after that.
The system has no persistent understanding of the account. It doesn't know what questions were already asked or what conclusions were already drawn.
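The pattern above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the source names, the token counts, and the `StatelessAssistant` class are all hypothetical, and the model call is replaced by a placeholder string.

```python
from dataclasses import dataclass

# Hypothetical data sources with illustrative token sizes (assumed numbers).
SOURCES = {"crm_records": 2_000, "call_transcripts": 12_000, "enrichment": 1_500}


@dataclass
class StatelessAssistant:
    """Rebuilds the full account context on every question."""

    tokens_spent: int = 0

    def answer(self, account_id: str, question: str) -> str:
        # Nothing is remembered between calls, so every source is re-fetched.
        context_tokens = sum(SOURCES.values())
        question_tokens = len(question.split())  # crude proxy for a tokenizer
        self.tokens_spent += context_tokens + question_tokens
        return f"answer for {account_id}"  # placeholder for a real model call


assistant = StatelessAssistant()
for q in ["Who is the champion?", "Any open risks?", "What changed this week?"]:
    assistant.answer("acme", q)

# Three questions cost roughly three full context rebuilds.
print(assistant.tokens_spent)
```

The third question costs as much as the first, because the second and third calls carry the same context as the first and reuse none of it.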
There are two cost problems this creates, although they are not equal in severity.
The first is prompt inflation. When reps don't get what they need on the first try, they iterate. They add context, re-explain the situation, and run variations. A session that should cost a few cents costs several multiples of that. This is real, but it is also partially addressable. Every major frontier lab now supports prompt caching, which reduces the cost of cached tokens within a session by 60–70%. In-session iteration is an expensive habit, but it is a manageable one.
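As a rough worked example of that in-session math, the sketch below compares a ten-turn session with and without a cached-token discount. The price, context size, and turn count are assumptions chosen for illustration; the 65% discount is simply the midpoint of the range quoted above, and real provider pricing (including any cache-write surcharge) will differ.

```python
def session_input_cost(context_tokens, turns, price_per_token, cached_discount=0.0):
    """Input cost of a session that resends the same context on every turn.

    The first turn pays full price. Later turns pay the discounted rate on the
    repeated context. Output tokens and cache-write fees are ignored here.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    first_turn = context_tokens * price_per_token
    repeat_turns = (turns - 1) * context_tokens * price_per_token * (1 - cached_discount)
    return first_turn + repeat_turns


PRICE = 3.0 / 1_000_000  # hypothetical: $3 per million input tokens

uncached = session_input_cost(15_000, turns=10, price_per_token=PRICE)
cached = session_input_cost(15_000, turns=10, price_per_token=PRICE, cached_discount=0.65)

print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}")
```

Under these assumptions caching cuts the session cost by more than half, but only for context that repeats inside the session.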
The second problem is harder, and it is the one that actually breaks the economics at scale. Context reconstruction is not an in-session problem - it is a per-invocation problem. Every time an agent runs, anywhere in your organization, it has to re-fetch context from scratch: pull CRM records, retrieve transcripts, gather enrichment signals, assemble a coherent picture of the account. There is no cached state to draw from because there is no state. The account is rebuilt from raw sources on every call.
Now consider what happens when you scale this. You deploy agents across your org, and each one is independently assembling context for the same accounts. With N deployments, you pay the re-fetch cost N times. There is no shared understanding of what was already gathered or reasoned over. Foundation models are fundamentally probabilistic systems without a standardized memory layer, so each invocation may pull in a different context than the last. You don't even get consistent outputs across use cases, let alone consistent costs.
The moment you support more than one or two use cases, this multiplicative effect of every agent invocation becomes the dominant cost driver.
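To make the multiplicative effect concrete, here is a back-of-the-envelope sketch. The function name and every number in it are hypothetical; the point is only that re-fetch tokens scale with the number of independent agents touching the same accounts.

```python
def monthly_refetch_tokens(accounts, agents, runs_per_agent, context_tokens):
    """Tokens spent rebuilding context when every agent re-fetches on every run."""
    return accounts * agents * runs_per_agent * context_tokens


# Assumed workload: 10,000 accounts, 20 runs per agent per month,
# 15,000 tokens of assembled context per run.
one_use_case = monthly_refetch_tokens(10_000, agents=1, runs_per_agent=20, context_tokens=15_000)
five_use_cases = monthly_refetch_tokens(10_000, agents=5, runs_per_agent=20, context_tokens=15_000)

print(one_use_case, five_use_cases, five_use_cases // one_use_case)
```

Going from one use case to five multiplies context-reconstruction spend by five, even though the underlying accounts have not changed.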
The Maintenance Tax
Token spend is only the floor of the total cost.
Every time a frontier model provider releases a new model, in-house GTM AI systems need to be re-evaluated and re-optimized. Prompts that worked for the previous model produce inconsistent outputs on the new one. The evaluation pipeline needs to be re-run. Engineers need to audit outputs across use cases. Teams discover regressions. This is not a one-time project. It is a continuous engineering line item with no natural endpoint, because the model landscape is not stabilizing.
Add to that the infrastructure required to actually run agents at scale:
Ingestion pipelines for each data source
A retrieval layer
Vector databases
Orchestration
Sandboxing
Rate limiting
Observability
PII governance
Building this in-house ties up 8-12 engineers just to keep the platform layer running: people who should be building next-best-action logic but are stuck on ingestion and memory plumbing instead. We've seen GTM teams burn 15-20 AI engineers on this alone.
Why Memory Changes the Economics
A common response to this framing goes something like: "We understand our business better than any vendor does. If we encode our sales motion, our qualification criteria, and our workflows into a set of Claude skills, we have something proprietary that reflects how we actually sell. Why would we need anything else?"
It's a reasonable instinct. Procedural knowledge - the organizational expertise encoded into prompts, skills, and workflow automations - is genuinely valuable. A well-designed set of Claude skills that reflects how your best reps handle objections, or how your CS team identifies expansion signals, is real IP. We don't dismiss it.
But procedural knowledge has a ceiling, and that ceiling is determined by the context it operates on.
A skill that encodes how to evaluate account readiness for expansion is only as good as the account context it reasons over. If that context is incomplete, stale, or inconsistently assembled - which it will be in any stateless system - the skill produces inconsistent outputs regardless of how well it was designed. The procedural layer is executing correctly. The memory layer underneath it is lacking. Because these failures are probabilistic rather than systematic, they're hard to diagnose and nearly impossible to prevent.
The same logic applies to the reasoning layer - what we'd call our next-best-action architecture. Knowing your sales motion doesn't tell you what to do next with this account, today, given everything that's happened in the last two weeks. That requires synthesizing current state across signals that no skill can pre-encode: a champion who just changed roles, a competitor that just closed a deal with a similar company, a product usage drop that started three days ago. Static procedural knowledge can't reason over a dynamic state it doesn't have access to.
This is the ceiling. You can build excellent skills and still be limited by two things: the memory layer isn't persistent, so the context those skills reason over is reconstructed imperfectly on every call; and there's no standing opinion layer that evaluates what matters for each account right now and tells the rep what to do about it. The skills are execution logic. Without memory and without a reasoning layer generating a logical point of view, they're execution logic with nowhere useful to go.
The core insight behind Actively's architecture follows directly from this. Context should be additive, not reconstructed. Per-account agents maintain a continuously updated state - a synthesized view of what's happening, what's changed, and what matters next for each account. When new signals arrive, the agent processes the delta. It doesn't re-read the full account history to incorporate one new piece of information. On top of that memory layer, a reasoning layer continuously evaluates each account against your sales motion and produces a specific point of view on what to do next.
This changes the token math fundamentally. Instead of paying to reason over the full account context on every interaction, you pay to update and query a structured state that already exists. The work done last week doesn't have to be repeated this week unless something changes. Since the outputs are grounded in a standing, current understanding of each account rather than a freshly assembled raw dump, they're more consistent, more accurate, and significantly cheaper to produce at scale.
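The idea of additive context can be illustrated with a small sketch. This is not Actively's implementation; `AccountState`, `apply_signal`, and the token figures are hypothetical names and numbers used only to show the shape of delta-based updates against a standing state.

```python
from dataclasses import dataclass, field


@dataclass
class AccountState:
    """Standing, synthesized view of one account (hypothetical structure)."""

    summary: dict = field(default_factory=dict)
    tokens_spent: int = 0

    def apply_signal(self, key: str, value: str, signal_tokens: int) -> None:
        # Pay only for the new signal, then merge it into the existing state.
        self.tokens_spent += signal_tokens
        self.summary[key] = value


FULL_HISTORY_TOKENS = 15_000  # assumed cost of re-reading the whole account

state = AccountState()
state.apply_signal("champion", "changed roles", signal_tokens=200)
state.apply_signal("product_usage", "drop started three days ago", signal_tokens=150)

# A stateless system would have re-read the full history for each signal.
stateless_equivalent = 2 * FULL_HISTORY_TOKENS
print(state.tokens_spent, stateless_equivalent)
```

In this toy version the two signals cost a few hundred tokens to incorporate, against tens of thousands for two full rebuilds. A real system would also spend tokens synthesizing the delta into the summary, which this sketch leaves out.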
For a large enterprise organization running agents on tens of thousands of accounts, this difference is not marginal. The raw compute cost of a naive architecture is substantially higher than that of a persistent memory architecture, and that gap widens as use cases multiply.
Predictability Is the Other Half of the Problem
Beyond raw cost, the volatility of stateless token spend creates a distinct problem for finance and operations teams trying to budget AI initiatives.
When usage depends on how many questions reps ask and how many times they retry, spend is nearly impossible to forecast. A new use case, a new team onboarded, a new model that requires more prompting to get usable output - any of these can spike the bill. Revenue organizations don't tolerate cost structures that scale unpredictably with usage behavior.
A per-account architecture naturally produces a different cost model. Spend is a function of how many accounts are under coverage and at what cadence agents run, not what reps happen to ask on any given day. You forecast by accounts, not by queries. That predictability matters enormously for organizations trying to justify and scale AI investment.
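A forecast of this kind reduces to a product of a few planning inputs. The sketch below assumes a hypothetical account count, cadence, per-run token budget, and price; none of these are figures from Actively.

```python
def forecast_monthly_spend(accounts, runs_per_account, tokens_per_run, price_per_token):
    """Spend driven by coverage and cadence, independent of rep query volume."""
    return accounts * runs_per_account * tokens_per_run * price_per_token


# Assumed plan: 20,000 accounts, one agent run per day, 1,000 tokens per run,
# at a hypothetical $3 per million tokens.
forecast = forecast_monthly_spend(
    accounts=20_000,
    runs_per_account=30,
    tokens_per_run=1_000,
    price_per_token=3.0 / 1_000_000,
)

print(f"${forecast:,.2f} per month")
```

Every input is something finance can plan against in advance. No term depends on how many questions reps ask or how often they retry.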
The Compounding Effect
The economic advantage of persistent memory grows over time in a way that a stateless system cannot replicate.
Each time an agent processes a new signal or interaction, it refines its understanding of the account. Over months and quarters, the agent's state becomes substantially richer than what any naive retrieval system would assemble on the fly. The quality of its reasoning improves not because the underlying model changed, but because the memory it reasons over has compounded.
This also means that corrections and learnings propagate. When a rep adjusts the agent's recommended action or provides feedback on its reasoning, that signal updates the agent's understanding not just for that account, but for similar patterns across the organization. When one agent learns, every other agent learns alongside it. A stateless system can't do this.
The result is an architecture where the cost of serving each account decreases over time as the agent's memory becomes more efficient, while the quality of outputs increases. That's the opposite trajectory of a stateless system, where costs scale linearly with usage and quality depends entirely on what happens to be in the context window on any given call.
Build vs Build With Actively
We've seen how this plays out across dozens of enterprise GTM teams.
The teams that start with a stateless architecture hit the same wall. Early results are encouraging. Then token costs come in higher than expected. Then a model upgrade breaks outputs. Or a new use case requires re-architecting the retrieval layer. Or a key engineer leaves and the institutional knowledge of how the system works goes with them.
Samsara is a useful illustration. Samsara's team had made substantial internal investments to build GTM AI tools with an ambitious roadmap. They started with basic RAG capabilities, which worked for point-in-time question answering but provided neither continuity nor high recall across every interaction with an account. They built an agent with raw context tools to derive per-account intelligence but found it too expensive without persistent memory. Ultimately, building use case by use case was producing a fragmented set of agents, each with its own bespoke logic and increasing maintenance overhead.
For Samsara, architecting a stateful foundation would have been difficult without incurring massive compute costs. Actively's Per-Account Agent™ architecture aligned with exactly what they had envisioned building, and partnering meant they could optimize their engineering resources toward delivering use cases rather than maintaining the infrastructure underneath them. Actively-sourced opportunities have since driven a 2x conversion rate for their ADR and commercial AE teams, with the deployment continuing to expand across the GTM organization.
Read more about Samsara’s journey with Actively
The pattern is consistent. The teams that reach production impact fastest recognize early that the memory and orchestration layer is not a feature to build. It's an infrastructure investment with ongoing maintenance costs that don't go away. Building on top of a persistent account agent layer, rather than underneath it, is what lets AI engineers focus on the workflows that actually move revenue.
That's the architectural position Actively is designed to occupy. The hard work - continuous memory, state management at scale, model routing, retrieval optimization, shared learning loops - is maintained on our side, so the teams building with us can focus entirely on what their revenue organization needs next.
We can surface that intelligence via multiple ambient interfaces. Our API gives internal teams direct access to per-account agent memory, decisions, and strategy, embedding continuously maintained GTM intelligence into any internal tool, workflow, or application they want to build. Internal AI teams ship GTM use cases significantly faster with that foundation under them, because they're no longer responsible for keeping the memory layer alive. For teams building with Claude, Cowork, or other AI tools, Actively's MCP connection means every custom agent gets access to each account's full research, strategy, and history without any context setting.
The choice isn't really build versus buy. It's building the layer that differentiates you versus building everything from scratch and paying the full cost of both. For teams taking the tokenomics question seriously, that distinction is where the answer lives. The token spend problem isn't solved by negotiating better API rates or throttling usage. It's solved by building on an architecture where context compounds rather than gets reconstructed, and where the work that was done last quarter makes this quarter cheaper, not the same price all over again.
This is ultimately what Intelligence-Led Revenue requires at the infrastructure level. The vision of agents working every account continuously, in the background, without waiting for human initiation only holds if the economics of running those agents are sustainable at scale. A stateless architecture can produce impressive demos. It cannot produce a revenue organization that gets meaningfully smarter every quarter, because it has no mechanism for accumulating what it learns. The tokenomics don't work, and neither does the model. Persistent memory is what makes continuous execution financially viable. And continuous execution is what makes Intelligence-Led Revenue real rather than aspirational.
If you're thinking through the tokenomics of your GTM AI system, we'd be glad to dig into the specifics in a live demo and show how Actively is built to provide predictability and drive outcomes.



