Building AI Agents

Human-agent collaboration for long-running agents

Deep dive on designing interfaces for long-running AI agents based on four modes of human-agent interaction: pull, push, ambient, and autonomous.

One of the most important problems in software over the next decade will be designing collaborative interfaces for humans and AI agents to work together. Today, the vast majority of people in the world who have interacted with AI have only done so through simple conversational interfaces like ChatGPT; it's very obvious to us that this won't be true 3 years from now.

This is especially critical given two trends:

  1. Long-running agents that run for several minutes, hours, or days (or in our case, forever) to perform more complex tasks
  2. Ambient agents doing self-initiated work vs. responding to user requests

The UX for long running AI Agents is going to be one of the most interesting design questions in the coming years. The more the agent is doing complex tasks for you in the background, the more the UI of software is about the meta elements of managing their work.

— Aaron Levie (@levie) May 24, 2025

There is huge alpha in figuring this out: Cursor experimented for months with very little growth until they nailed the UX (and even chose the contrarian path of re-building an IDE to have more control over the interface):

So there was some initial buzz at the very start… but then usage tanked. The entirety of that summer was just incredibly slow growth. That was somewhat demoralizing… We tried all these different things that summer, and then we found this core set of features that really, really worked incredibly well. One of them was this instructed edit ability, and we kind of nailed the UX for that… We had a bunch of other experiments that didn't pan out—probably ten failed ones for every feature you see in the product.

— Aman Sanger Apr 2, 2025

This is especially critical as AI is starting to be measured against the real productivity impact it drives — we need to actually embed agents more deeply within workflows to drive real productivity gains:

There really are 2 different worlds of AI adoption right now. Most individuals, teams, and organizations have *finally* gotten around to implementing AI chat systems for the first time.

But this is happening at the exact same moment when it's getting clearer what the future of AI agents are going to look like.

The type ahead or chat interaction paradigm of AI maxes out at double digit productivity gains because you're inherently rate limited by how fast you can type or interact with the system. You're still doing most of the work, and AI is just providing quick answers and suggestions to move you along faster.

The AI agent model, where you can run many agents in the background in parallel, actually can deliver multiples in productivity gains. Coding is where we're seeing this first, but it will come for most categories of knowledge work.

The only trick is that it's likely not as easy to adopt as the first paradigm was, because it requires a change in workflow. But those that get there are going to see the future faster, and get more compounding returns, than those that don't.

— Aaron Levie (@levie) September 17, 2025

Our take: 4 modes of interaction

We have a framework of 4 modes of human-agent interaction that we call pull, push, ambient & autonomous. We haven't seen anything else like it, and it's been extremely helpful to us as we build across each mode, so we wanted to share it.

| Mode | Description | Examples | Challenges |
| --- | --- | --- | --- |
| Pull | User asks agent for help | Real-time: ChatGPT, Cursor Cmd-K, Harvey Assistant. Long-running: Coding Agents, Deep Research. Actively: Assistant, 1-Click Workflows | Doing longer-running work while keeping the user engaged |
| Push | Agent tells user to do things | Deterministic triggers: Cursor Bugbot, Traversal AI SRE. Agentic triggers: ChatGPT Pulse. Actively: Agent Inbox | Keeping the signal-to-noise ratio high |
| Ambient | Agent sees what user is doing and offers help | Today: coding autocomplete, Cluely. Future: AI browsers. Actively: Chrome Extension HUD | Quality vs. latency tradeoffs; seamless UX that doesn't interrupt the workflow |
| Autonomous | Agent initiates and executes work on its own | Customer support agents (Sierra, Decagon). Actively: autonomous emails to SMB prospects | Limiting downside; building mechanisms for management + collaboration with humans |

Pull

Users ask the agent for help. We think of this as on-demand intelligence, and it's great because it's incredibly flexible and has the lowest threshold of quality needed to be useful.

The obvious baseline is ChatGPT, but people have done a few clever things beyond it.

The more interesting question here is how to build interfaces around long-running agents, where helping the user requires far more than a few seconds of work (which is what any serious task demands). Coding agents and Deep Research are both examples of this.

This is challenging because you (1) need to keep the user engaged and productive while the agent is working and (2) give them the right levers to steer the agent over a much longer duration. Nailing these is crucial to achieve real productivity gains, as anyone who has used a coding agent knows first-hand (there's the recent viral study that claims that developers who use coding agents think they're more productive but actually take longer to complete tasks).

We've seen interesting attempts to solve these, but nothing that feels quite right yet.

We have a different take here for our own product. In our domain (sales), there's a clear atomic unit (the account, meaning a potential or current customer) and goal (maximize revenue from that account) that both humans and agents work towards. This enables us to have long-running agents that run outside of the query path, where we have per-account agents that run forever and work in the background on each account to do research, develop a point of view, and initiate actions on their own (see push and autonomous below) — conceptually similar to having a sales rep that just works on that account 24/7, across every account you could sell to.
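
To make this concrete, here's a minimal sketch of the background half, assuming a hypothetical `AccountMemory` store and placeholder research/synthesis steps (the real pipeline involves many more tools); the shape that matters is a forever-running agent per account writing into persistent, shared memory:

```python
import time
from dataclasses import dataclass, field


@dataclass
class AccountMemory:
    """Persistent, shared memory for one account (hypothetical schema)."""
    account_id: str
    facts: list[str] = field(default_factory=list)
    point_of_view: str = ""
    pending_actions: list[dict] = field(default_factory=list)


def research_account(memory: AccountMemory) -> list[str]:
    # Placeholder for real tool calls: CRM lookups, web research, email history, ...
    return [f"new finding about {memory.account_id}"]


def run_account_agent(memory: AccountMemory, interval_s: float = 3600) -> None:
    """Long-running, per-account agent that works outside the query path."""
    while True:
        memory.facts.extend(research_account(memory))
        # Re-synthesize a point of view from everything learned so far
        memory.point_of_view = f"POV built from {len(memory.facts)} facts"
        # Queue push/autonomous actions (see the push and autonomous sections)
        memory.pending_actions.append({"type": "outreach_suggestion"})
        time.sleep(interval_s)
```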

Then, to power our conversational “pull” interface (we call it Assistant), we spawn short-lived conversational agents that have shared read-write access to the long-running agent's memory and can see and alter what the agent is doing in the background (let's call this “thinking fast and slow”). This enables us to get the best of both worlds, where we can accomplish user-initiated tasks quickly (like short-running agents) but have them reflect days of work in the background (like very long-running agents) because the vast majority of both reasoning and tool calling has been done in the background already and the results are stored in shared memory. It's like talking to a colleague who already knows everything about the topic you're asking about — it's much faster and easier for them to help you because they've done the hard work already.
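
And a sketch of the "fast" half, reusing the hypothetical `AccountMemory` from the sketch above: a short-lived agent answers from memory the background agent already built, and can write back to redirect it. `answer_with_llm` stands in for whatever model call you'd actually make:

```python
def answer_with_llm(question: str, context: str) -> str:
    # Stand-in for a real model call; the point is that `context` comes from
    # memory the background agent already built, not from fresh tool calls
    return f"Answer to {question!r}, grounded in: {context[:80]}..."


def handle_user_query(question: str, memory: AccountMemory) -> str:
    """Short-lived 'thinking fast' agent with read-write access to shared memory."""
    context = memory.point_of_view + "\n" + "\n".join(memory.facts[-20:])
    reply = answer_with_llm(question, context)
    # Write access: the conversation can redirect the background agent's focus
    memory.pending_actions.append({"type": "reprioritize", "hint": question})
    return reply
```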

This also then enables us to power different types of pull interactions in surfaces beyond chat. For example, we have a feature called 1-Click Workflows where we embed buttons within Salesforce that trigger real-time agentic workflows (write an email and put in my drafts, create a slide deck, etc.).

There aren't that many other domains where there is both a clear atomic unit and clear goal, but we expect to see versions of this across many agent products in the future. For example, the atomic unit in a large engineering org is the repo; Devin's DeepWiki uses agents that run in the background to build deep knowledge of a repo that significantly improves performance in large existing codebases.

Push

Agents tell users to do things they're not already doing, then help them execute those things. Keeping signal-to-noise high is critical here (otherwise it's just annoying, and worse than having nothing at all), but we've found the upside to be extremely high when you get this right: it drives instant, significant productivity impact and leads to outcomes that wouldn't have happened otherwise, because it overcomes the bottleneck of a pull-only world where the human has to initiate every action. In our view, push is critical to overcoming the capped productivity impact of pull-only interfaces, because it lets agents initiate actions themselves.

An easy way to do this is by having agents run on deterministic triggers, like Cursor's Bugbot (new PR) or Traversal's AI SRE agent (new incident).
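
Mechanically, a deterministic trigger is just an event subscription that kicks off an agent run; here's a minimal sketch, with a hypothetical `run_agent` entry point and illustrative event names:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_agent(task: str, payload: dict) -> None:
    # Hypothetical: kick off an agent run (review the PR, triage the incident, ...)
    print(f"agent started: {task} for {payload.get('id')}")


class TriggerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Deterministic mapping from event type to agent task
        task = {"pull_request.opened": "review_pr",
                "incident.created": "triage_incident"}.get(body.get("event"))
        if task:
            # Run in the background so the webhook returns immediately
            threading.Thread(target=run_agent, args=(task, body)).start()
        self.send_response(202)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("", 8080), TriggerHandler).serve_forever()
```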

The next step here is agentic triggers, where agents can initiate actions themselves. What if Linear suggested features to build, Figma came up with products to design, Cursor proposed refactors to do, or Harvey identified gaps in your arguments?

This would be immensely valuable, but what's even more exciting is the holy-grail feedback loop that actually lets agents learn higher-level tasks, especially in a world of reinforcement learning. Today, agents typically perform at the level of junior employees: Linear's AI is essentially a low-level product ops person (it triages tickets and cleans up the backlog), user reviews often describe Harvey as comparable to a junior associate at a law firm (it drafts documents and does research you tell it to do), etc. The main difference between junior and senior employees isn't quality of work; it's that senior employees come up with what to do instead of just doing what they're told. So the only way for agents to climb the ladder is to be put in a position to come up with ideas themselves and get feedback from humans. Conceptually, Linear could train the world's best PM agent if they did this!

ChatGPT Pulse, which proactively does background research to give you daily suggestions of things it can help you with each morning, is an exciting example of agentic-trigger push, and we expect that many agent companies over the coming years will start to do this.

In our product, we have an Agent Inbox that each of our per-account agents can use to push tasks to the human sales reps that cover that account, like potential customers to reach out to or what to say in upcoming meetings to progress deals. Tasks are queued asynchronously via a tool call from the per-account agent (which is then notified about their lifecycle as the user accepts/rejects, the action actually gets executed, etc.).
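
A rough sketch of that queueing pattern, with invented names (`InboxTask`, `notify_agent`): the agent's tool call enqueues a task, and every lifecycle transition flows back to the agent as an event:

```python
import enum
import uuid
from dataclasses import dataclass, field


class TaskState(enum.Enum):
    QUEUED = "queued"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    EXECUTED = "executed"


@dataclass
class InboxTask:
    description: str
    account_id: str
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    state: TaskState = TaskState.QUEUED


def notify_agent(account_id: str, task: InboxTask) -> None:
    # Stand-in: feed the lifecycle event back into the per-account agent's memory
    print(f"[agent:{account_id}] task {task.task_id} -> {task.state.value}")


def push_task_tool(description: str, account_id: str, inbox: list[InboxTask]) -> str:
    """The tool the per-account agent calls to queue work for a human rep."""
    task = InboxTask(description=description, account_id=account_id)
    inbox.append(task)
    return task.task_id


def transition(task: InboxTask, new_state: TaskState) -> None:
    """Human accepts/rejects, or the action executes; the agent gets notified."""
    task.state = new_state
    notify_agent(task.account_id, task)
```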

Ambient

Agents see what users are doing and offer them help in-the-moment. This is extremely valuable when done right but also very difficult because it requires both high signal-to-noise and a great UX (low latency, seamlessly embedded into workflow).

Coding autocomplete models are perhaps the best example of ambient intelligence done right, and they're incredibly useful. Cursor Tab is an especially well-done example of deep integration here where it not only predicts what you are going to write wherever you are in the file, but also where in the file you're going to edit next.

An example of a cool consumer concept that has gotten a lot of hype but isn't quite ready for prime time is Cluely (the “cheat on everything” company). Real-time call coaching is great in theory, but in practice Cluely is too high-latency and the suggestions are too low-quality to be useful.

This is so hard because of the quality-latency tradeoff that exists with models and agents today (the whole reason there's a trend towards long-running agents is to let them do more significant amounts of work). There are two potential solutions here:

First, you can find specific cases with looser latency requirements. For example, for a live editing agent in Google Docs, does it matter if it's slightly lagged, within maybe 30 seconds? Probably not; in fact, it's arguably better (it's annoying if you're writing a doc and someone is typing over the same sentence you are). If you're predicting higher-level chunks of action, like where an engineer is going to go next in the file, you have a larger time window for inference. But the problem is that this caps your impact to only a subset of use cases, and still doesn't give you that much time to do tool calls or actual long-running work.
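
For the looser-latency cases, the simple mechanic is to debounce: wait until the user pauses, then spend the whole latency budget on one better suggestion. A minimal sketch (the 30-second window and `suggest_edit` are illustrative):

```python
import threading


class DebouncedSuggester:
    """Fires one suggestion after the user has been idle for `quiet_s` seconds."""

    def __init__(self, quiet_s: float = 30.0):
        self.quiet_s = quiet_s
        self._timer: threading.Timer | None = None

    def on_keystroke(self, doc_text: str) -> None:
        # Every edit resets the clock, so we never type over the user
        if self._timer:
            self._timer.cancel()
        self._timer = threading.Timer(self.quiet_s, self._suggest, args=(doc_text,))
        self._timer.start()

    def _suggest(self, doc_text: str) -> None:
        # With ~30s of budget we can afford a bigger model or a tool call or two
        print(suggest_edit(doc_text))


def suggest_edit(doc_text: str) -> str:
    return f"Suggested edit for a {len(doc_text)}-char doc"  # stand-in for a model call
```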

The other solution is to employ the same “thinking fast and slow” trick from the pull section, where you have a long-running agent actually do the pre-work in the background, and then give a fast model access to the long-running agent's memory to offer in-the-moment help.

In our product, we use the latter method to power a heads-up display (HUD) in our browser Chrome extension that reads the DOM of the current page across various sales workflow tools (since they're all browser-based) and uses a real-time model to understand user intent and generate a contextual suggestion.
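
Here's a rough sketch of that loop on the server side, again reusing the hypothetical `AccountMemory` from the pull section; the extension posts a DOM snapshot, a cheap heuristic filters pages, and a fast model drafts a suggestion grounded in memory the long-running agent already built:

```python
from dataclasses import dataclass


@dataclass
class HudSuggestion:
    intent: str
    suggestion: str


def fast_model(prompt: str) -> str:
    # Stand-in for a small, low-latency model call
    return "Mention the champion's note from last week's call"


def handle_dom_snapshot(dom_text: str, memory: AccountMemory) -> HudSuggestion | None:
    """Server-side handler the Chrome extension posts DOM snapshots to."""
    # Cheap heuristic first: only invoke the model on pages we recognize
    if "opportunity" not in dom_text.lower():
        return None
    # The fast model stays fast because the heavy lifting (research, POV)
    # already lives in the long-running agent's memory
    prompt = (f"Page: {dom_text[:2000]}\nContext: {memory.point_of_view}\n"
              "Suggest one next step.")
    return HudSuggestion(intent="viewing_opportunity", suggestion=fast_model(prompt))
```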

Autonomous

Agents come up with things to do and do them on their own. The trivial case here is when agents are doing totally different things than humans, but we're much more interested in considering the case where agents and humans do related work together in the same environment.

The most widespread example of this today is customer support agents like Sierra or Decagon, where agents have the right-of-first-refusal to autonomously resolve incoming tickets and escalate to a human when unable to resolve. A few things make this easier though: (1) individual tickets are independent, short-lived atomic units that the agent can own, (2) the agent doesn't actually have to do anything interesting to initiate work because it's initiated by a deterministic trigger (new ticket), and (3) there is only a single, agent-to-human handoff vs. back-and-forth or collaborative work.
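
The handoff pattern itself is simple; a sketch, with an invented `try_resolve` step and confidence threshold:

```python
def try_resolve(ticket: dict) -> tuple[str | None, float]:
    # Stand-in for the agent's resolution attempt; returns (answer, confidence)
    return ("Here's how to reset your password...", 0.92)


def handle_ticket(ticket: dict, confidence_floor: float = 0.8) -> dict:
    """Agent gets right of first refusal; humans get everything it can't own."""
    answer, confidence = try_resolve(ticket)
    if answer is not None and confidence >= confidence_floor:
        return {"resolution": answer, "handled_by": "agent"}
    # Single, one-way handoff: escalate with the agent's partial work attached
    return {"escalated_to": "human", "agent_notes": answer, "confidence": confidence}
```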

In most domains, this is more difficult. Today, in our product we only give agents full autonomy in limited cases where downside is low, such as automatically sending emails to the smallest potential companies that our customers could sell to.

In addition to limiting downside via guardrails and restricting agent actions, the interesting set of things to build here are similar to how you might structure mechanisms for management and collaboration within a company. A few interesting things we're thinking about:

  • Delaying autonomous tool calls to let users inspect upcoming actions, adjust agent training as needed, and see the impact of adjustments on impending actions (see the sketch after this list)
  • Allowing users to make arbitrary queries over agent actions across rolling windows to understand and correct agent behavior
  • Simulations to let users gain confidence around agent behavior and test out changes to training, including by rewinding long-running agents, replaying real events, and seeing how the agents would handle them differently (Sierra has a version of this)
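
As a sketch of the first idea, here's what a delayed action buffer might look like: autonomous tool calls sit in a review window before executing, so a user (or an updated policy) can inspect or cancel them. All names are illustrative:

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class PendingAction:
    tool: str
    args: dict
    execute_at: float
    cancelled: bool = False


class DelayedActionBuffer:
    """Holds autonomous tool calls for `delay_s` so humans can inspect them."""

    def __init__(self, delay_s: float = 3600):
        self.delay_s = delay_s
        self.pending: list[PendingAction] = []

    def schedule(self, tool: str, args: dict) -> PendingAction:
        action = PendingAction(tool, args, execute_at=time.time() + self.delay_s)
        self.pending.append(action)
        return action

    def inspect(self) -> list[PendingAction]:
        # Surface upcoming actions so users can see the impact of training changes
        return [a for a in self.pending if not a.cancelled]

    def flush(self, execute: Callable[[str, dict], None]) -> None:
        """Run everything whose review window has elapsed."""
        now = time.time()
        for action in self.pending:
            if not action.cancelled and action.execute_at <= now:
                execute(action.tool, action.args)
        self.pending = [a for a in self.pending
                        if a.execute_at > now and not a.cancelled]
```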

We expect to see much more here because we believe systems of intelligence will kill systems of action via autonomous agentic execution (perhaps as an obvious example from our domain: today, some sales reps make hundreds of calls a day; it's hard to imagine that humans will still be doing this 5 years from now given progress in voice agents). In this future world, what matters is the quality of the underlying intelligence and the scaffolding around the agents to help humans monitor and iterate on them.

Core design philosophy: unifying the experience

We have the unique privilege of getting to work with some of the most forward-thinking companies out there (like Samsara, Ramp, and Ironclad), who have been great partners in experimenting here. As we've done so, we've developed two core design principles that differ in practice from what we see many other agent companies doing:

  1. You should build opportunities for users to interact with intelligence in as many places as possible, in order to meet users where they are. While it is tempting to try to own the workflow, adoption is what you really care about.
  2. The experience needs to feel as unified as possible across the various modes of interaction

Here's how we think about these principles in our product:

  1. Interact everywhere → users can interact with our pull & push experiences across Slack, our Chrome sidebar extension (which is a nice way to live alongside everything they do in-browser), our web app, and their own internal tools via API (pull) + webhook (push)
  2. Unified experience → two ideas here:
    1. All modes share user-level memory, preferences, user-defined “workflows,” etc., which ensures consistency of output. For example, if you like to write emails a certain way, we'll both learn that and factor it in within every interaction mode
    2. Chat (pull) is the base and we try as much as possible to let you shift back into that mode from other modes for maximal interactivity. For example, every push Agent Inbox notification has a button to “continue the conversation” in Assistant, which then gets seeded with the context that led to the push recommendation.
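
To make the second idea concrete, here's a sketch of how a push notification might carry its context into a fresh Assistant session; the field names are invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class PushNotification:
    title: str
    recommendation: str
    context: str  # the reasoning/evidence that led to the recommendation


def continue_conversation(notification: PushNotification) -> list[dict]:
    """Seed a new Assistant session with the context behind a push notification."""
    return [
        {"role": "system", "content": f"Background: {notification.context}"},
        {"role": "assistant",
         "content": (f"I recommended: {notification.recommendation}. "
                     "Want to dig into why, or adjust it?")},
    ]
```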

Looking ahead

We almost view these different modes of interaction as a “Maslow's hierarchy of human-agent interaction” from pull → push / ambient → autonomous, with high upside to ascending it in terms of value added for users and ultimate impact on actual productivity. But each level of the hierarchy requires a higher level of intelligence, such as high signal-to-noise and the ability for agents to do more things without human interaction.

Cursor, for example, has invested a lot to this end, like doing reinforcement learning to improve the signal-to-noise of Tab. As discussed above in the push section, ascending the hierarchy lets you build a killer data flywheel for RL, since it gives the agent the opportunity to collect valuable data and feedback on its own, not just when users initiate interaction.

We also think about ascending the hierarchy more readily in use cases where there is less downside. For us, this means that a job like the SMB SDR's should largely be done by agents in autonomous mode, while Strategic AEs should likely interact primarily through pull mode for the foreseeable future.

We're especially interested in ambient and autonomous modes, and are aggressively investing here for our own product. We're excited to see all the big consumer AI companies try to move out of the shell of a website into owning the browser (ChatGPT Atlas, Perplexity Comet, Claude for Chrome). So far, these products are still all pull-based via a chat sidebar, but the whole point of building a browser is to deeply embed within the DOM, and it will be exciting to see how these products start to leverage that ability.

We're incredibly excited for a future where long-running agents live and work everywhere, in modes beyond just pull, to significantly increase productivity and growth. And if you are too, come join us (or contact sales) 😉

Mihir Garimella

Oct 31, 2025
