I'm Betting My Company on Proactive Agents
Boris Tane
Agents can do almost anything you ask them to, and that’s the problem: you still have to ask.
I keep hearing people call agents “digital coworkers”. That framing is wrong. A coworker who sits in silence until you hand them a perfectly-scoped task, completes it, then goes back to waiting, is not a coworker. Every agent you’ve ever used works exactly like this. The models got smarter, the harnesses got better, the runs got longer, but the interface never changed: you bring the work, the agent brings the labour.
And most people haven’t noticed, because the prompt box has quietly become what AI is in our heads.
Every interface is still a prompt box
The input box is how this whole era started. ChatGPT put one on top of a model and became the fastest-growing product in history, and we all copied it. Every AI product since has been a variation on the same interaction: human types, machine responds, machine waits.
Claude Code was the next evolution. The agent moved into your terminal, picked up your files, your shell, and your git history, and started doing real work instead of talking about it. It changed what agents could do, but not how they start: you type, it works, it stops, and it waits for you to type again.
Then agents moved to the cloud. Codex, Devin, Claude Code on the web. They run for hours instead of minutes, spin up sub-agents to parallelise the work, and don’t die when you close your laptop. You hand one a task before lunch and come back to a pull request.
The prompt even stopped being a keyboard-only thing. Background agents can be kicked off by an alert or a webhook, and most agent platforms now offer automations: when this event fires, or this cron ticks, run an agent with these instructions.
- When: a Sentry alert fires
- Do: investigate, open an incident if it's real
- Then: post the findings to #incidents
But what is an automation? It’s a prompt you wrote in advance. You predicted the failure mode, picked the event, and wrote down what to do about it. The trigger fires the agent, but the judgement inside it is yours, frozen at setup time. An automation catches exactly what you anticipated, and nothing else.
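To make the "prompt you wrote in advance" point concrete, here is a minimal sketch of an automation as a frozen trigger-plus-instructions pair. The `Automation` shape, the `dispatch` function, and the event type strings are hypothetical, chosen for illustration rather than taken from any agent platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automation:
    """A trigger plus instructions written ahead of time (hypothetical shape)."""
    event_type: str    # the event you predicted, e.g. "sentry.alert"
    instructions: str  # the prompt you froze at setup time


def dispatch(event: dict, automations: list) -> list:
    """Return the instructions of every automation whose trigger matches the event."""
    return [a.instructions for a in automations if a.event_type == event["type"]]


automations = [
    Automation(
        "sentry.alert",
        "investigate, open an incident if it's real, "
        "then post the findings to #incidents",
    ),
]

# An anticipated event starts an agent run with the frozen instructions.
print(dispatch({"type": "sentry.alert"}, automations))

# An event nobody predicted matches no trigger, so nothing runs at all.
print(dispatch({"type": "queue.backlog"}, automations))
```

The second call returns an empty list: the system has no opinion about anything outside the triggers someone wrote down.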
You are the scheduler
Strip away the tooling and the division of labour hasn’t moved in three years. The agent does the work. Deciding what the work is remains your job.
You read the dashboards, you listen to the users, you figure out what matters, and you compress everything you’ve learned into a prompt, either live at the keyboard or ahead of time in a trigger. The agent executes brilliantly, but every piece of judgement in the system originates with you.
The next evolution: agents that find the work
I’m betting the next evolution is agents figuring out the job to be done. No prompt, no trigger to configure, no instructions written in advance. You connect your stack, and the agent finds the work by itself: it watches the same signals you watch, notices what’s off, decides whether it matters, and starts working on it autonomously.
This is easy to say, and brutally hard to build, because proactivity is three problems stacked on top of each other, and skipping any one of them gives you something worse than the agent behind an input box.
Context. Obviously, the agent needs a live model of the world it operates in, not a snapshot you pasted into the context window at prompt time. A reactive agent with bad context gives you a bad answer. A proactive agent with bad context drops your prod database because it thought it was staging.
Judgement. This is what has been giving me the hardest time. At any given moment, thousands of things in a production system are slightly off. An agent that flags all of them is a noise machine, noise machines get muted, and muted agents are dead agents. The entire value of proactivity lives in the gap between “something changed” and “something matters”:
[
  {
    "signal": "memory up 3% on checkout-edge",
    "verdict": "no anomaly",
    "reasoning": "within the seasonal range for this hour on this worker"
  },
  {
    "signal": "new error pattern, 2 minutes after deploy 9f3c2a1",
    "verdict": "incident",
    "reasoning": "error class never seen on this worker, tightly correlated with a deploy"
  }
]
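The two verdicts above can be read as a triage rule whose default is "no anomaly" and which escalates only on positive evidence. The following is a toy sketch under that assumption, not how any real system decides: the `Signal` fields, the `triage` function, and the 300-second deploy window are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    description: str
    within_seasonal_range: bool            # assumed to come from a baseline model
    novel_error_class: bool                # never seen on this worker before
    seconds_since_deploy: Optional[float]  # None when there was no recent deploy


def triage(signal: Signal, deploy_window_s: float = 300.0) -> tuple:
    """Return (verdict, reasoning). The default verdict is 'no anomaly'."""
    near_deploy = (
        signal.seconds_since_deploy is not None
        and signal.seconds_since_deploy <= deploy_window_s
    )
    # Escalate only when two independent pieces of evidence line up.
    if signal.novel_error_class and near_deploy:
        return ("incident", "error class never seen before, tightly correlated with a deploy")
    if signal.within_seasonal_range:
        return ("no anomaly", "within the seasonal range")
    # Something changed, but nothing shows that it matters.
    return ("no anomaly", "changed, but no evidence that it matters")


print(triage(Signal("memory up 3% on checkout-edge", True, False, None)))
print(triage(Signal("new error pattern after deploy", False, True, 120.0)))
```

A real judgement layer would weigh far more than two booleans. The point of the sketch is only the shape: "something changed" alone never produces an incident.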
Action. Noticing without acting is just a smarter alert, and alerts are the thing I’m trying to kill. The agent has to finish the job, inside boundaries that make autonomy safe: reversible actions, receipts for everything, and a hard gate before anything irreversible happens.
None of these three involve the agent deciding what good looks like. It decides what’s off, whether it matters, and what to do about it, and that’s the whole list, because in production nobody has to define good: errors at zero, latency at baseline, queues drained, certificates valid. The desired state comes with the territory.
Which means the real shift isn’t from “you prompt” to “the agent prompts itself”. It’s from imperative operations to declarative operations. An automation is imperative: you enumerate the failure modes in advance and script a response to each. A proactive agent is a reconciliation loop: it holds the system you have up against the system you should have, and works to close the gap. Kubernetes did this for infrastructure a decade ago: you write down three replicas, and a controller does whatever it takes to keep three replicas alive. Nobody has done it for the operation of software itself. And here there isn’t even YAML to write, because the desired state is already known. It works out of the box.
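As a rough illustration of the reconciliation-loop idea, the sketch below encodes the desired state as a handful of health checks and loops until the observed state satisfies them. Everything here is an assumption made for the example: the metric names, the `diff` and `reconcile` functions, and the toy `rollback` remediation. Exact comparisons such as "error rate equals zero" are a simplification.

```python
from typing import Callable

# Desired state as named checks; each returns True when the system is healthy.
DESIRED: dict[str, Callable[[dict], bool]] = {
    "errors at zero": lambda s: s["error_rate"] == 0,
    "latency at baseline": lambda s: s["p99_ms"] <= s["baseline_p99_ms"],
    "queues drained": lambda s: s["queue_depth"] == 0,
    "certificates valid": lambda s: s["cert_days_left"] > 0,
}


def diff(observed: dict) -> list:
    """Return the names of the desired conditions the observed state violates."""
    return [name for name, ok in DESIRED.items() if not ok(observed)]


def reconcile(observe, act, max_rounds: int = 3) -> list:
    """Observe, diff against the desired state, act on each gap, and repeat.

    Returns whatever gaps remain once the loop stops.
    """
    for _ in range(max_rounds):
        gaps = diff(observe())
        if not gaps:
            return []
        for gap in gaps:
            act(gap)
    return diff(observe())


# Toy system with one thing wrong: a non-zero error rate.
state = {"error_rate": 0.04, "p99_ms": 180, "baseline_p99_ms": 200,
         "queue_depth": 0, "cert_days_left": 30}


def rollback(gap: str) -> None:
    # Stand-in for a real remediation such as rolling back a deploy.
    if gap == "errors at zero":
        state["error_rate"] = 0


print(diff(state))                         # ['errors at zero']
print(reconcile(lambda: state, rollback))  # []
```

Nothing in the loop enumerates failure modes. It only compares what is observed against what should be true and acts on the difference.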
I started with operations
Ask a proactive agent to pick your product roadmap and you get an opinionated intern, because product direction is a matter of taste. Production isn’t. It’s the one domain where all three problems are solvable today.
The work announces itself: error rates climb, a deploy goes sideways, a certificate expires, a queue backs up. The work is already sitting in the telemetry, waiting for someone to notice it. And unlike taste-driven domains, ground truth exists: the error rate either spiked or it didn’t, the rollback either restored the baseline or it didn’t, so the agent’s judgement gets scored by the system itself, continuously, with no room for vibes.
Most of all, we already staff this job with humans. We call it on-call: a person sleeping next to a phone, waiting for a machine to say that another machine is unhappy. I spent years in observability, founded an observability company that Cloudflare acquired, and wrote a whole manifesto about structuring telemetry so the answer is one query away. Its thesis is the reason this company exists: nobody should be on-call in 2026.
Because the uncomfortable truth about the last decade of observability is that we made systems easier for humans to interrogate at 3am, then declared victory while humans were still the ones being woken up. The dashboards got prettier and the pager stayed on the nightstand. Observability without action is just expensive storage.
What it looks like
This is what Polylane does. Here is a concrete Tuesday:
Nobody configured a check for this failure mode. There are no rules to write and no thresholds to tune: agents evaluate every connected resource on a continuous cadence, and your existing dashboards and saved queries become their checklist. Judgement is where I am strictest. The default verdict is no anomaly, because it’s much better to miss a borderline issue than to wake someone for noise. “A deploy could introduce a bug” is true of every deploy and is never grounds to page anyone.
When something is real, the investigation runs several competing hypotheses in parallel, and each agent’s job is to disprove its hypothesis rather than confirm it, so correlation never gets to masquerade as cause. When the confirmed root cause is a code change, the fix arrives as a pull request with the investigation attached:
Opened by Polylane · gated on review and CI
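The "disprove rather than confirm" investigation can be sketched as a set of hypotheses, each carrying its own falsification test, evaluated in parallel. This is a toy under stated assumptions, not Polylane's implementation: the hypothesis names, the evidence keys, and the `investigate` function are all invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Each hypothesis carries a test that tries to refute it against the evidence.
HYPOTHESES = [
    {
        "name": "bad deploy",
        # Refuted if the old version shows the same errors.
        "refuted_by": lambda e: e["errors_on_old_version"],
    },
    {
        "name": "upstream outage",
        # Refuted if the upstream dependency is healthy.
        "refuted_by": lambda e: e["upstream_healthy"],
    },
    {
        "name": "traffic spike",
        # Refuted if request volume sits at its baseline.
        "refuted_by": lambda e: e["request_rate"] <= e["baseline_request_rate"],
    },
]


def investigate(hypotheses: list, evidence: dict) -> list:
    """Run every falsification test in parallel and return the surviving hypotheses."""
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as the input hypotheses.
        refuted = list(pool.map(lambda h: h["refuted_by"](evidence), hypotheses))
    return [h["name"] for h, dead in zip(hypotheses, refuted) if not dead]


evidence = {
    "errors_on_old_version": False,
    "upstream_healthy": True,
    "request_rate": 950,
    "baseline_request_rate": 1000,
}

print(investigate(HYPOTHESES, evidence))  # ['bad deploy']
```

A hypothesis survives only because every attempt to kill it failed, which is a stronger position than "it correlates with the incident".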
Proactive is not unsupervised
The autonomy is in the noticing, the triage, the 3am archaeology, and the mitigation. Anything that restores a known-good state, the agent does on its own: roll back the bad deploy, flip the flag back off. Those actions are reversible by construction, and they are the ones that actually silence the pager. What stays gated is anything that creates a new state: a code change ships through your review and your CI, never around them.
The line isn’t human versus agent. It’s reversible versus not. And it’s why you’re not secretly still on-call: the rollback ended the incident at 02:33, six hours before you merged the pull request. The PR was never what stopped the bleeding. It’s what stops it from happening again, and that can wait for coffee.
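The reversible-versus-not line can be expressed as a small policy: actions that restore a known-good state run immediately, anything that creates new state waits for approval, and every decision leaves a receipt. The sketch below assumes a hypothetical `Action` and `Executor`; the field names and outcome strings are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Action:
    name: str
    restores_known_good: bool  # True for rollbacks and flag flips, False for new state


@dataclass
class Executor:
    receipts: list = field(default_factory=list)

    def submit(self, action: Action, approved: bool = False) -> str:
        """Run reversible actions at once; hold everything else until approved."""
        if action.restores_known_good or approved:
            outcome = "executed"
        else:
            outcome = "awaiting approval"
        # Every decision is recorded, whether or not the action ran.
        self.receipts.append({
            "action": action.name,
            "outcome": outcome,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return outcome


executor = Executor()
print(executor.submit(Action("roll back bad deploy", restores_known_good=True)))
print(executor.submit(Action("merge fix pull request", restores_known_good=False)))
print(executor.submit(Action("merge fix pull request", restores_known_good=False), approved=True))
print(len(executor.receipts))
```

The `approved` flag says nothing about who approves. It could be set by a human reviewer or by whatever review process the team trusts with the gate.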
And the gate is yours to delegate. Code review agents already read every pull request in your repo. There is a world, not far from this one, where your review agent reads the fix at 02:41, checks it against the investigation, approves it, and your CI deploys to production before you wake up. Nothing about the loop changes except who holds the approve button. That is where this ends: software that fixes itself, with you writing the policy instead of clicking merge.
Notice what’s missing from that loop:
The prompt box was a great way to learn to trust these systems, and it’s a terrible way to run production. The agent has seen every deploy, every log line, and every metric across every service at once. Keeping it behind a prompt means the best-informed member of your team only speaks when spoken to.
Nobody should be on-call in 2026.