We’ve explored what agents are, where they add value, and how teams are using them across real GTM workflows.
Now we come to the operational reality: actually making AI agents work. Not on LinkedIn, not in a demo or a sandbox, but in production where workflows are messy, stakes are high, and brand risk is real.
This section is about hiring agents with the right scope, deploying them with the right supervision, and measuring performance in a way that reflects real value. We'll cover:
- The Build vs. Buy decision, and when it makes sense to do either (or both)
- The role of human-in-the-loop (HITL) design in reducing risk and drift
- How to define guardrails and ownership for the agent
- What to measure, and what performance metrics actually mean in agent workflows
Before you can measure performance or enforce guardrails, you have to answer the first operational question:
Are you building your own agent, or buying one off the shelf?
This question is less about speed or cost and more about ownership, flexibility, and accountability. And it has downstream consequences for how your agent performs, how it scales, and how you supervise it.
So how do you decide?
Nina Butler says it starts with understanding the problem’s complexity. If the workflow is simple and well-understood, like pulling account research, you might build it yourself. But when the task involves multiple inputs, market nuance, or subjective value framing, it’s often better to rely on teams who’ve already done the hard work.
She also cautions against underestimating the investment required to build something reliable, especially when AI literacy is still low across most teams.
The more strategic or sensitive the workflow, the more careful you need to be.
Sometimes the better move isn’t building or buying: it’s scoping more clearly. Because no agent can succeed without a clearly defined job. And no decision — build, buy, or blend — works without that first.
Whether you build or buy, one thing doesn’t change: your responsibility. Just because you’re using a vendor’s agent doesn’t mean you outsource accountability. And just because you’ve custom-built one doesn’t mean it will behave predictably. That’s what makes human supervision essential.
AI agents aren’t plug-and-play. As Ori Entis reminds us, agents are non-deterministic; they won’t always behave the same way twice. They respond to new data, shifting context, and dynamic prompts. And that means they can drift, hallucinate, or misfire.
That unpredictability is the tradeoff for their flexibility. It’s also what makes ongoing supervision essential: every high-functioning agent needs a Human-in-the-Loop (HITL).
HITL is not synonymous with micromanagement. It’s structured supervision: defined checkpoints where a human reviews, approves, or corrects the agent’s work before it moves downstream.
Nina Butler describes this as the “teammate” model: as agents prove themselves, supervision can evolve, but it should never disappear.
Ori Entis offers a practical roadmap for how your oversight model should adapt as the agent matures:
- Early-stage deployment: supervision is heavy. Every output is checked, behavior is closely monitored, and thresholds are strict.
- Mid-stage maturity: sampling increases, failures are logged, and metrics are tracked over time.
- Late-stage or high-volume use: agents may monitor each other, but humans still own the escalation and performance loop.
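To make that staging concrete, here is a minimal sketch in Python of how a team might decide which agent outputs get routed to a human reviewer. The stage names, review rates, and confidence threshold are hypothetical placeholders, not a prescribed policy; tune them to your own risk tolerance.

```python
import random

# Hypothetical review rates per maturity stage (illustrative values only).
REVIEW_RATES = {
    "early": 1.0,   # every output is checked by a human
    "mid":   0.25,  # sample a fraction; log and audit the rest
    "late":  0.05,  # spot-check only, but humans still own escalations
}

def needs_human_review(stage: str, confidence: float, threshold: float = 0.8) -> bool:
    """Decide whether an agent output goes to the human review queue.

    Low-confidence outputs are always escalated, regardless of stage;
    otherwise we sample at the stage's review rate.
    """
    if confidence < threshold:
        return True
    return random.random() < REVIEW_RATES[stage]

# Example: a mid-stage agent scores its own output at 0.9 confidence.
if needs_human_review("mid", confidence=0.9):
    print("Send to review queue")
else:
    print("Log the output and continue")
```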
That’s where guardrails come in: a way to define your boundaries without hard-coding every if-then. You don’t need 60 pages of red tape. You just need clear answers to a few key questions, and a team that knows who owns what.
Here’s a quick framework you can use to define agent limits, no matter the workflow:
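However you capture that framework, the answers are most useful when they’re written down explicitly. As a rough illustration only (the field names below are hypothetical, not part of the framework), here’s what agent limits can look like as simple, enforceable configuration:

```python
# Illustrative only: a written-down guardrail spec with hypothetical fields.
ACCOUNT_RESEARCH_AGENT = {
    "owner": "revops-team",                   # the accountable human owner
    "allowed_tasks": ["pull account research", "draft account summaries"],
    "forbidden_tasks": ["send external emails", "update CRM records"],
    "escalate_if": {"confidence_below": 0.8, "account_tier": "strategic"},
    "review_cadence_days": 7,                 # how often the owner audits outputs
}

def is_in_scope(task: str, spec: dict = ACCOUNT_RESEARCH_AGENT) -> bool:
    """Reject any task the agent was never hired to do."""
    return task in spec["allowed_tasks"] and task not in spec["forbidden_tasks"]

print(is_in_scope("pull account research"))   # True
print(is_in_scope("send external emails"))    # False
```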
When ownership is distributed or unclear, agents lose reliability fast. Nobody monitors them closely. Nobody tunes them when things go wrong. And nobody feels responsible when something breaks. Guardrails keep agents safe. Ownership keeps them accountable.
Every agent should have a clearly defined “agent owner”, the person or team responsible for:
- Setting and updating the agent’s scope
- Reviewing and approving changes to behavior or prompts
- Monitoring output quality and performance metrics
- Leading postmortems when failures happen
- Communicating changes to stakeholders
Pro tip: Don’t let vendors "own" the agent’s purpose, even if they built it. You still own the outcomes. You still own the risk.
It’s tempting to evaluate AI agents the way you’d evaluate a new tool: uptime, output volume, maybe accuracy. But that misses the bigger picture.
Agents aren’t just producing content or completing tasks. They’re interacting with your workflows, your systems, and your team. And that means asking:
- Was the output useful?
- Did it save time or improve consistency?
- Did it help the team make a better decision, or a faster one?
Most early deployments track the wrong things: surface metrics like uptime and output volume that say little about real impact.
To track real performance, measure across three layers:
- Task quality: Did the agent do what it was asked? Was the output complete, relevant, correct?
- Workflow impact: Did the agent reduce effort, increase consistency, or unblock the next step?
- Team adoption: Are people actually using it? Does it improve trust, speed, or clarity?
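Those layers are easiest to keep honest if every agent run is logged against all three from day one. A minimal sketch, with hypothetical field names rather than a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class AgentRunRecord:
    """Illustrative per-run log covering all three measurement layers."""
    run_id: str
    # Layer 1 (task quality): did the agent do what it was asked?
    output_complete: bool
    output_correct: bool
    # Layer 2 (workflow impact): did it reduce effort or unblock the next step?
    minutes_saved_estimate: float
    unblocked_next_step: bool
    # Layer 3 (team adoption): is the team actually using and trusting the output?
    output_used_by_team: bool
    human_edits_required: int

# Example: one logged run that cleared the task bar and was actually used.
run = AgentRunRecord(
    run_id="run-001",
    output_complete=True,
    output_correct=True,
    minutes_saved_estimate=20.0,
    unblocked_next_step=True,
    output_used_by_team=True,
    human_edits_required=1,
)
print(run.output_correct and run.output_used_by_team)  # True
```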
An agent doesn’t need to work independently to be valuable. It just needs to work reliably, and in the right context.
A year ago, Lattice made headlines for announcing the world’s first AI “employee”: an HR agent to manage other agents. The internet mocked it; the use case felt dystopian.
Cut to today and that future no longer feels like a Silicon Valley parody. At enterprise scale, we’re seeing the rise of internal “agent managers,” drag-and-drop agent workflows, and early-stage infrastructure for secure, accountable agent orchestration.
That’s how fast this space is moving.
As we built this playbook, we spoke with operators and analysts to ground it in expert insight. But even outside formal interviews, we’ve had front-row seats to the AI agent rush. Here’s what we’re learning from the field:
We’ve seen $300k consulting deals signed just to advise orgs on “where to put an agent on the org chart.” We’ve heard of boards demanding agent adoption, only for exec teams to scramble so the next board meeting can report, “We’ve deployed two agents.” It’s easy to laugh — but the truth is, every shift this big starts with confusion. We’re currently in the phase where hype is moving faster than understanding.
Most enterprise deployments today are still at the copilot stage: tools that enhance productivity through chat-like interfaces. This is the layer where adoption is actually happening. True AI agents capable of handling structured goals independently are still rare.
The breakthroughs will come when copilots grow up: from interface overlays to persistent agents that plan, act, and learn within bounded systems, not just respond to prompts. Right now, we’re still in early innings.
Many of today’s so-called agents are just AI wrappers on brittle, legacy SaaS systems. And they break easily. Copilots are often poorly integrated, bolted on rather than built in. But if your copilot layer is brittle, your agent layer will be worse. The disruption will start with rethinking these foundations. And most commercial tools won’t make that leap.
It’s not just buyers who are uncertain. Most vendors are also in test-and-learn mode. They need early users to discover what works. That means buyers aren’t just adopting agents; they’re co-developing them. And that’s not a bad thing, as long as expectations are aligned.
No one wants to be the one saying, “I don’t get it.” So teams overspend, overpromise, and over-automate just to keep up appearances. But the most thoughtful leaders we met weren’t rushing. They were asking better questions: What process are we trying to improve? What failure modes do we expect? Who owns the outcomes?
Most of this playbook has already said it, so we’ll keep this part short.
Don’t fall for the myth that everyone else has it figured out. They don’t. If you’re experimenting with agents, start small. Scope tightly. Choose outcomes you can measure. Ask vendors sharp questions. And resist the urge to copy someone else’s AI strategy; you know your business best.
That said, what we’re seeing is also healthy. The doubt, the discovery, the urgency — all signs of a market waking up, not giving up.
You’re not falling behind if you’re still figuring it out. You’re doing it right.
That’s what we’re betting on at Petavue. And we’re just as excited to see where this goes next.