Evals or Guardrails or Both?
What Every AI Team Needs to Know
Your AI agent just processed a refund it shouldn’t have. Here’s how that should have been caught—and how to make sure it never happens again.
Your team just shipped an AI agent. Not just a chatbot—an actual agent. It can look up orders, process refunds, update shipping addresses. Users love it. Support tickets are down 40%.
Then one morning, you wake up to an incident report. A customer asked for a refund on an order that was already delivered. The agent looked it up, saw “delivered,” and processed the refund anyway. $847. Gone. And it turns out this has happened 23 times this week.
Or maybe it’s a coding agent your engineering team uses. A developer asked it to write a database function. The agent delivered clean, working code—with a hardcoded password and a SQL injection vulnerability baked right in. It made it past code review because the senior dev was on vacation.
This is the reality of production AI in 2025. We’re not just building chatbots anymore. We’re building agents—systems that can reason, use tools, execute code, and take real-world actions. And when these agents make mistakes, those mistakes have consequences.
That’s where two critical systems come in: Guardrails and Evals.
Most teams conflate these. They shouldn’t. Understanding the difference—and when to use each—is the difference between agents you can trust and ones that are ticking time bombs.
Chapter 1: The Core Difference
Let me give you the simplest mental model:
Guardrails are your seatbelt—they prevent harm in real-time.
Evals are your crash test rating—they tell you if you’re safe to drive.
Remember that refund incident from the opening? A guardrail would have caught it. Before the agent’s process_refund() call executed, a guardrail would have checked: “Wait—this order is marked as delivered. Block this action and ask the customer for more information.”
The eval, on the other hand, would have told you something different: “23% of your refund requests this week resulted in the agent trying to refund delivered orders. Your model has a reasoning problem. Fix the system prompt.”
See the difference? The guardrail stopped the bad thing from happening. The eval measured how often the agent tried to do the bad thing—so you can fix the root cause.
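To make that concrete, here's a minimal sketch of the same business rule living in both systems. The Order shape and helper names are illustrative, not any particular framework's API:

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    status: str    # e.g. "pending", "shipped", "delivered"
    amount: float

# Guardrail: runs inline, before process_refund() executes. Binary.
def refund_guardrail(order: Order) -> bool:
    return order.status != "delivered"   # False = block the action now

# Eval: runs later over logged refund attempts. A rate, not a block.
def delivered_refund_rate(attempts: list[Order]) -> float:
    bad = sum(1 for o in attempts if o.status == "delivered")
    return bad / len(attempts)

week = [
    Order("A-1042", "delivered", 847.00),
    Order("A-1043", "shipped", 59.99),
    Order("A-1044", "pending", 14.50),
    Order("A-1045", "delivered", 120.00),
]
print(refund_guardrail(week[0]))     # False: block this refund
print(delivered_refund_rate(week))   # 0.5: time to fix the system prompt
```

Same rule, two jobs: one stops a single bad action in the moment, the other tells you how often the agent keeps trying it.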
In the old days of simple chatbots, this was straightforward. User sends message, LLM responds, guardrail checks the response, done. But agents are different. They don’t just talk—they act. They call tools. They execute code. They make decisions in loops.
Think about what your agent actually does:
Customer asks for refund → Agent reasons → Calls lookup_order()
↓
Gets order data → Agent reasons again → Tries process_refund()
That’s the danger zone. Between “agent decides to act” and “action executes”—that’s where guardrails live. They’re the checkpoint that says: “Hold on. Let me verify this is safe before you do something irreversible.”
Evals come later. They look at the whole trace—all the reasoning, all the tool calls, all the decisions—and score it. They tell you whether your agent is getting better or worse over time.
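Here's a sketch of that checkpoint as code. The guardrail registry, guarded_call, and GuardrailBlocked are hypothetical names, assuming every tool call is routed through one wrapper:

```python
from typing import Any, Callable

class GuardrailBlocked(Exception):
    pass

# Hypothetical registry mapping tool names to checks. Each check returns
# a reason string to block the call, or None to allow it.
GUARDRAILS: dict[str, list[Callable[[dict], str | None]]] = {
    "process_refund": [
        lambda args: "order already delivered"
        if args.get("status") == "delivered"
        else None,
    ],
}

def guarded_call(tool_name: str, tool_fn: Callable[..., Any], **args: Any) -> Any:
    """The checkpoint: every tool call the agent makes passes through here."""
    for check in GUARDRAILS.get(tool_name, []):
        reason = check(args)
        if reason is not None:
            raise GuardrailBlocked(f"{tool_name} blocked: {reason}")
    return tool_fn(**args)   # only executes if every check passed

def process_refund(order_id: str, status: str) -> str:
    return f"refunded {order_id}"

try:
    guarded_call("process_refund", process_refund,
                 order_id="A-1042", status="delivered")
except GuardrailBlocked as exc:
    print(exc)   # process_refund never ran; nothing irreversible happened
```

Everything that flows through the wrapper also lands in the trace, which is exactly what the evals score afterward.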
Quick Comparison

| | Guardrails | Evals |
| --- | --- | --- |
| When they run | On every action, in real-time | Async, often sampled |
| Latency | Must be fast (<100ms) | Can take seconds or minutes |
| Output | Binary: block or allow | Nuanced scores (0.0 to 1.0) |
| Purpose | Prevents harm before it happens | Measures quality for improvement |
Chapter 2: The Development Lifecycle
Here’s what confuses most teams: evals aren’t just a production thing. They’re your primary tool during development. Guardrails, on the other hand, are infrastructure—you set them up once and they run automatically.
Development
Evals: Heavy | Guardrails: Optional
You’re building a new feature. You write eval tests constantly: “Does my new prompt maintain relevancy?” “Did I break faithfulness with this change?”
Pull Request / CI
Evals: Gate PRs
Your eval suite runs automatically. If quality scores drop below baseline, the PR is blocked. No exceptions. No “we’ll fix it later.”
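Here's roughly what that gate can look like as a pytest suite. run_eval_suite() and the baseline numbers are placeholders for your own eval harness and historical scores:

```python
import pytest

# Baselines come from your last known-good run; the numbers are illustrative.
BASELINES = {"faithfulness": 0.90, "relevancy": 0.85}

def run_eval_suite() -> dict[str, float]:
    # Placeholder: run your eval cases (LLM-as-judge, assertions, etc.)
    # and return the mean score per metric.
    return {"faithfulness": 0.92, "relevancy": 0.88}

@pytest.fixture(scope="session")
def scores() -> dict[str, float]:
    return run_eval_suite()   # run once, check every metric against it

@pytest.mark.parametrize("metric,baseline", list(BASELINES.items()))
def test_no_quality_regression(scores, metric, baseline):
    assert scores[metric] >= baseline, (
        f"{metric} fell to {scores[metric]:.2f}, below the "
        f"{baseline:.2f} baseline. This PR is blocked."
    )
```

Wire it into CI so a failing eval fails the build, the same way a failing unit test does.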
Staging
Both: Full
Production-like environment. Full guardrails enabled, comprehensive eval suite runs against real-ish traffic patterns.
Production
Guardrails: 100% | Evals: 5-10%
Guardrails run on every single request—they’re your safety net. Evals sample 5-10% of traffic for monitoring, trending, and alerting.
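In code, the production split can be as simple as this sketch, assuming an agent.run() that returns a response plus its trace and an eval_queue backed by whatever job system you use:

```python
import random

EVAL_SAMPLE_RATE = 0.05   # score 5% of traffic async; tune to your volume

def handle_request(request, agent, output_guardrails, eval_queue):
    response, trace = agent.run(request)      # tool calls already guarded inline
    for check in output_guardrails:           # runs on 100% of requests
        response = check(response)
    if random.random() < EVAL_SAMPLE_RATE:    # 5-10% of requests
        eval_queue.enqueue(trace)             # scored seconds later, off the hot path
    return response
```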
Key insight: Product teams use evals heavily during development—they own their quality definitions and test cases. Safety guardrails (toxicity, jailbreaks) often make sense to centralize for consistency. But domain-specific checks? Those stay close to the product team that understands the context.
Chapter 3: The Shoe That Didn’t Exist
Let’s walk through a real scenario. You’re running an e-commerce site and you’ve built an AI product assistant. A customer is looking for running shoes for their first marathon.
The Request
Customer:
“I’m training for my first marathon in April. I need good running shoes but don’t want to spend more than $150. I usually wear Nike but open to other brands.”
AI Assistant (Raw Output):
“Great choice training for a marathon! Here are my top recommendations:
- Nike Pegasus 40 – $129.99 – Great cushioning for long runs
- Brooks Ghost 15 – $139.99 – Excellent for beginners
- Nike Infinity Pro 3 – $144.99 – Superior energy return
The Pegasus 40 is a best-seller and only 3 left in stock – I’d grab it before it’s gone!”
Looks helpful, right? But there are three problems hiding in this response.
What Guardrails Catch (Real-time)
Product Existence Check
Guardrail queries the product catalog: Does “Nike Infinity Pro 3” exist?
Result: NO. This product doesn’t exist. The AI hallucinated it. Guardrail removes it from the response.
Price Validation
Guardrail checks current prices: Is Brooks Ghost 15 actually $139.99?
Result: NO. Current price is $149.99. Guardrail corrects the price.
Inventory Urgency Check
Guardrail checks: Are there really “only 3 left” of Nike Pegasus 40?
Result: NO. There are 847 in stock. This is fake urgency. Guardrail removes the claim.
Total guardrail time: 43ms
Actions taken: Removed hallucinated product, corrected price, removed false urgency claim
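Here's a compressed sketch of those three checks as a single grounding pass. The catalog contents, recommendation shape, and stock threshold are all made up for illustration:

```python
CATALOG = {
    "Nike Pegasus 40": {"price": 129.99, "stock": 847},
    "Brooks Ghost 15": {"price": 149.99, "stock": 312},
}
URGENCY_THRESHOLD = 10   # only allow "only N left" claims when stock is truly low

def ground_recommendations(recs: list[dict]) -> list[dict]:
    grounded = []
    for rec in recs:
        item = CATALOG.get(rec["name"])
        if item is None:
            continue                           # hallucinated product: remove it
        rec["price"] = item["price"]           # correct any stale or wrong price
        if rec.get("urgency") and item["stock"] > URGENCY_THRESHOLD:
            rec["urgency"] = None              # strip fake scarcity claims
        grounded.append(rec)
    return grounded

raw = [
    {"name": "Nike Pegasus 40",     "price": 129.99, "urgency": "only 3 left"},
    {"name": "Brooks Ghost 15",     "price": 139.99, "urgency": None},
    {"name": "Nike Infinity Pro 3", "price": 144.99, "urgency": None},
]
for rec in ground_recommendations(raw):
    print(rec)
# Pegasus keeps its price but loses the urgency claim; Ghost is corrected
# to $149.99; the nonexistent Infinity Pro 3 disappears entirely.
```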
What the Customer Actually Sees
AI Assistant (After Guardrails):
“Great choice training for a marathon! Here are my top recommendations:
- Nike Pegasus 40 – $129.99 – Great cushioning for long runs
- Brooks Ghost 15 – $149.99 – Excellent for beginners
Both are well-suited for marathon training within your budget.”
What Evals Measure (Async)
Meanwhile, this interaction gets sampled for evaluation. An eval job runs asynchronously, taking about 2-3 seconds using LLM-as-judge to score the full interaction.
The eval results get logged. Over time, the team sees patterns: “Hallucination rate is 12% this week—up from 8% last week. Something changed. We need to investigate the prompt.”
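Here's a sketch of that judge call and the trend it feeds. call_llm() stands in for whatever model client you use, and the rubric is illustrative:

```python
JUDGE_PROMPT = """You are grading an AI shopping assistant.
Catalog facts: {facts}
Assistant's raw response: {response}
Did the response mention only real products, correct prices, and true
inventory levels? Reply with a single score from 0.0 to 1.0."""

def call_llm(prompt: str) -> str:
    raise NotImplementedError   # stand-in for your model API client

def judge_faithfulness(trace: dict) -> float:
    prompt = JUDGE_PROMPT.format(facts=trace["catalog_facts"],
                                 response=trace["raw_response"])
    return float(call_llm(prompt))   # the raw shoe response would score low

def hallucination_rate(scores: list[float], threshold: float = 0.5) -> float:
    """Fraction of sampled traces failing the faithfulness bar: the number
    that trends from 8% to 12% and triggers an investigation."""
    return sum(s < threshold for s in scores) / len(scores)
```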
The difference: Guardrails prevented the customer from seeing wrong information. Evals measured that there’s a quality problem to fix at the source.
The Bottom Line
Guardrails are your seatbelt—they prevent the crash.
Evals are your crash test rating—they tell you if you’re safe to drive.
You need both.
Remember where we started? The customer service agent that refunded $847 for a delivered order. The coding assistant that shipped a password to production.
With guardrails, those incidents never would have happened. The refund would have been blocked before execution. The code would have been sanitized before the developer even saw it.
With evals, you would have known the problems were brewing. You’d have seen “23% of refund attempts are for delivered orders” in your dashboard before the first one slipped through. You’d have known your coding agent had a 15% rate of generating credentials in code.
The teams that build AI agents without both of these systems aren’t being bold—they’re being reckless. And in 2025, when agents can execute code, process payments, and take actions that affect real people and real money, reckless isn’t an option.
Build the guardrails. Run the evals. Trust your agents—but verify.
