The Email My n8n Agent Almost Sent, And Why I Now Require Approval for Every Outbound Action

By Felix Maru · June 12, 2026 · 8 min read

The workflow had been running clean for six weeks. Every day it picked up lower-priority support tickets, passed them to Claude via the API, received a draft reply, and sent the email. I reviewed roughly 5% of outgoing replies at random. They looked fine. Ticket first-response time dropped from around four hours to under 20 minutes without me touching anything.

Then on a Tuesday morning I was scrolling the sent folder looking for something unrelated, and I saw it.

A reply addressed to one customer by name, acknowledging their specific billing concern in detail. Except the customer named in the salutation and the customer whose billing issue it described were two different people. The agent had crossed the contexts of two open tickets. It had the right tone, the right template, and the right account detail, all from entirely the wrong ticket.

I pulled the thread, confirmed no one had opened it yet, corrected it manually, sent an apology, and spent the next hour deciding how much trust I still had in the workflow. By end of morning the decision was made: full send authority for outbound customer email was coming off the table until I had an approval gate that actually worked.

The Hack That Worked Until It Didn't

My first attempt at a gate was n8n's Wait node. The agent drafted the email, the workflow paused, a Slack message went out with the draft text, and the flow waited for me to send a specific reply before continuing. Functional in principle. In practice it was fragile: the resume logic broke if I missed the message window, timeouts killed the workflow quietly, and if I was in a meeting when the Slack message landed, the whole thing stalled.

What I actually needed wasn't a workaround. I needed the gate built into the tool call itself.

In May 2026, n8n shipped Human-in-the-Loop (HITL) as a first-class feature on the AI Agent node, applied at the tool level. It is meaningfully different from the Wait-node approach, and it's the thing I'd been patching toward for months.

How It Actually Works Now

The setup is straightforward: on any tool your agent can call, you flip a toggle to require human review. When the agent decides to invoke that tool (send_customer_email, for example), the workflow pauses before execution and surfaces a decision request through your configured channel: Slack, Telegram, or the n8n built-in chat interface. You see exactly which tool the agent wants to use and with what parameters: the recipient address, the ticket reference, the full draft body.

You approve or deny. If approved, the tool executes. If denied, the agent receives that signal and can attempt a different path: rephrase the draft, flag the ticket for manual handling, or close the ticket with a note.

What makes this different from the Wait-node hack is where the pause happens. In the old approach, the pause was at the workflow level: context could get stale, the resume was a separate trigger, and the agent had no real awareness of what had happened during the wait. In the native HITL implementation, the pause happens at the tool-execution layer. The agent's working context is preserved. It knows it was waiting, it knows what you said, and it continues with that information rather than starting fresh or failing silently.
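To make the tool-level idea concrete, here is a minimal Python sketch of the pattern. It is not n8n's API or implementation; the names gated, Decision, ToolResult, and the stand-in reviewer are all hypothetical. It only illustrates the shape: the pause wraps a single tool call, the reviewer sees the tool name and its exact parameters, and a denial (with its note) is returned to the caller instead of the tool running.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Decision:
    approved: bool
    note: Optional[str] = None  # optional reviewer note handed back to the agent


@dataclass
class ToolResult:
    executed: bool
    output: Any = None
    reviewer_note: Optional[str] = None


def gated(tool: Callable[..., Any],
          ask_human: Callable[[str, dict], Decision]) -> Callable[..., ToolResult]:
    """Wrap a tool so a human decision is required before it runs."""
    def wrapper(**params: Any) -> ToolResult:
        # Pause point: the reviewer sees the tool name and the exact parameters.
        decision = ask_human(tool.__name__, params)
        if not decision.approved:
            # The tool never runs; the denial and its note go back to the agent.
            return ToolResult(executed=False, reviewer_note=decision.note)
        return ToolResult(executed=True, output=tool(**params))
    return wrapper


sent = []  # stands in for the real outbox


def send_customer_email(recipient: str, ticket_ref: str, body: str) -> str:
    # Hypothetical tool: records the email instead of sending it.
    sent.append((recipient, ticket_ref, body))
    return "sent"


def reviewer(tool_name: str, params: dict) -> Decision:
    # Hypothetical stand-in for a human: deny drafts that never mention the ticket.
    if params["ticket_ref"] not in params["body"]:
        return Decision(False, "Draft does not match the ticket")
    return Decision(True)


send = gated(send_customer_email, reviewer)
denied = send(recipient="a@example.com", ticket_ref="T-101", body="About T-202: ...")
approved = send(recipient="a@example.com", ticket_ref="T-101", body="Re T-101: refund issued")
```

In this sketch the denied call leaves the outbox untouched and carries the reviewer's note back, which is the part the Wait-node approach never gave me.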

Human-in-the-loop doesn't mean you're doing the work. It means you're the last check before the agent's decision becomes irreversible.

What I Actually Configured

My current agent setup for the email-handling workflow has four tools. Here's what got a gate and what didn't:

send_customer_email: HITL on. Every outbound email requires approval before sending. No exceptions.
search_ticket_history: no gate. Reading ticket data is non-destructive. The agent can call this as many times as it needs to.
update_ticket_status: no gate. Changing a queue status is reversible in under a minute if it's wrong.
create_internal_note: no gate. Internal only, no customer exposure, easy to edit or delete.

The Slack approval message I see shows: customer name, email address, ticket subject, and the full draft body. One button to approve, one to deny with an optional short note that the agent receives as context.
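Written down as data, that configuration looks roughly like the Python sketch below. This is an illustration, not an export of my n8n settings: the GATED_TOOLS table, requires_approval, and approval_message are hypothetical names, and treating an unlisted tool as gated is an assumption I've added for the example.

```python
# Gate settings mirroring the four tools described above.
GATED_TOOLS = {
    "send_customer_email": True,     # outbound, customer-facing
    "search_ticket_history": False,  # read-only
    "update_ticket_status": False,   # quickly reversible
    "create_internal_note": False,   # internal only
}


def requires_approval(tool_name: str) -> bool:
    # Assumption: a tool missing from the table is gated by default.
    return GATED_TOOLS.get(tool_name, True)


def approval_message(customer_name: str, email: str, subject: str, draft: str) -> str:
    # The four fields the reviewer sees before choosing approve or deny.
    return "\n".join([
        f"Customer: {customer_name}",
        f"Email: {email}",
        f"Ticket subject: {subject}",
        "Draft:",
        draft,
    ])


message = approval_message(
    "Ana Ruiz", "ana@example.com", "Charged twice", "Hi Ana, we have refunded the duplicate charge."
)
```

Defaulting unknown tools to gated means a tool added later starts out behind an approval until someone decides otherwise.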

Since enabling HITL on May 22nd, I've reviewed around 230 drafts. I've denied 12, roughly 5%. Of those 12, three had genuine factual errors (wrong customer reference, wrong product name, one where Claude answered a question that wasn't actually in the ticket at all). The other nine I denied because the tone or framing was wrong for that specific customer situation, nuance that system prompts can capture in general but rarely hold perfectly in every individual case.

That's a roughly 1-in-20 catch rate. Without the gate, at the volume this workflow runs, that's several bad emails a month going out before anyone notices.
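The arithmetic behind those figures, as a short Python sketch. The counts are the ones above. The review window is an assumption: it treats the period as running from May 22 to this post's date of June 12, and the monthly projection assumes volume stays steady.

```python
from datetime import date

drafts_reviewed = 230
denied = 12
factual_errors = 3  # the subset of denials with genuine factual errors

# Assumption: the review window runs from the day HITL was enabled to the post date.
window_days = (date(2026, 6, 12) - date(2026, 5, 22)).days

denial_rate = denied / drafts_reviewed                    # share of drafts denied
denials_per_30_days = denied / window_days * 30           # projected, steady volume
factual_per_30_days = factual_errors / window_days * 30   # projected, steady volume

print(f"window: {window_days} days")
print(f"denial rate: {denial_rate:.1%}")
print(f"projected denials per 30 days: {denials_per_30_days:.1f}")
print(f"projected factual errors per 30 days: {factual_per_30_days:.1f}")
```

Under those assumptions, the denial rate is about 5.2%, and even counting only the factual errors the projection lands at roughly four a month.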

HITL vs Human-on-the-Loop, Which One You Actually Need

HITL has a real cost: latency. If you require approval for every tool call, you've built a manual process with an agent attached, not an automated one. The speed advantage disappears and you're just adding an extra click to work you could have done yourself.

The cleaner distinction is between two patterns:

Human-in-the-loop: the agent pauses before the action. Nothing happens until a human signs off. Use this when the action is outbound-facing, irreversible, or when the agent is operating in territory you haven't tested thoroughly enough to trust yet.

Human-on-the-loop: the agent acts, then a human reviews after the fact. Use this when the action is internal, reversible, and the agent's error rate in that domain is low enough that spot-checking is sufficient.

In my workflow, send_customer_email is firmly in the first category. Ticket status updates and internal notes are in the second. The boundary between them isn't the complexity of the task. It's the reversibility and the exposure.
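The rule of thumb can be sketched as a small decision function. This is one way to encode the two definitions above, not a feature of n8n; ReviewMode, review_mode, and the three boolean inputs are hypothetical names chosen for the example.

```python
from enum import Enum


class ReviewMode(Enum):
    IN_THE_LOOP = "human approves before the action"
    ON_THE_LOOP = "human reviews after the action"


def review_mode(outbound: bool, reversible: bool, well_tested: bool) -> ReviewMode:
    # Gate anything customer-facing, irreversible, or not yet proven.
    if outbound or not reversible or not well_tested:
        return ReviewMode.IN_THE_LOOP
    # Internal, reversible, low error rate: spot-check after the fact.
    return ReviewMode.ON_THE_LOOP


email_mode = review_mode(outbound=True, reversible=False, well_tested=True)
status_mode = review_mode(outbound=False, reversible=True, well_tested=True)
new_tool_mode = review_mode(outbound=False, reversible=True, well_tested=False)
```

Note that task complexity is not an input at all: only exposure, reversibility, and how much evidence you have.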

The mistake I see most often in agent builds is giving full autonomy over everything and only adding gates retroactively after something breaks publicly. The better habit is to default to gates on anything outbound or irreversible, prove the agent's reliability on those actions over time, and remove the gate when you're actually confident, not before.

What HITL Doesn't Fix

A gate at the tool level catches individual bad outputs. It doesn't fix a misconfigured agent. Three of my 12 denials were errors that a better system prompt would have prevented: the agent was picking up context it shouldn't have had access to, or it was conflating structured data from two concurrent tickets in a way I hadn't anticipated.

For those, the right fix is the prompt, not more approvals. I updated the context-passing logic to pass only the relevant ticket's data at call time, and I added an explicit instruction not to reference account details that don't appear in the active ticket body. The denial rate on that error type has been zero in the three weeks since.
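A minimal sketch of what "pass only the relevant ticket's data" can mean, in Python. It is not my actual n8n node logic; the Ticket fields, build_context, and other_customers_named are hypothetical, and the name check is a crude substring match added as an illustrative safety net rather than something from the workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    customer_name: str
    customer_email: str
    body: str


def build_context(open_tickets: list, active_id: str) -> dict:
    """Return only the active ticket's data for the model call."""
    matches = [t for t in open_tickets if t.ticket_id == active_id]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one ticket with id {active_id}")
    t = matches[0]
    return {
        "ticket_id": t.ticket_id,
        "customer_name": t.customer_name,
        "customer_email": t.customer_email,
        "ticket_body": t.body,
    }


def other_customers_named(draft: str, open_tickets: list, active_id: str) -> list:
    """List customers from other open tickets whose names appear in the draft."""
    lowered = draft.lower()
    return [
        t.customer_name
        for t in open_tickets
        if t.ticket_id != active_id and t.customer_name.lower() in lowered
    ]


tickets = [
    Ticket("T-101", "Ana Ruiz", "ana@example.com", "Charged twice this month"),
    Ticket("T-102", "Ben Okoye", "ben@example.com", "Cannot log in"),
]
context = build_context(tickets, "T-101")
leaks = other_customers_named("Hi Ben Okoye, about the double charge", tickets, "T-101")
```

The first function is the fix; the second is the kind of cheap check that would have flagged the crossed-ticket email before it ever reached an approval message.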

Gates catch instances. Prompt fixes stop patterns. You need both, and you should be working on both in parallel whenever you're debugging an agent's output quality.

The Workflow I Should Have Built First

If I were building this email automation from scratch today, I'd start with HITL on every outbound tool and a full-automation flag on every read-only or internal tool. I'd run it that way for at least 100 executions, track the approval rate, and narrow the gate scope based on what the data shows.
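As a sketch of what "narrow the gate scope based on what the data shows" could look like, here is a small Python helper. The 100-execution minimum comes from the paragraph above; the 1% denial-rate threshold is purely an assumption for illustration, and gate_recommendation is a hypothetical name. Pick a threshold that matches how costly a bad action is for you.

```python
def gate_recommendation(approved: int, denied: int,
                        min_executions: int = 100,
                        max_denial_rate: float = 0.01) -> str:
    """Suggest whether a tool's approval gate has earned removal."""
    total = approved + denied
    if total < min_executions:
        return "keep gate: not enough data"
    if denied / total > max_denial_rate:
        return "keep gate: denial rate too high"
    return "candidate for removing the gate"


early = gate_recommendation(approved=48, denied=2)
email_tool = gate_recommendation(approved=218, denied=12)
quiet_tool = gate_recommendation(approved=200, denied=0)
```

With the numbers from my own email tool (218 approved, 12 denied), the helper says what I already concluded: the gate stays.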

What I actually did was build the automation first, prove its speed, and only add the gate after a near-miss with a real customer. That sequencing gave me an embarrassing few weeks of false confidence in between.

The n8n HITL implementation is the right infrastructure for the problem. The version that shipped in May 2026 is genuinely better than anything I was able to hack together before it. But infrastructure doesn't replace judgment about which actions are risky enough to gate. That's still a call you have to make before you start building, not after something goes sideways.

One More Thing Worth Saying

The customers whose tickets this workflow handles don't know an agent drafted the replies. Most of the time, the drafts are good enough that there's nothing to flag. But "good enough most of the time" and "good enough to send without review" are not the same standard, and they're separated by the 5% of cases where the model confidently gets something wrong.

I don't want to be the person who discovers that gap from a customer complaint. That's the whole reason the gate exists.

If you're running n8n workflows that touch anything customer-facing and you haven't added approval gates, the May 2026 release makes it genuinely straightforward to do. If you want to compare notes on how you've configured yours, or you're trying to work out which tools in your workflow actually warrant a gate, drop me a line.