
I Built an Agentic Support Workflow Before "Agentic" Was a Word. Here's What It Actually Does.

By Felix Maru · June 2, 2026 · 7 min read

The 2026 Gartner Hype Cycle for Agentic AI landed last month and put the technology squarely at the Peak of Inflated Expectations. Only 17% of organizations have deployed AI agents to date, but more than 60% expect to do so within two years. Cue the enterprise strategy decks, the consulting proposals, the vendor webinars.

I've been running something like this in production for over a year. I didn't call it an AI agent. I called it "the triage workflow." It lives in n8n, it talks to the Claude API, and it handles a meaningful slice of our incoming support queue without a human touching it. Here's what I actually built, what it does well, and what it quietly gets wrong.

The Problem I Was Trying to Solve

Our support queue at a US-based company I work with had a predictable shape. Roughly a third of incoming tickets were variations of the same small set of requests: password resets, access-level questions, account unlock requests, and "where do I find X" questions whose answers existed almost verbatim in our internal knowledge base. They weren't hard tickets. They were expensive tickets, because every one of them pulled an agent away from work that actually needed human attention.

I'd already automated parts of the workflow with n8n: routing by tag, acknowledgement replies, and a Slack alert when a VIP ticket came in. But everything still ended at a human. The triage was manual. The knowledge base lookup was manual. The templated reply was manual, even when the answer was word-for-word in a doc we'd written two years ago.

The question I asked myself was simple: what if I handed the classification and first-response decision to Claude instead of to a human?

How the Workflow Is Actually Built

The architecture is not complicated. What makes it work is the prompt design and the confidence gating: the logic that decides when to act and when to stop and wait for a person.

The core loop looks like this:

Trigger: A new ticket arrives via webhook from our helpdesk tool. n8n picks it up and extracts subject, body, requester email, and any prior ticket history for that user.
Classification: n8n sends the ticket content to the Claude API with a structured prompt. Claude returns a JSON object containing a category (one of eight defined types), a confidence score from 0 to 1, a flag for whether the ticket contains personally identifiable information (PII) that should stay human-handled, and a suggested response draft if confidence is above 0.85.
Routing decision: A Switch node in n8n checks the confidence score. Above 0.85 with a non-sensitive category, the suggested response gets queued for sending with a 10-minute delay so a human can review in the queue before it fires. Below 0.85, or if PII is flagged, the ticket routes directly to a human with Claude's classification notes attached as an internal comment, so the agent starts with context, not a blank ticket.
Knowledge base retrieval: For the auto-response path, n8n runs a separate lookup against our documentation index before the response sends. If Claude's draft references something we can't verify in the knowledge base, the ticket's confidence is marked down and it falls to the human queue regardless of the original score.
Logging: Every decision, whether auto-handled or human-escalated, goes into a simple Google Sheet along with its confidence score, category, and response time. That's the feedback loop. I review it weekly and update the classification prompt when I see consistent miscategorisation.
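The classification and routing steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the production workflow (which runs as n8n nodes): the eight category labels, the `contains_pii` and `draft` field names, the sensitive-category set, and the `kb_verified` input are all hypothetical stand-ins, and "above 0.85" is read as strictly greater than. The design choice worth keeping is that anything malformed or off-schema defaults to the human queue.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels: the article names some ticket types but not the exact
# eight categories, so these placeholders stand in for them.
CATEGORIES = frozenset({
    "password_reset", "access_level", "account_unlock", "where_to_find",
    "billing", "integration_issue", "bug_report", "feature_request",
})
# Hypothetical: which categories count as "sensitive" is a policy choice.
SENSITIVE_CATEGORIES = frozenset({"billing"})
AUTO_THRESHOLD = 0.85  # threshold from the article


@dataclass
class Classification:
    category: str
    confidence: float
    contains_pii: bool
    draft: Optional[str]


def parse_classification(raw: str) -> Optional[Classification]:
    """Parse the model's JSON reply; return None if anything is malformed."""
    try:
        data = json.loads(raw)
        category = data["category"]
        confidence = float(data["confidence"])
        contains_pii = data["contains_pii"]
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None
    # Reject unknown categories (novel ticket shapes), out-of-range scores
    # and non-boolean PII flags.
    if not isinstance(category, str) or category not in CATEGORIES:
        return None
    if not 0.0 <= confidence <= 1.0 or not isinstance(contains_pii, bool):
        return None
    draft = data.get("draft")
    if not isinstance(draft, str) or not draft.strip():
        draft = None
    return Classification(category, confidence, contains_pii, draft)


def route(result: Optional[Classification], kb_verified: bool) -> str:
    """Return "auto" (queue the draft behind the review delay) or "human"."""
    if result is None:
        return "human"  # unparseable or off-schema output never auto-sends
    if result.contains_pii or result.category in SENSITIVE_CATEGORIES:
        return "human"
    # "Above 0.85" is read here as strictly greater than the threshold.
    if result.confidence <= AUTO_THRESHOLD or result.draft is None:
        return "human"
    if not kb_verified:
        return "human"  # draft cites something the KB lookup could not confirm
    return "auto"
```

Routing on a parsed, validated object rather than on raw model text means a schema violation can never reach the auto-send path.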

The workflow doesn't replace an agent's judgment on hard tickets. It replaces an agent's time on easy ones, and sends every ticket to a human with better context than it arrived with.

The Numbers, Six Months In

I'm cautious about citing specific figures because the volume varies week to week, and I've seen other people in this space overstate results in ways that don't survive scrutiny. What I can say directionally:

Roughly 35-40% of tickets now receive an initial automated response without a human composing it from scratch. That doesn't mean zero human involvement (someone still reviews before send), but the labour per ticket drops substantially for that category.
Average first-response time on the auto-handled tier is well under two minutes. Before the workflow, first response on those same ticket types averaged around two hours because they sat in queue behind higher-priority work.
False positives (cases where the workflow auto-responded incorrectly) have been consistently below 5% since the first month. The confidence threshold earns its keep.
The most common miscategorisation is billing-adjacent questions that look like access requests. I've rewritten the prompt's category descriptions three times to tighten that boundary. It's better but not perfect.
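The weekly log review can be reduced to a few aggregates. A sketch, assuming the sheet is exported as rows with hypothetical columns `route`, `category`, `confidence`, `response_seconds`, and a reviewer-set `marked_wrong` flag. The article doesn't say how incorrect auto-replies are recorded, so that column (and measuring the false-positive rate over auto-handled tickets only) is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Iterable


@dataclass
class LogRow:
    route: str                  # "auto" or "human"
    category: str
    confidence: float
    response_seconds: float     # time to first response
    marked_wrong: bool = False  # assumed column: reviewer flags a bad auto-reply


def weekly_summary(rows: Iterable[LogRow]) -> dict:
    """Aggregate one week of log rows into the figures worth reviewing."""
    rows = list(rows)
    auto = [r for r in rows if r.route == "auto"]
    wrong = [r for r in auto if r.marked_wrong]
    return {
        "tickets": len(rows),
        "auto_share": len(auto) / len(rows) if rows else 0.0,
        # False-positive rate measured over auto-handled tickets only.
        "false_positive_rate": len(wrong) / len(auto) if auto else 0.0,
        "median_auto_response_s": (
            median(r.response_seconds for r in auto) if auto else None
        ),
        # Where wrong auto-replies cluster, e.g. billing vs. access confusion.
        "wrong_by_category": Counter(r.category for r in wrong),
    }
```

The `wrong_by_category` counter is the part that points at prompt edits: a category that keeps showing up there is a boundary the prompt hasn't drawn clearly enough yet.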

The number I care most about isn't any of those. It's the agent hours freed up for actual problem-solving. When your best support people aren't spending 40% of their day on templated replies, that capacity goes somewhere: escalations get faster, complex tickets get more thorough diagnosis, and the team burns out less.

What It Gets Wrong

I want to be direct about the failure modes because most write-ups about AI automation skip this part.

Confident wrong answers. The confidence score is not a reliability guarantee; it's a pattern-match score. On a handful of occasions, Claude returned high-confidence responses that were technically accurate but contextually wrong for that specific user's situation. The 10-minute review window exists partly for this. But anyone building on this pattern needs to accept that a high confidence score and a correct answer are not the same thing, and design accordingly.
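One way to model the 10-minute review window outside n8n is a queued reply that only becomes sendable once the window has elapsed and nobody has pulled it. This is a hypothetical sketch (the `QueuedReply` class and its fields are illustrative, not part of the workflow described here); inside n8n the delay itself would more likely be a Wait node followed by a status check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REVIEW_WINDOW = timedelta(minutes=10)  # review delay from the article


@dataclass
class QueuedReply:
    ticket_id: str
    draft: str
    queued_at: datetime
    cancelled_by: Optional[str] = None  # reviewer who pulled the draft, if any

    def cancel(self, reviewer: str) -> None:
        """A human spotted a problem; the draft must never fire."""
        self.cancelled_by = reviewer

    def should_send(self, now: datetime) -> bool:
        """Sendable only after the full window, and only if nobody cancelled."""
        return self.cancelled_by is None and now - self.queued_at >= REVIEW_WINDOW
```

The default is "hold": a reviewer has to act to stop a send, but nothing goes out early, so a confidently wrong draft always has the full window in which to be caught.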

Novel ticket shapes. When a ticket arrives in a category the workflow hasn't seen before (a new integration failure, a compliance-specific question, anything outside the eight defined categories), classification breaks down gracefully rather than dangerously: it flags low confidence and escalates. But it adds latency. A human would have recognised immediately that this ticket was unusual. The workflow takes slightly longer to conclude that it doesn't know.

Conversation threads. The current workflow handles ticket openers well. Follow-ups in a thread are harder because context accumulates across messages in ways that change what the correct response is. I haven't fully solved this. Right now, any ticket with more than two prior replies goes to a human by default, regardless of confidence score. I added that rule because I watched too many mid-thread replies get handled incorrectly. It's a blunt solution. I'm working on a better one.
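The thread cut-off works as an override layered on top of the confidence gate. A minimal sketch, assuming the helpdesk exposes a count of prior replies on each ticket; `apply_thread_rule` is a hypothetical name. The rule can only downgrade a ticket to the human queue, never promote one.

```python
MAX_PRIOR_REPLIES = 2  # the article's cut-off: more than two prior replies goes to a human


def apply_thread_rule(model_route: str, prior_reply_count: int) -> str:
    """Override the confidence-based route for long threads.

    model_route is whatever the confidence gate decided ("auto" or "human").
    """
    if prior_reply_count < 0:
        raise ValueError("prior_reply_count cannot be negative")
    if prior_reply_count > MAX_PRIOR_REPLIES:
        return "human"  # blunt by design: thread context is not trusted yet
    return model_route
```

Keeping this as a separate step, rather than folding it into the prompt, means the rule holds no matter what score the model returns.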

Why You Don't Need to Wait for Enterprise AI to Start

Every vendor at every conference this year will tell you the same thing: agentic AI is coming, it will transform your support operations, and you should get on the waitlist for their platform. Some of that is real. Most of it will slip another 18 months.

The tools to build a working, production-grade triage workflow exist right now. n8n is source-available under a fair-code licence and can be self-hosted, or you can run it on their cloud. The Claude API has rate limits that scale with usage. The whole thing costs less per month than a single enterprise software seat.

What it costs is time to build and judgment to tune. That's the part the vendors can't sell you in a box. You have to understand your ticket categories well enough to define them clearly in a prompt. You have to set thresholds that match your team's risk tolerance. You have to review the logs and iterate when the categorisation drifts.

None of that requires a large team or a large budget. It requires someone willing to sit with the data for a few hours a week and treat the workflow as a product rather than a deployment.

What I'd Build Next

The triage workflow is one node in what I think a real agentic support stack looks like. The next pieces I'm working toward:

Automated knowledge-base maintenance. Right now I update docs manually when I spot gaps in the logs. I want a workflow that identifies recurring questions that don't match existing documentation and drafts new articles for human review, closing the feedback loop from tickets to knowledge base without my having to bridge it manually.
Smarter thread context. Passing full conversation history into the classification prompt gets expensive quickly. I'm experimenting with summarisation as an intermediate step: compress the prior thread, then classify against the summary. Early results are promising but not ready for production yet.
Proactive ticket prevention. The best ticket is the one that never gets submitted. I want to use the pattern data from two years of tickets to identify which product changes, documentation gaps, or communication failures are generating the most predictable support load, and surface that to the product and ops teams before the tickets arrive.
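The summarise-then-classify idea from the thread-context item can be sketched with the two model calls passed in as plain functions, which keeps the example runnable without network access. Everything here is hypothetical: `summarise` and `classify` would each wrap a Claude API call in practice, and keeping the newest message verbatim while summarising only the older ones is one assumed design, not a tested one.

```python
from typing import Callable, Sequence


def build_thread_input(
    messages: Sequence[str],
    summarise: Callable[[str], str],
    keep_recent: int = 1,
) -> str:
    """Compress older thread messages into a summary; keep the newest verbatim.

    messages: the thread in chronological order, opener first.
    summarise: hypothetical hook, e.g. a cheaper model call returning a short summary.
    """
    if not messages:
        raise ValueError("thread is empty")
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    parts = []
    if older:  # only pay for a summary when there is history to compress
        parts.append("Summary of earlier messages:\n" + summarise("\n\n".join(older)))
    parts.append("Latest message(s):\n" + "\n\n".join(recent))
    return "\n\n".join(parts)


def classify_thread(
    messages: Sequence[str],
    summarise: Callable[[str], str],
    classify: Callable[[str], str],
) -> str:
    """Two-step pipeline: compress the thread, then classify the compressed input."""
    return classify(build_thread_input(messages, summarise))
```

Because only the older messages are compressed, the classifier still sees the exact wording of the message it is responding to, which is where a follow-up's new request is most likely to be.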

That last one is the most interesting to me because it moves the intervention point from "after the problem" to "before the problem." That's the difference between a support function and an ops function. It's the direction I think the best teams are heading, regardless of what the hype cycle is calling it this year.

If you're working on something similar or want to talk through the specifics of how I've structured the prompt or the confidence-gating logic, reach out. Happy to share more detail than fits here.