THE BRIEFING ROOM

Why your AI pilot didn't deliver (and what to do differently next time)

65% of AI pilots fail to deliver expected outcomes. Not "underperform." Fail. And according to Forrester's 2025 analysis, three out of four firms attempting advanced agentic AI independently will fail outright.

So if you're sitting there thinking "we tried AI and it didn't work," you're in the majority. Cold comfort, I know. But it matters, because the conclusion most firms draw from that experience - that AI isn't right for us, or that the technology isn't ready - is almost always wrong.

And honestly? That wrong conclusion is the thing that keeps me up at night more than the failed pilots themselves.

I've spent the last eighteen months in post-mortem conversations with managing partners, CTOs, and operations directors who ran AI pilots that went nowhere. Some spent six figures. Some spent less but burned through more goodwill internally than they can afford. There's a pattern. It's remarkably consistent. The technology wasn't the problem. The setup was.

"We tried AI. It didn't work. Our people didn't use it, the outputs weren't reliable, and we wasted six months and a significant budget. I'm not ready to go through that again."

I get it. Genuinely. That's not an irrational position. But before you write off the whole thing, ask a different question: did the pilot fail because AI doesn't work, or did it fail because of how the pilot was designed?

Because in my experience, the answer is almost always the latter. And that distinction - between a capability failure and an implementation failure - is the difference between walking away from something your competitors are going to use to beat you, and fixing something fixable.

The five reasons AI pilots actually fail

I've started keeping an informal tally. Every failed AI pilot I've been involved in diagnosing, directly or through advisory conversations, has been caused by some combination of these five things. Sometimes it's one. Usually it's three working together.

Wrong use case selection. This is the big one. I was in a meeting last year with a mid-sized consulting firm - eight people round a table, the managing partner at the head, clearly frustrated. They'd spent twelve weeks trying to use AI to automate their entire proposal process end to end. The senior partners had lost confidence because the AI kept producing proposals that were technically competent but tonally wrong for the client relationship. One partner - I remember this clearly - slid a printed AI draft across the table and said, "This reads like it was written by someone who's never met a client." He wasn't wrong. The use case wasn't bad in theory. It was just too complex for a first attempt. Meanwhile, another firm I know piloted AI to summarise internal meeting notes. It saved each person maybe four minutes a day. Nobody cared enough to keep using it. Too ambitious on one end, too trivial on the other - and both failed.

Poor data foundations. The AI tool was applied to data that was inconsistent, incomplete, or locked in systems that couldn't talk to each other. The outputs were confidently wrong - which, if you've experienced this, you'll know is worse than no output at all. A lawyer receiving an AI-drafted summary that misattributes a clause to the wrong contract doesn't just distrust the tool. They distrust the entire concept. I've written separately about why data quality matters more than fancy dashboards - the same principle applies here, except the consequences are more immediate. Poor data quality costs organisations an average of $15 million annually, and AI doesn't hide that problem. It amplifies it.

No change management. The tool landed on people's desks with a login and a two-page PDF. No explanation of what it does well and what it doesn't. No guidance on how to evaluate its outputs. No conversation about why using it is worth the initial awkwardness of changing how you work. This one drives me slightly mad, because it's so preventable. Deloitte's research shows that firms with structured change management around AI implementation see 66% productivity gains. Those without? Less than 15% adoption. That's not a technology gap. That's a people gap.

Unclear success metrics. The pilot had no agreed definition of what "working" looked like. So there was no moment at which anyone could have said confidently that it had succeeded - and no moment at which the data would have flagged that something needed adjusting. "We'll know it when we see it" is not a success metric. It's a recipe for six months of ambiguity followed by a shrug.

Vendor overpromise. I want to be careful here, because it's tempting to blame the vendor for everything, and that's rarely the whole truth. But the capabilities demonstrated in a sales demo don't always replicate in a production environment with your data, your systems, and your team's actual workflows. If the vendor showed you magic in a controlled environment and you expected the same magic with your messy data and your reluctant partners, both parties bear some responsibility for that gap.

Here's the thing I don't say enough, though: most of these failure modes are well-known. They've been written about extensively. And firms still walk into them. Which means the problem isn't information. The problem is that the people championing AI pilots internally are often under pressure to show results quickly, and that pressure produces exactly the conditions that guarantee failure - broad scope, rushed timelines, inadequate preparation. The pilot is set up to fail before it starts.

Why "it didn't work" is the wrong conclusion

Here's the reframe, and I need to say it directly: in almost every AI pilot failure I've seen, the technology worked as designed. The pilot failed because the design was wrong.

I know that's uncomfortable if you're the person who championed the pilot internally and now feels burned. The frustration was real, the wasted budget was real, and the internal credibility hit was real. All of that happened.

But the conclusion that "AI doesn't work for our firm" was based on incomplete evidence. A pilot that fails because of poor use case selection and inadequate change management tells you almost nothing about what a well-designed AI initiative would produce. It's like concluding that your car doesn't work because you put diesel in a petrol engine. The car's fine. The fuel was wrong.

McKinsey's 2025 data shows that 78% of organisations are now using AI in at least one function. But only a third scale it across the organisation successfully. The gap between those numbers isn't explained by different technology. It's explained by governance, use case discipline, and change management. The firms that are succeeding with AI are not using fundamentally different tools. They're using the same tools with better foundations underneath them.

Your competitors who are getting value from AI didn't get lucky. They got the setup right. And that's learnable.

The reset framework

If you're considering a second attempt - and I think you should be - here's the sequence that works. It's not quick, and it's not painless, but it's practical.

Reassess the use case. Go back to the beginning. Was the original use case genuinely the highest-value, lowest-risk starting point? Or was it selected because it was interesting, because a vendor recommended it, or because a senior partner had read about it on a flight? The best reset use cases are specific, contained, and measurable. Not "improve productivity" but "reduce the time to produce a first-draft engagement letter from 45 minutes to under 10 minutes, measured across ten lawyers over four weeks." If you can't describe the use case in a sentence that includes a number, it's probably still too vague.

Fix the data. What specific data quality problems affected the pilot's reliability? And what would it take to address them for a narrower, more contained use case? You don't need to solve your entire data estate. You need to solve the data problem for this one process. That might mean cleaning one dataset, connecting two systems, or simply establishing a consistent format for the information the AI tool needs to work with.

Define measurable success before you start. Three specific, observable outcomes. Written down. Agreed by the people who will judge whether the pilot worked. Not "AI will improve efficiency" but "proposal turnaround time will drop from five days to two days" or "the number of manual data entry errors in client onboarding will reduce by 50%." If you can't measure it, you can't prove it worked, and you can't learn from what didn't.
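If it helps to make that concrete, here's a minimal sketch of what "written down and agreed" could look like once it's turned into something you can check at the end of the pilot. It's an illustration, not a prescribed tool: the class and function names are my own, the targets echo the examples above, and the error baseline and all the end-of-pilot measurements are hypothetical numbers. It also assumes every metric is one where lower is better, and treats "at or below target" as met.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """One agreed, measurable pilot outcome (assumes lower values are better)."""
    name: str
    unit: str
    baseline: float  # measured before the pilot starts
    target: float    # the number everyone agreed counts as "working"

    def met(self, measured: float) -> bool:
        # Met means the measured value is at or below the agreed target.
        return measured <= self.target

    def improvement_pct(self, measured: float) -> float:
        """Percentage reduction from baseline; negative means it got worse."""
        return round((self.baseline - measured) / self.baseline * 100, 1)


# Targets echo the examples in the text; the error baseline of 40 is hypothetical.
criteria = [
    SuccessCriterion("Proposal turnaround", "days", baseline=5, target=2),
    SuccessCriterion("First-draft engagement letter", "minutes", baseline=45, target=10),
    SuccessCriterion("Onboarding data entry errors", "per month", baseline=40, target=20),
]

# Hypothetical end-of-pilot measurements, keyed by criterion name.
measured = {
    "Proposal turnaround": 2.5,
    "First-draft engagement letter": 9,
    "Onboarding data entry errors": 18,
}


def review(criteria, measured):
    """Return (name, met, improvement_pct) for each criterion."""
    return [
        (c.name, c.met(measured[c.name]), c.improvement_pct(measured[c.name]))
        for c in criteria
    ]


if __name__ == "__main__":
    for name, met, pct in review(criteria, measured):
        status = "met" if met else "NOT met"
        print(f"{name}: {status} ({pct}% better than baseline)")
```

With these made-up numbers, proposal turnaround improves by half and still misses the agreed target. That's the point of fixing the target in advance: you get a clear signal that something needs adjusting, rather than an argument about whether 50% is good enough.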

Start smaller than feels comfortable. The reset pilot should be narrower in scope, shorter in timeline, and more heavily supported than the original. I know that feels counterintuitive. You've already spent money and time, and going smaller feels like going backwards. But the goal isn't to transform the firm in one go. The goal is to produce undeniable evidence of value in a contained environment - evidence that justifies the next, larger step.

We worked with a 300-person professional services firm that had tried and failed with a firm-wide knowledge management AI tool. The post-mortem was painful - the CTO had championed it, the partners had tolerated it, and by month four everyone had quietly stopped using it. When we reset, we went narrow: automated proposal generation for one practice area only. Sixty percent reduction in proposal turnaround time. Twelve hours of senior consultant time saved per week. That evidence made the case for phases two and three far more powerfully than any business case document could have. The CTO told me afterwards that the reset felt like admitting defeat at first. It wasn't. It was the thing that actually worked.

Build internal capability alongside the tool. Involve the users in selecting and setting up the reset pilot. Provide training that's specific to the use case - not generic "here's how the platform works" sessions, but "here's how to evaluate whether this draft needs significant editing or minor tweaks." Create a feedback mechanism that surfaces problems in week two, not month six. The firms that get adoption right treat the first four weeks as a supported learning period, not a sink-or-swim deployment.

How to sell the reset internally

Right. So you're convinced the reset is worth attempting. Now you need to convince the board, the partnership, or whoever controls the budget - people who watched the first pilot fail and are understandably sceptical.

The reset conversation needs to do two things at once. First, acknowledge honestly what went wrong. Not "the vendor let us down" - because that's rarely the whole truth, and experienced leaders will see through it. More like: "We selected a use case that was too broad, our data wasn't ready, and we didn't invest enough in helping people use the tool effectively. Those are fixable problems."

Second, make a specific case for why this time will be different. Not "we're more optimistic now" - that's not a strategy. But: "We've selected a different, narrower use case. We've addressed the specific data quality issues. We've defined measurable success criteria. And the initial scope is small enough that we'll know within eight weeks whether it's working."

The reset should sound like applied learning, not repeated optimism. There's a meaningful difference between those two things, and a sceptical board will spot which one you're doing.

When to walk away (honestly)

I'd be doing you a disservice if I pretended every firm should try again. In some cases, the honest conclusion from a failed AI pilot is that the firm isn't ready yet - and the preparatory work required to make a second attempt viable is significant enough that it should be treated as a separate programme.

Three signals suggest this, and any one of them is enough. First, the data quality problems you identified aren't tractable with a cleanup sprint: they'd require a proper data governance programme, which is months of work before you even get back to AI. Second, the cultural resistance isn't scepticism (which is healthy and addressable) but genuine fear about job displacement, which requires a fundamentally different conversation before any technology initiative. Third, the firm doesn't yet have the governance maturity to review AI outputs before they have consequences, and building those structures is a prerequisite, not something you can run in parallel.

Identifying when to walk away is not defeatism. It's the honest diagnosis that prevents an expensive repeat. Some firms need to spend six months on data foundations and change readiness before an AI pilot makes sense. That's a legitimate conclusion, and it's infinitely better than burning through another budget and another round of internal credibility.

If you're trying to figure out which category your firm falls into - ready for a reset, or needing to do the groundwork first - I've written a separate piece on how we approach AI readiness that includes a framework for assessing where you actually stand across data quality, systems integration, cultural willingness, and governance. There's also an AI readiness self-assessment you can work through independently. Both are worth a look before you commit either way.

The worst outcome isn't a failed pilot. It's drawing the wrong conclusion from one.