I watched a managing partner kill an AI programme in a single sentence last autumn. He didn't mean to. He was actually trying to be supportive. The sentence was: "We tried AI and it didn't work."
That was it. Seven words. And the room - eight partners, a head of operations, and a very frustrated CTO - just deflated. Because once that sentence enters the lexicon of a professional services firm, it becomes almost impossible to dislodge. It becomes the thing people say in corridor conversations when someone brings up automation. It becomes the reason the next proposal gets a polite "not right now." It becomes expensive.
The thing is, AI had worked in that firm. Sort of. They'd run a pilot six months earlier - a document review tool that was supposed to cut hours off a particular compliance workflow. The technology did what the technology does. But the expectations around it had been set so high, so loosely, and so publicly that when the results came back as "promising but needs refinement," the partnership experienced it as failure.
That gap - between what was promised and what was delivered - is where most AI confidence goes to die in professional services. And it's almost entirely preventable.
If you've been through this, you'll recognise the shape of it immediately.
A vendor gives a compelling demo. And demos usually are compelling - they're built to be. An internal champion (often the CTO, sometimes a forward-thinking partner) gets excited and starts building organisational enthusiasm. People start talking about "transformation" and "competitive advantage." The pilot gets scoped, but because the excitement is broad, the scope is broad too. Success criteria are vague, such as "demonstrate value" or "improve efficiency," rather than specific and measurable.
The pilot runs. The data isn't as clean as the demo data was. The users weren't trained as thoroughly as they needed to be. The results are... mixed. Not bad, exactly. But definitely not the revolution that was implicitly promised. And because expectations were set at "revolution," anything short of that feels like failure.
Then the narrative hardens. We tried AI. It didn't work. And now every subsequent AI proposal has to fight not just for its own merits but against the scar tissue of the last attempt.
I've seen this play out at least a dozen times across the professional services firms we work with. I remember one particularly painful example involving a well-known legal AI platform - not naming names, but you'd recognise it - where the vendor demo had been so slick, so full of "imagine your associates getting this back in minutes," that by the time the pilot launched, the partners had mentally already fired half the document review team. The technology worked fine, actually. It just didn't work like the demo. The data was messier, the edge cases were weirder, and the associates still needed to check everything. "Promising but needs refinement" landed like a verdict. The programme was shelved within a week.
The pattern is consistent. And the prevention starts much earlier than most people think - not at the results stage, but at the scoping stage.
The single most powerful thing you can do to prevent the overpromise cycle is to be boringly, relentlessly specific about what you expect the AI to do.
Not "AI will transform our client onboarding." Instead: "This tool will reduce the average time to produce a first-draft engagement letter from 45 minutes to 10 minutes, for the twelve fee earners in the corporate team who use it." Not "AI will improve our data analysis capability." Instead: "This system will flag the 15% of client accounts that match the pattern of clients who have previously churned, so our relationship team can intervene earlier."
See the difference? The first version in each pair sounds exciting in a partner meeting but gives you absolutely nothing to measure against. The second version is testable. You can look at it after 90 days and say: did it do this or didn't it? And if it reduced the time from 45 minutes to 18 minutes instead of 10 - well, that's still a meaningful result you can build on, rather than an ambiguous disappointment.
Specific claims build or erode confidence based on evidence. Vague claims build or erode confidence based on feelings. And feelings, in a partnership of 30 or 40 people with different appetites for technology risk, are a terrible foundation for investment decisions.
McKinsey estimates that generative AI could add $2.6 trillion to $4.4 trillion annually to the global economy. Deloitte's 2026 State of AI report found 66% of organisations reporting productivity gains from AI. But - and this is the bit that should make you sit up - only one-third of organisations are managing to scale AI across the whole organisation. The gap between "AI works" and "AI works at scale in our firm" is where the real challenge lives.
The pilot is where confidence is either constructed or demolished. So treat it like the strategic exercise it is, not a technology experiment.
I've started advising firms to select pilot use cases against four criteria. High-visibility - the result will be seen by the people whose support matters for broader adoption. Low-risk - if the result disappoints, it's a learning moment, not a catastrophe. Well-defined - success criteria are agreed before the pilot begins, not retrofitted after the results are in. Representative - the use case is genuinely typical of the work the broader adoption would support, not a cherry-picked showcase that makes the technology look good but doesn't translate to real workflows.
I know that reads like a tidy framework. In practice it's messier. When we ran a 14-day assessment at a 300-person professional services firm last year, we identified over a dozen potential AI use cases. We scored each one by impact, feasibility, and risk. There were heated conversations. One partner was convinced the right starting point was a client-facing research tool that would have been genuinely impressive but had almost no chance of producing clean results in 90 days. Another wanted to start with something so internal and invisible that even a great result wouldn't have moved anyone.
The one we landed on - automated proposal generation - wasn't the most exciting on paper. But it was visible (partners saw the output every week), low-risk (a bad draft proposal doesn't hurt anyone; it just gets rewritten), well-defined (we measured turnaround time and senior consultant hours saved), and representative (every team in the firm writes proposals).
Ninety days later, proposal turnaround time had dropped by 60%, and the firm was saving roughly 12 hours of senior consultant time per week. The second and third use cases were commissioned immediately. Not because the partnership had been convinced by a presentation - because they'd seen it working with their own eyes.
Run the pilot for long enough to produce meaningful data but not so long that momentum stalls. Report honestly. What worked, what didn't, what surprised you. If the next step is more adoption, say so and explain why. If the next step is a different use case because this one didn't pan out, say that too. Honesty at this stage is worth more than spin.
"But we tried something similar and the results were inconclusive."
I hear you. Were the success criteria defined before you started? Were they specific and measurable? Was the use case selected against clear criteria, or did someone just pick the thing that seemed most impressive? In my experience, "inconclusive results" almost always trace back to inconclusive scoping. The technology rarely lets you down as badly as unclear expectations do.
The biggest risk to your AI pilot isn't the technology. It's the people.
I know that sounds like something you'd read on a conference slide. But I mean it in a specific way. I've seen the same AI product - same configuration, same vendor, same firm - produce completely different results in two teams sitting two floors apart. One team had been involved in selecting the tool, had a proper training session before they were expected to produce results, and had a named person they could go to with problems. The other had it dropped on them with a two-paragraph email and a deadline.
The first team's results were good enough to justify the next phase. The second team's results were used as evidence that the technology didn't work.
What was different? The second team was younger, more junior, and - I suspect - felt like the tool was being introduced to monitor their output rather than help them. Nobody had explained why it was being rolled out or what would happen with the data it generated. Nobody had asked them what would actually make their work easier. So they used it minimally, reported problems grudgingly, and when the results were underwhelming, they weren't surprised.
This is particularly acute in professional services, where partner autonomy is practically a constitutional right. You can't mandate AI adoption the way a corporate operations director might roll out a new system. You have to build a coalition. You have to show, not tell. And you have to be honest that yes, there will be a learning curve, and no, the first week won't be smooth.
Right, let's talk about governance. I know - not the most thrilling word in the English language. But bear with me, because getting this right is genuinely the difference between AI adoption that scales and AI adoption that stalls after one pilot.
Most firms think of governance as the thing that slows you down. The compliance layer. The bureaucratic overhead. I'd argue it's the opposite. Good governance is what gives your partnership the confidence to say yes to the next phase. It's confidence infrastructure.
What does that look like in practice?
Human review of AI outputs before they reach clients or inform high-stakes decisions. Not forever - this isn't about creating a permanent bottleneck. It's a calibration mechanism. While the organisation is developing judgement about when AI outputs can be trusted, someone checks them. Over time, as confidence grows and the patterns become clear, you adjust the review threshold. But starting without it is how you end up with an AI-generated client report containing a hallucinated case citation. Ask me how I know.
Privacy and data handling standards that are clear before the pilot begins. Not retrofitted after someone raises a concern in a partner meeting. If you're feeding client data into an AI tool, your clients deserve to know what's happening to it, and your firm needs to have thought through the implications before - not after - the technology is live. Only one in five companies has a mature governance model for autonomous AI agents, according to Deloitte's 2026 research. That's a gap that's going to get very expensive for firms that don't close it.
And bias awareness. AI systems trained on historical data reflect historical patterns, including historical biases. If your firm has historically referred more work to certain types of clients or hired from certain backgrounds, and your AI system is learning from that data, it will reproduce those patterns unless you actively build review mechanisms to catch it. This isn't abstract ethics - it's a practical risk to client outcomes and regulatory standing.
Let's say you've done this well. You picked the right use case, set specific expectations, brought the team along, built the governance infrastructure. The pilot delivered measurable results. People are cautiously optimistic.
Now what?
The temptation is to immediately scale - roll it out across the firm, launch two more use cases, tell the board AI is working. Resist that. The next step is to document what you learned, share the results honestly with the partnership, and make a clear, specific case for the next phase. Same discipline, same specificity. The firms building durable AI capability in professional services right now aren't the ones that launched the biggest pilots or made the boldest claims. They're the ones that started with something specific, measured it honestly, and let the visible results create the internal appetite for more.
It's less exciting than "AI will transform everything." But it actually works. And in a partnership where trust is the currency that funds every decision, that counts for rather a lot.
If you want to design your first AI pilot with the right use case, the right success criteria, and the right governance in place before you start, we've put together an AI pilot design template that covers each of those decisions. It's designed to be shared with your board or partnership before an initiative is approved - because making the discipline visible upfront is half the battle.