THE BRIEFING ROOM

How to run an internal pilot that builds confidence (not cynicism)

Let me tell you about a pilot that went wrong. Not spectacularly wrong - nobody lost a client, nothing caught fire. It went wrong in the way that most pilots go wrong, which is quietly, politically, and in a way that made the next conversation about investment twice as hard.

A mid-sized consulting firm - about 200 people, four offices - wanted to test whether automating parts of their proposal process could save senior consultant time. Reasonable hypothesis. Good instinct. They picked a team, gave them access to the tools, ran it for six weeks, and then presented the results to the leadership group.

The results were... fine. Some time saved. Some teething problems. A few enthusiastic quotes from the people involved. A slide deck with a bar chart that went up and to the right if you squinted.

And a partner in the back of the room said: "So we spent six weeks and we still can't tell if this actually works at scale?"

He wasn't being obstructive. He was right. The pilot hadn't been designed to answer a specific question. It had been designed to generate momentum. And the most experienced people in the room could smell that from across the table. The person presenting went very quiet. The room shifted. And I remember thinking: this is going to set them back six months, minimum.

I've seen this pattern enough times now that I've started to think the badly designed pilot is actually more damaging than not running one at all. Because at least before the pilot, you had an open question. After a bad pilot, you have a data point that sceptics will use against you for the next eighteen months. "We tried that. It didn't really work, did it?"

So let's talk about how to design a pilot that actually builds confidence. Because a good one is genuinely one of the most powerful tools you have for getting investment decisions across the line.

Start with a hypothesis, not an exploration

This is the single biggest mistake I see. The pilot gets framed as an exploration rather than a test.

"Let's give the team access to the new tools and see what happens" sounds open-minded and pragmatic. It is neither. What it actually does is guarantee that the results will be ambiguous enough for anyone to interpret them however they want. The advocate will cherry-pick the wins. The sceptic will cherry-pick the problems. And the leadership team will be no closer to a decision than they were before you started.

A good pilot tests a specific hypothesis. Something like: "We believe that AI-assisted first-draft review will reduce per-matter review time by at least 35% for the due diligence team. This pilot will confirm or refute that hypothesis using a representative sample of 20 matters over eight weeks."

That's a statement that can produce a yes or a no. And here's the thing about a clear yes or no - a sceptic can argue with observations all day long, but they can't easily argue with a hypothesis that was stated in advance and confirmed by the evidence everyone agreed to measure.
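If it helps to see how little is needed to make a hypothesis testable, here's a minimal sketch in Python. It is an illustration, not a tool we ship: the `PilotHypothesis` class, the `evaluate` function and the hours in the example are all invented for this purpose, and it assumes you have a comparable baseline of review times for the same kind of matter.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PilotHypothesis:
    """A hypothesis written down before the pilot starts (hypothetical structure)."""
    metric: str              # what is being measured
    target_reduction: float  # minimum reduction that counts as confirmed, e.g. 0.35
    sample_size: int         # number of matters agreed in advance
    duration_weeks: int      # length of the measurement period


def evaluate(hypothesis, baseline_hours, pilot_hours):
    """Return (confirmed, observed_reduction) against the pre-stated target."""
    if len(pilot_hours) < hypothesis.sample_size:
        raise ValueError("pilot sample is smaller than the sample agreed in advance")
    # Observed reduction in average review time relative to the baseline.
    observed = 1 - mean(pilot_hours) / mean(baseline_hours)
    return observed >= hypothesis.target_reduction, observed


# Invented figures, for illustration only.
hypothesis = PilotHypothesis("per-matter review time (hours)", 0.35, 20, 8)
baseline = [10.0] * 20
pilot = [7.2] * 20

confirmed, observed = evaluate(hypothesis, baseline, pilot)
print(f"confirmed={confirmed}, observed reduction={observed:.0%}")
# prints: confirmed=False, observed reduction=28%
```

The point of the sketch is the order of events: the target, sample size and duration are fixed in the object before any pilot data exists, so the evaluation can only return a yes or a no against what was agreed.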

The difference between "we'll see what happens" and "we're testing whether X produces Y" isn't semantic. One produces a decision. The other produces a debate that runs until someone senior gets bored and calls it.

I've written separately about how this applies specifically to AI initiatives, where the temptation to explore is even stronger because the technology feels novel. But the principle holds for any type of pilot - portal adoption, process changes, platform migrations, whatever. If you can't write the hypothesis in one sentence before you start, you're not ready to start.

Choosing the right scope - the Goldilocks problem

Scope is where most pilots quietly die. Too small, and a positive result gets dismissed as an outlier. Too large, and a negative result creates a political crisis before you've had time to learn anything.

I think about pilot scope through three lenses - and I'll keep this brief because the underlying logic is simpler than it sounds.

The first is credibility. A pilot involving one person for two weeks might produce brilliant results, and nobody will care. Your CFO isn't signing off on a six-figure investment because one enthusiastic junior had a good fortnight. The sample needs to be large enough that the results can't be attributed entirely to the specific individuals involved or the specific conditions of that particular period. Twenty matters, not three. Eight weeks, not two. Five team members, not one.

The second is safety. The scope should be small enough that if the results are negative, you haven't damaged a client relationship, created regulatory exposure, or produced a failure so visible that it poisons the well before you've extracted the learning. This is particularly important in regulated industries - I've seen firms test client-facing changes in pilots without thinking through what happens if the pilot fails in front of actual clients. You want to learn, not create a compliance incident.

The third is representativeness. If you only test with your most digitally confident team, in your best-performing office, on your simplest work type, the sceptics will - correctly - point out that you've proven it works in ideal conditions. That's not evidence. That's a demo.

Getting all three right simultaneously is genuinely hard, and I'd be lying if I said there's a formula. But here's a useful test: before you finalise the scope, ask yourself what your most thoughtful sceptic would say about it. Not your most unreasonable sceptic - you can't design for bad faith. The partner who's genuinely cautious, who's seen initiatives come and go, who wants to be convinced but needs real evidence. Would this scope be enough for them?

If the honest answer is "probably not," make it bigger. If the honest answer is "this is starting to feel like a full programme," scale it back. Somewhere in between is usually right.

The communication strategy most people skip

Here's where things get properly political, and where I see the most avoidable damage.

Before the pilot starts, the sceptics and the advocates both need to know the success criteria and the timeline. Not after the results are in. Before. A pilot whose success criteria are only revealed alongside the results isn't a pilot - it's a post-hoc justification, and everyone with more than ten years' experience in your firm will recognise it as such.

This feels counterintuitive. You might be thinking: but what if we set the bar wrong? What if we commit to 35% and only hit 28%? I get it. But moving the goalposts after the fact is worse. Much worse. Because the moment you do that, you've confirmed every suspicion the sceptics had about the programme being a foregone conclusion dressed up as an evidence-gathering exercise.

During the pilot, send brief, regular updates that describe what is happening without claiming results. "We're in week four of eight. Twelve matters have been through the new process so far. The team has flagged two issues with the data quality in older files, which we're addressing. Full results at the end of the measurement period." That's it. Maintain awareness without over-promising. Resist the temptation to share early wins before the data is complete - you'll regret it if the second half doesn't match the first.

After the pilot, present the results - positive and negative - with explicit comparison to the pre-stated success criteria. And here's the bit that takes genuine courage: present them to the sceptics first. Not around them. Not in a carefully staged town hall where the narrative is already locked down. To them. Directly. Before the wider announcement.

A sceptic who sees the pilot results before everyone else cannot credibly claim the results are being managed. You've removed their best weapon, which is the suspicion that they're being sold to rather than informed.

I had a COO tell me once that this approach felt like "giving ammunition to the opposition." I understood his concern. But the sceptics already have ammunition - they've got years of mixed results from previous initiatives. What you're doing is replacing that old ammunition with current evidence, on your terms, with your hypothesis framing the conversation. It's not a concession. It's a takeover.

When the pilot doesn't work

Right, this is the section that matters most, and the one that people most want to skip.

A pilot that produces negative results has not failed. I'm going to say that again because I genuinely believe it: a pilot that produces negative results has not failed. It has produced information. Specific, useful information that tells you this use case, at this scope, with these foundations, does not produce the hypothesised outcome.

That's not a consolation prize, because a negative result lets you ask why. Was it the use case itself? Was the data quality not there? Was the change management insufficient? Was the hypothesis just wrong? Each of those is a different problem with a different solution, and knowing which one you're dealing with is worth a lot.

The organisations that successfully learn from negative pilots communicate the result honestly and immediately, explain what they learned, and describe the adjusted approach that the learning informs. "The pilot didn't confirm our hypothesis. Here's why we think that happened. Here's what we'd do differently. And here's what the negative result tells us about the two other use cases we were considering."

The organisations that lose all credibility are the ones that try to reframe a negative result as a partial success. You know the language - "while we didn't hit the target, we saw promising indicators that suggest with further refinement..." Your sceptics are not stupid. They can see what you're doing. And once they catch you doing it, every future positive result from every future pilot will be viewed through the lens of "are they spinning this too?"

One important caveat: success criteria aren't carved in stone forever. Sometimes what a pilot reveals genuinely changes what the right criteria should have been. If you discover mid-pilot that the hypothesis was measuring the wrong thing, that's legitimate. But the time to acknowledge that is during the pilot, transparently, with an explanation - not after the results are in, conveniently adjusted to make them look better.

There's a companion piece on change management planning that goes deeper on this - because honestly, a lot of pilot "failures" are actually change management failures in disguise.

From pilot evidence to scaling investment

So let's say the pilot works. The hypothesis was confirmed. The sceptics have seen the evidence, presented honestly, and while they might not be enthusiastic, they can't argue the results were fabricated. Now what?

The bridge from pilot evidence to Phase 2 investment has three components, and missing any one of them will stall the conversation.

Connect the pilot result to the scaling opportunity with specific numbers. "The pilot confirmed a 35% reduction in per-matter review time across a sample of 20 matters. Across the full due diligence practice of 45 matters per year, that represents approximately 680 hours of recovered senior consultant capacity, which at current charge-out rates equates to roughly £340,000 of realisable value." That's not a projection - it's an extrapolation from evidence you've already generated. Very different thing, and your CFO will appreciate the distinction.

Then acknowledge what was different about the pilot that may not replicate at scale. This is where most people get nervous, because it feels like you're undermining your own case. You're not. You're demonstrating that you've thought about it. "The pilot team was self-selected, which likely means higher digital confidence than the practice average. The matters were selected for relative simplicity. At scale, we'd expect the reduction to be closer to 25-30% initially, with improvement as the team builds familiarity." A sceptic who hears you acknowledge the limitations before they raise them is a sceptic who's starting to trust you.

And then make a Phase 2 request that's proportionate to the evidence. Not the full ambition of the programme. The next reasonable step the pilot has earned. If the pilot tested one team on one use case, Phase 2 might be three teams on two use cases. Not a full organisational rollout. Not a £500,000 platform investment. The step that the evidence supports.
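The arithmetic behind the first two components is worth laying out so that anyone can check it. Here's a rough sketch in Python using the figures quoted above. Two inputs are not stated in the example and are back-derived here as assumptions: the baseline review time per matter (about 43 hours) and the charge-out rate (£500 an hour). Your own numbers would replace both.

```python
# Figures quoted in the example above.
PILOT_REDUCTION = 0.35      # reduction confirmed by the pilot
MATTERS_PER_YEAR = 45       # full due diligence practice
RECOVERED_HOURS = 680       # quoted hours recovered per year
REALISABLE_VALUE = 340_000  # quoted value per year, in GBP

# Assumptions back-derived from those figures (not stated in the example).
implied_baseline_hours = RECOVERED_HOURS / (MATTERS_PER_YEAR * PILOT_REDUCTION)
implied_rate = REALISABLE_VALUE / RECOVERED_HOURS


def annual_value(reduction):
    """Return (hours recovered, value in GBP) per year for a given reduction."""
    hours = MATTERS_PER_YEAR * implied_baseline_hours * reduction
    return hours, hours * implied_rate


print(f"implied baseline: {implied_baseline_hours:.1f} hours per matter")
print(f"implied rate: GBP {implied_rate:.0f} per hour")

# Pilot result, then the more cautious 25-30% range expected at scale.
for reduction in (0.35, 0.30, 0.25):
    hours, value = annual_value(reduction)
    print(f"{reduction:.0%}: {hours:,.0f} hours, GBP {value:,.0f}")
```

Under those assumptions the cautious 25-30% range works out at roughly 490 to 580 hours, or about £243,000 to £291,000 a year, against the £340,000 headline. Showing both figures side by side is the numerical version of acknowledging the limitations before anyone else raises them.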

I've written a separate piece on how to structure a phased digital investment proposal that covers this in much more detail - including how to frame the financials and the governance structure. If you're at the point where your pilot has delivered and you're trying to build the Phase 2 case, that's probably worth reading alongside this.

The pilot nobody remembers

The best pilots don't become legendary stories. They become quiet, unremarkable stepping stones to the next decision. The leadership team doesn't remember the pilot itself - they remember that the evidence was clear, the communication was honest, and the recommendation was proportionate.

That's the goal. Not excitement. Not a standing ovation. Confidence. The kind of confidence that comes from a process that was transparent enough that even the people who didn't want it to succeed couldn't find a legitimate reason to dismiss it.

If you want the pilot design framework - hypothesis statement, scope criteria, communication plan, and success criteria template - as a one-page document, download it here. It's the same structure we use when we're helping clients design pilots, and it works whether you're testing an AI use case, a portal redesign, or a process change.

Because honestly, the hardest part of most digital programmes isn't the technology. It's getting a room full of experienced, cautious people to agree that the evidence supports taking the next step. A well-designed pilot is how you earn that agreement. A badly designed one is how you lose it - sometimes for years.