I'm going to do something slightly unusual here. I'm going to tell you about three things that went wrong on a real migration project we delivered at Distinction. Not "challenges we overcame" or "lessons learned" - actual problems that cost time, money, and a few difficult conversations.
Why? Because if you're a CTO or digital director approaching a platform migration, you've sat through enough agency pitches to know that everyone's case studies are immaculate. Every project delivered on time, on budget, delighted stakeholders all round. And you know - because you've been doing this long enough - that's not how it works. Not for us, not for anyone.
The question that actually matters isn't whether problems will occur. They will. It's whether the team finds them early or late, communicates honestly or defensively, and resolves things in a way that protects the project's outcomes. So here are three problems from a single engagement - what happened, why, how we dealt with it, and what we changed afterwards.
The original content audit gave us confidence. We mapped 11 content types across the legacy CMS, estimated the migration at 18 days, and built that into the project plan. We've done this dozens of times. Seemed solid.
Then we started the actual migration and discovered the audit had missed something fundamental. Around 35% of the pages contained bespoke component configurations that didn't appear in the template library. They'd been hand-coded by a developer who'd left the business three years earlier, each one slightly different from the last. What looked like 11 content types was actually closer to 30 variations, many with embedded media references pointing to files that had been reorganised - or quietly deleted - since the original pages were published.
The migration estimate went from 18 days to just over 60. Not a small overrun - a fundamental re-scoping.
Here's where I have to be honest about our part in this. The audit methodology we used at the time was template-level. We looked at the CMS content types as defined, sampled a percentage of pages, and assumed consistency. That assumption was wrong. We should have done a deeper, page-level sample - especially on a platform that had been in use for seven years with multiple developers touching it. That's on us.
How we handled it: I got on a call with the client's project sponsor within 48 hours of identifying the scale of the problem. Not an email. Not a status report buried in a weekly update. A direct conversation that went roughly: "We've found something significant, here's what it means for the timeline, here's what it'll cost, and here are the three options we see for dealing with it." The options were: migrate everything manually; use Henry - our AI-powered migration accelerator - to handle the variation programmatically; or rationalise the content first and migrate a smaller, cleaner set. The client chose the second option, Henry. It added five weeks to the project and around £18,000 to the budget.
What we changed: every content audit we run now includes what we call a "deep sample" - we pull a minimum of 20% of pages per content type and inspect them at the component level, not just the template level. It adds three days to the discovery phase. Worth every hour.
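To make the idea concrete, here is a minimal sketch of a deep sample. It is an illustration, not our actual audit tooling: the page dictionaries, the field names (`url`, `content_type`, `components`) and both function names are assumptions made for the example. The only number taken from the text is the 20% minimum per content type.

```python
import math
import random
from collections import defaultdict

SAMPLE_FRACTION = 0.20  # minimum share of pages per content type (from the text)


def deep_sample(pages, fraction=SAMPLE_FRACTION, seed=0):
    """Pick at least `fraction` of the pages from every content type."""
    by_type = defaultdict(list)
    for page in pages:
        by_type[page["content_type"]].append(page)

    rng = random.Random(seed)  # fixed seed so the audit can be repeated
    sample = []
    for group in by_type.values():
        # Round up, and always take at least one page from small types.
        k = max(1, math.ceil(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


def find_unknown_components(sample, template_library):
    """Map each sampled page URL to the components missing from the library."""
    flagged = {}
    for page in sample:
        unknown = sorted(set(page["components"]) - set(template_library))
        if unknown:
            flagged[page["url"]] = unknown
    return flagged


# Hypothetical export: ten templated news pages and one hand-coded landing page.
pages = [
    {"url": f"/news/{i}", "content_type": "news", "components": ["hero", "body"]}
    for i in range(10)
]
pages.append(
    {"url": "/about", "content_type": "landing",
     "components": ["hero", "custom-carousel-v3"]}
)

flagged = find_unknown_components(deep_sample(pages), {"hero", "body"})
print(flagged)
```

The point of the sketch is the comparison step: a template-level audit only counts content types, whereas this checks what each sampled page actually contains against the template library.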
The second problem was an integration failure, and honestly, this one still irritates me.
We were integrating the new platform with the client's CRM - a mid-market platform with reasonable documentation and a vendor we'd worked with before. Nothing in the docs mentioned rate limits. We asked during the technical discovery call. No mention. The integration worked perfectly in development and passed initial testing without a hiccup.
Then we ran a load test simulating a realistic Monday morning: around 180 users hitting the portal simultaneously, each triggering CRM lookups on login. The API started returning 429 errors after roughly 2,800 requests per hour. The vendor's undocumented fair-use policy kicked in and throttled us hard. For users, that meant profile pages taking 12-18 seconds to load, with about one in four timing out entirely.
Now, was this the vendor's fault for not documenting the limit? Partly, yes. But I'll admit something: our integration testing protocol at the time didn't include sustained load testing against third-party APIs at production-realistic volumes. We tested for correctness - does the data come back right? - but not for resilience under load. That gap is on us.
The resolution took three weeks. We built a caching layer that cut API calls by around 75%, implemented a queue for non-time-sensitive requests, and negotiated a higher rate limit with the vendor - which, it turned out, was available on their enterprise tier for an additional £6,000 per year that hadn't been in anyone's budget. The client wasn't thrilled about that. I don't blame them. We absorbed our own time on the fix because we felt our testing should have caught this before go-live.
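For readers who want a picture of the pattern, here is a rough sketch of a time-limited cache in front of a CRM lookup, plus a queue for requests that can wait. It is not the code we shipped. The class name, the `fetch` callable, the five-minute expiry and the `drain` budget are all assumptions for illustration, and the sketch makes no claim about how much traffic it would save in any particular system.

```python
import time
from collections import deque


class CachedCrmClient:
    """Hypothetical wrapper: cache lookups and defer non-urgent ones."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch        # callable that performs the real API call
        self._ttl = ttl_seconds    # assumed expiry; tune per data type
        self._clock = clock        # injectable so the cache can be tested
        self._cache = {}           # key -> (expires_at, value)
        self._deferred = deque()   # keys whose refresh can wait
        self.api_calls = 0         # how many real calls were made

    def get(self, key):
        """Return a cached value if still fresh, otherwise call the API."""
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = self._fetch(key)
        self.api_calls += 1
        self._cache[key] = (now + self._ttl, value)
        return value

    def defer(self, key):
        """Queue a lookup that does not need an immediate answer."""
        self._deferred.append(key)

    def drain(self, max_items):
        """Work through up to `max_items` queued lookups; return the count."""
        done = 0
        while self._deferred and done < max_items:
            self.get(self._deferred.popleft())
            done += 1
        return done


# Usage with a stand-in for the real CRM call.
client = CachedCrmClient(fetch=lambda user_id: {"id": user_id})
client.get("user-1")
client.get("user-1")      # served from cache, no second API call
print(client.api_calls)   # 1
```

Calling `drain` on a schedule with a small budget is one way to keep background refreshes from competing with login traffic for the same rate limit.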
What we changed: our integration testing protocol now includes what we call "load-to-break" testing on every third-party API. We deliberately push past expected volumes to find the ceiling before go-live, not after. We also require written confirmation of rate limits, fair-use policies, and throttling behaviour from every third-party vendor before we sign off the integration architecture. Sounds obvious. It is obvious. We just weren't doing it rigorously enough.
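The shape of a load-to-break test can be sketched in a few lines. Everything here is a simplified assumption: `FakeThrottledApi` stands in for a vendor API with an undisclosed hourly cap, and the ramp parameters are arbitrary. A real run would send HTTP requests to the vendor's sandbox over real time rather than counting calls in memory.

```python
class FakeThrottledApi:
    """Stand-in for a vendor API with an undisclosed hourly request cap."""

    def __init__(self, hourly_limit):
        self._limit = hourly_limit
        self._count = 0

    def reset_window(self):
        """Simulate the start of a new hourly window."""
        self._count = 0

    def request(self):
        """Return an HTTP-style status: 200 until the cap, then 429."""
        self._count += 1
        return 429 if self._count > self._limit else 200


def load_to_break(api, start_rate, step, max_rate):
    """Raise the simulated hourly rate until the first 429 appears.

    Returns (highest clean rate, first failing rate). The second value is
    None if no throttling was seen up to `max_rate`.
    """
    last_ok = None
    rate = start_rate
    while rate <= max_rate:
        api.reset_window()
        statuses = [api.request() for _ in range(rate)]
        if 429 in statuses:
            return last_ok, rate
        last_ok = rate
        rate += step
    return last_ok, None


# The 2,800 cap mirrors the figure in the story; the ramp values are made up.
api = FakeThrottledApi(hourly_limit=2800)
print(load_to_break(api, start_rate=1000, step=500, max_rate=5000))
# (2500, 3000)
```

The result brackets the ceiling rather than pinpointing it: the limit sits somewhere between the last clean rate and the first failing one, and a smaller step narrows the gap.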
The third problem wasn't technical at all. It was organisational, and in some ways it was the hardest to manage.
We were six weeks into the project when the client's Head of Digital left the business. Her replacement came from outside the sector and had fundamentally different views about the platform direction. The original brief had been built around a headless architecture with a React frontend - a decision made carefully, with good reasons, after a proper discovery process. The new stakeholder wanted to revisit it entirely. He'd had a bad experience with headless at his previous firm and wanted to explore whether a traditional DXP would be more appropriate.
In one conversation, the foundational assumptions of the project were in question.
I want to be careful here, because it's easy to frame this as "the client changed their mind and caused chaos." That's not wrong, but it's not the whole story. The truth is, we could have done more at the outset to document the reasoning behind the original architecture decision - not just what was decided, but why, with the evidence and trade-offs explicitly recorded. If that document had existed, the conversation with the new stakeholder would have started from "here's why this decision was made, and here's what would need to be true for us to change it" rather than "let's relitigate the whole thing from scratch."
We lost about four weeks to what was essentially a re-discovery phase. I sat in six meetings with the new stakeholder working through the architecture rationale. We ultimately confirmed the original headless approach, but with two significant modifications to the content editing experience that addressed his legitimate concerns about editorial usability. Those modifications added three weeks and around £14,000 to the project. The bigger cost, though, was momentum - the delivery team was in a holding pattern for nearly a month while the direction was being confirmed. That's a particular kind of expensive that doesn't show up cleanly on a budget sheet.
What we changed: every material decision now gets recorded with its rationale, the alternatives considered, and who was in the room. When stakeholders change - and they do, more often than anyone plans for - the new person inherits a decision trail, not a blank slate. We also now build an explicit stakeholder change protocol into our project governance: if a key decision-maker changes mid-project, we trigger a structured re-alignment session within the first two weeks rather than letting the confusion accumulate through conflicting feedback in stand-ups.
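A decision trail does not need special tooling. As a sketch only, here is one way such a record could be structured. The field names, the `revisit_if` field and the briefing function are assumptions for the example rather than our actual template, and the sample entry uses placeholder text and a made-up date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DecisionRecord:
    """One material decision, with the context a newcomer would need."""
    title: str
    decided_on: date
    decision: str
    rationale: str                     # why, not just what
    alternatives: tuple[str, ...]      # options that were considered
    attendees: tuple[str, ...]         # who was in the room
    revisit_if: tuple[str, ...] = ()   # what would need to be true to change it


def briefing_for_new_stakeholder(log):
    """Render the decision log, oldest first, as plain text."""
    lines = []
    for record in sorted(log, key=lambda r: r.decided_on):
        lines.append(f"{record.decided_on.isoformat()} {record.title}: {record.decision}")
        lines.append(f"  Why: {record.rationale}")
        lines.append(f"  Alternatives considered: {', '.join(record.alternatives)}")
        lines.append(f"  In the room: {', '.join(record.attendees)}")
        if record.revisit_if:
            lines.append(f"  Revisit if: {'; '.join(record.revisit_if)}")
    return "\n".join(lines)


# Placeholder entry; the date, rationale and roles are illustrative only.
log = [
    DecisionRecord(
        title="Platform architecture",
        decided_on=date(2024, 1, 15),
        decision="Headless architecture with a React frontend",
        rationale="(summary of discovery findings goes here)",
        alternatives=("Traditional DXP",),
        attendees=("Head of Digital", "Agency technical lead"),
        revisit_if=("Editorial usability cannot be addressed within headless",),
    )
]
print(briefing_for_new_stakeholder(log))
```

The `revisit_if` field is the part that changes the conversation: it lets a new stakeholder challenge a decision against stated conditions instead of reopening it from scratch.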
Three problems, three different root causes. A content audit methodology that wasn't deep enough. A third-party dependency we didn't stress-test hard enough. An organisational shift that our governance documentation wasn't robust enough to absorb gracefully.
But they share something. Each problem was found early enough to be managed rather than late enough to become a crisis. And each was communicated to the client directly, honestly, and with options - not buried in a status report or softened in a steering committee update.
That's not heroism. It's process. Specific check-in cadences. Documented decision logs. Escalation paths agreed before anything went wrong. The boring stuff that doesn't make it into pitch decks.
I genuinely believe the governance and communication structures around a migration matter more than the technology selected. I've seen technically excellent platforms implemented by teams with poor governance, and the result is always worse than a solid-but-unremarkable platform implemented by a team that communicates well, finds problems early, and tells the truth about what they find.
"Every agency tells you their projects go smoothly. I want to know what happens when they don't."
Fair. Now you know what happens when ours don't. And if that's the kind of transparency you'd want from a partner on your own migration, let's talk - no deck, no pitch, just a straight conversation about your project.