Program management · Field notes

How I Rebuild IT Programs at Scale: 18 Years of Lessons from Nordea to NATO

Large IT programs rarely fail for technical reasons. They fail because the operating model never caught up with the ambition. After eighteen years of rebuilding programs across banking, insurance, energy and defense, the same six patterns keep appearing — and the same governance move keeps pulling them out of the ditch.

By Damian Szulakowski · April 24, 2026 · 12 min read

"Show me a program's WIP limit and its cadence of sponsor decisions, and I will tell you whether it will hit its date — before I look at a single Jira board."

Why programs stall

The programs I have been asked to recover tend to share a surface symptom: missed milestones. But the milestones are a lagging indicator. By the time a steering committee is staring at a red RAG report, the system has been degrading for two or three quarters. The teams know. The PMO knows. The sponsor has usually stopped asking hard questions because the answers have become uncomfortable.

Before I touch any ceremony, I run a two-week diagnostic. I read the last four steering decks, interview every squad lead, map the funded portfolio against the active backlog, and shadow a release planning session. Ninety percent of the time the diagnostic surfaces the same six failure modes.

Failure mode 1 — Outcomes that cannot be refuted

The charter says things like "modernize the lending platform" or "uplift cyber posture." These are directions, not outcomes. A good program outcome can be refuted by a metric in a specific quarter. If you cannot draw the counterfactual — what would have to be true for us to say this program failed on this date — you do not have an outcome, you have an ambition.

The fix is mechanical. I force every workstream lead to write a one-sentence outcome in the form: "By Q[X], [metric] moves from [baseline] to [target] for [population]." No adjectives. No "improved" or "enhanced." If the metric is not instrumented, instrumentation is the first epic. If the baseline is unknown, measurement becomes sprint zero. Sponsors hate the week this exercise runs. By week three they love it.
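The template can be sketched as a small data structure, shown here in Python. This is an illustrative sketch, not a tool from the programs described: the `Outcome` class, its field names, and the example values are hypothetical, and the vague-word list contains only the two words named above.

```python
from dataclasses import dataclass
from typing import Optional

# Only the two adjectives the text bans; extend as needed (assumption).
VAGUE_WORDS = {"improved", "enhanced"}


@dataclass(frozen=True)
class Outcome:
    quarter: str                # e.g. "Q3 2026"
    metric: str
    baseline: Optional[float]   # None means the baseline is not yet measured
    target: float
    population: str
    instrumented: bool = True

    def sentence(self) -> str:
        """Render the one-sentence outcome in the template's form."""
        baseline = ("an unmeasured baseline" if self.baseline is None
                    else f"{self.baseline:g}")
        return (f"By {self.quarter}, {self.metric} moves from {baseline} "
                f"to {self.target:g} for {self.population}.")

    def blockers(self) -> list:
        """List what must be fixed before the outcome can be refuted."""
        issues = []
        if not self.instrumented:
            issues.append("metric not instrumented: instrumentation is the first epic")
        if self.baseline is None:
            issues.append("baseline unknown: measurement becomes sprint zero")
        words = {w.strip(".,").lower() for w in self.metric.split()}
        if words & VAGUE_WORDS:
            issues.append("metric name contains a vague adjective")
        return issues


example = Outcome("Q3 2026", "median loan approval time (hours)",
                  72, 24, "retail mortgage applicants")
print(example.sentence())
print(example.blockers())  # an empty list means the outcome is refutable as written
```

An outcome with any blockers is not rejected; the blockers simply become the first items of work.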

Failure mode 2 — Dependencies that live in people's heads

Every stalled program has a dependency graph that exists only in the heads of two or three architects. When those people are in a meeting, the program moves. When they are on holiday, the program stops. This is not resilience. This is folklore.

I treat dependencies as a first-class artifact. Every squad publishes its inbound and outbound dependencies weekly, with a named owner on both sides and an agreed date. Dependencies without an owner and a date are escalated within 48 hours. The PMO is not a reporting function; it is the exchange that clears these trades. When the exchange is empty, the program runs. When it clogs, you have an early warning three months before the milestone slips.
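One way to picture the exchange is as a list of trades that are either cleared or waiting for escalation. The Python sketch below is an assumption-laden illustration: the `Dependency` fields, the function names, and the reading of the 48-hour rule as a deadline counted from the moment the dependency is raised are mine, not a description of a specific PMO tool.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional

ESCALATION_WINDOW = timedelta(hours=48)  # from the text: escalate within 48 hours


@dataclass
class Dependency:
    description: str
    from_squad: str                       # squad that needs something
    to_squad: str                         # squad that provides it
    raised_at: datetime
    requester_owner: Optional[str] = None
    provider_owner: Optional[str] = None
    agreed_date: Optional[date] = None

    def is_cleared(self) -> bool:
        """A cleared trade has a named owner on both sides and an agreed date."""
        return bool(self.requester_owner and self.provider_owner
                    and self.agreed_date)


def escalate_by(dep: Dependency) -> Optional[datetime]:
    """Deadline for escalating an uncleared dependency; None if cleared."""
    return None if dep.is_cleared() else dep.raised_at + ESCALATION_WINDOW


def escalation_queue(deps, now: datetime):
    """Uncleared dependencies, oldest first, with a flag if the deadline passed."""
    open_deps = sorted((d for d in deps if not d.is_cleared()),
                       key=lambda d: d.raised_at)
    return [(d.description, now > escalate_by(d)) for d in open_deps]
```

An empty queue corresponds to an empty exchange; a growing list of overdue flags is the clog described above.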

Failure mode 3 — Unbounded work in progress

The single best predictor of whether a program will hit its date is not team velocity. It is the ratio of started-but-not-done epics to throughput. I have walked into programs with 240 epics in flight and 12 delivered per quarter. The math says the average epic takes 20 quarters, or five years, to finish. Nobody wants to hear that, but the math does not care.
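The arithmetic is the relationship usually called Little's Law: average time in the system equals work in progress divided by throughput. It holds for long-run averages in a reasonably stable system, so treat the result as an order-of-magnitude check rather than a forecast. The function name below is illustrative.

```python
def average_lead_time_quarters(wip_epics: int,
                               throughput_per_quarter: float) -> float:
    """Little's Law: average time in system = WIP / throughput."""
    if throughput_per_quarter <= 0:
        raise ValueError("throughput must be positive")
    return wip_epics / throughput_per_quarter


quarters = average_lead_time_quarters(240, 12)
print(f"{quarters:.0f} quarters = {quarters / 4:.0f} years")
# prints: 20 quarters = 5 years
```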

The intervention is unpopular and effective: a portfolio kanban with a hard WIP limit derived from historical throughput. If throughput is 12 per quarter, WIP is capped at 18. Everything else goes to a parking lot with a clear pull rule. Teams resist because it feels like cancelling work. It is not cancelling work. It is sequencing work. Done epics ship. Started epics bleed.
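A minimal sketch of the cap and the pull rule, in Python. The 1.5 multiplier is inferred from the 12-to-18 example and is an assumption, not a universal constant; the first-in, first-out pull order and the function names are also hypothetical choices made for illustration.

```python
import math
from collections import deque

WIP_FACTOR = 1.5  # assumption: inferred from the 12 -> 18 example in the text


def wip_limit(throughput_per_quarter: float,
              factor: float = WIP_FACTOR) -> int:
    """Hard WIP cap derived from historical quarterly throughput."""
    return math.floor(throughput_per_quarter * factor)


def pull(in_flight: list, parking_lot: deque, limit: int) -> list:
    """Pull from the front of the parking lot only while under the cap."""
    pulled = []
    while len(in_flight) < limit and parking_lot:
        epic = parking_lot.popleft()
        in_flight.append(epic)
        pulled.append(epic)
    return pulled


limit = wip_limit(12)                       # 18
in_flight = [f"E-{n}" for n in range(17)]   # 17 epics already started
parking_lot = deque(["E-A", "E-B"])
print(limit, pull(in_flight, parking_lot, limit))  # only E-A is pulled
```

Nothing in the parking lot is cancelled; it waits until a finished epic frees a slot.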

Failure mode 4 — The sponsor has no time

Programs worth half a billion euros routinely have a sponsor who gives them one hour per fortnight. That sponsor cannot make crisp decisions because they do not have enough context, and the program cannot escalate because the queue is longer than the meeting. The program drifts, not because anyone is negligent, but because the decision bandwidth does not match the decision volume.

I renegotiate the sponsor contract in week two. The ask is simple: ninety minutes per week, always at the same time, with a pre-read delivered 48 hours ahead and a maximum of three decisions on the agenda. If the sponsor cannot commit, I ask them to delegate a named deputy with decision rights. Programs without a decision cadence do not recover. Programs with one almost always do.

Failure mode 5 — Mixed funding horizons in one backlog

Regulated programs in particular tend to fund three horizons in one pot: keep-the-lights-on (KTLO) work, multi-year transformation, and opportunistic experiments. When all three sit in the same backlog competing for the same teams, the urgent always beats the important. KTLO eats transformation alive because KTLO has a named angry customer today and transformation has an abstract target in 2028.

The fix is to split the portfolio into three visible lanes with pre-allocated capacity: run, change, disrupt. The lanes are not aspirational — they are enforced at capacity planning. A team assigned 70 percent to change does not get pulled into run without a formal trade. The visibility itself changes the behaviour. Once sponsors can see that a KTLO pull is costing them a transformation sprint, the pull slows down.
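Enforcement at capacity planning can be as simple as refusing any lane change that is not recorded as an approved trade. The Python sketch below assumes percentages per team and a named approver; the `TeamCapacity` class and its method are hypothetical, and the 20/70/10 split is only an example.

```python
from dataclasses import dataclass, field

LANES = ("run", "change", "disrupt")


@dataclass
class TeamCapacity:
    team: str
    allocation: dict                      # lane -> whole percent, sums to 100
    trades: list = field(default_factory=list)

    def __post_init__(self):
        if (set(self.allocation) != set(LANES)
                or sum(self.allocation.values()) != 100):
            raise ValueError("allocation must cover run/change/disrupt "
                             "and sum to 100")

    def trade(self, from_lane: str, to_lane: str, percent: int,
              approved_by: str) -> None:
        """Move capacity between lanes only as a recorded, approved trade."""
        if not approved_by:
            raise PermissionError("a lane change needs a named approver")
        if percent <= 0 or self.allocation[from_lane] < percent:
            raise ValueError("not enough capacity in the source lane")
        self.allocation[from_lane] -= percent
        self.allocation[to_lane] += percent
        # The trade log is what makes the cost of a KTLO pull visible.
        self.trades.append((from_lane, to_lane, percent, approved_by))


team = TeamCapacity("payments", {"run": 20, "change": 70, "disrupt": 10})
team.trade("change", "run", 10, approved_by="sponsor deputy")
print(team.allocation, team.trades)
```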

Failure mode 6 — Accountability that averages to zero

The most persistent failure mode is also the most cultural. When five people are accountable for an outcome, nobody is. I have seen RACI charts with four R's, three A's, and an invitation list of twenty-seven. That is not accountability, that is a shield wall.

I insist on exactly one accountable executive per outcome. Not per workstream, per outcome. That executive has the authority to reprioritize, to stop work, to escalate, and to take the credit or the hit. They are not a figurehead. They attend the weekly decision forum. They read the pre-read. They sign the RAG status themselves. When that person is wrong, I have one call to make. When that person is five people, I have a committee, and committees do not deliver programs on time.

The recovery pattern

The six failure modes are diagnostic. The recovery is a sequence.

Weeks one and two are the diagnostic. Weeks three and four rebuild the outcome tree and the portfolio kanban. Weeks five to eight install the cadence: a weekly decision forum with the sponsor, a fortnightly program increment review, and a monthly portfolio review with the funding committee. Weeks nine to twelve stabilize the metrics (throughput, lead time, escaped defects, dependency aging) until the program can predict its own delivery to within one sprint.

None of this is novel. PMI has been writing about it for decades. AXELOS codified it in MSP. Scaled Agile, LeSS and Disciplined Agile all have their variants. What matters is not the framework. What matters is that the operating model is visible, enforceable, and owned by people with enough time and authority to run it.

Where defense programs differ

Defense IT programs carry two additional constraints that civilian programs do not. First, classification boundaries fragment the team — contributors cannot always see each other's code, documents, or tickets. Second, the political timeline is not the engineering timeline, and ministerial commitments become dates that precede feasibility studies. Both constraints are survivable if you accept them as input rather than fight them as bugs.

For classification, I front-load the boundary analysis: which artifacts cross which boundary, under which approval, with which crypto. Teams then design around the boundaries instead of hitting them at integration. For the political timeline, I negotiate a staged commitment, with initial operating capability (IOC) first and full operating capability later, and I protect the IOC scope aggressively. The ministerial date lands on a real, if narrower, capability. Full scope arrives when engineering says it does.

I have written more about this pattern in the defense delivery playbook, including the six-step HowTo I use for structuring a program from mobilization to IOC.

What I watch in the first 30 days

Throughput stability. A three-sprint rolling average of epics delivered per team. If it is moving by more than 30 percent sprint over sprint, the system is not stable.
Dependency aging. The oldest outstanding inter-team dependency. If the oldest is over six weeks, I have an exchange problem, not a technical problem.
Decision velocity. Number of sponsor decisions taken per week versus number requested. A ratio below 0.7 predicts a slip within the quarter.
WIP-to-throughput ratio. If started work exceeds three times quarterly throughput, the program has no finish line — it has a queue.
Escaped-defect trend. Not absolute defects — the trend. Rising escaped defects mean test capacity is being consumed by rework, which means future throughput is already mortgaged.
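
The five checks above reduce to a handful of small calculations. The Python sketch below uses the thresholds from the list (30 percent swing, six weeks, 0.7, three times throughput); everything else is an assumption made for illustration, including the function names, the least-squares slope as the definition of "trend", and treating a week with no requested decisions as fully served.

```python
from datetime import date
from statistics import mean


def rolling_average(values, window=3):
    """Rolling mean over the last `window` sprints."""
    return [mean(values[i - window + 1: i + 1])
            for i in range(window - 1, len(values))]


def throughput_is_stable(epics_per_sprint, max_swing=0.30):
    """True if the rolling average never moves more than max_swing per sprint."""
    averages = rolling_average(epics_per_sprint)
    for previous, current in zip(averages, averages[1:]):
        if previous == 0:
            if current != 0:
                return False
            continue
        if abs(current - previous) / previous > max_swing:
            return False
    return True


def oldest_dependency_age_days(raised_dates, today):
    """Age in days of the oldest open dependency (0 if none are open)."""
    return max(((today - raised).days for raised in raised_dates), default=0)


def decision_velocity(taken, requested):
    """Sponsor decisions taken divided by decisions requested that week."""
    return 1.0 if requested == 0 else taken / requested  # assumption for 0


def trend_slope(series):
    """Least-squares slope against the index; positive means rising."""
    n = len(series)
    if n < 2:
        return 0.0
    x_mean, y_mean = (n - 1) / 2, mean(series)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def first_30_day_flags(epics_per_sprint, dependency_dates, today,
                       decisions_taken, decisions_requested,
                       started_epics, quarterly_throughput, escaped_defects):
    """Evaluate the five warning signs for one team or program."""
    ratio = (float("inf") if quarterly_throughput == 0
             else started_epics / quarterly_throughput)
    return {
        "throughput_unstable": not throughput_is_stable(epics_per_sprint),
        "exchange_problem":
            oldest_dependency_age_days(dependency_dates, today) > 6 * 7,
        "decision_slip_risk":
            decision_velocity(decisions_taken, decisions_requested) < 0.7,
        "no_finish_line": ratio > 3,
        "escaped_defects_rising": trend_slope(escaped_defects) > 0,
    }
```

The inputs are plain counts and dates, which is the point: none of these checks needs more than what a PMO already records.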

What I do not measure

Story point velocity, feature-team happiness indexes, Jira board hygiene scores. These are noise at the program level. They matter inside a squad; they lie at the steering committee.

Closing note

Rebuilding a program is not a technology problem and not a methodology problem. It is a clarity problem. Clear outcomes, clear ownership, clear cadence, clear limits on work in progress. Everything else — the tooling, the frameworks, the ceremonies — is downstream of those four. I have watched SAFe work, Scrum-of-Scrums work, MSP work, and custom hybrids work. I have also watched all of them fail. The variable is never the framework. The variable is whether the operating model is visible and enforceable, and whether someone with authority is willing to hold the line.

If you are midway through a program that has started to drift, the most useful thing you can do this week is the diagnostic, not the reorganization. Count your WIP. Count your dependencies. Count your sponsor decisions. The numbers will tell you what to do.