Why do organizations continue to fall for the big bang rewrite?
After decades of proven and documented failures, the allure of big bang rewrites, overambitious bolt-on projects and all-you-can-eat modernization programs continues to entice and trap organizations, like moths to a flame. There are probably multiple reasons for this, many of them looking perfectly valid and legitimate, along with factors across several dimensions (organizational, technological, reputational, budgetary, strategic, psychological and so on), which I will try to examine here.
At its core, the tendency to choose risky migration strategies often stems from a fundamental misunderstanding of the nature of technological evolution. True digital transformation is not about dramatic, one-time rewrites but about creating adaptable, resilient systems that can improve incrementally. History’s fields are strewn with the rotting corpses of these initiatives: NHS, Netscape, Vista (remember?), and many others that never made it into the public sphere; every big company has a few of these.
A model with more chances of delivering sustained success views technology not as a series of attempted revolutionary leaps done in haste when the situation is dire, but as a continuous, adaptive journey of measured improvements.
Definitions: Big Bang vs Bolt-on
Let’s start by clarifying the concepts. Big bang rewrites and bolt-on modernizations are distinct approaches to modernizing legacy systems. We will focus here mostly on the perils of the big bang rewrite, but let’s be clear on both.
Big Bang
We can define this as a complete replacement of the existing system in “one go” (a very relative way of putting it), with the goal of rewriting the entire system at once. This has immediate problems, one being delayed feedback and the inevitable deviation from unrealistic completion dates, at which the systems are expected to be swapped and no problems are expected to occur. Needless to say, the all-or-nothing shot is extremely high risk and difficult to pull off. There are contributing factors such as developing from scratch in a new technology stack, with the associated risks of skills gaps, bandwagon effects and FOMO, induced complexity, and loss of the accumulated knowledge embedded in the legacy system. Also, this type of effort typically aims to replicate all existing functionality while also adding new features, rather than taking the opportunity to look at what can be simplified and optimized (a subtractive approach rather than an additive one; less is more, etc.).
Bolt-on
On the other hand, this denotes an incremental approach to modernization: new functionalities are “bolted on” to the existing system. This, in theory, is better, but it is not at all exempt from risks:
Adds new components or features to the existing system
Often uses modern technologies alongside the legacy system
Gradual integration of new technologies, so you get feedback much sooner
The main risk is trying to “bolt on” too much, too soon, too fast, and/or creating a hybrid Frankenstein architecture that results in a system that is more difficult to maintain and generally more brittle, with the possibility of ossifying into an interim architecture that is difficult to move out of later on. This can be the case, for example, when parts of the UI are modernized while others are not, using different backends while other pieces of the legacy system are still in place.
There is a delicate balance to strike here, and a risk that some poor choices become cemented in the new system, so that it starts out with new, or more, technical debt. Bolt-on approaches allow for continuous operation of the existing system, while big bang rewrites often require a complete switchover. To ensure stability, you will need to adopt certain practices and discipline around observability and the careful rollout of new features and changes (canary releases, feature flags, A/B testing, fast rollback), as in the sketch below.
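To make that rollout discipline concrete, here is a minimal sketch of a percentage-based feature flag of the kind used for canary releases and fast rollback. It is not tied to any particular flag service; the names (FeatureFlag, is_enabled, "new-billing-ui") are hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class FeatureFlag:
    name: str
    rollout_percent: int  # 0 = everyone stays on the legacy path, 100 = fully migrated

    def is_enabled(self, user_id: str) -> bool:
        # Hash the user id so each user consistently lands in the same bucket.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.rollout_percent

# Ramp the new code path gradually; rolling back is just setting rollout_percent to 0.
flag = FeatureFlag(name="new-billing-ui", rollout_percent=10)
for uid in ("alice", "bob", "carol"):
    path = "new" if flag.is_enabled(uid) else "legacy"
    print(f"{uid} -> {path} billing UI")
```

The point is not the few lines of code but the operational habit: every risky change ships behind a switch that can be dialed up gradually and reverted instantly.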
In both cases, integration concerns are often left for last, when it is well known that integration is invariably trickier than anticipated.
Now that we’re done with the basic definitions, let’s briefly explore those factors in no particular order.
Socio-technical
Since it can be argued that organizations are socio-technical constructs, and today it is often difficult to cleanly separate the technical from the organizational, some factors affect both dimensions inextricably.
The Appeal of a Clean Slate - and the desire to end all the evils of the legacy system in one fell swoop. The idea of starting fresh with a completely new system seems much easier to sell, and the budget easier to secure this year, with the promise of eliminating all the problems and technical debt of the legacy system in one go. This is often conflated with the idea that the new system will showcase a simplified design, when the reality is that the new system will often be more complex (probably unjustifiably so) and will inherit flaws of the existing design plus new ones acquired during its development. This can happen for a mix of reasons, from resume-driven engineering to entrenched constituencies, the bandwagon effect, or the hero complex.
The Fallacy of Better Performance for Free - another unjustified but hard-to-dislodge meme or belief is that you will also get incredible performance “because the cloud”, and that you will get there faster by starting from scratch. The attraction of short-term gains wins out, while long-term stability and risk management are shoved aside. The same fate befalls redefining features, processes and UX.
This ignores simple facts, such as the database and the physical data model often being the ugly beasts no one wants to dance with: they can be fiendishly complex to untangle and the risk far exceeds the palatable. So we put an anti-corruption layer on top, swallow the legacy (or “heritage”) whole and pretend to sweep it under the rug, and we end up with a fake microservices architecture joined at the hip by the dependency on a common data model underneath them all. A minimal sketch of such an anti-corruption layer follows below.
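For illustration only, here is a hedged sketch of what that anti-corruption layer can look like in code: an adapter that translates a legacy, denormalized record into the new domain model so new services never touch the legacy schema directly. The field names and functions (CUST_NM, legacy_fetch_customer, Customer) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    """The new, clean domain model used by modernized services."""
    customer_id: str
    name: str
    email: str
    is_active: bool

def legacy_fetch_customer(raw_id: str) -> dict:
    # Stand-in for a call into the legacy database or API, returning the legacy shape.
    return {"CUST_ID": raw_id, "CUST_NM": "ACME CORP  ",
            "EMAIL_ADDR": "OPS@ACME.EXAMPLE", "STATUS_CD": "A"}

class CustomerAntiCorruptionLayer:
    """The single place where legacy naming, padding and status codes are absorbed."""
    def get_customer(self, raw_id: str) -> Customer:
        rec = legacy_fetch_customer(raw_id)
        return Customer(
            customer_id=rec["CUST_ID"],
            name=rec["CUST_NM"].strip().title(),
            email=rec["EMAIL_ADDR"].lower(),
            is_active=(rec["STATUS_CD"] == "A"),  # legacy status codes decoded here only
        )

print(CustomerAntiCorruptionLayer().get_customer("42"))
```

The trap described above is when this layer quietly becomes permanent and every “microservice” still depends on the same underlying data model; the layer is a tool for decoupling, not an excuse to avoid untangling the data.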
Lack of Technical Debt Understanding - Most organizations struggle to quantify technical debt, often because legacy projects are big balls of mud that have passed through many different teams, both in-house and external, and therefore no one has complete, intimate knowledge of the system. For some systems, that is impossible for any single individual.
Without clear metrics, leaders default to extreme solutions, either complete abandonment or total reconstruction, instead of nuanced, incremental improvements. Inevitably, this tends to result in hand-waving away hidden complexity and an overly optimistic appraisal of the time it will take. This is compounded when the next move is to hedge the risk away by turning the modernization into a turn-key project with an external vendor that has no prior knowledge of the system.
Risk Misperception - Paradoxically, companies sometimes seem to perceive incremental changes as riskier than massive, comprehensive rewrites, or maybe it is a reluctance to commit to a sustained effort, for multiple reasons, including the constraints of budgetary cycles. They fail to recognize that large-scale transformations introduce exponentially more risk than controlled, phased migrations, and they tend to think that one big shot will take less time than a longer roadmap.
This is often also tied to the fact that while the modernization is happening, the current system has to be supported by the same team that has to build the brand new version, something that will very likely not work. This team is often the only repository of specific technical knowledge, but it will be stretched beyond reason. To compound the problem, while system X v2 is being built, features are often frozen for X v1, which tends to cause despair among business stakeholders who want new features yesterday, crying that those features are strategic and essential, that the delay will hurt the bottom line, and that they cannot wait a year.
So the same feature is implemented hurriedly, badly, and with technical debt from the start in both v1 and v2. In an effort to control and minimize this, schedules are compressed, often because of stakeholder pressure, which only adds to the problem. To be sure, some organizations may well perceive the risks in front of them accurately, but feel compelled to take them, or to half-ignore them, due to additional factors or extraneous pressures.
Skill Gap Denial - Companies often underestimate the massive skill transformation required, be it for cloud-native architectures, for process reengineering, for totally changing the operational model, or for leveraging more modern methodologies like service design and design thinking. Instead of investing in gradual upskilling, organizations tend to imagine they can instantly transform their workforce through a massive rewrite, or, again, in-house teams are kept away and the initiative is handed to an apparently credentialed and reputable vendor.
I suspect HBR and McKinsey fables do not help here and inflate these expectations. Vendors might do the same in their hope of landing some of those big, hairy projects. Even supposing the project is completed that way, it is almost always the case that the knowledge is very hard to transfer to in-house employees. This can also demotivate in-house teams, who perceive a lack of trust and a lack of room for growth.
Fear of Impending Obsolescence - Rapid technological change creates existential anxiety. Companies often leave modernization initiatives until quite late, perhaps too late. We will do it next year, not this year, because our budgets need to remain flat, or because we are doing this other thing, or because we are leaving the risk for the next person who comes along. When organizations feel they are too far behind, or need to present a shiny face to investors or the board, the pressure cooker is about to explode and the itch can no longer go unscratched; they tend to feel an excessive urgency and zeal to modernize, but haste is rarely a good counsellor.
This can lead to overcompensation by pursuing aggressive, ambitious modernization strategies to demonstrate they are not “falling behind.” But then it is not the best moment to present a well-thought-out incremental modernization roadmap. Intense and mounting pressure from key stakeholders or the market for radical, visible change pushes organizations towards drastic solutions rather than measured approaches. The allure of a quick fix is strong, especially when facing urgent legacy issues. Fear of being left behind technologically is a powerful driver of rushed decision-making.
Beneath many of these motifs often lies the lack of a data-driven culture and of metrics for the success or failure of this sort of program. This can only exacerbate poor decision-making. Other biases that I explore in this article also play an important part in pushing organizations to choose risky courses of action.
Another issue is that big bangs are often too focused on the technology side of things, neglecting other, “less tangible” areas such as ingrained processes and culture, as well as organization design itself. Without also paying attention to those, and changing things there too, it is rather unlikely that technology alone can lift it all up. Similarly, remember the well-studied Conway’s Law relating organization design and system design.
There is also the so-called Agile Transformation Paradox which reflects how organizations that attempt to become more agile sometimes ironically choose the least agile approach to modernization. However, that’s an extensive and interesting topic for another time.
Organizational
Hierarchical Decision-Making Disconnects - Technical realities are often disconnected from executive decision-making. Leaders who don’t deeply understand technological complexities make sweeping decisions based on incomplete information and unrealistic expectations, especially if they don’t bother to ask for advice or support or to understand the finer details. It could be they were lulled by the vendor’s siren song and success case decks too.
Sunk Cost Fallacy - Organizations, or some of their members, become emotionally and financially invested in existing systems, believing that massive reinvestment will somehow justify previous expenditures. This leads to grand transformation narratives that promise complete redemption through total system reconstruction, rather than incremental, pragmatic improvements. See “Entrenched Constituencies” below. Ambitious transformation narratives are often peddled by vendors and gladly swallowed whole by organizations (see “Hero Complex” below). Loss aversion works in a similar way.
Entrenched Constituencies - an (anti)pattern I’ve seen is stakeholders who perceive that their job is basically tied to the survival of a given system, preferably in its current form, since someone has to “manage the chaos” and “understand the complexity”, allowing them to keep leveraging that bit of obsolete knowledge.
These stakeholders will prefer investing in non-transformational patching so that things remain basically the same, much as NGOs are often interested in perpetuating, if not aggravating, the problems that underpin their raison d’être. A definitive solution is not something these stakeholders cherish. This is also tied to budgetary control, team size, reporting lines and position in the org chart (and therefore to psychological factors like ego, perceived self-worth and relevance).
Underestimating Legacy Knowledge - Organizations often disregard or fail to recognize the value of the accumulated knowledge and bug fixes embedded in legacy code. A common mistaken belief is that new code will automatically be better than old, battle-tested code, but that is not necessarily the case. I know, this seems to contradict other points made here, but as always, it depends. It depends on the extant technical knowledge: for example, whether you still have a long-standing team that holds it, or whether those people left or were let go and the knowledge was lost.
Underestimating the knowledge embedded in legacy codebases carries an added peril: it compounds the inherent challenges of integrating legacy systems with modern systems and technologies, especially when it comes to data consistency and integrity. Losing that legacy knowledge will aggravate these issues. Integration work is rarely simple.
Organizational Procrastination: Sometimes organizations rush into big bang rewrites when faced with regulatory deadlines or compliance requirements. Whether that is something that could have been anticipated and dealt with differently, rather than with a big bang, is an exercise for each specific case, but just like people, organizations tend to procrastinate and leave some initiatives for last. By then, external pressure leads to, or forces, hasty decisions and artificial timelines. A variation of this is when “panic” sets in and the organization suddenly feels the pressure to modernize in order to keep up with the competition. Perhaps complacency, or not paying attention to how the market evolves and to weak signals, has made the organization rest on its laurels for too long. This is similar to the Fear of Impending Obsolescence examined earlier.
Lack of Experience - Many organizations lack experience (and the wherewithal, or the will to commit it in sufficient amounts) in conducting successful incremental modernization strategies. They may not be aware of the different strategies, options and patterns for gradual replacement with less risk. Instead of bridging the skill gap mentioned above, they rush in.
Cultural / psychological
I won’t repeat the other cognitive barriers and biases that tend to appear in any knowledge-intensive work and that I have already explored elsewhere, but I have added a few that seem very particular to big technological efforts.
Hero Complex, or the big promise of technological salvation - Technology leaders are often incentivized to pursue “heroic” transformation narratives, presenting themselves as the white knight who cut the Gordian knot and ushered in a new era of unheard-of prosperity and technological advancement. A complete system rewrite allows them to position themselves as revolutionary change agents, visionary IT leaders and almighty wielders of IT prowess, rather than as methodical improvers delivering value in smaller increments at less risk and cost to the organization.
Remember the Greeks: the hubris of the hero brings about the retribution of Nemesis. But I digress. Death marches are sometimes a show of heroics, and other times happen simply because someone set a milestone in stone for some political or marketing reason.
College of Performative Arts - this probably falls somewhere between the cultural, the organizational and the psychological, but let’s leave it here. Technology leaders often feel pressure to demonstrate transformative capabilities. A “rebuild from scratch” or “complete modernization” initiative looks more impressive in quarterly presentations and annual reports than measured, conservative approaches with less bombastic results. It creates a narrative of bold technological leadership, even if the underlying strategy is probably fundamentally flawed for many of the reasons explored here, and others that are not. In sum, the need to demonstrate progress to stakeholders can drive artificial deadlines, neglecting the impact on team morale and technical decisions.
New Ambitious Guy: sometimes a new CIO/CTO is looking to make their mark with transformative projects, too soon and too fast. If there were previous failed programs, incremental or not, those may be used to justify more radical approaches. This can be a sure killer for the newcomer.
Technology Romanticism - There’s a persistent, perhaps “romantic”, or techno-utopian if you will, idea that new technologies can solve all legacy system problems, when often many of the problems are not rooted in the technological domain but rather in obsolete operating models, cumbersome processes far past their expiry date, and organizational models that are no longer fit for purpose. This leads to an almost religious belief in complete reconstruction (the rapture, narratives of new beginnings), ignoring the complex realities of existing business processes and institutional knowledge because, well, they are harder to deal with, cannot easily be handed off to a third party, and require an intentional, careful and long process to fix. Related to this is the fact that starting a big new project is more appealing to management or developers than the quieter work of incremental improvement. This can show up as CV-driven architecture or resume-driven programming, with teams pushing to adopt microservices without carefully weighing the decision, or as vendor-driven complexity, as explored below.
Undue Confidence - More down to earth than the previous two, there is often an overconfidence that the new system will be superior and will avoid all the problems of the old system, as if by magic or by executive fiat. This ignores several facts, one being that many legacy problems emerged from real-world usage and from years of accrued layers, features and cruft added by many different teams whose knowledge was long lost, and another being that the modernization project will often prefer to focus on the technological side of things, forgetting about processes, UX, operational considerations and so on, as mentioned elsewhere here.
Cargo-culting and the Bandwagon Effect - it is nothing new that certain technological solutions become raging trends, and since everyone copies everyone and there is no lack of hype and FOMO, decision-makers will no doubt feel strongly influenced to adopt specific technological solutions because they seem to be industry norms and no one wants to be the odd one out. This is not to say that those technological paradigms hold no value or promise, but beware the mirage of holy-grail solutions.
Consultant-/Vendor-Driven Complexity - Management consulting firms frequently recommend comprehensive transformation strategies that maximize their billable hours. These strategies typically involve complex, large-scale rewrites that create ongoing consulting opportunities and dependencies. There is more on consultancy firms in a previous article.
One image is worth…
Now that we have explored the main factors that often drive the decision to go for a big bang modernization effort, let’s look at a simple graphic, borrowed from https://microservices.io/, which illustrates two main problems that should be key takeaways here:
Time to first feedback is far too late
Actual time to completion is much later than anticipated
The “strangler fig” pattern showcased here is a well-known modernization strategy that gradually replaces an existing system by building the new system around it, the idea being to reduce risk by allowing incremental migration. That is not the topic now, but a minimal sketch follows for flavour.
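Here is a hedged, minimal sketch of the routing facade at the heart of the strangler fig pattern: requests whose path has already been migrated go to the new system, and everything else still hits the legacy one. The path prefixes and backend names are invented for illustration.

```python
# Slices that have been "strangled" so far; this tuple grows phase by phase.
MIGRATED_PREFIXES = ("/billing", "/invoices")

def route(path: str) -> str:
    """Decide which backend serves a request based on migration status."""
    if path.startswith(MIGRATED_PREFIXES):
        return "new-service"
    return "legacy-monolith"

for p in ("/billing/123", "/orders/9", "/invoices/2024-01"):
    print(f"{p} -> {route(p)}")
```

In practice this facade usually lives in an API gateway or reverse proxy rather than in application code, but the principle is the same: the boundary moves one slice at a time, and feedback arrives with each slice.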
There are other choices, of course. An article on those is coming soon, examining modernization and cloud journeys in particular from a similar angle.
Mitigation
The first step in the cure must be to learn of the disease. It is not rare that the different pressures of the business environment, and the juggling of multiple projects vying for our attention, push us towards operating in a certain default mode or on automatic pilot, which is my motivation for examining these projects from a bias angle in this article and previous ones.
We must first be aware of the multiple factors that can drive us to assume undue risk, so that we do not unthinkingly fall prey to them. That being said, what else can we do? What follows are general recommendations, not silver bullets.
If the project is really high-stakes (modernizing a core differentiator), staff it accordingly and find the best people and talent to do it; ensure some of your best people are dedicated to the team, and that there is a dedicated Product Owner and Architect. If you have an Enterprise Architecture team, they can assess the current state of the systems and identify gaps between the existing architecture and the desired future state. This analysis is crucial for determining modernization priorities in terms of portfolio management and in alignment with the organization’s overall business strategy and goals. They should also be able to indicate what interim architectures may be necessary in the modernization journey. A classic example is a lift and shift as an interim step between on-premises (origin) and serverless (target). In itself this example sounds simple, but the system will be more complex and will not work in isolation. Subsystems may be at different stages of legacy and may have different modernization paces in their journeys.
Conduct an honest pre-mortem exercise. Be sure to understand where the biggest pitfalls are and how you can prevent them, both technically and organizationally.
Go for phased, incremental modernization and break the effort down into smaller, manageable phases. The gradual implementation of changes, new architecture and modernized features will enable much shorter feedback cycles and continuous assessment and adjustment, and you will not risk a big, system-wide failure at the big bang’s end. Work in sprints, needless to say.
Decide which parts of the new system, or which new subsystems, are critical, which can be offloaded to a SaaS offering, and which are well understood and low risk and can therefore bear the risk of a big bang rewrite. Use tools like the Independent Service Heuristics from Team Topologies.
Always have a clear and working rollback plan or exit strategy for each major phase of the modernization and for key components. You want to have a clear, tested and fast procedure for reverting changes or decisions. Test it regularly.
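As a purely illustrative, hedged sketch of what a “clear, tested and fast procedure for reverting changes” can look like when encoded rather than left in a wiki page, consider the following; the names (RoutingConfig, healthcheck, rollback_to_legacy) and the backends are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RoutingConfig:
    billing_backend: str = "new-service"  # state after a migration phase has gone live

def healthcheck(backend: str) -> bool:
    # Stand-in for a real smoke test against the chosen backend.
    return backend in ("legacy-monolith", "new-service")

def rollback_to_legacy(config: RoutingConfig) -> RoutingConfig:
    """Revert routing for the billing slice and verify the legacy path still works."""
    config.billing_backend = "legacy-monolith"
    assert healthcheck(config.billing_backend), "rollback target is unhealthy"
    return config

# "Test it regularly": run the rollback on a schedule in a non-production environment.
print(rollback_to_legacy(RoutingConfig()))
```

The value is less in the code than in the habit: a rollback that is executable and rehearsed, not a document that turns out to be stale at the worst possible moment.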
Pay attention to buy-vs-build decisions and their trade-offs.
Knowledge preservation and documentation. If the documentation of the existing system is found wanting, as is often the case, invest in using the legacy knowledge that exists in the organization to thoroughly document existing business logic and system behaviors, involving business stakeholders, who can often uncover and illustrate edge cases, workarounds and undocumented features or behaviors.
Keep your business stakeholders and users engaged, and listen to their feedback on new features and changes. Develop a clear and simple communication strategy, and keep stakeholders informed of progress and challenges. It can be difficult with key stakeholders in high places, but do not be tempted to sweep problems under the carpet; if you cannot fix them later on, you will be in trouble. By all means bring solutions, mitigations or lessons learnt, but do not hide risks until it is too late to manage them. Also, set realistic and sensible expectations for the modernization timeline and outcomes, and do not cave in to pressure for faster delivery, as that will be a problem for your future self.