Main Cloud Migration Strategies
Cloud migration options are commonly referred to as the "6 Rs" of cloud migration,
Rehost
Re-platform
Refactor / Re-architecture
Repurchase (move to a SaaS, generally)
Retire
Retain
but arguably we could also consider two more possible strategies
Rebuild
Replace
Note: I realize this is hardly a new thing, so I will try to explore as well the organizational ramifications for each strategy that I think should be taken into account, and not merely explain the strategies.
The levels of effort, risk and cost involved in each are obviously different, which is why a practice of continuous portfolio management matters.
An effective IT application portfolio management practice is essential for enterprises with large ecosystems to maintain operational efficiency, reduce costs, and support strategic decision-making. A portfolio should provide visibility into application usage, business value, and costs, in order to understand when the latter start to grow compared with the former, identify redundancies, know when to modernize legacy systems, and allocate always scarce resources strategically. Continuous portfolio management methodologies, such as Gartner’s TIME (Tolerate, Invest, Migrate, Eliminate) model, offer structured and objective approaches to assess and categorize applications along dimensions such as technical health, technology lifecycles, business value, strategic alignment etc. These frameworks allow enterprises to streamline their IT landscape by determining which systems to retain as-is, enhance, migrate to newer platforms, or phase out entirely. TIME is just one such tool, there are other similar tools out there, like PACE (yes, also Gartner), a Horizons-based model (core, growth-oriented, emerging), the BCG matrix, a heavy methodology that you can adapt, such as TOGAF, or any custom application scoring framework tailored to your own organizational goals and strategy, normally measuring things like business impact and value, user satisfaction, TCO, risk, stack obsolescence, cost effectiveness and so on. Ideally, data-driven from actual application usage and not stakeholder gut-feeling or opinions. There are vendor agnostic approaches too, such as MoSCoW that can be applied to IT portfolio management. Simpler, more generalist tools like Impact-Effort Matrix or the Kano Model are also useful tools. There are countless others.
I am not selling or advocating for any specific tools from the ones cited. Vendor methodologies come and go all the time, rehashed, reheated and served again to willing customers. But as a tool, they can provide useful guidance by indicating dimensions you should pay attention too, so can help guide your thinking. At the same time, many of those frameworks - especially those are not complicated to use like TOGAF - tend to the oversimplification of the world to 2x2 matrices. Use judiciously. There are entire books on the topic of IT portfolio management, so there are more details than can be captured in a simplistic matrix.
What matters here is that you employ an objective and comprehensive methodologies for articulating and communicating decisions. Or, you adopt a Portfolio Management tool. Ultimately the idea is to have a clear objective view and understanding of the IT Portfolio, so you can decide different migration roadmaps with clear exit criteria defined. Your EA principles should direct those, in line with business strategy and goals, establishing solid governance goalposts. The roadmaps will be showing how to implement phased migration approaches, that will probably be different for different applications. Anyway, this article is not about this topic.
Before we begin
A basic taxonomy to be clear on the terms
Cloud-Native
Applications built specifically to maximize the capabilities of cloud environments, leveraging microservices architecture for modularity, scalability, and independent development.
API-first design ensures seamless integration and interoperability, enabling applications to consume and provide services efficiently.
Incorporates serverless architectures (e.g., AWS Lambda, Azure Functions) and containerization (e.g., Docker, Kubernetes) to optimize resource utilization and scalability.
Emphasizes statelessness and distributed systems design, enabling seamless scaling and high availability across geographic regions.
Designed with continuous delivery (CD) in mind, allowing for rapid iteration and deployment cycles.
Cloud-Resilient
Focused on creating fault-tolerant systems that maintain operational integrity during component failures or disruptions.
Employs resilient design principles, such as redundancy, automated failover mechanisms, and self-healing capabilities.
Supports cloud-agnostic runtime integration, enabling portability across multiple cloud platforms without vendor lock-in.
Embeds proactive monitoring and bundled metrics, offering real-time performance insights and issue detection.
Practices chaos engineering and proactive failure testing to simulate disruptions, identify weaknesses, and build resilience against outages.
Cloud-Friendly
Adopts the 12-factor application methodology, which emphasizes principles like environment parity, configuration as code, and dependency isolation.
Designed for horizontal scalability, allowing for dynamic addition or removal of resources based on workload demands.
Leverages platform-managed services (e.g., load balancers, managed databases) to simplify operations and enhance reliability.
Optimized for high availability by taking advantage of built-in cloud platform features, such as regional redundancy and auto-scaling groups.
Cloud-Ready
Applications exhibit minimal dependency on physical infrastructure, with no reliance on permanent disk access, ensuring flexibility for cloud environments.
Self-contained applications are designed with all required dependencies bundled, reducing setup complexity and improving portability.
Rely on platform-managed ports and networking, enabling seamless connectivity and load distribution within cloud ecosystems.
Built to leverage platform-managed backing services, such as managed storage, caching, and messaging solutions, reducing the burden of maintaining underlying infrastructure.
Positioned as candidates for cloud migration, often requiring minimal rework or reconfiguration to operate effectively in a cloud environment.
Exploring the strategies
Rehost (Lift and Shift)
This approach involves moving applications to the cloud with minimal modifications. It's the quickest and least complicated (which does not necessarily mean simple or not time consuming) migration method, essentially transferring existing servers and applications to cloud infrastructure without changing the code. Normally this means just shifting the workload to a replica of the on-premises environment to vanilla VM-based IaaS. You could containerize those workloads, but ultimately they remain the same under the hood. This approach can be suitable for organizations with time-sensitive migration needs, limited cloud expertise, a large portfolio that precludes a comprehensive cloud journey to PaaS, or applications that don't need or require immediate optimizations. Pros include a faster migration, minimal disruption, and lower immediate transformation costs and risks. Cons include missed opportunities for cloud-native optimization, potentially higher long-term operational costs (that many VMs) due to the large IaaS footprint, and limited scalability.
It's probably something not done massively anymore in 2024, unless you are still closing datacenters , or perhaps incorporating new pieces after an M&A deal, and it is the less mature option in a cloud journey. It can also be a trap from which it is difficult to move out later. Once you are in "the cloud", business stakeholders might be already expecting the miracles of technology to manifest, and there can be less appetite for continuing the journey towards cloud-native modernizations.
To sum up:
Minimal immediate skill transformation needed, especially if a vendor is handholding you
Quickest migration path, ideal if you need to meet tight deadlines to tell your stakeholders, investors etc. that "you are in the cloud"
Limited initial disruption to existing processes, which has less friction and requires less discussion and stakeholder management, but can be probably a bad thing since inefficient processes remain unchanged
Potential for future optimization, which alternatively means limited immediate cloud-native benefits are obtained
May introduce some sort of complacency or reluctance to move forward in the cloud journey for those applications that merit it - applications that are thus migrated tend to remain and linger in that state for too long if they seem to "just work", especially in large portfolios
Constraints include higher long-term operational costs
Limited immediate cloud-native benefits, which can mean missing strategic value, or
Continued challenges in achieving operational efficiencies
Be sure to assess the long-term costs of running a large IaaS footprint
Starting small with the classic "low-hanging fruit" approach is a sensible way to ensure early wins and build organizational confidence, and probably applies to almost, if not all, strategies here.
Re-platform (Lift, Tinker, and Shift)
A middle-ground strategy where applications are migrated with some cloud optimization but without complete redesign, think for example moving only some parts of the systems that are clear-cut and not tangled in a web of dependencies (admittedly rare, but could be the case), ancillary services, or new requirements that could be implemented as a separate piece instead of adding to the monolith you already have. This might also involve modernizing a tier to leverage some cloud capabilities, such as migrating to a managed database offering or implementing basic auto-scaling, with minor modification to your codebase, such as removing or relocating persistent user state or sessions. Or, alternative, moving only some part of the application, some clear-cut service, component or integration that is easier to carve out. A tool like Independent Service Heuristics can be helpful here to identify candidates, as well as specific domain knowledge.
This migration option is suitable for key parts of the portfolio where reaping cloud benefits in an incremental while removing the risk (or the impossibility) of a full re-architecture, where such a thing is not warranted. As with most of the other options, a technical feasibility assessment should be done to find out if this is the right option and how far to go with the modernization. Pros of this option include performance improvements, albeit probably moderate, some cost or operational optimization, and reduced migration complexity avoiding riskier choices. Cons include incomplete cloud-native transformation and potential ongoing performance limitations. PaaS services like Azure App Service or Google Cloud’s serverless offerings to enhance applications without full redesign, although it will be unlikely you can move to a PaaS offering with no modifications either.
Presents more moderate organizational risks:
Enables faster wins for operational improvements while postponing full transformation
Partial skill transformation requirements, depending on how far it goes
Potential performance inconsistencies
Could require the ability to develop different parts of the system with lifecycles that move at different paces (IT Speeds etc), which could lead to problems later on
Requires more diversity of skills in teams, but that will generally be a good thing
Incremental changes may not deliver expected benefits, or present too many issues and technical problems down the road
Risk of creating hybrid environments that are harder to manage (operations, skills, as mentioned)
Incomplete leverage of cloud capabilities
As with the previous approach, this one too risks perpetuating legacy issues in partially modernized systems that never fully modernize
Inherent increased complexity of distributed systems
Refactor/Re-architecture
A comprehensive transformation where applications are significantly redesigned to fully leverage cloud-native architectures, typically involving microservices, containerization, and serverless technologies. Best for mission-critical applications where performance, scalability, and innovation are paramount. Pros include maximum cloud efficiency, enhanced performance, improved scalability, and future-proofing. Cons include high complexity, substantial time investment, significant expertise requirements, and potentially high transformation costs and risks.
A close second in risk profile, with critical organizational challenges such as:
Requires extensive retraining of existing development teams, including building more sophisticated skills like domain-driven design (DDD) for effective decomposition of legacy applications (or you need to resort to specialized consulting)
Substantial cultural resistance to fundamental architectural changes
High likelihood of introducing new technical debt while attempting to eliminate old debt - seen this happening a few times in spites of continuous assurances and promises (incidentally this is not specific to the cloud per se)
Complexity of managing parallel systems during migration
Risk of Reengineering
Potential outages or performance degradation during transition
Significant investment in new tooling and development practices
Two sides of the same coin: better possibilities of incorporating newer technologies at the risk of over-engineering solutions
Requires robust governance to prevent scope/feature creep and ensure alignment with business goals (the "since you are already working on this application" syndrome)
Can lead to just containerizing big applications that suffer no modification to their architecture and thus you end running huge pods that are just VM's in disguise
Repurchase (Replace with SaaS)
Replacing existing on-premises or custom applications with COTS or commercial Software-as-a-Service (SaaS) solutions. Ideal for non-core, non-differentiating standardized business functions like Field Services, CRM, HR, Marketing & Communications where specialized cloud solutions exist. Pros include immediate cloud benefits, reduced maintenance overhead, automatic updates, and typically lower total cost of ownership. Cons include potential loss of customization, vendor lock-in, and possible data migration challenges when moving out of your silos and proprietary custom solutions. Political issues might arise with stakeholders clinging to their beloved applications (and the budget that goes with them), so perhaps the main risks with this strategy fall in the organizational category:
Vendor lock-in challenges, often for the long term, but that's life
Data migration and integration complexities - there will have to be integration projects depending on this migration, which requires a full api-first mindset first
Potential misalignment between SaaS capabilities and specific business processes (you are guaranteed to have many complains here as people protect their turfs, aka job security and org chart relevance)
User adoption and change management difficulties
Hidden costs of customization and integration - which often happens with the siren song the implementing vendor will no doubt offer
Loss of unique competitive advantages embedded in custom solutions - albeit in theory you can have a strategy and build an ecosystem of enhanced services or competitive advantages on top of a COTS product, and this is only a maybe, since you could be repurchasing some system that is not really a core differentiator and more of a standard function everyone already has
Evaluate your SaaS adoption with longer term business strategies and consider vendor-related risks
Retire
Decommissioning applications that are no longer necessary or valuable to the business. This strategy involves identifying and eliminating redundant, obsolete, or inefficient applications during the migration process. Beneficial for organizations looking to streamline their IT portfolio, reduce maintenance costs, and simplify their infrastructure. It is probably the lowest effort option, but some things still need to be attended to, such as migrating or phasing-out customers or users, what to do with data and identify any hidden or forgotten dependency. So, basically, an impact assessment. Pros include reduced complexity, lower operational costs, and improved security by eliminating unnecessary systems. Cons might include potential disruption to business processes and the need for careful impact assessment.
Lowest organizational risk:
Simplifies IT portfolio and frees up IT resources
Reduces maintenance overhead
Potential cost savings
Risks include potential loss of legacy functionality, especially if there are dark corners of undocumented functionality that no one knows anymore about and that could impact workflows in legacy systems with low utilization. Apply the "pull the plug and listen for the shouting and swearing" principle (aka perform impact assessment)
Again, there might be some people clinging to those systems for justification
Disarm politically motivated stakeholders by having data that clearly identifies underutilized applications or legacy systems with limited business value
Retain
Keeping certain applications in their current environment, either due to compliance requirements, recent upgrades, or technical constraints. This might involve postponing migration or maintaining a hybrid infrastructure. Perhaps it is not worthy, not feasible or sunset is expected to happen soon (although I've seen systems that have survived their "roadmapped" decommissions for years). Appropriate for highly specialized applications, those with complex dependencies (think Team Topologies' complicated subsystems and beyond), systems with recent significant investments, or situation where risk, cost and timelines are too high (let the next C*O eat that one). Technology availability could eventually become a killer, though. Pros include minimal immediate disruption and preservation of recent technology investments - or even further amortization. Cons include missed cloud benefits, potential increased complexity in managing hybrid environments, and potential long-term inefficiencies.
Low organizational risk, with some caveats:
Preserves existing infrastructure and knowledge
Minimal immediate disruption, but inefficiencies are kept as-is
Allows for strategic planning - but also for kicking the ball forward and postponing decisions
Potential constraints include mounting technical debt
Increasing maintenance costs
Reduced competitive agility
Put in place ongoing monitoring to evaluate when retained applications might become migration candidates
Keep an eye on possible growing operational costs
Requires management of threatened stakeholders and their turfs, evaluate that and have a communication strategy ready and a clean way forward, or out, for those stakeholders
Some sort of eventual way out must be in place ... eventually
If for specific reasons this is not possible, consider how you can address the operational complexity of maintaining hybrid environments through automation, integration and/or APIs.
Rebuild
A comprehensive strategy involving complete application reconstruction using cloud-native technologies and modern development approaches. Most suitable for legacy applications that are fundamentally incompatible with cloud environments or require complete modernization. Pros include maximum technological flexibility, opportunity to incorporate latest architectural patterns, and potential for significant performance improvements. Cons include extremely high transformation costs, extended development timelines, and substantial organizational disruption. This strategy carries the highest organizational risk due to its comprehensive nature. This strategy carries the highest organizational risk due to its ambitious and comprehensive nature, so beware.
Massive skill gap between existing "legacy IT teams" and cloud-native technologies team, which can potentially introduce serious issues between the lifers and those hip trendy new hires
Underestimation of complexity in complete application reconstruction, which will invariably happen and I will explore in a future article
Significant budget overruns, often 2-3x initial estimates - and missed deadlines - as per the previous point
Prolonged development cycles leading to business functionality gaps, or forces a classic two-pronged approach where the legacy systems still acquires new functionality downstream from the new one, or even bidirectionally between old and new (features, fixes). What could go wrong?
The previous point tend to happen as timelines that are inevitably extended will reduce stakeholder patience and support (or their ability or appetite for forking out some more budget) for the project
Potential loss of institutional knowledge during complete rewrite - poor team management could mean you lose key knowledge mid-air
High probability of scope creep and endless refinement cycles
Risk of creating more complex systems than the original application was, because of creep, new bugs created on top of the ones who were ported to the new system from the original, because there will be pressure to cut corners, "reuse" code and not change the processes too much
Nice opportunity to score some ESG points with your audience & investors and introduce green computing practices and a more sustainable or leaner architecture and cloud offerings
Requires focus on change management, communication and phased development to maintain organizational confidence and drive
Inherent increased complexity of distributed systems
Don't fall for a waterfall approach here, agile and iterative development methodologies will be better choices when it comes to managing the complexities of large-scale rebuilds (although those do not remove the risks, obviously)
Replace
Involves substituting existing applications with entirely new solutions that better meet business requirements. This might mean changing vendors or developing new custom applications (not the same as rebuild) that are more aligned with current business needs. Ideal for scenarios where existing applications are severely limited or where business processes have significantly evolved. Pros include opportunity for radical improvement, potential cost savings, alignment with current business strategies and breaking up with ballast from the past when justified. Cons include potential high initial investment, risk of business disruption, and challenges in data migration and user adaptation. Data migration, user training, and change management are things you will also need to consider here, as in the other strategies too.
Opportunity for a strategic pivot to adopt modern best practices, technically and even organizationally
High likelihood of disruptions in entrenched processes & constituencies
Needs leadership alignment and transparent communication to manage resistance and smooth transition
Basic decision tree
Necessarily basic
Considerations
To successfully navigate these migration approaches, organizations should consider the following
Develop a robust cloud & architecture competency centers. This entails a series of question around the organization of IT and product and platform teams. This is no simple topic, and I recommend you read books such as Team Topologies or Agile IT Organization Design or learn from methodologies such as unFIX
Create detailed migration roadmaps with clear exit criteria and milestones
Invest heavily in continuous skills training
Implement phased migration approaches
Establish strong governance and (possibly) architectural review boards, although I have mixed feelings about those for their obvious limitations in knowledge and reach. I guess it depends on the type of organization and its maturity levels.
Develop comprehensive change management programs
Create fallback and rollback mechanisms
Maintain open communication channels across IT and business units
Have KPIs and metrics in order to build data-driven decision making as opposed to opinions, intuition and hand-waving
Make sure you always have data driving the decisions, not only for founded decision making but also for effectively managing stakeholders and navigating political interests or challenges
Consider what is your approach and policy for Infrastructure as Code, what will your policies be?
Out of the scope of this article, but think that you will also need to adapt your current deployment pipelines from on-prem to the chosen strategy
In all cases, a viewpoint beyond the immediate and the tactical is needed in order to:
Consider Emerging Complexity Factors
Increasing regulatory compliance requirements, something which is clear in the EU with its love for bureaucracy and intervention, as well as with sustainability and other similar trends
Multi-cloud and hybrid cloud environments
Where technological evolution is heading
Increasing complexity of distributed systems (operations, monitoring and observability, reliability etc.)
Build Key Organizational Capabilities for Successful Migration:
Strategic technological vision
Cloud architecture expertise
API-first and product vs project mindset
Continuous learning culture
Financial flexibility
Strong change management practices
Cross-functional collaboration
Risk management maturity
Most organizations should start with Rehost or Re-Platform strategies to build cloud migration experience, then progressively move to more complex strategies like Refactor or Rebuild as organizational cloud maturity increases.
A note on Cloud Repatriation
We've discussed moving to the cloud, but sometimes the reverse trip could be necessary. So-called Cloud Repatriation / Edge retention has gained some recent attention due to some famous articles. This approach involves either deliberately choosing to maintain on-premises infrastructure or moving workloads back from public cloud environments to private data centers or edge computing infrastructure. It's a strategic counterpoint to the widespread cloud migration narrative, acknowledging that cloud isn't always the optimal solution for every technological context.
Organizational Scenarios of Applicability Ideal candidates for this strategy include organizations with specific regulatory constraints, high-performance computing requirements, predictable and stable workloads, sensitive data processing needs, or those experiencing unexpectedly high public cloud operational costs. Industries like financial services, healthcare, government, scientific research, and certain manufacturing sectors often find compelling reasons to retain or repatriate infrastructure.
Pros
Complete control over infrastructure and data sovereignty - there are many ways to build mixed / hybrid architectures where some relevant data resides on-premises while you still run other workloads in the cloud and use on-prem gateways to connect the two worlds. This model applies as well to the Re-platform model where some parts or data can remain on-prem while other parts of the system move to the cloud, complexity notwithstanding.
Predictable, potentially lower long-term operational costs - at least you are reducing your exposure to unilateral prices / licenses increases in exchange for not offloading that much responsibility to the vendor
Elimination of cloud vendor dependencies
Precise performance tuning capabilities
No recurring cloud service expenses
Compliance with strict regulatory environments
Reduced network latency for specific compute-intensive applications
Full customization of hardware and network configurations, but slower provisioning of new infrastructure for which lifecycle now you have to be responsible
Open to debate whether the security for sensitive workloads stands in a repatriation effort
Cons
Significant upfront capital expenditure
Requires sophisticated internal IT infrastructure expertise
Ongoing maintenance and refresh responsibilities
Limited scalability compared to cloud elasticity
Higher costs for disaster recovery and redundancy
Increased complexity in managing hardware lifecycle
Reduced access to cutting-edge cloud-native services
Manual scaling and capacity planning
Higher energy and cooling infrastructure investments
Organizational Constraints and Challenges for Repatriation
Skill Set Requirements
Demands specialized infrastructure engineering talent
Requires continuous training, but just like mostly everything else today (think AI for non-IT employees)
Needs comprehensive understanding of capacity planning
Demands advanced networking and security expertise
Forgotten skills like Hardware in general
Financial Considerations
You will face initial and ongoing capital and staff expenditures
Complex total cost of ownership calculations - Read the "Pricing Granularity" section in the Wardley Maps book for a good exploration of this topic
Need for precise workload forecasting, for cost reasons too
Investment in redundancy and disaster recovery infrastructure
Technological Constraints
Limited access to rapid technological innovation - only mature and tech-savvy engineering organizations will be able keep up the pace
Management of hardware refresh cycles
Reduced flexibility in scaling computational resources
Higher complexity in distributed computing scenarios
Operational Challenges
24/7 infrastructure management responsibilities, SLAs, SLOs, Observability etc
Complex compliance and security maintenance, no offloading responsibilities
Higher overhead for monitoring and maintenance
Limited global distribution capabilities, or at least more difficult
Performance and Scalability Limitations
Fixed computational capacity, although admittedly this is a problem with the cloud, not because of the cloud itself, but because of the applications we move to the cloud and the state they're in, where scalability was hardly a design consideration back then and the only way to scale was up. It can be a different story if you build cloud-native but repatriated from scratch.
Manual resource allocation unless you are able to build a good level of managed automated scalability
Potential underutilization during low-demand periods
Increased complexity in dynamic workload management
Risk Mitigation Strategies for Repatriation
Implement hybrid infrastructure approaches where possible, or how you will approach redundancy depending on criticality (perhaps if it isn't critical you don't need to consider repatriation)
Create flexible, modular infrastructure designs so you can replace pieces or move selected parts to the cloud later on if context changes (or your mind does)
You still need to invest in advanced monitoring and optimization tools
Maintain continuous skills development programs - but that is the same with the cloud, basically
Develop robust disaster recovery and business continuity plans, including effective management of things like backups and their storage
Decision Criteria for Repatriation
Consider the below if you are thinking Repatriation. Examine what other companies have done, how (full, degrees of hybrid) and especially why and what was their context. Yours will be different. And theirs may have changed since.
Consistent, predictable workload characteristics
Regulatory compliance requirements
High-performance computing needs
Sensitive data processing
Cost-effectiveness analysis
Strategic technological independence
Foresee customization requirements
Calculate long-term total cost of ownership and associated costs
Cloud repatriation represents a mature technological strategy that recognizes no single infrastructure approach fits all scenarios. It's a nuanced response to the simplified one-size-fits-all cloud migration narrative based on data and contextual technological decision-making. The most sophisticated organizations view infrastructure as a strategic portfolio, dynamically balancing cloud, on-premises, and edge computing based on specific workload characteristics and organizational objectives.
Further reading
This article is also available here https://github.com/aelena/writings/blob/main/cloud-migrations.md