Software Architecture for Cloud Infrastructure
Tapio Rautonen
@trautonen
github.com/trautonen
fi.linkedin.com/in/trautonen
software architect
Enabling cloud superpowers in software development.
Cloud computing characteristics
On-demand self-service
Consumer can provision computing capabilities without requiring human interaction
Broad network access
Capabilities are available over the network and accessible by heterogeneous clients
Resource pooling
Provider's computing resources are pooled to serve multiple consumers dynamically
Rapid elasticity
Capabilities can be elastically provisioned and appear unlimited for the consumer
Measured service
Resource use is automatically controlled and optimized through metering capabilities
Software architecture principles
● Intentional architecture with emergent design
● High modularity
– high cohesion, loose coupling
– low algorithmic complexity
● Well described elements
– expressive and meaningful names and APIs
– clean code
● Passes all defined tests or acceptance criteria
● Lightweight documentation
Software architecture  cloud computing
Microservices
Distributed computing fallacies
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
Peter Deutsch / Sun Microsystems
Design for failure
New era of design patterns
● Cache-Aside
● Circuit Breaker
● Compensating Transaction
● Command and Query Responsibility Segregation (CQRS)
● Event Sourcing
● Queue-Based Load Leveling
● Sharding
● Throttling
Cache-Aside pattern
● Aggregated search combining multiple services
– requires an additional search cache (Solr, Elasticsearch, ...)
● Improves performance of frequently read data
● Local caching results in inconsistent state between instances
● Consistency between data stores and cache is really hard to maintain
There are only two hard things in Computer Science: cache invalidation and naming things.
– Phil Karlton
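A minimal in-process sketch of cache-aside in Python, assuming hypothetical load and save callables for the system of record and a TTL to bound staleness. Because each instance keeps its own dictionary, it also shows the local-caching inconsistency mentioned above: other instances keep serving their copy until it expires.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class CacheAside(Generic[K, V]):
    """Cache-aside: the application, not the store, populates and invalidates the cache."""

    def __init__(self, load: Callable[[K], V], save: Callable[[K, V], None],
                 ttl_seconds: float = 60.0) -> None:
        self._load = load          # reads from the system of record (hypothetical store)
        self._save = save          # writes to the system of record
        self._ttl = ttl_seconds    # bounds how long a stale entry can survive
        self._entries: Dict[K, Tuple[float, V]] = {}   # key -> (expires_at, value)

    def get(self, key: K) -> V:
        hit = self._entries.get(key)
        if hit is not None and hit[0] > time.monotonic():
            return hit[1]                              # cache hit
        value = self._load(key)                        # cache miss: read from the store
        self._entries[key] = (time.monotonic() + self._ttl, value)
        return value

    def put(self, key: K, value: V) -> None:
        self._save(key, value)                         # update the store first...
        self._entries.pop(key, None)                   # ...then invalidate so the next read reloads
```

Invalidating instead of updating the cache keeps the write path simple, but a concurrent reader can still put a stale value back into the cache between the store write and the invalidation, which is exactly why invalidation is one of the two hard things.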
Circuit Breaker pattern
● Cloud infrastructure and distributed systems allow remote services to fail in ways beyond imagination
– prevent failures from cascading
– allow the system to operate in degraded mode
● Suitable for big microservices architectures
– creates routing complexity and overhead
– potential single point of failure, so it must be highly available
● Enables central logging and metrics
– dashboards and central state awareness
Circuit Breaker pattern
● During normal operation, the breaker is in the Closed state
– failures will eventually trip the breaker (Open)
● While in the Open state
– calls fail fast; after a reset timeout the breaker attempts a reset (Half-Open)
● In the Half-Open state
– a successful trial call resets the breaker (Closed), a failure trips it again (Open)
http://doc.akka.io/docs/akka/snapshot/common/circuitbreaker.html
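The three states described above can be sketched as a small state machine. This is a single-threaded illustration with hypothetical names and thresholds (max_failures, reset_timeout), not the Akka or Hystrix API; the clock is injectable so the timeout can be tested.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the remote service."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "HALF_OPEN"          # timeout elapsed: allow one trial call
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        self._failures = 0
        self.state = "CLOSED"                 # a successful call (or trial call) resets the breaker

    def _on_failure(self) -> None:
        self._failures += 1
        if self.state == "HALF_OPEN" or self._failures >= self.max_failures:
            self.state = "OPEN"               # trip: subsequent calls fail fast
            self._opened_at = self._clock()
```

A shared breaker in a multi-threaded service would also need a lock around the state transitions.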
Netflix Hystrix dashboard
Compensating Transaction pattern
● Irrecoverable failures in distributed systems are hard to handle
– with eventual consistency, traditional rollbacks are impossible
● Distributed transactions (XA)
– difficult and complex to implement
– still not bulletproof
– not usable for generic REST services
● Undo the effects of the original operation
– defines eventually consistent steps for a reverse operation
– compensation logic may be difficult to derive
– operations should be idempotent to prevent further catastrophe
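A sketch of compensation, assuming each step is registered together with a hypothetical undo action: on failure, the completed steps are compensated in reverse order. A production version would also persist its progress so compensation can resume after a crash, which is why the undo actions need to be idempotent.

```python
from typing import Callable, List, Tuple

# (name, action, compensation) for one step of a long-running business operation
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class CompensatedError(Exception):
    def __init__(self, failed: str, compensated: List[str]) -> None:
        super().__init__(f"step {failed!r} failed; compensated {compensated}")
        self.failed = failed
        self.compensated = compensated


def run_with_compensation(steps: List[Step]) -> None:
    """Run steps in order; if one fails, undo the completed ones in reverse order."""
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            compensated: List[str] = []
            for prev_name, _action, compensate in reversed(done):
                compensate()              # should be idempotent: it may be retried after a crash
                compensated.append(prev_name)
            raise CompensatedError(name, compensated) from exc
        done.append(step)
```

Usage: for steps reserve_stock, book_delivery and charge_card, a declined charge triggers cancel_delivery and then release_stock.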
CQRS pattern
● Command and Query Responsibility Segregation
– segregates read and write operations with separate interfaces
– allows maximizing performance, scalability and security
● Introduces flexibility at the cost of complexity
– traditionally the same DTO is used for read and write operations
– CQRS uses different data models for reads (queries) and writes (commands)
– supports separate read and write data stores
– not suitable for simple business rules where CRUD is sufficient
● Often used together with the Event Sourcing pattern
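A minimal CQRS sketch with hypothetical order types: commands go through a handler that owns the write model, while queries hit a separate, denormalized view. The view is updated synchronously here for brevity; in practice it is often updated asynchronously from events, which makes reads eventually consistent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PlaceOrder:             # command: an intent to change state
    order_id: str
    customer: str
    amount: int               # amount in cents


class CustomerTotalsView:
    """Read side: a denormalized model shaped for one query."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = {}

    def apply(self, cmd: PlaceOrder) -> None:
        self._totals[cmd.customer] = self._totals.get(cmd.customer, 0) + cmd.amount

    def total_for(self, customer: str) -> int:
        return self._totals.get(customer, 0)


class OrderCommandHandler:
    """Write side: validates commands and owns the write model."""

    def __init__(self, read_model: CustomerTotalsView) -> None:
        self._orders: Dict[str, PlaceOrder] = {}
        self._read_model = read_model

    def handle(self, cmd: PlaceOrder) -> None:
        if cmd.amount <= 0:
            raise ValueError("amount must be positive")
        if cmd.order_id in self._orders:
            raise ValueError("duplicate order")
        self._orders[cmd.order_id] = cmd
        self._read_model.apply(cmd)   # often an asynchronous event in a real system
```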
Event Sourcing pattern
● Append-only store of events that describe actions taken on data
– simplifies tasks in complex domains by avoiding synchronization
– improves performance, scalability and consistency for transactional data
– can serve multiple different materialized views
● Maintains a full audit trail and history
– enables compensating actions
– supports replaying state to any point in time
● Events are simple
– the operation logic they describe might not be
– updates and deletes must be implemented as compensating events
– “at least once” publication requires idempotent consumers
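An event-sourced account in a few lines, assuming two hypothetical event types. State is never stored directly; it is rebuilt by replaying the append-only log, optionally only up to a given sequence number, and a mistake is corrected by appending a compensating event instead of editing history.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    seq: int          # position in the append-only log
    kind: str         # "deposited" or "withdrawn" (hypothetical event types)
    amount: int


class AccountEventStore:
    def __init__(self) -> None:
        self._log: List[Event] = []

    def append(self, kind: str, amount: int) -> Event:
        event = Event(len(self._log) + 1, kind, amount)
        self._log.append(event)      # events are never updated or deleted
        return event

    def balance(self, up_to_seq: Optional[int] = None) -> int:
        """Rebuild state by replaying events, optionally only up to a point in time."""
        total = 0
        for e in self._log:
            if up_to_seq is not None and e.seq > up_to_seq:
                break
            total += e.amount if e.kind == "deposited" else -e.amount
        return total
```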
Queue-Based Load Leveling pattern
● Buffer between tasks and a service
– minimizes the impact of workload peaks
– without it, a flood of tasks may make the service unresponsive or cause it to fail
● Task provider and service run asynchronously
– the queue decouples tasks from the service
– the service can handle tasks at its own optimal pace
– requires a mechanism for responses if the task expects a reply
Diagram: Tasks → Message queue → Service
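A sketch of load leveling with Python's standard queue and threading modules: a burst of tasks is pushed into a bounded queue and one worker drains it at its own pace. The results list stands in for the reply mechanism mentioned above; error handling and multiple workers are omitted.

```python
import queue
import threading
from typing import Callable, List


def run_leveled(tasks: List[int], handle: Callable[[int], int], capacity: int = 100) -> List[int]:
    """Producer enqueues a burst of tasks; a single worker drains them at its own pace."""
    q: queue.Queue = queue.Queue(maxsize=capacity)   # the buffer between tasks and service
    results: List[int] = []                          # stands in for a reply channel
    stop = object()                                  # sentinel telling the worker to finish

    def worker() -> None:
        while True:
            item = q.get()
            if item is stop:
                break
            results.append(handle(item))             # the service handles one task at a time

    t = threading.Thread(target=worker)
    t.start()
    for task in tasks:
        q.put(task)        # blocks while the buffer is full instead of overloading the service
    q.put(stop)
    t.join()
    return results
```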
Sharding pattern
● Divide a data store into multiple horizontal partitions
– improves scalability when handling large volumes of data
● Overcomes the limitations of a single-server data store
– finite storage space
– computing resources for a large number of concurrent users
– performance governed by network bandwidth
– geographically limited storage for legal or performance reasons
● Strategy defines the sharding key and data distribution
– a wrong sharding strategy results in bad performance
– balancing shards is not trivial, and rebalancing is expensive
– referential integrity and consistency are hard to maintain
● Configuring and managing a big set of shards is a challenge
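A hash-based sharding strategy, sketched with hypothetical in-memory shards. A stable hash (SHA-256 here) is used because Python's built-in hash() for strings is salted per process and would route the same key differently on each instance.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, shard_count: int) -> int:
    """Map a sharding key to a shard index using a process-independent hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    def __init__(self, shard_count: int) -> None:
        self._shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def put(self, key: str, value: str) -> None:
        self._shards[shard_for(key, len(self._shards))][key] = value

    def get(self, key: str) -> str:
        return self._shards[shard_for(key, len(self._shards))][key]

    def sizes(self) -> List[int]:
        return [len(s) for s in self._shards]
```

The simple modulo mapping also shows why rebalancing is expensive: growing from 4 to 5 shards moves, in expectation, about four out of five keys. Consistent hashing or a lookup-based strategy reduces that movement at the cost of extra bookkeeping.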
Throttling pattern
● Controls the consumption of resources used by a service
– allows the system to function and meet its SLA under extreme load
● Throttle after a soft limit of resource usage is exceeded
– reject requests from users that exceed the soft limits
– disable or degrade functionality of nonessential services
– queue-based load leveling with priority queues
● Throttling is an architectural decision
– impacts the entire design of the system
– must be detected and performed very quickly
– services should return a specific error code to clients (e.g. HTTP 429 Too Many Requests)
– can be used as an interim measure while autoscaling
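A token bucket is one common way to enforce a soft limit. This sketch (an assumed technique, with hypothetical rates) keeps one bucket per user and maps a rejected request to HTTP 429 Too Many Requests, a specific error code clients can back off on.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class PerUserThrottle:
    """One bucket per user, so a single noisy client cannot exhaust shared capacity."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._make = lambda: TokenBucket(rate, capacity, clock)
        self._buckets: Dict[str, TokenBucket] = {}

    def status_for(self, user: str) -> int:
        if user not in self._buckets:
            self._buckets[user] = self._make()
        return 200 if self._buckets[user].try_acquire() else 429   # 429 = Too Many Requests
```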
SaaS architecture methodology
● Declarative formats for setup and runtime automation
● Clean contract with infrastructure for maximum portability
● Cloud platform deployments, obviating the need for ops
● Tooling, architecture and dev practices support scaling
Modern software is delivered from the cloud to heterogeneous clients on-demand
The Twelve-Factor App
I. Codebase
one codebase tracked in revision control, many deploys
II. Dependencies
explicitly declare and isolate dependencies
III. Config
store config in the environment
IV. Backing Services
treat backing services as attached resources
V. Build, release, run
strictly separate build and run stages
VI. Processes
execute the app as one or more stateless processes
http://12factor.net/
The Twelve-Factor App
VII. Port binding
export services via port binding
VIII. Concurrency
scale out via the process model
IX. Disposability
maximize robustness with fast startup and graceful shutdown
X. Dev/prod parity
keep development, staging, and production as similar as possible
XI. Logs
treat logs as event streams
XII. Admin processes
run admin/management tasks as one-off processes
http://12factor.net/
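Factors III and VII in practice: a sketch that reads deploy-specific settings from the environment, assuming hypothetical variable names DATABASE_URL, PORT and DEBUG. Passing the mapping in keeps the function testable without touching the real process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (variable names are hypothetical)."""
    return Settings(
        database_url=env["DATABASE_URL"],               # backing service URL; fail fast if missing
        port=int(env.get("PORT", "8080")),              # port the app binds to (factor VII)
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

The same build artifact can then be promoted from staging to production with only its environment changing.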
Break the monolith into pieces
● Monoliths come with a burden
– cognitive overload for developers
– scaling and continuous deployment become difficult
– long-term commitment to a technology stack
– allows taking shortcuts in architectural design
If you can't build a monolith, what makes you think microservices are the answer?
AWS reference architecture
Service discovery
● Services need to know about each other
– no centralized service bus
– smart endpoints and client-side load balancing
● Is the service registry the new single point of failure?
– value availability over consistency
● Provides a limited set of well-defined features
– services notify each other of their availability and status
– cleaning of stale services
– easy integration with standard protocols like HTTP or DNS
– notifications on services starting and stopping
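An in-memory sketch of some of the registry features listed above: registration by heartbeat, graceful deregistration, and cleaning of stale instances after a TTL. Names and the TTL are hypothetical; a real registry would be replicated and, as the slide suggests, favor availability over consistency.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    """Instances heartbeat periodically; those silent for longer than `ttl` seconds expire."""

    def __init__(self, ttl: float = 30.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._instances: Dict[Tuple[str, str], float] = {}   # (service, address) -> last heartbeat

    def heartbeat(self, service: str, address: str) -> None:
        self._instances[(service, address)] = self._clock()   # register or renew

    def deregister(self, service: str, address: str) -> None:
        self._instances.pop((service, address), None)          # graceful shutdown

    def lookup(self, service: str) -> List[str]:
        now = self._clock()
        stale = [k for k, seen in self._instances.items() if now - seen > self.ttl]
        for k in stale:
            del self._instances[k]                              # clean stale services
        return sorted(addr for (name, addr) in self._instances if name == service)
```

A client would pick one of the returned addresses itself (for example round-robin), which is the client-side load balancing mentioned above.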
Ephemeral runtime environments
● Short lifetime of an application runtime environment
– scaling, testing, materializing ideas
– requires highly automated infrastructure
● Nothing can be stored in the runtime environment
– logs, file uploads, database storage files, configuration
● Results in stateless services
– optimal for horizontal scaling
– integrates with State as a Service
● Must be repeatable and automatically provisioned
Metrics and logging
● Ephemeral and dynamic systems
– require central awareness of state
– audit logging of changes in the system
● Gain an understanding of how the services are used
– plan for future requirements
– gather scaling metrics
– bill customers for usage (pay-per-use)
– detect faulty behavior
● Balance between the value provided and the cost of collecting
– robustness of the metering system impacts profitability
– collect end-to-end scenarios rather than operational factors
Autoscaling
● Adapting to changing workloads
– optimize capacity and operational cost
– increase failure resilience
● Requires capturing key performance metrics
– response times, queue sizes, CPU and memory utilization
● Decision logic based on scaling metrics
– when to scale up and down
– prevent scaling oscillation
● Application must be designed for scaling
– stateless, immutable, automatically provisioned
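The decision logic can be sketched as threshold rules on one metric (average CPU utilization is an assumption here; response times or queue sizes work the same way). Oscillation is prevented by a dead band between the up and down thresholds and by a cooldown after each action; all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Autoscaler:
    """Threshold-based scaling decision with a dead band and cooldown against oscillation."""
    min_instances: int = 2
    max_instances: int = 10
    scale_up_at: float = 0.75      # average CPU utilization (0..1) above which to add an instance
    scale_down_at: float = 0.30    # well below scale_up_at: the gap is the dead band
    cooldown: float = 300.0        # seconds to wait after any scaling action
    _last_action: float = float("-inf")

    def decide(self, instances: int, cpu: float, now: float) -> int:
        """Return the desired instance count for the observed average CPU utilization."""
        if now - self._last_action < self.cooldown:
            return instances                     # let the previous change take effect first
        target = instances
        if cpu > self.scale_up_at:
            target = min(self.max_instances, instances + 1)
        elif cpu < self.scale_down_at:
            target = max(self.min_instances, instances - 1)
        if target != instances:
            self._last_action = now
        return target
```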
Asynchronous messaging
● Key strategy for services to communicate and coordinate
– decouples the consumer process from the implementing service
– enables scalability and improves resilience
● Basic messaging patterns
– sender posts a one-way message and the receiver processes it at some later point in time
– sender posts a request message and expects a response message from the receiver
– sender posts a broadcast message which is copied and delivered to multiple receivers
● Numerous implementation concerns
– message ordering, grouping, repeating, poisoning, expiration, idempotency and scheduling
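Idempotency is the concern that makes repeated (at-least-once) delivery safe. A minimal sketch of an idempotent consumer, assuming each message carries a unique, sender-assigned message_id: redelivered messages are acknowledged but not applied again.

```python
from typing import Dict, Set


class IdempotentConsumer:
    """Applies each message at most once even if the broker redelivers it."""

    def __init__(self) -> None:
        self._processed: Set[str] = set()     # must be durable in a real system
        self.balances: Dict[str, int] = {}

    def on_message(self, message_id: str, account: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self._processed:
            return False                       # duplicate delivery: acknowledge, do nothing
        self.balances[account] = self.balances.get(account, 0) + amount
        self._processed.add(message_id)        # record alongside the side effect
        return True
```

The processed ids live in memory here; to survive restarts they must be stored durably, ideally in the same transaction as the side effect.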
Reactive streams
● Originates from The Reactive Manifesto
– Responsive: the system responds in a timely manner
– Resilient: the system stays responsive in the face of failure
– Elastic: the system stays responsive under varying workload
– Message Driven: the system relies on asynchronous messaging
● Initiative to provide a standard for asynchronous stream processing with non-blocking back pressure
– minimal set of interfaces and methods to achieve the goal
● Collaboration of people from high-profile companies
– Typesafe, Oracle, Pivotal, Netflix, Red Hat, Applied Duality, ...
● Akka Streams, Reactor Composable, RxJava, Ratpack
https://www.coursera.org/course/reactive
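Back pressure means the consumer, not the producer, sets the pace by signalling demand. The sketch below mimics the shape of the Reactive Streams contract (Publisher, Subscriber, Subscription.request(n)) in Python; the real specification defines these as JVM interfaces, and this synchronous version omits on_error, cancellation and threading.

```python
from typing import Iterable, List, Optional

_END = object()   # sentinel marking an exhausted source


class Subscription:
    """Links one publisher to one subscriber and tracks outstanding demand."""

    def __init__(self, items: Iterable[int], subscriber: "BatchingSubscriber") -> None:
        self._items = iter(items)
        self._subscriber = subscriber
        self._demand = 0
        self._emitting = False   # guards against re-entrant request() calls from on_next
        self._done = False

    def request(self, n: int) -> None:
        self._demand += n
        if self._emitting:
            return               # the loop already running below will serve the new demand
        self._emitting = True
        while self._demand > 0 and not self._done:
            item = next(self._items, _END)
            if item is _END:
                self._done = True
                self._subscriber.on_complete()
                break
            self._demand -= 1
            self._subscriber.on_next(item)   # never emit more items than were requested
        self._emitting = False


class IterablePublisher:
    def __init__(self, items: Iterable[int]) -> None:
        self._items = items

    def subscribe(self, subscriber: "BatchingSubscriber") -> None:
        subscriber.on_subscribe(Subscription(self._items, subscriber))


class BatchingSubscriber:
    """Requests `batch` items at a time, so at most `batch` items are outstanding."""

    def __init__(self, batch: int) -> None:
        self.batch = batch
        self.received: List[int] = []
        self.completed = False
        self._subscription: Optional[Subscription] = None
        self._pending = 0

    def on_subscribe(self, subscription: Subscription) -> None:
        self._subscription = subscription
        self._pending = self.batch
        subscription.request(self.batch)

    def on_next(self, item: int) -> None:
        self.received.append(item)      # a slow consumer would do its work here
        self._pending -= 1
        if self._pending == 0 and self._subscription is not None:
            self._pending = self.batch
            self._subscription.request(self.batch)   # signal readiness for the next batch

    def on_complete(self) -> None:
        self.completed = True
```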
Data consistency
● All application instances see exactly the same data
– strong consistency
● An application instance might see data from operations still in flight
– eventual consistency
● Distributed data stores are subject to the CAP theorem
– consistency, availability, partition tolerance
– at most two of the three can be guaranteed at once; during a network partition, a store must choose between consistency and availability
● Recovering from failures of eventually consistent data
– retry with idempotent commands
– compensating logic
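Retrying is only safe when the command is idempotent. A sketch assuming a hypothetical PaymentService that remembers results by idempotency key, plus a generic retry helper with exponential backoff; the injectable sleep keeps it testable.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """A failure worth retrying, e.g. a timeout (hypothetical classification)."""


def retry(fn: Callable[[], T], attempts: int = 5, base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn with exponential backoff; only safe if fn is idempotent."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise                          # out of attempts: surface the failure
            sleep(base_delay * 2 ** attempt)   # 0.1 s, 0.2 s, 0.4 s, ...
    raise RuntimeError("unreachable")


class PaymentService:
    """Remembers results by idempotency key, so a retried command is applied only once."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.charges = 0

    def charge(self, idempotency_key: str, amount: int) -> str:
        if idempotency_key in self._results:
            return self._results[idempotency_key]   # replay: return the original result
        self.charges += 1
        receipt = f"receipt-{self.charges}-{amount}"
        self._results[idempotency_key] = receipt
        return receipt
```

If the response to the first attempt is lost after the charge was applied, the retry with the same key returns the original receipt instead of charging twice.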
Configuration management
● Externalize configuration out of the runtime environment
– repeatable, versioned
● Local configuration pitfalls
– limited to a single application
– hard to manage for multiple instances
● Runtime reconfiguration
– application can be reconfigured without redeployment or restart
– minimize downtime, enable feature flags, help debugging
– thread safety and performance are concerns
– prepare for rollbacks and unavailability of the configuration store
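Runtime reconfiguration with the concerns listed above (thread safety, rollbacks, an unavailable store) can be sketched as an atomic swap of immutable snapshots. The fetch callable standing in for the configuration store is hypothetical.

```python
import threading
from types import MappingProxyType
from typing import Callable, List, Mapping


class ReloadableConfig:
    """Serves an immutable config snapshot that can be replaced at runtime without a restart."""

    def __init__(self, fetch: Callable[[], Mapping[str, str]], initial: Mapping[str, str]) -> None:
        self._fetch = fetch                  # reads the external configuration store (hypothetical)
        self._lock = threading.Lock()        # serializes reload/rollback; readers take no lock
        self._current: Mapping[str, str] = MappingProxyType(dict(initial))
        self._previous: List[Mapping[str, str]] = []

    @property
    def current(self) -> Mapping[str, str]:
        # Readers get a whole read-only snapshot; rebinding one attribute is atomic in CPython.
        return self._current

    def reload(self) -> bool:
        try:
            fresh = MappingProxyType(dict(self._fetch()))   # copy so later store edits can't leak in
        except Exception:
            return False                     # store unavailable: keep serving the last good config
        with self._lock:
            self._previous.append(self._current)
            self._current = fresh
        return True

    def rollback(self) -> bool:
        with self._lock:
            if not self._previous:
                return False
            self._current = self._previous.pop()
            return True
```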
Software erosion
● Slow deterioration of software leading to faulty behavior
● Fighting erosion is more expensive than usually admitted
● Erosion resistance comes from separation of concerns
– between application and infrastructure
● Clear contract of services provided by the infrastructure
– a change in infrastructure does not break the contract
– the application can change within its respective realm
● Solutions against erosion
– Platform as a Service
– container virtualization
Cloud architecture pitfalls
● Failures do cascade
– even without a single point of failure
● Multi-service search is hard to get right
– cache-aside issues
● Never rely on unreliable message delivery
– use asynchronous persistent message stores
● A monolith has one big problem
– microservices will generate a lot of small (and big) problems
● Do not ignore the platform's managed resources
– but evaluate the lock-in risk
Reach for the skies
● Distributed systems are hard to build
– no silver bullet exists (sorry to disappoint again)
● Cloud infrastructure drives towards microservices
– start with a monolith, expand to microservices
– learn new design patterns during the journey
– automated system requires less ops and offers more resilience
● Do you think Netflix did it right the first time?
– learn from failure
– design for failure
● Cloud native applications are the future
Thank you

Software Architecture for Cloud Infrastructure

  • 1. Software ArchitectureSoftware Architecture for Cloud Infrastructurefor Cloud Infrastructure
  • 3. Cloud computing characteristicsCloud computing characteristics On-demand self-service Consumer can provision computing capabilities without requiring human interaction Broad network access Capabilities are available over the network and accessible by heterogeneous clients Resource pooling Provider's computing resources are pooled to serve multiple consumers dynamically Rapid elasticity Capabilities can be elastically provisioned and appear unlimited for the consumer Measured service Automatically controlled and optimized resources by metering capabilities
  • 4. Software architecture principlesSoftware architecture principles ● Intentional architecture with emergent design ● High modularity – high cohesion, loose coupling – low algorithmic complexity ● Well described elements – expressive and meaningful names and APIs – clean code ● Passes all defined tests or acceptance criteria ● Lightweight documentation
  • 5. Software architectureSoftware architecture  cloud computingcloud computing MicroservicesMicroservices
  • 6. Distributed computing fallaciesDistributed computing fallacies 1. The network is reliable 2. Latency is zero 3. Bandwidth is infinite 4. The network is secure 5. Topology doesn't change 6. There is one administrator 7. Transport cost is zero 8. The network is homogeneous Peter Deutsch / Sun Microsystems
  • 8. New era of design patternsNew era of design patterns ● Cache-Aside ● Circuit Breaker ● Compensating Transaction ● Command and Query Responsibility Segregation (CQRS) ● Event Sourcing ● Queue-Based Load Leveling ● Sharding ● Throttling
  • 9. Cache-Aside patternCache-Aside pattern ● Aggregated search combining multiple services – requires additional search cache (Solr, ElasticSearch, ...) ● Improve performance of frequently read data ● Local caching results inconsistent state between instances ● Consistency of data stores and cache is really hard to maintain There are only two hard things There are only two hard things in Computer Science: cache in Computer Science: cache invalidation and naming things. invalidation and naming things. –– Phil Karlton Phil Karlton
  • 10. Circuit Breaker patternCircuit Breaker pattern ● Cloud infrastructure and distributed systems allow remote services to fail in ways beyond imagination – prevent failures to cascade – allow system to operate in degraded mode ● Suitable for big microservices architecture – creates routing complexity and overhead – potential single point of failure  must be highly available ● Enables central logging and metrics – dashboards and central state awareness
  • 11. Circuit Breaker patternCircuit Breaker pattern ● During normal operation, breaker is in Closed state – failures will eventually trip breaker ● While in Open state – calls will fail fast and after some period attempts to reset ● In Half-Open state – on successful call resets breaker, otherwise trips breaker http://doc.akka.io/docs/akka/snapshot/common/circuitbreaker.html
  • 12. Netflix Hystrix dashboardNetflix Hystrix dashboard
  • 13. Compensating Transaction patternCompensating Transaction pattern ● Irrecoverable failures in distributed systems are hard – eventual consistency, rollbacks are impossible ● Distributed transactions (XA) – difficult and complex to implement – still not bulletproof – not usable for generic REST services ● Undo the effects of the original operation – defines an eventually consistent steps for a reverse operation – compensation logic may be difficult to generate – operations should be idemponent to prevent further catastrophe
  • 14. CQRS patternCQRS pattern ● Command and Query Responsibility Segregation – segregates read and write operations with separate interfaces – allows to maximize performance, scalability and security ● Introduces flexibility at the cost of complexity – traditionally same DTO is used for read and write operations – different data model for read (query) and write (command) – supports different read and write data stores – not suitable for simple business rules where CRUD is sufficient ● Often used together with event sourcing pattern
  • 15. Event Sourcing patternEvent Sourcing pattern ● Append only store of events that describe actions for data – simplifies tasks in complex domains by avoiding synchronization – improves performance, scalability and consistency for transactional data – can serve multiple different materialized views ● Maintains full audit trail and history – enables compensation actions – supports play back at any point in time ● Events are simple – the operation logic they describe might not be – updates and deletes must be implemented with compensation – “at least once” publication requires idemponent consumers
  • 16. Queue-Based Load Leveling patternQueue-Based Load Leveling pattern ● Buffer between task and service – minimizes the impact of peaks of work load – task flood may result unresponsive or failure of the service ● Task provider and service runs asynchronously – queue decouples tasks from the service – service can handle tasks at its own optimal pace – requires a mechanism for responses if the task expects a reply Service Tasks Message queue
  • 17. Sharding patternSharding pattern ● Divide data store into multiple horizontal partitions – improves scalability when handling large volumes of data ● Overcomes limitations of single server data store – finite storage space – computing resources for large number of concurrent users – network bandwidth governed performance – geographically limited storage for legal or performance reasons ● Strategy defines the sharding key and data distribution – wrong sharding strategy results bad performance – balancing shards is not trivial, rebalancing is expensive – referential integrity and consistency is hard to maintain ● Configuring and managing big set of shards is a challenge
  • 18. Throttling patternThrottling pattern ● Controls the consumption of resource used by a service – allows the system to function and meet SLA on extreme load ● Throttle after soft limit of resource usage is exceeded – reject requests for user that exceed the soft limits – disable or degrade functionality of nonessential services – queue-based load leveling with priority queues ● Throttling is an architectural decision – impacts the entire design of the system – must be detected and performed very quickly – services should return specific error code for clients – can be used as an interim measure while autoscaling
  • 19. SaaS architecture methodologySaaS architecture methodology ● Declarative formats for setup and runtime automation ● Clean contract with infrastructure for maximum portability ● Cloud platform deployments, obviating the need for ops ● Tooling, architecture and dev practices support scaling Modern software is delivered from the cloud to heterogeneous clients on-demand
  • 20. The Twelve-Factor AppThe Twelve-Factor App I. Codebase one codebase tracked in revision control, many deploys II. Dependencies explicitly declare and isolate dependencies III. Config store config in the environment IV. Backing Services treat backing services as attached resources V. Build, release, run strictly separate build and run stages VI. Processes execute the app as one or more stateless processes http://12factor.net/
  • 21. The Twelve-Factor AppThe Twelve-Factor App VII. Port binding export services via port binding VIII.Concurrency scale out via the process model IX. Disposability maximize robustness with fast startup and graceful shutdown X. Dev/prod parity keep development, staging, and production as similar as possible XI. Logs treat logs as event streams XII. Admin processes run admin/management tasks as one-off processes http://12factor.net/
  • 22. Break the monolith in piecesBreak the monolith in pieces ● Monoliths come with a burden – cognitive overload for developers – scaling and continuous deployment becomes difficult – long-term commitment to technology stack – allows taking shortcuts for architectural design If you can't build a monolith, what makes you think microservices are the answer?
  • 23. AWS reference architectureAWS reference architecture
  • 24. Service discoveryService discovery ● Services need to know about each other – inexistence of centralized service bus – smart endpoints and client side load balancing ● Service registry is the new single point of failure? – value availability over consistency ● Provides a limited set of well defined features – services notify each other of their availability and status – cleaning of stale services – easy integration with standard protocols like HTTP or DNS – notifications on services starting and stopping
  • 25. Ephemeral runtime environmentsEphemeral runtime environments ● Short lifetime of an application runtime environment – scaling, testing, materializing ideas – requires highly automatized infrastructure ● Nothing can be stored in the runtime environment – logs, file uploads, database storage files, configuration ● Results stateless services – optimal for horizontal scaling – integrates to State as a Service ● Must be repeatable and automatically provisioned
  • 26. Metrics and loggingMetrics and logging ● Ephemeral and dynamic systems – requires central awareness of state – audit logging of changes in the system ● Gain understanding how the services are used – plan for future requirements – gather scaling metrics – bill customers for usage (pay-per-use) – detect faulty behavior ● Balance between value provided and cost of collecting – robustness of the metering system impacts on profitability – collect end-to-end scenarios rather than operational factors
  • 27. Autoscaling
● Adapting to changing workloads
– optimize capacity and operational cost
– increase failure resilience
● Requires capturing key performance metrics
– response times, queue sizes, CPU and memory utilization
● Decision logic based on scaling metrics
– when to scale up and down
– prevent scaling oscillation
● Application must be designed for scaling
– stateless, immutable, automatically provisioned
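The decision logic can be sketched as a threshold rule with a hysteresis band and a cooldown, two common ways to prevent oscillation. All numbers in `ScalingPolicy` are hypothetical, and the metric is reduced to average CPU utilization for brevity; response times or queue sizes would plug in the same way.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    target_cpu: float = 0.60        # hypothetical target utilization
    scale_in_below: float = 0.40    # hysteresis: gap between scale-out and scale-in triggers
    cooldown_s: float = 300.0       # minimum time between scaling actions
    min_instances: int = 2
    max_instances: int = 20


class Autoscaler:
    def __init__(self, policy: ScalingPolicy, clock):
        self.policy = policy
        self.clock = clock
        self._last_action = -math.inf

    def desired(self, current: int, avg_cpu: float) -> int:
        p = self.policy
        now = self.clock()
        if now - self._last_action < p.cooldown_s:
            return current                      # still cooling down: no flapping
        if avg_cpu > p.target_cpu:
            # Scale out proportionally so utilization returns to roughly the target.
            wanted = math.ceil(current * avg_cpu / p.target_cpu)
        elif avg_cpu < p.scale_in_below:
            wanted = current - 1                # scale in cautiously, one instance at a time
        else:
            wanted = current                    # inside the band: leave it alone
        wanted = max(p.min_instances, min(p.max_instances, wanted))
        if wanted != current:
            self._last_action = now
        return wanted
```

Scaling out aggressively but scaling in one step at a time is a design choice: adding capacity late hurts users, while removing it late only costs money.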
  • 28. Asynchronous messaging
● Key strategy for services to communicate and coordinate
– decouples the consumer process from the implementing service
– enables scalability and improves resilience
● Basic messaging patterns
– sender posts a one-way message and the receiver processes it at some later point in time
– sender posts a request message and expects a response message from the receiver
– sender posts a broadcast message, which is copied and delivered to multiple receivers
● Numerous implementation concerns
– message ordering, grouping, repetition, poison messages, expiration, idempotency and scheduling
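The three basic patterns can be shown in-process with `queue.Queue` standing in for a message broker. This is a simplified sketch: `Message`, `echo_worker`, `call`, `Topic` and `IdempotentConsumer` are hypothetical names, and a real broker adds persistence, acknowledgements and redelivery. A one-way message is just `requests.put(Message(...))` without `reply_to`; request-reply adds a private reply queue and a correlation id; broadcast fans out to every subscriber; the idempotent consumer addresses repeated delivery.

```python
import queue
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    body: dict
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    reply_to: Optional[queue.Queue] = None      # set only for request-reply
    correlation_id: Optional[str] = None        # id of the request being answered


def echo_worker(requests: queue.Queue) -> None:
    """Hypothetical service: answers each request on the queue named in reply_to."""
    while True:
        msg = requests.get()
        if msg is None:                         # shutdown sentinel
            break
        if msg.reply_to is not None:            # one-way messages get no answer
            msg.reply_to.put(Message({"echo": msg.body}, correlation_id=msg.id))


def call(requests: queue.Queue, body: dict, timeout: float = 1.0) -> dict:
    """Request-reply: attach a reply queue and wait for the correlated answer."""
    reply_q: queue.Queue = queue.Queue()
    msg = Message(body, reply_to=reply_q)
    requests.put(msg)
    reply = reply_q.get(timeout=timeout)        # raises queue.Empty on timeout
    if reply.correlation_id != msg.id:
        raise RuntimeError("reply does not match request")
    return reply.body


class Topic:
    """Broadcast: every subscriber queue receives each published message."""

    def __init__(self) -> None:
        self._subscribers = []

    def subscribe(self) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, message: Message) -> None:
        for q in self._subscribers:
            q.put(message)


class IdempotentConsumer:
    """Handles each message id at most once, so redelivered messages are harmless."""

    def __init__(self, handler) -> None:
        self.handler = handler
        self.seen = set()                       # would need durable storage in production

    def consume(self, message: Message) -> bool:
        if message.id in self.seen:
            return False
        self.handler(message)
        self.seen.add(message.id)
        return True
```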
  • 29. Reactive streams
● Originates from The Reactive Manifesto
– Responsive: the system responds in a timely manner
– Resilient: the system stays responsive in the face of failure
– Elastic: the system stays responsive under varying workload
– Message Driven: the system relies on asynchronous messaging
● Initiative to provide a standard for asynchronous stream processing with non-blocking back pressure
– minimal set of interfaces and methods to achieve the goal
● Collaboration of people from high-profile companies
– Typesafe, Oracle, Pivotal, Netflix, Red Hat, Applied Duality, ...
● Akka Streams, Reactor Composable, RxJava, Ratpack
https://www.coursera.org/course/reactive
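The core of non-blocking back pressure is that the subscriber signals demand with `request(n)` and the publisher never emits more than was requested. The sketch below mirrors the method names of the Reactive Streams interfaces (`onSubscribe`, `onNext`, `onError`, `onComplete`, `request`, `cancel`) in Python style, but it is a synchronous, single-threaded illustration, not a compliant implementation; `RangePublisher` and `BatchingSubscriber` are hypothetical.

```python
class RangePublisher:
    """Publishes the integers start..end-1, never more than the subscriber requested."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def subscribe(self, subscriber):
        subscriber.on_subscribe(RangeSubscription(self.start, self.end, subscriber))


class RangeSubscription:
    def __init__(self, start, end, subscriber):
        self._next, self._end = start, end
        self._subscriber = subscriber
        self._demand = 0          # outstanding request(n) credit
        self._done = False        # cancelled, completed or failed
        self._emitting = False    # guards against re-entrant request() from on_next

    def request(self, n):
        if self._done:
            return
        if n <= 0:                # non-positive demand is signalled as an error
            self._done = True
            self._subscriber.on_error(ValueError("request(n) requires n > 0"))
            return
        self._demand += n
        if self._emitting:        # the running loop below picks up the new demand
            return
        self._emitting = True
        try:
            while self._demand > 0 and not self._done and self._next < self._end:
                value = self._next
                self._next += 1
                self._demand -= 1
                self._subscriber.on_next(value)
            if not self._done and self._next >= self._end:
                self._done = True
                self._subscriber.on_complete()
        finally:
            self._emitting = False

    def cancel(self):
        self._done = True


class BatchingSubscriber:
    """Requests `batch` items at a time: the consumer, not the producer, sets the pace."""

    def __init__(self, batch):
        self.batch = batch
        self.received, self.requested = [], []
        self.completed, self.error = False, None
        self.subscription = None

    def on_subscribe(self, subscription):
        self.subscription = subscription
        self._request(self.batch)

    def on_next(self, item):
        self.received.append(item)
        if len(self.received) % self.batch == 0:
            self._request(self.batch)

    def on_error(self, error):
        self.error = error

    def on_complete(self):
        self.completed = True

    def _request(self, n):
        self.requested.append(n)
        self.subscription.request(n)
```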
  • 30. Data consistency
● All instances of the application see exactly the same data
– strong consistency
● An application instance might see data from an operation in flight
– eventual consistency
● Distributed data stores are subject to the CAP theorem
– consistency, availability, partition tolerance
– at most two of the three can be guaranteed at once; since network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability
● Recovering from failures of eventually consistent data
– retry with idempotent commands
– compensating logic
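Retry with idempotent commands and compensating logic can be sketched together. `Payments` is a hypothetical service that remembers results by idempotency key, so a retry after a lost response does not charge twice; `run_saga` is a minimal compensating-transaction runner that undoes completed steps in reverse order when a later step fails. All names are illustrative assumptions.

```python
class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


class SagaFailed(Exception):
    pass


class Payments:
    """Hypothetical payment service that deduplicates commands by idempotency key."""

    def __init__(self, lose_first_responses=0):
        self.charges = {}                           # idempotency key -> amount
        self._responses_to_lose = lose_first_responses

    def charge(self, key, amount):
        if key in self.charges:                     # replayed command: no second charge
            return self.charges[key]
        self.charges[key] = amount
        if self._responses_to_lose > 0:             # simulate a response lost after the write
            self._responses_to_lose -= 1
            raise TransientError("timed out waiting for response")
        return amount


def retry(command, attempts=3):
    """Retrying is only safe because the command is idempotent."""
    for attempt in range(attempts):
        try:
            return command()
        except TransientError:
            if attempt == attempts - 1:
                raise


def run_saga(steps):
    """Runs (action, compensation) pairs; on failure, compensates completed steps in reverse."""
    compensations = []
    for action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for undo in reversed(compensations):
                undo()
            raise SagaFailed(f"rolled back after: {exc}") from exc
        compensations.append(compensate)
```

Compensations are ordinary commands too, so in practice they need to be idempotent and retryable as well.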
  • 31. Configuration management
● Externalize configuration out of the runtime environment
– repeatable, versioned
● Local configuration pitfalls
– limited to a single application
– hard to manage for multiple instances
● Runtime reconfiguration
– application can be reconfigured without redeployment or restart
– minimize downtime, enable feature flags, help debugging
– thread safety and performance are concerns
– prepare for rollbacks and unavailability of the configuration store
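Runtime reconfiguration that survives bad configurations and an unavailable store might be sketched as follows. `fetch`, `validate` and `ConfigStoreUnavailable` are hypothetical hooks for whatever configuration store is in use. Readers get an immutable snapshot, so a reload replaces the whole configuration at once instead of mutating it under running requests; writers serialize through a lock and keep a history for rollback.

```python
import threading
from types import MappingProxyType


class ConfigStoreUnavailable(Exception):
    """Hypothetical error raised by fetch() when the store cannot be reached."""


class RuntimeConfig:
    def __init__(self, fetch, validate, initial):
        self._fetch = fetch            # hypothetical callable returning a dict from the store
        self._validate = validate      # raises ValueError for configs that must not go live
        self._lock = threading.Lock()
        self._current = MappingProxyType(dict(initial))
        self._history = []

    def get(self):
        """Returns a read-only snapshot; it is never modified after publication."""
        return self._current

    def reload(self):
        """Fetch, validate, swap. Keeps the last known good config on any failure."""
        try:
            candidate = self._fetch()
            self._validate(candidate)
        except (ConfigStoreUnavailable, ValueError):
            return False               # store down or config invalid: keep serving
        with self._lock:
            self._history.append(self._current)
            self._current = MappingProxyType(dict(candidate))
        return True

    def rollback(self):
        with self._lock:
            if not self._history:
                return False
            self._current = self._history.pop()
            return True
```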
  • 32. Software erosion
● Slow deterioration of software leading to faulty behavior
● Fighting erosion is more expensive than usually admitted
● Erosion resistance comes from separation of concerns
– application
– infrastructure
● Clear contract of services provided by the infrastructure
– a change in infrastructure does not break the contract
– the application can change within its respective realm
● Solutions against erosion
– Platform as a Service
– container virtualization
  • 33. Cloud architecture pitfalls
● Failures do cascade
– even without a single point of failure
● Multi-service search is hard to get right
– cache-aside issues
● Never rely on unreliable message delivery
– use asynchronous persistent message stores
● A monolith has one big problem
– microservices will generate a lot of small (and big) problems
● Do not ignore the platform's managed resources
– but evaluate the lock-in risk
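One common defense against cascading failures is the Circuit Breaker pattern from the pattern list earlier in the deck. After a number of consecutive failures the breaker opens and callers fail fast instead of piling up on a sick dependency; after a timeout one trial call is let through. The thresholds and timeouts are hypothetical, and this single-threaded sketch does not limit concurrent trial calls in the half-open state.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    """Fails fast after repeated failures so a sick dependency cannot tie up callers."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"          # let a trial call through
        return "open"

    def call(self, fn, *args, **kwargs):
        state = self.state
        if state == "open":
            raise CircuitOpenError("failing fast; dependency marked unhealthy")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # (re)open the circuit
            raise
        self.failures = 0
        self.opened_at = None                   # success closes the circuit
        return result
```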
  • 34. Reach for the skies
● Distributed systems are hard to build
– no silver bullet exists (sorry to disappoint again)
● Cloud infrastructure drives towards microservices
– start with a monolith, expand to microservices
– learn new design patterns during the journey
– an automated system requires less ops work and offers more resilience
● Do you think Netflix did it right the first time?
– learn from failure
– design for failure
● Cloud native applications are the future