© Matthew Bass 2013
Architecting for the Cloud
Len and Matt Bass
Cloud Providers
IaaS Providers
• There are several primary providers
– Amazon: Amazon Web Services (AWS)
– Microsoft: Azure
– Google: Google Compute Engine
– …
• Each of these is set up a bit differently, with slightly different
internal design decisions and associated services
Goals
• The goal of this talk is not to give you a definitive how-to for
each provider
• It’s meant to give you just an introduction
• The idea is that you’ll see how the concepts that we talked
about in the course map to specific providers
• We’ll look primarily at Amazon (with some details from others
thrown in)
• We’ll go through the overall structure and then look at specific
services
Amazon Elastic Compute Cloud
• Amazon EC2 provides compute capacity in the cloud
• You can select the machine image with a given OS and specified
capability
• You can resize the capacity as needed
• Takes minutes to spin up a new VM
• You can specify multiple instances and select where they will run
– Region & availability zones
• You pay per hour of usage; the rate depends on the capability of the instance
and on whether it’s a reserved instance (capacity you commit to in advance)
Regions
• Amazon has divided their cloud offerings into multiple regions. Each region
should be thought of as a separate cloud
– I.e. there is no automatic copying of data from one region to another.
Current AWS Regions
• North America:
– US East (5 availability zones)
– US West Oregon (3 availability zones)
– US West Northern California (3 availability zones)
– USGov Cloud (2 availability zones)
• South America
– Sao Paulo (2 availability zones)
• Europe
– Ireland (2 availability zones)
• Asia Pacific
– Sydney (2 availability zones)
– Singapore (2 availability zones)
– China (1 availability zone)
– Tokyo (3 availability zones)
AWS and Services
• Amazon Web Services offers a number of services
• These services are things like:
– Storage
– Database
– Network capabilities
– Monitoring
– …
• Not all services are available in all regions
– https://aws.amazon.com/about-aws/globalinfrastructure/regional-product-services/
Amazon Availability Zones
• Amazon has a notion of availability zones
• Engineered to be insulated from failures in other availability zones
• Availability zones are locations within a region
• Amazon has not published the details of an availability zone, but presumably each one
– Is a physically separate data center
– Has an independent network
– Has independent power delivery
– …
Amazon Service Level Agreement
• Amazon guarantees 99.95% availability for each region
• IaaS consumers are free to deploy their applications:
– Within an availability zone
– Across availability zones but within a region
– Across regions
• Amazon does not make any claim about the availability of their availability zones
(that I could find)
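To make the 99.95% figure concrete, the sketch below converts an availability percentage into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration; they are not part of Amazon's SLA wording.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down while still meeting the target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


# 0.05% of 8760 hours
print(round(allowed_downtime_hours(99.95), 2))  # 4.38
```

In other words, a region can be unavailable for a little over four hours a year and still be within the stated guarantee.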
All-in-one Single Server (diagram only)
Basic 4-server Setup (diagram only)
Multiple Availability Zones (diagram only)
Multiple Regions (diagram only)
Elastic Compute Cloud (EC2) & Redundancy
• EC2 supports different levels of redundancy
– It is up to the customer to determine how much redundancy they
wish to have and how much they wish to pay for it
• Redundant elements can be:
– Within an availability zone
– Across availability zones
– Across regions
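One rough way to reason about how much redundancy to pay for: if each replica is up with probability `single` and replicas fail independently, the application is up whenever at least one replica is up. Independence is a strong assumption (replicas in the same zone or region often share failure causes), and the 0.99 figure below is hypothetical, not a number from any provider.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - single) ** replicas


# Hypothetical per-replica availability of 99%
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

Each added replica buys less than the one before it, which is why the cost side of the decision matters.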
Microsoft Azure Regions
• North America
– US Central (Iowa)
– US East (Virginia)
– US East 2 (Virginia)
– US North Central (Illinois)
– US South Central (Texas)
– US West (California)
• Europe
– Europe North (Ireland)
– Europe West (Netherlands)
• Asia Pacific
– East (Hong Kong)
– Southeast (Singapore)
• Japan
– Japan East (Saitama)
– Japan West (Osaka)
• Brazil
– Sao Paulo
Fault Domains in Azure
• In Azure there is the concept of Fault Domains
• A Fault Domain is essentially a rack in a given datacenter
• A consumer is not able to define which fault domains the
application is distributed to
– Unlike an availability zone
• As a result the fault domain is really an internal structure
Upgrade Domains in Azure
• An upgrade domain is similar to a fault domain
• Essentially an upgrade domain will be upgraded at one time
– When Microsoft upgrades their internal infrastructure they do so one
domain at a time
• To guard against both failures within a fault domain and outages
during upgrades, you need to replicate across both fault and upgrade
domains
• This is called an availability set
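The effect of spreading replicas over both kinds of domain can be sketched as follows. This is only an illustration: the platform, not the consumer, performs the real placement, and the round-robin rule and the domain counts (2 fault domains, 5 upgrade domains) are assumptions chosen for the example.

```python
def place_instances(count, fault_domains=2, upgrade_domains=5):
    """Assign each instance a (fault_domain, upgrade_domain) pair, round-robin."""
    return [(i % fault_domains, i % upgrade_domains) for i in range(count)]


def survivors(placement, failed_fault_domain=None, upgrading_domain=None):
    """Count instances untouched by a rack failure and/or an in-progress upgrade."""
    return sum(
        1
        for fd, ud in placement
        if fd != failed_fault_domain and ud != upgrading_domain
    )


placement = place_instances(4)
print(placement)                                    # [(0, 0), (1, 1), (0, 2), (1, 3)]
print(survivors(placement, failed_fault_domain=0))  # 2
print(survivors(placement, upgrading_domain=1))     # 3
```

With four instances, neither a single rack failure nor a single upgrade step takes every instance down at once.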
Azure Availability Sets (diagram only)
Amazon Auto Scaling
• Auto Scaling works in conjunction with CloudWatch (Amazon’s monitoring
service)
• The idea is that the monitoring service tracks metrics such as
– CPU utilization
– Latency
– Memory consumption
• The Auto Scaling solution establishes the rules
– Add instances when utilization exceeds 70%
– Remove instances when utilization falls below 10%
• You can specify things like a “cooling off” period
– Where no action is taken until the system has a chance to stabilize
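The rules above can be sketched as a small decision function. It does not call the real Auto Scaling API; the 70% and 10% thresholds come from the slide, while the 300-second cooling-off period, the instance limits, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingPolicy:
    scale_out_above: float = 70.0   # percent utilization (from the slide)
    scale_in_below: float = 10.0    # percent utilization (from the slide)
    cooldown_seconds: float = 300.0  # hypothetical cooling-off period
    min_instances: int = 1           # hypothetical limits
    max_instances: int = 10


def desired_instances(policy: ScalingPolicy, current: int, cpu_percent: float,
                      now: float, last_action_at: Optional[float]) -> int:
    """Return the instance count the policy asks for at time `now` (seconds)."""
    in_cooldown = (last_action_at is not None
                   and now - last_action_at < policy.cooldown_seconds)
    if in_cooldown:
        return current  # let the system stabilize before acting again
    if cpu_percent > policy.scale_out_above:
        return min(current + 1, policy.max_instances)
    if cpu_percent < policy.scale_in_below:
        return max(current - 1, policy.min_instances)
    return current


policy = ScalingPolicy()
print(desired_instances(policy, 2, 85.0, now=1000.0, last_action_at=None))   # 3
print(desired_instances(policy, 3, 85.0, now=1100.0, last_action_at=1000.0)) # 3
```

The second call shows the cooling-off period at work: utilization is still high, but no instance is added because the last action was only 100 seconds ago.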
Amazon Elastic Load Balancer
• This is Amazon’s load balancing solution
– Recall the push/pull architecture discussion
• It tracks the status and location of instances
• Routes requests to healthy instances based on criteria that you establish
• Can be used in conjunction with Auto Scaling
– When new instances are added or removed they are registered with the ELB
• Can be used in conjunction with Amazon’s DNS (Route 53)
– You can use DNS failover to move from one region to another
– The DNS will route traffic to the ELB in the target region
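As a sketch of the idea of tracking instance status and routing only to healthy instances, here is a toy round-robin balancer. It is not the ELB API and does not claim to match ELB's actual routing algorithm; the class and method names are hypothetical.

```python
from itertools import count


class RoundRobinBalancer:
    """Toy balancer: remembers instance health and rotates over healthy ones."""

    def __init__(self):
        self._healthy = {}        # instance id -> is the instance healthy?
        self._counter = count()   # increasing request counter

    def register(self, instance_id):
        # Called when auto scaling adds an instance.
        self._healthy[instance_id] = True

    def deregister(self, instance_id):
        # Called when auto scaling removes an instance.
        self._healthy.pop(instance_id, None)

    def report_health(self, instance_id, healthy):
        if instance_id in self._healthy:
            self._healthy[instance_id] = healthy

    def route(self):
        candidates = sorted(i for i, ok in self._healthy.items() if ok)
        if not candidates:
            raise RuntimeError("no healthy instances")
        return candidates[next(self._counter) % len(candidates)]


lb = RoundRobinBalancer()
for name in ("a", "b", "c"):
    lb.register(name)
lb.report_health("b", False)
print([lb.route() for _ in range(4)])  # ['a', 'c', 'a', 'c']
```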
Amazon Simple Queue Service
• SQS is Amazon’s queuing service
– Again recall the push/pull architecture discussion
• It’s a service that supports message queues
• Recall it can be used in conjunction with Auto Scaling to
manage the elasticity of your application
• Pricing is per million requests handled
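One way a queue can drive elasticity is to size the worker pool from the queue depth. The sketch below assumes you can observe the number of waiting messages and know roughly how fast one worker drains them; the function, its parameters, and the limits are hypothetical, and no SQS calls are made.

```python
import math


def workers_needed(queue_depth: int, msgs_per_worker_per_min: float,
                   target_drain_minutes: float,
                   min_workers: int = 1, max_workers: int = 20) -> int:
    """Workers required to empty the queue within the target time, clamped."""
    if msgs_per_worker_per_min <= 0 or target_drain_minutes <= 0:
        raise ValueError("rates and targets must be positive")
    capacity_per_worker = msgs_per_worker_per_min * target_drain_minutes
    required = math.ceil(queue_depth / capacity_per_worker)
    return max(min_workers, min(required, max_workers))


# 1200 waiting messages, 60 messages/minute per worker, drain within 5 minutes
print(workers_needed(1200, 60, 5))  # 4
```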
Amazon Storage Solutions
• Amazon has several storage solutions
– Elastic Block Store (EBS)
– Simple Storage Service (S3)
– Glacier
• These provide raw unmanaged storage
• This is useful for:
– Disaster recovery
– Backup
– Archiving
– Persistence for your own database solution
Amazon Elastic Block Store
Amazon Elastic Block Store (EBS) is Amazon’s block-level storage service for EC2 instances.
Some of its features are:
• Data is persisted independently from instances
• EBS data is placed in a specific availability zone and can be attached to instances in
the same availability zone
• EBS data is automatically replicated within its availability zone
• There are two networks that connect EBS instances
– A high speed network to provide coordination among instances and move data between
instances.
– A lower speed network used as backup for coordination.
• $0.05 per million I/O requests
Amazon Simple Storage Service (S3)
• S3 is a scalable storage solution
• Good for content storage and distribution
• Good for backup, archiving, and disaster recovery
• Costs $0.03 per GB of data per month
• More expensive but faster than Glacier
• Not as fast for I/O as EBS
Amazon Glacier
• Low cost storage solution
• Good for off site archival of Enterprise data
• Good for backup and data archiving
• Good for large volumes of data
• Costs $0.01 per GB of data per month
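A quick worked comparison of the two per-GB prices quoted on these slides. The prices are the 2013 figures from the deck, treated here as rates for one billing period, and the 10 TB archive size is hypothetical.

```python
# Per-GB prices as quoted on the slides (2013 figures).
PRICE_PER_GB = {"s3": 0.03, "glacier": 0.01}


def storage_cost(service: str, gigabytes: float) -> float:
    """Dollars to keep `gigabytes` in the given service for one billing period."""
    return PRICE_PER_GB[service] * gigabytes


archive_gb = 10_000  # hypothetical 10 TB archive
for service in ("s3", "glacier"):
    print(f"{service}: ${storage_cost(service, archive_gb):.2f}")
```

At these rates the archive costs three times as much in S3 as in Glacier, which is the trade-off against Glacier's slower retrieval.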
Amazon Database Solutions
• Amazon has a number of fully managed database solutions
• These are built on top of one of Amazon’s storage solutions
• They include:
– DynamoDB
– Relational Database Service (RDS)
– Redshift
– ElastiCache
DynamoDB
• Key Value data store
• Uses a throughput oriented pricing model (rather than a
storage oriented model)
• Uses solid state drives
• Guarantees single-digit millisecond read latencies
• You pay a flat hourly rate based on capacity that you reserve
– Costs $0.0065 per hour for every 10 units of write capacity
– Costs $0.0065 per hour for every 10 units of read capacity
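The throughput-oriented pricing can be worked through as follows. The rate is the one quoted on the slide; billing in whole blocks of 10 units is an assumption made for this sketch, and the capacity figures are hypothetical.

```python
import math

RATE_PER_BLOCK_HOUR = 0.0065  # dollars per hour per 10 capacity units (from the slide)
UNITS_PER_BLOCK = 10


def hourly_cost(read_units: int, write_units: int) -> float:
    """Hourly cost of reserved read and write capacity."""
    # Assumption: each kind of capacity is billed in whole blocks of 10 units.
    read_blocks = math.ceil(read_units / UNITS_PER_BLOCK)
    write_blocks = math.ceil(write_units / UNITS_PER_BLOCK)
    return (read_blocks + write_blocks) * RATE_PER_BLOCK_HOUR


# Hypothetical table: 100 read units and 50 write units reserved
per_hour = hourly_cost(100, 50)
print(f"${per_hour:.4f} per hour, ${per_hour * 24 * 30:.2f} per 30-day month")
```

Note that the cost depends only on reserved capacity, not on how much data the table holds.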
Relational Database Service
• A web service that provides a managed relational database for use
in applications
• It provides access to MySQL, Oracle, SQL Server, or
PostgreSQL
• It simplifies installation, patching, and backup related
issues
• Priced per hour according to database engine, instance size, and number of instances
Redshift
• Redshift is Amazon’s data warehousing solution
• Integrates with other storage solutions
• Priced at either
– $0.25 per hour on the low end, or
– $1000 per terabyte per year
ElastiCache
• A web service that provides an in-memory data cache
• Supports:
– Memcached
– Redis
• Improves latency and throughput for read-heavy applications
• Prices are per Cache node/hour
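Why a cache helps read-heavy applications can be seen in a minimal cache-aside sketch. A plain dictionary stands in for Memcached or Redis here, and the class, its methods, and the `load_from_db` callback are hypothetical; this is not the ElastiCache API.

```python
class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._cache = {}            # stand-in for a Memcached/Redis node
        self._load = load_from_db   # slow path, e.g. a database query
        self.db_reads = 0           # counts how often the slow path is taken

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no database access
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value     # populate for later readers
        return value

    def invalidate(self, key):
        # Call after writing to the database so stale data is not served.
        self._cache.pop(key, None)


store = CacheAside(load_from_db=lambda key: key.upper())
store.get("user:1")
store.get("user:1")
print(store.db_reads)  # 1
```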
Amazon CloudFront
• Amazon’s content delivery network
• Provides edge services
– Competes with companies such as Akamai
• This service will allow you to locate content closer to users
– Reduces latency
• You specify the edge location and point it to the origin
• You can route DNS to the edge location if you want
Amazon Elastic IP Addressing
• Amazon provides elastic IP addressing
• The IP address is associated with your account – not with an
instance
• You can programmatically map the elastic IP to any instance in
your account
• In this way you make the deployment configuration
transparent to the user/application
– Remember the virtual network discussion?
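The indirection that an elastic IP provides can be modeled as a mapping owned by the account rather than by any instance. The sketch below is a toy model, not the EC2 API; the class, the instance names, and the address (taken from a range reserved for documentation) are hypothetical.

```python
class ElasticIpTable:
    """Account-level table mapping stable addresses to whichever instance serves them."""

    def __init__(self):
        self._mapping = {}

    def associate(self, elastic_ip, instance_id):
        # Re-associating an address simply overwrites the previous target.
        self._mapping[elastic_ip] = instance_id

    def resolve(self, elastic_ip):
        return self._mapping[elastic_ip]


table = ElasticIpTable()
table.associate("203.0.113.10", "i-primary")
print(table.resolve("203.0.113.10"))  # i-primary

# The primary fails; clients keep using the same address.
table.associate("203.0.113.10", "i-standby")
print(table.resolve("203.0.113.10"))  # i-standby
```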
Many Other Services Available
• Authentication services
• Analytics
• Elastic Map Reduce
• Real time data streaming and processing
• Business process automation services
• Email services
• Notification services
• …
Comparison to Other Providers
• Other major providers (Google, Microsoft, Rackspace) offer
similar services
• Google doesn’t have as many services but has a different pricing
model
– Charges in 10-minute increments rather than one-hour increments
• Microsoft has similar services
• Rackspace also provides comparable options
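The difference between the two billing increments is easiest to see with a short-lived instance. The sketch below takes the increments as stated on the slide and assumes usage is rounded up to the next whole increment; the 65-minute run time is hypothetical.

```python
import math


def billed_minutes(run_minutes: float, increment_minutes: int) -> int:
    """Minutes charged when usage is rounded up to whole billing increments."""
    return math.ceil(run_minutes / increment_minutes) * increment_minutes


run = 65  # hypothetical instance lifetime in minutes
print(billed_minutes(run, 10))  # 70
print(billed_minutes(run, 60))  # 120
```

The finer the increment, the less you pay for capacity you are no longer using, which matters most for workloads that scale in and out frequently.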
Outages
• In Amazon (and others) there are some kinds of outages that
are specific to the structure of the provider
• We will now look at some of these outages
Zone Failure
• All of the IaaS providers have some notion of an “availability zone”
• An availability zone (or fault domain in Azure) has its own switch,
router, and rack
• These availability zones are isolated from each other in a way that
nodes within an availability zone are not
Zone Failure Modes
• A zone can fail in different ways
(Figure: a region containing Zone 1, Zone 2, and Zone 3)
Complete Failure
• If for example you have a power outage you’ll have a complete
failure
• If you try to route traffic to any of these machines you’ll get a “no
route to host”
– This happens quickly – fast fail
• You’ll know the zone is out
• You can then spin up a new zone elsewhere
Zone Failure Modes
• You could have a network failure
(Figure: a region containing Zone 1, Zone 2, and Zone 3)
Network Failure
• If you have a network failure it’s typically not a complete failure
• The machines are still working but the network is having trouble
• There is often still a route to host but your data isn’t reaching the
host
• As a result you don’t get a fast fail
– You’ll get long timeouts
Network Failure
• With the long timeouts your system will start to back up
• It’s difficult to tell the difference between this issue and other
issues that result in latency lags
• This problem can be intermittent as some of the routers might be
down but not all
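The difference between a fast fail and a long timeout can be made visible in client code. The sketch below classifies a single TCP connection attempt; the function name, the default 2-second timeout, and the injectable `connect` parameter (there so the logic can be exercised without a network) are choices made for this example.

```python
import socket


def probe(host, port, timeout_seconds=2.0, connect=socket.create_connection):
    """Classify one connection attempt as 'ok', 'fast-fail', or 'timeout'."""
    try:
        conn = connect((host, port), timeout=timeout_seconds)
    except socket.timeout:
        # Nothing came back in time: the slow failure typical of network trouble.
        return "timeout"
    except OSError:
        # Refused, unreachable, no route to host: the error arrives quickly.
        return "fast-fail"
    conn.close()
    return "ok"
```

Calling `probe` against a host in a zone that has lost power would be expected to return "fast-fail" almost immediately, while a zone with a degraded network would tend to return "timeout" only after the full timeout has elapsed. Choosing a short, explicit timeout is what keeps requests from backing up behind the slow case.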
Zone Failure Modes
• You could have a failure of some zone service
(Figure: a region containing Zone 1, Zone 2, and Zone 3)
Zone Service Failure
• This is when a service that the zone depends on fails
– It could be something that the platform provides as a service (e.g.
EBS)
– It could also be a central service in your application
• This causes cascading failures
• Difficult to figure out what is going on
Region Failure
• It’s rare but a Region can fail as well
• Both complete and partial failures have happened
• Typically this starts with isolated issues that cascade
• There might be an issue with a few nodes or with a single availability zone
• Other zones become impacted (often due to additional traffic) and fail
– It can be difficult to determine the scope of the issue while it’s occurring
Regional Failure Modes
• You could lose network access to a region
(Figure: a region containing Zone 1, Zone 2, and Zone 3)
Regional Outage
• This is often caused by
– A DNS issue
– Router issues
– Network capacity overload
• Causes you to lose access to a region
Regional Failure Modes
• Local failures can cause a control plane overload
(Figure: a region containing Zone 1, Zone 2, and Zone 3)
Data Store Failure
• As with the other portions of the system the data store can become
unresponsive
• The remedy for this is typically to mark this node as bad and attempt to
bring a new node online
• If the issue is more pervasive it can result in:
– Disrupted availability
– Loss of persistent data
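The usual remedy, marking an unresponsive node as bad and bringing a replacement online, can be sketched as below. The health check and the provisioning step are passed in as callbacks because they depend entirely on the data store in use; all names here are hypothetical.

```python
def replace_unresponsive(nodes, is_responsive, provision):
    """Return a node list in which every unresponsive node has been replaced."""
    healthy = [node for node in nodes if is_responsive(node)]
    missing = len(nodes) - len(healthy)
    replacements = [provision() for _ in range(missing)]
    return healthy + replacements


new_names = iter(["db-4", "db-5"])
cluster = replace_unresponsive(
    ["db-1", "db-2", "db-3"],
    is_responsive=lambda node: node != "db-2",  # pretend db-2 stopped answering
    provision=lambda: next(new_names),
)
print(cluster)  # ['db-1', 'db-3', 'db-4']
```

This handles the isolated case only. If most nodes fail the check at once, blindly replacing them may make a pervasive problem worse, which is where the availability and data-loss risks listed above come in.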
Backup Failure
• Systems will often have a backup data mechanism
• This is often a key component in disaster recovery
• This can also fail
– It can become temporarily or permanently unavailable
Upgrades
• Cloud providers need to upgrade their software as well
• When they do this the nodes that are being upgraded
experience an outage
• If your software is running on these nodes you might
experience an outage as well
Utilizing AWS
• You can utilize AWS in many ways
– You can host your entire application in the cloud
– You can host a specific portion of your application in the cloud
– You can use the cloud for a specialized need
Hosting Your Application
• You can have a system that is fully deployed in the cloud
• You’ll need to figure out how to structure the application to achieve both functional and quality
attribute needs
• You’ll want to first consider quality attribute concerns such as:
– Scalability
– Availability
– Security
– …
• Utilize the techniques we talked about to determine the needs
– Fault modeling (considering the cloud specific faults)
– Threat modeling
– Understanding the anticipated load and desired throughput and latency
• Come up with a gross structure that achieves your objectives
– Think about partitioning of the system to support testing, degraded modes of operation and independent
deployment
Partial Hosting
• You might want to leverage the cloud for a specific portion of your
system e.g.
– Supporting mobile applications
– Databases
– Analytics
– Delivery of particular content
– Hosting your front end
– …
• This is typically going to be driven by cost and quality attribute
needs (e.g. scalability)
Backup and Recovery
• Many organizations utilize the cloud for bulk storage, archiving,
or backup and recovery
• In the past external services were used for such needs
– They often stored data on tape in separate physical locations
• It can be cheaper and more convenient to utilize cloud services
• As a result many organizations use the cloud for such storage
needs
Summary
• Many services are available in the cloud
– Storage
– Network
– Compute related services
– …
• These services provide different levels of service at different pricing
levels
• Utilizing the cloud appropriately and efficiently takes an explicit
understanding of both your needs and the services available


Architecting for the cloud cloud providers

  • 1. © Matthew Bass 2013 Architecting for the Cloud Len and Matt Bass Cloud Providers
  • 2. © Matthew Bass 2013 IaaS Providers • There are several primary providers – Amazon: Amazon Web Services (AWS) – Microsoft: Azure – Google: Google Compute Engine – … • Each of these are set up a bit differently with slightly different internal decisions and associated services
  • 3. © Matthew Bass 2013 Goals • The goals for this talk is not to give you a definitive how to for each provider • It’s meant to give you just an introduction • The idea is that you’ll see how the concepts that we talked about in the course map to specific providers • We’ll look primarily at Amazon (with some details from others thrown in) • We’ll go through both the overall structure and look at specific services
  • 4. © Matthew Bass 2013 Amazon Elastic Compute Cloud • Amazon EC2 provides compute capacity in the cloud • You can select the machine image with a given OS and specified capability • You can resize the capacity as needed • Takes minutes to spin up a new VM • You can specify multiple instances and select where they will run – Region & availability zones • You pay per usage/hour depending on the capability of the instance and if it’s a reserved instance (dedicated)
  • 5. © Matthew Bass 2013 Regions • Amazon has divided their cloud offerings into multiple regions. Each region should be thought of as a separate cloud – I.e. there is no automatic copying of data from one region to another.
  • 6. © Matthew Bass 2013 Current AWS Regions • North America: – US East (5 availability zones) – US West Oregon (3 availability zones) – US West Northern California (3 availability zones) – USGov Cloud (2 availability zones) • South America – Sao Paulo (2 availability zones) • Europe – Ireland (2 availability zones) • Asia Pacific – Sydney (2 availability zones) – Singapore (2 availability zones) – China (1 availability zone) – Tokyo (3 availability zones)
  • 7. © Matthew Bass 2013 AWS and Services • Amazon Web Services offers a number of services • These services are things like: – Storage – Database – Network capabilities – Monitoring – … • Not all services are available at all regions – https://aws.amazon.com/about-aws/globalinfrastructure/regional- product-services/
  • 8. © Matthew Bass 2013 Amazon Availability Zones • Amazon has a notion of availability zones • Engineered to be insulated from failures in other availability zones • Availability zones are locations within a region • Amazon has not announced the details of an availability region but presumably they are – Physically separate data centers – Have independent networks – Have independent power delivery – …
  • 9. © Matthew Bass 2013 Amazon Service Level Agreement • Amazon guarantees 99.95% availability for each region • IaaS consumers are free to deploy their applications: – Within an availability zone – Across availability zones but within a region – Across regions • Amazon does not make any claim about the availability of their availability zones (that I could find)
  • 10. © Matthew Bass 2013 All-in-one Single Server
  • 11. © Matthew Bass 2013 Basic 4-server Setup
  • 12. © Matthew Bass 2013 Multiple Availability Zones
  • 13. © Matthew Bass 2013 Multiple Regions
  • 14. © Matthew Bass 2013 Elastic Compute Cloud (EC2) & Redundancy • EC2 supports different levels of redundancy – It is up to the customer to determine how much redundancy they wish to have and how much they wish to pay for it • Redundant elements can be: – Within an availability zone – Across availability zones – Across regions
  • 15. © Matthew Bass 2013 Microsoft Azure Regions • North America – US Central (Iowa) – US East (Virginia) – US East 2 (Virginia) – US North Central (Illinois) – US South Central (Texas) – US West (California) • Europe – Europe North (Ireland) – Europe West (Netherlands) • Asia Pacific – East (Hong Kong) – Southeast (Singapore) • Japan – Japan East (Saitama) – Japan West (Osaka) • Brazil – Sao Paulo
  • 16. © Matthew Bass 2013 Fault Domains in Azure • In Azure there is the concept of Fault Domains • A Fault Domain is essentially a rack in a given datacenter • A consumer is not able to define which fault zones the application are distributed to – Unlike an availability zone • As a result the fault zone is really an internal structure
  • 17. © Matthew Bass 2013 Upgrade Domains in Azure • An upgrade domain is similar to a fault domain • Essentially an upgrade domain will be upgraded at one time – When Microsoft upgrades their internal infrastructure they do so a domain at a time • In order to guard against failures within a fault domains and upgrades you need to replicate across both fault and upgrade domains • This is called an availability set
  • 18. © Matthew Bass 2013 Azure Availability Sets
  • 19. © Matthew Bass 2013 Amazon Auto Scaling • Auto Scaling works in conjunction with Cloudwatch (Amazon’s monitoring service) • The idea is the monitoring service monitors the metrics – CPU utilization – Latency – Memory consumption • The Auto Scaling solution establishes the rules – Add instances when utilization exceeds 70% – Remove instances when utilization falls below 10% • You can specify things like a “cooling off” period – Where no action is taken until the system has a chance to stabilize
  • 20. © Matthew Bass 2013 Amazon Elastic Load Balancer • This is Amazon’s load balancing solution – Recall the push/pull architecture discussion • It tracks the status and location of instances • Routes requests to healthy instances based on criteria that you establish • Can be used in conjunction with Auto Scaling – When new instances are added or removed they are registered with the ELB • Can use in conjunction with Amazon’s DNS (route 53) – You can use DNS failover to move from one region to another – The DNS will route traffic to the ELB in the target region
  • 21. © Matthew Bass 2013 Amazon Simple Queue Service • SQS is Amazon’s queuing service – Again recall the push/pull architecture discussion • It’s a service that supports message queues • Recall it can be used in conjunction with Auto Scaling to manage the elasticity of your application • Pricing is per million requests handled
  • 22. © Matthew Bass 2013 Amazon Storage Solutions • Amazon has several storage solutions – Elastic Block Store (EBS) – Simple Storage Solution (S3) – Glacier • These provide raw unmanaged storage • This is useful for: – Disaster recovery – Backup – Archiving – Persistence for your own database solution
  • 23. © Matthew Bass 2013 Amazon Elastic Block Store Amazon Elastic Block Store (EBS) is Amazon’s data file system. Some of its features are • Data is persisted independently from instances • EBS data is placed in a specific availability zones and can be attached to instances in the same availability zone • EBS data is automatically replicated within availability zone • There are two networks that connect EBS instances – A high speed network to provide coordination among instances and move data between instances. – A lower speed network used as backup for coordination. • $0.05 per million I/O requests
  • 24. © Matthew Bass 2013 Amazon Simple Storage Solution (S3) • S3 is a scalable storage solution • Good for content storage and distribution • Good for backup, archiving, and disaster recovery • Costs $0.03 per GB of data • More expensive but faster than Glacier • Not as fast for I/O as EBS
  • 25. © Matthew Bass 2013 Amazon Glacier • Low cost storage solution • Good for off site archival of Enterprise data • Good for backup and data archiving • Good for large volumes of data • Costs $0.01 per GB of data
  • 26. © Matthew Bass 2013 Amazon Database Solutions • Amazon has a number of fully managed database solutions • These are built on top of one of Amazon’s storage solutions • They include: – DynamoDB – Relational Data Store (RDS) – Redshift – ElastiCache
  • 27. © Matthew Bass 2013 DynamoDB • Key-value data store • Uses a throughput oriented pricing model (rather than a storage oriented model) • Uses solid state drives • Guarantees single-digit millisecond read latencies • You pay a flat hourly rate based on capacity that you reserve – Costs $0.0065 per hour for every 10 units of write capacity – Costs $0.0065 per hour for every 10 units of read capacity
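The throughput-oriented pricing can be worked through with a short calculation. This is a sketch based on the slide's rate of $0.0065 per hour per 10 units; the assumption that capacity is billed in whole blocks of 10 units is mine, and `dynamodb_hourly_cost` is a hypothetical helper.

```python
import math

RATE_PER_10_UNITS = 0.0065  # dollars per hour, as quoted on the slide

def dynamodb_hourly_cost(read_units, write_units):
    """Return the hourly cost in dollars for reserved read/write capacity.

    Assumption (not from the slides): capacity is billed in whole
    blocks of 10 units, so partial blocks are rounded up.
    """
    read_blocks = math.ceil(read_units / 10)
    write_blocks = math.ceil(write_units / 10)
    return round((read_blocks + write_blocks) * RATE_PER_10_UNITS, 4)
```

Reserving 100 read units and 50 write units is 15 blocks, or 0.0975 dollars per hour, regardless of how much data is stored.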
  • 28. © Matthew Bass 2013 Relational Database Service • A web service that provides a managed relational database for use in applications • It provides access to MySQL, Oracle, SQL Server, or PostgreSQL • It simplifies installation, patching, and backup related issues • Priced per hour according to database type, size, and number of instances
  • 29. © Matthew Bass 2013 Redshift • Redshift is Amazon’s data warehousing solution • Integrates with other storage solutions • Priced at either: – $0.25 per hour on the low end – $1000 per terabyte per year
  • 30. © Matthew Bass 2013 ElastiCache • A Web Service that enables an in memory data cache • Supports: – Memcached – Redis • Improves latency and throughput for read heavy applications • Prices are per Cache node/hour
  • 31. © Matthew Bass 2013 Amazon CloudFront • Amazon’s content delivery network • Provides edge services – Competes with companies such as Akamai • This service will allow you to locate content closer to users – Reduces latency • You specify the edge location and point it to the origin • You can route DNS to the edge location if you want
  • 32. © Matthew Bass 2013 Amazon Elastic IP Addressing • Amazon provides elastic IP addressing • The IP address is associated with your account – not with an instance • You can programmatically map the elastic IP to any instance in your account • In this way you make the deployment configuration transparent to the user/application – Remember the virtual network discussion?
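The idea that an address belongs to the account and can be remapped between instances can be sketched with a toy model. `ElasticIpTable` and its methods are hypothetical and do not correspond to any AWS API; the address and instance ids are placeholders.

```python
class ElasticIpTable:
    """Toy model: addresses belong to the account, not to an instance."""

    def __init__(self):
        self._mapping = {}  # elastic IP -> instance id, or None if unmapped

    def allocate(self, ip):
        """Reserve an address for the account without mapping it yet."""
        self._mapping[ip] = None

    def associate(self, ip, instance_id):
        """Point an allocated address at an instance, replacing any old mapping."""
        if ip not in self._mapping:
            raise KeyError(f"{ip} is not allocated to this account")
        self._mapping[ip] = instance_id

    def resolve(self, ip):
        """Return the instance currently behind the address."""
        return self._mapping[ip]
```

Clients keep using the same address while `associate` is called again with a replacement instance, which is what makes the deployment change transparent to them.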
  • 33. © Matthew Bass 2013 Many Other Services Available • Authentication services • Analytics • Elastic Map Reduce • Real time data streaming and processing • Business process automation services • Email services • Notification services • …
  • 34. © Matthew Bass 2013 Comparison to Other Providers • Other major providers (Google, Microsoft, Rackspace) offer similar services • Google doesn’t have as many services but has a different pricing model – Charges in 10-minute increments rather than one-hour increments • Microsoft has similar services • Rackspace also provides comparable options
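The effect of the billing increment can be shown with a small calculation. This is a sketch that assumes usage is rounded up to the next whole increment; `billed_minutes` is a hypothetical helper, not a provider API.

```python
import math

def billed_minutes(runtime_minutes, increment_minutes):
    """Round a runtime up to the next whole billing increment."""
    return math.ceil(runtime_minutes / increment_minutes) * increment_minutes
```

Under that assumption, a 65-minute job is billed as 120 minutes with one-hour increments but only 70 minutes with 10-minute increments, so the finer increment matters most for short-lived instances.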
  • 35. © Matthew Bass 2013 Outages • In Amazon (and others) there are some kinds of outages that are specific to the structure of the provider • We will now look at some of these outages
  • 36. © Matthew Bass 2013 Zone Failure • All of the IaaS providers have some notion of an “availability zone” • An availability zone (or fault domain in Azure) has its own switch, router, and rack • These availability zones are isolated from each other in a way that nodes within an availability zone are not
  • 37. © Matthew Bass 2013 Zone Failure Modes • A zone can fail in different ways Zone 1 Zone 2 Zone 3 Region
  • 38. © Matthew Bass 2013 Complete Failure • If for example you have a power outage you’ll have a complete failure • If you try to route traffic to any of these machines you’ll get a “no route to host” – This happens quickly – fast fail • You’ll know the zone is out • You can then spin up replacement instances in another zone
  • 39. © Matthew Bass 2013 Zone Failure Modes • You could have a network failure Zone 1 Zone 2 Zone 3 Region
  • 40. © Matthew Bass 2013 Network Failure • If you have a network failure it’s typically not a complete failure • The machines are still working but the network is having trouble • There is often still a route to host but your data isn’t reaching the host • As a result you don’t get a fast fail – You’ll get long timeouts
  • 41. © Matthew Bass 2013 Network Failure • With the long timeouts your system will start to back up • It’s difficult to tell the difference between this issue and other issues that result in latency lags • This problem can be intermittent as some of the routers might be down but not all
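Why long timeouts make a system back up can be estimated with Little's law, which says the average number of requests in flight equals the arrival rate times the time each request spends in the system. The sketch below applies it; the function name and the numbers in the note are illustrative assumptions, not measurements from the slides.

```python
def in_flight_requests(arrival_rate_per_s, seconds_per_request):
    """Little's law: average requests in flight = arrival rate x time in system."""
    return arrival_rate_per_s * seconds_per_request
```

At 100 requests per second, a healthy 0.05 second response keeps about 5 requests in flight, while a 30 second timeout holds about 3000, each tying up a thread or connection until it expires.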
  • 42. © Matthew Bass 2013 Zone Failure Modes • You could have a failure of some zone service Zone 1 Zone 2 Zone 3 Region
  • 43. © Matthew Bass 2013 Zone Service Failure • This is when a service that the zone depends on fails – It could be something that is part of the platform as a service (e.g. EBS) – It could also be a central service in your application • This causes cascading failures • Difficult to figure out what is going on
  • 44. © Matthew Bass 2013 Region Failure • It’s rare but a Region can fail as well • Both complete and partial failures have happened • Typically this starts with isolated issues that cascade • There might be an issue with a few nodes or with a single availability zone • Other zones become impacted (often due to additional traffic) and fail – It can be difficult to determine the scope of the issue while it’s occurring
  • 45. © Matthew Bass 2013 Regional Failure Modes • You could lose network access to a region Zone 1 Zone 2 Zone 3 Region
  • 46. © Matthew Bass 2013 Regional Outage • This is often caused by – A DNS issue – Router issues – Network capacity overload • Causes you to lose access to a region
  • 47. © Matthew Bass 2013 Regional Failure Modes • Local failures can cause a control plane overload Zone 1 Zone 2 Zone 3 Region
  • 48. © Matthew Bass 2013 Data Store Failure • As with the other portions of the system the data store can become unresponsive • The remedy for this is typically to mark this node as bad and attempt to bring a new node online • If the issue is more pervasive it can result in: – Disrupted availability – Loss of persistent data
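The remedy of marking a node as bad and bringing a new one online can be sketched as follows. `NodePool` and the `provision` callback are hypothetical names for illustration; a real data store would also need to re-replicate data onto the new node, which this sketch does not model.

```python
class NodePool:
    """Toy model of replacing unresponsive data store nodes."""

    def __init__(self, nodes, provision):
        self.healthy = list(nodes)
        self.bad = []
        self._provision = provision  # callable that returns a new node name

    def report_unresponsive(self, node):
        """Mark a node as bad and bring a replacement online."""
        if node not in self.healthy:
            return  # unknown or already marked bad: nothing to do
        self.healthy.remove(node)
        self.bad.append(node)
        self.healthy.append(self._provision())
```

Ignoring repeat reports for the same node keeps a burst of failed health checks from provisioning several replacements for a single failure.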
  • 49. © Matthew Bass 2013 Backup Failure • Systems will often have a backup data mechanism • This is often a key component in disaster recovery • This can also fail – It can become temporarily or permanently unavailable
  • 50. © Matthew Bass 2013 Upgrades • Cloud providers need to upgrade their software as well • When they do this the nodes that are being upgraded experience an outage • If your software is running on these nodes you might experience an outage as well
  • 51. © Matthew Bass 2013 Utilizing AWS • You can utilize AWS in many ways – You can host your entire application in the cloud – You can host a specific portion of your application in the cloud – You can use the cloud for a specialized need
  • 52. © Matthew Bass 2013 Hosting Your Application • You can have a system that is fully deployed in the cloud • You’ll need to figure out how to structure the application to achieve both functional and quality attribute needs • You’ll want to first consider quality attribute concerns such as: – Scalability – Availability – Security – … • Utilize the techniques we talked about to determine the needs – Fault modeling (considering the cloud specific faults) – Threat modeling – Understanding the anticipated load and desired throughput and latency • Come up with a gross structure that achieves your objectives – Think about partitioning of the system to support testing, degraded modes of operation and independent deployment
  • 53. © Matthew Bass 2013 Partial Hosting • You might want to leverage the cloud for a specific portion of your system e.g. – Supporting mobile applications – Databases – Analytics – Delivery of particular content – Hosting your front end – … • This is typically going to be driven by cost and quality attribute needs (e.g. scalability)
  • 54. © Matthew Bass 2013 Backup and Recovery • Many organizations utilize the cloud for bulk storage, archiving, or backup and recovery • In the past external services were used for such needs – They often stored data on tape in separate physical locations • It can be cheaper and more convenient to utilize cloud services • As a result many organizations use the cloud for such storage needs
  • 55. © Matthew Bass 2013 Summary • Many services are available in the cloud – Storage – Network – Compute related services – … • These services provide different levels of service at different pricing levels • Utilizing the cloud appropriately and efficiently takes an explicit understanding of both your needs and the services available