Adrian Hornsby, Technical Evangelist @ AWS
Journey Towards Scaling Your
Application to 10 Million Users
@adhorn
Let’s start from…
What are we building?
Static vs Dynamic
Web app vs Enterprise app
Auth vs no-Auth
Server vs Serverless
Select your region
AWS Global Infrastructure
16 Regions
42 Availability Zones
The “Must” from Day 1
• High quality code
• Version controlled
• CI/CD pipeline
• Infrastructure as code
• Security at every layer
• Cost conscious
• Test everything
• DR procedure
Operational Excellence
Users > 1
Start simple, basic
Amazon Simple Storage Service (S3)
Amazon S3
App 0.0v1
http://poliko.adhorn.me.s3-website-eu-west-1.amazonaws.com
Simple Static Website
Amazon Route53
• Traffic Flow
• Latency Based Routing
• Geo DNS
• Private DNS for Amazon VPC
• DNS Failover
• Health Checks and Monitoring
• Domain Registration
• CloudFront Zone Apex Support
• S3 Zone Apex Support
• Weighted Round Robin
Highly available and scalable DNS web service.
Amazon S3
App 0.0v2
http://poliko.adhorn.me.s3-website-eu-west-1.amazonaws.com
Simple Static Website
Amazon Route53
http://poliko.adhorn.me
• Cache content at the edge for
faster delivery
• Lower load on origin
• Dynamic and static content
• Streaming video
• Custom SSL certificates
• Low TTLs
Amazon CloudFront (CDN)
Amazon S3
Amazon
CloudFront
Amazon
Route53
App 0.0v3
Simple Static Website
http://poliko.adhorn.me
Custom backend
• M4: General Purpose
• T2: Burstable
• C5: Compute intensive
• R4: Memory intensive
• X1: Large memory
• I3: High I/O
• D2: Dense storage
• G2: Graphics intensive
• P2: General Purpose GPU
• F1: FPGAs
• Lightsail: Simple VPS
Amazon Elastic Compute Cloud (EC2)
Amazon EC2 Instance Types
App 0.1v1
Amazon
EC2
instance
Elastic IP
User
Amazon
Route 53
EC2 backend
www.example.com
54.223.92.16
App 0.1v2
Docker
Container
Elastic IP
User
Amazon
Route 53
Containerized backend
www.example.com
54.223.92.16
Managed
API Gateway
cache
Amazon
CloudWatch
API Gateway
Endpoints on
Amazon EC2
Any other publicly
accessible endpoint
AWS Lambda
functions
Amazon
CloudFront
API Gateway
User
Amazon Route 53
App 0.1v3
Serverless backend
Databases
Self-managed: Amazon EC2
Fully managed: Amazon RDS, Amazon DynamoDB
Database options
NoSQL vs SQL
Why start with SQL?
• Easy to change your data access needs
• Established and well-worn technology.
• Lots of existing code, communities, books, and tools.
• You aren’t going to break SQL DBs in your first 10 million
users. No, really, you won’t.*
• Clear patterns to scalability.
*Unless you are doing something SUPER peculiar with the data or you have MASSIVE amounts of it.
…but even then SQL will have a place in your stack.
Why might you need NoSQL?
• Super low-latency applications
• Metadata-driven datasets
• Highly nonrelational data
• Need schema-less data constructs*
• Rapid ingest of data (thousands of records/sec)
• Massive amounts of data (again, in the TB range)
*Need != “It’s easier to do dev without schemas”
Application
Elastic IP
Database
User
Amazon
Route 53
App 0.2
Separate the data layer
Separation of content type
Application
Elastic IP
Database
User
Amazon
Route 53
App 0.3
Separate static assets from dynamic content
Amazon S3
Amazon
CloudFront
*.js
*.jpeg
*.mp4
Users > 1000
Availability and Redundancy
Elastic Load Balancer
• Highly available
• Ports 1 - 65535
• Health checks
• Session stickiness
• Monitoring / Logging
• Content-based routing
• Container-based apps
• WebSockets
• HTTP/2
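A minimal sketch of how a load balancer can combine rotation with health checks, assuming a simple round-robin policy. The class name `RoundRobinBalancer` and the target names are hypothetical, and this is not how Elastic Load Balancing is implemented; it only illustrates the idea that unhealthy targets stop receiving traffic.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy balancer: rotates over targets and skips unhealthy ones."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)   # everything starts healthy
        self._ring = cycle(self.targets)   # endless rotation over targets

    def mark(self, target, healthy):
        """Record the outcome of a (hypothetical) health check."""
        if healthy:
            self.healthy.add(target)
        else:
            self.healthy.discard(target)

    def pick(self):
        """Return the next healthy target, trying each target at most once."""
        for _ in range(len(self.targets)):
            target = next(self._ring)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets")
```

With two web instances, requests alternate between them; once one fails its health check, every request goes to the survivor until the failed instance is marked healthy again.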
Web
Instance
RDS DB Instance
Active (Multi-AZ)
Availability Zone Availability Zone
Web
Instance
RDS DB Instance
Standby (Multi-AZ)
Load
balancer
App 0.4
Available & redundant application
User
Amazon
Route 53
Amazon
CloudFront
Amazon S3
Caching layer
Amazon ElastiCache
• Redis and Memcached Compatible
• Fully Managed
• Easily Scalable
• Transient session data
• Shared state
• High-frequency counters
• Queues
• Leaderboards
• Pub/Sub
• Lists, sets, …
In-memory data store and cache
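One common way to use such a cache is the cache-aside pattern: look in the cache first, and only on a miss read from the database and store the result with a time-to-live. The sketch below assumes an in-process dictionary standing in for Redis or Memcached; `TTLCache`, `get_user`, and `load_from_db` are hypothetical names, not part of any AWS API.

```python
import time


class TTLCache:
    """In-process stand-in for a Redis/Memcached cache with expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self._store = {}        # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]    # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def get_user(user_id, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)   # slow path, hits the database
    cache.set(key, value)
    return value
```

Repeated reads of the same user within the TTL never reach the database, which is the load reduction the caching layer is meant to provide.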
Amazon
ElastiCache
RDS DB Instance
Active (Multi-AZ)
Availability Zone Availability Zone
RDS DB Instance
Standby (Multi-AZ)
ELB
User
Amazon
Route 53
Amazon
CloudFront
Amazon S3
App 0.5
Stateless application
Web
Instance
Web
Instance
Sunday Monday Tuesday Wednesday Thursday Friday Saturday
Weekly traffic pattern
Auto Scaling
Auto Scaling
• Maintain your Amazon EC2 instance availability
• Automatically Scale Up and Down your EC2 Fleet
• Scale based on CPU, Memory or Custom metrics
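To make metric-based scaling concrete, here is a rough sketch of a proportional rule: grow or shrink the fleet so the average metric moves back toward a target, clamped to the group's minimum and maximum size. The function name and the rule itself are assumptions for illustration; the actual Auto Scaling policies are configured in AWS and may behave differently.

```python
import math


def desired_capacity(current_instances, metric_value, target_value,
                     min_size, max_size):
    """Proportional scaling rule, clamped to the group's size limits.

    metric_value is the current average (e.g. CPU %), target_value the
    level we would like each instance to run at.
    """
    if current_instances <= 0 or target_value <= 0:
        raise ValueError("current_instances and target_value must be positive")
    raw = math.ceil(current_instances * metric_value / target_value)
    return max(min_size, min(max_size, raw))
```

For example, 4 instances averaging 90% CPU against a 60% target would grow to 6, while the same fleet idling at 15% would shrink only as far as the configured minimum.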
RDS DB Instance
Active (Multi-AZ)
Availability Zone Availability Zone
RDS DB Instance
Standby (Multi-AZ)
ELB
App 0.6
Auto scaling groups
User
Amazon
Route 53
Amazon
CloudFront
Amazon S3
Web
Instances
Web
Instances
ElastiCache
Auto-Scaling group
Users > 100,000
Databases (part 1)
Read / Write Sharding
RDS DB Instance
Read Replica
App
Instance
App
Instance
App
Instance
RDS DB Instance
Master (Multi-AZ)
RDS DB Instance
Read Replica
RDS DB Instance
Read Replica
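The diagram above can be sketched in application code as a small router that sends reads to the replicas and everything else to the master. This is a simplified illustration: `ReadWriteRouter` and the endpoint names are hypothetical, and it classifies statements only by their first keyword.

```python
import itertools


class ReadWriteRouter:
    """Send SELECTs to read replicas (round robin), the rest to the master."""

    def __init__(self, master, replicas):
        self.master = master
        # Rotate over replicas; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, DDL, and anything unrecognised go to the master.
        return self.master
```

Because RDS read replicas are updated asynchronously, a read that must see a write made a moment ago may need to go to the master instead; the sketch does not handle that case.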
Database Federation
Users
DB
Products
DB
App
Instance
App
Instance
App
Instance
Database Sharding
User ShardID
002345 A
002346 B
002347 C
002348 B
002349 A
A B C
App
Instance
App
Instance
App
Instance
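The lookup table on this slide can be sketched as a directory that maps each user ID to a shard. The directory entries below are the ones shown on the slide; the hash-based fallback for users not yet in the directory is an assumption added for illustration, as is the function name `shard_for`.

```python
import hashlib

SHARDS = ["A", "B", "C"]

# User -> shard assignments taken from the slide.
SHARD_DIRECTORY = {
    "002345": "A",
    "002346": "B",
    "002347": "C",
    "002348": "B",
    "002349": "A",
}


def shard_for(user_id):
    """Return the shard that holds this user's rows."""
    if user_id in SHARD_DIRECTORY:
        return SHARD_DIRECTORY[user_id]
    # Assumed fallback: a stable hash so the same user always maps
    # to the same shard, independent of process or machine.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

Every read, update, and write for a user has to go through a lookup like this, which is the application-level awareness that makes sharding operationally more complex than federation.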
Users > 1,000,000
Asynchronous patterns
Message passing
A
Queue
B
A
Queue
B
Listener
Pub-Sub
SNS, SQS, Redis, RabbitMQ
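As a rough in-process illustration of pub-sub, the sketch below lets any number of subscribers register for a topic and delivers each published message to all of them. The `Broker` class is hypothetical and synchronous; services such as SNS or Redis provide the same idea across processes and machines.

```python
class Broker:
    """Minimal in-process publish/subscribe broker."""

    def __init__(self):
        self._subscribers = {}   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber; return how many got it."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

The publisher never needs to know who is listening, which is what lets new consumers be added without touching the producer.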
Async. Architecture (part 1)
Web
Instances
Worker
Instance
Worker
Instance
Queue
API
Instance
API
Instance
API
Instance
API: {DO foo}
PUT JOB: {JobID: 0001, Task: DO foo}
API: {JobID: 0001}
GET JOB: {JobID: 0001, Task: DO foo}
ElastiCache
Result:
{
JobID: 0001,
Result: bar
}
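The flow above can be sketched with a thread-safe queue: the API enqueues the job and returns the job ID right away, a worker picks the job up later, and the result is stored where the API can fetch it by ID. Here `queue.Queue` stands in for the message queue and a plain dictionary for ElastiCache; the function names and the `None` shutdown sentinel are assumptions made for the example.

```python
import queue
import threading


def submit(jobs, job_id, task):
    """API side: enqueue the job and return a ticket immediately."""
    jobs.put({"JobID": job_id, "Task": task})
    return {"JobID": job_id}


def worker(jobs, results, handlers):
    """Worker side: process jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            outcome = handlers[job["Task"]]()
            results[job["JobID"]] = {"JobID": job["JobID"], "Result": outcome}
        finally:
            jobs.task_done()


def run_demo():
    jobs = queue.Queue()
    results = {}   # stands in for the result cache
    handlers = {"DO foo": lambda: "bar"}
    thread = threading.Thread(target=worker, args=(jobs, results, handlers))
    thread.start()
    ticket = submit(jobs, "0001", "DO foo")   # returns before the work is done
    jobs.put(None)                            # tell the worker to stop
    thread.join()
    return ticket, results[ticket["JobID"]]
```

The caller gets its ticket back without waiting for the work, so slow tasks no longer hold API instances busy.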
Async. Architecture (part 2)
Worker
Instance
Worker
Instance
Queue
API
Instance
API
Instance
API
Instance
ElastiCache
Amazon SNS
Push Notification
User
RDS DB Instance
Active (Multi-AZ)
Availability Zone
Elastic Load
Balancer
Web
Instance
Web
Instance
Amazon Route 53
User
Amazon S3
Amazon CloudFront
ElastiCache
Worker
Instance
Worker
Instance
App 0.7
Decoupling
Queue
Amazon SNS
Event-driven patterns
Event driven
A B C
Event on B by A triggers C
How Lambda works
S3 event
notifications
DynamoDB
Streams
Kinesis
events
Cognito
events
SNS
events
Custom
events
CloudTrail
events
Lambda
DynamoDB
Kinesis
S3
Any custom
Invoked in response to events
- Changes in data
- Changes in state
Redshift
SNS
Access any service,
including your own
Such as…
Lambda functions
CloudWatch
events
Event-driven using Lambda
AWS Lambda:
Resize Images
Users upload photos
S3:
Source Bucket
S3:
Destination Bucket
Triggered on
PUTs
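A hedged sketch of what the function behind this slide might look like. It reads the bucket and object key from the S3 event notification and works out where the resized copy would go; the actual download, resize, and upload steps are left as a comment because they depend on libraries not shown in the deck. The destination bucket name and the `thumbnails/` prefix are hypothetical.

```python
from urllib.parse import unquote_plus

DESTINATION_BUCKET = "example-resized-photos"   # hypothetical bucket name


def resize_targets(event):
    """Map each S3 record in the event to a (source, destination) pair."""
    targets = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        targets.append({
            "source": (bucket, key),
            "destination": (DESTINATION_BUCKET, f"thumbnails/{key}"),
        })
    return targets


def handler(event, context):
    """Lambda entry point, invoked once per S3 PUT notification."""
    targets = resize_targets(event)
    # A real function would fetch each source object, resize the image,
    # and write it to the destination bucket here.
    return {"processed": len(targets)}
```

Writing to a separate destination bucket, as on the slide, keeps the function from being triggered again by its own output.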
Micro-Services
Amazon Route 53
User
Amazon CloudFront
API Edge Service
Product Listing
Service
Recommendation
Service
Any
Service
Auth.
Service
Databases (part 2)
Specialized Database
NoSQL Graph DB
Users = 10,000,000
Happy Scaling!
@adhorn


Editor's Notes

  • #5: We need some basics to lay the foundations we’ll need to build our knowledge of AWS on top of.
  • #8: Latency, regulation
  • #9: Invest time to save time
  • #12: Simplicity. Durability. Scalability. Security. Broad integration with other AWS services. Cloud data migration options. Tiered storage classes.
  • #16: CloudFront lets you cache static content at the CloudFront edge for faster delivery from a local PoP to the end user. In other words, your static content is cached close to the user and delivered from there, reducing download times for the website overall. There are over 60 CloudFront cache PoPs around the world, as we mentioned earlier. CloudFront helps lower the load on your origin infrastructure. You can front static content, as discussed, and dynamic content as well. For dynamic content, CloudFront proxies and accelerates your connection back to your dynamic origin, and you would set a TTL of 0 on your dynamic content so that CloudFront always goes back to the origin to fetch it.
  • #19: If you look at the far right end of the spectrum, you can see the G2, which provides a cost-effective, high-performance platform for graphics applications; the P2, designed for general-purpose GPU computing; and the F1, our FPGA instance designed to accelerate computationally intensive algorithms. One instance in particular has become very popular among our customers: the P2 instance.
  • #20: This is the most basic setup you would need to serve a web application. Any user first hits Route53 for DNS resolution. Behind the DNS service is an EC2 instance running our web app and database on a single server. We will need to attach an Elastic IP so Route53 can direct traffic to our web stack at that IP address with an A record. To scale this infrastructure, the only real option we have is to get a bigger EC2 instance…
  • #21: The same setup with Docker (the application runs in a container).
  • #24: At AWS there are a lot of different options for running databases. One is to install pretty much any database you can think of on an EC2 instance and manage all of it yourself. If you are really comfortable doing DBA-like activities such as backups, patching, security, and tuning, this could be an option for you. Also, if you need something highly specialized or customized and need to manage the hardware to achieve this, again this might be for you. If not, then we have a few options that we think are a better idea. First is Amazon RDS, or Relational Database Service. With RDS you get a managed database instance of either MySQL, Oracle, Postgres or SQL Server, with features such as automated daily backups, simple scaling, patch management, snapshots and restores, high availability, and read replicas, depending on the engine you go with. We also have Aurora in Preview today. Amazon Aurora is a MySQL-compatible relational database that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Aurora provides up to five times better performance than MySQL at a price point one tenth that of commercial relational databases, while delivering similar performance and availability. Next up we have DynamoDB, a NoSQL database built on top of SSDs. DynamoDB is based on the Dynamo whitepaper published by Amazon.com back in 2007. This whitepaper is considered the grandfather of most modern NoSQL databases like Cassandra. DynamoDB is kind of like a cousin of the original paper, or an evolution of that whitepaper. One of the key concepts of DynamoDB is what we call “Zero Administration”. With DynamoDB the only knobs to tweak are the reads and writes per second you want the DB to be able to perform. You set it, and it will give you that capacity with query responses averaging in single-digit milliseconds. We’ve had customers with loads such as half a million reads and writes per second without DynamoDB even blinking.
  • #26: So why start with SQL databases? Generally speaking, SQL-based databases are established and well-worn technology. There’s a good chance SQL is older than most people in this room. It has, however, continued to power most of the largest web applications we deal with on a daily basis. There is a lot of existing code, books, tools, communities, and people who know and understand SQL. Some of these newer NoSQL databases might have a handful of companies using them at scale. People are key here, in addition to all of the other points, as you may need to hire people to manage your database. You also aren’t going to break SQL databases in your first 10 million users. And yes, there is an asterisk here, and we’ll get to that in a second. Lastly, there are a lot of clear patterns for scalability that we’ll discuss a bit throughout this talk. So as for my point here at the bottom, I again strongly recommend SQL-based technology, unless your application is doing something SUPER weird with the data, or you’ll have MASSIVE amounts of it. Even then, SQL will be in your stack.
  • #27: So why else might you need NoSQL? There are definitely use cases where it makes sense to go NoSQL right off the bat. Some examples: super-low-latency applications; metadata-driven datasets; highly nonrelational data. Going along with the previous one is where you really need schema-less data constructs, and let’s highlight the word NEED here. This isn’t just developers saying it’s easy to make apps without schemas; that’s just laziness. Massive amounts of data, again from the previous slide, in the several-TB range. Rapid ingest of data, where you need to ingest potentially thousands of records per second into a single dataset.
  • #28: So for this scenario today and based upon our discussion, we’re going to go with RDS and MySQL as our database engine.
  • #30: What’s the biggest problem with this one?
  • #33: Provides additional visibility into the health of the target instances and containers; performs and reports on health checks on a per-port basis.
  • #34: Next up we need to address the lack of failover and redundancy in our infrastructure. We’re going to do this by adding another web app instance and enabling the Multi-AZ feature of RDS, which will give us a standby instance in a different AZ from the primary. We’re also going to replace our EIP with an Elastic Load Balancer to share the load between our two web instances. Now we have an app that is a bit more scalable and has some fault tolerance built in as well.
  • #36: We could use ElastiCache as a place to store common database query information for content that doesn’t change often, like information on our user, or what is in their cart. We should try to do this as often as possible. So what is ElastiCache? ElastiCache is hosted Memcached or Redis. It speaks the same API as the traditional open source products, so think of this as Memcached or Redis as a service where we manage the clusters for you. You can scale from one to many nodes. This provides very fast single-digit-ms latencies as well. Managed: simplifies and offloads the management, monitoring, and operation of in-memory cache environments. Compatible: most client libraries will work with the respective engines they were built for, with no additional changes or tweaking required. Monitored: detailed monitoring statistics for the engine nodes at no extra cost via Amazon CloudWatch. There is no persistence or replication with Memcached. With Redis, you can put a replica in a different AZ with persistence.
  • #37: We can also move things like session information to ElastiCache. We can also use ElastiCache to store some of our common database query results, which will prevent us from hitting the database too much. This should take load off of our DB tier. Removing session state from our web / app tier is also very key, as it allows us to scale up and down without losing session information when this horizontal scaling happens. This is called making our tier “stateless”.
  • #38: At this stage, you typically start to have a significant amount of traffic and you can already detect usage patterns. These are amazon.com usage patterns, East and West Coast.
  • #43: CAP theorem
  • #45: Writes and updates. Counters! Keep them off the DB; use Redis.
  • #46: Database Federation is where we break up the database by function. In our example, we have broken out the Forums DB, the User DB, and the Products DB. Of course, cross-functional queries are harder to do, and you may need to do your joins at the application layer for these types of queries. This will reduce our database footprint for a while, and the great thing is, this does prevent you from having to shard until much further down the line. This isn’t going to help for single large tables; for this we will need to shard.
  • #47: Sharding is where we break up that single large database into multiple DBs. We might need to do this because of database or table size, or potentially for high write IOPS as well. Here is an example of us breaking up a database with a large table into 3 databases. Above we show where each userID is located, but the easiest way to describe how this would work would be to use the example of all users with A – H going into one DB, I – M going into another, and N – Z going into the third DB. Typically this is done by key space, and your application has to be aware of where to read from, update and write to for a particular record. ORM support can help here. This does create operational complexity, so if you can federate first, do that. This can be done with SQL or NoSQL, and DynamoDB does this for you under the covers on the backend as your data size increases and the reads / writes per second scale.
  • #61: When we start getting into the 5M-plus user range, we may start seeing database contention issues on writes to the Master. We are going to drill into a couple of techniques to solve these types of issues, and those include Federation and Sharding.