Michael Kalika
Chief Architect, Intuit Israel
System Scalability
April 22, 2020
Intuit Confidential and Proprietary 2
Agenda
Intro
Use Case
Fundamentals
Scaling Databases
Scaling API
Async Operations
Security in Cloud
Incident Management
Q&A
Intro
Hi, I’m Michael
- 2004 - 2011: Software Dev Leadership, HP
- 2011 - 2015: BI/Data Group Manager, HP
- 2015 - 2017: R&D Architect, Payoneer
- 2017 - 2020: Architect, DevOps and PD Lead, Intuit
- 2020 - Today: Chief Architect, Intuit
Where we started
Intuit’s journey began over 35 years ago when our founder Scott Cook sat at his
kitchen table and watched his wife as she balanced their checkbook and thought
there must be a better way.
Who we serve
- Consumers
- Small businesses
- Self-employed
Who we are
- Founded: 1983
- IPO: 1993
- Employees: 9,000
- Customers: 50M
- FY19 revenue: ~$6.8B
- Locations: 20
Intuit global locations: 20 locations in 9 countries (updated November 2019)
- Brazil: São Paulo
- Europe: London, UK; Paris, France
- Australia: Melbourne; Sydney
- India: Bangalore
- Israel: Tel Aviv
- United States: Los Angeles, Mountain View, San Diego, San Francisco (California); Boise, ID; Fredericksburg, VA; Plano/Dallas, TX; Reno, NV; Tucson, AZ; Washington, D.C.
- Canada: Edmonton; Mississauga/Toronto
- Mexico: Mexico City
Recognized as one of the world’s leading companies
- 2004 - 2019: Most Admired: Computer Software
- 2002 - 2019: 100 Best Companies to Work For
- 2019: Most Innovative Companies
- 2019: Companies Best Positioned For Breakout Growth
Recognized as one of the top companies to work for
#15 IN THE BAY AREA
#10 IN TECHNOLOGY
#1 in Canada
#24 in the US
#4 in the UK
#2 in India
#3 in Australia
#15 COMPANIES THAT CARE
#14 in Israel
Our values
- Be Bold
- Be Passionate
- Learn Fast
- Be Decisive
- Win Together
- Deliver Awesome
- Integrity Without Compromise
- We Care and Give Back
Core capabilities: Our recipe to execute with excellence
WHAT TO SOLVE: CUSTOMER-DRIVEN INNOVATION (CDI)
- An important, unsolved customer problem
- …that we, and those we enable, can solve well
- …and build durable competitive advantage
- SUCCESS IS HERE
HOW TO SOLVE: DESIGN FOR DELIGHT (D4D)
- Deep customer empathy
- Go broad to go narrow
- Rapid experiments with customers
- DELIGHT
Use Case
Hermes Deliveries
- Assume we have a small startup that offers low-cost, fast deliveries by connecting people with delivery needs to couriers.
- Initially, our target is Rishon Le-Tzion, and we have tens of daily customers and 2-3 couriers.
Hermes Deliveries
- We save all customers, couriers, orders, places, and history in the same database, in a single DB instance.
- Everything is new and small, so we have no caching, no automation, no auto scaling, and no monitoring.
- This is perfectly adequate, as we get barely 1 delivery order every 15 minutes.
COVID-19 time
- But with COVID-19, more and more people start using our service, since we are the cheapest and fastest.
- Now we have 20 orders in 30 minutes, and the number keeps increasing…
- We see how successful the service is and decide to offer it in Tel Aviv.
Turtle Deliveries
- At this point we realize that our system performs poorly.
- The app is very slow and API latency has increased.
- We have database transaction deadlocks and whole-system failures.
Fundamentals
Scalability Concepts
Vertical vs. Horizontal
System Decomposition
Distributed System
The Scale Cube
Data partitioning
- Scale by splitting similar things [sharding]
- Example: Cell Architecture and/or Sharding. A cell is a self-contained installation that can satisfy all the operations for a shard. A shard is a subset of a much larger dataset, for example a range of users.
Horizontal duplication
- Scale by cloning
- Example: 18 Web servers under a Load Balancer
Functional decomposition
- Scale by splitting different things
- Example: Orders, Inventory, Customers
Scaling Databases
Query Optimization, Indexing and Connection Pool
- We use an RDBMS with a heavily normalized schema.
- Therefore, it was decided to:
1. Introduce some redundant columns that frequently appear in WHERE and JOIN ON clauses (denormalization).
- This will reduce join queries and break a few big queries into smaller ones.
2. Add indexes on columns that frequently appear in WHERE clauses.
3. Use a connection pool to limit the number of costly network connections.
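Steps 2 and 3 above can be sketched as follows. This is a minimal illustration, not the system from the talk: it uses SQLite from the Python standard library in place of the RDS database, and the `orders` table, the `idx_orders_customer` index, and the `ConnectionPool` class are hypothetical names chosen for the example.

```python
import queue
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the plan detail


class ConnectionPool:
    """Tiny fixed-size pool: connections are opened once and then reused."""

    def __init__(self, database, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self):
        return self._idle.get()  # blocks until a connection is free

    def release(self, conn):
        self._idle.put(conn)


# Hypothetical schema, assumed only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(i % 50, "open") for i in range(1000)],
)

lookup = "SELECT id FROM orders WHERE customer_id = ?"
plan_before = query_plan(conn, lookup, (7,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (7,))   # now served through the index
```

Comparing `plan_before` with `plan_after` shows the same WHERE clause switching from a scan to an index lookup. The pool caps how many connections exist at once, so request handlers wait for a free connection instead of opening a new one.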
Vertical Scaling or Scaling Up
- All that improved application API latency by 30%, which was good enough at the time.
- We entered new areas.
- It was also decided to upgrade the RDS instance and add more storage to reduce future risk.
New Challenges
- Everything is running great and we have more orders and delivery couriers, but we are facing new issues… again…
1. The database indexes keep growing and require maintenance.
2. Table scans are slow even with indexes.
- Upgrading to a bigger RDS instance is costly, and we are not yet profitable.
- What is your next step?
Read Replicas
- Even a bigger RDS instance is not able to handle all READ and WRITE requests.
- In most cases we need consistency in WRITEs, but small delays on READs are fine.
- Therefore, it was decided to create two READ replicas of the source RDS instance, thereby increasing read throughput.
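The read/write split described above can be sketched as a small router in the application layer. This is an assumption about where the routing lives; the class name `ReplicaRouter`, the `needs_fresh` flag, and the endpoint names are hypothetical.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across the read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # simple round-robin

    def route(self, sql, needs_fresh=False):
        """Pick an endpoint for one statement.

        needs_fresh=True is a hypothetical escape hatch for the few reads
        that cannot tolerate replication lag; those go to the primary.
        """
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not needs_fresh:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
```

Routing by statement prefix is deliberately naive. It only illustrates the idea that reads which tolerate a small delay never touch the primary.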
Read Replicas
- That worked well, and we decided to expand into new areas.
- Now we see that the Primary instance is not able to handle all writes, and write latency is growing.
- We also have unacceptable lag between the Primary and the READ replicas.
- What’s next?
Functional Decomposition
- Our Locations table is getting heavy WRITE traffic: the R:W ratio is 3:8.
- That table is used for location tracking and has nothing to do with the rest of the functionality.
- Why not decompose functionality?
1. Move the Locations table to a new dedicated database.
2. Decouple location tracking as a stand-alone microservice.
Functional Decomposition
- Done deal! It works!
- Now we want to add the rest of the country, and we must plan for scale.
- What can we do?
Data Partitioning or Sharding
Shared-Nothing Model
- It was decided to shard the database.
- Sharding is a technique that splits data into smaller subsets and distributes them across a number of physically separate database servers.
- Shards have no knowledge of each other.
- If one database shard has a hardware issue or goes through failover, no other shards are impacted.
- A query that reads or joins data from multiple database shards must be specially engineered.
- Example shards: North, Center, South
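A minimal sketch of routing by shard, assuming the three regional shards named on the slide. The host names, the city-to-region map, and the `fetch` callback are hypothetical. The second function shows why cross-shard reads need special engineering: the application has to fan out and merge the results itself.

```python
# Hypothetical shard endpoints for the North / Center / South split.
SHARDS = {
    "north": "db-north.example.internal",
    "center": "db-center.example.internal",
    "south": "db-south.example.internal",
}

# Hypothetical shard key: the region of the delivery city.
CITY_TO_REGION = {
    "Haifa": "north",
    "Tel Aviv": "center",
    "Rishon Le-Tzion": "center",
    "Beer Sheva": "south",
}


def shard_for(city):
    """Return the one shard that owns all data for this city."""
    return SHARDS[CITY_TO_REGION[city]]


def orders_for_cities(cities, fetch):
    """Cross-shard read: query each involved shard once, merge in the app.

    fetch(shard, cities) is a caller-supplied function that runs the query
    against a single shard and returns a list of rows.
    """
    by_shard = {}
    for city in cities:
        by_shard.setdefault(shard_for(city), []).append(city)
    results = []
    for shard, shard_cities in by_shard.items():
        results.extend(fetch(shard, shard_cities))
    return results
```

A single-city request touches exactly one shard, so a failure in the South shard would not affect orders in Tel Aviv.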
Wrap Up and Questions
- Scale Cube and Scalability Concepts
- Scaling Databases
1. Query Optimization, Indexing, Connection Pool
2. Vertical Scaling / Scaling Up
3. Read Replicas
4. Data Partitioning or Sharding
- Functional Decomposition
Scaling API
Horizontal Scaling
- We noticed that our Location API server is heavily loaded and sometimes crashes.
- We decided to run three Location API servers in three AWS availability zones and put them behind a load balancer.
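What the load balancer does for the three servers can be sketched as round-robin selection that skips unhealthy instances. This is a toy model for illustration, not how a managed AWS load balancer is implemented, and the server names are hypothetical.

```python
class LoadBalancer:
    """Round-robin over a fixed server list, skipping servers marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, or fail if none is left."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")


# One hypothetical Location API server per availability zone.
balancer = LoadBalancer(["location-api-az-a", "location-api-az-b", "location-api-az-c"])
```

When one server crashes and is marked down, requests keep flowing to the other two zones instead of failing.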
Auto Scaling
- But we still face occasional issues during certain hours, and we want to be smart with costs.
- Therefore, it was decided to use auto scaling.
- Minimum Size: what the application needs to be live.
- Desired Capacity: what the application needs to perform normally.
- Maximum Size: what the application needs during peak season.
- Scale out as needed: during peak load, add capacity in steps.
Example:
- Minimum 3 servers
- Desired 9 servers
- Maximum 27 servers
- During peak load, scale out by 3 servers every 10 minutes
We can also consider reserved instances for cost saving.
It is very important to autoscale not only on hardware utilization (CPU, memory, etc.) but also on system metrics, for example lag in your messaging broker.
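The example numbers above (minimum 3, maximum 27, steps of 3 servers every 10 minutes) can be expressed as a small decision function. The CPU and broker-lag thresholds and the scale-in rule are assumptions added for illustration; the slide specifies only the scale-out step and the size limits.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_size: int = 3        # application stays live
    max_size: int = 27       # peak season ceiling
    step: int = 3            # servers added or removed per decision
    cooldown_s: int = 600    # at most one change every 10 minutes
    cpu_high: float = 0.70   # assumed threshold
    cpu_low: float = 0.30    # assumed threshold
    lag_high: int = 10_000   # assumed: messages waiting in the broker
    lag_low: int = 1_000     # assumed


def desired_size(policy, current, cpu, broker_lag, seconds_since_last_change):
    """Decide the next fleet size from hardware AND system metrics."""
    if seconds_since_last_change < policy.cooldown_s:
        return current  # still cooling down from the previous change
    if cpu > policy.cpu_high or broker_lag > policy.lag_high:
        return min(current + policy.step, policy.max_size)
    if cpu < policy.cpu_low and broker_lag < policy.lag_low:
        return max(current - policy.step, policy.min_size)
    return current
```

Note that a growing broker lag triggers a scale-out even when CPU looks idle, which is the point of the last remark on the slide.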
Customer Attrition
- So far everything was good, but we noticed some customer attrition.
- We know that our pricing is personal, dynamic, and the cheapest in the market.
- Customers come as usual, fill in details, ask for a price quote, and then leave...
- What could be the cause?
Customer Attrition
Reason
- It turns out that our unique and sophisticated pricing and matching algorithm includes an expensive “personality coefficient” calculation for each delivery courier.
- We want the fastest and nicest couriers to be rewarded a bit more, so we look at their history to calculate the “personality coefficient” used in pricing.
Customer Attrition
Reason
Therefore, we decided to move the “personality coefficient” calculation into a separate offline job and create a Pricing service.
- The calculation will run once every 24 hours for each courier.
- For fast lookups, we decided to store the results in Redis, used as a key/value caching data store.
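The split between the offline job and the Pricing service can be sketched as below. To keep the example runnable without a Redis server, it uses a small in-memory stand-in that mimics only the `set(key, value, ex=seconds)` and `get(key)` calls of a Redis client. The key format, the two-day expiry, and the fallback coefficient of 1.0 are assumptions.

```python
import time

DAY_S = 24 * 60 * 60


class InMemoryKV:
    """Stand-in for a Redis client, for illustration only."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ex=None):
        expires = self._clock() + ex if ex is not None else None
        self._data[key] = (value, expires)

    def get(self, key):
        value, expires = self._data.get(key, (None, None))
        if expires is not None and self._clock() >= expires:
            self._data.pop(key, None)  # entry outlived its TTL
            return None
        return value


def coefficient_key(courier_id):
    return f"courier:{courier_id}:personality"  # hypothetical key format


def run_offline_job(store, courier_ids, compute):
    """Daily batch: do the expensive calculation once per courier."""
    for courier_id in courier_ids:
        # Keep entries for two days (assumed) so one missed run is survivable.
        store.set(coefficient_key(courier_id), str(compute(courier_id)), ex=2 * DAY_S)


def quote_price(store, courier_id, base_price, default=1.0):
    """Pricing service: a cache lookup instead of a history scan."""
    cached = store.get(coefficient_key(courier_id))
    coefficient = float(cached) if cached is not None else default
    return round(base_price * coefficient, 2)
```

The price quote now costs one key lookup per courier, and the expensive history analysis never runs while a customer is waiting.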
Async Operations
Three Important Response Time Limits
- Up to 0.1 seconds: The user perceives no delay.
- Up to 1 second: The delay is slightly perceptible. The user feels a pause, and the site may feel sluggish.
- Up to 10 seconds: With an operation that takes 10 seconds or more to complete, you’ll lose the user’s attention (unless you give them feedback).
Long Courier Matching
- One of the reasons that people love Hermes Deliveries is our great user experience.
○ For example, we don’t require customer registration and collect only minimal information at the time of the order (progressive data collection).
- Sometimes it takes more than 10 seconds to find the best matching courier.
○ As a result, some of our customers leave even though we show a progress bar.
- How can we solve this?
Perceived Performance
Our Mission
- Requirement 1: Best courier matching
- Requirement 2: Do matching in less than 1 sec
- Rule: No trade-offs
Perceived Performance
Separation in Time Principle
- If a system or process must satisfy contradictory requirements, try to schedule its operation so that the conflicting requirements take effect at different times.
- Perceived performance refers to how quickly a software feature appears to perform its task.
Perceived Performance
Solution
- We can anticipate customer behavior based on history and similar customers and, knowing the customer’s location, match couriers in the background.
- We also know the “personality coefficient” that we calculated and cached, so we can calculate the price instantly.
- The moment customers click “order”, we already have everything needed to start the ordering process.
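The idea of doing the slow match before the customer asks for it can be sketched with a background thread pool. The class and method names are hypothetical, and `find_best_courier` stands for the existing slow matching function. The 1-second timeout mirrors Requirement 2.

```python
from concurrent.futures import ThreadPoolExecutor


class BackgroundMatcher:
    """Start courier matching early so the result is ready at order time."""

    def __init__(self, find_best_courier, max_workers=4):
        self._find = find_best_courier
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}

    def on_location_known(self, session_id, location):
        """Called as soon as we can guess where the delivery starts."""
        if session_id not in self._pending:
            self._pending[session_id] = self._executor.submit(self._find, location)

    def on_order_clicked(self, session_id, location, timeout_s=1.0):
        """Return the match; usually it was computed while the user typed."""
        future = self._pending.pop(session_id, None)
        if future is None:  # nothing was precomputed, so start now
            future = self._executor.submit(self._find, location)
        return future.result(timeout=timeout_s)
```

The matching still takes as long as before. It simply runs while the customer is filling in the form, which is the separation-in-time principle applied to perceived performance.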
Perceived Performance
Architecture
This is how we started
Money Movement
- So far we have been working with a 3rd-party payment processor, Plutos, which is responsible for charging customers and paying couriers:
1. During the ordering process we call the Plutos API, providing the customer’s credit card to charge and the Payee ID of the courier to pay.
2. All couriers are registered customers of Plutos and can withdraw money to one of their preferred payment channels.
3. All of that happens online.
Money Movement
Challenge
- Issues with the customer’s credit card:
○ wrong number, insufficient funds, etc.
- Plutos is not a reliable service, and we are losing money and customers because of that.
- What can we do?
Online vs. Offline or Sync vs. Async
- Online / Sync
1. Collect the customer’s credit card, then verify it and authorize funds by calling the Plutos API.
2. If Plutos fails, fall back to another (maybe more expensive) provider for resiliency.
3. Store the details for offline processing.
4. Release the customer.
- Offline / Async
5. Do the Money Movement using Plutos or another provider.
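The five steps above can be sketched as a synchronous part with provider fallback and an offline worker that drains a queue. All names here are hypothetical, including the `authorize` and `capture` callables, which stand for calls to Plutos or a backup provider. A production system would use a durable queue rather than an in-process one.

```python
import queue


class ProviderError(Exception):
    """Raised by a provider call that failed or timed out."""


def authorize_with_fallback(providers, card, amount):
    """Steps 1-2: try each (name, authorize) pair in order of preference."""
    for name, authorize in providers:
        try:
            return name, authorize(card, amount)
        except ProviderError:
            continue  # provider is down, try the next one
    raise ProviderError("all providers failed")


def place_order(jobs, providers, card, amount, courier_payee_id):
    """Online / sync part: authorize, record the job, release the customer."""
    provider, auth_id = authorize_with_fallback(providers, card, amount)
    jobs.put({                       # step 3: store details for later
        "provider": provider,
        "authorization_id": auth_id,
        "amount": amount,
        "payee_id": courier_payee_id,
    })
    return "order accepted"          # step 4: the customer is done


def process_money_movement(jobs, capture):
    """Offline / async part (step 5): move the money for each stored job."""
    processed = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return processed
        capture(job)
        processed += 1
```

The customer waits only for the authorization. A slow or failing money transfer is retried offline and no longer costs the order.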
Architecture
This is how we started
Wrap Up and Questions
- Horizontal Scaling
- Auto Scaling
- Functional Decomposition
- Offline Jobs
- Three Important Response Time Limits
- Perceived Performance
- Async vs. Sync
Security in Cloud
Shared Responsibility Principle
While AWS (or any Cloud vendor) manages security of the Cloud, security in the
Cloud is the responsibility of the customer
Blast Radius
How widespread is the threat or failure?
Goal: minimize blast radius
Tactics
- Multiple Cloud account security strategy
○ Accounts per organization, department, product, etc.
- Segmentation of network, data, storage etc.
- Access Control policies and least privilege principle
Tactics
- Data Classification and Handling strategy and standard
○ Data can be public, restricted, sensitive, secret etc.
○ For each type of data, have a proper standard for handling it at rest, in flight, on screen, etc.
- “Dance Like Nobody’s Watching. Encrypt Like Everyone Is.”
- Keys and Secret Management
○ Multiple Keys – preferably key per customer, user, record, column etc.
○ Creation, Rotation etc.
○ Secret Vault
Tactics
- Security reviews and a security mindset in teams
- Automations
○ Find secrets or sensitive data in GitHub, logs, data stores, and customer free-text entries such as “comments” or “descriptions”
○ S3 buckets open to the public
○ Policy violations
○ Application scans as a part of CI/CD, such as OWASP Top 10 risks and 3rd-party and open-source dependency scans
○ Docker image vulnerabilities
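The first automation in the list, finding secrets or sensitive data in text, can be sketched with a few patterns. This is a deliberately small illustration, not a replacement for a real scanner: the three patterns (AWS access key IDs, PEM private key headers, and card-like numbers that pass the Luhn check) are examples picked for the sketch.

```python
import re

AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
PRIVATE_KEY = re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")
CARD_LIKE = re.compile(r"\b\d{13,16}\b")


def luhn_ok(digits):
    """Luhn checksum, used to drop digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text):
    """Return (line_number, finding_kind) pairs for suspicious lines."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if AWS_KEY_ID.search(line):
            findings.append((number, "aws_access_key_id"))
        if PRIVATE_KEY.search(line):
            findings.append((number, "private_key"))
        if any(luhn_ok(m.group()) for m in CARD_LIKE.finditer(line)):
            findings.append((number, "card_number"))
    return findings
```

The same function can run over commit diffs, log lines, or free-text fields such as order comments, with findings feeding an alert or blocking a merge.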
Tactics
- Have a security incident response plan
- Don’t put all your eggs in one basket
- Don’t trust anything or anyone
Incident Management
Incidents are unavoidable
So you had better have a proper response plan and data
Incident Response Practices
- Have an on-call procedure and discipline
- Have a monitoring and prioritized alerting system
- Escalate and declare incidents early and often
- During an incident, identify the potential root cause and bring in the required experts when needed
- Assess customer impact and communicate
- “Stop the bleeding” first
- Preserve everything you might need for post-mortem root cause analysis, and track times
- Define next steps and conduct post-mortem root cause analysis
Q&A
Michael_Kalika@intuit.com
https://www.linkedin.com/in/michaelkalika


System Scalability

  • 1. Michael Kalika Chief Architect, Intuit Israel System Scalability April 22, 2020
  • 2. Intuit Confidential and Proprietary 2 Agenda Intro Use Case Fundamentals Scaling Databases Scaling API Async Operations Security in Cloud Incident Management Q&A
  • 4. 4 Hi, I’m Michael Software Dev Leadership HP 2004 - 2011 BI/Data Group Manager HP 2011 - 2015 R&D Architect Payoneer 2015 - 2017 Architect, DevOps and PD Lead Intuit 2017 - 2020 Chief Architect Intuit 2020 - Today
  • 5. 5
  • 6. Intuit Confidential and Proprietary 6 Where we startWhere we started Intuit’s journey began over 35 years ago when our founder Scott Cook sat at his kitchen table and watched his wife as she balanced their checkbook and thought there must be a better way.
  • 8. Intuit Confidential and Proprietary 8 Who we are Founded 9,000 Employees 50M Customers 1993 IPO ~$6.8B FY19 Revenue 20 Locations 1983
  • 9. Intuit Confidential and Proprietary 9 Intuit global locations 20 locations in 9 countries Brazil São Paulo Europe London, UK Paris, France Australia Melbourne Sydney India Bangalore Israel Tel Aviv United States California: Los Angeles Mountain View San Diego San Francisco Boise, ID Fredericksburg, VA Plano/Dallas, TX Reno, NV Tucson, AZ Washington, D.C. Canada Edmonton Mississauga/Toronto Updated November 2019 Mexico Mexico City
  • 10. Intuit Confidential and Proprietary 10 Recognized as one of the world’s leading companies 2004 - 2019 Most Admired: Computer Software 2002 - 2019 100 Best Companies to Work For 2019 Most Innovative Companies 2019 Companies Best Positioned For Breakout Growth
  • 11. Intuit Confidential and Proprietary 11 Recognized as one of the top companies to work for #15 IN THE BAY AREA #10 IN TECHNOLOGY #1 in Canada #24 in the US #4 in the UK #2 in India #3 in Australia #15 COMPANIES THAT CARE #14 in Israel
  • 12. Intuit Confidential and Proprietary 12 Be Bold Be Passionate Learn Fast Be Decisive Win Together Deliver Awesome Our values Integrity Without Compromise We Care and Give Back
  • 13. Intuit Confidential and Proprietary 13 Core capabilities: Our recipe to execute with excellence WHAT TO SOLVE HOW TO SOLVE CUSTOMER-DRIVEN INNOVATION (CDI) DESIGN FOR DELIGHT (D4D) An important, unsolved customer problem …that we, and those we enable, can solve well …and build durable competitive advantage SUCCESS IS HERE Deep customer empathy Go broad to go narrow Rapid experiments with customers DELIGHT
  • 15. Intuit Confidential and Proprietary 15 Hermes Deliveries - Assume we have a small startup company that offers low cost and fast deliveries by connecting people and delivery needs. - Initially when we start, our target is Rishon Le-Tzion and we have tens of daily customers and 2-3 couriers.
  • 16. Intuit Confidential and Proprietary 16 Hermes Deliveries - We save all customers, couriers, orders, places, and history in the same database, in a single DB instance. - Everything is new and small, so we have no caching, no automation, no auto scaling, no monitoring. - This is perfect as we hardly have 1 delivery order in 15 minutes.
  • 17. Intuit Confidential and Proprietary 17 COVID-19 time - But, with COVID-19 more and more people started using our service since we are the cheapest and fastest service. - Now, we have 20 orders in 30 minutes and the number increases… - We see how successful the service is and decided to offer our services in Tel Aviv.
  • 18. Intuit Confidential and Proprietary 18 Turtle Deliveries - At this point we realize that our system performs poorly. - The App works very slow and API latency increased. - We have database transaction deadlocks and whole system failures.
  • 20. Intuit Confidential and Proprietary 20 Scalability Concepts vs. Vertical Horizontal
  • 21. Intuit Confidential and Proprietary 21 Scalability Concepts
  • 22. Intuit Confidential and Proprietary 22 System Decomposition Distributed System
  • 23. Intuit Confidential and Proprietary 23 The Scale Cube Data partitioning Scale by splitting similar things [sharding] Example: Cell Architecture and/or Sharding. A cell is a self-contained installation that can satisfy all the operations for a shard. A shard is a subset of a much larger dataset, typically a range of users, for example. Horizontal duplication Scale by cloning Example: 18 Web servers under Load Balancer Functional decomposition Scale by splitting different things Example: Orders, Inventory, Customers
  • 25. Intuit Confidential and Proprietary 25 Query Optimization, Indexing and Connection Pool - We use RDBMS which is heavily normalized - Therefore, it was decided to 1. Introduce some redundant columns which frequently appear in WHERE and JOIN ON clauses (denormalization) - This will reduce join queries and break few big queries into smaller. 2. Introduce index to columns that frequently appear in WHERE clauses. 3. Use connection pool for optimization of the number of costly network connections.
  • 26. Intuit Confidential and Proprietary 26 Vertical Scaling or Scaling Up - All that helped improve application API latency by 30% which was good enough at this time. - We entered new areas. - It was also decided to upgrade RDS and add more storage for reducing future risks.
  • 27. Intuit Confidential and Proprietary 27 New Challenges - Everything is running great, we have more orders and delivery couriers, but facing new issues… again… 1. Database index grows and requires maintenance 2. Table scanning with index is slow. - Upgrade to a bigger RDS instance is costly and we are not yet profitable. - What is your next step?
  • 28. Intuit Confidential and Proprietary 28 Read Replicas - Bigger RDS is not able to handle all READ and WRITE requests. - In most cases we need consistency in WRITEs, but small delays on READs are fine. - Therefore, it was decided to create two READ replicas of a given source RDS instance, thereby increasing read throughput.
  • 29. Intuit Confidential and Proprietary 29 Read Replicas - That was good and we decided to go for new areas. - Now we see that the Primary instance is not able to handle all writes and there is latency. - We also have unacceptable lags between Primary and READ replicas. - What’s next?
  • 30. Intuit Confidential and Proprietary 30 Functional Decomposition - Our Locations table in database is getting high WRITE traffic - the R:W ratio is 3:8. - That table is used for location tracking and it has nothing to do with the rest of the functionality. - Why not decompose functionality? 1. Separate Locations table to a new dedicated database 2. Decouple location tracking functionality as a stand-alone Microservice?
  • 31. Intuit Confidential and Proprietary 31 Functional Decomposition - Done deal! It works! - Now we want to add the rest of the country and we must plan for a scale - What can we do?
  • 32. Intuit Confidential and Proprietary 32 Data Partitioning or Sharding Share Nothing Model - It was decided to Shard the database. - Sharding is a technique that splits data into smaller subsets and distributes them across a number of physically separated database servers. - Shards have no knowledge of each other. If one database shard has a hardware issue or goes through failover, no other shards are impacted The query to read or join data from multiple database shards must be specially engineered North Center South
  • 33. Intuit Confidential and Proprietary 33 Wrap Up and Questions - Scalability Cube and Concepts - Scaling Databases 1. Query Optimization, Indexing, Connection Pool 2. Vertical Scaling / Scaling Up 3. Read Replicas 4. Data Partitioning or Sharding - Functional Decomposition
  • 35. Intuit Confidential and Proprietary 35 Horizontal Scaling - We noticed that our Location API server is extremely loaded and sometimes crashes. - We decided to create three Location API servers in three AWS availability zones and put them behind load balancer.
  • 36. Intuit Confidential and Proprietary 36 Auto Scaling - But we still face occasional issues during certain hours and we want to be smart with costs. - Therefore, it was decided to use auto scaling. Minimum Size Scale out as needed Desired Capacity Maximum Size Application to be live Application to perform normal During peak season During peak load, scale out by - Minimum 3 servers - Desired 9 servers - Maximum 27 servers - 3 servers, every 10 minutes Example: We can also consider reserved instances for cost saving Very important to autoscale not only using HW utilization (CPU, memory, etc.), but also use system metrics like lag in your messaging broker for example.
  • 37. Intuit Confidential and Proprietary 37 Customer Attrition - So far everything was good, but we noticed some customer attrition. - We know that our pricing is personal, dynamic and cheapest in the market. - Customers come as usual, fill-in details, ask for price quote and then go away... - What could that be?
  • 38. Intuit Confidential and Proprietary 38 Customer Attrition Reason - Turns out that our unique and sophisticated pricing and matching algorithm has some expensive “personality coefficient” calculation per each delivery courier. - We want the fastest and nicest couriers to be rewarded a bit more and we look at the history for calculating “personality coefficient” used in pricing.
  • 39. Intuit Confidential and Proprietary 39 Customer Attrition Reason Therefore, we decided to decompose “personality coefficient” as a separate offline job and create Pricing service - The calculation will happen once in 24 hours per each courier - For fast performance we decided to use Redis as a key/value caching data store for saving of the results
  • 41. Intuit Confidential and Proprietary 41 Three Important Response Time Limits - Up to 0.1 seconds: The user doesn’t recognize any perceptible delay. - Up to 1 second: The delay is slightly perceptible. The user feels a pause, the site may feel sluggish. - Up to 10 seconds: With an operation that takes 10 seconds or more to complete, you’ll lose the user’s attention (unless you give them feedback).
  • 42. Long Courier Matching
    - One of the reasons that people love Hermes Deliveries is our great user experience.
      ○ For example, we don’t require customer registration and collect only the minimum information at the time of the order (progressive data collection).
    - Sometimes it takes more than 10 seconds to find the best matching courier.
      ○ As a result, some of our customers leave, even though we show a progress bar.
    - How can we solve this?
  • 43. Perceived Performance: Our Mission
    - Requirement 1: Best courier matching
    - Requirement 2: Do the matching in less than 1 second
    - Rule: No trade-offs
  • 44. Perceived Performance: Separation in Time Principle
    - If a system or process must satisfy contradictory requirements, try to schedule its operation so that the conflicting requirements take effect at different times.
    - Perceived performance refers to how quickly a software feature appears to perform its task.
  • 45. Perceived Performance: Solution
    - We can anticipate customer behavior based on history and on similar customers, and, knowing the customer’s location, we can match couriers in the background.
    - We also know the “personality coefficient” that we calculated and cached, so we can calculate the price instantly.
    - The moment customers click on “order”, we already have everything we need to start the ordering process.
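One way to picture the background matching is to start the slow search as soon as the location is known and only collect the result when the customer clicks “order”. The sketch below uses Python’s standard `concurrent.futures`; `find_best_courier`, `OrderSession` and the 0.2 second sleep are hypothetical stand-ins for the real matching logic.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def find_best_courier(location):
    """Hypothetical slow matching; the sleep stands in for the real search."""
    time.sleep(0.2)
    return f"courier-near-{location}"


class OrderSession:
    def __init__(self, executor, location):
        # Start matching as soon as the location is known,
        # before the customer clicks "order".
        self._match = executor.submit(find_best_courier, location)

    def place_order(self, timeout_s=1.0):
        # If the background work has already finished, this returns at once;
        # otherwise it waits up to timeout_s and raises TimeoutError after that.
        return self._match.result(timeout=timeout_s)
```

The matching still takes as long as before; it simply overlaps with the time the customer spends filling in the form, which is the separation-in-time idea applied to this flow.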
  • 46. Perceived Performance Architecture This is how we started
  • 47. Money Movement
    - So far we have been working with Plutos, a third-party payment processor that is responsible for charging customers and paying couriers:
      1. During the ordering process we call the Plutos API, providing the customer credit card to charge and the Payee ID of the courier to pay.
      2. All couriers are registered customers of Plutos and can withdraw money through their preferred payment channel.
      3. All of that happens online.
  • 48. Money Movement Challenge
    - Issues with customer credit cards:
      ○ wrong number, insufficient funds, etc.
    - Plutos is not a reliable service, and we are losing money and customers because of that.
    - What can we do?
  • 49. Online vs. Offline or Sync vs. Async
    - Online / Sync
      1. Collect the customer credit card, then verify it and authorize funds by calling the Plutos API.
      2. If Plutos fails, fall back to another (maybe more expensive) provider for resiliency.
      3. Store the details for offline processing.
      4. Release the customer.
    - Offline / Async
      5. Do the Money Movement using Plutos or another provider.
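The split between the online and offline steps can be sketched like this. Everything here is illustrative: the provider callables, the `ProviderError` exception and the in-process `deque` used as the store for pending work are assumptions, and a real system would use a durable queue or table and the payment provider’s actual API.

```python
from collections import deque


class ProviderError(Exception):
    """Raised by a (hypothetical) provider call that fails."""


def authorize(providers, card, amount):
    """Steps 1-2: try each provider in order and return the first success."""
    for name, authorize_fn in providers:
        try:
            return name, authorize_fn(card, amount)
        except ProviderError:
            continue  # fall back to the next provider
    raise ProviderError("all providers failed")


def checkout(providers, queue, card, amount, payee_id):
    """Online / sync part: authorize, store details, release the customer."""
    provider, auth_id = authorize(providers, card, amount)
    # Step 3: keep only what the offline job needs (no raw card data).
    queue.append({"provider": provider, "auth_id": auth_id,
                  "amount": amount, "payee_id": payee_id})
    return "order accepted"  # Step 4: the customer is released here


def settle_pending(queue, capture_fn):
    """Offline / async part (step 5): move the money for every stored order."""
    settled = []
    while queue:
        job = queue.popleft()
        capture_fn(job)
        settled.append(job["auth_id"])
    return settled
```

In this sketch the customer only waits for the authorization; a slow or failing money movement is retried later without anyone watching a spinner.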
  • 50. Architecture This is how we started
  • 51. Wrap Up and Questions
    - Horizontal Scaling
    - Auto Scaling
    - Functional Decomposition
    - Offline Jobs
    - Three Important Response Time Limits
    - Perceived Performance
    - Async vs. Sync
  • 53. Shared Responsibility Principle
    While AWS (or any cloud vendor) manages security of the cloud, security in the cloud is the responsibility of the customer.
  • 54. Blast Radius
    How widespread is the threat or failure?
    Goal: minimize the blast radius.
  • 55. Tactics
    - Multiple cloud account security strategy
      ○ Accounts per organization, department, product, etc.
    - Segmentation of network, data, storage, etc.
    - Access control policies and the least privilege principle
  • 56. Tactics
    - Data classification and handling strategy and standard
      ○ Data can be public, restricted, sensitive, secret, etc.
      ○ For each type of data, have a proper standard for how to handle it at rest, in flight, on screen, etc.
    - “Dance Like Nobody’s Watching. Encrypt Like Everyone Is.”
    - Key and secret management
      ○ Multiple keys, preferably a key per customer, user, record, column, etc.
      ○ Creation, rotation, etc.
      ○ Secret vault
  • 57. Tactics
    - Security reviews and a security mindset in teams
    - Automations
      ○ Find secrets or sensitive data in GitHub, logs, data stores, and customer free-text entries such as “comments” or “descriptions”
      ○ S3 buckets open to the public
      ○ Policy violations
      ○ Application scans as part of CI/CD, such as OWASP Top 10 risks and third-party and open source dependency scans
      ○ Docker image vulnerabilities
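As a small illustration of the “find secrets” automation, the sketch below scans free text line by line with two regular expressions: one for the publicly documented `AKIA...` shape of AWS access key IDs and one for PEM private key headers. The `PATTERNS` table and `scan_text` function are hypothetical, and a real scanner would cover many more secret types and reduce false positives.

```python
import re

# Illustrative subset of patterns; not an exhaustive list of secret formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings
```

The same function could be pointed at log lines, commit diffs or customer “comments” fields, since it only needs plain text as input.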
  • 58. Tactics
    - Have a security incident response plan
    - Don’t put all your eggs in one basket
    - Don’t trust anything or anyone
  • 60. Incidents are Unavoidable
    So you better have a proper response plan and data.
  • 61. Incident Response Practices
    - Have an on-call procedure and discipline
    - Have a monitoring system with prioritized alerting
    - Escalate and declare incidents early and often
    - During the incident, identify the potential root cause and bring in the required experts as needed
    - Assess customer impact and communicate
    - “Stop the bleeding” first
    - Preserve everything you might need for the post-mortem root cause analysis, and track times
    - Define next steps and conduct a post-mortem root cause analysis