RESILIENT SYSTEM DESIGN
June 2013
Risk & Compliance Engineering, PayPal
Pradeep Ballal
Staale Nerboe
Greg Berry
This deck contains generic architecture information and does not
reflect the exact details of current or planned systems.
PROBLEM DEFINITION AND SOLUTION
Problem
In a distributed, virtualized environment, system failures are inevitable.
Solution
Isolate functionality to enable independent implementation of appropriate availability
patterns and increase velocity/flexibility of fixes.
Use asynchronous reconciliation to resolve failures without affecting overall customer
experience.
2 Confidential and Proprietary
HIGH LEVEL ARCHITECTURE
[Diagram: Clients send Requests to a Service Container built on PPaaS, with Circuit Breakers on the client-facing side. Inside, Orchestration/Response Consolidation dispatches requests to Functional Components in a Component Container, and each Functional Component calls its own Dependencies, again behind Circuit Breakers.]
Functional Component (FC): An isolated set of functionality that can be developed, deployed and executed independently.
• Fits well into the Agile development methodology
• Fallback behavior is defined
Service Container (SC): Contains the infrastructure to orchestrate FCs, handle response consolidation, and initiate reconciliation during failure.
• Component-based model (e.g. OSGi), including support for hot deployment of FCs without service downtime
• Malfunctioning FCs surface quickly and can be handled dynamically through properties or real-time deployments
• Provides meaningful responses back to clients
SERVICE CONTAINER
Service Container (SC): Contains the infrastructure to orchestrate FCs, handle response consolidation, and initiate reconciliation during failure.
• Built on top of PayPal Platform as a Service (PPaaS)
• Component-based model (e.g. OSGi), including support for hot deployment of FCs without service downtime
• Enforces the concept of coarse-grained services
• Malfunctioning FCs surface quickly and can be handled dynamically through properties or real-time deployments
• Provides meaningful responses back to clients
• Non-intrusive to clients
Functional Component (FC): An isolated set of functionality that can be developed, deployed and executed independently.
• Fits well into the Agile development methodology
• FCs can fail independently
• Fallback behavior is defined
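The SC/FC split can be illustrated with a minimal Python sketch. It assumes each FC is a plain callable paired with its own fallback, and that disabling an FC through a property simply routes its share of the request to that fallback. `FunctionalComponent`, `ServiceContainer` and the two sample FCs are hypothetical names; a real container would build on a component model such as OSGi and add timeouts, isolation and circuit breakers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]


@dataclass
class FunctionalComponent:
    """Hypothetical FC: an isolated unit of functionality with its own fallback."""
    name: str
    run: Callable[[Request], Any]
    fallback: Callable[[Request], Any]
    enabled: bool = True  # flipped via a property to switch off a malfunctioning FC


@dataclass
class ServiceContainer:
    """Hypothetical SC: orchestrates FCs and consolidates their responses."""
    components: List[FunctionalComponent] = field(default_factory=list)

    def deploy(self, fc: FunctionalComponent) -> None:
        # Replace any FC with the same name (a stand-in for hot deploy).
        self.components = [c for c in self.components if c.name != fc.name]
        self.components.append(fc)

    def handle(self, request: Request) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        degraded: List[str] = []
        for fc in self.components:
            try:
                if not fc.enabled:
                    raise RuntimeError(f"{fc.name} disabled by property")
                results[fc.name] = fc.run(request)
            except Exception:
                # Disabled or failing FC: answer with its fallback instead of
                # failing the whole request.
                results[fc.name] = fc.fallback(request)
                degraded.append(fc.name)
        # Consolidated response: every FC answered, degraded ones are flagged.
        return {"results": results, "degraded": degraded}


def broken_velocity_check(request: Request) -> Any:
    raise RuntimeError("dependency down")  # simulated failing dependency


sc = ServiceContainer()
sc.deploy(FunctionalComponent("limits", lambda r: r["amount"] < 1000, lambda r: True))
sc.deploy(FunctionalComponent("velocity", broken_velocity_check, lambda r: "unknown"))
response = sc.handle({"amount": 50})
print(response)
# {'results': {'limits': True, 'velocity': 'unknown'}, 'degraded': ['velocity']}
```

Because each FC is wrapped separately, the failing `velocity` check degrades to its fallback while `limits` still answers, and the consolidated response tells the client which parts were degraded.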
FALLBACK
To create a resilient system, each Functional Component and Dependency SHOULD fail gracefully and have Fallback Behavior. This can be achieved by using a framework that enforces normalized (consistent) behavior across the platform.
PS: Fallback Behavior should not be an afterthought; it should be detailed in the design in conjunction with your business partners.
[Diagram: Clients send a Request to a Functional Component / Dependency guarded by Circuit Breakers (Local / Global) and Logging / Monitoring; the result is either Normal Behavior or, on FAILURE, Fallback Behavior.]
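One way a shared framework could normalize fallback handling is a single helper that every FC and Dependency call goes through: try the normal behavior, then each fallback layer in order, log every failure for monitoring, and finally hand the last error to a last-resort handler (for example, one that builds an error message for the client). This is a sketch under those assumptions; `with_fallbacks`, `live_score` and `cached_score` are hypothetical names.

```python
import logging
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
log = logging.getLogger("fallback")


def with_fallbacks(primary: Callable[[], T],
                   fallbacks: Sequence[Callable[[], T]],
                   last_resort: Callable[[Exception], T]) -> T:
    """Try the normal behavior, then each fallback layer in order.

    Every failure is logged so monitoring can see it; if all layers fail,
    last_resort receives the final exception (e.g. to build an error reply).
    """
    layers = [primary, *fallbacks]
    last_error: Exception = RuntimeError("no behavior attempted")
    for index, layer in enumerate(layers):
        try:
            return layer()
        except Exception as exc:
            last_error = exc
            log.warning("layer %d failed: %s", index, exc)
    return last_resort(last_error)


def live_score() -> float:
    raise TimeoutError("risk model timed out")  # simulated slow dependency


def cached_score() -> float:
    raise LookupError("no cached score")  # simulated cache miss


score = with_fallbacks(live_score, [cached_score, lambda: 0.5], lambda e: -1.0)
print(score)  # 0.5
```

Ordering the layers from most to least accurate (live score, cached score, static default) keeps degradation predictable, and because all callers share one helper, logging and the shape of the last-resort error stay consistent across the platform.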
CIRCUIT BREAKERS*
Circuit Breakers (CBs) serve these purposes:
• They protect clients from slow or broken FCs
• They protect services from demand in excess of capacity
• Most importantly, they protect the Business from malfunctioning code by tracking negative actions (such as declined payments) and shutting down the FC if abnormal behavior is found
*Concept first discussed in the excellent book Release It! by Michael Nygard.
Example open source implementation by Netflix: https://github.com/Netflix/Hystrix/wiki
CBs are named after their counterparts in the physical world (electrical circuit breakers).
Local CBs: Track the health of services.
Global CBs: Track negative behavior that impacts the Business or the health of the overall system.
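A local CB can be sketched as the classic closed/open/half-open state machine described in Release It! and implemented by libraries such as Hystrix. This minimal, single-threaded Python version is an illustration under stated assumptions, not Hystrix's API: the threshold, timeout and class names are hypothetical, and the injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without invoking the FC."""


class LocalCircuitBreaker:
    """Minimal closed/open/half-open breaker tracking one dependency's health."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0  # consecutive failures since the last success
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; use fallback behavior")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

When `call` raises `CircuitOpenError`, the caller should go straight to the FC's fallback behavior instead of waiting on a dependency already known to be unhealthy. A production breaker would also need thread safety and usually a failure-rate threshold over a rolling window rather than a consecutive-failure count.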
[Diagram: Two Service Containers, each with Orchestration/Response Consolidation, Functional Components, Dependencies and local Circuit Breakers, alongside Circuit Breakers (Global) and a Circuit Breakers Config.]
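A global CB tracks business outcomes rather than technical errors. The sketch below assumes the negative action is a declined payment and that an FC should be shut down when its decline ratio over a rolling time window exceeds a limit; the window length, ratio, minimum sample size and class name are all hypothetical. It keeps state in process memory for clarity, whereas a breaker shared across Service Containers would keep the window in fast shared storage.

```python
from collections import deque
from typing import Deque, Tuple


class GlobalDeclineBreaker:
    """Hypothetical global CB: trips an FC whose decline rate looks abnormal."""

    def __init__(self, window_seconds: float = 300.0, max_decline_ratio: float = 0.2,
                 min_samples: int = 100) -> None:
        self.window_seconds = window_seconds
        self.max_decline_ratio = max_decline_ratio
        self.min_samples = min_samples
        self.events: Deque[Tuple[float, bool]] = deque()  # (timestamp, declined)
        self.declines = 0
        self.tripped = False

    def _expire(self, now: float) -> None:
        # Drop outcomes that have fallen out of the rolling window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            _, declined = self.events.popleft()
            self.declines -= int(declined)

    def record(self, now: float, declined: bool) -> None:
        self.events.append((now, declined))
        self.declines += int(declined)
        self._expire(now)
        if len(self.events) >= self.min_samples:
            if self.declines / len(self.events) > self.max_decline_ratio:
                self.tripped = True  # shut down the FC until explicitly reset

    def allows(self) -> bool:
        return not self.tripped

    def reset(self) -> None:
        # Re-enable the FC and start from a fresh window.
        self.tripped = False
        self.events.clear()
        self.declines = 0
```

The `min_samples` guard keeps a handful of legitimate declines at low traffic from tripping the breaker, and `reset` clears the window so a re-enabled FC starts from a clean history.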
DATA ACCESS – NEED MORE
Globally Distributed
• You can’t have a single system of record that contains all data
• Latency matters (you can’t go faster than the speed of light)
• There must be a way to partition data and processing
Always Available
• Everything needs to be redundant (or dispensable)
• There can be no single point of failure
Shares Nothing
• Systems must be able to run completely independently
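Partitioning data and processing can start as simply as routing every key to a fixed partition with a stable hash. In this sketch the partition names and function names are hypothetical; it uses `hashlib` rather than the built-in `hash()`, because Python randomizes string hashes per process and every node must agree on where a key lives.

```python
import hashlib
from typing import Dict, List


def partition_for(key: str, partitions: List[str]) -> str:
    """Map a key (e.g. an account id) to one partition with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(partitions)
    return partitions[index]


def group_by_partition(keys: List[str], partitions: List[str]) -> Dict[str, List[str]]:
    """Route a batch of keys so each partition processes only its own data."""
    groups: Dict[str, List[str]] = {p: [] for p in partitions}
    for key in keys:
        groups[partition_for(key, partitions)].append(key)
    return groups
```

Plain modulo placement moves most keys when a partition is added or removed; consistent hashing is the usual refinement when the partition set changes often.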
[Diagram: Clients use a Read Service backed by Read Replicas and a Life Cycle (CRUD) Service backed by the SoR; a Journal, Replay and a Latency Bridge connect the write side to the read side.]
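The slide does not explain how the Journal, Replay and Latency Bridge interact, so the following is only one plausible reading, stated as an assumption: the Life Cycle (CRUD) Service writes to the SoR and appends each change to an ordered journal, read replicas catch up by replaying that journal idempotently, and the Read Service bridges the replication lag by checking journal entries the replica has not yet applied. All class names are hypothetical and everything is in memory.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, str, str]  # (sequence number, key, value)


class LifeCycleService:
    """Writes go to the system of record (SoR) and are journaled in order."""

    def __init__(self) -> None:
        self.sor: Dict[str, str] = {}
        self.journal: List[Entry] = []

    def put(self, key: str, value: str) -> int:
        self.sor[key] = value
        seq = len(self.journal) + 1
        self.journal.append((seq, key, value))
        return seq


class ReadReplica:
    """Eventually consistent copy that catches up by replaying the journal."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.applied_seq = 0

    def replay(self, journal: List[Entry]) -> None:
        for seq, key, value in journal:
            if seq > self.applied_seq:  # idempotent: skip entries already applied
                self.data[key] = value
                self.applied_seq = seq


class ReadService:
    """Serves reads from the replica, bridging the replication lag."""

    def __init__(self, writer: LifeCycleService, replica: ReadReplica) -> None:
        self.writer = writer
        self.replica = replica

    def get(self, key: str) -> Optional[str]:
        # Latency bridge: newest journaled write the replica has not applied yet.
        for seq, k, value in reversed(self.writer.journal):
            if seq <= self.replica.applied_seq:
                break
            if k == key:
                return value
        return self.replica.data.get(key)
```

Tracking the last applied sequence number makes replay safe to repeat, which matters when a replica restarts or the same journal segment is delivered twice.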
EVENTUALLY CONSISTENT*
CAP theorem: States that of three properties of distributed data systems (data consistency, system availability, and tolerance to network partitions), only two can be guaranteed at any given time.
To account for this, a reconciliation system is required to identify inconsistencies and try to correct them automatically. Only as a last resort should a Manual Review be conducted.
Design considerations:
• Limited DB table scanning: The system should not rely on heavy DB table scans or heavy queries. If such processing is required, it SHOULD be done in a data warehouse (DW) or on a Hadoop cluster, with the results fed back into the real-time system.
• Non-intrusive: The reconciliation system listens only to events from other systems and SHOULD NOT touch code in other parts of the system (and hence does not need to get on their road maps).
Types of reconciliation:
• Stateless: Depends only on the data in the request.
• Stateful: Depends on the business processes and their states at the time the failure occurred. Hence, when the system failed may affect the outcome of the reconciliation.
*See the excellent paper “Eventually Consistent” by Werner Vogels, CTO of Amazon.
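The two reconciliation types can be sketched as a pipeline that tries a stateless rule first, falls back to a stateful lookup of the business-process state at the time of failure, and only then sends the item to Manual Review. The event shape, the amount threshold and the state names (`captured`, `cancelled`) are hypothetical placeholders for real business rules, and `state_at` stands in for whatever keyed state lookup the system provides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

StateLookup = Callable[[str, float], Optional[str]]


@dataclass
class FailureEvent:
    """Hypothetical event emitted when an FC failed and a fallback was used."""
    request_id: str
    request: Dict[str, object]
    failed_at: float


def reconcile_stateless(event: FailureEvent) -> Optional[str]:
    # Stateless: decide using only the data carried in the request.
    amount = event.request.get("amount")
    if isinstance(amount, (int, float)) and amount <= 100:  # hypothetical rule
        return "auto-approved"
    return None  # cannot decide from request data alone


def reconcile_stateful(event: FailureEvent, state_at: StateLookup) -> Optional[str]:
    # Stateful: the process state at the time of failure drives the outcome.
    state = state_at(event.request_id, event.failed_at)
    if state == "captured":
        return "confirmed"
    if state == "cancelled":
        return "reversed"
    return None


def reconcile(events: List[FailureEvent], state_at: StateLookup) -> Dict[str, List[str]]:
    resolved: List[str] = []
    manual_review: List[str] = []
    for event in events:
        outcome = reconcile_stateless(event) or reconcile_stateful(event, state_at)
        if outcome is None:
            manual_review.append(event.request_id)  # last resort
        else:
            resolved.append(f"{event.request_id}:{outcome}")
    return {"resolved": resolved, "manual_review": manual_review}
```

Trying the cheap stateless rule first keeps state lookups (and load on the SoR) to the events that actually need them.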
DETAILED DESIGN
[Diagram: Clients send Requests to Service Containers; each container's Orchestration/Response Consolidation invokes Functional Components and their Dependencies behind local Circuit Breakers, with Circuit Breakers (Global) and a Circuit Breakers Config. Events flow through a Queue to Reconciliation & Actions, which reconciles against the SoR and produces Reports (Manual).]
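On the Service Container side, this design amounts to answering the client from the fallback and handing the failure to reconciliation asynchronously through a queue. This Python sketch assumes an in-process `queue.Queue` and a caller-supplied `reconcile` function; `call_fc`, `drain` and the event fields are hypothetical. In a real deployment the queue would be a durable message broker and the drain loop would run in a separate reconciliation service.

```python
import queue
import time
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def call_fc(name: str,
            run: Callable[[Dict[str, Any]], Any],
            fallback: Callable[[Dict[str, Any]], Any],
            request: Dict[str, Any],
            events: "queue.Queue[Event]") -> Any:
    """Invoke one FC; on failure, answer with the fallback and queue a
    reconciliation event instead of blocking the client."""
    try:
        return run(request)
    except Exception as exc:
        events.put({
            "fc": name,
            "request": request,
            "error": type(exc).__name__,
            "failed_at": time.time(),
        })
        return fallback(request)


def drain(events: "queue.Queue[Event]",
          reconcile: Callable[[Event], bool]) -> List[Event]:
    """Asynchronous side: reconcile queued events; anything unresolved goes
    into the manual report."""
    manual_report: List[Event] = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        if not reconcile(event):
            manual_report.append(event)
        events.task_done()
    return manual_report
```

Keeping the queue between the two sides is what makes reconciliation non-intrusive: the container only publishes events, and anything reconciliation cannot resolve lands in the manual report rather than in the client's response path.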
WE ARE HIRING
If you are interested in helping us solve
these problems, you can contact us at:
dwilfred@paypal.com
http://www.ebaycareers.com

Editor's Notes

  • #2: Mr. Pradeep Ballal works as a Senior Architect in the Core Service Product Development organization at PayPal Singapore, with a specific focus on Compliance and Risk products. Mr. Ballal is a software generalist with 13 years of technology experience and has a special interest in decision management, business rules, enterprise software and architectures. Mr. Staale Nerboe (snerboe@paypal.com) works as a Senior Architect in the Core Service Product Development organization with PayPal Singapore. Mr. Nerboe has 15+ years of Technology Consulting and Software Architecture experience for large global companies worldwide. Mr. Greg Berry (gberry@paypal.com) works as a Principal Architect at PayPal in the Core Services organization. Greg has been an architect in the payments industry for more than 15 years.
  • #6: In a complex system you will see multiple levels of fallback behavior, like the layers of an onion. A fallback behavior can itself have a fallback; for example, as a last resort, simply log the failure and return an error message to the client.
  • #7: CBs can be implemented in various ways, including Complex Event Processing (CEP), a database, a global cache, or any other fast storage medium. The implementation needs to support fast reads and writes, but must also handle rolling windows, such as the last 5 minutes, 1 hour or 24 hours. This gets complex in an environment where the volume of service invocations is high (e.g. with large number of invocations or