PayPal Risk Platform
High Performance Practice
Ling ZhiJun (Brian Ling)
2017 Software Architecture Summit
AGENDA
PayPal & PayPal Risk (Platform)
Risk DAL Service Challenge
Async Solution
Async Future Plan
PayPal Overview
• TPV/day: ~1 billion
• Payments/year: 6.1 billion
• Computation/day: ~20 billion
• Active customer accounts: 210M
• Petabytes of data: 105
• Queries/day: 250 billion
• Loss rate: 0.32%
PayPal operates one of the largest online payment platforms in the world.
The power of our platform
Our technology transformation enables us to:
• Process payments at tremendous scale (200+ countries & 25 currencies supported)
• Accelerate the innovation of new products
• Engage world-class developers & technologists
PayPal Risk KPI
• TPV: $354 billion
• Payment transactions/year: 6.1 billion
• Payments/second at peak: 1.8B
• Active customer accounts: 210M
• Petabytes of data: 73
• Database calls/quarter: 4.5T
• Loss rate: 0.32%
Requirement for Risk Platform
• Accuracy vs. Latency
• Low Latency + Hardware Investment vs. Large Throughput
PayPal Risk Platform Architecture
[Diagram, shown on two consecutive slides]
Online
• Gateway Service
• Decision Service
• Model + Variable Computation Service
• Variable Rollup Service (labeled Variable Aggregation Service on the second slide)
• DAL Service
• Real-time Compute Data
• Offline Generated Data
• Read Path / Write Path
Logging System/ETL
Offline
• Offline Generated Data
• Simulated Real-time Data
• Offline Variable Simulation Platform
• Model Training Platform
• Offline Variable Aggregation Service
DAL Service Ultimate Questions
Can a JVM-based DAL service deliver both high performance and high ATB (Availability-To-Business)?
• <100 ms P99.99 latency?
• 20k-30k peak TPS for a single instance?
• 99.99% Availability-To-Business?
DAL Service Technical Challenges
Budget Cost (Value)
• Hardware investment increases exponentially as traffic grows
Performance Issue (Tech)
• P99 latency differs significantly from average latency
• Too many latency spikes under traffic
• Storage cluster unavailability impacts latency
Customer Requirement (Req)
• Adopt new use cases
• Access behavior differs per colo
• Flexibility for fast-evolving use cases
• Replication
• Traffic strategy
Operational Cost (Cost)
• Too many clients with multiple versions to maintain
• Too-frequent releases tied to business cases
• Standby storage cluster switch-over
Async Original Benefit
• More Efficient Thread Scheduling
• Non-blocking Call
• Event-Driven Callback
• Fewer Context Switches
• Fault Isolation
Reactor Pattern Threading Model
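The reactor threading model named on this slide can be sketched with the JDK's own NIO selector. This is a minimal, single-threaded illustration and not the deck's implementation: the class name `MiniReactor`, the `handled:` prefix, and the use of a `Pipe` in place of a real client connection are all assumptions made so the example runs without a network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical single-threaded reactor: one selector loop dispatches ready events to handlers. */
final class MiniReactor {

    /** Sends each message (non-empty, under 256 bytes) through a pipe and returns what the handler saw. */
    static List<String> run(List<String> messages) {
        List<String> handled = new ArrayList<>();
        try {
            Pipe pipe = Pipe.open(); // stands in for a client connection (assumption)
            try (Selector selector = Selector.open();
                 Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);
                // The handler travels with the registration as the key's attachment.
                Consumer<String> handler = payload -> handled.add("handled:" + payload);
                source.register(selector, SelectionKey.OP_READ, handler);

                ByteBuffer in = ByteBuffer.allocate(256);
                for (String message : messages) {
                    sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                    selector.select(); // blocks until a registered channel is ready
                    Iterator<SelectionKey> ready = selector.selectedKeys().iterator();
                    while (ready.hasNext()) {
                        SelectionKey key = ready.next();
                        ready.remove();
                        if (key.isReadable()) {
                            dispatch(key, in);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handled;
    }

    /** Reads the ready channel and hands the payload to the handler attached to its key. */
    @SuppressWarnings("unchecked")
    private static void dispatch(SelectionKey key, ByteBuffer in) throws IOException {
        in.clear();
        ((Pipe.SourceChannel) key.channel()).read(in);
        in.flip();
        String payload = StandardCharsets.UTF_8.decode(in).toString();
        ((Consumer<String>) key.attachment()).accept(payload);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("read:k1", "update:k2")));
    }
}
```

The loop itself never knows what a handler does; it only waits for readiness and dispatches. A production reactor would typically split accept and I/O work across boss and worker event loops, as Netty does.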
Async DAL Service KPI Comparison
• Low Latency
• ~10-35% Reduction (Average/P99)
[Chart: Latency vs. Throughput (4-core VM). E2E Client-Service-Aerospike benchmark, 50% read / 50% write. X axis: throughput (requests per second), 2,000 to 17,000. Y axis: latency (microseconds), 0 to 120,000. Series: average, 99th, 99.9th, and 99.99th percentile latency, each for read and update.]
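To make the gap between average and tail latency concrete, here is a small sketch of how such percentiles can be computed from raw samples. It assumes the nearest-rank definition of a percentile; the deck does not say which definition its benchmark used, and the class name `LatencyStats` and the synthetic samples are made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper: average and nearest-rank percentiles over latency samples in microseconds. */
final class LatencyStats {

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] samplesMicros, double p) {
        if (samplesMicros.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMicros.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double average(long[] samplesMicros) {
        return Arrays.stream(samplesMicros).average()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    /** Synthetic data: 99 fast requests (1,000 us) and one slow outlier (50,000 us). */
    static long[] syntheticSamples() {
        long[] samples = new long[100];
        Arrays.fill(samples, 1_000L);
        samples[99] = 50_000L;
        return samples;
    }

    public static void main(String[] args) {
        long[] samples = syntheticSamples();
        System.out.println("avg=" + average(samples)
                + " p99=" + percentile(samples, 99.0)
                + " p99.9=" + percentile(samples, 99.9));
    }
}
```

With a single slow request in a hundred, the average moves only modestly while the 99.9th percentile jumps to the outlier, which is why the benchmark tracks P99 through P99.99 separately from the average.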
Async DAL Service KPI Comparison – Cont.
• High Throughput
• 3-10X Increase (Single Instance Comparison)
Async DAL Service KPI Comparison – Cont.
• Lower CPU Usage
• 50% reduction in CPU usage
• 66%+ reduction in context switches & system interrupts
Async DAL Service KPI Comparison – Cont.
• Smaller Thread Pools
• 90% reduction in the number of pooled threads
Thread Number Comparison (Async vs. Sync)
• Server RPC threads: 9 vs. 200
• Operation threads: 0 vs. 14
• Replication threads: 0 vs. 40
• Management threads: 2 vs. 2
Async DAL Service KPI Comparison – Cont.
• Memory Friendly
• 20% Reduction for Memory Allocation
• 100+MB Young Generation after Young GC
• 130+MB Pooled Off-heap
[Chart: GC Time / Total Time, Sync vs. Async. Y axis: 0.00% to 0.07%.]
[Chart: GC Count, Sync vs. Async. Y axis: 0 to 350.]
We Have ONE Async Dream
• Change the application's character from CPU-bound to IO-bound
• Traffic throughput grows (non-)linearly with CPU usage
• While guaranteeing low latency, handle 20-30K TPS with a 500 MB JVM heap (after young GC)
• Cloud-Friendly Application
• Less Hardware Investment
• Low Operational Cost
• Easy Capacity Estimation
High Performance Design
E2E Async
• Non-blocking pipeline: Async RPC + Async DataAccess
Less is More
• Shared thread pool over separate thread pools
• Inline execution over execution across multiple thread pools
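The "inline execution over execution across multiple thread pools" point can be illustrated with `CompletableFuture`. This is a sketch under assumptions: the pool names `shared-io` and `separate-pool` and the class `InlineExecution` are hypothetical, and the deck does not say that its pipeline is built on `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo: where a continuation runs when it is inlined vs. handed to another pool. */
final class InlineExecution {

    /** Returns {thread that ran the inline stage, thread that ran the handed-off stage}. */
    static String[] demo() {
        ExecutorService ioPool = Executors.newSingleThreadExecutor(r -> new Thread(r, "shared-io"));
        ExecutorService otherPool = Executors.newSingleThreadExecutor(r -> new Thread(r, "separate-pool"));
        try {
            CompletableFuture<String> response = new CompletableFuture<>();
            // Inline: registered before completion, so it runs on the thread that completes the future.
            CompletableFuture<String> inline =
                    response.thenApply(value -> Thread.currentThread().getName());
            // Handed off: always scheduled on the other pool, which costs an extra thread hop.
            CompletableFuture<String> handedOff =
                    response.thenApplyAsync(value -> Thread.currentThread().getName(), otherPool);
            ioPool.execute(() -> response.complete("value"));
            return new String[] { inline.join(), handedOff.join() };
        } finally {
            ioPool.shutdown();
            otherPool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] threads = demo();
        System.out.println("inline ran on " + threads[0] + ", hand-off ran on " + threads[1]);
    }
}
```

Every hand-off to another pool adds a queue operation and, usually, a context switch, so keeping short continuations inline on the thread that already holds the data is one way to cut scheduling overhead. The trade-off is that an inline stage must never block.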
Autonomous Memory Management
• Use off-heap memory as much as possible (inbound/outbound & [de]serialization)
• Release inbound memory at an earlier stage (submitRequest)
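A minimal sketch of pooled off-heap buffers with early release, using only the JDK's direct `ByteBuffer`. The class `DirectBufferPool` and the body of `submitRequest` are assumptions made for illustration; the slide names the `submitRequest` stage but does not show its code, and a Netty-based service would more likely rely on Netty's own pooled allocator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

/** Hypothetical pool of direct (off-heap) buffers that are reused instead of reallocated. */
final class DirectBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocated;

    DirectBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Hands out a cleared buffer, allocating off-heap memory only when the pool is empty. */
    synchronized ByteBuffer acquire() {
        ByteBuffer buffer = free.poll();
        if (buffer == null) {
            buffer = ByteBuffer.allocateDirect(bufferSize);
            allocated++;
        }
        buffer.clear();
        return buffer;
    }

    synchronized void release(ByteBuffer buffer) {
        free.push(buffer);
    }

    synchronized int allocatedCount() {
        return allocated;
    }

    /**
     * Illustrative request path: the inbound buffer goes back to the pool as soon as the
     * request has been decoded and submitted, not when the response is finally written.
     * The payload must fit in one buffer, otherwise put() throws BufferOverflowException.
     */
    static String submitRequest(DirectBufferPool pool, String payload) {
        ByteBuffer inbound = pool.acquire();
        String decoded;
        try {
            inbound.put(payload.getBytes(StandardCharsets.UTF_8));
            inbound.flip();
            decoded = StandardCharsets.UTF_8.decode(inbound).toString();
        } finally {
            pool.release(inbound); // early release: nothing downstream holds the inbound buffer
        }
        return "submitted:" + decoded;
    }

    public static void main(String[] args) {
        DirectBufferPool pool = new DirectBufferPool(64);
        System.out.println(submitRequest(pool, "get:k1"));
        System.out.println(submitRequest(pool, "get:k2"));
        System.out.println("direct buffers allocated: " + pool.allocatedCount());
    }
}
```

Because the second request reuses the buffer released by the first, the pool never grows past one buffer in this run, and none of that memory is visible to the garbage collector as heap churn.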
High Performance Good Practice
KPI Sign-Off
• Performance test on the critical path for each commit
• [Mandatory] Continuous performance test for each commit
Inbound/Outbound Management
• Batch consolidation
• Order management
• Timeout management
• Retries happen only on the client side
Programming Habit
• Fail fast rather than letting thrown exceptions cascade
• Logging & monitoring matter
• Thread-safe write operations in the control plane; exception-safe read operations in the data plane
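The timeout-management and client-side-only retry practices above might look like the following sketch. It assumes Java 9 or later (`orTimeout`, `failedFuture`); the class `TimeoutAndRetry` and both method names are hypothetical, not part of any PayPal API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helpers: the service bounds its wait and fails fast; only the client retries. */
final class TimeoutAndRetry {

    /** Service side: fail the call with a TimeoutException instead of waiting indefinitely. */
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> call, long millis) {
        return call.orTimeout(millis, TimeUnit.MILLISECONDS);
    }

    /** Client side: re-issue the call on failure, up to attemptsLeft attempts in total. */
    static <T> CompletableFuture<T> callWithRetry(Supplier<CompletableFuture<T>> rpc, int attemptsLeft) {
        return rpc.get()
                .<CompletableFuture<T>>handle((value, error) -> {
                    if (error == null) {
                        return CompletableFuture.completedFuture(value);
                    }
                    if (attemptsLeft > 1) {
                        return callWithRetry(rpc, attemptsLeft - 1);
                    }
                    return CompletableFuture.failedFuture(error);
                })
                .thenCompose(future -> future);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated RPC that fails twice and then succeeds.
        String result = TimeoutAndRetry.<String>callWithRetry(
                () -> calls.incrementAndGet() < 3
                        ? CompletableFuture.<String>failedFuture(new IllegalStateException("transient"))
                        : CompletableFuture.<String>completedFuture("ok"),
                3).join();
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

Keeping retries out of the service means a request buffer never has to be held for a possible second attempt, and a struggling storage cluster is not hit with amplified traffic from inside the service.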
Async High Level Architecture
[Diagram: Real Time Data Service]
Client
• Data Set Clients (Data Set 1 Client ... Data Set N Client) with Data Set Schema and biz logic
• Data Access API, Metadata API, Generic Configuration API
• KV Store API
• HTTP(s) RPC Client
Server
• HTTP(s) RPC Server
• KV Store API
• Generic logic, schema-less read
KV Store (Store/Cache)
• Metadata namespace
• Data set namespace
• Configuration namespace
Access modes
• Direct access
• Service access
Async DAL Service Hierarchy
Async Data Access Maturity
• Client & Server RoR Identification
• Biz-schema aware on client side
• Schema-less on server side
DAL Service Feature
• Traffic Sharding & Routing
• Active-Active/Active-Standby
• Auto-Failover
• Multi-Tenancy
• ACL
• Direct/Service-To-Service Replication
• …
Metadata Driven
• Source-of-Truth for Online Guideline & Offline Inventory
• Centralized Configuration
• Zero Restart/Auto-Refresh
Data Access Mapping
• DataSet => KV Mapping
• Logical => Physical DataSet Mapping
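The two mappings, combined with the zero-restart refresh of centralized configuration, can be sketched as a small router. Everything here is assumed for illustration: the class `DataSetRouter`, the `cluster/namespace` string format, and the `dataSet:recordId` key layout are not taken from the deck.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical metadata-driven router: logical data set -> physical data set -> KV key. */
final class DataSetRouter {
    // Logical data set name -> physical location ("cluster/namespace"); swapped atomically on refresh.
    private final AtomicReference<Map<String, String>> logicalToPhysical;

    DataSetRouter(Map<String, String> initialMetadata) {
        this.logicalToPhysical = new AtomicReference<>(Map.copyOf(initialMetadata));
    }

    /** Applies new metadata without a restart; in-flight lookups keep the snapshot they started with. */
    void refresh(Map<String, String> latestMetadata) {
        logicalToPhysical.set(Map.copyOf(latestMetadata));
    }

    /** DataSet => KV mapping: the KV key is derived from the data set name and the record id. */
    static String kvKey(String logicalDataSet, String recordId) {
        return logicalDataSet + ":" + recordId;
    }

    /** Logical => physical mapping: resolves where the record lives right now. */
    String resolve(String logicalDataSet, String recordId) {
        String physical = logicalToPhysical.get().get(logicalDataSet);
        if (physical == null) {
            throw new IllegalArgumentException("unknown data set: " + logicalDataSet);
        }
        return physical + "/" + kvKey(logicalDataSet, recordId);
    }

    public static void main(String[] args) {
        DataSetRouter router = new DataSetRouter(Map.of("account_profile", "cluster-a/ns1"));
        System.out.println(router.resolve("account_profile", "42"));
        router.refresh(Map.of("account_profile", "cluster-b/ns1")); // e.g. a storage switch-over
        System.out.println(router.resolve("account_profile", "42"));
    }
}
```

Because callers only ever name the logical data set, moving it to another cluster is a metadata change rather than a client release.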
Async RPC Control Plane Abstraction
Async RPC Maturity
High Flexibility Configuration
• Configurable execution chain per URL
  • Customize protobuf/JSON encoder
  • Inject monitoring module
• Execution resource configuration
  • Thread pool size / Netty option (tcp_nodelay)
  • Sharable or not
RPC Resource Management
• Service listener registry
• Server container life cycle management
  • Graceful shutdown
  • Partial shutdown for a given container
• Auto-rebuild of the RPC client channel
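A configurable execution chain per URL can be sketched as a registry of ordered stages. This is an illustration under assumptions, not the deck's framework: the class `ExecutionChains`, the string-typed request, and the stage labels are hypothetical, and a real chain would carry decoded messages rather than strings.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical registry: each URL gets its own ordered chain of processing stages. */
final class ExecutionChains {
    private final Map<String, List<UnaryOperator<String>>> chainsByUrl = new ConcurrentHashMap<>();

    /** Registers (or replaces) the chain for a URL; this is configuration, not request-path code. */
    void register(String url, List<UnaryOperator<String>> stages) {
        chainsByUrl.put(url, List.copyOf(stages));
    }

    /** Runs the request through the URL's stages in order; unknown URLs fail fast. */
    CompletableFuture<String> dispatch(String url, String request) {
        List<UnaryOperator<String>> stages = chainsByUrl.get(url);
        if (stages == null) {
            return CompletableFuture.failedFuture(
                    new IllegalArgumentException("no chain for " + url));
        }
        CompletableFuture<String> result = CompletableFuture.completedFuture(request);
        for (UnaryOperator<String> stage : stages) {
            result = result.thenApply(stage);
        }
        return result;
    }

    public static void main(String[] args) {
        ExecutionChains chains = new ExecutionChains();
        chains.register("/read", List.<UnaryOperator<String>>of(
                request -> "decoded(" + request + ")",      // e.g. a protobuf or JSON decoder
                request -> "monitored(" + request + ")"));  // e.g. an injected monitoring module
        System.out.println(chains.dispatch("/read", "req").join());
    }
}
```

Adding a monitoring stage or swapping an encoder then means changing the registered list for one URL, which fits the "zero code change" goal for onboarding new cases.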
Async RPC Embrace Async DataAccess
Async Core Value
High Performance
• Low latency + high throughput
• Low system load
• SLA isolation
• Better understanding of performance contributions
Easy Adoption
• Zero code change + zero release (onboarding a new case)
• Minimize new DB storage integration effort
• Lego-style customization
• Highly reusable functionality
Cost Saving
• Less hardware investment
• Loose constraints on hardware/VM SKU
High Flexibility Configuration
• Execution chain per URL (RPC)
• DataAccess storage & options [consistency & TTL]
• Traffic routing strategy
• Replication strategy
Async Family
• Async Data Access: In-Memory, Aerospike, HBase
• RPC (Server/Client): Netty
• Workflow
• Messaging (pub-sub): Kafka, ActiveMQ
Future Plan
Continuous Performance Tuning Deep Dive
• Shared event loop
• Netty option (IO ratio)
• NIO vs. epoll SocketChannel
• JDK SSL vs. OpenSSL
• Protobuf vs. Msgpack
• Sync client vs. async client
• With/without monitoring/replication features
Async DataAccess
• Compute operation support
• DB server-side UDF adoption
• Smart client for direct & service access
• Async HBase integration
Async RPC
• Finer-granularity monitoring & throttling
• Error handling injection
• Client-side multiplexing
• Server pushes partial responses + RPC client consolidates the response
Async + Sync Hybrid Workflow Execution
Open Source in Year 2019

Editor's Notes

  • #8: DAL Service: controls the connection pool; centralized control and high reusability (easy storage migration, non-backward-compatible migration, throttling, and ACL control) => minimizes client upgrade and integration effort; seamless storage switch and upgrade.
  • #9: Controls the connection pool; centralized control and high reusability (easy storage migration, non-backward-compatible migration, throttling, and ACL control); minimizes client upgrade and integration effort.
  • #11: GC issues; lock contention (non-blocking); thread switches and context switches; IO blocking; cache line refresh/cache misses; IPC (instructions per cycle).
  • #12: Use cases: TTL/timeout, ACL, replication, traffic strategy.
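  • #14: Leverage OS-supported event-driven notification: Windows IOCP, Linux epoll, and macOS kqueue. Spend CPU cycles only on inbound and outbound handling. Short-lived thread tasks give better thread usage. Client threads are not blocked waiting for the downstream storage response, which reduces the impact on client system resource usage. Epoll does not perform the I/O itself: it only tells you that the channel is currently readable or writable and fills the protocol read/write buffers, leaving the reads and writes under user control, so many additional operations are possible at that point. IOCP, by contrast, completes the read/write operations on the I/O channel before notifying the user, so when the channel blocks or hits a similar condition, we have no control over it.
  • #15: Reactor pattern: the Boss Thread synchronously takes incoming request events and uses a multiplexing dispatch strategy to distribute them quickly to the corresponding Worker Thread handlers. Callback events from the underlying data store trigger the subsequent response processing. ** Asynchronous operation: completion is signaled by notification, so no polling is needed. Non-blocking: the caller does not wait for the operation result (the call returns immediately); the callback event triggers the subsequent RPC channel flush, which returns the result to the client.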
  • #18: Measured at the same throughput.
  • #22: Async applies at the platform and framework level; for business logic, the async pattern is not easy to adopt. Use off-heap: schema-less for inbound and outbound. Release request memory: retries do not happen in the DAL service.
  • #24: Aerospike: high write performance and specific optimization for SSD => 1M TPS with P99 < 1 ms; DRAM/SSD hybrid solution; high ATB and scalability; local replication and XDR. See the Aerospike VLDB 2016 paper.
  • #25: Batch & retry; traffic routing & HA; ACL & multi-tenancy.
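  • #30: A performance-oriented, reliable, end-to-end asynchronous service access framework that flexibly supports enterprise-level requirements: configurable data access and high-performance asynchronous RPC access.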