How Orange Financial combat financial
frauds over 50M transactions a day using
Apache Pulsar
Vincent Xie (Bestpay), Jia Zhai (StreamNative)
About us
Vincent (Weisheng) Xie
❏ Current Director @ Orange Financial
❏ Previous Tech lead of ML engineering team @ Intel
Jia Zhai
❏ Co-Founder of StreamNative
❏ Apache Pulsar PMC Member
❏ Apache BookKeeper PMC Member
Agenda
❏ Background
❏ Apache Pulsar
❏ Unified Data Processing
❏ Our Practices
❏ Q & A
Background Intro
Orange Financial
Orange Financial Services Group (Chinese: 甜橙金融), formerly known as Bestpay, is an affiliate company of
China Telecom. It reached 1.13 trillion CNY ($18.37 Billion) transaction volume in 2018, with 500 million registered
users and 41.9 million active users.
Subsidiaries:
Bestpay - a mobile wallet and payment app
Jieqian - a consumer loan service
Orange Wealth
Orange Insurance
Orange Credit
Orange Financial Cloud
Source: iiMedia Research Inc.
High Industry Penetration Rate
Source: China Unionpay
Source: RSA
Challenges
❏ High concurrency
❏ > 50M transactions, 1 billion events a day (peak: 35K/s)
❏ Low latency demand
❏ response < 200ms
❏ Large number of batch jobs and streaming jobs
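To put the load figures above in perspective, here is a rough back-of-the-envelope calculation. It assumes the events are spread over a full 24-hour day, which the slides do not state, and compares the resulting average rate with the quoted peak.

```python
# Back-of-the-envelope load estimate from the figures on the slide.
EVENTS_PER_DAY = 1_000_000_000   # "1 billion events a day"
PEAK_EVENTS_PER_SEC = 35_000     # quoted peak of 35K/s
SECONDS_PER_DAY = 24 * 60 * 60   # assumption: load spread over a full day

average_events_per_sec = EVENTS_PER_DAY / SECONDS_PER_DAY
peak_to_average = PEAK_EVENTS_PER_SEC / average_events_per_sec

print(f"average: {average_events_per_sec:,.0f} events/s")
print(f"peak is {peak_to_average:.1f}x the average")
```

Under that assumption the average is roughly 11.6K events/s, so the system has to absorb bursts of about three times the average rate while still answering within 200ms.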
“A merchant’s total transaction volume ($) within the past month (30 days), current transaction included”
= sum($past_29days) [batch] + sum($today_upto_current) [streaming]
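The split in the formula above can be sketched as follows. This is a toy in-memory model, not the speakers' implementation: the class name, the merchant id, the dates, and the amounts are all hypothetical. The batch part reads precomputed daily totals for the 29 full days before today; the streaming part keeps a running sum for today that includes the current transaction.

```python
from collections import defaultdict
from datetime import date, timedelta


class MerchantVolume:
    """Toy feature: 30-day volume = batch part + streaming part."""

    def __init__(self, today, daily_totals):
        # daily_totals: {(merchant_id, day): amount}, as a batch job might produce.
        self.today = today
        self.daily_totals = daily_totals
        self.today_running = defaultdict(float)  # updated per event (streaming)

    def on_transaction(self, merchant_id, amount):
        """Streaming path: fold the current transaction into today's sum."""
        self.today_running[merchant_id] += amount

    def past_29_days(self, merchant_id):
        """Batch path: sum the 29 full days before today."""
        days = (self.today - timedelta(days=n) for n in range(1, 30))
        return sum(self.daily_totals.get((merchant_id, d), 0.0) for d in days)

    def volume_30d(self, merchant_id):
        return self.past_29_days(merchant_id) + self.today_running[merchant_id]


today = date(2019, 6, 30)  # hypothetical date
history = {
    ("m1", today - timedelta(days=1)): 100.0,
    ("m1", today - timedelta(days=29)): 50.0,
    ("m1", today - timedelta(days=30)): 999.0,  # outside the 30-day window
}
feature = MerchantVolume(today, history)
feature.on_transaction("m1", 25.0)
print(feature.volume_30d("m1"))  # 175.0
```

Even in this small form the feature needs two code paths over two data sources, which is the maintenance cost a Lambda-style design carries.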
Architecture V1
[Diagram labels: API Gateway]
Architecture V1 - Lambda
[Diagram labels: API Gateway, Batch Layer, Speed/Streaming Layer, Serving Layer]
Drawbacks
❏ S/W stack complexity
❏ Realtime / Offline / Serving stacks
❏ Multiple clusters to maintain (Kafka / Hive / Spark / Flink)
❏ Different skill sets required (Scala / Java / SQL)
❏ Segmented logic
❏ Historical/Current
❏ Data redundancy
❏ Multiple copies of the data to move between systems
Introduce Apache Pulsar
What is Apache Pulsar?
“Flexible Pub/Sub Messaging
Backed by durable log storage”
Pulsar - A cloud-native architecture
Stateless Serving
Durable Storage
Pulsar - Segment Centric Storage
❏ Topic Partition (Managed Ledger)
❏ The storage layer for a single topic partition
❏ Segment (Ledger)
❏ Single writer, append-only
❏ Replicated to multiple bookies
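The relationship between a topic partition and its segments can be illustrated with a small in-memory model. This is only a sketch of the idea in the bullets above: the class names and the rollover rule (a fixed number of entries per segment) are hypothetical, and replication to bookies is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for a ledger: append-only, sealed once it rolls over."""
    segment_id: int
    entries: list = field(default_factory=list)
    closed: bool = False

    def append(self, entry):
        if self.closed:
            raise ValueError("segment is closed; it only accepted appends while open")
        self.entries.append(entry)
        return len(self.entries) - 1  # entry id within the segment


class TopicPartitionLog:
    """Toy stand-in for a managed ledger: an ordered list of segments."""

    def __init__(self, max_entries_per_segment=3):  # hypothetical rollover rule
        self.max_entries = max_entries_per_segment
        self.segments = [Segment(0)]

    def append(self, entry):
        current = self.segments[-1]
        if len(current.entries) >= self.max_entries:
            current.closed = True                      # seal the full segment
            current = Segment(current.segment_id + 1)  # open a new one
            self.segments.append(current)
        return (current.segment_id, current.append(entry))

    def read_all(self):
        """The partition reads as one stream across all segments, in order."""
        return [e for seg in self.segments for e in seg.entries]


log = TopicPartitionLog()
positions = [log.append(f"msg-{i}") for i in range(7)]
print(positions[-1])      # (2, 0)
print(len(log.segments))  # 3
```

Because only the last segment is ever open for writing, older segments are immutable and can be read, moved, or offloaded independently of the writer.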
Pulsar - Pub/Sub
Pulsar - Topic Partitions
Pulsar - Segments
Pulsar - Stream
Pulsar - Stream as a unified view on data
Pulsar - Two levels of reading API
❏ Pub/Sub (Streaming)
❏ Read data from brokers
❏ Consume / Seek / Receive
❏ Subscription Mode - Failover, Shared, Key_Shared
❏ Reprocessing data by rewinding (seeking) the cursors
❏ Segment (Batch)
❏ Read data from storage (bookkeeper or tiered storage)
❏ Fine-grained Parallelism
❏ Predicate pushdown (publish timestamp)
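The segment-level (batch) read path can be illustrated with a toy model of predicate pushdown on publish time. Nothing here is a Pulsar API: the `SegmentMeta` record and `prune_segments` function are hypothetical, and the sketch assumes each segment's minimum and maximum publish timestamps are known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentMeta:
    """Hypothetical per-segment metadata: publish-time range of its entries."""
    segment_id: int
    min_publish_ts: int  # inclusive, epoch millis
    max_publish_ts: int  # inclusive, epoch millis


def prune_segments(segments, start_ts, end_ts):
    """Keep only segments whose publish-time range overlaps [start_ts, end_ts]."""
    return [s for s in segments
            if s.max_publish_ts >= start_ts and s.min_publish_ts <= end_ts]


segments = [
    SegmentMeta(0, 1_000, 1_999),
    SegmentMeta(1, 2_000, 2_999),
    SegmentMeta(2, 3_000, 3_999),
    SegmentMeta(3, 4_000, 4_999),
]

# A batch query over publish times 2_500..3_200 only needs two segments,
# and each surviving segment could be handed to a separate reader task.
to_read = prune_segments(segments, 2_500, 3_200)
print([s.segment_id for s in to_read])  # [1, 2]
```

Skipping whole segments is what makes the pushdown cheap, and reading the remaining segments independently is where the fine-grained parallelism comes from.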
Unified Data Processing on Pulsar
Architecture V2
[Diagram labels: API Gateway, Spark Structured Streaming, Spark SQL]
❏ Single Data Store (Pulsar)
❏ Single Computing Engine (Spark)
❏ Unified API
Pulsar-Spark
❏ Deeply integrated with Pulsar schema
❏ Pulsar topics as Structured Streams
❏ Pulsar Connectors for Spark Structured Streaming
❏ Pulsar Connectors for Spark SQL
https://github.com/streamnative/pulsar-spark
Pulsar-Spark / Streaming Queries
Pulsar-Spark / Batch Queries
Pulsar-Spark / Write Results to Pulsar
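The code shown on these three slides did not survive in the transcript. As a stand-in, here is a minimal PySpark sketch of the three operations. It assumes the pulsar-spark connector is on the Spark classpath, and the option keys (`service.url`, `admin.url`, `topic`, `startingOffsets`, `endingOffsets`) are the ones I recall from the connector's README, so check them against the connector version in use. The URLs are local placeholders and the helper function names are hypothetical.

```python
def pulsar_options(topic,
                   service_url="pulsar://localhost:6650",  # placeholder broker URL
                   admin_url="http://localhost:8080"):     # placeholder admin URL
    """Connector options shared by all three operations (keys assumed from the README)."""
    return {"service.url": service_url, "admin.url": admin_url, "topic": topic}


def streaming_query(spark, topic):
    """Read a Pulsar topic as an unbounded Structured Streaming DataFrame."""
    return spark.readStream.format("pulsar").options(**pulsar_options(topic)).load()


def batch_query(spark, topic):
    """Read the same topic as a bounded DataFrame for Spark SQL / batch jobs."""
    return (spark.read.format("pulsar")
            .options(**pulsar_options(topic))
            .option("startingOffsets", "earliest")
            .option("endingOffsets", "latest")
            .load())


def write_results(df, topic):
    """Write a batch DataFrame back to a Pulsar topic."""
    df.write.format("pulsar").options(**pulsar_options(topic)).save()
```

Each function takes an existing `SparkSession` or DataFrame, so the same topic name can be passed to `streaming_query` and `batch_query` to read one data store through both paths.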
PoC at Bestpay
❏ Ingest data to Pulsar
❏ Realtime Data
❏ pulsar-io-kafka: move Kafka messages (JSON) into Pulsar
and store them in Avro format with schema information
❏ Historic Data
❏ pulsar-spark: query the Hive table and write its rows to
Pulsar as Avro messages
❏ Data Processing
❏ Spark Structured Streaming: for stream processing
❏ Spark SQL: for batch processing and interactive queries
Benefits
❏ Complexity reduced by 33% (number of clusters down from 6 to 4)
❏ Storage savings of 8.7% (expected to reach 28%)
❏ Time to production improved 11x (backed by streaming SQL)
❏ Higher stability (expected)
Summary
❏ Apache Pulsar is a cloud-native messaging and streaming system
❏ Multi layered architecture
❏ Segment centric storage
❏ Two levels of reading API: Pub/Sub + Segment
❏ Apache Pulsar provides a unified view of data
❏ Pulsar + Spark for simple, unified data processing
References
❏ pulsar-io-kafka: https://github.com/streamnative/pulsar-io-kafka
❏ pulsar-spark: https://github.com/streamnative/pulsar-spark
❏ Apache Pulsar as One Storage System for Both Real-time and Historical Data
Analysis:
https://medium.com/streamnative/apache-pulsar-as-one-storage-455222c59017
Community
❏ Pulsar Website: https://pulsar.apache.org
❏ Twitter: @apache_pulsar / @streamnativeio
❏ Slack: https://apache-pulsar.herokuapp.com
❏ Mailing Lists
dev@pulsar.apache.org, users@pulsar.apache.org
❏ Github
https://github.com/apache/pulsar
❏ Medium
https://medium.com/streamnative
Thanks!
