How Orange Financial combats financial fraud over 50M transactions a day using Apache Pulsar
Vincent Xie (Bestpay), Jia Zhai (StreamNative)
About us
Vincent (Weisheng) Xie
❏ Currently Director @ Orange Financial
❏ Previously Tech Lead of the ML engineering team @ Intel
Jia Zhai
❏ Co-Founder of StreamNative
❏ Apache Pulsar PMC Member
❏ Apache BookKeeper PMC Member
Agenda
❏ Background
❏ Apache Pulsar
❏ Unified Data Processing
❏ Our Practices
❏ Q & A
Background Intro
Orange Financial
Orange Financial Services Group (Chinese: 甜橙金融), formerly known as Bestpay, is an affiliate company of
China Telecom. It reached 1.13 trillion CNY ($18.37 Billion) transaction volume in 2018, with 500 million registered
users and 41.9 million active users.
Subsidiaries:
Bestpay - a mobile wallet and payment app
Jieqian - a consumer loan service
Orange Wealth
Orange Insurance
Orange Credit
Orange Financial Cloud
Source: iiMedia Research Inc.
High Industry Penetration Rate
Source: China Unionpay
Source: RSA
Challenges
❏ High concurrency
❏ > 50M transactions, 1 billion events a day (peak: 35K/s)
❏ Low latency demand
❏ response < 200ms
❏ Large number of batch jobs and streaming jobs
“A merchant’s total transaction volume ($) within the past month (30 days), current transaction included”
= sum($past_29days) + sum($today_upto_current)
where sum($past_29days) comes from batch and sum($today_upto_current) comes from streaming
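The split above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Orange Financial's implementation: `merchant_volume_30d`, `batch_daily_totals`, and `today_events` are hypothetical names, and the amounts are made-up sample values.

```python
from datetime import date, timedelta

def merchant_volume_30d(batch_daily_totals, today_events, today):
    """Combine a precomputed batch aggregate with today's streamed events.

    batch_daily_totals: {date: total} produced by a batch job (hypothetical).
    today_events: amounts seen so far today, current transaction included.
    """
    window_start = today - timedelta(days=29)
    # Batch part: the 29 full days before today.
    past_29_days = sum(
        total for day, total in batch_daily_totals.items()
        if window_start <= day < today
    )
    # Streaming part: everything seen today, up to the current transaction.
    today_so_far = sum(today_events)
    return past_29_days + today_so_far

today = date(2019, 9, 30)
# Sample data: 100 per day for the previous 40 days; only 29 fall in the window.
batch = {today - timedelta(days=n): 100 for n in range(1, 41)}
volume = merchant_volume_30d(batch, [20, 30, 50], today)
print(volume)
```

The point of the sketch is that one business metric needs two code paths, one over historical data and one over live data, which is what makes the Lambda-style design costly to maintain.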
Architecture V1 - Lambda
[Diagram: API Gateway, Batch Layer, Speed/Streaming Layer, Serving Layer]
Drawbacks
❏ Software stack complexity
❏ Realtime / Offline / Serving stacks
❏ Multiple clusters to maintain (Kafka / Hive / Spark / Flink)
❏ Different skill sets required (Scala / Java / SQL)
❏ Segmented logic
❏ Historical/Current
❏ Data redundancy
❏ Multiple copies of the data to move around
Introduce Apache Pulsar
What is Apache Pulsar?
“Flexible Pub/Sub Messaging
Backed by durable log storage”
Pulsar - A cloud-native architecture
Stateless Serving
Durable Storage
Pulsar - Segment Centric Storage
❏ Topic Partition (Managed Ledger)
❏ The storage layer for a single topic partition
❏ Segment (Ledger)
❏ Single writer, append-only
❏ Replicated to multiple bookies
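A toy model may help make the terms above concrete. It is only a sketch: the class names, the three-entry rollover limit, and the round-robin bookie placement are assumptions chosen for illustration, whereas real BookKeeper ledgers roll over by size or time and use configurable ensemble and quorum sizes.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """An append-only ledger with a single writer, replicated to several bookies."""
    ledger_id: int
    bookies: tuple
    entries: list = field(default_factory=list)
    closed: bool = False

    def append(self, payload):
        if self.closed:
            raise ValueError("segment is closed; closed segments are immutable")
        self.entries.append(payload)

class TopicPartition:
    """A managed ledger: an ordered list of segments, only the last one open."""
    def __init__(self, bookies, max_entries_per_segment=3, replicas=2):
        self.bookies = bookies
        self.max_entries = max_entries_per_segment
        self.replicas = replicas
        self.segments = []
        self._roll_over()

    def _roll_over(self):
        # Close the current segment (if any) and open a new one.
        if self.segments:
            self.segments[-1].closed = True
        ledger_id = len(self.segments)
        # Rotate through the bookies so segments spread across the cluster.
        chosen = tuple(
            self.bookies[(ledger_id + i) % len(self.bookies)]
            for i in range(self.replicas)
        )
        self.segments.append(Segment(ledger_id, chosen))

    def publish(self, payload):
        if len(self.segments[-1].entries) >= self.max_entries:
            self._roll_over()
        self.segments[-1].append(payload)

partition = TopicPartition(bookies=["bookie-1", "bookie-2", "bookie-3"])
for n in range(7):
    partition.publish(f"msg-{n}")
layout = [(s.ledger_id, s.bookies, len(s.entries), s.closed)
          for s in partition.segments]
print(layout)
```

Because each segment is placed independently, a partition is not tied to one storage node, which is the property the segment-centric design relies on.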
Pulsar - Pub/Sub
Pulsar - Topic Partitions
Pulsar - Segments
Pulsar - Stream
Pulsar - Stream as a unified view on data
Pulsar - Two levels of reading API
❏ Pub/Sub (Streaming)
❏ Read data from brokers
❏ Consume / Seek / Receive
❏ Subscription Mode - Failover, Shared, Key_Shared
❏ Reprocessing data by rewinding (seeking) the cursors
❏ Segment (Batch)
❏ Read data from storage (BookKeeper or tiered storage)
❏ Fine-grained Parallelism
❏ Predicate pushdown (publish timestamp)
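The two access paths listed above can be contrasted with a small in-memory model. This is a sketch under assumptions, not the Pulsar client API: `Cursor`, `read_segment`, and `batch_read` are hypothetical names, and each segment is simply a list of `(publish_timestamp, payload)` pairs.

```python
from concurrent.futures import ThreadPoolExecutor

# Each segment holds (publish_timestamp, payload) entries in publish order.
segments = [
    [(1, "a"), (2, "b"), (3, "c")],
    [(4, "d"), (5, "e"), (6, "f")],
    [(7, "g"), (8, "h")],
]

class Cursor:
    """Pub/Sub-style reader: one ordered pass over the stream, with seek for replay."""
    def __init__(self, segments):
        self._entries = [entry for segment in segments for entry in segment]
        self._position = 0

    def receive(self):
        entry = self._entries[self._position]
        self._position += 1
        return entry

    def seek(self, position):
        self._position = position

def read_segment(segment, min_ts, max_ts):
    # Predicate pushdown: skip the whole segment if its timestamp range cannot match.
    if not segment or segment[-1][0] < min_ts or segment[0][0] > max_ts:
        return []
    return [payload for ts, payload in segment if min_ts <= ts <= max_ts]

def batch_read(segments, min_ts, max_ts):
    """Segment-style reader: every segment is read independently, in parallel."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda s: read_segment(s, min_ts, max_ts), segments))
    return [payload for part in parts for payload in part]

cursor = Cursor(segments)
first_pass = [cursor.receive()[1] for _ in range(3)]
cursor.seek(0)  # rewind the cursor to reprocess from the start
replayed = cursor.receive()[1]
selected = batch_read(segments, min_ts=3, max_ts=5)
print(first_pass, replayed, selected)
```

The cursor reads in order through a single position, while the batch path treats each segment as an independent unit of work and never opens the third segment for this timestamp range.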
Unified Data Processing on Pulsar
Architecture V2
[Diagram: API Gateway, Spark Structured Streaming, Spark SQL]
❏ Single Data Store (Pulsar)
❏ Single Computing Engine (Spark)
❏ Unified API
Pulsar-Spark
❏ Deeply integrated with Pulsar schema
❏ Pulsar topics as Structured Streams
❏ Pulsar Connectors for Spark Structured Streaming
❏ Pulsar Connectors for Spark SQL
https://github.com/streamnative/pulsar-spark
Pulsar-Spark / Streaming Queries
Pulsar-Spark / Batch Queries
Pulsar-Spark / Write Results to Pulsar
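The code shown on these three slides did not survive extraction, so the following is a hedged PySpark sketch of the three query shapes rather than a copy of the originals. It assumes the pulsar-spark connector is on the Spark classpath and that the option names (`service.url`, `admin.url`, `topic`, `startingOffsets`, `endingOffsets`) match the connector's README for the version in use; the URLs and the topic name are placeholders, and the helper function names are hypothetical.

```python
def pulsar_options(topic, service_url="pulsar://localhost:6650",
                   admin_url="http://localhost:8080"):
    """Connection options shared by every query below (URLs are placeholders)."""
    return {"service.url": service_url, "admin.url": admin_url, "topic": topic}

def streaming_query(spark, topic):
    # Streaming read: the topic becomes an unbounded Structured Streaming DataFrame.
    return (spark.readStream.format("pulsar")
            .options(**pulsar_options(topic))
            .load())

def batch_query(spark, topic):
    # Batch read: the same topic, bounded by explicit start and end offsets.
    return (spark.read.format("pulsar")
            .options(**pulsar_options(topic))
            .option("startingOffsets", "earliest")
            .option("endingOffsets", "latest")
            .load())

def write_results(df, topic):
    # Batch write: each row of the DataFrame becomes a message on the target topic.
    df.write.format("pulsar").options(**pulsar_options(topic)).save()

opts = pulsar_options("persistent://public/default/transactions")
print(opts)
```

Each helper takes an existing `SparkSession` or DataFrame, so the same connection options serve the streaming, batch, and write paths, which is the "unified API" idea the deck describes.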
PoC at Bestpay
❏ Ingest data to Pulsar
❏ Realtime Data
❏ pulsar-io-kafka: ingest Kafka messages (JSON) into Pulsar and store them in Avro format with schema information
❏ Historical Data
❏ pulsar-spark: query the Hive table and write the Hive rows to Pulsar as Avro messages
❏ Data Processing
❏ Spark Structured Streaming: for stream processing
❏ Spark SQL: for batch processing and interactive queries
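The realtime ingestion step, untyped JSON in and schema-carrying records out, can be illustrated with a small stand-in. This sketch does not use Avro or the pulsar-io-kafka connector: `TRANSACTION_SCHEMA`, its fields, and `to_record` are hypothetical, and a plain dict of Python types plays the role that an Avro schema plays in the PoC.

```python
import json

# Hypothetical schema for a transaction record: field name -> Python type.
TRANSACTION_SCHEMA = {"merchant_id": str, "amount": float, "timestamp": int}

def to_record(raw_json, schema=TRANSACTION_SCHEMA):
    """Parse a JSON message and coerce it to the declared schema.

    Fields not in the schema are dropped; missing fields raise an error.
    """
    data = json.loads(raw_json)
    missing = [name for name in schema if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {name: typ(data[name]) for name, typ in schema.items()}

record = to_record(
    '{"merchant_id": "m-42", "amount": "19.5", "timestamp": 1569800000, "extra": 1}'
)
print(record)
```

Attaching a schema at ingestion time is what lets the downstream Spark jobs treat a topic as a typed table instead of re-parsing JSON in every query.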
Benefits
❏ Complexity down 33% (number of clusters reduced from 6 to 4)
❏ Storage savings of 8.7% (expected to reach 28%)
❏ Time to production 11x faster (backed by streaming SQL)
❏ Higher stability (expected)
Summary
❏ Apache Pulsar is a cloud-native messaging and streaming system
❏ Multi-layered architecture
❏ Segment-centric storage
❏ Two levels of reading API: Pub/Sub + Segment
❏ Apache Pulsar provides a unified view of data
❏ Pulsar + Spark for simple, unified data processing
References
❏ pulsar-io-kafka: https://github.com/streamnative/pulsar-io-kafka
❏ pulsar-spark: https://github.com/streamnative/pulsar-spark
❏ Apache Pulsar as One Storage System for Both Real-time and Historical Data Analysis: https://medium.com/streamnative/apache-pulsar-as-one-storage-455222c59017
Community
❏ Pulsar Website: https://pulsar.apache.org
❏ Twitter: @apache_pulsar / @streamnativeio
❏ Slack: https://apache-pulsar.herokuapp.com
❏ Mailing Lists: dev@pulsar.apache.org, users@pulsar.apache.org
❏ GitHub: https://github.com/apache/pulsar
❏ Medium: https://medium.com/streamnative
Thanks!
