Suqiang Song, Director, Chapter Leader of Data Engineering & AI
Mastercard
AI as a Service
Build Shared AI Service Platforms Based on Deep
Learning Technologies
#AI1SAIS
Differentiation starts with consumer insights from a
massive worldwide payments network and our
experience in data cleansing, analytics and modeling
Mastercard Big Data & AI Expertise
WAREHOUSED
• 10 petabytes
• 5+ year historic global view
• Rapid retrieval
• Above-and-beyond privacy protection and security
MULTI-SOURCED
• 38MM+ merchant locations
• 22,000 issuers
CLEANSED, AGGREGATED, ANONYMOUS, AUGMENTED
• 1.5MM automated rules
• Continuously tested
TRANSFORMED INTO ACTIONABLE INSIGHTS
• Reports, indexes, benchmarks
• Behavioral variables
• Models, scores, forecasting
• Econometrics
What can 2.4 BILLION global cards and 56 BILLION transactions/year mean to you?
Mastercard enhanced its artificial intelligence capability with the acquisitions of Applied Predictive Technologies (2015) and Brighterion (2017)
What is AI as a Service?
©2018Mastercard.ProprietaryandConfidential.
Three modes of AI as a Service
Machine learning frameworks
• Machine learning frameworks provide stable and secure environments and consolidate integrated wrappers on top of various technologies for regular machine learning work
• Applications build silos from scratch
Fully managed machine learning services
• Fully managed machine learning services use templates, pre-built models and drag-and-drop development tools to simplify and expedite the process of using a machine learning framework
• Applications share templates and pre-built models, then assemble them into pipelines or a business context and run inference
Automation services
• Tasks like exploratory data analysis, data pre-processing, hyper-parameter tuning, model selection and putting models into production can be automated
• “Return to God what is God's, to Satan what is Satan's, to AI what is math, and to the business what is business”
[Diagram: layered stacks for the three modes, built from AI Applications, Automation Services, Fully managed machine learning services, Machine learning frameworks and On / Off Premise Advanced Infrastructure]
Regular Mode: Machine learning frameworks
[Chart: cost vs. time, starting at 0,0 and reaching $100,000 and 6 weeks, across the stages Data Exploration & Harmonization, Feature Engineering, Modeling, Evaluation & Benchmarking, Model Deployment & Serving]
Example: Machine Learning Sandbox
Plus Mode: Fully managed machine learning services
[Chart: cost vs. time, starting at 0,0 and reaching $50,000 and 2 weeks, across the stages Data Exploration & Harmonization, Feature Engineering, Modeling, Evaluation & Benchmarking, Model Deployment & Serving]
Example: Data Science Workbench
Premium Mode: Automation Services
[Chart: cost vs. time, starting at 0,0 and reaching $10,000 and 2 days, across the stages Data Exploration & Harmonization, Feature Engineering, Modeling, Evaluation & Benchmarking, Model Deployment & Serving]
Example: Amazon SageMaker?
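The cost and time figures quoted for the three modes can be compared with a few lines of arithmetic. This is a small sketch rather than anything from the deck itself: it assumes 7 calendar days per week when converting weeks to days, and the names `MODES` and `savings_vs_regular` are illustrative.

```python
# Figures quoted on the three mode slides. Weeks are converted to days
# assuming 7 calendar days per week (an assumption, not from the slides).
DAYS_PER_WEEK = 7

MODES = {
    "Regular": {"cost_usd": 100_000, "days": 6 * DAYS_PER_WEEK},
    "Plus": {"cost_usd": 50_000, "days": 2 * DAYS_PER_WEEK},
    "Premium": {"cost_usd": 10_000, "days": 2},
}


def savings_vs_regular(mode):
    """Return (cost reduction, time reduction) as fractions of Regular mode."""
    base = MODES["Regular"]
    cur = MODES[mode]
    cost_cut = 1 - cur["cost_usd"] / base["cost_usd"]
    time_cut = 1 - cur["days"] / base["days"]
    return cost_cut, time_cut


for name in MODES:
    cost_cut, time_cut = savings_vs_regular(name)
    print(f"{name}: cost -{cost_cut:.0%}, time -{time_cut:.0%}")
```

Under these assumptions the Premium mode cuts cost by 90% and elapsed time by roughly 95% relative to the Regular mode.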
Challenges to achieve Premium Automation AI Service
Learning Automation and Serving Automation
• Feature engineering bottlenecks: pre-calculating hundreds or thousands of Long Term Variables takes a lot of resources and time
• Model scalability limitations: trade-off between running automation in parallel and scaling machine learning to ever larger datasets and ever more complicated models
• Model serving to multiple contexts: gap in connecting to existing business pipelines (offline, streaming and real-time)
• Heavy reliance on human machine learning experts: humans perform most of the tasks
• API enablement and automated deployment: low productivity when creating more models with low-level raw APIs; isolated promotions and operational readiness with automated deployment
• Less integration with end-to-end data pipelines (fill in the loop): gap in bringing the machine learning process into the existing enterprise data pipelines, including batch, streaming and real-time
How can Deep Learning help?
Challenges with Traditional ML: Feature engineering bottlenecks
Bottlenecks
• Need to pre-calculate hundreds or thousands of Long Term Variables (LTV) for each user, such as total spend and visits per merchant list and category list, broken down by week, month and year
• Computing the LTV features took more than 70% of the data processing time for the whole lifecycle and occupied a lot of resources, which had a huge impact on other critical workloads
• No feature selection optimization, which could have saved a lot of data engineering effort
[Diagram: weekly LTV pipeline with the labels AUTH DETAIL from last week, LTV DATA from last week, MERCHANT, GEO, CATEGORY, ITEM LEVEL DATA, FILTERED TRANSACTIONS, SUMMED BY USER, AGED BY USER, AGED LTV DATA, LTV DATA FOR THIS WEEK]
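To make the bottleneck concrete, the sketch below pre-calculates spend and visit totals for every user, category and look-back window. It is a minimal illustration with made-up transactions, not the pipeline from the slide; the window lengths (7, 30 and 365 days) and all names are assumptions. Every (user, dimension, window) cell is a separate variable, which is why the number of LTV features multiplies so quickly.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (user, merchant category, date, amount).
TXNS = [
    ("u1", "grocery", date(2018, 3, 1), 20.0),
    ("u1", "grocery", date(2018, 3, 9), 35.0),
    ("u1", "travel", date(2018, 1, 15), 300.0),
    ("u2", "grocery", date(2018, 3, 2), 12.5),
]

# Look-back windows in days (assumed lengths for week, month and year).
WINDOWS = {"week": 7, "month": 30, "year": 365}


def long_term_variables(txns, as_of):
    """Total spend and visit count per (user, category, window)."""
    ltv = defaultdict(lambda: {"spend": 0.0, "visits": 0})
    for user, category, day, amount in txns:
        age = (as_of - day).days
        for window, length in WINDOWS.items():
            if 0 <= age < length:
                cell = ltv[(user, category, window)]
                cell["spend"] += amount
                cell["visits"] += 1
    return dict(ltv)


ltv = long_term_variables(TXNS, as_of=date(2018, 3, 10))
for key in sorted(ltv):
    print(key, ltv[key])
```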
With Deep Learning: Remove lots of LTV workloads and simplify the feature engineering
Improvements
• When building a model, focus only on a few pre-defined sliding features and custom overlap features (users only need to identify the column names from the data source)
• Remove most of the LTV pre-calculation work, saving hours of time and a lot of resources
• The deep learning algorithm generates an exponentially growing number of hidden embedding features and performs internal feature selection and optimization automatically during cross-validation at the training stage
Challenges with Traditional ML: Model scalability
[Diagram: Feature Engineering splits the data into Item 1 × Users, Item 2 × Users … Item n × Users; each item has its own Training, Model and Evaluation step; a prebuilt correlation model is used, and a Merge step merges all the prediction results]
Limitations
• All pipelines are separated by item and generate one model per item
• The correlation matrix between items has to be pre-calculated
• Lots of redundant duplication and computation in the feature engineering, training and testing processes
• Items run in parallel and occupy most of the cluster resources while executing
• Poor metrics for items with few transactions
• Very hard to scale to more items: from hundreds to millions?
With Deep Learning: Scale models deeper and wider without degrading metrics
• NCF
  • Scenario: Neural Collaborative Filtering; recommend products to customers (priority is to recommend to active users) according to customers' past activity history.
  • https://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf
• Wide & Deep learning
  • Scenario: jointly trained wide linear models and deep neural networks, to combine the benefits of memorization and generalization for recommender systems.
  • https://pdfs.semanticscholar.org/aa9d/39e938c84a867ddf2a8cabc575ffba27b721.pdf
[Diagram: NCF network. A user-item pair is split by Select into a user index and an item index. The embedding layers are four LookupTables: MF User, MF Item, MLP User and MLP Item. The MF branch combines its embeddings with CMul; the MLP branch applies Concat, Linear 1, ReLU, Linear 2, ReLU. Both branches are concatenated and passed through Linear 3 and Sigmoid.]
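The layer names in the NCF diagram map onto a short forward pass. The following is a plain-Python sketch with randomly initialised, untrained weights and no bias terms; the table sizes and all identifiers are assumptions chosen for illustration, and it does not use the BigDL API.

```python
import math
import random

random.seed(0)
N_USERS, N_ITEMS, MF_DIM, MLP_DIM, HIDDEN = 4, 5, 3, 3, 4


def table(rows, cols):
    """Randomly initialised embedding or weight table (untrained)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Matrix-vector product; weights has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


# Four lookup tables, as in the diagram: MF and MLP embeddings for users and items.
mf_user, mf_item = table(N_USERS, MF_DIM), table(N_ITEMS, MF_DIM)
mlp_user, mlp_item = table(N_USERS, MLP_DIM), table(N_ITEMS, MLP_DIM)
w1 = table(HIDDEN, 2 * MLP_DIM)   # Linear 1
w2 = table(HIDDEN, HIDDEN)        # Linear 2
w3 = table(1, MF_DIM + HIDDEN)    # Linear 3


def ncf_score(user, item):
    """Forward pass for one (user, item) pair; returns a value in (0, 1)."""
    mf = [u * i for u, i in zip(mf_user[user], mf_item[item])]  # CMul
    mlp = mlp_user[user] + mlp_item[item]                       # Concat
    mlp = relu(linear(w2, relu(linear(w1, mlp))))               # Linear 1/2 + ReLU
    logit = linear(w3, mf + mlp)[0]                             # Concat + Linear 3
    return 1.0 / (1.0 + math.exp(-logit))                       # Sigmoid


print(ncf_score(user=1, item=2))
```

Because a single network with shared embeddings scores every user-item pair, adding items grows the lookup tables instead of the number of models.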
Challenges with Traditional ML: Heavily relies on human machine learning experts
Relies on humans to perform the following tasks:
• Select and construct appropriate features.
• Select an appropriate model family.
• Optimize model hyper parameters.
• Post process machine learning models.
• Critically analyze the results obtained.
[Diagram labels: Data Source, Partitioning, Training Data Sets, Validation Data Sets, Testing Data Sets, Model 1, Model 2 … Model n, Choose Best Model, Validate Model Metrics]
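The manual workflow in the diagram (partition the data, train several candidate models, choose the best on validation data, confirm on test data) can be sketched as follows. The data, the 60/20/20 split and the one-feature threshold "models" are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random

random.seed(1)
# Hypothetical data: the label is 1 when the single feature exceeds 0.6.
data = [(x, int(x > 0.6)) for x in (random.random() for _ in range(300))]
random.shuffle(data)
train, valid, test = data[:180], data[180:240], data[240:]  # 60/20/20 split


def accuracy(threshold, rows):
    """Share of rows where the threshold rule matches the label."""
    return sum(int(x > threshold) == y for x, y in rows) / len(rows)


def fit_threshold(rows, n_steps):
    """'Train' a threshold classifier by scanning an evenly spaced grid."""
    grid = [i / n_steps for i in range(n_steps + 1)]
    return max(grid, key=lambda t: accuracy(t, rows))


# Candidate models differ only in grid resolution in this toy setting.
candidates = {f"grid={n}": fit_threshold(train, n) for n in (2, 4, 20)}
best = max(candidates, key=lambda name: accuracy(candidates[name], valid))
print(best, accuracy(candidates[best], test))
```

Each of these steps is a decision a human expert normally makes by hand, which is the dependency the slide describes.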
With Deep Learning: Gives more options for finding an optimally performing, robust configuration
Improvements
• Common neural network "tricks", including initialization, L2 and dropout regularization, batch normalization and gradient checking
• A variety of optimization algorithms, such as mini-batch gradient descent, Momentum, RMSprop and Adam
• Optimization-as-a-service using an ensemble of optimization strategies, allowing practitioners to optimize models faster and more cheaply than standard approaches
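As an illustration of one of the optimizers named above, here is a minimal one-dimensional Adam update loop. It is a sketch on a toy quadratic objective; the hyper-parameter defaults follow commonly used values, and the function name `adam_minimize` is illustrative.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a 1-D function with Adam, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate (momentum)
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate (RMSprop-style)
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

Adam combines the momentum term and the RMSprop-style scaling listed on the slide, which is why it is often a reasonable default when exploring configurations.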
Our Exploration & Evaluation Journey
©2016Mastercard.ProprietaryandConfidential.
Enterprise requirements for Deep Learning
Seamless integration with internal & external products
• Add deep learning capabilities to existing analytic applications and/or machine learning workflows rather than rebuilding all of them
Collocated with mass data storage
• Analyze large amounts of data on the same Big Data clusters where the data is stored (HDFS, HBase, Hive, etc.) rather than moving or duplicating data
Shared infrastructure with multi-tenant isolated resources
• Leverage existing Big Data clusters; deep learning workloads should be managed and monitored alongside other workloads (ETL, data warehouse, traditional ML, etc.) rather than run standalone in separate clusters
Data governance with restricted processing
• Follow data privacy, regulation and compliance requirements (such as PCI/PII compliance and GDPR) rather than operating on data in unsecured zones
Challenges and limitations to Production considering some “Super Stars”…
• Claims that GPU computing is better than CPU, which requires new hardware infrastructure (normally a very long timeline)
• Success requires many engineer-hours (impossible to install a TensorFlow cluster at STAGE ...)
• Low-level APIs with a steep learning curve (where is your PhD degree?)
• Not well integrated with other enterprise tools and needs data movement (couldn't leverage the existing ETL, data warehousing and other analytics-related data pipelines, technologies and tool sets; duplicating data pipelines and copying data is also a big challenge for capacity and performance)
• Tedious and fragile to distribute computations (less monitoring)
• Concerns about enterprise maturity and InfoSec (using a GPU cluster with TensorFlow from Google Cloud)
…
Maybe not your story, but we have ....
What does Spark offer?
Integrations with existing DL libraries
• Deep Learning Pipelines (from Databricks)
• Caffe (CaffeOnSpark)
• Keras (Elephas)
• mxnet
• Paddle
• TensorFlow (TensorFlow on Spark, TensorFrames)
• CNTK (mmlspark)
Implementations of DL on Spark
• BigDL
• DeepDist
• DeepLearning4J
• SparkCL
• SparkNet
Need more break down …
TensorFlow-on-Spark (or Caffe-on-Spark) uses Spark executors (tasks) to launch TensorFlow/Caffe instances in the cluster; however, the distributed deep learning (e.g., training, tuning and prediction) is performed outside of Spark (across multiple TensorFlow or Caffe instances).
(1) As a result, TensorFlow/Caffe still runs on specialized hardware (such as GPU servers interconnected by InfiniBand), and the OpenMP implementations in TensorFlow/Caffe conflict with the JVM threading in Spark (resulting in lower performance).
(2) In addition, in this case TensorFlow/Caffe can only interact with the rest of the analytics pipelines in a very coarse-grained fashion (running as standalone jobs outside of the pipeline, and using HDFS files as job input and output).

Project                          Programming interface   Contributors   Commits
BigDL                            Scala & Python          50             2221
TensorflowOnSpark                Python                  9              257
Databricks/tensor                Python                  9              185
Databricks/spark-deep-learning   Python                  8              51
Statistics collected on Mar 5th, 2018
POC: Benchmark BigDL & Spark MLlib
[Diagram: Neural Recommender using BigDL NCF / Wide And Deep, built as Spark ML Pipeline stages (Transformer, Estimator) over Parquet files and Spark DataFrames.
Training data: 10~12 months of raw transactions plus negative samples. Test data: 1~2 months.
Stages: Load Parquet, Pre-processing, Simple Feature Engineering, Feature Selection, Train Multiple Models on sampled partitions (Train NCF Model (BigDL), Train Wide and Deep Model (BigDL), Train ALS Model (MLlib)), Model Evaluation & Fine Tune, Model Ensemble, Post Processing, Inference, Predictions.
Benchmarks: User-Merchant, User-Category, User-Geo, User-Merchant-Geo, …]
Benchmark results (> 100 rounds)

Metric         MLlib ALS   BigDL NCF   BigDL WAD
AUROC          A           A+23%       A+20% (3% down)
AUPRCs         B           B+31%       B+30% (1% down)
recall         C           C+18%       C+12% (4% down)
precision      D           D+47%       D+49% (2% up)
20 precision   E           E+51%       E+54% (3% up)

Parameters
MLlib ALS: MaxIter(100), RegParam(0.01), Rank(200), Alpha(0.01)
BigDL NCF: MaxEpoch(10), learningRate(3e-2), learningRateDecay(3e-7), uOutput(100), mOutput(200), batchSize(1.6 M)
BigDL WAD: MaxEpoch(10), learningRate(1e-2), learningRateDecay(1e-7), uOutput(100), mOutput(200), batchSize(0.6 M)
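For readers unfamiliar with the metrics in the benchmark, the sketch below computes AUROC and precision among the top-k ranked items on a tiny made-up example. It assumes that "20 precision" on the slide refers to precision over the top 20 recommendations, which the deck does not state explicitly; the function names and data are illustrative.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, labels, k):
    """Fraction of relevant items among the k highest-scored ones."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Hypothetical model scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]
print(auroc(scores, labels), precision_at_k(scores, labels, k=3))
```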
Beyond the deep learning library, we need more automated platform capabilities to close the PROD adoption gaps
Gap 1: Incremental Tuning
Incremental tuning (re-run the whole pipeline only on incrementally changed datasets, such as daily changed transactions, and benchmark the models)
• Refresh the dimensional datasets (such as adding new users, items …)
• Load the history model into the context and update the incremental parts of the model based on the incremental datasets
• Periodic re-training with a batch algorithm and time-series prediction
• Benchmark the history model against the updated model and on-board the better one
[Diagram labels: Incremental Fact, Incremental Dimensional, Ingest, Incremental Set, Lookups Refresher, History Model, Model Loader, Model Fine Tuner, Models, Benchmark, Periodic Incremental Tuning, Incremental Fine Tuning & Benchmark]
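The load, fine-tune, benchmark and on-board loop can be sketched with a deliberately simple stand-in model. Here the "model" is just an item popularity counter and the benchmark is a top-k hit rate on a small holdout; the data and every function name are hypothetical and are meant only to show the control flow, not the actual platform.

```python
from collections import Counter


def fine_tune(history_model, incremental_txns):
    """Update only the parts of the model touched by the incremental data."""
    updated = Counter(history_model)  # load a copy of the history model
    updated.update(item for _, item in incremental_txns)
    return updated


def hit_rate(model, holdout, k=2):
    """Share of holdout purchases that fall in the model's top-k items."""
    top = {item for item, _ in model.most_common(k)}
    return sum(item in top for _, item in holdout) / len(holdout)


# Hypothetical popularity model (item -> purchase count) and a daily increment.
history_model = Counter({"coffee": 50, "books": 40, "fuel": 10})
increment = [("u1", "fuel")] * 45 + [("u9", "coffee")] * 5  # u9 is a new user
holdout = [("u2", "fuel"), ("u3", "coffee"), ("u4", "fuel"), ("u5", "books")]

candidate = fine_tune(history_model, increment)
scores = {
    "history": hit_rate(history_model, holdout),
    "updated": hit_rate(candidate, holdout),
}
# On-board the updated model only if it benchmarks at least as well.
onboarded = candidate if scores["updated"] >= scores["history"] else history_model
print(scores)
```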
Gap 2: Model Serving to multiple contexts
Model serving (connect to existing business pipelines: offline, streaming and real-time)
• Build the model serving capability by exporting models to scoring/prediction/recommendation services and integration points
• Integrate the model serving services inside the business pipelines, for example by embedding them into Spark jobs for offline, Spark Streaming jobs for streaming, and real-time “dialogue” with Kafka messaging …
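One way to picture serving a single exported model in several contexts is to wrap the same scoring function in three thin adapters. The sketch below uses plain Python lists, generators and JSON strings as stand-ins for Spark jobs, Spark Streaming jobs and Kafka messages; the weights and all function names are hypothetical.

```python
import json


def load_model():
    """Stand-in for loading an exported model; returns a scoring function."""
    weights = {"amount": 0.01, "visits": 0.2}  # hypothetical exported parameters
    return lambda rec: sum(weights[k] * rec.get(k, 0.0) for k in weights)


score = load_model()


def serve_batch(records):
    """Offline context: score a whole dataset at once."""
    return [score(r) for r in records]


def serve_stream(messages):
    """Streaming context: score JSON messages lazily, one at a time."""
    for msg in messages:
        yield score(json.loads(msg))


def serve_request(payload):
    """Real-time context: one JSON request in, one JSON response out."""
    return json.dumps({"score": score(json.loads(payload))})


records = [{"amount": 100.0, "visits": 2}, {"amount": 20.0, "visits": 5}]
print(serve_batch(records))
print(list(serve_stream(json.dumps(r) for r in records)))
print(serve_request(json.dumps(records[0])))
```

Keeping the scoring logic in one place means all three contexts return the same result for the same input.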
Gap 3: Build user-friendly high-level pipeline APIs
High-level pipeline APIs
• Abstract and refine high-level data and learning pipeline APIs on top of the BigDL library to simplify the deep learning model assembly process and increase productivity
Gap 4: Integrate with end-to-end data pipelines, fill in the loop
Embed the deep learning process into existing enterprise data pipelines
• Build pre-defined templates and customized processors to bring the deep learning process into the existing enterprise data pipelines, including batch, streaming and real-time
Gap 5: AI Pipelines promotion with automated CI/CD deployment
Design and implement pipelines in a visual workbench
Automate deployment with CI/CD pipelines
[Diagram labels: Pipeline Designer; AI Pipelines and Flows for Biz. A to Biz. F; Generate AI Pipelines; Pipeline Registry; Configuration Management (Tag / Branches); Continuous integration (Parameter, template); Deployment sequences; Pipelines Promotion across Local Dev, Dev, Sandbox, Stage and Prod(s)]
Community improvements: Analytics Zoo -> Unified Analytics + AI Platform for Spark and BigDL
Easier to build end-to-end analytics + AI applications
• Reference use cases
  • Anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction, etc.
• Predefined models
  • Object detection, image classification, text classification, recommendations, GAN, etc.
• Feature engineering & transformations
  • Image, text, speech, 3D imaging, time-series, etc.
• High level pipeline APIs
  • Dataframes, ML Pipelines, autograd, transfer learning, Keras/Keras2, etc.
https://github.com/intel-analytics/analytics-zoo
Thanks
Q & A


AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Technologies with Suqiang Song

  • 1. Suqiang Song, Director, Chapter Leader of Data Engineering & AI Mastercard AI as a Service Build Shared AI Service Platforms Based on Deep Learning Technologies #AI1SAIS
  • 2. Differentiation starts with consumer insights from a massive worldwide payments network and our experience in data cleansing, analytics and modeling Mastercard Big Data & AI Expertise WAREHOUSED • 10 petabytes • 5+ year historic global view • Rapid retrieval • Above-and-beyond privacy protection and security MULTI-SOURCED • 38MM+ merchant locations • 22,000 issuers CLEANSED, AGGREGATD, ANONYMOUS, AUGMENTED • 1.5MM automated rules • Continuously tested TRANSFORMED INTO ACTIONABLE INSIGHTS • Reports, indexes, benchmarks • Behavioral variables • Models, scores, forecasting • Econometrics What can 2.4 BILLION Global Cards and 56 BILLION Transactions/ Year mean to you? Mastercard Enhanced Artificial Intelligence Capability with the Acquisitions of Applied Predictive Technologies(2015) and Brighterion (2017)
  • 3. What is the AI as a Service ?
  • 4. ©2018Mastercard.ProprietaryandConfidential. AI Applications Machine learning frameworks • Machine learning frameworks: Provide stable and secure environments and consolidate integrated wrappers on top of variable technologies for regular machine learning works • Applications build silos from scratch Three modes of AI as a Services • Fully managed machine learning servic es use templates, pre-built models and drag-and-drop development tools to si mplify and expedite the process of usin g a machine learning framework • Applications share templates and pre- built models , assembly and infer them into pipelines or business context • Automation Services, tasks like explora tory data analysis, pre-processing of da ta, hyper-parameter tuning, model sele ction and putting models into producti on can be automated • “God's Return to God, Satan's Return to Satan , Math’s Return AI, Business’s Return Biz” Machine learning frameworks AI Applications Machine learning frameworks Fully managed machine learning services AI Applications On / Off Premise Advanced Infrastructure Fully managed machine learning services On / Off Premise Advanced Infrastructure On / Off Premise Advanced Infrastructure Automation Services
  • 5. ©2018Mastercard.ProprietaryandConfidential. 5 Time Cost Data Exploration & Harmonization Features Engineering 0,0 Regular Mode :Machine learning frameworks Evaluation & Benchmarking Model Deployment &Serving $100,000 6 weeks Modeling Example : Machine Learning Sandbox
  • 6. ©2018Mastercard.ProprietaryandConfidential. 6 Time Cost Features Engineering 0,0 Plus Mode : Fully managed machine learning services Evaluation & Benchmarking $50,000 2 weeks Model Deployment &Serving Modeling Data Exploration & Harmonization Example : Data Science Workbench
  • 7. ©2018Mastercard.ProprietaryandConfidential. 7 Time Cost Features Engineering 0,0 Premium Mode: Automation Services Evaluation & Benchmarking $10,000 2 days Model Deployment &ServingData Exploration & Harmonization Modeling Example : Amazon SageMaker ?
  • 8. ©2018Mastercard.ProprietaryandConfidential. 8 Feature engineering bottlenecks Pre-calculate hundreds or thousands Long Term Variables take lots of resources and times Model scalability limitations Trade-off between automation in parallel and scaling machine learning to ever larger datasets and ever more complicated models Model Serving to multiple contexts Gap to connect to existing business pipelines , offline ,streaming and real-time Heavily relies on human machine learning experts Relies on human to perform the most of tasks API Enablement and automate deployment Low productivity to create more models with low level raw APIs Isolated promotions and operation readness with automate deployment Less integration with end to end data pipelines, fill in the loop Gap to bring machine learning process into the existing enterprise data pipelines , including batch , streaming and real-time 1 2 3 4 5 6 Challenges to achieve Premium Automation AI Service Learning Automation Serving Automation
  • 10. ©2018Mastercard.ProprietaryandConfidential. 10 Bottlenecks  Need to pre-calculate hundreds or thousands Long Term Variables for each user, such as total spends /visits for merchants list, category list divided by week, months and years  The computation time for LTV features took > 70% of the data processing time for the whole lifecycle and occupied lots of resources which had huge impact to other critical workloads.  Miss the feature selection optimizations which could save the data engineering efforts a lot AUTH DETAIL from last weekLTV DATA from last week MERCHANT AGED LTV DATA GEO CATEGORY ITEM LEVEL DATA FILTERED TRANSACTIONS SUMMED BY USER AGED BY USERAGED LTV DATA LTV DATA FOR THIS WEEK Challenges with Traditional ML : Feature engineering bottlenecks
  • 11. ©2018Mastercard.ProprietaryandConfidential. 11 Improvements  When build model , only focus on few pre-defined sliding features and custom overlap features ( Users only need to identify the columns names from data source)  Remove most of the LTV pre-calculations works, saved hours time and lots of resources  Deep learning algorithm generates exponential growth of hidden embedding features ,do the internal features selections and optimization automatically when it does cross validation at training stage With Deep Learning : Remove lots of LTV workloads and simply the feature engineering
  • 12. ©2018Mastercard.ProprietaryandConfidential. 12 … Item 1 * Users Item 2* Users Item n* Users Feature Engineering Training 1 Training 2 Training n Model 1 Model 2 Model n Merge 2 2 2 3 3 3 4 1 Prebuilt correlation Model Merge all the prediction results Evaluation 1 Evaluation 2 Evaluation 3 Limitations  All the pipelines separated by items and generate one model for each item  Have to pre-calculate the correlation matrix between items  Lots of redundant duplications and computations at feature engineering ,training and testing process  Run items in parallel and occupied most of cluster resources when executed  Bad metrics for items with few transactions  It is very hard to scale more items , from hundreds to millions ? Challenges with Traditional ML : Model scalability
  • 13. ©2018Mastercard.ProprietaryandConfidential. 13 •NCF • Scenario:Neural Collaborative Filtering ,recommend products to customers (priority is to recommend to active users) according to customers’ past history activities. • https://www.comp.nus.edu.sg/~xia ngnan/papers/ncf.pdf •Wide & Deep learning • Scenario: jointly trained wide linear models and deep neural networks- --to combine the benefits of memorization and generalization for recommender systems. • https://pdfs.semanticscholar.org/aa 9d/39e938c84a867ddf2a8cabc575f fba27b721.pdf Linear 2 ReLU Linear 1 ReLU Concat CMul LookupTable (MF User) LookupTable (MLP User) LookupTable (MF Item) Linear 3 Sigmoid Select LookupTable (MLP Item) ConcatTable Conca SelectSelect Select User index User indexItem Index User Item Pair MLP MF Embedding Layers Item Index MLP User Embedding MLP Item EmbeddingMF Item EmbeddingMF User Embedding With Deep Learning : Scale models in deeper and wider without decreasing metrics
  • 14. Challenges with Traditional ML: Heavy reliance on human machine learning experts
      Relies on humans to perform the following tasks:
      • Select and construct appropriate features.
      • Select an appropriate model family.
      • Optimize model hyperparameters.
      • Post-process machine learning models.
      • Critically analyze the results obtained.
      [Diagram: the data source is partitioned into training, validation and testing data sets; Model 1, Model 2 … Model n are trained; the best model is chosen and its metrics are validated.]
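The workflow in the diagram (partition the data, train several candidate models, choose the best, validate its metrics) can be sketched as follows. This is an illustrative toy, not the deck's pipeline: the 60/20/20 split, the two candidate models (`fit_majority`, `fit_midpoint`) and the accuracy metric are assumptions chosen to keep the example self-contained.

```python
import random

def partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle once, then cut into training / validation / testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def fit_majority(train_rows):
    """Candidate 1: always predict the most frequent training label."""
    positives = sum(y for _, y in train_rows)
    label = 1 if 2 * positives >= len(train_rows) else 0
    return lambda x: label

def fit_midpoint(train_rows):
    """Candidate 2: threshold halfway between the two class means."""
    pos = [x for x, y in train_rows if y == 1]
    neg = [x for x, y in train_rows if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= threshold else 0

def accuracy(model, rows):
    return sum(1 for x, y in rows if model(x) == y) / len(rows)

def choose_best(models, validation_rows):
    """Pick the candidate with the highest validation accuracy."""
    return max(models, key=lambda name: accuracy(models[name], validation_rows))

# Toy data source: label is 1 when the feature is at least 50.
rows = [(x, 1 if x >= 50 else 0) for x in range(100)]
train_rows, validation_rows, test_rows = partition(rows)
models = {"majority": fit_majority(train_rows), "midpoint": fit_midpoint(train_rows)}
best = choose_best(models, validation_rows)
# The held-out test set is only used once, to validate the chosen model.
test_accuracy = accuracy(models[best], test_rows)
```

Every choice in this sketch (which candidates to try, which metric to use, how to split) is made by a person, which is the dependency on experts that the slide points out.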
  • 15. With Deep Learning: More options for finding an optimally performing, robust configuration
      Improvements:
      • Common neural network "tricks", including initialization, L2 and dropout regularization, batch normalization and gradient checking.
      • A variety of optimization algorithms, such as mini-batch gradient descent, Momentum, RMSprop and Adam.
      • Optimization-as-a-service using an ensemble of optimization strategies, allowing practitioners to optimize models faster and more cheaply than standard approaches.
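As an illustration of one optimizer named on the slide, here is a minimal sketch of the Adam update rule in plain Python. The function name `adam_minimize`, the toy quadratic objective and the hyperparameter values are assumptions for the example; deep learning libraries ship their own tuned implementations.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a function given its gradient, using the Adam update rule."""
    x = list(x0)
    m = [0.0] * len(x)  # running mean of gradients (first moment)
    v = [0.0] * len(x)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            # Bias correction compensates for the zero initialization of m and v.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x, y) = (x - 3)^2 + (y + 1)^2, whose minimum is at (3, -1).
def toy_gradient(p):
    return [2 * (p[0] - 3), 2 * (p[1] + 1)]

solution = adam_minimize(toy_gradient, [0.0, 0.0])
```

Momentum and RMSprop correspond to keeping only the first-moment or only the second-moment estimate, respectively; Adam combines both.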
  • 16. Our Exploration & Evaluation Journey
  • 17. Enterprise requirements for Deep Learning
      • Seamless integration with products (internal & external): add deep learning capabilities to existing analytic applications and/or machine learning workflows rather than rebuild all of them.
      • Collocated with mass data storage: analyze large amounts of data on the same Big Data clusters where the data are stored (HDFS, HBase, Hive, etc.) rather than move or duplicate data.
      • Shared infrastructure with multi-tenant isolated resources: leverage existing Big Data clusters; deep learning workloads should be managed and monitored alongside other workloads (ETL, data warehouse, traditional ML, etc.) rather than run standalone in separate clusters.
      • Data governance with restricted processing: follow data privacy, regulation and compliance requirements (such as PCI/PII compliance and GDPR) rather than operate on data in unsecured zones.
  • 18. Challenges and limitations to production, considering some “Super Stars”…
      • GPU computing is claimed to be better than CPU, which requires new hardware infrastructure (normally a very long timeline).
      • Success requires many engineer-hours (impossible to install a TensorFlow cluster at STAGE ...).
      • Low-level APIs with a steep learning curve (where is your PhD degree?).
      • Not well integrated with other enterprise tools, and data movement is needed: the existing ETL, data warehousing and other analytics-related data pipelines, technologies and tool sets could not be leveraged. Duplicating data pipelines and copying data is also a big challenge for capacity and performance.
      • Tedious and fragile to distribute computations (little monitoring).
      • Concerns about enterprise maturity and InfoSec (e.g., using a GPU cluster with TensorFlow from Google Cloud).
      • …
      Maybe not your story, but we have ...
  • 19. What does Spark offer?
      • Integrations with existing DL libraries: Deep Learning Pipelines (from Databricks); Caffe (CaffeOnSpark); Keras (Elephas); mxnet; Paddle; TensorFlow (TensorFlow on Spark, TensorFrames); CNTK (mmlspark)
      • Implementations of DL on Spark: BigDL; DeepDist; DeepLearning4J; SparkCL; SparkNet
  • 20. Need more breakdown …
      TensorFlow-on-Spark (or Caffe-on-Spark) uses Spark executors (tasks) to launch TensorFlow/Caffe instances in the cluster; however, the distributed deep learning itself (e.g., training, tuning and prediction) is performed outside of Spark, across multiple TensorFlow or Caffe instances.
      (1) As a result, TensorFlow/Caffe still runs on specialized hardware (such as GPU servers interconnected by InfiniBand), and the OpenMP implementations in TensorFlow/Caffe conflict with the JVM threading in Spark, resulting in lower performance.
      (2) In addition, TensorFlow/Caffe can then only interact with the rest of the analytics pipeline in a very coarse-grained fashion: running as standalone jobs outside of the pipeline, with HDFS files as job input and output.
      Project statistics (collected on Mar 5th, 2018):
      • BigDL: programming interface Scala & Python; 50 contributors; 2221 commits
      • TensorflowOnSpark: programming interface Python; 9 contributors; 257 commits
      • Databricks/tensor: programming interface Python; 9 contributors; 185 commits
      • Databricks/spark-deep-learning: programming interface Python; 8 contributors; 51 commits
  • 21. POC: Benchmark BigDL & Spark MLlib
      [Diagram: Neural Recommender using BigDL, built as a Spark pipeline. Parquet files are loaded into Spark DataFrames. Training data: 10~12 months of raw transactions plus negative samples; test data: 1~2 months. Spark ML Pipeline stages: pre-processing, simple feature engineering, feature selection, training of multiple models on sampled partitions (NCF model with BigDL, Wide and Deep model with BigDL, AIS model with MLlib), model candidates, model ensemble, model evaluation & fine tuning, post-processing, and inference producing predictions. The NCF / Wide And Deep stages are wrapped as Transformer and Estimator. Benchmarks: User-Merchant, User-Category, User-Geo, User-Merchant-Geo, …]
  • 22. Benchmark results (> 100 rounds)
      • MLlib AIS (baseline). Parameters: MaxIter(100), RegParam(0.01), Rank(200), Alpha(0.01). Results: AUROC: A; AUPRCs: B; recall: C; precision: D; 20 precision: E.
      • BigDL NCF. Parameters: MaxEpoch(10), learningRate(3e-2), learningRateDecay(3e-7), uOutput(100), mOutput(200), batchSize(1.6 M). Results: AUROC: A+23%; AUPRCs: B+31%; recall: C+18%; precision: D+47%; 20 precision: E+51%.
      • BigDL WAD. Parameters: MaxEpoch(10), learningRate(1e-2), learningRateDecay(1e-7), uOutput(100), mOutput(200), batchSize(0.6 M). Results: AUROC: A+20% (3% down); AUPRCs: B+30% (1% down); recall: C+12% (4% down); precision: D+49% (2% up); 20 precision: E+54% (3% up).
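For readers unfamiliar with the metrics in this benchmark, the sketch below shows how two of them can be computed from model scores and binary labels. It assumes that "20 precision" means precision over the top 20 ranked recommendations, which the slide does not state; the function names and the toy data are illustrative only.

```python
def precision_at_k(scores, labels, k):
    """Fraction of relevant items among the k highest-scored items."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)[:k]
    return sum(label for _, label in ranked) / k

def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: six scored user-item pairs, label 1 = the user really transacted.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]
p_at_2 = precision_at_k(scores, labels, k=2)  # top 2 are (0.9, 1) and (0.8, 0)
area = auroc(scores, labels)                  # 7 of 9 positive/negative pairs ordered correctly
```

The pairwise AUROC above is quadratic in the number of samples, so it is only suitable for small illustrations; at benchmark scale a rank-based implementation from a library would be used instead.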
  • 23. Beyond the deep learning library, we need more automated platform capabilities to close the gaps to PROD adoption
  • 24. Gap 1: Incremental Tuning
      Incremental tuning: re-run the whole pipeline only on incrementally changed datasets (such as daily changed transactions) and benchmark the models.
      • Refresh the dimensional datasets (such as adding new users, items, …).
      • Load the history model into the context and update the incremental parts of the model based on the incremental data sets.
      • Periodic re-training with a batch algorithm and time-series prediction.
      • Benchmark the history model against the updated model and on-board the better one.
      [Diagram: incremental fact and incremental dimensional data are ingested as an incremental set; the Lookups Refresher, Model Loader (history model), Model Fine Tuner and Models Benchmark components perform periodic incremental tuning, i.e. incremental fine tuning & benchmark.]
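The load, update, benchmark and on-board loop described above can be sketched as follows. This is a deliberately tiny stand-in: the "model" is an item popularity counter and the metric is a top-k hit rate, both assumptions chosen for illustration rather than anything the deck specifies, and all function names are hypothetical.

```python
from collections import Counter

def update_model(history_model, incremental_txns):
    """Copy the history model and fold in only the new (user, item) transactions."""
    updated = Counter(history_model)
    updated.update(item for _, item in incremental_txns)
    return updated

def hit_rate(model, holdout_txns, k=2):
    """Share of holdout transactions whose item is among the model's top-k items."""
    top_k = {item for item, _ in model.most_common(k)}
    return sum(1 for _, item in holdout_txns if item in top_k) / len(holdout_txns)

def benchmark_and_onboard(history_model, incremental_txns, holdout_txns, k=2):
    """Benchmark history vs. updated model and return the better one with its score."""
    candidate = update_model(history_model, incremental_txns)
    old_score = hit_rate(history_model, holdout_txns, k)
    new_score = hit_rate(candidate, holdout_txns, k)
    if new_score > old_score:
        return candidate, new_score
    return history_model, old_score

history = Counter({"coffee": 5, "books": 4, "fuel": 1})
daily_increment = [("u1", "fuel")] * 6
holdout = [("u2", "fuel"), ("u3", "fuel"), ("u4", "coffee"), ("u5", "books")]
onboarded, score = benchmark_and_onboard(history, daily_increment, holdout)
```

Keeping the history model untouched until the benchmark finishes means the previous model stays available if the updated one turns out to be worse.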
  • 25. Gap 2: Model Serving to multiple contexts
      Model serving: connect to existing business pipelines (offline, streaming and real-time).
      • Build the model serving capability by exporting the model to scoring/prediction/recommendation services and integration points.
      • Integrate the model serving services inside the business pipelines, for example by embedding them into Spark jobs for offline use, into Spark Streaming jobs for streaming, and into the real-time “dialogue” with Kafka messaging.
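The idea of exporting one model and embedding the same scorer in several contexts can be sketched without any cluster dependencies. In this illustrative example a JSON string stands in for the exported model, a list stands in for an offline batch, and an iterable of JSON messages stands in for a stream; the linear scoring function and all names are assumptions, and no Spark or Kafka API is used.

```python
import json

def export_model(weights, bias):
    """Serialize a (toy, linear) model so that other services can load it."""
    return json.dumps({"weights": weights, "bias": bias})

def load_scorer(payload):
    """Rebuild a scoring function from the exported payload."""
    model = json.loads(payload)
    weights, bias = model["weights"], model["bias"]

    def score(features):
        return bias + sum(weights[name] * value
                          for name, value in features.items() if name in weights)
    return score

def serve_offline(score, rows):
    """Offline context: score a whole batch of feature rows at once."""
    return [score(row) for row in rows]

def serve_streaming(score, messages):
    """Streaming context: score JSON messages one by one as they arrive."""
    for raw in messages:
        event = json.loads(raw)
        yield {"id": event["id"], "score": score(event["features"])}

payload = export_model({"amount": 0.5, "visits": 2.0}, bias=1.0)
scorer = load_scorer(payload)
batch_scores = serve_offline(scorer, [{"amount": 2.0, "visits": 1.0}])
stream_scores = list(serve_streaming(
    scorer, [json.dumps({"id": "t1", "features": {"amount": 4.0}})]))
```

Because both contexts call the same `score` function loaded from the same payload, offline and streaming results stay consistent with each other.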
  • 26. Gap 3: Build user-friendly high-level pipeline APIs
      • Abstract and refine high-level data and learning pipeline APIs on top of the BigDL library to simplify the deep learning model assembly process and increase productivity.
  • 27. Gap 4: Integrate with end-to-end data pipelines and close the loop
      Embed the deep learning process into existing enterprise data pipelines:
      • Build pre-defined templates and customized processors to bring the deep learning process into the existing enterprise data pipelines, including batch, streaming and real-time.
  • 28. Gap 5: AI pipeline promotion with automated CI/CD deployment
      • Design and implement pipelines in a visual workbench (Pipeline Designer; AI pipelines and flows for Biz. A to Biz. F).
      • Pipeline promotion across environments: Local Dev, Dev Sandbox, Stage, Prod(s).
      • Configuration management (tags / branches) and a pipeline registry.
      • Generate AI pipeline deployment sequences.
      • Continuous integration (parameters, templates).
      • Automate deployment with CI/CD pipelines.
  • 29. Community improvements: Analytics Zoo -> Unified Analytics + AI Platform for Spark and BigDL
      Easier to build end-to-end analytics + AI applications:
      • Reference use cases: anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction, etc.
      • Predefined models: object detection, image classification, text classification, recommendations, GAN, etc.
      • Feature engineering & transformations: image, text, speech, 3D imaging, time-series, etc.
      • High-level pipeline APIs: DataFrames, ML Pipelines, autograd, transfer learning, Keras/Keras2, etc.
      https://github.com/intel-analytics/analytics-zoo