AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Technologies with Suqiang Song

Suqiang Song, Director, Chapter Leader of Data Engineering & AI
Mastercard
AI as a Service
Build Shared AI Service Platforms Based on Deep
Learning Technologies
#AI1SAIS

Differentiation starts with consumer insights from a
massive worldwide payments network and our
experience in data cleansing, analytics and modeling
Mastercard Big Data & AI Expertise
WAREHOUSED
• 10 petabytes
• 5+ year historic global view
• Rapid retrieval
• Above-and-beyond privacy protection and security
MULTI-SOURCED
• 38MM+ merchant locations
• 22,000 issuers
CLEANSED, AGGREGATED, ANONYMOUS, AUGMENTED
• 1.5MM automated rules
• Continuously tested
TRANSFORMED INTO ACTIONABLE INSIGHTS
• Reports, indexes, benchmarks
• Behavioral variables
• Models, scores, forecasting
• Econometrics
What can 2.4 BILLION Global Cards and 56 BILLION Transactions/Year mean to you?
Mastercard enhanced its artificial intelligence capability with the acquisitions of Applied Predictive Technologies (2015) and Brighterion (2017)

©2018 Mastercard. Proprietary and Confidential.
Three modes of AI as a Service
Machine learning frameworks
• Provide stable and secure environments and consolidate integrated wrappers on top of various technologies for regular machine learning work
• Applications build silos from scratch
Fully managed machine learning services
• Use templates, pre-built models and drag-and-drop development tools to simplify and expedite the process of using a machine learning framework
• Applications share templates and pre-built models, then assemble them into pipelines or business contexts for inference
Automation Services
• Tasks like exploratory data analysis, data pre-processing, hyper-parameter tuning, model selection and putting models into production can be automated
• “God's Return to God, Satan's Return to Satan, Math's Return to AI, Business's Return to Biz”
[Diagram: layered stacks for the three modes, built from the labels AI Applications, Automation Services, Fully managed machine learning services, Infrastructure and On / Off Premise Advanced Infrastructure]

Regular Mode: Machine learning frameworks
[Chart: Cost versus Time across the stages Data Exploration & Harmonization, Features Engineering, Modeling, Evaluation & Benchmarking, Model Deployment & Serving]
Cost: $100,000
Time: 6 weeks
Example: Machine Learning Sandbox

Plus Mode: Fully managed machine learning services
[Chart: Cost versus Time across the stages Data Exploration & Harmonization, Features Engineering, Modeling, Evaluation & Benchmarking, Model Deployment & Serving]
Cost: $50,000
Time: 2 weeks
Example: Data Science Workbench

Premium Mode: Automation Services
[Chart: Cost versus Time across the stages Data Exploration & Harmonization, Features Engineering, Modeling, Evaluation & Benchmarking, Model Deployment & Serving]
Cost: $10,000
Time: 2 days
Example: Amazon SageMaker?

Challenges to achieve Premium Automation AI Service
Learning Automation | Serving Automation
• Feature engineering bottlenecks: pre-calculating hundreds or thousands of Long Term Variables takes lots of resources and time
• Model scalability limitations: trade-off between automation in parallel and scaling machine learning to ever larger datasets and ever more complicated models
• Model serving to multiple contexts: gap in connecting to existing business pipelines (offline, streaming and real-time)
• Heavy reliance on human machine learning experts: humans perform most of the tasks
• API enablement and automated deployment: low productivity when creating more models with low-level raw APIs; isolated promotions and operational readiness with automated deployment
• Less integration with end-to-end data pipelines (fill in the loop): gap in bringing the machine learning process into the existing enterprise data pipelines, including batch, streaming and real-time

What can Deep Learning help with?

Challenges with Traditional ML: Feature engineering bottlenecks
Bottlenecks
• Need to pre-calculate hundreds or thousands of Long Term Variables (LTV) for each user, such as total spend/visits per merchant list and category list, divided by week, month and year
• Computing the LTV features took > 70% of the data processing time for the whole lifecycle and occupied lots of resources, which had a huge impact on other critical workloads
• Missing the feature selection optimizations that could save a lot of data engineering effort
[Diagram: weekly LTV flow. Labels: AUTH DETAIL from last week; LTV DATA from last week; MERCHANT, GEO, CATEGORY, ITEM LEVEL DATA; FILTERED TRANSACTIONS; SUMMED BY USER; AGED BY USER; AGED LTV DATA; LTV DATA FOR THIS WEEK]
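To make the cost of this pre-calculation concrete, here is a minimal sketch of the kind of aggregation the slide describes: total spend and visit counts per user and merchant, bucketed by week, month and year. The record layout, the sample rows and the function names are assumptions for illustration, not the actual Mastercard pipeline.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (user_id, merchant_id, purchase_date, amount)
transactions = [
    ("u1", "m1", date(2018, 1, 2), 20.0),
    ("u1", "m1", date(2018, 1, 4), 5.0),
    ("u1", "m2", date(2018, 1, 10), 12.5),
    ("u2", "m1", date(2018, 2, 1), 40.0),
]

def period_keys(day):
    """Return the week, month and year buckets a date falls into."""
    iso_year, iso_week, _ = day.isocalendar()
    return [
        ("week", f"{iso_year}-W{iso_week:02d}"),
        ("month", f"{day.year}-{day.month:02d}"),
        ("year", str(day.year)),
    ]

def precalculate_ltv(rows):
    """Sum spend and count visits per (user, merchant, granularity, bucket)."""
    spend = defaultdict(float)
    visits = defaultdict(int)
    for user, merchant, day, amount in rows:
        for granularity, bucket in period_keys(day):
            key = (user, merchant, granularity, bucket)
            spend[key] += amount
            visits[key] += 1
    return spend, visits

spend, visits = precalculate_ltv(transactions)
```

Even in this toy form, the number of stored values grows with users × merchants × time buckets, which is consistent with the slide's point that LTV computation dominates processing time.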

With Deep Learning: Remove lots of LTV workloads and simplify the feature engineering
Improvements
• When building a model, focus only on a few pre-defined sliding features and custom overlap features (users only need to identify the column names from the data source)
• Remove most of the LTV pre-calculation work, saving hours of time and lots of resources
• The deep learning algorithm generates an exponentially growing number of hidden embedding features and performs internal feature selection and optimization automatically during cross-validation at the training stage

Challenges with Traditional ML: Model scalability
[Diagram: one pipeline per item (Item 1 * Users, Item 2 * Users, …, Item n * Users), each with Feature Engineering, Training, Model and Evaluation steps; a prebuilt correlation model; a final Merge of all the prediction results]
Limitations
• All the pipelines are separated by item and generate one model per item
• The correlation matrix between items has to be pre-calculated
• Lots of redundant duplication and computation in the feature engineering, training and testing processes
• Items run in parallel and occupy most of the cluster resources when executed
• Bad metrics for items with few transactions
• It is very hard to scale to more items: from hundreds to millions?

With Deep Learning: Scale models deeper and wider without degrading metrics
• NCF
  • Scenario: Neural Collaborative Filtering recommends products to customers (the priority is to recommend to active users) according to customers' past activity history.
  • https://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf
• Wide & Deep learning
  • Scenario: jointly trained wide linear models and deep neural networks, to combine the benefits of memorization and generalization for recommender systems.
  • https://pdfs.semanticscholar.org/aa9d/39e938c84a867ddf2a8cabc575ffba27b721.pdf
[Diagram: NCF network for a user-item pair. Select layers extract the user index and item index; the embedding layers are LookupTable (MF User), LookupTable (MF Item), LookupTable (MLP User) and LookupTable (MLP Item). MF branch: CMul. MLP branch: Concat, Linear 1, ReLU, Linear 2, ReLU. Output: Concat, Linear 3, Sigmoid.]
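The layer names in the NCF diagram (LookupTable, CMul, Concat, Linear, ReLU, Sigmoid) can be read as a forward pass. The following is a minimal, untrained sketch in plain Python, assuming the standard NCF layout in which the MF branch multiplies user and item embeddings element-wise and the MLP branch concatenates them. The sizes, random weights and helper names are assumptions for illustration; this is not BigDL code.

```python
import math
import random

random.seed(0)  # deterministic weights for the sketch

def embedding_table(rows, dim):
    """A lookup table: one small random vector per user or item index."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

def linear(in_dim, out_dim):
    """Random weights and zero bias for a fully connected layer."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def apply_linear(layer, x):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

N_USERS, N_ITEMS, MF_DIM, MLP_DIM = 5, 7, 4, 8  # hypothetical sizes

mf_user, mf_item = embedding_table(N_USERS, MF_DIM), embedding_table(N_ITEMS, MF_DIM)
mlp_user, mlp_item = embedding_table(N_USERS, MLP_DIM), embedding_table(N_ITEMS, MLP_DIM)
linear1 = linear(2 * MLP_DIM, 8)
linear2 = linear(8, 4)
linear3 = linear(MF_DIM + 4, 1)

def ncf_score(user, item):
    """Forward pass: MF branch (element-wise product) plus MLP branch."""
    mf = [u * i for u, i in zip(mf_user[user], mf_item[item])]  # CMul
    hidden = mlp_user[user] + mlp_item[item]                    # Concat
    hidden = relu(apply_linear(linear1, hidden))                # Linear 1, ReLU
    hidden = relu(apply_linear(linear2, hidden))                # Linear 2, ReLU
    logit = apply_linear(linear3, mf + hidden)[0]               # Concat, Linear 3
    return 1.0 / (1.0 + math.exp(-logit))                       # Sigmoid

score = ncf_score(user=2, item=3)
```

One network covers every user-item pair through shared embedding tables, so adding items grows the tables rather than the number of models.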

Challenges with Traditional ML: Heavily relies on human machine learning experts
Relies on humans to perform the following tasks:
• Select and construct appropriate features.
• Select an appropriate model family.
• Optimize model hyperparameters.
• Post-process machine learning models.
• Critically analyze the results obtained.
[Diagram: Data Source, Partitioning, Training / Validation / Testing Data Sets, Model 1, Model 2, …, Model n, Choose Best Model, Validate Model Metrics]
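The workflow in the diagram (partition the data, train several candidate models, choose the best, validate its metrics) is what experts usually script by hand. Below is a minimal sketch under stated assumptions: the synthetic data, the two toy candidate "model families" and the 60/20/20 split are all invented for illustration.

```python
import random

random.seed(7)

# Hypothetical labelled data: the label is 1 when the feature exceeds 0.6.
data = [(x, int(x > 0.6)) for x in (random.random() for _ in range(300))]

# Partition into training, validation and testing sets (60/20/20).
random.shuffle(data)
train, validation, test = data[:180], data[180:240], data[240:]

def accuracy(predict, rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

def fit_majority(rows):
    """Candidate 1: always predict the most common training label."""
    majority = int(sum(y for _, y in rows) * 2 >= len(rows))
    return lambda x: majority

def fit_threshold(rows):
    """Candidate 2: pick the cut-off that best separates the training labels."""
    cutoffs = [i / 20 for i in range(21)]
    best = max(cutoffs, key=lambda t: accuracy(lambda x: int(x > t), rows))
    return lambda x: int(x > best)

candidates = {"majority": fit_majority, "threshold": fit_threshold}

# Train every candidate, then choose the best one on the validation set.
models = {name: fit(train) for name, fit in candidates.items()}
best_name = max(models, key=lambda name: accuracy(models[name], validation))

# Validate the chosen model's metric on the held-out testing set.
test_accuracy = accuracy(models[best_name], test)
```

Every choice here (candidate list, cut-off grid, split ratios) is a manual decision, which is the dependency on experts that the slide highlights.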

With Deep Learning: More options for finding an optimally performing, robust configuration
Improvements
• Common neural network "tricks", including initialization, L2 and dropout regularization, batch normalization and gradient checking
• A variety of optimization algorithms, such as mini-batch gradient descent, Momentum, RMSprop and Adam
• Optimization-as-a-service using an ensemble of optimization strategies, allowing practitioners to optimize models faster and more cheaply than standard approaches
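As a small illustration of two of the optimization algorithms named on this slide, the sketch below applies Momentum and Adam update rules to a one-dimensional toy loss. The loss function, step counts and hyperparameter values are assumptions chosen only to keep the example short.

```python
import math

def gradient(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd_momentum(w, steps, lr=0.1, beta=0.9):
    """Gradient descent with a velocity term that accumulates past gradients."""
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + gradient(w)
        w -= lr * velocity
    return w

def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-step scaling by bias-corrected first and second moments."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = gradient(w)
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_momentum = sgd_momentum(0.0, steps=500)
w_adam = adam(0.0, steps=500)
```

Both runs start at 0 and should settle near the minimum at w = 3; swapping one optimizer for another is a configuration change rather than a modeling change.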

Our Exploration & Evaluation Journey

Enterprise requirements for Deep Learning
Seamless integration with internal and external products
• Add deep learning capabilities to existing analytic applications and/or machine learning workflows rather than rebuild all of them
Collocated with mass data storage
• Analyze large amounts of data on the same Big Data clusters where the data is stored (HDFS, HBase, Hive, etc.) rather than move or duplicate data
Shared infrastructure with multi-tenant isolated resources
• Leverage existing Big Data clusters; deep learning workloads should be managed and monitored alongside other workloads (ETL, data warehouse, traditional ML, etc.) rather than run standalone in separate clusters
Data governance with restricted processing
• Follow data privacy, regulation and compliance requirements (such as PCI/PII compliance and GDPR) rather than operate on data in unsecured zones

Challenges and limitations to Production considering some “Super Stars”…
• Claims that GPU computing is better than CPU, which requires new hardware infrastructure (normally a very long timeline)
• Success requires many engineer-hours (impossible to install a TensorFlow cluster at STAGE ...)
• Low-level APIs with a steep learning curve (where is your PhD degree?)
• Not well integrated with other enterprise tools, and needs data movement (could not leverage the existing ETL, data warehousing and other analytics-related data pipelines, technologies and tool sets; duplicating data pipelines and copying data is also a big challenge for capacity and performance)
• Tedious and fragile to distribute computations (less monitoring)
• Concerns about enterprise maturity and InfoSec (using a GPU cluster with TensorFlow from Google Cloud)
…
Maybe not your story, but we have ....

What does Spark offer?
Integrations with existing DL libraries
• Deep Learning Pipelines (from Databricks)
• Caffe (CaffeOnSpark)
• Keras (Elephas)
• mxnet
• Paddle
• TensorFlow (TensorFlow on Spark, TensorFrames)
• CNTK (mmlspark)
Implementations of DL on Spark
• BigDL
• DeepDist
• DeepLearning4J
• SparkCL
• SparkNet

Need more breakdown …
TensorFlow-on-Spark (or Caffe-on-Spark) uses Spark executors (tasks) to launch TensorFlow/Caffe instances in the cluster; however, the distributed deep learning (e.g., training, tuning and prediction) is performed outside of Spark (across multiple TensorFlow or Caffe instances).
(1) As a result, TensorFlow/Caffe still runs on specialized hardware (such as GPU servers interconnected by InfiniBand), and the OpenMP implementations in TensorFlow/Caffe conflict with the JVM threading in Spark (resulting in lower performance).
(2) In addition, in this case TensorFlow/Caffe can only interact with the rest of the analytics pipelines in a very coarse-grained fashion (running as standalone jobs outside of the pipeline, and using HDFS files as job input and output).

Project                          Programming interface   Contributors   Commits
BigDL                            Scala & Python          50             2221
TensorflowOnSpark                Python                  9              257
Databricks/tensor                Python                  9              185
Databricks/spark-deep-learning   Python                  8              51
Statistics collected on Mar 5th, 2018

POC: Benchmark BigDL & Spark MLlib
[Diagram: Spark pipeline for the proof of concept. Recoverable labels follow.]
• Data: Parquet files loaded into Spark DataFrames; training data from 10~12 months of raw transactions plus negative samples; test data from 1~2 months
• Spark ML Pipeline stages: pre-processing, feature selection, simple feature engineering, sampled partitions, post processing
• Train multiple models: Train ALS Model (MLlib), Train NCF Model (BigDL), Train Wide and Deep Model (BigDL)
• Neural Recommender using BigDL NCF / Wide And Deep (Estimator, Transformer)
• Model candidates, model evaluation & fine-tuning, model ensemble, inference, predictions
• Benchmarks: User-Merchant, User-Category, User-Geo, User-Merchant-Geo, …

Benchmark results (> 100 rounds)

Metric         MLlib ALS   BigDL NCF   BigDL WAD
AUROC          A           A+23%       A+20% (3% down)
AUPRC          B           B+31%       B+30% (1% down)
recall         C           C+18%       C+12% (4% down)
precision      D           D+47%       D+49% (2% up)
20 precision   E           E+51%       E+54% (3% up)

Parameters
MLlib ALS: MaxIter(100), RegParam(0.01), Rank(200), Alpha(0.01)
BigDL NCF: MaxEpoch(10), learningRate(3e-2), learningRateDecay(3e-7), uOutput(100), mOutput(200), batchSize(1.6 M)
BigDL WAD: MaxEpoch(10), learningRate(1e-2), learningRateDecay(1e-7), uOutput(100), mOutput(200), batchSize(0.6 M)
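For readers unfamiliar with the ranking metrics in this benchmark, here is a minimal sketch of how precision, recall and a top-k precision can be computed for one user, plus the relative-gain arithmetic behind figures such as "+23%". Reading "20 precision" as precision over the top 20 recommendations is an assumption, and the merchant IDs and metric values are made up.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the true set."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def precision_at_k(ranked, relevant, k):
    """Precision over only the k highest-ranked recommendations."""
    return precision_recall(ranked[:k], relevant)[0]

def relative_gain(baseline, candidate):
    """Relative improvement of a candidate metric over a baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline

ranked_merchants = ["m4", "m1", "m9", "m2", "m7"]  # hypothetical model output
visited_merchants = {"m1", "m2", "m3"}             # hypothetical ground truth

precision, recall = precision_recall(ranked_merchants, visited_merchants)
p_at_2 = precision_at_k(ranked_merchants, visited_merchants, k=2)
gain = relative_gain(baseline=0.40, candidate=0.50)
```

Reporting gains relative to a baseline, as the slide does with A, B, C, D and E, lets results be shared without disclosing absolute metric values.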

Beyond the deep learning library, we need more automated platform capabilities to close the PROD adoption gaps

Gap 1: Incremental Tuning
Incremental tuning (re-run the whole pipeline only on incrementally changed datasets, such as daily changed transactions, and benchmark the models)
• Refresh the dimensional datasets (such as adding new users, items …)
• Load the history model into the context and update the incremental parts of the model based on the incremental data sets
• Periodic re-training with a batch algorithm and time-series prediction
• Benchmark the history model against the updated model and on-board the better one
[Diagram: Periodic Incremental Tuning. Labels: Incremental Fact and Incremental Dimensional data, Ingest, Incremental Set, Lookups Refresher, History Model, Model Loader, Model Fine Tuner, Models, Benchmark, Incremental Fine Tuning & Benchmark]
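The incremental tuning loop described for Gap 1 can be sketched as: load the history model, fold in only the new rows (including previously unseen users), benchmark both versions, and on-board the better one. The "model" below is a deliberately trivial per-user average, and all names and data are hypothetical stand-ins for the real deep learning models.

```python
import copy

def update(model, transactions):
    """Fold new (user, amount) rows into running per-user counts and totals."""
    for user, amount in transactions:
        count, total = model.get(user, (0, 0.0))  # unseen users are added here
        model[user] = (count + 1, total + amount)
    return model

def predict(model, user, default=0.0):
    count, total = model.get(user, (0, 0.0))
    return total / count if count else default

def mean_abs_error(model, holdout):
    return sum(abs(predict(model, u) - amount) for u, amount in holdout) / len(holdout)

# Hypothetical data: a history model trained earlier, plus one day of new rows.
history_model = update({}, [("u1", 10.0), ("u1", 14.0), ("u2", 30.0)])
incremental_rows = [("u1", 18.0), ("u3", 50.0)]
holdout = [("u1", 15.0), ("u2", 30.0), ("u3", 48.0)]

# Load the history model and update only the incremental parts.
updated_model = update(copy.deepcopy(history_model), incremental_rows)

# Benchmark both models and on-board the better one.
history_error = mean_abs_error(history_model, holdout)
updated_error = mean_abs_error(updated_model, holdout)
production_model = updated_model if updated_error < history_error else history_model
```

Keeping the history model untouched until the benchmark finishes means a worse update never replaces the model already in production.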

Gap 2: Model Serving to multiple contexts
Model serving (connect to existing business pipelines: offline, streaming and real-time)
• Build the model serving capability by exporting the model to scoring/prediction/recommendation services and integration points
• Integrate the model serving services inside the business pipelines, for example by embedding them into Spark jobs for offline, Spark Streaming jobs for streaming, and the real-time “dialogue” with Kafka messaging …
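One way to picture serving a model to multiple contexts is a single scoring function wrapped by thin adapters for batch, streaming and request/response use. The sketch below uses plain Python lists, generators and dicts as stand-ins for Spark jobs, Spark Streaming micro-batches and Kafka messages; the model weights and function names are assumptions, not an existing serving API.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical exported model: a weight per feature name.
EXPORTED_MODEL: Dict[str, float] = {"visits": 0.5, "spend": 0.01}

def score(record: Dict[str, float]) -> float:
    """Single scoring entry point shared by every serving context."""
    return sum(weight * record.get(name, 0.0) for name, weight in EXPORTED_MODEL.items())

def serve_offline(records: List[Dict[str, float]]) -> List[float]:
    """Batch context: score a full dataset in one pass."""
    return [score(r) for r in records]

def serve_streaming(batches: Iterable[List[Dict[str, float]]]) -> Iterator[List[float]]:
    """Streaming context: score micro-batches lazily as they arrive."""
    for batch in batches:
        yield [score(r) for r in batch]

def serve_realtime(message: Dict[str, float]) -> Dict[str, float]:
    """Real-time context: answer one request message with one reply."""
    return {"score": score(message)}

records = [{"visits": 2, "spend": 100.0}, {"visits": 4, "spend": 50.0}]
offline_scores = serve_offline(records)
streamed_scores = list(serve_streaming([records[:1], records[1:]]))
reply = serve_realtime(records[0])
```

Because all three adapters call the same `score` function, the offline, streaming and real-time paths cannot drift apart in how they apply the model.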

Gap 3: Build user-friendly high-level pipeline APIs
High-level pipeline APIs
• Abstract and refine high-level data and learning pipeline APIs on top of the BigDL library to simplify the deep learning model assembly process and increase productivity
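To illustrate what a high-level pipeline API buys over raw low-level calls, here is a minimal sketch of a chainable pipeline abstraction. The `Pipeline` class, its `add` and `run` methods and the toy stages are hypothetical and written for this example; they are not part of BigDL or Spark.

```python
from typing import Callable, List, Sequence, Tuple

class Pipeline:
    """Hypothetical high-level API: chain named stages, then run them in order."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Callable]] = []

    def add(self, name: str, stage: Callable) -> "Pipeline":
        self.stages.append((name, stage))
        return self  # returning self allows fluent chaining

    def run(self, data):
        for _name, stage in self.stages:
            data = stage(data)
        return data

def drop_missing(rows: Sequence[dict]) -> List[dict]:
    return [r for r in rows if r.get("amount") is not None]

def to_features(rows: Sequence[dict]) -> List[List[float]]:
    return [[float(r["amount"]), float(r["visits"])] for r in rows]

def toy_model(features: Sequence[Sequence[float]]) -> List[float]:
    # Stand-in for a trained deep learning model; the weights are made up.
    return [0.01 * amount + 0.5 * visits for amount, visits in features]

pipeline = (
    Pipeline()
    .add("clean", drop_missing)
    .add("features", to_features)
    .add("predict", toy_model)
)

predictions = pipeline.run([
    {"amount": 100, "visits": 2},
    {"amount": None, "visits": 5},
])
```

A user assembles the flow by naming stages rather than wiring tensors and layers directly, which is the productivity gain the slide is after.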

Gap 4: Integrated with end-to-end data pipelines, fill in the loop
Embed the deep learning process into existing enterprise data pipelines
• Build pre-defined templates and customized processors to bring the deep learning process into the existing enterprise data pipelines, including batch, streaming and real-time

Gap 5: AI Pipelines promotion with automated CI/CD deployment
Design and implement pipelines in a visual workbench
[Diagram: Pipeline Designer producing AI Pipelines and Flows for Biz. A, Biz. B, Biz. C, Biz. D, Biz. E and Biz. F]
Pipelines promotion
[Diagram labels: Local Dev, Dev, Sandbox, Prod(s), Stage; Configuration Management (Tag / Branches); Pipeline Registry; Generate AI Pipelines; Deployment sequences; Continuous integration (Parameter, template)]
Automate deployment with CI/CD pipelines

Easier to build end-to-end analytics + AI applications
• Reference use cases
• Anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction, etc.
• Predefined models
• Object detection, image classification, text classification, recommendations, GAN, etc.
• Feature engineering & transformations
• Image, text, speech, 3D imaging, time-series, etc.
• High level pipeline APIs
• Dataframes, ML Pipelines, autograd, transfer learning, Keras/Keras2, etc.
https://github.com/intel-analytics/analytics-zoo
Community improvements: Analytics Zoo -> Unified Analytics + AI Platform for Spark and BigDL

Thanks
Q & A
