Deeplearning in production
the Data Engineer part
1
2
whoami
Scauglog
Data Engineer, Xebia
init project
3
Team Astro
Product Owner, Scrum Master, Data Scientists, Data Engineers, Machine Learning Engineers
4
Choose your model
5
And the winner is
XGBoost
6
7
And the winner is
8
What is Deep Learning?
9
What is a Deep Learning model?
10
Choose your Framework
▼ Distributed Prediction
▼ Can create complex networks
▼ Documentation
▼ Community
11
And the winner is
12
Wait
13
And the winner is
14
How to Deep Learn
15
Train Workflow
Input Data → train → model
16
Predict Workflow
Input Data + model → predict → Output Data
17
18
Preprocessing
▼ Scaling (normalisation, min-max, …)
▼ Replace nulls
▼ Lagging (shift past values into the features; see the sketch below)
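As a rough illustration of the three steps above, here is a minimal sketch; the row types, the scaler and the lag handling are hypothetical placeholders, not the project's actual code. The key point is that the scaler fitted during training must be persisted and reused at prediction time.
// Hypothetical types for the sketch
case class RawRow(value: Option[Double])          // input row
case class PreprocessRow(features: Array[Float])  // preprocessed row
case class MinMaxScaler(min: Double, max: Double) {
  def transform(x: Double): Double = (x - min) / (max - min)
}

def preprocess(rows: Seq[RawRow], lag: Int): (Seq[PreprocessRow], MinMaxScaler) = {
  // replace nulls (None) with a default value
  val filled = rows.map(_.value.getOrElse(0.0))
  // fit the scaler on the training data; save it alongside the model for prediction
  val scaler = MinMaxScaler(filled.min, filled.max)
  val scaled = filled.map(scaler.transform)
  // lagging: each output row carries the `lag` previous values as its features
  val lagged = scaled.sliding(lag).map(w => PreprocessRow(w.map(_.toFloat).toArray)).toSeq
  (lagged, scaler)
}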
19
Train Workflow
Input Data → preprocess (fits the scaler) → train → model
20
Predict Workflow
Input Data → preprocess (reuses the scaler) → predict (model) → Output Data
21
Predict at scale
22
Scaling Prediction: naïve approach
// Imports shared by the prediction snippets below
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.rdd.RDD
import org.deeplearning4j.util.ModelSerializer
import org.nd4j.linalg.factory.Nd4j

def predict(input: RDD[PreprocessRow], modelPath: String): RDD[SinglePredictionRow] = {
  input.map { row =>
    // load the model from HDFS again for every single row
    val hdfs = FileSystem.get(new Configuration())
    val source = new Path(modelPath)
    val model = ModelSerializer.restoreComputationGraph(hdfs.open(source), true)
    // make a prediction for this row only
    val prediction = model.output(row.features)(0).getColumn(0).toFloatVector
    // return the prediction
    SinglePredictionRow.fromPreprocessRow(row, prediction(0))
  }
}
23
Scaling Prediction: faster
def predict(input: RDD[PreprocessRow], modelPath: String): RDD[SinglePredictionRow] = {
  // lazy vals: the model is loaded on first access inside each task,
  // i.e. once per task instead of once per row
  lazy val hdfs = FileSystem.get(new Configuration())
  lazy val source = new Path(modelPath)
  lazy val model = ModelSerializer.restoreComputationGraph(hdfs.open(source), true)
  input.map { row =>
    // still one forward pass per row
    val prediction = model.output(row.features)(0).getColumn(0).toFloatVector
    // return the prediction
    SinglePredictionRow.fromPreprocessRow(row, prediction(0))
  }
}
24
Scaling Prediction: fastest
def predict(input: RDD[PreprocessRow], modelPath: String): RDD[SinglePredictionRow] = {
  // load the model lazily, once per task
  lazy val hdfs = FileSystem.get(new Configuration())
  lazy val source = new Path(modelPath)
  lazy val model = ModelSerializer.restoreComputationGraph(hdfs.open(source), true)
  input.mapPartitions { partition =>
    val partitionSeq = partition.toSeq
    if (partitionSeq.isEmpty) {
      val emptySeq: Seq[SinglePredictionRow] = Seq()
      emptySeq.toIterator
    } else {
      // batch the whole partition into a single NDArray and run one forward pass
      val features = partitionSeq.map(_.features).reduce((x, y) => Nd4j.concat(0, x, y))
      val predictions = model.output(features)(0).getColumn(0).toFloatVector
      partitionSeq.zip(predictions).map { case (row, prediction) =>
        SinglePredictionRow.fromPreprocessRow(row, prediction)
      }.toIterator
    }
  }
}
25
OOM
▼ ND4J arrays are C++ off-heap objects
▼ Cache your dataframe or check the stage details to estimate memory usage
▼ Set spark.yarn.executor.memoryOverhead
▼ Use an ND4J workspace to properly manage memory deallocation
▼ Repartition your dataframe before prediction to ensure equally sized partitions
▼ Set spark.sql.shuffle.partitions (see the configuration sketch below)
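A minimal sketch of those settings; the values and the input path are placeholders to adapt to your cluster, not the project's actual configuration.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dl4j-prediction")
  // extra room outside the JVM heap for ND4J's off-heap allocations (in MB)
  .config("spark.yarn.executor.memoryOverhead", "4096")
  // number of partitions after shuffles, hence the batch size handled by each task
  .config("spark.sql.shuffle.partitions", "200")
  .getOrCreate()

// repartition before prediction so that every task gets a comparable number of rows
val preprocessed = spark.read.parquet("/path/to/preprocessed")  // hypothetical path
val balanced = preprocessed.repartition(200)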
26
OOM
def predict(input: RDD[PreprocessRow], modelPath: String, numFeatures: Int, timeSteps: Int): RDD[SinglePredictionRow] = {
  // load model ...
  // ND4J workspace: off-heap memory is reused and released deterministically per scope
  lazy val basicConfig: WorkspaceConfiguration = WorkspaceConfiguration.builder().initialSize(0)
    .policyLearning(LearningPolicy.NONE).policyAllocation(AllocationPolicy.STRICT).build()
  lazy val workspace = Nd4j.getWorkspaceManager.getAndActivateWorkspace(basicConfig, "myWorkspace")
  input.mapPartitions { partition =>
    val partitionSeq = partition.toSeq
    if (partitionSeq.isEmpty) {
      val emptySeq: Seq[SinglePredictionRow] = Seq()
      emptySeq.toIterator
    } else {
      workspace.notifyScopeEntered()
      // build one (batch, features, timeSteps) tensor for the whole partition
      val features = Nd4j.create(partitionSeq.flatMap(_.features).toArray, Array(partitionSeq.size, numFeatures, timeSteps))
      val predictions = model.output(false, workspace, features)(0).toFloatVector
      workspace.notifyScopeLeft()
      partitionSeq.zip(predictions).map { case (row, prediction) =>
        SinglePredictionRow.fromPreprocessRow(row, prediction)
      }.toIterator
    }
  }
}
27
Compile
▼ Maven
▼ -Djavacpp.platform=linux-x86_64 (bundle only the Linux x86_64 native binaries)
▼ Exclude unused modules (see the pom sketch below)
▽ deeplearning4j-datasets
▽ deeplearning4j-datavec-iterators
▽ deeplearning4j-ui-components
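A hedged sketch of the Maven side, assuming these modules come in transitively through deeplearning4j-core; adapt the exclusions to your actual dependency tree. The platform property is passed on the command line, e.g. mvn package -Djavacpp.platform=linux-x86_64.
<!-- Hypothetical pom.xml fragment, not the project's actual build file -->
<dependency>
  <groupId>org.deeplearning4j</groupId>
  <artifactId>deeplearning4j-core</artifactId>
  <version>${dl4j.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-datasets</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-datavec-iterators</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.deeplearning4j</groupId>
      <artifactId>deeplearning4j-ui-components</artifactId>
    </exclusion>
  </exclusions>
</dependency>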
28
Training at scale
29
Retrain again and again and again...
▼ Model performance declines over time
▼ Hyperparameter tuning
▼ A Deep Learning model rarely comes alone (clustering)
30
Predict Workflow
Input Data → preprocess (scaler) → Preprocessed Data → predict (model) → Output Data
31
Train Workflow
Preprocessed Data → train → model
32
Training at scale: AWS EC2
33
Training at scale: AWS EC2
▼ Create a VPC
▼ Create a Subnet associated with the VPC
▼ Create an IGW associated with the VPC
▼ Create a route table associated with the IGW
▼ Create a Security Group associated with the VPC
▽ Authorize SSH only from my IP
▼ Create a key pair
▼ Create an EC2 instance with an EBS volume
34
Training at scale: AWS EC2
▼ Add the SSH keys of team members
▼ Install CUDA, cuDNN, NCCL and configure them
▼ Deploy the training jar to the EC2 instance
▼ Deploy the training pipeline to the EC2 instance
▼ Deploy the preprocessed data to the EC2 instance
▼ Deploy an auto-shutdown script
35
Training at scale: AWS EC2
▼ Ansible
▼ Transfer the preprocessed data to S3
▼ Store the model in S3
▼ Check CPU vs GPU training time
▼ Keep track of training config and performance
▼ Share knowledge with the Data Scientists
▼ Put your data on the EBS volume if it fits
36
Training with DL4J: Lessons learned
▼ Beware of tensor order
▼ Prefetch data in memory (InMemoryDatasetIterator)
▼ Add listeners to monitor your training compute performance
▼ Use the UI (see the sketch below)
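A minimal sketch of the listeners and the training UI, assuming a ComputationGraph named model and the deeplearning4j-ui dependency on the classpath; the reporting frequencies are arbitrary.
import org.deeplearning4j.nn.graph.ComputationGraph
import org.deeplearning4j.optimize.listeners.{PerformanceListener, ScoreIterationListener}
import org.deeplearning4j.ui.api.UIServer
import org.deeplearning4j.ui.stats.StatsListener
import org.deeplearning4j.ui.storage.InMemoryStatsStorage

def monitor(model: ComputationGraph): Unit = {
  val statsStorage = new InMemoryStatsStorage()
  UIServer.getInstance().attach(statsStorage)  // training UI served on port 9000 by default
  model.setListeners(
    new ScoreIterationListener(10),            // log the score every 10 iterations
    new PerformanceListener(10),               // log samples/sec and batches/sec
    new StatsListener(statsStorage)            // feed the training UI
  )
}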
37
Keras to DL4J
def execute(config: Config): Unit = {
  // import the Keras .h5 model (architecture + weights) without enforcing the training config
  val kerasModel = KerasModelImport.importKerasModelAndWeights(config.kerasModelPath, false)
  ModelSaver.writeModel(kerasModel, config.outputModelPath)
}
▼ Data Scientists love Keras
▼ Keras is easier to import in a notebook
▼ Training with Keras is faster
▼ Keras is compatible with cloud training (SageMaker, Cloud ML)
38
Train Workflow
Preprocessed Data → train → model.h5 → Keras to DL4J → model.zip
39
Monitoring
40
Monitoring: mlflow
41
Monitoring: mlflow
42
▼ Ensure your training machine can reach the mlflow server
▼ Keep track of your experiments
▽ Training parameters
▽ Performance
▼ Compare results (a logging sketch follows below)
▼ (model repository, standardized model packaging, easy deployment)
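One way to log from a Scala training job is the MLflow Java client (org.mlflow:mlflow-client); a minimal, hypothetical sketch with placeholder server, experiment, parameters and metric values.
import org.mlflow.tracking.MlflowClient
import org.mlflow.api.proto.Service.RunStatus

// the training machine must be able to reach this tracking server
val client = new MlflowClient("http://mlflow.example.com:5000")
val run = client.createRun("0")                         // "0" = default experiment
client.logParam(run.getRunId, "learning_rate", "0.001") // training parameters
client.logParam(run.getRunId, "epochs", "50")
client.logMetric(run.getRunId, "val_rmse", 0.42)        // performance metric
client.setTerminated(run.getRunId, RunStatus.FINISHED)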
Monitoring: Zeppelin
43
Monitoring: Zeppelin
44
▼ Already in HDP
▼ Authentication
▼ Scheduling
▼ Report View
▼ Auto shutdown
▼ Can mix sources (Scala, JDBC, C*, …)
▼ API to automate deployment
45
Thank you for your attention
Any questions ?
▼ Data Science to Production: https://youtu.be/Gr2SS0xv0xE
▼ DL4J: https://youtu.be/QfnCcPcZogI
▼ Zeppelin: https://youtu.be/w78gZW6BQJI
▼ cloudML: https://youtu.be/oDpBRdjwNik
46