Smart Apps @
Pivotal
Korea Big Data Roadshow – 05/04/2017
Dat Tran (Senior Data Scientist)
2
echo $(whoami)
Senior Data Scientist
@datitran
3
That’s me! A digital native…
4
My top three apps...
5
What do these apps have in common? They are smart…
6
Smart Photo Search
7
Traffic Prediction and ETA
8
Personalized Music Recommendations
9
What are Smart Apps?
[Figure: 2x2 matrix. One axis runs from "Less value for users" to "High value for users", the other from "Less value through data" to "High value through data". Labeled region: Smart Apps]
10
Why Smart Apps?
Users expect an app to be easy to use and personalized *
o An audience of one is the new normal
o Our smart apps adapt to a customer’s personal likes and dislikes
o By capturing the feedback, we continually improve
* The Rise of the Data Natives by Monica Rogati
11
Three Components of a Smart App
Data: Data of all sizes and formats, including audio and video, forms the backbone of a smart app
Smart System: A smart system uses data science to understand and predict user behavior
User Interface: An easy-to-use interface presents the results of the smart system to drive the desired action
Data + Smart System + User Interface → ACTIONS
12
How we do data science @ Pivotal
13
Data Science
Product Management
Product Design
Engineering
Continuous Improvement
14
Radically Agile Data Science @ Pivotal Labs
Pair Programming
Retros
Test Driven Development
Continuous Integration /
API First
Tracker
Standups
15
API First
Source: What is hardcore data science—in practice? (Mikio Braun)
16
Live Demo
17
Smart Data-Driven Apps
Handwritten Digit
Recognition
Next Likely Location
18
Use Cases
Largest Logistics Services Company in the Middle East
Delivery in a Land with No Post Codes
Customer: One of the largest logistics services companies in the Middle East
Problem: The Middle East has no postcodes, so addresses were confirmed by phoning every consignee
Data Science Solution: Use machine learning models trained on historic data to predict the right delivery location
Business Impact: Potentially saves the client ~$2.3M by avoiding phone calls
Technology and Data Overview
~5TB data for ~57M customers
DS Workflow Diagram
o Run DBSCAN on the GPS locations of each consignee
o Consignees with one location cluster: already sure about the cluster location
o Consignees with multiple location clusters: use multinomial logistic regression to predict the most likely cluster location
Built a personalized delivery model for each of ~57M customers
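As a rough illustration of the clustering step, the sketch below runs a minimal DBSCAN over one consignee's GPS points in plain Python. The haversine distance, the `eps_m` and `min_pts` values, and the sample coordinates are assumptions made for this example and do not come from the project; a real pipeline would more likely rely on a library implementation.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def dbscan(points, eps_m=100.0, min_pts=3):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Includes the point itself, so min_pts counts the point too.
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels

def needs_classifier(labels):
    """True when a consignee has more than one location cluster."""
    return len(set(labels) - {-1}) > 1

# Hypothetical delivery history for one consignee: two tight groups and one outlier.
history = [
    (25.2048, 55.2708), (25.2049, 55.2709), (25.2047, 55.2707),
    (25.0760, 55.1320), (25.0761, 55.1321), (25.0759, 55.1319),
    (25.1500, 55.2000),
]
labels = dbscan(history)
```

With this sample, the consignee ends up with two clusters, so `needs_classifier(labels)` is true and the consignee would go down the multinomial logistic regression branch.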
Impact
Potentially saving ~$2.3M
o No Phone Calls
o Call API to get delivery location
o App Integration
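What such an API call might compute can be sketched as the scoring step of a multinomial logistic regression: a softmax over per-cluster scores, followed by picking the most likely cluster. The features (`is_weekday`, `is_working_hours`), the weights, and the centroid coordinates below are all hypothetical values chosen for illustration, not the client's model.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_cluster(features, weights, biases):
    """Score every cluster and return (most likely cluster index, probabilities)."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Hypothetical per-consignee model with two location clusters.
# Feature order: [is_weekday, is_working_hours]
WEIGHTS = [[-1.0, -1.5],   # cluster 0, e.g. a home address
           [1.0, 1.5]]     # cluster 1, e.g. an office address
BIASES = [0.5, -0.5]
CENTROIDS = {0: (25.2048, 55.2708), 1: (25.0760, 55.1320)}

def delivery_location(is_weekday, is_working_hours):
    """Build the kind of response a delivery-location API could return."""
    cluster, probs = predict_cluster([is_weekday, is_working_hours], WEIGHTS, BIASES)
    return {"cluster": cluster,
            "probability": probs[cluster],
            "location": CENTROIDS[cluster]}
```

For example, `delivery_location(1, 1)` selects cluster 1 and `delivery_location(0, 0)` selects cluster 0 under these made-up weights.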
Large German Car
Manufacturer
25
Saving Human Lives with the IoT
Customer: A traditional German automobile manufacturer specializing in sports cars
Problem: Developed a new sensor that detects road conditions well, but it has no predictive elements and it is unclear how to operationalize it
Data Science Solution: Use machine learning models to predict certain road conditions based on weather data and other car data
Challenges: No technical infrastructure, and BDS not possible due to financial constraints
26
Cloud Native Architecture
[Diagram: λ-Architecture]
Spring XD: FTP Source, Shell Processor, Tap, S3 Sink for persistent storage, Redis Sink
Batch Layer: S3, Feature Engineering, Deep Learning, Model Pipeline Orchestration
Real-Time Layer: Predictive API, Redis for Pivotal CF, Dashboard, Enricher, Models
Data labels: Measured data, Weather data, JSON, CSV
Other labels: EC2, Model persistence
27
Spring Cloud Data Flow (Formerly Spring XD)
Unified, distributed, and extensible open-source system
for data ingestion, real time analytics, batch
processing and data export
o Data Ingestion and Pipeline Processing
o Real Time Analytics
o Rapid Dashboarding
o Batch Workflow Orchestration + ETL
[Diagram labels: Spring XD, FTP Source, Shell Processor, Tap, S3, S3 Sink for persistent storage, Redis Sink, Redis for Pivotal CF]
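The source, processor, tap, and sink roles named on this slide can be illustrated with plain Python generators. This is only a conceptual analogy, not Spring XD or Spring Cloud Data Flow syntax; the CSV record format, the position of the tap after the processor, and the use of lists in place of S3 and Redis are assumptions for the example.

```python
def ftp_source(lines):
    """Source: emits raw lines, standing in for files fetched over FTP."""
    yield from lines

def shell_processor(records):
    """Processor: hypothetical transformation of a CSV line into a dict."""
    for line in records:
        sensor_id, value = line.strip().split(",")
        yield {"sensor_id": sensor_id, "value": float(value)}

def tap(records, sink):
    """Tap: copies each record to a second sink without disturbing the main flow."""
    for record in records:
        sink.append(record)
        yield record

def run_stream(lines):
    """Wire source -> processor -> main sink, with a tap feeding a second sink."""
    s3_sink, redis_sink = [], []   # lists stand in for S3 and Redis
    for record in tap(shell_processor(ftp_source(lines)), redis_sink):
        s3_sink.append(record)
    return s3_sink, redis_sink
```

Calling `run_stream(["a,1.5\n", "b,2\n"])` delivers the same two parsed records to both sinks, which is the point of a tap: a second consumer sees the stream without changing the primary path.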
28
Batch Layer
[Diagram labels: S3, Feature Engineering, Deep Learning, Model Pipeline Orchestration, CSV]
29
Real-Time Layer
[Diagram labels: Predictive API, Redis for Pivotal CF, Dashboard, Enricher, Models, Measured data, Weather data, JSON, CSV]
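One way to picture the enricher in the real-time layer is a function that joins an incoming measured-data record with weather data and hands the combined features to a model. Everything specific in this sketch is hypothetical: the field names (`car_id`, `station_id`, `surface_wet`, `temp_c`), the condition labels, and the rule-based `predict_road_condition`, which only stands in for a trained model.

```python
import json

def enrich(measured_json, weather_by_station):
    """Merge one measured-data JSON record with the matching weather record."""
    record = json.loads(measured_json)
    weather = weather_by_station.get(record["station_id"], {})
    enriched = dict(record)
    # Prefix weather fields so they cannot clash with sensor fields.
    enriched.update({f"weather_{key}": value for key, value in weather.items()})
    return enriched

def predict_road_condition(features):
    """Stand-in for a trained model (assumption): a simple hand-written rule."""
    wet = features.get("surface_wet", False)
    if wet and features.get("weather_temp_c", 99.0) <= 0.0:
        return "icy"
    return "wet" if wet else "dry"

def handle(measured_json, weather_by_station):
    """Enrich a record, predict, and return a JSON response string."""
    features = enrich(measured_json, weather_by_station)
    return json.dumps({"car_id": features["car_id"],
                       "road_condition": predict_road_condition(features)})

# Hypothetical inputs.
weather = {"s1": {"temp_c": -2.0}, "s2": {"temp_c": 12.0}}
message = json.dumps({"car_id": "c42", "station_id": "s1", "surface_wet": True})
response = json.loads(handle(message, weather))
```

Keeping enrichment separate from prediction means the model only ever sees one flat feature record, regardless of how many data sources feed it.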
30
Pivotal Cloud Foundry PaaS
more…
http://
Push App> cf
Modeling
32
Short Introduction to Deep Learning
[Figure: neural network with Input Layer, Hidden Layer 1, Hidden Layer 2, and Output Layer; output classes: Positive, Neutral, Negative]
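The network on this slide (an input layer, two hidden layers, and a three-class output) can be sketched as a forward pass in plain Python. The layer sizes, the ReLU activation, the random untrained weights, and the input vector are assumptions for illustration; the slide only names the layers and the three output classes.

```python
import math
import random

LABELS = ["Positive", "Neutral", "Negative"]

def relu(z):
    return max(0.0, z)

def softmax(scores):
    """Turn raw output scores into class probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases; a trained network would learn these."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(inputs, weights, biases):
    """Weighted sum plus bias for every unit in a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Hidden layers use ReLU; the output layer uses softmax."""
    h = x
    for weights, biases in layers[:-1]:
        h = [relu(z) for z in linear(h, weights, biases)]
    weights, biases = layers[-1]
    return softmax(linear(h, weights, biases))

rng = random.Random(0)                    # fixed seed for repeatability
layers = [init_layer(4, 5, rng),          # input (4) -> hidden layer 1 (5)
          init_layer(5, 5, rng),          # hidden layer 1 -> hidden layer 2
          init_layer(5, 3, rng)]          # hidden layer 2 -> output (3 classes)
probs = forward([0.2, -0.1, 0.7, 0.0], layers)
prediction = LABELS[probs.index(max(probs))]
```

Because the weights are random, the predicted label here is meaningless; the sketch only shows how data flows through the layers.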
33
Types of Neural Network
Recurrent Neural Network
o Models dynamic temporal behaviors
o Many variants: LSTM, GRU, bidirectional RNNs, etc.
o Applications: handwriting and speech recognition and many more
Feed-Forward Neural Network
o Ideal for functional mapping problems
o Architectures: multi-layer perceptron, CNNs, etc.
o Many applications in supervised learning
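To make the difference concrete, here is a minimal sketch of a simple (Elman-style) recurrent step, where the hidden state from the previous time step feeds into the next one. The weights, the state size, and the input sequences are hypothetical values picked for the example.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One time step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(
            sum(wx * xi for wx, xi in zip(w_x[k], x))
            + sum(wh * hj for wh, hj in zip(w_h[k], h_prev))
            + b[k]
        )
        for k in range(len(b))
    ]

def rnn_forward(sequence, w_x, w_h, b):
    """Run the step over a whole sequence, starting from a zero hidden state."""
    h = [0.0] * len(b)
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Hypothetical weights: 1 input feature, 2 hidden units.
W_X = [[0.5], [-0.3]]
W_H = [[0.1, 0.2], [0.0, 0.4]]
B = [0.0, 0.0]

states_a = rnn_forward([[1.0], [0.0]], W_X, W_H, B)
states_b = rnn_forward([[0.0], [1.0]], W_X, W_H, B)
```

The two sequences contain the same values in a different order, yet their final hidden states differ. That order sensitivity is what lets a recurrent network model temporal behavior, which a feed-forward network treating each input independently does not capture.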
34
Recurrent Neural Network
o Image classification
o Image captioning
o Sentiment analysis
o Machine translation
o Video classification
Source: The Unreasonable Effectiveness of Recurrent Neural Networks (Andrej Karpathy)
35
Key Learnings
o Class imbalance problem
o Use simple RNNs over LSTM
o Use GPUs!
o Time-consuming to find the optimal network
o A lot of data is needed
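The slide does not say how the class imbalance was handled. One common mitigation, shown here purely as an assumption, is to weight each class inversely to its frequency so that rare classes count more in the training loss. The label names are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical training labels: the rare class is heavily outnumbered.
train_labels = ["dry"] * 8 + ["icy"] * 2
weights = balanced_class_weights(train_labels)
```

With eight "dry" and two "icy" samples, the rare class gets a weight four times larger than the common one, so each "icy" mistake costs more during training.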
36
Key Takeaways
37
Key Takeaways
o API First: Bringing the models into production as fast as
possible helps to minimize risk
o Clients can test it and give early/regular feedback
o Fast ROI
o Cloud Foundry enables us to reliably expose models as
scalable predictive APIs
o Cloud Native Data Science is crucial for Smart Apps
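A minimal sketch of what "exposing a model as a predictive API" can look like, using only the Python standard library. The `/predict` route, the request shape, and the `score` function (a logistic function over the feature sum, standing in for a trained model) are assumptions for illustration and not the team's actual service. Reading the port from the `PORT` environment variable follows the usual Cloud Foundry convention.

```python
import json
import math
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

def score(features):
    """Hypothetical stand-in for a trained model: logistic of the feature sum."""
    return 1.0 / (1.0 + math.exp(-sum(features)))

def handle_predict(body):
    """Validate a JSON request body and return (HTTP status, response dict)."""
    try:
        payload = json.loads(body)
        features = [float(v) for v in payload["features"]]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON like {\"features\": [1.0, 2.0]}"}
    return 200, {"prediction": score(features)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_predict(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def make_server():
    """Build the server without starting it; the platform supplies PORT."""
    port = int(os.environ.get("PORT", "8080"))
    return HTTPServer(("0.0.0.0", port), PredictHandler)
```

Calling `make_server().serve_forever()` starts the service. Keeping `handle_predict` free of HTTP details means the prediction logic can be tested without a running server, which fits the test-driven practice mentioned earlier in the deck.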
38
Questions?
@datitran
Smart App@Pivotal by Dat Tran
