Self-driving computers: active learning workflows with human-interpretable vector spaces
About Skymind
Founded: 2014
Funding: 6.5 million USD
Investors: Y Combinator, Tencent
Customers: Over 25 large enterprises and governments; DL4J gets over 160,000 downloads a month
Employees: Around 40 (mostly engineers, including PhDs)
Production is part of your Training Set
• Edge cases exist in your data
• Imbalanced classes are a problem
• Data/Trends can change over time
• Expanded scope of problem due to
unforeseen difficulties or new business
problem
Why is this Important?
Crashes Happen
What can we do?
Human in The Loop
• Allow humans to have input
• Use Deep Learning to create friendly
vector spaces to inspect
• Use probabilities from models and
decision boundaries to control behavior
• More thorough data analysis to
understand outliers
• Human helps update models
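One way to read the "probabilities and decision boundaries" bullet is as a routing rule: accept the model's answer when it is confident and far from the boundary, otherwise queue the example for a person. The sketch below is only an illustration of that idea; the function name and both thresholds are assumptions, not values from the talk.

```python
def route_prediction(probs, accept_threshold=0.9, margin_threshold=0.2):
    """Decide whether to trust the model or ask a human.

    probs maps class label -> predicted probability.
    Returns ("auto", label) or ("human_review", label).
    Both thresholds are hypothetical and should be tuned per problem.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0

    confident = best_p >= accept_threshold
    # A small gap between the top two classes means the example sits
    # close to the decision boundary.
    far_from_boundary = (best_p - runner_up_p) >= margin_threshold

    if confident and far_from_boundary:
        return ("auto", best_label)
    return ("human_review", best_label)


decisions = [
    route_prediction({"cat": 0.97, "dog": 0.03}),
    route_prediction({"cat": 0.55, "dog": 0.45}),
]
print(decisions)
```

Examples sent to review, once labeled, can be added back to the training set, which is how the human ends up helping to update the model.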
Friendly Vector Spaces
• Word Embeddings
• Transfer Learning Feature Extractors
• Autoencoder Bottlenecks as an
embedding space
Word Embeddings
FastText, Word2vec
Word Embeddings: A 2 minute primer
• Do an SGD variant on co-located pairs
of words, minimizing a distance function
between the two words
• Run sparse SGD updates on various
rows (each word is a row)
• Various ways of computing accuracy
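The primer above can be sketched in a few lines of plain Python. This version uses skip-gram with negative sampling, which is one common choice and not necessarily the variant the talk had in mind; the function name, hyperparameters, and toy corpus are all assumptions for illustration. Note that each step touches only the rows of the words involved, which is what "sparse updates" means here.

```python
import math
import random


def train_embeddings(sentences, dim=8, window=2, negatives=3,
                     lr=0.05, epochs=50, seed=0):
    """Skip-gram with negative sampling on tokenised sentences."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    # One row per word. w_in holds the embeddings we keep.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))

    for epoch in range(epochs):
        for sent in sentences:
            ids = [index[w] for w in sent]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    context = ids[ctx_pos]
                    # One co-located pair plus a few random negatives.
                    targets = [(context, 1.0)]
                    for _ in range(negatives):
                        neg = rng.randrange(len(vocab))
                        if neg != context:
                            targets.append((neg, 0.0))
                    center_update = [0.0] * dim
                    for t, label in targets:
                        score = sum(a * b for a, b in zip(w_in[center], w_out[t]))
                        g = lr * (label - sigmoid(score))
                        for d in range(dim):
                            center_update[d] += g * w_out[t][d]
                            w_out[t][d] += g * w_in[center][d]  # only row t changes
                    for d in range(dim):
                        w_in[center][d] += center_update[d]     # only row center changes
    return {w: w_in[i] for w, i in index.items()}


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vectors = train_embeddings(corpus, dim=4, epochs=5)
print(sorted(vectors))
```

On a corpus this small the vectors are not meaningful; the point is the shape of the update loop.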
Transfer Learning
Transfer Learning: A 2 minute primer
• Download a pre-trained neural net
architecture (usually a CNN)
• Tune the final layer if doing classification
• Otherwise just use the feature extractor as
a compression algorithm for high-
dimensional images
• Intuition is similar to the layer-wise
pretraining of older networks
Autoencoders
• Join raw data
• Transform
• Feed groups into autoencoder
and save reconstruction error
of center
[Figure: input data and its reconstruction]
Auto-Encoder learning process
Learns to cover more of vector space over time as
reconstruction error goes down
Auto-Encoders: A 2 minute primer
• Minimize KL Divergence (see previous
slide) between reconstruction and input
• Learn a bottleneck low dimensional
vector for use in other algos or
visualizations
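To make the bottleneck idea concrete, here is a minimal sketch of a linear autoencoder trained with plain SGD. It is an illustration under stated assumptions: it minimizes squared reconstruction error (a common stand-in; the slide names KL divergence, which suits inputs treated as distributions), and the function name, learning rate, and toy data are invented for the example.

```python
import random


def train_linear_autoencoder(data, code_dim=1, lr=0.01, epochs=200, seed=0):
    """Fit x -> z (bottleneck) -> reconstruction with squared error."""
    rng = random.Random(seed)
    n_in = len(data[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(code_dim)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_dim)] for _ in range(n_in)]

    def encode(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in enc]

    def decode(z):
        return [sum(w * zi for w, zi in zip(row, z)) for row in dec]

    def mean_error():
        total = 0.0
        for x in data:
            r = decode(encode(x))
            total += sum((ri - xi) ** 2 for ri, xi in zip(r, x))
        return total / len(data)

    history = [mean_error()]
    for _ in range(epochs):
        for x in data:
            z = encode(x)
            r = decode(z)
            err = [ri - xi for ri, xi in zip(r, x)]
            # Gradient w.r.t. the code, taken before the decoder changes.
            grad_z = [sum(err[i] * dec[i][k] for i in range(n_in))
                      for k in range(code_dim)]
            for i in range(n_in):
                for k in range(code_dim):
                    dec[i][k] -= lr * err[i] * z[k]
            for k in range(code_dim):
                for j in range(n_in):
                    enc[k][j] -= lr * grad_z[k] * x[j]
        history.append(mean_error())
    return encode, decode, history


# Toy data: 3-D points that lie close to a single line.
rng = random.Random(1)
data = []
for _ in range(100):
    t = rng.uniform(-1, 1)
    data.append([t + rng.gauss(0, 0.01),
                 2 * t + rng.gauss(0, 0.01),
                 -t + rng.gauss(0, 0.01)])

encode, decode, history = train_linear_autoencoder(data)
print("error before:", history[0], "after:", history[-1])
```

The one-dimensional output of `encode` is the low-dimensional vector that downstream clustering, nearest-neighbor search, or visualization would consume.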
Different kinds of auto-encoders
• Variational Autoencoders
• GANs (there are thousands; I am not
covering them all here):
https://github.com/kozistr/Awesome-GANs
Commonalities
• Latent vector spaces automatically
learned via SGD
• Low dimension vectors meant to be
consumed externally
Various ways of consuming
Consuming
• K-means
• KNN algorithms
• Visualization (UMAP, t-SNE, LargeVis)
K-means
Using K-means
• Tune with target number of classes
• Use as a way of seeing how the neural
net groups your data into classes
• Pseudo labeling mechanism
• Key: Run on latent vector space
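As a rough illustration of pseudo-labeling with K-means, the sketch below clusters a handful of vectors and returns one cluster id per vector. The vectors here are made-up stand-ins for latent vectors produced by an encoder; the function name, iteration count, and seed are likewise assumptions.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical latent vectors: two well-separated groups of ten.
latent = [(0.1 * i, 0.1 * (i % 3)) for i in range(10)]
latent += [(10 + 0.1 * i, 10 + 0.1 * (i % 3)) for i in range(10)]

pseudo_labels, centers = kmeans(latent, k=2)
print(pseudo_labels)
```

Setting `k` to the expected number of classes makes the cluster ids usable as candidate labels for a human to confirm or correct.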
KNN
Various kinds of KNN
• RPTrees (a neighbor of my neighbors is
also likely my neighbor)
• VPTrees (segment the space into
quadrants; repeated updates using
trees to index the vector space)
• KDTrees
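For readers who have not seen one of these indexes, here is a compact sketch of a vantage-point tree in its usual textbook form: pick a vantage point, split the remaining points by their median distance to it, and prune branches during search with the triangle inequality. The class and function names are assumptions for this example and do not come from any particular library.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside


def build_vp_tree(points, rng=None):
    rng = rng or random.Random(0)
    if not points:
        return None
    points = list(points)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    dists = sorted(dist(vantage, p) for p in points)
    radius = dists[len(dists) // 2]  # median distance to the vantage point
    inside = [p for p in points if dist(vantage, p) < radius]
    outside = [p for p in points if dist(vantage, p) >= radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, rng), build_vp_tree(outside, rng))


def nearest(node, query, best=None):
    """Return (distance, point) for the nearest neighbour of query."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d < node.radius:
        best = nearest(node.inside, query, best)
        # The outside branch can only help if the search ball crosses the radius.
        if d + best[0] >= node.radius:
            best = nearest(node.outside, query, best)
    else:
        best = nearest(node.outside, query, best)
        if d - best[0] < node.radius:
            best = nearest(node.inside, query, best)
    return best


rng = random.Random(42)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
tree = build_vp_tree(points)
print(nearest(tree, (0.5, 0.5, 0.5)))
```

This search is exact. Approximate indexes trade some accuracy for speed, which usually matters more as the vector dimension grows.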
RPTrees
VPTrees
KDTrees
Visualization
Visualization
• UMAP
• Barnes-Hut t-SNE
• LargeVis
• All dimensionality reduction algorithms
focused on building a coordinate space
via similarities in the vector space
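The methods listed above differ in detail, but each starts from pairwise similarities in the high-dimensional space. As a rough sketch of that first step only, the block below computes t-SNE-style conditional affinities with a fixed Gaussian bandwidth. Real t-SNE tunes the bandwidth per point from a perplexity target and then optimizes a low-dimensional layout, neither of which is shown; the function name and the `sigma` default are assumptions.

```python
import math


def conditional_affinities(vectors, sigma=1.0):
    """p[i][j]: how strongly point i 'picks' j as a neighbour (rows sum to 1)."""
    n = len(vectors)
    p = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # a point is not its own neighbour
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
                weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        total = sum(weights)
        # Guard against every weight underflowing to zero.
        p.append([w / total if total > 0 else 0.0 for w in weights])
    return p


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
affinities = conditional_affinities(points)
for row in affinities:
    print([round(v, 4) for v in row])
```

A layout algorithm then places points in two or three dimensions so that similarities in the layout match these affinities as closely as possible.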
UMAP
Just use
this ->
Dataset troubleshooting
• Examine class imbalance
• Weighted loss functions
• Resampling
• Decision thresholds (mainly for binary
classification)
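The four checks above can each be expressed in a few lines. The sketch below is a hedged illustration in plain Python: the helper names are invented for this example, and the weighting formula n / (k * count) is the common "balanced" heuristic (the same one scikit-learn uses for `class_weight="balanced"`), not something stated in the talk.

```python
import math
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(y_true, p_pos, weights):
    """Binary cross-entropy where each example is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= weights[y] * (math.log(p) if y == 1 else math.log(1.0 - p))
    return total / len(y_true)


def oversample_minority(rows, labels, seed=0):
    """Resample with replacement until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for c, cnt in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == c]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))
            out_labels.append(c)
    return out_rows, out_labels


def predict_with_threshold(p_pos, threshold=0.5):
    """Lowering the threshold trades precision for recall on the positive class."""
    return [1 if p >= threshold else 0 for p in p_pos]


labels = [0] * 90 + [1] * 10
rows = list(range(100))
print("class counts:", Counter(labels))
print("weights:", class_weights(labels))
balanced_rows, balanced_labels = oversample_minority(rows, labels)
print("after resampling:", Counter(balanced_labels))
```

Weighted losses and resampling change what the model learns, while the decision threshold only changes how its probabilities are turned into labels, so the threshold can be adjusted after training without retraining.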
Integrations/Workflow
We achieve this workflow with Discover

Editor's Notes

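  • #3: My name is Hori, and I am the country manager of Skymind K.K. Our parent company, Skymind Inc., was founded in 2014. It is a software company specializing in deep learning and is headquartered in San Francisco. It employs many first-class engineers across Asia (centered on Japan), the Americas, and Europe.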
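  • #33: "First App" is a tool that visualizes and "diagnoses" deep learning data. It has two main features. (1) First, the UI is simple and easy to understand: even if you are not a data scientist, you can read off the quality of your data instantly just by importing it. (2) Second, it can be scaled inexpensively: because it is an app on an integrated platform with 24/7/365 support, you can scale up with confidence.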