AI Next Conference: 7/24/2019
Slide 1
Machine Learning and Automated Data Visualization
Ram Seshadri
July 2019
Slide 2
“Why is my Data Science team taking sooo long to complete a simple project?”
-- A Frustrated CIO
The Answer?
“Machine learning teams are still struggling to take advantage of ML due to challenges with inflexible frameworks, lack of reproducibility, collaboration issues, and immature software tools”
-- Cecelia Shao, Comet.ml
Slide 3
How can we make DATA SCIENTISTS more productive?
Faster Visualization: AutoViz
Automatic Feature Selection: Auto_ViML
Automatic Model Selection and Tuning: Auto_ViML
One Click Model Serving and Production: __
HOWEVER CURRENT TOOLS ARE LIMITED BECAUSE...
● They are proprietary and expensive (lock-in)
● They are Black Boxes which are too complex to interpret
● They offer very little reproducibility outside of the tool
INTRODUCING A SIMPLER APPROACH TO AUTO-ML
Auto_ViML was designed along with AutoViz to Build Variant Interpretable Machine Learning Models Fast!
Slide 4
What are AutoViz and Auto_ViML?
● Open Source Tools for Faster Time to Insights, with these Design Goals:
○ Simple: Invoke them with a single Line of Code (each)
○ Flexible: Suited to any kind of structured data set, with no Prep required
○ Incremental: Can be used by anyone, from beginners to experts
○ Experimental: Compare multiple visualization methods and models step by step
○ Interpretable: Get a clear explanation of the steps taken, with validation graphs
○ Reproducible: No Black Box. Reproducible model pipelines and outputs
○ Extensible: Open Source with contributions from the Python and DS community
I built AutoViz and Auto_ViML to make my own life easier.
I hope they will do the same for you.
Slide 5
What is AutoViz?
OVERVIEW
AutoViz enables you to automatically visualize any data set with a Single Line of Code. It automatically:
1. Selects a Random Sample from the Data Set (if the Data Set is very large)
2. Selects the most important features using ML (if the Number of Variables is very large)
3. Selects the Best Methods to Visualize Data for a given problem
4. Provides Charts that can be saved in PNG, JPG, and SVG Formats
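Step 1 of the overview (random sampling of very large data sets) can be sketched in plain Python. The slides do not show how AutoViz samples internally, so the helper name `sample_rows`, the `max_rows` threshold, and the fixed seed below are all assumptions made for illustration.

```python
import random


def sample_rows(rows, max_rows=1000, seed=42):
    """Return every row of a small data set, or a random sample of a large one.

    Hypothetical helper: the threshold and the seed are illustrative
    choices, not AutoViz's actual settings.
    """
    rows = list(rows)
    if len(rows) <= max_rows:
        return rows
    rng = random.Random(seed)  # a fixed seed keeps the sample reproducible
    return rng.sample(rows, max_rows)


small = sample_rows(range(10))
large = sample_rows(range(100_000))
print(len(small), len(large))
```

Fixing the seed is a design choice in the spirit of the "Reproducible" goal: the same data set always yields the same sample, and therefore the same charts.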
Slide 6
Why AutoViz?
BENEFITS
Systematic: Look for insights systematically rather than through “gut instinct” or domain knowledge
Simple: Reduce features to the most important ones to deliver simple yet powerful insights
Explainable: Help explain your hypotheses and variable selection better to others
Slide 7
How AutoViz Works
AutoViz PROCESS
Design Goals
● Variable Classification: AutoViz classifies features into highly granular data types to determine how best to represent them in Charts
● Problem Identification: AutoViz can visualize any dataset for a given target: Regression, Classification, Time Series, Clustering and more
● Complex Interactions: Most charts involve more than one variable, helping to deliver powerful insights with minimal effort
Implementation
● Select the Most Important Features: AutoViz uses the powerful ML algorithm XGBoost to select important features given the target variable
● Select the Best Charts: AutoViz selects the best ways to visualize your data so that you can extract insights from it
● Deliver them Fast!: AutoViz selects a statistically valid sample of the data to visualize (in case the data set is very large)
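The Variable Classification idea can be illustrated with a small standard-library sketch. The rules, the label names, and the `max_categories` cutoff are assumptions chosen for this example; AutoViz's own type categories are more granular and are not listed in the slides. The column names echo the Boston Housing variables, but the values are made-up toy data.

```python
def classify_column(values, max_categories=10):
    """Assign a coarse type label to one column of raw values.

    Hypothetical rules for illustration only; they are not AutoViz's rules.
    """
    distinct = set(values)
    if len(distinct) <= 2:
        return "boolean"
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_number:
        whole = all(float(v).is_integer() for v in values)
        if whole and len(distinct) <= max_categories:
            return "discrete numeric"
        return "continuous numeric"
    if len(distinct) <= max_categories:
        return "categorical"
    return "free text or ID"


# Toy values, made up for illustration.
columns = {
    "CHAS": [0, 1, 0, 0, 1, 0],
    "RAD": [1, 2, 3, 4, 5, 24],
    "RM": [6.5, 6.4, 7.1, 6.9, 7.2, 6.3],
    "ZONE": ["north", "south", "north", "east", "west", "west"],
}
labels = {name: classify_column(vals) for name, vals in columns.items()}
print(labels)
```

Once each column carries a label like this, choosing a chart becomes a lookup: for example, a bar chart for a categorical column against the target and a scatter plot for a continuous one.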
Slide 8
AutoViz: Boston Housing*
Github: https://github.com/AutoViML/AutoViz
Just Import...
import AutoViz_Class as AV
AVC = AV.AutoViz_Class()
And Run AutoViz.
# '' is the filename (left empty because the DataFrame df is passed in directly);
# sep is the file delimiter and target is the name of the target column
dft = AVC.AutoViz('', sep, target, df, lowess=True)
Results...
* Thanks to UCI Machine Learning Repository for all data sets in this presentation:
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Slide 9
AutoViz Example: Boston Housing*
INSIGHTS
● Number of Rooms and Median Value of Homes seem to be highly correlated
● As Age of Building increases, Median Value decreases, albeit slowly
● NOX and DIS seem to be highly correlated, though the relationship appears to be polynomial or otherwise non-linear
* Thanks to UCI Machine Learning Repository
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
Slide 10
AutoViz Example: Boston Housing
INSIGHTS
● Both CRIM and ZN are highly skewed
● Both may require a transformation
● PTRATIO and DIS seem to be somewhat skewed as well, but do not seem to require transformations
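The skewness insight can be checked numerically. The sketch below computes the population skewness (third standardized moment) of a column and shows how a log transform reduces it. The values are made-up, right-skewed toy numbers rather than the real CRIM column, and `log1p` is only one common choice of transformation; the slides do not say which transformation the author had in mind.

```python
import math


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy right-skewed values, made up for illustration (not the real CRIM data).
crim_like = [0.01, 0.02, 0.03, 0.05, 0.08, 0.2, 0.5, 1.5, 9.0, 70.0]
log_transformed = [math.log1p(v) for v in crim_like]

print(round(skewness(crim_like), 2), round(skewness(log_transformed), 2))
```

A symmetric column has a skewness near zero, so comparing the value before and after the transform is a quick way to judge whether the transformation helped.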
Slide 11
AutoViz Example: Boston Housing
INSIGHTS
● RM, LSTAT, TAX, INDUS, AGE, and CRIM seem to be decently correlated with the Target. It may be worth checking whether they come up as Important Features.
● Average Median Value of homes varies widely by CHAS and RAD. Hence they would likely be important features in any model.
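Ranking features by their correlation with the target, as in the insight about RM, LSTAT and the others, can be sketched as follows. This is a minimal standard-library illustration using Pearson correlation on toy columns; the column names `up`, `down`, and `flat` are hypothetical, and the slides do not state which correlation measure AutoViz plots.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy columns, made up for illustration.
target = [1, 2, 3, 4, 5]
features = {
    "flat": [3, 1, 3, 1, 3],    # no linear relationship with the target
    "down": [5, 4, 3, 2, 2],    # strong negative relationship
    "up": [2, 4, 6, 8, 10],     # perfect positive relationship
}

# Rank by absolute correlation so strong negative relationships count too.
ranked = sorted(
    features, key=lambda name: abs(pearson(features[name], target)), reverse=True
)
print(ranked)
```

Pearson correlation only captures linear relationships, which is why a non-linear pair such as NOX and DIS is better judged from the scatter plot itself.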
Slide 13
How to Build a better model?
PROCESS
BUILD A ViML Model!
(VARIANT INTERPRETABLE MACHINE LEARNING MODEL, Step by Step)
● Remove Low Information and Redundant Features
● Add Polynomial, Interaction, and Other Features
● Select Models from Simple to Complex and Perform Tuning
● Add Entropy Binning, Stacking, and K-Means Featurizers to the model
● Add Imbalanced sampling and training
● Perform Ensembling of Multiple Types of models
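The step "Remove Low Information and Redundant Features" can be sketched with two simple rules: drop columns that never vary, and drop a column when it is almost perfectly correlated with one already kept. Both rules, the `corr_limit` threshold, and the helper name `prune_features` are assumptions for illustration; the slides do not describe the criteria Auto_ViML applies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def prune_features(columns, corr_limit=0.95):
    """Return the names of the columns worth keeping (hypothetical helper)."""
    kept = {}
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # low information: the column never varies
        if any(abs(pearson(values, other)) > corr_limit for other in kept.values()):
            continue  # redundant: nearly a copy of a column already kept
        kept[name] = values
    return list(kept)


# Toy columns, made up for illustration.
columns = {
    "size": [1, 2, 3, 4, 5],
    "size_scaled": [10, 20, 30, 40, 50],  # perfectly correlated with "size"
    "constant": [7, 7, 7, 7, 7],
    "noise": [3, 1, 3, 1, 3],
}
print(prune_features(columns))
```

Because columns are examined in order, the first member of a correlated pair is the one that survives; a real pipeline would more likely keep whichever member is more predictive of the target.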
Slide 15
Why Auto_ViML?
BENEFITS
SYSTEMATIC: Auto_ViML was designed from the ground up to mimic how a Data Scientist would approach a Modeling Problem.
FEATURE ENGINEERING: Enables selective model complexity by adding features and complexity step by step
TRANSPARENCY: Provides Deep Insights into the Data Set with Full Transparency
AUTOMATIC FEATURE SELECTION: Models with Fewer Features result in Simpler Models. Auto_ViML produces models with 10-90% Fewer Features than Regular Models without Significant Loss of Predictive Power*
MULTIPLE MODELS: Build and test multiple models through Hyperparameter Tuning and Cross Validation
* Based on my experience. Your results may vary.
Slide 16
Auto_ViML LETS YOU TRY MULTIPLE APPROACHES
You can access all the powerful features of Auto_ViML with one line of Python Code after you import it.
You can turn features and flags on and off to see how they impact the Model.
TRY MULTIPLE APPROACHES TO GET THE BEST MODEL
● INTERACTIONS vs. NO INTERACTIONS
● BOOSTING vs. BAGGING
● ENSEMBLING vs. STACKING
● IMBALANCED vs. BALANCED
● GRIDSEARCH vs. RANDOM
● SHAP vs. FEATURE IMPORTANCES
Just like a Data Scientist would...
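The GRIDSEARCH vs. RANDOM trade-off can be shown without any ML library. In this sketch `toy_score` stands in for a cross-validated model score, and the parameter names `depth` and `rate` are hypothetical; nothing here reflects how Auto_ViML implements its search internally. Grid search evaluates every combination, while random search evaluates a fixed number of randomly drawn ones.

```python
import itertools
import random


def toy_score(depth, rate):
    """Stand-in for a cross-validated score; it peaks at depth=4, rate=0.1."""
    return -((depth - 4) ** 2) - 100 * (rate - 0.1) ** 2


grid = {"depth": [2, 4, 6, 8], "rate": [0.01, 0.1, 0.3]}


def grid_search(grid):
    """Evaluate every combination in the grid and return the best one."""
    combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
    return max(combos, key=lambda params: toy_score(**params)), len(combos)


def random_search(grid, n_iter=5, seed=0):
    """Evaluate n_iter randomly drawn combinations and return the best one."""
    rng = random.Random(seed)
    combos = [
        {name: rng.choice(options) for name, options in grid.items()}
        for _ in range(n_iter)
    ]
    return max(combos, key=lambda params: toy_score(**params)), len(combos)


best_grid, n_grid = grid_search(grid)
best_rand, n_rand = random_search(grid)
print(best_grid, n_grid)
print(best_rand, n_rand)
```

Grid search is exhaustive, so its cost grows with the product of the option counts; random search caps the cost at `n_iter` evaluations but may miss the best combination.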
Slide 17
Auto_ViML: Boston Housing
Github: https://github.com/AutoViML/Auto_ViML
Just Import...
from Auto_ViML import Auto_ViML
And Run Auto_ViML.
# hyper_param='GS' selects grid search for tuning.
# Note: 'f1' is a classification metric and Imbalanced_Flag targets imbalanced classes.
model, features, trainm, testm = Auto_ViML(train,
                                           target, test,
                                           sample_submission='',
                                           hyper_param='GS',
                                           scoring_parameter='f1',
                                           Boosting_Flag=None,
                                           KMeans_Featurizer=False,
                                           Add_Poly=0,
                                           Stacking_Flag=False,
                                           Binning_Flag=False,
                                           Imbalanced_Flag=True,
                                           verbose=0)
Get Model, Features and transformed Train and Test data...
Slide 18
Auto_ViML: Boston Housing*
Here is an example of a Regression data set: Boston Housing*. There are 13 predictors in the dataset, but Auto_ViML finds that only 10 variables are needed to get the job done. Also watch the Feature Importances.
DATA SET SIZE: 506 x 14
TIME TAKEN: 6 secs
VARIABLES SELECTED: 10
FEATURE REDUCTION: 24%
Results: Start with Linear Model
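The slide does not say how the feature reduction figure is calculated. One natural reading, assumed here, is the share of the original predictors that were dropped. With 13 predictors and 10 selected, that works out to roughly 23%, close to the 24% reported on the slide.

```python
def feature_reduction(n_original, n_selected):
    """Percentage of the original predictors that were dropped (assumed formula)."""
    return 100 * (n_original - n_selected) / n_original


print(round(feature_reduction(13, 10), 1))  # 23.1
```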
Slide 19
Auto_ViML: Boston Housing
Results: Move to Random Forests
Time Taken = 30 seconds
Slide 20
Auto_ViML: Boston Housing
Results: Close with XGBoost
Slide 21
Slide 22
Auto_ViML: Boston Housing
Multiple Models
● Linear Model with Interaction Variables
● Ensemble Model with Binning
● Forests Model with Binning Numerics
● XGBoost Model with Stacking
Slide 23
Slide 24
Auto_ViML: Wisconsin Breast Cancer
The Wisconsin Breast Cancer* data set is a classic Data Set. Auto_ViML took 12 Seconds to find the best features and the best model, with a Weighted F1 score of 100% on the validation set using a Linear model.
DATA SET SIZE: 512 x 32
TIME TAKEN: 12 Secs
FEATURE REDUCTION: 52%
MACRO AVERAGE ROC AUC: 100%
Results: Compare the results to another model using Deep Learning and Keras: “Hyperparameter Optimization with Keras” by Mikko (Link)
* Thanks to UCI Machine Learning Repository
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
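For readers unfamiliar with the Weighted F1 score quoted above, here is a minimal sketch of the usual definition: compute F1 for each class, then average the per-class scores weighted by how many true samples each class has. The labels and predictions are made-up toy data, not results from the Wisconsin data set.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's true count."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        count / total * f1_for_class(y_true, y_pred, label)
        for label, count in support.items()
    )


# Toy labels, made up for illustration.
y_true = ["benign", "benign", "benign", "malignant"]
y_pred = ["benign", "benign", "malignant", "malignant"]
print(round(weighted_f1(y_true, y_pred), 3))
```

A score of 100% therefore means every validation sample in every class was predicted correctly.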
Slide 25
Next Steps for AutoViz and Auto_ViML...
Auto_ViML
● What’s Missing / Could be Improved:
○ No Feature Engineering: You can create your own features or use kits like featuretools, etc.
○ No Image/Video/NLP Support: At the moment, it removes these features from model consideration
○ No Time Series modeling: Auto_TimeSeries is in the works. Stay Tuned.
○ No Neural Networks or Deep Learning: You can add your own modules or use tools like Ludwig
○ Model serving: A module for test data transformation still needs to be added
AutoViz
● What’s Missing / Could be Improved:
○ Build it into Existing Tools so that structured data can be Visualized Fast!
○ Build it into Educational tools to help Students and Colleges (where small, structured datasets are the Norm) Visualize data, as writing code is still very hard for Students
○ Add additional Visualizations such as Pie Charts, Mosaic Charts, etc.
○ Build it into Industrial Instruments such as IoT tools so that large data sets can be visualized
Slide 27
THANK YOU

Auto visualization and viml
