SlideShare a Scribd company logo
Feature
Preprocessing
Raw Data
Feature
Selection
Feature
Model
Selection
Data Cleaning
Automated Machine Learning in Python
PyCon JP 2019
AI Lab
Python AutoML
Feature
Preprocessing
Raw Data
Feature
Selection
Feature
Model
Selection
Data Cleaning
CyberAgent AI Lab
Masashi SHIBATA
c-bata c_bata_
Python
Feature
Preprocessing
Feature
Selection
Model
Selection
Parameter
Optimization
Model
Validation
Data Cleaning
Feature
Preprocessing
Feature
Selection
Model
Selection
Parameter
Optimization
Model
Validation
Data Cleaning
1
2
3
4
Automated Feature Engineering
AutoML
Automated Hyperparameter Optimization
Automated Algorithm(Model) Selection
Feature
Preprocessing
Feature
Selection
Feature
Construction
Model
Selection
Parameter
Optimization
Model
Validation
Data Cleaning
Topic 1
AutoML


Jeff Dean, “An Overview of Google's Work on AutoML and Future Directions” , ICML 2019 

https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions


Automated Hyperparameter Optimization

Hyperopt, Optuna, SMAC3, scikit-optimize, …
Jeff Dean, “An Overview of Google's Work on AutoML and Future Directions” , ICML 2019 

https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions
Jeff Dean, “An Overview of Google's Work on AutoML and Future Directions” , ICML 2019 

https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions


HPO + Automated Feature Engineering

featuretools, tsfresh, boruta, …


Automated Algorithm(Model) Selection

Auto-sklearn, TPOT, H2O, auto_ml, MLBox, …
Jeff Dean, “An Overview of Google's Work on AutoML and Future Directions” , ICML 2019 

https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions
Feature
Preprocessing
Feature
Selection
Feature
Construction
Model
Selection
Parameter
Optimization
Model
Validation
Data Cleaning
Topic 2
Grid Search / Random Search
PythonとAutoML at PyConJP 2019
Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions,

with application to active user modeling and hierarchical reinforcement learning. 2010. arXiv:1012.2599.
Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions,

with application to active user modeling and hierarchical reinforcement learning. 2010. arXiv:1012.2599.


: 

:
Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions,

with application to active user modeling and hierarchical reinforcement learning. 2010. arXiv:1012.2599.






Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions,

with application to active user modeling and hierarchical reinforcement learning. 2010. arXiv:1012.2599.




Jamieson, K. G. and Talwalkar, A. S.: Non-stochastic Best Arm Identification

and Hyperparameter Optimization, in AIS-TATS (2016).
10 epochs
trial #1
trial #2
trial #3
trial #4
trial #5
trial #6
trial #7
trial #8
trial #9
30 epochs 90 epochs
Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina,
Moritz Hardt, Benjamin Recht, and Ameet Talwalkar.

Massively parallel hyperparameter tuning. arXiv preprint arXiv:1810.05934, 2018.


PythonとAutoML at PyConJP 2019
PythonとAutoML at PyConJP 2019
PythonとAutoML at PyConJP 2019
TPE, Asynchronous Successive Halving, Median Stopping Rule
Define-by-Run
https://github.com/pfnet/optuna
PythonとAutoML at PyConJP 2019
rung 0
0.088 0.056 0.035 0.027
0.495
0.122
0.150
0.788
0.093
0.115
0.238
0.106
0.104 0.058
trial 0
trial 4
trial 2
trial 6
trial 24
trial 1
trial 8
trial 5
trial 18
trial 3
trial 7
rung 1 rung 2 rung 3
rung


10 worker 









scikit-optimize


Feature
Preprocessing
Feature
Selection
Feature
Construction
Model
Selection
Parameter
Optimization
Model
Validation
Data Cleaning
Topic 3
1. Feature Preprocessing Operators.

StandardScaler, RobustScaler,
MinMaxScaler, MaxAbsScaler,
RandomizedPCA, Binarizer, and
PolynomialFeatures.

2. Feature Selection Operators:
VarianceThreshold, SelectKBest,
SelectPercentile, SelectFwe, and
Recursive Feature Elimination (RFE).
AutoML feature preprocessing
M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter.

Efficient and robust automated machine learning.

In Neural Information Processing Systems (NIPS), 2015
R. S. Olson and J. H. Moore. Tpot: A tree-based pipeline optimization

tool for automating machine learning.

In Workshop on Automatic Machine Learning, 2016
TPOT Auto-sklearn
1. Feature Preprocessing Operators.

StandardScaler, RobustScaler,
MinMaxScaler, MaxAbsScaler,
RandomizedPCA, Binarizer, and
PolynomialFeatures.

2. Feature Selection Operators:
VarianceThreshold, SelectKBest,
SelectPercentile, SelectFwe, and
Recursive Feature Elimination (RFE).
AutoML feature preprocessing
M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter.

Efficient and robust automated machine learning.

In Neural Information Processing Systems (NIPS), 2015
R. S. Olson and J. H. Moore. Tpot: A tree-based pipeline optimization

tool for automating machine learning.

In Workshop on Automatic Machine Learning, 2016
TPOT Auto-sklearn
TPOT Auto-sklearn AutoML
• featuretools: Deep feature synthesis

• tsfresh

Wrapper methods, Filter methods, Embedded methods

• scikit-learn Boruta
Feature
Preprocessing
Feature
Selection
Model
Selection
Parameter
Optimization
Model
Validation
Data Cleaning
PythonとAutoML at PyConJP 2019
PythonとAutoML at PyConJP 2019








63 

794
Time Series FeatuRE extraction based on Scalable Hypothesis tests
https://github.com/blue-yonder/tsfresh
Feature
Preprocessing
Feature
Selection
Model
Selection
Parameter
Optimization
Model
Validation
Data Cleaning
Guyon and A. Elisseeff. An introduction to variable and feature selection.

Journal of Machine Learning Research, 3:1157–1182, 2003.
Filter method
Wrapper method
sklearn.feature_selection.RFE(Recursive Feature Elimination), Boruta (boruta_py)
Embedded method
scikit-learn feature_importances_
Guyon and A. Elisseeff. An introduction to variable and feature selection.

Journal of Machine Learning Research, 3:1157–1182, 2003.
Feature
Preprocessing
Feature
Selection
Feature
Construction
Model
Selection
Parameter
Optimization
Model
Validation
Data Cleaning
Topic 4
( )
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
• AutoML 2 

• ML 

• ML 

• 

AutoML as a CASH Problem
Combined Algorithm Selection and Hyperparameter optimization
M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter.

Efficient and robust automated machine learning. In Neural Information Processing Systems (NIPS), 2015
Using Optuna for CASH Problems
def objective(trial):
iris = sklearn.datasets.load_iris()
x, y = iris.data, iris.target
classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
if classifier_name == 'SVC':
svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
classifier_obj = sklearn.svm.SVC(C=svc_c, gamma='auto')
else:
rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
classifier_obj = sklearn.ensemble.RandomForestClassifier(
max_depth=rf_max_depth, n_estimators=10)
score = sklearn.model_selection.cross_val_score(
classifier_obj, x, y, n_jobs=-1, cv=3)
accuracy = score.mean()
return accuracy
https://github.com/pfnet/optuna/blob/v0.16.0/examples/sklearn_simple.py
Optuna for CASH Problem
def objective(trial):
iris = sklearn.datasets.load_iris()
x, y = iris.data, iris.target
classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
if classifier_name == 'SVC':
svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
classifier_obj = sklearn.svm.SVC(C=svc_c, gamma='auto')
else:
rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
classifier_obj = sklearn.ensemble.RandomForestClassifier(
max_depth=rf_max_depth, n_estimators=10)
score = sklearn.model_selection.cross_val_score(
classifier_obj, x, y, n_jobs=-1, cv=3)
accuracy = score.mean()
return accuracy
https://github.com/pfnet/optuna/blob/v0.16.0/examples/sklearn_simple.py
Algorithm Selection
Optuna for CASH Problem
def objective(trial):
iris = sklearn.datasets.load_iris()
x, y = iris.data, iris.target
classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
if classifier_name == 'SVC':
svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
classifier_obj = sklearn.svm.SVC(C=svc_c, gamma='auto')
else:
rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
classifier_obj = sklearn.ensemble.RandomForestClassifier(
max_depth=rf_max_depth, n_estimators=10)
score = sklearn.model_selection.cross_val_score(
classifier_obj, x, y, n_jobs=-1, cv=3)
accuracy = score.mean()
return accuracy
https://github.com/pfnet/optuna/blob/v0.16.0/examples/sklearn_simple.py
Hyperparameter optimization
• auto-sklearn

• TPOT

• h2o-3

• auto_ml (unmaintained)

• MLBox
Adithya Balaji, Alexander Allen Benchmarking Automatic Machine Learning Frameworks https://arxiv.org/pdf/1808.06492v1.pdf
AutoML
SMAC3
ChaLearn AutoML challenge 2 track
Auto-sklearn
import sklearn.metrics
import autosklearn.classification
X_train, X_test, y_train, y_test = train_test_split(…)
automl = autosklearn.classification.AutoSklearnClassifier(…)
automl.fit(X_train.copy(), y_train.copy(), dataset_name='breast_cancer')
print(automl.show_models())
predictions = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
https://github.com/automl/auto-sklearn
TPOT
https://github.com/EpistasisLab/tpot
Tree-based Pipeline Optimization Tool for Automating Data Science


TPOT
https://github.com/EpistasisLab/tpot
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(…)
tpot = TPOTClassifier(verbosity=2, max_time_mins=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_iris_pipeline.py')
• 20 preprocessors

• 16 feature selectors,

• 1-hot encoding, missing
value imputation,
balancing, scaling

• 17 classifiers

• pre-defined
hyperparameter spaces
• 20 preprocessors

• 12 classifiers

• pre-defined
hyperparameter spaces

• Pipeline search space:

• Flexible: combining
tree-shaped pipelines
Auto-sklearn TPOT
Automated Neural
Architecture Search
Feature
Preprocessing
Feature
Selection
Model
Selection
Parameter
Optimization
Model
Validation
Data Cleaning


THANK YOU


PythonとAutoML at PyConJP 2019





More Related Content

PDF
異常行動検出入門(改)
PDF
MP3と音声圧縮(simple)
PDF
PRML Chapter5.2
PDF
PRML輪読#4
PDF
PRML 5.3-5.4
PDF
OpenAI の音声認識 AI「Whisper」をテストしてみた
PDF
PRML上巻勉強会 at 東京大学 資料 第5章5.1 〜 5.3.1
PDF
[DL輪読会]画像を使ったSim2Realの現況
異常行動検出入門(改)
MP3と音声圧縮(simple)
PRML Chapter5.2
PRML輪読#4
PRML 5.3-5.4
OpenAI の音声認識 AI「Whisper」をテストしてみた
PRML上巻勉強会 at 東京大学 資料 第5章5.1 〜 5.3.1
[DL輪読会]画像を使ったSim2Realの現況

What's hot (20)

PDF
Feature Importance Analysis with XGBoost in Tax audit
PDF
Scikit-Learn: Machine Learning in Python
PPTX
PDF
サポートベクターマシン(SVM)の数学をみんなに説明したいだけの会
PPTX
古典的ゲームAIを用いたAlphaGo解説
PDF
PRML輪読#1
PPTX
Mixed Precision Training
PDF
物体検知(Meta Study Group 発表資料)
PDF
用 C# 與 .NET 也能打造機器學習模型:你所不知道的 ML.NET 初體驗
PPTX
Nltk
PDF
10分でわかる主成分分析(PCA)
PPTX
Prml 最尤推定からベイズ曲線フィッティング
PDF
bag-of-words models
PPTX
2014 3 13(テンソル分解の基礎)
PDF
Optimizer入門&最新動向
PDF
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
PDF
Neural text-to-speech and voice conversion
PPTX
adversarial training.pptx
PDF
RTBにおける機械学習の活用事例
PDF
モデル最適化指標・評価指標の選び方
Feature Importance Analysis with XGBoost in Tax audit
Scikit-Learn: Machine Learning in Python
サポートベクターマシン(SVM)の数学をみんなに説明したいだけの会
古典的ゲームAIを用いたAlphaGo解説
PRML輪読#1
Mixed Precision Training
物体検知(Meta Study Group 発表資料)
用 C# 與 .NET 也能打造機器學習模型:你所不知道的 ML.NET 初體驗
Nltk
10分でわかる主成分分析(PCA)
Prml 最尤推定からベイズ曲線フィッティング
bag-of-words models
2014 3 13(テンソル分解の基礎)
Optimizer入門&最新動向
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
Neural text-to-speech and voice conversion
adversarial training.pptx
RTBにおける機械学習の活用事例
モデル最適化指標・評価指標の選び方
Ad

Similar to PythonとAutoML at PyConJP 2019 (20)

PPTX
Ember
PPTX
Paper summary
PPTX
Driverless Machine Learning Web App
PPTX
Towards Human-Guided Machine Learning - IUI 2019
PDF
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
PDF
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
PPTX
CSL0777-L07.pptx
PDF
Using Bayesian Optimization to Tune Machine Learning Models
PDF
Using Bayesian Optimization to Tune Machine Learning Models
PDF
AI Library - An Open Source Machine Learning Framework
PPTX
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
PDF
AutoML - The Future of AI
PDF
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
PPTX
Getting Started with Azure AutoML
PPTX
AutoML.pptx
PPTX
AutoML.pptx
PPTX
An introduction to Machine Learning with scikit-learn (October 2018)
PDF
Guiding through a typical Machine Learning Pipeline
PDF
DutchMLSchool. ML Automation
PPTX
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Ember
Paper summary
Driverless Machine Learning Web App
Towards Human-Guided Machine Learning - IUI 2019
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
CSL0777-L07.pptx
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
AI Library - An Open Source Machine Learning Framework
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
AutoML - The Future of AI
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
Getting Started with Azure AutoML
AutoML.pptx
AutoML.pptx
An introduction to Machine Learning with scikit-learn (October 2018)
Guiding through a typical Machine Learning Pipeline
DutchMLSchool. ML Automation
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Ad

More from Masashi Shibata (20)

PDF
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
PDF
実践Djangoの読み方 - みんなのPython勉強会 #72
PDF
CMA-ESサンプラーによるハイパーパラメータ最適化 at Optuna Meetup #1
PDF
サイバーエージェントにおけるMLOpsに関する取り組み at PyDataTokyo 23
PDF
Implementing sobol's quasirandom sequence generator
PDF
DARTS: Differentiable Architecture Search at 社内論文読み会
PDF
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
PDF
Djangoアプリのデプロイに関するプラクティス / Deploy django application
PDF
Django REST Framework における API 実装プラクティス | PyCon JP 2018
PDF
Django の認証処理実装パターン / Django Authentication Patterns
PDF
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
PDF
システムコールトレーサーの動作原理と実装 (Writing system call tracer for Linux/x86)
PDF
Golangにおける端末制御 リッチなターミナルUIの実現方法
PDF
How to develop a rich terminal UI application
PDF
Introduction of Feedy
PDF
Webフレームワークを作ってる話 #osakapy
PDF
Pythonのすすめ
PDF
pandasによるデータ加工時の注意点やライブラリの話
PDF
Pythonistaのためのデータ分析入門 - C4K Meetup #3
PDF
テスト駆動開発入門 - C4K Meetup#2
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
実践Djangoの読み方 - みんなのPython勉強会 #72
CMA-ESサンプラーによるハイパーパラメータ最適化 at Optuna Meetup #1
サイバーエージェントにおけるMLOpsに関する取り組み at PyDataTokyo 23
Implementing sobol's quasirandom sequence generator
DARTS: Differentiable Architecture Search at 社内論文読み会
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
Djangoアプリのデプロイに関するプラクティス / Deploy django application
Django REST Framework における API 実装プラクティス | PyCon JP 2018
Django の認証処理実装パターン / Django Authentication Patterns
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
システムコールトレーサーの動作原理と実装 (Writing system call tracer for Linux/x86)
Golangにおける端末制御 リッチなターミナルUIの実現方法
How to develop a rich terminal UI application
Introduction of Feedy
Webフレームワークを作ってる話 #osakapy
Pythonのすすめ
pandasによるデータ加工時の注意点やライブラリの話
Pythonistaのためのデータ分析入門 - C4K Meetup #3
テスト駆動開発入門 - C4K Meetup#2

Recently uploaded (20)

PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PPTX
oil_refinery_comprehensive_20250804084928 (1).pptx
PDF
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
PPTX
Business Acumen Training GuidePresentation.pptx
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PDF
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
PPT
ISS -ESG Data flows What is ESG and HowHow
PDF
Mega Projects Data Mega Projects Data
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PPTX
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
PDF
Fluorescence-microscope_Botany_detailed content
PDF
annual-report-2024-2025 original latest.
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PDF
Lecture1 pattern recognition............
IB Computer Science - Internal Assessment.pptx
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
oil_refinery_comprehensive_20250804084928 (1).pptx
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
Business Acumen Training GuidePresentation.pptx
Introduction-to-Cloud-ComputingFinal.pptx
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
ISS -ESG Data flows What is ESG and HowHow
Mega Projects Data Mega Projects Data
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
Acceptance and paychological effects of mandatory extra coach I classes.pptx
STUDY DESIGN details- Lt Col Maksud (21).pptx
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
Fluorescence-microscope_Botany_detailed content
annual-report-2024-2025 original latest.
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
Lecture1 pattern recognition............

PythonとAutoML at PyConJP 2019