[AWS Innovate Online Conference] Building High-Performance Machine Learning Models with Just Simple Python Code - Muhyun Kim, AWS Sr. Data Scientist
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Adding ML features to your application
using AutoGluon with no ML expertise
Muhyun Kim
Senior Data Scientist
Amazon ML Solutions Lab, AWS
Common ML problems
Regression, Classification
Tabular Prediction
Image Classification
Object Detection
Text Classification
How to solve ML problems
Required skills in building ML models
Data science for feature engineering
Machine Learning and Deep Learning for modeling
Model tuning experience
ML and DL toolkits such as scikit-learn, TensorFlow, PyTorch, MXNet
Do I need to learn machine learning or deep learning, and an ML/DL framework,
as an application developer or data analyst?
AutoML
“Automated machine learning (AutoML) is the process of automating the
process of applying machine learning to real-world problems. AutoML covers the
complete pipeline from the raw dataset to the deployable machine learning
model.” - Wikipedia
Automated Machine Learning provides methods and processes to make
Machine Learning available for non-Machine Learning experts, to improve
efficiency of Machine Learning and to accelerate research on Machine
Learning. - AutoML.org
- Hyperparameter optimization
- Meta-learning (learning to learn)
- Neural architecture search
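As a rough illustration of the first item, hyperparameter optimization can be as simple as random search: sample configurations, score each one, and keep the best. The sketch below uses only the Python standard library. The `toy_score` objective and the search space are made up for illustration and stand in for a real train-and-validate run; this is not how any particular AutoML toolkit implements its search.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Sample n_trials configurations from `space` and return the best one.

    `space` maps each hyperparameter name to a list of candidate values.
    `objective` takes a config dict and returns a score (higher is better).
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Hypothetical objective: pretend validation accuracy peaks at
# learning_rate=0.1 and depth=6. A real objective would train a model.
def toy_score(config):
    return 1.0 - abs(config["learning_rate"] - 0.1) - 0.01 * abs(config["depth"] - 6)

space = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}
best_config, best_score = random_search(toy_score, space, n_trials=50)
print(best_config, best_score)
```

Fixing the seed makes the search repeatable, which helps when comparing runs.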
AutoGluon - AutoML Toolkit for Deep Learning
https://autogluon.mxnet.io
AutoGluon for “All”
Developers/Analysts
with no ML skills
Automating the entire ML pipeline
- feature engineering
- model selection
- model training
- hyperparameter optimization
ML Experts
• Quick model prototyping for baseline
• Hyperparameter optimization
• Optimizing custom models
Researchers
• Model optimization
• Searching for new architectures
What you can do with AutoGluon
• Quick prototyping that achieves state-of-the-art performance for
  • Tabular prediction
  • Image classification
  • Object detection
  • Text classification
• Customizing model searching
• Hyperparameter optimization on model training in Python or PyTorch
• Neural architecture search
3 simple steps to get the best ML model
Step 1. Prepare your dataset
Step 2. Load the dataset for training ML
Step 3. Call fit() to get the best ML model
What happens behind the scenes
Loading the dataset for training ML
- ML problem defined (binary/multiclass classification or regression)
- Feature engineering for each model being trained
- Missing value handling
- Splitting dataset into training and validation
Calling fit() to get the best ML model
- Training models
- Hyperparameter optimization
- Model selection
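Two of the behind-the-scenes steps listed above, deciding the problem type from the label column and holding out validation data, can be sketched in plain Python. The heuristic and the `max_classes` threshold below are assumptions made for illustration, not AutoGluon's actual rules.

```python
import random

def infer_problem_type(labels, max_classes=20):
    """Guess the ML problem type from the values in the label column.

    Illustrative heuristic only: two distinct values -> binary; non-numeric
    labels or few distinct values -> multiclass; otherwise regression.
    """
    distinct = set(labels)
    if len(distinct) == 2:
        return "binary"
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if not all_numeric or len(distinct) <= max_classes:
        return "multiclass"
    return "regression"

def train_validation_split(rows, holdout_frac=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * holdout_frac))
    return shuffled[n_val:], shuffled[:n_val]

print(infer_problem_type(["<=50K", ">50K", "<=50K"]))
train_rows, val_rows = train_validation_split(range(10))
print(len(train_rows), len(val_rows))
```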
Common ML problems
Regression, Classification
Tabular Prediction
Image Classification
Object Detection
Text Classification
ML algorithms for Tabular prediction
• Random Forest
  • https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
• XT (Extremely randomized trees)
  • https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
• K-nearest neighbors
  • https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
• CatBoost - gradient boosting on decision trees
  • https://catboost.ai/
• LightGBM
  • https://lightgbm.readthedocs.io
• Neural Network
Let’s prepare a tabular dataset
Structured data stored in CSV format, where
• each row represents an example and
• each column represents the measurements of some variable or feature
Files can be stored either in an Amazon S3 bucket or on the local file system
Example data: “Adult data set” (https://archive.ics.uci.edu/ml/datasets/adult)
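To make the row-and-column layout concrete, here is a small sketch that reads CSV text with the standard library and separates the feature columns from the label column. The three rows and the reduced set of column names are made up in the spirit of the Adult data set; they are not the full schema.

```python
import csv
import io

# A few made-up rows; a real file would be opened from disk instead.
csv_text = """age,workclass,education,class
39,State-gov,Bachelors,<=50K
50,Self-emp-not-inc,Bachelors,<=50K
52,Self-emp-inc,HS-grad,>50K
"""

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)           # each row is one example
columns = reader.fieldnames   # each column is one feature (or the label)

label = "class"
features = [c for c in columns if c != label]
print(len(rows), "examples; features:", features)
```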
Step 0: Install AutoGluon
# A GPU with CUDA 10.0 is recommended for object detection
# Install MXNet to use the deep learning models
# For Linux with a GPU
pip install --upgrade mxnet-cu100
# For Linux without a GPU
pip install --upgrade mxnet
# Install the AutoGluon package
pip install autogluon
Step 1: Loading dataset
from autogluon import TabularPrediction as task
train_path = 'https://autogluon.s3.amazonaws.com/datasets/AdultIncomeBinaryClassification/train_data.csv'
train_data = task.Dataset(file_path=train_path)
Parameters of Dataset(): file_path, df, feature_types, subsample, name
Step 2: Training ML models
predictor = task.fit(train_data, label='class', output_directory='ag-example-out/')
Parameters of fit()
https://autogluon.mxnet.io/api/autogluon.task.html#autogluon.task.TabularPrediction.fit
search_strategy: 'random' (random search), 'skopt' (SKopt Bayesian optimization), 'grid' (grid search), 'hyperband' (Hyperband), 'rl' (reinforcement learner)
visualizer: 'mxboard', 'tensorboard', 'none'
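Of the search options listed above, grid search is the easiest to picture: it scores every combination of candidate values rather than sampling some of them. The following is a minimal standard-library sketch of that idea, not AutoGluon's implementation; the objective and the hyperparameter names in `space` are hypothetical.

```python
import itertools

def grid_search(objective, space):
    """Score every combination in `space` and return the best config and score."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Hypothetical objective that is highest at learning_rate=0.1, num_leaves=31.
# A real objective would train a model and return its validation score.
def toy_objective(config):
    return -(config["learning_rate"] - 0.1) ** 2 - (config["num_leaves"] - 31) ** 2 / 1000

space = {"learning_rate": [0.01, 0.1, 0.5], "num_leaves": [15, 31, 63]}
best_config, best_score = grid_search(toy_objective, space)
print(best_config)
```

Grid search cost grows with the product of the list lengths (here 3 x 3 = 9 evaluations), which is why sampling-based strategies are often preferred for larger spaces.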
Step 3: Evaluate the model
test_path = 'https://autogluon.s3.amazonaws.com/datasets/AdultIncomeBinaryClassification/test_data.csv'
test_data = task.Dataset(file_path=test_path)
leaderboard = predictor.leaderboard(test_data)
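Conceptually, a leaderboard ranks every trained model by its score on held-out data. The sketch below shows that idea with plain Python; the two rule-based "models" and the four test rows are hypothetical stand-ins for trained predictors and real test data, and accuracy is assumed as the metric.

```python
def leaderboard(models, rows, labels):
    """Rank models by accuracy on held-out data, best first.

    `models` maps a model name to a function that predicts a label for one row.
    """
    board = []
    for name, predict in models.items():
        correct = sum(predict(row) == y for row, y in zip(rows, labels))
        board.append((name, correct / len(labels)))
    return sorted(board, key=lambda entry: entry[1], reverse=True)

# Hypothetical stand-in "models"; real entries would be trained predictors.
models = {
    "always_low": lambda row: "<=50K",
    "age_rule": lambda row: ">50K" if row["age"] >= 45 else "<=50K",
}
test_rows = [{"age": 25}, {"age": 38}, {"age": 52}, {"age": 61}]
test_labels = ["<=50K", "<=50K", ">50K", ">50K"]

board = leaderboard(models, test_rows, test_labels)
for name, acc in board:
    print(f"{name}: {acc:.2f}")
```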
Step 4: Use the model in your app
from autogluon import TabularPrediction as task
# Load the predictor saved by fit() in Step 2
predictor = task.load('ag-example-out/')
test_path = 'https://autogluon.s3.amazonaws.com/datasets/AdultIncomeBinaryClassification/test_data.csv'
test_data = task.Dataset(file_path=test_path)
y_test = test_data['class']
# Drop the label column so the model only sees the features
test_data_nolabel = test_data.drop(labels=['class'], axis=1)
y_pred = predictor.predict(test_data_nolabel)
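Once `y_test` and `y_pred` are available, comparing them is straightforward. A minimal sketch of an accuracy check follows; the helper name `accuracy` and the short lists of labels are illustrative, with the lists standing in for the real label column and predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up labels standing in for y_test and y_pred from the step above.
y_test_example = ["<=50K", ">50K", "<=50K", "<=50K"]
y_pred_example = ["<=50K", ">50K", ">50K", "<=50K"]
print(accuracy(y_test_example, y_pred_example))
```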
Parameters of predict()
https://autogluon.mxnet.io/api/autogluon.task.html#autogluon.task.tabular_prediction.TabularPredictor.predict
dataset : `TabularDataset` or `pandas.DataFrame`
The dataset to make predictions for. Should contain same column names as training Dataset and follow same format
model : str (optional)
The name of the model to get predictions from. Defaults to None, which uses the highest scoring model on the
validation set.
as_pandas : bool (optional)
Whether to return the output as a pandas Series (True) or numpy array (False)
use_pred_cache : bool (optional)
Whether to use previously cached predictions for table rows we have already predicted on
add_to_pred_cache : bool (optional)
Whether these predictions should be cached for reuse in future `predict()` calls on the same table rows
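The two cache flags can be pictured with a small wrapper that remembers predictions per row. This is only a sketch of the general idea under the assumption that a row can be turned into a hashable key; the class `CachedPredictor` is hypothetical and does not reflect how AutoGluon implements its cache.

```python
class CachedPredictor:
    """Wrap a per-row prediction function with an optional prediction cache."""

    def __init__(self, predict_row):
        self._predict_row = predict_row
        self._cache = {}
        self.calls = 0  # how many times the underlying model was invoked

    def predict(self, rows, use_pred_cache=False, add_to_pred_cache=False):
        preds = []
        for row in rows:
            key = tuple(sorted(row.items()))  # hashable identity for the row
            if use_pred_cache and key in self._cache:
                preds.append(self._cache[key])
                continue
            self.calls += 1
            pred = self._predict_row(row)
            if add_to_pred_cache:
                self._cache[key] = pred
            preds.append(pred)
        return preds

# Hypothetical rule standing in for a trained model.
predictor_sketch = CachedPredictor(lambda row: row["age"] >= 45)
rows = [{"age": 30}, {"age": 50}]
first = predictor_sketch.predict(rows, add_to_pred_cache=True)
second = predictor_sketch.predict(rows, use_pred_cache=True)
print(first, second, predictor_sketch.calls)
```

The second call is answered entirely from the cache, so the underlying model is not invoked again.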
Image classification
from autogluon import ImageClassification as task
# Load the dataset
dataset = task.Dataset('./data/shopeeiet/train')
# Train image classification models
time_limits = 10 * 60  # 10 minutes
classifier = task.fit(dataset,
                      time_limits=time_limits,
                      ngpus_per_trial=1)
# Test the trained model
test_dataset = task.Dataset('./data/shopeeiet/test')
inds, probs, probs_all = classifier.predict(test_dataset)
Dataset directory layout:
shopeeiet/train
├── BabyBibs
├── BabyHat
├── BabyPants
├── ...
shopeeiet/test
├── ...
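The folder layout above encodes the labels: each subfolder name is a class, and the files inside it are that class's examples. The sketch below shows how such a tree can be read into (path, label) pairs with the standard library. The helper `list_labeled_images` and the file names are made up for the demonstration, and the code builds a throwaway tree so it runs without the real dataset.

```python
import tempfile
from pathlib import Path

def list_labeled_images(root):
    """Return (image_path, class_name) pairs, using each subfolder name as the label."""
    samples = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file():
                samples.append((image_path, class_dir.name))
    return samples

# Build a tiny temporary folder tree that mirrors the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ["BabyBibs", "BabyHat", "BabyPants"]:
        class_dir = Path(tmp) / "train" / class_name
        class_dir.mkdir(parents=True)
        (class_dir / "0001.jpg").touch()  # empty placeholder file
    samples = list_labeled_images(Path(tmp) / "train")

classes = sorted({label for _, label in samples})
print(classes)
```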
For advanced features of AutoGluon
Other AutoML tasks
- Image classification
- Object detection
- Text classification
Hyperparameter optimization
- Hyperparameter search space and search algorithm customization
- Distributed search
Neural architecture search
- ENAS/ProxylessNAS
Efficient NAS on Target Hardware: ProxylessNAS
https://autogluon.mxnet.io/tutorials/nas/enas_mnist.html
(Source) ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
More toolkits for developers
Resources
AutoGluon (https://autogluon.mxnet.io)
GluonCV (https://gluon-cv.mxnet.io)
AWS Computer Vision: Getting started with GluonCV (https://www.coursera.org/learn/aws-computer-vision-gluoncv)
Deep Java Library (https://djl.ai)
Dive into Deep Learning (https://d2l.ai, https://ko.d2l.ai)
Machine Learning on AWS (https://ml.aws)
Amazon Machine Learning Solutions Lab (https://aws.amazon.com/ml-solutions-lab)
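AWS Machine Learning (ML) Training and Certification
Learn from the same curriculum used to train Amazon's own developers and data scientists!
Machine learning training for the whole team
Explore learning paths tailored to your role, whether you are a business decision maker, data scientist, developer, or data platform engineer.
Train your way, with flexible training options
More than 65 online courses are available, along with classroom training led by AWS expert instructors that offers hands-on practice and real-world application.
Validation of your expertise
With the industry-recognized 'AWS Certified Machine Learning - Specialty' certification, you can demonstrate that you have the expertise needed to build, train, tune, and deploy machine learning models.
https://aws.amazon.com/ko/training/learning-paths/machine-learning/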
Thank you!
Muhyun Kim
muhyun@amazon.com
