AWS FOR DATA WEBINAR – SAGEMAKER WITH EMR
© 2022, Amazon Web Services, Inc. or its affiliates.
Preparing Data and Developing Machine Learning Models
with Amazon EMR and SageMaker
AWS FOR DATA WEBINAR
강성문
Sr. AI/ML Specialist Solutions Architect
AWS
Agenda
SageMaker vs EMR
Preparing large-scale data and developing machine learning models with EMR and SageMaker
▪ Demo 1. Environment setup
▪ Demo 2. Machine learning model development
Summary
How do Amazon SageMaker and EMR differ?
Amazon EMR (Elastic MapReduce)
Amazon SageMaker
SageMaker Studio: Integrated development environment (IDE) for ML
PREPARE
• SageMaker Ground Truth: Label training data for machine learning
• SageMaker Data Wrangler: Aggregate and prepare data for machine learning
• SageMaker Processing: Built-in Python, BYO R/Spark
• SageMaker Feature Store: Store, update, retrieve, and share features
• SageMaker Clarify: Detect bias and understand model predictions
BUILD
• SageMaker Studio notebooks: Jupyter notebooks with elastic compute and sharing
• Built-in and bring-your-own algorithms: Dozens of optimized algorithms or bring your own
• Local mode: Test and prototype on your local machine
• SageMaker Autopilot: Automatically create machine learning models with full visibility
• SageMaker JumpStart: Pre-built solutions for common use cases
TRAIN & TUNE
• One-click training: Distributed infrastructure management
• SageMaker Experiments: Capture, organize, and compare every step
• Automatic model tuning: Hyperparameter optimization
• Distributed training libraries: Training for large datasets and models
• SageMaker Debugger: Debug and profile training runs
• Managed spot training: Reduce training cost by up to 90%
DEPLOY & MANAGE
• Fully managed deployment: Fully managed, ultra-low latency, high throughput
• Kubernetes & Kubeflow integration: Simplify Kubernetes-based machine learning
• Multi-model endpoints: Reduce cost by hosting multiple models per instance
• SageMaker Model Monitor: Maintain accuracy of deployed models
• SageMaker Edge Manager: Manage and monitor models on edge devices
• SageMaker Pipelines: Workflow orchestration and automation
Not a comprehensive list. Visit aws.amazon.com/sagemaker for the latest information
Stages: data preparation, build, train & tune, model deployment & management
Machine learning cycle
Business problem
ML problem framing
Data collection
Data integration
Data preparation and cleaning
Data visualization and analysis
Feature engineering
Model training and parameter tuning
Model evaluation
Monitoring and debugging
Model deployment
Predictions
YES
NO
Build and train models using SageMaker
Manage data on AWS
Example Scenario
Request for large-scale data preprocessing
Model development using the preprocessing results
Preparing large-scale data and developing
machine learning models with EMR and SageMaker
Target system architecture (components 1 and 2)
Component 1 – SageMaker Studio notebooks
Component 2 – AWS Service Catalog
User's custom product list
VMs, containers, services
✓ Compliance with company policies
✓ One-click deployment
✓ Automated resource tagging
✓ Budget management
AWS Service Catalog
User
admin
Bitnami Certified App: WordPress
Component 2 – AWS Service Catalog
Constraint: security, governance, and deployment control
Product: IT services and resources
Products list: browse the list of permitted products
Portfolio: a collection of products
Provisioned products: create and run services and resources
AWS Service Catalog Administrator
AWS Service Catalog End User
JSON, YAML, or Terraform
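The end-user side of this flow (pick a permitted product, then launch it as a provisioned product) could be sketched with boto3 roughly as follows. This is a minimal sketch under stated assumptions, not the code used in the webinar demo: the product ID, provisioning artifact ID, product name, and the `EmrClusterName` and `IdleTimeout` parameter keys are hypothetical placeholders. The real call needs boto3 and AWS credentials, so it is kept in a function that is not invoked here.

```python
"""Sketch: launching an AWS Service Catalog product (for example, an EMR cluster template)."""


def build_provision_request(product_id, artifact_id, name, parameters):
    """Build keyword arguments for the Service Catalog provision_product call.

    `parameters` is a plain dict. Service Catalog expects a list of
    {"Key": ..., "Value": ...} pairs with string values, so convert it here.
    """
    return {
        "ProductId": product_id,
        "ProvisioningArtifactId": artifact_id,
        "ProvisionedProductName": name,
        "ProvisioningParameters": [
            {"Key": key, "Value": str(value)}
            for key, value in sorted(parameters.items())
        ],
    }


def provision(request):
    """Submit the request. Requires boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("servicecatalog")
    response = client.provision_product(**request)
    return response["RecordDetail"]["RecordId"]


# Hypothetical IDs and parameter names, for illustration only.
request = build_provision_request(
    product_id="prod-EXAMPLE",
    artifact_id="pa-EXAMPLE",
    name="demo-emr-cluster",
    parameters={"EmrClusterName": "demo", "IdleTimeout": 3600},
)
```

Keeping the request builder separate from the API call makes it possible to review or log exactly what a notebook user is about to provision before anything is created.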
Demo 1
[For platform engineers] Setting up an environment
for creating and connecting to EMR clusters
from SageMaker Studio
Target system architecture (components 1, 2, and 3)
Component 3 – Apache Livy and SparkMagic
https://livy.apache.org/
https://github.com/jupyter-incubator/sparkmagic
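SparkMagic drives a remote Spark cluster through the Livy REST server, so the notebook never talks to Spark directly. The two core Livy calls (create a session, then submit a statement to it) can be sketched with the standard library as below. This is an illustrative sketch only: the host name is a hypothetical placeholder for the EMR primary node, 8998 is Livy's default port, and the requests are built but not sent because they need a reachable Livy server.

```python
import json
import urllib.request

# Hypothetical host; 8998 is Livy's default port.
LIVY_URL = "http://emr-primary.example.internal:8998"


def livy_request(path, payload):
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    return urllib.request.Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(kind="pyspark"):
    """POST /sessions starts an interactive Spark session of the given kind."""
    return livy_request("/sessions", {"kind": kind})


def run_statement_request(session_id, code):
    """POST /sessions/{id}/statements runs a code snippet in that session."""
    return livy_request(f"/sessions/{session_id}/statements", {"code": code})


def send(request):
    """Send a request and decode the JSON reply (needs a reachable Livy server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


session_req = create_session_request()
stmt_req = run_statement_request(0, "spark.range(10).count()")
```

In a notebook, SparkMagic issues requests of this shape on the user's behalf each time a cell runs, then polls the statement until its output is available.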
Demo 2
[For data scientists]
Connecting to EMR from SageMaker Studio
to prepare data and develop
a machine learning model
Summary
Build and train models using SageMaker
Target system architecture (components 1, 2, and 3)
Other ways to use Spark with SageMaker
SageMaker Processing
Data, preprocessing script, SageMaker Spark framework
SageMaker Spark Library
• SageMakerEstimator
• KMeansSageMakerEstimator
• PCASageMakerEstimator
• XGBoostSageMakerEstimator
• SageMakerModel
• …
EMR with SageMaker Pipeline
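Of the options above, SageMaker Processing is the one that needs no EMR cluster: a PySpark script runs on a managed Spark container for the duration of the job. A minimal sketch using the SageMaker Python SDK's `PySparkProcessor` might look like the following. The script name `preprocess.py`, the S3 URIs, the instance settings, and the `--input`/`--output` argument names are all hypothetical, and the job itself is not started here because it needs the `sagemaker` package, an IAM role, and AWS credentials.

```python
def build_spark_job_config(input_uri, output_uri, instance_count=2):
    """Collect the settings for a Spark preprocessing job in one place."""
    return {
        "processor": {
            "base_job_name": "spark-preprocess",
            "framework_version": "3.1",  # Spark version of the managed container
            "instance_count": instance_count,
            "instance_type": "ml.m5.xlarge",
            "max_runtime_in_seconds": 1200,
        },
        "run": {
            "submit_app": "preprocess.py",  # hypothetical PySpark script
            "arguments": ["--input", input_uri, "--output", output_uri],
        },
    }


def run_spark_job(config, role_arn):
    """Start the processing job. Requires the sagemaker SDK and AWS credentials."""
    # Imported lazily so the config helper works without the SDK installed.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(role=role_arn, **config["processor"])
    processor.run(**config["run"])


# Hypothetical bucket and prefixes, for illustration only.
config = build_spark_job_config(
    "s3://example-bucket/raw/",
    "s3://example-bucket/processed/",
)
```

Compared with the EMR route shown in the demos, this trades interactive exploration for a job that starts, runs the script, and releases its instances when the script finishes.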
References
• SageMaker Studio EMR Integration example code - https://github.com/aws-samples/sagemaker-studio-emr
• SageMaker Studio integration with EMR Workshop - https://catalog.workshops.aws/sagemaker-studio-emr/en-US
• Train an ML Model using Apache Spark in EMR and deploy in SageMaker - https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone/sparkml_serving_emr_mleap_abalone.ipynb
• Create and manage Amazon EMR clusters from SageMaker Studio to run interactive Spark and ML workloads - https://aws.amazon.com/ko/blogs/machine-learning/part-1-create-and-manage-amazon-emr-clusters-from-sagemaker-studio-to-run-interactive-spark-and-ml-workloads/
• Prepare data at scale with SageMaker Studio notebooks - https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-emr-cluster.html
• Connect SageMaker Studio Notebooks in a VPC to External Resources - https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-and-internet-access.html
• Apache Livy - https://livy.apache.org/
• Spark Magic - https://github.com/jupyter-incubator/sparkmagic
• Use Apache Spark with Amazon SageMaker - https://docs.aws.amazon.com/sagemaker/latest/dg/apache-spark.html
• Amazon SageMaker Processing (with Spark) - https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html#amazon-sagemaker-processing
• Train an ML Model using Apache Spark in EMR and deploy in SageMaker - https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone/sparkml_serving_emr_mleap_abalone.html
• SageMaker Pipeline Step (with EMR) - https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html
Thank you!
강성문
kseongmo@amazon.com
