© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
김일호, Solutions Architect
05-17-2016
개발자가 알아야 할 Amazon DynamoDB 활용법 (Amazon DynamoDB Tips Every Developer Should Know)
Agenda
Tip 1. DynamoDB Index(LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Tip 1. DynamoDB Index(LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Local secondary index (LSI)
Alternate sort (range) key attribute
Index is local to a partition (hash) key

Table: A1 (partition), A2 (sort), attributes A3, A4, A5

LSIs, by projection type:
• KEYS_ONLY: A1 (partition), A3 (sort), A2 (table key)
• INCLUDE A3: A1 (partition), A4 (sort), A2 (table key), A3 (projected)
• ALL: A1 (partition), A5 (sort), A2 (table key), A3, A4 (projected)

10 GB max per partition key, i.e. LSIs limit the # of sort keys!
Global secondary index (GSI)
Alternate partition key
Index spans all partition keys in the table

Table: A1 (partition), attributes A2, A3, A4, A5

GSIs, by projection type:
• KEYS_ONLY: A2 (partition), A1 (table key)
• INCLUDE A3: A5 (partition), A4 (sort), A1 (table key), A3 (projected)
• ALL: A4 (partition), A5 (sort), A1 (table key), A2, A3 (projected)

RCUs/WCUs provisioned separately for GSIs
Online indexing
How do GSI updates work?
1. Client writes to the primary table.
2. An asynchronous update propagates the change to the global secondary index.
If GSIs don’t have enough write capacity, table writes will be throttled!
LSI or GSI?
LSI can be modeled as a GSI
If data size in an item collection > 10 GB, use GSI
If eventual consistency is okay for your scenario, use GSI!
Tip 1. DynamoDB Index(LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Scaling
Throughput
• Provision any amount of throughput to a table
Size
• Add any number of items to a table
• Max item size is 400 KB
• LSIs limit the number of range keys due to 10 GB limit
Scaling is achieved through partitioning
Throughput
Provisioned at the table level
• Write capacity units (WCUs) are measured in 1 KB writes per second
• Read capacity units (RCUs) are measured in 4 KB reads per second
• RCUs measure strongly consistent reads
• Eventually consistent reads cost 1/2 of strongly consistent reads
Read and write throughput limits are independent
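The capacity-unit arithmetic above can be sketched as follows; this is a minimal illustration assuming item sizes are given in KB (the function names are my own, not an AWS API):

```python
import math

def wcus_per_write(item_size_kb):
    """WCUs consumed by one write: item size rounded up to the next 1 KB unit."""
    return math.ceil(item_size_kb / 1.0)

def rcus_per_read(item_size_kb, strongly_consistent=True):
    """RCUs consumed by one read: item size rounded up to the next 4 KB unit.
    Eventually consistent reads cost half as much."""
    units = math.ceil(item_size_kb / 4.0)
    return units if strongly_consistent else units / 2.0

print(wcus_per_write(1.5))                           # a 1.5 KB write costs 2 WCUs
print(rcus_per_read(6))                              # a 6 KB consistent read costs 2 RCUs
print(rcus_per_read(6, strongly_consistent=False))   # eventually consistent: 1.0
```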
Partitioning math
Number of Partitions
By Capacity (Total RCU / 3000) + (Total WCU / 1000)
By Size Total Size / 10 GB
Total Partitions CEILING(MAX (Capacity, Size))
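The partitioning formula above can be computed directly; a minimal sketch of the 2016-era guidance (the per-partition constants come from the table above):

```python
import math

def partition_count(rcu, wcu, size_gb):
    """Estimate DynamoDB's internal partition count:
    the max of the capacity-driven and size-driven counts, rounded up."""
    by_capacity = rcu / 3000 + wcu / 1000   # each partition serves ~3000 RCU / ~1000 WCU
    by_size = size_gb / 10                  # each partition stores ~10 GB
    return math.ceil(max(by_capacity, by_size))

print(partition_count(5000, 500, 8))  # 3 (the worked example on the next slide)
```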
Partitioning example
Table size = 8 GB, RCUs = 5000, WCUs = 500
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data per partition = 8/3 = 2.67 GB
RCUs and WCUs are uniformly
spread across partitions
Number of Partitions
By Capacity (5000 / 3000) + (500 / 1000) = 2.17
By Size 8 / 10 = 0.8
Total Partitions CEILING(MAX (2.17, 0.8)) = 3
Allocation of partitions
A partition split occurs when
• Increased provisioned throughput settings
• Increased storage requirements
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html
Example: hot keys (heat map: partition vs. time)
Example: periodic spike (heat map: partition vs. time)
Getting the most out of DynamoDB throughput
“To get the most out of DynamoDB
throughput, create tables where the
partition key element has a large
number of distinct values, and
values are requested fairly
uniformly, as randomly as possible.”
—DynamoDB Developer Guide
Space: access is evenly spread over
the key-space
Time: requests arrive evenly spaced
in time
What causes throttling?
If sustained throughput goes beyond provisioned throughput per partition
Non-uniform workloads
• Hot keys/hot partitions
• Very large bursts
Mixing hot data with cold data
• Use a table per time period
From the example before:
• Table created with 5000 RCUs, 500 WCUs
• RCUs per partition = 1666.67
• WCUs per partition = 166.67
• If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB may throttle requests
• Solution: Increase provisioned throughput
Tip 1. DynamoDB Index(LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
1:1 relationships or key-values
Use a table or GSI with a partition key
Use GetItem or BatchGetItem API
Example: Given an SSN or license number, get attributes
Users	Table
Partition	key Attributes
SSN	=	123-45-6789 Email	=	johndoe@nowhere.com,	License =	TDL25478134
SSN	=	987-65-4321 Email	=	maryfowler@somewhere.com,	License =	TDL78309234
Users-Email-GSI
Partition	key Attributes
License =	TDL78309234 Email	=	maryfowler@somewhere.com,	SSN	=	987-65-4321
License =	TDL25478134 Email	=	johndoe@nowhere.com,	SSN	=	123-45-6789
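A sketch of the two lookups above as low-level API parameter dicts (the kind you would pass to a DynamoDB client); table and index names follow the example, and the helper names are my own:

```python
def get_by_ssn(ssn):
    """GetItem parameters: exact lookup by the base table's partition key."""
    return {"TableName": "Users", "Key": {"SSN": {"S": ssn}}}

def query_by_license(license_no):
    """A GSI does not support GetItem, so use Query with an equality condition
    on the index partition key. '#lic' aliases the attribute name to avoid
    clashing with DynamoDB reserved words."""
    return {
        "TableName": "Users",
        "IndexName": "Users-Email-GSI",
        "KeyConditionExpression": "#lic = :l",
        "ExpressionAttributeNames": {"#lic": "License"},
        "ExpressionAttributeValues": {":l": {"S": license_no}},
    }

print(get_by_ssn("123-45-6789")["Key"])
```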
1:N relationships or parent-children
Use a table or GSI with partition and sort key
Use Query API
Example:
• Given a device, find all readings between epoch X, Y
Device-measurements
Partition	Key Sort	key Attributes
DeviceId	=	1 epoch	=	5513A97C Temperature	=	30,	pressure	=	90
DeviceId	=	1 epoch	=	5513A9DB Temperature	=	30,	pressure	=	90
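The device-readings query above can be expressed as Query parameters; a minimal sketch assuming the epochs are stored as strings, as in the table:

```python
def readings_between(device_id, epoch_x, epoch_y):
    """Query parameters: one device's readings with sort key between X and Y."""
    return {
        "TableName": "Device-measurements",
        "KeyConditionExpression": "DeviceId = :d AND epoch BETWEEN :x AND :y",
        "ExpressionAttributeValues": {
            ":d": {"N": str(device_id)},
            ":x": {"S": epoch_x},
            ":y": {"S": epoch_y},
        },
    }

params = readings_between(1, "5513A97C", "5513A9DB")
```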
N:M relationships
Use a table and GSI with partition and sort key elements
switched
Use Query API
Example: Given a user, find all games. Or given a game,
find all users.
User-Games-Table
Hash	Key Range	key
UserId	=	bob GameId	=	Game1
UserId	=	fred GameId	=	Game2
UserId	=	bob GameId	=	Game3
Game-Users-GSI
Hash	Key Range	key
GameId	=	Game1 UserId	=	bob
GameId	=	Game2 UserId	=	fred
GameId	=	Game3 UserId	=	bob
Documents (JSON)
New data types (M, L, BOOL, NULL)
introduced to support JSON
Document SDKs
• Simple programming model
• Conversion to/from JSON
• Java, JavaScript, Ruby, .NET
Cannot index (S,N) elements of a
JSON object stored in M
• Only top-level table
attributes can be used in
LSIs and GSIs without
Streams/Lambda
JavaScript DynamoDB
string S
number N
boolean BOOL
null NULL
array L
object M
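The type table above maps JSON values onto DynamoDB attribute-value wrappers; a minimal sketch of that mapping (this mirrors what the document SDKs do internally, under the simplifying assumption that numbers are plain ints/floats):

```python
def to_dynamodb(value):
    """Wrap a Python/JSON value in its DynamoDB attribute-value type
    (S, N, BOOL, NULL, L, M), following the type table above."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):          # must check bool before int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}         # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_dynamodb(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value)}")

print(to_dynamodb({"views": 3, "tags": ["a", "b"], "ok": True}))
```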
Rich expressions
Projection expression to get just some of the attributes
• Query/Get/Scan: ProductReviews.FiveStar[0]
ProductReviews: {
  FiveStar: [
    "Excellent! Can't recommend it highly enough! Buy it!",
    "Do yourself a favor and buy this."
  ],
  OneStar: [
    "Terrible product! Do not buy this."
  ]
}
Rich expressions
Filter expression
• Query/Scan: #VIEWS > :num
Update expression
• UpdateItem: set Replies = Replies + :num
Rich expressions
Conditional expression
• Put/Update/DeleteItem
• attribute_not_exists (#pr.FiveStar)
• attribute_exists(Pictures.RearView)
1. DynamoDB first looks for an item whose primary key matches that of the
item to be written.
2. If no such item exists, attribute_not_exists(partition key) evaluates to
true and the write proceeds.
3. Otherwise, the attribute_not_exists function above fails and the write
is prevented.
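The attribute_not_exists idiom described above is how "insert only if absent" is expressed; a minimal sketch as PutItem parameters (table and key names reuse the earlier Users example):

```python
def put_if_absent(table, key_attr, item):
    """PutItem parameters that succeed only when no item with the same
    primary key already exists (the attribute_not_exists idiom above)."""
    return {
        "TableName": table,
        "Item": item,
        "ConditionExpression": f"attribute_not_exists({key_attr})",
    }

params = put_if_absent("Users", "SSN", {"SSN": {"S": "123-45-6789"}})
```

If the condition fails, DynamoDB rejects the write with a conditional-check-failed error instead of silently overwriting the item.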
Tip 1. DynamoDB Index(LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Game logging
Storing time series data
Time series tables
Events_table_2015_April (current table)
Events_table_2015_March / Events_table_2015_February / Events_table_2015_January (older tables)
Each table: Event_id (partition key), Timestamp (sort key), Attribute1 … Attribute N
(Figure: each table is provisioned independently, e.g. RCUs = 10000 / WCUs = 10000 down to RCUs = 10 / WCUs = 1)
Hot data lives in the current table; cold data lives in the older tables
Don’t mix hot and cold data; archive cold data to Amazon S3
Use a table per time period
• Pre-create daily, weekly, monthly tables
• Provision required throughput for current table
• Writes go to the current table
• Turn off (or reduce) throughput for older tables
• Pre-create heavy users, light users tables
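Routing writes to the right monthly table is a simple naming computation; a minimal sketch of the Events_table_YYYY_Month convention above (the function name is my own):

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def events_table_for(day):
    """Name of the monthly time-series table that receives writes for `day`."""
    return f"Events_table_{day.year}_{MONTHS[day.month - 1]}"

print(events_table_for(date(2015, 4, 15)))  # Events_table_2015_April
```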
Item shop catalog
Popular items (read-heavy): even with the ItemShopCatalog table spread across
many partitions (Partition 1 … Partition 50, ~2000 RCUs each), gamers' requests
(SELECT Id, Description, ... FROM ItemShopCatalog) concentrate on a few popular
products (Product A, Product B), creating scaling bottlenecks on the partitions
that hold them.
(Chart: requests per second by item primary key; request distribution per partition key)

Cache popular items: with a cache in front of DynamoDB, reads of hot items
become cache hits and only the remaining requests reach the table.
(Chart: requests per second by item primary key; DynamoDB requests vs. cache hits)
Multiplayer online gaming
Query filters vs.
composite key indexes
Games Table
GameId  Date        Host   Opponent  Status
d9bl3   2014-10-02  David  Alice     DONE
72f49   2014-09-30  Alice  Bob       PENDING
o2pnb   2014-10-08  Bob    Carol     IN_PROGRESS
b932s   2014-10-03  Carol  Bob       PENDING
ef9ca   2014-10-03  David  Bob       IN_PROGRESS
Multiplayer online game data
Partition key: GameId
Query for incoming game requests:
DynamoDB indexes provide one partition key and one sort key.
What about queries with two equalities and a range?

SELECT * FROM Game
WHERE Opponent='Bob'      -- (partition)
AND Status='PENDING'      -- (?)
ORDER BY Date DESC        -- (sort)
Secondary Index
Opponent Date GameId Status Host
Alice 2014-10-02 d9bl3 DONE David
Carol 2014-10-08 o2pnb IN_PROGRESS Bob
Bob 2014-09-30 72f49 PENDING Alice
Bob 2014-10-03 b932s PENDING Carol
Bob 2014-10-03 ef9ca IN_PROGRESS David
Approach 1: Query filter
Partition key: Opponent, Sort key: Date

SELECT * FROM Game
WHERE Opponent='Bob'
ORDER BY Date DESC
FILTER ON Status='PENDING'

Non-matching rows (e.g. Bob's IN_PROGRESS game) are still read, then filtered out.
Needle in a haystack
Use query filter
• Send back less data “on the wire”
• Simplify application code
• Simple SQL-like expressions
• AND, OR, NOT, ()
Use when your index isn’t entirely selective
Approach 2: composite key
Concatenate Status and Date into a single attribute, StatusDate:

Status      + Date       = StatusDate
DONE          2014-10-02   DONE_2014-10-02
IN_PROGRESS   2014-10-08   IN_PROGRESS_2014-10-08
IN_PROGRESS   2014-10-03   IN_PROGRESS_2014-10-03
PENDING       2014-09-30   PENDING_2014-09-30
PENDING       2014-10-03   PENDING_2014-10-03
Secondary Index
Approach 2: composite key
Partition key: Opponent, Sort key: StatusDate

Opponent  StatusDate              GameId  Host
Alice     DONE_2014-10-02         d9bl3   David
Carol     IN_PROGRESS_2014-10-08  o2pnb   Bob
Bob       IN_PROGRESS_2014-10-03  ef9ca   David
Bob       PENDING_2014-09-30      72f49   Alice
Bob       PENDING_2014-10-03      b932s   Carol
Approach 2: composite key

SELECT * FROM Game
WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'

Needle in a sorted haystack
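The effect of the composite key can be simulated in memory; a minimal sketch of what the begins_with query above returns against the index rows (the data comes from the example table):

```python
# Each row: (Opponent, StatusDate), as in the secondary index above.
rows = [
    ("Alice", "DONE_2014-10-02"),
    ("Carol", "IN_PROGRESS_2014-10-08"),
    ("Bob",   "IN_PROGRESS_2014-10-03"),
    ("Bob",   "PENDING_2014-09-30"),
    ("Bob",   "PENDING_2014-10-03"),
]

def pending_games(opponent):
    """Equivalent of: Opponent = :o AND begins_with(StatusDate, 'PENDING'),
    returned in sort-key order."""
    return sorted(sd for o, sd in rows if o == opponent and sd.startswith("PENDING"))

print(pending_games("Bob"))  # ['PENDING_2014-09-30', 'PENDING_2014-10-03']
```

Unlike the query-filter approach, only the matching rows are read, because the status is part of the sort key itself.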
Sparse indexes
Game-scores-table
Id (hash)  User   Game  Score  Date        Award
1          Bob    G1    1300   2012-12-23
2          Bob    G1    1450   2012-12-23
3          Jay    G1    1600   2012-12-24
4          Mary   G1    2000   2012-10-24  Champ
5          Ryan   G2    123    2012-03-10
6          Jones  G2    345    2012-03-20

Award-GSI (only items that have an Award attribute appear in the index)
Award (hash)  Id  User  Score
Champ         4   Mary  2000

Scan sparse hash GSIs
Replace filter with indexes
Concatenate attributes to form useful
secondary index keys
Take advantage of sparse indexes
Use when you want to optimize a query as much as possible (e.g., the Status + Date composite)
Big data analytics
with DynamoDB
Transactional Data Processing
DynamoDB is well-suited for transactional processing:
• High concurrency
• Strong consistency
• Atomic updates of single items
• Conditional updates for de-dupe and optimistic concurrency
• Supports both key/value and JSON document schema
• Capable of handling large table sizes with low latency data access
Case 1: Store and Index Metadata for Objects Stored in Amazon S3
Case 1: Use Case
We have a large number of digital audio files stored in Amazon S3 and
we want to make them searchable:
→ Use DynamoDB as the primary data store for the metadata.
→ Index and query the metadata using Elasticsearch.
Case 1: Steps to Implement
1. Create a Lambda function that reads the metadata from the
ID3 tag and inserts it into a DynamoDB table.
2. Enable S3 notifications on the S3 bucket storing the audio
files.
3. Enable streams on the DynamoDB table.
4. Create a second Lambda function that takes the metadata in
DynamoDB and indexes it using Elasticsearch.
5. Enable the stream as the event source for the Lambda
function.
Case 1: Key Takeaways
DynamoDB + Elasticsearch = durable, scalable, highly available database with rich query capabilities.
Use Lambda functions to respond to events in both
DynamoDB streams and Amazon S3 without having to
manage any underlying compute infrastructure.
Case 2 – Execute Queries Against Multiple Data Sources Using DynamoDB and Hive
Case 2: Use Case
We want to enrich our audio file metadata stored in DynamoDB with
additional data from the Million Song dataset:
→ The Million Song dataset is stored in text files.
→ ID3 tag metadata is stored in DynamoDB.
→ Use Amazon EMR with Hive to join the two datasets together in a query.
Case 2: Steps to Implement
1. Spin up an Amazon EMR cluster with
Hive.
2. Create an external Hive table using the
DynamoDBStorageHandler.
3. Create an external Hive table using the
Amazon S3 location of the text files
containing the Million Song project
metadata.
4. Create and run a Hive query that joins
the two external tables together and
writes the joined results out to Amazon
S3.
5. Load the results from Amazon S3 into
DynamoDB.
Case 2: Key Takeaways
Use Amazon EMR to quickly provision a Hadoop cluster
with Hive and to tear it down when done.
Use of Hive with DynamoDB allows items in DynamoDB
tables to be queried/joined with data from a variety of
sources.
Case 3 – Store and Analyze Sensor Data with DynamoDB and Amazon Redshift
Case 3: Use Case
A large number of sensors are taking readings at regular intervals. You
need to aggregate the data from each reading into a data warehouse
for analysis:
• Use Amazon Kinesis to ingest the raw sensor data.
• Store the sensor readings in DynamoDB for fast access and real-time dashboards.
• Store raw sensor readings in Amazon S3 for durability and backup.
• Load the data from Amazon S3 into Amazon Redshift using AWS
Lambda.
Case 3: Steps to Implement
1. Create two Lambda functions to
read data from the Amazon
Kinesis stream.
2. Enable the Amazon Kinesis
stream as an event source for
each Lambda function.
3. Write data into DynamoDB in
one of the Lambda functions.
4. Write data into Amazon S3 in the
other Lambda function.
5. Use the aws-lambda-redshift-loader to load the data in Amazon S3 into Amazon Redshift in batches.
Case 3: Key Takeaways
Amazon Kinesis + Lambda + DynamoDB = Scalable, durable, highly
available solution for sensor data ingestion with very low operational
overhead.
DynamoDB is well-suited for near-realtime queries of recent sensor
data readings.
Amazon Redshift is well-suited for deeper analysis of sensor data
readings spanning longer time horizons and very large numbers of
records.
Using Lambda to load data into Amazon Redshift provides a way to
perform ETL in frequent intervals.
Tip 1. DynamoDB Index(LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Stream of updates to a table
Asynchronous
Exactly once
Strictly ordered
• Per item
Highly durable
• Scale with table
24-hour lifetime
Sub-second latency
DynamoDB Streams view types
Example: UpdateItem (Name = John, Destination = Pluto)

View type                  Stream record contents
Old image (before update)  Name = John, Destination = Mars
New image (after update)   Name = John, Destination = Pluto
Old and new images         Name = John, Destination = Mars / Name = John, Destination = Pluto
Keys only                  Name = John
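A stream consumer reads these images out of each record; a minimal sketch assuming the NEW_AND_OLD_IMAGES view type and the record structure of the DynamoDB Streams API (the sample record mirrors the UpdateItem example above):

```python
def images(record):
    """Old/new images from a DynamoDB Streams record. Either may be absent,
    e.g. no OldImage on INSERT or under other view types."""
    body = record["dynamodb"]
    return body.get("OldImage"), body.get("NewImage")

# A record shaped like the UpdateItem example above:
rec = {
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"Name": {"S": "John"}},
        "OldImage": {"Name": {"S": "John"}, "Destination": {"S": "Mars"}},
        "NewImage": {"Name": {"S": "John"}, "Destination": {"S": "Pluto"}},
    },
}
old, new = images(rec)
print(new["Destination"]["S"])  # Pluto
```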
DynamoDB Streams and Amazon Kinesis Client Library
(Figure: a DynamoDB client application sends updates to the table, which is
split into partitions; the stream exposes those changes as shards, each
consumed by a KCL worker in an Amazon Kinesis Client Library application)
Cross-region replication
DynamoDB Streams + open-source cross-region replication library
(Figure: replication between US East (N. Virginia), Asia Pacific (Sydney), and EU (Ireland) replicas)
DynamoDB Streams and AWS Lambda
Triggers: DynamoDB Streams notifies a Lambda function of each change, which can
maintain derivative tables or update Amazon CloudSearch, Amazon Elasticsearch
Service, and Amazon ElastiCache.
Analytics with DynamoDB Streams
Collect and de-dupe data in DynamoDB
Aggregate data in memory and flush periodically
Performing real-time aggregation and analytics (with EMR, Redshift, and cross-region replication)
개발자가 알아야 할 Amazon DynamoDB 활용법 (Amazon DynamoDB Tips Every Developer Should Know) :: 김일호 :: AWS Summit Seoul 2016
We look forward to your feedback!
https://www.awssummit.co.kr
Rate this session now on the mobile page to receive a gift after the event.
Share your impressions of the event on social media with the #AWSSummit hashtag.
Slides and recorded videos will be shared soon on AWS Korea's official social channels.

More Related Content

PDF
Amazon Redshift로 데이터웨어하우스(DW) 구축하기
PDF
Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링
PDF
효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
PDF
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
PDF
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
PDF
DynamoDB의 안과밖 - 정민영 (비트패킹 컴퍼니)
PDF
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
PPTX
AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017
Amazon Redshift로 데이터웨어하우스(DW) 구축하기
Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링
효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
DynamoDB의 안과밖 - 정민영 (비트패킹 컴퍼니)
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017

What's hot (20)

PDF
쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...
PDF
DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트:: AWS Summit Online Korea 2020
PDF
AWS Connectivity, VPC Design and Security Pro Tips
PDF
AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...
PPTX
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
PDF
AWS Control Tower
PDF
Serverless로 이미지 크롤링 프로토타입 개발기::유호균::AWS Summit Seoul 2018
PDF
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저
PDF
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
PPTX
AWS Black Belt Techシリーズ AWS Storage Gateway
PDF
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
PDF
Amazon RDS Proxy 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
PDF
Amazon EMR과 SageMaker를 이용하여 데이터를 준비하고 머신러닝 모델 개발 하기
PDF
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
PDF
[AWS Migration Workshop] 데이터베이스를 AWS로 손쉽게 마이그레이션 하기
PDF
롯데이커머스의 마이크로 서비스 아키텍처 진화와 비용 관점의 운영 노하우-나현길, 롯데이커머스 클라우드플랫폼 팀장::AWS 마이그레이션 A ...
PPTX
AWS Lambda
PDF
[2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive
PDF
AWS와 부하테스트의 절묘한 만남 :: 김무현 솔루션즈 아키텍트 :: Gaming on AWS 2016
PPTX
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...
DynamoDB를 게임에서 사용하기 – 김성수, 박경표, AWS솔루션즈 아키텍트:: AWS Summit Online Korea 2020
AWS Connectivity, VPC Design and Security Pro Tips
AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
AWS Control Tower
Serverless로 이미지 크롤링 프로토타입 개발기::유호균::AWS Summit Seoul 2018
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
AWS Black Belt Techシリーズ AWS Storage Gateway
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Amazon RDS Proxy 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
Amazon EMR과 SageMaker를 이용하여 데이터를 준비하고 머신러닝 모델 개발 하기
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
[AWS Migration Workshop] 데이터베이스를 AWS로 손쉽게 마이그레이션 하기
롯데이커머스의 마이크로 서비스 아키텍처 진화와 비용 관점의 운영 노하우-나현길, 롯데이커머스 클라우드플랫폼 팀장::AWS 마이그레이션 A ...
AWS Lambda
[2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive
AWS와 부하테스트의 절묘한 만남 :: 김무현 솔루션즈 아키텍트 :: Gaming on AWS 2016
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
Ad

Viewers also liked (20)

PDF
게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
PDF
Dynamodb 삽질기
PDF
DynamoDB를 이용한 PHP와 Django간 세션 공유 - 강대성 (피플펀드컴퍼니)
PDF
게임업계 IT 관리자를 위한 7가지 유용한 팁 - 박선용 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
PDF
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
PDF
Lambda를 활용한 서버없는 아키텍쳐 구현하기 :: 김기완 :: AWS Summit Seoul 2016
PDF
AWS Innovate 2016 : Closing Keynote - Glenn Gore
PDF
AWS Innovate: Smart Deployment on AWS - Andy Kim
PDF
소셜카지노 초기런칭 및 실험결과 공유
PDF
오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015
PDF
성공적인 게임 런칭을 위한 비밀의 레시피 #3
PDF
관계형 데이터베이스의 새로운 패러다임 Amazon Aurora :: 김상필 :: AWS Summit Seoul 2016
PDF
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
PDF
Gaming on AWS - 3. DynamoDB 모델링 및 Streams 활용법
PDF
Amazed by aws 1st session
PDF
Amazon Aurora Deep Dive (김기완) - AWS DB Day
PDF
Amazon Machine Learning 게임에서 활용해보기 :: 김일호 :: AWS Summit Seoul 2016
PPTX
CloudFront(클라우드 프론트)와 Route53(라우트53) AWS Summit Seoul 2015
PDF
Amazon Aurora 100% 활용하기
PDF
AWS Innovate: Best Practices for Migrating to Amazon DynamoDB - Sangpil Kim
게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
Dynamodb 삽질기
DynamoDB를 이용한 PHP와 Django간 세션 공유 - 강대성 (피플펀드컴퍼니)
게임업계 IT 관리자를 위한 7가지 유용한 팁 - 박선용 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
Lambda를 활용한 서버없는 아키텍쳐 구현하기 :: 김기완 :: AWS Summit Seoul 2016
AWS Innovate 2016 : Closing Keynote - Glenn Gore
AWS Innovate: Smart Deployment on AWS - Andy Kim
소셜카지노 초기런칭 및 실험결과 공유
오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015
성공적인 게임 런칭을 위한 비밀의 레시피 #3
관계형 데이터베이스의 새로운 패러다임 Amazon Aurora :: 김상필 :: AWS Summit Seoul 2016
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
Gaming on AWS - 3. DynamoDB 모델링 및 Streams 활용법
Amazed by aws 1st session
Amazon Aurora Deep Dive (김기완) - AWS DB Day
Amazon Machine Learning 게임에서 활용해보기 :: 김일호 :: AWS Summit Seoul 2016
CloudFront(클라우드 프론트)와 Route53(라우트53) AWS Summit Seoul 2015
Amazon Aurora 100% 활용하기
AWS Innovate: Best Practices for Migrating to Amazon DynamoDB - Sangpil Kim
Ad

More from Amazon Web Services Korea (20)

PDF
[D3T1S01] Gen AI를 위한 Amazon Aurora 활용 사례 방법
PDF
[D3T1S06] Neptune Analytics with Vector Similarity Search
PDF
[D3T1S03] Amazon DynamoDB design puzzlers
PDF
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
PDF
[D3T1S07] AWS S3 - 클라우드 환경에서 데이터베이스 보호하기
PDF
[D3T1S05] Aurora 혼합 구성 아키텍처를 사용하여 예상치 못한 트래픽 급증 대응하기
PDF
[D3T1S02] Aurora Limitless Database Introduction
PDF
[D3T2S01] Amazon Aurora MySQL 메이저 버전 업그레이드 및 Amazon B/G Deployments 실습
PDF
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
PDF
AWS Modern Infra with Storage Roadshow 2023 - Day 2
PDF
AWS Modern Infra with Storage Roadshow 2023 - Day 1
PDF
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
PDF
Amazon DocumentDB - Architecture 및 Best Practice (Level 200) - 발표자: 장동훈, Sr. ...
PDF
Amazon Elasticache - Fully managed, Redis & Memcached Compatible Service (Lev...
PDF
Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...
PDF
[Keynote] 슬기로운 AWS 데이터베이스 선택하기 - 발표자: 강민석, Korea Database SA Manager, WWSO, A...
PDF
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
PDF
Amazon EMR - Enhancements on Cost/Performance, Serverless - 발표자: 김기영, Sr Anal...
PDF
Amazon OpenSearch - Use Cases, Security/Observability, Serverless and Enhance...
PDF
Enabling Agility with Data Governance - 발표자: 김성연, Analytics Specialist, WWSO,...
[D3T1S01] Gen AI를 위한 Amazon Aurora 활용 사례 방법
[D3T1S06] Neptune Analytics with Vector Similarity Search
[D3T1S03] Amazon DynamoDB design puzzlers
[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use...
[D3T1S07] AWS S3 - 클라우드 환경에서 데이터베이스 보호하기
[D3T1S05] Aurora 혼합 구성 아키텍처를 사용하여 예상치 못한 트래픽 급증 대응하기
[D3T1S02] Aurora Limitless Database Introduction
[D3T2S01] Amazon Aurora MySQL 메이저 버전 업그레이드 및 Amazon B/G Deployments 실습
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
AWS Modern Infra with Storage Roadshow 2023 - Day 2
AWS Modern Infra with Storage Roadshow 2023 - Day 1
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
Amazon DocumentDB - Architecture 및 Best Practice (Level 200) - 발표자: 장동훈, Sr. ...
Amazon Elasticache - Fully managed, Redis & Memcached Compatible Service (Lev...
Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...
[Keynote] 슬기로운 AWS 데이터베이스 선택하기 - 발표자: 강민석, Korea Database SA Manager, WWSO, A...
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
Amazon EMR - Enhancements on Cost/Performance, Serverless - 발표자: 김기영, Sr Anal...
Amazon OpenSearch - Use Cases, Security/Observability, Serverless and Enhance...
Enabling Agility with Data Governance - 발표자: 김성연, Analytics Specialist, WWSO,...

Recently uploaded (20)

PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Big Data Technologies - Introduction.pptx
PPTX
Cloud computing and distributed systems.
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Review of recent advances in non-invasive hemoglobin estimation
PPTX
Spectroscopy.pptx food analysis technology
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Empathic Computing: Creating Shared Understanding
Programs and apps: productivity, graphics, security and other tools
Advanced methodologies resolving dimensionality complications for autism neur...
Big Data Technologies - Introduction.pptx
Cloud computing and distributed systems.
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Diabetes mellitus diagnosis method based random forest with bat algorithm
Per capita expenditure prediction using model stacking based on satellite ima...
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Review of recent advances in non-invasive hemoglobin estimation
Spectroscopy.pptx food analysis technology
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Mobile App Security Testing_ A Comprehensive Guide.pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Encapsulation_ Review paper, used for researhc scholars
Understanding_Digital_Forensics_Presentation.pptx
Building Integrated photovoltaic BIPV_UPV.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
MYSQL Presentation for SQL database connectivity
Empathic Computing: Creating Shared Understanding

개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 김일호, Solutions Architect 05-17-2016 개발자가 알아야 할 Amazon DynamoDB 활용법
  • 2. Agenda Tip 1. DynamoDB Index(LSI, GSI) Tip 2. DynamoDB Scaling Tip 3. DynamoDB Data Modeling Scenario based Best Practice DynamoDB Streams
  • 3. Tip 1. DynamoDB Index(LSI, GSI) Tip 2. DynamoDB Scaling Tip 3. DynamoDB Data Modeling Scenario based Best Practice DynamoDB Streams
  • 4. Local secondary index (LSI) Alternate sort(=range) key attribute Index is local to a partition(=hash) key A1 (partition) A3 (sort) A2 (table key) A1 (partition) A2 (sort) A3 A4 A5 LSIs A1 (partition) A4 (sort) A2 (table key) A3 (projected) Table KEYS_ONLY INCLUDE A3 A1 (partition) A5 (sort) A2 (table key) A3 (projected) A4 (projected) ALL 10 GB max per hash key, i.e. LSIs limit the # of range keys!
  • 5. Global secondary index (GSI) Alternate partition key Index is across all table partition key A1 (partition) A2 A3 A4 A5 GSIs A5 (partition) A4 (sort) A1 (table key) A3 (projected) Table INCLUDE A3 A4 (partition) A5 (sort) A1 (table key) A2 (projected) A3 (projected) ALL A2 (partition) A1 (table key) KEYS_ONLY RCUs/WCUs provisioned separately for GSIs Online indexing
  • 6. How do GSI updates work? Table Primary table Primary table Primary table Primary table Global Secondary Index Client 2. Asynchronous update (in progress) If GSIs don’t have enough write capacity, table writes will be throttled!
  • 7. LSI or GSI? LSI can be modeled as a GSI If data size in an item collection > 10 GB, use GSI If eventual consistency is okay for your scenario, use GSI!
  • 8. Tip 1. DynamoDB Index(LSI, GSI) Tip 2. DynamoDB Scaling Tip 3. DynamoDB Data Modeling Scenario based Best Practice DynamoDB Streams
  • 9. Scaling Throughput • Provision any amount of throughput to a table Size • Add any number of items to a table • Max item size is 400 KB • LSIs limit the number of range keys due to 10 GB limit Scaling is achieved through partitioning
  • 10. Throughput Provisioned at the table level • Write capacity units (WCUs) are measured in 1 KB per second • Read capacity units (RCUs) are measured in 4 KB per second • RCUs measure strictly consistent reads • Eventually consistent reads cost 1/2 of consistent reads Read and write throughput limits are independent 200 RCU
  • 11. Partitioning math Number of Partitions By Capacity (Total RCU / 3000) + (Total WCU / 1000) By Size Total Size / 10 GB Total Partitions CEILING(MAX (Capacity, Size))
  • 12. Partitioning example Table size = 8 GB, RCUs = 5000, WCUs = 500 RCUs per partition = 5000/3 = 1666.67 WCUs per partition = 500/3 = 166.67 Data/partition = 10/3 = 3.33 GB RCUs and WCUs are uniformly spread across partitions Number of Partitions By Capacity (5000 / 3000) + (500 / 1000) = 2.17 By Size 8 / 10 = 0.8 Total Partitions CEILING(MAX (2.17, 0.8)) = 3
  • 13. Allocation of partitions A partition split occurs when • Increased provisioned throughput settings • Increased storage requirements http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html
  • 16. Getting the most out of DynamoDB throughput “To get the most out of DynamoDB throughput, create tables where the partition key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.” —DynamoDB Developer Guide Space: access is evenly spread over the key-space Time: requests arrive evenly spaced in time
  • 17. What causes throttling? If sustained throughput goes beyond provisioned throughput per partition Non-uniform workloads • Hot keys/hot partitions • Very large bursts Mixing hot data with cold data • Use a table per time period From the example before: • Table created with 5000 RCUs, 500 WCUs • RCUs per partition = 1666.67 • WCUs per partition = 166.67 • If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB may throttle requests • Solution: Increase provisioned throughput
  • 18. Tip 1. DynamoDB Index(LSI, GSI) Tip 2. DynamoDB Scaling Tip 3. DynamoDB Data Modeling Scenario based Best Practice DynamoDB Streams
  • 19. 1:1 relationships or key-values Use a table or GSI with a partition key Use GetItem or BatchGetItem API Example: Given an SSN or license number, get attributes Users Table Partition key Attributes SSN = 123-45-6789 Email = johndoe@nowhere.com, License = TDL25478134 SSN = 987-65-4321 Email = maryfowler@somewhere.com, License = TDL78309234 Users-Email-GSI Partition key Attributes License = TDL78309234 Email = maryfowler@somewhere.com, SSN = 987-65-4321 License = TDL25478134 Email = johndoe@nowhere.com, SSN = 123-45-6789
  • 20. 1:N relationships or parent-children Use a table or GSI with partition and sort key Use Query API Example: • Given a device, find all readings between epoch X, Y Device-measurements Partition Key Sort key Attributes DeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90 DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90
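The "readings between epoch X and Y" query above maps to a key condition with BETWEEN on the sort key. A sketch of the Query parameters (table and attribute names taken from the slide):

```python
def build_readings_query(device_id, epoch_start, epoch_end):
    """Query parameters: all readings for one device whose sort
    key (epoch) falls between two values."""
    return {
        "TableName": "Device-measurements",
        "KeyConditionExpression":
            "DeviceId = :d AND epoch BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":d": device_id,
            ":start": epoch_start,
            ":end": epoch_end,
        },
    }

params = build_readings_query(1, "5513A97C", "5513A9DB")
print(params["KeyConditionExpression"])
```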
  • 21. N:M relationships Use a table and GSI with partition and sort key elements switched Use Query API Example: Given a user, find all games. Or given a game, find all users. User-Games-Table Hash Key Range key UserId = bob GameId = Game1 UserId = fred GameId = Game2 UserId = bob GameId = Game3 Game-Users-GSI Hash Key Range key GameId = Game1 UserId = bob GameId = Game2 UserId = fred GameId = Game3 UserId = bob
  • 22. Documents (JSON) New data types (M, L, BOOL, NULL) introduced to support JSON Document SDKs • Simple programming model • Conversion to/from JSON • Java, JavaScript, Ruby, .NET Cannot index (S,N) elements of a JSON object stored in M • Only top-level table attributes can be used in LSIs and GSIs without Streams/Lambda JavaScript DynamoDB string S number N boolean BOOL null NULL array L object M
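The JavaScript-to-DynamoDB type mapping in this slide's table is what the document SDKs automate. A simplified sketch of that conversion (real SDKs also handle sets, binary data, and arbitrary-precision numbers):

```python
def to_dynamodb(value):
    """Map a JSON-style Python value to DynamoDB's typed wire
    format: S, N, BOOL, NULL, L, M (simplified sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):          # must check bool before int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}         # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_dynamodb(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value)}")

print(to_dynamodb({"Title": "Song", "Plays": 42, "Tags": ["rock"]}))
```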
  • 23. Rich expressions Projection expression to get just some of the attributes • Query/Get/Scan: ProductReviews.FiveStar[0]
  • 24. Rich expressions
  Projection expression to get just some of the attributes
  • Query/Get/Scan: ProductReviews.FiveStar[0]
  ProductReviews: {
    FiveStar: [
      "Excellent! Can't recommend it highly enough! Buy it!",
      "Do yourself a favor and buy this."
    ],
    OneStar: [
      "Terrible product! Do not buy this."
    ]
  }
  • 25. Rich expressions Filter expression • Query/Scan: #VIEWS > :num Update expression • UpdateItem: set Replies = Replies + :num
  • 26. Rich expressions
  Conditional expression
  • Put/Update/DeleteItem
  • attribute_not_exists(#pr.FiveStar)
  • attribute_exists(Pictures.RearView)
  How a conditional put with attribute_not_exists on the key behaves:
  1. DynamoDB first looks for an item whose primary key matches that of the item to be written.
  2. If no such item exists, the partition key attribute is absent from the result, attribute_not_exists evaluates to true, and the write proceeds.
  3. Otherwise, the attribute_not_exists condition fails and the write is prevented.
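The de-dupe pattern just described ("write only if this key doesn't already exist") comes down to one ConditionExpression on PutItem. A sketch of the request parameters, reusing the Users/SSN example from earlier slides:

```python
def build_conditional_put(table, item, key_attr):
    """PutItem parameters that succeed only when no item with the
    same partition key already exists (de-dupe / no-overwrite)."""
    return {
        "TableName": table,
        "Item": item,
        "ConditionExpression": f"attribute_not_exists({key_attr})",
    }

params = build_conditional_put(
    "Users", {"SSN": {"S": "123-45-6789"}}, "SSN")
print(params["ConditionExpression"])  # attribute_not_exists(SSN)
```

If the condition fails, DynamoDB rejects the write with a ConditionalCheckFailedException rather than silently overwriting the item.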
  • 27. Tip 1. DynamoDB Index(LSI, GSI) Tip 2. DynamoDB Scaling Tip 3. DynamoDB Data Modeling Scenario based Best Practice DynamoDB Streams
  • 29. Time series tables
  Events_table_2015_April (current table) — Event_id (partition key), Timestamp (sort key), Attribute1 … Attribute N
  Events_table_2015_March, Events_table_2015_February, Events_table_2015_January (older tables) — same schema
  Throughput per table: RCUs = 1000, WCUs = 100; RCUs = 10000, WCUs = 10000; RCUs = 100, WCUs = 1; RCUs = 10, WCUs = 1
  Hot data vs. cold data: don't mix hot and cold data; archive cold data to Amazon S3
  • 30. Use a table per time period • Pre-create daily, weekly, monthly tables • Provision required throughput for current table • Writes go to the current table • Turn off (or reduce) throughput for older tables • Pre-create heavy users, light users tables
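The routing step above ("writes go to the current table") is just deriving a table name from the write's timestamp. A minimal sketch, assuming the monthly naming scheme shown on the previous slide (Events_table_2015_April):

```python
from datetime import date

# English month names, hardcoded to keep the output locale-independent.
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December"]

def table_for_month(base, day):
    """Name of the monthly time-series table a write should land in."""
    return f"{base}_{day.year}_{MONTH_NAMES[day.month - 1]}"

print(table_for_month("Events_table", date(2015, 4, 17)))
# Events_table_2015_April
```

A scheduled job would pre-create next month's table and dial down throughput on last month's.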
  • 32. Scaling bottlenecks
  Partition 1: 2000 RCUs … Partition K: 2000 RCUs … Partition M: 2000 RCUs … Partition 50: 2000 RCUs
  ItemShopCatalog table (Product A, Product B) queried by gamers:
  SELECT Id, Description, ... FROM ItemShopCatalog
  • 33. RequestsPerSecond Item Primary Key Request Distribution Per Partition Key DynamoDB Requests
  • 34. Partition 1 Partition 2 ItemShopCatalog Table User DynamoDB User Cache popular items SELECT Id, Description, ... FROM ProductCatalog
  • 35. RequestsPerSecond Item Primary Key Request Distribution Per Partition Key DynamoDB Requests Cache Hits
  • 36. Multiplayer online gaming Query filters vs. composite key indexes
  • 37. GameId Date Host Opponent Status d9bl3 2014-10-02 David Alice DONE 72f49 2014-09-30 Alice Bob PENDING o2pnb 2014-10-08 Bob Carol IN_PROGRESS b932s 2014-10-03 Carol Bob PENDING ef9ca 2014-10-03 David Bob IN_PROGRESS Games Table Multiplayer online game data Partition key
  • 38. Query for incoming game requests
  DynamoDB indexes provide partition and sort keys
  What about queries for two equalities and a range?
  SELECT * FROM Game
  WHERE Opponent='Bob' (partition) AND Status='PENDING' (?)
  ORDER BY Date DESC (sort)
  • 39. Approach 1: Query filter
  Secondary Index — partition key: Opponent, sort key: Date; querying for Opponent = Bob
  Opponent Date GameId Status Host
  Alice 2014-10-02 d9bl3 DONE David
  Carol 2014-10-08 o2pnb IN_PROGRESS Bob
  Bob 2014-09-30 72f49 PENDING Alice
  Bob 2014-10-03 b932s PENDING Carol
  Bob 2014-10-03 ef9ca IN_PROGRESS David
  • 40. Secondary Index Approach 1: Query filter Bob Opponent Date GameId Status Host Alice 2014-10-02 d9bl3 DONE David Carol 2014-10-08 o2pnb IN_PROGRESS Bob Bob 2014-09-30 72f49 PENDING Alice Bob 2014-10-03 b932s PENDING Carol Bob 2014-10-03 ef9ca IN_PROGRESS David SELECT * FROM Game WHERE Opponent='Bob' ORDER BY Date DESC FILTER ON Status='PENDING' (filtered out)
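The filtered query on this slide translates to a key condition on Opponent plus a FilterExpression on Status. A sketch of the parameters (the index name `Opponent-Date-index` is an assumption; `Status` must go through an expression attribute name because it is a DynamoDB reserved word):

```python
def build_pending_games_query(opponent):
    """Query with a filter expression: the filter is applied after
    the key condition, so filtered-out items still consume RCUs."""
    return {
        "TableName": "Games",
        "IndexName": "Opponent-Date-index",    # assumed index name
        "KeyConditionExpression": "Opponent = :opp",
        "FilterExpression": "#st = :status",
        "ExpressionAttributeNames": {"#st": "Status"},
        "ExpressionAttributeValues": {
            ":opp": opponent, ":status": "PENDING"},
        "ScanIndexForward": False,             # ORDER BY Date DESC
    }

print(build_pending_games_query("Bob")["FilterExpression"])
```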
  • 41. Needle in a haystack Bob
  • 42. Use query filter • Send back less data “on the wire” • Simplify application code • Simple SQL-like expressions • AND, OR, NOT, () Use when your index isn’t entirely selective
  • 43. Approach 2: composite key — Status + Date = StatusDate
  DONE + 2014-10-02 = DONE_2014-10-02
  IN_PROGRESS + 2014-10-08 = IN_PROGRESS_2014-10-08
  IN_PROGRESS + 2014-10-03 = IN_PROGRESS_2014-10-03
  PENDING + 2014-09-30 = PENDING_2014-09-30
  PENDING + 2014-10-03 = PENDING_2014-10-03
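Building the composite attribute is plain string concatenation done at write time; a sketch (splitting from the right so statuses containing underscores, like IN_PROGRESS, survive the round trip):

```python
def make_status_date(status, date_str):
    """Concatenate Status and Date into the composite sort key."""
    return f"{status}_{date_str}"

def split_status_date(status_date):
    """Recover the two attributes; split on the LAST underscore
    because statuses such as IN_PROGRESS contain one themselves."""
    status, date_str = status_date.rsplit("_", 1)
    return status, date_str

key = make_status_date("PENDING", "2014-10-03")
print(key)                      # PENDING_2014-10-03
print(split_status_date("IN_PROGRESS_2014-10-08"))
# ('IN_PROGRESS', '2014-10-08')
```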
  • 44. Secondary Index Approach 2: composite key Opponent StatusDate GameId Host Alice DONE_2014-10-02 d9bl3 David Carol IN_PROGRESS_2014-10-08 o2pnb Bob Bob IN_PROGRESS_2014-10-03 ef9ca David Bob PENDING_2014-09-30 72f49 Alice Bob PENDING_2014-10-03 b932s Carol Partition key Sort key
  • 45. Opponent StatusDate GameId Host Alice DONE_2014-10-02 d9bl3 David Carol IN_PROGRESS_2014-10-08 o2pnb Bob Bob IN_PROGRESS_2014-10-03 ef9ca David Bob PENDING_2014-09-30 72f49 Alice Bob PENDING_2014-10-03 b932s Carol Secondary Index Approach 2: composite key Bob SELECT * FROM Game WHERE Opponent='Bob' AND StatusDate BEGINS_WITH 'PENDING'
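The gain of approach 2 can be shown with an in-memory simulation of the index on this slide: the begins_with condition on the composite sort key selects only the matching items, so nothing is read and then thrown away.

```python
# Rows from the slide's secondary index:
# (Opponent, StatusDate, GameId, Host)
INDEX = [
    ("Alice", "DONE_2014-10-02",        "d9bl3", "David"),
    ("Carol", "IN_PROGRESS_2014-10-08", "o2pnb", "Bob"),
    ("Bob",   "IN_PROGRESS_2014-10-03", "ef9ca", "David"),
    ("Bob",   "PENDING_2014-09-30",     "72f49", "Alice"),
    ("Bob",   "PENDING_2014-10-03",     "b932s", "Carol"),
]

def pending_games(opponent):
    """Simulate: Opponent = :opp AND begins_with(StatusDate, 'PENDING')."""
    return [row for row in INDEX
            if row[0] == opponent and row[1].startswith("PENDING")]

print(pending_games("Bob"))  # only Bob's two PENDING games
```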
  • 46. Needle in a sorted haystack Bob
  • 47. Sparse indexes Id (Hash) User Game Score Date Award 1 Bob G1 1300 2012-12-23 2 Bob G1 1450 2012-12-23 3 Jay G1 1600 2012-12-24 4 Mary G1 2000 2012-10-24 Champ 5 Ryan G2 123 2012-03-10 6 Jones G2 345 2012-03-20 Game-scores-table Award (Hash) Id User Score Champ 4 Mary 2000 Award-GSI Scan sparse hash GSIs
  • 48. Replace filter with indexes Concatenate attributes to form useful secondary index keys Take advantage of sparse indexes Use when You want to optimize a query as much as possible Status + Date
  • 50. Transactional Data Processing DynamoDB is well-suited for transactional processing: • High concurrency • Strong consistency • Atomic updates of single items • Conditional updates for de-dupe and optimistic concurrency • Supports both key/value and JSON document schema • Capable of handling large table sizes with low latency data access
  • 51. Case 1: Store and Index Metadata for Objects Stored in Amazon S3
  • 52. Case 1: Use Case
  We have a large number of digital audio files stored in Amazon S3 and we want to make them searchable
  → Use DynamoDB as the primary data store for the metadata.
  → Index and query the metadata using Elasticsearch.
  • 53. Case 1: Steps to Implement 1. Create a Lambda function that reads the metadata from the ID3 tag and inserts it into a DynamoDB table. 2. Enable S3 notifications on the S3 bucket storing the audio files. 3. Enable streams on the DynamoDB table. 4. Create a second Lambda function that takes the metadata in DynamoDB and indexes it using Elasticsearch. 5. Enable the stream as the event source for the Lambda function.
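The first Lambda function in step 1 is driven by the S3 notification event from step 2. A minimal sketch of its handler that extracts the bucket and key of each new audio file (reading the ID3 tag and writing to DynamoDB would follow, via the AWS SDK, and is omitted here; the bucket and object names are made up for illustration):

```python
def handler(event, context=None):
    """Pull (bucket, key) pairs out of an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        objects.append((s3["bucket"]["name"], s3["object"]["key"]))
    return objects

# Abbreviated shape of an S3 put-notification event:
sample_event = {"Records": [{"s3": {
    "bucket": {"name": "audio-files"},
    "object": {"key": "songs/track01.mp3"}}}]}
print(handler(sample_event))  # [('audio-files', 'songs/track01.mp3')]
```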
  • 54. Case 1: Key Takeaways DynamoDB + Elasticsearch = Durable, scalable, highly- available database with rich query capabilities. Use Lambda functions to respond to events in both DynamoDB streams and Amazon S3 without having to manage any underlying compute infrastructure.
  • 55. Case 2 – Execute Queries Against Multiple Data Sources Using DynamoDB and Hive
  • 56. Case 2: Use Case
  We want to enrich our audio file metadata stored in DynamoDB with additional data from the Million Song dataset:
  → The Million Song dataset is stored in text files.
  → ID3 tag metadata is stored in DynamoDB.
  → Use Amazon EMR with Hive to join the two datasets together in a query.
  • 57. Case 2: Steps to Implement 1. Spin up an Amazon EMR cluster with Hive. 2. Create an external Hive table using the DynamoDBStorageHandler. 3. Create an external Hive table using the Amazon S3 location of the text files containing the Million Song project metadata. 4. Create and run a Hive query that joins the two external tables together and writes the joined results out to Amazon S3. 5. Load the results from Amazon S3 into DynamoDB.
  • 58. Case 2: Key Takeaways Use Amazon EMR to quickly provision a Hadoop cluster with Hive and to tear it down when done. Use of Hive with DynamoDB allows items in DynamoDB tables to be queried/joined with data from a variety of sources.
  • 59. Case 3 – Store and Analyze Sensor Data with DynamoDB and Amazon Redshift Dashboard
  • 60. Case 3: Use Case A large number of sensors are taking readings at regular intervals. You need to aggregate the data from each reading into a data warehouse for analysis: • Use Amazon Kinesis to ingest the raw sensor data. • Store the sensor readings in DynamoDB for fast access and real- time dashboards. • Store raw sensor readings in Amazon S3 for durability and backup. • Load the data from Amazon S3 into Amazon Redshift using AWS Lambda.
  • 61. Case 3: Steps to Implement 1. Create two Lambda functions to read data from the Amazon Kinesis stream. 2. Enable the Amazon Kinesis stream as an event source for each Lambda function. 3. Write data into DynamoDB in one of the Lambda functions. 4. Write data into Amazon S3 in the other Lambda function. 5. Use the aws-lambda-redshift- loader to load the data in Amazon S3 into Amazon Redshift in batches.
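Both Lambda functions in step 1 start the same way: each Kinesis record carries a base64-encoded payload that must be decoded back into a sensor reading. A sketch of that common front half (the DynamoDB and S3 writes are omitted; field names in the sample reading are made up):

```python
import base64
import json

def handler(event, context=None):
    """Decode each Kinesis record's base64 payload into a reading."""
    readings = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        readings.append(json.loads(payload))
    return readings

# Build a fake Kinesis event the way the service would deliver it:
reading = {"sensor": "s-17", "temp": 21.5}
event = {"Records": [{"kinesis": {
    "data": base64.b64encode(json.dumps(reading).encode()).decode()}}]}
print(handler(event))  # [{'sensor': 's-17', 'temp': 21.5}]
```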
  • 62. Case 3: Key Takeaways Amazon Kinesis + Lambda + DynamoDB = Scalable, durable, highly available solution for sensor data ingestion with very low operational overhead. DynamoDB is well-suited for near-realtime queries of recent sensor data readings. Amazon Redshift is well-suited for deeper analysis of sensor data readings spanning longer time horizons and very large numbers of records. Using Lambda to load data into Amazon Redshift provides a way to perform ETL in frequent intervals.
  • 63. Tip 1. DynamoDB Index(LSI, GSI) Tip 2. DynamoDB Scaling Tip 3. DynamoDB Data Modeling Scenario based Best Practice DynamoDB Streams
  • 64. Stream of updates to a table Asynchronous Exactly once Strictly ordered • Per item Highly durable • Scale with table 24-hour lifetime Sub-second latency DynamoDB Streams
  • 65. View types — example: UpdateItem (Name = John, Destination = Pluto)
  Keys only: Name = John
  Old image (before update): Name = John, Destination = Mars
  New image (after update): Name = John, Destination = Pluto
  Old and new images: Name = John, Destination = Mars → Name = John, Destination = Pluto
  • 66. Stream Table Partition 1 Partition 2 Partition 3 Partition 4 Partition 5 Table Shard 1 Shard 2 Shard 3 Shard 4 KCL Worker KCL Worker KCL Worker KCL Worker Amazon Kinesis Client Library application DynamoDB client application Updates DynamoDB Streams and Amazon Kinesis Client Library
  • 67. DynamoDB Streams Open Source Cross-Region Replication Library Asia Pacific (Sydney) EU (Ireland) Replica US East (N. Virginia) Cross-region replication
  • 68. DynamoDB Streams and AWS Lambda
  • 69. Triggers DynamoDB Streams → Lambda function → Notify change, Derivative tables, Amazon CloudSearch, Amazon Elasticsearch Service, Amazon ElastiCache
  • 70. Analytics with DynamoDB Streams Collect and de-dupe data in DynamoDB Aggregate data in-memory and flush periodically Performing real-time aggregation and analytics EMR Redshift DynamoDB
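The "collect, de-dupe, aggregate in memory, flush periodically" pattern on this slide can be sketched as a small class; the flush destination (EMR/Redshift in the slide) is stubbed out as a list, and the flush interval is a made-up parameter:

```python
class StreamAggregator:
    """De-dupe stream records by id, aggregate counts per key in
    memory, and flush a batch every `flush_every` unique records."""

    def __init__(self, flush_every=3):
        self.seen = set()        # record ids already processed
        self.counts = {}         # in-memory aggregate
        self.flush_every = flush_every
        self.flushed = []        # stand-in for the EMR/Redshift sink

    def ingest(self, record_id, key):
        if record_id in self.seen:           # de-dupe duplicates
            return
        self.seen.add(record_id)
        self.counts[key] = self.counts.get(key, 0) + 1
        if len(self.seen) % self.flush_every == 0:
            self.flush()

    def flush(self):
        # In the real pipeline this batch would be shipped downstream.
        self.flushed.append(dict(self.counts))
        self.counts = {}

agg = StreamAggregator()
for rid, key in [(1, "a"), (1, "a"), (2, "b"), (3, "a")]:
    agg.ingest(rid, key)
print(agg.flushed)  # [{'a': 2, 'b': 1}]
```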
  • 73. We look forward to your feedback!
  https://www.awssummit.co.kr
  Visit the mobile page and complete the session survey to receive a gift after the event.
  Share your thoughts on social media with the #AWSSummit hashtag.
  Slides and session recordings will be shared soon on the official AWS Korea social channels.