Data Platform Architecture Case Studies Using Hadoop and the Hadoop Ecosystem

Hyeong-jun Kim / GRUTER
CONTENTS
1. Big Data in the Enterprise
2. e-Commerce Case
3. Security Analysis Platform Case
4. Bioinformatics Case
5. Online Content Service Case
Big Data in the Enterprise
The Enterprise IT Environment

• Today's enterprise IT environments make big data hard to adopt
IT focuses on planning and management; execution is outsourced (BAD)
An IT subsidiary handles management and execution (BAD)
Core operations/development are done in-house, with some outsourcing (GOOD)
Most work is done in-house (GOOD)
Success Factors for Big Data Projects

• Value of the analysis results > cost of the analysis
• Careful thought about what to analyze
• Continuous improvement (tuning) of the analysis results
• Led by the department that actually uses the data, not the IT department
• Not (!) a well-written project plan
• The technical ability to execute
How Big Data Projects Proceed
System planning (analysis targets, data, algorithms)
System planning (only the analysis domain is decided: marketing, productivity improvement, ...)
System cost and ROI estimation
Collection of related data (inside and outside the company)
Vendor selection
Development
Operation
Takes 3 to 6 months or more
Playing with the data
Value discovery
Reflected in the system
Ongoing activity
Conclusion!!!

• The biggest difference between traditional data analysis and today's big data is
• not data volume, not variety, and not velocity, but
• that the company itself actively uses its data to gain a differentiated, competitive weapon in product development, service features, marketing, and so on.
E-Commerce Case
(Real-Time Analysis Platform)
e-Commerce Data Analysis

• What are the requirements?
What is the reality?

• Even the most basic logs are analyzed only on a daily basis
• HTTP logs, etc.
• Business-critical data is not logged at all
• Some logs are handed over to external vendors
Overall System Architecture
Example Real-Time Analysis System Configuration

What happens when the Queue, a temporary store, fails?
If some analysis servers fail mid-analysis, what happens to the intermediate results?
How well does the analysis result store perform?
Does it offer enough functionality when serving analysis results?
http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
Real-Time Analysis Challenges #1
• Hard to satisfy all of: no duplicates, no loss, and performance
• Replicated queues and checkpointing are the key
• Frequent checkpointing degrades performance
• Infrequent checkpointing increases data loss
• Performance
• Large data volumes and complex analysis (joining with various metadata, etc.)
• Operations
• Must run without downtime
• Program deployment
• Storing analysis results
• Storage interval, checkpoints
• Store performance and functionality
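The checkpoint trade-off on this slide can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the event counts are assumptions, not part of the platform described in the talk. It counts how many checkpoint writes a given interval costs and how many events are exposed to loss or replay if the server crashes.

```python
def events_at_risk(events_processed: int, checkpoint_every: int) -> int:
    """Events handled since the last checkpoint; lost or replayed after a crash."""
    if checkpoint_every <= 0:
        raise ValueError("checkpoint_every must be positive")
    return events_processed % checkpoint_every


def checkpoints_written(events_processed: int, checkpoint_every: int) -> int:
    """Number of checkpoint writes so far; each one costs I/O on the hot path."""
    if checkpoint_every <= 0:
        raise ValueError("checkpoint_every must be positive")
    return events_processed // checkpoint_every


if __name__ == "__main__":
    processed = 10_999  # hypothetical crash point
    for interval in (100, 1_000, 10_000):
        print(interval,
              checkpoints_written(processed, interval),
              events_at_risk(processed, interval))
```

With these made-up numbers, an interval of 100 events costs 109 checkpoint writes and leaves 99 events at risk, while an interval of 10,000 costs a single write but leaves 999 events at risk.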
Real-Time Analysis Challenges #2
• Time management
• Clock synchronization in a distributed environment
• Time window synchronization
• Data time vs. system time
• Implementing analysis logic
• SQL-based
• Program-based
• Combining platforms
• Flume, Storm, Kafka, etc.
• Each provides HA and similar features on its own, but they clash when combined
• Server sizing
• Agent/Collector count ratio, CPU/network, etc.
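To show why "data time vs. system time" matters, here is a minimal sketch that assigns events to fixed time windows either by the timestamp inside the record or by the arrival time. The 60-second window and the sample timestamps are assumptions chosen for illustration.

```python
from collections import Counter

WINDOW_SEC = 60  # hypothetical window size


def window_start(epoch_sec: int, size: int = WINDOW_SEC) -> int:
    """Start of the fixed window that contains the given timestamp."""
    return epoch_sec - (epoch_sec % size)


def count_by_window(events, use_data_time: bool = True) -> Counter:
    """events: iterable of (data_time, arrival_time) pairs in epoch seconds."""
    counts = Counter()
    for data_time, arrival_time in events:
        ts = data_time if use_data_time else arrival_time
        counts[window_start(ts)] += 1
    return counts


if __name__ == "__main__":
    # The second event was generated at t=59 but arrived at t=61.
    events = [(10, 10), (59, 61), (65, 66)]
    print(count_by_window(events, use_data_time=True))
    print(count_by_window(events, use_data_time=False))
```

The late event lands in the first window when data time is used and in the second window when system time is used, so the two policies report different counts for the same input.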
The Real-Time Platform as Built (Developed In-House)
[Architecture diagram. Labels: ZooKeeper; Flume Collector (x2); Dimension Data; analysis result store (HBase); Time Window Manager (Master Role); Realtime Server (memory); Realtime Client (x2); Queue; User Processor; Replicator; Partition Proxy; Processor Engine; Partitioner; Partition #1; Partition #2; Partition #3]
Features #1
• Fixed-size cluster partitions
• Advantage: data partitioning is easy to handle
• Drawback: servers are added/removed by running shell commands
• Partition replication
• Each partition is served by two servers (Master/Slave)
• Built-in modules needed for distributed real-time analysis
• Synchronized flush across distributed servers
• Time synchronization, Esper integration module
• WorkGroup
• A single analysis requires several analysis modules to be chained together
• One cluster runs several analysis jobs concurrently
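A fixed partition count with a Master/Slave pair per partition can be sketched as follows. This is not the platform's code: the partition count, host names, and the use of CRC32 as the stable hash are all assumptions made for the example.

```python
import zlib

NUM_PARTITIONS = 3  # fixed when the cluster is created (hypothetical value)

# Each partition is served by a (master, slave) pair; host names are made up.
PARTITION_SERVERS = {
    0: ("rt-01", "rt-02"),
    1: ("rt-03", "rt-04"),
    2: ("rt-05", "rt-06"),
}


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of the key modulo the fixed partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def server_for(key: str, down: frozenset = frozenset()) -> str:
    """Return the master for the key's partition, or the slave if the master is down."""
    master, slave = PARTITION_SERVERS[partition_of(key)]
    if master not in down:
        return master
    if slave not in down:
        return slave
    raise RuntimeError("both servers for this partition are down")
```

Because the partition count never changes, a key always maps to the same partition, which is what makes the data partitioning simple. The cost is that changing the server layout means editing the partition-to-server table by hand, matching the drawback noted on the slide.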
Features #2
• Developed in-house
• Publicly available real-time analysis solutions provide the following
• Daemon server, RPC for sending and receiving data
• Programming model, data partitioning, queue integration
• Most of the usable building blocks are already available as open source
• RPC: Thrift, Avro, Protobuf, Netty
• Event, Cluster Membership, Synchronization: ZooKeeper
• Query Processing: Esper
• Queue: Kafka, RabbitMQ, ZeroMQ
Data Analysis Flow
[Flow diagram. Labels: log data; Log Parsing (x3); IP-City Data (Load in memory); hash(url); URL, Count(1) Group by URL (x2); TOP 100, Order by count Desc; WorkGroup #1 (LogType=URL), time batch 60 sec.; hash(user_id); Count (Distinct User) (x2); WorkGroup #2 (LogType=User), time batch 20 sec.; HBase Table]
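The URL branch of this flow (count per URL within a time batch, then keep the top entries) can be sketched in a single process as below. The platform in the talk distributes this work across partitions and uses Esper for the query step; this sketch uses only the Python standard library, and the function name and record layout are assumptions.

```python
from collections import Counter


def top_urls_per_batch(records, batch_sec: int = 60, top_n: int = 100):
    """records: iterable of (epoch_sec, url). Returns {batch_start: [(url, count), ...]}."""
    per_batch = {}
    for ts, url in records:
        batch_start = ts - (ts % batch_sec)
        per_batch.setdefault(batch_start, Counter())[url] += 1
    # Per batch, roughly: SELECT url, count(1) GROUP BY url ORDER BY count DESC LIMIT top_n
    return {start: counts.most_common(top_n) for start, counts in per_batch.items()}
```

In a partitioned setup each server would produce partial counts for its share of the URLs, and a final step would merge the partial results before taking the top 100.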
Conclusion
• Real-time analysis is the trend, but there are many hurdles
• Customer demands (consistency and stability both satisfied, etc.)
• Performance of metadata (JOIN) processing
• Operational difficulty (data is always flowing)
• Choose a platform suited to the properties of the data and the analysis logic
• Platforms provide only the basics
• Much has to be built on top of them
• If no suitable platform exists, building one is also an option
Security Analysis Platform Case
(Data Collection and Search)
Security Data Analysis

Collect data,
store it in an integrated store,
find security threats through analysis,
build models,
apply them to real-time detection and response systems,
and so prepare for security attacks.
Repeat this process continuously to build stronger,
more intelligent models that respond to changing security threats.
Overall Architecture
[Architecture diagram. Blocks: Data source/collector (various log data); Data source/collector (standard protocols such as FTP, HTTP); Data collector/real-time analysis; Cluster Monitoring; Cluster coordinator; Rule Manager; primary storage (File/Structured), near real-time analysis; Search engine; Batch analysis/storage; Real-time analysis result storage (File/Structured). Components: Data Source (Web Server); FTP/HTTP; Flume Agent; Thrift Source; Thrift Sink; Pipeline-Sink; Temporary; Flume Collector; Logical Node; Zookeeper; ARM; Cloumon; Cloustream; Hadoop DataNode; HBase RegionServer; NoSQL (HBase); SemiStructured; Origin File; ElasticSearch (Search, Index, Real-time Analysis); Hive; Hadoop MapReduce; Oracle/MySQL (RDB); Analysis Result]
Data Collection
• Diverse data sources = a flexible collection system
• Real-time collection = event streaming
• Diverse processing = pluggable pipeline structure
• scalability, reliability, extensibility, manageability
• Flume
[Diagram labels: data; agent; collector; storage; multiple agents and collectors]
Real-Time Data Collection #1
• Uses Flume OG
• Better centralized management than NG
• NG was not yet mature at the time of adoption
• Tailing is not easy
• The built-in tailer falls short for production use
• Minimal load on existing production machines (CPU/network, etc.)
• CPU 5% or less, memory 32 MB or less
• Checkpoint management
• Throttling when an agent restarts
• Problem: it uses up all of the network bandwidth
• Awareness of rolling files
• Windows 2000 Server?
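Three of the tailer requirements listed here (checkpoint management, throttling after a restart, and awareness of rolled files) can be illustrated with a small standard-library sketch. It is not the tailer used in the project: the checkpoint file format, the function names, and the inode-based roll detection are assumptions, and it does not go back to read the tail of a file that has already been rolled away.

```python
import json
import os


def load_checkpoint(path: str) -> dict:
    """Return the saved {'inode': int, 'offset': int}, or a fresh one if none exists."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"inode": None, "offset": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file and rename, so a crash never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def read_new_lines(log_path: str, ckpt_path: str, max_bytes: int = 64 * 1024) -> list:
    """Read at most max_bytes of complete new lines since the last checkpoint."""
    state = load_checkpoint(ckpt_path)
    st = os.stat(log_path)
    # A different inode or a shrunken file means the log was rolled: restart at 0.
    if state["inode"] != st.st_ino or st.st_size < state["offset"]:
        state = {"inode": st.st_ino, "offset": 0}
    with open(log_path, "rb") as f:
        f.seek(state["offset"])
        chunk = f.read(max_bytes)
    # Keep only complete lines; a partial last line is re-read on the next call.
    end = chunk.rfind(b"\n") + 1
    if end == 0 and len(chunk) == max_bytes:
        end = len(chunk)  # oversized line: emit it in pieces rather than stall
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    state["offset"] += end
    save_checkpoint(ckpt_path, state)
    return lines
```

The `max_bytes` cap is the throttle: after a long outage the agent catches up in bounded chunks per call instead of pushing the whole backlog at once and saturating the network.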
Real-Time Data Collection #2
• Support for diverse protocols and machines
• TCP, Syslog, SNMP, etc.
• Linux, AIX, HP-UX, Solaris, Windows
• Hard to satisfy all of: no loss, no duplicates, and performance
• Collector replication
• ACK only after Agent -> Collector -> store has persisted the data (degrades performance)
• Hard to monitor whether collection is working properly
• Monitoring must be set up for each component (Agent, Switch, Collector, store, etc.) -> difficult
• Non-development issues are the bigger difficulty
• Opening firewalls
• Resistance to installing agents
Large-Scale Data Search
• Requirements
• All collected data (hundreds of GB/day), 6 months cumulative retention, response time within 10 to 30 seconds
• The reality?
• Commercial solutions are expensive, with traffic-based licensing
• General search solutions (including open source) are tuned for serving and are weak at large-volume, long-term data retention
• Idea
• Split search into two clusters
• Indexing/searching recent data -> native ElasticSearch
• Storing/searching large volumes of historical data -> ElasticSearch for Hadoop
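One way to picture the two-cluster idea is a small routing function that decides which cluster a date-range search should go to. The slides do not say how the search application chooses, so the policy below, the 7-day cutoff, and the cluster names are all assumptions for illustration.

```python
from datetime import date, timedelta

RECENT_DAYS = 7  # hypothetical: how many days the real-time indexing cluster keeps


def clusters_for(query_from: date, query_to: date, today: date) -> list:
    """Pick the cluster(s) a date-range search should be sent to."""
    if query_from > query_to:
        raise ValueError("query_from must not be after query_to")
    recent_start = today - timedelta(days=RECENT_DAYS - 1)
    targets = []
    if query_from < recent_start:
        targets.append("archive")  # cluster holding older indexes on HDFS
    if query_to >= recent_start:
        targets.append("recent")   # real-time indexing cluster on local disks
    return targets
```

A query that spans the cutoff goes to both clusters and the caller merges the results; a query entirely inside the last week never touches the slower HDFS-backed indexes.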
Large-Scale Data Search Architecture
[Architecture diagram. Clusters: real-time indexing cluster (latest data); read-only cluster (all data). Labels: Collector (HDFSSink, ElasticSearch Sink); Hadoop FileSystem (for Analytic); Hadoop FileSystem (for Elasticsearch); ElasticSearch on Server1 and Server2 (shown twice); index1, index2 (SAS or SATA); index 7 through index 12; Index Migration Tool; HDFS Gateway (x2); Application Searcher]
Bioinformatics
(Hadoop-Based Genome Database)
Requirement: a DB for a Genome Browser

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
Challenges
• Difficulty of understanding the domain
• AATCTATA AATCTATA AATCTATA …
• Numerous algorithms and formulas
• Maxam-Gilbert sequencing
• R-Tree
• Diverse data formats
• FASTA, SAM, BAM, SNP, CNV, Inversion, Large InDel, Small InDel
• Storing and searching large numbers of records (read only)
System Configuration
[Architecture diagram. Labels: Genome Browser; Application Server; JDBC; Client; Uploader (x2); Indexer; ZooKeeper; Master Server; Server Cluster Membership; Data Server Failover; Master Election; Genome Allocation; Cluster Configuration; Meta Management; Meta Information; Data Server #1 …; Genome Unit #1; Disk Index; Memory Index; Hadoop DataNodes, each holding multiple Index Files and Data Files]
Conclusion
• Using Hadoop,
• it is possible to store large volumes of data
• and still build a system that retrieves stored data within 1 to 2 ms.
Online Content Service
(An Environment Adopting Big Data)
The Most Successful Case

• A paradigm shift in service planning
• Process change
• Planners and developers both discover new services
• A framework for playing with data
• More data sources collected
• Open source technology brought in-house
Architecture as Built
[Architecture diagram. Labels: WAS; Flume; DBMS; DW; sqoop (x2); HDFS; Active NameNode; StandBy NameNode; JournalNode; DataNode (x3); Hive only, MRv1; batch analysis; Batch Processing; RealTime; analysis rule management system; data administrator; analysis result store; Active Cluster and StandBy Cluster, each HBase with Tables; API server; end user]
• HDFS: hadoop-2.0.0-cdh4.3.0
• MRv1: hadoop-2.0.0-mr1-cdh4.3.0
• HBase: hbase-0.94.6-cdh4.3.0
• Hive: hive-0.10.0-cdh4.3.0
Project Team Composition

• Planner
• Defines analysis rules and validates data
• Feeds result data into service planning
• Architect
• Knows most of the system configuration and the data management scheme
• Takes part in development directly, and is good at it
• Developer
• Carries out most of the analysis rule development
• System operator
• Installs and operates the Hadoop cluster
• Manager
• Actively takes part in data validation
Hive

• Easy to approach for developers unfamiliar with MapReduce
• Well suited to processing data migrated with Sqoop
• Shortens analysis rule development time
Analysis Rule Management System #1

Too many Hive queries to implement
-> Are we really going to write all those queries?
Analysis of recurring patterns within the queries
Queries that form an inheritance relationship
Queries where only the parameters change

-> How can queries be made easy to write and reusable?
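The two patterns named on this slide, queries that differ only in parameters and queries that inherit from another query, can be sketched with a parameterised HiveQL skeleton. This is an illustration, not the rule management system from the talk: the table name, column names, and helper functions are made up, and plain string substitution like this is only appropriate when the parameters come from trusted rule metadata.

```python
from string import Template

# Base rule: a parameterised HiveQL skeleton. Table and column names are made up.
BASE_RULE = Template(
    "SELECT $dims, COUNT(1) AS cnt\n"
    "FROM $table\n"
    "WHERE dt BETWEEN '$start_dt' AND '$end_dt'\n"
    "GROUP BY $dims"
)

BASE_PARAMS = {"table": "access_log", "dims": "user_id"}


def derive(parent: dict, **overrides) -> dict:
    """A child rule inherits the parent's parameters and overrides some of them."""
    child = dict(parent)
    child.update(overrides)
    return child


def render(params: dict, start_dt: str, end_dt: str) -> str:
    """Fill the skeleton; substitute() raises KeyError if a parameter is missing."""
    return BASE_RULE.substitute(params, start_dt=start_dt, end_dt=end_dt)
```

For example, `render(derive(BASE_PARAMS, dims="user_id, item_id"), "2013-10-01", "2013-10-07")` yields a per-user, per-item variant of the base rule without anyone writing a second query by hand.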
Analysis Rule Management System #2
[Workflow diagram. Labels: new analysis target data added; Hive table meta information; analysis target object registration; object meta information (x2); analysis rule design; rule creation; parameter tuning (x2); ad-hoc query execution; analysis rule management/execution; automatic/batch; execution results; result lookup; result delivery API. Roles shown: system staff (x3); planner (x2)]
Analysis Result Service
• Problems to solve
• The analysis result data is too large
• users * products * days * number of analysis rules
• How do we load the analysis results?
• Must run reliably because it serves general users
• Lookup performance must also be good
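The size formula on this slide is easy to underestimate, so here is the arithmetic as a short sketch. All input figures are hypothetical; the talk gives the formula but not the actual numbers.

```python
def result_rows(users: int, products: int, days: int, rules: int) -> int:
    """Worst-case row count if every combination produces one result row."""
    return users * products * days * rules


def result_bytes(rows: int, bytes_per_row: int) -> int:
    """Raw payload size, ignoring storage overhead and replication."""
    return rows * bytes_per_row
```

With assumed values of 1,000,000 users, 1,000 products, 30 days, and 10 rules, the worst case is 300 billion rows. At an assumed 100 bytes per row that is about 30 TB before replication, which is why both loading and lookup need a store built for this scale.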
Analysis Result Service System Configuration
• Replicated system based on HBase
[Architecture diagram. Labels: analysis results (HDFS); HFileUploader; analysis result store; Active Cluster; StandBy Cluster; Active Cluster management; HBase with Tables (x2, one marked "uses the analysis cluster"); WAS (x2); ZooKeeper]
Rollout Process #1
• Stage1
• Expectations shaped by prior DW experience
• Requirements that ignored the characteristics of big data
• Analysis carried out in an Agile manner
• Training and hands-on practice for the development/operations teams
• Stage2
• Requirements that take the characteristics of big data into account
• Business users understand how long data analysis takes
• Growing business interest after Stage1 results were shared
Rollout Process #2
• Stage3
• Live service for end users opened
• Sharp rise in service planning requests that use big data
• Growing technical maturity of the development/operations teams
After a year of working together,
the basics are now in place
http://si.wsj.net/public/resources/images/OB-UA904_0805bo_G_20120805170407.jpg
http://runtokorea.com/wp-content/uploads/2013/02/1218_boston-marathon-2.jpg
Q&A
THANK YOU

DeView2013 Big Data Platform Architecture with Hadoop - Hyeong-jun Kim
