Hadoop Engineering v.1.0 for dataconference.io 
2014-12-06 
정구범
Search developer at Daumkakao 
mypowerbox@gmail.com
Hadoop Engineering? 
Hadoop 
open-source software framework 
for distributed storage 
and distributed processing 
of Big Data on clusters of commodity hardware 
- Wikipedia 
Engineering 
the application of scientific, economic, social, and practical knowledge in order to 
invent, design, build, maintain, and improve structures, 
machines, devices, systems, materials and processes 
- Wikipedia
It's hard! Let's keep it simple!!
Hadoop engineering
All of the planning and activity involved in building a target system on top of Hadoop
But… it's far too broad…
Honestly, just walking through the whole process would keep us all from going home tonight…
So we'll only cover what fits in the time we have.
And the rest?
Definitely next time!
to be continued~
First, let's drop the illusions!
With Hadoop, just keep adding nodes and capacity/performance keep going up!?
In theory, storage capacity and performance grow in proportion to the nodes you add.
But in reality…
The system's limits ≤ your money's limits
Even PC-grade machines will deliver enormous capacity and performance if you have enough of them!?
Even if hundreds of Kia Mornings race down the Autobahn until they overheat…
next to a Ferrari they're still just oxcarts…
A PC is not a server! (component performance, durability, and so on)
The wise move is to use servers with good price/performance (ROI), in the right amounts.
Budget spending strategy
The bigger the budget, the more performance you get, and more than in proportion.
Buy a little, often vs. buy a lot in one go
But in reality, volume buying at thin margins wins! (Amazon, Costco, Walmart…)
If you really want to save money, hold off for a while and then spend big in one go!
But buying that much at once makes the risk too big.
So… you need a very detailed, meticulous strategy.
e.g.) Hundreds of millions to tens of billions of won invested, but the equipment didn't work well together, performance ended up poor, and the whole thing failed.
You're fired!!!
A hardware purchasing strategy that won't fail
Meet and negotiate with as many vendors as you can.
Big vendors aren't always expensive.
And small vendors are even less reliably cheap.
The moment you cling to false beliefs or let your guard down, you become an easy mark.
Friends, seniors or juniors from school or work… trust nobody. Trust only the numbers!
Use the most common spec as your baseline.
Set a baseline spec that every vendor can meet, and negotiate from there.
Dependence on specific equipment is easy to fall into and really hard to get out of.
Rigorously exclude any spec that highlights one particular vendor's features.
No lump sum? This is the era of renting, not buying!
When your core services and data already live on AWS
EMR is very sensible: you use only as much as you need, only when you need it.
However,
Long-term/continuous use → cost issues
Outage → you turn into a sunflower zombie, staring at Amazon and waiting
Strongly recommended for those who have bet their whole fate on AWS.
When building it from public cloud VMs
Never, ever do this.
- reflections of a pioneer (or guinea pig) who went through that futile exercise
You can satisfy neither performance nor cost.
We built a private cloud, but…
When built on CloudStack
Reasonably usable if you keep the scale within a single rack (same RVM).
You can get somewhat better performance than an equivalent EMR setup.
However,
You must pin each cluster entirely onto one physical rack.
Dilemma: then why use a cloud at all???
You can spread it across several physical racks, but you have to accept a huge performance hit.
When built on OpenStack
Structurally more flexible than CloudStack, and closer to ideal.
However,
In practice the network needs enormous bandwidth.
Recommended if you can invest more in the network than in the servers. (Hundreds of Gbps?)
If you build it yourself, where does it go?
In an in-house server room run by the company
Saves rack-space and network costs; the best option for incident response and maintenance.
However,
(unless you want to take down the whole company network)
you must set up VLANs and a separate closed network to contain the enormous traffic.
If the server room doesn't meet IDC standards, extra considerations (and costs) pile up.
Stashing it in an IDC via co-location
If you have no in-house server room or not enough rack space, an IDC is the only option.
However,
Rack-space and network costs are high. (over 1,000,000 KRW per 1U per year)
You need a brave soul who will go into the IDC at any hour to recover when something fails.
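The co-location price above turns into a yearly bill with one multiplication. A minimal sketch, assuming the slide's 1,000,000 KRW per 1U per year is applied as a flat lower-bound rate and that network fees are billed separately; `annual_colo_cost_krw` and its parameters are hypothetical names.

```python
# Lower-bound yearly rack-space cost for servers co-located in an IDC.
# Assumption: the slide's "over 1,000,000 KRW per 1U per year" is used as a
# flat per-U rate; network/bandwidth fees are NOT included.
MIN_KRW_PER_U_PER_YEAR = 1_000_000


def annual_colo_cost_krw(servers: int, units_per_server: int,
                         krw_per_u: int = MIN_KRW_PER_U_PER_YEAR) -> int:
    """Return the estimated yearly rack-space cost in KRW (a lower bound)."""
    if servers < 0 or units_per_server < 1:
        raise ValueError("servers must be >= 0 and units_per_server >= 1")
    return servers * units_per_server * krw_per_u


if __name__ == "__main__":
    # 69 servers, as in the capacity-planning example, in 1U vs 2U chassis.
    print(annual_colo_cost_krw(69, 1))  # 69000000
    print(annual_colo_cost_krw(69, 2))  # 138000000
```

Reusing the 69-server count from the capacity-planning example, moving from 1U to 2U chassis doubles the rack-space bill, so chassis density matters as much as disk count when you go the IDC route.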
How do you estimate capacity?
① Daily ingest volume
② Expected growth rate over the next 2~3 years (estimate it conservatively from service metrics and the like)
③ Volume generated internally by processing existing data
④ Retention period (how long data is kept without being deleted)
⑤ OS install size (1~20GB per node)
⑥ Temporary space (HDFS temp and shuffle, typically 10%)
⑦ Installed software size (1~30GB per node)
⑧ Space for Hadoop's own logs (typically 10%)
⑨ Headroom for failures (typically 30%)
⑩ Disks per node (1U = 4 or 8, 2U = 12 or 24)
⑪ Capacity of each disk
⑫ Unit correction (a "1TB" disk is really 1,000,000,000,000 bytes → about 931GB in binary units → needs a ~10% correction)
- HDFS storage requirement: Hs = (① + ③) × ② × ④ × 3 [replicas]
ex) Hs = (500GB + 30GB) × 101/100 × 365 days × 3 [replicas] = 586,153.5GB ≈ 573TB (÷1024, rounded up)
- First-pass server count: St1 = Hs / ⑩ / ⑪ × ⑫
ex) St1 = 573 / 4 / 3 × 1.1 [⑫ +10%] ≈ 52.5 → 53 servers (always round any fraction up to the next whole server)
- Total physical capacity: Ts = (Hs + (⑤ + ⑦) × St1 × ⑥ × ⑧) × ⑨
ex) Ts = (573TB + (20GB + 30GB) × 53 servers × 1.1 [⑥ +10%] × 1.1 [⑧ +10%]) × 1.3 [⑨ +30%] ≈ 750TB
- Final server count: St2 = Ts / ⑩ / ⑪ × ⑫
ex) St2 = 750 / 4 / 3 × 1.1 = 68.75 → 69 servers
P.S. With Hive you need more detail (depending on the file format and compression type…)
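The four formulas can be sketched as Python functions. This is a minimal sketch, assuming every percentage is applied as a multiplier (10% → 1.1, 30% → 1.3) and that GB are converted to TB by dividing by 1024, which is what reproduces the slide's 573TB before rounding; the function and constant names are hypothetical.

```python
import math

# Percentages from the slide applied as multipliers (assumption: "+10%" -> 1.1).
TEMP = 1.1       # ⑥ HDFS temp + shuffle space
LOGS = 1.1       # ⑧ Hadoop's own logs
HEADROOM = 1.3   # ⑨ failure headroom
UNIT_FIX = 1.1   # ⑫ a "1TB" disk is 10**12 bytes, about 931GB in binary units


def hdfs_required_tb(daily_gb: float, derived_gb: float, growth: float,
                     retention_days: int, replicas: int = 3) -> float:
    """Hs = (① + ③) x ② x ④ x replicas, with GB -> TB done as /1024."""
    return (daily_gb + derived_gb) * growth * retention_days * replicas / 1024


def server_count(total_tb: float, disks_per_node: int, disk_tb: float) -> int:
    """St = total / ⑩ / ⑪ x ⑫, rounded up to a whole server."""
    return math.ceil(total_tb / disks_per_node / disk_tb * UNIT_FIX)


def physical_tb(hs_tb: float, os_gb: float, software_gb: float,
                servers: int) -> float:
    """Ts = (Hs + (⑤ + ⑦) x St1 x ⑥ x ⑧) x ⑨, in TB."""
    per_node_tb = (os_gb + software_gb) / 1024
    return (hs_tb + per_node_tb * servers * TEMP * LOGS) * HEADROOM


if __name__ == "__main__":
    hs = hdfs_required_tb(500, 30, 1.01, 365)  # the slide's example inputs
    st1 = server_count(hs, 4, 3)               # 4 disks of 3TB per node
    ts = physical_tb(hs, 20, 30, st1)
    st2 = server_count(ts, 4, 3)
    print(round(hs, 1), st1, round(ts, 1), st2)  # 572.4 53 748.2 69
```

Keeping Hs unrounded gives about 748TB instead of 750TB, but the same 53 and 69 server counts, so the rounding on the slide does not change the purchase decision.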
We got it in somehow… am I just impatient, or are the servers slow?
Most performance issues show up even if you prepare for them in advance.
(It would be strange if none did… unless you haven't been running any heavy jobs…)
There are many different possible causes.
Where most issues arise → the slowest components → disk & network!
Ordinary SATA disk = about 150MB/s on average
1Gb network = about 100MB/s on average
A 1Gb link is a turtle that can't even keep up with a single ordinary disk…
That's why 10Gb has become the default at most places running Hadoop.
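The turtle comparison is simple arithmetic. A rough sketch, assuming about 150MB/s per SATA disk (the slide's average) and about 80% of a link's raw bit rate usable as payload, which is the assumption that reproduces the slide's "1Gb ≈ 100MB/s"; the names are hypothetical.

```python
# Rough comparison of disk vs. network throughput on one node.
# Assumptions: ~150 MB/s per SATA disk (the slide's average) and ~80% of a
# link's raw bit rate usable as payload. Real numbers vary with hardware.
DISK_MB_S = 150.0
LINK_EFFICIENCY = 0.8


def nic_mb_s(gbps: float, efficiency: float = LINK_EFFICIENCY) -> float:
    """Approximate usable MB/s for a link of `gbps` gigabits per second."""
    return gbps * 1000 / 8 * efficiency


def disks_per_link(gbps: float, disk_mb_s: float = DISK_MB_S) -> float:
    """How many disks' worth of sequential throughput one link can carry."""
    return nic_mb_s(gbps) / disk_mb_s


if __name__ == "__main__":
    for gbps in (1, 10):
        print(gbps, round(nic_mb_s(gbps)), round(disks_per_link(gbps), 2))
    # 1 100 0.67
    # 10 1000 6.67
```

Even a 10Gb link carries only about six or seven disks' worth of sequential I/O, so a node with a dozen data disks can still outrun its NIC.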
Disk performance issue: use RAID well and performance changes.
Recommended configuration by case (chassis, disk size and count) and by node role:
1U, 2.5" disks (8~10)
• Data Node, boot area: high availability; 2 disks in RAID-1
• Data Node, DFS area: high performance; RAID-0 over pairs of disks → 3~4 partitions
• Master Node: high availability; 2 disks in RAID-1 (boot) + the rest in RAID-10 (package + data), or all RAID-10
• Other (collection/import/export): high availability + high performance; 2 disks in RAID-1 (boot) + the rest in RAID-10 or 5 (package + data), or all RAID-10 or 5
1U, 3.5" disks (4)
• Data Node, boot area: maximize usable capacity; a minimal OS partition on 1 disk with the rest as DFS area (alternatively, add an SSD dedicated to the OS)
• Data Node, DFS area: maximize usable capacity; a separate partition on each disk → 3~4 partitions
• Master Node: high availability; all RAID-10
• Other (collection/import/export): high availability + high performance; all RAID-10 or 5
2U, 2.5" disks (20~24)
• Data Node, boot area: high availability; 2 disks in RAID-1
• Data Node, DFS area: high performance; RAID-0 over pairs of disks → 9~11 partitions
• Master Node: high availability; 2 disks in RAID-1 (boot) + the rest in RAID-10 (package + data), or all RAID-10
• Other (collection/import/export): high availability + high performance; 2 disks in RAID-1 (boot) + the rest in RAID-10 or 5 (package + data), or all RAID-10 or 5
2U, 3.5" disks (8~10)
• Data Node, boot area: high availability; 2 disks in RAID-1 (alternatively, add an SSD dedicated to the OS)
• Data Node, DFS area: high performance; RAID-0 over pairs of disks → 3~4 partitions
• Master Node: high availability; 2 disks in RAID-1 (boot) + the rest in RAID-10 (package + data), or all RAID-10
• Other (collection/import/export): high availability + high performance; 2 disks in RAID-1 (boot) + the rest in RAID-10 or 5 (package + data), or all RAID-10 or 5
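To compare how much raw space each layout in the table keeps, a small capacity model helps. A sketch under stated assumptions: identical disks, RAID-1 modeled as a 2-disk mirror, RAID-10 as mirrored pairs, controller and filesystem overhead ignored; the node with ten 1TB 2.5" disks is a hypothetical example.

```python
# Usable capacity of per-node RAID layouts, assuming identical disks.
# RAID-1 is modeled as a 2-disk mirror and RAID-10 as mirrored pairs;
# controller/filesystem overhead is ignored.
def usable_disks(level: str, n: int) -> int:
    """Disks' worth of usable space in one array of n identical disks."""
    if level in ("0", "jbod"):
        if n < 1:
            raise ValueError("need at least one disk")
        return n
    if level == "1":
        if n != 2:
            raise ValueError("this sketch models RAID-1 as a 2-disk mirror")
        return 1
    if level == "5":
        if n < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return n - 1          # one disk's worth goes to parity
    if level == "10":
        if n < 4 or n % 2:
            raise ValueError("RAID-10 needs an even number of disks, >= 4")
        return n // 2         # every block is mirrored once
    raise ValueError(f"unknown RAID level: {level!r}")


def layout_usable_tb(arrays: list[tuple[str, int]], disk_tb: float) -> float:
    """Sum usable TB over a node's arrays given as (level, disk_count)."""
    return sum(usable_disks(level, n) for level, n in arrays) * disk_tb


if __name__ == "__main__":
    # Hypothetical 1U node with ten 1TB 2.5" disks.
    data_node = [("1", 2)] + [("0", 2)] * 4   # boot mirror + 4 RAID-0 pairs
    master = [("1", 2), ("10", 8)]            # boot mirror + RAID-10
    other = [("1", 2), ("5", 8)]              # boot mirror + RAID-5
    for name, layout in (("data", data_node), ("master", master),
                         ("other", other)):
        print(name, layout_usable_tb(layout, 1.0))
    # data 9.0
    # master 5.0
    # other 8.0
```

The Data Node layout keeps the most space because it gives up redundancy inside the node (one failed disk takes its whole RAID-0 pair offline) and leans on HDFS replication instead; HDFS's 3 replicas still apply on top of these figures.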
Disk performance issue: so will SSDs make everything dramatically better?
Why do people want SSDs?
→ High speed and tremendous random-access performance (in practice, often adopted in Hadoop clusters that run HBase)
But durability is still a problem → a limited number of write cycles (when actually used as an RDB's main store, they died within 6 months)
→ If you install them at the same time in several machines with the same role, they fail almost all at once
→ If they went into DataNodes… even Hadoop's 3 replicas could become useless.
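Why drives installed together die together comes down to write endurance. A minimal sketch, assuming a constant write rate and an endurance rating in terabytes written (TBW); the 150 TBW and 0.75TB/day figures are hypothetical, not measurements from the source.

```python
from datetime import date, timedelta

# Hypothetical endurance figures for illustration only; real TBW ratings come
# from the drive's datasheet and real write volume from monitoring.
RATED_TBW = 150.0        # total terabytes the drive is rated to write
DAILY_WRITES_TB = 0.75   # terabytes written to each drive per day


def wear_out_date(installed: date, rated_tbw: float,
                  daily_writes_tb: float) -> date:
    """Date a drive reaches its rated write endurance at a constant rate."""
    if daily_writes_tb <= 0:
        raise ValueError("daily_writes_tb must be positive")
    return installed + timedelta(days=rated_tbw / daily_writes_tb)


if __name__ == "__main__":
    installed = date(2014, 1, 1)
    # In a balanced cluster every DataNode receives a similar share of replica
    # writes; this sketch assumes identical load on the three replica holders.
    dates = {f"dn{i}": wear_out_date(installed, RATED_TBW, DAILY_WRITES_TB)
             for i in (1, 2, 3)}
    print(dates)
    print(len(set(dates.values())))  # 1 -> every replica holder at once
```

Because every drive in the sketch gets the same rating, install date, and load, all three replica holders hit their endurance limit on the same day, which is exactly the scenario where 3-replica protection stops helping.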
Products with good durability and performance cost 10× or more → Fusion IO (is this a status symbol now?)
SSDs, but are they really that fast? → an ordinary SSD is around 500MB/s; Fusion IO is typically around 1GB/s
Workaround: when you need SSD-class speed
→ a single HDD is typically around 150MB/s (up to 170MB/s confirmed on the latest models)
→ bundle 4 or more HDDs into RAID-10 and you get around 350MB/s
→ RAID controllers have improved enormously lately (RAID-5 comes out faster than RAID-10; always check the specs!)
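A simple throughput model shows where the ~350MB/s figure comes from. This is an idealized sketch, assuming large sequential writes, a controller that is not itself the bottleneck, and no parity-calculation cost; it counts the data-bearing disks and multiplies by per-disk speed. `seq_write_mb_s` is a hypothetical name.

```python
# Idealized sequential-write model: data-bearing disks x per-disk speed.
# Assumptions: large sequential writes, a controller that is not the
# bottleneck, and no parity-computation cost.
def seq_write_mb_s(level: str, n: int, disk_mb_s: float = 150.0) -> float:
    """Upper-bound sequential write MB/s for an array of n identical HDDs."""
    if level == "0" and n >= 1:
        data_disks = n
    elif level == "5" and n >= 3:
        data_disks = n - 1      # full-stripe writes: one disk's worth is parity
    elif level == "10" and n >= 4 and n % 2 == 0:
        data_disks = n // 2     # each stripe member is a mirrored pair
    else:
        raise ValueError(f"unsupported level/disk count: {level!r}, {n}")
    return float(data_disks * disk_mb_s)


if __name__ == "__main__":
    print(seq_write_mb_s("10", 4, 170))  # 340.0
    print(seq_write_mb_s("5", 4, 170))   # 510.0
```

Four disks at 170MB/s in RAID-10 give 2 × 170 = 340MB/s, close to the slide's figure, while RAID-5 on the same disks has three data-bearing disks, which is why a fast controller can let RAID-5 pull ahead. RAID-10 reads may go higher than this, since both halves of a mirror can serve reads, so measure on the actual controller.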
Disk performance issue: you need OS tuning too!
Surprisingly, you run into quite a few Linux kernel bugs.
→ The best-known is the THP (Transparent Huge Pages) issue on RHEL 6.2/6.3 (THP must be disabled) http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/
I/O scheduler → without RAID, use deadline on DataNodes
→ with RAID, use noop on DataNodes
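On a running node these settings live in sysfs. A minimal sketch, assuming root access and sysfs mounted at /sys; `disable_thp` and `set_io_scheduler` are hypothetical helpers, and the `sysfs` parameter exists only so the code can be pointed at a scratch directory. RHEL 6 keeps its THP knobs under `redhat_transparent_hugepage`, mainline kernels under `transparent_hugepage`, so the sketch writes whichever exists.

```python
from pathlib import Path

# RHEL 6 exposes THP under a Red Hat-specific directory; mainline kernels use
# the second one. Only the knobs that actually exist are touched.
THP_DIRS = ("kernel/mm/redhat_transparent_hugepage",
            "kernel/mm/transparent_hugepage")


def disable_thp(sysfs: Path = Path("/sys")) -> list[Path]:
    """Write 'never' to every THP 'enabled'/'defrag' knob found; needs root."""
    written = []
    for d in THP_DIRS:
        for knob in ("enabled", "defrag"):
            path = sysfs / d / knob
            if path.exists():
                path.write_text("never\n")
                written.append(path)
    return written


def set_io_scheduler(device: str, behind_raid: bool,
                     sysfs: Path = Path("/sys")) -> str:
    """noop behind a RAID controller, deadline on bare disks (per the slide)."""
    scheduler = "noop" if behind_raid else "deadline"
    (sysfs / "block" / device / "queue" / "scheduler").write_text(scheduler + "\n")
    return scheduler
```

Run as root, for example `disable_thp()` followed by `set_io_scheduler("sdb", behind_raid=True)` (sdb is a placeholder device). These writes do not survive a reboot, so they usually go into a boot script; on newer multi-queue (blk-mq) kernels the scheduler names are `mq-deadline` and `none` instead.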
Squeeze every bit out of the disk cache. → Linux's read-ahead cache is only 128KB by default.
→ Raise it to a sensible value. (2MB is the usual recommendation)
→ Starting from 1MB, test while increasing it in 1~8MB steps, up to the disk's own cache size (typically 64MB).
→ Until when? → Until I/O wait stops occurring or its curve levels off… take the cache size at that point as the optimum.
→ If a RAID controller is installed, it has up to 2GB of cache.
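The read-ahead sweep above can be sketched as three small pieces: set the size, list the sizes to try, and pick the knee of the measured I/O wait curve. A sketch under assumptions: the sweep results are hypothetical numbers, collecting I/O wait (for example with `iostat` while a representative job runs) is left out, and `min_gain` is an assumed threshold for "the curve has levelled off".

```python
from pathlib import Path


def set_read_ahead_kb(device: str, kb: int, sysfs: Path = Path("/sys")) -> None:
    """Set the block device's read-ahead window (in KB) via sysfs; needs root."""
    (sysfs / "block" / device / "queue" / "read_ahead_kb").write_text(f"{kb}\n")


def candidate_sizes_kb(step_mb: int = 1, limit_mb: int = 64) -> list[int]:
    """Sizes to try: step_mb, 2*step_mb, ... up to limit_mb, returned in KB."""
    return [mb * 1024 for mb in range(step_mb, limit_mb + 1, step_mb)]


def pick_read_ahead(results: list[tuple[int, float]],
                    min_gain: float = 0.5) -> int:
    """Choose the knee of an I/O-wait sweep.

    results: (read_ahead_kb, iowait_percent) pairs in increasing size order.
    Returns the first size where I/O wait reaches zero or where the next step
    improves it by less than `min_gain` percentage points.
    """
    if not results:
        raise ValueError("no measurements")
    for (kb, wait), (_, next_wait) in zip(results, results[1:]):
        if wait <= 0 or wait - next_wait < min_gain:
            return kb
    return results[-1][0]


if __name__ == "__main__":
    print(candidate_sizes_kb(8))
    # Hypothetical sweep: (read_ahead_kb, iowait %) measured per setting.
    sweep = [(1024, 12.0), (2048, 6.5), (4096, 4.0), (8192, 3.8), (16384, 3.7)]
    print(pick_read_ahead(sweep))  # 4096
```

With the sample sweep, going from 4MB to 8MB improves I/O wait by only 0.2 points, so 4MB (4096KB) is chosen. `read_ahead_kb` is in kilobytes, so the usual 2MB recommendation is 2048.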
I want to know the global reference specs!
You'll find every hint at www.opencompute.org.
It collects Facebook's standard hardware specs…
Facebook?
Probably the company that runs more Hadoop than anyone else in the world…
By the way, download the network-related PDFs there and have a look…
10Gb is the baseline, of course.
Uplinks: up to twelve 40Gb links!!!
(40 × 12 = 480Gbps of bandwidth)
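Uplink numbers like these are usually read as a rack switch's oversubscription ratio. A minimal sketch, assuming a hypothetical rack of 48 servers with one 10Gb NIC each (the 48 is not from the source); `oversubscription` is a hypothetical helper name.

```python
def oversubscription(servers: int, server_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Server-facing bandwidth divided by uplink bandwidth for one rack switch."""
    uplink_total = uplinks * uplink_gbps
    if uplink_total <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return servers * server_gbps / uplink_total


if __name__ == "__main__":
    # Hypothetical rack of 48 servers with 10Gb NICs.
    print(oversubscription(48, 10, 12, 40))  # 1.0 -> non-blocking
    print(oversubscription(48, 10, 4, 40))   # 3.0 -> 3:1 oversubscribed
```

With twelve 40Gb uplinks such a rack is non-blocking (1:1); with only four it is 3:1 oversubscribed, so cross-rack traffic such as shuffle and replication gets roughly a third of the NIC speed when every server sends at once.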
But…
Going by experience, my guess is
even this spec falls short at Facebook?
From what I heard unofficially, through official channels…
some US companies are currently running benchmark tests (BMT) on 100Gb equipment…
Thank you.
