Enabling Exploratory Analytics of Data in
Shared-service Hadoop Clusters
PRESENTED BY Sagi Zelnick Principal Architect @ Yahoo and Ledion Bitincka Principal Architect @ Splunk
Hadoop Summit June 2014 San Jose, CA
Overview
2 Yahoo Proprietary
•  Hadoop @ Yahoo: 8+ years of innovation
•  Hunk @ Yahoo: organization-wide investment for next 3+ years
•  Yahoo provides Hunk as a self-service to explore, analyze & visualize data in HDFS
›  Hunk allows for visually browsing very complex tables (250+ fields)
›  Rapid prototyping for new jobs with almost instant results for searches, without having to wait for the entire job/query to finish
›  Cuts down on development cycles through faster interaction with results
›  Built-in graphs/charts make for a powerful solution in many situations
About your speakers
Sagi Zelnick, Principal Architect, Yahoo
Ledion Bitincka, Principal Architect, Splunk
Hunk + Hadoop @ Yahoo
History of Hadoop innovation @ Yahoo
Over 600PB of Hadoop storage (over half an Exabyte)
•  Very large clusters used by many groups across the enterprise.
•  More than 35,000 individual datanodes.
•  Hadoop is provided as a service.
•  Multiple cluster types such as research, dev, sandbox and production.
•  Services such as HBase, Hive, Oozie, etc.
•  Users are free to run jobs, but have resource constraints.
•  Maintained by the Grid Operations Group.
Improving operational visibility with Hunk
•  We pointed Hunk at many operational logs and event data we already had on the grid.
•  This includes system metrics, HDFS ops, JVM stats and YARN metrics.
•  Created instrumentation to measure usage per user and job.
•  Analyzed terabytes of NameNode audit logs.
•  Job history leveraged for visualizing usage/growth and historical views.
•  Custom events for HBase statistics.
Tracking Hadoop performance and metrics in Hunk

Use Case                         | Customer                 | Benefits
System metrics from 35k nodes    | Grid Ops / Grid Customers | Identify slow tasks/nodes when debugging
Historical insights of resources | All Grid Customers       | Track organic growth
Job performance                  | All Grid Customers       | Improved job SLAs
HBase metrics                    | All Grid Customers       | Track region/RS/table metrics…
Job logs in near real-time       | All Grid Customers / Ops | Search for errors directly from the YARN logs
NameNode operational data        | Research, Dev            | Improved performance and stability
Measuring NameNode performance pre & post upgrades
•  Historical visualizations of all operations.
•  Search data in Hunk from billions of NameNode events.
•  Measure JVM and memory usage.
•  Insights into operational performance.
Visualization using Hunk

index="simon_blue_new_all" this_cluster="dilithiumblue*" (log_subtype="DFS" #hdfs=hdfs) | timechart span=1h avg(number*) as num_*

Last 7 days: 10,086 events (5/15/14 1:00:00.000 AM to 5/22/14 1:36:34.000 AM)

[Chart: hourly timechart of num_BlockReports, num_CopyBlockOperations, num_HeartBeats, num_ReadBlockOperations, num_ReadMetadataOperations, num_ReplaceBlockOperations, num_WriteBlockOperations and num_blockChecksumOp; x-axis _time, Fri May 16 to Tue May 20, 2014; y-axis ticks 200,000,000 to 600,000,000]

_time                      | 2014-05-15 01:00 | 2014-05-15 02:00 | 2014-05-15 03:00
num_BlockReports           | 1124437.735902   | 1115496.290492   | 1110372.4173…
num_CopyBlockOperations    | 46721126.819672  | 53597000.262295  | 56566721.704918
num_HeartBeats             | 514957.384098    | 298717.637049    | 428494.9449…
num_ReadBlockOperations    | 12930433.077869  | 10402176.717213  | 13296385.590164
num_ReadMetadataOperations | 0.000000         | 0.000000         | 0.000000
num_ReplaceBlockOperations | 94210832.786885  | 94109944.655738  | 94141430.295082
num_WriteBlockOperations   | 63512425.967213  | 93916552.393443  | 97353478.229508
num_blockChecksumOp        | 13975.306557     | 35459.288689     | 20307.549344
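The search on this slide averages every `number*` counter per one-hour bucket (`timechart span=1h avg(...)`). As a rough illustration of what that bucketing does, here is a minimal Python sketch. The event layout (a dict with a `_time` datetime) and the field name `numberHeartBeats` are assumptions inferred from the `number*` to `num_*` rename, and the sample values are made up.

```python
from collections import defaultdict
from datetime import datetime


def hourly_average(events, field):
    """Average `field` per one-hour bucket, similar in spirit to
    `timechart span=1h avg(field)`. Events lacking the field are skipped."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event in events:
        if field not in event:
            continue
        # Truncate the timestamp to the start of its hour to get the bucket.
        bucket = event["_time"].replace(minute=0, second=0, microsecond=0)
        sums[bucket] += event[field]
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sorted(sums)}


# Made-up sample events; the field name is hypothetical.
events = [
    {"_time": datetime(2014, 5, 15, 1, 5), "numberHeartBeats": 100.0},
    {"_time": datetime(2014, 5, 15, 1, 40), "numberHeartBeats": 300.0},
    {"_time": datetime(2014, 5, 15, 2, 10), "numberHeartBeats": 50.0},
]
averages = hourly_average(events, "numberHeartBeats")
for bucket, value in averages.items():
    print(bucket.isoformat(), value)
```

Switching to the 5-minute view used on the next slide would only change how the timestamp is truncated.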
Sample troubleshooting in Hunk of 750 million events

…n=5m avg(number*) as num_*

Last 2 days: 2,753 events (5/20/14 1:14:21.000 AM to 5/22/14 1:14:21.000 AM)

[Chart: 5-minute timechart of num_BlockReports, num_CopyBlockOperations, num_HeartBeats, num_ReadBlockOperations, num_ReadMetadataOperations, num_ReplaceBlockOperations, num_WriteBlockOperations and num_blockChecksumOp; x-axis _time, Tue May 20 to Wed May 21, 2014; y-axis ticks 250,000,000 to 1,000,000,000]

_time                      | 2014-05-20 01:15:00
num_BlockReports           | 1056047.024000
num_CopyBlockOperations    | 34677652.000000
num_HeartBeats             | 124121.264000
num_ReadBlockOperations    | 26242490.800000
num_ReadMetadataOperations | 0.000000
num_ReplaceBlockOperations | 88112292.800000
num_WriteBlockOperations   | 126478486.400000
num_blockChecksumOp        | 51405.346000

2014-05-20 01:20:00 (row cut off at the slide edge, values as shown): 105551 30920700. 10653 22756041.8 0.000000 87745422.40 92323387.2 32070.48
Big picture plus granular details

index="simon_blue_new_all" this_cluster="dilithiumblue*" (log_subtype="JVM" ProcessName="NameNode") | timechart span=5m avg(Threads*) as threads_*

Last 2 days: 8,463 events (5/20/14 12:00:00.000 AM to 5/22/14 12:00:00.000 AM)

[Chart: 5-minute timechart of threads_Blocked, threads_New, threads_Runnable, threads_Terminated, threads_TimedWaiting and threads_Waiting; x-axis _time, Tue May 20 to Wed May 21, 2014; y-axis ticks 200 and 400]

_time               | threads_Blocked | threads_New | threads_Runnable | threads_Terminated | threads_TimedWaiting | threads_Waiting
2014-05-20 00:00:00 | 72.360000       | 10.638333   | 5.485833         | 0.000000           | 21.208333            | 78.555000
2014-05-20 00:05:00 | 70.177333       | 10.554667   | 5.277333         | 0.000000           | 20.744667            | 76.578000
2014-05-20 00:10:00 | 70.211333       | 9.998667    | 5.022000         | 0.000000           | 19.333333            | 73.766667
2014-05-20 00:15:00 | 70.300667       | 10.268000   | 5.156667         | 0.000000           | 17.488667            | 70.122000
2014-05-20 00:20:00 | 70.422667       | 10.376000   | 5.188000         | 0.000000           | 15.700000            | 66.611333
2014-05-20 00:25:00 | 70.444000       | 10.288000   | 5.144000         | 0.000000           | 14.089333            | 63.400667
Analyzing NameNode RPC calls (troubleshooting)
•  Who is making which RPC call (open, listStatus, create, etc.).
•  How often they are making these RPC calls.
•  Which IP/host the calls are coming from.
•  Search and visualize historical data from billions of events.
•  Prevent NameNode abuse/misuse.
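The who/what/where questions on this slide amount to counting audit events by user, client address and command. A minimal Python sketch of that tally follows. It assumes the common key=value layout of HDFS NameNode audit lines (`ugi=`, `ip=/`, `cmd=`); the regular expression, function name and sample lines are illustrative and do not come from the deck.

```python
import re
from collections import Counter

# Assumed audit-line layout: "... ugi=<user> ... ip=/<addr> cmd=<command> ..."
AUDIT_RE = re.compile(r"ugi=(?P<user>\S+).*?ip=/(?P<ip>\S+)\s+cmd=(?P<cmd>\S+)")


def count_rpc_calls(lines):
    """Count audit events per (user, ip, cmd); unparseable lines are ignored."""
    counts = Counter()
    for line in lines:
        match = AUDIT_RE.search(line)
        if match:
            counts[(match["user"], match["ip"], match["cmd"])] += 1
    return counts


# Made-up sample lines in the assumed format.
sample = [
    "2014-05-20 00:00:01,123 INFO FSNamesystem.audit: allowed=true ugi=alice "
    "(auth:KERBEROS) ip=/10.0.0.1 cmd=open src=/data/a dst=null perm=null",
    "2014-05-20 00:00:02,456 INFO FSNamesystem.audit: allowed=true ugi=alice "
    "(auth:KERBEROS) ip=/10.0.0.1 cmd=open src=/data/b dst=null perm=null",
    "2014-05-20 00:00:03,789 INFO FSNamesystem.audit: allowed=true ugi=bob "
    "(auth:KERBEROS) ip=/10.0.0.2 cmd=listStatus src=/data dst=null perm=null",
]
counts = count_rpc_calls(sample)
for (user, ip, cmd), n in counts.most_common():
    print(user, ip, cmd, n)
```

Sorting by count puts the heaviest callers first, which is the starting point for spotting NameNode misuse.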
Visualizing 834 million discrete events …
… continued
Queue insights (capacity & provisioning)
•  Each Hadoop job runs in a specific queue.
•  We track every aspect of the YARN framework.
•  Immediate queue performance and configuration profiling via job history server.
•  Historical views and trends that enable better capacity management.
•  Improved queue utilization and allocation management.
Visualizing queues

index="jobsummary_logs_all_red" cluster="dilithium*" | eval total_slot_seconds=(mapSlotSeconds + reduceSlotSeconds) | eval gb_hours=((total_slot_seconds * 0.5) / 3600) | eval gb_hours=round(gb_hours) | timechart span=6h sum(gb_hours) as gb_hours by queue

Last 7 days: 1,175,726 events (5/20/14 8:00:00.000 PM to 5/27/14 8:26:26.000 PM)

[Chart: 6-hour timechart of gb_hours by queue; x-axis _time, Wed May 21 to Mon May 26, 2014; y-axis ticks 200,000 to 600,000]

queue               | 2014-05-20 18:00 | 2014-05-21 00:00
OTHER               | 4154             | 19341
apg_dailyhigh_p3    | 45512            | 92661
apg_dailymedium_p5  | 7071             | 18005
apg_hourlyhigh_p1   | 25643            | 41008
apg_hourlylow_p4    | 12111            | 22944
apg_hourlymedium_p2 | 29664            | 88115
apg_p7              | 3473             | 10896
curveball_large     | 26547            | 38648
curveball_med       | 14192            | 8693
slingshot           | 60875            | 48186
slingstone          | 45376            | 87670

2014-05-21 06:00 (row cut off at the slide edge, values as shown): 211 108137 38398 35627 14934 101925 244 29269 14066 2434 4783
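The queue search converts slot-seconds into GB-hours and sums them per queue in 6-hour buckets. A minimal Python sketch of that arithmetic is below. The field names `mapSlotSeconds`, `reduceSlotSeconds` and `queue` come from the search; reading the 0.5 multiplier as "GB per slot" is an assumption, the event layout and sample jobs are made up, and Python's `round` may break ties differently from Splunk's.

```python
from collections import defaultdict
from datetime import datetime

GB_PER_SLOT = 0.5  # multiplier from the search; "GB per slot" is an assumed reading


def gb_hours(map_slot_seconds, reduce_slot_seconds):
    """(map + reduce slot-seconds) * 0.5 / 3600, rounded to a whole number."""
    total_slot_seconds = map_slot_seconds + reduce_slot_seconds
    return round(total_slot_seconds * GB_PER_SLOT / 3600)


def gb_hours_by_queue(jobs, span_hours=6):
    """Sum per-job GB-hours by (bucket start, queue), like `timechart span=6h ... by queue`."""
    totals = defaultdict(int)
    for job in jobs:
        t = job["_time"]
        # Snap the timestamp down to the start of its span_hours-wide bucket.
        bucket = t.replace(hour=t.hour - t.hour % span_hours,
                           minute=0, second=0, microsecond=0)
        totals[(bucket, job["queue"])] += gb_hours(job["mapSlotSeconds"],
                                                   job["reduceSlotSeconds"])
    return dict(totals)


# Made-up sample jobs.
jobs = [
    {"_time": datetime(2014, 5, 20, 19, 30), "queue": "slingshot",
     "mapSlotSeconds": 7200, "reduceSlotSeconds": 7200},
    {"_time": datetime(2014, 5, 20, 22, 0), "queue": "slingshot",
     "mapSlotSeconds": 36000, "reduceSlotSeconds": 0},
    {"_time": datetime(2014, 5, 21, 1, 0), "queue": "slingstone",
     "mapSlotSeconds": 3600, "reduceSlotSeconds": 3600},
]
totals = gb_hours_by_queue(jobs)
for (bucket, queue), value in sorted(totals.items()):
    print(bucket.isoformat(), queue, value)
```

Note that the search rounds each job's GB-hours before summing, so many small jobs can round to zero; the sketch keeps that order of operations.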
Self-service job reports
•  Each job is unique and so are the map and reduce elements.
•  How to start analyzing jobs?
•  Historical job performance and profiling enable in-depth performance tuning.
•  Long-term historical views and trending of growth.
index="jobsummary_logs_all_blue" cluster="*" user="gmon" |
eval total_slot_seconds=(mapSlotSeconds + reduceSlotSeconds) |
eval gb_hours=((total_slot_seconds * 0.5) / 3600) |
eval gb_hours=round(gb_hours,2) |
eval runtime=(finishTime-submitTime)/1000 |
stats sum(gb_hours) as gb-hours avg(runtime) as run_mins by cluster user queue jobName jobId status |
eval run_mins=round(run_mins/60,2) |
sort -gb-hours

Yesterday: 4,871 events (5/26/14 12:00:00.000 AM to 5/27/14 12:00:00.000 AM)

Statistics (4,871)

cluster | user | queue   | jobName                                                | jobId                    | status    | gb-hours | run_mins
cobalt  | gmon | grideng | PigLatin:findRemoteHDFSFromAudits.pig                  | job_1398982765383_315271 | SUCCEEDED | 108.00   | 33.07
cobalt  | gmon | grideng | PigLatin:findRemoteHDFSFromAudits.pig                  | job_1398982765383_312700 | SUCCEEDED | 104.00   | 37.37
cobalt  | gmon | grideng | PigLatin:findRemoteHDFSFromAudits.pig                  | job_1398982765383_309715 | SUCCEEDED | 88.00    | 29.83
cobalt  | gmon | gridops | distcp:                                                | job_1398982765383_309921 | SUCCEEDED | 36.00    | 68.49
cobalt  | gmon | gridops | SPLK_spbl103n01.blue.ygrid.yahoo.com_1401125953.2076_0 | job_1398982765383_313570 | SUCCEEDED | 25.00    | 14.26
cobalt  | gmon | gridops | nnaudit_DR_2014_05_25                                  | job_1398982765383_308938 | SUCCEEDED | 25.00    | 15.43

(last row cut off at the slide edge, values as shown): cob g grid nnaudit_DB_2014_05_25 job_1398982765 SUCCE 24.00 18.07
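The job report search groups job-summary events per job, sums GB-hours, averages the runtime, converts it to minutes and sorts by GB-hours descending. A minimal Python sketch of that pipeline follows. Field names are taken from the search; treating `submitTime` and `finishTime` as epoch milliseconds is an assumption suggested by the `/1000`, and the sample events and job IDs are made up.

```python
from collections import defaultdict


def job_report(events):
    """Group by job, sum GB-hours, average runtime, then sort by GB-hours descending."""
    groups = defaultdict(lambda: {"gb_hours": 0.0, "runtimes": []})
    for e in events:
        key = (e["cluster"], e["user"], e["queue"],
               e["jobName"], e["jobId"], e["status"])
        slot_seconds = e["mapSlotSeconds"] + e["reduceSlotSeconds"]
        groups[key]["gb_hours"] += round(slot_seconds * 0.5 / 3600, 2)
        # Assumed: timestamps are epoch milliseconds, so this yields seconds.
        groups[key]["runtimes"].append((e["finishTime"] - e["submitTime"]) / 1000)

    rows = []
    for key, agg in groups.items():
        avg_runtime_s = sum(agg["runtimes"]) / len(agg["runtimes"])
        rows.append({"key": key,
                     "gb_hours": agg["gb_hours"],
                     "run_mins": round(avg_runtime_s / 60, 2)})
    return sorted(rows, key=lambda r: r["gb_hours"], reverse=True)


# Made-up sample events with hypothetical job IDs.
events = [
    {"cluster": "cobalt", "user": "gmon", "queue": "gridops",
     "jobName": "distcp:", "jobId": "job_2", "status": "SUCCEEDED",
     "mapSlotSeconds": 129600, "reduceSlotSeconds": 129600,
     "submitTime": 0, "finishTime": 3600000},
    {"cluster": "cobalt", "user": "gmon", "queue": "grideng",
     "jobName": "example.pig", "jobId": "job_1", "status": "SUCCEEDED",
     "mapSlotSeconds": 600000, "reduceSlotSeconds": 177600,
     "submitTime": 0, "finishTime": 1800000},
]
report = job_report(events)
for row in report:
    print(row["key"][4], row["gb_hours"], row["run_mins"])
```

Because the grouping key includes `jobId`, each job normally forms its own group, so the sum and average mostly act on a single event per job.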
More data to tap into with the metastore / Hive sources
•  Using the metastore we can set up virtual indexes to any table(s) in Hive, without the need to define the schema up-front
•  Visualize very complex tables (250+ fields)
•  Rapid prototyping for new jobs with almost instant results for searches, without having to wait for the entire job/query to finish
•  Built-in aggregates and graphs/charts
•  Accelerates development workflow by providing faster interaction with data
... it’s not just logs we’re looking at
Meet Hunk

Integrated Analytics Platform for Diverse Data Stores
Full featured, Integrated Product
Fast Insights for Everyone
Works with What You Have Today
Explore, Visualize, Dashboards, Share, Analyze
Hadoop Clusters; NoSQL and Other Data Stores
Hadoop Client Libraries; Streaming Resource Libraries
Fast Deployment and Configuration
Just point at Hadoop
•  Certified integrations to all major Hadoop distributions
•  Choose 1st-gen MapReduce or YARN
•  Create Virtual Indexes across one or more clusters
•  From download to searching data in < 60 minutes
Connect to one or multiple Hadoop clusters
YARN certified
Interactive Search and Results Preview
Rapidly interact with data
•  Powerful Search Processing Language (SPL™)
•  Ad hoc exploratory analytics across massive datasets
•  Preview results
•  No fixed schema
•  No requirement to “understand” data upfront
[Screenshot callouts: Search interface; Preview results; Drill down to raw data; Pause or stop MapReduce jobs]
Powerful Dashboards for Self-Service Analytics
Interactive Dashboards and Charts
•  Easy-to-use dashboard editor
•  Chart overlay
•  Pan and zoom
•  In-dashboard drill down
•  Embed charts and dashboards in 3rd party apps
•  Reuse skills with Splunk Enterprise 6.1 and Hunk 6.1
Automate Access for Rapid Exploration
Supported File Formats
•  Text files
•  Sequence files
•  RCFile
•  ORC files
•  Parquet
Role-based Security for Shared Clusters
Pass-through Authentication
•  Provide role-based security for Hadoop clusters
•  Access Hadoop resources under security and compliance
•  Integrates with Kerberos for Hadoop security
[Diagram: Business Analyst, Marketing Analyst and Sys Admin map to Hadoop queues. Business Analyst, Queue: Biz Analytics; Marketing Analyst, Queue: Marketing; Sys Admin2, Queue: Prod]
Build Analytics-Rich Big Data Apps
Powerful Developer Environment
•  Use a standards-based web framework and REST API
•  Customize dashboards and UIs with Simple XML, JavaScript or Django
•  Choose among SDKs
•  One integration for both Splunk Enterprise and Hunk
Full-Featured, Integrated Analytics Platform
•  FULL FEATURED ANALYTICS: Explore, analyze and visualize data in one integrated platform
•  FAST TO DEPLOY AND DRIVE VALUE: Point Hunk at your storage clusters and explore data immediately
•  INTERACTIVE SEARCH: Preview results as MapReduce jobs run and accelerate reports with no fixed schemas
•  RICH DEVELOPER ENVIRONMENT: Build big data apps using standard web languages and frameworks
Questions/Comments?
Sagi Zelnick – Principal Architect
Email: zelnicks@yahoo-inc.com
Ledion Bitincka – Principal Architect
Email: lbitincka@splunk.com
