Cloudera Manager – APIs & Extensibility
Bala Venkatrao, Products@Cloudera
December 2013

Cloudera Manager
End-to-End Administration for CDH

1. Manage: Easily deploy, configure & optimize clusters
2. Monitor: Maintain a central view of all activity
3. Diagnose: Easily identify and resolve issues
4. Integrate: Use Cloudera Manager with existing tools

©2013 Cloudera, Inc. All Rights Reserved.

Integrating with your IT Mgmt tools

Various options for integrating Cloudera Manager into your existing tools:
- Installation/Deployment Tools (e.g. Chef, Puppet etc.)
- Datacenter Operations/Tools (e.g. Orion, Tivoli, BMC etc.)
- Monitoring/Alerting Tools (e.g. Nagios, SNMP etc.)

Cloudera Manager integration points:
• Cloudera Manager API
  - Introduced in CM4 (June 2012)
  - Installation & deployment
  - Monitoring
  - Hadoop Operations
• SNMP Alerts
  - Introduced in CM4.5 (Feb 2013)
• Monitoring ‘tsquery’ (Feb 2013)
• User-defined triggers/alarms (new for C5!)
• Service extensibility (new for C5!)
And more…

Cloudera Manager (CM) API
• API access was introduced in Cloudera Manager 4.0. It provides programmatic access to cluster operations (such as configuration and restart) and to monitoring information (such as health and metrics).
• The CM API is an HTTP REST API that uses JSON serialization. It is served on the same host and port as the CM web UI and requires no extra process or configuration. API users have the same privileges as they do in the web UI.
• Docs & Examples
  - http://cloudera.github.io/cm_api/
  - https://github.com/cloudera/cm_api
• Java/Python clients
  - http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/

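The request/response model described above can be sketched with only the Python standard library: build an authenticated GET request for the cluster list, then parse the JSON reply. This is a minimal sketch under assumptions: the host name and credentials are hypothetical, HTTP basic authentication is assumed, 7180 is assumed to be your CM web UI port, and the `/api/v1/clusters` path and the `items` envelope of the sample reply should be checked against the API documentation linked above. The request is built but not sent.

```python
import base64
import json
import urllib.request
from typing import List

# Hypothetical connection details; replace with your own CM server.
CM_HOST = "cm.example.com"
CM_PORT = 7180       # assumed: the same port as the CM web UI
API_VERSION = "v1"   # assumed: an API version your CM release supports


def build_request(path: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for an API path, authenticating (assumed: HTTP basic
    auth) with the same account, and so the same privileges, used for the web UI."""
    url = f"http://{CM_HOST}:{CM_PORT}/api/{API_VERSION}/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def cluster_names(body: str) -> List[str]:
    """Pull cluster names out of a JSON list reply (assumed shape: {"items": [...]})."""
    return [cluster["name"] for cluster in json.loads(body).get("items", [])]


req = build_request("/clusters", "admin", "admin")
print(req.full_url)  # http://cm.example.com:7180/api/v1/clusters

# Sending it would be: urllib.request.urlopen(req).read().decode("utf-8")
sample_reply = '{"items": [{"name": "cluster1", "version": "CDH4"}]}'
print(cluster_names(sample_reply))  # ['cluster1']
```

Keeping request construction separate from sending makes the URL and headers easy to inspect before pointing the code at a real server.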
Examples of integration with CM API
• Installation & Deployment
  - Chef/Puppet
  - Dell Crowbar: http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-with-dell-crowbar-and-cloudera-manager/
  - StackIQ: http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-with-Cloudera
  - WANdisco: non-stop NameNode setup
  - Several other customers/partners use the APIs as part of their install & deployment process
• Monitoring & Alerting
  - Oracle Enterprise Manager (via Big Data Appliance)
  - Nagios:
    https://github.com/cloudera/cm_api/tree/master/nagios
    https://github.com/harisekhon/nagios-plugins/blob/master/check_hadoop_cloudera_manager_metrics.pl
  - SNMP alerts integration with IBM Netcool

Develop & contribute your plug-ins using the Cloudera Manager API!

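Monitoring plug-ins such as the Nagios checks listed above essentially reduce a Cloudera Manager health summary to a Nagios plugin exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below shows one possible mapping. The health summary names are assumed to match the values the API reports, and which states map to WARNING or UNKNOWN is a judgment call for illustration, not the behavior of the linked plug-ins.

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed Cloudera Manager health summary values -> Nagios status.
HEALTH_TO_NAGIOS = {
    "GOOD": OK,
    "CONCERNING": WARNING,
    "BAD": CRITICAL,
    "DISABLED": OK,  # a disabled health check is not treated as a failure here
    "NOT_AVAILABLE": UNKNOWN,
    "HISTORY_NOT_AVAILABLE": UNKNOWN,
}


def check(service_name: str, health_summary: str) -> Tuple[int, str]:
    """Return (exit_code, status_line) for one service's health summary.
    Unrecognized summaries are reported as UNKNOWN."""
    code = HEALTH_TO_NAGIOS.get(health_summary, UNKNOWN)
    return code, f"{LABELS[code]} - {service_name} health is {health_summary}"


# In a real plug-in the summary would come from the CM API and the script
# would finish with sys.exit(code); here the summary is hard-coded.
code, line = check("hdfs1", "CONCERNING")
print(line)  # WARNING - hdfs1 health is CONCERNING
```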
Cloudera Manager – Monitoring via ‘tsquery’
• Introduced as part of the CM4.5 release (Feb 2013)
• A way to add your own charts, beyond those provided by default, and to monitor the metrics that are relevant to your clusters
• The tsquery language specifies statements for retrieving time-series data from the Cloudera Manager time-series data store
• Example: How do I compare disk IO across all the DataNodes that belong to a specific HDFS service?
  select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1
• Retrieved time-series data can be plotted in various forms: line, bar, scatter, heat maps, table list etc.
• The same concept is extended to user-defined triggers/alarms (new for C5!)
• More details:
  http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Diagnostics-Guide/cm5dg_chart_time_series_data.html

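tsquery statements can also be issued programmatically rather than through the charts UI. The sketch below URL-encodes the DataNode example above into a GET URL for a time-series endpoint of the CM API. The host name is hypothetical, and the `/api/v4/timeseries` path and the `query`, `from` and `to` parameter names are assumptions to verify against the API reference for your CM version.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical host; the API version segment and endpoint name are assumptions.
CM_BASE = "http://cm.example.com:7180/api/v4"


def timeseries_url(tsquery: str, start: Optional[str] = None,
                   end: Optional[str] = None) -> str:
    """Build a GET URL that submits a tsquery statement.

    `start` and `end` are optional timestamps (e.g. ISO 8601); when omitted,
    the server's default time window is assumed to apply.
    """
    params = {"query": tsquery}
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    # urlencode escapes spaces, commas and '=' inside the statement.
    return f"{CM_BASE}/timeseries?{urlencode(params)}"


q = "select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1"
print(timeseries_url(q))
```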
Examples of Cloudera Manager ‘tsquery’

Example 1: How do I track the aggregate cluster disk IO?
  select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum)
  where category = CLUSTER and clusterId = $CLUSTERID

Example 2: How do I compare CPU usage across hosts?
  select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_system) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100,
         dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100

Create & contribute your ‘tsqueries’!
https://github.com/cloudera/cm_charting_scrapbook

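To see what Example 2 computes, the sketch below reproduces its arithmetic for one host and one CPU state. It rests on assumptions: `total_cpu_user` is taken to be a cumulative counter of CPU-seconds summed over all cores, `dt0` is taken to turn two successive samples into a per-second rate with negative deltas (counter resets) clamped to zero, and `getHostFact(numCores, 1)` is taken to return the host's core count, falling back to 1. The sample values are made up.

```python
from typing import Optional


def dt0(prev_value: float, curr_value: float, interval_s: float) -> float:
    """Per-second rate between two counter samples; a negative delta
    (e.g. after a counter reset) is treated as zero."""
    return max(curr_value - prev_value, 0.0) / interval_s


def cpu_percent(prev_cpu_s: float, curr_cpu_s: float, interval_s: float,
                num_cores: Optional[int] = None) -> float:
    """Mirror of `dt0(total_cpu_user) / getHostFact(numCores, 1) * 100`."""
    cores = num_cores if num_cores else 1  # the fallback value 1 of getHostFact
    return dt0(prev_cpu_s, curr_cpu_s, interval_s) / cores * 100


# Hypothetical samples 60 s apart on an 8-core host: 240 CPU-seconds of
# user time accrued across all cores, i.e. 4 cores' worth of work per second.
print(cpu_percent(10_000.0, 10_240.0, 60.0, num_cores=8))  # 50.0
```

Dividing by the core count is what makes hosts of different sizes comparable on one chart: a fully busy 8-core host and a fully busy 16-core host both read 100.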
Cloudera as an Application Platform
[Diagram: "ISV’s view of a Database", a core database surrounded by Workload Mgmt, Drivers (JDBC/ODBC), Security Mgmt and Data Access APIs, shown alongside "ISV’s view of an OS", a core OS kernel surrounded by Process/Resource Mgmt, Security Mgmt and Data Access APIs; Systems Mgmt and Package Mgmt also appear as layers.]

Cloudera as an Application Platform
[Diagram: "ISV’s view of Cloudera", CDH at the core surrounded by Package Mgmt, Workload/Process Mgmt, Security Mgmt, Data Access APIs, Drivers (JDBC/ODBC) and Systems Mgmt.]

Cloudera Platform Features
(Feature: description. Examples.)

• Package Mgmt: easily package and distribute binaries/JARs via “Parcels”. Examples: Informatica, Syncsort, LZO libraries
• Workload/Process Mgmt: deploy applications as stand-alone processes or via YARN* on the Hadoop cluster; isolation of cluster resources. Examples: SAS, 0xData, Accumulo, Spark
• Security Mgmt: support for Kerberos management; role-based access control for tables/views in Hive/Impala via Sentry
• Data Access APIs: HDFS API, HBase API, Search API, Spark API; Kite (formerly Cloudera Development Kit). Examples: Causata, Basis Tech, CounterTack, Amdocs
• Drivers: ODBC/JDBC drivers for Hive/Impala. Examples: Zoomdata, Tableau, MicroStrategy, QlikView
• Systems Mgmt: end-to-end management of an application via Cloudera Manager (CM). Examples: StackIQ, Dell Crowbar, Oracle OEM
  - Manage: deploy and upgrade (rolling) services and packages; manage configurations
  - Monitor: proactive health checks; track resource utilization; custom metrics charts
  - Diagnose: distributed log collection and searching; tag and track key events
  - Integrate: access CM via API

* Support for YARN planned as part of CM5.x in FY14

Example – Deployment via Parcels
The platform for Big Data + The ETL app for Hadoop

• Smarter Deployment & Administration: seamless integration with Cloudera Manager for one-click deployment and easier administration
• Smarter Architecture: no code generation; the ETL engine runs natively within Hadoop MapReduce, via a plugin included in CDH 4.2
• Smarter Monitoring: comprehensive logging capabilities plus activity monitoring through Cloudera Manager

How it works
1. Download the Syncsort DMX-h “Parcel” file to your custom repository. The file contains everything you need to deploy Syncsort DMX-h ETL Edition on Cloudera.
2. Distribute & activate the DMX-h parcel on your Cloudera cluster.

[Screenshots:]
A. Find Nodes: Enter the names of the hosts to include in the Hadoop cluster, then click Continue.
B. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
C. Assign Roles: Verify the roles of the nodes within your cluster and make changes as necessary.

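The download, distribute and activate steps above form a simple lifecycle, which the sketch below models as a small state machine. The stage names are modeled on the parcel stages the Cloudera Manager API reports (for example AVAILABLE_REMOTELY, DOWNLOADED, DISTRIBUTED, ACTIVATED), but the transition table and the `advance` helper are hypothetical illustrations, not CM code, and transient stages such as DOWNLOADING are left out.

```python
# Simplified parcel lifecycle (hypothetical transition table).
TRANSITIONS = {
    # CM server fetches the parcel (here, from your custom repository).
    ("AVAILABLE_REMOTELY", "download"): "DOWNLOADED",
    # The parcel is copied to every host in the cluster.
    ("DOWNLOADED", "distribute"): "DISTRIBUTED",
    # The distributed parcel becomes the version in use on each host.
    ("DISTRIBUTED", "activate"): "ACTIVATED",
    ("ACTIVATED", "deactivate"): "DISTRIBUTED",
}


def advance(stage: str, action: str) -> str:
    """Return the stage reached by applying `action`, or raise ValueError
    if the action is not allowed from the current stage."""
    try:
        return TRANSITIONS[(stage, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a parcel in stage {stage}") from None


stage = "AVAILABLE_REMOTELY"
for action in ("download", "distribute", "activate"):
    stage = advance(stage, action)
    print(action, "->", stage)
```

Encoding the allowed transitions as data makes the ordering constraint explicit: a parcel cannot be activated on hosts it was never distributed to.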
Syncsort DMX-h + Cloudera Manager
[Diagram: Cloudera Manager, through its API, provides installation, management, monitoring, integration and support for the CDH cluster plus ISV software; Syncsort DMX-h is installed on every CDH node.]

Get a 360° View of Your Cluster, Including DMX-h Logs
• View service health & performance
• Get host-level snapshots
• Monitor & diagnose workloads
• Gather, view & search Hadoop & DMX-h logs
…And more!!

Build and distribute your own Parcels via Cloudera Manager and share them with the community!

Service Extensibility
• Introduced in C5 (still in Beta!)
• Single management console for CDH, non-CDH services and ISV applications
• Similar look and feel to existing services
• Easy to write (Java-free!)
• Flexible
• Independent release cycle

So... How does it work?
• A JSON file that describes your service
• A set of control scripts
• Packaged as a JAR file
• As promised, Java-free

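Because a JAR is an ordinary ZIP archive, the packaging step above needs no Java tooling. The sketch below bundles a descriptor and a control script into an in-memory JAR with Python's zipfile module. The `descriptor/service.sdl` and `scripts/control.sh` layout is an assumption (check it against the SDK once it is published), and the descriptor and script contents are placeholders.

```python
import io
import json
import zipfile


def build_service_jar(descriptor: dict, control_script: str) -> bytes:
    """Package a service descriptor and its control script into JAR bytes.

    Assumed layout: descriptor/service.sdl (JSON) and scripts/control.sh.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("descriptor/service.sdl", json.dumps(descriptor, indent=2))
        # Store Unix permissions rwxr-xr-x so the script stays executable
        # when the archive is extracted by a Unix-aware tool.
        info = zipfile.ZipInfo("scripts/control.sh")
        info.external_attr = 0o755 << 16
        jar.writestr(info, control_script)
    return buf.getvalue()


jar_bytes = build_service_jar(
    {"name": "spark", "roles": [{"name": "master"}]},  # placeholder descriptor
    '#!/bin/bash\necho "placeholder control script: $1"\n',
)
with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
    print(jar.namelist())  # ['descriptor/service.sdl', 'scripts/control.sh']
```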
Example: Cloudera Manager Extensions – Spark
[Screenshots: Cloudera Manager Extensions; Cloudera Manager Extensions: Spark]

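The Code

Service descriptor (JSON):

{
  "name": "spark",
  "roles": [{
    "name": "master",
    "startRunner": {
      "program": "scripts/control.sh",
      "args": ["start_master", "./params.properties"]
    },
    "parameters": [{
      "name": "master_port",
      "type": "port",
      "default": 7077
    }],
    "configWriter": {
      "generators": [{
        "filename": "params.properties"
      }]
    }
  }]
}

Control script, scripts/control.sh (bash):

#!/bin/bash
# Invoked by Cloudera Manager: $1 is the command, $2 the generated properties file.
CMD=$1
PROPS_FILE=${2:-./params.properties}
timestamp=$(date)

# Read master_port from the properties file (assumes key=value lines).
MASTER_PORT=$(grep '^master_port=' "$PROPS_FILE" | cut -d'=' -f2)
export MASTER_PORT

case $CMD in
  (start_master)
    # exec replaces this shell with the Spark master process.
    exec "$SPARK_HOME/scripts/spark-start.sh" master
    ;;
  (*)
    echo "$timestamp Don't understand [$CMD]"
    exit 1
    ;;
esac
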
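To make the link between `parameters` and `configWriter` concrete, the sketch below imitates what the generator step is assumed to do: take each role parameter's default, apply any value the administrator set in Cloudera Manager, and write the result as `key=value` lines to params.properties, which control.sh then reads. The merge rule and output format are assumptions for illustration, not CM's implementation, and `render_properties` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def render_properties(parameters: List[dict],
                      overrides: Optional[Dict[str, object]] = None) -> str:
    """Render role parameters as key=value lines; a value set by the
    administrator overrides the default declared in the descriptor."""
    overrides = overrides or {}
    lines = []
    for param in parameters:
        name = param["name"]
        value = overrides.get(name, param.get("default"))
        if value is not None:  # skip parameters with neither a default nor an override
            lines.append(f"{name}={value}\n")
    return "".join(lines)


# The "master" role's parameters, as declared in the descriptor on this slide.
master_params = [{"name": "master_port", "type": "port", "default": 7077}]

print(render_properties(master_params), end="")                         # master_port=7077
print(render_properties(master_params, {"master_port": 7078}), end="")  # master_port=7078
```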
Next Steps
• Documentation & SDK as part of C5 Beta2 or later (definitely before GA!)
• Working with select ISVs (SAS, 0xData etc.) as part of the Beta to further fine-tune this feature

Develop & contribute your Cloudera Manager service extensibility plug-ins!

Vision of CM Extensibility
[Diagram: CDH and CM at the center, with three directions labeled Service Extensibility, Vertical Extension and Horizontal Extension. Services: 0xData, SAS, Syncsort, Informatica, Revolution, Accumulo, Spark, Giraph. Ops apps built on the API, including ISV apps: Capacity Mgr, Security, SLA Mgr, Cost Optimizer. Integrations through the CM API and SNMP: Oracle OEM, Nagios, Dell, Chef/Puppet.]

Q&A
• If you are interested in learning more, participating in the Beta, or contributing plug-ins or apps, contact: bala@cloudera.com

Appendix/Resources
• Systems Management
  - Cloudera Manager API
    http://cloudera.github.io/cm_api/
    http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
• Package Management
  - Docs on Parcels
    http://training.cloudera.com/elearning/Parcels/
    http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Introduction/cmi_primer.html
    http://blog.cloudera.com/blog/2013/05/faq-understanding-the-parcel-binary-distribution-format/
    http://blog.cloudera.com/blog/2013/07/one-engineers-experience-with-parcel/
• Data Access APIs
    http://blog.cloudera.com/blog/2013/05/cloudera-development-kit-cdk/
    https://github.com/cloudera/cdk
• Workload/Resource Management
  - Cloudera Manager 5 documentation
    http://cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Managing-Clusters/cm5mc_managing_resources.html
    http://blog.cloudera.com/blog/2013/05/how-the-sas-and-cloudera-platforms-work-together/
• Security Management
    http://blog.cloudera.com/blog/2013/07/with-sentry-cloudera-fills-hadoops-enterprise-security-gap/

Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
