Hadoop on OpenStack with Sahara
August 19, 2014
Matthew Farrellee (@spinningmatt)
Emerging Technology and Strategy
CTO Office, Red Hat
Hadoop is
8/19/14 tesora.com
• Narrow definition - Apache Hadoop - a specific
Apache project originally from Yahoo!, based on
papers from Google
• Broad definition - the ecosystem of projects,
primarily within Apache, that integrate in some
form with Apache Hadoop
• I’m going to use the broad definition
Hadoop often looks like
• Multiple, loosely coupled projects focused on data storage and processing
• Includes: workload, resource, and system management; data ingest and storage; compute frameworks and domain-specific languages
Hadoop is often used to
• Store data
• ETL data
• Analyze data
• Structured and unstructured
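Storing, transforming, and analyzing data in Hadoop usually means running a compute framework such as MapReduce, the programming model from the Google papers that Apache Hadoop implements. As an illustration only, here is a minimal single-process sketch of MapReduce word count in plain Python, assuming in-memory input; it does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map, shuffle, and reduce over the whole input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["store data", "ETL data", "analyze data"]))
```

On a real cluster the same three phases run distributed across many machines, with the framework performing the shuffle between mappers and reducers.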
Data today
• Structured or unstructured
• There is more than 2.5x as much unstructured data as structured data
• Unstructured data is growing twice as fast as structured data
Data problems
• It’s not just that processing data is expensive
  • In hardware costs
  • In computational time
  • Most of all, in human time
• Data creation outpaces storage capacity
Data flows
[Figure: two data-flow diagrams (labels: Database, DB, Data, Value), captioned "Many still look like this... ...but start to look like this..."]
The analysis itself is hard
• Data sources are hard to find or create
• Data is always dirty and needs cleaning
• Clean data is always approximate
• Figuring out the right question to ask takes iterations
Sahara’s goal
Make managing data processing (e.g. Hadoop) infrastructure and tools so simple they just get out of your way
Sahara’s history
• Started at the Portland summit (April 2013)
• Joint effort by Red Hat, Mirantis and
Hortonworks
• Originally called Savanna
• Incubated in Icehouse (released April 2014)
• Supported Apache and Hortonworks Hadoop
• Integrated for Juno (scheduled for release October 2014)
Sahara’s use cases
• Cluster
  • Start / stop / scale
  • Different shapes and sizes
  • Repeatable (template mechanism)
• Workload (Elastic Data Processing, a.k.a. EDP)
  • Job = analysis code + data URLs
  • Queued and run across clusters (ephemeral or persistent)
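To make the EDP workload model concrete, here is a minimal sketch of "job = analysis code + data URLs, queued and run across clusters". The `Job`, `Cluster`, and `run_queue` names are hypothetical, not Sahara's API, and the policy of creating one ephemeral cluster per job when no persistent cluster is given is a simplifying assumption for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical EDP job: analysis code plus data URLs."""
    code: str        # e.g. a JAR or script name (illustrative)
    input_url: str   # where the job reads its data
    output_url: str  # where the job writes its results


@dataclass
class Cluster:
    name: str
    ephemeral: bool  # True: created for a single job, then torn down
    ran: List[Job] = field(default_factory=list)


def run_queue(jobs: List[Job], persistent: Optional[Cluster] = None) -> List[Cluster]:
    """Run queued jobs in FIFO order and return the clusters that were used.

    With a persistent cluster, every job runs there. Without one, this sketch
    creates one ephemeral cluster per job (an assumed policy).
    """
    queue = deque(jobs)
    clusters: List[Cluster] = [] if persistent is None else [persistent]
    created = 0
    while queue:
        job = queue.popleft()
        if persistent is not None:
            target = persistent
        else:
            target = Cluster(name=f"ephemeral-{created}", ephemeral=True)
            clusters.append(target)
            created += 1
        # A real service would launch the job here; the sketch only records it.
        target.ran.append(job)
    return clusters
```

Passing a long-lived cluster reuses it for every job, while omitting it trades cluster start-up time for not paying for idle capacity between jobs.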
Sahara’s architecture
[Figure: architecture diagram. Sahara Service components: REST API, Auth, Cluster Configuration Manager, Vendor Plugins, Resources Orchestration Manager, Job Manager, Data Access Layer. Clients: Sahara Python Client, Horizon (Sahara Pages). OpenStack services: Keystone, Swift, Heat, Nova, Glance, Cinder, Neutron, Trove DB. Also shown: Hadoop VMs, Data Sources, Job Sources.]
Sahara’s vendor plugins
• It’s how users pick different software versions
• It’s how data processing frameworks are
integrated
• e.g. Vanilla (reference implementation w/ Apache versions), HDP (via Ambari), Spark (based on Vanilla), CDH (spec approved), MapR (spec in review), IDH (being removed)
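Because plugins are how users pick software versions, a plugin can be thought of as a named entry that advertises which versions it can deploy. The sketch below models only that selection step; `Plugin` and `PluginRegistry` are hypothetical names, not Sahara's plugin interface, and the version strings are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Plugin:
    """Hypothetical stand-in for a vendor plugin: a name plus deployable versions."""
    name: str
    versions: Tuple[str, ...]


class PluginRegistry:
    """Lets a user pick a software collection by (plugin name, version)."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        self._plugins[plugin.name] = plugin

    def select(self, name: str, version: str) -> Plugin:
        """Return the named plugin if it supports the requested version."""
        plugin = self._plugins.get(name)
        if plugin is None:
            raise KeyError(f"no plugin named {name!r}")
        if version not in plugin.versions:
            raise ValueError(f"plugin {name!r} does not offer version {version!r}")
        return plugin


registry = PluginRegistry()
# Version strings below are illustrative placeholders, not a support matrix.
registry.register(Plugin("vanilla", ("1.2.1", "2.4.1")))
registry.register(Plugin("hdp", ("2.0.6",)))
```

Keeping version knowledge inside each plugin means adding a new distribution or framework is a matter of registering another entry rather than changing the core service.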
Sahara’s API
• Both REST and Python (of course)
• Accessible from CLI and Horizon
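Since the API is plain REST, any HTTP client can talk to it. The sketch below only builds (never sends) a request that lists clusters, using the standard library. It assumes the v1.1 path layout `/v1.1/{tenant_id}/clusters`, a Keystone token in the `X-Auth-Token` header, and a response body shaped like `{"clusters": [{"name": ...}]}`; the endpoint, tenant, and token values are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List


def list_clusters_request(endpoint: str, tenant_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the tenant's clusters.

    Assumes the v1.1 path layout and Keystone token authentication.
    """
    url = f"{endpoint.rstrip('/')}/v1.1/{tenant_id}/clusters"
    return urllib.request.Request(
        url,
        headers={"X-Auth-Token": token, "Accept": "application/json"},
        method="GET",
    )


def cluster_names(response_body: bytes) -> List[str]:
    """Extract cluster names from an assumed {"clusters": [...]} JSON body."""
    return [c["name"] for c in json.loads(response_body).get("clusters", [])]


if __name__ == "__main__":
    # Placeholder endpoint and credentials; sending requires a real deployment.
    req = list_clusters_request("http://sahara.example:8386", "demo-tenant", "TOKEN")
    print(req.get_method(), req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` against a real deployment; the Python client, CLI, and Horizon wrap this same REST layer.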
Sahara’s basic structures
• Plugins - controller for specific software collections
• Images - in Glance, w/ special plugin-specific tags
• Templates
  • Two kinds, node group and cluster
  • Combine node groups to form a cluster template
• Clusters - the live clusters
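The template mechanism composes: each node group template describes one kind of node, and a cluster template combines node groups with instance counts. A minimal sketch of that composition follows; the class names, flavors, and process names are hypothetical and illustrative, not Sahara's actual template schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NodeGroupTemplate:
    """Hypothetical node group template: one 'shape' of node."""
    name: str
    flavor: str                 # instance size, e.g. a Nova flavor name (illustrative)
    processes: Tuple[str, ...]  # processes to run on every node in the group


@dataclass(frozen=True)
class ClusterTemplate:
    """Combines node group templates, each with an instance count."""
    name: str
    plugin: str
    node_groups: Tuple[Tuple[NodeGroupTemplate, int], ...]

    def instance_count(self) -> int:
        """Total number of instances a cluster from this template would have."""
        return sum(count for _, count in self.node_groups)

    def processes(self) -> List[str]:
        """Distinct processes across all node groups, in first-seen order."""
        seen: List[str] = []
        for group, _ in self.node_groups:
            for proc in group.processes:
                if proc not in seen:
                    seen.append(proc)
        return seen


master = NodeGroupTemplate("master", "m1.medium", ("namenode", "resourcemanager"))
worker = NodeGroupTemplate("worker", "m1.small", ("datanode", "nodemanager"))
demo = ClusterTemplate("demo", "vanilla", ((master, 1), (worker, 3)))
```

Because the templates are immutable values, the same `demo` template can launch any number of identical clusters, which is what makes cluster creation repeatable.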
Sahara’s EDP structures
• Data sources
  • Input and output locations (Swift, HDFS, etc. URLs)
• Job binaries
  • Often JARs or scripts, stored in a data source
• Jobs
  • Templates for a job w/ parameters empty
• Job executions
  • Instances of templates w/ parameters filled
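The relationship between these four structures can be sketched as plain data types: a job is a template that names its parameters without values, and a job execution binds that template to concrete input/output data sources and fills every parameter. All names here are hypothetical illustrations, not Sahara's EDP API, and the URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Tuple


@dataclass(frozen=True)
class DataSource:
    """An input or output location, identified by URL."""
    name: str
    url: str


@dataclass(frozen=True)
class JobBinary:
    """A JAR or script, itself stored at some location."""
    name: str
    url: str


@dataclass(frozen=True)
class Job:
    """A job template: binaries plus the names of parameters left empty."""
    name: str
    binaries: Tuple[JobBinary, ...]
    params: Tuple[str, ...]


@dataclass(frozen=True)
class JobExecution:
    """An instance of a job template with every parameter filled."""
    job: Job
    input_ds: DataSource
    output_ds: DataSource
    values: Dict[str, str]


def execute(job: Job, input_ds: DataSource, output_ds: DataSource,
            values: Mapping[str, str]) -> JobExecution:
    """Fill a job template; refuse if any declared parameter is missing."""
    missing = [p for p in job.params if p not in values]
    if missing:
        raise ValueError(f"unfilled parameters: {missing}")
    return JobExecution(job, input_ds, output_ds, dict(values))


wordcount = Job("wordcount", (JobBinary("wc", "swift://demo/wc.jar"),), ("reducers",))
source = DataSource("in", "swift://demo/input")
sink = DataSource("out", "hdfs://namenode/output")
run = execute(wordcount, source, sink, {"reducers": "2"})
```

Separating the template from its executions lets one job definition be run many times against different data sources and parameter values.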
Juno roadmap
https://review.openstack.org/#/q/sahara-specs+AND+status:merged,n,z
https://blueprints.launchpad.net/sahara
• Highlights:
  • Dashboard merged into Horizon
  • Spark w/ EDP
  • CDH plugin
  • Storm plugin
  • Security group and Swift auth
Demo video: http://youtu.be/vmry_kXqn4c
Questions?

OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara