SQL-in-Hadoop
• SQL is hot again!
– Apache Hive (+ Stinger/Tez)
– Apache Drill
– Shark/Spark
– Impala
– Phoenix
– Greenplum (HAWQ)
– Cascading Lingual
– Hadapt
– Splice Machine
• MapR provides the broadest SQL support
– Apache Hive 0.11: GA
– Impala on MapR: private beta (25-50% faster)
– Apache Drill 1.0: alpha this month
• Hadoop BI tools can do a lot more than SQL queries
Why Apache Drill?
• Community-driven project
– SQL is an application interface
– Users don’t want vendor lock-in
• Next-generation SQL-in-Hadoop
– Full ANSI SQL:2003
– Schema is optional
– Nested data: JSON, Protobuf, …
– Highly extensible
– YARN integration
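To make "schema is optional" and "nested data" concrete, here is a minimal Python sketch of the schema-on-read idea: fields are projected from nested JSON records without declaring a schema first, and missing fields simply come back empty. This is a toy illustration only, not how Drill is implemented; the sample records and the helper names `get_path` and `select` are hypothetical.

```python
import json

# Hypothetical sample records: the fields vary per record and no schema is declared.
RAW_RECORDS = """
{"user": {"name": "ann", "address": {"city": "Austin"}}, "orders": [{"total": 20}, {"total": 5}]}
{"user": {"name": "bob"}, "orders": []}
{"user": {"name": "cy", "address": {"city": "Madison"}}}
"""

def get_path(record, path, default=None):
    """Follow a dotted path such as 'user.address.city' through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # a missing field is not an error, just an empty value
        current = current[key]
    return current

def select(lines, paths):
    """Project the given dotted paths from each JSON line, discovering structure while reading."""
    rows = []
    for line in lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        rows.append({p: get_path(record, p) for p in paths})
    return rows

rows = select(RAW_RECORDS, ["user.name", "user.address.city"])
for row in rows:
    print(row)
```

The second record has no address, so its city is reported as `None` rather than causing the query to fail, which is the behavior a schema-optional engine needs when records differ in shape.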
Who’s contributing?
MapR
Pentaho
Oracle
VMware
Microsoft
Thoughtworks
UT Austin
UW Madison
RJMetrics
XingCloud
Lines of code: > 100K
It’s Not Just About Queries…
• Real-time data loading so you don’t query stale data
– HDFS was not designed for these workloads
• Common storage and resource management for all Big Data applications
– Enterprise-grade: high availability (HA), data protection (snapshots), disaster recovery (mirrors)
– Multi-tenancy
– Read/write access (POSIX)
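As a rough illustration of what POSIX read/write access means for data loading, the Python sketch below appends records with ordinary file I/O and reads them back immediately. It assumes the cluster storage is exposed as a regular mounted directory; a temporary directory stands in for that mount point here, and the file name and the helpers `append_event` and `read_events` are hypothetical.

```python
import os
import tempfile

# Assumption: on a real cluster this would be a POSIX mount point of the
# cluster file system; a temporary directory stands in for it here.
mount_dir = tempfile.mkdtemp()
log_path = os.path.join(mount_dir, "events.log")

def append_event(path, event):
    """Append one record with ordinary file I/O and flush it to storage."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(event + "\n")
        f.flush()
        os.fsync(f.fileno())

def read_events(path):
    """Read back everything written so far."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

append_event(log_path, "click user=ann")
append_event(log_path, "click user=bob")
events = read_events(log_path)
print(events)
```

Because the writer uses plain append operations, a reader sees each record as soon as it is flushed, without a separate batch-load step.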
MapR Distributed Data System (MDDS)
YARN
Batch (MapReduce)
SQL (Drill, Tez, Impala)
Search (Solr, Elasticsearch)
Streaming (Storm)
File-based (POSIX)
Table-based (MDDS, HBase)
MapR Distribution for Apache Hadoop






Tomer Shiran, MapR_Hadoop&SQL
