Coordinating the Many Tools of Big Data
Strata 2013

Alan F. Gates
@alanfgates




                              Page 1
Big Data = Terabytes, Petabytes, …




Image Credit: Gizmodo
             © Hortonworks 2013
                                        Page 2
But It Is Also Complex Algorithms
• An example from a talk by Jimmy Lin at Hadoop Summit
  2012 on calculations Twitter is doing via UDFs in Pig.
  This equation uses stochastic gradient descent to do
  machine learning with their data:



   w^{(t+1)} = w^{(t)} - \gamma^{(t)} \nabla \ell(f(x; w^{(t)}), y)
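The update rule on this slide can be sketched in a few lines of Python. This is only an illustration, not Twitter's Pig UDF: the logistic model for f, the log loss, the decaying step size, and the toy data are all assumptions chosen to keep the example small.

```python
import math
import random


def predict(w, x):
    """f(x; w): a logistic model (an assumption; the slide does not fix f)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def sgd(examples, epochs=200, gamma0=0.5, seed=0):
    """Repeat w <- w - gamma(t) * gradient of the loss, one example per step."""
    rng = random.Random(seed)
    data = list(examples)              # copy so the caller's list is not reordered
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)              # "stochastic": visit examples in random order
        for x, y in data:
            gamma = gamma0 / (1.0 + 0.01 * t)   # hypothetical decaying step size
            error = predict(w, x) - y           # gradient of log loss w.r.t. w is error * x
            w = [wi - gamma * error * xi for wi, xi in zip(w, x)]
            t += 1
    return w


# Toy data: x = [bias, feature], label is 1 when the feature is positive.
examples = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
weights = sgd(examples)
```

After training, `predict(weights, [1.0, 2.0])` lands above 0.5 and `predict(weights, [1.0, -2.0])` below it, since the toy data is linearly separable.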




                                                       Page 3
And New Tools
• Apache Hadoop brings with it a large selection of tools
  and paradigms
   – Apache HBase, Apache Cassandra – Distributed, high-volume
     reads and writes of individual data records
   – Apache Hive – SQL
   – Apache Pig, Cascading – Data flow programming for ETL, data
     modeling, and exploration
   – Apache Giraph – Graph processing
   – MapReduce – Batch processing
   – Storm, S4 – Stream processing
   – Plus lots of commercial offerings




                                                                   Page 4
Pre-Cloud: One Tool per Machine
• Databases presented SQL or SQL-like paradigms for operating on data
• Other tools came in separate packages (e.g. R) or on separate platforms (e.g.
  SAS).



[Diagram: separate systems for Data Mart, Statistical Analysis, Data Warehouse, Cube/MOLAP, and OLTP]



                                                                            Page 5
Cloud: Many Tools One Platform
   • Users no longer want to be concerned with what platform their data is in – just
     apply the tool to it
   • SQL is no longer the only or primary data access tool

[Diagram: Data Mart, Statistical Analysis, Data Warehouse, Cube/MOLAP, and OLTP workloads on one platform]




                                                                                     Page 6
Upside - Pick the Right Tool for the Job




                                       Page 7
Downside – Tools Don’t Play Well Together

• Hard for users to share data between tools
  – Different storage formats
  – Different data models
  – Different user defined function interfaces




                                                 Page 8
Downside – Wasted Developer Time
• Wastes developer time since each tool supplies
  redundant functionality

    Pig                 Hive
    Parser              Parser
                        Metadata
    Optimizer           Optimizer
    Physical Planner    Physical Planner
    Executor            Executor


                                                       Page 9
Downside – Wasted Developer Time
• Wastes developer time since each tool supplies
  redundant functionality

    Pig                 Hive
    Parser              Parser
                        Metadata
    Optimizer           Optimizer
    Physical Planner    Physical Planner
    Executor            Executor

    Overlap: the Parser, Optimizer, Physical Planner, and Executor appear in both tools


                                                             Page 10
Conclusion: We Need Services
• We need to find a way to share services where we can
• Gives users the same experience across tools
• Allows developers to share effort when it makes sense




                                                          Page 11
Hadoop = Distributed Data Operating System
Service                       Hadoop Component
Table Management              Hive
Access To Metadata            HCatalog
User authentication           Knox
Resource management           YARN
Notification                  HCatalog
REST/Connectors               WebHCat, WebHDFS, Hive, HBase, Oozie
Relational data processing    Tez

Legend: Exists / Pieces exist in this component / New Project

                                                                                           Page 12
HCatalog – Table Management
• Opens up Hive’s tables to other tools inside and outside
  Hadoop
• Presents tools with a table paradigm that abstracts away
  storage details
• Provides a shared data model
• Provides a shared code path for data and metadata access




                                                             Page 14
HCatalog – Table Management
• Opens up Hive’s tables to other tools inside and outside
  Hadoop
• Presents tools with a table paradigm that abstracts away
  storage details
• Provides a shared data model
• Provides a shared code path for data and metadata access

[Diagram: Hive and the Metastore]




                                                             Page 15
HCatalog – Table Management
• Opens up Hive’s tables to other tools inside and outside
  Hadoop
• Presents tools with a table paradigm that abstracts away
  storage details
• Provides a shared data model
• Provides a shared code path for data and metadata access

[Diagram: Hive and the Metastore; Pig via HCatLoader; MapReduce via HCatInputFormat]

                                                             Page 16
HCatalog – Table Management
• Opens up Hive’s tables to other tools inside and outside
  Hadoop
• Presents tools with a table paradigm that abstracts away
  storage details
• Provides a shared data model
• Provides a shared code path for data and metadata access

[Diagram: Hive and the Metastore; Pig via HCatLoader; MapReduce via HCatInputFormat; External Systems via REST through WebHCat]
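As a rough illustration of the REST path for external systems, the sketch below builds the URL of a WebHCat table-description request. The host name, database, table, and user are hypothetical, and the port 50111 and `/templeton/v1/ddl/...` layout are the commonly documented WebHCat defaults rather than anything stated on the slides.

```python
from urllib.parse import quote, urlencode


def webhcat_table_url(host, database, table, user, port=50111):
    """Build the WebHCat URL that describes one table's metadata.

    Only the URL is constructed here; no request is sent.
    """
    path = "/templeton/v1/ddl/database/{}/table/{}".format(
        quote(database, safe=""), quote(table, safe=""))
    query = urlencode({"user.name": user})
    return "http://{}:{}{}?{}".format(host, port, path, query)


# Hypothetical host, database, table, and user, for illustration only.
url = webhcat_table_url("hcat.example.com", "default", "clicks", "alice")
print(url)
```

An external tool would issue an HTTP GET against this URL and read back a JSON description of the table, without linking against Hive or knowing how the table is stored.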

                                                             Page 17
Tez – Moving Beyond MapReduce
• Low level data-processing execution engine
• Serves as the base for MapReduce, Hive, Pig, Cascading,
  etc.
• Enables pipelining of jobs
• Removes task and job launch times
• Hive and Pig jobs no longer need to move to the end of
  the queue between steps in the pipeline
• Does not write intermediate output to HDFS
  – Much lighter disk and network usage
• Built on YARN



                                                       Page 18
Pig/Hive-MR versus Pig/Hive-Tez
SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state

[Diagram, Pig/Hive - MR: the query runs as Job 1, Job 2, and Job 3, with an I/O synchronization barrier between jobs]
                                                                                         Page 19
Pig/Hive-MR versus Pig/Hive-Tez
SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state

[Diagram, left, Pig/Hive - MR: Job 1, Job 2, and Job 3, with an I/O synchronization barrier between jobs]
[Diagram, right, Pig/Hive - Tez: the same query as a Single Job]
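The contrast between the two plans can be mimicked with a toy Python model of the query above. It is a sketch under loud assumptions: the three small tables are invented, a Python list stands in for intermediate output written between jobs, and chained generators stand in for a single pipelined job. It says nothing about how Tez is actually implemented.

```python
from collections import defaultdict

# Invented sample tables, for illustration only.
a = [{"id": 1, "itemId": 10, "state": "CA"},
     {"id": 2, "itemId": 11, "state": "CA"},
     {"id": 3, "itemId": 10, "state": "NY"}]
b = [{"id": 1}, {"id": 2}, {"id": 3}]
c = [{"itemId": 10, "price": 4.0}, {"itemId": 11, "price": 6.0}]


def join(left, right, key):
    """Hash join: yield merged rows where left[key] == right[key]."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    for row in left:
        for match in index[row[key]]:
            yield {**row, **match}


def aggregate(rows):
    """GROUP BY state, returning {state: (COUNT(*), AVG(price))}."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        totals[row["state"]][0] += 1
        totals[row["state"]][1] += row["price"]
    return {state: (n, total / n) for state, (n, total) in totals.items()}


def run_staged():
    """MR-style: each job stores its full output before the next job starts."""
    stored = []                              # stands in for intermediate files
    job1 = list(join(a, b, "id"))
    stored.append(job1)
    job2 = list(join(job1, c, "itemId"))
    stored.append(job2)
    return aggregate(job2), sum(len(rows) for rows in stored)


def run_pipelined():
    """Single-job style: rows stream between operators; no join output is stored."""
    return aggregate(join(join(a, b, "id"), c, "itemId")), 0


staged_result, staged_rows_stored = run_staged()
pipelined_result, pipelined_rows_stored = run_pipelined()
```

Both plans return the same answer; the staged plan additionally stores every intermediate join row, which is the cost the barriers in the diagram represent.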
                                                                                                          Page 20
FastQuery: Beyond Batch with YARN




• Tez Generalizes Map-Reduce
  – Simplified execution plans process data more efficiently
• Always-On Tez Service
  – Low latency processing for all Hadoop data processing




                                                                  Page 21
Knox – Single Sign On




                        Page 22
Today’s Access Options
• Direct Access
   – Access Services via REST (WebHDFS, WebHCat)
   – Need knowledge of and access to whole cluster
   – Security handled by each component in the cluster
   – Kerberos details exposed to users


          User --{REST}--> Hadoop Cluster


• Gateway / Portal Nodes
   – Dedicated nodes behind firewall
   – Users SSH to the node to access Hadoop services

          User --SSH--> GW Node --> Hadoop Cluster


                                                                 Page 23
Knox Design Goals
• Operators can firewall the cluster without giving end users
  access to a “gateway node”
• Users see one cluster end-point that aggregates
  capabilities for data access, metadata and job control
• Provide perimeter security to make Hadoop security setup
  easier
• Enable integration with enterprise and cloud identity
  management environments
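The "one cluster end-point" goal can be illustrated by comparing the URL a client needs for direct WebHDFS access with the URL it would use through a gateway. This is a sketch only: the host names and topology name are hypothetical, and the ports (50070, 8443) and the `/gateway/<topology>/...` layout are assumptions based on common defaults for WebHDFS and later Knox releases, not taken from the slides.

```python
from urllib.parse import quote


def direct_webhdfs_url(namenode, path, op="LISTSTATUS", port=50070):
    """Direct access: the client must know and reach the NameNode itself."""
    return "http://{}:{}/webhdfs/v1{}?op={}".format(namenode, port, quote(path), op)


def gateway_webhdfs_url(gateway, topology, path, op="LISTSTATUS", port=8443):
    """Gateway access: the client only knows the single gateway end-point."""
    return "https://{}:{}/gateway/{}/webhdfs/v1{}?op={}".format(
        gateway, port, quote(topology, safe=""), quote(path), op)


# Hypothetical hosts and topology name, for illustration only.
direct = direct_webhdfs_url("nn1.internal", "/user/alice")
via_gateway = gateway_webhdfs_url("knox.example.com", "default", "/user/alice")
print(direct)
print(via_gateway)
```

With the gateway form, internal host names stay behind the firewall and only the gateway address is exposed to users.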




                                                        Page 24
Perimeter Verification & Authentication
Verification
- Verify identity token
- SAML, propagation of identity
Authentication
- Establish identity at Gateway to
  authenticate with LDAP + AD

[Diagram: Client --{REST}--> Knox Gateway --> Hadoop Cluster (WebHDFS, WebHCat, NN, DN, JT, Hive, HCat). Authentication: the gateway checks a User Store (KDC, AD, LDAP). Verification: the gateway checks an ID Provider (KDC, AD, LDAP).]
                                                                                      Page 25
Thank You




                        Page 26


Editor's Notes

  • #3: This is how we tend to think of Big data
  • #6: Limited in a couple of ways: scalability is limited by being on one machine or a small cluster that counts on all participants being up; it is hard to apply different types of processing without moving data around
  • #7: Hive is the only SQL-based app in this pile. Other apps are still in the picture; it’s not like Hadoop is displacing everything