Beckman abadi-5min-pres
@daniel_abadi
Yale University
* The Big Data phenomenon is the best thing that could have happened to the database community
* Despite other definitions related to the ‘3 Vs’, Big Data means BIG data

* This means we need scalable database systems

* There are still two main components of Big Data:
* Performing data analysis at scale
* Performing requests on data at scale

* The database community has won the battle

* Some thought that MapReduce might replace traditional database technology as the primary means to perform analysis at scale
* Just about every MapReduce vendor has abandoned this goal
* Hadapt, Impala, Tez, and several others are racing to see who can add the most traditional database execution technology to Hadoop the fastest
* Everyone is moving toward cost-based optimizers, traditional database operators, and push-based query execution
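In push-based execution, each operator hands (pushes) rows to its parent, so the scan at the bottom of the plan drives the loop. In the classic iterator (Volcano) model, by contrast, the parent repeatedly calls next() on its child. The minimal Python sketch below illustrates the push style only. The operator names (scan, filter_op, project, collect) and the row-as-dict representation are hypothetical and are not taken from Hadapt, Impala, Tez, or any other system named above.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Consumer = Callable[[Row], None]  # an operator's "push" entry point


def scan(rows: Iterable[Row], consume: Consumer) -> None:
    """Source operator: drives execution by pushing every row downstream."""
    for row in rows:
        consume(row)


def filter_op(predicate: Callable[[Row], bool], consume: Consumer) -> Consumer:
    """Returns a consumer that forwards only rows satisfying the predicate."""
    def push(row: Row) -> None:
        if predicate(row):
            consume(row)
    return push


def project(columns: List[str], consume: Consumer) -> Consumer:
    """Returns a consumer that keeps only the listed columns of each row."""
    def push(row: Row) -> None:
        consume({c: row[c] for c in columns})
    return push


def collect(sink: List[Row]) -> Consumer:
    """Terminal operator: appends each pushed row to a result list."""
    return sink.append


# Plan for: SELECT id FROM items WHERE price > 10 (hypothetical table)
results: List[Row] = []
pipeline = filter_op(lambda r: r["price"] > 10, project(["id"], collect(results)))
scan([{"id": 1, "price": 5}, {"id": 2, "price": 20}, {"id": 3, "price": 15}], pipeline)
```

Here the plan is wired top-down (the sink is built first), but control flows bottom-up: the single loop in scan pushes each row through the filter and projection without any per-operator next() calls. That tight producer-driven loop is part of why many database engines favor push-based designs.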

* The database community is losing the battle

* NoSQL systems still have very little traditional database technology inside (despite adding SQL interfaces)
* There is no race to add DB technology. Why?

* Don’t blame CAP: CAP is only relevant when there’s a network partition
* We never figured out how to do ACID and active replication at scale
* Many new proposals make simplifying assumptions in order to handle scale
* It’s been 30 years. Why can’t we build a distributed database that can handle distributed transactions over actively replicated data at scale?
