A Proposed Answer to Phil’s Question: What Does This Say About the Database Field?
Daniel Abadi

We’re Addicts
  • Addict (verb): “to devote or surrender (oneself) to something habitually or obsessively”
  • Mounting evidence that relational database technology is unsuitable for Web-scale data management
  • Yet we cling to our RDBMS technology, refusing to acknowledge this evidence
  • Addiction is a very serious matter
      • Puts one at a disadvantage: we’re being left behind
      • Highest-impact research on Web-scale data management is being published outside of SIGMOD/VLDB

What should we do?
  • There are lots of resources for addicts
  • Many programs work in steps to help addicts gradually kick the addiction
  • Stepwise programs are generally designed for individuals, but are straightforward to extend to entire research communities

Step 1: Admit You Have a Problem
  • Case study: Facebook
      • 2.5 petabyte enterprise data warehouse
      • Adding 15 TB of new data a day
  • RDBMSs should theoretically scale to this amount of data (esp. Gamma-style parallel DBMSs)
  • They use Hadoop instead
  • But their analysts don’t speak MapReduce!
  • So they allocate a team of superstar developers to build an SQL layer on top of Hadoop: Hive
  • Entire companies are being started that specialize in using Hadoop to create data warehouses
  • But data warehousing has always been the domain of relational database systems!
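
To make the "analysts don’t speak MapReduce" point concrete, here is a minimal sketch of a single aggregate query written as map and reduce functions, with the SQL an analyst would write shown in a comment. It is a single-process illustration in Python, not Hadoop or Hive code; the `clicks` table, the sample rows, and every function name are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical click log: (user_id, page). Illustrative data only.
CLICKS: List[Tuple[str, str]] = [
    ("u1", "/home"),
    ("u2", "/home"),
    ("u1", "/about"),
]

# What an analyst would write in SQL:
#   SELECT page, COUNT(*) FROM clicks GROUP BY page;


def map_fn(row: Tuple[str, str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (page, 1) for every click."""
    _user_id, page = row
    yield page, 1


def reduce_fn(page: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts emitted for one page."""
    return page, sum(counts)


def count_by_page(rows: List[Tuple[str, str]]) -> Dict[str, int]:
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(count_by_page(CLICKS))
```

The one-line SQL statement and the three functions compute the same per-page counts; a SQL layer such as Hive exists so that analysts can write the former and have the system generate the latter.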

Step 2: Believe in a Higher Power Greater Than Yourself
  • The higher power is … Google / systems community
      • MapReduce published in OSDI
      • Dynamo published in SOSP
      • BigTable published in OSDI
      • Dryad published in EuroSys

Step 3: Make a Searching and Fearless Inventory of Yourself
  • People who chose not to use database systems aren’t dumb
  • There must be a reason
  • We’re too expensive
      • Free / open source databases like MySQL/PostgreSQL/Ingres don’t scale out of the box
      • Proprietary solutions price by the TB
  • We’re too hard to use
  • We don’t scale
      • Seriously, we don’t scale
      • Yes, I know we should scale in theory. But in practice we don’t scale. Even the expensive solutions.

Step 4: Admit the Exact Nature of Our Wrongs
  • Admitting all of our wrongs is too overwhelming
  • For now, let’s focus on our wrongs for analytical workloads
  • Parallel databases should be able to scale indefinitely
  • Current implementations have limitations
      • Sometimes caused by first-order effects like hard limits required by various system components
      • More often caused by second-order effects
          • Systems are designed assuming failures are a rare event (not true at scale!)
          • Systems are designed assuming each node has predictable performance (not true at scale!)
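
The two second-order effects on the Step 4 slide can be made concrete with back-of-the-envelope arithmetic. The sketch below, in Python, assumes that nodes fail independently with the same probability during a query and that a query finishes only when its slowest node finishes. Both are simplifying assumptions, and the function names and numbers are illustrative, not figures from the talk.

```python
from typing import Sequence


def prob_any_failure(num_nodes: int, p_node_failure: float) -> float:
    """Probability that at least one node fails during a query.

    Assumes independent failures with the same per-node probability:
    P(any failure) = 1 - (1 - p)^n.
    """
    return 1.0 - (1.0 - p_node_failure) ** num_nodes


def query_time(node_times: Sequence[float]) -> float:
    """A parallel query is only as fast as its slowest node."""
    return max(node_times)


if __name__ == "__main__":
    # Hypothetical 0.1% chance that a given node fails during one long query.
    for n in (10, 100, 1000):
        print(n, round(prob_any_failure(n, 0.001), 3))

    # 99 nodes take 10 minutes each; one degraded node takes 50 minutes.
    print(query_time([10.0] * 99 + [50.0]))
```

Under these assumptions a failure that is negligible on 10 nodes (about 1%) hits roughly 63% of queries on 1,000 nodes, so a system that restarts the whole query on any failure rarely finishes one. Likewise, a single slow node out of 100 sets the completion time for the entire query.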

Step 5: Remove Our Shortcomings
  • Need more focus on fault-tolerant systems research
  • Need more focus on runtime scheduling
  • Need better parallelization of UDFs
  • Need to convince one of the parallel DBMS upstarts to release their code as open source

Bottom Line
  • Addictions are hard to kick
  • Need to work hard to remove our shortcomings
  • Need to reclaim our leadership in the data management arena

Daniel Abadi: VLDB 2009 Panel
