Google's Dremel

Dremel
Interactive Analysis
of Web-Scale Datasets
Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis

Presented by Maria Stylianou
marsty5@gmail.com
November 8th, 2012

KTH – Royal Institute of Technology

Outline
● Motivation

● Dremel – basic information
● Dremel's Key Aspects
– Columnar Format
– Query Execution

● Evaluation & Conclusions

Motivation

Data → Big Data
● Web-scale Datasets → more frequent
● Large-scale Data Analysis → essential!

NOT FAST
Speed Matters!

Dremel to the rescue!
● Interactive ad-hoc query system
– Scalable, fault tolerant, fast

● Analysis on in situ nested data
– In situ: access data 'in place'
– Nested: non-relational

MapReduce or Dremel, or both?

Key Aspects of Dremel
● Storage Format
– Columnar storage representation for nested data

● Query Language & Execution
– SQL & Multi-level serving tree


Storage Format
Columnar Storage Representation


Data Model
● Based on strongly-typed nested records
[Figure: a schema and sample records, annotated with the repetition level and definition level of each value]
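As an illustration of repetition and definition levels, here is a minimal sketch that stripes one column out of nested records. It assumes records are plain Python dicts and lists, and that a path is a list of (field name, kind) pairs with kind being "required", "optional", or "repeated". The function name `stripe` and the sample field names are hypothetical, chosen only for this example.

```python
def stripe(record, path):
    """Return (value, repetition_level, definition_level) triples for one column.

    Assumption: `path` is a list of (field_name, kind) pairs, where kind is
    "required", "optional", or "repeated". This is a sketch, not Dremel's code.
    """
    out = []

    def walk(node, depth, rep, defn, rep_depth):
        # rep: repetition level to emit for the next value on this branch
        # defn: number of optional/repeated fields defined so far
        # rep_depth: number of repeated fields passed so far on the path
        if depth == len(path):
            out.append((node, rep, defn))
            return
        name, kind = path[depth]
        value = node.get(name)
        if kind == "repeated":
            level = rep_depth + 1
            items = value or []
            if not items:
                # Missing repeated field: emit NULL with the levels reached so far.
                out.append((None, rep, defn))
                return
            for i, item in enumerate(items):
                # The first item inherits rep; later items repeat at this field.
                walk(item, depth + 1, rep if i == 0 else level, defn + 1, level)
        elif kind == "optional":
            if value is None:
                out.append((None, rep, defn))
            else:
                walk(value, depth + 1, rep, defn + 1, rep_depth)
        else:  # required fields do not add to the definition level
            walk(value, depth + 1, rep, defn, rep_depth)

    walk(record, 0, 0, 0, 0)
    return out


# Hypothetical sample: Name and Language are repeated, Code is required.
PATH = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
r1 = {"Name": [
    {"Language": [{"Code": "en-us"}, {"Code": "en"}]},
    {"Language": []},
    {"Language": [{"Code": "en-gb"}]},
]}
r2 = {"Name": [{}]}

for triple in stripe(r1, PATH) + stripe(r2, PATH):
    print(triple)
```

A repetition level of 0 marks the start of a new record, and a definition level lower than the maximum marks a NULL, so the nested records can be reassembled from the column alone.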

Query Language & Execution
SQL & Multi-level Serving Tree
● Tablet: contains N rows from the table
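The idea of a multi-level serving tree can be sketched as follows: leaf servers compute partial aggregates over their tablets, and each level above merges the partial results of its children until the root holds the answer. This is a simplified illustration for a COUNT(*) ... GROUP BY query. The function names, the `fanout` parameter, and the `country` column are assumptions made for the example.

```python
from collections import Counter


def leaf_scan(tablet, key):
    """Leaf server: partial COUNT(*) GROUP BY key over one tablet."""
    return Counter(row[key] for row in tablet)


def combine(partials):
    """Intermediate or root server: merge the partial counts of its children."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def serve(tablets, key, fanout=2):
    """Aggregate bottom-up through a tree in which each node has `fanout` children."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(tablet, key) for tablet in tablets]
    while len(level) > 1:
        level = [combine(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else Counter()


# Hypothetical tablets, each holding a few rows of the table.
tablets = [
    [{"country": "SE"}, {"country": "US"}],
    [{"country": "SE"}],
    [{"country": "CY"}, {"country": "SE"}],
]
result = serve(tablets, "country")
```

Because counts merge associatively, the result does not depend on how tablets are grouped under intermediate servers.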

Query Execution
Query Dispatcher

● Schedules queries based on their priorities
● Balances the load
● Provides fault tolerance
– Handles stragglers (slow-running servers)
– Tablets are three-way replicated
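Straggler handling with replicated tablets can be illustrated with a small simulation. The sketch below assumes each tablet lists its replica servers in preference order and that server latencies are known up front. A real dispatcher would observe processing times at run time, so the `latency` table, the threshold, and all names here are hypothetical.

```python
def dispatch(tablet_replicas, latency, straggler_threshold):
    """Assign each tablet to a replica, moving it off slow servers.

    tablet_replicas: {tablet: [server, ...]} in preference order (assumed)
    latency: {server: seconds}, a simulated stand-in for observed times
    Returns (assignment, rescheduled_tablets).
    """
    assignment = {}
    rescheduled = []
    for tablet, replicas in tablet_replicas.items():
        for server in replicas:
            if latency[server] <= straggler_threshold:
                chosen = server
                break
        else:
            # Every replica is slow: fall back to the least slow one.
            chosen = min(replicas, key=lambda s: latency[s])
        if chosen != replicas[0]:
            rescheduled.append(tablet)
        assignment[tablet] = chosen
    return assignment, rescheduled


# Three-way replicated tablets; server "b" is a straggler.
replicas = {"t1": ["a", "b", "c"], "t2": ["b", "c", "a"]}
latency = {"a": 0.2, "b": 9.0, "c": 0.3}
assignment, rescheduled = dispatch(replicas, latency, straggler_threshold=1.0)
```

With three replicas per tablet, a slow or failed server never blocks a tablet as long as one of the other two replicas responds in time.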

Experiments
Environment


Experiments
Local Disk - Performance


Experiments
MapReduce and Dremel

● Task: counts the average number of terms in a specific field
● 3000 workers
[Figure: execution times, annotated on the scale of hours, minutes, and seconds]
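To make the benchmark task concrete, the sketch below computes the average number of terms in one field twice: once over record-oriented data, where every whole record is visited, and once over column-oriented data, where only the needed column is read. Word counting is approximated with `str.split`, and the field names and sample rows are assumptions for illustration.

```python
def count_words(text):
    """Approximate term counting by splitting on whitespace (an assumption)."""
    return len(text.split())


def avg_terms_record_oriented(records, field):
    # Each full record is touched, even though only one field is needed.
    return sum(count_words(rec[field]) for rec in records) / len(records)


def avg_terms_column_oriented(columns, field):
    # Only the requested column is read; the other columns are never touched.
    values = columns[field]
    return sum(count_words(v) for v in values) / len(values)


# Hypothetical sample data in record-oriented form.
records = [
    {"docid": 1, "url": "http://example.com/a", "txtField": "interactive analysis of data"},
    {"docid": 2, "url": "http://example.com/b", "txtField": "web scale datasets"},
    {"docid": 3, "url": "http://example.com/c", "txtField": "columnar storage"},
]
# The same data pivoted into one list per column.
columns = {name: [rec[name] for rec in records] for name in records[0]}

avg_rec = avg_terms_record_oriented(records, "txtField")
avg_col = avg_terms_column_oriented(columns, "txtField")
```

Both functions return the same value. The difference is in how much data each has to read, which is what grows with wide records at web scale.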

Experiments
Impact of Stragglers


Experiments
Scalability

● Query: selects the top-20 adverts and their number of occurrences in T4
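A top-k query of this kind can be sketched in a few lines. The example assumes the advert identifiers arrive as a flat sequence. The name `top_k` and the sample identifiers are hypothetical, and exact in-memory counting stands in for whatever the real system does at scale.

```python
from collections import Counter


def top_k(ad_ids, k=20):
    """Return the k most frequent advert ids with their occurrence counts."""
    return Counter(ad_ids).most_common(k)


# Hypothetical advert ids, one per impression.
impressions = ["ad7", "ad3", "ad7", "ad9", "ad7", "ad3"]
top2 = top_k(impressions, k=2)
```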

What's happening today?
● Google BigQuery
– Web Service [pay-per-query]

● Open Dremel → Apache Drill
– Open Source Implementation of Google BigQuery
– Flexibility: broader range of query languages

MapReduce or Dremel, or both?

                     MR                Dremel
Data Processing      Record-oriented   Column-oriented
In-situ Processing   No                Yes!
Size of Queries      Large             Small/Medium

MapReduce AND Dremel

Conclusions
● Multi-level execution trees
● Columnar data layout
● Scalable & efficient
● MapReduce benefits
● Near-linear scalability

References
● S. Melnik et al. Dremel: Interactive Analysis of Web-Scale Datasets. PVLDB, 3(1):330–339, 2010
● G. Czajkowski. Sorting 1PB with MapReduce. http://googleblog.blogspot.se/2008/11/sorting-1pb-with-mapreduce.html
● Apache Drill, http://wiki.apache.org/incubator/DrillProposal
● Google BigQuery, https://developers.google.com/bigquery/
