Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: 2015
Gopal Vijayaraghavan
ORC – Optimized Row-Columnar File
● Columnar Storage
● Row-groups & Fixed splits
● Protobuf Metadata Storage
● Type-safe Vectorization
● Hive ACID transactions
● Single SerDe for Format
Need for Speed: The Stinger Initiative
Stinger: An Open Roadmap to improve Apache Hive’s performance 100x.
Launched: February 2013; Delivered: April 2014.
Delivered in 100% Apache Open Source.
Vectorized SQL Engine + Columnar Storage (ORC) + Distributed Execution (Apache Tez) = 100X
ORC at Facebook
● Compression: Saved more than 1,400 servers' worth of storage. [1]
● Compression: Compression ratio increased from 5x to 8x globally. [1]
ORC at Spotify
● IO: 16x less HDFS read when using ORC versus Avro. [2]
● CPU: 32x less CPU when using ORC versus Avro. [2]
ORC: Today
What is Optimized about ORC?
ORC – Optimized Row-Columnar File
● Columnar Storage
● Row-groups & Stripe splits
● Protobuf Metadata Storage
● Type-safe Vectorization
● Hive ACID transactions
● Single SerDe for Format
Columnar Storage
Storage Performance
● Compress each column differently
● Detect & compress common sub-sequences
● Auto-increment ids
● String Enums
● Large Integers (uid scale)
● Unique strings (UUIDS)
Read Performance
● Column projection
● Columnar deserializers
● Data locality
Write Throughput
● Stats auto-gather
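To illustrate why a low-cardinality string column (the "String Enums" case above) stores compactly in a columnar layout, here is a minimal Python sketch of dictionary encoding. The function names and sample values are hypothetical, and this is not ORC's actual writer, only the general idea.

```python
def dictionary_encode(values):
    """Encode a string column as (sorted dictionary, list of integer codes)."""
    dictionary = sorted(set(values))
    index = {s: i for i, s in enumerate(dictionary)}
    codes = [index[v] for v in values]
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


# Hypothetical sample column with only three distinct values.
column = ["SHIPPED", "PENDING", "SHIPPED", "RETURNED", "SHIPPED", "PENDING"]
dictionary, codes = dictionary_encode(column)
```

Each distinct string is stored once, and the rows become small integers that an integer encoder can then pack further.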
Row-groups & Stripe splits
Split Parallelism
● Effective parallelism
● No seeks to find boundaries
● No splits with zero data
● Decompress fixed chunks
Stripes
● Single unsplittable chunk
● Will reside in 1 HDFS block entirely
● Is self-contained for all read ops
A Single SerDe for all ORC Files
A Single Writer
● No mismatch of serialization
● Forward compatibility
Readers
● Multiple reader implementations
● Allows for vector readers
● And row-mode readers
● Similar loop – good JIT hit-rate
Protobuf Metadata Storage
Standardized Metadata
● Readers are easier to write
● Metadata readers are auto-generated
Metadata Forward Compatibility
● Protobuf Optional fields
Statistics Storage in Metadata
● Standard serialization for stats
● Allows for predicate pushdown (PPD) into the IO layer
Type-safe Vectorization
Schema on Write
● Write ORC Structs with types
● SerDe & Inputformat
Read Performance
● Data is read with few copies
● Primitive types are fast
● Primitives are also unboxed
● Predicates are typed too
ORC: ETL Improvements
Always more new data
ORC (Zlib): Compress Differently
[Chart: Time taken for ETL of TPC-H LineItem (scale 1 TB): ORC (old zlib) 674, ORC SNAPPY 389, ORC (new zlib) 433; units not shown]
Different Zlib algorithms for encoding
● Z_FILTERED
● Z_DEFAULT
● Z_BEST_SPEED
● Z_DEFAULT_COMPRESSION
In detail
● Compress IS_NULL bitsets lightly
● Compress Integers differently from Doubles
● Compress string dictionaries differently
● Allow for user choice
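The idea of compressing each stream with its own zlib settings can be sketched in Python, whose `zlib` module exposes the same level and strategy constants. The per-stream choices and sample data below are hypothetical, and the sketch assumes the slide's "Z_DEFAULT" refers to zlib's default strategy (`Z_DEFAULT_STRATEGY`); it is not the mapping ORC actually uses.

```python
import struct
import zlib


def deflate(data, level, strategy):
    """Deflate `data` with an explicit zlib level and strategy."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS,
                                  zlib.DEF_MEM_LEVEL, strategy)
    return compressor.compress(data) + compressor.flush()


# Hypothetical per-stream settings, loosely following the bullets above.
STREAM_SETTINGS = {
    "is_null": (zlib.Z_BEST_SPEED, zlib.Z_DEFAULT_STRATEGY),       # compress lightly
    "integers": (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_FILTERED),
    "dictionary": (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_DEFAULT_STRATEGY),
}

# Hypothetical sample streams.
streams = {
    "is_null": bytes([0xFF]) * 4096,
    "integers": struct.pack("<1000q", *range(1000)),
    "dictionary": b"\n".join(b"customer#%06d" % i for i in range(500)),
}

compressed = {name: deflate(data, *STREAM_SETTINGS[name])
              for name, data in streams.items()}
```

Every stream remains readable with plain `zlib.decompress`, since level and strategy affect only the encoder.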
ORC (Zlib): Compress Differently
[Chart: Size on disk for TPC-H LineItem (scale 1 TB): ORC (old zlib) 178.5, ORC SNAPPY 225.1, ORC (new zlib) 172.2; units not shown]
Using JDK8 SIMD: Integer Writers
Integer encodings
● Base + Delta
● Run-length
● Direct
Trade-off for Size/Speed
● Use fixed bit-width loops
● Snap to nearest bit-width
[Chart: ORC Write Integer Performance (smaller is better): mean time (ms) by bit width (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64); series: hive 0.13 bitpacking, hive 1.0 bitpacking (new)]
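"Snap to nearest bit-width" can be read as rounding the required width up to one of a fixed set of widths, so that each width gets its own simple packing loop. The Python sketch below assumes the allowed widths are the ones on the chart's axis and that values are non-negative. The helper names are hypothetical, and Python's big integers stand in for the fixed-width loops a real writer would use.

```python
ALLOWED_WIDTHS = (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64)


def snap_bit_width(values):
    """Smallest allowed width that can hold every (non-negative) value."""
    needed = max(max(values).bit_length(), 1)
    return next(w for w in ALLOWED_WIDTHS if w >= needed)


def pack(values, width):
    """Pack non-negative ints into bytes, `width` bits each, most significant first."""
    acc = 0
    for v in values:
        acc = (acc << width) | v
    total_bits = width * len(values)
    pad = (-total_bits) % 8  # zero bits appended to reach a byte boundary
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


def unpack(data, width, count):
    """Inverse of pack(): read `count` values of `width` bits each."""
    pad = (-(width * count)) % 8
    acc = int.from_bytes(data, "big") >> pad
    mask = (1 << width) - 1
    return [(acc >> (width * (count - 1 - i))) & mask for i in range(count)]


values = [3, 5, 1000]            # needs 10 bits, snaps up to 16
width = snap_bit_width(values)
packed = pack(values, width)
```

Snapping up wastes a few bits per value (here 6 of 16), which is the size/speed trade-off the slide refers to.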
Double Writers
[Chart: ORC Write Double Performance (smaller is better): mean time (ms) by double write mode: old 273.331, buffered + BE 247.634, buffered + LE 231.741]
Double Writers
● JVM is big-endian
● X86 is little-endian
● Special handling of NaN
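The difference between the big-endian and little-endian write modes is only the byte order of each 8-byte IEEE 754 value. A minimal Python sketch using `struct` follows; the function names are hypothetical, and the sketch does not attempt to reproduce whatever NaN handling the writer performs.

```python
import struct


def write_doubles(values, little_endian=True):
    """Serialize floats as 8-byte IEEE 754 doubles in the chosen byte order."""
    fmt = ("<" if little_endian else ">") + "%dd" % len(values)
    return struct.pack(fmt, *values)


def read_doubles(data, little_endian=True):
    """Inverse of write_doubles() for the same byte order."""
    fmt = ("<" if little_endian else ">") + "%dd" % (len(data) // 8)
    return list(struct.unpack(fmt, data))


values = [1.5, -2.25, 0.0]
le_bytes = write_doubles(values, little_endian=True)
be_bytes = write_doubles(values, little_endian=False)
```

On little-endian hardware such as x86, the little-endian form matches the in-memory layout, so a writer can copy whole buffers instead of swapping bytes per value.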
ORC: Scale compression buffers
[Chart: File Size (MB) by Compression Buffer Size (KB); series: ZLIB, SNAPPY]
Buffer size (KB): 8, 16, 32, 64, 128, 256
First series (MB): 269.4, 263.3, 258.5, 258.4, 258.4, 258.4
Second series (MB): 184.8, 183.5, 182.2, 180.1, 178.3, 177.4
Large Columns vs More Columns
● Adjust when >1000 columns
Trade offs
● Compression
● Low memory use
More additions
● Dynamically partitioned insert
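The trade-off between buffer size and compression can be sketched by compressing the same bytes as independent chunks of different sizes. The Python example below is an illustration under stated assumptions: the helper names are hypothetical, and the data is synthetic (a 16 KB random block repeated), chosen so that the redundancy is only visible to buffers larger than 16 KB.

```python
import random
import zlib


def compress_in_chunks(data, buffer_size):
    """Compress `data` as independent zlib chunks of at most `buffer_size` bytes."""
    return [zlib.compress(data[i:i + buffer_size])
            for i in range(0, len(data), buffer_size)]


def decompress_chunks(chunks):
    """Reassemble the original bytes from independently compressed chunks."""
    return b"".join(zlib.decompress(c) for c in chunks)


# Synthetic data: one 16 KB block of random bytes repeated 64 times (1 MB).
rng = random.Random(42)
block = bytes(rng.randrange(256) for _ in range(16 * 1024))
data = block * 64

# Total compressed size for each buffer size in KB.
sizes = {kb: sum(len(c) for c in compress_in_chunks(data, kb * 1024))
         for kb in (8, 16, 32, 64, 128, 256)}
```

Small buffers use less memory per column (which matters with more than 1000 columns) but cannot exploit redundancy that spans buffer boundaries, so the total size grows.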
ORC: Streaming Ingest + ACID
● Broken pattern: Partitions for Atomicity
● Isolation & Consistency on retries
● Transactions are pluggable (txn.manager)
● Cache/Replication friendly (base + deltas)
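The "base + deltas" layout can be pictured as an immutable base plus small per-transaction delta files that are merged at read time. The following Python sketch is a toy model under that assumption: the data shapes, operation names, and `read_snapshot` function are all hypothetical and do not reflect Hive's actual ACID file layout or record keys.

```python
def read_snapshot(base, deltas, max_txn):
    """Merge a base with all deltas whose transaction id is <= max_txn.

    base:   dict mapping row_id -> row
    deltas: list of (txn_id, [(op, row_id, row), ...]) in any order
    """
    rows = dict(base)  # the base itself is never modified
    for txn_id, ops in sorted(deltas, key=lambda d: d[0]):
        if txn_id > max_txn:
            break  # later transactions are invisible to this reader
        for op, row_id, row in ops:
            if op == "delete":
                rows.pop(row_id, None)
            else:  # "insert" or "update"
                rows[row_id] = row
    return rows


base = {1: "a", 2: "b"}
deltas = [
    (11, [("update", 2, "b2")]),
    (10, [("insert", 3, "c")]),
    (12, [("delete", 1, None)]),
]
snapshot_11 = read_snapshot(base, deltas, max_txn=11)
snapshot_12 = read_snapshot(base, deltas, max_txn=12)
```

Because the base and each delta are written once and never changed, they can be cached or replicated independently, and a retried reader sees the same snapshot for the same transaction id.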
ORC: LLAP and Sub-second
ORC – Pushing for Sub-second
ORC: Row Indexes
Min-Max pruning
● Evaluate on statistics
Bloom filters
● Better String filters
● Filter a random distribution
LLAP Future
● Row-level vector SARGs
[Chart: Rows read (log scale) for "from tpch_1000.lineitem where l_orderkey = 1212000001;": No Indexes 5,999,989,709; Min-Max Indexes 540,000; Bloomfilter Indexes 10,000]
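How min-max statistics and bloom filters combine to skip row groups for an equality predicate can be sketched as below. This is a simplified Python model: the `BloomFilter` class, its size and hash scheme, and the index layout are assumptions for illustration, not ORC's on-disk format.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter; bit count and hash scheme are illustrative only."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_index(row_groups):
    """Collect min, max and a bloom filter for each row group."""
    index = []
    for rows in row_groups:
        bloom = BloomFilter()
        for v in rows:
            bloom.add(v)
        index.append({"min": min(rows), "max": max(rows), "bloom": bloom})
    return index


def groups_to_read(index, key):
    """Row groups that may contain `key` for the predicate column = key."""
    selected = []
    for i, entry in enumerate(index):
        if not (entry["min"] <= key <= entry["max"]):
            continue  # min-max pruning
        if not entry["bloom"].might_contain(key):
            continue  # bloom filter pruning
        selected.append(i)
    return selected


index = build_index([[1, 5, 9], [10, 500, 1000], [1001, 1500, 2000]])
```

Min-max pruning works well when the column is sorted or clustered; the bloom filter helps when values are randomly distributed and most row groups span the whole range. A bloom filter never produces false negatives, so a skipped row group is guaranteed not to contain the key.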
ORC: Row Indexes
[Chart: Time taken in seconds (smaller is better) for "* from tpch_1000.lineitem where l_orderkey=1212000001;": No Indexes 74; Min-Max Indexes 4.5; Bloomfilter Indexes 1.34]
ORC: JDK8 SIMD Readers
Integer encodings
● Base + Delta
● Run-length
● Direct
Trade-off for Size/Speed
● Use fixed bit-width loops
● Snap to nearest bit-width
[Chart: ORC Read Integer Performance: mean time (ms) by bit width (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64); series: hive 0.13 unpacking, hive-1.0 unpacking (new)]
ORC: Vectorization + SIMD
Advantage of a Single SerDe
● Primitive Types
Allocation free tight inner loops
● JDK8 has auto-vectorization
Vectorized Early Filter
● Vectors can be filtered early in ORC
● StringDictionary can be used to binary-search
Vectorized SIMD Join
● Performance for single key joins
0x00007f13d2e6afb0: vmovdqu 0x10(%rsi,%rax,8),%ymm2
0x00007f13d2e6afb6: vaddpd %ymm1,%ymm2,%ymm2
0x00007f13d2e6afba: movslq %eax,%r10
0x00007f13d2e6afbd: vmovdqu 0x30(%rsi,%r10,8),%ymm3
    ;*daload vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
0x00007f13d2e6afc4: vmovdqu %ymm2,0x10(%rdx,%rax,8)
0x00007f13d2e6afca: vaddpd %ymm1,%ymm3,%ymm2
0x00007f13d2e6afce: vmovdqu %ymm2,0x30(%rdx,%r10,8)
    ;*dastore vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
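The "Vectorized Early Filter" bullets on this slide suggest evaluating a string predicate once against the sorted dictionary and then filtering the integer codes. A minimal Python sketch of that idea follows; the function name and sample data are hypothetical, and it assumes the dictionary is sorted and the codes are positions in it.

```python
from bisect import bisect_left


def filter_equals(dictionary, codes, literal):
    """Return row positions where the dictionary-encoded column equals `literal`."""
    pos = bisect_left(dictionary, literal)  # binary search in the sorted dictionary
    if pos == len(dictionary) or dictionary[pos] != literal:
        return []  # literal is absent from the dictionary: no row can match
    # Only integer comparisons remain in the per-row loop.
    return [row for row, code in enumerate(codes) if code == pos]


# Hypothetical sorted dictionary and encoded column.
dictionary = ["AIR", "MAIL", "RAIL", "SHIP", "TRUCK"]
codes = [3, 0, 3, 4, 1, 3]
matches = filter_equals(dictionary, codes, "SHIP")
```

The string comparison cost is paid once per dictionary lookup rather than once per row, and a literal missing from the dictionary rejects the whole batch without touching the rows.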
ORC: Split Strategies + Tez Grouping
Amdahl’s Law
● As fast as the slowest task
● Slice work thinly, but not too thin
Split-generation vs Execution time
● ETL
● BI
● Hybrid
Split-grouping & estimation
● ColumnarSplit size
● Group by estimate, not file size
● Bucket pruning
Slow split
ORC: LLAP
● JIT Performance for short queries
● Row-group level caching
● Asynchronous IO Elevator
● Multi-threaded Column Vector processing
ORC: LLAP (+ SIMD + Split Strategies + Row Indexes)
Questions?
Interested? Stop by the Hortonworks booth to learn more
Endnotes
[1] https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/
[2] http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014
