OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yuanjian Li

Daoyuan Wang (Intel)
Yuanjian Li (Baidu)
OAP: Optimized Analytics
Package for Spark Platform

Notice and Disclaimers:
• Intel, the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be
claimed as the property of others.
See Trademarks on intel.com for full list of Intel trademarks.
• Optimization Notice:
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not
unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other
optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors
not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable
product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
• Intel technologies may require enabled hardware, specific software, or services activation. Check with your system
manufacturer or retailer.
• No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any
damages resulting from such losses.
• You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning
Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter
drafted which includes subject matter disclosed herein.
• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
• The products described may contain design defects or errors known as errata which may cause the product to deviate from
publish.

About me
Daoyuan Wang
• developer@Intel
• Focuses on Spark optimization
• An active Spark contributor since 2014
Yuanjian Li
• Baidu INF distributed computation
• Apache Spark contributor
• Baidu Spark team leader

Agenda
• Background for OAP
• Key features
• Benchmark
• OAP and Spark in Baidu
• Future plans

Data Analytics in Big Data Definition
• People want OLAP against large datasets to run as fast as possible.
• People want to extract information from newly arriving data as soon as possible.

Data Analytics Acceleration is Required by Spark Users
http://cdn2.hubspot.net/hubfs/438089/DataBricks_Surveys_-_Content/2016_Spark_Survey/2016_Spark_Infographic.pdf

Emerging hardware technology
Intel® Optane™ Technology
Data Center Solutions
Accelerate applications for fast caching and storage, reduce transaction costs for
latency-sensitive workloads and increase scale per server.
Intel® Optane™ technology allows data centers to deploy bigger and more affordable
datasets to gain new insights from large memory pools.

Our proposal – OAP
Spark* Job Server
Spark SQL / Structured Streaming / Core
Storage Layer: Cassandra*, HBase*, Redis*, Alluxio*, HDFS*, S3*, …
Hive* Table, Parquet*, JSON*, ORC*, Redis* Connector, Cassandra* Connector
OAP (Codename “Spinach”)
• IndexedDataSource / CacheAware
• RDMA, QAT, ISA-L, FPGA …
• User Customized Indices
• Columnar formats & support Parquet, ORC
• Runtime Computing vs. Data Store
• Columnar Fine-grained Cache
• Spark Executor in-process Cache
• 3D XPoint (App Direct Mode)
• Auto tuning based on periodical job history
• K8S Integration / AES-NI Encryption

Why OAP
Low cost
• Makes full use of existing hardware
• Open source
Good Performance
• Indexes, just like a traditional database
• Up to 5x boost in real-world use
Easy to Use
• Easy to deploy
• Easy to maintain
• Easy to learn

A Simple Example
1. Run with OAP
$SPARK_HOME/sbin/start-thriftserver.sh --jars oap.jar
2. Create an OAP table
beeline> CREATE TABLE src(a INT, b STRING) USING spn;
3. Create a single-column B+ tree index
beeline> CREATE SINDEX idx_1 ON src (a) USING BTREE;
4. Insert data
beeline> INSERT INTO TABLE src SELECT key, value FROM xxx;
5. Refresh index
beeline> REFRESH SINDEX ON src;
6. Execution automatically uses the index
beeline> SELECT MAX(b), MIN(b) FROM src WHERE a > 100 AND a < 1000;

OAP Files and Fibers
OAP meta file
OAP data files
• RowGroup #1, RowGroup #2, …, RowGroup #N
• Within a row group: Column (Fiber) #1, Column (Fiber) #2, …, Column (Fiber) #N
OAP index files (one index file for every data file)
• Index meta statistics
• Index data structure (Index Fiber)
OAP Internals - index
1. Spark predicate push down (FilteredScan)
2. Read OAP Meta
3. Available index?
   Y: read statistics before using the index, get local RowIDs from the index, access the data file for those RowIDs directly
   N: full table scan
4. OAP cached access
Index selection
• Supports B-tree index and BitMap index; finds the best match among all created indices
• Supports statistics such as MinMax, PartByValue, Sample, BloomFilter
• Only reads the data fibers we need and puts those fibers into cache (in-memory fiber)
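The lookup flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OAP's implementation: the rows, the sorted list standing in for the B+ tree leaf level, and the `scan` function are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-ins for one data file and its index.
rows = [(35, "John"), (29, "Michelle"), (42, "Amy"), (27, "Kim"), (47, "Mary")]

# Sorted (key, local row id) pairs play the role of the B+ tree leaf level.
index = sorted((age, row_id) for row_id, (age, _) in enumerate(rows))
keys = [k for k, _ in index]
stats = {"min": keys[0], "max": keys[-1]}  # MinMax statistics


def scan(low, high, use_index=True):
    """Return rows with low < key < high, following the flow on the slide."""
    if not use_index:
        # No usable index: full table scan.
        return [r for r in rows if low < r[0] < high]
    # Read statistics before touching the index: skip the file if the
    # requested range cannot overlap [min, max].
    if high <= stats["min"] or low >= stats["max"]:
        return []
    # Get local row ids from the index, then read only those rows.
    start = bisect_right(keys, low)
    end = bisect_left(keys, high)
    row_ids = sorted(row_id for _, row_id in index[start:end])
    return [rows[i] for i in row_ids]


print(scan(29, 45))
```

Both paths return the same rows; the indexed path simply avoids touching rows outside the requested range.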

OAP compatible layer
Parquet compatible layer
Read row #m from Parquet file:
1. Find row group #k
2. Read the row group and get the specific rows
Parquet data file: RowGroup #1, RowGroup #2, …, RowGroup #k
Cache
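A minimal sketch of the "find row group #k for row #m" step, assuming the reader knows the row count of each row group (Parquet footers record this). The sizes and the `locate` helper are hypothetical, and indices here are zero-based, unlike the #1..#k labels on the slide.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical row counts of the row groups in one Parquet file.
row_group_sizes = [1000, 1000, 500]

# starts[k] is the global row number of the first row in row group k.
starts = [0] + list(accumulate(row_group_sizes))[:-1]


def locate(m):
    """Map global row number m to (row group index, offset inside the group)."""
    if not 0 <= m < sum(row_group_sizes):
        raise IndexError(f"row {m} is out of range")
    k = bisect_right(starts, m) - 1
    return k, m - starts[k]


print(locate(1000))
```

Once the row group is known, only that group has to be read to fetch the requested rows.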

OAP Data locality
Spark as a Service
SpinachContext (Driver): FiberSensor
Executor: FiberCacheManager
HeartBeat
Meta Data
Index
Storage (HDFS / S3 / OSS)
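One way to read this diagram is that executors report their cached fibers through heartbeats and the driver uses those reports to prefer executors that already hold the data. The sketch below illustrates that idea only; the function names, the fiber ids and the ranking rule are assumptions, not OAP's API.

```python
from collections import defaultdict

# fiber id -> set of executors that reported the fiber as cached
cached_on = defaultdict(set)


def on_heartbeat(executor, cached_fibers):
    """Driver side: record which fibers an executor currently caches."""
    # Drop stale entries for this executor, then record the fresh report.
    for holders in cached_on.values():
        holders.discard(executor)
    for fiber in cached_fibers:
        cached_on[fiber].add(executor)


def preferred_executors(fibers_needed):
    """Rank executors by how many of the needed fibers they already cache."""
    hits = defaultdict(int)
    for fiber in fibers_needed:
        for executor in cached_on.get(fiber, ()):
            hits[executor] += 1
    return sorted(hits, key=lambda e: (-hits[e], e))


on_heartbeat("exec-1", ["file1/col_a", "file1/col_b"])
on_heartbeat("exec-2", ["file1/col_b"])
print(preferred_executors(["file1/col_a", "file1/col_b"]))
```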

Performance
OAP Index And Cache Performance
Query time (seconds):
Parquet Vectorized Read              72.083
OAP Indexed Read                      7.095
OAP Indexed Read with Fiber Cache     2.304
Cluster:
1 Master + 2 Slaves
Hardware:
CPU: 2x E5-2699 v4
RAM: 256 GB
Storage: S3610 1.6TB
Data:
300GB (Compressed Parquet)
2 Billion Records
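The relative speedups implied by the three query times can be worked out directly. The variable names below are only labels for the chart values.

```python
# Query times in seconds, as read from the chart.
parquet_vectorized = 72.083
oap_indexed = 7.095
oap_indexed_cached = 2.304

speedup_index = parquet_vectorized / oap_indexed
speedup_cache = parquet_vectorized / oap_indexed_cached

print(f"index only:    {speedup_index:.1f}x")
print(f"index + cache: {speedup_cache:.1f}x")
```

On this benchmark the index alone gives roughly a 10x reduction in query time, and adding the fiber cache roughly 31x.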

Spark In Baidu
Stage 1
• Spark imported to Baidu
• Version: 0.8
Stage 2
• Build standalone cluster
• Integrate with in-house FS/Pub-Sub/DW
• Version: 1.4
Stage 3
• Build cluster over YARN
• Integrate with in-house Resource Scheduler System
• Version: 1.6
Stage 4
• SQL/Graph Service over Spark
• OAP
• Version: 2.1
[Chart: Nodes and Jobs/day by year, 2014 to 2017; data labels 80, 1000, 3000, 6500 and 50, 300, 1500, 5800]

Baidu Big SQL
Web UI / Restful API
BBS HTTPServer
BBS Master
BBS Worker, BBS Worker, BBS Worker
Cache & Index Layer (OAP)
Roll Up Table Layer
Spark Over YARN
API Layer:
• Meta Control API
• Job API: Load / Export / Query / Index Control
Control Layer:
• Meta Control
• Job Scheduler
• Spark Driver
• Query Classification
Boosting Layer:
• Roll Up Table Management
• Roll Up Query Change
• Index Create/Update
• Cache Hit

Baidu Big SQL
Resource Management & Isolation
BBS Master: alter table, create index, classify query
BBS Worker: Big Query Pool, Small Query Pool, Index Create Pool
Spark Over YARN: Query Physical Queue (FAIR), Import Physical Queues, Load Physical Queues
Jobs: Load Job, Query Job
Data Sources: Logs, DW

Introductory Story
Get the top 10 charge sums and the corresponding advertisers
triggered by the query word ‘flower’
• Create an index on the ‘userid’ column
• Various index types to choose from for different field types
• 5x faster than native Spark SQL, 80x faster than an MR job
• 3 days of Baidu charging logs, 4 TB of data, 70000+ files, query time within 10~15 s

Roll Up Table Layer
Detail table (700+ columns; 99% of queries use fewer than 10 columns)
date  userid  searchid  baiduid  cmatch  …  shows  clicks  charge
1     1       1         10       2       …  10     1       5
1     1       2         11       3       …  10     1       5
1     1       3         12       2       …  10     1       5
1     1       4         13       1       …  10     1       5
1     1       5         14       1       …  10     1       5
1     2       6         14       2       …  10     1       5
1     2       7         15       3       …  10     1       5
1     2       8         16       4       …  10     1       5
1     2       9         17       5       …  10     1       5
Select date,userid,shows,clicks,charge from…
Roll up table by (date, userid)
date  userid  shows  clicks  charge
1     1       50     5       25
1     2       40     4       20
Multi Roll Up Table (user-transparent)
Roll up table by (date, cmatch)
date  cmatch  shows  clicks  charge
1     1       20     2       10
1     2       30     3       15
1     3       20     2       10
1     4       10     1       5
1     5       10     1       5
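The roll-up tables above are plain group-by sums over the detail rows. A minimal sketch that reproduces them from the nine sample rows follows; the `roll_up` helper is illustrative and not part of BigSQL.

```python
from collections import defaultdict

COLS = ("date", "userid", "searchid", "baiduid", "cmatch", "shows", "clicks", "charge")
MEASURES = ("shows", "clicks", "charge")

# The nine sample rows from the detail table (elided columns omitted).
detail = [
    (1, 1, 1, 10, 2, 10, 1, 5),
    (1, 1, 2, 11, 3, 10, 1, 5),
    (1, 1, 3, 12, 2, 10, 1, 5),
    (1, 1, 4, 13, 1, 10, 1, 5),
    (1, 1, 5, 14, 1, 10, 1, 5),
    (1, 2, 6, 14, 2, 10, 1, 5),
    (1, 2, 7, 15, 3, 10, 1, 5),
    (1, 2, 8, 16, 4, 10, 1, 5),
    (1, 2, 9, 17, 5, 10, 1, 5),
]


def roll_up(rows, dims):
    """Sum the measure columns for every distinct combination of dims."""
    dim_idx = [COLS.index(d) for d in dims]
    measure_idx = [COLS.index(m) for m in MEASURES]
    totals = defaultdict(lambda: [0] * len(MEASURES))
    for row in rows:
        key = tuple(row[i] for i in dim_idx)
        for j, i in enumerate(measure_idx):
            totals[key][j] += row[i]
    return {key: tuple(sums) for key, sums in sorted(totals.items())}


by_user = roll_up(detail, ("date", "userid"))
by_cmatch = roll_up(detail, ("date", "cmatch"))
print(by_user)    # {(1, 1): (50, 5, 25), (1, 2): (40, 4, 20)}
print(by_cmatch)
```

A query that only touches date, userid and the three measures can then be answered from the two-row roll-up instead of the 700+ column detail table.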

OAP In BigSQL
DataFile
…  Name      Department  Age  …
…  …         …           …    …
…  John      INF         35   …
…  Michelle  AI-Lab      29   …
…  Amy       INF         42   …
…  Kim       AI-Lab      27   …
…  Mary      AI-Lab      47   …
…  …         …           …    …
IndexFile (Index Build)
Sorted Age  Row Index in Data File
27          3
29          1
35          0
42          2
47          4
Department  Bit Array
INF         10100
AI-Lab      01011
Select xxx from xxx where age > 29 and department in (INF, AI-Lab)
• NormalTableScan
• UseIndex: Skippable Reader
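A small sketch of how the two index kinds on this slide could answer the example query: a sorted age index for the range predicate and one bit string per department for the IN predicate. It builds both from the five sample rows of the data file; the helper names are hypothetical.

```python
from bisect import bisect_right

names = ["John", "Michelle", "Amy", "Kim", "Mary"]
departments = ["INF", "AI-Lab", "INF", "AI-Lab", "AI-Lab"]
ages = [35, 29, 42, 27, 47]

# Sorted index: (age, row index in data file), ordered by age.
age_index = sorted((age, row) for row, age in enumerate(ages))
age_keys = [age for age, _ in age_index]

# Bitmap index: one bit string per department, one bit per row.
bit_lists = {}
for row, dept in enumerate(departments):
    bit_lists.setdefault(dept, ["0"] * len(departments))[row] = "1"
bitmaps = {dept: "".join(bits) for dept, bits in bit_lists.items()}


def rows_age_greater_than(limit):
    """Row ids whose age is strictly greater than limit, via the sorted index."""
    return {row for _, row in age_index[bisect_right(age_keys, limit):]}


def rows_department_in(depts):
    """Row ids whose department is in depts, via the bitmaps."""
    rows = set()
    for dept in depts:
        rows |= {i for i, bit in enumerate(bitmaps.get(dept, "")) if bit == "1"}
    return rows


matching = sorted(rows_age_greater_than(29) & rows_department_in(["INF", "AI-Lab"]))
print([names[row] for row in matching])
```

Only the matching row ids are handed to the reader, which can skip every other row in the data file.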

OAP In BigSQL
DataFile
…  Name      Department  Age  …
…  …         …           …    …
…  John      INF         35   …
…  Michelle  AI-Lab      29   …
…  Amy       INF         42   …
…  Kim       AI-Lab      27   …
…  Mary      AI-Lab      47   …
…  …         …           …    …
InMemoryCache (Load Cache)
Department  Row Index in Data File
INF         2
AI-Lab      3
Age  Row Index in Data File
35   0
29   1
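A rough sketch of a column-granular (fiber) cache such as the one this slide suggests: entries are keyed by data file, row group and column, and loaded on first access. The LRU eviction policy, the `FiberCache` class and the loader callback are assumptions made for illustration; the slides do not state how OAP evicts fibers.

```python
from collections import OrderedDict


class FiberCache:
    """Keeps recently used column chunks (fibers) in memory."""

    def __init__(self, capacity, load_fiber):
        self.capacity = capacity      # maximum number of cached fibers
        self.load_fiber = load_fiber  # callback that reads a fiber from storage
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, file, row_group, column):
        key = (file, row_group, column)
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        values = self.load_fiber(file, row_group, column)
        self.entries[key] = values
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return values


# Hypothetical loader returning the sample columns from the slide.
columns = {"Age": [35, 29, 42, 27, 47],
           "Department": ["INF", "AI-Lab", "INF", "AI-Lab", "AI-Lab"]}
cache = FiberCache(capacity=2, load_fiber=lambda f, rg, col: columns[col])

cache.get("data-file", 0, "Age")
cache.get("data-file", 0, "Age")
print(cache.hits, cache.misses)
```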

BBS’s Contributions to Spark
• SPARK-4502
Spark SQL reads unnecessary nested fields from Parquet
• SPARK-18700
getCached in HiveMetastoreCatalog is not thread-safe, causing driver OOM
• SPARK-20408
Get glob path in parallel to reduce resolve relation time
• …

Future plans
• Compatible with more data formats
• Explicit cache and cache management
• Optimize SQL operators (join, aggregate) with index
• Integrate with structured streaming
• Utilize the latest hardware technology, such as Intel QAT or 3D XPoint.
• Welcome to contribute!
https://github.com/Intel-bigdata/OAP


Thank You.
daoyuan.wang@intel.com
liyuanjian@baidu.com
