http://kylin.io
Apache Kylin
Extreme OLAP Engine
Seshu Adunuthula
Director, Analytics Platform, eBay | sadunuthula@ebay.com
Agenda
• What’s Apache Kylin?
• Features
• Performance
• Roadmap
• Q & A
Extreme OLAP Engine for Big Data
Kylin is an open source distributed analytics engine from eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.
What’s Kylin
kylin / ˈkiːˈlɪn / 麒麟
n. (in Chinese art) a mythical animal of composite form
• Open sourced on Oct 1st, 2014
• Accepted as an Apache Incubator project on Nov 25th, 2014
Business Needs for Big Data Analysis
• Sub-second query latency on billions of rows
• ANSI SQL for both analysts and engineers
• Full OLAP capability to offer advanced functionality
• Seamless integration with BI tools
• Support for high cardinality and high dimensionality
• High concurrency – thousands of end users
• Distributed, scale-out architecture for large data volumes
Technical Challenges
• Huge data volumes: table scans
• Big table joins: data shuffling
• Analysis at different granularities: expensive runtime aggregation
• MapReduce jobs: batch processing
OLAP Cube – Balance between Space and Time
Cuboid lattice for four dimensions (time, item, location, supplier):
• 0-D (apex) cuboid: no dimensions (everything aggregated)
• 1-D cuboids: time; item; location; supplier
• 2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
• 3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
• 4-D (base) cuboid: (time, item, location, supplier)
• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
2. (9/15, milk, Urbana, *) - <time, item, location>
3. (*, milk, Urbana, *) - <item, location>
4. (*, milk, Chicago, *) - <item, location>
5. (*, milk, *, *) - <item>
• Cuboid = one combination of dimensions
• Cube = all combinations of dimensions (all cuboids)
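The cuboid/cube definitions above can be made concrete with a small sketch. It enumerates every cuboid for the slide's four dimensions and rolls a base cell up to an ancestor cell. The function names are illustrative assumptions, not part of Kylin.

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")


def enumerate_cuboids(dimensions):
    """Return {level: [cuboid, ...]}; a cuboid is a tuple of dimension names."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


def project_cell(base_cell, dimensions, cuboid):
    """Roll a base cell up to `cuboid`, replacing dropped dimensions with '*'."""
    return tuple(
        value if name in cuboid else "*"
        for name, value in zip(dimensions, base_cell)
    )


cuboids = enumerate_cuboids(DIMENSIONS)
total = sum(len(group) for group in cuboids.values())
for level, group in cuboids.items():
    print(f"{level}-D: {len(group)} cuboid(s)")
print("total cuboids:", total)  # 2 ** 4 combinations of four dimensions

base = ("9/15", "milk", "Urbana", "Dairy_land")
print(project_cell(base, DIMENSIONS, ("item", "location")))
```

With N dimensions the full cube has 2^N cuboids, which is the space side of the space/time balance in the slide title.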
Kylin Architecture Overview
[Architecture diagram; components and labels shown:]
• Clients: 3rd party apps (web app, mobile…) via the REST API; SQL-based tools (BI tools: Tableau…) via JDBC/ODBC
• REST Server
• Query Engine
• Routing: low latency (seconds) and mid latency (minutes) paths
• Metadata
• Cube Build Engine (MapReduce…)
• Hadoop / Hive: star schema data
• OLAP Cube (HBase): key-value data
• Two flows: online analysis data flow and offline data flow
• Clients/users interact with Kylin via SQL
• OLAP cube is transparent to users
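As a rough illustration of the "3rd party app via REST API" path, the sketch below builds (but does not send) an HTTP request carrying a SQL query. The endpoint path `/kylin/api/query` and the `sql`, `project`, and `limit` fields follow Kylin's REST API as I understand it from its public documentation; the deck does not state them, so verify them against the version you run. The host, port, credentials, table, and project name are placeholders.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, user, password, sql, project, limit=100):
    """Build (but do not send) an HTTP POST carrying a SQL query as JSON."""
    payload = {"sql": sql, "project": project, "limit": limit}
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/query",  # assumed endpoint, see note above
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_query_request(
    base_url="http://localhost:7070",  # hypothetical host and port
    user="ADMIN",                      # placeholder credentials
    password="KYLIN",
    sql="SELECT item, SUM(price) FROM sales GROUP BY item",  # hypothetical table
    project="demo_project",            # hypothetical project name
)
print(request.get_method(), request.full_url)

# To run the query against a live server, send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The client only ever sends SQL; whether the answer comes from the cube or from Hive is decided on the server side, which is what "OLAP cube is transparent to users" means.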
How Does Kylin Utilize Hadoop Components?
• Hive
  • Input source
  • Pre-joins the star schema during cube building
• MapReduce
  • Pre-aggregates metrics during cube building
• HDFS
  • Stores intermediate files during cube building
• HBase
  • Stores the data cube
  • Serves queries on the data cube
  • Coprocessors are used for query processing
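The division of labor above can be sketched in memory: pre-join a tiny star schema, pre-aggregate a measure for every cuboid, and store the result as key-value pairs. This is a toy model under stated assumptions, not Kylin's build engine: the tables, column names, and key layout are hypothetical, and a plain dict stands in for the key-value store.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical star schema: one fact table plus one lookup (dimension) table.
FACT_SALES = [
    {"date": "9/15", "item_id": 1, "city": "Urbana", "price": 3.0},
    {"date": "9/15", "item_id": 1, "city": "Chicago", "price": 4.0},
    {"date": "9/16", "item_id": 2, "city": "Urbana", "price": 5.0},
]
DIM_ITEM = {1: "milk", 2: "bread"}
DIMENSIONS = ("date", "item", "city")


def pre_join(fact_rows, item_lookup):
    """Flatten the star schema into wide rows (the role the slide gives to Hive)."""
    for row in fact_rows:
        yield {
            "date": row["date"],
            "item": item_lookup[row["item_id"]],
            "city": row["city"],
            "price": row["price"],
        }


def build_cube(flat_rows, dimensions):
    """Pre-aggregate SUM(price) for every cuboid (the role given to MapReduce).

    Returns a key-value map: (cuboid, dimension values) -> sum.
    """
    cuboids = [
        cuboid
        for k in range(len(dimensions) + 1)
        for cuboid in combinations(dimensions, k)
    ]
    cube = defaultdict(float)
    for row in flat_rows:
        for cuboid in cuboids:
            key = (cuboid, tuple(row[d] for d in cuboid))
            cube[key] += row["price"]
    return dict(cube)


cube = build_cube(pre_join(FACT_SALES, DIM_ITEM), DIMENSIONS)

# SUM(price) for item = 'milk' is now a key lookup instead of a table scan.
print(cube[(("item",), ("milk",))])
# The apex cuboid holds the grand total.
print(cube[((), ())])
```

The expensive work (join and aggregation) happens once at build time, so query time reduces to reading pre-computed keys.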
Features Highlights
• Extremely Fast OLAP Engine at Scale
  Kylin is designed to reduce query latency on Hadoop for datasets of more than 10 billion rows
• ANSI SQL Interface on Hadoop
  Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions
• Seamless Integration with BI Tools
  Kylin currently integrates with BI tools such as Tableau
• Interactive Query Capability
  Users can interact with Hadoop data via Kylin at sub-second latency, faster than Hive queries on the same dataset
• MOLAP Cube
  Users can define a data model and pre-build it in Kylin with more than 10 billion raw data records
Features Highlights…
• Compression and encoding support
• Incremental refresh of cubes
• Approximate query capability for distinct count (HyperLogLog)
• Leverages HBase coprocessors to reduce query latency
• Job management and monitoring
• Easy web interface to manage, build, monitor, and query cubes
• Security capability to set ACLs at the cube/project level
• LDAP integration support
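To show what approximate distinct counting with HyperLogLog involves, here is a minimal, self-contained sketch of the textbook algorithm. It is not Kylin's implementation: the class name, the SHA-1 hash, and the precision `p=12` are assumptions chosen for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog counter with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")   # 64-bit hash
        index = h >> (64 - self.p)              # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Union two sketches by taking the register-wise maximum."""
        if other.p != self.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


# Two overlapping sets of user ids; the true size of the union is 100,000.
day1, day2 = HyperLogLog(), HyperLogLog()
for user_id in range(0, 60_000):
    day1.add(user_id)
for user_id in range(40_000, 100_000):
    day2.add(user_id)
day1.merge(day2)
estimate = day1.count()
print("estimated distinct users:", estimate)
```

A sketch has a fixed size regardless of how many values it has seen, and two sketches merge without access to the raw data. Exact distinct counts lack that property, which is the usual reason pre-aggregated stores use an approximation here.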
Cube Designer
Job Management
Query and Visualization
Tableau Integration
Kylin vs. Hive
#  Query Type              Return Dataset  Query on Kylin (s)  Query on Hive (s)  Comments
1  High Level Aggregation  4               0.129               157.437            1,217 times
2  Analysis Query          22,669          1.615               109.206            68 times
3  Drill Down to Detail    325,029         12.058              113.123            9 times
4  Drill Down to Detail    524,780         22.42               6383.21            278 times
5  Data Dump               972,002         49.054              N/A
[Chart: Hive vs. Kylin for SQL #1, SQL #2, and SQL #3; axis from 0 to 200]
Query types shown: High Level Aggregation, Analysis Query, Drill Down to Detail, Low Level Aggregation, Transaction Level
Based on a case with 12+ billion records
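The Comments column reads as a speedup factor, Hive time divided by Kylin time. A short sketch recomputes it from the timings as printed. Rows 2 and 3 match the slide; rows 1 and 4 come out slightly higher than the stated 1,217 and 278, possibly because the slide's factors were derived from unrounded measurements (that explanation is a guess).

```python
# (query type, Kylin seconds, Hive seconds) as printed in the table above
TIMINGS = [
    ("High Level Aggregation", 0.129, 157.437),
    ("Analysis Query", 1.615, 109.206),
    ("Drill Down to Detail", 12.058, 113.123),
    ("Drill Down to Detail", 22.42, 6383.21),
]


def speedup(kylin_seconds, hive_seconds):
    """How many times faster Kylin answered than Hive."""
    return hive_seconds / kylin_seconds


ratios = [round(speedup(kylin, hive)) for _, kylin, hive in TIMINGS]
for (name, _, _), ratio in zip(TIMINGS, ratios):
    print(f"{name}: about {ratio} times faster")
```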
Performance - Query Latency
90th-percentile query latency: under 5 s
[Chart: green line = 90th-percentile queries; gray line = 95th-percentile queries]
Kylin Evolution Roadmap
Timeline: 2013 to 2015 (markers: Sep 2013; Jan 2014; Sep 2014; Q1 2015; TBD)
Initial Prototype for MOLAP
• Basic end-to-end POC
MOLAP
• Incremental Refresh
• ANSI SQL
• ODBC Driver
• Web GUI
• ACL
• Open Source
HOLAP
• Streaming OLAP
• JDBC Driver
• New UI
• Excel Support
• … more
Next Gen
• Automation
• Capacity Management
• In-Memory Analysis (TBD)
• Spark (TBD)
• … more
Future… (TBD)
Open Source
• Kylin Site: http://kylin.io
• Twitter: @ApacheKylin
• GitHub: apache/incubator-kylin
• WeChat (微信): ApacheKylin
ApacheKylin_HBaseCon2015
