Confidential © 2014 Actian Corporation
SQL + Hadoop: The High Performance
Advantage
Turn Hadoop into a High Performance Analytics Platform
Emma McGrattan, Actian
Jim Hare, Actian
8 July 2014
Agenda
1. Introduction
2. Hadoop Challenges
3. Actian Analytics Platform – Hadoop SQL Edition
4. Industrialized, High Performance SQL in Hadoop
5. Questions
All lines are muted
To ask a question, use the Chat or Q&A panel
A recording will be made available
We'll be running a few polling questions
Actian is Delivering Transformational Value
$140M Revenues + Profitable
10,000+ Customers
Global Presence: 8 worldwide offices, 24x7 multinational support model
“Actian is now very powerfully positioned in the big data and analytics markets.” Robin Bloor
“Actian has assembled all of the next generation IPs into a single analytics platform, allowing users a level of flexibility in data interaction that competitors have not been able to match.” siliconANGLE
Big Data Offers Significant Opportunities
Personalized Experience
New Products/Services
Reduce Risk
Predictive Analytics
Many Data Sources
Low Cost Storage
…But only for those who embrace it
Improve Decision-Making
Enter Hadoop as the Big Data Enabler
for Low Cost Storage
DW
Offload
Landing
Zone
Data Reservoir
?
But It isn’t Easy with Hadoop
Batch performance
Time to Value
Expensive Skills
Silo’d Data
Access
Data preparation
Hadoop Complexity Forcing Organizations
to Move Data in order to Analyze it
DW
Offload
Landing
Zone
Hadoop Data Reservoir
Data
Management
Analytics
Processing
Visualization
& Data
Science
Workbench
Result: duplicate storage & infrastructure costs, more IT
resources, network bandwidth usage, and complexity
Data
Transfer
CIOs Challenged by Big Data Costs
One in three CIOs pays
between 21 cents and 30 cents per
gigabyte per month.
Translation: it costs a company $3.12
million per year to store 500,000
gigabytes at an average cost of 26
cents per gigabyte per month.
Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html
-- CIO Insight
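The storage figure can be checked with quick arithmetic. The sketch below uses only the numbers quoted on this slide; the function name is illustrative. Working in whole cents avoids floating-point rounding. At 26 cents per gigabyte per month, 500,000 gigabytes comes to $1.56 million for one year, so the quoted $3.12 million matches two years at that rate (or one year for 1,000,000 gigabytes).

```python
def storage_cost_usd(gigabytes: int, cents_per_gb_month: int, months: int = 12) -> float:
    """Total storage cost in dollars for a given volume, rate, and duration."""
    total_cents = gigabytes * cents_per_gb_month * months
    return total_cents / 100


one_year = storage_cost_usd(500_000, 26)
two_years = storage_cost_usd(500_000, 26, months=24)

print(f"1 year:  ${one_year:,.0f}")
print(f"2 years: ${two_years:,.0f}")
```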
CIOs Challenged by Types of Big Data
73% of CIOs say up
to 50% of their data
will be unstructured
within two years.
Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html
-- CIO Insight
Instead, what if you could move the
analytic processing to the Hadoop data?
Data Science
Workbench
Analytic
Processing
Data
Management
… And transform Hadoop from a data lake into a high
performance, fully functional analytics platform
SQL User
Access
What is it?
Introducing the Actian Analytics Platform –
Hadoop SQL Edition
Patented X100 vector processing engine plus visual data and analytics
workflow, all running natively in Hadoop via YARN
Turns Hadoop into a High-Performance, Fully-Functional Analytics Database
How is this unique?
Highest performing, most industrialized SQL access to Hadoop data
Only end-to-end analytic processing natively in Hadoop
Most consumable, accessible, manageable Hadoop analytics
What does this mean to you?
Removes all barriers for business access to big data analytics
Enables SQL users with no constraints on Hadoop data
Accelerates time to value
The Industry’s Abuzz – about Actian!
“Deploying on Hadoop enables the Actian Analytics Platform to scale to massively
parallel scale without having to modify the underlying engine. For Actian, Hadoop
is a means to an end; it provides an opening for Actian to introduce a fast SQL
engine that operates at scale.”
Tony Baer, Principal Analyst, Software, Ovum
“Actian’s platform now makes Hadoop data repositories accessible to the entire
enterprise by empowering millions of business-savvy SQL users and business
analysts to conduct advanced analytics directly on data in the Hadoop
Distributed File System (HDFS). Companies investing in Hadoop now can
broaden the scope of data discovery, increase the accuracy of decisions, and
speed time to value.”
Daniel Gutierrez, Inside Big Data
“The latest version of the Actian Analytics Platform provides end-to-end analytic
processing natively in Hadoop. This will make the Hadoop Big Data framework
more accessible by offering high-performance ELT (extract, load and transform)
and SQL analytics on Hadoop with no need for MapReduce skills. This is a big
deal because data scientists with Hadoop skills are in short supply, while SQL
skills are relatively abundant.”
Libraries of Analytics
Hadoop
Connections to Access Any Data
Actian Analytics Platform – Hadoop SQL Edition
Visual Data and Analytic Workbench
High Performance
Data Flow Engine
Industrialized SQL
Analytics Database
Natively in Hadoop
Removes all barriers for business access to big data
analytics
Business
Processes
Users
Machines
Applications
Expansive Connectivity • Data Blending & Enrichment • Discovery • Data Science • Analytics • Operational BI
Enterprise Data
Machine Data
Social Data
Data Warehouse
SaaS Data
Amazon
Redshift
Actian Analytics Platform – Hadoop SQL Edition
Lightning fast and industrial strength
SQL in Hadoop – Up to 30X faster than
Impala
Full end-to-end analytic processing
platform - all native in Hadoop
Packaged with “real world” solution
blueprints
Visual Data Science & Analytics Workbench
• Drag/drop interface with 100’s of data prep and analytic functions
• Connect, blend, & enrich data and perform discovery & data science
• Build and test predictive models
• Running on top of a high performance data flow engine
• All natively within Hadoop via YARN
Ubiquitous Skills
■ 1 Million+ SQL Users
■ $ Lower cost
■ Easy to find, in most
companies
■ Embedded in the business
Specialty Skills
■ 150K MapReduce
Programmers
■ $$$ Expensive
■ 170K Shortage, hard to find
■ Separate from the business
Unleash millions of business-savvy SQL users
with no constraints on Hadoop data
Actian Analytics Platform™: Connect, Analyze, Act
Actian Analytics Platform = 25 Minutes
Visual workflow steps: Log Reader, Filter Rows, Group, k-Means, Load Vector
Coding MapReduce = 4 Weeks
The same steps (Log Reader, Filter Rows, Group, k-Means, Load Vector) plus an Avro Writer, each written as MapReduce code
Accelerate time to value and turn Hadoop data
into transformational value
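To make the workflow on this slide concrete, here is a minimal sketch of the same steps as plain Python functions chained together. It is not the Actian DataFlow API: the function names, the log format, the step order, and the in-memory `vector_table` that stands in for the load step are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical log lines in the form "user method status".
LOG_LINES = (
    ["alice GET 200", "bob GET 200", "bob GET 404"]
    + ["carol GET 200"] * 5
    + ["dave GET 200"] * 6
    + ["dave GET 500"]
)


def read_log(lines):
    """Log Reader: parse each line into a record."""
    for line in lines:
        user, method, status = line.split()
        yield {"user": user, "method": method, "status": int(status)}


def filter_rows(records):
    """Filter Rows: keep successful requests only."""
    return (r for r in records if r["status"] == 200)


def group_by_user(records):
    """Group: count requests per user."""
    return Counter(r["user"] for r in records)


def k_means_1d(values, centers, iterations=10):
    """k-Means: one-dimensional Lloyd iterations from the given starting centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


hits = group_by_user(filter_rows(read_log(LOG_LINES)))
values = list(hits.values())
centers = k_means_1d(values, centers=[min(values), max(values)])

# Load: a dict stands in for loading results into a database table.
vector_table = {"hits_per_user": dict(hits), "cluster_centers": centers}
print(vector_table)
```

Each step consumes the output of the previous one, which is the shape a visual workflow tool expresses with connected boxes.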
Vendor Approaches to “SQL on Hadoop”
“marketing jobs”: SQL Outside Hadoop
• Connector approach
• MPP DB, so 2 clusters are needed
• Expensive, hard to manage
“wrapped legacy”: Mature but non-Integrated
• Legacy engine (e.g. Postgres) + top layer
• Store data outside HDFS (local files)
• Separate Failover Management (tools)
“from scratch”: Integrated but Immature
• No trickle updates
• Immature/poor optimizers + engines
• I18N, security, workload mgmt, access control?
“SQL on Hadoop” Vendor Landscape
[Chart: vertical axis is Maturity (SQL support, ACID, reliability, security, connectivity, performance), rising to High; horizontal axis is Hadoop Integration, from Low to Native. Plotted: “marketing jobs”, “wrapped legacy”, “from scratch”, and Mature & Integrated.]
Actian Vector Hadoop Edition
Actian Analytics Platform – Hadoop SQL Edition
[Diagram: one NameNode and eight DataNodes, accessed through Standard SQL Interfaces]
Connect: Connect to any data via Actian DataConnect
Prepare: Prepare, enrich, and analyze any data with Actian DataFlow
Orchestrate: Manage dataflow across the entire analytic process
Running natively in Hadoop via YARN
6 POINTS OF INNOVATION:
Vector Processing
On Chip Cache
Fast Real-time Updates
Smart Compression
Storage Indexes
Multi-Core Parallelism
NEXT GENERATION DATABASE TECHNOLOGY:
Columnar
Compressed
Storage Indexes
Actian Vector – Unmatched Innovation
[Chart: time/cycles to process vs. data processed]
• DISK: millions of cycles, 40-400MB processed
• RAM: 150-250 cycles, 2-3GB processed
• CHIP: 2-20 cycles, 10GB processed
• Vector Processing: Single Instruction Multiple Data
• Exploiting Chip Cache: Process data on chip, not in RAM
• 2nd Gen Column Store: Limit I/O; efficient real time updates
• Smarter Compression: Maximize throughput; vectorized decompression
• Multi-core Parallelism: Maximize system resource utilization
• Storage Indexes: Quickly identify candidate data blocks; minimize IO
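Two of the ideas on this slide, storage indexes and processing data a vector at a time, can be sketched in a few lines. The code below is an illustration of the general techniques, not Actian Vector's implementation: the block size, the `Block` class, and the function names are assumptions. Each column block keeps its minimum and maximum value, a range query skips blocks whose min/max cannot match, and surviving blocks are scanned in one tight loop over the whole block. Pure Python does not emit SIMD instructions, so only the control flow is shown.

```python
from array import array
from dataclasses import dataclass

BLOCK_SIZE = 1024  # hypothetical number of values per column block


@dataclass
class Block:
    values: array   # one contiguous run of column values
    min_value: int  # storage index entry: smallest value in the block
    max_value: int  # storage index entry: largest value in the block


def build_blocks(column, size=BLOCK_SIZE):
    """Split a column into blocks and record min/max for each block."""
    blocks = []
    for start in range(0, len(column), size):
        chunk = array("q", column[start:start + size])
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_in_range(blocks, low, high):
    """COUNT(*) WHERE low <= value <= high. Returns (count, blocks_read)."""
    count = 0
    blocks_read = 0
    for block in blocks:
        # Storage index check: skip blocks that cannot contain a match.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        # Vector-at-a-time: one loop over the whole block, not one call per row.
        count += sum(1 for v in block.values if low <= v <= high)
    return count, blocks_read


column = list(range(100_000))  # sorted data, so min/max ranges do not overlap
blocks = build_blocks(column)
count, blocks_read = count_in_range(blocks, 5_000, 5_999)
print(f"matches={count}, blocks read={blocks_read} of {len(blocks)}")
```

With this sorted sample column, the query touches 2 of the 98 blocks. On unsorted data the min/max ranges overlap more and fewer blocks can be skipped.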
TPC-H 1TB – Faster, Less Hardware
Fastest TPC-H QphH@1TB Benchmark (non-clustered)
QphH by system:
Actian Vector 445,529
Actian Vector 436,788
SQL Server 219,888
Oracle 209,534
Oracle 201,487
SQL Server 173,962
Sybase IQ 164,747
Oracle 140,181
SQL Server 134,117
Publication dates, in the order listed on the slide: June ‘12, May ‘11, Aug ‘11, June ‘11, Sept ‘11, Apr ‘11, Dec ‘10, Apr ‘10, Dec ‘11
Hardware cost (excluding discounts), in the order listed on the slide: $57,146, $1,229,968, $460,869, $2,402,706, $753,392, $278,527, $85,621, $1,249,967, $258,880
Source: www.tpc.org
Actian Analytics Platform – Hadoop SQL Edition
Transform Hadoop into a High Performance Analytics Platform
[Architecture diagram: a Hadoop cluster (YARN, HDFS) with a NameNode and several DataNodes, each DataNode running an X100 engine on top of HDFS. Standard SQL Interfaces and the Visual Data & Analytics Workflow sit above the cluster.]
High Performance, Parallelized Data Flow Engine: Read, Load, Blend & Enrich, Data Science & Analytics
High Performance, Industrialized SQL Database: Actian Vector
HDFS
• Original file format
• Standard block replication
Vector
• Column-based blocks
• Compressed
• Partitioned
Replicated Vector
• >=3 Replicated Copies of Vector Blocks
• Leveraged to co-locate data with various join keys
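The idea of co-locating data by join key can be shown with a small sketch of hash partitioning. This is a generic illustration, not Actian's block placement logic: the node count, table layouts, and function names are assumptions. Rows from both tables are assigned to a node by hashing the join key, so every matching pair lands on the same node and each node can join its own partitions without moving data across the network.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Deterministically map a join key to a node."""
    return crc32(key.encode("utf-8")) % num_nodes


def partition(rows, key_index, num_nodes=NUM_NODES):
    """Distribute rows across nodes by hashing the join key column."""
    parts = defaultdict(list)
    for row in rows:
        parts[node_for(row[key_index], num_nodes)].append(row)
    return parts


def local_join(order_rows, customer_rows):
    """Join one node's orders with the same node's customers."""
    by_id = {c[0]: c for c in customer_rows}
    return [(o, by_id[o[1]]) for o in order_rows if o[1] in by_id]


customers = [("c1", "Ann"), ("c2", "Bob"), ("c3", "Cy")]
orders = [("o1", "c1", 10), ("o2", "c3", 5), ("o3", "c1", 7), ("o4", "c2", 2)]

customer_parts = partition(customers, key_index=0)
order_parts = partition(orders, key_index=1)

joined = []
for node in range(NUM_NODES):
    joined.extend(local_join(order_parts.get(node, []), customer_parts.get(node, [])))

print(f"{len(joined)} of {len(orders)} orders joined using node-local data only")
```

Keeping several replicated copies, each partitioned on a different key, would let more than one join benefit from this locality, which appears to be what the slide describes.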
History of the TPC-DS Comparison
TPC-DS Benchmark Components
Operational
Systems
Refresh Process Ad-hoc Reporting
Queries
User Queries
DSS Database
TPC-DS
Reports
Store
Web
Catalog
Inventory
Promotions
Set of Files
ETL
Actian Hadoop SQL Performance
“Impala Subset” of TPC-DS Queries at Scale Factor 3000 (3TB)
[Bar chart: speedup factor of Actian vs Impala, vertical axis 0 to 35, for queries Q3, Q7, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, Q98]
16x avg. speedup
Background to the “Impala Subset” of the TPC-DS benchmark can be found here:
http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
Both executed on the same hardware and software environment:
5-node cluster with 64GB of RAM per node and 12x2TB hard disks.
Most Industrialized SQL in Hadoop
Comprehensive – covers full analytic process: data blending & enrichment, discovery & data science, analytics & operational BI
Accessible – standard ANSI SQL to support standard BI tools; plus key advanced analytics including cube, grouping sets and windowing functions
Optimized – mature, proven planner and optimizer; optimal use of every node, CPU, memory, and cache
Secure – native DBMS security including authentication, user and role-based security, data protection, and encryption
Reliable – fully ACID-compliant with multi-version read consistency, plus system-wide failover protection
Manageable – resources managed automatically in Hadoop via YARN
Consumable – now usable by millions of users with every SQL tool and application on the planet
Scalable – unlimited expansion to handle extreme #s of users, nodes, data
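As an illustration of the windowing functions mentioned under "Accessible", the sketch below runs a standard SQL window query through Python's built-in `sqlite3` module. SQLite is only a stand-in here because it ships with Python (window functions need SQLite 3.25 or later); the table, columns, and data are made up, and nothing in this example is specific to Actian's engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 100), ("east", 2, 150), ("west", 1, 80), ("west", 2, 120)],
)

# Running total per region: a window function keeps every detail row
# while adding an aggregate computed over that row's partition.
query = """
SELECT region, month, amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM sales
ORDER BY region, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Queries of this shape are what BI tools typically generate for running totals and rankings, which is why window support matters for tool compatibility.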
Actian Director for Management
Actian Analytics Platform – Hadoop SQL Edition
Industrialized, High-Performance SQL in Hadoop
Only end-to-end analytic processing natively in Hadoop
Highest performing, most industrialized SQL in Hadoop
Removes all barriers for business access to big data analytics
Unleashes millions of business-savvy SQL users on Hadoop data
Outperforms Cloudera’s Impala by up to 30x
Actian transforms Hadoop from a data lake into a high-performance analytics platform.
Transform Hadoop – Transform your Business
• Get started today! www.actian.com/hadoop
• Pre-register for an evaluation copy of Actian’s SQL in Hadoop: bigdata.actian.com/sql-in-hadoop
• Register for a Sand Hill Hadoop Survey Results webinar on July 24, 2014: bigdata.actian.com/SandHill-Hadoop-Results

SQL + Hadoop: The High Performance Advantage�

  • 1. Confidential © 2014 Actian Corporation1 SQL + Hadoop: The High Performance Advantage Turn Hadoop into a High Performance Analytics Platform Emma McGrattan, Actian Jim Hare, Actian 8 July 2014
  • 2. Confidential © 2014 Actian Corporation2 1. Introduction 2. Hadoop Challenges 3. Actian Analytics Platform – Hadoop SQL Edition 4. Industrialized, High Performance SQL in Hadoop 5. Questions Agenda All lines are muted To ask a question, use Chat or Q&A panel Recording will be made available We‘ll be running a few polling questions
  • 3. Confidential © 2014 Actian Corporation3 $140M Revenues + Profitable 10,000+ Customers Global Presence: 8 world-wide offices, 7x 24 multinational support model 3 “Actian is now very powerfully positioned in the big data and analytics markets.” Robin Bloor Actian is Delivering Transformational Value “Actian has assembled all of the next generation IPs into a single analytics platform, allowing users a level of flexibility in data interaction that competitors have not been able to match.” siliconANGLE
  • 4. Confidential © 2014 Actian Corporation4 Big Data Offers Significant Opportunities Personalized Experience New Products/Services Reduce Risk Predictive Analytics Many Data Sources Low Cost Storage …But only for those who embrace it Improve Decision-Making
  • 5. Confidential © 2014 Actian Corporation5 Enter Hadoop as the Big Data Enabler for Low Cost Storage DW Offload Landing Zone Data Reservoir ?
  • 6. Confidential © 2014 Actian Corporation6 But It isn’t Easy with Hadoop Batch performance Time to Value Expensive Skills Silo’d Data Access Data preparation
  • 7. Confidential © 2014 Actian Corporation7 Hadoop Complexity Forcing Organizations to Move Data in order to Analyze it DW Offload Landing Zone Hadoop Data Reservoir Data Management Analytics Processing Visualization & Data Science Workbench Result: duplicate storage & infrastructure costs, more IT resources, network bandwidth usage, and complexity Data Transfer
  • 8. Confidential © 2014 Actian Corporation8 CIOs Challenged by Big Data Costs One in three CIOs pay between 21 cents to 30 cents per gigabyte a month. Translation: it costs a company $3.12 million per year to store 500,000 gigabytes at an average cost of 26 cents per gigabyte per month. Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html -- CIO Insight
  • 9. Confidential © 2014 Actian Corporation9 CIOs Challenged by Types of Big Data 73% of CIOs day up to 50% of their data will be unstructured within two years. Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html -- CIO Insight
  • 10. Confidential © 2014 Actian Corporation10 Instead, what if you could move the analytic processing to the Hadoop data? Data Science Workbench Analytic Processing Data Management … And transform Hadoop from a data lake into a high performance, fully functional analytics platform SQL User Access
  • 11. Confidential © 2014 Actian Corporation11 What is it? Introducing the Actian Analytics Platform – Hadoop SQL Edition Patented X100 vector processing engine plus visual data and analytics work flow, all running natively in Hadoop via YARN Turns Hadoop into a High-Performance, Fully-Functional Analytics Database How is this unique? Highest performing, most industrialized SQL access to Hadoop data Only end-to-end analytic processing natively in Hadoop Most consumable, accessible, manageable Hadoop analytics What does this mean to you? Removes all barriers for business access to big data analytics Enables SQL users with no constraints on Hadoop data Accelerates time to value
  • 12. Confidential © 2014 Actian Corporation12 The Industry’s Abuzz – about Actian! “Deploying on Hadoop enables the Actian Analytics Platform to scale to massively parallel scale without having to modify the underlying engine. For Actian, Hadoop is a means to an end; it provides an opening for Actian to introduce a fast SQL engine that operates at scale.” Tony Baer, Principal Analyst, Software, Ovum “Actian’s platform now makes Hadoop data repositories accessible to the entire enterprise by empowering millions of business-savvy SQL users and business analysts to conduct advanced analytics directly on data in the Hadoop Distributed File System (HDFS). Companies investing in Hadoop now can broaden the scope of data discovery, increase the accuracy of decisions, and speed time to value.” Daniel Gutierrez, Inside Big Data “The latest version of the Actian Analytics Platform provides end-to-end analytic processing natively in Hadoop. This will make the Hadoop Big Data framework more accessible by offering high-performance ELT (extract, load and transform) and SQL analytics on Hadoop with no need for MapReduce skills. This is a big deal because data scientists with Hadoop skills are in short supply, while SQL skills are relatively abundant.”
  • 13. Confidential © 2014 Actian Corporation13 Libraries of Analytics Hadoop Connections to Access Any Data Actian Analytics Platform – Hadoop SQL Edition Visual Data and Analytic Workbench High Performance Data Flow Engine Industrialized SQL Analytics Database Natively in Hadoop Removes all barriers for business access to big data analytics Business Processes Users Machines Applications Expansive Connectivity  Data Blending & Enrichment  Discovery  Data Science  Analytics  Operational BI Enterprise Data Machine Data Social Data Data Warehouse SaaS Data Amazon Redshift
  • 14. Confidential © 2014 Actian Corporation14 Actian Analytics Platform – Hadoop SQL Edition Lightning fast and industrial strength SQL in Hadoop – Up to 30X faster than Impala Full end-to-end analytic processing platform - all native in Hadoop Packaged with “real world” solution blueprints
  • 15. Confidential © 2014 Actian Corporation15 Visual Data Science & Analytics Workbench • Drag/drop interface with 100’s of data prep and analytic functions • Connect, blend, & enrich data and perform discovery & data science • Build and test predictive models • Running on top of a high performance data flow engine • All natively within Hadoop via YARN
  • 16. Confidential © 2014 Actian Corporation16 Ubiquitous Skills ■ 1 Million+ SQL Users ■ $ Lower cost ■ Easy to find, in most companies ■ Embedded in the business Specialty Skills ■ 150K MapReduce Programmers ■ $$$ Expensive ■ 170K Shortage, hard to find ■ Separate from the business Unleash millions of business-savvy, SQL users with no constraints on Hadoop data Actian Analytics PlatformTM Analyze ActConnect +
  • 17. Confidential © 2014 Actian Corporation17 Actian Analytics Platform = 25 Minutes Log Reader Filter Rows Group Load Vectork-Means Coding MapReduce = 4 Weeks Avro Writer MapReduce Code k-Means MapReduce Code Log Reader Filter Rows Group Load Vector MapReduce Code MapReduce Code MapReduce Code MapReduce Code Accelerate time to value and turn Hadoop data into transformational value
  • 18. Confidential © 2014 Actian Corporation18 Vendor Approaches to “SQL on Hadoop” “marketing jobs” “wrapped legacy” “from scratch” SQL Outside Hadoop • Connector approach • MPP DB  need 2 clusters • Expensive, hard to manage Mature but non-Integrated • Legacy engine (e.g. Postgres) + top layer • Store data outside HDFS (local files) • Separate Failover Management (tools) Integrated but Immature • No trickle updates • Immature/poor optimizers+engines • I18N, security, workload mgmt, access control?
  • 19. Confidential © 2014 Actian Corporation19 “wrapped legacy” “from scratch” Maturity (SQL support, ACID, reliability, security, connectivity, performance) Hadoop IntegrationLow Native High “marketing jobs” Mature & Integrated + + “SQL on Hadoop” Vendor Landscape
  • 20. Confidential © 2014 Actian Corporation20 Confidential © 2014 Actian Corporation 20 Actian Vector Hadoop Edition Actian Analytics Platform Hadoop SQL Edition Actian Analytics Platform NameNode DataNode DataNode DataNode DataNode DataNode DataNode DataNode DataNode Prepare Standard SQL Interfaces Orchestrate Connect Connect to any data via Actian DataConnect Manage dataflow across the entire analytic process 6 POINTS OF INNOVATION: Vector Processing On Chip Cache Fast Real-time Updates Smart Compression Storage Indexes Multi-Core Parallelism Running natively in Hadoop via YARN Prepare, enrich, and analyze any data with Actian DataFlow NEXT GENERATION DATABASE TECHNOLOGY:: Columnar Compressed Storage Indexes
  • 21. Confidential © 2014 Actian Corporation21 Actian Vector – Unmatched InnovationTime/CyclestoProcess Data Processed DISK RAM CHIP 10GB2-3GB40-400MB 2-20150-250Millions Vector Processing Single Instruction Multiple Data 2nd Gen Column Store Limit I/O Efficient real time updates Smarter Compression Maximize throughput Vectorized decompression Exploiting Chip Cache Process data on chip – not in RAM 1 2 3 4 Multi-core Parallelism Maximize system resource utilization… Storage Indexes Quickly identify candidate data blocks Minimize IO 5 6
  • 22. Confidential © 2014 Actian Corporation22 TPC-H 1TB – Faster, Less Hardware 0 100,000 200,000 300,000 400,000 Actian Vector 445,529 Actian Vector 436,788 SQL Server 219,888 Oracle 209,534 Oracle 201,487 SQL Server 173,962 Sybase IQ 164,747 Oracle 140,181 SQL Server 134,117 June ‘12 May ‘11 Aug ‘11 June ‘11 Sept ‘11 Apr ‘11 Dec ‘10 Apr ‘10 Dec ‘11 $57,146 $1,229,968 $460,869 $2,402,706 $753,392 $278,527 $85,621 $1,249,967 $258,880 Hardware Cost (excluding discounts)QphH Fastest TPC-H QphH@1TB Benchmark (non-clustered) Source: www.tpc.org /
  • 23. Confidential © 2014 Actian Corporation23 HADOOP YARN HDFS Standard SQL Interfaces DataNode HDFS Visual Data & Analytics Workflow Actian Analytics Platform – Hadoop SQL Edition Transform Hadoop into a High Performance Analytics Platform DataNode HDFS DataNode HDFS DataNode HDFS X100X100X100 Read Load Actian Vector Blend & Enrich Data Science & Analytics DataNode HDFS X100 HDFS Vector • Original file format • Standard block replication NameNode High Performance, Industrialized SQL Database High Performance, Parallelized Data Flow Engine • Column-based blocks • Compressed • Partitioned Replicated Vector • >=3 Replicated Copies of Vector Blocks • Leveraged to co- locate data with various join keys
  • 24. Confidential © 2014 Actian Corporation24 History of the TPC-DS Comparison
  • 25. Confidential © 2014 Actian Corporation25 Confidential © 2014 Actian Corporation 25 TPC-DS Benchmark Components Operational Systems Refresh Process Ad-hoc Reporting Queries User Queries DSS Database TPC-DS Reports Store Web Catalog Inventory Promotions Set of Files ETL
  • 26. Actian Hadoop SQL Performance. “Impala Subset” of TPC-DS Queries at Scale Factor 3000 (3TB): Speedup vs Impala (series: Impala, Actian); 16x avg. speedup. Chart: Speedup Factor (axis 0 to 35) per query for Q3, Q7, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, Q98. Background to the “Impala Subset” of the TPC-DS benchmark can be found here: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/ Both executed on the same hardware and software environment: 5-node cluster with 64GB of RAM per node and 12x2TB hard disks.
  • 27. Most Industrialized SQL in Hadoop. Comprehensive – covers the full analytic process: data blending & enrichment, discovery & data science, analytics & operational BI. Accessible – standard ANSI SQL to support standard BI tools, plus key advanced analytics including cube, grouping sets and windowing functions. Optimized – mature, proven planner and optimizer; optimal use of every node, CPU, memory, and cache. Secure – native DBMS security including authentication, user and role-based security, data protection, and encryption. Reliable – fully ACID-compliant with multi-version read consistency, plus system-wide failover protection. Manageable – resources managed automatically in Hadoop via YARN. Consumable – now usable by millions of users with every SQL tool and application on the planet. Scalable – unlimited expansion to handle extreme numbers of users, nodes, and data.
  • 28. Actian Director for Management
  • 29. Actian Analytics Platform – Hadoop SQL Edition: Industrialized, High-Performance SQL in Hadoop. Only end-to-end analytic processing natively in Hadoop. Highest performing, most industrialized SQL in Hadoop. Removes all barriers for business access to big data analytics. Unleashes millions of business-savvy SQL users on Hadoop data. Outperforms Cloudera’s Impala by up to 30x. Actian transforms Hadoop from a data lake into a high-performance analytics platform.
  • 30. Transform Hadoop – Transform your Business
  • 31. Get started today (www.actian.com/hadoop). Pre-register for an evaluation copy of Actian’s SQL in Hadoop (bigdata.actian.com/sql-in-hadoop). Register for a Sand Hill Hadoop Survey Results webinar on July 24, 2014 (bigdata.actian.com/SandHill-Hadoop-Results).

Editor's Notes

  • #7: But it isn’t easy. Changing your company is not easy. Give examples: you’ve just invested $1m in a data warehouse, but the business now wants to … It will now cost you tenfold.
  • #12: We are announcing Vector on Hadoop: industrial-strength SQL on Hadoop with atom-smashing speed never before seen in the industry. This is a core part of our Actian Analytics Platform – Hadoop SQL Edition. Let me tell you about it (details below) and show you a few things. What are we announcing? The highest performing, most industrialized SQL in Hadoop, which turns Hadoop into a high-performance, fully functional analytics database. Actian Analytics Platform – Hadoop SQL Edition includes our hardened (patented) X100 vector processing engine, combined with Actian’s visual data and analytics workflow, all running natively in Hadoop via YARN. How is this unique? Highest performing, most industrialized SQL access to Hadoop data. Only end-to-end analytic processing natively in Hadoop (covers the full analytics process: data blending & enrichment, discovery & data science, analytics & operational BI). Most consumable, accessible, manageable Hadoop analytics. What does this mean to our customers? Removes all barriers for business access to big data analytics. Unleashes millions of business-savvy SQL users, with no constraints, on Hadoop data to improve the accuracy of their analytical predictions and decision-making. Accelerates time to value and turns Hadoop data into transformational value: customer delight, competitive advantage, world-class risk management, disruptive business models.
  • #15: I’m going to show you three things: how fast it is, how easy it is to get started, and how it can be used in real-world scenarios.
  • #20: internationalization
  • #22: 1: We use vectorized processing to exploit modern CPU architecture. We execute one operation at a time on a vector of data, which allows for tight inner code loops without branching. This way, we can use SIMD instructions and, because of the lack of branching, make sure the CPU pipelines are not thrashed. A vector is typically 1024 rows of a single column, so it’s a manageable amount of data while the overhead per row is still negligible. 2: A vector will fit in the CPU cache together with the code for a particular operation, so all execution is in-cache. 3: To feed this engine with enough data, we’re also applying the vectorized paradigm to the storage subsystem. First of all, we’re using a column store, so only relevant columns are read from disk. Data is stored in blocks of typically 512 MB, and a single block contains only data from a single column (there are exceptions). Blocks of different columns can be interleaved per block, but typically more than one block of the same column is grouped. To keep the stable storage fast and defragmented, we use in-memory overlays to store updates to the data. These overlays are automatically flushed to stable storage when needed. 4: The blocks are stored compressed on disk. We’ve got a number of lightweight compression algorithms, and the most efficient one is chosen per block, depending on the data characteristics. The decompression takes place per vector and can be done in the CPU cache, which neatly ties in with our in-cache execution. We have a buffer manager that predicts which blocks are needed when, and makes sure no blocks that will be used in the near future are evicted from the buffer cache. 5: We have min-max indexes on the disk blocks, so when data is not completely random we can narrow down the ranges of blocks we need to read from disk, per column. All in all, the execution engine is able to do about 1.5GB/s per core, and high-end I/O subsystems are able to keep up with this.
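
The vector-at-a-time execution and min-max block skipping described in note #22 can be illustrated with a small Python sketch. This is not Actian's implementation: the `Block` class, the function names, and the sample data are hypothetical, and only the 1024-row vector size comes from the note. A Python list comprehension also does not use SIMD; it only mimics the "one operation over a whole vector" control flow.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

VECTOR_SIZE = 1024  # rows per vector, as stated in the note; all else is assumed


@dataclass
class Block:
    """Hypothetical storage block holding values of a single column."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Min-max index entry kept alongside the block.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def candidate_blocks(blocks: List[Block], low: int, high: int) -> Iterator[Block]:
    """Yield only blocks whose [min, max] range can overlap [low, high]."""
    for block in blocks:
        if block.max_value >= low and block.min_value <= high:
            yield block


def vectors(block: Block) -> Iterator[List[int]]:
    """Split a block into vectors of at most VECTOR_SIZE values."""
    for start in range(0, len(block.values), VECTOR_SIZE):
        yield block.values[start:start + VECTOR_SIZE]


def sum_in_range(blocks: List[Block], low: int, high: int) -> Tuple[int, int]:
    """Sum values in [low, high]; also report how many blocks were read."""
    total = 0
    blocks_read = 0
    for block in candidate_blocks(blocks, low, high):
        blocks_read += 1
        for vec in vectors(block):
            # One operation (filter, then sum) applied to a whole vector.
            selected = [v for v in vec if low <= v <= high]
            total += sum(selected)
    return total, blocks_read


# Sorted sample data, so the min-max ranges are selective.
data = list(range(10_000))
blocks = [Block(data[i:i + 2500]) for i in range(0, len(data), 2500)]
total, blocks_read = sum_in_range(blocks, 3000, 3999)
print(total, blocks_read)  # 3499500 1
```

With sorted data only one of the four blocks overlaps the requested range, so the other three are never touched. With randomly ordered data every block's min-max range would overlap and nothing would be skipped, which matches the note's caveat about data that is "not completely random".
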
  • #27: Execution: Subset of TPC-DS as chosen by Impala. Data size is 3TB (SF3000). Executed on the 5-node “rushcluster” in Austin. Both Impala and Vector numbers are on the same hardware. Comparison with Impala: Verified that Impala plans are sensible. Currently observed average speedup is 11x. Optimal query plans (manually written) give us a 16x speedup. These are real numbers! We executed manual plans directly. Changes in the cost model would get us to this performance. Performance improvements: Cost model changes will get us to a 16x speedup. Pipeline of query execution changes: well into H2; estimated to get us a 2x improvement. So, estimated speedup vs Impala would be ~30x (no guarantees). Planning to run TPC-H SF1000 and SF3000. With all planned improvements (end of the year) we should be able to beat the EXASOL cluster numbers.
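
The ~30x estimate in note #27 appears to come from compounding the two planned improvements. A minimal sketch of that arithmetic, assuming the gains multiply (the note does not state the model explicitly):

```python
# Figures quoted in the note; the multiplicative model is an assumption.
observed_speedup = 11      # current average speedup vs Impala
manual_plan_speedup = 16   # average speedup with manually written plans
pipeline_gain = 2          # estimated gain from execution pipeline changes

# Gain the cost-model changes would need to deliver on top of today's plans.
cost_model_gain = manual_plan_speedup / observed_speedup

# Projected speedup once both improvements land.
projected = manual_plan_speedup * pipeline_gain
print(round(cost_model_gain, 2), projected)  # 1.45 32
```

The product is 32x, which the note rounds down to "~30x" and explicitly flags as carrying no guarantees.
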
  • #30: What are we announcing? Actian Analytics Platform – Hadoop SQL Edition, the first offering that turns Hadoop into a fully functioning analytics platform. This new edition introduces the highest performing, most industrialized SQL in Hadoop, powered by our hardened (patented) X100 vector processing engine, combined with Actian’s visual data and analytics workflow, all running natively in Hadoop via YARN. How is this unique? Provides the only end-to-end analytic processing natively in Hadoop (covers the full analytics process: data blending & enrichment, discovery & data science, analytics & operational BI). Delivers the highest performing, most industrialized SQL access to Hadoop data. Makes the entire analytic process more consumable, easier to access, and easier to manage than on any other. What does this mean to our customers? Industrialized SQL in Hadoop removes all barriers for business access to big data analytics. Broad SQL access unleashes millions of business-savvy SQL users, with no constraints, on Hadoop data to improve the accuracy of their analytical predictions and decision-making. Turbocharged Hadoop analytics and SQL in Hadoop accelerate time to value and turn Hadoop data into transformational value: customer delight, competitive advantage, world-class risk management, disruptive business models.
  • #31: We want to partner with you to identify the most obvious places where big data analytics could be applied in your organization.