Big Data 2.0: ETL & Analytics
Implementing a next generation platform
Tyler Mitchell, Paul Dingman
Innovation Lab
January 2014
ACTIAN – PLATFORM FOR NEXT GENERATION ANALYTICS

Sources: Enterprise Applications, Data Warehouse, Social, Internet of Things, WWW, Machine Data, Mobile, Traditional, NoSQL, SaaS

Actian Analytics Platform: Connect, Analyze, Act
Accelerators
DataFlow, Matrix, Vector

Outcomes: Customer Delight, Competitive Advantage, World-Class Risk Management, Disruptive New Business Models
→ Rapid Time to Value
→ Unlimited Scale
→ Extreme Performance
→ Disruptive price/performance

→ Modern GUI Development
→ In-memory Analytics
→ Extends Hadoop and NoSQL analytics
→ Complements Traditional

→ 200+ data connectors
→ 600+ analytic functions
→ Full deployment choice
→ Certification with broad set of analytics tools
Actian Matrix for High Performance Analytics at Any Scale

Serve up high-performance analytic processing for any app

On-Demand Analytics
On-Demand Integration
Orchestration

Manage dataflows across the entire analytic process

Connect to any data source at the point of the query

700+ in-database analytic functions

Analytic Libraries
Optimizer

Massively Parallel
LEADER NODE

Columnar

5 LEVELS OF OPTIMIZATION:

Compressed
Compiled

SQL

Connected

Planning
Execution
Communications
Memory

Node-to-node, bidirectional sharing of analytics & processes with Hadoop nodes

Confidential © 2013 Actian Corporation
Actian DataFlow – High Speed Hadoop ETL, DQ, and Analytics, No Programming

Actian DataFlow

Transformation & Analytics Libraries
Choose from five sets of operators: Connections, Transformation, Data Quality, Analytics, Data Science

Visual Framework
Manage the entire analytic process in a visual framework with no coding required.
Reuse and share all components from operators to workflows

Optimize
Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop
Use dual pipeline parallelism to accelerate performance 10X
Query Pipelining
CPU Pipelining

Hadoop – Leader Node
Actian Accelerator for Hadoop
Take processing to where the data lives; runs natively on any Hadoop distribution
Run fully optimized processing directly on the Hadoop node or on any file system
Optimized, On-HDFS Processing
ACTIAN DATAFLOW – ETL & ANALYTICS

• Predefined operators
• Reduced IO
• In-memory operations
• Pipeline parallelism
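
Pipeline parallelism, the last item in the list above, can be illustrated with a minimal Python sketch. This is not DataFlow's implementation (the slides do not show one); the names `stage`, `run_pipeline` and `buffer_size` are hypothetical. Each stage runs in its own thread and hands records to the next stage through a small in-memory queue, so downstream stages start working before upstream stages finish and nothing is written to disk in between.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the record stream


def stage(func, inbox, outbox):
    """Start a thread that applies func to each item from inbox and forwards the result."""
    def run():
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # pass the end-of-stream marker downstream
                return
            outbox.put(func(item))

    worker = threading.Thread(target=run)
    worker.start()
    return worker


def run_pipeline(items, funcs, buffer_size=4):
    """Push items through funcs, one thread per stage, with bounded in-memory queues."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]
    workers = [stage(f, queues[i], queues[i + 1]) for i, f in enumerate(funcs)]

    def feed():
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    feeder.join()
    for worker in workers:
        worker.join()
    return results
```

For example, `run_pipeline(["a b", "c"], [str.upper, str.split])` upper-cases and tokenizes each record; while the second stage splits the first record, the first stage is already working on the next one. The bounded queues keep memory use flat when one stage is slower than its neighbours.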

Hadoop 2.0: what is the big deal?

YARN – a new resource scheduler!
“Yet Another Resource Negotiator”

JobTracker and TaskTracker responsibilities have been split up
to increase scalability
MapReduce removed from the core architecture

Now there is a
Operator Library – ETL/DQ
• Reading/Writing
• Text Processing
• Data Exploration
• Data Matching
• Aggregation
• Filtering
• Manipulation
Innovation Lab
Tactical mission:
• Driving platform integration

Strategic mission:
• Blueprint next-generation analytic apps & solution architectures
• Advance new science where data and algorithms intersect
• Solution demoware

Confidential © 2014 Actian Corporation

BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

DBMS – SMP/MPP
Time Series
Event
Logs

ETL & Analytics
Semantic Web

BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

EVERYTHING IS LOG DATA

• Application logs
• System monitoring
• Real-time feeds

Event
Logs

BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

TIME-ORDERED PERSISTENCE

DBMS – SMP/MPP
Time Series
Event
Logs

ETL & Analytics
Semantic Web

• Schema-less flexibility
• Extendable first-class citizens (Time, Location, Type)
• Universal accessibility
• Complete archive of raw events
BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

VARIABLE OUTPUT TARGETS

• DBMS – SMP/MPP: Traditional DW loading
• Time Series: Time window analysis
• ETL & Analytics: Load, analyze, re-feed
• Semantic Web: Patterns, graph traversal, visuals
BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

PUTTING A FACE TO THE NAME

Actian Matrix
OpenTSDB
Event
Logs

Dataflow Analytics
ACTIAN DATAFLOW

SPARQLverse

DATA LOADING

ACTIAN DATA CLOUD LOG FILE EXAMPLE

• 2013-08-01T03:38:42.236-0500
• [74.95.141.217, 10.120.245.3]
• User[id=2162,name=tmitchell]
• login
• 57509328
DATA LOADING

ACTIAN DATA CLOUD LOG FILE EXAMPLE

• 2013-08-01T03:38:42.236-0500 - Time
• [74.95.141.217, 10.120.245.3] - Space
• User[id=2162,name=tmitchell] - People
• login - Activity
• 57509328 - Magnitude
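
The annotated record above maps each field to one dimension (Time, Space, People, Activity, Magnitude). A minimal Python sketch of that mapping follows. It assumes the five fields sit on one line separated by spaces, which the slide does not confirm; `LINE_RE` and `parse_event` are hypothetical names, not part of any Actian tool.

```python
import re
from datetime import datetime

# Assumed layout: the five fields from the slide on a single line, space-separated.
LINE_RE = re.compile(
    r"(?P<time>\S+)\s+"
    r"\[(?P<space>[^\]]*)\]\s+"
    r"User\[id=(?P<user_id>\d+),name=(?P<user_name>[^\]]+)\]\s+"
    r"(?P<activity>\S+)\s+"
    r"(?P<magnitude>\d+)"
)


def parse_event(line):
    """Split one log line into the five dimensions named on the slide."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {
        "time": datetime.strptime(match["time"], "%Y-%m-%dT%H:%M:%S.%f%z"),
        "space": [ip.strip() for ip in match["space"].split(",")],
        "people": {"id": int(match["user_id"]), "name": match["user_name"]},
        "activity": match["activity"],
        "magnitude": int(match["magnitude"]),
    }
```

Keeping the timestamp timezone-aware preserves the `-0500` offset from the log, which matters once events from several sources are ordered by time.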
DATA LOADING

HBASE LOADER

Dataflow workflow built into
KNIME open source data mining app

DATA LOADING

HBASE STRUCTURED
Event Record
• hasSource – IP Address
• hasTime – timestamp
• hasValue – full source
• hasType – data cloud type
• hasLoadTimestamp – timestamp
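
The event record above can be pictured as a map of column names to values. The sketch below builds such a map in plain Python; it does not call HBase. The column names come from the slide, while `to_event_record`, `row_key` and the time-first key layout are assumptions made for illustration (the slides do not show the actual row key).

```python
import time


def to_event_record(raw_line, source_ip, event_time_ms, event_type, load_time_ms=None):
    """Build the column map for one event row, using the field names from the slide."""
    if load_time_ms is None:
        load_time_ms = int(time.time() * 1000)  # stamp the load time if not supplied
    return {
        "hasSource": source_ip,
        "hasTime": event_time_ms,
        "hasValue": raw_line,          # the full source line, kept verbatim
        "hasType": event_type,
        "hasLoadTimestamp": load_time_ms,
    }


def row_key(event_time_ms, source_ip):
    """Hypothetical key: zero-padded time first, so rows sort in time order."""
    return f"{event_time_ms:013d}|{source_ip}"
```

A time-first key makes time-window scans cheap, at the cost of concentrating writes on the newest key range; real deployments often add a prefix to spread the load.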
BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

PUTTING A FACE TO THE NAME

Actian Matrix
OpenTSDB
Event
Logs

Dataflow Analytics
SPARQLverse
HBASE TO OPENTSDB

An optimized HBase reader selects a time window
and dumps it to text files that are fed to OpenTSDB

Perpetual Load Service
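
Selecting a time window, as described above, is cheap when rows are stored in time order. The following Python sketch shows the idea on an in-memory list; it is not the Dataflow HBase reader, and `select_window` and `windows` are hypothetical names. A perpetual load service could call `select_window` repeatedly with consecutive windows.

```python
from bisect import bisect_left


def select_window(rows, start_ts, end_ts):
    """Return rows with start_ts <= timestamp < end_ts.

    rows is a list of (timestamp, payload) tuples already sorted by timestamp.
    """
    keys = [ts for ts, _ in rows]
    low = bisect_left(keys, start_ts)
    high = bisect_left(keys, end_ts)
    return rows[low:high]


def windows(first_ts, last_ts, width):
    """Yield consecutive half-open (start, end) windows covering [first_ts, last_ts)."""
    start = first_ts
    while start < last_ts:
        yield start, min(start + width, last_ts)
        start += width
```

Half-open windows mean that an event on a boundary is loaded exactly once, by the window that starts at its timestamp.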
EMIT TO OPENTSDB

event.glassfish - metric name
1390373743 - timestamp
38720912 - execution time
method=listUsers - method called
rowid=0548e8 - row ID
id=79 - user ID
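
The fields above follow the order OpenTSDB expects in its text import format: metric, timestamp, value, then one or more `tag=value` pairs on a single line. A minimal Python sketch that assembles such a line is shown below; `to_opentsdb_line` is a hypothetical helper, not part of OpenTSDB or Dataflow.

```python
def to_opentsdb_line(metric, timestamp, value, tags):
    """Format one data point as 'metric timestamp value tag=value ...'."""
    if not tags:
        # OpenTSDB requires every data point to carry at least one tag.
        raise ValueError("at least one tag is required")
    tag_text = " ".join(f"{key}={val}" for key, val in tags.items())
    return f"{metric} {timestamp} {value} {tag_text}"
```

With the values from the slide, `to_opentsdb_line("event.glassfish", 1390373743, 38720912, {"method": "listUsers", "rowid": "0548e8", "id": "79"})` produces one import line. Tags such as `rowid` take a different value for almost every event, which creates a very large number of distinct time series; that is worth keeping in mind when choosing tags.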
OPENTSDB UI

CUSTOM WEB VIZ

Built using:
• Autobahn Python Websockets
• OpenTSDB Web API
• D3 visualization
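
As a sketch of the OpenTSDB Web API piece, the Python below only builds a query URL; it makes no network call. It assumes the OpenTSDB 2.x HTTP endpoint `/api/query` with its `start` and `m` parameters, and the host and port are placeholders. The slides do not say which OpenTSDB version or endpoint the demo used, and `build_query_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def build_query_url(base_url, metric, start="1h-ago", aggregator="sum", tags=None):
    """Build an OpenTSDB 2.x /api/query URL for one metric (assumed endpoint)."""
    m = f"{aggregator}:{metric}"
    if tags:
        tag_text = ",".join(f"{key}={val}" for key, val in tags.items())
        m += "{" + tag_text + "}"
    return f"{base_url}/api/query?" + urlencode({"start": start, "m": m})
```

A websocket server could poll a URL like this and push each JSON response to the browser, where D3 redraws the chart.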
BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

PUTTING A FACE TO THE NAME

Actian Matrix
OpenTSDB
Event
Logs

Dataflow Analytics
SPARQLverse
Analytics Library

MACHINE LEARNING ON HBASE

Observe

Act!

BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

PUTTING A FACE TO THE NAME

Actian Matrix
OpenTSDB
Event
Logs

Dataflow Analytics
SPARQLverse

* aka SPARQLBase.com

DATA LOADING

RDF/SEMANTIC WEB LOADER

RDF/Triples Writer
Coming Soon
FROM LOG TO SPARQLVERSE

From
Single Record

TRIPLES EXAMPLE

Agent <produces> Record

Record <logsDataAbout> User
Client <isCalledBy> User

Client <requestsFrom> S

Server <repliesTo> Cli
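
The first three triples above can be serialized in N-Triples form, one `<subject> <predicate> <object> .` statement per line. The Python sketch below does that for a single record. The namespace `http://example.org/log#` and the function names are placeholders invented for this example, and the two triples that are cut off on the slide are left out rather than guessed.

```python
BASE = "http://example.org/log#"  # hypothetical namespace, not taken from the slides


def iri(name):
    """Wrap a local name in the placeholder namespace as an N-Triples IRI."""
    return f"<{BASE}{name}>"


def triple(subject, predicate, obj):
    """Format one statement in N-Triples syntax."""
    return f"{iri(subject)} {iri(predicate)} {iri(obj)} ."


def record_triples(agent, record, user, client):
    """Emit the three complete triples shown on the slide for one log record."""
    return [
        triple(agent, "produces", record),
        triple(record, "logsDataAbout", user),
        triple(client, "isCalledBy", user),
    ]
```

Each log record therefore expands into several small statements that share identifiers, which is what lets a graph store join them later.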

SAMPLE SPARQL QUERY
SELECT (COUNT(*) AS ?cntCalls) (SUM(?time) AS ?timeSum)
FROM <event>
WHERE {
  ?record :logsDataAbout ?client .
  ?user :initiates ?client .
  ?record :exectime ?time .
}
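
To make the join in the sample query concrete, here is a small Python sketch that evaluates the same three patterns over in-memory `(subject, predicate, object)` tuples. It is a plain nested-loop join written for illustration, not how SPARQLverse executes queries, and `count_and_sum` is a hypothetical name.

```python
def count_and_sum(triples):
    """Return (cntCalls, timeSum) for the three-pattern join in the sample query."""
    about = [(s, o) for s, p, o in triples if p == "logsDataAbout"]
    initiates = [(s, o) for s, p, o in triples if p == "initiates"]
    exectime = [(s, o) for s, p, o in triples if p == "exectime"]

    count, total = 0, 0
    for record, client in about:                 # ?record :logsDataAbout ?client
        for _user, initiated in initiates:       # ?user :initiates ?client
            if initiated != client:
                continue
            for timed_record, elapsed in exectime:   # ?record :exectime ?time
                if timed_record == record:
                    count += 1
                    total += elapsed
    return count, total
```

A record contributes to the result only when its client was initiated by some user and the record has an execution time; a client initiated by two users would count the record twice, exactly as the SPARQL join would.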
SAMPLE SPARQL QUERY

…
?record :logsDataAbout ?client .
?user :initiates ?client .
…

VISUALIZE DATA GRAPHS

Gephi desktop UI
- supports RDF import

D3 web UI example
BIG DATA 2.0 – “BOWTIE” ARCHITECTURE

PUTTING A FACE TO THE NAME

Actian Matrix
OpenTSDB
Event
Logs

Dataflow Analytics
SPARQLverse

• Actian Matrix: used behind Amazon Redshift
EXPORT TO MATRIX/MPP

HBASE TO MATRIX LOADER

Load Matrix MPP

FUTURE DIRECTION

• Real-time processing
• Semantic event processing
• Continued integration
THANK YOU
www.actian.com
facebook.com/actiancorp

Tyler.Mitchell@actian.com
Paul.Dingman@actian.com

@actiancorp




Editor's Notes

  • #5:
    Extreme Performance: Runs natively on Hadoop, so 500% faster than MapReduce
    Extreme Scale: Run on a laptop; scale out to n nodes on any file system
    Extreme Agility: ETL, DQ and analytics on Hadoop with no coding; move from any FS to any FS with no changes