© 2015 Autodesk
Building a Self-Service Big Data Pipeline
Charlie Crocker
Business Analytics Program Lead
Hadoop Summit, San Jose – June 2015
Multi-core & GPU
Cloud
Distributed Computing
Reality Capture
Model Sophistication
Variations Data
Compute
BIG DATA PIPELINE DETAILS
CONSISTENT TRUSTED ACCESSIBLE
INSTRUMENT COLLECT CONSUME PROCESS ORGANIZE
Production Big Data Pipeline Stats
• Core Services
• 360 Products/Services
• Desktop Products
• Operations Data
• 2.1 billion transactions/day
• 350 source types
• 750-800 GB indexed daily
• 165(+) active Users
• 800 Terabytes total
• 90 GB/day
• 350 S3 Aggregations
• 128 Tableau Desktop
• 57 Tableau Server
• 25 Datameer Users
• 10 Qlikview Dashboards
• 150 QV Users
• >80 GBQ Tables
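To put the daily totals above in perspective, the short sketch below converts them into average rates. It assumes an even spread over 24 hours and decimal units (1 GB = 1000 MB, 1 TB = 1000 GB); real traffic is likely bursty, so these are averages only, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(daily_total: float) -> float:
    """Average per-second rate for a quantity reported per day."""
    return daily_total / SECONDS_PER_DAY


# Figures taken from the stats slide.
transactions_per_day = 2.1e9
indexed_gb_per_day = (750, 800)
new_gb_per_day = 90

tx_rate = per_second(transactions_per_day)            # transactions per second
indexed_mb_rate = tuple(per_second(gb * 1000) for gb in indexed_gb_per_day)  # MB/s
yearly_growth_tb = new_gb_per_day * 365 / 1000        # TB added per year at 90 GB/day

print(f"{tx_rate:,.0f} transactions/s on average")
print(f"{indexed_mb_rate[0]:.1f} to {indexed_mb_rate[1]:.1f} MB/s indexed on average")
print(f"{yearly_growth_tb:.2f} TB/year at 90 GB/day")
```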
[Diagram: pipeline tiers, data sources, and components]
Monitoring & Discovery: Realtime; Products, Platform & Infrastructure; 1 month, indexed data
Data Gathering: All analytics & debug data; Raw service data; 1 week, raw data
Analytics & Reports: Batch oriented; Business, Product & Customer Behavior; 1 year (or more), aggregated & summarized data
Product/Business Analysis: Interactive & Focused; Any amount needed; Curated data
Sources: Web Services; Infrastructure (Hardware, Platforms, Network, Security); Business & Transactional (ODS, SAP, Subscription); Product Group Data; Other...
Components shown: Metrics, GA, Data Cube, Kafka, Unified Customer Profile, QlikView, ADSK Dashboard
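The retention windows named in the diagram (1 week of raw data, 1 month of indexed data, 1 year or more of aggregated data) can be expressed as a small policy table. This is a minimal sketch, not the deck's implementation: the tier keys, the 30-day month, and the 365-day floor for aggregated data are assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumed tier names; durations follow the slide, with "1 month" approximated
# as 30 days and "1 year (or more)" treated as a 365-day minimum.
RETENTION = {
    "raw": timedelta(weeks=1),
    "indexed": timedelta(days=30),
    "aggregated": timedelta(days=365),
}


def is_expired(tier: str, created: date, today: date) -> bool:
    """Return True when data in the given tier is older than its retention window."""
    return today - created > RETENTION[tier]
```

For example, `is_expired("raw", date(2015, 6, 1), date(2015, 6, 9))` is true because eight days exceed the one-week window, while the same data in the `"indexed"` tier would still be retained.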
Example: Specific Service Calls
Over 60 million/day
Example: Desktop Analytics Managed Source:
Trusted, Consistent, Accessible
3.1M Users/Wk
Production Big Data Pipeline
Teams Engage
Forward to Kafka
Apply Log Schema
Forward to Hadoop
Define Cubes
Deploy Cubes
Publish Data & Explore
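The "Apply Log Schema" step in the list above might look roughly like the sketch below: each raw log line is parsed as JSON, checked against a declared schema, and either accepted or set aside with a reason. The field names in `LOG_SCHEMA` and both function names are hypothetical; the deck does not show the actual schema or code, and the Kafka and Hadoop hops are left out entirely.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical schema: required field name -> expected Python type.
LOG_SCHEMA: Dict[str, type] = {
    "timestamp": str,
    "service": str,
    "event": str,
    "user_id": str,
}


def apply_schema(raw_line: str, schema: Dict[str, type] = LOG_SCHEMA) -> Dict[str, Any]:
    """Parse one JSON log line and keep only the fields the schema declares."""
    record = json.loads(raw_line)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(record, dict):
        raise ValueError("log line is not a JSON object")
    missing = [name for name in schema if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name, expected in schema.items():
        if not isinstance(record[name], expected):
            raise ValueError(f"field {name!r} is not of type {expected.__name__}")
    return {name: record[name] for name in schema}  # undeclared fields are dropped


def route(lines: Iterable[str]) -> Tuple[List[Dict[str, Any]], List[Tuple[str, str]]]:
    """Split lines into schema-conforming records and (line, reason) rejects."""
    accepted: List[Dict[str, Any]] = []
    rejected: List[Tuple[str, str]] = []
    for line in lines:
        try:
            accepted.append(apply_schema(line))
        except ValueError as exc:
            rejected.append((line, str(exc)))
    return accepted, rejected
```

Keeping rejects alongside their reasons, rather than dropping them silently, is one way a team onboarding a new source could see why its events are not arriving downstream.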
Production Big Data Pipeline
Teams Engage
Forward to Kafka
Apply Log Schema
Forward to Hadoop
Define Cubes
Deploy Cubes
Publish Data & Explore
SLOW DOWN (×4)
Production Big Data Pipeline
Teams Engage
Forward to Hadoop
Define Cubes
Deploy Cubes
Publish Data & Explore
SLOW DOWN (×2)
Forward to Kafka
Apply Log Schema
Onboard faster: Transition to Services
Production Big Data Pipeline
Teams Engage
Forward to Kafka
Apply Log Schema
Forward to Hadoop
Define Cubes
Deploy Cubes
Publish Data & Explore
Deliver value faster: Streamlined Access
Onboard faster: Transition to Services
TRANSITION TO SERVICES
Tools
. Fragmented Architecture
. Manual ingestion (Kafka)
. Dashboard POCs
Production Scaling
Services
. Architecture Alignment
. Managed Ingestion (CSE)
. ADSK Dashboard Framework
Build Services
. highly available
. secure
. massively scalable
. insanely high volume
. cloud ops infrastructure
Make Services Ridiculously Easy
. easy to consume sdks
. simple data contracts
. self service onboarding
. fault tolerant sdks
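One way to read "fault tolerant sdks" is a client that never throws into the host application, buffers events locally, and retries delivery with backoff. The sketch below illustrates that idea under stated assumptions: the class name `AnalyticsClient`, its parameters, and the injected `send` callable are all hypothetical, and nothing here reflects Autodesk's actual SDK.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class AnalyticsClient:
    """Buffers events and retries delivery so the host app is never blocked by failures."""

    def __init__(
        self,
        send: Callable[[Dict], None],          # transport; raises on failure (assumed)
        max_retries: int = 3,
        base_delay: float = 0.1,               # seconds before the first retry
        max_buffer: int = 1000,                # oldest events are dropped beyond this
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._send = send
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._sleep = sleep
        self._buffer: Deque[Dict] = deque(maxlen=max_buffer)

    @property
    def pending(self) -> int:
        """Number of events still waiting to be delivered."""
        return len(self._buffer)

    def track(self, event: Dict) -> None:
        """Queue an event; this only appends to the local buffer."""
        self._buffer.append(event)

    def flush(self) -> int:
        """Try to deliver buffered events in order; return how many were sent."""
        sent = 0
        while self._buffer:
            if not self._send_with_retry(self._buffer[0]):
                break  # keep this event and everything after it for a later flush
            self._buffer.popleft()
            sent += 1
        return sent

    def _send_with_retry(self, event: Dict) -> bool:
        for attempt in range(self._max_retries):
            try:
                self._send(event)
                return True
            except Exception:
                if attempt < self._max_retries - 1:
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    self._sleep(self._base_delay * 2 ** attempt)
        return False
```

Injecting `send` and `sleep` keeps the sketch independent of any particular transport and makes the retry behavior easy to exercise without real network calls or delays.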
Analytics as a Service
Fast Access Layer
Client SDKs
Data Portal
API Access
Cross Service Eventing
Metadata Management
Analytics Tools
Scoring Pipeline
Dashboard Framework
Other Services +
Scalable Compute
Workflow Management
Ingestion
Injection
Platform Services Detail
Desktop (Windows, Mac, Linux)
Mobile (iOS, Android, Windows)
Web (Chrome, Explorer, Safari, etc.)
Client MPA Service
Cloud Services
Explore/Publish: Datameer
API Access
Data Virtualization (EDW): Denodo
Batch Processing (Hive Cluster)
Fast Access: Google BigQuery, Redshift, Spark, QVD
Reporting: Tableau, Qlikview, Dashboards
Core Services
Traditional Data Warehouses
Back Office (SAP, Siebel, etc.)
Enterprise Data Lake: Storage (S3)
Query Processing (Hive Cluster)
CSE (Ingestion)
Injector
Govern
Enterprise Data Lake: Metadata
STREAMLINED DATA ACCESS
Analytics Consumers
Non-Technical Users
1000s
10s
Business Analyst
Data Analyst
Data Scientists
Analytics Ops
• Excel like
• Easy to access
• Medium to small data set
• Easy to display
• Easy to aggregate
• Handle large data
• Data visualization
• Integration with other tools
• Connection with other data source
• Handle unstructured data
• Combine data from multiple sources
Self-Service Explore, Aggregation and Publish
Non-technical users need to quickly explore,
create, and publish aggregations from the data lake
and visualize the results in their tool of choice.
One Source, Multiple Access Points
• Daily push to
  • S3 buckets and REST API
  • Google BigQuery or Redshift
• Access
  • Tableau Server (GBQ)
  • Qlikview (REST, QVDs)
  • ADSK Dashboards (S3)
  • Datameer (S3)
  • Hive (EMR and S3)
• Data Products
  • Early Warning System
  • Syndicated Video Wall
  • Executive Daily Reports
  • Personalized Product Experiences
From this
To this
Datameer: Big Data Analytics for Hadoop
Wizard-led Data Integration
• No ETL
• 70+ Connectors + plug-in API
• Smart Sampling
Point-and-click Analytics
• Spreadsheet UI
• 270+ pre-built functions
• Visual Data Profiling
Drag-and-Drop Visualization
• 30+ Visualization Widgets
• HTML5 support
• View on any device
Datameer: Create Standard Aggregations
• Parse JSON from S3
• Join to account data
• Process using EMR compute
• Output directly to S3
• Output directly to Tableau Server
A couple of hours instead of 5 weeks waiting for an engineering sprint
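The slide describes building this aggregation in Datameer's point-and-click UI. For readers who think in code, the same shape of work (parse JSON events, join to account data, aggregate) can be sketched in a few lines of plain Python. The field names `account_id`, `date`, `product`, and `segment` are invented for illustration, and the sketch runs in memory rather than against S3 or EMR.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Tuple


def aggregate_events(
    json_lines: Iterable[str],
    accounts: Dict[str, Dict[str, str]],
) -> "Counter[Tuple[str, str, str]]":
    """Count events per (date, account segment, product).

    Events whose account_id has no match in `accounts` are skipped,
    which behaves like an inner join.
    """
    counts: "Counter[Tuple[str, str, str]]" = Counter()
    for line in json_lines:
        event = json.loads(line)                      # parse step
        account = accounts.get(event["account_id"])   # join step
        if account is None:
            continue
        counts[(event["date"], account["segment"], event["product"])] += 1
    return counts


# Usage with made-up sample data.
sample_lines = [
    '{"account_id": "a1", "date": "2015-06-09", "product": "AutoCAD"}',
    '{"account_id": "a1", "date": "2015-06-09", "product": "AutoCAD"}',
    '{"account_id": "a2", "date": "2015-06-09", "product": "Revit"}',
    '{"account_id": "zz", "date": "2015-06-09", "product": "Revit"}',
]
sample_accounts = {
    "a1": {"segment": "enterprise"},
    "a2": {"segment": "education"},
}
result = aggregate_events(sample_lines, sample_accounts)
```

The resulting counter could then be written out as rows for whichever downstream tool consumes the aggregation.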
One Catalog
We’re Hiring!
Data Geeks (Scientists?)
Data Analysts
Data Engineers
Charlie.Crocker@autodesk.com
Autodesk is a registered trademark of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to
their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or
graphical errors that may appear in this document.
© 2015 Autodesk, Inc. All rights reserved.

More Related Content

PDF
Data Warehouse or Data Lake, Which Do I Choose?
PDF
DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...
PDF
Data Governance
PDF
Building a Logical Data Fabric using Data Virtualization (ASEAN)
PPTX
Developing a Data Strategy
PDF
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...
PPTX
Introduction to Data Engineering
PPTX
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
Data Warehouse or Data Lake, Which Do I Choose?
DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...
Data Governance
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Developing a Data Strategy
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...
Introduction to Data Engineering
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...

What's hot (20)

PPT
Data Lakehouse Symposium | Day 1 | Part 2
PDF
Building a Data Strategy – Practical Steps for Aligning with Business Goals
PPTX
Big data architectures and the data lake
PPTX
Talend Data Quality
PDF
Five Things to Consider About Data Mesh and Data Governance
PPTX
Data Governance Best Practices
PPTX
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
PDF
Review of Data Management Maturity Models
PPTX
The Data Driven University - Automating Data Governance and Stewardship in Au...
PDF
8 Steps to Creating a Data Strategy
PDF
Introducing Databricks Delta
PDF
You Need a Data Catalog. Do You Know Why?
PDF
Business-Architecture-Model-DAMA-Presentation.pdf
PPTX
Data Lakehouse, Data Mesh, and Data Fabric (r1)
PPTX
Building a modern data warehouse
PDF
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
PDF
Modern Data architecture Design
PDF
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
PPTX
Data Warehousing Trends, Best Practices, and Future Outlook
PDF
DMBOK and Data Governance
Data Lakehouse Symposium | Day 1 | Part 2
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Big data architectures and the data lake
Talend Data Quality
Five Things to Consider About Data Mesh and Data Governance
Data Governance Best Practices
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Review of Data Management Maturity Models
The Data Driven University - Automating Data Governance and Stewardship in Au...
8 Steps to Creating a Data Strategy
Introducing Databricks Delta
You Need a Data Catalog. Do You Know Why?
Business-Architecture-Model-DAMA-Presentation.pdf
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Building a modern data warehouse
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
Modern Data architecture Design
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Data Warehousing Trends, Best Practices, and Future Outlook
DMBOK and Data Governance
Ad

Viewers also liked (7)

PDF
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016
PPTX
AI For Enterprise
PPTX
Comparison of MPP Data Warehouse Platforms
PPTX
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
KEY
Intro to Data Science for Enterprise Big Data
PDF
2017 Digital Yearbook
PDF
Digital in 2017 Global Overview
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016
AI For Enterprise
Comparison of MPP Data Warehouse Platforms
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Intro to Data Science for Enterprise Big Data
2017 Digital Yearbook
Digital in 2017 Global Overview
Ad

Similar to Building a Self-Service Big Data Pipeline (20)

PPTX
Hadoop @ LifeWay
PDF
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
PDF
Horses for Courses: Database Roundtable
PDF
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
PPTX
Enterprise Cloud Data Platforms - with Microsoft Azure
PPSX
BI on Cloud - Perspective from SAP
PDF
Auckland SQLSaturday 2018 - Building a Modern Analytics Solution in the cloud...
PPTX
GIS Into to Cloud Microsoft Azure
PPTX
TIBCO Advanced Analytics Meetup (TAAM) November 2015
PPTX
Microsoft cloud big data strategy
PDF
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
PDF
From an experiment to a real production environment
PPTX
Big Data Analytics in the Cloud with Microsoft Azure
PDF
Big Data Ready Enterprise
PDF
SAP Business Data Cloud: Was die neue SAP-Lösung für Unternehmen und ihre Dat...
PDF
Accelerate Self-Service Analytics with Data Virtualization and Visualization
PDF
Drive Business Outcomes for Big Data Environments
PDF
MongoDB World 2019: re:Innovate from Siloed to Deep Insights on Your Data
PPTX
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
PDF
Wie Sie ungenutzte SAP BusinessObjects Lizenzen für die SAP Analytics Cloud n...
Hadoop @ LifeWay
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
Horses for Courses: Database Roundtable
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Enterprise Cloud Data Platforms - with Microsoft Azure
BI on Cloud - Perspective from SAP
Auckland SQLSaturday 2018 - Building a Modern Analytics Solution in the cloud...
GIS Into to Cloud Microsoft Azure
TIBCO Advanced Analytics Meetup (TAAM) November 2015
Microsoft cloud big data strategy
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
From an experiment to a real production environment
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Ready Enterprise
SAP Business Data Cloud: Was die neue SAP-Lösung für Unternehmen und ihre Dat...
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Drive Business Outcomes for Big Data Environments
MongoDB World 2019: re:Innovate from Siloed to Deep Insights on Your Data
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Wie Sie ungenutzte SAP BusinessObjects Lizenzen für die SAP Analytics Cloud n...

More from DataWorks Summit (20)

PPTX
Data Science Crash Course
PPTX
Floating on a RAFT: HBase Durability with Apache Ratis
PPTX
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
PPTX
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
PPTX
Managing the Dewey Decimal System
PPTX
Practical NoSQL: Accumulo's dirlist Example
PPTX
HBase Global Indexing to support large-scale data ingestion at Uber
PPTX
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
PPTX
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
PPTX
Supporting Apache HBase : Troubleshooting and Supportability Improvements
PPTX
Security Framework for Multitenant Architecture
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
PPTX
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
PPTX
Extending Twitter's Data Platform to Google Cloud
PPTX
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
PPTX
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
PPTX
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
PDF
Computer Vision: Coming to a Store Near You
PPTX
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Data Science Crash Course
Floating on a RAFT: HBase Durability with Apache Ratis
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
HBase Tales From the Trenches - Short stories about most common HBase operati...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Managing the Dewey Decimal System
Practical NoSQL: Accumulo's dirlist Example
HBase Global Indexing to support large-scale data ingestion at Uber
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Security Framework for Multitenant Architecture
Presto: Optimizing Performance of SQL-on-Anything Engine
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Extending Twitter's Data Platform to Google Cloud
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Computer Vision: Coming to a Store Near You
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark


Building a Self-Service Big Data Pipeline

  • 1. © 2015 Autodesk Building a Self-Service Big Data Pipeline Charlie Crocker Business Analytics Program Lead Hadoop Summit, San Jose – June 2015
  • 3. Multi-core & GPU, Cloud, Distributed Computing, Reality Capture, Model Sophistication, Variations, Data, Compute
  • 5. BIG DATA PIPELINE DETAILS
  • 6. CONSISTENT, TRUSTED, ACCESSIBLE. INSTRUMENT, COLLECT, CONSUME, PROCESS, ORGANIZE
  • 7. Production Big Data Pipeline Stats: Core Services; 360 Products/Services; Desktop Products; Operations Data; 2.1 billion transactions/day; 350 source types; 750-800 GB indexed daily; 165(+) active users; 800 terabytes total; 90 GB/day; 350 S3 aggregations; 128 Tableau Desktop; 57 Tableau Server; 25 Datameer users; 10 Qlikview dashboards; 150 QV users; >80 GBQ tables. [Pipeline diagram] Sources and systems: Web Services, Infrastructure, Hardware, Platforms, Network, Security, Business & Transactional, ODS, SAP, Subscription, Product Group Data, Other... Tiers: Data Gathering (all analytics & debug data, raw service data; 1 week raw data); Monitoring & Discovery (realtime; products, platform & infrastructure; 1 month indexed data); Analytics & Reports (batch oriented; business, product & customer behavior; 1 year (or more) aggregated & summarized data); Product/Business Analysis (interactive & focused; any amount needed). Other labels: Metrics, GA, Data Cube, Kafka, Unified Customer Profile, QlikView, Curated Data, ADSK Dashboard.
  • 8. Example: Specific Service Calls. Over 60 million/day
  • 10. Example: Desktop Analytics. Managed Source: Trusted, Consistent, Accessible. 3.1M Users/Wk
  • 11. Production Big Data Pipeline: Teams Engage → Forward to Kafka → Apply Log Schema → Forward to Hadoop → Define Cubes → Deploy Cubes → Publish Data & Explore. (Shown over the pipeline diagram from slide 7.)
  • 12. Production Big Data Pipeline: Teams Engage → Forward to Kafka → Apply Log Schema → Forward to Hadoop → Define Cubes → Deploy Cubes → Publish Data & Explore, with four "SLOW DOWN" markers along the steps. (Shown over the pipeline diagram from slide 7.)
  • 13. Production Big Data Pipeline: Teams Engage → Forward to Hadoop → Define Cubes → Deploy Cubes → Publish Data & Explore, with two "SLOW DOWN" markers remaining. Forward to Kafka and Apply Log Schema appear under "Onboard faster: Transition to Services". (Shown over the pipeline diagram from slide 7.)
  • 14. Production Big Data Pipeline: Teams Engage → Forward to Kafka → Apply Log Schema → Forward to Hadoop → Define Cubes → Deploy Cubes → Publish Data & Explore. Deliver value faster: Streamlined Access. Onboard faster: Transition to Services. (Shown over the pipeline diagram from slide 7.)
  • 16. Tools: Fragmented Architecture; Manual Ingestion (Kafka); Dashboard POCs. Production Scaling. Services: Architecture Alignment; Managed Ingestion (CSE); ADSK Dashboard Framework.
  • 17. Build Services: highly available; secure; massively scalable; insanely high volume; cloud ops infrastructure
  • 18. Make Services Ridiculously Easy: easy-to-consume SDKs; simple data contracts; self-service onboarding; fault-tolerant SDKs
  • 19. Fast Access Layer; Client SDKs; Data Portal; Analytics as a Service; API Access; Cross Service Eventing; Metadata Management; Analytics Tools; Scoring Pipeline; Dashboard Framework; Other Services; Scalable Compute; Workflow Management; Ingestion; Injection
  • 20. Platform Services Detail. Client: Desktop (Windows, Mac, Linux); Mobile (iOS, Android, Windows); Web (Chrome, Explorer, Safari, etc.). MPA Service. Cloud Services. Explore/Publish: Datameer. API Access. Data Virtualization (EDW): Denodo. Batch Processing (Hive Cluster). Fast Access: Google BigQuery, Redshift, Spark, QVD. Reporting: Tableau, Qlikview, Dashboards. Core Services. Traditional Data Warehouses. Back Office (SAP, Siebel, etc.). Enterprise Data Lake: Storage (S3). Query Processing (Hive Cluster). CSE (Ingestion). Injector. Govern. Enterprise Data Lake: Metadata.
  • 22. Analytics Consumers, ranging from 1000s to 10s of users: Non-Technical Users, Business Analyst, Data Analyst, Data Scientists, Analytics Ops. Needs: Excel-like; easy to access; medium to small data sets; easy to display; easy to aggregate; handle large data; data visualization; integration with other tools; connection with other data sources; handle unstructured data; combine data from multiple sources.
  • 23. Self-Service Explore, Aggregation and Publish: Non-technical users need to quickly explore, create, and publish aggregations from the data lake and visualize the results in their tool of choice. (Shown over the pipeline diagram from slide 7.)
  • 24. One Source, Multiple Access Points. Daily push to: S3 buckets and REST API; Google BigQuery or Redshift. Access: Tableau Server (GBQ); Qlikview (REST, QVDs); ADSK Dashboards (S3); Datameer (S3); Hive (EMR and S3). Data Products: Early Warning System; Syndicated Video Wall; Executive Daily Reports; Personalized Product Experiences. (Shown over the Analytics & Reports and Product/Business Analysis portion of the pipeline diagram from slide 7.)
  • 27. Datameer: Big Data Analytics for Hadoop. Wizard-led Data Integration: no ETL; 70+ connectors + plug-in API; smart sampling. Point-and-click Analytics: spreadsheet UI; 270+ pre-built functions; visual data profiling. Drag-and-Drop Visualization: 30+ visualization widgets; HTML5 support; view on any device.
  • 28. Datameer: Create Standard Aggregations. Parse JSON from S3; join to account data; process using EMR compute; output directly to S3; output directly to Tableau Server. A couple of hours instead of 5 weeks waiting for an engineering sprint.
  • 31. CONSISTENT, TRUSTED, ACCESSIBLE. INSTRUMENT, COLLECT, CONSUME, PROCESS, ORGANIZE
  • 32. We’re Hiring! Data Geeks (Scientists?), Data Analysts, Data Engineers. Charlie.Crocker@autodesk.com
  • 33. Autodesk is a registered trademark of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document. © 2015 Autodesk, Inc. All rights reserved.
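
Slide 28 lists the steps of a standard aggregation (parse JSON from S3, join to account data, write the result out), which the talk performs in Datameer's point-and-click UI. As a rough illustration only, the same shape of work can be sketched in plain Python. Everything specific here is an assumption made for the example and does not come from the slides: the event fields (`account_id`, `product`), the account table, and the in-memory strings standing in for S3 objects.

```python
import json
from collections import Counter

# Hypothetical sample data; in the talk the raw events live in S3 as JSON.
RAW_EVENTS = """\
{"account_id": "a1", "product": "ProductA"}
{"account_id": "a2", "product": "ProductA"}
{"account_id": "a1", "product": "ProductB"}
{"account_id": "zz", "product": "ProductA"}
"""

# Hypothetical account data to join against (account_id -> region).
ACCOUNTS = {"a1": "EMEA", "a2": "AMER"}


def parse_events(raw):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def aggregate(events, accounts):
    """Inner-join events to accounts and count events per (region, product)."""
    counts = Counter()
    for event in events:
        region = accounts.get(event["account_id"])
        if region is None:
            continue  # no matching account: dropped, as in an inner join
        counts[(region, event["product"])] += 1
    return counts


def to_rows(counts):
    """Flatten to sorted rows, ready to write out as a daily aggregation."""
    return [
        {"region": region, "product": product, "events": n}
        for (region, product), n in sorted(counts.items())
    ]


rows = to_rows(aggregate(parse_events(RAW_EVENTS), ACCOUNTS))
```

In this sketch the event with the unknown account `zz` is dropped by the join, so `rows` holds three aggregated records. A real pipeline would read from and write to S3 and run on cluster compute rather than in a single process.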

Editor's Notes

  • #8: Ingest: 2.1 billion events/day; 200(+) data sources flowing into production Splunk; 600-650 GB indexed daily. Process: 800 terabytes of active data in the Enterprise Data Lake (EDL); 90 GB/day entering the EDL; 300 S3 aggregations updating daily, replicated in 50 GBQ tables. Consume: >100 Tableau Desktop and 50 Tableau Server users; feeding 8 Qlikview dashboards with 150 active QV users; feeding the Early Warning System.
  • #11: Global reach. Fast ramp-up. Managed data source. What does it mean to be managed? Owner; pipeline; metadata; stays current.
  • #13: Interactive & Focused
  • #14: Interactive & Focused
  • #20: API access, workflow management (Oozie), support for data streaming and machine learning.
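
For a sense of scale, the daily figures quoted in note #8 can be converted to average per-second rates. This is simple arithmetic on two of the quoted numbers (2.1 billion events per day, 90 GB per day entering the EDL), assuming decimal gigabytes and an even spread over 24 hours. Real traffic is likely burstier, so peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 2.1e9    # 2.1 billion events/day, from note #8
edl_bytes_per_day = 90e9  # 90 GB/day entering the EDL, assuming 1 GB = 10**9 bytes

# Averages only: assumes load is spread evenly across the day.
avg_events_per_sec = events_per_day / SECONDS_PER_DAY
avg_edl_mb_per_sec = edl_bytes_per_day / SECONDS_PER_DAY / 1e6

print(f"average ingest: {avg_events_per_sec:,.0f} events/s")
print(f"average EDL inflow: {avg_edl_mb_per_sec:.2f} MB/s")
```

That works out to roughly 24,000 events per second on average, with about 1 MB/s flowing into the data lake.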