© 2016 Continuum Analytics - Confidential & Proprietary
Interactive Visual Statistics
on Massive Datasets
Peter Wang
CTO, Co-founder Continuum Analytics
@pwang
AGENDA
• Introductions
• Company Overview
• Goals of Analytics and IT teams
• Why Python for Data Science
• Anaconda - Making Python Better for Data Science
• Package Management
• Cluster Environment Management
• Notebook Computing
• Demonstrations
• Q&A / Next steps
What’s the Problem?
Big data magnifies small problems
• Of course, big data presents storage and computation problems
• More importantly, standard plotting tools have problems that are magnified by big data:
• Overdrawing/Overplotting
• Saturation
• Undersaturation
• Binning issues
• We’ll first explain these problems, and then present a new technique called datashading to address them head-on.
Overdrawing
• For a scatterplot, the order in which points are drawn is very important
• The same distribution can look entirely different depending on plotting order
• The data plotted last overplots everything drawn before it
Overdrawing
• The underlying issue is just occlusion
• The same problem happens with one category, but it is less obvious
• Occlusion can be prevented by using transparency
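The occlusion effect described above can be reproduced without any plotting library. The following is a minimal sketch, assuming an opaque painter's-style renderer on a tiny text canvas; the function name `draw_opaque` and the sample points are hypothetical and not taken from the talk.

```python
def draw_opaque(layers, width, height, background="."):
    """Paint each layer in order; later points overwrite earlier ones."""
    canvas = [[background] * width for _ in range(height)]
    for symbol, points in layers:
        for x, y in points:
            canvas[y][x] = symbol  # opaque: whatever was here is lost
    return ["".join(row) for row in canvas]


# Two hypothetical categories that overlap on the middle pixels.
blue = ("B", [(1, 0), (2, 0), (3, 0), (2, 1)])
red = ("R", [(2, 0), (3, 0), (4, 0), (2, 1)])

blue_then_red = draw_opaque([blue, red], width=6, height=2)
red_then_blue = draw_opaque([red, blue], width=6, height=2)

print(blue_then_red)  # red hides blue wherever the two overlap
print(red_then_blue)  # same data, opposite visual impression
```

Both calls receive exactly the same points; only the drawing order differs, and the overlapping pixels take on whichever category was drawn last.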
Saturation
• E.g. for alpha = 0.1, up to 10 points can overlap before saturating the available brightness
• Now the order of plotting matters less
• After 10 points, the first-plotted data is still lost
• For one category, 10, 20, or 2000 overlapping points will look identical
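The arithmetic behind the alpha = 0.1 example can be checked directly. This sketch assumes two simple blending models that the slides do not specify: additive blending, where each point adds alpha to the pixel until it clips at 1.0, and standard "over" compositing, where the accumulated opacity is 1 - (1 - alpha)^n. All function names are illustrative.

```python
def additive_coverage(alpha, n):
    """Brightness after n overlapping points when each adds alpha (clipped)."""
    return min(1.0, n * alpha)


def over_coverage(alpha, n):
    """Accumulated opacity after n overlapping points under 'over' compositing."""
    return 1.0 - (1.0 - alpha) ** n


def first_point_weight(alpha, n):
    """Weight of the first-plotted point in the final pixel under 'over'."""
    return alpha * (1.0 - alpha) ** (n - 1)


for n in (1, 10, 20, 2000):
    print(n,
          additive_coverage(0.1, n),
          round(over_coverage(0.1, n), 4),
          first_point_weight(0.1, n))
```

Under the additive assumption, 10, 20, and 2000 overlapping points all clip to the same value, which matches the point that they look identical. Under "over" compositing the opacity only approaches 1, but the weight of the first-plotted point shrinks geometrically, so early data is still effectively lost.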
Saturation
• Same alpha value, more points:
• The plot is now highly misleading
• The right alpha value depends on the size and overlap of the dataset
• A difficult-to-set parameter; it is hard to know when the data is misrepresented
Saturation
• Can try to reduce point size to reduce overplotting and saturation
• Now points are hard to see, with no guarantee of avoiding problems
• Another difficult-to-set parameter
• For really big data, scatterplots start to become very inefficient, because there are many datapoints per pixel; you may as well be binning by pixel
Binning issues
• Can use a heatmap instead of a scatterplot
• Avoids saturation by auto-ranging on bins
• Result is independent of data size
• Here, two merged normal distributions look very different at different bin sizes
• Another difficult-to-set parameter
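The sensitivity to bin size can be explored with a one-dimensional version of the same idea. This is a rough sketch, assuming hypothetical samples drawn from two normal distributions with made-up parameters; `histogram` and `auto_range` are illustrative helpers, not part of any library mentioned in the talk.

```python
import random


def histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins spanning [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts


def auto_range(counts):
    """Scale counts to 0..1 by the fullest bin, so no bin can saturate."""
    peak = max(counts)
    return [c / peak if peak else 0.0 for c in counts]


random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical data: two overlapping normal distributions merged together.
samples = [random.gauss(-1.0, 1.0) for _ in range(5000)]
samples += [random.gauss(1.5, 0.5) for _ in range(5000)]

for n_bins in (3, 10, 30):
    shades = auto_range(histogram(samples, n_bins, lo=-5.0, hi=5.0))
    print(n_bins, " ".join(f"{s:.2f}" for s in shades))
```

Because each row is scaled by its own fullest bin, the output never saturates regardless of sample count, but the apparent shape of the merged distributions still changes with the number of bins.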
Plotting big data
• When exploring really big data, the visualization is all you have; there’s no way to look at each of the individual data points
• Common plotting problems can lead to completely incorrect conclusions based on misleading visualizations
• Slow processing makes a trial-and-error approach ineffective
When data is large, you don’t know when the viz is lying.
Datashading
• Flexible, configurable pipeline for automatic plotting
• Provides flexible plugins for viz stages, like in graphics shaders
• Completely prevents overplotting, saturation, and undersaturation
• Mitigates binning issues by providing fully interactive exploration in web browsers, even of very large datasets on ordinary machines
• Statistical transformations of data are a first-class aspect of the visualization
• Allows rapid iteration of visual styles & configs, interactive selections and filtering, to support data exploration
Datashading Pipeline: Projection
Data → Project / Synthesize → Scene
• Stage 1: Select variables (columns) to project onto the screen
• Data is often filtered at this stage
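A projection stage of this kind might look like the sketch below. It is plain Python rather than the actual datashading implementation; the `project` function, the record layout, and the column names are all hypothetical.

```python
def project(rows, x_col, y_col, keep=lambda row: True):
    """Stage 1: pick two columns to use as screen axes, filtering rows."""
    return [(row[x_col], row[y_col]) for row in rows if keep(row)]


# Hypothetical records; the column names are made up for illustration.
trips = [
    {"pickup_x": 2.0, "pickup_y": 1.0, "passengers": 1},
    {"pickup_x": 3.5, "pickup_y": 4.0, "passengers": 0},
    {"pickup_x": 0.5, "pickup_y": 2.5, "passengers": 3},
]

# Select the two columns to plot and drop rows with no passengers.
scene = project(trips, "pickup_x", "pickup_y",
                keep=lambda row: row["passengers"] > 0)
print(scene)
```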
Datashading Pipeline: Aggregation
Data → Project / Synthesize → Scene → Sample / Raster → Aggregates
• Stage 2: Aggregate data into a fixed set of bins
• Each bin yields one or more scalars (total count, mean, stddev, etc.)
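The aggregation stage can be sketched as binning points into a fixed grid and reducing each bin to scalars. This is an illustrative pure-Python version under assumed names (`aggregate`, `x_range`, `y_range`), not the talk's implementation; it computes a count and a mean per bin.

```python
def aggregate(points, width, height, x_range, y_range):
    """Stage 2: bin (x, y, value) points into a width x height grid.

    Returns two grids: the per-bin count and the per-bin mean of value
    (None for empty bins).
    """
    (x0, x1), (y0, y1) = x_range, y_range
    counts = [[0] * width for _ in range(height)]
    sums = [[0.0] * width for _ in range(height)]
    for x, y, value in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the viewport
        # Clamp so points on the upper edge land in the last bin.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        counts[row][col] += 1
        sums[row][col] += value
    means = [[s / c if c else None for s, c in zip(srow, crow)]
             for srow, crow in zip(sums, counts)]
    return counts, means


# Hypothetical points: (x, y, value).
points = [(0.1, 0.1, 10.0), (0.2, 0.3, 20.0), (0.9, 0.9, 5.0)]
counts, means = aggregate(points, width=2, height=2,
                          x_range=(0.0, 1.0), y_range=(0.0, 1.0))
print(counts)
print(means)
```

The grid size is fixed by the output resolution rather than by the data, so the cost of every later stage depends on the number of bins, not the number of points.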
Datashading Pipeline: Transfer
Data → Project / Synthesize → Scene → Sample / Raster → Aggregates → Transfer → Image
• Stage 3: Transform data using one or more transfer functions, culminating in a function that yields a visible image
• Each stage can be replaced and configured separately
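To illustrate how a transfer function can be swapped independently of the earlier stages, here is a sketch of three interchangeable mappings from bin counts to 0..255 brightness. The function names are hypothetical, and `rank_equalize` is a simplified rank-based stand-in for histogram equalization, not a faithful copy of any library routine.

```python
import math


def linear(counts):
    """Map counts to 0..255 in proportion to the largest count."""
    peak = max(max(row) for row in counts)
    return [[round(255 * c / peak) if peak else 0 for c in row]
            for row in counts]


def log_scale(counts):
    """Map counts to 0..255 on a log scale so sparse bins stay visible."""
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0 for _ in row] for row in counts]
    return [[round(255 * math.log1p(c) / math.log1p(peak)) for c in row]
            for row in counts]


def rank_equalize(counts):
    """Give each distinct non-zero count an evenly spaced brightness level."""
    levels = sorted({c for row in counts for c in row if c > 0})
    rank = {c: i + 1 for i, c in enumerate(levels)}
    return [[round(255 * rank[c] / len(levels)) if c else 0 for c in row]
            for row in counts]


# Hypothetical aggregate with a very wide dynamic range.
counts = [[0, 1, 2], [10, 100, 10000]]
for transfer in (linear, log_scale, rank_equalize):
    print(transfer.__name__, transfer(counts))
```

With a linear mapping, one dense bin pushes almost everything else to black (undersaturation); the log and rank-based mappings keep the sparse bins distinguishable without any hand-tuned parameter.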
Demos
New Developments
Flexible Statistics
Normalized Difference Vegetation Index (NDVI)
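For reference, the index is conventionally computed per pixel as (NIR - Red) / (NIR + Red), where NIR and Red are the near-infrared and red band reflectances. The sketch below applies that formula to small hypothetical grids; the handling of a zero denominator is an assumption made here for the example.

```python
def ndvi(nir, red):
    """Vegetation index for one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat an all-dark pixel as neutral
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally shaped 2-D grids."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


# Hypothetical reflectance values.
nir_band = [[0.50, 0.30], [0.10, 0.0]]
red_band = [[0.10, 0.30], [0.30, 0.0]]
print(ndvi_grid(nir_band, red_band))
```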
Flexible Statistics
Slope & Aspect Ratio from pure Elevation
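As an illustration of deriving a statistic from elevation alone, the sketch below estimates slope with central differences on a regular grid. It is one common formulation and not necessarily the one used in the talk; `slope_degrees` and `cell_size` are hypothetical names, and aspect is left out because its angle conventions vary between tools.

```python
import math


def slope_degrees(elevation, cell_size=1.0):
    """Slope in degrees for interior cells of a 2-D elevation grid.

    Uses central differences; edge cells are left as None.
    """
    rows, cols = len(elevation), len(elevation[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (elevation[r][c + 1] - elevation[r][c - 1]) / (2 * cell_size)
            dz_dy = (elevation[r + 1][c] - elevation[r - 1][c]) / (2 * cell_size)
            # Steepest gradient magnitude converted to an angle.
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


# Hypothetical terrain: a plane rising one unit per cell along each row.
elevation = [[float(c) for c in range(4)] for _ in range(4)]
print(slope_degrees(elevation)[1][1])
```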
Anaconda
Why Did We Create Anaconda?
To Enhance Python and Enable Data Scientists to Quickly Engage with Their Data
• Simplify setup for non-engineers
• Enable easy development on and deployment to multiple platforms
• Enable data scientists to experiment and iterate even more rapidly
• Eliminate the pains associated with package and dependency management
Anaconda
Modern, Open-Source Analytics Platform
powered by Python
Quickly Engage w/ Your Data
• 500+ Popular Python Packages
• Optimized & Compiled
• Free for Everyone
• Extensible via Conda Package Manager
• Sandbox Packages & Libraries
• Cross-Platform – Windows, Linux, Mac
• Not just Python - over 230 R packages
• Foundation of our Enterprise Products
Anaconda Enterprise
On-premises package repository and sharing platform
• Governance for your analytics environment: maintain control of the packages used by your analysts
• Easily replicate and share analysts’ environments
• Centrally store proprietary libraries and manage versioning
Cluster environment management
• Manages Python, R, Java, Scala packages across the cluster
• Easily replicate analysts’ environments for different jobs/users/groups
• Strong support for Hadoop & Spark
Anaconda Enterprise
Scalable Computing and Collaboration
• Multi-user notebook deployments
• Scalable notebook deployment model
• Project-based management
• Notebook versioning and locking
• Extended support for Hadoop Stack (Storm, Spark Streaming, Kafka)
• Single sign-on support (PKI, Kerberos, etc.)
• Burst Compute support
Consulting
Customers include:
• JPL
• DARPA
• Sandia National Labs
• AMD
• Bank of America
• Bloomberg
We Will Help Design, Architect, and Build the Right Analytics For You
Leverage our Open-Source Projects

PLOTCON NYC: Interactive Visual Statistics on Massive Datasets
