1877 Big Data Analytics on
Teradata:
An Introduction to
Revolution R Enterprise
Bill Jacobs
Dir., Product Marketing, Revolution Analytics
Demystifying R
• What is R?
• Why is it so popular?
• Is it only open source?
3
Our view: Big Data meets Big Math =
New Business Outcomes

THE
PERFECT
STORM

+ Computing Power
+ Data
+ Pace of Business
+ Customer Expectations

+ Data Science
+ Computer Science
+ Management Science

Confidential to Revolution Analytics and shared
with Siemens under the NDA dated 27/9/2013

Better Business
Decisions

New
Business
Outcomes

4
Big Analytics Delivers Value from Big Data

The three Vs of Big Data: Volume, Variety, Velocity

The three Vs of Big Data Big Analytics: maximizing Value, accommodating data Volatility, while assuring Veracity of insights
5

R Open Source
- Language, Community, Collaboration
- Robert Gentleman & Ross Ihaka, 1993
- Version 1.0 released 2000
- 2.5 Million Global Users
- Over 4,800 add-on “Packages”

Why R?
R in Universities = New Talent
Emerging Modeling/Visualization
Lower Cost Alternative
Open Source = Flexible & Innovative
Access to Free Packages

6
R is Exploding in Popularity & Functionality
[Figure: four chart panels comparing R with SAS, SPSS, S-Plus and Stata]
Internet Discussion: mean monthly traffic on email discussion list
Package Growth: number of R packages listed on CRAN
Web Site Popularity: number of links to main web site
Scholarly Activity: Google Scholar hits (’05-’09 CAGR): R 46%, SAS -11%, SPSS -27%

Source: http://r4stats.com/popularity
7
R is exploding in popularity & functionality
R Usage Growth
Rexer Data Miner Survey, 2007-2013

70% of data miners report using R
24% use R as primary tool
Source: www.rexeranalytics.com

“I’ve been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first.”
Deputy Editor for New Products at Forbes

“A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base – without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…”
Product Marketing Manager, SAS Institute, Inc.
R Is The Most Commonly Used Primary Analytics Tool

70% of data miners report using R
24% use R as primary tool

Source: www.rexeranalytics.com
Example of advanced visualization with R

Facebook Network Graphic

10
R Community, collaboration and breadth:
CRAN task views (subset of 4800+ packages)

Source: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/


11
Key Big Data Challenge: The Analytics Talent Pool

12
The Analytics Talent Pool With R

2 Million R Users

13
R is open source and drives analytic innovation
but… has some limitations for Enterprises

Big Data | In-memory bound | Hybrid memory & disk scalability | Operates on bigger volumes & factors
Speed of Analysis | Single threaded | Parallel threading | Shrinks analysis time
Enterprise Readiness | Community support | Commercial support | Delivers full service production support
Analytic Breadth & Depth | 5000+ innovative analytic packages | Leverage open source packages plus Big Data ready packages | Supercharges R
Commercial Viability | Risk of deployment of open source | Commercial license | Eliminate risk with open source

14
Our History & Our Future
Revolution R Enterprise V1 through V6.1
Revolution R Enterprise V6.2 through V9
Revolution R Enterprise V10 through V11

NA Offices: NYC, Dallas
Company Founding
Relocate HQ to Palo Alto
250 Customers
500 Customers
1000 Customers

2007, 2013, 2015, 2017

Chapter 1: Capture Mindshare
Chapter 2: Mobilize with Market Focus
Chapter 3: Scalable Growth

Company Confidential – Do not distribute

15
Revolution Confidential

200+ Customer Stories
Finance & Insurance

Academic & Gov’t

Healthcare & Life Sciences

Digital Media & Retail

Manufacturing & High Tech

16
Revolution Analytics - Overview
We are the only provider of a commercial analytics platform based on the open source
R statistical computing language.
Power: Distributed, high performance analytical algorithms
Productivity: Easier to build and deploy analytic applications
Enterprise Readiness: Stable, scalable multi-platform with world-wide support
Professional services enablement

World Wide Support Teams
• Standard and Premium Programs
• Technical Account Managers
• Customer Success Managers

Professional Services
• Architecture planning
• Systems Integration
• Advanced analytic applications
• Full life cycle projects
17
Customers Revolutionize their Business

Power
4X performance
50M records scored daily
“…we saw about a 4x performance improvement on 50 million records. It works brilliantly.”
- CEO, John Wallace, DataSong

Scalability
TB’s data from 200+ data sources
10’s thousands attributes
100’s millions of scores daily
“We’ve been able to scale our solution to a problem that’s so big that most companies could not address it…”
- SVP Analytics, Kevin Lyons, eXelate

Performance
2X data
2X attributes
no impact on performance
“We need a high-performance analytics …we can now identify opportunities for our clients that would otherwise be lost.”
- Chief Analytics Officer, Leon Zemel, [x+1]
19
Revolution R Enterprise
• What is Revolution R Enterprise?
• How does Revolution R Enterprise work with Teradata Database?
Revolution R Enterprise is…
the only big data big analytics platform based on open source R, the de facto statistical computing language for modern analytics

• High Performance, Scalable Analytics
• Portable Across Enterprise Platforms
• Easier to Build & Deploy Analytics

21
How is RRE Used?
Discovering Patterns with Big Data
• Customer segmentation
• Market basket analysis
• Social networking analysis
• Fraud detection
• Marketing attribution
• Sentiment analysis
• …and much more

Building Models Efficiently
• Credit risk
• Customer churn
• Propensity to buy
• Market risk
• Operational risk
• …and much more

Flexibly Deploying Models to Consumers
• Customer lifetime value
• Pricing optimization
• Recommendation engines
• …and much more

22
Introducing Revolution R Enterprise (RRE)
The Big Data Big Analytics Platform
• Big Data Big Analytics Ready
– Enterprise readiness
– High performance analytics
– Multi-platform architecture
– Data source integration
– Development tools
– Deployment tools

DevelopR, ConnectR, DeployR, ScaleR, DistributedR

23
The Platform Step by Step:
R Capabilities
R+CRAN
• Open source R interpreter
• UPDATED R 3.0.2
• Freely-available R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% Compatible with existing R scripts, functions and packages

RevoR
• Performance enhanced R interpreter
• Based on open source R
• Adds high-performance math

Available On:
• Platform™ LSF™ Linux®
• Microsoft® HPC Clusters
• Microsoft Azure Burst
• Windows® & Linux Servers
• Windows & Linux Workstations
• Teradata® Database
• IBM® Netezza®
• IBM BigInsights™
• Cloudera Hadoop®
• Hortonworks Hadoop
• Intel® Hadoop

24
Big Data Speed @ Scale with
Revolution R Enterprise (RRE)
First, we enhance and accelerate the Open Source R interpreter.

In-Hadoop Execution
In-Database Execution
Parallelized User Code
Parallelized Algorithms
Multi-Core Processing
Multi-Threaded Execution
Memory Management
Fast Math Libraries

25
Open Source R Performance: Multi-threaded Math

Customers report 5-50x performance improvements compared to Open Source R, without changing any code

Computation (4-core laptop) | Open Source R | Revolution R | Speedup
Linear Algebra (1)
Matrix Multiply | 176 sec | 9.3 sec | 18x
Cholesky Factorization | 25.5 sec | 1.3 sec | 19x
Linear Discriminant Analysis | 189 sec | 74 sec | 3x
General R Benchmarks (2)
R Benchmarks (Matrix Functions) | 22 sec | 3.5 sec | 5x
R Benchmarks (Program Control) | 5.6 sec | 5.4 sec | Not appreciable

1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
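
The speedup column is the ratio of the two timings in each row. The short sketch below recomputes it from the figures quoted on this slide; the dictionary layout and the function name are illustrative assumptions, and because the slide reports rounded speedups the recomputed ratios may not match the printed column exactly.

```python
# Timings in seconds, copied from the benchmark figures on this slide
# (open source R first, Revolution R second).
timings = {
    "Matrix Multiply": (176.0, 9.3),
    "Cholesky Factorization": (25.5, 1.3),
    "Linear Discriminant Analysis": (189.0, 74.0),
    "R Benchmarks (Matrix Functions)": (22.0, 3.5),
    "R Benchmarks (Program Control)": (5.6, 5.4),
}


def speedup(baseline_sec, improved_sec):
    """Return how many times faster the improved run is than the baseline."""
    return baseline_sec / improved_sec


speedups = {name: speedup(base, fast) for name, (base, fast) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

The program-control row stays close to 1x, which is consistent with the slide's point that the gains come from multi-threaded math rather than from the interpreter's control flow.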
26
The Platform Step by Step:
Parallelization & Data Sourcing

ConnectR
• High-speed & direct connectors

Available for:
• High-performance XDF
• SAS, SPSS, delimited & fixed format text data files
• Hadoop HDFS (text & XDF)
• Teradata Database & Aster
• EDWs and ADWs
• ODBC

ScaleR
• Ready-to-Use high-performance big data big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Correlation & covariance matrices
• Predictive Models – linear, logistic, GLM
• Machine learning
• Monte Carlo simulation
• NEW Tools for distributing customized algorithms across nodes

DistributedR
• Distributed computing framework
• Delivers portability across platforms

Available on:
• Windows Servers
• Red Hat and NEW SuSE Linux Servers
• IBM Platform LSF Linux
• Microsoft HPC Clusters
• Microsoft Azure Burst
• NEW Teradata Database
• NEW Cloudera Hadoop
• NEW Hortonworks Hadoop

27
Big Data Speed @ Scale with
Revolution R Enterprise (RRE)
Second, we built a platform for hosting R with Big Data on a variety of massively parallel platforms.

In-Hadoop Execution
In-Database Execution
Parallelized User Code
Parallelized Algorithms
Multi-Core Processing
Multi-Threaded Execution
Memory Management
Fast Math Libraries

28
Revolution R Enterprise
Powering Next Generation Analytics

COMBINE INTERMEDIATE RESULTS

29
SAS HPA Speed comparison* Logistic Regression

Measure | SAS HPA | Revolution R | Difference
Rows of data | 1 billion | 1 billion |
Parameters | “just a few” | 7 | Double
Time | 80 seconds | 44 seconds | 45%
Data location | In memory | On disk |
Nodes | 32 | 5 | 1/6th
Cores | 384 | 20 | 5%
RAM | 1,536 GB | 80 GB | 5%

Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM.

Revolution R Enterprise Delivers Performance at 2% of the Cost

*As published by SAS in HPC Wire, April 21, 2011
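
The "a 20th" and "a 6th" claims on this slide follow from simple ratios of the quoted hardware figures. A small sketch, using only the numbers on the slide (the dictionary keys are illustrative names, not anything from either product):

```python
# Figures quoted on this slide for the two logistic regression runs.
sas_hpa = {"nodes": 32, "cores": 384, "ram_gb": 1536, "time_sec": 80}
revolution_r = {"nodes": 5, "cores": 20, "ram_gb": 80, "time_sec": 44}

# Express each Revolution R figure as a fraction of the SAS HPA figure.
ratios = {key: revolution_r[key] / sas_hpa[key] for key in sas_hpa}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.1%} of the SAS HPA figure")
```

Cores and RAM both come out at a little over 5% (roughly a 20th), nodes at about 16% (roughly a 6th), and elapsed time at 55%, which matches the 45% reduction shown for the Time row.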
30
Analytics Layer: High Performance Big Data
Analytics with ScaleR

R Data Step
Descriptive Statistics
Statistical Tests
Sampling
Predictive Modeling
Data Visualization
Machine Learning
Simulation

31
ScaleR: Fast Parallel External Memory
Algorithms
Data Prep, Distillation & Descriptive Analytics

R Data Step
• Data import – Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort
• Merge
• Split
• Aggregate by category (means, sums)
• Use any of the functionality of the R language to transform and clean data row by row!

Descriptive Statistics
• Min / Max
• Mean
• Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations

Statistical Tests
• Chi Square Test
• t-Test
• F-Test
• Plus 100’s of other tests available in R!

Sampling
• Subsample (observations & variables)
• Random Sampling
• High quality, fast, parallel random number generators
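
To make the "external memory" idea concrete, here is a minimal sketch of an aggregate-by-category step that never holds more than one chunk of rows in memory. It is written in Python purely for illustration and is not ScaleR code; the function names, the chunk size and the sample data are all assumptions.

```python
import csv
import io
from collections import defaultdict


def read_chunks(handle, chunk_rows):
    """Yield lists of at most chunk_rows rows from a CSV file handle."""
    reader = csv.DictReader(handle)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_rows:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def aggregate_by_category(handle, key, value, chunk_rows=2):
    """Compute per-category sums and means while keeping only one chunk in memory."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chunk in read_chunks(handle, chunk_rows):
        for row in chunk:
            sums[row[key]] += float(row[value])
            counts[row[key]] += 1
    return {cat: {"sum": sums[cat], "mean": sums[cat] / counts[cat]} for cat in sums}


# Hypothetical sample data standing in for a file too large to load at once.
SAMPLE = "region,sales\nEast,10\nWest,20\nEast,30\nWest,40\nEast,50\n"
result = aggregate_by_category(io.StringIO(SAMPLE), "region", "sales")
print(result)
```

Only the running sums and counts persist between chunks, so memory use depends on the number of categories rather than the number of rows.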

32
ScaleR: Fast Parallel External Memory
Algorithms
Statistical Modeling

Predictive Models
• Covariance, Correlation, Sums of Squares (cross product matrix for set variables) matrices
• Multiple Linear Regression
• Generalized Linear Models (GLM) - All exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions including: cauchit, identity, log, logit, probit. User defined distributions & link functions.
• Logistic Regression
• Classification & Regression Trees
• Decision Forests
• Predictions/scoring for models
• Residuals for all models

Data Visualization
• Histogram
• Line Plot
• Lorenz Curve
• ROC Curves (actual data and predicted values)
• Plus numerous tools in R and ScaleR to generate big data visualizations

Machine Learning
Cluster Analysis
• K-Means
Classification
• Decision Trees
• Decision Forests

Simulation
• High quality, fast, parallel random number generators
• Use the rich functionality of R for simulations
33
The Power of Revolution R Enterprise
Performance & Scalability
Value
ScaleR: Moves computation to data
ScaleR: Leverage CRAN
ScaleR: Labor saving power
DistributedR: Maximizes computation
DistributedR: Powerful divide & conquer
DistributedR: Effective memory utilization
RevoR: 3-50X faster
Open Source: Leverage latest innovation

34
Why Teradata And Revolution R Enterprise?
• Teradata User Demand
• Data Movement Penalty Growing
• New Analytics Requiring MPP Approach
• R Popularity
• Open Source Limitations
• Arrival of Teradata v14.10

35
Revolution Analytics coupled with the Teradata Unified Data Architecture accelerates big data analytics using the widely-accepted R language.

Available Today:
• Scalable R analytics on servers connected to Teradata
• High speed, parallel data transfer, 5x faster than RODBC
• Integrated parallel analytics solution
Teradata Version 14.0
Revolution R Enterprise 6.2 High-Speed TPT Connector

Upcoming Capabilities (4Q13)
• Parallel R in-database for big data analytics on Teradata
• R programmers can immediately build parallel R models completely in R
• Revolution parallel in-database algorithms exclusively available on Teradata
Teradata Version 14.10 + Revolution R Enterprise V7

36
Introducing Revolution R Enterprise Version 7 on
Teradata Database
• New Teradata Table Operators
• New Parallelized Algorithms
• In-Database Execution of Parallelized Algorithms
• Executes R Scripts From R Workstations or Servers
• Provides Orders of Magnitude Performance Gains
• Supports Multiple Platforms in UDA
• Available Late 2013

37
Revolution Analytics in the UDA
UNIFIED DATA ARCHITECTURE
With Revolution R Enterprise

RODBC

Seamless use of R analytics across the
Teradata UDA
38
Transparent Parallelization of Analytical, Predictive Modeling and
Machine Learning in Teradata

HOW DOES IT WORK?

39
Understanding R’s Compute Workload

Computational Workload Breakdown
R Script (compute burden from script or command): < 1%
Algorithms (compute burden from algorithmic computations): 99.xxx%
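
One way to see why this breakdown matters is Amdahl's law, which bounds the speedup from parallelizing only part of a workload. The sketch below is an illustration, not something taken from the slide: the parallel fraction of 0.99 is an assumed round figure standing in for the slide's unspecified "99.xxx%", and the worker counts are arbitrary.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only parallel_fraction of the work can be spread over workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumption for illustration only: the slide gives "< 1%" for the script,
# so 0.99 is used here as a conservative parallel fraction.
ASSUMED_PARALLEL_FRACTION = 0.99

for workers in (1, 4, 16, 64):
    bound = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, workers)
    print(f"{workers} workers: at most {bound:.2f}x")
```

Because the script itself is such a small share of the compute burden, parallelizing the algorithms alone can deliver most of the available gain, which is the premise of running them in-database.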

40
ScaleR PEMAs: High Performance Analytical
Algorithms
• User’s Script Calls ScaleR PEMA
– No Unique Code or Setup for Parallelism
– ScaleR Algorithms are “just another R package”
– Using PEMAs is Transparent, Automatic, Fast and Scales Linearly

• PEMAs Transparently Parallelize Algorithm Execution
– Parallelized Versions of Statistics, Predictive Modeling and Machine Learning Algorithms
– PEMAs Transparently Distribute Computations Across AMPs
– Results are Consolidated Into A Single Result Set
– Provides Write Once Deploy Anywhere (WODA) Portability
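
The general shape of a parallel external memory algorithm can be sketched with a simple linear regression: each data chunk is reduced to a handful of sums, the sums are added together, and the model is solved once from the totals. This is a hedged illustration in Python, not the ScaleR implementation; the function names and the synthetic data are assumptions.

```python
def partial_sums(chunk):
    """Reduce one chunk of (x, y) pairs to the sums needed for a least-squares line."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Merge two sets of partial sums; addition makes the order of chunks irrelevant."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(total):
    """Solve for slope and intercept from the combined sums."""
    n, sx, sy, sxx, sxy = total
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data split into three chunks, as if stored on separate workers.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
chunks = [data[0:3], data[3:7], data[7:10]]

total = (0, 0.0, 0.0, 0.0, 0.0)
for chunk in chunks:
    total = combine(total, partial_sums(chunk))

slope, intercept = fit_line(total)
print(slope, intercept)
```

Since each chunk contributes only five numbers, the same script logic works whether the chunks are processed one after another on a laptop or simultaneously on many workers.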

41
Transparent Distributed Computing with
RRE ScaleR
In Revolution R Enterprise:
• Script Calls ScaleR PEMA
– Algorithm Executes
• Algorithm Returns to Script
• Script Continues Execution

Transparent to the Script
• Algorithm Starts A Master Process
• Master Identifies Environment
– Threading?
– Cores?
– Chips?
– Distributed Nodes?
• Master Initializes Algorithm
– Prepares Instructions for Nodes
• Master Executes Table Operators In Each VAMP
– VAMPs process each data segment
– Table Operator runs in each VAMP
– Table Operator returns Intermediate Result Object (IRO) to master process
• Master Process Combines IROs
• Returns Consolidated Answer to Script
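
The master and worker flow described on this slide can be sketched as follows. This is a toy model under stated assumptions: Python threads stand in for the database's parallel units, the "intermediate result object" is a plain tuple of count, mean and sum of squared deviations, and the merge step uses the standard pairwise formula for combining variances. None of the names come from Revolution R Enterprise or Teradata.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def worker(segment):
    """Summarize one data segment as an intermediate result: (count, mean, sum of squared deviations)."""
    n = len(segment)
    mean = sum(segment) / n
    m2 = sum((value - mean) ** 2 for value in segment)
    return (n, mean, m2)


def combine(a, b):
    """Merge two intermediate results using the pairwise mean/variance update."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def master(segments):
    """Send one task per segment, gather the intermediate results, return one consolidated answer."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        intermediate = list(pool.map(worker, segments))
    n, mean, m2 = reduce(combine, intermediate)
    return {"n": n, "mean": mean, "variance": m2 / (n - 1)}


# Hypothetical data already partitioned into three segments.
segments = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
answer = master(segments)
print(answer)
```

The calling code sees a single function call and a single result, which mirrors the "one request, many subtasks, one answer" pattern the deck describes.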

42
ScaleR PEMAs on Teradata:
Transparent Distribution of R Analytics
Desktops & Servers: Revolution R Enterprise
Corporate Applications: Revolution R Enterprise
ODBC
Teradata Database + Revolution R Enterprise: Extended Stored Procedure, Table Operators, AMPs

• For Each Call to a ScaleR Algorithm:
– One Request
– Many Subtasks
– One Answer
43
Revolution R Enterprise Ecosystem
Power of Integration
SI / Service

Deployment / Consumption

MSP / DSP

Advanced Analytics

ETL
Corios

Data / Infrastructure

46
The Platform Step by Step:
Tools & Deployment
DevelopR

DeployR

• Freely-available R algorithms
• Callable by RevoR
• Embeddable in R scripts

• Web services software
development kit
• Integrates R Into application
infrastructures

Available on:
• Can be called by RevoR
• Can be run single-node using RevoR
• Analyze large data using RDataStep package
• Run on multiple nodes using rxEXEC package

DevelopR

DeployR

Capabilities:
• Invokes R Scripts from
web services calls
• RESTful interface for
easy integration
• Works with leading desktop
& BI tools

47
DevelopR Integrated Development
Environment
Script with type ahead and code snippets
Sophisticated debugging with breakpoints, variable values etc.
Solutions window for organizing code and data
Objects loaded in the R Environment
Packages installed and loaded
Object details

http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
48
Data Analysis
DeployR
R / Statistical Modeling Expert
Deployment Expert
Business Intelligence
Mobile Web Apps
Cloud / SaaS

• Seamless
– Bring the power of R to any web enabled application
• Simple
– Leverage common APIs including JS, Java, .NET
• Scalable
– Robustly scale user and compute workloads
• Secure
– Manage enterprise security with LDAP & SSO

49
Create Custom, On-Demand Analytical Apps
Some Examples:
On-demand sales forecasting
Leveraging the power of R from Microsoft tools
Real-time social media sentiment analysis

50
Alteryx and Revolution Analytics
Making Predictive Analytics More Accessible and Scalable

Empowering Analysts with Easy-to-Use Predictive Tools combined with the Leading R Platform
Delivering Enterprise-Scale Predictive Analytics to Line of Business Analysts
Enabling a Broader Audience to Harness the Universe of R

51
Summary.
• R is Hot.
– Most Broadly Used Analytical Language
– Its Popularity Addresses Critical Talent Gap
– Vast Functionality Via CRAN
– R Needs a Platform For Big Data Big Analytics

• Revolution Provides Enterprise-Capable Platforms for R.
– High Performance.
– Scalable via Transparent Distributed Execution
– Portable – Write Once Deploy Anywhere (WODA)
– Commercial Support & Services Cut Project Risks

• Teradata + Revolution Provide a Robust Solution
– Teradata provides stable, high-performance big data environment
– Revolution provides speed, scale, portability and stability for the enterprise

52
Next steps?

The leading commercial provider of software and support for the popular
open source R statistics language.

www.revolutionanalytics.com

650.646.9545

Twitter: @RevolutionR

53
Thank You.

54

Big data analytics on teradata with revolution r enterprise bill jacobs

  • 1. 1877 Big Data Analytics on Teradata: An Introduction to Revolution R Enterprise Bill Jacobs Dir., Product Marketing, Revolution Analytics
  • 2. Demystifying R  What is R  Why is it so popular?  Is it only open source?
  • 3. 3
  • 4. Our view: Big Data meets Big Math = New Business Outcomes THE PERFECT STORM + Computing Power + Data + Pace of Business + Customer Expectations + Data Science + Computer Science + Management Science Confidential to Revolution Analytics and shared with Siemens under the NDA dated 27/9/2013 Better Business Decisions New Business Outcomes 4
  • 5. Big Analytics Delivers Value from Big Data Volume Variety Velocity The three Vs of Big Data: The three V’s of Big Data Big Analytics: Maximizing Value, accommodating data Volatility, while assuring Veracity of insights 5 Confidential to Revolution Analytics
  • 6. R Open Source - Language, Community, Collaboration - Robert Gentleman & Ross Ihaka, 1993 - Version 1.0 released 2000 - 2.5 Million Global Users - Over 4,800 add-on ―Packages‖ - Why R? R in Universities = New Talent WELCOME & INTRODUCTIONS Emerging Modeling/Visualization Lower Cost Alternative Open Source = Flexible & Innovative Access to Free Packages Confidential to Revolution Analytics and shared with Siemens under the NDA dated 27/9/2013 6
  • 7. R is Exploding in Popularity & Functionality Internet Discussion Package Growth Mean monthly traffic on email discussion list Number of R packages listed on CRAN 4,000 2500 R 2000 3,000 1500 2,000 Stata 1000 SAS 1,000 SPSS S-Plus 0 1995 2000 2005 500 0 2010 Web Site Popularity Scholarly Activity Number of links to main web site Google Scholar hits (’05-’09 CAGR) R 4,000 SAS 2,000 1,050 SPSS 900 S-Plus Stata 600 R SAS 46% -11% SPSS -27% S-Plus Stata Source: http://r4stats.com/popularity 0% 10% 7
  • 8. R is exploding in popularity & functionality R Usage Growth Rexer Data Miner Survey, 2007-2013 70% of data miners report using R “I’ve been astonished by the rate at which R has been adopted. Four years ago, everyone in my economics department [at the University of Chicago] was using Stata; now, as far as I can tell, R is the standard tool, and students learn it first.” Deputy Editor for New Products at Forbes 24% use R as primary tool “A key benefit of R is that it provides near-instant availability of new and experimental methods created by its user base — without waiting for the development/release cycle of commercial software. SAS recognizes the value of R to our customer base…” Source: www.rexeranalytics.com Product Marketing Manager SAS Institute, Inc
  • 9. R Is The Most Commonly Used Primarly Analytics Tool 70% of data miners report using R 24% use R as primary tool Source: www.rexeranalytics.com Source: www.rexeranalytics.com
  • 10. Example of advanced visualization with R Facebook Network Graphic 10
  • 11. R Community, collaboration and breadth: CRAN task views (subset of 4800+ packages). Source: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/
  • 12. Key Big Data Challenge: The Analytics Talent Pool 12
  • 13. The Analytics Talent Pool With R 2 Million R Users 13
  • 14. R is open source and drives analytic innovation but… has some limitations for Enterprises (each row: area; open source R; Revolution R Enterprise; result). Big Data: in-memory bound; hybrid memory & disk scalability; operates on bigger volumes & factors. Speed of Analysis: single threaded; parallel threading; shrinks analysis time. Enterprise Readiness: community support; commercial support; delivers full service production support. Analytic Breadth & Depth: 5000+ innovative analytic packages; leverage open source packages plus Big Data ready packages; supercharges R. Commercial Viability: risk of deployment of open source; commercial license; eliminate risk with open source.
  • 15. Our History & Our Future Revolution R Enterprise V1 through V6.1 Revolution R Enterprise V6.2 through V9 Revolution R Enterprise V10 through v11 NA Offices NYC Dallas Company Founding Relocate HQ to Palo Alto 250 Customers 2007 500 Customers 2013 Chapter 1 Capture Mindshare 1000 Customers 2015 Chapter 2 Mobilize with Market Focus Company Confidential – Do not distribute 2017 Chapter 3 Scalable Growth 15
  • 16. Revolution Confidential 200+ Customer Stories Finance & Insurance Academic & Gov’t Healthcare & Life Sciences Digital Media & Retail Manufacturing & High Tech 16
  • 17. Revolution Analytics - Overview We are the only provider of a commercial analytics platform based on the open source R statistical computing language. Distributed, high performance analytical algorithms Power Easier to build and deploy analytic applications Stable, scalable multi-platform with Productivity Enterprise Readiness Professional services enablement world-wide support World Wide Support Teams • Standard and Premium Programs • Technical Account Managers • Customer Success Managers Professional Services • Architecture planning • Systems Integration • Advanced analytic applications • Full life cycle projects 17
  • 18. Customers Revolutionize their Business. Power: 4X performance, 50M records scored daily. “…we saw about a 4x performance improvement on 50 million records. It works brilliantly.” (CEO, John Wallace, DataSong) Scalability: TBs of data from 200+ data sources, 10s of thousands of attributes, 100s of millions of scores daily. “We’ve been able to scale our solution to a problem that’s so big that most companies could not address it…” (SVP Analytics, Kevin Lyons, eXelate) Performance: 2X data, 2X attributes, no impact on performance. “We need a high-performance analytics …we can now identify opportunities for our clients that would otherwise be lost.” (Chief Analytics Officer, Leon Zemel, [x+1])
  • 19. Revolution R Enterprise: What is Revolution R Enterprise? How does Revolution R Enterprise work with Teradata Database?
  • 20. Revolution R Enterprise is… the only big data big analytics platform based on open source R, the de facto statistical computing language for modern analytics: High Performance, Scalable Analytics; Portable Across Enterprise Platforms; Easier to Build & Deploy Analytics.
  • 21. How is RRE Used? Discovering Patterns with Big Data: customer segmentation; market basket analysis; social networking analysis; fraud detection; marketing attribution; sentiment analysis; …and much more. Building Models Efficiently: credit risk; customer churn; propensity to buy; market risk; operational risk; …and much more. Flexibly Deploying Models to Consumers: customer lifetime value; pricing optimization; recommendation engines; …and much more.
  • 22. Introducing Revolution R Enterprise (RRE), The Big Data Big Analytics Platform. Big Data Big Analytics Ready: Enterprise readiness; High performance analytics; Multi-platform architecture; Data source integration; Development tools; Deployment tools. Components shown: DevelopR, ConnectR, DeployR, ScaleR, DistributedR.
  • 23. The Platform Step by Step: R Capabilities. R+CRAN: Open source R interpreter; UPDATED R 3.0.2; Freely-available R algorithms; Algorithms callable by RevoR; Embeddable in R scripts; 100% compatible with existing R scripts, functions and packages. RevoR: Performance enhanced R interpreter; Based on open source R; Adds high-performance math. Available on: Platform™ LSF™ Linux®; Microsoft® HPC Clusters; Microsoft Azure Burst; Windows® & Linux Servers; Windows & Linux Workstations; Teradata® Database; IBM® Netezza®; IBM BigInsights™; Cloudera Hadoop®; Hortonworks Hadoop; Intel® Hadoop.
  • 24. Big Data Speed @ Scale with Revolution R Enterprise (RRE) In-Hadoop Execution First, we enhance and accelerate the Open Source R interpreter. In-Database Execution Parallelized User Code Parallelized Algorithms Multi-Core Processing Multi-Threaded Execution Memory Management Fast Math Libraries 25
  • 25. Open Source R Performance: Multi-threaded Math. Customers report 5-50x performance improvements compared to Open Source R, without changing any code. Computation (4-core laptop), listed as Open Source R; Revolution R; Speedup. Linear Algebra [1]: Matrix Multiply: 176 sec; 9.3 sec; 18x. Cholesky Factorization: 25.5 sec; 1.3 sec; 19x. Linear Discriminant Analysis: 189 sec; 74 sec; 3x. General R Benchmarks [2]: R Benchmarks (Matrix Functions): 22 sec; 3.5 sec; 5x. R Benchmarks (Program Control): 5.6 sec; 5.4 sec; not appreciable. [1] http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php [2] http://r.research.att.com/benchmarks/
  • 26. The Platform Step by Step: Parallelization & Data Sourcing. ConnectR: High-speed & direct connectors. Available for: High-performance XDF; SAS, SPSS, delimited & fixed format text data files; Hadoop HDFS (text & XDF); Teradata Database & Aster; EDWs and ADWs; ODBC. ScaleR: Ready-to-use high-performance big data big analytics; Fully-parallelized analytics; Data prep & data distillation; Descriptive statistics & statistical tests; Correlation & covariance matrices; Predictive models (linear, logistic, GLM); Machine learning; Monte Carlo simulation; NEW tools for distributing customized algorithms across nodes. DistributedR: Distributed computing framework; Delivers portability across platforms. Available on: Windows Servers; Red Hat and NEW SuSE Linux Servers; IBM Platform LSF Linux; Microsoft HPC Clusters; Microsoft Azure Burst; NEW Teradata Database; NEW Cloudera Hadoop; NEW Hortonworks Hadoop.
  • 27. Big Data Speed @ Scale with Revolution R Enterprise (RRE) In-Hadoop Execution Second, we built a platform for hosting R with Big Data on a variety of massively parallel platforms. In-Database Execution Parallelized User Code Parallelized Algorithms Multi-Core Processing Multi-Threaded Execution Memory Management Fast Math Libraries 28
  • 28. Revolution R Enterprise Powering Next Generation Analytics COMBINE INTERMEDIATE RESULTS 29
  • 29. Speed comparison*: Logistic Regression, listed as SAS HPA; Revolution R; difference. Rows of data: 1 billion; 1 billion. Parameters: “just a few”; 7; double. Time: 80 seconds; 44 seconds; 45%. Data location: in memory; on disk. Nodes: 32; 5; 1/6th. Cores: 384; 20; 5%. RAM: 1,536 GB; 80 GB; 5%. Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM. Revolution R Enterprise Delivers Performance at 2% of the Cost. *As published by SAS in HPC Wire, April 21, 2011
  • 30. Analytics Layer: High Performance Big Data Analytics with ScaleR R Data Step Descriptive Statistics Statistical Tests Sampling Predictive Modeling Data Visualization Machine Learning Simulation 31
  • 31. ScaleR: Fast Parallel External Memory Algorithms. Data Prep, Distillation & Descriptive Analytics. R Data Step: Data import (Delimited, Fixed, SAS, SPSS, ODBC); Variable creation & transformation; Recode variables; Factor variables; Missing value handling; Sort; Merge; Split; Aggregate by category (means, sums); Use any of the functionality of the R language to transform and clean data row by row! Descriptive Statistics: Min / Max; Mean; Median (approx.); Quantiles (approx.); Standard Deviation; Variance; Correlation; Covariance; Sum of Squares (cross product matrix for set variables); Pairwise Cross tabs; Risk Ratio & Odds Ratio; Cross-Tabulation of Data (standard tables & long form); Marginal Summaries of Cross Tabulations. Statistical Tests: Chi Square Test; t-Test; F-Test; Plus 100’s of other tests available in R! Sampling: Subsample (observations & variables); Random Sampling; High quality, fast, parallel random number generators.
  • 32. ScaleR: Fast Parallel External Memory Algorithms. Statistical Modeling. Predictive Models: Covariance, Correlation, Sums of Squares (cross product matrix for set variables) matrices; Multiple Linear Regression; Generalized Linear Models (GLM), with all exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie), standard link functions (including cauchit, identity, log, logit, probit) and user defined distributions & link functions; Logistic Regression; Classification & Regression Trees; Decision Forests; Predictions/scoring for models; Residuals for all models. Data Visualization: Histogram; Line Plot; Lorenz Curve; ROC Curves (actual data and predicted values); plus numerous tools in R and ScaleR to generate big data visualizations. Machine Learning: Cluster Analysis (K-Means); Classification (Decision Trees, Decision Forests). Simulation: High quality, fast, parallel random number generators; Use the rich functionality of R for simulations.
  • 33. The Power of Revolution R Enterprise. [Chart of Value against Performance & Scalability.] ScaleR: moves computation to data; leverage CRAN; labor saving power. DistributedR: maximizes computation; powerful divide & conquer; effective memory utilization. RevoR: 3-50X faster. Open Source: leverage latest innovation.
  • 34. Why Teradata And Revolution R Enterprise? Teradata User Demand; Data Movement Penalty Growing; New Analytics Requiring MPP Approach; R Popularity; Open Source Limitations; Arrival of Teradata v14.10.
  • 35. Revolution Analytics coupled with the Teradata Unified Data Architecture accelerates big data analytics using the widely-accepted R language. Available Today (Teradata Version 14.0 + Revolution R Enterprise 6.2, High-Speed TPT Connector): Scalable R analytics on servers connected to Teradata; High speed, parallel data transfer, 5x faster than RODBC; Integrated parallel analytics solution. Upcoming Capabilities, 4Q13 (Teradata Version 14.10 + Revolution R Enterprise V7): Parallel R in-database for big data analytics on Teradata; R programmers can immediately build parallel R models completely in R; Revolution parallel in-database algorithms exclusively available on Teradata.
  • 36. Introducing Revolution R Enterprise Version 7 on Teradata Database: New Teradata Table Operators; New Parallelized Algorithms; In-Database Execution of Parallelized Algorithms; Executes R Scripts From R Workstations or Servers; Provides Orders of Magnitude Performance Gains; Supports Multiple Platforms in UDA; Available Late 2013.
  • 37. Revolution Analytics in the UDA UNIFIED DATA ARCHITECTURE With Revolution R Enterprise RODBC Seamless use of R analytics across the Teradata UDA 38
  • 38. Transparent Parallelization of Analytical, Predictive Modeling and Machine Learning in Teradata HOW DOES IT WORK? 39
  • 39. Understanding R’s Compute Workload. Computational Workload Breakdown: compute burden from script or command (R Script): < 1%; compute burden from algorithmic computations (Algorithms): 99.xxx%.
  • 40. ScaleR PEMAs: High Performance Analytical Algorithms. User’s Script Calls ScaleR PEMA: No Unique Code or Setup for Parallelism; ScaleR Algorithms are “just another R package”; Using PEMAs is Transparent, Automatic, Fast and Scales Linearly. PEMAs Transparently Parallelize Algorithm Execution: Parallelized Versions of Statistics, Predictive Modeling and Machine Learning Algorithms; PEMAs Transparently Distribute Computations Across AMPs; Results are Consolidated Into A Single Result Set; Provides Write Once Deploy Anywhere (WODA) Portability.
  • 41. Transparent Distributed Computing with RRE ScaleR. Transparent to the Script, In Revolution R Enterprise: Script Calls ScaleR PEMA; Algorithm Executes; Algorithm Returns to Script; Script Continues Execution. Inside the algorithm: Algorithm Starts A Master Process; Master Identifies Environment (Threading? Cores? Chips? Distributed Nodes?); Master Initializes Algorithm; Prepares Instructions for Nodes; Master Executes Table Operators In Each VAMP (VAMPs process each data segment; Table Operator runs in each VAMP; Table Operator returns Intermediate Result Object (IRO) to master process); Master Process Combines IROs; Returns Consolidated Answer to Script.
  • 42. ScaleR PEMAs on Teradata: Transparent Distribution of R Analytics Desktops & Servers Revolution R Enterprise  For Each Call to a ScaleR Algorithm: – One Request – Many Subtasks – One Answer Corporate Applications Revolution R Enterprise ODBC Teradata Database + Revolution R Enterprise Extended Stored Procedure Table Operators AMPs 43
  • 43. Revolution R Enterprise Ecosystem Power of Integration SI / Service Deployment / Consumption MSP / DSP Advanced Analytics ETL Corios Data / Infrastructure 46
  • 44. The Platform Step by Step: Tools & Deployment DevelopR DeployR • Freely-available R algorithms • Callable by RevoR • Embeddable in R scripts • Web services software development kit • Integrates R Into application infrastructures Available on: • Can be called by RevoR • Can be run single-node using RevoR • Analyze large data using RDataStep package • Run on multiple nodes using rxEXEC package DevelopR DeployR Capabilities: • Invokes R Scripts from web services calls • RESTful interface for easy integration • Works with leading desktop & BI tools
  • 45. DevelopR Integrated Development Environment: Script with type ahead and code snippets; Sophisticated debugging with breakpoints, variable values, etc.; Solutions window for organizing code and data; Objects loaded in the R Environment; Packages installed and loaded; Object details. http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
  • 46. DeployR. Seamless: Bring the power of R to any web enabled application. Simple: Leverage common APIs including JS, Java, .NET. Scalable: Robustly scale user and compute workloads. Secure: Manage enterprise security with LDAP & SSO. [Diagram labels: Data Analysis; R / Statistical Modeling Expert; Deployment Expert; Business Intelligence; Mobile Web Apps; Cloud / SaaS.]
  • 47. Create Custom, On-Demand Analytical Apps Some Examples: On-demand sales forecasting Leveraging the power of R from Microsoft tools Real-time social media sentiment analysis 50
  • 48. Alteryx and Revolution Analytics Making Predictive Analytics More Accessible and Scalable Empowering Analysts with Easy-to-Use Predictive Tools combined with the Leading R Platform Delivering Enterprise-Scale Predictive Analytics to Line of Business Analysts Enabling a Broader Audience to Harness the Universe of R 51
  • 49. Summary. R is Hot: Most Broadly Used Analytical Language; Its Popularity Addresses Critical Talent Gap; Vast Functionality Via CRAN; R Needs a Platform For Big Data Big Analytics. Revolution Provides Enterprise-Capable Platforms for R: High Performance; Scalable via Transparent Distributed Execution; Portable – Write Once Deploy Anywhere (WODA); Commercial Support & Services Cut Project Risks. Teradata + Revolution Provide a Robust Solution: Teradata provides stable, high-performance big data environment; Revolution provides speed, scale, portability and stability for the enterprise.
  • 50. Next steps? The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR 53
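
Slides 40 to 42 describe the PEMA pattern as one request, many subtasks, one answer: each worker summarizes its own data segment into an intermediate result object (IRO), and a master process combines the IROs into a single result. The Python sketch below illustrates that idea only. It is not ScaleR or Teradata code; `IntermediateResult`, `process_partition` and `fit_line` are hypothetical names, threads stand in for table operators, and a one-predictor least-squares fit is assumed as the algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class IntermediateResult:
    """Hypothetical stand-in for an intermediate result object (IRO)."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def combine(self, other):
        # Sufficient statistics are additive, so merging is a field-wise sum.
        return IntermediateResult(
            self.n + other.n,
            self.sum_x + other.sum_x,
            self.sum_y + other.sum_y,
            self.sum_xx + other.sum_xx,
            self.sum_xy + other.sum_xy,
        )


def process_partition(rows):
    """Worker step: summarize one data segment; raw rows never leave it."""
    iro = IntermediateResult()
    for x, y in rows:
        iro.n += 1
        iro.sum_x += x
        iro.sum_y += y
        iro.sum_xx += x * x
        iro.sum_xy += x * y
    return iro


def fit_line(partitions, max_workers=4):
    """Master step: fan out to workers, combine IROs, solve y = a + b*x."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        iros = list(pool.map(process_partition, partitions))

    total = IntermediateResult()
    for iro in iros:
        total = total.combine(iro)

    denom = total.n * total.sum_xx - total.sum_x ** 2
    slope = (total.n * total.sum_xy - total.sum_x * total.sum_y) / denom
    intercept = (total.sum_y - slope * total.sum_x) / total.n
    return slope, intercept


if __name__ == "__main__":
    # Synthetic data on the line y = 2x + 1, split into three segments.
    data = [(float(x), 2.0 * x + 1.0) for x in range(12)]
    partitions = [data[0:4], data[4:8], data[8:12]]
    print(fit_line(partitions))
```

Only the small summaries travel back to the master, which is why the result does not depend on how the rows are split across segments.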

Editor's Notes

  • #22: Enterprise readiness; Performance architecture; Big Data analytics; Data source integration; Development tools; Deployment tools
  • #24: Enterprise readiness: Build assurance (continuous testing, custom validation); Implementation tools (validation utility); Technical support, documentation, training. Performance architecture: Fast math libraries; Better memory management; Multi-core processing; Distributed computing architecture. Big Data analytics: Descriptive Statistics; Cross Tabulation; Statistical Tests; Correlation, Covariance and SSCP Matrices; Linear Regression; Logistic Regression; Generalized Linear Models; Decision Trees; K-Means Clustering. Data source integration: ODBC; Teradata (high speed); Text Files (Delimited & Fixed format); SAS; SPSS; Hadoop (HDFS & HBase). Development tools: Visual Debugger; Script Editor; R Snippets; Object Browser; Solution Explorer; Customizable Workspace; Version Control Plug-In. Deployment tools: R objects as JSON, XML; Supports Java, JavaScript, .NET; RESTful web services API; Security (LDAP, SSO); Built-in load balancing; Asynchronous scheduling; Management console; Accelerators (Jaspersoft, Qlikview).
  • #30: A Revolution R Enterprise ScaleR analytic is provided a data source as input. The analytic loops over data, reading a block at a time. Blocks of data are read by a separate worker thread (Thread 0). Worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory. When all of the data is processed, a master results object is created from the intermediate results objects.
  • #45: RRE environments on workstations and servers submit execution requests to TD just as they do for other platforms. Steps in the execution include: In response to an execution request from a user or a tool, the user’s RRE instance packages the R script and environment metadata into a request. The request is transferred to Teradata via the ODBC interface and contains the R code and environment details. Teradata’s Gateway Process receives the ODBC request, enqueuing it for one of the Parsing Engines (PE), typically the least busy of them across the machine. The PE invokes the RRE engine (deployed inside of an Extended Stored Procedure, XSP). The RRE instance inside the XSP is essentially a master process. It decomposes the run request into one or more sets of executions. The execution requests are delivered by the XSP to PEMAs running as table operators. ScaleR PEMAs run as table operators on chunks of data provided by the Teradata database (part of the new native Table Operator functionality). PEMAs run in VAMPs as table operators and can be run iteratively. TD chunks the data via AMPs. We produce intermediate result objects. Intermediate results are returned to XSPs.
  • #49: Type ahead: the IDE recognizes an R function as you type in the first few characters and shows the completed formula and parameters. Code snippets: templates for common R functions, e.g. for loop, xy plot. These are written in XML and users can add their own. Solution Window: the RPE organizes R scripts and data files in folders by Solution. This facilitates but does not implement versioning. The list of installed packages and the list of loaded packages are available for inspection. Clicking on these packages shows their components in the object window. The top right Object Browser window shows all of the objects available in the R environment. The bottom right object window shows the details of particular objects. Debugging Tools: when running in debugging mode the RPE supports breakpoints, stepping in and out of code, and shows the contents of variables upon “mouse over”. Users may step through all code available in the Solution that is active.
  • #51: DeployR Examples at: http://50.57.191.94/revolution/docs/examples/ User: testuser Password: secret
  • #52: Alteryx: Alteryx has always been about enabling business users to create powerful analytic applications through simple drag and drop functions. As our support for R has developed and we have added more functionality, we have seen it fervently adopted by our customers. As we see more and more customers like WalMart using R and doing predictive analytics, we are starting to see these customers come up against the limitations of open source R. To address the scalability issues, we started talking to Revolution Analytics. Revolution: Delivering Enterprise-Scale Predictive Analytics to Line of Business Analysts. We make R enterprise ready in a number of ways. The end result is a powerful, scalable way to run predictive analytics that leverages the open source community’s innovation and broad pool of experts. Furthermore, we are Enabling a Broader Audience to Harness the Power of R. R is the most widely adopted statistical language, with over 2M users. R is the standard statistical platform at universities around the world. R has a vibrant ecosystem that is constantly improving and innovating and offering new ways to use R. Alteryx enables experts to adopt these new innovations and then make them available to analysts by incorporating them back into the workflow, and then enabling those applications to be made available to business users through the Alteryx Gallery.
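
Note #30 describes an external-memory loop in which one thread reads the next block of data while earlier blocks are processed and folded into intermediate results. The Python sketch below is a simplified illustration of that loop under stated assumptions, not the ScaleR implementation: it uses a single processing thread instead of Threads 1..n, an in-memory list stands in for the data source, and `read_blocks`, `summarize`, `merge` and `streaming_mean_variance` are hypothetical names. The merge step uses the standard pairwise formula for combining counts, means and sums of squared deviations.

```python
import queue
import threading


def read_blocks(values, block_size):
    """Yield consecutive blocks; stands in for reading a file block by block."""
    for start in range(0, len(values), block_size):
        yield values[start:start + block_size]


def summarize(block):
    """Intermediate result for one block: (count, mean, sum of squared deviations)."""
    n = len(block)
    mean = sum(block) / n
    m2 = sum((v - mean) ** 2 for v in block)
    return n, mean, m2


def merge(a, b):
    """Combine two intermediate results without revisiting the raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def streaming_mean_variance(values, block_size=1000):
    """Return (count, mean, sample variance) while holding only a few blocks."""
    blocks = queue.Queue(maxsize=2)  # reader stays at most two blocks ahead
    done = object()                  # sentinel marking the end of the data

    def reader():
        for block in read_blocks(values, block_size):
            blocks.put(block)
        blocks.put(done)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    total = (0, 0.0, 0.0)
    while True:
        block = blocks.get()
        if block is done:
            break
        total = merge(total, summarize(block))
    thread.join()

    n, mean, m2 = total
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance


if __name__ == "__main__":
    print(streaming_mean_variance([float(v) for v in range(1, 11)], block_size=3))
```

The bounded queue is the design choice that keeps memory use flat: the reader blocks once it is two blocks ahead, so data volume is limited by disk rather than RAM.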