Slide deck: Enterprise-Level Data Analysis with Oracle R Enterprise - DOAG2014
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Enterprise-Level Data Analysis
with Oracle R Enterprise
Dr. Nadine Schöne
Sales Consultant
Oracle Direct, Sales Consulting
Dr. Michael Haupt
Tech Lead, FastR Project
Virtual Machine Research Group, Oracle Labs
Negib Marhoul
Leading Senior Sales Consultant
Oracle Direct, Sales Consulting
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
1. Data Analysis in the Enterprise
2. R and Oracle R Enterprise (ORE)
3. Demo
4. Oracle Labs and FastR
5. Further Information
Data Analysis in the Enterprise
Background
Statistical and mining methods
• Time-consuming analysis processes
• Multiple iterations
• Workflows of recurring steps
• Resource-intensive data analyses
Collect data ➔ Identify data ➔ Prepare data ➔ Analyze data
Key Topics for Enterprise Data Analytics
1. Scalability
2. Performance
3. Development & production
R and Oracle R Enterprise (ORE)
R is …
1. A programming language
2. A statistical workbench
3. A data science ecosystem
R is the lingua franca of data science.
R logo © R Foundation, from http://www.r-project.org
Aspects of Conventional R/Database Interaction
R logo © R Foundation, from http://www.r-project.org
“Collaborative Execution” Model
(1) User R engine (desktop): R engine, other R packages, Oracle R Enterprise packages
(2) Database compute engine: Oracle DB with the user tables; receives SQL from (1) and returns results
(3) R engine(s) managed by Oracle DB: R engine, other R packages, Oracle R Enterprise packages; receive R calls from (2) and return results
Transparency layer => uses the compute power of the database
No flat-file export => saves time and uses the compute power of the server
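The difference between the conventional pattern (export the rows, compute on the client) and the push-down pattern behind the transparency layer can be sketched as follows. This is a minimal illustration that uses Python's built-in `sqlite3` module as a stand-in for Oracle DB; it does not use the ORE API, and the table `readings` and its columns are hypothetical.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'readings' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (household INTEGER, kwh REAL)")
    rows = [(h, float(h * 10 + i)) for h in range(1, 4) for i in range(4)]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn

def mean_by_household_client_side(conn):
    """Conventional pattern: pull every row to the client, then aggregate there."""
    sums, counts = {}, {}
    for household, kwh in conn.execute("SELECT household, kwh FROM readings"):
        sums[household] = sums.get(household, 0.0) + kwh
        counts[household] = counts.get(household, 0) + 1
    return {h: sums[h] / counts[h] for h in sums}

def mean_by_household_pushed_down(conn):
    """Push-down pattern: the database computes the aggregate; only results travel."""
    query = "SELECT household, AVG(kwh) FROM readings GROUP BY household"
    return dict(conn.execute(query).fetchall())

if __name__ == "__main__":
    conn = make_db()
    print(mean_by_household_client_side(conn))
    print(mean_by_household_pushed_down(conn))
```

Both functions return the same averages, but the second transfers one row per household instead of every reading. In ORE itself, a comparable effect comes from R operations on `ore.frame` proxy objects being translated into SQL that runs inside the database.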
“R is a powerful and interesting tool for
data analysis! ORE brings R into a
scalable DB engine (solving problems
of data management, analysis and
scalability). We actually can obtain
information and added value from not
so actively used data.”
– Stefano Alberto Russo, Researcher at CERN Openlab
• Oracle R Distribution
• ROracle
• Oracle R Enterprise
• Oracle R Advanced Analytics for Hadoop
Free for the R community
Oracle R Enterprise at a Glance
Function push-down: data transformation & statistics
Development: R workspace console; unchanged user experience
Production: Oracle statistics engine; scalable to large data volumes
Application: OBIEE, web services; embedding in operational systems
Sensor Data Analysis I
200,000 households
3 years
1 reading/hour
5.256 billion readings
(26,280 readings per household)
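The volume figures follow directly from the slide's parameters, ignoring leap days: 3 years × 365 days × 24 readings per day gives 26,280 readings per household, and 200,000 households × 26,280 readings gives 5.256 billion. A quick check in Python:

```python
HOUSEHOLDS = 200_000
YEARS = 3
READINGS_PER_HOUR = 1
HOURS_PER_YEAR = 365 * 24  # leap days ignored

readings_per_household = YEARS * HOURS_PER_YEAR * READINGS_PER_HOUR
total_readings = HOUSEHOLDS * readings_per_household

print(f"{readings_per_household:,} readings per household")  # 26,280 readings per household
print(f"{total_readings:,} readings in total")  # 5,256,000,000 readings in total
```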
Sensor Data Analysis II
10 s/model
200,000 households ➔ 200,000 models
Sequential: about 23 days + 4 hours
Oracle R Enterprise: 4.3 hours
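The sequential figure is the per-model time multiplied by the number of models. The sketch below reproduces it and derives the overall speedup implied by the slide's 4.3 hours; that speedup is computed here from the slide's numbers and is not stated in the source.

```python
SECONDS_PER_MODEL = 10
MODELS = 200_000
ORE_HOURS = 4.3  # runtime with Oracle R Enterprise, as given on the slide

sequential_s = SECONDS_PER_MODEL * MODELS      # total time if models are fitted one by one
days, rest = divmod(sequential_s, 86_400)      # whole days and remaining seconds
hours = rest / 3_600
speedup = sequential_s / (ORE_HOURS * 3_600)   # sequential time / ORE time

print(f"sequential: {days} days + {hours:.1f} hours")
print(f"implied speedup: {speedup:.0f}x")
```

The exact sequential time is 23 days and about 3.6 hours, which the slide rounds to 4 hours; the implied speedup is roughly 129x. The slide does not say how ORE distributes the models. Partitioned embedded R execution such as `ore.groupApply`, which runs an R function once per data group in database-managed R engines, is one plausible mechanism, but that is an assumption here.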
Integrating Data Miner with Oracle R Enterprise
• SQL Query node
– Allows R scripts to be integrated
Oracle Advanced Analytics
Wide Range of In-Database Data Mining and Statistical Functions
• Data Understanding & Visualization
– Summary & Descriptive Statistics
– Histograms, scatter plots, box plots, bar charts
– R graphics: 3-D plots, link plots, special R graph types
– Cross tabulations
– Tests for Correlations (t-test, Pearson’s, ANOVA)
– Selected Base SAS equivalents
• Data Selection, Preparation and Transformations
– Joins, Tables, Views, Data Selection, Data Filter, SQL time windows, Multiple schemas
– Sampling techniques
– Re-coding, Missing values
– Aggregations
– Spatial data
– R to SQL transparency and push down
• Classification Models
– Logistic Regression (GLM)
– Naive Bayes
– Decision Trees
– Support Vector Machines (SVM)
– Neural Networks (NNs)
• Regression Models
– Multiple Regression (GLM)
– Support Vector Machines
• Clustering
– Hierarchical K-means
– Orthogonal Partitioning
– Expectation Maximization
• Anomaly Detection
– Special case Support Vector Machine (1-Class SVM)
• Associations / Market Basket Analysis
– Apriori algorithm
• Feature Selection and Reduction
– Attribute Importance (Minimum Description Length)
– Principal Components Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition
• Text Mining
– Most OAA algorithms support unstructured data (e.g. customer comments, email, abstracts)
• Transactional Data
– Most OAA algorithms support transactional data (e.g. purchase transactions, repeated measures over time)
• R packages: ability to run open source
– Broad range of R CRAN packages can be run as part of the database process via R to SQL transparency and/or via Embedded R mode
* included in every Oracle Database
Demo
Software Components in the VM Image
R 3.1.1
Oracle R Enterprise (ORE) 1.4.1
Oracle DB 12.1.0.2.0
R, SQL
Oracle SQL Developer 4.0.3
RStudio 0.98.1079
Benefits
6,054 R packages
Oracle Labs and FastR
Safe Harbor Statement
The following is intended to provide some insight into a line of research in Oracle Labs. It
is intended for information purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing decisions. Oracle reserves the right
to alter its development plans and practices at any time, and the development, release,
and timing of any features or functionality described in connection with any
Oracle product or service remains at the sole discretion of Oracle. Any views expressed in
this presentation are my own and do not necessarily reflect the views of Oracle.
The Mission of Oracle Labs is straightforward:
Identify, explore, and transfer new
technologies that have the potential to
substantially improve Oracle's business.
– Edward Screven, Chief Corporate Architect, Oracle
Thoughts on R
• R is excellent for statistical tasks.
Why should anyone have to fall back on C and Fortran?
• R is inherently parallel as a language.
Why should parallelism have to be implemented separately?
Diagram: Library 2 (R + Fortran), Library 1 (R + C)
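The claim that R is inherently parallel rests on its vectorised semantics: an expression such as `sqrt(x) + 1` applies to every element of `x` independently. The Python sketch below is not FastR code, and its function names are hypothetical; it only shows why such an operation can be split across workers without changing the result. Threads keep the example self-contained, but in CPython the global interpreter lock limits any real speedup, so this demonstrates the decomposition rather than performance.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def vec_sqrt_plus_one(xs):
    """Element-wise operation in the style of R's vectorised sqrt(x) + 1:
    each element is computed independently of all the others."""
    return [math.sqrt(x) + 1 for x in xs]

def parallel_vec(fn, xs, workers=4):
    """Because there are no cross-element dependencies, the vector can be split
    into chunks, processed concurrently, and concatenated in the original order."""
    size = max(1, math.ceil(len(xs) / workers))
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(fn, chunks))  # map preserves chunk order
    return [y for part in parts for y in part]

if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    assert parallel_vec(vec_sqrt_plus_one, data) == vec_sqrt_plus_one(data)
    print(parallel_vec(vec_sqrt_plus_one, [0.0, 1.0, 4.0, 9.0]))  # [1.0, 2.0, 3.0, 4.0]
```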
FastR
• Open-source R implementation
– GPL 2
– https://bitbucket.org/allr/fastr
– Research prototype
– Linux, Mac
• Properties
– Implemented in “100 % Java”
– Built with Truffle (interpreter) and Graal (dynamic compiler)
Diagram: Library 2 (R), Library 1 (R)
Truffle and Graal
Node transitions (specializing for types): Uninitialized ➔ type-specialized ➔ Generic
AST interpreter, uninitialized nodes ➔ node rewriting for profiling feedback ➔ AST interpreter, rewritten nodes ➔ compilation using partial evaluation ➔ compiled code
Compiled code ➔ deoptimization to the AST interpreter ➔ node rewriting to update profiling feedback ➔ recompilation using partial evaluation
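The node-rewriting idea in the diagram can be illustrated with a small self-specializing AST node. This is a plain-Python sketch under simplifying assumptions, not the Truffle API (which is Java and expresses specializations with annotations such as `@Specialization`). The class and state names are hypothetical, and the sketch covers only the node transitions, not partial evaluation or compilation.

```python
class AddNode:
    """An 'add' AST node that rewrites its own behaviour, loosely modelled on the
    Truffle node transitions: uninitialized -> type-specialized -> generic."""

    def __init__(self, left, right):
        self.left = left      # child nodes: callables that take a frame (dict)
        self.right = right
        self.state = "uninitialized"

    def execute(self, frame):
        a, b = self.left(frame), self.right(frame)
        if self.state == "uninitialized":
            # First execution: specialize for the operand types actually observed.
            self.state = "int" if type(a) is int and type(b) is int else "generic"
        if self.state == "int":
            if type(a) is int and type(b) is int:
                return a + b  # fast path, only valid while the speculation holds
            # Speculation failed: rewrite to the generic state for good. In
            # Truffle/Graal this is also where compiled code would be deoptimized.
            self.state = "generic"
        return self.generic_add(a, b)

    @staticmethod
    def generic_add(a, b):
        # Slow path: accepts any operand types that Python's + supports.
        return a + b

if __name__ == "__main__":
    node = AddNode(lambda f: f["x"], lambda f: f["y"])
    print(node.execute({"x": 1, "y": 2}), node.state)    # 3 int
    print(node.execute({"x": 1.5, "y": 2}), node.state)  # 3.5 generic
```

The node never moves back from `generic` to a specialized state, which mirrors the one-directional transitions in the diagram and keeps the rewriting from oscillating.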
Benchmark Results: Shootout
• Benchmark properties
– “Computer Language Benchmarks Game” (the “Shootout”)
– Not typical R applications
• Results
– Note: logarithmic axis
– Most are about 10x faster
– Positive outlier: about 520x
[Chart: speedup over GNU R per benchmark, logarithmic scale from 1 to 1000; geometric mean: 10x improvement over GNU R]
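The summary figure is a geometric mean of per-benchmark speedups, the usual way to average ratios: a single outlier such as the ~520x case pulls it far less than an arithmetic mean, and it is symmetric under inversion (a 2x gain and a 2x loss average to 1). A minimal sketch; the speedup values below are hypothetical placeholders, not the measured results from the chart.

```python
import math

def geometric_mean(values):
    """n-th root of the product, computed via logarithms to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical speedups (GNU R runtime / FastR runtime), for illustration only.
speedups = [10.0, 10.0, 10.0, 520.0]

print(f"arithmetic mean: {sum(speedups) / len(speedups):.1f}x")
print(f"geometric mean:  {geometric_mean(speedups):.1f}x")
```

With these placeholder values the arithmetic mean is 137.5x while the geometric mean is about 26.9x, which is why the geometric mean is the more honest summary for a set of ratios with one large outlier.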
PGX: Overview
PGX is a data analysis framework that supports powerful graph analyses of the data:
• Recommendation
• Influencer identification
• Community detection
• Pattern matching
PGX runs fast, parallel analyses on large graphs, both on a single machine and in a distributed environment.
PGX is tightly integrated with Oracle DB (RDF and PG options), which manages graph data consistently on persistent storage.
Diagram: graph program (DSL) ➔ compiler ➔ PGX on a single machine or distributed
Our DSL compiler technology makes it easy to switch between the two environments.
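Influencer identification on a graph is commonly approached with a centrality measure such as PageRank. The sketch below is plain-Python power iteration on a tiny hypothetical graph; it does not use the PGX API or its DSL, and the slide does not say which algorithms PGX uses for this task.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs.
    Dangling nodes (no outgoing edges) spread their rank uniformly."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        # Teleport share plus the redistributed rank of dangling nodes.
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical "follows" graph: an edge a -> b means a follows b.
follows = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a"), ("d", "b")]
scores = pagerank(follows)
print(max(scores, key=scores.get))  # c
```

PGX is described as running such analyses in parallel on large in-memory graphs; this sequential sketch only illustrates what the computation is.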
More Information
ORE discussion forum:
https://community.oracle.com/community/developer/english/business_intelligence/data_warehousing/r
Oracle Advanced Analytics:
http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html
ORE blog:
https://blogs.oracle.com/R/
FastR:
https://bitbucket.org/allR/fastR
Graal/Truffle:
https://wiki.openjdk.java.net/display/Graal/Main
Oracle Labs on OTN:
http://www.oracle.com/technetwork/oracle-labs/index.html
Contact
Dr. Nadine Schöne | Sales Consultant
Email: nadine.schoene@oracle.com
Tel: +49 331 200 7190
Dr. Michael Haupt | Tech Lead, FastR Project
Email: michael.haupt@oracle.com
Tel: +49 331 200 7277
ORACLE Deutschland B.V. & Co. KG
Schiffbauergasse 14
14467 Potsdam