FOR FINANCE
CEO Jan-Kees Buenen
CTO Jorik Blaas
FOR FINANCE
“to Know your Datalake”
Dealing with the
complexity of
data at scale
Volume, variety, velocity and
veracity
Text
a,b,c,…
Sensors
IoT
Wearables
Etc.
Digital Images
Video
Pictures
Scans
Network
Numbers
1,2,3,…
SYNERSCOPE’S CHOICE FOR A HIGH-PERFORMANCE
CONVERGED SYSTEM
o Scalable
o Flexible
o Attacks the right bottlenecks
o Unique balance of compute, memory and
storage
SYNERSCOPE’S PRIMARY USE CASES FOR IBM
POWER
o High Volume transactions
o On-premise image analytics
o Sensor data (IoT)
o Large scale networks
IBM Power
Systems
COMPUTER AUTOMATION PLUS HUMAN INTERACTION
Machine
Learning + Visual
Sensemaking
o Speed and Flexibility
o Double cognitive system
Four main components
End to End
Analytics
Connect & Correlate
at Scale
State of the art
Access Control
AI
Deep Learning
TIME BURNERS WHEN WORKING WITH DATA
o Getting the infrastructure ready
o Data Science tooling deployment
o Loading data into the platform
o Data quality, provenance or cleaning
o Searching the data
o Selecting the relevant data
o Finding which data holds information
value
o Operationalizing the new findings
o Getting to data driven ways of working
End-to-End
Analytics
Productivity with Raw
and Complex Data
BULK PROCESSING FOR ALL THE HEAVY LIFTING
o Ingest from unlimited data sources
o Content Scanning at field level
o Data-driven Correlation
o Enterprise Search
Connect and
correlate data at
Scale
Traditional ETL data science approach: Find → Ingest → Organize → Enrich → Extract → Analyze
Re-arrange the analytic workflow: Bulk Ingest → Scan & Organize → Correlate & Enrich → Find → Extract → Analyze
Fully Automated, zero interaction by Ixiwa
Sped up by Ixiwa
Iximeer
AUTOMATED BULK PROCESSES DO THE HEAVY LIFTING
o Ingest
o Content Scanning
o Data-driven Correlation
o Enterprise Search
Replace upfront
human efforts
with targeted
efforts
FINE TOUCH INTEGRATION WITH HDP AND SPARK
o Scalable performance
IXIWA BUILT-IN INTELLIGENCE TO
o Detect input format and encodings
o Extract text from PDF, DOCX, XLSX, etc. with
Tika
o Index all data with SOLR for datalake-wide
search
o Automatically tag the data
Ixiwa on
IBM Power
Systems with
POWER8 for
heavy data lifting
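As a rough illustration of the first bullet above (detecting input format and encodings), the sketch below inspects a file's leading bytes and then tries a short list of text encodings. It uses only the Python standard library. The function names `sniff_format` and `sniff_encoding`, the signature table and the candidate encodings are assumptions made for this example, not Ixiwa's actual logic; text extraction with Tika and indexing with SOLR are not shown.

```python
import codecs

# Magic-byte prefixes for a few container formats (illustrative, not exhaustive).
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-based (e.g. DOCX, XLSX)"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]

# Byte-order marks, checked before any trial decoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_format(head: bytes) -> str:
    """Return a coarse format label based on the first bytes of a file."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "text-or-unknown"


def sniff_encoding(data: bytes, candidates=("utf-8", "cp1252")) -> str:
    """Guess a text encoding: BOM first, then the first candidate that decodes."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"  # decodes any byte sequence, used as a last resort
```

Trial decoding only shows that an encoding is possible, not that it is the right one, so a production scanner would normally combine it with statistical detection.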
PATENTED MANY-TO-MANY AUTO-CORRELATOR AT SCALE
Data
Similarity &
Linking
Data
Fingerprinting Auto-correlator
Similar data
redundant
Linkable data
• Augment
• Enrich
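The slides do not say how the patented auto-correlator computes its fingerprints. As a stand-in, the sketch below uses a generic, well-known technique: MinHash signatures over the distinct values of a field, compared slot by slot to estimate Jaccard similarity. The function names, the normalisation step and the signature length are all assumptions for illustration.

```python
import hashlib


def minhash(values, num_hashes=64):
    """Fingerprint a set of field values as a fixed-length MinHash signature."""
    distinct = {str(v).strip().lower() for v in values}
    if not distinct:
        raise ValueError("no values to fingerprint")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(2, "big")  # a different salt gives a different hash function
        signature.append(min(
            hashlib.blake2b(v.encode("utf-8"), digest_size=8, salt=salt).digest()
            for v in distinct
        ))
    return signature


def similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the share of matching signature slots."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Under this reading, two fields with a similarity near 1 hold similar (possibly redundant) data, while a moderate overlap suggests a linkable key that can be used to augment or enrich one dataset with the other. Because signatures are small and fixed in size, every field can be compared with every other field without rereading the raw data.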
ACCESS CONTROL
o Role based access made easy
o First do bulk ingest then find data-driven
access
o Field level data scanning for precision
o Compartmentalize the data
o Roles and field content together define access
o Continuous audit trails
GDPR READY
o Know where all sensitive data is
o Know which applications and users access
what
Enterprise Grade
Security for your
Datalake
Ixiwa + Ranger + Atlas
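The idea that roles and field content together define access can be sketched as follows: each field value is scanned and tagged, and a role sees a field only if it is allowed to read every tag found on it. The scanner patterns, tag names, roles and function names here are hypothetical, and the sketch does not call Apache Ranger or Apache Atlas.

```python
import re

# Hypothetical content scanners: tag name -> pattern that marks a value as sensitive.
SCANNERS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

# Hypothetical policy: which tags each role may read.
ROLE_MAY_READ = {
    "analyst": set(),
    "compliance": {"email", "iban"},
}


def tag_fields(record):
    """Scan every field value and return {field name: set of tags found}."""
    return {
        field: {tag for tag, rx in SCANNERS.items() if rx.search(str(value))}
        for field, value in record.items()
    }


def view_for(role, record):
    """Mask each field whose tags are not all permitted for the role."""
    allowed = ROLE_MAY_READ.get(role, set())
    tags = tag_fields(record)
    return {
        field: value if tags[field] <= allowed else "***"
        for field, value in record.items()
    }
```

Tagging by content rather than by column name is what lets access rules be set after a bulk ingest: a field is protected because of what it contains, even when nobody described the schema up front.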
IXIWA ON APACHE RANGER AND APACHE ATLAS
o Automated data tagging
o Easy setting of access permissions
o Bulk ingested data made easy to share safely
o Full provenance trails kept in Ranger
o Transparent, traceable and auditable
SECURE HIGH-LEVEL TASK AND WORKFLOW SYSTEM
o Reduce the need for one-off data science
activities
Smart data
scanning and
data
management
BOTTLENECKS
o How data is presented to humans
o Collaboration in and between groups
o Complex decision making
o Data driven decisions
AI BECOMES A HUMAN’S BEST COMPANION
o Changes how data is presented
o Allows humans to do what humans do best
o ASSOCIATE DATA AND INFORMATION
Productivity
Productivity
Productivity
with data at Scale
AI and
Deep Learning
Quality of the People
and Quality of the Data
Analyst
Analyze Emerging
Patterns & Context
Historical Data
Test found rules, return
high-quality hits
Streaming Flows
Implement found rules,
alert in near real-time
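The loop described above (an analyst finds a pattern, tests it as a rule on historical data, then implements it on streaming flows) can be sketched in a few lines. The `Rule` class, the `suspicious` label on historical records and the example field names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]


def backtest(rule: Rule, history: Iterable[dict]) -> dict:
    """Run a candidate rule over labelled historical records and report precision."""
    hits = [rec for rec in history if rule.matches(rec)]
    true_hits = sum(1 for rec in hits if rec["suspicious"])
    precision = true_hits / len(hits) if hits else 0.0
    return {"hits": len(hits), "precision": precision}


def alerts(rule: Rule, stream: Iterable[dict]) -> Iterator[dict]:
    """Apply an accepted rule to records as they arrive, yielding one alert per hit."""
    for rec in stream:
        if rule.matches(rec):
            yield {"rule": rule.name, "record": rec}
```

A rule would be promoted from `backtest` to `alerts` only when its precision on historical data is high enough, which is one way to read "return high-quality hits". Because `alerts` is a generator, it works equally well on a finite list or on an unbounded stream of records.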
COMBINE IBM - HORTONWORKS - GOOGLE
SynerScope’s
fine touch
integration of
HDP, GPU &
TensorFlow
o Mixed cluster with multiple node types
o Leverage the hybrid nature of TensorFlow (CPU+GPU)
o Leverage YARN node labels to launch on GPU nodes
o Models stored on HDFS
HDP
YARN
CPU Nodes
TensorFlow
PySpark
GPU Nodes
CUDA
TensorFlow
PySpark
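One way to launch a PySpark job on the GPU nodes of a mixed cluster is through Spark's YARN node-label settings, `spark.yarn.executor.nodeLabelExpression` and `spark.yarn.am.nodeLabelExpression`. The helper below only assembles the `spark-submit` command line. The helper name, the label `gpu` and the script name `train.py` are hypothetical, and the node label is assumed to have been defined already by the cluster administrator.

```python
def spark_submit_args(script, executor_label, am_label=None, extra_conf=None):
    """Build a spark-submit command line that pins executors to labelled YARN nodes."""
    conf = {"spark.yarn.executor.nodeLabelExpression": executor_label}
    if am_label is not None:
        # Optionally place the application master on labelled nodes as well.
        conf["spark.yarn.am.nodeLabelExpression"] = am_label
    conf.update(extra_conf or {})

    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [script]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(spark_submit_args("train.py", "gpu")))
```

Jobs that need no GPU can omit the label and run on the CPU nodes, which keeps the scarcer GPU nodes free for training.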
SYNERSCOPE & TENSORFLOW ON IBM POWER SYSTEMS
Warp power for
Storage,
Compute and
Learning
o PowerAI Deep Learning distribution includes pre-compiled
TensorFlow binaries for fast deployment
o Direct support for NVIDIA Tesla and Pascal series
o NVLink support on Pascal
Spectrum
Scale
Storage
IBM
POWER8
HDP
PySpark
IBM POWER8
&
NVIDIA Pascal
HDP
PowerAI
TensorFlow
PySpark
Scalable Compute, Scalable Storage, Scalable Learning
EXAMPLE OF A PERFECT TASK FOR IBM POWER SYSTEMS
o Image classification and grading
o Batch-wise training on ever expanding data
o Field level data scanning for precision
On-premise
Image
Classification
when a public cloud is off-limits
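Batch-wise training on ever expanding data means the model is updated with each new batch instead of being retrained from scratch. The sketch below shows that pattern with a deliberately simple stand-in model, a nearest-centroid classifier in pure Python. It is not the TensorFlow model used in the slides, and the class name and the use of small feature vectors in place of images are assumptions for illustration.

```python
class IncrementalCentroids:
    """Nearest-centroid classifier whose class means are updated batch by batch."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sum
        self.counts = {}  # label -> number of samples seen so far

    def partial_fit(self, features, labels):
        """Fold one new batch into the running sums without revisiting old data."""
        for x, y in zip(features, labels):
            if y not in self.sums:
                self.sums[y] = [0.0] * len(x)
                self.counts[y] = 0
            self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
            self.counts[y] += 1

    def predict(self, x):
        """Return the label whose centroid is closest in squared Euclidean distance."""
        def distance(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.sums, key=distance)
```

Only the running sums and counts are kept between batches, so the cost of an update depends on the size of the new batch rather than on the total amount of data collected so far.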
SynerScope the
more flexible link
Practical Live Modelling with SAP, IBM Power Systems and ML
Live Order
Data
Stream
Full
Historical
Data
Labels & Predictions
Async Stream Processing
Trained and
Dynamically
Updated Models
Virtualized on IBM
POWER8
IBM POWER8 (lab) IBM POWER8 NVIDIA
CONTACTS
INFO@SYNERSCOPE.COM
CTO jorik.blaas@synerscope.com
CEO jan-kees.Buenen@synerscope.com
Learn more about Hortonworks on Power:
ibm.biz/hortonworksOnPower
Questions?

Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive Business
