Grab some coffee and enjoy the pre-show banter before the top of the hour!
Big Data in Action: Real-World Solution Showcase 
The Briefing Room
Twitter Tag: #briefr 
Welcome 
Host: 
Eric Kavanagh 
eric.kavanagh@bloorgroup.com
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics 
This Month: BIG DATA 
March: CLOUD 
April: BIG DATA 
2014 Editorial Calendar at 
www.insideanalysis.com/webcasts/the-briefing-room
Big Data
Analysts: Lindy Ryan and John O’Brien 
Lindy Ryan is the Research Director for Radiant Advisors’ Data Discovery and Visualization practice and leads research and analyst activities at the confluence of data discovery, visualization, and data science from a business-needs perspective. She also retains the role of Editor in Chief of RediscoveringBI Magazine. As Radiant Advisors’ Editor in Chief for three years, Lindy participated in in-depth discussions and analysis with industry thought leaders and vendors while maturing her position and perspectives in the BI industry.
John O’Brien is Principal and CEO of Radiant Advisors. With over 25 years of experience delivering value through data warehousing and BI programs, John’s unique perspective comes from the combination of his roles as a practitioner, consultant, and vendor in the BI industry. His knowledge of designing, building, and growing enterprise BI systems and teams brings real-world insights to each role and phase within a BI program. Today, through Radiant Advisors, John provides research and advisory services that guide companies in meeting the demands of next-generation information management, architecture, and emerging technologies.
IBM
• IBM offers a full suite of Big Data solutions, including InfoSphere Streams, InfoSphere BigInsights and InfoSphere Data Explorer
• IBM also offers a series of products designed to leverage the power of Hadoop
• Stream Integration is a Premier Business Partner with IBM and focuses its consultancy exclusively on IBM products
Guests: 
Eric Poulin 
VP of Business Analytics, 
Stream Integration 
Paul Flach 
VP of Enterprise Analytics, 
Stream Integration
Big Data Performance for Analytics
Eric Poulin
VP, Analytics & Big Data
Eric.poulin@streamintegration.com
Agenda
• Overview of Stream Integration
• Big Data Performance for Analytics
• Modular Analytics
Company Overview
• Award-Winning Information Lifecycle Consultancy
• Founded in 2000
• IBM Premier Partner
• Exclusively focused on IBM Information Management, Big Data and Analytics
• Offices in North America, Caribbean, and Europe
• Development and Support Centers in India and China
Copyright © 2014, Stream Integration Inc. All rights reserved.
Linking Data to the Business Requirements
[Diagram: IBM information platform. Labels: Transactional & Collaborative Applications; External Information Sources; Traditional Sources; Streaming Information; Integrate; Manage Content; Analyze Big Data; Structured Data; InfoSphere MDM; Govern Data; Quality; Lifecycle Management; Security & Privacy; Information Server; Streams; BigInsights; PureData/Netezza; Business Analytics Applications; Design ★ Deploy ★ Operate ★ Manage ★ Extend.]
Performance for the Future of Analytics
Paul Flach
Stream Integration
Capabilities Required for Hadoop-Style Workloads
• Application Support and Development Tooling
• Cluster and Workload Management
• Runtime
• Visualization & Discovery
• Data Ingest
• Analytics Engines
• Data Store
• File System
• Security
Big SQL provides native SQL for Hadoop
• ANSI SQL-92+ support
Next Gen Big SQL will provide the first MPP query engine for Hadoop
[Diagram: Big SQL architecture. Labels: Head Node with Coordinator node, Catalog, and local fs (catalog tables); Hadoop Data Node(s) on Hosts 1 to n, each with MapReduce, MPP Runtime, User Data, temp(s), and HDFS, plus local fs (temps); SQL sub-sections; Cluster network; Distributed fs sync; Direct Hadoop data access; Big SQL Acceleration; Query Optimizer; Common SQL: BigInsights, DB2, Netezza, Oracle, Teradata.]
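The diagram shows a coordinator on the head node handing SQL sub-sections to MPP runtimes that sit next to the data on each host. As a rough illustration of that scatter-gather pattern, and not of Big SQL's actual internals, the Python sketch below pushes a partial `SUM ... GROUP BY` to each hypothetical partition and merges the partial results on the coordinator; threads stand in for hosts, and the partition data and function names are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions, one per data node (host); each row is (region, amount).
PARTITIONS = [
    [("east", 10), ("west", 5), ("east", 7)],
    [("west", 3), ("north", 8)],
    [("east", 1), ("north", 2), ("west", 4)],
]


def run_subsection(rows):
    """Worker side: compute a partial SUM(amount) GROUP BY region over local rows only."""
    partial = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial


def coordinate(partitions):
    """Coordinator side: scatter the sub-section to every partition, then merge partials."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(run_subsection, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(sorted(total.items()))


if __name__ == "__main__":
    print(coordinate(PARTITIONS))  # {'east': 18, 'north': 10, 'west': 12}
```

Only the small per-partition totals travel to the coordinator, which is why MPP engines try to run work where the data already lives.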
BigSheets provides business users with access to data without programming
• Spreadsheet-style interface
• Data Visualization & Graphs
Watson Explorer included in BigInsights
• Faceted Search, Navigation & Discovery
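Faceted search narrows a result set by attribute values while showing how many hits fall under each value, so users can drill down instead of rewriting queries. A minimal Python sketch of the idea, assuming an in-memory list of documents with hypothetical `source` and `year` fields; it does not use any Watson Explorer API, and a real engine would answer from an index rather than by scanning.

```python
from collections import Counter

# Hypothetical documents; a real search engine would index these instead of scanning.
DOCS = [
    {"id": 1, "text": "hadoop cluster outage", "source": "tickets", "year": 2013},
    {"id": 2, "text": "hadoop upgrade plan", "source": "wiki", "year": 2014},
    {"id": 3, "text": "customer churn report", "source": "wiki", "year": 2014},
    {"id": 4, "text": "hadoop sql benchmark", "source": "tickets", "year": 2014},
]


def search(docs, term, filters=None):
    """Return docs whose text contains `term` and that match every selected facet value."""
    filters = filters or {}
    return [
        d for d in docs
        if term in d["text"].split()
        and all(d[field] == value for field, value in filters.items())
    ]


def facet_counts(hits, fields):
    """Count hits per value of each facet field, as shown in a navigation sidebar."""
    return {field: dict(Counter(d[field] for d in hits)) for field in fields}


hits = search(DOCS, "hadoop")
print(facet_counts(hits, ["source", "year"]))
# Drill down: select the year=2014 facet and search again.
print([d["id"] for d in search(DOCS, "hadoop", {"year": 2014})])
```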
Analytics Accelerators provide the ability to extract insights more quickly
• Text
• Social Media
• Machine Data
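Machine-data analytics usually begins by turning raw log lines into structured fields that can be counted and joined. The sketch below shows only that first step, using a Python regular expression; the log format, sample lines, and field names are assumptions, and the code is not IBM's AQL extractor language or any accelerator's actual rules.

```python
import re
from collections import Counter

# Assumed log format: "<date> <time> <LEVEL> <component>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<component>[\w.-]+): (?P<message>.*)$"
)


def parse(lines):
    """Yield a dict of named fields for each line matching LOG_PATTERN; skip the rest."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()


def errors_by_component(lines):
    """Count ERROR records per component, a first simple 'insight' from machine data."""
    return Counter(r["component"] for r in parse(lines) if r["level"] == "ERROR")


# Hypothetical sample lines, including one that does not match the assumed format.
SAMPLE = [
    "2014-02-25 10:00:01 INFO datanode-3: block report sent",
    "2014-02-25 10:00:02 ERROR datanode-7: disk failure on /data2",
    "garbled line that does not match",
    "2014-02-25 10:00:05 ERROR datanode-7: disk failure on /data3",
    "2014-02-25 10:00:09 ERROR namenode: heap usage above threshold",
]

print(errors_by_component(SAMPLE))
```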
App Store reduces development effort and enables reusability
• Combine Hadoop Apps
Open Source Hadoop Components
[Diagram: capability layers populated with open source projects. Layers: Visualization & Discovery; Analytics Engines; Application Support and Development Tooling; Data Ingest; Runtime (MapReduce); Data Store (HBase); File System (HDFS); Cluster Optimization and Management; Security. Components shown: MapReduce, Pig, Hive, ZooKeeper, Sqoop, HCatalog, Flume, Avro, Lucene, Oozie, Derby, Nutch.]
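MapReduce, the runtime named on this slide, splits a job into a map phase that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that combines each group. The single-process Python sketch below walks through those three phases for a word count; it mimics the programming model only and does not use Hadoop's Java API or spread work across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["Big Data in action", "big SQL for Hadoop", "data in Hadoop"]))
```

On a real cluster the map calls run on the nodes that hold each HDFS block, and the shuffle moves data across the network, which is usually the expensive step.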
BigInsights Enterprise Edition Components
[Diagram: capability layers with components marked as Open Source or IBM. Layers: Visualization & Discovery; Analytics Engines; Application Support and Development Tooling; Data Ingest; Runtime (MapReduce); Data Store (HBase); File System (HDFS); Cluster Optimization and Management; Security. Components shown: IBM InfoSphere BigInsights Integrated Installer; Admin Console; Text Processing Engine and Extractor Library (AQL+HIL); App infrastructure; Splittable Text Compression; High Availability; SystemML; Eclipse; Big SQL; R; BigSheets; Dashboard / visualization; Data Explorer; Jaql; Enhanced Monitoring; Adaptive MapReduce; GPFS-FPO; LDAP; PAM; Guardium; Private firewall; JDBC; Netezza; DB2; Streams; DataStage; Teradata; Gnip; BoardReader; MapReduce; Pig; Hive; ZooKeeper; Sqoop; HCatalog; Flume; Avro; Lucene; Oozie; Derby; Nutch.]
Modular Analytics
[Diagram: Labels: Platform; Analytic Modules; Cloud Computing; GIS Engine; Forecasting Engine; Routing Engine; Inventory Engine; Work Force Engine; Solutions; Core Engine; Streams; BigInsights; IMDB; Column-Store; PureData; In-Flight Data; Self-Structured Data; Frequently Requested Summaries; Low Entropy Data; Mixed Workload Requests.]
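The slide lists five data or workload types next to five engines. Reading them as pairs in the order shown (an assumption, since the diagram's connectors did not survive), a router that sends each analytic module's request to the matching engine might look like the Python sketch below. The category and engine names come from the slide; the routing table, `Request` type, and `route` function are hypothetical.

```python
from dataclasses import dataclass

# Assumed pairing, inferred only from the order in which the slide lists data types and engines.
ROUTES = {
    "in_flight": "Streams",            # In-Flight Data
    "self_structured": "BigInsights",  # Self-Structured Data
    "frequent_summary": "IMDB",        # Frequently Requested Summaries
    "low_entropy": "Column-Store",     # Low Entropy Data
    "mixed_workload": "PureData",      # Mixed Workload Requests
}


@dataclass
class Request:
    module: str     # hypothetical: analytic module issuing the request, e.g. "Forecasting Engine"
    data_kind: str  # one of the ROUTES keys


def route(request: Request) -> str:
    """Return the engine that serves this request under the assumed pairing."""
    try:
        return ROUTES[request.data_kind]
    except KeyError:
        raise ValueError(f"no engine registered for data kind {request.data_kind!r}") from None


print(route(Request("Forecasting Engine", "frequent_summary")))
```

Keeping the pairing in one table rather than inside each module means an engine can be swapped without touching the GIS, forecasting, or other modules.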
Thank you!
Perceptions & Questions
Analysts: Lindy Ryan and John O’Brien
BIG DATA 
IN ACTION 
© Copyright 2014 Radiant Advisors. All Rights Reserved v1.10.000 
Real-World Solution Showcase with Stream Integration 
Inside Analysis – The Briefing Room, February 25, 2014 
Lindy Ryan | Research Director, Data Discovery & Visualization
@lindy_ryan lindy.ryan@radiantadvisors.com 
John O’Brien | Principal Analyst, Modern Data Platforms 
@obrienjw john.obrien@radiantadvisors.com
Big Data in Action: Real-World Solutions
MODERN DATA PLATFORM
[Diagram: three classes of data platform. Sources: Operational Systems, Big Data, Streams. Flexibility Class (Discovery, Scalable, Programs): Apache Hadoop, HDFS, MapReduce, R programs, Pig / Hive, Document Stores. Optimized Class (Discovery & Analytics Oriented): Highly Optimized for Analytics, Highly Specialized for Analytics, MPP, In-memory, MOLAP, Columnar, Graphs, Text Analytics. Reference Class (Stable, Context, SQL): Enterprise Data Warehouses, Master & Reference Data. Hive SQL: extending SQL access to Big Data and Hadoop via Hive and other HDFS SQL engines.]
SQL-ON-HADOOP
[Diagram: three stacks. Apache Hadoop v1: Pig, Hive-QL, MapReduce, Hadoop HDFS. Apache Hadoop v2: Pig, Hive-QL, MapReduce, HCatalog, YARN, Hadoop HDFS. Hadoop distributions and 3rd party: Pig, Hive, MapReduce, HCatalog, YARN, and an MPP engine (Impala, HAWQ, InfiniDB, Presto) over Hadoop HDFS.]
Not all SQL-on-Hadoop is the same:
1. SQL capabilities (SQL-92, analytic functions, SQL:2003? SQL:2011? UDFs?)
2. Scalability (not always the same as Hadoop scalability)
3. Speed (raw query response time, without caching)
File types: ORCFILE, SEQPART, Parquet
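The first criterion, SQL capability, can be probed directly: a query that needs only SQL-92 aggregation should run almost anywhere, while one that uses SQL:2003 analytic (window) functions may be rejected. The sketch below runs both through Python's built-in `sqlite3` module as a stand-in engine, not a SQL-on-Hadoop product; SQLite accepts window functions only from version 3.25, and the `sales` table is hypothetical.

```python
import sqlite3

# Hypothetical rows: (region, month, amount).
ROWS = [("east", 1, 10), ("east", 2, 7), ("west", 1, 5), ("west", 2, 4)]

# SQL-92: one aggregated row per region.
SQL92_QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"

# SQL:2003 analytic function: a running total that keeps every detail row.
SQL2003_QUERY = """
SELECT region, month, amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM sales
ORDER BY region, month
"""


def run(query):
    """Load the sample table into a fresh in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    except sqlite3.OperationalError as exc:
        # An engine lacking the required SQL level typically fails when parsing the query.
        return f"unsupported: {exc}"
    finally:
        conn.close()


print(run(SQL92_QUERY))
print(run(SQL2003_QUERY))
```

Running the same pair of queries against each candidate engine gives a quick, repeatable check of the first criterion before any scalability or speed testing.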
TRADITIONAL FORMS OF DISCOVERY
Spreadsheets
• Most popular business “analytic” tool
• Having access to the data is the value
• Analysts can slice and dice data for insights
Basic Visualizations
• Provide visual representations of data
• Provide insights beyond plain text data
• Simplify complex information & highlight trends
ANALYTIC FORMS OF DISCOVERY
Multi-Faceted, “Search Mode”
• Discovery within structured & unstructured data
• Mine through various forms of data at once
• Google-like search to iterate and deep dive
Advanced Visualizations
• Visualize clusters of data and correlations
• Discover analytic models iteratively with data
• Visual cues and cognitive sciences UX
For more information 
www.RadiantAdvisors.com 
Twitter: @RadiantAdvisors #ModernBI #RediscoveringBI 
RSS: feed://radiantadvisors.com/feed/ 
Email: info@RadiantAdvisors.com 
LinkedIn: www.linkedin.com/company/radiant-advisors 
Subscribe: Rediscovering BI quarterly e-magazine 
www.radiantadvisors.com/rediscoveringbi 
THANK YOU! 
ANALYST QUESTIONS
1. How are you handling the performance or SQL capabilities in Hive with Big SQL?
2. How do users define schema for Big SQL?
3. Can you explain user roles, security, and metadata in the App Store? Who is the store administrator?
Upcoming Topics
This Month: BIG DATA
March: CLOUD
April: BIG DATA
www.insideanalysis.com/webcasts/the-briefing-room
2014 Editorial Calendar at www.insideanalysis.com
THANK YOU for your ATTENTION!
