© Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
What Is Big Data?
Architectures and Practical Use Cases
Tony Pearson
Master Inventor and Senior IT Specialist
IBM Corporation
IBM Systems Technical University, October 5-9 | Hilton Orlando
Abstract
Do you understand the storage implications of big data analytics? This session explains what big data is, provides some practical use cases, and then explains the IBM products that support big data.
This week with Tony Pearson
Day        Time     Topic
Monday     10:15am  Opening Session – Storage
           01:45pm  IBM's Cloud Storage Options
Tuesday    11:30am  Software Defined Storage -- Why? What? How? (repeats Friday)
           03:15pm  The Pendulum Swings Back – Understanding Converged and Hyperconverged Environments
           04:30pm  New Generation of Storage Tiering: Less Management, Lower Cost and Increased Performance
Wednesday  09:00am  What Is Big Data? Architectures and Practical Use Cases
           01:45pm  Data Footprint Reduction – Understanding IBM Storage Efficiency Options
           03:15pm  IBM Spectrum Virtualize – SVC, Storwize and FlashSystem V9000 (repeats Friday)
Thursday   10:15am  IBM Spectrum Scale and Elastic Storage Offerings
           01:45pm  IBM Spectrum Scale for File and Object Storage
           03:15pm  IBM Storage Integration with OpenStack
           05:45pm  Meet the Experts
Friday     09:00am  Software Defined Storage -- Why? What? How?
           10:15am  IBM Spectrum Virtualize – SVC, Storwize and FlashSystem V9000
Agenda
What is Big Data?
Big Data Use Cases
IBM Analytics Platform
IBM Spectrum Scale
What is Big Data?
– Data sets so large and complex that they become difficult to process using relational databases
– The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization
– Analyzing a single large set of related data allows correlations to be found
– These correlations can be used to identify trends, patterns and insights that lead to better decisions
Source: Wikipedia
Traditional Decision Making Process
1. Gather data: Database Administrators manage transaction and application data in the System of Record. Batch processing and Extract, Transform, Load (ETL) jobs move that data into an OLAP cube.
2. Analyze: Business Analysts produce reports from the OLAP cube.
3. Make decisions: Business Executives base strategic planning on historical analysis and speculation. They base day-to-day operations on reports, news and intuition.
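The Extract, Transform, Load (ETL) step above can be sketched in a few lines of Python. This is a minimal illustration, not an IBM tool. The CSV layout, the field names and the (region, month) "cube" are all hypothetical. A real OLAP cube would carry many more dimensions and measures.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export from a system of record (illustrative data only).
RAW = """date,region,amount
2015-01-03,east,120.50
2015-01-17,East ,79.50
2015-02-02,west,200.00
2015-02-09,east,10.00
"""


def extract(text):
    """Extract: read raw rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean inconsistent fields and derive a month dimension."""
    for row in rows:
        yield {
            "month": row["date"][:7],  # "YYYY-MM"
            "region": row["region"].strip().upper(),
            "amount": float(row["amount"]),
        }


def load(records):
    """Load: aggregate into a tiny (region, month) -> total sales cube."""
    cube = defaultdict(float)
    for rec in records:
        cube[(rec["region"], rec["month"])] += rec["amount"]
    return dict(cube)


cube = load(transform(extract(RAW)))
for key in sorted(cube):
    print(key, cube[key])
```

The transform step is where messy source values, such as "East " versus "east", are normalized. Without it, the load step would split one region into two cube cells.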
What has Changed in the Last Few Decades?
Figure: analog versus digital data, 1986 and 2015 (digital share rising from 6% to 99%)
Data sources: transaction and application data, machine data, social media and email, and enterprise content
– 20% structured data
– 80% unstructured data
New Sources of Data to Analyze – the Four V's of Big Data
Volume
– The scale of data has grown beyond relational database capabilities
Variety
– Machine data, enterprise content, social media and email
Velocity
– Computing has advanced to receive and analyze real-time data streams
Veracity
– How much can you trust that the data is right and accurate?
Step 1: Gather and identify sources of data
– Database Administrators: transaction and application data (System of Record)
– Storage Administrators: machine and log data; social media, photos, audio, video and email; enterprise content
Diagram layers: System of Record, System of Engagement, System of Insight
Data is the New Oil
“DATA is the new OIL. In its raw form, oil has little value… Once processed and refined, it helps to power the world!”
New Capabilities to Analyze the Data
– Business Analysts: structured, repeatable, linear analysis with data warehousing, OLAP cubes and reports
– Data Scientists: unstructured, exploratory, iterative analysis with Hadoop, stream computing, text analytics, and visualization and discovery
– Integration and governance support both
Step 2: Analyze data
What does a Data Scientist do?
“It’s no longer hard to find the answer to a given question; the hard part is finding the right question. And as questions evolve, we gain better insight into our ecosystem and our business.”
-- Kevin Weil, Lead Analyst at Twitter
A data scientist must have…
– Strong business acumen
– Modeling, statistics, analytics and math skills
– The ability to communicate findings and tell a story from the data to both business and IT leaders
– An inquisitive mind: exploring, doing “what if?” analyses, and questioning existing assumptions and processes to spot trends, patterns and hidden insight
“Computers are useless. They can only give you answers.”
– Pablo Picasso
Sources: http://www-01.ibm.com/software/data/infosphere/data-scientist/
http://blog.cloudera.com/blog/2010/09/twitter-analytics-lead-kevin-weil-and-a-presenter-at-hadoop-world-interviewed/
Data Information Knowledge Wisdom (DIKW)
– Wisdom (applied): “I’d better stop the car!”
– Knowledge (context): “The traffic light I am driving towards has turned red”
– Information (meaning): “The south-facing light at the corner of Pitt and George streets has turned red”
– Data (raw): “červený” (Czech for “red”), 685 nm, 421 THz, #FF0000
Source: http://legoviews.com/2013/04/06/put-knowledge-into-action-and-enhance-organisational-wisdom-lsp-and-dikw/
Better Decisions for New Business Outcomes
Step 3: Business Executives and Empowered Employees make decisions and take action
– Day-to-day operations based on real-time analytics
– Strategic planning based on science, trends, patterns and insight
Outcomes:
– Know Everything about your Customers
– Innovate New Products at Speed and Scale
– Instant Awareness of Fraud and Risk
– Exploit Instrumented Assets
– Run Zero-latency Operations
Decision Making Process in the Era of Big Data
1. Gather and identify sources of data: Database Administrators and Storage Administrators feed the System of Insight
2. Analyze data: Business Analysts and Data Scientists use statistical models, real-time analytics and dashboards
3. Make decisions and take action: Business Executives and Empowered Employees base strategic planning on science, trends, patterns and insight. They base day-to-day operations on real-time analytics.
Agenda
What is Big Data?
Big Data Use Cases
IBM Analytics Platform
IBM Spectrum Scale
Practical Use Cases – The Analytics Landscape
Competitive advantage rises with the degree of complexity:
Descriptive
– Standard reporting: What happened?
– Ad hoc reporting: How many, how often, where?
– Query/drill down: What exactly is the problem?
– Alerts: What actions are needed?
Predictive
– Simulation: What could happen…?
– Forecasting: What if these trends continue?
– Predictive modeling: What will happen next if…?
Prescriptive
– Optimization: How can we achieve the best outcome?
– Stochastic optimization: How can we achieve the best outcome including the effects of variability?
Based on: Competing on Analytics, Davenport and Harris, 2007
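The step from descriptive to predictive analytics can be shown with a toy data set. The Python sketch below answers “What happened?” with totals and averages. It then answers “What if these trends continue?” with an ordinary least-squares linear trend. The sales figures are hypothetical. A straight-line fit is only one simple forecasting technique among many.

```python
from statistics import mean

# Hypothetical monthly unit sales (illustrative data, not from the session).
sales = [100, 110, 120, 130, 140, 150]

# Descriptive: "What happened?"
total = sum(sales)
average = mean(sales)


def linear_trend(ys):
    """Ordinary least-squares fit of y = a + b*x, with x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


def forecast(ys, steps):
    """Forecasting: "What if these trends continue?" Extend the fitted line."""
    a, b = linear_trend(ys)
    n = len(ys)
    return [a + b * (n + k) for k in range(steps)]


print("total:", total, "average:", average)
print("next 3 months:", forecast(sales, 3))
```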
Innovate New Products and Services at Speed and Scale
Vestas, the world’s largest wind energy company, used big data and IBM technology to increase wind power generation through optimal turbine placement. IBM BigInsights software and IBM Spectrum Scale reduced the time needed to analyze petabytes of data.
“Before, it could take us three weeks to get a response to some of our questions simply because we had to process a lot of data. We expect that we can get answers for the same questions now in 15 minutes.”
– Lars Christian Christensen
If You are Not Paying for it…
Then you are not the Customer, … You are the Product Being Sold!
How much is each user worth to social media companies?
Sources: Geek & Poke comic, “Let’s Talk about Data” by Neha Mehta
360 Degree View of the Customer – A Demographic of One
Customer profile: Amy Bearn, 32, married, mother of 3, accountant
– Telco score: 91
– CPG score: 76
– Fashion score: 88
Retailer (social network and public database): How valuable is Amy to my retail sales? Who does she influence? What do they spend?
Telco company (calling network): How valuable is Amy to my mobile phone network? How likely is she to switch carriers? How many other customers will follow?
Merged network: the social network and the calling network combined
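Questions such as “Who does she influence?” and “How many other customers will follow?” reduce to graph traversal over the merged network. The minimal Python sketch below assumes a made-up set of social and calling-network edges, and all of its names are hypothetical. It unions the two edge sets into one adjacency map. A breadth-first search then finds everyone within one or two hops of Amy.

```python
from collections import deque

# Hypothetical edges (illustrative only): social links and call records.
social = {("amy", "bo"), ("amy", "cara"), ("cara", "dev")}
calls = {("amy", "eli"), ("eli", "fay"), ("bo", "dev")}


def merge_networks(*edge_sets):
    """Union several undirected edge sets into one adjacency map."""
    graph = {}
    for edges in edge_sets:
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def reach_within(graph, start, max_hops):
    """Breadth-first search: everyone within max_hops of start (excluding start)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    seen.pop(start)
    return set(seen)


merged = merge_networks(social, calls)
print(sorted(reach_within(merged, "amy", 1)))
print(sorted(reach_within(merged, "amy", 2)))
```

Neither network alone reaches both Dev and Fay through Amy's direct contacts. Merging the networks surfaces the wider circle of influence.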
Run Zero-Latency Operations
Deep individual customer insight (preferences, interests, likes) drives a real-time decision, which can:
– Direct: initiate a direct response
– Channel: initiate a channel response
– Workflow: initiate a process or workflow
– Enrich: enrich the customer profile
How Target® Figured Out a Teen Girl Was Pregnant Before Her Father Did
Every time you go shopping, you share intimate details about your consumption patterns with retailers.
– Target figured out how to data-mine whether you have a baby on the way
– It looked at historical buying data for all the ladies who had signed up for Target baby registries, and found purchases such as:
  – Unscented soaps and lotions
  – Calcium, magnesium and zinc supplements
– About 25 products help generate a “pregnancy prediction” score and estimate her “baby due date”
– Target sends coupons timed to very specific stages of her pregnancy
Source: http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
“My daughter got this in the mail. She’s still in high school, and you’re sending her coupons for baby clothes and cribs?”
-- Angry father of teen girl
“I had a talk with my daughter,…She’s due in August. I owe you an apology.”
-- Same father, 3 days later
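A score built from about 25 products can be pictured as a weighted sum over a shopper's recent purchases. The sketch below is hypothetical. The products come from the slide, but the weights, the 0 to 100 scaling and the function name are invented for illustration. This is not Target's actual model.

```python
# Hypothetical signal products and weights (illustrative, not Target's model).
WEIGHTS = {
    "unscented lotion": 0.30,
    "unscented soap": 0.20,
    "calcium supplement": 0.20,
    "magnesium supplement": 0.15,
    "zinc supplement": 0.15,
}


def prediction_score(basket):
    """Sum the weights of signal products in a basket, scaled to 0-100.

    Each product counts once; products not in WEIGHTS contribute nothing.
    """
    raw = sum(WEIGHTS.get(item, 0.0) for item in set(basket))
    return round(100 * raw / sum(WEIGHTS.values()))


basket = ["unscented lotion", "calcium supplement", "zinc supplement", "bread"]
print(prediction_score(basket))
```

A production model would learn its weights from the registry data rather than hard-coding them. It would also use purchase timing to estimate the due date.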
Exploit Instrumented Assets
Doctors from the University of Ontario apply big data to neonatal infant monitoring to predict infection. They can detect neonatal patient symptoms up to 24 hours sooner.
– Continuously correlate data: thousands of events each second
– Signal processing and data cleansing
– Heart rate variability
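Heart rate variability is one of the signals correlated in real time. The Python below is a rough sketch only. It computes RMSSD (the root mean square of successive differences between heartbeat intervals), a standard time-domain HRV measure, over a sliding window. It raises an alert when the value falls below a threshold. The window size and threshold are made-up placeholders, not clinical values. This is not the algorithm used in the Ontario project.

```python
import math
from collections import deque


def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


class HrvMonitor:
    """Sliding-window HRV monitor; window size and threshold are hypothetical."""

    def __init__(self, window=8, low_threshold_ms=5.0):
        self.window = deque(maxlen=window)
        self.low_threshold_ms = low_threshold_ms

    def add_beat(self, rr_ms):
        """Add one RR interval; return an alert string when HRV looks low."""
        self.window.append(rr_ms)
        if len(self.window) < self.window.maxlen:
            return None  # not enough beats yet
        value = rmssd(list(self.window))
        if value < self.low_threshold_ms:
            return f"low HRV: RMSSD={value:.1f} ms"
        return None


monitor = HrvMonitor(window=4)
for rr in [400, 401, 400, 401]:
    alert = monitor.add_beat(rr)
print(alert)
```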
Agenda
What is Big Data?
Big Data Use Cases
IBM Analytics Platform
IBM Spectrum Scale
The IBM Big Data Platform Advantage
Analytic applications: BI/reporting, exploration/visualization, functional apps, industry apps, predictive analytics, content analytics
IBM big data platform: visualization & discovery, application development, systems management, accelerators, Hadoop system, stream computing, data warehouse, and information integration & governance
• The platform provides benefit as you move from an entry point to a second and third project
• Shared components and integration between systems lower deployment costs
• Key points of leverage:
  • Reuse text analytics across Streams and BigInsights
  • Hadoop connectors between Streams and Information Integration
  • Common integration, metadata and governance across all engines
  • Accelerators built across multiple engines: common analytics, models and visualization
Simplify your data warehouse
Customer Need
– Business users are hampered by the poor analytic performance of a general-purpose enterprise warehouse: queries take hours to run
– The enterprise data warehouse is encumbered by too much data for too many purposes
– Need to ingest huge volumes of structured data and run multiple concurrent deep analytic queries against it
– IT needs to reduce the cost of maintaining the data warehouse
Value Statement
– Speed and simplicity for deep analytics
– 100s to 1000s of users/second for operational analytics
Customer examples
– Catalina Marketing: executing 10x the amount of predictive workloads with the same staff
IBM PureData Systems: System for Transactions, System for Analytics, System for Operational Analytics
Get started with IBM PureData Systems!
Ad-Hoc versus Operational Analytics
Analyze streaming data in Real time
Customer Need
– Harness and process streaming data sources
– Select valuable data and insights to be stored for further processing
– Quickly process and analyze perishable data, and take timely action
Value Statement
– Significantly reduced processing time and cost: process first, then store only what's valuable
– React in real time to capture opportunities before they expire
Customer examples
– Ufone: Telco Call Detail Record (CDR) analytics for customer churn prevention
Get started with IBM Streams!
Diagram: streaming data sources → source adapters → analytic operators → sink adapters. These run on the Streams runtime with automated and optimized deployment. Applications are built in the Streams Studio IDE, which also provides visualization of deployments.
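The source adapter → analytic operator → sink adapter flow can be sketched with Python generators. This is not the IBM Streams API, whose applications are typically written in SPL, the Streams Processing Language. The CDR fields, the sliding window of five records and the threshold of two dropped calls are hypothetical choices for a churn-style example. Only callers flagged by the operator reach the sink, which mirrors the idea of processing first and storing only what is valuable.

```python
from collections import Counter, deque
from typing import Iterable, Iterator, List, NamedTuple


class CallRecord(NamedTuple):
    """Hypothetical call detail record (CDR) with just two fields."""
    caller: str
    dropped: bool


def source_adapter(records: Iterable[CallRecord]) -> Iterator[CallRecord]:
    """Stand-in for a source adapter: yields records as they arrive."""
    yield from records


def churn_operator(stream: Iterator[CallRecord], window: int = 5,
                   threshold: int = 2) -> Iterator[str]:
    """Analytic operator: over the last `window` records, emit a caller the
    first time they accumulate `threshold` or more dropped calls."""
    recent: deque = deque(maxlen=window)
    flagged = set()
    for rec in stream:
        recent.append(rec)
        drops = Counter(r.caller for r in recent if r.dropped)
        for caller, count in drops.items():
            if count >= threshold and caller not in flagged:
                flagged.add(caller)
                yield caller


def sink_adapter(stream: Iterator[str], store: List[str]) -> None:
    """Sink adapter: persist only the valuable results, not the raw stream."""
    for item in stream:
        store.append(item)


cdrs = [
    CallRecord("a", True),
    CallRecord("b", False),
    CallRecord("a", True),
    CallRecord("c", True),
    CallRecord("b", True),
]
at_risk: List[str] = []
sink_adapter(churn_operator(source_adapter(cdrs)), at_risk)
print(at_risk)
```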
Dominant Players vs. Contender Platforms

Category               Dominant player       Contender platform       Supporters of contender platform
OS                     Microsoft Windows     Linux                    IBM, RedHat, SUSE, Oracle and others
Tape                   Quantum DLT           Linear Tape Open (LTO)   IBM, HP, Certance and others
Cloud Management       Amazon Web Services   OpenStack                IBM, HP, Rackspace, RedHat, Dell, Cisco, VMware and others
Big Data & Analytics   Cloudera              Open Data Platform       IBM, Pivotal, Hortonworks and others
IBM InfoSphere BigInsights is a 100% standard Hadoop distribution
– By default, open source components are always deployed
– Elect to use proprietary capabilities depending on your needs
– In some cases, proprietary capabilities offer significant benefits
Open standards first, but with freedom of choice
Open source components: HDFS, YARN, Hive, MapReduce, Pig
Optional IBM capabilities:
– IBM Spectrum Scale: share data with non-Hadoop applications and simplify data management
– Big SQL: re-use existing tools and expertise, and avoid additional development costs
– Adaptive MapReduce: boost performance, support time-critical workloads, do more with less
– Platform Symphony: true multi-tenancy to boost service levels and avoid duplication of infrastructure
– BigSheets: simplify access for end users and minimize software development
IBM BigInsights for Apache Hadoop
Three new user-centric modules founded on an Open Data Platform:
– IBM BigInsights Analyst (Business Analyst): Big SQL, BigSheets
– IBM BigInsights Data Scientist (Data Scientist): Big SQL, BigSheets, Text Analytics, System ML on Big R, Distributed R
– IBM BigInsights Enterprise Management (Administrator): Spectrum Scale, Platform Symphony
IBM Open Platform with Apache Hadoop is IBM’s own 100% open source Apache Hadoop distribution. IBM will include the ODP common kernel when available.
Platform Symphony Integrates with Hadoop
YARN uses a pluggable architecture for schedulers.
– The FIFO, Fair and Capacity schedulers are implemented this way
– Symphony EGO is also implemented this way
Therefore, the scheduler is completely transparent to YARN applications, and ISV certification for Platform Symphony is not required.
Like other schedulers, queues and policies are defined in Platform Symphony EGO.
Diagram: YARN (open source) with pluggable FIFO, Fair, Capacity and Symphony EGO schedulers serving App1, App2 and App3
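The claim that a scheduler plug-in is invisible to applications can be modeled in a few lines. This Python sketch is a hypothetical model, not YARN's Java scheduler interface. Applications only talk to a `ResourceManager`. Swapping `FifoScheduler` for `FairScheduler` changes the launch order without touching any application code. In real Hadoop deployments, the plug-in is selected with the `yarn.resourcemanager.scheduler.class` property in `yarn-site.xml`.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Dict, List, Optional


class Scheduler(ABC):
    """Hypothetical plug-in interface standing in for YARN's scheduler slot."""

    @abstractmethod
    def submit(self, app: str, queue: str) -> None: ...

    @abstractmethod
    def next_app(self) -> Optional[str]: ...


class FifoScheduler(Scheduler):
    """Launch applications strictly in submission order, ignoring queues."""

    def __init__(self) -> None:
        self.pending: deque = deque()

    def submit(self, app: str, queue: str) -> None:
        self.pending.append(app)

    def next_app(self) -> Optional[str]:
        return self.pending.popleft() if self.pending else None


class FairScheduler(Scheduler):
    """Round-robin across queues so one busy queue cannot starve the others."""

    def __init__(self) -> None:
        self.queues: Dict[str, deque] = {}
        self.order: deque = deque()  # queue names in round-robin order

    def submit(self, app: str, queue: str) -> None:
        if queue not in self.queues:
            self.queues[queue] = deque()
            self.order.append(queue)
        self.queues[queue].append(app)

    def next_app(self) -> Optional[str]:
        for _ in range(len(self.order)):
            name = self.order[0]
            self.order.rotate(-1)  # move this queue to the back of the line
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


class ResourceManager:
    """Applications only see this class; the scheduler is an injected plug-in."""

    def __init__(self, scheduler: Scheduler) -> None:
        self.scheduler = scheduler

    def submit(self, app: str, queue: str) -> None:
        self.scheduler.submit(app, queue)

    def run_all(self) -> List[str]:
        launched = []
        while (app := self.scheduler.next_app()) is not None:
            launched.append(app)
        return launched


def demo(scheduler: Scheduler) -> List[str]:
    """Submit the same three apps and report the launch order."""
    rm = ResourceManager(scheduler)
    for app, queue in [("App1", "etl"), ("App2", "etl"), ("App3", "adhoc")]:
        rm.submit(app, queue)
    return rm.run_all()


print(demo(FifoScheduler()))
print(demo(FairScheduler()))
```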
Spark, a Complement to Hadoop
• Spark complements Hadoop; it does not replace it
• Provides distributed memory abstractions for clusters to support applications that repeatedly use a working set of data, such as:
  • Iterative algorithms (machine learning)
  • Interactive data mining tools (R, Python, …)
• Spark programming model: Resilient Distributed Datasets (RDDs)
  • Immutable collections partitioned across the cluster that can be rebuilt if a partition is lost
  • Created by transforming data in stable storage using data flow operators (map, filter, group-by, …)
  • Can be cached across parallel operations
• Spark uses HDFS or IBM Spectrum Scale
  • Can use any Hadoop data source
  • Uses Hadoop InputFormats and OutputFormats
• Spark runs on YARN
  • Can run on the same cluster as MapReduce
• Spark works with the Hadoop ecosystem
  • Flume, Sqoop, HBase
• Spark architectural considerations
  • Keep the dataset in memory
  • Spark programs can be bottlenecked by any resource in the cluster: CPU, network bandwidth or memory. Most often, if the data fits in memory, the bottleneck is network bandwidth.
Diagram: Spark on YARN, over HDFS or IBM Spectrum Scale
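The RDD idea is that immutable, partitioned data is rebuilt from its lineage rather than replicated. A toy model can illustrate it. This is not PySpark. `ToyRDD` and its methods are hypothetical names chosen to mirror the map and filter operators listed above. Unlike Spark, which caches only when asked, this toy caches every partition it computes.

```python
class ToyRDD:
    """Toy model of RDD lineage (not the Spark API)."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = source  # partitions in "stable storage" (base dataset only)
        self._parent = parent  # dataset this one was derived from
        self._fn = fn          # per-partition transformation applied to the parent
        self._cache = {}       # partition index -> computed records

    @classmethod
    def from_storage(cls, partitions):
        return cls(source=[list(p) for p in partitions])

    def num_partitions(self):
        if self._source is not None:
            return len(self._source)
        return self._parent.num_partitions()

    def compute(self, i):
        """Return partition i, recomputing it from lineage if it is not cached."""
        if i not in self._cache:
            if self._source is not None:
                self._cache[i] = list(self._source[i])
            else:
                self._cache[i] = self._fn(self._parent.compute(i))
        return self._cache[i]

    def map(self, f):
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def lose_partition(self, i):
        """Simulate losing partition i, e.g. when a worker fails."""
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]


base = ToyRDD.from_storage([[1, 2, 3], [4, 5, 6]])
even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())
even_squares.lose_partition(1)  # drop a computed partition...
print(even_squares.collect())   # ...and it is rebuilt from its lineage
```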
IBM InfoSphere BigInsights – Big SQL
IBM’s SQL for Hadoop
• Makes Hadoop data accessible to a wider audience
• Familiar, widely known syntax
• Leverages native Hadoop data sources: CSV, SEQ, Parquet, RC, Avro, ORC, JSON and custom formats
Complements the Data Warehouse
• Exploratory analytics
• Sandbox, Data Lake
Included in IBM BigInsights
Use familiar SQL tools
• Cognos, SPSS, Tableau, MicroStrategy
Diagram: SQL-based applications → Big SQL optimized SQL MPP run-time → native Hadoop data sources
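The core idea of running ordinary SQL against data that lives in files such as CSV can be tried with Python's built-in `sqlite3` module. SQLite is only a stand-in for an SQL-on-Hadoop engine here. It is neither MPP nor Big SQL, and the file contents and table name are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV file contents, as they might sit in a data lake.
CSV_DATA = """sensor,reading
t1,20.5
t2,22.0
t1,21.5
"""


def query_csv(csv_text, sql):
    """Expose CSV rows as a SQL table in an in-memory SQLite database and run
    a query. SQLite stands in for an SQL-on-Hadoop engine in this sketch."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (sensor TEXT, reading REAL)")
        # Named placeholders are filled from each row dict; REAL affinity
        # converts the numeric text from the CSV into floating-point values.
        conn.executemany("INSERT INTO readings VALUES (:sensor, :reading)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


result = query_csv(
    CSV_DATA,
    "SELECT sensor, AVG(reading) FROM readings GROUP BY sensor ORDER BY sensor",
)
print(result)
```

The appeal described on the slide is the same: analysts keep writing familiar SQL, and the engine handles the underlying file formats.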
Architecture Pattern for Big Data Implementation
Data sources: application and transaction data, machine data, social media and email, enterprise content
Streams (data in motion):
– Stream processing and real-time analytics over video/audio and network/sensor data, including entity analytics and predictive models
Data at rest:
– Landing area, analytics zone and archive for raw data: text analytics, data mining, entity analytics, machine learning
– Exploration, integrated warehouse and mart zones for structured data, fed by data integration and master data
Information ingestion and operational information
Consumers: decision management, BI and predictive analytics, navigation and discovery, intelligence analysis
Analysis styles: real-time, operational, predictive, discovery, deep reflection
Information governance, security and business continuity span all layers
Agenda
What is Big Data?
Big Data Use Cases
IBM Analytics Platform
IBM Spectrum Scale
Why use IBM Spectrum Scale™
Extreme Scalability
– Add or remove nodes and storage without disruption or performance impact to applications
Universal Access to Data
– All servers and clients have access to data through a variety of file and object protocols
High Performance
– Parallel access with no hot spots
Proven Reliability
– Used by over 200 of the top 500 supercomputers
– Survive any node or storage failure with distributed RAID and redundant components
Hadoop Analytics – HDFS vs IBM Spectrum Scale™
– HDFS: data from sources such as JFS2, NTFS and EXT4 file systems is loaded into HDFS for analysis. Results are saved, and the rest is discarded.
– IBM Spectrum Scale: application data stored on IBM Spectrum Scale is readily available for analytics, and results are saved. The IBM Hadoop Connector allows MapReduce programs to process the data without application changes.
Data sources: a mashup of structured and unstructured data from a variety of sources
Actionable insights: answers to the Who, What, Where, When, Why and How
Business intelligence & predictive analytics:
> Competitive advantages
> New threats and fraud
> Changing needs and forecasting
> And more!
HDFS versus IBM Spectrum Scale™

Hadoop HDFS | IBM Spectrum Scale
NameNode HA added in version 2.0, in an active/passive configuration | No single point of failure; distributed metadata in an active/active configuration since 1998
Difficult to ingest data; special tools required | Ingest data using policies for data placement
Lacking enterprise readiness | Enterprise ready, with support for advanced storage features (encryption, DR, replication, software RAID, etc.)
Large block sizes; poor support for small files | Variable block sizes, suited to multiple types of data and metadata access patterns
Compute and storage tightly coupled, leading to very low CPU utilization | Scale compute and storage independently (policy-based ILM)
Single-purpose, Hadoop MapReduce only | Versatile, multi-purpose, hybrid storage (locality and shared)
Non-POSIX file system with obscure commands; does not support in-place updates | POSIX file system, easy to use and manage
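The difference in in-place updates is easy to demonstrate on any POSIX file system. A program can open an existing file, seek to an offset and overwrite bytes without rewriting the rest of the file. This Python sketch uses only the standard library, and the file name and record layout are hypothetical. HDFS, by contrast, supports creating and appending to files but not overwriting bytes in the middle of an existing file.

```python
import os
import tempfile


def update_in_place(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes at `offset`, leaving the rest of the
    file untouched. Relies on POSIX random-write semantics."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "order.dat")  # hypothetical record file
    with open(path, "wb") as f:
        f.write(b"status=PENDING;id=42")
    update_in_place(path, len(b"status="), b"SHIPPED")
    with open(path, "rb") as f:
        contents = f.read()

print(contents)
```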
IBM Spectrum Scale™ – File Placement Optimization
File Placement Optimization (FPO) creates a “shared nothing” cluster similar to HDFS in Hadoop environments.
• Spectrum Scale avoids the need for a central NameNode, a common failure point in HDFS
  • This avoids long recovery times in the event of NameNode failure
• Spectrum Scale can intermix FPO with standard NSD server and client nodes in the same cluster
• POSIX compliance, which is key to avoiding data islands
• Robustness and performance at massive scale, and maturity
Diagram: HDFS with a NameNode and Secondary NameNode, compared with Spectrum Scale nodes using internal/direct-attach and SAN storage over a TCP/IP or RDMA network
Share-Nothing versus Shared-Disk Deployments
– Share-nothing: Spectrum Scale FPO can keep 1, 2 or 3 replicas of the data. Need more storage capacity? Add another node!
– Shared-disk: Spectrum Scale and Elastic Storage Server reduce storage to one RAID-protected copy of the data, so compute and storage capacity scale separately. Need more compute? Add another node!
– Raw storage needed: 3x (three replicas) versus 1.3x (one RAID-protected copy)
– In both cases, nodes are connected over TCP/IP or RDMA
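The 3x versus 1.3x comparison is simple arithmetic. Full replication multiplies raw capacity by the number of copies. Parity-based protection multiplies it by (data + parity) / data. The Python below works this out for 100 TB of data. The 8+2 and 8+3 stripe geometries are assumed examples of parity layouts, and they bracket the slide's approximate 1.3x figure.

```python
def raw_capacity_replicated(usable_tb: float, replicas: int) -> float:
    """Raw capacity when every block is stored `replicas` times (FPO/HDFS style)."""
    return usable_tb * replicas


def raw_capacity_parity(usable_tb: float, data_strips: int, parity_strips: int) -> float:
    """Raw capacity when each stripe holds `data_strips` of data plus
    `parity_strips` of parity (RAID / erasure-code style)."""
    return usable_tb * (data_strips + parity_strips) / data_strips


usable = 100.0  # TB of data to protect (illustrative figure)
print("3 replicas:", raw_capacity_replicated(usable, 3))  # 300.0
print("8+2 parity:", raw_capacity_parity(usable, 8, 2))   # 125.0
print("8+3 parity:", raw_capacity_parity(usable, 8, 3))   # 137.5
```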
IBM Spectrum Scale™ – Software, Systems or Cloud Services
Software
• Install software on your own choice of industry-standard x86 or POWER servers
Pre-built Systems
• Elastic Storage Server with distributed RAID
• Storwize V7000 Unified
Cloud Services
• Spectrum Scale can be deployed on any cloud
Session summary
Big data is being generated by
everything around us
– Every digital process and social
media exchange produces it
– Systems, sensors and mobile
devices transmit it
Big data is arriving from multiple
sources at amazing velocities,
volumes and varieties
To extract meaningful value from
big data, you need optimal
processing power, storage,
analytics capabilities, and skills
Sources: The Economist, and special thanks to
Dr. Bob Sutor, IBM VP, Business Solutions & Mathematical Sciences
Session Evaluations
YOUR OPINION MATTERS!
Submit four or more session
evaluations by 5:30pm Wednesday
to be eligible for drawings!
*Winners will be notified Thursday morning. Prizes must be picked up at
registration desk, during operating hours, by the conclusion of the event.
Big Data & Analytics
Building Big Data and Analytics Solutions in the Cloud
http://www.redbooks.ibm.com/abstracts/redp5085.html?Open
o IBM BigInsights
o IBM PureData System for Hadoop
o IBM PureData System for Analytics
o IBM PureData System for Operational Analytics
o IBM InfoSphere Warehouse
o IBM Streams
o IBM InfoSphere Data Explorer (Watson Explorer)
o IBM InfoSphere Data Architect
o IBM InfoSphere Information Analyzer
o IBM InfoSphere Information Server
o IBM InfoSphere Information Server for Data Quality
o IBM InfoSphere Master Data Management Family
o IBM InfoSphere Optim Family
o IBM InfoSphere Guardium Family
“Analytics is about examining data to derive interesting and relevant
trends and patterns, which can be used to inform decisions, optimize
processes, and even drive new business models.”
Research Paper
“In this paper, we revisit the debate on
the need of a new non-POSIX storage
stack for cloud analytics and argue,
based on an initial evaluation, that it can
be built on traditional POSIX-based
cluster filesystems.”
Hadoop for the Enterprise
http://www.ibm.com/software/data/infosphere/hadoop/enterprise.html
IBM BigInsights for Apache Hadoop provides a 100% open source platform and offers
analytic and enterprise capabilities for Hadoop.
IBM Tucson Executive Briefing Center
Tucson, Arizona is home for
storage hardware and software
design and development
IBM Tucson Executive
Briefing Center offers:
–Technology briefings
–Product demonstrations
–Solution workshops
Take a video tour!
– http://youtu.be/CXrpoCZAazg
About the Speaker
Tony Pearson is a Master Inventor and Senior managing consultant for the IBM System Storage™ product line. Tony joined
IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. In his current role, Tony presents briefings
on storage topics covering the entire System Storage product line, Tivoli storage software products, and topics related to Cloud
Computing. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with
strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products.
Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners
every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine, and the #1
most-read IBM blog on IBM’s developerWorks. The blog has been published in a series of books, Inside System Storage:
Volume I through V.
Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware
and software products. Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in
Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and
software products.
9000 S. Rita Road
Bldg 9032 Floor 1
Tucson, AZ 85744
+1 520-799-4309 (Office)
tpearson@us.ibm.com
Tony Pearson
Master Inventor,
Senior IT Specialist
IBM System Storage™
Additional Resources from Tony Pearson
Email:
tpearson@us.ibm.com
Twitter:
twitter.com/az990tony
Blog:
ibm.co/Pearson
Books:
www.lulu.com/spotlight/990_tony
IBM Expert Network on SlideShare:
www.slideshare.net/az990tony
Facebook:
www.facebook.com/tony.pearson.16121
LinkedIn:
www.linkedin.com/profile/view?id=103718598
Continue growing your IBM skills
ibm.com/training
provides a comprehensive
portfolio of skills and career
accelerators that are
designed to meet all your
training needs.
If you can’t find the training that is right for
you with our Global Training Providers, we
can help.
Contact IBM Training at dpmc@us.ibm.com
Global Skills Initiative
Trademarks and Disclaimers
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other
countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks
of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a
registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open
Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband
Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO
Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental
costs and performance characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not
constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor
announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to
non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or
delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's
current investment and development activities as a good faith effort to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will
experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the
workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM
representative or Business Partner for the most current pricing in your geography.
Photographs shown may be engineering prototypes. Changes may be incorporated in production models.
© IBM Corporation 2015. All rights reserved.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the
World Wide Web at http://www.ibm.com/legal/copytrade.shtml.
ZSP03490-USEN-00

IBM Big Data Analytics Concepts and Use Cases

  • 5. 5 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. What is Big Data? Data sets so large and complex that it becomes difficult to process using relational databases The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization Analysis of a single large set of related data allows correlations to be found Can be used to identify trends, patterns and insights to make better decisions Source: Wikipedia
  • 6. 6 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. OLAP cube Extract Transform Load (ETL) Strategic planning based on historical analysis and speculation Day-to-day operations based on reports, news, intuition Business Executives Make decisions 3 Traditional Decision Making Process Reports Batch Processing Transaction and Application data Database Administrators System of Record Gather data 1 Business Analysts Analyze 2
  • 7. 7 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. What has Changed in the Last Few Decades? 1986 2015 6% 99% Analog data Digital data Transaction and Application data Machine data Social media, email Enterprise content 20% Structured data 80% Unstructured data
  • 8. 8 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. New Sources of Data to Analyze – the Four V’s of big data Volume – Scale of data has grown beyond relational database capabilities Variety – Machine data, enterprise content, and social media and email Velocity – Computing has advanced to receive and analyze real-time data streams Veracity – How much can you trust the data is right and accurate? Transaction and Application data Database Administrators System of Record System of Engagement System of Insight Machine Data, log data Social media, photos, audio, video, email Enterprise content Storage Administrators Gather and Identify sources of data 1
  • 9. 9 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Data is the New Oil DATA is the new OIL In its raw form, oil has little value… Once processed and refined, it helps to power the world!
  • 10. 10 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Structured, Repeatable, Linear OLAP cube Unstructured, Exploratory, Iterative New Capabilities to Analyze the Data Reports Visualization and Discovery Hadoop Data warehousing Stream Computing Integration and Governance Text Analytics Business Analyst Data Scientist Analyze data2
  • 11. 11 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. What does a Data Scientist do? “It’s no longer hard to find the answer to a given question; the hard part is finding the right question. And as questions evolve, we gain better insight into our ecosystem and our business.” -- Kevin Weil, Lead Analyst at Twitter A data scientist must have… – Strong business acumen – Modeling, statistics, analytics and math skills – Ability to communicate findings, tell a story from the data, to both business and IT leaders Inquisitive: exploring, doing “what if?” analyses, questioning existing assumptions and processes to spot trends, patterns and hidden insight. Computers are useless. They can only give you answers. – Pablo Picasso Source: http://www-01.ibm.com/software/data/infosphere/data-scientist/ http://blog.cloudera.com/blog/2010/09/twitter-analytics-lead-kevin-weil-and-a-presenter-at-hadoop-world-interviewed/
  • 12. 12 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Data Information Knowledge Wisdom (DIKW) Wisdom Applied I better stop the car! Knowledge Context The traffic light I am driving towards has turned red Information Meaning South-facing light at corner of Pitt and George streets has turn red Data Raw červený 685 nm, 421 THz, #FF0000 http://legoviews.com/2013/04/06/put-knowledge-into-action-and-enhance-organisational-wisdom-lsp-and-dikw/
  • 13. 13 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Better Decisions for New Business Outcomes Day-to-day operations based on real-time analytics Strategic planning based on science, trends, patterns and insight Know Everything about your Customers Innovate new products at Speed and Scale Instant Awareness of Fraud and Risk Exploit Instrumented Assets Run Zero-latency Operations Business Executive Make Decisions and Take Action 3 Empowered Employees
  • 14. 14 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. statistical models Decision Making Process in the Era of big data Real-time Analytics Database Administrators System of Insight Strategic planning based on science, trends, patterns and insight Dashboard Storage Administrators Gather and Identify sources of data 1 Day-to-day operations based on real-time analytics Business Executives Empowered Employees Make Decisions and Take Action 3Data Scientists Business Analysts Analyze data2
  • 15. What is Big Data? Big Data Use Cases IBM Analytics Platform IBM Spectrum Scale Agenda
  • 16. 16 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Practical Use Cases – The Analytics Landscape Degree of Complexity CompetitiveAdvantage Standard Reporting Ad hoc reporting Query/drill down Alerts Simulation Forecasting Predictive modeling Optimization What exactly is the problem? What will happen next if ? What if these trends continue? What could happen…. ? What actions are needed? How many, how often, where? What happened? Stochastic Optimization Based on: Competing on Analytics, Davenport and Harris, 2007 Descriptive Prescriptive Predictive How can we achieve the best outcome? How can we achieve the best outcome including the effects of variability?
  • 17. 17 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Innovate New Products and Services at Speed and Scale Vestas, the world’s largest wind energy company, was able to use big data and IBM technology to increase wind power generation through optimal turbine placement. Reducing the time to analyze petabytes of data with IBM Big Insights software and IBM Spectrum Scale “Before, it could take us three weeks to get a response to some of our questions simply because we had to process a lot of data. We expect that we can get answers for the same questions now in 15 minutes.” – Lars Christian Christensen
  • 18. 18 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. If You are Not Paying for it… Then you are not the Customer, … You are the Product Being Sold! How much is each user worth to Social Media companies? Sources: Geek & Poke comic, “Let’s Talk about Data” by Neha Mehta
  • 19. 19 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Social Network Public Database How valuable is Amy to my retail sales? Who does she influence? What do they spend? Retailer Amy Bearn 32, Married, mother of 3, Accountant Telco Score: 91 CPG Score: 76 Fashion Score: 88 Telco company How valuable is Amy to my mobile phone network? How likely is she to switch carriers? How many other customers will follow Merged Network Calling Network 360 Degree View of the Customer – A Demographic of One
  • 20. 20 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Deep Individual Customer Insight • Preferences • Interests • Likes Run Zero-Latency Operations Direct Channel Workflow Enrich Initiate Direct Response Initiate Channel Response Initiate Process or Workflow Enrich Customer Profile Real-time Decision
  • 21. 21 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. How Target® Figured Out a Teen Girl Was Pregnant Before Her Father Did Every time you go shopping, you share intimate details about your consumption patterns with retailers. Target has figured out how to data-mine whether you have a baby on the way Looked at historical buying data for all the ladies who had signed up for Target baby registries – Unscented soaps and lotions – Calcium, magnesium and zinc supplements About 25 products help generate “pregnancy prediction” score and her “baby due date” Target sends coupons timed to very specific stages of her pregnancy Source: http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/ “My daughter got this in the mail. She’s still in high school, and you’re sending her coupons for baby clothes and cribs?” -- Angry father of teen girl “I had a talk with my daughter,…She’s due in August. I owe you an apology.” -- Same father, 3 days later
  • 22. 22 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Exploit Instrumented Assets Doctors from University of Ontario apply big data to neonatal infant monitoring to predict infection Detect Neonatal Patient Symptoms Up to 24 Hours sooner Continuously correlate data Thousands of events each second Signal Processing and Data Cleansing Heart Rate Variability
  • 23. What is Big Data? Big Data Use Cases IBM Analytics Platform IBM Spectrum Scale Agenda
  • 24. 24 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. The IBM big data platform advantage BI / Reporting BI / Reporting Exploration / Visualization Functional App Industry App Predictive Analytics Content Analytics Analytic Applications IBM big data platform Systems Management Application Development Visualization & Discovery Accelerators Information Integration & Governance Hadoop System Stream Computing Data Warehouse • The platform provides benefit as you move from an entry point to a second and third project • Shared components and integration between systems lowers deployment costs • Key points of leverage • Reuse text analytics across streams and BigInsights • Hadoop connectors between Streams and Information Integration • Common integration, metadata and governance across all engines • Accelerators built across multiple engines – common analytics, models, and visualization
  • 25. 25 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Simplify your data warehouse Customer Need – Business users are hampered by the poor performance of analytics of a general-purpose enterprise warehouse – queries take hours to run – Enterprise data warehouse is encumbered by too much data for too many purposes – Need to ingest huge volumes of structured data and run multiple concurrent deep analytic queries against it – IT needs to reduce the cost of maintaining the data warehouse Value Statement – Speed and Simplicity for deep analytics – 100s to 1000s users/second for operation analytics Customer examples – Catalina Marketing – executing 10x the amount of predictive workloads with the same staff System for Transactions System for Analytics System for Operational Analytics Get started with IBM PureData Systems!
  • 26. 26 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Ad-Hoc versus Operational Analytics
  • 27. 27 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Analyze streaming data in Real time Customer Need – Harness and process streaming data sources – Select valuable data and insights to be stored for further processing – Quickly process and analyze perishable data, and take timely action Value Statement – Significantly reduced processing time and cost – process and then store what’s valuable – React in real-time to capture opportunities before they expire Customer examples – Ufone – Telco Call Detail Record (CDR) analytics for customer churn prevention Get started with IBM Streams! Visualization Streams Runtime Deployments Sync Adapters Analytic Operators Source Adapters Automated and Optimized Deployment Streaming Data Sources Streams Studio IDE
  • 28. 28 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. Dominant Players vs. Contender platforms OS Tape Cloud Management Big Data & Analytics Dominant Player Microsoft Windows Quantum DLT Amazon Web Services Cloudera Contender platform Linux Linear Tape Open (LTO) OpenStack Open Data Platform Supporters of Contender platform IBM, RedHat, SUSE, Oracle and others IBM, HP, Certance and others IBM, HP, Rackspace, RedHat, Dell, Cisco, VMware and others IBM, Pivotal, Hortonworks and others
  • 29. 29 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. IBM InfoSphere BigInsights is a 100% standard Hadoop distribution By default, open source components are always deployed Elect to use proprietary capabilities depending on your needs In some cases, proprietary capabilities offer significant benefits Open standards first, but with freedom of choice HDFS YARN HIVE MapReduce PIG Spectrum Scale Platform Symphony Big SQL Adaptive MapReduce BigSheets Share data with non-Hadoop applications and simplify data management Re-use existing tools and expertise, Avoid additional development costs Boost performance, support time-critical workloads, do more with less True multi-tenancy to boost service levels and avoid duplication on infrastructure Simplify access for end-users, minimize software development
• 30. IBM BigInsights for Apache Hadoop: three new user-centric modules founded on an Open Data Platform
– IBM BigInsights Analyst (business analyst): Big SQL, BigSheets
– IBM BigInsights Data Scientist (data scientist): Big SQL, BigSheets, Text Analytics, ML on Big R, Distributed R
– IBM BigInsights Enterprise Management (administrator): Spectrum Scale, Platform Symphony
All three modules build on IBM Open Platform with Apache Hadoop, IBM's own 100% open source Apache Hadoop distribution. IBM will include the ODP common kernel when available.
• 31. Platform Symphony Integrates with Hadoop
– YARN uses a pluggable architecture for schedulers
  – The FIFO, Fair and Capacity schedulers are implemented this way
  – Symphony EGO is also implemented this way
– Therefore, the scheduler is completely transparent to YARN applications
– ISV certification for Platform Symphony is not required
– Like other schedulers, queues and policies are defined in Platform Symphony EGO
[Diagram: App1, App2 and App3 run on YARN (open source), which plugs in the FIFO, Fair, Capacity or Symphony EGO scheduler]
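The point of a pluggable scheduler is that applications talk only to the resource manager, never to the scheduling policy. The toy Python model below illustrates that separation; it is not YARN's API. In real Hadoop the ResourceManager's scheduler is chosen with the `yarn.resourcemanager.scheduler.class` property in `yarn-site.xml`, while the class names here (`Scheduler`, `FifoScheduler`, `RoundRobinQueueScheduler`, `ResourceManager`) are hypothetical, and the round-robin policy is a deliberately simplified stand-in for a fair scheduler.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, Dict, List, Optional


class Scheduler(ABC):
    """Interface every pluggable policy implements."""

    @abstractmethod
    def submit(self, app: str, queue: str) -> None: ...

    @abstractmethod
    def next_app(self) -> Optional[str]: ...


class FifoScheduler(Scheduler):
    def __init__(self) -> None:
        self._apps: Deque[str] = deque()

    def submit(self, app: str, queue: str = "default") -> None:
        self._apps.append(app)  # queue name ignored: one global FIFO

    def next_app(self) -> Optional[str]:
        return self._apps.popleft() if self._apps else None


class RoundRobinQueueScheduler(Scheduler):
    """Toy 'fair-ish' policy: take one app from each queue in turn."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}
        self._order: Deque[str] = deque()

    def submit(self, app: str, queue: str = "default") -> None:
        if queue not in self._queues:
            self._queues[queue] = deque()
            self._order.append(queue)
        self._queues[queue].append(app)

    def next_app(self) -> Optional[str]:
        for _ in range(len(self._order)):
            queue = self._order[0]
            self._order.rotate(-1)  # move this queue to the back of the line
            if self._queues[queue]:
                return self._queues[queue].popleft()
        return None


class ResourceManager:
    """Knows only the Scheduler interface, so apps never see the policy."""

    def __init__(self, scheduler: Scheduler) -> None:
        self.scheduler = scheduler

    def submit(self, app: str, queue: str = "default") -> None:
        self.scheduler.submit(app, queue)

    def run_all(self) -> List[str]:
        order = []
        while (app := self.scheduler.next_app()) is not None:
            order.append(app)
        return order


if __name__ == "__main__":
    for policy in (FifoScheduler(), RoundRobinQueueScheduler()):
        rm = ResourceManager(policy)
        rm.submit("App1", "a")
        rm.submit("App2", "a")
        rm.submit("App3", "b")
        print(type(policy).__name__, rm.run_all())
```

Swapping `FifoScheduler()` for `RoundRobinQueueScheduler()` changes the run order (App1, App2, App3 versus App1, App3, App2), but the submission calls made for App1, App2 and App3 stay exactly the same.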
• 32. Spark, a Complement to Hadoop
– Spark complements Hadoop; it does not replace it
– Provides distributed memory abstractions for clusters to support applications that repeatedly use a working set of data, such as:
  – Iterative algorithms (machine learning)
  – Interactive data mining tools (R, Python, …)
– Spark programming model: Resilient Distributed Datasets (RDDs)
  – Immutable collections partitioned across the cluster that can be rebuilt if a partition is lost
  – Created by transforming data in stable storage using data flow operators (map, filter, group-by, …)
  – Can be cached across parallel operations
– Spark uses HDFS or IBM Spectrum Scale
  – Can use any Hadoop data source
  – Uses Hadoop InputFormats and OutputFormats
– Spark runs on YARN
  – Can run on the same cluster as MapReduce
– Spark works with the Hadoop ecosystem
  – Flume, Sqoop, HBase
– Spark architectural considerations
  – Keep the dataset in memory
  – Spark programs can be bottlenecked by any resource in the cluster: CPU, network bandwidth or memory. Most often, if the data fits in memory, the bottleneck is network bandwidth.
[Diagram: Spark running on YARN, over HDFS or IBM Spectrum Scale]
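The RDD properties listed above (immutable partitions, creation through transformations, optional caching, and rebuilding a lost partition from its lineage) can be sketched with a toy, single-process Python class. This is not Spark: `ToyRDD`, `lose_partition` and the per-partition lineage are hypothetical simplifications. In PySpark the comparable chain would be `sc.parallelize(data, 2).map(...).filter(...).cache()`, with Spark itself tracking lineage and recomputing lost partitions.

```python
from typing import Callable, Dict, List, Optional, Sequence


class ToyRDD:
    """A toy stand-in for a Spark RDD (hypothetical, not Spark's API)."""

    def __init__(self, partitions: Optional[Sequence[Sequence]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List], List]] = None):
        self._source = partitions  # set only on the base RDD ("stable storage")
        self._parent = parent      # lineage: the RDD this one was derived from
        self._fn = fn              # per-partition transformation of the parent
        self._cache: Dict[int, List] = {}
        self._cached = False

    @property
    def num_partitions(self) -> int:
        if self._parent is None:
            return len(self._source)
        return self._parent.num_partitions

    def _compute(self, i: int) -> List:
        # Rebuild partition i from lineage: re-read the source or re-apply fn
        if self._parent is None:
            return list(self._source[i])
        return self._fn(self._parent.partition(i))

    def partition(self, i: int) -> List:
        if i in self._cache:
            return self._cache[i]
        data = self._compute(i)
        if self._cached:
            self._cache[i] = data
        return data

    # Transformations never modify self; they return a new ToyRDD
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def cache(self) -> "ToyRDD":
        self._cached = True
        return self

    def lose_partition(self, i: int) -> None:
        # Simulate losing a cached partition, e.g. when a node fails
        self._cache.pop(i, None)

    def collect(self) -> List:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


if __name__ == "__main__":
    base = ToyRDD([[1, 2, 3], [4, 5, 6]])
    evens = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0).cache()
    print(evens.collect())  # [4, 16, 36]
    evens.lose_partition(1)
    print(evens.collect())  # [4, 16, 36], partition 1 rebuilt from lineage
```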
• 33. IBM InfoSphere BigInsights – Big SQL
IBM's SQL for Hadoop
– Makes Hadoop data accessible to a wider audience
– Familiar, widely known syntax
– Leverages native Hadoop data sources
Complements the data warehouse
– Exploratory analytics
– Sandbox, data lake
Included in IBM BigInsights
Use familiar SQL tools
– Cognos, SPSS, Tableau, MicroStrategy
[Diagram: a SQL-based application uses Big SQL, an optimized SQL MPP run-time, over native Hadoop data sources: CSV, SEQ, Parquet, RC, Avro, ORC, JSON and custom formats]
• 34. Architecture Pattern for Big Data Implementation
[Diagram labels: data sources (application, transaction, machine data, social media and email, enterprise content, video/audio, network/sensor); stream processing and real-time analytics; data integration; master data; landing area, analytics zone and archive (raw data, structured data, text analytics, data mining, entity analytics, machine learning); exploration, integrated warehouse and mart zones (discovery, deep reflection, operational, predictive); data at rest; consumers: information ingestion and operational information, decision management, BI and predictive analytics, navigation and discovery, intelligence analysis; information governance, security and business continuity across all layers]
• 35. Agenda
– What is Big Data?
– Big Data Use Cases
– IBM Analytics Platform
– IBM Spectrum Scale
• 36. Why use IBM Spectrum Scale™
Extreme scalability
– Add or remove nodes and storage without disruption or performance impact to applications
Universal access to data
– All servers and clients have access to data through a variety of file and object protocols
High performance
– Parallel access with no hot spots
Proven reliability
– Used by over 200 of the top 500 supercomputers
– Survive any node or storage failure with distributed RAID and redundant components
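One common way a parallel file system avoids hot spots is to stripe each file's blocks across many disks, so a large read or write engages all of them at once. The Python sketch below shows simple round-robin striping under that assumption. It is not Spectrum Scale's actual block allocation algorithm, and `stripe_blocks`, `blocks_per_disk` and the disk names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def stripe_blocks(file_size: int, block_size: int, disks: Sequence[str],
                  start: int = 0) -> List[Tuple[int, str]]:
    """Assign each block of a file to a disk in round-robin order."""
    if block_size <= 0 or not disks:
        raise ValueError("block_size must be positive and disks non-empty")
    n_blocks = -(-file_size // block_size)  # ceiling division
    return [(i, disks[(start + i) % len(disks)]) for i in range(n_blocks)]


def blocks_per_disk(layout: List[Tuple[int, str]]) -> Counter:
    """Count how many blocks land on each disk."""
    return Counter(disk for _, disk in layout)


if __name__ == "__main__":
    MiB = 1024 * 1024
    disks = ["nsd1", "nsd2", "nsd3", "nsd4"]  # hypothetical disk names
    layout = stripe_blocks(8 * MiB, 1 * MiB, disks)
    print(dict(blocks_per_disk(layout)))
```

With an 8 MiB file, 1 MiB blocks and four disks, each disk holds exactly two blocks, so no single disk becomes the bottleneck for a full-file read.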
• 37. Hadoop Analytics – HDFS vs IBM Spectrum Scale™
Data sources: a mashup of structured and unstructured data from a variety of sources (for example JFS2, NTFS and EXT4 file systems)
HDFS: save results, discard the rest
IBM Spectrum Scale: application data stored on IBM Spectrum Scale is readily available for analytics; save results
– The IBM Hadoop Connector allows MapReduce programs to process data without application changes
Actionable insights
– Provides answers to the who, what, where, when, why and how
Business intelligence & predictive analytics
> Competitive advantages
> New threats and fraud
> Changing needs and forecasting
> And more!
• 38. HDFS versus IBM Spectrum Scale™
– HDFS: NameNode HA was added in version 2.0, in an active/passive configuration. Spectrum Scale: no single point of failure; distributed metadata in an active/active configuration since 1998.
– HDFS: difficult to ingest data; special tools required. Spectrum Scale: ingest data using policies for data placement.
– HDFS: single-purpose, Hadoop MapReduce only. Spectrum Scale: versatile, multi-purpose, hybrid storage (locality and shared).
– HDFS: lacking enterprise readiness. Spectrum Scale: enterprise ready, with support for advanced storage features (encryption, DR, replication, software RAID, etc.).
– HDFS: large block sizes, with poor support for small files. Spectrum Scale: variable block sizes, suited to multiple types of data and metadata access patterns.
– HDFS: compute and storage tightly coupled, leading to very low CPU utilization. Spectrum Scale: scale compute and storage independently (policy-based ILM).
– HDFS: non-POSIX file system with obscure commands; does not support in-place updates. Spectrum Scale: POSIX file system, easy to use and manage.
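The last comparison is easy to see in code. On a POSIX file system, an application can open an existing file, seek to an offset and overwrite bytes in place, which the Python sketch below does with ordinary `open`, `seek` and `write`. HDFS does not allow overwriting bytes in the middle of an existing file (writes are essentially limited to creating files and appending to them), so the same change would mean rewriting the file. `update_in_place` and `demo` are hypothetical helper names.

```python
import os
import tempfile


def update_in_place(path: str, offset: int, new_bytes: bytes) -> None:
    """Overwrite len(new_bytes) bytes at `offset` without rewriting the file."""
    # "r+b" opens an existing file for reading and writing without truncating it
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)


def demo() -> bytes:
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"status=PENDING;id=42")
        update_in_place(path, 7, b"SHIPPED")  # same length as "PENDING"
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # b'status=SHIPPED;id=42'
```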
• 39. IBM Spectrum Scale™ – File Placement Optimization
File Placement Optimization (FPO) creates a “shared nothing” cluster similar to HDFS in Hadoop environments
– Spectrum Scale avoids the need for a central NameNode, a common point of failure in HDFS
– Avoids long recovery times in the event of NameNode failure
– Spectrum Scale can intermix FPO with standard NSD server and client nodes in the same cluster
– POSIX compliance, which is key to avoiding data islands
– Robustness and performance at massive scale, and maturity
[Diagram: HDFS NameNode and Secondary NameNode, compared with Spectrum Scale nodes using SAN and internal, direct-attached storage over a TCP/IP or RDMA network]
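To make the "shared nothing, similar to HDFS" idea concrete, here is a Python sketch of a locality-aware replica placement policy: the first copy stays on the node that writes the data, and the remaining copies go to other nodes, preferring racks (failure groups) that do not yet hold a copy. This follows the general HDFS-style approach and is an assumption for illustration, not Spectrum Scale FPO's actual placement code. `place_replicas` and the node and rack names are hypothetical; the `replicas` parameter reflects the note that FPO can keep 1, 2 or 3 replicas of the data.

```python
from typing import Dict, List


def place_replicas(writer: str, node_racks: Dict[str, str], replicas: int = 3) -> List[str]:
    """Choose nodes for the copies of one block: first copy on the writing
    node (data locality), remaining copies on other nodes, preferring racks
    that do not yet hold a copy so one rack failure cannot remove them all."""
    if writer not in node_racks:
        raise ValueError(f"unknown writer node: {writer}")
    if not 1 <= replicas <= len(node_racks):
        raise ValueError("replicas must be between 1 and the number of nodes")
    chosen = [writer]
    used_racks = {node_racks[writer]}
    # First pass: only nodes in unused racks; second pass: any unused node
    for prefer_new_rack in (True, False):
        for node in sorted(node_racks):
            if len(chosen) == replicas:
                return chosen
            if node in chosen:
                continue
            if prefer_new_rack and node_racks[node] in used_racks:
                continue
            chosen.append(node)
            used_racks.add(node_racks[node])
    return chosen


if __name__ == "__main__":
    cluster = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack3"}
    print(place_replicas("n2", cluster, replicas=3))  # ['n2', 'n3', 'n4']
```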
• 40. Share-Nothing versus Shared-Disk Deployments
Shared nothing (Spectrum Scale FPO)
– Spectrum Scale FPO can keep 1, 2 or 3 replicas of the data
– Need more compute? Add another node! Need more storage capacity? Add another node!
Shared disk (Spectrum Scale and Elastic Storage Server)
– Spectrum Scale and Elastic Storage Server reduce storage to one RAID-protected copy of the data
– Scale compute and storage capacity separately
Raw capacity: 3x versus 1.3x
[Diagram: FPO nodes each holding data plus copies, versus shared storage holding data and parity strips; both connected over TCP/IP or RDMA]
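The 3x versus 1.3x comparison is simple arithmetic. With n full replicas, every usable terabyte costs n raw terabytes; with a RAID stripe of d data strips and p parity strips, it costs (d + p)/d. The diagram on this slide draws three data strips and one parity strip, which gives 4/3, or about 1.33x. The Python below just encodes those two ratios; the function names are hypothetical, and real Elastic Storage Server RAID codes may use different strip counts.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw capacity per usable unit when every block is stored `copies` times."""
    return Fraction(copies)


def parity_overhead(data_strips: int, parity_strips: int) -> Fraction:
    """Raw capacity per usable unit for a stripe of data plus parity strips."""
    return Fraction(data_strips + parity_strips, data_strips)


def raw_capacity_tb(usable_tb: int, overhead: Fraction) -> Fraction:
    return usable_tb * overhead


if __name__ == "__main__":
    fpo = replication_overhead(3)   # three full copies
    raid = parity_overhead(3, 1)    # three data strips + one parity strip
    print(float(fpo), round(float(raid), 2))                       # 3.0 1.33
    print(raw_capacity_tb(300, fpo), raw_capacity_tb(300, raid))   # 900 400
```

For 300 TB of usable data, three-way replication needs 900 TB of raw disk, while the 3+1 parity layout needs 400 TB.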
• 41. IBM Spectrum Scale™ – Software, Systems or Cloud Services
Software
– Install the software on your own choice of industry-standard x86 or POWER servers
Pre-built systems
– Elastic Storage Server with distributed RAID
– Storwize V7000 Unified
Cloud services
– Spectrum Scale can be deployed on any cloud
• 42. Session summary
– Big data is being generated by everything around us
  – Every digital process and social media exchange produces it
  – Systems, sensors and mobile devices transmit it
– Big data is arriving from multiple sources at amazing velocities, volumes and varieties
– To extract meaningful value from big data, you need optimal processing power, storage, analytics capabilities and skills
Sources: The Economist, and special thanks to Dr. Bob Sutor, IBM VP, Business Solutions & Mathematical Sciences
• 43. Session Evaluations
YOUR OPINION MATTERS!
Submit four or more session evaluations by 5:30pm Wednesday to be eligible for drawings!
*Winners will be notified Thursday morning. Prizes must be picked up at the registration desk, during operating hours, by the conclusion of the event.
• 45. Big Data & Analytics
Building Big Data and Analytics Solutions in the Cloud
http://www.redbooks.ibm.com/abstracts/redp5085.html?Open
o IBM BigInsights
o IBM PureData System for Hadoop
o IBM PureData System for Analytics
o IBM PureData System for Operational Analytics
o IBM InfoSphere Warehouse
o IBM Streams
o IBM InfoSphere Data Explorer (Watson Explorer)
o IBM InfoSphere Data Architect
o IBM InfoSphere Information Analyzer
o IBM InfoSphere Information Server
o IBM InfoSphere Information Server for Data Quality
o IBM InfoSphere Master Data Management Family
o IBM InfoSphere Optim Family
o IBM InfoSphere Guardium Family
“Analytics is about examining data to derive interesting and relevant trends and patterns, which can be used to inform decisions, optimize processes, and even drive new business models.”
• 46. Research Paper
“In this paper, we revisit the debate on the need of a new non-POSIX storage stack for cloud analytics and argue, based on an initial evaluation, that it can be built on traditional POSIX-based cluster filesystems.”
• 47. Hadoop for the Enterprise
http://www.ibm.com/software/data/infosphere/hadoop/enterprise.html
IBM BigInsights for Apache Hadoop provides a 100% open source platform and offers analytic and enterprise capabilities for Hadoop.
• 48. IBM Tucson Executive Briefing Center
Tucson, Arizona is home for storage hardware and software design and development
IBM Tucson Executive Briefing Center offers:
– Technology briefings
– Product demonstrations
– Solution workshops
Take a video tour!
– http://youtu.be/CXrpoCZAazg
• 49. About the Speaker
Tony Pearson is a Master Inventor and Senior managing consultant for the IBM System Storage™ product line. Tony joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, Tivoli storage software products, and topics related to Cloud Computing. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products. Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Network World” magazine, and the #1 most-read IBM blog on IBM’s developerWorks. The blog has been published in a series of books, Inside System Storage: Volume I through V. Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products. Tony has a Bachelor of Science degree in Software Engineering and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.
Tony Pearson
Master Inventor, Senior IT Specialist
IBM System Storage™
9000 S. Rita Road
Bldg 9032 Floor 1
Tucson, AZ 85744
+1 520-799-4309 (Office)
tpearson@us.ibm.com
• 50. Additional Resources from Tony Pearson
Email: tpearson@us.ibm.com
Twitter: twitter.com/az990tony
Blog: ibm.co/Pearson
Books: www.lulu.com/spotlight/990_tony
IBM Expert Network on Slideshare: www.slideshare.net/az990tony
Facebook: www.facebook.com/tony.pearson.16121
LinkedIn: www.linkedin.com/profile/view?id=103718598
• 51. Continue growing your IBM skills
ibm.com/training provides a comprehensive portfolio of skills and career accelerators that are designed to meet all your training needs.
If you can’t find the training that is right for you with our Global Training Providers, we can help.
Contact IBM Training at dpmc@us.ibm.com
Global Skills Initiative
• 52. Trademarks and Disclaimers
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries. Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind. 
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. 
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography. Photographs shown may be engineering prototypes. Changes may be incorporated in production models. © IBM Corporation 2015. All rights reserved. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml. ZSP03490-USEN-00