CONSULTING SOLUTIONS OUTSOURCING
PARTNER FOR A NEW ERA
Transform Your Business with Big Data and Hortonworks
Tom Kersnick – Pactera – Director Big Data Solutions
Robby Richardson – Hortonworks – Enterprise Account Manager
Topics
© Pactera. Confidential. All Rights Reserved.
1 Hortonworks Intro
2 Who is Hortonworks?
3 Hortonworks HDP: Enterprise Hadoop Distribution
4 Hadoop 2.0: The Enterprise Generation
5 Pactera Intro
6 Big Data Initiatives
Hortonworks Snapshot
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop, Distribute, Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo
Rapid Customer Growth
Hortonworks HDP: Enterprise Hadoop 1.x Distribution
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
[Diagram: Hortonworks Data Platform (HDP) stack]
• Hadoop Core: HDFS, MapReduce
• Platform Services (Enterprise Readiness): High Availability, Disaster Recovery, Security and Snapshots
• Data Services: Hive (HCatalog), Pig, HBase
• Load & Extract: Sqoop, Flume, NFS, WebHDFS
• Operational Services: Oozie, Ambari
• Deployment options: OS, Cloud, VM, Appliance
Hadoop 2.0… The Enterprise Generation
Business Value
Big Data: Transactions, Interactions, Observations
Single Platform, Multiple Use: BATCH, INTERACTIVE, ONLINE
1.0 Architected for the Large Web Properties
2.0 Architected for the Broad Enterprise
Enterprise Requirements and the matching Hadoop 2.0 Features:
• Mixed workloads: YARN
• Interactive query: Hive on Tez
• Reliability: Full Stack HA
• Point-in-time recovery: Snapshots
• Multi data center: Disaster Recovery
• ZERO downtime: Rolling Upgrades
• Security: Knox Gateway
HDP: Enterprise Hadoop 2.0 Distribution
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
[Diagram: Hortonworks Data Platform (HDP) 2.0 stack; asterisks as marked on the slide]
• Hadoop Core: HDFS, YARN*, MapReduce, Tez*, Other
• Platform Services (Enterprise Readiness): High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
• Data Services: Hive & HCatalog, Pig, HBase
• Load & Extract: Sqoop, Flume, NFS, WebHDFS, Knox*
• Operational Services: Oozie, Ambari, Falcon*
• Deployment options: OS/VM, Cloud, Appliance
Seamless Interoperability with Microsoft Tools
• Integrated with Microsoft tools for native big data analysis
» Bi-directional connectors for SQL Server and SQL Azure through Sqoop
» Excel ODBC integration through Hive
• Addressing demand for Hadoop on Windows
» Ideal for Windows customers with Hadoop operational experience
• Enables most common Hadoop workloads in the Enterprise
» Data refinement and ETL offload for high-volume data landing
» Data exploration for discovery of new business opportunities
» Data enrichment for finely tuned delivery and recommendation engines
[Diagram: applications, data systems, and data sources]
• Applications: Microsoft Applications
• Data Systems: Hortonworks Data Platform for Windows
• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); Mobile Data; OLTP, POS Systems
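The bi-directional SQL Server connectivity through Sqoop listed above can be sketched as a pair of command builders. This is a minimal illustration, assuming the standard Sqoop 1 command-line options (`--connect`, `--table`, `--target-dir`, `--export-dir`, `--password-file`, `--num-mappers`); the helper names, host, database, table, and paths are hypothetical, and the functions only assemble the argument list without running anything.

```python
from typing import List


def sqoop_import_args(jdbc_url: str, table: str, target_dir: str,
                      username: str, password_file: str,
                      num_mappers: int = 4) -> List[str]:
    """Build a `sqoop import` command: SQL Server table -> HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # avoids a password on the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


def sqoop_export_args(jdbc_url: str, table: str, export_dir: str,
                      username: str, password_file: str) -> List[str]:
    """Build a `sqoop export` command: HDFS directory -> SQL Server table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    url = "jdbc:sqlserver://dbhost:1433;databaseName=sales"
    print(" ".join(sqoop_import_args(url, "orders", "/data/landing/orders",
                                     "etl_user", "/user/etl/.sqlserver.pw")))
```

Returning a list instead of a single string means the command can be handed to `subprocess.run` without shell quoting issues.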
Transferring Our Hadoop Expertise to You
The expert source for Apache Hadoop training & certification
• World class training programs designed to help you learn fast
• Role-based hands-on classes with 50% lab time
• Certification to demonstrate Hadoop Expertise in Development and Administration
• Expert consulting services
• Programs designed to transfer knowledge
• Industry leading Hadoop Sandbox
• Free download
• Fastest way to learn Apache Hadoop
• Personal, portable Hadoop environment
Hortonworks Summary
• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best-in-industry support with flexible pricing model
• Find out more
» www.hortonworks.com/hadoop-training/
» www.hortonworks.com/sandbox
Big Data is Critical
Challenges to Using Big Data
Given that nearly one-third of businesses are in the dark about their available data, it makes sense that silos are the primary hurdle in using this information.
• 51%: Lack of sharing data is an obstacle to measuring marketing ROI
• 45%: Not using data effectively to personalize marketing communications
• 42%: Not able to link data together at the individual customer level
• 39%: Data collected infrequently or not quickly enough
• 29%: Too little or no customer/consumer data
What Initiatives Are Using Big Data
Keys to a Successful Big Data Initiative
Define the Impact
• Short-term vs. long-term measures
What cannot be answered today?
• This is your starting point
Create User-Centric Internal | External Applications
• Decision support framework
Predicting the Consumer
• Algorithms, Models, Testing, and More Testing!
Obstacles to Defining Big Data ROI
Not enough skilled resources for adaptation
• Advance competencies
Traditional IT Architectures cause limitations
• Identifying the right technologies
• Adapting to particular needs
• Assemble business use cases
• Silos
Optimizing Solutions
• Strong internal use cases
• Inability to effectively automate data
Solution Architecture using Multiple Ecosystems
[Diagram: incoming feeds, Real Time In-Memory Solution, EDW, Hadoop with Sandbox, Models/Algorithms/Simulations, and outgoing destinations; labels 1-9 match the steps below]
1. Data feeds flow into a real-time in-memory solution that ingests data into the EDW, Hadoop, and other platforms such as mobile, APIs, etc.
2. ELT streaming into the in-memory solution provides visibility to real-time social, mobile, and shell approaches to algorithms, models, and simulations.
3. An in-memory real-time solution, such as Storm running on YARN, digests data to the EDW, Hadoop, social media, and other such platforms.
4. EDW for structured information from the sources in step 1.
5. Hadoop for semi-structured and unstructured data. The solution architecture includes sandbox availability.
6. Shell UI interfaces use data from the real-time in-memory solution as well as the EDW, Hadoop, etc. for models, algorithms, and simulations.
7. Structured and unstructured reporting in reporting interfaces.
8. Deep-dive analytics in Hadoop and real-time streaming.
9. Real-time customer interaction for social and other similar platforms.
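The split between the EDW (structured data) and Hadoop (semi-structured and unstructured data) in steps 1, 4, and 5 can be sketched as a small routing function. This is an illustration under stated assumptions, not the architecture's actual implementation: the `EDW_FIELDS` schema, the function names, and the in-memory lists standing in for the two platforms are all hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical rule: a record carrying all of these fields counts as structured.
EDW_FIELDS = {"customer_id", "order_id", "amount"}


def route(raw: str) -> str:
    """Pick a destination for one incoming record: 'edw' or 'hadoop'."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return "hadoop"  # unstructured: free text, raw log lines
    if isinstance(record, dict) and EDW_FIELDS.issubset(record):
        return "edw"     # structured: matches the warehouse schema
    return "hadoop"      # semi-structured: valid JSON, but not the full schema


def ingest(feed: List[str]) -> Dict[str, List[str]]:
    """Route every record in a feed; lists stand in for the real platforms."""
    sinks: Dict[str, List[str]] = defaultdict(list)
    for raw in feed:
        sinks[route(raw)].append(raw)
    return dict(sinks)
```

In a real deployment the routing step would run inside the streaming layer and write to the actual platforms; the point here is only that the destination is decided per record, as the data arrives.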
Predictive Analysis Use Case for Online Travel Company
Flight Cost by Variants Determination
Data feeds use real-time in-memory streaming to execute matching algorithms. These determine which one-way and round-trip flights users view within a session.
Predictive analytics algorithms determine how to increase or decrease prices based on views, market pricing, time, and availability.
[Diagram: Real Time In-Memory Solution (Storm)]
• Incoming: http, logs, partners, custom
• Outgoing destinations: rdbms, hadoop, application, mobile
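A price adjustment driven by views, market pricing, time, and availability might look like the sketch below. The slide does not give the actual algorithm, so everything here is an assumption made for illustration: the `FlightSignal` fields, the weights, the 100-view saturation point, the 7-day urgency window, and the 15% cap on any single change.

```python
from dataclasses import dataclass


@dataclass
class FlightSignal:
    base_price: float       # current listed fare
    market_price: float     # competitor fare for a comparable flight
    session_views: int      # views of this flight across active sessions
    seats_left: int
    seats_total: int
    days_to_departure: int


def adjusted_price(s: FlightSignal, max_change: float = 0.15) -> float:
    """Return a new fare that moves at most max_change (a fraction) from base."""
    demand = min(s.session_views / 100.0, 1.0)        # 0..1, saturates at 100 views
    scarcity = 1.0 - s.seats_left / s.seats_total     # 0..1, higher when nearly full
    urgency = 1.0 if s.days_to_departure <= 7 else 0.0
    market_gap = (s.market_price - s.base_price) / s.base_price
    # Illustrative weights; the constant offset lets weak signals lower the fare.
    score = 0.4 * demand + 0.3 * scarcity + 0.1 * urgency + 0.2 * market_gap - 0.35
    change = max(-max_change, min(max_change, score))
    return round(s.base_price * (1.0 + change), 2)
```

Capping the change keeps a single burst of views from swinging the fare too far, which matters when the calculation reruns continuously on streaming data.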
Solution Architecture using YARN
• Created to manage resource needs across all uses
• Ensures predictable performance & QoS for all apps
• Enables apps to run “IN” Hadoop rather than “ON”
» Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.
Applications Run Natively IN Hadoop
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• STREAMING (Storm, S4,…)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• ONLINE (HBase)
• OTHER (Search) (Weave…)
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)
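The idea of one resource manager giving predictable capacity to mixed workloads can be illustrated with a toy model. This is not YARN's API or its scheduler; `ToyResourceManager`, its queue names, and the fixed-share rule are hypothetical and only show why a per-queue limit keeps one workload from starving the others.

```python
from typing import Dict


class ToyResourceManager:
    """Toy stand-in for a cluster resource manager; not YARN's real API."""

    def __init__(self, total_mb: int, shares: Dict[str, float]):
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError("queue shares must sum to 1.0")
        # Each queue gets a fixed slice of cluster memory.
        self.limit_mb = {q: round(total_mb * s) for q, s in shares.items()}
        self.used_mb = {q: 0 for q in shares}

    def request(self, queue: str, container_mb: int) -> bool:
        """Grant a container only if the queue stays within its share."""
        if self.used_mb[queue] + container_mb > self.limit_mb[queue]:
            return False
        self.used_mb[queue] += container_mb
        return True

    def release(self, queue: str, container_mb: int) -> None:
        """Return a container's memory to its queue."""
        self.used_mb[queue] = max(0, self.used_mb[queue] - container_mb)
```

With shares of, say, 50% batch, 30% interactive, and 20% streaming, a streaming job that asks for more than its slice is refused while batch requests still succeed.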
Pactera Big Data Capability
Big Data Solution Architecture
• In-Memory Solutions
• Scalable Distributed Platforms
Next Generation Analytics
• Models, Algorithms, and Simulations
• Visualization
Improving Operational Ability
• Help companies drive more operational efficiencies from existing investments.
• Moving from the realm of data scientists into everyday business transactions and encounters.
New Business Processes
• Impact on both customer intelligence and operational efficiency by making everything immediately actionable.
• Armed with immediate decision-making capability and intelligence, companies will be able to implement new business processes that will change how business is done.
• We ask the Right Questions
How Pactera can help with Big Data
Workshop (4 Hours), then Proof of Concept (2-4 Weeks), then Projects
Executive Workshop
Strategies, Planning, and Expectations
• Big Data strategy on what tomorrow will look like
• Using Big Data to establish market dominance
• Big Data project takeaways
• Roadblocks to implementing Big Data analytics
• Defining an ROI for Big Data
• Getting the right ROI on Big Data
Technical Workshop
End-To-End Management
• System tuning/auto-tuning and configuration management
• Dealing with both structured and unstructured data
• Monitoring, diagnosis, and automated behavior detection
Solution Architecture
• Processor, memory, and system architectures for data analysis
• Benchmarks, metrics, and workload characterization for big data
• Availability, fault tolerance and recovery issues
• Data management and analytics for vast amounts of unstructured data
Projects:
• Benchmark & Monitoring
• Integrations & Migrations
• Implementation & Architecture
• Project Management
• Analytics
• Reporting
Thank You
Tom Kersnick
tom.kersnick@pactera.com
Robby Richardson
rrichardson@hortonworks.com

Editor's Notes

  • #12 (Big Data is Critical): Big Data is critical for organizations just to keep up with the masses. In most retail organizations, internal data is very hard to use for understanding your customers and demand. Publications state that one-third of retailers are in the dark regarding data that could be available to them. The silo approach within organizations is the primary cause of the broken data pipeline. The primary reasons this is a hurdle are:
    * The lack of sharing data: definitely a major obstacle in measuring ROI
    * Misuse of available data in marketing communications: not able to personalize directly to your customer
    * Linking data at the customer level: this is needed to thoroughly understand user behavior
    * Infrequent data collection: only extracting from logs and online serving systems used within your traditional reporting ecosystem
    * Not enough customer data: not capturing the details of the customer (including proper timings of viewed products, key indicators of why a user looks at one product versus another, and so on)
  • #18 (Flight Cost by Variants Determination): Flight Cost is one of the algorithm methods used to increase or decrease revenue based on page views, consumer marketing, and time spent by a consumer on a particular one-way or round-trip flight. The goal is not only to provide alternatives, but also to increase or decrease cost while other consumers are viewing the same flights. This is determined by sales from all related airlines and competitors while the flight is available. This method can be extended to use other sources as well.
    Destinations: web applications, mobile applications, hadoop, rdbms.
    In the solution architecture shown, the in-memory solution processes views, marketing, customer behavior, time, and competitor results to derive an increased or decreased price for a given one-way or round-trip flight. This allows the travel company to determine the proper pricing based on these measures within an algorithm. The architecture also allows the travel company to try out other predictive models at any point in time to see whether one model outperforms another. These models could use similar measures and outcomes as well as newly derived measures. Overall, this is a win for the travel company: it never loses revenue from the original ‘bread and butter’ model it always applies. Fascinating, right?
    As the outgoing destinations show, this provides consistent results on all platforms, giving a precise understanding of how the travel company is generating results overall. The solution can provide endless results based on predictive models that can be applied in real time. Any day, any time, any millisecond.
  • #21 (How Pactera can help with Big Data): Pactera offers complete life cycle solutions within your organization. We offer a free 4-hour executive and technical workshop within your organization. We just ask you to fill out a one-page questionnaire to help us understand your expectations. The executive workshop covers strategy, planning, and your current and future goals. The technical workshop is a deep dive into end-to-end management and a proper solution architecture based on your current and upcoming goals. Once the workshops are complete, we will provide you with an assessment of the outcome. We also offer a 2-4 week proof of concept to ensure your project is put into action. Finally, we offer full life cycle services in the following: Benchmark & Monitoring, Integrations & Migrations, Implementation & Architecture, Project Management, Analytics, and Reporting.
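The flight cost note describes trying out other predictive models while the original "bread and butter" model keeps running. One common way to do that is a champion/challenger split, sketched below. The deck does not say how the comparison was done, so this is an assumption: the function names, the 10% challenger share, and the hash-based session assignment are all hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def assign(session_id: str, challenger_percent: int = 10) -> str:
    """Deterministically send a fixed share of sessions to the challenger model."""
    bucket = int(hashlib.sha256(session_id.encode("utf-8")).hexdigest(), 16) % 100
    return "challenger" if bucket < challenger_percent else "champion"


def mean_revenue(outcomes: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average revenue per session for each model, from (model, revenue) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for model, revenue in outcomes:
        totals[model] += revenue
        counts[model] += 1
    return {model: totals[model] / counts[model] for model in totals}
```

Hashing the session ID keeps each session on the same model for its whole lifetime, and the small challenger share limits how much revenue a weaker candidate model can affect.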