1© Cloudera, Inc. All rights reserved.
Your customer’s journey to success AND
Your journey to quicker and larger expansions
Making Data Real
What we are going to discuss here
1. How advanced analytics is better than data warehousing
2. Being data-driven is a journey, not a project
3. Our most successful customers do five key things
Innovation around the world driven by data
PROTECT LIVES
Using data about bets per second and machine learning to promote responsible gambling by customising offers to minimise the customer's vulnerability.
DRIVE CUSTOMER INSIGHTS
A "smart business" application for small businesses that enables them to see patterns in anonymised data generated by the bank's other customers.
CONNECT PRODUCTS & SERVICES (IoT)
Analyzing acoustic data coming from turbines in real time to monitor the health of, and predict failures in, turbines at hydro power stations.
Advanced analytics is better than data warehousing
Build your data asset economically and at scale
1. Collect data in native format – enables agility
2. Build history by collecting data prior to its use
Securely share on-prem, in cloud, anywhere
3. Security at the data layer increases flexibility and ability to protect privacy
4. Create community data and drive innovation by sharing across your business
Innovate with analytics and operationalize the insights
5. Analyze data in near real-time
6. Build and deploy machine learning models and other advanced analytics
7. Deliver insights via enterprise, mobile and web applications
Think Big.
Start small.
Iterate to success.
Being data-driven is a journey, not a project.
Customers ask: what does ‘think big’ mean?
Determine your strategic initiatives.
Read your annual report.
Define a reasonable timeframe and goals.
Typically 3-5 strategic initiatives in parallel
Often segmented by business unit
At maturity initiatives cross business units.
Customers ask: what does ‘start small’ mean?
Embrace the familiar.
Enhance something you know.
Make a report better with more data.
Then go get a shiny object (new data).
Bring in and integrate.
Showcase your results with a visualization.
Customers ask: what does ‘iterate often’ mean?
Break strategic initiatives into quarterly objectives.
Break your quarterly objectives into sprints.
Deliver and visualize outcomes at every sprint exit.
Continuously learn and adapt.
Outcomes can be both positive and negative.
Outcomes can be about the business and data.
Our Most Successful Customers iterate these Five Things
1. Build a Big Data Culture
Led by an enabled executive sponsor(s). Communication methodologies. Advocating change.
2. Assemble the right team
Tightly aligned team. Mix of seasoned experts and innovators
3. Adopt an agile approach for data engineering, data science, analysis
Successful projects start small, are hypothesis-driven and iterate to success.
Roadmaps: Document expected direction, yet expect insights to create change
4. Efficiently operationalize insights
Analytics -> Reports, Big Data -> Actions. Create a bridge between Dev and Ops
5. Rightsize your data governance
Rightsize and iteratively build towards maturity.
1. Build a data-driven culture
Description
Foster a culture that iteratively and continuously builds a strong sharing community. Enable many in the organization – over time – to become evangelists.
Executive Sponsorship
• Executive Sponsor for the overall Big Data mission, including advocacy for creating/collecting data, and business stakeholders for individual use cases. Align to strategic initiatives.
Community
• Build community through communications about vision, insights, data, platform and technology.
• Make communications more programmatic across the entire organization with meetups, big data days and hackathons.
Visualizations
• Visualize EVERYTHING! Use visualizations to tell stories about the data asset itself (how big is it, how fast is it growing) as well as insights found for the business.
Logical Information Architecture
An environment that supports new ways of working
Zones: Ingestion Zone, Discovery Zone, Integrated Zone, Production
Trusted power users have broad access & new tools.
The new data engineering team partners with data stewards to ingest full fidelity raw data
Continuous deployment concepts move data, models, etc. from exploratory environment to production
Business users have narrowed access with traditional BI tools and applications
Logical Information Architecture
An environment that supports new ways of working
Zones: Ingestion Zone, Discovery Zone, Integrated Zone, Production
Data state: Raw to Trusted
Stages: Ingest, Validation & Verification, Enrichment, Transform, Routing
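The stages named on this slide can be read as a small pipeline that moves a record from raw to trusted. The following is only a minimal sketch under stated assumptions: the stage order follows the order in which the slide lists them, the validation, enrichment and transform rules are hypothetical, and routing into the integrated zone is an assumed destination, not something the slide specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    payload: dict
    state: str = "raw"        # becomes "trusted" once every stage has passed
    zone: str = "ingestion"   # zone the record currently lives in
    history: list = field(default_factory=list)


def ingest(rec):
    # Keep the payload in full fidelity; only note that it arrived.
    rec.history.append("ingest")
    return rec


def validate(rec):
    # Hypothetical rule: a record needs a non-empty "id" to be verifiable.
    if not rec.payload.get("id"):
        raise ValueError("record has no id")
    rec.history.append("validate")
    return rec


def enrich(rec):
    # Hypothetical enrichment: attach a source label if none is present.
    rec.payload = {**rec.payload, "source": rec.payload.get("source", "unknown")}
    rec.history.append("enrich")
    return rec


def transform(rec):
    # Hypothetical transform: normalise key names to lower case.
    rec.payload = {key.lower(): value for key, value in rec.payload.items()}
    rec.history.append("transform")
    return rec


def route(rec):
    # Assumed destination: the integrated zone, with the record marked trusted.
    rec.zone = "integrated"
    rec.state = "trusted"
    rec.history.append("route")
    return rec


STAGES = [ingest, validate, enrich, transform, route]


def run_pipeline(payload):
    """Run one payload through every stage in order and return the record."""
    rec = Record(payload=dict(payload))
    for stage in STAGES:
        rec = stage(rec)
    return rec
```

A record that fails validation raises an error and never leaves the raw state, which is one simple way to keep untrusted data out of the downstream zones.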
2. Assemble the right team
Executive
Architecture & Operations
Data Engineering
Data Science, SQL & app development
Vision and Goals
The Important Role of the Executive Sponsor
Description
An essential key to success is having a strong executive sponsor for the overall Big Data mission including advocacy for creating/collecting data and business stakeholders for individual use cases.
Profile
• An executive focused on change, and willing to take risk to ensure the success of the business via the Big Data initiatives.
Education
• Use every opportunity to bring the topic in front of potential sponsors and stakeholders. Share industry and business potential ROI models (heeding the warning not to overstate).
Advocacy
• Build big data success stories from within the business. Advocate for the use of data in new ways. Support the proactive collection of data and lead the charge to assign value to data.
Your Infrastructure Team & Architect
Description
Hadoop and the Big Data technology ecosystems change rapidly – infrastructure architecture is a critical component of your team. Architects need to balance tactical and strategic needs.
Communication
• The software and hardware infrastructure is often physically operated by a different group. The Architect needs close collaboration.
Education
• Continually explore new technologies, including 3rd party tools – architects need to stay ahead of the curve. Training is essential: admin, developer.
Leadership
• Be the infrastructure expert and advise on new projects and new requirements from the data management team and the business. Know when to call in the experts on Hadoop and Big Data.
Your Data Engineering Team
Description
Data is only useful if users can employ it in a meaningful way. Data engineers have to be committed to making your company’s data the utmost strategic asset, from acquisition to advocacy.
Communication
• Document, secure, audit the data. Create simple schemas and search indexes for each data set. Create common profiles, and continually advocate for new data and for improved data.
Education
• Get trained and certified with Cloudera Administrator, Developer, Data Analyst courses and become an expert with Navigator and Sentry.
Leadership
• Promote and evangelize to educate on the value of Big Data, take the lead on data governance – love the data
Your Data Scientist Team(s)
The hybrid data scientist
[Diagram: Data Science at the overlap of Math & Statistical Knowledge, Hacking skills and Subject Matter Expertise]
Curiosity: Important character trait
• Subject Matter Expertise lies in the business
• Hacking skills can come from existing IT staff or new hires
• Staff at least one true Ph.D. statistician for model oversight across all teams
A luxury is finding one or more data scientists that cross these disciplines
Staff for Success: Data Science-as-a-Service
Description
Often a centralized Data Science team can partner with the business to identify data that differentiates, explore use cases to solve, and help to jumpstart business teams. Be mindful not to overbuild centrally.
Agility
• The team must be able to learn quickly and adapt
Skills
• Hybrid skills of computer science (hacking), domain expertise and at least one true statistician. Data Science training.
Teams
• Often businesses find the domain expertise in-house, add in MS/Ph.D. candidates from local universities and hire that one true statistician
Experts
• This team must be the “data experts” for the entire company in order to fulfil the vision of sharing data for maximum innovation
3. Adopt an agile approach to data engineering & science
Description
Agile methodology provides actionable results more rapidly and measures the value gained at each step, in small iterations. Agile should be applied to data and insights project workstreams.
Lower risk
• Risk of funding long-running projects with limited business value is small. Use daily results to improve the process or change course.
Lower costs
• Can run infrastructure, data and insights workstreams in parallel. Avoids large build-out of infrastructure and data before insights.
Communication
• With clear short-term results, enables a continuous communications stream showcasing results or failures
Team
• Can start with a small team, and add additional scrum teams as value is determined and investment is available
Agile Methodology Enables Iterative Workstreams
Workstreams: Use Case Development/App development/Data science; Data Engineering; Data Governance & Common Profile Development; EDH Buildout
Agile Methodology Enables Iterative Workstreams
Workstreams: Agile Use Case Development; Agile Data Ingestion/Management; Agile Data Governance & Common Profile Development; EDH Buildout
[Diagram: in each agile workstream, scrum teams deliver successive releases (Release 1 through Release 4) until Production Ready]
4. Efficiently operationalize your insights
Description
Apply DevOps concepts from the world of application / web development to the world of data and analytics – from managing data that needs to move to production to the models used to create insights from that data.
DevOps
• Operationalizing data and insights from analytics is more like digital and mobile development cycles than traditional ERP or RDBMS applications
Communication
• Start by finding those people who understand both sides of the agile development and IT deployment cycle – and bridge the communication gap between both sides
• Initially most DevOps processes were manual – making sure web code was unit and functionally tested, ensuring source code was under a control system, etc.
Continuous Delivery
• Move towards automation of the processes needed to move new analytical code/models/data from development into production. Eventually get to complete automation of those processes, allowing for continuous deployment of new analytical models and data into production
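One way to picture the move from manual checks to automated promotion is a gate that only lets an analytical artifact into production once the checks named on this slide (unit tests, functional tests, source control) have passed. This is an illustrative sketch only: the `Artifact` class, the `promote` function and the in-memory production registry are hypothetical names, not part of any Cloudera tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A hypothetical piece of analytical code, a model or a data set."""
    name: str
    version: int
    unit_tests_passed: bool
    functional_tests_passed: bool
    under_source_control: bool


def promotion_blockers(artifact):
    """Return the names of the checks that still block promotion."""
    checks = {
        "unit tests": artifact.unit_tests_passed,
        "functional tests": artifact.functional_tests_passed,
        "source control": artifact.under_source_control,
    }
    return [name for name, passed in checks.items() if not passed]


def promote(artifact, production):
    """Register the artifact in production only if no check blocks it."""
    blockers = promotion_blockers(artifact)
    if blockers:
        return False, blockers
    production[artifact.name] = artifact
    return True, []
```

Because the gate returns the list of failed checks, a team can start by running it by hand and later call the same function from an automated deployment job.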
5. Rightsize Your Big Data Governance
Data Stewards
Owners and/or creators of the data
Responsibilities
• Providing knowledge about the data (e.g. privacy, use case concerns)
• Documenting and improving the raw data, with focus on link-ability
Data Engineers
Implement the data governance policies
Responsibilities
• Defining and driving the governance
• Organizing and hosting the Governance Council
• Delivering and utilizing tools (e.g. Navigator) to enforce governance
Data Governance Council
Business owners of the Data Governance
Responsibilities
• Communication about and enforcement of data governance
• Assigning data steward roles
• Improving the link-ability of data
Rightsize your data governance: Iterate to maturity
1. Initial (Chaos): “We don’t know what’s in our data hub”
2. Managed (CYA): Basic governance artifact capture
3. Standardized (Self-service): Data curation automation
4. Measured (Automation): Data stewardship and lifecycle automation
5. Optimized (Continuous improvement): ongoing optimization
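The five maturity levels form an ordered scale, so "iterate to maturity" can be sketched as moving one level at a time. The enum and helper below are hypothetical names chosen for illustration; only the level numbers and level names come from the slide.

```python
from enum import IntEnum


class GovernanceMaturity(IntEnum):
    INITIAL = 1
    MANAGED = 2
    STANDARDIZED = 3
    MEASURED = 4
    OPTIMIZED = 5


def next_level(current):
    """Return the next maturity level, staying at OPTIMIZED once it is reached."""
    return GovernanceMaturity(min(current + 1, GovernanceMaturity.OPTIMIZED))
```

Using an ordered enum means levels can be compared directly, for example to check whether a business unit has at least reached STANDARDIZED before self-service tools are rolled out to it.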
Data can make what is impossible today, possible tomorrow
Changing our relationship with the products and services we consume
Improving reliability, quality, & sustainability
Reaching for the stars
Protecting the most vulnerable
Thank you

The Five Markers on Your Big Data Journey

  • 1. 1© Cloudera, Inc. All rights reserved. Your customer’s journey to success AND Your journey to quicker and larger expansions Making Data Real
  • 2. 2© Cloudera, Inc. All rights reserved. What we are going to discuss here 1. How advanced analytics is better than data warehousing 2. Being data-driven is a journey, not a project 3. Our most successful customers do five key things
  • 3. 3© Cloudera, Inc. All rights reserved. Innovation around the world driven by data Using data about bets per second and machine learning to promote responsible gambling by customising offers to minimise the customer's vulnerability. A "smart business" application for small businesses that enables them to see patterns in an anonymised data generated by the bank's other customers. CONNECT PRODUCT & SERVICES (IoT) DRIVE CUSTOMER INSIGHTSPROTECT LIVES Analyzing acoustic data coming from turbines in real-time to monitor the health of and predict failures in turbines for hydro power stations.
  • 4. 4© Cloudera, Inc. All rights reserved. Advanced analytics is better than data warehousing Build your data asset economically and at scale 1. Collect data in native format – enables agility 2. Build history by collecting data prior to its use Securely share on-prem, in cloud, anywhere 3. Security at the data layer increases flexibility and ability to protect privacy 4. Create community data and drive innovation by sharing across your business Innovate with analytics and operationalize the insights 5. Analyze data in near real-time 6. Build and deploy machine learning models and other advanced analytics 7. Deliver insights via enterprise, mobile and web applications
  • 5. 5© Cloudera, Inc. All rights reserved. Think Big. Start small. Iterate to success. Being data-driven is a journey, not a project.
  • 6. 6© Cloudera, Inc. All rights reserved. Customers ask: what does ‘think big’ mean? Determine your strategic initiatives. Read your annual report. Define a reasonable timeframe and goals. Typically 3-5 strategic initiatives in parallel Often segmented by business unit At maturity initiatives cross business units.
  • 7. 7© Cloudera, Inc. All rights reserved. Customers ask: what does ‘start small’ mean? Embrace the familiar. Enhance something you know. Make a report better with more data. Then go get a shiny object (new data). Bring in and integrate. Showcase your results with a visualization.
  • 8. 8© Cloudera, Inc. All rights reserved. Customers ask: what does ‘iterate often’ mean? Break strategic initiatives into quarterly objectives. Break your quarterly objectives into sprints. Deliver and visualize outcomes at every sprint exit. Continuously learning and adapting. Outcomes can be both positive and negative. Outcomes can be about the business and data.
  • 9. 9© Cloudera, Inc. All rights reserved. Our Most Successful Customers iterate these Five Things 1. Build a Big Data Culture Led by an enabled executive sponsor(s). Communication methodologies. Advocating change. 2. Assemble the right team Tightly aligned team. Mix of seasoned experts and innovators 3. Adopt an agile approach for data engineering, data science, analysis Successful projects start small, are hypothesis driven and iterate to success approach. Roadmaps: Document expected direction, yet expect insights to create change 4. Efficiently operationalize insights Analytics -> Reports, Big Data -> Actions. Create a bridge between Dev and Ops 5. Rightsize your data governance Rightsize and iteratively building towards maturity.
  • 10. 10© Cloudera, Inc. All rights reserved. Description Executive Sponsorship  Executive Sponsor for the overall Big Data mission including advocacy for creating/collecting data and business stakeholders for individual use cases. Align to strategic initiatives. Community  Build community through communications about vision, insights, data and platform and technology.  Make communications more programmatic across the entire organization with meet ups, big data days and hackathons. Foster a culture that iteratively and continuous builds a strong sharing community. Enable many in the organization – over time – to become evangelists 1. Build a data-driven culture Visualizations  Visualize EVERYTHING! Use visualizations to tell stories about the data asset itself (how big is it, how fast is it growing) as well as insights found for the business.
  • 11. 11© Cloudera, Inc. All rights reserved. Logical Information Architecture An environment that supports new ways of working Ingestion Zone Discovery Zone Integrated Zone Production Trusted power users have broad access & new tools. The new data engineering team partners with data stewards to ingest full fidelity raw data Continuous deployment concepts move data, models, etc from exploratory environment to production Business users have narrowed access with traditional BI tools and applications
  • 12. 12© Cloudera, Inc. All rights reserved. Ingestion Zone Discovery Zone Integrated Zone Production Raw Trusted Ingest Validation & Verification Enrichment Transform Routing Logical Information Architecture An environment that supports new ways of working
  • 13. 13© Cloudera, Inc. All rights reserved. 2. Assemble the right team 1 Executive Architecture & Operations Data Engineering Data Science, SQL & app development Vision and Goals
  • 14. 14© Cloudera, Inc. All rights reserved. Description An essential key to success is having a strong executive sponsor for the overall Big Data mission including advocacy for creating/collecting data and business stakeholders for individual use cases. Profile  An executive focused on change, and willing to take risk to ensure the success of the business via the Big Data initiatives. Education  Use every opportunity to bring the topic in front of potential sponsors and stakeholders. Share industry and business potential ROI models (heeding the warning not to overstate). Advocacy  Build big data success stories from within the business. Advocate for the use of data in new ways. Support the proactive collection of data and lead the charge to assign value to data. The Important Role of the Executive Sponsor
  • 15. 15© Cloudera, Inc. All rights reserved. Hadoop and the Big Data technology ecosystems change rapidly – infrastructure architecture is a critical component of your team. Architects need to balance tactical and strategic needs. Communication  The software and hardware infrastructure is often physically operated by an different group. The Architect needs close collaboration. Education  Continually explore new technologies, including 3rd party tools – architects need to stay ahead of the curve. Training is essential: admin, developer. Leadership  Be the infrastructure expert and advise on new projects and new requirements from the data management team and the business. Know when to call in the experts on Hadoop and Big Data. Description Your Infrastructure Team & Architect
  • 16. 16© Cloudera, Inc. All rights reserved. Data is only useful if users can employ it in a meaningful way. Data engineers have to be committed to making your company’s data the utmost strategic asset, from acquisition to advocacy. Communication  Document, secure, audit the data. Create simple schemas and search indexes for each data set. Create common profiles, and continually advocate for new data and for improved data. Education  Get trained and certified with Cloudera Administrator, Developer, Data Analyst courses and become an expert with Navigator, Sentry Leadership  Promote and evangelize to educate on the value of Big Data, take the lead on data governance – love the data Description Your Data Engineering Team
  • 17. 17© Cloudera, Inc. All rights reserved. Curiosity Math & Statistical Knowledge Hacking skills Subject Matter Expertise The hybrid data scientist • Subject Matter Expertise lies in the business • Hacking skills can come from existing IT staff or new hires • Staff at least one true Ph.D statistician for model oversight across all teams Important character trait Data Science A luxury is finding one or more data scientists that cross these disciplines Your Data Scientist Team(s)
  • 18. 18© Cloudera, Inc. All rights reserved. Often a centralized Data Science team can partner with the business to identify data that differentiates, explore use cases to solve, and help to jumpstart business teams. Be mindful not to overbuild centrally. Agility  The team must be able to learn quickly and adapt Skills  Hybrid skills of computer science (hacking), domain expertise and at least one true statistician. Data Science training. Teams  Often businesses find the domain expertise in-house, add in MS/Ph.D. candidates from local universities and hire that one true statistician Experts  This team must be the “data experts” for the entire company in order to fulfil the vision of sharing data for maximum innovation Description Staff for Success: Data Science-as-a-Service
  • 19. 3. Adopt an agile approach to data engineering & science. Description: Agile methodology provides actionable results more rapidly and measures the value gained at each step, in small iterations. Agile should be applied to both the data and the insights project workstreams. Lower risk: The risk of funding long-running projects with limited business value is small. Use daily results to improve the process or change course. Lower costs: Infrastructure, data, and insights workstreams can run in parallel, which avoids a large build-out of infrastructure and data before any insights are delivered. Communication: Clear short-term results enable a continuous communications stream that showcases results or failures. Team: Start with a small team, and add scrum teams as value is determined and investment becomes available.
  • 20. Agile Methodology Enables Iterative Workstreams. Workstreams shown in the diagram: Use Case Development / App Development / Data Science; Data Engineering; Data Governance & Common Profile Development; EDH Buildout.
  • 21. Agile Methodology Enables Iterative Workstreams. Diagram: three agile workstreams (Agile Use Case Development, Agile Data Ingestion/Management, Agile Data Governance & Common Profile Development), each shown with two scrum teams progressing through numbered releases (Release 1, Release 2, Release 3, Release 4) to Production Ready, alongside the EDH Buildout.
  • 22. 4. Efficiently operationalize your insights. Description: Apply DevOps concepts from the world of application / web development to the world of data and analytics, covering both the data that needs to move to production and the models used to create insights from that data. DevOps: Operationalizing data and insights from analytics is more like digital and mobile development cycles than traditional ERP or RDBMS applications. Communication: Start by finding people who understand both sides of the agile development and IT deployment cycle, and bridge the communication gap between the two. Initially most DevOps processes were manual: making sure web code was unit and functionally tested, ensuring source code was under version control, etc. Continuous Delivery: Move towards automating the processes needed to move new analytical code, models, and data from development into production. Eventually reach complete automation of those processes, allowing continuous deployment of new analytical models and data into production.
  • 23. 5. Rightsize Your Big Data Governance. Data Governance Council: Business owners of the Data Governance. Responsibilities: communication about and enforcement of data governance; assigning data steward roles; improving the link-ability of data. Data Stewards: Owners and/or creators of the data. Responsibilities: providing knowledge about the data (e.g. privacy, use case concerns); documenting and improving the raw data, with a focus on link-ability. Data Engineers: Implement the data governance policies. Responsibilities: defining and driving the governance; organizing and hosting the Governance Council; delivering and utilizing tools (e.g. Navigator) to enforce governance.
  • 24. Rightsize your data governance: Iterate to maturity. Maturity levels: (1) Initial: Chaos, “We don’t know what’s in our data hub”; (2) Managed: CYA, basic governance artifact capture; (3) Standardized: Self-service, data curation automation; (4) Measured: Automation, data stewardship and lifecycle automation; (5) Optimized: Continuous improvement, ongoing optimization.
  • 25. Data can make what is impossible today possible tomorrow
  • 26. Changing our relationship with the products and services we consume
  • 27. Improving reliability, quality, & sustainability
  • 28. Reaching for the stars
  • 29. Protecting the most vulnerable
  • 30. Thank you