Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 1
STKI Summit 2019
THE DATA UNICORNS
Main Themes 2019 for Data and Analytics
• DATA-CENTRIC: Applications, processes & decisions becoming data-centric
• THE DATA DEBT: Data catalogs proliferate, but lack of data ownership and strategy remains
• REAL PROBLEMS: Use of Design Thinking and Empathy concepts to solve REAL problems
• DATA LITERACY: Use of the data “language” in organizations will increase
• DATA SCIENCE FOR ALL: AI, ML and automation will empower citizen DS
• DATA PRODUCTS: “Data product managers” will manage the entire lifecycle
• DATA TEAMS: Agile-like teams will collaborate around data “products”
• AUTOMATION: Automation in data management and data science processes
ARE WE READY FOR A DATA-CENTRIC REALITY?
Intelligent Automation
Seamless Experiences
AI-fueled processes
[Diagram: three eras of enterprise computing]
• Application Centric Computing (systems of transactions): siloed applications (Payroll, Sales, Call center), each with its own software, infra, developers and users
• Customer Facing Computing (systems of engagement), with Digital (forced) Transformation: Channels, APIs, AGILE, Customer Journey, UX, Marketing Automation, RPA, Data Analytics
• DATA Centric Computing (systems of decisions), with the Automation Revolution (Preemptive): AI/ML/DL, Data Science, Intelligence Systems, Human/Machine Workforce, IoT, Process Engineering
The race for AI
• […]% of organizations have adopted or have plans to adopt AI in the next 5 years (IDC)
• AI-driven companies will steal $1.2 trillion from competitors by 2020
The race for AI
63% of CEOs think AI will have a greater impact than the internet
Source: PWC
Like it or not, the AGE of AUTOMATION is here
[Chart: ratio of human-machine working hours – 2018 vs. 2022]
Processes and business operations rely on data
This means future businesses will be
DATA CENTRIC
WHAT DO CEOs SAY ABOUT THEIR OWN DATA GAP?
A 10-year-old gap!
Source: PWC
3 reasons for this gap:
• Lack of analytical mindset
• Data siloing
• Poor data reliability
DATA DRIVEN is more of a cultural thing
Being data-driven means that people’s decisions & actions rely on data
DATA LITERACY* is a new language, and we all need to be fluent in it
*Data literacy: the ability to read, write and communicate data in context
Source: The data literacy project
POOR DATA LITERACY IS A MAJOR ROADBLOCK FOR CDOs
Source: Gartner CDO Survey
thedataliteracyproject.org
A global community dedicated to building a data-literate culture for all
The rise of CDOs
• 29 FTEs reporting directly to CDOs
• 25% increase in CDO funding
• Will be a mission-critical function in 75% of orgs.
Source: Gartner CDO Survey
CDOs time allocation:
• Risk Mitigation: 27%
• Cost Cutting: 28%
• Value Creation: 45%
Source: Gartner CDO Survey
DO YOU HAVE A CDO (DATA) IN PLACE?
YES: 63% and 28% (two survey results)
Sources: STKI DATA Survey, 2019; Gartner CDO Study
CDO survey: Israel (STKI CX Survey 2019)
ONE CDO STRUCTURE >DOESN’T< FIT ALL
Source: Oracle
Source: IBM
WE WANT DATA DEMOCRACY
WHO ARE WE? DATA SCIENTISTS!
WHAT DO WE WANT? SELF SERVICE!
WHEN DO WE WANT IT? NOW!!!
…and then came the DATA LAKE
MYTH: Data lakes are the answer to data democratization and self service.
REALITY: “Let’s upload a lot of data into the lake as quickly as possible.”
This actually created a big problem.
Data is not harmonized, and data lakes are full of isolated data islands: organizations widened their data debt.
The DATA DEBT
• 64% duplicate data
• Missing data: fields that should contain values, but do not
• 25% data entry errors
• No single version of the truth
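One way to make the data debt visible is to measure it. The following is a minimal sketch, assuming records are plain Python dictionaries; the function name `data_debt_report`, the field names and the sample customers are hypothetical and only illustrate how duplicate and missing-value rates could be computed.

```python
from collections import Counter

def data_debt_report(records, key_fields, required_fields):
    """Summarize duplicate and missing-value rates for a list of dict records."""
    total = len(records)
    if total == 0:
        return {"rows": 0, "duplicate_rate": 0.0, "missing_rate": {}}
    # Normalize key values so that "ACME " and "Acme" count as the same entity.
    keys = [
        tuple(str(r.get(f, "")).strip().lower() for f in key_fields)
        for r in records
    ]
    counts = Counter(keys)
    # Every row beyond the first one sharing a key is counted as a duplicate.
    duplicates = sum(c - 1 for c in counts.values())
    # A field is "missing" when it is absent, None, or an empty string.
    missing = {
        f: sum(1 for r in records if r.get(f) in (None, "")) / total
        for f in required_fields
    }
    return {"rows": total, "duplicate_rate": duplicates / total, "missing_rate": missing}

# Hypothetical sample data, for illustration only.
customers = [
    {"name": "Acme", "email": "ops@acme.example", "phone": "555-0100"},
    {"name": "ACME ", "email": "ops@acme.example", "phone": ""},
    {"name": "Globex", "email": "info@globex.example", "phone": None},
    {"name": "Initech", "email": "", "phone": "555-0199"},
]
report = data_debt_report(
    customers, key_fields=["name", "email"], required_fields=["email", "phone"]
)
print(report)
```

Tracking such rates per data set over time is one possible way to tell whether the debt is shrinking or growing.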
“80% of data science is cleaning the data, 20% is complaining about cleaning the data”
WHAT ARE DATA SCIENTISTS’ MAIN CHALLENGES?
1. Dirty data
2. No access
3. Privacy issues
Source: Kaggle State of Data Science Survey
BAD DATA = BAD DECISIONS
• 20% of the average data set is dirty
• $600B is the annual cost to the U.S. economy due to bad data
• 33% of company projects fail because of weak data
• 12% is the average annual revenue loss
(Sources: Springer Link; IBM)
DATA CENTRIC ARCHITECTURE
1. ACCESS
2. INGEST
3. STORE: store the data in a DW/ DL/ Data Mart/ Logical DW
4. PREPARE: transform, clean, understand
5. MODEL: model, learn, train
6. DEPLOY: run code in operational processes (Systems of Decisions)
GOVERN: Data Dictionary
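The six stages can be read as a chain of small functions. The sketch below is only an illustration under simple assumptions: the data is an in-memory list of "name,amount" strings, a dictionary stands in for the warehouse, and the "model" is just a mean threshold. All function names and sample values are hypothetical.

```python
def access(source):
    # 1. ACCESS: here the "source" is just an in-memory list of raw rows.
    return list(source)

def ingest(rows):
    # 2. INGEST: parse raw "name,amount" strings into records.
    records = []
    for row in rows:
        name, amount = row.split(",")
        records.append({"name": name, "amount": amount})
    return records

def store(records, warehouse):
    # 3. STORE: a dict stands in for a DW / DL / data mart.
    warehouse["sales"] = records
    return warehouse

def prepare(warehouse):
    # 4. PREPARE: clean and transform (trim names, drop non-numeric amounts).
    clean = []
    for r in warehouse["sales"]:
        try:
            clean.append({"name": r["name"].strip(), "amount": float(r["amount"])})
        except ValueError:
            continue
    return clean

def model(clean):
    # 5. MODEL: "train" a trivial model, the mean amount (assumes clean is non-empty).
    mean = sum(r["amount"] for r in clean) / len(clean)
    return {"threshold": mean}

def deploy(trained):
    # 6. DEPLOY: return a decision function to run inside an operational process.
    return lambda amount: "review" if amount > trained["threshold"] else "approve"

# GOVERN: a data dictionary describing the fields used along the way.
DATA_DICTIONARY = {"name": "customer name (text)", "amount": "sale amount (number)"}

raw = ["Acme, 100", " Globex ,300", "Initech,n/a"]
decide = deploy(model(prepare(store(ingest(access(raw)), {}))))
print(decide(250), decide(150))
```

Keeping each stage a separate function is a design choice that makes it easier to automate or replace one stage at a time.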
From Waterfall to Agile, Iterative Processes
WHAT’S THE RIGHT BALANCE
for a DATA-CENTRIC-READY BUSINESS?
1. HARMONIZED DATA PLATFORM
• Define a data governance strategy
• Enforce a “data catalog” policy
• Define harmonized data definitions
• Create a central COE for DS teams
2. AGILE TEAM
• Create data teams/ squads
• Product owner
3. AUTOMATION (DATAOPS)
• Automate as many processes as possible in the DS value chain
• Use DataOps/ MLOps as reference
What is stopping you from becoming data centric?
Source: Atscale Big Data Maturity Report
Do you need a data catalog?
Yes.
WHY DO YOU NEED A DATA CATALOG?
• DISCOVERY: Easily search and browse data
• ENABLES: Self service to DS and analysts
• TAGGING: Data is described in technical & business terms
• CURATION: Self service to DS and analysts
• FEEDBACK: Rating and reviews by users
• BALANCE: Between the need to control and to consume
• AWARENESS: Be informed of relevant and available data
• HARMONIZE: Enable single version of the truth
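To make a few of these capabilities concrete, here is a minimal in-memory sketch covering discovery, tagging and feedback. It is not modeled on any particular catalog product; the class names `CatalogEntry` and `DataCatalog`, their methods and the sample data sets are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    description: str                              # business description
    schema: dict                                  # technical description: column -> type
    tags: set = field(default_factory=set)        # TAGGING
    ratings: list = field(default_factory=list)   # FEEDBACK

    def average_rating(self):
        # Returns None until at least one user has rated the data set.
        return sum(self.ratings) / len(self.ratings) if self.ratings else None

class DataCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def get(self, name):
        return self._entries[name]

    def search(self, term):
        # DISCOVERY: match the term against names, descriptions and tags.
        term = term.lower()
        return sorted(
            e.name for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term in t.lower() for t in e.tags)
        )

    def rate(self, name, stars):
        # FEEDBACK: record a user rating for a data set.
        self._entries[name].ratings.append(stars)

catalog = DataCatalog()
catalog.register(CatalogEntry(
    "crm_customers", "Master list of active customers",
    {"customer_id": "int", "email": "str"}, {"customer", "pii"},
))
catalog.register(CatalogEntry(
    "web_clicks", "Raw clickstream events",
    {"session_id": "str", "url": "str"}, {"behavior"},
))
catalog.rate("crm_customers", 5)
catalog.rate("crm_customers", 4)
print(catalog.search("customer"), catalog.get("crm_customers").average_rating())
```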
AUTOMATION in Data Management
Cleaning, wrangling, transforming, and loading
Through the end of 2022, manual tasks in data management will be cut by 45% thanks to ML and automated service-level management (Gartner)
WANTED: Analytics Engineer
[Diagram: overlapping tasks and skills of three roles: DATA SCIENTIST, DATA ENGINEER, BUSINESS ANALYST]
Tasks shown: research tasks; build/plan models; prototype ML models; ingestion; storage; transformation; preparation; virtualization; enrichment; business logic; understand the impact to the business
Skills shown: statistical languages; R, Python; Hadoop; Spark; Kafka; ML; data visualization; unstructured data; business understanding; communication skills; storytelling skills; DB administration; storage; visualization; SQL; data pipeline; data architecture; NoSQL; data integration, ETL, APIs
Data Engineer
Business Analyst Operations Data Scientist
THE GOAL:
Managing data products
DATAOPS IS NOW A “THING”
The next evolution: MLOps (“A/B Testing” for DS)
• ML training (a.k.a. model generation, model build or model fit) generates the model
• ML inference (a.k.a. prediction, scoring, or model serving) generates the insights
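The split between training and inference can be illustrated with a deliberately tiny example. This is a sketch, assuming a one-variable least-squares line fit as the "model"; the function names `train` and `infer` and the sample data are hypothetical.

```python
def train(xs, ys):
    """ML training: fit y = slope * x + intercept by least squares and return the model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def infer(model, x):
    """ML inference: score a new input with an already trained model."""
    return model["slope"] * x + model["intercept"]

# Hypothetical training data that follows y = 2x + 1.
model = train([1, 2, 3, 4], [3, 5, 7, 9])
prediction = infer(model, 10)
print(model, prediction)
```

Training runs occasionally and produces an artifact (the model dictionary here), while inference runs every time a new input arrives; managing both lifecycles together is the concern MLOps addresses.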
Source: EY
Source: EY
[Figure labels]
• Chatbots, NLP/NLG and RPA
• Chatbots, NLP/NLG
• IPA, ML, NLP/NLG, RPA
• IPA (Intelligent process automation), ML and RPA
• Deep Learning, ML and IPA (Intelligent process automation)
AI & ML USE CASE EXAMPLES

GNS HEALTH: DISCOVERING CAUSAL LINKS
GNS applies ML to find overlooked relationships in patients’ health records. It creates hypotheses to explain them and then suggests which are most likely.
Result: GNS uncovered a new drug interaction hidden in unstructured patient notes.

AGRICULTURE: FARMERS RECOMMENDATIONS
AI system provides real-time recommendations for farmers on how to increase productivity (which crops to plant, where to grow, nitrogen in soil…)
Result: farmers happy about the crop yields obtained with AI’s guidance.

DANSKE BANK: AI FOR FRAUD DETECTION
AI solution that improves accuracy of fraud detection. Monitors millions of transactions daily, purchase location, customer behavior, IP addresses… to identify patterns that signal possible fraud.
Result: Improved fraud detection rate by 50%, decreased false positives by 60%. Investigators can concentrate efforts on flagged transactions.
OCEAN MEDALLION: A “LOVE BOAT” EXPERIENCE
Instead of just alleviating the “friction” of typical travel experiences (lines, room keys, paying for things), it will use data to anticipate what you want to do, eat, and see.
The medallion can be used to pay, to unlock the door to your room as you approach, and on the ship’s gambling platform; it also provides recommendations based on preferences.
35% consider machine learning models to be ‘black boxes’ (but feel the models can be explained by experts – “explainers”).
10% of the participants are confident of explaining most or all models.
#MyData
What is “personal data”?
How do you manage it?
Source: Skillzme
Fair data economy
“Give me your data”
• 42% say that lack of trust prevents them from using digital services (Source: Sitra’s 2018 four-country survey; Europe: Finland, Netherlands, France, Germany)
• Trust is built by having the power to influence how your data is used
• In a survey for IBM, 75 percent of respondents said they will not buy a product from a company, no matter how great the product, if they don’t trust that company to protect their data
• Build a data catalog as a “data-lake gatekeeper”
• Tackle point-specific data quality projects
• Assign mixed data teams and an “agile way of working” for specific dynamic analytic
• Automate as much as possible!
• Define key business questions: “Start with the problem, not the data”
• Design a data governance strategy
• Establish CDO-IT-LOBs collaborative processes
• Focus on promoting data literacy
• Implement DevOps/DataOps principles
Einat Shimoni
EVP & Senior Analyst
STKI

The Data Unicorns

  • 1. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 1 STKI Summit 2019 THE DATA UNICORNS
  • 2. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 2 Main Themes 2019 for Data and Analytics DATA-CENTRIC THE DATA DEBT 01 02 03 06 05 08 07 04 Applications, processes & decisions becoming data- centric Data Catalogs proliferation But Lack of data ownership and strategy remains REAL PROBLEMS Use of Design Thinking and Empathy concepts to solve REAL problems DATA LITERACY The data “language” in organizations will increase DATA SCIENCE FOR ALL AI, ML and Automation will empower citizen DS DATA PRODUCTS “Data product managers” will manage the entire lifecycle DATA TEAMS Agile-like teams will collaborate around data “products” AUTOMATION Automation in data management data science processes STKI Summit 2019
  • 3. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 3 ARE WE READY FOR A DATA-CENTRIC REALITY? Intelligent Automation Seamless Experiences AI-fueled processes
  • 4. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 4 Payroll Sales Call center Software Software Software Infra Infra Infra Developers Developers Developers Users Users Silo Silo Silo Application Centric Computing (systems of transactions) Customer Facing Computing (systems of engagement) DATA Centric Computing (systems of decisions) Automation Revolution (Preemptive) AI/ML/DL Data Science Intelligence Systems Human/ Machine Workforce IoT Process Engineering UsersUsers Digital (forced) Transformation Channels APIs AGILE Customer Journey UX Marketing Automation RPA Data Analytics
  • 5. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 5 of organizations have adopted or have plans to adopt AI in the next 5 years (IDC) AI-Driven companies will steal $1.2 Trillion from competitors by 2020 The race for AI
  • 6. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 6 63% of CEOs think AI will have a greater impact than the internet Source: PWC The race for AI
  • 7. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 7 Like it or not, the AGE of AUTOMATION is here Ratio of human-machine working hours – 2018 vs. 2022 human machinehuman machine
  • 8. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 8 Processes and business operations rely on data This means future businesses will be DATA CENTRIC
  • 9. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 9 Source: PWC 10-year old gap! WHAT DO CEOs SAY ABOUT THEIR OWN DATA GAP?
  • 10. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 10 3 reasons for this gap: lack of analytical Mindset Data Siloing Poor data reliability
  • 11. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 11 DATA DRIVEN is more of a cultural thing Being data-driven means that people’s decisions & actions rely on data
  • 12. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 12 DATA LITERACY* is a new language, and we all need to be fluent in it *Data literacy: the ability to read, write and communicate data in context
  • 13. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 13 Source: The data literacy project
  • 14. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 14 POOR DATA LITERACY IS A MAJOR ROAD BLOCK FOR CDOs Source: Gartner CDO Survey
  • 15. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 15 thedataliteracyproject.org A global community dedicated to building a data-literate culture for all
  • 16. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 16 The rise of CDOs Source: Gartner CDO Survey 29 FTEs reporting directly to CDOs 25% increase in CDOs funding Will be Mission-critical function in 75% of orgs.
  • 17. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 17 Source: Gartner CDO Survey Risk Mitigation Cost Cutting Value Creation 27% 28% 45% CDOs time allocation:
  • 18. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 18 63% 28% DO YOU HAVE A CDO (DATA) IN PLACE? Source: STKI DATA Survey, 2019Source: Gartner CDO Study YES
  • 19. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 19 CDO survey: Israel STKI CX Survey 2019
  • 20. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 20 CDO survey: Israel STKI CX Survey 2019
  • 21. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 21 ONE CDO STRUCTURE >DOESN’T< FIT ALL
  • 22. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 22 Source: Oracle
  • 23. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 23 Source: IBM
  • 24. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 24 WE WANT DATA DEMOCRACY
  • 25. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 25 WE WANT DATA DEMOCRACY WHO ARE WE? DATA SCIENTISTS! WHAT DO WE WANT? WHEN DO WE WANT IT? NOW!!! SELF SERVICE!
  • 26. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 26 …and then came the DATA LAKE
  • 27. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 27 MYTH: REALITY: Data lakes are the answer to data democratization and self service. Let’s upload a lot of data into the lake as quickly as possible.
  • 28. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 28 This actually created a big problem. Data is not harmonized, data lakes are full of isolated data islands: Organizations widened their data debt
  • 29. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 29 The DATA DEBT 64% Duplicate data Missing data - Fields that should contain values, but do not. 25% data entry errors No single version of the truth
  • 30. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 30 “80% of data science is cleaning the data 20 % is complaining about cleaning the data” Source: Kaggle State of Data Science Survey WHAT ARE DATA SCIENTISTS’ MAIN CHALLENGES? 1. Dirty data 2. No access 3. Privacy issues
  • 31. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 31
  • 32. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 32 20% 33%600B$ 12% of average data set is dirty is the annual cost to the U.S economy due to bad data of company projects fail because of weak data is the average annual revenue loss (Sources: Springer Link; IBM ) BAD DATA = BAD DECISIONS
  • 33. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 33 STORE ACCESSDEPLOY PREPARE MODEL 6 1 2 3 5 4 Store the data: DW/ DL/ Data Mart/ Logical DW Transform Clean Understand DATA CENTRIC ARCHITECTURE 6.DEPLOY 1.ACCESS 2.INGEST 3.STORE4.PREPARE Model 5.MODEL Learn Train GOVERN INGEST Run code in operational processes Systems of Decisions Data Dictionary
  • 34. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 34 From Waterfall to Agile, Iterative Processes
  • 35. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 35 WHAT’S THE RIGHT BALANCE for a DATA-CENTRIC-READY BUSINESS?
  • 36. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 36 Define a data governance strategy Enforce a “data catalog” policy Define harmonized data definitions Create a central COE for DS teams HARMONIZED DATA PLATFORM AGILE TEAM AUTOMATION (DATAOPS) 1 2 3 Create data teams/ squads Product owner Automate as many processes as possible in the DS value chain Use DataOps/ MLOps as reference
  • 37. Copyright@STKI_2019 Do not remove source or attribution from any slide or graph 37 Define a data governance strategy Enforce a “data catalog” policy Define harmonized data definitions Create a central COE for DS teams HARMONIZED DATA PLATFORM AGILE TEAM AUTOMATION: DATAOPS 1 2 3 Create data teams/ squads Product owner Automate as many processes as possible in the DS value chain Use DataOps/ MLOps as reference
  • 38. What is stopping you from becoming data centric? Source: Atscale Big Data Maturity Report
  • 39. Do you need a data catalog? Yes.
  • 40. WHY DO YOU NEED A DATA CATALOG? DISCOVERY: Easily search and browse data. ENABLES: Self service to DS and analysts. TAGGING: Data is described in technical & business terms. CURATION: Self service to DS and analysts. FEEDBACK: Rating and reviews by users. BALANCE: Between the need to control and to consume. AWARENESS: Be informed of relevant and available data. HARMONIZE: Enable single version of the truth.
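Three of the catalog capabilities listed on this slide (discovery, tagging, and feedback) can be illustrated with a very small in-memory structure. This is a hedged sketch only: the class names `CatalogEntry` and `DataCatalog`, the data set names, and the star ratings are all hypothetical and do not describe any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set in the catalog, with business/technical tags and user ratings."""
    name: str
    description: str
    tags: set = field(default_factory=set)
    ratings: list = field(default_factory=list)

    def average_rating(self):
        # FEEDBACK: average of user ratings, or None when nobody has rated yet.
        return sum(self.ratings) / len(self.ratings) if self.ratings else None


class DataCatalog:
    """Hypothetical in-memory catalog keyed by data set name."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # TAGGING: an entry carries its description and tags when registered.
        self._entries[entry.name] = entry

    def get(self, name):
        return self._entries[name]

    def search(self, term):
        # DISCOVERY: match the term against name, description, or an exact tag.
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or term in {t.lower() for t in e.tags}
        )

    def rate(self, name, stars):
        # FEEDBACK: record a user rating for a data set.
        self._entries[name].ratings.append(stars)


# Illustrative usage with made-up data sets.
catalog = DataCatalog()
catalog.register(CatalogEntry("sales_orders", "Daily order lines", {"sales", "finance"}))
catalog.register(CatalogEntry("call_center_log", "Call center tickets", {"service"}))
catalog.rate("sales_orders", 4)
catalog.rate("sales_orders", 5)
```

A real catalog would add ownership, lineage, and access control; the sketch only shows why search, tags, and ratings together make data easier to find and trust.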
  • 41. AUTOMATION in Data Management: cleaning, wrangling, transforming, and loading. Through to the end of 2022, manual tasks in data management will be cut by 45% thanks to ML and automated service-level management (Gartner).
  • 43. WANTED: Analytics Engineer Research tasks Build/plan models Statistical languages Prototype ML models DATA ENGINEER BUSINESS ANALYST DATA SCIENTIST Ingestion Storage Transformation Preparation Virtualization Enrichment Business Logic Understand the impact to the business R, Python Hadoop Spark Kafka ML Data Visualization Unstructured data Business understanding Communication skills Storytelling skills DB Administration Storage Visualization SQL Data Pipeline Business understanding Communication skills Data Architecture NoSQL Data Integration, ETL, APIs
  • 44. Data Engineer Business Analyst Operations Data Scientist
  • 46. THE GOAL: Managing data products
  • 47. DATAOPS IS NOW A “THING”
  • 48. The next evolution: MLOps (“A/B Testing” for DS). ML training (a.k.a. model generation, model build, or model fit) generates the model. ML inference (a.k.a. prediction, scoring, or model serving) generates the insights.
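The split between ML training (which produces a model) and ML inference (which applies that model to new inputs) can be shown with a deliberately tiny example. The sketch below assumes a one-variable least-squares fit and made-up data; it is meant only to show the two phases as separate steps, not to represent a real MLOps stack.

```python
def train(xs, ys):
    """ML training (model fit): estimate slope and intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The returned dict is the "model" artifact that training generates.
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def infer(model, x):
    """ML inference (scoring): apply a previously trained model to a new input."""
    return model["slope"] * x + model["intercept"]


# Hypothetical toy data that follows y = 2x + 1.
model_a = train([1, 2, 3, 4], [3, 5, 7, 9])
prediction = infer(model_a, 10)
```

Keeping `train` and `infer` separate is what lets a team retrain on a schedule, store the resulting model, and serve predictions independently, or run two stored models side by side to compare them.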
  • 49. Source: EY
  • 50. Source: EY Chatbots, NLP/NLG and RPA. Chatbots, NLP/NLG. IPA, ML, NLP/NLG, RPA IPA (Intelligent process automation), ML and RPA Deep Learning, ML and IPA (Intelligent process automation)
  • 51. AI & ML USE CASE EXAMPLES. GNS HEALTH: DISCOVERING CAUSAL LINKS. GNS applies ML to find overlooked relationships in patients’ health records. It creates hypotheses to explain them and then suggests which are most likely. Result: GNS uncovered a new drug interaction hidden in unstructured patient notes. AGRICULTURE: FARMERS RECOMMENDATIONS. AI system provides real-time recommendations for farmers on how to increase productivity (which crops to plant, where to grow, nitrogen in soil…). Result: farmers happy about the crop yields obtained with AI’s guidance. DANSKE BANK: AI FOR FRAUD DETECTION. AI solution that improves accuracy of fraud detection. Monitors millions of transactions daily, purchase location, customer behavior, IP addresses… to identify patterns that signal possible fraud. Result: Improved fraud detection rate by 50%, decreased false positives by 60%. Investigators can concentrate efforts on flagged transactions.
  • 52. OCEAN MEDALLION: A “LOVE BOAT” EXPERIENCE. Instead of just alleviating the “friction” of typical travel experiences (lines, room keys, paying for things), it will use data to anticipate what you want to do, eat, and see. The medallion can be used to pay, to unlock the door to your room as you approach, and on the ship’s gambling platform; it also provides recommendations based on preferences.
  • 53. 35% consider machine learning models to be “black boxes” (but feel the models can be explained by experts, or “explainers”). 10% of the participants are confident they can explain most or all models.
  • 54. #MyData
  • 56. What is “personal data”? How do you manage it? Source: Skillzme
  • 57. Fair data economy
  • 58. “Give me your data”. 42% say that lack of trust prevents them from using digital services. (Source: Sitra’s 2018 four-country survey; Europe: Finland, Netherlands, France, Germany.) Trust is built by having the power to influence how your data is used. In a survey for IBM, 75 percent of respondents said they will not buy a product from a company, no matter how great the product, if they don’t trust that company to protect their data.
  • 59. • Build a data catalog as a “data-lake gatekeeper” • Tackle point-specific data quality projects • Assign mixed data teams and an “agile way of working” for specific dynamic analytic • Automate as much as possible! • Define key business questions: “Start with the problem, not the data” • Design a data governance strategy • Establish CDO-IT-LOBs collaborative processes • Focus on promoting data literacy • Implement DevOps/DataOps principles
  • 60. Einat Shimoni, EVP & Senior Analyst, STKI