“Story”fying your Data : How to go from Data to Insights to Stories
Shravan Kumar, DSS, Sept 14th, 2020
How to make yourself Indispensable in your Career with Data
How a nurse changed the course of a war using data storytelling
Nightingale helped curtail the death rate from a whopping 40% to a mere 2%
Created by Florence Nightingale for Queen Victoria during the Crimean War, in which Britain fought alongside France against Russia.
Visualizes deaths due to:
Red: War wounds
Black: Other war-related causes
Blue: Avoidable hospital diseases
INTRODUCTION
Shravan Kumar A
Director, Client Success
“Simplify Data Science for all”
100+ Clients
Insights as Stories
Help start, apply and adopt Data Science
@sh_ra_van
/shravankumara
Introduction to Data Portraits
How to Create a Data Portrait
COVID-19 Impact on Industries – A Perspective
Source: McKinsey – COVID-19 Briefing materials
Companies are working to minimize COVID-19 impact and build resilience
The effects of the outbreak aren’t going away quickly. This realization has settled in.
COVID-19 has disrupted every industry. All sectors display an element of fragility and are susceptible to shock. [2]
Industries at the forefront of the crisis are relying on data to inform their response and rebound strategies.
McKinsey [1] suggests three waves of data-driven actions that organizations can take:
1. Ensure data teams, and the whole organization, remain operational.
2. Lead solutions to prepare for the crisis-triggered challenges.
3. Prepare for the next normal and get ready to execute the plans.
[1] Source: BCG Covid-19 report, Apr 2, 2020
[2] Source: McKinsey - How CDOs can navigate COVID-19 response, Apr 2020
DATA SCIENCE:
WHAT’S THE VALUE?
IT’S A RECESSION.
WHY DATA NOW?
REALITY CHECK: HOW
TO THRIVE?
FEELING LUCKY? HERE’S A DATA SCIENCE TITLE GENERATOR!
Vanity keywords: Chief, Principal, Senior, Junior, Associate, Deputy, Assistant
Areas: Data, Statistical, ML, AI
Activities: Scientist, Engineer, Analyst, Designer, Developer, Storyteller, Ninja, Chef, Wrangler, Evangelist, Rock Star, Wizard, Alchemist
Examples: Senior Data Scientist, Principal AI Storyteller, Chief Data Wizard
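The title generator on this slide amounts to picking one word from each of three lists. A minimal sketch in Python follows; the word lists are copied from the slide, but which words belong to which list is an assumption inferred from the three sample titles, and the function name `generate_title` is made up for illustration.

```python
import random

# Word lists copied from the slide. The grouping into three lists is an
# assumption based on sample titles such as "Senior Data Scientist".
VANITY = ["Chief", "Principal", "Senior", "Junior", "Associate", "Deputy", "Assistant"]
AREAS = ["Data", "Statistical", "ML", "AI"]
ACTIVITIES = ["Scientist", "Engineer", "Analyst", "Designer", "Developer",
              "Storyteller", "Ninja", "Chef", "Wrangler", "Evangelist",
              "Rock Star", "Wizard", "Alchemist"]


def generate_title(rng=random):
    """Join one random word from each list into a job title."""
    return " ".join([rng.choice(VANITY), rng.choice(AREAS), rng.choice(ACTIVITIES)])


if __name__ == "__main__":
    for _ in range(3):
        print(generate_title())
```

With 7 vanity keywords, 4 areas and 13 activities, the sketch can produce 364 distinct titles, which is arguably the point of the slide: the title says little about the work.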
BUZZWORDS AND BUSTED BUDGETS
THE JOURNEY FROM DATA TO DECISIONS
Maturity phases:
• Data Engineering: Data Collection, Data Storage, Data Transformation
• Data Science: Reporting, Insights, Consumption
• Data as ‘Culture’: Decisions
Source: Article – When and how to build out your data science team
REPORTING: DESCRIPTIVE SUMMARIES
Revenue numbers from four cities, 2019

Month      Boston          Chicago         Detroit         New York
           Price   Sales   Price   Sales   Price   Sales   Price   Sales
Jan        10.0     8.04   10.0     9.14   10.0     7.46    8.0     6.58
Feb         8.0     6.95    8.0     8.14    8.0     6.77    8.0     5.76
Mar        13.0     7.58   13.0     8.74   13.0    12.74    8.0     7.71
Apr         9.0     8.81    9.0     8.77    9.0     7.11    8.0     8.84
May        11.0     8.33   11.0     9.26   11.0     7.81    8.0     8.47
Jun        14.0     9.96   14.0     8.10   14.0     8.84    8.0     7.04
Jul         6.0     7.24    6.0     6.13    6.0     6.08    8.0     5.25
Aug         4.0     4.26    4.0     3.10    4.0     5.39   19.0    12.50
Sep        12.0    10.84   12.0     9.13   12.0     8.15    8.0     5.56
Oct         7.0     4.82    7.0     7.26    7.0     6.42    8.0     7.91
Nov         5.0     5.68    5.0     4.74    5.0     5.73    8.0     6.89
Average     9.0     7.50    9.0     7.50    9.0     7.50    9.0     7.50
Variance   10.0     3.75   10.0     3.75   10.0     3.75   10.0     3.75
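The point of the table is that the four cities share the same averages and variances even though the monthly numbers differ. The figures match Anscombe's quartet, a well-known dataset built to make exactly this point. A small sketch to check the summary rows, using only Python's standard library; the variance row appears to be the population variance, which is an assumption the code makes explicit by using `pvariance`.

```python
from statistics import mean, pvariance

# Price and sales columns copied from the table. Boston, Chicago and
# Detroit share the same price column.
SHARED_PRICE = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
CITIES = {
    "Boston": (SHARED_PRICE,
               [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Chicago": (SHARED_PRICE,
                [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Detroit": (SHARED_PRICE,
                [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "New York": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                 [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(cities):
    """Return mean and population variance of price and sales per city."""
    return {
        city: {
            "price_mean": mean(price),
            "price_var": pvariance(price),
            "sales_mean": mean(sales),
            "sales_var": pvariance(sales),
        }
        for city, (price, sales) in cities.items()
    }


if __name__ == "__main__":
    for city, stats in summarize(CITIES).items():
        print(city, {k: round(v, 2) for k, v in stats.items()})
```

The summaries agree to two decimal places across all four cities, so a report that stops at averages and variances cannot tell them apart. Plotting price against sales is what reveals the difference.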
INSIGHT: PREDICTING TELCO CUSTOMER CHURN
[Decision tree figure. Splits: Tenure (months): 0-12, 12-36, 36+; Data Usage > 1.5 GB: Y/N; Bill > $65: Y/N. Leaves: 0 = Low Risk, 1 = High Risk]
• Simple decision-tree model offered ~30% reduction in churn
• Advanced black-box models offered ~50%, but with low explainability
Source: Gramener
CONSUMPTION: WHEN ARE PEOPLE BORN IN THE US?
Source: https://gramener.com/posters/Birthdays.pdf
Legend: More births / Fewer births
• Very high births... so, conceptions might happen here
• Love the Valentine’s?
• Too busy holidaying?
• Avoid April Fool’s Day?
• Unlucky 13th?
CONSUMPTION: WHAT’S THE BIRTH PATTERN IN INDIA?
Source: https://gramener.com/posters/Birthdays.pdf
Legend: More births / Fewer births
• Most births in the first half
• Very low births Aug onwards
• A striking birth pattern seen on the 5th, 10th, 15th, 20th and 25th of each month... this is a typical indication of fraud!
• Why? Birthdates are ‘changed’ to aid early school admissions
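The round-day pattern described for India can be checked by counting births per day of the month and flagging days that stand well above the typical day. The sketch below is one simple way to do that, not the method behind the poster; the function name, the 1.5x threshold and the sample data are all hypothetical.

```python
from collections import Counter
from statistics import median


def suspicious_days(days_of_month, ratio=1.5):
    """Return days whose birth count exceeds `ratio` times the median daily count."""
    counts = Counter(days_of_month)
    typical = median(counts.values())
    return sorted(day for day, n in counts.items() if n > ratio * typical)


# Hypothetical data: 100 births on each of days 1..28,
# plus 80 extra recorded births on the round-numbered days.
sample = [day for day in range(1, 29) for _ in range(100)]
sample += [day for day in (5, 10, 15, 20, 25) for _ in range(80)]

if __name__ == "__main__":
    print(suspicious_days(sample))
```

Using the median rather than the mean keeps the baseline from being pulled up by the very spikes the check is trying to find.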
This adversely impacts children’s marks
It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.
Children “born” on the 1st, 5th, 10th, 15th, etc. of the month tend to score lower marks on average.
Legend: Higher marks / Lower marks, on average, for children born on a given day of the year (from 2007 to 2013)
Children “born” on round-numbered days score lower marks on average, due to a higher proportion of younger children
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Class Xth English Marks Distribution
[Histogram. X-axis: Marks (0 to 100). Y-axis: # students (0 to 20,000)]
Stories have four types of narratives to explain visualizations
Remember “SEAR”: Summarize, Explain, Annotate, Recommend
Summarize the visual in its title
• Don’t describe the chart.
• Don’t write the user’s question.
• Write the answer itself. Like a headline.
• Example: Teachers add marks to stop some students from failing
Explain & interpret the visual
• How should the user read it?
• What do you say when you talk through it?
• Explain what the visual is. Then the axes. Then its contents. Then the inference.
• Example: This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin.
Annotate essential elements
• What should the user focus their eyes on?
• Point it out, or highlight it with colors
• Interpret what they’re seeing, in words.
• Examples: Large number of students score exactly 35 marks. Few (but not 0) students fail at 31-34 marks. No one scores 0-4 marks.
• What’s unusual: Large number of students score 35 marks. Few (but not 0) students score between 30-35.
Recommend an action
• How should I act on this?
• You need to change the audience. (Otherwise, you made no difference.)
• Example: Only some students get this benefit. Identify a fair policy that will be applied consistently.
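The anomaly in the marks chart (a spike at the pass mark of 35 and a dip just below it) can be quantified by comparing the count at the pass mark with the counts on either side. This is a rough sketch of that comparison, not the analysis behind the slide; the function name, the window of four marks and the sample data are hypothetical.

```python
from collections import Counter


def pass_mark_anomaly(marks, pass_mark=35, window=4):
    """Compare the count at the pass mark with the mean counts just below and just above it."""
    counts = Counter(marks)
    below = [counts[m] for m in range(pass_mark - window, pass_mark)]
    above = [counts[m] for m in range(pass_mark + 1, pass_mark + 1 + window)]
    mean_above = sum(above) / window
    return {
        "at_pass": counts[pass_mark],
        "mean_just_below": sum(below) / window,
        "mean_just_above": mean_above,
        "spike_ratio": counts[pass_mark] / mean_above if mean_above else float("inf"),
    }


# Hypothetical data: 1,000 students at every mark from 30 to 40,
# then 90% of those scoring 31-34 are bumped up to 35.
original = [m for m in range(30, 41) for _ in range(1000)]
sample = [35 if 31 <= m <= 34 and i % 10 != 0 else m for i, m in enumerate(original)]

if __name__ == "__main__":
    print(pass_mark_anomaly(sample))
```

A spike ratio well above 1 together with a low count just below the pass mark is the numeric counterpart of the two annotations on the chart.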
An energy utility detected billing fraud
An energy utility (with over 50 million subscribers) had 10 years’ worth of customer billing data available. Most fraud detection software failed to load the data, and sampled data revealed little or no insight.
This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the slab boundaries.
The plot is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a specific bill amount (in units, or kWh).
Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units. So people have a strong incentive to stay at or within a slab boundary.
This can happen in one of two ways.
First, people may be monitoring their usage very carefully, and turn off their lights and fans the instant their usage hits the slab boundary.
Or, more realistically, there’s probably some level of corruption involved, where customers pay a small sum to the meter reading staff to ensure that the reading stays exactly at the slab boundary, giving them the advantage of a lower price.
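Two ideas in this slide lend themselves to a short worked example: the jump in the bill when usage crosses a slab boundary, and the pile-up of readings exactly at the boundary. The sketch below illustrates both. The slab limits and rates are invented for illustration (the source gives only the 100 versus 101 units example), and `boundary_excess` is a hypothetical helper, not the utility's actual method.

```python
from collections import Counter

# Hypothetical slabs: (upper limit in units, rate per unit). Not from the source.
SLABS = [(100, 2.0), (200, 3.0), (float("inf"), 5.0)]


def bill(units, slabs=SLABS):
    """Bill every unit at the rate of the slab the total usage falls in."""
    for upper, rate in slabs:
        if units <= upper:
            return units * rate
    raise ValueError("no slab covers this usage")


def boundary_excess(readings, boundary, width=5):
    """Ratio of readings exactly at `boundary` to the mean count of the `width` readings just below it."""
    counts = Counter(readings)
    neighbours = [counts[boundary - k] for k in range(1, width + 1)]
    baseline = sum(neighbours) / width
    return counts[boundary] / baseline if baseline else float("inf")


# Hypothetical readings: 50 customers at each usage level from 90 to 110,
# plus 200 extra readings recorded exactly at the 100-unit boundary.
sample = [u for u in range(90, 111) for _ in range(50)] + [100] * 200

if __name__ == "__main__":
    print(bill(100), bill(101))          # one extra unit moves the whole bill to the higher rate
    print(boundary_excess(sample, 100))  # how much the boundary stands out from its neighbours
```

With these made-up rates, one extra unit raises the bill from 200.0 to 303.0, which is the incentive the slide describes. A count over the full dataset is also why sampling hid the pattern: the check needs every reading near the boundary.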
CONSUMPTION: DECODING MAHABHARATHA’S RELATIONSHIP
Source: https://gramener.com/mahabharatha/
INSIGHT + CONSUMPTION: DATA STORIES FROM THE WORLD BANK
Source: World bank storytelling, by Gramener
DATA & AI CAN SAVE LIVES TOO
The Story of Marikina City, Philippines
• Highly urbanized city situated on the river basin of Marikina
• Faced with huge flood hazard levels. Better & resilient infrastructure planning needed
• How can urban planners plan for better emergency evacuation & rescue?
• Can AI be applied to solve this problem? If applied, how can the urban planner understand it?
INSIGHT: IDENTIFYING QUALITY OF LIFE FROM SATELLITE IMAGES
Source: https://qol.gramener.com/
Data stories through Comicgen
Example: COVID-19 data explained by data comics
Comic character in a data callout:
Samuel L. Jackson
Harrison Ford
Morgan Freeman
Tom Hanks
Tom Cruise
Insights and Storytelling approach
Stage 1 - Identify Business Problem
Define the problem statement by understanding:
• What is the basic need and desired outcome?
• Who will benefit?
• What is the impact?
• What are the success criteria?
Roles: Sales Rep, Data Consultant, Account Manager, Solution Lead
Stage 2 - Translate to Data Problem
• Break down the problem statement into multiple use cases
• Connect each use case with a data set
• Understand any limitations on data sources (internal and external)
Roles: Analyst Lead, Data Consultant, Account Manager, Solution Architect, Solution Lead
Stage 3 - Data Answer
Target each use case with data through:
• EDA and transformation
• Modelling
• Generating insights
Roles: Analyst Lead, Data Consultant, Data Scientist, Solution Architect, Solution Lead
Stage 4 - Translate to Business Answer
• Stitch insights from individual use cases to create a story
• Connect data story to help in better decision making
• Measure success
Roles: Data Consultant, Account Manager, Solution Lead
In summary, here are the 9 steps to go from data to a data story
1. Who is your audience? They determine the story
2. What is their problem? That defines your analysis
3. Find the right analysis to solve the problem
4. Filter for big, useful, surprising insights
5. Start with the takeaway. Summarize your entire story
6. Add supporting analyses as a tree
7. Pick a format based on how your audience will consume the story
8. Pick a visual design based on the takeaway
9. Annotate to explain & engage. Use four types of narratives
DATA SCIENCE:
WHAT’S THE VALUE?
IT’S A RECESSION.
WHY DATA NOW?
REALITY CHECK: HOW
TO THRIVE?
1. Most Data Science projects solve the wrong Problem..
Tip #1: Master the application of knowledge
AI IS COMING FOR THE DATA SCIENCE JOBS
AI and automation will do away with most of the grunt work in the data science workflow today.
Applied knowledge will keep you relevant for much longer.
Wolbachia blocks dengue, Zika and chikungunya virus transmission
Wolbachia mosquito releases: Adults, Eggs, Community
Model design
• Identify where people live
• Detect buildings
• Estimate human population density (e.g. 15,000 ppl / km2, 20,000 ppl / km2)
• 100m2 grids
Site scoping
• Set boundary of potential release area
• Identify the areas where people live
• Map mosquito release points over area with a grid
• Organise release area into stages
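The model design above goes from detected buildings to a population density per grid cell. The arithmetic can be sketched as follows. Everything numeric here is an assumption made for illustration: the source does not say how many people are assumed per building, and the sketch reads the grid as 100 m by 100 m cells, which the slide does not confirm.

```python
def density_per_km2(building_counts, people_per_building=4.0, cell_side_m=100.0):
    """Convert a grid of building counts into people per square kilometre.

    people_per_building and cell_side_m are illustrative assumptions,
    not values taken from the source.
    """
    cells_per_km2 = (1000.0 / cell_side_m) ** 2
    return [
        [count * people_per_building * cells_per_km2 for count in row]
        for row in building_counts
    ]


# Hypothetical 2 x 2 grid of detected building counts.
grid = [[50, 0],
        [25, 10]]

if __name__ == "__main__":
    for row in density_per_km2(grid):
        print(row)
```

Cells with no detected buildings come out at zero density, which is how the grid separates areas where people live from areas that can be skipped when mapping release points.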
2. Data Analytics needs a lot more than Data & Analytics..
Tip #2: Learn non-core skills
DATA SCIENCE SOLUTION: LET’S TAKE THIS EXAMPLE..
Source: World bank storytelling, by Gramener
..AND BREAK IT DOWN INTO THE BUILDING BLOCKS
• Domain: Business workflow, Influencing factors
• Design: User journey, Visuals & aesthetics
• Analytics: Impact analytics, Clustering techniques
• Development: Frontend/backend coding, Data transformation
• Project Management: Piecing it all together, Change management
HERE ARE THE 5 ROLES & SKILLS CRITICAL FOR DATA SCIENCE
• Data Translator (Domain): Domain expertise, Business analysis, Solutioning
• ML Engineer (Development): Software engineering, Front/back-end coding, Data pipelining
• Information Designer (Design): Information design, User centered design, Interface/visual design (parts)
• Data Scientist (Analytics): Stats & ML, Interpret insights, Scripting skills
• Data Science Manager (Project Management): Project management, Business analysis/solutioning, Team handling
Comic characters from Gramener Comicgen library
3. Data cleaning takes up a majority of time on projects..
Tip #3: Sharpen ability to handle data
“In data science, 80% of the time is spent preparing data, and the other 20% on complaining about preparing the data!”
- Kirk Borne
4. Technology goes obsolete faster in Data Science..
Tip #4: Learn new tools quickly
WHAT DOES THE DATA TOOLS LANDSCAPE LOOK LIKE?
The tool does not matter. A person’s skill with the tool does.
Pick up the ability to learn new tools rapidly
Source: https://mattturck.com/data2019/
EXAMPLE: WHAT ARE YOUR TOOL OPTIONS TO VISUALIZE DATA?
[Chart: tools placed by Flexibility and Complexity, ranging from Plug-n-play to Code-based]
Tools shown: Google Data Studio, Excel, Google Sheets, Tableau, Raw, Vismio, Datawrapper, Timeline JS, Polestar, Vega, Vega-lite, d3, matplotlib, C3, High charts, Nvd3, Gramex, ggplot, bokeh, Plotly
Choose tools based on flexibility, your background and tool availability
Tip #1: Master the application of knowledge
Tip #2: Learn non-core skills
Tip #3: Sharpen ability to handle data
Tip #4: Learn new tools quickly
DATA SCIENCE:
WHAT’S THE VALUE?
IT’S A RECESSION.
WHY DATA NOW?
REALITY CHECK: HOW
TO THRIVE?
WHAT DOES THE RECESSION MEAN FOR JOBS IN DATA SCIENCE?
Source: McKinsey report – Lives and Livelihoods
Data jobs and specialized professions are relatively less impacted
Industries with the lowest wages and lowest educational attainment are hit the hardest
HERE’S WHY DATA IS KEY FOR COVID-19 AND THE RECESSION
A. Public Health
1. Tracking the COVID-19 patient lifecycle
2. Identifying vulnerability and contact-tracing
3. Predicting infection rates and spread
B. Enterprises
1. Market demand & cash flows
2. Remote workforce & collaboration
3. Supply chain & logistics
C. Community
1. Mapping the effectiveness of shutdown
2. Understand behavioral shifts
3. Address people’s concerns during COVID-19
Sources: Gramener – NYC 311 analysis; Kinsa Health weather map; Gramener – Supply Chain flow
HOW DO YOU STAY RELEVANT AND GROW IN YOUR CAREER PATH?
• Do your own data projects
• Read/Write on data science
• Maintain a public portfolio
• Compete, learn & re-apply
Source: Article – How to demonstrate your passion for Data
@sh_ra_van
/shravankumara
Please help me improve the session by answering the feedback survey that will be sent to your email
THANK YOU!
GRACIAS!
MERCI!

More Related Content

PPTX
The value of storytelling through data
PPTX
Storytelling for analytics | Naveen Gattu | CDAO Apex 2020
PPTX
The ultimate guide to data storytelling | Materclass
PPTX
Humanizing Data Storytelling for Greater Business Impact
PDF
Data Storytelling - Game changer for Analytics
PDF
How AI Can Help You Make Your Audience Sit Up and Take Notice
PDF
Data & Storytelling - What Now?
PPTX
The Art of Storytelling Using Data Science
The value of storytelling through data
Storytelling for analytics | Naveen Gattu | CDAO Apex 2020
The ultimate guide to data storytelling | Materclass
Humanizing Data Storytelling for Greater Business Impact
Data Storytelling - Game changer for Analytics
How AI Can Help You Make Your Audience Sit Up and Take Notice
Data & Storytelling - What Now?
The Art of Storytelling Using Data Science

What's hot (20)

PPTX
Insights from Data: Overcoming Objections
PPTX
Exploratory data analysis
PPTX
Data monetization
PDF
Entering the Data Analytics industry
PDF
Oct 2017 Measurement Hour: Highlights from the Summit on the Future of Measur...
PDF
1115 track2 siegel
PPTX
'Visual Intelligence' by Ganes Kesari, at Hyderabad Analytics Club
PDF
'Recession-proofing' your Business with Data
PDF
1120 track2 bennett
PDF
2016 Data Science Salary Survey
PDF
Predictive Data Analytics to Help Your Customers
PDF
Elsevier
PDF
Data Sourcing Best Practices for Reporting (Webinar slides)
PDF
CFO's Guide to Business Analytics
PPTX
Algorithms and the technology of personalisation final
PPTX
Does big data = big insights?
PDF
Real-world state of the BI market: Webinar presentation slides
PDF
1330 keynote owusu
PPTX
Analytics in business
DOC
Why good spreadsheets make bad strategies
Insights from Data: Overcoming Objections
Exploratory data analysis
Data monetization
Entering the Data Analytics industry
Oct 2017 Measurement Hour: Highlights from the Summit on the Future of Measur...
1115 track2 siegel
'Visual Intelligence' by Ganes Kesari, at Hyderabad Analytics Club
'Recession-proofing' your Business with Data
1120 track2 bennett
2016 Data Science Salary Survey
Predictive Data Analytics to Help Your Customers
Elsevier
Data Sourcing Best Practices for Reporting (Webinar slides)
CFO's Guide to Business Analytics
Algorithms and the technology of personalisation final
Does big data = big insights?
Real-world state of the BI market: Webinar presentation slides
1330 keynote owusu
Analytics in business
Why good spreadsheets make bad strategies
Ad

Similar to Storyfying your Data: How to go from Data to Insights to Stories (20)

PPTX
Nr14: Ten tips for data journalists
PPTX
Data Storytelling for Social Change
PDF
Defining Constituents, Data Vizzes and Telling a Data Story
PPTX
Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...
PDF
The Data Stroytelling Handbook
PDF
TechSoup Connect Western Canada: Data To Action: Making Your Data Visible and...
PDF
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
PPTX
APLIC 2014 - Impact? Intrigue? Value-add? The ins and outs of Data Visualization
PDF
Data scientist
PPTX
Data visualisation as a campaign tool for change
PPTX
Data visualization for social problems
PDF
What's the Value of Data Science for Organizations: Tips for Invincibility in...
PDF
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
PDF
Explore Data: Data Science + Visualization
PPTX
DATASCIENCE.pptx
PDF
Data fluency for the 21st century
PPTX
How to Enter the Data Analytics Industry?
PDF
Nonprofits & Data: When Data is Everywhere, Where Do You Start?
PDF
Around Data Science
Nr14: Ten tips for data journalists
Data Storytelling for Social Change
Defining Constituents, Data Vizzes and Telling a Data Story
Data Quality: Are Your Data Suitable For Answering Your Questions? - Experfy ...
The Data Stroytelling Handbook
TechSoup Connect Western Canada: Data To Action: Making Your Data Visible and...
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
APLIC 2014 - Impact? Intrigue? Value-add? The ins and outs of Data Visualization
Data scientist
Data visualisation as a campaign tool for change
Data visualization for social problems
What's the Value of Data Science for Organizations: Tips for Invincibility in...
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
Explore Data: Data Science + Visualization
DATASCIENCE.pptx
Data fluency for the 21st century
How to Enter the Data Analytics Industry?
Nonprofits & Data: When Data is Everywhere, Where Do You Start?
Around Data Science
Ad

More from Gramener (20)

PPTX
6 Methods to Improve Your Manufacturing Process with Computer Vision
PDF
Detecting Manufacturing Defects with Computer Vision
PDF
How to Identify the Right Key Opinion Leaders (KOLs) in Pharma & Healthcare
PDF
Automated Barcode Generation System in Manufacturing
PDF
The Role of Technology to Save Biodiversity
PPTX
Enable Storytelling with Power BI & Comicgen Plugin
PDF
The Most Effective Method For Selecting Data Science Projects
PPTX
Low Code Platform To Build Data & AI Products
PPTX
5 Key Foundations To Build An Effective CX Program
PPTX
Using Power BI To Improve Media Buying & Ad Performance
PPSX
Recession Proofing With Data : Webinar
PPTX
Engage Your Audience With PowerPoint Decks: Webinar
PPTX
Structure Your Data Science Teams For Best Outcomes
PPTX
Dawn Of Geospatial AI - Webinar
PPTX
5 Steps To Become A Data-Driven Organization : Webinar
PPTX
5 Steps To Measure ROI On Your Data Science Initiatives - Webinar
PPTX
Saving Lives with Geospatial AI - Pycon Indonesia 2020
PPTX
Driving Transformation in Industries with Artificial Intelligence (AI)
PPTX
Data and Storytelling | What Now?
PDF
Introduction to Data Storytelling | Rasagy Sharma - Gramener
6 Methods to Improve Your Manufacturing Process with Computer Vision
Detecting Manufacturing Defects with Computer Vision
How to Identify the Right Key Opinion Leaders (KOLs) in Pharma & Healthcare
Automated Barcode Generation System in Manufacturing
The Role of Technology to Save Biodiversity
Enable Storytelling with Power BI & Comicgen Plugin
The Most Effective Method For Selecting Data Science Projects
Low Code Platform To Build Data & AI Products
5 Key Foundations To Build An Effective CX Program
Using Power BI To Improve Media Buying & Ad Performance
Recession Proofing With Data : Webinar
Engage Your Audience With PowerPoint Decks: Webinar
Structure Your Data Science Teams For Best Outcomes
Dawn Of Geospatial AI - Webinar
5 Steps To Become A Data-Driven Organization : Webinar
5 Steps To Measure ROI On Your Data Science Initiatives - Webinar
Saving Lives with Geospatial AI - Pycon Indonesia 2020
Driving Transformation in Industries with Artificial Intelligence (AI)
Data and Storytelling | What Now?
Introduction to Data Storytelling | Rasagy Sharma - Gramener

Recently uploaded (20)

PPTX
oil_refinery_comprehensive_20250804084928 (1).pptx
PPTX
05. PRACTICAL GUIDE TO MICROSOFT EXCEL.pptx
PDF
Lecture1 pattern recognition............
PPTX
Supervised vs unsupervised machine learning algorithms
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PPTX
Global journeys: estimating international migration
PPTX
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
PDF
Mega Projects Data Mega Projects Data
PDF
.pdf is not working space design for the following data for the following dat...
PPTX
Data_Analytics_and_PowerBI_Presentation.pptx
PDF
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
PPT
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
Computer network topology notes for revision
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PPT
Reliability_Chapter_ presentation 1221.5784
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
oil_refinery_comprehensive_20250804084928 (1).pptx
05. PRACTICAL GUIDE TO MICROSOFT EXCEL.pptx
Lecture1 pattern recognition............
Supervised vs unsupervised machine learning algorithms
Miokarditis (Inflamasi pada Otot Jantung)
Global journeys: estimating international migration
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
Mega Projects Data Mega Projects Data
.pdf is not working space design for the following data for the following dat...
Data_Analytics_and_PowerBI_Presentation.pptx
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
IB Computer Science - Internal Assessment.pptx
Computer network topology notes for revision
STUDY DESIGN details- Lt Col Maksud (21).pptx
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
Reliability_Chapter_ presentation 1221.5784
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
iec ppt-1 pptx icmr ppt on rehabilitation.pptx

Storyfying your Data: How to go from Data to Insights to Stories

  • 1. “Story”fying your Data : How to go from Data to Insights to Stories Shravan KumarDSS, Sept 14th, 2020 How to make yourself Indispensable in your Career with Data
  • 2. How a nurse changed the course of a war using data storytelling
  • 3. Nightingale, helped curtail the death rate from a whopping 40% to a mere 2% 3 Created by Florence Nightingale for Queen Victoria during England’s war with France. Visualizes deaths due to: Red: War wounds Black: Other war-related causes Blue: Avoidable hospital diseases
  • 4. 4 INTRODUCTION Shravan Kumar A Director, Client Success “Simplify Data Science for all”100+ Clients Insights as Stories Help start, apply and adopt Data Science @sh_ra_van /shravankumara
  • 5. Introduction to Data Portraits 5
  • 6. How to Create a Data Portrait 6
  • 7. 7Source: McKinsey – COVID-19 Briefing materials COVID-19 Impact on Industries – A Perspective
  • 8. 8Source: McKinsey – COVID-19 Briefing materials COVID-19 Impact on Industries – A Perspective
  • 9. 9 Companies are working to minimize COVID-19 impact and build resilience 1 Source: BCG Covid-19 report, Apr 2, 2020 2 Source: McKinsey - How CDOs can navigate COVID-19 response, Apr 2020 COVID-19 has disrupted every industry. All sectors display an element of fragility and are susceptible to shock.2 Industries at the forefront of the crisis are relying on data to inform their response and rebound strategies. McKinsey1 suggests three waves of data- driven actions that organizations can take: 1. Ensure data teams – and the whole organization remain operational. 2. Lead solutions to prepare for the crisis- triggered challenges. 3. Prepare for the next normal and get ready to execute the plans. The effects of the outbreak aren’t going away quickly. This realization has settled in.
  • 10. 10 DATA SCIENCE: WHAT’S THE VALUE? IT’S A RECESSION. WHY DATA NOW? REALITY CHECK: HOW TO THRIVE?
  • 11. 11 Senior Data ScientistPrincipal AI StorytellerChief Data Wizard FEELING LUCKY? HERE’S A DATA SCIENCE TITLE GENERATOR! Data Statistical ML AI Chief Principal Senior Junior Associate Deputy Assistant Scientist Engineer Analyst Designer Developer Designer Storyteller Ninja Chef Wrangler Evangelist Rock Star Wizard Alchemist Vanity keywords Areas Activities
  • 13. 13 THE JOURNEY FROM DATA TO DECISIONS Data Engineering MaturityPhases Data Science Data as ‘Culture’ Data Collection Data Storage Data Transformation Reporting Insights Consumption Decisions Source: Article – When and how to build out your data science team
  • 14. 14 THE JOURNEY FROM DATA TO DECISIONS Data Engineering Data Science Data Collection Data Storage Data Transformatio n Reporting Insights Consumption MaturityPhases Source: Article – When and how to build out your data science team Data as ‘Culture’ Decisions
  • 15. 15 REPORTING: DESCRIPTIVE SUMMARIES 2019 Boston Chicago Detroit New York Month Price Sales Price Sales Price Sales Price Sales Jan 10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58 Feb 8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76 Mar 13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71 Apr 9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84 May 11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47 Jun 14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04 Jul 6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25 Aug 4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50 Sep 12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56 Oct 7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91 Nov 5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89 Average 9.0 7.50 9.0 7.50 9.0 7.50 9.0 7.50 Variance 10.0 3.75 10.0 3.75 10.0 3.75 10.0 3.75 Revenue numbers from four Cities
  • 16. 16 INSIGHT: PREDICTING TELCO CUSTOMER CHURN Tenure (months) 0 - 12 36+12-36 Data Usage > 1.5 GB 01 YN Bill > $65 0 N Y • Simple Decision-tree model offered ~30% reduction in churn • Advanced black-box models offered ~50%, but with low explainability 0Low Risk 1 High Risk Source: Gramener
  • 17. 17 CONSUMPTION: WHEN ARE PEOPLE BORN IN THE US? Source: https://gramener.com/posters/Birthdays.pdf ..so, conceptions might happen here Very high births.. Love the Valentine’s? Too busy holidaying? Avoid April Fool’s Day? Unlucky 13th? More births Fewer births
  • 18. 18 More births CONSUMPTION: WHAT’S THE BIRTH PATTERN IN INDIA? Source: https://gramener.com/posters/Birthdays.pdf Fewer births Most births in the first half A striking birth pattern seen on the 5th, 10th, 15th, 20th and 25th of each month… Very low births Aug onwards Why? Birthdates are ‘changed’ to aid early school admissions .. this is a typical indication of fraud!
  • 19. This adversely impacts children’s marks It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer. The average marks of children “born” on the 1st, 5th, 10th, 15th etc.. of the month tend to score lower marks. • Are holidays avoided for births? • Which months have a higher propensity for births, and why? • Are there any patterns not found in the US data? Higher marks Lower marks … on average, for children born on a given day of the year (from 2007 to 2013) Children “born” on round numbered days score lower marks on average, due to a higher proportion of younger children
  • 20. Class Xth English Marks Distribution 0 5,000 10,000 15,000 20,000 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
  • 21. Stories have four types of narratives to explain visualizations Remember “SEAR”: Summarize, Explain, Annotate, Recommend 21 0 5,000 10,000 15,000 20,000 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 Marks # students Teachers add marks to stop some students from failing This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. Large number of students score exactly 35 marks Few (but not 0) students fail at 31-34 marks What’s unusual Large number of students score 35 marks. Few (but not 0) students score between 30-35 Only some students get this benefit. Identify a fair policy that will be applied consistently. Summarize the visual in its title Don’t describe the chart. Don’t write the user’s question. Write the answer itself. Like a headline. Explain & interpret the visual How should the user read it? What do you say when you talk through it? Explain what the visual is. Then the axes. Then its contents. Then the inference. Recommend an action How should I act on this? You need to change the audience. (Otherwise, you made no difference.) Annotate essential elements What should the user focus their eyes on? Point it out, or highlight it with colors Interpret what they’re seeing – in words. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin. No one scores 0-4 marks
  • 22. An energy utility detected billing fraud This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the slab boundaries. Below is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a customers with a specific bill amount (in units, or KWh). Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units. So people have a strong incentive to stay at or within a slab boundary. An energy utility (with over 50 million subscribers) had 10 years worth of customer billing data available. Most fraud detection software failed to load the data, and sampled data revealed little or no insight. This can happen in one of two ways. First, people may be monitoring their usage very carefully, and turn off their lights and fans the instant their usage hits the slab boundary. Or, more realistically, there’s probably some level of corruption involved, where customers pay a small sum to the meter reading staff to ensure that it stays exactly at the slab boundary, giving them the advantage of a lower price.
  • 23. CONSUMPTION: DECODING MAHABHARATHA’S RELATIONSHIP Source: https://gramener.com/mahabharatha/
  • 24. INSIGHT + CONSUMPTION: DATA STORIES FROM THE WORLD BANK Source: World Bank storytelling, by Gramener
  • 25. DATA & AI CAN SAVE LIVES TOO The Story of Marikina City, Philippines • Highly urbanized city situated on the river basin of Marikina • Faced with huge flood hazard levels. Better & resilient infrastructure planning needed • How can urban planners plan for better emergency evacuation & rescue? • Can AI be applied to solve this problem? If applied, how can the urban planner understand it?
  • 26. INSIGHT: IDENTIFYING QUALITY OF LIFE FROM SATELLITE IMAGES Source: https://qol.gramener.com/
  • 27. Data stories through Comicgen. An example: COVID-19 data explained by data comics
  • 28. Comic character in a data callout:
  • 29. Samuel L. Jackson Harrison Ford Morgan Freeman Tom Hanks Tom Cruise
  • 30. Insights and storytelling approach. Stage 1 - Identify Business Problem: define the problem statement by understanding: • What is the basic need and desired outcome? • Who will benefit? • What is the impact? • What are the success criteria? Stage 2 - Translate to Data Problem: • Break down the problem statement into multiple use cases • Connect each use case with a data set • Understand any limitations on data sources: internal and external? Stage 3 - Data Answer: target each use case with data through: • EDA and transformation • Modelling • Generating insights Stage 4 - Translate to Business Answer: • Stitch insights from individual use cases to create a story • Connect the data story to help in better decision making • Measure success Roles involved: • Sales Rep • Data Consultant • Account Manager • Solution Lead • Analyst Lead • Data Consultant • Account Manager • Solution Architect • Solution Lead • Analyst Lead • Data Consultant • Data Scientist • Solution Architect • Solution Lead • Data Consultant • Account Manager • Solution Lead
  • 31. In summary, here are the 9 steps to go from data to a data story: (1) Who is your audience? They determine the story. (2) What is their problem? That defines your analysis. (3) Find the right analysis to solve the problem. (4) Filter for big, useful, surprising insights. (5) Start with the takeaway. Summarize your entire story. (6) Add supporting analyses as a tree. (7) Pick a format based on how your audience will consume the story. (8) Pick a visual design based on the takeaway. (9) Annotate to explain & engage. Use four types of narratives.
  • 32. DATA SCIENCE: WHAT’S THE VALUE? IT’S A RECESSION. WHY DATA NOW? REALITY CHECK: HOW TO THRIVE?
  • 33. 1. Most Data Science projects solve the wrong Problem.. Tip #1: Master the application of knowledge
  • 34. AI IS COMING FOR THE DATA SCIENCE JOBS AI and automation will do away with most of the grunt work in the data science workflow today. Applied knowledge will keep you relevant for much longer.
  • 35. Wolbachia blocks dengue, Zika and chikungunya virus transmission
  • 37. Model design: identify where people live, detect buildings, and estimate human population density on 100 m² grids (e.g. 20,000 ppl/km², 15,000 ppl/km²)
  • 38. Site scoping • Set boundary of potential release area • Identify the areas where people live • Map mosquito release points over area with a grid • Organise release area into stages
  • 39. 2. Data Analytics needs a lot more than Data & Analytics.. Tip #2: Learn non-core skills
  • 40. DATA SCIENCE SOLUTION: LET’S TAKE THIS EXAMPLE.. Source: World Bank storytelling, by Gramener
  • 41. ..AND BREAK IT DOWN INTO THE BUILDING BLOCKS Domain: • Business workflow • Influencing factors Design: • User journey • Visuals & aesthetics Analytics: • Impact analytics • Clustering techniques Development: • Frontend/backend coding • Data transformation Project Management: • Piecing it all together • Change management
  • 42. HERE ARE THE 5 ROLES & SKILLS CRITICAL FOR DATA SCIENCE Data Translator (Domain): • Domain expertise • Business analysis • Solutioning ML Engineer (Development): • Software engineering • Front/back-end coding • Data pipelining Information Designer (Design): • Information design • User centered design • Interface/visual design (parts) Data Scientist (Analytics): • Stats & ML • Interpret insights • Scripting skills Data Science Manager (Project Management): • Project management • Business analysis/solutioning • Team handling Comic characters from Gramener Comicgen library
  • 43. 3. Data cleaning takes up a majority of time on projects.. Tip #3: Sharpen ability to handle data
  • 44. “In data science, 80% of the time is spent preparing data, and the other 20% on complaining about preparing the data!” - Kirk Borne
  • 45. 4. Technology goes obsolete faster in Data Science.. Tip #4: Learn new tools quickly
  • 46. WHAT DOES THE DATA TOOLS LANDSCAPE LOOK LIKE? The tool does not matter. A person’s skill with the tool does. Pick up the ability to learn new tools rapidly. Source: https://mattturck.com/data2019/
  • 47. EXAMPLE: WHAT ARE YOUR TOOL OPTIONS TO VISUALIZE DATA? [Chart: tools placed by flexibility and complexity, from plug-n-play to code-based.] Tools shown: Google Data Studio, Excel, Google Sheets, Tableau, Raw, Vismio, Datawrapper, Timeline JS, Polestar, Vega, Vega-lite, d3, matplotlib, C3, Highcharts, Nvd3, Gramex, ggplot, bokeh, Plotly. Choose tools based on flexibility, your background and tool availability
  • 48. Tip #1: Master the application of knowledge Tip #2: Learn non-core skills Tip #3: Sharpen ability to handle data Tip #4: Learn new tools quickly
  • 49. DATA SCIENCE: WHAT’S THE VALUE? IT’S A RECESSION. WHY DATA NOW? REALITY CHECK: HOW TO THRIVE?
  • 50. WHAT DOES THE RECESSION MEAN FOR JOBS IN DATA SCIENCE? Data jobs and specialized professions are relatively less impacted. Industries with the lowest wages and lowest educational attainment are hit the hardest. Source: McKinsey report – Lives and Livelihoods
  • 51. HERE’S WHY DATA IS KEY FOR COVID-19 AND THE RECESSION A. Public Health: • Identifying vulnerability and contact-tracing • Tracking the COVID-19 patient lifecycle • Predicting infection rates and spread (Source: Kinsa Health weather map) B. Enterprises: • Remote workforce & collaboration • Market demand & cash flows • Supply chain & logistics (Source: Gramener – Supply Chain flow) C. Community: • Understand behavioral shifts • Mapping the effectiveness of shutdown • Address people’s concerns during COVID-19 (Source: Gramener – NYC 311 analysis)
  • 52. HOW DO YOU STAY RELEVANT AND GROW IN YOUR CAREER PATH? • Do your own data projects • Read/Write on data science • Maintain a public portfolio • Compete, learn & re-apply Source: Article – How to demonstrate your passion for Data
  • 53. @sh_ra_van /shravankumara Please help me improve the session by answering the feedback survey that will be sent to your email. THANK YOU! GRACIAS! MERCI!