Data-Driven Culture
DATA-DRIVEN and DATA-SCIENCE
Johan Himberg / Reaktor 29.2.2016
Why “data-driven”
Brynjolfsson et al. (2011) on data-driven decision making, based on survey data on the business practices and IT investments of 179 large, publicly traded companies:
Firms that emphasise “data-driven decision making” have output and productivity that is 5-6% higher than what would be expected given their other investments and IT usage.
The relationship also appears in asset utilisation, return on equity and market value.
Data Science in business
Business acumen: what for
Operations Research: optimal decisions and actions
Probability theory: how to handle uncertainties
Analytics: insights and machine learning from data
Computer Science: how to implement all that
Data Science & analytics
BASICS
Some dimensions
1. Business case
2. Analytical task
   1. Active - Passive system
   2. Informative - Operative aim
3. Modelling (model selection and fitting)
4. Data: structure, amount, velocity, and source
REAKTOR / JOHAN HIMBERG
FEBRUARY 2016
Data Science & analytics
BUSINESS CASES
Beware of empty “data-speak”
A quote from my colleague Janne Sinkkonen, from a presentation at a Helsinki University machine learning course:
“Data-speak” hides the processes behind data. 

What creates the data? What is done with the results?
The goal is not “data analysis”
Define your goal and setup without using the word ‘data’.
Information business
Sell audiences: Google, Facebook, media, …
Sell information: credit rating, car register, …
Operations
Create beneficial events
marketing: targeting, cross-sell, up-sell, conversion
find the right product/service to sell or buy, find a good doctor, expert etc.
Avoid non-beneficial events
churn, people leaving, waste, credit loss, fraud, system failures, …
Optimize
customer value, work force, schedules, prices, discounts, stocks, relevancy for customer, production quality, speed
Rationalise
process efficiency, lead times, handling complexity, search time, …
Understand: customer & product base, transactions, or processes
internally: ERP, CRM, HR, sales systems, production, …
externally: location, routes, weather, demographics, estates, …
Strategic
Efficiency and competition
  React faster, streamlined decision making, risk awareness
  Financial efficiency
  Innovations
Well-informed strategic decisions
  Understanding customer groups’ needs for product and service development
  Understanding and predicting world events, economics, demographics, …
  React to market fluctuation or changes in financial environment
Internal and external image and culture
  Transparency, learning as a part of company culture
  Customer satisfaction, personalisation, brand
Example: Netflix
"The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content."
- Xavier Amatriain and Justin Basilico, Personalization Science and Engineering, 2012
Data Science & analytics
TASKS & RISKS
Informative - Operative
Informative (for understanding)
Analysis results that help people understand things and support management in making decisions: reports, predictions, what-if analyses, simulations, visualisations, …
Operative
An automated system that makes decisions based on rules or models, or results that are acted on directly even when the process is not automated.
Active - Passive
Active
You make an “intervention” and gather evidence in tests designed to reveal an effect.
Example: A/B testing.
Passive
Data is just collected, captured “as it happens”: customer transactions, sales, web-browsing, tweets, …
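As a rough illustration of the active approach, the outcome of an A/B test can be checked with a two-proportion z-test. This is only one common way to evaluate such a test, not a method taken from the lecture; the function name `ab_test` and all visitor and conversion counts below are made-up assumptions.

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test for conversion rates (illustrative sketch)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value

# Hypothetical counts: control (A) versus intervention (B).
p_a, p_b, z, p_value = ab_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"A: {p_a:.3f}  B: {p_b:.3f}  z = {z:.2f}  p = {p_value:.4f}")
```

The point of the intervention is that the two groups differ only in the treatment, so a small p-value is evidence of an effect rather than of a mere correlation in passively collected data.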
Use cases
Axes of the chart: Passive - Active and Informative - Operative.
Descriptive (What has happened?): customer profiles, customer segmentation, shopping cart analysis
Diagnostic (Why did it happen?): marketing impact analysis, price elasticity analysis, web design testing
Predictive (What will happen?): up-sell/cross-sell, new customer acquisition, churn prediction, life-time value prediction, demography prediction
Prescriptive (What should I do?): marketing impact optimisation, recommendation system in a dynamic environment
Data Science & analytics
RISKS & PROBLEMS
Issues by analytics use case
Axes of the chart: Passive - Active and Informative - Operative.
Descriptive
• isolated / ad hoc reports
• isolated ad hoc decisions
• feedback loop (report - decision - effect)
• ignoring statistics
• analysts as SQL monkeys
• UI / visualization
Diagnostic
• statistical skills
• testing and organisation
• correlation vs. causality
• requires lots of communication
Predictive
• what to predict: how to quantify the target
• access to historical data
• quantifying and understanding the risk(s)
• validating prediction accuracy for the future
Prescriptive
• what to optimize?
• complex software system
• technical feedback loop
• co-operation between “human” and “artificial intelligence”
• monitoring
Examples…
• Focusing on wrong things
  • not recognising the analytics use cases
  • “data first”: long time from investment to benefits
  • not starting from the beef: actions and decisions
  • thinking only of IT solutions and products
  • careful examination and validation of the algorithms, but not setting targets and risks according to the business target
• Organisation
  • silos: communication through hierarchy
  • no access to data, internal politics
  • technical details decided by business people
  • business criteria set by technical people
…more examples
• Underestimating complexity (time & scope)
  • both software and analytics have to be built simultaneously
  • the time and effort needed for “data wrangling”
  • the time used for UIs and visualisations
  • the feedback loop
• Unrealistic expectations (quality)
  • of analytical systems in general (they are not that intelligent); rules are needed
  • that a product, a model, an algorithm, or a data scientist solves all the problems
  • risks and targets cannot always be defined properly right away
  • there is no guarantee of accuracy in a particular case before trying
Culture that helps to handle risk
WISE - DETERMINED - CURIOUS
Wise: Solve the right problems with analytics!
Determined: aim at specific, concrete things
Curious: be ready to divert, seek evidence
Bayesian: understand uncertainties and risks
Truthful: don’t bend results to fit wishes, it’s data science
Courageous: act on evidence
Active and Agile: test, don’t just observe; inspect - adapt - learn
Transparent and Helpful: co-operate from end-to-end, don’t silo
Aim at the right things
Netflix Prize competition (2006-2009)
Who gets the best (lowest) RMSE (root mean squared error) on true user ratings?
BUT
“The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content. We therefore optimize our algorithms to give the highest scores to titles that a member is most likely to play and enjoy.” … “Netflix Prize objective ... is just one of the many components of an effective recommendation system ... We also need to take into account factors such as context, title popularity … Supporting all the different contexts in which we want to make recommendations requires a range of algorithms that are tuned to the needs of those contexts.”
- Xavier Amatriain and Justin Basilico, Personalization Science and Engineering, 2012
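The gap between a low RMSE and a good ordering can be seen in a toy example. The sketch below is not taken from the lecture or from Netflix; the ratings and both sets of predictions are made-up numbers, and `rmse` and `ranking` are hypothetical helper names.

```python
import math

def rmse(true, pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))

def ranking(scores):
    """Indices of items ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Made-up true ratings of four titles by one member.
true_ratings = [5.0, 4.0, 3.0, 2.0]
# Model A: small errors overall, but it swaps the two best titles.
model_a = [4.0, 4.2, 3.0, 2.0]
# Model B: every rating is one star too low, but the order is right.
model_b = [4.0, 3.0, 2.0, 1.0]

for name, pred in [("A", model_a), ("B", model_b)]:
    same_order = ranking(pred) == ranking(true_ratings)
    print(name, "RMSE:", round(rmse(true_ratings, pred), 2), "correct ordering:", same_order)
```

Model A wins on RMSE while model B produces the ordering a member would actually want, which is why the metric being optimised has to follow from the business aim.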
Curiosity
Always aim at something specific … but be open-minded and curious
Example: Röntgen and Fleming (Nobel laureates)
• their most famous findings were “accidental”, but
• they were skilled scientists doing disciplined research for some other aim
Explore occasionally “from data to insights”, but not aimlessly.
If you find something interesting, make a disciplined analysis, preferably a test.
Culture that helps to handle risk
BAYESIAN - TRUTHFUL
Understanding probabilities
The main ingredients of data science!
Making decisions based on data analysis requires the concepts of risk and probability.
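One way to make risk and probability concrete is a simple Bayesian update of a response rate. The sketch below assumes a Beta-Binomial model with a uniform prior, which is a modelling choice made here for illustration and not something stated in the lecture; the function name `posterior_summary`, the campaign counts, and the 5% break-even threshold are all hypothetical.

```python
import random

def posterior_summary(successes, trials, threshold,
                      prior_a=1.0, prior_b=1.0, draws=20000, seed=0):
    """Beta-Binomial update with a Beta(prior_a, prior_b) prior (illustrative sketch)."""
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    rng = random.Random(seed)
    # Monte Carlo estimate of P(rate > threshold) under the posterior.
    above = sum(rng.betavariate(a, b) > threshold for _ in range(draws))
    return mean, above / draws

# Hypothetical campaign: 30 responses to 400 offers, break-even at a 5% response rate.
mean, p_profitable = posterior_summary(successes=30, trials=400, threshold=0.05)
print(f"posterior mean rate: {mean:.3f}, P(rate > 5%): {p_profitable:.2f}")
```

The decision then rests on a stated probability of being above break-even instead of on a single point estimate.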
Culture that helps to handle risk
COURAGE
Courage
“Data driven means that progress in an activity is compelled by data rather than by intuition or personal experience. It is often labeled as the business jargon for what scientists call evidence based decision making.”
- Wikipedia 2016-02-24

“I take risks, sometimes patients die. But not taking risks causes more patients to die, so I guess my biggest problem is I've been cursed with the ability to do the math.”
- Fictional character Dr. House in Fox television series “House”
Culture that helps to handle risk
HELPFUL - TRANSPARENT - AGILE
Agile - Transparent
Doing data-driven work and data science in any organisation model boils down to:
“Involve everyone along the information path”
Agile development - Team decides details
Start from
• concrete actions that can be optimized
• decisions they require, and
• how to measure the effects properly
Remember the feedback loop!
Develop constantly
Lecture @AaltoBIZ, Johan Himberg, 2015
Think & plan from deployment to data
Pick an aim!
Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5
Action (optimize, decide, deploy), for example:
• automatised decisions; recommendation, targeting
• simulation
• prescriptive, predictive modelling
Information (report, visualize, model), for example:
• documentation on meaning of the data
• KPIs, profiles, segments, factors, DW dashboards
• descriptive, diagnostic, predictive modelling
Data (big, small, open, local, web, meta, …), for example:
• source integrations
• Extract - Load - Transform
• metadata
• modelling for cleansing & consistency
Labels on the diagram: modelling; what are the actions; what are the insights; wrangling; what data means; testing; what is the impact
Data-Driven is inherently iterative and benefits from agility.
Data and processes are often not as assumed.
Be curious, keep a backlog, inspect, adapt.
Business drivers: aim 1, aim 2 (start from here!), aim 3, aim 4, aim 5
Action, for example:
• Business: need optimising for customer retention
• Marketing: we could start with a special offer by SMS
• Data Scientist: we’ll set up test & control groups!
Information, for example:
• Solution expert: Field ZPOR means revenue per unit and it is calculated based on …
• Customer transactions are not in the Data Warehouse, they’re aggregated on monthly level - Let’s get daily data from system Z
Data, for example:
• Now we have transactions for 1M users for 1 yr, fields a, b, c, d, e …
• …
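Setting up test and control groups usually comes down to random assignment. The following is a minimal sketch under that assumption and is not the procedure from the lecture; `split_test_control`, the 20% test fraction, and the customer ids are hypothetical.

```python
import random

def split_test_control(customer_ids, test_fraction=0.5, seed=42):
    """Randomly split customers into a test group and a control group."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical customer ids; only the test group would get the SMS offer.
test_group, control_group = split_test_control(range(1000), test_fraction=0.2)
print(len(test_group), "customers in test,", len(control_group), "in control")
```

Recording which customer landed in which group is part of the data that has to flow back for the impact to be measured.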
THE LOOP: results
Execute based on model, collect data
Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5
Action, for example:
• deploy campaign, collect responses
Information, for example:
• calibrate & apply model
Data, for example:
• get data for modelling
• store results
Information path focused backlog
Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5
Action, backlog example:
• test & control group handling in marketing automation
• involve N.N. in the process
Information, backlog example:
• define new information source
• look for a new data source for determining income on zip code areas
• correct documentation
• automatization for the campaign modelling
Data, backlog example:
• better system configuration & architecture
• automatization for the campaign process…
• new data: record information on all campaigns
Don’t silo
• A change of culture: information (not data) is everybody’s business, just as money is
• One data scientist can’t excel at all of this:
  • PO / Technical Account Manager
  • Business specialist
  • Solution owner / process owner
  • Data Steward
  • Developer
  • Visualization / UX expert
Data Scientists’ special role
• Data scientists’ main tasks are in methods, but also in the processes and machinery of
  • making evidence-based decisions (automated if possible)
  • finding out the confidence in the outcome (by active tests if possible)
  • getting insights based on models and data
• Data scientists often act as “glue”.
Culture that helps to handle risk
TECHNOLOGY
Technology
• Different analytical tasks need different tools, so one has to integrate different systems. Remember that you need a feedback loop!
• Prefer systems
  • that give mass access to historical, transactional data on the individual level instead of just aggregates (avoid being “blinded by averages”)
  • from which you can get the data, transformations, and results out to another system (avoid being a “data hostage”)
  • where you can see what the analytics actually does, at least on a modular level (avoid being a “method hostage”). Prefer being able to see the actual implementation (open source).
• Pick a product when you know the task, your needs, and the product quality.
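What being “blinded by averages” can look like is easy to show with two made-up customer segments. The numbers below are purely hypothetical and only serve to illustrate why individual-level data matters.

```python
import statistics

# Hypothetical monthly spend per customer in two segments (made-up numbers).
segment_a = [50, 50, 50, 50, 50, 50, 50, 50, 50, 50]
segment_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 500]

for name, spend in [("A", segment_a), ("B", segment_b)]:
    zero_share = sum(s == 0 for s in spend) / len(spend)
    print(name, "mean:", statistics.mean(spend),
          "median:", statistics.median(spend),
          "share with zero spend:", zero_share)
```

Both segments report the same average spend, yet in segment B a single customer accounts for all of it, a difference that only individual-level data reveals.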
References
• Brynjolfsson, Erik and Hitt, Lorin M. and Kim, Heekyung Hellen, Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? (April 22, 2011). Available at SSRN: http://ssrn.com/abstract=1819486 or http://dx.doi.org/10.2139/ssrn.1819486
• Netflix case: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
• Big Data landscape: http://mattturck.com/2016/02/01/big-data-landscape/#more-917
• Data science skills
  • http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
  • http://www.oralytics.com/2013/03/type-i-and-type-ii-data-scientists.html
www.reaktor.com

Lecture on Data Science in a Data-Driven Culture

  • 1. Data-Driven Culture DATA-DRIVEN and DATA-SCIENCE Johan Himberg / Reaktor 29.2.2016
  • 2. survey data on the business practices and IT investments of 179 large, publicly traded companies Firms that emphasise “data driven decision making” have output and productivity that is 5-6% higher than what would be expected given other investments and IT usage. relationship also appears in asset utilisation, return on equity and market value Why “data-driven” WHY 2 Brynjolfson et al (2011) on Data-Driven
  • 3. Business acumen what for Operations Research optimal decisions and actions Probability theory how to handle uncertainties Analytics insights and machine learning from data Computer Science how to implement all that Data Science in business WHY 3
  • 4. Data Science & analytics BASICS
  • 5. BASICS 5 Some dimensions 1. Business case 2. Analytical task 1. Active - Passive system 2. Informative - Operative aim 3. Modelling (model selection and fitting) 4. Data: structure, amount, velocity, and source REAKTOR / JOHAN HIMBERG FEBRUARY 2016
  • 6. Data Science & analytics BUSINESS CASES
  • 7. SECTION TITLE 7 Beware of empty “data-speak” A quote from my colleague Janne Sinkkonen from a presentation at Helsinki University Machine learning course: “Data-speak” hides the processes behind data. 
 What creates the data? What is done with the results? The goal is not “data analysis” Define your goal and setup without using the word ‘data’. REAKTOR 2016
  • 8. Sell audiences Google, Facebook, media, … Sell information credit rating, car register,… Information business BUSINESS CASE 8
  • 9. Operations BUSINESS CASE 9 Create beneficial events marketing: targeting, cross-sell, up-sell, conversion find right product/service to sell or buy, find a good doctor, expert etc. Avoid non-beneficial events churn, people leaving, waste, credit loss, fraud, … system failures, … Optimize customer value, work force, schedules, prices, discounts, stocks, relevancy for customer, production quality, speed Rationalise process efficiency, lead times, handle complexity, search time …  Understand: customer & product base, transactions, or processes  internally: ERP, CRM, HR, sales systems, production, … externally: location, routes, weather, demographics, estates, …
  • 10. Efficiency and competition React faster, streamlined decision making, risk awareness Financial efficiency Innovations Well-informed strategic decisions Understanding customer groups’ needs for product and service development Understanding and predicting world events, economics, demographics, …. React to market fluctuation or changes in financial environment Internal and external image and culture Transparency, learning as a part of company culture Customer satisfaction, personalisation, brand Strategic BUSINESS CASE 10
  • 11. Netflix "The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content. - 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering Example VIRTUES 11
  • 12. Data Science & analytics TASKS & RISKS
  • 13. BASICS 13 Some dimensions 1. Business case 2. Analytical task 1. Active - Passive system 2. Informative - Operative aim 3. Modelling (model selection and fitting) 4. Data: structure, amount, velocity, and source REAKTOR / JOHAN HIMBERG FEBRUARY 2016
  • 14. BASICS 14 Informative - Operative Informative (for understanding) Analysis results for understanding things, results for management for making decisions: reports, predictions, what-if analyses, simulations, visualisations,… Operative Automated system that makes decisions based on some rules or models, or results that are directly operative, if not automated. REAKTOR / JOHAN HIMBERG FEBRUARY 2016
  • 15. BASICS 15 Active - Passive Active You make an “intervention” and gather evidence in tests designed to reveal an effect. Example: A/B testing. Passive Data is just collected, captured “as it happens”: customer transactions, sales, web-browsing, tweets REAKTOR / JOHAN HIMBERG FEBRUARY 2016
  • 16. BASICS 16 Use cases REAKTOR 2016 Descriptive What has happened? Diagnostic Why did it happen? Passive Active Customer profiles Customer segmentation Shopping cart analysis Predictive What will happen? Prescriptive What should I do? Informative Operative Marketing impact analysis Price elasticity analysis Web design testing Up-sell/cross-sell New customer acquisition Churn prediction Life-time value prediction Demography prediction Marketing impact optimisation Recommendation system in a dynamic environment
  • 17. Data Science & analytics RISKS & PROBLEMS
  • 18. RISKS / PROBLEMS 18 Issues by analytics use case REAKTOR 2016 Descriptive • isolated / ad hoc reports • isolated ad hoc decisions • feedback loop (report - decision - effect) • ignoring statistics • analysts as sql-monkeys • UI / visualization Diagnostic • statistical skills • testing and organisation • correlation vs. causality • requires lots of communication Passive Active Predictive • what to predict: how to quantify the target • access to historical data • quantifying and understanding the risk(s) • prediction accuracy validation for future Prescriptive • what to optimize? • complex software system • technical feedback loop • co-op between “human” and “artificial intelligence” • monitoring Informative Operative
  • 19. •Focusing on wrong things •not recognising the analytics use cases •“data first”: long time from investment to benefits •not starting from the beef: actions and decisions •thinking only IT solutions and products •careful examination and validation of the algorithms, but not setting targets and risks according to the business target •Organisation •silos: communication through hierarchy •no access to data, internal politics •technical details decided by business people •business criteria set by technical people Examples… RISKS / PROBLEMS 19
  • 20. •Underestimating complexity (time & scope) •both software and analytics to be build simultaneously •the time and effort needed with “data wrangling” •the time used for UIs and visualisations •the feedback loop •Unrealistic expectations (quality) •on analytical systems in general (they are not that intelligent); rules needed •a product, a model, an algorithm, a data scientist solves all the problems •risks and targets cannot always be defined properly right away •there is no guarantee on accuracy on a particular case before trying …more examples RISKS / PROBLEMS 20
  • 21. Culture that helps to handle risk WISE - DETERMINED - CURIOUS
  • 22. Wise: Solve the right problems with analytics! Determined: aim at specific, concrete things Curious: be ready to divert, seek for evidence Bayesian: understand uncertainties and risks Truthful: don’t bend results upon wishes, it’s data science Courageous: act on evidence Active and Agile: test, don’t just observe; inspect - adapt - learn Transparent and Helpful: co-operate from end-to-end, don’t silo Culture that helps to handle risk VIRTUES 22
  • 23. Culture that helps to handle risk WISE - DETERMINED - CURIOUS
  • 24. Netflix prize competition (2006-2008) Who gets the best RMSE (root mean squared error) on true user likings? BUT "The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content. We therefore optimize our algorithms to give the highest scores to titles that a member is most likely to play and enjoy.”---Netflix Prize objective... is just one of the many components of an effective recommendation system... We also need to take into account factors such as context, title popularity… Supporting all the different contexts in which we want to make recommendations requires a range of algorithms that are tuned to the needs of those contexts.” - 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering Aim at the right things VIRTUES 24
  • 25. Always aim at something specific … but be open-minded and curious Example: Röntgen and Fleming (Nobel laureates) • their most famous findings were “accidental”, but • they were skilled scientists doing disciplined research for some other aim Explore occasionally “from data to insights”. But not aimlessly.  If you find something interesting, make a disciplined analysis, preferably a test. Curiosity VIRTUES 25
  • 26. Culture that helps to handle risk BAYESIAN - TRUTHFUL
  • 27. The main ingredients of data science! Making decisions based on data analysis requires the concepts of risk and probability. Understanding probabilities VIRTUES 27
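One way to read "decisions require risk and probability" is as an expected-cost comparison. The sketch below is only an illustration under assumed numbers: the churn probability, the costs, the effect of the offer, and the function name `expected_cost` are all hypothetical.

```python
def expected_cost(p_event, cost_if_event, cost_of_action=0.0, risk_reduction=0.0):
    """Expected cost when an unwanted event has probability p_event.

    Acting costs cost_of_action and is assumed to lower the event
    probability by risk_reduction (an absolute amount).
    """
    p = max(0.0, p_event - risk_reduction)
    return cost_of_action + p * cost_if_event

# Hypothetical case: a customer churns with probability 0.2 and churn costs 500.
do_nothing = expected_cost(p_event=0.2, cost_if_event=500)

# Assumed effect of a retention offer: costs 20, lowers churn probability by 0.1.
send_offer = expected_cost(p_event=0.2, cost_if_event=500,
                           cost_of_action=20, risk_reduction=0.1)

decision = "send offer" if send_offer < do_nothing else "do nothing"
print(do_nothing, send_offer, decision)
```

In practice the probabilities are themselves uncertain estimates, so the comparison should be repeated over a plausible range of values rather than trusted at a single point.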
  • 28. Culture that helps to handle risk COURAGE
  • 29. Courage “Data driven means that progress in an activity is compelled by data rather than by intuition or personal experience. It is often labeled as the business jargon for what scientists call evidence based decision making.” - Wikipedia, 2016-02-24 “I take risks, sometimes patients die. But not taking risks causes more patients to die, so I guess my biggest problem is I've been cursed with the ability to do the math.” - Fictional character Dr. House in the Fox television series “House”
  • 30. Culture that helps to handle risk HELPFUL - TRANSPARENT - AGILE
  • 31. Agile - Transparent. Doing data-driven work and data science in any organisation model boils down to: “Involve everyone along the information path”. Agile development - the team decides the details. Start from • concrete actions that can be optimized • the decisions they require, and • how to measure the effects properly. Remember the feedback loop! Develop constantly. Lecture @AaltoBIZ, Johan Himberg, 2015
  • 32. Action: optimize, decide, deploy. Data: big, small, open, local, web, meta, … Information: report, visualize, model. Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5. For example • Automated decisions: recommendation, targeting • Simulation • Prescriptive, predictive modelling. For example • Documentation on the meaning of the data • KPIs, profiles, segments, factors, DW dashboards • Descriptive, diagnostic, predictive modelling. For example • Source integrations • Extract - Load - Transform • Metadata • Modelling for cleansing & consistency. Diagram labels: modelling; what are the actions; what are the insights; wrangling; what data means; testing; what is the impact. Think & plan from deployment to data. Pick an aim!
  • 33. Action, Data, Information. Business drivers: aim 1, start from here!, aim 3, aim 4, aim 5. For example • Business: we need to optimise customer retention • Marketing: we could start with a special offer by SMS • Data Scientist: we’ll set up test & control groups! For example • Solution expert: field ZPOR means revenue per unit and it is calculated based on … • Customer transactions are not in the Data Warehouse, they’re aggregated on a monthly level - let’s get daily data from system Z. For example • Now we have transactions for 1M users for 1 yr, fields a, b, c, d, e … • … Diagram labels: modelling; what are the actions; what are the insights; wrangling; what data means; testing; what is the impact. Data-driven work is inherently iterative and benefits from agility. Data and processes are often not as assumed. Be curious, keep a backlog, inspect, adapt.
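The "test & control groups" step mentioned on this slide can be sketched as a random split followed by a comparison of response rates. This is a minimal sketch under stated assumptions: the function names, the 20% control share, the fixed seed, and the response data are all hypothetical.

```python
import random

def split_test_control(customer_ids, control_share=0.2, seed=42):
    """Randomly split customers into a test group (gets the offer)
    and a control group (does not)."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    n_control = round(len(shuffled) * control_share)
    return shuffled[n_control:], shuffled[:n_control]

def uplift(test_responses, control_responses):
    """Difference in response rates; responses are 0/1 outcomes."""
    test_rate = sum(test_responses) / len(test_responses)
    control_rate = sum(control_responses) / len(control_responses)
    return test_rate - control_rate

test_group, control_group = split_test_control(range(1000))
print(len(test_group), len(control_group))

# Made-up outcomes: 1 = customer stayed, 0 = customer left.
print(uplift([1, 1, 0, 1], [1, 0, 0, 1]))
```

The raw difference is only a point estimate. With real campaign data one would also want an interval or a significance test before acting on it.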
  • 34. Action, Data, Information. Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5. For example • deploy campaign, collect responses. For example • calibrate & apply model. For example • get data for modelling • store results. Diagram labels: modelling; what are the actions; what are the insights; wrangling; what data means; testing; what is the impact. Execute based on model, collect data. THE LOOP: results
  • 35. Action, Data, Information. Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5. Backlog example • test & control group handling in marketing automation • involve N.N. in the process. Backlog example • define a new information source • look for a new data source for determining income by zip code area • correct the documentation • automation of the campaign modelling. Backlog example • better system configuration & architecture • automation of the campaign process … • new data: record information on all campaigns. Diagram labels: modelling; what are the actions; what are the insights; wrangling; what data means; testing; what is the impact. Information path focused backlog
  • 36. Don’t silo • A change of culture: information (not data) is everybody’s business, just as money is • One data scientist can’t excel at all of this: • PO / Technical Account Manager • Business specialist • Solution owner / process owner • Data Steward • Developer • Visualization / UX expert
  • 37. Data Scientists’ special role • Data scientists’ main tasks are in methods, but also in the processes and machinery of • making evidence-based decisions (automated if possible) • finding out the confidence in the outcome (by active tests if possible) • getting insights based on models and data • Data scientists often act as “glue”.
  • 38. Culture that helps to handle risk TECHNOLOGY
  • 39. Technology • Different analytical tasks need different tools. One has to integrate different systems. Remember that you need a feedback loop! • Prefer systems • that give mass access to historical, transactional data on the individual level instead of just aggregates (avoid being “blinded by averages”) • from which you can get the data, transformations, and results out to another system (avoid being a “data hostage”) • where you can see what the analytics actually does, at least on a modular level (avoid being a “method hostage”). Prefer being able to see the actual implementation (open source) • Pick a product when you know the task, your needs, and the product quality.
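A small illustration of being “blinded by averages”: two customer segments with the same average spend can look very different at the individual level. The segment names and numbers are made up for the example.

```python
from statistics import mean

# Hypothetical monthly spend per customer in two segments (made-up numbers).
steady_segment = [50, 50, 50, 50]
polarised_segment = [0, 0, 100, 100]

def share_inactive(spend):
    """Share of customers with zero spend."""
    return sum(1 for s in spend if s == 0) / len(spend)

for name, spend in [("steady", steady_segment), ("polarised", polarised_segment)]:
    # Same mean in both segments, but very different shares of inactive customers.
    print(name, mean(spend), share_inactive(spend))
```

An aggregate-only system would report the same average for both segments, while individual-level data shows that half of the second segment is not buying at all.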
  • 40. References • Brynjolfsson, Erik, Hitt, Lorin M., and Kim, Heekyung Hellen, Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? (April 22, 2011). Available at SSRN: http://ssrn.com/abstract=1819486 or http://dx.doi.org/10.2139/ssrn.1819486 • Netflix case: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html • Big Data landscape: http://mattturck.com/2016/02/01/big-data-landscape/#more-917 • Data science skills • http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html • http://www.oralytics.com/2013/03/type-i-and-type-ii-data-scientists.html