The Value of Digital Technologies
Big Data
Sofia, 23 March 2018
Severino Meregalli
Scientific Coordinator – DEVO Lab
SDA Bocconi
THE BUSINESS CONTEXT: WHY DATA EXPLOITATION IS SO IMPORTANT
• Dynamism and complexity as structural elements
• Fuzzy business scenarios
• Complexity management and profit linkage
• The fall of management as a science and of prescriptive management
• The fall of the “legendary” long-term strategic planning as an antidote to complexity
• The “evergreen” gap between Business Requirements and Information Systems
• Desperate search for sources of insight and knowledge
THE (BIG) DATA LANDSCAPE
• Generating value from data and analytics is one of the pillars of competitive advantage
• Decision-making in complex and dynamic organizations calls for a full exploitation of data
resources
• Progressive digitalization of businesses vs skills needed to take advantage of large and
complex datasets
• Big Data, Data Discovery and Analytics have suffered all the negative effects of hype and
of the rise of improvised players
• Wide range of high performing technologies and players
• Cost/benefit leverage calls for a deep understanding of the real opportunities and hurdles in
Data exploitation
DATA EXPLOSION VS ABILITY TO EXECUTE
• There will be a shortage of talent necessary for organizations to take advantage of big data.
By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with
deep analytical skills as well as 1.5 million managers and analysts with the know-how to use
the analysis of big data to make effective decisions.
• Organizations need not only to put the right talent and technology in place but also
structure workflows and incentives to optimize the use of big data.
McKinsey Global Institute 2011
THE MEDIA HYPE
A CROWDED MARKET
HYPE… AS USUAL
Gartner Hype Cycle for Emerging Technologies, 2014
THE CALL FOR A MANAGERIAL APPROACH (DEVO LAB SDA BOCCONI)
Value
Shortcut
THE ISSUE
After the first wave of technology adoption for
managing and analyzing large datasets, both
the academic and the practitioner communities
acknowledged the risks of (another) «hype-driven»
approach
THREE KEY TOPICS
Physical vs Social
Data quality
Context
PHYSICAL VS. SOCIAL PHENOMENA
• It is relatively less complex to get significant results when the analysis focuses on deterministic
phenomena (Natural Sciences) than when it focuses on social ones (Social Sciences)
• In Natural Sciences it is possible to explain or understand a phenomenon by observing a singularity (e.g.
a star with an odd orbit); the same does not apply to Social Sciences (e.g. trendsetter vs crazy
behavior)
• Predictive analysis, as well as the mere understanding of phenomena affected by social variables, is
still hard to get right, even when companies have large amounts of data
and computing power
• The paradox is that in the digital world it is sometimes easier to influence behaviors than to
understand them
• The short-term economic value is proportional to the difficulty of the task: higher in Social Sciences,
lower in Natural Sciences
DATA QUALITY MANAGEMENT
• The stratification of large amounts of data, with different formats and different scopes, emphasizes the old
but evergreen concept of “garbage in, garbage out”
• Big Data tools and technologies have not yet solved this problem and, in some cases, have
amplified it by bringing in data from sources that are out of the organization's control (e.g. social networks)
• A “Data Quality” attitude is a precondition for initiating a virtuous cycle of data value exploitation
• Technology is here to help, but we still have issues:
– uniqueness (single source of truth)
– accountability for data quality (not IT)
– consistency of goals between those who produce and those who analyse the data
– availability of consistent and shared information about data (metadata)
– legal issues
UNDERSTANDING DOMAIN, CONTEXT AND DECODING RESULTS
• The breadth and variety of datasets allow analysts to find numerous correlations between variables
that cannot be found in small datasets
• The issue is to understand which correlations are meaningful, since the more
variables there are, the more correlations can show significance
• Context is hard to interpret at scale and even harder to maintain when data are reduced to fit into a
model. Obtaining and managing context data will be a challenge.
The more variables, the more correlations
that can show significance. Falsity also grows
faster than information; it is nonlinear
(convex) with respect to data
N. Taleb - Professor of risk engineering at New York
University’s Polytechnic Institute.
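The point in the quote can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not a method taken from the talk: every variable is independent random noise, so any pair that crosses the threshold is a false finding by construction. The function names, the sample size of 50 and the threshold of 0.28 (roughly the two-sided 5% critical value of Pearson's r for 50 observations) are choices made for illustration only.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def count_spurious(n_vars, n_obs=50, threshold=0.28, seed=0):
    """Count variable pairs whose |r| exceeds the threshold in pure noise.

    Returns (hits, pairs): the number of pairs above the threshold and
    the total number of pairs examined.
    """
    rng = random.Random(seed)
    # Each variable is independent Gaussian noise (illustrative assumption).
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    hits = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(pearson(data[i], data[j])) > threshold:
                hits += 1
    pairs = n_vars * (n_vars - 1) // 2
    return hits, pairs


for n_vars in (5, 10, 20, 40):
    hits, pairs = count_spurious(n_vars)
    print(f"{n_vars:>3} variables: {hits:>3} of {pairs:>3} pairs exceed the threshold")
```

The number of pairs grows quadratically with the number of variables, so chance hits pile up much faster than the variables themselves. This is one way to read the "convex" growth of falsity mentioned in the quote.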
THE MORE VARIABLES, THE MORE CORRELATIONS THAT CAN SHOW SIGNIFICANCE…
UNDERSTANDING DOMAIN, CONTEXT AND DECODING RESULTS
• Contextual data are scarce and very often unavailable or inconsistent
with the needs
• Each application domain requires the involvement of experts who know it from the inside.
A statistical “brute force” approach does not work well in Social Sciences
• The issue is to find the sweet spot between “obvious” and “false” findings
THREE PILLARS FOR DATA VALUE
• Differentiate between physical and social phenomena
• Measure the “quality” of available data
– Accuracy
– Reliability
– Completeness
– Consistency
– Timeliness
• Consider the availability of domain experts / knowledge when dealing with social
phenomena
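Two of the quality dimensions named in this slide, completeness and timeliness, are easy to measure mechanically. The sketch below is only one possible operationalization: the slide gives no formulas, so the definitions, the function names, the field names (`customer_id`, `segment`, `updated_at`) and the sample records are all hypothetical.

```python
from datetime import datetime, timedelta


def completeness(records, required_fields):
    """Share of required fields that are present and non-empty."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for rec in records
        for field in required_fields
        if rec.get(field) not in (None, "")
    )
    return filled / total


def timeliness(records, now, max_age, ts_field="updated_at"):
    """Share of records whose timestamp is no older than max_age."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for rec in records
        if rec.get(ts_field) is not None and now - rec[ts_field] <= max_age
    )
    return fresh / len(records)


# Hypothetical sample records, for illustration only.
now = datetime(2018, 3, 23)
records = [
    {"customer_id": 1, "segment": "B2C", "updated_at": datetime(2018, 3, 20)},
    {"customer_id": 2, "segment": "", "updated_at": datetime(2017, 1, 5)},
    {"customer_id": 3, "segment": "B2B", "updated_at": None},
]

print(completeness(records, ["customer_id", "segment"]))    # 5 of 6 fields filled
print(timeliness(records, now, max_age=timedelta(days=30)))  # 1 of 3 records fresh
```

Accuracy, reliability and consistency usually need a reference source or domain rules to check against, which is why they are left out of this sketch.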
THE ANALYSIS MODEL
[Chart: axes Data Quality (Low to High) and Level of Determinism (Low to High); content: Value, with Social phenomena and Physical phenomena marked]
POSSIBLE PATHS
[Chart: possible paths across Data Quality (Low to High) and Level of Determinism (Low to High), with Value]
VOLVO CAR CORPORATION CASE HISTORY
The Company
• Global leader in the automotive industry
• Acquired by Geely Auto Group in 2010
• Focus on quality and safety: our vision is to design cars that should not crash. In the shorter perspective
the aim is that by 2020 no-one should be killed or injured in a Volvo car.
Scope of Work
• Improving the quality of data collected from dealers, engineering, production and diagnostic systems (DRO)
• Building a unified repository of integrated data
Achievements
• Problem identification and prioritization of maintenance activities
• Solving quality problems during the production processes
• Accuracy of warranty program management
• Predictive analysis of potential failures
The Needs
• Analyze the mechanical performance of vehicles in real driving conditions in order to improve design,
production and after-sales service (warranty) processes
VOLVO CAR CORPORATION CASE HISTORY
[Chart: Data Quality (Low to High) vs. Level of Determinism (Low to High); value legend: Full, Partial, Null]
SCE SMART CONNECT CASE HISTORY
The Company
• Southern California Edison is the largest subsidiary of Edison International
• For over a century, the company has provided electricity to about 14 million customers in
Central, Coastal and Southern California
Scope of Work
• Acquisition of data from smart meters (720 readings per month per customer, about 5.6 billion readings
per month in total)
• Integration of smart meter data with expenses and demographic information
Achievements
• Improvement in production and distribution flow management
• Peak usage prediction
The Needs
• Provide customers with weekly reporting of energy consumption, so that they can keep
expenses under control
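The smart meter figures quoted in this case can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: the two constants come from the slide, while the 30-day month and the one-stream-per-meter reading are assumptions made here.

```python
READINGS_PER_CUSTOMER_PER_MONTH = 720   # from the slide
TOTAL_READINGS_PER_MONTH = 5.6e9        # from the slide ("about 5.6 billion")

# Assumption: a 30-day month, which makes 720 readings exactly one per hour.
readings_per_day = READINGS_PER_CUSTOMER_PER_MONTH / 30

# Assumption: one reading stream per metered point.
implied_meters = TOTAL_READINGS_PER_MONTH / READINGS_PER_CUSTOMER_PER_MONTH

print(readings_per_day)                  # 24.0
print(round(implied_meters / 1e6, 2))    # 7.78 (million metered points)
```

The implied count of about 7.8 million metered points is lower than the 14 million customers quoted for the company. One plausible reading, which is an assumption and not stated in the slide, is that the larger figure counts people served rather than meters.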
SOUTHERN CALIFORNIA EDISON CASE HISTORY
[Chart: Data Quality vs. Level of Determinism (Low to High); value legend: Full]
GDF SUEZ CASE HISTORY
The Company
• French group, one of the main utilities worldwide (turnover of about 70 billion €)
• Founded in 2008 from the merger of Suez and GDF
• Core business: production and distribution of electricity, natural gas and renewable sources
Scope of Work
• Customer size was not addressed consistently (administrative vs. commercial data)
• Improvement in data quality, CRM and billing integration, and marketing campaigns
• Incremental understanding of customer-related phenomena
Achievements
• Customer value-based segmentation
• Prevention of churn due to customer relocation
• Acquisition of «gas-only» customers (electricity)
The Needs
• After the liberalization of the energy market in France, the B2C (CH&P) Business Unit wanted to pursue
the opportunity to grow in the electricity market by leveraging its gas market share
• Understand customer segmentation, and where and how to focus sales and marketing initiatives
GDF SUEZ CASE HISTORY
[Chart: Data Quality (Low to High) vs. Level of Determinism (Low to High), with Value]
LESSONS LEARNED FROM CASE HISTORIES
• The analysis of the case studies highlights how a mature approach to Data Value
banks on two main dimensions:
– data quality
– the ability to interpret / understand phenomena
• Thanks to the analysis of the case histories, it has been possible to identify a first set
of Data Value components
DATA VALUE LAYERS
INTRINSIC VALUE
Layers, shown for both Physical and Social phenomena:
• Data Model
• Data Volume – Cross Section
• Data Volume – Stock
• Data Quality
• Quantitative tools
• Cognitive tools
DATA VALUE LAYERS
POTENTIAL VALUE
Groups: Data, Tools, Expertise
Layers, shown for both Physical and Social phenomena:
• Data Model
• Data Volume – Cross Section
• Data Volume – Stock
• Data Quality
• Quantitative tools
• Cognitive tools
• Context - Data
• Context - Models
• Domain Expertise
NEW FRONTIERS, NEW OPPORTUNITIES…NEW ISSUES
Edge Computing vs Edge Organizations
NEW FRONTIERS, NEW OPPORTUNITIES…NEW ISSUES
Data ownership and
side effects control
NEW FRONTIERS, NEW OPPORTUNITIES…NEW ISSUES
The evolution of data storage technologies is much slower than data growth
NEW FRONTIERS, NEW OPPORTUNITIES…NEW ISSUES
Big Data, Machine Learning and Quantum Computing: the perfect storm?
NEW FRONTIERS, NEW OPPORTUNITIES…NEW ISSUES
Research, consulting, teaching, software industry, working again
without hackathons (or with real rewards and better ethics)
SUMMARY AND RECOMMENDATIONS
• There is no “big data”: there are only data that are manageable or unmanageable with
state-of-the-art technologies
• The real challenge is getting «Big Info» and making better decisions
• Natural and Social domains are different
• Data quality is still the precondition for any project
• Context understanding and contextual data are (in social applications) very often the real
bottleneck
• Use a checklist to assess data value components before starting a project
• Only consider vendors that are able to provide fully integrated solutions to your data issues
(no room for improvised players)
• Failing to capitalize on datasets in Natural Sciences domains is a big mistake;
transforming datasets into value in Social Sciences is (still) a big challenge
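The checklist recommendation can be sketched in a few lines. The component names below are taken from the "Data Value Layers" slides; the pass/fail scoring, the function name `open_items` and the sample assessment are hypothetical, since the deck does not describe how the checklist is filled in.

```python
# Component names taken from the "Data Value Layers" slides.
DATA_VALUE_COMPONENTS = [
    "Data Model",
    "Data Volume - Cross Section",
    "Data Volume - Stock",
    "Data Quality",
    "Quantitative tools",
    "Cognitive tools",
    "Domain Expertise",
    "Context - Data",
    "Context - Models",
]


def open_items(assessment):
    """Return the components not yet marked as covered.

    `assessment` maps a component name to True (covered) or False (gap);
    components missing from the mapping count as gaps.
    """
    return [c for c in DATA_VALUE_COMPONENTS if not assessment.get(c, False)]


# Hypothetical project assessment, for illustration only.
assessment = {
    "Data Model": True,
    "Data Volume - Cross Section": True,
    "Data Volume - Stock": True,
    "Data Quality": False,
    "Quantitative tools": True,
    "Cognitive tools": True,
}

print(open_items(assessment))
# ['Data Quality', 'Domain Expertise', 'Context - Data', 'Context - Models']
```

Treating unassessed components as gaps is a deliberate choice in this sketch: it forces the context and expertise layers to be considered explicitly before a project starts.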
REFERENCES
• Davenport T.H., Big Data at Work, Dispelling the Myths, Uncovering the Opportunities, Harvard
Business Review Press, 2013
• Davenport T.H., Data Scientist: The Sexiest Job of the 21st Century, Harvard Business Review,
October 2012, http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
• Gartner Hype Cycle for Emerging Technologies, 2014
• McKinsey Global Institute, Big data: The next frontier for innovation, competition, and productivity,
2011
• Redman T., Data’s Credibility Problem, Harvard Business Review, December 2013,
http://hbr.org/2013/12/datas-credibility-problem/ar/1
• Ross J.W., Beath C.M., Quaadgras A., You may not need Big Data after all, Harvard Business
Review, December 2013, http://hbr.org/2013/12/you-may-not-need-big-data-after-all/ar/1
• Taleb N.N., Beware the Big Errors of ‘Big Data’, Wired, 2013,
www.wired.com/2013/02/big-data-means-big-errors-people/

Disruptive as Usual: New Technologies and Data Value Professor Severino Meregalli
