A data-view of the data science process
Mathieu d’Aquin - @mdaquin
Data Science Institute
Insight Centre for Data Analytics
NUI Galway
Why am I talking to you about
?
Healthcare and medicine
IoT and Smart-cities
FinTech
Education and Learning
Digital humanities
Media and social Media
Agritech
Environment and Sustainability
Government and public sector
Customer services
Entertainment / creative sector
?
As in Biology? Simplifying: the observation of naturally occurring phenomena and principles in relation to data?
As in Physics? Again simplifying: the theorisation and experimental verification of fundamental laws of data?
As in Social Sciences? Really simplifying: the investigation of the social, economic or cultural implications of data on individuals, groups and society?
Hypo. / Question → Plan → Collect data → Analyse data → Extract results → Exploit results
Produces: Data, Models, New info, whatever was the goal
The study of this process and its characteristics
The study of those things and their characteristics
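To make the "data-view" concrete, the process can be recorded not as a sequence of actions but as the artefacts each stage leaves behind. The sketch below is illustrative only: the `Stage` names follow the slide, while `ProcessRecord`, `Artefact`, the artefact kinds and the example names (`raw_readings`, `demand_model`, ...) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Stages of the data science process, in the order shown on the slide."""
    QUESTION = "Hypo. / Question"
    PLAN = "Plan"
    COLLECT = "Collect data"
    ANALYSE = "Analyse data"
    EXTRACT = "Extract results"
    EXPLOIT = "Exploit results"


@dataclass
class Artefact:
    """Something a stage produces. Kind labels are illustrative."""
    kind: str  # e.g. "data", "model", "new info"
    name: str


@dataclass
class ProcessRecord:
    """One run of the process, seen from the artefacts it creates."""
    produced: dict = field(default_factory=dict)  # Stage -> list of Artefact

    def record(self, stage: Stage, artefact: Artefact) -> None:
        self.produced.setdefault(stage, []).append(artefact)

    def artefacts_of_kind(self, kind: str) -> list:
        # Iterating over the Enum follows definition order, i.e. process order.
        return [a.name
                for stage in Stage
                for a in self.produced.get(stage, [])
                if a.kind == kind]


# Hypothetical run: names are invented for illustration.
run = ProcessRecord()
run.record(Stage.COLLECT, Artefact("data", "raw_readings"))
run.record(Stage.ANALYSE, Artefact("data", "cleaned_readings"))
run.record(Stage.ANALYSE, Artefact("model", "demand_model"))
run.record(Stage.EXTRACT, Artefact("new info", "peak_hours"))

print(run.artefacts_of_kind("data"))  # ['raw_readings', 'cleaned_readings']
```

Once many such records exist, "the study of those things" becomes a query over the records rather than over the code that produced them.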
[Diagram, built up over several slides: a data-view of the data science process. Entities: Dataset, License, Regulation, Source, Dataset characteristics, Data science task, Technique, Model, Model parameters, … Relations: associated with, obtained from, with, derived from, used for, implemented by, using, produced, version of.]
Example: Describing a data process with ontologies
(The Datanode ontology - E. Daga)
A vocabulary to describe the relationships between input datasets, intermediary data assets and the outputs of a data process.
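Describing a process this way yields a graph of typed links between data assets, which can then be queried, for instance to recover the full lineage of an output. The sketch below uses plain Python triples rather than RDF; the relation names are borrowed from the diagram labels and are NOT the actual Datanode vocabulary terms, and the datasets are invented.

```python
from collections import defaultdict

# Hypothetical relation names echoing the diagram labels (not Datanode terms).
DERIVED_FROM = "derived from"
OBTAINED_FROM = "obtained from"
USED_FOR = "used for"

# (subject, relation, object) triples describing a small, invented process.
triples = [
    ("survey.csv", OBTAINED_FROM, "national statistics office"),
    ("survey_clean.csv", DERIVED_FROM, "survey.csv"),
    ("features.parquet", DERIVED_FROM, "survey_clean.csv"),
    ("features.parquet", DERIVED_FROM, "census.csv"),
    ("features.parquet", USED_FOR, "income prediction"),
]


def index(triples, relation):
    """Map each subject to the set of objects it is linked to by `relation`."""
    out = defaultdict(set)
    for s, r, o in triples:
        if r == relation:
            out[s].add(o)
    return out


def lineage(node, derived):
    """All datasets that `node` is (transitively) derived from."""
    seen, stack = set(), [node]
    while stack:
        # .get avoids inserting empty entries into the defaultdict.
        for parent in derived.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


derived = index(triples, DERIVED_FROM)
print(sorted(lineage("features.parquet", derived)))
# ['census.csv', 'survey.csv', 'survey_clean.csv']
```

An ontology-based version would express the same links as RDF statements and answer the lineage question with a property-path query; the traversal logic is the same.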
[Diagram, built up over several slides: an example data process. Smart meter data and solar panel monitoring data each go through anonymisation to produce anon data; together with weather data, location data and electricity tariff data, they feed an analysis that produces a model, used for prediction/recommendation, which produces results. Policies attached to the data: Data prot. (three times), Corp lic. 1, Corp lic. 2, Corp lic. 3, User T&C, OGL. The final slide asks which policies apply to the results: ?]
Example: Machine readable policies and inference
rules for their propagation (E. Daga)
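The idea of propagation can be sketched as a walk back through the inputs of each output: policies flow from inputs to derived data unless a processing step is declared to stop them. This is a minimal sketch, not E. Daga's rule set. The policy labels echo the example above (data protection, OGL, corporate licences), but which dataset carries which policy is invented here, and treating anonymisation as blocking data protection obligations is an illustrative assumption.

```python
# Policies attached to source datasets (assignment invented for illustration).
policies = {
    "smart_meter": {"data protection"},
    "weather": {"OGL"},
    "tariffs": {"corp licence 1"},
}

# Each derived node: (inputs, policies that this step blocks).
# ASSUMPTION: anonymisation blocks data protection, purely for illustration.
steps = {
    "anon_meter": (["smart_meter"], {"data protection"}),
    "model": (["anon_meter", "weather", "tariffs"], set()),
    "results": (["model"], set()),
}


def propagate(node, policies, steps, cache=None):
    """Return the set of policies that apply to `node`.

    Assumes the process graph is acyclic (a DAG); a cycle would recurse forever.
    """
    cache = {} if cache is None else cache
    if node in cache:
        return cache[node]
    if node not in steps:  # a source dataset
        result = set(policies.get(node, set()))
    else:
        inputs, blocked = steps[node]
        result = set()
        for parent in inputs:
            result |= propagate(parent, policies, steps, cache)
        result -= blocked
    cache[node] = result
    return result


print(sorted(propagate("results", policies, steps)))
# ['OGL', 'corp licence 1']
```

Real policies are richer than labels (permissions, prohibitions, duties), and whether a given step propagates each of them is exactly what machine-readable rules would need to capture.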
Example: Studying large Data Science platforms
(ongoing work - M. Adel)
Thousands of datasets used in thousands of data science processes.
This allows us to better understand the tasks of data science, how they occur and in what contexts, as well as which characteristics of datasets lead to which uses in data science processes.
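One simple way to ask "which characteristics lead to which uses" is to count how often each dataset characteristic co-occurs with each task across many processes. The sketch below is not the method of the ongoing work; the records, field names and characteristic labels are all hypothetical stand-ins for what might be mined from a platform.

```python
from collections import Counter

# Invented records: for each process, the characteristics of its input
# dataset and the task performed. Field names and values are hypothetical.
processes = [
    {"characteristics": {"tabular", "labelled"}, "task": "classification"},
    {"characteristics": {"tabular", "time series"}, "task": "forecasting"},
    {"characteristics": {"tabular", "labelled"}, "task": "classification"},
    {"characteristics": {"text"}, "task": "topic modelling"},
    {"characteristics": {"tabular", "labelled"}, "task": "regression"},
]

# Count (characteristic, task) co-occurrences across all processes.
pairs = Counter(
    (c, p["task"]) for p in processes for c in p["characteristics"]
)


def task_distribution(characteristic):
    """Share of processes per task, among those whose dataset has `characteristic`."""
    counts = {t: n for (c, t), n in pairs.items() if c == characteristic}
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


# For "labelled": classification about 0.67, regression about 0.33
print(task_distribution("labelled"))
```

Because each process lists a characteristic at most once, the denominator is the number of processes whose dataset has that characteristic.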
Data Ethics
Hypo. / Question → Plan → Collect data → Analyse data → Extract results → Exploit results
Where ethical implications are (or might be) considered
Where they are important
Towards a methodology for Ethics by Design in Data Science
(with P. Troullinou)
‘Ethics by Design’ for Data Science
Dialectic
The process is based on a conversational approach between data scientists and critical social scientists throughout the project’s life-cycle.
Reflective
Ethical concerns are not pre-fixed; they may emanate from any stage of the project, so constant reflexivity about the project’s activities and its researchers is needed.
Creative, not disruptive
The objective of this process is to have a positive impact on the research and to increase its value by addressing ethics throughout the project’s life-cycle.
All-encompassing
Ethical concerns appear as much in the research activities as in their outcomes, use and exploitation; the process needs to extend to all stages.
Using science fiction to guide ethical thinking
[Figure: a 2×2 grid placing Black Mirror episodes along two axes, from "used/controlled by a small number of individuals" to "used/controlled by all", and from "used accurately according to intended purpose" to "hacked, biased, inaccurate". Episodes shown: S1E3 The Entire History of You, S2E1 Be Right Back, S3E1 Nosedive, S3E2 Playtest, S3E5 Men Against Fire, S3E6 Hated in the Nation, S4E2 Arkangel, S4E3 Crocodile, S4E5 Metalhead.]
Using science fiction to guide ethical thinking
Write scenarios or short stories based on the following four premises: in a near future, what I am developing / the results I will obtain will be...
Used as intended by millions/most people/many people
Used as intended by a small group with control/power
Abused, hacked, inaccurate or biased, while used by millions/most people/many people
Abused, hacked, inaccurate or biased, while used by a small group with control/power
What could possibly go wrong?
(see Re-coding Black Mirror workshops)
Conclusion
Data Science has grown very quickly as a discipline, reaching a huge economic and societal impact. And it is not stopping.
This is leading to the creation of a very large number of datasets, techniques, tools, models, approaches and methods, driven by practices and applications in various domains.
The study of those artefacts is becoming critical to extract the fundamental principles that guide data science as a discipline and as a process. Understanding those principles is essential to drive the impact of data science in an informed way.
Data science practice can support data science theory, but this is not a job for the data/computer scientist alone. It needs to be a conversation with social scientists, business experts, legal experts...
Mathieu d’Aquin
@mdaquin
mdaquin.net
mathieu.daquin@nuigalway.ie
