Paradoxes in Data Science

Pier Paolo Ippolito

Hello!
▰ Data Scientist at SAS Institute
▰ Towards Data Science writer
▰ Freelancer
▰ MSc Artificial Intelligence (University of Southampton)

Paradoxes
Paradoxes are a class of phenomena that arise when, starting from premises known to be true, we derive a logically unreasonable result. Because Machine Learning models create knowledge from data, they are susceptible to similar paradoxes, which can emerge between training and testing.

Agenda
▰ Data Science without data: Modelling and Simulations
▰ Simpson's Paradox
▰ Accuracy Paradox
▰ Learnability-Gödel Paradox
▰ The Law of Unintended Consequences

1.
Modelling and Simulations in Data Science
Using Data Science and Machine Learning even when there is no data available.

“Essentially all models are wrong, but some are useful.”
George E. P. Box, Statistics for Experimenters, second edition, 2005, page 440

Epidemic Modelling: COVID-19

Modelling Approaches
There are two main types of programmable simulation models:
▰ Mathematical Models: use mathematical symbols and relationships to summarise processes. Compartmental models in epidemiology (e.g. SIR, SEIR) are a typical example.
▰ Process Models: are based on a sequence of steps handcrafted by the designer to represent an environment (e.g. Agent-Based Modelling).
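As a hedged illustration of a mathematical (compartmental) model, the sketch below integrates the standard SIR equations with a simple forward-Euler scheme. The helper name `simulate_sir` and all parameter values (transmission rate `beta`, recovery rate `gamma`, population size) are assumptions chosen for illustration, not values fitted to COVID-19 data or taken from the original project.

```python
from typing import List, Tuple


def simulate_sir(
    population: int,
    initial_infected: int,
    beta: float,
    gamma: float,
    days: int,
    steps_per_day: int = 10,
) -> List[Tuple[float, float, float]]:
    """Integrate the SIR equations with a forward-Euler scheme (hypothetical helper).

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns one (S, I, R) tuple per day, starting from day 0.
    """
    n = float(population)
    s, i, r = n - initial_infected, float(initial_infected), 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # People move S -> I at rate beta*S*I/N and I -> R at rate gamma*I
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Assumed parameters: basic reproduction number R0 = beta / gamma = 3
    history = simulate_sir(population=1_000_000, initial_infected=10,
                           beta=0.3, gamma=0.1, days=200)
    peak_day = max(range(len(history)), key=lambda d: history[d][1])
    print(f"Infections peak on day {peak_day}; "
          f"{history[-1][2] / 1_000_000:.1%} of the population is in R by day 200")
```

Because the compartments only exchange people, `S + I + R` stays equal to the population at every step, which is a quick sanity check for this kind of model; a larger `steps_per_day` reduces the Euler discretisation error.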

Web Application
✔ Streamlit interface
✔ Docker support
✔ Automatic update of COVID-19 stats every 24 hours
✔ International news updated every 2 hours
✔ Docker container hosted on Azure Container Registry and deployed on an Azure Web App

Practical Demonstration
✔ Complete Publication
✔ Extras
✔ Open Source Code
✔ Web Application
✔ Medium Article

Forest Fire Simulation (Mesa)

Forest Fire Simulation (Hash)
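The forest fire demonstrations are process models. As a hedged, dependency-free sketch of this kind of model (it does not use the Mesa or Hash APIs), the code below implements one common cellular-automaton formulation: trees are placed at random with a given `density`, the left-most column is ignited, and at each step a burning tree ignites its four orthogonal neighbours and then burns out. These rules, the cell-state constants, and all function names are assumptions for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical cell states used only in this sketch
EMPTY, TREE, BURNING, BURNED = 0, 1, 2, 3
Grid = List[List[int]]


def make_forest(width: int, height: int, density: float, seed: int = 0) -> Grid:
    """Place a tree in each cell with probability `density`, then ignite the left-most column."""
    rng = random.Random(seed)
    grid = [[TREE if rng.random() < density else EMPTY for _ in range(width)]
            for _ in range(height)]
    for row in grid:
        if row[0] == TREE:
            row[0] = BURNING
    return grid


def step(grid: Grid) -> Grid:
    """Advance one tick: burning trees ignite orthogonal neighbours, then burn out."""
    height, width = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]  # read from the old grid, write to the new one
    for y in range(height):
        for x in range(width):
            if grid[y][x] != BURNING:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and grid[ny][nx] == TREE:
                    new_grid[ny][nx] = BURNING
            new_grid[y][x] = BURNED
    return new_grid


def run(grid: Grid) -> Tuple[Grid, int]:
    """Step until no tree is burning; return the final grid and the number of steps."""
    steps = 0
    while any(BURNING in row for row in grid):
        grid = step(grid)
        steps += 1
    return grid, steps


def burned_fraction(grid: Grid) -> float:
    """Share of the original trees that burned (call on a finished simulation)."""
    burned = sum(cell == BURNED for row in grid for cell in row)
    trees = burned + sum(cell == TREE for row in grid for cell in row)
    return burned / trees if trees else 0.0


if __name__ == "__main__":
    for density in (0.4, 0.6, 0.8):
        final, steps = run(make_forest(50, 50, density, seed=42))
        print(f"density={density:.1f} steps={steps} burned={burned_fraction(final):.2%}")
```

Sweeping `density`, as in the usage loop, is a simple way to explore how the spread of the fire depends on how densely the forest is populated.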

Simpson’s Paradox
«Simpson's paradox is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. The paradox can be resolved when causal relations are appropriately addressed in the statistical modelling.»
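A minimal sketch of the reversal described above, using illustrative counts in the spirit of the classic kidney-stone example (the groups, treatments and helper names are hypothetical). Treatment A has the higher success rate within each group, yet treatment B looks better once the groups are pooled, because A is given mostly to severe cases (263 of 350) and B mostly to mild ones (270 of 350): case severity acts as a confounder.

```python
from typing import Dict, Tuple

# Illustrative (successes, patients) counts per (treatment, case severity)
counts: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("A", "mild"): (81, 87),
    ("A", "severe"): (192, 263),
    ("B", "mild"): (234, 270),
    ("B", "severe"): (55, 80),
}


def group_rate(treatment: str, group: str) -> float:
    """Success rate of one treatment within a single severity group."""
    successes, total = counts[(treatment, group)]
    return successes / total


def pooled_rate(treatment: str) -> float:
    """Success rate after merging all severity groups for one treatment."""
    successes = sum(s for (t, _), (s, _n) in counts.items() if t == treatment)
    total = sum(n for (t, _), (_s, n) in counts.items() if t == treatment)
    return successes / total


if __name__ == "__main__":
    for group in ("mild", "severe"):
        print(f"{group:>6}: A={group_rate('A', group):.1%}  B={group_rate('B', group):.1%}")
    print(f"pooled: A={pooled_rate('A'):.1%}  B={pooled_rate('B'):.1%}")
```

Conditioning on the confounder (comparing within severity groups) gives the causally meaningful comparison here; the pooled rates mix the treatment effect with how patients were assigned.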

“When a measure becomes a target, it ceases to be a good measure.”
Charles Goodhart
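Goodhart's law is closely related to the Accuracy Paradox from the agenda: when accuracy alone becomes the target on an imbalanced dataset, a model can score highly while being useless. The sketch below uses a hypothetical test set (1% positives) and hand-written `accuracy` and `recall` helpers (illustrative names, not a library API) to compare a trivial majority-class predictor with a model that actually finds positives.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of truly positive examples that the model flags as positive."""
    hits = [p == positive for t, p in zip(y_true, y_pred) if t == positive]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical imbalanced test set: 10 positives, 990 negatives
y_true = [1] * 10 + [0] * 990

# Model 1: always predicts the majority (negative) class
always_negative = [0] * len(y_true)

# Model 2 (hypothetical): finds 8 of the 10 positives but raises 40 false alarms
useful_model = [1] * 8 + [0] * 2 + [1] * 40 + [0] * 950

if __name__ == "__main__":
    for name, y_pred in (("always_negative", always_negative), ("useful_model", useful_model)):
        print(f"{name}: accuracy={accuracy(y_true, y_pred):.3f} "
              f"recall={recall(y_true, y_pred):.2f}")
```

The trivial predictor reaches 99% accuracy with zero recall, while the more useful model scores lower on accuracy (95.8%) but recovers 80% of the positives; on imbalanced data, metrics such as recall, precision or F1 give a more faithful picture than accuracy alone.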

Pareto Principle

Learnability-Gödel Paradox
Kurt Gödel was one of the most famous mathematicians of the 20th century. Among his most important results are his two Incompleteness Theorems:
▰ According to these theorems, any consistent formal system capable of expressing basic arithmetic contains statements that can be neither proved nor disproved within that system. Mathematics therefore has intrinsic limitations that prevent it from establishing with certainty whether every statement is true or false. Since the whole field of Data Science is deeply interconnected with mathematical thinking, this leads to a paradox (the Learnability-Gödel Paradox).
▰ For some learning problems, whether it is possible to make extrapolations from a population sample cannot be proved or disproved with the standard axioms of mathematics: the answer depends on which undecidable statement one assumes to be true.

5.
The Law of Unintended Consequences

The Law of Unintended Consequences

Conclusion
In this presentation, I introduced some of the main paradoxes related to Data Science. However, many other common paradoxes can also have implications for Data Science and Artificial Intelligence. Some examples are:
• Friendship Paradox in Network Analysis
• Berkson’s Paradox
• Braess’s Paradox
• Moravec’s Paradox
• Birthday Paradox

Thank you! Questions?
Contacts:
▰ LinkedIn
▰ GitHub
▰ Online Portfolio
▰ Towards Data Science
