Intro to Data
Science AI
Loic Merckel
Frankfurt (DE), 01/2025
Tesco: The Pilot, an Early Success of Leveraging Data?
What scares me about this is
that you know more about my
customers after three months
than I know after 30 years.
— Ian MacLaurin (while Tesco
chairman), to Clive Humby after he
presented his findings.
Some Do
Drive
Tremendous
Value
Creation By
Leveraging
Data
"If we have
data,let’s look
at data.If all we
have are
opinions,let’s
go with mine."
— Jim Barksdale (former CEO of
Netscape)
Organizations Have Data,
But Most of It May Be
Dead Weight
Key data waste findings from Veritas’ The UK 2020 Databerg Report Revisited:
– 53% is dark (unclassified) data.
– 28% is ROT (Redundant, Obsolete, Trivial) data.
– Only 19% is business-critical data.
– Organizations spend significant resources storing non-critical data.
Is Having Data
Really Enough?
– McKinsey (2023) states that success remains the exception, not the rule, and that even successful organizational transformations deliver less than their full potential.
– BCG (2020) notes that only 30% of digital transformations meet or exceed their target value and result in sustainable change.
Data alone is not enough to drive results without the right tools and strategy.
Tesco: The Sequel
"I made a mistake on Tesco.
That was a huge mistake by
me."
— Warren Buffett (2014)
– Initial Success: Tesco’s Clubcard set the standard for data-driven retail, making Tesco a global leader.
– Over-reliance on Data: data alone proved insufficient without innovation and adaptability.
– Competitive Pressures: simpler, price-focused rivals like Aldi and Lidl outpaced Tesco.
– Broader Implications: Big Data needs strategic vision and agility to sustain value (e.g., balancing analytics with customer trust).
– Schrage, M. (2014). Tesco’s downfall is a warning to data-driven retailers. Harvard Business Review.
– Warren Buffett on CNBC: https://www.cnbc.com/2014/10/03/warren-buffet-i-made-a-mistake-on-tesco.html
Gaining Insight With
Real-World Data
Adapting to data scientists something I once read on a classmate's T-shirt about software developers:
– The junior thinks it is hard.
– The mid-level thinks it is easy.
– The true senior knows it is hard.
"When the
data and the
anecdotes
disagree,the
anecdotes are
usually right."
— Jeff Bezos
youtu.be/uFUc_5OMB5s?si=NjRe4dnSfyL_OVHc
Explain?
Predict?
Describe?
What are we doing here?
In "What is the Question?" by Leek &
Peng (2015), the authors argue that
misidentifying the type of analysis is
the most frequent cause of flawed
conclusions.
Like Kafka’s K., analysts often don't know their purpose.
The Answer is 42! But
What Was The
Question?
Table copied from Leek, J. T., & Peng, R. D. (2015). What is the question?. Science, 347.
https://doi.org/10.1126/science.aaa6146. The two examples are discussed by Leek & Peng
(2015).
Google Flu Trends' Fiasco: Explanatory → Predictive
– Ambitious goal: predict flu outbreaks from search data faster than CDC reports.
– Failed spectacularly, in large part due to overfitting (50M candidate search terms vs. 1,152 data points); see the sketch below.
– Even simple models using old CDC data performed better.
Cellphones and Brain Cancer: Inferential → Causal
– Case-control studies found an association between phone use and brain cancer (30–200% increased risk), but these studies cannot infer causation due to recall bias and methodological flaws.
– Larger prospective studies found no evidence of a causal link, illustrating the confusion between inferential and causal reasoning.
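This failure mode is easy to reproduce. Below is a minimal, purely illustrative sketch (not the actual Google Flu Trends pipeline, and with made-up dimensions): when there are far more candidate predictors than observations, ordinary least squares can "explain" pure noise in-sample while offering nothing out of sample.

```python
# Illustrative toy, not the actual Google Flu Trends pipeline: with many more
# candidate predictors than observations, a linear model "explains" pure noise
# in-sample while offering no predictive value out of sample.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_obs, n_terms = 100, 5_000            # far more search terms than data points

X_train = rng.normal(size=(n_obs, n_terms))
X_test = rng.normal(size=(n_obs, n_terms))
y_train = rng.normal(size=n_obs)       # the target is pure noise
y_test = rng.normal(size=n_obs)

model = LinearRegression().fit(X_train, y_train)
print(f"in-sample R^2    : {r2_score(y_train, model.predict(X_train)):.2f}")  # ≈ 1.00
print(f"out-of-sample R^2: {r2_score(y_test, model.predict(X_test)):.2f}")    # ≈ 0 or below
```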
Chocolate
& Nobel
Prizes
Predictive modeling often needs only correlation, not causation.
Messerli, F. H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. N Engl J Med,
367(16), 1562-1564.
Model Over-Fitting
"With four
parameters I can
fit an elephant,
and with five I can
make him wiggle
his trunk."
— attributed to John von Neumann
– Dyson, F. (2004). A meeting with Enrico Fermi. Nature 427, 297. https://doi.org/10.1038/427297a
– Mayer et al. (2010). Drawing an elephant with four complex parameters. American Journal of Physics.
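A minimal numpy sketch of the same point (illustrative only, not the Mayer et al. elephant parametrization): a model with as many free parameters as data points passes through any observations exactly, signal or noise alike.

```python
# Five free parameters fit five observations exactly, whether they carry signal or
# are pure noise; a perfect fit says nothing about the model being right.
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 5)
y = rng.normal(size=5)                 # five arbitrary "observations" (pure noise)

coeffs = np.polyfit(x, y, deg=4)       # degree-4 polynomial: five parameters
print(np.allclose(np.polyval(coeffs, x), y))   # True: the curve passes through every point
```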
Machine Learning vs.
Statistical Modeling
Not competitors, but complementary approaches
working together to unlock insights from data
– ML: algorithms that learn patterns from data with the goal of generalizing to unseen examples (with minimal assumptions about the data-generating process).
– SM: mathematical frameworks to understand relationships in data and quantify uncertainty in conclusions (focuses on understanding the data-generating process through explicit assumptions).
Note: The two fields are intricately related; ML workflows routinely rely on statistical methods for assessment and validation.
The Foundation of Insights:
Representative Data in SM
and ML
SM: Rigorous and Controlled (ideally)
– Set the Minimum Detectable Effect (MDE).
– Define the significance level (α) and power (1 − β).
  – α (Type I error rate): the probability of incorrectly rejecting H₀.
  – β (Type II error rate): the probability of failing to reject H₀ when it is false.
  – Power (1 − β): the probability of correctly rejecting H₀ when Hₐ is true.
– Calculate the required sample size (⚠: MDE, α, and power jointly determine n; see the sketch below).
– Conduct systematic data collection.
– Derive conclusions through structured analysis.
ML: Flexible but Data-Hungry
– Often leverages available or opportunistic data.
– Focuses on mitigating overfitting (variance).
– Uses data augmentation or synthetic data to expand datasets.
– Employs robust validation strategies (e.g., train/validation/test splits or cross-validation).
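A sketch of the sample-size step, assuming a two-sample t-test and an MDE expressed as a standardized effect size (Cohen's d); all numbers below are placeholders, and the calculation uses statsmodels.

```python
# Sketch of the sample-size step for a two-sample t-test, assuming the MDE is given
# as a standardized effect size (Cohen's d); all numbers are placeholders.
from statsmodels.stats.power import TTestIndPower

mde = 0.2      # minimum detectable effect, in standard-deviation units
alpha = 0.05   # Type I error rate
power = 0.80   # 1 - beta

n_per_group = TTestIndPower().solve_power(effect_size=mde, alpha=alpha,
                                          power=power, alternative="two-sided")
print(f"Required sample size per group: {n_per_group:.0f}")   # ≈ 394
```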
The ML's Oversize-
and-Regularize
Paradigm
– SM: typically prefers parsimonious models with fewer parameters, emphasizing interpretability and theoretical underpinnings.
– ML: often relies on large, complex models to capture intricate patterns in big datasets, mitigating overfitting through regularization.
– ResNet and GPT architectures are highly oversized but use techniques like dropout, weight decay, and other forms of regularization.
– This approach has proven highly effective for real-world tasks, driving state-of-the-art performance in modern ML (see the sketch below).
Machine Learning: Oversized Power, Like a Wrecking Ball Controlled by a Crane (Regularization) vs. Statistical Modeling: Precision and Parsimony, Like a Hammer
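A toy, tabular stand-in for the oversize-and-regularize idea (nothing like a ResNet or GPT): an over-parameterized linear model with 500 candidate coefficients for 80 observations, unregularized versus regularized, where an L1 penalty chosen by cross-validation stands in for weight decay or dropout.

```python
# Toy "oversize and regularize" comparison (illustrative only): 500 candidate
# coefficients, 80 observations, and only 5 features that truly matter.
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n, p = 80, 500
beta = np.zeros(p)
beta[:5] = 2.0                                    # the only informative coefficients
X, X_new = rng.normal(size=(n, p)), rng.normal(size=(1000, p))
y = X @ beta + rng.normal(size=n)
y_new = X_new @ beta + rng.normal(size=1000)

for name, model in [("unregularized", LinearRegression()),
                    ("L1-regularized (CV)", LassoCV(cv=5, random_state=0))]:
    model.fit(X, y)
    # The unregularized fit memorizes training noise; the regularized fit
    # typically generalizes far better on the held-out data.
    print(f"{name:20s} held-out R^2: {r2_score(y_new, model.predict(X_new)):.2f}")
```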
Sampling Is Crucial for SM
Douglas Hubbard tells a story in which the famous statistician John Tukey is quoted saying:
A random selection of three people would have been better than a group of 300 chosen by Mr. Kinsey.¹
1. Hubbard, D. W. (2010). How to Measure Anything: Finding the Value of "Intangibles" in Business.
Sampling: The Good, The
Somewhat Bad and the
Truly Ugly
Probability Sampling (The Good)
Every member of the population has a known, non-zero chance of
being selected. This method ensures representativeness and
allows for statistical inference. (E.g., random sampling, stratified
sampling, and cluster sampling.)
Non-Probability Sampling (The Somewhat Bad)
Members of the population are selected based on subjective criteria
or convenience, without ensuring representativeness. While faster and
cheaper, it risks bias and limits generalizability. (E.g., convenience
sampling and quota sampling).
Spurious Sampling (The Truly Ugly)
A flawed method where the sample is improperly drawn or defined,
leading to misleading or invalid conclusions. Often results from poor
study design, selection bias, or data contamination.
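A small simulation of why this matters, with entirely made-up quantities: when the trait being measured correlates with how easy people are to reach, a convenience sample is biased while a simple random sample of the same size is not.

```python
# Convenience sample (the easiest-to-reach respondents) vs. a simple random sample
# of the same size, on a synthetic population where spend correlates with reachability.
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
reachability = rng.normal(size=N)
spend = 50 + 10 * reachability + rng.normal(scale=5, size=N)

n = 300
random_sample = rng.choice(spend, size=n, replace=False)
convenience_sample = spend[np.argsort(reachability)[-n:]]   # the 300 easiest to reach

print(f"true mean         : {spend.mean():.1f}")
print(f"random sample     : {random_sample.mean():.1f}")        # close to the truth
print(f"convenience sample: {convenience_sample.mean():.1f}")   # biased upward
```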
Samples
must be
handled
with
care!
Sampling User
Manual: Act One
We want to analyze the number of coffees served during the day at the cafeteria. We have two hypotheses:
– H01: Morning vs. Afternoon (μ_morning = μ_afternoon)
– H02: Morning vs. Evening (μ_morning = μ_evening)
Sampling User Manual:
Act Two
Family-Wise Tests
A family of tests refers to a group of hypotheses that are analyzed together and share some logical or contextual relationship. Tests may be grouped when they:
– Address related aspects of the same research question.
– Are derived from the same experimental framework.
– Share statistical or logical dependencies.
Family-Wise Error Rate (FWER)
– Probability of making at least one Type I error (i.e., falsely rejecting the null) across a family of hypothesis tests.
– FWER = 1 − (1 − α)^m, where m = number of tests.
Family-Wise Error Correction
Corrections reduce the per-test α so that the FWER remains below a desired threshold (e.g., 0.05).
Example Correction Methods
– Bonferroni: α' = α/m.
– Holm-Bonferroni: stepwise approach.
– Benjamini-Hochberg: controls the False Discovery Rate (FDR) instead.
Our Coffee Study
– Tests: Morning vs. Afternoon, Morning vs. Evening.
– These tests collectively assess the hypothesis "morning coffee consumption is higher than at other times of the day."
– Therefore, they form a family of tests (m = 2).
– Without correction: FWER = 1 − (1 − 0.05)² ≈ 0.0975 (9.75%).
– With Bonferroni (α' = 0.05/2): FWER controlled at ≈ 5% (see the sketch below).
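The coffee-study arithmetic can be checked in a few lines; the two p-values below are hypothetical, and the Bonferroni adjustment uses statsmodels.

```python
# The slide's coffee-study arithmetic, plus a Bonferroni adjustment via statsmodels.
# The two p-values are hypothetical.
from statsmodels.stats.multitest import multipletests

alpha, m = 0.05, 2
print(f"FWER without correction: {1 - (1 - alpha) ** m:.4f}")   # ≈ 0.0975

p_values = [0.030, 0.200]   # morning vs. afternoon, morning vs. evening (made up)
reject, p_adjusted, _, _ = multipletests(p_values, alpha=alpha, method="bonferroni")
print("Adjusted p-values:", p_adjusted)   # [0.06, 0.40]: the 0.03 no longer "passes"
print("Reject H0?       :", reject)
```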
An Introduction to AI (Formerly Data Science)
Remember
the Big
Data Buzz?
Big data drives sampling error toward zero but does not reduce other errors associated with inferences drawn from a sample... big data [is] different from little data in that we must be careful not to be fooled by our own estimated precision — Nagler & Tucker (2015, p. 1)¹
1. Nagler, J., & Tucker, J. A. (2015). Drawing Inferences and Testing Theories with Big Data. PS: Political Science & Politics, 48(1), 84–88. doi:10.1017/S1049096514001796
Big Data and the
Illusion of Precision
or Certainty
– Large sample sizes reduce sampling error, creating an illusion of precision (e.g., narrow CIs, p-value ≈ 0).
– Statistical significance can be misleading:
  – Overfitted models may capture noise, not signal.
  – Sample bias undermines validity (e.g., data not collected with a proper sampling methodology or plan).
– Precision in estimates does not guarantee meaningful or accurate results.
I have learned and taught that the primary product of a research inquiry is one or more measures of effect size, not p-values — Cohen (1990, p. 1310)¹
1. Cohen, J. (1990). Things I have learned (so far). In Annual Convention of the American Psychological Association.
Effect Size: Beyond
Statistical Significance
Why Effect Size Matters
– Statistical significance ≠ practical importance:
  – Large sample sizes can detect trivial effects (e.g., p-value ≈ 0 for a tiny mean difference).
  – Effect size reflects the magnitude and practical relevance of a result.
Focus on meaningful impact
– How large is the effect in real-world terms?
– Is the effect relevant to the decision-making process?
Example: A study finds a statistically significant difference in sales (p ≈ 0) between two products in a dataset of 1 million rows (see the simulation sketch below).
– Effect size: average difference = $0.01.
– Interpretation (Haribo candies): for items priced at $1.00, this represents a 1% difference, which could have a huge economic impact given high sales volume (≈60B Goldbears alone per year).
– Interpretation (luxury cars): for items priced at $50,000+, this difference is negligible and irrelevant for decision-making (e.g., BMW sells ≈2.5M cars per year).
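A quick simulation in the spirit of this example (all figures hypothetical): with a million rows per product, a one-cent mean difference yields p ≈ 0, yet the standardized effect size stays tiny; whether it matters is a business judgment, not a statistical one.

```python
# Hypothetical numbers in the spirit of the slide's example: ~1M observations per
# product, a $0.01 average difference around a $1.00 price point.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1_000_000
a = rng.normal(loc=1.00, scale=0.20, size=n)   # product A sales value per item
b = rng.normal(loc=1.01, scale=0.20, size=n)   # product B: one cent higher on average

t_stat, p_value = stats.ttest_ind(a, b)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (b.mean() - a.mean()) / pooled_sd

print(f"p-value   : {p_value:.2e}")   # effectively 0: "significant"
print(f"Cohen's d : {cohens_d:.3f}")  # ≈ 0.05: a tiny standardized effect
```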
Big or Small Data, Pay Attention
to the Tail...
"Many of the features of heavy tailed phenomena
would render our traditional statistical tools useless
at best, dangerous at worst" — Cooke & Nieboer (2011, p. 7).
Challenges with Heavy Tails
– Heavy-tailed data often defies common statistical assumptions.
– Example: income distributions, natural disasters, or stock market returns.
Central Limit Theorem (CLT):
– Assumes finite variance, which heavy-tailed distributions (like Pareto, Cauchy, Weibull, Lévy) may not have.
– Misleading p-values or confidence intervals can result (e.g., the t-test may be unreliable because the CLT may not apply); see the sketch below.
Non-Parametric Methods:
– Bootstrapping struggles with heavy tails (Hall, 1990).
– Extreme values dominate resampling, leading to unstable results.
– Cooke, R. M., & Nieboer, D. (2011). Heavy-tailed distributions: Data, diagnostics, and new developments. Resources for the Future Discussion Paper, (11-19).
– Hall, P. (1990). Asymptotic properties of the bootstrap for heavy-tailed distributions. The Annals of Probability, 1342-1360.
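A quick illustration of the CLT point, assuming a Cauchy-distributed variable (no finite mean or variance): the sample mean never settles down, no matter how much data is collected.

```python
# Cauchy-distributed data: the sample mean of n draws is itself Cauchy, so averaging
# more data never narrows the estimate, contrary to the usual CLT intuition.
import numpy as np

rng = np.random.default_rng(5)
for n in (100, 10_000, 1_000_000):
    means = np.array([rng.standard_cauchy(n).mean() for _ in range(100)])
    lo, hi = np.percentile(means, [2.5, 97.5])
    print(f"n={n:>9,}  95% spread of 100 sample means: [{lo:7.2f}, {hi:7.2f}]")
```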
Misidentifying The Type of Analysis and
Spurious Sampling Are Not The Only Causes
of Troubles
– Isaac, B. (2024). Harvard Business School Investigation Report Recommended Firing Francesca Gino. The Harvard Crimson.
– Cyranoski, D. (2014). Accusations pile up amid Japan’s stem-cell controversy. Nature.
– Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Med.
– Harford, T. (2023). Behind the fraud drama rocking academia. Financial Times. https://on.ft.com/47Z0WMc.
Notable Research
Misconduct Cases
STAP Cell Case (2014)
– RIKEN researcher claimed a groundbreaking stem-cell method in Nature.
– Data manipulation discovered; papers retracted.
– Investigation committee faced its own integrity crisis.
– Supervisor died by suicide; researcher resigned.
Francesca Gino Case (2023)
– Harvard Business School professor studying dishonesty.
– Data fabrication discovered in behavioral-science research.
– Multiple papers retracted.
– Filed a $25M lawsuit against Harvard.
Arbitrary Depiction of Dolos (Δόλος)
[T]he headline-grabbing cases of misconduct and fraud are mere distractions. The state of our science is strong, but it’s plagued by a universal problem: Science is hard — really fucking hard — Christie Aschwanden¹
1. Aschwanden, C. (2015). Science isn’t broken. FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/
"Listening to Beatles
Music Makes People
Younger"
A Case Study in Data Dredging (Simmons et al., 2011)
The "Study"
This highlights how flexibility in analysis can manufacture
statistical significance, even for implausible outcomes.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data
collection and analysis allows presenting anything as significant. Psychological science, 22(11), 1359-1366.
10 students listened to "When I'm Sixty-Four." And 10 others
to "Kalimba" (Windows OS).
–
Collected many variables including birth dates.
–
Used father's age as a covariate.
–
Result: Participants were "1.5 years younger"! (P = 0.04)
–
Researcher Flexibility Issues
Researchers have too much flexibility ("researcher degrees of
freedom") in collecting and analyzing data.
The authors emphasize that adhering to established hypothesis
testing frameworks and transparent practices is essential to
avoid false positives.
Stopping Rule: Deciding when to stop collecting data.
–
Variable Selection: Choosing which measures to analyze.
–
Condition Selection: Comparing specific groups post hoc.
–
Data Exclusion: Removing or keeping observations arbitrary.
–
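A small simulation in the spirit of Simmons et al. (2011), with made-up measures: there is no true effect anywhere, yet exploiting just two degrees of freedom (several candidate outcomes plus optional stopping) pushes the false-positive rate well above the nominal 5%.

```python
# No true effect anywhere, yet flexible analysis "finds" one far too often.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def one_flexible_study(n=20, n_outcomes=4):
    g1 = rng.normal(size=(n, n_outcomes))   # several candidate outcome measures,
    g2 = rng.normal(size=(n, n_outcomes))   # none of which truly differs between groups
    pvals = [stats.ttest_ind(g1[:, j], g2[:, j]).pvalue for j in range(n_outcomes)]
    if min(pvals) >= 0.05:                  # optional stopping: collect a few more, re-test
        g1 = np.vstack([g1, rng.normal(size=(10, n_outcomes))])
        g2 = np.vstack([g2, rng.normal(size=(10, n_outcomes))])
        pvals = [stats.ttest_ind(g1[:, j], g2[:, j]).pvalue for j in range(n_outcomes)]
    return min(pvals) < 0.05                # report whichever comparison "worked"

rate = np.mean([one_flexible_study() for _ in range(2000)])
print(f"False-positive rate under flexible analysis: {rate:.0%}")   # well above 5%
```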
The problem is not with the statistical tools themselves, but with researcher behavior and incentives. Simonsohn (2014, p. 9) notes:
The root cause of p-hacking ... lies in a conflict of interest.
Researchers are rewarded for finding certain types of results.
And no! There is no evidence that dishonesty fosters creativity... A striking example: a fraudulent paper claiming benefits from dishonesty.
– Simonsohn, U. (2014). Posterior-hacking: Selective reporting invalidates Bayesian results also. https://dx.doi.org/10.2139/ssrn.2374040.
– Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Med 2(8): e124. https://doi.org/10.1371/journal.pmed.0020124.
The Basic and Applied Social
Psychology Journal Went
Ballistic With Banning "p-
values"
and, instead, "encourage the use of larger sample sizes ...
because as the sample size increases, descriptive statistics
become increasingly stable and sampling error is less of a
problem" (Trafimow & Marks, 2015).
The journal also critiques some Bayesian procedures, reserving
judgment on their use on a case-by-case basis.
– As discussed earlier, sampling errors are reduced with larger sample sizes, but other problems remain.
– Heavy-tailed distributed variables "relating to both natural and social systems are becoming increasingly ubiquitous" (Vogel, 2024, p. 1).
– In those situations, the bold move of banning p-values may fall short of preventing fishy descriptive statistics.
– Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2.
– Vogel, R. M. et al. (2024). When Heavy Tails Disrupt Statistical Inference. The American Statistician, 1–15.
Posterior-Hacking: A Bayesian
Counterpart to p-Hacking
While p-hacking is widely discussed, Bayesian methods are not immune to misuse. Posterior-hacking
highlights similar challenges in Bayesian analysis (Simonsohn, 2014).
No statistical framework is immune to misuse. Transparency and rigorous checks are critical, regardless of the
approach.
Simonsohn, U. (2014). Posterior-hacking: Selective reporting invalidates Bayesian results also. https://dx.doi.org/10.2139/ssrn.2374040
– Tweaking priors: justified refinement is fine, but arbitrary changes to favor desired posterior results can mislead (see the sketch below).
– Selective reporting of favorable models or estimates.
– Repeated re-analysis with different assumptions to achieve the "right" result.
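A minimal conjugate Beta-Binomial sketch of the first point, with hypothetical A/B-test counts and arbitrarily chosen priors: the same data yields materially different "probability the variant is better" depending on the prior, so reporting only the favorable one misleads.

```python
# Same hypothetical A/B-test data, two priors: the reported posterior probability
# that the rate exceeds 50% depends heavily on which prior one chooses to show.
from scipy import stats

successes, trials = 53, 100                        # hypothetical conversion counts

priors = {"flat Beta(1, 1)": (1, 1),
          "conveniently optimistic Beta(30, 10)": (30, 10)}

for name, (a0, b0) in priors.items():
    posterior = stats.beta(a0 + successes, b0 + trials - successes)
    print(f"{name:38s} P(rate > 0.5 | data) = {posterior.sf(0.5):.2f}")
```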
An Introduction to AI (Formerly Data Science)
Estimating the
Reproducibility of
Psychological Science
Open Science Collaboration. (2015). Estimating the reproducibility of
psychological science. Science, 349(6251).
https://doi.org/10.1126/science.aac4716
– Original studies: 97 out of 100 were deemed significant (with an estimated power of 92%), hence an expected ≈89 significant replications.
– Replication studies: only 37 out of 97 were significant.
Image from: https://doi.org/10.1126/science.aac4716
Image from: Aschwanden, C. (2015). Science isn’t broken. FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/ — Research paper: Silberzahn R. et al. (2018) at
https://doi.org/10.1177/2515245917747646
The important lesson here is that a single analysis is not sufficient to find a definitive answer — Christie Aschwanden¹
1. Aschwanden, C. (2015). Science isn’t broken. FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/
Aschwanden, C. (2016).
Failure Is Moving
Science Forward,
The Replication
Crisis Is a Sign That
Science Is Working.
FiveThirtyEight.
fivethirtyeight.com/features/failure-is-moving-
science-forward/
Korbmacher, M. et al. (2023).
The replication
crisis has led to
positive structural,
procedural, and
community
changes.
Commun Psychol 1, 3.
nature.com/articles/s44271-023-00003-2
The Corporate
World Isn't
Shielded
And there is no open crisis to spark positive
changes...
This wisdom, passed down from Jason Zweig's father, captures a cynical but often
accurate observation about incentives in corporate settings. It resonates particularly
when considering how data and analysis are presented in business contexts, where the
pressure to deliver "favorable results" often challenges integrity (Zweig, 2018):
– Lie to people who want to be lied to, and you'll get rich. [E.g., selectively presenting metrics to align with leadership's preferences.]
– Tell the truth to those who want the truth, and you'll make a living. [E.g., offering nuanced insights to leaders who value transparency.]
– Tell the truth to those who want to be lied to, and you'll go broke. [E.g., refusing to manipulate data in a culture where validation trumps truth.]
Zweig, J. (2018). Three Ways to Get Paid. Jason Zweig. Retrieved from https://jasonzweig.com/three-ways-to-get-paid/.
Some Wisdoms
– Twyman's law: a principle in statistics that warns against trusting unusually good results.
– Occam's razor: simpler models or explanations are preferred, but only if they perform equally well.
– Law of Diminishing Returns: more features or data are not always better; focus on quality over quantity.
– Garbage In, Garbage Out (GIGO): the quality of your data directly affects the quality of your insights.
"Every genuine
test of a theory
is an attempt to
falsify it,or to
refute it."
— Popper (1962)
So long, and thanks for all the fish... Or any questions?