data science @ The New York Times: what industry can learn from us; what we can learn from industry
chris.wiggins@columbia.edu
chris.wiggins@nytimes.com
data science @ The New York Times
biology: 1892 vs. 1995
biology changed for good.
new toolset, new mindset
“These are indeed exciting times, not unlike the early days of recombinant DNA in the 1970s, in which a revolutionary new technology permitted entirely new questions about the nature of genes to be raised. This challenge is new to biology, and its resolution will require, in addition to existing paradigms of molecular biology, new sets of analytical tools... disciplines outside of biology will be required to collaborate on this problem.”
genetics: 1837 vs. 2012
ML toolset; data science mindset
arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost
Raj, Dewar, Palacios, Rabadan, CW
data science: mindset & toolset
modern history:
2009
in tf
in gcp
data science: mindset & toolset
drew conway, 2010
develop + deploy machine learning solutions to newsroom + business problems
data science @ The New York Times:
1851
news: 20th century
church state
news: 21st century
church state
data
learnings
- descriptive modeling
- predictive modeling
- prescriptive modeling
(actually ML, shhhh…)
- (unsupervised learning)
- (supervised learning)
- (reinforcement learning)
2012; h/t michael littman
recommendation as inference
bit.ly/AlexCTM
Chong Wang & David Blei, SIGKDD 2011
CTM (collaborative topic modeling) generative model:
r: clicks
w: words
u: user-topic association
v: article-topic association
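As a rough illustration of the generative story behind these variables, here is a minimal forward simulation. It assumes the collaborative topic regression formulation of Wang & Blei: each user vector u is Gaussian, each article's topic proportions θ come from a Dirichlet (the LDA part that would also generate the words w, which this sketch does not sample), the article vector v is θ plus a Gaussian offset, and a click signal r is Gaussian around the inner product u·v. The function name `generate_ctm`, the dimensions, and the precision values are illustrative assumptions, not NYT settings.

```python
import math
import random


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_ctm(n_users, n_articles, k, alpha=0.5,
                 lambda_u=1.0, lambda_v=10.0, c=1.0, seed=0):
    """Forward-simulate a collaborative-topic-style model (illustrative values).

    u[i]    : user-topic association, u_i ~ N(0, I / lambda_u)
    theta[j]: article topic proportions, theta_j ~ Dirichlet(alpha); in the full
              model these also generate the article's words w (omitted here)
    v[j]    : article-topic association, v_j = theta_j + N(0, I / lambda_v)
    r[i][j] : click signal, r_ij ~ N(u_i . v_j, 1 / c)
    """
    rng = random.Random(seed)
    sd_u = 1.0 / math.sqrt(lambda_u)
    sd_v = 1.0 / math.sqrt(lambda_v)
    sd_r = 1.0 / math.sqrt(c)

    u = [[rng.gauss(0.0, sd_u) for _ in range(k)] for _ in range(n_users)]
    theta = [sample_dirichlet(alpha, k, rng) for _ in range(n_articles)]
    v = [[t + rng.gauss(0.0, sd_v) for t in th] for th in theta]
    r = [[rng.gauss(sum(a * b for a, b in zip(u_i, v_j)), sd_r) for v_j in v]
         for u_i in u]
    return u, theta, v, r


u, theta, v, r = generate_ctm(n_users=3, n_articles=4, k=2, seed=42)
print(len(r), "users x", len(r[0]), "articles")
```

Recommendation as inference runs this story backwards: given observed clicks r and words w, estimate u and v, then rank a user's unseen articles by u·v.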
related: learning phenotypes from EHR
Readerscope
In the course of our global expansion, we realized we needed much more sophisticated, real-time insight into what’s happening across our site.
Who is reading what? And where?
[Readerscope interface screenshots: Locations, Audience Segments, and Topics tabs with an FAQ panel; search by audience segment, e.g., “High-tech Lifestyle”, “Parents”, “Media - Comedy Films”, “C-Suite, Executives and BDMs - Entertainment”; illustration by Clara Nguyen]
Tool: Readerscope
AUDIENCE INSIGHTS ENGINE
predictive modeling, e.g., “the funnel”
interpretable predictive modeling
super cool stuff
Middendorf, Kundaje, Shah, Freund, CW, Leslie
arxiv.org/abs/q-bio/0701021
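One common way to get predictive models that stay interpretable is boosting over very simple rules. The sketch below is textbook AdaBoost with one-feature threshold stumps, offered as an illustration and not as the specific algorithm of the work cited above; every learned stump reads as a rule (“if feature f > t, vote ±1 with weight α”). The toy data and the function names (`adaboost`, `describe`, etc.) are hypothetical.

```python
import math


def stump_predict(row, f, t, polarity):
    """A one-feature rule: vote `polarity` if row[f] > t, else the opposite."""
    return polarity if row[f] > t else -polarity


def best_stump(X, y, w):
    """Exhaustively find the (feature, threshold, polarity) with lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if stump_predict(row, f, t, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best


def adaboost(X, y, rounds=10):
    """AdaBoost with decision stumps; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, t, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, polarity))
        if err <= 1e-10:
            break  # a perfect rule was found; more rounds add nothing
        # Misclassified examples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * label * stump_predict(row, f, t, polarity))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the weighted vote over all stumps."""
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in model)
    return 1 if score >= 0 else -1


def describe(model):
    """Render each weighted stump as a human-readable rule."""
    return [f"if x[{f}] > {t}: vote {p:+d} (weight {a:.2f})" for a, f, t, p in model]


X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 7], [6, 2]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y)
for rule in describe(model):
    print(rule)
```

Because the final model is a short weighted list of such rules, a reader can inspect which features drive each prediction, which is the sense of “interpretable” here.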
work w/Daeil Kim & Hiroko Tabuchi
driving question: which records should she investigate?
We conducted user research to gather millions of observations about how different articles made people feel.
DATA COLLECTION
When reading this article, did you feel…
Anger, Sadness, Happiness, Despair, Hurt, No Emotion, Jealousy, Frustration, Anxiety, Hope, Hate, Interest, Guilt, Contentment, Contempt, Love, Compassion, Shame, Amusement, Stress, Irritation, Fear, Boredom, Surprise, Confusion, Disgust, Irony, Pride, Disappointment
[Figure: example emotion scores; *units based on 100th percentile]
- Adventurous 98, Interest 42, Happiness 96, Self-Confident 39, Love 97
- Hate, Inspired 100, Amused 100, Sadness 27
[Chart: Perspective Targeting Impression Volume By Month, April to December; sources: Google DFP, NYT Ad Performance Data, Sizmek]
Throughout the year, NYT ran more perspective targeting campaigns every month, and performance kept setting new benchmarks for success.
A Record First Year
Perspective Targeting outperformed brands’ own first-party segments, demonstrating that we’ve found a new way to deliver meaning to these audiences.
…two examples
predicting engagement
w/“audience development” team
leverage methods that are both predictive and performant
driving question: which content should we promote, where and when?
NB: data informed, not data-driven
… recommendation as prescription
2018: algos for *highly editorially curated* content pools
- smarter living
- midterms
- editors’ picks
2019: all of the above, plus:
- For You Tab
- stay tuned…
slow: randomized controlled trial
fast: bandits
Lihong Li (YHOO->MSFT->GOOG), 2011
thompson sampling & “bandits”
old (1933) idea: do the best you can
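To make the 1933 idea concrete, here is a minimal Beta-Bernoulli Thompson sampling loop. It assumes each arm (say, a candidate story in an editorially curated pool) returns a binary click with an unknown but fixed rate; the rates passed in are hypothetical values used only to simulate feedback, and the function name `thompson_sampling` is illustrative. Contextual, nonparametric, and drifting-reward extensions like those cited below are out of scope for this sketch.

```python
import random


def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling against simulated click rates.

    true_rates: hypothetical per-arm click probabilities (unknown to the algorithm).
    Returns (pulls, clicks): how often each arm was shown and how often it was clicked.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    clicks = [0] * n_arms
    misses = [0] * n_arms
    for _ in range(n_rounds):
        # Sample a plausible click rate per arm from its Beta(1 + clicks, 1 + misses)
        # posterior (uniform prior), then show the arm whose sample is highest.
        draws = [rng.betavariate(1 + c, 1 + m) for c, m in zip(clicks, misses)]
        arm = max(range(n_arms), key=draws.__getitem__)
        # Simulate the reader's response and update that arm's posterior counts.
        if rng.random() < true_rates[arm]:
            clicks[arm] += 1
        else:
            misses[arm] += 1
    pulls = [c + m for c, m in zip(clicks, misses)]
    return pulls, clicks


pulls, clicks = thompson_sampling([0.02, 0.05, 0.10])
print("pulls per arm:", pulls)
```

Unlike a randomized controlled trial, which splits traffic evenly until the test ends, this loop shifts traffic toward the better arm as evidence accumulates; that is the slow-versus-fast contrast on the slide.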
Related/extensions: lessons from statistical physics:
- Variational Methods for complex algebraic models
  - Urteaga & W. arXiv:1709.03163 / AISTATS 2018
- Monte Carlo methods
  - nonparametric mixture models (via MC): Urteaga & W. arXiv:1808.02932 [stat.ML]
  - particle filtering for arbitrary computational models: Urteaga & W. arXiv:1808.02933
approximate variational methods for bandits
Urteaga & W. arXiv:1709.03163
cf. modelingsocialdata.org
monte carlo filtering w/drift? -> ‘genetic’ bandit
dynamic environments
Urteaga & W. arXiv:1808.02932
common requirements in data science:
1. people
2. ideas
3. things
cf. John Boyd, USAF
monica rogati, Aug 1 2017 hackernoon.com
things: de>da>ds/ml/ai
data science: ideas
Reporting
Learning
Test
Optimizing
Explore
descriptive:
predictive:
prescriptive:
ML primitives: learning, scoring, testing:
- speed?
- scale?
- cost?
watch this space: NYT+AI
people… so far (we’re hiring!!!!)
physics, math/fin, p chem, app math, cog sci, EE, astrophys, pure math, biophys, seismology, neuro
data science @ The New York Times:
chris.wiggins@columbia.edu
chris.wiggins@nytimes.com
chris.wiggins@hackNY.org
@chrishwiggins
what industry can learn from us; what we can learn from industry
also: we’re hiring!

More Related Content

PDF
20 Copywriting Disasters (and how to avoid them)
PDF
FlatPro - Presentation Template
PDF
a mission-driven approach to personalizing the customer journey
PDF
data science at the new york times
PDF
Data Science at The New York Times
PPTX
Cloud pharmaceuticals (2)
PPTX
Geometric Minimalist Pitch Deck for automotive
PPTX
Geometric Minimalist Pitch Deckss123.pptx
20 Copywriting Disasters (and how to avoid them)
FlatPro - Presentation Template
a mission-driven approach to personalizing the customer journey
data science at the new york times
Data Science at The New York Times
Cloud pharmaceuticals (2)
Geometric Minimalist Pitch Deck for automotive
Geometric Minimalist Pitch Deckss123.pptx

Similar to Data Science at The New York Times: what industry can learn from us; what we can learn from industry (19)

PPTX
Geometric Minimalist Pitch Deck (1).pptx
PPTX
Geometric1234 Minimalist Pitch Deck.pptx
PDF
Copy of Playful 3D Characters Editorial Meeting Presentation.pdf
PPT
How to Create a Virtual Law Practice
PPTX
Global Business
PDF
Oral Communication
PPTX
Code leader2
PDF
MSP Automation - Application and Execution
PPTX
Answers to the World's Scariest Employment Law Questions
PPTX
Green Minimalist Professional Tech Start-Up Pitch Deck Presentation.pptx
PPTX
High Impact PowerPoint Presentations- Compilation 3
PPTX
Cannabis PowerPoint template | Weedly
PDF
Yellow Watercolor Organic Creative Project Presentation.pdf
PPTX
Antara - Green nice to feel abstract live
PPTX
Advertising Pitch Deck thats help and get advertisement
PPTX
Animated marketing
PPT
Virtual Law Practice: Basic Concepts
PDF
apresentação de modelo no canva gratuito
PPT
Virtual Law Practice: Basic Concept from the ABA ELawyering Task Force
Geometric Minimalist Pitch Deck (1).pptx
Geometric1234 Minimalist Pitch Deck.pptx
Copy of Playful 3D Characters Editorial Meeting Presentation.pdf
How to Create a Virtual Law Practice
Global Business
Oral Communication
Code leader2
MSP Automation - Application and Execution
Answers to the World's Scariest Employment Law Questions
Green Minimalist Professional Tech Start-Up Pitch Deck Presentation.pptx
High Impact PowerPoint Presentations- Compilation 3
Cannabis PowerPoint template | Weedly
Yellow Watercolor Organic Creative Project Presentation.pdf
Antara - Green nice to feel abstract live
Advertising Pitch Deck thats help and get advertisement
Animated marketing
Virtual Law Practice: Basic Concepts
apresentação de modelo no canva gratuito
Virtual Law Practice: Basic Concept from the ABA ELawyering Task Force
Ad

More from chris wiggins (20)

PDF
"data hum: a core approach to the ethics of data"
PDF
"data: past, present, and future" day 1 lecture 2020-01-20
PDF
history and ethics of data
PDF
"data: past, present, and future" lecture 1 (intro) 1/22/19
PDF
"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Jones
PDF
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...
PDF
Data: Past, Present, and Future (Lecture 1, Spring 2018)
PDF
data science: past present & future [American Statistical Association (ASA) C...
PDF
Machine Learning Summer School 2016
PDF
lean + design thinking in building data products
PDF
data science @NYT ; inaugural Data Science Initiative Lecture
PDF
data history / data science @ NYT
PDF
data science history / data science @ NYT
PDF
data science: past, present, and future
PDF
Chris Wiggins: "engagement & reality"
PDF
intro data science at NYT 2015-01-22
PDF
data science in academia and the real world
PDF
Lean workbench 2013-07-24
PDF
Wiggins 2013 05-29
PDF
variational bayes in biophysics
"data hum: a core approach to the ethics of data"
"data: past, present, and future" day 1 lecture 2020-01-20
history and ethics of data
"data: past, present, and future" lecture 1 (intro) 1/22/19
"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Jones
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...
Data: Past, Present, and Future (Lecture 1, Spring 2018)
data science: past present & future [American Statistical Association (ASA) C...
Machine Learning Summer School 2016
lean + design thinking in building data products
data science @NYT ; inaugural Data Science Initiative Lecture
data history / data science @ NYT
data science history / data science @ NYT
data science: past, present, and future
Chris Wiggins: "engagement & reality"
intro data science at NYT 2015-01-22
data science in academia and the real world
Lean workbench 2013-07-24
Wiggins 2013 05-29
variational bayes in biophysics
Ad

Recently uploaded (20)

PPTX
Current and future trends in Computer Vision.pptx
PDF
Categorization of Factors Affecting Classification Algorithms Selection
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PDF
Artificial Superintelligence (ASI) Alliance Vision Paper.pdf
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
6ME3A-Unit-II-Sensors and Actuators_Handouts.pptx
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
BIO-INSPIRED HORMONAL MODULATION AND ADAPTIVE ORCHESTRATION IN S-AI-GPT
PPT
Mechanical Engineering MATERIALS Selection
PPTX
Safety Seminar civil to be ensured for safe working.
PPTX
Geodesy 1.pptx...............................................
PDF
A SYSTEMATIC REVIEW OF APPLICATIONS IN FRAUD DETECTION
PDF
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Construction Project Organization Group 2.pptx
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
PPT on Performance Review to get promotions
Current and future trends in Computer Vision.pptx
Categorization of Factors Affecting Classification Algorithms Selection
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
Artificial Superintelligence (ASI) Alliance Vision Paper.pdf
Internet of Things (IOT) - A guide to understanding
6ME3A-Unit-II-Sensors and Actuators_Handouts.pptx
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
BIO-INSPIRED HORMONAL MODULATION AND ADAPTIVE ORCHESTRATION IN S-AI-GPT
Mechanical Engineering MATERIALS Selection
Safety Seminar civil to be ensured for safe working.
Geodesy 1.pptx...............................................
A SYSTEMATIC REVIEW OF APPLICATIONS IN FRAUD DETECTION
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
Embodied AI: Ushering in the Next Era of Intelligent Systems
Construction Project Organization Group 2.pptx
CYBER-CRIMES AND SECURITY A guide to understanding
PPT on Performance Review to get promotions

Data Science at The New York Times: what industry can learn from us; what we can learn from industry

  • 1. data science @ The New York Times: chris.wiggins@columbia.edu chris.wiggins@nytimes.com what industry can learn from us; what we can learn from industry
  • 2. data science @ The New York Times
  • 4. biology: 1892 vs. 1995 biology changed for good.
  • 5. biology: 1892 vs. 1995 new toolset, new mindset
  • 6. biology: 1892 vs. 1995 “These are indeed exciting times, not unlike the early days of recombinant DNA in the 1970s, in which a revolutionary new technology permitted entirely new questions about the nature of genes to be raised. This challenge is new to biology, and its resolution will require, in addition to existing paradigms of molecular biology, new sets of analytical tools... disciplines outside of biology will be required to collaborate on this problem.” new toolset, new mindset
  • 7. genetics: 1837 vs. 2012 ML toolset; data science mindset
  • 9. genetics: 1837 vs. 2012 ML toolset; data science mindset arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost Raj, Dewar, Palacios, Rabadan, CW
  • 16. data science: mindset & toolset drew conway, 2010
  • 17. data science: mindset & toolset develop + deploy machine learning solutions to newsroom + business problems
  • 18. data science @ The New York Times:
  • 19. data science @ The New York Timesdata science @ The New York Times
  • 20. 1851
  • 26. learnings - descriptive modeling - predictive modeling - prescriptive modeling
  • 27. (actually ML, shhhh…) - (unsupervised learning) - (supervised learning) - (reinforcement learning)
  • 28. (actually ML, shhhh…) 2012; h/t michael littman
  • 29. learnings - descriptive modeling - predictive modeling - prescriptive modeling - descriptive modeling - predictive modeling - prescriptive modeling
  • 32. Chong & Blei, SIGKDD 2011 CTM generative model: r: clicks w: words u: user-topic association v: article-topic association
  • 33. Chong & Blei. SIGKDD 2011 related: learning phenotypes from EHR
  • 34. Chong & Blei. SIGKDD 2011 related: learning phenotypes from EHR
  • 35. UPDATE COPYReaderscope In the course of our global expansion, we realized we needed to have much more sophisticated, real-time insight into what’s happening across our site. 
 Who is reading what? And where?
  • 36. LOCATIONS FAQAUDIENCE SEGMENTSTOPICS FAQ or Intro Information Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut quislabore et dolore magna aliqua. enim ad minim veniam, quis quis aliqua ullamconostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat.aliquip ex Privacy Policy Terms of Service © The New York Times CompanyNYTimes.com Send Us Feedback Searchex. High-tech Lifestyle, Parents, Media - Comedy Films Audience Segment Search by nytreaderscope Illustration by Clara Nguyen LOCATIONS FAQAUDIENCE SEGMENTSTOPICS FAQ or Intro Information Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut quislabore et dolore magna aliqua. enim ad minim veniam, quis quis aliqua ullamconostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat.aliquip ex Privacy Policy Terms of Service © The New York Times CompanyNYTimes.com Send Us Feedback Searchex. 
High-tech Lifestyle, Parents, Media - Comedy Films Audience Segment Search by nytreaderscope Illustration by Clara Nguyen f f f Tool: Readerscope AUDIENCE INSIGHTS ENGINE
  • 37. LOCATIONS FAQAUDIENCE SEGMENTSTOPICS FAQ or Intro Information Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut quislabore et dolore magna aliqua. enim ad minim veniam, quis quis aliqua ullamconostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat.aliquip ex Privacy Policy Terms of Service © The New York Times CompanyNYTimes.com Send Us Feedback Searchex. High-tech Lifestyle, Parents, Media - Comedy Films Audience Segment Search by nytreaderscope Illustration by Clara Nguyen LOCATIONS FAQAUDIENCE SEGMENTSTOPICS FAQ or Intro Information Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut quislabore et dolore magna aliqua. enim ad minim veniam, quis quis aliqua ullamconostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat.aliquip ex Privacy Policy Terms of Service © The New York Times CompanyNYTimes.com Send Us Feedback Searchex. 
High-tech Lifestyle, Parents, Media - Comedy Films Audience Segment Search by nytreaderscope Illustration by Clara Nguyen f f f Tool: Readerscope AUDIENCE INSIGHTS ENGINE
  • 38. LOCATIONS FAQAUDIENCE SEGMENTSTOPICS FAQ or Intro Information Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut quislabore et dolore magna aliqua. enim ad minim veniam, quis quis aliqua ullamconostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat.aliquip ex Privacy Policy Terms of Service © The New York Times CompanyNYTimes.com Send Us Feedback Searchex. High-tech Lifestyle, Parents, Media - Comedy Films Audience Segment Search by nytreaderscope Illustration by Clara Nguyen LOCATIONS FAQAUDIENCE SEGMENTSTOPICS FAQ or Intro Information Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut quislabore et dolore magna aliqua. enim ad minim veniam, quis quis aliqua ullamconostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. aliquip ex ea commodo consequat.aliquip ex Privacy Policy Terms of Service © The New York Times CompanyNYTimes.com Send Us Feedback Searchex. 
High-tech Lifestyle, Parents, Media - Comedy Films Audience Segment Search by nytreaderscope Illustration by Clara Nguyen f Tool: Readerscope AUDIENCE INSIGHTS ENGINE
  • 39. Tool: Readerscope, audience insights engine (screenshot of an audience segment search for "C-Suite", returning C-Suite, Executives and BDMs - Entertainment and C-Suite, Executives and BDMs - Media; illustration by Clara Nguyen)
  • 40. learnings - descriptive modeling - predictive modeling - prescriptive modeling
  • 44. interpretable predictive modeling (super cool stuff): Middendorf, Kundaje, Shah, Freund, CW, Leslie, arxiv.org/abs/q-bio/0701021
  • 46. work w/Daeil Kim & Hiroko Tabuchi
  • 49. work w/Daeil Kim & Hiroko Tabuchi; driving question: which records should she investigate?
  • 50. DATA COLLECTION: We conducted user research to gather millions of observations about how different articles made people feel, asking "When reading this article, did you feel…" Anger, Sadness, Happiness, Despair, Hurt, No Emotion, Jealousy, Frustration, Anxiety, Hope, Hate, Interest, Guilt, Contentment, Contempt, Love, Compassion, Shame, Amusement, Stress, Irritation, Fear, Boredom, Surprise, Confusion, Disgust, Irony, Pride, Disappointment
  • 51. *units based on 100th percentile
  • 52. Adventurous 98, Interest 42, Happiness 96, Self Confident 39, Love 97, Hate, Inspired 100, Amused 100, Sadness 27 (*units based on 100th percentile)
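The deck reports each emotion score in percentile units but does not say how those units were computed. As a minimal sketch, assuming each article has a raw per-emotion score and that "percentile" means the share of a reference corpus scoring at or below that value (so the top article reads 100), the conversion could look like this; `percentile_rank`, `emotion_profile`, and the corpus layout are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(corpus_scores, value):
    """Share of corpus scores at or below `value`, scaled to 0-100 (assumed convention)."""
    ordered = sorted(corpus_scores)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def emotion_profile(corpus, article):
    """Map each emotion's raw article score to a percentile against the corpus.

    corpus:  {emotion: [raw scores for every article in the reference set]}  (hypothetical)
    article: {emotion: raw score for the article being profiled}             (hypothetical)
    """
    return {emo: round(percentile_rank(corpus[emo], s)) for emo, s in article.items()}


corpus = {"Hope": [0.1, 0.2, 0.3, 0.4], "Fear": [0.5, 0.6, 0.7, 0.8]}
print(emotion_profile(corpus, {"Hope": 0.4, "Fear": 0.55}))  # {'Hope': 100, 'Fear': 25}
```

Percentile units make scores comparable across emotions whose raw scales differ, which is one plausible reason a slide would mix "Interest 42" and "Inspired 100" on one chart.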
  • 54. A Record First Year (chart: Perspective Targeting impression volume by month, April to December). Throughout the year, NYT began running more and more perspective targeting campaigns every month, and performance kept breaking boundaries and setting new benchmarks for success. Sources: Google DFP, NYT Ad Performance Data, Sizmek
  • 55. Perspective Targeting outperformed brands’ own first-party segments, demonstrating that we’ve managed to find a new way to deliver meaning to these audiences.
  • 57. learnings - descriptive modeling - predictive modeling - prescriptive modeling …two examples
  • 60. leverage methods that are predictive yet performant, w/ “audience development” team
  • 61. driving question: which content should we promote, where and when?
  • 62. NB: data informed, not data-driven
  • 63. learnings - descriptive modeling - predictive modeling - prescriptive modeling … recommendation as prescription
  • 64. 2018: algos for *highly editorially curated* content pools - Smarter Living - midterms - Editors' Picks
  • 65. 2019: all of the above, plus: - For You Tab - stay tuned…
  • 71. old (1933) idea: do the best you can
  • 72. Lihong Li (YHOO -> MSFT -> GOOG), 2011: Thompson sampling & “bandits”
  • 73. old (1933) idea: do the best you can
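The "old (1933) idea" is Thompson sampling: sample a plausible payoff for each option from its posterior and play the best sample. A minimal Beta-Bernoulli sketch follows, assuming each arm is a candidate piece of content with an unknown click rate and a uniform Beta(1, 1) prior; the simulated rates, `thompson_sampling`, and its parameters are hypothetical, and the slides do not say what reward or model NYT actually uses.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over len(true_rates) arms with simulated clicks."""
    rng = random.Random(seed)
    k = len(true_rates)
    alpha = [1] * k  # 1 + observed successes (uniform Beta(1, 1) prior, an assumption)
    beta = [1] * k   # 1 + observed failures
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one plausible click rate per arm from its Beta posterior; play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        clicked = rng.random() < true_rates[arm]  # simulated reader response
        # Conjugate update: a click bumps alpha, a non-click bumps beta.
        alpha[arm] += int(clicked)
        beta[arm] += 1 - int(clicked)
        pulls[arm] += 1
    return pulls, alpha, beta


pulls, alpha, beta = thompson_sampling([0.05, 0.10, 0.20], 3000)
print(pulls)  # most of the 3000 plays should go to the 0.20 arm
```

Each round costs one Beta draw per arm, so the method stays cheap; exploration happens automatically because arms with few observations have wide posteriors and occasionally produce the winning sample.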
  • 74. Related/extensions: lessons from statistical physics - variational methods for complex algebraic models: Urteaga & W. arXiv:1709.0316 / AISTATS 2018 - Monte Carlo methods: nonparametric mixture models (via MC), Urteaga & W. arXiv:1808.02932 [stat.ML]; particle filtering for arbitrary computational models, Urteaga & W. arXiv:1808.02933
  • 75. approximate variational methods for bandits Urteaga & W. arXiv:1709.03163
  • 76. cf. modelingsocialdata.org monte carlo filtering w/drift? -> ‘genetic’ bandit
  • 77. dynamic environments Urteaga & W. arXiv:1808.02932
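The slides only name the idea ("monte carlo filtering w/drift? -> 'genetic' bandit"); the Urteaga & W. papers cited here contain the actual methods. As a rough illustration under our own assumptions, and not their algorithm, the sketch below tracks each arm's drifting click rate with a generic bootstrap particle filter: each arm's log-odds follows a Gaussian random walk (step size `drift`, an assumed model), and Thompson sampling picks one particle per arm as its posterior draw. One plausible reading of "genetic" is that the resampling step acts like selection in a genetic algorithm. `smc_bandit`, `rate_fn`, and all parameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Clamp to avoid overflow in math.exp for extreme log-odds.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def smc_bandit(rate_fn, k, n_rounds, n_particles=200, drift=0.05, seed=0):
    """Thompson-style bandit for drifting Bernoulli arms via a bootstrap particle filter.

    rate_fn(t, arm) returns the hidden true click rate at round t; it is used only to
    simulate rewards. Returns the list of arms played.
    """
    rng = random.Random(seed)
    # particles[i]: samples of arm i's log-odds, initialized from a N(0, 1) prior (assumed).
    particles = [[rng.gauss(0.0, 1.0) for _ in range(n_particles)] for _ in range(k)]
    choices = []
    for t in range(n_rounds):
        # Propagate: every particle takes one random-walk step, so unplayed arms
        # grow more uncertain over time and eventually get re-explored.
        for i in range(k):
            particles[i] = [th + rng.gauss(0.0, drift) for th in particles[i]]
        # Thompson step: a uniformly chosen particle is a draw from the approximate posterior.
        draws = [sigmoid(rng.choice(particles[i])) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        clicked = rng.random() < rate_fn(t, arm)
        # Weight the played arm's particles by the Bernoulli likelihood of the outcome
        # (floored so the weights never all vanish), then resample: the "selection" step.
        weights = [max(sigmoid(th) if clicked else 1.0 - sigmoid(th), 1e-12)
                   for th in particles[arm]]
        particles[arm] = rng.choices(particles[arm], weights=weights, k=n_particles)
        choices.append(arm)
    return choices


def switching_rates(t, arm):
    # Hypothetical environment: arm 0 is better for 500 rounds, then arm 1 takes over.
    if t < 500:
        return 0.8 if arm == 0 else 0.2
    return 0.2 if arm == 0 else 0.8


choices = smc_bandit(switching_rates, k=2, n_rounds=1000)
print(choices[400:500].count(0), choices[900:1000].count(1))
```

The random-walk step is what lets the filter forget: a plain Beta-Bernoulli posterior would keep trusting arm 0's early record after the switch, while here old evidence is gradually diluted.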
  • 79. common requirements in data science: 1. people 2. ideas 3. things cf. John Boyd, USAF
  • 80. monica rogati, Aug 1 2017 hackernoon.com things: de>da>ds/ml/ai
  • 85. watch this space: NYT+AI; physics, math/fin, p chem, app math, cog sci, EE
  • 86. people... so far (we’re hiring!!!!): astrophys, math/fin, pure math, app math, cog sci, EE, biophys, seismology, neuro, physics
  • 88. data science @ The New York Times: chris.wiggins@columbia.edu chris.wiggins@nytimes.com chris.wiggins@hackNY.org @chrishwiggins what industry can learn from us; what we can learn from industry also: we’re hiring!