Data Science @ Scale
@davidcoallier
Part of an amazing team at Barricade.io
Data Science is
Hard
Data Hacking is
“Easy”
Data Analysis is
“Easy”
Data Expertise is
“Easy”
Got all?
Having the three is real hard!
Data Science at Scale @ barricade.io
Is that it?
Well, don’t forget your purpose.
You are not an economist.
/ɪˈkɒnəmɪst/: Someone with all the answers, and none of the questions.
The Data Scientific
Method
Find a question.
Use the data you have
Features & Tests
Analyse Results
You will be sad.
Conversate
Talk about your findings.
Good Chats
Imply egoless and collaborative data scientists.
Recap.
1. Hacking
2. Maths & Stats
3. Expertise
And
1. Question
2. Be Pragmatic
3. Features
4. Analyse
5. Share.
A team!
Rarely a single-person effort.
An Example
Fraud Prevention — Business Prevention
I knew better.
Obviously… duh
We didn’t share.
Science has historically been shared.
Not with p-values
Empathise.
Use human language, not lingo.
For us at
Barricade
Doing this at
scale is hard.
We’re still small
About a billion data points a day.
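To put that figure in perspective, here is a quick back-of-the-envelope conversion to an average per-second rate. It assumes the load is spread evenly over the day, which real traffic rarely is, so peaks will be higher.

```python
# Rough conversion of "a billion data points a day" into an average rate.
# Assumption: traffic is uniform across the day (real peaks will exceed this).
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

events_per_day = 1_000_000_000
avg_events_per_second = events_per_day / SECONDS_PER_DAY

print(f"{avg_events_per_second:,.0f} events/second on average")
# prints "11,574 events/second on average"
```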
Humble Beginnings
Typically… a queue and an API.
This had issues.
Hard to scale, hard to decouple, etc.
Enter the
Lambda Architecture.
Speed Layer
Batch Layer
Speed Layer: U new behaviour from new data
Batch Layer: All classified behaviour since T
Serving Layer
Serving Layer: Batch Layer ∪ Speed Layer
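A minimal sketch of how the three layers relate, not Barricade's actual implementation. It assumes each event is a dict with hypothetical `ts`, `source` and `label` fields, and that a single cutoff (`last_batch_run`, also hypothetical) separates what the batch layer has already processed from what only the speed layer has seen.

```python
# Toy model of the Lambda Architecture views described on the slides.
# Field names and the cutoff variable are illustrative assumptions.

def batch_view(events, last_batch_run):
    """Behaviour classified by the batch job: everything up to the cutoff."""
    return {(e["source"], e["label"]) for e in events if e["ts"] <= last_batch_run}


def speed_view(events, last_batch_run):
    """Behaviour from new data that arrived after the last batch run."""
    return {(e["source"], e["label"]) for e in events if e["ts"] > last_batch_run}


def serving_view(batch, speed):
    """The serving layer answers queries from the union of both views."""
    return batch | speed


events = [
    {"ts": 1, "source": "10.0.0.1", "label": "scan"},
    {"ts": 2, "source": "10.0.0.2", "label": "normal"},
    {"ts": 5, "source": "10.0.0.1", "label": "scan"},
    {"ts": 6, "source": "10.0.0.3", "label": "brute-force"},
]
last_batch_run = 3

batch = batch_view(events, last_batch_run)
speed = speed_view(events, last_batch_run)
served = serving_view(batch, speed)
```

The point of the union is that `served` matches what a full recomputation over all events would give, without waiting for the next batch run.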
Cache Layer
On Amazon AWS
Identifying an
Attack.
Ahh! What’s that?
Kafka Queue.
Distributed messaging system
Append-only log
Consumers have offsets
Partition for parallelism
Replicate for redundancy
Message order guaranteed, per-partition
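The properties above can be illustrated with a small in-memory toy. This is not Kafka and does not use any Kafka client API: `ToyLog` and `ToyConsumer` are hypothetical names, `crc32` is an arbitrary stable hash picked for the sketch, and replication is left out entirely.

```python
from collections import defaultdict
from zlib import crc32


class ToyLog:
    """In-memory stand-in for a partitioned, append-only log."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash: the same key always lands in the same partition,
        # which is what gives per-key (per-partition) ordering.
        return crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Records are only ever appended, never modified in place.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class ToyConsumer:
    """Each consumer tracks its own read position per partition."""

    def __init__(self, log):
        self.log = log
        self.offsets = defaultdict(int)  # partition -> next offset to read

    def poll(self, partition, max_records=10):
        start = self.offsets[partition]
        records = self.log.partitions[partition][start:start + max_records]
        self.offsets[partition] = start + len(records)
        return records


log = ToyLog(num_partitions=3)
for i in range(5):
    log.append("customer-a", f"request-{i}")

p = log.partition_for("customer-a")
consumer = ToyConsumer(log)
first = consumer.poll(p, max_records=2)   # first two records, in order
rest = consumer.poll(p)                   # resumes from the stored offset

other = ToyConsumer(log)                  # independent offsets
replay = other.poll(p)                    # reads the partition from the start
```

Because offsets live with the consumer rather than the log, a second consumer can replay the same partition from the beginning without affecting the first.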
Barricade
Customer
Questions?
@davidcoallier
@barricadeio
