Open Data Science Conference 2015
Why We Need More Data
Lots of Data
The Effect of Better Algorithms
4 CrowdFlower, Inc. – Proprietary and Confidential
[Chart: Classifier Error Rate (axis 0% to 25%) by algorithm: Naïve Bayes, Maximum Entropy, SVM]
Real World Data
Active Semi-Supervised Learning for
Improving Word Alignment
(Vamshi ACL ’10)
The Effect of Better Features
[Chart: Classifier Error Rate (axis 0% to 30%) by feature set: Unigrams, Bigrams, Unigrams+Bigrams]
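The chart compares three feature sets without defining them. As a minimal sketch, assuming lowercased text and whitespace tokenization (the slides state neither), unigram, bigram, and combined features might be extracted as follows. The function name `ngram_features` is hypothetical.

```python
from collections import Counter

def ngram_features(text, use_unigrams=True, use_bigrams=True):
    """Count word unigrams and/or bigrams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    features = Counter()
    if use_unigrams:
        features.update(tokens)
    if use_bigrams:
        # Join adjacent token pairs with a space so they stay distinct from unigrams.
        features.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features

# Illustrative sentence, not taken from the slides.
example = ngram_features("the crowd labels the data")
```

Passing `use_bigrams=False` or `use_unigrams=False` gives the other two feature sets named on the chart.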
Real World Data
The Effect of More Data
[Chart: Classifier Error Rate (axis 0% to 14%) by training set size: N, 2N, 4N]
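The chart varies training set size across N, 2N, and 4N. As a minimal sketch, assuming the larger sets are supersets of the smaller ones (the slides do not say how the sets were drawn), nested training sets could be built like this. The helper `nested_training_sets` is hypothetical.

```python
import random

def nested_training_sets(examples, base_size, multiples=(1, 2, 4), seed=0):
    """Shuffle once, then return prefixes of size base_size * m, so each set contains the previous one."""
    if base_size * max(multiples) > len(examples):
        raise ValueError("not enough examples for the largest training set")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return {m: shuffled[: base_size * m] for m in multiples}

# Placeholder data: 1000 integer IDs standing in for labeled examples.
sets = nested_training_sets(range(1000), base_size=100)
```

Nesting the sets means any change in error rate comes from the added examples rather than from a different random draw.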
Real World Data
Active Semi-Supervised Learning for
Improving Word Alignment
(Vamshi ACL ’10)
The Effect of Cleaner Data
[Chart: Classifier Error Rate (axis 0% to 14%) by training data accuracy: 90% Accurate Data, 95% Accurate Data, 100% Accurate Data]
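The accuracy levels on this chart describe the training data. As a minimal sketch, assuming they refer to label accuracy, with binary 0/1 labels and uniformly random errors (the slides specify none of this), a training set at a given accuracy could be simulated as follows. `corrupt_labels` is a hypothetical helper, not part of any CrowdFlower tooling.

```python
import random

def corrupt_labels(labels, accuracy, seed=0):
    """Return a copy of binary labels in which a (1 - accuracy) share is flipped."""
    rng = random.Random(seed)
    n_wrong = round(len(labels) * (1.0 - accuracy))
    wrong_idx = rng.sample(range(len(labels)), n_wrong)  # distinct positions to flip
    noisy = list(labels)
    for i in wrong_idx:
        noisy[i] = 1 - noisy[i]
    return noisy

# Placeholder labels: 200 alternating 0/1 values.
clean = [i % 2 for i in range(200)]
noisy_95 = corrupt_labels(clean, accuracy=0.95)
noisy_90 = corrupt_labels(clean, accuracy=0.90)
```

Training the same classifier on `clean`, `noisy_95`, and `noisy_90` and comparing held-out error rates would give a comparison of the kind the chart presents.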
Where do Data Scientists Spend Their Time
The Power of Open Data
CrowdFlower Data Enrichment Platform
Color Data
Fleshmap
Drug Side Effects
Apple Watch
Data for Everyone
Collecting the Same Data Over and Over
Open Data
Make Your Data Public Setting
Data for Everyone
Data For Everyone Library
Categorize URLs
URL Categorization
Open Data API
Record Data
Extracting Names and Titles
42
Summarization
Is an Image Funny?
Classifying Medical Images
Attributes of People
47
396 Scripts
Lukas Biewald
lukas@crowdflower.com
@L2K
Thank You

Editor's Notes

  • Slide 42: Over 200,000 records
  • Slide 47: 59,000 records