Life of Data Scientist
myths and reality
Jeong, Buhwan, Ph.D.
Data Hacker / Kakao Corp.
Data scientists
are big data wranglers. They take an enormous
mass of messy data points and use their formidable
skills in math, statistics and programming to clean,
massage and organize them. Then they apply all
their analytic powers and domain knowledge to
uncover hidden solutions to business challenges.
Script (modified) from http://www.mastersindatascience.org/careers/data-scientist/
A data scientist is someone who is
better at statistics than any software engineer and
better at software engineering than any statistician.
+ domain knowledge and business understanding
http://www.mastersindatascience.org/careers/data-scientist/
Diagram from https://www.quora.com/What-is-a-data-scientist-3
Image from http://paper4pc.com/superman-logosuperman.html
http://www.sintetia.com/wp-content/uploads/2014/05/Data-Scientist-What-I-really-do.png
Life of a data scientist (pub)
DB
Log
SQL
Data
TXT / EXL
Visualization
Implement
Test & Deploy
[KR]
Algorithm
- Regression
- Classification
- Clustering
[Insight]
Big Data?
Volume Variety
Velocity Value
Value
BIG & FAST SMART
Count & Trend Predictive
Technical Meaningful
Analytics
Volume Variety
Velocity
Engineering Science
Data Science
Scientific Method
Proved by
Theory
Verified with
Experiment
Algorithm
(Equations)
Testing
(Evidence)
Experiment & Test
Hypothesis
Experiment
Graduation
Observation
Deployment
Test (Comparison)
Observation
Off-line Test
Deployment
On-line Test
Test Deploy
Modeling Test set
Observe
A (Treatment)
B (Control)
← Offline : Online →
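The off-line half of the diagram (a modeling set and a held-out test set) can be sketched as follows. This is a minimal illustration, not the author's pipeline: the toy data, the 80/20 split, the fixed seed, and the helper names `split_modeling_test`, `fit_threshold`, and `accuracy` are all assumptions made for the example.

```python
import random

def split_modeling_test(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of the records and split it into (modeling, test) sets."""
    rng = random.Random(seed)  # fixed seed (assumption) so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(modeling_set):
    """Fit a one-feature threshold rule: midpoint between the two classes."""
    lows = [x for x, label in modeling_set if label == 0]
    highs = [x for x, label in modeling_set if label == 1]
    return (max(lows) + min(highs)) / 2

def accuracy(predict, test_set):
    """Fraction of (feature, label) pairs in test_set that predict gets right."""
    hits = sum(1 for x, label in test_set if predict(x) == label)
    return hits / len(test_set)

# Hypothetical toy data: the label is 1 when the single feature exceeds 50.
records = [(x, int(x > 50)) for x in range(100)]
modeling_set, test_set = split_modeling_test(records)

threshold = fit_threshold(modeling_set)

def baseline(x):
    return 0  # always predicts the negative class

def candidate(x):
    return int(x > threshold)  # rule fitted on the modeling set only

# Off-line comparison: both rules are scored on the same held-out test set.
print("baseline accuracy:", accuracy(baseline, test_set))
print("candidate accuracy:", accuracy(candidate, test_set))
```

Only a candidate that wins this off-line comparison would move on to deployment and an on-line test.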
Solve
M T W T F S S
Code Release
Off-line Test
On-line Test
Deployment
Monitoring & Improvement
Netflix’s Weekly Test & Deployment
Image from https://vwo.com/ab-testing/
On-line A/B Test
Image from https://vwo.com/ab-testing/
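One common way to read the outcome of an on-line A/B test is a two-proportion z-test on conversion counts. The slides do not say which statistic is used, so treat this as an assumed, minimal sketch; the function name and the example counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_t, n_t, conv_c, n_c):
    """Compare treatment and control conversion rates.

    conv_t / n_t: conversions and requests in the treatment group.
    conv_c / n_c: conversions and requests in the control group.
    Returns (z, two-sided p-value) under the pooled-variance normal approximation.
    """
    p_t = conv_t / n_t
    p_c = conv_c / n_c
    p_pool = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 10% of traffic in treatment, 90% in control.
z, p = two_proportion_z_test(conv_t=150, n_t=1000, conv_c=1000, n_c=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation assumes reasonably large groups; with very small counts an exact test would be a safer choice.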
From Yahoo! (Creative Best Practices: Native Ads)
A/B Test Configuration
Traffic-driven
For every incoming request,
if random() < 0.1, assign it to the treatment group (10%);
otherwise, assign it to the control group (90%).
User-driven
For every requester whose userId ends with the two digits 'NN',
if 'NN' is in '00' ~ '09', assign the user to the treatment group;
otherwise, assign the user to the control group.
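The two configurations above can be sketched in a few lines. This is an illustrative sketch rather than production code: the function names are hypothetical, and it assumes userIds are strings whose last two characters are digits.

```python
import random

TREATMENT_RATIO = 0.1  # 10% treatment, 90% control, as on the slide

# Suffixes '00' through '09' go to the treatment group.
TREATMENT_SUFFIXES = {f"{i:02d}" for i in range(10)}

def assign_by_traffic(rng=random):
    """Traffic-driven: draw a fresh random number for every incoming request."""
    return "treatment" if rng.random() < TREATMENT_RATIO else "control"

def assign_by_user(user_id):
    """User-driven: the last two characters of the userId decide the group."""
    return "treatment" if user_id[-2:] in TREATMENT_SUFFIXES else "control"

print(assign_by_traffic())         # may differ from one request to the next
print(assign_by_user("user1203"))  # same user, same group on every request
```

The trade-off is stability: traffic-driven assignment can show one person both variants across requests, while user-driven assignment keeps each user in a single group for the whole test.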
Random
Control Group
Treatment (A)
A/B Test
Random
Control Group
Treatment A
Treatment B
Treatment C
Multivariate Test
Multivariate test: https://www.optimizely.com/resources/multivariate-testing/
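Splitting users among a control group and several treatments, as in the diagram above, can be done with a stable hash of the userId. The sketch below is one possible approach, not something the slides prescribe: the arm shares, the experiment name, and the function `assign_arm` are all assumptions.

```python
import hashlib

# Hypothetical traffic shares in percent; they must sum to 100.
ARMS = [
    ("control", 70),
    ("treatment_a", 10),
    ("treatment_b", 10),
    ("treatment_c", 10),
]

def assign_arm(user_id, experiment="demo-experiment"):
    """Map a user to one arm, deterministically, via a hash bucket in [0, 100)."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    upper = 0
    for arm, share in ARMS:
        upper += share
        if bucket < upper:
            return arm
    raise ValueError("arm shares must sum to 100")

print(assign_arm("user1203"))
```

Including the experiment name in the hash key means the same user can land in different arms of different experiments, which keeps concurrent tests from being correlated.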
Red Daum vs Blue Daum
Data over Algorithm
Forbes.com: http://goo.gl/bauDHw
DB
Log
SQL Data
Implement
Test & Deploy
[KR]
Algorithm
- Regression
- Classification
- Clustering
[Insight]
20 60
15
5
Forbes.com: http://goo.gl/bauDHw
Hacking Data for business goals
- Right data
- Right algorithm
- Right evaluation
Good UI/UX
is defined by
User Adoption
Human Hacker
Image from https://goo.gl/vClux5
Enjoy your Jeju
