Ning Zhou
2018.03.08
Preparing for the Transition:
Data Science as a Student vs. in the Industry
How I got started with data science
Bachelor, Master, PhD, Industry Research, Industry Product Development
Classroom Project, Academic Project, Industry Project
Classroom vs Academic vs Industry
Problem Data User Peer
Problem
Academic
● Technical problems abstracted from business scenarios
● Prefer challenging problems that are difficult to solve
Industry
● Problems often arrive as product requirements rather than technical problems
● Prefer low-hanging fruit that can bring large impact with relatively small effort
Classroom
● Well-defined problems with clear metrics to measure success
● “Solved” problems with known solutions
Data
Academic
● Open datasets with some quality assurance
● Mid to large volume
● Work with industry datasets too, but these are often pre-collected
Industry
● “Dirty” data
● Volume can range from very small to very large
● Data collection takes time
Classroom
● Clean data
● Relatively “small” volume
User
Academic
● Limited opportunities to test with real users
● Offline testing is still the most common way to measure performance
Industry
● Impact on real users (whether good or bad…)
● Online testing is considered “final”
Classroom
● No real user impact
● Mostly offline testing only
Peer
Academic
● Smart peers from the broader research community working on similar topics
Industry
● Smart colleagues, but they normally work on different projects
Classroom
● Smart classmates who work on the same project
Five Things I tried that didn’t help
● Team up with smart and hardworking classmates and then just be lazy
● Try random open-source models without thinking them through
● “Tune metrics” instead of tuning models
● Manipulate data manually to throw out bad or difficult samples
● Procrastinate until the deadline approaches
Five Things I tried that helped
● Be curious about what other people are working on
● Keep cost and performance in mind
● Stay updated with the latest progress from both academia and industry
● Try things hands-on
● Write papers / technical sketches / blogs
Tusen takk! (Thank you very much!)
Questions?
You can also contact me at ning@duoja.com.

