Getting Started with Data Science
December 2017
http://bit.ly/data-science-sd
Deskhub-main - stake2017!
Jordan Zurowski
Thinkful Community Manager
MA in Industrial/Organizational
Psychology
About me
About you
You already have a career in data
I'm interested in switching into a data career
I just want to see what all the fuss is about
About Thinkful
Thinkful helps people become developers or data
scientists through 1-on-1 mentorship and project-based
learning
These workshops are built using this approach.
Today's Goals
What is Data Science?
How and why has the field emerged?
What do data scientists do?
Next steps
Example: LinkedIn 2006
“[LinkedIn] was like arriving at a conference
reception and realizing you don’t know
anyone. So you just stand in the corner
sipping your drink—and you probably leave
early.”
-LinkedIn Manager, June 2006
Enter: Data Scientist
Jonathan Goldman
Joined LinkedIn in 2006, when it had only
8M users (450M in 2016)
Started experiments to predict
people’s networks
Engineers were dismissive: “you
can already import your
address book”
The Result
Other Examples
Uber — Where drivers should hang out
Tala — Microfinance loan approval
Why now?
Big Data: datasets whose size is
beyond the ability of typical database
software tools to capture, store,
manage, and analyze
Brief history of "big data"
Trend "started" in 2005
Web 2.0 - Majority of content is created
by users
Mobile accelerates this — data/person
skyrockets
Big Data
90% of the data in the world
today has been created in the
last two years alone
- IBM, May 2013
The Problem
The Solution
Data Scientists - Jack of All Trades
Data Science is just the beginning
“The United States alone faces a shortage
of 140,000 to 190,000 people with deep
analytical skills as well as 1.5 million
managers and analysts to analyze big
data and make decisions based on their
findings.”
- McKinsey
The Process - LinkedIn Example
Frame the question
Collect the raw data
Process the data
Explore the data
Communicate results
Case: Frame the Question
What questions do we want to answer?
Case: Frame the Question
What connections (type and number) lead to
higher user engagement?
Which connections do people want to make
but are currently limited from making?
How might we predict these types of
connections with limited data from the user?
Case: Collect the Data
What data do we need to answer these
questions?
Case: Collect the Data
Connection data (who is connected to whom?)
Demographic data (what is the profile of the
connection?)
Engagement data (how do they use the site?)
Case: Process the Data
How is the data “dirty” and how can we clean
it?
Case: Process the Data
User input
Redundancies
Feature changes
Data model changes
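A minimal sketch of what cleaning might look like for two of the issues listed above, inconsistent user input and redundancies. The field names (`email`, `company`) and the sample records are hypothetical and only illustrate the idea; they are not LinkedIn's actual data.

```python
def clean_profiles(raw_profiles):
    """Normalize free-text fields and drop duplicate profiles by email."""
    seen_emails = set()
    cleaned = []
    for profile in raw_profiles:
        # User input: trim stray whitespace and ignore letter case in emails.
        email = profile["email"].strip().lower()
        if email in seen_emails:
            continue  # Redundancy: the same person was entered twice.
        seen_emails.add(email)
        # User input: collapse repeated spaces and standardize capitalization.
        company = " ".join(profile["company"].split()).title()
        cleaned.append({"email": email, "company": company})
    return cleaned


# Hypothetical raw records, made up for this example.
raw = [
    {"email": "Ana@Example.com ", "company": "  acme   corp"},
    {"email": "ana@example.com", "company": "Acme Corp"},
    {"email": "bo@example.com", "company": "GLOBEX"},
]
print(clean_profiles(raw))
```

Real cleaning rules depend on the data at hand; feature and data model changes usually need case-by-case handling rather than a generic function like this one.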
Case: Explore the Data
What are the meaningful patterns in the
data?
Case: Explore the Data
Triangle closing
Time overlaps
Geographic overlaps
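Triangle closing means that if A knows B and B knows C, then A and C are likely to know each other. A minimal sketch of that idea, assuming a small made-up network stored as a dictionary of sets (the names and the ranking rule are illustrative, not the method LinkedIn used):

```python
from collections import Counter


def suggest_connections(graph, user):
    """Rank people `user` is not yet connected to by mutual connections."""
    direct = graph[user]
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph[friend]:
            # Skip the user and anyone they are already connected to.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties are broken alphabetically.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical network: each person maps to the set of their connections.
network = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "cy", "dee"},
    "cy": {"ana", "bo", "dee", "eli"},
    "dee": {"bo", "cy"},
    "eli": {"cy"},
}
print(suggest_connections(network, "ana"))  # [('dee', 2), ('eli', 1)]
```

Here "dee" ranks first because two of ana's connections already know her, so connecting them would close two triangles.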
Case: Communicate Findings
How do we communicate this? To whom?
Case: Communicate Findings
“People You May Know” feature increased
clickthrough by 30% (generating millions
more page views)
Tools
SQL Queries
Business Analytics Software
Machine Learning Algorithms
#1: SQL Queries
SQL is the standard querying language
to access and manipulate databases
#1: SQL Queries
SELECT full_name FROM friends WHERE age > 22;
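To try the query above without installing a database, here is a minimal sketch using Python's built-in `sqlite3` module. The `friends` table layout and the sample rows are assumptions made up for this example.

```python
import sqlite3


def names_older_than(rows, min_age):
    """Load (full_name, age) rows into an in-memory table and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friends (full_name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO friends VALUES (?, ?)", rows)
    # Same query as on the slide, with the age passed in as a parameter
    # and an ORDER BY added so the output order is predictable.
    cursor = conn.execute(
        "SELECT full_name FROM friends WHERE age > ? ORDER BY full_name",
        (min_age,),
    )
    result = [name for (name,) in cursor]
    conn.close()
    return result


# Hypothetical sample data, not from the talk.
sample_friends = [("Ada Park", 31), ("Ben Ortiz", 22), ("Cy Lee", 19), ("Dee Shah", 45)]
print(names_older_than(sample_friends, 22))  # ['Ada Park', 'Dee Shah']
```

Ben Ortiz is excluded because the condition is strictly greater than 22.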
#2: Visualization Software
Business analytics software for your database
enabling you to easily find and communicate
insights visually
#3: Machine Learning Algorithms
Machine learning algorithms provide
computers with the ability to learn
without being explicitly programmed —
“programming by example”
Iris Data Set
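"Programming by example" can be sketched with a tiny nearest-neighbor classifier: instead of hand-writing rules for each species, the code labels a new flower with the species of the most similar example it has seen. The rows below follow the layout of the Iris data set (four measurements in cm plus a species); treat the specific values as illustrative, and note that real projects would typically use a library and the full data set.

```python
import math

# A few labeled examples in the layout of the Iris data set:
# (sepal length, sepal width, petal length, petal width) in cm.
TRAINING = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def classify(measurements, training=TRAINING):
    """Return the label of the closest training example (1-nearest-neighbor)."""
    nearest = min(training, key=lambda row: math.dist(row[0], measurements))
    return nearest[1]


print(classify((5.0, 3.4, 1.5, 0.2)))  # setosa
print(classify((6.5, 3.0, 5.8, 2.2)))  # virginica
```

No rule such as "short petals mean setosa" appears anywhere in the code; the behavior comes entirely from the labeled examples. `math.dist` requires Python 3.8 or later.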
Use Cases for Machine Learning
Classification: Predict categories
Regression: Predict values
Anomaly / Fraud Detection: Find unusual occurrences
Clustering: Discover structure
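As one concrete illustration of finding unusual occurrences, here is a minimal sketch that flags values far from the average using a z-score. This is a simple statistical rule rather than a trained model, and the purchase amounts and the threshold of 2.0 are made-up assumptions.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # All values are identical, so nothing stands out.
    return [v for v in values if abs(v - center) / spread > threshold]


# Hypothetical purchase amounts in dollars; one is suspiciously large.
purchases = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
print(find_anomalies(purchases))  # [500]
```

A single extreme value also inflates the mean and standard deviation, so production fraud systems tend to rely on more robust methods than this sketch.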
It may seem like a daunting opportunity
But if you're interested...
Knowledge of statistics, algorithms, &
software
Comfort with languages & tools (Python,
SQL, Tableau)
Inquisitiveness and intellectual curiosity
Strong communication skills
It’s all teachable!
Ways to keep learning
For aspiring developers...
Source: Bureau of Labor Statistics
92% of grads placed in full-time tech jobs
Job guarantee
Link for the third-party audited jobs report:
https://www.thinkful.com/outcomes
Thinkful's track record of getting students jobs
Our students receive unprecedented support
1-on-1 Learning Mentor
1-on-1 Career Mentor
Program Manager
San Diego Community
You
1-on-1 mentorship enables flexible learning
Learn anywhere,
anytime, and on your
own schedule
You don't have to quit
your job to start a career
transition
Thinkful's Free Resource
Introduction to Python, Data
Visualization, and Stats.
Unlimited mentor-led Q&A sessions
Personal Program Manager
bit.ly/tf-ds-free-course
