DATA SCIENCE
MORE THAN MINING

                                                 “The sexiest job
                                                    in the next
                                                 10 years will be
                                                  statisticians.”
                                                       — Hal Varian,
                                                      Chief economist,
                                                          Google




While the concept of data science has been around for
decades, the role of data scientist has become a
sought-after career, giving rise to a new generation of
data scientists.

This demand reflects the staggering growth of “big data.”
Technology innovation and the World Wide Web have produced
new types of data, such as user-generated content, along
with tools that can be used to interpret them.

Social media platforms such as Facebook (the largest social
network, then valued at $52 billion) depend on data science to
create innovative, interactive features that encourage users
to get interested and stay that way, which is part of why
data science matters.

But what does the term ‘Data Science’ really mean?




What is data science?
Data science can be broken down into four essential parts.



Mining data: Collecting and formatting the information

Statistics: Information analysis

Interpret: Representation or visualization in the form of presentations,
infographics, graphs or charts

Leverage: Implications of the data, application of the data, interaction
using the data and predictions formed from studying it
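The four parts above can be read as stages of a small pipeline. The following is only a minimal sketch under stated assumptions: the records, the function names and the decision threshold are all hypothetical and chosen purely for illustration.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from a web form (illustrative only).
raw_records = ["  12 ", "15", "n/a", "9", "18 "]


def mine(records):
    """Mining data: collect and format, dropping entries that are not whole numbers."""
    cleaned = []
    for item in records:
        text = item.strip()
        if text.isdigit():
            cleaned.append(int(text))
    return cleaned


def analyze(values):
    """Statistics: summarize the formatted data."""
    return {"count": len(values), "mean": mean(values), "max": max(values)}


def interpret(values):
    """Interpret: represent each value as a line of a simple text bar chart."""
    return [f"{v:>3} " + "#" * v for v in values]


def leverage(summary, threshold=10):
    """Leverage: turn the analysis into a decision (the threshold is an assumed cutoff)."""
    return "expand" if summary["mean"] > threshold else "hold"


values = mine(raw_records)
summary = analyze(values)
for line in interpret(values):
    print(line)
print(summary, leverage(summary))
```

Each stage only consumes the output of the one before it, which mirrors the order in which the four parts are listed.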




Defining a data scientist
A good data scientist understands the importance of:



Scouring
Their eyes search for information on the web
  Vectorized operations
  Algorithmic strategizing
  APIs

Organization
Their voice asks questions about what they hope to accomplish at the end of
the project, setting information goals.

Extraction
They take the information they want and organize it using formulas, in order
to form educated, insightful conclusions using these statistical and
mathematical methods:
   Factor Analysis
   Regression Analysis
   Correlation
   Time Series Analysis

Expansion & Application
The appropriate data flows out of the person in the form of keywords,
Facebook “Likes” and other statistics.
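Two of the methods named under Extraction, correlation and regression analysis, can be sketched in a few lines. This is an illustrative sketch only: the ad-spend and sales figures are made up, and the function names are hypothetical rather than taken from the source.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def linear_regression(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: weekly ad spend versus units sold (illustrative only).
ad_spend = [1, 2, 3, 4, 5]
units_sold = [2, 4, 5, 4, 5]

r = pearson_correlation(ad_spend, units_sold)
slope, intercept = linear_regression(ad_spend, units_sold)
print(f"correlation = {r:.3f}")
print(f"units_sold ~ {slope:.2f} * ad_spend + {intercept:.2f}")
```

Correlation measures how strongly the two series move together, while the regression line gives a formula that can be used to predict one from the other.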




Creating new theories and predictions based upon the data

Ask questions to further expound upon the data beyond the reaches of hard
numbers or facts.

Apply the information in a useful, innovative manner to applications whose
success depends on data science.

Immediately process terabytes of data that flow in to prevent pile-up and
missed opportunities. For example, statistics regarding holiday shopping
trends are imperative around the holiday season. If the statistics are
processed and the conclusions are drawn too late, the season has passed and
the information can no longer be utilized to its full potential.




Required skills
for a data scientist
A successful data scientist must have a combination of skills that opens up
possibilities both for that individual and their team. Visualization processes are
often disjointed since each person is typically assigned to a specific part of the
project. The designer depends on the information architect. The information
architect depends on stats from the statistician, and so on. A true data scientist
should be skilled in multiple areas.


Hacking and Computer Science
Knowing how to take advantage of computers and the internet to create
data-mining formulas

Expertise in Mathematics, Statistics, Data Mining
Pulling important statistics and coherently organizing them using
mathematical prowess and computer formulas

Creativity & Insight
Knowing what statistics are important and how to leverage them




Dangers of data science
Statistics can be displayed in a misleading manner
Leading the pollee:
What type of question are you more likely to answer “yes” to?

Should Americans be taxed so others can take advantage of welfare and avoid
working?
85% No

Should taxes support the government’s aid to those who are unable to find
work?
70% Yes




Facts that are left out
Including only the starting and ending points of data makes the change seem
more drastic.
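A small worked example may make the endpoint effect concrete. The monthly figures below are hypothetical, and comparing three-month averages is just one assumed way of using the points that an endpoints-only chart leaves out.

```python
# Hypothetical monthly sales figures (illustrative only, not from the source).
monthly_sales = [60, 95, 100, 105, 98, 102, 120]


def percent_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100


# Change reported when only the first and last points are shown.
endpoint_change = percent_change(monthly_sales[0], monthly_sales[-1])

# Change between the average of the first three and the last three months,
# which takes the omitted middle of the series into account.
first_avg = sum(monthly_sales[:3]) / 3
last_avg = sum(monthly_sales[-3:]) / 3
smoothed_change = percent_change(first_avg, last_avg)

print(f"endpoints only: {endpoint_change:+.1f}%")
print(f"3-month averages: {smoothed_change:+.1f}%")
```

With these made-up numbers the endpoints suggest that sales doubled, while the averaged view shows a much smaller rise, because the first month happened to be unusually low.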




A collage of carefully selected information combined to induce a certain
opinion
Selection bias occurs when an unrepresentative sample is taken for a survey
or study and the results are then advertised to the public as if they
represented the total population. An example is a toothpaste brand's
“9 of 10” claim, which shows how ‘studies’ can often be weighted in a
company's favor.
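The effect of surveying an unrepresentative group can be shown with a tiny constructed population. Every number here is hypothetical and was chosen so that the biased survey lands on a "9 of 10" style result; nothing below comes from a real study.

```python
# Hypothetical population of 1,000 people.
# Each entry is (is_loyal_customer, recommends_brand).
population = (
    [(True, True)] * 270      # loyal customers who recommend the brand
    + [(True, False)] * 30    # loyal customers who do not
    + [(False, True)] * 280   # everyone else who recommends it
    + [(False, False)] * 420  # everyone else who does not
)


def recommend_rate(people):
    """Fraction of the given people who recommend the brand."""
    return sum(1 for _, recommends in people if recommends) / len(people)


# Surveying only loyal customers is an unrepresentative sample.
loyal_only = [person for person in population if person[0]]

biased_rate = recommend_rate(loyal_only)
true_rate = recommend_rate(population)

print(f"loyal customers only: {biased_rate:.0%} recommend")
print(f"whole population:     {true_rate:.0%} recommend")
```

Both figures are computed honestly from the data; the distortion comes entirely from who was asked.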




Ironically, facts and stats can be used to
paint a very inaccurate — and damaging —
picture of a business, organization or
general topic.




Facts about data science

The first big data collection project in history was the U.S. Census,
which started in 1790.




When hard drives were first invented, a 5 megabyte drive took up roughly the
space of a luxury refrigerator. Today, a 32 gigabyte micro-SD card measures
around 5/8 x 3/8 inch and weighs about 0.5 grams.
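The capacity gap between the two devices is simple arithmetic. The sketch below assumes decimal units (1 GB = 1,000 MB), as storage capacities are commonly quoted; with binary units the ratio would come out slightly higher.

```python
# Assumption: decimal storage units, 1 GB = 1,000 MB.
MB_PER_GB = 1000

old_drive_mb = 5             # the early 5 megabyte drive
card_mb = 32 * MB_PER_GB     # the 32 gigabyte micro-SD card, in megabytes

capacity_ratio = card_mb / old_drive_mb
print(f"The card holds {capacity_ratio:,.0f} times as much data.")
```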




When collecting mass quantities of data, some human input is needed to
clean and correct it. This need gave birth to crowdsourcing. A well-known
example is Amazon Mechanical Turk.




Modern big data collection is possible with cloud computing, which spreads
the data across several physical resources that can be accessed remotely,
rather than concentrating it at one location.



“The computing and processing of
data is literally 100 to 1,000 times
faster and cheaper than before.”
— Scott Yara, Greenplum


Data scientist
