Ch. Eick Introduction to Data Science 8/28/2023
What is Data Science?
 What words come to mind when you think of Data
Science?
 What experience do you have with Data
Science?
 Why are you taking the Data Science I class?
Data Science Definitions
Data Science is an interdisciplinary field about processes and systems to extract
knowledge or insights from data in various forms, either structured or
unstructured, which is a continuation of some of the data analysis fields such as
statistics, data mining, and predictive analytics, similar to Knowledge Discovery
in Data (KDD).
Data Science is a multi-disciplinary field that uses scientific methods, processes,
algorithms and systems to extract knowledge and insights from structured and
unstructured data. Data science is the same concept as data mining and big
data: "use the most powerful hardware, the most powerful programming
systems, and the most efficient algorithms to solve problems". Data science is a
"concept to unify statistics, data analysis, machine learning and their related
methods" in order to "understand and analyze actual phenomena" with data
technology (Wikipedia).
Data Science Definitions II
 There are many, but most say data science is:
– Broad – broader than any one existing discipline
– Interdisciplinary: Computer Science, Statistics,
Information Science, Databases, Mathematics
– Applied focus on extracting knowledge from data to
inform decision making.
– Focuses on the skills needed to collect, manage,
store, distribute, analyze, visualize, reuse data and on
data storytelling.
 There are many visual representations in Data
Science
Some More Definitions
More Definitions
Data Science is Broad!
Data Science Word Cloud
Data Analysis
 We analyze data to extract meaning from it.
 Virtually all data analysis focuses on data
reduction
 Data reduction comes in the form of:
– Descriptive statistics
– Measures of association
– Graphical visualizations
 The objective is to abstract from all of the data
some feature or set of features that captures
evidence of the process you are studying
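As a rough illustration of the three forms of data reduction listed above, the sketch below reduces a small dataset to descriptive statistics, a measure of association (Pearson correlation), and a crude text histogram standing in for a graphical visualization. The data (hours studied vs. exam score) and all function names are invented for this example; they are not part of the course material.

```python
import math
import statistics

# Hypothetical toy data, invented purely for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0]


def describe(values):
    """Descriptive statistics: reduce a column to a few summary numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson(xs, ys):
    """Measure of association: Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def text_histogram(values, bin_width):
    """Graphical reduction: one line per bin, one '#' per observation."""
    counts = {}
    for v in values:
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return [
        f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}"
        for start, count in sorted(counts.items())
    ]


summary = describe(scores)
r = pearson(hours, scores)
print(summary)
print(f"Pearson r = {r:.3f}")
print("\n".join(text_histogram(scores, 10)))
```

Each function throws away most of the raw values and keeps one feature of them, which is the sense in which analysis is data reduction.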
The Data Lifecycle
 Data science considers data at every stage of what is
called the data lifecycle.
 This lifecycle generally refers to everything from
collecting data to analyzing it to sharing it so others
can re-analyze it.
 New visions of this process in particular focus on
integrating every action that creates, analyzes, or
otherwise touches data.
 These same new visions treat the process as
dynamic – archives are not just digital shoe boxes
under the bed.
 There are many representations of this lifecycle.
Data Curation
Data curation is a term used to indicate management
activities related to organization and integration
of data collected from various sources, annotation of the
data, and publication and presentation of the data such
that the value of the data is maintained over time, and the
data remains available for reuse and preservation.
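The curation activities in this definition (integrating data from several sources and annotating it so it stays reusable) can be sketched in a few lines. This is only a minimal illustration under assumed names: the `CuratedRecord` class, the `integrate` function, and the two "station" sources with their readings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedRecord:
    """One data item plus the metadata that keeps it reusable over time."""
    key: str
    value: float
    source: str  # where the item was collected
    annotations: list = field(default_factory=list)


def integrate(sources):
    """Integrate items from several sources into one collection.

    `sources` maps a source name to a {key: value} dict. When two sources
    report the same key, the first one seen is kept and the conflict is
    recorded as an annotation instead of being silently dropped.
    """
    curated = {}
    for source_name, items in sources.items():
        for key, value in items.items():
            if key in curated:
                curated[key].annotations.append(
                    f"also reported by {source_name} as {value}"
                )
            else:
                curated[key] = CuratedRecord(key, value, source_name)
    return curated


# Hypothetical sources and readings, invented for illustration.
sources = {
    "station_a": {"2023-08-28": 31.5, "2023-08-29": 33.0},
    "station_b": {"2023-08-29": 32.4, "2023-08-30": 30.1},
}
catalog = integrate(sources)
for record in catalog.values():
    print(record)
```

Keeping the source and the conflict note with each record is one simple way to preserve the value of the data for later reuse; real curation workflows typically add versioning and richer metadata.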
News August 29, 2023
 Task1 ProblemSet1 should be available on Sept. 5 or earlier;
check course website!
 Plans for today’s class
– Finish Introduction to Data Science/Data Mining
– 3p: Conduct Survey R/Python Knowledge of Students in the
course
– Data Science Basics & Exploratory Data Analysis
 Plans for the August 31 class
– Continue Data Science Basics & Exploratory Data Analysis
– Preprocessing for DS/Data Mining
– Optional: Brief Discussion of Task1 of ProblemSet1
– More Discussion GHC and Announce GHC Groups
 There will be a lab as part of the Sept. 5 class; bring laptop!
What is Missing?
 Most definitions of data science underplay or
leave out discussions of:
– Substantive theory
– Metadata
– Privacy and Ethics
Privacy and Ethics
 Data, the elements of data science, and even so-
called “Big Data” are not new.
 One thing that is new is the greater variety of
data and, most importantly, the amount of data
available about humans.
 Discussion and good policy regarding privacy,
security, and the ethical use of data about people
lag behind the methods of collecting, sharing,
archiving, and analyzing data.
Big Data
 The launch of the Data Science conversation has
been sparked primarily by the so-called “Big
Data” revolution.
 As mentioned, we have always had data that
taxed our technical and computational capacities.
 “Big Data” makes front-page news, however,
because of the explosion of data about people.
 Contemporary definitions of Big Data focus on:
– Volume (the amount of data)
– Velocity (the speed of data in and out)
– Variety (the diverse types of data)
Talk Outline
1. Importance of Data Science
2. Data Science is More than Using Tools
3. Data Storytelling
4. Examples of Data Storytelling
5. Conclusion
Data Analysis and Intelligent Systems Lab
Data Science According to Swami Chandrasekaran
Data Science and Storytelling
 Google’s Chief Economist Dr. Hal R. Varian stated, "The
ability to take data—to be able to understand it, to process it,
to extract value from it, to visualize it, to communicate it—
that’s going to be a hugely important skill in the next
decades."
 “When hiring data scientists, people tend to focus primarily
on technical qualifications. It’s hard to find candidates who
have the right mix of computational and statistical skills. But
what’s even harder is finding people who have those skills
and are good at communicating the story behind the data.”
Michael Li
Data Science and Storytelling 2
 “Data scientists are involved with gathering data, massaging it into a tractable
form, making it tell its story, and presenting that story to others.” – Mike
Loukides, VP, O’Reilly Media
 “Our challenge as data scientists is to translate this haystack of information
into guidance for staff so they can make smart decisions…We “humanize” the
data by turning raw numbers into a story about our performance. Data
scientists want to believe that data has all the answers. But the most important
part of our job is qualitative: asking questions, creating directives from our
data, and telling its story.” – Jeff Bladt and Bob Filbin
 Telling Stories with Data in 3 Steps (Quick Study) - Bing video (showed
this video on Aug. 24, 2023)
 The Power in Effective Data Storytelling | Malavica Sridhar | TEDxUIUC -
Bing video starting 6:40
 20 of the Best Video Storytelling Examples EVER | Wyzowl
 20 Best Data Storytelling Examples (updated for 2023) — Juice Analytics
Not covered in August 2023, except we showed one video!
Comment
 This concludes the Introduction to Data
Science/Data Mining
 The remaining slides of this slide show and other
slides will be discussed in a lecture centering on
“Data Storytelling” in the second half of
November 2023.
 There will also be some limited coverage of Data
Science Ethics in November 2023.
Data Science is More than Using Tools
 “The problem with data is that it says a lot, but it also says nothing.
‘Big data’ is terrific, but it’s usually thin. To understand why
something is happening, we have to engage in both forensics and
guess work.” – Sendhil Mullainathan, Professor of Economics,
Harvard
 “But a theory is not like an airline or bus timetable. We are not
interested simply in the accuracy of its predictions. A theory also
serves as a base for thinking. It helps us to understand what is
going on by enabling us to organize our thoughts. Faced with a
choice between a theory which predicts well but gives us little
insight into how the system works and one which gives us this
insight but predicts badly, I would choose the latter, and I am
inclined to think that most economists would do the same.”
― Ronald H. Coase, Essays on Economics and Economists
Slides which follow were not covered in August 2023!
Data Storytelling is Currently “Hot”
Evidence:
 Quotations of Leaders in Data Science we
presented earlier
 Watch Commercials: https://www.tableau.com/solutions/customer/storytelling-data-0
 Popularity of TED Talks, most of which
center on data storytelling.
 New products/apps: https://www.esri.com/arcgis-blog/story-maps/
 …Learn Data Story telling on Tableau 2022 in 7 mins - Bing video
 A lot of data storytelling contests
Importance of Data Science
 “It is a capital mistake to theorize before one has
data.”- Arthur Conan Doyle, Author of Sherlock
Holmes
 “If you’re a scientist, and you have to have an
answer, even in the absence of data, you’re not
going to be a good scientist.” – Neil deGrasse
Tyson, Astrophysicist
 “Without big data analytics, companies are blind
and deaf, wandering out onto the Web like deer
on a freeway.” – Geoffrey Moore, Partner at MDV
Responsibilities of Data Scientists
1. We have to have some commitment to telling the truth
“Torture the data long enough and it will confess to anything.”
– Nobel Prize-winning economist Ronald Coase
“To find signals in data, we must learn to reduce the noise -
not just the noise that resides in the data, but also the noise
that resides in us. It is nearly impossible for noisy minds to
perceive anything but noise in data.”
― Stephen Few, Signal: Understanding What Matters in a
World of Noise
2. We have to know what we are doing
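The "torture the data" warning can be made concrete with a small simulation. The sketch below is an illustration only, with assumed parameter values (20 samples, 200 candidate features, a fixed random seed): it correlates many columns of pure noise with a pure-noise target and reports the strongest correlation found. All names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


def torture(n_samples=20, n_features=200, seed=0):
    """Correlate many noise features with a noise target.

    Every value is drawn independently, so no feature has any real
    relationship with the target.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    correlations = []
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        correlations.append(pearson(feature, target))
    return correlations


correlations = torture()
first = abs(correlations[0])                 # one pre-planned comparison
best = max(abs(r) for r in correlations)     # the best of 200 attempts
print(f"|r| for the first feature tested: {first:.2f}")
print(f"largest |r| after trying all {len(correlations)}: {best:.2f}")
```

Because the strongest of many attempts is reported, the "best" correlation looks far more impressive than a single pre-planned test would, even though nothing real is there. Reporting how many comparisons were tried is one part of the commitment described in point 1.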
COSC Data Science Curriculum
COSC 3337: Data Science II (starting in 2019)
Credit Hours: 3.0
Lecture Contact Hours: 3 Lab Contact Hours: 0
Prerequisite: ‘Data Structures’
Data science process, data preprocessing, exploratory data
analysis, data visualization, basic statistics, basic machine
learning concepts, classification and prediction, similarity
assessment, clustering, post-processing and interpreting data
analysis results, use of data analysis tools and programming
languages and data analysis case studies.