22PCOAM21 Session 2 Understanding Data Source.pptx
1. 07/26/2025 1
Department of Computer Science & Engineering (SB-ET)
III B.Tech - II Semester
DATA ANALYTICS
SUBJECT CODE: 22PCOAM21
Academic Year: 2024-2025
By
Dr. M. Gokilavani
GNITC
Department of CSE (SB-ET)
22PCOAM21 DATA ANALYTICS
UNIT – I
Syllabus
Data Management: Design Data Architecture and manage the data for
analysis, understand various sources of Data like Sensors/Signals/GPS etc.
Data Management, Data Quality (noise, outliers, missing values, duplicate
data) and Data Preprocessing & Processing.
Course Prerequisites
1. Database Management Systems.
2. Knowledge of probability and statistics.
TEXTBOOK:
• Student’s Handbook for Associate Analytics - II, III.
• Data Mining Concepts and Techniques, Han, Kamber, 3rd Edition, Morgan Kaufmann
Publishers.
REFERENCES:
• Introduction to Data Mining, Tan, Steinbach and Kumar, Addison-Wesley, 2006.
• Data Mining Analysis and Concepts, M. Zaki and W. Meira
• Mining of Massive Datasets, Jure Leskovec (Stanford Univ.), Anand Rajaraman (Milliway Labs),
Jeffrey D. Ullman (Stanford Univ.).
No. of Hours Required: 13
UNIT - I LECTURE - 02
Data & Data Collection
• Data is a collection of measurements and facts.
• Data is a tool that helps an individual or a group of individuals reach a sound
conclusion by providing them with some information.
• Data collection serves as the critical first step in this process, laying the
foundation for extracting meaningful insights from raw information.
• Whether the data is structured, like numerical records, or unstructured, like text,
audio, or video, organizations can transform raw data into actionable knowledge.
• In the process of big data analysis, “Data collection” is the initial step before
starting to analyze the patterns or useful information in the data.
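The difference between structured and unstructured data can be sketched in a few lines of Python. This is only an illustration: the sensor readings, the column names (`sensor_id`, `temperature_c`), and the review text are made-up assumptions, not data from the course.

```python
import csv
import io
from collections import Counter

# Structured data: every row follows the same schema (hypothetical sensor readings).
structured_raw = "sensor_id,temperature_c\ns1,21.5\ns2,22.0\ns1,23.5\n"
rows = list(csv.DictReader(io.StringIO(structured_raw)))
temps = [float(row["temperature_c"]) for row in rows]
mean_temp = sum(temps) / len(temps)

# Unstructured data: free text with no schema (hypothetical customer review).
# Some structure, here word frequencies, has to be derived before analysis.
review = "The sensor was easy to install and the sensor works well"
word_counts = Counter(review.lower().split())

print(f"rows: {len(rows)}, mean temperature: {mean_temp:.2f}")
print("most common word:", word_counts.most_common(1)[0])
```

The structured records can be aggregated directly, while the free text first needs a processing step (splitting into words) before anything can be counted.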
Data Collection
• The data that is collected is known as raw data. It is not useful as-is, but once its
impurities are cleaned out and it is analyzed further, it becomes information, and the
information obtained is known as “knowledge”.
• There are two types of data:
• Qualitative data
• Quantitative data
• Qualitative data is non-numerical data, such as words and sentences, and mostly focuses
on the behavior and actions of a group.
• Quantitative data is in numerical form and can be measured and calculated using
different scientific tools and sampling data.
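One rough way to tell the two types apart in practice is to check whether every value in a column parses as a number. The sketch below assumes a hypothetical survey with an `age` column and a `feedback` column. The helper `is_quantitative` is an illustrative name, not a library function, and the rule is a simplification: a numeric code such as a postal code would be misclassified as quantitative.

```python
def is_quantitative(values):
    """Return True if every value can be parsed as a number."""
    try:
        for value in values:
            float(value)
    except (TypeError, ValueError):
        return False
    return True


# Hypothetical survey responses, stored as strings as they might arrive raw.
survey = {
    "age": ["21", "34", "29"],
    "feedback": ["good service", "too slow", "friendly staff"],
}

kinds = {
    column: "quantitative" if is_quantitative(values) else "qualitative"
    for column, values in survey.items()
}
print(kinds)
```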
Data Collection
The collected data is then further divided into two main types:
• Primary data
• Secondary data
Primary Data
• Data that is raw, original, and extracted directly from official sources is known as
primary data.
• This type of data is collected directly through techniques such as questionnaires,
interviews, and surveys.
• The data collected must match the demands and requirements of the target audience on
which the analysis is performed; otherwise it becomes a burden during data processing.
• A few methods of collecting primary data:
• Interview method - e.g., telephone, face to face, email, etc.
• Survey method - e.g., text, audio, or video
• Observation method - e.g., results
• Experimental method - e.g., research and investigation
Secondary Data
• Secondary data is data that has already been collected and is reused for some
valid purpose.
• This type of data is derived from previously recorded primary data and has two
types of sources:
• Internal source
• External source
• Other sources include sensor data, satellite data, web traffic, etc.
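The primary/secondary and internal/external distinctions can be recorded as metadata on each dataset, so that an analysis can filter by where the data came from. The following is a minimal sketch: the class names (`Origin`, `Source`, `Dataset`) and the three catalog entries are hypothetical examples chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    PRIMARY = "primary"      # collected first-hand (surveys, interviews, ...)
    SECONDARY = "secondary"  # already collected and reused


class Source(Enum):
    INTERNAL = "internal"  # from inside the organization
    EXTERNAL = "external"  # from outside the organization


@dataclass(frozen=True)
class Dataset:
    name: str
    origin: Origin
    source: Optional[Source] = None  # only filled in for secondary data here


# Hypothetical catalog entries.
catalog = [
    Dataset("customer_survey_2025", Origin.PRIMARY),
    Dataset("sales_ledger", Origin.SECONDARY, Source.INTERNAL),
    Dataset("census_extract", Origin.SECONDARY, Source.EXTERNAL),
]

secondary_external = [
    d.name
    for d in catalog
    if d.origin is Origin.SECONDARY and d.source is Source.EXTERNAL
]
print(secondary_external)
```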
Topics to be covered in the next session (Session 3)
• Data Quality (noise, outliers, missing values, duplicate
data)
Thank you!!!