Dr. Dalal AL-Alimi
• What is data?
• Role of Databases in Data Analytics
• Difference between a database and a dataset
Data refers to symbols or signs that represent a measurement or a model of
reality. By itself, data has no meaning until it is interpreted according to
higher-level conventions and understandings (HLCUs).
• Why This Definition?
When preparing data, you need to consider the purpose (or HLCU) it’s being
prepared for:
1. If the data is for humans, it may need to be clear and visually understandable.
2. If the data is for machines or algorithms, it should be structured and
formatted appropriately.
3. Different algorithms may require different kinds of data preprocessing based
on their requirements.
• The DIKW pyramid explains the hierarchy of data
and its transformation:
1. Data: A collection of symbols – cannot answer any
questions.
2. Information: Processed data – can answer the questions
who, when, where, and what.
3. Knowledge: Descriptive application of Information – can
answer the question how.
4. Wisdom: Embodiment of Knowledge and appreciation of
why.
• The pyramid also shows the relationship between
abundance and value:
• Data is abundant but has low value until it is processed.
• Wisdom is rare but highly valuable as it involves
actionable insights.
• The definitions of all four elements of DDVW are
presented as follows:
o Data: All possible data from across all the data
resources
o Dataset: A relevant collection of data selected from all
the available data sources and organized for the next
step
o Visualization: The comprehensible presentation of
what has been found in the dataset (similar to
Knowledge in DIKW – descriptive application of
Information)
o Wisdom: Embodiment of Knowledge and appreciation
of why (the same as Wisdom in DIKW)
The most universal data structure – a table
At the end of successful data preprocessing, we want to
create a table that is ready to be mined, analyzed, or
visualized. We call this table a dataset.
• Data objects are known by many different names, such as data points, rows,
records, examples, samples, and tuples. Whatever the name, a table only makes
sense once you have a conceptual definition of its data objects, that is, of what
one row represents.
• Data attributes also go by different names: columns, variables, features, and
dimensions may all be used instead of attributes.
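A minimal sketch of the table idea, using only the Python standard library. The attribute names and values are made up for illustration: each dictionary is one data object (row), and each key is one attribute (column).

```python
# Each dict is one data object (row); each key is one attribute (column).
# All names and values are illustrative, not taken from a real source.
dataset = [
    {"house_id": 1, "square_feet": 1000, "bedrooms": 2},
    {"house_id": 2, "square_feet": 1500, "bedrooms": 3},
    {"house_id": 3, "square_feet": 2000, "bedrooms": 4},
]

# Conceptual definition of the data object here: one row = one house.
n_objects = len(dataset)
attributes = list(dataset[0].keys())

# Reading one attribute across all data objects gives a column.
square_feet_column = [row["square_feet"] for row in dataset]

print(n_objects, attributes)
print(square_feet_column)
```

The same table could be held in a spreadsheet or a data frame; the point is only that rows and columns have a fixed meaning.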
A database is a technology for storing and retrieving data in a way
that is both effective and efficient.
• Why Databases Are Not Analytics-Ready
Databases are primarily designed to:
1. Store data efficiently.
2. Retrieve data quickly.
However, they are not specifically organized for analytics. This means
that the structure of the data in a database might not match what is
needed for analysis.
1. The first step in data analytics is to find
and gather data from databases and other
sources.
2. The next step is to reorganize the data
into a format or dataset that is useful for
answering analytical or decision-making
questions.
• In essence, databases store raw data, and
it’s up to analysts to prepare this data for
meaningful analysis.
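The two steps above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the in-memory database, the table names (customers, orders), and the values are hypothetical stand-ins for a real operational database.

```python
import sqlite3

# An in-memory database standing in for an operational store.
# Table names, column names, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Redlands'), (2, 'Riverside');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 15.0);
""")

# Step 1: gather the data. Step 2: reorganize it so that each row is one
# customer (the data object) with the attributes the analysis needs.
rows = conn.execute("""
    SELECT c.id, c.city, COUNT(o.id) AS n_orders, SUM(o.amount) AS total_spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.city
    ORDER BY c.id
""").fetchall()
conn.close()

dataset = [
    {"customer_id": r[0], "city": r[1], "n_orders": r[2], "total_spent": r[3]}
    for r in rows
]
print(dataset)
```

The database stores orders one per row because that is efficient for storage and retrieval; the analysis wants one row per customer, so the reshaping is the analyst's job.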
• A database is a technological solution for storing and retrieving
data both effectively and efficiently.
• A dataset is a specific organization and presentation of some data
for a specific reason.
Goal: Predict hourly electricity consumption in the city of Redlands based on weather data.
Dataset Design:
• Data object: Each row represents an hour in the city of Redlands.
• Attributes (columns):
• Average temperature.
• Average humidity.
• Average wind speed.
• Electricity consumption.
• Data Sources:
• Weather data: Comes from five databases, each recording weather every 15 minutes for its specific location in
the city.
• Electricity data: Comes from one database managed by the city’s electricity supplier, recording consumption
every 5 minutes.
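A rough sketch of how the sources could be brought to the hourly data object, using only the standard library. The station names, timestamps, and values are invented; only two stations and one weather attribute are shown to keep it short. It also assumes each electricity reading is the energy used during its 5-minute interval, so hourly consumption is a sum, while temperature is averaged over all stations and readings in the hour.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def hour_of(ts: str) -> datetime:
    """Truncate an ISO timestamp to the start of its hour."""
    return datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)

# Hypothetical readings: (station, timestamp, temperature in °C), every 15 minutes.
weather = [
    ("station_1", "2024-01-01T00:00", 10.0),
    ("station_1", "2024-01-01T00:15", 11.0),
    ("station_2", "2024-01-01T00:00", 12.0),
    ("station_2", "2024-01-01T00:15", 13.0),
    ("station_1", "2024-01-01T01:00", 9.0),
    ("station_2", "2024-01-01T01:00", 11.0),
]
# Hypothetical readings: (timestamp, kWh used in that 5-minute interval).
electricity = [
    ("2024-01-01T00:00", 5.0),
    ("2024-01-01T00:05", 6.0),
    ("2024-01-01T00:10", 7.0),
    ("2024-01-01T01:00", 4.0),
    ("2024-01-01T01:05", 4.5),
]

# Collect every temperature reading, from any station, under its hour.
temps_by_hour = defaultdict(list)
for _station, ts, temp in weather:
    temps_by_hour[hour_of(ts)].append(temp)

# Add up the interval consumption within each hour.
kwh_by_hour = defaultdict(float)
for ts, kwh in electricity:
    kwh_by_hour[hour_of(ts)] += kwh

# One row per hour that appears in both sources.
dataset = [
    {
        "hour": hour.isoformat(),
        "avg_temperature": mean(temps_by_hour[hour]),
        "electricity_consumption": kwh_by_hour[hour],
    }
    for hour in sorted(temps_by_hour.keys() & kwh_by_hour.keys())
]
print(dataset)
```

Humidity and wind speed would be handled the same way as temperature. If the supplier's readings were instantaneous power rather than interval energy, the sum would need to be replaced by a different aggregation.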
• There are four main types of databases:
1. Relational, or structured, databases are an ecosystem of data collection and
management in which both the stored data and the incoming data must conform to a
predefined set of relationships between the data.
2. NoSQL, or unstructured, databases solve the problem of storing data that we are
unable to structure or choose not to structure. They can also serve as interim
storage for data that we do not yet have the resources to structure.
3. A distributed database is a collection of databases (structured, unstructured, or a
combination of the two) whose data is physically stored in multiple locations.
4. A blockchain is a database alternative that has no central authority while still
providing data safety; Bitcoin is an example.
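The contrast between the first two types can be illustrated with the standard library. This is only a sketch: sqlite3 plays the relational database, and a Python list of JSON strings merely imitates an unstructured document store (it is not a real NoSQL system). Table and field names are hypothetical.

```python
import json
import sqlite3

# Relational side: the schema is declared up front and enforced on insert.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (station TEXT NOT NULL, temperature REAL NOT NULL)"
)
conn.execute("INSERT INTO readings VALUES (?, ?)", ("station_1", 11.5))

try:
    # A missing temperature violates the NOT NULL constraint.
    conn.execute("INSERT INTO readings VALUES (?, ?)", ("station_2", None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

relational_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
conn.close()

# Unstructured side, imitated with JSON documents in a list: records with
# different shapes are all accepted, and structuring is deferred.
documents = [
    json.dumps({"station": "station_1", "temperature": 11.5}),
    json.dumps({"station": "station_2", "note": "sensor offline"}),
    json.dumps({"free_text": "storm reported near the river"}),
]
shapes = [sorted(json.loads(doc).keys()) for doc in documents]

print(rejected, relational_count)
print(shapes)
```

The relational table refuses the record that does not fit its predefined structure, while the document list keeps everything and leaves the structuring for later.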
1. Supervised Datasets
• Definition: These datasets contain input-output (feature-label) pairs. Each data
point has both independent variables (features) and a corresponding dependent
variable (label).
• Used For: Supervised learning tasks such as classification and regression.
• Examples:
• Classification: A dataset of images labeled as "cat" or "dog".
• Regression: A dataset of house prices based on square footage, number of bedrooms, etc.
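A small sketch of a supervised dataset for the regression example, with invented numbers and a single feature. The model is ordinary least squares for one feature, written out by hand; it is one possible choice, not the only one.

```python
# Hypothetical supervised dataset: features (square footage) paired with
# labels (price). The values are invented for illustration.
features = [1000.0, 1500.0, 2000.0, 2500.0]
labels = [200000.0, 300000.0, 400000.0, 500000.0]

# Ordinary least squares for one feature: price = slope * sqft + intercept.
n = len(features)
mean_x = sum(features) / n
mean_y = sum(labels) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels)) / sum(
    (x - mean_x) ** 2 for x in features
)
intercept = mean_y - slope * mean_x

def predict(sqft: float) -> float:
    """Predict a price for an unseen house from the fitted line."""
    return slope * sqft + intercept

print(slope, intercept, predict(1800.0))
```

The labels are what make this supervised: the fit is judged against known prices.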
2. Unsupervised Datasets
• Definition: These datasets contain only independent variables (features)
without predefined labels.
• Used For: Unsupervised learning tasks such as clustering, dimensionality
reduction, and anomaly detection.
• Examples:
• Clustering: Grouping customers based on purchasing behavior without predefined
categories.
• Dimensionality Reduction: Principal Component Analysis (PCA) applied to high-
dimensional data.
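A small sketch of the clustering example: a plain k-means on one feature, in pure Python. The spend values and the starting centers are invented, and real work would normally use a library implementation.

```python
# Hypothetical unsupervised dataset: monthly spend per customer, no labels.
spend = [10.0, 12.0, 11.0, 95.0, 100.0, 105.0]

def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one feature, starting from the given centers."""
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if a cluster ends up empty).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = kmeans_1d(spend, centers=[0.0, 50.0])
print(centers)
print(clusters)
```

No category is given in advance; the two groups (low and high spenders) emerge from the feature values alone.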
| Feature           | Supervised Dataset                                       | Unsupervised Dataset                |
|-------------------|----------------------------------------------------------|-------------------------------------|
| Labels Available? | Yes (Labeled)                                            | No (Unlabeled)                      |
| Task Examples     | Classification, Regression                               | Clustering, Anomaly Detection       |
| Algorithms Used   | SVM, Decision Trees, Neural Networks, Linear Regression  | K-Means, DBSCAN, PCA, Autoencoders  |
| Output Type       | Predicts specific values or categories                   | Finds patterns and structures       |
| Complexity        | Less                                                     | More                                |
• Understanding the differences between structured and unstructured
databases
1. What is a database, and what are its types?
2. In your own words, describe the difference between a dataset and a
database.

Introduction to Artificial Intelligence_ Lec 4
