ME-438
AI AND INTERNET OF THINGS
ELECTIVE COURSE
NED University of Engineering & Technology
THIS WEEK
• Data Acquisition in Machine Learning
• Data Acquisition Techniques and Tools
AI and Internet of Things
DR. HAIDER ALI 2
DATA ACQUISITION IN MACHINE LEARNING
DATA ACQUISITION
“Data acquisition is the process of sampling signals that
measure real-world physical conditions and converting the
resulting samples into digital numeric values that a computer
can manipulate.”
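The definition above can be illustrated with a minimal sketch of sampling and quantization. The sensor signal, the 100 Hz sample rate, and the 10-bit, 0 to 5 V converter range are assumptions chosen for illustration, not values from the lecture.

```python
import math


def sample_signal(signal, sample_rate_hz, duration_s):
    """Sample a continuous-time signal (a function of time in seconds) at a fixed rate."""
    n_samples = int(sample_rate_hz * duration_s)
    return [signal(i / sample_rate_hz) for i in range(n_samples)]


def quantize(voltage, v_min=0.0, v_max=5.0, bits=10):
    """Map a voltage to the integer code of an idealized converter with the given range."""
    levels = 2 ** bits
    clamped = min(max(voltage, v_min), v_max)  # out-of-range inputs saturate
    return int((clamped - v_min) / (v_max - v_min) * (levels - 1) + 0.5)


def sensor_voltage(t):
    """Hypothetical sensor output: a 2 Hz sine wave centred on 2.5 V."""
    return 2.5 + 2.0 * math.sin(2 * math.pi * 2.0 * t)


# One second of data sampled at 100 Hz, then converted to digital codes.
samples = sample_signal(sensor_voltage, sample_rate_hz=100, duration_s=1.0)
codes = [quantize(v) for v in samples]
```

The list `codes` holds the digital numeric values that a program can then store or analyse.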
LIFE-CYCLE OF A MACHINE LEARNING PROJECT
The life-cycle of a Machine Learning project follows these steps:
1. Defining the project objective: Identifying the business problem, converting it into a
statistical problem, and then into an optimization problem.
2. Data Acquisition or Collection: Acquiring and merging the data from all the appropriate
sources.
3. Data Exploration and Pre-processing: Cleaning and preprocessing the data to make it
consistent, and performing exploratory data analysis and statistical analysis to understand the
relationships between the variables.
4. Feature Engineering: Creating new features based on empirical relationships and selecting
significant variables using dimensionality reduction techniques.
5. Model Building: Selecting the appropriate ML algorithms and training the model on the
dataset to identify the patterns.
6. Execution & Model Validation: Running the model and validating it, including
fine-tuning its parameters.
7. Deployment: Delivering business-usable results of the ML process; models
are deployed to enterprise apps, systems, and data stores.
8. Interpretation, Data Visualization, and Documentation: Interpreting, visualizing, and
communicating the model insights. Documenting the modeling process for reproducibility and
creating the model monitoring and maintenance plan.
DATA ACQUISITION IN MACHINE LEARNING
• Collection and Integration of the data
• Formatting
• Labeling
COLLECTION AND INTEGRATION OF THE DATA
The data is extracted from various sources and is usually stored in
different places, so multiple datasets need to be combined before they
can be used. The acquired data is typically in raw format and not
suitable for immediate consumption and analysis.
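As a rough sketch of combining data from different places, the example below joins a CSV export with a JSON document on a shared key. Both sources, the field names, and the `integrate` helper are hypothetical and exist only to show the idea.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export of sensor readings.
READINGS_CSV = """machine_id,temperature_c
M1,71.5
M2,68.0
M3,90.2
"""

# Hypothetical source 2: a JSON document describing the machines.
MACHINES_JSON = '[{"machine_id": "M1", "line": "A"}, {"machine_id": "M2", "line": "B"}]'


def integrate(readings_csv, machines_json):
    """Join the two sources on machine_id; readings with no match keep line=None."""
    machines = {m["machine_id"]: m for m in json.loads(machines_json)}
    merged = []
    for row in csv.DictReader(io.StringIO(readings_csv)):
        info = machines.get(row["machine_id"], {})
        merged.append({
            "machine_id": row["machine_id"],
            "temperature_c": float(row["temperature_c"]),  # CSV values arrive as text
            "line": info.get("line"),
        })
    return merged


records = integrate(READINGS_CSV, MACHINES_JSON)
```

Machine M3 appears in only one source, so its record keeps `line=None`; gaps like this are one reason raw data is not ready for immediate analysis.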
FORMATTING
• Prepare or organize the datasets as per the analysis requirements.
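One possible reading of formatting is bringing records into a single agreed layout. The sketch below assumes, purely for illustration, that the analysis needs ISO dates and temperatures in degrees Celsius; the raw records and field names are made up.

```python
from datetime import datetime

# Hypothetical raw records with mixed date formats and temperature units.
RAW = [
    {"when": "03/01/2024", "temp": "212", "unit": "F"},
    {"when": "2024-01-04", "temp": "68.0", "unit": "C"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats, tried in order


def parse_date(text):
    """Return a date parsed with the first matching format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def format_record(raw):
    """Convert one raw record to the layout assumed by the analysis."""
    temp = float(raw["temp"])
    if raw["unit"] == "F":
        temp = (temp - 32.0) * 5.0 / 9.0  # Fahrenheit to Celsius
    return {"date": parse_date(raw["when"]).isoformat(), "temperature_c": temp}


formatted = [format_record(r) for r in RAW]
```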
LABELING
• After gathering the data, it must be labeled. For example, in a
factory application, one would want to label the images of the
components as defective or not defective.
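A minimal sketch of the factory example: human-provided annotations are attached to image files, and images that still need a label are listed separately. The file names, the label names, and the `build_labeled_dataset` helper are hypothetical.

```python
ALLOWED_LABELS = {"defective", "ok"}  # assumed label vocabulary


def build_labeled_dataset(image_files, annotations):
    """Pair each image with a numeric label (1 = defective, 0 = ok).

    Returns the labeled pairs and the list of images that have no annotation yet.
    """
    labeled, unlabeled = [], []
    for name in image_files:
        label = annotations.get(name)
        if label is None:
            unlabeled.append(name)
        elif label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label {label!r} for {name}")
        else:
            labeled.append((name, 1 if label == "defective" else 0))
    return labeled, unlabeled


images = ["part_001.png", "part_002.png", "part_003.png"]
annotations = {"part_001.png": "ok", "part_002.png": "defective"}
labeled, unlabeled = build_labeled_dataset(images, annotations)
```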
THE DATA ACQUISITION PROCESS
The process of data acquisition involves searching for datasets that
can be used to train Machine Learning models. This is not a simple
task. There are various approaches to acquiring data; here they are
grouped into three main categories:
• Data Discovery
• Data Augmentation
• Data Generation
DATA DISCOVERY
The first approach to acquiring data is data discovery. It is a
key step when indexing, sharing, and searching for new
datasets available on the web and in data lakes.
It can be broken into two steps: Searching and Sharing.
First, the data must be labeled or indexed and published
for sharing, using one of the many collaborative systems available for
this purpose. The published datasets can then be searched.
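The sharing and searching steps can be sketched with a toy in-memory catalog: datasets are published with a description and tags, indexed by keyword, and then found by a keyword query. The `DatasetCatalog` class and the dataset names are hypothetical and do not model any real collaborative system.

```python
from collections import defaultdict


class DatasetCatalog:
    """Toy keyword index over published dataset descriptions."""

    def __init__(self):
        self._datasets = {}
        self._index = defaultdict(set)  # keyword -> names of matching datasets

    def publish(self, name, description, tags):
        """Sharing step: register a dataset and index its words and tags."""
        self._datasets[name] = description
        words = description.lower().split() + [t.lower() for t in tags]
        for word in words:
            self._index[word.strip(".,")].add(name)

    def search(self, query):
        """Searching step: return datasets that match every word in the query."""
        terms = [w.lower() for w in query.split()]
        if not terms:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return sorted(hits)


catalog = DatasetCatalog()
catalog.publish("bearing-faults", "Vibration signals from bearing test rigs",
                ["vibration", "bearing"])
catalog.publish("pcb-defects", "Images of PCB components with defect labels.",
                ["images", "defects"])
matches = catalog.search("bearing vibration")
```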
DATA AUGMENTATION
The next approach to data acquisition is data
augmentation. To augment means to make something greater
by adding to it, so in the context of data acquisition, we
are essentially enriching the existing data by adding more
external data. In machine learning and deep learning, using
pre-trained models and embeddings is a common way to increase
the features to train on.
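As a small sketch of enriching existing rows with external features, the example below looks up each row's component name in an embedding table and appends the vector as new columns. The two-dimensional table is a hand-written stand-in for a real pre-trained embedding, and all names are hypothetical.

```python
# Hypothetical stand-in for a pre-trained embedding table (real ones are far larger).
PRETRAINED = {"pump": [0.9, 0.1], "valve": [0.2, 0.8]}
UNKNOWN = [0.0, 0.0]  # fallback vector for components missing from the table


def augment(rows, embeddings):
    """Return copies of the rows with embedding values added as extra features."""
    enriched_rows = []
    for row in rows:
        vector = embeddings.get(row["component"], UNKNOWN)
        enriched = dict(row)  # leave the original row untouched
        for i, value in enumerate(vector):
            enriched[f"emb_{i}"] = value
        enriched_rows.append(enriched)
    return enriched_rows


rows = [
    {"component": "pump", "hours": 1200},
    {"component": "gearbox", "hours": 300},
]
augmented = augment(rows, PRETRAINED)
```

Each row gains two features (`emb_0`, `emb_1`) that were not in the original data.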
DATA GENERATION
As the name suggests, the data is generated. If we do not have enough data and
no external data is available, the option is to generate the datasets
manually or automatically.
Instead of collecting and labeling large datasets, there are several techniques
for generating synthetic data that has similar properties to real data. Synthetic
data has major advantages, including reduced cost, higher accuracy in data
labeling (because the labels in synthetic data are already known), scalability (it
is easy to create vast amounts of simulated data), and variety. Synthetic data
can be used to create data samples for edge cases that do not frequently occur
in the real world.
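A minimal sketch of automatic generation: labeled vibration readings are drawn from two Gaussian distributions, so every label is known by construction. The distributions, their parameters, and the field names are assumptions for illustration and are not fitted to any real machine.

```python
import random


def generate_synthetic(n_per_class, seed=0):
    """Generate labeled vibration readings for two classes.

    The class means and spreads are assumed values, chosen only so that
    the two classes are distinguishable.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    data = []
    for _ in range(n_per_class):
        data.append({"vibration_mm_s": rng.gauss(2.0, 0.5), "label": "normal"})
        data.append({"vibration_mm_s": rng.gauss(6.0, 1.0), "label": "faulty"})
    rng.shuffle(data)
    return data


dataset = generate_synthetic(n_per_class=100, seed=42)
```

Raising `n_per_class` for the rare class is one way to produce more samples of edge cases that seldom occur in real data.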
DATA ACQUISITION TECHNIQUES AND TOOLS
The major tools and techniques for data acquisition are:
1. Data Warehouses and ETL
2. Data Lakes and ELT
3. Cloud Data Warehouse providers
DATA WAREHOUSES AND ETL
A data warehouse is a type of database that is used for storing and managing
large amounts of data. It is designed to facilitate the process of querying and
analyzing data, and is often used by organizations to support business
intelligence and decision-making activities. Data warehouses typically store data
from multiple sources, such as operational databases, transactional systems,
and external sources, and are designed to support the efficient execution of
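complex queries and analysis. This allows organizations to gain insights into
their data and make informed decisions based on that information. Data is usually
brought into the warehouse through an Extract, Transform, and Load (ETL) process,
in which it is cleaned and restructured before being loaded.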
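A rough sketch of the Extract, Transform, Load (ETL) pattern named in the slide title, using an in-memory SQLite database as a stand-in for a warehouse. The order records, table layout, and function names are hypothetical; a production warehouse would use a dedicated system rather than SQLite.

```python
import sqlite3

# Hypothetical raw export from an operational system.
RAW_ORDERS = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "South", "amount": "80.00"},
    {"order_id": "3", "region": "north", "amount": "not available"},
]


def extract():
    """Extract: read the raw rows from the source."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: fix types, normalize region names, drop unusable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["region"].strip().lower(), amount))
    return clean


def load(conn, rows):
    """Load: write the already-transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

Because the transformation happens before loading, the analytical query at the end runs against clean, typed data.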
DATA LAKES AND ELT
A data lake is a storage repository with the capacity to store large amounts of
data, including structured, semi-structured, and unstructured data. It can store
images, videos, audio recordings, and PDF files. It also allows faster ingestion
of new data.
Unlike data warehouses, data lakes store everything, are more flexible, and
follow the Extract, Load, and Transform (ELT) approach. The data is first loaded
and is not transformed until a transformation is required. The data is therefore
processed later, as per the requirements.
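The ELT order can be sketched by storing raw payloads untouched and transforming them only when a question is asked. An in-memory SQLite table stands in for the lake here, and the events and helper names are hypothetical.

```python
import json
import sqlite3

lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE raw_events (payload TEXT)")


def load_raw(conn, events):
    """Load: store each event exactly as received, with no schema imposed."""
    conn.executemany(
        "INSERT INTO raw_events VALUES (?)",
        [(json.dumps(event),) for event in events],
    )
    conn.commit()


def transform_on_demand(conn, field):
    """Transform: parse the stored payloads later and keep one field of interest."""
    values = []
    for (payload,) in conn.execute("SELECT payload FROM raw_events ORDER BY rowid"):
        record = json.loads(payload)
        if field in record:
            values.append(record[field])
    return values


# Hypothetical events with different shapes, all accepted as they are.
load_raw(lake, [
    {"sensor": "t1", "temp_c": 21.5},
    {"sensor": "cam1", "image": "frame_001.png"},
    {"sensor": "t2", "temp_c": 19.0},
])
temps = transform_on_demand(lake, "temp_c")
```

Loading needs no agreed schema, which is what makes ingestion fast; the cost is that every later reader has to do its own parsing.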
CLOUD DATA WAREHOUSE PROVIDERS
A cloud data warehouse is a cloud-hosted service that collects, organizes, and
stores data. Cloud data warehouses are quicker and cheaper to set up because
no physical hardware needs to be procured. Providers include:
• Amazon Redshift
• Snowflake
• Google BigQuery
• IBM Db2 Warehouse
• Microsoft Azure Synapse
• Oracle Autonomous Data Warehouse
• SAP Data Warehouse Cloud
• Yellowbrick Data
• Teradata Integrated Data Warehouse
THANK YOU