Data Source
A data source is the primary location from
which data comes. The data source can be a
database, a dataset, a spreadsheet, or even
hard-coded data.
Data Lake
A data lake is usually a single store
of all enterprise data
Data Pipeline
A pipeline, also known as a data
pipeline, is a set of data processing
elements connected in series.
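
As a minimal sketch of "processing elements connected in series", the Python below chains three stages so that each one consumes the output of the previous one. The stage names, the age threshold, and the helper run_pipeline are assumptions made for illustration; only the two sample rows come from the slide.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, List


def parse_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Stage 1: split 'name,age,city' lines into dictionaries."""
    for line in lines:
        name, age, city = line.split(",")
        yield {"name": name, "age": int(age), "city": city}


def only_age_30_plus(rows: Iterable[dict]) -> Iterator[dict]:
    """Stage 2: keep rows whose age is at least 30 (hypothetical rule)."""
    return (row for row in rows if row["age"] >= 30)


def to_labels(rows: Iterable[dict]) -> Iterator[str]:
    """Stage 3: format each remaining row as a short label."""
    return (f"{row['name']} ({row['city']})" for row in rows)


def run_pipeline(data: Iterable, stages: List[Callable]) -> Iterable:
    """Feed the data through every stage in order, one after another."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


lines = ["John,30,New York", "Robert,25,DC"]
result = list(run_pipeline(lines, [parse_rows, only_age_30_plus, to_labels]))
print(result)
```

Because every stage is a generator, rows flow through the chain one at a time, which is one common way to keep memory use flat on larger inputs.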
Data Store
A Data Store is a connection to a
store of data, whether the data is
stored in a database or in one or
more files
Data Warehouse
A data warehouse is a system that
pulls together data from many
different sources
Data Processing
Data processing is, generally, "the
collection and manipulation of items
of data to produce meaningful
information."
Data Revenue
The statistic shows the revenue from
the global big data market.
Data Visualization
Data visualization is the graphical
representation of information and
data
 Data Analysis Model 
Linux OS
Hadoop (Big Data)
Hadoop Distributed File System (HDFS)
3x replication across DataNodes
Hive Database
Data Warehouse
SELECT * FROM Customers;
SELECT Country FROM Customers;
SELECT * FROM Customers
WHERE Country='Mexico';
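
To try the three queries above without a Hive installation, one option is Python's built-in sqlite3 module as a stand-in. This is only a sketch: the slide does not give the Customers schema, so the CustomerName and Country columns and the three sample rows are assumptions, and SQLite is not Hive even though these particular statements are valid in both.

```python
import sqlite3

# In-memory database used as a stand-in for the Hive warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows; the slide only names the table and one column.
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Ana", "Mexico"), ("Hans", "Germany"), ("Luis", "Mexico")],
)

# Every column of every row.
all_rows = conn.execute("SELECT * FROM Customers;").fetchall()

# A single column.
countries = [row[0] for row in conn.execute("SELECT Country FROM Customers;")]

# Rows filtered by a condition.
mexico_rows = conn.execute(
    "SELECT * FROM Customers WHERE Country='Mexico';"
).fetchall()

print(len(all_rows), sorted(countries), sorted(mexico_rows))
conn.close()
```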
 
Web, App, Cloud, Database, Excel, Notepad, etc.
Data Set:
01.Healthcare: [Record–46935]
02.Weather-history:[Record–4573]
03.World Demography
04.Census Tracts 2010:[Record-21]
05.Animal_Services_Intake_Data
06.Average_Daily_Traffic_Counts
07.Acciental_Durg_Related_Death
08.Popular_Baby_Names:[Record
09.Sales_Tax_Rates:[Record-1911]
10.Restaurants:[Record-1328]
11.Acciental_Durg_Related_Death
12.Census Tracts 2010:[Record-216]
13.Employees_Salary:[Record–824]
14.Customer_transactional_spending
15.Customer_Order:[Record–1000]
16.Employees_Salary:-[Record–824]
JSON, Web Application File Format:
{
  "employee": {
    "name": "John", "age": 30, "city": "New York"
  }
}
Excel/CSV/Text File Format:
John,30,New York
Robert,25,DC
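
The JSON and comma-separated samples above carry the same kind of record in two layouts. The sketch below reads both with Python's standard library. The comma-separated sample has no header row, so the column names name, age, and city are an assumption borrowed from the JSON keys.

```python
import csv
import io
import json

json_text = '{"employee": {"name": "John", "age": 30, "city": "New York"}}'
csv_text = "John,30,New York\nRobert,25,DC\n"

# JSON keeps field names and types (30 is parsed as an integer).
employee = json.loads(json_text)["employee"]

# The delimited text has no header, so field names are supplied here (assumed),
# and every value arrives as a string until converted.
reader = csv.DictReader(io.StringIO(csv_text), fieldnames=["name", "age", "city"])
rows = [{**row, "age": int(row["age"])} for row in reader]

print(employee)
print(rows)
```

After the explicit int conversion, the first delimited row is equal to the JSON record, which shows that the difference between the two formats lies in how much structure the file itself carries.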
Text File/Sequence File
RC File/ORC File
AVRO File
Parquet/Binary File
Customer
Product
1M Data
ETL/ELT, sqoop import-all-tables
- Top 10 revenue-generating products
- Total revenue per product
- Top 10 diseases
- Weather analysis
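
The two revenue reports in the list above reduce to a group-and-sum followed by a sort. The Python below is a minimal sketch of that logic, however the lab itself computes it. The order-line layout (product, quantity, unit price), the function names, and the sample figures are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

OrderLine = Tuple[str, int, float]  # (product, quantity, unit_price), assumed layout


def revenue_per_product(order_lines: Iterable[OrderLine]) -> Dict[str, float]:
    """Sum quantity * unit_price for each product."""
    totals: Dict[str, float] = defaultdict(float)
    for product, quantity, unit_price in order_lines:
        totals[product] += quantity * unit_price
    return dict(totals)


def top_products(totals: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n products with the highest total revenue, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up order lines for illustration only.
orders = [
    ("laptop", 2, 900.0),
    ("mouse", 10, 20.0),
    ("laptop", 1, 900.0),
    ("desk", 1, 350.0),
]

totals = revenue_per_product(orders)
ranking = top_products(totals, n=2)
print(totals)
print(ranking)
```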
Version:  v.001.2020
Database
Powered by Open Source Platform

Data cloud lab version v.001.2020
