UNIT-3
Processing data visualization
TOPICS
• Processing Big Data
• Integrating disparate data stores
• Mapping data to the programming framework
• Connecting and extracting data from storage
• Transforming data for processing
• Subdividing data in preparation for Hadoop MapReduce
Processing Big Data
Data processing means converting raw data into a
different, more usable format. Data is most useful
when it is well presented, because it then becomes
informative. A data processing system is also
referred to as an information system.
UNIT-3 Data Visualization for the life used...
Types of Data Processing
• Manual Data Processing
• Mechanical Data Processing
• Electronic Data Processing
• Batch Data Processing
• Real-time Data Processing
• Online Data Processing
• Automatic Data Processing
Processing
BIG Data
Integrating disparate data stores
Integrating disparate data stores involves
combining and connecting data from different
sources or storage systems to provide a unified
and coherent view of the information. This
process is crucial for organizations that have data
scattered across various databases, file systems, or
platforms.
Key aspects of integrating disparate data stores:
• Data Discovery
• Data Mapping
• Data Extraction
• Data Transformation
• Data Loading
• Data Synchronization
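
The extraction, mapping, transformation, and loading aspects listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the two sources (an in-memory CSV export and an in-memory SQLite table), the table and column names, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export, held in memory for this sketch.
CSV_SOURCE = "cust_id,full_name\n1,Asha\n2,Ravi\n"


def extract_csv(text):
    """Extraction: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_sql(conn):
    """Extraction: read rows from a relational source."""
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [{"id": r[0], "name": r[1]} for r in rows]


def transform(csv_rows, sql_rows):
    """Mapping and transformation: bring both sources to one schema."""
    unified = [
        {"customer_id": int(r["cust_id"]), "name": r["full_name"].strip()}
        for r in csv_rows
    ]
    unified += [
        {"customer_id": r["id"], "name": r["name"].strip()} for r in sql_rows
    ]
    return unified


def load(conn, rows):
    """Loading: write the unified rows into the target store."""
    conn.execute(
        "CREATE TABLE unified_customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO unified_customers VALUES (:customer_id, :name)", rows
    )


def run_pipeline():
    # Hypothetical source 2: a relational table in an in-memory database.
    source_db = sqlite3.connect(":memory:")
    source_db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    source_db.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(3, "Meera"), (4, "John")]
    )

    target_db = sqlite3.connect(":memory:")
    load(target_db, transform(extract_csv(CSV_SOURCE), extract_sql(source_db)))
    return target_db.execute(
        "SELECT customer_id, name FROM unified_customers ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Data discovery and synchronization are left out of the sketch; in practice they cover finding the sources in the first place and keeping the unified view up to date as the sources change.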
HADOOP
MAPREDUCE
Mapping data to the programming framework
Map and Reduce
Map: As the name suggests, its main use is to map the input
data into key-value pairs. The input to the map may itself be
a key-value pair, where the key can be an ID such as an address
and the value is the actual content stored there. The Map()
function is executed in memory on each of these input
key-value pairs and generates intermediate key-value pairs,
which serve as input for the Reducer, that is, the
Reduce() function.
Reduce: The intermediate key-value pairs
that serve as input for the Reducer are
shuffled, sorted, and sent to
the Reduce() function. The Reducer aggregates
or groups the values by key, according to
the reducer algorithm written
by the developer.
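
The Map and Reduce steps described above can be illustrated with a word count written in plain Python. This is a single-process sketch of the programming model, not Hadoop's API: the function names `map_fn`, `shuffle_and_sort`, `reduce_fn`, and `run_job` are hypothetical, and using the line number as the input key is an assumption made for the example.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: group intermediate values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(key, values):
    """Reduce: aggregate all values that share one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle/sort, and reduce in sequence on a list of lines."""
    intermediate = []
    for line_number, line in enumerate(lines):
        intermediate.extend(map_fn(line_number, line))
    return [reduce_fn(key, values) for key, values in shuffle_and_sort(intermediate)]


if __name__ == "__main__":
    print(run_job(["big data", "data processing"]))
```

In a real cluster the map calls run in parallel on different nodes and the shuffle moves data across the network; the sketch only shows how the key-value pairs flow between the two functions.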
What is data mapping?
The data mapping process in 5 steps
1. Identify all data fields that must be mapped
2. Standardize naming conventions across sources
3. Create data transformation rules and schema logic
4. Test your logic
5. Complete the migration, integration, or transformation
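
The five steps above can be sketched as a small field-mapping table plus transformation rules. Everything named here is hypothetical and chosen only for illustration: the source field names (`cust_id`, `CustomerID`, `full_name`, `Name`), the target schema, and the rules themselves.

```python
# Steps 1 and 2: every source field that must be mapped, with a
# standardized target name shared across sources.
FIELD_MAP = {
    "cust_id": "customer_id",     # field name used by hypothetical source A
    "CustomerID": "customer_id",  # field name used by hypothetical source B
    "full_name": "name",
    "Name": "name",
}

# Step 3: transformation rules, keyed by target field.
TRANSFORMS = {
    "customer_id": int,
    "name": lambda s: s.strip().title(),
}


def map_record(record):
    """Apply the field map and transformation rules to one source record."""
    mapped = {}
    for source_field, value in record.items():
        if source_field not in FIELD_MAP:
            # Fail loudly so unmapped fields are caught during testing.
            raise KeyError(f"unmapped field: {source_field}")
        target_field = FIELD_MAP[source_field]
        rule = TRANSFORMS.get(target_field, lambda v: v)
        mapped[target_field] = rule(value)
    return mapped


if __name__ == "__main__":
    # Step 4: test the logic on a sample record from each source
    # before step 5, running the full migration.
    print(map_record({"cust_id": "7", "full_name": " asha rao "}))
    print(map_record({"CustomerID": "8", "Name": "RAVI"}))
```

Keeping the map and the rules as data rather than code makes step 4 easier, since each entry can be checked against sample records independently.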
Connecting and extracting data from
storage
Data Extraction Software
• Web Scraping Tools
• OCR (Optical Character Recognition)
• PDF Extraction
• Screen Scraping
• Text Analytics
Data Transformation in Data
Mining
The goal of data transformation is to
prepare the data for data mining so that
it can be used to extract useful insights
and knowledge.
Phases of MapReduce – How Hadoop MapReduce Works
A MapReduce job passes through the following
phases during execution: Input Files, Input Format,
Input Splits, Record Reader, Mapper,
Combiner, Partitioner, Shuffling and
Sorting, Reducer, Record Writer, and
Output Format.
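
Three of these phases deal with subdividing data: Input Splits, the Combiner, and the Partitioner. The sketch below imitates each one in plain Python. It is an illustration of the idea only, not Hadoop code: the function names are hypothetical, splits are cut by record count rather than by byte ranges of a file, and a CRC32 checksum stands in for the key hash so that the result is stable across runs.

```python
import zlib


def make_splits(records, split_size):
    """Input Splits: cut the input into chunks, one chunk per map task."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: choose which reducer receives a given key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


if __name__ == "__main__":
    print(make_splits(["r1", "r2", "r3", "r4", "r5"], 2))
    print(combine([("data", 1), ("big", 1), ("data", 1)]))
    print(partition("data", 3))
```

Because the partition depends only on the key, every pair with the same key reaches the same reducer, which is what allows the Reducer to aggregate by key.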
THANK
YOU!!
