Big Data
Presented by,
Mohamedsalman S
(BIT CSE)
Contents
• Introduction.
• Components.
• Methods.
• What is Hadoop?
• Hadoop Offers.
• MapReduce.
• What is HPCC?
• HPCC Components.
• Big Data Samples.
• Difference between HPCC and Hadoop.
• Knowledge Discovery.
• Privacy and Security Issues.
• Conclusion.
Introduction
• Big data and its analysis are at the center of modern science and business.
• This data is generated from online transactions, emails, videos, audio, images, etc.
• Stored in databases, it grows massively and becomes difficult to capture, store, manage, and share.
• Its volume is predicted to double every two years, reaching about 8 zettabytes by 2015.
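Doubling every two years means the volume follows V(t) = V0 × 2^((t - t0)/2). The short Python sketch below applies that formula in both directions. The 8 ZB figure for 2015 comes from the text above; the earlier-year values it prints are only what a strict two-year doubling would imply, not measured data, and the function names are illustrative.

```python
def projected_volume(base_zb: float, base_year: int, target_year: int,
                     doubling_years: float = 2.0) -> float:
    """Volume in target_year if base_zb (measured in base_year) doubles every doubling_years."""
    return base_zb * 2 ** ((target_year - base_year) / doubling_years)


def implied_base(target_zb: float, target_year: int, base_year: int,
                 doubling_years: float = 2.0) -> float:
    """Work backwards: the volume in base_year that grows to target_zb by target_year."""
    return target_zb / 2 ** ((target_year - base_year) / doubling_years)


if __name__ == "__main__":
    # Back-calculate from the ~8 ZB predicted for 2015 under strict two-year doubling.
    for year in (2009, 2011, 2013, 2015):
        print(year, implied_base(8.0, 2015, year), "ZB")
```

Because the growth is exponential, the same formula shows why a fixed-size storage plan falls behind quickly: each two-year period adds as much data as all previous periods combined.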
Components
• Variety.
Variety is what makes big data really big.
Big data comes from a great variety of sources.
It generally comes in three types: structured, unstructured, and semi-structured.
Structured data enters a data warehouse already tagged and easily sorted.
Unstructured data is random and difficult to analyze.
Semi-structured data does not conform to fixed fields but contains tags to separate data elements.
• Volume.
Data volumes are now measured in terabytes, petabytes, and even zettabytes.
• Velocity.
The flow of data is massive and continuous.
Big data should be used as it streams into the organization in order to maximize its value.
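As a rough illustration of variety and velocity, the Python sketch below maps semi-structured JSON lines (tags are present, but the fields are not fixed) onto a fixed-field structured record, and updates a running count as each record arrives instead of waiting for a batch load. The field names `user`, `type`, and `size` and the sample stream are hypothetical.

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Record:
    """Structured form: every record has the same fixed fields."""
    user: str
    kind: str
    size_bytes: Optional[int]


def parse(lines: Iterable[str]) -> Iterator[Record]:
    """Map semi-structured JSON lines onto fixed-field records.

    JSON keys act as the tags that separate data elements, but different
    lines may carry different keys; absent keys get a default value.
    """
    for line in lines:
        doc = json.loads(line)
        yield Record(
            user=doc.get("user", "unknown"),
            kind=doc.get("type", "other"),
            size_bytes=doc.get("size"),
        )


def running_counts(records: Iterable[Record]) -> Iterator[Counter]:
    """Velocity: update the aggregate as each record arrives."""
    counts: Counter = Counter()
    for rec in records:
        counts[rec.kind] += 1
        yield counts.copy()  # snapshot of the totals so far


if __name__ == "__main__":
    # Hypothetical event stream; the records do not share one schema.
    stream = [
        '{"user": "a", "type": "email", "size": 2048}',
        '{"user": "b", "type": "video"}',
        '{"type": "email", "size": 512, "subject": "hi"}',
    ]
    for snapshot in running_counts(parse(stream)):
        print(dict(snapshot))
```

Using generators keeps the pipeline incremental: each line is parsed and counted as soon as it is read, which is the behavior the Velocity point asks for.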
Methods
• Organizations face large amounts of new data arriving in many different forms.
• Big data has generated a whole new industry of supporting architectures such as MapReduce.
• MapReduce is a programming framework for distributed computing.
• It was created by Google and uses a divide-and-conquer approach.
• MapReduce can be divided into two stages:
Map step.
Reduce step.
• Platforms: HPCC and Hadoop.
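The divide-and-conquer idea can be sketched in a few lines of Python: split the input into chunks, let independent workers solve each chunk, then combine the partial results. This is an illustrative assumption, not Google's implementation; a thread pool stands in for cluster machines, and because of CPython's global interpreter lock it shows the structure rather than a real speedup for CPU-bound work. The sum-of-squares task is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Divide: cut the input into roughly equal contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def solve_chunk(chunk: Sequence[int]) -> int:
    """Conquer: each worker handles only its own chunk (here, a sum of squares)."""
    return sum(x * x for x in chunk)


def divide_and_conquer(data: Sequence[int], workers: int = 4) -> int:
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(solve_chunk, chunks))
    # Combine: merge the partial answers into the final result.
    return sum(partials)


if __name__ == "__main__":
    print(divide_and_conquer(list(range(1_000))))
```

The combine step works here because addition is associative; MapReduce relies on the same property when it merges partial results from many machines.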
What is Hadoop?
• Hadoop is an open-source software framework.
• It is a Java-based framework.
• Essentially, it accomplishes two tasks: massive data storage and faster processing.
• It does not replace a data warehouse or ETL.
Hadoop Offers
• HDFS (Hadoop Distributed File System) - stores data across the cluster.
• MapReduce - processes the stored data in parallel.
• HBase - distributed database for random read/write access.
• Pig - high-level data processing system.
• Hive - data warehouse application.
• Sqoop - transfers data between relational databases and Hadoop.
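To make the HDFS storage model concrete, here is a simplified Python sketch of how a file is cut into fixed-size blocks and how each block is copied to several nodes. The 128 MB block size and replication factor of 3 are common HDFS defaults, but the round-robin placement and the node names are assumptions for illustration; real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

# Values commonly used as HDFS defaults (dfs.blocksize, dfs.replication);
# a real cluster may be configured differently.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB
REPLICATION = 3


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        placement[block] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a hypothetical 300 MB file
    print([b // (1024 * 1024) for b in blocks], "MB")
    print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"]))
```

Keeping several copies of every block on different nodes is what lets the cluster lose a machine without losing data, and it gives MapReduce a choice of nodes on which to run each task close to its data.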
MapReduce
• MapReduce is a programming framework for distributed computing.
• It was created by Google and uses a divide-and-conquer approach.
• MapReduce can be divided into two stages:
Map step.
Reduce step.
[Figure: MapReduce]
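The two stages can be illustrated with the classic word-count example, written here as a single-process Python sketch. It is not Hadoop's Java API; in a real cluster the framework runs many map and reduce tasks in parallel and performs the shuffle (grouping intermediate pairs by key) between the two steps. The function names are illustrative.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_step(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (done by the framework between the steps)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def map_reduce(documents: Iterable[str]) -> Dict[str, int]:
    intermediate = chain.from_iterable(map_step(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_step(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data is big", "data keeps growing"]
    print(map_reduce(splits))
```

Each `map_step` call touches only its own split and each `reduce_step` call touches only one key, which is why both stages can be spread across many machines.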
What is HPCC?
• HPCC (High-Performance Computing Cluster) is also known as DAS (Data Analytics Supercomputer).
• HPCC Systems is a distributed, data-intensive, open-source computing platform that provides big data workflow management services.
• Unlike Hadoop, HPCC’s data model is defined by the user.
• The HPCC platform does not require third-party tools such as Greenplum, Cassandra, an RDBMS, or Oozie.
HPCC Components
• HPCC Data Refinery
A massively parallel ETL engine that enables data integration and provides batch-oriented data manipulation.
• HPCC Data Delivery Engine
Provides high-throughput, ultra-fast, low-latency data delivery.
• Enterprise Control Language
An easy-to-use programming language optimized for big data operations and query transactions.
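The split between the Data Refinery (batch ETL) and the Data Delivery Engine (fast keyed queries) can be mimicked in Python as below. In HPCC Systems these roles belong to the Thor and Roxie clusters, which are programmed in ECL; this sketch only imitates the division of labor, and the row format with an `id` field is a hypothetical assumption.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def refine(raw_rows: List[Row]) -> List[Tuple[str, Row]]:
    """Batch ETL (Data Refinery role): clean every row, then build a sorted key index.

    Runs once over the whole data set; throughput matters more than latency.
    """
    cleaned = []
    for row in raw_rows:
        key = row.get("id", "").strip()
        if not key:
            continue  # drop rows that cannot be keyed
        cleaned.append((key, {k: v.strip() for k, v in row.items()}))
    cleaned.sort(key=lambda kv: kv[0])
    return cleaned


class DeliveryEngine:
    """Query side (Data Delivery Engine role): answer keyed lookups from the prebuilt index."""

    def __init__(self, index: List[Tuple[str, Row]]):
        self._keys = [k for k, _ in index]
        self._rows = [r for _, r in index]

    def lookup(self, key: str) -> Optional[Row]:
        i = bisect_left(self._keys, key)  # binary search: O(log n) per query
        if i < len(self._keys) and self._keys[i] == key:
            return self._rows[i]
        return None


if __name__ == "__main__":
    index = refine([{"id": " 42 ", "name": " Ada "}, {"id": "", "name": "x"}])
    engine = DeliveryEngine(index)
    print(engine.lookup("42"))
```

The expensive work (cleaning and sorting) happens once in batch, so each query only pays for a binary search; this is the same trade-off behind separating a batch engine from a low-latency delivery engine.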
Big Data Samples
• Biological science.
• Life sciences.
• Medical records.
• Scientific research.
• Mobile phones.
• Government.
Difference between HPCC and Hadoop
Knowledge Discovery
• Knowledge discovery is a set of operations designed to extract information from complex data sets.
• Preprocessing: removing noise, handling missing data fields, and deriving time information.
• Mapping the goals of the analysis to a particular data mining method.
• Choosing the data mining algorithm and method used to search for patterns in the data.
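The preprocessing and pattern-search steps above can be sketched in Python as follows. The field names, the valid amount range used to flag noise, median imputation for missing values, and simple frequency counting as the mining step are all illustrative assumptions; a real project would choose its methods to fit its goals.

```python
import statistics
from collections import Counter
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Raw = Dict[str, Optional[str]]


def preprocess(rows: List[Raw], low: float = 0.0, high: float = 10_000.0) -> List[Dict]:
    """Clean raw rows: handle missing fields, remove noise, derive time information."""

    def in_range(value: float) -> bool:
        return low <= value <= high

    # Handle missing amounts: fill with the median of the valid amounts present.
    valid = [float(r["amount"]) for r in rows
             if r.get("amount") and in_range(float(r["amount"]))]
    fill = statistics.median(valid) if valid else 0.0

    clean = []
    for r in rows:
        if not r.get("timestamp"):
            continue  # no timestamp, so no time information can be derived
        amount = float(r["amount"]) if r.get("amount") else fill
        if not in_range(amount):
            continue  # treat out-of-range values as noise
        ts = datetime.fromisoformat(r["timestamp"])
        # weekday(): 0 = Monday ... 6 = Sunday
        clean.append({"amount": amount, "weekday": ts.weekday(), "hour": ts.hour})
    return clean


def frequent_patterns(clean: List[Dict], top: int = 3) -> List[Tuple[Tuple[int, int], int]]:
    """A deliberately simple pattern search: the most common (weekday, hour) pairs."""
    return Counter((c["weekday"], c["hour"]) for c in clean).most_common(top)


if __name__ == "__main__":
    raw = [
        {"timestamp": "2015-03-02T09:15:00", "amount": "120.5"},
        {"timestamp": "2015-03-02T09:40:00", "amount": None},      # missing field
        {"timestamp": "2015-03-03T18:00:00", "amount": "999999"},  # noise
        {"timestamp": None, "amount": "50"},                       # no time info
    ]
    cleaned = preprocess(raw)
    print(cleaned)
    print(frequent_patterns(cleaned))
```

Preprocessing runs before any mining, so the choice of algorithm in the last step only sees cleaned, consistently shaped records.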
Privacy and Security Issues
• Big data stores must be properly access-controlled.
• To ensure authentication, a cryptographically secure communication framework must be implemented.
• Data must be controlled as specified by regulations, for example by imposing storage periods.
• Organizations have to consider the legal ramifications of storing data.
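Two of the controls above can be sketched with Python's standard library: HMAC-SHA256 tags let a receiver authenticate messages from a holder of a shared key, and a purge function enforces a storage period. HMAC is only one building block of a secure communication framework (it provides authentication and integrity, not confidentiality, which typically comes from TLS), and the 365-day retention period and sample messages are hypothetical.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that only holders of `key` can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Authenticate a received message; compare_digest resists timing attacks."""
    return hmac.compare_digest(sign(key, message), tag)


def purge_expired(records: List[Tuple[datetime, bytes]], retention: timedelta,
                  now: datetime) -> List[Tuple[datetime, bytes]]:
    """Enforce a storage period: keep only records younger than `retention`."""
    return [(ts, data) for ts, data in records if now - ts < retention]


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # shared secret, distributed out of band
    tag = sign(key, b"record 17")
    print(verify(key, b"record 17", tag))  # genuine message
    print(verify(key, b"record 18", tag))  # tampered message

    now = datetime.now(timezone.utc)
    stored = [(now - timedelta(days=30), b"recent"), (now - timedelta(days=400), b"old")]
    # Hypothetical 365-day retention policy.
    print(purge_expired(stored, retention=timedelta(days=365), now=now))
```

Passing `now` explicitly keeps the retention rule testable and lets an auditor replay the purge for any date.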
Conclusion
• Big data is difficult to manage.
• Data must be kept secure.
• It is used by a growing number of organizations.
