Big Data and Hadoop Ecosystem Overview
Presented by
Obinna C Ekeh
What is Big Data?
Sources of Big Data
Characteristics of Big Data – The four V’s of Big Data
Big Data Types
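The Big Data types slide distinguishes structured, semi-structured and unstructured data (the speaker notes give CSV/RDBMS, XML/JSON, and media or email as examples). The minimal sketch below uses only Python's standard library to show the practical difference. The sample records are made up for illustration. Structured rows all share one schema, semi-structured records describe their own fields (which may vary), and unstructured text offers only generic features until a model is imposed on it.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema, as in a CSV file or an RDBMS table.
CSV_TEXT = "id,name,age\n1,Ada,36\n2,Linus,28\n"

# Semi-structured: self-describing records whose fields may vary (JSON here; XML is similar).
JSON_TEXT = '[{"id": 1, "name": "Ada", "tags": ["math"]}, {"id": 2, "name": "Linus"}]'

# Unstructured: no underlying model; any structure has to be inferred (free text, audio, video).
EMAIL_BODY = "Hi team, the quarterly numbers look great. Let's meet Friday."


def describe():
    """Show what each kind of data exposes without extra modelling work."""
    rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
    records = json.loads(JSON_TEXT)
    csv_fields = set(rows[0].keys())                # the same columns for every row
    json_fields = [set(r.keys()) for r in records]  # fields can differ per record
    n_words = len(EMAIL_BODY.split())               # only generic features such as word count
    return csv_fields, json_fields, n_words


if __name__ == "__main__":
    print(describe())
```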
Data Growth Rate
Breakthrough Enabling Technologies
1) Hadoop
2) Spark
• Apache Hadoop is an open-source software framework for storing and processing large datasets at scale on clusters of commodity hardware
- MapReduce: a software framework and programming model for processing large datasets in parallel across a cluster of commodity machines (Hadoop uses MapReduce to process data)
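To make the Map and Reduce stages concrete, here is a single-process Python sketch of the classic word-count job. It only simulates the model. In a real Hadoop cluster the map and reduce calls run in parallel on many machines and the shuffle moves data over the network. Hadoop's own API (Java `Mapper` and `Reducer` classes) also looks different. The function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit an intermediate (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key (Hadoop's framework does this step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a final result."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Big data is big", "Hadoop processes big data"]
    print(word_count(docs))
    # {'big': 3, 'data': 2, 'hadoop': 1, 'is': 1, 'processes': 1}
```

Only `map_phase` and `reduce_phase` hold job-specific logic. The grouping step is generic, which is why a framework can provide it and the user writes only the two stage functions.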
Breakthrough Enabling Technologies
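The main reason Spark often beats MapReduce is in-memory computation. Iterative algorithms, common in machine learning, scan the same data many times. Keeping that data in memory avoids rereading it from disk on every pass. The sketch below counts simulated disk reads rather than measuring time, so it does not reproduce any particular speed-up figure. PySpark RDDs and DataFrames do have `cache()` and `collect()` methods, but the `Dataset` class here is a hypothetical stand-in that only mimics that idea. It is not the PySpark API.

```python
class Dataset:
    """Hypothetical stand-in for a distributed dataset that counts simulated disk reads."""

    def __init__(self, values):
        self._values = list(values)  # pretend this lives on disk (e.g. in HDFS)
        self.disk_reads = 0
        self._cached = False
        self._memory = None

    def cache(self):
        """Mark the data for in-memory reuse; like Spark, it is materialized on first use."""
        self._cached = True
        return self

    def _read_from_disk(self):
        self.disk_reads += 1
        return list(self._values)

    def collect(self):
        """Return all values, reading from 'disk' unless a cached copy exists."""
        if not self._cached:
            return self._read_from_disk()
        if self._memory is None:
            self._memory = self._read_from_disk()
        return self._memory


def fit_mean(data, iterations=50, lr=0.1):
    """Gradient descent toward the mean of the data; every iteration re-scans the dataset."""
    m = 0.0
    for _ in range(iterations):
        xs = data.collect()
        grad = sum(m - x for x in xs) / len(xs)  # gradient of 0.5 * mean((m - x)^2)
        m -= lr * grad
    return m


if __name__ == "__main__":
    uncached = Dataset([1, 2, 3, 4, 5])
    fit_mean(uncached)
    cached = Dataset([1, 2, 3, 4, 5]).cache()
    fit_mean(cached)
    print(uncached.disk_reads, cached.disk_reads)  # 50 1
```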
Why is Big Data so important?
Enables businesses and organizations to derive new and better insights
It is the backbone of the emerging technologies that are radically changing our world
Big Data is the Lifeblood of the 4th Industrial Revolution
Artificial Intelligence
Machine Learning
Deep Learning
Hadoop ecosystem Overview
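HDFS, Hadoop's storage layer, stores a file by splitting it into fixed-size blocks and keeping several copies of each block on different machines. The sketch below plans that layout for one file. The 128 MB block size and the replication factor of 3 are HDFS's usual defaults (set by `dfs.blocksize` and `dfs.replication`). The round-robin placement and the `plan_blocks` helper are simplifying assumptions. Real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # usual HDFS default block size (Hadoop 2.x and later)
REPLICATION = 3      # usual HDFS default replication factor


def plan_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to `replication` distinct nodes.

    Hypothetical helper with simplified round-robin placement, not HDFS's real policy.
    Returns a list of (block_index, block_size_mb, [node, ...]).
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    plan = []
    for b in range(num_blocks):
        size = min(block_size_mb, file_size_mb - b * block_size_mb)  # last block may be short
        replicas = [nodes[(b + r) % len(nodes)] for r in range(replication)]
        plan.append((b, size, replicas))
    return plan


if __name__ == "__main__":
    for block in plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"]):
        print(block)
```

For a 300 MB file this yields blocks of 128, 128 and 44 MB. HDFS does not pad a short final block to the full block size. Losing any single machine still leaves two copies of every block.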
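YARN, Hadoop's resource manager, hands out "containers" (a slice of a machine's memory and CPU cores) to the jobs running on the cluster. The sketch below shows the basic bookkeeping with a simple first-fit rule. This is an assumption for illustration only. Production YARN uses pluggable schedulers such as the Capacity Scheduler or Fair Scheduler, with queues and priorities. `Node`, `ContainerRequest` and `allocate` are hypothetical names, not YARN classes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A worker machine (a NodeManager in YARN terms) and its total capacity."""
    name: str
    memory_mb: int
    vcores: int


@dataclass
class ContainerRequest:
    """A request from a job for one container of a given size."""
    job: str
    memory_mb: int
    vcores: int


def allocate(nodes, requests):
    """First-fit allocation: put each request on the first node with enough free memory and vcores.

    Returns (assignments, pending): assignments is a list of (job, node name) pairs,
    pending lists jobs whose request could not be satisfied right now.
    """
    free = {n.name: (n.memory_mb, n.vcores) for n in nodes}
    assignments, pending = [], []
    for req in requests:
        for n in nodes:
            mem, cores = free[n.name]
            if mem >= req.memory_mb and cores >= req.vcores:
                free[n.name] = (mem - req.memory_mb, cores - req.vcores)
                assignments.append((req.job, n.name))
                break
        else:  # no node had room: the request waits
            pending.append(req.job)
    return assignments, pending


if __name__ == "__main__":
    cluster = [Node("nm1", 8192, 4), Node("nm2", 4096, 2)]
    reqs = [ContainerRequest("A", 4096, 2), ContainerRequest("B", 4096, 2),
            ContainerRequest("C", 4096, 2), ContainerRequest("D", 2048, 1)]
    print(allocate(cluster, reqs))
    # ([('A', 'nm1'), ('B', 'nm1'), ('C', 'nm2')], ['D'])
```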
Big Data and Hadoop Overview
References
Slides on Hadoop ecosystem overview: Big Data Modelling and Management Systems course on Coursera
https://www.coursera.org/learn/big-data-management
Source of Mayor's speech: Mayor Andrew Ginther's 2017 State of the City speech
https://www.columbusunderground.com/full-text-of-the-2017-state-of-the-city-address
Editor's Notes

  • #3: There is no single definition of Big Data. In a nutshell, it refers to data so large in scale and variety that it becomes impractical to store and process with traditional RDBMS technologies.
  • #4: Big data sources: Machines, people and organizations
  • #5: Big Data characteristics: volume, velocity, variety and veracity
  • #6: Structured data: e.g., CSV files and RDBMS tables. Unstructured data: videos, audio, PDFs, email messages, etc. (the data has no underlying model or structure). Semi-structured data: XML, JSON.
  • #8: Hadoop is an open-source ecosystem of software tools used to store and process big data. It was started by Doug Cutting and grew into its own project while he worked at Yahoo. Its design was based on papers Google published on the Google File System (GFS) and MapReduce. It is now an Apache open-source project with numerous contributors, and new software is added to the ecosystem from time to time.
  • #9: Spark also processes data in a cluster. For some workloads it can be up to 100x faster than MapReduce because it supports "in-memory computation": intermediate data stays in RAM instead of being written to disk between steps, which is especially valuable for iterative workloads such as machine learning. Spark has an SQL interface called Spark SQL and provides programming interfaces for languages including Java, Python and Scala.
  • #10: Autonomous trucks are already being used in the Australian outback; it is not a matter of if this will spread, only when. Many of today's white-collar jobs will no longer exist in 15 years, and where they do exist, significantly fewer professionals will be needed per task.
  • #12: HDFS (Hadoop Distributed File System) is Hadoop's storage layer: it splits files into large blocks and stores replicated copies of each block across the machines in the cluster.
  • #13: YARN manages the cluster's resources (CPU cores and memory) and allocates them to the jobs running on the platform.
  • #14: The MapReduce model processes data in two stages, "Map" and "Reduce". The map stage turns each input record into intermediate key/value pairs; the framework then groups those pairs by key, and the reduce stage combines the values for each key into the final output. Both stages run the code you write for them.
  • #15: Pig: a high-level scripting language (Pig Latin) used for ETL (extract, transform, load) jobs.
  • #16: Used to analyze graph-structured data, such as social networks.
  • #17: Spark (same note as slide #9).
  • #19: ZooKeeper: a coordination service (shared configuration, synchronization, naming) that keeps the software in the ecosystem in sync so that the components work together harmoniously.
  • #20: Sir Arthur C. Clarke was a British science fiction writer.
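The note on slide #16 mentions analyzing data from social networks, which is naturally a graph of users and connections. A common analysis is ranking users by influence. The minimal single-machine PageRank sketch below is a technique chosen for illustration and is not specified by the slides. It runs in synchronous rounds in which every vertex sends a share of its rank to its neighbours. This follows the spirit of the vertex-centric (Pregel-style) model used by graph engines such as Apache Giraph. The friend graph is made-up data.

```python
def pagerank(graph, damping=0.85, iterations=30):
    """Iterative PageRank on an adjacency dict {user: [neighbours...]}.

    Each round every vertex sends rank / out_degree to each neighbour (one "superstep"),
    then every vertex recomputes its rank from what it received. Assumes every vertex
    has at least one outgoing edge and every neighbour is itself a key of `graph`.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        incoming = {v: 0.0 for v in graph}
        for v, neighbours in graph.items():
            share = rank[v] / len(neighbours)
            for u in neighbours:
                incoming[u] += share
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in graph}
    return rank


if __name__ == "__main__":
    friends = {
        "ana": ["ben", "cai", "dev"],
        "ben": ["ana"],
        "cai": ["ana", "dev"],
        "dev": ["ana", "cai"],
    }
    ranks = pagerank(friends)
    print(max(ranks, key=ranks.get))  # ana
```

Because no vertex is a dead end, the ranks always sum to 1. The best-connected user ("ana") ends up with the highest score.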