© Copyright 2013 EMC Corporation. All rights reserved.
Big Data – General Introduction
- Vignesh Gopalan, IIG
Agenda
Big Data – Definition
Importance of Big Data
Technologies used in Big Data Analysis
Big Data – A Definition
The ‘V’s of Big Data
Volume
Variety
Velocity
Veracity
Why is Big Data Important?
Business Analytics
Big Science, such as the LHC and gene sequencing programs
Big Government
Big Data – Technologies Primer
MapReduce computation framework and Hadoop
Distributed File System
Distributed databases
NoSQL technologies
MapReduce
A distributed computation framework
Published by Google
Scalable
Fault-tolerant
Batch computation in parallel
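The "fault-tolerant batch computation in parallel" idea can be illustrated with a minimal sketch. It assumes threads stand in for cluster worker nodes and that a failed task is simply run again; the names `run_with_retries`, `parallel_batch`, and `flaky_sum` are hypothetical and are not part of Hadoop or any MapReduce API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_retries(task, chunk, max_attempts=3):
    """Re-run a failed task, loosely mimicking how a framework reschedules work."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt

def parallel_batch(task, data, num_chunks=4, workers=4):
    """Split the input into chunks and process the chunks in parallel."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: run_with_retries(task, c), chunks))

# Hypothetical task that fails the first time it sees each chunk.
_seen = set()

def flaky_sum(chunk):
    key = tuple(chunk)
    if key not in _seen:
        _seen.add(key)
        raise RuntimeError("simulated worker failure")
    return sum(chunk)

partial_sums = parallel_batch(flaky_sum, list(range(100)))
total = sum(partial_sums)
print(total)
```

Every chunk fails once and succeeds on the retry, so the batch still completes. A real cluster would reschedule the task on a different machine rather than retry in place.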
MapReduce
Consists of two functions operating on key-value pairs.
Map – performs filtering and sorting, emitting intermediate key-value pairs.
Reduce – performs a summary operation on the Map step results, grouped by key.
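The two functions can be sketched with the classic word-count example. This is a single-process illustration, not Hadoop code: `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical names, and the grouping step that a real framework performs across machines is simulated here with a dictionary.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: summarize all values emitted for one key."""
    return word, sum(counts)

def map_reduce(records, mapper, reducer):
    """Run the mapper, group intermediate values by key, then run the reducer."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Keys are processed in sorted order, as in the grouping phase between Map and Reduce.
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

lines = [(0, "big data is big"), (1, "data is everywhere")]
word_counts = map_reduce(lines, map_fn, reduce_fn)
print(word_counts)
```

Because each call to `map_fn` depends only on its own record, and each call to `reduce_fn` only on one key's values, both phases can be spread across many machines.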
MapReduce (continued)
Image courtesy: Big Data by Nathan Marz and James Warren, Manning Publications
Distributed File System
Distributed and scalable file system
Highly Available
Exposes data locations so Map and Reduce tasks can run close to the data
Supports horizontal and vertical partitioning
HDFS – Hadoop Distributed File System
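How a distributed file system can stay available when machines fail can be sketched with a toy model: a file is split into fixed-size blocks and each block is copied to several nodes. The round-robin placement, the tiny block size, and the node names below are illustrative assumptions; real HDFS uses far larger blocks and a rack-aware placement policy, and none of these function names come from its API.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (an assumption)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

def readable_blocks(placement, failed_nodes):
    """A block stays readable while at least one of its replicas is on a live node."""
    return [
        b for b, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]

data = b"x" * 1000
blocks = split_into_blocks(data, block_size=256)
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
placement = place_blocks(len(blocks), nodes)
survivors = readable_blocks(placement, failed_nodes={"node-a", "node-b"})
print(placement)
print(survivors)
```

With three replicas per block, losing any two nodes still leaves every block readable, which is the availability property listed above.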
HDFS Architecture
Image courtesy: Big Data by Nathan Marz and James Warren, Manning Publications
Apache Hadoop
Open-source implementation of MapReduce and a distributed file system (HDFS)
Image Courtesy – Wikipedia
NoSQL Databases
Often highly optimized key-value stores
Many relax ACID guarantees in favor of eventual consistency
Fault-tolerant, distributed architecture
Amazon Dynamo and Redis are examples
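Eventual consistency in a replicated key-value store can be illustrated with a small sketch. It assumes two in-memory replicas and a last-write-wins rule based on caller-supplied timestamps; this is a simplification chosen for illustration, not how Dynamo, Redis, or any specific product resolves conflicts, and the `Replica` class and `sync` function are hypothetical.

```python
class Replica:
    """One copy of a key-value store; each entry keeps the timestamp of its write."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def put(self, key, value, timestamp):
        current = self.store.get(key)
        # Last-write-wins (an assumption): keep only the newest write per key.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

def sync(a, b):
    """Exchange entries in both directions so the replicas converge."""
    for key, (ts, value) in list(a.store.items()):
        b.put(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.put(key, value, ts)

r1, r2 = Replica("r1"), Replica("r2")
r1.put("cart", ["book"], timestamp=1)
r2.put("cart", ["book", "pen"], timestamp=2)  # newer write reaches only r2

stale_read = r1.get("cart")  # r1 has not seen the newer write yet
sync(r1, r2)
fresh_read = r1.get("cart")
print(stale_read, fresh_read)
```

Between the write and the sync, a reader of `r1` sees an older value. After the replicas exchange state, both return the same value, which is the guarantee that "eventual" refers to.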