Copyright ©2012 Big Logic Technologies
A Big Data - Technology, Consulting & Training Firm
-- Big Logic was founded in the US on the value it saw in Apache Hadoop as a
Big Data analytics platform.

-- At Big Logic, we share our experiences after guiding many enterprises through successful Big
Data projects. We empower you to decide on build versus buy when it comes to achieving your
defined business objectives across various technical environments.

Big data is a term applied to data sets whose size is beyond the ability of commonly used
software tools to capture, manage, and process the data within a tolerable elapsed time.

Gartner predicts 800% data growth over the next 5 years.


80-90% of data produced today is unstructured.
Unit             Bytes    In the next smaller unit
gigabyte (GB)    10^9     1024 MB
terabyte (TB)    10^12    1024 GB
petabyte (PB)    10^15    1024 TB
exabyte (EB)     10^18    1024 PB
zettabyte (ZB)   10^21    1024 EB
yottabyte (YB)   10^24    1024 ZB

2009: 800,000 petabytes
2020: 35 zettabytes, i.e. 35 billion TB
44x as much data and content over the coming decade

Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010

1 zettabyte = 1 099 511 627 776 GB (2^40 GB, counting in multiples of 1024)
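The figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (power-of-ten) units for the IDC numbers and multiples of 1024 only for the final conversion; the variable names are illustrative and not part of the original slides.

```python
# Decimal (SI) byte units.
GB, TB, PB, ZB = 10**9, 10**12, 10**15, 10**21

digital_universe_2009 = 800_000 * PB   # IDC figure for 2009
digital_universe_2020 = 35 * ZB        # IDC projection for 2020

# 35 ZB expressed in terabytes: 35 billion TB.
tb_in_2020 = digital_universe_2020 // TB

# Growth factor between 2009 and 2020, which the slide rounds to 44x.
growth = digital_universe_2020 / digital_universe_2009

# Counting in multiples of 1024, one zettabyte is 1024**4 = 2**40 gigabytes.
gb_per_zb_binary = 1024**4

print(tb_in_2020)        # 35000000000
print(growth)            # 43.75
print(gb_per_zb_binary)  # 1099511627776
```

Note that the two conventions give different answers: in decimal units, 1 zettabyte is 10^12 GB rather than 2^40 GB.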
Source: http://www.slideshare.net/cultureofperformance/gartner-idc-and-mckinsey-on-big-data
“Moore's law is the observation that, over the history of computing hardware, the
number of transistors on integrated circuits doubles approximately every two years.”
Named after Intel co-founder Gordon E. Moore.

RAM Max Capacity : 32GB

HDD Max Size : 6TB

CPU Max Speed

If I need to process a 100TB dataset:
• On 1 node:
– scanning @ 50MB/s = 23 days
• On a 1000-node cluster:
– scanning @ 50MB/s per node = 33 min
• Challenge: hardware problems, plus processing and combining data from
multiple disks
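The scan-time estimate can be reproduced as follows. This is a rough sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a dataset split evenly across nodes, and no coordination or failure overhead; `scan_time_seconds` is a hypothetical helper, not something from the slides.

```python
def scan_time_seconds(dataset_bytes, bytes_per_second_per_node, nodes=1):
    """Time to scan a dataset split evenly across nodes reading in parallel."""
    return dataset_bytes / (bytes_per_second_per_node * nodes)

TB = 10**12
MB = 10**6

single = scan_time_seconds(100 * TB, 50 * MB)          # 2,000,000 seconds
cluster = scan_time_seconds(100 * TB, 50 * MB, 1000)   # 2,000 seconds

print(f"1 node:     {single / 86_400:.1f} days")     # 23.1 days
print(f"1000 nodes: {cluster / 60:.1f} minutes")     # 33.3 minutes
```

The linear speed-up is the ideal case. In practice, the challenge named above (hardware failures and the cost of combining partial results) means a real cluster would take somewhat longer.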

• Apache Hadoop is an open source framework for storing, processing
and analysing massive amounts of multi-structured data in a
distributed environment.
• Hadoop was inspired by Google's MapReduce and Google File
System (GFS) papers.
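To give a feel for the MapReduce programming model that inspired Hadoop, here is a single-process word count written as map, shuffle and reduce steps. It is only an illustration of the idea in plain Python, not the Hadoop API; all function names are hypothetical, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data is big", "hadoop stores big data"])
print(counts["big"], counts["data"], counts["hadoop"])  # 3 2 1
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread across nodes, which is what makes the model suit the multi-disk scanning problem described earlier.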
If you are in any of the above segments, you would be part of the above revenue.



Introduction to Big Data by Manouj Bongirr
