Big Data – A Brief Overview

    Petabytes, Hadoop, Analytics,
 Collaborative business intelligence,
Data scientists, In-Memory Databases,
           NoSQL platforms
Big Data
•   What is it?
•   Where does it come from?
•   How do we process it?
•   What do we do with it?
•   Who are the players?
•   What are the opportunities?
What Is Big Data?

Like the term Cloud, it is a bit
           Nebulous
Attributes of Big Data
• Volume
• Velocity - streaming
• Variety
Where Does It Come From?

        It Depends
Key Drivers

Spread of cloud computing, mobile
   computing and social media
technologies, financial transactions
Sources of Big Data
•   Chatter from social networks,
•   Web server logs,
•   Traffic flow sensors,
•   Satellite imagery,
•   Broadcast audio streams,
•   Banking transactions,
•   MP3s of rock music,
•   The content of web pages,
•   Scans of government documents,
•   GPS trails,
•   Telemetry from automobiles,
•   Financial market data
•   ….
How Do We Process It?
Process Pipeline




Source: http://radar.oreilly.com
Hadoop

A distributed processing Framework
       based on Map/Reduce
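
As a rough illustration of the Map/Reduce model named above, the sketch below counts words in plain Python. It runs in a single process and does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "Big deal"])
```

In a real cluster the map and reduce calls would run on many machines and the shuffle would move data across the network; here they are ordinary function calls so the three stages are easy to see.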
Pig
A platform for analyzing large data sets that
    consists of a high-level language for
expressing data analysis programs, coupled
  with infrastructure for evaluating these
                  programs.
Mahout

A machine learning library with algorithms
  for clustering, classification and batch
   based collaborative filtering that are
 implemented on top of Apache Hadoop.
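
To make "collaborative filtering" concrete, here is a minimal item co-occurrence sketch in plain Python. It is not Mahout code and does not reflect Mahout's API (Mahout is a Java library); the data, the function names and the scoring rule are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(baskets):
    """Count how often each ordered pair of items appears for the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def recommend(user, baskets, top_n=3):
    """Score unseen items by how often they co-occur with the user's items."""
    counts = cooccurrence(baskets)
    seen = set(baskets[user])
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical usage data: which tools each user has worked with.
baskets = {
    "ann": ["hadoop", "hive"],
    "bob": ["hadoop", "hive", "pig"],
    "cy": ["hadoop", "pig"],
}
suggestions = recommend("ann", baskets)
```

A batch system would compute the co-occurrence counts offline over the full data set and serve recommendations from the stored result.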
Hive

Data warehouse software built on top of
Apache Hadoop that facilitates querying
and managing large datasets residing in
         distributed storage.
Pegasus

A peta-scale graph mining system that runs
in a parallel, distributed manner on top of
                    Hadoop
Sqoop

A tool designed for efficiently transferring
 bulk data between Apache Hadoop and
structured data stores such as relational
               databases.
Flume

          A distributed service for
collecting, aggregating, and moving large
      amounts of log data to HDFS.
Yahoo S4
 S4 is a general-purpose, distributed, scalable,
partially fault-tolerant, pluggable platform that
     allows programmers to easily develop
     applications for processing continuous
           unbounded streams of data.
Twitter Storm

Storm can be used to process a
stream of new data and update
    databases in real time.
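
The idea of updating a store as each event arrives, rather than after a full batch, can be sketched in plain Python. This is not Storm or S4 code and uses neither API; the event source, the in-memory Counter standing in for a database, and all names are hypothetical.

```python
from collections import Counter


def event_stream():
    """Hypothetical source: yields events one at a time, as a stream would."""
    yield from ["login", "click", "click", "purchase", "click"]


def process(stream, store):
    """Update the store on every event and report the new running total."""
    for event in stream:
        store[event] += 1
        yield event, store[event]


# The Counter stands in for a database that is kept current in real time.
store = Counter()
updates = list(process(event_stream(), store))
```

Because process is a generator, each update is available as soon as its event is read, which is the main contrast with a batch job that reports only after all input has been consumed.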
Trends

Funding, Companies, Applications, Jobs, IPOs
Funding & IPO
• Cloudera (commercial Hadoop) has raised more than
  $75 million
• MapR (Cloudera competitor) has raised more
  than $25 million
• 10gen (maker of MongoDB) has raised $32 million
• DataStax (products based on Apache
  Cassandra) has raised $11 million
• Splunk raised about $230 million through its IPO
Big Data Application Domains
•   Healthcare
•   The public sector
•   Retail
•   Manufacturing
•   Personal-location data
•   Finance
A Few Examples
PayPal Tracking Architecture
Market and Market Segments

   Research Data and Predictions
The market for big data tools is projected to rise
from $9 billion to $86 billion by 2020
Source: http://wikibon.org/wiki/v/Big_Data_Market_Size_and_Vendor_Revenues
Future of Big Data
• More Powerful and Expressive Tools for Analysis
• Streaming Data Processing (Storm from Twitter and S4 from
  Yahoo)
• Rise of Data Market Places (InfoChimps, Azure
  Marketplace)
• Development of Data Science Workflows and Tools
  (Chorus, The Guardian, New York Times)
• Increased Understanding of Analysis and Visualization



Source: http://www.evolven.com/blog/big-data-predictions.html
Opportunities
Skills Gap
•   Statistics
•   Operations Research
•   Math
•   Programming
•   So-called "Data Hacking"