Introduction to Big Data
Joey Li
joeylicc@gmail.com
@joeylicc
joeylicc.wordpress.com
What is Big Data?
Big Data is a collection of data sets so large and complex
that it becomes difficult to process using traditional
database systems.
Big Data Challenges (3Vs)
● Volume: Amount of Data
● Velocity: Speed of Data In & Out
● Variety: Range of Data Types & Sources
Microsoft Solution to Big Data
● Microsoft HDInsight
● Microsoft .NET SDK for Hadoop
● Microsoft ODBC Driver for Hive
● Microsoft Excel (Power View & PowerPivot)
● Microsoft SharePoint (Power View)
Microsoft HDInsight
● 100% Apache Hadoop compatible Big Data
implementation
● Microsoft support of HDInsight on Windows Server and
Windows Azure
● Simplified deployment and ease of manageability with
System Center 2012 or Windows Azure
● Elegant connectivity to Microsoft Office Excel 2013 and
Business Intelligence tools
What is Hadoop?
Apache Hadoop is an open-source software
framework that allows for the distributed processing of
large data sets across clusters of computers using
simple programming models. It is designed to scale up from
single servers to thousands of machines, each offering
local computation and storage.
What is Hadoop? (Cont.)
Hadoop includes 2 major modules
1. Hadoop Distributed File System (HDFS)
A distributed file system that provides high-throughput
access to application data
2. Hadoop MapReduce
A programming model for parallel processing of large
data sets
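The MapReduce programming model can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration. On a real cluster, the map and reduce steps would run in parallel on many machines, with the framework handling the grouping step between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; each string stands in for one input record.
    sample = ["big data is big", "hadoop processes big data"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on one key's values, both steps can be spread across machines without coordination; that independence is what lets the model scale.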
Hadoop Architecture
Hadoop Cluster
HDFS Write Operation
HDFS Read Operation
MapReduce
Hadoop Ecosystem
Microsoft .NET SDK for Hadoop
● HDInsight Cluster Management
● Hadoop Job Submission
● Customize Map/Reduce Job
● LINQ to Hive
Microsoft ODBC Driver for Hive
● Connect the following tools to Hadoop for
data insight
○ Microsoft Excel (Power View & PowerPivot)
○ Microsoft SharePoint (Power View)
○ Microsoft SQL Server
■ Database Engine
■ Analysis Services
Learning Hadoop
● Get Started with Hadoop@Hortonworks
http://hortonworks.com/get-started/

● Big Data University
http://bigdatauniversity.com/

● Getting Started with Microsoft Big Data
http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-big-data
References
● Big Data@Wikipedia
http://en.wikipedia.org/wiki/Big_data

● Big Data@Microsoft
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx

● Hortonworks Data Platform (HDP)
http://hortonworks.com/
References (Cont.)
● Apache Hadoop
http://hadoop.apache.org/

● Apache Hadoop@Wikipedia
http://en.wikipedia.org/wiki/Apache_Hadoop

● Microsoft .NET SDK for Hadoop
http://hadoopsdk.codeplex.com/

● Microsoft ODBC Driver for Hive
http://www.microsoft.com/en-us/download/details.aspx?id=37134
