Apache Hadoop-based Services for Windows Azure




Sascha Dittmann
Software Developer / Solution Architect
Twitter: @SaschaDittmann
Blog:    http://www.sascha-dittmann.de
Apache Hadoop & Co
[Figure: Hadoop ecosystem overview, including ZooKeeper and Pig]
Hadoop Distributed File System

           Cluster startup
Hadoop Distributed File System
           NameNode failure (failover)
Hadoop Distributed File System
       User request
[Figure: request flow with one step ① followed by three parallel steps ②]
Hadoop Distributed File System
 - POSIX-like interface (Portable Operating System Interface): familiar shell commands such as ls, plus owners, groups and permission bits
 - Replication across multiple DataNodes
js> #ls input/ncdc
Found 9 items
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 13:01 /user/Sascha/input/ncdc/_distcp_logs_g0dedn
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 12:04 /user/Sascha/input/ncdc/_distcp_logs_ofj0u6
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 13:09 /user/Sascha/input/ncdc/all
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 13:01 /user/Sascha/input/ncdc/all2
drwxr-xr-x - Sascha   supergroup   0 2012-04-23 13:06 /user/Sascha/input/ncdc/metadata
drwxr-xr-x - Sascha   supergroup   0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro
drwxr-xr-x - Sascha   supergroup   0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
-rw-r--r-- 3 Sascha   supergroup   529 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
-rw-r--r-- 3 Sascha   supergroup   168 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
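In the listing above, the second column is the replication factor: 3 for the two files, "-" for directories. The following minimal sketch illustrates what replication across multiple DataNodes means: a file is cut into fixed-size blocks and each block is stored on three distinct DataNodes. The function `place_blocks`, the node names, the 256-byte block size and the round-robin placement are hypothetical simplifications; in HDFS the NameNode chooses replica locations with a rack-aware policy, and real block sizes are far larger.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BlockPlacement:
    block_id: int
    offset: int          # byte offset of the block within the file
    length: int          # number of bytes in this block
    replicas: list[str]  # DataNodes holding a copy of the block


def place_blocks(file_size: int, block_size: int, datanodes: list[str],
                 replication: int = 3) -> list[BlockPlacement]:
    """Split a file into blocks and assign each block to `replication` distinct DataNodes.

    Hypothetical round-robin placement for illustration only; HDFS lets the
    NameNode pick replica locations with a rack-aware policy.
    """
    if block_size <= 0:
        raise ValueError("block size must be positive")
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placements: list[BlockPlacement] = []
    offset = 0
    block_id = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be shorter
        # Start at a different node for each block so replicas spread evenly.
        replicas = [datanodes[(block_id + i) % len(datanodes)] for i in range(replication)]
        placements.append(BlockPlacement(block_id, offset, length, replicas))
        offset += length
        block_id += 1
    return placements


# sample.txt from the listing is 529 bytes long.
for b in place_blocks(529, 256, ["dn1", "dn2", "dn3", "dn4"]):
    print(b.block_id, b.offset, b.length, b.replicas)
# 0 0 256 ['dn1', 'dn2', 'dn3']
# 1 256 256 ['dn2', 'dn3', 'dn4']
# 2 512 17 ['dn3', 'dn4', 'dn1']
```

Because every block lives on three different nodes, losing any single DataNode still leaves at least two copies of each block.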
Map/Reduce
 Stages: DataNode (x3) -> Map (x3) -> Sort / Shuffle (x3) -> Reduce (x1)
 Input records on the DataNodes:
   0067011990999991950051507004+68750
   0043011990999991950051512004+68750
   0043011990999991950051518004+68750
   0043012650999991949032412004+62300
   0043012650999991949032418004+62300
 Map:              1949,0   1950,22   1950,55   |   1952,-11   1950,33
 Sort / Shuffle:   1949,0   1950,[22,33,55]   1952,-11
 Reduce:           1949,0   1950,55   1952,-11
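The slide appears to compute the maximum value per year: the reducer turns 1950,[22,33,55] into 1950,55. The input records shown do not contain enough of each line to extract the value field, so this minimal sketch starts from the map output drawn on the slide and simulates only the Sort/Shuffle and Reduce steps in memory. `shuffle` and `reduce_max` are illustrative names, not Hadoop APIs.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def shuffle(pairs: Iterable[tuple[int, int]]) -> dict[int, list[int]]:
    """Group map output by key and deliver the keys in sorted order (Sort/Shuffle step)."""
    groups: dict[int, list[int]] = defaultdict(list)
    for year, value in pairs:
        groups[year].append(value)
    # Values are sorted here only for readable output; Hadoop does not sort values by default.
    return {year: sorted(groups[year]) for year in sorted(groups)}


def reduce_max(groups: dict[int, list[int]]) -> list[tuple[int, int]]:
    """Reduce step: emit one (year, maximum) pair per key."""
    return [(year, max(values)) for year, values in groups.items()]


# Map output as drawn on the slide.
map_output = [(1949, 0), (1950, 22), (1950, 55), (1952, -11), (1950, 33)]
grouped = shuffle(map_output)
print(grouped)              # {1949: [0], 1950: [22, 33, 55], 1952: [-11]}
print(reduce_max(grouped))  # [(1949, 0), (1950, 55), (1952, -11)]
```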
Map/Reduce
 Stages: DataNode (x3) -> Map (x3) -> Combine (x3) -> Sort / Shuffle (x3) -> Reduce (x1)
 Input records on the DataNodes:
   0067011990999991950051507004+68750
   0043011990999991950051512004+68750
   0043011990999991950051518004+68750
   0043012650999991949032412004+62300
   0043012650999991949032418004+62300
 Map:              1949,0   1950,22   1950,55   |   1952,-11   1950,33
 Combine:          1949,0   1950,55             |   1952,-11   1950,33
 Sort / Shuffle:   1949,0   1950,[33,55]   1952,-11
 Reduce:           1949,0   1950,55   1952,-11
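A combiner runs a local reduce on each map task's output before the shuffle, so less data crosses the network: on the slide, 1950,22 and 1950,55 collapse to 1950,55, and the reducer only receives 1950,[33,55]. The sketch below is a hypothetical in-memory simulation that treats the two drawn output groups as the outputs of two map tasks and checks that the final result is unchanged; all function names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict


def combine_max(pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Local reduce on one map task's output: keep only the maximum per year."""
    best: dict[int, int] = {}
    for year, value in pairs:
        best[year] = value if year not in best else max(best[year], value)
    return list(best.items())


def shuffle(pairs: list[tuple[int, int]]) -> dict[int, list[int]]:
    """Group by key, keys in sorted order (values sorted only for readability)."""
    groups: dict[int, list[int]] = defaultdict(list)
    for year, value in pairs:
        groups[year].append(value)
    return {year: sorted(groups[year]) for year in sorted(groups)}


def reduce_max(groups: dict[int, list[int]]) -> list[tuple[int, int]]:
    """Final reduce: one (year, maximum) pair per key."""
    return [(year, max(values)) for year, values in groups.items()]


# The two map output groups as drawn on the slide.
left = [(1949, 0), (1950, 22), (1950, 55)]
right = [(1952, -11), (1950, 33)]

combined = combine_max(left) + combine_max(right)
print(combined)             # [(1949, 0), (1950, 55), (1952, -11), (1950, 33)]
grouped = shuffle(combined)
print(grouped)              # {1949: [0], 1950: [33, 55], 1952: [-11]}
print(reduce_max(grouped))  # [(1949, 0), (1950, 55), (1952, -11)]

# Same result as the job without a combiner.
assert reduce_max(grouped) == reduce_max(shuffle(left + right))
```

This only works because the maximum is associative and commutative, so taking partial maxima first cannot change the result. An average, for example, could not reuse the reducer as a combiner directly; the partial results would have to carry (sum, count) pairs instead.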
RDBMS vs. Map/Reduce
                        RDBMS                       Map/Reduce
Data volume             Gigabytes                   Petabytes
Access                  Interactive and batch       Batch
Reads / writes          Many reads and writes       Write once, read many
Data structure          Static schema               Dynamic schema
Data integrity          High                        Low
Scaling behavior        Non-linear                  Linear
Apache Hadoop & Co
[Figure: Hadoop ecosystem overview, including ZooKeeper and Pig]
Demos
 - Hadoop Dashboard
 - Interactive Console
 - Remote Desktop
 - Using Windows Azure (WA) Storage
 - Map/Reduce via JavaScript
 - C# Streaming
 - Power Pivot
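The C# Streaming demo presumably relies on Hadoop Streaming, where mapper and reducer are ordinary executables: they read lines from standard input, write tab-separated key/value lines to standard output, and the framework sorts the mapper output by key before it reaches the reducer. The following sketch illustrates that contract in Python rather than C#, for the per-year maximum from the Map/Reduce slides. The input format (one `year,temperature` pair per line), the script name and the `map`/`reduce` arguments are assumptions; `run_locally` only imitates a `map | sort | reduce` pipeline on one machine.

```python
from __future__ import annotations

import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Assumed input: 'year,temperature' per line; emit 'year<TAB>temperature'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        year, temp = line.split(",")
        yield f"{year}\t{int(temp)}"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Input is sorted by key, so equal years are adjacent; emit the maximum per year."""
    pairs = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for year, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{year}\t{max(int(temp) for _, temp in group)}"


def run_locally(lines: Iterable[str]) -> list[str]:
    """Imitate 'map | sort | reduce': sorting whole lines keeps equal keys adjacent."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    # Hypothetical invocation: 'python maxtemp.py map' or 'python maxtemp.py reduce'
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

For example, `run_locally(["1949,0", "1950,22", "1950,55", "1952,-11", "1950,33"])` returns `['1949\t0', '1950\t55', '1952\t-11']`, matching the Reduce output on the slides. The reducer never holds more than one key group at a time, which is why Streaming only needs to guarantee sorted input.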
Cloud Bloggers


The blogs of the German cloud computing community

Link: http://cloudbloggers.de

More Related Content

PDF
Getting Oriented with MapKit
PPTX
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
PPTX
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 2)
PDF
Hadoop Einführung @codecentric
PDF
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
PDF
Hadoop 2.0 - The Next Level
PPTX
Crawl the entire web in 10 minutes...and just 100€
PPTX
Big Data Bullshit Bingo
Getting Oriented with MapKit
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 2)
Hadoop Einführung @codecentric
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
Hadoop 2.0 - The Next Level
Crawl the entire web in 10 minutes...and just 100€
Big Data Bullshit Bingo

Similar to .NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Services für Windows Azure (20)

PDF
Hadoop - Lessons Learned
PDF
第一回Hadoop会at tenjin 20100625
PDF
PStorM
PDF
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
PPT
Big Data Analytics with Hadoop with @techmilind
 
KEY
Buzz words
PDF
Lecture 10: Data-Intensive Computing for Text Analysis (Fall 2011)
PDF
An important part of electrical engineering is PCB design. One impor.pdf
PDF
Geoff Rothman Presentation on Parallel Processing
PDF
Progressive NOSQL: Cassandra
PDF
2012-06-25 - MapReduce auf Azure
PDF
Efficient Parallel Set-Similarity Joins Using MapReduce - Poster
PDF
Hadoop Pig
KEY
Taming Cassandra
PDF
Intro to Map Reduce
PDF
Apache HBase: Introduction to a column-oriented data store
PPTX
EMC2, Владимир Суворов
PPTX
Megadata With Python and Hadoop
PPTX
Hadoop and mysql by Chris Schneider
PDF
Partitioning Under The Hood
Hadoop - Lessons Learned
第一回Hadoop会at tenjin 20100625
PStorM
Lecture 2: Data-Intensive Computing for Text Analysis (Fall 2011)
Big Data Analytics with Hadoop with @techmilind
 
Buzz words
Lecture 10: Data-Intensive Computing for Text Analysis (Fall 2011)
An important part of electrical engineering is PCB design. One impor.pdf
Geoff Rothman Presentation on Parallel Processing
Progressive NOSQL: Cassandra
2012-06-25 - MapReduce auf Azure
Efficient Parallel Set-Similarity Joins Using MapReduce - Poster
Hadoop Pig
Taming Cassandra
Intro to Map Reduce
Apache HBase: Introduction to a column-oriented data store
EMC2, Владимир Суворов
Megadata With Python and Hadoop
Hadoop and mysql by Chris Schneider
Partitioning Under The Hood
Ad

More from Sascha Dittmann (15)

PPTX
C# + SQL = Big Data
PDF
Hochskalierbare, relationale Datenbanken in Microsoft Azure
PDF
Microsoft R - Data Science at Scale
PDF
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
PPTX
dotnet Cologne 2015 - Azure Service Fabric
PPTX
SQL Saturday #313 Rheinland - MapReduce in der Praxis
PPTX
Microsoft HDInsight Podcast #001 - Was ist HDInsight
PDF
dotnet Cologne 2013 - Windows Azure Mobile Services
PPTX
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
PPTX
Developer Open Space 2012 - Cloud Computing Workshop
PPTX
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
PPTX
CloudOps Summit 2012 - 3 Wege in die Cloud
PPTX
Big Data & NoSQL
PPTX
NoSQL mit RavenDB und Azure
PPTX
Windows Azure für Entwickler V1
C# + SQL = Big Data
Hochskalierbare, relationale Datenbanken in Microsoft Azure
Microsoft R - Data Science at Scale
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
dotnet Cologne 2015 - Azure Service Fabric
SQL Saturday #313 Rheinland - MapReduce in der Praxis
Microsoft HDInsight Podcast #001 - Was ist HDInsight
dotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
Developer Open Space 2012 - Cloud Computing Workshop
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
CloudOps Summit 2012 - 3 Wege in die Cloud
Big Data & NoSQL
NoSQL mit RavenDB und Azure
Windows Azure für Entwickler V1
Ad

Recently uploaded (20)

PPTX
The various Industrial Revolutions .pptx
PPTX
observCloud-Native Containerability and monitoring.pptx
PPTX
O2C Customer Invoices to Receipt V15A.pptx
PDF
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
PPTX
Final SEM Unit 1 for mit wpu at pune .pptx
PPTX
Web Crawler for Trend Tracking Gen Z Insights.pptx
PDF
August Patch Tuesday
PPT
What is a Computer? Input Devices /output devices
PDF
DP Operators-handbook-extract for the Mautical Institute
PDF
TrustArc Webinar - Click, Consent, Trust: Winning the Privacy Game
PDF
Architecture types and enterprise applications.pdf
PPTX
Tartificialntelligence_presentation.pptx
PDF
A novel scalable deep ensemble learning framework for big data classification...
DOCX
search engine optimization ppt fir known well about this
PDF
A contest of sentiment analysis: k-nearest neighbor versus neural network
PPT
Geologic Time for studying geology for geologist
PDF
WOOl fibre morphology and structure.pdf for textiles
PDF
Zenith AI: Advanced Artificial Intelligence
PDF
A review of recent deep learning applications in wood surface defect identifi...
PPTX
Modernising the Digital Integration Hub
The various Industrial Revolutions .pptx
observCloud-Native Containerability and monitoring.pptx
O2C Customer Invoices to Receipt V15A.pptx
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
Final SEM Unit 1 for mit wpu at pune .pptx
Web Crawler for Trend Tracking Gen Z Insights.pptx
August Patch Tuesday
What is a Computer? Input Devices /output devices
DP Operators-handbook-extract for the Mautical Institute
TrustArc Webinar - Click, Consent, Trust: Winning the Privacy Game
Architecture types and enterprise applications.pdf
Tartificialntelligence_presentation.pptx
A novel scalable deep ensemble learning framework for big data classification...
search engine optimization ppt fir known well about this
A contest of sentiment analysis: k-nearest neighbor versus neural network
Geologic Time for studying geology for geologist
WOOl fibre morphology and structure.pdf for textiles
Zenith AI: Advanced Artificial Intelligence
A review of recent deep learning applications in wood surface defect identifi...
Modernising the Digital Integration Hub

.NET Usergroup Rhein-Neckar: Big Data in the Cloud - Apache Hadoop-based Services for Windows Azure
