SARA Hadoop Hackathon
   Evert.Lammerts@sara.nl
   December 7, 2010
DJOERD HIEMSTRA (UTwente)
EDGAR MEIJ (UvA)



             SARA Hadoop Hackathon, December 7, 2010
2002             2004                   2006

Nutch*           MR/GFS**               Hadoop


*  http://nutch.apache.org/
** http://labs.google.com/papers/mapreduce.html
   http://labs.google.com/papers/gfs.html

2010: A Hype in Production




http://wiki.apache.org/hadoop/PoweredBy
Super computing




Cloud computing                               Grid computing




     Cluster computing              GPU computing


                    http://www.sara.nl/
:-(   Data  --->  Computation     Expensive! (move the data to the computation)

:-)   Data  <---  Computation     Cheaper! (move the computation to the data)




Ref: Luiz André Barroso and Urs Hölzle, Google Inc.
   The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines



NameNode              JobTracker




DN   TT   DN      TT             DN        TT        DN     TT


DN   TT   DN      TT             DN        TT        DN     TT



                                                    DN    DataNode

                                                    TT   TaskTracker
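
A rough sketch of why every worker in the diagram pairs a DataNode (DN) with a TaskTracker (TT): the scheduler can then try to run a map task on a node that already stores the input block, instead of shipping the block over the network. The code below is a toy model in Python, not Hadoop's actual scheduler (which is heartbeat-driven and also takes rack locality into account). The block ids, node names and the assign_tasks helper are all hypothetical.

```python
# Toy model of locality-aware task placement.
# Block ids and node names are hypothetical, made up for this sketch.

# What the NameNode knows: which DataNodes hold a replica of each block.
BLOCK_LOCATIONS = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node3", "node4"],
}


def assign_tasks(block_locations, free_slots):
    """Assign one map task per block, preferring a node that stores the block.

    free_slots maps a node name to the number of free task slots on its
    TaskTracker. Returns a list of (block, node, is_local) tuples.
    """
    slots = dict(free_slots)  # work on a copy, leave the caller's dict alone
    assignments = []
    for block, holders in sorted(block_locations.items()):
        # First choice: a node that holds the block and has a free slot.
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with a free slot; the block must be copied.
            candidates = sorted(n for n, free in slots.items() if free > 0)
            if not candidates:
                raise RuntimeError("no free task slots")
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignments.append((block, node, is_local))
    return assignments


if __name__ == "__main__":
    everyone_free = {"node1": 1, "node2": 1, "node3": 1, "node4": 1}
    for block, node, is_local in assign_tasks(BLOCK_LOCATIONS, everyone_free):
        print(block, node, "local" if is_local else "remote")
```

With one free slot on every node, each block in this toy input lands on a node that already stores it. When the nodes holding a block are busy, the task falls back to a remote node and the data has to travel.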

File   Map                              Shuffle         Reduce           Output
       $ echo "${email#*@}, ${name}"     $ sort          $ wc -l




                                                                      ewi.utwente.nl, 1
                                                                      gmail.com,      2
                                                                      nbic.nl,        1
                                                                      nikhef.nl,      3
                                                                      sara.nl,        1
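
The same pipeline can be written out as three small functions. This is a minimal single-process sketch in Python, not Hadoop code: the map step emits (domain, name) pairs as the echo command does, the shuffle step groups and sorts by key as sort does, and the reduce step counts the values per key as wc -l would per group. The attendee list is invented for illustration, chosen only so that the counts match the output above.

```python
from collections import defaultdict

# Hypothetical (name, email) records; invented so the per-domain counts
# match the output column shown on the slide.
ATTENDEES = [
    ("attendee1", "a1@nikhef.nl"),
    ("attendee2", "a2@gmail.com"),
    ("attendee3", "a3@sara.nl"),
    ("attendee4", "a4@nikhef.nl"),
    ("attendee5", "a5@ewi.utwente.nl"),
    ("attendee6", "a6@gmail.com"),
    ("attendee7", "a7@nbic.nl"),
    ("attendee8", "a8@nikhef.nl"),
]


def map_phase(records):
    """Map: emit one (domain, name) pair per record."""
    for name, email in records:
        # Everything after the first "@", like ${email#*@} in the shell.
        yield email.partition("@")[2], name


def shuffle_phase(pairs):
    """Shuffle: group values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each group of values to its size."""
    return [(key, len(values)) for key, values in grouped]


def count_domains(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    for domain, count in count_domains(ATTENDEES):
        print(f"{domain}, {count}")
```

In a real Hadoop job the map and reduce functions run on many TaskTrackers in parallel and the framework performs the shuffle between them; here everything runs in one process to keep the data flow visible.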




From: Hadoop, The Definitive Guide (2nd Edition), Tom White




Today

09.30 - 09.50   Welcome & Introduction
09.50 - 10.15   Map/Reduce @ University of Twente
10.15 - 10.30   Kick-off hackathon
10.30 - 17.00   Hackathon
14.00 - 15.00   Optional: SARA tour
17.00 - 17.30   Results and closing






Introduction to SARA's Hadoop Hackathon - dec 7th 2010
