QUICK AND DIRTY PARALLEL PROCESSING ON THE CLOUD
Daniel Sikar
EC2 S3
 
Tools
AWS command line tools
Elastic MapReduce Ruby library
Hadoop
s3cmd
Hadoop: MapReduce, Job Tracker, HDFS (distributed file system)
Hadoop MapReduce usage: data crunching in general (clicks, statistics, etc.)
Hadoop Project Mgmt Committee
MapReduce?
MapReduce key/value pairs: <key, value>
MapReduce
HTTP Logs
Log file A: (...) FreeTouchScreenNokia5230 (...) (...) GetRidofAllSpeedCameras (...) (...) USManWinsLottery (...) (...) BNPToLaunchElectionManifesto (...)
Log file B: (...) FreeTouchScreenNokia5230 (...) (...) BodyLanguageTellsAll (...)
MapReduce <FreeTouchScreenNokia5230, 1> + <FreeTouchScreenNokia5230, 1> = <FreeTouchScreenNokia5230, 2>
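The counting step on this slide can be sketched in a few lines of Python. This is only an in-memory illustration, not how Hadoop itself is implemented: the function names `map_titles` and `reduce_counts` are hypothetical, and the log contents are reduced to the article titles shown in the HTTP Logs slide.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two log files, keeping only the titles
# that appear on the slide.
log_a = [
    "FreeTouchScreenNokia5230",
    "GetRidofAllSpeedCameras",
    "USManWinsLottery",
    "BNPToLaunchElectionManifesto",
]
log_b = [
    "FreeTouchScreenNokia5230",
    "BodyLanguageTellsAll",
]


def map_titles(log_lines):
    """Map step: emit a <title, 1> pair for every non-empty log entry."""
    for line in log_lines:
        title = line.strip()
        if title:
            yield title, 1


def reduce_counts(pairs):
    """Reduce step: sum the values of all pairs that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


counts = reduce_counts(map_titles(log_a + log_b))
print(counts["FreeTouchScreenNokia5230"])
```

The title that appears in both log files ends up with a count of 2, while every other title keeps a count of 1.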
Hadoop Streaming: running MapReduce jobs with .exe files and scripts
$ <list> | mapper | reducer
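A rough sketch of what the mapper and reducer in that pipeline might look like, assuming Python scripts that exchange tab-separated `key<TAB>value` lines. Hadoop sorts the mapper output by key before the reducer sees it, so the local simulation adds a sort between the two stages. The function names `mapper`, `reducer` and `run_local` are illustrative, not part of Hadoop; in a real streaming job each stage would be its own script reading `sys.stdin` and printing to standard output.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'token<TAB>1' line per whitespace-separated token."""
    for line in lines:
        for token in line.split():
            yield f"{token}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines sharing a key.

    Relies on the input being sorted by key, as it is after Hadoop's
    shuffle-and-sort phase.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def run_local(lines):
    """Simulate '<list> | mapper | sort | reducer' in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["FreeTouchScreenNokia5230 USManWinsLottery",
              "FreeTouchScreenNokia5230"]
    for out_line in run_local(sample):
        print(out_line)
```

Testing the two stages locally like this before launching a cluster is cheap, since the same logic can then be split into two scripts and passed to Hadoop Streaming.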
Real life example of Hadoop Streaming usage
Wikipedia Page Access Logs
Wine Grape Varieties
Wikipedia WGV Page Access Stats
Business Decisions
Launching a virtual Hadoop cluster
$ elastic-mapreduce --create --name "Wiki log crunch" --alive --num-instances 20 --instance-type c1.medium
Created job flow <job flow id>
$ ec2din (...)
 
Hadoop Standalone Operation
Pseudo-Distributed Operation
Fully-Distributed Operation
NameNode
JobTracker
DataNode + TaskTracker
Hadoop Standalone Operation



Editor's Notes

  • #22: So without further ado, let's get this show on the road and run a job concurrently on a few virtual machines.