MapReduce
@shot6
[Figure: Hadoop ecosystem stack. Cloudera Desktop, Avro, Sqoop; Pig, Hive, HBase, Chukwa; MapReduce, HDFS, ZooKeeper; Core]
•  MapReduce
  – Mapper/Reducer
MapReduce
•  WordCount
  – Mapper/Reducer and Job execution
  – InputFormat/OutputFormat usage
  – HDFS (FileSystem)
  – Writable usage
WordCount
•  The "Hello World" of Hadoop
•  Written against the new API
   (org.apache.hadoop.mapreduce)
•  API
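The WordCount data flow can be sketched without any Hadoop dependency. This is only an illustration of map, group-by-key, and reduce in plain Java; `WordCountSketch` and its methods are hypothetical names, not classes from the Hadoop examples.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the WordCount flow; no Hadoop classes are used.
class WordCountSketch {

    // "Map" step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // "Reduce" step: sum every count that was emitted for one word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map each line, group values by key (the framework's job), then reduce.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(run(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

In a real job the grouping step is done by the framework between the Map and Reduce phases; here it is a `TreeMap` so the result comes out sorted by key.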
Grep
•  grep
  – Runs 2 jobs: grepJob/sortJob
  – JobConf/Mapper/Reducer usage
  – Mapper: runs RegexMapper; <Text, Long> in SequenceFileFormat
  – sortJob
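The two-stage shape of Grep (count the matches, then sort by count) can be illustrated in plain Java. This is a sketch under the assumption that the second stage orders results by descending count; `GrepSketch`, `grepJob`, and `sortJob` here are hypothetical methods that only borrow the job names from the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-stage pipeline; no Hadoop classes are used.
class GrepSketch {

    // Stage 1: emit (match, 1) for every regex match and sum per match.
    static Map<String, Long> grepJob(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1L, Long::sum);
            }
        }
        return counts;
    }

    // Stage 2: order the (match, count) pairs by count, largest first.
    static List<Map.Entry<String, Long>> sortJob(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return sorted;
    }
}
```

Calling `GrepSketch.sortJob(GrepSketch.grepJob(lines, "error|warn"))` chains the two stages the way the second job consumes the first job's output.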
Grep
                  -	
•  JobConf
•  Mapper
•  Reducer
o.a.hadoop.mapred.JobConf	
•  Defaults come from mapred-default.xml
•  Site settings come from conf/mapred-site.xml
•  The XML itself is read through DOM
•  Child configuration
     •  JobConf child = new JobConf(parentConf, jarClass);
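The layering of defaults, site settings, and a child configuration can be sketched with `java.util.Properties`. This is an analogy only: `ConfLayeringSketch` is a hypothetical class, the property values are illustrative, and the child here delegates lookups to its parent, which may differ from how JobConf itself carries settings over.

```java
import java.util.Properties;

// Hypothetical analogy for configuration layering; not the JobConf implementation.
class ConfLayeringSketch {

    // Site values win; anything missing falls back to the defaults.
    static Properties layered(Properties defaults, Properties site) {
        Properties conf = new Properties(defaults);
        for (String name : site.stringPropertyNames()) {
            conf.setProperty(name, site.getProperty(name));
        }
        return conf;
    }

    // A child sees everything its parent has and can override locally.
    static Properties child(Properties parent) {
        return new Properties(parent);
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("mapred.job.tracker", "local");
        defaults.setProperty("mapred.reduce.tasks", "1");

        Properties site = new Properties();
        site.setProperty("mapred.job.tracker", "your-site:9001");

        Properties conf = layered(defaults, site);
        Properties child = child(conf);
        child.setProperty("mapred.reduce.tasks", "4");

        System.out.println(child.getProperty("mapred.job.tracker")); // prints your-site:9001
        System.out.println(child.getProperty("mapred.reduce.tasks")); // prints 4
        System.out.println(conf.getProperty("mapred.reduce.tasks"));  // prints 1
    }
}
```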
mapred-site.xml	
<configuration>
<!-- JobTracker host and port -->
<property>
 <name>mapred.job.tracker</name>
 <value>your-site:9001</value>
</property>
</configuration>
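o.a.hadoop.mapred.Mapper	
•  The Mapper interface
•  One Mapper per InputSplit
•  Driven by MapTask/MapRunner
•  map(KEY, VALUE, COLLECTOR,
   REPORTER)
     – KEY: Map input key; VALUE: Map input value
     – COLLECTOR: collects the output
     – REPORTER: progress and status reporting API
•  Implementations usually extend MapReduceBase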
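The four-argument shape of map can be illustrated with simplified stand-in interfaces. Everything below is hypothetical: `Collector`, `Reporter`, and `Mapper` only mirror the roles named on the slide, and `SwapMapper` is an invented example that emits (value, key).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified mirrors of the roles on the slide; not the Hadoop types.
class MapperSketch {

    interface Collector<K, V> {
        void collect(K key, V value);
    }

    interface Reporter {
        void progress();
    }

    interface Mapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, Collector<K2, V2> output, Reporter reporter);
    }

    // Example mapper: the input key is a byte offset, the value is a line; emit (line, offset).
    static class SwapMapper implements Mapper<Long, String, String, Long> {
        @Override
        public void map(Long offset, String line, Collector<String, Long> output, Reporter reporter) {
            output.collect(line, offset);
            reporter.progress(); // tell the framework this task is still alive
        }
    }

    // Feeds one record through the mapper and returns what it emitted.
    static List<String> runOnce(String line) {
        List<String> emitted = new ArrayList<>();
        new SwapMapper().map(0L, line, (k, v) -> emitted.add(k + "=" + v), () -> { });
        return emitted;
    }
}
```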
o.a.hadoop.mapred.MapTask	
•  Map
•  initialize (in Task, also used for Reducer)
  – Task state (o.a.h.mapred.TaskStatus.State)
       •  RUNNING, SUCCEEDED, FAILED, UNASSIGNED,
          KILLED, COMMIT_PENDING, FAILED_UNCLEAN,
          KILLED_UNCLEAN
  – Creates the OutputCommitter
       •  Task output directory
          – mapred.work.output.dir
o.a.h.mapred.MapTask cont	
•  run calls runOldMapper
•  InputSplit from JobClient
•  RecordReader
o.a.h.mapred.MapTask cont2	
•  Reduce
  – Spill
       •  $mapred.local.dir/taskTracker/jobcache/${taskid}/output/spill${spillNumber}.out
  – Output for the Reducer
       •  Combiner: min.num.spills.for.combine
  – Output through RecordWriter
•  MapRunner
o.a.h.mapred.MapRunner	
•  Implements MapRunnable
  – Selected with mapred.map.runner.class
  – Hadoop: PipeMapRunner
  – Multi-threaded Map: MultiThreadedMapRunner
o.a.h.mapred.MapRunner cont	
•  run(RecordReader, OutputCollector,
   Reporter)
     – RecordReader: reader for the split produced by the InputFormat
       (InputFormat/RecordReader)
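The read loop behind run can be sketched as follows: create one key and one value object, then keep refilling them with next(key, value) and handing them to map until the reader reports the end of input. The interfaces and `LineReader` below are hypothetical simplifications, and the Reporter argument is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the record loop; not the MapRunner source.
class MapRunnerSketch {

    interface RecordReader<K, V> {
        K createKey();
        V createValue();
        boolean next(K key, V value); // fills key and value; false at end of input
    }

    interface Collector<K, V> {
        void collect(K key, V value);
    }

    interface Mapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, Collector<K2, V2> output);
    }

    // The loop itself: one key object and one value object are reused for every record.
    static <K1, V1, K2, V2> void run(RecordReader<K1, V1> input,
                                     Mapper<K1, V1, K2, V2> mapper,
                                     Collector<K2, V2> output) {
        K1 key = input.createKey();
        V1 value = input.createValue();
        while (input.next(key, value)) {
            mapper.map(key, value, output);
        }
    }

    // Toy reader: key[0] is the character offset of the line, value holds the line text.
    static class LineReader implements RecordReader<long[], StringBuilder> {
        private final Iterator<String> lines;
        private long offset = 0;

        LineReader(List<String> lines) {
            this.lines = lines.iterator();
        }

        public long[] createKey() {
            return new long[1];
        }

        public StringBuilder createValue() {
            return new StringBuilder();
        }

        public boolean next(long[] key, StringBuilder value) {
            if (!lines.hasNext()) {
                return false;
            }
            String line = lines.next();
            key[0] = offset;
            value.setLength(0);
            value.append(line);
            offset += line.length() + 1; // +1 for the newline
            return true;
        }
    }

    static List<String> demo(List<String> lines) {
        List<String> out = new ArrayList<>();
        Mapper<long[], StringBuilder, Long, String> mapper =
            (key, value, output) -> output.collect(key[0], value.toString());
        Collector<Long, String> sink = (key, value) -> out.add(key + ":" + value);
        run(new LineReader(lines), mapper, sink);
        return out;
    }
}
```

Because the key and value objects are reused, the example mapper copies them (`key[0]`, `value.toString()`) before passing them on.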
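[Sequence diagram: MapTask, MapRunner, Mapper, RecordReader, OutputCollector, SpillThread. MapTask creates the InputSplit, then creates and runs MapRunner; createKey() and createValue() are called on the RecordReader, followed by next(key, value) until EOF; each record goes to map(key, value, outputCollector, reporter); SpillThread performs the spills.]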
m(_ _)m
•  Mapper
     – JobConf
     – Mapper/MapRunner/MapTask
•  
     – Reducer
       •  Reducer execution
     – InputFormat/RecordReader
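o.a.h.mapred.Reducer	
•  The Reducer interface
•  InputSplit      Mapper
•  ReduceTask/ReduceRunner
•  reduce(KEY, Iterator<VALUE>,
   COLLECTOR, REPORTER)
     – KEY: key; Iterator<VALUE>: the values for that key
     – COLLECTOR: collects the output
     – REPORTER: progress and status reporting API
•  Implementations usually extend MapReduceBase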
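The reduce signature, one key plus an iterator over all of its values, can be illustrated with a summing reducer. The interfaces below are hypothetical simplifications of the roles on the slide, and `SumReducer` is an invented example.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical, simplified mirrors of the roles on the slide; not the Hadoop types.
class ReducerSketch {

    interface Collector<K, V> {
        void collect(K key, V value);
    }

    interface Reporter {
        void progress();
    }

    interface Reducer<K2, V2, K3, V3> {
        void reduce(K2 key, Iterator<V2> values, Collector<K3, V3> output, Reporter reporter);
    }

    // Example reducer: add up every value that arrived for the key.
    static class SumReducer implements Reducer<String, Long, String, Long> {
        @Override
        public void reduce(String key, Iterator<Long> values,
                           Collector<String, Long> output, Reporter reporter) {
            long sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            output.collect(key, sum);
            reporter.progress();
        }
    }

    // Runs the reducer for one key and returns the single value it emitted.
    static long sumFor(String key, Long... values) {
        long[] result = new long[1];
        new SumReducer().reduce(key, Arrays.asList(values).iterator(),
            (k, v) -> result[0] = v, () -> { });
        return result[0];
    }
}
```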
o.a.h.mapred.ReduceTask	
•  SHUFFLE
•  ReduceTask.ReduceCopier
  – fetchOutputs(            Merger.MergeQueue)
    •  Map output copiers x mapred.reduce.parallel.copies
         – MapOutputCopier
    •  Merges Map outputs on disk: LocalFSMerger
    •  Merges in memory: InMemFSMergeThread
    •  GetMapEventsThread
         – Map completion events
         – <     , MapOutputLocation(taskId, host, httpUrl)>
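The idea of a fixed number of copier threads pulling map outputs in parallel can be sketched with a thread pool. This is a toy illustration, not the ReduceCopier logic: `ParallelCopySketch` is hypothetical, the "copy" is just a string operation, and `parallelCopies` stands in for the mapred.reduce.parallel.copies setting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of bounded parallel fetching; no Hadoop classes are used.
class ParallelCopySketch {

    // Fetches every location using at most parallelCopies worker threads.
    static List<String> fetchAll(List<String> locations, int parallelCopies) {
        ExecutorService copiers = Executors.newFixedThreadPool(parallelCopies);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String location : locations) {
                // Stand-in for copying one map output from a remote host.
                pending.add(copiers.submit(() -> "copied:" + location));
            }
            List<String> fetched = new ArrayList<>();
            for (Future<String> f : pending) {
                fetched.add(f.get()); // wait for each copy, in submission order
            }
            return fetched;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("copy failed", e);
        } finally {
            copiers.shutdown();
        }
    }
}
```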
o.a.h.mapred.ReduceTask	
•  run(RecordReader, OutputCollector,
   Reporter)
•  SORT
  – Creates an iterator over the in-memory and on-disk data
    •  RawKeyValueIterator
  – Creates the Reducer
  – Creates the RecordWriter
  – Runs with ReduceValuesIterator
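The sort phase can be pictured as a k-way merge of key-sorted runs followed by a pass that groups consecutive equal keys for each reduce call. The sketch below shows that idea in plain Java; `MergeAndGroupSketch` is hypothetical, and the summing in `reduceGroups` is only an example reduce function.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of merge-then-group; not the ReduceTask implementation.
class MergeAndGroupSketch {

    // Merges several key-sorted runs into one key-sorted list (k-way merge).
    static List<Map.Entry<String, Long>> merge(List<List<Map.Entry<String, Long>>> runs) {
        // Each queue element is {run index, position in run}, ordered by the key it points at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> runs.get(a[0]).get(a[1]).getKey()
                .compareTo(runs.get(b[0]).get(b[1]).getKey()));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heads.add(new int[] {r, 0});
            }
        }
        List<Map.Entry<String, Long>> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            List<Map.Entry<String, Long>> run = runs.get(head[0]);
            merged.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    // Walks the merged stream; consecutive records with the same key form one group.
    static List<Map.Entry<String, Long>> reduceGroups(List<Map.Entry<String, Long>> merged) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        int i = 0;
        while (i < merged.size()) {
            String key = merged.get(i).getKey();
            long sum = 0;
            while (i < merged.size() && merged.get(i).getKey().equals(key)) {
                sum += merged.get(i).getValue();
                i++;
            }
            out.add(new SimpleEntry<>(key, sum));
        }
        return out;
    }
}
```

The grouping pass relies on the merged stream being sorted: all values for a key are adjacent, so a single forward scan is enough.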

MapReduce Code as Seen Through the Samples (サンプルから見るMapReduceコード)
