Apache Hama
A Bulk Synchronous Parallel Computing Engine

          Edward J. Yoon
     <edwardyoon@apache.org>
Who Am I
• Edward J. Yoon
  – @eddieyoon
• Founder of Apache Hama
• PMC member of Apache BigTop
• Oracle Employee
What’s Hama?
• Open Source
  – Under Apache 2.0 License
• Written in Java
• Apache Top Level Project
Apache hama @ Samsung SW Academy
Characteristics
• A general-purpose BSP computing engine
   – M/R-like input/output formatters
      • SequenceFile, Text, Accumulo, HBase, etc.
   – Job Manager
   – Checkpoint Recovery
• Streaming and Pipes
   – Python, C++, etc.
• Graph and Machine Learning Packages
   – K-means, Gradient Descent, Collaborative Filtering
Bulk Synchronous Parallel?
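• Originally introduced by Leslie Valiant
• A computation is a sequence of supersteps
   – Each superstep: local computation, communication, barrier synchronization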
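
To make the superstep idea concrete, here is a minimal sketch in plain Java. It does not use the Hama API: the class `SuperstepSketch` and its `run` method are hypothetical names, peers are simulated with threads, and a `CyclicBarrier` stands in for the barrier synchronization. Each peer sends its local value to every peer in superstep 0, and only after the barrier does it read its inbox and compute the global sum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

/** Toy BSP run (not Hama): every peer learns the global sum in two supersteps. */
final class SuperstepSketch {

    /** Starts one thread per entry of localValues and returns the sum each peer computed. */
    static long[] run(long[] localValues) {
        final int n = localValues.length;
        // inbox.get(i) collects messages addressed to peer i; it is read only after the barrier.
        final List<ConcurrentLinkedQueue<Long>> inbox = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            inbox.add(new ConcurrentLinkedQueue<>());
        }
        final CyclicBarrier barrier = new CyclicBarrier(n);
        final long[] result = new long[n];
        Thread[] threads = new Thread[n];

        for (int p = 0; p < n; p++) {
            final int me = p;
            threads[p] = new Thread(() -> {
                try {
                    // Superstep 0: local work plus communication (send to every peer).
                    for (int dest = 0; dest < n; dest++) {
                        inbox.get(dest).add(localValues[me]);
                    }
                    barrier.await(); // barrier synchronization ends superstep 0

                    // Superstep 1: all messages from superstep 0 have been delivered.
                    long sum = 0;
                    for (long v : inbox.get(me)) {
                        sum += v;
                    }
                    result[me] = sum;
                    barrier.await(); // barrier synchronization ends superstep 1
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return result;
    }
}
```

Calling `SuperstepSketch.run(new long[] {1, 2, 3, 4})` gives every peer the same sum, because no peer reads its inbox before all peers have passed the barrier.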
Compared to M/R and MPI
• Supports a message-passing style of
  application development
• Provides a small, flexible, simple, and
  easy-to-use API
• Can perform better than MPI for
  communication-intensive applications
• Guarantees the absence of deadlocks and
  collisions in the communication mechanism
So, what is it a good fit for?
• Processing Big Data with complicated
  relationships
   – e.g., graphs or networks
• Iterative or Recursive scientific applications
• Continuous Event Processing
What counts as Big Data?
Could be applied to
•   Analyze user actions and patterns
•   Social Target Marketing
•   Observe evolution of Social networks
•   Detect anomalies rapidly in real time
•   Business Intelligence
Internals
• Pluggable RPC architecture for message
  transfer
  – e.g., Hadoop RPC, Avro RPC, etc.
• Message collector, bundler, and
  compressor to reduce network overhead
  and contention
  – e.g., Snappy, Bzip2, etc.
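
The bundling and compression idea can be sketched in plain Java as follows. This is an illustration only, not Hama's implementation: `MessageBundleSketch`, `bundle`, and `unbundle` are hypothetical names, and GZIP from `java.util.zip` is used as a stand-in because Snappy and Bzip2 are not part of the JDK. All messages for one destination are packed into a single compressed payload, so the network sees one transfer instead of many small ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Toy bundler (not Hama): many messages become one compressed payload. */
final class MessageBundleSketch {

    /** Packs all messages for one destination into a single GZIP-compressed byte array. */
    static byte[] bundle(List<String> messages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(messages.size());   // header: number of messages in the bundle
            for (String m : messages) {
                out.writeUTF(m);             // length-prefixed message body
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the GZIP trailer has been written.
        return bytes.toByteArray();
    }

    /** Reverses bundle(): decompresses the payload and returns the messages in order. */
    static List<String> unbundle(byte[] payload) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(payload)))) {
            int count = in.readInt();
            List<String> messages = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                messages.add(in.readUTF());
            }
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A sender would call `bundle` once per destination at the end of a superstep, and the receiver would call `unbundle` before handing the messages to user code.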
BSP API


public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer)
  throws IOException, SyncException;
BSP Examples
•   Pi Calculation
•   Sparse Matrix-Vector Multiplication
•   K-means Clustering
•   Gradient Descent
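
As an illustration of the first example, a Monte Carlo Pi calculation can be laid out in two supersteps. The sketch below is plain Java and does not call the Hama API: `PiBspSketch` and `estimatePi` are hypothetical names, the peers are simulated one after another in a single thread, and the barrier is only marked by a comment. In superstep 0 each peer samples random points and sends its local estimate to a master; in superstep 1 the master averages what it received.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy two-superstep Pi estimate (not Hama): peers are simulated sequentially. */
final class PiBspSketch {

    /** Returns the master's averaged estimate of Pi. */
    static double estimatePi(int numPeers, int samplesPerPeer, long seed) {
        // Messages addressed to the master peer.
        List<Double> masterInbox = new ArrayList<>();

        // Superstep 0: each peer samples points in the unit square and
        // counts how many fall inside the quarter circle of radius 1.
        for (int peer = 0; peer < numPeers; peer++) {
            Random rnd = new Random(seed + peer); // per-peer generator
            int inside = 0;
            for (int i = 0; i < samplesPerPeer; i++) {
                double x = rnd.nextDouble();
                double y = rnd.nextDouble();
                if (x * x + y * y <= 1.0) {
                    inside++;
                }
            }
            // "Send" the local estimate to the master.
            masterInbox.add(4.0 * inside / samplesPerPeer);
        }

        // Barrier: in a real BSP run all peers would synchronize here,
        // and only then would the master see the messages.

        // Superstep 1: the master averages the local estimates.
        double sum = 0.0;
        for (double estimate : masterInbox) {
            sum += estimate;
        }
        return sum / numPeers;
    }
}
```

For example, `PiBspSketch.estimatePi(4, 100000, 42L)` returns a value close to `Math.PI`; more peers or more samples per peer tighten the estimate.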
Graph API


public void compute(Iterator<M> messages)
   throws IOException;
Graph Examples
•   In-link Count
•   Single Source Shortest Path
•   PageRank
•   Bipartite Matching
•   Semi-Clustering
Find Maximum Value
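
A vertex-centric maximum-value computation can be sketched as below. This is a plain-Java simulation under stated assumptions, not the Hama Graph API: `MaxValueSketch` and `propagateMax` are hypothetical names, and the per-vertex loop body plays the role that a `compute(messages)` method would play. In each superstep a vertex reads its incoming messages, keeps the largest value seen, and forwards it to its neighbours only if its own value changed. The run stops when no messages are sent.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy vertex-centric "find maximum value" (not Hama), simulated in one thread. */
final class MaxValueSketch {

    /** adjacency[v] lists the out-neighbours of vertex v; returns the final value of each vertex. */
    static int[] propagateMax(int[] initial, int[][] adjacency) {
        int n = initial.length;
        int[] value = initial.clone();

        // Superstep 0: every vertex announces its own value to its neighbours.
        List<List<Integer>> inbox = emptyInboxes(n);
        for (int v = 0; v < n; v++) {
            for (int target : adjacency[v]) {
                inbox.get(target).add(value[v]);
            }
        }

        boolean anyMessage = true;
        while (anyMessage) {
            // Messages sent now become visible only in the next superstep.
            List<List<Integer>> next = emptyInboxes(n);
            anyMessage = false;
            for (int v = 0; v < n; v++) {
                int best = value[v];
                for (int m : inbox.get(v)) {
                    best = Math.max(best, m);
                }
                if (best > value[v]) {
                    // Value changed: update and tell the neighbours.
                    value[v] = best;
                    for (int target : adjacency[v]) {
                        next.get(target).add(best);
                        anyMessage = true;
                    }
                }
                // Otherwise the vertex stays silent (votes to halt).
            }
            inbox = next; // barrier between supersteps
        }
        return value;
    }

    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }
}
```

On a directed ring `0 → 1 → 2 → 3 → 0` with initial values `{3, 6, 2, 1}`, every vertex ends with the value 6 after the maximum has travelled once around the ring.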
SSSP Performance
• SSSP on a random graph of 1 billion
  edges is computed in 400 seconds on
  one Oracle BDA (Big Data Appliance)

