Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
César Diniz Maciel
Executive IT Specialist
IBM Corporate Strategy
cmaciel@us.ibm.com
IBA4101 - POWERing your big data
solution with IBM: open-source
Hadoop on POWER
© Copyright IBM Corporation 2015
Session objectives
●
At the end of this session you should be able to understand
the options available to run open-source Hadoop on IBM
Power Systems, from building it from source code to using
packaged solutions.
●
You will also learn about integrated SW solutions and
appliances that leverage Power Systems technology to
accelerate Hadoop-based implementations.
Acknowledgements
●
This presentation is based on the work done by the IBM
Solutions Operating Environment (IBMSOE) team. I would
like to thank the following experts from IBM US for their
contributions:
– Corentin Baron
– Luke Browning
– Pascal Oliva
– Tony Reix
– Daniele Silvestre
Hadoop
●
From the Hadoop website:
“The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers using
simple programming models. It is designed to scale up from single servers to
thousands of machines, each offering local computation and storage. Rather
than rely on hardware to deliver high-availability, the library itself is designed to
detect and handle failures at the application layer, so delivering a
highly-available service on top of a cluster of computers, each of which may be prone
to failures.”
●
Actively developed by the open-source community and by several
vendors
– Several pre-packaged distributions of Hadoop are available, including IBM
InfoSphere BigInsights
– Hadoop market expected to reach US$20 billion in 2018, according to
Transparency Market Research
Hadoop components
Apache Spark – in-memory MapReduce
Open Source Hadoop on Power Systems
●
The IBM Solutions Operating Environment (IBMSOE) hosts buildable
Hadoop source trees optimized for Linux running on
POWER. These trees support both Red Hat Enterprise
Linux (RHEL) 6.5 or later (big-endian, with PowerVM) and
Ubuntu 14.04 (little-endian, with PowerKVM). They also
work with openSUSE 13.2 (big-endian). Other versions or
releases may also work, but have not been tested.
●
This build is based on the open-source Hadoop tree, with the
changes required for building on Power Systems and with
IBM Java (optimized for POWER). The source is hosted on
GitHub.
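Before choosing a tree, it can help to confirm which flavour a given Linux partition is running. A minimal sketch, assuming `uname -m` reports `ppc64le` on little-endian and `ppc64` on big-endian installs; the function name is illustrative and not part of the IBMSOE tree:

```shell
#!/usr/bin/env bash
# Map a machine string (as printed by `uname -m`) to the build flavour
# described above. classify_power_arch is a hypothetical helper name.
classify_power_arch() {
  case "$1" in
    ppc64le) echo "POWER little-endian" ;;
    ppc64)   echo "POWER big-endian" ;;
    *)       echo "not POWER: $1" ;;
  esac
}

# Classify the machine this script runs on.
classify_power_arch "$(uname -m)"
```

On anything other than a POWER Linux install the last line simply reports the machine string it found.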
Why build from scratch?
●
Alternative for deploying Hadoop without purchasing
commercial software
●
Community-based development and support
●
Ability to run Hadoop in a wider range of environments (older
systems, different operating environments)
●
Desire to learn the details of the Hadoop framework
Hardware prerequisites
●
Hadoop itself has no specific hardware requirements
– Designed to run on inexpensive, unreliable hardware and
to scale out
– The faster and more reliable the system, the better the
solution performs and the easier it is to maintain
●
It can be built on POWER7 processor-based systems and
later
– Functionally equivalent on POWER7 or POWER8, little
endian or big endian
I like open-source, but I don't want to build from
scratch…
As with x86, pre-packaged Hadoop solutions are available for
Power Systems
●
IBM has offered InfoSphere BigInsights on Power Systems since 2012
– Version 3.2 is available for POWER7 processor-based systems
– Now updated with the latest IBM Open Platform with Apache Hadoop
(IOP), for POWER8 processor-based systems
– IOP is available as a 100% open-source, free-to-use product, with
optional fee-based, value-add components
●
Veristorm offers the Veristorm Data Hub, a 100% open-source Hadoop build
for POWER8, free to use.
Details on how to build from scratch on POWER are in the
Backup section of the presentation
Any Power Systems server running RHEL 6.5
Trial version available at
https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-beta-iibob
IBM Open Platform with Apache Hadoop
http://www-03.ibm.com/software/products/en/ibm-open-platform-with-apache-hadoop
IBM Open Platform with Apache Hadoop
IBM® Open Platform with Apache Hadoop builds the
platform for big data projects and provides the most
current Apache Hadoop open source content. IBM offers
this open source Apache distribution as a free download
as well as a supported offering for all your Hadoop
workloads.
➔ 100% Apache Hadoop Open Source platform
➔ HDFS, YARN, MapReduce, Ambari, HBase, Hive, Oozie,
Parquet, Parquet Format, Pig, Snappy, Solr, Sqoop,
ZooKeeper, OpenJDK, Knox, Slider
➔ Spark in-memory distributed compute engine
Download IBM Open Platform with Apache Hadoop
Installation and configuration
●
Installation involves adding a repository and installing Ambari
●
“The Apache Ambari project is aimed at making Hadoop management simpler
by developing software for provisioning, managing, and monitoring Apache
Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop
management web UI backed by its RESTful APIs.”
●
After Ambari is installed, set up and start the Ambari server, then perform the
cluster installation from a web browser
– ambari-server setup
– ambari-server start
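The two commands above can be wrapped in a small script that prints what it would do before anything is run. This is only a sketch: the `run` dry-run helper and the `start_ambari` function are hypothetical, and port 8080 is assumed to be the unchanged Ambari default.

```shell
#!/usr/bin/env bash
# Dry-run helper: print each command instead of executing it unless
# DRY_RUN=0 is set. Hypothetical helper, not part of Ambari.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

start_ambari() {
  run ambari-server setup    # interactive: asks about JDK, database, account
  run ambari-server start
  run ambari-server status
}

start_ambari
echo "Then browse to http://<ambari-host>:8080 (assumed default port)"
```

Running it with `DRY_RUN=0` as root executes the same sequence for real.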
IBM Systems Technical University, October 5-9 | Hilton Orlando
IBM BigInsights 4.1 Value Adds
IBM Value Adds (fee-based modules):
– IBM BigInsights Analyst
– IBM BigInsights Data Scientist
– Components shown: Big SQL, BigSheets, Text Analytics, Machine Learning on Big R, Big R
IOP – 100% open-source, free
– Download and use for production
– Optional fee-based support
IBM BigInsights Extends Value of IOP
PLUS
– Big SQL for a rich, ANSI-compliant SQL interface to Hadoop
– World-class Text Analytics from IBM Research
– Distributed frames: leverage R as a query language on Hadoop
– Scalable algorithms from IBM Research – SystemML
– GPFS-FPO for enterprise data life cycle management
– Browser-based, Excel-like analytics interface
– Probabilistic matching engine on Hadoop
Components shown: BigMatch, Big SQL, Text Analytics, Big R, Machine
Learning, BigSheets
●
Veristorm provides binaries of Hadoop for POWER8 free of charge
●
Compiled from the same IBMSOE source, in a joint effort between IBM and
Veristorm
●
Supported on Ubuntu, Red Hat and SUSE (all LE versions)
●
Provides the same functionality as a build compiled from source, but is faster
and easier to deploy
●
Community support for the free package, with optional (paid) Enterprise
support
●
http://www.veristorm.com/veristorm-data-hub
Veristorm Data Hub
●
Prerequisites
– IBM JDK 1.7 or later
– Python 2.6 or later
– PostgreSQL 9.3 or later
– zlibc 0.9k-4.1 or later
●
Installation can be done with the downloaded package, or via repository
– echo "deb http://repo.veristorm.com/repos/vdh/apt/debian vdh-ppc-vstore main" > /etc/apt/sources.list.d/vdh.list
– apt-get update
– apt-get install vdh-ppc-ambari-server
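The repository step above can be tried without root by writing the source entry to a scratch file first. A sketch only: `write_vdh_repo` is a hypothetical helper, and the repository line is copied from the slide rather than verified against the current Veristorm site.

```shell
#!/usr/bin/env bash
# Write the Veristorm apt source entry quoted above to a file.
# write_vdh_repo is a hypothetical helper name.
write_vdh_repo() {
  local target="$1"
  printf '%s\n' \
    'deb http://repo.veristorm.com/repos/vdh/apt/debian vdh-ppc-vstore main' \
    > "$target"
}

# Try it against a scratch file and show the result.
demo_file="$(mktemp)"
write_vdh_repo "$demo_file"
cat "$demo_file"

# On a real system, as root:
#   write_vdh_repo /etc/apt/sources.list.d/vdh.list
#   apt-get update && apt-get install vdh-ppc-ambari-server
```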
Managing the cluster with Ambari - Hortonworks
Managing the cluster with Ambari - Veristorm
Managing the cluster with Ambari - IOP
Conclusions
●
Open-source Hadoop is well regarded by customers
●
The ability to combine the characteristics of Power
Systems servers with the flexibility of Linux and Hadoop
makes for an interesting combination with little or no
learning curve
●
The free, community-supported Veristorm package
simplifies the implementation and management of Hadoop
on Power Systems.
●
IBM Open Platform with Apache Hadoop brings an
IBM-backed, 100% open-source, free package for enterprise
environments.
●
Going forward, continued focus on performance and management
will make the solution even more attractive.
Continue growing your IBM skills
ibm.com/training provides a
comprehensive portfolio of skills and career
accelerators that are designed to meet all
your training needs.
• Training in cities local to you - where and
when you need it, and in the format you want
– Use IBM Training Search to locate public training classes
near to you with our five Global Training Providers
– Private training is also available with our Global Training Providers
• Demanding a high standard of quality –
view the paths to success
– Browse Training Paths and Certifications to find the
course that is right for you
• If you can’t find the training that is right for you with our
Global Training Providers, we can help.
– Contact IBM Training at dpmc@us.ibm.com
Global Skills Initiative
Backup slides
Software prerequisites
●
Hadoop requires several components in order to run, and
others are required to build the binaries
●
Besides the typical development environment (C/C++
compiler, Java, build tools), additional libraries and Java build
tools are required. Most of them are available with
the Linux distribution.
●
Requirements vary with the Linux distribution and
release
– Red Hat 6.5 ships the snappy library precompiled;
Red Hat 7 does not
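Because the prerequisites differ by distribution, a build script may want to detect the distribution before installing anything. A minimal sketch, assuming the system ships `/etc/os-release` (older releases such as RHEL 6 may not); both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Read the ID field from an os-release file; print "unknown" if the file
# is missing. distro_id and package_tool are hypothetical helper names.
distro_id() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || { echo "unknown"; return 0; }
  # Source in a subshell so the file's variables do not leak.
  ( . "$file"; echo "${ID:-unknown}" )
}

# Suggest the package tool usually found on that distribution.
package_tool() {
  case "$1" in
    ubuntu|debian)      echo "apt-get" ;;
    rhel|centos|fedora) echo "yum" ;;
    opensuse*|sles)     echo "zypper" ;;
    *)                  echo "unknown" ;;
  esac
}

current="$(distro_id)"
echo "distribution: $current, package tool: $(package_tool "$current")"
```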
Prerequisite install for Ubuntu Linux
root@cmaciel4:~# apt-get install cmake automake autoconf git openssl libssl-dev zlib1g fuse libfuse-dev
Download and install IBM Java (if not already
installed)
●
Java 1.6 or later is required
●
Set JAVA_HOME and PATH to point to the Java binaries
– Login shells source the scripts in /etc/profile.d
– Adding scripts for environment variables under this directory makes them
easy to maintain
– /etc/profile.d/java.sh
●
export JAVA_HOME=/opt/ibm/ibm-java-ppc64le-71
●
export PATH=$JAVA_HOME/bin:$PATH
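The profile script described above can be generated rather than typed by hand. A sketch under stated assumptions: `write_java_profile` is a hypothetical helper, and the default JAVA_HOME is simply the path from the slide, which may differ on your system.

```shell
#!/usr/bin/env bash
# Generate a profile script like /etc/profile.d/java.sh.
# write_java_profile is a hypothetical helper name.
write_java_profile() {
  local target="$1"
  local java_home="${2:-/opt/ibm/ibm-java-ppc64le-71}"
  cat > "$target" <<EOF
# Added for the Hadoop build: point the shell at IBM Java.
export JAVA_HOME=$java_home
export PATH=\$JAVA_HOME/bin:\$PATH
EOF
}

# Try it against a scratch file first; as root, the target would be
# /etc/profile.d/java.sh.
scratch="$(mktemp)"
write_java_profile "$scratch"
cat "$scratch"
```

Keeping `$JAVA_HOME` unexpanded in the PATH line means only one value has to change if Java is upgraded.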
Prerequisite install for Ubuntu Linux
●
Download and install the IBM Advance Toolchain (gcc)
– IBM Advance Toolchain for PowerLinux Documentation
– “The IBM Advance Toolchain for PowerLinux is a set of open source
development tools and runtime libraries which allows users to take
leading edge advantage of IBM's latest POWER hardware features on
Linux”
– gcc and libraries optimized for the latest POWER processors
●
Add the repository per the documentation
●
apt-get install advance-toolchain-at8.0-runtime advance-toolchain-at8.0-devel advance-toolchain-at8.0-perf advance-toolchain-at8.0-mcore-libs
●
Set PATH to include the Advance Toolchain binaries
– export PATH=/opt/at8.0/bin:$PATH
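If the PATH line ends up in a script that is sourced more than once, the toolchain directory is prepended repeatedly. A small sketch of one way to avoid that; `path_prepend` is a hypothetical helper, and `/opt/at8.0/bin` is the location given above:

```shell
#!/usr/bin/env bash
# Prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper name.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
}

path_prepend /opt/at8.0/bin
path_prepend /opt/at8.0/bin      # second call is a no-op
export PATH

# With the toolchain installed, this is expected to resolve to the
# Advance Toolchain gcc rather than the distribution's compiler.
command -v gcc || echo "gcc not found on PATH"
```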
Prerequisite install
Hadoop uses Apache Maven for software project management, and
Apache Ant as a Java build tool.
“Apache Maven is a software project management and comprehension tool. Based on
the concept of a project object model (POM), Maven can manage a project's build,
reporting and documentation from a central piece of information.”
“Apache Ant is a Java library and command-line tool whose mission is to drive
processes described in build files as targets and extension points dependent upon
each other. The main known usage of Ant is the build of Java applications.”
Download Maven (version 3.3 or higher)
Download Ant
Prerequisite install
Hadoop can optionally use snappy, a compression library
developed by Google
– apt-get install libsnappy1 libsnappy-dev
– Not available as a package on Red Hat; compile it
from source
Prerequisite install
●
Set Maven and Ant environment variables
– export M2_HOME=/opt/apache-maven-3.3.1
– export M2=$M2_HOME/bin
– export PATH=$M2:$PATH
– export ANT_HOME=/opt/apache-ant-1.9.4
– export PATH=$ANT_HOME/bin:$PATH
Install Google's Protocol Buffers
– apt-get install protobuf-compiler libprotobuf-dev
– Not available as a package on Red Hat; compile it
from source
Building Hadoop
●
Once all prerequisites are installed, run a final check:
– echo $JAVA_HOME
– echo $M2_HOME
– echo $ANT_HOME
– echo $M2
– echo $PATH
●
Check all commands run properly:
– gcc --version
– java -version
– ant -version
– mvn -version
– protoc --version
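The checks above can be collected into one script that reports everything missing in a single pass. A sketch only; `check_vars` and `check_cmds` are hypothetical helper names, and the variable and command lists are the ones from the slide:

```shell
#!/usr/bin/env bash
# Report which of the named environment variables are unset or empty.
check_vars() {
  local missing=0 name value
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names are trusted here).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "MISSING variable: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Report which of the named commands cannot be found on PATH.
check_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "MISSING command: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

check_vars JAVA_HOME M2_HOME ANT_HOME M2 || true
check_cmds gcc java ant mvn protoc || true
```

Both functions return a non-zero status when something is missing, so a build script can stop early instead of failing partway through the Maven run.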
Building Hadoop
●
If everything is working properly, retrieve the Hadoop code from
https://github.com/ibmsoe/hadoop-common (either by downloading the zip file
or by cloning the tree with git).
●
To compile Hadoop with JNI and snappy support and install it into the
local Maven cache, run the following build command from the root of the
Hadoop tree directory (for example, hadoop-common):
– <hadoop_common> $ mvn install -Pnative -DskipTests -Drequire.snappy
●
install places the build artifacts in the local Maven repository
●
-Pnative builds the native code
●
-DskipTests compiles the tests but does not run them
●
-Drequire.snappy requires the snappy compression library and fails the build if it is not found
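The clone-and-build steps can be sketched as a script that prints the commands by default and only executes them on request. The `run` and `build_hadoop` helpers are hypothetical; using `mvn -f` to point at the tree's `pom.xml` is a choice made here to avoid changing directory, not something the slides prescribe.

```shell
#!/usr/bin/env bash
# Print (default) or execute the clone-and-build steps.
# Set DRY_RUN=0 to really run them; helper names are hypothetical.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

build_hadoop() {
  local src="${1:-hadoop-common}"
  run git clone https://github.com/ibmsoe/hadoop-common "$src"
  # -Pnative: build native code; -DskipTests: do not run the tests;
  # -Drequire.snappy: fail if the snappy library cannot be found.
  run mvn -f "$src/pom.xml" install -Pnative -DskipTests -Drequire.snappy
}

build_hadoop
```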
Creating a distribution package
●
mvn package -Pnative,dist -Drequire.snappy -DskipTests -Dtar
●
Tar file saved at
hadoop_common/hadoop-dist/target/hadoop-dist-2.4.1.tar.gz
●
Archive contains all binaries required to run Hadoop
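Once the archive exists, deploying it on another node is a matter of unpacking it. A minimal sketch, assuming the archive's first member is its top-level directory; `unpack_dist` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Unpack a distribution tarball into a target directory and print the
# top-level directory it contains. unpack_dist is a hypothetical helper.
unpack_dist() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest"
  # Assumes the first archive member names the top-level directory.
  tar -tzf "$tarball" | head -n 1 | cut -d/ -f1
}

# Usage, with the path quoted above (adjust to your build):
#   unpack_dist hadoop_common/hadoop-dist/target/hadoop-dist-2.4.1.tar.gz "$HOME/hadoop-dist"
```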
Hadoop basic commands
●
hadoop@cmaciel4:~/hadoop-dist/hadoop-2.4.1$ bin/hadoop version
– Hadoop 2.4.1
Subversion https://github.com/ibmsoe/hadoop-common -r
4d0782d33c2f6b91d34b44b15f09b142eab1a403
Compiled by hadoop on 2015-04-15T21:39Z
Compiled with protoc 2.5.0
From source with checksum c38708d16c3c1bd89a3ab88f5aefa9
This command was run using /home/hadoop/hadoop-dist/hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar
●
hadoop@cmaciel4:~/hadoop-dist/hadoop-2.4.1$ bin/hadoop fs -ls
– Found 7 items
drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 bin
drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 etc
drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 include
drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 lib
drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 libexec
drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 sbin
drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 share
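A script that deploys the build may want to confirm the release it is talking to. A small sketch that pulls the release number out of the first line of the `hadoop version` output shown above; `hadoop_release` is a hypothetical helper that reads the text on standard input:

```shell
#!/usr/bin/env bash
# Extract the release number from the first line of `hadoop version` output.
# hadoop_release is a hypothetical helper; it reads text on stdin.
hadoop_release() {
  awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Example using the first lines shown above; prints 2.4.1.
printf 'Hadoop 2.4.1\nCompiled with protoc 2.5.0\n' | hadoop_release

# Against a real install:
#   bin/hadoop version | hadoop_release
```

The `fs -ls` listing above shows the contents of the local install directory, which suggests the default local file system was in use rather than a configured HDFS; that is an inference from the output, not something the slide states.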

More Related Content

PDF
Ibm leads way with hadoop and spark 2015 may 15
PDF
HP Helion OpenStack step by step
PDF
Ibm power systems this is power on a smarter planet
PDF
Red Hat for Power Systems IBM Enterprise2014 Las Vegas
PPTX
Carpe Datum: Building Big Data Analytical Applications with HP Haven
PDF
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
PPTX
HP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian Frank
PPTX
Pivotal Strata NYC 2015 Apache HAWQ Launch
Ibm leads way with hadoop and spark 2015 may 15
HP Helion OpenStack step by step
Ibm power systems this is power on a smarter planet
Red Hat for Power Systems IBM Enterprise2014 Las Vegas
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
HP Helion Webinar #1 - Introduction to HP Helion OpenStack w/Christian Frank
Pivotal Strata NYC 2015 Apache HAWQ Launch

Similar to POWERing your big data solution with IBM: open-source Hadoop on POWER (20)

PDF
Big SQL Competitive Summary - Vendor Landscape
PPTX
Get started with hadoop hive hive ql languages
PDF
Exploring the Open Source Linux Ecosystem
 
PDF
SAP HANA on POWER9 systems
PDF
DFW BlueMix Meetup - demo and slides
PPT
Hadoop_Its_Not_Just_Internal_Storage_V14
PDF
Red Hat for IBM System z IBM Enterprise2014 Las Vegas
PDF
The sensor data challenge - Innovations (not only) for the Internet of Things
PPTX
Open Source in Entperprises - A Presentation by SAP at OSCON 2014 Confernece
PPTX
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
PDF
Building Apache Hadoop from source on IBM Power Systems
PDF
1040: OpenPOWER Foundation Update
PDF
SUSE Linux Enterprise Server for IBM Power
PDF
Big SQL NYC Event December by Virender
PPTX
A modern, flexible approach to Hadoop implementation incorporating innovation...
PDF
Hadoop Everywhere & Cloudbreak
PDF
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
PDF
EMC Big Data | Hadoop Starter Kit | EMC Forum 2014
 
PDF
Hadoop in adtech
PDF
Dissecting and Attacking RMI Frameworks
Big SQL Competitive Summary - Vendor Landscape
Get started with hadoop hive hive ql languages
Exploring the Open Source Linux Ecosystem
 
SAP HANA on POWER9 systems
DFW BlueMix Meetup - demo and slides
Hadoop_Its_Not_Just_Internal_Storage_V14
Red Hat for IBM System z IBM Enterprise2014 Las Vegas
The sensor data challenge - Innovations (not only) for the Internet of Things
Open Source in Entperprises - A Presentation by SAP at OSCON 2014 Confernece
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Building Apache Hadoop from source on IBM Power Systems
1040: OpenPOWER Foundation Update
SUSE Linux Enterprise Server for IBM Power
Big SQL NYC Event December by Virender
A modern, flexible approach to Hadoop implementation incorporating innovation...
Hadoop Everywhere & Cloudbreak
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
EMC Big Data | Hadoop Starter Kit | EMC Forum 2014
 
Hadoop in adtech
Dissecting and Attacking RMI Frameworks
Ad

Recently uploaded (20)

PPTX
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
PDF
Design an Analysis of Algorithms I-SECS-1021-03
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PDF
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
PDF
wealthsignaloriginal-com-DS-text-... (1).pdf
PPTX
Operating system designcfffgfgggggggvggggggggg
PPTX
Transform Your Business with a Software ERP System
PPTX
Odoo POS Development Services by CandidRoot Solutions
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PDF
How to Migrate SBCGlobal Email to Yahoo Easily
PDF
medical staffing services at VALiNTRY
PDF
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
PDF
EN-Survey-Report-SAP-LeanIX-EA-Insights-2025.pdf
PDF
Softaken Excel to vCard Converter Software.pdf
PDF
Digital Strategies for Manufacturing Companies
PDF
How Creative Agencies Leverage Project Management Software.pdf
PPTX
Essential Infomation Tech presentation.pptx
PDF
Design an Analysis of Algorithms II-SECS-1021-03
PDF
System and Network Administraation Chapter 3
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
Design an Analysis of Algorithms I-SECS-1021-03
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
wealthsignaloriginal-com-DS-text-... (1).pdf
Operating system designcfffgfgggggggvggggggggg
Transform Your Business with a Software ERP System
Odoo POS Development Services by CandidRoot Solutions
2025 Textile ERP Trends: SAP, Odoo & Oracle
How to Migrate SBCGlobal Email to Yahoo Easily
medical staffing services at VALiNTRY
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
EN-Survey-Report-SAP-LeanIX-EA-Insights-2025.pdf
Softaken Excel to vCard Converter Software.pdf
Digital Strategies for Manufacturing Companies
How Creative Agencies Leverage Project Management Software.pdf
Essential Infomation Tech presentation.pptx
Design an Analysis of Algorithms II-SECS-1021-03
System and Network Administraation Chapter 3
Ad

POWERing your big data solution with IBM: open-source Hadoop on POWER

  • 1. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. 9.0 César Diniz Maciel Executive IT Specialist IBM Corporate Strategy cmaciel@us.ibm.com IBA4101 - POWERing your big data solution with IBM: open-source Hadoop on POWER
  • 2. © Copyright IBM Corporation 2015 Session objectives ● At the end of this session you should be able to understand the options available to run open-source Hadoop on IBM Power Systems, from building it from source code to using packaged solutions. ● You will also learn about integrated SW solutions and appliances that leverage Power Systems technology to accelerate Hadoop-based implementations.
  • 3. © Copyright IBM Corporation 2015 Acknowledgements ● This presentation is based on the work done by the IBM Solutions Operating Environment (IBMSOE) team. I would like to thank the contribution of the following experts from IBM US: – Corentin Baron – Luke Browning – Pascal Oliva – Tony Reix – Daniele Silvestre
  • 4. © Copyright IBM Corporation 2015 Hadoop ● From the hadoop website: “The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly- available service on top of a cluster of computers, each of which may be prone to failures.“ ● Actively developed by the open source community, and by several vendors – Several pre-packaged distributions of Hadoop, including from IBM – IBM Infosphere BigInsights – Hadoop market expected to reach 20 Billion in 2018 according to Transparency Market Research
  • 5. © Copyright IBM Corporation 2015 Hadoop components
  • 6. © Copyright IBM Corporation 2015 Hadoop components
  • 7. © Copyright IBM Corporation 2015 Apache Spark – in-memory mapreduce
  • 8. © Copyright IBM Corporation 2015 Open Source Hadoop on Power Systems ● The IBM Solutions Operating Environment hosts buildable Hadoop source trees optimized for Linux running on POWER. These trees support both RedHat Enterprise Linux (RHEL) v6.5 or later on big-endian with PowerVM and Ubuntu v14.04 on little-endian with PowerKVM. It also works with openSUSE 13.2 (big-endian). Other versions or releases may also work, but have not been tested. ● This build is based on the open source Hadoop tree, with changes required for building on Power Systems, and with IBM Java (optimized for POWER). The source is hosted on github.
  • 9. © Copyright IBM Corporation 2015 Why build from scratch? ● Alternative for deploying Hadoop without purchasing commercial software ● Community-based development and support ● Ability to run Hadoop on flexible environments (older systems, different operating environments) ● Desire to learn the details on the Hadoop framework
  • 10. © Copyright IBM Corporation 2015 Hardware prerequisites ● Hadoop itself does not have hardware requisites – Designed to run in a cheap and unreliable system, and to scale out – Obviously, the faster and more reliable the system is, the better the solution will be and easier to maintain ● It can be built on POWER7 processor-based systems and later – Functionally equivalent in POWER7 or POWER8, little endian or big endian
  • 11. © Copyright IBM Corporation 2015 I like open-source, but I don't want to build from scratch…. As it is with x86, there are pre-packaged solutions of Hadoop available for Power Systems ● IBM have the Infosphere BigInsights on Power Systems since 2012 – Version 3.2 available for POWER7 processor-based systems – Now updated with the latest IBM Open Platform with Apache Hadoop (IOP), for POWER8 processor-based systems – IOP is available as a 100% open-source, free to use product, and a value-add, fee-based additional components ● Veristorm offers the Veristorm Data Hub, a 100% open-source Hadoop build for POWER8, free to use. Details on how to build from scratch on POWER at the Backup section of the presentation
  • 12. © Copyright IBM Corporation 2015 Any Power Systems server running RH 6.5 Trial version available at https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-beta-iibob
  • 13. © Copyright IBM Corporation 2015 IBM Open Platform with Apache Hadoop http://www-03.ibm.com/software/products/en/ibm-open-platform-with-apache-ha doop
  • 14. © Copyright IBM Corporation 2015 IBM Open Platform with Apache Hadoop IBM® Open Platform with Apache Hadoop builds the platform for big data projects and provides the most current Apache Hadoop open source content. IBM offers this open source Apache distribution as a free download as well as a supported offering for all your Hadoop workloads. ➔ 100% Apache Hadoop Open Source platform ➔ HDFS, YARN, MapReduce, Ambari, Hbase, Hive, Oozie, Parquet, Parquet Format, Pig, Snappy, Solr, Sqoop, Zookeeper, Open JDK, Knox, Slider ➔ Spark in-memory distributed compute engine
  • 15. © Copyright IBM Corporation 2015
  • 16. © Copyright IBM Corporation 2015 Download IBM Open Platform with Apache Hadoop
  • 17. © Copyright IBM Corporation 2015 Installation and configuration ● Installation involves adding a repository and installing Ambari ● “The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs.“ ● After installation is complete, start the Ambari Server and perform the installation via web browser – ambari­server setup – ambari­server start
  • 18. © Copyright IBM Corporation 2015
  • 19. © Copyright IBM Corporation 2015
  • 20. © Copyright IBM Corporation 2015
  • 21. © Copyright IBM Corporation 2015
  • 22. © Copyright IBM Corporation 2015
  • 23. 23 IBM Systems Technical University, October 5-9 | Hilton Orlando © Copyright IBM Corporation 2015. Technical University/Symposia materials may not be © Copyright IBM Corporation 2015
  • 24. © Copyright IBM Corporation 2015 2 4 IBM BigInsights 4.1 Value Adds IBM BigInsights Analyst Big SQL Big SQL BigSheets BigSheets Text Analytics Text Analytics Machine Learning on Big R Machine Learning on Big R Big R Big R IBM BigInsights Data Scientist Big SQL Big SQL BigSheets BigSheets IBM Value Adds IOP – 100% open-source, free Download and use for production Optional fee-based support IOP – 100% open-source, free Download and use for production Optional fee-based support
  • 25. © Copyright IBM Corporation 2015 © 2014 IBM Corporation 2 5 IBM BigInsights Extends Value of IOP  PLUS – Big SQL for rich ANSI compliant SQL interface to Hadoop – World class Text Analytics from IBM Research – Distributed Frame Leverage R as a query language on Hadoop – Scalable algorithms from IBM Research – Systemml – GPFS-FPO for enterprise data life cycle manangement – Browser-based excel-like analytics inteface – Probabilistic matching engine on Hadoop BigMatch BigSql Text Analytics BigR Machine Learning BigSheets
  • 26. © Copyright IBM Corporation 2015 ● Veristorm Provides binaries of Hadoop for POWER8 free of charge ● Compiled from the same IBMSOE source, in a joint work between IBM and Veristorm ● Supported with Ubuntu, RedHat and SuSE (all LE versions) ● Provides the same functionality as the build compiled by source, but in a faster and easier deployment way ● Community support for the free package, and option of (paid) Enterprise support ● http://www.veristorm.com/veristorm-data-hub
  • 27. © Copyright IBM Corporation 2015
  • 28. © Copyright IBM Corporation 2015 Veristorm Data Hub ● Prerequisites – IBM JDK 1.7 or superior – Python 2.6 or superior – Postgresql-9.3+ – zlibc 0.9k-4.1 or greater ● Installation can be done with the downloaded package, or via repository – echo "deb http://repo.veristorm.com/repos/vdh/apt/debian vdh­ ppc­vstore main" > /etc/apt/sources.list.d/vdh.list – apt­get update – apt­get install vdh­ppc­ambari­server
  • 29. © Copyright IBM Corporation 2015 Managing the cluster with Ambari - Hortonworks
  • 30. © Copyright IBM Corporation 2015 Managing the cluster with Ambari - Veristorm
  • 31. © Copyright IBM Corporation 2015 Managing the cluster with Ambari - IOP
  • 32. © Copyright IBM Corporation 2015 Conclusions ● Open-source Hadoop is well regarded by customers ● The ability to leverage the characteristics of the Power Systems servers with the flexibility of Linux and Hadoop makes it an interesting combination with little or no knowledge curve ● The free, community-supported Veristorm package simplifies the implementation and management of Hadoop on Power Systems. ● IBM Open Platform for Apache Hadoop brings an IBM- backed, 100% free open-source package for enterprise environments. ● As we go forward, focus on performance and management makes the solution even more attractive.
  • 33. © Copyright IBM Corporation 2015 © Copyright IBM Corporation 2015 Continue growing your IBM skills ibm.com/training provides a comprehensive portfolio of skills and career accelerators that are designed to meet all your training needs. • Training in cities local to you - where and when you need it, and in the format you want – Use IBM Training Search to locate public training classes near to you with our five Global Training Providers – Private training is also available with our Global Training Providers • Demanding a high standard of quality – view the paths to success – Browse Training Paths and Certifications to find the course that is right for you • If you can’t find the training that is right for you with our Global Training Providers, we can help. – Contact IBM Training at dpmc@us.ibm.com 33 Global Skills Initiative
  • 34. © Copyright IBM Corporation 2015 © Copyright IBM Corporation 2015 Backup slides
  • 35. © Copyright IBM Corporation 2015 Software prerequisites ● Hadoop requires several components in order to run, and others are required to build the binaries ● Besides the typical development environment (C/C++ compiler, Java, build tools) there are libraries and Java build tools that are required. Most of the tools are available with the Linux distribution. ● Depending on the Linux distribution, and release, requirements do change – Redhat 6.5 brings the snappy library precompiled. Redhat 7 does not
  • 36. © Copyright IBM Corporation 2015 Prerequisite install for Ubuntu Linux root@cmaciel4:~# apt­get install cmake automake autoconf git openssl libssl­dev zlib1g fuse libfuse­dev Download and install IBM Java from here (if not already installed) ● Java 1.6 or superior is required ● Set JAVA_HOME and PATH to point to the Java binaries – Linux sources scripts upon login from /etc/profile.d – Adding scripts for environmental variables under this directory makes it easy to maintain – /etc/profile.d/java.sh ● export JAVA_HOME=/opt/ibm/ibm­java­ppc64le­71 ● export PATH=$JAVA_HOME/bin:$PATH
  • 37. © Copyright IBM Corporation 2015 Prerequisite install for Ubuntu Linux ● Download and install gcc Advance Toolchain – IBM Advance Toolchain for PowerLinux Documentation – „The IBM Advance Toolchain for PowerLinux is a set of open source development tools and runtime libraries which allows users to take leading edge advantage of IBM's latest POWER hardware features on Linux“ – gcc and libraries optimized for latest POWER processors ● Add the repository per the documentation ● apt­get install advance­toolchain­at8.0­runtime advance­ toolchain­at8.0­devel advance­toolchain­at8.0­perf advance­toolchain­at8.0­mcore­libs ● Set PATH to include the advance toolchain binaries – export PATH=/opt/at8.0/bin:$PATH
  • 38. Prerequisite install
      Hadoop uses Apache Maven for software project management, and Apache Ant as a Java build tool.
      "Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information."
      "Apache Ant is a Java library and command-line tool whose mission is to drive processes described in build files as targets and extension points dependent upon each other. The main known usage of Ant is the build of Java applications."
      Download Maven (version 3.3 or higher)
      Download Ant
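One way to confirm the "version 3.3 or higher" requirement is a small version comparison, sketched below. The helper name `version_ge` is hypothetical, it relies on GNU `sort -V` (available on typical Linux distributions), and it assumes the first line of `mvn -version` has the usual form "Apache Maven X.Y.Z ...".

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; relies on the version ordering of GNU sort -V.
version_ge() {
  [ -n "$1" ] &&
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v mvn >/dev/null 2>&1; then
  # Assumes the banner line looks like "Apache Maven 3.3.1 (...)".
  found="$(mvn -version 2>/dev/null | awk '/Apache Maven/ {print $3; exit}')"
  if version_ge "$found" 3.3; then
    echo "Maven $found satisfies the 3.3 minimum"
  else
    echo "Maven version '$found' is older than 3.3 or could not be detected"
  fi
else
  echo "mvn not found on PATH"
fi
```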
  • 39. Prerequisite install
      Hadoop can optionally use snappy, a compression library developed by Google
          – apt-get install libsnappy1 libsnappy-dev
          – Not available on all Red Hat releases; where it is missing, compile it from source
  • 40. Prerequisite install
      ● Set the Maven and Ant environment variables
          – export M2_HOME=/opt/apache-maven-3.3.1
          – export M2=$M2_HOME/bin
          – export PATH=$M2:$PATH
          – export ANT_HOME=/opt/apache-ant-1.9.4
          – export PATH=$ANT_HOME/bin:$PATH
      ● Install Google's Protocol Buffers
          – apt-get install protobuf-compiler libprotobuf-dev
          – Not available on Red Hat; it must be compiled from source
  • 41. Building Hadoop
      ● Once all prerequisites are installed, run a final check:
          – echo $JAVA_HOME
          – echo $M2_HOME
          – echo $ANT_HOME
          – echo $M2
          – echo $PATH
      ● Check that all commands run properly:
          – gcc --version
          – java -version
          – ant -version
          – mvn -version
          – protoc --version
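The final check above can be bundled into one script, sketched here. The function names `check_env` and `check_cmd` are hypothetical; the script only reports whether each variable is set and each tool is found on PATH, and does not validate versions.

```shell
# check_env NAME...: print "ok: NAME=value" or "missing: NAME" for each variable.
check_env() {
  _status=0
  for _name in "$@"; do
    eval "_value=\${$_name:-}"      # look up the variable by name
    if [ -n "$_value" ]; then
      echo "ok: $_name=$_value"
    else
      echo "missing: $_name"
      _status=1
    fi
  done
  return "$_status"
}

# check_cmd CMD...: print where each command was found, or report it missing.
check_cmd() {
  _status=0
  for _cmd in "$@"; do
    if command -v "$_cmd" >/dev/null 2>&1; then
      echo "ok: $_cmd at $(command -v "$_cmd")"
    else
      echo "missing: $_cmd"
      _status=1
    fi
  done
  return "$_status"
}

check_env JAVA_HOME M2_HOME ANT_HOME M2 || echo "Set the variables reported as missing."
check_cmd gcc java ant mvn protoc || echo "Install the tools reported as missing."
```

Each function returns a non-zero status when anything is missing, so the script can also gate a build step.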
  • 44. Building Hadoop
      ● If everything is working properly, retrieve the Hadoop code from https://github.com/ibmsoe/hadoop-common (either download the zip file or clone the tree with git).
      ● To compile Hadoop with JNI and snappy support and install it into the local Maven cache, run the following build command from the root of the Hadoop source tree (for example, hadoop-common):
          – <hadoop_common> $ mvn install -Pnative -DskipTests -Drequire.snappy
      ● install places the built artifacts in the local Maven repository
      ● -Pnative builds the native code
      ● -DskipTests builds without running the test suite
      ● -Drequire.snappy requires the snappy compression library (the build fails if it is not found)
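The retrieve-and-build steps above can be sketched as a small script. The `run` wrapper, the `DRY_RUN` switch, and the `SRC_DIR` variable are hypothetical additions for illustration: by default the script only prints the commands, and it executes them when `DRY_RUN=0`. The repository URL is the one named on the slide and may no longer be available.

```shell
# run CMD...: print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

SRC_DIR="${SRC_DIR:-hadoop-common}"   # assumed checkout directory name

# Clone the tree, enter it, and build with native code and snappy required.
run git clone https://github.com/ibmsoe/hadoop-common "$SRC_DIR" &&
  run cd "$SRC_DIR" &&
  run mvn install -Pnative -DskipTests -Drequire.snappy
```

Running the script once in the default dry-run mode gives a chance to review the exact commands before running it again with `DRY_RUN=0`.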
  • 46. Creating a distribution package
      ● mvn package -Pnative,dist -Drequire.snappy -DskipTests -Dtar
      ● Tar file saved at hadoop_common/hadoop-dist/target/hadoop-dist-2.4.1.tar.gz
      ● Archive contains all binaries required to run Hadoop
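Once the archive exists, it can be unpacked on the target machine. The sketch below assumes a hypothetical helper named `unpack_dist` and a destination directory of your choosing; it only wraps `tar` with a check that the archive is present.

```shell
# unpack_dist TARBALL DEST: extract a gzip-compressed tar archive into DEST.
# Hypothetical helper; fails with a message if the archive does not exist.
unpack_dist() {
  _tarball="$1"
  _dest="$2"
  if [ ! -f "$_tarball" ]; then
    echo "tarball not found: $_tarball" >&2
    return 1
  fi
  mkdir -p "$_dest" && tar -xzf "$_tarball" -C "$_dest"
}
```

For example, `unpack_dist hadoop_common/hadoop-dist/target/hadoop-dist-2.4.1.tar.gz "$HOME/hadoop-dist"` would extract the archive named on the slide into a directory under the home directory, assuming the build produced that file.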
  • 47. Hadoop basic commands
      ● hadoop@cmaciel4:~/hadoop-dist/hadoop-2.4.1$ bin/hadoop version
          Hadoop 2.4.1
          Subversion https://github.com/ibmsoe/hadoop-common -r 4d0782d33c2f6b91d34b44b15f09b142eab1a403
          Compiled by hadoop on 2015-04-15T21:39Z
          Compiled with protoc 2.5.0
          From source with checksum c38708d16c3c1bd89a3ab88f5aefa9
          This command was run using /home/hadoop/hadoop-dist/hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar
      ● hadoop@cmaciel4:~/hadoop-dist/hadoop-2.4.1$ bin/hadoop fs -ls
          Found 7 items
          drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 bin
          drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 etc
          drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 include
          drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 lib
          drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 libexec
          drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 sbin
          drwxrwxr-x - hadoop hadoop 4096 2015-04-15 17:53 share