Cloudera Implementation Guide for
Production Deployments
In this article I cover a detailed, step-by-step guide to installing Cloudera CDH 5.14
using Cloudera Manager with an external database, and to creating a Hadoop cluster. This
is the recommended path for all production deployments.
The standard Cloudera installation guide was confusing for me: it keeps looping
between different URLs, which makes it hard to find a clear path for the implementation.
Some steps do not work in the order given, and some need different syntax.
Here I am sharing a clear and easy path to follow, with references. Please feel free to contact
me with any questions or suggestions for improvement :)
Contacts:
Name: Ahmed Mekawy
Email: ahmedmekawy@hotmail.com
LinkedIn: https://www.linkedin.com/in/ahmed-mekawy-1ba11031/
Please feel free to contact me if you need to set up a production environment
or want administration training classes; I will be happy to help. Let's get started:
Implementation Overview:
Install and configure the database, install the Oracle JDK
– Database should be external for production deployments (this is what we will do here)
– Embedded PostgreSQL is okay for testing or ‘proof of concept’ work
Ensure access to the Cloudera software repositories
– For Cloudera Manager
– For CDH
Install Cloudera Manager and agents
Install the CDH Parcel services or RPMs for the services required on each host in the
cluster
Implementation Environment Planning:
I am using VirtualBox to create a VM running CentOS 7; my hostname is cloudera.
The VM has 5 GB RAM, 15 GB of disk space, one network card, and Internet access.
I will use MySQL as the external database for Cloudera Manager and the CDH components.
For a different setup, you only need to check the certified support matrix and have
capacity planning in place; the rest of the steps are the same as in this guide. Review
the following references:
CDH 5 and Cloudera Manager 5 Requirements and Supported Versions
Hardware Requirements Guide
Building local repositories for hosts with no internet access.
Implementation step by step:
login as: root
root@192.168.1.50's password:
Disable Firewall:
[root@cloudera ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-03-05 09:07:48 EST; 1min 10s ago
[root@cloudera ~]# service firewalld stop
Redirecting to /bin/systemctl stop firewalld.service
[root@cloudera ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@cloudera ~]#
Disable SELinux (it is already disabled on this VM):
[root@cloudera ~]# sestatus
SELinux status: disabled
[root@cloudera ~]#
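The transcript above only confirms that SELinux is already disabled. If sestatus reports enforcing or permissive on your host, a minimal sketch like the following could make the change persistent. It assumes the standard /etc/selinux/config location and GNU sed; the function name disable_selinux_in_config is my own and is not part of any Cloudera tooling.

```shell
# Sketch: switch SELinux to disabled in its config file.
# The path argument defaults to the standard /etc/selinux/config location (assumption).
disable_selinux_in_config() {
    local config_file="${1:-/etc/selinux/config}"
    # Keep a backup next to the original before editing in place.
    cp "$config_file" "${config_file}.bak"
    # Rewrite the SELINUX= line regardless of its current mode.
    # SELINUXTYPE= lines are not touched because the pattern requires "=" right after SELINUX.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config_file"
}

# Usage on the real host (as root), followed by a reboot:
#   disable_selinux_in_config
#   setenforce 0   # switch to permissive mode for the current session, until the reboot
```

The config change only takes effect after a reboot, which is why the usage note pairs it with setenforce 0 for the running session.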
Check that Python is installed:
[root@cloudera ~]# rpm -qa |grep -i python
python-2.7.5-58.el7.x86_64
Check that /etc/hosts maps the hostname to the VM's IP address:
[root@cloudera ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.50 cloudera
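As a quick sanity check on the hosts file shown above, here is a small sketch that looks up which address a hostname maps to and rejects a missing or loopback mapping. The function names lookup_host_ip and check_host_entry are hypothetical helpers of mine, and treating a loopback mapping as an error is an assumption based on the usual cluster networking requirement that each host resolves to its real IP.

```shell
# Print the first address that a hosts file maps to the given hostname.
lookup_host_ip() {
    local name="$1" hosts_file="${2:-/etc/hosts}"
    awk -v name="$name" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$hosts_file"
}

# Succeed only when the hostname maps to a non-loopback address.
check_host_entry() {
    local ip
    ip="$(lookup_host_ip "$1" "${2:-/etc/hosts}")"
    case "$ip" in
        ""|127.*|::1) return 1 ;;
        *) echo "$1 -> $ip" ;;
    esac
}

# Usage on the VM from this guide:
#   check_host_entry cloudera || echo "fix /etc/hosts before continuing"
```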
Get the repo file location from
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_vd.html
[root@cloudera yum.repos.d]# wget https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo
-bash: wget: command not found
[root@cloudera yum.repos.d]# yum install wget
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
wget x86_64 1.14-15.el7_4.1 updates 547 k
Installed:
wget.x86_64 0:1.14-15.el7_4.1
Complete!
Add the Cloudera repo:
[root@cloudera yum.repos.d]# wget https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo
--2018-03-05 09:28:40-- https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo
Resolving archive.cloudera.com (archive.cloudera.com)... 151.101.0.167, 151.101.64.167, 151.101.128.167, ...
Connecting to archive.cloudera.com (archive.cloudera.com)|151.101.0.167|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 290
Saving to: ‘cloudera-manager.repo’
100%[======================================>] 290 --.-K/s in 0s
2018-03-05 09:28:46 (28.5 MB/s) - ‘cloudera-manager.repo’ saved [290/290]
[root@cloudera yum.repos.d]# ls
CentOS-Base.repo CentOS-fasttrack.repo CentOS-Vault.repo
CentOS-CR.repo CentOS-Media.repo cloudera-manager.repo
CentOS-Debuginfo.repo CentOS-Sources.repo
Install the Oracle JDK:
[root@cloudera yum.repos.d]# yum install oracle-j2sdk1.7
Loaded plugins: fastestmirror
cloudera-manager | 951 B 00:00
cloudera-manager/primary | 4.3 kB 00:00
Loading mirror speeds from cached hostfile
* base: mirror.airenetworks.es
* extras: mirror.crazynetwork.it
* updates: mirrors.prometeus.net
cloudera-manager 7/7
Resolving Dependencies
--> Running transaction check
---> Package oracle-j2sdk1.7.x86_64 0:1.7.0+update67-1 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
oracle-j2sdk1.7 x86_64 1.7.0+update67-1 cloudera-manager 135 M
Transaction Summary
================================================================================
Install 1 Package
Total download size: 135 M
Installed size: 279 M
Is this ok [y/d/N]: y
Downloading packages:
Installed:
oracle-j2sdk1.7.x86_64 0:1.7.0+update67-1
Complete!
Install Cloudera Manager Components:
[root@cloudera yum.repos.d]# yum install cloudera-manager-daemons cloudera-manager-server
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirror.airenetworks.es
* extras: mirror.crazynetwork.it
* updates: mirrors.prometeus.net
Resolving Dependencies
--> Running transaction check
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
cloudera-manager-daemons
x86_64 5.14.1-1.cm5141.p0.1.el7 cloudera-manager 700 M
cloudera-manager-server x86_64 5.14.1-1.cm5141.p0.1.el7 cloudera-manager 8.5 k
Transaction Summary
================================================================================
Install 2 Packages (+27 Dependent packages)
Total size: 711 M
Total download size: 700 M
Installed size: 918 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
cloudera-manager-daemons-5.14.1-1.cm5141.p0.1.el7.x86_64.r | 700 MB 33:36
Installed:
cloudera-manager-daemons.x86_64 0:5.14.1-1.cm5141.p0.1.el7
cloudera-manager-server.x86_64 0:5.14.1-1.cm5141.p0.1.el7
Complete!
[root@cloudera yum.repos.d]#
Install the MySQL database:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ig_mysql.html#cmig_topic_5_5
[root@cloudera yum.repos.d]# yum install mysql-server
No package mysql-server available.
Error: Nothing to do
[root@cloudera yum.repos.d]#
MySQL is not in the default repos for CentOS 7. The right approach is to download the MySQL community
release package, which installs the needed repo files:
[root@cloudera yum.repos.d]# wget https://repo.mysql.com//mysql57-community-release-el7-11.noarch.rpm
100%[======================================>] 25,680 --.-K/s in 0.08s
2018-03-05 13:26:51 (302 KB/s) - ‘mysql57-community-release-el7-11.noarch.rpm’ saved [25680/25680]
[root@cloudera yum.repos.d]# rpm -ivh mysql57-community-release-el7-11.noarch.rpm
warning: mysql57-community-release-el7-11.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql57-community-release-el7-11 ################################# [100%]
[root@cloudera yum.repos.d]# ls
CentOS-Base.repo CentOS-Media.repo mysql-community.repo
CentOS-CR.repo CentOS-Sources.repo mysql-community-source.repo
CentOS-Debuginfo.repo CentOS-Vault.repo
CentOS-fasttrack.repo cloudera-manager.repo
[root@cloudera yum.repos.d]# df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/centos-root 14616576 2260784 12355792 16% /
[root@cloudera yum.repos.d]# yum install mysql-server
mysql-connectors-community | 2.5 kB 00:00
mysql-tools-community | 2.5 kB 00:00
mysql57-community | 2.5 kB 00:00
(1/3): mysql-connectors-community/x86_64/primary_db | 18 kB 00:00
(2/3): mysql-tools-community/x86_64/primary_db | 39 kB 00:01
(3/3): mysql57-community/x86_64/primary_db | 134 kB 00:02
(1/6): mysql-community-common-5.7.21-1.el7.x86_64.rpm | 272 kB 00:05
(2/6): mysql-community-libs-5.7.21-1.el7.x86_64.rpm | 2.1 MB 00:43
(3/6): mysql-community-libs-compat-5.7.21-1.el7.x86_64.rpm | 2.0 MB 00:39
(4/6): net-tools-2.0-0.22.20131004git.el7.x86_64.rpm | 305 kB 00:24
(5/6): mysql-community-client-5.7.21-1.el7.x86_64.rpm | 24 MB 08:25
(6/6): mysql-community-server-5.7.21-1.el7.x86_64.rpm | 164 MB 30:03
--------------------------------------------------------------------------------
Total 104 kB/s | 193 MB 31:32
Complete!
[root@cloudera mysql]# ls -lrt /etc/my.cnf
-rw-r--r-- 1 root root 960 Dec 27 23:10 /etc/my.cnf
[root@cloudera mysql]# cp /etc/my.cnf /etc/my.cnf.org
[root@cloudera mysql]# systemctl start mysqld
[root@cloudera mysql]# systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-03-05 14:09:00 EST; 29s ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Retrieve the auto-generated MySQL root password:
[root@cloudera mysql]# grep 'temporary password' /var/log/mysqld.log
2018-03-05T19:08:56.327113Z 1 [Note] A temporary password is generated for root@localhost: HFauGGUl=6Fh
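If you want to capture the temporary password in a script rather than copy it by eye, a minimal sketch follows. It assumes the log line has the same shape as the one shown above, with the password as the last field; the function name mysql_temp_password is my own.

```shell
# Print the most recent temporary root password recorded in a MySQL 5.7 log.
mysql_temp_password() {
    local log_file="${1:-/var/log/mysqld.log}"
    # The password is the last whitespace-separated field of the matching line;
    # if the log holds several such lines, the last one wins.
    awk '/temporary password/ { pw = $NF } END { if (pw != "") print pw }' "$log_file"
}

# Usage on the real host:
#   mysql_temp_password
```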
Remove the password validation plugin:
[root@cloudera mysql]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.21
mysql> uninstall plugin validate_password;
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.
mysql> alter user root@localhost IDENTIFIED BY 'ABCxyz$123456';
Query OK, 0 rows affected (0.00 sec)
mysql> uninstall plugin validate_password;
Query OK, 0 rows affected (0.01 sec)
mysql>
[root@cloudera mysql]# /usr/bin/mysql_secure_installation
Securing the MySQL server deployment.
Enter password for user root:
VALIDATE PASSWORD PLUGIN can be used to test passwords
and improve security. It checks the strength of password
and allows the users to set only those passwords which are
secure enough. Would you like to setup VALIDATE PASSWORD plugin?
Press y|Y for Yes, any other key for No: No
Using existing password for root.
Change the password for root ? ((Press y|Y for Yes, any other key for No) : y
New password:
Re-enter new password:
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.
Remove anonymous users? (Press y|Y for Yes, any other key for No) : Y
Success.
Normally, root should only be allowed to connect from 'localhost'. This ensures that someone cannot
guess at the root password from the network.
Disallow root login remotely? (Press y|Y for Yes, any other key for No) : N
... skipping.
By default, MySQL comes with a database named 'test' that anyone can access. This is also intended
only for testing, and should be removed before moving into a production environment.
Remove test database and access to it? (Press y|Y for Yes, any other key for No) : Y
- Dropping test database...
Success.
- Removing privileges on test database...
Success.
Reloading the privilege tables will ensure that all changes made so far will take effect immediately.
Reload privilege tables now? (Press y|Y for Yes, any other key for No) : Y
Success.
All done!
[root@cloudera mysql]#
Download and install the MySQL JDBC client driver:
[root@cloudera backup]# wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.45.tar.gz
2018-03-05 14:24:02 (104 KB/s) - ‘mysql-connector-java-5.1.45.tar.gz’ saved [3467861/3467861]
[root@cloudera backup]# ls
mysql-connector-java-5.1.45.tar.gz
[root@cloudera backup]# tar zxf mysql-connector-java-5.1.45.tar.gz
[root@cloudera backup]# ls
mysql-connector-java-5.1.45 mysql-connector-java-5.1.45.tar.gz
[root@cloudera backup]# cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar /usr/share/java/mysql-connector-java.jar
cp: cannot create regular file ‘/usr/share/java/mysql-connector-java.jar’: No such file or directory
[root@cloudera backup]# mkdir -p /usr/share/java/
[root@cloudera backup]# cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar /usr/share/java/mysql-connector-java.jar
[root@cloudera backup]#
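Tidy up MySQL by moving the old InnoDB log files (ib_logfile0 and ib_logfile1) out of the data directory, then start the server again and create the needed databases:
[root@cloudera backup]# systemctl stop mysqld
[root@cloudera backup]# mv /var/lib/mysql/ib_logfile0 /backup
[root@cloudera backup]# mv /var/lib/mysql/ib_logfile1 /backup
[root@cloudera etc]# systemctl start mysqld
[root@cloudera etc]# mysql -uroot -p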
Enter password:
mysql> create database rman DEFAULT CHARACTER SET utf8;
Query OK, 1 row affected (0.00 sec)
mysql> grant all on rman.* TO 'rman'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected, 1 warning (0.00 sec)
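The same CREATE DATABASE and GRANT pair is typically repeated for each service that needs its own database. The sketch below generates those statements so they can be reviewed before being piped into mysql. The list of database names in generate_all_service_dbs is an illustrative assumption on my part, as are both function names; create only the databases for the services you actually plan to deploy, and use real passwords instead of a shared one.

```shell
# Emit the CREATE DATABASE / GRANT pair used above for one service database.
service_db_sql() {
    local db="$1" user="$2" password="$3"
    printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$db"
    printf "GRANT ALL ON %s.* TO '%s'@'localhost' IDENTIFIED BY '%s';\n" "$db" "$user" "$password"
}

# Hypothetical list of service databases; adjust it to the services you plan to run.
# Each database gets a user of the same name, mirroring the rman example above.
generate_all_service_dbs() {
    local password="$1" db
    for db in amon rman hue metastore sentry oozie; do
        service_db_sql "$db" "$db" "$password"
    done
}

# Usage: review the output first, then feed it to the server:
#   generate_all_service_dbs 'password' > /backup/create_dbs.sql
#   mysql -uroot -p < /backup/create_dbs.sql
```

Writing the statements to a file first keeps a record of what was created and avoids typing each pair by hand.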
Configure Cloudera Manager to use MySQL as its external database:
[root@cloudera etc]# /usr/share/cmf/schema/scm_prepare_database.sh mysql -h localhost -uroot -pwelcome1 --scm-host localhost scm scm scm
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
Verifying that we can write to /etc/cloudera-scm-server
Mon Mar 05 14:46:56 EST 2018 WARN: Establishing SSL connection without server's identity verification
is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection
must be established by default if explicit option isn't set. For compliance with existing applications not
using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by
setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
Mon Mar 05 14:46:58 EST 2018 WARN: Establishing SSL connection without server's identity verification
is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection
must be established by default if explicit option isn't set. For compliance with existing applications not
using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by
setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[ main] DbCommandExecutor INFO Successfully connected to database.
All done, your SCM database is configured correctly!
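The script writes its settings to /etc/cloudera-scm-server/db.properties, as the output above shows. To double-check what was written, a small reader for key=value files can help. This is only a sketch: the helper name get_property is mine, and the key names in the usage comment (com.cloudera.cmf.db.type and similar) are an assumption based on the com.cloudera.cmf.db. prefix in the output, so confirm them against your own file.

```shell
# Print the value of one key from a Java-style properties file (simple key=value lines only).
get_property() {
    local key="$1" file="$2"
    # Match the key exactly, then print everything after the first "=".
    awk -F= -v key="$key" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$file"
}

# Usage on the real host (key names are assumed, check your file):
#   get_property com.cloudera.cmf.db.type /etc/cloudera-scm-server/db.properties
#   get_property com.cloudera.cmf.db.host /etc/cloudera-scm-server/db.properties
```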
Start the Cloudera Manager server:
[root@cloudera ~]# service cloudera-scm-server start
[root@cloudera ~]# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
2018-03-05 14:58:45,006 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Finished constructing repo:2018-03-05T19:58:45.006Z
2018-03-05 14:58:45,767 INFO WebServerImpl:org.mortbay.log: jetty-6.1.26.cloudera.4
2018-03-05 14:58:45,768 INFO WebServerImpl:org.mortbay.log: Started SelectChannelConnector@0.0.0.0:7180
2018-03-05 14:58:45,768 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
The installation has completed successfully.
Now open a web browser at the VM's IP address on port 7180 (http://192.168.1.50:7180 in this setup) to start the agent deployment and CDH
cluster setup.
The default login is admin/admin.
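The server can take a while to start before port 7180 answers. Instead of watching the log, you could poll the web UI with a generic retry helper such as the sketch below. The helper name wait_until, the attempt count, and the delay are my own illustrative choices; the curl call in the usage comment assumes curl is installed on the machine you run it from.

```shell
# Retry a command until it succeeds or the attempts run out.
# Arguments: number of attempts, delay in seconds between attempts, then the command to run.
wait_until() {
    local attempts="$1" delay="$2" i=0
    shift 2
    while [ "$i" -lt "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage against the VM from this guide:
#   wait_until 60 5 curl -fsS -o /dev/null http://192.168.1.50:7180/ && echo "Cloudera Manager is up"
```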
Any warnings shown during the cluster setup are mainly due to limited machine resources, chiefly disk space and memory; with adequate
resources you will not see them.
Congratulations, you have completed the Cloudera Manager setup with its agents and external database,
and created a new CDH cluster.


Clouldera Implementation Guide for Production Deployments

  • 1. Clouldera Implementation Guide for Production Deployments In this article i will cover a detailed step by step guide for installing Cloudera CDH 5.14 using Cloudera Manager and External Database Setup and create a Hadoop Cluster. This is the recommended path for all production deployments. The standard Cloudera installation guide was kinda confusing for me, it keep looping between different URLs that is hard to have a clear path for the implementation with even some steps that do not work in the explained order as well some that needs to be with different syntax. Here i am sharing a clear and easy path to follow with references, please feel free to reach me for any clarifications or any suggestions for improvements :) Contacts: Name: Ahmed Mekawy Email: ahmedmekawy@hotmail.com LinkedIn: https://www.linkedin.com/in/ahmed-mekawy-1ba11031/ Please feel free to reach me when you do have a need to setup a production environment or administration training classes and I will be happy to help. Let's get started: Implementation Overview: Install and configure the database, install the Oracle JDK – Database should be external for production deployments ( this what we will do here) – Embedded PostgreSQL is okay for testing or ‘proof of concept’ work Ensure access to the Cloudera software repositories – For Cloudera Manager – For CDH
  • 2. Install Cloudera Manager and agents Install the CDH Parcel services or RPMs for the services required on each host in the cluster Implementation Environment Planning: I am using VirtualBox to create a VM with Centos 7, my hostname is cloudera. The VM is 5G RAM , 15 GB Disk Space ,with 1 Network Card and Internet access. I will use MySQL as the external database for Cloudera Manager and CDH components. For different setup, you only need to ensure having the right ceritified matrix and capacity planing in place, the rest of the steps are exactly the same as this guide, review the following links: Please review CDH 5 and Cloudera Manager 5 Requirements and Supported Versions . Hardware Requirements Guide Building local repositories for hosts with no internet access. Implementation step by step: login as: root root@192.168.1.50's password: Disable Firewall: [root@cloudera ~]# systemctl status firewalld ● firewalld.service - firewalld - dynamic firewall daemon Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled) Active: active (running) since Mon 2018-03-05 09:07:48 EST; 1min 10s ago [root@cloudera ~]# service firewalld stop Redirecting to /bin/systemctl stop firewalld.service
  • 3. [root@cloudera ~]# systemctl disable firewalld Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service. Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service. [root@cloudera ~]# Disable SELinux: [root@cloudera ~]# sestatus SELinux status: disabled [root@cloudera ~]# Install Python: [root@cloudera ~]# rpm -qa |grep -i python python-2.7.5-58.el7.x86_64 [root@cloudera ~]# cat /etc/hosts 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.1.50 cloudera Get repo file from https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_vd.html [root@cloudera yum.repos.d]# wget https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera- manager.repo -bash: wget: command not found [root@cloudera yum.repos.d]# yum install wget ================================================================================ Package Arch Version Repository Size
  • 4. ================================================================================ Installing: wget x86_64 1.14-15.el7_4.1 updates 547 k Installed: wget.x86_64 0:1.14-15.el7_4.1 Complete! Added cloudera repo: [root@cloudera yum.repos.d]# [root@cloudera yum.repos.d]# wget https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera- manager.repo --2018-03-05 09:28:40-- https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera- manager.repo Resolving archive.cloudera.com (archive.cloudera.com)... 151.101.0.167, 151.101.64.167, 151.101.128.167, ... Connecting to archive.cloudera.com (archive.cloudera.com)|151.101.0.167|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 290 Saving to: ‘cloudera-manager.repo’ 100%[======================================>] 290 --.-K/s in 0s 2018-03-05 09:28:46 (28.5 MB/s) - ‘cloudera-manager.repo’ saved [290/290] [root@cloudera yum.repos.d]# ls CentOS-Base.repo CentOS-fasttrack.repo CentOS-Vault.repo CentOS-CR.repo CentOS-Media.repo cloudera-manager.repo
  • 5. CentOS-Debuginfo.repo CentOS-Sources.repo Install JAVA JDK: [root@cloudera yum.repos.d]# yum install oracle-j2sdk1.7 Loaded plugins: fastestmirror cloudera-manager | 951 B 00:00 cloudera-manager/primary | 4.3 kB 00:00 Loading mirror speeds from cached hostfile * base: mirror.airenetworks.es * extras: mirror.crazynetwork.it * updates: mirrors.prometeus.net cloudera-manager 7/7 Resolving Dependencies --> Running transaction check ---> Package oracle-j2sdk1.7.x86_64 0:1.7.0+update67-1 will be installed --> Finished Dependency Resolution Dependencies Resolved ================================================================================ Package Arch Version Repository Size ================================================================================ Installing: oracle-j2sdk1.7 x86_64 1.7.0+update67-1 cloudera-manager 135 M
  • 6. Transaction Summary ================================================================================ Install 1 Package Total download size: 135 M Installed size: 279 M Is this ok [y/d/N]: y Downloading packages: Installed: oracle-j2sdk1.7.x86_64 0:1.7.0+update67-1 Complete! Install Cloudera Manager Components: [root@cloudera yum.repos.d]# yum install cloudera-manager-daemons cloudera-manager-server Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: mirror.airenetworks.es * extras: mirror.crazynetwork.it * updates: mirrors.prometeus.net Resolving Dependencies --> Running transaction check Dependencies Resolved ================================================================================ Package Arch Version Repository Size
  • 7. ================================================================================ Installing: cloudera-manager-daemons x86_64 5.14.1-1.cm5141.p0.1.el7 cloudera-manager 700 M cloudera-manager-server x86_64 5.14.1-1.cm5141.p0.1.el7 cloudera-manager 8.5 k Transaction Summary ================================================================================ Install 2 Packages (+27 Dependent packages) Total size: 711 M Total download size: 700 M Installed size: 918 M Is this ok [y/d/N]: y Downloading packages: Delta RPMs disabled because /usr/bin/applydeltarpm not installed. cloudera-manager-daemons-5.14.1-1.cm5141.p0.1.el7.x86_64.r | 700 MB 33:36 Installed: cloudera-manager-daemons.x86_64 0:5.14.1-1.cm5141.p0.1.el7 cloudera-manager-server.x86_64 0:5.14.1-1.cm5141.p0.1.el7 Complete! [root@cloudera yum.repos.d]# Installing mysql database:
  • 8. https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ig_mysql.html#cmig_topic_5_5 [root@cloudera yum.repos.d]# yum install mysql-server No package mysql-server available. Error: Nothing to do [root@cloudera yum.repos.d]# Mysql is not in the default repo fro Centos 7 , the right approach is to download the mysql community package which will update the needed repo file [root@cloudera yum.repos.d]# wget https://repo.mysql.com//mysql57-community-release-el7- 11.noarch.rpm 100%[======================================>] 25,680 --.-K/s in 0.08s 2018-03-05 13:26:51 (302 KB/s) - ‘mysql57-community-release-el7-11.noarch.rpm’ saved [25680/25680] [root@cloudera yum.repos.d]# rpm -ivh mysql57-community-release-el7-11.noarch.rpm warning: mysql57-community-release-el7-11.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY Preparing... ################################# [100%] Updating / installing... 1:mysql57-community-release-el7-11 ################################# [100%] [root@cloudera yum.repos.d]# ls CentOS-Base.repo CentOS-Media.repo mysql-community.repo CentOS-CR.repo CentOS-Sources.repo mysql-community-source.repo CentOS-Debuginfo.repo CentOS-Vault.repo CentOS-fasttrack.repo cloudera-manager.repo [root@cloudera yum.repos.d]# df -k . Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/centos-root 14616576 2260784 12355792 16% /
  • 9.
[root@cloudera yum.repos.d]# yum install mysql-server
mysql-connectors-community | 2.5 kB 00:00
mysql-tools-community | 2.5 kB 00:00
mysql57-community | 2.5 kB 00:00
(1/3): mysql-connectors-community/x86_64/primary_db | 18 kB 00:00
(2/3): mysql-tools-community/x86_64/primary_db | 39 kB 00:01
(3/3): mysql57-community/x86_64/primary_db | 134 kB 00:02
(1/6): mysql-community-common-5.7.21-1.el7.x86_64.rpm | 272 kB 00:05
(2/6): mysql-community-libs-5.7.21-1.el7.x86_64.rpm | 2.1 MB 00:43
(3/6): mysql-community-libs-compat-5.7.21-1.el7.x86_64.rpm | 2.0 MB 00:39
(4/6): net-tools-2.0-0.22.20131004git.el7.x86_64.rpm | 305 kB 00:24
(5/6): mysql-community-client-5.7.21-1.el7.x86_64.rpm | 24 MB 08:25
(6/6): mysql-community-server-5.7.21-1.el7.x86_64.rpm | 164 MB 30:03
--------------------------------------------------------------------------------
Total 104 kB/s | 193 MB 31:32
Complete!
[root@cloudera mysql]# ls -lrt /etc/my.cnf
-rw-r--r-- 1 root root 960 Dec 27 23:10 /etc/my.cnf
[root@cloudera mysql]# cp /etc/my.cnf /etc/my.cnf.org
[root@cloudera mysql]# systemctl start mysqld
[root@cloudera mysql]# systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
  • 10.
   Active: active (running) since Mon 2018-03-05 14:09:00 EST; 29s ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html

Retrieving the MySQL auto-generated password:

[root@cloudera mysql]# grep 'temporary password' /var/log/mysqld.log
2018-03-05T19:08:56.327113Z 1 [Note] A temporary password is generated for root@localhost: HFauGGUl=6Fh

Removing the password validation plugin:

[root@cloudera mysql]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.21

mysql> uninstall plugin validate_password;
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.
mysql> alter user root@localhost IDENTIFIED BY 'ABCxyz$123456';
Query OK, 0 rows affected (0.00 sec)
mysql> uninstall plugin validate_password;
Query OK, 0 rows affected (0.01 sec)
mysql>

[root@cloudera mysql]# /usr/bin/mysql_secure_installation
Securing the MySQL server deployment.
Enter password for user root:
  • 11.
VALIDATE PASSWORD PLUGIN can be used to test passwords
and improve security. It checks the strength of password
and allows the users to set only those passwords which are
secure enough. Would you like to setup VALIDATE PASSWORD plugin?

Press y|Y for Yes, any other key for No: No
Using existing password for root.
Change the password for root ? ((Press y|Y for Yes, any other key for No) : y

New password:

Re-enter new password:
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.

Remove anonymous users? (Press y|Y for Yes, any other key for No) : Y
Success.

Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : N

 ... skipping.
  • 12.
By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.

Remove test database and access to it? (Press y|Y for Yes, any other key for No) : Y
 - Dropping test database...
Success.

 - Removing privileges on test database...
Success.

Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : Y
Success.

All done!
[root@cloudera mysql]#

Download and install the MySQL JDBC client driver:

[root@cloudera backup]# wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.45.tar.gz
2018-03-05 14:24:02 (104 KB/s) - ‘mysql-connector-java-5.1.45.tar.gz’ saved [3467861/3467861]
[root@cloudera backup]# ls
mysql-connector-java-5.1.45.tar.gz

Extract the archive (for example with tar -xzf mysql-connector-java-5.1.45.tar.gz) so that the driver directory appears next to it:

[root@cloudera backup]# ls
mysql-connector-java-5.1.45  mysql-connector-java-5.1.45.tar.gz
  • 13.
[root@cloudera backup]# cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar /usr/share/java/mysql-connector-java.jar
cp: cannot create regular file ‘/usr/share/java/mysql-connector-java.jar’: No such file or directory
[root@cloudera backup]# mkdir -p /usr/share/java/
[root@cloudera backup]# cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar /usr/share/java/mysql-connector-java.jar
[root@cloudera backup]#

Tidy up MySQL by moving the old ib_logfile files out of the way, then create the needed database:

[root@cloudera backup]# systemctl stop mysqld
[root@cloudera backup]# mv /var/lib/mysql/ib_logfile0 /backup
[root@cloudera backup]# mv /var/lib/mysql/ib_logfile1 /backup

mysqld must be running again (systemctl start mysqld) before connecting:

[root@cloudera etc]# mysql -uroot -p
Enter password:
mysql> create database rman DEFAULT CHARACTER SET utf8;
Query OK, 1 row affected (0.00 sec)
mysql> grant all on rman.* TO 'rman'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected, 1 warning (0.00 sec)

Configure Cloudera Manager to use MySQL as its external database:

[root@cloudera etc]# /usr/share/cmf/schema/scm_prepare_database.sh mysql -h localhost -uroot -pwelcome1 --scm-host localhost scm scm scm
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
Verifying that we can write to /etc/cloudera-scm-server
  • 14.
Mon Mar 05 14:46:56 EST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
Mon Mar 05 14:46:58 EST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[ main] DbCommandExecutor INFO Successfully connected to database.
All done, your SCM database is configured correctly!
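After the script reports success, it can be reassuring to look at what it wrote to /etc/cloudera-scm-server/db.properties. The following is a minimal sketch, not Cloudera tooling: get_db_property is a hypothetical helper, it assumes plain key=value lines, and the key names in the usage line are assumptions built from the com.cloudera.cmf.db. prefix shown in the output above.

```shell
# Hypothetical helper: print the value of one key from a key=value properties file.
# Assumes simple "key=value" lines with no line continuations or escaping.
get_db_property() {
    props_file="$1"
    key="$2"
    # Match the key exactly, strip everything up to the first "=", print the rest.
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$props_file"
}

# Usage sketch (key names are assumptions, left commented out):
# get_db_property /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.type
# get_db_property /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.name
```

If the printed database type or name is not what you passed to scm_prepare_database.sh, rerun the script before starting the server.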
Start the Cloudera Manager server:

[root@cloudera ~]# service cloudera-scm-server start
[root@cloudera ~]# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
2018-03-05 14:58:45,006 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Finished constructing repo:2018-03-05T19:58:45.006Z
2018-03-05 14:58:45,767 INFO WebServerImpl:org.mortbay.log: jetty-6.1.26.cloudera.4
2018-03-05 14:58:45,768 INFO WebServerImpl:org.mortbay.log: Started SelectChannelConnector@0.0.0.0:7180
2018-03-05 14:58:45,768 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.

The installation has completed successfully. Now open a web browser at the VM's IP address on port 7180 to start the agents' deployment and the CDH cluster setup.
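Instead of watching tail -f by hand, the wait for the "Started Jetty server" message can be scripted. This is a rough sketch: server_started and wait_for_server are hypothetical helper names, and the retry count and delay defaults are arbitrary assumptions; only the log path and the message text come from the session above.

```shell
# Hypothetical helper: succeeds once the log contains the Jetty start-up message.
server_started() {
    log_file="${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}"
    grep -q 'Started Jetty server' "$log_file" 2>/dev/null
}

# Hypothetical polling loop: check the log up to max_tries times,
# sleeping delay seconds between checks. Defaults are arbitrary assumptions.
wait_for_server() {
    wait_log="$1"
    max_tries="${2:-60}"
    delay="${3:-5}"
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        server_started "$wait_log" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage sketch (left commented out):
# wait_for_server /var/log/cloudera-scm-server/cloudera-scm-server.log && echo "UI should be reachable on port 7180"
```

Note that an old log from a previous run may already contain the message, so this check is most reliable right after a fresh start.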
  • 15. The default login is admin/admin.
  • 31. The warnings are mainly due to limited machine resources, chiefly disk space and memory. With the right resources you will not see those warnings.
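A quick memory check before installation can show whether such warnings are likely. The sketch below is only illustrative: mem_total_kb and has_min_memory are hypothetical helpers, and the threshold is whatever value you choose from your own capacity planning; no Cloudera-recommended figure is implied.

```shell
# Hypothetical helper: print total memory in kB from a /proc/meminfo-style file.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2; exit }' "${1:-/proc/meminfo}"
}

# Hypothetical check: succeeds when total memory is at least required_kb.
# The threshold is supplied by the caller; it is not a Cloudera figure.
has_min_memory() {
    required_kb="$1"
    total_kb=$(mem_total_kb "${2:-/proc/meminfo}")
    [ -n "$total_kb" ] && [ "$total_kb" -ge "$required_kb" ]
}

# Usage sketch (threshold of 8 GB is an arbitrary example, left commented out):
# has_min_memory 8388608 || echo "Expect resource warnings on this host"
```

Disk space can be checked the same way with df -k, as done earlier in this guide.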
  • 32. Congratulations, you have completed the Cloudera Manager setup with its agents and external databases, and then created a new CDH cluster.