Apache Hadoop 2 Installation in Pseudo Mode
Download URLs
1. Hadoop: https://archive.apache.org/dist/hadoop/core/stable/
2. Hive: https://archive.apache.org/dist/hive/hive-0.11.0/
3. Pig: https://archive.apache.org/dist/pig/pig-0.12.0/
4. HBase: https://archive.apache.org/dist/hbase/hbase-0.96.0/
Step 1: Generate ssh key
$ssh-keygen -t rsa -P ""
Step 2: Copy id_rsa.pub to authorized_keys
$cd .ssh
$cp id_rsa.pub authorized_keys
$chmod 644 authorized_keys
Step 3: Passwordless ssh to localhost
$cd ~
$ssh localhost
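If the key setup worked, ssh logs you in without a password prompt; type exit to return. For a quick non-interactive check (a sketch, assuming OpenSSH):
$ssh -o BatchMode=yes localhost echo ok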
Step 4: Untar tarballs
$tar -xvzf hadoop-2.2.0.tar.gz
Step 5: Configuration files
$cd hadoop-2.2.0/etc/hadoop/
$vim core-site.xml
Add the following properties in core-site.xml (the NameNode address here must match the
hbase.rootdir configured later for HBase):
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
<property>
<name>io.native.lib.available</name>
<value>true</value>
</property>
$vim hdfs-site.xml
Add the following properties in hdfs-site.xml
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-2.2.0/pseudo/dfs/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop-2.2.0/pseudo/dfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
$vim mapred-site.xml
Add the following properties in mapred-site.xml
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>/home/hadoop/hadoop-2.2.0/temp</value>
<final>true</final>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/home/hadoop/hadoop-2.2.0/local</value>
<final>true</final>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
$vim yarn-site.xml
Add the following properties in yarn-site.xml
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:6000</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:6001</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:6002</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/hadoop-2.2.0/yarn_nodemanager</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:6003</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10240</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/home/hadoop/hadoop-2.2.0/app-logs</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/hadoop/hadoop-2.2.0/logs</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
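The local directories referenced in the files above must be writable by the hadoop user; Hadoop creates most of them on startup, but creating them up front (a sketch, assuming the paths configured above) avoids permission surprises:
$mkdir -p /home/hadoop/hadoop-2.2.0/pseudo/dfs/name /home/hadoop/hadoop-2.2.0/pseudo/dfs/data
$mkdir -p /home/hadoop/hadoop-2.2.0/temp /home/hadoop/hadoop-2.2.0/local
$mkdir -p /home/hadoop/hadoop-2.2.0/yarn_nodemanager /home/hadoop/hadoop-2.2.0/app-logs /home/hadoop/hadoop-2.2.0/logs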
$vim slaves
Add localhost in the slaves file
Step 6: set .bashrc
$cd ~
$vim .bashrc
export JAVA_HOME=/usr
export HADOOP_HOME=/home/hadoop/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PIG_HOME=/home/hadoop/pig-0.12.0
export HBASE_HOME=/home/hadoop/hbase-0.96.0-hadoop2
export HIVE_HOME=/home/hadoop/hive-0.11.0
export PIG_CLASSPATH=$HADOOP_CONF_DIR
export CLASSPATH=$PIG_HOME/pig-withouthadoop.jar:\
$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:\
$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.2.0.jar:\
$HBASE_HOME/lib/hbase-client-0.96.0-hadoop2.jar:\
$HBASE_HOME/lib/hbase-common-0.96.0-hadoop2.jar:\
$HBASE_HOME/lib/hbase-server-0.96.0-hadoop2.jar:\
$HBASE_HOME/lib/commons-httpclient-3.1.jar:\
$HBASE_HOME/lib/commons-collections-3.2.1.jar:\
$HBASE_HOME/lib/commons-lang-2.6.jar:\
$HBASE_HOME/lib/jackson-mapper-asl-1.8.8.jar:\
$HBASE_HOME/lib/jackson-core-asl-1.8.8.jar:\
$HBASE_HOME/lib/guava-12.0.1.jar:\
$HBASE_HOME/lib/protobuf-java-2.5.0.jar:\
$HBASE_HOME/lib/commons-codec-1.7.jar:\
$HBASE_HOME/lib/zookeeper-3.4.5.jar:\
$HIVE_HOME/lib/hive-jdbc-0.11.0.jar:\
$HIVE_HOME/lib/hive-metastore-0.11.0.jar:\
$HIVE_HOME/lib/hive-serde-0.11.0.jar:\
$HIVE_HOME/lib/hive-common-0.11.0.jar:\
$HIVE_HOME/lib/hive-service-0.11.0.jar:\
$HIVE_HOME/lib/libfb303-0.9.0.jar:\
$HIVE_HOME/lib/postgresql-9.2-1003.jdbc3.jar:\
$HIVE_HOME/lib/libthrift-0.9.0.jar:\
$HIVE_HOME/lib/slf4j-api-1.6.1.jar:\
$HIVE_HOME/lib/commons-logging-1.0.4.jar:\
/home/hadoop/Hadoop2Training.jar
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PIG_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$PATH
Step 7: Load .bashrc
$cd ~
$. .bashrc
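To confirm the environment took effect (a quick check):
$echo $HADOOP_HOME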
Step 8: Formatting the NameNode
$cd ~
$hdfs namenode -format
Step 9: Starting Cluster
$cd ~/hadoop-2.2.0/sbin
$./start-all.sh
Note: start-all.sh is deprecated in Hadoop 2; running ./start-dfs.sh followed by ./start-yarn.sh is the preferred equivalent.
To view the started daemons
$ jps
This should show the started daemons:
NameNode
DataNode
SecondaryNameNode
NodeManager
ResourceManager
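As a quick smoke test (a sketch, assuming the PATH set in Step 6), create and list a directory in HDFS, then check the daemon web UIs on their Hadoop 2 default ports:
$hadoop fs -mkdir -p /user/hadoop
$hadoop fs -ls /
NameNode UI: http://localhost:50070
ResourceManager UI: http://localhost:8088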
Apache HBase Installation in Pseudo Mode
Step 1: Untar the tarballs
$tar -xvzf hbase-0.96.0-hadoop2.tar.gz
Step 2: Configuration files
$cd hbase-0.96.0-hadoop2/conf
$vim hbase-site.xml
Copy the following properties into hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
<description>The directory shared by RegionServers</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
$vim regionservers
Add localhost in the regionservers file
Step 3: Add the Hadoop jars from the Hadoop directory to the HBase lib directory
$cd /home/hadoop/hadoop-2.2.0/share/hadoop/common/
$cp hadoop-common-2.2.0.jar /home/hadoop/hbase-0.96.0-hadoop2/lib/
Step 4: Start HBase
$cd ~
$start-hbase.sh
Step 5: To view the started daemons
$ jps
HMaster
HRegionServer
HQuorumPeer
Step 6: To view hbase shell
$hbase shell
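A minimal sanity check inside the shell (the table and column family names below are illustrative):
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):003:0> scan 'test'
hbase(main):004:0> exit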
Step 7: Before connecting to HBase using Java
Start the HBase REST service by executing the following command
$hbase-daemon.sh start rest -p 8090
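To confirm the REST gateway is listening (on port 8090 as started above; assumes curl is available):
$curl http://localhost:8090/version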
Apache Hive Installation
Step 1: Untar the tarballs
$tar -xvzf hive-0.11.0.tar.gz
Step 2: Configuring a remote PostgreSQL database for the Hive Metastore
Before you can run the Hive metastore with a remote PostgreSQL database, you must configure a
connector to the remote PostgreSQL database, set up the initial database schema, and configure the
PostgreSQL user account for the Hive user.
Install and start PostgreSQL if you have not already done so. Edit the postgresql.conf file and set
listen_addresses to '*' so the server accepts connections from the network. Then configure
authentication for your network by adding a line to pg_hba.conf, as shown below.
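For example, a line of the following form (the subnet is illustrative; restrict it to your own network):
host    metastore    hiveuser    192.168.1.0/24    md5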
Start PostgreSQL Server
$ su postgres
$cd $postgres_home/bin
$./pg_ctl start -D path_to_data_dir
Install the Postgres JDBC Driver
Copy the PostgreSQL JDBC driver jar into $HIVE_HOME/lib/
Create the metastore database and user account
Proceed as in the following example:
$sudo -u postgres psql
postgres=# CREATE USER hiveuser WITH PASSWORD 'mypassword';
postgres=# CREATE DATABASE metastore;
postgres=# \q
$psql -h localhost -U hiveuser -d metastore
You are now connected to database 'metastore' as user 'hiveuser'.
metastore=# \i /home/hadoop/hive-0.11.0/scripts/metastore/upgrade/postgres/hive-schema-0.10.0.postgres.sql
Step 3: Configuration files
$cd hive-0.11.0/conf
$vim hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://<postgresql instance ip>:5432/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mypassword</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://<namenode ip>:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore
host</description>
</property>
<property>
<name>datanucleus.autoStartMechanism</name>
<value>SchemaTable</value>
</property>
</configuration>
Step 4: Start the Hive metastore
$hive --service metastore
Step 5: To view hive console
$hive
hive>show tables;
OK
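A minimal end-to-end check against the metastore (the table name is illustrative):
hive> CREATE TABLE test_tbl (id INT, name STRING);
hive> SHOW TABLES;
hive> DROP TABLE test_tbl;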
Step 6: Before connecting to Hive using Java
Start HiveServer by executing the following command
$hive --service hiveserver
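Clients can then connect through the Hive JDBC driver; with HiveServer1, as shipped in Hive 0.11, the default port is 10000, so the connection URL takes the form jdbc:hive://<hiveserver ip>:10000/default.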
Apache Pig Installation
Step 1: Untar the tarballs
$tar -xvzf pig-0.12.0.tar.gz
Step 2: Delete the two bundled jars (pig.jar and pig-withouthadoop.jar) from the Pig home
directory and add a Hadoop 2-compatible pig-withouthadoop.jar to the Pig installation directory
(uploaded to Knowmax at the same path)
Step 3: To open the Pig Grunt shell
$pig
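A quick sanity check in the Grunt shell (assumes HDFS is running; quit exits the shell):
grunt> fs -ls /
grunt> quit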