Installing single node Hadoop 2.x on Ubuntu

1) Prerequisites
  • Install the OpenSSH server (openssh-server)
  • Install Oracle Java 7 (java-7-oracle)
2) Add Hadoop Group and User
$ sudo addgroup hadoop

$ sudo adduser --ingroup hadoop hduser

$ sudo adduser hduser sudo

3) Set up SSH keys

$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/
...
$ cat ~/.ssh/ >> ~/.ssh/authorized_keys
$ ssh localhost
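If ssh localhost still asks for a password after these steps, one common cause is loose permissions: sshd's StrictModes setting (on by default) rejects key files that other users can write to. A minimal sketch that tightens them; the function name fix_ssh_perms and its optional directory argument are hypothetical, not part of the original steps:

```shell
# Hypothetical helper: tighten permissions on an SSH directory.
# $1 is the directory to fix (defaults to ~/.ssh).
fix_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    mkdir -p "$dir"
    chmod 700 "$dir"
    # Create authorized_keys if it is missing so chmod has a target.
    touch "$dir/authorized_keys"
    chmod 600 "$dir/authorized_keys"
}
```

Run fix_ssh_perms as hduser, then retry ssh localhost.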

4) Disabling IPv6
$ sudo gedit /etc/sysctl.conf
Add the following lines to the end of the file and reboot the machine:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
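After the reboot, the setting can be confirmed by reading the kernel's current value, which is 1 when IPv6 is disabled. A small sketch; the function name ipv6_disabled and its optional file argument are hypothetical, while the /proc path is the standard location of this sysctl key:

```shell
# Succeeds (exit status 0) if the given sysctl file contains 1.
# $1 defaults to the "all interfaces" IPv6 switch under /proc.
ipv6_disabled() {
    file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    [ -r "$file" ] && [ "$(cat "$file")" = "1" ]
}
```

For example: ipv6_disabled && echo "IPv6 is disabled"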

5) Install/ Setup Hadoop
  • Download the Hadoop tar.gz file.
  • Run the following commands in a shell:
$ sudo tar vxzf hadoop-2.7.1.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.7.1 hadoop
$ sudo chown -R hduser:hadoop hadoop

6) Set up environment variables for Hadoop
$ cd ~
$ vi .bashrc
Paste the following at the end of the file:
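The lines to paste are not reproduced above. A commonly used set is sketched here as an assumption, not the author's exact text. It assumes Hadoop was unpacked to /usr/local/hadoop (step 5) and that Java lives at /usr/lib/jvm/jdk (the JAVA_HOME used in step 8):

```shell
# Assumed values; adjust them to the actual install locations.
export JAVA_HOME=/usr/lib/jvm/jdk
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# bin holds the hadoop, hdfs and yarn commands; sbin holds the start/stop scripts.
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Run source ~/.bashrc (or log in again) so the current shell picks up the changes.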

7) Log in as hduser and verify the Hadoop version
$ hadoop version
Hadoop 2.7.1
Subversion -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar

8) Configure Hadoop
$ cd /usr/local/hadoop/etc/hadoop
$ vi core-site.xml
# Paste the following between the <configuration> tags
<property>
  <name></name>
  <value>hdfs://localhost:9000</value>
</property>
$ vi yarn-site.xml
# Paste the following between the <configuration> tags
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
$ mv mapred-site.xml.template mapred-site.xml
$ vi mapred-site.xml
# Paste the following between the <configuration> tags
<property>
  <name></name>
  <value>yarn</value>
</property>
$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ cd /usr/local/hadoop/etc/hadoop
$ vi hdfs-site.xml
# Paste the following between the <configuration> tags
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name></name>
  <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
  <name></name>
  <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>

$ vi /usr/local/hadoop/etc/hadoop/
# Set Java home. The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jdk
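A property with an empty <name> element does not configure the intended setting, so it may help to scan the edited files for blank names before going further. A rough sketch using grep; the function name find_empty_names is hypothetical, and the default directory is the one used in this step:

```shell
# Print every *-site.xml file under the config directory ($1)
# that contains a <name> element with nothing (or only spaces) inside.
find_empty_names() {
    dir="${1:-/usr/local/hadoop/etc/hadoop}"
    grep -l '<name>[[:space:]]*</name>' "$dir"/*-site.xml 2>/dev/null
}
```

Running find_empty_names with no arguments lists the files that still need a property name filled in; no output means none were found.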

9) Format Namenode
$ hdfs namenode -format
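Formatting writes a current/VERSION file under the namenode directory, so its presence is a quick sign that the format completed. A minimal sketch, assuming the namenode directory created in step 8; the function name namenode_formatted is hypothetical:

```shell
# Succeeds (exit status 0) if the namenode directory ($1) holds a VERSION file.
namenode_formatted() {
    dir="${1:-/home/hduser/mydata/hdfs/namenode}"
    [ -f "$dir/current/VERSION" ]
}
```

For example: namenode_formatted && echo "namenode directory is formatted"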

10) Start Hadoop Services
$ start-dfs.sh
15/12/09 14:39:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-ubuntu-VirtualBox.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu-VirtualBox.out
Starting secondary namenodes []
starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-ubuntu-VirtualBox.out

$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-ubuntu-VirtualBox.out

localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-ubuntu-VirtualBox.out

$ start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-hduser-historyserver-ubuntu-VirtualBox.out

$ jps
2511 DataNode
2388 NameNode
3023 NodeManager
3346 JobHistoryServer
3413 Jps
2694 SecondaryNameNode
2894 ResourceManager
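The jps listing above shows the six daemons this single-node setup runs. A small sketch that reports any that are missing; the function name missing_daemons is hypothetical, and it expects jps-style "PID Name" lines on standard input:

```shell
# Read "PID Name" lines on stdin and print each expected daemon that is absent.
missing_daemons() {
    found=$(awk '{print $2}')
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        echo "$found" | grep -qx "$d" || echo "$d"
    done
}
```

Running jps | missing_daemons prints nothing when all six daemons are up.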

11) Run Hadoop Example
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 7
WARNING: Use "yarn jar" to launch YARN applications.
Number of Maps  = 2
Samples per Map = 7
15/12/09 14:42:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
15/12/09 14:43:02 INFO client.RMProxy: Connecting to ResourceManager at /

15/12/09 14:43:04 INFO input.FileInputFormat: Total input paths to process : 2
15/12/09 14:43:04 INFO mapreduce.JobSubmitter: number of splits:2
15/12/09 14:43:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449652253965_0001
15/12/09 14:43:07 INFO impl.YarnClientImpl: Submitted application application_1449652253965_0001
15/12/09 14:43:07 INFO mapreduce.Job: The url to track the job: http://ubuntu-VirtualBox:8088/proxy/application_1449652253965_0001/
15/12/09 14:43:07 INFO mapreduce.Job: Running job: job_1449652253965_0001
Job Finished in 57.921 seconds
Estimated value of Pi is 3.71428571428571428571
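The estimate is coarse because the job sampled only 2 × 7 = 14 points. The example estimates Pi as 4 times the fraction of sample points that land inside a circle, and the printed value equals 4 × 13 / 14, which suggests that 13 of the 14 points fell inside. That count is an inference from the output, not something the log states. The arithmetic can be checked with awk; the function name pi_estimate is hypothetical:

```shell
# Compute 4 * inside / total, the ratio behind the estimate.
# inside=13 is inferred from the output above; total = 2 maps * 7 samples.
pi_estimate() {
    awk -v inside="$1" -v total="$2" 'BEGIN { printf "%.6f\n", 4 * inside / total }'
}
pi_estimate 13 14   # prints 3.714286
```

Larger arguments to the pi example (more maps or more samples per map) should give a closer estimate, at the cost of a longer run.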


