1) Prerequisite
- Install the OpenSSH server
- Install Oracle Java 7 (java-7-oracle)
2) Add Hadoop Group and User
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo
3) Set Up SSH Keys
$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
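If `ssh localhost` still asks for a password after the key is added, one common cause is overly permissive modes on ~/.ssh or authorized_keys, which sshd rejects when StrictModes is on. A minimal sketch of a fix follows; `fix_ssh_perms` is a hypothetical helper name, not part of OpenSSH.

```shell
# Tighten permissions on an SSH directory so sshd will accept the key.
# fix_ssh_perms is an illustrative helper; it defaults to ~/.ssh.
fix_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    chmod 700 "$dir"                          # owner-only access to the directory
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys"      # owner-only read/write on the key list
    fi
}
```

Run `fix_ssh_perms` as hduser, then `ssh -o BatchMode=yes localhost true`. With BatchMode, ssh fails instead of prompting, so a zero exit status suggests key login works.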
4) Disable IPv6
$ sudo gedit /etc/sysctl.conf
Add the following lines to the end of the file, then reboot the machine:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
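To confirm the setting took effect after the reboot, the kernel exposes the current value under /proc. The sketch below reads it; `ipv6_status` is a hypothetical helper name, and the optional path argument exists only so the function can be tried against another file.

```shell
# Report whether IPv6 is disabled, based on the kernel's disable_ipv6 flag.
# ipv6_status is an illustrative helper; 1 in the file means disabled.
ipv6_status() {
    f="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -r "$f" ]; then
        echo "unknown"        # file missing or unreadable
    elif [ "$(cat "$f")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}
```

Calling `ipv6_status` with no argument should print `disabled` once the lines above are active. As an alternative to rebooting, `sudo sysctl -p` reloads /etc/sysctl.conf.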
5) Install and Set Up Hadoop
- Download the Hadoop tar.gz file (hadoop-2.7.1.tar.gz in this guide).
- Run the following commands in the shell:
$ sudo tar vxzf hadoop-2.7.1.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.7.1 hadoop
$ sudo chown -R hduser:hadoop hadoop
6) Set Up Environment Variables for Hadoop
$ cd ~
$ vi .bashrc
Paste the following at the end of the file:
###
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export YARN_OPTS="$YARN_OPTS -Djava.net.preferIPv4Stack=true"
###end
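After reloading the shell (for example with `source ~/.bashrc`), it can help to confirm that the two base variables point at real directories before moving on. This is a minimal sketch; `check_hadoop_env` is a hypothetical helper name.

```shell
# Check that JAVA_HOME and HADOOP_INSTALL are set and point to directories.
# check_hadoop_env is an illustrative helper; it returns non-zero on any problem.
check_hadoop_env() {
    status=0
    for var in JAVA_HOME HADOOP_INSTALL; do
        eval "val=\${$var:-}"                 # read the variable named by $var
        if [ -z "$val" ]; then
            echo "$var is not set"
            status=1
        elif [ ! -d "$val" ]; then
            echo "$var points to a missing directory: $val"
            status=1
        else
            echo "$var=$val"
        fi
    done
    return $status
}
```

If JAVA_HOME is reported as missing, the path in .bashrc probably does not match where the JDK was installed on this machine.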
7) Log in as hduser and verify the Hadoop version
$ hadoop version
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar
8) Configure Hadoop
$ cd /usr/local/hadoop/etc/hadoop
$ vi core-site.xml
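# Paste the following between the <configuration> tags
# (fs.default.name is the deprecated name of fs.defaultFS; Hadoop 2.7.1 accepts either)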
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
$ vi yarn-site.xml
# Paste the following between the <configuration> tags
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
$ mv mapred-site.xml.template mapred-site.xml
$ vi mapred-site.xml
# Paste the following between the <configuration> tags
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ cd /usr/local/hadoop/etc/hadoop
$ vi hdfs-site.xml
# Paste the following between the <configuration> tags
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
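A quick way to double-check what was pasted into the site files is to read a property back out. The sketch below is a simple text match, not a real XML parser. It assumes the `<name>` and `<value>` elements each sit on a single line, as in the snippets above. `get_site_property` is a hypothetical helper name.

```shell
# Print the value of a property from a Hadoop *-site.xml file.
# get_site_property is an illustrative helper: $1 = file, $2 = property name.
# Assumes <name>...</name> and <value>...</value> are each on one line.
get_site_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { want = 1 }
        want && match($0, /<value>[^<]*<\/value>/) {
            # strip the 7-character <value> and 8-character </value> tags
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}
```

For example, `get_site_property hdfs-site.xml dfs.replication` should print `1` for the configuration above. An empty result suggests the property is missing or misspelled.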
$ vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# Set Java home (the Java implementation to use)
export JAVA_HOME=/usr/lib/jvm/jdk
9) Format the NameNode
$ hdfs namenode -format
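Formatting is meant to be done once. Running it again on a directory that already holds metadata discards that metadata and assigns a new cluster ID, which existing DataNode directories will not match. A formatted NameNode directory contains a current/VERSION file, so a small guard can be sketched as follows; `namenode_state` is a hypothetical helper name.

```shell
# Report whether a NameNode directory has already been formatted.
# namenode_state is an illustrative helper: $1 = dfs.namenode.name.dir path.
namenode_state() {
    if [ -f "$1/current/VERSION" ]; then
        echo "formatted"
    else
        echo "empty"
    fi
}

# Possible use with the directory configured in hdfs-site.xml:
#   if [ "$(namenode_state "$HOME/mydata/hdfs/namenode")" = "empty" ]; then
#       hdfs namenode -format
#   fi
```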
10) Start Hadoop Service
$ start-dfs.sh
15/12/09 14:39:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-ubuntu-VirtualBox.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu-VirtualBox.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-ubuntu-VirtualBox.out
$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-ubuntu-VirtualBox.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-ubuntu-VirtualBox.out
$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-hduser-historyserver-ubuntu-VirtualBox.out
$ jps
2511 DataNode
2388 NameNode
3023 NodeManager
3346 JobHistoryServer
3413 Jps
2694 SecondaryNameNode
2894 ResourceManager
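All six daemons should appear in the `jps` listing. Rather than comparing by eye, the check can be scripted. The sketch below reads `jps` output on standard input and prints any expected daemon that is absent; `missing_daemons` is a hypothetical helper name.

```shell
# Print the name of each expected Hadoop daemon that is absent from jps output.
# missing_daemons is an illustrative helper; it reads "PID Name" lines on stdin.
missing_daemons() {
    jps_output=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
        # Match the second column exactly so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$jps_output" |
            awk -v name="$d" '$2 == name { found = 1 } END { exit !found }' ||
            echo "$d"
    done
}
```

Run it as `jps | missing_daemons`. No output means every daemon is up. For any daemon it names, the matching log file under /usr/local/hadoop/logs is the place to look.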
11) Run Hadoop Example
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 7
WARNING: Use "yarn jar" to launch YARN applications.
Number of Maps = 2
Samples per Map = 7
15/12/09 14:42:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
15/12/09 14:43:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/12/09 14:43:04 INFO input.FileInputFormat: Total input paths to process : 2
15/12/09 14:43:04 INFO mapreduce.JobSubmitter: number of splits:2
15/12/09 14:43:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449652253965_0001
15/12/09 14:43:07 INFO impl.YarnClientImpl: Submitted application application_1449652253965_0001
15/12/09 14:43:07 INFO mapreduce.Job: The url to track the job: http://ubuntu-VirtualBox:8088/proxy/application_1449652253965_0001/
15/12/09 14:43:07 INFO mapreduce.Job: Running job: job_1449652253965_0001
...
...
Job Finished in 57.921 seconds
Estimated value of Pi is 3.71428571428571428571
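The estimate is far from 3.14159 because the run used only 2 maps x 7 samples = 14 points. The example estimates Pi as 4 x (points inside the circle) / (total points), and the value printed above matches 4 x 13 / 14. That suggests 13 of the 14 points landed inside, though the count is inferred from the output rather than printed by the job. The arithmetic can be reproduced as below; `pi_estimate` is a hypothetical helper name.

```shell
# Reproduce the estimator arithmetic: 4 * inside / total.
# pi_estimate is an illustrative helper: $1 = points inside, $2 = total points.
pi_estimate() {
    awk -v inside="$1" -v total="$2" 'BEGIN { printf "%.6f\n", 4 * inside / total }'
}

pi_estimate 13 14    # prints 3.714286
```

Raising the two arguments to the `pi` example (number of maps and samples per map) should bring the estimate closer to the true value, at the cost of a longer job.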