
Posts

Showing posts from 2015

Installing a Hadoop 2.x cluster with multiple nodes

1) Follow the steps below
We are going to set up a 3-node Hadoop cluster. To start, follow the steps below on every node, as written in http://querydb.blogspot.in/2015/12/installing-single-node-hadoop-220-on.html :
1) Prerequisite
2) Add Hadoop Group and User
3) Setup SSH Certificate
4) Disabling IPv6
5) Install/Setup Hadoop
6) Setup environment variable for Hadoop
7) Login using hduser and verify Hadoop version
** Please make sure to complete the steps only up to step 7.

2) Networking
Update /etc/hosts on each of the 3 boxes and add the lines below:
172.26.34.91    slave2
192.168.64.96   slave1
172.26.34.126   master

3) SSH access
Set up SSH on every node so that the nodes can communicate with one another without any prompt for a password. Since you have followed step 1) on every node, SSH keys have already been set up. What we need to do now is access slave1 and slave2 from master, so we just have to add hduser@master's public SSH key (which should be in $HOME/.ssh/id_rsa.pub) to the authorized_keys file on slave1 and slave2, as sketched below.
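A minimal sketch of that key distribution, run as hduser on master (assuming slave1 and slave2 resolve via the /etc/hosts entries above; ssh-copy-id appends the given public key to the remote user's authorized_keys):

$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave2
$ ssh slave1   # should now log in without a password prompt
$ ssh slave2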

Installing single-node Hadoop 2.x on Ubuntu

1) Prerequisite
Install the open-ssh server and Sun java-7-oracle.

2) Add Hadoop Group and User
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo

3) Setup SSH Certificate
$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost

4) Disabling IPv6
$ sudo gedit /etc/sysctl.conf
Add the following lines to the end of the file and reboot the machine:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

5) Install/Setup Hadoop
Download the Hadoop tar.gz file, then run the steps below in a shell:
$ sudo tar vxzf hadoop-2.7.1.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.7.1 hadoop
$ sudo chown -R hduser:hadoop hadoop

6) Setup environment variable for Hadoop
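The excerpt cuts off at step 6; a typical ~/.bashrc addition for hduser is sketched below. The HADOOP_HOME path follows from step 5; the JAVA_HOME path is an assumption that depends on how java-7-oracle was installed, so adjust it to your machine:

$ vi /home/hduser/.bashrc
# Hadoop environment (JAVA_HOME path assumed; verify on your system)
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin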

Hive: Complex UDF to replace keywords in a CSV string

Suppose we have an input file as follows:
$ vi source
abcd deff,12, xyzd,US
din,123,abcd,Pak

And a keywords file like:
$ vi keyword
abc,xyz
xyz

Say we want to produce output with 4 columns, where:
the first column indicates the original value;
the second column indicates the indexes of the keywords removed from the original value;
the third column indicates the string after the keywords are removed;
the fourth column indicates the number of times keywords were removed from the original value.

Firstly, let us create the desired tables in Hive as below:
Hive> create table source ( inital_data string ) ;
Hive> load data local inpath '/root/source' into table source;

Put the keyword file on HDFS:
$ hadoop fs -put /root/keyword hdfs://sandbox.hortonworks.com:8020/user/root/keyword

We will write a Hive UDF "ReplaceKeyword" that writes the desired output mentioned above with "$"-separated fields.
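The excerpt ends before the UDF source, so here is a hypothetical sketch of the general shape such a UDF could take. Only the class name ReplaceKeyword comes from the post; the simple UDF API, the second keywords argument, and all of the logic are assumptions (the post's actual UDF apparently reads the keywords file from HDFS instead):

package hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical sketch: strips each keyword out of a comma-separated line and
// reports what was removed, "$"-separated. Keywords are passed as a second
// argument here only to keep the sketch self-contained.
public final class ReplaceKeyword extends UDF {

    public Text evaluate(Text inputLine, Text keywords) {
        if (inputLine == null || keywords == null) {
            return null;
        }
        String[] fields = inputLine.toString().split(",");
        StringBuilder cleaned = new StringBuilder();
        StringBuilder removedIndexes = new StringBuilder();
        int removedCount = 0;

        for (int i = 0; i < fields.length; i++) {
            String field = fields[i];
            for (String keyword : keywords.toString().split(",")) {
                if (!keyword.isEmpty() && field.contains(keyword)) {
                    field = field.replace(keyword, "");
                    removedIndexes.append(i).append(' ');
                    removedCount++;
                }
            }
            if (cleaned.length() > 0) {
                cleaned.append(',');
            }
            cleaned.append(field);
        }
        // original $ indexes removed $ cleaned string $ removal count
        return new Text(inputLine + "$" + removedIndexes.toString().trim()
                + "$" + cleaned + "$" + removedCount);
    }
}

It could then be wired up along these lines (jar path and function name assumed):
Hive> add jar /root/replace-keyword.jar;
Hive> create temporary function replace_keyword as 'hive.ReplaceKeyword';
Hive> select replace_keyword(inital_data, 'abc,xyz') from source;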

Hive: Write a custom SerDe

Suppose we have an input file like below:
$ vi uwserde
kiju1233,1234567890
huhuhuhu,1233330987
…
…
…
This input file consists of a session ID and a timestamp as comma-separated values. Assuming this, I wrote a WritableComparable as below:

package hive;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class UserWritable implements WritableComparable<UserWritable> {

    private Text sessionID;
    private Text timestamp;

    public UserWritable() {
        set(new Text(), new Text());
    }

    public void set(Text sessionID, Text timestamp) {
        this.sessionID = sessionID;
        this.timestamp = timestamp;
    }
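The excerpt stops mid-class. The WritableComparable contract also requires write(), readFields(), and compareTo(); a hedged sketch of how those remaining methods might look for this class (not the post's actual code) is:

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the fields in a fixed order; readFields() must mirror it.
        sessionID.write(out);
        timestamp.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        sessionID.readFields(in);
        timestamp.readFields(in);
    }

    @Override
    public int compareTo(UserWritable other) {
        // Order records by session ID first, breaking ties on timestamp.
        int cmp = sessionID.compareTo(other.sessionID);
        return cmp != 0 ? cmp : timestamp.compareTo(other.timestamp);
    }
}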