
Showing posts from November, 2015

Hive: Complex UDF to replace keywords in a CSV string

Suppose we have an input file as follows:

    $ vi source
    abcd deff,12, xyzd,US
    din,123,abcd,Pak

And a keywords file like:

    $ vi keyword
    abc,xyz
    xyz

Say we want to produce output with four columns, where:

    first column  - the original value
    second column - the indexes of the keywords removed from the original value
    third column  - the string after the keywords are removed
    fourth column - the number of times keywords were removed from the original value

First, let us create the desired table in Hive:

    hive> create table source ( inital_data string );
    hive> load data local inpath '/root/source' into table source;

Then put the keyword file onto HDFS:

    $ hadoop fs -put /root/keyword hdfs://sandbox.hortonworks.com:8020/user/root/keyword

We will write a Hive UDF "ReplaceKeyword" that writes the desired output mentioned above with "$" se
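The UDF itself is cut off in this excerpt, but the keyword-removal logic it describes can be sketched in plain Java outside Hive. The class and method names below are hypothetical, and it assumes a keyword is removed only when it matches a whole comma-separated field; the real UDF would wrap this logic in an evaluate() method.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    public class KeywordRemover {

        // Remove every comma-separated field that exactly matches a keyword,
        // and return the four output columns joined by "$":
        // original value $ removed indexes $ value after removal $ removal count
        public static String replaceKeywords(String csv, Set<String> keywords) {
            String[] fields = csv.split(",", -1);
            List<String> kept = new ArrayList<>();
            List<String> removedIdx = new ArrayList<>();
            for (int i = 0; i < fields.length; i++) {
                if (keywords.contains(fields[i].trim())) {
                    removedIdx.add(String.valueOf(i));   // record where a keyword was dropped
                } else {
                    kept.add(fields[i]);                 // keep non-keyword fields in order
                }
            }
            return csv + "$" + String.join(",", removedIdx)
                       + "$" + String.join(",", kept)
                       + "$" + removedIdx.size();
        }
    }

For the sample row din,123,abcd,Pak with keyword abcd, this yields din,123,abcd,Pak$2$din,123,Pak$1.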

Hive: Writing a custom SerDe

Suppose we have an input file like below:

    $ vi uwserde
    kiju1233,1234567890
    huhuhuhu,1233330987
    ...

This input file consists of a sessionid and a timestamp as comma-separated values. Assuming this, I wrote a WritableComparable as below:

    package hive;

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;

    public class UserWritable implements WritableComparable<UserWritable> {

        private Text sessionID;
        private Text timestamp;

        public UserWritable() {
            set(new Text(), new Text());
        }

        public void set(Text sessionID, Text timestamp) {
            this.sessionID = sessionID;
            this.timestamp = timestamp;
        }

To satisfy the WritableComparable contract, the class must also implement write, readFields, and compareTo; a minimal completion would be:

        @Override
        public void write(DataOutput out) throws IOException {
            sessionID.write(out);
            timestamp.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            sessionID.readFields(in);
            timestamp.readFields(in);
        }

        @Override
        public int compareTo(UserWritable other) {
            int cmp = sessionID.compareTo(other.sessionID);
            return cmp != 0 ? cmp : timestamp.compareTo(other.timestamp);
        }
    }
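At its core, the deserialize step of such a SerDe just splits each row on the first comma into the sessionid and timestamp fields. Stripped of the Hadoop and Hive types, that logic can be sketched as follows; the class and method names are hypothetical, not part of the original post.

    import java.util.Arrays;
    import java.util.List;

    public class UserRowParser {

        // Split one comma-separated row into its two fields:
        // sessionid and timestamp, returned as plain strings.
        public static List<String> parseRow(String row) {
            String[] parts = row.split(",", 2);   // split on the first comma only
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected sessionid,timestamp: " + row);
            }
            return Arrays.asList(parts[0], parts[1]);
        }
    }

In the real SerDe, the returned values would be wrapped in Text objects and handed back to Hive as the row's column values.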