Pig step-by-step installation with integrated HCatalog

1) Download tar file "pig-0.13.0.tar.gz"

2) Gunzip and untar the file so that Pig ends up in /opt/ds/app/pig-0.13.0
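Both parts of this step can be done with a single tar call, since tar can gunzip on the fly. Here is a minimal sketch, assuming the tarball unpacks into a top-level pig-0.13.0 directory. The function name extract_pig is only illustrative.

```shell
# Unpack a gzipped tarball into a destination directory.
# $1 = path to the tarball, $2 = parent directory to extract into.
extract_pig() {
  tarball="$1"
  dest="$2"
  mkdir -p "$dest" || return 1
  # -x extract, -z gunzip on the fly, -f archive file, -C target directory
  tar -xzf "$tarball" -C "$dest"
}
```

Under that assumption, calling extract_pig pig-0.13.0.tar.gz /opt/ds/app should leave the files under /opt/ds/app/pig-0.13.0.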

3) Change directory to /opt/ds/app/pig-0.13.0/conf

4) Create log4j.properties from the template file in that directory
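This step amounts to a file copy. Here is a small sketch, assuming the conf directory ships a template named log4j.properties.template (check the name in your distribution). The helper name create_log4j is hypothetical.

```shell
# Create log4j.properties from the template without overwriting an existing file.
# $1 = Pig conf directory, e.g. /opt/ds/app/pig-0.13.0/conf
create_log4j() {
  conf_dir="$1"
  if [ -f "$conf_dir/log4j.properties" ]; then
    echo "log4j.properties already exists, leaving it alone"
    return 0
  fi
  # Assumed template name; adjust if your distribution differs.
  cp "$conf_dir/log4j.properties.template" "$conf_dir/log4j.properties"
}
```

Run it as create_log4j /opt/ds/app/pig-0.13.0/conf, then edit the new file as needed.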

5) Update pig.properties for HCatalog. For example:

hcat.bin=/opt/ds/app/hive-0.13.0/hcatalog/bin/hcat

6) Edit .bashrc and add the following:

export PIG_HOME=/opt/ds/app/pig-0.13.0
export PATH=$PATH:$PIG_HOME/bin
export HCAT_HOME=/opt/ds/app/hive-0.13.0/hcatalog
export PATH=$PATH:$HCAT_HOME/bin

7) It is assumed that you have already set HADOOP_HOME, JAVA_HOME, HADOOP_COMMON_LIB_NATIVE_DIR, HADOOP_OPTS, YARN_OPTS
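A quick way to confirm this assumption is to check each variable in the current shell. The sketch below only reports names that are unset or empty; it does not validate the values. The function name check_hadoop_env is illustrative.

```shell
# Print the names of required variables that are unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_hadoop_env() {
  missing=0
  for name in HADOOP_HOME JAVA_HOME HADOOP_COMMON_LIB_NATIVE_DIR HADOOP_OPTS YARN_OPTS; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_hadoop_env after sourcing .bashrc should print nothing if everything is in place.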

8) Optionally, create a .pigbootup file in the user's home directory
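As an illustration, the file can hold Pig statements that Grunt runs at startup. The sketch below writes a starter file only if none exists. The contents, including the default_parallel value of 10, are placeholders rather than recommended settings, and write_pigbootup is a hypothetical helper name.

```shell
# Write a starter .pigbootup unless one already exists.
# $1 = target file; defaults to ~/.pigbootup
write_pigbootup() {
  target="${1:-$HOME/.pigbootup}"
  if [ -e "$target" ]; then
    return 0
  fi
  # Placeholder contents: replace with the statements you actually want.
  cat > "$target" <<'EOF'
-- Statements here run each time Grunt starts.
SET default_parallel 10;
EOF
}
```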

9) From the user's home directory, reload .bashrc:

> source .bashrc

10) Start Pig with HCatalog support:

> pig -useHCatalog

11) Suppose you have created a table in Hive named "hivetesting". Load it with the commands below to verify the installation:

grunt> A = LOAD 'hivetesting' USING org.apache.hcatalog.pig.HCatLoader();
grunt> describe A;

==========

Sometimes, after installing Pig, you may hit an error like the one below:

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
 at org.apache.hcatalog.common.HCatUtil.checkJobContextIfRunningFromBackend(HCatUtil.java:88)
 at org.apache.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:162)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:540)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:322)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:199)
 at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:277)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1367)
 at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1352)
 at org.apache.pig.PigServer.execute(PigServer.java:1341)

Many blogs suggest that you recompile Pig by executing the command:


ant clean jar-all -Dhadoopversion=23

or recompile piggybank.jar by executing the steps below:


cd contrib/piggybank/java
ant clean
ant -Dhadoopversion=23

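But this may not solve the problem. The actual cause here is related to HCatalog, so try updating it. The error typically means that a jar compiled against Hadoop 1, where JobContext is a class, is running on Hadoop 2, where it is an interface, and the top frame of the stack trace is in HCatalog's HCatUtil. In my case, I was using Hive 0.13 and Pig 0.13, with the HCatalog provided with Hive 0.13.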
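Before changing anything, it may help to see which HCatalog jars are actually installed and what versions they carry. Here is a small sketch, assuming the jars have "hcatalog" in their file names, as the hive-hcatalog-* artifacts do. The function name list_hcat_jars is illustrative.

```shell
# List HCatalog-related jars under a directory so their versions can be compared.
# $1 = directory to search, e.g. "$HCAT_HOME"
list_hcat_jars() {
  find "$1" -type f -name '*hcatalog*.jar' | sort
}
```

For example, list_hcat_jars "$HCAT_HOME" shows the jars under the HCatalog directory set in .bashrc.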

I then upgraded Pig to 0.15 and used the separate hive-hcatalog-0.13.0.2.1.1.0-385 library jars, and the problem was resolved.

I later identified that it was not Pig causing the problem but the Hive-HCatalog libraries. Hope this helps.
