Pig step-by-step installation with integrated HCatalog

1) Download the tar file "pig-0.13.0.tar.gz"

2) Gunzip and untar the file so that Pig ends up in /opt/ds/app/pig-0.13.0

3) Change directory to /opt/ds/app/pig-0.13.0/conf

4) Create log4j.properties from the template file
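
Steps 2 to 4 could be scripted roughly as below. This is only a sketch: install_pig is a hypothetical helper (not something shipped with Pig), and it assumes the archive unpacks into a pig-0.13.0/ directory and that the template in conf is named log4j.properties.template.

```shell
# Sketch of steps 2-4. install_pig is a hypothetical helper, not part of Pig.
# Usage: install_pig /path/to/pig-0.13.0.tar.gz /opt/ds/app
install_pig() {
    tarball=$1
    app_dir=$2

    mkdir -p "$app_dir" || return 1

    # Gunzip and untar in one step.
    # Assumption: the archive unpacks into a pig-0.13.0/ directory.
    tar -xzf "$tarball" -C "$app_dir" || return 1

    conf_dir="$app_dir/pig-0.13.0/conf"

    # Create log4j.properties from the template, keeping any existing file.
    # Assumption: the template is named log4j.properties.template.
    if [ ! -f "$conf_dir/log4j.properties" ]; then
        cp "$conf_dir/log4j.properties.template" "$conf_dir/log4j.properties" || return 1
    fi
}
```

The copy is skipped when log4j.properties already exists, so running the helper again does not overwrite local logging changes.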

5) Update pig.properties for HCatalog. For example:

hcat.bin=/opt/ds/app/hive-0.13.0/hcatalog/bin/hcat

6) Edit .bashrc

export PIG_HOME=/opt/ds/app/pig-0.13.0
export PATH=$PATH:$PIG_HOME/bin
export HCAT_HOME=/opt/ds/app/hive-0.13.0/hcatalog
export PATH=$PATH:$HCAT_HOME/bin

7) It is assumed that you have already set HADOOP_HOME, JAVA_HOME, HADOOP_COMMON_LIB_NATIVE_DIR, HADOOP_OPTS, YARN_OPTS

8) Optionally, create a .pigbootup file in the user's home directory; Pig runs the statements in it at startup

9) Run the following command from the user's home directory

> source .bashrc
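
After sourcing .bashrc, a quick check like the one below can confirm that the variables from steps 6 and 7 are really set. It is a minimal sketch: missing_vars is a hypothetical helper name, and it only checks that each variable is non-empty, not that it points at a valid directory.

```shell
# missing_vars is a hypothetical helper: it prints the name of every
# variable in its argument list that is unset or empty.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Variables from steps 6 and 7; any name printed here still needs to be set.
missing_vars JAVA_HOME HADOOP_HOME HADOOP_COMMON_LIB_NATIVE_DIR \
             HADOOP_OPTS YARN_OPTS PIG_HOME HCAT_HOME
```

If nothing is printed, all of the listed variables have a value in the current shell.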

10) Start Pig with HCatalog support

> pig -useHCatalog

11) Suppose you have already created a table named "hivetesting" in Hive. To verify the installation, try loading it with the commands below.

grunt> A = LOAD 'hivetesting' USING org.apache.hcatalog.pig.HCatLoader();
grunt> describe A;

==========

Sometimes, after installing Pig, you may get an error like the one below:

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
 at org.apache.hcatalog.common.HCatUtil.checkJobContextIfRunningFromBackend(HCatUtil.java:88)
 at org.apache.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:162)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:540)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:322)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:199)
 at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:277)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1367)
 at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1352)
 at org.apache.pig.PigServer.execute(PigServer.java:1341)

Many blogs suggest recompiling Pig by running the command:

ant clean jar-all -Dhadoopversion=23

or recompiling piggybank.jar with the steps below:

cd contrib/piggybank/java
ant clean
ant -Dhadoopversion=23

But this may not solve your problem. The error means that code compiled against Hadoop 1, where JobContext is a class, is running on Hadoop 2, where it is an interface, and the stack trace shows that the code in question is HCatalog's HCatUtil, not Pig. So the actual cause here is HCatalog. Try updating it! In my case, I was using Hive 0.13 and Pig 0.13, with the HCatalog provided with Hive 0.13.

I then updated Pig to 0.15 and used the separate hive-hcatalog-0.13.0.2.1.1.0-385 library jars, and the problem was resolved.

I later identified that it was not Pig that was causing the problem but the Hive-HCatalog libraries. Hope this helps.
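
Since the problem came from the HCatalog libraries, it can help to see which HCatalog jars (and which versions) are actually installed before swapping anything. The sketch below is one way to do that; list_hcat_jars is a hypothetical helper, and the *hcatalog*.jar name pattern is an assumption based on jar names such as hive-hcatalog-0.13.0.2.1.1.0-385.

```shell
# list_hcat_jars is a hypothetical helper: it prints every jar whose file
# name contains "hcatalog" below the given directory, sorted by path.
# Usage: list_hcat_jars "$HCAT_HOME"
list_hcat_jars() {
    find "$1" -type f -name '*hcatalog*.jar' | sort
}
```

The version is part of each jar's file name, so the output shows at a glance which HCatalog build Pig would pick up from that directory.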
