Snappy error when using Spark / Hive

We received the following errors when using Spark:

ERROR -

1)

java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy

at org.apache.parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)

at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)

2)
Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.2-d5273c94-b734-4a61-b631-b68a9e859151-libsnappyjava.so: /tmp/snappy-1.1.2-d5273c94-b734-4a61-b631-b68a9e859151-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
at java.lang.Runtime.load0(Runtime.java:809)

CAUSE -

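The Snappy library extracts its native file (libsnappyjava.so) into the JVM temporary directory, which defaults to /tmp, and then loads it from there. The load fails because /tmp does not allow execution, typically because it is mounted with the noexec option.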
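One way to confirm the cause is to look at the mount options of /tmp on the affected node. The snippet below is a minimal sketch, assuming a Linux host where findmnt (from util-linux) is available; has_noexec is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a comma-separated mount option
# string contains the "noexec" option.
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask findmnt for the options of the filesystem that holds /tmp.
# If findmnt is missing, opts stays empty and the check reports "not noexec".
opts=$(findmnt -n -o OPTIONS --target /tmp 2>/dev/null || true)

if has_noexec "$opts"; then
    echo "/tmp is mounted noexec: $opts"
else
    echo "/tmp does not appear to be noexec: $opts"
fi
```

If the first message is printed, native libraries cannot be loaded from /tmp, which matches the UnsatisfiedLinkError shown in the error section.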

SOLUTION -

Point the JVM temporary directory (java.io.tmpdir) to a location that allows execution, for both the driver and the executors. For example:

spark-shell --master yarn --driver-memory 3G --num-executors 5 --executor-cores 3 --executor-memory 7G --conf "spark.driver.extraJavaOptions=-Djava.io.tmpdir=/product/a_d/spark/tmp" --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/product/a_d/spark/tmp"
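The directory passed in java.io.tmpdir has to exist and be writable on every node where a driver or executor JVM may start. The following is a minimal sketch of preparing it, assuming you are allowed to create the path and that opening it up like /tmp (mode 1777) is acceptable in your environment; prepare_tmpdir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: create a scratch directory and give it the same
# permissions /tmp usually has (world-writable with the sticky bit).
# Assumption: mode 1777 is acceptable on your cluster; tighten it if not.
prepare_tmpdir() {
    dir=$1
    mkdir -p "$dir" || return 1
    chmod 1777 "$dir" || return 1
    # Succeed only if the directory is now present, writable and searchable.
    [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]
}

# Usage, run on each node, with the path from the example above:
# prepare_tmpdir /product/a_d/spark/tmp
```

The new location also has to sit on a filesystem that is not mounted noexec, otherwise the same error will come back.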

If you receive the same error from the Hive shell, set the temporary directory as below before opening the shell:

export HADOOP_OPTS="-Djava.io.tmpdir=/product/a_d/spark/tmp"

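If performance is not a concern, you can also disable fetch task conversion, so that Hive runs the query as a cluster job instead of reading the Snappy data directly in the client JVM: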
set hive.fetch.task.conversion=none;

If you are using Beeline, set the property below before invoking the beeline command:

export _JAVA_OPTIONS=-Djava.io.tmpdir=/home/myhome/tmp
