
Showing posts from June, 2019

Snappy ERROR using Spark/Hive

We received the following error using Spark:

1) java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
        at org.apache.parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
        at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)

2) Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.2-d5273c94-b734-4a61-b631-b68a9e859151-libsnappyjava.so: /tmp/snappy-1.1.2-d5273c94-b734-4a61-b631-b68a9e859151-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
        at java.lang.Runtime.load0(Runtime.java:809)

CAUSE - /tmp is mounted without execute permission (noexec), so the Snappy native library that snappy-java extracts there cannot be loaded.

SOLUTION - Point the JVM temporary directory to a location that allows execution.
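One way to apply that fix is to override the temp directory on both the driver and the executors when submitting the job. This is a sketch, not from the original post: the path /var/tmp/sparktmp and the jar name are placeholders, and you need a directory that actually exists on every node and is not mounted noexec.

```shell
# Redirect the JVM temp dir (and snappy-java's own extraction dir) away
# from a noexec /tmp. Both properties are set for driver and executors.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Djava.io.tmpdir=/var/tmp/sparktmp -Dorg.xerial.snappy.tempdir=/var/tmp/sparktmp" \
  --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/var/tmp/sparktmp -Dorg.xerial.snappy.tempdir=/var/tmp/sparktmp" \
  my_job.jar
```

-Dorg.xerial.snappy.tempdir targets snappy-java's native-library extraction specifically; -Djava.io.tmpdir covers any other code that extracts native libraries to the JVM temp dir.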

Spark: Hive table on top of data consisting of subdirectories

We can create a Hive table on an HDFS path whose data sits in subdirectories, like:

Table A
|--Dir1
|   |--datafile
|--Dir2
|   |--datafile
|--Dir3
    |--datafile

When we read this Hive table using Spark, it fails with an error that the "respective path is a directory, not a file".

Solution - Data can be read recursively by setting the following property:

set mapreduce.input.fileinputformat.input.dir.recursive=true;
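A minimal sketch of how the property can be applied in a session, assuming a hypothetical table table_a built over the layout above. Hive also has a related flag, hive.mapred.supports.subdirectories, which is often enabled alongside the recursive-input setting:

```shell
# Enable recursive reads of subdirectories before querying the table.
# table_a is a placeholder name for the table described above.
hive -e "
SET mapreduce.input.fileinputformat.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
SELECT COUNT(*) FROM table_a;
"
```

The same SET statements can be issued through spark.sql(...) or beeline; the point is that they must be in effect in the session that reads the table.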

Serialization Exception running GenericUDF in HIVE with Spark.

I get an exception running a job with a GenericUDF in Hive with Spark. Exception trace as below:

                at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
                at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
                at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:186)
                ... 40 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: java.time.temporal.TemporalQueries$$Lambda$27/1278539211
Serialization trace:
query (java.time.format.DateTimeFormatterBuilder$ZoneTextPrinterParser)
printerParsers (java.time.format.DateTimeFormatterBuilder$CompositePrinterParser)
printerParser (java.time.format.DateTimeFormatter)
frmt (com.ds.common.udfs.TZToOffset)
                at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
                at o
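The serialization trace shows Kryo walking into the DateTimeFormatter field frmt of the UDF and hitting a JDK lambda (TemporalQueries$$Lambda...), which it cannot resolve. A common workaround (an assumption here, not stated in the truncated post) is to mark such non-serializable fields transient and rebuild them lazily after deserialization. The class and field names below are illustrative stand-ins for the real UDF, and plain Java is used instead of a full GenericUDF so the sketch stays self-contained:

```java
import java.io.Serializable;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch of the transient-field workaround: keep the non-serializable
// DateTimeFormatter out of the serialized state and rebuild it on first use.
public class TzFormatterHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: serializers skip this field, so the formatter's internal
    // JDK lambdas are never handed to Kryo.
    private transient DateTimeFormatter frmt;

    // Lazily (re)build the formatter, e.g. on an executor after deserialization.
    private DateTimeFormatter formatter() {
        if (frmt == null) {
            frmt = DateTimeFormatter.ISO_LOCAL_DATE;
        }
        return frmt;
    }

    public String format(LocalDate d) {
        return formatter().format(d);
    }

    public static void main(String[] args) {
        TzFormatterHolder h = new TzFormatterHolder();
        System.out.println(h.format(LocalDate.of(2019, 6, 1)));
    }
}
```

In a real GenericUDF the same pattern applies: initialize the formatter inside initialize() or on first call to evaluate(), never in a serialized instance field.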