
Showing posts from June, 2019

Snappy ERROR using Spark/Hive

We received the following error using Spark:

1) java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
        at org.apache.parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
        at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)

2) Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.2-d5273c94-b734-4a61-b631-b68a9e859151-libsnappyjava.so: /tmp/snappy-1.1.2-d5273c94-b734-4a61-b631-b68a9e859151-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
        at java.lang.Runtime.load0(Runtime.java:809)

CAUSE - /tmp is mounted without execute permission (noexec), so the Snappy native library that snappy-java extracts there cannot be loaded.

SOLUTION - Point the JVM temporary directory to a location that allows execution.
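One way to apply that fix is to override the temp directory on both the driver and the executors when submitting the job. This is a sketch, not from the original post: the path /var/tmp/sparktmp and the jar name are placeholders, and you need a directory that actually exists on every node and is not mounted noexec.

```shell
# Redirect the JVM temp dir (and snappy-java's own extraction dir) away
# from a noexec /tmp. Both properties are set for driver and executors.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Djava.io.tmpdir=/var/tmp/sparktmp -Dorg.xerial.snappy.tempdir=/var/tmp/sparktmp" \
  --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/var/tmp/sparktmp -Dorg.xerial.snappy.tempdir=/var/tmp/sparktmp" \
  my_job.jar
```

-Dorg.xerial.snappy.tempdir targets snappy-java's native-library extraction specifically; -Djava.io.tmpdir covers any other code that extracts native libraries to the JVM temp dir.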

Spark: Hive table on top of data consisting of subdirectories

We can create a Hive table on an HDFS path whose data sits in subdirectories, like:

Table A
|--Dir1
|   |--datafile
|--Dir2
|   |--datafile
|--Dir3
    |--datafile

When we read this Hive table using Spark, it fails with an error that the "respective path is a directory, not a file".

Solution - Data can be read recursively by setting the following property:

set mapreduce.input.fileinputformat.input.dir.recursive=true;
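A minimal sketch of how the property can be applied in a session, assuming a hypothetical table table_a built over the layout above. Hive also has a related flag, hive.mapred.supports.subdirectories, which is often enabled alongside the recursive-input setting:

```shell
# Enable recursive reads of subdirectories before querying the table.
# table_a is a placeholder name for the table described above.
hive -e "
SET mapreduce.input.fileinputformat.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
SELECT COUNT(*) FROM table_a;
"
```

The same SET statements can be issued through spark.sql(...) or beeline; the point is that they must be in effect in the session that reads the table.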

Serialization Exception running GenericUDF in HIVE with Spark.

I get an exception running a job with a GenericUDF in Hive with Spark. Exception trace as below:

                at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
                at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
                at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:186)
                ... 40 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: java.time.temporal.TemporalQueries$$Lambda$27/1278539211
Serialization trace:
query (java.time.format.DateTimeFormatterBuilder$ZoneTextPrinterParser)
printerParsers (java.time.format.DateTimeFormatterBuilder$CompositePrinterParser)
printerParser (java.time.format.DateTimeFormatter)
frmt (com.ds.common.udfs.TZToOffset)
                at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
                at o
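The serialization trace shows Kryo walking into the DateTimeFormatter field frmt of the UDF and hitting a JDK lambda (TemporalQueries$$Lambda...), which it cannot resolve. A common workaround (an assumption here, not stated in the truncated post) is to mark such non-serializable fields transient and rebuild them lazily after deserialization. The class and field names below are illustrative stand-ins for the real UDF, and plain Java is used instead of a full GenericUDF so the sketch stays self-contained:

```java
import java.io.Serializable;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch of the transient-field workaround: keep the non-serializable
// DateTimeFormatter out of the serialized state and rebuild it on first use.
public class TzFormatterHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: serializers skip this field, so the formatter's internal
    // JDK lambdas are never handed to Kryo.
    private transient DateTimeFormatter frmt;

    // Lazily (re)build the formatter, e.g. on an executor after deserialization.
    private DateTimeFormatter formatter() {
        if (frmt == null) {
            frmt = DateTimeFormatter.ISO_LOCAL_DATE;
        }
        return frmt;
    }

    public String format(LocalDate d) {
        return formatter().format(d);
    }

    public static void main(String[] args) {
        TzFormatterHolder h = new TzFormatterHolder();
        System.out.println(h.format(LocalDate.of(2019, 6, 1)));
    }
}
```

In a real GenericUDF the same pattern applies: initialize the formatter inside initialize() or on first call to evaluate(), never in a serialized instance field.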