Skip to main content

Posts

Showing posts from February, 2022

org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow

  We were running an application which was leading to below error -  Job aborted due to stage failure: Task 137 in stage 5.0 failed 4 times, most recent failure: Lost task 137.3 in stage 5.0 (TID 2090, ncABC.hadoop.com, executor 1): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 59606960. To avoid this, increase spark.kryoserializer.buffer.max value. at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:330) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:456) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 59606960 at com.esotericsoftware.kryo.io.Output.require(Output.java:167) at com.esotericsoftware.kryo.io.Output.writeBytes(

Spark MongoDB Connection - Fetched BSON document does not have all the fields solution

  Spark MongoDB connector does not fetch all the fields present in stored BSON document in a collection.  This is because Mongo collection can have documents with different schema. Typically, all documents in a collection are of similar or related purpose. A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data. So, when we read MongoDB Collection using Spark connector. It infers schema as per first row it may read, which might not consist of fields which are present in subsequent tuples/ rows. Suppose We have a MongoDB collection - default.fruits, and it has following documents -  { "_id" : 1, "type" : "apple"} { "_id" : 2, "type" : "orange", "qty" : 10 } { "_id" : 3, "type" : "ban