Showing posts from September, 2023

(AWS EMR) Spark Error - org.apache.kafka.common.TopicPartition; class invalid for deserialization

A Spark Kafka integration job fails with the error below:

    Caused by: java.io.InvalidClassException: org.apache.kafka.common.TopicPartition; class invalid for deserialization
        at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
        at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:885)

This happens because the CLASSPATH contains two or more different versions of kafka-clients-*.jar. For example, one version may be pulled in as a dependency of "spark-sql-kafka", while another version is present by default on the cluster.

In our case, AWS EMR shipped "/usr/lib/hadoop-mapreduce/kafka-clients-0.8.2.1.jar", but we provided the following on the spark-submit classpath:

    spark-sql-kafka-0-10_2.12-2.4.4.jar
    kafka-clients-2.4.0.jar

We tried removing "kafka-clients-2.4.0.jar" from the spark-submit --jars list, but that led to the same error, since the stale 0.8.2.1 jar was still on the classpath. So we were finally required to remove the EMR-provided jar, "kafka-clients-0.8.2.1.jar", to fix the error.
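As a quick sanity check before resubmitting the job, a sketch like the one below (the /usr/lib search root is an assumption based on the EMR layout above) lists every default kafka-clients jar on a node and moves the conflicting one aside:

    # List every kafka-clients jar that the cluster ships by default
    # (run on each node; /usr/lib is an assumed search root)
    find /usr/lib -name 'kafka-clients-*.jar' 2>/dev/null

    # Back up the conflicting EMR-provided jar rather than deleting it outright
    sudo mv /usr/lib/hadoop-mapreduce/kafka-clients-0.8.2.1.jar \
            /tmp/kafka-clients-0.8.2.1.jar.bak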

Spark error- java.lang.IllegalStateException: Expected SaslMessage, received something else (maybe your client does not have SASL enabled?)

Exception Trace -

    23/09/08 19:37:39 dispatcher-event-loop-0 ERROR YarnClusterScheduler: Lost executor 1 on Unable to create executor due to Unable to register with external shuffle server due to : java.lang.IllegalStateException: Expected SaslMessage, received something else (maybe your client does not have SASL enabled?)
        at org.apache.spark.network.sasl.SaslMessage.decode(SaslMessage.java:69)
        at org.apache.spark.network.sasl.SaslRpcHandler.doAuthChallenge(SaslRpcHandler.java:80)
        at org.apache.spark.network.server.AbstractAuthRpcHandler.receive(AbstractAuthRpcHandler.java:59)
        at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:180)

Reason -
In our case this error appeared when the following properties were set:

    spark.shuffle.service.enabled=true
    spark.dynamicAllocation.enabled=true

With these enabled, the executors must register with the external shuffle service, which expects a SASL handshake; if the job itself does not have authentication enabled, the handshake fails with the error above.

Solution -
Either set both properties to false:

    spark.shuffle.service.enabled=false
    spark.dynamicAllocation.enabled=false

Or keep them enabled and set the following property so the job authenticates with the external shuffle service:

    spark.authenticate=true
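A minimal spark-submit sketch of both fixes (the application jar name is a placeholder; on YARN, spark.authenticate=true also requires the NodeManager's shuffle service to be configured for authentication):

    # Option 1: disable the external shuffle service and dynamic allocation
    spark-submit \
      --master yarn \
      --conf spark.shuffle.service.enabled=false \
      --conf spark.dynamicAllocation.enabled=false \
      my-streaming-job.jar

    # Option 2: keep both enabled and turn on SASL authentication so the
    # executors can register with the external shuffle service
    spark-submit \
      --master yarn \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.authenticate=true \
      my-streaming-job.jar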