Spark Kafka Integration Job leads to error below -
Caused by: java.io.InvalidClassException: org.apache.kafka.common.TopicPartition; class invalid for deserialization
at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:885)
That is because CLASSPATH might be having two or more different version of kafka-clients-*.jar. For example - One may be dependent Jar with "spark-sql-kafka", and other version might be present by default on cluster.
For example in our case-
- AWS EMR had "/usr/lib/hadoop-mapreduce/kafka-clients-0.8.2.1.jar"
- But, we provided following in spark-submit classpath -
- spark-sql-kafka-0-10_2.12-2.4.4.jar
- kafka-clients-2.4.0.jar
We tried removing "kafka-clients-2.4.0.jar" from spark-submit --jars but that lead to same error. So, we were finally required to remove EMR provided Jar - "kafka-clients-0.8.2.1.jar" to fix the issue.
Comments
Post a Comment