Spark MongoDB Write Error - com.mongodb.MongoBulkWriteException: Bulk write operation error on server - 'E11000 duplicate key error collection'
You may see the following exception while running Spark 2.4 with -
- mongo-spark-connector_2.11-2.4.0.jar
- mongo-java-driver-3.9.0.jar
Exception -
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 6.0 failed 4 times, most recent failure: Lost task 2.3 in stage 6.0 (TID 238, nc0020.hadoop.mycluster.com, executor 2): com.mongodb.MongoBulkWriteException: Bulk write operation error on server vondbd0008.mymachine.com:27017. Write errors: [BulkWriteError{index=0, code=11000, message='E11000 duplicate key error collection: POC1_DB.MyCollection index: _id_ dup key: { _id: "113442141" }', details={ }}].
at com.mongodb.connection.BulkWriteBatchCombiner.getError(BulkWriteBatchCombiner.java:177)
at com.mongodb.connection.BulkWriteBatchCombiner.throwOnError(BulkWriteBatchCombiner.java:206)
at com.mongodb.connection.BulkWriteBatchCombiner.getResult(BulkWriteBatchCombiner.java:147)
at com.mongodb.operation.BulkWriteBatch.getResult(BulkWriteBatch.java:227)
at com.mongodb.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:277)
at com.mongodb.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:68)
at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:201)
at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:192)
at com.mongodb.operation.OperationHelper.withReleasableConnection(OperationHelper.java:424)
at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:192)
Solution -
- As per the solution provided here - https://jira.mongodb.org/browse/SPARK-251
- This seems to be a configuration error: if forceInsert is set to true, the connector issues plain inserts even for rows that already carry an _id, so any row whose _id already exists in the collection fails with a duplicate key error.
- As per the documentation - https://www.mongodb.com/docs/spark-connector/v2.4/configuration/#std-label-spark-output-conf
forceInsert: Forces saves to use inserts, even if a Dataset contains _id. Default: false
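For reference, this is how the property can be set explicitly on a write. A minimal sketch using the connector's 2.x Scala API - the host, database, and collection are taken from the error message above and stand in for your own, and saveToMongo is a hypothetical helper:

import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch: with forceInsert=false (the default), rows that carry an _id are
// written as replace/upsert operations instead of plain inserts, so an
// existing _id does not raise E11000.
def saveToMongo(df: DataFrame): Unit = {
  df.write
    .format("com.mongodb.spark.sql.DefaultSource") // mongo-spark-connector data source
    .mode(SaveMode.Append)
    .option("uri", "mongodb://vondbd0008.mymachine.com:27017") // placeholder host
    .option("database", "POC1_DB")
    .option("collection", "MyCollection")
    .option("forceInsert", "false")
    .save()
}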
- Even though we didn't set this property (it defaults to false), we were still getting this error.
- We tried manually setting the property to false (as in the sketch above), but that didn't work for us.
- Finally, we resolved the error by downgrading the connector and driver to the versions below (see the build sketch after this list) -
- mongo-spark-connector_2.11-2.3.5.jar
- mongo-java-driver-3.12.5.jar
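If the project is built with sbt, the downgrade amounts to pinning the two coordinates in build.sbt. A sketch, assuming a Scala 2.11 build (the 2.11.12 patch version is an assumption):

scalaVersion := "2.11.12" // matches the _2.11 artifact; exact patch version assumed

libraryDependencies ++= Seq(
  "org.mongodb.spark" %% "mongo-spark-connector" % "2.3.5", // was 2.4.0
  "org.mongodb"       %  "mongo-java-driver"     % "3.12.5" // was 3.9.0
)

Alternatively, if you submit pre-built jars, pass the two downgraded jars to spark-submit via --jars in place of the 2.4.0/3.9.0 ones.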