We had a Spark job that was taking over 3 hours to complete.
- First, we identified the stage that was consuming most of the runtime.
- This stage simply reads a file and maps the data to its final save location, so there are no heavy joins or calculations involved (see the job sketch after this list).
- Second, we checked the event timeline for any delay due to serialization, shuffling, or scheduling, and found none.
- Third, to rule out data skew, we sorted the tasks by duration, input size, and shuffle volume to inspect the maximums, and found nothing unusual (a programmatic skew check is also sketched below).
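For illustration, here is a minimal sketch of what such a read-and-map job might look like. The paths, column names, and the transformation itself are hypothetical stand-ins, not the actual job from this post:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SimpleMapJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SimpleMapJob")
      .getOrCreate()

    // Read the source file (hypothetical path and format)
    val df = spark.read.parquet("hdfs:///data/input/events")

    // A light per-row transformation: no joins or aggregations
    val mapped = df.withColumn("event_date", to_date(col("event_ts")))

    // Write to the final save location (hypothetical path)
    mapped.write.mode("overwrite").parquet("hdfs:///data/output/events")

    spark.stop()
  }
}
```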
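The skew check above was done by sorting tasks in the Spark UI. As a complementary, programmatic check, one can count records per partition; roughly equal counts suggest there is no skew. A minimal sketch, assuming `df` is the DataFrame read by the job above:

```scala
// Count records per partition; wildly uneven counts would indicate skew.
val partitionCounts = df.rdd
  .mapPartitionsWithIndex { case (idx, rows) => Iterator((idx, rows.size)) }
  .collect()

// Print the ten largest partitions for a quick eyeball check
partitionCounts.sortBy { case (_, n) => -n }
  .take(10)
  .foreach { case (idx, n) => println(s"partition $idx: $n rows") }
```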
So conclusively, we could say there was no problem with the job itself. It was slow simply because it had too few executors, and therefore too few vcores for task processing. Thus, we bumped the number of executors up from 5 to 30, with each executor assigned 3 vcores, giving the job a total of 30 * 3 = 90 vcores.
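A sketch of how this bump can be expressed through standard Spark configuration properties; `spark.executor.instances` and `spark.executor.cores` are real Spark settings, while the app name is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Request 30 executors with 3 vcores each (30 * 3 = 90 vcores total).
// Values mirror the tuning described above.
val spark = SparkSession.builder()
  .appName("SimpleMapJob")
  .config("spark.executor.instances", "30")  // was 5
  .config("spark.executor.cores", "3")
  .getOrCreate()
```

On YARN, the same settings map to the spark-submit flags `--num-executors 30` and `--executor-cores 3`.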
Just by increasing the number of executors, the job that had earlier taken 3.5 hours completed in 40 minutes.