Friday, 16 April 2021

Spark Issue: Spark Out of Memory (OOM) Errors

Out of Memory (OOM) is one of the most common types of error you will face in Spark.

1. You can get this error at the Driver level, or

2. You can get this error at the Executor level.


1. Driver 

Collect Operation

If we have a large dataset and we try to collect it onto the Driver, the Driver will run out of memory and you will see a Driver OOM (i.e. Out Of Memory) error.
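A minimal sketch of the safer pattern, in Scala (the dataset name and paths here are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("CollectDemo").getOrCreate()
val events = spark.read.parquet("/data/events")   // assume this is a large dataset

// Risky: collect() pulls every row into the Driver JVM and can cause Driver OOM
// val allRows = events.collect()

// Safer: bring back only a small preview, or write results out so data stays on the executors
val preview = events.take(20)
events.write.mode("overwrite").parquet("/data/events_out")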

Broadcast Join

If we try to broadcast a large table in a broadcast join, the whole table has to fit on the Driver and on every Executor, so again you will see OOM.
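To avoid this, broadcast only the small side of the join and keep the auto-broadcast threshold low. A hedged sketch in Scala (the table names and paths are made up for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("BroadcastDemo").getOrCreate()

val orders    = spark.read.parquet("/data/orders")     // large fact table
val countries = spark.read.parquet("/data/countries")  // small lookup table

// Good: broadcast the small table; it is copied in full to the Driver and every Executor
val joined = orders.join(broadcast(countries), Seq("country_code"))

// Spark also auto-broadcasts tables below this threshold (default 10 MB);
// keeping it small prevents a large table from being broadcast by accident
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)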

2. Executor 

High Concurrency 

If we assign too many cores to a single executor, many tasks run in parallel inside it and each task (partition) gets a smaller share of the executor memory. If those partitions are large, the tasks run out of memory and we get OOM.

As a benchmark, use 3 to 5 cores per executor, not more than that; see the config sketch below.
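These settings are normally passed to spark-submit, but as a rough sketch they can also be set when building the session (the values below are illustrative, not a universal recipe):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ExecutorSizingDemo")
  .config("spark.executor.cores", "5")     // at most 5 concurrent tasks per executor
  .config("spark.executor.memory", "10g")  // roughly 2 GB of heap per concurrent task
  .getOrCreate()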

Large Partitions Cause OOM

If you have a very large partition (for example, due to data skew or a huge unsplittable file), the executor that tries to process it can easily fail with OOM.

As a benchmark, keep individual partitions to a moderate size (roughly 100 to 200 MB is a common target) by repartitioning or adjusting the input split size.
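As a rough sketch in Scala, you can either cap the input split size or repartition before a heavy stage (the dataset name, paths, and partition count here are just examples):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("PartitionSizingDemo").getOrCreate()

// Cap how much data a single file-based input partition reads (default is 128 MB)
spark.conf.set("spark.sql.files.maxPartitionBytes", 128 * 1024 * 1024)

val events = spark.read.parquet("/data/events")

// Split oversized partitions into more, smaller ones before an expensive operation
val rebalanced = events.repartition(400)
rebalanced.write.mode("overwrite").parquet("/data/events_out")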

Yarn Memory Overhead OOM

An Executor container consists of YARN memory overhead + Executor memory.

Normally, YARN memory overhead (off-heap memory) is 10% of Executor memory, with a minimum of 384 MB.

This YARN memory overhead is used for off-heap allocations such as Spark internal objects and native buffers. Sometimes these need more space than the default provides, and the container fails with an OOM (YARN kills it for exceeding its memory limit).

If you get this error, try increasing the YARN memory overhead parameter (spark.executor.memoryOverhead).
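A minimal sketch of how that parameter can be raised (the sizes here are illustrative; spark.executor.memoryOverhead was called spark.yarn.executor.memoryOverhead in older Spark versions):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("MemoryOverheadDemo")
  .config("spark.executor.memory", "8g")
  .config("spark.executor.memoryOverhead", "2g")  // extra off-heap headroom beyond the 10% default
  .getOrCreate()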


I hope you like this blog. 

Please subscribe and follow this blog for more information on the Big Data Journey.