Open file descriptors on a Kafka broker come from two sources:
- File descriptors to track the log segment files on disk.
- Additional file descriptors for the network sockets used to communicate with external parties (clients, other brokers, ZooKeeper, and Kerberos).
For #1, the estimate is:
(number of partitions) * (partition size / segment size)
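To make the arithmetic concrete, here is a minimal sketch of that estimate; all numbers in it are hypothetical, chosen only for illustration:

```scala
// Rough estimate of file descriptors needed just for log segment files.
object SegmentFdEstimate {
  def estimate(numPartitions: Long, partitionSizeGb: Double, segmentSizeGb: Double): Long =
    (numPartitions * (partitionSizeGb / segmentSizeGb)).ceil.toLong

  def main(args: Array[String]): Unit = {
    // e.g. 1,000 partitions, 50 GB per partition, 1 GB segments
    // (the default segment.bytes is 1 GiB)
    val fds = estimate(numPartitions = 1000, partitionSizeGb = 50, segmentSizeGb = 1)
    println(s"Estimated segment file descriptors: $fds") // prints 50000
  }
}
```

The real count is somewhat higher, since each segment also has accompanying index files that hold their own descriptors.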
For #2, every connection made by a producer, consumer, ZooKeeper, or Kerberos opens file descriptors on the broker. Note that each TCP connection consumes one file descriptor at each endpoint, so one per connection on the broker itself. These connections may carry internal heartbeats, security handshakes, or data transfer to and from clients (producers or consumers).
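To see where a process stands against its descriptor limit, here is a sketch using the JVM's standard MXBean. Note it reports the current JVM's counts, so it must run inside the process you care about; from a shell, counting the entries in /proc/&lt;broker-pid&gt;/fd gives the same number:

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Reads the open/max file descriptor counts of the current JVM.
object FdMonitor {
  def main(args: Array[String]): Unit = {
    ManagementFactory.getOperatingSystemMXBean match {
      case os: UnixOperatingSystemMXBean =>
        println(s"Open file descriptors: ${os.getOpenFileDescriptorCount}")
        println(s"Max file descriptors:  ${os.getMaxFileDescriptorCount}")
      case _ =>
        println("FD counts not available on this platform")
    }
  }
}
```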
Now consider a Spark Streaming application that consumes from Kafka. If it is not stable, meaning:
- The streaming window for micro batches (the batch interval) is shorter than the processing time, which creates a scheduling delay for every batch (the listener sketch after this list shows one way to detect this).
- Eventually, the backlog of active batches in the Spark program starts to grow.
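As a detection aid, here is a sketch against Spark's StreamingListener API; the 30-second threshold and the log format are arbitrary choices for illustration:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Warns when scheduling delay grows: an early sign that processing time
// exceeds the batch interval and a backlog of active batches is forming.
class DelayListener extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val schedulingDelayMs = batch.batchInfo.schedulingDelay.getOrElse(0L)
    val processingTimeMs  = batch.batchInfo.processingDelay.getOrElse(0L)
    if (schedulingDelayMs > 30000L) {
      println(s"WARN batch ${batch.batchInfo.batchTime}: " +
        s"scheduling delay ${schedulingDelayMs} ms, processing time ${processingTimeMs} ms")
    }
  }
}

// Register it on the streaming context:
//   ssc.addStreamingListener(new DelayListener)
```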
Each active batch opens connections to Kafka, and hence file descriptors, to read messages or metadata. Those descriptors remain open until the batch is processed. So if active batches keep piling up, open file descriptors pile up on the Kafka broker as well.
Thus, a single unstable Spark Streaming application can degrade overall Kafka broker performance, and with it the entire cluster, which may be shared by other teams and applications.
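One common mitigation is to bound the ingest rate so batches cannot pile up in the first place. A configuration sketch follows; the application name, batch interval, and rate values are illustrative only:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Caps the ingest rate so processing time stays within the batch interval,
// keeping the active-batch backlog (and broker-side FDs) bounded.
val conf = new SparkConf()
  .setAppName("kafka-stream")
  .set("spark.streaming.backpressure.enabled", "true")      // adapt rate to processing speed
  .set("spark.streaming.kafka.maxRatePerPartition", "1000") // hard cap: records/sec/partition

val ssc = new StreamingContext(conf, Seconds(10)) // 10s batch interval (illustrative)
```

With backpressure enabled, Spark adjusts the receiving rate based on recent batch processing times; the per-partition cap acts as a safety net during traffic spikes.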