Posts

Spark Streaming Kafka Errors

  We observed a couple of errors / info messages while running Spark Streaming applications, which might come in handy for debugging in the future. Refer below -

1) We noticed that the Spark Streaming job was running but consuming nothing. We could see the following messages being printed repeatedly in the logs -

    22/11/18 08:13:53 INFO AbstractCoordinator: [Consumer clientId=consumer-1, groupId=difnrt-uat-001] Group coordinator mymachine.com:9093 (id: 601150796 rack: null) is unavailable or invalid, will attempt rediscovery
    22/11/18 08:13:53 INFO AbstractCoordinator: [Consumer clientId=consumer-1, groupId=difnrt-uat-001] Discovered group coordinator mymachine.com:9093 (id: 601150796 rack: null)
    22/11/18 08:13:53 INFO AbstractCoordinator: [Consumer clientId=consumer-1, groupId=difnrt-uat-001] (Re-)joining group

Possible Causes / Fixes -

This may be due to the Kafka coordinator service; restarting Kafka may fix the issue. In our case, we couldn't get help from the Admin Team, so we changed "group.id" …
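For reference, a minimal sketch of what changing the consumer group looks like in the Kafka parameters passed to Spark Streaming. The broker address and old group id are taken from the logs above; "difnrt-uat-002" is a hypothetical replacement value, not the post's actual choice:

    import org.apache.kafka.common.serialization.StringDeserializer

    // Consumer configuration for the direct Kafka stream; only "group.id"
    // is the relevant change, the rest mirrors the values in the logs above.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "mymachine.com:9093",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "difnrt-uat-002"  // hypothetical new id, was difnrt-uat-001
    )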

Spark Streaming - org.apache.kafka.clients.consumer.OffsetOutOfRangeException

  While running a Spark Streaming application, we observed the following exception -

    Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 410, mymachine.com, executor 2): org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {mygrouidid=233318826}
        at org.apache.kafka.clients.consumer.internals.Fetcher.initializeCompletedFetch(Fetcher.java:1260)
        at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:607)
        at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1313)
        at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1240)
        at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1168)
        at org.apache.spark.streaming.kafka010.InternalKafkaConsumer.poll(KafkaDataConsumer.scala:200)
        at org.apache.spark.streaming.ka…
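The exception reports "no configured reset policy", i.e. the consumer's stored offsets no longer exist on the broker (typically because retention expired them) and it has no instruction on where to restart. A common mitigation, shown below as an assumption rather than the post's confirmed fix, is to set auto.offset.reset in the consumer parameters:

    // Setting a reset policy lets the consumer recover when its committed
    // offsets have fallen off the log instead of throwing OffsetOutOfRangeException.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "mymachine.com:9093",
      "group.id"          -> "difnrt-uat-001",
      "auto.offset.reset" -> "earliest"  // or "latest" to skip to the newest records
    )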

PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

  We received the following error while creating a build using Maven -

    PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

We first tried resolving it with Solution 1 (as below), but after many tries we ended up going with Solution 2, which is not recommended.

Solution 1: Open the website in your browser and click on the lock icon to export the site's certificate to a file on your local machine. Then execute the command below to import the certificate into your keystore -

    keytool -import -noprompt -trustcacerts -alias ALIASNAME -file /PATH/TO/YOUR/DESKTOP/CertificateName.cer -keystore /PATH/TO/YOUR/JDK/jre/lib/security/cacerts -storepass changeit

You can also generate your own keystore using the following command -

    keytool -keystore clientkeystore -genkey -alias client

And pass it to Maven as below -

    mvn package -Djavax.net.ssl.trustStore=clientkeystore

But clearly, Solution 1 didn't work for us. So, here…
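For context on what the -Djavax.net.ssl.trustStore flag does, here is a minimal sketch of supplying the same truststore programmatically from JVM code; the paths, password, and URL are illustrative assumptions:

    object TrustStoreDemo {
      def main(args: Array[String]): Unit = {
        // Equivalent to passing -Djavax.net.ssl.trustStore on the command line
        System.setProperty("javax.net.ssl.trustStore", "/PATH/TO/clientkeystore")
        System.setProperty("javax.net.ssl.trustStorePassword", "changeit")

        // HTTPS connections opened after this point validate server
        // certificates against the configured truststore.
        val conn = new java.net.URL("https://mymachine.com").openConnection()
        conn.connect()
      }
    }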

Cryptography: understanding AES and RSA

  Cryptography, or cryptology, is the practice and study of techniques for secure communication in the presence of adversarial behavior. More generally, cryptography is about constructing and analyzing protocols that prevent third parties or the public from reading private messages; it secures the information being transmitted in point-to-point communication. At the lowest level, this is achieved by applying mathematical algorithms to the data. Cryptography prior to the modern age was effectively synonymous with encryption, converting readable information (plaintext) to unintelligible nonsense text (ciphertext), which can only be read by reversing the process (decryption). The sender of an encrypted (coded) message shares the decryption (decoding) technique only with intended recipients to preclude access by adversaries. Modern cryptography is heavily based on mathematical theory and computer science practice; cryptographic algorithms are designed around computational hardness assumptions, …
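To make the plaintext/ciphertext round trip concrete, here is a minimal sketch of symmetric (AES) encryption and decryption using the JDK's built-in javax.crypto API; the key size, cipher mode, and message are illustrative choices, not taken from the post:

    import javax.crypto.{Cipher, KeyGenerator}
    import javax.crypto.spec.IvParameterSpec
    import java.security.SecureRandom
    import java.util.Base64

    object AesDemo {
      def main(args: Array[String]): Unit = {
        // Generate a 128-bit AES key; both parties must share this key
        val keyGen = KeyGenerator.getInstance("AES")
        keyGen.init(128)
        val key = keyGen.generateKey()

        // Random initialization vector for CBC mode
        val iv = new Array[Byte](16)
        new SecureRandom().nextBytes(iv)
        val ivSpec = new IvParameterSpec(iv)

        val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")

        // Encrypt: plaintext -> ciphertext
        cipher.init(Cipher.ENCRYPT_MODE, key, ivSpec)
        val ciphertext = cipher.doFinal("my secret message".getBytes("UTF-8"))
        println(Base64.getEncoder.encodeToString(ciphertext))

        // Decrypt: ciphertext -> plaintext (requires the same key and IV)
        cipher.init(Cipher.DECRYPT_MODE, key, ivSpec)
        println(new String(cipher.doFinal(ciphertext), "UTF-8"))
      }
    }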

Spark - Creating Sub Folder while writing to Partitioned Hive Table

  We had been writing to a partitioned Hive table and realized that the data was being written into an extra sub-folder. For example, refer to the table definition below -

    Create table T1 (
      name string,
      address string)
    Partitioned by (process_date string)
    stored as parquet
    location '/mytable/a/b/c/org=employee';

While writing to the table, the HDFS path being written looked something like this -

    /mytable/a/b/c/org=employee/process_date=20220812/org=employee

The unnecessary addition of org=employee after the process_date partition occurs because the table location contains an "=" character, which Hive uses as syntax to determine partition columns. Re-defining the table with a location that contains no "=" resolves the problem -

    Create table T1 (
      name string,
      address string)
    Partitioned by (process_date string)
    stored as parquet
    location '/mytable/a/b/c/employee';
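For context, a minimal sketch of the kind of Spark write involved; the source table staging_table and the session settings are assumptions for illustration, not taken from the post:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("PartitionedHiveWrite")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partition inserts into the process_date partitions
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    // The DataFrame must carry the partition column (process_date) as its last column
    val df = spark.sql("SELECT name, address, process_date FROM staging_table")

    // With the old location ('.../org=employee'), files landed under the
    // spurious .../process_date=20220812/org=employee path described above
    df.write.mode("append").insertInto("T1")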