
Posts

Showing posts from February, 2023

Microsoft Azure - Get Bearer token and access KeyVault secret using CURL command

First, you must have the following in order to call https://login.microsoftonline.com and obtain a Bearer token:

  - tenant_id / subscription_id
  - client_id
  - client_secret

Then run the following command to get a Bearer token authorizing access to the resource https://vault.azure.net:

  curl -X POST \
    https://login.microsoftonline.com/{tenant_id}/oauth2/token \
    -H 'cache-control: no-cache' \
    -H 'content-type: application/x-www-form-urlencoded' \
    -d 'grant_type=client_credentials&client_id={client_id}&client_secret={client_secret}&resource=https://vault.azure.net'

Note: replace each {*} with the appropriate value. This returns a JSON response like the one below:

  {"token_type":"Bearer","expires_in":"3599","ext_expires_in":"3599","expires_on":"1677278006","not_before":"1677274106","resource":"https://vault.azure.net", "access_token":"eyJ0
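With the access_token in hand, the secret itself can be read with a second curl call against the Key Vault REST API. A minimal sketch, assuming a vault named {vault_name}, a secret named {secret_name}, and api-version 7.3 (adjust to whatever version your vault supports); {access_token} is the token returned by the call above:

  curl -X GET \
    'https://{vault_name}.vault.azure.net/secrets/{secret_name}?api-version=7.3' \
    -H 'Authorization: Bearer {access_token}'

The secret value is returned in the "value" field of the JSON response.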

Azure HDInsights - Sudden Spark Job Failure & Exit - ERROR sender.RawSocketSender: org.fluentd.logger.sender.RawSocketSender

We observed that a Spark job suddenly exited without any error while running for a long time on Azure HDInsights. However, we did see the following errors:

  22/07/13 05:38:32 ERROR RawSocketSender [MdsLoggerSenderThread]: Log data 53245216 larger than remaining buffer size 10485760
  22/07/13 05:59:54 ERROR sender.RawSocketSender: org.fluentd.logger.sender.RawSocketSender
  java.net.ConnectException: Connection refused (Connection refused)
          at java.net.PlainSocketImpl.socketConnect(Native Method)
          at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
          at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
          at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
          at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
          at java.net.Socket.connect(Socket.java:607)
          at org.fluentd.logger.sender.RawSocketSender.connect(RawSocketSender.java:85)
          at org.flu

Spark - java.util.NoSuchElementException: next on empty iterator [SPARK-27514]

Recently, we upgraded from HDP 3 to CDP 7, which involved upgrading Spark from 2.3 to 2.4. We compiled and built our Jar with the new dependencies, but the code started failing with the error below:

  23/02/09 16:47:44 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.NoSuchElementException: next on empty iterator
          at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
          at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
          at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
          at scala.collection.IterableLike$class.head(IterableLike.scala:107)
          at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48)
          at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
          at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:48)
          at org.apache.spark.sql.catalyst
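The trace bottoms out in head being called on an empty ArrayBuffer inside Catalyst. As a minimal, standalone illustration of the underlying Scala behaviour (not the Spark code path itself), head on an empty indexed collection surfaces as this exact "next on empty iterator" message in the Scala 2.11/2.12 versions bundled with Spark 2.x:

  import scala.collection.mutable.ArrayBuffer

  object EmptyHeadDemo extends App {
    val buf = ArrayBuffer.empty[Int]
    // IterableLike.head is implemented as iterator.next(), so on an empty
    // buffer it fails with "next on empty iterator" rather than a clearer message
    val first = buf.head   // java.util.NoSuchElementException: next on empty iterator
    println(first)
  }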

SSH “timed out waiting for input: auto-logout”

It means that if your SSH session has no activity for the configured period, the session is disconnected. The timeout value can be checked by echoing $TMOUT:

  ~]$ echo $TMOUT
  900

For Linux Bash, the environment variable TMOUT is usually set either at the user level (.bashrc or .bash_profile) or at the system level (/etc/profile) to implement this security measure.
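As a sketch, to enforce (for example) the 15-minute idle timeout shown above system-wide, the variable can be set and exported in /etc/profile; the readonly line is optional and prevents users from overriding it:

  # /etc/profile (or ~/.bash_profile for a single user)
  TMOUT=900        # seconds of inactivity before Bash logs the session out
  readonly TMOUT   # optional: stop users from unsetting or changing the value
  export TMOUT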

Spark-HBase- java.lang.NullPointerException

Using hbase_connectors' hbase-spark.jar results in the following exception:

  23/02/07 12:29:58 Driver-Driverid ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
  java.lang.NullPointerException
  at org.apache.hadoop.hbase.spark.HBaseRelation.<init>(DefaultSource.scala:138)
  at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:78)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)

To resolve this error, set hbase.spark.use.hbasecontext to false. Example:

  personDS
    .write
    .format("org.apache.hadoop.hbase.spark")
    .option("hbase.columns.m
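For completeness, a hedged sketch of what the full write call can look like with the option applied; the Dataset name, table name, and column mapping are illustrative, and the hbase.columns.mapping / hbase.table keys are the connector options as I understand them, so verify against the hbase-connectors version you ship:

  // Assumes a Dataset named personDS with columns id, name, age (illustrative)
  personDS
    .write
    .format("org.apache.hadoop.hbase.spark")
    .option("hbase.columns.mapping",
      "id STRING :key, name STRING cf:name, age STRING cf:age") // row key + column-family mapping
    .option("hbase.table", "person")                            // target HBase table
    .option("hbase.spark.use.hbasecontext", false)              // avoid the NPE when no HBaseContext has been created
    .save()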

Spark: handle column nullability- The 0th field 'colA' of input row cannot be null

When you create a Spark DataFrame, one or more columns can have schema nullable = false, which means those column(s) cannot hold null values. When a null value is assigned to such a column, we see the following exception:

  2/7/2023 3:16:00 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 6)
  java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: The 0th field 'colA' of input row cannot be null.

So, to avoid the above error, we need to update the schema of the DataFrame to set nullable = true. One way to do that is with a when/otherwise clause like the one below:

  .withColumn("col_name", when(col("col_name").isNotNull, col("col_name")).otherwise(lit(null)))

This tells Spark that the column can be null (in case it ever is). The other way is to create a custom method, called on a DataFrame, that returns a new DataFrame with the modified schema.

  import org.apache.spark
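A minimal sketch of such a helper, assuming the common schema-rebuild idiom (the name SchemaUtils.setNullable and its signature are illustrative, not from the original post): it copies the schema with the chosen columns marked nullable and re-creates the DataFrame from the same underlying rows.

  import org.apache.spark.sql.DataFrame
  import org.apache.spark.sql.types.{StructField, StructType}

  object SchemaUtils {
    // Returns a new DataFrame whose listed columns are marked nullable = true
    def setNullable(df: DataFrame, columns: Set[String]): DataFrame = {
      val newSchema = StructType(df.schema.map {
        case StructField(name, dataType, _, metadata) if columns.contains(name) =>
          StructField(name, dataType, nullable = true, metadata)
        case other => other
      })
      // Recreate the DataFrame with the relaxed schema; the data itself is unchanged
      df.sparkSession.createDataFrame(df.rdd, newSchema)
    }
  }

  // Usage (hypothetical DataFrame df with a non-nullable column "colA"):
  // val relaxed = SchemaUtils.setNullable(df, Set("colA"))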