
Posts

Showing posts from February, 2023

Microsoft Azure - Get Bearer token and access KeyVault secret using CURL command

First, you must have the following in order to call https://login.microsoftonline.com and get a Bearer token:

tenant_id / subscription_id
client_id
client_secret

Then run the following command to get a Bearer token authorizing access to the resource https://vault.azure.net:

curl -X POST \
  https://login.microsoftonline.com/{tenant_id}/oauth2/token \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/x-www-form-urlencoded' \
  -d 'grant_type=client_credentials&client_id={client_id}&client_secret={client_secret}&resource=https://vault.azure.net'

Note: replace each {*} placeholder with the appropriate value. This returns a JSON response like the one below:

{"token_type":"Bearer","expires_in":"3599","ext_expires_in":"3599","expires_on":"1677278006","not_before":"1677274106","resource":"https://vault.azure.net","access_token":"eyJ0...
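With the access_token in hand, the secret itself can be read over the Key Vault REST API. A minimal sketch of the follow-up call, assuming placeholder vault and secret names and the 7.4 api-version (none of these appear in the original post):

# Hypothetical follow-up call: {vault_name}, {secret_name} and api-version are assumptions
curl -X GET \
  'https://{vault_name}.vault.azure.net/secrets/{secret_name}?api-version=7.4' \
  -H 'Authorization: Bearer {access_token}'

The secret is returned in the "value" field of the JSON response.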

Azure HDInsight - Sudden Spark Job Failure & Exit - ERROR sender.RawSocketSender: org.fluentd.logger.sender.RawSocketSender

We observed that a long-running Spark job on Azure HDInsight suddenly exited without any error. However, the logs showed the following:

22/07/13 05:38:32 ERROR RawSocketSender [MdsLoggerSenderThread]: Log data 53245216 larger than remaining buffer size 10485760
22/07/13 05:59:54 ERROR sender.RawSocketSender: org.fluentd.logger.sender.RawSocketSender
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSock...
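RawSocketSender belongs to the fluent-logger-java library, so "Connection refused" means the logger could not reach the fluentd collector it ships logs to. A quick diagnostic sketch, assuming the default fluentd forward port 24224 (the actual endpoint on HDInsight may differ):

# Check whether anything is listening on the assumed fluentd forward port
netstat -tlnp | grep 24224
# Or probe the port directly
nc -zv localhost 24224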

Spark - java.util.NoSuchElementException: next on empty iterator [SPARK-27514]

Recently, we upgraded from HDP 3 to CDP 7, which involved upgrading Spark from 2.3 to 2.4. We compiled and built our jar with the new dependencies, but the code started failing with the below error:

23/02/09 16:47:44 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.NoSuchElementException: next on empty iterator
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
        at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
        at scala.collection.IterableLike$class.head(IterableLike.scala:107)
        at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48)
        at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimiz...
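The trace shows IterableLike.head being invoked on an empty ArrayBuffer. The excerpt cuts off before the post's actual fix, but as a general illustration (not the author's code): .head throws exactly this exception on an empty collection, while .headOption degrades gracefully:

import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer.empty[Int]

// buf.head would throw java.util.NoSuchElementException: next on empty iterator

// Safe alternative: headOption returns None for an empty collection
val first: Option[Int] = buf.headOption
println(first.getOrElse(-1)) // -1 is an arbitrary placeholder default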

SSH “timed out waiting for input: auto-logout”

It means that if your SSH session has no activity for the configured period, the session is disconnected. The timeout value can be checked by echoing $TMOUT:

~]$ echo $TMOUT
900

For Linux Bash, the environment variable TMOUT is usually set either at the user level (.bashrc or .bash_profile) or at the system level (/etc/profile) to implement this security measure.
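A sketch of adjusting the value, assuming your system profile has not declared TMOUT readonly (hardened systems sometimes do, in which case the export fails):

# In ~/.bashrc or ~/.bash_profile: log out idle shells after 30 minutes
export TMOUT=1800

# Setting it to 0 disables the auto-logout entirely
export TMOUT=0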

Spark-HBase - java.lang.NullPointerException

Using hbase-connectors' hbase-spark.jar results in the following exception:

23/02/07 12:29:58 Driver-Driverid ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.hbase.spark.HBaseRelation.<init>(DefaultSource.scala:138)
        at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:78)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)

To resolve this error, set hbase.spark.use.hbasecontext to false. Example:

personDS.write.format("org.apache.hadoop.hbase.spark").option("hbase.colum...
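The post's example is truncated; a fuller sketch of the same write pattern follows, in which the table name and column mapping are illustrative assumptions (only the format string and the hbase.spark.use.hbasecontext option come from the post):

// Hypothetical Dataset write via the hbase-spark connector
personDS.write
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping",
    "id STRING :key, name STRING cf:name") // assumed column mapping
  .option("hbase.table", "person")         // assumed table name
  .option("hbase.spark.use.hbasecontext", "false")
  .save()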

Spark: handle column nullability - The 0th field 'colA' of input row cannot be null

When you create a Spark DataFrame, one or more columns can have schema nullable = false, meaning those columns cannot hold null values. When a null value is assigned to such a column, we see the following exception:

2/7/2023 3:16:00 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 6)
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: The 0th field 'colA' of input row cannot be null.

So, to avoid the above error, we need to update the schema of the DataFrame to set nullable = true. One way to do that is with a when/otherwise clause like below:

.withColumn("col_name", when(col("col_name").isNotNull, col("col_name")).otherwise(lit(null)))

This tells Spark that the column can be null (in case it ever is). Another way to do it is to create a custom method to be ...
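A self-contained sketch of the when/otherwise trick; the DataFrame and column names here are placeholders for illustration, not from the original post:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

val spark = SparkSession.builder().appName("nullable-demo").master("local[*]").getOrCreate()
import spark.implicits._

// "id" starts out nullable = false because it comes from a non-nullable Scala Int
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Rebuilding the column through when/otherwise makes Spark mark it nullable = true,
// since the otherwise(lit(null)) branch can produce null
val relaxed = df.withColumn("id",
  when(col("id").isNotNull, col("id")).otherwise(lit(null)))

relaxed.printSchema() // id: integer (nullable = true)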