
Posts

Showing posts from February, 2023

Microsoft Azure - Get Bearer token and access KeyVault secret using CURL command

First, you must have the following in order to call https://login.microsoftonline.com and get a Bearer token:

tenant_id / subscription_id
client_id
client_secret

Then run the following command to get a Bearer token authorizing access to the resource https://vault.azure.net:

curl -X POST \
  https://login.microsoftonline.com/{tenant_id}/oauth2/token \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/x-www-form-urlencoded' \
  -d 'grant_type=client_credentials&client_id={client_id}&client_secret={client_secret}&resource=https://vault.azure.net'

Note: replace each {*} placeholder with the appropriate value. This returns a JSON response like the one below:

{"token_type":"Bearer","expires_in":"3599","ext_expires_in":"3599","expires_on":"1677278006","not_before":"1677274106","resource":"https://vault.azure.net","access_token":"eyJ0...
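With the access_token in hand, the secret itself can be read over the Key Vault REST API. A minimal sketch of the follow-up call, assuming placeholder vault and secret names and the 7.4 api-version (none of these appear in the original post):

# Hypothetical follow-up call: {vault_name}, {secret_name} and api-version are assumptions
curl -X GET \
  'https://{vault_name}.vault.azure.net/secrets/{secret_name}?api-version=7.4' \
  -H 'Authorization: Bearer {access_token}'

The secret is returned in the "value" field of the JSON response.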

Azure HDInsight - Sudden Spark Job Failure & Exit - ERROR sender.RawSocketSender: org.fluentd.logger.sender.RawSocketSender

We observed that a long-running Spark job on Azure HDInsight suddenly exited without any error. However, the logs showed the following:

22/07/13 05:38:32 ERROR RawSocketSender [MdsLoggerSenderThread]: Log data 53245216 larger than remaining buffer size 10485760
22/07/13 05:59:54 ERROR sender.RawSocketSender: org.fluentd.logger.sender.RawSocketSender
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSock...
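RawSocketSender belongs to the fluent-logger-java library, so "Connection refused" means the logger could not reach the fluentd collector it ships logs to. A quick diagnostic sketch, assuming the default fluentd forward port 24224 (the actual endpoint on HDInsight may differ):

# Check whether anything is listening on the assumed fluentd forward port
netstat -tlnp | grep 24224
# Or probe the port directly
nc -zv localhost 24224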

Spark - java.util.NoSuchElementException: next on empty iterator [SPARK-27514]

Recently, we upgraded from HDP 3 to CDP 7, which involved upgrading Spark from 2.3 to 2.4. We compiled and built our jar with the new dependencies, but the code started failing with the below error:

23/02/09 16:47:44 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.NoSuchElementException: next on empty iterator
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
        at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
        at scala.collection.IterableLike$class.head(IterableLike.scala:107)
        at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48)
        at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimiz...
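The trace shows IterableLike.head being invoked on an empty ArrayBuffer. The excerpt cuts off before the post's actual fix, but as a general illustration (not the author's code): .head throws exactly this exception on an empty collection, while .headOption degrades gracefully:

import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer.empty[Int]

// buf.head would throw java.util.NoSuchElementException: next on empty iterator

// Safe alternative: headOption returns None for an empty collection
val first: Option[Int] = buf.headOption
println(first.getOrElse(-1)) // -1 is an arbitrary placeholder default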

SSH “timed out waiting for input: auto-logout”

It means that if your SSH session has no activity for the configured period, the session is disconnected. The timeout value can be checked by echoing $TMOUT:

~]$ echo $TMOUT
900

For Linux Bash, the environment variable TMOUT is usually set either at the user level (.bashrc or .bash_profile) or at the system level (/etc/profile) to implement this security measure.
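A sketch of adjusting the value, assuming your system profile has not declared TMOUT readonly (hardened systems sometimes do, in which case the export fails):

# In ~/.bashrc or ~/.bash_profile: log out idle shells after 30 minutes
export TMOUT=1800

# Setting it to 0 disables the auto-logout entirely
export TMOUT=0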

Spark-HBase - java.lang.NullPointerException

Using hbase-connectors' hbase-spark.jar results in the following exception:

23/02/07 12:29:58 Driver-Driverid ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.hbase.spark.HBaseRelation.<init>(DefaultSource.scala:138)
        at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:78)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)

To resolve this error, set hbase.spark.use.hbasecontext to false. Example:

personDS.write.format("org.apache.hadoop.hbase.spark").option("hbase.colum...
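The post's example is truncated; a fuller sketch of the same write pattern follows, in which the table name and column mapping are illustrative assumptions (only the format string and the hbase.spark.use.hbasecontext option come from the post):

// Hypothetical Dataset write via the hbase-spark connector
personDS.write
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping",
    "id STRING :key, name STRING cf:name") // assumed column mapping
  .option("hbase.table", "person")         // assumed table name
  .option("hbase.spark.use.hbasecontext", "false")
  .save()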

Spark: handle column nullability - The 0th field 'colA' of input row cannot be null

When you create a Spark DataFrame, one or more columns can have schema nullable = false, meaning those columns cannot hold null values. When a null value is assigned to such a column, we see the following exception:

2/7/2023 3:16:00 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 6)
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: The 0th field 'colA' of input row cannot be null.

So, to avoid the above error, we need to update the schema of the DataFrame to set nullable = true. One way to do that is with a when/otherwise clause like below:

.withColumn("col_name", when(col("col_name").isNotNull, col("col_name")).otherwise(lit(null)))

This tells Spark that the column can be null (in case it ever is). Another way to do it is to create a custom method to be ...
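A self-contained sketch of the when/otherwise trick; the DataFrame and column names here are placeholders for illustration, not from the original post:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

val spark = SparkSession.builder().appName("nullable-demo").master("local[*]").getOrCreate()
import spark.implicits._

// "id" starts out nullable = false because it comes from a non-nullable Scala Int
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Rebuilding the column through when/otherwise makes Spark mark it nullable = true,
// since the otherwise(lit(null)) branch can produce null
val relaxed = df.withColumn("id",
  when(col("id").isNotNull, col("id")).otherwise(lit(null)))

relaxed.printSchema() // id: integer (nullable = true)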