
Posts

Set the following properties to access S3, S3a, S3n

  "fs.s3.awsAccessKeyId", access_key "fs.s3n.awsAccessKeyId", access_key "fs.s3a.access.key", access_key "fs.s3.awsSecretAccessKey", secret_key "fs.s3n.awsSecretAccessKey", secret_key "fs.s3a.secret.key", secret_key "fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem" "fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem" "fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem" If one needs to copy data from one S3 Bucket to other with different credential keys. Then -  If you are on Hadoop cluster with version 2.7, and using s3a:// then -  use URI as following - s3a://DataAccountKey:DataAccountSecretKey/DataAccount/path If you are on EMR or Hadoop 2.8+ then one can add properties per-bucket, as following -  fs.s3a.bucket.DataAccount.access.key DataAccountKey fs.s3a.bucket.DataAccount.secret.key DataAccountSecretKey fs.s3.bucket.DataAccount.awsAccessKeyId Da

Debugging Kafka connectivity with remote applications: Spring Boot, Spark, Console Consumer, OpenSSL

  Our downstream partners wanted to consume data from a Kafka topic. They opened network and firewall ports to the respective ZooKeeper and broker servers, but the Spring Boot application and the Console Consumer still failed to consume messages from the Kafka topic. Refer to the log stack trace below:

  [2024-01-10 13:33:34,759] DEBUG [Consumer clientId=consumer-o2_prism_group-1, groupId=o2_prism_group] Node -1 disconnected. (org.apache.kafka.clients.NetworkClient)
  [2024-01-10 13:33:34,762] WARN [Consumer clientId=consumer-o2_prism_group-1, groupId=o2_prism_group] Bootstrap broker ncxxx001.h.c.com:9093 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
  [2024-01-10 13:33:34,860] DEBUG [Consumer clientId=consumer-o2_prism_group-1, groupId=o2_prism_group] Initialize connection to node ncxxx001.h.c.com:9093 (id: -1 rack: null) for sending metadata request (org.apache.kafka.clients.NetworkClient)
  [2024-01-10 13:33:34,861] DEBUG [Consumer clientId=consumer-o2_prism…
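A frequent cause of this "Node -1 disconnected" pattern is a security-protocol mismatch, e.g. the broker listener on port 9093 expects SSL while the client connects over PLAINTEXT. A minimal sketch of an SSL-enabled consumer using standard kafka-clients properties (the broker host and group id mirror the log above; the topic name, truststore path, and password are placeholders):

  import java.time.Duration
  import java.util.{Collections, Properties}
  import org.apache.kafka.clients.consumer.KafkaConsumer

  val props = new Properties()
  props.put("bootstrap.servers", "ncxxx001.h.c.com:9093")
  props.put("group.id", "o2_prism_group")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  // Match the broker listener's security protocol; without this the client
  // speaks PLAINTEXT and the broker drops the connection as in the log above
  props.put("security.protocol", "SSL")
  props.put("ssl.truststore.location", "/path/to/client.truststore.jks")  // placeholder path
  props.put("ssl.truststore.password", "changeit")                        // placeholder password

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList("my_topic"))  // placeholder topic
  consumer.poll(Duration.ofSeconds(5)).forEach(r => println(s"${r.key} -> ${r.value}"))
  consumer.close()

Running openssl s_client -connect against the broker host and port is a quick way to confirm whether the listener actually serves TLS before touching the client config.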

Improve API Performance

 

Spark: Decimal columns are shown in scientific notation instead of numbers

  This issue relates to SPARK-25177: when a DataFrame decimal-type column has a scale higher than 6, zero values are shown in scientific notation (e.g. 0E-8).

  Solution: use the built-in format_number function to convert the scientific notation into a String, as shown below.
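A minimal sketch (the column name "amount" and scale 8 are illustrative):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{col, format_number}
  import org.apache.spark.sql.types.DecimalType

  val spark = SparkSession.builder().appName("DecimalFormat").master("local[*]").getOrCreate()
  import spark.implicits._

  // A decimal column with scale 8: the zero row displays as 0E-8 by default
  val df = Seq("0", "12345.678").toDF("raw")
    .select(col("raw").cast(DecimalType(20, 8)).as("amount"))

  // format_number returns a String with a fixed number of decimal places
  // (note it also inserts thousands separators, e.g. "12,345.67800000")
  df.select(col("amount"), format_number(col("amount"), 8).as("amount_str")).show(false)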

How indexes work in SQL