
Posts

Showing posts from 2024

Deploy Code to Production

 

How to set priorities

 

Mixed Java and Scala Maven Projects

To compile code with mixed Java and Scala classes, add the following to pom.xml:

  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>4.8.1</version>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.1</version>
          <configuration>
            <source>1.8</source>
            <target>1.8</target>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>scala-compile-fi
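For context, a build like this is needed when Scala and Java sources in the same Maven module reference each other, so the Scala compiler must run before javac (the usual scala-compile-first execution that the snippet above is cut off at). A minimal, hypothetical pair of sources such a build compiles; the package and class names are illustrative only:

  // src/main/java/demo/Greeter.java (hypothetical Java class in the same module):
  //   package demo;
  //   public class Greeter { public String greet(String name) { return "Hello, " + name; } }

  // src/main/scala/demo/Main.scala - Scala code that depends on the Java class above.
  package demo

  object Main {
    def main(args: Array[String]): Unit = {
      val greeter = new Greeter()   // cross-language reference resolved by the mixed build
      println(greeter.greet("mixed Java & Scala build"))
    }
  }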

Spark Phoenix Error org.apache.phoenix.exception.PhoenixIOException: Can't find method newStub in org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService!

Exception -

Caused by: org.apache.phoenix.exception.PhoenixIOException: Can't find method newStub in org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService!
  at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:138)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:1652)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:1462)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1913)
  at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:3074)

Solution -

This error is caused by a mismatch between the HBase and Phoenix jars on the classpath. Run spark-submit with the --verbose option to see which jars are actually picked up. Remove all HBase jars from the classpath and add only the Phoenix shaded client jar.
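The checkClientServerCompatibility call in the stack trace runs when the client first connects, so a simple read exercises exactly the failing path. Below is a minimal sketch of reading a Phoenix table from Spark through the Phoenix JDBC driver that ships inside the shaded client jar; the ZooKeeper quorum and table name are placeholders:

  // Assumes spark-shell/spark-submit was started with only the Phoenix shaded client jar
  // (no separate HBase jars) on the classpath, as described above.
  val df = spark.read
    .format("jdbc")
    .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
    .option("url", "jdbc:phoenix:zk-host1,zk-host2:2181")   // placeholder ZooKeeper quorum
    .option("dbtable", "MY_SCHEMA.MY_TABLE")                 // placeholder Phoenix table
    .load()

  df.show(10)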

Maven Build Failure java.lang.UnsatisfiedLinkError: /tmp/jna/jna.tmp

Exception -

Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.3.0:compile failed: An API incompatibility was encountered while executing net.alchim31.maven:scala-maven-plugin:4.3.0:compile: java.lang.UnsatisfiedLinkError: /tmp/jna-1459455826/jna3448139317501565807.tmp: /tmp/jna-1459455826/jna3448139317501565807.tmp: failed to map segment from shared object: Operation not permitted

Solution -

The "failed to map segment from shared object: Operation not permitted" message means JNA extracted its native library under /tmp but could not execute it, which typically happens when /tmp is mounted noexec. Set -Djna.tmpdir to a different directory, other than /tmp, that has execute permissions.

Spark 3 (Scala 2.12) integration with HBase or Phoenix

Clone the Git repo -

git clone https://github.com/dinesh028/hbase-connectors.git

As of December 2022, the hbase-connectors releases in Maven Central are only available for Scala 2.11 and cannot be used with Spark 3.x. The connector has to be compiled from source for Spark 3.x; see also HBASE-25326 "Allow hbase-connector to be used with Apache Spark 3.0".

Build as in this example (customize the HBase, Spark and Hadoop versions as needed):

mvn -Dspark.version=3.3.1 -Dscala.version=2.12.15 -Dscala.binary.version=2.12 -Dhbase.version=2.4.15 -Dhadoop-three.version=3.3.2 -DskipTests clean package

Use the jar with Spark -

spark-shell --jars ~/hbase-connectors/spark/hbase-spark/target/hbase-spark*.jar

References -

https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_HBase_Connector.md
https://kontext.tech/article/628/spark-connect-to-hbase

Similarly, build the Phoenix connector, or use the Cloudera repository to download the Spark 3 jar from https://repository.cloudera.com/service/rest/repository/bro
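Once spark-shell is started with the connector jar as above, an HBase table can be read through the connector's DataFrame source. A minimal sketch, assuming the HBase client configuration (hbase-site.xml) is available on the classpath; the table name and column mapping are placeholders:

  // Read an HBase table as a DataFrame via the hbase-spark connector built above.
  val df = spark.read
    .format("org.apache.hadoop.hbase.spark")
    .option("hbase.columns.mapping",
      "id STRING :key, name STRING cf:name, amount DOUBLE cf:amount")  // placeholder mapping
    .option("hbase.table", "test_table")                               // placeholder table
    .option("hbase.spark.use.hbasecontext", false)
    .load()

  df.show(10)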

Set the following properties to access S3, S3a, S3n

  "fs.s3.awsAccessKeyId", access_key "fs.s3n.awsAccessKeyId", access_key "fs.s3a.access.key", access_key "fs.s3.awsSecretAccessKey", secret_key "fs.s3n.awsSecretAccessKey", secret_key "fs.s3a.secret.key", secret_key "fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem" "fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem" "fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem" If one needs to copy data from one S3 Bucket to other with different credential keys. Then -  If you are on Hadoop cluster with version 2.7, and using s3a:// then -  use URI as following - s3a://DataAccountKey:DataAccountSecretKey/DataAccount/path If you are on EMR or Hadoop 2.8+ then one can add properties per-bucket, as following -  fs.s3a.bucket.DataAccount.access.key DataAccountKey fs.s3a.bucket.DataAccount.secret.key DataAccountSecretKey fs.s3.bucket.DataAccount.awsAccessKeyId Da

Debugging Kafka connectivity with remote applications, including Spring Boot, Spark, Console Consumer, OpenSSL

Our downstream partners wanted to consume data from a Kafka topic. They opened the network and firewall ports to the respective ZooKeeper and broker servers, but the Spring Boot application and the console consumer still failed to consume messages from the Kafka topic. Refer to the log stack trace below -

[2024-01-10 13:33:34,759] DEBUG [Consumer clientId=consumer-o2_prism_group-1, groupId=o2_prism_group] Node -1 disconnected. (org.apache.kafka.clients.NetworkClient)
[2024-01-10 13:33:34,762] WARN [Consumer clientId=consumer-o2_prism_group-1, groupId=o2_prism_group] Bootstrap broker ncxxx001.h.c.com:9093 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2024-01-10 13:33:34,860] DEBUG [Consumer clientId=consumer-o2_prism_group-1, groupId=o2_prism_group] Initialize connection to node ncxxx001.h.c.com:9093 (id: -1 rack: null) for sending metadata request (org.apache.kafka.clients.NetworkClient)
[2024-01-10 13:33:34,861] DEBUG [Consumer clientId=consumer-o2_prism
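The log shows the bootstrap broker on port 9093 being disconnected immediately, which in this kind of setup is usually a TLS listener dropping a client that is not configured for SSL. Below is a minimal sketch of a Scala consumer with SSL client properties, for comparison against the failing application; the broker address and group id are taken from the log above, while the topic, truststore path and password are placeholders:

  import java.time.Duration
  import java.util.{Collections, Properties}
  import org.apache.kafka.clients.consumer.KafkaConsumer
  import org.apache.kafka.common.serialization.StringDeserializer

  val props = new Properties()
  props.put("bootstrap.servers", "ncxxx001.h.c.com:9093")
  props.put("group.id", "o2_prism_group")                    // group id seen in the log
  props.put("key.deserializer", classOf[StringDeserializer].getName)
  props.put("value.deserializer", classOf[StringDeserializer].getName)
  props.put("auto.offset.reset", "earliest")

  // Port 9093 is commonly an SSL listener; without these properties the broker
  // drops the connection and the client logs "Bootstrap broker ... disconnected".
  props.put("security.protocol", "SSL")
  props.put("ssl.truststore.location", "/path/to/kafka.client.truststore.jks")  // placeholder
  props.put("ssl.truststore.password", "changeit")                              // placeholder

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList("my_topic"))  // placeholder topic
  val records = consumer.poll(Duration.ofSeconds(10))
  val it = records.iterator()
  while (it.hasNext) {
    val rec = it.next()
    println(s"${rec.partition}-${rec.offset}: ${rec.value}")
  }
  consumer.close()

On the network side, running openssl s_client -connect ncxxx001.h.c.com:9093 is the usual way to confirm that the TLS listener is reachable and to inspect the certificate chain presented by the broker.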

Improve API Performance