

Showing posts from August, 2023

Logstash: Connect to a Kerberos-Authenticated Hive Service

Normally, one can write a Logstash pipeline like the one below to create a JDBC connection with Hive:

    input {
      jdbc {
        jdbc_driver_library => "hive-jdbc-2.0.0.jar,hive2.jar,hive-common-2.3.1.jar,hadoop-core-1.2.1-0.jar"
        jdbc_driver_class => "org.apache.hive.jdbc.HiveDriver"
        jdbc_connection_string => ""
      }
    }
    output {
      # Publish out in command line
      stdout { codec => json }
    }

But you will run into problems if the Hive JDBC connection requires Kerberos authentication. To handle this, set the following JVM options. Note that these can be set either within the config/jvm.options file or via the LS_JAVA_OPTS environment variable, which additively overrides the JVM settings. Refer: https://www.elastic.co/guide/en/logstash/current/jvm-settings.html

    -Djava.security.auth.login.config=<JAAS_config_file_path>   (required)
    -Djava.security.krb5.conf=<path to krb5.conf>                (if it is not in the default location under /etc/)

If krb5.conf is not specified then y…
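For reference, here is a minimal sketch of what the JAAS file and the environment variable might look like. The entry name, file paths, principal, and keytab name below are illustrative assumptions, not values from this post; the entry name your client actually looks up depends on the driver (com.sun.security.jgss.initiate is the default entry used by Java's GSS layer).

    com.sun.security.jgss.initiate {
      com.sun.security.auth.module.Krb5LoginModule required
      useKeyTab=true
      keyTab="/etc/logstash/logstash.keytab"
      principal="logstash@EXAMPLE.COM"
      storeKey=true
      useTicketCache=false;
    };

With that file in place, both JVM options can be passed without editing jvm.options:

    # paths here are assumed for illustration
    export LS_JAVA_OPTS="-Djava.security.auth.login.config=/etc/logstash/jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf"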

Generate or Create a Keytab File (Kerberos)

Steps as below:

1. Run ktutil to launch the command-line utility.
2. Type the command:

       addent -password -p $user@$REALM -k 1 -e $encryptionType

   Replace the highlighted keywords:
   - $user — name of the user.
   - $REALM — the Kerberos realm, i.e. the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service.
   - $encryptionType — type of encryption, such as aes256-cts, des3-cbc-sha1-kd, RC4-HMAC, arcfour-hmac-md5, des-hmac-sha1, des-cbc-md5, etc. You can add one or more entries for different encryption types.
3. When prompted, enter the password for the Kerberos principal user.
4. Type the following command to write a keytab file:

       wkt $user.keytab

5. Type 'q' to quit the utility.
6. Verify the keytab is created and has the right user entry. Execute the command below:

       klist -ekt $PWD/$user.keytab

7. Initialize the keytab, i.e. generate a ticket. Execute the command below:

       kinit $user@$REALM -kt $PWD/$user.keytab

8. Display the list of currently cached Kerberos tickets…
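Putting the steps together, a complete session for a hypothetical user alice in realm EXAMPLE.COM (both names are placeholders for illustration, not values from this post) would look roughly like this:

    $ ktutil
    ktutil:  addent -password -p alice@EXAMPLE.COM -k 1 -e aes256-cts
    Password for alice@EXAMPLE.COM:
    ktutil:  wkt alice.keytab
    ktutil:  q

    $ klist -ekt $PWD/alice.keytab        # verify the entries in the keytab
    $ kinit alice@EXAMPLE.COM -kt $PWD/alice.keytab
    $ klist                               # display the currently cached Kerberos tickets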

Spark on Hadoop EMR: Cross-Realm Access to HBase & Kafka

We had an on-premise Hadoop cluster which included Kafka, HBase, HDFS, Spark, YARN, etc. We planned to migrate our big data jobs and data to AWS EMR while still keeping Kafka on the on-premise CDP cluster. After spawning EMR on AWS, we tried running a Spark job connecting to Kafka on the on-premise cluster. We set up all the VPC connections and opened the required firewall ports between the two clusters. But since EMR and CDP (on-premise) had different KDC servers and principals, our attempts to connect to Kafka (on-premise) from EMR kept failing. Note that one can set the following property to see Kerberos debug logs:

    -Dsun.security.krb5.debug=true

The two easiest options for us were:

1. Set up a cross-realm Kerberos trust, so that an EMR principal can authenticate against the on-premise KDC and use the Kafka service. Refer: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system-level_authentication_guide/using_trusts
2. Set up the cross-realm trust using the same AD accounts and domain. Refer: https://docs.aws.amazon.com/emr/la…
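To see that Kerberos debug output in a Spark job, the flag can be passed through Spark's standard JVM-options settings, e.g.:

    # enable Kerberos debug logging on the driver (use
    # spark.executor.extraJavaOptions for the executors as well)
    spark-submit --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true" ...

And for option 1, a cross-realm trust is typically established by creating identical cross-realm krbtgt principals in both KDCs and pointing clients at the foreign realm. The sketch below is illustrative only; the realm and host names (EMR.EXAMPLE.COM, ONPREM.EXAMPLE.COM, kdc.onprem.example.com) are placeholders, not our actual setup. For principals in the EMR realm to reach services in the on-premise realm:

    # In kadmin on BOTH KDCs, create the same cross-realm principal
    # (same password and kvno on each side):
    addprinc krbtgt/ONPREM.EXAMPLE.COM@EMR.EXAMPLE.COM

    # /etc/krb5.conf on the EMR side: where the foreign realm's KDC
    # lives and which hostnames map to it
    [realms]
      ONPREM.EXAMPLE.COM = {
        kdc = kdc.onprem.example.com
        admin_server = kdc.onprem.example.com
      }

    [domain_realm]
      .onprem.example.com = ONPREM.EXAMPLE.COM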