
Posts

Logstash connect to Kerberos authenticated Hive Service

Normally, one can write syntax like below to create a JDBC connection with Hive -

    input {
      jdbc {
        jdbc_driver_library => "hive-jdbc-2.0.0.jar,hive2.jar,hive-common-2.3.1.jar,hadoop-core-1.2.1-0.jar"
        jdbc_driver_class => "org.apache.hive.jdbc.HiveDriver"
        jdbc_connection_string => ""
      }
    }
    output {
      # Publish out on the command line
      stdout { codec => json }
    }

But you will run into problems if you need to do Kerberos authentication when using the Hive JDBC driver. To handle this, set the following JVM options. Note that these can be set either in the config/jvm.options file or by setting the LS_JAVA_OPTS variable, which will additively override the JVM settings. Refer - https://www.elastic.co/guide/en/logstash/current/jvm-settings.html

    -Djava.security.auth.login.config=<JAAS_config_file_path> (Required)
    -Djava.security.krb5.conf=<Path to krb5.conf> (if it is not in the default location under /etc/) if krb5.conf is not specified then y...
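For illustration, a minimal sketch of how the pieces can fit together. The JAAS section name, keytab path, principal, Hive host, and query below are placeholders, not values from the original post. First, a JAAS file such as /etc/logstash/hive-jaas.conf:

    Client {
      com.sun.security.auth.module.Krb5LoginModule required
      useKeyTab=true
      keyTab="/etc/security/keytabs/logstash.keytab"
      principal="logstash@EXAMPLE.COM"
      doNotPrompt=true;
    };

Then point the Logstash JVM at it and use a kerberized HiveServer2 URL in the jdbc input:

    # Shell - export before starting Logstash
    export LS_JAVA_OPTS="-Djava.security.auth.login.config=/etc/logstash/hive-jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf"

    # Logstash pipeline - jdbc input with a kerberized connection string
    input {
      jdbc {
        jdbc_driver_library    => "hive-jdbc-2.0.0.jar,hive-common-2.3.1.jar"
        jdbc_driver_class      => "org.apache.hive.jdbc.HiveDriver"
        jdbc_connection_string => "jdbc:hive2://hive-host.example.com:10000/default;principal=hive/hive-host.example.com@EXAMPLE.COM"
        jdbc_user              => "logstash"
        statement              => "SELECT * FROM sample_table LIMIT 10"
      }
    }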

Generate or Create a Keytab File (Kerberos)

Steps as below (a complete sample session is sketched after these steps) -

Run ktutil to launch the command line utility.

Type the command -

    addent -password -p $user@$REALM -k 1 -e $encryptionType

Note, replace the highlighted keywords -

$user - name of the user.
$REALM - the Kerberos realm, i.e. the domain over which a Kerberos authentication server has the authority to authenticate a user, host or service.
$encryptionType - type of encryption, like aes256-cts, des3-cbc-sha1-kd, rc4-hmac (arcfour-hmac-md5), des-hmac-sha1, des-cbc-md5, etc. You can add one or more entries for different types of encryption.

When prompted, enter the password for the Kerberos principal user.

Type the following command to write a keytab file -

    wkt $user.keytab

Type 'q' to quit the utility.

Verify the keytab is created and has the right user entries - execute the below command -

    klist -ekt $PWD/$user.keytab

Initialize the keytab or generate a ticket - execute the below command -

    kinit $user@$REALM -k...
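For illustration, a sample session for a hypothetical user appuser in realm EXAMPLE.COM (user, realm, and encryption type are placeholders) -

    $ ktutil
    ktutil:  addent -password -p appuser@EXAMPLE.COM -k 1 -e aes256-cts
    Password for appuser@EXAMPLE.COM:
    ktutil:  wkt appuser.keytab
    ktutil:  q

    # Verify the entries and encryption types recorded in the keytab
    $ klist -ekt $PWD/appuser.keytab

    # Obtain a ticket non-interactively using the keytab, then confirm it
    $ kinit -kt appuser.keytab appuser@EXAMPLE.COM
    $ klist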

Spark Hadoop EMR Cross Realm Access HBase & Kafka

We had an on-premise Hadoop cluster which included Kafka, HBase, HDFS, Spark, YARN, etc. We planned to migrate our Big Data jobs and data to AWS EMR, while still keeping Kafka on the on-premise CDP cluster. After spawning EMR on AWS, we tried running a Spark job connecting to Kafka on the on-premise cluster. We did set up all the VPC connections and opened the firewall ports between the two clusters. But, since EMR and the on-premise CDP cluster had different KDC servers and principals, our attempts to connect from EMR to the on-premise Kafka kept failing. Note, one can set the following property to see Kerberos logs (see the spark-submit sketch below) -

    -Dsun.security.krb5.debug=true

The easiest options for us were two -

Set up cross-realm Kerberos trust, such that the EMR principal is trusted by the on-premise KDC server to use the Kafka service. Refer - https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system-level_authentication_guide/using_trusts
Set up cross-realm trust using the same AD accounts and domain. Refer https://do...
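For illustration, a hedged sketch of passing the Kerberos debug flag and a Kafka client JAAS configuration to a Spark job on EMR; the JAAS file, keytab, class name, jar, and broker address are placeholders, not the actual job from the post -

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --files /etc/kafka/kafka-client-jaas.conf,/etc/security/keytabs/app.keytab \
      --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true -Djava.security.auth.login.config=kafka-client-jaas.conf" \
      --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true -Djava.security.auth.login.config=kafka-client-jaas.conf" \
      --class com.example.MyStreamingJob \
      my-streaming-job.jar kafka-broker1.onprem.example.com:9092

For either cross-realm option to work, the krb5.conf on the EMR nodes also has to be able to resolve the on-premise realm (its KDC entry and domain-to-realm mapping); otherwise ticket requests for the on-premise Kafka service principal cannot leave the EMR realm.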

Spark MongoDB Write Error - com.mongodb.MongoBulkWriteException: Bulk write operation error on server 'E11000 duplicate key error collection:'

One may see the following error or exception while running Spark 2.4 with -

mongo-spark-connector_2.11-2.4.0.jar
mongo-java-driver-3.9.0.jar

Exception -

    User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 6.0 failed 4 times, most recent failure: Lost task 2.3 in stage 6.0 (TID 238, nc0020.hadoop.mycluster.com, executor 2): com.mongodb.MongoBulkWriteException: Bulk write operation error on server vondbd0008.mymachine.com:27017. Write errors: [BulkWriteError{index=0, code=11000, message='E11000 duplicate key error collection: POC1_DB.MyCollection index: _id_ dup key: { _id: "113442141" }', details={ }}].
        at com.mongodb.connection.BulkWriteBatchCombiner.getError(BulkWriteBatchCombiner.java:177)
        at com.mongodb.connection.BulkWriteBatchCombiner.throwOnError(BulkWriteBatchCombiner.java:206)
        at com.mongodb.connection.BulkWriteBatchCombiner.getResult(BulkWriteBatchCombiner.java:147)
        at com.mongodb.operation.BulkWrite...
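For context, a minimal PySpark sketch of the kind of write that can raise this error, together with one commonly used connector option; the URI, database, collection, and data are made up, and this is not necessarily the fix described in the post -

    from pyspark.sql import SparkSession

    # Assumes mongo-spark-connector_2.11:2.4.0 and the MongoDB Java driver are on the
    # classpath, e.g. via --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.0
    spark = (SparkSession.builder
             .appName("mongo-write-sketch")
             .config("spark.mongodb.output.uri",
                     "mongodb://vondbd0008.mymachine.com:27017/POC1_DB.MyCollection")
             .getOrCreate())

    df = spark.createDataFrame([("113442141", "some value")], ["_id", "payload"])

    # A plain insert of a document whose _id already exists in the collection fails
    # with E11000. With the 2.x connector, keeping replaceDocument=true (the default)
    # while writing a DataFrame that carries an _id column is intended to replace
    # matching documents by _id rather than insert duplicates.
    (df.write
       .format("com.mongodb.spark.sql.DefaultSource")
       .option("replaceDocument", "true")
       .mode("append")
       .save())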

Python pyodbc - Error - [unixODBC][Oracle][ODBC][Ora]ORA-12162: TNS:net service name is incorrectly specified

One may encounter the below errors while connecting to Oracle with pyodbc, using Python 3 -

    [unixODBC][Driver Manager]Can't open lib 'Oracle ODBC driver for Oracle 19' : file not found (0) (SQLDriverConnect)
    [unixODBC][Oracle][ODBC][Ora]ORA-12162: TNS:net service name is incorrectly specified\n (12162) (SQLDriverConnect)
    RuntimeError: Unable to set SQL_ATTR_CONNECTION_POOLING attribute

The solution to fix the above errors is to -

Make the following entry in /etc/odbcinst.ini -

    [Oracle ODBC driver for Oracle 19]
    Description=Oracle ODBC driver for Oracle 19
    Driver=$ORACLE_HOME/lib/libsqora.so.19.1
    FileUsage=1
    Driver Logging=7
    ...
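For illustration, a minimal pyodbc connection sketch that uses the driver name registered above; the TNS alias, credentials, and query are placeholders, and it assumes the Oracle client environment (ORACLE_HOME, TNS_ADMIN, tnsnames.ora) is already configured -

    import pyodbc

    # A commonly suggested workaround if the SQL_ATTR_CONNECTION_POOLING error persists:
    # disable pyodbc's ODBC connection pooling before the first connect.
    pyodbc.pooling = False

    # Driver must match the section name in /etc/odbcinst.ini;
    # DBQ takes the TNS service name / alias defined in tnsnames.ora.
    conn_str = (
        "Driver={Oracle ODBC driver for Oracle 19};"
        "DBQ=MYDB_TNS_ALIAS;"
        "UID=scott;"
        "PWD=tiger;"
    )

    conn = pyodbc.connect(conn_str)
    cursor = conn.cursor()
    cursor.execute("SELECT 1 FROM dual")
    print(cursor.fetchone())
    conn.close()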