
Posts

Spark-JDBC connection with Oracle Fails - java.sql.SQLSyntaxErrorException: ORA-00903: invalid table name

While connecting Spark to Oracle over JDBC, one may observe an exception like the one below:

spark.read.format("jdbc").
  option("url", "jdbc:oracle:thin:@//oraclehost:1521/servicename").
  option("dbtable", "mytable").
  option("user", "myuser").
  option("driver", "oracle.jdbc.driver.OracleDriver").
  option("password", "mypassword").
  load().write.parquet("/data/out")

java.sql.SQLSyntaxErrorException: ORA-00903: invalid table name
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
        at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:227)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
        at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:208)
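The excerpt ends at the stack trace, so the post's own resolution is not shown here. ORA-00903 means Oracle could not resolve the name passed in "dbtable"; a common first check (an assumption on our part, not the post's stated fix) is to schema-qualify the table name exactly as it exists in Oracle:

// Hypothetical sketch: "MYSCHEMA.MYTABLE" stands in for the real owner and
// table name; Oracle stores unquoted identifiers in upper case.
spark.read.format("jdbc").
  option("url", "jdbc:oracle:thin:@//oraclehost:1521/servicename").
  option("dbtable", "MYSCHEMA.MYTABLE").
  option("user", "myuser").
  option("driver", "oracle.jdbc.driver.OracleDriver").
  option("password", "mypassword").
  load().write.parquet("/data/out")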

Hadoop Jobs - javax.security.sasl.SaslException: GSS initiate failed

If you see the exception below while running Big Data jobs (Spark, MR, Tez, etc.):

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

(The same SaslException typically repeats several times in the log.)

Solution - the simplest fix is to generate a Kerberos keytab and initialize it in the user session before running the job, e.g. with kinit, as sketched below.
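A minimal session-setup sketch, assuming a hypothetical keytab path and principal (substitute your own user and realm):

# Obtain a TGT from the keytab before submitting the job.
# The keytab itself can be created with ktutil or provided by your KDC admin.
kinit -kt /home/myuser/myuser.keytab myuser@EXAMPLE.COM

# Verify that a valid ticket now exists in the cache
klist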

TEZ - How to enable Fetch Task instead of MapReduce Job for simple query in Hive

Certain simple Hive queries can be executed as a fetch task, which avoids the overhead of starting a MapReduce job. Two parameters control this, as sketched below.

hive.fetch.task.conversion - controls which kinds of simple query can be converted to a single fetch task:
- "none" (added in Hive 0.14) disables this feature.
- "minimal" converts SELECT *, FILTER on partition columns (WHERE and HAVING clauses), and LIMIT only.

hive.fetch.task.conversion.threshold - controls the input threshold (in bytes) for applying hive.fetch.task.conversion.
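A minimal HiveQL sketch (the table, the partition column ds, and the threshold value are illustrative assumptions, not from the post):

-- Convert qualifying simple queries to a single fetch task
SET hive.fetch.task.conversion=minimal;

-- Only convert when the input size is below ~1 GB (value in bytes)
SET hive.fetch.task.conversion.threshold=1073741824;

-- With "minimal", a partition-column filter plus LIMIT can be served
-- by a fetch task, and no MapReduce job is launched
SELECT * FROM sales WHERE ds = '2020-01-01' LIMIT 100;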

Validate Emails using Python

Validate emails using Python - https://pypi.org/project/email-validator/

Installation:

pip install email-validator

Usage - the script below reads a list of emails from the file "test.emails" and loops over it, validating each email.

#!/usr/bin/python
from email_validator import validate_email, EmailNotValidError

filename = "/home/dinesh/setuptools-7.0/test.emails"
total_count = 0
valid_count = 0
invalid_count = 0

with open(filename, "r") as a_file:
    for line in a_file:
        stripped_line = line.strip()
        print(stripped_line)
        total_count = total_count + 1
        try:
            # Validate.
            valid = validate_email(stripped_line)
            valid_count = valid_count + 1
            # Update with the normalized form.
            # email = valid.email
        except EmailNotValidError as e:
            # Email is not valid; the exception message is human-readable.
            print(str(e))
            invalid_count = invalid_count + 1

print("Total Count: " + str(total_count))
print("Valid Count: " + str(valid_count))
print("Invalid Count: " + str(invalid_count))
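A quick way to exercise the script (the script filename validate_emails.py and the sample addresses are illustrative assumptions):

printf "alice@example.com\nnot-an-email\n" > /home/dinesh/setuptools-7.0/test.emails
python validate_emails.py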

Spark HBase Connector - Doesn't Support IN Clause

We came across a scenario while using "shc-core-1.1.0.3.1.5.0-152.jar": a Spark DataFrame was created on one of the HBase tables, and when we queried it like "select * from df where col in ('A', 'B', 'C')" we found that the filter on col was not applied. But if the same SQL is rewritten as "select * from df where col = 'A' or col = 'B' or col = 'C'", it works.
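A minimal Scala sketch of the rewrite (names are illustrative; "df" is assumed to be a temp view over the DataFrame read through the shc-core HBase source, not the post's exact code):

// hbaseDf is assumed to have been loaded via the shc-core data source
hbaseDf.createOrReplaceTempView("df")

// IN clause - the filter on col was observed NOT to take effect
spark.sql("select * from df where col in ('A', 'B', 'C')").show()

// Equivalent OR chain - returns correctly filtered rows
spark.sql("select * from df where col = 'A' or col = 'B' or col = 'C'").show()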