Posts

Showing posts from January, 2021

Sqoop Import: New Line Character in one of the column value

Data produced by a Sqoop import may contain newline characters inside a column value, which can cause the imported data to be read incorrectly. To resolve this, use either of the following solutions:

·  Specify the following options with Sqoop:
   --map-column-java <Column name that contains New Line>=String --hive-drop-import-delims
·  Or update the Sqoop SQL query to select the column with a regex replacement, for example:
   regexp_replace(<Column name that contains New Line>, '[[:space:]]+', ' ')
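To make the effect of the regex replacement concrete, here is a minimal Python sketch of the same substitution applied to one value. The function name collapse_whitespace and the sample value are hypothetical, and the regex engine of your source database may treat character classes slightly differently, so treat this as an illustration rather than the exact database behaviour.

```python
import re

# POSIX [[:space:]] covers space, tab, newline, carriage return,
# vertical tab and form feed; re.ASCII restricts \s to that same set.
_WHITESPACE_RUN = re.compile(r"\s+", re.ASCII)


def collapse_whitespace(value: str) -> str:
    """Replace each run of whitespace (including newlines) with one space."""
    return _WHITESPACE_RUN.sub(" ", value)


# Hypothetical column value containing embedded line breaks.
value = "line one\nline two\r\n\tline three"
print(collapse_whitespace(value))  # line one line two line three
```

Because a whole run of whitespace becomes a single space, a Windows-style line break followed by a tab is replaced only once, so the row stays on one line without gaining extra blanks.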

SASL Exception on HDP Sandbox while running Pig action via Oozie

While running Pig scripts via Oozie, you might hit a SASL exception (even though Kerberos might be disabled). To resolve this, comment out the following properties in hive-site.xml, then upload the file to the path set in “oozie.wf.application.path”:

<!--property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@EXAMPLE.COM</value>
</property-->

Hive Partitioned view Errors: IndexOutOfBoundsException

Setup:

1. Created table t1: create table t1 (name string) PARTITIONED BY (c1 string)
2. Created view v1: create view v1 PARTITIONED ON (c1) as select * from t1
3. Inserted data into t1 with partition c1=A
4. Ran ALTER VIEW to add the partition to v1

Observed:

·  show partitions v1: works fine
·  show create table v1: does not show that the view is partitioned
·  select * from v1: fails with “FAILED: IndexOutOfBoundsException”
·  select * from v1 where c1 like '%%': fails with “FAILED: IndexOutOfBoundsException”
·  select * from v1 where c1='A': works fine; the query succeeds when the partition column value is specified

HDP Sandbox : Oozie web console is disabled.

If the Oozie Web Console shows the message below:

    To enable Oozie web console install the Ext JS library. Refer to Oozie Quick Start documentation for details.

Solution:

1. Download the Ext JS library: wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
2. Stop the Oozie service from Ambari.
3. Copy ext-2.2.zip to the path: /usr/hdp/current/oozie-client/libext
4. Regenerate the war file by executing: $ /usr/hdp/current/oozie-server/bin/oozie-setup.sh prepare-war
5. Start Oozie again.

Beeline (Hive) - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Exception -

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
    at org.apache.hive.beeline.BeeLine.getConsoleReader(BeeLine.java:905)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:792)

Issue - https://issues.apache.org/jira/browse/HIVE-10836

The issue is caused by a large Beeline history file (<user_home>/.beeline/history).

Analysis - 1.   ...
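Since the cause named above is an oversized history file, one way to check its size and shrink it is sketched below. This is an assumption about a reasonable workaround, not the steps from the original analysis: the helper trim_history and the default of 500 retained lines are hypothetical, and simply deleting or renaming the file may be enough.

```python
from pathlib import Path


def trim_history(path: Path, keep_lines: int = 500) -> int:
    """Keep only the last `keep_lines` lines of the file; return lines removed."""
    if not path.exists():
        return 0
    lines = path.read_text(errors="replace").splitlines(keepends=True)
    removed = max(0, len(lines) - keep_lines)
    if removed:
        # Rewrite the file with only the most recent entries.
        path.write_text("".join(lines[removed:]))
    return removed


# Report the current size only; trimming is left as an explicit call.
history = Path.home() / ".beeline" / "history"
if history.exists():
    print(f"{history}: {history.stat().st_size} bytes")
```

Calling trim_history(history) then rewrites the file in place, so it may be worth copying the original aside first if the old commands matter.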

Spark Error - missing part 0 of the schema, 2 parts are expected

Exception -

Caused by: org.apache.spark.sql.AnalysisException: Could not read schema from the hive metastore because it is corrupted. (missing part 0 of the schema, 2 parts are expected).;

Analysis -

·  Check the table definition. In TBLPROPERTIES, you might find something like this:
   > 'spark.sql.sources.schema.numPartCols'
   > 'spark.sql.sources.schema.numParts' 'spark.sql.sources.schema.part.0'
   > 'spark.sql.sources.schema.part.1' 'spark.sql.sources.schema.part.2'
   > 'spark.sql.sources.schema.partCol.0'
   > 'spark.sql.sources.schema.partCol.1'

That is what the error seems to say: part.1 is defined but part.0 is missing.

Solution -

Drop and re-create the table. If the table was partitioned, all of its partitions will have been removed, so do either of the following:

·  Msck repair table <db_name>.<table_name>
·    ...
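The TBLPROPERTIES check from the analysis above can be sketched in Python. Given the table properties as a dictionary, the sketch lists the schema parts that numParts promises but that are not present. The function name missing_schema_parts and the sample properties are hypothetical, and how you obtain the properties (for example from SHOW TBLPROPERTIES output) is left out; it only mirrors the condition the error message describes.

```python
import re

PREFIX = "spark.sql.sources.schema."


def missing_schema_parts(props: dict[str, str]) -> list[int]:
    """Return indices of schema parts that numParts promises but are absent."""
    expected = int(props.get(PREFIX + "numParts", "0"))
    present = set()
    for key in props:
        # Matches ...schema.part.<n> but not ...schema.partCol.<n>
        match = re.fullmatch(re.escape(PREFIX) + r"part\.(\d+)", key)
        if match:
            present.add(int(match.group(1)))
    return [i for i in range(expected) if i not in present]


# Hypothetical properties shaped like the error in this post:
# two parts expected, but only part.1 is stored.
props = {
    "spark.sql.sources.schema.numParts": "2",
    "spark.sql.sources.schema.part.1": "...",
}
print(missing_schema_parts(props))  # [0]
```

An empty result would suggest the schema parts are complete and the problem lies elsewhere.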