
Posts

Copy Code of a Git Repo into a Different Git Repo with Commit History

1) Clone the source repo:
   git clone <url to Source repo> temp-dir
2) Check the available branches:
   git branch -a
3) Check out every branch that you want to copy:
   git checkout branch-name
4) Fetch all the tags:
   git fetch --tags
5) Clear the link to the source repo:
   git remote rm origin
6) Link your local repository to your newly created repository:
   git remote add origin <url to NEW repo>
7) Push all your branches and tags:
   git push origin --all
   git push --tags

These steps give a complete copy from the source repo to the new repo, including commit history.
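For convenience, the same steps can be strung together as one shell session. A minimal sketch, keeping the placeholder URLs from above and assuming a single extra branch named develop (repeat the checkout for each branch you want to copy):

git clone <url to Source repo> temp-dir
cd temp-dir
git branch -a                 # list remote branches
git checkout develop          # repeat for every branch to copy
git fetch --tags              # bring over the tags as well
git remote rm origin          # drop the link to the source repo
git remote add origin <url to NEW repo>
git push origin --all         # push all local branches
git push --tags               # push all tags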

Spark-Teradata Connection Issues

Exception:
Caused by: java.lang.NullPointerException
    at com.teradata.tdgss.jtdgss.TdgssConfigApi.GetMechanisms(Unknown Source)
    at com.teradata.tdgss.jtdgss.TdgssManager.<init>(Unknown Source)
    at com.teradata.tdgss.jtdgss.TdgssManager.<clinit>(Unknown Source)
Brief: tdgssconfig.jar cannot be found on the classpath. Add it to the classpath.

Exception:
java.sql.SQLException: [Teradata Database] [TeraJDBC 15.10.00.33] [Error 3707] [SQLState 42000] Syntax error, expected something like a name or a Unicode delimited identifier or an 'UDFCALLNAME' keyword or '(' between the 'FROM' keyword and the 'SELECT' keyword.
Brief: Spark JDBC normally expects the dbtable property to be a table name, so it internally prepends "select * from" to it, i.e. select * from <Table Name>. If a SQL statement is supplied instead of a table name, the generated query becomes something like select * from select ..., which is invalid syntax and triggers the error above. The fix is to wrap the SQL in parentheses and give it an alias, e.g. (select ...) t, so that it is read as a derived table.
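Both issues can be addressed at launch time: pass terajdbc4.jar and tdgssconfig.jar on the classpath with --jars, and wrap the SQL as a derived table in dbtable. A minimal sketch; the jar paths, host, database, table, and credentials are placeholders:

spark-shell --jars /path/to/terajdbc4.jar,/path/to/tdgssconfig.jar <<'EOF'
// Read from Teradata; the SQL is wrapped in parentheses with an alias
// so Spark's generated "select * from ..." stays valid.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:teradata://tdhost/DATABASE=mydb")
  .option("driver", "com.teradata.jdbc.TeraDriver")
  .option("user", "dbuser")
  .option("password", "dbpass")
  .option("dbtable", "(select col1, col2 from mydb.mytable) t")
  .load()
df.show(5)
EOF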

Splunk Data to Hadoop Ingestion

One approach to getting data from Splunk into Hadoop is to use the REST API provided by Splunk, so that data is ingested into the Hadoop Data Lake periodically. A simple command like the one below can help in such a scenario:

curl -u '<username>:<password>' \
  -k https://splunkhost:8089/services/search/jobs/export \
  -d search="search index=myindex | head 10" \
  -d output_mode=raw \
  | hdfs dfs -put -f - <HDFS_DIR>

The command above fetches the top 10 rows from the Splunk index "myindex" and ingests them into the Hadoop Data Lake.
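To make the ingestion periodic, the curl command can be wrapped in a small script and scheduled (for example from cron), landing each run in a timestamped file under the target directory. A minimal sketch; the credentials, host, index, and HDFS directory are the same placeholders as above:

#!/bin/bash
# Pull the latest results for the Splunk search and land them in HDFS,
# one file per run, named by timestamp.
TS=$(date +%Y%m%d%H%M)
curl -s -u '<username>:<password>' \
  -k https://splunkhost:8089/services/search/jobs/export \
  -d search="search index=myindex | head 10" \
  -d output_mode=raw \
  | hdfs dfs -put -f - <HDFS_DIR>/splunk_${TS}.raw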

Sqoop Import: Newline Character in a Column Value

Sometimes data produced by a Sqoop import contains newline characters, which can cause the data to be read incorrectly downstream. To resolve this, use either of the following solutions.

Specify the following options with Sqoop:
--map-column-java <column that contains the newline>=String
--hive-drop-import-delims

Or update the Sqoop SQL to select the column with a regex replacement, for example:
regexp_replace(<column that contains the newline>, '[[:space:]]+', ' ')
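As an illustration, an import using the first option might look like the command below. The connection string, table, target directory, and column name (COMMENTS here) are placeholders, not values from the original post:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username dbuser -P \
  --table MYSCHEMA.MYTABLE \
  --map-column-java COMMENTS=String \
  --hive-drop-import-delims \
  --target-dir /data/mytable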

SASL Exception on HDP Sandbox while running Pig action via Oozie

While running Pig scripts via Oozie, you might face a SASL exception (even though Kerberos might be disabled). To resolve this, comment out the following lines in hive-site.xml and then upload it to "oozie.wf.application.path":

<!--
<property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@EXAMPLE.COM</value>
</property>
-->
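After editing, the modified hive-site.xml can be pushed to the workflow directory in HDFS. A minimal sketch, assuming a hypothetical application path of /user/oozie/workflows/pig-wf:

# Replace the hive-site.xml under the workflow's application path
# (the value of oozie.wf.application.path in job.properties)
hdfs dfs -put -f hive-site.xml /user/oozie/workflows/pig-wf/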