Skip to main content

Posts

Showing posts from October, 2019

Hive - Create a Table delimited with Regex expression.

Example - I wanted to create a table with delimiter "~*". That can be done like below  - CREATE external TABLE db1.t1(a string COMMENT '', b string COMMENT '', c string COMMENT '') ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES (   'serialization.format' = '1',   'input.regex' = '(.*?)~\\*(.*?)~\\*(.*)' ) STORED AS   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '\abc' Note - In regex, I have to repeat the expression for as many columns as in table definition. For example, we had 3 columns. It can't be used for serialization that is. If I wanted to insert data in to the table I can't. It will throw exception. Other way to do same is -  create external TABLE `db1`.`t2` (col string) row format delimited location '\abc';

Caused by: org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans

Error - Caused by: org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans RepartitionByExpression [src_db_id#90, ssd_id#83, last_upd_dt#89], 200 +- Filter (isnotnull(ldy#94) && (ldy#94 = 2019))    +- HiveTableRelation `db2`.`t2`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, [ssd_id#83, flctr_bu_id#84, group_name#85, eval_level#86, last_chg_actn#87, last_upd_oper#88, last_upd_dt#89, src_db_id#90, across_skill_type_eval_type#91, dlf_batch_id#92, dlf_load_dttm#93], [ldy#94] and Aggregate [2019] +- Project    +- HiveTableRelation `db1`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [ssd_id#36, flctr_bu_id#37, group_name#38, eval_level#39, last_chg_actn#40, last_upd_oper#41, last_upd_dt#42, src_db_id#43, across_skill_type_eval_type#44] Join condition is missing or trivial. Use the CROSS JOIN syntax to allow cartesian products between these relations.; at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProd