Hive - Create a Table delimited with Regex expression.

Example -

I wanted to create a table with delimiter "~*". That can be done like below -

CREATE external TABLE db1.t1(a string COMMENT '', b string COMMENT '', c string COMMENT '')
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '(.*?)~\\*(.*?)~\\*(.*)'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '\abc'

Note -

In regex, I have to repeat the expression for as many columns as in table definition. For example, we had 3 columns.
It can't be used for serialization that is. If I wanted to insert data in to the table I can't. It will throw exception.

Other way to do same is -

create external TABLE `db1`.`t2` (col string) row format delimited location '\abc';

Note -

This will create a table and entire text will be in 1 column.
Then, you can execute a query like below -

select arr[0], arr[1], arr[2] from (select split(col, '~\\*') as arr from `db1`.`t2`) T limit 10;
Also, you can insert the data in this table without any hassle.

QueryDB

Search This Blog

Hive - Create a Table delimited with Regex expression.

Comments

Post a Comment

Popular posts

Hive Parse JSON with Array Columns and Explode it in to Multiple rows.

org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;

Read from a hive table and write back to it using spark sql

Hadoop Distcp Error Duplicate files in input path

Scala Spark building Jar leads java.lang.StackOverflowError