Hive - Create a Table delimited with Regex expression.

Example -

I wanted to create a table with delimiter "~*". That can be done like below -

CREATE external TABLE db1.t1(a string COMMENT '', b string COMMENT '', c string COMMENT '')
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '(.*?)~\\*(.*?)~\\*(.*)'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '\abc'

Note -

In regex, I have to repeat the expression for as many columns as in table definition. For example, we had 3 columns.
It can't be used for serialization that is. If I wanted to insert data in to the table I can't. It will throw exception.

Other way to do same is -

create external TABLE `db1`.`t2` (col string) row format delimited location '\abc';

Note -

This will create a table and entire text will be in 1 column.
Then, you can execute a query like below -

select arr[0], arr[1], arr[2] from (select split(col, '~\\*') as arr from `db1`.`t2`) T limit 10;
Also, you can insert the data in this table without any hassle.

QueryDB

Search This Blog

Hive - Create a Table delimited with Regex expression.

Comments

Post a Comment

Popular posts

Spark MongoDB Connector Not leading to correct count or data while reading

Scala Spark building Jar leads java.lang.StackOverflowError

MongoDB Chunk size many times bigger than configure chunksize (128 MB)

AWS EMR Spark – Much Larger Executors are Created than Requested

Hive Count Query not working