Example -
I wanted to create a table with delimiter "~*". That can be done like below -
CREATE external TABLE db1.t1(a string COMMENT '', b string COMMENT '', c string COMMENT '')
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '(.*?)~\\*(.*?)~\\*(.*)'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '\abc'
Note -
I wanted to create a table with delimiter "~*". That can be done like below -
CREATE external TABLE db1.t1(a string COMMENT '', b string COMMENT '', c string COMMENT '')
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '(.*?)~\\*(.*?)~\\*(.*)'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '\abc'
Note -
- In regex, I have to repeat the expression for as many columns as in table definition. For example, we had 3 columns.
- It can't be used for serialization that is. If I wanted to insert data in to the table I can't. It will throw exception.
Other way to do same is -
create external TABLE `db1`.`t2` (col string) row format delimited location '\abc';
Note -
- This will create a table and entire text will be in 1 column.
- Then, you can execute a query like below -
select arr[0], arr[1], arr[2] from (select split(col, '~\\*') as arr from `db1`.`t2`) T limit 10;
- Also, you can insert the data in this table without any hassle.
Comments
Post a Comment