Hive Parse JSON with Array Columns and Explode it in to Multiple rows.

Say we have a JSON String like below -

{
"billingCountry":"US"
"orderItems":[
{
"itemId":1,
"product":"D1"
},
{
"itemId":2,
"product":"D2"
}
]
}

And, our aim is to get output parsed like below -

itemId	product
1	D1
2	D2

First, We can parse JSON as follows to get JSON String

get_json_object(value, '$.orderItems.itemId') as itemId
get_json_object(value, '$.orderItems.product') as product

Second, Above will result String value like "[1,2]". We want to convert it to Array as follows -

split(regexp_extract(get_json_object(value, '$.orderItems.itemId'),'^\\["(.*)\\"]$',1),'","') as itemId
split(regexp_extract(get_json_object(value, '$.orderItems.product'),'^\\["(.*)\\"]$',1),'","') as product

Third, We need to explode the multiple array columns to get final result. This can be done as follows-

lateral view posexplode(product) tf as pos,val

Fourth, Now we can select individual elements of array using index / position "pos"

product[pos] as product,
itemId[pos] as itemId

Summary SQL -

select

product[pos] as product,

itemId[pos] as itemId

from(

SELECT

split(regexp_extract(get_json_object(value, '$.orderItems.product'),'^\\["(.*)\\"]$',1),'","') as product, split(regexp_extract(get_json_object(value, '$.orderItems.itemId'),'^\\["(.*)\\"]$',1),'","') as itemId

from test_json) T1

lateral view posexplode(product) tf as pos,val

QueryDB

Search This Blog

Hive Parse JSON with Array Columns and Explode it in to Multiple rows.

Comments

Post a Comment

Popular posts

Spark MongoDB Connector Not leading to correct count or data while reading

Scala Spark building Jar leads java.lang.StackOverflowError

MongoDB Chunk size many times bigger than configure chunksize (128 MB)

AWS EMR Spark – Much Larger Executors are Created than Requested

Hive Count Query not working