數據處理中,一遇到json就頭大,很長一段時間里,明知lateral view函數是個好東西,但就是很抗拒去學,都是找數倉的同事先理好字段直接用,順便以菜雞的身份,同情和膜拜一下埋頭洗臟數的數倉同學,大佬辛苦。。
前段時間加入到一個數據建設的項目中作為先鋒軍打頭陣,沒辦法遇到json還是硬著頭皮終于學會了lateral view用法,感受:困難只是心中的一座大山!也不過如此!
我肯定我過不了幾天一定會忘掉(其實已經忘掉一點了。。),沒有好記性拿起爛筆頭,記錄在這里吧。
基本語法:
select
*
from T t
lateral view json_tuple(t.json_txt,[],[],……) q as item1,item2,……
假設T表中有個json_txt字段取值格式如下:
{
"student_no":"0001",
"student_name":'zhangxiaoxiao',
"class":"高三(1)班",
"score_detail":{
"scoreList":[{"scores":[
{"course":"語文","score":100,"rank":2}
,{"course":"數學","score":120,"rank":9}
,{"course":"英語","score":110,"rank":6}
,{"course":"化學","score":90,"rank":4}
,{"course":"物理","score":90,"rank":3}
,{"course":"生物","score":90,"rank":2}
]
}]
},
"total_score":"600"
"overal_rank":"3",
}
如果我要得到每個學生的所有信息字段,則需要將json中的信息解析出來。
select
t.* ----表中其他原始字段保留
,q.student_no
,q.student_name,
,q.class,
,q.total_score,
,q.overal_rank,
,q.course,q.score,q.rank
from T t
lateral view json_tuple(t.json_txt,
"student_no",
"student_name",
"class",
"total_score",
"overal_rank",
"score_detail.scoreList.[*].scores.[*].course",
"score_detail.scoreList.[*].scores.[*].score"
"score_detail.scoreList.[*].scores.[*].rank"
) q as course,score,rank
得到結果如下:
result
但是score和rank是以數列形式存儲在同一行,不方便計算,用trans_array()函數可以解決啦:
select
trans_array(5,',',student_no,student_name,class,total_score,overal_rank,course,score,rank) as (student_no,student_name,class,total_score,overal_rank,course,score,rank)
from (
select
student_no,student_name,class,total_score,overal_rank
,regexp_replace(course,'(\\[)|(\\])|("))','') as course ---去掉[]"符號
,regexp_replace(score,'(\\[)|(\\])|("))','') as score ---去掉[]"符號
,regexp_replace(rank,'(\\[)|(\\])|("))','') as rank ---去掉[]"符號
from result
) t
得到的結果就是縱列的分數明細:
result2