Hive Json Serde

  • SERDE 是序列化/反序列化
  • 教程

一、org.apache.hive.hcatalog.data.JsonSerDe

  • hive 表结构, 必须定好结构, 不可以动态支持字段
  • hive
  • spark

测试数据:

  1. {"common":{
  2. "c1":"a",
  3. "c2":"b",
  4. "c3":"xxxx",
  5. "c4":[
  6. "a","b","c"
  7. ]
  8. },
  9. "ip":"192.168.1.1"
  10. }

案例:

  1. ADD JAR /path/hive-hcatalog-core-1.1.0-cdh5.9.0.jar;
  2. -- 创建解析表
  3. CREATE EXTERNAL TABLE source_json_table (
  4. `common` struct<
  5. c1:string,
  6. c2:string,
  7. c3:string,
  8. c4:array<string>
  9. >,
  10. `ip` string
  11. ) PARTITIONED BY (p_dt String)
  12. ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
  13. WITH SERDEPROPERTIES (
  14. -- 当前行 json 解析失败跳过
  15. "ignore.malformed.json"="true"
  16. )
  17. STORED AS TEXTFILE
  18. ;
  19. -- 增加数据
  20. ALTER TABLE source_json_table ADD IF NOT EXISTS PARTITION (p_dt = '2017-01-17') LOCATION '/path/20170117';
  21. -- 查询数据
  22. SELECT common.c1 FROM source_json_table LIMIT 10;

二、org.openx.data.jsonserde.JsonSerDe (推荐)

1. 处理 json 格式

测试数据:

  1. { "common":{
  2. "c1":"a",
  3. "c2":"b",
  4. "c3":"xxxx",
  5. },
  6. "country": "Switzerland",
  7. "languages":["German","French", "Italian"],
  8. "religions":{
  9. "catholic":[10,20],
  10. "protestant":[40,50]
  11. },
  12. "count" : {
  13. "a": 4,
  14. "b": 1,
  15. "c":[
  16. "2345MiniPage.exe",
  17. "2345PinyinCloud.exe",
  18. "2345PinyinUpdate.exe"
  19. ]
  20. }
  21. }

案例一, 自动匹配映射 Start:

  1. ADD JAR /path/json-serde-1.3.7-jar-with-dependencies.jar;
  2. -- 创建解析表
  3. CREATE EXTERNAL TABLE source_json_table (
  4. -- 模式匹配映射
  5. `common` map<string,string>,
  6. -- 映射 字符创
  7. `country` string,
  8. -- 映射数组
  9. `languages` array<string>,
  10. -- 映射嵌套
  11. `religions` map<string,array<int>>,
  12. -- 自定义结构映射
  13. `count` struct<
  14. a:string,
  15. b:string,
  16. c:array<string>
  17. >
  18. ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
  19. WITH SERDEPROPERTIES (
  20. -- 当前行 json 解析失败跳过
  21. "ignore.malformed.json"="true"
  22. ) STORED AS TEXTFILE
  23. LOCATION '/path/20170117';
  24. -- 查询数据(注意不同 map 的访问格式)
  25. SELECT common['c1'],count.a FROM ods.ods_pinyin_click LIMIT 1

2. 处理 json arr 格式

测试数据

  1. ["00:00:35","27.188.84.94","SERVER_RES",1,"F852FB306F94EDCFA59C806F8BDFCD2F","",""]

映射表

  1. CREATE TABLE IF NOT EXISTS source_json_arr_table (
  2. `time` string,
  3. `ip` string,
  4. `server` string,
  5. `num` string,
  6. `mac` string,
  7. `other1` string,
  8. `other2` string
  9. ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
  10. WITH SERDEPROPERTIES (
  11. -- 当前行 json 解析失败跳过
  12. "ignore.malformed.json"="true"
  13. )
  14. STORED AS TEXTFILE
  15. LOCATION '/path/20170117';
  16. ;
  17. SELECT * FROM ods.ods_pinyin_click LIMIT 1

三、org.apache.hadoop.hive.contrib.serde2.JsonSerde 解析

  • HIVE 1.1 后, CDH 5.9 后支持不好, 所以不建议使用了
  1. ADD JAR /path/hive-json-serde-0.2.jar;
  2. -- 创建映射表
  3. CREATE EXTERNAL TABLE IF NOT EXISTS source_json_table (
  4. `uid` string COMMENT 'from deserializer',
  5. `ccid` string COMMENT 'from deserializer'
  6. )
  7. ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
  8. STORED AS TEXTFILE
  9. LOCATION '/path/20170117'
  10. ;

四、永久生效

  1. 方法 1: hive-env.sh
  2. export HIVE_AUX_JARS_PATH=/etc/hive/auxlib/*.jar
  3. 方法 2. hive-site.xml
  4. <property>
  5. <name>hive.aux.jars.path</name>
  6. <value>file:///etc/hive/auxlib/hive-json-serde-0.2.jar</value>
  7. </property>