Summary of Basic Hive Operations
Created: 2021/4/26 17:22
Log in to Hadoop and Hive
[root@master ~]# su - Zzz
[Zzz@master ~]$ start-all.sh
[Zzz@master ~]$ jps
[Zzz@master ~]$ hive
hive>
1. Running shell and HDFS commands inside Hive
Running shell commands
hive> !pwd
/home/Zzz/apache-hive-1.2.1-bin/conf
You can run simple bash shell commands by prefixing them with ! and ending them with a semicolon. The Hive CLI cannot run interactive commands that require user input, and it supports neither shell pipes nor filename auto-completion.
Running HDFS commands
hive> dfs -put /home/Zzz/world.txt /;
hive> dfs -cat /world.txt;
Hello world!
Drop the hadoop keyword, start the command with dfs, and end it with a semicolon. (Running hadoop commands this way is actually more efficient than the equivalent hadoop fs command in a bash shell.)
DDL Operations
1. Databases in Hive
hive> show databases;
OK
default
Time taken: 1.693 seconds, Fetched: 1 row(s)
2. Creating a database
hive> create database day0722;
hive> show databases;
OK
day0722
default
Time taken: 0.041 seconds, Fetched: 2 row(s)
Creating a database that already exists throws an exception; add if not exists to avoid it:
hive> create database if not exists ducl;
OK
Time taken: 0.137 seconds
3. Viewing a database's description
hive> describe database ducl;
OK
ducl hdfs://master:9000/user/hive/warehouse/ducl.db Zzz USER
Time taken: 0.093 seconds, Fetched: 1 row(s)
4. Dropping a database
- Drop its tables first, then the database
- Or force-drop everything with cascade
hive> drop database if exists ducl_test cascade; -- keywords are case-insensitive
OK
Time taken: 1.814 seconds
2. Creating tables
Internal and external tables
A table created without the external keyword is an internal (managed) table; one created with external is an external table.
Once the table exists, load a data file into it:
1. load data local inpath copies the file from the local filesystem
2. load data inpath moves the file on HDFS into the table's directory
Dropping an internal table deletes the metadata and the data together;
dropping an external table deletes only the metadata, not the data; recreating the table and repairing its partitions restores it.
-- Default warehouse path
Internal table: created by default under /user/hive/warehouse/database.db/xm_testA
External table: created by default under /user/hive/warehouse/database.db/xm_testB
-- drop table
Internal table: dropping it removes both the metadata and the files under its path
External table: dropping it removes only the metadata, leaving the files under its path
-- load
Internal table: the data is moved under the table's own path
External table: the data is not moved into the table's warehouse directory, which shows the external table does not manage its own data
Differences:
1) Hive manages both the lifecycle and the data of an internal table: the table structure and the data are both managed by Hive, so dropping the table also deletes its data. For an external table, Hive manages only the table structure; the data lives on HDFS, so dropping the table removes the structure but not the data.
2) Dropping an internal table deletes the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS remain.
3) Changes to an internal table are synced to the metadata immediately, while after changing an external table's structure or partitions you must repair it (MSCK REPAIR TABLE table_name;).
4) Internal table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse), but both internal and external tables can specify their own location.
- Creating an internal table
hive> use ducl;
hive>
create table student(
id int,
name string,
age int)
row format delimited -- default Hive row format
fields terminated by '\t' -- fields separated by tabs
stored as textfile; -- file format
hive> show tables;
OK
student
Time taken: 0.099 seconds, Fetched: 1 row(s)
hive> desc student;
OK
id int
name string
age int
Time taken: 0.204 seconds, Fetched: 3 row(s)
// View the detailed create-table statement
hive> show create table student;
OK
CREATE TABLE student( -- create-table statement
  id int,
  name string,
  age int)
ROW FORMAT DELIMITED -- field delimiter format
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT -- input format
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT -- output format
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION -- storage location
  'hdfs://master:9000/user/hive/warehouse/ducl.db/student'
TBLPROPERTIES ( -- last DDL timestamp
  'transient_lastDdlTime'='1563785203')
Time taken: 0.361 seconds, Fetched: 14 row(s)
- Creating an external table
hive>
create external table student2(
id int,
name string,
age int)
row format delimited
fields terminated by '\t'
location '/student2'; -- set the storage path yourself
The location is a directory on HDFS; if you don't set one, the table is stored under the default path /user/hive/warehouse.
Table-creation strategy
If the data is critical and must not be deleted casually, create an external table: even if you drop the table and lose its structure (metadata), the data still exists on HDFS.
- Creating a partitioned table (improves query efficiency)
hive>
create table student3(
id int,
name string,
age int)
partitioned by(sex string) -- static partition: the table is partitioned by sex
row format delimited
fields terminated by '\t'
stored as textfile;
hive> desc student3;
- Creating a bucketed table
hive>
create table student4(
id int,
name string,
age int)
clustered by(id) sorted by(id desc) into 3 buckets -- sort id descending, split into 3 buckets
row format delimited
fields terminated by '\t';
3. Altering tables
- Rename a table
hive> alter table student4 rename to student_buck;
OK
Time taken: 0.892 seconds
- Add columns
hive> alter table student add columns(address string);
OK
Time taken: 0.267 seconds
hive> desc student;
OK
id int
name string
age int
address string
Time taken: 0.137 seconds, Fetched: 4 row(s)
- Change a column
hive> alter table student change column id number int;
OK
Time taken: 1.194 seconds
hive> desc student;
OK
number int
name string
age int
address string
Time taken: 0.143 seconds, Fetched: 4 row(s)
- Replace columns
hive> alter table student replace columns(id int,name string,age int);
OK
Time taken: 0.175 seconds
hive> desc student;
OK
id int
name string
age int
Time taken: 0.136 seconds, Fetched: 3 row(s)
- Add a partition (the table must already have a partition column for this to succeed)
hive>
alter table student3
add partition(sex='male')
location '/user/hive/warehouse/ducl.db/student3/sex=male';
OK
Time taken: 0.722 seconds
hive> desc student3;
OK
id int
name string
age int
sex string
# Partition Information
# col_name data_type comment
sex string
Time taken: 0.117 seconds, Fetched: 9 row(s)
hive> show create table student3;
OK
CREATE TABLE student3(
  id int,
  name string,
  age int)
PARTITIONED BY (
  sex string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://master:9000/user/hive/warehouse/ducl.db/student3'
TBLPROPERTIES (
  'transient_lastDdlTime'='1563786075')
Time taken: 0.089 seconds, Fetched: 16 row(s)
- Drop a partition
hive> alter table student3 drop partition(sex='male');
Dropped the partition sex=male
OK
Time taken: 1.316 seconds
Adding the partition creates a sex=male directory under /user/hive/warehouse/ducl.db/student3/; dropping the partition removes that directory.
DML Operations
- Sample data (preparation)
[Zzz@master ~]$ mkdir data
[Zzz@master ~]$ cd data
[Zzz@master data]$ vi student1.txt
[Zzz@master data]$ vi student2.txt
[Zzz@master data]$
[Zzz@master data]$ ls
student1.txt student2.txt
Note: fields in the data files are separated by tabs (Tab)
student1.txt
1001 shiny 23
1002 cendy 22
1003 angel 23
1009 ella 21
1012 eva 24
student2.txt
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
- Loading data
hive> use ducl;
hive> show tables;
OK
student
student2
student3
student_buck
Time taken: 0.162 seconds, Fetched: 4 row(s)
Upload a file to HDFS
hive> dfs -put /home/Zzz/data/student1.txt /student2;
hive> dfs -ls /student2;
Found 1 items
-rw-r--r-- 1 Zzz supergroup 67 2019-07-23 10:23 /student2/student1.txt
View the data
hive> select * from student2;
OK
1001 shiny 23
1002 cendy 22
1003 angel 23
1009 ella 21
1012 eva 24
Time taken: 5.158 seconds, Fetched: 5 row(s)
- load (the first way to load data into Hive)
1. Load from the local filesystem into Hive
hive> load data local inpath '/home/Zzz/data/student2.txt' into table student;
hive> select * from student;
OK
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Time taken: 0.36 seconds, Fetched: 5 row(s)
hive> load data local inpath '/home/Zzz/data/student1.txt' into table student;
hive> select * from student;
OK
1001 shiny 23
1002 cendy 22
1003 angel 23
1009 ella 21
1012 eva 24
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Time taken: 0.111 seconds, Fetched: 10 row(s)
Overwrite the existing data
hive> load data local inpath '/home/Zzz/data/student2.txt' overwrite into table student;
hive> select * from student;
OK
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Time taken: 0.129 seconds, Fetched: 5 row(s)
2. Load from HDFS into Hive
hive> dfs -mkdir /data;
hive> dfs -put /home/Zzz/data/student1.txt /data;
hive> dfs -put /home/Zzz/data/student2.txt /data;
hive> dfs -ls /data;
Found 2 items
-rw-r--r-- 1 Zzz supergroup 67 2019-07-23 11:09 /data/student1.txt
-rw-r--r-- 1 Zzz supergroup 64 2019-07-23 11:09 /data/student2.txt
hive> load data inpath '/data/student1.txt' into table student3 partition(sex='female');
hive> dfs -ls /user/hive/warehouse/ducl.db/student3;
Found 1 items
drwxr-xr-x - Zzz supergroup 0 2019-07-23 11:11 /user/hive/warehouse/ducl.db/student3/sex=female
hive> select * from student3;
OK
1001 shiny 23 female
1002 cendy 22 female
1003 angel 23 female
1009 ella 21 female
1012 eva 24 female
Time taken: 0.247 seconds, Fetched: 5 row(s)
3. Insert into a table from a query (the second way to load data into Hive)
hive> insert into table student_buck select * from student;
hive> select * from student_buck;
OK
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Time taken: 0.114 seconds, Fetched: 5 row(s)
4. Create a table from a single query, loading data while creating the table (the third way)
hive> create table student5 as select * from student;
hive> select * from student5;
OK
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Time taken: 0.15 seconds, Fetched: 5 row(s)
- Exporting data
1. Export to the local filesystem
Export with a field delimiter:
hive> insert overwrite local directory '/home/Zzz/data/student'
row format delimited
fields terminated by '\t'
select * from student;
View locally
[Zzz@master ~]$ cd data/student
[Zzz@master student]$ ls
000000_0
[Zzz@master student]$ cat 000000_0
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
2. Export to HDFS
hive> insert overwrite directory '/student3' row format delimited fields terminated by '\t' select * from student;
hive> dfs -cat /student3/000000_0;
1005 bob 24
1006 mark 23
1007 leo 22
1011 JACK 23
1014 JAMES 24
Databases and data warehouses
Database categories: relational and non-relational
Database: data organized and stored according to a defined structure, with multiple ways to manage it
Data warehouse:
a subject-oriented, integrated, time-variant, non-updatable collection of data (historical data)
--------------------------------------------------------------------------------
Partitioned tables:
static partition tables and dynamic partition tables
Prepare the data
[Zzz@master data]$ vi stu_messages.txt
5 男 16 北京 13754554587 2015-03-24
6 女 17 北京 13872374170 2017-02-20
2 女 16 北京 17938844884 2015-05-26
23 男 15 北京 13700000033 2016-07-25
4 男 17 北京 15257575773 2017-08-11
3 女 15 北京 15885888889 2018-05-03
10 男 16 北京 14098593989 2018-04-06
15 女 14 北京 14938983000 2019-06-24
Load the local data into the table
hive> load data local inpath '/home/Zzz/data/stu_messages.txt' into table stu_messages;
View:
hive> select * from stu_messages;
OK
5 男 16 北京 13754554587 2015-03-24
6 女 17 北京 13872374170 2017-02-20
2 女 16 北京 17938844884 2015-05-26
23 男 15 北京 13700000033 2016-07-25
4 男 17 北京 15257575773 2017-08-11
3 女 15 北京 15885888889 2018-05-03
10 男 16 北京 14098593989 2018-04-06
15 女 14 北京 14938983000 2019-06-24
Enable dynamic partitioning (these settings must be set again every time Hive starts)
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> set hive.exec.dynamic.partition=true;
Create the table
hive>
create table if not exists stu_mess(
stu_id int,
sex string,
age int,
address string,
tel_num string,
ts date,
y int,
m int)
row format delimited
fields terminated by '\t'
stored as textfile;
Insert data
hive> insert into table stu_mess select stu_id,sex,age,address,tel_num,ts,year(ts)as y,month(ts)as m from stu_messages;
hive> select * from stu_mess;
OK
5 男 16 北京 13754554587 2015-03-24 2015 3
6 女 17 北京 13872374170 2017-02-20 2017 2
2 女 16 北京 17938844884 2015-05-26 2015 5
23 男 15 北京 13700000033 2016-07-25 2016 7
4 男 17 北京 15257575773 2017-08-11 2017 8
3 女 15 北京 15885888889 2018-05-03 2018 5
10 男 16 北京 14098593989 2018-04-06 2018 4
15 女 14 北京 14938983000 2019-06-24 2019 6
Time taken: 0.161 seconds, Fetched: 8 row(s)
Create the dynamic partition table
hive>
create table if not exists stu_mess_part(
stu_id int,
sex string,
age int,
address string,
tel_num string,
ts date)
partitioned by(y int,m int)
row format delimited
fields terminated by '\t'
stored as textfile;
Insert the data; on HDFS you can see the partition directories
hive> insert overwrite table stu_mess_part partition(y,m) select * from stu_mess;
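Conceptually, a dynamic-partition insert like the one above routes each row to a directory derived from its (y, m) values. A minimal Python sketch of that routing, using a subset of the stu_mess rows and mirroring Hive's y=YYYY/m=M directory convention (the path strings are illustrative, not Hive output):

```python
from collections import defaultdict

# Each distinct (y, m) pair produced by the SELECT becomes its own
# partition directory; rows are grouped under it.
rows = [(5, "2015-03-24"), (6, "2017-02-20"), (2, "2015-05-26"), (23, "2016-07-25")]

partitions = defaultdict(list)
for stu_id, ts in rows:
    y, m = int(ts[:4]), int(ts[5:7])  # year(ts), month(ts)
    partitions[f"stu_mess_part/y={y}/m={m}"].append(stu_id)

# Four distinct (y, m) pairs -> four partition directories.
```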
Creating bucketed tables
Hive hashes the column value and takes the remainder modulo the number of buckets to decide which bucket each record is stored in.
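That bucketing rule can be sketched in Python. This is an approximation: for integer keys Hive's hash is effectively the value itself, so hash(id) % num_buckets models it here, and the ids are illustrative:

```python
def bucket_for(key: int, num_buckets: int) -> int:
    # Hive-style bucket assignment: hash the column value, then take
    # the remainder modulo the number of buckets.
    return hash(key) % num_buckets

ids = [95001, 95002, 95003, 95004, 95005, 95006]
buckets = {}
for i in ids:
    buckets.setdefault(bucket_for(i, 3), []).append(i)
# Every row lands in exactly one of the 3 buckets.
```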
Prepare the data
[Zzz@master data]$ vi stu_buck.txt
95003,王敏,女,22,MA
95004,张立,男,19,IS
95005,刘刚,男,18,MA
95006,孙庆,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95009,梦圆圆,女,18,MA
95010,孔小涛,男,19,CS
95011,包小柏,男,18,MA
95012,孙花,女,20,CS
95013,冯伟,男,21,CS
95014,王小丽,女,19,CS
95015,王君,男,18,MA
95016,钱国,男,21,MA
95017,王风娟,女,18,IS
95018,王一,女,19,IS
95019,邢小丽,女,19,IS
95020,赵钱,男,21,IS
95021,周二,男,17,MA
95022,郑明,男,20,MA
Enable bucketing
hive> set hive.enforce.bucketing=true;
Set the number of buckets (reducers)
hive> set mapreduce.job.reduces=3;
Create bucketed table 1
hive>
create table if not exists stu_buck(
sno int,
sname string,
ssex string,
sage int,
sdept string)
clustered by(sno) into 3 buckets
row format delimited
fields terminated by ','
stored as textfile;
Create bucketed table 2
hive>
create table if not exists students(
sno int,
sname string,
ssex string,
sage int,
sdept string)
clustered by(sno) into 3 buckets
row format delimited
fields terminated by ',';
Load the local data into bucketed table 2
hive> load data local inpath '/home/Zzz/data/stu_buck.txt' into table students;
Insert from bucketed table 2 into bucketed table 1
hive> insert into table stu_buck select * from students distribute by (sno) sort by(sno);
View the bucket data on HDFS
hive> dfs -ls /user/hive/warehouse/ducl.db/stu_buck;
Found 3 items
-rwxr-xr-x 1 Zzz supergroup 193 2019-07-23 14:44 /user/hive/warehouse/ducl.db/stu_buck/000000_0
-rwxr-xr-x 1 Zzz supergroup 170 2019-07-23 14:44 /user/hive/warehouse/ducl.db/stu_buck/000001_0
-rwxr-xr-x 1 Zzz supergroup 164 2019-07-23 14:44 /user/hive/warehouse/ducl.db/stu_buck/000002_0
hive> dfs -cat /user/hive/warehouse/ducl.db/stu_buck/000000_0;
95001,李勇,男,20,CS
95019,邢小丽,女,19,IS
95016,钱国,男,21,MA
95013,冯伟,男,21,CS
95022,郑明,男,20,MA
95010,孔小涛,男,19,CS
95007,易思玲,女,19,MA
95004,张立,男,19,IS
Partitioning plus bucketing
Set the number of buckets
hive> set mapreduce.job.reduces=4;
Create a partitioned, bucketed table
hive>
create table if not exists stu_mess_part_buck(
stu_id int,
sex string,
age int,
address string,
tel_num string,
ts date)
partitioned by (y int,m int)
clustered by(stu_id) into 4 buckets
row format delimited
fields terminated by '\t'
stored as textfile;
Insert data with dynamic partitioning
hive> insert into table stu_mess_part_buck partition(y,m) select * from stu_mess;
hive> dfs -cat /user/hive/warehouse/ducl.db/stu_mess_part_buck/y=2017/m=8/000000_1;
4 男 17 北京 15257575773 2017-08-11
Bucket-based sampling
hive> select * from stu_mess_part_buck tablesample(bucket 1 out of 4);
OK
4 男 17 北京 15257575773 2017-08-11 2017 8
Time taken: 0.136 seconds, Fetched: 1 row(s)
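The semantics of tablesample(bucket x out of y) can be sketched as follows. Assuming y equals the table's bucket count, the sample is simply the x-th bucket; the id list mirrors the stu_mess rows above:

```python
def tablesample(ids, x, y):
    # BUCKET x OUT OF y: keep the rows whose bucket index (hash % y)
    # equals x - 1 (x is 1-based).
    return [i for i in ids if hash(i) % y == x - 1]

stu_ids = [5, 6, 2, 23, 4, 3, 10, 15]  # ids from stu_mess
sample = tablesample(stu_ids, 1, 4)    # bucket 1 out of 4
# Only stu_id 4 falls in bucket 1, matching the single row returned above.
```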
Join operations (multi-table joins)
hive> create database day0723;
hive> use day0723;
hive>
create table studenta(
id int,
name string)
row format delimited
fields terminated by '\t'
stored as textfile;
hive>
create table studentb(
id int,
age int)
row format delimited
fields terminated by '\t'
stored as textfile;
hive> show tables;
OK
studenta
studentb
Time taken: 0.042 seconds, Fetched: 2 row(s)
Prepare the data
[Zzz@master data]$ vi studenta.txt
[Zzz@master data]$ vi studentb.txt
studenta.txt
10001 shiny
10002 mark
10003 angel
10005 ella
10009 jack
10014 eva
10018 judy
10020 cendy
studentb.txt
10001 23
10004 22
10007 24
10008 21
10009 25
10012 25
10015 20
10018 19
10020 26
Load the data (upload from local)
hive> load data local inpath '/home/Zzz/data/studenta.txt' into table studenta;
hive> load data local inpath '/home/Zzz/data/studentb.txt' into table studentb;
View the data
hive> select * from studenta;
OK
10001 shiny
10002 mark
10003 angel
10005 ella
10009 jack
10014 eva
10018 judy
10020 cendy
Time taken: 0.136 seconds, Fetched: 8 row(s)
hive> select * from studentb;
OK
10001 23
10004 22
10007 24
10008 21
10009 25
10012 25
10015 20
10018 19
10020 26
Time taken: 0.076 seconds, Fetched: 9 row(s)
Inner join
hive> select * from studenta a join studentb b on a.id=b.id;
10001 shiny 10001 23
10009 jack 10009 25
10018 judy 10018 19
10020 cendy 10020 26
Left outer join: keeps every row from the left table; where there is no match, the right side's columns are filled with NULL
hive> select * from studenta a left join studentb b on a.id=b.id;
10001 shiny 10001 23
10002 mark NULL NULL
10003 angel NULL NULL
10005 ella NULL NULL
10009 jack 10009 25
10014 eva NULL NULL
10018 judy 10018 19
10020 cendy 10020 26
Right outer join: keeps every row from the right table; where there is no match, the left side's columns are filled with NULL
hive> select * from studenta a right join studentb b on a.id=b.id;
10001 shiny 10001 23
NULL NULL 10004 22
NULL NULL 10007 24
NULL NULL 10008 21
10009 jack 10009 25
NULL NULL 10012 25
NULL NULL 10015 20
10018 judy 10018 19
10020 cendy 10020 26
Full outer join: the union of both tables' rows, matched where possible
hive> select * from studenta a full join studentb b on a.id=b.id;
NULL NULL 10004 22
NULL NULL 10008 21
NULL NULL 10012 25
10020 cendy 10020 26
10001 shiny 10001 23
10005 ella NULL NULL
10009 jack 10009 25
10002 mark NULL NULL
10014 eva NULL NULL
10018 judy 10018 19
10003 angel NULL NULL
NULL NULL 10007 24
NULL NULL 10015 20
Left semi join: returns the left table's rows that satisfy the join condition (the left half of an inner join)
hive> select * from studenta a left semi join studentb b on a.id=b.id;
10001 shiny
10009 jack
10018 judy
10020 cendy
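The four join behaviors above can be summarized with a small Python sketch over the two tables' ids (data copied from studenta/studentb; None stands in for NULL):

```python
a = {10001: "shiny", 10002: "mark", 10003: "angel", 10005: "ella",
     10009: "jack", 10014: "eva", 10018: "judy", 10020: "cendy"}
b = {10001: 23, 10004: 22, 10007: 24, 10008: 21, 10009: 25,
     10012: 25, 10015: 20, 10018: 19, 10020: 26}

inner = sorted(set(a) & set(b))                   # join: ids present in both
left = [(k, a[k], b.get(k)) for k in sorted(a)]   # left join: NULL-pad the right side
right = [(k, a.get(k), b[k]) for k in sorted(b)]  # right join: NULL-pad the left side
full = sorted(set(a) | set(b))                    # full join: every id from either side
semi = [(k, a[k]) for k in sorted(a) if k in b]   # left semi join: a's columns only
```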
Data types
1. Primitive types (numeric, boolean, string, timestamp, date)
2. Complex types (array, map, struct)
Examples of complex types
- array
Prepare the data
[Zzz@master data]$ vi employee.txt
Note: fields are separated by tabs (Tab)
shiny 23 beijing,tianjin,qingdao
jack 34 shanghai,guangzhou
mark 26 beijing,xian
ella 21 beijing
judy 30 shanghai,hangzhou,chongqing
cendy 28 beijing,shanghai,dalian,chengdu
Create a table with an array column
hive>
create table if not exists employee(
name string,
age int,
work_location array<string>)
row format delimited
fields terminated by '\t'
collection items terminated by ','; -- delimiter between array elements
Load the local data into the table
hive> load data local inpath '/home/Zzz/data/employee.txt' into table employee;
hive> select * from employee;
shiny 23 ["beijing","tianjin","qingdao"]
jack 34 ["shanghai","guangzhou"]
mark 26 ["beijing","xian"]
ella 21 ["beijing"]
judy 30 ["shanghai","hangzhou","chongqing"]
cendy 28 ["beijing","shanghai","dalian","chengdu"]
Query by array index
hive> select work_location[0] from employee;
OK
beijing
shanghai
beijing
beijing
shanghai
beijing
Time taken: 0.216 seconds, Fetched: 6 row(s)
hive> select work_location[1] from employee;
OK
tianjin
guangzhou
xian
NULL
hangzhou
shanghai
Time taken: 0.115 seconds, Fetched: 6 row(s)
hive> select work_location[2] from employee;
OK
qingdao
NULL
NULL
NULL
chongqing
dalian
Time taken: 0.073 seconds, Fetched: 6 row(s)
hive> select work_location[3] from employee;
OK
NULL
NULL
NULL
NULL
NULL
chengdu
Time taken: 0.084 seconds, Fetched: 6 row(s)
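As the last queries show, indexing past the end of a Hive array yields NULL rather than an error. A sketch of that behavior, with None modeling NULL:

```python
def hive_array_index(arr, i):
    # Hive's arr[i]: returns NULL (None here) when i is out of range.
    return arr[i] if 0 <= i < len(arr) else None

work_location = ["beijing", "tianjin", "qingdao"]  # shiny's row
in_range = hive_array_index(work_location, 1)      # "tianjin"
out_of_range = hive_array_index(work_location, 3)  # None, shown as NULL by Hive
```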
- map
Prepare the data
[Zzz@master data]$ vi scores.txt
Note: fields are separated by tabs (Tab)
shiny chinese:90,math:100,english:99
mark chinese:89,math:56,english:87
judy chinese:94,math:78,english:81
ella chinese:54,math:23,english:48
jack chinese:100,math:95,english:69
cendy chinese:67,math:83,english:45
Create a table with a map column
hive>
create table if not exists scores(
name string,
scores map<string,int>)
row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
stored as textfile;
Load the local data into the table
hive> load data local inpath '/home/Zzz/data/scores.txt' into table scores;
hive> select * from scores;
shiny {"chinese":90,"math":100,"english":99}
mark {"chinese":89,"math":56,"english":87}
judy {"chinese":94,"math":78,"english":81}
ella {"chinese":54,"math":23,"english":48}
jack {"chinese":100,"math":95,"english":69}
cendy {"chinese":67,"math":83,"english":45}
Query map values
hive> select name,scores['math'] from scores;
OK
shiny 100
mark 56
judy 78
ella 23
jack 95
cendy 83
Time taken: 0.105 seconds, Fetched: 6 row(s)
hive> select name,scores['chinese'] from scores;
OK
shiny 90
mark 89
judy 94
ella 54
jack 100
cendy 67
Time taken: 0.055 seconds, Fetched: 6 row(s)
hive> select name,scores['english'] from scores;
OK
shiny 99
mark 87
judy 81
ella 48
jack 69
cendy 45
Time taken: 0.068 seconds, Fetched: 6 row(s)
- struct
Prepare the data
[Zzz@master data]$ vi coursescore.txt
Note: fields are separated by tabs (Tab)
1 chinese,100
2 math,98
3 english,99
4 computer,78
Create a table with a struct column
hive>
create table if not exists coursescore(
id int,
course struct<name:string,score:int>)
row format delimited
fields terminated by '\t'
collection items terminated by ','
stored as textfile;
Load the local data into the table
hive> load data local inpath '/home/Zzz/data/coursescore.txt' into table coursescore;
hive> select * from coursescore;
OK
1 {"name":"chinese","score":100}
2 {"name":"math","score":98}
3 {"name":"english","score":99}
4 {"name":"computer","score":78}
Time taken: 0.072 seconds, Fetched: 4 row(s)
Query struct fields
hive> select id,course.name,course.score from coursescore;
OK
1 chinese 100
2 math 98
3 english 99
4 computer 78
Time taken: 0.078 seconds, Fetched: 4 row(s)