Data Import - Basic Data Import - "Big Data Fundamentals"

Loading with load (importing into a Hive table)
Importing with insert

Loading with load (importing into a Hive table)

- from the Linux local disk
- from a file on HDFS

Importing data from the Linux local disk into Hive

Example: load data local inpath '/home/hadoop/student.txt' into table stu_managed ;
Verify: select * from stu_managed limit 3;
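load data copies or moves the file as it is, without parsing or converting it, so the file layout has to match the table's row format. The sketch below shows one way to prepare a matching file and table. It assumes a comma-delimited layout with five columns (id, name, sex, age, dept) inferred from the insert example in this section; the table name stu_demo, the column names, the delimiter, and the file names student_sample.txt and load_local.hql are all hypothetical.

```shell
#!/bin/sh
# Sketch only: table name, columns, and delimiter are assumptions,
# not the course's actual schema.
set -e

# 1. A small comma-delimited sample file (hypothetical rows).
cat > student_sample.txt <<'EOF'
1,haha,n,1,MA
2,hehe,n,2,MA
3,xixi,n,1,MA
EOF

# 2. HiveQL script: the table's field delimiter must match the file,
#    because load data does not parse or convert the data.
cat > load_local.hql <<EOF
create table if not exists stu_demo (
  id int, name string, sex string, age int, dept string
) row format delimited fields terminated by ',';
load data local inpath '$PWD/student_sample.txt' into table stu_demo;
select * from stu_demo limit 3;
EOF

# 3. Run it only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f load_local.hql
else
  echo "hive not found; generated load_local.hql only"
fi
```

If the delimiter in the file and in the table definition disagree, the load itself still succeeds, but the select typically returns NULL columns.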

Importing data from HDFS into Hive

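How do you upload a file from the Linux disk to HDFS from the shell? Either command works:
1. hdfs dfs -put '/home/hadoop/student.txt' '/'
2. hadoop dfs -put '/home/hadoop/student.txt' '/'
Note: hadoop dfs is deprecated in current Hadoop releases; hdfs dfs or hadoop fs is preferred.
Then load the uploaded file into the table:
load data inpath '/student.txt' into table stu_managed ;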
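Summary:
Loading from the Linux local disk into Hive: the file is copied, so the source file stays in place.
Loading from HDFS into Hive: the file is moved (cut), so it disappears from its original HDFS path.
Verify that the local file is still there: tail -3 student.txt or cat student.txt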
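The copy versus move difference can be pictured with ordinary local files. The sketch below is only an analogy using cp and mv in a scratch directory (the directory load_demo and the file names are made up); it does not touch Hive or HDFS.

```shell
#!/bin/sh
# Local analogy only: cp stands in for "load data local inpath" (copy),
# mv stands in for "load data inpath" (move). All paths are hypothetical.
set -e
rm -rf load_demo
mkdir -p load_demo/local load_demo/hdfs load_demo/warehouse

echo "1,haha,n,1,MA" > load_demo/local/student.txt
echo "1,haha,n,1,MA" > load_demo/hdfs/student.txt

# Copy semantics: the source is still there afterwards.
cp load_demo/local/student.txt load_demo/warehouse/from_local.txt
# Move semantics: the source disappears from its original path.
mv load_demo/hdfs/student.txt load_demo/warehouse/from_hdfs.txt

[ -f load_demo/local/student.txt ] && echo "local source: still present"
[ -f load_demo/hdfs/student.txt ] || echo "hdfs source: gone"
```

On a real cluster the corresponding check after load data inpath would be hdfs dfs -ls / , where student.txt should no longer be listed.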
Importing with insert
Single-row insert
insert into table stu_managed values(1,"haha",'n',1,'MA');
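Notes:
1) The row is imported by appending: a new file is added to the table's directory rather than an existing file being rewritten.
2) Hive first creates a temporary table (one more entry in the metastore database),
writes the data into the temporary table (HDFS: 3 replicas),
then moves the temporary table's data into stu_managed, a managed (internal) table
(HDFS: 3 replicas).
3) Single-row inserts produce a large number of small files, which puts heavy pressure on the NameNode
(and may even bring the NameNode process down).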
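One way to reduce the small-file problem described in note 3 is to send several rows in a single statement, so that one write produces one batch of output instead of one per row. The sketch below writes such a statement to a script file. The extra rows and the file name batch_insert.hql are invented for illustration, the column order is assumed to match the single-row example above, and insert ... values requires Hive 0.14 or later.

```shell
#!/bin/sh
# Sketch only: the rows below are hypothetical sample data.
set -e

cat > batch_insert.hql <<'EOF'
-- One statement, three rows: one write instead of three.
insert into table stu_managed values
  (2, 'hehe', 'n', 2, 'MA'),
  (3, 'xixi', 'n', 1, 'MA'),
  (4, 'heihei', 'n', 2, 'MA');
EOF

# Run it only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f batch_insert.hql
else
  echo "hive not found; generated batch_insert.hql only"
fi
```

For larger volumes, collecting the rows in a file and importing it once with load is usually the better fit.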
Single insert (insert from a query)
insert into table stu_managed
select * from stu_managed;
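Multi-insert
Use case: one data source is queried and the results go to several output destinations
Requirement:
Data source: stu_managed
Rows with age 1 -> stu_t1; rows with age 2 -> stu_t2
Multi-insert: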
from stu_managed
insert into table stu_t1 select * where age = 1
insert into table stu_t2 select * where age = 2; -- multi-insert: the data source is scanned only once
Single insert:
insert into table stu_t1
select * from stu_managed where age = 1;
insert into table stu_t2
select * from stu_managed where age = 2; -- two single inserts: the data source is scanned twice
If this fails because stu_t1 and stu_t2 do not exist, a quick way to create a table is to copy the schema of an existing one:
create table stu_t1 like stu_managed ;
create table stu_t2 like stu_managed ;
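The statements above can be combined into one script that creates the target tables if needed, runs the multi-insert, and checks the result. This is a sketch: the file name multi_insert.hql, the if not exists guards, and the count queries are additions, and it assumes stu_managed has an age column as the examples above imply.

```shell
#!/bin/sh
# Sketch only: writes the HiveQL to a file and runs it if hive is available.
set -e

cat > multi_insert.hql <<'EOF'
-- Create the targets with the same schema as the source (no data is copied).
create table if not exists stu_t1 like stu_managed;
create table if not exists stu_t2 like stu_managed;

-- Multi-insert: stu_managed is read once and fans out to both targets.
from stu_managed
insert into table stu_t1 select * where age = 1
insert into table stu_t2 select * where age = 2;

-- Quick check of how many rows landed in each target.
select count(*) from stu_t1;
select count(*) from stu_t2;
EOF

# Run it only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f multi_insert.hql
else
  echo "hive not found; generated multi_insert.hql only"
fi
```

Because the script uses insert into, running it twice appends the matching rows twice.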

Tip: in order of preference, use multi-insert first, then single insert, then single-row insert.
