I. Prerequisites

1. Create the student table in MySQL and load data

  create database test character set utf8;
  use test;
  create table student (id int(6), name varchar(30), sex char(3), age int(3));
  insert into student values (000001, '张三', '男', 18);
  insert into student values (000002, '李四', '男', 20);
  insert into student values (000003, '王五', '女', 20);
  select * from student;

2. Grant privileges

  GRANT ALL on maxwell.* to 'maxwell'@'%' identified by '123456';
  GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE on *.* to 'maxwell'@'%';
  FLUSH PRIVILEGES;

3. Create the Kudu table through Impala

  CREATE TABLE student (
    id BIGINT,
    age INT,
    name STRING,
    sex STRING,
    PRIMARY KEY (id, age)
  )
  PARTITION BY HASH (id) PARTITIONS 4,
  RANGE (age)
  (
    PARTITION VALUES < 20,
    PARTITION 20 <= VALUES < 30,
    PARTITION 30 <= VALUES < 50,
    PARTITION 50 <= VALUES
  ) STORED AS KUDU
  TBLPROPERTIES("kudu.master_addresses" = "cdh1.macro.com:7051");
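
To make the RANGE (age) clause easier to read, here is a minimal Python sketch that maps an age to the range partition it would fall into. It is only an illustration of the boundaries declared above: the names SPLITS, RANGE_LABELS and range_partition are hypothetical, and Kudu's internal hash function for the id column is deliberately not modelled.

```python
from bisect import bisect_right

# Split points taken from the RANGE (age) clause of the CREATE TABLE statement.
SPLITS = [20, 30, 50]
RANGE_LABELS = [
    "VALUES < 20",
    "20 <= VALUES < 30",
    "30 <= VALUES < 50",
    "50 <= VALUES",
]
HASH_BUCKETS = 4  # PARTITION BY HASH (id) PARTITIONS 4


def range_partition(age: int) -> str:
    """Return the label of the range partition that holds the given age."""
    return RANGE_LABELS[bisect_right(SPLITS, age)]


# Ages of the three sample rows inserted into MySQL in step 1.
for name, age in [("张三", 18), ("李四", 20), ("王五", 20)]:
    print(name, "->", range_partition(age))

# Every hash bucket is combined with every range partition.
total_tablets = HASH_BUCKETS * len(RANGE_LABELS)
print("tablets:", total_tablets)
```

With the sample data, the row with age 18 lands in the first range and the two rows with age 20 land in the second; combining 4 hash buckets with 4 ranges gives 16 tablets.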

II. Build the pipeline

1. Create the JDBC Query Consumer


1.1 Edit the configuration


2. Create the Kudu destination


2.1 Edit the configuration


3. Once the connection to the data source succeeds, run the pipeline


III. Testing

1. insert

1.1 Log in to MySQL, insert data, and watch the real-time status in StreamSets


1.2 Query the Kudu table to confirm the rows were written


2. delete

2.1 Delete one row from MySQL


2.2 The real-time status in StreamSets does not change

2.3 Query the Kudu table: the deleted row is still there, which confirms that the pipeline only performs incremental loads

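
The behaviour seen in the delete test can be reproduced with a small simulation. The sketch below assumes the JDBC Query Consumer runs in incremental mode with id as the offset column (the configuration screenshots are not available here, so this is an assumption). An in-memory SQLite database stands in for MySQL, a Python dict stands in for the Kudu table, and the poll function is a hypothetical stand-in for the origin's offset query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL source
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, sex TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "张三", "男", 18), (2, "李四", "男", 20), (3, "王五", "女", 20)],
)


def poll(last_offset: int):
    """Hypothetical incremental query: fetch only rows beyond the saved offset."""
    rows = conn.execute(
        "SELECT id, name, sex, age FROM student WHERE id > ? ORDER BY id",
        (last_offset,),
    ).fetchall()
    new_offset = rows[-1][0] if rows else last_offset
    return rows, new_offset


target = {}  # stand-in for the Kudu table, keyed by id
offset = 0

# The first poll picks up all three rows and advances the offset to 3.
rows, offset = poll(offset)
target.update({row[0]: row for row in rows})

# Deleting a source row produces no row with id > 3, so the next poll is empty.
conn.execute("DELETE FROM student WHERE id = 2")
rows_after_delete, offset = poll(offset)

print(len(rows_after_delete))  # 0
print(2 in target)             # True
```

Because an offset query only ever sees rows whose key is larger than the last one read, a delete on the source never reaches the target. Propagating deletes would need a change-data-capture style origin instead of an offset query.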

IV. Summary

1. Problem 1: the JDBC driver cannot be found


1.1 Resolution steps:

  1. Create the directory /opt/cloudera/parcels/STREAMSETS_DATACOLLECTOR-3.14.0/sdc-extras and assign its ownership to the sdc user.

    mkdir /opt/cloudera/parcels/STREAMSETS_DATACOLLECTOR-3.14.0/sdc-extras
    chown sdc:sdc /opt/cloudera/parcels/STREAMSETS_DATACOLLECTOR-3.14.0/sdc-extras
  2. Set the StreamSets extra libraries path in CM

    export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/cloudera/parcels/STREAMSETS_DATACOLLECTOR/sdc-extras/"


  3. Restart the cluster, then add the JDBC driver in the web UI


  4. After the JDBC driver is added, the cluster process may exit unexpectedly; if it does, restart the cluster again.