CDC is short for Change Data Capture. The core idea is to monitor and capture changes in a database (inserts, updates, and deletes of rows and tables), record those changes completely and in the order they occur, and write them to a message queue for other services to subscribe to and consume.
CDC comes in two main flavors, query-based and Binlog-based, which differ as follows:
|  | Query-based CDC | Binlog-based CDC |
| --- | --- | --- |
| Open-source tools | DataX, Sqoop, Kafka JDBC Source | Canal, Maxwell, Debezium, Flink CDC |
| Execution mode | Batch | Streaming |
| Latency | High | Low |
| Extra load on the database | Yes | No |
Query-based CDC is not covered further here; the rest of this post introduces Flink CDC, open-sourced by the Flink community.
The Flink community has developed the flink-cdc-connectors component, a source connector that reads full snapshots and incremental change data directly from databases such as MySQL and PostgreSQL. It is open source:
https://github.com/ververica/flink-cdc-connectors
Community docs: https://ververica.github.io/flink-cdc-connectors/master/
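To make that concrete: in Flink SQL, a CDC-backed table is declared with ordinary DDL plus connector options. Below is a minimal sketch for the MySQL connector, assuming the mydb.products table created later in this post and the credentials from the docker-compose file below (option names follow the flink-cdc-connectors 2.0.x documentation):

```sql
-- Flink SQL table backed by the MySQL binlog via the mysql-cdc connector.
-- Flink first reads a consistent snapshot of mydb.products, then streams
-- every subsequent insert/update/delete as a changelog.
CREATE TABLE products (
  id INT,
  name STRING,
  description STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',   -- provided by flink-sql-connector-mysql-cdc
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',       -- matches MYSQL_ROOT_PASSWORD below
  'database-name' = 'mydb',
  'table-name' = 'products'
);
```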
Set up the Docker environment
Create a docker-compose.yml file:
```yaml
version: '2.1'
services:
  postgres:
    image: debezium/example-postgres:1.1
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
  mysql:
    image: debezium/example-mysql:1.1
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=123456
      - MYSQL_USER=mysqluser
      - MYSQL_PASSWORD=mysqlpw
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
  kibana:
    # Kept on the same version as Elasticsearch above; mismatched versions
    # can prevent Kibana from connecting.
    image: elastic/kibana:7.10.2
    ports:
      - "5601:5601"
```
Start the containers:

```bash
docker-compose up -d
```
Prepare the Flink environment
1. Install Flink
Download Flink 1.13.3 from https://downloads.apache.org/flink/ and extract it.
2. Install the SQL connectors
Download the dependency JARs listed below and place them under flink-1.13.3/lib/, then restart the Flink cluster so the new JARs are picked up:
- flink-sql-connector-elasticsearch7_2.11-1.13.3.jar
- flink-sql-connector-mysql-cdc-2.0.2.jar
- flink-sql-connector-postgres-cdc-2.0.2.jar
3. Prepare the data
Connect to MySQL:

```bash
docker-compose exec mysql mysql -uroot -p123456
```

Create the schema and seed data:

```sql
-- MySQL
CREATE DATABASE mydb;
USE mydb;

CREATE TABLE products (
  id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(512)
);
ALTER TABLE products AUTO_INCREMENT = 101;

INSERT INTO products
VALUES (default, "scooter", "Small 2-wheel scooter"),
       (default, "car battery", "12V car battery"),
       (default, "12-pack drill bits", "12-pack of drill bits with sizes ranging from #40 to #3"),
       (default, "hammer", "12oz carpenter's hammer"),
       (default, "hammer", "14oz carpenter's hammer"),
       (default, "hammer", "16oz carpenter's hammer"),
       (default, "rocks", "box of assorted rocks"),
       (default, "jacket", "water resistent black wind breaker"),
       (default, "spare tire", "24 inch spare tire");

CREATE TABLE orders (
  order_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  order_date DATETIME NOT NULL,
  customer_name VARCHAR(255) NOT NULL,
  price DECIMAL(10, 5) NOT NULL,
  product_id INTEGER NOT NULL,
  order_status BOOLEAN NOT NULL -- Whether order has been placed
) AUTO_INCREMENT = 10001;

INSERT INTO orders
VALUES (default, '2020-07-30 10:08:22', 'Jark', 50.50, 102, false),
       (default, '2020-07-30 10:11:09', 'Sally', 15.00, 105, false),
       (default, '2020-07-30 12:00:30', 'Edward', 25.25, 106, false);
```
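With the tables seeded, each one can be exposed to Flink SQL through the mysql-cdc connector, just like the products sketch earlier. A matching declaration for orders might look like this (again a sketch; host, port, and credentials come from the docker-compose file above):

```sql
-- mysql-cdc source for the orders table created above; the column types
-- mirror the MySQL schema (DATETIME maps to TIMESTAMP(0)).
CREATE TABLE orders (
  order_id INT,
  order_date TIMESTAMP(0),
  customer_name STRING,
  price DECIMAL(10, 5),
  product_id INT,
  order_status BOOLEAN,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'database-name' = 'mydb',
  'table-name' = 'orders'
);
```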
Connect to PostgreSQL:

```bash
docker-compose exec postgres psql -h localhost -U postgres
```
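The post stops at the connection step here. For reference, the upstream flink-cdc-connectors tutorial that this setup mirrors seeds a shipments table at this point; a sketch assuming that same demo schema (the REPLICA IDENTITY FULL setting lets the postgres-cdc connector, declared analogously to the mysql-cdc tables above, emit complete before/after row images):

```sql
-- PostgreSQL: demo table for the postgres-cdc connector.
CREATE TABLE shipments (
  shipment_id SERIAL NOT NULL PRIMARY KEY,
  order_id SERIAL NOT NULL,
  origin VARCHAR(255) NOT NULL,
  destination VARCHAR(255) NOT NULL,
  is_arrived BOOLEAN NOT NULL
);
ALTER SEQUENCE public.shipments_shipment_id_seq RESTART WITH 1001;
-- Required so UPDATE/DELETE change events carry the full old row.
ALTER TABLE public.shipments REPLICA IDENTITY FULL;

INSERT INTO shipments
VALUES (default, 10001, 'Beijing', 'Shanghai', false),
       (default, 10002, 'Hangzhou', 'Shanghai', false),
       (default, 10003, 'Shanghai', 'Hangzhou', false);
```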
Problems encountered

```
Caused by: java.lang.ClassNotFoundException: org.apache.flink.streaming.connectors.elasticsearch.table.Elasticsearch7DynamicSink$DefaultRestClientFactory
```

```
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Connector 'elasticsearch-7' can only be used as a sink. It cannot be used as a source.
```
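Both messages concern the Elasticsearch sink. The ClassNotFoundException usually means the flink-sql-connector-elasticsearch7 JAR was not on the classpath of the running cluster, e.g. it was missing from flink-1.13.3/lib/ or the cluster was not restarted after the JAR was added. The ValidationException is raised when a query tries to read from an elasticsearch-7 table: the connector is write-only. A minimal sketch of the intended usage, assuming an enriched_orders index and the products/orders CDC tables sketched earlier:

```sql
-- elasticsearch-7 tables may only appear as INSERT targets, never in FROM.
CREATE TABLE enriched_orders (
  order_id INT,
  customer_name STRING,
  product_name STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://localhost:9200',
  'index' = 'enriched_orders'
);

-- Valid: continuously write the joined CDC stream into Elasticsearch.
INSERT INTO enriched_orders
SELECT o.order_id, o.customer_name, p.name
FROM orders AS o
JOIN products AS p ON o.product_id = p.id;

-- Invalid: SELECT * FROM enriched_orders;  -- raises the ValidationException above.
```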