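1 Introduction
DataX is an offline data synchronization tool/platform widely used inside Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, SQL Server, Oracle, PostgreSQL, HDFS, Hive, HBase, OTS, and ODPS.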
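2 Installation and basic usage
- User guide: https://github.com/alibaba/DataX/blob/master/userGuid.md
- If you run DataX with Python 3, the datax.py file must be replaced.
- Support for MySQL 8 and later: https://www.cnblogs.com/zifan/p/12550747.html
- Pre-processed DataX build: https://pan.baidu.com/s/1_FpIw76lfb7l0nDLf941zQ (extraction code: 7ki9)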
3 JSON job configuration
Notes on the settings used in the job file (listed here rather than inline, because JSON has no comment syntax):
- channel: concurrency of the import; tune it to the server hardware.
- record: -1 removes the limit on the number of rows read.
- byte: -1 removes the limit on bytes.
- batchSize: size of each read batch.
- querySql: lets the reader join tables when reading the source data.
- preSql: accepts multiple statements to run before writing.
- useCompression=true (in the MySQL jdbcUrl): compresses the connection traffic, which reduces migration time.

    {
      "core": {
        "transport": {
          "channel": {
            "speed": {
              "channel": 10,
              "record": -1,
              "byte": -1,
              "batchSize": 204800
            }
          }
        }
      },
      "job": {
        "setting": {
          "speed": {
            "channel": 10,
            "record": -1,
            "byte": -1,
            "batchSize": 204800
          },
          "errorLimit": {
            "percentage": 0
          }
        },
        "content": [
          {
            "reader": {
              "name": "sqlserverreader",
              "parameter": {
                "username": "sa",
                "password": "sasa",
                "where": "",
                "connection": [
                  {
                    "querySql": [
                      "SELECT a.Id,a.Code,getdate() as Catetime FROM basics.Location a join basics.LocationType b on a.LocationTypeCode = b.Code;"
                    ],
                    "jdbcUrl": [
                      "jdbc:sqlserver://127.0.0.1:1433;DatabaseName=Src.Basics.Pro"
                    ]
                  }
                ],
                "maxRetries": 3
              }
            },
            "writer": {
              "name": "mysqlwriter",
              "parameter": {
                "username": "root",
                "password": "123456",
                "dateFormat": "YYYY-MM-dd hh:mm:ss",
                "column": ["id", "code", "Catetime"],
                "preSql": [
                  "truncate table location_back",
                  "insert into location_back select * from location",
                  "truncate table location;"
                ],
                "connection": [
                  {
                    "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai&useCompression=true",
                    "table": ["location"]
                  }
                ]
              }
            }
          }
        ]
      }
    }
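When many tables are migrated, writing one such file by hand per table gets error-prone. The following is a minimal sketch of generating the job file from Python instead. The helper names `plugin` and `build_job` are hypothetical (they are not part of DataX), the shortened query and JDBC URLs are illustrative, and only the `job` part of the layout is reproduced.

```python
import json


def plugin(name, **parameter):
    """Return one reader or writer entry: a plugin name plus its parameters."""
    return {"name": name, "parameter": parameter}


def build_job(reader, writer, channel=10):
    """Assemble a job dict with the same layout as the "job" section above."""
    return {
        "job": {
            "setting": {
                "speed": {"channel": channel, "record": -1, "byte": -1},
                "errorLimit": {"percentage": 0},
            },
            "content": [{"reader": reader, "writer": writer}],
        }
    }


if __name__ == "__main__":
    # Illustrative values only; adjust credentials, query and URLs to your setup.
    reader = plugin(
        "sqlserverreader",
        username="sa",
        password="sasa",
        connection=[{
            "querySql": ["SELECT a.Id, a.Code FROM basics.Location a;"],
            "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=Src.Basics.Pro"],
        }],
    )
    writer = plugin(
        "mysqlwriter",
        username="root",
        password="123456",
        column=["id", "code"],
        connection=[{
            "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test?useSSL=false",
            "table": ["location"],
        }],
    )
    # json.dumps always emits valid JSON, so no stray comments or commas slip in.
    print(json.dumps(build_job(reader, writer), indent=2))
```

Redirecting the printed output to a file gives a job file that can be passed to datax.py in the usual way.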
- DataX tuning and dynamic parameter passing: https://www.cnblogs.com/hit-zb/p/10940849.html#autoid-0-0-0
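As a rough illustration of dynamic parameters: datax.py accepts a `-p` option whose value is a string of `-Dname=value` definitions, and a `${name}` placeholder in the job JSON is replaced by the matching value. The sketch below only builds the argument list. The function name `build_datax_command` and the parameter names `table` and `day` are hypothetical, and values are assumed to contain no spaces.

```python
import sys


def build_datax_command(datax_py, job_file, params=None, jvm=None):
    """Return the argument list for one DataX run.

    params maps placeholder names to values; each pair becomes -Dname=value
    inside the single string passed to -p. jvm is passed through --jvm as-is.
    """
    args = [sys.executable, datax_py]
    if jvm:
        args.append("--jvm=" + jvm)
    if params:
        defines = " ".join("-D{}={}".format(k, v) for k, v in params.items())
        args += ["-p", defines]
    args.append(job_file)
    return args


if __name__ == "__main__":
    cmd = build_datax_command(
        "newdatax/bin/datax.py",
        "job.json",
        params={"table": "location", "day": "2020-01-01"},
        jvm="-Xms3G -Xmx3G",
    )
    print(cmd)
```

The resulting list can be handed to `subprocess.run(cmd)`; passing a list rather than one concatenated string avoids shell quoting problems.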
Calling DataX from Python
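    import os

    # Job file to run. Placeholder name: replace it with your own job JSON.
    filename = "job.json"

    # Chain the commands with && so they all run in one cmd session.
    command = "CHCP 65001" + "&&"  # switch the console code page to UTF-8
    command += "C:" + "&&"
    command += "cd C:\\Users\\mechrev\\Desktop\\迁移文件准备\\" + "&&"
    command += "python newdatax\\bin\\datax.py "
    command += " --jvm=\"-Xms3G -Xmx3G\" " + filename

    # os.system runs the cmd command line and returns its exit status;
    # other requirements can be built on top of this.
    exit_code = os.system(command)
    print(exit_code)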
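`os.system` hands back only the exit status, so the calling program never sees the DataX log itself. If the log lines need to be processed as they arrive (written to a file, shown in a UI, and so on), one option is `subprocess.Popen` from the standard library. This is a minimal sketch: `run_and_stream` is a hypothetical helper name, and the child process output is assumed to be UTF-8.

```python
import subprocess
import sys


def run_and_stream(args, on_line=print):
    """Run a command, pass each output line to on_line as it arrives,
    and return the process exit code."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing is lost
        encoding="utf-8",          # assumption: the child writes UTF-8
        errors="replace",          # never fail on an undecodable byte
    )
    # Iterating over the pipe yields lines as soon as the child writes them.
    for line in proc.stdout:
        on_line(line.rstrip("\n"))
    return proc.wait()
```

For example, `run_and_stream([sys.executable, "newdatax/bin/datax.py", "job.json"])` would print the DataX log line by line while the job runs and return its exit code when it finishes.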
Calling DataX from .NET
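    using System.Diagnostics;
    using System.Text;

    // Run a cmd command line through Process; other requirements can be built on top of this.
    // This approach is blocking, so migration progress cannot be observed in real time.
    string strInput = "ipconfig";
    Process p = new Process();
    // Application to start
    p.StartInfo.FileName = "cmd.exe";
    // Do not start the process through the operating system shell
    p.StartInfo.UseShellExecute = false;
    // Accept input from the calling program
    p.StartInfo.RedirectStandardInput = true;
    // Redirect standard output
    p.StartInfo.RedirectStandardOutput = true;
    // Redirect standard error
    p.StartInfo.RedirectStandardError = true;
    // Do not show a program window
    p.StartInfo.CreateNoWindow = true;
    // Decode the output as UTF-8 to avoid garbled text
    p.StartInfo.StandardOutputEncoding = Encoding.UTF8;
    // Start the process
    p.Start();
    // Flush every write immediately, then send the command to the cmd window
    p.StandardInput.AutoFlush = true;
    p.StandardInput.WriteLine(strInput + "&exit");
    // Read all output
    string strOutput = p.StandardOutput.ReadToEnd();
    // Wait for the process to finish and exit
    p.WaitForExit();
    p.Close();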
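Extensions
- Scheduled jobs: create a .NET console program that invokes DataX as described above, build it into an exe, and have the operating system's task scheduler run it.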
- Comparison of mainstream ETL tools: https://cloud.tencent.com/developer/article/1531141
- DataxWeb: https://gitee.com/WeiYe-Jing/datax-web
- DataWorks: https://www.jianshu.com/p/1ca43271369b
