ClickHouse
1. Overview
1) Written in C++ by Yandex (a Russian company)
2) Column-oriented storage
3) Open source
4) Built for online analytical processing (OLAP) queries
2. Architecture overview
1) Benefits of column-oriented storage:
Aggregations and counts over a single column are faster.
Values in a column share one data type, so they compress better and use less disk space.
2) Writes:
ClickHouse uses an LSM-tree-like structure: data is first written sequentially to a temporary area and later merged asynchronously into the main data area. Write throughput is excellent, but single-row updates/deletes and transactions are not supported.
3) Reads:
ClickHouse parallelizes a single statement across multiple threads, whereas MySQL runs each statement on one thread and only uses multiple threads when multiple statements run concurrently. Multi-threading one statement makes ClickHouse very fast for individual queries, but less suitable for workloads with many concurrent small queries.
4) Sparse primary index:
Pro: quickly locates large ranges of data.
Con: poorly suited to point lookups.
HBase and Redis support hash indexes and handle point lookups well; MySQL uses B+ tree indexes, which handle both point lookups and range scans.
5) Takeaway: ClickHouse is best suited to queries over a single large, wide (denormalized) table.
3. Single-node installation
1) Raise the OS resource limits:
vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
vim /etc/selinux/config
SELINUX=disabled
2) Install dependencies:
yum install -y libtool
yum install -y *unixODBC*
3) Download the rpm files below and place them in /opt/software/ on the hadoop102 host:
clickhouse-client-20.4.4.18-2.noarch.rpm
clickhouse-common-static-20.4.4.18-2.x86_64.rpm
clickhouse-common-static-dbg-20.4.4.18-2.x86_64.rpm
clickhouse-server-20.4.4.18-2.noarch.rpm
4) Install the packages with rpm:
sudo rpm -ivh /opt/software/clickhouse*
5) Key locations after installation:
service unit: /etc/systemd/system/clickhouse-server.service (enabled via /etc/systemd/system/multi-user.target.wants/)
configuration: /etc/clickhouse-server/config.xml
data directory: /var/lib/clickhouse/
6) Edit the config file: sudo vim /etc/clickhouse-server/config.xml
Uncomment <listen_host>::</listen_host> so the server accepts connections from other hosts.
7) Start the service: sudo systemctl start clickhouse-server
Check status: systemctl status clickhouse-server.service
8) Start the client: clickhouse-client -m
9) Disable start on boot: sudo systemctl disable clickhouse-server
10) Official documentation, downloads, etc.:
https://clickhouse.tech/docs/en/
Website: https://clickhouse.yandex/
Downloads: http://repo.red-soft.biz/repos/clickhouse/stable/el6/
Start clickhouse-server with:
sudo clickhouse start
Start clickhouse-client with:
clickhouse-client --password
4. Data types
1) Signed integers:
Int8 - [-128 : 127]
Int16 - [-32768 : 32767]
Int32 - [-2147483648 : 2147483647]
Int64 - [-9223372036854775808 : 9223372036854775807]
2) Unsigned integers:
UInt8 - [0 : 255]
UInt16 - [0 : 65535]
UInt32 - [0 : 4294967295]
UInt64 - [0 : 18446744073709551615]
Prefer unsigned types when values cannot be negative.
3) Floating point:
Float32 - float
Float64 - double
Rarely used in analytics because of rounding error.
4) Boolean:
No dedicated type; use UInt8 with 0 for false and 1 for true.
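A minimal sketch of the convention (table and column names are invented for illustration):

```sql
-- no Bool type in classic ClickHouse: model flags as UInt8 with values 0/1
CREATE TABLE t_flag_demo
(
    id      UInt32,
    is_paid UInt8   -- 0 = false, 1 = true
) ENGINE = TinyLog;

INSERT INTO t_flag_demo VALUES (1, 1), (2, 0);

-- comparison operators already return UInt8, so flags compose naturally
SELECT id FROM t_flag_demo WHERE is_paid = 1;
```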
5) Decimal:
Decimal32(s), equivalent to Decimal(9, s)
Decimal64(s), equivalent to Decimal(18, s)
Decimal128(s), equivalent to Decimal(38, s)
Use when precision must be preserved; note that division truncates extra digits instead of rounding.
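A quick sketch of the truncation behavior (treat the exact result as illustrative of the truncate-not-round rule):

```sql
-- Decimal32(2) keeps 2 digits after the point; total precision is 9
SELECT toDecimal32(2.00, 2) / 3 AS q;
-- the quotient is truncated toward zero (0.66), not rounded up to 0.67
```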
6) Strings:
String: variable length, the common choice
FixedString(N): fixed length, rarely used
7) Enums:
Enum8 and Enum16. An Enum stores 'string' = integer mappings.
Enum8 describes values as 'String' = Int8 pairs.
Enum16 describes values as 'String' = Int16 pairs.
Create a table with a column of type Enum8('hello' = 1, 'world' = 2):
CREATE TABLE t_enum
(
x Enum8('hello' = 1, 'world' = 2)
)
ENGINE = TinyLog
Insert data: INSERT INTO t_enum VALUES ('hello'), ('world'), ('hello')
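Enums validate input and can be converted back to their integers; a small sketch against the t_enum table above:

```sql
-- the stored integers can be read back with CAST
SELECT x, CAST(x, 'Int8') AS x_int FROM t_enum;
-- a value outside the declared set, e.g. INSERT INTO t_enum VALUES ('a'),
-- is rejected with an error
```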
8) Date and time:
Date accepts a year-month-day string, e.g. '2019-12-16'
Datetime accepts a year-month-day hour:minute:second string, e.g. '2019-12-16 20:50:10'
Datetime64 additionally accepts sub-second precision, e.g. '2019-12-16 20:50:10.66'
9) Arrays: Array(T)
Example: select array(1,2,3) as n
5. Table engines
A table engine determines:
1) how and where data is stored
2) concurrent data access
3) whether indexes can be used
4) whether multi-threaded request execution is possible
5) how data is replicated
1) TinyLog:
Stores each column as a separate file on disk; no indexes, no concurrency control. For small tables with little data; useful for practice and testing, of limited use in production.
2) Memory:
Very fast, but data is lost when the server restarts.
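A minimal sketch (table name invented):

```sql
CREATE TABLE t_mem_demo (id UInt32, name String) ENGINE = Memory;
INSERT INTO t_mem_demo VALUES (1, 'a');
-- rows live only in RAM: after a server restart the table definition
-- survives, but it is empty
SELECT count() FROM t_mem_demo;
```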
3) MergeTree:
Supports partitioning, a primary key, and a sort key:
create table t_order_mt(
id UInt32,
sku_id String,
total_amount Decimal(16,2),
create_time Datetime
) engine = MergeTree
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id,sku_id)
partition by defines the partitioning; if omitted, the table has a single partition. Each inserted batch first forms a temporary part that is not merged into any existing partition immediately; some time after the write (roughly 10-15 minutes) ClickHouse merges parts automatically. To merge manually: optimize table t_order_mt final
primary key is optional and its values may repeat; order by defines the sort key and is required.
If no primary key is defined, the order by columns serve as the primary key.
The primary key must be a prefix of the order by columns: with order by (id, sku_id), the primary key must be id or (id, sku_id).
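Putting this together on the t_order_mt table defined above (the sample values are made up):

```sql
INSERT INTO t_order_mt VALUES
(101, 'sku_001', 1000.00, '2020-06-01 12:00:00'),
(102, 'sku_002', 2000.00, '2020-06-02 12:00:00');

-- each insert creates fresh parts; before a merge, rows of one partition
-- may still sit in several separate parts
SELECT * FROM t_order_mt;

-- force the background merge instead of waiting 10-15 minutes
OPTIMIZE TABLE t_order_mt FINAL;
```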
4) Secondary (data-skipping) indexes
In older versions these are experimental and must be enabled first: set allow_experimental_data_skipping_indices=1;
If where conditions frequently involve total_amount, a secondary index on it helps. Its main job is filtering, chiefly in where clauses:
INDEX a total_amount TYPE minmax GRANULARITY 5
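A full DDL sketch with the index clause in place (table name follows the earlier examples):

```sql
CREATE TABLE t_order_mt2
(
    id           UInt32,
    sku_id       String,
    total_amount Decimal(16, 2),
    create_time  Datetime,
    -- minmax skip index: stores the min/max of total_amount per block of
    -- 5 granules, letting where-filters skip whole blocks
    INDEX a total_amount TYPE minmax GRANULARITY 5
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(create_time)
PRIMARY KEY (id)
ORDER BY (id, sku_id);
```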
5) Data lifecycle (TTL)
Column-level TTL:
create table t_order_mt3(
id UInt32,
sku_id String,
total_amount Decimal(16,2) TTL create_time+interval 10 SECOND,
create_time Datetime
) engine =MergeTree
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id, sku_id)
Adding TTL create_time + interval 10 SECOND after a column definition makes that column's values expire 10 s after create_time.
Insert some rows; queried right away, the column values are still present.
After 10 s, run optimize table t_order_mt3 final and the expired column values are cleared (reset to the type's default).
Table-level TTL:
alter table t_order_mt3 MODIFY TTL create_time + INTERVAL 10 SECOND
After 10 s, run optimize table t_order_mt3 final and the expired rows are deleted.
Supported time units:
- SECOND
- MINUTE
- HOUR
- DAY
- WEEK
- MONTH
- QUARTER
- YEAR
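A table-level TTL can also be declared at creation time; a sketch combining the units above (table name invented):

```sql
CREATE TABLE t_order_mt4
(
    id           UInt32,
    sku_id       String,
    total_amount Decimal(16, 2),
    create_time  Datetime
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(create_time)
ORDER BY (id, sku_id)
TTL create_time + INTERVAL 1 WEEK;  -- whole rows expire one week after create_time
```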
6) ReplacingMergeTree
Provides deduplication.
Notes:
The order by columns act as the unique key.
Deduplication does not cross partitions.
Deduplication only happens when parts are merged.
Among duplicates, the row with the largest version-column value is kept.
If version values are equal, the last inserted row is kept.
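A sketch using create_time as the version column (table name invented; the rules above decide which duplicate survives):

```sql
CREATE TABLE t_order_rmt
(
    id           UInt32,
    sku_id       String,
    total_amount Decimal(16, 2),
    create_time  Datetime
) ENGINE = ReplacingMergeTree(create_time)  -- version column
PARTITION BY toYYYYMMDD(create_time)
ORDER BY (id, sku_id);                      -- (id, sku_id) acts as the unique key

-- duplicates are only collapsed at merge time; force a merge with:
-- OPTIMIZE TABLE t_order_rmt FINAL;
```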
7) SummingMergeTree
Provides pre-aggregation: engine = SummingMergeTree(total_amount)
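A sketch (table name invented): rows with equal order-by values are collapsed at merge time, summing total_amount:

```sql
CREATE TABLE t_order_smt
(
    id           UInt32,
    sku_id       String,
    total_amount Decimal(16, 2),  -- the column pre-aggregated by the engine
    create_time  Datetime
) ENGINE = SummingMergeTree(total_amount)
PARTITION BY toYYYYMMDD(create_time)
ORDER BY (id, sku_id);

-- aggregation only happens on merge, so a correct total still needs sum():
SELECT sum(total_amount) FROM t_order_smt WHERE id = 101;
```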
6. Dimensional analysis
1) select id, sku_id, sum(total_amount) from t_order_mt group by id, sku_id with rollup;
with rollup: removes dimensions from right to left, adding subtotal rows at each level.
2) select id, sku_id, sum(total_amount) from t_order_mt group by id, sku_id with cube;
with cube: produces subtotal rows for every combination of the grouping dimensions.
3) select id, sku_id, sum(total_amount) from t_order_mt group by id, sku_id with totals;
with totals: only adds the grand-total row.
7. Exporting data
clickhouse-client --query "select * from test.t_order_mt" --format CSVWithNames > ~/rs1.csv
Then download ~/rs1.csv to the Windows desktop with the sz command to inspect it.
8. Replication
1) Start the ZooKeeper cluster and a second ClickHouse server.
2) Edit the configuration on both servers:
mkdir /etc/clickhouse-server/config.d
vim /etc/clickhouse-server/config.d/metrika.xml
<?xml version="1.0"?>
<yandex>
<zookeeper-servers>
<node index="1">
<host>hadoop102</host>
<port>2181</port>
</node>
<node index="2">
<host>hadoop103</host>
<port>2181</port>
</node>
<node index="3">
<host>hadoop104</host>
<port>2181</port>
</node>
</zookeeper-servers>
</yandex>
vim /etc/clickhouse-server/config.xml
<zookeeper incl="zookeeper-servers" optional="true" />
<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>
3) Create the table on both machines; the ZooKeeper path is identical, but each server must use its own replica name:
create table rep_t_order_mt_0105 (
id UInt32,
sku_id String,
total_amount Decimal(16,2),
create_time Datetime
) engine = ReplicatedMergeTree('/clickhouse/tables/01/rep_t_order_mt_0105','rep_hdp1')
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id,sku_id);
create table rep_t_order_mt_0105 (
id UInt32,
sku_id String,
total_amount Decimal(16,2),
create_time Datetime
) engine = ReplicatedMergeTree('/clickhouse/tables/01/rep_t_order_mt_0105','rep_hdp2')
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id,sku_id);
4) Done: data written to either replica is synchronized to the other through ZooKeeper.
9. Sharded cluster
1) Config file: vim /etc/clickhouse-server/config.d/metrika.xml
<yandex>
<clickhouse_remote_servers>
<gmall_cluster> <!-- cluster name -->
<shard> <!-- shard 1 -->
<internal_replication>true</internal_replication>
<replica> <!-- first replica of this shard -->
<host>hadoop102</host>
<port>9000</port>
</replica>
<replica> <!-- second replica of this shard -->
<host>hadoop103</host>
<port>9000</port>
</replica>
</shard>
<shard> <!-- shard 2 -->
<internal_replication>true</internal_replication>
<replica> <!-- first replica of this shard -->
<host>hadoop104</host>
<port>9000</port>
</replica>
</shard>
</gmall_cluster>
</clickhouse_remote_servers>
<zookeeper-servers>
<node index="1">
<host>hadoop102</host>
<port>2181</port>
</node>
<node index="2">
<host>hadoop103</host>
<port>2181</port>
</node>
<node index="3">
<host>hadoop104</host>
<port>2181</port>
</node>
</zookeeper-servers>
<macros>
<shard>01</shard> <!-- shard number; differs per machine -->
<replica>rep_1_1</replica> <!-- replica name; differs per machine -->
</macros>
</yandex>
2) Distribute the config to all 3 machines, then set the <macros> per host:
hadoop102
<macros>
<shard>01</shard>
<replica>rep_1_1</replica>
</macros>
hadoop103
<macros>
<shard>01</shard>
<replica>rep_1_2</replica>
</macros>
hadoop104
<macros>
<shard>02</shard>
<replica>rep_2_1</replica>
</macros>
3) Create the local table on every node with on cluster:
create table st_order_mt_0105 on cluster gmall_cluster (
id UInt32,
sku_id String,
total_amount Decimal(16,2),
create_time Datetime
) engine = ReplicatedMergeTree('/clickhouse/tables/{shard}/st_order_mt_0105','{replica}')
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id,sku_id);
4) Distributed table:
create table st_order_mt_0105_all on cluster gmall_cluster
(
id UInt32,
sku_id String,
total_amount Decimal(16,2),
create_time Datetime
) engine = Distributed(gmall_cluster, test0105, st_order_mt_0105, hiveHash(sku_id))
Distributed(cluster name, database, local table name, sharding key); the sharding key must evaluate to an integer, hence hiveHash(sku_id) for the String column.
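Once the distributed table exists, reads and writes go through it; a sketch (values made up):

```sql
-- the insert is routed to one shard by hiveHash(sku_id) modulo the shard count
INSERT INTO st_order_mt_0105_all VALUES
(201, 'sku_001', 1000.00, '2020-06-01 12:00:00');

-- the query fans out to every shard's local table and merges the results
SELECT count() FROM st_order_mt_0105_all;
```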
10. Client code (JDBC)
1) Add the dependency:
<dependency>
<groupId>ru.yandex.clickhouse</groupId>
<artifactId>clickhouse-jdbc</artifactId>
<version>0.1.55</version>
</dependency>
2) URL: jdbc:clickhouse://hadoop102:8123/test
Driver: ru.yandex.clickhouse.ClickHouseDriver
Query (Scala):
import java.sql.{Connection, DriverManager, ResultSet}

val address = "jdbc:clickhouse://hadoop102:8123/test" // JDBC URL
Class.forName("ru.yandex.clickhouse.ClickHouseDriver") // load the driver
val connection: Connection = DriverManager.getConnection(address) // open a connection
val sql = "select * from t_order_mt" // the query to run
val resultSet: ResultSet = connection.createStatement().executeQuery(sql) // execute and get the result set
println("id" + "\t" + "sku_id" + "\t" + "total_amount" + "\t" + "create_time") // header row
while (resultSet.next()) { // iterate over the rows
  val id: String = resultSet.getString("id")
  val sku_id: String = resultSet.getString("sku_id")
  val total_amount: String = resultSet.getString("total_amount")
  val create_time = resultSet.getString("create_time")
  println(id + "\t" + sku_id + "\t" + total_amount + "\t" + create_time) // print one row
}
connection.close() // release the connection
11. Schema changes (ALTER)
Column changes:
alter TABLE data_web.dm_ftsp_cust_tag_ctd MODIFY COLUMN cust_id Nullable(String);
alter TABLE data_web.dm_ftsp_cust_tag_ctd ADD COLUMN major_accounts_num Nullable(Decimal(38, 4)) after Div5TailNum ;
alter TABLE test.ontime_wpp_t drop COLUMN TailNum2;
Example create statement (ClickHouse has no VARCHAR2 type, so such Oracle-style columns map to String):
CREATE TABLE zhida.SRC_ZD_HQLM_SUM
(
"ID" Nullable(String),
"SHOWID" String,
"YEAR" String,
"ZHIDA_PEOPLE_NUM" Int32,
"ZHIDA_PEOPLE_MONEY" Nullable(Decimal(38, 2)),
"ZHIDA_ENT_NUM" Int32,
"ZHIDA_ENT_MONEY" Decimal(38, 2)
)
ENGINE = MergeTree()
PARTITION BY SHOWID
ORDER BY tuple()
SETTINGS index_granularity = 8192
(1) Add a column:
alter table [db.]table_name add column column_name [type] [default_expr] [after name_after]
type: the column type
default_expr: the default value
after name_after: place the new column after the named existing column
(2) Drop a column:
alter table [db.]table_name drop column column_name
(3) Modify a column:
alter table [db.]table_name modify column column_name [type] [default_expr]
(4) Drop a partition:
alter table [db.]table drop partition 'partition_name'
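The four forms above, applied to a hypothetical table t_demo (column and partition names invented):

```sql
ALTER TABLE t_demo ADD COLUMN remark String DEFAULT '' AFTER sku_id;
ALTER TABLE t_demo MODIFY COLUMN remark Nullable(String);
ALTER TABLE t_demo DROP COLUMN remark;
ALTER TABLE t_demo DROP PARTITION '20200601';
```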
12. Create-table examples
Create a table:
CREATE TABLE grow.dwd_grow_crm_wx_chatdata (
`customer_id` Nullable(Int64) COMMENT 'referrer ID'
, `customer_name` Nullable(String) COMMENT 'referrer name'
, `introduce_company` Nullable(String) COMMENT 'referrer customer company'
, `name` Nullable(String) COMMENT 'lead name (customer name)'
, `created_at` Nullable(String) COMMENT 'lead creation time'
, `gw_username` Nullable(String) COMMENT 'advisor name'
, `gw_company_name` Nullable(String) COMMENT 'advisor organization'
, `bill_id` Nullable(String) COMMENT 'poster ID'
)
ENGINE = ReplacingMergeTree ORDER BY customer_id SETTINGS index_granularity = 8192
Add a column:
alter table event add column `$user_id` Nullable(String);
13. Date functions
SELECT
toDateTime('2016-06-15 23:00:00') AS time,
toDate(time) AS date_local,
toDate(time, 'Asia/Yekaterinburg') AS date_yekat,
toString(time, 'US/Samoa') AS time_samoa
now() // 2020-04-01 17:25:40 current date and time
toYear() // 2020 year part
toMonth() // 4 month part
today() // 2020-04-01 today's date
yesterday() // 2020-03-31 yesterday's date
toDayOfYear() // 92 day of the year
toDayOfWeek() // 3 day of the week
toHour() // 17 hour part
toMinute() // 25 minute part
toSecond() // 40 second part
toStartOfYear() // 2020-01-01 first day of the year
toStartOfMonth() // 2020-04-01 first day of the month
formatDateTime(now(),'%Y-%m-%d') // 2020-04-01 format with a pattern
toYYYYMM() // 202004
toYYYYMMDD() // 20200401
toYYYYMMDDhhmmss() // 20200401172540
dateDiff() // difference between two dates/times
SELECT
toDateTime('2019-07-30 10:10:10') AS time,
-- convert the DateTime to a Unix timestamp
toUnixTimestamp(time) as unixTimestamp,
-- keep only the date part
toDate(time) as date_local,
toTime(time) as date_time, -- move the date to a fixed day, keeping the time-of-day part
-- extract year, month, quarter, hour, minute, second
toYear(time) as get_year,
toMonth(time) as get_month,
-- quarters: 1 (Jan-Mar), 2 (Apr-Jun), 3 (Jul-Sep), 4 (Oct-Dec)
toQuarter(time) as get_quarter,
toHour(time) as get_hour,
toMinute(time) as get_minute,
toSecond(time) as get_second,
-- day of year, day of month, day of week
toDayOfYear(time) as day_of_year,
toDayOfMonth(time) as day_of_month,
toDayOfWeek(time) as day_of_week,
toDate(time, 'Asia/Shanghai') AS date_shanghai,
toDateTime(time, 'Asia/Shanghai') AS time_shanghai,
-- start of the year / month / quarter / day / hour / minute
toStartOfYear(time),
toStartOfMonth(time),
toStartOfQuarter(time),
toStartOfDay(time) AS cur_start_daytime,
toStartOfHour(time) as cur_start_hour,
toStartOfMinute(time) AS cur_start_minute,
-- ordinal number of the year / quarter relative to a fixed epoch
toRelativeYearNum(time),
toRelativeQuarterNum(time);
Dates in the future
-- option 1: fixed dates (mind the time zone)
WITH
toDate('2019-09-09') AS date,
toDateTime('2019-09-09 00:00:00') AS date_time
SELECT
addYears(date, 1) AS add_years_with_date,
addYears(date_time, 0) AS add_years_with_date_time;
-- option 2: the current local time
WITH
toDate(now()) as date,
toDateTime(now()) as date_time
SELECT
now() as now_time, -- current time
-- 1 year later
addYears(date, 1) AS add_years_with_date,
addYears(date_time, 1) AS add_years_with_date_time,
-- 1 month later
addMonths(date, 1) AS add_months_with_date,
addMonths(date_time, 1) AS add_months_with_date_time,
-- 1 week later
addWeeks(date, 1) AS add_weeks_with_date,
addWeeks(date_time, 1) AS add_weeks_with_date_time,
-- 1 day later
addDays(date, 1) AS add_days_with_date,
addDays(date_time, 1) AS add_days_with_date_time,
-- 1 hour later
addHours(date_time, 1) AS add_hours_with_date_time,
-- 1 minute later
addMinutes(date_time, 1) AS add_minutes_with_date_time,
-- 10 seconds later
addSeconds(date_time, 10) AS add_seconds_with_date_time,
-- 1 quarter later
addQuarters(date, 1) AS add_quarters_with_date,
addQuarters(date_time, 1) AS add_quarters_with_date_time;
Dates in the past
WITH
toDate(now()) as date,
toDateTime(now()) as date_time
SELECT
subtractYears(date, 1) AS subtract_years_with_date,
subtractYears(date_time, 1) AS subtract_years_with_date_time,
subtractQuarters(date, 1) AS subtract_Quarters_with_date,
subtractQuarters(date_time, 1) AS subtract_Quarters_with_date_time,
subtractMonths(date, 1) AS subtract_Months_with_date,
subtractMonths(date_time, 1) AS subtract_Months_with_date_time,
subtractWeeks(date, 1) AS subtract_Weeks_with_date,
subtractWeeks(date_time, 1) AS subtract_Weeks_with_date_time,
subtractDays(date, 1) AS subtract_Days_with_date,
subtractDays(date_time, 1) AS subtract_Days_with_date_time,
subtractHours(date_time, 1) AS subtract_Hours_with_date_time,
subtractMinutes(date_time, 1) AS subtract_Minutes_with_date_time,
subtractSeconds(date_time, 1) AS subtract_Seconds_with_date_time;
-- 'Asia/GuangZhou' and 'Asia/Beijing' are not valid IANA time zone names, so the next two queries fail:
SELECT toDate('2019-07-31', 'Asia/GuangZhou') as date_guangzhou;
SELECT toDate('2019-07-31'), toDate('2019-07-31', 'Asia/Beijing') as date_beijing;
-- for mainland China the valid zone is 'Asia/Shanghai':
SELECT toDateTime('2019-07-31 10:10:10', 'Asia/Shanghai') as date_shanghai;
Time differences
-- option 1: the difference between two fixed times
WITH
toDateTime('2019-07-30 10:10:10', 'Asia/Shanghai') as date_shanghai_one,
toDateTime('2020-10-31 11:20:30', 'Asia/Shanghai') as date_shanghai_two
SELECT
dateDiff('year', date_shanghai_one, date_shanghai_two) as diff_years,
dateDiff('month', date_shanghai_one, date_shanghai_two) as diff_months,
dateDiff('week', date_shanghai_one, date_shanghai_two) as diff_week,
dateDiff('day', date_shanghai_one, date_shanghai_two) as diff_days,
dateDiff('hour', date_shanghai_one, date_shanghai_two) as diff_hours,
dateDiff('minute', date_shanghai_one, date_shanghai_two) as diff_minutes,
dateDiff('second', date_shanghai_one, date_shanghai_two) as diff_seconds;
-- option 2: relative to the current local time
WITH
now() as date_time
SELECT
dateDiff('year', date_time, addYears(date_time, 1)) as diff_years,
dateDiff('month', date_time, addMonths(date_time, 2)) as diff_months,
dateDiff('week', date_time, addWeeks(date_time, 3)) as diff_week,
dateDiff('day', date_time, addDays(date_time, 3)) as diff_days,
dateDiff('hour', date_time, addHours(date_time, 3)) as diff_hours,
dateDiff('minute', date_time, addMinutes(date_time, 30)) as diff_minutes,
dateDiff('second', date_time, addSeconds(date_time, 35)) as diff_seconds;
14. Useful admin queries
Storage used per database:
SELECT database, formatReadableSize(sum(bytes_on_disk)) AS on_disk FROM system.parts GROUP BY database
Sample result:
data_web 968.47 MiB
test_log 9.34 KiB
grow 69.73 MiB
system 2.37 GiB
myb_test 22.29 KiB
ftsp_product 1.53 GiB
Disk usage:
SELECT
name
,path
,formatReadableSize(free_space) AS free
,formatReadableSize(total_space) AS total
,formatReadableSize(keep_free_space) AS reserved
FROM system.disks
Sample result:
default /var/lib/clickhouse/ 359.88 GiB 492.03 GiB 0.00 B
Current connections:
SELECT * FROM system.metrics WHERE metric LIKE '%Connection';
Sample result:
TCPConnection 0 Number of connections to TCP server (clients with native interface), also included server-server distributed query connections
MySQLConnection 0 Number of client connections using MySQL protocol
HTTPConnection 1 Number of connections to HTTP server
InterserverConnection 0 Number of connections from other replicas to fetch parts
Currently running queries:
SELECT query_id, user, address, query FROM system.processes ORDER BY query_id;
Kill a query:
KILL QUERY WHERE query_id = 'ff695827-dbf5-45ad-9858-a853946ea140';
Query history (the 10 slowest finished queries):
SELECT
user
, client_hostname AS host
, client_name AS client
, formatDateTime(query_start_time, '%T') AS started
, query_duration_ms / 1000 AS sec
, round(memory_usage / 1048576) AS MEM_MB
, result_rows AS RES_CNT
, result_bytes / 1048576 AS RES_MB
, read_rows AS R_CNT
, round(read_bytes / 1048576) AS R_MB
, written_rows AS W_CNT
, round(written_bytes / 1048576) AS W_MB
, query
FROM system.query_log
WHERE type = 2 -- 2 = QueryFinish (completed queries only)
ORDER BY query_duration_ms DESC
LIMIT 10
;
15. Taking the first row per group, ordered by a column
select num,arr_time, day_, cust_phone
from (
select day_, cust_phone
,groupArray(update_date) arr_time
from(
select update_date, day_, staff_group, staff_name, staff_no, cust_phone, call_type, total_time
from (
select
update_date
,formatDateTime(update_date, '%Y-%m-%d') as day_
,case when staff_id = 'h0000000000000411023751335247872' then '财税五组'
when staff_id = 'h0000000000000447602747361484800' then '财税五组'
when staff_id = 'h0000000000000449467198091747328' then '财税五组'
when staff_id = 'h0000000000000479477792350912512' then '财税五组'
when staff_id = 'h0000000000000480904551738851328' then '财税五组'
when staff_id = 'h0000000000000462052726800130048' then '财税五组'
end as staff_group
,case when staff_id = 'h0000000000000411023751335247872' then '李葳'
when staff_id = 'h0000000000000447602747361484800' then '秦文娇'
when staff_id = 'h0000000000000449467198091747328' then '王美玲'
when staff_id = 'h0000000000000479477792350912512' then '曲英华'
when staff_id = 'h0000000000000480904551738851328' then '刘东洋'
when staff_id = 'h0000000000000462052726800130048' then '孙柳芳'
end as staff_name
,case when staff_id = 'h0000000000000411023751335247872' then '1022915'
when staff_id = 'h0000000000000447602747361484800' then '1027435'
when staff_id = 'h0000000000000449467198091747328' then '1027612'
when staff_id = 'h0000000000000479477792350912512' then '1030854'
when staff_id = 'h0000000000000480904551738851328' then '1030966'
when staff_id = 'h0000000000000462052726800130048' then '1029271'
end as staff_no
,cust_phone
,case when total_time > 0 then 'invalid'
else 'not connected'
end as call_type
,total_time
from ftsp_product.grow_call_record
where staff_id in (
'h0000000000000411023751335247872',
'h0000000000000447602747361484800',
'h0000000000000449467198091747328',
'h0000000000000479477792350912512',
'h0000000000000480904551738851328',
'h0000000000000462052726800130048'
)
and total_time < 30
and update_date is not null
)
order by update_date -- sort column
)
group by day_,cust_phone -- deduplication keys
)
ARRAY JOIN
arr_time,
arrayEnumerate(arr_time) as num
where num = 1 -- keep only the first row per group