ClickHouse

1. Overview

  1. Written in C++ (developed at Yandex in Russia)
  2. Columnar storage
  3. Open source
  4. Built for online analytical processing (OLAP) queries

2. How it works

1. Benefits of columnar storage:
Aggregations and counts over a single column are faster.
Values in the same column share a data type, so they compress better and take less disk space.

2. Writes:
ClickHouse uses an LSM-tree-like structure: data is first written sequentially into temporary parts and later merged asynchronously into the main data area. Write throughput is excellent, but updating or deleting individual rows is not supported, and there are no transactions.

3. Reads:
ClickHouse runs a single statement on multiple threads: one query may be executed by many threads in parallel. In MySQL, a single query runs on one thread, and only multiple statements can use multiple threads. Running many threads per statement makes single-query performance very strong, but it is not well suited to workloads with many concurrent statements.
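
As a rough illustration (a sketch, not from the original notes): the per-query parallelism can be capped with the standard max_threads setting.

-- run an aggregation with at most 4 threads (the value is arbitrary)
SELECT count()
FROM numbers(100000000)
SETTINGS max_threads = 4;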

4. Sparse index:
Pros: quickly locates large ranges of data.
Cons: poorly suited to point lookups.
HBase and Redis support hash indexes and handle point lookups well; MySQL uses B+-tree indexes, which support both point lookups and range scans.

5. Conclusion from performance analysis: best suited to queries against a single large, wide (denormalized) table.

3. Single-node installation

1. Raise the OS limits on the VM:
vim /etc/security/limits.conf
* soft nofile 65536 
* hard nofile 65536 
* soft nproc 131072 
* hard nproc 131072

vim /etc/selinux/config
SELINUX=disabled

2. Install dependencies:
yum install -y libtool
yum install -y *unixODBC*

3. Download the following rpm packages and place them in /opt/software/ on the hadoop102 VM:
clickhouse-client-20.4.4.18-2.noarch.rpm
clickhouse-common-static-20.4.4.18-2.x86_64.rpm
clickhouse-common-static-dbg-20.4.4.18-2.x86_64.rpm
clickhouse-server-20.4.4.18-2.noarch.rpm

4. Install the packages:
sudo rpm -ivh /opt/software/clickhouse*

5. Key locations after installation:
systemd unit: /etc/systemd/system/clickhouse-server.service (symlinked from /etc/systemd/system/multi-user.target.wants/)
server config: /etc/clickhouse-server/config.xml
data directory (the <path> setting in config.xml): /var/lib/clickhouse/

6. Edit the config file: sudo vim /etc/clickhouse-server/config.xml
Uncomment the line <listen_host>::</listen_host> so the server listens on all interfaces.

7. Start the service: sudo systemctl start clickhouse-server
Check its status: systemctl status clickhouse-server.service


8. Start the client: clickhouse-client -m   (-m enables multi-line queries)

9. Disable start on boot: sudo systemctl disable clickhouse-server

10. Official documentation and downloads:
Docs: https://clickhouse.tech/docs/en/
Website: https://clickhouse.yandex/
Download repo: http://repo.red-soft.biz/repos/clickhouse/stable/el6/


Start clickhouse-server with:
 sudo clickhouse start

Start clickhouse-client with:
 clickhouse-client --password

4. Data types

1. Signed integers:
Int8 - [-128 : 127]
Int16 - [-32768 : 32767]
Int32 - [-2147483648 : 2147483647]
Int64 - [-9223372036854775808 : 9223372036854775807]

2. Unsigned integers:
UInt8 - [0 : 255]
UInt16 - [0 : 65535]
UInt32 - [0 : 4294967295]
UInt64 - [0 : 18446744073709551615]

Unsigned integers are preferred where values cannot be negative.

3. Floating-point:
Float32 - float
Float64 - double

Rarely used in practice because of precision loss.

4. Boolean:
There is no dedicated Boolean type; use UInt8 with 0 for false and 1 for true.

5. Decimal:
Decimal32(s) - equivalent to Decimal(9, s): up to 9 significant digits, s of them after the decimal point
Decimal64(s) - equivalent to Decimal(18, s)
Decimal128(s) - equivalent to Decimal(38, s)

Use Decimal where precision must be preserved (e.g. money). Note that division truncates the extra digits rather than rounding.
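
A quick sketch of the truncation behaviour (the result shown is what truncation gives, as in the ClickHouse docs example):

SELECT toDecimal32(4.2, 8) AS x, x / 3;
-- expected: 1.39999999 (trailing digits are cut off, not rounded to 1.40000000)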

6. Strings:
String - variable length; the common choice
FixedString(N) - fixed length; rarely used

7. Enum types:
Includes Enum8 and Enum16. An Enum stores 'string' = integer mappings.
Enum8 is described by 'String' = Int8 pairs.
Enum16 is described by 'String' = Int16 pairs.

Create a table with a column of type Enum8('hello' = 1, 'world' = 2):
CREATE TABLE t_enum
(
    x Enum8('hello' = 1, 'world' = 2)
)
ENGINE = TinyLog

Insert data: INSERT INTO t_enum VALUES ('hello'), ('world'), ('hello')
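
To get the underlying integers back, a CAST works (this mirrors the standard ClickHouse docs example):

SELECT CAST(x, 'Int8') FROM t_enum;
-- returns 1, 2, 1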

8. Date and time types:
Date accepts a year-month-day string, e.g. '2019-12-16'
Datetime accepts a year-month-day hour:minute:second string, e.g. '2019-12-16 20:50:10'
Datetime64 accepts a string with sub-second precision, e.g. '2019-12-16 20:50:10.66'

9. Arrays: Array(T)
Example: select array(1,2,3) as n

5. Table engines

A table engine determines:
1) How and where the data is stored
2) Concurrent data access
3) Which indexes can be used
4) Whether multithreaded request execution is possible
5) How data is replicated

1. TinyLog:
Stores each column as its own file on disk, with no indexes and no concurrency control. Suitable only for small tables with little data; of limited use in production, but handy for practice and testing.

2. Memory:
Very fast, but the data is lost when the server restarts.

3. MergeTree:
Lets you specify a partition key, a primary key, and a sort key:
create table t_order_mt(
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time  Datetime
 ) engine =MergeTree
 partition by toYYYYMMDD(create_time)
   primary key (id)
   order by (id,sku_id)

partition by sets the partition key; if omitted, the table has a single partition. Every batch of inserted data first forms its own temporary part and is not merged into any existing partition right away. Some time after the write (roughly 10-15 minutes later) ClickHouse merges the parts automatically; to force a merge, run: optimize table t_order_mt final

primary key sets the primary key; it is optional and does not have to be unique. order by sets the sort key and is required.
If no primary key is defined, the order by columns are used as the primary key.
The primary key must be a prefix of the order by columns: with order by (id, sku_id), the primary key must be id or (id, sku_id).
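
A minimal sketch of the write-then-merge behaviour described above (the sample rows are made up):

insert into t_order_mt values
(101,'sku_001',1000.00,'2020-06-01 12:00:00'),
(102,'sku_002',2000.00,'2020-06-01 11:00:00'),
(102,'sku_002',600.00,'2020-06-02 12:00:00');

-- the 2020-06-02 row lands in its own partition; each insert creates new temporary parts
select * from t_order_mt;

-- force the background merge instead of waiting for it
optimize table t_order_mt final;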

4. Secondary (data skipping) indexes
The feature was experimental in this version, so enable it first: set allow_experimental_data_skipping_indices=1;
If queries filter on total_amount, a secondary index on it can help; its job is to skip data while evaluating where conditions. The index clause looks like this (full DDL sketch below):
INDEX a total_amount TYPE minmax GRANULARITY 5
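
A DDL sketch showing where the INDEX clause goes (the table name t_order_mt2 is illustrative):

create table t_order_mt2(
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time  Datetime,
    INDEX a total_amount TYPE minmax GRANULARITY 5
 ) engine = MergeTree
 partition by toYYYYMMDD(create_time)
   primary key (id)
   order by (id, sku_id);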

5. Data lifecycle (TTL)
Column-level TTL:
  create table t_order_mt3(
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2)  TTL create_time+interval 10 SECOND,
    create_time  Datetime 
 ) engine =MergeTree
 partition by toYYYYMMDD(create_time)
   primary key (id)
   order by (id, sku_id)
Adding TTL create_time + interval 10 SECOND after a column definition makes that column expire 10 s after create_time.
Insert data and query immediately: the column values are still there.
After 10 s, run optimize table t_order_mt3 final: the expired column values are cleared.

Table-level TTL:
alter table t_order_mt3 MODIFY TTL create_time + INTERVAL 10 SECOND
After 10 s, run optimize table t_order_mt3 final: the expired rows are removed.

Supported time units:
- SECOND
- MINUTE
- HOUR
- DAY
- WEEK
- MONTH
- QUARTER
- YEAR

6. ReplacingMergeTree
Provides deduplication.
Notes:
    The order by columns act as the unique key for deduplication.
    Deduplication does not cross partitions.
    Deduplication only happens when parts are merged.
    Among duplicates, the row with the largest version-column value is kept.
    If the version values are equal, the last row in insertion order is kept.
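
A minimal DDL sketch (the table name t_order_rmt is illustrative; create_time is used as the version column here):

create table t_order_rmt(
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time  Datetime
 ) engine = ReplacingMergeTree(create_time)
 partition by toYYYYMMDD(create_time)
   primary key (id)
   order by (id, sku_id);

-- duplicates by (id, sku_id) within a partition are collapsed on merge,
-- keeping the row with the largest create_time
optimize table t_order_rmt final;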

7. SummingMergeTree
Provides pre-aggregation: engine =SummingMergeTree(total_amount) sums total_amount for rows that share the same sort key when parts merge.
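
A DDL sketch using the same column layout as above (the table name t_order_smt is illustrative):

create table t_order_smt(
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time  Datetime
 ) engine = SummingMergeTree(total_amount)
 partition by toYYYYMMDD(create_time)
   primary key (id)
   order by (id, sku_id);

-- when parts merge, rows with the same sort key (id, sku_id) are collapsed
-- and their total_amount values are summed
optimize table t_order_smt final;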

6. Dimension analysis (rollup / cube / totals)

1、select id , sku_id,sum(total_amount) from  t_order_mt group by id,sku_id with rollup;
with rollup: removes dimensions from right to left and adds a subtotal at each level.

2、select id , sku_id,sum(total_amount) from  t_order_mt group by id,sku_id with cube;
with cube: produces subtotals for all dimension combinations (removing dimensions right to left, then left to right).

3、select id , sku_id,sum(total_amount) from  t_order_mt group by id,sku_id with totals;
with totals: only adds the grand total.

7. Exporting data

clickhouse-client  --query    "select * from test.t_order_mt" --format CSVWithNames> ~/rs1.csv

Then use the sz command to download ~/rs1.csv to the Windows desktop for inspection.

8. Replicas

1. Start the ZooKeeper cluster and a second ClickHouse server.

2. On both servers, add the extra config file:
mkdir /etc/clickhouse-server/config.d
vim /etc/clickhouse-server/config.d/metrika.xml

<?xml version="1.0"?>
<yandex>
  <zookeeper-servers>
     <node index="1">
         <host>hadoop102</host>
         <port>2181</port>
     </node>
     <node index="2">
         <host>hadoop103</host>
         <port>2181</port>
     </node>
     <node index="3">
         <host>hadoop104</host>
         <port>2181</port>
     </node>
  </zookeeper-servers>
</yandex>

vim /etc/clickhouse-server/config.xml
<zookeeper incl="zookeeper-servers" optional="true" />
<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>

3. Create the table on both machines; note that the replica name (the second ReplicatedMergeTree argument) must differ between the two replicas:
create table rep_t_order_mt_0105 (
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time  Datetime
 ) engine =ReplicatedMergeTree('/clickhouse/tables/01/rep_t_order_mt_0105','rep_hdp1')
 partition by toYYYYMMDD(create_time)
   primary key (id)
   order by (id,sku_id);

create table rep_t_order_mt_0105 (
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time  Datetime
 ) engine =ReplicatedMergeTree('/clickhouse/tables/01/rep_t_order_mt_0105','rep_hdp2')
 partition by toYYYYMMDD(create_time)
   primary key (id)
   order by (id,sku_id);

4. That's it; the table can now be used normally (see the sketch below).
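
A quick way to verify replication; the hosts (hadoop102/hadoop103) and the sample row are illustrative:

-- on hadoop102
insert into rep_t_order_mt_0105 values
(101,'sku_001',1000.00,'2020-06-01 12:00:00');

-- on hadoop103: the row should appear once it has been replicated via ZooKeeper
select * from rep_t_order_mt_0105;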

9. Sharded cluster

1. Config file: vim /etc/clickhouse-server/config.d/metrika.xml
<yandex>
<clickhouse_remote_servers>
  <gmall_cluster>                        <!-- cluster name -->
    <shard>                              <!-- first shard of the cluster -->
      <internal_replication>true</internal_replication>
      <replica>                          <!-- first replica of this shard -->
        <host>hadoop102</host>
        <port>9000</port>
      </replica>
      <replica>                          <!-- second replica of this shard -->
        <host>hadoop103</host>
        <port>9000</port>
      </replica>
    </shard>

    <shard>                              <!-- second shard of the cluster -->
      <internal_replication>true</internal_replication>
      <replica>                          <!-- first replica of this shard -->
        <host>hadoop104</host>
        <port>9000</port>
      </replica>
    </shard>
  </gmall_cluster>
</clickhouse_remote_servers>

<zookeeper-servers>
  <node index="1">
    <host>hadoop102</host>
    <port>2181</port>
  </node>
  <node index="2">
    <host>hadoop103</host>
    <port>2181</port>
  </node>
  <node index="3">
    <host>hadoop104</host>
    <port>2181</port>
  </node>
</zookeeper-servers>

<macros>
  <shard>01</shard>                      <!-- differs per machine -->
  <replica>rep_1_1</replica>             <!-- differs per machine -->
</macros>

</yandex>

2. Distribute the config to all three machines and adjust the macros on each:
hadoop102        
<macros>
<shard>01</shard> 
<replica>rep_1_1</replica>
</macros>    

hadoop103
<macros>
<shard>01</shard> 
<replica>rep_1_2</replica>
</macros>    

hadoop104
<macros>
<shard>02</shard> 
<replica>rep_2_1</replica>
</macros>


3. Create the local replicated tables across the cluster:
    create table st_order_mt_0105 on cluster gmall_cluster (
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time  Datetime
 ) engine =ReplicatedMergeTree('/clickhouse/tables/{shard}/st_order_mt_0105','{replica}')
 partition by toYYYYMMDD(create_time)
   primary key (id)
   order by (id,sku_id);


4. Distributed table:
create table st_order_mt_0105_all on cluster gmall_cluster
(
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time  Datetime
)engine = Distributed(gmall_cluster,test0105, st_order_mt_0105,hiveHash(sku_id))

Distributed(cluster name, database, local table name, sharding key); the sharding key must be an integer value (see the usage sketch below).
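
A usage sketch (the sample rows are made up):

insert into st_order_mt_0105_all values
(201,'sku_001',1000.00,'2020-06-01 12:00:00'),
(202,'sku_002',2000.00,'2020-06-01 12:00:00'),
(203,'sku_004',2500.00,'2020-06-02 12:00:00');

-- queries the whole cluster through the distributed table
select count() from st_order_mt_0105_all;

-- queries only the shard stored on the current server
select count() from st_order_mt_0105;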

10. Writing client code

1. Add the dependency:
<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.1.55</version>
</dependency>

2. JDBC URL: jdbc:clickhouse://hadoop102:8123/test
Driver class: ru.yandex.clickhouse.ClickHouseDriver

Query example (Scala):
    import java.sql.{Connection, DriverManager, ResultSet}

    val address = "jdbc:clickhouse://hadoop102:8123/test"          // JDBC URL

    Class.forName("ru.yandex.clickhouse.ClickHouseDriver")         // load the driver

    val connection: Connection = DriverManager.getConnection(address) // open the connection

    val sql = "select * from t_order_mt"                           // SQL statement

    val resultSet: ResultSet = connection.createStatement().executeQuery(sql) // run the query

    println("id" + "\t" + "sku_id" + "\t" + "total_amount" + "\t" + "create_time") // print the header row
    while (resultSet.next()) {                                     // iterate over the rows
      val id: String = resultSet.getString("id")
      val sku_id: String = resultSet.getString("sku_id")
      val total_amount: String = resultSet.getString("total_amount")
      val create_time = resultSet.getString("create_time")

      println(id + "\t" + sku_id + "\t" + total_amount + "\t" + create_time) // print the row
    }

    connection.close()                                             // close the connection

11. Adding, dropping, and modifying (ALTER)

Column modifications:
alter TABLE data_web.dm_ftsp_cust_tag_ctd MODIFY COLUMN cust_id Nullable(String);

alter TABLE data_web.dm_ftsp_cust_tag_ctd ADD COLUMN major_accounts_num Nullable(Decimal(38, 4))  after Div5TailNum ;

alter TABLE test.ontime_wpp_t drop COLUMN TailNum2;


CREATE TABLE zhida.SRC_ZD_HQLM_SUM
(
"ID"  Nullable(VARCHAR2(255) ) ,
"SHOWID" VARCHAR2(255)  ,
"YEAR" VARCHAR2(255 )  ,
"ZHIDA_PEOPLE_NUM" Int32,
"ZHIDA_PEOPLE_MONEY" Nullable(Decimal(38, 2)) ,
"ZHIDA_ENT_NUM" Int32,
"ZHIDA_ENT_MONEY" Decimal(38, 2)
)
ENGINE = MergeTree()
PARTITION BY SHOWID
ORDER BY tuple()
SETTINGS index_granularity = 8192

(1) Add a column:

alter table [db.]table_name add column column_name [type] [default_expr] [after name_after]

type: the column type
default_expr: the default value
after name_after: place the new column after the named column

(2) Drop a column:

alter table [db.]table_name drop column column_name

(3) Modify a column:

alter table [db.]table_name modify column column_name [type] [default_expr]

(4) Drop a partition:

alter table [db.]table drop partition 'partition_name'
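
For example, for the t_order_mt table defined earlier (partitioned by toYYYYMMDD(create_time)), a partition name is a date value such as '20200601' (illustrative; depending on the version the value can also be written unquoted):

alter table t_order_mt drop partition '20200601';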

12. Create table statements

Create a table:

CREATE TABLE grow.dwd_grow_crm_wx_chatdata (
  `customer_id` Nullable(Int64) COMMENT 'referrer ID'
  , `customer_name` Nullable(String) COMMENT 'referrer name'
  , `introduce_company` Nullable(String) COMMENT 'referrer customer company'
  , `name` Nullable(String) COMMENT 'lead name (customer name)'
  , `created_at` Nullable(String) COMMENT 'lead creation time'
  , `gw_username` Nullable(String) COMMENT 'consultant name'
  , `gw_company_name` Nullable(String) COMMENT 'consultant organization'
  , `bill_id` Nullable(String) COMMENT 'poster ID'
  ) 
ENGINE = ReplacingMergeTree ORDER BY customer_id SETTINGS index_granularity = 8192

Add a column:
alter table event add column `$user_id` Nullable(String);

13. Date functions

SELECT
    toDateTime('2016-06-15 23:00:00') AS time,
    toDate(time) AS date_local,
    toDate(time, 'Asia/Yekaterinburg') AS date_yekat,
    toString(time, 'US/Samoa') AS time_samoa


now()                // 2020-04-01 17:25:40     current date and time
toYear()             // 2020                    year part of a date
toMonth()            // 4                       month part of a date
today()              // 2020-04-01              today's date
yesterday()          // 2020-03-31              yesterday's date
toDayOfYear()        // 92                      day of the year
toDayOfWeek()        // 3                       day of the week
toHour()             // 17                      hour part
toMinute()           // 25                      minute part
toSecond()           // 40                      second part
toStartOfYear()      // 2020-01-01              first day of the year
toStartOfMonth()     // 2020-04-01              first day of the month

formatDateTime(now(),'%Y-%m-%d')        // 2020-04-01          format with the given pattern
toYYYYMM()                              // 202004
toYYYYMMDD()                            // 20200401
toYYYYMMDDhhmmss()                      // 20200401172540
dateDiff()                              // difference between two dates/times (examples below)


SELECT
    toDateTime('2019-07-30 10:10:10') AS time,

    -- convert the DateTime to a Unix timestamp
    toUnixTimestamp(time) as unixTimestamp,

    -- keep only the date part / only the time-of-day part
    toDate(time) as date_local,
    toTime(time) as date_time,   -- moves the date to a fixed day while keeping the time part

    -- extract year and month
    toYear(time) as get_year,
    toMonth(time) as get_month,

    -- quarters: 1 (Jan-Mar), 2 (Apr-Jun), 3 (Jul-Sep), 4 (Oct-Dec)
    toQuarter(time) as get_quarter,
    toHour(time) as get_hour,
    toMinute(time) as get_minute,
    toSecond(time) as get_second,

    -- day of the year, day of the month, and day of the week of the DateTime
    toDayOfYear(time) as day_of_year,
    toDayOfMonth(time) as day_of_month,
    toDayOfWeek(time) as day_of_week,
    toDate(time, 'Asia/Shanghai') AS date_shanghai,
    toDateTime(time, 'Asia/Shanghai') AS time_shanghai,

    -- start of the year, month, quarter, day, hour, and minute
    toStartOfYear(time),
    toStartOfMonth(time),
    toStartOfQuarter(time),
    toStartOfDay(time) AS cur_start_daytime,
    toStartOfHour(time) as cur_start_hour,
    toStartOfMinute(time) AS cur_start_minute,

    -- sequential year / quarter numbers counted from a fixed point in the past
    toRelativeYearNum(time),
    toRelativeQuarterNum(time);

Future times
-- Option 1: explicit date values (watch the time zone)
WITH
    toDate('2019-09-09') AS date,
    toDateTime('2019-09-09 00:00:00') AS date_time
SELECT
    addYears(date, 1) AS add_years_with_date,
    addYears(date_time, 0) AS add_years_with_date_time;

-- Option 2: relative to the current local time
WITH
    toDate(now()) as date,
    toDateTime(now()) as date_time
SELECT
    now() as now_time,                                          -- current time
    -- 1 year later
    addYears(date, 1) AS add_years_with_date,
    addYears(date_time, 1) AS add_years_with_date_time,

    -- 1 month later
    addMonths(date, 1) AS add_months_with_date,
    addMonths(date_time, 1) AS add_months_with_date_time,

    -- 1 week later
    addWeeks(date, 1) AS add_weeks_with_date,
    addWeeks(date_time, 1) AS add_weeks_with_date_time,

    -- 1 day later
    addDays(date, 1) AS add_days_with_date,
    addDays(date_time, 1) AS add_days_with_date_time,

    -- 1 hour later
    addHours(date_time, 1) AS add_hours_with_date_time,

    -- 1 minute later
    addMinutes(date_time, 1) AS add_minutes_with_date_time,

    -- 10 seconds later
    addSeconds(date_time, 10) AS add_seconds_with_date_time,

    -- 1 quarter later
    addQuarters(date, 1) AS add_quarters_with_date,
    addQuarters(date_time, 1) AS add_quarters_with_date_time;

Past times
WITH
    toDate(now()) as date,
    toDateTime(now()) as date_time
SELECT
    subtractYears(date, 1) AS subtract_years_with_date,
    subtractYears(date_time, 1) AS subtract_years_with_date_time,
    subtractQuarters(date, 1) AS subtract_Quarters_with_date,
    subtractQuarters(date_time, 1) AS subtract_Quarters_with_date_time,
    subtractMonths(date, 1) AS subtract_Months_with_date,
    subtractMonths(date_time, 1) AS subtract_Months_with_date_time,
    subtractWeeks(date, 1) AS subtract_Weeks_with_date,
    subtractWeeks(date_time, 1) AS subtract_Weeks_with_date_time,
    subtractDays(date, 1) AS subtract_Days_with_date,
    subtractDays(date_time, 1) AS subtract_Days_with_date_time,
    subtractHours(date_time, 1) AS subtract_Hours_with_date_time,
    subtractMinutes(date_time, 1) AS subtract_Minutes_with_date_time,
    subtractSeconds(date_time, 1) AS subtract_Seconds_with_date_time;

SELECT toDate('2019-07-31', 'Asia/GuangZhou') as date_guangzhou;
SELECT toDate('2019-07-31'), toDate('2019-07-31', 'Asia/Beijing') as date_beijing;

-- note: 'Asia/GuangZhou' and 'Asia/Beijing' above are not valid tz database names; only Asia/Shanghai loads
SELECT toDateTime('2019-07-31 10:10:10', 'Asia/Shanghai') as date_shanghai;


Time differences
-- Option 1: difference between two explicitly specified times
WITH
    toDateTime('2019-07-30 10:10:10', 'Asia/Shanghai') as date_shanghai_one,
    toDateTime('2020-10-31 11:20:30', 'Asia/Shanghai') as date_shanghai_two
SELECT
    dateDiff('year', date_shanghai_one, date_shanghai_two) as diff_years,
    dateDiff('month', date_shanghai_one, date_shanghai_two) as diff_months,
    dateDiff('week', date_shanghai_one, date_shanghai_two) as diff_week,
    dateDiff('day', date_shanghai_one, date_shanghai_two) as diff_days,
    dateDiff('hour', date_shanghai_one, date_shanghai_two) as diff_hours,
    dateDiff('minute', date_shanghai_one, date_shanghai_two) as diff_minutes,
    dateDiff('second', date_shanghai_one, date_shanghai_two) as diff_seconds;


-- Option 2: relative to the current local time
WITH
    now() as date_time
SELECT
    dateDiff('year', date_time, addYears(date_time, 1)) as diff_years,
    dateDiff('month', date_time, addMonths(date_time, 2)) as diff_months,
    dateDiff('week', date_time, addWeeks(date_time, 3)) as diff_week,
    dateDiff('day', date_time, addDays(date_time, 3)) as diff_days,
    dateDiff('hour', date_time, addHours(date_time, 3)) as diff_hours,
    dateDiff('minute', date_time, addMinutes(date_time, 30)) as diff_minutes,
    dateDiff('second', date_time, addSeconds(date_time, 35)) as diff_seconds;

14. Useful commands and queries

Disk usage per database:

SELECT database, formatReadableSize(sum(bytes_on_disk)) AS on_disk FROM system.parts GROUP BY database

Result:
data_web    968.47 MiB
test_log    9.34 KiB
grow    69.73 MiB
system    2.37 GiB
myb_test    22.29 KiB
ftsp_product    1.53 GiB

Disk usage of the configured disks:

SELECT 
name
,path
,formatReadableSize(free_space) AS free
,formatReadableSize(total_space) AS total
,formatReadableSize(keep_free_space) AS reserved
FROM system.disks

Result:
default    /var/lib/clickhouse/    359.88 GiB    492.03 GiB    0.00 B

Current connection counts:

SELECT * FROM system.metrics WHERE metric LIKE '%Connection';

Result:
TCPConnection    0    Number of connections to TCP server (clients with native interface), also included server-server distributed query connections
MySQLConnection    0    Number of client connections using MySQL protocol
HTTPConnection    1    Number of connections to HTTP server
InterserverConnection    0    Number of connections from other replicas to fetch parts

Currently running queries:

SELECT query_id, user, address, query FROM system.processes ORDER BY query_id;

Kill a query:

KILL QUERY WHERE query_id = 'ff695827-dbf5-45ad-9858-a853946ea140';

Query the query log (top 10 queries by duration):

SELECT 
user
, client_hostname AS host
, client_name AS client
, formatDateTime(query_start_time, '%T') AS started
, query_duration_ms / 1000 AS sec
, round(memory_usage / 1048576) AS MEM_MB
, result_rows AS RES_CNT
, result_bytes / 1048576 AS RES_MB
, read_rows AS R_CNT
, round(read_bytes / 1048576) AS R_MB
, written_rows AS W_CNT
, round(written_bytes / 1048576) AS W_MB
, query 
FROM system.query_log 
WHERE type = 2 
ORDER BY query_duration_ms DESC 
LIMIT 10
;

15. Taking one row per group, ordered by a field (groupArray + ARRAY JOIN)

select num,arr_time, day_, cust_phone
            from (
                select day_, cust_phone
                ,groupArray(update_date) arr_time
                from(
                    select update_date, day_, staff_group, staff_name, staff_no, cust_phone, call_type, total_time
                    from (
                        select
                        update_date
                        ,formatDateTime(update_date, '%Y-%m-%d') as day_
                        ,case when staff_id = 'h0000000000000411023751335247872' then '财税五组'
                        when staff_id = 'h0000000000000447602747361484800' then '财税五组'
                        when staff_id = 'h0000000000000449467198091747328' then '财税五组'
                        when staff_id = 'h0000000000000479477792350912512' then '财税五组'
                        when staff_id = 'h0000000000000480904551738851328' then '财税五组'
                        when staff_id = 'h0000000000000462052726800130048' then '财税五组'
                        end as staff_group
                        ,case when staff_id = 'h0000000000000411023751335247872' then '李葳'
                        when staff_id = 'h0000000000000447602747361484800' then '秦文娇'
                        when staff_id = 'h0000000000000449467198091747328' then '王美玲'
                        when staff_id = 'h0000000000000479477792350912512' then '曲英华'
                        when staff_id = 'h0000000000000480904551738851328' then '刘东洋'
                        when staff_id = 'h0000000000000462052726800130048' then '孙柳芳'
                        end as staff_name
                        ,case when staff_id = 'h0000000000000411023751335247872' then '1022915'
                        when staff_id = 'h0000000000000447602747361484800' then '1027435'
                        when staff_id = 'h0000000000000449467198091747328' then '1027612'
                        when staff_id = 'h0000000000000479477792350912512' then '1030854'
                        when staff_id = 'h0000000000000480904551738851328' then '1030966'
                        when staff_id = 'h0000000000000462052726800130048' then '1029271'
                        end as staff_no
                        ,cust_phone
                        ,case when total_time > 0 then '无效'
                        else '未接通'
                        end as call_type
                        ,total_time
                        from ftsp_product.grow_call_record
                        where staff_id in (
                        'h0000000000000411023751335247872',
                        'h0000000000000447602747361484800',
                        'h0000000000000449467198091747328',
                        'h0000000000000479477792350912512',
                        'h0000000000000480904551738851328',
                        'h0000000000000462052726800130048'
                        )
                        and total_time < 30
                        and update_date is not null
                    )
                    order by update_date  -- sort field
                )
                group by day_,cust_phone  -- dedup keys
            )
            ARRAY JOIN
            arr_time,
            arrayEnumerate(arr_time) as num
            where num = 1  -- take the first row of each group