Data Sharding - Sharding Algorithm Design - 《Mycat2权威指南》 (The Definitive Guide to Mycat2)

author:chenjunwen 2022-2-10

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

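Basic Form of a Sharding Algorithm

A sharding algorithm can be described as
filter condition in the SQL -> partitions
that is,
condition->Partition[]

condition is a predicate in the SQL statement on a given logical table; it holds for every row the statement reads from that logical table.
Partition[] is an array of partitions, where a partition is the triple (database instance, physical schema, physical table).

When no condition applies, all partitions are returned.
When a condition applies, a single partition or a subset of the partitions is returned.

Whatever is returned, one partition or several, must be a subset of the full partition set. If your design needs to break this rule, ask via QQ: 294712221.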
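The condition->Partition[] contract can be sketched as follows. This is a minimal illustration, not Mycat2 code: the class BasicRoutingSketch, the instance name "c0", the schema "db1", the "t_" table prefix and the modulo rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical router: maps an optional equality condition to partitions. */
final class BasicRoutingSketch {

    /** A partition is the triple (instance, physical schema, physical table). */
    static final class Part {
        final String target;
        final String schema;
        final String table;

        Part(String target, String schema, String table) {
            this.target = target;
            this.schema = schema;
            this.table = table;
        }

        @Override
        public String toString() {
            return target + "." + schema + "." + table;
        }
    }

    private final List<Part> all = new ArrayList<>();

    /** Builds tableCount partitions on one assumed instance "c0" and schema "db1". */
    BasicRoutingSketch(int tableCount) {
        for (int i = 0; i < tableCount; i++) {
            all.add(new Part("c0", "db1", "t_" + i));
        }
    }

    /** equalValue == null stands for "no usable condition on the sharding column". */
    List<Part> route(Long equalValue) {
        if (equalValue == null) {
            return Collections.unmodifiableList(all); // no condition: every partition
        }
        long value = equalValue;
        int i = (int) Math.floorMod(value, (long) all.size());
        return Collections.singletonList(all.get(i)); // always a subset of "all"
    }
}
```

With four partitions, route(null) returns all four, while route(6L) returns only c0.db1.t_2.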

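index is the global partition index.
tableIndex is the physical table index.
dbIndex is the physical schema index.
targetIndex is the storage node index; it is always less than the number of storage nodes.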

Common Index Layouts

Single instance, one schema, sharded tables

targetIndex	dbIndex	tableIndex	index
0	0	0	0
0	0	1	1
0	0	2	2
0	0	3	3

Sharded instances, one table per schema

targetIndex	dbIndex	index
0	0	0
1	1	1
2	2	2
3	3	3

Single instance, sharded schemas and tables (schema and table sharded on different columns)

dbIndex	tableIndex	index
0	0	0
0	1	1
1	0	2
1	1	3

dbIndex	tableIndex	index
0	0	0
1	1	1
0	0	2
1	1	3

Single instance, sharded schemas and tables (schema and table sharded on the same column)

dbIndex	tableIndex	index
0	0	0
1	1	1
0	2	2
1	3	3

Multiple instances, sharded schemas and tables (schema and table sharded on different columns)

targetIndex	dbIndex	tableIndex	index
0	0	0	0
0	0	1	1
1	1	0	2
1	1	1	3
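Two of the layouts above follow simple arithmetic, sketched here. The formulas are inferred from the tables, not taken from Mycat2 source, and the class IndexLayoutSketch and its method names are hypothetical. Each row is {dbIndex, tableIndex, index}.

```java
/** Hypothetical helper that reproduces two of the index layouts shown in the tables. */
final class IndexLayoutSketch {

    /**
     * Different sharding columns for schema and table:
     * tableIndex restarts at 0 in every schema, index = dbIndex * tablesPerDb + tableIndex.
     */
    static int[][] independentKeys(int dbCount, int tablesPerDb) {
        int[][] rows = new int[dbCount * tablesPerDb][];
        for (int db = 0; db < dbCount; db++) {
            for (int t = 0; t < tablesPerDb; t++) {
                int index = db * tablesPerDb + t;
                rows[index] = new int[] {db, t, index};
            }
        }
        return rows;
    }

    /**
     * Same sharding column for schema and table:
     * tableIndex equals the global index, dbIndex = index % dbCount.
     */
    static int[][] sharedKey(int dbCount, int tableCount) {
        int[][] rows = new int[tableCount][];
        for (int index = 0; index < tableCount; index++) {
            rows[index] = new int[] {index % dbCount, index, index};
        }
        return rows;
    }
}
```

independentKeys(2, 2) yields the rows (0,0,0), (0,1,1), (1,0,2), (1,1,3); sharedKey(2, 4) yields (0,0,0), (1,1,1), (0,2,2), (1,3,3).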

Range partitioning + HASH algorithm

user_id must be strictly increasing over the whole product lifecycle.
if (user_id >= 0 && user_id < 1000_0000) {
  schema = "db1";
  table = "t_" + user_id % 16;
} else if (user_id >= 1000_0000 && user_id < 2000_0000) {
  schema = "db2";
  table = "t_" + user_id % 16;
} else ... {
}
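A runnable version of the range + hash rule might look like the sketch below. It assumes every range is 1000_0000 ids wide, that schemas continue as db3, db4 and so on, and that tables use the "t_" prefix; the class RangeHashSketch is hypothetical.

```java
/** Hypothetical range + hash router: the range picks the schema, the hash picks the table. */
final class RangeHashSketch {
    static final long RANGE_SIZE = 1000_0000L; // ids per schema (assumed equal for all ranges)
    static final int TABLES_PER_SCHEMA = 16;

    /** Returns "schema.table" for a non-negative user_id. */
    static String route(long userId) {
        if (userId < 0) {
            throw new IllegalArgumentException("user_id must be non-negative: " + userId);
        }
        long schemaNo = userId / RANGE_SIZE + 1;    // [0, 1000_0000) -> db1, next range -> db2
        long tableNo = userId % TABLES_PER_SCHEMA;  // hash step inside the schema
        return "db" + schemaNo + ".t_" + tableNo;
    }
}
```

Because user_id only grows, old ranges stop receiving inserts and new schemas can be added without moving existing rows.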

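Name mapping

select * from info where company = "B" and code = '1';
targetName = "B"
table = "info_"+code;
is rewritten to
select * from info_1 where company = "B" and code = '1';
and sent to the data source named B.

Name mapping itself does not depend on indexes, but an index is usually still configured to define the order of the partitions.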

Name mapping + HASH algorithm

select * from info where company = "B" and company_id = 1;
targetName = "B"
table = "info_"+company_id %16
is rewritten to
select * from info_1 where company = "B" and company_id = 1;
and sent to the data source named B.
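The name mapping + HASH rewrite can be sketched as below. The class NameMappingSketch and its Route holder are hypothetical, and the SQL is built by string concatenation purely for illustration; a real router rewrites the parsed statement instead.

```java
/** Hypothetical name-mapping router: company names the data source, company_id picks the table. */
final class NameMappingSketch {

    /** Result of routing: the target data source and the rewritten statement. */
    static final class Route {
        final String targetName;
        final String sql;

        Route(String targetName, String sql) {
            this.targetName = targetName;
            this.sql = sql;
        }
    }

    static Route route(String company, long companyId) {
        // floorMod keeps the suffix in 0..15 even for negative ids
        String table = "info_" + Math.floorMod(companyId, 16L);
        String sql = "select * from " + table
                + " where company = \"" + company + "\" and company_id = " + companyId + ";";
        return new Route(company, sql); // the data source is named after the company value
    }
}
```

For company "B" and company_id 1 this produces the statement on info_1 shown above, targeted at data source B.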

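Sharding Table Types

Sharded instances, single table
Data is stored in multiple databases. Each shard database holds exactly one physical table; the physical tables hold different, non-overlapping data, and together the tables of all shard databases make up the complete sharded table.
When schema names map one-to-one to instance names, this is the "sharded schemas, single table" layout.

Single instance, sharded tables
A single database stores all the data of a sharded table; the physical tables inside that shard database hold different, non-overlapping data.
When schema names map one-to-one to instance names, this is the "single schema, sharded tables" layout.

Sharded instances, sharded tables
Data is stored in multiple databases, and each shard database holds several physical tables. The physical tables hold different, non-overlapping data, and together the tables of all shard databases make up the complete sharded table. When schema names map one-to-one to instance names, this is the "sharded schemas, sharded tables" layout.
Typically the physical tables under each database are given the same data distribution, chosen according to the patterns in the business data.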

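Partition

public interface Partition extends java.lang.Comparable<Partition> {
    String getTargetName();// target (data source)
    String getSchema();// physical schema
    String getTable();// physical table
    Integer getDbIndex();// physical schema index
    Integer getTableIndex();// physical table index
    Integer getIndex();// global table (partition) index
}

By default, partitions are ordered by the global index.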
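Ordering by the global index could be implemented as in this sketch. SimplePartition is a hypothetical stand-in that does not implement the real Partition interface, and sortedTables is a helper added only for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical partition value object ordered by its global index. */
final class SimplePartition implements Comparable<SimplePartition> {
    final String targetName;
    final String schema;
    final String table;
    final int index;

    SimplePartition(String targetName, String schema, String table, int index) {
        this.targetName = targetName;
        this.schema = schema;
        this.table = table;
        this.index = index;
    }

    @Override
    public int compareTo(SimplePartition other) {
        return Integer.compare(index, other.index); // the global index decides the order
    }

    /** Returns "schema.table" names in global-index order. */
    static List<String> sortedTables(SimplePartition... parts) {
        List<SimplePartition> copy = new ArrayList<>(Arrays.asList(parts));
        Collections.sort(copy);
        List<String> names = new ArrayList<>();
        for (SimplePartition p : copy) {
            names.add(p.schema + "." + p.table);
        }
        return names;
    }
}
```

A stable order like this lets partition lists from different conditions be compared and intersected predictably.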

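Calculation Methods

io.mycat.router.CustomRuleFunction

public Partition calculateOne(Map<String, RangeVariable> values)
// key: column name; value: condition type plus the concrete value(s)

Used for inserts: computes exactly one partition from the conditions.

public abstract List<Partition> calculate(Map<String, RangeVariable> values)
// key: column name; value: condition type plus the concrete value(s)

General-purpose calculation: computes one or more partitions from the conditions.
When values is empty, it must return all partitions.

public class RangeVariable {
    private final RangeVariableType operator;
    private final Object value;// value for equality, or smallOne
    private Object optionValue = null;// bigOne
    private String columnName;
}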

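Query Condition Types

public enum RangeVariableType {
    EQUAL,// =
    RANGE,// between: smallOne <= sharding column and sharding column <= bigOne
    GTE,// >=: smallOne <= sharding column
    GT,// >: smallOne < sharding column
    LTE,// <=: sharding column <= bigOne
    LT// <: sharding column < bigOne
}

smallOne and bigOne are constants taken from the SQL.
HASH and rule-based sharding implement EQUAL and RANGE.
GTE, GT, LTE and LT are optional (expected in 1.22).

Once EQUAL is implemented, the handling of OR and IN, range pruning and partition intersection is done by the optimizer; users do not need to deal with them.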
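How EQUAL and RANGE might be evaluated for a modulo hash is sketched below. ModCalculateSketch and its Op enum are hypothetical stand-ins for CustomRuleFunction and RangeVariableType, and enumerating short ranges is one common approach, not necessarily what Mycat2 does. A null op stands for an empty values map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical modulo sharding: returns the global partition indexes hit by a condition. */
final class ModCalculateSketch {
    enum Op { EQUAL, RANGE }

    static List<Integer> calculate(Op op, Long small, Long big, int partitionCount) {
        TreeSet<Integer> hit = new TreeSet<>();
        if (op == null) {
            addAll(hit, partitionCount);             // no condition: all partitions
        } else if (op == Op.EQUAL) {
            long value = small;
            hit.add((int) Math.floorMod(value, (long) partitionCount));
        } else {                                     // RANGE: small <= column <= big
            long lo = small;
            long hi = big;
            // overflow of hi - lo for extreme bounds is not handled in this sketch
            if (hi - lo + 1 >= partitionCount) {
                addAll(hit, partitionCount);         // wide range wraps around every partition
            } else {
                for (long v = lo; v <= hi; v++) {    // short range: enumerate each value
                    hit.add((int) Math.floorMod(v, (long) partitionCount));
                }
            }
        }
        return new ArrayList<>(hit);                 // ascending global index order
    }

    private static void addAll(TreeSet<Integer> hit, int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            hit.add(i);
        }
    }
}
```

With four partitions, EQUAL 6 hits index 2, RANGE 6..7 hits indexes 2 and 3, and RANGE 0..100 hits all four.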

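public boolean isShardingPartitionKey(String name)

Whether the column is a partition key.

public boolean isSameDistribution(CustomRuleFunction customRuleFunction)

Whether this algorithm has the same data distribution as the algorithm passed in; used to detect ER relationships.

public boolean isSameTargetFunctionDistribution(CustomRuleFunction customRuleFunction)

Whether both use the same target (database instance) mapping function, for example the same schema function but different table functions.

public boolean isSameTableFunctionDistribution(CustomRuleFunction customRuleFunction)

Whether both use the same table mapping function, for example different schema functions but the same table function.

public boolean isSameDbFunctionDistribution(CustomRuleFunction customRuleFunction) {
    return false;
}

Whether both use the same database (schema) mapping function, for example the same schema function but different table functions.

public abstract String getErUniqueID();

Returns the ID that identifies the sharding algorithm (provisional).

public boolean isAllPartitionInTargetName(String targetName)

Whether every partition of this algorithm lives in the given targetName, that is, whether this is a single-instance sharded table.

public abstract ShardingTableType getShardingTableType();

Returns the sharding table type.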

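Key checks

public abstract boolean isShardingDbKey(String name);// whether the column is a schema sharding key
public abstract boolean isShardingTableKey(String name);// whether the column is a table sharding key
public abstract boolean isShardingTargetKey(String name);// whether the column is a target mapping key

public abstract int requireShardingKeyCount()

How many sharding keys are needed to resolve a hit to a single partition (physical table).

public abstract boolean requireShardingKeys(Set<String> shardingKeys);

Whether the given columns are enough to compute a single partition.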
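One plausible reading of these key checks is sketched below: a single partition can be resolved only when every schema key and every table key is supplied. ShardingKeySketch is hypothetical, it ignores target keys and column-name case, and the real semantics in Mycat2 may differ.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical key bookkeeping for an algorithm with separate schema and table keys. */
final class ShardingKeySketch {
    private final Set<String> dbKeys;
    private final Set<String> tableKeys;
    private final Set<String> allKeys = new HashSet<>();

    ShardingKeySketch(Set<String> dbKeys, Set<String> tableKeys) {
        this.dbKeys = new HashSet<>(dbKeys);
        this.tableKeys = new HashSet<>(tableKeys);
        allKeys.addAll(dbKeys);
        allKeys.addAll(tableKeys);
    }

    boolean isShardingDbKey(String name) {
        return dbKeys.contains(name);
    }

    boolean isShardingTableKey(String name) {
        return tableKeys.contains(name);
    }

    /** Number of distinct columns needed to pin down one partition. */
    int requireShardingKeyCount() {
        return allKeys.size();
    }

    /** True when the supplied columns cover every schema key and table key. */
    boolean requireShardingKeys(Set<String> shardingKeys) {
        return shardingKeys.containsAll(allKeys);
    }
}
```

With schema key company and table key company_id, a condition on company alone is not enough to resolve one partition; both columns are required.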

public Partition getPartition(int index)

Returns the partition for the given global table (partition) index.