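Types of Redis keys

On the surface, a key can be:

a string, e.g. set str lang
an integer, e.g. set 100 100
a floating-point number, e.g. set 0.1 100

Under the hood, however, all of these keys are stored as strings.

The Redis string type

Redis implements strings with its own type, SDS (simple dynamic string).

Why not use native C strings?

Redis is written in C, where a string is a character array terminated by \0, for example: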

char data[] = "hello";  /* the compiler appends the terminating \0 */

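Redis does not use plain C strings, though; it implements SDS instead, for the following reasons.

A binary-safe data structure

A C string ends at the first \0. Redis is middleware that exchanges data with clients written in many languages (Java, C, PHP, and so on), and the data they send may itself contain \0 bytes. Reading such data as a C string goes wrong, for example: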

"Hello\0World"
With a C string, reading stops after Hello. SDS records the length explicitly, so bytes after an embedded \0 are not lost.
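The difference can be seen in a few lines of C. This is only a sketch: byte_view and the helper functions are hypothetical names made up for illustration, not part of the SDS API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-tracking view of a byte buffer (not the real SDS API). */
typedef struct {
    size_t len;        /* number of payload bytes, embedded \0 included */
    const char *data;  /* payload */
} byte_view;

/* Length as the C library sees it: counting stops at the first \0. */
static size_t c_string_length(const char *s) {
    return strlen(s);
}

/* Length as a length-tracking string sees it: whatever was recorded. */
static size_t tracked_length(byte_view v) {
    return v.len;
}

/* Build a view over the 11 payload bytes of "Hello\0World". */
static byte_view hello_world_view(void) {
    static const char payload[] = "Hello\0World"; /* 11 bytes + implicit \0 */
    byte_view v = { sizeof(payload) - 1, payload };
    assert(v.len == 11);
    return v;
}
```

Calling c_string_length on the payload reports 5 bytes, while tracked_length reports all 11.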

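In addition, SDS preallocates memory.

As a result, it grows differently from a plain C buffer. Suppose we want:

char buf[] = "Hello"  -> "HelloWorld"

C would only grow the array from 5 to 10 characters, whereas SDS allocates (len + addLen) * 2, i.e. (5 + 5) * 2 = 20.

Finally, SDS still appends a \0 to the end of the string so that it stays compatible with the C library functions.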

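SDS in detail

Basic structure

char buf[] holds the data;
len holds the current length of the string;
free holds the number of unused bytes left in the array.

In the example above, going from Hello to HelloWorld gives:

// total capacity after growing is 20
char buf[] = "HelloWorld"
len: 10
free: 10

The next time the string is modified, SDS first checks whether the length to be appended fits within free. If it does, the data is appended in place and no further reallocation is needed.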
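The append path described above can be sketched as follows. toy_sds, toy_new and toy_cat are hypothetical names; the growth rule is the (len + addLen) * 2 formula from this article, and the real sds.c handles more cases than this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pre-3.2-style string: header fields plus a heap buffer. */
typedef struct {
    size_t len;   /* bytes in use */
    size_t free;  /* unused bytes available after len */
    char *buf;    /* len + free + 1 bytes; always \0-terminated */
} toy_sds;

/* Create a string holding exactly init, with no spare room. */
static toy_sds toy_new(const char *init) {
    toy_sds s;
    s.len = strlen(init);
    s.free = 0;
    s.buf = malloc(s.len + 1);
    if (s.buf == NULL) abort();
    memcpy(s.buf, init, s.len + 1);
    return s;
}

/* Append t; reallocate only when t does not fit in the free space. */
static void toy_cat(toy_sds *s, const char *t) {
    size_t add = strlen(t);
    if (add > s->free) {
        size_t cap = (s->len + add) * 2;      /* (len + addLen) * 2 */
        char *p = realloc(s->buf, cap + 1);   /* +1 for the trailing \0 */
        if (p == NULL) abort();
        s->buf = p;
        s->free = cap - s->len;
    }
    memcpy(s->buf + s->len, t, add + 1);      /* copies the trailing \0 too */
    s->len += add;
    s->free -= add;
    assert(s->buf[s->len] == '\0');
}
```

Starting from toy_new("Hello"), one toy_cat with "World" leaves len at 10 and free at 10; a following short append reuses the same buffer.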

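Version differences

Before version 3.2, the header looked like this:

struct sdshdr {
    int len;
    int free;
    char buf[];
};

sds stands for simple dynamic string, and hdr for header.

This layout wastes a lot of space: each int takes 4 bytes and can represent lengths of up to about two billion, but the char array rarely holds anywhere near that much data.

Since 3.2, Redis picks one of several sds header types according to the actual size of the data: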

/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
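How a header might be chosen from the string size can be sketched like this. The thresholds follow directly from the width of each len field; hdr_kind, pick_header and header_bytes are hypothetical names, and the real sds.c has additional special cases that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the five header layouts shown above. */
typedef enum { HDR_5, HDR_8, HDR_16, HDR_32, HDR_64 } hdr_kind;

/* Pick the smallest header whose length field can hold string_size. */
static hdr_kind pick_header(uint64_t string_size) {
    if (string_size < (1u << 5))  return HDR_5;   /* 5 bits inside flags */
    if (string_size < (1u << 8))  return HDR_8;   /* uint8_t len */
    if (string_size < (1u << 16)) return HDR_16;  /* uint16_t len */
    if (string_size < (UINT64_C(1) << 32)) return HDR_32;  /* uint32_t len */
    return HDR_64;
}

/* Bytes of header in front of buf[] for each layout (structs are packed). */
static size_t header_bytes(hdr_kind k) {
    switch (k) {
    case HDR_5:  return 1;           /* flags only */
    case HDR_8:  return 1 + 1 + 1;   /* len + alloc + flags */
    case HDR_16: return 2 + 2 + 1;
    case HDR_32: return 4 + 4 + 1;
    case HDR_64: return 8 + 8 + 1;
    }
    assert(!"unreachable");
    return 0;
}
```

A 100-byte string therefore pays 3 bytes of header instead of the 8 bytes of the old two-int layout.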

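Notes on growth

Once a string reaches 1 MB, its capacity is no longer doubled; each growth adds only 1 MB of spare space.
The hash table (dict) that holds the keys grows incrementally: resizing and moving the data do not happen in a single step; entries are migrated gradually (progressive rehashing).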
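The 1 MB rule for strings can be written as a small function. This is a sketch under the assumptions stated in this article; TOY_MAX_PREALLOC and grown_capacity are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PREALLOC ((size_t)1024 * 1024)  /* 1 MB, as in the text */

/* Capacity to allocate when a string of length len grows by add_len bytes. */
static size_t grown_capacity(size_t len, size_t add_len) {
    size_t needed = len + add_len;
    assert(needed >= len);               /* reject size_t overflow */
    if (needed < TOY_MAX_PREALLOC)
        return needed * 2;               /* small strings: double */
    return needed + TOY_MAX_PREALLOC;    /* large strings: add 1 MB of slack */
}
```

For the earlier Hello example, grown_capacity(5, 5) gives 20.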

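How keys and values are stored

Like a HashMap, Redis uses an array plus linked lists to store large amounts of data.

Key -> Hash(Key) % table length -> index -> on a hash collision, the entries at that index form a linked list.

Every key-value pair is a dictEntry: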

typedef struct dictEntry {
    // key points to an sds
    void *key;

    // the value is a union; only one of its fields is in use at a time
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;

    // next points to the next dictEntry, forming a linked list
    struct dictEntry *next;
} dictEntry;
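The array-plus-chain idea can be sketched with a toy table. Everything here (toy_entry, toy_dict, the djb2 hash, the fixed four slots) is a simplification chosen for illustration; Redis uses a different hash function and resizes its table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified entry: string key, integer value, chain pointer. */
typedef struct toy_entry {
    char *key;
    long val;
    struct toy_entry *next;
} toy_entry;

#define TOY_SLOTS 4  /* deliberately tiny so that collisions happen */

typedef struct {
    toy_entry *table[TOY_SLOTS];
} toy_dict;

/* Illustrative string hash (djb2); Redis uses a different function. */
static unsigned long toy_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the entry for key, or NULL if it is absent. */
static toy_entry *toy_find(toy_dict *d, const char *key) {
    assert(d != NULL && key != NULL);
    toy_entry *e = d->table[toy_hash(key) % TOY_SLOTS];
    while (e != NULL && strcmp(e->key, key) != 0) e = e->next;
    return e;
}

/* Insert or overwrite; a new entry is pushed at the head of its chain. */
static void toy_set(toy_dict *d, const char *key, long val) {
    toy_entry *e = toy_find(d, key);
    if (e != NULL) { e->val = val; return; }
    size_t idx = toy_hash(key) % TOY_SLOTS;
    e = malloc(sizeof(*e));
    if (e == NULL) abort();
    e->key = malloc(strlen(key) + 1);
    if (e->key == NULL) abort();
    strcpy(e->key, key);
    e->val = val;
    e->next = d->table[idx];
    d->table[idx] = e;
}
```

With only four slots, storing six keys forces at least two of them into the same chain, and toy_find still retrieves each one by walking next.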

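Types of Redis values

Every value is wrapped in a redisObject, which the val pointer in dictEntry points to.

typedef struct redisObject {
    // type is the data type; it restricts which commands apply, e.g. LPUSH on a key holding a string fails with a WRONGTYPE error
    unsigned type:4;

    // encoding is the internal representation, e.g. a value that looks like a string may actually be encoded as int or embstr
    unsigned encoding:4;

    // LRU
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */

    // reference count, used to free the object once nothing refers to it
    int refcount;

    // points to the actual data
    void *ptr;
} robj;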

Actual encoding

For String values the underlying encoding varies: it can be int, embstr, or raw, depending on the content and length of the value.

For example:

set str aString

object encoding str

// aString is encoded as embstr

set aNumber 10

object encoding aNumber

// 10 is encoded as int

set longString aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

object encoding longString

// the long string of a's is encoded as raw, i.e. a plain sds

Note that this is the encoding of the value, not of the key.

The source code that applies the int encoding is as follows:

/* Try to encode a string object in order to save space */
robj *tryObjectEncoding(robj *o) {
    long value;
    sds s = o->ptr;
    size_t len;

    /* Make sure this is a string object, the only type we encode
     * in this function. Other types use encoded memory efficient
     * representations but are handled by the commands implementing
     * the type. */
    serverAssertWithInfo(NULL,o,o->type == OBJ_STRING);

    /* We try some specialized encoding only for objects that are
     * RAW or EMBSTR encoded, in other words objects that are still
     * in represented by an actually array of chars. */
    if (!sdsEncodedObject(o)) return o;

    /* It's not safe to encode shared objects: shared objects can be shared
     * everywhere in the "object space" of Redis and may end in places where
     * they are not handled. We handle them only as values in the keyspace. */
     if (o->refcount > 1) return o;

    /* Check if we can represent this string as a long integer.
     * Note that we are sure that a string larger than 20 chars is not
     * representable as a 32 nor 64 bit integer. */
    len = sdslen(s);
    // if the length is at most 20 and the string parses as an integer, switch to the int encoding
    if (len <= 20 && string2l(s,len,&value)) {
        /* This object is encodable as a long. Try to use a shared object.
         * Note that we avoid using shared integers when maxmemory is used
         * because every object needs to have a private LRU field for the LRU
         * algorithm to work well. */
        if ((server.maxmemory == 0 ||
            !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&
            value >= 0 &&
            value < OBJ_SHARED_INTEGERS)
        {
            decrRefCount(o);
            incrRefCount(shared.integers[value]);
            return shared.integers[value];
        } else {
            if (o->encoding == OBJ_ENCODING_RAW) sdsfree(o->ptr);
            o->encoding = OBJ_ENCODING_INT;
            o->ptr = (void*) value;
            return o;
        }
    }

    /* If the string is small and is still RAW encoded,
     * try the EMBSTR encoding which is more efficient.
     * In this representation the object and the SDS string are allocated
     * in the same chunk of memory to save space and cache misses. */
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
        robj *emb;

        if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
        emb = createEmbeddedStringObject(s,sdslen(s));
        decrRefCount(o);
        return emb;
    }

    /* We can't encode the object...
     *
     * Do the last try, and at least optimize the SDS string inside
     * the string object to require little space, in case there
     * is more than 10% of free space at the end of the SDS string.
     *
     * We do that only for relatively large strings as this branch
     * is only entered if the length of the string is greater than
     * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. */
    if (o->encoding == OBJ_ENCODING_RAW &&
        sdsavail(s) > len/10)
    {
        o->ptr = sdsRemoveFreeSpace(o->ptr);
    }

    /* Return the original object. */
    return o;
}
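Stripped of memory management and shared objects, the decision made by tryObjectEncoding reduces to three branches. The sketch below is a simplification: toy_encoding and looks_like_integer are hypothetical names, OBJ_ENCODING_EMBSTR_SIZE_LIMIT is assumed to be 44, and the integer check accepts at most 18 digits, which is narrower than the full long range that string2l handles.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EMBSTR_LIMIT 44  /* assumed value of OBJ_ENCODING_EMBSTR_SIZE_LIMIT */

typedef enum { TOY_ENC_INT, TOY_ENC_EMBSTR, TOY_ENC_RAW } toy_enc;

/* True when s[0..len) is a canonical decimal integer of at most 18 digits.
 * Capping the digit count keeps this sketch free of overflow handling. */
static int looks_like_integer(const char *s, size_t len) {
    size_t i = 0;
    if (len == 0) return 0;
    if (s[0] == '-') { i = 1; if (len == 1) return 0; }
    if (len - i > 18) return 0;
    if (s[i] == '0' && len - i > 1) return 0;   /* no leading zeros */
    if (s[i] == '0' && i == 1) return 0;        /* no "-0" */
    for (; i < len; i++)
        if (s[i] < '0' || s[i] > '9') return 0;
    return 1;
}

/* Encoding a String value with this content would receive. */
static toy_enc toy_encoding(const char *s, size_t len) {
    assert(s != NULL);
    if (len <= 20 && looks_like_integer(s, len)) return TOY_ENC_INT;
    if (len <= TOY_EMBSTR_LIMIT) return TOY_ENC_EMBSTR;
    return TOY_ENC_RAW;
}
```

Under these assumptions, "10" maps to int, "aString" to embstr, and a run of 45 or more characters to raw.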

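The embstr encoding:
A cache line is read 64 bytes at a time, while a redisObject takes only 16 bytes, which leaves 48 bytes unused. The comments in the Redis source give a related reason for the limit: the largest embstr object should still fit in a 64-byte jemalloc allocation.

Redis tries to put the data into those 48 bytes, i.e. it stores an sds there, using the sdshdr8 header.

sdshdr8 itself consumes four bytes (three header fields plus the trailing \0):

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used: 1 byte */
    uint8_t alloc; /* excluding the header and null terminator: 1 byte */
    unsigned char flags; /* 3 lsb of type, 5 unused bits: 1 byte */
    char buf[]; /* the trailing \0 takes 1 more byte */
};

That leaves 44 bytes for buf[] (64 - 16 - 3 - 1 = 44). Therefore:

A String value of at most 44 bytes that cannot be converted to int is stored as embstr. Anything longer than 44 bytes becomes raw.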
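The arithmetic behind the 44-byte limit can be checked with sizeof. This sketch assumes a 64-bit platform, a GCC or Clang compiler (for the packed attribute), and LRU_BITS equal to 24; toy_robj, toy_sdshdr8 and embstr_payload_limit are hypothetical names that mirror the layouts quoted in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout sketches mirroring the definitions quoted in this article. */
typedef struct {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;      /* LRU_BITS is assumed to be 24 */
    int refcount;
    void *ptr;
} toy_robj;

struct __attribute__ ((__packed__)) toy_sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

#define TOY_CHUNK 64  /* target size in bytes */

/* Payload bytes left once the object, the sds header and the \0 are placed. */
static size_t embstr_payload_limit(void) {
    size_t used = sizeof(toy_robj) + sizeof(struct toy_sdshdr8) + 1;
    assert(used < TOY_CHUNK);
    return TOY_CHUNK - used;
}
```

On such a platform the object takes 16 bytes and the header 3, so the function returns 44.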

Bitmap

Used for daily-active-user statistics over hundreds of millions of users.
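The idea is one bit per user id per day: setting the bit marks the user as active, and counting the set bits gives the daily total. The sketch below is an in-memory illustration with hypothetical names (day_bitmap, bitmap_mark, bitmap_count), not Redis code; in Redis the corresponding commands are SETBIT and BITCOUNT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-day bitmap: bit i is 1 when user id i was active. */
typedef struct {
    uint8_t *bytes;
    size_t nbytes;
} day_bitmap;

/* Allocate a zeroed bitmap with room for max_users bits. */
static day_bitmap bitmap_new(size_t max_users) {
    day_bitmap b;
    b.nbytes = (max_users + 7) / 8;
    b.bytes = calloc(b.nbytes ? b.nbytes : 1, 1);
    if (b.bytes == NULL) abort();
    return b;
}

/* Mark user_id as active (most significant bit of each byte first). */
static void bitmap_mark(day_bitmap *b, size_t user_id) {
    assert(user_id / 8 < b->nbytes);
    b->bytes[user_id / 8] |= (uint8_t)(0x80u >> (user_id % 8));
}

/* Count the active users by counting the set bits. */
static size_t bitmap_count(const day_bitmap *b) {
    size_t total = 0;
    for (size_t i = 0; i < b->nbytes; i++) {
        /* v &= v - 1 clears the lowest set bit, so the loop runs once per 1 bit */
        for (uint8_t v = b->bytes[i]; v != 0; v &= (uint8_t)(v - 1))
            total++;
    }
    return total;
}
```

At one bit per user, 100 million users need about 12.5 MB per day, and marking the same user twice does not change the count.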

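The List type

Before 3.2, lists were implemented with ziplist plus linkedlist, where linkedlist is a doubly linked list.

A ziplist is converted to a linkedlist when either of the following holds:

a new string is added to the ziplist and its length exceeds server.list_max_ziplist_value (64 by default);
the ziplist contains more than server.list_max_ziplist_entries nodes (512 by default).

From 3.2 on, both were replaced by quicklist, a doubly linked list whose nodes are ziplists.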
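The two pre-3.2 conversion conditions can be expressed as a small predicate. This is a sketch only: the names are hypothetical, the limits are the defaults quoted above, and it assumes that a list which has become a linkedlist is not converted back.

```c
#include <assert.h>
#include <stddef.h>

/* Defaults quoted in the text above. */
#define TOY_LIST_MAX_ZIPLIST_VALUE   64
#define TOY_LIST_MAX_ZIPLIST_ENTRIES 512

typedef enum { TOY_ZIPLIST, TOY_LINKEDLIST } toy_list_encoding;

/* Encoding of a list after pushing one value of value_len bytes onto a
 * list that currently has entries nodes and the encoding current. */
static toy_list_encoding encoding_after_push(toy_list_encoding current,
                                             size_t entries,
                                             size_t value_len) {
    if (current == TOY_LINKEDLIST) return TOY_LINKEDLIST;  /* assumed one-way */
    /* a list still encoded as ziplist never exceeds the entry limit */
    assert(entries <= TOY_LIST_MAX_ZIPLIST_ENTRIES);
    if (value_len > TOY_LIST_MAX_ZIPLIST_VALUE) return TOY_LINKEDLIST;
    if (entries + 1 > TOY_LIST_MAX_ZIPLIST_ENTRIES) return TOY_LINKEDLIST;
    return TOY_ZIPLIST;
}
```

A 64-byte value keeps the ziplist, a 65-byte value converts it, and so does the 513th element.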

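The Hash type

Using ziplist

A hash is backed by a dictionary (dict), but while it holds few elements and every element is small, it is stored in a ziplist instead.

Image source: https://i6448038.github.io/2019/12/01/redis-data-struct/

Each ziplist entry stores the length of the previous entry, an encoding field, and the content itself.

With the ziplist encoding the data is ordered: fields come out in the same order in which they were put in.

Using hashtable

The hash is converted to a hashtable when either of the following holds:

hash-max-ziplist-entries 512 // the ziplist holds more than 512 elements
hash-max-ziplist-value 64 // some single element is larger than 64 bytes

For example, after a very long string is set in such a hash, the encoding changes to hashtable and the fields are no longer returned in insertion order.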

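The Set type

A set is backed by a dictionary (dict) whose values are null. When every member can be represented as an integer, the set uses the intset encoding instead.

It is converted to the hashtable encoding when either of the following holds:

the number of elements exceeds set-max-intset-entries // 512 by default
a member cannot be represented as an integer // e.g. sadd a-set a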
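The two conditions can be combined into one check over the members of a set. The sketch below uses hypothetical names and the default limit quoted above; its integer test relies on strtoll, which is more lenient than the parser Redis uses (it accepts leading spaces and a plus sign).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SET_MAX_INTSET_ENTRIES 512  /* default quoted above */

typedef enum { TOY_INTSET, TOY_HASHTABLE } toy_set_enc;

/* True when member parses completely as a base-10 long long. */
static int is_integer_member(const char *member) {
    char *end;
    errno = 0;
    (void)strtoll(member, &end, 10);
    return errno == 0 && end != member && *end == '\0';
}

/* Encoding for a set holding the count given members. */
static toy_set_enc toy_set_encoding(const char *const *members, size_t count) {
    if (count > TOY_SET_MAX_INTSET_ENTRIES) return TOY_HASHTABLE;
    for (size_t i = 0; i < count; i++) {
        assert(members[i] != NULL);
        if (!is_integer_member(members[i])) return TOY_HASHTABLE;
    }
    return TOY_INTSET;
}
```

A set of "1", "2", "-30" stays an intset here, while adding a member such as "a" switches it to hashtable.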
