TinyKV is an open-source, hands-on distributed KV storage course from PingCAP: https://github.com/tidb-incubator/tinykv

Its aim is to build a simple distributed KV store.

The course consists of four sub-projects:

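  1. Project 1: build a standalone KV server on your own.
  2. Project 2: implement a distributed key-value database server based on the Raft algorithm.
  3. Project 3: building on Project 2, support multiple Raft groups.
  4. Project 4: building on Project 3, support distributed transactions.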

The projects increase in difficulty step by step; the course is comparable to MIT's 6.824 course.
Project 1 document: https://github.com/watchpoints/tinykv/blob/course/doc/project1-StandaloneKV.md

Task: Standalone KV

The first project integrates Badger to implement a simple standalone KV store.

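Badger is an excellent open-source, single-node KV storage engine built on an LSM tree, with good read and write performance. You need to get familiar with its basic usage; the official examples are a good starting point: https://github.com/dgraph-io/badger
[Figure 1: Talent Plan KV Bootcamp, Standalone KV lab]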

Step 1: Start the task by reading the docs

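Running make project1 throws a flood of errors. They appear because the unit tests written by the course engineers do not pass yet. All you have to do is implement the functionality behind the APIs called by every test case in /tinykv/kv/server/server_test.go.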

https://github.com/tidb-incubator/tinykv/blob/course/doc/project1-StandaloneKV.md
https://pkg.go.dev/github.com/dgraph-io/badger#Txn

The implementation goes in kv/storage/standalone_storage/standalone_storage.go: wrap Badger, then implement the methods defined by the Storage interface.

  • CF definition
    Column family (abbreviated to CF below) is a term like key namespace, namely the values of the same key in different column families are not the same. You can simply regard multiple column families as separate mini databases.

        const (
            CfDefault string = "default"
            CfWrite   string = "write"
            CfLock    string = "lock"
        )

  • Get fetches the current value for a key for the specified CF.
  • Put replaces the value for a particular key for the specified CF in the database.
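To make the "separate mini databases" idea concrete, here is a toy Go model that keeps one map per CF. It is only an illustration under my own assumptions (cfDB, newCFDB, Put and Get are hypothetical names, not TinyKV code); it shows that the same key can hold different values in different CFs, and that a key written in one CF is invisible in another.

```go
package main

import "fmt"

// Column family names, copied from the constants above.
const (
	CfDefault string = "default"
	CfWrite   string = "write"
	CfLock    string = "lock"
)

// cfDB is a hypothetical toy store: one independent map per CF.
// It illustrates CF semantics only; it is not how TinyKV stores data.
type cfDB struct {
	cfs map[string]map[string][]byte
}

func newCFDB() *cfDB {
	return &cfDB{cfs: make(map[string]map[string][]byte)}
}

// Put replaces the value for key in the given CF, creating the CF lazily.
func (db *cfDB) Put(cf string, key, value []byte) {
	m, ok := db.cfs[cf]
	if !ok {
		m = make(map[string][]byte)
		db.cfs[cf] = m
	}
	m[string(key)] = value
}

// Get fetches the current value for key in the given CF; nil means "not found".
// Indexing a missing (nil) inner map is safe in Go and yields nil.
func (db *cfDB) Get(cf string, key []byte) []byte {
	return db.cfs[cf][string(key)]
}

func main() {
	db := newCFDB()
	db.Put(CfDefault, []byte("k"), []byte("v-default"))
	db.Put(CfWrite, []byte("k"), []byte("v-write"))
	// Same key, three CFs: two different values and one miss.
	fmt.Printf("%s %s %v\n",
		db.Get(CfDefault, []byte("k")),
		db.Get(CfWrite, []byte("k")),
		db.Get(CfLock, []byte("k")) == nil)
}
```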

The project can be broken down into two steps:

  1. Implement a standalone storage engine.
  2. Implement raw key/value service handlers.

The first mission is implementing a wrapper of the badger key/value API.

The gRPC server's service depends on a Storage, which is defined in kv/storage/storage.go.

In this context, the standalone storage engine is just a wrapper of the badger key/value API, exposed through two methods:

    type Storage interface {
        // Other stuffs
        Write(ctx *kvrpcpb.Context, batch []Modify) error
        Reader(ctx *kvrpcpb.Context) (StorageReader, error)
    }
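A minimal in-memory sketch of these two methods, assuming simplified stand-ins for TinyKV's storage types: kvrpcpb.Context and the iterator part of StorageReader are dropped, and memStorage, snapshotReader and cfKey are hypothetical names. Write type-switches on each Modify (Put or Delete); Reader hands out a point-in-time copy, which is roughly the isolation a Badger read-only transaction would give you.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for the storage package's Put / Delete / Modify
// (assumption: no kvrpcpb.Context, no iterators).
type Put struct {
	Key, Value []byte
	Cf         string
}

type Delete struct {
	Key []byte
	Cf  string
}

type Modify struct {
	Data interface{} // holds either a Put or a Delete
}

type StorageReader interface {
	GetCF(cf string, key []byte) ([]byte, error) // nil, nil when the key is absent
	Close()
}

// memStorage is a hypothetical in-memory Storage; keys are stored as "cf_key".
type memStorage struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func newMemStorage() *memStorage { return &memStorage{data: make(map[string][]byte)} }

func cfKey(cf string, key []byte) string { return cf + "_" + string(key) }

// Write applies the batch while holding the write lock.
func (s *memStorage) Write(batch []Modify) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, m := range batch {
		switch d := m.Data.(type) {
		case Put:
			s.data[cfKey(d.Cf, d.Key)] = d.Value
		case Delete:
			delete(s.data, cfKey(d.Cf, d.Key))
		default:
			return fmt.Errorf("unknown modify type %T", d)
		}
	}
	return nil
}

// snapshotReader serves reads from a copy taken when Reader() was called.
type snapshotReader struct{ snap map[string][]byte }

func (r *snapshotReader) GetCF(cf string, key []byte) ([]byte, error) {
	return r.snap[cfKey(cf, key)], nil
}

func (r *snapshotReader) Close() { r.snap = nil }

// Reader copies the current data so later writes are not visible to it.
func (s *memStorage) Reader() (StorageReader, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	snap := make(map[string][]byte, len(s.data))
	for k, v := range s.data {
		snap[k] = v
	}
	return &snapshotReader{snap: snap}, nil
}

func main() {
	s := newMemStorage()
	_ = s.Write([]Modify{{Data: Put{Key: []byte("a"), Value: []byte("1"), Cf: "default"}}})
	r, _ := s.Reader()
	defer r.Close()
	_ = s.Write([]Modify{{Data: Delete{Key: []byte("a"), Cf: "default"}}})
	v, _ := r.GetCF("default", []byte("a"))
	fmt.Printf("%s\n", v) // the snapshot still sees the value written before Reader()
}
```

A real StandAloneStorage would keep the same shape but back it with Badger instead of a map.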
  • The Badger fork that TinyKV depends on: https://github.com/Connor1996/badger

In addition, Server depends on a Storage, an interface you need to implement for the standalone storage engine located in kv/storage/standalone_storage/standalone_storage.go.

Once StandAloneStorage implements the Storage interface, you can use it to implement the raw key/value service for the Server.
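The raw handlers then become thin adapters over Storage. The sketch below assumes simplified request/response structs (RawGetRequest, RawGetResponse, RawPutRequest and RawPutResponse here are hypothetical stand-ins for the kvrpcpb messages) and a toy map-backed storage. The pattern it shows: open a Reader, close it with defer, look the key up in the requested CF, and report a missing key through a NotFound flag rather than as an error; a put becomes a one-element Modify batch.

```go
package main

import "fmt"

// Simplified stand-ins for the storage package types (assumption).
type Put struct {
	Key, Value []byte
	Cf         string
}
type Modify struct{ Data interface{} }

type StorageReader interface {
	GetCF(cf string, key []byte) ([]byte, error)
	Close()
}
type Storage interface {
	Write(batch []Modify) error
	Reader() (StorageReader, error)
}

// Hypothetical request/response shapes, loosely modeled on kvrpcpb.
type RawGetRequest struct {
	Cf  string
	Key []byte
}
type RawGetResponse struct {
	Value    []byte
	NotFound bool
	Error    string
}
type RawPutRequest struct {
	Cf         string
	Key, Value []byte
}
type RawPutResponse struct{ Error string }

// RawGet reads one key from one CF; an absent key sets NotFound.
func RawGet(s Storage, req *RawGetRequest) (*RawGetResponse, error) {
	reader, err := s.Reader()
	if err != nil {
		return &RawGetResponse{Error: err.Error()}, err
	}
	defer reader.Close()
	val, err := reader.GetCF(req.Cf, req.Key)
	if err != nil {
		return &RawGetResponse{Error: err.Error()}, err
	}
	return &RawGetResponse{Value: val, NotFound: val == nil}, nil
}

// RawPut wraps the request in a one-element Modify batch.
func RawPut(s Storage, req *RawPutRequest) (*RawPutResponse, error) {
	batch := []Modify{{Data: Put{Key: req.Key, Value: req.Value, Cf: req.Cf}}}
	if err := s.Write(batch); err != nil {
		return &RawPutResponse{Error: err.Error()}, err
	}
	return &RawPutResponse{}, nil
}

// mapStorage is a toy Storage (and its own reader) keyed by "cf_key".
type mapStorage map[string][]byte

func (m mapStorage) Write(batch []Modify) error {
	for _, mod := range batch {
		p, ok := mod.Data.(Put)
		if !ok {
			return fmt.Errorf("unsupported modify %T", mod.Data)
		}
		m[p.Cf+"_"+string(p.Key)] = p.Value
	}
	return nil
}
func (m mapStorage) Reader() (StorageReader, error)              { return m, nil }
func (m mapStorage) GetCF(cf string, key []byte) ([]byte, error) { return m[cf+"_"+string(key)], nil }
func (m mapStorage) Close()                                      {}

func main() {
	s := mapStorage{}
	_, _ = RawPut(s, &RawPutRequest{Cf: "default", Key: []byte("k"), Value: []byte("v")})
	hit, _ := RawGet(s, &RawGetRequest{Cf: "default", Key: []byte("k")})
	miss, _ := RawGet(s, &RawGetRequest{Cf: "lock", Key: []byte("k")})
	fmt.Println(string(hit.Value), miss.NotFound)
}
```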

  • Badger (a high-performance LSM K/V store) usage guide

https://pkg.go.dev/github.com/dgraph-io/badger#section-readme

https://pkg.go.dev/github.com/dgraph-io/badger#Txn

  1. func TestBlob(t *testing.T)
  2. func TestGet(t *testing.T)


Third pass: re-reading the docs


Step 2: Implementation (solution not published)

  1. The first file you need to implement: standalone_storage.go

        // kv/storage/storage.go
        type Storage interface {
            // Other stuffs
            Write(ctx *kvrpcpb.Context, batch []Modify) error
            Reader(ctx *kvrpcpb.Context) (StorageReader, error)
        }

        // kv/storage/standalone_storage/standalone_storage.go
        // StandAloneStorage is an implementation of `Storage` for a single-node TinyKV instance. It does not
        // communicate with other nodes and all data is stored locally.
        type StandAloneStorage struct {
            // Your Data Here (1).
        }

        func NewStandAloneStorage(conf *config.Config) *StandAloneStorage {
            // Your Code Here (1).
            return nil
        }

        func (s *StandAloneStorage) Start() error {
            // Your Code Here (1).
            return nil
        }

        func (s *StandAloneStorage) Stop() error {
            // Your Code Here (1).
            return nil
        }

        func (s *StandAloneStorage) Reader(ctx *kvrpcpb.Context) (storage.StorageReader, error) {
            // Your Code Here (1).
            return nil, nil
        }

        func (s *StandAloneStorage) Write(ctx *kvrpcpb.Context, batch []storage.Modify) error {
            // Your Code Here (1).
            return nil
        }

  2. The second file you need to implement: raw_api.go

Step 3: Test with server_test.go

Remember to run make project1 to check that the test suite passes.

    # Makefile excerpt
    GOTEST := $(GO) test -v --count=1 --parallel=1 -p=1

    project1:
    	$(GOTEST) ./kv/server -run 1

    # running the target echoes the expanded command
    $ make project1
    GO111MODULE=on go test -v --count=1 --parallel=1 -p=1 ./kv/server -run 1
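The -run flag takes a regular expression, and go test runs only the tests whose names match it (the match is unanchored), so -run 1 picks every test with a 1 in its name, such as TestRawGetAfterRawPut1 and TestRawScan1 mentioned later. The snippet below imitates that filter for top-level test names with the regexp package; selected is a hypothetical helper and the name list is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// selected returns the names a "go test -run <pattern>" filter would keep
// for top-level tests: an unanchored regular-expression match.
func selected(pattern string, names []string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Illustrative names only; the first two appear in this post.
	names := []string{"TestRawGetAfterRawPut1", "TestRawScan1", "TestSomethingElse"}
	fmt.Println(selected("1", names))
}
```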

Step 4: My questions

  • Q: In lab 1, TestRawGetAfterRawPut1 writes different records under the same key, but the query then returns the wrong result.

Do reads and writes need to specify a CF? Plain Badger has no concept of a CF.

A:


Badger doesn't support column families. The engine_util package (kv/util/engine_util) simulates column families by adding a prefix to keys. For example, a key "key" that belongs to a column family "cf" is stored as ${cf}_${key}. It wraps badger to provide operations with CFs, and also offers many useful helper functions, so you should do all read/write operations through the methods engine_util provides. Please read util/engine_util/doc.go to learn more.
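A small sketch of the ${cf}_${key} prefix scheme, assuming nothing beyond the layout described above. keyWithCF and keysInCF are hypothetical helpers written only to keep the example self-contained; engine_util already ships the real ones. It also hints at a likely cause of the TestRawGetAfterRawPut1 confusion: reads and writes must build the stored key with the same CF prefix, otherwise they address different keys in Badger's single keyspace.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// keyWithCF builds the stored key "cf_key" for a user key in a CF.
func keyWithCF(cf string, key []byte) []byte {
	return append([]byte(cf+"_"), key...)
}

// keysInCF lists the user keys of one CF from a flat, sorted keyspace by
// matching the "cf_" prefix and stripping it off again.
func keysInCF(sortedKeys [][]byte, cf string) [][]byte {
	prefix := []byte(cf + "_")
	var out [][]byte
	for _, k := range sortedKeys {
		if bytes.HasPrefix(k, prefix) {
			out = append(out, k[len(prefix):])
		}
	}
	return out
}

func main() {
	raw := [][]byte{
		keyWithCF("default", []byte("b")),
		keyWithCF("write", []byte("a")),
		keyWithCF("default", []byte("a")),
	}
	// Badger keeps keys in sorted order; sort here to mimic that.
	sort.Slice(raw, func(i, j int) bool { return bytes.Compare(raw[i], raw[j]) < 0 })
	for _, k := range keysInCF(raw, "default") {
		fmt.Println(string(k))
	}
}
```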

Q: What kind of traversal does the test case func TestRawScan1(t *testing.T) expect?

A:
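For reference, a sketch of the scan semantics as I understand them, assuming a request made of a start key, a limit and a CF: seek to the first key >= the start key, then return pairs in ascending key order until the limit is reached. The map stands in for one CF, and rawScan and KvPair are hypothetical names; a real implementation would walk a CF iterator over Badger instead of sorting keys.

```go
package main

import (
	"fmt"
	"sort"
)

// KvPair is a hypothetical key/value result pair.
type KvPair struct {
	Key, Value []byte
}

// rawScan returns at most limit pairs whose keys are >= startKey, in
// ascending key order. The map stands in for a single CF.
func rawScan(cf map[string][]byte, startKey []byte, limit int) []KvPair {
	keys := make([]string, 0, len(cf))
	for k := range cf {
		keys = append(keys, k)
	}
	sort.Strings(keys) // byte-wise order, like an LSM iterator
	// Seek: index of the first key >= startKey.
	i := sort.Search(len(keys), func(i int) bool { return keys[i] >= string(startKey) })
	var out []KvPair
	for ; i < len(keys) && len(out) < limit; i++ {
		out = append(out, KvPair{Key: []byte(keys[i]), Value: cf[keys[i]]})
	}
	return out
}

func main() {
	cf := map[string][]byte{"1": []byte("a"), "2": []byte("b"), "3": []byte("c"), "4": []byte("d")}
	for _, p := range rawScan(cf, []byte("2"), 2) {
		fmt.Printf("%s=%s\n", p.Key, p.Value)
	}
}
```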

Related discussions

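Questions the group chat above could not resolve can be posted on AskTUG.
If an existing answer solves your problem, the original poster should remember to mark it as "对我有用" ("Useful to me").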

Recent question threads: https://asktug.com/t/topic/273196
https://asktug.com/t/topic/273154/5
https://asktug.com/t/topic/273269/2
https://asktug.com/t/topic/273388/3

https://asktug.com/t/topic/273391
https://asktug.com/t/topic/273388/2
https://asktug.com/t/topic/273387
https://asktug.com/t/topic/273355

Other people's approaches

  1. Talent Plan KV Bootcamp: Project 1 solution write-up