TinyKV 是PingCAP公司推出的一套开源分布式KV存储实战课程:https://github.com/tidb-incubator/tinykv,

宗旨实现一个简易的分布式 kv

这课程一共包含了4子项目:

  1. Project 1需要参与者独立完成一个单机的KV Server
  2. Project 2需要基于Raft算法实现分布式键值数据库服务端
  3. Project 3需要在Project 2的基础上支持多个Raft集群
  4. Project 4需要Project 3的基础上支持分布式事务

难度都是阶梯式的,等价于麻省理工学院有一套MIT 6.824课程

Project 2需要基于Raft算法实现分布式键值数据库服务端

第一步:准备 阅读文章和资料

第一个资料:题目来源

https://github.com/tidb-incubator/tinykv/blob/course/doc/project2-RaftKV.md

摘要:

  1. The project has 3 parts you need to do, including:
  2. Part A Implement the basic Raft algorithm
  3. Part B Build a fault-tolerant KV server on top of Raft
  4. Part C Add the support of raftlog GC and snapshot
  5. 该项目有 3 个你需要做的部分,包括:
  6. 1. 实现基本的 Raft 算法
  7. 2. Raft 之上搭建一个容错的 KV 服务器
  8. 3. 增加raftlog GCsnapshot的支持

raft 包含三个部分
This part can be broken down into 3 steps, including:

  • Leader election
  • Log replication
  • Raw node interface

题要求:从测试看题目要求

  1. Part A 分为三个部分
  2. This part can be broken down into 3 steps
  3. //Raw node interface
  4. project2a:
  5. $(GOTEST) ./raft -run 2A
  6. //Leader election
  7. project2aa:
  8. $(GOTEST) ./raft -run 2AA
  9. //Log replication
  10. project2ab:
  11. $(GOTEST) ./raft -run 2AB
  12. //Raw node interface
  13. project2ac:
  14. $(GOTEST) ./raft -run 2AC
  15. Log replication
  16. Raw node interface

第二个资料:群分享

第三个资料:

Go Etcd 源码学习【3】raftexample 学习

https://articles.zsxq.com/id_od6o03o5ttpk.html
Go Etcd 源码学习【5】raftNode 模块
https://articles.zsxq.com/id_mpjcmfhvwftw.html

raft.Node 是纯状态机的实现。
raft.MemoryStorage 是存储的实现。
wal.WAL 是持久化 log 的实现。
这三个是业务 raftNode 组装功能的核心

Go Etcd 源码学习【9】 状态机

FpSfF4jluPbOJivarKd4yjOK9BZR.jpg

第四个资料:视频

2020 MIT 6.824 分布式系统

MIT著名课程:6.824: Distributed Systems
分布式系统 课程主页:https://pdos.csail.mit.edu/6.824/index.html
课程安排:https://pdos.csail.mit.edu/6.824/schedule.htm
https://www.bilibili.com/video/BV1R7411t71W/

https://mit-public-courses-cn-translatio.gitbook.io/mit6-824/lecture-08-zookeeper

https://learn.pingcap.com/learner/player/390002;id=390002;classroomId=540003;rcoId=360022;courseDetailId=390002;learnerAttemptId=1638774629779

第五个资料

image.png
image.png

第六个asktbug

https://asktug.com/t/topic/272938
https://asktug.com/t/topic/274731

第二步:讨论思路

问 project2aa 领导选择 参考 raftexample 这个吗?

如果用raft 层用 log.Debug 测试里没打出来,参考一下这个pr https://github.com/tidb-incubator/tinykv/pull/324 改一下 kv/config 的 log level,会更好定位一点

问2:Applied代表 最近日志,还是持久化日志,还是大多数统一路径呢?

  1. // Applied is the last applied index. It should only be set when restarting
  2. // raft. raft will not return entries to the application smaller or equal to
  3. // Applied. If Applied is unset when restarting, raft might return previous
  4. // applied entries. This is a very application dependent configuration.
  5. Applied uint64

疑问3:// raft 状态机 func newRaft(c Config) raft {

疑问4:

持久化和没有持久化是怎么区分的,是wal写log吗?

疑问5:raft apply index 含义是什么?
image.png
image.png
行动:
https://www.zhihu.com/question/382888510
https://thesquareplanet.com/blog/raft-qa/
https://asktug.com/t/topic/69001
https://asktug.com/t/topic/69001

结果:

应用到state machine的index
state machine 怎么理解 就是写到内存里?

image.png

就是你收到了 raft log,

但是不一定把 kv 存到了本地(内存),把 kv 存到本地,

就算是 apply 了

image.pngimage.png

image.png

「执行」

就是你收到了 raft log,但是不一定把 kv 存到了本地(内存),把 kv 存到本地,就算是 apply 了
只是保存了,算是persist
必须先存再执行

apply 还没到应用层,apply 之后还要 commit 给 leader

然后 leader 才返回给应用层

疑问6 HeartbeatTick 谁发出的?

https://asktug.com/t/topic/274755

https://asktug.com/t/topic/273977
https://asktug.com/t/topic/273265

行动:

  • golang 定时任务方面time.Sleep和time.Tick的优劣对比

https://blog.csdn.net/Star_CSU/article/details/86650684

结果:

不知道

疑问7:随机赋予一个超时时间如何设定,然后整个业务逻辑是什么?

行动:

结果:

https://asktug.com/t/topic/274755/4

第三步:实现

第四步:测试

测试1 make project2aa

image.png

测试1 Leader election make project2aa

  • [ ] testUpdateTermFromMessage

    1. // testUpdateTermFromMessage tests that if one server’s current term is
    2. // smaller than the other’s, then it updates its current term to the larger
    3. // value. If a candidate or leader discovers that its term is out of date,
    4. // it immediately reverts to follower state.
    5. // Reference: section 5.1
    6. func testUpdateTermFromMessage(t *testing.T, state StateType) {
    7. r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
    8. switch state {
    9. case StateFollower:
    10. r.becomeFollower(1, 2)
    11. case StateCandidate:
    12. r.becomeCandidate()
    13. case StateLeader:
    14. r.becomeCandidate()
    15. r.becomeLeader()
    16. }
    17. r.Step(pb.Message{MsgType: pb.MessageType_MsgAppend, Term: 2})
    18. if r.Term != 2 {
    19. t.Errorf("term = %d, want %d", r.Term, 2)
    20. }
    21. if r.State != StateFollower {
    22. t.Errorf("state = %v, want %v", r.State, StateFollower)
    23. }
    24. }

TestCandidateUpdateTermFromMessage2AAce

image.pngimage.png—- PASS: TestCandidateUpdateTermFromMessage2AA (0.00s)

TestLeaderUpdateTermFromMessage2AA

第一次测试失败:

  1. raft_paper_test.go:67: term = 1, want 2
  2. raft_paper_test.go:70: state = StateLeader, want StateFollower
  3. --- FAIL: TestLeaderUpdateTermFromMessage2AA (0.00s)

image.png

第二次测试

—- PASS: TestLeaderUpdateTermFromMessage2AA (0.00s)

TestLeaderBcastBeat2AA

  1. // TestLeaderBcastBeat tests that if the leader receives a heartbeat tick,
  2. // it will send a MessageType_MsgHeartbeat with m.Index = 0, m.LogTerm=0 and empty entries
  3. // as heartbeat to all followers.
  4. // Reference: section 5.2
  5. func TestLeaderBcastBeat2AA(t *testing.T) {
  6. // heartbeat interval
  7. hi := 1
  8. r := newTestRaft(1, []uint64{1, 2, 3}, 10, hi, NewMemoryStorage())
  9. r.becomeCandidate()
  10. r.becomeLeader()
  11. r.Step(pb.Message{MsgType: pb.MessageType_MsgPropose, Entries: []*pb.Entry{{}}})
  12. r.readMessages() // clear message
  13. for i := 0; i < hi; i++ {
  14. r.tick()
  15. }
  16. msgs := r.readMessages()
  17. sort.Sort(messageSlice(msgs))
  18. wmsgs := []pb.Message{
  19. {From: 1, To: 2, Term: 1, MsgType: pb.MessageType_MsgHeartbeat},
  20. {From: 1, To: 3, Term: 1, MsgType: pb.MessageType_MsgHeartbeat},
  21. }
  22. if !reflect.DeepEqual(msgs, wmsgs) {
  23. t.Errorf("msgs = %v, want %v", msgs, wmsgs)
  24. }
  25. }

第一次测试 失败

  1. raft_paper_test.go:120: msgs = [], want [{MsgHeartbeat 2 1 1 0 0 [] 0 <nil> false {} [] 0} {MsgHeartbeat 3 1 1 0 0 [] 0 <nil> false {} [] 0}]

image.png

  1. 'MessageType_MsgPropose' proposes to append data to its log entries. This is a special
  2. type to redirect proposals to the leader. Therefore, send method overwrites
  3. eraftpb.Message's term with its HardState's term to avoid attaching its
  4. local term to 'MessageType_MsgPropose'.
  5. When 'MessageType_MsgPropose' is passed to the leader's 'Step'
  6. method, the leader first calls the 'appendEntry' method to append entries
  7. to its log, and then calls 'bcastAppend' method to send those entries to
  8. its peers.
  9. When passed to candidate, 'MessageType_MsgPropose' is dropped.
  10. When passed to
  11. follower, 'MessageType_MsgPropose' is stored in follower's mailbox(msgs) by the send
  12. method. It is stored with sender's ID and later forwarded to the leader by
  13. rafthttp package.

image.png

image.png
第二次测试; ok

image.png

测试4:TestFollowerStartElection2AA

  1. func TestFollowerStartElection2AA(t *testing.T) {
  2. testNonleaderStartElection(t, StateFollower)
  3. }
  4. func TestCandidateStartNewElection2AA(t *testing.T) {
  5. testNonleaderStartElection(t, StateCandidate)
  6. }
  7. // testNonleaderStartElection tests that if a follower receives no communication
  8. // over election timeout, it begins an election to choose a new leader. It
  9. // increments its current term and transitions to candidate state. It then
  10. // votes for itself and issues RequestVote RPCs in parallel to each of the
  11. // other servers in the cluster.
  12. // Reference: section 5.2
  13. // Also if a candidate fails to obtain a majority, it will time out and
  14. // start a new election by incrementing its term and initiating another
  15. // round of RequestVote RPCs.
  16. // Reference: section 5.2
  17. func testNonleaderStartElection(t *testing.T, state StateType) {

第一次测试结果;失败 没有写代码

测试5:TestFollowerStartElection2AA

  1. // TestCandidateFallback tests that while waiting for votes,
  2. // if a candidate receives an AppendEntries RPC from another server claiming
  3. // to be leader whose term is at least as large as the candidate's current term,
  4. // it recognizes the leader as legitimate and returns to follower state.
  5. // Reference: section 5.2
  6. func TestCandidateFallback2AA(t *testing.T) {
  1. 'MessageType_MsgHup' is used for election.
  2. If a node is a follower or candidate, the
  3. 'tick' function in 'raft' struct is set as 'tickElection'.
  4. If a follower or
  5. candidate has not received any heartbeat before the election timeout,
  6. it
  7. passes 'MessageType_MsgHup' to its Step method and becomes (or remains) a candidate to
  8. start a new election.
  • tick 出发 tickElection
  • 在选举超时之前没有收到ack ,需要重新发起一轮选举

image.png

image.png

测试通过:
image.png

关键线索:

重新选举

// Also if a candidate fails to obtain a majority, it will time out and
// start a new election by incrementing its term and initiating another
// round of RequestVote RPCs.

测试 5 TestLeaderElectionInOneRoundRPC2AA

  • 目的
  1. // TestLeaderElectionInOneRoundRPC tests all cases that may happen in
  2. // leader election during one round of RequestVote RPC:
  3. // a) it wins the election
  4. // b) it loses the election
  5. // c) it is unclear about the result
  6. // Reference: section 5.2
  7. MessageType_MsgRequestVoteResponse

5.2 Leader election —TestFollowerVote2AA
Follower处理投票逻 RequestVote RPC

- 规则1 https://asktug.com/t/topic/69001
  1. 1、如果接收到的请求的term<=本follower节点的currentTerm,返回false,原论文这里只有小于,实际应该小于或者等于,因为相等的情况下,说明follower早就投过同任期的票了。
  2. 2、如果candidate的日志跟自己的一样或者比自己的新,则投出赞成票
  3. 那么怎样才算是更新呢,论文里提到:比较两个节点日志中的最新的日志记录,然后比较这两条日志记录的下标和所属任期号,所属任期号比较大的日志记录,那么它就是较新的,如果所属任期号相同,再比较下标值,下标值较大的则是较新的那一条。
  4. 3、若请求参数里的term大于自己的currentTermfollower投票后,将自己的currentTerm赋值为请求参数term;(这条也是自己加的,论文体现在后面的节点规则中,

image.png5.4.1 Election restriction
image.pngimage.png

  • 更高的term

image.pngimage.pngimage.pngimage.png

make project2aa—TestLeaderCycle2AA

  1. testLeaderCycle verifies that each node in a cluster can campaign
  2. and be elected in turn. This ensures that elections work when not
  3. starting from a clean slate (as they do in TestLeaderElection)
  4. 验证集群中的每个节点都可以竞选
  • 错误日志

[root@localhost tinykv]# make project2aa >> a.out
make: * [Makefile:62:project2aa] 错误 1
[root@localhost tinykv]# vi a.out
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:2] [term: 0] received a MsgRequestVote message
2022/02/23 18:55:21 storage.FirstIndex failed
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:3] [term: 0] received a MsgRequestVote message
2022/02/23 18:55:21 storage.FirstIndex failed
2022/02/23 18:55:21 fucntion Step >> r.State =StateCandidate [node:1] [term: 1] received a MsgRequestVoteResponse message
2022/02/23 18:55:21 StepCandidate begin …
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 1 true 1 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 2 true 2 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse2 2 0 2
2022/02/23 18:55:21 becomeLeader:id =1 became leader at term =1
2022/02/23 18:55:21 fucntion Step >> r.State =StateLeader [node:1] [term: 1] received a MsgRequestVoteResponse message
2022/02/23 18:55:21 StepCandidate begin …
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 1 true 1 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 2 true 2 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 3 true 3 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse2 3 0 2
2022/02/23 18:55:21 becomeLeader:id =1 became leader at term =1
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:2] [term: 1] received a MsgHup message
2022/02/23 18:55:21 Candidate term = 2
2022/02/23 18:55:21 fucntion Step >> r.State =StateLeader [node:1] [term: 1] received a MsgRequestVote message
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:3] [term: 1] received a MsgRequestVote message
2022/02/23 18:55:21 storage.FirstIndex failed
2022/02/23 18:55:21 fucntion Step >> r.State =StateCandidate [node:2] [term: 2] received a MsgRequestVoteResponse message
2022/02/23 18:55:21 StepCandidate begin …
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 2 true 1 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 3 true 2 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse2 2 0 2
2022/02/23 18:55:21 becomeLeader:id =2 became leader at term =2
raft_test.go:116: campaignerID =2 after campaign of node 2, node 1 had state = StateLeader, want StateFollower
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:3] [term: 2] received a MsgHup message
2022/02/23 18:55:21 Candidate term = 3
2022/02/23 18:55:21 fucntion Step >> r.State =StateLeader [node:1] [term: 2] received a MsgRequestVote message
2022/02/23 18:55:21 fucntion Step >> r.State =StateLeader [node:2] [term: 2] received a MsgRequestVote message
raft_test.go:116: campaignerID =3 after campaign of node 3, node 1 had state = StateLeader, want StateFollower
raft_test.go:116: campaignerID =3 after campaign of node 3, node 2 had state = StateLeader, want StateFollower
raft_test.go:113: campaignerID =3 campaigning node 3 state = StateCandidate, want StateLeader
—- FAIL: TestLeaderCycle2AA (0.00s)
=== RUN TestVoteFromAnyState2AA

https://search.asktug.com/?q=TestLeaderCycle2AA
https://asktug.com/t/topic/272914/3