TinyKV 是PingCAP公司推出的一套开源分布式KV存储实战课程:https://github.com/tidb-incubator/tinykv,
宗旨实现一个简易的分布式 kv
这课程一共包含了4子项目:
- Project 1需要参与者独立完成一个单机的KV Server
- Project 2需要基于Raft算法实现分布式键值数据库服务端
- Project 3需要在Project 2的基础上支持多个Raft集群
- Project 4需要Project 3的基础上支持分布式事务
难度都是阶梯式的,等价于麻省理工学院有一套MIT 6.824课程
Project 2需要基于Raft算法实现分布式键值数据库服务端
第一步:准备 阅读文章和资料
第一个资料:题目来源
https://github.com/tidb-incubator/tinykv/blob/course/doc/project2-RaftKV.md
-
What is 6.824 about?
https://mit-public-courses-cn-translatio.gitbook.io/mit6-824/lecture-06-raft1/6.5-ri-zhi-raft-log
摘要:
The project has 3 parts you need to do, including:
Part A Implement the basic Raft algorithm
Part B Build a fault-tolerant KV server on top of Raft
Part C Add the support of raftlog GC and snapshot
该项目有 3 个你需要做的部分,包括:
1. 实现基本的 Raft 算法
2. 在 Raft 之上搭建一个容错的 KV 服务器
3. 增加raftlog GC和snapshot的支持
raft 包含三个部分
This part can be broken down into 3 steps, including:
- Leader election
- Log replication
- Raw node interface
题要求:从测试看题目要求
Part A 分为三个部分
This part can be broken down into 3 steps
//Raw node interface
project2a:
$(GOTEST) ./raft -run 2A
//Leader election
project2aa:
$(GOTEST) ./raft -run 2AA
//Log replication
project2ab:
$(GOTEST) ./raft -run 2AB
//Raw node interface
project2ac:
$(GOTEST) ./raft -run 2AC
Log replication
Raw node interface
第二个资料:群分享
第三个资料:
- Go Etcd 源码学习【1】开篇.md
- Go Etcd 源码学习【2】编译看看.md
- Go Etcd 源码学习【3】raftexample 学习.md
- Go Etcd 源码学习【4】raftNode 模块.md
- Go Etcd 源码学习【5】Storage 是什么.md
- Go Etcd 源码学习【6】Transport 网络层模块.md
- Go Etcd 源码学习【7】WAL 是什么.md
- Go Etcd 源码学习【8】raft.node.md
- Go Etcd 源码学习【9】raft 状态机.md
- Go Etcd 源码学习【11】配置变更.md
Go Etcd 源码学习【3】raftexample 学习
https://articles.zsxq.com/id_od6o03o5ttpk.html
Go Etcd 源码学习【5】raftNode 模块
https://articles.zsxq.com/id_mpjcmfhvwftw.html
- https://boilingfrog.github.io/2021/08/18/etcd%E4%B8%ADraft%E7%9A%84
Go Etcd 源码学习【5】raftexample 中 Storage 是什么
https://articles.zsxq.com/id_vzi5tr44re0f.html
raft.Node 是纯状态机的实现。
raft.MemoryStorage 是存储的实现。
wal.WAL 是持久化 log 的实现。
这三个是业务 raftNode 组装功能的核心
Go Etcd 源码学习【9】 状态机
第四个资料:视频
2020 MIT 6.824 分布式系统
MIT著名课程:6.824: Distributed Systems
分布式系统 课程主页:https://pdos.csail.mit.edu/6.824/index.html
课程安排:https://pdos.csail.mit.edu/6.824/schedule.htm
https://www.bilibili.com/video/BV1R7411t71W/
https://mit-public-courses-cn-translatio.gitbook.io/mit6-824/lecture-08-zookeeper
第五个资料
第六个asktbug
https://asktug.com/t/topic/272938
https://asktug.com/t/topic/274731
第二步:讨论思路
问 project2aa 领导选择 参考 raftexample 这个吗?
如果用raft 层用 log.Debug 测试里没打出来,参考一下这个pr https://github.com/tidb-incubator/tinykv/pull/324 改一下 kv/config 的 log level,会更好定位一点
问2:Applied代表 最近日志,还是持久化日志,还是大多数统一路径呢?
// Applied is the last applied index. It should only be set when restarting
// raft. raft will not return entries to the application smaller or equal to
// Applied. If Applied is unset when restarting, raft might return previous
// applied entries. This is a very application dependent configuration.
Applied uint64
疑问3:// raft 状态机 func newRaft(c Config) raft {
疑问4:
持久化和没有持久化是怎么区分的,是wal写log吗?
疑问5:raft apply index 含义是什么?
行动:
https://www.zhihu.com/question/382888510
https://thesquareplanet.com/blog/raft-qa/
https://asktug.com/t/topic/69001
https://asktug.com/t/topic/69001
结果:
应用到state machine的index
state machine 怎么理解 就是写到内存里?
就是你收到了 raft log,
但是不一定把 kv 存到了本地(内存),把 kv 存到本地,
就算是 apply 了
「执行」
就是你收到了 raft log,但是不一定把 kv 存到了本地(内存),把 kv 存到本地,就算是 apply 了
只是保存了,算是persist
必须先存再执行
apply 还没到应用层,apply 之后还要 commit 给 leader
然后 leader 才返回给应用层
疑问6 HeartbeatTick 谁发出的?
https://asktug.com/t/topic/274755
https://asktug.com/t/topic/273977
https://asktug.com/t/topic/273265
行动:
- golang 定时任务方面time.Sleep和time.Tick的优劣对比
https://blog.csdn.net/Star_CSU/article/details/86650684
结果:
不知道
疑问7:随机赋予一个超时时间如何设定,然后整个业务逻辑是什么?
行动:
结果:
https://asktug.com/t/topic/274755/4
第三步:实现
第四步:测试
测试1 make project2aa
测试1 Leader election make project2aa
[ ] testUpdateTermFromMessage
// testUpdateTermFromMessage tests that if one server’s current term is
// smaller than the other’s, then it updates its current term to the larger
// value. If a candidate or leader discovers that its term is out of date,
// it immediately reverts to follower state.
// Reference: section 5.1
func testUpdateTermFromMessage(t *testing.T, state StateType) {
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
switch state {
case StateFollower:
r.becomeFollower(1, 2)
case StateCandidate:
r.becomeCandidate()
case StateLeader:
r.becomeCandidate()
r.becomeLeader()
}
r.Step(pb.Message{MsgType: pb.MessageType_MsgAppend, Term: 2})
if r.Term != 2 {
t.Errorf("term = %d, want %d", r.Term, 2)
}
if r.State != StateFollower {
t.Errorf("state = %v, want %v", r.State, StateFollower)
}
}
TestCandidateUpdateTermFromMessage2AAce
—- PASS: TestCandidateUpdateTermFromMessage2AA (0.00s)
TestLeaderUpdateTermFromMessage2AA
第一次测试失败:
raft_paper_test.go:67: term = 1, want 2
raft_paper_test.go:70: state = StateLeader, want StateFollower
--- FAIL: TestLeaderUpdateTermFromMessage2AA (0.00s)
第二次测试
—- PASS: TestLeaderUpdateTermFromMessage2AA (0.00s)
TestLeaderBcastBeat2AA
// TestLeaderBcastBeat tests that if the leader receives a heartbeat tick,
// it will send a MessageType_MsgHeartbeat with m.Index = 0, m.LogTerm=0 and empty entries
// as heartbeat to all followers.
// Reference: section 5.2
func TestLeaderBcastBeat2AA(t *testing.T) {
// heartbeat interval
hi := 1
r := newTestRaft(1, []uint64{1, 2, 3}, 10, hi, NewMemoryStorage())
r.becomeCandidate()
r.becomeLeader()
r.Step(pb.Message{MsgType: pb.MessageType_MsgPropose, Entries: []*pb.Entry{{}}})
r.readMessages() // clear message
for i := 0; i < hi; i++ {
r.tick()
}
msgs := r.readMessages()
sort.Sort(messageSlice(msgs))
wmsgs := []pb.Message{
{From: 1, To: 2, Term: 1, MsgType: pb.MessageType_MsgHeartbeat},
{From: 1, To: 3, Term: 1, MsgType: pb.MessageType_MsgHeartbeat},
}
if !reflect.DeepEqual(msgs, wmsgs) {
t.Errorf("msgs = %v, want %v", msgs, wmsgs)
}
}
第一次测试 失败
raft_paper_test.go:120: msgs = [], want [{MsgHeartbeat 2 1 1 0 0 [] 0 <nil> false {} [] 0} {MsgHeartbeat 3 1 1 0 0 [] 0 <nil> false {} [] 0}]
'MessageType_MsgPropose' proposes to append data to its log entries. This is a special
type to redirect proposals to the leader. Therefore, send method overwrites
eraftpb.Message's term with its HardState's term to avoid attaching its
local term to 'MessageType_MsgPropose'.
When 'MessageType_MsgPropose' is passed to the leader's 'Step'
method, the leader first calls the 'appendEntry' method to append entries
to its log, and then calls 'bcastAppend' method to send those entries to
its peers.
When passed to candidate, 'MessageType_MsgPropose' is dropped.
When passed to
follower, 'MessageType_MsgPropose' is stored in follower's mailbox(msgs) by the send
method. It is stored with sender's ID and later forwarded to the leader by
rafthttp package.
第二次测试; ok
测试4:TestFollowerStartElection2AA
func TestFollowerStartElection2AA(t *testing.T) {
testNonleaderStartElection(t, StateFollower)
}
func TestCandidateStartNewElection2AA(t *testing.T) {
testNonleaderStartElection(t, StateCandidate)
}
// testNonleaderStartElection tests that if a follower receives no communication
// over election timeout, it begins an election to choose a new leader. It
// increments its current term and transitions to candidate state. It then
// votes for itself and issues RequestVote RPCs in parallel to each of the
// other servers in the cluster.
// Reference: section 5.2
// Also if a candidate fails to obtain a majority, it will time out and
// start a new election by incrementing its term and initiating another
// round of RequestVote RPCs.
// Reference: section 5.2
func testNonleaderStartElection(t *testing.T, state StateType) {
第一次测试结果;失败 没有写代码
测试5:TestFollowerStartElection2AA
// TestCandidateFallback tests that while waiting for votes,
// if a candidate receives an AppendEntries RPC from another server claiming
// to be leader whose term is at least as large as the candidate's current term,
// it recognizes the leader as legitimate and returns to follower state.
// Reference: section 5.2
func TestCandidateFallback2AA(t *testing.T) {
'MessageType_MsgHup' is used for election.
If a node is a follower or candidate, the
'tick' function in 'raft' struct is set as 'tickElection'.
If a follower or
candidate has not received any heartbeat before the election timeout,
it
passes 'MessageType_MsgHup' to its Step method and becomes (or remains) a candidate to
start a new election.
- tick 出发 tickElection
- 在选举超时之前没有收到ack ,需要重新发起一轮选举
测试通过:
关键线索:
重新选举
// Also if a candidate fails to obtain a majority, it will time out and
// start a new election by incrementing its term and initiating another
// round of RequestVote RPCs.
测试 5 TestLeaderElectionInOneRoundRPC2AA
- 目的
// TestLeaderElectionInOneRoundRPC tests all cases that may happen in
// leader election during one round of RequestVote RPC:
// a) it wins the election
// b) it loses the election
// c) it is unclear about the result
// Reference: section 5.2
MessageType_MsgRequestVoteResponse
5.2 Leader election —TestFollowerVote2AA
Follower处理投票逻 RequestVote RPC
- 规则1 https://asktug.com/t/topic/69001
1、如果接收到的请求的term<=本follower节点的currentTerm,返回false,原论文这里只有小于,实际应该小于或者等于,因为相等的情况下,说明follower早就投过同任期的票了。
2、如果candidate的日志跟自己的一样或者比自己的新,则投出赞成票
那么怎样才算是更新呢,论文里提到:比较两个节点日志中的最新的日志记录,然后比较这两条日志记录的下标和所属任期号,所属任期号比较大的日志记录,那么它就是较新的,如果所属任期号相同,再比较下标值,下标值较大的则是较新的那一条。
3、若请求参数里的term大于自己的currentTerm,follower投票后,将自己的currentTerm赋值为请求参数term;(这条也是自己加的,论文体现在后面的节点规则中,
5.4.1 Election restriction
- 更高的term
make project2aa—TestLeaderCycle2AA
testLeaderCycle verifies that each node in a cluster can campaign
and be elected in turn. This ensures that elections work when not
starting from a clean slate (as they do in TestLeaderElection)
验证集群中的每个节点都可以竞选
- 错误日志
[root@localhost tinykv]# make project2aa >> a.out
make: * [Makefile:62:project2aa] 错误 1
[root@localhost tinykv]# vi a.out
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:2] [term: 0] received a MsgRequestVote message
2022/02/23 18:55:21 storage.FirstIndex failed
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:3] [term: 0] received a MsgRequestVote message
2022/02/23 18:55:21 storage.FirstIndex failed
2022/02/23 18:55:21 fucntion Step >> r.State =StateCandidate [node:1] [term: 1] received a MsgRequestVoteResponse message
2022/02/23 18:55:21 StepCandidate begin …
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 1 true 1 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 2 true 2 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse2 2 0 2
2022/02/23 18:55:21 becomeLeader:id =1 became leader at term =1
2022/02/23 18:55:21 fucntion Step >> r.State =StateLeader [node:1] [term: 1] received a MsgRequestVoteResponse message
2022/02/23 18:55:21 StepCandidate begin …
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 1 true 1 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 2 true 2 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 3 true 3 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse2 3 0 2
2022/02/23 18:55:21 becomeLeader:id =1 became leader at term =1
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:2] [term: 1] received a MsgHup message
2022/02/23 18:55:21 Candidate term = 2
2022/02/23 18:55:21 fucntion Step >> r.State =StateLeader [node:1] [term: 1] received a MsgRequestVote message
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:3] [term: 1] received a MsgRequestVote message
2022/02/23 18:55:21 storage.FirstIndex failed
2022/02/23 18:55:21 fucntion Step >> r.State =StateCandidate [node:2] [term: 2] received a MsgRequestVoteResponse message
2022/02/23 18:55:21 StepCandidate begin …
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 2 true 1 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse1 3 true 2 0
2022/02/23 18:55:21 MessageType_MsgRequestVoteResponse2 2 0 2
2022/02/23 18:55:21 becomeLeader:id =2 became leader at term =2
raft_test.go:116: campaignerID =2 after campaign of node 2, node 1 had state = StateLeader, want StateFollower
2022/02/23 18:55:21 fucntion Step >> r.State =StateFollower [node:3] [term: 2] received a MsgHup message
2022/02/23 18:55:21 Candidate term = 3
2022/02/23 18:55:21 fucntion Step >> r.State =StateLeader [node:1] [term: 2] received a MsgRequestVote message
2022/02/23 18:55:21 fucntion Step >> r.State =StateLeader [node:2] [term: 2] received a MsgRequestVote message
raft_test.go:116: campaignerID =3 after campaign of node 3, node 1 had state = StateLeader, want StateFollower
raft_test.go:116: campaignerID =3 after campaign of node 3, node 2 had state = StateLeader, want StateFollower
raft_test.go:113: campaignerID =3 campaigning node 3 state = StateCandidate, want StateLeader
—- FAIL: TestLeaderCycle2AA (0.00s)
=== RUN TestVoteFromAnyState2AA
https://search.asktug.com/?q=TestLeaderCycle2AA
https://asktug.com/t/topic/272914/3