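Preparation

A docker-swarm cluster consists of manager nodes and worker nodes, so you need several machines with Docker installed to act as its nodes. I prepared three Linux virtual machines (vm1, vm2, vm3) to build a minimal docker-swarm cluster with one manager node and two worker nodes.

docker-swarm requires a daemon API version of at least 1.24. You can check the daemon API version with docker version; mine is 1.41.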

  $ docker version
  Client: Docker Engine - Community
   Version:           20.10.0
   API version:       1.41
   Go version:        go1.13.15
   Git commit:        7287ab3
   Built:             Tue Dec 8 18:57:35 2020
   OS/Arch:           linux/amd64
   Context:           default
   Experimental:      true
  Server: Docker Engine - Community
   Engine:
    Version:          20.10.0
    API version:      1.41 (minimum version 1.12)
    Go version:       go1.13.15
    Git commit:       eeddea2
    Built:            Tue Dec 8 18:56:55 2020
    OS/Arch:          linux/amd64
    Experimental:     false
   containerd:
    Version:          1.4.3
    GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
   runc:
    Version:          1.0.0-rc92
    GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
   docker-init:
    Version:          0.19.0
    GitCommit:        de40ad0
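The version check can also be scripted. The following is a minimal sketch, assuming a POSIX shell with awk available. version_ge is a hypothetical helper name introduced here for illustration; the --format template asks the docker CLI for the server's API version only.

```shell
#!/bin/sh
# version_ge A B: succeed when the "major.minor" version A is >= B.
# (Hypothetical helper, not part of docker.)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 > y[1] + 0)
        exit !(x[2] + 0 >= y[2] + 0)
    }'
}

# Only query the daemon when the docker CLI is installed.
if command -v docker >/dev/null 2>&1; then
    api=$(docker version --format '{{.Server.APIVersion}}' 2>/dev/null)
    if [ -n "$api" ] && version_ge "$api" 1.24; then
        echo "daemon API $api is new enough for swarm mode"
    else
        echo "daemon API '$api' is too old, or the daemon is unreachable"
    fi
fi
```

Comparing the two components numerically avoids the trap of a plain string comparison, which would rank 1.9 above 1.24.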

Building the cluster

Creating the manager node

I have already installed Docker on all three virtual machines. With vm1 as the manager node, run docker swarm init on vm1.

  $ docker swarm init --advertise-addr 192.168.48.128
  Swarm initialized: current node (d65uz80dl1y5cf43717fh2m07) is now a manager.
  To add a worker to this swarm, run the following command:
      docker swarm join \
      --token SWMTKN-1-4bfzferp69e97yzd7f83d6cadjjov5sl9klyqr8mhe0tp89m2o-6ic02cl2fzyq3jfvz7n611ujl \
      192.168.48.128:2377
  To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

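If your Docker host has multiple network interfaces and therefore multiple IP addresses, you must specify the IP with --advertise-addr. The node on which docker swarm init is run automatically becomes a manager node.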
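To decide which address to pass to --advertise-addr, it helps to list the host's IPv4 addresses first. This is a rough sketch, assuming a Linux host with the iproute2 ip command; list_ipv4 is a hypothetical helper, and the address in the final comment is the one used in this walkthrough.

```shell
#!/bin/sh
# list_ipv4: read `ip -4 -o addr show` output on stdin and print
# "INTERFACE ADDRESS" for every non-loopback interface.
# (Hypothetical helper for illustration.)
list_ipv4() {
    awk '$2 != "lo" { sub(/\/.*/, "", $4); print $2, $4 }'
}

# Show the candidates on this host when the ip tool is available.
if command -v ip >/dev/null 2>&1; then
    ip -4 -o addr show | list_ipv4
fi

# Then pass the chosen address explicitly on the manager, for example:
#   docker swarm init --advertise-addr 192.168.48.128
```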

Adding worker nodes

Run the following on vm2 and vm3

  $ docker swarm join \
      --token SWMTKN-1-4bfzferp69e97yzd7f83d6cadjjov5sl9klyqr8mhe0tp89m2o-6ic02cl2fzyq3jfvz7n611ujl \
      192.168.48.128:2377
  This node joined a swarm as a worker.
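If the output of docker swarm init is no longer at hand, the worker token can be read back on the manager with docker swarm join-token. The sketch below assumes it runs on the manager node; join_cmd is a hypothetical helper that only prints the command, and the manager address is the one from this walkthrough.

```shell
#!/bin/sh
# join_cmd TOKEN MANAGER_ADDR: print the command a worker has to run.
# (Hypothetical helper; it prints the command and does not run it.)
join_cmd() {
    printf 'docker swarm join --token %s %s\n' "$1" "$2"
}

# On the manager: fetch the current worker token (-q prints only the token).
if command -v docker >/dev/null 2>&1; then
    if token=$(docker swarm join-token -q worker 2>/dev/null); then
        join_cmd "$token" "192.168.48.128:2377"
    else
        echo "this node is not a swarm manager" >&2
    fi
fi
```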

If docker swarm join fails with the following error:

  Error response from daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.48.128:2377: connect: no route to host"

This is caused by the firewall on the manager node's machine. Use systemctl status firewalld.service to check the firewall status:

  $ systemctl status firewalld.service
  firewalld.service - firewalld - dynamic firewall daemon
     Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
     Active: active (running) since 2020-12-10 00:36:44 CST; 9h ago
       Docs: man:firewalld(1)
   Main PID: 737 (firewalld)
      Tasks: 2
     Memory: 1.4M
     CGroup: /system.slice/firewalld.service
             └─737 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid

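The firewall is in the running state, so you need to either open port 2377 or turn the firewall off entirely. Since this is a demo environment, I took the simple route and turned it off.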

  $ systemctl stop firewalld.service

Once the firewall is off, running docker swarm join again succeeds.
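Outside a demo environment, opening the required ports is usually preferable to stopping the firewall. The sketch below assumes firewalld, as in the status output above, and uses the ports the Docker documentation lists for swarm mode: 2377/tcp for cluster management, 7946/tcp and 7946/udp for node-to-node communication, and 4789/udp for overlay network traffic. swarm_ports and open_swarm_ports are hypothetical helper names.

```shell
#!/bin/sh
# swarm_ports: print the ports swarm mode needs, one PORT/PROTO per line.
swarm_ports() {
    printf '%s\n' 2377/tcp 7946/tcp 7946/udp 4789/udp
}

# open_swarm_ports RUNNER: add permanent firewalld rules through RUNNER.
# Pass "echo" for a dry run that only prints the commands, or "sudo"
# to actually apply them.
open_swarm_ports() {
    runner=$1
    for port in $(swarm_ports); do
        $runner firewall-cmd --permanent --add-port="$port"
    done
    $runner firewall-cmd --reload
}

# Dry run: print the firewall-cmd invocations without changing anything.
open_swarm_ports echo
```

Replace echo with sudo on each node once the printed commands look right.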

Viewing the cluster

Run docker node ls on the manager node to view the cluster's node information

  $ docker node ls
  ID                            HOSTNAME                STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
  50x6p24w5b1csmcoolrbjeh20     localhost.localdomain   Ready     Active                          20.10.0
  d65uz80dl1y5cf43717fh2m07 *   localhost.localdomain   Ready     Active         Leader           20.10.0
  oo1py2br7jr96lk8rv7wkcxwt     localhost.localdomain   Ready     Active                          20.10.0

The node d65uz80dl1y5cf43717fh2m07 is the leader; the other two are worker nodes.
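Because all three hosts here share the hostname localhost.localdomain, the node ID is the only way to tell them apart. As a small sketch, the roles can be read from the MANAGER STATUS column with a --format template; node_roles is a hypothetical helper, and it relies on that column being empty for workers.

```shell
#!/bin/sh
# node_roles: read "ID MANAGER_STATUS" lines on stdin. An empty manager
# status means the node is a worker. (Hypothetical helper.)
node_roles() {
    awk '{
        role = (NF >= 2) ? ("manager (" $2 ")") : "worker"
        print $1 ": " role
    }'
}

# On a manager node: list every node with its role.
if command -v docker >/dev/null 2>&1; then
    docker node ls --format '{{.ID}} {{.ManagerStatus}}' 2>/dev/null | node_roles
fi
```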

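Service management

The docker service command manages services in the cluster. It can only be run on a manager node.

Creating a service

Taking nginx as an example, create an nginx service with docker service create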

  $ docker service create --replicas 2 -p 80:80 --name nginx nginx:1.19.5-alpine
  image nginx:1.19.5-alpine could not be accessed on a registry to record
  its digest. Each node will access nginx:1.19.5-alpine independently,
  possibly leading to different nodes running different
  versions of the image.
  e3gn7co90wqp4zbwpse6719bq
  overall progress: 2 out of 2 tasks
  1/2: running [==================================================>]
  2/2: running [==================================================>]
  verify: Service converged

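Once the service has been created, nginx can be reached on port 80 of any node in the cluster.

Viewing services

docker service ls

Use docker service ls to list the services running in the swarm cluster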

  $ docker service ls
  ID             NAME    MODE         REPLICAS   IMAGE                 PORTS
  e3gn7co90wqp   nginx   replicated   2/2        nginx:1.19.5-alpine   *:80->80/tcp

docker service inspect

Use docker service inspect to view the details of a service. Its syntax is:

  docker service inspect [OPTIONS] SERVICE [SERVICE...]

Example

  $ docker service inspect nginx
  [
      {
          "ID": "c1u4o956nygnthpclzpvp91r4",
          "Version": {
              "Index": 973
          },
          "CreatedAt": "2020-12-11T03:06:22.605951309Z",
          "UpdatedAt": "2020-12-11T03:06:33.981409197Z",
          "Spec": {
              "Name": "nginx",
              "Labels": {},
              "TaskTemplate": {
                  "ContainerSpec": {
                      "Image": "nginx:1.19.5-alpine@sha256:1e9c503db9913a59156f78c6420f6e2f01c8a3b71ceeeddcd7f604c4db0f045e",
                      "Init": false,
                      "StopGracePeriod": 10000000000,
                      "DNSConfig": {},
                      "Isolation": "default"
                  },
                  "Resources": {
                      "Limits": {},
                      "Reservations": {}
                  },
                  "RestartPolicy": {
                      "Condition": "any",
                      "Delay": 5000000000,
                      "MaxAttempts": 0
                  },
                  "Placement": {
                      "Platforms": [
                          {
                              "Architecture": "amd64",
                              "OS": "linux"
                          },
                          {
                              "OS": "linux"
                          },
                          {
                              "OS": "linux"
                          },
                          {
                              "Architecture": "arm64",
                              "OS": "linux"
                          },
                          {
                              "Architecture": "386",
                              "OS": "linux"
                          },
                          {
                              "Architecture": "ppc64le",
                              "OS": "linux"
                          },
                          {
                              "Architecture": "s390x",
                              "OS": "linux"
                          }
                      ]
                  },
                  "ForceUpdate": 0,
                  "Runtime": "container"
              },
              "Mode": {
                  "Replicated": {
                      "Replicas": 4
                  }
              },
              "UpdateConfig": {
                  "Parallelism": 1,
                  "FailureAction": "pause",
                  "Monitor": 5000000000,
                  "MaxFailureRatio": 0,
                  "Order": "stop-first"
              },
              "RollbackConfig": {
                  "Parallelism": 1,
                  "FailureAction": "pause",
                  "Monitor": 5000000000,
                  "MaxFailureRatio": 0,
                  "Order": "stop-first"
              },
              "EndpointSpec": {
                  "Mode": "vip",
                  "Ports": [
                      {
                          "Protocol": "tcp",
                          "TargetPort": 80,
                          "PublishedPort": 80,
                          "PublishMode": "ingress"
                      }
                  ]
              }
          },
          "PreviousSpec": {
              "Name": "nginx",
              "Labels": {},
              "TaskTemplate": {
                  "ContainerSpec": {
                      "Image": "nginx:1.19.5-alpine@sha256:1e9c503db9913a59156f78c6420f6e2f01c8a3b71ceeeddcd7f604c4db0f045e",
                      "Init": false,
                      "DNSConfig": {},
                      "Isolation": "default"
                  },
                  "Resources": {
                      "Limits": {},
                      "Reservations": {}
                  },
                  "Placement": {
                      "Platforms": [
                          {
                              "Architecture": "amd64",
                              "OS": "linux"
                          },
                          {
                              "OS": "linux"
                          },
                          {
                              "OS": "linux"
                          },
                          {
                              "Architecture": "arm64",
                              "OS": "linux"
                          },
                          {
                              "Architecture": "386",
                              "OS": "linux"
                          },
                          {
                              "Architecture": "ppc64le",
                              "OS": "linux"
                          },
                          {
                              "Architecture": "s390x",
                              "OS": "linux"
                          }
                      ]
                  },
                  "ForceUpdate": 0,
                  "Runtime": "container"
              },
              "Mode": {
                  "Replicated": {
                      "Replicas": 2
                  }
              },
              "EndpointSpec": {
                  "Mode": "vip",
                  "Ports": [
                      {
                          "Protocol": "tcp",
                          "TargetPort": 80,
                          "PublishedPort": 80,
                          "PublishMode": "ingress"
                      }
                  ]
              }
          },
          "Endpoint": {
              "Spec": {
                  "Mode": "vip",
                  "Ports": [
                      {
                          "Protocol": "tcp",
                          "TargetPort": 80,
                          "PublishedPort": 80,
                          "PublishMode": "ingress"
                      }
                  ]
              },
              "Ports": [
                  {
                      "Protocol": "tcp",
                      "TargetPort": 80,
                      "PublishedPort": 80,
                      "PublishMode": "ingress"
                  }
              ],
              "VirtualIPs": [
                  {
                      "NetworkID": "kj5uhvjip1s25i77eixlsz6fr",
                      "Addr": "10.0.0.8/24"
                  }
              ]
          }
      }
  ]

This is equivalent to docker service inspect c1u4o956nygnthpclzpvp91r4

For more usage, see: https://docs.docker.com/engine/reference/commandline/service_inspect/
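The full JSON is rarely needed in one piece. As a minimal sketch, a Go template passed to --format can pull out a single field, and --pretty prints a condensed human-readable summary. service_replicas is a hypothetical wrapper name; the template path follows the Spec.Mode.Replicated.Replicas field visible in the JSON above and only applies to replicated services.

```shell
#!/bin/sh
# service_replicas SERVICE: print the desired replica count of a
# replicated service. (Hypothetical wrapper around docker service inspect.)
service_replicas() {
    docker service inspect --format '{{.Spec.Mode.Replicated.Replicas}}' "$1"
}

# Usage on a manager node:
#   service_replicas nginx
#   docker service inspect --pretty nginx
```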

docker service ps

Use docker service ps to list the tasks of a service together with their state

  $ docker service ps nginx
  ID             NAME      IMAGE                 NODE                    DESIRED STATE   CURRENT STATE         ERROR   PORTS
  wkgm6q8jnsfd   nginx.1   nginx:1.19.5-alpine   localhost.localdomain   Running         Running 4 hours ago
  6qztt4rxmo20   nginx.2   nginx:1.19.5-alpine   localhost.localdomain   Running         Running 4 hours ago

Besides the running tasks, the output of docker service ps also shows the service's task history

  $ docker service ps nginx
  ID             NAME          IMAGE                 NODE                    DESIRED STATE   CURRENT STATE             ERROR   PORTS
  wkx10y8dt219   nginx.1       nginx:1.19.5-alpine   localhost.localdomain   Running         Running 32 minutes ago
  sdnkq64hfz5k    \_ nginx.1   nginx:1.19.5-alpine   localhost.localdomain   Shutdown        Complete 33 minutes ago
  kijx58koymhi   nginx.2       nginx:1.19.5-alpine   localhost.localdomain   Running         Running 32 minutes ago

Reference: https://docs.docker.com/engine/reference/commandline/service_ps/
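When the history rows get in the way, the task list can be narrowed with the desired-state filter. This is a small sketch; running_tasks is a hypothetical wrapper name, and the --format template keeps only the task name, node, and current state.

```shell
#!/bin/sh
# running_tasks SERVICE: list only the tasks that are meant to be running,
# hiding the shutdown entries kept as history. (Hypothetical wrapper.)
running_tasks() {
    docker service ps \
        --filter desired-state=running \
        --format '{{.Name}} {{.Node}} {{.CurrentState}}' \
        "$1"
}

# Usage on a manager node:
#   running_tasks nginx
```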

docker service logs

Use docker service logs to view the logs of a service. Its syntax is:

  docker service logs [OPTIONS] SERVICE|TASK

Example

  $ docker service logs nginx
  nginx.2.6qztt4rxmo20@localhost.localdomain | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
  nginx.2.6qztt4rxmo20@localhost.localdomain | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
  nginx.2.6qztt4rxmo20@localhost.localdomain | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
  nginx.2.6qztt4rxmo20@localhost.localdomain | 10-listen-on-ipv6-by-default.sh: Getting the checksum of /etc/nginx/conf.d/default.conf
  nginx.2.6qztt4rxmo20@localhost.localdomain | 10-listen-on-ipv6-by-default.sh: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
  nginx.2.6qztt4rxmo20@localhost.localdomain | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
  nginx.2.6qztt4rxmo20@localhost.localdomain | /docker-entrypoint.sh: Configuration complete; ready for start up
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | 10-listen-on-ipv6-by-default.sh: Getting the checksum of /etc/nginx/conf.d/default.conf
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | 10-listen-on-ipv6-by-default.sh: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | /docker-entrypoint.sh: Configuration complete; ready for start up
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | 10.0.0.4 - - [10/Dec/2020:09:24:29 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36" "-"
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | 2020/12/10 09:24:30 [error] 30#30: *1 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: 10.0.0.4, server: localhost, request: "GET /favicon.ico HTTP/1.1", host: "192.168.48.130", referrer: "http://192.168.48.130/"
  nginx.1.wkgm6q8jnsfd@localhost.localdomain | 10.0.0.4 - - [10/Dec/2020:09:24:30 +0000] "GET /favicon.ico HTTP/1.1" 404 555 "http://192.168.48.130/" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36" "-"
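On a busy service the full log is long, so it is common to limit the output. The sketch below uses the --tail and --timestamps options of docker service logs; recent_logs is a hypothetical wrapper name and the default of 20 lines is an arbitrary choice.

```shell
#!/bin/sh
# recent_logs SERVICE [LINES]: print the last LINES log lines (default 20)
# of a service, each prefixed with a timestamp. (Hypothetical wrapper.)
recent_logs() {
    docker service logs --timestamps --tail "${2:-20}" "$1"
}

# Usage on a manager node:
#   recent_logs nginx 50
# To keep streaming new lines instead, add --follow:
#   docker service logs --follow --tail 20 nginx
```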

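Scaling a service

Use docker service scale to change the number of containers a service runs

  • When traffic is steady, reduce the number of containers the service runs

    $ docker service scale nginx=1
  • At peak times, increase the number of containers the service runs

    $ docker service scale nginx=3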
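After scaling, a script may want to wait until the new replica count is actually reached. This is a rough sketch that polls the REPLICAS column of docker service ls; wait_converged is a hypothetical helper, the one-second poll interval and default of 30 tries are arbitrary, and it assumes no other service name starts with the same prefix, since the name filter matches prefixes.

```shell
#!/bin/sh
# wait_converged SERVICE WANT [TRIES]: poll once per second until
# `docker service ls` reports WANT/WANT replicas, or give up after
# TRIES attempts (default 30). (Hypothetical helper.)
wait_converged() {
    service=$1 want=$2 tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        have=$(docker service ls --filter "name=$service" --format '{{.Replicas}}')
        if [ "$have" = "$want/$want" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage on a manager node:
#   docker service scale nginx=3 && wait_converged nginx 3
```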

Deleting a service

Use docker service rm to remove a service from the swarm cluster

  $ docker service rm nginx