Environment setup

Option 1

  • Buy at least 4 Alibaba Cloud servers

Option 2

  • Use a computer powerful enough to run 4 virtual machines

My computer's specifications are as follows:
image.png

Working mode

Docker Engine 1.12 introduces swarm mode that enables you to create a cluster of one or more Docker Engines called a swarm. A swarm consists of one or more nodes: physical or virtual machines running Docker Engine 1.12 or later in swarm mode.
There are two types of nodes: managers and workers.
Docker Swarm - Figure 4
If you haven’t already, read through the swarm mode overview and key concepts.

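Nodes

  • Manager nodes (Manager in the figure above)
    • Manager nodes can communicate with each other
    • Manager nodes can control worker nodes, but not the other way around
  • Worker nodes (Worker in the figure above)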
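
To check which of the two roles the local engine currently holds, one option is to query docker info. The sketch below assumes the `.Swarm.LocalNodeState` and `.Swarm.ControlAvailable` template fields of `docker info --format`; `swarm_role` is an illustrative helper name, not a Docker command.

```shell
# Print "manager", "worker", or "inactive" for the local Docker engine.
# swarm_role is a hypothetical helper, not part of the Docker CLI.
swarm_role() {
    state=$(docker info --format '{{.Swarm.LocalNodeState}}')
    if [ "$state" != "active" ]; then
        # The engine has not joined (or has left) a swarm.
        echo "inactive"
        return 0
    fi
    # ControlAvailable is true only on manager nodes.
    if [ "$(docker info --format '{{.Swarm.ControlAvailable}}')" = "true" ]; then
        echo "manager"
    else
        echo "worker"
    fi
}
```

Calling `swarm_role` on each machine gives a quick way to confirm the layout before running manager-only commands such as docker node ls.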

Building the cluster

docker swarm command usage

```shell
docker swarm --help

Usage:  docker swarm COMMAND

Manage Swarm

Options:
      --help   Print usage

Commands:
  # Initialize the cluster
  init        Initialize a swarm
  # Join the cluster
  join        Join a swarm as a node and/or manager
  # Create a cluster join token
  join-token  Manage join tokens
  # Leave the cluster
  leave       Leave the swarm
  unlock      Unlock swarm
  unlock-key  Manage the unlock key
  # Update the cluster
  update      Update the swarm

Run 'docker swarm COMMAND --help' for more information on a command.
```

## docker swarm init command usage
```shell
docker swarm init --help

Usage:  docker swarm init [OPTIONS]

Initialize a swarm

Options:
      # Address advertised to the other nodes
      --advertise-addr string           Advertised address (format: <ip|interface>[:port])
      --autolock                        Enable manager autolocking (requiring an unlock key to start a stopped manager)
      --cert-expiry duration            Validity period for node certificates (ns|us|ms|s|m|h) (default 2160h0m0s)
      --dispatcher-heartbeat duration   Dispatcher heartbeat period (ns|us|ms|s|m|h) (default 5s)
      --external-ca external-ca         Specifications of one or more certificate signing endpoints
      --force-new-cluster               Force create a new cluster from current state
      --help                            Print usage
      --listen-addr node-addr           Listen address (format: <ip|interface>[:port]) (default 0.0.0.0:2377)
      --max-snapshots uint              Number of additional Raft snapshots to retain
      --snapshot-interval uint          Number of log entries between Raft snapshots (default 10000)
      --task-history-limit int          Task history retention limit (default 5)
```
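
As a minimal sketch of how the two address options fit together, the helper below initializes a swarm with an explicit advertised address and the default listen address from the help text. `init_manager` is an illustrative name, and the IP in the usage comment is the manager used later in this walkthrough.

```shell
# Initialize a swarm on this machine.
# $1: the IP address other nodes should use to reach this manager.
# init_manager is a hypothetical helper, not part of the Docker CLI.
init_manager() {
    if [ -z "$1" ]; then
        echo "usage: init_manager <advertise-ip>" >&2
        return 1
    fi
    # 0.0.0.0:2377 is the documented default for --listen-addr;
    # it is spelled out here only to make the port visible.
    docker swarm init --advertise-addr "$1" --listen-addr "0.0.0.0:2377"
}

# Usage (on the machine that should become the first manager):
#   init_manager 192.168.101.67
```

Passing `--advertise-addr` explicitly matters mostly on hosts with more than one network interface, where the engine cannot guess which address the other nodes can reach.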

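Examples

  1. Initialize the cluster and make the specified machine the manager node (Manager role)

image.png
(The screenshot above was taken on server 192.168.101.67)

  2. Use the command shown in the screenshot above to add another machine to the cluster initialized in step 1 (Worker role)

image.png
(The screenshot above was taken on server 192.168.101.68)
image.png
(The screenshot above was taken on server 192.168.101.67)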

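  3. The join command from step 1 can be generated again with the following commands:

```shell
# Print the command for joining the cluster as a manager
docker swarm join-token manager

# Print the command for joining the cluster as a worker
docker swarm join-token worker
```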

![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644219814198-902ba92a-2192-4d41-9637-8656d01f4612.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=121&id=uce5f0a64&margin=%5Bobject%20Object%5D&name=image.png&originHeight=212&originWidth=1604&originalType=binary&ratio=1&rotation=0&showTitle=false&size=23305&status=done&style=none&taskId=u03199486-d7dc-4ffe-9db3-15dba0bd1e6&title=&width=916.5714285714286)
(The screenshot above was taken on server 192.168.101.67)

  4. Copy the command from step 3 to another server that has not yet joined the cluster, so that it joins as a worker

![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644219918125-4a31c77f-9fa1-4c7d-885c-4016d437a480.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=70&id=uc5fe641b&margin=%5Bobject%20Object%5D&name=image.png&originHeight=123&originWidth=1637&originalType=binary&ratio=1&rotation=0&showTitle=false&size=19446&status=done&style=none&taskId=u82c236e5-993e-4c92-b09b-28c18f8b8e6&title=&width=935.4285714285714)
(The screenshot above was taken on server 192.168.101.70)
![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644219956513-b1ff6210-d6f5-49b1-98c9-1d8acdb9dc56.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=90&id=u2baf133e&margin=%5Bobject%20Object%5D&name=image.png&originHeight=157&originWidth=1487&originalType=binary&ratio=1&rotation=0&showTitle=false&size=26539&status=done&style=none&taskId=u1d68bc8f-2f00-4d41-bf8a-7f5de0c8b42&title=&width=849.7142857142857)
(The screenshot above was taken on server 192.168.101.67)

  5. Use docker swarm join-token manager to generate the command for joining as a manager. As in step 4, have another server join the cluster, this time as a manager, then return to 67 and view the cluster members

![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644220092512-836cdec0-79e4-4582-aa37-7ca59085ac82.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=105&id=ub5d30f11&margin=%5Bobject%20Object%5D&name=image.png&originHeight=184&originWidth=1446&originalType=binary&ratio=1&rotation=0&showTitle=false&size=32544&status=done&style=none&taskId=uc82b07d9-0413-48b5-a48a-ef441a74fa8&title=&width=826.2857142857143)

## Workflow for building a swarm cluster
1. Create the first manager node (with the init command)
2. Join the other nodes (as manager or worker)
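
The join half of this workflow can be sketched as a helper that runs on a manager and prints the command to paste on the joining machine. It assumes the `-q` (quiet) flag of docker swarm join-token, which prints only the token; `print_join_command` is an illustrative name, and port 2377 is the default listen port from the init help text.

```shell
# On a manager: print the full join command for a given role.
# $1: role, either "worker" or "manager"
# $2: manager address, e.g. 192.168.101.67:2377
# print_join_command is a hypothetical helper, not part of the Docker CLI.
print_join_command() {
    role=$1
    manager_addr=$2
    # -q prints only the token instead of the whole instruction text.
    token=$(docker swarm join-token -q "$role") || return 1
    printf 'docker swarm join --token %s %s\n' "$token" "$manager_addr"
}

# Usage (on 192.168.101.67):
#   print_join_command worker 192.168.101.67:2377
# then run the printed command on the machine that should join.
```

Keeping the role as a parameter makes it explicit that the worker and manager tokens differ, so the role a node ends up with is decided by which token it presents.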
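
# Raft consensus algorithm

## The question
- In the cluster (currently two managers and two workers), if one node goes down, can the other nodes still be used?

## Definition
- The cluster is usable only while a majority of its manager nodes is alive. To survive the loss of one manager, more than one manager must remain, so the cluster needs at least 3 managers.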

## Experiment
1. Stop manager 67 (simulating a crash). In the two-manager, two-worker cluster, the other manager can no longer be used either

![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644220559001-5754a06d-027d-4507-a32d-0239255e28ac.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=48&id=u04ee40af&margin=%5Bobject%20Object%5D&name=image.png&originHeight=84&originWidth=1329&originalType=binary&ratio=1&rotation=0&showTitle=false&size=10878&status=done&style=none&taskId=u8028e613-4d45-47ae-bd33-4e5f26f1769&title=&width=759.4285714285714)
(The screenshot above was taken on server 192.168.101.72)

2. Bring manager 67 back online. The leader has changed from 67 to 72

![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644220780332-968ae7b3-1c4b-4c8c-a0d3-7533e0e61421.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=110&id=u769f80c9&margin=%5Bobject%20Object%5D&name=image.png&originHeight=192&originWidth=1500&originalType=binary&ratio=1&rotation=0&showTitle=false&size=33200&status=done&style=none&taskId=u0a1bfbdf-c519-43d2-8652-fdd9d436a42&title=&width=857.1428571428571)
(The screenshot above was taken on server 192.168.101.67)

3. Node 70 (a worker) leaves the cluster. In the manager's node list, its status changes to Down

![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644220880414-e07bd69d-6a4e-4b3d-84be-46ecacb77006.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=40&id=u93ef9338&margin=%5Bobject%20Object%5D&name=image.png&originHeight=70&originWidth=658&originalType=binary&ratio=1&rotation=0&showTitle=false&size=6687&status=done&style=none&taskId=u9e8bd028-0ce1-4cf2-a22d-7d2c0b7078f&title=&width=376)
(The screenshot above was taken on server 192.168.101.70)
![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644220903702-c9d2e860-ea2d-4ff6-a19e-064fbcad8312.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=105&id=u6fed4f5b&margin=%5Bobject%20Object%5D&name=image.png&originHeight=184&originWidth=1475&originalType=binary&ratio=1&rotation=0&showTitle=false&size=32773&status=done&style=none&taskId=ud6a86ea3-1f27-47dc-ae6c-a93cde98d95&title=&width=842.8571428571429)
(The screenshot above was taken on server 192.168.101.72)

4. On any manager, generate the command for joining as a manager, then have the node that left in step 3 rejoin the cluster as a manager. That node can now use node ls to view the cluster members

![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644221130910-d95fe3f1-b18b-4a1a-8685-5ea474924323.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=214&id=u1de743b7&margin=%5Bobject%20Object%5D&name=image.png&originHeight=375&originWidth=1667&originalType=binary&ratio=1&rotation=0&showTitle=false&size=66653&status=done&style=none&taskId=u801f96a8-f348-47d5-a482-1769eff916b&title=&width=952.5714285714286)

5. Repeat step 1 on managers 67 and 70, so that only manager 72 is still alive. Manager 72 can no longer run node ls, and the whole cluster is down

![image.png](https://cdn.nlark.com/yuque/0/2022/png/2516625/1644221692694-1d9d9aa3-4587-4c8b-ad29-a048f974ef98.png#clientId=uac7adfdf-922f-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=41&id=ue75da0d4&margin=%5Bobject%20Object%5D&name=image.png&originHeight=72&originWidth=1327&originalType=binary&ratio=1&rotation=0&showTitle=false&size=9382&status=done&style=none&taskId=u53a0e4f0-0bca-44c4-9ce3-7094e093fad&title=&width=758.2857142857143)
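
The "stop a manager, then check from another manager" steps of this experiment might look like the sketch below. It assumes the hosts manage Docker through systemd (`systemctl stop docker`), which the original notes do not state; both function names are illustrative.

```shell
# Run on the manager that should "crash".
# Assumption: Docker is managed by systemd on this host.
simulate_manager_failure() {
    systemctl stop docker
}

# Run on a surviving manager. docker node ls needs a working Raft
# majority, so its exit status is used here as a simple quorum probe.
check_quorum() {
    if docker node ls >/dev/null 2>&1; then
        echo "quorum ok"
    else
        echo "quorum lost"
    fi
}
```

Running `simulate_manager_failure` on 67 and then `check_quorum` on 72 would be one way to reproduce step 1; starting Docker again on 67 corresponds to step 2.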
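
## Summary
- For the cluster to stay available when a manager fails, it needs at least 3 manager nodes. With 3 managers:
  - If 2 managers go down, the whole cluster goes down
  - If 1 manager goes down, the cluster keeps working
- In one sentence: the cluster is usable only while a majority of managers is alive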
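
The majority rule can be written as arithmetic: with N managers, the quorum is floor(N/2) + 1, and the cluster tolerates floor((N-1)/2) manager failures. The sketch below only illustrates that arithmetic; the function names are made up for this example.

```shell
# Majority (quorum) required among N managers: floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of manager failures the cluster can survive: floor((N-1)/2).
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# Print the figures for small manager counts.
for n in 1 2 3 4 5; do
    printf '%s managers: quorum %s, tolerates %s failure(s)\n' \
        "$n" "$(quorum "$n")" "$(tolerated_failures "$n")"
done
```

For 2 managers the quorum is 2 and no failure is tolerated, which matches step 1 of the experiment. For 3 managers the quorum is 2 and one failure is tolerated, which matches the summary.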

# Dynamic scaling

## Definition
- Containers are turned into services (e.g. a redis service may consist of 10 replicas, that is, 10 redis containers running at the same time)

## docker service command usage
```shell
docker service --help

Usage:  docker service COMMAND

Manage services

Options:
      --help   Print usage

Commands:
  create      Create a new service
  inspect     Display detailed information on one or more services
  ls          List services
  ps          List the tasks of a service
  rm          Remove one or more services
  scale       Scale one or multiple replicated services
  update      Update a service

Run 'docker service COMMAND --help' for more information on a command.
```

Hands-on

docker service create command usage

    docker service create --help

    Usage:  docker service create [OPTIONS] IMAGE [COMMAND] [ARG...]

    Create a new service

    Options:
          --constraint list                  Placement constraints (default [])
          --container-label list             Container labels (default [])
          --dns list                         Set custom DNS servers (default [])
          --dns-option list                  Set DNS options (default [])
          --dns-search list                  Set custom DNS search domains (default [])
          --endpoint-mode string             Endpoint mode (vip or dnsrr)
          # Environment configuration
      -e, --env list                         Set environment variables (default [])
          --env-file list                    Read in a file of environment variables (default [])
          --group list                       Set one or more supplementary user groups for the container (default [])
          --health-cmd string                Command to run to check health
          --health-interval duration         Time between running the check (ns|us|ms|s|m|h)
          --health-retries int               Consecutive failures needed to report unhealthy
          --health-timeout duration          Maximum time to allow one check to run (ns|us|ms|s|m|h)
          --help                             Print usage
          --host list                        Set one or more custom host-to-IP mappings (host:ip) (default [])
          --hostname string                  Container hostname
          # Labels attached to the service
      -l, --label list                       Service labels (default [])
          --limit-cpu decimal                Limit CPUs (default 0.000)
          --limit-memory bytes               Limit Memory (default 0 B)
          --log-driver string                Logging driver for service
          --log-opt list                     Logging driver options (default [])
          --mode string                      Service mode (replicated or global) (default "replicated")
          --mount mount                      Attach a filesystem mount to the service
          --name string                      Service name
          --network list                     Network attachments (default [])
          --no-healthcheck                   Disable any container-specified HEALTHCHECK
          # Port exposed on the nodes
      -p, --publish port                     Publish a port as a node port
          --replicas uint                    Number of tasks
          --reserve-cpu decimal              Reserve CPUs (default 0.000)
          --reserve-memory bytes             Reserve Memory (default 0 B)
          --restart-condition string         Restart when condition is met (none, on-failure, or any)
          --restart-delay duration           Delay between restart attempts (ns|us|ms|s|m|h)
          --restart-max-attempts uint        Maximum number of restarts before giving up
          --restart-window duration          Window used to evaluate the restart policy (ns|us|ms|s|m|h)
          --secret secret                    Specify secrets to expose to the service
          --stop-grace-period duration       Time to wait before force killing a container (ns|us|ms|s|m|h)
      -t, --tty                              Allocate a pseudo-TTY
          --update-delay duration            Delay between updates (ns|us|ms|s|m|h) (default 0s)
          --update-failure-action string     Action on update failure (pause|continue) (default "pause")
          --update-max-failure-ratio float   Failure rate to tolerate during an update
          --update-monitor duration          Duration after each task update to monitor for failure (ns|us|ms|s|m|h) (default 0s)
          --update-parallelism uint          Maximum number of tasks updated simultaneously (0 to update all at once) (default 1)
          # User configuration
      -u, --user string                      Username or UID (format: <name|uid>[:<group|gid>])
          --with-registry-auth               Send registry authentication details to swarm agents
          # Working directory configuration
      -w, --workdir string                   Working directory inside the container

  • Examples:
    • Start a custom nginx service

image.png

  • Look up the running nginx service by service name

image.png

  • View detailed information about the running nginx service

    docker service inspect mynginx
    [
      {
        "ID": "ljlbfk4122sy3cfczmzbjv6k7",
        "Version": {
          "Index": 29
        },
        "CreatedAt": "2022-02-07T11:52:41.695689971Z",
        "UpdatedAt": "2022-02-07T11:52:41.699705732Z",
        "Spec": {
          "Name": "mynginx",
          "TaskTemplate": {
            "ContainerSpec": {
              "Image": "nginx:latest@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31",
              "DNSConfig": {}
            },
            "Resources": {
              "Limits": {},
              "Reservations": {}
            },
            "RestartPolicy": {
              "Condition": "any",
              "MaxAttempts": 0
            },
            "Placement": {},
            "ForceUpdate": 0
          },
          "Mode": {
            "Replicated": {
              "Replicas": 1
            }
          },
          "UpdateConfig": {
            "Parallelism": 1,
            "FailureAction": "pause",
            "MaxFailureRatio": 0
          },
          "EndpointSpec": {
            "Mode": "vip",
            "Ports": [
              {
                "Protocol": "tcp",
                "TargetPort": 80,
                "PublishedPort": 8888,
                "PublishMode": "ingress"
              }
            ]
          }
        },
        "Endpoint": {
          "Spec": {
            "Mode": "vip",
            "Ports": [
              {
                "Protocol": "tcp",
                "TargetPort": 80,
                "PublishedPort": 8888,
                "PublishMode": "ingress"
              }
            ]
          },
          "Ports": [
            {
              "Protocol": "tcp",
              "TargetPort": 80,
              "PublishedPort": 8888,
              "PublishMode": "ingress"
            }
          ],
          "VirtualIPs": [
            {
              "NetworkID": "r6t241jfgja5jndllate9em8s",
              "Addr": "10.255.0.6/16"
            }
          ]
        },
        "UpdateStatus": {
          "StartedAt": "0001-01-01T00:00:00Z",
          "CompletedAt": "0001-01-01T00:00:00Z"
        }
      }
    ]

  • List all running services

    docker service ls
    ID            NAME     MODE        REPLICAS  IMAGE
    ljlbfk4122sy  mynginx  replicated  1/1       nginx:latest

  • Use the update command to increase the number of nginx replicas

image.png

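  • On any manager node, check how the nginx replicas are distributed

image.png
(Manager node 1)
image.png
(Manager node 2)

  • Access test:

image.png
(The screenshot above accesses port 8888 on a machine that is not running the custom nginx service)
image.png
(The screenshot above shows that this machine runs no nginx container, yet the service is still reachable because the machine is part of the same swarm cluster)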

  • Scale dynamically again:

    docker service update --replicas 10 mynginx

  • Check the containers running on each manager node again

image.png

  • Scale down

image.png
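
A minimal sketch of the commands behind the steps above. The service name, image, and published port are taken from the docker service inspect mynginx output (mynginx, nginx, 8888 to 80); the function names are illustrative only.

```shell
# Create the service with one replica, publishing node port 8888
# to container port 80.
create_mynginx() {
    docker service create --name mynginx -p 8888:80 nginx
}

# List the tasks (containers) of the service and the nodes they run on.
show_mynginx_tasks() {
    docker service ps mynginx
}

# Set the replica count; a larger number scales up, a smaller one scales down.
# $1: desired number of replicas
set_mynginx_replicas() {
    docker service update --replicas "$1" mynginx
}

# Usage (on a manager node):
#   create_mynginx
#   set_mynginx_replicas 3
#   show_mynginx_tasks
```

Because the port is published through the ingress routing mesh (the "PublishMode": "ingress" entry in the inspect output), port 8888 answers on every node of the swarm, including nodes that run no nginx task. This is what the access test demonstrates.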

docker service scale command usage

    docker service scale --help

    Usage:  docker service scale SERVICE=REPLICAS [SERVICE=REPLICAS...]

    Scale one or multiple replicated services

    Options:
          --help   Print usage

  • Example:
    • Scale mynginx to 5 replicas

image.png
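
The scale example above can be sketched as a small wrapper that validates the replica count before building the SERVICE=REPLICAS argument. `scale_service` is an illustrative name, not a Docker command.

```shell
# Scale a replicated service to an exact number of replicas.
# $1: service name, $2: desired replica count
scale_service() {
    case "$2" in
        ''|*[!0-9]*)
            echo "replicas must be a non-negative integer" >&2
            return 1
            ;;
    esac
    docker service scale "$1=$2"
}

# Usage (on a manager node):
#   scale_service mynginx 5
```

For a single service this has the same effect as docker service update --replicas; scale is simply shorter and accepts several SERVICE=REPLICAS pairs in one call.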

docker service rm command usage

    docker service rm --help

    Usage:  docker service rm SERVICE [SERVICE...]

    Remove one or more services

    Aliases:
      rm, remove

    Options:
          --help   Print usage

  • Example:
    • Remove the mynginx service

image.png
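
As a sketch, removing a service and confirming that it is gone could look like this. It assumes the `-q` and `--filter name=` options of docker service ls; `remove_service` is an illustrative name.

```shell
# Remove a service, then check that it no longer appears in the listing.
# $1: service name
remove_service() {
    docker service rm "$1" || return 1
    # The name filter matches by prefix, so another service whose name
    # starts with "$1" would also show up here.
    if [ -z "$(docker service ls -q --filter "name=$1")" ]; then
        echo "removed $1"
    else
        echo "$1 still present"
    fi
}

# Usage (on a manager node):
#   remove_service mynginx
```

Removing the service also stops and removes all of its tasks on every node, so no per-node cleanup is needed.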

Differences from docker run

  • docker run has no scaling capability
  • docker service supports scaling, rolling updates, and canary (gray) releases
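
To make the rolling-update point concrete, here is a hedged sketch that uses the `--update-parallelism` and `--update-delay` options together with `--image` on docker service update. The helper name and the image tag in the usage comment are hypothetical.

```shell
# Roll a service over to a new image one task at a time,
# waiting 10 seconds between tasks.
# $1: service name, $2: new image reference
rolling_update() {
    service=$1
    image=$2
    docker service update \
        --image "$image" \
        --update-parallelism 1 \
        --update-delay 10s \
        "$service"
}

# Usage (on a manager node; the tag is only an example):
#   rolling_update mynginx nginx:1.25
```

With a parallelism of 1, only one replica is replaced at any moment, so the remaining replicas keep serving traffic during the update. A plain docker run container has no equivalent mechanism.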

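Concept summary

  • swarm
    • Cluster management and orchestration. Docker can initialize a swarm cluster, and other nodes can join it (as manager or worker)
  • node
    • A single Docker node. Multiple nodes together form a networked cluster (manager, worker)
  • service
    • The job to be run, which can be placed on manager or worker nodes. This is the core concept
  • Task
    • The command run inside a container, the fine-grained unit of work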
