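Deploying an ECS cluster service on AWS
- ECS is Amazon's own container cluster management service.
- The budget was low, so the Fargate launch type was chosen.
- Images are published to ECR.
- CI/CD on AWS is built around CodeDeploy:
  - CodeCommit: source code hosting.
  - CodeBuild: compiles the code into the base image to produce the image to be released.
  - CodeDeploy: deploys the code.
  - CodePipeline: the pipeline that chains the three steps above and automatically releases code to ECS. This article uses blue/green deployment.
- The Chinese AWS documentation appears to be machine translated; it is hard to read and can be misleading. Read the English documentation directly, or search Google for solutions.
- Services used:
  - ECS (container service)
  - EC2 (load balancing, configured in the EC2 console)
  - ECR (image management)
  - CodeDeploy (automated release)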
Getting started:
- How to release code with CodeDeploy:
  - docker tag centos-6.8:latest 785247703.dkr.ecr.ap-southeast-1.amazonaws.com/centos-6.8:latest
  - $(aws ecr get-login --no-include-email --region ap-southeast-1). If this fails, open Keychain Access, lock the login keychain, then unlock it again; the password is your Mac password.
  - docker push 785247703.dkr.ecr.ap-southeast-1.amazonaws.com/centos-6.8:latest
  - Write the task definition JSON under /www/
  - aws ecs register-task-definition --cli-input-json file://taskdef.json
  - Problem 1: An error occurred (AccessDeniedException) when calling the RegisterTaskDefinition operation: User: arn:aws:iam::785247703:user/MAC-User-cli is not authorized to perform: ecs:RegisterTaskDefinition on resource: *
  - Fix for problem 1: create a new policy from the following and attach it to the MAC-User-cli user:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecs:RegisterTaskDefinition",
                    "ecs:ListTaskDefinitions",
                    "ecs:DescribeTaskDefinition"
                ],
                "Resource": [
                    "*"
                ]
            }
        ]
    }

  - Problem 2: An error occurred (AccessDeniedException) when calling the RegisterTaskDefinition operation: User: arn:aws:iam::785247703:user/MAC-User-cli is not authorized to perform: iam:PassRole on resource: arn:aws:iam::785247703:role/ecsTaskExecutionRole
  - Fix for problem 2: create another policy with a new name from the following and attach it to the MAC-User-cli user. Replace <account-id> with your actual account ID, and the role with the one that needs to be passed:

    {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole"
        }]
    }

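The PassRole fix can also be applied from the command line instead of the IAM console. The following is a minimal sketch, not the author's procedure: the function names, the policy name `ecs-pass-execution-role`, and the file name `passrole-policy.json` are illustrative assumptions, and the caller needs credentials that are allowed to manage IAM.

```shell
#!/bin/sh
# Sketch: create the iam:PassRole policy from the notes and attach it to the CLI user.
# Policy name, file name, and function names are illustrative assumptions.

# Write the policy document for the given account ID.
write_passrole_policy() {
  account_id="$1"
  out_file="$2"
  cat > "$out_file" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["iam:GetRole", "iam:PassRole"],
    "Resource": "arn:aws:iam::${account_id}:role/ecsTaskExecutionRole"
  }]
}
EOF
}

# Create the managed policy, then attach it to the user.
attach_passrole_policy() {
  account_id="$1"
  user_name="$2"
  policy_name="$3"
  write_passrole_policy "$account_id" passrole-policy.json || return 1
  aws iam create-policy \
    --policy-name "$policy_name" \
    --policy-document file://passrole-policy.json || return 1
  aws iam attach-user-policy \
    --user-name "$user_name" \
    --policy-arn "arn:aws:iam::${account_id}:policy/${policy_name}"
}

# Usage:
#   attach_passrole_policy <account-id> MAC-User-cli ecs-pass-execution-role
```

Scoping `Resource` to the single execution role, as the notes do, keeps the user from passing arbitrary roles to ECS.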
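- If you do not know how to obtain CodeCommit credentials, see: https://console.aws.amazon.com/iam/home?region=ap-southeast-1#/users/MAC-User-cli?section=security_credentials
- It later turned out that port 8080 had not been opened; once it was opened, the health check passed.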
Problems encountered
- Status reason: CannotPullContainerError: Error response from daemon: pull access denied for centos-6.8, repository does not exist or may require 'docker login'
  - Root cause: IMAGES_NAME in the first task definition was wrong. It must be the full image path, including the tag.
- After deleting a cluster:
  - aws ecs create-service --service-name hqf-service-a --cli-input-json file://create-service.json
  - Modification is not allowed, so simply create a new service.
  - aws ecs update-service --cluster hqf-cluster --service hqf-service-new2 --task-definition ecs-hqf-family:2
- ELB health check: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/ts-elb-healthcheck.html
- Create a Pipeline with an Amazon ECR Source and ECS-to-CodeDeploy Deployment for web
- why aws ecs after codedeploy the instance is unhealthy
- Task definition:
  - https://docs.aws.amazon.com/zh_cn/AmazonECS/latest/developerguide/ECS_AWSCLI_Fargate.html#AWSCLI_register_task_definition
  - https://aws.amazon.com/cn/blogs/devops/use-aws-codedeploy-to-implement-blue-green-deployments-for-aws-fargate-and-amazon-ecs/
- CodeBuild:
  - During the build, CodeBuild reported: BUILD_CONTAINER_UNABLE_TO_PULL_IMAGE; Unable to pull customer's container image
  - Choosing the build environment:
    - https://stackoverflow.com/questions/41072342/codebuild-aws-command-not-found-when-ran
  - No runtime version had been selected. Note that buildspec.yml must specify the Docker runtime version in the install phase.
Summary: a walkthrough of the AWS CodePipeline workflow, fully automating CodeBuild and CodeDeploy
- Push the base Docker image to ECR:
  - Open ECR and create a repository for the image: https://ap-southeast-1.console.aws.amazon.com/ecr/repositories?region=ap-southeast-1
  - docker tag {imageName}:latest 785247703.dkr.ecr.ap-southeast-1.amazonaws.com/{imageName}:latest
  - $(aws ecr get-login --no-include-email --region ap-southeast-1)
  - docker push 785247703.dkr.ecr.ap-southeast-1.amazonaws.com/{imageName}:latest
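The tag, login, and push steps can be wrapped in a small script. This is a sketch under stated assumptions: the function names are made up for illustration, and the login step uses `aws ecr get-login-password` rather than the `get-login` command shown in the notes, because AWS CLI v2 does not include `ecr get-login`.

```shell
#!/bin/sh
# Sketch: tag a local image and push it to ECR.
# Function names are illustrative; account ID, region, image, and tag are parameters.

# Print the registry host for an account and region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Print the full image URI including the tag, the form ECS needs in a task definition.
ecr_image_uri() {
  printf '%s/%s:%s\n' "$(ecr_registry "$1" "$2")" "$3" "$4"
}

# Log in, tag, and push. Needs the aws and docker CLIs plus valid credentials.
push_to_ecr() {
  account_id="$1"; region="$2"; image="$3"; tag="$4"
  registry="$(ecr_registry "$account_id" "$region")"
  uri="$(ecr_image_uri "$account_id" "$region" "$image" "$tag")"
  # get-login-password prints a token that docker login reads from stdin.
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "$registry" || return 1
  docker tag "${image}:${tag}" "$uri" || return 1
  docker push "$uri"
}

# Usage:
#   push_to_ecr <account-id> ap-southeast-1 centos-6.8 latest
```

Building the URI in one place also helps avoid the CannotPullContainerError described earlier, which came from an image reference without the registry path and tag.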
- Prerequisite: create a CloudWatch Logs log group to enable logging (skip this if one already exists; it is only needed for services that record logs):
  - aws logs create-log-group --log-group-name awslogs-hqf-group --region ap-southeast-1
- First open CodeCommit: https://ap-southeast-1.console.aws.amazon.com/codesuite/codecommit/repositories?region=ap-southeast-1 (used to store the base task definitions)
  - Create a repository, e.g. {hqf}.
  - Add taskdef.first.json, taskdef.json, and appspec.yaml.
  - taskdef.json defines the ECS task to release; appspec.yaml defines the ECS service; taskdef.first.json is used to create the very first task and differs from taskdef.json only in the image value.
  - The repository also contains the code of a working web project.
  - In taskdef.json, the values to adjust are name, image, and family.
  - In appspec.yaml, the value to adjust is REPOSITORY_URI.
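For orientation, here is a sketch that writes a minimal appspec.yaml for an ECS blue/green deployment. It follows the layout of the AWS CodePipeline tutorial listed in the references, which is an assumption about what the author's file looks like; the function name and the example container name and port are illustrative.

```shell
#!/bin/sh
# Sketch: write a minimal appspec.yaml for an ECS blue/green deployment.
# Container name and port are parameters; the function name is illustrative.
write_appspec() {
  container_name="$1"
  container_port="$2"
  out_file="$3"
  # <TASK_DEFINITION> stays a literal placeholder; in the AWS tutorial the
  # pipeline's deploy action replaces it with the registered task definition.
  cat > "$out_file" <<EOF
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "${container_name}"
          ContainerPort: ${container_port}
EOF
}

# Usage:
#   write_appspec web 8080 appspec.yaml
```

The container name and port must match the ones in the task definition, otherwise the load balancer cannot route to the new task set.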
- Register an initial task:
  - aws ecs register-task-definition --cli-input-json file://taskdef.first.json (put the returned revision into the first line of create-service.json)
  - This initializes the first task, which in effect creates a task family.
  - Then replace the image value in taskdef.first.json with a placeholder and save the file as taskdef.json (this file name has to be configured in the CodeDeploy stage).
  - After editing taskdef.json, commit it to CodeCommit.
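Deriving taskdef.json from taskdef.first.json can be scripted. The sketch below assumes the placeholder is `<IMAGE1_NAME>`, which is the name used in the AWS CodePipeline tutorial; the notes do not say which placeholder the author used, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: copy taskdef.first.json to taskdef.json, swapping the image value
# for a placeholder. <IMAGE1_NAME> is an assumption taken from the AWS tutorial.
make_pipeline_taskdef() {
  src_file="$1"
  dst_file="$2"
  # Replace whatever sits between the quotes after "image": with the placeholder.
  sed -E 's|("image"[[:space:]]*:[[:space:]]*")[^"]*"|\1<IMAGE1_NAME>"|' \
    "$src_file" > "$dst_file"
}

# Usage:
#   make_pipeline_taskdef taskdef.first.json taskdef.json
```

Everything except the image value is copied unchanged, which matches the note that the two files differ only in `image`.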
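- Create the load balancer:
  - Open Load Balancers (inside the EC2 console): https://ap-southeast-1.console.aws.amazon.com/ec2/v2/home?region=ap-southeast-1#LoadBalancers:sort=loadBalancerName
  - Create two target groups (used for blue/green releases, so the service stays available throughout).
  - Configure a health check for both target groups, e.g. curl -I 'http://127.0.0.1/health-check/' must return 200; only then is the deployment treated as successful.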
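The two target groups can also be created with the CLI. This is a sketch, assuming a `-blue` / `-green` naming scheme and function names that do not come from the notes; the VPC ID, port, and health-check path are parameters.

```shell
#!/bin/sh
# Sketch: create the two target groups used for blue/green traffic shifting.
# The naming scheme and function names are illustrative assumptions.
create_target_group() {
  tg_name="$1"; vpc_id="$2"; port="$3"; health_path="$4"
  # Fargate tasks use awsvpc networking, so targets are registered by IP.
  aws elbv2 create-target-group \
    --name "$tg_name" \
    --protocol HTTP \
    --port "$port" \
    --vpc-id "$vpc_id" \
    --target-type ip \
    --health-check-protocol HTTP \
    --health-check-path "$health_path" \
    --matcher HttpCode=200
}

create_blue_green_target_groups() {
  prefix="$1"; vpc_id="$2"; port="$3"; health_path="$4"
  create_target_group "${prefix}-blue" "$vpc_id" "$port" "$health_path" || return 1
  create_target_group "${prefix}-green" "$vpc_id" "$port" "$health_path"
}

# Usage:
#   create_blue_green_target_groups hqf <vpc-id> 8080 /health-check/
```

Both groups get the same health check, so either one can take production traffic after a deployment.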
- Create the ECS cluster:
  - Open ECS clusters: https://ap-southeast-1.console.aws.amazon.com/ecs/home?region=ap-southeast-1#/clusters
  - Add create-service.json, used to create the service in the cluster.
  - In create-service.json, adjust: taskDefinition, cluster, targetGroupArn, containerName, containerPort
  - aws ecs create-service --service-name {service-name} --cli-input-json file://create-service.json
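A create-service.json with the fields named above might look like the output of the following sketch. The overall shape follows the AWS tutorial in the references; the function name, `desiredCount` of 1, and `assignPublicIp` of `ENABLED` are assumptions, and every argument is a placeholder for your own resources.

```shell
#!/bin/sh
# Sketch: generate create-service.json for a Fargate service managed by CodeDeploy.
# Function name and default values are illustrative assumptions.
write_create_service_json() {
  task_def="$1"          # family:revision returned by register-task-definition
  cluster="$2"
  target_group_arn="$3"
  container_name="$4"
  container_port="$5"
  subnet_id="$6"
  security_group_id="$7"
  out_file="$8"
  cat > "$out_file" <<EOF
{
  "taskDefinition": "${task_def}",
  "cluster": "${cluster}",
  "loadBalancers": [
    {
      "targetGroupArn": "${target_group_arn}",
      "containerName": "${container_name}",
      "containerPort": ${container_port}
    }
  ],
  "desiredCount": 1,
  "launchType": "FARGATE",
  "schedulingStrategy": "REPLICA",
  "deploymentController": { "type": "CODE_DEPLOY" },
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["${subnet_id}"],
      "securityGroups": ["${security_group_id}"],
      "assignPublicIp": "ENABLED"
    }
  }
}
EOF
}

# Usage:
#   write_create_service_json ecs-hqf-family:1 hqf-cluster <target-group-arn> \
#     web 8080 <subnet-id> <security-group-id> create-service.json
```

The `deploymentController` type `CODE_DEPLOY` is what hands deployments over to CodeDeploy for blue/green releases.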
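- Create CodeBuild (depending on your situation, the CodeBuild step may not be needed):
  - Create buildspec.yml, put the build steps in it, and upload it to CodeCommit.
  - For the environment, choose a base image with Docker on Ubuntu to run docker build.
  - Create a new service role.
  - In the ECR repository's Permissions section, grant that service role access to ECR.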
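As a starting point, the sketch below writes a small buildspec.yml and includes a helper that checks the file for tab characters. It rests on several assumptions: `REPOSITORY_URI` is set as a CodeBuild environment variable, the `docker: 18` runtime entry applies to the older Ubuntu standard images that required a runtime version (newer images may not need or accept it), and the function names are illustrative.

```shell
#!/bin/sh
# Sketch: write a buildspec.yml that builds an image and pushes it to ECR.
# REPOSITORY_URI is assumed to be a CodeBuild environment variable.
write_buildspec() {
  out_file="$1"
  # Quoted delimiter: $VARS stay literal so CodeBuild expands them at build time.
  cat > "$out_file" <<'EOF'
version: 0.2
phases:
  install:
    runtime-versions:
      docker: 18
  pre_build:
    commands:
      - REGISTRY="${REPOSITORY_URI%%/*}"
      - aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS --password-stdin "$REGISTRY"
  build:
    commands:
      - docker build -t "$REPOSITORY_URI:latest" .
  post_build:
    commands:
      - docker push "$REPOSITORY_URI:latest"
EOF
}

# Succeed only if the file contains no tab characters,
# since YAML indentation must use spaces.
has_no_tabs() {
  ! grep -q "$(printf '\t')" "$1"
}

# Usage:
#   write_buildspec buildspec.yml && has_no_tabs buildspec.yml
```

Running the tab check before committing catches the most common cause of buildspec syntax errors.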
- Create CodeDeploy:
  - Open: https://ap-southeast-1.console.aws.amazon.com/codesuite/codedeploy/applications?region=ap-southeast-1
  - Create an application with the deployment name hqf-deploy and fill in the corresponding settings.
  - Then create a deployment group; this involves configuring service_name.
- Add CodePipeline:
  - Open: https://ap-southeast-1.console.aws.amazon.com/codesuite/codepipeline/pipelines?region=ap-southeast-1
  - Combine CodeCommit, CodeBuild, and CodeDeploy from the steps above, plus an ECR source, into CodePipeline.
  - The result: CodePipeline runs automatically whenever code is committed.
- Bind a domain name to allow external access:
  - First open Load Balancers: https://ap-southeast-1.console.aws.amazon.com/ec2/v2/home?region=ap-southeast-1#LoadBalancers:sort=loadBalancerName
  - Copy the load balancer's DNS name and set it as a CNAME record in your DNS provider (e.g. DNSPod). Note: pinging or visiting the DNS name directly does not work.
  - Open Security Groups: https://ap-southeast-1.console.aws.amazon.com/ec2/v2/home?region=ap-southeast-1#SecurityGroups:sort=groupId
  - Select your security group, choose Inbound, choose Edit, and add a rule that allows HTTP access (leave the other values at their defaults). External access is then allowed.
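The inbound HTTP rule can be added from the CLI as well. A minimal sketch, assuming the rule should be open to all IPv4 addresses (`0.0.0.0/0`) as the console default would be; the function name is illustrative and the security group ID is a parameter.

```shell
#!/bin/sh
# Sketch: allow inbound HTTP on a security group.
# Opening to 0.0.0.0/0 is an assumption; narrow the CIDR if you can.
open_http_ingress() {
  group_id="$1"
  port="${2:-80}"   # defaults to the standard HTTP port
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp \
    --port "$port" \
    --cidr 0.0.0.0/0
}

# Usage:
#   open_http_ingress <security-group-id>
#   open_http_ingress <security-group-id> 8080
```

The second form covers the earlier case where the health check failed because port 8080 was closed.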
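- Some issues:
  - If buildspec.yml is reported as having a syntax error, check whether a tab was typed where spaces are required.
  - For CodeDeploy logs, look at the events of the ECS service inside the cluster; failures are usually caused by the health check not passing.
  - Many problems are permission problems; when you hit one, grant the permission in IAM: https://console.aws.amazon.com/iam/home#/users
  - Troubleshooting: https://docs.aws.amazon.com/codebuild/latest/userguide/troubleshooting.html#troubleshooting-build-must-specify-runtime
  - Exception while trying to read the task definition artifact file from: SourceArtifact
    - Suspected cause: some files copied out of git had conflicts, so taskdef.json could not be read.
    - Fix: moving taskdef.json and appspec.yaml into a separate git repository resolved it.
  - Requests return 502: one possible cause is that the target group is no longer associated with the expected load balancer; fix this in the load balancer configuration.
  - Each time a service is deleted and recreated, define the first task again, put its revision into the first line of create-service.json, and then create the service.
  - If the service name changed, update it on the CodeDeploy side as well.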
- References:
  - AWS guide for deploying from ECR to ECS: https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-ecs-ecr-codedeploy.html#tutorials-ecs-ecr-codedeploy-loadbal
  - Main CodeBuild reference: https://aws.amazon.com/cn/blogs/devops/build-a-continuous-delivery-pipeline-for-your-container-images-with-amazon-ecr-as-source/
  - Load balancer setup: https://docs.aws.amazon.com/AmazonECS/latest/userguide/create-application-load-balancer.html
  - How to send nginx logs to stdout and stderr: https://docs.docker.com/config/containers/logging/
  - How to set up CloudWatch Logs (awslogs): https://docs.aws.amazon.com/AmazonECS/latest/userguide/using_awslogs.html#specify-log-config
- Quick reference for changing the task definition:
  - aws ecs register-task-definition --cli-input-json file://taskdef.first.json
  - Copy the "revision" value returned by the previous step (e.g. 25) into create-service.json: ecs-hqf-family:25
  - aws ecs create-service --service-name {new-service-name} --cli-input-json file://create-service.json
  - Open the CodeDeploy deployment group, choose Edit, and enter new-service-name in the deployment settings: https://ap-southeast-1.console.aws.amazon.com/codesuite/codedeploy/applications/hqf-deploy/deployment-groups/hqf-deploy-group?region=ap-southeast-1
  - Cleanup: in ECS, delete the old-service-name service.
  - If you get "does not have an associated load balancer", go to Load Balancers in the EC2 console, select the load balancer, select Listeners, open View/edit rules, and add the target group to one of the listeners (or edit an existing rule).
  - Afterwards, to view logs: go to ECS -> Tasks -> Logs.
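The quick-reference steps above can be sketched as a script. This is not the author's tooling: the function names are illustrative, and it relies on the AWS CLI behaviour that explicit command-line arguments override values from `--cli-input-json`, so the revision does not have to be pasted into create-service.json by hand.

```shell
#!/bin/sh
# Sketch: register a new revision, create a new service on it,
# repoint CodeDeploy, and remove the old service.
# Function names are illustrative assumptions.

# Register taskdef.first.json and create a service on the returned revision.
rotate_service() {
  cluster="$1"; family="$2"; new_service="$3"
  revision="$(aws ecs register-task-definition \
    --cli-input-json file://taskdef.first.json \
    --query 'taskDefinition.revision' --output text)" || return 1
  # Explicit arguments take precedence over the values in the JSON file.
  aws ecs create-service \
    --cluster "$cluster" \
    --service-name "$new_service" \
    --task-definition "${family}:${revision}" \
    --cli-input-json file://create-service.json
}

# Point the CodeDeploy deployment group at the new service.
point_codedeploy_at_service() {
  app="$1"; group="$2"; cluster="$3"; service="$4"
  aws deploy update-deployment-group \
    --application-name "$app" \
    --current-deployment-group-name "$group" \
    --ecs-services "serviceName=${service},clusterName=${cluster}"
}

# Delete the old service, stopping its running tasks.
delete_old_service() {
  aws ecs delete-service --cluster "$1" --service "$2" --force
}

# Usage:
#   rotate_service hqf-cluster ecs-hqf-family <new-service-name>
#   point_codedeploy_at_service hqf-deploy hqf-deploy-group hqf-cluster <new-service-name>
#   delete_old_service hqf-cluster <old-service-name>
```

Keeping the three steps as separate functions makes it possible to verify the new service before the old one is deleted.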
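- For COPY in a Dockerfile, make sure the file does not already exist at the original location, otherwise COPY may fail.
- Command-line tool for quick releases: https://github.com/silinternational/ecs-deploy
  - Advantage: it only handles the release and reuses the existing task, which is very convenient for debugging.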
- Handy commands:
  - View a task definition:
    - aws ecs describe-task-definition --task-definition ecs-hqf-family:15
  - To change a task definition, deregister it (https://ap-southeast-1.console.aws.amazon.com/ecs/home?region=ap-southeast-1#/taskDefinitions), then define it again from the command line and commit it to CodeCommit.
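Two small helpers for the commands above, as a sketch. The function names are illustrative; the `--query` expression narrows the describe output to the image references, which is usually what needs checking after a failed pull.

```shell
#!/bin/sh
# Sketch: inspect and deregister task definition revisions.
# Function names are illustrative assumptions.

# Print only the container image(s) of a task definition revision.
show_task_images() {
  aws ecs describe-task-definition \
    --task-definition "$1" \
    --query 'taskDefinition.containerDefinitions[].image' \
    --output text
}

# Deregister a revision from the CLI instead of the console.
retire_task_definition() {
  aws ecs deregister-task-definition --task-definition "$1"
}

# Usage:
#   show_task_images ecs-hqf-family:15
#   retire_task_definition ecs-hqf-family:15
```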
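- In practice:
  - Data had to be imported and no GUI import tool was found, so an Adminer service (a Docker-based GUI tool for managing data) was created.
- Free tier:
  - ELB offers new users 750 free hours per month.
- Follow-up questions:
  - How to roll back?
  - How to tag the 3 most recent images and keep only the latest 5 (or only the latest tag)?
  - How to view development logs?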