Terraform Basic Concepts: State Management

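As mentioned at the end of Chapter 1, once a terraform apply has succeeded and created the desired infrastructure, running terraform apply again produces a plan with no changes. Terraform remembers the current state of the infrastructure and compares it with the desired state described in code. On the second apply the two already match, so the plan is empty.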

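A First Look at the State File

Here Terraform introduces a distinctive concept, state management, which configuration management tools such as Ansible, or home-grown tools that call cloud SDKs directly, do not have. Put simply, Terraform saves the state of the infrastructure to a state file every time it applies a change; by default this is the terraform.tfstate file in the current working directory. For example, suppose our code declares a few variables and resources: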

variable "access_key" {
  default = "xxx"
}
variable "secret_key" {
  default = "xxx"
}
variable "region" {
  default = "cn-beijing"
}
resource "alicloud_vpc" "vpc" {
  name       = "testvpc"
  cidr_block = "172.16.0.0/12"
}
resource "alicloud_vswitch" "vsw" {
  vpc_id            = alicloud_vpc.vpc.id
  cidr_block        = "172.16.0.0/21"
  availability_zone = "cn-beijing-b"
}

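After terraform apply, terraform.tfstate looks like this (the output comes from the full configuration listed later in this chapter, which also declares a security group, a security group rule and an instance):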

{
  "version": 4,
  "terraform_version": "1.0.5",
  "serial": 42,
  "lineage": "f89d321c-b8ed-86b5-7229-8ec85665af7d",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "alicloud_instance",
      "name": "instance",
      "provider": "provider[\"registry.terraform.io/hashicorp/alicloud\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "allocate_public_ip": null,
            "auto_release_time": "",
            "auto_renew_period": null,
            "availability_zone": "cn-beijing-b",
            "credit_specification": "",
            "data_disks": [],
            "deletion_protection": false,
            "description": "",
            "dry_run": false,
            "force_delete": null,
            "host_name": "iZ2ze4jen9kd2w5zupzbgsZ",
            "id": "i-2ze4jen9kd2w5zupzbgs",
            "image_id": "ubuntu_140405_64_40G_cloudinit_20161115.vhd",
            "include_data_disks": null,
            "instance_charge_type": "PostPaid",
            "instance_name": "test_foo",
            "instance_type": "ecs.n2.small",
            "internet_charge_type": "PayByTraffic",
            "internet_max_bandwidth_in": -1,
            "internet_max_bandwidth_out": 10,
            "io_optimized": null,
            "is_outdated": null,
            "key_name": "",
            "kms_encrypted_password": null,
            "kms_encryption_context": null,
            "password": "",
            "period": null,
            "period_unit": null,
            "private_ip": "172.16.7.38",
            "public_ip": "123.57.162.225",
            "renewal_status": null,
            "resource_group_id": "",
            "role_name": null,
            "security_enhancement_strategy": null,
            "security_groups": [
              "sg-2zefrn59xpib8t3ki3u1"
            ],
            "spot_price_limit": 0,
            "spot_strategy": "NoSpot",
            "status": "Running",
            "subnet_id": "vsw-2ze8olgzjemroh6xo73ol",
            "system_disk_auto_snapshot_policy_id": "",
            "system_disk_category": "cloud_efficiency",
            "system_disk_description": null,
            "system_disk_name": null,
            "system_disk_performance_level": "",
            "system_disk_size": 40,
            "tags": null,
            "timeouts": null,
            "user_data": null,
            "volume_tags": {},
            "vswitch_id": "vsw-2ze8olgzjemroh6xo73ol"
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjo2MDAwMDAwMDAwMDAsImRlbGV0ZSI6MTIwMDAwMDAwMDAwMCwidXBkYXRlIjo2MDAwMDAwMDAwMDB9fQ==",
          "dependencies": [
            "alicloud_security_group.default",
            "alicloud_vpc.vpc",
            "alicloud_vswitch.vsw"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "alicloud_security_group",
      "name": "default",
      "provider": "provider[\"registry.terraform.io/hashicorp/alicloud\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "description": "",
            "id": "sg-2zefrn59xpib8t3ki3u1",
            "inner_access": true,
            "inner_access_policy": "Accept",
            "name": "default",
            "resource_group_id": "",
            "security_group_type": "normal",
            "tags": null,
            "vpc_id": "vpc-2zeqe39hzudeb8sxo6ny1"
          },
          "sensitive_attributes": [],
          "private": "bnVsbA==",
          "dependencies": [
            "alicloud_vpc.vpc"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "alicloud_security_group_rule",
      "name": "allow_all_tcp",
      "provider": "provider[\"registry.terraform.io/hashicorp/alicloud\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "cidr_ip": "0.0.0.0/0",
            "description": "",
            "id": "sg-2zefrn59xpib8t3ki3u1:ingress:tcp:1/65535:intranet:0.0.0.0/0:accept:1",
            "ip_protocol": "tcp",
            "nic_type": "intranet",
            "policy": "accept",
            "port_range": "1/65535",
            "priority": 1,
            "security_group_id": "sg-2zefrn59xpib8t3ki3u1",
            "source_group_owner_account": "",
            "source_security_group_id": "",
            "type": "ingress"
          },
          "sensitive_attributes": [],
          "private": "bnVsbA==",
          "dependencies": [
            "alicloud_security_group.default",
            "alicloud_vpc.vpc"
          ]
        }
      ]
    },
    {
      "mode": "managed",
      "type": "alicloud_vpc",
      "name": "vpc",
      "provider": "provider[\"registry.terraform.io/hashicorp/alicloud\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "cidr_block": "172.16.0.0/12",
            "description": "",
            "dry_run": null,
            "enable_ipv6": null,
            "id": "vpc-2zeqe39hzudeb8sxo6ny1",
            "ipv6_cidr_block": "",
            "name": "testvpc",
            "resource_group_id": null,
            "route_table_id": "vtb-2zejbi6p615045sc22i7v",
            "router_id": "vrt-2ze7b1l1bu7cijpqw53ui",
            "router_table_id": "vtb-2zejbi6p615045sc22i7v",
            "secondary_cidr_blocks": null,
            "status": "Available",
            "tags": null,
            "timeouts": null,
            "user_cidrs": null,
            "vpc_name": "testvpc"
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjo2MDAwMDAwMDAwMDAsImRlbGV0ZSI6NjAwMDAwMDAwMDAwfX0="
        }
      ]
    },
    {
      "mode": "managed",
      "type": "alicloud_vswitch",
      "name": "vsw",
      "provider": "provider[\"registry.terraform.io/hashicorp/alicloud\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "availability_zone": "cn-beijing-b",
            "cidr_block": "172.16.0.0/21",
            "description": "",
            "id": "vsw-2ze8olgzjemroh6xo73ol",
            "name": "",
            "status": "Available",
            "tags": null,
            "timeouts": null,
            "vpc_id": "vpc-2zeqe39hzudeb8sxo6ny1",
            "vswitch_name": "",
            "zone_id": "cn-beijing-b"
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjo2MDAwMDAwMDAwMDAsImRlbGV0ZSI6NjAwMDAwMDAwMDAwfX0=",
          "dependencies": [
            "alicloud_vpc.vpc"
          ]
        }
      ]
    }
  ]
}

As we can see, the data sources that were queried and the resources that were created are all saved in the tfstate file as JSON.
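As a rough illustration of that layout, the following Python sketch walks a trimmed-down state document and lists each resource address with its id. It assumes the version 4 structure shown above (resources, instances, attributes), which is not a stable public contract; the helper name list_resources and the embedded sample are illustrative only. It only reads the document and never writes to it.

```python
import json

# Trimmed-down sample in the same shape as the state file above (version 4).
SAMPLE_STATE = """
{
  "version": 4,
  "serial": 42,
  "resources": [
    {
      "mode": "managed",
      "type": "alicloud_vpc",
      "name": "vpc",
      "instances": [{"attributes": {"id": "vpc-2zeqe39hzudeb8sxo6ny1"}}]
    },
    {
      "mode": "managed",
      "type": "alicloud_vswitch",
      "name": "vsw",
      "instances": [{"attributes": {"id": "vsw-2ze8olgzjemroh6xo73ol"}}]
    }
  ]
}
"""


def list_resources(state_text):
    """Return (address, id) pairs for every resource instance in a state document."""
    state = json.loads(state_text)
    pairs = []
    for resource in state.get("resources", []):
        # Data sources are addressed with a "data." prefix; managed resources have none.
        prefix = "data." if resource.get("mode") == "data" else ""
        address = f"{prefix}{resource['type']}.{resource['name']}"
        for instance in resource.get("instances", []):
            pairs.append((address, instance.get("attributes", {}).get("id")))
    return pairs


if __name__ == "__main__":
    for address, resource_id in list_resources(SAMPLE_STATE):
        print(f"{address} = {resource_id}")
```

For real work, terraform state list and terraform show -json expose the same information through supported commands.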

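As noted earlier, because the tfstate file exists, running apply again right after a terraform apply changes nothing. So what happens if we delete the tfstate file and then apply? Terraform cannot read any state, concludes that this is the first time these resources are being created, and creates every resource described in the code all over again. Worse, because the state for the resources created the first time has been deleted, we can no longer destroy and reclaim them with terraform destroy, which is in effect a resource leak. Keeping the state file safe is therefore very important.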
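The effect of losing the state file can be sketched with a toy planner. This is not how Terraform is implemented; plan below is a hypothetical function that simply compares the addresses declared in code with the addresses recorded in state.

```python
def plan(desired, state):
    """Toy planner: compare declared resource addresses with those recorded in state.

    desired: set of addresses declared in code
    state:   dict mapping address -> real resource id (what the state file remembers)
    """
    to_create = sorted(addr for addr in desired if addr not in state)
    to_destroy = sorted(addr for addr in state if addr not in desired)
    return {"create": to_create, "destroy": to_destroy}


desired = {"alicloud_vpc.vpc", "alicloud_vswitch.vsw"}

# After the first apply the state records both resources, so the next plan is empty.
state_after_apply = {
    "alicloud_vpc.vpc": "vpc-2zeqe39hzudeb8sxo6ny1",
    "alicloud_vswitch.vsw": "vsw-2ze8olgzjemroh6xo73ol",
}
empty_plan = plan(desired, state_after_apply)

# With the state file deleted, everything looks new and would be created a second
# time, while the original VPC and vSwitch are no longer tracked by anything.
plan_without_state = plan(desired, {})

if __name__ == "__main__":
    print(empty_plan)
    print(plan_without_state)
```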

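In addition, if we change the Terraform code so that the resulting plan will alter the state, then before actually applying the change Terraform copies the current tfstate file to terraform.tfstate.backup in the same directory, as a safeguard against the tfstate being damaged by some accident.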
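The copy-then-write behaviour described above can be mimicked in a few lines. This is only a sketch of the idea, assuming a local JSON file; write_state_with_backup and demo are hypothetical helpers, not Terraform code.

```python
import json
import shutil
import tempfile
from pathlib import Path


def write_state_with_backup(state_path, new_state):
    """Copy the existing state file to <name>.backup, then write the new state."""
    state_path = Path(state_path)
    backup_path = state_path.with_name(state_path.name + ".backup")
    if state_path.exists():
        shutil.copy2(state_path, backup_path)  # keep the previous version
    state_path.write_text(json.dumps(new_state, indent=2))
    return backup_path


def demo():
    """Write two successive states into a temp directory and read both files back."""
    with tempfile.TemporaryDirectory() as workdir:
        state_file = Path(workdir) / "terraform.tfstate"
        write_state_with_backup(state_file, {"serial": 1})
        backup_file = write_state_with_backup(state_file, {"serial": 2})
        current = json.loads(state_file.read_text())
        previous = json.loads(backup_file.read_text())
        return current["serial"], previous["serial"]


if __name__ == "__main__":
    print(demo())
```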

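Very early in Terraform's development, HashiCorp experimented with a design that had no state file: when applying a plan, every resource involved was given a specific tag, and on the next run Terraform would read the resources back by tag to rebuild the state. But not every resource supports tags, and not every public cloud supports multiple tags, so Terraform ultimately settled on the state file.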

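One more point: HashiCorp has never officially published the tfstate format, which means it reserves the right to change the format at any time. Do not try to edit tfstate by hand or with your own code. The Terraform CLI provides commands for this (introduced later); make sure you only manipulate the state file through those commands.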

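State Management for Production: Backends

So far our tfstate has been a local file in the current working directory. If the computer fails and the file is lost, the resources it describes can no longer be managed, and we have a resource leak.

Also, if a team uses Terraform to manage one set of resources, how do its members share the state file? Could the tfstate file simply be checked into source control?

Checking tfstate into source control is a serious mistake; it is like checking a database into source control. If two people check out the same tfstate, make different changes to the code and apply at the same time, then checking tfstate back in can run into conflicts that cannot be resolved.

To solve the storage and sharing problem, Terraform provides a remote state storage mechanism called the Backend. A backend is an abstract remote storage interface and, like providers, backends are available for many different storage services:

Terraform Basic Concepts: State Management - Figure 1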

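Terraform remote backends come in two kinds:

  • Standard: supports remote state storage and state locking
  • Enhanced: adds remote operations on top of standard (plan, apply and so on run on a remote server)

At present the only enhanced backend is the Terraform Cloud service.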

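State locking means that while a change is being applied against a tfstate, a global lock can be placed on that state so that only one change runs at a time. Backends differ in whether they support locking and in how they implement it. The consul backend, for example, uses a .lock node as the lock and a .lockinfo node to describe the session holding it, with the tfstate itself stored under the path configured for the backend. The s3 backend needs the user to supply a DynamoDB table to hold the lock, while the tfstate is stored in an S3 bucket. The backend named etcd targets etcd v2 and does not support locking, whereas etcdv3 does. Readers can choose whichever backend suits their situation. Next, I will use Consul to demonstrate the backend mechanism.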
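The locking idea described above can be sketched against an in-memory key-value store. Everything here is a stand-in: InMemoryKV, lock_state and unlock_state are hypothetical names, and a real backend would rely on the atomic operations of its own storage service rather than on a Python dict. Only the .lock and .lockinfo key names are borrowed from the description of the consul backend.

```python
class LockError(Exception):
    """Raised when the state is already locked by someone else."""


class InMemoryKV:
    """Stand-in for a remote key-value store; not a real Consul client."""

    def __init__(self):
        self._data = {}

    def put_if_absent(self, key, value):
        """Store value only if key does not exist yet; return True on success."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def lock_state(kv, path, who):
    """Take the lock for the state stored at `path`, recording who holds it."""
    if not kv.put_if_absent(path + "/.lock", "locked"):
        raise LockError(f"state is locked by {kv.get(path + '/.lockinfo')}")
    kv.put_if_absent(path + "/.lockinfo", who)


def unlock_state(kv, path):
    """Release the lock and its description."""
    kv.delete(path + "/.lockinfo")
    kv.delete(path + "/.lock")


def demo():
    kv = InMemoryKV()
    lock_state(kv, "my-project", "alice")
    try:
        lock_state(kv, "my-project", "bob")  # refused while alice holds the lock
        second_attempt = "acquired"
    except LockError as err:
        second_attempt = str(err)
    unlock_state(kv, "my-project")
    lock_state(kv, "my-project", "bob")  # succeeds once alice has released
    return second_attempt, kv.get("my-project/.lockinfo")


if __name__ == "__main__":
    print(demo())
```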

Introducing and Installing Consul

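Consul is an open source tool from HashiCorp, mainly aimed at service discovery, centralized configuration and service mesh. Like ZooKeeper and etcd, Consul also offers a distributed key-value store. That store is replicated through the Raft consensus protocol (the gossip protocol is used for cluster membership), so it can serve as storage for a Terraform backend.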

Download

https://www.consul.io/downloads

Verify the installation:

➜ ./consul
Usage: consul [--version] [--help] <command> [<args>]

Available commands are:
    acl            Interact with Consul's ACLs
    agent          Runs a Consul agent
    catalog        Interact with the catalog
    config         Interact with Consul's Centralized Configurations
    connect        Interact with Consul Connect
    debug          Records a debugging archive for operators
    event          Fire a new event
    exec           Executes a command on Consul nodes
    force-leave    Forces a member of the cluster to enter the "left" state
    info           Provides debugging information for operators.
    intention      Interact with Connect service intentions
    join           Tell Consul agent to join cluster
    keygen         Generates a new encryption key
    keyring        Manages gossip layer encryption keys
    kv             Interact with the key-value store
    leave          Gracefully leaves the Consul cluster and shuts down
    lock           Execute a command holding a lock
    login          Login to Consul using an auth method
    logout         Destroy a Consul token created with login
    maint          Controls node or service maintenance mode
    members        Lists the members of a Consul cluster
    monitor        Stream logs from a Consul agent
    operator       Provides cluster-level tools for Consul operators
    reload         Triggers the agent to reload configuration files
    rtt            Estimates network round trip time between nodes
    services       Interact with services
    snapshot       Saves, restores and inspects snapshots of Consul server state
    tls            Builtin helpers for creating CAs and certificates
    validate       Validate config files/directories
    version        Prints the Consul version
    watch          Watch for changes in Consul

With Consul installed, we can start a Consul agent in development mode:

./consul agent -dev

Consul opens an HTTP endpoint on local port 8500, which we can visit in a browser at http://localhost:8500

Terraform Basic Concepts: State Management - Figure 2

Using a Backend

terraform {
  backend "consul" {
    address = "localhost:8500"
    scheme  = "http"
    path    = "my-ucloud-project"
  }
}


variable "access_key" {
  default = "xxx"
}
variable "secret_key" {
  default = "xxx"
}


provider "alicloud" {
  access_key = var.access_key
  secret_key = var.secret_key
  region     = "cn-beijing"
}


resource "alicloud_vpc" "vpc" {
  name       = "testvpc"
  cidr_block = "172.16.0.0/12"
}

resource "alicloud_vswitch" "vsw" {
  vpc_id            = alicloud_vpc.vpc.id
  cidr_block        = "172.16.0.0/21"
  availability_zone = "cn-beijing-b"
}


resource "alicloud_security_group" "default" {
  name   = "default"
  vpc_id = alicloud_vpc.vpc.id
}


resource "alicloud_instance" "instance" {
  # cn-beijing
  availability_zone = "cn-beijing-b"
  security_groups   = [alicloud_security_group.default.id]

  # series III
  instance_type              = "ecs.n2.small"
  system_disk_category       = "cloud_efficiency"
  image_id                   = "ubuntu_140405_64_40G_cloudinit_20161115.vhd"
  instance_name              = "test_foo"
  vswitch_id                 = alicloud_vswitch.vsw.id
  internet_max_bandwidth_out = 10
}


resource "alicloud_security_group_rule" "allow_all_tcp" {
  type              = "ingress"
  ip_protocol       = "tcp"
  nic_type          = "intranet"
  policy            = "accept"
  port_range        = "1/65535"
  priority          = 1
  security_group_id = alicloud_security_group.default.id
  cidr_ip           = "0.0.0.0/0"
}

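In the terraform block we added a backend block. It sets the address to localhost:8500 (the development Consul agent we just started), tells Terraform to reach it over http, and stores the tfstate under the my-ucloud-project path in Consul's key-value store.

After terraform apply finishes, we open http://localhost:8500/ui/dc1/kv

Terraform Basic Concepts: State Management - Figure 3

We can see my-ucloud-project; click into it:

Terraform Basic Concepts: State Management - Figure 4

The content of the tfstate file, which used to live in the working directory, is now stored in Consul under the key named my-ucloud-project.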
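The same key can also be read programmatically. Consul's KV HTTP API returns a JSON list whose Value fields are base64-encoded, so a client has to decode them before it sees the state text. The sketch below works on a locally built sample response rather than a live agent; decode_kv_response and build_sample_response are hypothetical helpers, and it assumes the state is stored uncompressed under a single key.

```python
import base64
import json


def decode_kv_response(body):
    """Decode the body of GET /v1/kv/<key>: a JSON list whose Value fields are base64."""
    entries = json.loads(body)
    return {
        entry["Key"]: base64.b64decode(entry["Value"]).decode("utf-8")
        for entry in entries
    }


def build_sample_response():
    """Build a response shaped like Consul's, holding a tiny made-up state document."""
    state_text = json.dumps({"version": 4, "serial": 1, "resources": []})
    encoded = base64.b64encode(state_text.encode("utf-8")).decode("ascii")
    return json.dumps([{"Key": "my-ucloud-project", "Value": encoded}])


decoded = decode_kv_response(build_sample_response())
state = json.loads(decoded["my-ucloud-project"])

if __name__ == "__main__":
    print(state)
```

Against the development agent used here, the response body would come from http://localhost:8500/v1/kv/my-ucloud-project.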

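Now run terraform destroy and open http://localhost:8500/ui/dc1/kv again

Terraform Basic Concepts: State Management - Figure 5

The my-ucloud-project key still exists. Click into it:

Terraform Basic Concepts: State Management - Figure 6

Its content is empty, which means the infrastructure has been destroyed.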

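Access Control and Versioning for Backends

The backend mechanism itself defines no access control or versioning; both depend entirely on the concrete backend implementation. Taking AWS S3 as an example, we can attach different IAM policies to different buckets, to stop development and test staff from operating on production directly, or to give some people read-only access to state. We can also turn on S3 versioning in case we modify the state file by mistake (the Terraform CLI has commands that modify state).

Backends that currently support multiple workspaces: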

AzureRM
Consul
COS
GCS
Kubernetes
Local
Manta
Postgres
Remote
S3