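Using Scheduled Cloud Disk Snapshots on ACK

Background


Kubernetes supports cloud disk snapshots through the CSI external-snapshotter, but upstream covers only the basics: creating and deleting snapshots.
On ACK, installing the storage-auto-snapshotter component adds scheduled (automatic) snapshots for cloud disks.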

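Prerequisites


Deploy csi-snapshotter

csi-snapshotter is required for basic snapshot creation. Whether you need to deploy it yourself depends on the ACK cluster version:

  • ACK cluster version >= 1.18: csi-snapshotter is already deployed when the cluster is created, so no extra deployment is needed.
  • ACK cluster version < 1.18: deploy it by following https://developer.aliyun.com/article/757325

Deploy the storage-auto-snapshotter plugin

  • Create the Deployment with kubectl apply -f deployment.yaml.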

apiVersion: apps/v1
kind: Deployment
metadata:
  name: storage-auto-snapshotter
  namespace: kube-system
  labels:
    app: storage-auto-snapshotter
spec:
  selector:
    matchLabels:
      app: storage-auto-snapshotter
  template:
    metadata:
      labels:
        app: storage-auto-snapshotter
    spec:
      tolerations:
        - operator: "Exists"
      priorityClassName: system-node-critical
      serviceAccount: admin
      hostNetwork: true
      hostPID: true
      containers:
      - name: storage-auto-snapshotter
        image: registry.cn-beijing.aliyuncs.com/gyq193577/csi_auto_snapshotter:v1.16.6-9268802
        imagePullPolicy: Always
        env:
        - name: SNAPSHOT_CLASS
          value: ""
        volumeMounts:
        - name: date-config
          mountPath: /etc/localtime
      volumes:
      - name: date-config
        hostPath:
          path: /etc/localtime
  • Run kubectl get pods -n kube-system | grep storage-auto-snapshotter | grep Running to check that the plugin has started correctly.

Deploy the VolumeSnapshotPolicy CRD

  • Create the CRD with kubectl create -f volumesnapshotcrd.yaml
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: volumesnapshotpolicies.storage.alibabacloud.com
    spec:
      group: storage.alibabacloud.com
      versions:
      - name: v1alpha1
        served: true
        storage: true
        schema:
          openAPIV3Schema:
            description: VolumeSnapshotPolicy is the Schema for the VolumeSnapshotPolicy API
            properties:
              apiVersion:
                description: 'APIVersion defines the versioned schema of this representation
                  of an object. Servers should convert recognized schemas to the latest
                  internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources'
                type: string
              kind:
                description: 'Kind is a string value representing the REST resource this
                  object represents. Servers may infer this from the endpoint the client
                  submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds'
                type: string
              metadata:
                type: object
              spec:
                description: VolumeSnapshotPolicySpec defines the desired Specification of VolumeSnapshotPolicy
                properties:
                  retentionDays:
                    description: retentionDays is days to save snapshot
                    format: int64
                    type: integer
                  repeatWeekdays:
                    description: RepeatWeekdays is a list of days in a week to create disk snapshot
                    type: array
                    items:
                      type: string
                  timePoints:
                    description: TimePoints is a list of hours in a day to create disk snapshot
                    type: array
                    items:
                      type: string
                type: object
            type: object
        subresources:
          status: {}
      scope: Cluster
      names:
        kind: VolumeSnapshotPolicy
        plural: volumesnapshotpolicies
        shortNames:
        - vsp
  • Run kubectl get crd volumesnapshotpolicies.storage.alibabacloud.com to check that the CRD was created correctly

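Grant permissions

  • Managed clusters (standard managed and ACK Pro) need no additional permissions.
  • Dedicated ACK clusters need the following permissions added to the worker RAM role.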
    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ecs:DescribeInstances",
            "ecs:CreateAutoSnapshotPolicy",
            "ecs:DeleteAutoSnapshotPolicy",
            "ecs:DescribeSnapshots",
            "ecs:ApplyAutoSnapshotPolicy",
            "ecs:ModifyAutoSnapshotPolicy",
            "ecs:DescribeAutoSnapshotPolicyEX"
          ],
          "Resource": [
            "*"
          ],
          "Condition": {}
        }
      ]
    }

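Using scheduled snapshots


After the storage-operator deployment starts, it checks whether any VolumeSnapshotPolicy exists in the cluster. For each one it finds, it checks whether the corresponding policy already exists in ECS: if it does, the policy is skipped; if it does not, the policy is created.

Create a VolumeSnapshotPolicy instance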

apiVersion: v1
items:
- apiVersion: storage.alibabacloud.com/v1alpha1
  kind: VolumeSnapshotPolicy
  metadata:
    name: volumesnapshotpolicy1
  spec:
    retentionDays: 1
    repeatWeekdays: ["1", "2"]
    timePoints: ["11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23"]
kind: List
metadata:
    resourceVersion: ""
    selfLink: ""

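This instance represents an automatic snapshot policy on ECS. When you create one in Kubernetes, the system automatically creates the matching automatic snapshot policy in your ECS service. The spec fields have the following meanings:

Field           Meaning
retentionDays   Number of days each automatic snapshot is kept; -1 keeps snapshots permanently
repeatWeekdays  Days of the week on which snapshots are created automatically
timePoints      Hours of the day at which snapshots are created automatically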
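
As a further illustration of the three fields, here is a minimal sketch of a policy that snapshots once a night and keeps the snapshots permanently. The policy name is hypothetical. The value ranges are an assumption: the sketch assumes weekdays are written "1" to "7" and hours "0" to "23", mirroring the ECS automatic snapshot policy parameters; the original only shows "1", "2" and "11" to "23".

```yaml
# Sketch only: the name below is hypothetical, and the weekday/hour ranges
# are assumed to follow the ECS automatic snapshot policy conventions.
apiVersion: storage.alibabacloud.com/v1alpha1
kind: VolumeSnapshotPolicy
metadata:
  name: nightly-keep-forever   # hypothetical name; the CRD is cluster-scoped, so no namespace
spec:
  retentionDays: -1            # -1 keeps snapshots permanently
  repeatWeekdays: ["1", "2", "3", "4", "5", "6", "7"]   # every day of the week (assumed range)
  timePoints: ["2"]            # once a day, at hour 2 (assumed 0-23 range)
```

Note that the CRD schema types repeatWeekdays and timePoints as arrays of strings, so the values are quoted.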

Create a PVC/PV and set the automatic snapshot policy on the PVC

  • Associate the snapshot policy by setting an annotation on the PVC
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: csi-pvc-snapshot-policy
      annotations:
        policy.volumesnapshot.csi.alibabacloud.com: volumesnapshotpolicy1 # associates this PVC with the VolumeSnapshotPolicy created in the previous step
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 25Gi
      selector:
        matchLabels:
          alicloud-pvname: static-disk-pv-snapshot-policy
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: csi-pv-snapshot-policy
      labels:
        alicloud-pvname: static-disk-pv-snapshot-policy
    spec:
      capacity:
        storage: 25Gi
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      csi:
        driver: diskplugin.csi.alibabacloud.com
        volumeHandle: <your-disk-id>

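Note: at this point storage-auto-snapshotter does not yet attach the disk bound to the PVC to the snapshot policy. The disk holds no data yet, so creating snapshots would only incur unnecessary cost.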
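
The example above uses a statically provisioned PV. As a hedged sketch, the same annotation could be placed on a dynamically provisioned PVC. Two assumptions here are not confirmed by the original: that the component honors the annotation for dynamically provisioned disks, and that the alicloud-disk-ssd storage class (the one used in the restore example later in this article) exists in the cluster. The PVC name is hypothetical.

```yaml
# Sketch only: assumes the annotation is handled the same way for dynamic
# provisioning, which the original article does not confirm.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc-dynamic-snapshot-policy   # hypothetical name
  annotations:
    # must match the name of an existing VolumeSnapshotPolicy
    policy.volumesnapshot.csi.alibabacloud.com: volumesnapshotpolicy1
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: alicloud-disk-ssd     # assumed to exist in the cluster
  resources:
    requests:
      storage: 25Gi
```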

Create a pod that uses this PVC/PV

  • The automatic snapshot policy takes effect only after the disk has been mounted by a pod
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-policy
spec:
  selector:
    matchLabels:
      app: nginx-policy
  serviceName: "nginx"
  template:
    metadata:
      labels:
        app: nginx-policy
    spec:
      containers:
      - name: nginx-policy
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: pvc-disk
          mountPath: /data
      volumes:
        - name: pvc-disk
          persistentVolumeClaim:
            claimName: csi-pvc-snapshot-policy

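Once the pod has started, storage-operator automatically associates the diskId behind the PV with the VolumeSnapshotPolicy and creates snapshots according to the policy.

Check that the automatic snapshot policy is in effect

  • Log in to the ECS console
  • Open the Storage & Snapshots page
  • Open the Automatic Snapshot Policies page
  • Check that the snapshot policy is associated with the intended disk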

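Modify a scheduled snapshot policy


Notes

  1. Modifying a scheduled snapshot policy affects every disk associated with it, so change it with care
  2. Do not modify the policy in the ECS console; make all changes through the CRD
  • Change the snapshot policy by editing the volumesnapshotpolicy resource
$ kubectl edit volumesnapshotpolicy volumesnapshotpolicy1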
apiVersion: v1
items:
- apiVersion: storage.alibabacloud.com/v1alpha1
  kind: VolumeSnapshotPolicy
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"storage.alibabacloud.com/v1alpha1","kind":"VolumeSnapshotPolicy","metadata":{"annotations":{},"name":"volumesnapshotpolicy1"},"spec":{"repeatWeekdays":["1","2"],"retentionDays":1,"timePoints":["11","12","13","14","15","16","17","18","19","20","21","22","23"]}}
      policyId: sp-uf6ahkkav6016ondbiyk
    creationTimestamp: "2021-01-05T11:13:52Z"
    generation: 2
    managedFields:
    - apiVersion: storage.alibabacloud.com/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:spec:
          .: {}
          f:retentionDays: {}
          f:timePoints: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: "2021-01-05T11:13:52Z"
    - apiVersion: storage.alibabacloud.com/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:repeatWeekdays: {}
      manager: kubectl-edit
      operation: Update
      time: "2021-01-06T07:16:59Z"
    - apiVersion: storage.alibabacloud.com/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:policyId: {}
      manager: operator
      operation: Update
      time: "2021-01-06T07:16:59Z"
    name: volumesnapshotpolicy1
    resourceVersion: "339669"
    selfLink: /apis/storage.alibabacloud.com/v1alpha1/volumesnapshotpolicies/volumesnapshotpolicy1
    uid: 02257b4a-28e6-46f4-a767-81c0d117aba0
  spec:
    repeatWeekdays:
    - "1"
    - "2"
    - "3"
    - "4"
    retentionDays: 1
    timePoints:
    - "11"
    - "12"
    - "13"
    - "14"
    - "15"
    - "16"
    - "17"
    - "18"
    - "19"
    - "20"
    - "21"
    - "22"
    - "23"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
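
As an alternative to kubectl edit, the same change can be kept in a manifest and re-applied. The sketch below reproduces the policy from this article with repeatWeekdays extended to "1" through "4", matching the edited spec shown above. The file name policy.yaml is hypothetical, and the sketch assumes the component reacts to an update made with kubectl apply the same way it reacts to one made with kubectl edit, since both update the same object.

```yaml
# policy.yaml (hypothetical file name); apply with: kubectl apply -f policy.yaml
apiVersion: storage.alibabacloud.com/v1alpha1
kind: VolumeSnapshotPolicy
metadata:
  name: volumesnapshotpolicy1
spec:
  retentionDays: 1
  repeatWeekdays: ["1", "2", "3", "4"]   # was ["1", "2"] before the change
  timePoints: ["11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23"]
```

Keeping the manifest in version control makes it easier to review a change that, as noted above, affects every disk associated with the policy.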
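  • Check in the ECS console that the scheduled snapshot policy has been updated.

Restore a disk from a snapshot created by the scheduled policy

  • Once a scheduled snapshot policy is bound, the automatically created snapshots (volumesnapshot & volumesnapshotcontent) appear in the ACK cluster
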
$ kubectl get volumesnapshot

NAME                     READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                            VOLUMESNAPSHOTCLASS   VOLUMESNAPSHOT           AGE
s-uf6221xxxxxxxxxxx  true         41943040      Delete           diskplugin.csi.alibabacloud.com   default-snapclass     s-uf622145z6iibqtlrbwi   7m40s
s-uf65y0zxxxxxxxxx   true         41943040      Delete           diskplugin.csi.alibabacloud.com   default-snapclass     s-uf65y0zrhwsd581q60mg   7m40s
s-uf6a83xxxxxxxxxx   true         41943040      Delete           diskplugin.csi.alibabacloud.com   default-snapclass     s-uf6a83009o5s2jgcch9f   7m40s
s-uf6fmpbyrxxxxxxx   true         41943040      Delete           diskplugin.csi.alibabacloud.com   default-snapclass     s-uf6fmpbyrlm10amjicha   7m40s
  • Any one of these volumesnapshots can now be used to restore a disk
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-restore
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: nginx
        command: ["sh", "-c"]
        args: ["sleep 10000"]
        volumeMounts:
        - name: disk-ssd
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: disk-ssd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: alicloud-disk-ssd
      resources:
        requests:
          storage: 20Gi
      dataSource:
        name: s-uf6221xxxxxxxxxxx
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
  • Once the pod has started, the data from the scheduled snapshot has been restored
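
The StatefulSet above restores through volumeClaimTemplates. The same dataSource stanza also works on a standalone PVC, which can be handy when the restored disk will be mounted by an existing workload. The following is a minimal sketch: the PVC name is hypothetical, and the snapshot name and storage class are copied from the example above. The PVC has to be created in the same namespace as the VolumeSnapshot it references.

```yaml
# Sketch only: a standalone PVC restored from one of the automatic snapshots.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-restore               # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: alicloud-disk-ssd
  resources:
    requests:
      storage: 20Gi                # same size as requested in the StatefulSet example
  dataSource:
    name: s-uf6221xxxxxxxxxxx      # a VolumeSnapshot listed by kubectl get volumesnapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```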