An illustrated look at the production-grade implementation of etcd create, delete, update, and read operations in Kubernetes

Kubernetes uses etcd as its centralized data store. This article looks at how the Kubernetes storage layer builds consistent reads, consistent updates, and transactional operations on top of etcd.

1. Data Storage and Versioning

1.1 Conversion of data storage

In k8s, some data must be processed before it can be stored. Secrets are one example: because they are stored encrypted, they need at least two operations, encrypting on write and decrypting on read. The Transformer component does exactly this: it encrypts data when writing it to etcd and decrypts it when reading it back.
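To make the encrypt-on-write, decrypt-on-read idea concrete, here is a minimal sketch using only the Go standard library. The type `gcmTransformer` and its constructor are hypothetical names invented for this illustration, and the etcd key is passed as a plain string rather than through the context type the real code uses. The method names and return shapes follow the calls that appear in the excerpts later in this article. Passing the etcd key as additional authenticated data means an encrypted value copied to a different key fails to decrypt.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// gcmTransformer is a hypothetical stand-in for a storage transformer.
// It is an illustration, not the Kubernetes implementation.
type gcmTransformer struct {
	aead cipher.AEAD
}

func newGCMTransformer(key []byte) (*gcmTransformer, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &gcmTransformer{aead: aead}, nil
}

// TransformToStorage encrypts data. The etcd key is used as additional
// authenticated data, binding the ciphertext to that key.
func (t *gcmTransformer) TransformToStorage(data []byte, etcdKey string) ([]byte, error) {
	nonce := make([]byte, t.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// The nonce is stored in front of the ciphertext.
	return t.aead.Seal(nonce, nonce, data, []byte(etcdKey)), nil
}

// TransformFromStorage decrypts a stored value. The bool result mirrors the
// "stale" flag in the excerpts; this sketch always reports false.
func (t *gcmTransformer) TransformFromStorage(stored []byte, etcdKey string) ([]byte, bool, error) {
	n := t.aead.NonceSize()
	if len(stored) < n {
		return nil, false, errors.New("stored value is too short")
	}
	plain, err := t.aead.Open(nil, stored[:n], stored[n:], []byte(etcdKey))
	return plain, false, err
}

func main() {
	// Demo key only (all zeros); a real key must come from a secure source.
	tr, err := newGCMTransformer(make([]byte, 32))
	if err != nil {
		panic(err)
	}
	key := "/registry/secrets/default/demo"
	stored, err := tr.TransformToStorage([]byte("s3cret"), key)
	if err != nil {
		panic(err)
	}
	plain, _, err := tr.TransformFromStorage(stored, key)
	fmt.Println(string(plain), err)

	// Reading the same bytes under a different key fails authentication.
	_, _, err = tr.TransformFromStorage(stored, "/registry/secrets/default/other")
	fmt.Println(err != nil)
}
```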

1.2 Resource version revision

Every mutating operation in etcd (create, update, delete) increments the revision, and k8s uses this value as the ResourceVersion of its resources. This mechanism is also the key to implementing watch. When data read from etcd is decoded, the versioner component sets this value on the resource object.
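The relationship between the store-wide revision, a key's ModRevision, and the object's ResourceVersion can be sketched with a toy in-memory model. Everything here (`toyStore`, `kv`, `object`, `updateObject`) is a hypothetical name for illustration; it models the behavior described above and is not etcd or the Kubernetes versioner.

```go
package main

import (
	"fmt"
	"strconv"
)

// kv keeps the two fields of a stored pair that matter for versioning.
type kv struct {
	Value       []byte
	ModRevision int64 // store revision at which this key was last written
}

// toyStore is a hypothetical in-memory model, not etcd itself.
type toyStore struct {
	revision int64 // store-wide counter, bumped by every write
	data     map[string]kv
}

// Put bumps the store-wide revision and stamps it on the key.
func (s *toyStore) Put(key string, value []byte) int64 {
	s.revision++
	s.data[key] = kv{Value: value, ModRevision: s.revision}
	return s.revision
}

// object stands in for a Kubernetes resource.
type object struct {
	Name            string
	ResourceVersion string
}

// updateObject plays the role of the versioner: it copies the revision
// onto the decoded object as its ResourceVersion.
func updateObject(obj *object, rev int64) {
	obj.ResourceVersion = strconv.FormatInt(rev, 10)
}

func main() {
	s := &toyStore{data: map[string]kv{}}
	s.Put("/registry/pods/a", []byte("a-v1")) // revision 1
	s.Put("/registry/pods/b", []byte("b-v1")) // revision 2
	s.Put("/registry/pods/a", []byte("a-v2")) // revision 3

	obj := &object{Name: "a"}
	updateObject(obj, s.data["/registry/pods/a"].ModRevision)
	fmt.Println(obj.ResourceVersion) // prints "3"
}
```

Note that the revision is shared by the whole store, so writing key b also moves the counter that key a will receive on its next write.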

1.3 Mapping of the data model

Data read from etcd is just a byte array. How is it converted into a real runtime object? Recall the scheme and codec covered previously: given the data's encoding format and the type of the resource object, the codec takes the byte array and the target type and deserializes the data into the corresponding object.
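As a rough sketch of the idea, the following uses a map as a tiny type registry and JSON as the encoding. The names `scheme`, `decode`, `Pod`, and `Secret` are simplified, hypothetical stand-ins; the real scheme and codec handle versions, groups, and several encodings that are ignored here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Pod and Secret are simplified, hypothetical resource types.
type Pod struct {
	Kind string `json:"kind"`
	Name string `json:"name"`
}

type Secret struct {
	Kind string            `json:"kind"`
	Name string            `json:"name"`
	Data map[string]string `json:"data"`
}

// scheme maps a kind name to a constructor for the matching Go type.
var scheme = map[string]func() interface{}{
	"Pod":    func() interface{} { return &Pod{} },
	"Secret": func() interface{} { return &Secret{} },
}

// decode peeks at the kind, builds an empty object of the registered type,
// and then unmarshals the bytes into it.
func decode(data []byte) (interface{}, error) {
	var meta struct {
		Kind string `json:"kind"`
	}
	if err := json.Unmarshal(data, &meta); err != nil {
		return nil, err
	}
	newObj, ok := scheme[meta.Kind]
	if !ok {
		return nil, fmt.Errorf("kind %q is not registered", meta.Kind)
	}
	obj := newObj()
	if err := json.Unmarshal(data, obj); err != nil {
		return nil, err
	}
	return obj, nil
}

func main() {
	raw := []byte(`{"kind":"Secret","name":"db","data":{"password":"cGFzcw=="}}`)
	obj, err := decode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %+v\n", obj, obj)
}
```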

2. Query interface consistency

Writes in etcd go through a single leader and are committed once a quorum of the cluster has accepted them. Members outside that quorum can lag behind for a short time, so a read that is answered from the local state of such a member (a serializable read) may briefly return stale data. For scenarios that need stronger guarantees, the caller passes a minimum resourceVersion, and the store compares it with the revision in the response header to make sure the read is at least that recent.

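// Non-essential code omitted
func (s *store) Get(ctx context.Context, key string, resourceVersion string, out runtime.Object, ignoreNotFound bool) error {
	// Fetch the key from etcd
	getResp, err := s.client.KV.Get(ctx, key, s.getOps...)
	if err != nil {
		return err
	}

	// Check that the revision of the response has reached the requested minimum version
	if err = s.ensureMinimumResourceVersion(resourceVersion, uint64(getResp.Header.Revision)); err != nil {
		return err
	}

	// Handle a missing key, then take the single returned key-value pair
	if len(getResp.Kvs) == 0 {
		if ignoreNotFound {
			return runtime.SetZeroValue(out)
		}
		return storage.NewKeyNotFoundError(key, 0)
	}
	kv := getResp.Kvs[0]

	// Transform the stored value (for example, decrypt it)
	data, _, err := s.transformer.TransformFromStorage(kv.Value, authenticatedDataString(key))
	if err != nil {
		return storage.NewInternalError(err.Error())
	}
	// Decode the data into out and set its resource version
	return decode(s.codec, s.versioner, data, out, kv.ModRevision)
}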
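The minimum-version check used by Get can be approximated as follows. This is a sketch under assumptions: `ensureMinimumRevision` is a hypothetical helper written for this article, and the real `ensureMinimumResourceVersion` may parse the version and report errors differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// ensureMinimumRevision is a hypothetical helper: it fails when the revision
// seen in a response is older than the minimum the caller asked for.
func ensureMinimumRevision(minimum string, actual uint64) error {
	if minimum == "" {
		return nil // the caller did not ask for a minimum
	}
	want, err := strconv.ParseUint(minimum, 10, 64)
	if err != nil {
		return fmt.Errorf("invalid resourceVersion %q: %v", minimum, err)
	}
	if want > actual {
		return fmt.Errorf("response revision %d is older than requested minimum %d", actual, want)
	}
	return nil
}

func main() {
	fmt.Println(ensureMinimumRevision("", 10))   // no minimum requested
	fmt.Println(ensureMinimumRevision("10", 10)) // fresh enough
	fmt.Println(ensureMinimumRevision("11", 10)) // response is too old
}
```

A caller that has just written an object at revision N can pass N as the minimum on its next read; if the answering member has not caught up yet, the read fails instead of silently returning older data.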

3. Implementation of the Create interface

When an object is created, the store first runs a preliminary check on the object's version field: an object that is about to be created must not already carry a resourceVersion. It then relies on etcd's transaction mechanism to make creation atomic, writing the key only if it does not exist yet, which prevents the same object from being created twice.

// Non-essential code omitted
func (s *store) Create(ctx context.Context, key string, obj, out runtime.Object, ttl uint64) error {
	// An object that is about to be created must not carry a resourceVersion yet
	if version, err := s.versioner.ObjectResourceVersion(obj); err == nil && version != 0 {
		return errors.New("resourceVersion should not be set on objects to be created")
	}
	if err := s.versioner.PrepareObjectForStorage(obj); err != nil {
		return fmt.Errorf("PrepareObjectForStorage failed: %v", err)
	}
	// Encode the object
	data, err := runtime.Encode(s.codec, obj)
	if err != nil {
		return err
	}

	// Build the put options for the given TTL
	opts, err := s.ttlOpts(ctx, int64(ttl))
	if err != nil {
		return err
	}

	// Transform the data (for example, encrypt it) before storing
	newData, err := s.transformer.TransformToStorage(data, authenticatedDataString(key))
	if err != nil {
		return storage.NewInternalError(err.Error())
	}

	startTime := time.Now()
	// Run the write as a transaction
	txnResp, err := s.client.KV.Txn(ctx).If(
		notFound(key), // a ModRevision of 0 means the key does not exist yet
	).Then(
		clientv3.OpPut(key, string(newData), opts...), // write the new value
	).Commit()
	metrics.RecordEtcdRequestLatency("create", getTypeName(obj), startTime)
	if err != nil {
		return err
	}
	if !txnResp.Succeeded {
		// The comparison failed, so the key already exists
		return storage.NewKeyExistsError(key, 0)
	}

	if out != nil {
		// Use the revision of the put response as the object's resource version
		putResp := txnResp.Responses[0].GetResponsePut()
		return decode(s.codec, s.versioner, data, out, putResp.Header.Revision)
	}
	return nil
}

func notFound(key string) clientv3.Cmp {
	return clientv3.Compare(clientv3.ModRevision(key), "=", 0)
}
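The "put only if ModRevision is 0" pattern can be modeled without etcd. In the sketch below, `toyKV` and its `Create` method are hypothetical names for an in-memory model; the mutex stands in for the atomicity that an etcd transaction provides. Several goroutines race to create the same key, and only one of them wins.

```go
package main

import (
	"fmt"
	"sync"
)

type entry struct {
	value       string
	modRevision int64
}

// toyKV is a hypothetical in-memory model of a revisioned store.
type toyKV struct {
	mu       sync.Mutex
	revision int64
	data     map[string]entry
}

// Create models If(ModRevision(key) == 0).Then(Put): a missing key has the
// zero entry, whose modRevision is 0, so the put happens only for new keys.
func (s *toyKV) Create(key, value string) (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[key].modRevision != 0 {
		return 0, false // the key already exists
	}
	s.revision++
	s.data[key] = entry{value, s.revision}
	return s.revision, true
}

func main() {
	s := &toyKV{data: map[string]entry{}}
	results := make([]bool, 5)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_, results[i] = s.Create("/registry/pods/demo", fmt.Sprintf("writer-%d", i))
		}(i)
	}
	wg.Wait()

	wins := 0
	for _, ok := range results {
		if ok {
			wins++
		}
	}
	fmt.Println("successful creates:", wins) // prints "successful creates: 1"
}
```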

4. Implementation of the Delete interface

The delete interface is implemented with CAS (compare-and-swap) and etcd's transaction mechanism, so concurrent requests cannot leave etcd in an inconsistent state. Even if the same resource is deleted concurrently from several nodes, one of them is guaranteed to succeed.

// Non-essential code omitted
func (s *store) conditionalDelete(ctx context.Context, key string, out runtime.Object, v reflect.Value, preconditions *storage.Preconditions, validateDeletion storage.ValidateObjectFunc) error {
	startTime := time.Now()
	// Fetch the current data for the key
	getResp, err := s.client.KV.Get(ctx, key)
	metrics.RecordEtcdRequestLatency("get", getTypeName(out), startTime)
	if err != nil {
		return err
	}
	for {
		// Build the current state from the latest response
		origState, err := s.getState(getResp, key, v, false)
		if err != nil {
			return err
		}
		txnResp, err := s.client.KV.Txn(ctx).If(
			clientv3.Compare(clientv3.ModRevision(key), "=", origState.rev), // delete only if the key is unchanged since it was read
		).Then(
			clientv3.OpDelete(key), // delete the key
		).Else(
			clientv3.OpGet(key), // on conflict, fetch the latest value instead
		).Commit()
		if err != nil {
			return err
		}
		if !txnResp.Succeeded {
			// Conflict: take the latest data from the Else branch and retry the transaction
			getResp = (*clientv3.GetResponse)(txnResp.Responses[0].GetResponseRange())
			klog.V(4).Infof("deletion of %s failed because of a conflict, going to retry", key)
			continue
		}
		// Decode the last stored version into out and return
		return decode(s.codec, s.versioner, origState.data, out, origState.rev)
	}
}
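The retry loop can be reproduced on a toy in-memory store to show the "one caller succeeds" property. All names here (`toyKV`, `DeleteIfUnchanged`, `conditionalDelete`, `errNotFound`) are hypothetical and written for this sketch; the mutex plays the role of the etcd transaction, and preconditions and validation are left out.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type entry struct {
	value       string
	modRevision int64
}

// toyKV is a hypothetical in-memory model of a revisioned store.
type toyKV struct {
	mu       sync.Mutex
	revision int64
	data     map[string]entry
}

func (s *toyKV) Put(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.revision++
	s.data[key] = entry{value, s.revision}
}

func (s *toyKV) Get(key string) (entry, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.data[key]
	return e, ok
}

// DeleteIfUnchanged models If(ModRevision == rev).Then(Delete).Else(Get):
// on a mismatch it returns the latest state so the caller can retry.
func (s *toyKV) DeleteIfUnchanged(key string, rev int64) (deleted bool, latest entry, found bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	latest, found = s.data[key]
	if !found || latest.modRevision != rev {
		return false, latest, found
	}
	s.revision++
	delete(s.data, key)
	return true, latest, true
}

var errNotFound = errors.New("key not found")

// conditionalDelete retries the compare-and-delete until it either wins
// or discovers that the key is already gone.
func conditionalDelete(s *toyKV, key string) (entry, error) {
	cur, found := s.Get(key)
	for {
		if !found {
			return entry{}, errNotFound
		}
		deleted, latest, stillThere := s.DeleteIfUnchanged(key, cur.modRevision)
		if deleted {
			return cur, nil
		}
		cur, found = latest, stillThere // conflict: retry with fresh state
	}
}

func main() {
	s := &toyKV{data: map[string]entry{}}
	s.Put("/registry/pods/demo", "v1")

	errs := make([]error, 4)
	var wg sync.WaitGroup
	for i := range errs {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_, errs[i] = conditionalDelete(s, "/registry/pods/demo")
		}(i)
	}
	wg.Wait()

	wins := 0
	for _, err := range errs {
		if err == nil {
			wins++
		}
	}
	fmt.Println("successful deletes:", wins) // prints "successful deletes: 1"
}
```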

5. Implementation of the Update interface

The update interface is implemented in essentially the same way as the delete interface: if multiple nodes update the same resource at the same time, CAS lets one of them succeed, and the others reload the latest data and retry. If the update would not change the stored data, the function returns directly without writing.

// Non-essential code omitted
func (s *store) GuaranteedUpdate(
	ctx context.Context, key string, out runtime.Object, ignoreNotFound bool,
	preconditions *storage.Preconditions, tryUpdate storage.UpdateFunc, suggestion ...runtime.Object) error {
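	// (The omitted setup code declares v, err, and trace, which are used below.)
	// Helper that fetches the latest data for the key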
	getCurrentState := func() (*objState, error) {
		startTime := time.Now()
		getResp, err := s.client.KV.Get(ctx, key, s.getOps...)
		metrics.RecordEtcdRequestLatency("get", getTypeName(out), startTime)
		if err != nil {
			return nil, err
		}
		return s.getState(getResp, key, v, ignoreNotFound)
	}

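	// Determine the starting state
	var origState *objState
	var mustCheckData bool
	if len(suggestion) == 1 && suggestion[0] != nil {
		// If the caller supplied a suggested object, use it as the starting state,
		origState, err = s.getStateFromObject(suggestion[0])
		if err != nil {
			return err
		}
		// but it may be stale, so it must be verified later
		mustCheckData = true
	} else {
		// Otherwise fetch the current data from etcd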
		origState, err = getCurrentState()
		if err != nil {
			return err
		}
	}

	transformContext := authenticatedDataString(key)
	for {
		// Check the preconditions (mainly UID and resourceVersion) against the current object
		if err := preconditions.Check(key, origState.obj); err != nil {
			// If our data is already up to date, return the error
			if !mustCheckData {
				return err
			}
			// The check failed on possibly stale data, so fetch the current state again
			origState, err = getCurrentState()
			if err != nil {
				return err
			}
			mustCheckData = false
			// Retry
			continue
		}

		// Apply tryUpdate to the current object; the revision is stripped from the result before storage
		ret, ttl, err := s.updateState(origState, tryUpdate)
		if err != nil {
			// If our data is already up to date, return the error
			if !mustCheckData {
				return err
			}

			// It's possible we were working with stale data
			// Actually fetch
			origState, err = getCurrentState()
			if err != nil {
				return err
			}
			mustCheckData = false
			// Retry
			continue
		}

		// Encode the updated object
		data, err := runtime.Encode(s.codec, ret)
		if err != nil {
			return err
		}
		if !origState.stale && bytes.Equal(data, origState.data) {
			// The update produced no change compared with the stored data, so the write can be skipped
			if mustCheckData {
				origState, err = getCurrentState()
				if err != nil {
					return err
				}
				mustCheckData = false
				if !bytes.Equal(data, origState.data) {
					// original data changed, restart loop
					continue
				}
			}
			if !origState.stale {
				// Return the existing data directly
				return decode(s.codec, s.versioner, origState.data, out, origState.rev)
			}
		}

		// Transform the data (for example, encrypt it) before storing
		newData, err := s.transformer.TransformToStorage(data, transformContext)
		if err != nil {
			return storage.NewInternalError(err.Error())
		}

		opts, err := s.ttlOpts(ctx, int64(ttl))
		if err != nil {
			return err
		}
		trace.Step("Transaction prepared")

		startTime := time.Now()
		// Update the data in a transaction: put only if the key is unchanged since it was read
		txnResp, err := s.client.KV.Txn(ctx).If(
			clientv3.Compare(clientv3.ModRevision(key), "=", origState.rev),
		).Then(
			clientv3.OpPut(key, string(newData), opts...),
		).Else(
			clientv3.OpGet(key),
		).Commit()
		metrics.RecordEtcdRequestLatency("update", getTypeName(out), startTime)
		if err != nil {
			return err
		}
		trace.Step("Transaction committed")
		if !txnResp.Succeeded {
			// Conflict: rebuild the state from the latest data returned by the Else branch
			getResp := (*clientv3.GetResponse)(txnResp.Responses[0].GetResponseRange())
			klog.V(4).Infof("GuaranteedUpdate of %s failed because of a conflict, going to retry", key)
			origState, err = s.getState(getResp, key, v, ignoreNotFound)
			if err != nil {
				return err
			}
			trace.Step("Retry value restored")
			mustCheckData = false
			continue
		}
		// Take the put response; its revision becomes the object's resource version
		putResp := txnResp.Responses[0].GetResponsePut()

		return decode(s.codec, s.versioner, data, out, putResp.Header.Revision)
	}
}
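The core of this loop (read, apply the update function, compare-and-put, retry on conflict, skip the write when nothing changed) can be exercised on a toy store. As before, `toyKV`, `PutIfUnchanged`, and `guaranteedUpdate` are hypothetical names for an in-memory model; preconditions, suggestions, TTLs, and not-found handling from the real function are left out, and a missing key is simply treated as an empty value.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

type entry struct {
	value       string
	modRevision int64
}

// toyKV is a hypothetical in-memory model of a revisioned store.
type toyKV struct {
	mu       sync.Mutex
	revision int64
	data     map[string]entry
}

func (s *toyKV) Get(key string) entry {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key]
}

// PutIfUnchanged models If(ModRevision == rev).Then(Put).Else(Get):
// on a mismatch it returns the latest entry instead of writing.
func (s *toyKV) PutIfUnchanged(key, value string, rev int64) (bool, entry) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur := s.data[key]
	if cur.modRevision != rev {
		return false, cur
	}
	s.revision++
	next := entry{value, s.revision}
	s.data[key] = next
	return true, next
}

// guaranteedUpdate keeps applying tryUpdate to the latest value until the
// compare-and-put succeeds or the update turns out to be a no-op.
func guaranteedUpdate(s *toyKV, key string, tryUpdate func(old string) string) entry {
	cur := s.Get(key)
	for {
		next := tryUpdate(cur.value)
		if next == cur.value {
			return cur // nothing changed, so skip the write
		}
		ok, latest := s.PutIfUnchanged(key, next, cur.modRevision)
		if ok {
			return latest
		}
		cur = latest // conflict: retry on top of the latest data
	}
}

func main() {
	s := &toyKV{data: map[string]entry{}}
	increment := func(old string) string {
		n, _ := strconv.Atoi(old) // an empty value counts as 0
		return strconv.Itoa(n + 1)
	}

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			guaranteedUpdate(s, "/counter", increment)
		}()
	}
	wg.Wait()
	fmt.Println(s.Get("/counter").value) // prints "50"
}
```

Because every retry re-applies the update function to the latest value, no increment is lost even though many writers collide.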

6. What this article did not cover

I have not yet found where the transformer is implemented and registered; I only saw a few places where it is overridden for specific resource types. The list/watch interface is also left for later study. That's all for today; see you next time.

> WeChat ID: baxiaoshi2020
> Follow the official account to read more source code analysis articles (21 days greenhouse)
> For more articles, follow www.sreguide.com
> This article was published with OpenWrite, a multi-platform blog publishing tool
