Kubernetes v1.26 officially released, with significantly improved stability

On December 8, 2022 (Pacific Time), Kubernetes officially released v1.26, with the theme Electrifying.

As the last release of 2022, v1.26 adds many new features and significantly improves stability. We introduce the changes in version 1.26 from the following perspectives.

Update overview:

  • Kube APIServer: As the entry point for Kubernetes requests, the APIServer gains 4 new KEP features and some response compression optimizations in this release, and 2 other features have been promoted from Alpha to Beta.

  • Node: The updates most closely related to kubelet are grouped here, mainly 4 new KEP features, plus 4 features that reach GA in this version.

  • Storage: Provisioning volumes from snapshots in other namespaces (cross-namespace) has been added, 2 storage-related features have entered Beta, and 3 features are officially GA.

  • Network: Mainly updates to kube-proxy, including a performance optimization; 2 features have entered Beta and 4 features have officially reached GA.

  • Resource control and coordination: Mainly updates to the resource controllers in kube-controller-manager, with 2 new KEP features, 2 features promoted from Alpha to Beta, and 1 feature officially GA.

  • Scheduler: Adds an important KEP feature, PodSchedulingReadiness, which controls when a Pod can be scheduled by the scheduler; one feature has been promoted from Alpha to Beta.

  • Observability: A new mechanism for exposing component status, Component Health SLIs, has been added. In addition, many metrics have been added to the components.

  • kubectl, kubeadm, and client-go also received some optimizations and bug fixes.

  • For features that have reached GA, 11 feature gates were removed in 1.26 in line with the Kubernetes release policy. If these feature gates are still set on a component's command line, the component will fail to start.

Next, let's take a look at some of the more important API deprecations and changes that affect upgrades.

01

API deprecations and changes

PR#110618 Kubelet no longer supports the v1alpha2 version of CRI, and the connected container runtime must implement the v1 version of the container runtime interface.

This means that Kubernetes v1.26 will not support containerd 1.5.x and earlier versions; you need to upgrade to containerd 1.6.x or later before you can upgrade the node's kubelet to 1.26.

PR#112306 flowcontrol.apiserver.k8s.io adds the v1beta3 version and keeps v1beta2 as the preferred version. In 1.27, v1beta3 will become the preferred version.

PR#113336 The CSIMigrationvSphere feature is now GA and can no longer be turned off.

Official tip: Do not upgrade to Kubernetes v1.26 if you need to use Windows, XFS, or raw block volumes. You can upgrade once the vSphere CSI Driver adds the relevant support in a version after v2.7.x.

PR#113710 The --pod-eviction-timeout flag of kube-controller-manager is deprecated and will be removed in 1.27, along with the --enable-taint-manager flag.

PR#112643 DynamicKubeletConfig was deprecated in 1.23, and its logic in kubelet was removed in 1.24. This release removes the DynamicKubeletConfig feature gate and the remaining logic in the APIServer.

PR#112120 Removes some klog-related flags that no longer had any effect from the Kubernetes components.

02

Kube APIServer

KEP-2799 Reduction of Secret-based Service Account Tokens

Adds the Alpha feature gate LegacyServiceAccountTokenTracking to control whether this feature is enabled.

When LegacyServiceAccountTokenTracking is enabled, Secret-based ServiceAccount tokens record the time they were last used in the kubernetes.io/legacy-token-last-used label.
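As a rough illustration of how the last-used label could be consumed, the sketch below filters a list of Secret objects (plain dictionaries, as they would look after loading JSON from the API) for legacy tokens that have not been used recently. The function name stale_tokens, the 90-day threshold, and the assumption that the label value is a date in YYYY-MM-DD form are illustrative choices, not details taken from the release notes.

```python
from datetime import date, timedelta

LAST_USED_LABEL = "kubernetes.io/legacy-token-last-used"
TOKEN_SECRET_TYPE = "kubernetes.io/service-account-token"


def stale_tokens(secrets, today, max_age_days=90):
    """Return names of token Secrets not used within max_age_days.

    Assumption: the label value is an ISO date such as "2022-12-01".
    A token Secret without the label is treated as never used.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for secret in secrets:
        if secret.get("type") != TOKEN_SECRET_TYPE:
            continue  # only Secret-based ServiceAccount tokens are relevant
        metadata = secret.get("metadata", {})
        last_used = (metadata.get("labels") or {}).get(LAST_USED_LABEL)
        if last_used is None or date.fromisoformat(last_used) < cutoff:
            stale.append(metadata["name"])
    return stale
```

Such a report is only meaningful after the feature gate has been enabled long enough for tokens in active use to be labeled.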

KEP-3488 CEL for Admission Control

Related PRs: PR#113314, PR#113349, PR#112994, PR#112792, PR#112926, PR#112858.

Building on KEP-2876 CRD Validation Expression Language [1], introduced in Kubernetes v1.25, this feature adds a new resource, ValidatingAdmissionPolicy, under admissionregistration.k8s.io/v1alpha1. It allows fields to be validated without using a validating webhook.

apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
    - expression: "object.spec.replicas <= 5"

This policy requires the spec.replicas field of a Deployment to be less than or equal to 5.

Adds the Alpha feature gate ValidatingAdmissionPolicy to control whether this feature is enabled.

KEP-3352 Aggregated Discovery PR#113171

Previously, clients had to walk through every API group and version and issue a request for each one to assemble the discovery information. This feature reduces those calls to just two endpoints, /api and /apis.

Adds the Alpha feature gate AggregatedDiscoveryEndpoint to control whether this feature is enabled.

KEP-3325 Auth API to get self user attributes PR#111333

A new resource, SelfSubjectReview, is added to the authentication.k8s.io/v1alpha1 group so that users can query their own user information as mapped in Kubernetes.

The kubectl alpha auth whoami command has also been added to make this query easier.

$ kubectl alpha auth whoami -o yaml
apiVersion: authentication.k8s.io/v1alpha1
kind: SelfSubjectReview
status:
  userInfo:
    username: jane.doe
    uid: b79dbf30-0c6a-11ed-861d-0242ac120002
    groups:
    - students
    - teachers
    - system:authenticated
    extra:
      skills:
      - reading
      - learning
      subjects:
      - math
      - sports

Adds the Alpha feature gate APISelfSubjectAttributesReview to control whether this feature is enabled.

PR#112193 The APIServer adds the --aggregator-reject-forwarding-redirect flag. It defaults to true; users can set it to false to keep forwarding redirect responses from an AA (Aggregated API) Server.

PR#113015 Custom resources can now be listed in the --encryption-provider-config file so that they are stored encrypted in etcd.

Response Compression

PR#112299 Based on load testing and production data collected from thousands of production Kubernetes clusters, the community observed that gzip compression in Kubernetes APIServer is currently suboptimal.

ISSUE: kubernetes/kubernetes#112296

Here are some reports[2] and meeting minutes[3].

PR#112309 Adds a DisableCompression field to kubeconfig. When set to true, the client asks the server not to compress responses.

PR#112580 Adds the --disable-compression flag to kubectl. When set to true, kubectl asks the server not to compress responses.

Functional stability upgrade

Alpha -> Beta

Feature Gate KEP
LegacyServiceAccountTokenNoAutoGeneration KEP-2799 Reduction of Secret-based Service Account Tokens
APIServerIdentity KEP-1965 kube-apiserver identity

03

Node (Kubelet)

KEP-3063 Dynamic resource allocation PR#111023

Adds the resource.k8s.io/v1alpha1 group with the resources related to dynamic resource allocation: ResourceClaim, ResourceClass, ResourceClaimTemplate, and PodScheduling.

Adds the Alpha feature gate DynamicResourceAllocation to control whether this feature is enabled.

The new API is more flexible than the existing Device Plugins feature in Kubernetes because it lets Pods request special types of resources, which can be provided at the node level, at the cluster level, or according to any other model the user implements.

The Pod spec also gains the corresponding fields for dynamic resource allocation.

apiVersion: v1
kind: Pod
spec:
  containers:
  - name: with-resource
    image: busybox
    command: ["sh", "-c", "set && mount && ls -la /dev/"]
    resources:
      claims:
      - name: resource
  resourceClaims:
  - name: resource
    source:
      resourceClaimName: shared-claim
      # resourceClaimTemplateName: test-inline-claim-template

KEP-3545 Improved multi-numa alignment in Topology Manager PR#112914

This feature improves how the Topology Manager handles machines with multiple NUMA (Non-Uniform Memory Access) nodes.

It adds a new topologyManagerPolicyOptions field to the kubelet configuration file and a matching --topology-manager-policy-options flag to the kubelet command line; both set additional options for the Topology Manager policy.

It also adds three Alpha feature gates to control the configuration of the Topology Manager policy:

  • TopologyManagerPolicyOptions

  • TopologyManagerPolicyAlphaOptions

  • TopologyManagerPolicyBetaOptions

The TopologyManagerPolicyOptions gate controls whether Topology Manager policy options are supported at all; the other two control whether Alpha-level and Beta-level policy options can be set.

The TopologyManager feature gate must also be enabled to use the Topology Manager, but that gate is already in Beta and does not need to be set explicitly.

KEP-3386 Kubelet Evented PLEG for Better Performance PR#111384

This feature lets kubelet rely on container runtime interface (CRI) notifications as much as possible when tracking Pod status on a node, which reduces periodic polling and lowers kubelet's CPU usage.

Adds the Alpha feature gate EventedPLEG to control whether this feature is enabled.

PR#86139 In previous versions, when httpGet was used in a container's preStop or postStart lifecycle hook, plain HTTP was still used even if the scheme was set to HTTPS, and the headers set by the user were not applied to the request.

lifecycle:
  postStart:
    httpGet:
      port: 443
      scheme: HTTPS
      httpHeaders:
      - name: HEADER
        value: VALUE

This PR fixes these problems. If the HTTPS request fails, kubelet falls back to an HTTP request; when the fallback occurs, a LifecycleHTTPFallback event is created for the Pod and the kubelet_lifecycle_handler_http_fallbacks_total metric is updated.

In addition, an Alpha feature gate, ConsistentHTTPGetHandlers, has been added. Users can set --feature-gates=ConsistentHTTPGetHandlers=false on the kubelet to turn off the fallback behavior.

KEP-3503 Windows allows specifying whether Pods are added to the node's network namespace PR#112961

Adds the Alpha feature gate WindowsHostNetworking to control whether this feature is enabled.

PR#112414 Multiple options lines in /etc/resolv.conf are now merged into a single options line in Pods.

options ndots:1 attempts:3
options ndots:1 attempts:3 ndots:5

->

options ndots:5 attempts:3
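The merge shown above can be sketched in a few lines. This is only an illustration of the idea, not kubelet's implementation: it assumes that when an option appears more than once the last value wins, as in the example, and the function name merge_resolv_options is hypothetical.

```python
def merge_resolv_options(lines):
    """Merge several resolv.conf "options" lines into one line.

    Assumption: for a repeated option the last value wins, while the
    option keeps the position where it was first seen.
    """
    merged = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "options":
            continue  # ignore nameserver, search, comments, blank lines
        for opt in fields[1:]:
            name, sep, value = opt.partition(":")
            merged[name] = value if sep else None  # flags like "rotate" have no value
    if not merged:
        return ""
    parts = [name if value is None else f"{name}:{value}"
             for name, value in merged.items()]
    return " ".join(["options"] + parts)
```

Called with the two input lines from the example, the function produces the single merged line shown after the arrow.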

BUG FIX

PR#113041 Fixed an issue where duplicate container names could cause kubelet to pick the wrong container for lifecycle.preStop when executing kubectl exec.

PR#108832 When a container sets limits.cpu but explicitly sets requests.cpu to "0", the cgroup cpuShares now takes the minimum value of 2 instead of being derived from limits.cpu.
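The behavior described for PR#108832 can be sketched as follows. This is a simplified model rather than kubelet code: the constants (1024 shares per CPU, a minimum of 2, a maximum of 262144) follow the usual cgroup v1 conventions as best understood here, and the request_set parameter is a hypothetical way to distinguish "requests.cpu omitted" from "requests.cpu explicitly 0".

```python
MIN_SHARES = 2          # smallest cpu.shares value (matches the "2" in the text)
MAX_SHARES = 262144     # assumed upper bound for cpu.shares
SHARES_PER_CPU = 1024   # assumed shares for one full CPU
MILLI_PER_CPU = 1000


def milli_cpu_to_shares(milli_cpu):
    """Convert a CPU amount in millicores to cgroup cpu.shares."""
    if milli_cpu == 0:
        return MIN_SHARES
    shares = milli_cpu * SHARES_PER_CPU // MILLI_PER_CPU
    return max(MIN_SHARES, min(shares, MAX_SHARES))


def cpu_shares(request_milli, limit_milli, request_set):
    """Pick cpu.shares for a container.

    Only when no CPU request was specified at all does the limit stand in
    for the request; an explicit request of 0 yields the minimum shares.
    """
    if not request_set and limit_milli:
        return milli_cpu_to_shares(limit_milli)
    return milli_cpu_to_shares(request_milli)
```

With an explicit request of 0 and a limit of 2 CPUs, this model returns 2 shares, whereas deriving the value from the limit would have given 2048.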

PR#112184 When kubelet only sets --cloud-provider or --node-ip, it will make sure to clear the invalid annotation in the node - alpha.kubernetes.io/provided-node-ip.

PR#113481 Fixed garbled, randomly placed timestamps in the output of kubectl logs --timestamps.

PR#112518 Fixed an issue where Pods continued to run on nodes tainted with NoExecute taints when the PodDisruptionConditions feature gate was enabled.

PR#112123 Raised the minimum value of cpuCFSQuotaPeriod from 1us to 1ms; values below 1ms now fail validation.

Related PRs: PR#112077, PR#111520, PR#63437

Functional stability upgrade

Beta -> GA

Feature Gate KEP
CPUManager KEP-3057 Graduate CPUManager to GA
DevicePlugins KEP-3573 Graduate DeviceManager to GA
KubeletCredentialProviders KEP-2133 Kubelet Credential Provider
WindowsHostProcessContainers KEP-1981 Support for Windows privileged containers

04

Storage

KEP-3294 Provision volumes from cross-namespace snapshots

Related PRs: PR#113186, kubernetes-csi/external-provisioner#805

Before Kubernetes 1.26, users could already provision volumes from snapshots with VolumeSnapshot, but a PersistentVolumeClaim could not bind to a VolumeSnapshot in another namespace.

apiVersion: v1
kind: PersistentVolumeClaim
spec:
  storageClassName: csi-hostpath-sc
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  dataSourceRef:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: new-snapshot-demo
    namespace: prod
  volumeMode: Filesystem

This feature lets users provision volumes from snapshots across namespaces by setting the newly added spec.dataSourceRef.namespace field in the PVC.

Adds the Alpha feature gate CrossNamespaceVolumeDataSource to control whether this feature is enabled.

Functional stability upgrade

Alpha -> Beta

Feature Gate KEP
RetroactiveDefaultStorageClass KEP-3333 Retroactive default StorageClass assignment
NodeOutOfServiceVolumeDetach KEP-2268 Non-graceful node shutdown

Beta -> GA

Feature Gate KEP
CSIMigrationAzureFile KEP-625 In-tree storage plugin to CSI Driver Migration
CSIMigrationvSphere KEP-1491 vSphere in-tree to CSI driver migration
DelegateFSGroupToCSIDriver KEP-2317 Allow Kubernetes to supply pod's fsgroup to CSI driver on mount

05

Network

PR#110268 Optimizes kube-proxy performance: each call to iptables-restore now sends only the rules that changed instead of the whole ruleset.

Adds the Alpha feature gate MinimizeIPTablesRestore to control whether this feature is enabled.

PR#108250 kube-proxy adds the --iptables-localhost-nodeports flag, which lets users disable access to NodePort Services through localhost.

PR#111806 If the user requests ipvs mode but the system is not configured correctly, kube-proxy no longer falls back to iptables mode and returns an error instead.

PR#113363 kube-proxy restarts if it detects that the PodCIDRs assigned to its Node (node.spec.podCIDRs) have changed.

PR#112133 Removes the deprecated "userspace" proxy mode.

Functional stability upgrade

Alpha -> Beta

Feature Gate KEP
ProxyTerminatingEndpoints KEP-1669 Proxy Terminating Endpoints
ExpandedDNSConfig KEP-2595 Expanded DNS configuration

Beta -> GA

Feature Gate KEP
MixedProtocolLBService KEP-1435 Support of mixed protocols in Services with type=LoadBalancer
ServiceIPStaticSubrange KEP-3070 Reserve Service IP Ranges For Dynamic and Static IP Allocation
ServiceInternalTrafficPolicy KEP-2086 Service Internal Traffic Policy
EndpointSliceTerminatingCondition KEP-1672 Tracking Terminating Endpoints

06

Resource Control and Coordination

KEP-3017 PodHealthyPolicy for PodDisruptionBudget PR#113375

Adds the spec.unhealthyPodEvictionPolicy field to the PodDisruptionBudget resource to control when unhealthy Pods should be considered for eviction.

The spec.unhealthyPodEvictionPolicy field currently accepts two values: IfHealthyBudget and AlwaysAllow.

spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper
  unhealthyPodEvictionPolicy: IfHealthyBudget

Adds the Alpha feature gate PDBUnhealthyPodEvictionPolicy to control whether this feature is enabled.
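To make the difference between the two values concrete, here is a small decision sketch. It is a simplified model of the eviction check, not the actual API server logic: it assumes that IfHealthyBudget allows evicting an unhealthy Pod only while the number of healthy Pods already meets the budget, that AlwaysAllow evicts unhealthy Pods regardless of the budget, and that healthy Pods are always subject to the budget. The function and parameter names are hypothetical.

```python
def can_evict(pod_healthy, current_healthy, desired_healthy,
              policy="IfHealthyBudget"):
    """Decide whether a Pod guarded by a PodDisruptionBudget may be evicted.

    current_healthy: healthy Pods currently matched by the PDB
    desired_healthy: healthy Pods the PDB requires (e.g. minAvailable)
    """
    if pod_healthy:
        # Evicting a healthy Pod must still leave the budget satisfied.
        return current_healthy - 1 >= desired_healthy
    if policy == "AlwaysAllow":
        return True
    if policy == "IfHealthyBudget":
        # An unhealthy Pod does not count toward current_healthy, so evicting
        # it is allowed whenever the budget is already met.
        return current_healthy >= desired_healthy
    raise ValueError(f"unknown unhealthyPodEvictionPolicy: {policy}")
```

Under this model, AlwaysAllow is the value to choose when unhealthy Pods should never block operations such as a node drain.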

KEP-3335 StatefulSet Start Ordinal PR#112744

Until now, a StatefulSet always numbered its Pods starting from 0.

This feature adds a spec.ordinals.start field to StatefulSet to control the ordinal at which Pod numbering starts.

apiVersion: apps/v1
kind: StatefulSet
spec:
  ordinals:
    start: 1

Adds the Alpha feature gate StatefulSetStartOrdinal to control whether this feature is enabled.
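The effect of the start ordinal on Pod names can be illustrated with a one-line helper. The helper name pod_names and the StatefulSet name web are made up for the example; the sketch assumes Pods are numbered consecutively from start to start + replicas - 1.

```python
def pod_names(statefulset_name, replicas, start=0):
    """List the Pod names a StatefulSet would use for a given start ordinal."""
    return [f"{statefulset_name}-{ordinal}"
            for ordinal in range(start, start + replicas)]


# Default numbering versus spec.ordinals.start: 1
print(pod_names("web", 3))           # ['web-0', 'web-1', 'web-2']
print(pod_names("web", 3, start=1))  # ['web-1', 'web-2', 'web-3']
```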

PR#112011 If multiple HPAs select the same Pod, the HPA stops scaling and sets its ScalingActive condition to False with the reason AmbiguousSelector. This also covers multiple HPAs that target the same Deployment.

Functional stability upgrade

Alpha -> Beta

Feature Gate KEP
JobPodFailurePolicy KEP-3329 Retriable and non-retriable Pod failures for Jobs
PodDisruptionConditions KEP-3329 Retriable and non-retriable Pod failures for Jobs

Beta -> GA

Feature Gate KEP
JobTrackingWithFinalizers KEP-2307 Job tracking without lingering Pods

07

Pod scheduling

KEP-3521 Pod Scheduling Readiness

Related PR: PR#113275,PR#113274,PR#113442

Not every Pod in the Pending state is actually ready to be scheduled. Some Pods cannot be scheduled successfully because required resources are missing, and repeatedly trying them creates extra work for the scheduler.

This feature adds the spec.schedulingGates field to the Pod to control whether the Pod is ready to be scheduled.

spec:
  schedulingGates:
  - name: <value>

Scheduling only starts once spec.schedulingGates has been cleared; until then, the Pod stays in the SchedulingGated state:

$ kubectl get pod example-po
NAME         READY   STATUS            RESTARTS   AGE
example-po   0/1     SchedulingGated   0          30s

Adds the Alpha feature gate PodSchedulingReadiness to control whether this feature is enabled.

PR#111726 The scheduler's debug dumper now also outputs information about Pods in the Pending state.

Optimization and BUG FIX

PR#111809 When Patch is used to update Pod status, ServiceUnavailable and InternalError errors are now retried in addition to net.ConnectionRefused.

A ServiceUnavailable error occurs when the APIServer is temporarily unable to process a request.

E0805 20:54:21.624945 123623 scheduler.go:356] Error updating pod foo: the server is currently unable to handle the request (patch pods foo)

InternalError usually occurs due to a temporary failure of the webhook.

E0811 23:32:30.886582 213747 scheduler.go:357] Error updating pod foo: Internal error occurred: failed calling webhook "xyz": Post "xyz": context deadline exceeded
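The retry rule can be sketched generically: transient error types are retried, everything else is raised immediately. The exception classes and the patch_with_retry helper below are hypothetical stand-ins for illustration; the scheduler itself is written in Go and uses its own retry utilities.

```python
import time


class ConnectionRefused(Exception):
    """Stand-in for net.ConnectionRefused."""


class ServiceUnavailable(Exception):
    """Stand-in for a ServiceUnavailable API error."""


class InternalError(Exception):
    """Stand-in for an InternalError API error, e.g. a webhook timeout."""


class Forbidden(Exception):
    """Stand-in for a non-retriable error."""


RETRIABLE = (ConnectionRefused, ServiceUnavailable, InternalError)


def patch_with_retry(patch, attempts=3, delay=0.0):
    """Call patch() until it succeeds or the attempts are exhausted.

    Only errors listed in RETRIABLE are retried; any other exception
    propagates on the first failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return patch()
        except RETRIABLE:
            if attempt == attempts:
                raise  # give up and surface the last transient error
            time.sleep(delay)
```

A real client would normally add backoff between attempts rather than a fixed delay.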

Functional stability upgrade

Alpha -> Beta

Feature Gate KEP
NodeInclusionPolicy KEP-3094 Take taints/tolerations into consideration when calculating PodTopologySpread skew

08

Observability

KEP-3466 Kubernetes Component Health SLIs

Previously there was no standard format for exposing the health information of Kubernetes components. This feature adds a new path, /metrics/slis, to each component to expose service level indicators (SLIs) in the Prometheus format.

Each component exposes two metrics:

  • gauge: the current health check status of the component

# HELP kubernetes_healthcheck [ALPHA] This metric records the result of a single healthcheck.
# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="etcd",type="healthz"} 1
kubernetes_healthcheck{name="etcd",type="readyz"} 1

  • counter: cumulative count of detected states for each health check

# HELP kubernetes_healthchecks_total [ALPHA] This metric records the results of all healthcheck.
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="etcd",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15

Related PRs by component:

  • Kube APIServer: PR#112741

  • Kube Controller Manager: PR#112978

  • Kube Scheduler: PR#113026

  • Kubelet: PR#113030

  • Kube Proxy: PR#113057

  • Cloud Controller Manager: PR#113340

Adds the Alpha feature gate ComponentSLIs to control whether this feature is enabled.
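As an illustration of how the /metrics/slis output might be consumed, the sketch below parses text in the format shown above and lists the health checks whose gauge is not 1. It is a minimal parser for this simple label format only, not a full Prometheus exposition parser, and it assumes that a gauge value of 1 means the check is passing, as the sample suggests. The function names are hypothetical.

```python
import re

# metric{label="value",...} number
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (metric, labels, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = SAMPLE.match(line)
        if match is None:
            continue  # lines without labels are not needed for this sketch
        metric, raw_labels, value = match.groups()
        samples.append((metric, dict(LABEL.findall(raw_labels)), float(value)))
    return samples


def failing_checks(text):
    """List (name, type) of health checks whose gauge is not 1."""
    return [(labels["name"], labels["type"])
            for metric, labels, value in parse_samples(text)
            if metric == "kubernetes_healthcheck" and value != 1]
```

Feeding it the gauge sample above would yield an empty list, since both etcd checks report 1.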

New metrics have also been added to each component, and several metric calculation problems have been fixed.

09

Kubectl commands

Subcommand enhancements and fixes

PR#109525 The kubectl wait command now accepts fields that do not exist yet in --for=jsonpath=, which is useful when some fields are set asynchronously.

PR#111096 kubectl api-resources adds a categories column to the -o wide output and a --categories parameter for filtering by category.

PR#113819 kubectl alpha events has been promoted to the top-level command kubectl events.

PR#111093 Fixed an issue where kubectl rollout history --revision=<version> -o json|yaml <resource> returned the latest revision instead of the specified one when printing JSON or YAML.

PR#111571 Improves the message printed by kubectl label --dry-run so that users do not mistakenly think the label has been set.

Before:

$ kubectl label pod foo bar=baz --dry-run=server
pod/foo labeled

After:

$ kubectl label pod foo bar=baz --dry-run=server
pod/foo labeled (server dry run)

PR#112556 Improves the error message shown when kubectl patch uses StrategicMergePatch to update custom resources.

PR#112700 Fixes kubectl convert choosing the wrong API version.

PR#109505 kubectl annotate no longer throws an error when setting an annotation to the same value it already has.

PR#110907 When kubectl apply is run with --namespace but without --prune-allowlist, non-namespaced resources are also pruned. This PR only adds a warning. Starting in 1.28, when kubectl apply specifies a namespace, non-namespaced resources such as PersistentVolumes and Namespaces will no longer be deleted.

PR#113116 kubectl apply adds the --prune-allowlist flag, used together with --prune, to replace the deprecated --prune-whitelist flag.

Other

PR#113146 The kubectl explain command can use OpenAPIv3 through the environment variable KUBECTL_EXPLAIN_OPENAPIV3.

PR#112553 Kubectl escapes terminal special characters in output. Fixes CVE-2021-25743.

PR#112150 Optimize kubectl's display of invalid requests returned by APIServer.

PR#112243, PR#112261 Deprecate several flags of the kubectl run command; these flags are ignored even if they are set.

Shell Completion

PR#113636 kubectl shell completion supports displaying command descriptions in bash.

bash-5.1$ kubectl a[tab][tab]
alpha          (Commands for features in alpha)
annotate       (Update the annotations on a resource)
api-resources  (Print the supported API resources on the server)
api-versions   (Print the supported API versions on the server, in the form of "group/version")
apply          (Apply a configuration to a resource by file name or stdin)
argo           (The command argo is a plugin installed by the user)
attach         (Attach to a running container)
auth           (Inspect authorization)
autoscale      (Auto-scale a deployment, replica set, stateful set, or replication controller)

bash-5.1$ kubectl --c[tab][tab]
--cache-dir              (Default cache directory)
--certificate-authority  (Path to a cert file for the certificate authority)
--client-certificate     (Path to a client certificate file for TLS)
--client-key             (Path to a client key file for TLS)
--cluster                (The name of the kubeconfig cluster to use)
--context                (The name of the kubeconfig context to use)

PR#105867 Adds shell completion for kubectl plugins: a plugin can provide completion for its own commands through an executable named kubectl_complete-<pluginName>.

10

Kubeadm

Command fixes and enhancements

PR#113005 kubeadm join phase control-plane-prepare certs supports running with --dry-run.

PR#112945 Supports dry-run mode for sub-phases, e.g. kubeadm reset phase cleanup-node --dry-run.

PR#111512 Adds a new phase, show-join-command, to the kubeadm init command. Users can pass kubeadm init --skip-phases=show-join-command to skip printing the join information. This phase cannot be executed on its own.

PR#112172 The kubeadm reset command adds the --cleanup-tmp-dir flag, which cleans up the contents of /etc/kubernetes/tmp; the default is false.

PR#112732 kubeadm adds validation of the image repository format in its configuration.

PR#111783 When the CertificateAuthorityData of the kubeconfig read by kubeadm is empty, it will try to load the CA certificate from the external CertificateAuthority file.

PR#112508 Allow RSA and ECDSA format keys in preflight check.

PR#110972 kubeadm reset will try to clean up old data as much as possible during execution. Old data will be cleared when each reset phase is executed, and the default etcd data directory will be deleted when the remove-etcd-member phase is executed.

PR#112751 Fix the bug when validating ClusterConfiguration network related fields (dnsDomain, serviceSubnet, podSubnet).

Other

PR#111277 Optimize the error message when kubeadm runs subcommands.

PR#112008 Because the node-role.kubernetes.io/master taint is no longer set on control plane nodes as of 1.25, kubeadm no longer sets the node-role.kubernetes.io/master toleration on the CoreDNS Deployment.

PR#112000 The kubeadm init|join|upgrade commands remove the --container-runtime flag: since dockershim was removed, the only value this flag could take was --container-runtime=remote.

11

Client-Go

PR#112200 client-go's SharedInformerFactory adds a Shutdown method that waits for all running informers in the factory to stop.

12

Removed features

The new version removes the feature gates of the following features, which have already reached GA:

  • ServiceLoadBalancerClass

  • ServiceLBNodePortControl

  • CSRDuration

  • DefaultPodTopologySpread

  • NonPreemptingPriority

  • PodAffinityNamespaceSelector

  • PreferNominatedNode

  • PodOverhead

  • UnversionedKubeletConfigMap

  • IndexedJob

  • SuspendJob
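Because a component refuses to start when a removed gate is still set, it can be worth checking existing --feature-gates values before upgrading. The helper below is a small sketch of such a check; the function name is hypothetical, and the gate names are copied from the list above, with the overhead gate spelled PodOverhead as in upstream Kubernetes.

```python
REMOVED_IN_1_26 = {
    "ServiceLoadBalancerClass",
    "ServiceLBNodePortControl",
    "CSRDuration",
    "DefaultPodTopologySpread",
    "NonPreemptingPriority",
    "PodAffinityNamespaceSelector",
    "PreferNominatedNode",
    "PodOverhead",
    "UnversionedKubeletConfigMap",
    "IndexedJob",
    "SuspendJob",
}


def removed_gates(flag_value):
    """Return the removed gates found in a --feature-gates value.

    flag_value is the comma-separated "Name=bool" list passed to a
    component, e.g. "IndexedJob=true,EventedPLEG=true".
    """
    found = []
    for item in flag_value.split(","):
        name = item.split("=", 1)[0].strip()
        if name in REMOVED_IN_1_26:
            found.append(name)
    return found


print(removed_gates("IndexedJob=true,EventedPLEG=true,SuspendJob=true"))
# ['IndexedJob', 'SuspendJob']
```

Any gate the helper reports should be dropped from the component's flags or configuration file before moving that component to 1.26.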

13

Functional downgrade

Beta -> Alpha

LocalStorageCapacityIsolationFSQuotaMonitoring was promoted to Beta in v1.25, but it has been rolled back to Alpha because of an issue where updated ConfigMaps were not synchronized to the Pod's file system correctly.

References:

[1] KEP-2876 CRD Validation Expression Language: https://github.com/kubernetes/enhancements/issues/2876

[2] Report: https://docs.google.com/document/d/1rMlYKOVyujboAEG2epxSYdx7eyevC7dypkD_kUlBxn4/edit

[3] Meeting minutes: https://youtu.be/GKBqyV8y8j0


About the author

Cai Wei

Senior Cloud Native R&D Engineer of "DaoCloud"

Founder of the open source project Clusterpedia


Origin blog.csdn.net/DaoCloud_daoke/article/details/128497129