How to run stateful applications on Kubernetes

Kubernetes is one of the fastest-growing infrastructure projects. In just five years, it has matured into the foundation of modern infrastructure. From managed containers as a service (CaaS) in the public cloud to enterprise platform as a service (PaaS) in the data center, Kubernetes is becoming ubiquitous.

In its early days, Kubernetes was mainly considered a platform for running web-scale stateless services. Stateful services (such as database and analytics workloads) either ran in virtual machines or ran as cloud-based managed services. But as Kubernetes has become the most popular infrastructure layer, its ecosystem has worked to make stateful applications a first-class part of the Kubernetes landscape.

There are many techniques for running stateful applications in Kubernetes, each with its own advantages and disadvantages.

This article focuses on the key methods of running stateful applications in Kubernetes, the options available for each, and the types of workloads each method suits. It assumes familiarity with the key building blocks of Kubernetes storage infrastructure: persistent volumes (PVs), persistent volume claims (PVCs), and storage classes.

Cluster shared storage

The first method is to integrate the Kubernetes cluster with traditional storage infrastructure exposed through Samba, NFS, or GlusterFS. This approach can be easily extended to cloud-based shared file systems such as Amazon EFS, Azure Files, and Google Cloud Filestore.

In this architecture, the storage layer is completely decoupled from the compute layer managed by Kubernetes. There are two ways to consume shared storage in Kubernetes Pods:

1) Native configuration: Fortunately, most shared file systems either have a built-in volume plugin in upstream Kubernetes or have a container storage interface (CSI) driver. This lets cluster administrators declaratively define persistent volumes (PVs) with parameters specific to the shared file system or managed service; see the first sketch after this list.

2) Host-based configuration: In this method, a startup script running on each node is responsible for mounting the shared storage. Every node in the Kubernetes cluster then exposes a consistent, well-known mount point to workloads. Persistent volumes point to the host directory through hostPath or a Local PV; see the second sketch after this list.
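For illustration, here is a minimal sketch of the native configuration path using the in-tree NFS plugin; the server address, export path, and sizes are hypothetical placeholders:

```yaml
# Statically provisioned PV backed by an NFS export (hypothetical endpoint).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany                # many Pods on many nodes may mount it
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.internal   # hypothetical NFS server address
    path: /exports/shared          # hypothetical export path
---
# Claim that binds to the PV above; empty storageClassName skips dynamic provisioning.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 100Gi
```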
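And a minimal sketch of the host-based path, assuming a startup script has already mounted the shared file system at the same path on every node (the path and size are hypothetical):

```yaml
# PV pointing at a well-known mount point present on every node.
# Access modes are not enforced for hostPath; multi-node access comes
# from the shared file system mounted underneath.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-hostpath-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /mnt/shared   # hypothetical mount point created by the node startup script
```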


Because the underlying storage handles durability and persistence, workloads are completely decoupled from it. Pods can be scheduled on any node without node affinity rules pinning them to specific nodes.
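As a sketch of how decoupled this is, the hypothetical Deployment below runs three replicas that all mount the same shared claim defined earlier; Kubernetes is free to place them on any node:

```yaml
# Three replicas share one ReadWriteMany claim; no nodeAffinity is needed
# because every node can reach the shared storage.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cms
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cms
  template:
    metadata:
      labels:
        app: cms
    spec:
      containers:
        - name: cms
          image: nginx:1.25        # stand-in image for a content management system
          volumeMounts:
            - name: shared-data
              mountPath: /var/www/html
      volumes:
        - name: shared-data
          persistentVolumeClaim:
            claimName: shared-nfs-pvc   # the claim from the earlier sketch
```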

However, this approach is not ideal for stateful workloads that require high I/O throughput. Shared file systems are not designed to deliver the IOPS required by relational databases, NoSQL databases, and other write-intensive workloads.

Storage options: GlusterFS, Samba, NFS, Amazon EFS, Azure Files, Google Cloud Filestore.

Typical workloads: content management systems, machine learning training/inference jobs, and digital asset management systems.

StatefulSet

Kubernetes maintains the desired configuration state through controllers. Deployment, ReplicaSet, DaemonSet, and StatefulSet are some commonly used controllers.

StatefulSet is a special type of controller that makes it easy to run clustered workloads in Kubernetes. Clustered workloads usually have one or more masters and multiple slaves. Most databases are designed to run in clustered mode to provide high availability and fault tolerance.

Stateful clustered workloads continuously replicate data between the masters and slaves. For this, the cluster infrastructure expects the participating entities (masters and slaves) to have consistent, well-known endpoints so they can reliably synchronize state. But in Kubernetes, Pods are designed to be ephemeral, with no guarantee that they keep the same name or IP address.

Another requirement of stateful clustered workloads is a durable storage backend that is fault-tolerant and capable of handling the required IOPS.

StatefulSet was introduced to make it easier to run stateful clustered workloads in Kubernetes. It ensures that Pods belonging to a StatefulSet have stable, unique identifiers. They follow a predictable naming convention and also support ordered, graceful deployment and scaling.

Each Pod participating in a StatefulSet has a corresponding persistent volume claim (PVC) that follows a similar naming convention. When a Pod is terminated and rescheduled on another node, the Kubernetes controller ensures that the Pod is associated with the same PVC, guaranteeing that its state stays intact.
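A minimal, hypothetical StatefulSet sketch shows both behaviors: stable Pod names (zk-0, zk-1, ...) and per-Pod claims stamped out of volumeClaimTemplates (data-zk-0, data-zk-1, ...):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-headless        # assumes a headless Service giving each Pod stable DNS
  replicas: 3
  selector:
    matchLabels:
      app: zk
  template:
    metadata:
      labels:
        app: zk
    spec:
      containers:
        - name: zookeeper
          image: zookeeper:3.8
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data                # yields PVCs data-zk-0, data-zk-1, data-zk-2
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd   # hypothetical SSD-backed class, sketched below
        resources:
          requests:
            storage: 20Gi
```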

Since each Pod in a StatefulSet has a dedicated PVC and PV, there is no hard requirement to use shared storage. But a StatefulSet does need a fast, reliable, and durable storage layer behind it, such as an SSD-based block storage device. The block storage device should ensure that writes are fully committed to disk, and regular backups and snapshots can then be taken from it.
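As one example of such a backing layer, the sketch below defines an SSD-backed StorageClass using the AWS EBS CSI driver; the class name is hypothetical, and the provisioner and parameters vary by cloud and driver version:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com    # AWS EBS CSI driver
parameters:
  type: gp3                     # SSD-backed volume type
volumeBindingMode: WaitForFirstConsumer   # delay binding until the Pod is scheduled
allowVolumeExpansion: true
```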

Storage options: SSD-based block storage devices (such as Amazon EBS, Azure Disks, GCE Persistent Disk).

Typical workloads: Apache ZooKeeper, Apache Kafka, Percona Server for MySQL, PostgreSQL Automatic Failover and JupyterHub.

Cloud native storage

The rise of Kubernetes has created a new market segment. Because storage is one of the key components of cloud-native infrastructure, the cloud-native storage segment has grown rapidly in recent years.

Cloud-native storage brings traditional storage primitives and workflows to Kubernetes. Like other services, it is abstracted from the underlying hardware and operating system. From provisioning to decommissioning, the workflow follows the same lifecycle as typical Kubernetes resources. Cloud-native storage is application-centric, meaning it understands the context of the workload rather than sitting as an independent layer outside the cluster. Like other resources, it can be scaled up and down based on workload conditions and characteristics. It can pool the individual disks attached to each node and expose them to Kubernetes Pods as a unified logical volume.

From installing a storage cluster to resizing volumes, cloud-native storage lets Kubernetes administrators use familiar YAML artifacts managed through the kubectl CLI. Cloud-native storage offers features such as dynamic provisioning, support for multiple file systems, snapshots, local and remote backups, and dynamic volume resizing.
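For instance, assuming the CSI snapshot CRDs and a driver-specific VolumeSnapshotClass are installed, taking a snapshot of one StatefulSet member's claim is a single declarative object:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-zk-0-snap
spec:
  volumeSnapshotClassName: csi-snapclass   # hypothetical, name depends on the CSI driver
  source:
    persistentVolumeClaimName: data-zk-0   # PVC created by the StatefulSet sketch above
```

Resizing is equally declarative: when the StorageClass sets allowVolumeExpansion: true, patching spec.resources.requests.storage on the PVC (for example with kubectl patch) grows the volume in place.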

The only expectation a cloud-native storage platform has is the availability of raw storage in the cluster, which it can pool and aggregate into logical volumes. The raw storage can be direct-attached storage (DAS) in an on-premises cluster, or block storage attached to a managed cluster running in the public cloud.


Cloud-native storage is to containers what block storage is to virtual machines. Both are logical storage layers decoupled from the underlying physical storage. Block storage attaches to VMs, while cloud-native storage is consumed through the persistent volumes used by containers.

Most cloud-native storage platforms come with a custom scheduler to support hyperconvergence of storage and compute. The custom scheduler works with Kubernetes' built-in scheduler to ensure that a Pod always lands on the node that holds its data.
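Opting a workload into such a scheduler is typically a one-line change in the Pod spec. The sketch below uses Stork, the storage-aware scheduler that ships with Portworx; other platforms use their own scheduler names, and the Pod itself is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: db-0
spec:
  schedulerName: stork            # delegate placement to the storage-aware scheduler
  containers:
    - name: postgres
      image: postgres:16
      env:
        - name: POSTGRES_PASSWORD
          value: example          # placeholder only, not for production use
```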

Storage options: NetApp Trident, Maya Data, Portworx, Reduxio, Red Hat OpenShift Container Storage, Robin Systems, Rook, StorageOS.

Typical workloads: any workload that expects durability and persistence.
