Evolution of Kubernetes: Transition from etcd to Distributed SQL

Denis Magda, a developer relations (DevRel) expert, stumbled upon an article explaining how to seamlessly replace etcd with PostgreSQL. The article points out that the Kine project serves as an external etcd endpoint, translating Kubernetes etcd requests into SQL queries for an underlying relational database.

Inspired by this approach, Magda decided to explore Kine's potential further by switching from etcd to YugabyteDB, a distributed SQL database built on PostgreSQL.

What's wrong with etcd?

etcd is the key-value store used by Kubernetes to store all cluster data.

It usually goes unnoticed until a Kubernetes cluster encounters scalability or high availability (HA) issues. Managing etcd in a scalable and highly available manner is especially challenging for large Kubernetes deployments.

Additionally, the Kubernetes community has growing concerns about the future development of the etcd project. Its community size is shrinking, and only a few maintainers have the interest and ability to support and advance the project.

These problems gave birth to Kine, an etcd API-to-SQL translation layer. Kine officially supports SQLite, PostgreSQL, and MySQL, systems that are growing in usage and have strong communities.
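
To make the translation concrete, here is a rough sketch of the single table Kine manages. The column names match the queries shown later in this article; the exact types are assumptions rather than Kine's verbatim schema.

SQL
 -- Approximate shape of the table Kine creates and maps the etcd API onto:
 CREATE TABLE IF NOT EXISTS kine (
     id BIGSERIAL PRIMARY KEY,  -- global, monotonically increasing revision
     name VARCHAR(630),         -- the etcd key, e.g. '/registry/pods/...'
     created INTEGER,           -- flag: this row created the key
     deleted INTEGER,           -- flag: this row deleted the key
     create_revision BIGINT,    -- revision at which the key was created
     prev_revision BIGINT,      -- previous revision of the same key
     lease INTEGER,             -- etcd lease attached to the key, if any
     value BYTEA,               -- current value of the key
     old_value BYTEA            -- previous value of the key
 );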

Why choose a distributed SQL database?

While PostgreSQL, SQLite, and MySQL are solid choices for backing Kubernetes through Kine, they are designed and optimized for single-server deployments. This can pose challenges, especially for larger Kubernetes deployments with more stringent scalability and availability requirements.

If a developer's Kubernetes cluster requires an RPO (recovery point objective) of zero and an RTO (recovery time objective) measured in seconds, then architecting and maintaining a MySQL or PostgreSQL deployment that meets those targets will be a challenge. Readers interested in digging deeper can explore PostgreSQL's high-availability options.

A distributed SQL database is deployed as a cluster of interconnected nodes spanning multiple racks, availability zones, or regions. By design it is highly available and scalable, and those same properties carry over to the Kubernetes cluster built on top of it.

Start Kine on YugabyteDB

The decision to use YugabyteDB as the distributed SQL database for Kubernetes was driven by its PostgreSQL compatibility. YugabyteDB is built on the PostgreSQL source code, reusing the upper half of PostgreSQL (the query engine) while providing its own distributed storage implementation.

The tight connection between YugabyteDB and PostgreSQL made it possible to adapt Kine's PostgreSQL implementation for YugabyteDB. Still, as you will see, this is not a simple lift-and-shift story.

Now, turn those thoughts into action and start Kine on YugabyteDB. For this, an Ubuntu 22.04 virtual machine with 8 CPUs and 32GB of memory is used.

First, start a three-node YugabyteDB cluster on the virtual machine. You can experiment with a distributed SQL database on a single server before going truly distributed. There are various ways to start YugabyteDB locally, but the author's preferred method is Docker:

Shell 
 mkdir ~/yb_docker_data

 docker network create custom-network

 docker run -d --name yugabytedb_node1 --net custom-network \
   -p 15433:15433 -p 7001:7000 -p 9001:9000 -p 5433:5433 \
   -v ~/yb_docker_data/node1:/home/yugabyte/yb_data --restart unless-stopped \
   yugabytedb/yugabyte:latest \
   bin/yugabyted start --tserver_flags="ysql_sequence_cache_minval=1" \
   --base_dir=/home/yugabyte/yb_data --daemon=false

 docker run -d --name yugabytedb_node2 --net custom-network \
   -p 15434:15433 -p 7002:7000 -p 9002:9000 -p 5434:5433 \
   -v ~/yb_docker_data/node2:/home/yugabyte/yb_data --restart unless-stopped \
   yugabytedb/yugabyte:latest \
   bin/yugabyted start --join=yugabytedb_node1 --tserver_flags="ysql_sequence_cache_minval=1" \
   --base_dir=/home/yugabyte/yb_data --daemon=false

 docker run -d --name yugabytedb_node3 --net custom-network \
   -p 15435:15433 -p 7003:7000 -p 9003:9000 -p 5435:5433 \
   -v ~/yb_docker_data/node3:/home/yugabyte/yb_data --restart unless-stopped \
   yugabytedb/yugabyte:latest \
   bin/yugabyted start --join=yugabytedb_node1 --tserver_flags="ysql_sequence_cache_minval=1" \
   --base_dir=/home/yugabyte/yb_data --daemon=false

Note: Set ysql_sequence_cache_minval=1 when starting the YugabyteDB nodes to ensure that database sequences increment by one. Without this option, each Kine connection to YugabyteDB caches the next 100 sequence IDs, which can lead to "version mismatch" errors during Kubernetes cluster bootstrapping: one Kine connection might insert records with IDs from 1 to 100 while another inserts records with IDs from 101 to 200.
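
A hypothetical two-session illustration of that caching behavior, assuming the kine_id_seq sequence that Kine creates:

SQL
 -- Session A: the first nextval() reserves IDs 1-100 in this connection's cache
 SELECT nextval('kine_id_seq');  -- returns 1
 -- Session B: receives the next uncached block, skipping 2-100
 SELECT nextval('kine_id_seq');  -- returns 101
 -- With ysql_sequence_cache_minval=1, session B would simply get 2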

Next, start a Kine instance connected to YugabyteDB using the PostgreSQL implementation:

(1) Clone the Kine library:

Shell 
 git clone https://github.com/k3s-io/kine.git && cd kine

(2) Start a Kine instance connected to the local YugabyteDB cluster:

Shell 
 go run . --endpoint postgres://yugabyte:yugabyte@127.0.0.1:5433/yugabyte

(3) Connect to YugabyteDB and confirm that the Kine architecture is ready:

SQL 
 psql -h 127.0.0.1 -p 5433 -U yugabyte

 yugabyte=# \d
            List of relations
  Schema |    Name     |   Type   |  Owner
 --------+-------------+----------+----------
  public | kine        | table    | yugabyte
  public | kine_id_seq | sequence | yugabyte
 (2 rows)

Great, the first test was successful. Kine sees YugabyteDB as PostgreSQL and starts without any issues. Now on to the next stage: Booting Kubernetes on top of Kine with YugabyteDB.

Launch Kubernetes on Kine with YugabyteDB

Kine can be used with various Kubernetes engines, including standard Kubernetes deployments, Rancher Kubernetes Engine (RKE), and K3s (a lightweight Kubernetes engine). For simplicity, the last of these is used here.

A K3s cluster can be started with a simple command:

(1) Stop the Kine instance started in the previous section.

(2) Start K3s connected to the same local YugabyteDB cluster (the K3s executable ships with Kine built in):

Shell 
 curl -sfL https://get.k3s.io | sh -s - server --write-kubeconfig-mode=644 \
   --token=sample_secret_token \
   --datastore-endpoint="postgres://yugabyte:yugabyte@127.0.0.1:5433/yugabyte"

(3) Kubernetes should start without problems, which can be confirmed by running the following command:

Shell 
 k3s kubectl get nodes

 NAME        STATUS   ROLES                  AGE     VERSION
 ubuntu-vm   Ready    control-plane,master   7m13s   v1.27.3+k3s1

Kubernetes runs seamlessly on YugabyteDB thanks to YugabyteDB's feature and runtime compatibility with PostgreSQL: most libraries, drivers, and frameworks created for PostgreSQL can be reused as-is.

The journey could have ended here, but a look back at the K3s logs turned up something interesting. During Kubernetes bootstrapping, the logs reported slow queries like this one:

SQL 
 INFO[0015] Slow SQL (total time: 3s):
 SELECT *
 FROM (
     SELECT
         (SELECT MAX(rkv.id) AS id
          FROM kine AS rkv),
         (SELECT MAX(crkv.prev_revision) AS prev_revision
          FROM kine AS crkv
          WHERE crkv.name = 'compact_rev_key'),
         kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision,
         kv.prev_revision, kv.lease, kv.value, kv.old_value
     FROM kine AS kv
     JOIN (
         SELECT MAX(mkv.id) AS id
         FROM kine AS mkv
         WHERE mkv.name LIKE $1
         GROUP BY mkv.name
     ) AS maxkv ON maxkv.id = kv.id
     WHERE kv.deleted = 0 OR $2
 ) AS lkv
 ORDER BY lkv.theid ASC
 LIMIT 10001

This may not be a significant issue when running YugabyteDB on a single machine, but once you switch to a distributed setup, such queries can become hot spots and create bottlenecks.

So Magda cloned the Kine source code and started exploring the PostgreSQL implementation for potential optimization opportunities.

Optimizing Kine for YugabyteDB

Here, Magda teamed up with Franck Pachot, a database expert well versed in optimizing at the SQL layer with little or no change to application logic.

After examining the database schema generated by Kine and running EXPLAIN ANALYZE on some queries, Franck suggested basic optimizations that would benefit any distributed SQL database.
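
As an illustration of that approach (not Franck's actual analysis), the aggregate subquery from the slow statement above can be profiled directly:

SQL
 -- Inspect how the MAX(id) subquery from the slow statement is executed:
 EXPLAIN ANALYZE SELECT MAX(id) FROM kine;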

Fortunately, the optimizations require no changes to Kine's application logic; all it takes is a few SQL-level enhancements. So a Kine fork with direct YugabyteDB support was created.

The YugabyteDB implementation includes three optimizations over the PostgreSQL one:

(1) The primary key of the kine table was changed from PRIMARY KEY (id) to PRIMARY KEY (id ASC). By default, YugabyteDB uses hash sharding to distribute records evenly across the cluster, but Kubernetes runs many range queries over the id column, which makes it reasonable to switch to range sharding, as the sketch below shows.
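
A minimal sketch of the difference, using hypothetical table names (the real change lives in the fork's DDL for the kine table):

SQL
 -- Hash sharding (the YugabyteDB default): rows spread evenly across tablets,
 -- so a range scan over id must touch many tablets
 CREATE TABLE kine_hash (id BIGSERIAL, value BYTEA, PRIMARY KEY (id));

 -- Range sharding: rows ordered by id, so a range scan over id
 -- reads a contiguous slice of tablets
 CREATE TABLE kine_range (id BIGSERIAL, value BYTEA, PRIMARY KEY (id ASC));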

(2) The kine_name_prev_revision_uindex index was turned into a covering index by including the id column in the index definition:

CREATE UNIQUE INDEX IF NOT EXISTS kine_name_prev_revision_uindex ON kine (name asc, prev_revision asc) INCLUDE(id);

In YugabyteDB, indexes are distributed the same way as table records, so index entries may reference ids stored on other YugabyteDB nodes. Including id in the secondary index avoids extra network round trips between nodes, as illustrated below.
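
One way to check the effect: ask the planner whether a lookup that needs only the id can be answered from the index alone (an index-only scan). A hypothetical probe, with the final plan up to the optimizer:

SQL
 EXPLAIN (COSTS OFF)
 SELECT id FROM kine
 WHERE name = 'compact_rev_key' AND prev_revision = 0;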

(3) Kine performs many joins while fulfilling Kubernetes requests. If the query planner decides to use a nested loop join, the YugabyteDB query layer by default reads and joins one record at a time. Batched nested loop joins speed this up, and YugabyteDB's Kine implementation enables them by executing the following statement at startup:

ALTER DATABASE " + dbName + " set yb_bnl_batch_size=1024;
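
Since ALTER DATABASE only affects sessions opened afterwards, a quick sanity check from a fresh connection confirms the setting took hold (an illustrative check, not part of Kine):

SQL
 SHOW yb_bnl_batch_size;  -- expected to return 1024 on a new connection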

Give this optimized YugabyteDB implementation a try.

First, stop the previous K3s service and delete the Kine schema from the YugabyteDB cluster:

(1) Stop and delete the K3s service:

Shell 
 sudo /usr/local/bin/k3s-uninstall.sh
 sudo rm -r /etc/rancher

(2) Drop the schema:

SQL 
 psql -h 127.0.0.1 -p 5433 -U yugabyte

 drop table kine cascade;

Next, start a Kine instance running the optimized YugabyteDB implementation:

(1) Clone the fork:

Shell 
 git clone https://github.com/dmagda/kine-yugabytedb.git && cd kine-yugabytedb

(2) Start Kine:

Shell 
 go run . --endpoint "yugabytedb://yugabyte:yugabyte@127.0.0.1:5433/yugabyte"

Kine starts up without any problems. The only difference is that instead of specifying "postgres" in the connection string, you specify "yugabytedb" to enable the optimized YugabyteDB implementation. For the actual communication with YugabyteDB, Kine continues to use Go's standard PostgreSQL driver.

Build Kubernetes on an optimized version of Kine

Finally, start K3s on this optimized version of Kine.

To do this, first build K3s from source:

(1) Stop the Kine instance started in the previous section.

(2) Clone the K3s repository:

Shell 
 git clone --depth 1 https://github.com/k3s-io/k3s.git && cd k3s

(3) Open the go.mod file and add the following line at the end of the replace(..) section:

Go 
 github.com/k3s-io/kine => github.com/dmagda/kine-yugabytedb v0.2.0

This directive tells Go to use the specified version of the Kine fork with the YugabyteDB implementation.

(4) Enable support for private repositories and modules:

Shell 
 go env -w GOPRIVATE=github.com/dmagda/kine-yugabytedb

(5) Make sure the changes take effect:

Shell 
 go mod tidy

(6) Prepare to build the full version of K3s:

Shell 
 mkdir -p build/data && make download && make generate

(7) Build the full version:

Shell 
 SKIP_VALIDATE=true make

It takes about five minutes for the build to complete.

NOTE: Once you stop using this custom K3s build, it can be uninstalled following the instructions.

Run a sample workload on the optimized Kubernetes version

After the build is complete, K3s can be started with an optimized version of Kine.

(1) Navigate to the directory containing the build artifacts:

Shell 
 cd dist/artifacts/

(2) Start K3s by connecting to the local YugabyteDB cluster:

Shell 
 sudo ./k3s server \
   --token=sample_secret_token \
   --datastore-endpoint="yugabytedb://yugabyte:yugabyte@127.0.0.1:5433/yugabyte"

(3) Confirm that Kubernetes starts successfully:

Shell 
 sudo ./k3s kubectl get nodes

 NAME        STATUS   ROLES                  AGE     VERSION
 ubuntu-vm   Ready    control-plane,master   4m33s   v1.27.4+k3s-36645e73
 

Now, deploy a sample application to ensure that the Kubernetes cluster does more than just bootstrap itself:

(1) Clone a repository with Kubernetes sample applications:

Shell 
 git clone https://github.com/digitalocean/kubernetes-sample-apps.git

(2) Deploy the Emojivoto application:

Shell 
 sudo ./k3s kubectl apply -k ./kubernetes-sample-apps/emojivoto-example/kustomize

(3) Make sure all deployments and services start successfully:

Shell 
 sudo ./k3s kubectl get all -n emojivoto

 NAME                            READY   STATUS    RESTARTS   AGE
 pod/vote-bot-565bd6bcd8-rnb6x   1/1     Running   0          25s
 pod/web-75b9df87d6-wrznp        1/1     Running   0          24s
 pod/voting-f5ddc8ff6-69z6v      1/1     Running   0          25s
 pod/emoji-66658f4b4c-wl4pt      1/1     Running   0          25s

 NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
 service/emoji-svc    ClusterIP   10.43.106.87    <none>        8080/TCP,8801/TCP   27s
 service/voting-svc   ClusterIP   10.43.14.118    <none>        8080/TCP,8801/TCP   27s
 service/web-svc      ClusterIP   10.43.110.237   <none>        80/TCP              27s

 NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
 deployment.apps/vote-bot   1/1     1            1           26s
 deployment.apps/web        1/1     1            1           25s
 deployment.apps/voting     1/1     1            1           26s
 deployment.apps/emoji      1/1     1            1           26s

 NAME                                  DESIRED   CURRENT   READY   AGE
 replicaset.apps/vote-bot-565bd6bcd8   1         1         1       26s
 replicaset.apps/web-75b9df87d6        1         1         1       25s
 replicaset.apps/voting-f5ddc8ff6      1         1         1       26s
 replicaset.apps/emoji-66658f4b4c      1         1         1       26s

(4) Call service/web-svc at its CLUSTER-IP on port 80 to trigger the application logic:

Shell 
 curl 10.43.110.237:80

The application will respond with the following HTML:

HTML 
 <!DOCTYPE html>
 <html>
  <head>
  <meta charset="UTF-8">
  <title>Emoji Vote</title>
  <link rel="icon" href="/img/favicon.ico">
 
  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-60040560-4"></script>
  <script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  gtag('config', 'UA-60040560-4');
  </script>
 </head>
 <body>
  <div id="main" class="main"></div>
  </body>
 
 <script type="text/javascript" src="/js" async></script>
 
 </html>

Epilogue

Job done! Kubernetes can now use YugabyteDB as a distributed and highly available SQL database for all of its data.

Now it's time to move on to the next phase: deploying Kubernetes and YugabyteDB in a true cloud environment across multiple availability zones and regions, and testing how the solution handles various outages.
