Cloud development environment, a new starting point for "developers going to the cloud"

Introduction

How the Cloud Studio team develops Cloud Studio on Cloud Studio: a bootstrapping case study

This article describes how the Cloud Studio product and R&D team uses Tencent Cloud's Cloud Development Environment (CDE) to improve the developer experience across the key stages of daily work: develop, debug, build, and run.

Cloud Studio is a development platform built on cloud development environments. It aims to simplify complex setups and to solve many of the problems of local development.

The author starts with the challenges the team faced when it first decided to move to the cloud and the pain points of the migration. The article then explains the trade-offs of the architectural redesign and how the team worked to improve development-environment startup performance and reduce cost. It closes that part with the progress made so far.

Finally, the author shares some opportunities and ideas for cloud development environments, both now and in the future.

01 Initial pain points

The state of Cloud Studio's core codebase

Compared with traditional business projects, Cloud Studio's business scenarios are extremely complex. Its modules take many forms: the VS Code kernel, plug-ins, various file-system management programs, dynamic container processes, and other kinds of applications. Together they span more than ten programming languages, dozens of backend services, and the build and configuration tooling that supports all of these programs.

Over time, fragmentation became the biggest pain point for developers on the Cloud Studio team:

- Dependencies were tangled.
- Module versions were hard to keep consistent.
- Juggling many separate tools was costly.
- Collaboration and code sharing were difficult.

Move to Monorepo

To solve these problems, we drew up a migration plan, gradually moved all codebases into a single unified repository, and adopted trunk-based development.


The monorepo model provides the following advantages:

  • Better dependency management
  • Consistent version management of components and modules
  • A centralized, unified management platform (built on CODING CI, Bazel, etc.)
  • More convenient collaboration, document sharing, unified directory management, etc.

Monorepo Challenge

After switching to the monorepo, we ran into another problem. The monorepo laid a solid foundation for a stable, unified development process. However, it made running the complete DevOps loop (edit, commit, build, run, test) on everyday laptops challenging, as shown in Figure 1:

  • Builds are bigger and take longer
  • Developers need to download several gigabytes of frequently changing artifacts to their laptops, or build them locally
  • Developing quickly outside of an office environment can be a challenge. Sometimes, cloning a new project and configuring a local development environment from scratch can take hours or even a day.

On top of this, we had to keep toolchains consistent across everyone's laptops for local development.

Figure 1: Workspace anatomy diagram

02 Using Cloud Studio for remote development: bootstrapping

We asked ourselves a question. We are building a cloud development platform and tout many of its advantages: scalability, access to cloud resources, isolation, and easy access. So could we develop our own large repository in a cloud environment and use Cloud Studio's features for our daily work? That would continuously feed our experience back into the product. Whenever something is hard to use, we run into it ourselves and fix it. So we took the bootstrapping path.

What is Tencent Cloud's Cloud Development Environment?

We were looking for a faster, easier, and more secure development experience for our developers, and we began to consider remote development as an alternative. The idea was to build cloud development environments on Tencent Cloud's faster machines that can be pulled up in seconds. All codebases and tools would stay in a secure, controlled environment.

This is the original motivation for CDE: a cloud development environment built on the Cloud Studio development platform.

What is Cloud Studio?

Cloud Studio is a browser-based integrated development environment (IDE) that gives developers an always-on cloud workstation. There is nothing to install: users open a browser and can work anytime, anywhere. The cloud development experience is almost the same as local development, so the barrier to entry is low. Cloud Studio is also highly open: third-party platforms can integrate its cloud development capabilities through the SDK we provide.

>>Advantages


After releasing the Tencent Cloud Cloud Development Environment White Paper in March, we moved Cloud Studio's R&D code and daily development onto Tencent Cloud CDE. The white paper names four main advantages of remote development environments: DevOps, containerization, zero-setup environments defined as code, and security.

We greatly improve security by running each developer's code in an isolated environment on the latest stable kernel. For example, custom automation scripts patch and update vulnerable application code outside working hours. Many aspects of a remote development environment are also easy to monitor. This lets us detect malicious behavior at any time, such as cryptocurrency mining or bypassing network restrictions.

The Kubernetes pods behind cloud development environments have no laptop battery or resource constraints. Scanning their disks for malicious artifacts or activity during off-hours is therefore trivial, yet highly valuable.

>>Performance first

Git, builds, and the IDE all run faster on powerful Tencent Cloud resources. Each environment can scale elastically up to 32 cores and 128 GB of RAM, and many additional features are available, as shown in Figure 2:

  • Cloud IDE startup: we preheat commonly used development images so the cloud IDE starts in seconds. The environment's code-defined CDE configuration then installs and configures common language servers, plug-ins, and so on. This gives developers the best experience. We also keep the kernel up to date.

    >>After continuous tuning of the IDE startup path, tests show the IDE opens in 2-3 seconds on average, as shown in the figure below

Figure 2: Layered architecture

  • Intranet dependency downloads for builds: we set up a domestic central repository so downloads come from the nearest, fastest location. For the team's internal development, we optimized the network setup and built caching of the team's dependency libraries into the product. Dependencies now mount in seconds without being downloaded again.
  • Improved Git performance through:

    >>A Linux file system, which performs better than typical laptop file systems

    >>A Git network proxy that accelerates intranet access

    >>Optimized Git configuration

  • Cloud development performance is also improved by preloading the required plug-ins, preheating images for faster startup, and preheating dependency libraries. The environment also provides:

    >>More computing resources: specifications can be scaled up during compilation and scaled back down when idle, so high-spec resources are not held unnecessarily.

  • In addition, the remote development environment provides:

    >>Multiple cloud development environments per user

    >>Isolation from other processes running on the laptop, so development can run truly in parallel.
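The article does not say which Git settings were tuned. As an illustration only, the sketch below renders a few real Git configuration keys that commonly help large monorepos. The selection of keys, the `git_config_commands` helper, and the proxy URL in the usage are assumptions, not Cloud Studio's actual configuration.

```python
# Hypothetical Git settings for a large monorepo on a Linux cloud workspace.
# The keys are real Git config options; the selection is an assumption, not
# the configuration the Cloud Studio team actually uses.
import shlex

MONOREPO_GIT_SETTINGS = {
    "feature.manyFiles": "true",       # index v4 + untracked cache for many-file repos
    "fetch.writeCommitGraph": "true",  # write a commit-graph after fetches
    "protocol.version": "2",           # Git wire protocol v2
}


def git_config_commands(settings, proxy=None):
    """Render `git config --global` commands for the given settings.

    `proxy` is an optional intranet HTTP proxy URL (hypothetical); when set,
    it is applied through the standard `http.proxy` key.
    """
    merged = dict(settings)
    if proxy:
        merged["http.proxy"] = proxy
    return [
        "git config --global " + shlex.quote(key) + " " + shlex.quote(value)
        for key, value in sorted(merged.items())
    ]


if __name__ == "__main__":
    for cmd in git_config_commands(MONOREPO_GIT_SETTINGS, proxy="http://proxy.example:8080"):
        print(cmd)
```

Generating the commands from one dictionary keeps every workspace on the same Git settings. That dictionary could itself live in the environment's code-defined configuration.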

>>Environment as code, and maintenance

Cloud Studio lets you define a workspace.yaml configuration visually (see "Cloud Studio advanced players: powerful YAML templates") and save it as a "custom template". These templates are very valuable for team members developing against monorepos in the cloud:

  • Easy to configure: a visual UI lets you set up a customized new environment without writing code
  • Support for any local IDE: the IDE type launched by default can be preconfigured
  • Access a consistent development environment in minutes
  • The tools, dependency configurations, plug-ins, and testing tools each monorepo needs are pre-installed in the environment
  • The repository is pre-cloned to warm up storage, so startup gets faster with continued use
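Custom templates work well when a team-wide template can be refined per member without copying it. The sketch below shows one way to layer a personal override on a team template using a recursive merge. Every field name (`image`, `ide`, `plugins`, `env`, `preclone`) and value is hypothetical; none is taken from Cloud Studio's actual workspace.yaml schema.

```python
# Hypothetical sketch: layer a personal override on top of a team "custom
# template". Field names are illustrative, not Cloud Studio's real schema.
import copy


def merge_template(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value (lists, strings,
    numbers) in `override` replaces the base value outright.
    """
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_template(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


TEAM_TEMPLATE = {
    "image": "ubuntu-22.04-go",  # hypothetical base image name
    "ide": "vscode-remote-ssh",
    "plugins": ["golang.go", "ms-python.python"],
    "env": {"GOPROXY": "https://goproxy.internal.example", "GOFLAGS": "-mod=mod"},
    "preclone": True,
}

if __name__ == "__main__":
    personal = {"env": {"GOFLAGS": "-mod=vendor"}, "ide": "jetbrains-gateway"}
    print(merge_template(TEAM_TEMPLATE, personal))
```

Deep-copying the base template keeps the shared team template unchanged, so every new member still starts from the same defaults.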

>>Security

  • There is no need to worry about code files being stolen: the persistent workspaces provided by Cloud Studio are encrypted (for details, see: Cloud Studio cloud development ensures enterprise source code security):

    >>Controlled development environment: only safe, trusted tools and plug-ins can be installed, and manual installation of uncertified plug-ins is prohibited.

    >>Secure digital watermark: it does not interfere with writing code. When content is about to be copied out of the development environment, we encrypt it to prevent the code from being taken away.

    >>Copy-paste and download out of the workspace can be disabled, while copy-paste still works within the current workspace.

  • Each security feature can be switched on or off as needed.

  • Keep the environment up to date: with automatic updates turned on, we update plug-ins to their latest versions overnight, apply security upgrades, and scan the environment for security issues:

    >>A more secure toolchain: a predefined secure environment with a controlled security toolchain

    >>A more secure software supply chain

    >>Seamless configuration changes: you can switch the underlying base image, versions, and so on. A one-click hot restart brings up the new environment without losing your current development state.

    >>A team-level plug-in marketplace that automatically updates to the latest plug-ins

    >>Security scans on images before they are published

Cloud Studio workspaces are persistent, so engineers don't need to worry about losing their personal settings, files, or code changes. Engineers can continue their work on different devices, and multiple engineers can collaborate in a single environment.

Cloud Studio: a localized cloud development platform

>>Mainstream development language environments

We are building a development platform for Chinese developers that fits Chinese development habits. It includes built-in template libraries for dozens of basic development environments. These come with all the necessary base images and preloaded default settings, common plug-ins, and development configurations. We currently support the following development languages and frameworks, as shown in Figure 3:

  • C/C++
  • HTML
  • C#
  • Java and the Spring framework
  • JavaScript, Node.js, and related frameworks
  • Go
  • Python and the Flet framework
  • Flutter and Android development
  • Vue, React, Angular, and other front-end frameworks

Figure 3: Out-of-the-box configuration

>>Web-based development space console

Cloud Studio provides a dedicated console where logged-in users can manage their workspace status, resource consumption, personalized templates, logos, teams, and team resource status. It covers needs ranging from simple personal development to complex enterprise-level projects, as shown in Figure 4.

Figure 4: Cloud Studio console

03 Cloud Studio development environment architecture

Figure 5: Cloud Studio console

As shown in Figure 5, every personal environment in Cloud Studio runs in a container. Developers can use the various officially provided images or easily customize their own environment with a Dockerfile. That environment can be used anywhere, even outside Cloud Studio. On top of the container, Cloud Studio provides additional out-of-the-box software, including the user's editor interface, Docker, kubectl, and other common development tools.

As shown in Figure 6, we provide Ubuntu as the development operating system. As one of the most popular Linux distributions, Ubuntu best matches developers' habits. Because it is built on Debian, it can use most toolchains in the Debian ecosystem, which lets us reuse most of our existing production infrastructure.

Figure 6: Cloud Studio image hierarchy

Moving from local computers to the cloud lets us take full advantage of the cloud's richer computing resources: many CPU cores and high-performance machines with large amounts of RAM. Most importantly, we decided to use Kubernetes. Built on Tencent Cloud TKE and EKS, it gives us the IaaS capabilities we need, along with fast, stable clusters:

  • The ability to host containers on powerful hardware
  • Networking between containers, with support for running them in parallel
  • Persisting development files to an NFS persistent volume so they survive restarts

Standardizing the underlying containers on Kubernetes

We use a Kubernetes Custom Resource Definition (CRD) to fully describe a workspace resource. This lets users create, schedule, and access Cloud Studio workspace pods directly through kubectl, even without the Cloud Studio platform.

In the CRD, we extended PersistentVolumeClaim support so data can be persisted to any external storage, not just the global NFS volume that Cloud Studio itself uses. In the future, we plan to respond to VolumeSnapshotContents changes, which would reduce storage costs while workspaces are not in use.
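The article does not publish the CRD schema. The sketch below shows what a Workspace custom resource could look like as a manifest built in Python. The API group `workspaces.example.com`, the version, and every `spec` field are invented for illustration. The optional PVC reference mirrors the "arbitrary external persistence" described above.

```python
# Hypothetical sketch of a Workspace custom resource. The API group, version,
# and all spec fields are invented; they are not Cloud Studio's actual CRD.
import json


def workspace_manifest(name, namespace, image, cpu, memory, pvc_name=None):
    """Build a Workspace custom-resource manifest as a plain dict."""
    spec = {
        "image": image,
        "resources": {"limits": {"cpu": cpu, "memory": memory}},
    }
    if pvc_name:
        # Persist to an arbitrary existing PersistentVolumeClaim instead of
        # the platform's default NFS volume.
        spec["persistence"] = {"persistentVolumeClaim": {"claimName": pvc_name}}
    return {
        "apiVersion": "workspaces.example.com/v1alpha1",
        "kind": "Workspace",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = workspace_manifest("alice-dev", "team-a", "ubuntu:22.04", "4", "8Gi",
                                  pvc_name="alice-data")
    # kubectl accepts JSON manifests, e.g.: python this_script.py | kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```

For `kubectl apply` to accept such an object, the cluster would first need a matching CustomResourceDefinition. A controller would also have to reconcile each Workspace into a pod.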

Figure 7: Cloud Studio CRD

04 Challenges

Creating the perfect environment for engineers is no easy task, and we hit several hurdles along the way:

- balancing performance against cost efficiency
- providing automatic upgrades
- ensuring uninterrupted work
- preconfiguring IDE settings that work for everyone

Choosing the IDE kernel

Engineers use IDEs every day, so a remote environment cannot be successful without a good IDE experience. In the cloud development environment, we provide a variety of different IDE kernel options:

  • Cloud Studio Web IDE kernel
  • VS Code Remote SSH
  • JetBrains Remote SSH (JetBrains Gateway: Remote Development for JetBrains IDEs)

>>Remote SSH connection method

As shown in Figure 8, you can connect to any Pod-based cloud development environment from your favorite local IDE. You keep your preferred themes and familiar shortcuts while taking full advantage of the cloud development environment.

Figure 8: Accessing the cloud development environment through SSH tools

For details, see the Cloud Studio documentation: "Connect to the cloud IDE workspace via SSH".
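Local IDEs such as VS Code Remote-SSH can pick up hosts defined in the OpenSSH client config (~/.ssh/config). The sketch below renders such an entry. The alias, hostname, port, user, and key path are placeholders; use the connection details shown for your own workspace. The keywords themselves (`Host`, `HostName`, `User`, `Port`, `ServerAliveInterval`, `IdentityFile`) are standard ssh_config options.

```python
# Hypothetical sketch: render an OpenSSH client config entry for a cloud
# workspace. All values are placeholders, not real Cloud Studio endpoints.
def ssh_config_entry(alias, hostname, user, port=22, identity_file=None):
    """Return a `Host` block suitable for appending to ~/.ssh/config."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
        f"    Port {port}",
        # Send keepalives so long-lived IDE sessions survive idle periods.
        "    ServerAliveInterval 30",
    ]
    if identity_file:
        lines.append(f"    IdentityFile {identity_file}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ssh_config_entry("cs-dev", "ws.example.com", "root", port=2222,
                           identity_file="~/.ssh/id_ed25519"), end="")
```

Once the entry is appended, `ssh cs-dev` works from a terminal, and the same alias appears as a target in Remote-SSH.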

In addition to offering multiple IDE options, we also focus on fine-tuning the hands-on experience through:

  • Preloaded indexes
  • Preconfigured settings
  • Pre-installed tools, extensions and custom add-ons

Cloud Studio offers fast cloud development built on elastic computing power and persistent storage. Even so, persuading developers to leave mainstream local IDEs for cloud development has been one of our biggest challenges since the early days. In particular, the latency of early web-based IDEs, and their later stability issues, caused us a lot of trouble.

We later concluded that web IDEs and local IDEs should coexist. Our job is to build and promote the concept of the cloud development environment. It should support enterprises' R&D goals of reducing cost, improving efficiency, and keeping DevOps consistent.

Keep your environment up to date

Since engineers value their time, it's important to provide an environment that doesn't require manual maintenance, so we automatically upgrade the environment during non-business hours with the latest tools and security updates.

To meet everyone's needs, engineers can choose one of four release channels:

  • stable: the default
  • rc: release candidate for the next stable version
  • dev: nightly updates to the latest successful build
  • none: no automatic updates

Whether engineers want the most stable environment, the newest features, or no upgrades at all, there is a channel for them.

We later extended automatic updates to roll out new versions gradually. This limits the blast radius if a bug slips past our automated tests, the release-candidate stage, and internal testing.
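Channel selection and gradual rollout can be combined in one small decision function. In the sketch below, the four channel names come from the article. The version strings, the per-channel rollout percentage, and the SHA-256 bucketing of workspace IDs are assumptions chosen for illustration.

```python
# Hypothetical sketch of release-channel resolution plus a staged rollout.
# Channel names come from the article; versions, percentages, and the
# hashing scheme are illustrative assumptions.
import hashlib

CHANNELS = {"stable", "rc", "dev", "none"}


def rollout_bucket(workspace_id):
    """Map a workspace ID deterministically to a bucket in [0, 100)."""
    digest = hashlib.sha256(workspace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def target_version(workspace_id, channel, current, releases, rollout_percent):
    """Return the version this workspace should run after the nightly update.

    `releases` maps each channel to its newest version; `rollout_percent`
    maps each channel to the share of workspaces (0-100) allowed to take it.
    """
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    if channel == "none":
        return current  # automatic updates disabled
    candidate = releases[channel]
    if rollout_bucket(workspace_id) < rollout_percent.get(channel, 100):
        return candidate
    return current  # not in this wave yet; picked up as the percentage grows


if __name__ == "__main__":
    releases = {"stable": "1.8.0", "rc": "1.9.0-rc1", "dev": "1.9.0-nightly"}
    print(target_version("ws-42", "rc", "1.8.0", releases, {"rc": 10}))
```

Hashing the workspace ID keeps each workspace in the same bucket every night. Raising the percentage from, say, 10 to 50 to 100 only ever adds workspaces to the upgraded set.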

Cost-effectiveness

Many customers weigh building their own development environments against buying one. We work with enterprises and with Tencent Cloud Serverless, and we build in-house products on Cloud Studio. Throughout that work we continuously track, monitor, and reduce Cloud Studio's cost, and we make sure its price/performance ratio beats off-the-shelf alternatives.

To improve resource utilization and control cost, we migrated our self-built K8s cluster to Tencent Cloud Serverless Container Service (formerly Elastic Container Service, EKS). Our goal was to move resources to the cloud and take full advantage of its native resource efficiency and lower operations cost.

Moving to the cloud required substantial architectural redesign. First, we redesigned every specialized feature the workspace depended on:

  1. Removing the OCI hook dependency: workspace resources used to be persisted with OCI hooks, a container runtime feature, which saved and loaded each user's persistence layer. During container startup, the image's upper layer was replaced with an ext4 virtual disk we prepared for the user, and that disk persisted their data. In the cloud we cannot rely on this, because we cannot guarantee that the underlying runtime supports OCI hooks. So we redesigned the user container as a two-layer architecture. The outer layer is a standard container that provides a standard environment. Inside it we run a podman container: the inner layer's image inspect yields the layer information, and the user's persistence layer is assembled into the final rootfs that podman runs.

  2. Removing the DaemonSet dependency: the Cloud Studio team cares a great deal about container startup performance. Kubernetes, designed from an Ops perspective, pays little attention to image download speed. To load user containers faster, we previously used a DaemonSet to preheat, on every node, the base images all users need. But DaemonSets inherently conflict with the cloud, whose premise is that users do not manage nodes. To address this, we completely redesigned the image preheating logic for user containers. Besides requesting resources from Kubernetes on demand, we added a caching layer that requests pods from Kubernetes in advance. Requesting a node fixes its resources, whereas requesting a batch of pods does not. Our workload can tolerate some resource oversubscription, so the cached pods set very small resource requests, while limits cap each container's maximum resources. This balances resource needs as well as possible. Finally, we designed a rescheduling strategy on top of Kubernetes scheduling: every cache pod is scored, and the best-performing pod is selected for each user.
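Item 1 says the inner container's rootfs is assembled from the image layers plus the user's persistence layer. One common way to stack layers like that on Linux is overlayfs. The sketch below assumes overlayfs is the mechanism, which the article does not state. It also assumes the layer list has already been mapped from inspect output to directory paths; all paths are placeholders.

```python
# Hypothetical sketch: compose overlayfs mount options for the inner (podman)
# container's rootfs. Using overlayfs, and every path, is an assumption.
def overlay_mount_options(image_layers, persist_dir, work_dir):
    """Build an overlayfs option string.

    `image_layers` holds layer directory paths ordered base-first. overlayfs
    stacks `lowerdir` entries right-to-left (the leftmost entry is the top
    layer), so the list is reversed.
    """
    if not image_layers:
        raise ValueError("at least one read-only image layer is required")
    lower = ":".join(reversed(image_layers))
    # The user's persistent disk is the writable upperdir, so every change
    # made in the workspace lands there. overlayfs requires workdir to be an
    # empty directory on the same filesystem as upperdir.
    return f"lowerdir={lower},upperdir={persist_dir},workdir={work_dir}"


if __name__ == "__main__":
    opts = overlay_mount_options(["/layers/base", "/layers/tools"],
                                 "/persist/upper", "/persist/work")
    print(f"mount -t overlay overlay -o {opts} /run/workspace/rootfs")
```

Because only the upper directory holds user data, the read-only image layers can be swapped for a newer image while the user's changes stay on their own disk.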

Beyond these major logic changes, we made many other optimizations, such as the traffic routing strategy and real-time billing data collection. The result is a truly cloud-native workspace design with high resource utilization. The changes are shown in the diagram below:

Figure 9: Resource Utilization - Each Pod has everything it needs
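The cache-pod approach from item 2 of the list above combines small resource requests with real limits and a scorer that picks the best warm pod per user. A minimal sketch of both pieces follows. The request and limit values, the `CachePod` fields, and the scoring weights are assumptions; the article does not give its actual scoring formula.

```python
# Hypothetical sketch of the cache-pod pool: warm pods request very little
# (deliberate oversubscription) but carry limits that cap bursts, and a
# scorer picks the best idle pod. Numbers and weights are assumptions.
from dataclasses import dataclass


def warm_pod_resources(limit_cpu="4", limit_memory="8Gi"):
    """Kubernetes `resources` stanza: tiny requests, real limits."""
    return {
        "requests": {"cpu": "100m", "memory": "256Mi"},
        "limits": {"cpu": limit_cpu, "memory": limit_memory},
    }


@dataclass
class CachePod:
    name: str
    image: str            # image already pulled into this warm pod
    node_cpu_free: float  # free CPU fraction on the pod's node, 0..1
    node_workspaces: int  # user workspaces already running on that node


def score(pod, wanted_image):
    """Higher is better: an image match dominates, then node headroom."""
    image_match = 100.0 if pod.image == wanted_image else 0.0
    return image_match + 10.0 * pod.node_cpu_free - 1.0 * pod.node_workspaces


def pick_cache_pod(pods, wanted_image):
    """Return the best idle cache pod, or None so the caller cold-starts."""
    if not pods:
        return None
    return max(pods, key=lambda p: score(p, wanted_image))


if __name__ == "__main__":
    pool = [CachePod("a", "go", 0.2, 5), CachePod("c", "go", 0.8, 2)]
    print(pick_cache_pod(pool, "go").name)
```

Setting requests below limits places these pods in Kubernetes' Burstable QoS class. The scheduler packs them by their small requests, so the cluster accepts the oversubscription the article describes, and the limits bound how far any single workspace can grow.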

We provide full persistence, which sets us apart from competitors. Once we introduced NFS, we had to make sure we used compute and storage efficiently, so we made several improvements:

  • Shut down inactive environments: To conserve computing resources, automated jobs periodically check whether an environment has been used recently and delete its containers when an inactive environment is detected.
  • Rebalancing VMs: multiple environments share one large VM, and we pay per VM, so it makes sense to fill each VM. If two VMs are each using only half their capacity, we move the load onto one machine and shut down the other.
  • Snapshotting the disks of stopped environments: a Cloud Studio environment needs a container and a disk to run. When we shut down the container, the disk is no longer in use, so we convert it to a low-cost storage option until the environment starts again.
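The first two policies above can be sketched as small functions. The idle threshold and the data shapes are assumptions. The consolidation step swaps in first-fit decreasing, a standard bin-packing heuristic; the article does not say which algorithm is actually used.

```python
# Hypothetical sketch of two cost policies: find idle environments, and pack
# environments onto as few VMs as possible. The threshold, data shapes, and
# the first-fit-decreasing heuristic are assumptions.
from datetime import datetime, timedelta


def find_idle(last_active, now, idle_after=timedelta(hours=2)):
    """Return IDs of environments unused for longer than `idle_after`.

    Each returned environment would have its container stopped and its disk
    moved to low-cost snapshot storage until the next start.
    """
    return sorted(ws for ws, seen in last_active.items() if now - seen > idle_after)


def consolidate(env_sizes, capacity):
    """Assign environments to as few VMs as possible (first-fit decreasing).

    `env_sizes` maps environment ID -> size (e.g. requested CPU cores).
    Returns one list of environment IDs per VM that stays on; any other VM
    can be drained and shut down.
    """
    vms = []  # each entry: [used_capacity, [environment IDs]]
    for env, size in sorted(env_sizes.items(), key=lambda kv: kv[1], reverse=True):
        for vm in vms:
            if vm[0] + size <= capacity:
                vm[0] += size
                vm[1].append(env)
                break
        else:
            vms.append([size, [env]])
    return [ids for _, ids in vms]


if __name__ == "__main__":
    now = datetime(2023, 9, 1, 3, 0)
    print(find_idle({"a": now - timedelta(hours=5)}, now))
    print(consolidate({"e1": 8, "e2": 8, "e3": 8, "e4": 8}, capacity=32))
```

In the article's example, two half-full VMs each hold 16 of 32 cores. Packing collapses them onto one VM, so the second can be shut down.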

So far, we have done our best to control cost while optimizing performance and usage. Compared with the old development environment, the improvement is significant.

Product monitoring metrics

Cloud Studio runs on TKE clusters and standard K8s, and we defined performance metrics across the full startup path, as shown in Figure 10. Through various caching solutions, we have met and exceeded our performance targets.

Figure 10: Definition of performance metrics for the K8s (TKE)-based cloud development environment

Before optimization, a Cloud Studio cold start took about 19 seconds. We analyzed typical scenarios and plotted the time spent at each major blocking point:

Figure 11: Blocking points

We also analyzed how inspect, a major blocking step, relates to the number of containers on a node:

Figure 12: Analysis

After continuous optimization:

Figure 13: After optimization

The cold start time dropped to 5-7 seconds, and the second start time to under 5 seconds.
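A breakdown like the one in Figure 11 depends on timing each startup phase separately. Below is a minimal sketch of that instrumentation. The `StartupTimer` class and the phase names are hypothetical. The clock is injectable so the example can be checked without actually sleeping.

```python
# Hypothetical sketch of per-phase startup instrumentation, the kind of data
# behind a "blocking points" breakdown. Phase names are invented.
import time
from contextlib import contextmanager


class StartupTimer:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.phases = {}  # phase name -> accumulated seconds

    @contextmanager
    def phase(self, name):
        start = self.clock()
        try:
            yield
        finally:
            self.phases[name] = self.phases.get(name, 0.0) + (self.clock() - start)

    def slowest(self, n=3):
        """Return the n slowest phases as (name, seconds), slowest first."""
        return sorted(self.phases.items(), key=lambda kv: kv[1], reverse=True)[:n]

    def total(self):
        return sum(self.phases.values())


if __name__ == "__main__":
    timer = StartupTimer()
    with timer.phase("schedule_pod"):
        time.sleep(0.01)
    with timer.phase("image_inspect"):
        time.sleep(0.02)
    print(timer.slowest(1), round(timer.total(), 3))
```

Recording every cold start this way, grouped by node, would also support an analysis like Figure 12's comparison of inspect time against the number of containers on a node.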

Our progress

  • Startup performance optimization is ongoing. Internally, preheating optimizations have brought cold-start and second-start times to a stable 4 seconds or so. We will gradually roll this out to the online environment.
  • Keeping hot workspaces resident cuts the second start time of highly active workspaces to under 2 seconds.
  • We plan to preheat more commonly used templates, so that in most scenarios users can cold start a workspace in about 2 seconds.
  • We plan to improve error feedback when a workspace fails to start, so users can find and fix problems more easily.
  • IDE startup optimizations bring IDE interface loading down to the millisecond level.

05 Cloud native development and debugging + cloud development environment

Cloud-native debugging is another way we approach shifting testing left in the cloud IDE. We treat cloud-native service development and debugging as part of the IDE itself. Any developer can deploy a complete set of cloud-native applications with one click from the editor. They can then use their current code to intercept traffic for one or more services they are working on, open a breakpoint debugger, or debug together with teammates in real time.

At the end of 2021, the CODING team contributed the Nocalhost framework to the CNCF. Since then, the Cloud Studio team has been combining Nocalhost with the cloud IDE to enable joint development, debugging, and shift-left testing in development clusters. Dedicated namespaces in the cluster, together with Cloud Studio's complementary capabilities, let teams simplify the cloud-native development phase and catch problems earlier. As shown in Figure 14, John and Peter jointly debug the front end and back end in one namespace of a development cluster. They replace the service Pods with cloud IDEs, debug, and find the problem. I have recorded a video that gives a brief introduction.

Figure 14: Resource servitization

06 Cloud development: more opportunities

The future is here: cloud development is abstract but valuable infrastructure, and AI opens up even more possibilities. Examples include AI + cloud development = Code Interpreter, an AI cloud development environment, or still more advanced scenarios.

We view Cloud Studio IDE as a product implementation of a cloud development environment. Our goal is to make remote development completely seamless for enterprise-level engineers. We are working hard to improve:

>>Elastic container allocation in seconds

We aim to bring workspace allocation and startup under 3 seconds. We will monitor and identify slow spots, then remove them by eliminating asynchronous states and improving the warm-up hit rate, among other means.

>>Ephemeral development container

Once startup is fast enough, we hope developers can use Cloud Studio as part of build integration. We foresee the following use cases for ephemeral Devpods:

  • Devpods dedicated to individual feature development
  • Simple, fast debugging of failed CI runs
  • Ready-to-use code review environment
  • Analyzing mobile crashes

>>More resource-saving cloud native debugging

Cloud-native service meshes provide traffic interception and coloring (request tagging). Combining that with development and debugging on cloud-native infrastructure and cloud development environments can give rise to new enterprise-level products.

>>Non-disruptive automatic upgrade and maintenance of workloads

Currently, environments have maintenance windows set outside business hours. However, some engineers may want to work outside business hours or run long workloads in their environments. We can improve this by monitoring active connections and deferring maintenance while an environment is in use.
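The deferral rule described above can be sketched in a few lines. Maintenance runs only inside the window, and only when no one is connected and no long job is running. The window times and the two "busy" signals (`active_connections`, `running_jobs`) are assumptions for illustration.

```python
# Hypothetical sketch of deferred maintenance: run only inside the window
# and only when the environment is idle. Window times and busy signals are
# assumptions.
from datetime import datetime, time as dtime


def in_window(now, start=dtime(1, 0), end=dtime(5, 0)):
    """True if `now` falls in the maintenance window (also handles windows past midnight)."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_run_maintenance(now, active_connections, running_jobs):
    """Defer while anyone is connected or a long-running job is still active."""
    if not in_window(now):
        return False
    return active_connections == 0 and running_jobs == 0


if __name__ == "__main__":
    print(should_run_maintenance(datetime(2023, 9, 2, 2, 30), active_connections=1, running_jobs=0))
```

An environment that is skipped is simply checked again later in the window, or on the next night, so a late-night session is never interrupted.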

>>Improve seamless IDE experience

When engineers use the IDE locally on their laptops, the remote environment should stay hidden in the background. The IDE connects to it silently. Given a good enough network, engineers get all the benefits of substantial computing power. If anything goes wrong with the remote environment, they are not blocked and can keep working locally without interruption.

>>Configuration for specific teams

We're looking for ways to minimize the first-time setup barrier for teams. One area we'd like to improve is letting teams define their configurations as code. New team members would then get a consistent development environment, tailored to their needs, with one click.

>>Seamless file transfer between local computer and cloud development environment

We are exploring compute-heavy development scenarios, such as computationally intensive tasks and long-running compilation builds. In some cases, developers need to move files between their laptops and the Cloud Studio cloud development environment, in either direction. Our goal is for the Cloud Studio CLI to automatically mount the environment as a local drive on the user's computer, so files can be moved seamlessly.

>>vGPU

We are building a vGPU solution based on Tencent Cloud gpu-manager, which may help with the following development needs:

  • AI model training
  • GUI cross-platform client development
  • Game client development


Source: blog.csdn.net/CODING_devops/article/details/132990682