[Cloud Native] Microkernel Distributed Operating System Kubernetes

Today, Kubernetes has become the de facto standard for distributed cluster management and for public and private clouds. In effect, Kubernetes is a distributed operating system. It is the crystallization of more than ten years of Google's engineering experience in the field of distributed operating systems. Google has long managed the world's largest distributed clusters, and its research and understanding in this field are ahead of the rest of the world. As a result, Kubernetes, released in 2014, surpassed many predecessors in just a few years and achieved great success.

As a distributed operating system, Kubernetes (including its predecessor, Google Borg) appeared much later than famous stand-alone operating systems such as UNIX, Linux, and Windows, so its architecture naturally inherited much of their heritage. The kernel architecture is the most important part of that legacy. The rest of this article focuses on the concept of the microkernel and its influence on the Kubernetes architecture.

1. What is a microkernel?

To introduce the microkernel and understand its value, we first need to review the history of stand-alone operating systems. In this chapter, "operating system" means "stand-alone operating system".

1.1 The rise of UNIX

After the birth of electronic computers and before the 1970s, there were many operating systems, among which DOS, OS/360, and Multics were well-known representatives. This was a pioneering era in the field of operating systems. Twenty years of pioneering produced great results: with the development of CPU technology, UNIX was born in 1969 as a true time-sharing operating system.

With the support of new CPU technology, UNIX divides the software system into two parts: the kernel and user-mode programs (userland programs). The kernel is a collection of interrupt handlers that encapsulates hardware capabilities as operating system function calls (system calls), and user-mode programs use hardware functions through these system calls. User-mode programs run in their own processes, and all user-mode processes share the same kernel. Whenever a system call or interrupt occurs, UNIX traps into the kernel, and the kernel executes the system call. At the same time, the time-sharing scheduling algorithm in the kernel decides which process gets the CPU and manages process context switching. UNIX also encapsulates (almost) all hardware as files, and it provides a special user-mode program, the shell, that lets users work with the system directly. Through the inter-process communication capability provided by the kernel, users can combine a series of applications in the shell to handle complex requirements. This design philosophy is known as "KISS" (Keep It Simple, Stupid). All of these design ideas were remarkable creations at the time.
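As a rough illustration of the paragraph above, the following Python sketch uses the os module, whose functions are thin wrappers around operating system calls. It sends a few bytes through a pipe (an inter-process communication object owned by the kernel) using the same file-descriptor interface that is used for ordinary files. The function name pipe_roundtrip is made up for this example, and the sketch is only meant for small messages.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Send a small message through a kernel pipe and read it back.

    Hypothetical helper for illustration. Every os.* call below asks the
    kernel to do the work on behalf of this user-mode process. Only use
    small messages: writing more than the pipe buffer holds would block,
    because nobody is reading while we write.
    """
    read_fd, write_fd = os.pipe()            # the kernel creates the channel
    writer_open = True
    try:
        os.write(write_fd, message)          # user mode -> kernel: data copied in
        os.close(write_fd)                   # closing the writer marks end of data
        writer_open = False
        chunks = []
        while True:
            chunk = os.read(read_fd, 4096)   # kernel -> user mode: data copied out
            if not chunk:                    # an empty read means end of stream
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(read_fd)
        if writer_open:
            os.close(write_fd)


if __name__ == "__main__":
    print(pipe_roundtrip(b"hello from user mode"))
```

The pipe is read and written with the same calls that would be used on a file, which is the "everything is a file" idea in miniature.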

UNIX not only made a huge direct contribution to the industry itself, but also became the blueprint for all modern operating systems. Its two authors, Ken Thompson and Dennis Ritchie, therefore won the 1983 Turing Award.

UNIX was born at Bell Laboratories, which belonged to AT&T. After seeing the power of UNIX, AT&T made a seemingly altruistic decision to open the UNIX source code (initially only to universities), which led to the birth of all modern operating systems. Although AT&T was eventually broken up and lost its glory, the contribution of this decision continues to this day. Today, in the 2020s, MacOS, Windows, and Linux are all directly influenced by UNIX, while iOS derives from MacOS and Android derives from Linux. The soul of UNIX therefore still lives in everyone's mobile phone and in the back-end services behind every mobile app.

In addition, the birth of UNIX came with a by-product worth even more than the operating system itself: Dennis Ritchie designed the C language for the development of UNIX, and C has become the main design source for all popular modern programming languages. Forty years later, it is still one of the most important programming languages.

It is worth mentioning that at that time, AT&T opened UNIX mainly to research universities such as Berkeley and Carnegie Mellon, so a young man who had graduated from Olivet College was not affected by the UNIX school of thought. This software genius, David Cutler, designed the VMS operating system at DEC in 1975. VMS was built for DEC's VAX, the successor to the PDP-11 on which early UNIX ran; it is not based on UNIX but was designed independently. VMS did not make a big wave in the industry and ended up being compatible with UNIX. Later, David Cutler left DEC and joined Microsoft, where he wrote his own legend. Interestingly, Jobs also studied at a liberal arts college. It seems that students of American liberal arts colleges do not take the usual path.

1.2 The rise of the microkernel

UNIX's "everything is a file" design brings a lot of convenience to user program design, but it requires all hardware wrappers to run in kernel mode, so a bug in any kernel module affects the entire system. For example, if a device driver has a memory leak, every user-mode process that uses the device suffers from it. If a kernel module has a security hole, the security of the entire system is no longer controllable.

To solve this kind of problem, operating system researchers in the 1970s began to develop the concept of the "microkernel". The essence of a microkernel is to keep only memory address management, thread management, and inter-process communication (IPC) in the kernel mode of the operating system, while other functions such as the file system, device drivers, the network protocol stack, and the GUI system are treated as separate services. Such services generally run as separate user-mode daemon processes.

User-mode applications access these services through IPC, and thereby access all functions of the operating system. In this way, the number of system calls that need to trap into the kernel is greatly reduced, and the modularization of the system is clearer. The system is also more robust: only a small number of system calls in the kernel have access to the full capabilities of the hardware, so a problem in a device driver, for example, only affects the corresponding service, not the entire system. In contrast to the microkernel, the design of UNIX is called a monolithic kernel.
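The fault isolation described above can be sketched with a toy model. In the Python example below, threads stand in for separate user-mode service processes and queues stand in for kernel IPC; these are simplifying assumptions, and all class and function names (MicroKernel, Service, file_service, buggy_driver) are hypothetical. The toy kernel does nothing but route messages, and a failure inside one service is returned to the caller as an error reply instead of taking down the other services.

```python
import queue
import threading


class Service(threading.Thread):
    """A toy user-mode service that handles requests from its own mailbox."""

    def __init__(self, name, handler):
        super().__init__(name=name, daemon=True)
        self.mailbox = queue.Queue()
        self.handler = handler

    def run(self):
        while True:
            payload, reply_box = self.mailbox.get()
            try:
                reply_box.put(("ok", self.handler(payload)))
            except Exception as exc:
                # A bug in this service stays inside this service.
                reply_box.put(("error", str(exc)))


class MicroKernel:
    """Toy kernel: it only routes messages (IPC) and implements no services."""

    def __init__(self):
        self._services = {}

    def register(self, service):
        self._services[service.name] = service
        service.start()

    def call(self, service_name, payload, timeout=1.0):
        reply_box = queue.Queue(maxsize=1)
        self._services[service_name].mailbox.put((payload, reply_box))
        return reply_box.get(timeout=timeout)


def file_service(path):
    """Pretend file system service."""
    return f"contents of {path}"


def buggy_driver(_payload):
    """Pretend device driver that always fails."""
    raise RuntimeError("driver fault")


if __name__ == "__main__":
    kernel = MicroKernel()
    kernel.register(Service("fs", file_service))
    kernel.register(Service("driver", buggy_driver))
    print(kernel.call("driver", "read"))      # the driver fails ...
    print(kernel.call("fs", "/etc/hosts"))    # ... the file service still answers
```

In a monolithic design the driver and the file system would share one address space and one failure domain, which is exactly what this separation avoids.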

After UNIX was opened up, AT&T continued to iterate on it, and universities developed many new operating system kernels based on AT&T's UNIX. The better-known ones are:

  • BSD, monolithic, released in 1974 by Berkeley legend Bill Joy (supposedly it took Bill Joy three days to complete the first version of the BSD kernel; his work also includes the first TCP/IP protocol stack, vi, Solaris, and the SPARC chip). The kernel had a great impact on the industry and later developed into branches such as FreeBSD, OpenBSD, and NetBSD. Modern operating systems such as Solaris, MacOS X, and Windows NT borrow from it in many places.
  • Mach, a microkernel, released by Carnegie Mellon University in 1984. Its main authors were two CMU graduate students, Avie Tevanian and Rick Rashid. The kernel also had a great influence on the industry, and GNU Hurd and MacOS X borrow from it heavily, but the project itself ended in failure.
  • MINIX, a microkernel, released in 1987 by Professor Andrew Tanenbaum of the Vrije Universiteit Amsterdam. Numerous computer science students have learned the design principles of operating systems through MINIX and its accompanying textbook. The initial version of Linux was based on MINIX. Although famous, MINIX was mainly used for teaching and never gained a place in industry.

1.3 The silence of the microkernel

From the 1990s to the 2010s, the descendants of UNIX and VMS fought a melee. Judging from the results, although the microkernel concept is beautiful, reality has been cruel:

  • MINIX is limited to teaching, while Linux, which was designed based on MINIX but is monolithic, became a great success. Mach has had a profound impact on the industry, but it has not been applied on a large scale. Its successor, GNU Hurd, has remained under development and has never been put to practical use.
  • The NTOS kernel of Windows was designed by David Cutler based on VMS, the system he had originally designed independently at DEC (VMS has nothing to do with UNIX). NTOS borrowed ideas from the microkernel and some code from BSD, but in the end David Cutler decided to put all services (such as the GUI) in kernel mode instead of user mode. Windows NT is therefore consistent with a microkernel in its software architecture but behaves like a monolithic kernel at run time, which is called a hybrid kernel.
  • MacOS X is based on the design of NextStep OS. NextStep was designed by Avie Tevanian, the main designer of Mach. After he received his Ph.D., both Gates and Jobs invited him, and he went to Next. His friend Rick Rashid went to Microsoft as David Cutler's chief assistant. It is said that Avie Tevanian used a calculator every day at Next to work out the stock appreciation he had lost by not going to Microsoft. After returning to Apple with Jobs, Avie designed OS X based on the code of NextStep and BSD. Coincidentally, OS X also adopted the hybrid kernel architecture.

Among these operating system giants, everyone except Linus Torvalds, whether David Cutler, Andrew Tanenbaum, Avie Tevanian, or Rick Rashid, was a leader in microkernel architecture, yet in the end none of them carried the microkernel all the way through. There is a reason for this.

A microkernel operating system accesses system services much less efficiently than a monolithic operating system. For example, in Linux, a system call (for example, open) only needs to trap into the kernel once, that is, switch the CPU to high-privilege mode and then switch back to low-privilege mode. In a microkernel operating system, a user process that wants to call open first has to assemble an IPC request message, send it to the corresponding file system service process, and then receive the IPC response message from that process and unpack it to get the result. The data copying and process context switching caused by these messages add a lot of overhead. The messages need to be copied because user-mode processes cannot access each other's memory addresses, while kernel code can access any memory address of any user-mode process. It is precisely for performance reasons that both OS X and Windows chose the hybrid kernel architecture, and NTOS even integrates the GUI subsystem into the kernel to provide a better user experience.
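The comparison above can be made concrete with a small bookkeeping model. The Python sketch below does not measure anything; it simply counts the privilege transitions, process switches, and message copies that each path implies under a simplified assumption of one request message and one reply message. The names Cost, monolithic_open, and microkernel_open are hypothetical, and real kernels use optimizations that change these numbers.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    mode_switches: int = 0      # user <-> kernel privilege transitions
    context_switches: int = 0   # switches between different processes
    message_copies: int = 0     # IPC messages copied between address spaces


def monolithic_open() -> Cost:
    """open() served by file system code that lives inside the kernel."""
    cost = Cost()
    cost.mode_switches += 1     # trap: user mode -> kernel mode
    # The kernel's file system code runs here, in the kernel.
    cost.mode_switches += 1     # return: kernel mode -> user mode
    return cost


def microkernel_open() -> Cost:
    """open() served by a separate user-mode file system process (toy model)."""
    cost = Cost()
    # Step 1: the client sends the request message to the service.
    cost.mode_switches += 2     # into the kernel IPC path, then out in the service
    cost.message_copies += 1    # request copied into the service's address space
    cost.context_switches += 1  # CPU switches from the client to the service
    # Step 2: the service does the work in user mode and sends the reply.
    cost.mode_switches += 2     # into the kernel IPC path, then out in the client
    cost.message_copies += 1    # reply copied into the client's address space
    cost.context_switches += 1  # CPU switches back to the client
    return cost


if __name__ == "__main__":
    print("monolithic :", monolithic_open())
    print("microkernel:", microkernel_open())
```

Even in this idealized model the microkernel path does twice the privilege transitions and adds copies and context switches that the monolithic path does not have at all.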

To put it simply, when the computer's performance is poor, the Windows mouse pointer still follows the hand closely; even when the system is close to crashing, the mouse pointer can still move. The greater success of Windows XP compared with previous-generation products such as Windows 98 is inseparable from NTOS's close attention to performance. By contrast, the mid-1980s saw a feat such as the first-generation Macintosh, but because Jobs could not persuade the sales team to switch to a larger memory module, the first-generation Mac performed poorly and ran programs very slowly, and it failed to achieve the blue-ocean success it deserved.

2. Kubernetes and microkernels

Performance issues may be crucial for a stand-alone operating system, but not for a distributed operating system. As a "behind-the-scenes hero", a distributed operating system does not face users directly, and a small loss in single-machine performance can be made up by adding more machines. Under this premise, a better architecture is often more important.

2.1 The birth of Borg

When the stand-alone operating system war was about to be decided, Google, the new darling of the industry, was preparing for its IPO. In today's terms, Google was a "small giant" at the time: it had already shown its edge and could not be underestimated, but the giants of the day were mired in their own wars and had no time to deal with it. In 2003, to better support the new version of its search engine (based on MapReduce) and serve hundreds of millions of users, Google started developing a large-scale cluster management system called Borg, whose goal was to manage computer clusters on the scale of tens of thousands of machines. Although it was built by a small team of only 3 or 4 people, Borg kept up with Google's rapid growth and proved its potential. In the end, all Google machines were managed by Borg, and famous systems such as MapReduce and Pregel were built on Borg.

From an operating system perspective, Borg is a monolithic system: any functional upgrade requires going deep into Borg's underlying code to add support. A mature technology company like Google has many good engineers, so this is not a serious problem for internal systems. A public cloud, however, must connect to many third-party applications, and no matter how strong a company's engineering team is, it cannot integrate every other system in the industry into Borg. At this point, the extensibility of the system becomes very important.

Around 2010, with the withdrawal of Google's China division, many outstanding Google engineers joined Chinese companies such as BAT, and some of them joined Tencent Soso. After joining Tencent, these former Googlers reproduced many of Google's systems, and the results were technically excellent. The copy of Borg was called TBorg and was later renamed Torca. Torca played a very important role in Soso's advertising business. Later, because of Tencent's business adjustments and the merger of Soso and Sogou, Torca lost its users within Tencent and gradually stopped being maintained.

A few years after Borg went online, Google recognized the problems and bottlenecks of the monolithic architecture, so another small team started developing the Omega system. Omega inherits the idea of the microkernel: new features can be added almost without modifying the underlying code, so it is more flexible and more extensible than Borg. But by then all of Google's systems had been built on Borg, and because Borg is monolithic, systems such as MapReduce were tightly bound to Borg's core code. Not only was a seamless migration to Omega impossible, but migrating would also have cost a huge amount of manpower, time, and trial and error. So even with the unremitting efforts of its core members, Omega failed to succeed at Google.

Interestingly, the career trajectory of Brendan Burns, one of the core members of the Omega project, has many similarities with that of David Cutler, the great predecessor in the field of operating systems.

  • Both graduated from liberal arts colleges: David Cutler graduated from Olivet College, and Brendan Burns graduated from Williams College.
  • Both joined a giant of a traditional industry after graduation: David Cutler joined DuPont, and Brendan Burns joined Thomson Financial.
  • As The Godfather says, a man can only have one destiny. Cutler and Burns learned to write code at these two traditional giants, and perhaps it was then that they discovered their talent for software and their destiny to build a new generation of operating systems. So both chose the hottest technology giant of the time for their second job: David Cutler joined DEC, and Brendan Burns joined Google.
  • Both reached the pinnacle of their careers at Microsoft: Brendan Burns is now a Corporate VP at Microsoft, and David Cutler has long been Microsoft's only Senior Technical Fellow. It is rumored that Microsoft even has a rule that Cutler's technical rank must be the highest in the company: whenever anyone is promoted to Cutler's level, Cutler is automatically promoted one level higher.

2.2 The birth of Kubernetes

In the era of stand-alone operating systems, the hybrid kernel was popular for a while, which shows the success of the microkernel as a software architecture. But because of performance problems, no successful kernel adopted a "pure" microkernel architecture, so from a practical point of view the microkernel was a failure.

Unlike the failure of the microkernel architecture in the stand-alone operating system era, the failure of Omega within Google had nothing to do with performance; it was the result of historical baggage. The open source community and most companies had no system comparable to Borg, and therefore no historical burden. So a few years later, Google decided to open-source Omega, a new-generation distributed operating system that surpasses Borg, and named it Kubernetes.

To explain the relationship between Kubernetes and the microkernel, and the advantages that the microkernel architecture brings to Kubernetes, some technical details are needed here.

As mentioned above, a system call in a stand-alone operating system needs to "trap" into the kernel. Trapping (trap) is also called an interrupt. Regardless of the kernel type, a stand-alone operating system registers its system calls in an area of memory at startup; this area is called the interrupt vector (Interrupt Vector) or the interrupt descriptor table (IDT, Interrupt Descriptor Table). Of course, interrupt handling in modern operating systems is very complicated and there are many system calls, so in addition to the IDT, a system call table (SCV, System Call Vector) is also required. A system call invokes an interrupt handler through a unified interrupt entry (such as INT 0x80), and the interrupt handler dispatches the system call to the corresponding function in the kernel through the SCV. So the SCV is as important to the operating system as the SCV unit is in StarCraft. In a microkernel architecture, in addition to the system calls in the SCV, the system capabilities provided by user-mode services also need to be registered in some area.
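The dispatch step described above is essentially a table lookup. The following Python sketch models it under obvious simplifications: a dictionary plays the role of the system call table, and a single function plays the role of the unified entry point. The call numbers, handler names, and the fixed process id are all invented for this example and do not correspond to any real kernel.

```python
import errno
from typing import Callable, Dict

# Hypothetical system call numbers, valid only inside this toy example.
SYS_GETPID = 1
SYS_ADD = 2

# The "system call table": call number -> kernel function.
SYSCALL_TABLE: Dict[int, Callable[..., int]] = {}


def register_syscall(number: int, handler: Callable[..., int]) -> None:
    """Done once at startup, like filling the table when the kernel boots."""
    SYSCALL_TABLE[number] = handler


def sys_getpid() -> int:
    return 42  # fixed toy process id


def sys_add(a: int, b: int) -> int:
    return a + b


def trap(number: int, *args: int) -> int:
    """Unified entry point: look up the handler by number and dispatch to it."""
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        return -errno.ENOSYS  # "function not implemented", as a negative code
    return handler(*args)


register_syscall(SYS_GETPID, sys_getpid)
register_syscall(SYS_ADD, sys_add)

if __name__ == "__main__":
    print(trap(SYS_ADD, 2, 3))
    print(trap(SYS_GETPID))
    print(trap(999))  # unknown call number
```

User code never calls sys_add directly; it only knows the call number and the single entry point, which is what lets the kernel keep control over what runs in privileged mode.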

Similarly, distributed operating systems such as Kubernetes provide services externally in the form of APIs. The APIs provided by the distributed operating system itself are equivalent to the system calls of a stand-alone operating system, and each API needs to be registered somewhere. For Kubernetes, APIs are registered in etcd. The APIs provided by Kubernetes itself, which are equivalent to system calls, are backed by components called Controllers. New APIs that developers add to Kubernetes are backed by Operators. Operators and Controllers are built on the same mechanism. This matches the idea of the microkernel architecture: a Controller is equivalent to a service running in kernel mode, providing core capabilities such as thread and process management and scheduling algorithms, while an Operator is equivalent to services such as the GUI, file system, and printer in a microkernel architecture, which run in user mode.

Therefore, the working mechanism of Kubernetes is similar to that of a stand-alone operating system. etcd provides a watch mechanism: Controllers and Operators specify what they want to watch and tell etcd. This is equivalent to registering system calls in the IDT or SCV in a microkernel architecture.
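The watch registration described above can be sketched as a tiny in-memory key-value store. This is a toy stand-in written for illustration, not the etcd client API: the class ToyStore and its methods are hypothetical, watches are matched by key prefix, and callbacks run synchronously inside put, whereas a real watch is delivered over the network.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

WatchCallback = Callable[[str, str], None]


class ToyStore:
    """In-memory stand-in for a key-value store that supports prefix watches."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: DefaultDict[str, List[WatchCallback]] = defaultdict(list)

    def watch(self, prefix: str, callback: WatchCallback) -> None:
        """Register interest in every key that starts with the given prefix."""
        self._watchers[prefix].append(callback)

    def put(self, key: str, value: str) -> None:
        """Store the value, then notify every watcher whose prefix matches."""
        self._data[key] = value
        for prefix, callbacks in list(self._watchers.items()):
            if key.startswith(prefix):
                for callback in list(callbacks):
                    callback(key, value)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


if __name__ == "__main__":
    store = ToyStore()
    store.watch("/workflows/", lambda k, v: print("workflow changed:", k, v))
    store.put("/workflows/demo", "spec-v1")   # the watcher is notified
    store.put("/pods/p1", "running")          # nobody is watching this prefix
```

Registering a prefix here plays the same role as registering a handler in a system call table: it tells the core of the system whom to invoke when a given kind of request arrives.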

Take Argo as an example. Argo is an Operator that provides the ability to execute DAG workflows in Kubernetes. When users submit an Argo task with the kubectl command, they are actually asking kubectl to submit Argo's yaml to the Kubernetes API Server. The API Server writes the key-value data in the yaml to etcd, and etcd notifies the services that are watching the specified key. In our case, this service is Argo. This is just like a user process requesting a user-mode service in a microkernel architecture.

Argo gets the HTTP request from the etcd watch, reads and parses the yaml data in etcd, and thereby knows which containers to start. It then requests Kubernetes to start the corresponding containers through the API. The Kubernetes scheduler is a Controller that, after receiving a request to start a container, allocates resources and starts the container. This is like a user process starting another process through a system call in a microkernel architecture.
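The chain of events in the Argo example can be sketched end to end with a toy model. The Python code below is not the Kubernetes or Argo API: ToyCluster, ToyWorkflowOperator, and ToyScheduler are hypothetical names, the workflow is a plain list of steps rather than a DAG, notifications are synchronous function calls, and the scheduler places pods round-robin purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

Callback = Callable[[str, dict], None]


class ToyCluster:
    """Toy stand-in for the API Server together with its key-value store."""

    def __init__(self) -> None:
        self.store: Dict[str, dict] = {}
        self._watchers: List[Tuple[str, Callback]] = []

    def watch(self, prefix: str, callback: Callback) -> None:
        self._watchers.append((prefix, callback))

    def submit(self, key: str, spec: dict) -> None:
        """Persist an object, then notify every watcher of a matching prefix."""
        self.store[key] = spec
        for prefix, callback in list(self._watchers):
            if key.startswith(prefix):
                callback(key, spec)


class ToyScheduler:
    """Plays the built-in Controller: assigns each new pod to a node."""

    def __init__(self, cluster: ToyCluster, nodes: List[str]) -> None:
        self.nodes = nodes
        self.placements: Dict[str, str] = {}
        cluster.watch("/pods/", self.on_pod)

    def on_pod(self, key: str, spec: dict) -> None:
        # Round-robin placement, chosen only to keep the example small.
        self.placements[key] = self.nodes[len(self.placements) % len(self.nodes)]


class ToyWorkflowOperator:
    """Plays the Operator: turns one workflow object into pod requests."""

    def __init__(self, cluster: ToyCluster) -> None:
        self.cluster = cluster
        cluster.watch("/workflows/", self.on_workflow)

    def on_workflow(self, key: str, spec: dict) -> None:
        name = key.rsplit("/", 1)[-1]
        for step in spec["steps"]:
            # The operator goes back through the same API to ask for pods.
            self.cluster.submit(f"/pods/{name}-{step}", {"image": spec["image"]})


if __name__ == "__main__":
    cluster = ToyCluster()
    scheduler = ToyScheduler(cluster, ["node-a", "node-b"])
    ToyWorkflowOperator(cluster)
    cluster.submit("/workflows/demo", {"image": "busybox", "steps": ["a", "b", "c"]})
    print(scheduler.placements)
```

Note that neither the cluster nor the scheduler knows anything about workflows: the new capability was added entirely by registering one more watcher, which is the extensibility argument the article makes for Operators.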

Of course, there are also differences between Kubernetes and a stand-alone operating system. Kubernetes has no explicit "trapping" process, whereas a stand-alone operating system with a microkernel architecture needs to trap when making system calls but not when accessing user-mode services. However, Kubernetes can set different permissions for different services, which is to some extent similar to the difference between kernel-mode and user-mode CPU privileges in a stand-alone operating system.

The advantages of the microkernel architecture are fully revealed in Kubernetes. In Borg, adding a new subsystem is very complicated for developers: it often requires modifying Borg's underlying code, and the new system therefore becomes bound to Borg. In Kubernetes, developers only need to implement an Operator based on the SDK provided by Kubernetes to add a new set of APIs, without paying attention to Kubernetes's underlying code. Argo and Kubeflow are both applications of Operators. Any existing software can be integrated into Kubernetes through the Operator mechanism, so Kubernetes is well suited to be the underlying distributed operating system of a public cloud. Released in mid-2014, Kubernetes grew through 2015 and became the industry mainstream in 2016. For companies without historical burdens, Kubernetes is also used as the underlying system of the internal cloud.

3. Epilogue

In this article, we gave a brief history of stand-alone operating systems, described how the microkernel architecture rose and declined within that history, and described how it was rejuvenated in Kubernetes. Generally speaking, technologies that are significantly ahead of their time may not succeed in the era when they are proposed, but they will regain their glory many years later, once the times catch up. The different fates of the microkernel architecture in the era of stand-alone operating systems and in the era of cloud computing prove this point, as do the different fates of deep learning in the eras of low and high computing power.

It is worth mentioning that after Kubernetes, Google launched Fuchsia as a possible replacement for Android. Fuchsia is built on the Zircon kernel, which is written in C++ and uses a microkernel architecture. In the modern era of explosive growth in computing power, whether the microkernel can also be revived in mobile phone and Internet of Things operating systems, beyond the field of distributed operating systems, remains to be seen.


The content of this article is mainly based on a recent talk given by Wang Yi to the SQLFlow and ElasticDL teams. Shen Kuomo summarized it together with Zhang Haitao, Wu Yi, Yan Xu, Zhang Ke, and others. This summary explains the basis of SQLFlow's design as a Kubernetes-native distributed compiler, and also explains why ElasticDL only does distributed AI for the Kubernetes platform. The authors of this article include the author of Baidu Paddle EDL. Paddle EDL is a distributed computing framework based on PaddlePaddle and Kubernetes, and was contributed to the Linux Foundation in 2018.

Origin blog.csdn.net/be_racle/article/details/132254056