[Repost] Some learning materials in the field of system architecture

A summary written by Baidu scientist Lin Shiding (original address). It covers topics I am interested in, such as virtual machines, distributed systems, and P2P, so I am reprinting it here.

Tags: architecture, systems, systems research

System architecture is a field that combines engineering and research. It emphasizes practice and is guided by theory; it is easy to get started in but hard to master, and sometimes progress depends on intuition, which gives it something of a "pseudoscience" character. To advance in this field, besides continuously designing and building real systems, one must also pay attention to learning and distilling methodology and design philosophy.

Students often ask how to study this field, so I am posting this study guide for reference. Written in 2009, it extracts projects I think are worth studying from the vast systems literature. It does not include work from recent years and is not comprehensive, but it is actually enough: reading papers is a process of going from few to many and then back to few. A general understanding of the nature, background, and development history of the problems, supplemented by hands-on practice (long-term, real practice), is enough to find the door to this field.

This article has been reproduced many times on the Internet, but most copies do not credit the source. I am reposting it here today, and paying tribute to March 15 (Consumer Rights Day) along the way.

-

Engineers often hit a growth bottleneck after reaching a certain stage. To break through it, one needs to go deeper in the technical field and understand the nature of its problems, its methodology and design philosophy, and its development history. Below are some learning materials in architecture-related areas, with brief comments, for the reference of interested engineers. I hope that by understanding and studying these areas you can master more principles of system design, become more at ease in your own work, and step into the realm of freedom.

1. Operating Systems

Mach [Intro: http://www-2.cs.cmu.edu/afs/cs/project/mach/public/www/mach.html, Paper: http://www-2.cs.cmu.edu/afs/cs/project/mach/public/www/doc/publications.html]

In traditional kernel implementations, the response to an interrupt is carried out inside one "big function", so called because a single control flow runs from the entry of the interrupt to its exit. When interrupt re-entry is allowed, the implementation logic becomes very complicated. Most OSs, such as UNIX, adopt this monolithic kernel architecture.

The Mach project, started in 1985, proposed a brand-new microkernel architecture. For an academic community that felt UNIX had been developed to its limit in the 1970s and saw little left to pursue, this was suddenly exciting, and it also opened the era of the monolithic kernel versus microkernel debate.

A side note: Richard Rashid, the leader of Mach, was a professor at CMU at the time. Entrusted by Bill Gates to lobby Jim Gray into joining Microsoft, he ended up getting drawn in himself and went on to build Microsoft Research. He has come to China several times to give keynotes at 21st Century Computing.

Exokernel [Intro: http://pdos.csail.mit.edu/exo/, Paper: http://pdos.csail.mit.edu/PDOS-papers.html#Exokernels]

Although the microkernel structure is elegant, it is not widely used in practice because its performance is too poor. People also gradually realized that the real issue with OSs is not the complexity of the implementation but how to give applications more flexibility in their use of resources. This is also why the debate over kernel architecture slowly faded after kernel extension mechanisms (such as loadable modules in Linux) appeared.

Exokernel appeared against this background. Proposed by MIT, it does not provide the traditional OS abstractions (processes, virtual memory, and so on) but focuses instead on isolating and multiplexing resources. On top of the exokernel sits a set of libraries, the well-known libOS, which implements the various OS interfaces. This structure gives applications maximum flexibility: different applications can emphasize scheduling fairness or real-time response, or concentrate on using resources efficiently to optimize performance. From today's point of view, an exokernel looks more like a virtual machine monitor.

Singularity [Intro: http://research.microsoft.com/os/Singularity/, Paper: http://www.research.microsoft.com/os/singularity/publications/HotOS2005_BroadNewResearch.pdf]

Singularity, proposed by Microsoft Research, appeared in the early 2000s, when viruses and spyware seemed impossible to stamp out. Both academia and industry were discussing how to provide a trustworthy computing environment and how to make computer systems more manageable. Singularity argues that solving these problems requires the underlying system to provide hard isolation, and that the hardware virtual-memory mechanism people had long relied on cannot offer both high flexibility and good performance. With the emergence of runtimes such as .Net and Java, a software-level solution became possible.

On top of a microkernel, Singularity uses .Net to build a type-safe assembly as its ABI and prescribes a message-passing mechanism for data exchange, which fundamentally rules out the possibility of modifying isolated data. Combined with security checks on applications, this yields a controllable and manageable operating system. Thanks to continuous optimization of the .Net CLR and advances in hardware, the performance penalty Singularity pays for these checks is acceptable relative to the good properties it provides.

This kind of design is still at the laboratory stage; whether it can ultimately win out depends on whether it gets the kind of opportunity UNIX once had.

2. Virtual Machines

VMware ["Memory Resource Management in VMware ESX Server", OSDI'02, Best Paper Award]

This is the VMware everyone is familiar with; nothing more needs to be said.

XEN ["Xen and the Art of Virtualization", OSDI'04]

Excellent VMM from Cambridge.

Denali ["Scaleand Performance in the Denali Isolation Kernel", OSDI'02, UW]

Denali is an application-level virtual machine designed for Internet services; it can run thousands of VMs on an ordinary machine. Its VMM is based on an isolation kernel, which guarantees isolation but does not insist on absolute fairness in resource allocation, thereby reducing the performance cost.

Entropia ["The Entropia Virtual Machine for Desktop Grids", VEE'05]

To harness a company's desktop machines for computation in a unified way, computing tasks must be packaged so that they neither disturb normal use of the machine nor touch user data. Entropia provides such a computing environment by implementing an application-level virtual machine on top of Windows. The basic approach is to redirect the syscalls issued by the computing tasks to guarantee isolation. Similar work includes FVM: "A Feather-weight Virtual Machine for Windows Applications".

3. Design Revisited

"Are Virtual Machine Monitors Microkernels Done Right?", HotOS'05

The title sounds puzzling at first; the point is that VMMs are in fact the correct way to implement a microkernel. The paper compares VMMs and microkernels in detail and is an excellent reference for understanding both concepts.

"Thirty Years Is Long Enough: Getting Beyond C", HotOS'05

C may be the most successful programming language in the world, but its shortcomings are also obvious. For example, the language itself has no support for threads, which leaves it somewhat powerless on today's highly parallel hardware, and this is precisely the strength of functional programming languages. How to combine the advantages of the two is a very promising area.

4. Programming Model

"Why Threads Are a Bad Idea"

A server built purely on threads has difficulty achieving high performance, because of memory usage, context-switch overhead, synchronization overhead, and the programming complexity of getting locking right.

"SEDA: An Architecture for Well-Conditioned, Scalable Internet Services", OSDI'01

Threads are problematic, but events cannot solve every problem either, so people looked for a way to combine the two. SEDA splits an application into multiple stages connected by queues; within a stage, multiple threads can be started to process the events in its queue, and the number of threads is adjusted automatically through feedback.
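
Below is a hypothetical Python sketch of the stage idea (the stage names and handlers are made up, and the thread-pool size is fixed here, whereas real SEDA resizes each pool via a feedback controller):

    import queue
    import threading
    import time

    class Stage:
        """One SEDA-style stage: an event queue drained by a small thread pool."""
        def __init__(self, name, handler, num_threads=2, next_stage=None):
            self.name = name
            self.handler = handler        # event -> result (or None)
            self.next_stage = next_stage  # stage that receives our results
            self.events = queue.Queue()
            for _ in range(num_threads):  # fixed pool; SEDA would tune this from queue length
                threading.Thread(target=self._worker, daemon=True).start()

        def enqueue(self, event):
            self.events.put(event)

        def _worker(self):
            while True:
                event = self.events.get()
                result = self.handler(event)
                if result is not None and self.next_stage is not None:
                    self.next_stage.enqueue(result)

    # Hypothetical two-stage pipeline: parse a raw request, then render a response.
    render = Stage("render", lambda req: print("response for", req))
    parse = Stage("parse", lambda raw: raw.strip(), next_stage=render)
    parse.enqueue("  GET /index.html  ")
    time.sleep(0.1)  # give the daemon worker threads a moment to drain the queues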

Software Transactional Memory

If memory could provide transaction semantics, the world we face would be completely different: languages, compilers, OSs, and runtimes would all change fundamentally. Intel is working on hardware transactional memory, but it will probably not be commercially available in the foreseeable future, so people have turned to software solutions. Understandably, such a solution cannot be built on native assembly; implementations currently exist in languages such as C# and Haskell. For more information, see Wikipedia.
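
The sketch below is a deliberately simplified, hypothetical illustration of optimistic STM in Python (versioned variables, buffered writes, validate-then-commit under a single commit lock); a real STM also validates during reads so a transaction never acts on an inconsistent snapshot:

    import threading

    class TVar:
        """A transactional variable: a value plus a version counter."""
        def __init__(self, value):
            self.value = value
            self.version = 0

    _commit_lock = threading.Lock()  # serializes commits only; reads take no lock

    class Tx:
        """Collects a read set (versions observed) and a write set (buffered writes)."""
        def __init__(self):
            self.reads = {}   # TVar -> version seen when first read
            self.writes = {}  # TVar -> pending new value

        def read(self, tvar):
            if tvar in self.writes:              # read-your-own-write
                return self.writes[tvar]
            self.reads.setdefault(tvar, tvar.version)
            return tvar.value

        def write(self, tvar, value):
            self.writes[tvar] = value

    def atomically(fn):
        """Re-run fn(tx) until it commits without conflicting with another transaction."""
        while True:
            tx = Tx()
            result = fn(tx)
            with _commit_lock:
                if all(t.version == v for t, v in tx.reads.items()):
                    for t, value in tx.writes.items():
                        t.value = value
                        t.version += 1
                    return result
            # validation failed: someone committed underneath us, so retry from scratch

    # Hypothetical usage: move 30 units between two accounts atomically.
    a, b = TVar(100), TVar(0)
    atomically(lambda tx: (tx.write(a, tx.read(a) - 30), tx.write(b, tx.read(b) + 30)))
    print(a.value, b.value)  # 70 30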

5. Distributed Algorithms

Logical clock ["Time, clocks, and the ordering of events in a distributed system", Leslie Lamport, 1978]

The classic paper on logical clocks, timestamps, and distributed synchronization.
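
The clock update rules themselves fit in a few lines; here is a minimal Python sketch (the process and event names are illustrative):

    class LamportClock:
        """Lamport's logical clock: counter rules for local events, sends, and receives."""
        def __init__(self):
            self.time = 0

        def local_event(self):
            self.time += 1              # every local event advances the clock
            return self.time

        def send(self):
            self.time += 1              # this timestamp travels with the outgoing message
            return self.time

        def receive(self, msg_time):
            # on receipt, jump past both the local clock and the message's timestamp
            self.time = max(self.time, msg_time) + 1
            return self.time

    # If event a happens before event b, then timestamp(a) < timestamp(b); the converse
    # does not hold, so a total order also needs a tie-breaker such as the process id.
    p, q = LamportClock(), LamportClock()
    t = p.send()      # p sends a message at time 1
    q.receive(t)      # q's clock jumps to 2, after the send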

Byzantine [“The Byzantine Generals Problem”, Leslie Lamport, 1982]

Distributed systems suffer many kinds of failures: some crash a node, some merely hurt performance, and, more seriously, some cause malicious behavior. The last kind, like a general turning traitor, can do serious damage to the system. For this class of problems Lamport proposed the Byzantine failure model: a state machine built from 3f+1 replicas keeps working correctly as long as at most f replicas are traitors.
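
As a tiny, hypothetical illustration of how the 3f+1 bound is used (in the style of a BFT replication client, not Lamport's oral-messages algorithm itself): with at most f liars among the replicas, any answer reported by at least f+1 of them must include at least one correct replica.

    from collections import Counter

    def accept_reply(replies, f):
        """Accept a value only if at least f+1 replicas report it.

        With at most f Byzantine replicas, f+1 matching replies cannot all come
        from traitors. (Illustrative only; making correct replicas agree on the
        same answer in the first place is the job of the agreement protocol.)
        """
        value, count = Counter(replies).most_common(1)[0]
        return value if count >= f + 1 else None

    print(accept_reply(["ok", "ok", "bad", "ok"], f=1))  # prints 'ok'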

Paxos [“The part-time parliament”, Leslie Lamport, 1998]

How to reach consensus in an asynchronous distributed environment is the most fundamental problem in distributed algorithms, and Paxos is the pinnacle of this class of algorithms. The original paper is notoriously hard; it is said that only three and a half people in the world understood it, so Lamport later wrote a popularized version, "Paxos Made Simple", which is still not easy to digest. Also see Butler Lampson's "The ABCD's of Paxos" (PODC'01); its description of the replicated state machine will seriously enlighten your understanding of the nature of the parallel world, and it shows that a Turing Award winner is no slouch.
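
For a feel of the mechanism, here is a heavily simplified, hypothetical single-decree sketch in Python of the acceptor's two rules and the proposer's value-choice rule; it omits messaging, quorum counting, and everything needed for liveness:

    class Acceptor:
        """Single-decree Paxos acceptor: the two rules at the heart of the protocol."""
        def __init__(self):
            self.promised = -1     # highest ballot number promised so far
            self.accepted = None   # (ballot, value) of the last accepted proposal, if any

        def on_prepare(self, ballot):
            # Phase 1: promise to ignore smaller ballots; report anything already accepted.
            if ballot > self.promised:
                self.promised = ballot
                return ("promise", self.accepted)
            return ("reject", self.promised)

        def on_accept(self, ballot, value):
            # Phase 2: accept unless a higher ballot has been promised in the meantime.
            if ballot >= self.promised:
                self.promised = ballot
                self.accepted = (ballot, value)
                return "accepted"
            return "rejected"

    def choose_value(promise_replies, my_value):
        """Proposer rule: adopt the value accepted under the highest ballot seen in the
        promise quorum; only if no acceptor has accepted anything may the proposer
        use its own value."""
        accepted = [acc for tag, acc in promise_replies if tag == "promise" and acc]
        return max(accepted)[1] if accepted else my_value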

One name appears repeatedly above: Leslie Lamport, who kept digging away at distributed computing and eventually became a master of his generation. There are several anecdotes about him. I remember he once wrote on his MSR homepage something to the effect of "Bill Gates was still in diapers when I was working on logical clocks..." (the original text is no longer available). Also, when writing papers he liked to work other big names into them and make fun of them, which is probably why he has not won the Turing Award yet.

For Lamport's other achievements, see also this paper dedicated to his 60th birthday: "Lamport on mutual exclusion: 27 years of planting seeds", PODC'01.

6. Overlay Networking and P2P DHT

RON ["Resilient Overlay Networks", SOSP'01]

RON describes how to build an overlay at the application layer so that failures in the WAN routing layer can be recovered from within seconds, whereas recovery through the existing routing protocols takes at least tens of minutes. This fast recovery and flexibility are what make overlay networking widely used today.
Application Level Multicast
"End System Multicast", SigMetrics'00
"Scalable Application Layer Multicast", SigComm'02
There are many papers on ALM. They basically describe how to build a mesh network for robustly transmitting control information, build a multicast tree on top of it for efficiently delivering data, and then do some layered delivery according to the characteristics of multimedia data. Systems that appeared in recent years, such as CoolStreaming and PPLive, are the commercial offspring of this line of work.
P2P
The advent of P2P changed networking. By structure, P2P networks can be divided into three types.
1. Napster-style: a centralized directory service, with data transferred peer to peer.
2. Gnutella-style: queries spread by gossip among neighbors; also known as unstructured P2P.
3. DHT: unlike unstructured P2P, a DHT lookup comes with a guarantee: if the data exists, it is found within a bounded number of hops, usually log N, where N is the number of nodes in the system.
Typical DHTs include CAN, Chord, Pastry, and Tapestry. These are mainly algorithm-level studies; the systems work mostly builds wide-area storage systems on top of them. Some researchers also work at the mechanism level, for example how to give users incentives to share and how to prevent cheating.
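
A hypothetical Python sketch of the key-to-node mapping at the core of such systems: keys and nodes are hashed onto the same identifier ring, and a key is stored on its successor node. A real DHT such as Chord resolves the successor in O(log N) hops using finger tables instead of the global view used here:

    import hashlib
    from bisect import bisect_left

    def ring_id(name, bits=16):
        """Hash a node or key name onto an identifier ring of 2**bits positions."""
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

    class Ring:
        """Chord-style rule: a key lives on its successor, the first node whose id
        is >= the key's id, wrapping around the ring."""
        def __init__(self, nodes):
            self.ids = sorted(ring_id(n) for n in nodes)
            self.node_of = {ring_id(n): n for n in nodes}

        def successor(self, key):
            i = bisect_left(self.ids, ring_id(key)) % len(self.ids)
            return self.node_of[self.ids[i]]

    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.successor("some-file.mp3"))  # the node responsible for this key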

7. Distributed Systems

GFS/MapReduce/BigTable/Chubby/Sawzall
Everyone is familiar with Google's series of papers, so I will not say more; they can be found here.

Storage
There are far too many papers on distributed storage systems; a few of the most relevant are listed below.
"Chain Replication for Supporting High Throughput and Availability", OSDI'04.
"Dynamo: Amazon's Highly Available Key-value Store," SOSP'07.
"BitVault: a Highly Reliable Distributed Data Retention Platform", SIGOPS OSR'07.
“PacificA: Replication in Log-Based Distributed Storage Systems,” MSR-TR.
Distributed Simulation

"Simulating Large-Scale P2P Systems with the WiDS Toolkit", MASCOTS'05. The interesting thing about distributed simulation is that the simulated protocol is distributed, and the simulation engine itself is also distributed. Logical and physical time and events are intertwined in the system and need to be handled carefully.

8. Controversial Computing Models

Today's software systems have become too complex for humans to grasp. Many systems still ship with numerous deterministic and non-deterministic bugs that can only be patched continually. Since our limitations as human beings mean we cannot eliminate all of a system's bugs, we can only approach the problem from another angle and study how to let systems keep working in this frustrating environment. It is like a distributed system: failures are inevitable, so we choose to make the system as a whole provide high reliability.

The following three are typical representatives. The main research questions are basically: 1) how to save state correctly; 2) how to catch an error and restore the state; 3) how to recover at the unit level without affecting the whole.

Recovery Oriented Computing

Failure oblivious computing, OSDI'04

Treating Bugs as Allergies, SOSP'05

9. Debugging

Systems are very complex; humans cannot analyze them directly by logic and can only observe them macroscopically through data mining.

Black box debugging ["Performance debugging for distributed systems of black boxes", SOSP'03]

Performance debugging of large systems is very difficult because many of the problems are non-deterministic and cannot be reproduced; they can only be located by digging through the logs and matching up paired calls/messages.

CP-miner ["A Tool for Finding Copy-paste and Related Bugs in Operating System Code", OSDI'04]

Many people reuse code by copy-paste, but sometimes a simple copy-paste introduces serious problems, such as a local variable that should have been renamed in the copy. CP-Miner parses the code into a syntax-tree structure and then mines out this kind of error.
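
CP-Miner itself targets C code in operating system kernels; the Python snippet below is only a hypothetical illustration of the "pasted but not fully renamed" pattern it is designed to flag:

    def total_read_bytes(requests):
        total = 0
        for req in requests:
            total += req.read_bytes
        return total

    # Pasted from the function above and only partially renamed: the loop body
    # still sums read_bytes, the classic copy-paste bug CP-Miner looks for.
    def total_write_bytes(requests):
        total = 0
        for req in requests:
            total += req.read_bytes   # BUG: should be req.write_bytes
        return total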
