Introduction to Rust language, key technologies and practical experience

Editor's note: The High Availability Architecture group shares and circulates articles of typical significance in the architecture field. This article was shared by Tang Liu in the High Availability Architecture group. When reposting, please credit the High Availability Architecture official account "ArchNotes".

Tang Liu, chief architect of PingCAP, currently works on the next-generation distributed database TiDB and the distributed storage engine TiKV. He is an open source enthusiast and a fan and practitioner of languages such as Go and Rust.

Hello everyone, I’m Tang Liu from PingCAP. It’s a great honor to share Rust-related knowledge and our team’s practical experience in using Rust with you today.

Why choose Rust

First, let's talk about something that has been discussed in the Go community recently: Dropbox rewrote its underlying S3-like storage service in Rust, which suddenly boosted Rust's popularity. After all, a company like Dropbox is usually seen as a weather vane for technical architecture. That raises a question many people care about:

Why did Dropbox not use Go, and instead develop in Rust, with its steep learning curve?

In fact, the same question applies to us: why would a team that considers itself very experienced in Go choose Rust instead?

Let me first introduce what we do. Our team is developing a next-generation distributed database, commonly known as NewSQL. Its theoretical basis is Google's F1 and Spanner, so our NewSQL is naturally divided into two parts: a stateless SQL layer, TiDB ( http://dwz.cn/2XZkSm ), which is already open source, and a distributed KV layer, which we call TiKV and expect to open source in April.

TiDB is written in Go. Anyone familiar with Go knows that it makes writing distributed applications fast and convenient, and our team members have deep Go experience. Yet when we decided to build TiKV, we did not use Go, nor a more popular static language such as C++ or Java. We chose Rust, a language completely unfamiliar to us. Why?

Let me first talk about the problems we ran into with Go:

  • GC. Although Go's GC has improved a great deal since 1.6, and we expect it to keep getting better, for a distributed application with very high performance requirements we prefer a language without GC, where memory is more controllable.
  • Cgo. TiKV uses RocksDB as its underlying storage engine. With Go we would have to call RocksDB through Cgo, and Cgo still carries a relatively large performance cost.

Now for why we did not choose C++ or Java. Large-scale development in C++ places very high demands on the whole team, and we felt we could not deliver a solid product in C++ in a short time, so we gave it up. As for Java, few people on our team are proficient in it, so we dropped that too.

Therefore, we finally set our sights on Rust, mainly because Rust has several cool features, namely type safety, memory safety and thread safety, which will be explained in detail later.

The basics of Rust

The Rust website introduces it as a systems programming language with good performance that also detects memory errors, unsafe multi-threaded data access, and similar problems at compile time, so that programs are safe and robust.

In other words, if a Rust program compiles, you do not have to worry about many of the memory leaks, segmentation faults, and data races that plague C++. But all of this comes at a price: Rust is not easy to learn, and its difficulty is comparable to that of C++. With Go, you can probably start contributing code to a project after a week of study; with Rust, you may still be wrestling with the compiler a month in, wondering why your own code does not compile.

Because I don't know how many people have been exposed to Rust, here are some examples to give you a basic understanding of Rust's syntax.

The first is the most common Hello world:

fn is a Rust keyword used to define a function. The function here is named main, and println! is a macro; in Rust, a macro invocation ends with "!".
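
A minimal sketch of the Hello world program described above. The greeting helper is an addition for illustration only (it is not from the talk); it shows a second macro, format!, which builds a String instead of printing.

```rust
// `greeting` is a hypothetical helper added for illustration.
// `format!` is a macro too: it returns a String rather than printing.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

// `fn` defines a function; `main` is the program's entry point.
fn main() {
    // The trailing `!` marks `println!` as a macro, not a function.
    println!("{}", greeting("world"));
}
```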

Next, let's define a few variables; this is where Rust starts to differ from other languages:

Above we declared a variable of type u32 without initializing it and then printed its value. In some languages, such as Go, this prints 0, but in Rust it fails to compile, and the compiler reports:

The error tells us that we used an uninitialized variable. Let's initialize it first:

Here we defined a variable with an initial value of 0, changed it to 10, and printed it. Compiling produces the following error:

This time the compiler tells us that we modified an immutable variable. In Rust, variables are either immutable or mutable, and we must state explicitly whether a variable can change. The mut keyword tells Rust that the variable is mutable, as follows:
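
The three steps above can be condensed into one sketch. The two versions the compiler rejects are kept as comments so that the block builds; the function name set_to_ten is only illustrative, and the exact error wording depends on the compiler version.

```rust
// Illustrative helper: returns the value of a mutable variable after reassignment.
fn set_to_ten() -> u32 {
    // Step 1: declared but never initialized. Reading it is a compile error.
    // let a: u32;
    // println!("{}", a);

    // Step 2: initialized, but immutable by default. Reassigning is a compile error.
    // let a: u32 = 0;
    // a = 10;

    // Step 3: `mut` makes the binding mutable, so reassignment is allowed.
    let mut a: u32 = 0;
    println!("before: {}", a);
    a = 10;
    a
}

fn main() {
    println!("after: {}", set_to_ten());
}
```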

Now it compiles normally.

These simple examples should give you a first impression of Rust. For more detail, please refer to the official website.

Key technical points to make Rust development easy

Earlier, in "Why choose Rust", I mentioned that Rust has several cool features that let us write concurrent programs that are less error-prone. I believe that once you understand and master these key points, developing in Rust becomes quite easy.

Type safety

Rust is a language that strictly enforces type safety. In the C/C++ world, we can convert between types freely, for example:

This kind of conversion, very common in C/C++, is not allowed in Rust. For example:

We will get the following compilation error:

If we really need to force this kind of memory conversion, we can use unsafe, which explicitly tells Rust that the operation is unsafe but that it should let it through:

Normally Rust does not allow code like the above, so it gives the programmer the unsafe escape hatch. The unsafe keyword also marks exactly where the unsafe code lives, so we know where to pay extra attention.
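
The original listings are not reproduced here. As an assumption, the sketch below uses the conversion of a four-byte array into a u32: the plain cast is rejected at compile time, while std::mem::transmute inside an unsafe block is accepted. The function name bytes_to_u32 is illustrative.

```rust
use std::mem;

// Reinterpret four bytes as a u32. `bytes_to_u32` is an illustrative name.
fn bytes_to_u32(a: [u8; 4]) -> u32 {
    // A plain cast between these types is rejected at compile time:
    // let b = a as u32;

    // `transmute` reinterprets the bits and must be wrapped in `unsafe`.
    // It is sound here because both types are 4 bytes wide and every
    // bit pattern is a valid u32.
    unsafe { mem::transmute::<[u8; 4], u32>(a) }
}

fn main() {
    let a = [0u8, 0u8, 0u8, 0u8];
    println!("{}", bytes_to_u32(a));
}
```

For this particular conversion the safe function u32::from_ne_bytes gives the same result, so unsafe is only needed when no safe equivalent exists.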

Memory safety

Now we come to the most maddening part of Rust. Rust guarantees the memory safety of a program, but how? The first things to understand are the concepts of ownership and move.

Ownership + move

In Rust, any resource can have only one owner. The simplest example:

Here we use let to bind a vector to the variable a, so a is now the owner of this vector. The vector resource may have only one owner at a time. Now look at the following code:

In this example, we assign a to b and then print a[0]. In most languages this is unproblematic, but Rust reports an error:

Why? Before printing a[0], we executed let b = a. In Rust this operation is called a move: a's ownership of the vector is transferred to b, and a gives it up. Since a no longer owns the vector, it naturally cannot access the data.
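
The move described above can be sketched as follows. The rejected line is kept as a comment so the block compiles; first_after_move is an illustrative name.

```rust
// Illustrative helper: reads the first element through the new owner.
fn first_after_move() -> i32 {
    // `a` owns the vector.
    let a = vec![1, 2, 3];

    // Ownership moves from `a` to `b`; `a` can no longer be used.
    let b = a;

    // Uncommenting the next line fails to compile (use of moved value `a`).
    // println!("{}", a[0]);

    b[0]
}

fn main() {
    println!("{}", first_after_move());
}
```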

Ownership and move are probably the first pitfall in learning Rust, and it is easy to write code like this:

This code still does not compile, because the call to do_vec has already moved ownership of the vector away from a.

So when we see code like let b = a, it usually means a has been moved and has lost ownership. There is one exception: if the type of a implements the Copy trait, let b = a is a copy, not a move. The following code compiles normally:

Here, let b = a does not move a; it copies the data for b to use. The primitive data types generally implement the Copy trait. We can also implement Copy for our own types, but we need to weigh the cost of copying.
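
A short sketch of both cases: a primitive that is copied on assignment, and a custom type that opts in to Copy. The Point type and the duplicate function are hypothetical names used only for illustration.

```rust
// A custom type opts in to copy semantics by deriving `Copy`
// (which also requires `Clone`). `Point` is an illustrative type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Returns the original and its copy to show that both stay usable.
fn duplicate(p: Point) -> (Point, Point) {
    let q = p; // a copy, not a move
    (p, q) // `p` is still valid here
}

fn main() {
    // u32 implements `Copy`, so `b` gets its own copy and `a` stays usable.
    let a: u32 = 10;
    let b = a;
    println!("{} {}", a, b);

    let (p, q) = duplicate(Point { x: 1, y: 2 });
    println!("{:?} {:?}", p, q);
}
```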

Borrow

Earlier we gave an example of do_vec. What if we really need to continue using this vector after calling this function?

This is the second pitfall: borrow. As mentioned, a move means I give you ownership; a borrow means I lend you the right to use the resource, and you give it back when you are done:

In the above example, the & on the parameter indicates a borrow. The do_vec function only borrows a and returns it when the function ends, so we can keep using a afterwards.
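
A sketch contrasting the two calling conventions. do_vec follows the name used in the talk, but its body is an assumption; consume_vec is a hypothetical function added to show the move for comparison.

```rust
// Takes ownership: the caller's variable is moved into `v`.
// `consume_vec` is a hypothetical name added for contrast.
fn consume_vec(v: Vec<i32>) -> usize {
    v.len()
}

// Borrows: `&` lends the vector for the duration of the call only.
fn do_vec(v: &Vec<i32>) -> i32 {
    v[0]
}

fn main() {
    let a = vec![1, 2, 3];

    let first = do_vec(&a); // borrow: `a` is handed back after the call
    println!("{} {}", first, a[0]); // `a` is still usable

    let n = consume_vec(a); // move: `a` gives up ownership
    // println!("{}", a[0]); // would not compile: `a` was moved
    println!("{}", n);
}
```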

Since it is a borrow, we can of course use the data, so we might write code like this:

Then Rust promptly reports an error again:

That is because our borrow is an immutable borrow, which cannot change the data. We mentioned earlier that a variable must be explicitly declared with mut before it can be modified. The same is true for a borrow: to modify something borrowed, you must use a mutable borrow, like this:
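
A sketch of the mutable borrow, assuming that do_vec appends an element (the body of the original function is not available). The rejected immutable version is kept as a comment.

```rust
// With an immutable borrow the vector cannot be modified:
// fn do_vec(v: &Vec<i32>) { v.push(4); } // does not compile

// With a mutable borrow it can. Note `&mut` in the signature and at the call site.
fn do_vec(v: &mut Vec<i32>) {
    v.push(4);
}

fn main() {
    // The variable itself must also be declared `mut`.
    let mut a = vec![1, 2, 3];
    do_vec(&mut a);
    println!("{:?}", a);
}
```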

Borrows also have a scope. Sometimes we write code like this:

The compiler reports an error again:

Because y holds a mutable borrow of x that has not yet been returned, a later immutable borrow is not allowed. We can control the lifetime of the mutable borrow with a scope:

This way, by the time println! performs its immutable borrow, the mutable borrow held by y has already been returned.

A variable can have multiple immutable borrows at the same time, but only one mutable borrow, and not alongside any immutable borrow. This is very similar to a read-write lock: multiple read locks may be held at once, but only one write lock at a time.
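
A sketch of the scoped version, with illustrative names. One caveat worth stating: compilers with non-lexical lifetimes (the 2018 edition and later) end a borrow at its last use, so they also accept the version without the inner block; the explicit scope is still a clear way to show where the mutable borrow ends.

```rust
// Illustrative helper: increments `x` through a scoped mutable borrow.
fn add_one_in_scope() -> i32 {
    let mut x = 5;
    {
        // The mutable borrow lives only inside this block.
        let y = &mut x;
        *y += 1;
    } // `y` goes out of scope here, so the mutable borrow is returned.

    // Now an immutable borrow (made by println!) is allowed.
    println!("{}", x);
    x
}

fn main() {
    add_one_in_scope();
}
```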

Lifetime

In C++, I'm sure everyone remembers wild (dangling) pointers: we sometimes refer to an object that has already been deleted, and the program crashes when we use it again. Rust solves this problem with lifetimes, although code with lifetime annotations looks even uglier. A simple example:

We define a struct A whose field b is a reference to a u32 variable outside it, and the lifetime of this reference is 'a. The lifetime annotation ensures that the u32 data referenced by b lives at least as long as A.

But in the above example, b refers to c inside an inner scope, and c disappears when that scope ends. At that point b would be an invalid reference, so Rust reports a compile error:
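
A sketch of the struct described above, reconstructed from the prose (the original listing is not available). The accepted case runs in main; the rejected shape, where c lives in an inner scope, is kept as a comment.

```rust
// `A` holds a reference to a u32 that lives elsewhere. The lifetime
// parameter 'a ties the reference to the value it points at.
struct A<'a> {
    b: &'a u32,
}

fn main() {
    // Accepted: `c` outlives `a`, so the reference stays valid.
    let c = 1u32;
    let a = A { b: &c };
    println!("{}", a.b);

    // Rejected: `c` is dropped at the end of the inner scope while `a`
    // would still refer to it (`c` does not live long enough).
    // let a;
    // {
    //     let c = 1u32;
    //     a = A { b: &c };
    // }
    // println!("{}", a.b);
}
```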

Thread safety

As we mentioned earlier, Rust uses the move, borrow, and lifetime mechanisms to ensure memory safety. These concepts are not easy to grasp, and it is easy to end up fighting the compiler when writing code, but I personally think that once you understand them, writing Rust is not a problem.

Okay, with memory safety covered, let's move on to thread safety. Everyone knows that multi-threaded programs are hard to write: a moment of carelessness leads to data races and corrupted data, and such bugs can be hard to detect.

For example, in Go, we can do this:

The example above is extreme, and nobody should write code like that, but in practice we may still face data races. Although Go can enable data race detection with -race, it is usually used only in tests, not in production.

Rust, by contrast, prevents us from writing data races at the source. First, let's look at Rust's two concurrency traits, Send and Sync:

  • Send
    When a type implements Send, values of that type can be safely moved from one thread to another.
  • Sync
    When a type implements Sync, it can be safely used from multiple threads through shared references (for example, when shared via Arc).

These definitions may seem confusing. Put simply, if a type implements Send + Sync, it can be used safely across multiple threads.
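
One way to see these traits at work is a compile-time check. The helpers assert_send and assert_sync below are illustrative, not part of the standard library: each call compiles only if the type satisfies the bound.

```rust
use std::sync::{Arc, Mutex};

// Illustrative helpers: each call compiles only if `T` meets the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    // Arc<Mutex<Vec<i32>>> is both Send and Sync.
    assert_send::<Arc<Mutex<Vec<i32>>>>();
    assert_sync::<Arc<Mutex<Vec<i32>>>>();

    // std::rc::Rc uses a non-atomic reference count, so it is neither.
    // Uncommenting the next line fails to compile:
    // assert_send::<std::rc::Rc<i32>>();

    println!("all checks passed at compile time");
}
```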

Let's look at a simple example:

This is a typical multi-threaded data race. Rust refuses to compile it and reports:

As mentioned earlier, Arc lets a value be shared safely between threads, so we add Arc:

Now our vector can be accessed from multiple threads, but we still get an error:

That is because we need not only multi-threaded reads but also multi-threaded writes, which Rust naturally does not allow. We need to add explicit lock protection, as follows:

Therefore, to use a piece of data safely from multiple threads, the most common approach is to wrap it in Arc<Mutex<T>> or Arc<RwLock<T>>.
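
A minimal sketch of the Arc<Mutex<T>> pattern, assuming several threads that each push one value into a shared vector. The function name push_from_threads is illustrative.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn `n` threads that each push their index into a shared vector.
// `push_from_threads` is an illustrative name.
fn push_from_threads(n: usize) -> Vec<usize> {
    // Arc shares ownership across threads; Mutex guards the writes.
    let data = Arc::new(Mutex::new(Vec::<usize>::new()));

    let handles: Vec<_> = (0..n)
        .map(|i| {
            // Each thread gets its own handle to the same vector.
            let data = Arc::clone(&data);
            thread::spawn(move || {
                // lock() returns a guard; the lock is released when it drops.
                data.lock().unwrap().push(i);
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    let mut result = data.lock().unwrap().clone();
    result.sort(); // the order in which threads ran is not deterministic
    result
}

fn main() {
    println!("{:?}", push_from_threads(3));
}
```

Because the vector can only be reached through the Mutex, there is no way to touch it without holding the lock.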

Of course, besides Arc + lock, Rust also provides channels for communication between threads. A Rust channel is similar to a Go channel, with a send end and a recv end; I won't go into detail here.
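
For readers who want a concrete picture, here is a small sketch using the standard library's std::sync::mpsc channel. The function sum_via_channel is an illustrative name: a producer thread sends numbers and the caller sums what it receives.

```rust
use std::sync::mpsc;
use std::thread;

// A producer thread sends 1..=n; the caller receives and sums the values.
// `sum_via_channel` is an illustrative name.
fn sum_via_channel(n: u32) -> u32 {
    // `tx` is the sending end, `rx` the receiving end.
    let (tx, rx) = mpsc::channel();

    let producer = thread::spawn(move || {
        for i in 1..=n {
            tx.send(i).unwrap();
        }
        // `tx` is dropped here, which closes the channel.
    });

    // Iterating over `rx` ends once every sender has been dropped.
    let total: u32 = rx.iter().sum();
    producer.join().unwrap();
    total
}

fn main() {
    println!("{}", sum_via_channel(4));
}
```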

One side note: because Rust forces us to lock data shared between threads explicitly, this style carried over into the Go code we wrote later. In Go, I used to write locks like this:

A mutex variable m protects the data v1 and v2 across threads, but with this style it is easy to forget which data the lock protects. Since being influenced by Rust, I prefer to write it like this:

That is, put the lock and the data it protects into one struct, so a glance at the code tells you what the lock protects.

Rust development practical experience

Having covered some basic features of Rust, let me turn to our experience using Rust in our project.

Cargo

If you want to use Rust for project development, the first thing you need to know is Cargo. Cargo is Rust's build and package management tool, and it is now the standard for Rust projects. Cargo is simple to use; you can browse the official site ( https://crates.io/ ) directly.

quick_error!

In the beginning, when writing C programs, we defined different int return values to indicate whether a function had failed. In C++ we can handle errors with exceptions, but which approach to adopt is still unsettled.

Go settles it by convention: the last return value of a function is an error. There is also an official blog post introducing Go's error handling ( http://blog.golang.org/error-handling-and-go ).

Rust also has a convention for error handling: Result. Result is an enum, defined as follows:

In other words, our functions can return a Result and let the caller check it: Ok carries the result of successful processing, and Err carries an error.
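
The shape of Result and the way a caller checks it can be sketched as follows. The standard library definition is shown as a comment; divide is a hypothetical function used only to produce both variants.

```rust
// Definition in the standard library, shown here for reference:
// enum Result<T, E> {
//     Ok(T),
//     Err(E),
// }

// `divide` is an illustrative function that can fail.
fn divide(a: i32, b: i32) -> Result<i32, String> {
    if b == 0 {
        Err("division by zero".to_string())
    } else {
        Ok(a / b)
    }
}

fn main() {
    // The caller inspects the Result and handles both cases.
    match divide(10, 2) {
        Ok(v) => println!("ok: {}", v),
        Err(e) => println!("error: {}", e),
    }
}
```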

Here is a detailed description of error handling ( https://doc.rust-lang.org/book/error-handling.html ).

Usually everyone handles errors according to this convention: define your own error type, implement the From conversions that turn other errors into it, and use try! to simplify error handling in the code.
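
A sketch of that convention using only the standard library. MyError and parse_and_double are hypothetical names. The ? operator used here is the current replacement for the try! macro mentioned in the talk; it performs the same From conversion.

```rust
use std::fmt;
use std::num::ParseIntError;

// A project-specific error type. `MyError` is an illustrative name.
#[derive(Debug)]
enum MyError {
    Parse(ParseIntError),
    Negative(i64),
}

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MyError::Parse(e) => write!(f, "parse error: {}", e),
            MyError::Negative(n) => write!(f, "negative value: {}", n),
        }
    }
}

impl std::error::Error for MyError {}

// `From` lets `?` convert a ParseIntError into MyError automatically.
impl From<ParseIntError> for MyError {
    fn from(e: ParseIntError) -> MyError {
        MyError::Parse(e)
    }
}

// Parses a non-negative integer and doubles it.
fn parse_and_double(s: &str) -> Result<i64, MyError> {
    // On failure, `?` returns early with MyError::Parse.
    let n: i64 = s.trim().parse()?;
    if n < 0 {
        return Err(MyError::Negative(n));
    }
    Ok(n * 2)
}

fn main() {
    println!("{:?}", parse_and_double("21"));
    println!("{:?}", parse_and_double("abc"));
}
```

Even for one wrapped error the Display, Error, and From implementations add up, and every additional error source needs another variant and another From implementation.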

But we soon ran into a serious annoyance: defining our own error and converting other errors into it is redundant and tedious, so we use quick_error! ( http://dwz.cn/2XZpNo ) to simplify the whole process.

Clippy

When I first wrote Go code, I was very impressed by Go's fmt: no more disputes over coding style. Rust has the corresponding rustfmt, but what surprised me more was Rust's Clippy tool. Clippy does not bother with coding style; it tells you directly that code written this way is wrong and should be written that way.

A very simple example:

This code compiles, but with Clippy enabled it reports:

In other words: don't use to_string here, use to_owned.

MIO

We all know that to build a high-performance network service, the usual choice is an event-based network model such as epoll. In Rust, the mature library for this at present is MIO. MIO is an asynchronous IO library that provides a unified abstraction over different operating systems: epoll on Linux, kqueue on UNIX, and IOCP on Windows. To unify across platforms, however, MIO makes compromises in some parts of its implementation.

For example, Linux provides eventfd directly, but to stay compatible with UNIX, MIO uses a traditional pipe instead of eventfd to wake up the event loop.

One thing worth mentioning separately is the thread communication channel that MIO provides. Although we could use Rust's own channel for communication between threads, once MIO is in use I prefer MIO's own channel. The main reason is that it uses a lock-free queue and performs better, but the queue has a size limit: if the sender sends too frequently and the receiver does not keep up, sends will fail.

The flaws of the Rust language

Our team has been developing in Rust for several months, and of course we have run into some rough spots.

The first is the immature library ecosystem. Compared with Go, Rust's libraries are really incomplete. I think the lack of libraries is a big reason why Rust still has no large-scale applications at this stage.

TiKV is a server program, so it naturally involves network programming. The official net module currently supports only blocking sockets, which cannot be used to build high-performance network programs. Fortunately there is MIO, but MIO alone is not enough. In Go we can easily write RPC with gRPC, but in Rust I think we will have to wait a long time to see an open source implementation.

Next, on Mac OS X, the stack trace produced by a panic is completely unreadable: there is no file or line number information, so bugs cannot be tracked down conveniently.

Of course, Rust is a relatively new language after all, and it is still being improved and developed. We are still confident that it will get better and better.

Q & A

  1. What is the difference between Go's Cgo and Rust FFI in terms of efficiency?
    Tang Liu: I wrote a simple test that calls Snappy's MaxCompressedLength function 10,000 times in a loop, and found that Rust's FFI is an order of magnitude faster than Go's Cgo. Although the test is not rigorous, it at least shows that Rust's FFI performs better.

  2. According to the official introduction, Rust's preferred platform is Windows. Can it generate DLLs? What IDE do you use?
    Tang Liu: I haven't used Windows, so I don't know how to generate DLLs. For Rust development we use the usual editors, such as Vim, Emacs, and Sublime; all of them have Rust plug-in support.

  3. Is it convenient for Rust to call C libraries?
    Tang Liu: It is very convenient for Rust to call C through FFI, and there is documentation for it ( https://doc.rust-lang.org/book/ffi.html ). But this is cross-language code after all, so it is not going to look pretty. FFI calls also need unsafe, so we usually wrap them in a layer of Rust functions.

  4. How does Rust perform?
    Tang Liu: Rust's performance is hard to quantify, because we have only compared it with Go in some specific scenarios, such as the Cgo vs FFI test, where it performs better than Go. In addition, Rust is a static language without GC, so I don't think its performance will be a problem; otherwise Dropbox would not have written its S3-like service in Rust.

  5. How is the ecosystem around Rust? For example, bindings for commonly used third-party systems such as databases and message queues?
    Tang Liu: Rust's ecosystem is, frankly, not comparable with that of Go or Java. Although there are bindings for common systems such as MySQL, I haven't used them. The main reason is that the official network IO is synchronous, so multithreading is required for this kind of processing, and with MIO's asynchronous model, as everyone knows, the code logic becomes badly fragmented. So we usually don't use Rust to develop such complex business systems; Go feels more suitable for that.

This article was planned by Liu Yun, with poster by Tang Duanrong, editing by You Qian and Hao Yaqi, and broadcast by Yin Wenyu and Yin Xuegang. To discuss more about Rust development, follow the official account for a chance to join the group. When reposting, please credit the High Availability Architecture "ArchNotes" WeChat official account and include its QR code.

Origin blog.51cto.com/14977574/2547865