A brief discussion on Rust memory management

Rust has been favored by many developers in recent years because of its distinctive approach to memory management, whose core feature is ownership. Different languages adopt different memory management strategies, which are mainly divided into manual management by developers and garbage collection. Rust's ownership mechanism, in which the compiler assists with memory management, is different from these two.

Stack memory and heap memory

We know that programs create data on the heap or on the stack. Creating data on the stack is easy: as long as the size of the data is known, moving the stack pointer opens up the required space. At the source level we access the data through a variable; at the assembly level it is just an offset relative to the top of the stack. This offset is usually small, so accessing stack data is very fast. Creating data on the heap is more troublesome. The allocator has to find a block of suitable size in memory and mark it as in use, and the program has to store the block's starting address along with bookkeeping such as its capacity and the size actually used. To access this memory, the program must first follow the recorded starting address, so the process is relatively slow.

We use heap memory because it has some advantages that stack memory does not have:

  1. It can be dynamically allocated. All data on the stack must have a known, fixed size, so the relative position and size of stack variables are already known at compile time. On the heap we can allocate data of any permitted size as needed, and nothing is fixed about where those allocations sit relative to each other.
  2. Heap memory can be shared. Because the pointer is separate from the memory it points to, multiple pointers can point to the same memory and share the data. A stack variable is its own storage, so two variables mean two copies.
  3. Its life cycle is controllable. When a stack frame is popped, the memory allocated in it is destroyed. Memory on the heap can outlive the frame that created it. In some languages developers must free it manually; others rely on a collection mechanism to clean it up.
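As a rough illustration of the first point, the sketch below contrasts fixed-size values with a vector whose length is only known at run time. The helper name make_squares is made up for this example.

```rust
// Hypothetical helper: builds a vector whose length is only known at run time,
// so its elements have to be allocated on the heap.
fn make_squares(n: usize) -> Vec<u64> {
    (0..n as u64).map(|i| i * i).collect()
}

fn main() {
    // Fixed-size values: their size is known at compile time,
    // so they can be placed directly in the stack frame.
    let count: usize = 4;
    let point = (1.5_f64, 2.5_f64);

    // Only the vector's small header (pointer, capacity, length) sits in the
    // stack frame; the elements themselves live on the heap.
    let squares = make_squares(count);

    println!("{:?} {:?}", point, squares);
}
```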

Ownership rules

Let’s first state Rust’s ownership rules:

  1. Every value in Rust has an owner.
  2. A value has exactly one owner at any time.
  3. When the owner (variable) goes out of scope, the value is dropped.

The scope here is the region of code in which the variable is valid. From what we said about the stack, variables on the stack already fit this description. A stack variable is its own storage, so the variable is the only owner of its value. Once we leave the variable's scope, the compiler no longer lets us use it; the value has been discarded, and its memory is reclaimed when the stack frame is popped. For heap memory, however, languages such as C and C++ do not enforce these rules. Some languages manage memory by inserting code at the end of a variable's scope that releases the memory the variable points to. But if several pointers point to the same memory, it would be released more than once. So some languages use reference counting, where every pointer variable is an owner of the memory: when a variable leaves scope, the reference count is only decremented, and the memory is released once the count drops to 0. Rust instead guarantees at the source that there is only one owner at any time, so the value can be dropped as soon as that owner leaves scope.
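The third rule can be observed directly. In this sketch, Noisy is a hypothetical type whose Drop implementation records its name in a log, so the order of the entries shows when each value is dropped. RefCell is used here only so that the log can be written through a shared reference.

```rust
use std::cell::RefCell;

// Hypothetical type that records its own destruction in a shared log.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    // Runs automatically when the owning variable goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
        } // `_inner` goes out of scope here and is dropped
        log.borrow_mut().push("between");
    } // `_outer` goes out of scope here and is dropped
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```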

How variables interact with data

So how are these rules enforced? Variables can interact with data in several ways: moving, cloning, and referencing (borrowing).

Moving is a behavior that distinguishes Rust from other languages. In C/C++, multiple pointer variables can point to the same memory, and free or delete can be called on any one of them. To uphold the single-owner rule, when you make another variable point to the memory held by the current variable, Rust assumes that you will not use the previous variable again: ownership is transferred to the new variable, and the compiler rejects any later use of the old one.
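A small sketch of a move, using a hypothetical function move_keeps_buffer. It checks that the new owner points at the same heap buffer as the old one, which shows that moving transfers ownership without copying the heap data.

```rust
// Hypothetical helper: returns true if the heap buffer is unchanged by a move.
fn move_keeps_buffer() -> bool {
    let s1 = String::from("hello");
    let before = s1.as_ptr(); // address of the heap buffer

    let s2 = s1; // ownership moves from `s1` to `s2`

    // println!("{}", s1); // would not compile: `s1` was moved (error E0382)

    // The new owner still points at the same heap buffer.
    s2.as_ptr() == before
} // `s2` goes out of scope here and the buffer is freed exactly once

fn main() {
    println!("same buffer after move: {}", move_keeps_buffer());
}
```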

If you want to keep the current variable valid, one option is to clone the data into new memory, so that there are two allocations with two owners and the rules are not violated. For heap data, this usually means calling the clone method explicitly. For stack-only types such as integers, whose size is known at compile time, the copy happens automatically. A custom type gets the same behavior by implementing the Copy trait, which is only possible when all of its fields are themselves Copy.
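The difference between an explicit clone and an automatic copy might look like this. Point and clone_and_copy are hypothetical names chosen for the example.

```rust
// Hypothetical stack-only type: all fields are Copy, so the type can be Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn clone_and_copy() -> (String, String, Point, Point) {
    let s1 = String::from("hello");
    let s2 = s1.clone(); // explicit deep copy: a second heap buffer, a second owner

    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // implicit copy: `p1` stays valid because Point is Copy

    (s1, s2, p1, p2) // all four values are still usable here
}

fn main() {
    let (s1, s2, p1, p2) = clone_and_copy();
    println!("{} {} {:?} {:?}", s1, s2, p1, p2);
}
```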

However, cloning a large block of memory is often expensive, and frequently we do not need a new copy at all. Can another variable borrow this memory without taking ownership? Yes: this is what a reference is. A reference neither transfers ownership nor adds an owner. But references introduce new problems.
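As a minimal sketch of borrowing, the function below reads a String through a reference. The name calculate_length is an assumption for this example; in everyday code a parameter of type &str would be more general.

```rust
// Borrows the string instead of taking ownership of it.
fn calculate_length(s: &String) -> usize {
    s.len()
} // the reference goes out of scope here; the String itself is not dropped

fn main() {
    let s = String::from("hello");
    let len = calculate_length(&s); // lend `s` for the duration of the call

    // `s` is still owned by `main` and can be used afterwards.
    println!("the length of '{}' is {}", s, len);
}
```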

References

Data races

References let other variables access the same block of memory without transferring ownership, which is great. By default, however, a reference can only read the value, not modify it, unless you declare a mutable reference:

fn main() {
    let mut s = String::from("hello");

    change(&mut s);
}

fn change(some_string: &mut String) {
    some_string.push_str(", world");
}

Now that we have references that can both read and write, everything looks great, but this introduces the problem of data races.

A data race is similar to a race condition and occurs when these three behaviors happen together:

  • Two or more pointers access the same data simultaneously.
  • At least one pointer is used to write data.
  • There is no mechanism to synchronize data access.

Users of an immutable reference don’t want the value to change under their noses! Multiple immutable references are fine, because nobody who can only read the data can affect what others read. Therefore, two mutable references may not overlap, and a mutable reference may not overlap with any immutable reference; otherwise the compiler reports an error. A reference's scope runs from where it is introduced to its last use.
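The overlap rule can be sketched as follows. The function name borrow_in_turn is made up for this example; the commented-out lines show a use that the compiler would reject.

```rust
fn borrow_in_turn() -> String {
    let mut s = String::from("hello");

    // Any number of immutable references may coexist.
    let r1 = &s;
    let r2 = &s;
    let total_len = r1.len() + r2.len(); // last use of `r1` and `r2`
    assert_eq!(total_len, 10);

    // Allowed: the immutable borrows above are no longer in use.
    let r3 = &mut s;
    r3.push_str(", world");

    // let r4 = &s;          // if `r3` were used after this line, the two
    // r3.push_str("!");     // borrows would overlap: error E0502

    s
}

fn main() {
    println!("{}", borrow_in_turn());
}
```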

Dangling references

In a language with pointers, it is easy to create a dangling pointer by mistake: the memory is freed while a pointer to it is kept. The memory a dangling pointer refers to may already have been handed to another holder. In Rust, by contrast, the compiler guarantees that a reference never dangles: when you hold a reference to some data, the compiler ensures that the data does not go out of scope before the reference does. Rust establishes the validity of a reference by checking lifetimes.
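A short sketch of what the compiler rejects and one common way around it. The function names dangle and no_dangle are chosen for illustration; the rejected version is left as a comment so that the example still compiles.

```rust
// Rejected by the compiler: `s` is dropped when the function returns,
// so the returned reference would point at freed memory.
//
// fn dangle() -> &String {
//     let s = String::from("hello");
//     &s
// }

// Accepted: ownership of the String is moved out to the caller instead.
fn no_dangle() -> String {
    let s = String::from("hello");
    s
}

fn main() {
    let s = no_dangle();
    println!("{}", s);
}
```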

Lifetime annotations

The main goal of lifetimes is to avoid dangling references. Most of the time lifetimes are implicit and can be inferred, but when the lifetimes of several references could be related in more than one way, Rust requires us to state the relationship with generic lifetime parameters, so that the references actually used at runtime are guaranteed to be valid. Let’s look at an example that the compiler rejects:

fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";

    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}

fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

When we define this function, we don't know the concrete values that will be passed in, so we don't know whether the if branch or the else branch will execute. We also don't know the concrete lifetimes of the incoming references, so we can't tell from the scopes whether the returned reference will always be valid. The borrow checker can't tell either, and the code fails to compile. To fix this error, we add a generic lifetime parameter that defines the relationship between the references so that the borrow checker can analyze it.

Lifetime annotations do not change how long any reference lives. Instead, they describe how the lifetimes of multiple references relate to each other. Just as a function whose signature has a generic type parameter can accept any type, a function with a generic lifetime parameter can accept references with any lifetime.

Lifetime annotations have a slightly unusual syntax: a lifetime parameter name must start with an apostrophe ('), and names are usually all lowercase and very short, like generic type names. Most people use 'a for the first lifetime annotation. The annotation is placed after the & of the reference, with a space separating it from the referenced type.

In short, lifetime annotations tell Rust which parameters' lifetimes are tied to the lifetime of the return value. The returned reference can only be used in the region where those lifetimes overlap.
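With a generic lifetime parameter added, the longest function from the example above compiles. This is a sketch of the usual fix: both parameters and the return value share the lifetime 'a.

```rust
// 'a ties the returned reference to both inputs: the result is valid
// only as long as both `x` and `y` are valid.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";

    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}
```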

Smart pointers

Smart pointers are data structures that behave like pointers but carry additional metadata and functionality. One difference between ordinary references and smart pointers in Rust is that a reference is a pointer that only borrows data, whereas in most cases a smart pointer owns the data it points to.

Smart pointers are usually implemented as structs. What sets them apart from ordinary structs is that they implement the Deref and Drop traits. The Deref trait lets an instance of the smart pointer struct behave like a reference, so you can write code that works with both references and smart pointers. The Drop trait lets us customize the code that runs when a smart pointer goes out of scope.
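To make the two traits concrete, here is a minimal sketch of a home-made smart pointer. MyBox and greet are hypothetical names, and unlike the real Box<T>, MyBox does not allocate on the heap; it only demonstrates Deref and Drop.

```rust
use std::ops::Deref;

// Hypothetical wrapper type used only to demonstrate the two traits.
struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    // Lets `*my_box` and deref coercion reach the wrapped value.
    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> Drop for MyBox<T> {
    // Custom cleanup code that runs when the value goes out of scope.
    fn drop(&mut self) {
        println!("dropping MyBox");
    }
}

fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let b = MyBox(String::from("Rust"));
    // Deref coercion turns &MyBox<String> into &String and then into &str.
    println!("{}", greet(&b));
} // `b` goes out of scope here and `drop` runs
```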

Box<T>

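As mentioned earlier, a value on the stack is not shared by multiple variables; it is only copied. Sometimes, though, we want to allocate a value on the heap. One case where we have no choice is a recursive type. Box<T> lets us access a value through a pointer-like variable. It is mostly used in the following scenarios: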

  • When you have a type whose size is unknown at compile time and you want to use a value of that type in a context that requires the exact size
  • When you have a large amount of data and want to transfer ownership while making sure the data is not copied
  • When you want to own a value and only care that its type implements a specific trait rather than what its specific type is
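The recursive-type case can be sketched with a simple linked list. List and sum are names chosen for this example. Without the Box, the compiler could not compute the size of List, because the type would contain itself directly.

```rust
// A recursive type: Box gives the `Cons` variant a known, pointer-sized field.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

// Adds up all the values in the list.
fn sum(list: &List) -> i32 {
    match list {
        // `rest` is a &Box<List>; deref coercion turns it into &List.
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

fn main() {
    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("sum = {}", sum(&list));
}
```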

Rc<T>

We mentioned earlier that Rust uses single ownership. If you want multiple ownership, you need reference counting, which Rust's Rc<T> type provides. Rc<T> can only be used in single-threaded scenarios. The reference count is incremented through Rc::clone(). Rc<T> allows data to be shared read-only between multiple parts of the program via immutable references. The reference count of Rc<T> is called strong_count, and the reference count of Weak<T> is called weak_count.
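How the strong count changes can be observed with Rc::strong_count. In this sketch, share_counts is a hypothetical function that samples the count at four points.

```rust
use std::rc::Rc;

// Returns the strong count observed at four points in time.
fn share_counts() -> (usize, usize, usize, usize) {
    let a = Rc::new(String::from("shared"));
    let after_new = Rc::strong_count(&a);

    let b = Rc::clone(&a); // increments the count; the String is not copied
    let after_clone = Rc::strong_count(&a);

    let inside_block = {
        let _c = Rc::clone(&b);
        Rc::strong_count(&a)
    }; // `_c` is dropped here, which decrements the count

    let after_block = Rc::strong_count(&a);
    (after_new, after_clone, inside_block, after_block)
}

fn main() {
    println!("{:?}", share_counts());
}
```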

RefCell<T>

We mentioned earlier that the borrowing rules forbid changing data while an immutable reference to it exists. In certain cases, however, it is useful for a value to be able to modify itself inside its own methods while still appearing immutable to other code; code outside the value's methods cannot modify it. RefCell<T> is one way to obtain this interior mutability. RefCell<T> does not bypass the borrowing rules: the compiler's borrow checker permits the interior mutability, and the borrowing rules are checked at runtime instead. If the rules are violated, the program panics instead of failing to compile.

When creating immutable and mutable references, we use the & and &mut syntax respectively. With RefCell<T>, we use the borrow and borrow_mut methods, which are part of RefCell<T>'s safe API. The borrow method returns a smart pointer of type Ref<T>, and borrow_mut returns a smart pointer of type RefMut<T>. Both types implement Deref, so they can be treated like regular references.

RefCell<T> keeps track of how many Ref<T> and RefMut<T> smart pointers are currently active. Each call to borrow increases the count of active immutable borrows by one; when a Ref<T> value goes out of scope, the count decreases by one. Just like the compile-time borrowing rules, RefCell<T> allows either multiple immutable borrows or one mutable borrow at any time.
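The runtime check can be sketched as follows. The function name refcell_demo is made up; try_borrow_mut is used in place of borrow_mut so that the conflict shows up as an Err value instead of a panic.

```rust
use std::cell::RefCell;

fn refcell_demo() -> (Vec<i32>, bool) {
    // Note that `cell` is not declared `mut`.
    let cell = RefCell::new(vec![1, 2]);

    {
        let mut writer = cell.borrow_mut(); // RefMut<Vec<i32>>
        writer.push(3);
    } // the mutable borrow ends here

    // Several immutable borrows may be active at the same time.
    let r1 = cell.borrow();
    let r2 = cell.borrow();
    assert_eq!(*r1, *r2);

    // While `r1` and `r2` are alive, a mutable borrow breaks the rules.
    // `borrow_mut` would panic here; `try_borrow_mut` returns Err instead.
    let conflict = cell.try_borrow_mut().is_err();

    ((*r1).clone(), conflict)
}

fn main() {
    println!("{:?}", refcell_demo());
}
```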

Summary: Rc<T> provides multiple ownership, whereas RefCell<T> has a single owner. Both allow multiple references, but the references obtained from Rc<T> are immutable, while RefCell<T> can hand out mutable references. Box<T> allows immutable or mutable borrows that are checked at compile time; RefCell<T> allows immutable or mutable borrows that are checked at runtime.

Weak<T>

Languages that use reference counting all face the same problem: circular references keep the reference count from ever reaching 0, which leaks memory. In Rust, Weak<T> can be used to solve this problem, because a weak reference does not keep its target alive.
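A common way to apply this is a tree in which children are held through Rc<T> and the parent is held through Weak<T>, so that no strong cycle forms. The sketch below assumes a hypothetical Node type and a function link_parent_and_child.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical tree node: strong references point down, weak ones point up.
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn link_parent_and_child() -> (usize, usize, Option<i32>, bool) {
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
    let branch = Rc::new(Node {
        value: 5,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });
    assert_eq!(branch.children.borrow().len(), 1);

    // The child points back at its parent without owning it.
    *leaf.parent.borrow_mut() = Rc::downgrade(&branch);

    let strong = Rc::strong_count(&branch); // only `branch` itself
    let weak = Rc::weak_count(&branch); // the back-pointer held by `leaf`
    let parent_value = leaf.parent.borrow().upgrade().map(|p| p.value);

    // The weak back-pointer does not keep the parent alive.
    drop(branch);
    let parent_gone = leaf.parent.borrow().upgrade().is_none();

    (strong, weak, parent_value, parent_gone)
}

fn main() {
    println!("{:?}", link_parent_and_child());
}
```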

References:

1. The Rust Programming Language (Simplified Chinese edition)

Origin blog.csdn.net/Mamong/article/details/132955650