[Learn Rust programming with Xiaojia] 15. Smart pointers

Series Article Directory

[Learn Rust programming with Xiaojia] 1. The basics of Rust programming
[Learn Rust programming with Xiaojia] 2. Use of Rust package management tools
[Learn Rust programming with Xiaojia] 3. Basic programming concepts in Rust
[Learn Rust programming with Xiaojia] 4. Understand the concept of ownership in Rust
[Learn Rust programming with Xiaojia] 5. Use structs to associate structured data
[Learn Rust programming with Xiaojia] 6. Enumerations and pattern matching
[Learn Rust programming with Xiaojia] 7. Use packages, crates, and modules to manage projects
[Learn Rust programming with Xiaojia] 8. Common collections
[Learn Rust programming with Xiaojia] 9. Error handling
[Learn Rust programming with Xiaojia] 11. Write automated tests
[Learn Rust programming with Xiaojia] 12. Build a command line program
[Learn Rust programming with Xiaojia] 13. Functional language features: iterators and closures
[Learn Rust programming with Xiaojia] 14. About Cargo and Crates.io
[Learn Rust programming with Xiaojia] 15. Smart pointers

Foreword

A pointer is a variable that holds a memory address, and that address points to another piece of data. The most common kind of pointer in Rust is the reference. What sets Rust references apart is that they also carry the meaning of borrowing: a reference borrows a value owned by another variable.

The main reference for this article is "The Rust Programming Language".


1. Smart pointers

1.1. Smart pointers

A smart pointer is a more complex data structure that acts like a pointer but carries more information than a reference, such as metadata (for example, the current length and the capacity).

We have already met several smart pointers in earlier chapters, such as the growable string String and the growable array Vec.

Smart pointers are usually implemented as structs. What distinguishes them from ordinary structs is that they implement the Deref and Drop traits:

  • Deref: lets a smart pointer behave like a reference, so the same code can work with both smart pointers and references, for example when dereferencing with *
  • Drop: lets you specify code that runs automatically when the smart pointer goes out of scope, for example to clean up data

1.2. Box<T> heap memory allocation

In Rust, values are allocated on the stack by default. Wrapping a value in Box<T> allocates it on the heap instead. Box<T> is a smart pointer because it implements the Deref trait, which allows a Box<T> to be treated like a reference. It also implements the Drop trait, so when a Box<T> goes out of scope, both the heap data it points to and the box itself are released.

Usage scenarios

  • The size of a type cannot be determined at compile time, but the context in which it is used needs to know its exact size;
  • You have a large amount of data and want to transfer ownership while making sure the data is not copied;
  • You own a value and only care that it implements a particular trait, not what its concrete type is.
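The third scenario can be sketched with a trait object. This is a minimal example, assuming a hypothetical Shape trait and two hypothetical implementers (Square and Rect); none of these names come from the standard library.

```rust
// Hypothetical trait and types, used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Box<dyn Shape> has a fixed size regardless of the concrete type behind it,
// so values of different types can be stored in the same slice or Vec.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(1.0, 3.0))];
    println!("total area = {}", total_area(&shapes));
}
```

The caller only relies on the Shape trait; the concrete type of each element is resolved at runtime through dynamic dispatch.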

1.2.1. Scenario 1: allocating data on the heap

fn main() {
    let a = Box::new(1);  // Immutable
    println!("{}", a);    // Output: 1

    let mut b = Box::new(1);  // Mutable
    *b += 1;
    println!("{}", b);    // Output: 2
}

The defining feature of Box<T> is single ownership: at any given time the data it points to has exactly one owner, and there may be either one mutable reference or any number of immutable references to it. This matches the behavior of other heap-backed data in Rust.

1.2.2. Scenario 2: cons list

The cons list is a data structure that comes from Lisp. Each item in a cons list holds two elements: the value of the current item and the next item. The last item holds only a value called Nil and has no next item.

A recursive type like this would have an infinite size if each item stored the next item directly. Box<T> is a pointer, and Rust always knows how much space a pointer needs, because the size of a pointer does not depend on the size of the data it points to.

use crate::List::{Cons, Nil};

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}

enum List {
    Cons(i32, Box<List>),
    Nil,
}
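To show how such a list is consumed, here is a small sketch that walks a cons list recursively. The sum function is a hypothetical helper added for illustration, and the List type is redefined so the example stands on its own.

```rust
enum List {
    Cons(i32, Box<List>),
    Nil,
}

// Hypothetical helper: adds up every value in the list.
// `rest` is a &Box<List>, which Deref coercion turns into &List for the recursive call.
fn sum(list: &List) -> i32 {
    match list {
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

fn main() {
    let list = List::Cons(
        1,
        Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))),
    );
    println!("sum = {}", sum(&list));
}
```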

1.3. Deref: dereferencing

1.3.1. The Deref trait

The Deref trait lets us customize the behavior of the dereference operator *. A smart pointer that implements Deref can be treated like a reference, that is, the * operator can be used to dereference it. The standard library implements it for Box<T> like this:

#[stable(feature = "rust1", since = "1.0.0")]
impl<T: ?Sized> Deref for Box<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &**self
    }
}
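The same trait can be implemented for your own types. The sketch below follows the approach used in "The Rust Programming Language"; MyBox is a hypothetical wrapper that stores its value inline (it does not allocate on the heap) and exists only to show how * works with Deref.

```rust
use std::ops::Deref;

// Hypothetical wrapper type, used only to illustrate Deref.
struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(value: T) -> MyBox<T> {
        MyBox(value)
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;

    // Returns a reference to the wrapped value.
    fn deref(&self) -> &T {
        &self.0
    }
}

fn main() {
    let x = MyBox::new(5);
    // `*x` is shorthand for `*(x.deref())`.
    assert_eq!(5, *x);
    println!("{}", *x);
}
```

Without the Deref implementation, the expression *x would not compile, because the compiler would not know how to dereference MyBox.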

1.3.2. Three Deref conversions

The conversion shown above is the immutable Deref conversion. Rust also supports converting a mutable reference to another mutable reference, and a mutable reference to an immutable reference. The rules are as follows:

  • When T: Deref<Target = U>, &T can be converted to &U, which is the case we saw above.
  • When T: DerefMut<Target = U>, &mut T can be converted to &mut U.
  • When T: Deref<Target = U>, &mut T can be converted to &U.
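The three rules can be seen in action when passing a Box<String> to functions that expect plain references. This is a minimal sketch; shout and append_bang are hypothetical helper functions.

```rust
// Accepts any &str.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

// Accepts a &mut String.
fn append_bang(s: &mut String) {
    s.push('!');
}

fn main() {
    let mut boxed = Box::new(String::from("hello"));

    // Rule 1: &Box<String> -> &String -> &str
    println!("{}", shout(&boxed));

    // Rule 2: &mut Box<String> -> &mut String
    append_bang(&mut boxed);

    // Rule 3: &mut Box<String> -> &str
    println!("{}", shout(&mut boxed));
}
```

In each call the compiler inserts the deref calls automatically, so no explicit * or & juggling is needed at the call site.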

1.4. Drop: releasing resources

1.4.1. The Drop trait

The main purpose of the Drop trait is to release the resources owned by an instance of the implementing type. It has a single method, drop, which is called automatically when the instance goes out of scope and runs the code supplied by the implementer.

#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<#[may_dangle] T: ?Sized> Drop for Box<T> {
    fn drop(&mut self) {
        // FIXME: Do nothing, drop is currently performed by compiler.
    }
}

1.4.2. Dropping early with std::mem::drop

Rust does not allow you to call the drop method of the Drop trait manually, but you can use the standard library function std::mem::drop to drop a value before it goes out of scope.
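As an illustration, the sketch below uses a hypothetical Guard type that records each drop in a shared counter, so the effect of an early drop can be observed. The names Guard, name, and drops are assumptions made for this example.

```rust
use std::cell::Cell;

// Hypothetical type that counts and logs its own drops.
struct Guard<'a> {
    name: &'static str,
    drops: &'a Cell<u32>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
        println!("dropping {}", self.name);
    }
}

fn main() {
    let drops = Cell::new(0);
    let a = Guard { name: "a", drops: &drops };
    let _b = Guard { name: "b", drops: &drops };

    // a.drop(); // rejected by the compiler: explicit destructor calls are not allowed
    drop(a); // std::mem::drop is in the prelude; `a` is moved in and dropped here
    println!("drops so far = {}", drops.get());
    // `_b` is dropped automatically when main returns.
}
```

Because drop takes its argument by value, the variable a can no longer be used after the call, which is what prevents a double drop.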

1.5. Reference-counted smart pointers (Rc<T> and Arc<T>)

1.5.1. Rc<T>

Rc<T> is mainly used when several parts of a program need read-only access to the same heap-allocated data. It is more elegant and intuitive than creating a Box<T> and then handing out multiple immutable references, and Rc<T> supports multiple ownership.

Rc stands for reference counting. At runtime, Rc<T> keeps track of how many Rc<T> pointers currently refer to the data, and the data is released when the count reaches zero (similar to the reference counting used by Python). Maintaining this count has a small runtime cost.

use std::rc::Rc;

fn main() {
    let a = Rc::new(1);
    println!("count after creating a = {}", Rc::strong_count(&a));
    let b = Rc::clone(&a);
    println!("count after creating b = {}", Rc::strong_count(&a));
    {
        let c = Rc::clone(&a);
        println!("count after creating c = {}", Rc::strong_count(&a));
    }
    println!("count after c goes out of scope = {}", Rc::strong_count(&a));
}

Points to note

  • Rc<T> only gives shared, immutable access to its data. It can be understood as several read-only pointers to the same memory;
  • Rc<T> only works within a single thread. Conceptually, sharing read-only pointers between threads is safe, but Rc<T> does not update its count atomically, so the count could become inconsistent across threads. If you try to send an Rc<T> to another thread, the compiler reports an error.

1.5.2. Atomic reference counting (Arc<T>)

Arc<T> (atomically reference counted) updates its count with atomic operations, so it can be used safely across threads.

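use std::sync::Arc;
use std::thread;

fn main() {
    let a = Arc::new(1);
    // Clone the Arc for the new thread, so `a` stays usable in main.
    let b = Arc::clone(&a);
    let handle = thread::spawn(move || {
        println!("{}", b);  // Output: 1
    });
    // join returns a Result; unwrap surfaces a panic from the spawned thread.
    handle.join().unwrap();
    println!("{}", a);  // Output: 1
}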

1.6. Interior mutability with Cell and RefCell

1.6.1. Interior mutability

Interior mutability is a design pattern in Rust that lets you modify data even when you only hold an immutable reference to it. Internally, these data structures use unsafe code to bypass Rust's usual mutability and borrowing rules.

1.6.2. Cell

Cell and RefCell serve the same purpose. The difference is that Cell<T> is meant for types T that implement Copy: values are copied in and out with set and get instead of being borrowed.
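A minimal sketch of Cell in use, assuming a hypothetical Counter struct whose hit method updates a field through a shared reference:

```rust
use std::cell::Cell;

// Hypothetical struct: `hits` can change even through a shared reference.
struct Counter {
    hits: Cell<u32>,
}

impl Counter {
    // Takes &self, not &mut self, yet still updates the count.
    fn hit(&self) -> u32 {
        self.hits.set(self.hits.get() + 1);
        self.hits.get()
    }
}

fn main() {
    let counter = Counter { hits: Cell::new(0) };
    counter.hit();
    counter.hit();
    println!("hits = {}", counter.hits.get());
}
```

Note that counter is not declared mut. Since get copies the value out, no reference into the Cell is ever handed out, which is why there is nothing to check at runtime.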

1.6.3. RefCell

Because Cell is aimed at types that implement Copy, it is not used much in practice. The problem we usually need to solve is the conflict between mutable and immutable references, and for that we need RefCell.

Rust rule | Additional rule brought by smart pointers
A value has exactly one owner | Rc/Arc lets a value have multiple owners
Either multiple immutable borrows or one mutable borrow | RefCell lets mutable and immutable borrows coexist at compile time
Breaking the rules causes a compile error | Breaking the rules causes a runtime panic

As you can see, combining Rc/Arc with RefCell eases some situations that Rust's strict ownership and borrowing rules make awkward. But they are not silver bullets. RefCell, for example, does not actually make it legal for a mutable reference to coexist with other references. It only postpones the check from compile time to runtime, so a compiler error becomes a panic.
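The sketch below illustrates both points: Rc<RefCell<T>> gives two owners mutable access to one value, and the borrowing rules are still enforced, only at runtime. The helper can_borrow_mut is hypothetical; it uses try_borrow_mut, which returns an Err where borrow_mut would panic.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical helper: reports whether a mutable borrow would succeed right now.
fn can_borrow_mut(cell: &RefCell<Vec<i32>>) -> bool {
    cell.try_borrow_mut().is_ok()
}

fn main() {
    // Two owners (Rc) of one mutable value (RefCell).
    let shared = Rc::new(RefCell::new(vec![1, 2]));
    let other = Rc::clone(&shared);

    other.borrow_mut().push(3); // mutate through one owner
    println!("{:?}", shared.borrow()); // observe the change through the other

    let reader = shared.borrow(); // an immutable borrow is now active
    // Calling other.borrow_mut() here would panic: the rules are checked at runtime.
    println!("can mutate while reading: {}", can_borrow_mut(&other));
    drop(reader);
    println!("can mutate after the reader is gone: {}", can_borrow_mut(&other));
}
```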

1.6.4. Cell and RefCell

  • Cell is intended for Copy types and hands out values, while RefCell hands out references
  • Cell never panics, but RefCell can
  • Cell has no additional runtime overhead

From the CPU's point of view, the costs of Rc and RefCell are as follows:

  • Dereferencing an Rc is free (it is resolved at compile time), but the indirection introduced by * is not
  • Cloning an Rc compares the current reference count with 0 and usize::MAX, then increments the count
  • Dropping an Rc decrements the count, then compares it with 0
  • Taking an immutable borrow of a RefCell increments a borrow counter of type isize, then compares it with 0
  • Releasing an immutable borrow of a RefCell decrements the isize counter
  • Taking a mutable borrow of a RefCell is similar, but the counter is first compared with 0 and then decremented
  • Releasing a mutable borrow of a RefCell increments the isize counter

1.6.5. Resolving borrowing conflicts

Two very useful methods were added in Rust 1.37:

  • Cell::from_mut, which converts &mut T to &Cell<T>
  • Cell::as_slice_of_cells, which converts &Cell<[T]> to &[Cell<T>]
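A typical use is updating a slice in place while reading neighbouring elements, which would otherwise require holding a mutable and an immutable borrow of the slice at once. The following is a sketch under that assumption; prefix_sums is a hypothetical helper.

```rust
use std::cell::Cell;

// Hypothetical helper: replaces each element with the running total up to it.
fn prefix_sums(values: &mut [i32]) {
    // &mut [i32] -> &Cell<[i32]> -> &[Cell<i32>]
    let cells: &[Cell<i32>] = Cell::from_mut(values).as_slice_of_cells();
    for pair in cells.windows(2) {
        // Both elements are reached through shared references,
        // yet the second one can still be written.
        pair[1].set(pair[0].get() + pair[1].get());
    }
}

fn main() {
    let mut values = [1, 2, 3, 4];
    prefix_sums(&mut values);
    println!("{:?}", values);
}
```

The conversion is sound because the function holds the only mutable reference to the slice for as long as the cells are in use.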

1.7. Weak and reference cycles

1.7.1. Reference cycles and memory leaks

Rust's memory safety guarantees make memory leaks hard to create by accident, but not impossible. A typical example is using Rc and RefCell together to create a reference cycle: the reference counts in the cycle can never drop to zero, so the values owned by the Rcs are never released.

use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
enum List {
    Cons(i32, RefCell<Rc<List>>),
    Nil,
}

impl List {
    fn tail(&self) -> Option<&RefCell<Rc<List>>> {
        match self {
            Cons(_, item) => Some(item),
            Nil => None,
        }
    }
}

fn main() {
    let a = Rc::new(Cons(5, RefCell::new(Rc::new(Nil))));

    println!("initial rc count of a = {}", Rc::strong_count(&a));
    println!("node a points to = {:?}", a.tail());

    // Create a reference from `b` to `a`
    let b = Rc::new(Cons(10, RefCell::new(Rc::clone(&a))));

    println!("rc count of a after creating b = {}", Rc::strong_count(&a));
    println!("initial rc count of b = {}", Rc::strong_count(&b));
    println!("node b points to = {:?}", b.tail());

    // Use the interior mutability of RefCell to create a reference from `a` to `b`
    if let Some(link) = a.tail() {
        *link.borrow_mut() = Rc::clone(&b);
    }

    println!("rc count of b after changing a = {}", Rc::strong_count(&b));
    println!("rc count of a after changing a = {}", Rc::strong_count(&a));

    // The println! below would follow the reference cycle endlessly.
    // It would overwhelm the 8MB stack of the main thread and end in a stack overflow.
    // println!("a next item = {:?}", a.tail());
}

How to prevent reference cycles

  • Take care when designing ownership relationships
  • Use Weak

1.7.2. Weak

Weak is similar to Rc, but unlike Rc it does not own the data: it only holds a weak reference to it. To access the data, call the upgrade method on the Weak pointer, which returns a value of type Option<Rc<T>>.

Because weak references are not counted toward ownership, a Weak cannot keep the referenced value from being released, and it makes no guarantee that the value still exists. If the value still exists, upgrade returns Some; otherwise it returns None.

Weak | Rc
Does not increase the ownership count | Increases the ownership count
Does not own the value | Owns the value
Does not prevent the value from being dropped | The value is dropped only when the ownership count reaches zero
upgrade returns Some if the value exists and None otherwise | The referenced value is guaranteed to exist
The value is obtained as Option<Rc<T>> through upgrade | The value is reached automatically through Deref, with no extra step

Weak references are well suited to the following scenarios:

  • Holding a temporary reference to a value managed by Rc without caring whether the value still exists
  • Preventing reference cycles between Rcs, since in a cycle the ownership counts of the Rcs involved can never return to zero
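A common way to apply the second point is a tree in which a parent owns its children through Rc while each child points back to its parent through Weak. The sketch below is modeled on the tree example in "The Rust Programming Language"; Node, new_node, and parent_value are hypothetical names.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical tree node: children are owned (Rc), the parent link is weak.
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn new_node(value: i32) -> Rc<Node> {
    Rc::new(Node {
        value,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

// Returns the parent's value if the parent is still alive.
fn parent_value(child: &Node) -> Option<i32> {
    child.parent.borrow().upgrade().map(|parent| parent.value)
}

fn main() {
    let leaf = new_node(3);
    {
        let branch = new_node(5);
        branch.children.borrow_mut().push(Rc::clone(&leaf)); // strong: branch owns leaf
        *leaf.parent.borrow_mut() = Rc::downgrade(&branch); // weak: leaf does not own branch
        println!("parent while branch is alive: {:?}", parent_value(&leaf));
    }
    // branch has been dropped, so upgrade now returns None.
    println!("parent after branch is dropped: {:?}", parent_value(&leaf));
}
```

Because the back link is weak, the strong count of branch stays at 1 and it is released as soon as it goes out of scope, so no cycle of strong references is formed.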

1.7.3. unsafe

Besides the types provided by the Rust standard library, you can also use raw pointers in unsafe code to solve these tricky problems. Since we have not covered unsafe yet, we will not go into detail here.

Although unsafe code gives up some of the compiler's safety checks, it is commonly used in library code to implement self-referential structures. Its main advantages are as follows:

  • High performance, since it works directly with raw pointers
  • Simpler and more intuitive code: compare it with Option<Rc<RefCell<Node>>>

Summary

That's all for today.

Origin blog.csdn.net/fj_Author/article/details/132507528