"Go Language Lesson 1" Course Study Notes (10)

composite data type

Isomorphic Composite Types: From Fixed-Length Arrays to Variable-Length Slices

  • A type whose values are composed of multiple elements, either homogeneous (all of the same type) or heterogeneous (of different types), is called a composite type in Go.

What are the basic properties of arrays?

  • An array in Go language is a continuous sequence of fixed-length elements composed of isomorphic types.
    • Through this definition, we can recognize that the array type of Go contains two important attributes: the type of the element and the length of the array (the number of elements) : var arr [N]T.
    • Here we declare an array variable arr whose type is [N]T, where the element type is T and the length of the array is N.
    • The element type can be any Go native type or custom type, and the array length must be given when the array variable is declared. Because the Go compiler needs to know the length of an array type at compile time, N must be an integer literal or a constant expression.
    • If the element type T and the array length N of the two array types are the same, then the two array types are equivalent. If there is one attribute different, they are two different array types.
  • The array type is not only a logically continuous sequence, but also occupies a whole block of memory during actual memory allocation.
    • When the Go compiler actually allocates memory for variables of the array type, it will allocate a whole block of contiguous memory that can hold all its elements for the Go array.
    • Go provides a predefined function len that can be used to obtain the length of an array type variable. Through the Sizeof function provided by the unsafe package, we can obtain the total size of an array variable:
      var arr = [6]int{1, 2, 3, 4, 5, 6}
      fmt.Println("array length:", len(arr))         // 6
      fmt.Println("array size:", unsafe.Sizeof(arr)) // 48 on 64-bit platforms (6 elements of 8 bytes each)
      
  • Like basic data types, when we declare an array type variable, we can also explicitly initialize it.
    • Without explicit initialization, the value of an element in an array is the zero value of its type.
    • To initialize an array explicitly, we write the array type on the right-hand side and list the element values inside curly braces.
    • We can also omit the length in the initialization expression and write "..." in its place; the Go compiler then computes the array length from the number of elements.
    • Explicitly initializing a long, sparse array element by element is tedious, so we can assign values by index instead:
      var arr4 = [...]int{
      	99: 39, // set the 100th element (index 99) to 39; all other elements are 0
      }
      fmt.Printf("%T\n", arr4) // [100]int
      
    • If the subscript value exceeds the range of the array length, or is a negative number, the Go compiler will give an error message to prevent access overflow.
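The initialization forms described above can be compared side by side. This is a minimal sketch; the function name initForms is made up for illustration.

```go
package main

import "fmt"

// initForms returns arrays built with the initialization forms described above.
func initForms() ([3]int, [3]int, [3]int, [6]int) {
	var zero [3]int               // no initializer: every element is 0
	explicit := [3]int{1, 2, 3}   // array type written out on the right-hand side
	inferred := [...]int{1, 2, 3} // the compiler counts the elements: [3]int
	sparse := [...]int{5: 42}     // index 5 is set, so the length is 6
	return zero, explicit, inferred, sparse
}

func main() {
	zero, explicit, inferred, sparse := initForms()
	fmt.Println(zero, explicit, inferred, sparse) // [0 0 0] [1 2 3] [1 2 3] [0 0 0 0 0 42]
	fmt.Println(explicit == inferred)             // true: both are [3]int with equal elements
}
```

Because explicit and inferred have the same element type and length, they are the same array type and can be compared with ==.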
  • Multidimensional Arrays
    • The array type itself can also be used as the type of array elements, which will produce multidimensional arrays: var mArr [2][3][4]int.
    • An array variable denotes the entire array as a single value. As a result, whether an array takes part in iteration or is passed to a function or method as an argument, Go copies the whole array, which incurs a large memory-copy overhead for big arrays.
    • Go provides a more flexible and idiomatic mechanism, the slice, to solve this problem.
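The difference between copying an array and passing a slice can be seen in a small sketch. The two helper functions are hypothetical names chosen for this example.

```go
package main

import "fmt"

// setFirstArray receives a copy of the whole array, so the caller's array is untouched.
func setFirstArray(a [3]int) { a[0] = 100 }

// setFirstSlice receives a slice that shares the caller's storage.
func setFirstSlice(s []int) { s[0] = 100 }

func main() {
	arr := [3]int{1, 2, 3}

	setFirstArray(arr)
	fmt.Println(arr) // [1 2 3]

	setFirstSlice(arr[:])
	fmt.Println(arr) // [100 2 3]
}
```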

What's up with slices?

  • Arrays remain in Go as the most basic isomorphic composite type, but they have two shortcomings: a fixed number of elements, and the high cost of passing them by value. Go's designers therefore introduced another isomorphic composite type, the slice, to make up for these two deficiencies.
    • Compared with the array declaration, the slice declaration is only missing a "length" attribute.
    • Although there is no need to specify the length when declaring like an array, a slice also has its own length, but this length is not fixed, but changes with the number of elements in the slice. We can get the length of the slice type variable through the len function.
    • With the Go built-in function append, we can dynamically add elements to a slice.
  • A Go slice is actually a triplet structure at runtime, which is expressed in the Go runtime as follows:
    type slice struct {
    	array unsafe.Pointer
    	len   int
    	cap   int
    }
    
    • array: is a pointer to the underlying array;
    • len: is the length of the slice, that is, the number of current elements in the slice;
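    • cap: the capacity of the slice, that is, the number of elements available in the underlying array counted from the slice's first element. It is the maximum length the slice can reach without reallocation, and it is always greater than or equal to len.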
    • The Go compiler will automatically create an underlying array for each newly created slice. By default, the length of the underlying array is the same as the number of initial elements of the slice.
  • We can create a slice and specify the length of its underlying array in several ways:
    • Method 1: Use the make function to create slices and specify the length of the underlying array:
      • sl := make([]byte, 6, 10) // 10 is the cap value (the length of the underlying array); 6 is the initial length of the slice
      • If no cap parameter is specified in make, then the underlying array length cap is equal to len, for example: sl := make([]byte, 6) // cap = len = 6.
    • Method 2: Use the array[low : high : max] syntax to create a slice based on an existing array.
      • This approach is called array slicing:
        arr := [10]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
        sl := arr[3:7:9]
        
      • For a slice created based on an array, its starting element starts from the subscript value identified by low, the length (len) of the slice is high - low, and its capacity is max - low.
      • Moreover, since the underlying array of the slice sl is the array arr, the modification of the elements in the slice sl will directly affect the variable of the array arr.
      • Slicing is like opening a "window" to access and modify the array. Through this window, we can directly manipulate some elements in the underlying array.
      • When we slice an array, we usually omit max, and the default value of max is the length of the array.
    • Method 3: Create slices based on slices.
      • The runtime representation of this slice is the same as above.
      • The biggest difference between a slice and an array lies in its variable length. This variable length needs to be supported by the Go runtime. This support is the "dynamic expansion" of the slice.
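The three ways of creating a slice can be checked with len and cap. This is a minimal sketch; the helper name window is an assumption made for the example.

```go
package main

import "fmt"

// window slices arr the same way as the text does: low=3, high=7, max=9.
func window(arr *[10]int) []int {
	return arr[3:7:9]
}

func main() {
	arr := [10]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

	sl := window(&arr)
	fmt.Println(sl, len(sl), cap(sl)) // [4 5 6 7] 4 6

	sl[0] = 40          // writes through the "window" into arr[3]
	fmt.Println(arr[3]) // 40

	sub := sl[1:3]                       // a slice of a slice shares the same array
	fmt.Println(sub, len(sub), cap(sub)) // [5 6] 2 5

	made := make([]byte, 6, 10)
	fmt.Println(len(made), cap(made)) // 6 10
}
```

The length is high - low = 4 and the capacity is max - low = 6, matching the rule stated above.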
  • Dynamic Scaling of Slices
    • "Dynamic expansion" means that when we add data to a slice with append and the slice's len is already equal to its cap (that is, the underlying array has no free space left for the new value), the Go runtime expands the slice so that it can always store the appended values.
    • When the current underlying array cannot hold the new elements, append allocates a new, larger array according to a growth rule. For small slices the new capacity is typically double the current one, although the exact rule is an implementation detail of the runtime. After the new array is created, append copies the data from the old array into it; the new array becomes the slice's underlying array, and the old array is garbage collected once nothing else references it.
    • For a slice created from an existing array, once an append exceeds the slice's capacity (in essence, the upper bound of the array), the slice is unbound from the original array, and later modifications to the slice are no longer reflected in that array.
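The unbinding behaviour described above can be observed directly. A minimal sketch, with appendUntilFull as a made-up helper name:

```go
package main

import "fmt"

// appendUntilFull appends to a slice backed by arr until the slice outgrows it,
// then writes to the slice to show that arr is no longer affected.
func appendUntilFull() (arr [4]int, sl []int) {
	arr = [4]int{1, 2, 3, 4}
	sl = arr[1:3]       // len 2, cap 3: still backed by arr
	sl = append(sl, 30) // fits within cap, so it overwrites arr[3]
	sl = append(sl, 40) // len == cap: append allocates a new underlying array
	sl[0] = 20          // modifies the new array only
	return arr, sl
}

func main() {
	arr, sl := appendUntilFull()
	fmt.Println(arr) // [1 2 3 30]
	fmt.Println(sl)  // [20 3 30 40]
}
```

The first append is visible in the array because the slice still has spare capacity; the write after the second append is not.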

What is the implementation mechanism of the native map type?

What is the map type?

  • Map is an abstract data type provided by Go language, which represents a set of unordered key-value pairs.
    • We will directly use key and value to represent the key and value of the map respectively. Moreover, each key in the map collection is unique.
    • Similar to a slice, as a compound type map, its type representation in Go is also composed of key type and value type: map[key_type]value_type.
    • The types of key and value can be the same or different.
      • If the key element type of two map types is the same, and the value element type is also the same, then we can say that they are the same map type, otherwise they are different map types.
      • The Go language requires that the key type must support two comparison operators, "==" and "!=".
      • In the Go language, function types, map types themselves, and slices only support comparisons with nil, but do not support comparisons between two variables of the same type. Function types, map types themselves, and slice types cannot be used as map key types.

Declaration and initialization of map variables

  • We can declare a map variable like this: var m map[string]int // a variable of type map[string]int.
  • As with slice type variables, if we don't explicitly assign an initial value to the map variable, the default value of the map type variable is nil.
    • The slice type variable whose initial value is nil can be operated with the help of the built-in append function, which is called "zero value available" in Go language.
    • The map type, because of the complexity of its internal implementation, is not "zero value available". Reading from a nil map is harmless, but inserting a key-value pair into a map variable in its zero-value state causes a runtime panic, which makes the program exit abnormally. Therefore, we must explicitly initialize a map variable before writing to it.
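The contrast between a nil slice and a nil map can be sketched as follows. The helper writeToNilMap is hypothetical; it uses recover only so that the example can report the panic instead of crashing.

```go
package main

import "fmt"

// writeToNilMap tries to insert into a nil map and reports whether that panicked.
func writeToNilMap() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var m map[string]int // nil map
	m["a"] = 1           // panics: assignment to entry in nil map
	return false
}

func main() {
	var sl []int
	sl = append(sl, 1) // a nil slice works with append
	fmt.Println(sl)    // [1]

	var m map[string]int
	fmt.Println(m == nil, len(m), m["missing"]) // true 0 0

	fmt.Println(writeToNilMap()) // true
}
```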
  • As with slices, there are two ways to explicitly initialize a map variable: with a composite literal, or with the predeclared built-in function make.
    • Initialize a map type variable with a composite literal value: m := map[int]string{}.
      m2 := map[Position]string{
      	{29.935523, 52.568915}:  "school",
      	{25.352594, 113.304361}: "shopping-mall",
      	{73.224455, 111.804306}: "hospital",
      }
      
    • Use make to explicitly initialize the map type variable. Through the initialization method of make, we can specify the initial capacity of the key-value pair for the map type variable, but cannot perform specific key-value pair assignment.
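Both initialization styles can be put into one runnable sketch. The notes do not show the definition of Position, so the struct below, including its field names, is an assumption; any struct whose fields are all comparable can serve as a key type.

```go
package main

import "fmt"

// Position is a hypothetical key type: a struct with two comparable fields.
type Position struct {
	Longitude float64
	Latitude  float64
}

// buildPlaces initializes a map with a composite literal.
func buildPlaces() map[Position]string {
	return map[Position]string{
		{29.935523, 52.568915}:  "school",
		{25.352594, 113.304361}: "shopping-mall",
	}
}

func main() {
	places := buildPlaces()
	fmt.Println(len(places))                            // 2
	fmt.Println(places[Position{29.935523, 52.568915}]) // school

	// make with a capacity hint: the map starts empty, with room reserved.
	m := make(map[int]string, 8)
	m[1] = "value1"
	fmt.Println(len(m)) // 1
}
```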

The basic operation of map

  • For a map type variable, we can perform operations such as inserting new key-value pairs, obtaining the current number of key-value pairs, finding a specific key and reading the corresponding value, deleting key-value pairs, and traversing key values.

    • Operation 1: Insert a new key-value pair
      • In the face of a non-nil map type variable, the way to insert a new key-value pair is very simple, we only need to assign the value to the corresponding key in the map:
        m := make(map[int]string)
        m[1] = "value1"
        m[2] = "value2"
        m[3] = "value3"
        
      • We don't need to judge whether the data is inserted successfully, because Go will ensure that the insertion is always successful.
      • The Go runtime will be responsible for the memory management inside the map variable, so unless the system memory is exhausted, we don't have to worry about the number of new data inserted into the map and the execution results.
      • If a key already exists in the map when we insert a new key-value pair, our insert operation will overwrite the old value with the new value.
    • Operation 2: Obtain the number of key-value pairs.
      • If we want to know how many key-value pairs have been created in the current map type variable during coding, like slices, the map type can also obtain the number of key-value pairs stored in the current variable through the built-in function len.
      • We cannot call cap on a map type variable to obtain the current capacity, which is a difference between the map type and the slice type.
    • Operation 3: Find and read data
      • The map type of the Go language supports querying for a key by using an idiom called "comma ok".
        m := make(map[string]int)
        v, ok := m["key1"]
        if !ok {
        	// "key1" is not in the map; v is the zero value of int
        } else {
        	// "key1" is in the map; v holds the value stored under "key1"
        	fmt.Println(v)
        }
        
      • If we don't care about the value corresponding to a certain key, but only care about whether a certain key is in the map, we can use an empty identifier to replace the variable v, ignoring the value that may be returned:
        m := make(map[string]int)
        _, ok := m["key1"]
        ... ...
        
      • In Go, use the "comma ok" idiom for map key lookup and key value read operations.
    • Operation 4: Delete data
      • In Go, we need to delete data from the map with the help of the built-in function delete. In the case of using the delete function, the first parameter passed in is our map type variable, and the second parameter is the key we want to delete.
        m := map[string]int{
        	"key1": 1,
        	"key2": 2,
        }
        fmt.Println(m)    // map[key1:1 key2:2]
        delete(m, "key2") // delete "key2"
        fmt.Println(m)    // map[key1:1]
        
      • The delete function is the only way to remove an individual key from a map (since Go 1.21, the built-in clear function removes all entries at once).
      • Even if the key passed to delete does not exist in the map, the execution of the delete function will not fail, nor will it throw a runtime exception.
    • Operation 5: Traverse the key-value data in the map
      • In Go, there is only one way to traverse the key-value pairs of the map, and that is to traverse the map data through the for range statement like a slice.
        func main() {
        	m := map[int]int{
        		1: 11,
        		2: 12,
        		3: 13,
        	}
        	fmt.Printf("{ ")
        	for k, v := range m {
        		fmt.Printf("[%d, %d] ", k, v)
        	}
        	fmt.Printf("}\n")
        }
        
      • When the same map is traversed multiple times, the order of the elements may differ each time.
      • Program logic must never depend on the order of elements obtained by traversing the map.
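When a stable order is needed, a common approach is to collect the keys, sort them, and then index the map. This is a minimal sketch; sortedKeys is a made-up helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order.
func sortedKeys(m map[int]int) []int {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	return keys
}

func main() {
	m := map[int]int{3: 13, 1: 11, 2: 12}
	for _, k := range sortedKeys(m) {
		fmt.Printf("[%d, %d] ", k, m[k]) // always [1, 11] [2, 12] [3, 13]
	}
	fmt.Println()
}
```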
  • Like slices, maps are reference types. When a map variable is passed as an argument to a function or method, only a "descriptor" is passed rather than a copy of all the map's data, so the cost of passing it is fixed and small.

  • When the map variable is passed into the function or method, the modification of the map type parameter inside the function is also visible outside the function.

    package main

    import "fmt"

    // foo modifies the map it receives; the change is visible to the caller.
    func foo(m map[string]int) {
    	m["key1"] = 11
    	m["key2"] = 12
    }

    func main() {
    	m := map[string]int{
    		"key1": 1,
    		"key2": 2,
    	}
    	fmt.Println(m) // map[key1:1 key2:2]
    	foo(m)
    	fmt.Println(m) // map[key1:11 key2:12]
    }
    

internal implementation of map

  • The Go runtime uses a hash table to implement the abstract map type. The runtime implements all functions of map type operations, including search, insert, delete, etc. In the compilation phase, the Go compiler will rewrite the map operation at the Go syntax level into the corresponding function call at runtime.
  • Implementation of the map type in the Go runtime layer:
    [Figure: schematic diagram of the map implementation in the Go runtime]
    • initial state
      • One-to-one correspondence with the map type variable (m) at the grammatical level is the instance of runtime.hmap.
      • The hmap type is the header structure of the map type, that is, the descriptor of the map, which stores all the information required by subsequent map operations.
      • The buckets field points to the bucket array that actually stores the key-value data. Each bucket holds the elements whose hash values share the same low-order bits, and by default a bucket holds up to BUCKETSIZE (8) elements.
      • When a certain bucket is full and the map has not yet reached the condition of expansion, an overflow bucket will be created at runtime, and this overflow bucket will be hung on the overflow pointer at the end of the above bucket, so that the two buckets form a linked list structure, until the next map expansion, this structure will always exist.
      • Each bucket consists of three parts, from top to bottom are tophash area, key storage area and value storage area.
        • tophash area
          • When we insert a piece of data into the map, or query data by key from the map, the runtime will use the hash function to hash the key and obtain a hash value (hashcode).
          • This hashcode is very critical, and the hashcode will be "divided into two" at runtime, where the value in the low-order area is used to select a bucket, and the value in the high-order area is used to determine the position of the key in a bucket.
          • The tophash area of each bucket is used to locate a key quickly, which avoids the costly operation of comparing full keys one by one. The benefit is especially large when the keys are long strings. This trades space for time.
        • key storage area
          • Below the tophash area is a continuous memory area, which stores all the key data carried by this bucket.
          • The runtime needs to know the size of the key when allocating buckets.
          • When we declare a map type variable, such as var m map[string]int, Go runtime will generate a runtime.maptype instance for the specific map type corresponding to this variable. If this instance already exists, it will be reused directly.
            type maptype struct {
            	typ        _type
            	key        *_type
            	elem       *_type
            	bucket     *_type // internal type representing a hash bucket
            	keysize    uint8  // size of key slot
            	elemsize   uint8  // size of elem slot
            	bucketsize uint16 // size of bucket
            	flags      uint32
            }
            
          • This instance contains all the "meta information" we need in the map type. The compiler will rewrite the map operation at the grammatical level into the corresponding function call at runtime. These runtime functions have a common feature, that is, the first parameter is a parameter of maptype pointer type.
          • The Go runtime uses the information in the maptype parameter to determine the type and size of the key.
            • The hash function used by the map is also reached through maptype; it is called as maptype.key.alg.hash(key, hmap.hash0).
            • At the same time, the existence of maptype also allows all map types in Go to share a set of runtime map operation functions, instead of creating a set of map operation functions for each map type like C++, which saves the space occupied by the final binary file.
        • value storage area
          • Another continuous memory area below the key storage area stores the value corresponding to the key.
          • Like the key, the region is created with the help of information in the maptype.
          • The Go runtime stores keys and values in separate areas instead of placing each key directly next to its value. This makes the algorithm more complex, but it reduces the memory wasted on alignment padding.
          • If the data length of the key or value is greater than a certain value, the runtime will not directly store the data in the bucket, but will store the pointer of the key or value data.
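The "divided into two" use of the hashcode described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the runtime's actual code: it assumes a 64-bit hash, takes the low b bits as the bucket index and the top 8 bits as the tophash byte, and the name splitHash is made up.

```go
package main

import "fmt"

// splitHash is a simplified illustration, not the runtime's implementation.
// The low b bits select one of 2^b buckets; the top 8 bits form the tophash byte.
func splitHash(hash uint64, b uint8) (bucket uint64, top uint8) {
	bucket = hash & (uint64(1)<<b - 1)
	top = uint8(hash >> 56)
	return bucket, top
}

func main() {
	bucket, top := splitHash(0xAB00000000000005, 3)
	fmt.Printf("bucket=%d tophash=%#x\n", bucket, top) // bucket=5 tophash=0xab
}
```

Comparing one byte per slot before comparing full keys is what makes the tophash area a cheap first filter.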
    • map expansion
      • The map implementation of the Go runtime introduces a LoadFactor (load factor). When count > LoadFactor * 2^B or there are too many overflow buckets, the runtime will automatically expand the map.
      • If it is due to "expansion" caused by too many overflow buckets, in fact, a new bucket array with the same size as the existing one will be created at runtime, and then emptying and migration will be done during assign and delete.
      • If the expansion is due to the fact that the current amount of data exceeds the water level specified by LoadFactor, a bucket array twice the size of the existing bucket will be created at runtime, but the actual emptying and migration work is also carried out gradually during assign and delete. The original bucket array will hang under the old buckets pointer of hmap, and the original buckets array will not be released until all the data in the original buckets array is migrated to the new array.
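The load-factor condition count > LoadFactor * 2^B can be written out as a small check. The text does not give a value for LoadFactor; the 6.5 used here is an assumption taken from the classic bucket-based runtime implementation, and overLoadFactor is a simplified stand-in rather than the runtime's own function.

```go
package main

import "fmt"

// loadFactor is an assumed value; the notes only name it LoadFactor.
const loadFactor = 6.5

// overLoadFactor reports whether count entries exceed LoadFactor * 2^b,
// where 2^b is the number of buckets.
func overLoadFactor(count int, b uint8) bool {
	return float64(count) > loadFactor*float64(uint64(1)<<b)
}

func main() {
	fmt.Println(overLoadFactor(52, 3)) // false: 52 is not greater than 6.5 * 8
	fmt.Println(overLoadFactor(53, 3)) // true
}
```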
    • map and concurrency
      • The hmap instance that acts as the map descriptor is stateful (hmap.flags), and reads and writes of that state are not protected against concurrent access. A map is therefore not safe for concurrent writes, nor for concurrent reads mixed with writes.
      • If goroutines read and write the same map concurrently, the runtime detects it and aborts the program with a fatal error.
      • However, if we only perform concurrent reads, a map is fine.
      • Go 1.9 introduced the sync.Map type, which is safe for concurrent use and can replace map in concurrent read-write scenarios.
      • Considering that the map can be automatically expanded, the value position of the data element in the map may change during this process, so Go does not allow obtaining the address of the value in the map, and this constraint takes effect during compilation.
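Besides sync.Map, a common alternative for concurrent writes is to guard a plain map with a mutex. The sketch below takes that approach; the SafeCounter type and its methods are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCounter is a hypothetical wrapper that guards a plain map with a mutex.
type SafeCounter struct {
	mu sync.Mutex
	m  map[string]int
}

func NewSafeCounter() *SafeCounter {
	return &SafeCounter{m: make(map[string]int)}
}

// Inc increments the counter for key while holding the lock.
func (c *SafeCounter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// Get returns the counter for key while holding the lock.
func (c *SafeCounter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

// countConcurrently increments one key from n goroutines and returns the total.
func countConcurrently(n int) int {
	c := NewSafeCounter()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("hits")
		}()
	}
	wg.Wait()
	return c.Get("hits")
}

func main() {
	fmt.Println(countConcurrently(100)) // 100
}
```

Note that c.m[key]++ updates the value through the map itself; taking the address of c.m[key] would be rejected by the compiler, as described above.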

structure

How to customize a new type?

  • In Go, the type that provides aggregate abstraction is the structure type, or struct.
  • In Go, we generally have two ways to customize a new type.
    • The first is type definition (Type Definition), which is our most commonly used type definition method.
      • In this method, we use the keyword type to define a new type T, in this form: type T S // define a new type T.
      • Here, S can be any defined type, including Go native types, or other defined custom types.
      • If a new type is defined based on a Go native type, then we call the Go native type the underlying type of the new type (Underlying Type).
        • The underlying type plays an important role in Go: it is used to decide whether two types are essentially the same.
        • Variables of two types that share the same underlying type can be assigned to each other through explicit conversion. Conversely, if the underlying types differ, this kind of conversion is generally not available (conversions between numeric types being a notable exception), let alone direct assignment.
      • In addition to defining new types based on existing types, we can also define new types based on type literals. This method is mostly used to customize a new composite type: type M map[int]string.
      • Similar to how variable declarations support the use of var blocks, type definitions also support the use of type blocks.
    • The second way to customize new types is to use type aliases (Type Alias). This type definition method is usually used in the gradual refactoring of projects and the secondary packaging of existing packages.
      • Its form is this: type T = S // type alias.
      • Compared with a type definition, a type alias has just one extra equals sign, but that equals sign makes the name T completely equivalent to the original type S.
      • Complete equivalence means that a type alias does not define a new type: T and S are the same type, just two names for it.

How to define a struct type?

  • Composite types are generally defined by means of type literals, and the structure type as one of the composite types is no exception:
    type T struct {
    	Field1 T1
    	Field2 T2
    	... ...
    	FieldN Tn
    }
    
    • If the struct type is only used within the package it is defined in, then we can lowercase the first letter of the type name;
    • If you don't want to expose a field in the structure type to other packages, then we can also lowercase the first letter of the field name.
    • We can also use the empty identifier "_" as a field name in a structure type definition. Such fields with empty identifier names cannot be referenced by external packages, and cannot even be used by the package where the structure is located.
  • define an empty structure
    • We can define an empty struct, that is, a struct type without any fields: type Empty struct{} // Empty is an empty struct type with no fields.
    • Based on the characteristics of zero memory overhead of empty structure type, we often use empty structure type elements in daily Go development as a kind of "event" information for communication between Goroutines.
    • A channel whose element type is an empty struct is the way of communicating between Goroutines that uses the least memory.
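The use of an empty struct as an "event" can be sketched with a channel. The helper names emptySize and waitForWorker are assumptions for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Empty is an empty struct type with no fields.
type Empty struct{}

// emptySize returns the size in bytes of an Empty value.
func emptySize() uintptr { return unsafe.Sizeof(Empty{}) }

// waitForWorker starts a goroutine and blocks until it signals completion.
func waitForWorker() bool {
	done := make(chan Empty)
	go func() {
		// ... the worker would do its job here ...
		done <- Empty{} // the value carries no data, only the event itself
	}()
	<-done
	return true
}

func main() {
	fmt.Println(emptySize())     // 0
	fmt.Println(waitForWorker()) // true
}
```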
  • Use other structs as types for fields in custom structs
    • For a structure type that contains a structure type field, we don't need to provide the name of the field, just use its type.
      type Book struct {
      	Title string
      	Person // a struct type used as an embedded field
      	... ...
      }
      
    • Structure fields defined in this way are called Embedded Fields. We can also call this kind of field an anonymous field, or think of the type name as the name of the field.
      var book Book
      println(book.Person.Phone) // use the type name as the name of the embedded field
      println(book.Phone)        // fields of the embedded type can also be accessed directly
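A complete version of the embedded-field example might look like this. The notes do not show the definition of Person, so its fields here are assumptions, as are the sample values.

```go
package main

import "fmt"

// Person is a hypothetical type; the notes only reference its Phone field.
type Person struct {
	Name  string
	Phone string
}

// Book embeds Person, so Person's fields are promoted to Book.
type Book struct {
	Title string
	Person
}

// newBook builds a Book with sample data.
func newBook() Book {
	return Book{
		Title:  "Sample Title",
		Person: Person{Name: "Example Author", Phone: "555-0100"},
	}
}

func main() {
	book := newBook()
	fmt.Println(book.Person.Phone) // 555-0100
	fmt.Println(book.Phone)        // 555-0100
}
```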
      
  • The Go language does not support the definition of recursively putting fields of its own type in a structure type definition.
  • Although a struct type T cannot contain a field of its own type T, it can contain fields whose types are a pointer to T, a slice with T as the element type, or a map with T as the value type:
    type T struct {
    	t  *T           // ok
    	st []T          // ok
    	m  map[string]T // ok
    }
    

Declaration and initialization of structure variables

  • Like all other variable declarations, we can also use the standard variable declaration statement, or the short variable declaration statement to declare a variable of type structure.
  • Variables of structure type usually have reasonable meaning only after they are given appropriate initial values.
  • Initialization of structure type variables
    • zero-initialization
      • Zero-initialization means using the zero value of the structure as its initial value.
      • The Go structure type consists of several fields. When the value of each field of the structure type variable is zero, we say that the structure type variable is in the zero value state.
    • use composite literals
      • The simplest way to explicitly initialize a struct variable is to assign a value to each field in order.
        type Book struct {
        	Title   string         // book title
        	Pages   int            // number of pages
        	Indexes map[string]int // index of the book
        }
        var book = Book{"The Go Programming Language", 700, make(map[string]int)}
        
      • Go does not recommend initializing a struct variable by listing values in field order. The go vet tool even provides a built-in check, "composites", which statically detects this style of initialization in code and issues a warning when it finds it.
      • Go recommends initializing struct variables with composite literals in the "field:value" form. This reduces the coupling between the users of a struct type and its designer, and it is the idiomatic Go style.
        var t = T{
        	F2: "hello",
        	F1: 11,
        	F4: 14,
        }
        
      • Using this "field:value" form of composite literals to initialize structure type variables is very flexible. In contrast to the previous sequential compound literal forms, the fields in the "field:value" form literals can appear in any order.

use a specific constructor

  • It is not uncommon to create and initialize struct variables using specific constructors.
    func NewT(field1, field2, ...) *T {
    	... ...
    }
    
    • NewT is a special constructor of structure type T, the parameters in its parameter list usually correspond to the exported fields in the definition of T, and the return value is a variable of T pointer type.
    • The non-exported fields of T are initialized inside NewT, and some fields that require complex initialization logic will also be initialized inside NewT.
    • In this way, we only need to call the NewT function to get an available T pointer type variable.
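A concrete constructor following the NewT pattern might look like the sketch below. The Book type, its unexported indexes field, and the method names are all assumptions chosen for illustration.

```go
package main

import "fmt"

// Book is a hypothetical type with one unexported field that needs initialization.
type Book struct {
	Title   string
	Pages   int
	indexes map[string]int // unexported: initialized inside NewBook
}

// NewBook takes the exported fields as parameters and prepares the rest itself.
func NewBook(title string, pages int) *Book {
	return &Book{
		Title:   title,
		Pages:   pages,
		indexes: make(map[string]int),
	}
}

// AddIndex records the page on which a term appears.
func (b *Book) AddIndex(term string, page int) { b.indexes[term] = page }

// IndexCount returns the number of indexed terms.
func (b *Book) IndexCount() int { return len(b.indexes) }

func main() {
	b := NewBook("Sample Title", 700)
	b.AddIndex("slice", 42) // safe: the map was created by the constructor
	fmt.Println(b.Title, b.Pages, b.IndexCount()) // Sample Title 700 1
}
```

Because the constructor creates the map, callers never hit the nil-map panic that a zero-value Book would cause in AddIndex.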

Memory layout of structure type

  • After the array type, the struct type is the second Go type that stores its elements (the struct fields) one after another, "tiled" in a contiguous block of memory.
    • The structure type T has a very compact layout in memory. The memory allocated by Go is used to store fields, and there are no extra fields inserted by the Go compiler.
    • We can use the functions provided by the unsafe package of the standard library to obtain the memory size occupied by the structure type variable and the offset of each field in memory relative to the start address of the structure variable.
  • In practice, although the Go compiler does not insert additional fields into the memory occupied by a struct variable, the fields are not necessarily packed tightly together; there may be "gaps" between them.
    • These "gaps" are also part of the memory space occupied by structure variables, and they are "padding" inserted by the Go compiler.
    • So, why does the Go compiler insert "padding" between the fields of the structure?
      • This is actually a requirement for memory alignment.
      • The so-called memory alignment means that the memory addresses of various memory objects are not determined arbitrarily, but must meet specific requirements.
      • For various basic data types, the memory address value of its variables must be an integer multiple of the size of the type itself.
      • For a structure, the memory address of its variable only needs to be an integer multiple of the smaller one between its longest field length and the system alignment factor. But for the structure type, we also need to make the memory address of each field strictly meet the memory alignment requirements.
    • When defining a structure on a daily basis, you must pay attention to the order of the fields in the structure, and try to sort them reasonably to reduce the memory space occupied by the structure.
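The effect of field order on padding can be measured with unsafe.Sizeof and unsafe.Offsetof. In this sketch the types Loose and Tight are made up; the concrete numbers in the comments hold on typical 64-bit platforms and may differ elsewhere.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Loose orders its fields so that padding is needed before b and after c.
type Loose struct {
	a byte
	b int64
	c uint16
}

// Tight holds the same fields, ordered from largest to smallest.
type Tight struct {
	b int64
	c uint16
	a byte
}

// sizes returns the sizes in bytes of Loose and Tight.
func sizes() (loose, tight uintptr) {
	return unsafe.Sizeof(Loose{}), unsafe.Sizeof(Tight{})
}

func main() {
	var l Loose
	// Offsets of a, b, c: 0 8 16 on typical 64-bit platforms.
	fmt.Println(unsafe.Offsetof(l.a), unsafe.Offsetof(l.b), unsafe.Offsetof(l.c))

	loose, tight := sizes()
	fmt.Println(loose, tight) // 24 16 on typical 64-bit platforms
}
```

Both types carry 11 bytes of field data; the difference in size comes entirely from padding inserted to satisfy alignment.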

Origin blog.csdn.net/fangzhan666/article/details/132403781