Data Structures and Algorithms (1): Arrays

overview

definition

In computer science, an array is a data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key

Because the elements of an array are stored contiguously, the address of an element can be calculated from its index. For example:

int[] array = {1, 2, 3, 4, 5};

Given the starting address of the array's data, $BaseAddress$, the address of the element at index $i$ can be calculated with the formula $BaseAddress + i * size$, where

  • $i$ is the index, starting from 0 in Java, C and other languages
  • $size$ is the number of bytes occupied by each element, e.g. 4 for int and 8 for double

small test

byte[] array = {1, 2, 3, 4, 5};

Given that the data of this array starts at address 0x7138f94c8, what is the address of the element 3?

Answer: the element 3 is at index 2 and each byte element occupies 1 byte, so 0x7138f94c8 + 2 * 1 = 0x7138f94ca
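The formula and the small test above can be checked with a few lines of Java. This is only a sketch of the arithmetic: the class name ElementAddress and the method addressOf are made up for illustration, and Java itself does not expose real element addresses.

```java
// Minimal sketch of the address formula; names are illustrative only.
class ElementAddress {

    /** Address of the element at the given index: baseAddress + index * elementSize. */
    static long addressOf(long baseAddress, int index, int elementSize) {
        return baseAddress + (long) index * elementSize;
    }

    public static void main(String[] args) {
        long base = 0x7138f94c8L;
        // byte elements occupy 1 byte each; the value 3 sits at index 2
        System.out.println(Long.toHexString(addressOf(base, 2, Byte.BYTES)));    // 7138f94ca
        // with int elements (4 bytes each) the same index lies 8 bytes past the base
        System.out.println(Long.toHexString(addressOf(base, 2, Integer.BYTES))); // 7138f94d0
    }
}
```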

space occupied

An array object in Java is laid out as follows

  • 8 bytes for the mark word
  • 4 bytes for the class pointer (when class pointers are compressed)
  • 4 bytes for the array length (this field limits the maximum capacity of an array: 4 bytes can encode $2^{32}$ values, and since the length is a signed int the actual limit is $2^{31}-1$)
  • The array elements plus alignment bytes (the size of every object in Java is an integer multiple of 8 bytes [^12], so alignment bytes are added to make up any shortfall)

For example

int[] array = {1, 2, 3, 4, 5};

This array occupies 40 bytes, made up as follows

8 + 4 + 4 + 5*4 + 4 (alignment)
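The same calculation can be written as a small helper. This is a sketch under the assumptions stated above (a 16-byte header made of 8 + 4 + 4 bytes and 8-byte alignment); the class ArrayFootprint and its method are hypothetical names, and the result is an estimate, not a measurement of a real JVM.

```java
// Sketch only: assumes the 8 + 4 + 4 byte header described above
// (compressed class pointers) and 8-byte object alignment.
class ArrayFootprint {
    static final int HEADER_BYTES = 8 + 4 + 4; // mark word + class pointer + array length
    static final int ALIGNMENT = 8;            // object sizes are multiples of 8 bytes

    /** Estimated size in bytes of an array with the given length and element size. */
    static long sizeOf(int length, int elementSize) {
        long raw = HEADER_BYTES + (long) length * elementSize;
        // round up to the next multiple of ALIGNMENT
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        System.out.println(sizeOf(5, Integer.BYTES)); // 40: 16 + 20 + 4 alignment bytes
        System.out.println(sizeOf(5, Byte.BYTES));    // 24: 16 + 5 + 3 alignment bytes
    }
}
```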

random access performance

That is, looking up an element by its index has time complexity $O(1)$

logical size and physical size

The physical size of an array is the total number of its array elements, or the number used to specify its capacity when the array was created.

The logical size of an array is the number of items currently in use by the application.

When the array is always full there is no need to worry about the difference, but that is rarely the case.

In general, the logical size and the physical size together tell us several important things about the state of the array:

  • If the logical size is 0, the array is empty, i.e. it contains no data items;
  • If the array contains data items, the index of the last item in the array is the logical size minus 1;
  • If the logical size is equal to the physical size, the array is already filled with data.

dynamic array

Java version

import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.stream.IntStream;

/**
 * @author Ethan
 * @date 2023/3/20
 * @description
 */
public class Ds01DynamicArray implements Iterable<Integer> {

    /**
     * Logical size
     */
    private int size = 0;
    /**
     * Capacity
     */
    private int capacity = 8;
    /**
     * The array starts out empty
     */
    private int[] array = {};


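    /**
     * Add an element at an arbitrary position
     *
     * @param index   index position
     * @param element element to add
     */
    public void add(int index, int element) {
        // Check the capacity and grow if it is not enough
        checkAndGrow();

        // If the insert position is less than the logical size, make room first:
        // every element from index onward moves back by one
        if (index >= 0 && index < size) {
            // Shift backward to free the insert position, using the array copy method.
            // Arguments: source array, source start position, destination array,
            // destination start position, number of elements to copy
            System.arraycopy(array, index,
                    array, index + 1, size - index);
        }
        // Insert the element at the given position
        array[index] = element;
        // Logical size + 1
        size++;
    }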

    /**
     * Add an element at the last position [size]
     *
     * @param element element to add
     */
    public void addLast(int element) {
        // Reuse add(index, element); the insert position is the logical size
        add(size, element);
    }

    /**
     * Check the capacity and grow if it is not enough
     */
    private void checkAndGrow() {
        // Capacity check
        if (size == 0) {
            array = new int[capacity];
        } else if (size == capacity) {
            // Grow; common growth factors are 1.5, 1.618 and 2 (1.5 is used here)
            capacity += capacity >> 1;
            int[] newArray = new int[capacity];
            System.arraycopy(array, 0,
                    newArray, 0, size);
            array = newArray;
        }
    }

    /**
     * Remove an element from the range [0 .. size)
     *
     * @param index index position
     * @return the removed element
     */
    public int remove(int index) { // [0..size)
        // The element to remove
        int removed = array[index];
        // If the index is less than logical size - 1,
        // move every element after it forward by one
        if (index < size - 1) {
            // Shift forward
            System.arraycopy(array, index + 1,
                    array, index, size - index - 1);
        }
        // Logical size - 1
        size--;
        return removed;
    }


    /**
     * Look up an element
     *
     * @param index index position, within [0..size)
     * @return the element at that index
     */
    public int get(int index) {
        return array[index];
    }

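    /**
     * Traversal method 1
     *
     * @param consumer the operation to perform during traversal; argument: each element
     */
    public void foreach(Consumer<Integer> consumer) {
        // Hand each element to the caller through the Consumer;
        // what happens to it is up to the caller
        for (int i = 0; i < size; i++) {
            // Provides array[i]
            // Returns void
            consumer.accept(array[i]);
        }
    }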

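    /**
     * Traversal method 2 - iterator traversal; the class must implement the Iterable interface
     */
    @Override
    public Iterator<Integer> iterator() {
        // Return an iterator directly as an anonymous inner class
        // that implements the two methods of the interface
        return new Iterator<Integer>() {
            int i = 0;

            @Override
            public boolean hasNext() { // Is there a next element?
                return i < size;
            }

            @Override
            public Integer next() { // Return the current element and move to the next one
                return array[i++];
            }
        };
    }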

    /**
     * Traversal method 3 - stream traversal
     *
     * @return a stream over the elements
     */
    public IntStream stream() {
        return IntStream.of(Arrays.copyOfRange(array, 0, size));
    }

}
  • These methods omit index validation for brevity and assume that the index passed in is valid
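To see what the growth rule capacity += capacity >> 1 in checkAndGrow does over time, the sketch below replays it in isolation. GrowthDemo and capacities are hypothetical names used only for this illustration; it does not touch the dynamic array class itself.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the growth policy only: capacity += capacity >> 1 (about 1.5x per step).
class GrowthDemo {

    /** Capacities reached by growing `steps` times, starting from `initial`. */
    static List<Integer> capacities(int initial, int steps) {
        List<Integer> result = new ArrayList<>();
        int capacity = initial;
        result.add(capacity);
        for (int s = 0; s < steps; s++) {
            capacity += capacity >> 1; // same rule as checkAndGrow
            result.add(capacity);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(capacities(8, 5)); // [8, 12, 18, 27, 40, 60]
    }
}
```

Note that with this rule an initial capacity of 1 would never grow, because 1 >> 1 is 0; the initial capacity of 8 used above avoids that case.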

Insert or delete performance

**Head position:** Every element after the head has to move by one position, so the time complexity is $O(n)$

**Middle position:** The elements after the given index also have to move, so the time complexity is $O(n)$

**Tail position:** The last position is reached directly through the index and no element has to move, so the time complexity is $O(1)$ (amortized, since an insert occasionally triggers a resize)
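The three cases differ only in how many elements System.arraycopy has to move. The sketch below counts those moves using the same lengths that add and remove pass to System.arraycopy (size - index and size - index - 1); the class ShiftCost and its methods are illustrative names, not part of the dynamic array.

```java
// Sketch: number of elements that must move for an insert or a removal.
class ShiftCost {

    /** Inserting at `index` moves the elements in [index, size). */
    static int movedOnInsert(int size, int index) {
        return size - index;
    }

    /** Removing at `index` moves the elements in (index, size). */
    static int movedOnRemove(int size, int index) {
        return size - index - 1;
    }

    public static void main(String[] args) {
        int size = 1000;
        System.out.println(movedOnInsert(size, 0));        // 1000: head, O(n)
        System.out.println(movedOnInsert(size, size / 2)); // 500: middle, O(n)
        System.out.println(movedOnInsert(size, size));     // 0: tail, O(1)
        System.out.println(movedOnRemove(size, size - 1)); // 0: removing the last element
    }
}
```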

Two-dimensional array

A two-dimensional array is an array of arrays: each element of the outer array is itself an array. For example:

int[][] array = {
    {11, 12, 13, 14, 15},
    {21, 22, 23, 24, 25},
    {31, 32, 33, 34, 35},
};

The memory layout is as follows

[Figure: memory layout of the two-dimensional array]

  • The outer array occupies 32 bytes; its three elements array[0], array[1] and array[2] each store a reference to one of the three one-dimensional arrays

  • The three one-dimensional arrays occupy 40 bytes each

  • In this memory layout they are stored contiguously

More generally, for a two-dimensional array $Array[m][n]$

  • $m$ is the length of the outer array, which can be regarded as the number of rows
  • $n$ is the length of the inner arrays, which can be regarded as the number of columns
  • Accessing $Array[i][j]$, where $0 \leq i \lt m, 0 \leq j \lt n$, is equivalent to
    • first finding the $i$-th inner array (row)
    • then finding the $j$-th element (column) in that inner array
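The two lookup steps above can be spelled out explicitly. The sketch below uses the example array from this section; the class TwoDimAccess and the method get are illustrative names.

```java
// Sketch: a two-dimensional access split into its two steps.
class TwoDimAccess {

    /** Same result as array[i][j], written as two separate lookups. */
    static int get(int[][] array, int i, int j) {
        int[] row = array[i]; // step 1: find the i-th inner array (row)
        return row[j];        // step 2: find the j-th element (column) in it
    }

    public static void main(String[] args) {
        int[][] array = {
            {11, 12, 13, 14, 15},
            {21, 22, 23, 24, 25},
            {31, 32, 33, 34, 35},
        };
        System.out.println(get(array, 1, 2)); // 23
        System.out.println(array[1][2]);      // 23
    }
}
```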

small test

In a Java environment (with class pointer and reference compression left at the default, i.e. enabled), consider the following two-dimensional array

byte[][] array = {
    {11, 12, 13, 14, 15},
    {21, 22, 23, 24, 25},
    {31, 32, 33, 34, 35},
};

Given that the array object starts at address 0x1000, what is the address of the element 23?

Answer:

  • Start address: 0x1000
  • Size of the outer array: 16-byte object header + 3 elements * 4 bytes per reference + 4 alignment bytes = 32 = 0x20
  • Size of the first inner array: 16-byte object header + 5 elements * 1 byte per element + 3 alignment bytes = 24 = 0x18
  • Second inner array: 16-byte object header = 0x10, and the element we are looking for is at index 2
  • Final result = 0x1000 + 0x20 + 0x18 + 0x10 + 2*1 = 0x104a
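The steps of this answer can be reproduced in code. This sketch makes the same assumptions as the small test (16-byte headers, 4-byte references, 8-byte alignment, and the outer and inner arrays laid out back to back); a real JVM does not guarantee that layout. NestedArrayAddress and its methods are hypothetical names.

```java
// Sketch only: reproduces the arithmetic of the small test under its assumptions.
class NestedArrayAddress {
    static final int HEADER = 16; // object header of an array (assumed)
    static final int REF = 4;     // size of a compressed reference (assumed)

    /** Round n up to a multiple of 8. */
    static long align8(long n) {
        return (n + 7) / 8 * 8;
    }

    /** Address of element [i][j], assuming all arrays are stored one after another. */
    static long addressOf(long start, int rows, int cols, int elementSize, int i, int j) {
        long outerSize = align8(HEADER + (long) rows * REF);
        long innerSize = align8(HEADER + (long) cols * elementSize);
        return start + outerSize + i * innerSize + HEADER + (long) j * elementSize;
    }

    public static void main(String[] args) {
        // byte[3][5], element 23 is at [1][2]
        System.out.println(Long.toHexString(addressOf(0x1000L, 3, 5, 1, 1, 2))); // 104a
    }
}
```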

principle of locality

Only spatial locality is discussed here

  • After the CPU reads data from memory (slow), it places the data in the cache (fast). If later computation needs that data and it is still in the cache, memory does not have to be read again
  • The smallest unit of cache storage is the cache line, typically 64 bytes. Reading only a little data at a time is not worthwhile, so at least 64 bytes are read to fill one cache line. Reading a piece of data therefore also reads its neighbors; this is called spatial locality

Impact on efficiency

Compare the execution efficiency of the following two methods ij and ji

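// StopWatch below is presumably Spring's org.springframework.util.StopWatch
// (an assumption; the source does not name the library)
int rows = 1000000;
int columns = 14;
int[][] a = new int[rows][columns];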

StopWatch sw = new StopWatch();
sw.start("ij");
ij(a, rows, columns);
sw.stop();
sw.start("ji");
ji(a, rows, columns);
sw.stop();
System.out.println(sw.prettyPrint());

ij method

public static void ij(int[][] a, int rows, int columns) {
    long sum = 0L;
    // outer loop over rows, inner loop over columns
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < columns; j++) {
            sum += a[i][j];
        }
    }
    System.out.println(sum);
}

ji method

public static void ji(int[][] a, int rows, int columns) {
    long sum = 0L;
    // outer loop over columns, inner loop over rows
    for (int j = 0; j < columns; j++) {
        for (int i = 0; i < rows; i++) {
            sum += a[i][j];
        }
    }
    System.out.println(sum);
}

Execution results

0
0
StopWatch '': running time = 96283300 ns
---------------------------------------------
ns         %     Task name
---------------------------------------------
016196200  017%  ij
080087100  083%  ji

As the results show, ij is much faster than ji. Why?

  • The cache is limited; when new data arrives, some old cache lines are overwritten
  • If cached data is not fully used before it is overwritten, efficiency suffers

Take the execution of ji as an example. The first iteration of the inner loop reads $[0,0]$. Because of spatial locality, reading $[0,0]$ also reads in $[0,1] ... [0,13]$, as shown in the figure

[Figure: cache contents after reading row 0]

Unfortunately, the second iteration of the inner loop needs $[1,0]$, which is not in the cache, so the data shown in the figure below is read in

[Figure: cache contents after reading row 1]

This is clearly wasteful: $[0,1] ... [0,13]$ and $[1,1] ... [1,13]$ have been read into the cache but are not used in time, and the cache size is limited. By the time the ninth iteration of the inner loop executes

[Figure: cache contents after reading row 8]

the first row of cached data has been overwritten by the new data $[8,0] ... [8,13]$. If it is needed again later, for example $[0,1]$, it has to be read from memory again

By the same analysis, the ij method makes full use of the cached data that was loaded thanks to the principle of locality
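The comparison can also be run without any external library. The sketch below swaps the StopWatch used earlier for System.nanoTime and makes ij and ji return their sums instead of printing them; the class LocalityBenchmark and the helper timeNanos are illustrative names. It is a naive single-run measurement with no JIT warm-up, so the absolute numbers will differ from machine to machine and from the results shown above.

```java
// Sketch: naive timing of row-major (ij) versus column-major (ji) traversal.
class LocalityBenchmark {

    /** Row by row: consecutive accesses touch neighboring elements. */
    static long ij(int[][] a, int rows, int columns) {
        long sum = 0L;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                sum += a[i][j];
            }
        }
        return sum;
    }

    /** Column by column: consecutive accesses jump between different rows. */
    static long ji(int[][] a, int rows, int columns) {
        long sum = 0L;
        for (int j = 0; j < columns; j++) {
            for (int i = 0; i < rows; i++) {
                sum += a[i][j];
            }
        }
        return sum;
    }

    /** Elapsed wall-clock time of one run of the task, in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int rows = 1_000_000;
        int columns = 14;
        int[][] a = new int[rows][columns];
        long ijNanos = timeNanos(() -> ij(a, rows, columns));
        long jiNanos = timeNanos(() -> ji(a, rows, columns));
        System.out.println("ij: " + ijNanos + " ns");
        System.out.println("ji: " + jiNanos + " ns");
    }
}
```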

learn by analogy

  1. The principle of locality also applies to I/O reads and writes

  2. Arrays can take full advantage of the principle of locality, but what about linked lists?

    Answer: Linked lists cannot, because their elements are not stored adjacently

Bounds checking

Java performs bounds checks when array elements are read or written, similar to the following code

bool is_within_bounds(int index) const        
{ 
    return 0 <= index && index < length(); 
}
  • Code location: openjdk\src\hotspot\share\oops\arrayOop.hpp

The programmer does not need to call this check; the JVM performs it for us
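From the Java side, the effect of this check is visible as an ArrayIndexOutOfBoundsException. The sketch below only observes that behavior; BoundsCheckDemo and throwsOnAccess are illustrative names.

```java
// Sketch: observing the JVM's bounds check from Java code.
class BoundsCheckDemo {

    /** Returns true if reading array[index] is rejected by the bounds check. */
    static boolean throwsOnAccess(int[] array, int index) {
        try {
            int ignored = array[index]; // the JVM checks 0 <= index < array.length
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5};
        System.out.println(throwsOnAccess(array, 4));  // false: last valid index
        System.out.println(throwsOnAccess(array, 5));  // true: index == length
        System.out.println(throwsOnAccess(array, -1)); // true: negative index
    }
}
```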

Origin blog.csdn.net/qq_43745578/article/details/129677176