Linear data structure summary

Array

An array is a linear data structure. Creating an array reserves a contiguous region of memory, and each element is stored at its index within that region.

Operations supported by an array

Query (get(int index))

Fetching an element by its index takes O(1) time. Because the array occupies a contiguous region of memory and every element has the same size, the memory address for a given index can be computed with a simple linear formula. For example, if element 0 of an int array lives at address b and index is the subscript you are looking for, the address you want is b + 4 * index, where 4 is the number of bytes an int occupies. In general the computer locates an element at b + index * TYPE_SIZE. Not every array is indexed this way, though: some virtual machines, for example, do not back an array with one contiguous block.
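The address arithmetic can be sketched in a few lines. Java does not expose raw memory addresses, so the base address below is a made-up number and addressOf is a hypothetical helper for illustration, not a real API.

```java
final class ArrayAddressDemo {
    // Hypothetical helper: models base + index * elementSize.
    static long addressOf(long base, int index, int elementSize) {
        return base + (long) index * elementSize;
    }

    public static void main(String[] args) {
        long base = 1000L;            // assumed address of element 0
        int intSize = Integer.BYTES;  // an int occupies 4 bytes
        System.out.println(addressOf(base, 3, intSize)); // 1000 + 3 * 4
    }
}
```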

Insert (add(int index, E e))

Inserting at a given index generally means shifting elements. Take an integer array holding ten elements: to insert a number A at the first position, every existing element must first move back by one slot, and only then can A be written into the first position. The average time complexity of inserting into an array is therefore O(n).
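A minimal sketch of the shift-then-write step, assuming a plain int[] with spare room at the end and a separately tracked element count. The insert helper is hypothetical.

```java
import java.util.Arrays;

final class ArrayInsertDemo {
    // Inserts value at index within data[0..size) by shifting the tail right.
    // Assumes there is a free slot, i.e. size < data.length. Returns the new size.
    static int insert(int[] data, int size, int index, int value) {
        if (index < 0 || index > size || size == data.length) {
            throw new IllegalArgumentException("bad index or array full");
        }
        for (int i = size - 1; i >= index; i--) {
            data[i + 1] = data[i]; // move each element back by one slot
        }
        data[index] = value;
        return size + 1;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 0}; // three elements, one free slot
        int size = insert(data, 3, 0, 9);
        System.out.println(Arrays.toString(data) + " size=" + size);
        // prints: [9, 1, 2, 3] size=4
    }
}
```

Inserting at index 0 is the worst case, since every element moves; inserting at index size moves nothing.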

Delete (remove(int index))

Deletion, like insertion, works on a given index. With ten elements, removing the first one means moving every element after it forward by one slot. The average time complexity of deleting from an array is therefore O(n).
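Deletion can be sketched the same way, again assuming an int[] plus a separately tracked element count. The remove helper is hypothetical.

```java
final class ArrayRemoveDemo {
    // Removes data[index] by shifting later elements forward by one slot.
    // Returns the new element count.
    static int remove(int[] data, int size, int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        for (int i = index; i < size - 1; i++) {
            data[i] = data[i + 1];
        }
        data[size - 1] = 0; // clear the vacated last slot
        return size - 1;
    }
}
```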

Modify (set(int index, E e))

Modification is simpler: access the element at the index directly and overwrite its value, which takes O(1) time.

Dynamic array

Many high-level languages provide a basic array type, but once we have added as many elements as the size declared at creation, there is no way to add more. That is a nuisance when we do not know up front how large the array needs to be, so we define our own dynamic array, whose initial capacity we do not have to worry about.

Implementing our own array

A dynamic array mainly needs to rework the add and delete operations, and it also has to implement resizing.

Add

The difference from the plain array's add is that we first check whether the array needs to grow. The condition for growing is that the number of elements equals the length of the array; the capacity is usually doubled. Steps for growing:

  • Allocate a new array twice the size of the original
  • Copy every element of the original array into the new array, then add the new element
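The growth steps can be sketched as follows. The class name, the int element type, and the initial capacity of 2 are arbitrary choices for illustration.

```java
final class GrowingIntArray {
    private int[] data = new int[2]; // small initial capacity, chosen arbitrarily
    private int size;

    void add(int value) {
        if (size == data.length) {   // full: element count equals array length
            resize(data.length * 2); // double the capacity
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }

    int size() { return size; }

    int capacity() { return data.length; }

    private void resize(int newCapacity) {
        int[] bigger = new int[newCapacity];
        System.arraycopy(data, 0, bigger, 0, size); // copy the old elements over
        data = bigger;
    }
}
```

Because the capacity doubles, a resize happens less and less often as the array grows, so the copy cost averages out to O(1) per add.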

Delete

The difference from the plain array's delete is that we check whether the array should shrink, since shrinking is also necessary. The shrink condition has slightly more to it than the growth condition: the array size before shrinking must be at least 2, and the current number of elements must be half of that size. Steps for shrinking:

  • Delete the element from the original array
  • Allocate a new array half the size of the original
  • Copy every remaining element of the original array into the new array

Thrashing caused by growing and shrinking

The thrashing comes from the grow and shrink conditions themselves. If we resize according to the conditions above and the number of elements hovers between 8 and 9, the array size keeps flipping between 8 and 16.

The fix depends on the specific workload. One option is to change the shrink condition to "the array size before shrinking is at least 2, and the current number of elements is 1/4 of the array size".
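A sketch of the 1/4 shrink rule, assuming removal only from the end to keep it short. The class name and the initial capacity of 8 are illustrative choices.

```java
final class ShrinkingIntArray {
    private int[] data = new int[8]; // initial capacity chosen for the 8/9 example
    private int size;

    void add(int value) {
        if (size == data.length) {
            resize(data.length * 2);
        }
        data[size++] = value;
    }

    int removeLast() {
        if (size == 0) {
            throw new IllegalStateException("array is empty");
        }
        int removed = data[--size];
        // Lazy shrink: halve the capacity only when the array is a quarter full.
        if (data.length >= 2 && size == data.length / 4) {
            resize(data.length / 2);
        }
        return removed;
    }

    int size() { return size; }

    int capacity() { return data.length; }

    private void resize(int newCapacity) {
        int[] copy = new int[newCapacity];
        System.arraycopy(data, 0, copy, 0, size);
        data = copy;
    }
}
```

With this rule, an element count hovering between 8 and 9 leaves the capacity at 16; the array only shrinks to 8 once the count drops to 4.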

How java.util.ArrayList grows

Growth naturally happens when an element is added:

    // The add method
    public boolean add(E e) {
        // modCount records the number of structural modifications; iterators rely on it for their fail-fast check
        modCount++;
        // elementData is the backing array; size is the number of elements stored
        add(e, elementData, size);
        return true;
    }
    
    private void add(E e, Object[] elementData, int s) {
        // The element count equals the array length, so the array must grow
        if (s == elementData.length)
            elementData = grow(s+1);
        elementData[s] = e;
        size = s + 1;
    }
    
    private Object[] grow(int minCapacity) {
        // As you can see, growth is also done by copying into a new array
        return elementData = Arrays.copyOf(elementData,
                                           newCapacity(minCapacity));
    }
    
    // This is where the new capacity is actually decided
    private int newCapacity(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        // The new size is 1.5 times the old size
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        // If 1.5 times the old size still does not exceed minCapacity (oldCapacity + 1 on this path)
        if (newCapacity - minCapacity <= 0) {
            // If the array is the empty default and this is the first add, grow straight to the larger of 10 and minCapacity
            if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA)
                return Math.max(DEFAULT_CAPACITY, minCapacity);
            if (minCapacity < 0) // overflow
                throw new OutOfMemoryError();
            return minCapacity;
        }
        // Return the new capacity, capped at the maximum array size
        return (newCapacity - MAX_ARRAY_SIZE <= 0)
            ? newCapacity
            : hugeCapacity(minCapacity);
    }

As we can see, the idea behind growing is simple. My environment is JDK 9; I expect other versions to follow a similar approach. Shrinking is not covered here; the ArrayList source is fairly easy to read.
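To make the 1.5x rule concrete, here is a small sketch that repeats the same arithmetic as newCapacity in the listing, starting from the default capacity of 10. The class and method names are made up for the example, and the overflow handling of the real method is left out.

```java
final class GrowthSequenceDemo {
    // Same arithmetic as newCapacity in the listing: old + old / 2.
    static int nextCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        int capacity = 10; // DEFAULT_CAPACITY
        for (int i = 0; i < 4; i++) {
            System.out.print(capacity + " ");
            capacity = nextCapacity(capacity);
        }
        // prints: 10 15 22 33
    }
}
```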

Linked list

Like an array, a linked list is a linear structure, but it is made up of nodes, and each node holds a pointer to the next node. In other words, a linked list does not need contiguous space the way an array does. Figure:

image

Linked list operations

Let's look at how CRUD operations work on a linked list.

Insert

Insertion has O(1) time complexity because, unlike an array, no elements need to be moved. For example, given A->C, the steps to insert B after A are:

  • Save A's next pointer in temp
  • Point A's next at B
  • Point B's next at temp
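The three steps map directly onto code. This is a minimal sketch with a hypothetical Node class holding a String; render exists only to make the result visible.

```java
final class LinkedInsertDemo {
    static final class Node {
        final String value;
        Node next;
        Node(String value) { this.value = value; }
    }

    // Inserts b directly after a in O(1): no other node is touched.
    static void insertAfter(Node a, Node b) {
        Node temp = a.next; // save A's old successor
        a.next = b;         // A now points at B
        b.next = temp;      // B points at the old successor
    }

    // Renders the list as "A->B->C" for inspection.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(n.value);
        }
        return sb.toString();
    }
}
```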

Delete

Deletion has O(1) time complexity because, unlike an array, no elements need to be moved. For example, given A->B->C, the steps to delete B are:

  • Save node B in a variable temp
  • Point A's next at C (which releases B)
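A sketch of the same two steps, assuming we already hold a reference to A. The Node class and removeAfter are illustrative names.

```java
final class LinkedDeleteDemo {
    static final class Node {
        final String value;
        Node next;
        Node(String value) { this.value = value; }
    }

    // Unlinks and returns the node after a, or null if a has no successor.
    static Node removeAfter(Node a) {
        Node temp = a.next;  // save B
        if (temp == null) {
            return null;
        }
        a.next = temp.next;  // A now points at C
        temp.next = null;    // detach B so it can be garbage collected
        return temp;
    }
}
```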

Query

A query has O(n) time complexity, because a linked list cannot compute a memory address directly from an index the way an array can. It has to traverse the nodes to find the one at the given index.
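A sketch of index lookup by traversal, using a hypothetical Node class that holds an int:

```java
final class LinkedGetDemo {
    static final class Node {
        final int value;
        final Node next;
        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Walks index steps from the head: O(n) in the worst case.
    static int get(Node head, int index) {
        Node current = head;
        for (int i = 0; i < index && current != null; i++) {
            current = current.next;
        }
        if (index < 0 || current == null) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return current.value;
    }
}
```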

The O(1) cost of insertion and deletion here is easy to understand, and it might lead us to conclude that arrays are efficient at query and modification while linked lists are efficient at insertion and deletion. In practice it depends on the case. To insert or delete at a given position, we first have to traverse to that position, since all we hold is the address of the head node or the tail node. I once ran a test with Java's LinkedList (a doubly linked list) and ArrayList (a dynamic array): when adding about one million elements, the array-based add often turned out to be faster, because adding at a specified position in the linked list has to traverse the list's elements first.

Dummy head node: a linked list usually also has a head node that holds no data and simply points to the first real node. Its benefit is that deleting the last remaining element needs no special-case check for a list with only one element.
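A sketch of why the dummy head helps: every real node has a predecessor, so remove uses one code path for the first node, a middle node, or the only node. The class and its method names are illustrative.

```java
final class DummyHeadList {
    private static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private final Node dummyHead = new Node(0); // holds no data
    private int size;

    void addFirst(int value) {
        Node node = new Node(value);
        node.next = dummyHead.next;
        dummyHead.next = node;
        size++;
    }

    // Removes the first node equal to value; no special case for the head.
    boolean remove(int value) {
        for (Node prev = dummyHead; prev.next != null; prev = prev.next) {
            if (prev.next.value == value) {
                prev.next = prev.next.next;
                size--;
                return true;
            }
        }
        return false;
    }

    int size() { return size; }

    boolean isEmpty() { return dummyHead.next == null; }
}
```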

Stacks and queues

Stacks and queues are very similar, so it helps to study them together.

  • A stack is a FILO (first in, last out) structure that can be implemented with an array or a linked list. A common example is a web browser: navigating from the current page to another page is a push, and going back is a pop.
  • A queue is a FIFO (first in, first out) structure that can be implemented with an array or a linked list. Queues are used very widely, for example wherever things wait in line.

Stack

Array-based stack

Structure: an array named array and a variable top

The array stores the pushed data. top points to the position just after the last element; if the stack is empty, top is 0.

For example:

image
As shown in the figure:

  • Empty stack: top points to index 0
  • Full stack: top == array.length
  • Push: if there is room, array[top++] = element
  • Pop: if the stack is not empty, return array[--top]
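A minimal fixed-capacity sketch of the array-based stack. Because top points one past the last element, pop has to decrement top before reading, hence array[--top]. The int element type and the exceptions thrown are choices made for the example.

```java
final class ArrayStack {
    private final int[] array;
    private int top; // index of the next free slot; 0 when the stack is empty

    ArrayStack(int capacity) {
        array = new int[capacity];
    }

    boolean isEmpty() { return top == 0; }

    boolean isFull() { return top == array.length; }

    void push(int value) {
        if (isFull()) {
            throw new IllegalStateException("stack is full");
        }
        array[top++] = value; // write, then advance top
    }

    int pop() {
        if (isEmpty()) {
            throw new IllegalStateException("stack is empty");
        }
        return array[--top]; // step back, then read
    }
}
```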

Linked-list-based stack

Structure: a top pointer and nodes

  • A node holds data and a next pointer to the next node
  • The top pointer points to the node at the top of the stack

Figure:

image

  • Empty stack: top points to null
  • Push: construct a new node, point the new node's next at the node top points to, then point top at the new node
  • Pop: save the node top points to, point top at that node's next, and return the saved node
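The same operations as a short generic sketch. Returning the stored data rather than the node itself is a choice made for the example.

```java
final class LinkedStack<E> {
    private static final class Node<E> {
        final E data;
        final Node<E> next;
        Node(E data, Node<E> next) {
            this.data = data;
            this.next = next;
        }
    }

    private Node<E> top; // null when the stack is empty

    boolean isEmpty() { return top == null; }

    void push(E value) {
        // The new node's next is the old top; top then moves to the new node.
        top = new Node<>(value, top);
    }

    E pop() {
        if (top == null) {
            throw new IllegalStateException("stack is empty");
        }
        Node<E> node = top; // save the current top
        top = node.next;    // top moves down one node
        return node.data;
    }
}
```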

Queue

Linked-list-based queue

Structure: a dummy head node and nodes

  • Dummy head node: a front pointer to the first element and a rear pointer to the last element
  • Node: data holds the element, and next points to the next node

Figure:

image

  • Empty queue: front and rear both point to null
  • Enqueue: construct a new node, point the current rear node's next at it, then point rear at the new node
  • Dequeue: check that the queue is not empty, save the node front points to, point front at that node's next, and return the saved node
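A sketch of the linked queue with front and rear kept as two fields. Two details go slightly beyond the steps listed: the first enqueue also has to set front, and dequeuing the last node has to reset rear to null.

```java
final class LinkedQueue<E> {
    private static final class Node<E> {
        final E data;
        Node<E> next;
        Node(E data) { this.data = data; }
    }

    private Node<E> front; // first node, null when empty
    private Node<E> rear;  // last node, null when empty

    boolean isEmpty() { return front == null; }

    void enqueue(E value) {
        Node<E> node = new Node<>(value);
        if (rear == null) {
            front = node;      // first element: front and rear are the same node
        } else {
            rear.next = node;  // link behind the current tail
        }
        rear = node;
    }

    E dequeue() {
        if (front == null) {
            throw new IllegalStateException("queue is empty");
        }
        Node<E> node = front;
        front = node.next;
        if (front == null) {
            rear = null;       // queue became empty
        }
        return node.data;
    }
}
```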

Array-based queue

An array-based queue differs from an array-based stack in a few ways. Enqueuing can simply append at the back of the array, but once elements at the front are dequeued, the slots they occupied sit empty and can no longer be used. One fix is to shift the remaining elements forward on each dequeue, just as a dynamic array does when its first element is deleted, but moving the whole queue by one slot on every dequeue is too expensive. This is why the circular queue is introduced.

Understanding the circular queue

The problem the circular queue solves is the wasted space in an array-based queue. A circular queue is not circular in its physical addresses; it is circular only logically.

Structure: an array, front pointing to the head of the queue, and rear pointing to the slot just after the tail element

front = front % array.length

Figure:

image

  • There are two pointers, front and rear: front is the index of the head of the queue, and rear is the index used for the tail.
  • Empty queue: front and rear point to the same position
  • Enqueue a, b, c: the head index front stays put, and rear advances (rear = (rear + 1) % array.length) to the slot after the last element
  • Dequeue a: front moves to the next element (front = (front + 1) % array.length)
  • The last panel of the figure shows the queue full after d2 is added: (rear + 1) % array.length == front

Points to note

  1. Enqueue: rear = (rear + 1) % array.length
  2. Dequeue: front = (front + 1) % array.length
  3. The circular queue leaves one element slot unused; the benefit is that we can tell an empty queue from a full one.
  4. Empty queue: front == rear
  5. Full queue: (rear + 1) % array.length == front; at this point the queue needs to grow
  6. Growing: the logic is similar to that of the dynamic array, except that front and rear must be reassigned
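Putting these points together, a circular queue might look like the sketch below. The int element type, the doubling factor, and the choice to allocate capacity + 1 slots (so the requested capacity is fully usable) are assumptions made for the example.

```java
final class CircularQueue {
    private int[] array;
    private int front; // index of the head element
    private int rear;  // index one past the tail element

    CircularQueue(int capacity) {
        array = new int[capacity + 1]; // one slot always stays unused
    }

    boolean isEmpty() { return front == rear; }

    boolean isFull() { return (rear + 1) % array.length == front; }

    int size() { return (rear - front + array.length) % array.length; }

    void enqueue(int value) {
        if (isFull()) {
            grow();
        }
        array[rear] = value;
        rear = (rear + 1) % array.length;
    }

    int dequeue() {
        if (isEmpty()) {
            throw new IllegalStateException("queue is empty");
        }
        int value = array[front];
        front = (front + 1) % array.length;
        return value;
    }

    // Copies the elements in queue order into a larger array,
    // then reassigns front and rear for the new layout.
    private void grow() {
        int count = size();
        int[] bigger = new int[array.length * 2];
        for (int i = 0; i < count; i++) {
            bigger[i] = array[(front + i) % array.length];
        }
        array = bigger;
        front = 0;
        rear = count;
    }
}
```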

Hash table

When we want to look up someone's information at school, we take their student number to the Office of Academic Affairs, and the office uses the student number we provide to hand back that student's information. Looking up a student by student number is the idea behind a hash table: the mapping from student number to student information is the hash function, and two student numbers mapping to the same student record is a hash collision.

Next, let's go through the hash table terminology:

  • Hash table: a data structure that uses a given key to go directly to the corresponding value
  • Hash function: converts a given key into a memory address or index
  • Hash collision: different keys are converted to the same index by the hash function

What a hash table is for

Lookup in a hash table has O(1) time complexity on average: the idea is to reach the element directly from its key. Insertion and deletion are O(1) as well, which is to say the hash table is efficient at all of these operations.

Structure of the hash table

The underlying storage is a Node array named array, together with M, the current number of elements

  • A Node has a key, a value, and next (the next Node).
  • M is the number of elements currently stored in the table

Hash function

We want the hash function to produce different values for different keys wherever possible, so the choice of hash function matters.

The following are common hash functions:

  • Direct addressing: take the key itself, or a linear function of the key, as the hash address. That is, H(key) = key or H(key) = a * key + b, where a and b are constants.
  • Digit analysis: analyze the data to find the part that collides least, and build the hash address from it. With ID numbers, for example, we can extract the significant digits, such as the date of birth and the last six digits.
  • Mid-square method: square the key and take the middle few digits as the hash address.
  • Random number method: apply a random function to the key and take the result as the hash address. This is commonly used when keys differ in length.
  • Division-remainder method: H(key) = key MOD p, where key is the key, MOD is the modulo operator, and p is the length of the hash table. p is preferably a prime number, which keeps hash collisions as few as possible.
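As a small illustration of the division-remainder method, the sketch below uses Math.floorMod so that negative keys still land in the range 0 to p - 1. The class name and the choice of p = 7 in the comments are arbitrary.

```java
final class ModHashDemo {
    // H(key) = key MOD p; floorMod keeps the result in [0, p) for negative keys.
    static int hash(int key, int p) {
        return Math.floorMod(key, p);
    }

    public static void main(String[] args) {
        System.out.println(hash(15, 7)); // 1
        System.out.println(hash(22, 7)); // 1, so 15 and 22 collide when p = 7
    }
}
```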

In what follows, we use the division-remainder method as the hash function for the analysis.

Hash collision

A hash collision occurs when two different keys yield the same result from hash(key). There are several ways to resolve collisions:

  • Separate chaining: if A hashes to array index 2 and B also hashes to array index 2, B is simply linked after A. The hash table as a whole then becomes an array plus linked lists.
  • Open addressing:
    • Linear probing: on a collision, keep adding 1 to the index until a position with no element is found, then store there
    • Quadratic probing: on a collision, offset the index by 1^2, 2^2, 3^2 ... until a position with no element is found, then store there
    • Pseudo-random probing: on a collision, add a pseudo-random number to the key and take the modulus again, until a position with no element is found, then store there
  • Rehashing: construct several different hash functions up front; if the first one collides, use the second, and so on
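Of the open-addressing variants, linear probing is the easiest to sketch. The example below assumes an Integer[] table where null marks an empty slot and uses key MOD length as the starting index; insert is a hypothetical helper that returns the slot it used.

```java
final class LinearProbingDemo {
    // Stores key in the first free slot at or after its home index,
    // wrapping around the end of the table. Returns the slot used.
    static int insert(Integer[] table, int key) {
        int start = Math.floorMod(key, table.length);
        for (int step = 0; step < table.length; step++) {
            int slot = (start + step) % table.length;
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        throw new IllegalStateException("table is full");
    }
}
```

With a table of length 7, the keys 15, 22, and 8 all have home index 1, so they end up in slots 1, 2, and 3.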

In what follows, we use separate chaining as the collision resolution strategy for the analysis.

Hash table operations

Insert

Suppose a user adds a key-value pair K, V. First compute the hash value from K, then take that hash value modulo the array length to get the index. If there is no element at the index, wrap K, V in a Node and store it at the index; if the index already holds elements, wrap K, V in a Node and append it to the end of the chain at that index.

Lookup

Suppose a user looks up a node by K. Take hash(K) modulo the array length to get the index. If there is no element at the index, the lookup fails; if there are elements, traverse the linked list starting from that element and return the node if it is found.

Delete and modify follow similar logic.
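The insert, lookup, and delete logic can be sketched as a small chained hash table. This version assumes a fixed, positive capacity with no resizing, uses hashCode modulo the array length as the hash function, and overwrites the value when the key is already present. The class and method names are illustrative.

```java
import java.util.Objects;

final class ChainedHashTable<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Node<K, V>[] array;
    private int size; // M: number of elements currently stored

    @SuppressWarnings({"unchecked", "rawtypes"})
    ChainedHashTable(int capacity) { // capacity must be positive
        array = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(K key) {
        return Math.floorMod(key == null ? 0 : key.hashCode(), array.length);
    }

    // Returns the previous value for key, or null if the key is new.
    V put(K key, V value) {
        int i = indexFor(key);
        Node<K, V> last = null;
        for (Node<K, V> n = array[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
            last = n;
        }
        Node<K, V> node = new Node<>(key, value);
        if (last == null) {
            array[i] = node;  // empty slot: the node becomes the chain head
        } else {
            last.next = node; // collision: append to the end of the chain
        }
        size++;
        return null;
    }

    V get(K key) {
        for (Node<K, V> n = array[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null; // lookup failed
    }

    V remove(K key) {
        int i = indexFor(key);
        Node<K, V> prev = null;
        for (Node<K, V> n = array[i]; n != null; prev = n, n = n.next) {
            if (Objects.equals(n.key, key)) {
                if (prev == null) {
                    array[i] = n.next;
                } else {
                    prev.next = n.next;
                }
                size--;
                return n.value;
            }
        }
        return null;
    }

    int size() { return size; }
}
```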

HashMap implementation

Java's HashMap is a very well-packaged hash table, and analyzing its implementation helps us design our own hash table better.

Let's look at its key fields:

    // Load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    
    // The array inside the hash table
    transient Node<K,V>[] table;
    
    // Each slot of the array holds a node of the following type
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
    
        // ... other node methods
        
    }

Next, let's walk through the flow of a put operation (insertion).

    // Add a key-value pair
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
    
    // The hash algorithm used
    static final int hash(Object key) {
        int h;
        // As you can see, the hash is the key's hashCode XORed with itself shifted right by 16 bits
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
    
    // The putVal operation that is finally called
    /**
     * hash is the hash value passed in
     * key is the key, value is the value
     * onlyIfAbsent == false means the value of an existing key is overwritten
     * evict == false means the table is in creation mode
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the HashMap has no table yet, initialize it via resize()
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // No hash collision: build a new list node and put it straight into the array; i is the computed index
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // From here on, the slot is already occupied (hash collision)
        else {
            Node<K,V> e; K k;
            // The node at this index is the same key when its hash equals the incoming hash AND
            // the key is either the same reference or equal according to equals().
            // This is the key-equality logic, and I was bitten by it once. Two null keys also match here, via ==.
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // Not the same key, and the bin is a TreeNode (a chain is converted to a red-black tree once it grows past TREEIFY_THRESHOLD, 8)
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Otherwise, append a list node in the normal way
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // If the key was already present, replace the value (unless onlyIfAbsent forbids it) and return the old value
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
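Two expressions in the listing are worth trying in isolation: the hash spreading step and the index computation (n - 1) & hash. The sketch below copies both into a standalone class; the class and method names are made up, and n is assumed to be a power of two, as HashMap's table length always is.

```java
final class HashSpreadDemo {
    // Same expression as hash(Object key) in the listing.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        Integer key = 65536; // hashCode is 0x10000: only a high bit is set
        System.out.println(indexFor(key.hashCode(), 16)); // 0: the high bit is masked away
        System.out.println(indexFor(spread(key), 16));    // 1: the XOR folds the high bit down
    }
}
```

Since the mask only keeps the low bits, XORing in the upper 16 bits lets them influence the bucket index even in small tables.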

A few points to note about HashMap's put operation:

  • If the key passed in is null, it is still added through the regular putVal path with the same logic as above, and its hash value comes out as 0
  • How are two keys judged equal? Simply put, the two keys must have the same hashCode and, at the same time, be the same reference or be equal according to equals. Long ago I fell into a pit here: two objects returned the same hashCode but equals returned false, so adding them always produced two entries. That was my own design error; when overriding hashCode, make sure the equals method stays consistent with it.
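The pitfall is easy to reproduce. In this sketch, BrokenKey and GoodKey are hypothetical key classes: the first overrides only hashCode, the second overrides hashCode and equals consistently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class KeyContractDemo {
    // Broken on purpose: hashCode is overridden but equals is not,
    // so two instances with the same id are never considered equal.
    static final class BrokenKey {
        final String id;
        BrokenKey(String id) { this.id = id; }
        @Override public int hashCode() { return id.hashCode(); }
    }

    // hashCode and equals are both based on id, so they stay consistent.
    static final class GoodKey {
        final String id;
        GoodKey(String id) { this.id = id; }
        @Override public int hashCode() { return Objects.hash(id); }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id.equals(id);
        }
    }

    public static void main(String[] args) {
        Map<BrokenKey, Integer> broken = new HashMap<>();
        broken.put(new BrokenKey("a"), 1);
        broken.put(new BrokenKey("a"), 2);

        Map<GoodKey, Integer> good = new HashMap<>();
        good.put(new GoodKey("a"), 1);
        good.put(new GoodKey("a"), 2);

        System.out.println(broken.size() + " " + good.size()); // prints: 2 1
    }
}
```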

Next I will summarize tree-structured data types, and in the following post I will share my own implementations of these data types.

Origin juejin.im/post/5ceb3cc65188251a9b1d5625