17 Learn Java Collection System

What is Collection

First, open the source code of the Collection interface and read the first paragraph of its documentation comment:

  The root interface in the collection hierarchy.  A collection
  represents a group of objects, known as its elements.  Some
  collections allow duplicate elements and others do not.  Some are ordered
  and others unordered.  The JDK does not provide any direct
  implementations of this interface: it provides implementations of more
  specific subinterfaces like Set and List.  This interface
  is typically used to pass collections around and manipulate them where
  maximum generality is desired.

In other words:

Collection is the root interface of the collection hierarchy.

A collection represents a group of objects, each of which is called an element.

Some collections allow duplicate elements and others do not.

Some collections are ordered and others are unordered.

The JDK does not provide any direct implementation of this interface; instead it provides implementations of more specific sub-interfaces such as Set and List.

This interface is typically used to pass collections around and manipulate them where maximum generality is needed.

The Collection interface is an example of polymorphism in Java. It only declares the methods that every collection needs and provides no implementation of its own; the concrete implementing classes supply the behavior. The methods described below are all declared by Collection.

You can usually tell what a method does from its name, which is one benefit of following naming conventions.

add adds an element.

addAll adds all the elements of another collection to the current collection.

remove deletes an element.

removeAll deletes from the current collection every element contained in the collection passed as the argument.

Readers who want more detail can read the source code comments of this interface directly; they are very thorough.
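The four methods above can be tried through a variable typed as Collection. The following is only a minimal sketch: the class name CollectionBasics and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;

class CollectionBasics {
    // Runs the same operations against any Collection implementation.
    static Collection<String> demo(Collection<String> c) {
        c.add("a");                             // add one element
        c.addAll(Arrays.asList("b", "c", "a")); // add every element of another collection
        c.remove("b");                          // remove a single matching element
        c.removeAll(Arrays.asList("c"));        // remove everything contained in the argument
        return c;
    }

    public static void main(String[] args) {
        System.out.println(demo(new ArrayList<>())); // [a, a]  (a list keeps the duplicate)
        System.out.println(demo(new HashSet<>()));   // [a]     (a set keeps only one "a")
    }
}
```

The same demo method works for both implementations because it depends only on the Collection interface.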

List

 An ordered collection (also known as a sequence).  The user of this
 interface has precise control over where in the list each element is
inserted.  The user can access elements by their integer index (position in
 the list), and search for elements in the list.

List is a sub-interface of Collection. Compared with Collection, the List interface adds methods that read, insert, and replace elements by index.
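As a small illustration of index-based access, here is a sketch using ArrayList; the class name ListIndexDemo and the values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ListIndexDemo {
    static List<String> build() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("c");
        list.add(1, "b");   // insert at index 1, shifting "c" to index 2
        list.set(2, "C");   // replace the element at index 2
        return list;
    }

    public static void main(String[] args) {
        List<String> list = build();
        System.out.println(list);               // [a, b, C]
        System.out.println(list.get(0));        // a
        System.out.println(list.indexOf("b"));  // 1
    }
}
```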

ArrayList

ArrayList is a concrete implementation of the List interface. We can use it directly to add, remove, update, and look up elements, for example:

List<String> strArr = new ArrayList<String>();
strArr.add("1");
strArr.add("2");
strArr.add("3");
// ... keep adding indefinitely

The question is: can the ArrayList in the code above really accept an unlimited number of elements? How does it do that?

At its core, an ArrayList is an array. The class has a field, Object[] elementData, which stores the elements added through add and similar methods. When elementData is too short, a larger array is created, all the data from the old array is copied into it, and the new element is then added to the new array. This process is called expansion.

// Our own code
public static void main(String[] args) {
    ArrayList<String> list = new ArrayList<String>();
    list.add("1");
}

// ArrayList's implementation of add
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

// Called by add: makes sure the backing array can hold minCapacity elements
private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

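// Called by add: works out the capacity that is actually required
private static int calculateCapacity(Object[] elementData, int minCapacity) {
    // Check whether the backing array is still the default empty array from construction.
    // If so, use the larger of minCapacity and DEFAULT_CAPACITY as the new array length.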
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    return minCapacity;
}


// Called by add: grows the array only when it is too small
private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    // If the required minimum capacity (size + 1) exceeds the current array length, perform the actual expansion
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

// Called by add: the actual expansion
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    // Grow to 1.5 times the old capacity by default
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    // Copy the old data into a new, larger array and assign it back to elementData.
    // This is the key step.
    elementData = Arrays.copyOf(elementData, newCapacity);
}

Look at the grow method in the ArrayList source code: it is the critical step, where the old array is replaced by the new one. ArrayList encapsulates the direct manipulation of the array for us.

We only need to call add to keep adding elements, which is one of the benefits of encapsulation. Under normal circumstances an ArrayList can hold as many objects as your business scenario needs. Used carelessly, however, it can exhaust the heap and cause an OutOfMemoryError. As long as memory does not run out, you can keep adding elements.
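To make the expansion step easier to see, here is a simplified sketch of a growable array. It is not the JDK code: the class SimpleGrowableArray, its capacity() accessor, and the tiny initial capacity of 2 are assumptions chosen so that the growth is visible after a few calls.

```java
import java.util.Arrays;

class SimpleGrowableArray {
    private Object[] elementData = new Object[2]; // deliberately tiny initial capacity
    private int size;

    void add(Object e) {
        if (size == elementData.length) {
            // Grow by roughly 1.5x, but always by at least one slot.
            int newCapacity = Math.max(size + 1, elementData.length + (elementData.length >> 1));
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
        elementData[size++] = e;
    }

    Object get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        return elementData[index];
    }

    int size() { return size; }

    int capacity() { return elementData.length; }

    public static void main(String[] args) {
        SimpleGrowableArray list = new SimpleGrowableArray();
        for (int i = 0; i < 5; i++) {
            list.add(i);
            // capacity goes 2, 2, 3, 4, 6 as elements are added
            System.out.println("size=" + list.size() + " capacity=" + list.capacity());
        }
    }
}
```

The caller only ever calls add; the copying into a larger array stays hidden inside the class, just as it does in ArrayList.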

Set

A collection that contains no duplicate elements.  More formally, sets
contain no pair of elements e1 and e2 such that
 e1.equals(e2), and at most one null element.  As implied by
its name, this interface models the mathematical set abstraction.

A Set is a collection that does not allow duplicate (equal) elements and contains at most one null. It suits scenarios that store distinct elements; hash-based implementations such as HashSet also make lookups efficient.

HashSet

HashSet is one implementation of the Set interface. Besides the Set guarantees, note that it makes no guarantee about the iteration order of its elements.

Lookup speed compared with List

public static void main(String[] args) {
    ArrayList<Integer> list = new ArrayList<>();
    HashSet<Integer> set = new HashSet<>();
    for (int i = 0; i < 10_000_000; i++) {
        list.add(i);
        set.add(i);
    }
    // Look up an element in the list (99_999_999 is absent, so the whole list is scanned)
    Date listStartTime = new Date();
    list.contains(99_999_999);
    Date listEndTime = new Date();
    System.out.println((listEndTime.getTime() - listStartTime.getTime()) + "ms");
    // Look up the same element in the set
    Date setStartTime = new Date();
    set.contains(99_999_999);
    Date setEndTime = new Date();
    System.out.println((setEndTime.getTime() - setStartTime.getTime()) + "ms");
}

In the author's run, the list lookup took 34 milliseconds, while the set lookup took less than 1 millisecond. This is the efficiency gap.

Storage uniqueness example

public static void main(String[] args) {
    ArrayList<Integer> list = new ArrayList<>();
    HashSet<Integer> set = new HashSet<>();
    for (int i = 0; i < 10; i++) {
        list.add(1);
        set.add(1);
    }
    System.out.println(list);
    System.out.println(set);
}

As you can see, there are ten 1s in the list, but there is only one in the set.

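Why HashSet is so much faster than ArrayList: hashCode

The hashCode contract in Java:

1. The same object must always return the same hashCode.
2. If equals returns true for two objects, they must return the same hashCode.
3. Two objects that are not equal may still return the same hashCode.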

Why does hashCode exist? Let's look at an example.

Suppose a set already holds one million elements. When another element is added, how do we decide whether it is a duplicate?

Without hashing, we would have to loop over all one million elements and compare them one by one, which is very inefficient.

Instead, we can distribute the million elements into hash buckets. For example, Zhang San, Zhang Si, and Zhang Wu go into the bucket for the surname Zhang, and everyone surnamed Li goes into the bucket for Li. The next time someone surnamed Zhang arrives, we only search the Zhang bucket; for someone surnamed Li, we only search the Li bucket.

In Java, hashCode decides which bucket an object goes into. Given an object, its hashCode leads to its bucket, and only within that bucket do we check whether there is an object that is the same instance or is equal according to equals. This makes it very efficient to determine whether an equal object already exists.
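The bucket idea can be sketched in a few lines. This is a toy illustration and not how HashSet is actually written: the class BucketSetSketch, the fixed bucket count, and the use of plain lists as buckets are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class BucketSetSketch {
    private final List<List<String>> buckets = new ArrayList<>();

    BucketSetSketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Map a hashCode to a bucket index; floorMod keeps negative hash codes in range.
    private List<String> bucketFor(String s) {
        return buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
    }

    // Returns false when an equal element is already present.
    boolean add(String s) {
        List<String> bucket = bucketFor(s);
        if (bucket.contains(s)) {
            return false;               // only this one bucket is scanned
        }
        bucket.add(s);
        return true;
    }

    boolean contains(String s) {
        return bucketFor(s).contains(s);
    }

    public static void main(String[] args) {
        BucketSetSketch set = new BucketSetSketch(16);
        System.out.println(set.add("Zhang San"));    // true
        System.out.println(set.add("Li Si"));        // true
        System.out.println(set.add("Zhang San"));    // false, duplicate
        System.out.println(set.contains("Li Si"));   // true
        System.out.println(set.contains("Wang Wu")); // false
    }
}
```

However many elements are stored, add and contains only compare against the elements that share a bucket with the argument.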

Example of out-of-order storage

public static void main(String[] args) {
    HashSet<String> set = new HashSet<>();
    set.add("3");
    set.add("9");
    set.add("7");
    set.add("13");
    set.add("1");
    System.out.println(set);
}

As you can see, the elements are not output in the order in which we added them.

Map

An object that maps keys to values.  A map cannot contain duplicate keys;
each key can map to at most one value.

This interface takes the place of the Dictionary class, which
was a totally abstract class rather than an interface.

A Map is an object that maps keys to values. A map cannot contain duplicate keys, and each key maps to at most one value.

The Map interface replaces the Dictionary class.

The difference between Map and Dictionary is that Dictionary is a fully abstract class, whereas Map is an interface. In a Dictionary, too, each key is associated with at most one value.

Of course, Map is only an interface and has many implementations. Let's look at the most commonly used one, HashMap.

HashMap

Hash table based implementation of the Map interface. 

This implementation provides all of the optional map operations, and permits null values and the null key. 

The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.

This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

HashMap is an implementation of the Map interface based on a hash table.

It implements all of the optional map operations and allows null as a key or as a value.

HashMap is roughly equivalent to Hashtable, except that HashMap is not synchronized (not thread-safe) and permits nulls.

HashMap makes no guarantee about the order of its entries, and the order may even change over time.
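The difference in null handling between HashMap and Hashtable can be checked with a short sketch; the class name NullKeyDemo and the helper acceptsNullKey are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Returns true if the given map accepts a null key.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<>())); // false
    }
}
```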

HashMap stores an entry with put(K key, V value)

public static void main(String[] args) {
    Map<String,String> hashMap = new HashMap<>();
    hashMap.put("A","1");
    hashMap.put("B","2");
    System.out.println(hashMap);
}

HashMap retrieves a value with get(Object key)

public static void main(String[] args) {
    Map<String,String> hashMap = new HashMap<>();
    hashMap.put("A","1");
    hashMap.put("B","2");
    System.out.println(hashMap);
    // look up the value for key "A"
    System.out.println(hashMap.get("A"));
}

HashMap's keySet()


Returns a Set view of the keys contained in this map.

The set is backed by the map, so changes to the map are reflected in the set, and vice-versa.

The keySet() method returns a Set view of the keys in the current map. Changes to the map are reflected in this Set, and vice versa. (Keys in a map cannot repeat and neither can the elements of a Set, so a Set is a natural fit for holding the keys.)

public static void main(String[] args) {
    Map<String,String> hashMap = new HashMap<>();
    hashMap.put("A","1");
    hashMap.put("B","2");
    // Get the map's key set
    Set<String> keySet = hashMap.keySet();
    // Print it
    System.out.println(keySet);
    // Remove a key from the map
    hashMap.remove("A");
    // Print again: "A" is now missing from the Set as well
    System.out.println(keySet);
}

The code above and its output confirm that the Set returned by keySet is a live view: whenever a key is added to or removed from the map, the Set changes accordingly.
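The "vice versa" direction can be shown the same way: removing a key through the Set removes the mapping from the map. A minimal sketch, with KeySetViewDemo as a made-up class name:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class KeySetViewDemo {
    static Map<String, String> removeThroughKeySet() {
        Map<String, String> map = new HashMap<>();
        map.put("A", "1");
        map.put("B", "2");
        Set<String> keys = map.keySet();
        keys.remove("A");   // removing from the view removes the mapping from the map
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = removeThroughKeySet();
        System.out.println(map);    // {B=2}
        try {
            map.keySet().add("C");  // the view cannot add: there would be no value for "C"
        } catch (UnsupportedOperationException e) {
            System.out.println("keySet() does not support add");
        }
    }
}
```

Note that the view supports removal but not add or addAll.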

HashMap's entrySet()

public static void main(String[] args) {
    Map<String,String> hashMap = new HashMap<>();
    hashMap.put("A","1");
    hashMap.put("B","2");
    // Iterate over the key-value pairs
    for (Map.Entry<String, String> entry : hashMap.entrySet()) {
        // Print the key and the value
        System.out.println("key: " + entry.getKey() + " value:" + entry.getValue());
    }
}

entrySet works the same way as keySet: changes to the map are reflected in it. The difference is that keySet returns only the keys, while entrySet returns both keys and values.
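Because the entries are also a view of the map, calling setValue on an entry updates the map itself. A small sketch, assuming a hypothetical helper doubleAll:

```java
import java.util.HashMap;
import java.util.Map;

class EntrySetViewDemo {
    // Doubles every value in place by writing through the entry view.
    static Map<String, Integer> doubleAll(Map<String, Integer> map) {
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            entry.setValue(entry.getValue() * 2); // writes through to the backing map
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> prices = new HashMap<>();
        prices.put("A", 1);
        prices.put("B", 2);
        System.out.println(doubleAll(prices)); // {A=2, B=4}
    }
}
```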

HashMap common interview questions

  1. Expansion of HashMap

HashMap expands much like ArrayList: when the number of entries exceeds capacity × load factor (0.75 by default), it creates a table twice as large and redistributes the existing entries into it, so that there is room for new ones.

  2. Thread safety of HashMap

HashMap is not thread-safe. In a multi-threaded environment, simultaneous resizes could link a bucket's list into a cycle and cause an infinite loop (a well-known problem in JDK 7 and earlier), so ConcurrentHashMap should be used in a concurrent environment.

  3. HashMap changes in Java 8 (after 1.7)

Since Java 8, a hash bucket that grows too long is converted from a linked list to a red-black tree.

The reason: suppose a HashMap has 100,000 buckets and I store 1 million entries, but all 1 million keys return the same hashCode, so they all land in one bucket.

At that point the advantage of hash buckets is gone. Up to Java 7, the entries in one bucket were stored in a linked list, so finding one object among 1 million was very slow.

So from Java 8 on, the JDK replaces the linked list of an overlong bucket with a red-black tree.

Set ordering

First, let's write a piece of code, then analyze and summarize its output:

public static void main(String[] args) {
    List<Integer> list = Arrays.asList(10000, 196, -2, -334422, 23332, 16);
    Set<Integer> set1 = new HashSet<>();
    Set<Integer> set2 = new LinkedHashSet<>();
    Set<Integer> set3 = new TreeSet<>();
    for (Integer i : list) {
        set1.add(i);
        set2.add(i);
        set3.add(i);
    }
    set1.forEach(System.out::println);
    System.out.println("-------------");
    set2.forEach(System.out::println);
    System.out.println("-------------");
    set3.forEach(System.out::println);
}

HashSet iteration order looks arbitrary

The output of HashSet:

10000
-334422
16
-2
196
23332

The HashSet output has no apparent order: it follows the elements' hash buckets rather than the insertion order or the numeric order, so it should not be relied on.

LinkedHashSet keeps insertion order

The output of LinkedHashSet:

10000
196
-2
-334422
23332
16

You can see that this is the same as the order we added in the code.

TreeSet keeps elements sorted

The output of TreeSet:

-334422
-2
16
196
10000
23332

The TreeSet output is sorted in ascending order. TreeMap behaves the same way, except that it stores key-value pairs and sorts them by key.

Introduction to red-black trees (a kind of binary search tree)
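The sort order does not have to be ascending. As a sketch, a Comparator can be passed to the TreeSet constructor, and TreeMap sorts its keys in the same manner; the class and method names here (SortedCollectionsDemo, sortedDescending) are illustrative only.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.TreeMap;
import java.util.TreeSet;

class SortedCollectionsDemo {
    // Builds a TreeSet that keeps its elements in descending order.
    static TreeSet<Integer> sortedDescending(Collection<Integer> values) {
        TreeSet<Integer> set = new TreeSet<>(Comparator.<Integer>reverseOrder());
        set.addAll(values);
        return set;
    }

    public static void main(String[] args) {
        TreeSet<Integer> descending =
                sortedDescending(Arrays.asList(10000, 196, -2, -334422, 23332, 16));
        System.out.println(descending);         // [23332, 10000, 196, 16, -2, -334422]
        System.out.println(descending.first()); // 23332

        // TreeMap sorts its entries by key.
        TreeMap<String, Integer> byName = new TreeMap<>();
        byName.put("pear", 3);
        byName.put("apple", 1);
        byName.put("fig", 2);
        System.out.println(byName);             // {apple=1, fig=2, pear=3}
    }
}
```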

First look at the storage structure of ArrayList:

ArrayList is essentially one large array. To find an element among millions of entries, we have to loop through them and compare one by one.

In algorithm terms, this search has linear complexity, O(n): with n elements, the worst case takes n comparisons, so the cost grows in proportion to the size.

Now look at the storage structure of a tree:

In a binary search tree, everything to the left of a node is smaller than that node, and everything to the right is larger.

Take the six elements 1, 2, 3, 4, 5, 6. Scanning an array from the start, finding 6 takes 6 comparisons.

With a tree whose root is 3, the first comparison is with the root: 3 is smaller than 6, so the search continues to the right of the root.

Next comes 5, which is still smaller than 6, so the search continues to the right again and finally finds 6. That is 3 comparisons instead of 6, which shows the difference in efficiency.

So with a balanced tree, the search complexity drops from linear time, O(n), to logarithmic time, O(log n).

If an element with a value of 0 is inserted, the tree keeps the rule of smaller on the left and larger on the right: 0 is smaller than every existing element, so it ends up at the far left of the tree.
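The walk described above can be reproduced with a plain binary search tree. This is only a sketch under simplifying assumptions: it does no rebalancing (a real red-black tree recolors and rotates nodes to stay balanced), and the class BstSketch and method stepsToFind are made-up names.

```java
class BstSketch {
    private static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root;

    // Insert keeping "smaller on the left, larger on the right"; duplicates are ignored.
    void insert(int value) {
        root = insert(root, value);
    }

    private Node insert(Node node, int value) {
        if (node == null) return new Node(value);
        if (value < node.value) node.left = insert(node.left, value);
        else if (value > node.value) node.right = insert(node.right, value);
        return node;
    }

    // Returns how many nodes were visited to find value, or -1 if it is absent.
    int stepsToFind(int value) {
        int steps = 0;
        Node node = root;
        while (node != null) {
            steps++;
            if (value == node.value) return steps;
            node = value < node.value ? node.left : node.right;
        }
        return -1;
    }

    public static void main(String[] args) {
        BstSketch tree = new BstSketch();
        for (int v : new int[] {3, 1, 5, 2, 4, 6}) {
            tree.insert(v);                      // 3 becomes the root
        }
        System.out.println(tree.stepsToFind(6)); // 3 (visits 3, 5, 6)
        tree.insert(0);                          // ends up as the left child of 1
        System.out.println(tree.stepsToFind(0)); // 3 (visits 3, 1, 0)
    }
}
```

The insertion order 3, 1, 5, 2, 4, 6 is chosen so that the tree has 3 at the root, matching the example in the text.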

Collection utility methods

A small tip first: if you need utility methods for collections in Java, look for the Collections class.

By convention, a utility class is named by adding an s to the name of the type it serves, for example Collections and Arrays in the JDK, or Sets, Lists, and Maps in Guava.

  1. Collections.emptySet(): returns an empty, immutable set

  2. Collections.synchronizedCollection(): wraps a collection in a thread-safe view

  3. Collections.unmodifiableCollection(): wraps a collection in a read-only view (Guava's Immutable collections can also be used)
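The three utilities listed above can be exercised as follows. This is a sketch; the class name CollectionsUtilDemo and the helper rejectsWrites are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class CollectionsUtilDemo {
    // Returns true if writing to the collection is rejected.
    static boolean rejectsWrites(Collection<String> c) {
        try {
            c.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> empty = Collections.emptySet();
        System.out.println(empty.isEmpty() + " " + rejectsWrites(empty)); // true true

        List<String> source = new ArrayList<>();
        source.add("a");
        Collection<String> readOnly = Collections.unmodifiableCollection(source);
        System.out.println(rejectsWrites(readOnly)); // true
        source.add("b");                             // the view still reflects the source
        System.out.println(readOnly.size());         // 2

        Collection<String> threadSafe =
                Collections.synchronizedCollection(new ArrayList<String>());
        threadSafe.add("c");                         // each call is synchronized on the wrapper
        System.out.println(threadSafe);              // [c]
    }
}
```

Note that unmodifiableCollection returns a view: it blocks writes through the view, but later changes to the underlying collection remain visible.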

Other implementations of Collection

  1. Queue / Deque

Queue

A collection designed for holding elements prior to processing.
  
Besides basic Collection operations, queues provide additional insertion, extraction, and inspection operations.

A Queue is a collection designed to hold elements before they are processed.

In addition to the basic Collection operations, it provides extra insertion, extraction, and inspection operations.

A queue is a common data structure. For example, when people line up to buy tickets, the first person in line is the first to get a ticket.

A queue typically works in this mode, first in, first out (FIFO): the element that enters first is processed first, and later elements wait their turn.
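The ticket line can be sketched with ArrayDeque used through the Queue interface; the class name TicketQueueDemo and the helper serveAll are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class TicketQueueDemo {
    // Serves everyone in the line, in order, and returns the order served.
    static String serveAll(Queue<String> line) {
        StringBuilder served = new StringBuilder();
        while (!line.isEmpty()) {
            served.append(line.poll()).append(' '); // poll() removes the head of the queue
        }
        return served.toString().trim();
    }

    public static void main(String[] args) {
        Queue<String> line = new ArrayDeque<>();
        line.offer("first");   // offer() adds at the tail
        line.offer("second");
        line.offer("third");
        System.out.println(line.peek());    // first (peek() looks without removing)
        System.out.println(serveAll(line)); // first second third
    }
}
```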

Deque (double-ended queue)

A linear collection that supports element insertion and removal at both ends.

The name deque is short for "double ended queue" and is usually pronounced "deck".

Deque is a linear collection that supports insertion and removal at both ends, head and tail.

Its name is short for "double-ended queue".
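A brief sketch of working at both ends, using ArrayDeque; the class name DequeEndsDemo is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeEndsDemo {
    static Deque<Integer> build() {
        Deque<Integer> deque = new ArrayDeque<>();
        deque.addFirst(2);  // [2]
        deque.addFirst(1);  // [1, 2]
        deque.addLast(3);   // [1, 2, 3]
        return deque;
    }

    public static void main(String[] args) {
        Deque<Integer> deque = build();
        System.out.println(deque.pollFirst()); // 1
        System.out.println(deque.pollLast());  // 3
        System.out.println(deque);             // [2]
    }
}
```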

  2. Vector/Stack (no longer recommended by the JDK)

Vector

 If a thread-safe implementation is not needed, it is recommended to use ArrayList in place of Vector.

Vector is the predecessor of ArrayList. The JDK documentation now recommends ArrayList instead when thread safety is not needed.

Stack

A more complete and consistent set of LIFO stack operations is
provided by the Deque interface and its implementations,
which should be used in preference to this class.  For example:

Deque<Integer> stack = new ArrayDeque<Integer>();

Stack is a last-in, first-out (LIFO) structure. The class still exists, but the JDK documentation recommends using Deque instead.
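Used as a stack, a Deque offers push, pop, and peek. The following sketch reverses a string with it; DequeStackDemo and reverse are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeStackDemo {
    // Reverses a string by pushing every character and popping them back.
    static String reverse(String text) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            stack.push(c);              // push() adds at the head
        }
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop());    // pop() removes from the head: last in, first out
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc")); // cba
    }
}
```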

  3. LinkedList

LinkedList is a linked-list implementation of List. The difference from ArrayList is that ArrayList keeps its elements in an array.

In a linked list, each element points to the next one: every node stores a reference to its successor, which is why it is called a linked list. (java.util.LinkedList is doubly linked, so each node also points to its predecessor.)
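The linking idea can be sketched with a hand-written, singly linked list. This is not the java.util.LinkedList source; LinkedListSketch and its Node class are assumptions made to show how one element points to the next.

```java
class LinkedListSketch {
    private static final class Node {
        final String value;
        Node next;                     // reference to the following node, or null at the tail
        Node(String value) { this.value = value; }
    }

    private Node head, tail;

    void add(String value) {
        Node node = new Node(value);
        if (head == null) {
            head = node;               // first element: it is both head and tail
        } else {
            tail.next = node;          // link the old tail to the new node
        }
        tail = node;
    }

    // Reaching index i means following i links from the head, so this is O(n).
    String get(int index) {
        Node node = head;
        for (int i = 0; i < index && node != null; i++) {
            node = node.next;
        }
        if (index < 0 || node == null) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return node.value;
    }

    public static void main(String[] args) {
        LinkedListSketch list = new LinkedListSketch();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(list.get(2)); // c
    }
}
```

Appending only touches the tail node, whereas reading by index has to walk the links, which is the opposite trade-off from an array.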

  4. ConcurrentHashMap

ConcurrentHashMap is a thread-safe Map implementation that can be used in place of HashMap in concurrent code.
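As a sketch of concurrent use, several threads below increment the same key with merge, which ConcurrentHashMap performs atomically. The class ConcurrentCountDemo, the method countWith, and the key "hits" are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Starts `threads` threads that each increment the same key `perThread` times.
    static int countWith(int threads, int perThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();                             // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 10_000)); // 40000
    }
}
```

With a plain HashMap in place of ConcurrentHashMap, the same code could lose updates, because merge on HashMap is not atomic.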

  5. PriorityQueue

An unbounded priority queue based on a priority heap.

PriorityQueue is an unbounded priority queue implemented with a heap.

This queue works like the alarm list on a mobile phone. Say it is 7 o'clock in the morning, the 9 o'clock alarm is the last entry in the list, and all the other alarms are set for the afternoon.

When 9 o'clock comes, the 9 o'clock alarm still rings first, because it has the highest priority at that moment, regardless of the order in which the alarms were stored.
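The alarm analogy can be sketched with PriorityQueue. Representing alarms as minutes since midnight is an assumption made for this example, as are the names AlarmQueueDemo and nextAlarm.

```java
import java.util.PriorityQueue;

class AlarmQueueDemo {
    // Alarms are minutes since midnight; the smallest value has the highest priority.
    // Assumes at least one alarm is passed in.
    static int nextAlarm(int... alarmsInMinutes) {
        PriorityQueue<Integer> alarms = new PriorityQueue<>();
        for (int alarm : alarmsInMinutes) {
            alarms.add(alarm);
        }
        return alarms.poll(); // removes and returns the earliest alarm
    }

    public static void main(String[] args) {
        // Added in the order 15:00, 18:30, 9:00; the 9:00 alarm still comes out first.
        System.out.println(nextAlarm(15 * 60, 18 * 60 + 30, 9 * 60)); // 540
    }
}
```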

Guava

Guava is a set of core Java libraries developed by Google that fills many gaps in, and extends, the native Java collections.

Interested readers can find it on GitHub: https://github.com/google/guava

First, add the dependency to your Maven project:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>29.0-jre</version>
</dependency>

Multiset

public static void main(String[] args) {
    Multiset<Integer> multiset = HashMultiset.create();
    multiset.add(1);
    multiset.add(2);
    multiset.add(3);
    multiset.add(3);
    System.out.println(multiset);
}

With a HashSet, the output would be [1, 2, 3], but the Multiset prints [1, 2, 3 x 2].

That is because a Multiset counts how many times each equal element has been added. It is Guava's extension of the Set idea.

Multimap

public static void main(String[] args) {
    Multimap<Integer, Integer> multimap = HashMultimap.create();
    multimap.put(1,1);
    multimap.put(1,2);
    multimap.put(1,3);
    System.out.println(multimap);
}

A traditional Map stores only one value per key, whereas a Multimap can associate multiple values with each key.

BiMap

public static void main(String[] args) {
    BiMap<Integer, String> biMap = HashBiMap.create();
    biMap.put(1, "I am 1");
    System.out.println(biMap.inverse().get("I am 1"));
}

A traditional Map can only look up a value by key, whereas a BiMap can also look up a key by value.

These are a few typical Guava examples; readers can explore the rest on their own.

Summary

Data structures and algorithms are everywhere, and both Java and its third-party libraries ship very good implementations of them. This article is only an introduction. I hope readers will look into more of the data structure implementations in the Java ecosystem, think about the principles behind them, and strengthen their own knowledge in this area.

Origin blog.csdn.net/cainiao1412/article/details/108552575