Understanding the transient keyword in the Java Collections Framework


When reading the source code of HashMap and ArrayList, we find that the arrays that actually store the data are both declared with the transient keyword.

In HashMap:

````
 transient Node<K,V>[] table;
````


Inside the ArrayList:
````
 transient Object[] elementData;
````


Because these arrays are declared transient, they are excluded from default serialization. At the same time, both collections define their own serialization methods, which we will walk through below.
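As a quick illustration of what transient means for default serialization, here is a minimal, self-contained sketch (the User class is hypothetical, not JDK code):

````
import java.io.*;

public class TransientDemo {
    static class User implements Serializable {
        String name = "alice";
        transient String password = "secret"; // skipped by default serialization
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bos);
        out.writeObject(new User());
        out.flush();

        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
        User copy = (User) in.readObject();
        System.out.println(copy.name);     // alice
        System.out.println(copy.password); // null: the transient field was not written
    }
}
````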

First, look at HashMap's custom serialization code:
````
//1
    private void writeObject(java.io.ObjectOutputStream s)
        throws IOException {
        int buckets = capacity();
        // Write out the threshold, loadfactor, and any hidden stuff
        s.defaultWriteObject();
        s.writeInt(buckets);
        s.writeInt(size);
        internalWriteEntries(s);
    }
//2    
    void internalWriteEntries(java.io.ObjectOutputStream s) throws IOException {
        Node<K,V>[] tab;
        if (size > 0 && (tab = table) != null) {
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                    s.writeObject(e.key);
                    s.writeObject(e.value);
                }
            }
        }
    }
    
````
Now look at HashMap's custom deserialization code:
````
//1
   private void readObject(java.io.ObjectInputStream s)
        throws IOException, ClassNotFoundException {
        // Read in the threshold (ignored), loadfactor, and any hidden stuff
        s.defaultReadObject();
        reinitialize();
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new InvalidObjectException("Illegal load factor: " +
                                             loadFactor);
        s.readInt();                // Read and ignore number of buckets
        int mappings = s.readInt(); // Read number of mappings (size)
        if (mappings < 0)
            throw new InvalidObjectException("Illegal mappings count: " +
                                             mappings);
        else if (mappings > 0) { // (if zero, use defaults)
            // Size the table using given load factor only if within
            // range of 0.25...4.0
            float lf = Math.min(Math.max(0.25f, loadFactor), 4.0f);
            float fc = (float)mappings / lf + 1.0f;
            int cap = ((fc < DEFAULT_INITIAL_CAPACITY) ?
                       DEFAULT_INITIAL_CAPACITY :
                       (fc >= MAXIMUM_CAPACITY) ?
                       MAXIMUM_CAPACITY :
                       tableSizeFor((int)fc));
            float ft = (float)cap * lf;
            threshold = ((cap < MAXIMUM_CAPACITY && ft < MAXIMUM_CAPACITY) ?
                         (int)ft : Integer.MAX_VALUE);
            @SuppressWarnings({"rawtypes","unchecked"})
                Node<K,V>[] tab = (Node<K,V>[])new Node[cap];
            table = tab;

            // Read the keys and values, and put the mappings in the HashMap
            for (int i = 0; i < mappings; i++) {
                @SuppressWarnings("unchecked")
                    K key = (K) s.readObject();
                @SuppressWarnings("unchecked")
                    V value = (V) s.readObject();
                putVal(hash(key), key, value, false, false);
            }
        }
    }
````


Here we can see that HashMap customizes both serialization and deserialization in its source code. The serialization method writes the current HashMap's bucket count, its size, and each key/value pair to the object output stream one by one; deserialization then parses them back out of the stream one by one and rebuilds the entire HashMap data structure.
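To see the round trip end to end, here is a minimal sketch (the class and variable names are arbitrary):

````
import java.io.*;
import java.util.HashMap;

public class HashMapRoundTrip {
    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> original = new HashMap<>();
        original.put("a", 1);
        original.put("b", 2);

        // Triggers HashMap.writeObject: buckets, size, then each key/value pair
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bos);
        out.writeObject(original);
        out.flush();

        // Triggers HashMap.readObject: the table is rebuilt by re-hashing every key
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
        @SuppressWarnings("unchecked")
        HashMap<String, Integer> copy = (HashMap<String, Integer>) in.readObject();
        System.out.println(copy); // {a=1, b=2}
    }
}
````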


Next, look at ArrayList's custom serialization implementation:

````
    private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException{
        // Write out element count, and any hidden stuff
        int expectedModCount = modCount;
        s.defaultWriteObject();

        // Write out size as capacity for behavioural compatibility with clone()
        s.writeInt(size);

        // Write out all elements in the proper order.
        for (int i=0; i<size; i++) {
            s.writeObject(elementData[i]);
        }

        if (modCount != expectedModCount) {
            throw new ConcurrentModificationException();
        }
    }
    
````
Then the corresponding deserialization:

````
    private void readObject(java.io.ObjectInputStream s)
        throws java.io.IOException, ClassNotFoundException {
        elementData = EMPTY_ELEMENTDATA;

        // Read in size, and any hidden stuff
        s.defaultReadObject();

        // Read in capacity
        s.readInt(); // ignored

        if (size > 0) {
            // be like clone(), allocate array based upon size not capacity
            ensureCapacityInternal(size);

            Object[] a = elementData;
            // Read in all elements in the proper order.
            for (int i=0; i<size; i++) {
                a[i] = s.readObject();
            }
        }
    }
````


ArrayList likewise writes its size and only the elements actually stored (the first size slots of elementData) to the stream, then reads them back to rebuild the list during deserialization.
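One consequence is that a list's spare capacity does not bloat its serialized form. A small sketch to demonstrate (the helper method and the capacity figure are made up for illustration):

````
import java.io.*;
import java.util.ArrayList;

public class ArrayListSpareCapacity {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bos);
        out.writeObject(o);
        out.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        ArrayList<Integer> small = new ArrayList<>();          // default capacity
        ArrayList<Integer> roomy = new ArrayList<>(1_000_000); // huge spare capacity
        small.add(1); small.add(2);
        roomy.add(1); roomy.add(2);

        // Only the size elements are written, never the unused tail of elementData,
        // so both lists serialize to the same number of bytes.
        System.out.println(serialize(small).length == serialize(roomy).length); // true
    }
}
````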



So the question is: since both classes implement the Serializable interface and therefore already get serialization for free, why do they re-implement the serialization and deserialization methods themselves?



(1) Why HashMap customizes serialization and deserialization:

An important reason HashMap defines its own serialization and deserialization is that the hashCode method is declared with the native modifier, which means its result is tied to the JVM it runs on. The declaration of hashCode in the Object class is as follows:
````
 public native int hashCode();

````


In other words, different JVMs may produce different hashCode values for the same key (for example, a key whose class does not override hashCode and therefore inherits Object's identity-based implementation), so the in-memory layout of the data may differ between them. Suppose there are two JVMs, A and B, and the same key x gets a different hashCode on each.

This leads to the following:

On A's JVM, the position of x in the table array, computed from its hashCode, is 3.

On B's JVM, the position computed from its hashCode is 5.

Now, if we used the default serialization mechanism on A's JVM, the slot position 3 would be baked into the byte stream along with the table array, and after B's JVM deserialized it, the entry would still sit at index 3 of the table. But when we fetch the data on B's JVM, the key's hashCode differs from what it was on A, so the lookup probes index 5 and the data is never found.
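To make the position computation concrete, here is a minimal sketch mirroring HashMap's bucket selection, (capacity - 1) & hash (the hash values 35 and 53 are made up for illustration):

````
public class BucketIndexDemo {
    public static void main(String[] args) {
        int capacity = 16; // table.length is always a power of two

        // HashMap derives the bucket index from the key's hash. If the same key
        // hashed to 35 on JVM A but 53 on JVM B, it would occupy bucket 3 on A
        // and bucket 5 on B:
        System.out.println((capacity - 1) & 35); // 3
        System.out.println((capacity - 1) & 53); // 5
    }
}
````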



How do we solve this problem? The root cause is that the in-memory layout can differ because the hashCode differs, so as long as everything derived from the hashCode (such as the slot position above) is excluded from serialization, the problem goes away.

The simplest fix is exactly what the code above does: on A's JVM, serialize only the key/value data into the byte stream instead of the table array wholesale, and on B's JVM, rebuild the table's in-memory layout from that data during deserialization (readObject re-hashes every key via putVal). This solves the problem perfectly.



(2) Why ArrayList customizes serialization and deserialization:

In ArrayList, the backing array keeps growing dynamically as elements are inserted, and each expansion increases the length of the old array by half. In the extreme case almost half of the slots are therefore null, so this part of the data can be excluded during serialization, saving both time and space:
````
     for (int i=0; i<size; i++) {
            s.writeObject(elementData[i]);
        }
````

Note that when serializing, ArrayList iterates over the backing array using size, not elementData.length: elementData.length is the full length of the array, while size is the number of elements actually stored. This is precisely what the custom serialization method is for.
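For reference, the growth step mentioned above (each expansion adds half the old capacity) is implemented like this in JDK 8's ArrayList.grow:

````
    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
````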


A careful reader may now wonder: HashMap also uses a dynamically growing array, so why does it serialize table.length (the bucket count) rather than just size? This is actually easy to answer. In HashMap, table.length must be a power of two, and that value determines several other parameters (the threshold, and each key's bucket index). If the null slots were stripped out, table.length would have to be re-derived, which could force a redistribution of all the data, so the best approach is to keep it as it is.

Note that the null values above refer to table slots whose Node is null, not to HashMap keys equal to null; the key is a field inside Node.
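To make the dependence on table.length concrete, here is a tiny sketch (the hash value 11 is made up): with a different capacity, the same hash maps to a different bucket, which is why shrinking the table would force redistribution.

````
public class CapacityMattersDemo {
    public static void main(String[] args) {
        int hash = 11;
        // The bucket index is (table.length - 1) & hash, so it changes
        // whenever the capacity changes:
        System.out.println((16 - 1) & hash); // 11 with capacity 16
        System.out.println(( 8 - 1) & hash); //  3 with capacity 8
    }
}
````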



Summary:


This article explained why the core data-structure fields in HashMap and ArrayList are declared transient, and the reasons behind each choice. When you use serialization, keep in mind the advice from Effective Java: when the physical representation of an object differs substantially from its logical content, the default serialized form has a number of flaws, so whenever the situation calls for it you should write your own serialization methods.

If you have any questions, you can scan the QR code and follow the WeChat public account "I am the siege division" (woshigcs) and leave a message for consultation. Neither technical debt nor health debt should be left unpaid; on the road of seeking the Way, I walk with you.





