How Serializable Works: A Source-Code Analysis

Foreword

Normally, to use Java serialization and deserialization, we simply have a class implement the Serializable interface and leave the rest to the JDK. Today we will explore how Java serialization is actually implemented, and then look at how a few common collection classes deal with the problems that serialization brings.

Analysis process

Questions to think about

  1. Why is implementing the Serializable interface all it takes to make an object serializable?
  2. When we make a class serializable, why is the recommended practice to declare a static final member variable named serialVersionUID?
  3. How does the serialization mechanism skip fields marked with the transient keyword, and why are static variables not serialized?

With these questions in mind, let's look for the answers in the source code.

Serializable

Look at the Serializable interface. The source code is very simple: an empty interface with no methods and no member variables. Its comments, however, are very detailed and clearly describe how Serializable is used and what it does, so they are worth reading. Here are a few key excerpts, each followed by an explanation.

/**
 * Serializability of a class is enabled by the class implementing the
 * java.io.Serializable interface. Classes that do not implement this
 * interface will not have any of their state serialized or
 * deserialized.  All subtypes of a serializable class are themselves
 * serializable.  The serialization interface has no methods or fields
 * and serves only to identify the semantics of being serializable. 
 */

A class is made serializable by implementing the java.io.Serializable interface. A class that does not implement this interface cannot be serialized, and every subclass of a serializable class is itself serializable. The Serializable interface has no methods or fields; it only marks a class as serializable.
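
A minimal sketch of this marker behavior, assuming three hypothetical classes (Plain, Base, Child) that are not part of the JDK or of the original article: writing an object whose class does not implement Serializable fails with NotSerializableException, while a subclass of a serializable class can be written without declaring the interface itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes used only for illustration.
class Plain { int value = 1; }                       // does not implement Serializable
class Base implements Serializable { int id = 7; }   // opts in via the marker interface
class Child extends Base { String name = "child"; }  // serializable through Base

class MarkerDemo {
    // Returns true if obj can be written to an ObjectOutputStream.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Plain())); // false
        System.out.println(canSerialize(new Base()));  // true
        System.out.println(canSerialize(new Child())); // true
    }
}
```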

/**
 * Classes that require special handling during the serialization and
 * deserialization process must implement special methods with these exact
 * signatures:
 *
 * <PRE>
 * private void writeObject(java.io.ObjectOutputStream out)
 *     throws IOException
 * private void readObject(java.io.ObjectInputStream in)
 *     throws IOException, ClassNotFoundException;
 * private void readObjectNoData()
 *     throws ObjectStreamException;
 * </PRE>
 */

If a class needs special handling during serialization, it can implement the methods writeObject(), readObject(), and readObjectNoData(). The same Javadoc also describes writeReplace() and readResolve(). Of these:

  • writeObject() is responsible for writing the state of an object of its particular class, so that the corresponding readObject() method can restore it.
  • readObject() is responsible for reading from the stream and restoring the class's fields.
  • writeReplace() lets an object nominate another object to be written to the stream in its place. This is one option when the default serialized form is not what we want, for example when a superclass does not support serialization and we do not want its fields to come back with default values.
  • readResolve() is typically used with the singleton pattern: when an object is read from the stream, it can be replaced with another object.
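
The hooks above can be sketched as follows. Account and Registry are hypothetical classes invented for this illustration. Account keeps its password out of the default field data by marking it transient and writes a reversed copy in writeObject() (a stand-in for real encryption, not a security measure). Registry uses readResolve() so that deserialization returns the existing singleton.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class: customizes its serialized form with writeObject/readObject.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;
    String user;
    transient String password; // skipped by default serialization

    Account(String user, String password) {
        this.user = user;
        this.password = password;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // writes the non-transient fields (user)
        // Placeholder transformation; a real class would encrypt here.
        out.writeUTF(new StringBuilder(password).reverse().toString());
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // restores user
        password = new StringBuilder(in.readUTF()).reverse().toString();
    }
}

// Hypothetical singleton: readResolve swaps the freshly read copy for INSTANCE.
class Registry implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Registry INSTANCE = new Registry();
    private Registry() { }
    private Object readResolve() { return INSTANCE; }
}

class HookDemo {
    // Serializes obj to a byte array and reads it back.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = (Account) roundTrip(new Account("alice", "s3cret"));
        System.out.println(copy.user + " " + copy.password);                   // alice s3cret
        System.out.println(roundTrip(Registry.INSTANCE) == Registry.INSTANCE); // true
    }
}
```

Without readResolve(), the second check would print false, because deserialization would create a second Registry instance.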

ObjectOutputStream

    // Serializing an object normally starts in this method
    public final void writeObject(Object obj) throws IOException {
        ...
        try {
            // The method that does the actual writing
            writeObject0(obj, false);
        } catch (IOException ex) {
            ...
            throw ex;
        }
    }
    
    private void writeObject0(Object obj, boolean unshared) throws IOException {
        ... (omitted)

        Object orig = obj;
        Class<?> cl = obj.getClass();
        ObjectStreamClass desc;
        for (;;) {
            // REMIND: skip this check for strings/arrays?
            Class<?> repCl;
            // Look up the ObjectStreamClass, which is a key class.
            // Its constructor calls the method that collects the class's fields,
            // which ends up in getDefaultSerialFields.
            // There, a modifier mask filters out fields that are transient or static (answers question 3)
            desc = ObjectStreamClass.lookup(cl, true);
            if (!desc.hasWriteReplaceMethod() ||
                (obj = desc.invokeWriteReplace(obj)) == null ||
                (repCl = obj.getClass()) == cl)
            {
                break;
            }
            cl = repCl;
        }
            
        // The main write logic follows.
        // String, array, and Enum have their own dedicated serialization paths
        if (obj instanceof String) {
            writeString((String) obj, unshared);
        } else if (cl.isArray()) {
            writeArray(obj, desc, unshared);
        } else if (obj instanceof Enum) {
            writeEnum((Enum<?>) obj, desc, unshared);
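            // This is the key point: `instanceof` checks whether the object is `Serializable`.
            // This is why an ordinary user-defined class that does not implement `Serializable`
            // throws an exception when serialized (answers question 1)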
        } else if (obj instanceof Serializable) {
            writeOrdinaryObject(obj, desc, unshared);
        } else {
            if (extendedDebugInfo) {
                throw new NotSerializableException(
                    cl.getName() + "\n" + debugInfoStack.toString());
            } else {
                throw new NotSerializableException(cl.getName());
            }
        }
        ...
    }

    private void writeOrdinaryObject(Object obj,
                                     ObjectStreamClass desc,
                                     boolean unshared)
        throws IOException
    {
        ...
        try {
            desc.checkSerialize();
            
            // Write the marker byte that starts an ordinary object, 0x73
            bout.writeByte(TC_OBJECT);
            // Write the descriptor of the object's class, see the source below
            writeClassDesc(desc, false);
            
            handles.assign(unshared ? null : obj);
            if (desc.isExternalizable() && !desc.isProxy()) {
                writeExternalData((Externalizable) obj);
            } else {
                writeSerialData(obj, desc);
            }
        } finally {
            if (extendedDebugInfo) {
                debugInfoStack.pop();
            }
        }
    }
    
    private void writeClassDesc(ObjectStreamClass desc, boolean unshared)
        throws IOException
    {
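        // Handle
        int handle;
        // Null descriptor
        if (desc == null) {
            writeNull();
            // Handle referencing a class descriptor.
            // If the stream already holds a handle for it, reuse it to make serialization more efficient
        } else if (!unshared && (handle = handles.lookup(desc)) != -1) {
            writeHandle(handle);
            // Dynamic proxy class descriptor
        } else if (desc.isProxy()) {
            writeProxyDesc(desc, unshared);
            // Ordinary class descriptor
        } else {
            // This method calls desc.writeNonProxy(this), shown below
            writeNonProxyDesc(desc, unshared);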
        }
    }
    
    // The next two methods are defined in ObjectStreamClass
    void writeNonProxy(ObjectOutputStream out) throws IOException {
        out.writeUTF(name);
        // Write the serialVersionUID
        out.writeLong(getSerialVersionUID());
        ...
    }
    
    public long getSerialVersionUID() {
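        // If serialVersionUID is not declared,
        // the serialization mechanism computes a hash from the class's structure (its fields and so on).
        // This is why leaving serialVersionUID undeclared is discouraged:
        // the hash changes whenever the class changes.
        // If you add a field, binaries serialized earlier can no longer be deserialized, and Java throws an exception
        // (answers question 2)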
        if (suid == null) {
            suid = AccessController.doPrivileged(
                new PrivilegedAction<Long>() {
                    public Long run() {
                        return computeDefaultSUID(cl);
                    }
                }
            );
        }
        // serialVersionUID is already defined, so return it directly
        return suid.longValue();
    }

    // At this point, a short digression: my own reading of the serialized binary, see below
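
To see the value that getSerialVersionUID() returns, one can query ObjectStreamClass.lookup() directly. This is a small sketch with two hypothetical classes: WithUid declares serialVersionUID, while WithoutUid leaves it to be computed from the class structure. The computed value is printed without an expected result because it depends on the details of the class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical classes used only for illustration.
class WithUid implements Serializable {
    private static final long serialVersionUID = 42L;
    String name;
}

class WithoutUid implements Serializable {
    String name; // adding or removing a field changes the computed UID
}

class SuidDemo {
    // Returns the serialVersionUID the serialization runtime will write for cls.
    static long suidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(WithUid.class));    // 42, the declared value
        System.out.println(suidOf(WithoutUid.class)); // a hash computed from the class
    }
}
```

If a stream written by one version of WithoutUid is read by a later version whose computed value differs, deserialization fails with InvalidClassException. Declaring the field keeps the value stable across such changes.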

My reading of the serialized binary

Suppose we want to serialize a List<PhoneItem>, where PhoneItem is defined as follows:

class PhoneItem implements Serializable {
    String phoneNumber;
}

The code that builds the List is omitted. Assuming a List of size five, the serialized binary looks roughly like this:

7372 xxxx xxxx 
7371 xxxx xxxx 
7371 xxxx xxxx 
7371 xxxx xxxx 
7371 xxxx xxxx 

Reading this against the source code above: the leading byte 0x73 marks an ordinary object, 0x72 marks a class descriptor, and 0x71 marks a reference. Even from this small glimpse we can see that, when the binary is parsed, each item is recognized by the marker byte (magic number) at its start and converted back into a Java object accordingly. During serialization, if the class descriptor has already been written to the stream, later objects of the same class simply write a handle to it, so the descriptor becomes a reference. This improves serialization efficiency.
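
The marker bytes can be checked with a short sketch. It assumes an ArrayList of five PhoneItem objects; the list construction, the sample phone numbers, and the StreamBytesDemo helper are my own illustration, not the author's code. The stream begins with the header AC ED 00 05 (magic number and version), followed by 73 72 for the list object and its class descriptor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class PhoneItem implements Serializable {
    String phoneNumber;
}

class StreamBytesDemo {
    // Serializes a list of `count` PhoneItem objects and returns the raw bytes.
    static byte[] serializeItems(int count) {
        List<PhoneItem> items = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            PhoneItem item = new PhoneItem();
            item.phoneNumber = "555-000" + i; // made-up sample data
            items.add(item);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(items);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = serializeItems(5);
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 16 && i < data.length; i++) {
            hex.append(String.format("%02x ", data[i]));
        }
        System.out.println(hex.toString().trim()); // starts with: ac ed 00 05 73 72
        // The marker values are defined in ObjectStreamConstants.
        System.out.println(data[4] == ObjectStreamConstants.TC_OBJECT);    // true (0x73)
        System.out.println(data[5] == ObjectStreamConstants.TC_CLASSDESC); // true (0x72)
    }
}
```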

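    // writeSerialData leads to the method that actually walks the class's fields; the path differs depending on whether writeObject is overridden.
    // Taking the default case (no custom writeObject) as the example, defaultWriteFields is eventually called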
    private void defaultWriteFields(Object obj, ObjectStreamClass desc)
        throws IOException
    {
        ...
        int primDataSize = desc.getPrimDataSize();
        if (primVals == null || primVals.length < primDataSize) {
            primVals = new byte[primDataSize];
        }
        desc.getPrimFieldValues(obj, primVals);
        // Write the bytes of the primitive field values
        bout.write(primVals, 0, primDataSize, false);

        ObjectStreamField[] fields = desc.getFields(false);
        Object[] objVals = new Object[desc.getNumObjFields()];
        int numPrimFields = fields.length - objVals.length;
        desc.getObjFieldValues(obj, objVals);
        for (int i = 0; i < objVals.length; i++) {
            ...
            try {
                // Walk the object-typed fields and write each one recursively
                writeObject0(objVals[i],
                             fields[numPrimFields + i].isUnshared());
            } finally {
                if (extendedDebugInfo) {
                    debugInfoStack.pop();
                }
            }
        }
    }
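
Because defaultWriteFields only writes the fields recorded in the ObjectStreamClass, transient and static fields never reach the stream. A minimal sketch of the observable effect, assuming a hypothetical Profile class:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for illustration.
class Profile implements Serializable {
    private static final long serialVersionUID = 1L;
    static String region = "eu"; // class-level state, not part of any instance
    String name;                 // written by default serialization
    transient String token;      // excluded by the transient keyword

    Profile(String name, String token) {
        this.name = name;
        this.token = token;
    }
}

class FieldFilterDemo {
    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = toBytes(new Profile("alice", "secret-token"));
        Profile.region = "us"; // change the static field after serializing
        Profile copy = (Profile) fromBytes(data);
        System.out.println(copy.name);      // alice
        System.out.println(copy.token);     // null
        System.out.println(Profile.region); // us, the stream never carried "eu"
    }
}
```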

Deserialization mirrors the serialization process, so it is omitted here.

Serialization issues in common collection classes

HashMap

Java requires a deserialized object to be consistent with the object that was serialized. HashMap, however, places each key according to its hash code, and the hash code computed after deserialization may not match the original one (deserialization may run in a different JVM environment). HashMap therefore implements its own serialization logic to avoid this inconsistency.

Concretely, the fields that need special treatment are declared transient, and writeObject is implemented to handle them specially:

    private void writeObject(java.io.ObjectOutputStream s)
        throws IOException {
        int buckets = capacity();
        // Write out the threshold, loadfactor, and any hidden stuff
        s.defaultWriteObject();
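        // Write the capacity of the hash bucket array
        s.writeInt(buckets);
        // Write the number of key-value mappings
        s.writeInt(size);
        // Iterate and write every non-null key-value pair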
        internalWriteEntries(s);
    }
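
The effect can be observed with keys whose hash code is identity-based. In this sketch, Token is a hypothetical key class that does not override hashCode(), so each deserialized key gets whatever identity hash the reading JVM assigns. Because HashMap re-inserts every entry while reading and recomputes its bucket, lookups with the deserialized keys still succeed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class: hashCode() and equals() are intentionally inherited from Object.
class Token implements Serializable {
    private static final long serialVersionUID = 1L;
    final String label;
    Token(String label) { this.label = label; }
}

class HashMapDemo {
    // Serializes the map to memory and reads it back.
    @SuppressWarnings("unchecked")
    static HashMap<Token, Integer> roundTrip(HashMap<Token, Integer> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(map);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<Token, Integer>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if every key of the map can be found through a normal lookup.
    static boolean allKeysFound(Map<Token, Integer> map) {
        for (Token key : map.keySet()) {
            if (!map.containsKey(key)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HashMap<Token, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(new Token("t" + i), i);
        }
        HashMap<Token, Integer> copy = roundTrip(map);
        System.out.println(copy.size());        // 100
        System.out.println(allKeysFound(copy)); // true
    }
}
```

Had the bucket array been written as-is, the deserialized keys would sit in buckets chosen by their old hash codes and these lookups could miss.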

ArrayList

Because the capacity of ArrayList's backing array is usually larger than the actual number of elements, ArrayList implements its own writeObject and readObject so that the empty slots of the array are not serialized.

    private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException{
        ...
        s.defaultWriteObject();

        // Write the current size of the ArrayList
        s.writeInt(size);

        // Write the elements in the same order
        for (int i=0; i<size; i++) {
            s.writeObject(elementData[i]);
        }
        ...
    }

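
A rough way to see the benefit is to compare serialized sizes. In this sketch, NaiveList is a hypothetical holder that lets default serialization write its whole backing array, while ArrayList writes only its size elements. The exact byte counts depend on the JDK, so only the comparison is shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical class: serializes all 1000 slots because elementData is not transient.
class NaiveList implements Serializable {
    private static final long serialVersionUID = 1L;
    Object[] elementData = new Object[1000];
    int size;

    void add(Object element) {
        elementData[size++] = element;
    }
}

class ArrayListSizeDemo {
    // Returns the number of bytes produced by serializing obj.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<>(1000); // capacity 1000, but only 3 elements
        NaiveList naive = new NaiveList();
        for (String s : new String[] {"a", "b", "c"}) {
            list.add(s);
            naive.add(s);
        }
        System.out.println(serializedSize(list) < serializedSize(naive)); // true
    }
}
```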

Origin juejin.im/post/5e211c54f265da3dfa49b3fc