Android binary XML file format

Introduction

  Starting with Android 12, XML can be written using a custom binary protocol, and files written this way are called binary XML. In a benchmark against a typical packages.xml file, writing is 4.3 times faster and the output is 2.4 times smaller.
  This serialization has some limitations:
  1. Only UTF-8 encoding is supported.
  2. Variable-length values such as byte[] and String cannot exceed 65535 bytes, and String values are stored as UTF-8.
  3. Namespaces, prefixes, properties, and options are not supported.

Format

  An ordinary XML file is human-readable because it consists entirely of text characters. Binary XML is not. Its format is shown in the figure below.
Figure: binary XML data format

  The first three bytes are always "ABX", which stands for Android Binary XML. The fourth byte is the protocol version, currently 0; it may be incremented if the protocol changes in the future.
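A minimal Java sketch of checking this header before parsing, assuming the stream begins with the bytes 0x41 0x42 0x58 ('A', 'B', 'X') followed by the version byte. The class AbxHeader and its methods are hypothetical helpers, not part of the Android source.

```java
// Hypothetical helper for recognising the ABX header; not part of the Android source.
final class AbxHeader {
    // 'A', 'B', 'X': the fixed magic prefix
    static final byte[] MAGIC = {0x41, 0x42, 0x58};

    /** True if data starts with "ABX" and also contains the version byte. */
    static boolean hasMagic(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the protocol version stored in the fourth byte (currently 0). */
    static int version(byte[] data) {
        if (!hasMagic(data)) {
            throw new IllegalArgumentException("not a binary XML (ABX) stream");
        }
        return data[3] & 0xFF;
    }
}
```

A text XML file starting with "<?xml" fails the check at the first byte, so a reader can use this to decide which parser to use.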

  Next comes START_DOCUMENT, stored as a single byte whose value is START_DOCUMENT | TYPE_NULL, where START_DOCUMENT is 0 and TYPE_NULL is 1 << 4. START_DOCUMENT is called a Token here, and the tags that follow are Tokens as well. TYPE_NULL is a value type, indicating that this event carries no value. So the upper four bits of the byte hold the value type and the lower four bits hold the Token. Every event below starts the same way: one byte that packs its Token and value type.
  Each START_DOCUMENT is paired with an END_DOCUMENT, which is the last byte of the structure. It is also a single byte holding a value type and a Token.
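The byte packing can be sketched in Java as follows. The Token values 0 to 4 are those of org.xmlpull.v1.XmlPullParser; ATTRIBUTE = 15, TYPE_STRING = 2 << 4, TYPE_STRING_INTERNED = 3 << 4 and TYPE_INT = 6 << 4 are my reading of BinaryXmlSerializer and should be checked against your source tree. The class AbxToken is hypothetical.

```java
// Hypothetical helper showing how one byte carries a Token and a value type.
final class AbxToken {
    // Tokens (low four bits); 0..4 match org.xmlpull.v1.XmlPullParser
    static final int START_DOCUMENT = 0;
    static final int END_DOCUMENT = 1;
    static final int START_TAG = 2;
    static final int END_TAG = 3;
    static final int TEXT = 4;
    static final int ATTRIBUTE = 15; // assumed value

    // Value types (high four bits), kept in their shifted form
    static final int TYPE_NULL = 1 << 4;
    static final int TYPE_STRING = 2 << 4;          // assumed value
    static final int TYPE_STRING_INTERNED = 3 << 4; // assumed value
    static final int TYPE_INT = 6 << 4;             // assumed value

    /** Packs a Token (low nibble) and a value type (high nibble) into one byte. */
    static byte pack(int token, int type) {
        return (byte) (token | type);
    }

    /** Extracts the Token from the low four bits. */
    static int token(byte b) {
        return b & 0x0F;
    }

    /** Extracts the value type from the high four bits (still shifted, e.g. 0x10). */
    static int type(byte b) {
        return b & 0xF0;
    }
}
```

For example, START_DOCUMENT | TYPE_NULL is the byte 0x10 and END_DOCUMENT | TYPE_NULL is 0x11.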
  The structure right after START_DOCUMENT is START_TAG. Its Token is START_TAG and its value type is TYPE_STRING_INTERNED, which needs some explanation.

Handling of TYPE_STRING_INTERNED type values

  For a value of this type, if the same value has already been written, only its index (the order in which it was first written) is stored, as a 2-byte value.
  Otherwise, the 2-byte value 65535 (0xFFFF) is written first, then the value's UTF-8 length in bytes (2 bytes), and finally the UTF-8 bytes themselves. The figure below shows this first-write layout, using the tag name from the binary XML format diagram as the example.
Figure: STRING_INTERNED value layout

  After a value is written for the first time, it is recorded. On each later write, the serializer first checks whether the value has already been recorded; if it has, only the index at which it was recorded is written. Repeated names therefore cost just 2 bytes each, which saves a lot of storage space.
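The interning scheme can be sketched as follows, assuming big-endian 2-byte fields (as with java.io.DataOutput) and that strings stop being recorded once 65535 of them have been seen, since 0xFFFF is reserved as the first-write marker. InternedWriter is a hypothetical class, not the Android implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of TYPE_STRING_INTERNED encoding.
final class InternedWriter {
    private static final int MAX_UNSIGNED_SHORT = 0xFFFF; // also the "new string" marker
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final Map<String, Integer> seen = new HashMap<>(); // string -> first-seen index

    // Writes the low 16 bits of v, high byte first (big-endian).
    private void writeShort(int v) {
        out.write((v >>> 8) & 0xFF);
        out.write(v & 0xFF);
    }

    void writeInterned(String s) {
        Integer index = seen.get(s);
        if (index != null) {
            writeShort(index); // seen before: only the 2-byte index
            return;
        }
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > MAX_UNSIGNED_SHORT) {
            throw new IllegalArgumentException("UTF-8 length exceeds 65535 bytes");
        }
        // First occurrence: 0xFFFF marker, 2-byte UTF-8 length, then the UTF-8 bytes.
        writeShort(MAX_UNSIGNED_SHORT);
        writeShort(utf8.length);
        out.write(utf8, 0, utf8.length);
        // Record it; the index is the order of first appearance (0, 1, 2, ...).
        if (seen.size() < MAX_UNSIGNED_SHORT) {
            seen.put(s, seen.size());
        }
    }
}
```

Writing "pkg", "pkg", "perm" produces FF FF 00 03 'p' 'k' 'g', then only 00 00 for the repeat, then FF FF 00 04 'p' 'e' 'r' 'm': 17 bytes in total.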

  Having covered tags and their names, we now turn to tag attributes. Each attribute starts with the byte ATTRIBUTE | TYPE_**, where TYPE_** stands for the attribute's value type, because attribute values come in many types: TYPE_STRING, TYPE_BYTES_HEX, TYPE_BYTES_BASE64, TYPE_INT, TYPE_INT_HEX, TYPE_LONG, and others. The attribute name is written using the STRING_INTERNED scheme described above, while the value is written in a format that depends on its type. TYPE_STRING and TYPE_INT serve as examples here. First, the figure below shows how the content of a TYPE_STRING attribute from the binary XML format diagram is laid out in the file:
Figure: TYPE_STRING value layout
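The TYPE_STRING value itself can be sketched as a 2-byte UTF-8 length followed by the UTF-8 bytes; unlike STRING_INTERNED there is no index and no 0xFFFF marker. The attribute name, which sits between the ATTRIBUTE | TYPE_STRING byte and the value, is written with the interning scheme above and is left out here. StringValue is a hypothetical helper, and big-endian byte order is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes the value part of a TYPE_STRING attribute.
final class StringValue {
    static final int MAX_UNSIGNED_SHORT = 0xFFFF;

    /** Appends a 2-byte big-endian UTF-8 length followed by the UTF-8 bytes. */
    static void writeUtf(ByteArrayOutputStream out, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        // The length must fit in an unsigned 16-bit field (the 65535-byte limit).
        if (utf8.length > MAX_UNSIGNED_SHORT) {
            throw new IllegalArgumentException("value exceeds 65535 UTF-8 bytes");
        }
        out.write((utf8.length >>> 8) & 0xFF);
        out.write(utf8.length & 0xFF);
        out.write(utf8, 0, utf8.length);
    }

    /** Convenience wrapper returning just the encoded value. */
    static byte[] encode(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf(out, value);
        return out.toByteArray();
    }
}
```

The length counts UTF-8 bytes, not characters: "\u00e9" (é) is one character but is encoded as 00 02 C3 A9.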

  The layout of a TYPE_INT value is shown below:

Figure: TYPE_INT value layout
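A TYPE_INT value occupies a fixed 4 bytes. A sketch assuming big-endian order, which is java.nio.ByteBuffer's default; IntValue is a hypothetical helper.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for the fixed-width TYPE_INT value.
final class IntValue {
    /** Encodes an int as 4 big-endian bytes (ByteBuffer's default order). */
    static byte[] encode(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Decodes 4 big-endian bytes starting at offset. */
    static int decode(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, Integer.BYTES).getInt();
    }
}
```

Because the width is fixed, no length field is needed, and a value such as 123456789 takes 4 bytes instead of the 9 characters it needs in text XML.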

  The purple block in the binary XML format diagram describes an event whose Token is TEXT, such as the "TEXT" between the tags in <string>TEXT</string>. Its Token is TEXT and its value type is TYPE_STRING, and the text content is laid out exactly like the TYPE_STRING value above.
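Putting the Token byte and the TYPE_STRING layout together, a TEXT event can be sketched as one byte TEXT | TYPE_STRING followed by the 2-byte length and the UTF-8 bytes. TEXT = 4 is XmlPullParser's value; TYPE_STRING = 2 << 4 and big-endian order are assumptions from my reading of the source, and TextEvent is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of encoding one TEXT event.
final class TextEvent {
    static final int TEXT = 4;             // XmlPullParser.TEXT
    static final int TYPE_STRING = 2 << 4; // assumed value

    static byte[] encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 0xFFFF) {
            throw new IllegalArgumentException("text exceeds 65535 UTF-8 bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(TEXT | TYPE_STRING);         // Token in low nibble, type in high nibble
        out.write((utf8.length >>> 8) & 0xFF); // 2-byte length, high byte first
        out.write(utf8.length & 0xFF);
        out.write(utf8, 0, utf8.length);       // UTF-8 content
        return out.toByteArray();
    }
}
```

For <string>TEXT</string>, the text event becomes the seven bytes 24 00 04 54 45 58 54.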
  The binary XML format diagram also contains an ellipsis labeled "other tags (which may be nested inside)". This means further tags can be nested at that point, and their content uses the same formats described above.
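To see how the pieces fit together, here is a minimal writer sketch covering only the events and value types discussed above. It assumes big-endian multi-byte fields, the value types ATTRIBUTE = 15, TYPE_STRING = 2 << 4, TYPE_STRING_INTERNED = 3 << 4 and TYPE_INT = 6 << 4 (my reading of BinaryXmlSerializer), and that END_TAG repeats the tag name as an interned string. MiniAbxWriter is a hypothetical class, not BinaryXmlSerializer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Minimal ABX-style writer sketch (hypothetical; not BinaryXmlSerializer).
final class MiniAbxWriter {
    static final int START_DOCUMENT = 0, END_DOCUMENT = 1, START_TAG = 2, END_TAG = 3, TEXT = 4;
    static final int ATTRIBUTE = 15;
    static final int TYPE_NULL = 1 << 4, TYPE_STRING = 2 << 4;
    static final int TYPE_STRING_INTERNED = 3 << 4, TYPE_INT = 6 << 4;
    private static final int MAX_UNSIGNED_SHORT = 0xFFFF;

    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final Map<String, Integer> interned = new HashMap<>();

    MiniAbxWriter startDocument() {
        out.write(0x41); // 'A'
        out.write(0x42); // 'B'
        out.write(0x58); // 'X'
        out.write(0x00); // version 0
        out.write(START_DOCUMENT | TYPE_NULL);
        return this;
    }

    MiniAbxWriter startTag(String name) {
        out.write(START_TAG | TYPE_STRING_INTERNED);
        writeInterned(name);
        return this;
    }

    MiniAbxWriter attributeInt(String name, int value) {
        out.write(ATTRIBUTE | TYPE_INT);
        writeInterned(name);      // attribute names are interned too
        writeShort(value >>> 16); // 4-byte big-endian value: high half
        writeShort(value);        // low half
        return this;
    }

    MiniAbxWriter text(String text) {
        out.write(TEXT | TYPE_STRING);
        writeUtf(text);
        return this;
    }

    MiniAbxWriter endTag(String name) {
        out.write(END_TAG | TYPE_STRING_INTERNED);
        writeInterned(name);
        return this;
    }

    byte[] endDocument() {
        out.write(END_DOCUMENT | TYPE_NULL);
        return out.toByteArray();
    }

    // Writes the low 16 bits of v, high byte first.
    private void writeShort(int v) {
        out.write((v >>> 8) & 0xFF);
        out.write(v & 0xFF);
    }

    private void writeUtf(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > MAX_UNSIGNED_SHORT) {
            throw new IllegalArgumentException("value exceeds 65535 UTF-8 bytes");
        }
        writeShort(utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    private void writeInterned(String s) {
        Integer index = interned.get(s);
        if (index != null) {
            writeShort(index); // repeat: 2-byte index only
            return;
        }
        writeShort(MAX_UNSIGNED_SHORT); // first-occurrence marker
        writeUtf(s);
        if (interned.size() < MAX_UNSIGNED_SHORT) {
            interned.put(s, interned.size());
        }
    }
}
```

For <a n="1"><b>hi</b></a> this yields 39 bytes. The closing tags are just 33 00 02 and 33 00 00, because "b" and "a" were interned at indices 2 and 0.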
  That covers the binary XML file format. In the Android source code, the class that writes binary XML files is platform\frameworks\base\core\java\com\android\internal\util\BinaryXmlSerializer, and the class that parses them is platform\frameworks\base\core\java\com\android\internal\util\BinaryXmlPullParser. See these two files for the full logic.

Convert binary to readable XML file

  Based on these source files, I extracted the logic into utility code that can be used in an app. For details, see XmlBinaryToTextUtil.
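XmlBinaryToTextUtil itself is not reproduced here. As an independent illustration, the following partial converter sketch understands only the Tokens (START_DOCUMENT, END_DOCUMENT, START_TAG, END_TAG, TEXT, ATTRIBUTE) and value types (TYPE_STRING, TYPE_STRING_INTERNED, TYPE_INT) covered in this article, with assumed constant values ATTRIBUTE = 15, TYPE_STRING = 2 << 4, TYPE_STRING_INTERNED = 3 << 4, TYPE_INT = 6 << 4 and big-endian byte order; anything else raises an exception. MiniAbxToXml is hypothetical; for real files, BinaryXmlPullParser remains the reference.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Partial ABX-to-text sketch (hypothetical); supports only the tokens/types listed above.
final class MiniAbxToXml {
    private static final byte[] MAGIC_V0 = {0x41, 0x42, 0x58, 0x00}; // "ABX" + version 0
    private static final int ATTRIBUTE = 15;
    private static final int TYPE_STRING = 2 << 4;
    private static final int TYPE_STRING_INTERNED = 3 << 4;
    private static final int TYPE_INT = 6 << 4;

    static String convert(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data); // big-endian by default
        byte[] magic = new byte[4];
        in.get(magic);
        if (!Arrays.equals(magic, MAGIC_V0)) {
            throw new IllegalArgumentException("not an ABX version 0 stream");
        }
        List<String> interned = new ArrayList<>(); // index -> string, first-seen order
        StringBuilder xml = new StringBuilder();
        boolean tagOpen = false; // true while "<name ..." still needs its closing '>'
        while (in.hasRemaining()) {
            int b = in.get() & 0xFF;
            int token = b & 0x0F; // low nibble
            int type = b & 0xF0;  // high nibble
            if (token == ATTRIBUTE) {
                String name = readInterned(in, interned);
                String value = readValue(in, type, interned);
                xml.append(' ').append(name).append("=\"").append(escape(value)).append('"');
                continue;
            }
            if (tagOpen) {
                xml.append('>');
                tagOpen = false;
            }
            switch (token) {
                case 0: // START_DOCUMENT (TYPE_NULL, no payload)
                    break;
                case 1: // END_DOCUMENT
                    return xml.toString();
                case 2: // START_TAG
                    xml.append('<').append(readInterned(in, interned));
                    tagOpen = true;
                    break;
                case 3: // END_TAG
                    xml.append("</").append(readInterned(in, interned)).append('>');
                    break;
                case 4: // TEXT
                    xml.append(escape(readValue(in, type, interned)));
                    break;
                default:
                    throw new IllegalArgumentException("unsupported token " + token);
            }
        }
        return xml.toString();
    }

    private static String readValue(ByteBuffer in, int type, List<String> interned) {
        switch (type) {
            case TYPE_STRING: return readUtf(in);
            case TYPE_STRING_INTERNED: return readInterned(in, interned);
            case TYPE_INT: return Integer.toString(in.getInt());
            default: throw new IllegalArgumentException("unsupported value type " + (type >> 4));
        }
    }

    private static String readUtf(ByteBuffer in) {
        int length = in.getShort() & 0xFFFF; // unsigned 16-bit length
        byte[] bytes = new byte[length];
        in.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    private static String readInterned(ByteBuffer in, List<String> interned) {
        int ref = in.getShort() & 0xFFFF;
        if (ref != 0xFFFF) {
            return interned.get(ref); // seen before: look up by index
        }
        String s = readUtf(in); // first occurrence: inline string follows
        if (interned.size() < 0xFFFF) {
            interned.add(s);
        }
        return s;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The decoder mirrors the writer's interning exactly: it appends each first-occurrence string to a list, so a later 2-byte index resolves to the same name the writer assigned it.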

Origin blog.csdn.net/q1165328963/article/details/125007694