[Meta EnCodec source code analysis] BitPacker function introduction

binary stream

First, a short note on binary streams.

Suppose we have the following four values:

[  47, 19,  38,  53 ]

The binary representation of each number is as follows:

decimal   binary
47        0b0010 1111
19        0b0001 0011
38        0b0010 0110
53        0b0011 0101

We want to save these numbers to a binary file. Note: byte order (big-endian vs. little-endian) is not considered here.
Written in input order, [ 47, 19, 38, 53 ], with one byte per value, the file content looks like this (the first value is on the right):


(53)  0b0011 0101        (38)   0b0010 0110         (19)  0b0001 0011       (47) 0b0010 1111

A total of 4 bytes.
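
As a quick check of the layout above, here is a minimal sketch that prints each value as a full byte and builds the 4-byte uncompressed form. The variable names are only for illustration.

```python
values = [47, 19, 38, 53]

# Each value shown as a full 8-bit byte; the top two bits are always 0.
for v in values:
    print(f"{v:3d} -> {v:08b}")

# One byte per value, written in input order.
raw = bytes(values)
print(raw.hex(" "), len(raw))
```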

compressed byte stream

Notice that the top two bits of every value are 0. If those two bits are dropped, each value needs only 6 bits, so the four values take 24 bits, which is 3 bytes.
This saves one byte.

BitPacker does exactly this kind of packing: it writes each value to a binary stream using the specified number of bits.
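
The arithmetic behind that saving can be sketched as follows. This is only an illustration of the size calculation, not code from EnCodec; `plain_size` and `packed_size` are names made up for the example.

```python
import math

values = [47, 19, 38, 53]

# Smallest width that can hold every value: 53 needs 6 bits.
bits = max(v.bit_length() for v in values)

plain_size = len(values)                          # one byte per value
packed_size = math.ceil(len(values) * bits / 8)   # 24 bits -> 3 bytes

print(bits, plain_size, packed_size)
```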


BitPacker source code implementation

import io  # used by the test example further down
import typing as tp


class BitPacker:
    """Simple bit packer to handle ints with a non standard width, e.g. 10 bits.
    Note that for some bandwidth (1.5, 3), the codebook representation
    will not cover an integer number of bytes.

    Args:
        bits (int): number of bits per value that will be pushed.
        fo (IO[bytes]): file-object to push the bytes to.
    """
    def __init__(self, bits: int, fo: tp.IO[bytes]):
        self._current_value = 0
        self._current_bits = 0
        self.bits = bits
        self.fo = fo

    def push(self, value: int):
        """Push a new value to the stream. This will immediately
        write as many uint8 as possible to the underlying file-object."""
        # Place the new value above the bits that are still pending.
        self._current_value += (value << self._current_bits)
        self._current_bits += self.bits
        # Write out every complete byte, lowest byte first.
        while self._current_bits >= 8:
            lower_8bits = self._current_value & 0xff
            self._current_bits -= 8
            self._current_value >>= 8
            self.fo.write(bytes([lower_8bits]))

    def flush(self):
        """Flushes the remaining partial uint8, call this at the end
        of the stream to encode."""
        if self._current_bits:
            # Fewer than 8 bits are pending; write them as one zero-padded byte.
            self.fo.write(bytes([self._current_value]))
            self._current_value = 0
            self._current_bits = 0
        self.fo.flush()

And the corresponding unpacking class:

class BitUnpacker:
    """BitUnpacker does the opposite of `BitPacker`.

    Args:
        bits (int): number of bits of the values to decode.
        fo (IO[bytes]): file-object to read the bytes from.
        """
    def __init__(self, bits: int, fo: tp.IO[bytes]):
        self.bits = bits
        self.fo = fo
        self._mask = (1 << bits) - 1
        self._current_value = 0
        self._current_bits = 0

    def pull(self) -> tp.Optional[int]:
        """
        Pull a single value from the stream, potentially reading some
        extra bytes from the underlying file-object.
        Returns `None` when reaching the end of the stream.
        """
        # Read whole bytes until at least one full value is buffered.
        while self._current_bits < self.bits:
            buf = self.fo.read(1)
            if not buf:
                return None
            character = buf[0]
            self._current_value += character << self._current_bits
            self._current_bits += 8

        # The next value sits in the low `bits` bits; extract it and drop it.
        out = self._current_value & self._mask
        self._current_value >>= self.bits
        self._current_bits -= self.bits
        return out

Using BitPacker

Below is a test example:

if __name__ == '__main__':
    # Needs `import io`, `import typing as tp` and the two classes above.
    bits: int = 6
    tokens: tp.List[int] = [47, 19, 38, 53]
    rebuilt: tp.List[int] = []
    buf = io.BytesIO()
    # Pack every token, then flush the last partial byte.
    packer = BitPacker(bits, buf)
    for token in tokens:
        packer.push(token)
    packer.flush()
    # Rewind and read the values back until the stream is exhausted.
    buf.seek(0)
    unpacker = BitUnpacker(bits, buf)
    while True:
        value = unpacker.pull()
        if value is None:
            break
        rebuilt.append(value)
    assert len(rebuilt) >= len(tokens), (len(rebuilt), len(tokens))
    # The flushing mechanism might lead to "ghost" values at the end of the stream.
    assert len(rebuilt) <= len(tokens) + 8 // bits, (len(rebuilt), len(tokens), bits)
    for idx, (a, b) in enumerate(zip(tokens, rebuilt)):
        assert a == b, (idx, a, b)
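
The "ghost" values mentioned in the comment come from the zero padding that flush adds to the last partial byte. The sketch below shows one such case with a 3-bit width. `pack` and `unpack` are hypothetical stand-alone helpers written for this example; they are assumed to follow the same low-bits-first layout as the two classes, but they are not part of EnCodec.

```python
def pack(values, bits):
    """Hypothetical helper: same layout as BitPacker.push followed by flush."""
    acc, nbits, out = 0, 0, bytearray()
    for v in values:
        acc += v << nbits
        nbits += bits
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:  # flush: the last partial byte is padded with zero bits
        out.append(acc)
    return bytes(out)


def unpack(data, bits):
    """Hypothetical helper: same loop as BitUnpacker.pull, until data runs out."""
    mask = (1 << bits) - 1
    acc, nbits, pos, out = 0, 0, 0, []
    while True:
        while nbits < bits:
            if pos == len(data):
                return out
            acc += data[pos] << nbits
            pos += 1
            nbits += 8
        out.append(acc & mask)
        acc >>= bits
        nbits -= bits


packed = pack([5], bits=3)      # 3 payload bits + 5 padding bits = 1 byte
print(unpack(packed, bits=3))   # one real value, then one ghost zero
```

With 6-bit values and four tokens, the stream is exactly 24 bits, so no padding is added and no ghost value appears. In general the reader has to know how many values to expect, which is why the test only compares the first `len(tokens)` results.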

How does it work?

1. pack

Push the first value, (47) 0b0010 1111

With bits = 6, only the low 6 bits, 10 1111, are meaningful. The accumulator now holds 6 bits, which is less than one byte, so nothing is written yet.

Push the second value, (19) 0b0001 0011

Intuitively, the new value has to go to the left of the previous one, so it is shifted left by 6 bits (0b01 0011 << 6) and added to the accumulator, which now holds 12 bits: 0100 1110 1111.
That is more than one byte, so the low byte, 0b1110 1111, is written out. Four bits, 0100, remain.

Push the third value, (38) 0b0010 0110

It is shifted left by the 4 pending bits, so the accumulator holds 10 bits: 10 0110 0100.
That is again more than one byte, so the low byte, 0b0110 0100, is written out. Two bits, 10, remain.

![Bit layout after the third value](https://img-blog.csdnimg.cn/6aace43f6d4b49c6a3771beef6670f8c.png)

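Push the fourth value, (53) 0b0011 0101

By the same logic, the accumulator now holds exactly 8 bits, and that byte, 0b1101 0110, is written out. Nothing remains, so flush() has no partial byte to write and the packed stream is 3 bytes long.
![Bit layout after the fourth value](https://img-blog.csdnimg.cn/2e3765656c95485a92d983ea26810962.png)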
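
The four steps above can be reproduced with a small trace. `trace_pack` is a hypothetical helper that copies the logic of BitPacker.push and prints the accumulator after every push; it is a sketch for following along, not part of EnCodec.

```python
def trace_pack(values, bits):
    """Hypothetical helper that mirrors BitPacker.push and records each step."""
    acc, nbits, written = 0, 0, []
    for v in values:
        acc += v << nbits          # new value goes above the pending bits
        nbits += bits
        print(f"push {v:2d}: accumulator = {acc:0{nbits}b} ({nbits} bits)")
        while nbits >= 8:
            byte = acc & 0xFF      # lowest complete byte
            acc >>= 8
            nbits -= 8
            written.append(byte)
            print(f"  write {byte:08b} (0x{byte:02X}), {nbits} bits pending")
    return written


written = trace_pack([47, 19, 38, 53], bits=6)
print(bytes(written).hex(" "))     # ef 64 d6
```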

Precautions

The value passed to push must fit in the bit width given to BitPacker, that is, 0 <= value < 2**bits. push does not mask its argument, so any higher bits overlap the next value and both are decoded incorrectly.
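
A minimal sketch of what goes wrong, using the same arithmetic as push and pull on two consecutive 6-bit slots. The `check_fits` guard at the end is an assumption added for illustration; EnCodec's BitPacker does not perform this check.

```python
bits = 6
mask = (1 << bits) - 1

too_big, neighbour = 65, 1   # 65 needs 7 bits, one more than allowed

# What BitPacker.push accumulates for two consecutive values (no masking).
acc = too_big + (neighbour << bits)

# What BitUnpacker.pull would read back from those 12 bits.
first = acc & mask
second = (acc >> bits) & mask
print(first, second)         # 1 2: neither value survives


def check_fits(value, bits):
    """Assumed guard (not in EnCodec): reject values that do not fit."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value
```

Calling `check_fits(token, bits)` before each push would turn this silent corruption into an explicit error.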


Origin blog.csdn.net/mimiduck/article/details/128804802