Git object storage structure analysis

1. Introduction

There are four kinds of git objects: blob (file data), tree (directory tree), commit, and tag.

This article examines how objects are stored by working through an example with a blob object. The example uses git version 2.17.

2. Practical discussion

2.1. Generate Blob object file

First, create a test git repository:

$ mkdir hello
$ cd hello
$ git init

Then create a file named test whose content is "hello". The file is 6 bytes long, because echo appends a newline character \n to its output. After running git add on the file, a subdirectory ce appears under the .git/objects directory, and it contains a file named 013625030ba8dba906f756967f9e9ca394464a.

$ echo "hello" > test
$ du -b test
6       test
$ git add test
$ find .git/objects/ -type f
.git/objects/ce/013625030ba8dba906f756967f9e9ca394464a

This file is the blob object file that git generated for the content of the test file. The SHA value of the object is ce013625030ba8dba906f756967f9e9ca394464a, that is, the directory name followed by the file name.
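The mapping from SHA value to file path seen above can be sketched in Java. This is only an illustration of the layout observed in the listing, not git's own code; the class name ObjectPath and the method looseObjectPath are hypothetical names introduced here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ObjectPath {
    // Hypothetical helper: maps a 40-character hex SHA value to the object
    // file path seen in 2.1 (first 2 characters = directory name,
    // remaining 38 characters = file name).
    public static Path looseObjectPath(String gitDir, String sha) {
        if (sha.length() != 40) {
            throw new IllegalArgumentException("expected a 40-character hex SHA value");
        }
        return Paths.get(gitDir, "objects", sha.substring(0, 2), sha.substring(2));
    }

    public static void main(String[] args) {
        // Prints the path of the blob from 2.1, using the platform's separator.
        System.out.println(
                looseObjectPath(".git", "ce013625030ba8dba906f756967f9e9ca394464a"));
    }
}
```

On a Unix-like system the printed path should match the one listed by find above.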

So far, two questions arise:

  1. What is the data structure of an object file?
  2. How is the SHA value of the object generated?

2.2. Object data structure and SHA value

According to the description in Git-Internals-Git-Objects:

First, the data structure of an object file is as follows:

[Figure: object data structure, head (object type, whitespace, content byte size, NUL) followed by content]

  • content: the data content
  • head: the object header information
    • object type: one of blob, tree, commit, tag
    • whitespace: a single space character
    • content byte size: the byte length of the data content, written as a decimal string
    • NUL: the null character, ASCII code value 0

Then, the SHA value of the object is obtained by applying the SHA-1 hash algorithm to the data structure above (head followed by content).
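To make the layout concrete, the following sketch builds the bytes for a blob whose content is "hello\n" and prints them in hex. It is a minimal illustration under the rules listed above; the class name ObjectLayout and its methods buildObject and toHex are hypothetical names, not part of git.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ObjectLayout {
    // Builds head + content: "<type> <size>\0" followed by the raw content bytes.
    public static byte[] buildObject(String type, byte[] content) {
        byte[] head = (type + " " + content.length + "\0")
                .getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(content, 0, content.length);
        return out.toByteArray();
    }

    // Renders bytes as space-separated two-digit hex values.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] object = buildObject("blob", "hello\n".getBytes(StandardCharsets.UTF_8));
        // head:    62 6c 6f 62 ("blob") 20 (space) 36 ("6") 00 (NUL)
        // content: 68 65 6c 6c 6f 0a ("hello" plus newline)
        System.out.println(toHex(object));
    }
}
```

The size field is the character "6" (0x36), not the byte value 6, because the length is stored as a decimal string.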

2.3. Hands-on verification

Following the rules in 2.2, the code below encodes the content of the test file from 2.1 and computes its SHA value, to check whether it matches the SHA value generated by git.

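    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import org.apache.commons.codec.binary.Hex; // assumption: Hex comes from Apache Commons Codec

    public class BlobSha {
        public static void main(String[] args) throws Exception {
            // object content
            String content = "hello\n";
            byte[] contentBytes = content.getBytes(StandardCharsets.UTF_8);

            ByteBuffer buf = ByteBuffer.allocate(1024);

            buf.put("blob".getBytes(StandardCharsets.US_ASCII)); // object type
            buf.put((byte) ' ');        // whitespace
            // content byte size as a decimal string
            buf.put(Integer.toString(contentBytes.length).getBytes(StandardCharsets.US_ASCII));
            buf.put((byte) 0);          // NUL
            buf.put(contentBytes);      // content

            buf.flip();

            // whole object bytes (head + content)
            byte[] objectBytes = new byte[buf.remaining()];
            buf.get(objectBytes);

            // compute the SHA-1 digest
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] shaBytes = md.digest(objectBytes);

            // show in hex
            String shaHex = Hex.encodeHexString(shaBytes);
            System.out.println(shaHex);
        }
    }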

The code above outputs ce013625030ba8dba906f756967f9e9ca394464a. This matches the SHA value generated by git in 2.1, which verifies the data structure and the SHA generation described in 2.2.
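The same rule can be wrapped in a small helper that needs only the JDK. This is a sketch under the rules from 2.2, not git's implementation; the class name GitObjectHash and the method hashObject are hypothetical names introduced here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitObjectHash {
    // Hypothetical helper: SHA-1 over "<type> <size>\0" + content, as lowercase hex.
    public static String hashObject(String type, byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update((type + " " + content.length + "\0")
                    .getBytes(StandardCharsets.US_ASCII)); // head
            md.update(content);                            // content
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // The blob from 2.1: prints ce013625030ba8dba906f756967f9e9ca394464a
        System.out.println(hashObject("blob", "hello\n".getBytes(StandardCharsets.UTF_8)));
        // An empty blob: only the head "blob 0\0" is hashed
        System.out.println(hashObject("blob", new byte[0]));
    }
}
```

Because the head is part of the hashed data, the same content stored under a different object type would produce a different SHA value.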

2.4. Object Compression

According to Git-Internals-Git-Objects, git compresses object files with zlib before storing them (the Ruby example in that reference uses Zlib::Deflate.deflate).

$ cat .git/objects/ae/a941d707291bf3f2103c096479b068f7bed4f8
x☺K
cat: write error: Input/output error

As the output shows, the content cannot be read directly with the cat command. The file has to be decompressed (inflated) first, for example with the following code:

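        import java.io.ByteArrayOutputStream;
        import java.io.FileInputStream;
        import java.io.InputStream;
        import java.nio.charset.StandardCharsets;
        import java.util.zip.InflaterInputStream;

        public class InflateObject {
            public static void main(String[] args) throws Exception {
                // try-with-resources closes both streams, even if reading fails
                try (InputStream is = new InflaterInputStream(new FileInputStream(
                             ".git/objects/ce/013625030ba8dba906f756967f9e9ca394464a"));
                     ByteArrayOutputStream baos = new ByteArrayOutputStream()) {

                    // read the inflated bytes until the end of the stream
                    int b;
                    while ((b = is.read()) != -1) {
                        baos.write(b);
                    }

                    byte[] res = baos.toByteArray();
                    System.out.println(new String(res, StandardCharsets.UTF_8));
                }
            }
        }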

The code above outputs the data structure of the object:

blob 6hello

Note that the output includes invisible characters: the NUL between 6 and hello, and the newline after hello.
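Since NUL marks the end of the head, the inflated bytes can be split back into object type, content byte size, and content. The following is a minimal sketch of that split; ObjectParser, ParsedObject, and parse are hypothetical names introduced here, and the input is built in memory rather than read from the repository.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ObjectParser {
    // Hypothetical holder for the three parts of an inflated object.
    public static final class ParsedObject {
        public final String type;
        public final int size;
        public final byte[] content;

        ParsedObject(String type, int size, byte[] content) {
            this.type = type;
            this.size = size;
            this.content = content;
        }
    }

    // Splits "<type> <size>\0<content>" into its parts.
    public static ParsedObject parse(byte[] inflated) {
        int nul = 0;
        while (nul < inflated.length && inflated[nul] != 0) {
            nul++;
        }
        if (nul == inflated.length) {
            throw new IllegalArgumentException("head is not terminated by NUL");
        }
        String head = new String(inflated, 0, nul, StandardCharsets.US_ASCII);
        int space = head.indexOf(' ');
        if (space < 0) {
            throw new IllegalArgumentException("head has no space separator");
        }
        String type = head.substring(0, space);
        int size = Integer.parseInt(head.substring(space + 1));
        byte[] content = Arrays.copyOfRange(inflated, nul + 1, inflated.length);
        if (content.length != size) {
            throw new IllegalArgumentException("size in head does not match content length");
        }
        return new ParsedObject(type, size, content);
    }

    public static void main(String[] args) {
        byte[] inflated = "blob 6\0hello\n".getBytes(StandardCharsets.US_ASCII);
        ParsedObject obj = parse(inflated);
        System.out.println(obj.type + " / " + obj.size + " bytes");
        System.out.print(new String(obj.content, StandardCharsets.UTF_8));
    }
}
```

Checking the declared size against the actual content length is a design choice of this sketch; it catches truncated input early.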

The data content of the object can also be viewed directly with the git cat-file command:

$ git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello

3. Summary

  1. The object data structure is: head (object type, whitespace, content byte size, NUL) followed by content.
  2. The object SHA value is generated by applying the SHA-1 message digest algorithm to the object data structure from item 1.
  3. The object storage structure is the object data structure from item 1, compressed with deflate and then stored.
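The storage step can be sketched as an in-memory round trip with the JDK's zlib classes. This is an illustration only: the class name LooseObjectRoundTrip and its methods are hypothetical, and the compressed bytes need not be identical to the file git writes (the compression level may differ), although both inflate to the same object data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class LooseObjectRoundTrip {
    // Compresses the object data (head + content) in zlib format.
    public static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the object data from its compressed form.
    public static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("truncated or unsupported zlib stream");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] object = "blob 6\0hello\n".getBytes(StandardCharsets.US_ASCII);
        byte[] stored = deflate(object);
        // The first byte of a zlib stream is 0x78, the "x" seen in the cat output.
        System.out.println(String.format("first byte: 0x%02x", stored[0]));
        System.out.println("round trip ok: " + Arrays.equals(object, inflate(stored)));
    }
}
```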

4. References

  • Git-Internals-Git-Objects
  • "Git Version Control Management" (2nd Edition) - People's Posts and Telecommunications Publishing House
