File integrity testing: solution ideas and cases

Introduction: md5sum is a tool for calculating and verifying the MD5 checksums of files on Unix and Unix-like operating systems. It is mainly used to verify file integrity. MD5 is a commonly used hash function that produces a fixed-length (128-bit) hash value. In file verification scenarios, we can check whether a file was altered during transmission or storage by comparing its MD5 hash before and after.
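As a quick illustration of the fixed-length property, hashing even a tiny input produces a 32-hex-character (128-bit) digest. A minimal example on a GNU/Linux shell:

echo -n "hello" | md5sum   # -n keeps the trailing newline out of the hash
# 5d41402abc4b2a76b9719d911017c592  -   (the trailing "-" means the input came from stdin)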


Installation: The md5sum tool is part of GNU coreutils and comes pre-installed on most Linux distributions, so you usually don't need to install it separately. You can check whether md5sum is available by running:

md5sum --version

If it is missing on a Debian-based system (such as Ubuntu), install it with apt-get:

sudo apt-get install coreutils

On Red Hat-based systems (such as Fedora or CentOS), use yum:

sudo yum install coreutils

Steps to use md5sum: Run md5sum against the file, where filename is the name of the file to check. The command outputs an MD5 checksum, a fingerprint of the file's contents:

md5sum filename

After receiving a new copy of the file (for example, after downloading it), calculate its MD5 checksum again and compare it with the original checksum. If the two checksums match, the file has not been changed; if they differ, the file may have been altered during transfer or storage.
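In practice, md5sum can perform this comparison itself via a checksum file. A minimal sketch, where filename stands in for your actual file:

md5sum filename > filename.md5   # before transfer: record the checksum in a file

md5sum -c filename.md5   # after transfer: re-hash the file and compare against the recorded value
# prints "filename: OK" if the checksums match, "filename: FAILED" otherwise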

Caveats: MD5 is not suitable for scenarios that require strong security, because it is known to have collision issues (i.e. different inputs can produce the same output). If you need stronger guarantees, use a more secure hash function such as SHA-256.
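The sha256sum tool ships in the same coreutils package and follows the same workflow; a minimal sketch:

sha256sum filename > filename.sha256   # record a 256-bit checksum instead

sha256sum -c filename.sha256   # verify it later, just like md5sum -c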

Case: The check shows that the hash is a981130cf2b7e09f4686dc273cf7187e both before and after transfer, i.e. the file is intact after being uploaded, downloaded, and copied:

(base) [root@572ysx2s check-file-size]# fallocate -l 2G myfile-01.img  # create a test file
(base) [root@572ysx2s check-file-size]# ls -lh  # check the file size
total 2.1G
-rw-r--r-- 1 root root 2.0G Jun  1 10:30 myfile-01.img
(base) [root@572ysx2s check-file-size]# 
(base) [root@572ysx2s check-file-size]# md5sum myfile-01.img  # get the file hash with md5sum
a981130cf2b7e09f4686dc273cf7187e  myfile-01.img

# Note: the file is uploaded at this point; that step is omitted here

(base) [root@ci4vyvxi572ysx2s check-file-size]# cp -r myfile-01.img myfile-02.img  # copy, remote-copy, or download the file
(base) [root@572ysx2s check-file-size]# md5sum myfile-02.img  # get the file hash again with md5sum
a981130cf2b7e09f4686dc273cf7187e  myfile-02.img

Extended case: verify the size and integrity of all files under a given path. To do this, traverse every file under the target path and compute its size and hash value. Here are examples of how this can be done in Linux and in Python:

Linux case: Using the find, du, and md5sum commands, you can easily recurse through all files in a directory and calculate their sizes and MD5 hashes; a variant that records the results in a verifiable manifest is sketched after the option breakdown below.

Example script:

find /path/to/directory -type f -exec du -sh {} \; -exec md5sum {} \;

In this script:

The find /path/to/directory -type f part recursively finds all regular files (excluding directories) under the specified directory.

-exec du -sh {} \; runs du -sh on each file to report its size.

-exec md5sum {} \; runs md5sum on each file to calculate its MD5 hash.
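To verify an entire directory after transfer, it can be convenient to save the checksums to a manifest and check against it later. A minimal sketch, with /path/to/directory and the manifest name checksums.md5 as placeholders:

# On the source machine: record an MD5 for every file under the directory
find /path/to/directory -type f -exec md5sum {} \; > checksums.md5

# On the destination machine (the directory must sit at the same path so the
# recorded names resolve): re-hash every listed file and print OK / FAILED per file
md5sum -c checksums.md5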

Python case: Use the os and hashlib modules for this task.

Example script:

# -*- coding: utf-8 -*-
# time: 2023/6/1 14:00
# file: test.py
# WeChat official account: 玩转测试开发

import os
import hashlib


def get_file_md5(file_path):
    md5_hash = hashlib.md5()
    with open(file_path, "rb") as f:
        for byte_block in iter(lambda: f.read(4096), b""):
            md5_hash.update(byte_block)
    return md5_hash.hexdigest()


def get_file_size(file_path):
    return os.path.getsize(file_path)


def process_directory(directory_path):
    for root, dirs, files in os.walk(directory_path):
        for file in files:
            file_path = os.path.join(root, file)
            file_size = get_file_size(file_path)
            file_md5 = get_file_md5(file_path)
            print(f"File: {
      
      file_path}\nSize: {
      
      file_size} bytes\nMD5: {
      
      file_md5}\n")


directory_path = "/path/to/directory"
process_directory(directory_path)

In this script:

The os.walk(directory_path) function recursively walks all files and subdirectories of the specified directory.

For each file, the script calculates the file's size and MD5 hash, and prints the result.
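To compare two copies of a directory tree (for example, the original and a copy on another host), one simple approach is to run the script on both sides and diff the reports. A minimal sketch; test.py and the report file names are placeholders, the directory is assumed to live at the same path on both hosts, and the output is sorted so the comparison does not depend on the order in which os.walk visits files:

# On the source host
python3 test.py | sort > source_report.txt

# On the destination host (copy the report back to one machine for comparison)
python3 test.py | sort > destination_report.txt

# No differences means every path, size, and MD5 line matches
diff source_report.txt destination_report.txt && echo "All files match"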
