File type checking tool: jmimemagic source code analysis

File upload is a common requirement in development. For security reasons, the file type has to be checked so that malicious files cannot be uploaded. Two approaches are commonly suggested online:

1. Check the file extension. Only files with an allowed extension are accepted.

2. Check the file header. The upload succeeds only if the magic number in the file header matches the expected value (the magic number of each file type is known).

The first method has an obvious flaw: users can pass the check simply by changing the extension.

The second method covers most scenarios, but it has a drawback too: it does not check the file extension.
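
As a rough illustration of the second approach, the sketch below compares the first bytes of a file with the well-known 8-byte PNG signature. The class and method names are hypothetical and not part of any library; the sketch only shows what a magic number check looks like, and it assumes Java 11 or later for readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of jmimemagic.
class PngSignatureCheck {

    // The 8-byte signature that every PNG file starts with.
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };

    // Returns true if the given bytes begin with the PNG signature.
    static boolean hasPngSignature(byte[] header) {
        if (header == null || header.length < PNG_SIGNATURE.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    // Reads only the first 8 bytes of the file and checks them.
    static boolean isPng(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = in.readNBytes(PNG_SIGNATURE.length);
            return hasPngSignature(header);
        }
    }
}
```

Note that a file renamed from photo.png to photo.txt would still pass this check, which is exactly the drawback described above.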

 

jmimemagic is an open source tool that uses the magic number in the file header to determine the file type.

Its repository is at: https://github.com/arimus/jmimemagic.git

 

The process of obtaining the file type is as follows:

 


Explanation:

1. Several important classes are involved in the whole process:

   a. Magic: the public entry-point class of jmimeMagic. All of its methods are static; the main ones are getMagicMatch(File, boolean) and getMagicMatch(File, boolean, boolean).

   b. MagicParser: the parser for magic.xml. It converts the data in magic.xml into internal objects and uses SAX for the underlying parsing.

   c. MagicMatch: the object corresponding to a match tag in magic.xml.

   d. MagicMatcher: a utility class that associates a file with a MagicMatch.

 

2. The dotted box on the left mainly loads and parses magic.xml. Parsing produces a list of MagicMatcher objects and the hintMap.

   a. The code snippet of magic.xml is as follows:

     <match>
         <mimetype></mimetype>
         <extension></extension>
         <description>b, 32 kBits</description>
         <property name="bitrate" value="32"/>
         <test type="byte" offset="2" length="" bitmask="0xf0" comparator="=">0x10</test>
     </match>

    Each match element is parsed into a MagicMatch object, which is then stored in a MagicMatcher object.

  b. The fields of the MagicMatch class are as follows:

       private String mimeType = null;

       private String extension = null;

       private String description = null;

       private ByteBuffer test = null;

       private int offset = 0;

       private int length = 0;

 

       // possible types:

       //     byte, short, long, string, date, beshort, belong, bedate, leshort,

       //     lelong, ledate, regex

       private String type = "";

       private long bitmask = 0xFFFFFFFFL;

       private char comparator = '\0';

       private List<MagicMatch> subMatches = new ArrayList<MagicMatch>(0);

       private Map<String,String> properties;
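
To make the test attributes (type, offset, bitmask, comparator) in the match snippet and the MagicMatch fields above more concrete, here is a minimal sketch of how a rule of type byte with the = comparator could be evaluated. This is an assumption about the rule's semantics for illustration, not jmimemagic's actual implementation; ByteTest and its members are hypothetical names.

```java
// Hypothetical model of a single-byte test; names are illustrative only.
class ByteTest {
    private final int offset;   // position of the byte to inspect
    private final int bitmask;  // bits to keep before comparing
    private final int expected; // value the masked byte must equal

    ByteTest(int offset, int bitmask, int expected) {
        this.offset = offset;
        this.bitmask = bitmask;
        this.expected = expected;
    }

    // Applies the mask to the byte at the offset and compares with '='.
    boolean matches(byte[] data) {
        if (data == null || offset < 0 || offset >= data.length) {
            return false; // not enough data for this test
        }
        int value = data[offset] & 0xFF; // treat the byte as unsigned
        return (value & bitmask) == expected;
    }
}
```

With the values from the snippet, new ByteTest(2, 0xF0, 0x10).matches(header) is true whenever the high four bits of the third byte are 0001, regardless of the low four bits.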

 

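3. The dotted box on the right mainly obtains the MagicMatch for a given file.

    a. If the extHints argument passed to Magic#getMagicMatch is true, the file extension is used first to look up the MagicMatch. Only when no MagicMatch can be found through the extension is the whole matchers list traversed to find one. For this reason, extHints is usually set to true.

   b. In the special case where no MagicMatch can be found, an exception is thrown.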
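
The lookup order described in step 3 can be sketched as follows. This is a simplified stand-in, not jmimemagic's own code: HintedLookup, Rule, and register are hypothetical names, and the sketch assumes that rules found through the extension hint are still tested against the file content before being accepted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical stand-in for the lookup flow; not jmimemagic's own classes.
class HintedLookup {

    // A rule pairs a MIME type with a test on the file's leading bytes.
    static final class Rule {
        final String mimeType;
        final Predicate<byte[]> test;

        Rule(String mimeType, Predicate<byte[]> test) {
            this.mimeType = mimeType;
            this.test = test;
        }
    }

    private final List<Rule> matchers = new ArrayList<>();
    private final Map<String, List<Rule>> hintMap = new HashMap<>();

    // Registers a rule in the full list and under its extension hint.
    void register(String extension, Rule rule) {
        matchers.add(rule);
        hintMap.computeIfAbsent(extension, k -> new ArrayList<>()).add(rule);
    }

    // Tries rules hinted by the extension first, then falls back to all rules.
    Optional<String> detect(String fileName, byte[] header, boolean extHints) {
        if (extHints) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0) {
                String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
                for (Rule rule : hintMap.getOrDefault(ext, List.of())) {
                    if (rule.test.test(header)) {
                        return Optional.of(rule.mimeType);
                    }
                }
            }
        }
        for (Rule rule : matchers) {
            if (rule.test.test(header)) {
                return Optional.of(rule.mimeType);
            }
        }
        // No rule matched; as noted above, the library throws an exception in this case.
        return Optional.empty();
    }
}
```

The benefit of the hint is speed: when the extension is honest, only a handful of rules are tested instead of the entire list.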

 

4. Test code:

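import java.io.File;

import net.sf.jmimemagic.Magic;
import net.sf.jmimemagic.MagicMatch;

public class TestMagic {

    public static void main(String[] args) {
        MagicMatch magicMatch;
        try {
            // extHints = false: skip the extension hint and test the file against all matchers
            magicMatch = Magic.getMagicMatch(new File("/home/yangjianzhou/document/123456.png"), false);
        } catch (Exception exp) {
            // Thrown, for example, when no MagicMatch can be found for the file
            exp.printStackTrace();
            return;
        }
        String mimeType = magicMatch.getMimeType();
        System.out.println("file mime type is : " + mimeType);
    }
}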

 

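Summary:

jmimeMagic is a useful utility for obtaining a file's MIME type. It can determine the MIME type of most files, and when it cannot, magic.xml can be extended to cover the missing type. However, content deliberately appended to the end of a file escapes the tool's detection.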

 

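As mentioned at the beginning of this article, the file type can be judged by either the extension or the file header, and each approach has its own strengths and weaknesses. The two can be combined: check the extension first, and only if the extension is acceptable, check the file header. The idea is that once the file header check also passes, malicious code written into the file will not be executed.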
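
A minimal sketch of this combined check is shown below. The class name, the whitelist, and the hard-coded signatures are assumptions made for illustration; in practice the header check would more likely be delegated to jmimeMagic, with the MIME type it reports compared against the type expected for the extension.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

// Hypothetical upload validator combining both checks; names are illustrative.
class UploadTypeValidator {

    // Allowed extensions mapped to the leading bytes expected for that type.
    private static final Map<String, byte[]> ALLOWED = Map.of(
        "png", new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A},
        "gif", new byte[] {0x47, 0x49, 0x46, 0x38}, // "GIF8"
        "pdf", new byte[] {0x25, 0x50, 0x44, 0x46}  // "%PDF"
    );

    static boolean isAllowed(String fileName, byte[] header) {
        // Step 1: the extension must be on the whitelist.
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        byte[] signature = ALLOWED.get(ext);
        if (signature == null || header == null || header.length < signature.length) {
            return false;
        }
        // Step 2: the file header must match the signature for that extension.
        return Arrays.equals(Arrays.copyOf(header, signature.length), signature);
    }
}
```

Because the signature is looked up by extension, a PNG file renamed to photo.gif is rejected, as is any file whose extension is not on the whitelist.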

