Determining file types with Apache Tika

I. File types can generally be determined in two ways

  1. By file extension

     Simple to implement, but it cannot determine the type reliably, because a file can be renamed to any extension.

  2. By analyzing the file header

    This usually identifies the file type, but some types cannot be told apart this way. For example, Word (.doc) and Excel (.xls) files start with the same header bytes, so the header alone cannot distinguish them.

  3. Apache Tika makes it easy to overcome the shortcomings of both approaches above.
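The limitations of the two manual approaches can be seen in a small plain-Java sketch. Both methods are hypothetical helpers written only for illustration, not part of Tika: `byExtension` trusts the file name alone (using a few MIME types from the list in this article), and `sniffHeader` compares the first bytes against a few well-known signatures (magic numbers).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class NaiveDetectors {
    // Approach 1: extension lookup. Hypothetical table using MIME types from this article.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("doc", "application/msword");
        BY_EXTENSION.put("xls", "application/vnd.ms-excel");
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("zip", "application/zip");
    }

    // Returns the MIME type implied by the name alone, or null if unknown.
    // The content is never read, so a renamed file is reported by its new name.
    public static String byExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Approach 2: header (magic byte) sniffing. Only coarse container kinds are visible.
    public enum Kind { PDF, ZIP_CONTAINER, OLE2_CONTAINER, UNKNOWN }

    private static final byte[] PDF_MAGIC = {0x25, 0x50, 0x44, 0x46}; // "%PDF"
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04}; // "PK\3\4": .zip, .docx, .xlsx, .pptx
    private static final byte[] OLE2_MAGIC = {                       // OLE2 compound file: .doc, .xls, .ppt
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the header begins with the given signature
    private static boolean startsWith(byte[] header, byte[] magic) {
        return header.length >= magic.length
            && Arrays.equals(Arrays.copyOf(header, magic.length), magic);
    }

    public static Kind sniffHeader(byte[] header) {
        if (startsWith(header, PDF_MAGIC)) return Kind.PDF;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP_CONTAINER;
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2_CONTAINER;
        return Kind.UNKNOWN;
    }
}
```

A .doc and an .xls both come back as `OLE2_CONTAINER`, and a .docx and an .xlsx both come back as `ZIP_CONTAINER`, which is the ambiguity described in item 2: telling them apart requires looking inside the container.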

 

II. Usage

  1. Maven dependency

<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.22</version>
</dependency>

  2. Implementation

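import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.HttpHeaders;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaMetadataKeys;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public static String getMimeType(String fileName, InputStream inputStream) {
    AutoDetectParser parser = new AutoDetectParser();
    // No parsers registered: only detect the type, do not extract content
    parser.setParsers(new HashMap<MediaType, Parser>());

    Metadata metadata = new Metadata();
    // The file name is an extra hint combined with the header bytes
    metadata.add(TikaMetadataKeys.RESOURCE_NAME_KEY, fileName);

    // try-with-resources closes the stream even if detection fails
    try (InputStream in = inputStream) {
        parser.parse(in, new DefaultHandler(), metadata, new ParseContext());
    } catch (TikaException | SAXException | IOException e) {
        e.printStackTrace();
    }
    // AutoDetectParser records the detected type as Content-Type
    return metadata.get(HttpHeaders.CONTENT_TYPE);
}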
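As an alternative to the parser-based method above (not the approach used in this article), tika-core also provides the `org.apache.tika.Tika` facade, whose `detect(InputStream, String)` method runs type detection directly from the stream and a file-name hint. A minimal sketch, assuming tika-core 1.22 is on the classpath; the class name `TikaFacadeDetect` and the sample bytes are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;

public class TikaFacadeDetect {
    // A single facade instance reused across calls
    private static final Tika TIKA = new Tika();

    // Detects the type from the leading bytes plus the file-name hint.
    // The caller is responsible for closing the stream.
    public static String detect(InputStream in, String fileName) throws IOException {
        return TIKA.detect(in, fileName);
    }

    public static void main(String[] args) throws IOException {
        // A few bytes that start like a PDF file, used as sample input
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(detect(in, "report.pdf"));
        }
    }
}
```

The facade hides the `Metadata` and `ParseContext` plumbing, which is convenient when only the MIME type is needed.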

  3. Common file types

File type           MIME type
Word (.doc)         application/msword
PowerPoint (.ppt)   application/vnd.ms-powerpoint
Excel (.xls)        application/vnd.ms-excel
Word (.docx)        application/vnd.openxmlformats-officedocument.wordprocessingml.document
PowerPoint (.pptx)  application/vnd.openxmlformats-officedocument.presentationml.presentation
Excel (.xlsx)       application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
RAR archive         application/x-rar-compressed
ZIP archive         application/zip
PDF                 application/pdf
Video files         video/*
Image files         image/*
Plain text          text/plain
CSS file            text/css
HTML file           text/html
Java source code    text/x-java-source
C source code       text/x-csrc
C++ source code     text/x-c++src
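Some entries above are whole families (`video/*`, `image/*`), so code that checks a detected type usually needs wildcard matching. A minimal plain-Java sketch; `MimeFamily.matches` is a hypothetical helper, not part of Tika:

```java
import java.util.Locale;

public class MimeFamily {
    // Returns true if mimeType matches pattern, where pattern is either an exact
    // type ("application/pdf") or a wildcard family ("image/*").
    // Parameters such as "; charset=UTF-8" are ignored.
    public static boolean matches(String mimeType, String pattern) {
        if (mimeType == null) {
            return false;
        }
        String type = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        String p = pattern.trim().toLowerCase(Locale.ROOT);
        if (p.endsWith("/*")) {
            // Keep the trailing '/' so "image/*" does not match "imagex/png"
            return type.startsWith(p.substring(0, p.length() - 1));
        }
        return type.equals(p);
    }
}
```

For example, `MimeFamily.matches(getMimeType(name, stream), "image/*")` could gate an image upload; the null check covers the case where detection failed and nothing was returned.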

 


Source: www.cnblogs.com/Mr-kevin/p/12014611.html