List of File Format Signatures

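I have recently been working on a decompression project that has to handle files in many different formats. How does a parser know what format a file is in? Mostly from the bytes in the file's binary header, known as the file signature.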

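For example, to decide whether a file is an APK (which, like other ZIP-based archives, uses the ZIP container), a parser checks whether its first four bytes are "50 4B 03 04".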
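
The four-byte check described here could be sketched in Python roughly as follows. The function name `looks_like_zip` is hypothetical, and the two extra prefixes (empty and spanned archives) are taken from the ZIP row of the signature list further down; this is an illustration under those assumptions, not the code of any particular parser.

```python
# Prefixes that a ZIP-based file (zip, jar, apk, docx, ...) may start with.
ZIP_SIGNATURES = (
    b"PK\x03\x04",  # 50 4B 03 04: regular archive
    b"PK\x05\x06",  # 50 4B 05 06: empty archive
    b"PK\x07\x08",  # 50 4B 07 08: spanned archive
)


def looks_like_zip(path):
    """Return True if the file at `path` starts with one of the ZIP signatures."""
    with open(path, "rb") as f:
        head = f.read(4)  # may be shorter than 4 bytes for tiny files
    return head in ZIP_SIGNATURES
```

A match only says that the file uses the ZIP container. Telling an APK apart from a plain zip or jar would need a further look inside the archive (for instance for an AndroidManifest.xml entry), which this sketch does not attempt.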

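Likewise, the dex file you find inside an APK when decompiling it starts with eight fixed bytes ("64 65 78 0A 30 33 35 00"), and other formats work in a similar way.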
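
Those eight bytes break down into the ASCII text `dex`, a newline, the three digits `035`, and a terminating zero byte. Below is a minimal Python sketch of checking them, assuming the three digits are a format version number that can differ between files. The helper name `dex_version` is hypothetical.

```python
DEX_PREFIX = bytes.fromhex("6465780A")  # "dex" followed by a newline


def dex_version(header):
    """Return the version digits (e.g. "035") if `header` starts with a dex magic, else None."""
    magic = header[:8]
    # Expected layout: 4-byte prefix, 3 ASCII digits, 1 zero byte.
    if len(magic) < 8 or not magic.startswith(DEX_PREFIX) or magic[7] != 0:
        return None
    digits = magic[4:7]
    if not digits.isdigit():
        return None
    return digits.decode("ascii")
```

Comparing against the full eight bytes would also work for the exact signature quoted above; splitting out the digits just makes the version visible to the caller.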

The list of file signatures is copied below for easy lookup:

Hex signature ISO 8859-1 Offset File extension Description
00 . 0 PIC, PIF, SEA, YTR IBM Storyboard bitmap file; Windows Program Information File; Mac Stuffit Self-Extracting Archive; IRIS OCR data file
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........................ 11 PDB PalmPilot Database/Document File
00 00 00 nn 66 74 79 70 33 67 70 ....ftyp3gp 0 3GG, 3GP, 3G2 3rd Generation Partnership Project 3GPP (nn=0x14) and 3GPP2 (nn=0x20) multimedia files
00 00 00 nn 66 74 79 70 33 67 70 35 ....ftyp3gp5 0 MP4 MPEG-4 video files
00 00 01 00 .... 0 ico Computer icon encoded in ICO file format[1]
00 01 00 00 ... 0 ... Palm Desktop Data File (Access format)
00 01 42 44 ... 0 DBA Palm Desktop To Do Archive
00 01 44 54 ... 0 TDA Palm Desktop Calendar Archive
05 07 00 00 42 4F 42 4F 05 07 00 00 00 00 00 00 00 00 00 00 00 01 ....BOBO............ 0 cwk AppleWorks 5 document
06 07 E1 00 42 4F 42 4F 06 07 E1 00 00 00 00 00 00 00 00 00 00 01 ....BOBO............ 0 cwk AppleWorks 6 document
1F 9D .. 0 z, tar.z Compressed file (often tar zip) using Lempel-Ziv-Welch algorithm
1F A0 .. 0 z, tar.z Compressed file (often tar zip) using LZH algorithm
24 53 44 49 30 30 30 31 $SDI0001 0   System Deployment Image, a disk image format used by Microsoft
25 21 50 53  %!PS 0 ps PostScript document
25 50 44 46  %PDF 0 pdf PDF document
30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C 0&²u.fÏ.¦Ù.ª.bÎl 0 asf, wma, wmv Advanced Systems Format[8]
38 42 50 53 8BPS 0 psd Photoshop Document file, Adobe Photoshop's native file format
41 47 44 33 AGD3 0 fh8 FreeHand 8 document[18][19][20]
42 4D BM 0 bmp, dib BMP file, a bitmap format used mostly in the Windows world
42 5A 68 BZh 0 bz2 Compressed file using Bzip2 algorithm
43 44 30 30 31 CD001 0x8001, 0x8801 or 0x9001 iso ISO9660 CD/DVD image file[9]
43 72 32 34 Cr24 0 crx Google Chrome extension[16] or packaged app[17]
45 52 02 00 00 00 or 8B 45 52 02 00 00 00 ER.... or .ER.... 0 toast Roxio Toast disc image file; some .dmg files also begin with the same bytes
46 4F 52 4D nn nn nn nn 38 53 56 58 FORM....8SVX 0, any 8svx, 8sv, svx, snd, iff IFF 8-Bit Sampled Voice
46 4F 52 4D nn nn nn nn 41 43 42 4D FORM....ACBM 0, any acbm, iff Amiga Contiguous Bitmap
46 4F 52 4D nn nn nn nn 41 49 46 46 FORM....AIFF 0, any aiff, aif, aifc, snd, iff Audio Interchange File Format
46 4F 52 4D nn nn nn nn 41 4E 42 4D FORM....ANBM 0, any anbm, iff IFF Animated Bitmap
46 4F 52 4D nn nn nn nn 41 4E 49 4D FORM....ANIM 0, any anim, iff IFF CEL Animation
46 4F 52 4D nn nn nn nn 43 4D 55 53 FORM....CMUS 0, any cmus, mus, iff IFF Musical Score
46 4F 52 4D nn nn nn nn 46 41 4E 54 FORM....FANT 0, any iff Amiga Fantavision Movie
46 4F 52 4D nn nn nn nn 46 41 58 58 FORM....FAXX 0, any faxx, fax, iff IFF Facsimile Image
46 4F 52 4D nn nn nn nn 46 54 58 54 FORM....FTXT 0, any ftxt, txt, iff IFF Formatted Text
46 4F 52 4D nn nn nn nn 49 4C 42 4D FORM....ILBM 0, any ilbm, lbm, ibm, iff IFF Interleaved Bitmap Image
46 4F 52 4D nn nn nn nn 53 4D 55 53 FORM....SMUS 0, any smus, smu, mus, iff IFF Simple Musical Score
46 4F 52 4D nn nn nn nn 59 55 56 4E FORM....YUVN 0, any yuvn, yuv, iff IFF YUV Image
47 49 46 38 37 61 or 47 49 46 38 39 61 GIF87a or GIF89a 0 gif Image file encoded in the Graphics Interchange Format (GIF)[2]
49 44 33 ID3 0 mp3 MP3 file with an ID3v2 container
49 49 2A 00 (little endian format) or 4D 4D 00 2A (big endian format) II*. or MM.* 0 tif, tiff Tagged Image File Format
4B 44 4D KDM 0 vmdk VMDK files [14][15]
4D 54 68 64 MThd 0 mid, midi MIDI sound file[12]
4D 5A MZ 0 exe DOS MZ executable file format and its descendants (including NE and PE)
4E 45 53 1A NES 0 nes Nintendo Entertainment System ROM file [25]
4F 67 67 53 OggS 0 ogg, oga, ogv Ogg, an open source media container format
50 4B 03 04, 50 4B 05 06 (empty archive) or 50 4B 07 08 (spanned archive) PK.. 0 zip, jar, odt, ods, odp, docx, xlsx, pptx, apk zip file format and formats based on it, such as JAR, ODF, OOXML
50 4D 4F 43 43 4D 4F 43 PMOCCMOC 0 dat Windows Files And Settings Transfer Repository[22] See also USMT 3.0 (Win XP)[23] and USMT 4.0 (Win 7)[24] User Guides
52 49 46 46 nn nn nn nn 57 41 56 45 RIFF....WAVE 0 wav Waveform Audio File Format
52 61 72 21 1A 07 00 Rar!... 0 rar RAR archive version 1.50 onwards[3]
52 61 72 21 1A 07 01 00 Rar!.... 0 rar RAR archive version 5.0 onwards[4]
53 44 50 58 (big endian format) or 58 50 44 53 (little endian format) SDPX or XPDS 0 dpx SMPTE DPX image
53 49 4D 50 4C 45 20 20 3D 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 54 SIMPLE = T 0 fits Flexible Image Transport System (FITS)[10]
64 65 78 0A 30 33 35 00 dex.035. 0 dex Dalvik Executable
66 4C 61 43 fLaC 0 flac Free Lossless Audio Codec[11]
75 73 74 61 72 00 30 30 or 75 73 74 61 72 20 20 00 ustar.00 or ustar  . 257 tar tar archive[26]
76 2F 31 01 v/1. 0 exr OpenEXR image
78 01 73 0D 62 62 60 x.s.bb` 0 dmg Apple Disk Image file
78 61 72 21 xar! 0 xar eXtensible ARchive format[21]
7F 45 4C 46 .ELF 0   Executable and Linkable Format
80 2A 5F D7 .*_. 0 cin Kodak Cineon image
89 50 4E 47 0D 0A 1A 0A .PNG.... 0 png Image encoded in the Portable Network Graphics format[5]
BE BA FE CA ... 0 DBA Palm Desktop Calendar Archive
CA FE BA BE Êþº¾ 0 class Java class file, Mach-O Fat Binary
CE FA ED FE ........ 0   Mach-O binary (reverse byte ordering scheme, 32-bit)[6]
CF FA ED FE ........ 0   Mach-O binary (reverse byte ordering scheme, 64-bit)[7]
D0 CF 11 E0 A1 B1 1A E1 ÐÏ.à¡±.á 0 doc, xls, ppt Microsoft Office documents[13]
EF BB BF  0   UTF-8 encoded Unicode byte order mark, commonly seen in text files.
FE ED FA CE ........ 0 or typically 0x1000   Mach-O binary (32-bit)
FE ED FA CF ........ 0 or typically 0x1000   Mach-O binary (64-bit)
FF D8 FF ÿØÿ 0 jpg, jpeg JPEG
FF FB ÿû 0 mp3 MPEG-1 Layer 3 file without an ID3 tag or with an ID3v1 tag (which is appended at the end of the file)
FF FE .. 0   Byte-order mark for text file encoded in little-endian 16-bit Unicode Transfer Format
FF FE 00 00 .... 0   Byte-order mark for text file encoded in little-endian 32-bit Unicode Transfer Format
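
Several rows in the list above need more than a prefix comparison: some signatures sit at a non-zero offset (tar at 257), and some contain `nn` bytes that can take any value (RIFF....WAVE). Below is a minimal table-driven sketch in Python covering only a handful of rows. The names `SIGNATURES`, `matches`, and `detect` are made up for illustration, and `None` is an assumed convention for a wildcard byte.

```python
# Each entry: (offset, pattern, label). A None in the pattern matches any byte,
# which models the "nn" placeholders in the list.
SIGNATURES = [
    (0, list(b"\x89PNG\r\n\x1a\n"), "png"),
    (0, list(b"%PDF"), "pdf"),
    (0, list(b"RIFF") + [None] * 4 + list(b"WAVE"), "wav"),
    (0, list(b"PK\x03\x04"), "zip"),
    # Simplification: only the "ustar" prefix shared by both tar variants is checked.
    (257, list(b"ustar"), "tar"),
]


def matches(data, offset, pattern):
    """Check `pattern` against `data` starting at `offset`; None is a wildcard."""
    window = data[offset:offset + len(pattern)]
    if len(window) < len(pattern):
        return False  # not enough bytes to decide
    return all(p is None or p == b for p, b in zip(pattern, window))


def detect(data):
    """Return the label of the first matching signature, or None."""
    for offset, pattern, label in SIGNATURES:
        if matches(data, offset, pattern):
            return label
    return None
```

The caller has to pass enough leading bytes to cover the largest offset plus pattern length, which is 262 bytes for the entries above. A row such as ISO9660 at 0x8001 would need far more, so a real tool would probably seek to the offset instead of reading one large buffer.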

References:

1. http://en.wikipedia.org/wiki/List_of_file_signatures

2. http://www.astro.keele.ac.uk/oldusers/rno/Computing/File_magic.html#Exec

Reposted from blog.csdn.net/richerg85/article/details/39320549