File type identification: magic numbers

Introduction to magic numbers:

Most of us identify a file's type by its suffix, for example teacher Cang.mp4, teacher Bo.avi, or Maria.jpg. The suffix is not a reliable indicator, because anyone can change it by hand. Another way to identify the file type is to look for a known marker at a fixed position in the file, usually in its header. We call this marker a magic number. It is not infallible either, but it is considerably more reliable than the suffix.
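As a minimal sketch of the idea (in Python; the function name `looks_like_pdf` and the temporary file are illustrative assumptions, not part of the original post), the check below ignores the suffix entirely and looks only at the first bytes of the file. The magic number used is the pdf entry from the table in this post.

```python
import os
import tempfile

PDF_MAGIC = b"%PDF-"  # pdf entry of the table: start offset 0, end offset 5


def looks_like_pdf(path):
    """Return True if the file content starts with the PDF magic number."""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


if __name__ == "__main__":
    # A PDF header saved under a misleading ".jpg" suffix.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Maria.jpg")
        with open(path, "wb") as f:
            f.write(b"%PDF-1.4\n")
        print(looks_like_pdf(path))  # True
```

The suffix says jpg, but the content check still reports a PDF, which is the whole point of using the magic number.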

Common file type magic number table:

 

| Category | File type | Magic number | Start offset | End offset |
|---|---|---|---|---|
| Executables (exe, dll) | Windows executable | "MZ" | 0 | 2 |
| | Linux executable | "\x7F\x45\x4c\x46" | 0 | 4 |
| | Java class | "\xCA\xFE\xBA\xBE" | 0 | 4 |
| Documents | Office2003, WPS | "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1" | 0 | 8 |
| | Office2003, WPS | "WPS2001" | 2 | 9 |
| | Office2003, WPS | "\x64\x6f\x63\x50\x72\x6f\x70\x73" | 30 | 38 |
| | Office2007 | "_Types\x5d\x2exml" | 38 | 49 |
| | mdb | "\x00\x01\x00\x00Standard Jet DB" | 0 | 20 |
| | accdb | "\x00\x01\x00\x00Standard ACE DB" | 0 | 20 |
| | rtf | "{\\rtf" | 0 | 5 |
| | hlp | "\x3F\x5F\x03\x00" | 0 | 4 |
| | hlp | "\x4C\x4E\x02\x00" | 0 | 4 |
| | chm | "ITSF" | 0 | 4 |
| | eml | "From:" | 0 | 5 |
| | reg | "REGEDIT" | 0 | 7 |
| | reg | "Windows Registry" | 0 | 16 |
| | reg | "R\x00e\x00g\x00i\x00s\x00t\x00r\x00y" | 18 | 33 |
| | pdf | "%PDF-" | 0 | 5 |
| | eps | "%!PS-Adobe" | 0 | 10 |
| | Adobe FrameMaker | "<MakerFile" | 0 | 10 |
| Compressed files | rar | "Rar!" | 0 | 4 |
| | zip | "PK\003\004" | 0 | 4 |
| | zip | "PK00PK\003\004" | 0 | 8 |
| | 7z | "7z\xBC\xAF\x27" | 0 | 5 |
| | arj | "\x60\xEA" | 0 | 2 |
| | bz2 | "BZh" | 0 | 3 |
| | gzip | "\037\213" | 0 | 2 |
| | gzip | "\x1F\x8B\x08\x08" | 0 | 4 |
| | tz | "TZ" | 0 | 2 |
| | compress | "SZDD" | 0 | 4 |
| | cab | "MSCF" | 0 | 4 |
| | rpm | "\xED\xAB\xEE\xDB" | 0 | 4 |
| | ace | "\x2A\x2A\x41\x43\x45\x2A\x2A" | 7 | 14 |
| Images | bmp | "BM" | 0 | 2 |
| | jpeg | "\377\330\377" | 0 | 3 |
| | jpeg | "JFIF" | 6 | 10 |
| | jpeg | "Exif" | 6 | 10 |
| | png | "\x89PNG" | 0 | 4 |
| | gif | "GIF" | 0 | 3 |
| | swf | "FWS" | 0 | 3 |
| | swf | "CWS" | 0 | 3 |
| | flp | "FLhd" | 0 | 4 |
| | flp | "flash_project" | 4 | 17 |
| | dwg | "AC10" | 0 | 4 |
| | tiff | "MM\x00\x2B" | 0 | 4 |
| | tiff | "II*" | 0 | 3 |
| Audio and video | rm | ".RMF" | 0 | 4 |
| | rm | ".RMX" | 0 | 4 |
| | rm | "PROP" | 18 | 22 |
| | flv | "FLV" | 0 | 3 |
| | avi | "AVI LIST" | 8 | 16 |
| | wma | "\x30\x26\xb2\x75\x8e\x66\xcf\x11" | 0 | 8 |
| | wav | "WAVEfmt" | 8 | 15 |
| | mp4 | "ftypmp41" | 4 | 12 |
| | mp4 | "ftypmp42" | 4 | 12 |
| | mp4 | "ftypisom" | 4 | 12 |
| | 3gp | "ftyp3gp4" | 4 | 12 |
| | 3gp | "ftypmmp4" | 4 | 12 |
| | mp3 | "\xff\xff" | 11 | 1 |
| | mp3 | "\x49\x44\x33" | 0 | 3 |
| | mtv | "AMV" | 0 | 3 |
| | m4a | "ftypM4A\x20" | 4 | 12 |
| | m4v | "ftypM4V\x20" | 4 | 12 |
| | au | "dns" | 0 | 3 |
| | au | "\x2E\x73\x6E\x64" | 0 | 4 |
| | mdi | "EP" | 0 | 2 |
| Other | vhd | "conectix" | 0 | 8 |
| | gho | "\xFE\xEF\x01" | 0 | 3 |
| | nri | "\x0ENeroISO" | 0 | 8 |
| | pf | "SCCA" | 4 | 8 |
| | hqx | "(This file must be converted with BinHex 4.0)" | 0 | 45 |
| | Unknown | | -1 | 0 |
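A table like this maps naturally onto a list of (label, magic bytes, start offset) entries. The following is only a sketch under some assumptions: the names `SIGNATURES`, `identify`, and `identify_file` are made up for illustration, only a handful of rows are copied over, and the end offset is taken to be the start offset plus the length of the magic number, which is what most rows suggest.

```python
# Each entry: (label, magic bytes, start offset).
# Only a few rows of the table are reproduced; end offset = start + len(magic).
SIGNATURES = [
    ("exe/dll", b"MZ", 0),
    ("elf", b"\x7fELF", 0),
    ("pdf", b"%PDF-", 0),
    ("zip", b"PK\x03\x04", 0),
    ("gzip", b"\x1f\x8b", 0),
    ("png", b"\x89PNG", 0),
    ("jpeg", b"\xff\xd8\xff", 0),
    ("gif", b"GIF", 0),
    ("avi", b"AVI LIST", 8),
    ("wav", b"WAVEfmt", 8),
    ("mp4", b"ftypisom", 4),
    ("ace", b"**ACE**", 7),
]


def identify(data):
    """Return the label of the first matching signature, or "unknown"."""
    for label, magic, start in SIGNATURES:
        if data[start:start + len(magic)] == magic:
            return label
    return "unknown"


def identify_file(path, header_size=64):
    """Read the first header_size bytes of a file and identify them."""
    with open(path, "rb") as f:
        return identify(f.read(header_size))


if __name__ == "__main__":
    print(identify(b"RIFF\x24\x00\x00\x00WAVEfmt "))  # wav
    print(identify(b"\x89PNG\r\n\x1a\n"))             # png
    print(identify(b"hello"))                         # unknown
```

The first match wins, so order matters when one magic number is a prefix of another (for example the 2-byte and 4-byte gzip rows); longer, more specific entries would need to come first. A 64-byte header is enough here because the largest end offset in the table is 49.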

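  • Note 1: Some entries in the table above are written like "\xAA", which means the hexadecimal byte 0xAA; the byte is a non-printable character, so it is written in hexadecimal. Others are written like "\223", which means the octal number 223, for the same reason.
  • Note 2: Representing the magic numbers as strings has several advantages: 1) the table is easy to extend; 2) the values are stored as string constants in the string constant area; 3) whichever notation is used, the comparison is still done on the binary bytes, so matching behaves exactly as it would with hexadecimal values alone.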
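The two notes can be checked directly. The snippet below is a small illustration in Python (the variable names are arbitrary); its byte literals accept the same "\x.." and octal "\ooo" escapes used in the table, and both spellings produce identical bytes.

```python
# Octal and hexadecimal escapes are two spellings of the same bytes.
gzip_octal = b"\037\213"
gzip_hex = b"\x1f\x8b"
assert gzip_octal == gzip_hex

jpeg_octal = b"\377\330\377"
jpeg_hex = b"\xff\xd8\xff"
assert jpeg_octal == jpeg_hex

# Printable bytes can be written as characters or as escapes.
assert b"\x7f\x45\x4c\x46" == b"\x7fELF"
assert b"\x64\x6f\x63\x50\x72\x6f\x70\x73" == b"docProps"

print(gzip_octal.hex(), jpeg_octal.hex())  # 1f8b ffd8ff
```

Because the comparison happens on the raw bytes, the notation chosen in the table is purely a matter of readability.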

 

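How to view a file's magic number:

Many tools can do this. I usually use UltraEdit; Notepad++ also works, but you need to install a hex editor plugin for it.

For example:

A pdf file:

A docx file created by WPS: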
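If no hex editor is at hand, the leading bytes of a file can also be printed with a few lines of Python. This is a rough sketch, not a replacement for the tools above; `hexdump` and `dump_header` are hypothetical helper names, and the layout (offset, hex bytes, printable characters) simply imitates a typical hex view.

```python
def hexdump(data, width=16):
    """Format bytes as an offset, a hex column, and a printable-ASCII column."""
    pad = width * 3 - 1  # width of a full hex column, used for alignment
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)


def dump_header(path, count=64):
    """Return a hex dump of the first count bytes of the file at path."""
    with open(path, "rb") as f:
        return hexdump(f.read(count))


if __name__ == "__main__":
    # The start of a PDF file: the hex column begins with 25 50 44 46 2d.
    print(hexdump(b"%PDF-1.4\n"))
```

Calling `dump_header` on a real file shows the same bytes a hex editor would, so the entries in the table can be compared against the start of the dump.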

 

 

 


Origin blog.csdn.net/s2603898260/article/details/103648822