Metadata Quest

I. A First Look at Metadata

     Source files written in a language that targets the CLR (C++/CLI, C#, VB, and so on) can be compiled, by a Microsoft compiler or by one you write yourself, into a managed module. A managed module is a standard PE file; its structure is described in the article "In-Depth Understanding of the CLR Loading Process". Metadata lives in the same section of the PE file as the IL code, and the compiler always generates metadata and IL together and keeps them in sync. This article looks at what metadata contains, using the following code as an example:

namespace HelloWorld
{
    using System;
    using System.Runtime.InteropServices;

    public class Hello
    {
        [DllImport("User32.dll")]
        public static extern int MessageBox(int a,string b,string c, int d);

        public static void Main()
        {
            int num = MessageBox(0, "2", "3", 4);
            Console.Write("Please enter your name: ");
            string str = Console.ReadLine();
            num = MessageBox(0, str, "Welcome to use IL Assembly", 1);
        }
    }
}

     I think the content of metadata can be divided into two parts: "macro content" and "micro content".

     The macro content is the manifest. It records the assembly's configuration; the external modules and assemblies that the assembly depends on (custom assemblies or assemblies in the GAC; every .NET program depends on mscorlib.dll, so that assembly always appears in the manifest); the identity of the current module (an assembly must have at least one module); the assembly's public key; the image base; and so on, as shown:

Figure 1 Manifest Content

     The micro content covers types, methods, and attributes. For a type, for example, it records the type name, the fields and methods the type contains, the parameters of each method, and similar details.

II. Analyzing Metadata

     Logically, metadata is made up of a number of streams, which fall into two categories: metadata heaps and metadata tables.

     The metadata heaps are:

     1) String heap: stores zero-terminated UTF-8 strings. The first string in the heap is the empty string, so the first byte of the heap must be 0, and the last byte of the heap must also be 0;

     2) GUID heap: stores 16-byte binary objects. Because the length is fixed, no extra bytes are needed to mark the length or the end of an item;

     3) Blob heap: stores binary objects of arbitrary size. Each object is preceded by its length (an unsigned integer) in compressed form: if length <= 0x7F the length is stored in 1 byte, if 0x80 <= length <= 0x3FFF it is stored in 2 bytes, and if 0x4000 <= length <= 0x1FFFFFFF it is stored in 4 bytes.
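     The length prefix of a Blob heap entry can be decoded roughly as in the sketch below. The text above gives only the three size ranges; the marker bits used here (a leading 0 bit for the 1-byte form, 10 for the 2-byte form, 110 for the 4-byte form, with the value stored high byte first) are taken from ECMA-335 Partition II, section 23.2. The class and method names are hypothetical.

```csharp
using System;

static class CompressedLength
{
    // Decodes the compressed unsigned integer that starts at data[pos].
    // Returns the value and reports how many bytes the prefix occupied.
    public static uint Decode(byte[] data, int pos, out int prefixSize)
    {
        byte b0 = data[pos];
        if ((b0 & 0x80) == 0)            // 0xxxxxxx: 1 byte, 0x00..0x7F
        {
            prefixSize = 1;
            return b0;
        }
        if ((b0 & 0xC0) == 0x80)         // 10xxxxxx: 2 bytes, 0x80..0x3FFF
        {
            prefixSize = 2;
            return (uint)(((b0 & 0x3F) << 8) | data[pos + 1]);
        }
        if ((b0 & 0xE0) == 0xC0)         // 110xxxxx: 4 bytes, 0x4000..0x1FFFFFFF
        {
            prefixSize = 4;
            return (uint)(((b0 & 0x1F) << 24) | (data[pos + 1] << 16)
                        | (data[pos + 2] << 8) | data[pos + 3]);
        }
        throw new FormatException("Invalid compressed length prefix.");
    }

    static void Main()
    {
        int size;
        Console.WriteLine(Decode(new byte[] { 0x1E }, 0, out size));                    // 30
        Console.WriteLine(Decode(new byte[] { 0x80, 0x80 }, 0, out size));              // 128
        Console.WriteLine(Decode(new byte[] { 0xC0, 0x00, 0x40, 0x00 }, 0, out size));  // 16384
    }
}
```

     Every blob in the sample assembly is shorter than 0x80 bytes, so only the 1-byte form occurs in it.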

     We can inspect the streams directly by opening the assembly built from the code above in ILDASM.

     1. In the "View" menu, click "Headers", as shown:

Figure 2 Metadata header

     2. In the "View" menu, open "MetaInfo", select "Raw:Header,Schema,Rows" and "Raw:Heaps", then click "Show!", as shown:

Figure 3 Metadata information

Figure 4 Metadata Information

     3. Open the assembly in UltraEdit, as shown:

Figure 5 PE file information

     As Figure 2 shows, the metadata header consists of two parts: the storage signature and the storage header.

     Storage signature structure:

     1) Signature, type DWORD, value 0x424A5342. Where is this value stored? Figure 3 gives the start address of the metadata header, 0x00001098, and Figure 5 shows the value at addresses 0x00001098 through 0x0000109B;

     2) Major Version, type WORD;

     3) Minor Version, type WORD;

     4) Extra Data Offset, a reserved field of type DWORD with the value 0;

     5) Version string length, type DWORD;

     6) Version string, a BYTE array; here it is v2.0.50727.
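     The six fields above can be read from raw bytes as in the following sketch. The byte array is a hypothetical sample built for illustration, not a dump of the real file: the version numbers (1 and 1) and the string length (12, the string padded with zero bytes) are assumed values. Because multi-byte fields are stored low byte first, the signature 0x424A5342 shows up in a hex editor as 42 53 4A 42, the ASCII characters "BSJB".

```csharp
using System;
using System.Text;

static class StorageSignatureDemo
{
    // Little-endian readers: multi-byte fields are stored low byte first.
    public static ushort ReadU16(byte[] d, int p)
    {
        return (ushort)(d[p] | (d[p + 1] << 8));
    }

    public static uint ReadU32(byte[] d, int p)
    {
        return (uint)(d[p] | (d[p + 1] << 8) | (d[p + 2] << 16) | (d[p + 3] << 24));
    }

    static void Main()
    {
        // Hypothetical sample laid out as the six fields of the storage signature.
        byte[] header =
        {
            0x42, 0x53, 0x4A, 0x42,     // Signature: 0x424A5342, ASCII "BSJB"
            0x01, 0x00,                 // Major Version (assumed sample value)
            0x01, 0x00,                 // Minor Version (assumed sample value)
            0x00, 0x00, 0x00, 0x00,     // Extra Data Offset, reserved
            0x0C, 0x00, 0x00, 0x00,     // Version string length (assumed: 12)
            (byte)'v', (byte)'2', (byte)'.', (byte)'0', (byte)'.', (byte)'5',
            (byte)'0', (byte)'7', (byte)'2', (byte)'7', 0x00, 0x00
        };

        uint signature = ReadU32(header, 0);
        ushort major = ReadU16(header, 4);
        ushort minor = ReadU16(header, 6);
        uint extraData = ReadU32(header, 8);
        int length = (int)ReadU32(header, 12);
        string version = Encoding.UTF8.GetString(header, 16, length).TrimEnd('\0');

        Console.WriteLine("Signature:  0x{0:X8}", signature);
        Console.WriteLine("Version:    {0}.{1}", major, minor);
        Console.WriteLine("Extra data: {0}", extraData);
        Console.WriteLine("String:     {0} ({1} bytes)", version, length);
    }
}
```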

     Storage header structure:

     1) Flags, a reserved field of type BYTE with the value 0 (one padding byte follows it);

     2) The number of streams, type WORD; this PE file contains 5 streams.

     The stream headers come next. This PE file contains 5 streams: the #~ stream, the #Strings stream, the #US stream, the #GUID stream, and the #Blob stream:

Table 1 Stream headers

Offset              Size                Name
0x0000006C (108)    0x00000150 (336)    #~
0x000001BC (444)    0x00000148 (328)    #Strings
0x00000304 (772)    0x00000074 (116)    #US
0x00000378 (888)    0x00000010 (16)     #GUID
0x00000388 (904)    0x0000006C (108)    #Blob
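     The offsets in Table 1 are relative to the start of the metadata header, so the address of each stream in the file is that start address plus the offset. The sketch below works this out for all five streams, using the start address 0x00001098 and the values from Table 1; the class and method names are hypothetical.

```csharp
using System;

static class StreamAddresses
{
    // Address of a stream: start of the metadata header plus the stream's offset.
    public static uint AddressOf(uint metadataStart, uint streamOffset)
    {
        return metadataStart + streamOffset;
    }

    static void Main()
    {
        const uint metadataStart = 0x00001098;   // start of the metadata header

        // Name, offset, and size of each stream, copied from Table 1.
        string[] names   = { "#~", "#Strings", "#US", "#GUID", "#Blob" };
        uint[]   offsets = { 0x6C, 0x1BC, 0x304, 0x378, 0x388 };
        uint[]   sizes   = { 0x150, 0x148, 0x74, 0x10, 0x6C };

        for (int i = 0; i < names.Length; i++)
        {
            uint start = AddressOf(metadataStart, offsets[i]);
            uint end = start + sizes[i];         // first byte after the stream
            Console.WriteLine("{0,-9} 0x{1:X8} .. 0x{2:X8}", names[i], start, end);
        }
    }
}
```

     Each stream ends exactly where the next one begins (for example, 0x6C + 0x150 = 0x1BC), so the five streams are laid out back to back.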

     1. #Strings Stream

     This stream is a string heap. Figure 5 shows that the metadata starts at address 0x00001098, and Table 1 gives the offset of the #Strings stream as 0x000001BC, so the content of the stream can be found at address 0x00001254. It can also be viewed directly in ILDASM (Figure 4). #Strings stores the names of metadata items; it begins with a 0 byte and ends with a 0 byte, as Table 2 shows:

Table 2 Content of #Strings

Offset    Data
0         (empty string)
1         <Module>
10        HelloWorld.exe
25        Hello
31        HelloWorld
42        mscorlib
51        System
58        Object
65        MessageBox
76        Main
81        .ctor
87        a
89        b
91        c
93        d
95        System.Diagnostics
114       DebuggableAttribute
134       DebuggingModes
149       System.Runtime.CompilerServices
181       CompilationRelaxationsAttribute
213       RuntimeCompatibilityAttribute
243       System.Runtime.InteropServices
274       DllImportAttribute
293       User32.dll
304       Console
312       Write
318       ReadLine
327       (empty string)
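     A name in #Strings is found by its offset: reading starts at that byte and stops at the next 0 byte. The sketch below rebuilds the first few entries of Table 2 as a byte array and reads them back by offset; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class StringsHeapDemo
{
    // Reads the zero-terminated UTF-8 string that starts at the given heap offset.
    public static string ReadString(byte[] heap, int offset)
    {
        int end = offset;
        while (heap[end] != 0) end++;
        return Encoding.UTF8.GetString(heap, offset, end - offset);
    }

    static void Main()
    {
        // The first entries of Table 2, rebuilt as bytes for illustration.
        byte[] heap = Encoding.UTF8.GetBytes(
            "\0<Module>\0HelloWorld.exe\0Hello\0HelloWorld\0");

        Console.WriteLine(ReadString(heap, 1));    // <Module>
        Console.WriteLine(ReadString(heap, 10));   // HelloWorld.exe
        Console.WriteLine(ReadString(heap, 25));   // Hello
        Console.WriteLine(ReadString(heap, 31));   // HelloWorld
    }
}
```

     Each offset in Table 2 is the previous offset plus the previous name's length plus one terminating byte, for example 1 + 8 + 1 = 10 for the entry after <Module>.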

     2. #US Stream

     This stream is a blob heap that can store user-defined strings or binary objects. Starting at address 0x0000139C, it contains the data in Table 3:

Table 3 Content of #US

Offset    Byte Length    Data
0         0x00 (0)
1         0x03 (3)       "2"
5         0x03 (3)       "3"
9         0x31 (49)      "Please enter your name: "
59        0x35 (53)      "Welcome to use IL Assembly"
113       0x00 (0)
114       0x00 (0)
115       0x00 (0)
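     The byte lengths in Table 3 are odd because, per ECMA-335 Partition II, section 24.2.4, each #US entry holds the string in UTF-16 followed by one extra flag byte: the 24-character prompt takes 24 x 2 + 1 = 49 bytes. The sketch below decodes the two shortest entries under that layout. It assumes a 1-byte length prefix, which holds for every entry in Table 3; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class UserStringDemo
{
    // Decodes the #US entry at the given offset.
    // Assumes a 1-byte length prefix (true for lengths up to 0x7F).
    public static string ReadUserString(byte[] heap, int offset)
    {
        int length = heap[offset];
        if (length == 0) return string.Empty;
        // The last byte of the entry is a flag byte, not part of the text.
        return Encoding.Unicode.GetString(heap, offset + 1, length - 1);
    }

    static void Main()
    {
        // Offsets 0 through 8 of Table 3, rebuilt as bytes for illustration.
        byte[] heap =
        {
            0x00,                       // offset 0: empty entry
            0x03, 0x32, 0x00, 0x00,     // offset 1: length 3, "2" in UTF-16, flag byte
            0x03, 0x33, 0x00, 0x00      // offset 5: length 3, "3" in UTF-16, flag byte
        };

        Console.WriteLine(ReadUserString(heap, 1));   // 2
        Console.WriteLine(ReadUserString(heap, 5));   // 3
    }
}
```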

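     3. #GUID Stream

     This stream is a GUID heap that stores globally unique identifiers one after another. Starting at its address (the metadata start address plus the offset 0x00000378 from Table 1), it contains the data in Table 4:

Table 4 Content of #GUID

Offset    Data
0         {ec04bb0c-8238-4d78-b80e-4415e508b3b5}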
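     Because every entry is exactly 16 bytes, a GUID is located by index rather than by scanning. The sketch below assumes the 1-based indexing that ECMA-335 specifies for the GUID heap (index 1 is the entry at byte offset 0, and index 0 means no GUID). The 16 bytes are derived from the GUID in Table 4 using the byte layout that System.Guid expects; they are not copied from a dump of the file. The class and method names are hypothetical.

```csharp
using System;

static class GuidHeapDemo
{
    // Returns the GUID at a 1-based heap index; index 0 means "no GUID".
    public static Guid ReadGuid(byte[] heap, int index)
    {
        if (index == 0) return Guid.Empty;
        byte[] raw = new byte[16];
        Array.Copy(heap, (index - 1) * 16, raw, 0, 16);
        return new Guid(raw);
    }

    static void Main()
    {
        // Bytes for the GUID in Table 4. Guid(byte[]) reads the first three
        // fields low byte first and the last eight bytes in order.
        byte[] heap =
        {
            0x0C, 0xBB, 0x04, 0xEC, 0x38, 0x82, 0x78, 0x4D,
            0xB8, 0x0E, 0x44, 0x15, 0xE5, 0x08, 0xB3, 0xB5
        };

        Console.WriteLine(ReadGuid(heap, 1));   // ec04bb0c-8238-4d78-b80e-4415e508b3b5
    }
}
```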

     4. #Blob Stream

     This stream is a blob heap that stores the internal binary objects of the metadata. For example, where Figure 1 declares the reference to the external assembly mscorlib.dll, the default value of .publickeytoken is (B7 7A 5C 56 19 34 E0 89); that value is stored in the #Blob stream. The content is shown in Table 5:

Table 5 Content of #Blob

Offset    Byte Length    Data
0         0x00 (0)
1         0x08 (8)       B7-7A-5C-56-19-34-E0-89
10        0x07 (7)       00-04-08-08-0E-0E-08
18        0x03 (3)       00-00-01
22        0x03 (3)       20-00-01
26        0x05 (5)       20-01-01-11-0D
32        0x04 (4)       20-01-01-08
37        0x04 (4)       20-01-01-0E
42        0x04 (4)       00-01-01-0E
47        0x03 (3)       00-00-0E
51        0x04 (4)       07-02-08-0E
56        0x08 (8)       01-00-07-01-00-00-00-00
65        0x08 (8)       01-00-08-00-00-00-00-00
74        0x1E (30)      01-00-01-00-54-02-16-57-72-61-70-4E-6F-6E-45-78-63-65-70-74-69-6F-6E-54-68-72-6F-77-73-01
105       0x00 (0)
106       0x00 (0)
107       0x00 (0)
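     Several blobs in Table 5 are method signatures. The sketch below decodes the simplest ones, assuming the layout from ECMA-335 Partition II, section 23.2: a calling-convention byte (0x20 marks an instance method), a parameter count, the return type, then one type per parameter. It knows only the three element type codes that these blobs use (0x01 void, 0x08 int32, 0x0E string) and assumes a 1-byte parameter count. Matching a blob to a particular method is my reading of the bytes, not something the tables state; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class MethodSigDemo
{
    // A few element type codes from ECMA-335; only what these blobs need.
    static readonly Dictionary<byte, string> ElementTypes = new Dictionary<byte, string>
    {
        { 0x01, "void" }, { 0x08, "int" }, { 0x0E, "string" }
    };

    // Describes a simple method signature blob (without its length prefix).
    // Assumes fewer than 128 parameters and only the element types above.
    public static string Describe(byte[] sig)
    {
        bool hasThis = (sig[0] & 0x20) != 0;     // 0x20 marks an instance method
        int paramCount = sig[1];
        string returnType = ElementTypes[sig[2]];
        var parameters = new List<string>();
        for (int i = 0; i < paramCount; i++)
            parameters.Add(ElementTypes[sig[3 + i]]);
        return (hasThis ? "instance " : "static ") + returnType
             + " (" + string.Join(", ", parameters.ToArray()) + ")";
    }

    static void Main()
    {
        // Offset 10: consistent with MessageBox(int, string, string, int).
        Console.WriteLine(Describe(new byte[] { 0x00, 0x04, 0x08, 0x08, 0x0E, 0x0E, 0x08 }));
        // Offset 18: consistent with Main().
        Console.WriteLine(Describe(new byte[] { 0x00, 0x00, 0x01 }));
        // Offset 22: consistent with a parameterless .ctor.
        Console.WriteLine(Describe(new byte[] { 0x20, 0x00, 0x01 }));
        // Offset 32: an instance method taking one int.
        Console.WriteLine(Describe(new byte[] { 0x20, 0x01, 0x01, 0x08 }));
    }
}
```

     The first call prints "static int (int, string, string, int)", which matches the P/Invoke declaration of MessageBox in the sample code.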

     5. #~ Stream

     This stream can be divided into two parts: the header and the metadata tables.

     (1) Header

     Figure 3 contains this line: Metadata header: 2.0, heaps: 0x00, rid: 0x01, valid: 0x0000000914021547, sorted: 0x000016003301fa00. The same content can be seen at address 0x00001098 + 0x0000006c = 0x00001104, as shown:

Figure 6 Header

     The header consists of the following parts:

     1) a 4-byte reserved field, always 0;

     2) a 1-byte major version field (the version of the table schema, which presumably follows the CLR version);

     3) a 1-byte minor version field;

     4) a 1-byte heap sizes field; 0 means that heap indexes are 2 bytes wide;

     5) a 1-byte reserved field, always 1;

     6) an 8-byte bit mask; a bit set to 1 means that the corresponding metadata table is present;

     7) an 8-byte bit mask; a bit set to 1 means that the corresponding table must be sorted by its primary key, as listed in Table 6:

Table 6 Sorted Metadata Tables

Table                     Primary key        Secondary key
ClassLayout               Parent
Constant                  Parent
CustomAttribute           Parent
DeclSecurity              Parent
FieldLayout               Field
FieldMarshal              Parent
FieldRVA                  Field
GenericParam              Owner              Number
GenericParamConstraint    Owner
ImplMap                   MemberForwarded
InterfaceImpl             Class              Interface
MethodImpl                Class
MethodSemantics           Association
NestedClass               NestedClass

     In addition, a parent class always has a smaller index in the TypeDef table than its child class (the parent is defined before the child).

     8) n 4-byte unsigned integers (n is the number of metadata tables present), giving the number of rows in each of those tables. The fields above are summarized in Table 7:

Table 7 Content of the #~ stream header

Field        Value
Reserved     0x00000000 (0)
Major        0x02 (2)
Minor        0x00 (0)
HeapSizes    0x00 (0)
Reserved     0x01 (1)
MaskValid    0x0000000914021547 (0000 0000 0000 0000 0000 0000 0000 1001 0001 0100 0000 0010 0001 0101 0100 0111)
Sorted       0x000016003301FA00 (0000 0000 0000 0000 0001 0110 0000 0000 0011 0011 0000 0001 1111 1010 0000 0000)
Rows         1, 7, 2, 3, 4, 7, 3, 1, 1, 1, 1, 1
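     The fixed part of the header is 24 bytes, and the number of row counts that follow equals the number of bits set in MaskValid. The sketch below rebuilds those 24 bytes from the values in Table 7 (stored low byte first) and reads them back. The meaning of the individual HeapSizes bits (0x01 for #Strings, 0x02 for #GUID, 0x04 for #Blob, each widening that heap's index to 4 bytes) comes from ECMA-335 rather than from the text above; the class and method names are hypothetical.

```csharp
using System;

static class TableStreamHeaderDemo
{
    // Reads an 8-byte little-endian value.
    public static ulong ReadU64(byte[] d, int p)
    {
        ulong v = 0;
        for (int i = 7; i >= 0; i--) v = (v << 8) | d[p + i];
        return v;
    }

    // Number of bits set in the mask, i.e. the number of tables present.
    public static int CountBits(ulong mask)
    {
        int n = 0;
        for (; mask != 0; mask >>= 1) n += (int)(mask & 1);
        return n;
    }

    static void Main()
    {
        // The first 24 bytes of the #~ stream, rebuilt from the values in Table 7.
        byte[] header =
        {
            0x00, 0x00, 0x00, 0x00,                           // Reserved
            0x02, 0x00,                                       // Major, Minor
            0x00,                                             // HeapSizes
            0x01,                                             // Reserved
            0x47, 0x15, 0x02, 0x14, 0x09, 0x00, 0x00, 0x00,   // MaskValid
            0x00, 0xFA, 0x01, 0x33, 0x00, 0x16, 0x00, 0x00    // Sorted
        };

        byte heapSizes = header[6];
        ulong valid = ReadU64(header, 8);
        ulong sorted = ReadU64(header, 16);

        // Each HeapSizes bit widens one heap's index from 2 to 4 bytes.
        Console.WriteLine("#Strings index: {0} bytes", (heapSizes & 0x01) != 0 ? 4 : 2);
        Console.WriteLine("#GUID index:    {0} bytes", (heapSizes & 0x02) != 0 ? 4 : 2);
        Console.WriteLine("#Blob index:    {0} bytes", (heapSizes & 0x04) != 0 ? 4 : 2);
        Console.WriteLine("MaskValid: 0x{0:X16}", valid);
        Console.WriteLine("Sorted:    0x{0:X16}", sorted);
        Console.WriteLine("Row counts that follow: {0}", CountBits(valid));   // 12
    }
}
```

     MaskValid has 12 bits set, which agrees with the 12 numbers in the Rows field of Table 7.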

     The metadata tables follow immediately after the header. Version 2.0 defines 45 metadata tables, listed in order in Table 8; see ECMA-335 for a detailed description:

Table 8 The 45 metadata tables

Token    Name
0x00     Module
0x01     TypeRef
0x02     TypeDef
0x03     FieldPtr
0x04     Field
0x05     MethodPtr
0x06     MethodDef
0x07     ParamPtr
0x08     Param
0x09     InterfaceImpl
0x0A     MemberRef
0x0B     Constant
0x0C     CustomAttribute
0x0D     FieldMarshal
0x0E     DeclSecurity
0x0F     ClassLayout
0x10     FieldLayout
0x11     StandAloneSig
0x12     EventMap
0x13     EventPtr
0x14     Event
0x15     PropertyMap
0x16     PropertyPtr
0x17     Property
0x18     MethodSemantics
0x19     MethodImpl
0x1A     ModuleRef
0x1B     TypeSpec
0x1C     ImplMap
0x1D     FieldRVA
0x1E     EncLog
0x1F     EncMap
0x20     Assembly
0x21     AssemblyProcessor
0x22     AssemblyOS
0x23     AssemblyRef
0x24     AssemblyRefProcessor
0x25     AssemblyRefOS
0x26     File
0x27     ExportedType
0x28     ManifestResource
0x29     NestedClass
0x2A     GenericParam
0x2B     MethodSpec
0x2C     GenericParamConstraint
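     Bit n of the MaskValid field corresponds to table number n in Table 8, and the row counts are listed in ascending table order. The sketch below combines the MaskValid and Rows values from Table 7 with the names from Table 8 to show which tables this assembly actually contains; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class TableMaskDemo
{
    // Table names in Table 8 order; the array index is the table number.
    static readonly string[] TableNames =
    {
        "Module", "TypeRef", "TypeDef", "FieldPtr", "Field", "MethodPtr", "MethodDef",
        "ParamPtr", "Param", "InterfaceImpl", "MemberRef", "Constant", "CustomAttribute",
        "FieldMarshal", "DeclSecurity", "ClassLayout", "FieldLayout", "StandAloneSig",
        "EventMap", "EventPtr", "Event", "PropertyMap", "PropertyPtr", "Property",
        "MethodSemantics", "MethodImpl", "ModuleRef", "TypeSpec", "ImplMap", "FieldRVA",
        "EncLog", "EncMap", "Assembly", "AssemblyProcessor", "AssemblyOS", "AssemblyRef",
        "AssemblyRefProcessor", "AssemblyRefOS", "File", "ExportedType", "ManifestResource",
        "NestedClass", "GenericParam", "MethodSpec", "GenericParamConstraint"
    };

    // Pairs each table whose bit is set in the mask with its row count.
    public static string[] Describe(ulong valid, int[] rows)
    {
        var lines = new List<string>();
        int next = 0;   // position in the row-count list
        for (int table = 0; table < TableNames.Length; table++)
        {
            if ((valid & (1UL << table)) == 0) continue;   // table not present
            lines.Add(string.Format("0x{0:X2} {1}: {2}",
                                    table, TableNames[table], rows[next++]));
        }
        return lines.ToArray();
    }

    static void Main()
    {
        ulong valid = 0x0000000914021547UL;                     // MaskValid from Table 7
        int[] rows = { 1, 7, 2, 3, 4, 7, 3, 1, 1, 1, 1, 1 };    // Rows from Table 7
        foreach (string line in Describe(valid, rows))
            Console.WriteLine(line);
    }
}
```

     The result lists 12 tables: Module, TypeRef, TypeDef, MethodDef, Param, MemberRef, CustomAttribute, StandAloneSig, ModuleRef, ImplMap, Assembly, and AssemblyRef. The counts are consistent with the sample code: 3 MethodDef rows for MessageBox, Main, and .ctor, and 4 Param rows for a, b, c, and d.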

     6. Taking the Hello type as an example, Figure 7 shows how some of the key metadata tables relate to one another:

Figure 7 Metadata Table

III. Recommended Reading

     1. ECMA-335: http://www.ecma-international.org/publications/standards/Ecma-355.htm

     2. ".NET IL Assembler" by Serge Lidin;

     3. http://msdn.microsoft.com/zh-tw/library/dd229216.aspx, by 蔡學鏞

Reproduced from: https://www.cnblogs.com/vivounicorn/archive/2009/10/18/1585339.html

Origin blog.csdn.net/weixin_33670713/article/details/93642160