I. A First Look at Metadata
Source files written in a language that targets the CLR (C++/CLI, C#, VB, and so on) can be compiled into a managed module, either by a Microsoft compiler or by a compiler you write yourself. A managed module is a standard PE file; its structure is described in the article on the CLR loading process. Metadata is stored in the same section of the PE file as the IL code, and the compiler always generates metadata and IL together so that the two stay in sync. This article looks at what metadata contains, using the following code as an example:
namespace HelloWorld
{
    using System;
    using System.Runtime.InteropServices;

    public class Hello
    {
        // P/Invoke declaration for MessageBox in User32.dll.
        [DllImport("User32.dll")]
        public static extern int MessageBox(int a, string b, string c, int d);

        public static void Main()
        {
            int num = MessageBox(0, "2", "3", 4);
            Console.Write("Please enter your name: ");
            string str = Console.ReadLine();
            num = MessageBox(0, str, "Welcome to use IL Assembly", 1);
        }
    }
}
I think the content of metadata can be divided into two parts: "macro" content and "micro" content.
The macro content is the manifest. It contains the assembly's configuration, the external modules and assemblies it depends on (custom assemblies or assemblies in the GAC; every .NET program depends on mscorlib.dll, so that assembly always appears in the manifest), the identity of the current module (an assembly has at least one module), the assembly's public key, the image base, and so on, as shown:
Figure 1 Manifest content
The micro content covers types, methods, and attributes. For a type, for example, it records the type name, the fields and methods the type contains, the parameters of each method, and similar details.
II. Analyzing Metadata
Logically, metadata is made up of a number of streams, which fall into two categories: metadata heaps and metadata tables.
The metadata heaps are:
1) String heap: stores zero-terminated UTF-8 strings. The first string in the heap is the empty string, so the first byte of the heap must be 0, and the last byte of the heap must also be 0;
2) GUID heap: stores 16-byte binary objects. Because the length is fixed, no extra bytes are needed to mark the length or the end of an entry;
3) Blob heap: stores binary objects of arbitrary size. Each object is preceded by its length (an unsigned integer), stored as follows: if length <= 0x7F, it takes 1 byte; if 0x80 <= length <= 0x3FFF, it takes 2 bytes; if 0x4000 <= length <= 0x1FFFFFFF, it takes 4 bytes.
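The length-prefix rule for blob entries can be sketched in Python as follows. This is only an illustration: it assumes the bit layout that ECMA-335 gives for compressed unsigned integers (the leading bits 0, 10, and 110 of the first byte select the 1-, 2-, and 4-byte forms, stored big-endian), and the function names are made up for this example.

```python
def encode_length(length: int) -> bytes:
    """Encode a blob length in the 1-, 2- or 4-byte compressed form."""
    if length < 0 or length > 0x1FFFFFFF:
        raise ValueError("length out of range for a compressed integer")
    if length <= 0x7F:
        return bytes([length])                        # 0xxxxxxx
    if length <= 0x3FFF:
        return (0x8000 | length).to_bytes(2, "big")   # 10xxxxxx xxxxxxxx
    return (0xC0000000 | length).to_bytes(4, "big")   # 110xxxxx plus 3 bytes


def decode_length(data: bytes, pos: int = 0) -> tuple:
    """Return (length, number of prefix bytes) for the entry at pos."""
    first = data[pos]
    if first & 0x80 == 0:
        return first, 1
    if first & 0xC0 == 0x80:
        return int.from_bytes(data[pos:pos + 2], "big") & 0x3FFF, 2
    if first & 0xE0 == 0xC0:
        return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, 4
    raise ValueError("invalid compressed length prefix")


for value in (0x03, 0x7F, 0x80, 0x3FFF, 0x4000):
    encoded = encode_length(value)
    print(hex(value), encoded.hex(), decode_length(encoded))
```

All the blobs in the sample assembly discussed below are shorter than 0x80 bytes, so only the 1-byte form appears in its heaps.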
The following figures show the stream information directly. Open the assembly generated from the code above with ILDASM.
1. In the "View" menu, click "Headers", as shown:
Figure 2 Metadata header
2. In the "View" menu, choose "MetaInfo", select "Raw: Header, Schema, Rows" and "Raw: Heaps", then click "Show!", as shown:
Figure 3 Metadata information
Figure 4 Metadata information
3. Open the assembly with UltraEdit, as shown:
Figure 5 PE file information
As Figure 2 shows, the metadata header has two parts: the storage signature and the storage header.
Storage signature structure:
1) Signature, type DWORD, value 0x424A5342. Where is this value stored? Figure 3 gives the start address of the metadata header as 0x00001098, and in Figure 5 the value can be found at addresses 0x00001098 through 0x0000109B;
2) Major version, type WORD;
3) Minor version, type WORD;
4) Extra data offset, a reserved field of type DWORD, value 0;
5) Version string length, type DWORD;
6) Version string, a BYTE array; in the current file it is v2.0.50727.
Storage header structure:
1) Flags, a reserved field of type BYTE, value 0 (followed by one padding byte);
2) Number of streams, type WORD; the current PE file has 5 streams.
Next come the stream headers. This PE file contains 5 streams: #~, #Strings, #US, #GUID, and #Blob:
Table 1 Stream headers
Offset | Size | Name
0x0000006C (108) | 0x00000150 (336) | #~
0x000001BC (444) | 0x00000148 (328) | #Strings
0x00000304 (772) | 0x00000074 (116) | #US
0x00000378 (888) | 0x00000010 (16) | #GUID
0x00000388 (904) | 0x0000006C (108) | #Blob
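To make the layout of the storage signature, storage header, and stream headers concrete, here is a minimal Python sketch that parses them. It works on sample bytes rebuilt from the values in Table 1 rather than on a real file. The layout details not spelled out in the text are taken from ECMA-335 and should be read as assumptions: the version string is padded to a multiple of 4 bytes, the flags byte is followed by one padding byte, and each stream header is an offset DWORD, a size DWORD, and a zero-terminated name padded to a 4-byte boundary. The major/minor version values (1, 1) and all function names are illustrative.

```python
import struct


def parse_metadata_root(data: bytes) -> dict:
    """Parse the storage signature, storage header and stream headers."""
    signature, major, minor, extra, ver_len = struct.unpack_from("<IHHII", data, 0)
    if signature != 0x424A5342:           # the bytes "BSJB" as a little-endian DWORD
        raise ValueError("not a metadata root")
    pos = 16
    version = data[pos:pos + ver_len].split(b"\0", 1)[0].decode("utf-8")
    pos += ver_len
    # Storage header: 1 flags byte, 1 padding byte, then the stream count.
    flags, _pad, stream_count = struct.unpack_from("<BBH", data, pos)
    pos += 4
    streams = []
    for _ in range(stream_count):
        offset, size = struct.unpack_from("<II", data, pos)
        pos += 8
        end = data.index(b"\0", pos)
        name = data[pos:end].decode("ascii")
        # The name plus its terminator is padded to a 4-byte boundary.
        pos += (end - pos + 1 + 3) & ~3
        streams.append((name, offset, size))
    return {"version": version, "flags": flags, "streams": streams, "end": pos}


def build_sample_root() -> bytes:
    """Build a root whose values mirror Table 1 (sample data, not a real file)."""
    version = b"v2.0.50727\0\0"           # 10 characters padded to 12 bytes
    root = struct.pack("<IHHII", 0x424A5342, 1, 1, 0, len(version)) + version
    root += struct.pack("<BBH", 0, 0, 5)
    for name, offset, size in [("#~", 0x6C, 0x150), ("#Strings", 0x1BC, 0x148),
                               ("#US", 0x304, 0x74), ("#GUID", 0x378, 0x10),
                               ("#Blob", 0x388, 0x6C)]:
        raw = name.encode("ascii") + b"\0"
        raw += b"\0" * (-len(raw) % 4)
        root += struct.pack("<II", offset, size) + raw
    return root


METADATA_START = 0x1098                   # start address reported in the article
info = parse_metadata_root(build_sample_root())
for name, offset, size in info["streams"]:
    print(f"{name:<9} file address 0x{METADATA_START + offset:08X}, size {size}")
```

Under these assumptions the headers end at byte 0x6C, which is exactly the offset Table 1 gives for the first stream (#~). Adding each stream's offset to the metadata start address gives the file address at which to look for that stream.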
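1. #Strings Stream
This is a string heap. Figure 5 shows that the metadata starts at address 0x00001098, and Table 1 gives the offset of the #Strings stream as 0x000001BC, so the stream's contents are at address 0x00001254. They can also be viewed directly in ILDASM (Figure 4). The #Strings stream stores the names of metadata items, and it begins and ends with a 0 byte, as shown in Table 2:
Table 2 Contents of #Strings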
Offset | Data
0 | (empty)
1 | <Module>
10 | HelloWorld.exe
25 | Hello
31 | HelloWorld
42 | mscorlib
51 | System
58 | Object
65 | MessageBox
76 | Main
81 | .ctor
87 | a
89 | b
91 | c
93 | d
95 | System.Diagnostics
114 | DebuggableAttribute
134 | DebuggingModes
149 | System.Runtime.CompilerServices
181 | CompilationRelaxationsAttribute
213 | RuntimeCompatibilityAttribute
243 | System.Runtime.InteropServices
274 | DllImportAttribute
293 | User32.dll
304 | Console
312 | Write
318 | ReadLine
327 | (empty)
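The offsets in Table 2 follow directly from the heap layout: each name takes its UTF-8 bytes plus one terminating 0 byte. A small Python sketch, using only the first few names from the table and helper names invented for this example, reproduces them:

```python
def build_strings_heap(names):
    """Concatenate zero-terminated UTF-8 names after the leading empty string."""
    heap = bytearray(b"\0")               # offset 0 is always the empty string
    offsets = {}
    for name in names:
        offsets[name] = len(heap)
        heap += name.encode("utf-8") + b"\0"
    return bytes(heap), offsets


def read_string(heap: bytes, offset: int) -> str:
    """Read the zero-terminated UTF-8 string that starts at offset."""
    end = heap.index(b"\0", offset)
    return heap[offset:end].decode("utf-8")


names = ["<Module>", "HelloWorld.exe", "Hello", "HelloWorld", "mscorlib"]
heap, offsets = build_strings_heap(names)
print(offsets)                 # offsets 1, 10, 25, 31, 42, as in Table 2
print(read_string(heap, 25))   # Hello
```

A metadata table column that refers to a name stores just such an offset into this heap.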
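2. #US Stream
This is a blob heap that can store user-defined strings or binary objects. Starting at address 0x0000139C, it has the contents shown in Table 3:
Table 3 Contents of #US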
Offset | Byte Length | Data
0 | 0x00 (0) | (empty)
1 | 0x03 (3) | 2
5 | 0x03 (3) | 3
9 | 0x31 (49) | Please enter your name:
59 | 0x35 (53) | Welcome to use IL Assembly
113 | 0x00 (0) | (empty)
114 | 0x00 (0) | (empty)
115 | 0x00 (0) | (empty)
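The byte lengths in Table 3 are odd numbers because each user string is stored as UTF-16 characters followed by one extra byte. The Python sketch below rebuilds a heap from the four string literals in the sample program and walks it. It assumes, following ECMA-335, that the extra trailing byte is a flag that is 0 for plain ASCII text, and it only handles 1-byte length prefixes; the helper names are illustrative.

```python
def walk_user_strings(heap: bytes):
    """Yield (offset, byte_length, text) for each entry in a #US heap."""
    pos = 0
    while pos < len(heap):
        length = heap[pos]                # 1-byte length prefix only
        if length > 0x7F:
            raise ValueError("multi-byte length prefixes are not handled here")
        body = heap[pos + 1:pos + 1 + length]
        # Drop the trailing flag byte, then decode the UTF-16 characters.
        text = body[:-1].decode("utf-16-le") if length else ""
        yield pos, length, text
        pos += 1 + length


def make_entry(text: str) -> bytes:
    """Encode one short user string; the flag byte is 0 for plain ASCII text."""
    body = text.encode("utf-16-le") + b"\0"
    return bytes([len(body)]) + body


sample = b"\0" + b"".join(make_entry(s) for s in
                          ["2", "3", "Please enter your name: ",
                           "Welcome to use IL Assembly"])
sample += b"\0" * (-len(sample) % 4)      # pad to a 4-byte boundary
for offset, length, text in walk_user_strings(sample):
    print(offset, hex(length), repr(text))
```

The rebuilt heap is 116 bytes long, matching the size of #US in Table 1, and the three trailing zero-length entries are simply the padding bytes.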
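3. #GUID Stream
This is a GUID heap, which stores globally unique identifiers one after another. Viewing it at its address in the file gives the contents shown in Table 4:
Table 4 Contents of #GUID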
Offset | Data
0 | {ec04bb0c-8238-4d78-b80e-4415e508b3b5}
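Because every entry is exactly 16 bytes, reading a GUID is a matter of slicing. The sketch below makes two assumptions that the text does not state: GUID indexes stored in metadata tables are 1-based (as in ECMA-335), and the 16 bytes use the Windows GUID layout, which Python exposes as `bytes_le`. The function name is illustrative.

```python
import uuid


def read_guid(heap: bytes, index: int) -> uuid.UUID:
    """Return the GUID at a 1-based index; index 0 means "no GUID"."""
    if index == 0:
        raise ValueError("index 0 does not refer to a GUID")
    start = (index - 1) * 16
    chunk = heap[start:start + 16]
    if len(chunk) != 16:
        raise IndexError("GUID index out of range")
    # bytes_le: the first three fields are stored little-endian.
    return uuid.UUID(bytes_le=chunk)


mvid = uuid.UUID("ec04bb0c-8238-4d78-b80e-4415e508b3b5")   # value from Table 4
heap = mvid.bytes_le                                        # a one-entry heap
print(heap.hex())
print(read_guid(heap, 1))
```

The 16-byte length of this one-entry heap agrees with the size of #GUID in Table 1.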
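4. #Blob Stream
This is a blob heap that stores the internal binary objects of the metadata. For example, where Figure 1 defines the reference to the external assembly mscorlib.dll, the default value of .publickeytoken is (B7 7A 5C 56 19 34 E0 89); that value is stored in the #Blob stream. The contents are shown in Table 5:
Table 5 Contents of #Blob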
Offset | Byte Length | Data
0 | 0x00 (0) | (empty)
1 | 0x08 (8) | B7-7A-5C-56-19-34-E0-89
10 | 0x07 (7) | 00-04-08-08-0E-0E-08
18 | 0x03 (3) | 00-00-01
22 | 0x03 (3) | 20-00-01
26 | 0x05 (5) | 20-01-01-11-0D
32 | 0x04 (4) | 20-01-01-08
37 | 0x04 (4) | 20-01-01-0E
42 | 0x04 (4) | 00-01-01-0E
47 | 0x03 (3) | 00-00-0E
51 | 0x04 (4) | 07-02-08-0E
56 | 0x08 (8) | 01-00-07-01-00-00-00-00
65 | 0x08 (8) | 01-00-08-00-00-00-00-00
74 | 0x1E (30) | 01-00-01-00-54-02-16-57-72-61-70-4E-6F-6E-45-78-63-65-70-74-69-6F-6E-54-68-72-6F-77-73-01
105 | 0x00 (0) | (empty)
106 | 0x00 (0) | (empty)
107 | 0x00 (0) | (empty)
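Several of the short blobs in Table 5 look like method signatures. As a rough illustration, the Python sketch below reads a blob by offset and describes the simplest kind of signature. It relies on constants from ECMA-335 that the text does not list (0x20 for the HASTHIS flag, and the element types 0x01 void, 0x08 int32, 0x0E string); it handles only those three types and 1-byte length prefixes; and the reading of these particular blobs as method signatures is an interpretation, not something the ILDASM output states.

```python
ELEMENT_TYPES = {0x01: "void", 0x08: "int32", 0x0E: "string"}   # subset only


def read_blob(heap: bytes, offset: int) -> bytes:
    """Return the blob at offset (1-byte length prefixes only)."""
    length = heap[offset]
    if length > 0x7F:
        raise ValueError("multi-byte length prefixes are not handled here")
    return heap[offset + 1:offset + 1 + length]


def describe_method_sig(sig: bytes) -> str:
    """Describe a simple method signature built from ELEMENT_TYPES only."""
    flags, param_count = sig[0], sig[1]
    kind = "instance" if flags & 0x20 else "static"     # 0x20 = HASTHIS
    ret, *params = (ELEMENT_TYPES[b] for b in sig[2:3 + param_count])
    return f"{kind} {ret} ({', '.join(params)})"


# The first four entries of Table 5, re-encoded with their length prefixes.
heap = bytes.fromhex("00" "08B77A5C561934E089" "07000408080E0E08" "03000001")
print(read_blob(heap, 1).hex("-"))              # the mscorlib public key token
print(describe_method_sig(read_blob(heap, 10)))
print(describe_method_sig(read_blob(heap, 18)))
```

Read this way, the blob at offset 10 has the shape of MessageBox(int, string, string, int) returning int, and the blob at offset 18 has the shape of a static method with no parameters and no return value, such as Main.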
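5. #~ Stream
This stream can be divided into two parts: the header and the metadata tables.
(1) Header
Figure 3 contains the following line: Metadata header: 2.0, heaps: 0x00, rid: 0x01, valid: 0x0000000914021547, sorted: 0x000016003301fa00. We can also look at the contents at address 0x00001098 + 0x0000006C = 0x00001104, as follows:
Figure 6 Header
The header actually consists of the following parts:
1) A 4-byte reserved field, always 0;
2) A 1-byte major version field (the version of the table schema, which presumably follows the CLR version);
3) A 1-byte minor version field;
4) A 1-byte heap sizes field; 0 means that indexes into the heaps are 2 bytes wide;
5) A 1-byte reserved field, always 1;
6) An 8-byte bit mask; a bit set to 1 means the corresponding metadata table is present;
7) An 8-byte bit mask; a bit set to 1 means the corresponding table must be sorted by its primary key, as described in Table 6:
Table 6 Sorted metadata tables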
Table | Primary key | Secondary key
ClassLayout | Parent |
Constant | Parent |
CustomAttribute | Parent |
DeclSecurity | Parent |
FieldLayout | Field |
FieldMarshal | Parent |
FieldRVA | Field |
GenericParam | Owner | Number column
GenericParamConstraint | Owner |
ImplMap | MemberForwarded |
InterfaceImpl | Class | Interface column
MethodImpl | Class |
MethodSemantics | Association |
NestedClass | NestedClass |
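In addition, the record for an enclosing (parent) class always has a smaller index in the TypeDef table than the records for the classes nested inside it (the enclosing class is defined before its nested classes).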
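8) n 4-byte unsigned integers (n is the number of valid metadata tables), giving the number of records in each valid metadata table. All of the above is reflected in Table 7:
Table 7 Contents of the #~ stream header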
Field | Value
Reserved | 0x00000000 (0)
Major | 0x02 (2)
Minor | 0x00 (0)
HeapSizes | 0x00 (0)
Reserved | 0x01 (1)
MaskValid | 0x0000000914021547 (0000 0000 0000 0000 0000 0000 0000 1001 0001 0100 0000 0010 0001 0101 0100 0111)
Sorted | 0x000016003301FA00 (0000 0000 0000 0000 0001 0110 0000 0000 0011 0011 0000 0001 1111 1010 0000 0000)
Rows | 1, 7, 2, 3, 4, 7, 3, 1, 1, 1, 1, 1
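The fields listed above add up to a fixed 24-byte header followed by one row count per valid table. The Python sketch below parses sample bytes rebuilt from Table 7 (not bytes read from a real file). The meaning of the individual heap sizes bits (0x01 for #Strings, 0x02 for #GUID, 0x04 for #Blob, each selecting 4-byte indexes) is taken from ECMA-335 and is an assumption here, and the function and key names are illustrative.

```python
import struct


def parse_tables_header(data: bytes) -> dict:
    """Parse the fixed 24-byte #~ header and the row counts that follow it."""
    reserved, major, minor, heap_sizes, reserved2, valid, is_sorted = \
        struct.unpack_from("<IBBBBQQ", data, 0)
    table_count = bin(valid).count("1")          # one row count per valid table
    rows = struct.unpack_from(f"<{table_count}I", data, 24)
    return {
        "version": (major, minor),
        # A set bit selects 4-byte heap indexes; otherwise they are 2 bytes.
        "string_index_size": 4 if heap_sizes & 0x01 else 2,
        "guid_index_size": 4 if heap_sizes & 0x02 else 2,
        "blob_index_size": 4 if heap_sizes & 0x04 else 2,
        "valid": valid,
        "sorted": is_sorted,
        "rows": list(rows),
    }


# Sample bytes rebuilt from the values in Table 7.
row_counts = [1, 7, 2, 3, 4, 7, 3, 1, 1, 1, 1, 1]
sample = struct.pack("<IBBBBQQ", 0, 2, 0, 0x00, 1,
                     0x0000000914021547, 0x000016003301FA00)
sample += struct.pack(f"<{len(row_counts)}I", *row_counts)
header = parse_tables_header(sample)
print(header["version"], header["rows"])
```

The valid mask has 12 bits set, which is why Table 7 lists exactly 12 row counts.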
Immediately after the header come the metadata tables. Version 2.0 defines 45 metadata tables, listed in order in Table 8; see ECMA-335 for a detailed description:
Token | Name
0x00 | Module
0x01 | TypeRef
0x02 | TypeDef
0x03 | FieldPtr
0x04 | Field
0x05 | MethodPtr
0x06 | MethodDef
0x07 | ParamPtr
0x08 | Param
0x09 | InterfaceImpl
0x0A | MemberRef
0x0B | Constant
0x0C | CustomAttribute
0x0D | FieldMarshal
0x0E | DeclSecurity
0x0F | ClassLayout
0x10 | FieldLayout
0x11 | StandAloneSig
0x12 | EventMap
0x13 | EventPtr
0x14 | Event
0x15 | PropertyMap
0x16 | PropertyPtr
0x17 | Property
0x18 | MethodSemantics
0x19 | MethodImpl
0x1A | ModuleRef
0x1B | TypeSpec
0x1C | ImplMap
0x1D | FieldRVA
0x1E | EncLog
0x1F | EncMap
0x20 | Assembly
0x21 | AssemblyProcessor
0x22 | AssemblyOS
0x23 | AssemblyRef
0x24 | AssemblyRefProcessor
0x25 | AssemblyRefOS
0x26 | File
0x27 | ExportedType
0x28 | ManifestResource
0x29 | NestedClass
0x2A | GenericParam
0x2B | MethodSpec
0x2C | GenericParamConstraint
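With the table numbers in hand, the two bit masks from the #~ header can be decoded: bit i of a mask refers to table number i. The following Python sketch does this for the valid and sorted masks of the sample assembly and pairs the valid tables with the row counts from the header; the helper name is illustrative.

```python
TABLE_NAMES = [
    "Module", "TypeRef", "TypeDef", "FieldPtr", "Field", "MethodPtr",
    "MethodDef", "ParamPtr", "Param", "InterfaceImpl", "MemberRef",
    "Constant", "CustomAttribute", "FieldMarshal", "DeclSecurity",
    "ClassLayout", "FieldLayout", "StandAloneSig", "EventMap", "EventPtr",
    "Event", "PropertyMap", "PropertyPtr", "Property", "MethodSemantics",
    "MethodImpl", "ModuleRef", "TypeSpec", "ImplMap", "FieldRVA", "EncLog",
    "EncMap", "Assembly", "AssemblyProcessor", "AssemblyOS", "AssemblyRef",
    "AssemblyRefProcessor", "AssemblyRefOS", "File", "ExportedType",
    "ManifestResource", "NestedClass", "GenericParam", "MethodSpec",
    "GenericParamConstraint",
]


def tables_in_mask(mask: int) -> list:
    """Return the names of the tables whose bit is set in mask."""
    return [name for i, name in enumerate(TABLE_NAMES) if mask >> i & 1]


valid = 0x0000000914021547
sorted_mask = 0x000016003301FA00
rows = [1, 7, 2, 3, 4, 7, 3, 1, 1, 1, 1, 1]

for name, count in zip(tables_in_mask(valid), rows):
    print(f"{name:<16} {count} row(s)")
print(sorted(tables_in_mask(sorted_mask)))
```

For this assembly the valid mask selects Module, TypeRef, TypeDef, MethodDef, Param, MemberRef, CustomAttribute, StandAloneSig, ModuleRef, ImplMap, Assembly, and AssemblyRef, and the sorted mask selects exactly the 14 tables listed in Table 6.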
6. Taking the Hello type as an example, the figure below shows the relationships among some of the key metadata tables:
Figure 7 Metadata tables
III. Recommended Reading
1. ECMA-335: http://www.ecma-international.org/publications/standards/Ecma-335.htm;
2. ".NET IL Assembler", by Serge Lidin;
3. http://msdn.microsoft.com/zh-tw/library/dd229216.aspx, by 蔡學鏞
Reproduced from: https://www.cnblogs.com/vivounicorn/archive/2009/10/18/1585339.html