Notes on Designing an OS Kernel for the x86 Architecture (Part 2)

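When the user-space shell process read key values through the read() system call, the first value it got was 0, even though the kernel was clearly returning a nonzero value. I tracked the problem down by debugging and reading the disassembly.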

The code that calls the read system call is as follows:

int gets(char *buf)
{
	int ch, r, i = 0;

	do{
		r = read(fd_keyboard,(char*)&ch, 1);
		if ( ch == '\n') break;
		printf(0,"%02x r=%d",ch, r);
		buf[i++] = ch & 0xFF;
	}while (1);

	buf[i] = '\0';
	
	return (i);
}

Here read is written in a Linux-like style:

static inline _syscall_3(int,read,int,fd,char *,__buf, int, len)

#define _syscall_3(type,name,atype,a,btype,b,ctype,c) \
type name(atype a, btype b, ctype c)\
{\
	long _res;\
	__asm__ __volatile__("int $0x80"\
			:"=a"(_res)\
			:"a"(_SYSCALL_##name),"b"((long)(a)),\
			 "c"((long)(b)), "d"((long)(c))\
	);\
	if(_res >= 0)\
		return (type)(_res);\
	else{\
		return (type)(-1);\
	}\
}

After macro expansion, the prototype of the read function is:

static inline int read(int fd,char *__buf, int len);

The gets function compiles to the following assembly (GCC optimization level O2):

02000390 <gets>:
 2000390:	55                   	push   %ebp
 2000391:	57                   	push   %edi
 2000392:	31 ff                	xor    %edi,%edi
 2000394:	56                   	push   %esi
 2000395:	53                   	push   %ebx
 2000396:	83 ec 1c             	sub    $0x1c,%esp
 2000399:	8b 6c 24 30          	mov    0x30(%esp),%ebp
 200039d:	8b 74 24 0c          	mov    0xc(%esp),%esi
 20003a1:	eb 23                	jmp    20003c6 <gets+0x36>
 20003a3:	90                   	nop
 20003a4:	8d 74 26 00          	lea    0x0(%esi,%eiz,1),%esi
 20003a8:	50                   	push   %eax
 20003a9:	56                   	push   %esi
 20003aa:	83 c7 01             	add    $0x1,%edi
 20003ad:	68 6c 0d 00 02       	push   $0x2000d6c
 20003b2:	6a 00                	push   $0x0
 20003b4:	e8 a7 fe ff ff       	call   2000260 <printf>
 20003b9:	8b 74 24 1c          	mov    0x1c(%esp),%esi
 20003bd:	83 c4 10             	add    $0x10,%esp
 20003c0:	89 f0                	mov    %esi,%eax
 20003c2:	88 44 3d ff          	mov    %al,-0x1(%ebp,%edi,1)
 20003c6:	b8 04 00 00 00       	mov    $0x4,%eax
 20003cb:	8b 1d 5c 13 00 02    	mov    0x200135c,%ebx
 20003d1:	8d 4c 24 0c          	lea    0xc(%esp),%ecx
 20003d5:	ba 01 00 00 00       	mov    $0x1,%edx
 20003da:	cd 80                	int    $0x80
 20003dc:	ba ff ff ff ff       	mov    $0xffffffff,%edx
 20003e1:	85 c0                	test   %eax,%eax
 20003e3:	0f 48 c2             	cmovs  %edx,%eax
 20003e6:	83 fe 0a             	cmp    $0xa,%esi
 20003e9:	75 bd                	jne    20003a8 <gets+0x18>
 20003eb:	c6 44 3d 00 00       	movb   $0x0,0x0(%ebp,%edi,1)
 20003f0:	83 c4 1c             	add    $0x1c,%esp
 20003f3:	89 f8                	mov    %edi,%eax
 20003f5:	5b                   	pop    %ebx
 20003f6:	5e                   	pop    %esi
 20003f7:	5f                   	pop    %edi
 20003f8:	5d                   	pop    %ebp
 20003f9:	c3                   	ret

Key statements:
Before read is called (once, at function entry), the variable ch is copied from the stack into register ESI:

200039d:	8b 74 24 0c          	mov    0xc(%esp),%esi

This instruction passes the address of the __buf argument (that is, the address of ch) to the system call in the ECX register:

20003d1:	8d 4c 24 0c          	lea    0xc(%esp),%ecx

After the INT 0x80 call returns, the value in ESI (the cached value of ch) is compared directly with '\n':

20003e6:	83 fe 0a             	cmp    $0xa,%esi

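This is clearly wrong. The address of ch is passed to the kernel, and the kernel modifies ch through that pointer, but the modification never reaches the ESI register. What makes it nasty is that when I printed ch with printf, only the first output was wrong and every later value looked normal, so the problem was well hidden. That led me to look more closely at the code around printf: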

 20003a4:	8d 74 26 00          	lea    0x0(%esi,%eiz,1),%esi
 20003a8:	50                   	push   %eax
 20003a9:	56                   	push   %esi
 20003aa:	83 c7 01             	add    $0x1,%edi
 20003ad:	68 6c 0d 00 02       	push   $0x2000d6c
 20003b2:	6a 00                	push   $0x0
 20003b4:	e8 a7 fe ff ff       	call   2000260 <printf>
 20003b9:	8b 74 24 1c          	mov    0x1c(%esp),%esi

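The instruction lea 0x0(%esi,%eiz,1),%esi is equivalent to mov %esi,%esi and is only padding; it is the push %esi after it that puts the value of ch on the stack as the printf argument. Walking through the while loop in the assembly shows that the first time ch (ESI) is printed, ESI in principle holds an arbitrary value, because the stack slot it was loaded from had never been initialized or modified. [The output here is 0 because my kernel zeroes pages before handing them out as stack.] After the printf call, ESI is reloaded from ch (by which time the kernel has already changed the copy of ch on the stack), so the next output shows a real key value. That explains why the first output was wrong and the later ones looked normal.

In short, the in-memory variable ch (which lives on the stack) changed, but the change was not propagated to the register caching it. Put another way, the inline assembly modified a memory variable in a way GCC could not see. The fix is then simple and obvious: change the inline assembly statement: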

#define _syscall_3(type,name,atype,a,btype,b,ctype,c) \
type name(atype a, btype b, ctype c)\
{\
	long _res;\
	__asm__ __volatile__("int $0x80"\
			:"=a"(_res)\
			:"a"(_SYSCALL_##name),"b"((long)(a)),\
			 "c"((long)(b)), "d"((long)(c))\
			 :"memory"\
	);\
	if(_res >= 0)\
		return (type)(_res);\
	else{\
		return (type)(-1);\
	}\
}

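The clobber list now contains "memory", which tells GCC that the assembly may modify memory in ways not described by its operands. Any register caching an affected variable (here ESI caching ch) is therefore invalid after the asm statement, and the value must be reloaded from memory.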
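As a minimal, host-runnable sketch of the same effect (not the kernel's actual syscall path): the function fake_syscall_read below is a hypothetical stand-in that stores one byte through the pointer from inside the asm statement, the way the kernel writes through __buf. The key value 0x41 and all names are assumptions made for illustration. Because the asm lists "memory" as a clobber, GCC has to write ch to the stack before the asm and reload it afterwards.

```c
#include <assert.h>

/* Hypothetical stand-in for the read() syscall wrapper: the asm itself
 * writes one byte (0x41, an arbitrary "key") through buf, just as the
 * kernel writes through __buf behind GCC's back. */
static inline int fake_syscall_read(char *buf, int len)
{
    int res;
    (void)len;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "movb $0x41, (%1)\n\t"   /* store the "key" through the pointer */
        "movl $1, %0"            /* pretend one byte was read           */
        : "=&r"(res)
        : "r"(buf)
        : "memory");             /* memory changed in ways GCC cannot see */
#else
    /* Non-x86 fallback so the sketch still builds: a plain store plus a
     * compiler barrier. */
    buf[0] = 0x41;
    res = 1;
    __asm__ __volatile__("" ::: "memory");
#endif
    return res;
}

/* ch starts as 0 and is then overwritten from inside the asm. */
static int read_one_key(void)
{
    unsigned char ch = 0;
    int r = fake_syscall_read((char *)&ch, 1);
    assert(r == 1);
    return ch;   /* reloaded from the stack after the asm */
}
```

Calling read_one_key() returns 0x41. GCC's documentation also describes a narrower alternative, listing the written object as a memory output operand instead of clobbering all memory; for a generic syscall macro the blanket clobber is the simpler choice.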

The complete assembly after the change is as follows:

02000390 <gets>:
 2000390:	55                   	push   %ebp
 2000391:	57                   	push   %edi
 2000392:	56                   	push   %esi
 2000393:	53                   	push   %ebx
 2000394:	31 f6                	xor    %esi,%esi
 2000396:	83 ec 1c             	sub    $0x1c,%esp
 2000399:	8b 7c 24 30          	mov    0x30(%esp),%edi
 200039d:	8d 6c 24 0c          	lea    0xc(%esp),%ebp
 20003a1:	eb 21                	jmp    20003c4 <gets+0x34>
 20003a3:	90                   	nop
 20003a4:	8d 74 26 00          	lea    0x0(%esi,%eiz,1),%esi
 20003a8:	50                   	push   %eax
 20003a9:	52                   	push   %edx
 20003aa:	83 c6 01             	add    $0x1,%esi
 20003ad:	68 6c 0d 00 02       	push   $0x2000d6c
 20003b2:	6a 00                	push   $0x0
 20003b4:	e8 a7 fe ff ff       	call   2000260 <printf>
 20003b9:	8b 44 24 1c          	mov    0x1c(%esp),%eax
 20003bd:	83 c4 10             	add    $0x10,%esp
 20003c0:	88 44 37 ff          	mov    %al,-0x1(%edi,%esi,1)
 20003c4:	b8 04 00 00 00       	mov    $0x4,%eax
 20003c9:	8b 1d 5c 13 00 02    	mov    0x200135c,%ebx
 20003cf:	89 e9                	mov    %ebp,%ecx
 20003d1:	ba 01 00 00 00       	mov    $0x1,%edx
 20003d6:	cd 80                	int    $0x80
 20003d8:	ba ff ff ff ff       	mov    $0xffffffff,%edx
 20003dd:	85 c0                	test   %eax,%eax
 20003df:	0f 48 c2             	cmovs  %edx,%eax
 20003e2:	8b 54 24 0c          	mov    0xc(%esp),%edx
 20003e6:	83 fa 0a             	cmp    $0xa,%edx
 20003e9:	75 bd                	jne    20003a8 <gets+0x18>
 20003eb:	c6 04 37 00          	movb   $0x0,(%edi,%esi,1)
 20003ef:	83 c4 1c             	add    $0x1c,%esp
 20003f2:	89 f0                	mov    %esi,%eax
 20003f4:	5b                   	pop    %ebx
 20003f5:	5e                   	pop    %esi
 20003f6:	5f                   	pop    %edi
 20003f7:	5d                   	pop    %ebp
 20003f8:	c3                   	ret 

You can see that GCC now reloads ch from memory (the stack) before comparing it with '\n', this time into register EDX:

20003e2:	8b 54 24 0c          	mov    0xc(%esp),%edx
20003e6:	83 fa 0a             	cmp    $0xa,%edx
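To make the contrast with the first listing concrete, here is a plain-C model of the unfixed loop, reconstructed from that disassembly. It is a sketch, not compiler output. The variable esi plays the role of the stale register copy, and fake_keys, fake_read, printed and gets_model are hypothetical names used only for this illustration. The model also implies something easy to miss: each printed value is the key from the previous read, and the loop only exits one read after the newline arrives.

```c
#include <assert.h>

/* Hypothetical keyboard: a string of pending key values. */
static const char *fake_keys = "";

/* Hypothetical stand-in for read(): the "kernel" stores one key
 * through the pointer, changing the in-memory copy of ch. */
static int fake_read(int *ch_slot)
{
    *ch_slot = (unsigned char)*fake_keys++;
    return 1;
}

static int gets_model(char *buf, int *printed, int *nprinted)
{
    int ch = 0;     /* stack slot; 0 models the kernel's zeroed stack page */
    int esi = ch;   /* 200039d: mov 0xc(%esp),%esi, done once on entry    */
    int i = 0;

    for (;;) {
        fake_read(&ch);                 /* int $0x80 updates ch in memory */
        if (esi == '\n')                /* 20003e6: cmp $0xa,%esi (stale) */
            break;
        printed[(*nprinted)++] = esi;   /* push %esi; call printf         */
        esi = ch;                       /* 20003b9: reload after printf   */
        buf[i++] = (char)esi;           /* 20003c2: store low byte to buf */
    }
    buf[i] = '\0';
    return i;
}
```

With fake_keys set to "ab\nX", the model records 0, 'a', 'b' as the printed values, stores "ab\n" in buf, and consumes the extra key X before it leaves the loop.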

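It was really a simple problem. I made the mistake by not being careful enough and spent quite a while locating it. On the upside, I now understand GCC inline assembly a level deeper.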

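Bonus:
The instruction lea 0x0(%esi,%eiz,1),%esi looks a bit odd (%eiz is the GNU tools' notation for "no index register"). It is equivalent to mov %esi,%esi, that is, a no-op. Its purpose is code alignment rather than data alignment: together with the nop before it, it pads the instruction stream so that the loop's branch target at 20003a8 falls on an aligned address. On x86, one multi-byte no-op like this is cheaper than a run of single-byte NOP instructions, which makes it a neat trick.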
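One more detail in the gets code above is worth a look: read stores a single byte into ch, but ch is an int, so its upper three bytes are never written, and cmp $0xa,%edx in the fixed listing compares all 32 bits. It presumably works here because the kernel zeroes stack pages, as noted earlier. Below is a minimal sketch of a safer variant, assuming a byte-wide ch and an added buffer size parameter. The names stub_read, pending_keys and gets_bytewise are hypothetical; stub_read stands in for the real syscall so the example can run on a host.

```c
#include <assert.h>

/* Hypothetical keyboard queue for the stub below. */
static const char *pending_keys = "";

/* Hypothetical stand-in for read(): stores exactly len bytes. */
static int stub_read(int fd, char *buf, int len)
{
    (void)fd;
    for (int k = 0; k < len; k++)
        buf[k] = *pending_keys++;
    return len;
}

/* Variant of gets with a one-byte ch, so every bit of ch is written
 * by the read, and with a size limit on the destination buffer. */
static int gets_bytewise(char *buf, int size)
{
    unsigned char ch;
    int i = 0;

    while (i < size - 1) {
        if (stub_read(0, (char *)&ch, 1) != 1)
            break;                  /* read failed: stop */
        if (ch == '\n')
            break;                  /* end of line       */
        buf[i++] = (char)ch;
    }
    buf[i] = '\0';
    return i;
}
```

With pending_keys set to "hi\n", gets_bytewise(buf, 8) returns 2 and leaves "hi" in buf.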

Reposted from blog.csdn.net/wenshifang/article/details/97824474