MIT 6.828 Study Notes 6: Lab 4 Report

Before starting, to understand mpconfig and the LAPIC, you may need to read the Intel processor manual and the MP Specification; both are available among the course resources.

Exercise 1

Implement mmio_map_region in kern/pmap.c.

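// mmio_map_region()
uintptr_t ret = base;               // start of this reservation
size = ROUNDUP(size, PGSIZE);       // map whole pages only
base = base + size;                 // bump the allocator for the next caller
if (base > MMIOLIM) {               // ending exactly at MMIOLIM is still fine
    panic("larger than MMIOLIM");
}
// Cache-disable and write-through, because device memory must not be cached.
boot_map_region(kern_pgdir, ret, size, pa, PTE_PCD | PTE_PWT | PTE_W);
return (void *) ret;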

Map part of the MMIO window to the given physical address. The size has to be rounded up to a multiple of the page size.
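
The reservation logic is a simple bump allocator, which can be tried out in user space. The following is a minimal sketch under stated assumptions: the constants mirror what I believe inc/memlayout.h defines, mmio_reserve and round_up_page are hypothetical names, and the real kernel function panics instead of returning 0.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; the real values come from inc/memlayout.h. */
#define PGSIZE   4096u
#define MMIOBASE 0xef800000u   /* assumed start of the MMIO window */
#define MMIOLIM  0xefc00000u   /* assumed end of the MMIO window */

/* Round n up to the next multiple of PGSIZE (n is assumed to be small). */
static uint32_t round_up_page(uint32_t n)
{
    return (n + PGSIZE - 1) / PGSIZE * PGSIZE;
}

/* Next unreserved virtual address in the window. */
static uint32_t mmio_next = MMIOBASE;

/*
 * Reserve `size` bytes, rounded up to whole pages, and return the start
 * of the reservation, or 0 if it would run past MMIOLIM.
 */
uint32_t mmio_reserve(uint32_t size)
{
    uint32_t start = mmio_next;
    uint32_t len = round_up_page(size);

    if (len > MMIOLIM - start)
        return 0;               /* the kernel version panics here */
    mmio_next = start + len;
    return start;
}
```

Each call hands out the next page-aligned chunk, so a 1-byte request still consumes a full page.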

Exercise 2

Then modify your implementation of page_init() in kern/pmap.c to avoid adding the page at MPENTRY_PADDR to the free list.

void page_init(void)
{
    size_t left_i = PGNUM(IOPHYSMEM);
    size_t right_i = PGNUM(PADDR(envs + NENV));
    for (size_t i = 1; i < npages; i++) {
        if ((i < left_i || i > right_i) && i != PGNUM(MPENTRY_PADDR)) {
            pages[i].pp_link = page_free_list;
            page_free_list = &pages[i];
        }
    }
}

Compared with the earlier version, the only change needed is the extra condition i != PGNUM(MPENTRY_PADDR).

Question

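1. Compare kern/mpentry.S side by side with boot/boot.S. Bearing in mind that kern/mpentry.S is compiled and linked to run above KERNBASE just like everything else in the kernel, what is the purpose of macro MPBOOTPHYS? Why is it necessary in kern/mpentry.S but not in boot/boot.S? In other words, what could go wrong if it were omitted in kern/mpentry.S?

kern/mpentry.S is linked to run at addresses above KERNBASE, but an AP starts in real mode with paging disabled, while the BSP already has paging on. The AP executes the copy of this code that the BSP placed at MPENTRY_PADDR, so its symbols have to be converted by hand from link-time virtual addresses to the physical addresses of that copy, which is what MPBOOTPHYS does. boot/boot.S does not need this because it is linked at the same low address at which it is loaded. Without the macro, the AP would reference high addresses that mean nothing until paging is turned on.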
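
The address arithmetic behind MPBOOTPHYS can be written out as a small function. This is a sketch, not the macro itself: mpbootphys is a hypothetical name, the value of MPENTRY_PADDR is what I believe inc/memlayout.h uses, and the link-time addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MPENTRY_PADDR 0x7000u  /* physical page the AP entry code is copied to */

/*
 * Translate the link-time address of a symbol inside mpentry.S into the
 * physical address it occupies after the code has been copied to
 * MPENTRY_PADDR. `mpentry_start` is the link-time address of the first
 * byte of that code.
 */
uint32_t mpbootphys(uint32_t sym, uint32_t mpentry_start)
{
    assert(sym >= mpentry_start);
    return sym - mpentry_start + MPENTRY_PADDR;
}
```

For example, a symbol 0x40 bytes into the entry code ends up at physical address 0x7040, whatever its link-time address was.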

Exercise 3

Modify mem_init_mp() (in kern/pmap.c) to map per-CPU stacks starting at KSTACKTOP, as shown in inc/memlayout.h.

// mem_init_mp()
for (int i = 0; i != NCPU; ++i) {
    uintptr_t kstacktop = KSTACKTOP - i * (KSTKSIZE + KSTKGAP);
    boot_map_region(kern_pgdir, kstacktop - KSTKSIZE, KSTKSIZE, PADDR(percpu_kstacks[i]), PTE_W | PTE_P);
}

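Set up one kernel stack for each of the NCPU CPUs. Only the KSTKSIZE bytes of each stack are mapped; the KSTKGAP below each one is left unmapped as a guard region.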
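
The stack layout can be checked with a few lines of arithmetic. A minimal sketch, assuming the constant values from inc/memlayout.h as I remember them; kstack_top and kstack_bottom are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE    4096u
#define KSTACKTOP 0xf0000000u      /* assumed; equals KERNBASE in JOS */
#define KSTKSIZE  (8 * PGSIZE)     /* size of one kernel stack */
#define KSTKGAP   (8 * PGSIZE)     /* unmapped guard region below each stack */

/* Highest address (exclusive) of the kernel stack of CPU `cpu`. */
uint32_t kstack_top(uint32_t cpu)
{
    return KSTACKTOP - cpu * (KSTKSIZE + KSTKGAP);
}

/* Lowest mapped address of the kernel stack of CPU `cpu`. */
uint32_t kstack_bottom(uint32_t cpu)
{
    return kstack_top(cpu) - KSTKSIZE;
}
```

The distance between kstack_bottom(0) and kstack_top(1) is exactly KSTKGAP, so a stack overflow on CPU 0 faults instead of silently overwriting the stack of CPU 1.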

Exercise 4

The code in trap_init_percpu() (kern/trap.c) initializes the TSS and TSS descriptor for the BSP. It worked in Lab 3, but is incorrect when running on other CPUs. Change the code so that it can work on all CPUs.

void trap_init_percpu(void)
{
    thiscpu->cpu_ts.ts_esp0 = KSTACKTOP - cpunum() * (KSTKSIZE + KSTKGAP);
    thiscpu->cpu_ts.ts_ss0 = GD_KD;

    gdt[(GD_TSS0 >> 3) + cpunum()] = SEG16(STS_T32A, (uint32_t) (&(thiscpu->cpu_ts)), sizeof(struct Taskstate) - 1, 0);
    gdt[(GD_TSS0 >> 3) + cpunum()].sd_s = 0;

    ltr(GD_TSS0 + (cpunum() << 3)); 
    lidt(&idt_pd);
}

Initialize each CPU's TSS and add the corresponding descriptor to the GDT.

Exercise 5

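The source shows that the core of lock_kernel is the xchg instruction, which atomically stores a new value and returns the old one. The returned value tells the caller whether it acquired the lock.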
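
The same idea can be expressed portably with C11 atomics. This is a sketch of the technique rather than the kernel's code: spin_t and the three functions are hypothetical names, and atomic_exchange stands in for the inline-assembly xchg wrapper.

```c
#include <assert.h>
#include <stdatomic.h>

/* A lock word: 0 means free, 1 means held. */
typedef struct {
    atomic_uint locked;
} spin_t;

/*
 * Try once to take the lock. atomic_exchange stores 1 and returns the
 * previous value in one indivisible step, so a previous value of 0
 * means this caller is the one that acquired it.
 */
int spin_try_lock(spin_t *lk)
{
    return atomic_exchange(&lk->locked, 1u) == 0;
}

/* Spin until the lock is acquired. */
void spin_acquire(spin_t *lk)
{
    while (!spin_try_lock(lk))
        ;   /* busy-wait */
}

/* Release the lock by storing 0. */
void spin_release(spin_t *lk)
{
    atomic_store(&lk->locked, 0u);
}
```

A second attempt while the lock is held sees the old value 1 and fails, which is exactly the condition a spinning CPU keeps retrying on.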

Apply the big kernel lock as described above, by calling lock_kernel() and unlock_kernel() at the proper locations.

// i386_init()
// Your code here:
lock_kernel();
boot_aps();

Take the lock before waking the other CPUs, so that a CPU that has just been woken cannot start running environments.

// mp_main()
// Your code here:
lock_kernel();
sched_yield();

After the AP is initialized, it must take the lock before scheduling, so that other CPUs cannot interfere with the choice of environment.

// trap()
// LAB 4: Your code here.
lock_kernel();
assert(curenv);

When an interrupt taken in user mode traps into the kernel, the lock must be taken.

// env_run()
lcr3(PADDR(e->env_pgdir));
unlock_kernel();
env_pop_tf(&(e->env_tf));

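Before leaving kernel mode, the lock must be released.
The BSP acquires the kernel lock before starting the APs, so each AP blocks in mp_main before it can schedule. Once the APs are started, the BSP runs the scheduler, picks the first environment, and releases the kernel lock on its way to user mode. At that point one of the APs can acquire the lock, run the scheduler, and run an environment if one is available.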

Question

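2. It seems that using the big kernel lock guarantees that only one CPU can run the kernel code at a time. Why do we still need separate kernel stacks for each CPU? Describe a scenario in which using a shared kernel stack will go wrong, even with the protection of the big kernel lock.

The kernel stack is written before the lock is taken. When a trap occurs, the hardware and the entry code in trapentry.S push the trap frame onto the kernel stack, and only afterwards does trap() call lock_kernel(). If two CPUs trap at about the same time while sharing one stack, both push their frames before either holds the lock, and the data each CPU needs later is overwritten by the other.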

Exercise 6

Implement round-robin scheduling in sched_yield() as described above.

// sched.c
// sched_yield()
idle = curenv;
size_t i = idle != NULL ? (ENVX(idle->env_id) + 1) % NENV : 0; // wrap so the last slot does not index envs[NENV]
for (size_t j = 0; j != NENV; j++, i = (i + 1) % NENV) {
    if (envs[i].env_status == ENV_RUNNABLE) {
        env_run(envs + i);
    }
} 
if (idle && idle->env_status == ENV_RUNNING) {
    env_run(idle);
}

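Search envs for an environment whose status is ENV_RUNNABLE, starting just after the current environment (or from the beginning if there is none), and run the first one found. If the search goes all the way around without finding one, check the current environment: if its status is ENV_RUNNING, keep running it.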
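
The selection rule can be isolated from the kernel and tested on a plain array. A minimal sketch: enum status and pick_next are hypothetical stand-ins for the env_status values and for the search inside sched_yield(), and the function returns an index instead of calling env_run().

```c
#include <assert.h>
#include <stddef.h>

enum status { FREE, RUNNABLE, RUNNING, NOT_RUNNABLE };

/*
 * Return the index of the next slot to run, or -1 if nothing can run.
 * `cur` is the index of the current slot, or -1 if there is none.
 */
int pick_next(const enum status *st, size_t n, int cur)
{
    assert(n > 0);
    /* Start just after the current slot, wrapping around the array. */
    size_t start = cur < 0 ? 0 : ((size_t)cur + 1) % n;

    for (size_t k = 0; k < n; k++) {
        size_t i = (start + k) % n;
        if (st[i] == RUNNABLE)
            return (int)i;
    }
    /* Nobody else is runnable: keep the current one if it is still running. */
    if (cur >= 0 && st[cur] == RUNNING)
        return cur;
    return -1;
}
```

Reducing the start index modulo n matters when the current slot is the last one in the array; without it the first probe would read one element past the end.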

// syscall.c
// syscall()
case SYS_yield:
    sched_yield();
    return 0;

Add the corresponding branch to syscall().

Question

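3. In your implementation of env_run() you should have called lcr3(). Before and after the call to lcr3(), your code makes references (at least it should) to the variable e, the argument to env_run. Upon loading the %cr3 register, the addressing context used by the MMU is instantly changed. But a virtual address (namely e) has meaning relative to a given address context–the address context specifies the physical address to which the virtual address maps. Why can the pointer e be dereferenced both before and after the addressing switch?

The variable e is stored on the kernel stack and points into the envs array, so both the variable and the struct Env it refers to live in the kernel part of the address space. Every environment's page directory maps the kernel region exactly as the kernel's own page directory does, so the address resolves to the same physical memory before and after the switch.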

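4. Whenever the kernel switches from one environment to another, it must ensure the old environment’s registers are saved so they can be restored properly later. Why? Where does this happen?

This is an ordinary context switch: the old environment must later resume exactly where it stopped, so its registers have to survive while other environments use the CPU. The registers are saved when the environment enters the kernel through a system call (for example sys_yield) or a clock interrupt. The entry code in trapentry.S pushes them onto the kernel stack as a Trapframe, and trap() then copies that frame into curenv->env_tf. They are restored by env_pop_tf() when the environment is run again.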

Exercise 7

Implement the system calls described above in kern/syscall.c.

static envid_t sys_exofork(void)
{
    struct Env *e;
    int r = env_alloc(&e, curenv->env_id);
    if (r < 0) {
        return r;
    }
    e->env_status = ENV_NOT_RUNNABLE;
    e->env_tf = curenv->env_tf;
    e->env_tf.tf_regs.reg_eax = 0;
    return e->env_id;
}

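The key point is setting eax in the child's trapframe to 0. Because the child's trapframe is a copy of the parent's, the child starts executing at the instruction right after the parent's fork system call, and the return value of that call is taken from eax. This is how the call returns 0 in the child.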

static int sys_env_set_status(envid_t envid, int status)
{
    if (status != ENV_RUNNABLE && status != ENV_NOT_RUNNABLE) {
        return -E_INVAL;
    }
    struct Env *e;
    if (envid2env(envid, &e, 1) < 0) {
        return -E_BAD_ENV;
    }
    e->env_status = status;
    return 0;
}

As the specification requires, only two values of status are accepted here.

static int sys_page_alloc(envid_t envid, void *va, int perm)
{
    if ((uintptr_t) va >= UTOP || (uintptr_t) va % PGSIZE || (~perm & (PTE_U | PTE_P))) {
        return -E_INVAL;
    }
    struct Env *e;
    if (envid2env(envid, &e, 1) < 0) {
        return -E_BAD_ENV;
    }
    struct PageInfo *pp = page_alloc(ALLOC_ZERO);
    if (pp == NULL) {
        return -E_NO_MEM;
    }
    if (page_insert(e->env_pgdir, pp, va, perm) < 0) {
        page_free(pp);
        return -E_NO_MEM;   
    }
    return 0;
}

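There are quite a few cases to check here. Note that the page must be freed if inserting it fails. The lab's specification also asks for -E_INVAL when perm contains bits outside PTE_SYSCALL, which the check above does not enforce.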
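
The permission test is easier to read in isolation. Below is a minimal sketch, assuming the flag values from inc/mmu.h as I remember them; perm_ok is a hypothetical helper, and unlike the handler above it also rejects bits outside PTE_SYSCALL, as the lab's comments describe.

```c
#include <assert.h>

/* Page-table flag bits, assumed to match JOS's inc/mmu.h. */
#define PTE_P     0x001   /* present */
#define PTE_W     0x002   /* writable */
#define PTE_U     0x004   /* user-accessible */
#define PTE_AVAIL 0xE00   /* bits available for software use */
#define PTE_SYSCALL (PTE_AVAIL | PTE_P | PTE_W | PTE_U)

/*
 * Return 1 if `perm` is acceptable for a user-requested mapping:
 * PTE_U and PTE_P must both be set, and nothing outside PTE_SYSCALL.
 */
int perm_ok(int perm)
{
    if (~perm & (PTE_U | PTE_P))
        return 0;           /* a required bit is missing */
    if (perm & ~PTE_SYSCALL)
        return 0;           /* a forbidden bit is set */
    return 1;
}
```

The expression `~perm & mask` is nonzero exactly when some bit of mask is absent from perm, which makes it a compact "all of these bits are set" test.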

static int sys_page_map(envid_t srcenvid, void *srcva, envid_t dstenvid, void *dstva, int perm)
{
    if ((uintptr_t) srcva >= UTOP || (uintptr_t) srcva % PGSIZE || (uintptr_t) dstva >= UTOP || (uintptr_t) dstva % PGSIZE || (~perm & (PTE_U | PTE_P))) {
        return -E_INVAL;    
    }
    struct Env *srce, *dste;
    if (envid2env(srcenvid, &srce, 1) < 0 || envid2env(dstenvid, &dste, 1) < 0) {
        return -E_BAD_ENV;
    }
    pte_t *pte;
    struct PageInfo *pp = page_lookup(srce->env_pgdir, srcva, &pte);
    if (pp == NULL || ((~(*pte) & PTE_W) && (perm & PTE_W))) {
        return -E_INVAL;
    }
    if (page_insert(dste->env_pgdir, pp, dstva, perm) < 0) {
        return -E_NO_MEM;   
    }
    return 0;
}

Watch for the case where the source page is read-only but perm asks for write permission.

static int sys_page_unmap(envid_t envid, void *va)
{
    if ((uintptr_t) va >= UTOP || (uintptr_t) va % PGSIZE) {
        return -E_INVAL;
    }
    struct Env *e;
    if (envid2env(envid, &e, 1) < 0) {
        return -E_BAD_ENV;
    }
    page_remove(e->env_pgdir, va);
    return 0;
}

The corresponding branches also have to be added to syscall().

// syscall()
case SYS_page_alloc:
    return sys_page_alloc(a1, (void *) a2, a3);
case SYS_page_map:
    return sys_page_map(a1, (void *) a2, a3, (void *) a4, a5);
case SYS_page_unmap:
    return sys_page_unmap(a1, (void *) a2);
case SYS_exofork:
    return sys_exofork();
case SYS_env_set_status:
    return sys_env_set_status(a1, a2);

This completes Part A; make grade passes.

Exercise 8

Implement the sys_env_set_pgfault_upcall system call.

static int sys_env_set_pgfault_upcall(envid_t envid, void *func)
{
    struct Env *e;
    if (envid2env(envid, &e, 1) < 0) {
        return -E_BAD_ENV;  
    }
    e->env_pgfault_upcall = func;
    return 0;
}

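Before doing the next few exercises, it helps to sort out the call chain for user-level page fault handling.

The user program calls set_pgfault_handler() in pgfault.c, which records the handler, allocates a page for the user exception stack, and sets env_pgfault_upcall to _pgfault_upcall.
When a page fault occurs in user mode, trap_dispatch() calls page_fault_handler(), which sets up the user exception stack, points esp at the top of that stack and eip at the previously registered env_pgfault_upcall (that is, _pgfault_upcall), and calls env_run().
Execution then resumes at _pgfault_upcall in pfentry.S, which first calls the registered handler and, once it returns, restores the registers saved in the trap frame and continues executing the user code.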

Exercise 9

Implement the code in page_fault_handler in kern/trap.c required to dispatch page faults to the user-mode handler.

// page_fault_handler()
// LAB 4: Your code here.
if (curenv->env_pgfault_upcall != NULL) {
    uintptr_t esp;

    if (tf->tf_esp >= UXSTACKTOP - PGSIZE && tf->tf_esp < UXSTACKTOP) {
        esp = tf->tf_esp - 4 - sizeof(struct UTrapframe);   
    } else {
        esp = UXSTACKTOP - sizeof(struct UTrapframe);
    }

    user_mem_assert(curenv, (void *) esp, sizeof(struct UTrapframe), PTE_W | PTE_U | PTE_P);

    struct UTrapframe *utf = (struct UTrapframe *) (esp);
    utf->utf_fault_va = fault_va;
    utf->utf_err = tf->tf_err;
    utf->utf_regs = tf->tf_regs;
    utf->utf_eip = tf->tf_eip;
    utf->utf_eflags = tf->tf_eflags;
    utf->utf_esp = tf->tf_esp;

    tf->tf_esp = esp;
    tf->tf_eip = (uintptr_t) curenv->env_pgfault_upcall;
    env_run(curenv);
}

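Push a UTrapframe onto the user exception stack. The code has to allow for nested faults: in that case it first leaves one empty word below the previous top of the stack and then pushes the UTrapframe. That empty word is used later. Finally it sets esp and eip and calls env_run().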
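
Where the frame lands in each case can be computed directly. A minimal sketch under stated assumptions: UXSTACKTOP and the 52-byte frame size are what I believe JOS uses, utf_address is a hypothetical helper, and the lower bound of the exception stack is treated as inclusive.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE     4096u
#define UXSTACKTOP 0xeec00000u   /* assumed; equals UTOP in JOS */
#define UTF_SIZE   52u           /* assumed sizeof(struct UTrapframe): 13 32-bit words */

/* Return the address at which the kernel would build the UTrapframe. */
uint32_t utf_address(uint32_t tf_esp)
{
    int on_exception_stack =
        tf_esp >= UXSTACKTOP - PGSIZE && tf_esp < UXSTACKTOP;

    if (on_exception_stack)
        return tf_esp - 4 - UTF_SIZE;   /* nested fault: leave one scratch word */
    return UXSTACKTOP - UTF_SIZE;       /* first fault: start at the top */
}
```

A fault taken on the normal stack places the frame at the very top of the exception stack; a fault taken while already on the exception stack places the next frame 56 bytes lower, the extra 4 bytes being the scratch word.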

Exercise 10

Implement the _pgfault_upcall routine in lib/pfentry.S.

// LAB 4: Your code here.
movl 48(%esp), %eax
subl $4, %eax
movl %eax, 48(%esp)
movl 40(%esp), %ebx
movl %ebx, (%eax)

// LAB 4: Your code here.
addl $8, %esp
popal

// LAB 4: Your code here.
addl $4, %esp
popfl

// LAB 4: Your code here.
popl %esp

// LAB 4: Your code here.
ret

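First block: write the trap-time eip into the empty word just below the stack top that the trap-time esp points to (which may be on the normal stack or on the exception stack), and move the trap-time esp down so that it points at that word.
Second block: skip fault_va and err, then restore the general-purpose registers.
Third block: skip eip, then restore eflags. Restoring eip first would change where execution continues, so it has to be skipped here.
Fourth block: restore esp. If the first block had not moved the trap-time esp down by one word, esp would now point at the previous stack top.
Fifth block: thanks to the first block, esp now points at the trap-time eip, so a plain ret resumes execution at the faulting instruction.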
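
The constants 40 and 48 in the assembly are field offsets inside struct UTrapframe. The sketch below reproduces the two structs as I understand them from inc/trap.h (treat the layout as an assumption) and lets the compiler confirm the offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register block in the order that pushal leaves it in memory. */
struct PushRegs {
    uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
    uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};

/* Frame the kernel builds on the user exception stack. */
struct UTrapframe {
    uint32_t utf_fault_va;
    uint32_t utf_err;
    struct PushRegs utf_regs;
    uint32_t utf_eip;
    uint32_t utf_eflags;
    uint32_t utf_esp;
};

/* Offsets relative to %esp when the restore code starts. */
enum {
    UTF_REGS_OFF   = offsetof(struct UTrapframe, utf_regs),
    UTF_EIP_OFF    = offsetof(struct UTrapframe, utf_eip),
    UTF_EFLAGS_OFF = offsetof(struct UTrapframe, utf_eflags),
    UTF_ESP_OFF    = offsetof(struct UTrapframe, utf_esp),
};

/* Compile-time checks of the numbers hard-coded in the assembly. */
_Static_assert(UTF_REGS_OFF == 8, "fault_va and err come first");
_Static_assert(UTF_EIP_OFF == 40, "eip follows the 32-byte register block");
_Static_assert(UTF_ESP_OFF == 48, "esp is the last word of the frame");
```

This also explains the stack adjustments: addl $8 skips fault_va and err to reach the registers, and addl $4 after popal skips eip to reach eflags.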

Exercise 11

Finish set_pgfault_handler() in lib/pgfault.c.

// set_pgfault_handler()
// LAB 4: Your code here.
envid_t id = sys_getenvid();
if ((r = sys_page_alloc(id, (void *) (UXSTACKTOP - PGSIZE), PTE_W | PTE_U | PTE_P)) < 0 ||
    (r = sys_env_set_pgfault_upcall(id, _pgfault_upcall)) < 0) {
    panic("set_pgfault_handler: %e", r);
}

Allocate a page for the user exception stack and register the page fault upcall.

Exercise 12

Implement fork, duppage and pgfault in lib/fork.c.

// pgfault()
// LAB 4: Your code here.
if (!(err & FEC_WR) || !(uvpt[PGNUM(addr)] & PTE_COW)) {
    panic("pgfault: failed!");
}

// LAB 4: Your code here.
if ((r = sys_page_alloc(0, (void *) PFTEMP, PTE_U | PTE_W | PTE_P)) < 0) {
    panic("pgfault: %e", r);
}
memcpy((void *) PFTEMP, ROUNDDOWN(addr, PGSIZE), PGSIZE);
if ((r = sys_page_map(0, (void *) PFTEMP, 0, ROUNDDOWN(addr, PGSIZE), PTE_U | PTE_W | PTE_P)) < 0) {
    panic("pgfault: %e", r);
}
if ((r = sys_page_unmap(0, (void *) PFTEMP)) < 0) {
    panic("pgfault: %e", r);
}

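Check that err and the page table entry satisfy the conditions (a write fault on a copy-on-write page).
Allocate a page at address PFTEMP.
Copy the contents of the faulting page into the newly allocated page.
Map the virtual address addr (rounded down to a page boundary) to the new page.
Unmap address PFTEMP.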

// duppage()
// LAB 4: Your code here.
int perm = uvpt[pn] & PTE_SYSCALL; // keep only the bits a system call may pass, dropping accessed/dirty
if (perm & (PTE_W | PTE_COW)) {
    perm |= PTE_COW;
    perm &= ~PTE_W;
}
if ((r = sys_page_map(0, (void *) (pn * PGSIZE), envid, (void *) (pn * PGSIZE), perm)) < 0) {
    panic("duppage: %e", r);
}
if ((r = sys_page_map(0, (void *) (pn * PGSIZE), 0, (void *) (pn * PGSIZE), perm)) < 0) {
    panic("duppage: %e", r);
}
return 0;

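First work out the permissions, then create the mappings. Note that the parent's own page table entry has to be remapped as well, so that the parent's copy also becomes copy-on-write.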
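
The permission rewrite is the heart of duppage and can be tested on its own. A minimal sketch: cow_perm is a hypothetical helper, and the flag values are assumed to match inc/mmu.h and inc/lib.h.

```c
#include <assert.h>

#define PTE_P   0x001   /* present */
#define PTE_W   0x002   /* writable */
#define PTE_U   0x004   /* user-accessible */
#define PTE_COW 0x800   /* software-defined copy-on-write marker (assumed value) */

/*
 * Compute the permissions to use for both parent and child.
 * Writable or already copy-on-write pages become read-only and
 * copy-on-write; read-only pages are shared unchanged.
 */
int cow_perm(int perm)
{
    if (perm & (PTE_W | PTE_COW)) {
        perm |= PTE_COW;
        perm &= ~PTE_W;
    }
    return perm;
}
```

Clearing PTE_W is what forces the first write from either side to fault, and PTE_COW is how the user-level handler recognizes that the fault should be resolved by copying.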

// fork()
int r;
set_pgfault_handler(pgfault);
envid_t envid = sys_exofork();

if (envid == 0) {
    thisenv = &envs[ENVX(sys_getenvid())];
    return 0;
}
for (uintptr_t va = 0; va < USTACKTOP; va += PGSIZE) {
    if ((uvpd[PDX(va)] & PTE_P) && (uvpt[PGNUM(va)] & PTE_P)) {
        duppage(envid, PGNUM(va));
    }
}
if ((r = sys_page_alloc(envid, (void *) (UXSTACKTOP - PGSIZE), PTE_U | PTE_W | PTE_P)) < 0) {
    return r;
}
extern void _pgfault_upcall(void);
if ((r = sys_env_set_pgfault_upcall(envid, _pgfault_upcall)) < 0) {
    return r;
}
if ((r = sys_env_set_status(envid, ENV_RUNNABLE)) < 0) {
    return r;
}

return envid;

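Install the page fault handler, create the child, map the pages into the child, allocate the child's user exception stack and set its pgfault_upcall entry point, then mark the child runnable.
This completes Part B; make grade passes.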

Exercise 13

Modify kern/trapentry.S and kern/trap.c to initialize the appropriate entries in the IDT and provide handlers for IRQs 0 through 15. Then modify the code in env_alloc() in kern/env.c to ensure that user environments are always run with interrupts enabled.

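Add the corresponding entries to the IDT. This was already done for other vectors in Lab 3 and the new code follows the same pattern, so it is not reproduced here.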

// env.c
// env_alloc()
// LAB 4: Your code here.
e->env_tf.tf_eflags |= FL_IF;

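Interrupts have been masked ever since the bootloader ran, so all that is needed here is to set the FL_IF bit in the environment's saved eflags. The flag takes effect when the environment is entered, which lets it receive interrupts.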

Exercise 14

Modify the kernel’s trap_dispatch() function so that it calls sched_yield() to find and run a different environment whenever a clock interrupt takes place.

// trap_dispatch()
case IRQ_TIMER + IRQ_OFFSET:
    lapic_eoi();
    sched_yield();

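Adding the corresponding branch to the function is enough. lapic_eoi() has to be called first: it tells the LAPIC that the interrupt has been received and dispatched, so the LAPIC removes it from its queue of pending requests. No break is needed here because sched_yield() does not return.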

Exercise 15

Implement sys_ipc_recv and sys_ipc_try_send in kern/syscall.c. Then implement the ipc_recv and ipc_send functions in lib/ipc.c.

static int sys_ipc_try_send(envid_t envid, uint32_t value, void *srcva, unsigned perm)
{
    struct Env *e;
    if (envid2env(envid, &e, 0) < 0) {
        return -E_BAD_ENV;
    }
    if (!(e->env_ipc_recving)) {
        return -E_IPC_NOT_RECV;
    }
    if (e->env_ipc_dstva && srcva && (uintptr_t) srcva < UTOP) {
        int r = sys_page_map(0, srcva, envid, e->env_ipc_dstva, perm);
        if (r < 0) {
            return r;
        }
        e->env_ipc_perm = perm;
    } else {
        e->env_ipc_perm = 0;
    }
    e->env_ipc_value = value;
    e->env_ipc_from = curenv->env_id;

    e->env_tf.tf_regs.reg_eax = 0;
    e->env_ipc_recving = false;
    e->env_status = ENV_RUNNABLE;

    return 0;
}

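The function first checks its arguments and, if a page is to be sent, sets up the mapping. After filling in the IPC fields, it sets eax in the target's trapframe to 0 so that the receiver sees a successful return value (a successful sys_ipc_recv gives up the CPU and therefore never returns directly). It then clears the receiving flag and marks the target runnable so that the scheduler can pick it up.
Many of the cases listed in the specification are not checked here because sys_page_map already checks them. One caveat: sys_page_map looks up both environments with checkperm set to 1, so this shortcut only works when the receiver is the sender itself or one of its children. Sending a page to an unrelated environment would require doing the lookup and page_insert directly.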

static int sys_ipc_recv(void *dstva)
{
    if ((uintptr_t) dstva >= UTOP || (dstva && (uintptr_t) dstva % PGSIZE)) {
        return -E_INVAL;
    }
    curenv->env_ipc_recving = true;
    curenv->env_ipc_dstva = dstva;
    curenv->env_status = ENV_NOT_RUNNABLE;

    sched_yield();
}

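Again the arguments are checked first. The function then marks the environment as receiving, records the destination address, sets the status to not runnable, and finally gives up the CPU by calling the scheduler.
The corresponding branches also have to be added to syscall(); that code is not shown here.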
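
The handshake between the two system calls is a small state machine, which the following user-space sketch models. Everything here is hypothetical scaffolding: struct mailbox stands in for the IPC fields of struct Env, the two functions are simplified analogues of sys_ipc_recv and sys_ipc_try_send without page transfer, and the error number is assumed to match inc/error.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define E_IPC_NOT_RECV 7   /* assumed to match inc/error.h */

/* Hypothetical stand-in for the IPC fields of struct Env. */
struct mailbox {
    bool     recving;   /* receiver is blocked waiting for a message */
    uint32_t value;     /* last delivered value */
    int      from;      /* id of the sender */
    bool     runnable;  /* receiver may be scheduled again */
};

/* Receiver side: mark the mailbox as waiting and block. */
void mbox_recv(struct mailbox *m)
{
    m->recving = true;
    m->runnable = false;
}

/* Sender side: deliver only if the receiver is waiting. */
int mbox_try_send(struct mailbox *m, int sender, uint32_t value)
{
    if (!m->recving)
        return -E_IPC_NOT_RECV;
    m->value = value;
    m->from = sender;
    m->recving = false;   /* one message per receive */
    m->runnable = true;   /* wake the receiver */
    return 0;
}
```

Because the flag is cleared on delivery, a second sender arriving before the receiver calls receive again gets -E_IPC_NOT_RECV and has to retry.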

int32_t ipc_recv(envid_t *from_env_store, void *pg, int *perm_store)
{
    int r;
    r = sys_ipc_recv(pg);

    if (from_env_store) {
        *from_env_store = r < 0 ? 0 : thisenv->env_ipc_from;
    }
    if (perm_store) {
        *perm_store = r < 0 ? 0 : thisenv->env_ipc_perm;
    }
    if (r < 0) {
        return r;
    }
    return thisenv->env_ipc_value;
}

This simply stores the results through whichever pointers the caller passed in, so there is not much to it.

void ipc_send(envid_t to_env, uint32_t val, void *pg, int perm)
{
    while (1) {
        int r = sys_ipc_try_send(to_env, val, pg, perm);
        if (r == 0) {
            break;              // delivered
        }
        if (r != -E_IPC_NOT_RECV) {
            panic("ipc_send: %e", r);
        }
        sys_yield();            // receiver not ready yet: give up the CPU, then retry
    }
}

Keep trying to send the message. If the target has not set ipc_recving, try again, until the send succeeds or some other error occurs.
This completes Part C; make grade passes.
