"Linux Kernel Design and Implementation" Reading Notes-System Call

C library, API and system calls

Under normal circumstances, applications are programmed through an application programming interface (API) implemented in user space rather than directly through system calls.

Linux system calls, like those of most Unix systems, are provided as part of the C library.

The C library implements the main APIs of the Unix system, including standard C library functions and system call interfaces.

The most popular API in Unix is based on the POSIX standard.
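To make the relationship between the API and the system call concrete, here is a minimal user-space sketch, assuming Linux with glibc. It asks for the process ID twice: once through the POSIX getpid() wrapper, and once through the generic syscall() function with the number SYS_getpid. The helper names pid_via_libc() and pid_via_syscall() are made up for this illustration.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* getpid(), syscall() */

/* Process ID obtained through the C library's POSIX wrapper. */
pid_t pid_via_libc(void)
{
    return getpid();
}

/* The same request issued by system call number, bypassing the wrapper. */
pid_t pid_via_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both functions end up in the same kernel service, so they should return the same value. Applications normally use the first form; the second only shows what the wrapper does on their behalf.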

 

Examples and instructions

getpid() system call:

SYSCALL_DEFINE0(getpid)
{
    return task_tgid_vnr(current);
}

SYSCALL_DEFINE0 is a macro that defines a system call without parameters. It is one of a series of macros:

#define SYSCALL_DEFINE0(name)      asmlinkage long sys_##name(void)
#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE4(name, ...) SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)

The expanded definition actually looks like this:

asmlinkage long sys_getpid(void)

Here:

asmlinkage is a directive that tells the compiler to look only on the stack for this function's arguments.

The return type is long for compatibility between 32-bit and 64-bit systems: a system call that returns int in user space returns long in the kernel.

sys_xxx is the naming convention that all system calls in Linux follow.
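The token pasting behind SYSCALL_DEFINE0 can be tried out in user space. The sketch below is only an illustration: asmlinkage is defined as empty here (in the kernel it expands to compiler attributes), and the name answer is hypothetical, not a real system call.

```c
/* Stand-in for the kernel's asmlinkage; empty in this user-space sketch. */
#define asmlinkage

/* Same shape as the kernel macro: pastes "sys_" in front of the name. */
#define SYSCALL_DEFINE0(name) asmlinkage long sys_##name(void)

/* Expands to: long sys_answer(void) { return 42; } */
SYSCALL_DEFINE0(answer)
{
    return 42;
}
```

After preprocessing, the function can be called as sys_answer(), which shows how every definition written with these macros ends up following the sys_ naming convention.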

 

System call list

Linux has a system call table that records all registered system calls and assigns each one a unique system call number, which is used to refer to it.

For example (arch\x86\kernel\syscall_table_32.S):

ENTRY(sys_call_table)
	.long sys_restart_syscall	/* 0 - old "setup()" system call, used for restarting */
	.long sys_exit
	.long ptregs_fork
	.long sys_read
	.long sys_write
	.long sys_open		/* 5 */
	.long sys_close
	.long sys_waitpid
	.long sys_creat
	.long sys_link
	.long sys_unlink	/* 10 */
	.long ptregs_execve
	.long sys_chdir
	.long sys_time
// rest omitted

If a system call is not implemented, its slot in the table (slots are numbered from 0, and the slot index is the system call number) is not reused for another call. Instead, the slot points to sys_ni_syscall(), which marks an unimplemented system call and does nothing except return -ENOSYS:

/*
 * Non-implemented system calls get redirected here.
 */
asmlinkage long sys_ni_syscall(void)
{
    return -ENOSYS;
}
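The table lookup and the role of the "not implemented" stub can be sketched in user space with an array of function pointers. Everything here is hypothetical: the demo_ names, the return values, and the three-entry table are made up, and the real kernel table is built in assembly as shown above.

```c
#include <errno.h>    /* ENOSYS */
#include <stddef.h>   /* size_t */

typedef long (*syscall_fn)(void);

/* Stub that fills every unimplemented slot, like sys_ni_syscall(). */
long demo_ni_syscall(void) { return -ENOSYS; }

/* Two pretend system calls with made-up results. */
long demo_getpid(void) { return 1234; }
long demo_getuid(void) { return 1000; }

/* The slot index is the system call number. */
static const syscall_fn demo_call_table[] = {
    demo_getpid,      /* 0 */
    demo_ni_syscall,  /* 1 - unimplemented, the number is not reused */
    demo_getuid,      /* 2 */
};

#define DEMO_NR_MAX (sizeof demo_call_table / sizeof demo_call_table[0] - 1)

/* Range-check the number, then call through the table. */
long demo_dispatch(unsigned long nr)
{
    if (nr > DEMO_NR_MAX)
        return -ENOSYS;
    return demo_call_table[nr]();
}
```

Here demo_dispatch(1) and any out-of-range number both yield -ENOSYS, while the other slots keep their numbers no matter what happens to slot 1.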

 

Triggering a system call

Software interrupt: user space raises an exception, which makes the system switch to kernel mode and run the exception handler, and this exception handler is the system call handler. On x86 the software interrupt is int $0x80 (interrupt number 128) and the handler is system_call(). The following is part of the system_call() entry code for the x86_64 platform (in this file the entry point is reached through the syscall instruction rather than int $0x80):

arch\x86\kernel\entry_64.S:
ENTRY(system_call)
    CFI_STARTPROC   simple
    CFI_SIGNAL_FRAME
    CFI_DEF_CFA rsp,KERNEL_STACK_OFFSET
    CFI_REGISTER    rip,rcx
    /*CFI_REGISTER  rflags,r11*/
    SWAPGS_UNSAFE_STACK
    /*
     * A hypervisor implementation might want to use a label
     * after the swapgs, so that it can do the swapgs
     * for the guest and jump here on syscall.
     */
ENTRY(system_call_after_swapgs)
    movq    %rsp,PER_CPU_VAR(old_rsp)
    movq    PER_CPU_VAR(kernel_stack),%rsp
    /*
     * No need to follow this irqs off/on section - it's straight
     * and short:
     */
    ENABLE_INTERRUPTS(CLBR_NONE)
    SAVE_ARGS 8,1
    movq  %rax,ORIG_RAX-ARGOFFSET(%rsp)
    movq  %rcx,RIP-ARGOFFSET(%rsp)
    CFI_REL_OFFSET rip,RIP-ARGOFFSET
    GET_THREAD_INFO(%rcx)
    testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%rcx)
    jnz tracesys
system_call_fastpath:
    cmpq $__NR_syscall_max,%rax
    ja badsys
    movq %r10,%rcx
    call *sys_call_table(,%rax,8)  # XXX:    rip relative
    movq %rax,RAX-ARGOFFSET(%rsp)

Another way to trigger a system call is the sysenter instruction.

The system call number also has to be passed to the kernel; on x86 it is passed in the eax register.

Besides the system call number, most system calls also need arguments.

How the arguments are passed differs from platform to platform, and the description in the book also differs from the code shown here, so the details are not covered in these notes.
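As a rough illustration of passing the system call number in a register, the sketch below issues getpid directly with the syscall instruction on x86_64, placing the number in rax (the 64-bit counterpart of eax). It assumes GCC or Clang inline assembly on Linux; on other architectures it falls back to the C library's syscall() function. The name raw_getpid() is made up for this example.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall() for the fallback path */

/* Issue getpid without going through the getpid() wrapper. */
long raw_getpid(void)
{
#if defined(__x86_64__)
    long ret;
    /* rax carries the system call number in and the result out.
     * The syscall instruction itself overwrites rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Other architectures use different registers and instructions. */
    return syscall(SYS_getpid);
#endif
}
```

getpid takes no arguments, so only the number register matters here. The result comes back in the same register, which corresponds to the `movq %rax,RAX-ARGOFFSET(%rsp)` step at the end of the entry code above.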

 

System call context

The kernel is in process context when executing a system call.

The current pointer points to the current task, that is, the process that issued the system call.

The kernel can sleep while executing a system call (an interrupt handler, by contrast, cannot) and can also be preempted, so system calls must be reentrant.

 

Avoid implementing new system calls

If you only need to exchange simple information, consider these alternatives to a new system call:

Implement a device node and provide read() and write() for it. Use ioctl() to change specific settings or retrieve specific information.

Certain interfaces, such as semaphores, can be represented as file descriptors and manipulated as such.

Expose the information as a file in an appropriate location in sysfs.
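From the user-space side, the file descriptor approach looks like the sketch below. It is only an analogy: an ordinary pipe stands in for a custom device node, write() sends data, and the existing FIONREAD ioctl retrieves one specific piece of information (the number of unread bytes). The function name pending_after_write() is hypothetical.

```c
#include <sys/ioctl.h>   /* ioctl(), FIONREAD */
#include <unistd.h>      /* pipe(), write(), close() */

/* Write len bytes into a fresh pipe, then ask through ioctl() how many
 * bytes are waiting to be read. Returns that count, or -1 on error. */
int pending_after_write(const char *msg, size_t len)
{
    int fds[2];
    int pending = -1;

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], msg, len) == (ssize_t)len) {
        /* Query a specific piece of state through the descriptor. */
        if (ioctl(fds[0], FIONREAD, &pending) != 0)
            pending = -1;
    }

    close(fds[0]);
    close(fds[1]);
    return pending;
}
```

A driver that implements read(), write() and ioctl() for its device node is used in the same way, which is why no new system call is needed for this kind of exchange.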

 

Origin blog.csdn.net/jiangwei0512/article/details/106143215