C library, API and system calls
Under normal circumstances, applications are programmed through an application programming interface (API) implemented in user space rather than directly through system calls.
On Linux, as on most Unix systems, the system call interface is provided in part by the C library.
The C library implements the main APIs of the Unix system, including standard C library functions and system call interfaces.
The most popular API in Unix is based on the POSIX standard.
Examples and instructions
getpid() system call:
SYSCALL_DEFINE0(getpid)
{
	return task_tgid_vnr(current); /* returns current->tgid */
}
SYSCALL_DEFINE0 is a macro that defines a system call without parameters. It is one of a series of macros:
#define SYSCALL_DEFINE0(name) asmlinkage long sys_##name(void)
#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE4(name, ...) SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
The expanded definition actually looks like this:
asmlinkage long sys_getpid(void)
Here:
asmlinkage is a compiler directive that tells the compiler to look for the function's arguments only on the stack.
The return type is long for compatibility between 32-bit and 64-bit systems: a system call that returns int in user space returns long in the kernel.
sys_xxx is the naming convention that all Linux system calls follow; the system call getpid() is implemented in the kernel as sys_getpid().
System call list
Linux keeps a system call table that records every registered system call. Each system call is assigned a unique system call number, and that number is the index used to look the call up in the table.
For example (arch/x86/kernel/syscall_table_32.S):
ENTRY(sys_call_table)
.long sys_restart_syscall /* 0 - old "setup()" system call, used for restarting */
.long sys_exit
.long ptregs_fork
.long sys_read
.long sys_write
.long sys_open /* 5 */
.long sys_close
.long sys_waitpid
.long sys_creat
.long sys_link
.long sys_unlink /* 10 */
.long ptregs_execve
.long sys_chdir
.long sys_time
/* remaining entries omitted */
If a system call is not implemented, its position in the table (the system call number, counted from 0) is not reused for another call. Instead, the slot points to sys_ni_syscall(), which marks an unimplemented system call and does nothing except return -ENOSYS:
/*
* Non-implemented system calls get redirected here.
*/
asmlinkage long sys_ni_syscall(void)
{
	return -ENOSYS;
}
Triggering a system call
Software interrupt: user space raises an exception, which makes the system switch to kernel mode and run the exception handler; that handler is the system call handler. On x86 the software interrupt is number 128 (int $0x80) and the handler is system_call(). The following is part of the system_call entry code for the x86_64 platform, where the entry point is reached through the syscall instruction rather than through the interrupt:
arch/x86/kernel/entry_64.S:
ENTRY(system_call)
CFI_STARTPROC simple
CFI_SIGNAL_FRAME
CFI_DEF_CFA rsp,KERNEL_STACK_OFFSET
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
SWAPGS_UNSAFE_STACK
/*
* A hypervisor implementation might want to use a label
* after the swapgs, so that it can do the swapgs
* for the guest and jump here on syscall.
*/
ENTRY(system_call_after_swapgs)
movq %rsp,PER_CPU_VAR(old_rsp)
movq PER_CPU_VAR(kernel_stack),%rsp
/*
* No need to follow this irqs off/on section - it's straight
* and short:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
SAVE_ARGS 8,1
movq %rax,ORIG_RAX-ARGOFFSET(%rsp)
movq %rcx,RIP-ARGOFFSET(%rsp)
CFI_REL_OFFSET rip,RIP-ARGOFFSET
GET_THREAD_INFO(%rcx)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%rcx)
jnz tracesys
system_call_fastpath:
cmpq $__NR_syscall_max,%rax
ja badsys
movq %r10,%rcx
call *sys_call_table(,%rax,8) # XXX: rip relative
movq %rax,RAX-ARGOFFSET(%rsp)
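Another way to trigger a system call on x86 is the sysenter instruction.
The system call number must also be passed to the kernel. On x86 it is passed in the eax register (rax in the x86_64 code above, where it is compared against __NR_syscall_max and used to index sys_call_table).
Besides the system call number, most system calls also need arguments.
How arguments are passed is architecture-specific. On x86 they are passed in registers; in the code above, for instance, movq %r10,%rcx moves one argument into the register that the called C function expects. The book's description also differs from this code, so the details do not need special attention here.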
System call context
The kernel is in process context when it executes a system call.
The current pointer points to the current task, that is, the process that issued the system call.
In process context the kernel may sleep while executing a system call (an interrupt handler, by contrast, cannot sleep), so system calls must be reentrant.
Avoid implementing new system calls
If you only need to exchange simple information, consider an alternative: implement a device node and provide read() and write() for it. Use ioctl() to manipulate specific settings or retrieve specific information.
Some interfaces, such as semaphores, can be represented as file descriptors and manipulated as such.
Expose the additional information as a file in a suitable location in sysfs.