The Wonderful Journey of a NULL Pointer

Today I will show you what happens when a NULL pointer is dereferenced. To do that, we have to dig into the operating system and see why the access is reported as a segmentation fault.

Presumably everyone who programs, especially in C, has written a NULL pointer bug at some point. A typical case: a pointer of type int * is initialized to NULL and then written through before any memory has been allocated for it, so the program dies with a segmentation fault at run time.

#include <stdio.h>

int main()
{
    int *p = NULL;      /* p holds address 0 */

    *p = 123;           /* write through NULL: the fault happens here */
    return 0;
}


root:~/test$ ./a.out
Segmentation fault (core dumped)

It is only a few lines of code, but they take a long trip through the operating system. Today I will take you along on this wonderful journey.

Starting the journey

After compiling the program, we run it with ./a.out. The shell (bash here) calls fork to create a child process, and the child then calls exec to replace its image with our NULL pointer program. Process creation itself is covered in other articles on that topic.

When the program is loaded, the operating system maps several segments into its address space. The common ones are:

  • Data segment: divided into a read-only data segment and a readable and writable data segment
  • Code segment: the code we wrote; its permissions are normally read and execute (R-X)
  • Heap: the memory handed out by malloc; large requests are usually served from separate mmap regions
  • Stack: holds function arguments, local variables and return addresses for function calls
  • Shared libraries: present in every dynamically linked process; a program that calls functions provided by glibc needs the glibc shared library mapped

The execution journey

Once the environment is set up, the program can carry out its mission. Let us disassemble the NULL pointer program. The full disassembly is long, so we only look at the main function. The tool used here is aarch64-linux-gnu-objdump.

0000000000400530 <main>:
  400530:       d10043ff        sub     sp, sp, #0x10
  400534:       f90007ff        str     xzr, [sp,#8]
  400538:       f94007e0        ldr     x0, [sp,#8]
  40053c:       52800f61        mov     w1, #0x7b                       // #123
  400540:       b9000001        str     w1, [x0]
  400544:       52800000        mov     w0, #0x0                        // #0
  400548:       910043ff        add     sp, sp, #0x10
  40054c:       d65f03c0        ret

Before main is reached, start-up code and the operating system have already done some work for us, but we will not look at that part for now. On entry, main first allocates its stack frame (sub sp, sp, #0x10), stores NULL into p and loads it back into x0. The CPU then executes str w1, [x0], which corresponds to *p = 123 in C. When the CPU executes this instruction, the following happens:

  • The CPU first hands the virtual address to the MMU, and the MMU hardware walks the page tables to translate the virtual address into a physical address.
  • During translation the MMU also checks the access: whether the virtual address is mapped at all, and whether the access respects the read and write permissions of the page.
  • If a mapping between the virtual address and a physical address exists, the physical address is returned and the CPU performs the access.
  • If no mapping exists, a page fault exception is raised so that the kernel can establish the virtual-to-physical mapping.
  • Because a page table walk is slow, the TLB caches recently used translations, and it is consulted before the page tables to speed translation up.
  • In our example p is NULL, so the access targets virtual address 0x0. Nothing is mapped there, so the MMU judges the access illegal and a data abort exception is raised.
  • The exception makes the CPU jump to the exception vector table of the architecture. Here we take ARM64 as an example.

The exception journey

The CPU tries to access address 0x0. The MMU detects the illegal access and raises an exception, and the CPU jumps to the ARM64 exception vector table:

/*
 * Exception vectors.
 */
	.pushsection ".entry.text", "ax"

	.align	11
ENTRY(vectors)
	kernel_ventry	1, sync_invalid			// Synchronous EL1t
	kernel_ventry	1, irq_invalid			// IRQ EL1t
	kernel_ventry	1, fiq_invalid			// FIQ EL1t
	kernel_ventry	1, error_invalid		// Error EL1t

	kernel_ventry	1, sync				// Synchronous EL1h
	kernel_ventry	1, irq				// IRQ EL1h
	kernel_ventry	1, fiq_invalid			// FIQ EL1h
	kernel_ventry	1, error			// Error EL1h

	kernel_ventry	0, sync				// Synchronous 64-bit EL0
	kernel_ventry	0, irq				// IRQ 64-bit EL0
	kernel_ventry	0, fiq_invalid			// FIQ 64-bit EL0
	kernel_ventry	0, error			// Error 64-bit EL0

The ARM64 architecture defines four exception levels: EL0, EL1, EL2 and EL3. EL0 runs user space, EL1 the Linux kernel, EL2 the hypervisor and EL3 the secure monitor. Our exception is raised from EL0 in 64-bit state, so the CPU enters the "Synchronous 64-bit EL0" vector, which leads to the EL0 synchronous exception handler:

/*
 * EL0 mode handlers.
 */
	.align	6
el0_sync:
	kernel_entry 0
	mrs	x25, esr_el1			// read the syndrome register
	lsr	x24, x25, #ESR_ELx_EC_SHIFT	// exception class
	cmp	x24, #ESR_ELx_EC_SVC64		// SVC in 64-bit state
	b.eq	el0_svc
	cmp	x24, #ESR_ELx_EC_DABT_LOW	// data abort in EL0
	b.eq	el0_da
	cmp	x24, #ESR_ELx_EC_IABT_LOW	// instruction abort in EL0
	b.eq	el0_ia
	cmp	x24, #ESR_ELx_EC_FP_ASIMD	// FP/ASIMD access
	b.eq	el0_fpsimd_acc
	cmp	x24, #ESR_ELx_EC_SVE		// SVE access
	b.eq	el0_sve_acc
	cmp	x24, #ESR_ELx_EC_FP_EXC64	// FP/ASIMD exception
	b.eq	el0_fpsimd_exc
	cmp	x24, #ESR_ELx_EC_SYS64		// configurable trap
	ccmp	x24, #ESR_ELx_EC_WFx, #4, ne
	b.eq	el0_sys
	cmp	x24, #ESR_ELx_EC_SP_ALIGN	// stack alignment exception
	b.eq	el0_sp_pc
	cmp	x24, #ESR_ELx_EC_PC_ALIGN	// pc alignment exception
	b.eq	el0_sp_pc
	cmp	x24, #ESR_ELx_EC_UNKNOWN	// unknown exception in EL0
	b.eq	el0_undef
	cmp	x24, #ESR_ELx_EC_BREAKPT_LOW	// debug exception in EL0
	b.ge	el0_dbg
	b	el0_inv

As you can see, there are many kinds of exceptions: data abort, instruction abort, stack alignment exception, PC alignment exception, and so on. How does the kernel know which kind occurred? It reads the ESR (Exception Syndrome Register, esr_el1 in the code above), whose fields are:

  • Bits [31:26]: EC (exception class), which identifies the type of exception
  • Bit [25]: IL, the length of the instruction that caused the exception; 0 means a 16-bit instruction, 1 means a 32-bit instruction
  • Bits [24:0]: ISS (instruction specific syndrome), the details of the exception; each exception class defines this field independently
  • For more information, see the ARM Architecture Reference Manual

el0_da:
	/*
	 * Data abort handling
	 */
	mrs	x26, far_el1
	enable_daif
	ct_user_exit
	clear_address_tag x0, x26
	mov	x1, x25
	mov	x2, sp
	bl	do_mem_abort
	b	ret_to_user

In our case the exception class is a data abort from EL0, so el0_sync branches to el0_da. That code reads the faulting address from far_el1 and calls the do_mem_abort handler function:

static const struct fault_info fault_info[] = {
	{ do_bad,		SIGKILL, SI_KERNEL,	"ttbr address size fault"	},
	{ do_bad,		SIGKILL, SI_KERNEL,	"level 1 address size fault"	},
	{ do_bad,		SIGKILL, SI_KERNEL,	"level 2 address size fault"	},
	{ do_bad,		SIGKILL, SI_KERNEL,	"level 3 address size fault"	},
	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 0 translation fault"	},
	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 1 translation fault"	},
	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 2 translation fault"	},
	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 3 translation fault"	},
	/* ... remaining entries omitted ... */
};

asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
					 struct pt_regs *regs)
{
	const struct fault_info *inf = esr_to_fault_info(esr);

	if (!inf->fn(addr, esr, regs))
		return;
}

do_mem_abort uses the value of the ESR register to pick an entry from the fault_info array: esr_to_fault_info uses the fault status code held in the low bits of the ESR as the array index. The handler selected in our example is do_translation_fault, because the access from EL0 failed with an address translation fault.

static int __kprobes do_translation_fault(unsigned long addr,
					  unsigned int esr,
					  struct pt_regs *regs)
{
	if (is_ttbr0_addr(addr))
		return do_page_fault(addr, esr, regs);

	do_bad_area(addr, esr, regs);
	return 0;
}

Here is_ttbr0_addr checks whether the faulting address lies in the user part of the address space (the range translated through TTBR0) rather than in the kernel part. Since addr = 0x0 is a user address, the kernel goes on to do_page_fault. do_page_fault is the kernel's common entry point for page fault exceptions and handles all kinds of them:

  • If the virtual address is legal, a page table entry is created for it to establish the virtual-to-physical mapping
  • If the access is illegal and the address belongs to the kernel address space, the kernel reports an oops, which may lead to a panic
  • Even if the virtual address is legal, permissions are still checked: writing to a read-only mapping, for example, is an error
  • For illegal accesses to user-space addresses, the kernel usually sends a signal to the process, which by default terminates the program
  • For our NULL pointer program, a SIGSEGV signal is eventually sent to the application

arm64_force_sig_fault(SIGSEGV,fault == VM_FAULT_BADACCESS ? SEGV_ACCERR : SEGV_MAPERR,
		(void __user *)addr, inf->name);

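The kernel finally calls arm64_force_sig_fault to notify the application. The signal is SIGSEGV (invalid memory access), and the si_code tells the two cases apart: SEGV_ACCERR for a permission violation on a mapped address, SEGV_MAPERR for an address that is not mapped at all.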

The signal delivery journey

Signals are an asynchronous notification mechanism. A process can send a signal to another process, and so can the kernel, but queuing and delivery are always implemented in the kernel. The available signals, in the format printed by kill -l, are:

 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

The signal in our example is SIGSEGV. Signals are typically used like this:

  • A process installs a handler with the sigaction system call. Installing a handler registers a callback function that runs when the signal arrives.
  • Another process can then send a signal, for example with kill -TERM PID, and the target process runs the handler it installed. Note that kill -9 PID sends SIGKILL, which cannot be caught: the kernel kills the process without running any handler.

How a signal is received (no code analysis here):

  • Calling sigaction triggers a system call that traps into kernel space and records the signal action for this process
  • When a signal such as SIGSEGV is sent to this process, the kernel records it in a sigqueue structure so that the signal is not lost
  • This can be thought of as a signal receive queue: incoming signals are managed by enqueuing them, and there are ordering rules such as priority
  • Once enqueued, the signal sits on the pending queue waiting to be handled, and the process that has to handle it is woken up

The signal handling journey

Signals are not handled at an arbitrary moment. The kernel checks for pending signals only when it is about to return to user space.

/*
 * Ok, we need to do extra processing, enter the slow path.
 */
work_pending:
	mov	x0, sp				// 'regs'
	bl	do_notify_resume
	ldr	x1, [tsk, #TSK_TI_FLAGS]	// re-check for single-step
	b	finish_ret_to_user
/*
 * "slow" syscall return path.
 */
ret_to_user:
	disable_daif
	ldr	x1, [tsk, #TSK_TI_FLAGS]
	and	x2, x1, #_TIF_WORK_MASK
	cbnz	x2, work_pending
finish_ret_to_user:
	enable_step_tsk x1, x2
	kernel_exit 0
ENDPROC(ret_to_user)

On the way back to user space, ret_to_user checks whether any extra work is pending by testing the flags in thread_info against _TIF_WORK_MASK. If there is, it branches to work_pending, which calls do_notify_resume:

asmlinkage void do_notify_resume(struct pt_regs *regs,
				 unsigned long thread_flags)
{

	do {
		/* Check valid user FS if needed */
		addr_limit_user_check();

		if (thread_flags & _TIF_NEED_RESCHED) {
			/* Unmask Debug and SError for the next task */
			local_daif_restore(DAIF_PROCCTX_NOIRQ);

			schedule();
		} else {
			if (thread_flags & _TIF_SIGPENDING)
				do_signal(regs);

		}

	} while (thread_flags & _TIF_WORK_MASK);
}

  • Two common things need to be handled when returning to user space:
    • One is to check whether the current process needs to be rescheduled, by testing the NEED_RESCHED flag
    • The other is to check whether a signal is pending; if so, do_signal is called to handle it

The do_signal code is not analyzed here. In outline, get_signal dequeues the next pending signal. If the process installed a handler for it with sigaction, get_signal returns that handler and handle_signal is called to arrange for it to run, which ends up in setup_return. If the signal still has its default action, get_signal carries that action out itself; for SIGSEGV this means dumping core and terminating the process.

static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
			 struct rt_sigframe_user_layout *user, int usig)
{
	__sigrestore_t sigtramp;

	regs->regs[0] = usig;
	regs->sp = (unsigned long)user->sigframe;
	regs->regs[29] = (unsigned long)&user->next_frame->fp;
	regs->pc = (unsigned long)ka->sa.sa_handler;

	if (ka->sa.sa_flags & SA_RESTORER)
		sigtramp = ka->sa.sa_restorer;
	else
		sigtramp = VDSO_SYMBOL(current->mm->context.vdso, sigtramp);

	regs->regs[30] = (unsigned long)sigtramp;
}

Here we need the concept of a signal frame. The kernel saves the interrupted context in a frame on the user stack, sets the PC that will be restored on return to user space to the signal handler, and sets the link register (x30) to a trampoline. The handler therefore runs as soon as the process is back in user space, and when it returns, the trampoline issues the sigreturn system call so that the kernel can remove the frame and restore the original context.

                                            

The error message journey

Our NULL pointer program never installs a signal handler, so where does "Segmentation fault" come from? With no handler installed, SIGSEGV keeps its default action: the kernel terminates the process and, if core dumps are enabled, writes a core dump. The parent process, here bash, learns from the wait status that its child was killed by SIGSEGV and prints the message. The description text for each signal can be found in the glibc source code:

/* Standard signals  */
  init_sig (SIGHUP, "HUP", N_("Hangup"))
  init_sig (SIGINT, "INT", N_("Interrupt"))
  init_sig (SIGQUIT, "QUIT", N_("Quit"))
  init_sig (SIGILL, "ILL", N_("Illegal instruction"))
  init_sig (SIGTRAP, "TRAP", N_("Trace/breakpoint trap"))
  init_sig (SIGABRT, "ABRT", N_("Aborted"))
  init_sig (SIGFPE, "FPE", N_("Floating point exception"))
  init_sig (SIGKILL, "KILL", N_("Killed"))
  init_sig (SIGBUS, "BUS", N_("Bus error"))
  init_sig (SIGSEGV, "SEGV", N_("Segmentation fault"))

This table only gives each signal a name and a description (the text that strsignal returns); it does not install any handlers. So after our program writes through the NULL pointer, the kernel kills it with SIGSEGV and the shell reports the segmentation fault.

Journey summary

  • When the CPU accesses virtual address 0x0, the MMU raises a data abort exception and the CPU traps into the kernel
  • The kernel reads the ESR register to find the exception type, then calls the matching handler, do_translation_fault
  • For a user-space address that cannot be fixed up, the kernel puts a SIGSEGV signal on the process's pending queue (sigqueue) and wakes the process
  • On the way back to user space the kernel checks for pending signals and, if there are any, calls do_signal
  • In do_signal, get_signal dequeues the signal and looks up the action registered for it through sigaction
  • If a handler is installed, the kernel builds a signal frame on the user stack and sets the PC of the application to the handler, so the handler runs on return to user space
  • After such a handler returns, the sigreturn system call lets the kernel remove the frame it created, and the process returns to user space again and continues
  • Our program installs no handler, so the default action for SIGSEGV applies: the kernel dumps core and terminates the process
  • The shell sees from the wait status that its child was killed by SIGSEGV and prints "Segmentation fault (core dumped)"
  • At this point the journey of a simple NULL pointer is over, and it turned out to be quite complicated.

 


Origin blog.csdn.net/longwang155069/article/details/104789808