From Application to Kernel: Why the top Command Shows Some Process Names in Square Brackets "[]"

Background

When running the top or ps command, we find that some entries in the COMMAND column are enclosed in [], for example:

  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 1542   928 root     R     1064   2%   5% top
    1     0 root     S     1348   2%   0% /sbin/procd
  928     1 root     S     1060   2%   0% /bin/ash --login
  115     2 root     SW       0   0%   0% [kworker/u4:2]
    6     2 root     SW       0   0%   0% [kworker/u4:0]
    4     2 root     SW       0   0%   0% [kworker/0:0]
  697     2 root     SW       0   0%   0% [kworker/1:3]
  703     2 root     SW       0   0%   0% [kworker/0:3]
   15     2 root     SW       0   0%   0% [kworker/1:0]
   27     2 root     SW       0   0%   0% [kworker/1:1]

Beyond finding out what [] means in top, the more important question is: starting from nothing but the symptom, how do we track down the answer?

We will go from the application code down to the kernel code. Teaching someone to fish is better than handing over a fish, don't you think?

Readers who are not interested in the analysis can jump straight to the conclusion.

Application code logic analysis

Keywords: COMMAND

After obtaining the busybox source code, start with a brute-force keyword search:

[GMPY@12:22 busybox-1.27.2]$grep "COMMAND" -rnw *

This matches a great deal of text:

applets/usage_pod.c:79: printf("=head1 COMMAND DESCRIPTIONS\n\n");
archival/cpio.c:100:      --rsh-command=COMMAND  Use remote COMMAND instead of rsh
docs/BusyBox.html:1655:<p>which [COMMAND]...</p>
docs/BusyBox.html:1657:<p>Locate a COMMAND</p>
docs/BusyBox.txt:93:COMMAND DESCRIPTIONS
docs/BusyBox.txt:112:        brctl COMMAND [BRIDGE [INTERFACE]]
docs/BusyBox.txt:612:    ip  ip [OPTIONS] address|route|link|neigh|rule [COMMAND]
docs/BusyBox.txt:614:        OPTIONS := -f[amily] inet|inet6|link | -o[neline] COMMAND := ip addr
docs/BusyBox.txt:1354:        which [COMMAND]...
docs/BusyBox.txt:1356:        Locate a COMMAND
......

The first search matched so much because it included many non-source files. What if we search only the C files?

[GMPY@12:25 busybox-1.27.2]$find -name "*.c" -exec grep -Hn --color=auto "COMMAND" {} \;

This time there are only 71 matching lines. Skimming the matched files turns up something interesting:

......
./shell/ash.c:9707:         if (cmdentry.u.cmd == COMMANDCMD) {
./editors/vi.c:1109:    // get the COMMAND into cmd[]
./procps/lsof.c:31: * COMMAND    PID USER   FD   TYPE             DEVICE     SIZE       NODE NAME
./procps/top.c:626:     " COMMAND");
./procps/top.c:701:     /* PID PPID USER STAT VSZ %VSZ [%CPU] COMMAND */
./procps/top.c:841: strcpy(line_buf, HDR_STR " COMMAND");
./procps/top.c:854:     /* PID VSZ VSZRW RSS (SHR) DIRTY (SHR) COMMAND */
./procps/ps.c:441:  { 16                 , "comm"  ,"COMMAND",func_comm  ,PSSCAN_COMM    },
......

In busybox, each command lives in its own file, which makes the code easy to navigate. We go straight to line 626 of procps/top.c.

Function: display_process_list

Line 626 of procps/top.c belongs to the function display_process_list. Here is a simplified view of its logic:

static NOINLINE void display_process_list(int lines_rem, int scr_width)
{
    ......
    /* Print the table header */
    printf(OPT_BATCH_MODE ? "%.*s" : "\033[7m%.*s\033[0m", scr_width,
        "  PID  PPID USER     STAT   VSZ %VSZ"
        IF_FEATURE_TOP_SMP_PROCESS(" CPU")
        IF_FEATURE_TOP_CPU_USAGE_PERCENTAGE(" %CPU")
        " COMMAND");

    ......
    /* Walk through the descriptor of each process */
    while (--lines_rem >= 0) {
        if (s->vsz >= 100000)
            sprintf(vsz_str_buf, "%6ldm", s->vsz/1024);
        else
            sprintf(vsz_str_buf, "%7lu", s->vsz);
        /* Print every field of the row except COMMAND, e.g. PID, USER, STAT */
        col = snprintf(line_buf, scr_width,
                "\n" "%5u%6u %-8.8s %s%s" FMT
                IF_FEATURE_TOP_SMP_PROCESS(" %3d")
                IF_FEATURE_TOP_CPU_USAGE_PERCENTAGE(FMT)
                " ",
                s->pid, s->ppid, get_cached_username(s->uid),
                s->state, vsz_str_buf,
                SHOW_STAT(pmem)
                IF_FEATURE_TOP_SMP_PROCESS(, s->last_seen_on_cpu)
                IF_FEATURE_TOP_CPU_USAGE_PERCENTAGE(, SHOW_STAT(pcpu))
        );
        /* The key step: read cmdline */
        if ((int)(col + 1) < scr_width)
            read_cmdline(line_buf + col, scr_width - col, s->pid, s->comm);
        ......
    }
}

Setting aside the irrelevant code, the function's logic is clear:

  1. Before this function runs, the code has already walked all processes and built a descriptor structure for each one.
  2. display_process_list walks those descriptors and prints their fields in a fixed order.
  3. read_cmdline obtains and prints the process name.

Next we step into read_cmdline.

Function: read_cmdline

void FAST_FUNC read_cmdline(char *buf, int col, unsigned pid, const char *comm)
{
    ......
    sprintf(filename, "/proc/%u/cmdline", pid);
    sz = open_read_close(filename, buf, col - 1);
    if (sz > 0) {
        ......
        while (sz >= 0) {
            if ((unsigned char)(buf[sz]) < ' ')
                buf[sz] = ' ';
            sz--;
        }
        ......
        if (strncmp(base, comm, comm_len) != 0) {
            ......
            snprintf(buf, col, "{%s}", comm);
            ......
    } else {
        snprintf(buf, col, "[%s]", comm ? comm : "?");
    }
}

Setting aside the irrelevant code, we find:

  1. The process name is obtained from /proc/<PID>/cmdline.
  2. If /proc/<PID>/cmdline is empty, comm is used instead, enclosed in [].
  3. If the basename of cmdline does not match comm, comm is shown enclosed in {}.
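
The three rules can be modeled in a small user-space function. This is a minimal sketch, not the BusyBox implementation: format_command and its parameters are hypothetical names, and it assumes that the elided lines of read_cmdline keep the original arguments after the {comm} prefix.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not BusyBox code): build the COMMAND column text
 * from the raw bytes of /proc/<PID>/cmdline and the comm name.
 */
void format_command(char *out, size_t outsz,
                    const char *cmdline, size_t len, const char *comm)
{
    char args[256];
    const char *base = args;
    const char *nul;
    size_t i, first_len;

    if (comm == NULL)
        comm = "?";
    if (len == 0) {
        /* Rule 2: empty cmdline, fall back to comm wrapped in []. */
        snprintf(out, outsz, "[%s]", comm);
        return;
    }
    if (len > sizeof(args) - 1)
        len = sizeof(args) - 1;

    /* The first argument ends at the first NUL byte. */
    nul = memchr(cmdline, '\0', len);
    first_len = nul ? (size_t)(nul - cmdline) : len;

    /* Arguments are NUL-separated: turn control bytes into spaces. */
    for (i = 0; i < len; i++)
        args[i] = ((unsigned char)cmdline[i] < ' ') ? ' ' : cmdline[i];
    args[len] = '\0';
    while (len > 0 && args[len - 1] == ' ')
        args[--len] = '\0';     /* drop trailing separators */

    /* Basename of the first argument: text after its last '/'. */
    for (i = 0; i < first_len; i++)
        if (args[i] == '/')
            base = args + i + 1;

    if (strncmp(base, comm, strlen(comm)) != 0)
        snprintf(out, outsz, "{%s} %s", comm, args);    /* Rule 3 */
    else
        snprintf(out, outsz, "%s", args);               /* Rule 1 */
}
```

With an empty cmdline and a comm of kworker/0:0, the function produces [kworker/0:0], the same shape as the bracketed rows in the top output above.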

To keep this readable, we will not analyze the difference between cmdline and comm any further here.

We focus on one question: under what circumstances is /proc/<PID>/cmdline empty?

Kernel code logic analysis

Keywords: cmdline

/proc is the mount point of proc, a special kind of file system, so cmdline must be a feature specific to it.

Assuming we are newcomers to the kernel, all we can do is search for the keyword cmdline in the proc directory of the kernel source:

[GMPY@09:54 proc]$cd fs/proc && grep "cmdline" -rnw *

The key matches are in two files, base.c and cmdline.c:

array.c:11: * Pauline Middelink :  Made cmdline,envline only break at '\0's, to
base.c:224: /* Check if process spawned far enough to have cmdline. */
base.c:708: * May current process learn task's sched/cmdline info (for hide_pid_min=1)
base.c:2902:    REG("cmdline",    S_IRUGO, proc_pid_cmdline_ops),
base.c:3294:    REG("cmdline",   S_IRUGO, proc_pid_cmdline_ops),
cmdline.c:26:   proc_create("cmdline", 0, NULL, &cmdline_proc_fops);
Makefile:16:proc-y  += cmdline.o
vmcore.c:1158:   * If elfcorehdr= has been passed in cmdline or created in 2nd kernel,

The logic in cmdline.c is very simple, and it is easy to see that it implements /proc/cmdline, which is not what we need.

So we turn our attention to base.c and the relevant code:

REG("cmdline",   S_IRUGO, proc_pid_cmdline_ops),

Experience and intuition suggest:

  1. cmdline: the file name
  2. S_IRUGO: the file permissions
  3. proc_pid_cmdline_ops: the structure holding the file operations for that file

Sure enough, following proc_pid_cmdline_ops we find it defined as:

static const struct file_operations proc_pid_cmdline_ops = {
    .read   = proc_pid_cmdline_read,
    .llseek = generic_file_llseek,
};

Function: proc_pid_cmdline_read

static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
                size_t _count, loff_t *pos)
{
    ......
    /* Get the descriptor of the process's virtual address space */
    mm = get_task_mm(tsk);
    ......
    /* Get the addresses of argv and env */
    arg_start = mm->arg_start;
    arg_end = mm->arg_end;
    env_start = mm->env_start;
    env_end = mm->env_end;
    ......
    while (count > 0 && len > 0) {
        ......
        /* Compute the address offset */
        p = arg_start + *pos;
        while (count > 0 && len > 0) {
            ......
            /* Read data from the process's address space */
            nr_read = access_remote_vm(mm, p, page, _count, FOLL_ANON);
            ......
        }
    }
}

A newcomer may wonder at this point: how do we know what access_remote_vm does?

Simple: jump to the access_remote_vm function and you will see that it is documented with a comment:

/**
 * access_remote_vm - access another process' address space
 * @mm:         the mm_struct of the target address space
 * @addr:       start address to access
 * @buf:        source or destination buffer
 * @len:        number of bytes to transfer
 * @gup_flags:  flags modifying lookup behaviour
 *
 * The caller must hold a reference on @mm.
 */
int access_remote_vm(struct mm_struct *mm, unsigned long addr,
        void *buf, int len, unsigned int gup_flags)
{
    return __access_remote_vm(NULL, mm, addr, buf, len, gup_flags);
}

Many functions in the Linux kernel source carry well-structured comments covering their purpose, parameters, and caveats. We should make full use of them when studying the code.

That was a digression; back to the topic.

From proc_pid_cmdline_read we see that reading /proc/<PID>/cmdline actually reads data from the address range starting at arg_start. So when that range holds no data, nothing can be read. The question becomes: when is the address range identified by arg_start empty?
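
To make the "empty range means empty file" point concrete, here is a minimal user-space model of such a read. It is only a sketch: read_arg_range is a hypothetical name, plain pointers stand in for the addresses in mm_struct, and the real kernel function handles more cases than this.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of reading /proc/<PID>/cmdline: the "file" is the
 * byte range [arg_start, arg_end) and pos is the file offset.
 * Returns the number of bytes copied into buf.
 */
size_t read_arg_range(const char *arg_start, const char *arg_end,
                      size_t pos, char *buf, size_t count)
{
    size_t len = (size_t)(arg_end - arg_start);

    if (pos >= len)
        return 0;               /* empty range, or offset past the end */
    if (count > len - pos)
        count = len - pos;      /* clamp to the bytes that are left */
    memcpy(buf, arg_start + pos, count);
    return count;
}
```

When arg_start equals arg_end, the function returns 0 for every offset, which is what a reader of an empty cmdline file observes.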

Keywords: arg_start

Address spaces are not just a proc matter, so we search the whole kernel source for the keyword:

[GMPY@09:55 proc]$find -name "*.c" -exec grep --color=auto -Hnw "arg_start" {} \;

There are many matches. I did not want to read them one by one, and the matched code alone gave no direction:

./mm/util.c:635:    unsigned long arg_start, arg_end, env_start, env_end;
......
./kernel/sys.c:1747:        offsetof(struct prctl_mm_map, arg_start),
......
./fs/exec.c:709:    mm->arg_start = bprm->p - stack_shift;
./fs/exec.c:722:    mm->arg_start = bprm->p;
......
./fs/binfmt_elf.c:301:  p = current->mm->arg_end = current->mm->arg_start;
./fs/binfmt_elf.c:1495: len = mm->arg_end - mm->arg_start;
./fs/binfmt_elf.c:1499:                (const char __user *)mm->arg_start, len))
......
./fs/proc/base.c:246:   len1 = arg_end - arg_start;
......

But the names of the matched files were a hint:

/proc/<PID>/cmdline is a per-process attribute, and task_struct and mm_struct both describe a process and its resources. So when would arg_start in mm_struct be modified? At process initialization!

Taking the association further, creating a user-space process takes just two steps:

  1. fork
  2. exec

fork only creates a new task_struct, and the child starts with a copy-on-write duplicate of the parent's address space. Only at exec is that address space replaced with a fresh one, so arg_start must be set during exec! And the files matching arg_start do include exec.c.

After looking at setup_arg_pages, the function in fs/exec.c where the keyword appears, I found no key code there, so I went back to the matched file names, which led to a further association:

exec runs a new program, and loading a new program really means loading a binary file. The keyword matches happen to include binfmt_elf.c!

Locating a problem takes more than being able to read code; association is sometimes very effective.

Function: create_elf_tables

In binfmt_elf.c, the keyword arg_start matches inside the function create_elf_tables. The function is long, so here is a condensed view:

static int
create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
        unsigned long load_addr, unsigned long interp_load_addr)
{
    ......
    /* Populate argv and envp */
    p = current->mm->arg_end = current->mm->arg_start;
    while (argc-- > 0) {
        ......
        if (__put_user((elf_addr_t)p, argv++))
            return -EFAULT;
        ......
    }
    ......
    current->mm->arg_end = current->mm->env_start = p;
    while (envc-- > 0) {
        ......
        if (__put_user((elf_addr_t)p, envp++))
            return -EFAULT;
        ......
    }
    ......
}

This function lays out argv and envp in the address ranges that begin at arg_start and env_start, respectively.
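
The argv loop can be modeled in user space to show how the pointer p moves through the strings and where arg_end ends up. This is a simplified sketch under stated assumptions: fill_argv_table is a hypothetical name, a plain char array stands in for the stack region starting at arg_start, and storing into argv_out stands in for the __put_user call.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model of the argv loop in create_elf_tables.
 * `area` holds the argument strings back to back, each ending in '\0'.
 * Records the start of each string in argv_out and returns the offset
 * just past the last one, which plays the role of arg_end.
 */
size_t fill_argv_table(const char *area, size_t area_len,
                       int argc, const char **argv_out)
{
    size_t p = 0;   /* offset from arg_start */
    int i;

    for (i = 0; i < argc && p < area_len; i++) {
        const char *end = memchr(area + p, '\0', area_len - p);

        if (end == NULL)
            break;                      /* unterminated string: stop */
        argv_out[i] = area + p;         /* kernel: __put_user(p, argv++) */
        p = (size_t)(end - area) + 1;   /* skip the string and its '\0' */
    }
    return p;
}
```

For the three strings "top", "-b", and "-n1" packed into 11 bytes, the function records three pointers and returns 11, so the modeled arg_end sits directly after the last terminating NUL.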

Next we trace upward through the callers of create_elf_tables.

First, create_elf_tables is declared static, which means it cannot be called by name from outside the file it lives in. Searching within the file, we find the function one level up:

static int load_elf_binary(struct linux_binprm *bprm)

It is static as well, so we keep searching this file for load_elf_binary and find the following code:

static struct linux_binfmt elf_format = {
    .module         = THIS_MODULE,
    .load_binary    = load_elf_binary,
    .load_shlib     = load_elf_library,
    .core_dump      = elf_core_dump,
    .min_coredump   = ELF_EXEC_PAGESIZE,
};

static int __init init_elf_binfmt(void)
{
    register_binfmt(&elf_format);
    return 0;
}

core_initcall(init_elf_binfmt);

The search ends here, and the code structure is clear: load_elf_binary is assigned to a struct linux_binfmt, which register_binfmt registers with the upper layer, giving that layer a callback to invoke.

Keywords: load_binary

Why fix on the keyword load_binary? Given .load_binary = load_elf_binary, the upper layer should invoke it as XXX->load_binary(...), so we search for load_binary to find where this callback is called.

[GMPY@09:55 proc]$ grep "\->load_binary" -rn *

We are lucky: this callback is called only in fs/exec.c:

fs/exec.c:78:   if (WARN_ON(!fmt->load_binary))
fs/exec.c:1621:     retval = fmt->load_binary(bprm);
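
The register-then-dispatch pattern behind fmt->load_binary(bprm) can be sketched in a few lines of user-space C. All names here (binprm, binfmt, register_fmt, search_handler, and the two sample loaders) are hypothetical stand-ins, and the handlers only check magic bytes instead of loading anything.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct linux_binprm and struct linux_binfmt. */
struct binprm {
    const char *buf;            /* first bytes of the file, NUL-terminated here */
};

struct binfmt {
    const char *name;
    int (*load_binary)(struct binprm *bprm);
    struct binfmt *next;
};

static struct binfmt *formats;  /* list of registered handlers */

/* Model of register_binfmt: add a handler to the list. */
void register_fmt(struct binfmt *fmt)
{
    fmt->next = formats;
    formats = fmt;
}

/* Model of the dispatch loop: try each handler until one accepts. */
const struct binfmt *search_handler(struct binprm *bprm)
{
    struct binfmt *fmt;

    for (fmt = formats; fmt != NULL; fmt = fmt->next) {
        if (fmt->load_binary(bprm) == 0)    /* the callback invocation */
            return fmt;
    }
    return NULL;                            /* no handler recognized the file */
}

/* Sample handlers: accept a file by its leading magic bytes. */
int load_elf(struct binprm *bprm)
{
    return strncmp(bprm->buf, "\177ELF", 4) == 0 ? 0 : -1;
}

int load_script(struct binprm *bprm)
{
    return strncmp(bprm->buf, "#!", 2) == 0 ? 0 : -1;
}

struct binfmt elf_fmt = { "elf", load_elf, NULL };
struct binfmt script_fmt = { "script", load_script, NULL };
```

The dispatcher never names load_elf directly, which is why a search for the function name alone does not reveal its caller, while a search for the structure member does.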

Line 1621 of fs/exec.c belongs to the function search_binary_handler. Unfortunately, EXPORT_SYMBOL(search_binary_handler); is present, which means the function may be called from many places, and continuing the bottom-up analysis becomes difficult. Why not try working from the other direction?

When one road is a dead end, look from another angle and the answer is at hand.

Since continuing upward from search_binary_handler is not easy, let us see whether the execve system call leads step by step into search_binary_handler.

Keywords: exec

In Linux 4.9, system calls are generally defined as SYSCALL_DEFINE<number of arguments>(<function name>..., so we search globally for that keyword to determine where the system call is defined:

[GMPY@09:55 proc]$ grep "SYSCALL_DEFINE.*exec" -rn *

This locates the file fs/exec.c:

fs/exec.c:1905:SYSCALL_DEFINE3(execve,
fs/exec.c:1913:SYSCALL_DEFINE5(execveat,
fs/exec.c:1927:COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
fs/exec.c:1934:COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
kernel/kexec.c:187:SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
kernel/kexec.c:233:COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry,
kernel/kexec_file.c:256:SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,

We will not walk through the subsequent function calls one by one. The call chain is:

execve -> do_execve -> do_execveat_common -> exec_binprm -> search_binary_handler

In the end we are back at search_binary_handler.

With this analysis, we have established how the values are assigned:

  1. When execve runs a new program, mm_struct is initialized.
  2. The argv and envp passed to execve are saved at the addresses given by arg_start and env_start.
  3. On cat /proc/<PID>/cmdline, the data is read from the virtual address arg_start.

Therefore, any user-space process created through the execve system call has a non-empty /proc/<PID>/cmdline. But this still does not tell us when cmdline is empty.

We know that Linux processes can be divided into user-space processes and kernel-space processes. Since the cmdline of a user-space process is not empty, let us look at kernel processes.

Function: kthread_run

Kernel drivers often create kernel processes through kthread_run. We use this function as the entry point and ask: when a kernel process is created, is cmdline assigned?

Starting from kthread_run and following the call chain, we find that the function doing the real work is __kthread_create_on_node:

kthread_run -> kthread_create -> kthread_create_on_node -> __kthread_create_on_node

Stripping the redundant code, let us focus on what the function does:

static struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
                void *data, int node, const char namefmt[], va_list args)
{
    /* Store the attributes of the new process in a kthread_create_info structure */
    struct kthread_create_info *create = kmalloc(sizeof(*create), GFP_KERNEL);
    create->threadfn = threadfn;
    create->data = data;
    create->node = node;
    create->done = &done;
    
    /* Add the initialized create to the list and wake the kthreadd_task process to do the actual creation */
    list_add_tail(&create->list, &kthread_create_list);
    wake_up_process(kthreadd_task);
    /* Wait for the creation to complete */
    wait_for_completion_killable(&done);
    
    ......

    task = create->result;
    if (!IS_ERR(task)) {
        ......
        /* After creation, set the process name; this name is the comm attribute, which is different from cmdline */
        vsnprintf(name, sizeof(name), namefmt, args);
        set_task_comm(task, name);
        ......
    }
}

The detailed analysis goes the same way as before, so I will not repeat it. In summary, the function does two things:

  1. Wakes the kthreadd_task process to create the new process
  2. Sets process attributes, which include comm but not cmdline
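
That comm is an attribute separate from cmdline can also be observed from user space. The sketch below is an analogue, not the kernel path: it uses the Linux prctl options PR_SET_NAME and PR_GET_NAME to rename the calling thread, and rename_comm is a hypothetical helper name.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: set the comm of the calling thread, then read it
 * back. `out` must hold at least 16 bytes, since comm is limited to
 * 15 characters plus the terminating NUL. Returns 0 on success.
 */
int rename_comm(const char *name, char out[16])
{
    if (prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_GET_NAME, (unsigned long)out, 0, 0, 0) != 0)
        return -1;
    return 0;
}
```

After such a call, /proc/<PID>/cmdline still holds the original arguments while comm holds the new name, which is the mismatch that the read_cmdline code shown earlier marks with {}.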

Recall from the application code analysis: if /proc/<PID>/cmdline is empty, comm is used, enclosed in [].

Therefore, for a kernel process created by kthread_run / kthread_create, the content of /proc/<PID>/cmdline is empty.
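
This can be checked from user space by testing whether a cmdline file yields any bytes. The helper below is a minimal sketch: cmdline_is_empty is a hypothetical name, and it assumes /proc is mounted at its usual location.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if /proc/<pid>/cmdline exists but is
 * empty, 0 if it contains data, and -1 if it cannot be opened.
 */
int cmdline_is_empty(int pid)
{
    char path[64];
    char byte;
    FILE *f;
    size_t n;

    snprintf(path, sizeof(path), "/proc/%d/cmdline", pid);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    n = fread(&byte, 1, 1, f);  /* one byte is enough to tell */
    fclose(f);
    return n == 0 ? 1 : 0;
}
```

Calling it with the PID of one of the kworker entries from the top output should return 1 if the analysis holds, and calling it with the PID of an ordinary shell should return 0.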

Summary

Taking as its entry point the question of whether top and ps show a process name in [], this article analyzed the underlying mechanism in depth, from the user program down to the kernel code.

The analysis relied mainly on the following methods:

  1. Keyword search: from COMMAND in the top program to arg_start, load_binary, and exec in the kernel source
  2. Function comments: the documentation comment of access_remote_vm
  3. Association: from the attributes of a process to the creation of a user-space process, and from there to the code that handles arg_start
  4. Reverse thinking: when tracing callers upward from search_binary_handler became difficult, we instead checked whether the execve system call leads step by step to search_binary_handler

From this analysis we conclude:

  1. Processes created in user space are shown by top/ps without []
  2. Processes created in kernel space are shown by top/ps with []

The actual ps output matches this analysis.

My ability is limited. If anything in this analysis is imprecise, I welcome discussion so that we can learn together.

Origin www.cnblogs.com/gmpy/p/11267949.html