Operating System Comprehensive Experiment 1

1. Experimental purpose and requirements

This experiment applies knowledge of process control, together with an understanding of shell functions and inter-process communication methods, to write a simple shell program. The goal is to deepen the understanding of operating system process control and of the shell interface.

2. Experimental content

  • Use the Linux operating system; learn to use Linux process control, inter-process communication (pipes, message queues, shared memory), and processor scheduling
  • Learn to use POSIX semaphores to implement the synchronization relationship between producers and consumers
  • Write a simple shell program

3. Experimental steps and process

1 Linux learning and practice

1.1 Pipeline

There are two forms of pipe communication between processes. Unnamed pipes are used between related processes such as a parent and its child, while named pipes can be used between arbitrary processes because they have an accessible path name in the file system. A pipe is mainly used for one-way communication; if two-way communication is required, two pipes in opposite directions are established. A pipe is essentially a buffer managed by the kernel (one process writes to one end and another process reads from the other end), so if the buffer is full, the process writing to the pipe is blocked. In addition, a pipe carries a plain byte stream with no format and no message boundaries, so the communicating processes must handle message boundaries themselves. If a pipe is shared by more than two processes, they must also work out which process each message is intended for.

Unnamed pipe

A pipe, or unnamed pipe, is an interprocess communication mechanism provided by all Unix systems. A pipe is a one-way channel: processes write data to the write end of the pipe, and processes that need the data read it from the read end. Data flows through the pipe in order of arrival. Pipes are what is used when "|" connects two commands on the command line. For example, "ls | more" writes the standard output of the ls command into a pipe, and the output of the pipe becomes the standard input of the more command. Note that although redirection such as "ls > temp" looks similar, redirection does not use pipes.

Figure 1 below uses pipe() to show how a parent and a child process communicate through a pipe. pipe() returns two file descriptors (integers) that refer to the read end and the write end of the pipe buffer (stored in the fds[] variable in the code). The parent process closes the read end fds[0] and writes information to the write end fds[1]; the child process closes the write end fds[1] and reads the information back from the read end fds[0].

Figure 1 pipe-demo.c code

Figure 2 pipe-demo.c output

Figure 2 is the output of the pipe-demo operation in Figure 1, which shows that the parent process sent a message to the pipe, and the child process successfully received the "Message from parent".

Named pipe

The unnamed pipes described above have one major disadvantage: they can only be accessed through file descriptors inherited by a child process (and its descendants) from its parent, so they cannot be used between arbitrary processes. Named pipes, or FIFOs, remove this limitation. A FIFO can be seen as an upgraded unnamed pipe: it has an inode that is reachable through the file system, that is, the FIFO file appears in the directory tree (unlike the unnamed pipe, which exists only in the special pipefs file system).

In Figure 3 below, we use the mkfifo command to create the named pipe os-exp-fifo; the ls command shows that its file type is pipe ("p").

Figure 3 mkfifo creates a named pipe

You can now use the cat os-exp-fifo command to try to read data from the pipe. Because no data has been written into the pipe yet, cat blocks, as shown in Figure 4:

Figure 4 Trying to read an empty pipe file with cat (blocking)

If you now run "echo Hello, Named PIPE! >os-exp-fifo" on another terminal, cat wakes up, reads the pipe data, and prints the string "Hello, Named PIPE!", as shown in Figure 5.

Figure 5 Write data to the pipeline with echo

1.2 System V IPC

Linux's process communication inherits the System V IPC. System V IPC refers to three interprocess communication facilities introduced by AT&T in the System V.2 release:

  1. Semaphores, used to manage access to shared resources
  2. Shared memory, used for efficient data sharing between processes
  3. Message queues, used to transfer data between processes

We collectively refer to these three facilities as System V IPC objects, and each object has a unique IPC identifier (ID). To let different processes obtain the same IPC object, an IPC key must be provided, and the kernel is responsible for converting the IPC key into an IPC identifier. Let's look at these three IPC facilities.

Execute the ipcs command in Linux to view all System V IPC objects in the current system, as shown in Figure 6.

Figure 6 The output of the ipcs command

ipcs also accepts options. ipcs -a (the default) outputs all information, ipcs -m displays shared memory information, ipcs -q displays message queue information, and ipcs -s displays semaphore set information. In addition, there are some format control options: -t outputs time information, -p outputs process PID information, -c outputs creator and owner information, and -l outputs the related limits. For example, ipcs -ql displays the limits of message queues, as shown in Figure 7.

Figure 7 The output of ipcs -ql

The command for deleting IPC objects is ipcrm, which deletes the IPC objects together with their associated data; only the administrator or the creator of an IPC object can delete it. The object can be specified either by its IPC key or by its IPC ID: ipcrm -M shmkey deletes the shared memory segment created with shmkey and ipcrm -m shmid deletes the shared memory segment identified by shmid; ipcrm -Q msgkey deletes the message queue created with msgkey and ipcrm -q msqid deletes the message queue identified by msqid; ipcrm -S semkey deletes the semaphore set created with semkey and ipcrm -s semid deletes the semaphore set identified by semid.

1.2.1 Message queue

A message queue is something like a mailbox in a post office, and the messages in it are a bit like letters: each has an envelope and a sheet of paper with the content. Since each message carries a type, a single queue can be used for communication among multiple processes. For example, suppose a task dispatching process creates several worker child processes. Whether the parent sends a task dispatch message or a child sends a task result message, the type is set to the PID of the target process, and each process only receives messages whose type equals its own PID. In this way each child receives only its own tasks, and the parent receives only the task results.

Figure 8 and Figure 9 below show the code of msgtool.c. Each invocation runs as a new process, so the runs are independent of each other. The core function for sending a message is msgsnd(): the first parameter is the ID of the message queue, the second is the starting address of the message to be sent (the first member of the message is a long integer that indicates the message type), the third is the length of the message body (not counting the type field), and the fourth specifies flags that control the send behavior (this example uses 0). The function for receiving a message is msgrcv(): the first parameter is the ID of the message queue, the second is the address of the receive buffer, the third is the maximum length of the message body, the fourth indicates the type of message to accept (0 accepts the first message of any type, >0 accepts the first message of the specified type, and <0 accepts the first message of the lowest type that is less than or equal to the absolute value of the argument), and the fifth is a set of flags.

Figure 8 msgtool.c code

Figure 9 msgtool.c code

Execute msgtool s 1 Hello, my_msg_queue! to send a message of type 1, and then use ipcs -q to view the newly created message queue, which contains a message of 20 bytes. At this time, execute msgtool -r 1 (another process) to read the type 1 message, and then use ipcs -q to see that the message queue is empty (0 bytes). The output of the above operation is shown in Figure 10:

Figure 10 Execution result of msgtool

1.2.2 Shared memory

The shared memory of System V IPC is a section of memory provided by the kernel that can be mapped into the address space of multiple processes, so that processes can share data simply by reading and writing that memory. Let's first look at how to create shared memory. The sample code is shown in Figure 11; it creates a shared memory area of 4096 bytes. The first parameter of shmget() is IPC_PRIVATE (=0, which means creating a new shared memory area), the second parameter is the size of the shared memory area, and the third is the access mode. Although a key could also be generated with ftok() as in the message queue example, no key is specified here; instead the system returns an ID after the shared memory is created (processes that later want to use the shared memory need to specify this ID).

Figure 11 shmget-demo.c code

Execute the program, and its output is shown in Figure 12. The output results show that the ID of the newly created shared memory is 24, and the length is 4096 bytes. Currently, no process has mapped it to its own process space (the number of connections is 0).

Figure 12 The running result of shmget-demo.c

Next, another process uses the shared memory by mapping it into its own address space, as shown in Figure 13.

Figure 13 shmatt-write-demo.c code

We run shmatt-write-demo 24 (the command line parameter 24 is the ID of the shared memory), and the first part of the output is shown in Figure 14. After mapping the shared memory, shmatt-write-demo writes the string "Hello shared memory!" into it. shmatt-write-demo also executes "ipcs -m" through system(), so it prints the current shared memory information as well. It can be seen that the shared memory with ID 24 has been mapped once (the nattch column is 1).

Figure 14 shmatt-write-demo.c running results

Next, the ps -a command shows that the PID of shmatt-write-de (ps truncates the command name) is 4898, as shown in Figure 15.

Figure 15 ps -a command to view the PID of shmatt-write-de

Use cat /proc/4898/maps to view the address space of the process; the layout is shown in Figure 16.

Figure 16 Process layout after running

After Enter is pressed in shmatt-write-demo, the shared memory is unmapped. ipcs -m now shows that the corresponding shared memory area is unused (the number of connections is 0), as shown in Figure 17. If you check the memory layout at this point, you will find that the virtual memory region in the original address range has disappeared.

Figure 17 Shared memory after running write

At this time, try to use another program to map the shared memory and read data from it. The shmatt-read-demo code is shown in Figure 18.

Figure 18 shmatt-read-demo.c code

As shown in Figure 19, although the process that wrote to the shared memory has ended, shmatt-read-demo still reads the previously written string after mapping the shared memory with ID 24.

Figure 19 Shared memory after running read

From the above experiments, it can be seen that shared memory is a more flexible communication method. It does not need file interface functions such as read() and write() as pipes do, nor functions such as msgsnd()/msgrcv() as message queues do; the data can be accessed directly through a memory pointer. Although its capacity was not verified in the experiment, a shared memory segment can generally be much larger than a pipe buffer or a message queue.

1.2.3 Semaphore array/semaphore set

We have also studied the semaphore and semaphore set mechanisms in the operating system principles course. Semaphores in the System V IPC supported by Linux are actually arrays of semaphores (semaphore sets), so multiple semaphores can be created at a time. After a semaphore set is created or obtained, P/V operations (also called down/up operations) can be performed on each semaphore. When a process performs a P/V operation, it obeys the synchronization constraints of the semaphore, and the blocking and waking of the process are handled by the operating system.

1.3 Synchronization between processes

Linux supports both the semaphore sets of System V IPC and POSIX semaphores. The former are often used for inter-process synchronization and are implemented in the kernel (they do not disappear when the process ends); the latter are often used for inter-thread synchronization, are easier to use, and each contains only one semaphore. POSIX semaphores are divided into named and unnamed semaphores. A named semaphore is associated with a path name and does not disappear when the creating process ends (so it can be used between processes). An unnamed semaphore, by contrast, exists only as long as the memory that holds it and is typically used between the threads of one process. The programming interfaces of the two families are easy to tell apart: System V semaphore functions have no underscore in their names (for example, semget() rather than sem_get()), while POSIX semaphore functions do (for example, sem_post() rather than sempost()). The Linux kernel also has multiple concurrent execution flows internally and uses kernel semaphores between them; these are different from the user-mode semaphores discussed here.

1.3.1 System V IPC semaphore set

The synchronization mechanism of the System V IPC semaphore set between processes has been discussed together with the topic of inter-process communication in the previous System V IPC, and will not be repeated here.

1.3.2 POSIX semaphores

POSIX semaphores are divided into named semaphores and unnamed semaphores. Named semaphores can be used for synchronization between multiple processes or multiple threads, because they can be looked up by name; unnamed semaphores are typically used for synchronization between threads. The two kinds are created with different functions, but the corresponding P/V operation functions are the same. A named semaphore is created with sem_open(). The code psem-named-open.c is shown in Figure 20. It first uses sem_open() to create a semaphore, which is identified by a string.

Figure 20 psem-named-open.c code

Then compile with gcc psem-named-open.c -o psem-named-open -lpthread (the parameter -lpthread is used to indicate the thread library used when linking), and then run psem-named-open. If no file name string is entered as an identifier, a prompt is given to ask the user to input; if a file name string is input, the creation process will be completed under normal circumstances, as shown in Figure 21.

Figure 21 Output of psem-named-open

Then try the P operation of the P/V pair (that is, decrement the semaphore by 1, which may block). The program psem-named-wait-demo.c is shown in Figure 22. It performs the P operation (decrement by 1) through sem_wait() and reads the value of the semaphore through sem_getvalue(). For the sake of brevity, the code does not check whether sem_open() has successfully obtained the semaphore. Therefore, if a wrong identification string is entered, the specified semaphore cannot be obtained, sem_wait() refers to an invalid semaphore, and a segmentation fault occurs.

Figure 22 psem-named-wait-demo.c code

Compile and execute psem-named-wait-demo and input the name used when creating the semaphore; it prints the current semaphore value as 0. If you run it again, the value of the semaphore is already 0, so a further P operation (decrement by 1) blocks the process. The running status of the program is shown in Figure 23.

Figure 23 psem-named-wait-demo running results

Figure 23 shows that the program does not return to the shell prompt after running for the second time. If you use another terminal to execute the ps command at this time, you can see that the process is in the S+ state, as shown in Figure 24.

Figure 24 View the running status of psem-named-wait-demo

Next, look at the V operation, which wakes the blocked psem-named-wait-demo process so that it can finish. The program is shown in Figure 25. Note that this code does not check the return value of sem_open() either.

Figure 25 psem-named-post-demo.c code

Compile and execute psem-named-post-demo (in a different terminal from the one used before). The value of the semaphore increases to 1, and the previously blocked psem-named-wait-demo is awakened and runs to completion, as shown in Figure 26.

Figure 26 The output of psem-named-post-demo

At the same time, it can be seen that the originally blocked psem-named-wait-demo is awakened and executed, as shown in Figure 27.

Figure 27 Wake up the blocked psem-named-wait-demo process

Finally, when the semaphore is no longer needed, it can be removed with sem_unlink(), as shown in Figure 28.

Figure 28 psem-named-unlink-demo.c code

The POSIX unnamed semaphore is suitable for synchronization between threads. If an unnamed semaphore is to be used for inter-process synchronization, it must be placed in shared memory (the semaphore remains available as long as the shared memory area exists). Unnamed semaphores are created using sem_init().

A mutex is a simplified form of a semaphore that is used only for mutually exclusive access between concurrent tasks. We first look at what happens when multiple threads run concurrently and the shared variable is not protected by mutual exclusion; the result may then be wrong. The program checks each integer in a buffer (the buffer holds integers whose values alternate 3, 4, 3, 4, ...) and counts the integers whose value is 3. The counting is done concurrently by 16 threads (each thread is responsible for 1/16 of the buffer's data).

The code no-mutex-demo.c, shown in Figure 29, implements this without a mutex. After compiling and running no-mutex-demo, the result is shown in Figure 30. The result differs from run to run, because the updates to the shared variable are not mutually exclusive.

Figure 29 no-mutex-demo.c code

Figure 30 The running result of no-mutex-demo

If the critical section count++ is protected by a mutex m, this problem is avoided. The code is shown in Figure 31 and the result in Figure 32. Compiling and running no-mutex-demo1 gives the same result every time: because access to the shared variable is now mutually exclusive, each run produces a definite value.

Figure 31 no-mutex-demo1.c code

Figure 32 no-mutex-demo1.c result

This completes the Linux learning and practice part of the experiment.

2 Use POSIX semaphores to complete the synchronization relationship between producers and consumers

Based on the previous study and the requirements of this task, the design is as follows:

(1) Create a buffer shared by the producer and the consumers;

(2) The producer program first obtains the shared memory area and attaches it to its address space, then creates three semaphores and writes the lines it reads from input into the buffer (performing the corresponding semaphore operations along the way). Finally it releases the semaphores, detaches the shared memory, and deletes the shared memory area;

(3) The consumer program likewise obtains the shared memory area and attaches it to its address space, then obtains the three semaphores and prints the line strings in the buffer (again performing the corresponding semaphore operations), and finally releases the semaphores. The consumer creates two processes that perform these operations concurrently.

Header file:

First, define the header file as shown in Figure 33. NUM_LINE=16 is the number of lines the shared memory can hold, and each line occupies 256 bytes. Three semaphores are defined: one for mutual exclusion, one that tracks whether the shared memory is empty, and one that tracks whether it is full.

Figure 33 header file

Producer code:

Since the code framework has been given for this experiment, only the missing parts need to be filled in. The completed code is shown in Figure 34. It defines the shared memory pointer, the shared memory ID, and the semaphore pointers, then obtains the shared memory area and maps it into the address space of the process.

Figure 34 Get the shared memory area and put it into memory

After that, three semaphores need to be created. The initial values of the semaphores sem_queue, sem_queue_empty, and sem_queue_full are 1, NUM_LINE, and 0, respectively, as shown in Figure 35:

Figure 35 Create a semaphore

Next, as shown in Figure 36, the read and write indexes are initialized to point to line 0, and the input lines are written into the buffer under the protection of the semaphores. For each line, the producer locks the shared area, prints the values of the semaphores, stores the input in the shared area, and advances the write index; if the input is quit, it leaves the loop.

Figure 36 Write the input line to the buffer

The last part is shown in Figure 37. It releases the semaphores, detaches the shared memory from this process, and deletes the shared memory area. This part follows the earlier part of the experiment guide.

Figure 37 release semaphore

Consumer code:

As shown in Figure 38, the consumer code is similar to the producer code. It first obtains the shared memory created in the producer code, and then maps the shared memory area to the process space of the process.

Figure 38 The consumer obtains the shared memory area

As shown in Figure 39, the consumer then obtains the three semaphores created by the producer and creates two processes. The child process first waits on the semaphores with P operations; once they succeed, it prints the consumed content and its process ID. It exits after reading quit and then releases the semaphores. The parent and the child process are handled in the same way.

Figure 39 Consumer semaphore operation

As shown in Figure 40, the parent process uses the same processing as the child process in Figure 39. Note that when the parent process releases the semaphores, waitpid(fork_result, NULL, 0); should be added at the end, so that the parent waits for the child process to end before exiting and no orphan process is left behind.

Figure 40 Consumer parent process semaphore operation

After the code is written, it is tested. As shown in Figure 41, the left side is the producer and the right side is the consumer. The program achieves both synchronization and communication: the producer's input passes through the shared memory and the consumer reads it. The product content entered by the producer matches the consumption information printed by the consumer one to one, where the products with odd numbers are handled by the parent process and the products with even numbers by the child process. When the producer enters quit, both sides exit normally.

Figure 41 The code running results of the producer/consumer problem

3 Design a simple Shell program

From the previous study, we know that the shell first needs to read the input command and determine whether it is an internal or an external command. If the command is parsed successfully it is executed; otherwise it is treated as an invalid command.

First, as shown in Figure 42, include the necessary header files, classify the command types with macro definitions, and declare several functions in advance. Their specific roles are introduced below.

Figure 42 Header files and functions of the shell program

Then, define the main function as shown in Figure 43. Its job is to allocate space for storing commands, read commands in a loop, and dispatch each command to the matching handler. Internal commands such as "help" and "exit" are recognized by string comparison and executed directly.

Figure 43 The main function of the shell program

Since the shell needs to print the prompt information, in Figure 44, I defined a function to obtain the current directory and output the prompt.

Figure 44 The prompt information of the shell program

For the help information, just call the function in Figure 45 to directly output the help information.

Figure 45 The help information function of the shell program

To read the command line, I defined the input function shown in Figure 46. It reads characters one by one in a loop and stops when a newline character is read or the maximum length is exceeded. Each character is stored in the command array as it is read.

Figure 46 The read-in function of the shell program

To parse the command, I defined the function shown in Figure 47. It splits the input command on spaces and stores the pieces in an array until a newline character is encountered.

Figure 47 The command parsing function of the shell program

The most important job of the shell is to execute commands, and the most important part of executing a command is to judge whether it is legal and to classify it by type (redirection command, pipeline command, and so on). The code is shown in Figure 48. First, the command is taken out and checked for a background operator.

Figure 48 Command judgment of the shell program

Next, determine whether it is a redirection or pipeline command. If the command is illegal, flag is incremented; if it is legal, it is classified. As shown in Figure 49, the redirection symbols and pipe symbols are counted first, and the command is classified according to the counts. For a redirection command, the target of the redirection is taken out and stored in the file variable. For a pipeline command, the command after the pipe symbol is taken out for execution.

Figure 49 Redirection of shell programs and judgment of pipeline commands

After the command has been classified, it is executed. First, as shown in Figure 50, in the most common case the command contains neither redirection nor pipe symbols, and it is enough to call the execvp function to execute it.

Figure 50 Regular command execution of the shell program

For commands with output redirection symbols, as shown in Figure 51, use the dup2 function to redirect.

Figure 51 Output redirection command execution of the shell program

For commands with input redirection symbols, as shown in Figure 52, use the dup2 function to redirect, which is similar to output redirection.

Figure 52 Input redirection command execution of shell program

The pipeline command is relatively complicated. A child process executes the command in front of the pipe symbol, and the parent process then executes the command on the right side of the pipe symbol. The specific code is shown in Figure 53. First, fork creates a child process, which writes the output of the command on the left side of the pipe symbol into an intermediate file. During this time the parent process waits for the child process to finish. Next, the parent process takes the intermediate file written by the child process as input and runs the command to the right of the pipe symbol. Finally, the temporary file is deleted.

Figure 53 Pipeline command execution of shell program

In addition, if the command ends with the background operator, the parent process returns directly without waiting for the child process. The code for this case is shown in Figure 54.

Figure 54 Background operator processing of the shell program

In addition, the command has to be located before each execution, so the function shown in Figure 55 is used to search for it. It searches the current directory, the bin directory, and the bin directory under the user.

Figure 55 Command search of shell program

That is all of the code; the tests follow. First, test the internal command "help". The result is shown in Figure 56: the help information is printed.

Figure 56 The shell program outputs help information

If an illegal command is entered, a message is printed, as shown in Figure 57.

Figure 57 Shell program illegal command

Next, test the support for external commands. First, test "ls"; the result is shown in Figure 58.

Figure 58 The result of running the shell program ls

Next, test "ps"; the result is shown in Figure 59.

Figure 59 The result of running the shell program ps

Next, test the "cp" command by using it to copy the "helloworld.c" file. The results are shown in Figure 60 and Figure 61.

Figure 60 The cp command of the shell program

Figure 61 The result of the cp command of the shell program

In addition, C programs can be compiled and run from within my shell; the result is shown in Figure 62.

Figure 62 The shell program compiles and runs the c code

Next, test the pipeline functionality, using ls | grep helloworld and ls | grep shell as commands. The grep command prints the input lines that match a given string, so ls | grep helloworld prints the names of the files in the current folder that contain helloworld. The result under myshell is shown in Figure 63, and it can be seen that pipeline commands are executed normally.

Figure 63 Pipeline command test of shell program

Next, test the redirection functionality, starting with input redirection. For this test I wrote a simple a+b program, shown in Figure 64, and created a file to redirect from, shown in Figure 65.

Figure 64 a+b program

Figure 65 Input redirected file

Then I tested the input redirection in my shell. The result is shown in Figure 66; the program is executed correctly.

Figure 66 Shell input redirection test

Finally, test the output redirection. As shown in Figure 67, the output of the ps command is redirected to "log.txt"; the result is shown in Figure 68.

Figure 67 shell output redirection test

Figure 68 The results of the shell output redirection test

It can be seen that the result of output redirection is also as expected. This completes the shell: it accepts internal and external commands, shows a prompt that includes the current path, and reads and executes commands in a loop. In addition, it implements input and output redirection and pipeline commands.


Origin blog.csdn.net/m0_46326495/article/details/124731735