1. Shell-related knowledge
1.1 External commands and built-in commands
I think a shell can be seen as two parts: a front end that processes strings and a back end that runs other program files. The front end parses the string the user types and passes the result to the back end, which then runs the requested file. These files are called "external commands" precisely because their functionality is not implemented inside the shell. Since the shell does not have to implement external commands, this assignment is not very difficult. Internal (built-in) commands, by contrast, must be implemented inside the shell's own code. Fortunately, the assignment does not require implementing built-in commands.
To check whether a command is built-in or external, use the type command. For example, entering
type cd
prints
cd is a shell builtin
which means cd is a built-in command, while entering
type less
prints
less is /usr/bin/less
which means less is an external command.
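For external commands, type reports where the program lives by searching the directories listed in PATH. Below is a simplified sketch of that lookup in C, assuming a POSIX system. The function name find_in_path is hypothetical. Unlike the real type, it ignores built-ins, aliases, and empty PATH entries:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Try each directory in $PATH and report the first executable match.
 * Writes the full path into `out` and returns 0, or returns -1. */
int find_in_path(const char *name, char *out, size_t outlen)
{
    const char *path = getenv("PATH");
    if (path == NULL)
        return -1;

    char *copy = strdup(path);          /* strtok modifies its input */
    if (copy == NULL)
        return -1;

    int result = -1;
    for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        snprintf(out, outlen, "%s/%s", dir, name);
        if (access(out, X_OK) == 0) {   /* exists and is executable */
            result = 0;
            break;
        }
    }
    free(copy);
    return result;
}
```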
1.2 File descriptors
Under the "everything is a file" design philosophy, redirection means replacing a process's standard input and output files with files of our own choosing. This requires some operating-system background, as follows:
The figure above shows the complete relationship between processes and files. On the far left is the file descriptor table, one per process. It is essentially a lookup table: a file descriptor (fd) is the index that selects one of its entries.
The properties of fds are as follows:
- Each process has its own fd space. A new fd is always the lowest-numbered unused descriptor, so the numbers of closed fds may be reused. The number of fds a single process can have open at the same time is limited by system settings.
- By convention, when the shell starts a new application, descriptors 0, 1, and 2 are always open as stdin, stdout, and stderr. In C they are named by the macros STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO.
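The rule that a freed number gets reused can be checked directly. The small sketch below opens /dev/null, closes it, and opens it again. It assumes a single-threaded program, since another thread could grab the freed number first. The function name is hypothetical:

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the descriptor freed by close() is handed out again by
 * the next open(), 0 if not, and -1 if open() fails. */
int reopened_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return -1;
    close(a);                              /* number a is now free */
    int b = open("/dev/null", O_RDONLY);   /* lowest free number: a again */
    if (b < 0)
        return -1;
    close(b);
    return a == b;
}
```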
Each entry we look up points to a file table entry, which is still not the actual file. It can be thought of as the file's open state. It records, for example, the access mode the file was opened with and the current read/write offset. Two different file table entries can therefore refer to the same file while differing in state such as access mode and offset. The file table is shared by all processes.
Each file table entry contains a pointer to a v-node, the control block that records the file's static information. Each file corresponds to exactly one v-node.
In short, when programming in C we manipulate files either through an fd (the most fundamental interface) or through a FILE* (a buffered wrapper provided by the C standard library). Here we use fds, because redirection and pipes are implemented with system calls whose parameters are file descriptors.
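Because a FILE* is a buffered wrapper around an fd, mixing the two on the same file needs care. In the sketch below, the byte written through the FILE* reaches the file before the byte written with write() only because the stdio buffer is flushed first. The function name is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Write "A" through a FILE* and then "B" through the raw fd.
 * Returns 0 on success, -1 on error. */
int write_in_order(int fd)
{
    /* Wrap a duplicate of fd so that fclose() does not close the caller's fd. */
    FILE *fp = fdopen(dup(fd), "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "A");               /* stays in the user-space buffer ... */
    fflush(fp);                     /* ... until flushed into the kernel */
    if (write(fd, "B", 1) != 1) {   /* unbuffered: goes straight to the kernel */
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Without the fflush, the output would be "BA", because "A" would still be sitting in the buffer when write() runs.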
1.3 Opening and closing of files
In a user process, files are opened and closed through system calls. To open a file we have
int open(const char *filename, int flags, mode_t mode);
(declared in <fcntl.h>). It opens the file named filename with the access described by flags. flags is built from the following macros, which can be combined with bitwise OR:
macro | meaning |
---|---|
O_RDONLY | read only |
O_WRONLY | write only |
O_RDWR | read and write |
O_CREAT | If the file does not exist, create a truncated (empty) file |
O_TRUNC | If the file already exists, truncate it |
O_APPEND | Before each write operation, set the file position to the end of the file |
mode specifies the access permission bits of a newly created file. These bits also have macro definitions, which are not covered here. 0666 is the usual value; the resulting permissions are mode & ~umask.
The function returns the file descriptor fd of the opened file, or -1 on error.
To close a file, we call
int close(int fd);
which returns 0 on success and -1 on error.
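The following sketch combines open and close and shows how the output redirections > and >> could map onto open flags. write_file is a hypothetical helper, not part of ThyShell:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write `text` to `path` the way a shell handles `>` (append == 0:
 * create or truncate) and `>>` (append != 0: create or append).
 * Returns 0 on success and -1 on error. */
int write_file(const char *path, const char *text, int append)
{
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    int fd = open(path, flags, 0666);   /* final mode is 0666 & ~umask */
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    if (close(fd) != 0 || n != (ssize_t)len)
        return -1;
    return 0;
}
```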
1.4 Reading and writing files
This has nothing to do with implementing the shell, but it is the first time I understood it, so I am recording it here. Reading or writing a file with read and write costs one system call each time. That is expensive, because every call switches between user mode and kernel mode.
Functions such as getc are therefore buffered read/write functions. The first call reads a whole buffer's worth of the file with a single system call. Subsequent getc calls hand out bytes from that buffer one at a time, and only when the buffer is exhausted is another system call made to refill it.
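The sketch below makes the buffering idea concrete with a simplified getc-like reader built on read(). The struct and function names are hypothetical. The real stdio implementation is considerably more involved:

```c
#include <unistd.h>

#define BUF_SIZE 4096

/* One read() system call fills buf; bytes are then handed out from
 * user space without further system calls. */
struct buf_reader {
    int fd;
    char buf[BUF_SIZE];
    ssize_t pos;   /* index of the next byte to hand out */
    ssize_t len;   /* number of valid bytes in buf */
};

void buf_init(struct buf_reader *r, int fd)
{
    r->fd = fd;
    r->pos = r->len = 0;
}

/* Returns the next byte (0..255), or -1 at end of file or on error. */
int buf_getc(struct buf_reader *r)
{
    if (r->pos == r->len) {                      /* buffer used up */
        r->len = read(r->fd, r->buf, BUF_SIZE);  /* one system call */
        r->pos = 0;
        if (r->len <= 0) {
            r->len = 0;
            return -1;
        }
    }
    return (unsigned char)r->buf[r->pos++];
}
```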
1.5 Redirection
With this background we can introduce redirection. The function we use is
int dup2(int oldfd, int newfd);
It copies the descriptor table entry of oldfd into the entry of newfd. Afterwards, newfd refers to the same file table entry as oldfd. If newfd was already open, dup2 first closes it and then makes the copy. It returns newfd on success. A negative return value (-1) means failure.
For example, the figure shows the state before executing the statement
dup2(4,1)
and how it changes afterwards. This is exactly a redirection of stdout: every later operation on stdout goes to the file that fd 4 refers to.
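The following is a minimal sketch of the same idea in code, assuming a POSIX system. print_to_file is a hypothetical name. In a shell, the dup2 normally happens in the child just before execvp, so restoring stdout is unnecessary there. This sketch restores it because it redirects the current process:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Temporarily redirect stdout into `path`, print `msg`, then restore
 * the original stdout. Returns 0 on success, -1 on error. */
int print_to_file(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;
    int saved = dup(STDOUT_FILENO);       /* keep a way back to the old stdout */
    if (saved < 0) {
        close(fd);
        return -1;
    }
    fflush(stdout);                       /* earlier output must not land in the file */
    if (dup2(fd, STDOUT_FILENO) < 0) {    /* fd 1 now shares fd's file table entry */
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                            /* fd 1 still refers to the file */

    printf("%s", msg);                    /* ordinary stdout output, now in the file */
    fflush(stdout);

    dup2(saved, STDOUT_FILENO);           /* restore the original stdout */
    close(saved);
    return 0;
}
```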
1.6 Pipeline
Pipes also build on the ideas above. The call is
int pipe(int fd[2]);
It returns 0 on success and -1 on failure.
On success it fills in the fd array. By convention, fd[0] is the read end and fd[1] is the write end. Reading from and writing to the pipe actually reads and writes a kernel buffer. No open call is needed, but both ends must be closed manually with close.
The implementation is illustrated by the following schematic diagram:
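The sketch below shows a pipe shared between a parent and a forked child under the usual POSIX semantics. read_from_child is a hypothetical name. The sketch also shows why unused ends must be closed:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes a message into a pipe and the parent reads it back.
 * Returns the number of bytes stored in buf (NUL-terminated), or -1. */
ssize_t read_from_child(char *buf, size_t size)
{
    int fd[2];                          /* fd[0]: read end, fd[1]: write end */
    if (size == 0 || pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: the writer */
        close(fd[0]);                   /* it never reads */
        const char *msg = "hello from child";
        ssize_t w = write(fd[1], msg, strlen(msg));
        close(fd[1]);
        _exit(w < 0 ? 1 : 0);
    }

    /* Parent: close its copy of the write end, otherwise read() would
     * never see end-of-file because a writer would still exist. */
    close(fd[1]);
    size_t total = 0;
    ssize_t n;
    while (total < size - 1 &&
           (n = read(fd[0], buf + total, size - 1 - total)) > 0)
        total += (size_t)n;
    close(fd[0]);
    buf[total] = '\0';
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```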
1.7 Calling external commands
We can use the function
int execvp(const char *command, char *const argv[]);
Note that on success this function does not return. The process is replaced by the new program, so the statements after the call never run. If execution does reach the next statement, the call failed and an error should be reported. Also, the last element of argv must be NULL.
For example, suppose the user enters the command
ls -a -l ~
Then the corresponding parameters should be
command = "ls";
argv = {"ls", "-a", "-l", "~", NULL};
(execvp itself does not expand ~. A shell normally replaces it with the home directory before making the call.)
2. Basic functions
2.1 Requirements analysis
At first I was quite scared. The shell felt closely tied to the operating system, and I worried that my limited OS knowledge would keep me from writing one. After looking into it more deeply, I found that writing a shell is not that hard. A simple shell without redirection, pipes, built-in commands, or background commands can be written in a little over 100 lines. In essence, it amounts to implementing the C library function system.
For redirection, you only need to recognize the redirection symbols <, >, and >> and record the file each one names. Before calling the external command, perform the file redirection, then call the command.
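The recognition step could look like the sketch below. It scans the tokens, records the file after each <, >, or >>, and removes those tokens so that what remains is a clean argv. The struct and function names are hypothetical, and ThyShell's own parser may differ:

```c
#include <stddef.h>
#include <string.h>

/* Redirections found on one command line. */
struct redirs {
    const char *in;    /* file after "<", or NULL */
    const char *out;   /* file after ">" or ">>", or NULL */
    int append;        /* 1 for ">>", 0 for ">" */
};

/* Scan a NULL-terminated token array, record the redirections, and
 * compact the remaining tokens in place. Returns 0, or -1 if an
 * operator has no file name after it. */
int parse_redirs(char *argv[], struct redirs *r)
{
    r->in = r->out = NULL;
    r->append = 0;
    size_t j = 0;
    for (size_t i = 0; argv[i] != NULL; i++) {
        int is_in = strcmp(argv[i], "<") == 0;
        int is_out = strcmp(argv[i], ">") == 0;
        int is_app = strcmp(argv[i], ">>") == 0;
        if (is_in || is_out || is_app) {
            if (argv[i + 1] == NULL)
                return -1;               /* syntax error: missing file name */
            if (is_in) {
                r->in = argv[i + 1];
            } else {
                r->out = argv[i + 1];
                r->append = is_app;
            }
            i++;                         /* skip the file name as well */
        } else {
            argv[j++] = argv[i];         /* keep ordinary arguments */
        }
    }
    argv[j] = NULL;
    return 0;
}
```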
For pipes, the assignment only requires connecting two commands. It would therefore suffice to keep two command variables, find the position of |, split the line into two commands there, and redirect each one accordingly. But handling exactly two commands is not general, so I extended it to pipelines of any number of commands. The result is demonstrated, and the implementation principle explained, later.
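Cutting the command line at | can be done in place on the token array. The sketch below, which uses hypothetical names, handles any number of commands. It replaces each | with NULL so that every piece is a valid argv for execvp:

```c
#include <stddef.h>
#include <string.h>

/* Split a NULL-terminated token array at each "|". Stores the start of
 * each command in cmds (at most max_cmds) and returns the number of
 * commands, or -1 on an empty command such as "ls |" or "| wc". */
int split_pipeline(char *tokens[], char **cmds[], int max_cmds)
{
    if (max_cmds < 1)
        return -1;
    int n = 0;
    cmds[n++] = tokens;
    for (size_t i = 0; tokens[i] != NULL; i++) {
        if (strcmp(tokens[i], "|") == 0) {
            if (n == max_cmds)
                return -1;
            tokens[i] = NULL;            /* terminate the previous command */
            cmds[n++] = &tokens[i + 1];  /* next command starts after "|" */
        }
    }
    for (int k = 0; k < n; k++)          /* reject empty pieces */
        if (cmds[k][0] == NULL)
            return -1;
    return n;
}
```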
2.2 shell program flow
Because the parsing process is relatively complicated, including it in the overall flow chart would make the chart too cluttered. A separate sub-flow chart therefore describes the parsing process.
2.3 Function display
2.3.1 Command prompt with identity feature
When my shell starts, it first prints the shell name Thysrael Shell. The leftmost part of every command prompt then reads ThyShell. Both identify the author of the shell.
2.3.2 Running an external command without parameters
We chose ls as the test command, and it runs correctly.
2.3.3 Support I/O redirection
Output redirection: as shown, both > and >> work correctly.
Input redirection: works correctly.
2.3.4 Pipeline commands
Two commands can be piped together
2.3.5 Combination of pipes and redirection
As shown, combining input or output redirection with a pipeline works without problems.
2.3.6 Code size
All of ThyShell's code is in ThyShell.c, 322 lines in total, which meets the assignment's requirements.
2.4 Implementation and system calls
3. Advanced features
3.1 Print and compress the path
The command prompt shows the current path in compressed form. When the path starts with the home directory /home/user_name, that prefix is replaced by ~.
This is implemented by calling getcwd to get the current path and getenv("HOME") to get the home directory path, then comparing the two and compressing. The code is as follows:
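#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Print the prompt, abbreviating a leading $HOME to '~'. */
void print_prompt()
{
    char *path = getcwd(NULL, 0);   /* glibc extension: getcwd allocates the buffer */
    const char *home = getenv("HOME");
    if (path == NULL)
        return;
    size_t len_home = home ? strlen(home) : 0;
    size_t len_path = strlen(path);
    /* compress only if home is a whole-component prefix of path */
    if (len_home > 0 && strncmp(path, home, len_home) == 0 &&
        (path[len_home] == '\0' || path[len_home] == '/'))
    {
        path[0] = '~';
        memmove(path + 1, path + len_home, len_path - len_home);
        path[len_path - len_home + 1] = '\0';
    }
    printf("ThyShell \033[0;32m%s\033[0m $ ", path);
    fflush(stdout);   /* the prompt has no newline, so flush explicitly */
    free(path);
}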
3.2 quit built-in command
The effect is as follows
quit is treated as a command and checked before any external command is called. If it matches, the shell exits immediately. The code, which also handles cd, is as follows:
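#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if the command was a built-in and has been handled, 0 otherwise.
   Command and quit() are assumed to be defined elsewhere in ThyShell.c. */
int builtin_command(Command command)
{
    if (command.argv[0] == NULL)    /* empty command line */
        return 0;
    if (!strcmp(command.argv[0], "quit"))
    {
        quit();                     /* assumed to exit the shell */
        return 1;
    }
    else if (!strcmp(command.argv[0], "cd"))
    {
        /* with no argument, go to $HOME instead of passing NULL to chdir */
        const char *dir = command.argv[1] ? command.argv[1] : getenv("HOME");
        if (dir == NULL)
            fprintf(stderr, "Error: cannot cd: HOME not set\n");
        else if (chdir(dir) != 0)
            fprintf(stderr, "Error: cannot cd to %s: %s\n", dir, strerror(errno));
        return 1;
    }
    return 0;
}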
3.3 The cd built-in command
The effect is demonstrated below.
You can move freely between directories that the user has permission to access. This is implemented with the chdir function. The code is in Section 3.2.
3.4 Error detection
Besides its normal functions, ThyShell also detects errors. It detects fork failures, waitpid failures, and syntax errors. This is done by wrapping the system call functions, which preserves normal behavior while keeping the calling code concise. The implementation is as follows:
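#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Print a message describing errno and exit with a failure status. */
void unix_error(char *msg)
{
    fprintf(stderr, "%s: %s\n", msg, strerror(errno));
    exit(1);   /* non-zero: exit(0) would report success */
}

/* fork() that terminates the shell on failure instead of returning -1. */
pid_t Fork()
{
    pid_t pid;
    if ((pid = fork()) < 0)
    {
        unix_error("Fork error");
    }
    return pid;
}

/* Wait for pid; report waitpid failures and abnormal terminations. */
void Wait(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
    {
        unix_error("Waitpid error");
    }
    if (!WIFEXITED(status))
    {
        fprintf(stderr, "child %d terminated abnormally\n", (int)pid);
    }
}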
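Other system calls used by the shell can be wrapped in the same way. The Pipe and Dup2 wrappers below are hypothetical, since the original code does not show them. The block repeats unix_error so that it compiles on its own:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

void unix_error(char *msg)
{
    fprintf(stderr, "%s: %s\n", msg, strerror(errno));
    exit(1);
}

/* pipe() that terminates on failure, so callers need no error check. */
void Pipe(int fd[2])
{
    if (pipe(fd) < 0)
        unix_error("Pipe error");
}

/* dup2() that terminates on failure; returns newfd otherwise. */
int Dup2(int oldfd, int newfd)
{
    int rc = dup2(oldfd, newfd);
    if (rc < 0)
        unix_error("Dup2 error");
    return rc;
}
```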
3.5 Multi-pipeline commands
ThyShell supports pipelines of multiple commands. For example, entering
cat filename | wc -l | less
produces the following output.
This shows that the feature works correctly.
For the implementation, see the flow chart. The idea is to treat the command line as a separate level of abstraction: a command line contains one or more commands. Whenever a pipe appears between two commands, a pipe is opened and the commands' standard input and output are redirected to its ends.
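The description above gives the idea but not the code. Below is a minimal sketch of one common way to run an n-command pipeline. It assumes each command is already a NULL-terminated argv. The name run_pipeline is hypothetical, and ThyShell's actual code may differ:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run cmds[0] | cmds[1] | ... | cmds[n-1]. Returns the exit status of
 * the last command, or -1 on error. */
int run_pipeline(char **cmds[], int n)
{
    if (n < 1)
        return -1;
    pid_t *pids = malloc((size_t)n * sizeof *pids);
    if (pids == NULL)
        return -1;

    int spawned = 0, failed = 0;
    int prev_read = -1;                   /* read end of the previous pipe */
    for (int i = 0; i < n; i++) {
        int last = (i == n - 1);
        int fd[2] = {-1, -1};
        if (!last && pipe(fd) < 0) {      /* no pipe after the last command */
            failed = 1;
            break;
        }
        pid_t pid = fork();
        if (pid < 0) {
            if (!last) {
                close(fd[0]);
                close(fd[1]);
            }
            failed = 1;
            break;
        }
        if (pid == 0) {                   /* child */
            if (prev_read != -1) {        /* stdin <- previous pipe */
                dup2(prev_read, STDIN_FILENO);
                close(prev_read);
            }
            if (!last) {                  /* stdout -> next pipe */
                close(fd[0]);
                dup2(fd[1], STDOUT_FILENO);
                close(fd[1]);
            }
            execvp(cmds[i][0], cmds[i]);
            perror(cmds[i][0]);           /* reached only if execvp failed */
            _exit(127);
        }
        /* Parent: close ends it no longer needs, or readers never see EOF. */
        if (prev_read != -1)
            close(prev_read);
        prev_read = -1;
        if (!last) {
            close(fd[1]);
            prev_read = fd[0];
        }
        pids[spawned++] = pid;
    }
    if (prev_read != -1)
        close(prev_read);

    int status, result = -1;
    for (int k = 0; k < spawned; k++)
        if (waitpid(pids[k], &status, 0) == pids[k] &&
            k == n - 1 && WIFEXITED(status))
            result = WEXITSTATUS(status);
    free(pids);
    return failed ? -1 : result;
}
```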
3.6 Commands with parameters
This is handled while parsing the command line. The parsed arguments are passed to execvp as its argv parameter, as follows:
execvp(command.argv[0], command.argv);
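The argv passed here comes from splitting the input line. Below is a minimal sketch of such a tokenizer. It assumes arguments are separated only by whitespace, with no quoting or escapes. parse_args and MAX_ARGS are hypothetical names:

```c
#include <string.h>

#define MAX_ARGS 64

/* Split a command line on spaces, tabs and newlines into a
 * NULL-terminated argv suitable for execvp(). Tokens point into
 * `line`, which is modified in place. Returns the argument count. */
int parse_args(char *line, char *argv[MAX_ARGS])
{
    int argc = 0;
    for (char *tok = strtok(line, " \t\n");
         tok != NULL && argc < MAX_ARGS - 1;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;
    argv[argc] = NULL;   /* execvp requires a NULL terminator */
    return argc;
}
```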