Understanding Linux Network Internals (1): Basic Introduction


Foreword

Hello, everyone.
Out of work needs and a strong interest in Linux, I decided to start studying 《深入理解Linux网络技术内幕》 (Understanding Linux Network Internals).
I am writing this blog to record my study notes and learning experience. If you are also interested, you are welcome to study and discuss together.

I won't go into the book's preface and introduction here; let's go directly to its first chapter.

Introduction

Studying the source code of a large project is like entering an unfamiliar new land, with its own customs and unspoken expectations. It helps to learn some of the major conventions in advance, and to try interacting with several of the components rather than just standing by and watching.

This chapter focuses on some common programming patterns and techniques that are often encountered in networking code.
If possible, I encourage you to use user-space tools to interact with specific parts of the kernel networking code. If such tools are not installed in your preferred Linux distribution, or you simply want to upgrade them to the latest version, this chapter points out where to download them.

There are also some tools introduced here that allow you to explore the huge kernel code in an elegant way. Finally, it will briefly explain why some kernel features widely used in the Linux community have not been integrated into the official kernel code version.

Basic Terms

In networking literature, an eight-bit quantity is often called an octet. This book, however, uses the more common term byte. After all, it describes kernel behavior rather than some networking abstraction, and kernel developers are used to thinking in bytes.

The terms vector and array are used interchangeably in this book.

When referring to the layers of the TCP/IP protocol stack, L2, L3, and L4 denote the link layer, the network layer, and the transport layer, respectively. These numbers are based on the famous (though now inaccurate) seven-layer OSI model. In most cases, L2 is a synonym for Ethernet, L3 means IPv4 or IPv6, and L4 means UDP, TCP, or ICMP. When a specific protocol is meant, its name (such as TCP) is used instead of the generic Ln term.

The act of reception or transmission of a data unit is denoted by the abbreviations RX and TX, respectively.

The name of a data unit depends on the layer where it lives: frame, packet, segment, and message.

The main abbreviations used are listed in a table in the book.

Common Coding Patterns

Like other kernel features, networking features are just some of the many citizens of the kernel. They must therefore use memory, CPU, and other shared resources properly and fairly. Most features are not self-contained pieces of code; depending on the feature, they interact more or less with other kernel components. Similar features therefore try to use similar mechanisms wherever possible.

Some kernel components have common requirements, such as having to allocate several instances of the same data structure type, having to record references to certain data structure instances to avoid unsafe memory deallocation, and so on. In the next few subsections, we'll describe the common way Linux handles such requirements, and we'll introduce common coding techniques you might encounter while browsing the kernel code.

Memory Caches

The kernel allocates and frees blocks of memory with the kmalloc and kfree functions, respectively. Their syntax is similar to that of their two user-space libc counterparts, malloc and free. For details on kmalloc and kfree, refer to the book Linux Device Drivers (《Linux设备驱动程序》).

It is common for kernel components to allocate several instances of the same data structure type. When allocations and deallocations are frequent, the initialization function of the associated kernel component (for example, fib_hash_init for routing tables) usually allocates a special memory cache to draw from. When a block is later freed, it is returned to the same cache from which it was originally allocated.

Examples of some network data structures that the kernel maintains in its own memory cache include:
Socket buffer descriptors
This cache, allocated by skb_init in net/core/skbuff.c, is used for sk_buff buffer descriptors. The sk_buff structure probably has the highest allocation and deallocation rate in the network subsystem.

Neighboring protocol mappings
Each neighboring protocol uses a memory cache to allocate the data structures that store L3-to-L2 address mappings. (See Chapter 27.)

Routing tables
The routing code uses two memory caches for two of the data structures that define routes.

The key kernel functions for handling memory caches are:
kmem_cache_create
kmem_cache_destroy
Create and destroy a cache.
kmem_cache_alloc
kmem_cache_free
Allocate a buffer from the cache and return one to it. These are usually invoked through wrapper functions, which manage allocation and deallocation requests at a higher level. For example, a request to free an sk_buff instance via kfree_skb calls kmem_cache_free only after all references to the buffer have been released and all necessary cleanup has been done by the interested subsystems (such as the firewall). The limit on the number of instances a given cache may allocate (when there is one) is usually enforced by the wrapper around kmem_cache_alloc, and is sometimes configurable via a /proc parameter.
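To make the pattern concrete, here is a tiny user-space sketch of the idea behind a memory cache: objects of one size are recycled through a free list instead of going back to the general allocator every time. All the names here (obj_cache, cache_create, and so on) are invented for illustration; the real kernel interface is the kmem_cache_* family described above.

```c
/* User-space sketch of the memory-cache idea: objects of one size are
 * recycled through a free list instead of hitting the general allocator
 * every time. obj_cache and friends are invented names; the kernel
 * interface is kmem_cache_create/alloc/free/destroy. */
#include <stdlib.h>

struct obj_cache {
    size_t obj_size;    /* size of each cached object */
    void  *free_list;   /* linked list threaded through freed objects */
};

static struct obj_cache *cache_create(size_t obj_size)
{
    struct obj_cache *c = malloc(sizeof(*c));

    if (c) {
        /* freed objects must have room to hold the free-list pointer */
        c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
        c->free_list = NULL;
    }
    return c;
}

static void *cache_alloc(struct obj_cache *c)
{
    if (c->free_list) {               /* reuse a previously freed object */
        void *obj = c->free_list;

        c->free_list = *(void **)obj;
        return obj;
    }
    return malloc(c->obj_size);       /* cache empty: fall back to malloc */
}

static void cache_free(struct obj_cache *c, void *obj)
{
    /* the block goes back to the cache it came from, not to the system */
    *(void **)obj = c->free_list;
    c->free_list = obj;
}
```

A wrapper such as kfree_skb would sit on top of something like cache_free, doing all the cleanup and reference checks before actually returning the buffer to the cache.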

For details on the implementation of memory caches and their interface to the slab allocator, refer to the book Understanding the Linux Kernel (《深入理解Linux内核》).

Caching and Hash Tables

It is common to use a cache to improve performance. In networking code, both the L3-to-L2 mappings (such as the ARP cache used by IPv4) and the routing table use caches.

Cache lookup functions usually take an input parameter that says whether, on a cache miss, a new element should be created and added to the cache. When this parameter is absent, the lookup function always adds missing elements.

Caches are usually implemented with hash tables. The kernel provides a set of data types, such as singly and doubly linked lists, that make it easy to build simple hash tables. The standard way to handle inputs that hash to the same value is to put them on a list: traversing that list takes much longer than a lookup by hash key, so it is always important to keep the number of inputs sharing a hash value to a minimum.

When the lookup time of a hash table (with or without a cache) is a critical parameter for its owning subsystem, the subsystem may implement a mechanism to grow the hash table, so that the average length of the collision lists decreases and the average lookup time improves.
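As a small illustration of the structure described above, here is a toy user-space hash table whose buckets are collision lists, with the optional "create on miss" parameter mentioned earlier. Everything here is invented for illustration; the kernel builds its tables from its own list types (such as hlist_head).

```c
/* Toy hash table with collision lists, illustrating the scheme described
 * in the text. The hash function, bucket count, and entry layout are all
 * made up for this sketch. */
#include <stdlib.h>

#define NBUCKETS 16

struct entry {
    unsigned int  key;
    int           value;
    struct entry *next;   /* collision list: entries sharing a bucket */
};

static struct entry *buckets[NBUCKETS];

static unsigned int hash(unsigned int key)
{
    return key % NBUCKETS;   /* toy hash; collisions are expected */
}

/* Lookup with the optional "create" flag described in the text:
 * on a miss, a new element is added only when create is nonzero. */
static struct entry *lookup(unsigned int key, int create)
{
    unsigned int h = hash(key);
    struct entry *e;

    for (e = buckets[h]; e; e = e->next)   /* walk the collision list */
        if (e->key == key)
            return e;

    if (!create)
        return NULL;

    e = calloc(1, sizeof(*e));
    e->key = key;
    e->next = buckets[h];                  /* push onto the bucket's list */
    buckets[h] = e;
    return e;
}
```

The longer the collision list a bucket accumulates, the more the loop in lookup dominates, which is exactly why the text stresses minimizing inputs with the same hash value.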

Reference Counts

When a piece of code tries to access a data structure that has already been freed, the kernel is not very happy about it, and the user is rarely happy with the kernel's reaction. To avoid such nasty problems, and to make garbage collection easier and more effective, most data structures keep a reference count. A well-behaved kernel component increments the reference count of a data structure whenever it saves a reference to it, and decrements it whenever it releases one. Any kernel component that owns a structure requiring a reference count usually exports two functions to increment and decrement it, usually called xxx_hold and xxx_release. Sometimes the release function is called xxx_put instead (for example, dev_put for the net_device structure).

While we would like to think there are no badly behaved components in the kernel, developers are human, so bug-free code is impossible. Reference counting is a simple but effective mechanism for avoiding the freeing of data structures that are still in use. However, it does not solve the problem completely, because increments and decrements must be kept balanced:

  • If you release a reference but forget to call the xxx_release function, the kernel will never allow the data structure to be freed (unless another piece of buggy code happens to call the release function by mistake!), and memory is gradually exhausted.
  • If you take a reference but forget to call xxx_hold, and you later happen to be the only reference holder, the structure will be freed prematurely (because you never claimed it). This case is definitely more catastrophic than the previous one: the next access to the structure may corrupt other data or cause a kernel panic that crashes the whole system instantly.

When a data structure is to be removed for some reason, its reference holders can be explicitly notified that it is going away, so that they can politely release their references. This is done through notification chains (examples will come later). Reference counts are commonly incremented in cases such as the following:

  • There is a close relationship between two data structure types. In this case, one of the two often keeps a pointer initialized to the address of the other.
  • A timer is started whose handler will access the data structure. When the timer is fired, the structure's reference count is incremented, because the last thing you want is for the structure to be freed before the timer expires.
  • A successful lookup in a list or hash table returns a pointer to the matching element. In most cases the caller uses the result to carry out some task, so it is common for a lookup function to increment the reference count of the matching element and let the caller release it when necessary.

When the last reference to a data structure is released, the structure may be freed, because it is no longer needed, but it does not have to be.
The recently introduced sysfs filesystem has helped a good portion of the kernel code become more aware of reference counts and more consistent in their use.
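Here is a minimal user-space sketch of the xxx_hold/xxx_release convention. The foo structure and its functions are invented for illustration; note also that real kernel reference counts use atomic types (atomic_t), not the plain int used here.

```c
/* Plain-C sketch of the xxx_hold/xxx_release convention. struct foo and
 * foo_freed are invented for illustration; the kernel uses atomic
 * counters, and real pairs look like dev_hold/dev_put for net_device. */
#include <stdlib.h>

struct foo {
    int refcnt;
    /* ... payload fields ... */
};

static int foo_freed;   /* counts actual frees, for illustration only */

static struct foo *foo_alloc(void)
{
    struct foo *f = calloc(1, sizeof(*f));

    if (f)
        f->refcnt = 1;      /* the creator holds the first reference */
    return f;
}

static void foo_hold(struct foo *f)
{
    f->refcnt++;            /* called whenever a new reference is saved */
}

static void foo_release(struct foo *f)
{
    if (--f->refcnt == 0) { /* last reference gone: safe to free */
        foo_freed++;
        free(f);
    }
}
```

The two failure modes described above map directly onto this sketch: a missing foo_release leaks the structure forever, while a missing foo_hold lets someone else's foo_release free it out from under you.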

Garbage Collection

Memory is a limited, shared resource and should not be wasted, especially in the kernel, because the kernel does not use virtual memory. Most kernel subsystems therefore implement some sort of garbage collection to reclaim the memory held by unused or stale data structure instances. Depending on the needs of the particular feature, you will find two main kinds of garbage collection:
Asynchronous
This type of garbage collection is unrelated to any particular event: a timer periodically invokes a function that scans a set of data structures and frees the ones eligible for deletion. What makes a structure eligible depends on the features and logic of the subsystem, but a common criterion is a reference count of zero.
Synchronous
When memory is low, garbage collection is triggered immediately rather than waiting for the timer-driven asynchronous collection. The criteria used to select the structures eligible for deletion are not necessarily the same as the asynchronous ones (for example, they may be more aggressive).

Function Pointers and Virtual Function Tables (VFTs)

Function pointers are a convenient way to write clean C code while getting some of the benefits of object-oriented languages. In the definition of a data structure type (the object), you include a set of function pointers (the methods); some or all operations on the structure can then be carried out through the embedded functions. In C, function pointers in a data structure look like this:

struct sock {
	...
	void (*sk_state_change)(struct sock *sk);
	void (*sk_data_ready)(struct sock *sk, int bytes);
	...
};

The main advantage of function pointers is that they can be initialized differently depending on various criteria and on the role played by the object. Thus a call to sk_state_change may actually invoke different functions for different sock sockets.

Function pointers are used extensively in networking code. The following are just a few examples:

  • When an ingress or egress packet is processed by the routing subsystem, it initializes two function pointers in the buffer's data structure.
  • When a packet is ready for transmission on the network hardware, it is handed to the hard_start_xmit function pointer of the net_device data structure. That routine is initialized by the device driver associated with the device.
  • When an L3 protocol wants to transmit a packet, it invokes one of a set of function pointers. These are initialized to a set of routines by the address resolution protocol associated with that L3 protocol. Depending on which routine each pointer is set to, L3-to-L2 address resolution may happen transparently; when no resolution is needed, a different routine is used. (Details later.)

As these examples show, function pointers can serve as interfaces between kernel components, or as a generic mechanism to invoke the right function handler at the right time, based on the result of something done by a different subsystem. They are also an easy way to let a protocol, a device driver, or any other feature personalize an action.

Let's look at an example.
When a device driver registers a network device with the kernel, it goes through a series of steps that are needed regardless of the device type. At some point, a function pointer of the net_device data structure is invoked to let the device driver do something extra if needed. The driver can either point it at one of its own functions or leave the pointer NULL, because the kernel's default steps are sufficient.
Before a function pointer is executed, its value must always be checked, to avoid dereferencing a NULL pointer, as in this snippet from register_netdevice:

if (dev->init && dev->init(dev) != 0)
{
	...
}

Function pointers have one main drawback: they make the code slightly harder to read.
While following a given code path, you may end up at a function pointer call; before continuing down the path, you have to find out how the pointer was initialized. This may depend on several factors:

  • When the function assigned to the pointer is chosen based on a particular piece of data (such as the protocol handling the data, or the device driver a given packet was received from), it is easier to work out the function. For example, if a given device is handled by the driver in drivers/net/3c59x.c, you can read that driver's device initialization function and find which routine it assigns to the relevant function pointer of the net_device data structure.
  • When the choice is based on more complex logic, such as the state of an L3-to-L2 address mapping resolution, the function used at any given time depends on external events that cannot be predicted.

A set of function pointers grouped into a data structure is often called a virtual function table (VFT). When a VFT is used as the interface between two major subsystems, such as the L3 and L4 protocol layers, or when it is exported as the interface to a generic kernel component (a set of objects), the number of function pointers in it may swell to accommodate many different protocols and features; each user may end up using only a small subset of the functions provided.
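The idea of a VFT can be sketched in a few lines of user-space C: a structure of function pointers that different "protocols" initialize differently, so that callers dispatch through the table without caring which implementation they reach. All the names below are invented for illustration.

```c
/* A miniature VFT: a structure of function pointers that different
 * "protocols" fill in differently, so callers dispatch through the
 * table without knowing which implementation they get. All names are
 * invented for illustration. */
struct proto_vft {
    const char *name;
    int (*transmit)(const char *data);   /* the "method" */
};

static int fast_transmit(const char *data) { (void)data; return 1; }
static int safe_transmit(const char *data) { (void)data; return 2; }

/* two objects of the same type, individualized through their VFTs */
static struct proto_vft proto_a = { "A", fast_transmit };
static struct proto_vft proto_b = { "B", safe_transmit };

static int send_packet(struct proto_vft *p, const char *data)
{
    return p->transmit(data);   /* dispatch through the table */
}
```

This is also what makes the code harder to follow, as noted below: from the send_packet call site alone, you cannot tell which transmit routine will actually run.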

The goto Statement

Few C programmers like the goto statement. Without revisiting its history (one of the oldest and most famous debates in programming), here are a few reasons why goto is generally deprecated, and why the Linux kernel uses it anyway.

Any code fragment that uses goto can be rewritten without it. Using goto can reduce the readability of the code and make debugging harder, because for any statement following a goto you can no longer derive unambiguously the conditions under which it executes.

By analogy: for any node of a tree, you know the exact path from the root to that node. But if vines are grafted haphazardly between branches, there is no longer a single path between the root and the other nodes.

However, the C language provides no explicit exceptions (and other languages often deliberately avoid them because of their performance cost and coding complexity). Careful use of goto makes it easy to jump to the code that handles abnormal or special events. In kernel programming, and networking code in particular, such events are common enough that goto becomes a convenient tool.

In defense of the kernel's use of goto, I must point out that the developers have by no means abused it. Even though there are more than 3,000 instances, they are mostly used to handle different return codes within a function, or to jump out of more than one level of nesting.
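The typical pattern looks like the following sketch: a function acquires several resources in turn and uses goto labels as a single, ordered cleanup path. The two buffers here are hypothetical resources, but the shape matches countless kernel functions.

```c
/* The usual kernel shape for goto-based error handling: acquire
 * resources in order, and on failure unwind them in reverse order
 * through labels. The buffers are hypothetical resources invented
 * for this sketch. */
#include <stdlib.h>

static int setup(void)
{
    char *buf1, *buf2;
    int err = -1;

    buf1 = malloc(64);
    if (!buf1)
        goto out;               /* nothing acquired yet: just report */

    buf2 = malloc(64);
    if (!buf2)
        goto err_free_buf1;     /* undo only what succeeded so far */

    /* ... the real work would go here ... */

    free(buf2);
    free(buf1);
    return 0;

err_free_buf1:
    free(buf1);                 /* unwind in reverse order of acquisition */
out:
    return err;
}
```

Without goto, each failure branch would have to repeat the cleanup of everything acquired before it, which is exactly the duplication the kernel style avoids.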

Vector (Array) Definitions

In some cases, the definition of a data structure includes an optional block at the end. For example:

struct abc {
	int age;
	char *name[20];
	...
	char placeholder[0];
};

The optional block starts with placeholder. Note that placeholder is declared as a vector of size 0. When abc is allocated with the optional block, placeholder points to the beginning of that block; when the optional block is not needed, placeholder is just a label marking the end of the structure, and consumes no space.

Thus, if abc is used by several pieces of code, each can share the same basic definition (avoiding the confusion of several structures doing the same thing in slightly different ways), while still personalizing its use of abc according to its own needs.
(I don't fully understand its purpose yet; examples will come later.)
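Here is one way the trick can be used, as far as I understand it: allocate the structure and its optional block with a single allocation, then reach the block through placeholder. This is a user-space sketch with made-up names; note that zero-length arrays are a GNU C extension, and standard C99 would write char placeholder[] instead.

```c
/* One way to use the placeholder trick above: allocate struct abc and
 * its optional block in a single malloc, then address the block through
 * placeholder. abc_alloc is a made-up helper. Zero-length arrays are a
 * GNU C extension; standard C99 uses "char placeholder[]". */
#include <stdlib.h>
#include <string.h>

struct abc {
    int  age;
    char placeholder[0];   /* consumes no space in the structure */
};

static struct abc *abc_alloc(const char *extra, size_t extra_len)
{
    /* one allocation covers the fixed part plus the optional block */
    struct abc *p = malloc(sizeof(*p) + extra_len);

    if (p && extra)
        memcpy(p->placeholder, extra, extra_len);
    return p;
}
```

Code that does not need the optional block simply allocates sizeof(struct abc) bytes, and placeholder costs it nothing.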

Conditional Directives (#ifdef and Family)

Sometimes conditional directives must be given to the compiler. Heavy use of conditional directives can hurt readability, but I can say that Linux does not abuse them. They appear for various reasons; the ones of interest here check whether the kernel supports a particular feature. Configuration tools such as make xconfig determine whether a feature is compiled in, not supported at all, or loadable as a module.

#ifdef or #if defined blocks, for instance, ask the C preprocessor to check conditions such as the following:
Include or exclude fields from a data structure definition

struct sk_buff {
	...
#ifdef CONFIG_NETFILTER_DEBUG
	unsigned int nf_debug;
#endif
	...
};

In this example, the Netfilter debugging feature requires an nf_debug field in the sk_buff structure. When the kernel does not support Netfilter debugging (a feature only a few developers need), there is no reason to include the field, which would only cost extra memory for every network packet.

Include or exclude pieces of code from a function

int ip_route_input(...)
{
	...
	if (rth->fl.fl4_dst == daddr &&
	    rth->fl.fl4_src == saddr &&
	    rth->fl.iif == iif &&
	    rth->fl.oif == 0 &&
#ifdef CONFIG_IP_ROUTE_FWMARK
	    rth->fl.fl4_fwmark == skb->nfmark &&
#endif
	    rth->fl.fl4_tos == tos)
	{
		...
	}
}

As explained later, the routing cache lookup function ip_route_input checks the mark value set by the firewall only when the kernel has been compiled with support for the "IP: use netfilter MARK value as routing key" feature.

Select the right prototype for a function

#ifdef CONFIG_IP_MULTIPLE_TABLES
struct fib_table *fib_hash_init(int id)
#else
struct fib_table * __init fib_hash_init(int id)
#endif
{
	...
}

In this example, the directives add the __init tag to the prototype when the kernel does not have support for Policy Routing.
Select the right definition for a function

#ifndef CONFIG_IP_MULTIPLE_TABLES
...
static inline struct fib_table *fib_get_table(int id)
{
	if (id != RT_TABLE_LOCAL)
		return ip_fib_main_table;
	return ip_fib_local_table;
}
...
#else
static inline struct fib_table *fib_get_table(int id)
{
	if (id == 0)
		id = RT_TABLE_MAIN;
	return fib_tables[id];
}
...
#endif

Note how this case differs from the previous one: there, the function body lay outside the #ifdef/#endif blocks, while here each block contains a complete function definition.

Conditional compilation is also used for the definition and initialization of variables and macros.
It is important to realize that several definitions may exist for the same function or macro, with the preprocessor selecting one at compile time, just as in the previous example. Otherwise, when you look at a function, variable, or macro definition, you might be looking at the wrong one.

Compile-Time Optimization for Condition Checks

Often, when the kernel compares a variable against some external value to see whether a given condition is met, the result is almost always the same; this is very common, for example, in code that enforces sanity checks. The kernel uses the likely and unlikely macros, respectively, to wrap comparisons that are likely to return true (1) or false (0). These macros take advantage of a gcc feature that can optimize the compiled code based on that information.

Here is an example. Suppose you need to call the do_something function; when the call fails, you must handle the error with the handle_error function:

err = do_something(x,y,z);
if(err)
	handle_error(err);

Assuming that do_something rarely fails, the code can be rewritten as follows:

err = do_something(x,y,z);
if(unlikely(err))
	handle_error(err);

One example of what the likely and unlikely macros can optimize is the handling of options in the IP header. Since IP options are used only in specific situations, the kernel can safely assume that most IP packets do not carry them. When the kernel forwards an IP packet, its options must be processed according to certain rules. The final stage of forwarding is handled by ip_forward_finish, which uses the unlikely macro to wrap the condition that checks whether there are IP options to process.
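For reference, the two macros are thin wrappers around gcc's __builtin_expect, which tells the compiler which outcome to lay out on the fast path. The definitions below are equivalent to the kernel's; do_work and handle_error are made up for the usage sketch.

```c
/* likely/unlikely as thin wrappers around gcc's __builtin_expect.
 * The hint changes code layout, never semantics: the branch behaves
 * identically with or without it. do_work/handle_error are invented. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int handled;   /* records the last error handled, for the sketch */

static void handle_error(int err)
{
    handled = err;
}

static int do_work(int err)
{
    if (unlikely(err))    /* failure is expected to be rare */
        handle_error(err);
    return err;
}
```

The !!(x) normalizes any nonzero value to 1 so that the expectation passed to __builtin_expect matches what the condition actually evaluates to.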

Mutual Exclusion

Locking is used extensively in networking code, and you are likely to meet it under every topic in this book. Mutual exclusion, locking mechanisms, and synchronization are common subjects in many kinds of programming, especially kernel programming, and they are both interesting and complex. Over the years Linux has not only introduced several mutual exclusion mechanisms but also optimized them. This section therefore only summarizes the locking mechanisms seen in networking code; for details, refer to Understanding the Linux Kernel and Linux Device Drivers.

Each mutual exclusion mechanism is the best choice under particular circumstances. Here is a brief summary of the mutual exclusion methods you will see most often in networking code:
Spin locks
A spin lock can be held by only one thread of execution at a time. A thread that tries to acquire a lock held by someone else loops ("spins") until the lock is released. Because spinning wastes CPU time, spin locks are used only on multiprocessor systems, and generally only when the developer expects the lock to be held for a short time. A thread also must not sleep while holding a spin lock, since other threads wanting the lock would spin for the whole duration of the sleep.

Read-write spin locks
When the uses of a given lock can be clearly divided into read-only and read-write, a read-write spin lock is preferred. It differs from a plain spin lock in that multiple readers can hold the lock at the same time, but only one writer can hold it, and while a writer holds the lock no reader can acquire it. Because readers are given priority over writers, this kind of lock performs well when the number of readers (or of read-only lock acquisitions) greatly exceeds the number of writers (or of read-write acquisitions).
A lock acquired in read-only mode cannot be upgraded to read-write mode directly: the lock must be released and then re-acquired in read-write mode.

Read-Copy-Update (RCU)
RCU is the newest mutual exclusion mechanism in Linux, and it performs very well when the following specific conditions hold:

  • Read-write lock requests are rare compared to read-only lock requests.
  • The code that holds the lock runs atomically and does not sleep.
  • The data structures protected by the lock are accessed through pointers.

The first condition concerns performance; the other two are fundamental to how RCU works. Note that the first condition might suggest using a read-write spin lock instead of RCU. To understand why RCU can perform better than read-write spin locks where it applies, you have to consider other factors, such as the effect of processor caching in SMP (symmetric multiprocessing) systems.

One example of RCU use in networking code is the routing subsystem: lookups are much more frequent than updates in the route cache, and the functions that implement route cache lookups do not block during the search.

The kernel also provides semaphores, but they are rarely used in the networking code covered in this book. One example, however, is the code that serializes configuration changes.

Byte Order Conversions Between Host and Network

Data items larger than one byte can be stored in memory in two different formats:
Little-endian: the low-order byte is stored at the lower address.
Big-endian: the high-order byte is stored at the lower address.
Which format an operating system such as Linux uses depends on the processor. For example, Intel processors follow the little-endian model, while Motorola processors use the big-endian model.

Suppose our Linux machine receives an IP packet from a remote host. Since Linux does not know whether the remote host used little-endian or big-endian format to fill in the protocol headers, how does it read them? For this reason, each protocol family must define the byte order it uses. The TCP/IP protocol stack, for example, uses big-endian.

This still leaves kernel developers with a problem: the code they write must run on a wide variety of processors with different byte orders. Some processors may match the byte order of the incoming packet, but those that do not must convert it.

Therefore, every time the kernel reads, stores, or compares a multi-byte field of an IP header, it must first convert it from network byte order to host byte order, or vice versa. The same applies to the other protocols of the TCP/IP stack. When both the protocol and the local host are big-endian, the conversion functions are no-ops, since no conversion is needed. The conversions still appear in the code, however, to keep it portable; only the conversion functions themselves are platform dependent.
The book lists the main macros for two-byte and four-byte conversions in a set of tables (not reproduced here); the best-known are htons and htonl, which convert from host to network byte order, and ntohs and ntohl, which convert back.
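The best-known of these conversion macros can be exercised directly from user space, since libc exports them in <arpa/inet.h>. In this small sketch, pack_port and unpack_port are made-up helpers for a TCP-style 16-bit port field.

```c
/* Exercising the classic byte-order conversions from user space, where
 * libc exports them in <arpa/inet.h>. pack_port and unpack_port are
 * made-up helpers for a TCP-style 16-bit port field; the kernel's own
 * names include cpu_to_be16/be16_to_cpu. */
#include <arpa/inet.h>
#include <stdint.h>

static uint16_t pack_port(uint16_t host_port)
{
    return htons(host_port);   /* host byte order -> network (big-endian) */
}

static uint16_t unpack_port(uint16_t net_port)
{
    return ntohs(net_port);    /* network byte order -> host byte order */
}
```

On a big-endian host both calls compile to no-ops, exactly as described above; on a little-endian host they swap the two bytes.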

Catching Bugs

Some functions are assumed to be called only under certain conditions, or never under certain conditions. The kernel uses the BUG_ON and BUG_TRAP macros to catch places where such assumptions are violated. When the condition passed to BUG_TRAP is false, the kernel prints a warning message. BUG_ON is stricter: it prints an error message and then panics.
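To show the intent without panicking anything, here is a user-space imitation of BUG_TRAP that only counts and logs violations. It is purely illustrative; the real macro is defined in the kernel (and was eventually replaced by WARN_ON in later kernels).

```c
/* User-space imitation of what BUG_TRAP expresses: the condition is
 * expected to be true, and a warning is logged when it is not. Purely
 * illustrative; the real macro lives in the kernel. */
#include <stdio.h>

static int warnings;   /* stands in for the kernel log, for illustration */

#define BUG_TRAP(cond)                                             \
    do {                                                           \
        if (!(cond)) {    /* expected-true condition failed */     \
            warnings++;                                            \
            fprintf(stderr, "KERNEL: assertion (%s) failed\n",     \
                    #cond);                                        \
        }                                                          \
    } while (0)
```

A BUG_ON variant would call abort() (the kernel panics) instead of merely counting, which is why it is reserved for conditions the kernel cannot survive.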

Statistics

It is good practice to collect statistics about when certain conditions occur, such as the number of cache lookup successes and failures, or of memory allocation successes and failures. For each networking feature that collects statistics, this book lists the counters and describes each one.

Measuring Time

The kernel often needs to know how much time has passed since a given moment. For example, a function doing CPU-intensive work often releases the CPU after a certain amount of time and continues its job when it is rescheduled. This is important in kernel code even when the kernel supports preemption. A common example in networking code is functions that implement garbage collection; you will see many of them throughout this book.

Time in kernel space is measured in ticks. A tick is the interval between two consecutive timer interrupts. The timer takes care of various tasks (which we are not interested in here) and fires HZ times per second. HZ is a variable initialized by architecture-dependent code; for example, it is initialized to 1000 on i386 machines. That is, when Linux runs on an i386 system, the timer interrupt fires 1,000 times per second, and the interval between two consecutive interrupts is one millisecond.

Each time the timer fires, a global variable called jiffies is incremented. At any moment, jiffies therefore holds the number of ticks since the system booted, and a value of n*HZ corresponds to n seconds.

If all a function needs is to measure elapsed time, it can save the value of jiffies in a local variable, and later compare it with the current jiffies value to obtain the interval (expressed in ticks) since the measurement began.

The example below shows a function that has work to do, but does not want to hold the CPU for more than one tick. When do_something completes the job, it sets job_done to a nonzero value, and the function can return:

extern unsigned long jiffies;

unsigned long start_time = jiffies;
int job_done = 0;
do {
	do_something(&job_done);
	if (job_done)
		return;
} while (jiffies - start_time < 1);

For examples of kernel code using jiffies, see the section "Backlog Processing: the process_backlog Poll Virtual Function" in Chapter 10 and the section "Asynchronous cleanup: the neigh_periodic_timer function" in Chapter 27.

User-Space Tools

Many different tools can be used to configure the networking features available in Linux. As mentioned at the beginning of this chapter, you can use these tools to tune the kernel for learning purposes and to observe the effects of your changes.
The tools that will come up most often in this book are:
iputils
In addition to the widely used ping command, iputils includes arping (used to generate ARP requests), the network router discovery daemon rdisc, and more.

net-tools
This is a suite of networking tools, the best known of which are ifconfig, route, netstat, and arp; it also includes ipmaddr, iptunnel, ether-wake, netplugd, and others.

IPROUTE2
This is the new-generation network configuration suite (although it has been around for years). Through an omnibus command, ip, the suite can configure IP addresses and routing along with various other advanced features, such as the neighboring protocols.

The source code of IPROUTE2 can be downloaded from its official website, and the other packages can be downloaded from the download servers of most Linux distributions.

These components are included by default in most Linux distributions. Whenever you don't understand how the kernel code handles commands from user space, I encourage you to take a look at the source code of that user space to understand how the commands issued by the user are packaged and passed to the kernel.
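As a concrete illustration of what "packaged and passed to the kernel" means: modern tools like ip talk to the networking code over an AF_NETLINK (rtnetlink) socket. The sketch below, assuming a Linux host, sends an RTM_GETLINK dump request and counts the interface messages the kernel sends back; it follows the standard rtnetlink API rather than any specific tool's source code, and the function name is made up for this example.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Sketch of how a tool like "ip link show" hands a request to the kernel:
 * it packs an RTM_GETLINK dump request into a netlink message, sends it
 * over an AF_NETLINK socket, and counts the RTM_NEWLINK replies.
 * Linux-only; returns the number of links seen, or -1 on error. */
int count_links(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    struct {
        struct nlmsghdr  nlh;
        struct ifinfomsg ifi;
    } req;
    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req.nlh.nlmsg_type  = RTM_GETLINK;               /* "list all links" */
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.ifi.ifi_family  = AF_UNSPEC;

    if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0) {
        close(fd);
        return -1;
    }

    int links = 0, done = 0;
    while (!done) {
        char buf[8192];
        int len = recv(fd, buf, sizeof(buf), 0);
        if (len <= 0)
            break;
        /* walk the batch of netlink messages in this receive buffer */
        for (struct nlmsghdr *nh = (struct nlmsghdr *)buf;
             NLMSG_OK(nh, len);
             nh = NLMSG_NEXT(nh, len)) {
            if (nh->nlmsg_type == NLMSG_DONE) { done = 1; break; }
            if (nh->nlmsg_type == RTM_NEWLINK)
                links++;
        }
    }
    close(fd);
    return links;
}
```

Running this on any Linux host should report at least one link (the loopback device). Tracing ip with strace shows essentially the same sequence of socket, send, and recv calls.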

At the URLs below, you can find good documentation on how to use the above tools, including pointers to the relevant mailing lists:

  • Linux Advanced Routing & Traffic Control
  • Policy Routing
  • Netfilter

If you want to keep track of the latest changes to the networking code, follow this mailing list:
The Linux network development (netdev) mailing list archives

browse code

The Linux kernel has grown so large that browsing the code with our old friend grep is definitely no longer a good idea. Nowadays, you can rely on a variety of software packages to make your journey through the kernel code easier.

If you do not yet know cscope, I recommend downloading it and giving it a try.
It is a simple yet powerful tool that can, for example, find where a function or variable is defined, where it is called, and more. Installing it is also very simple, and you can find all the necessary instructions on its website.

dead code

The kernel, like any other large and dynamically evolving piece of software, includes pieces of code that are no longer called. Unfortunately, comments in the code rarely tell you this. Sometimes the reason you cannot make sense of how a given function is used, or how a given variable is initialized, is simply that you are looking at dead code. If you are lucky, the code does not compile, which tips you off that it is out of date. Other times, you will not be so lucky.

Each kernel subsystem is supposed to have one or more assigned maintainers. However, some maintainers have so much code to look after that they simply do not have enough free time to do it. At other times, they have lost interest in maintaining their subsystems but cannot find anyone to take over. It is worth keeping this in mind when you run into code that seems to do something strange or that does not follow common-sense programming rules.

Throughout this book, I try to point out unused functions, variables, and data structures wherever it seems helpful. Code may be unused because it was left behind when a feature was removed, or because a new feature was introduced but its implementation was never finished.

When functionality is provided as a patch

The kernel networking code is constantly evolving. Not only are new features integrated, but existing components are sometimes redesigned for greater modularity and higher performance. Obviously, this helps make Linux attractive as an embedded operating system for networking devices (routers, switches, firewalls, load balancers, and so on).

Anyone can develop a new feature for the Linux kernel, or extend or reimplement an existing one. For any open source developer, seeing their work become part of an official kernel release is one of the most gratifying moments. However, sometimes a project with valuable, well-implemented features cannot be merged, or takes a long time to be merged. Common reasons include the following:

  • The code is not written according to the principles of Documentation/CodingStyle.
  • Another significant project has provided the same functionality for some time and has earned the approval of the Linux community and of the important kernel developers who maintain the affected kernel area.
  • There is too much overlap with another kernel component. In such a case, the best approach is to remove the redundant functionality, use existing features where possible, or extend existing components so they can be used in the new context. This situation underscores the importance of modularity.
  • The size of the project and the amount of work required to maintain it within a rapidly changing kernel may lead the project's developers to keep it as a separate patch and release new versions only occasionally.
  • The feature would be used only in specific scenarios and is considered unnecessary in a general-purpose operating system. In this case, a standalone patch is usually the best solution.
  • The overall design may not satisfy some senior kernel developers. These experts usually have the big picture in mind, both of the current state of the kernel and of where it is headed. They often request design changes to bring the feature in line with the needs of the kernel.

Sometimes the overlap between features is difficult to remove entirely, perhaps because a component is so flexible that its different uses become apparent only later. For example, the firewall has hooks in several places in the network protocol stack. This makes it unnecessary for other features to implement their own mechanisms for filtering or marking the direction of packets: the simple answer is to rely on the firewall. Of course this creates dependencies (for example, if the routing subsystem wants to classify traffic according to certain criteria, the kernel must include firewall support), and the firewall maintainers must be willing to accept reasonable enhancement requests from other kernel features. However, the trade-off is often beneficial: less redundant code means fewer bugs, easier code maintenance, simplified code paths, and so on.

The most recent example of this kind of overlap cleanup is the removal, in version 2.6 of the kernel, of the stateless NAT (Network Address Translation) support in the routing code. The developers realized that the stateful NAT support in the firewall is more flexible, so there was no longer any value in maintaining a separate stateless NAT implementation (even though stateless NAT is faster and uses less memory). Note that if stateless NAT support ever becomes necessary again, a new Netfilter module can be written to provide it.

Origin blog.csdn.net/m0_56145255/article/details/128233086