C++ classic book reading notes

Table of contents

Essential C++

C++ Primer

basic knowledge

Class Designer's Tools

Effective C++

Linux high-performance server programming

TCP/IP protocol family and various important network protocols

server programming

Optimize and monitor server performance

Write Makefile with me

Advanced Programming in UNIX Environment

Basic C++ knowledge is omitted; these notes are updated continuously

Essential C++

STL (Standard Template Library): containers such as vector, list, set, and map, together with generic algorithms that operate on them, such as find(), sort(), replace(), and merge()
iterator: a generalized pointer used to traverse a container
inline functions should be defined in the header file
class constructor and destructor
A copy constructor must be written in some cases, for example when the class holds a pointer to a heap array. The default memberwise copy makes both objects point to the same array, so once one object releases the memory, the other is left with a dangling pointer.
A class can name other functions or classes as friends; friends have the same access rights as member functions and can access the class's private members

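// declared inside class Triangular_iterator
friend int operator*(const Triangular_iterator &rhs)
{
    // function body
}

class Triangular{
friend class Triangular_iterator;
// Triangular_iterator becomes a friend of Triangular and can access Triangular's private members
// if Triangular offers public member functions that expose the same data, the friendship is not needed
};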

OOP: object-oriented programming
Object-based programming cannot express relationships between classes, which is why OOP was introduced
The two most important characteristics of object-oriented programming: inheritance and polymorphism
Dynamic binding is the third distinctive concept of the object-oriented style: the compiler cannot determine in advance which derived-class function a call will reach, so resolving the call is delayed until run time
Polymorphism and dynamic binding only come into play when using pointers or references
By default, member function calls are resolved statically at compile time. To have a call resolved dynamically at run time, add the keyword virtual before the member function declaration

class Book:public Libmat{
// Book inherits from Libmat
};

Members declared protected can be accessed directly by derived classes but not by any other code. When a derived-class object is created, both the base-class constructor and the derived-class constructor are executed.
A static member function cannot be declared as a virtual function
If an operation is not needed outside the base class, declare it private. Derived classes cannot access the private members of the base class either
A derived class must provide an implementation for each pure virtual function of the base class
Whenever a member of a derived class has the same name as a member of the base class, the base-class member is hidden
A reference can never refer to a null object, whereas a pointer may be null, so a reference does not need to be checked for null before use
Constructor call order: base class -> derived class. Destructor call order: derived class -> base class
If a derived class inherits the pure virtual functions of the base class without overriding them, the derived class is also an abstract class and no objects of it can be defined. To override a virtual function of the base class, the derived class must provide a new definition whose prototype exactly matches the one declared in the base class, including the parameter list, return type, and constness. There is one exception: when the base-class virtual function returns a pointer or reference to a base class, the derived-class function of the same name may return a pointer or reference to a type derived from it.
In C++, object-oriented programming concepts can only be supported with base class pointers or references
static_cast performs a conversion at compile time with no run-time check; dynamic_cast checks the actual type of the object at run time during the conversion
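
The notes above on virtual functions, hiding, and dynamic_cast can be illustrated with a minimal sketch. The class names reuse Libmat and Book from these notes; the member functions kind(), label(), describe() and is_book() are hypothetical names chosen only for this example.

```cpp
#include <cassert>
#include <string>

class Libmat {
public:
    virtual ~Libmat() = default;                            // virtual destructor for use through base pointers
    virtual std::string kind() const { return "Libmat"; }   // virtual: resolved at run time
    std::string label() const { return "Libmat"; }          // non-virtual: resolved at compile time
};

class Book : public Libmat {
public:
    std::string kind() const override { return "Book"; }    // overrides the base version
    std::string label() const { return "Book"; }            // only hides the base version
};

// The parameter is a base-class reference, so kind() is bound dynamically
// while label() is bound statically to Libmat::label.
std::string describe(const Libmat& item) {
    return item.kind() + "/" + item.label();
}

// dynamic_cast inspects the real type at run time and yields nullptr on a mismatch.
bool is_book(const Libmat* item) {
    return dynamic_cast<const Book*>(item) != nullptr;
}
```

Passing a Book to describe() selects Book::kind() but Libmat::label(), which shows why only virtual functions take part in dynamic binding.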

Programming with templates

A template abstracts away the actual type, which the user specifies at the point of use. Put the template header before the class definition:

template <typename valType>
// every place below that names the element type can simply use valType
class BTnode{
public:
	//...
private:
	valType _val;// for example
};
// bind the template to concrete types
BTnode< int >bti;
BTnode< string >bts;

exception handling mechanism

throw raises an exception and catch handles it. When a catch clause cannot fully handle the exception it caught, it may need to re-throw it with a bare throw;

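try
    {
        //
    }
catch(xx)
    {
    // if code in the try block throws, the matching catch clause handles the exception
    }// if no matching catch clause is found, the call chain keeps unwinding; if unwinding reaches main and there is still none, C++ calls
// the standard library function terminate(), whose default behavior is to abort the whole program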

Before the exception handling mechanism terminates a function, C++ guarantees that the destructors of all local objects in the function are called. For this reason, acquire resources in a constructor and release them in the destructor
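
The guarantee above can be observed with a small sketch. Guard, risky() and run() are hypothetical names used only for illustration; Guard records its construction and destruction in a log so that the order can be inspected afterwards.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Acquires in the constructor and releases in the destructor.
class Guard {
public:
    Guard(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name)) {
        log_.push_back("acquire " + name_);
    }
    ~Guard() { log_.push_back("release " + name_); }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
private:
    std::vector<std::string>& log_;
    std::string name_;
};

void risky(std::vector<std::string>& log) {
    Guard a(log, "a");
    Guard b(log, "b");
    throw std::runtime_error("failure");   // b, then a, are destroyed during unwinding
}

std::vector<std::string> run() {
    std::vector<std::string> log;
    try {
        risky(log);
    } catch (const std::runtime_error& e) {
        log.push_back(std::string("caught ") + e.what());
    }
    return log;
}
```

The log returned by run() shows both releases happening before the handler body runs, in reverse order of construction.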

C++ Primer

basic knowledge

Integer literals that start with 0 are octal, e.g. 024
If a variable of a built-in type is not explicitly initialized, its value depends on where it is defined. Variables defined outside any function body are initialized to 0; those defined inside a function are left uninitialized
By convention, user-defined class names start with a capital letter
A reference must be initialized; it is just an alias for an object. A reference is not an object
void* is a special pointer type that can be used to store the address of any object. We don't know what type of object is in this address. Therefore, void* has limited usefulness: compare it with other pointers, use it as input or output of a function, or assign it to another void* pointer. You cannot directly manipulate the object pointed to by the void* pointer, because you don't know what type the object is
* binds to the variable name, not to the type. In int *p1, p2; p1 is a pointer to int and p2 is a plain int
By default, a const variable is local to the file in which it is defined. When const variables with the same name appear in several files, each file defines its own independent variable. To share one const variable across files, declare and define it with the extern keyword so that it is defined only once
A reference to const can bind to a non-const variable, but the referenced object cannot be modified through that reference
Similarly, a pointer to const can point to a non-const variable, but the pointed-to object cannot be modified through that pointer

Pointers and references to const merely "think" they point or refer to a constant, so they refuse to change the object themselves; the object may still be modified by other means

int a=0;
int *const pa=&a;// pa will always point to a
const int b=2;
const int *const pb=&b;// pb is a const pointer to a const object

The const to the right of * (top-level const) ensures that the pointer always points to the same location; the const to the left of * (low-level const) ensures that the pointer is not used to change the pointed-to object
Variables defined inside a function body are generally not stored at a fixed address, whereas objects defined outside any function have fixed addresses

using db = double;
db x = 10.2;// a type alias, equivalent to typedef double db;

end() returns an iterator to the position one past the last element of the container, that is, to an element that does not exist.
In any loop that uses iterators, do not add elements to the container the iterators belong to
You cannot use one array to initialize another array, nor can you directly assign an array to another array
A vector can be initialized with an array, but an array cannot be initialized with a vector

int a[]={0,1,2,3,4};
vector<int> b(begin(a),end(a));// initialize a vector from an array

Strictly speaking, there is no multi-dimensional array in the C++ language. What is usually called a multi-dimensional array is actually an array of arrays.

If the number of arguments is unknown but all arguments have the same type, the function can take an initializer_list parameter, which represents an array of values of that type
If a name is declared in an inner scope, it will hide entities of the same name declared in outer scopes
inline function: the call is expanded in place at each call site. Inlining is meant for small, straight-line, frequently called functions
A constexpr function is a function that can be used in constant expressions: when called with constant arguments, its result is a compile-time constant
Unlike other functions, inline and constexpr functions may be defined multiple times in a program, because the compiler needs the definition, not just the declaration, to expand a call. All definitions must match, so these functions are normally defined in header files
assert is a preprocessor macro: a preprocessor macro is a preprocessor variable that behaves somewhat like an inline function

assert(expr);
// evaluates expr first; if the expression is false, assert prints a message and terminates the program
// if the expression is true, assert does nothing
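
As a minimal sketch of the initializer_list and constexpr notes earlier in this section: join() and square() are hypothetical helpers written for this example, not functions from the book.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Accepts any number of strings, all of the same type, as one parameter.
std::string join(std::initializer_list<std::string> parts, const std::string& sep) {
    std::string out;
    for (auto it = parts.begin(); it != parts.end(); ++it) {
        if (it != parts.begin()) out += sep;
        out += *it;                       // elements of an initializer_list are const
    }
    return out;
}

// A constexpr function can be evaluated at compile time when its argument is constant.
constexpr int square(int x) { return x * x; }
static_assert(square(4) == 16, "evaluated at compile time");
```

A call looks like join({"a", "b", "c"}, ", "), with the variable-length argument list written inside braces.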

The compiler provides default copy, assignment, and destruction operations, but in some cases these defaults do not work correctly and must be customized, especially when the class manages resources that live outside the object itself, such as dynamically allocated memory
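
A minimal sketch of a class that has to customize these operations because it owns heap memory. IntBuffer is a hypothetical class invented for this example; the assignment operator uses the copy-and-swap technique, which is one common choice rather than the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Copy constructor: allocate a separate array and copy the elements.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Copy assignment via copy-and-swap; the by-value parameter is the copy.
    IntBuffer& operator=(IntBuffer other) {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
        return *this;
    }

    ~IntBuffer() { delete[] data_; }

    int& operator[](std::size_t i) { return data_[i]; }
    const int& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
};
```

With the compiler-generated versions, two IntBuffer objects would share one array and both destructors would delete it; here each copy owns its own storage.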

// constructor
person(int age1,int grade1):age(age1),grade(grade1)
{
}
// once we define any constructor, the compiler no longer generates a default constructor; to get one back, use = default
person()=default;

The default access level of a class is private, and that of a struct is public. The only difference between defining a class with class and with struct is the default access level
A class can allow other classes or functions to access its non-public members by making them friends
A mutable data member is never const, even if it is a member of a const object. A const member function can change the value of a mutable member
Friend functions can be defined inside a class, such functions are implicitly inlined

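class A{
friend class B;
};// B is declared a friend of A, so B can access A's private members
// a friend function may be defined inside the class, but it must also be declared outside the class; otherwise a direct call is treated as a call to an undeclared function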

If members are const, reference, or belong to a class type that does not provide a default constructor, we must provide initial values for these members through the constructor initializer list
For a constructor that takes no arguments, omit the parentheses when defining an object (person p; rather than person p();), otherwise the statement declares a function
Aggregate classes allow users to directly access their members and have a special initialization syntax. A class is said to be aggregated when it satisfies the following conditions: all members are public, no constructors are defined, no in-class initializers, no base classes, and no virtual functions. You can provide a member initialization list enclosed in curly braces and use it to initialize the data members of the aggregate class. The order of the initial values must be consistent with the order of declaration
Static members do not belong to an object of the class, but they can still be accessed using objects, references or pointers of the class, and member functions can use static members directly without using scope operators
When defining a static member outside the class, the static keyword cannot be repeated. This keyword only appears in the declaration statement inside the class
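In general, every static data member must be defined and initialized outside the class; the exceptions are const static members of integral type and constexpr static members, which may be given an in-class initializer
endl outputs a newline and flushes the buffer; ends inserts a null character and flushes the buffer; flush flushes the buffer without adding any characters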
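
The static member rules above can be sketched as follows. Session and its members are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Counts how many Session objects are currently alive.
class Session {
public:
    Session() { ++live_; }
    Session(const Session&) { ++live_; }
    ~Session() { --live_; }
    static int live() { return live_; }   // static member function: no this pointer
private:
    static int live_;                      // declaration; the static keyword appears only here
};

int Session::live_ = 0;                    // definition outside the class, without static
```

The counter can be read either through the class, as Session::live(), or through any object, as s.live().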

The design goal of forward_list is to achieve performance comparable to the best handwritten singly linked list data structure, so forward_list has no size operation
Every container type supports equality operators (== and !=); all containers except unordered associative containers support relational operators (>, >=, <, <=). The operands on the left and right sides of a relational operator must be containers of the same type and must hold elements of the same type
When an object is used to initialize a container, or when an object is inserted into a container, what is actually put into the container is a copy of the object, not the object itself
Calling front and back on an empty container is like using an out-of-bounds subscript, which is a serious programming error
erase() removes the element at the given position and returns an iterator to the element after the removed one
There is no need to increment the iterator after calling erase, because the returned iterator already denotes the next element in the sequence. insert places the new element before the given position and returns an iterator to the new element, so after calling insert the iterator must be incremented twice: once to step over the new element and once to move past the element that was being processed.
After adding or removing elements in a vector or string, or at any position other than the front of a deque, the iterator previously returned by end is always invalidated. A loop that adds or removes elements must therefore call end on every iteration instead of saving the result before the loop.
The capacity() operation tells us how many elements the container can hold without expanding the memory space, and the reserve() operation allows us to inform the container how many elements it should be prepared to hold
string's find("str") performs the simplest search and returns the index of the first occurrence of str in the string, or string::npos if there is no match
Adapter (adaptor) is a general concept in the standard library. An adapter is a mechanism that can make something behave like another thing. An adapter takes an existing container type and makes it behave like a different type
Not every container can be "transformed" into another container using an adapter
Generic algorithms themselves do not perform container operations, they only run on iterators and perform iterator operations. The fact that generic algorithms operate on iterators and do not perform container operations introduces a surprising but necessary programming premise: the algorithm never changes the size of the underlying container. Algorithms may change the value of elements stored in the container, and may also move elements within the container, but never add or remove elements directly
Some algorithms read elements from both sequences. The elements that make up the two sequences can come from different types of containers
To eliminate duplicate words, first sort the vector so that duplicates become adjacent, then call unique() to rearrange it so that the distinct elements come first; unique() returns an iterator one past the last distinct element, and the leftover tail can then be removed with the container's erase()
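
The sort-then-unique recipe can be sketched as follows. The function name elimDups follows the C++ Primer example; the final erase() call is what actually shrinks the vector, since unique() only rearranges elements.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

void elimDups(std::vector<std::string>& words) {
    std::sort(words.begin(), words.end());                      // duplicates become adjacent
    auto end_unique = std::unique(words.begin(), words.end());  // distinct elements moved to the front
    words.erase(end_unique, words.end());                       // container operation removes the tail
}
```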
A lambda expression represents a callable unit of code. It can be understood as an unnamed inline function

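[capture list] (parameter list)->return type{function body}
    // the general form of a lambda expression
    // capture list: a list of local variables defined in the enclosing function; it is often empty
    // return type, parameter list and function body mean the same as in an ordinary function
    // unlike an ordinary function, a lambda must use a trailing return type
    // a lambda can use a local variable of the enclosing function in its body only if it captures that variable
	// lambdas are typically passed as arguments to functions such as the generic algorithms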
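
A minimal sketch of the lambda form above, passed to a generic algorithm. count_long_words() and captured_at_creation() are hypothetical helpers written for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts the words whose length is at least min_len.
std::size_t count_long_words(const std::vector<std::string>& words, std::size_t min_len) {
    // min_len is local to the enclosing function, so the lambda has to capture it.
    auto n = std::count_if(words.begin(), words.end(),
                           [min_len](const std::string& w) -> bool { return w.size() >= min_len; });
    return static_cast<std::size_t>(n);
}

// A value capture is copied when the lambda is created, not when it is called.
int captured_at_creation() {
    int v = 42;
    auto f = [v] { return v; };
    v = 0;                        // does not affect the copy held by f
    return f();
}
```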

A lambda can capture variables by value or by reference. As with pass-by-value parameters, capturing by value requires that the variable can be copied. Unlike a parameter, a captured variable is copied when the lambda is created, not when it is called, so later changes to the original variable do not affect the copy inside the lambda
[=] is an implicit capture list that captures by value; [&] is an implicit capture list that captures by reference
By default, a lambda may not change the value of a variable it captured by value. To change it inside the lambda, add the keyword mutable after the parameter list
Most linked-list-specific algorithms are similar to, but not identical to, their general-purpose counterparts. A crucial difference between the linked list-specific version and the general version is that the linked list version changes the underlying container
Predicate: a callable expression whose return value can be used as a condition (converted to bool). Generic algorithms often use predicates to test elements. The predicates used by the standard library are either unary (one parameter) or binary (two parameters)
Associative container: The elements in the container are stored and accessed according to the key, while the elements in the sequential container are stored and accessed in order according to their position in the container
Two main associative containers: map, set
The names of containers that allow duplicate keys all contain the word multi, and the names of containers that do not keep their keys in order all start with the word unordered
map, set, multimap, multiset, unordered_map, unordered_set, unordered_multimap, unordered_multiset

map<string,size_t>word_count;
string word;
set<string>exclude={"the","but","and","or","an"};
while(cin>>word)
    if(exclude.find(word)==exclude.end())
        ++word_count[word];
for(const auto &w:word_count)
    cout<<w.first<<" occurs "<<w.second<<endl;
// find returns an iterator: if the key is in the set it points to that key, otherwise find returns the off-the-end iterator
map<string,string>authors={
   {"joyce","james"},{"austen","jane"},{"dickens","charles"}};

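To use a custom comparison operation, supply its type inside the angle brackets, after the key type
The iterators of a set are const: elements can be read through them but not modified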
The insert and emplace versions that add a single element return a pair that tells us whether the insert was successful. The first member of the pair is an iterator pointing to the element with the given key, and the second member is a bool value indicating whether the element was inserted successfully or already exists in the container

map<string,size_t>word_count;
string word;
while(cin>>word)
    {
        auto ret=word_count.insert({word,1});
        if(!ret.second)
            ++ret.first->second;
    }
// counts how many times each word occurs

c.erase(k): Delete each element whose key is k from c. Returns a size_type value indicating the number of deleted elements

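c[k];
// returns the element with key k; if k is not in c, adds an element with key k and value-initializes it
c.at(k);
// checked access to the element with key k; throws out_of_range if k is not in c
c.lower_bound(k);
// returns an iterator to the first element whose key is not less than k
c.upper_bound(k);
// returns an iterator to the first element whose key is greater than k
c.equal_range(k);
// returns a pair of iterators denoting the range of elements whose key equals k; if k is absent, both members are equal (c.end() for unordered containers, the insertion point for k in ordered ones)
// lower_bound() and upper_bound() are not available for unordered containers
// subscripting and at() apply only to map and unordered_map; operator[] additionally requires a non-const container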
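
The lookup operations listed above can be tried out with a short sketch. titles_by() and contains() are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Collects every title stored under one author in a multimap.
std::vector<std::string> titles_by(const std::multimap<std::string, std::string>& books,
                                   const std::string& author) {
    std::vector<std::string> out;
    auto range = books.equal_range(author);   // same as [lower_bound(author), upper_bound(author))
    for (auto it = range.first; it != range.second; ++it)
        out.push_back(it->second);
    return out;
}

// Checks for a key without inserting it, unlike operator[].
bool contains(const std::map<std::string, int>& m, const std::string& key) {
    return m.find(key) != m.end();
}
```

Calling contains() leaves the map unchanged, whereas evaluating m[key] for a missing key adds a value-initialized element.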

When subscripting a map, if the key does not exist, a new element with that key is inserted and its value is value-initialized (0 for arithmetic types). To only check whether a key is in the map, use find()
Unordered containers are organized in storage as a collection of buckets
Static memory holds local static objects, class static data members, and variables defined outside any function
Stack memory is used to save non-static objects defined in functions
Objects allocated in static or stack memory are created and destroyed automatically by the compiler. A stack object exists only while the block in which it is defined is executing. Static objects are allocated before use and destroyed at the end of the program
In addition to static and stack memory, every program has a pool of memory called the free store or heap. Programs use the heap for dynamically allocated objects, that is, objects allocated while the program is running. The lifetime of a dynamic object is controlled by the program: our code must explicitly destroy dynamic objects when they are no longer needed
A smart pointer behaves like a regular pointer, with the important difference that it is responsible for automatically releasing the pointed-to object. shared_ptr allows multiple pointers to point to the same object, and unique_ptr exclusively points to the object

shared_ptr<string>p1;
// can point to a string
shared_ptr<list<int>>p2;
// can point to a list of ints

When the last shared_ptr pointing to an object is destroyed, the shared_ptr class automatically destroys the object. It does so through another special member function, the object's destructor
It is legal to allocate const objects with new
The compiler cannot tell whether a pointer points to a statically or dynamically allocated object. Similarly, the compiler cannot tell whether the memory pointed to by a pointer has been freed
unique_ptr does not support ordinary copy or assignment. The exception is that a unique_ptr that is about to be destroyed, such as one returned from a function, can be copied or assigned
weak_ptr is a smart pointer that does not control the lifetime of the object it points to, it points to an object managed by a shared_ptr. Binding a weak_ptr to a shared_ptr does not change the reference count of the shared_ptr
When creating a weak_ptr, initialize it with a shared_ptr
The object a weak_ptr points to may no longer exist, so the object cannot be accessed directly through the weak_ptr; call lock() to obtain a shared_ptr first
It is legal to dynamically allocate an empty array
unique_ptr supports managing dynamic arrays directly; in C++11 shared_ptr does not (a custom deleter is needed), although shared_ptr<T[]> is supported since C++17
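
The smart pointer notes above can be illustrated with a minimal sketch. peek() and make_value() are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Reads the string behind a weak_ptr, or reports that the object is gone.
std::string peek(const std::weak_ptr<std::string>& wp) {
    if (auto sp = wp.lock())      // lock() returns a shared_ptr, empty if the object was destroyed
        return *sp;
    return "expired";
}

// A unique_ptr cannot be copied, but one about to be destroyed can be returned (moved).
std::unique_ptr<int> make_value(int v) {
    return std::unique_ptr<int>(new int(v));
}
```

Binding a weak_ptr to a shared_ptr leaves use_count() unchanged, so the object is destroyed as soon as the last shared_ptr goes away.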
Like vector, allocator is a template

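allocator<string>alloc;
auto const p=alloc.allocate(n);
// allocate reserves raw memory for n strings
// memory obtained from an allocator is unconstructed; objects must be constructed in it
auto q=p;// q points one past the last constructed element
alloc.construct(q++);// *q is the empty string
alloc.construct(q++,10,'c');// *q is cccccccccc
alloc.construct(q++,"hi");// *q is hi
while(q!=p)
    alloc.destroy(--q);
// after using the objects, call destroy on each constructed element to destroy it
alloc.deallocate(p,n);// then return the memory to the allocator
// note: allocator::construct and destroy were deprecated in C++17 and removed in C++20; std::allocator_traits provides the same operations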

Class Designer's Tools

Effective C++

Prefer const, enum, and inline to #define
Use const whenever possible

char greeting[]="hello";
const char*p1=greeting;//non-const pointer,const data
char *const p2=greeting;//const pointer,non-const data

If const appears to the left of the asterisk, the pointed-to object is constant; if it appears to the right, the pointer itself is constant

Linux high-performance server programming

TCP/IP protocol family and various important network protocols

Data link layer, network layer, and transport layer protocols are implemented in the kernel, so the operating system must provide a set of system calls through which applications can use the services of these protocols. The API formed by this set of system calls is the socket API
UDP does not need to keep a copy of the application layer data, because the service it provides is unreliable. If the application program detects that the datagram has not been correctly received by the receiving end and intends to resend the datagram, the application program needs to copy the datagram from the user space to the UDP kernel sending buffer again.
Normally ARP maintains a cache containing IP address-to-physical address mappings of frequently or recently accessed machines. This avoids repeated ARP requests. On Linux, the arp command can be used to view and modify the ARP cache
ARP requests and replies are sent by the Ethernet driver
There are several ways to implement a domain name lookup service: NIS (Network Information Service), DNS, local static files, and so on
Linux uses the /etc/resolv.conf file to store the IP address of the DNS server
Use the host command followed by a domain name to query the corresponding IP address
The API defined by socket provides two kinds of functionality. First, it copies application data from the user buffer to the TCP/UDP kernel send buffer, handing the data to the kernel for sending, or copies data from the kernel TCP/UDP receive buffer to the user buffer in order to read it. Second, applications can use it to fine-tune the behavior of the underlying communication by modifying header information or other data structures of the protocol layers inside the kernel
Fragments of the same IP datagram have the same identification value
You can use the route command or netstat command to view the routing table
Either end of the TCP connection is in a certain state at any time, and the current state can be viewed through the netstat command
On a Linux system, a TCP port cannot be opened multiple times (twice or more) at the same time
Some transport layer protocols have the concept of out-of-band (OOB) data, which is used to quickly notify the other party of important events that occur at the local end
There are many implementations of congestion control algorithms under Linux, such as reno algorithm, vegas algorithm and cubic algorithm
On the HTTP communication chain, there are usually some transit proxy servers between the client and the target server, which provide transit access to the target resource
The source IP address and destination IP address of the IP header are always unchanged during the forwarding process (one exception is source routing). However, the physical address of the source end and the physical address of the destination end of the frame header are always changing during the forwarding process.
GET, HEAD, OPTIONS, TRACE, PUT, DELETE and other request methods are considered idempotent, that is, multiple consecutive and repeated requests have exactly the same effect as sending the request only once. The POST method is different. Sending the same request multiple times in a row may further affect the resources on the server.

server programming

Most modern PCs use little-endian byte order, so little-endian byte order is also called host byte order.
When formatted data (such as 32-bit integers and 16-bit short integers) is passed directly between two hosts with different byte orders, the receiver will inevitably misinterpret it. The solution is for the sender always to convert outgoing data to big-endian; the receiver, knowing that incoming data is always big-endian, decides from its own byte order whether to convert what it receives (little-endian machines convert, big-endian machines do not). For this reason big-endian is also called network byte order: it guarantees that every receiving host can interpret formatted data correctly.
The generic socket address structure is awkward to use, so Linux provides a dedicated socket address structure for each protocol family
Variables of any dedicated socket address type (and of sockaddr_storage) must be converted to the generic socket address type sockaddr when they are actually used (a cast is enough), because every socket programming interface takes its address parameter as sockaddr
When creating a socket, we specified an address family for it, but did not specify which specific socket address in the address family to use.
Binding a socket to a socket address is called naming the socket.
In the server program, we usually name the socket, because only after naming, the client can know how to connect to it. The client usually does not need to name the socket, but uses an anonymous method, that is, uses the socket address automatically assigned by the operating system.
After the socket is named, it cannot accept client connections immediately. We first need the listen system call to create a listen queue that holds pending client connections
accept simply takes a connection off the listen queue, regardless of the state the connection is in (for example ESTABLISHED or CLOSE_WAIT), and pays no attention to changes in network conditions
The pipe function can be used to create a pipe for inter-process communication.
The two file descriptors fd[0] and fd[1] created by the pipe function form the two ends of the pipe: data written to fd[1] can be read from fd[0]. fd[0] can only be used to read from the pipe and fd[1] only to write to it, never the other way round. Two-way transfer requires two pipes
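
The direction of a pipe can be checked with a small sketch. through_pipe() is a hypothetical helper written for this example; it assumes the message fits into the kernel pipe buffer, because a larger write would block with no reader running.

```cpp
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <string>

// Sends msg through a fresh pipe and returns what arrives at the read end.
std::string through_pipe(const std::string& msg) {
    int fd[2];
    if (pipe(fd) != 0) return "";
    std::string out;
    ssize_t written = write(fd[1], msg.data(), msg.size());   // fd[1] is the write end
    close(fd[1]);                                              // closing it lets read() see end-of-file
    if (written == static_cast<ssize_t>(msg.size())) {
        char buf[256];
        ssize_t n;
        while ((n = read(fd[0], buf, sizeof buf)) > 0)         // fd[0] is the read end
            out.append(buf, static_cast<std::size_t>(n));
    }
    close(fd[0]);
    return out;
}
```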
The socketpair function makes it easy to create a bidirectional pipe.
The tee function copies data between two pipe file descriptors, which is also a zero-copy operation.
The fcntl function, as its name (file control) describes, provides various control operations on file descriptors.
Linux server programs generally run as background processes. A background process is also called a daemon. It has no controlling terminal, so it cannot accidentally receive user input. The parent process of a daemon is usually the init process (the process with PID 1).
Debugging and maintaining a server requires a proper logging system. Linux provides a daemon for handling system logs, syslogd; current Linux systems use its successor, rsyslogd.
Programs running on Linux are subject to resource limits. These limits can be read and set with the getrlimit and setrlimit functions
The function to change the root directory of the process is chroot
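
The notes above on byte order, dedicated address structures, naming a socket, and listen can be combined into one minimal sketch. make_listener() is a hypothetical helper written for this example and covers IPv4 only; error handling is reduced to returning -1.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Creates a TCP socket, names it with bind, and starts listening.
// Returns the listening descriptor, or -1 on failure.
int make_listener(const char* ip, std::uint16_t port, int backlog) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr;                                  // dedicated IPv4 address structure
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                       // host byte order to network byte order
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    // The dedicated address is cast to the generic sockaddr type for the API call.
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing port 0 asks the operating system to pick a free port, which is convenient when trying the function out.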

The logic of the C/S model is very simple. After the server starts, first create one (or more) listening sockets, and call the bind function to bind it to the port that the server is interested in, and then call the listen function to wait for the client to connect.
Since a client connection request is an asynchronous event that arrives randomly, the server needs to use some kind of I/O model to listen to this event.
fork() creates a child process.
The server listens to multiple client requests at the same time through the select system call
Two efficient event handling modes: Reactor and Proactor.
The synchronous I/O model is usually used to implement the Reactor mode, and the asynchronous I/O model is used to implement the Proactor mode.
In the Reactor pattern the main thread (the I/O processing unit, likewise below) is only responsible for monitoring whether events occur on file descriptors; when one does, it immediately notifies a worker thread (the logic unit, likewise below). The main thread does no other substantive work: reading and writing data, accepting new connections, and handling client requests are all done in worker threads
The Proactor pattern hands all I/O operations to the main thread and the kernel, and the worker threads are only responsible for business logic.
Servers mainly use two concurrency patterns: half-sync/half-async and Leader/Followers.
In the I/O model, "synchronous" and "asynchronous" distinguish which kind of I/O event the kernel reports to the application (a ready event or a completion event) and who performs the I/O reads and writes (the application or the kernel). In the concurrency patterns, "synchronous" means the program executes strictly in the order of the code, while "asynchronous" means execution is driven by system events.
In the half-sync/half-async pattern, synchronous threads handle client logic and correspond to the logic unit in Figure 8-4 of the book; asynchronous threads handle I/O events and correspond to the I/O processing unit in that figure.
In the Leader/Followers pattern, multiple worker threads take turns owning the set of event sources, and take turns listening for, dispatching, and handling events.
The leader/follower pattern includes the following components: handle set (HandleSet), thread set (ThreadSet), event handler (EventHandler) and specific event handler (ConcreteEventHandler).
Efficient logic processing method - finite state machine
Trade space for time: "waste" some of the server's hardware resources in exchange for running efficiency. This is the idea behind a pool.
A pool is a collection of resources that is completely created and initialized when the server starts. This is called static resource allocation.
Although I/O multiplexing can monitor many file descriptors at once, the call itself is blocking.
The system calls to realize I/O multiplexing under Linux mainly include select, poll and epoll
The purpose of the select system call is to monitor readable, writable, and abnormal events on the file descriptors that the user is interested in within a specified period of time.
In the network program, there is only one abnormal situation that select can handle: out-of-band data is received on the socket.
The poll system call is similar to select. It also polls a certain number of file descriptors within a specified time to test whether there is a ready one.
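As a minimal sketch of the poll behaviour described above (assuming Linux or another POSIX system), the following function watches the read end of a pipe for a bounded time. The function name `poll_pipe_once` and its parameters are hypothetical and exist only for illustration.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: polls the read end of a fresh pipe once.
// Returns 1 if the read end is reported readable, 0 on timeout, -1 on error.
int poll_pipe_once(bool write_first, int timeout_ms) {
    assert(timeout_ms >= 0);               // a negative timeout would block indefinitely
    int fds[2];
    if (pipe(fds) == -1) return -1;
    if (write_first && write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    pollfd pfd{};
    pfd.fd = fds[0];
    pfd.events = POLLIN;                   // the event we are interested in
    int ready = poll(&pfd, 1, timeout_ms); // waits for at most timeout_ms
    int result = ready;
    if (ready == 1 && !(pfd.revents & POLLIN)) result = -1;
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

When nothing has been written, the call simply returns 0 once the timeout expires, which is the "within a specified time" part of the description.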
epoll is a Linux-specific I/O multiplexing function.
First, epoll uses a set of functions to accomplish a task, rather than a single function. Secondly, epoll puts the events on the file descriptors that users care about into an event table in the kernel, so that there is no need to repeatedly pass in the file descriptor set or event set every time it is called like select and poll. But epoll needs to use an additional file descriptor to uniquely identify this event table in the kernel.
The main interface of the epoll family of system calls is the epoll_wait function. It waits, for up to a timeout period, for events on a set of file descriptors.
epoll operates on file descriptors in two modes: LT (level-triggered) and ET (edge-triggered). LT is the default mode; in this mode epoll behaves like a more efficient poll. When the EPOLLET event is registered for a file descriptor in the epoll kernel event table, epoll handles that descriptor in ET mode, which is epoll's high-efficiency mode.
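The difference between the two modes can be observed with a small Linux-only experiment. This is a sketch, not the book's code: one byte is written into a pipe and epoll_wait is called twice without reading it. In LT mode the unread byte should be reported on both calls; in ET mode only the first call should report it. The function name `count_notifications` is hypothetical.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns how many of two consecutive epoll_wait calls
// report the pipe's read end as ready while the data is left unread.
int count_notifications(bool edge_triggered) {
    int fds[2];
    if (pipe(fds) == -1) return -1;
    int epfd = epoll_create1(0);           // extra descriptor naming the kernel event table
    if (epfd == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    epoll_event ev{};
    ev.events = EPOLLIN | (edge_triggered ? static_cast<uint32_t>(EPOLLET) : 0u);
    ev.data.fd = fds[0];
    int reported = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        reported = 0;
        for (int i = 0; i < 2; ++i) {
            epoll_event out[1];
            int n = epoll_wait(epfd, out, 1, 50);   // 50 ms timeout
            if (n == 1 && out[0].data.fd == fds[0]) ++reported;
        }
    }
    close(epfd);
    close(fds[0]);
    close(fds[1]);
    return reported;
}
```

This is why code using ET mode normally reads until the descriptor is drained: the same data will not be announced again.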
We expect a socket connection to be handled by only one thread at any one time. This can be achieved using epoll's EPOLLONESHOT event.
While one thread is processing a socket registered with EPOLLONESHOT, no other thread gets a chance to operate on that socket. Conversely, once a thread has finished processing such a socket, it should immediately reset the EPOLLONESHOT event on the socket so that the EPOLLIN event can fire the next time the socket becomes readable. This in turn gives other worker threads the opportunity to continue processing the socket.
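A minimal Linux-only sketch of the reset step, using a pipe in place of a socket. The names `rearm_oneshot`, `OneshotResult` and `oneshot_demo` are hypothetical. The descriptor is expected to be reported once, then stay silent until it is re-armed with EPOLL_CTL_MOD.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: re-arm a descriptor registered with EPOLLONESHOT.
bool rearm_oneshot(int epfd, int fd) {
    assert(epfd >= 0 && fd >= 0);
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == 0;
}

struct OneshotResult {
    int first;          // epoll_wait result right after data arrives
    int before_rearm;   // result while the one-shot event is used up
    int after_rearm;    // result after re-arming (data still unread)
};

OneshotResult oneshot_demo() {
    OneshotResult r{-1, -1, -1};
    int fds[2];
    if (pipe(fds) == -1) return r;
    int epfd = epoll_create1(0);
    if (epfd != -1) {
        epoll_event ev{};
        ev.events = EPOLLIN | EPOLLONESHOT;
        ev.data.fd = fds[0];
        epoll_event out[1];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
            write(fds[1], "x", 1) == 1) {
            r.first = epoll_wait(epfd, out, 1, 50);
            r.before_rearm = epoll_wait(epfd, out, 1, 50);
            if (rearm_oneshot(epfd, fds[0]))
                r.after_rearm = epoll_wait(epfd, out, 1, 50);
        }
        close(epfd);
    }
    close(fds[0]);
    close(fds[1]);
    return r;
}
```

In a real server the worker thread would call something like `rearm_oneshot` only after it has finished reading from the socket.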
In practical applications, many server programs monitor multiple ports at the same time, such as the super service inetd and the Android debugging service adbd.
As the parameters of the bind system call show, one socket can be bound to only one socket address, that is, one socket can listen on only one port. Therefore, if the server wants to listen on multiple ports at the same time, it must create multiple sockets and bind each one to its own port.
Even for a single port, if the server wants to handle both TCP and UDP requests on it, it needs to create two different sockets, a stream socket and a datagram socket, and bind both of them to that port.
The Linux Internet service inetd is a super service. It manages multiple sub-services at the same time, that is, listens to multiple ports. The inetd service program used on Linux systems is usually its upgraded version xinetd.
A signal is information sent to a target process by a user, system, or process to notify the target process of a state change or system exception.
Under Linux, the API for a process to send signals to other processes is the kill function.
The target process needs to define a handler function to process a signal it receives.
To set a handler for a signal, use the signal system call
A more robust interface for setting signal handlers is the following system call: sigaction()
After the process signal mask is set, masked signals are not delivered to the process. If a masked signal is sent to the process, the operating system records it as a pending signal for the process. If we unmask a pending signal, it is delivered to the process immediately.
sigpending() returns the set of signals currently pending for the process.
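The interplay of the signal mask and pending signals can be sketched as follows, assuming a single-threaded POSIX process. The names `on_usr1`, `PendingResult` and `pending_demo` are hypothetical. A handler is installed first because unblocking a pending SIGUSR1 with the default action would terminate the process.

```cpp
#include <signal.h>
#include <cassert>

static volatile sig_atomic_t g_delivered = 0;
static void on_usr1(int) { g_delivered = 1; }

struct PendingResult {
    bool pending_while_blocked;
    bool delivered_while_blocked;
    bool delivered_after_unblock;
};

PendingResult pending_demo() {
    PendingResult r{false, false, false};
    g_delivered = 0;

    struct sigaction sa{}, old_sa{};
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, &old_sa);
    assert(rc == 0);
    (void)rc;

    sigset_t block, old_mask, pending;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old_mask);     // mask SIGUSR1

    raise(SIGUSR1);                                // not delivered, becomes pending
    sigpending(&pending);
    r.pending_while_blocked = sigismember(&pending, SIGUSR1) == 1;
    r.delivered_while_blocked = g_delivered != 0;

    sigprocmask(SIG_SETMASK, &old_mask, nullptr);  // unmask: handler runs now
    r.delivered_after_unblock = g_delivered != 0;

    sigaction(SIGUSR1, &old_sa, nullptr);          // restore previous disposition
    return r;
}
```

In a multithreaded program pthread_sigmask would be used instead of sigprocmask.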
When the controlling terminal of a process is hung up, the SIGHUP signal is sent. Network daemons, which have no controlling terminal, usually use SIGHUP to make the server re-read its configuration file.
The strace command can trace the system calls called and the signals received when the program is executed.
By default, writing data to a pipe or socket connection that is closed by the reader will raise the SIGPIPE signal. We need to catch and handle this signal in the code, or at least ignore it, because the default behavior of the program receiving the SIGPIPE signal is to terminate the process, and we definitely don't want the program to exit due to a wrong write operation.
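A minimal sketch of the "at least ignore it" option, assuming a POSIX system. With SIGPIPE ignored, the failed write is expected to return -1 with errno set to EPIPE instead of terminating the process. The function name `write_to_closed_pipe_errno` is hypothetical.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: ignores SIGPIPE, writes to a pipe whose read end is
// closed, and returns the resulting errno (0 if the write did not fail).
int write_to_closed_pipe_errno() {
    struct sigaction ign{}, old{};
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    int rc = sigaction(SIGPIPE, &ign, &old);
    assert(rc == 0);
    (void)rc;

    int err = -1;
    int fds[2];
    if (pipe(fds) == 0) {
        close(fds[0]);                      // the reader goes away
        errno = 0;
        ssize_t n = write(fds[1], "x", 1);  // would raise SIGPIPE by default
        err = (n == -1) ? errno : 0;
        close(fds[1]);
    }
    sigaction(SIGPIPE, &old, nullptr);      // restore previous disposition
    return err;
}
```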
The kernel can use the SIGURG signal to notify the application that out-of-band data has arrived.
The third type of event that network programs need to handle is timed events. We need to encapsulate each timing event into a timer separately, and use some kind of container data structure, such as linked list, sorted linked list, time heap and time wheel, to connect all timers in series to achieve unified management of timing events
Linux provides three timing methods, they are:
❑ socket options SO_RCVTIMEO and SO_SNDTIMEO. These two options are only for socket-specific system calls related to data receiving and sending
❑ SIGALRM signal. Once the real-time alarm clock set by the alarm and setitimer functions times out, the SIGALRM signal will be triggered.
❑ The timeout parameter of the I/O multiplexing system call. The three groups of I/O multiplexing system calls under Linux all have timeout parameters, so they can not only uniformly process signals and I/O events, but also uniformly process timing events. However, since the I/O multiplexing system call may return before the timeout expires (I/O events occur), if we want to use them for timing, we need to constantly update the timing parameters to reflect the remaining time
Another way to design timers is to use the expiry time of the timer that expires soonest as the heartbeat interval.
A timer container implemented with a min-heap is called a time heap.
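A time heap can be sketched with std::priority_queue. This is an illustrative design rather than the book's implementation; the names `Timer`, `TimeHeap`, `add`, `next_delay` and `tick` are hypothetical, and time is a plain integer in arbitrary units. The value returned by `next_delay` is what a server would pass as the timeout of its I/O multiplexing call.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Timer {
    long expire;                       // absolute expiry time, arbitrary units
    std::function<void()> callback;
};

// Orders the heap so that the timer with the smallest expiry is on top.
struct LaterFirst {
    bool operator()(const Timer& a, const Timer& b) const {
        return a.expire > b.expire;
    }
};

class TimeHeap {
public:
    void add(long expire, std::function<void()> cb) {
        assert(cb != nullptr);         // a timer without a callback is useless
        heap_.push(Timer{expire, std::move(cb)});
    }

    // Delay until the earliest timer expires, or -1 when there are no timers.
    long next_delay(long now) const {
        if (heap_.empty()) return -1;
        long d = heap_.top().expire - now;
        return d > 0 ? d : 0;
    }

    // Runs every timer whose expiry time has been reached; returns how many ran.
    int tick(long now) {
        int fired = 0;
        while (!heap_.empty() && heap_.top().expire <= now) {
            Timer t = heap_.top();     // copy out before pop invalidates top()
            heap_.pop();
            t.callback();
            ++fired;
        }
        return fired;
    }

private:
    std::priority_queue<Timer, std::vector<Timer>, LaterFirst> heap_;
};
```

Because the heartbeat interval is always the delay to the earliest timer, each wake-up is expected to find at least one expired timer.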
Three types of events that Linux server programs must handle: I/O events, signals, and timing events.
High-performance I/O framework library Libevent
The objects to be processed by the I/O framework library, that is, I/O events, signals and timing events, are collectively called event sources. An event source is usually bound to a handle. The role of the handle is that when the kernel detects a ready event, it will notify the application of this event through the handle.
In the Linux environment, the handle corresponding to the I/O event is the file descriptor, and the handle corresponding to the signal event is the signal value.
The I/O framework library generally encapsulates various I/O multiplexing system calls supported by the system into a unified interface, which is called an event multiplexer.
Chapter 12 Skip
The system call to create a new process under Linux is fork.
Sometimes we need to execute other programs in the child process, that is, to replace the current process image, which requires using one of the exec series functions
A child process is said to be in a zombie state after the child process finishes running but before the parent process reads its exit status.
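A minimal sketch of reaping a child, assuming a POSIX system. Between the child's `_exit` and the parent's `waitpid`, the child is in the zombie state described above. The function name `fork_and_reap` is hypothetical.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: forks a child that exits with `code`, then reads its
// exit status in the parent. Returns that status, or -1 on failure.
int fork_and_reap(int code) {
    assert(code >= 0 && code <= 255);   // exit statuses are 8 bits wide
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) _exit(code);          // child: terminate immediately

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;   // reap: zombie disappears
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```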
Pipes can only be used for communication between two associated processes (such as parent and child processes). The three System V IPCs discussed below can be used for communication between multiple unrelated processes, because they all use a globally unique key value to identify a channel.
The APIs of Linux semaphores are all defined in the sys/sem.h header file, which mainly includes three system calls: semget, semop and semctl.
The semget system call creates a new semaphore set, or retrieves an existing semaphore set.
The semop system call changes the value of the semaphore, that is, performs P and V operations.
The semctl system call allows the caller to have direct control over the semaphore.
Among these operations, GETNCNT, GETPID, GETVAL, GETZCNT and SETVAL operate on a single semaphore, namely semaphore number sem_num in the semaphore set identified by sem_id; the other operations target the entire semaphore set, so the sem_num parameter of semctl is ignored.
The shmget system call creates a new shared memory, or acquires an existing shared memory.
After shared memory is created or acquired, we cannot access it immediately; it must first be attached to the address space of the process. When we are done with it, it must be detached from the process address space. These two tasks are performed by the shmat and shmdt system calls respectively.
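The create, attach, use, detach and remove sequence can be sketched as follows, assuming a system with System V shared memory enabled. The function name `shm_child_writes` is hypothetical. A child process writes into the segment and the parent reads the value back, which would not work with ordinary (copied) memory.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: returns the value the child stored in shared memory,
// or -1 on failure.
int shm_child_writes(int value) {
    assert(value != -1);                               // -1 is reserved for errors
    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (id == -1) return -1;
    void* addr = shmat(id, nullptr, 0);                // attach to our address space
    if (addr == reinterpret_cast<void*>(-1)) {
        shmctl(id, IPC_RMID, nullptr);
        return -1;
    }
    int* shared = static_cast<int*>(addr);
    *shared = 0;

    int result = -1;
    pid_t pid = fork();                                // attachment is inherited
    if (pid == 0) {
        *shared = value;                               // visible to the parent
        shmdt(addr);
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, nullptr, 0) == pid) result = *shared;

    shmdt(addr);                                       // detach
    shmctl(id, IPC_RMID, nullptr);                     // mark segment for removal
    return result;
}
```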
The shmctl system call controls certain properties of shared memory.
Message queues are a simple and efficient way of passing binary chunks of data between two processes.
The APIs of Linux message queues are all defined in the sys/msg.h header file, including 4 system calls: msgget, msgsnd, msgrcv and msgctl.
The msgget system call creates a message queue, or gets an existing message queue.
The msgsnd system call adds a message to the message queue.
The msgrcv system call fetches messages from the message queue.
The msgctl system call controls certain properties of message queues.
Linux provides the ipcs command to observe which shared resource instances exist on the current system.
Passing a file descriptor is not to pass the value of a file descriptor, but to create a new file descriptor in the receiving process, and the file descriptor and the file descriptor passed in the sending process point to the same file entry.
How to pass file descriptors between two unrelated processes? Under Linux, we can use the UNIX domain socket to transfer special auxiliary data between processes to realize the transfer of file descriptors
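A sketch of the auxiliary-data mechanism, assuming Linux. The "special auxiliary data" is a control message of type SCM_RIGHTS sent with sendmsg. The helper names `send_fd`, `recv_fd` and `fd_passing_demo` are hypothetical, and for brevity both ends of the socket live in one process (a socketpair); between real processes the two ends would be a connected UNIX domain socket.

```cpp
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Hypothetical helper: send descriptor `fd` over UNIX domain socket `sock`.
bool send_fd(int sock, int fd) {
    assert(fd >= 0);
    char dummy = 'F';                       // one byte of ordinary data
    iovec iov{&dummy, 1};
    alignas(cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))] = {};
    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);
    cmsghdr* cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;             // auxiliary data carries descriptors
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    std::memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1;
}

// Hypothetical helper: receive one descriptor; returns -1 on failure.
int recv_fd(int sock) {
    char dummy = 0;
    iovec iov{&dummy, 1};
    alignas(cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))] = {};
    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);
    if (recvmsg(sock, &msg, 0) != 1) return -1;
    cmsghdr* cm = CMSG_FIRSTHDR(&msg);
    if (!cm || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd = -1;
    std::memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;                              // a new descriptor in the receiver
}

// Passes the read end of a pipe through a socketpair, then reads via the copy.
bool fd_passing_demo() {
    int sv[2], p[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return false;
    if (pipe(p) == -1) { close(sv[0]); close(sv[1]); return false; }
    bool ok = false;
    if (send_fd(sv[0], p[0])) {
        int got = recv_fd(sv[1]);
        char c = 0;
        ok = got >= 0 && got != p[0] &&
             write(p[1], "z", 1) == 1 && read(got, &c, 1) == 1 && c == 'z';
        if (got >= 0) close(got);
    }
    close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
    return ok;
}
```

The received descriptor has a different number from the one that was sent, yet reading from it returns the data written to the original pipe, which illustrates that both refer to the same open file entry.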
The function to create a thread is pthread_create.
The function to end a thread is pthread_exit
All threads in a process can call the pthread_join function to recycle other threads (provided that the target thread is recyclable, see later), that is, to wait for other threads to end.
To terminate a thread abnormally, that is, to cancel it, use pthread_cancel.
3 mechanisms dedicated to thread synchronization: POSIX semaphores, mutexes, and condition variables
On Linux there are two groups of semaphore APIs. One is the System V IPC semaphore, and the other is the POSIX semaphore discussed here.
Commonly used POSIX semaphore functions are the following five:

#include <semaphore.h>
int sem_init(sem_t*sem,int pshared,unsigned int value);
int sem_destroy(sem_t*sem);
int sem_wait(sem_t*sem);
int sem_trywait(sem_t*sem);
int sem_post(sem_t*sem);
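
A minimal single-threaded sketch of the five calls listed above, assuming Linux (unnamed semaphores created with sem_init are not available on every platform, and older glibc versions need `-pthread` at link time). The function name `semaphore_demo` is hypothetical.

```cpp
#include <semaphore.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: checks that a semaphore initialised to 1 behaves like
// a binary lock. Returns true when every step behaves as expected.
bool semaphore_demo() {
    sem_t sem;
    if (sem_init(&sem, 0, 1) == -1) return false;   // pshared = 0, value = 1
    bool ok = sem_wait(&sem) == 0;                   // P operation: 1 -> 0
    errno = 0;
    ok = ok && sem_trywait(&sem) == -1 && errno == EAGAIN;  // would block
    ok = ok && sem_post(&sem) == 0;                  // V operation: 0 -> 1
    ok = ok && sem_trywait(&sem) == 0;               // succeeds again: 1 -> 0
    int rc = sem_destroy(&sem);
    assert(rc == 0);
    (void)rc;
    return ok;
}
```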

Mutex locks (also known as mutexes) can be used to protect critical code segments to ensure their exclusive access, which is a bit like a binary semaphore.
There are five main functions related to POSIX mutex locks:

#include <pthread.h>
int pthread_mutex_init(pthread_mutex_t*mutex,const pthread_mutexattr_t*mutexattr);
int pthread_mutex_destroy(pthread_mutex_t*mutex);
int pthread_mutex_lock(pthread_mutex_t*mutex);
int pthread_mutex_trylock(pthread_mutex_t*mutex);
int pthread_mutex_unlock(pthread_mutex_t*mutex);
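
A short sketch of protecting a critical section with these functions, assuming a POSIX threads implementation (link with `-pthread` where required). The names `Shared`, `add_rounds` and `count_with_two_threads` are hypothetical. Two threads increment one counter, and the mutex keeps the increments from interleaving.

```cpp
#include <pthread.h>
#include <cassert>

struct Shared {
    pthread_mutex_t lock;
    long counter;
    int rounds;
};

static void* add_rounds(void* arg) {
    Shared* s = static_cast<Shared*>(arg);
    for (int i = 0; i < s->rounds; ++i) {
        pthread_mutex_lock(&s->lock);     // enter critical section
        ++s->counter;
        pthread_mutex_unlock(&s->lock);   // leave critical section
    }
    return nullptr;
}

// Hypothetical helper: returns the final counter value, or -1 on failure.
long count_with_two_threads(int rounds) {
    assert(rounds >= 0);
    Shared s;
    s.counter = 0;
    s.rounds = rounds;
    if (pthread_mutex_init(&s.lock, nullptr) != 0) return -1;

    pthread_t a, b;
    bool a_ok = pthread_create(&a, nullptr, add_rounds, &s) == 0;
    bool b_ok = pthread_create(&b, nullptr, add_rounds, &s) == 0;
    if (a_ok) pthread_join(a, nullptr);
    if (b_ok) pthread_join(b, nullptr);

    pthread_mutex_destroy(&s.lock);
    return (a_ok && b_ok) ? s.counter : -1;
}
```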

The mutex attribute pshared specifies whether the mutex is allowed to be shared across processes
The mutex attribute type specifies the type of mutex.
Condition variables are used to synchronize the value of shared data between threads. Condition variables provide a notification mechanism between threads: when a shared data reaches a certain value, wake up the thread waiting for the shared data.
There are five main functions related to condition variables:

#include <pthread.h>
int pthread_cond_init(pthread_cond_t*cond,const pthread_condattr_t*cond_attr);
int pthread_cond_destroy(pthread_cond_t*cond);
int pthread_cond_broadcast(pthread_cond_t*cond);
int pthread_cond_signal(pthread_cond_t*cond);
int pthread_cond_wait(pthread_cond_t*cond,pthread_mutex_t*mutex);
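
A sketch of the usual wait pattern, assuming a POSIX threads implementation. The names `Mailbox`, `producer` and `wait_for_value` are hypothetical. The waiter re-checks the shared flag in a loop because pthread_cond_wait may return spuriously, and the mutex passed to it must be held when it is called.

```cpp
#include <pthread.h>
#include <cassert>

struct Mailbox {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    bool ready;
    int value;
};

static void* producer(void* arg) {
    Mailbox* m = static_cast<Mailbox*>(arg);
    pthread_mutex_lock(&m->lock);
    m->value = 42;                        // the shared data reaches its target value
    m->ready = true;
    pthread_cond_signal(&m->changed);     // wake one waiting thread
    pthread_mutex_unlock(&m->lock);
    return nullptr;
}

// Hypothetical helper: waits until the producer publishes a value.
int wait_for_value() {
    Mailbox m;
    m.ready = false;
    m.value = 0;
    pthread_mutex_init(&m.lock, nullptr);
    pthread_cond_init(&m.changed, nullptr);

    int result = -1;
    pthread_t t;
    if (pthread_create(&t, nullptr, producer, &m) == 0) {
        pthread_mutex_lock(&m.lock);
        while (!m.ready)                  // loop guards against spurious wakeups
            pthread_cond_wait(&m.changed, &m.lock);
        assert(m.ready);
        result = m.value;
        pthread_mutex_unlock(&m.lock);
        pthread_join(t, nullptr);
    }
    pthread_cond_destroy(&m.changed);
    pthread_mutex_destroy(&m.lock);
    return result;
}
```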

If a function can be called by multiple threads at the same time without race conditions, we call it thread safe, or it is a reentrant function.
Some library functions are not reentrant mainly because they use static variables internally. However, Linux provides corresponding reentrant versions for many non-reentrant library functions. The function names of these reentrant versions are added with _r at the end of the original function name.
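strtok and strtok_r are a typical pair. strtok keeps its position in a hidden static variable, while strtok_r stores it in a caller-supplied pointer, so separate callers do not disturb one another. A minimal sketch, with the hypothetical helper name `split_words`:

```cpp
#include <string.h>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character in `delims` using the
// reentrant strtok_r. `text` is taken by value because strtok_r modifies it.
std::vector<std::string> split_words(std::string text, const char* delims) {
    assert(delims != nullptr);
    std::vector<std::string> out;
    char* save = nullptr;                 // per-call state instead of a static
    for (char* tok = strtok_r(&text[0], delims, &save);
         tok != nullptr;
         tok = strtok_r(nullptr, delims, &save)) {
        out.emplace_back(tok);
    }
    return out;
}
```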
A dedicated thread should be defined to handle all signals.

Optimize and monitor server performance

An excellent feature of the Linux platform is kernel fine-tuning, that is, we can adjust kernel parameters by modifying files.
During the development of the server, we may encounter various unexpected errors. One way to debug is to capture packets with tcpdump, as described in earlier chapters of this book. However, this method is mainly used to analyze the input and output of the program. For server logic errors, a more convenient debugging method is to use the gdb debugger.
Writing stress testing tools is often considered a part of server development.
Page 590 is skipped: I do not know gdb debugging yet.
Linux provides many useful tools to facilitate developers to debug and evaluate server programs.
A few of the most commonly used tools: tcpdump, nc, strace, lsof, netstat, vmstat, ifstat, and mpstat.
tcpdump is a classic network packet capture tool.
lsof (list open files) is a tool that lists the file descriptors currently open on the system.
The nc (netcat) command is short and powerful, and it is mainly used to quickly build a network connection. We can make it run as a server, listen to a certain port and accept client connections, so it can be used to debug client programs. We can also make it run as a client, initiate a connection to the server and send and receive data, so it can be used to debug the server program, and it is a bit like a telnet program at this time.
strace is an important tool for testing server performance. It tracks the system calls executed and received signals during the running of the program, and outputs the system call name, parameters, return value and signal name to the standard output or the specified file.
vmstat is the abbreviation of virtual memory statistics, it can output the usage of various resources of the system in real time, such as process information, memory usage, CPU usage and I/O usage.
ifstat is the abbreviation of interface statistics, it is a simple network traffic monitoring tool.
mpstat is an abbreviation of multi-processor statistics, which can monitor the usage of each CPU on a multi-processor system in real time.

Write Makefile with me

Whether it is C or C++, the source file must first be compiled into an intermediate code file, which is .obj file under Windows, and .o file under UNIX, namely Object File. This action is called compile. Then a large number of Object Files are synthesized into executable files. This action is called a link.
At compile time, the compiler only checks the program syntax and whether functions and variables are declared. If a function is not declared, the compiler gives a warning but can still generate the Object File. When linking the program, the linker searches all Object Files for the implementation of the function. If it cannot find one, it reports a link error (Linker Error).
When the make command is executed, a makefile is required to tell the make command how to compile and link the program
If one or more files in the prerequisites are newer than the target file, the command defined by command is executed.
After the dependencies are defined, the following line defines the operating system command for how to generate the target file, which must start with a Tab key.
Declare the variable object, and then use the variable through $(object)
Adding "-" in front of the command means that the execution will continue regardless of the error
cc informs the dependent files of the target file and updates the target file

make supports three wildcards: *, ? and ~.
$? is an automatic variable.
The special variable VPATH tells make that if the dependent files and target files cannot be found in the current directory, go to the directory shown in VPATH to find them. If VPATH is not defined, it will only be found in the current directory
Use ".PHONY" to explicitly declare that a target is a "pseudo target".
Most C/C++ compilers support a "-M" option, which automatically finds the header files included in the source files and generates a dependency.
make will output the command line it will execute to the screen before the command is executed. When we use the @ character before the command line, then this command will not be displayed by make
If you want to apply the result of the previous command to the next command, you should use a semicolon to separate the two commands instead of writing them on two lines
A local variable can be set for a particular target. Such a variable is called a "Target-specific Variable". It may have the same name as a "global variable", because it is valid only within that target's rule and the rules chained from it, and it does not affect the value of the global variable outside the rule chain.
ifdef <variable-name> The expression is true if the variable <variable-name> has a non-null value. Otherwise, the expression is false.
ifeq (<arg1>, <arg2>) compares whether the values of the arguments arg1 and arg2 are the same.
ifneq (<arg1>, <arg2>) compares whether the values of arguments arg1 and arg2 are the same, and returns true if they are different.
ifndef <variable-name> is the opposite of ifdef
Function calls, much like the use of variables, are also identified by $, and their syntax is as follows: $(<function> <arguments>)
$(subst <from>,<to>,<text>) Replace the <from> string in the string <text> with <to>.
$(patsubst <pattern>,<replacement>,<text>) Check whether each word in <text> (words are separated by spaces, tabs, carriage returns or line feeds) matches the pattern <pattern>; if it does, replace it with <replacement>.
$(strip <string>) remove the empty characters at the beginning and end of the <string> string
$(findstring <find>,<in>) Find the string <find> in the string <in>.
$(filter <pattern...>,<text>) Filter words in <text> string by pattern <pattern>, keeping words matching pattern <pattern>. There can be multiple patterns.
$(filter-out <pattern...>,<text>) Filter the words in the <text> string with the pattern <pattern>, and remove the words that match the pattern <pattern>. There can be multiple patterns.
$(sort <list>) Sorts the words in the string <list> (in ascending order).
$(word <n>,<text>) Take the <n>th word of the string <text> (counting from 1).
$(wordlist <ss>,<e>,<text>) Take the words from position <ss> to position <e> of the string <text>. <ss> and <e> are numbers.
$(words <text>) Count the number of words in the string in <text>
$(firstword <text>) Get the first word in the string <text>
$(dir <names...>) Extract the directory part from the filename sequence <names>. The directory part is everything up to and including the last slash (/). If there is no slash, ./ is returned.
$(notdir <names...>) Extract the non-directory part from the filename sequence <names>. The non-directory part is the part after the last slash (/).
$(suffix <names...> ) Extract the suffix of each filename from the filename sequence <names>.
$(basename <names...>) Extract the prefix part of each filename from the filename sequence <names>.
$(addsuffix <suffix>,<names...>) Add the suffix <suffix> to each word in <names>.
$(addprefix <prefix>,<names...>) Add prefix <prefix> to each word in <names>.
$(join <list1>,<list2>) Append each word of <list2> to the corresponding word of <list1>. If <list1> has more words than <list2>, the extra words of <list1> are kept unchanged. If <list2> has more words than <list1>, the extra words of <list2> are copied to the end of the result.
$(foreach <var>,<list>,<text>) Take the words of <list> one at a time, assign each to the variable named by <var>, and then evaluate the expression in <text>. Each evaluation of <text> yields a string; the strings are joined with spaces, and when the loop ends this space-separated string is the return value of the foreach function.
$(if <condition>,<then-part>) or $(if <condition>,<then-part>,<else-part>)
$(call <expression>,<parm1>,<parm2>,...,<parmn>) can be used to create new parameterized functions.
$(origin <variable>) tells you where the variable <variable> comes from.
The shell function is unlike the other functions. As its name suggests, its argument is an operating system shell command.
GNU make exits with one of three codes: 0 means successful execution; 1 means that the "-q" option was used and make determined that some target is not up to date; 2 means that make encountered an error.
make -f <file> specifies the makefile to use.
Any target in the makefile can be specified as the final target, except those starting with - or containing =.
commonly used implicit rules

- Implicit rules for compiling C programs.
  The dependent target of a .o target is automatically deduced as .c, and its generation command is $(CC) -c $(CPPFLAGS) $(CFLAGS)
- Implicit rules for compiling C++ programs.
  The dependent target of a .o target is automatically deduced as .cc or .C, and its generation command is $(CXX) -c $(CPPFLAGS) $(CXXFLAGS). (It is recommended to use .cc as the suffix of C++ source files instead of .C)
- Implicit rules for compiling Pascal programs.
  The dependent target of a .o target is automatically deduced as .p, and its generation command is $(PC) -c $(PFLAGS).
- Implicit rules for assembly and assembly preprocessing.
  The dependent target of a .o target is automatically deduced as .s; the assembler as is used by default, and its generation command is: $(AS) $(ASFLAGS). The dependent target of a .s target is automatically deduced as .S; the C preprocessor cpp is used by default, and its generation command is: $(CPP) $(CPPFLAGS).
- Implicit rules for linking Object files.
  The target depends on its .o file; the link step is run through the C compiler, which invokes the linker (usually ld), and the generation command
  is: $(CC) $(LDFL

AR : Function library packer. The default command is ar
AS : Assembler. The default command is as
CC : C language compiler. The default command is cc
CXX : C++ language compiler. The default command is g++
CO : Program that extracts files from RCS. The default command is co
CPP : Preprocessor for C programs (output goes to standard output). The default command is $(CC) -E
The so-called automatic variable means that this variable will automatically take out a series of files defined in the pattern one by one until all the files matching the pattern are taken out.
$@ : Indicates the target file set in the rule. In a pattern rule, if there are multiple targets, then $@ is the set that matches the pattern definition in the target.
$% : Only if the target is in a library file, it indicates the target member name in the rule. For example, if a target is foo.a(bar.o), then $% is bar.o and $@ is foo.a. If the target is not a function library file (.a under Unix, .lib under Windows), then its value is empty.
$< : The name of the first dependent target. If the dependent targets are defined with a pattern (that is, %), then $< takes the files matching the pattern one at a time.
$? : The set of all dependent targets that are newer than the target, separated by spaces.
$^ : The set of all dependent targets, separated by spaces. If a dependent target is listed more than once, this variable removes the duplicates and keeps only one copy.
$+ : Like $^, this variable is the set of all dependent targets, except that it does not remove duplicates.
$* : The stem, that is, the part matched by % in the target pattern, together with any directory part before it. If the target is dir/a.foo.b and the target pattern is a.%.b, then the value of $* is dir/foo. This variable is useful for constructing related filenames. If the target is not defined by a pattern, $* cannot be deduced; but if the suffix of the target file is recognized by make, then $* is the target name without the suffix. For example, if the target is foo.c, since .c is a suffix recognized by make, the value of $* is foo. This feature is specific to GNU make and is likely not compatible with other versions of make, so you should avoid using $* except in implicit rules or static pattern rules. If the suffix of the target is not recognized by make, then $* is empty.

Advanced Programming in UNIX Environment

The kernel uses the exec function to read the program into memory and execute the program
Signals are used to notify a process that something has happened
Clock time, also known as wall clock time, is the total amount of time a process has been running, and its value is related to the number of processes running simultaneously in the system
User CPU time is the amount of time spent executing user instructions, and system CPU time is the time spent executing kernel programs for that process
sbrk(2), the UNIX system call that handles storage allocation, is not a general-purpose memory manager. It increases or decreases the process address space by the specified number of bytes.
The UNIX system shell associates file descriptor 0 with the standard input of the process, file descriptor 1 with standard output, and file descriptor 2 with standard error
The file offset can be greater than the current length of the file, in which case the next write to the file will lengthen the file and create a hole in the file
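A small sketch of creating a hole, assuming a POSIX system with a writable /tmp. The function name `size_after_seek_and_write` is hypothetical. It seeks `gap` bytes into an empty file, writes one byte, and checks that the skipped region reads back as zero bytes.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: returns the file size after seeking `gap` bytes past
// the start of an empty temporary file and writing one byte, or -1 on failure.
long size_after_seek_and_write(long gap) {
    assert(gap > 0);
    char path[] = "/tmp/hole_demo_XXXXXX";
    int fd = mkstemp(path);                    // empty file opened read/write
    if (fd == -1) return -1;

    long size = -1;
    char first = 1;
    struct stat st{};
    if (lseek(fd, gap, SEEK_SET) == gap &&     // offset beyond the current length
        write(fd, "x", 1) == 1 &&              // this write extends the file
        pread(fd, &first, 1, 0) == 1 &&
        first == 0 &&                          // bytes in the hole read as 0
        fstat(fd, &st) == 0) {
        size = static_cast<long>(st.st_size);
    }
    close(fd);
    unlink(path);
    return size;
}
```

Whether the hole actually occupies disk blocks depends on the file system; the reported size includes it either way.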
Most file systems use some kind of read-ahead technique to improve performance. When the system detects that a sequential read is in progress, it tries to read more data than the application requested, assuming that the application will read that data soon.
The umask() function sets the file mode creation mask for the process.
The three functions chmod, fchmod, and fchmodat can change the access permissions of existing files
Sticky bit: the S_ISVTX bit. If this bit was set for an executable program file, then after the program was executed for the first time, a copy of the program text was kept in the swap area when the program terminated, so that it could be loaded into memory faster the next time. Later versions of UNIX called it the saved-text bit.
Full buffering: the actual I/O operation is performed after the standard I/O buffer is filled
Line buffering: the standard I/O library performs I/O when a newline character is encountered on input or output
Standard error is unbuffered, streams opened to terminal devices are line-buffered, and all other streams are fully buffered
int main(int argc, char *argv[]), argc is the number of command line parameters, argv is an array of pointers to parameters
Each program receives an environment table. Like the parameter table, the environment table is also an array of string pointers.
The setjmp and longjmp functions can jump across functions, which is very useful for handling errors that occur in deeply nested function calls
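A minimal sketch of escaping from nested calls. The names `level1` to `level3`, `g_env`, `g_error` and `run_with_recovery` are hypothetical. One caveat when doing this in C++: longjmp does not run destructors, so the skipped frames must not hold objects with non-trivial destructors.

```cpp
#include <csetjmp>
#include <cassert>

static std::jmp_buf g_env;   // saved stack context of the recovery point
static int g_error = 0;      // error code handed over to the recovery point

static void level3(int code) {
    if (code != 0) {
        g_error = code;
        std::longjmp(g_env, 1);   // unwind straight back to setjmp
    }
}
static void level2(int code) { level3(code); }
static void level1(int code) { level2(code); }

// Hypothetical helper: returns 0 when the nested calls return normally,
// otherwise the error code reported from the innermost level.
int run_with_recovery(int code) {
    assert(code >= 0);
    g_error = 0;
    if (setjmp(g_env) != 0) {     // nonzero: we arrived here via longjmp
        return g_error;
    }
    level1(code);
    return 0;
}
```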
The process with ID 0 is usually the scheduling process and is often called the swapping process. The process is part of the kernel
Process ID 1 is usually the init process, which the kernel invokes at the end of the bootstrap procedure. It is responsible for bringing up the UNIX system after the kernel has been bootstrapped, taking the system to a defined state. The init process never terminates; it is a normal user process.
On some UNIX systems, process ID 2 is the page daemon, which supports the paging operations of the virtual memory system
The child process is a copy of the parent process: it obtains a copy of the parent's data space, heap, and stack. These copies belong to the child; the parent and the child do not share these portions of memory
A process that has terminated but has not been dealt with by its parent process is called a zombie process
The function to obtain the termination status of the process - waitid()
After creating a new child process with the fork function, the child process often needs to call an exec function to execute another program
Because calling exec does not create a new process, the process IDs before and after have not changed. exec just replaces the text segment, data segment, heap segment, and stack segment of the current process with a new program on the disk.
There are several rules about who can change the ID: if the process has superuser privileges, the setuid function sets the real user ID, effective user ID, and saved set-user-ID to uid; if the process does not have superuser privileges, but uid is equal to the real user ID or the saved set-user-ID, then setuid sets only the effective user ID to uid, without changing the real user ID or the saved set-user-ID
An unprivileged user can always exchange real user ID and effective user ID
The system usually records the name used by the user to log in, and the login name can be obtained with the getlogin function
A process can choose to run at a lower priority by raising its nice value (it gives up part of its CPU share, which is what makes the process "nice"). Only privileged processes are allowed to raise their scheduling priority
A process can get or change its nice value through the nice function
The main difference between logging into the system via a serial terminal and logging into the system via the network is that with a network login, the connection between the terminal and the computer is no longer point-to-point. In the case of network login, login is only an available service, which is of the same nature as other network services (FTP, SMTP)
A session is a collection of one or more process groups
tcgetpgrp() returns the foreground process group ID, which is associated with the terminal opened on fd
