Five common pitfalls of socket network programming

1. Ignoring return status

The first pitfall is obvious, but it is the most common mistake novice developers make. If you ignore the return status of functions, you may be lost when they fail or partially succeed. This, in turn, can propagate errors and make it difficult to locate the source of the problem.

Capture and check every return status instead of ignoring them. Consider the example shown in Listing 1, a socket send function.

Listing 1. Ignoring API function return status

int status, sock, mode;
/* Create a new stream (TCP) socket */
sock = socket( AF_INET, SOCK_STREAM, 0 );
...
status = send( sock, buffer, buflen, MSG_DONTWAIT );
if (status == -1)
{
  /* send failed */
  printf( "send failed: %s\n", strerror(errno) );
}
else
{
  /* send succeeded -- or did it? */
}

Listing 1 shows a code fragment that performs a socket send operation (sending data over a socket). The error status of the function is caught and tested, but the example overlooks how send behaves in non-blocking mode (enabled by the MSG_DONTWAIT flag).

The send API function has three classes of possible return values:

If all of the data is successfully queued for transmission, the number of bytes requested (buflen) is returned.
If queuing fails, -1 is returned (the errno variable indicates the reason for the failure).
If only part of the data can be queued when the function is called, the return value is the number of bytes actually queued, which is less than buflen.

Because the MSG_DONTWAIT flag makes send non-blocking, the call returns after all of the data, some of the data, or none of the data has been queued (in the last case it returns -1 with errno set to EAGAIN or EWOULDBLOCK). Checking only for -1 and ignoring the byte count can result in incomplete delivery and subsequent data loss.
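The partial-send case can be handled with a loop that keeps sending the unsent remainder. The following is a minimal sketch, not a definitive implementation: send_all is a hypothetical helper name (it is not part of the sockets API), and waiting with poll when the send buffer is full is one possible design choice among several.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of the sockets API): keep calling send()
 * until all len bytes are queued. Returns 0 on success, -1 on error
 * (errno is left as set by send() or poll()). */
int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;
    size_t remaining = len;

    while (remaining > 0) {
        ssize_t n = send(sock, p, remaining, MSG_DONTWAIT);
        if (n >= 0) {
            /* Complete or partial send: skip past what was queued */
            p += n;
            remaining -= (size_t)n;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Send buffer is full: wait until the socket is writable */
            struct pollfd pfd = { .fd = sock, .events = POLLOUT };
            if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
                return -1;
        } else if (errno != EINTR) {
            return -1;  /* a real error */
        }
    }
    return 0;
}
```

A caller would replace the single send call in Listing 1 with send_all( sock, buffer, buflen ) and then only needs to test for -1.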

2. Peer socket closure

The interesting thing about UNIX is that you can treat almost anything as a file. Files themselves, directories, pipes, devices, and sockets are all treated as files. This abstraction means that a single set of APIs can be used on a wide range of device types.

Consider the read API function, which reads a certain number of bytes from a file. The read function returns the number of bytes read (up to the maximum value you specify); or -1 on error; or 0 if the end of the file has been reached.

If a read operation on a socket returns 0, the peer at the remote end of the connection has called the close API function. The indication is the same as for file reading: no more data can be read through the descriptor (see Listing 2).

Listing 2. Properly handling the return value of the read API function

int sock, status;
sock = socket( AF_INET, SOCK_STREAM, 0 );
...
status = read( sock, buffer, buflen );
if (status > 0)
{
  /* Data read from the socket */
}
else if (status == -1)
{
  /* Error, check errno, take action... */
}
else if (status == 0)
{
  /* Peer closed the socket, finish the close */
  close( sock );
  /* Further processing... */
}

Likewise, you can use the write API function to detect peer socket closure. In this case, the process receives the SIGPIPE signal, or, if the signal is blocked or ignored, the write function returns -1 and sets errno to EPIPE.
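The write-side check might look like the sketch below. It assumes the program chooses to ignore SIGPIPE so that a closed peer shows up as an EPIPE error instead of terminating the process; write_detect_close and its -2 return code are hypothetical, introduced only for illustration.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write to a socket and report peer closure.
 * Returns the byte count from write(), -1 on other errors (check errno),
 * or -2 if the peer has closed the connection (EPIPE). */
ssize_t write_detect_close(int sock, const void *buf, size_t len)
{
    /* Ignore SIGPIPE so that a closed peer yields EPIPE rather than
     * killing the process. A real program would do this once at startup. */
    signal(SIGPIPE, SIG_IGN);

    ssize_t n = write(sock, buf, len);
    if (n == -1 && errno == EPIPE)
        return -2;  /* peer closed the connection */
    return n;
}
```

Note that over TCP the first write after the peer closes may still succeed, with the error reported only on a later write, so a read that returns 0 remains the more direct indication.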

3. Address in use error (EADDRINUSE)

You can use the bind API function to bind an address (an interface and a port) to a socket endpoint. This function can be used in server settings to limit the interfaces from which connections may come. This function can also be used in client settings to limit the interfaces that should be used for outgoing connections. The most common use of bind is to associate a port number with a server and use a wildcard address (INADDR_ANY), which allows any interface to be used by incoming connections.

A common problem encountered with bind is trying to bind to a port that is already in use. The trap is that no active socket may exist, yet binding to the port is still refused (bind fails with EADDRINUSE). The cause is the TCP socket state TIME_WAIT, which persists after the socket is closed for a period that depends on the implementation (roughly 1 to 4 minutes). Once the TIME_WAIT state expires, the socket is deleted and the address can be rebound without problems.

Waiting for TIME_WAIT to end can be annoying, especially if you are developing a socket server and need to stop the server to make some changes and then restart it. Fortunately, there is a way around the restriction. The SO_REUSEADDR socket option can be applied to the socket so that the port can be bound again immediately, even while old connections are still in the TIME_WAIT state.

Consider the example in Listing 3. Before binding the address, I call setsockopt with the SO_REUSEADDR option. To allow address reuse, I set the integer parameter (on) to 1 (otherwise, it could be set to 0 to disable address reuse).

Listing 3. Using the SO_REUSEADDR socket option to avoid address in use errors

int sock, ret, on;
struct sockaddr_in servaddr;
/* Create a new stream (TCP) socket */
sock = socket( AF_INET, SOCK_STREAM, 0 );
/* Enable address reuse */
on = 1;
ret = setsockopt( sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on) );
/* Allow connections to port 45000 from any available interface */
memset( &servaddr, 0, sizeof(servaddr) );
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl( INADDR_ANY );
servaddr.sin_port = htons( 45000 );
/* Bind to the address (interface/port) */
ret = bind( sock, (struct sockaddr *)&servaddr, sizeof(servaddr) );

The bind API function will allow immediate reuse of addresses after applying the SO_REUSEADDR option.

4. Sending structured data

Sockets are the perfect tool for sending unstructured binary byte streams or ASCII data streams (such as web pages over HTTP, or email over SMTP). But if you try to send structured binary data over a socket, things get more complicated.

Let's say you want to send an integer: can you be sure that the recipient will interpret the integer in the same way? Applications running on the same architecture can rely on their common platform to interpret this type of data the same way. But what happens if a client running on a big-endian IBM PowerPC sends a 32-bit integer to a little-endian Intel x86? The difference in byte order will cause the value to be interpreted incorrectly.

Byte swapping or not?

Endianness refers to the order in which the bytes of a multi-byte value are arranged in memory. Big endian stores the most significant byte first, whereas little endian stores the least significant byte first.

Big-endian architectures (such as PowerPC®) have an advantage over little-endian architectures (such as the Intel® Pentium® series) in that network byte order is big-endian. This means that, for big-endian machines, control data within TCP/IP is naturally ordered. Little-endian architectures require byte swapping, a slight performance penalty for network applications.
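The standard htonl and ntohl functions convert 32-bit values between host and network byte order, so the same code works on both kinds of machine. The sketch below is illustrative only; put_u32 and get_u32 are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit value in network byte order */
void put_u32(unsigned char *out, uint32_t value)
{
    uint32_t be = htonl(value);   /* host order -> network (big-endian) */
    memcpy(out, &be, sizeof be);
}

/* Hypothetical helper: load a 32-bit value stored in network byte order */
uint32_t get_u32(const unsigned char *in)
{
    uint32_t be;
    memcpy(&be, in, sizeof be);
    return ntohl(be);             /* network order -> host order */
}
```

On a big-endian host both conversions are no-ops; on a little-endian host they swap the bytes. Either way, the bytes placed on the wire are identical.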

What about sending a C structure over a socket? Here, too, you run into trouble, because not all compilers lay out the elements of a structure in the same way: padding between elements can differ. Structures may also be packed to minimize wasted space, which changes the alignment of elements within the structure yet again.
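One common way to sidestep layout differences is to never send the structure's memory image, and instead encode each field into a byte buffer with a fixed size and byte order. The sketch below assumes a made-up message type (struct sample_msg) and a made-up 7-byte wire format; neither comes from any real protocol.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message type, for illustration only */
struct sample_msg {
    uint8_t  type;
    uint16_t length;
    uint32_t value;
};

#define SAMPLE_MSG_WIRE_SIZE 7  /* 1 + 2 + 4 bytes, no padding on the wire */

/* Encode field by field, most significant byte first */
size_t sample_msg_encode(const struct sample_msg *m, unsigned char *out)
{
    out[0] = m->type;
    out[1] = (unsigned char)(m->length >> 8);
    out[2] = (unsigned char)(m->length & 0xff);
    out[3] = (unsigned char)(m->value >> 24);
    out[4] = (unsigned char)((m->value >> 16) & 0xff);
    out[5] = (unsigned char)((m->value >> 8) & 0xff);
    out[6] = (unsigned char)(m->value & 0xff);
    return SAMPLE_MSG_WIRE_SIZE;
}

/* Decode the same layout back into a structure */
void sample_msg_decode(const unsigned char *in, struct sample_msg *m)
{
    m->type   = in[0];
    m->length = (uint16_t)((in[1] << 8) | in[2]);
    m->value  = ((uint32_t)in[3] << 24) | ((uint32_t)in[4] << 16) |
                ((uint32_t)in[5] << 8)  |  (uint32_t)in[6];
}
```

Because the wire layout is defined by the encode and decode functions and not by the compiler, sizeof(struct sample_msg) and its internal padding no longer matter to the peer.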

Fortunately, there are solutions to this problem that ensure consistent interpretation of the data on both ends. In the past, the Remote Procedure Call (RPC) suite of tools provided so-called External Data Representation (XDR). XDR defines a standard representation for data to support the development of heterogeneous network application communications.

Two more recent protocols provide similar functionality. Extensible Markup Language/Remote Procedure Call (XML-RPC) encodes procedure calls in XML and carries them over HTTP. Data and metadata are encoded in XML and transmitted as strings, which separates the values from their physical representation on the host architecture. SOAP follows XML-RPC and extends its ideas with better features and functionality.

5. Frame synchronization assumptions in TCP

TCP does not provide frame synchronization, which makes it perfect for byte stream-oriented protocols. This is an important difference between TCP and UDP (User Datagram Protocol). UDP is a message-oriented protocol that preserves message boundaries between sender and receiver. TCP is a stream-oriented protocol that assumes that the data being communicated is unstructured, as shown in Figure 1.

Figure 1. UDP's frame synchronization capabilities and TCP's lack of frame synchronization

The upper part of Figure 1 illustrates a UDP client and server. The peer on the left completes two socket writes of 100 bytes each. The UDP layer of the protocol stack keeps track of the number of writes and ensures that when the receiver on the right gets the data over the socket, it arrives with the same number of bytes. In other words, the message boundaries provided by the writer are preserved for readers.

Now, look at the bottom of Figure 1. It shows write operations of the same granularity for the TCP layer. Two separate write operations (100 bytes each) are written to the stream socket. But in this case, the stream socket reader gets 200 bytes in a single read. The TCP layer of the protocol stack has aggregated the two write operations. This aggregation can occur in the TCP/IP stack of either the sender or the receiver. It's important to note that aggregation may not occur at all: TCP only guarantees that data is delivered in order.

For most developers, this pitfall causes confusion. You want the reliability of TCP and the frame synchronization of UDP. Unless another transport protocol is used, such as Stream Control Transmission Protocol (SCTP), the application layer developer has to implement the buffering and framing functions.
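A simple way to add message boundaries on top of TCP is to prefix each message with its length. The sketch below assumes a made-up framing format (a 4-byte big-endian length followed by the payload) and blocking sockets; send_frame, recv_frame, and the two static helpers are hypothetical names, not part of any library.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly len bytes; returns 0 on success, -1 on error */
static int send_exact(int sock, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; returns 0 on success, -1 on error or early close */
static int recv_exact(int sock, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(sock, p, len, 0);
        if (n == 0)
            return -1;          /* peer closed in the middle of a message */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one message: 4-byte big-endian length, then the payload */
int send_frame(int sock, const void *payload, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (send_exact(sock, &hdr, sizeof hdr) == -1)
        return -1;
    return send_exact(sock, payload, len);
}

/* Receive one message into buf; returns the payload length, or -1 */
long recv_frame(int sock, void *buf, uint32_t bufsize)
{
    uint32_t hdr;
    if (recv_exact(sock, &hdr, sizeof hdr) == -1)
        return -1;
    uint32_t len = ntohl(hdr);
    if (len > bufsize)
        return -1;              /* message too large for the caller's buffer */
    if (recv_exact(sock, buf, len) == -1)
        return -1;
    return (long)len;
}
```

With this scheme, two 100-byte messages written back to back are returned by two separate recv_frame calls, regardless of how TCP aggregates or splits the underlying bytes.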

Tools for debugging socket applications

GNU/Linux provides several tools that can help you identify problems in your socket applications. In addition, using these tools is educational and can help explain the behavior of applications and the TCP/IP stack. Here you'll see an overview of several of them.

View network subsystem details

The netstat tool lets you inspect the GNU/Linux network subsystem. Using netstat, you can view currently active connections (by individual protocol), view connections in a specific state (such as server sockets in the listening state), and much more. Listing 4 shows some of the options provided by netstat and the features they enable.

Listing 4. Usage patterns for the netstat utility

View all TCP sockets currently active
$ netstat --tcp
View all UDP sockets
$ netstat --udp
View all TCP sockets in the listening state
$ netstat --listening
View the multicast group membership information
$ netstat --groups
Display the list of masqueraded connections
$ netstat --masquerade
View statistics for each protocol
$ netstat --statistics

Although many other utilities exist, netstat is comprehensive and covers the functionality of route, ifconfig, and other standard GNU/Linux tools.

Monitor traffic

Several tools for GNU/Linux can be used to examine low-level traffic on the network. The tcpdump tool is an older tool that "sniffs" packets from the network and prints them to stdout or records them in a file. This allows you to view the traffic generated by applications as well as the low-level flow control traffic generated by TCP. A newer tool called tcpflow complements tcpdump by providing protocol flow analysis and the means to reconstruct the data flow properly regardless of packet order or retransmission. Listing 5 shows several usage patterns for tcpdump.

Listing 5. Usage patterns of the tcpdump tool

Display all traffic on the eth0 interface for the local host
$ tcpdump -l -i eth0
Show all traffic on the network coming from or going to host plato
$ tcpdump host plato
Show all HTTP traffic for host camus
$ tcpdump 'host camus and (port http)'
View traffic coming from or going to TCP port 45000 on the local host
$ tcpdump tcp port 45000

The tcpdump and tcpflow tools have a large number of options, including the ability to create complex filtering expressions.

Both tcpdump and tcpflow are text-based command line tools. If you prefer a graphical user interface (GUI), there is an open source tool called Ethereal (since renamed Wireshark) that may suit your needs. Ethereal is a professional-grade protocol analyzer that can help debug application layer protocols. Its plug-in architecture can dissect protocols such as HTTP and just about any other protocol you can think of (637 in total at the time of writing).

Summary

Sockets programming is easy and fun, but you want to avoid introducing bugs, or at least make them easy to find. Keeping in mind the five common pitfalls described in this article and following standard defensive programming practices will help with both. GNU/Linux tools and utilities can also help track down subtle problems in your programs.

Original author: Learn embedded together