Zero-length array

What is a zero-length array

As the name implies, a zero-length array is an array of length zero.

The ANSI C standard (C89) requires the length in an array definition to be a constant expression, so the length of the array is fixed at compile time. An array is defined in ANSI C as follows:

int a[10];

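The C99 standard additionally allows variable-length arrays, whose length is given by an ordinary variable inside a function:

int len = 10;   /* len must hold a positive value before a is declared */
int a[len];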

That is to say, the length of the array is determined not at compile time but at run time, and it can even be supplied by the user. For example, we can read the length from input when the program runs and then fill the array with input data. The sample code is as follows.

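#include <stdio.h>

int main(void)
{
    int len;

    printf("input array len:");
    /* a variable-length array needs a positive length, so reject bad input */
    if (scanf("%d", &len) != 1 || len <= 0)
        return 1;
    int a[len];

    for (int i = 0; i < len; i++)
    {
        printf("a[%d]= ", i);
        if (scanf("%d", &a[i]) != 1)
            return 1;
    }

    printf("a array print:\n");
    for (int i = 0; i < len; i++)
        printf("a[%d] = %d\n", i, a[i]);

    return 0;
}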

In this program, we define a variable len as the length of the array. After the program runs, we can specify the length of the array by input and initialize it, and finally print the elements of the array. The running result of the program is as follows:

input array len:3
a[0]= 6
a[1]= 7
a[2]= 8
a array print:
a[0] = 6
a[1] = 7
a[2] = 8
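
Because the length of a variable-length array is only known at run time, sizeof applied to such an array is also evaluated at run time. The following is a minimal sketch of that behavior; the function name vla_bytes is a hypothetical helper used only for illustration, and it assumes a compiler that supports C99 variable-length arrays and a positive len.

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes occupied by a
 * variable-length array of `len` ints. `len` must be positive. */
size_t vla_bytes(int len)
{
    int a[len];          /* the length is fixed only when this line executes */
    return sizeof(a);    /* for a variable-length array, sizeof is computed at run time */
}
```

Calling vla_bytes(3) yields 3 * sizeof(int), and calling vla_bytes(10) yields 10 * sizeof(int), even though both calls go through the same declaration.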

GNU C is not content with variable-length arrays and goes one step further: it also supports zero-length arrays. If we define a zero-length array in a program, compilers other than GCC may refuse to compile it or may issue a warning. A zero-length array is defined as follows:

int a[0];

The strange thing about a zero-length array is that it takes up no storage space. We can use the sizeof operator to check how much memory a zero-length array occupies. The code is as follows.

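#include <stdio.h>

int buffer[0];   /* zero-length array, a GNU C extension */

int main(void)
{
    printf("%zu\n", sizeof(buffer));   /* sizeof yields a size_t, so print it with %zu */
    return 0;
}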

In this program, we define a zero-length array and use sizeof to check its size. The output is 0: a zero-length array occupies no space in memory.

Zero-length arrays are rarely used on their own. They are usually the last member of a structure, where they form a variable-length structure.

#include <stdio.h>

struct buffer{
    int len;
    int a[0];
};
int main(void)
{
    /* sizeof yields a size_t, so print it with %zu */
    printf("%zu\n", sizeof(struct buffer));
    return 0;
}

A zero-length array occupies no storage space inside a structure either, so the size of the buffer structure is 4, the size of its member len on platforms where int is 4 bytes.
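
One way to confirm that the member a sits exactly at the end of the structure and adds nothing to its size is to compare its offset with the size of the whole structure. The sketch below assumes a compiler that accepts the GNU C zero-length array extension; trailing_bytes is a hypothetical helper name.

```c
#include <stddef.h>

struct buffer {
    int len;
    int a[0];   /* GNU C extension: zero-length array */
};

/* Hypothetical helper: number of bytes between the start of member `a`
 * and the end of the structure. */
size_t trailing_bytes(void)
{
    return sizeof(struct buffer) - offsetof(struct buffer, a);
}
```

Here trailing_bytes() returns 0: the member a begins where the structure ends, so any memory reached through a lies beyond the structure itself.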

Example of using a zero-length array

Zero-length arrays usually appear inside variable-length structures, which programmers use in certain special applications. In a variable-length structure, the zero-length array takes up no space in the structure itself, yet we can use the structure member a to access the memory that follows the structure, which is very convenient. An example of using a variable-length structure is as follows.

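#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct buffer{
    int len;
    int a[0];
};
int main(void)
{
    struct buffer *buf;
    buf = malloc(sizeof(struct buffer) + 20);
    if (buf == NULL)
        return 1;

    buf->len = 20;
    /* a is declared as an int array, so cast it to char * to store a string */
    strcpy((char *)buf->a, "hello wanglitao!\n");
    puts((char *)buf->a);

    free(buf);
    return 0;
}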

In this program, we use malloc to request a block of memory of size sizeof(struct buffer) + 20, which is 24 bytes. The first 4 bytes hold the structure that the pointer buf points to (its len member), and the remaining 20 bytes are the memory we actually use. We can access this memory directly through the structure member a.
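
The same layout can be wrapped in a small helper that sizes the payload to fit the data exactly. This is only a sketch under stated assumptions: buffer_from_string is a hypothetical name, the payload member is declared as char a[0] instead of int a[0] so that no cast is needed, and the compiler is assumed to accept the GNU C zero-length array extension.

```c
#include <stdlib.h>
#include <string.h>

struct buffer {
    int len;      /* payload size in bytes */
    char a[0];    /* GNU C zero-length array: the payload starts here */
};

/* Hypothetical helper: allocates the header and the payload as one block
 * and copies `text` (including its terminating NUL) into the payload.
 * Returns NULL on failure; the caller releases the block with free(). */
struct buffer *buffer_from_string(const char *text)
{
    size_t n = strlen(text) + 1;
    struct buffer *buf = malloc(sizeof(*buf) + n);

    if (buf == NULL)
        return NULL;
    buf->len = (int)n;
    memcpy(buf->a, text, n);
    return buf;
}
```

After buf = buffer_from_string("hello"), the address buf->a equals (char *)buf + sizeof(struct buffer): the payload follows the header directly inside the same malloc block, so a single free(buf) releases both.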

With this flexible way of allocating dynamic memory, the buffer represented by the buffer structure can be sized as needed, large or small. This is very useful in some situations. For example, many online video sites now support playback at multiple quality levels: standard definition, high definition, ultra clear, 1080P, Blu-ray, and even 4K. If our local program needs to allocate a buffer in memory to hold the decoded video data, the required buffer size differs from one quality level to another. If we always allocate memory for 4K, then playing a standard-definition video does not need such a large buffer and most of the memory is wasted. With a variable-length structure, we can allocate a buffer sized according to the user's playback setting, which saves a great deal of memory.
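
As a rough illustration of the video case, the sketch below sizes one frame buffer from the chosen resolution. Everything in it is an assumption made for the example: the names frame_buffer and frame_buffer_alloc, the idea that one buffer holds exactly one decoded frame, and the bytes_per_pixel parameter. A real player would size its buffers according to its decoder.

```c
#include <stdlib.h>

struct frame_buffer {
    size_t len;              /* payload size in bytes */
    unsigned char data[0];   /* GNU C zero-length array: pixel data follows */
};

/* Hypothetical helper: allocates a buffer for one decoded frame.
 * This sketch does no overflow checking on the size calculation.
 * Returns NULL on failure; the caller releases the block with free(). */
struct frame_buffer *frame_buffer_alloc(size_t width, size_t height,
                                        size_t bytes_per_pixel)
{
    size_t len = width * height * bytes_per_pixel;
    struct frame_buffer *fb = malloc(sizeof(*fb) + len);

    if (fb == NULL)
        return NULL;
    fb->len = len;
    return fb;
}
```

With 2 bytes per pixel, frame_buffer_alloc(640, 480, 2) requests a 614400-byte payload, while frame_buffer_alloc(3840, 2160, 2) requests 16588800 bytes. The same structure type serves both, and each playback setting pays only for what it needs.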

The use of zero-length arrays in the kernel

In the kernel, zero-length arrays usually appear as part of variable-length structures. Here we analyze the USB driver in the Linux kernel. Anyone who has worked with network card drivers is probably familiar with the socket buffer, which is used to transmit network data packets. The USB driver has a similar object called the URB, short for USB request block, which is used to transfer USB data packets.

struct urb {
    struct kref kref;
    void *hcpriv;
    atomic_t use_count;
    atomic_t reject;
    int unlinked;

    struct list_head urb_list;
    struct list_head anchor_list;
    struct usb_anchor *anchor;
    struct usb_device *dev;
    struct usb_host_endpoint *ep;
    unsigned int pipe;
    unsigned int stream_id;
    int status;
    unsigned int transfer_flags;
    void *transfer_buffer;
    dma_addr_t transfer_dma;
    struct scatterlist *sg;
    int num_mapped_sgs;
    int num_sgs;
    u32 transfer_buffer_length;
    u32 actual_length;
    unsigned char *setup_packet;
    dma_addr_t setup_dma;
    int start_frame;
    int number_of_packets;
    int interval;

    int error_count;
    void *context;
    usb_complete_t complete;
    struct usb_iso_packet_descriptor iso_frame_desc[0];
};

This structure defines the transfer direction, transfer address, transfer size, and transfer mode of a USB data packet. We will not delve into these details; we only look at the last member:

struct usb_iso_packet_descriptor iso_frame_desc[0];

At the end of the URB structure, a zero-length array is defined, mainly for USB isochronous transfers. USB has four transfer types: interrupt, control, bulk, and isochronous. Different USB devices have different requirements for transfer speed and data integrity, so they use different transfer types. A USB camera has strict real-time requirements for video or images but can tolerate lost data: if a frame is dropped, it simply carries on with the next one. So USB cameras use the isochronous transfer type.

Open the manual of a USB camera sold on Taobao today and you will generally find that it supports multiple resolutions, from 16 * 16 up to HD 720P. For video at different resolutions, the size and number of USB data packets needed to carry one frame of image data differ. How should USB be designed so that it adapts to transfers of different sizes without affecting the other USB transfer types? The answer lies in the zero-length array in this structure.

When users choose different resolutions for video, USB needs data packets of different sizes and in different numbers to transfer one frame of video data. The variable-length structure formed with a zero-length array meets this requirement: memory can be allocated flexibly according to the size of one frame of image data, so transfers of different sizes are all supported. At the same time, the zero-length array takes up no space in the structure itself. When USB uses one of the other transfer types, nothing is affected, and the zero-length array can be treated as if it did not exist. So I have to say that this design is really wonderful!
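
The allocation pattern described above can be sketched in user space as follows. The types here are simplified stand-ins, not the real kernel definitions: mock_urb, iso_packet_descriptor, mock_urb_size, and mock_urb_alloc are hypothetical names, and malloc-family calls take the place of the kernel allocator. In the kernel itself, usb_alloc_urb() takes the number of isochronous packets as a parameter for the same reason.

```c
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct usb_iso_packet_descriptor. */
struct iso_packet_descriptor {
    unsigned int offset;
    unsigned int length;
    unsigned int actual_length;
    int status;
};

/* Simplified stand-in for struct urb: only the fields needed here. */
struct mock_urb {
    int number_of_packets;
    struct iso_packet_descriptor iso_frame_desc[0];   /* GNU C zero-length array */
};

/* Bytes needed for a mock URB carrying `iso_packets` descriptors. */
size_t mock_urb_size(int iso_packets)
{
    return sizeof(struct mock_urb)
         + (size_t)iso_packets * sizeof(struct iso_packet_descriptor);
}

/* Hypothetical allocator: the header and descriptors share one zeroed block.
 * Returns NULL on failure; the caller releases the block with free(). */
struct mock_urb *mock_urb_alloc(int iso_packets)
{
    struct mock_urb *urb;

    if (iso_packets < 0)
        return NULL;
    urb = calloc(1, mock_urb_size(iso_packets));
    if (urb == NULL)
        return NULL;
    urb->number_of_packets = iso_packets;
    return urb;
}
```

For a non-isochronous transfer, mock_urb_alloc(0) allocates just sizeof(struct mock_urb) bytes and the descriptor array costs nothing. For an isochronous transfer with 8 packets, mock_urb_alloc(8) allocates room for iso_frame_desc[0] through iso_frame_desc[7] in the same block.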

Thinking: why not use pointers instead of zero-length arrays?

You may often come across the statement that an array name passed as a function argument is equivalent to a pointer. We must not be misled by this. When an array name is passed as a function argument, what is passed is indeed an address, but an array name is never a pointer, and the two are not the same thing. An array name denotes the address of a contiguous block of memory, whereas a pointer is a variable: the compiler must allocate separate storage for it, and that storage holds the address of the variable it points to. Let's look at the program below.

#include <stdio.h>

struct buffer1{
    int len;
    int a[0];
};
struct buffer2{
    int len;
    int *a;
};
int main(void)
{
    /* sizeof yields a size_t, so print it with %zu */
    printf("buffer1: %zu\n", sizeof(struct buffer1));
    printf("buffer2: %zu\n", sizeof(struct buffer2));
    return 0;
}

The results on a 32-bit platform, where both int and pointers are 4 bytes, are as follows:

buffer1: 4
buffer2: 8

For a pointer variable, the compiler must allocate separate storage for the pointer itself and then store the address of another variable in that storage; we then say that the pointer points to this variable. For an array name, by contrast, the compiler allocates no storage at all. It is just a symbol, like a function name, used to represent an address. Next we look at another program.

//hello.c
int array1[10] ={1,2,3,4,5,6,7,8,9};
int array2[0];
int *p = &array1[5];
int main(void)
{
    return 0;
}

In this program, we define an ordinary array, a zero-length array, and a pointer variable. The value of the pointer variable p is the address of the array element array1[5], which means that p points to array1[5]. We then compile this program with the ARM cross-compiler and disassemble it.

$ arm-linux-gnueabi-gcc hello.c -o a.out
$ arm-linux-gnueabi-objdump -D a.out

From the assembly code generated by disassembly, we find the assembly code of array1 and pointer variable p.

00021024 <array1>:
   21024:    00000001    andeq   r0, r0, r1
   21028:    00000002    andeq   r0, r0, r2
   2102c:    00000003    andeq   r0, r0, r3
   21030:    00000004    andeq   r0, r0, r4
   21034:    00000005    andeq   r0, r0, r5
   21038:    00000006    andeq   r0, r0, r6
   2103c:    00000007    andeq   r0, r0, r7
   21040:    00000008    andeq   r0, r0, r8
   21044:    00000009    andeq   r0, r0, r9
   21048:    00000000    andeq   r0, r0, r0
0002104c <p>:
   2104c:    00021038    andeq   r1, r2, r8, lsr r0
Disassembly of section .bss:

00021050 <__bss_start>:
   21050:    00000000    andeq   r0, r0, r0

From the disassembly, we can see that for array1[10], an array of length 10, the compiler allocated 40 bytes of storage at addresses 0x21024 through 0x21048 (ten 4-byte words), but allocated no storage for the array name array1 itself. The name array1 only represents the start address of these 40 contiguous bytes, that is, the address of the array element array1[0]. For the zero-length array array2[0], the compiler allocates no storage at all; array2 is just a symbol that represents an address in memory. We can inspect the symbol table of the executable a.out to find this address.

$ readelf -s  a.out
    88: 00021024    40 OBJECT  GLOBAL DEFAULT   23 array1
    89: 00021054     0 NOTYPE  GLOBAL DEFAULT   24 _bss_end__
    90: 00021050     0 NOTYPE  GLOBAL DEFAULT   23 _edata
    91: 0002104c     4 OBJECT  GLOBAL DEFAULT   23 p
    92: 00010480     0 FUNC    GLOBAL DEFAULT   14 _fini
    93: 00021054     0 NOTYPE  GLOBAL DEFAULT   24 __bss_end__
    94: 0002101c     0 NOTYPE  GLOBAL DEFAULT   23 __data_start_
    96: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    97: 00021020     0 OBJECT  GLOBAL HIDDEN    23 __dso_handle
    98: 00010488     4 OBJECT  GLOBAL DEFAULT   15 _IO_stdin_used
    99: 0001041c    96 FUNC    GLOBAL DEFAULT   13 __libc_csu_init
    100: 00021054     0 OBJECT  GLOBAL DEFAULT   24 array2
    101: 00021054     0 NOTYPE  GLOBAL DEFAULT   24 _end
    102: 000102d8     0 FUNC    GLOBAL DEFAULT   13 _start
    103: 00021054     0 NOTYPE  GLOBAL DEFAULT   24 __end__
    104: 00021050     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
    105: 00010400    28 FUNC    GLOBAL DEFAULT   13 main
    107: 00021050     0 OBJECT  GLOBAL HIDDEN    23 __TMC_END__
    110: 00010294     0 FUNC    GLOBAL DEFAULT   11 _init

As the symbol table shows, the address of array2 is 0x21054, at the end of the program's bss section. The symbol array2 merely marks an address in unused memory, nothing more; the compiler never allocates storage to hold an array name. By now it should be clear that array names and pointers are not the same thing. An array name can act as an address when it is passed as a function argument, but that does not make the two equivalent. A kitchen knife can sometimes serve as a weapon, but you cannot say that a kitchen knife is a weapon.
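
The difference can also be observed from inside a program, without a disassembler. The sketch below reuses array1 and p from hello.c; the two function names are hypothetical helpers added for illustration.

```c
#include <stddef.h>

int array1[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
int *p = &array1[5];

/* Returns 1 if the address of the array object is the address of its
 * first element, i.e. the name has no storage of its own. */
int array_name_has_no_storage(void)
{
    return (void *)&array1 == (void *)&array1[0];
}

/* Returns 1 if the pointer variable is stored somewhere other than
 * the place it points to, i.e. it has storage of its own. */
int pointer_has_own_storage(void)
{
    return (void *)&p != (void *)p;
}
```

Both functions return 1. In addition, sizeof(array1) is the size of all ten elements, while sizeof(p) is only the size of one address.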

As for why pointers are not used instead, the reason is simple. A pointer itself takes up storage space. Moreover, as the analysis of the USB driver above shows, a pointer is far less elegant than a zero-length array, which adds no redundancy to the structure definition and is also very convenient to use.
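
The practical cost of the pointer version shows up when the buffers are allocated and released. The following sketch contrasts the two structures from the earlier program; the helper names buffer1_new, buffer2_new, and buffer2_free are hypothetical, and the compiler is assumed to accept the GNU C zero-length array extension.

```c
#include <stdlib.h>

struct buffer1 {
    int len;
    int a[0];      /* GNU C zero-length array: payload follows the header */
};

struct buffer2 {
    int len;
    int *a;        /* pointer: payload lives in a separate block */
};

/* One allocation covers the header and n ints; release it with free(). */
struct buffer1 *buffer1_new(int n)
{
    struct buffer1 *b;

    if (n <= 0)
        return NULL;
    b = malloc(sizeof(*b) + (size_t)n * sizeof(int));
    if (b != NULL)
        b->len = n;
    return b;
}

/* Two allocations: one for the header, one for the n ints. */
struct buffer2 *buffer2_new(int n)
{
    struct buffer2 *b;

    if (n <= 0)
        return NULL;
    b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->a = malloc((size_t)n * sizeof(int));
    if (b->a == NULL) {
        free(b);
        return NULL;
    }
    b->len = n;
    return b;
}

/* The pointer version also needs two calls to free(). */
void buffer2_free(struct buffer2 *b)
{
    if (b != NULL) {
        free(b->a);
        free(b);
    }
}
```

With buffer1 the header and data are contiguous and a single free() releases everything. With buffer2 every buffer carries an extra pointer, needs a second allocation, and must be torn down in the right order.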

Origin blog.csdn.net/s2603898260/article/details/103657907