C language 0-length array (variable array/flexible array)

Zero-length array concept:

GCC extends standard C/C++ with a number of practical features, and Arrays of Length Zero is one of the best-known of these extensions.

It is most often used to build variable-length structures, which are defined as follows:

struct Packet
{
    int state;
    int len;
    char cData[0]; // the zero-length array here is what gives the structure its variable length
};

First, an explanation of 0-length arrays. They are often called flexible arrays, although strictly speaking the "flexible array member" is the C99 form, written cData[] instead of cData[0]:

  • Purpose: the main purpose of an array of length 0 is to support variable-length structures.

  • Usage: declare an array of length 0 as the last member of a structure to make the structure variable in length. For the compiler, the array takes up no space at this point: the array name occupies no storage of its own, it is only an offset within the structure, and the name stands for an unmodifiable address constant (note: an array name is not a pointer variable!). The actual size of the array is decided when we allocate the structure dynamically.

Note: If the structure is generated by dynamic allocation methods such as calloc, malloc or new, the corresponding space should be released when it is not needed.

Advantages: this method is more efficient than declaring a pointer member in the structure and allocating its target separately. Accessing the array contents needs no indirection, so the extra memory load of the pointer is avoided, and a single allocation replaces two.

Disadvantages: the array of length 0 must be declared as the last member of the structure, and there are restrictions on how such a structure can be used (for example, sizeof does not include the array contents).

To repeat the key point: for the compiler the array name is just a symbol. It takes up no space and only represents an offset within the structure, an unmodifiable address constant.
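The "just an offset" claim can be checked directly. The following is a minimal sketch, not part of the original article: it uses the C99 spelling cData[] so that it also compiles under -pedantic, and the helper names packet_data_by_offset and packet_data_is_plain_offset are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* C99 flexible array member; GNU C's `char cData[0]` is laid out the same way. */
struct Packet {
    int  state;
    int  len;
    char cData[];
};

/* Address of the trailing array, computed by hand as "base + offset". */
static char *packet_data_by_offset(struct Packet *p)
{
    assert(p != NULL);
    return (char *)p + offsetof(struct Packet, cData);
}

/* Returns 1 when the array name yields exactly that hand-computed address. */
static int packet_data_is_plain_offset(struct Packet *p)
{
    return p->cData == packet_data_by_offset(p);
}
```

On common ABIs sizeof(struct Packet) equals 2 * sizeof(int), which matches the statement that the array itself adds no storage.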

Uses of 0-length arrays:

Imagine the data buffer used in network communication. The buffer has a len field and a data field, which hold the length of the data and the data being transmitted. There are several common designs:

  • Fixed-length data buffer, set a data buffer of sufficient size MAX_LENGTH

  • Set a pointer to the actual data, and dynamically open up the data buffer space according to the length of the data each time it is used

We compare them as they would be used in a real program. The main considerations are how the buffer space is allocated, released, and accessed.

1. Fixed-length packet (allocation, release, access):

For example, to send 1024 bytes of data in a fixed-length packet whose length MAX_LENGTH is 2048, 1024 bytes of space are wasted, and so is the network traffic if the whole packet is sent:

  • Data structure definition:

//  fixed-length buffer
struct max_buffer
{
    int     len;
    char    data[MAX_LENGTH];
};
  • Data structure size: taking alignment into account, the size of the data structure >= sizeof(int) + sizeof(char) * MAX_LENGTH

To avoid overflow, the data array in a fixed-length packet is generally made large enough for the largest possible payload, so in many cases the data array in max_buffer is not full and the remainder is wasted.

  • Constructing the packet: suppose we want to send CURR_LENGTH = 1024 bytes. In general we allocate the buffer and work with a pointer to the max_buffer structure:

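    ///  allocate
    if ((mbuffer = (struct max_buffer *)malloc(sizeof(struct max_buffer))) != NULL)
    {
        mbuffer->len = CURR_LENGTH;
        /* copy only the literal (with its '\0'); asking memcpy for CURR_LENGTH bytes would read past the end of it */
        memcpy(mbuffer->data, "Hello World", sizeof("Hello World"));

        printf("%d, %s\n", mbuffer->len, mbuffer->data);
    }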
  • Access: the memory is divided into two parts. The first 4 bytes, mbuffer->len, are the header (the extra part), which describes the length of the data part that immediately follows. Here that is 1024, so the first four bytes hold 1024; since the amount of valid data varies from packet to packet, a variable is needed to record it, and that is the job of len. The memory right after the header is the real data part, reached through mbuffer->data. Finally, memcpy() copies the data to be sent into this memory.

  • Release: when the buffer is no longer needed, it can be released directly with a single free():


    /// release
    free(mbuffer);
    mbuffer = NULL;

2. Summary:

  • A fixed-length array is used as the data buffer. To avoid buffer overflow the array is generally sized MAX_LENGTH, but in practice very little data reaches MAX_LENGTH, so most of the buffer space is wasted most of the time.

  • On the other hand it is very simple to use: allocating and releasing the space takes one call each, and the programmer has no extra bookkeeping to do.
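The trade-off summarized above (overflow safety paid for with wasted space) can be made concrete. This is only a sketch under assumptions: the helpers max_buffer_fill and max_buffer_wasted are hypothetical names, not part of the original program, and MAX_LENGTH is set to 1024 as in the full listing later in the article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LENGTH 1024

struct max_buffer {
    int  len;
    char data[MAX_LENGTH];
};

/* Copy n bytes into the fixed buffer; returns 0 on success, -1 if n does not fit. */
static int max_buffer_fill(struct max_buffer *buf, const void *src, size_t n)
{
    assert(buf != NULL && src != NULL);
    if (n > MAX_LENGTH)
        return -1;              /* refuse instead of overflowing data[] */
    memcpy(buf->data, src, n);
    buf->len = (int)n;
    return 0;
}

/* Bytes of data[] left unused by the current payload. */
static size_t max_buffer_wasted(const struct max_buffer *buf)
{
    return MAX_LENGTH - (size_t)buf->len;
}
```

Filling the buffer with the 12 bytes of "Hello World" (including the terminator) leaves MAX_LENGTH - 12 bytes unused, which is the waste the summary refers to.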

3. Pointer data packet (allocation, release, access):

If the fixed-length array of MAX_LENGTH is replaced with a pointer, and CURR_LENGTH bytes are allocated dynamically each time, the MAX_LENGTH - CURR_LENGTH bytes of waste are avoided, at the cost of one pointer field:

  • Packet definition:

struct point_buffer
{
    int     len;
    char    *data;
};
  • Data structure size: taking alignment into account, the size of the data structure >= sizeof(int) + sizeof(char *)

  • Space allocation: the price is that allocating the memory now takes two steps:

    // =====================
    // pointer buffer: size - allocate - release
    // =====================
    ///  size
    printf("the length of struct test3:%zu\n", sizeof(struct point_buffer));
    ///  allocate
    if ((pbuffer = (struct point_buffer *)malloc(sizeof(struct point_buffer))) != NULL)
    {
        pbuffer->len = CURR_LENGTH;
        if ((pbuffer->data = (char *)malloc(sizeof(char) * CURR_LENGTH)) != NULL)
        {
            /* copy only the literal (with its '\0'); CURR_LENGTH bytes would read past the end of it */
            memcpy(pbuffer->data, "Hello World", sizeof("Hello World"));

            printf("%d, %s\n", pbuffer->len, pbuffer->data);
        }
    }

First, memory is allocated for the structure itself; second, memory is allocated for the data that its pointer member refers to.

The two allocations are not contiguous and have to be managed separately. With an array of length 0, by contrast, everything is allocated at once in a single block.

  • Release: the same applies in reverse when releasing, data field first:

    /// release
    free(pbuffer->data);
    free(pbuffer);
    pbuffer = NULL;
  • Summary:

    • A buffer built on a pointer only spends one pointer's worth of space and needs no MAX_LENGTH array, so little space is wasted.

    • However, allocating it requires a second allocation for the data field, and releasing it requires explicitly freeing the data field first. In practice the space is often allocated inside a function that returns the struct point_buffer pointer to the caller. We cannot assume that the caller knows these details and releases the space in the agreed way, so this design is inconvenient to use and can easily leak memory.
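One common way to reduce the leak risk described above is to hide both steps behind a matched pair of functions. The sketch below is an assumption about how that could look, not code from the original article; point_buffer_create and point_buffer_destroy are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct point_buffer {
    int   len;
    char *data;
};

/* Two allocations: one for the structure, one for the data it points to. */
static struct point_buffer *point_buffer_create(const void *src, size_t n)
{
    struct point_buffer *buf;

    assert(src != NULL && n > 0);
    buf = malloc(sizeof(*buf));
    if (buf == NULL)
        return NULL;
    buf->data = malloc(n);
    if (buf->data == NULL) {
        free(buf);              /* do not leak the structure on partial failure */
        return NULL;
    }
    memcpy(buf->data, src, n);
    buf->len = (int)n;
    return buf;
}

/* Two releases, in the reverse order: data field first, then the structure. */
static void point_buffer_destroy(struct point_buffer *buf)
{
    if (buf == NULL)
        return;
    free(buf->data);
    free(buf);
}
```

Even with such a wrapper, the caller still has to know that plain free() is not enough, which is the inconvenience the summary points out.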

4. Variable-length data buffer (allocation, release, access)

The fixed-length array is easy to use but wastes space. The pointer form costs only one extra pointer and wastes little space, but needs several allocations and releases. Is there an approach that neither wastes space nor is awkward to use?

GNU C's 0-length array, also loosely called a variable-length or flexible array (not to be confused with C99 variable-length arrays such as int a[n]), is exactly such an extension. With it, structures such as buffers and data packets are easy to construct:

  • Data structure definition:


//  0-length array
struct zero_buffer
{
    int     len;
    char    data[0];
};
  • Data structure size: such structures are often used in network communication to build variable-length packets. They waste neither space nor network traffic, because char data[0]; is only an array name and occupies no storage:

sizeof(struct zero_buffer) = sizeof(int)
  • Allocation: when we use it, a single allocation is enough:

    ///  allocate
    if ((zbuffer = (struct zero_buffer *)malloc(sizeof(struct zero_buffer) + sizeof(char) * CURR_LENGTH)) != NULL)
    {
        zbuffer->len = CURR_LENGTH;
        /* copy only the literal (with its '\0'); CURR_LENGTH bytes would read past the end of it */
        memcpy(zbuffer->data, "Hello World", sizeof("Hello World"));

        printf("%d, %s\n", zbuffer->len, zbuffer->data);
    }
  • Release: releasing is just as simple, one call to free():

    ///  release
    free(zbuffer);
    zbuffer = NULL;
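The single-allocation pattern can be wrapped in a small constructor. This is a minimal sketch under assumptions: zero_buffer_create is a hypothetical helper, and the structure uses the C99 spelling data[] rather than the GNU data[0] so that it compiles with -pedantic as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct zero_buffer {
    int  len;
    char data[];    /* C99 spelling; GNU C also accepts data[0] */
};

/* One malloc covers the header and n bytes of payload right behind it. */
static struct zero_buffer *zero_buffer_create(const void *src, size_t n)
{
    struct zero_buffer *buf;

    assert(src != NULL || n == 0);
    buf = malloc(sizeof(*buf) + n);
    if (buf == NULL)
        return NULL;
    buf->len = (int)n;
    if (n > 0)
        memcpy(buf->data, src, n);
    return buf;
}
```

Because there is only one block, the caller releases it with a plain free(), with no need to know anything about the layout.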

Putting it all together:

// zero_length_array.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // memcpy


#define MAX_LENGTH      1024
#define CURR_LENGTH      512

//  0-length array
struct zero_buffer
{
    int     len;
    char    data[0];
}__attribute((packed));


//  fixed-length array
struct max_buffer
{
    int     len;
    char    data[MAX_LENGTH];
}__attribute((packed));


//  pointer to separately allocated data
struct point_buffer
{
    int     len;
    char    *data;
}__attribute((packed));

int main(void)
{
    struct zero_buffer  *zbuffer = NULL;
    struct max_buffer   *mbuffer = NULL;
    struct point_buffer *pbuffer = NULL;


    // =====================
    // 0-length array: size - allocate - release
    // =====================
    ///  size
    printf("the length of struct test1:%zu\n", sizeof(struct zero_buffer));
    ///  allocate
    if ((zbuffer = (struct zero_buffer *)malloc(sizeof(struct zero_buffer) + sizeof(char) * CURR_LENGTH)) != NULL)
    {
        zbuffer->len = CURR_LENGTH;
        /* copy only the literal (with its '\0'); CURR_LENGTH bytes would read past the end of it */
        memcpy(zbuffer->data, "Hello World", sizeof("Hello World"));

        printf("%d, %s\n", zbuffer->len, zbuffer->data);
    }
    ///  release
    free(zbuffer);
    zbuffer = NULL;


    // =====================
    // fixed-length array: size - allocate - release
    // =====================
    ///  size
    printf("the length of struct test2:%zu\n", sizeof(struct max_buffer));
    ///  allocate
    if ((mbuffer = (struct max_buffer *)malloc(sizeof(struct max_buffer))) != NULL)
    {
        mbuffer->len = CURR_LENGTH;
        /* copy only the literal (with its '\0'); CURR_LENGTH bytes would read past the end of it */
        memcpy(mbuffer->data, "Hello World", sizeof("Hello World"));

        printf("%d, %s\n", mbuffer->len, mbuffer->data);
    }
    /// release
    free(mbuffer);
    mbuffer = NULL;

    // =====================
    // pointer buffer: size - allocate - release
    // =====================
    ///  size
    printf("the length of struct test3:%zu\n", sizeof(struct point_buffer));
    ///  allocate
    if ((pbuffer = (struct point_buffer *)malloc(sizeof(struct point_buffer))) != NULL)
    {
        pbuffer->len = CURR_LENGTH;
        if ((pbuffer->data = (char *)malloc(sizeof(char) * CURR_LENGTH)) != NULL)
        {
            /* copy only the literal (with its '\0'); CURR_LENGTH bytes would read past the end of it */
            memcpy(pbuffer->data, "Hello World", sizeof("Hello World"));

            printf("%d, %s\n", pbuffer->len, pbuffer->data);
        }
    }
    /// release
    if (pbuffer != NULL)
    {
        free(pbuffer->data);   // release the data field first
        free(pbuffer);
    }
    pbuffer = NULL;


    return EXIT_SUCCESS;
}

Support for zero-length arrays in the GNU documentation:

References:


6.17 Arrays of Length Zero

C Struct Hack – Structure with variable length array

ISO C has never allowed 0-length arrays: they are a GNU C extension, so compilers that do not implement the extension will reject them. For the extensions added by GNU C, GCC provides compilation options that flag them explicitly:

  • -pedantic: issues the warnings required by strict ISO C wherever extended syntax is used

  • -Wall: enables a large set of commonly useful warnings

  • -Werror: asks GCC to treat all warnings as errors

// 1.c
#include <stdio.h>
#include <stdlib.h>


int main(void)
{
    char a[0];
    printf("%zu", sizeof(a));
    return EXIT_SUCCESS;
}

Let's compile:

gcc 1.c -Wall   # show the common warnings
# no warnings or errors

gcc 1.c -Wall -pedantic  # also warn about GNU C extensions
1.c: In function ‘main’:
1.c:7: warning: ISO C forbids zero-size array ‘a’


gcc 1.c -Werror -Wall -pedantic # show the warnings, including those for GNU C extensions, and report them as errors
cc1: warnings being treated as errors
1.c: In function ‘main’:
1.c:7: error: ISO C forbids zero-size array ‘a’

The 0-length array is really an array used in a flexible way: its name refers to the contiguous memory that follows the structure:

struct buffer
{
    int     len;
    char    data[0];
};

Before the 0-length array was introduced, the problem was solved with fixed-length arrays or pointers, but:

  • A fixed-length array defines a sufficiently large buffer. It is convenient to use, but wastes space every time.

  • The pointer approach requires several free() calls when releasing the space. Buffers are often returned from functions as pointers, and we cannot guarantee that every caller understands and follows our release procedure.

So GNU C added the 0-length array as an extension. With data[0], the 0-length array serves only as an array name and occupies no storage.

C99 added a similar feature, the flexible array member, but it is written char payload[]. So if you really need to compile with -pedantic, change char payload[0] to char payload[] and the code will compile cleanly. Of course, your compiler must support the C99 standard; a very old compiler may not.

// 2.c payload
#include <stdio.h>
#include <stdlib.h>

struct payload
{
    int   len;
    char  data[];
};

int main(void)
{
    struct payload pay;
    printf("%zu", sizeof(pay));
    return EXIT_SUCCESS;
}

Compiling with -pedantic produces no warning, which shows that this syntax is standard C:

gcc 2.c -pedantic -std=c99

So the end of the structure refers to the memory that follows it. We can therefore use this kind of structure as the header format of a data message, with the last member standing for the data content itself.
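Because header and data sit in one contiguous block, a whole message can be handed to a send routine in one piece. The sketch below only illustrates that idea: payload_new, payload_wire_size and payload_flatten are hypothetical helpers, and a memcpy into a byte array stands in for the actual network call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct payload {
    int  len;
    char data[];
};

/* Build a message in one allocation; the caller releases it with free(). */
static struct payload *payload_new(const void *src, int len)
{
    struct payload *p;

    assert(src != NULL && len > 0);
    p = malloc(sizeof(*p) + (size_t)len);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->data, src, (size_t)len);
    return p;
}

/* Contiguous bytes occupied by the header plus len bytes of data. */
static size_t payload_wire_size(const struct payload *p)
{
    assert(p != NULL && p->len >= 0);
    return offsetof(struct payload, data) + (size_t)p->len;
}

/* Copy the whole message with one memcpy, the way a single send call would
 * transmit it; returns the byte count, or 0 if out[] is too small. */
static size_t payload_flatten(const struct payload *p, unsigned char *out, size_t cap)
{
    size_t n = payload_wire_size(p);

    if (n > cap)
        return 0;
    memcpy(out, p, n);
    return n;
}
```

A real protocol would also have to settle byte order and integer width for the len field; this sketch leaves the header in the host's native representation.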

The GNU manual also gives two more structures that make the meaning easier to understand:

struct f1 {
    int x;
    int y[];
} f1 = { 1, { 2, 3, 4 } };

struct f2 {
    struct f1 f1;
    int data[3];
} f2 = { { 1 }, { 5, 6, 7 } };

I changed the 2, 3, 4 in f2 to 5, 6, 7 to make the difference visible. Printing the data of f1 gives:

f1.x = 1
f1.y[0] = 2
f1.y[1] = 3
f1.y[2] = 4

That is, f1.y refers to the {2, 3, 4} data stored right after f1.x. By the same reasoning, with GCC's layout the data that f2.f1.y refers to is exactly the content of f2.data. The printed data:


f2.f1.x = 1
f2.f1.y[0] = 5
f2.f1.y[1] = 6
f2.f1.y[2] = 7

If you are not sure whether the array takes up space, check with sizeof: sizeof(struct f1) = 4, so int y[] really occupies no space. However, the flexible array must be the last member of the structure. If it is not, compilation fails with the following error:

main.c:37:9: error: flexible array member not at end of struct
     int y[];
         ^

At this point you may wonder: what if int y[] in struct f1 is replaced with int *y? This comes down to the difference between arrays and pointers. Sometimes the two behave the same, and sometimes they do not.

The first thing to note is that the 0-length array extension is about arrays, so an int *y pointer cannot stand in for it. The sizeof results differ. Change struct f1 to this:

struct f3 {
    int x;
    int *y;
};

An int is 4 bytes on both 32-bit and 64-bit platforms, so sizeof(struct f1) = 4 on both.

Because int *y is a pointer, it is 8 bytes in a 64-bit environment, where sizeof(struct f3) = 16 (4 bytes for x, 4 bytes of padding, 8 bytes for y). In a 32-bit environment sizeof(struct f3) is 8. sizeof(struct f1) is the same in both. So int *y cannot replace int y[].

The code is as follows:

// 3.c
#include <stdio.h>
#include <stdlib.h>


struct f1 {
    int x;
    int y[];
} f1 = { 1, { 2, 3, 4 } };

struct f2 {
    struct f1 f1;
    int data[3];
} f2 = { { 1 }, { 5, 6, 7 } };


struct f3
{
    int x;
    int *y;
};

int main(void)
{
    printf("sizeof(f1) = %zu\n", sizeof(struct f1));
    printf("sizeof(f2) = %zu\n", sizeof(struct f2));
    printf("sizeof(f3) = %zu\n\n", sizeof(struct f3));

    printf("f1.x = %d\n", f1.x);
    printf("f1.y[0] = %d\n", f1.y[0]);
    printf("f1.y[1] = %d\n", f1.y[1]);
    printf("f1.y[2] = %d\n", f1.y[2]);


    printf("f2.f1.x = %d\n", f2.f1.x);
    printf("f2.f1.y[0] = %d\n", f2.f1.y[0]);
    printf("f2.f1.y[1] = %d\n", f2.f1.y[1]);
    printf("f2.f1.y[2] = %d\n", f2.f1.y[2]);

    return EXIT_SUCCESS;
}

Other characteristics of 0-length arrays:

1. Why does a 0-length array not take up storage space:

What is the difference between a 0-length array and pointer implementation? Why does a 0-length array not take up storage space?

In fact, it essentially involves the difference between an array and a pointer in the C language. Is the a in char a[1] the same as the b in char *b?

"Programming Abstractions in C" (Roberts, ES, Machinery Industry Press, 2004.6) page 82 says:

“arr is defined to be identical to &arr[0]”.

In other words, in most expressions the a in char a[1] behaves as a constant equal to &a[0], whereas char *b declares a real pointer variable b. So a = b is not allowed, but b = a is. Both support subscript access, so is there any difference between a[0] and b[0]? An example makes it clear.
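Before turning to the crash demonstration, the asymmetry between a and b can be shown in a few lines. This is a minimal sketch with made-up function names (array_object_size, pointer_can_alias_array), not code from the original article.

```c
#include <assert.h>
#include <stddef.h>

/* Under sizeof the array name is the whole array object, not a pointer. */
static size_t array_object_size(void)
{
    char a[8];
    return sizeof(a);           /* size of the array, not of a char * */
}

/* Returns 1 if, after b = a, b[0] and a[0] name the same byte. */
static int pointer_can_alias_array(void)
{
    char a[1] = { 'x' };
    char *b = a;                /* allowed: a decays to &a[0] */
    /* a = b;  would not compile: an array is not a modifiable lvalue */
    assert(b == &a[0]);
    b[0] = 'y';
    return a[0] == 'y';
}
```

So b[0] and a[0] can refer to the same byte, but only b is a variable that can be reassigned.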

See the following two programs, gdb_zero_length_array.c and gdb_pzero_length_array.c:


//  gdb_zero_length_array.c
#include <stdio.h>
#include <stdlib.h>

struct str
{
    int len;
    char s[0];
};

struct foo
{
    struct str *a;
};

int main(void)
{
    struct foo f = { NULL };

    printf("sizeof(struct str) = %d\n", sizeof(struct str));

    printf("before f.a->s.\n");
    if(f.a->s)
    {
        printf("before printf f.a->s.\n");
        printf(f.a->s);
        printf("after printf f.a->s.\n");
    }

    return EXIT_SUCCESS;
}

 

//  gdb_pzero_length_array.c
#include <stdio.h>
#include <stdlib.h>

struct str
{
    int len;
    char *s;
};

struct foo
{
    struct str *a;
};

int main(void)
{
    struct foo f = { NULL };

    printf("sizeof(struct str) = %d\n", sizeof(struct str));

    printf("before f.a->s.\n");

    if (f.a->s)
    {
        printf("before printf f.a->s.\n");
        printf(f.a->s);
        printf("after printf f.a->s.\n");
    }

    return EXIT_SUCCESS;
}

Both programs dereference a NULL pointer and crash, but the segmentation fault occurs at a different place in each.

We compile the two programs to assembly and then diff the results to see how the generated code differs:

gcc -S gdb_zero_length_array.c -o gdb_test.s
gcc -S gdb_pzero_length_array.c -o gdb_ptest.s
diff gdb_test.s gdb_ptest.s

1c1
<   .file   "gdb_zero_length_array.c"
---
>   .file   "gdb_pzero_length_array.c"
23c23
<   movl    $4, %esi
---
>   movl    $16, %esi
30c30
<   addq    $4, %rax
---
>   movq    8(%rax), %rax
36c36
<   addq    $4, %rax
---
>   movq    8(%rax), %rax
#    printf("sizeof(struct str) = %d\n", sizeof(struct str));
23c23
<   movl    $4, %esi    #printf("sizeof(struct str) = %d\n", sizeof(struct str));
---
>   movl    $16, %esi  #printf("sizeof(struct str) = %d\n", sizeof(struct str));

From the 64-bit assembly we can see that the size of the structure with the 0-length array is 4, while the size of the structure with the pointer is 16:

f.a->s
30c30/36c36
<   addq    $4, %rax
---
>   movq    8(%rax), %rax

We can see that:

  • For char s[0], the assembly code uses the addq instruction: addq $4, %rax

  • For char *s, the assembly code uses the movq instruction: movq 8(%rax), %rax

addq computes %rax + sizeof(struct str): the end of the str structure is the address of char s[0], and this step only obtains that address. movq, on the other hand, loads the content stored at the address. The address computation is sometimes compiled to an lea instruction instead, as in the next listing.

So accessing a member array by name merely computes the array's address relative to the structure, while accessing a member pointer reads the content stored at that relative address (just as for any other non-array member):

  • Computing a relative address does not crash the program, but reading the contents of an invalid address does. In the unoptimized builds shown here, this is why the array version gets past the if test and only crashes inside printf, while the pointer version already crashes at the if test.
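The same distinction can be written out in C with a valid (non-NULL) pointer, so that nothing crashes. This is only a sketch: the struct names str_arr and str_ptr and both helper functions are hypothetical, and the array member uses the C99 spelling s[].

```c
#include <assert.h>
#include <stddef.h>

struct str_arr { int len; char s[]; };   /* array member (C99 spelling of s[0]) */
struct str_ptr { int len; char *s; };    /* pointer member */

/* For the array member, p->s is computed from p alone: no memory is read. */
static const char *arr_member_address(const struct str_arr *p)
{
    assert(p != NULL);
    return (const char *)p + offsetof(struct str_arr, s);
}

/* For the pointer member, p->s is a value that has to be loaded from *p. */
static const char *ptr_member_value(const struct str_ptr *p)
{
    assert(p != NULL);
    return p->s;
}
```

The first function corresponds to the addq/lea case (pure address arithmetic), the second to the movq case (a load from memory).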

// 4-1.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{

    char *a;
    printf("%p\n", a);

    return EXIT_SUCCESS;
}

// 4-2.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{

    char a[0];
    printf("%p\n", a);

    return EXIT_SUCCESS;
}
  • For char a[0], the assembly code uses the leal instruction: leal 16(%esp), %eax

  • For char *a, the assembly code uses the movl instruction: movl 28(%esp), %eax

2. Address optimization:

// 5-1.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{

    char a[0];
    printf("%p\n", a);

    char b[0];
    printf("%p\n", b);

    return EXIT_SUCCESS;
}

Since zero-length arrays are a GNU C extension and are not allowed by the C standard, the results of cleverly written but unusual code depend on the compiler's implementation and optimization strategy.

For example, in the code above the compiler may place a and b at the same address, because a[0] and b[0] are unusable by the program. What does this remind us of?

For identical string constants, the compiler often merges them into a single address to save space:

//  5-2.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{

    const char *a = "Hello";
    printf("%p\n", a);

    const char *b = "Hello";
    printf("%p\n", b);

    const char c[] = "Hello";
    printf("%p\n", c);

    return EXIT_SUCCESS;
}


Origin blog.csdn.net/yx5666/article/details/129589003