A Blog Society Series (2) - Custom types in C: structure, bit field, enumeration, union

Table of contents

 Preface

1. Structure

1.1. Declaration of structure type

1.2. Special structure type declaration

1.3. Self-reference of structure

1.4. Definition and initialization of structure

1.5. Calling structure member variables

1.6. Structure memory alignment 

1.6.1. offsetof

1.6.2. Calculation of structure size

1.6.3. Why does memory alignment exist? 

1.7. Modify the default alignment number

1.8. Structure parameter passing

2. Bit fields

2.1. What is a bit field?

2.2. Memory allocation of bit fields

2.3. Cross-platform issues in bit fields

2.4. Application of bit fields

3. Enumeration

3.1. Definition of enumeration types

3.2. Advantages of enumeration

4. Union

4.1. Definition of union type

4.2. Characteristics of the union

4.3. Calculation of union size

 Preface

C's built-in types include char, short, int, long, long long, float, and double.

These built-in types cannot describe everything; real life is full of more complex objects.

For example, describing a person takes a name, gender, age, height, weight, and so on.

Describing a book takes a title, author, publisher, and so on.

To model such objects, C supports custom types: the structure, bit field, enumeration, and union that this post covers.

1. Structure

A structure is a user-defined data type that groups multiple related data items into a single unit.

An array is a collection of elements of the same type, while a structure can contain data of different types, such as integers, characters, floating-point values, arrays, and pointers. Each item in a structure is called a member variable.

1.1. Declaration of structure type

  • struct: the structure keyword
  • tag: the custom type name
  • member-list: the list of members
  • variable-list: the structure variable names
struct tag
{
	member-list;
}variable-list;

For example, to describe a student:

struct Stu
{
	char name[20]; // name
	int age;       // age
	char sex[5];   // gender
	char id[20];   // student ID
} s1, s2, s3; // the semicolon is required; s1, s2, s3 are three global structure variables

int main()
{
	struct Stu s4, s5, s6; // s4, s5, s6 are three local structure variables
	return 0;
}

1.2. Special structure type declaration

An anonymous structure type is declared without a type name (tag), and its variables (s1 here) are created as part of the declaration.

Because there is no type name to refer to later, variables of an anonymous structure type can only be created at the point of declaration.

struct
{
	char name[20]; // name
	int age;       // age
	char sex[5];   // gender
	char id[20];   // student ID
} s1; // the semicolon is required

[Common pitfall]

Is the following code valid?

struct 
{
	char name[20];
	int age;
	char sex[5];
	char id[20];
}s1;

struct
{
	char name[20];
	int age;
	char sex[5];
	char id[20];
}* p;

int main()
{
	p = &s1;  // is this valid?
	return 0;
}

[Answer]

No. Although the two structures have identical members, the compiler treats the two anonymous declarations as two distinct types, so it reports the assignment as an incompatible pointer conversion (a warning on many compilers).

1.3. Self-reference of structure

Can a structure contain a member whose type is the structure itself?

struct Node
{
	int data;
	struct Node next;
};

One way to reason about it: if a structure could contain itself directly, sizeof would have to be able to compute its size, because anything stored in memory has a definite size. Here the size cannot be determined, since a Node would contain a Node, which would contain another Node, and so on without end.

Indeed, the compiler rejects this declaration with an error (struct Node is still an incomplete type at the point where next is declared), which confirms that this form of self-reference is invalid.

[Correct structure self-reference]

The size of a pointer is fixed, so a structure can refer to itself through a pointer member.

#include <stdio.h>

struct Node
{
	int data;
	struct Node* next; // pointer to the same structure type
};

int main()
{
	printf("%zu\n", sizeof(struct Node)); // sizeof yields a size_t, so print it with %zu
	return 0;
}

1.4. Definition and initialization of structure

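#include <stddef.h> // NULL

struct Point
{
	int x;
	int y;
} p1;
// declares the type and defines the variable p1 at the same time
struct Point p2;
// defines the structure variable p2
// initialization: giving a variable its initial value when defining it
struct Point p3 = { 1, 2 };

struct Stu    // type declaration
{
	char name[15]; // name
	int age;       // age
};
struct Stu s = { "zhangsan", 20 }; // initialization

struct Node
{
	int data;
	struct Point p;
	struct Node* next;
} n1 = { 10, {4,5}, NULL };
// nested structure initialization
struct Node n2 = { 20, {5, 6}, NULL }; // nested structure initialization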

1.5. Calling structure member variables

  • Structure variable name.member variable name
  • Structure pointer->member variable name

1.6. Structure memory alignment 

  • We have mastered the basic use of structures.
  • Now let's delve into a problem: calculating the size of a structure.
  • This is also a particularly popular test point: structure memory alignment

If the member variables of two structures are the same, will their sizes be the same?

#include <stdio.h>

struct S1
{
	char c1;
	int i;
	char c2;
};

struct S2
{
	char c1;
	char c2;
	int i;
};

int main()
{
	printf("%zu\n", sizeof(struct S1));   // what is the size of this structure?
	printf("%zu\n", sizeof(struct S2));   // what is the size of this structure?
	return 0;
}

[Output]

Unexpectedly, S1 is 12 bytes and S2 is 8 bytes: the sizes differ even though the members are the same. Why? To find out, we introduce the offsetof macro and use it to see where each member is placed.

1.6.1. offsetof

The offsetof macro, defined in <stddef.h>, returns the offset in bytes of a structure member from the start of the structure.

[Calculate S1]

c1 and c2 each occupy one byte, and i occupies four bytes. offsetof reports the offsets of c1, i, and c2 as 0, 4, and 8.

The total size of S1 is 12, so once c1, i, and c2 are laid out at those offsets, 6 bytes are left unused: 3 after c1 and 3 after c2. Why?

[Calculate S2]

The size of S2 is 8. Laying out its members by offset (c1 at 0, c2 at 1, i at 4) leaves 2 bytes unused. Why is space wasted at all, and why does the amount differ between S1 and S2? Structure memory alignment explains both.

1.6.2. Calculation of structure size

First, you need to know the alignment rules for structures:

  1. The first member is placed at offset 0 from the start of the structure variable.
  2. Every other member is aligned to an address that is an integer multiple of a certain number, its alignment number.
     • Alignment number = the smaller of the compiler's default alignment number and the size of the member.
     • The default alignment number in VS is 8.
     • On Linux there is no default alignment number; the alignment number is the size of the member itself.
  3. The total size of the structure is an integer multiple of the maximum alignment number (every member has its own alignment number).
  4. If a structure is nested inside another, the nested structure is aligned to an integer multiple of its own maximum alignment number, and the overall size of the outer structure is an integer multiple of the largest alignment number among all members, including those of the nested structure.

Now that we know the alignment numbers, let's calculate the sizes of S1 and S2 by hand.

[Manual calculation of S1]

  • The first member, c1, is placed at offset 0.
  • The size of i is 4 and the default alignment number in VS is 8; the smaller value is 4, so i must be placed at an offset that is a multiple of 4. Three bytes are skipped (wasted) and i is placed at offset 4, where it occupies 4 bytes.
  • The size of c2 is 1 and the default alignment number in VS is 8; the smaller value is 1. Every offset is a multiple of 1, so c2 goes directly after i, at offset 8.
  • We are not done yet. The total size of the structure must be an integer multiple of the maximum alignment number. The alignment numbers of c1, i, and c2 are 1, 4, and 1, so the maximum is 4. The size so far is 9, so 3 more bytes are wasted to bring the total to 12, a multiple of 4. That completes the calculation.

[Manual calculation of S2]

  • The first member, c1, is placed at offset 0.
  • The size of c2 is 1 and the default alignment number in VS is 8; the smaller value is 1. Every offset is a multiple of 1, so c2 goes directly after c1, at offset 1.
  • The size of i is 4 and the default alignment number in VS is 8; the smaller value is 4, so i must be placed at an offset that is a multiple of 4. Two bytes are skipped (wasted) and i is placed at offset 4, where it occupies 4 bytes.
  • We are not done yet. The total size must be an integer multiple of the maximum alignment number. The alignment numbers of c1, c2, and i are 1, 1, and 4, so the maximum is 4. The size so far is 8, which is already a multiple of 4, so no more space is wasted and the size of the structure is 8.

1.6.3. Why does memory alignment exist? 

Now that we understand structure memory alignment, one question remains: why does memory alignment exist?

Most references give two reasons:

1. Platform reasons (portability):
Not all hardware platforms can access arbitrary data at arbitrary addresses; some can only fetch certain types of data at certain addresses, and otherwise raise a hardware exception.

2. Performance reasons:
Data structures (especially the stack) should be aligned on natural boundaries as much as possible. To access unaligned memory, the processor may need two memory accesses, whereas an aligned access needs only one.

In short:

Structure memory alignment trades space for time.

 

When designing a structure, how do we satisfy alignment and still save space?

Keep the members that take up little space together as much as possible. The S1 and S2 examples above have exactly the same members, yet S2 is smaller than S1 because S2 places its small members next to each other.

struct S1
{
	char c1;
	int i;
	char c2;
};                      // size of the structure: 12

struct S2
{
	char c1;
	char c2;
	int i;
};                      // size of the structure: 8

1.7. Modify the default alignment number

We have met the #pragma preprocessing directive before; here we use it to change the default alignment number.

#include <stdio.h>
#pragma pack(8) // set the default alignment number to 8

struct S1
{
	char c1;
	int i;
	char c2;
};
#pragma pack()  // cancel the setting and restore the default


#pragma pack(1) // set the default alignment number to 1
struct S2
{
	char c1;
	int i;
	char c2;
};
#pragma pack()  // cancel the setting and restore the default

int main()
{	// what is the output?
	printf("%zu\n", sizeof(struct S1));
	printf("%zu\n", sizeof(struct S2));
	return 0;
}

[Output]

The output is 12 and 6. The first result is the one we already calculated, since the alignment number is set to 8. Setting it to 1 effectively disables alignment: every offset is an integer multiple of 1, so the members are packed back to back and the size is 1 + 4 + 1 = 6.

1.8. Structure parameter passing

Which of the functions below, print1 or print2, is better?

#include <stdio.h>

struct S
{
	int data[1000];
	int num;
};
struct S s = { {1,2,3,4}, 1000 };
// pass the structure by value
void print1(struct S s)
{
	printf("%d\n", s.num);
}
// pass the address of the structure
void print2(struct S* ps)
{
	printf("%d\n", ps->num);
}
int main()
{
	print1(s);  // pass the structure
	print2(&s); // pass the address
	return 0;
}

[Answer]

print2 is preferred. When a function is called, its arguments are copied (typically pushed onto the stack), which costs both time and space.
If a structure object is passed by value and the structure is large, copying it adds considerable overhead and degrades performance. Passing the address copies only a pointer.

2. Bit fields

Bit fields exist to save space.

2.1. What is a bit field?

The "bit" in bit field means a binary bit. A bit field is declared much like a structure, with two differences:

  1. Members of a bit field must be of type int, unsigned int, or signed int. Since C99 other types are possible as well (_Bool, plus whatever integer types the implementation accepts, such as char), but they are always members of the integer family.
  2. Each member name is followed by a colon and a number, which is the member's width in bits.

#include <stdio.h>

struct A
{
	int _a : 2;  // _a occupies 2 bits
	int _b : 5;  // _b occupies 5 bits
	int _c : 10; // _c occupies 10 bits
	int _d : 30; // _d occupies 30 bits
};

int main()
{
	printf("%zu\n", sizeof(struct A));
	return 0;
}


// Hint: 1 byte equals 8 bits

Four ordinary int members would occupy 16 bytes, but struct A occupies only 8. The next section explains how memory is allocated for bit fields.

2.2. Memory allocation of bit fields

  1. Members of a bit field can be of type int, unsigned int, signed int, or char (all in the integer family).
  2. Space for a bit field is allocated as needed, 4 bytes (int) or 1 byte (char) at a time.
  3. Bit fields involve many implementation-dependent details and are not portable across platforms. Programs that care about portability should avoid them.

#include <stdio.h>

struct S
{
	char a : 3;
	char b : 4;
	char c : 5;
	char d : 4;
};


int main()
{
	struct S s = { 0 };

	s.a = 10; // 10 needs 4 bits, so it does not fit in a 3-bit member
	s.b = 12;
	s.c = 3;
	s.d = 4;

	int ret = sizeof(struct S);
	printf("%d\n", ret);
	return 0;
}

[Output] Tested in Visual Studio 2022

The result is 3 bytes.

Question: 3 + 4 + 5 + 4 = 16 bits, and 1 byte is 8 bits, so why are 2 bytes not enough?

The answer lies in how the members are laid out in memory.

a (3 bits) and b (4 bits) share the first byte, leaving 1 unused bit. c (5 bits) does not fit in that remaining bit, so a second byte is opened for it; d (4 bits) does not fit in the 3 bits left in the second byte, so a third byte is opened. In other words, when the current unit does not have enough room for the next member, the leftover bits are not used and the member is stored in a newly opened unit. That is why the result is 3 rather than 2.

2.3. Cross-platform issues in bit fields

  1. Whether a plain int bit field is treated as signed or unsigned is not fixed by the standard; it depends on the implementation.
  2. The maximum width of a bit field is not fixed. (It is 16 on a 16-bit machine and 32 on a 32-bit machine, so a width of 27 causes problems on a 16-bit machine.)
  3. Whether members are allocated from left to right or from right to left within a unit is not defined by the standard.
  4. When a structure contains two bit-field members and the second is too large to fit in the bits left over after the first, it is up to the implementation whether the leftover bits are discarded or used.

Summary:

A bit field achieves the same effect as a structure while saving a good deal of space, but it brings cross-platform problems.

2.4. Application of bit fields

Network protocol stacks: the lower layers of the network that transmit data.

Sending data over the network is now commonplace. Have you ever wondered what is actually transmitted when we send a text message or a WeChat message? Is it only the message itself? Of course not. Even the simplest message carries a lot of other data, such as the time it was sent, the sender's IP address, and the recipient's IP address. Without bit fields, all this extra data would make each message larger, which increases network load and the storage required on servers. Bit fields pack this data tightly, making each message smaller and lighter.

3. Enumeration

Enumeration, as the name suggests, means listing all possible values one by one.

For example, in everyday life:

  • A week has exactly 7 days, Monday through Sunday, which can be listed one by one.
  • Gender can be male, female, or undisclosed, which can also be listed one by one.
  • A year has 12 months, which can also be listed one by one.

3.1. Definition of enumeration types

The enum Day, enum Sex, and enum Color defined below are all enumeration types.
The names inside {} are the possible values of the enumeration type, also called enumeration constants.

enum Day // days of the week
{
	Mon,   // enumeration constants start from 0 by default
	Tues,
	Wed,
	Thur,
	Fri,
	Sat,
	Sun
};

enum Sex // gender
{
	MALE,
	FEMALE,
	SECRET
};

enum Color // colors
{
	RED,
	GREEN,
	BLUE
};

Each enumeration constant has a value, starting from 0 by default and increasing by 1 each time. An explicit value can also be assigned in the definition.

For example:

enum Color // colors
{
    RED=1,
    GREEN=2,
    BLUE=4
};

3.2. Advantages of enumeration 

We can already define constants with #define, so why use enumerations?

Advantages of enumerations:

  1. They make code more readable and maintainable.
  2. Unlike identifiers defined with #define, enumeration constants have a type, which allows stricter checking.
  3. They prevent naming pollution (encapsulation).
  4. They are easier to debug.
  5. They are convenient: several constants can be defined at once.

4. Union

4.1. Definition of union type

A union is also a special custom type.
A variable of this type contains a series of members, and its defining characteristic is that these members share the same memory space.
For example:

union Un
{
	char c;
	int i;
};

4.2. Characteristics of the union

The members of a union share the same memory space, so a union variable is at least as large as its largest member (it must be able to hold that member). Because the space is shared, only one member can be in use at any given time.

#include <stdio.h>

union Un
{
	char c;
	int i;
};

int main()
{
	union Un un;
	printf("%zu\n", sizeof(un));
	printf("%p\n", (void*)&(un));   // %p expects a void*
	printf("%p\n", (void*)&(un.c));
	printf("%p\n", (void*)&(un.i));
	return 0;
}

[Output]

The size is 4 (where int is 4 bytes), and the three printed addresses are identical: un, un.c, and un.i all start at the same location.

 

4.3. Calculation of union size

  • The size of a union is at least the size of its largest member.
  • When the size of the largest member is not an integer multiple of the maximum alignment number, the union's size is rounded up to an integer multiple of the maximum alignment number.

#include <stdio.h>

union Un
{
	char c[5];  // size 5, alignment number 1
	int i;      // size 4, alignment number 4
};

int main()
{
	printf("%zu\n", sizeof(union Un));
	return 0;
}

[Output]

The result is 8. The largest member is 5 bytes, but the maximum alignment number is 4, so the size is rounded up to 8.

If you think the author's writing is good, please give the blogger a big like and support. Your support is my biggest motivation for updating!


Origin blog.csdn.net/zzzzzhxxx/article/details/133302515