[C language] Custom types in detail: structures and bit fields, from the basics to in-depth

Table of contents

1. Declaration of structure

1.1 Basic knowledge

1.2 Declaration of structure

1.3 Special declarations for structures

 1.4 Self-reference of structure

1.5 Definition and initialization of structure variables

1.6 Structure memory alignment

If alignment wastes so much space, why align at all?

 1.7 Modify the default alignment number

1.8 Structure parameter passing

2. Bit field

2.1 What is a bit field

2.2 Memory allocation for bit fields

2.3 In-depth study of bit field memory allocation in the VS environment

2.4 Cross-platform problems with bit fields


First, what is a custom type? The types we normally work with, such as char, short, int, float, and double, are built-in types. They are defined by the C language itself rather than created by us, and we can use them directly.

C also allows us to create types of our own, which are called custom types.

What kinds of custom types can we create?

Structure types, enumeration types, and union types. This article covers structures and the bit fields built on them.

First, let's look at structures.

1. Declaration of structure

1.1 Basic knowledge

A structure is a collection of values called member variables, and each member of a structure can have a different type.

An array is also a collection, but all of its elements have the same type. A structure is a collection whose elements may have different types.

1.2 Declaration of structure

The keyword for a structure type is struct. The tag that follows it is the structure's tag name, that is, the name of the structure.

The member-list inside the curly braces is the list of members, and the variable-list at the end is a list of variables. The general form of a structure declaration is therefore:

struct tag
{
	member-list;
} variable-list;

Next, a simple example of using a structure type.

Here we describe a student's information:

// Define a student type
struct student
{
	char name[20];
	int age;
	float weight;
};

Comparing this with the general form above: struct is the structure keyword; the tag is the structure's name, here student; and the member list holds the structure's member variables (there can be several). The variable list is shown next, together with a main function:

struct student
{
	char name[20];
	int age;
	float weight;
}s4,s5,s6;// global variables
int main()
{
    int num=0;
	struct student s1;// local variables
	struct student s2;
	struct student s3;
	return 0;
}

Declaring struct student s1 in main works just like declaring int num. Variables of a structure type can be defined inside main (s1, s2, s3) or in the variable list of the declaration (s4, s5, s6). The difference is that s1, s2, and s3 are local variables,

while s4, s5, and s6 are global variables. The variable list is therefore optional.

1.3 Special declarations for structures

When we omit the tag in a structure declaration, the result is called an anonymous structure type.

Can we use such an anonymous structure type to define local variables in main?

The answer is no, because the type has no name to refer to.

Variables of an anonymous structure type can only be defined in the variable list attached to the declaration itself. Once the declaration ends, there is no way to name the type again, so a separate definition such as a local variable in main is not possible.

Now for a slightly tricky case:

struct 
{
	char name[20];
	int age;
	float weight;
}s1;
struct
{
	char name[20];
	int age;
	float weight;
}* pa;

int main()
{
	pa = &s1;
	return 0;
}

Here we define a pointer pa to an anonymous structure type and, in main, assign the address of the anonymous structure variable s1 to it. Since the two declarations list exactly the same members, it is tempting to think that s1 and the object pa points to have the same type and that the assignment is fine.

But when we compile it, the compiler complains that the two types are incompatible (as a warning or an error, depending on the compiler).

Although the members of the two structure types are identical, the compiler treats them as two completely different types. Both of the uses above are therefore wrong ways to use an anonymous structure.

The correct usage of an anonymous structure is as follows:

An anonymous structure can only define variables at the point where the structure is declared. Because the type has no name, it cannot be used to define variables anywhere else.

1.4 Self-reference of structure

Can a structure contain a member whose type is the structure itself?

Before explaining structure self-reference, here is a brief look at some data structure background.

When we want to store the numbers 1 2 3 4 5, we can find one contiguous block of space and store them side by side, just like an array.

This contiguous storage method is called a sequential list.

The data may also be scattered in memory rather than stored contiguously.

In that case we can let 1 find 2, let 2 find 3, and so on up to 5, after which there is nothing more to find. We then only need to know the position of 1 to reach all the numbers that follow. Linking 1 2 3 4 5 together like a chain in this way is called a linked list. Both the sequential list and the linked list string their data together along a single line, so we call them linear data structures.

We call each place that stores one of the values 1 2 3 4 5 a node. Starting from one node, we can then reach the contents of all of them.

So how do we reach the next node?

The next node has the same type as the current node, and each node stores not only its own data but also its link to the next node.

A struct type is a natural fit:

struct  Node
{
	int data;// 4 bytes
	struct Node* next;// 4 or 8 bytes
};

We first define a structure type Node. Its member data, of type int, stores the node's own data, and its member next, of type struct Node*, links the node to the next one.

It is very important that next is a pointer. If a struct Node contained another struct Node directly, that member would contain yet another one, and the size of the structure could never be determined. A pointer has a fixed size, and the address it holds is enough to reach the next node's data. So the member should be a pointer to the structure type, which fixes both the size of the node and the location of the next one.

The next step is to link two nodes together:

struct  Node
{
	int data;
	struct Node* next;
};
int main()
{
	struct Node n1;
	struct Node n2;
	n1.next=&n2;
	return 0;
}

We define two structure variables n1 and n2 and assign the address of n2 to the next member of n1. Now n1 can find n2, as if a chain connected the two.

So whenever a structure needs to refer to another structure of its own type, this approach can be used.

1.5 Definition and initialization of structure variables

This is simple, so let's go straight to a code example:

struct student
{
	char name[20];
	int age;
	float weight;
}s4, s5, s6;
int main()
{
	struct student s1;
	struct student s2;
	struct student s3;
	return 0;
}

There are really only two ways:

one is to define global variables directly after the structure declaration, and the other is to define local variables inside a function such as main.

After creating a variable we should initialize it, that is, give it an initial value, just as we do for variables of other types such as int.

A structure is initialized in the same way as an array:

struct S
{
	int a;
	char c;
}s1;
int main()
{
	int arr[10] = { 1,2,3};
	struct S s2 = {100,'u'};
	return 0;
}

We first define a structure with an int member a and a char member c. In main, the array is initialized with a brace-enclosed list, and the structure is initialized the same way: the initial values of the members are written inside the braces in declaration order.

Here, 100 is assigned to a and 'u' to c.

How do we initialize a structure that contains another structure as a member?


struct S
{
	int a;
	char c;
}s1;
struct B
{
	float f;
	struct S s;
};
int main()
{
	
	struct B sb = { 3.14,{100,'u'} };
	return 0;
}

It is actually very simple: we just nest another pair of braces inside the outer braces. Stepping through in the debugger confirms that sb.f holds 3.14, sb.s.a holds 100, and sb.s.c holds 'u',

which are exactly the values we wanted.

It is also possible to initialize the members out of order, as follows:

struct S
{
	int a;
	char c;
}s1;
int main()
{
	struct S s3 = { .c = 'w',.a = 100 };
	return 0;
}

Inside the braces we write a dot, the member name, and the value to assign. These are designated initializers, available since C99. Debugging confirms the result:

s3.a is 100 and s3.c is 'w', as expected.

So how do we use the data stored in a structure?

That is also very simple:

int main()
{
	struct B sb = { 3.14,{100,'w'} };
	printf("%f %d %c",sb.f,sb.s.a,sb.s.c);
	return 0;
}

A structure member is accessed with the '.' operator, so the output uses the structure variable name followed by the member name.

The program prints 3.140000 100 w, which is what we want.

1.6 Structure memory alignment

Next comes an important topic about structures: how to calculate the size of a structure.

#include <stdio.h>

struct S1
{
	int a;
	char c1;
};
struct S2
{
	char c1;
	int a;
	char c2;
};
struct S3
{
	char c1;
	int a;
	char c2;
	char c3;
};
int main()
{
	printf("%zu\n", sizeof(struct S1));
	printf("%zu\n", sizeof(struct S2));
	printf("%zu\n", sizeof(struct S3));
	return 0;
}

Try to predict the output of this code before reading on.

With VS the output is 8, 12, and 12.

Did you get it right?

When a structure is stored in memory, its members are not packed at arbitrary addresses; each one is placed on an aligned boundary. Let's walk through S1 using offsets from the start of the structure.

Rule 1: the first member of a structure is always placed at offset 0. The int member a occupies 4 bytes, so it takes offsets 0 to 3.

Rule 2: starting from the second member, each member must be placed at an offset that is an integer multiple of its alignment number.

The alignment number is the smaller of the member's own size and the default alignment number.

The size of c1 in this example is 1.

In VS the default alignment number is 8. On a compiler that has no default alignment number (GCC on Linux is the usual example), the alignment number is simply the member's own size.

So for char c1 the member size is 1 and the default alignment number is 8; taking the smaller value, its alignment number is 1.

c1 therefore only needs to sit at a multiple of 1. Every offset is a multiple of 1, so c1 goes at the next free offset, which is 4. At this point offsets 0 to 4 are in use.

But the output above was 8, so why are only five bytes occupied?

Rule 3: once all members have been placed, the total size of the structure must be an integer multiple of the largest alignment number among its members. If it is not, padding bytes are added at the end.

The alignment number of the int member is 4 and that of the char member is 1, so the largest alignment number is 4 and the size of the structure must be a multiple of 4. The five bytes already in use are followed by three bytes of padding.

That gives 8 bytes. a and c1 actually occupy only 5 of them, and the remaining three bytes are wasted because of alignment.

Now let's swap the two members, declaring c1 first and a second. What happens?

struct S1
{
	char c1;
	int a;
};

When we write it this way,

the offsets are occupied as follows: c1 is at offset 0, offsets 1 to 3 are padding, and a occupies offsets 4 to 7.

After offset 0 is taken, offsets 1, 2, and 3 are not multiples of 4, so a has to start at offset 4.

In this case the bytes at offsets 1, 2, and 3 are wasted.

Let's verify this in the debugger.

Taking the addresses of c1 and a in the Watch window shows that there are indeed three unused addresses between the end of c1 and the start of a, that is, three bytes are wasted.

Another way to check is offsetof, which reports the offset of a member from the start of its structure.

offsetof is actually a macro rather than a function, and it yields an integer value of type size_t. Don't forget to include the header <stddef.h> when using it. The code is as follows:

#include <stdio.h>
#include <stddef.h>
struct S
{
	char c;
	int a;
};

int main()
{
	struct S s;
	printf("%zu\n", offsetof(struct S, c));
	printf("%zu\n", offsetof(struct S, a));
	return 0;
}

The offset of c is 0 and the offset of a is 4, which matches the walk-through above. The process takes some care, but it is worth working through until it makes sense.

If alignment wastes so much space, why align at all?

Most references give two reasons.

1. Platform reasons:

Not every hardware platform can access arbitrary data at arbitrary addresses. Some platforms can only fetch certain types of data at certain addresses and raise a hardware exception otherwise. On such machines data has to be placed on specific address boundaries so that it can always be fetched from a boundary, which is why it must be aligned.

2. Performance reasons:

Data structures (especially stacks) should be aligned on natural boundaries as much as possible. To access unaligned memory the processor may need two memory accesses, while an aligned access needs only one, so alignment improves efficiency.

Here is a simple example.

Suppose a 32-bit machine reads four bytes at a time, and a structure holds a char c followed by an int a. If a is not aligned, it straddles two four-byte reads, so two reads are needed to fetch it. If a is aligned, the read for a simply skips the padding bytes after c and fetches a in one access.

In general, then, structure memory alignment trades space for time.

When designing a structure, we should therefore pay attention to the order in which members are declared.

Try to group the members that occupy little space together. Space that would otherwise be wasted as padding then gets used, which saves memory to some extent.

 1.7 Modify the default alignment number

If the default alignment number of our environment does not suit us, we can change it ourselves.

This is done with the #pragma preprocessing directive, used as follows:

#pragma pack(8)

The number in the parentheses is the new default alignment number, which we can choose.

#pragma pack()

To cancel the setting, we write the directive again with empty parentheses, which restores the original default alignment number.

Let's try it with a simple example:

#include <stdio.h>

#pragma pack(1)
struct S
{
	char c1;//1
	int i;//4
	char c2;//1
};
#pragma pack()


int main()
{
	printf("%zu", sizeof(struct S));
	return 0;
}

The default alignment number was 8. After changing it to 1 the output is 6: with no padding, the structure needs only 1 + 4 + 1 bytes.

The environment provides this mechanism so that we can adapt the layout to our own needs. A brief acquaintance with it is enough for now.

1.8 Structure parameter passing

Let's go straight to the code:

#include <stdio.h>

struct S
{
	int data[1000];
	int num;
};
struct S s = { {1,2,3,4}, 1000 };
// Pass the structure by value
void print1(struct S s)
{
	printf("%d\n", s.num);
}
// Pass the address of the structure
void print2(struct S* ps)
{
	printf("%d\n", ps->num);
}
int main()
{
	print1(s); // pass the structure
	print2(&s); // pass its address
	return 0;
}

Passing a structure works basically like passing any other argument. Note that the parameter of print1 is a structure variable, while the parameter of print2 is a pointer to a structure.

Which of the print1 and print2 functions above is better?
The answer is print2.
When a function is called, its arguments are pushed onto the stack, which costs both time and space.
If a structure object is passed and the structure is large, the cost of pushing it onto the stack is considerable, which reduces performance.

There may still be a doubt. print1 receives a copy, so if it accidentally modifies its parameter s, the s passed in print1(s) is not affected. Isn't that form safer? And if print2 accidentally writes through its pointer, the caller's structure is modified. Isn't that unsafe?

In fact, we can qualify the pointed-to type with const:

void print2(const struct S* ps)
{
	printf("%d\n", ps->num);
}

With const added, the pointer can no longer be used to modify the structure it points to.

So in future code, prefer passing the address of a structure rather than the structure itself, which keeps the call efficient.

2. Bit field

Having covered structures, we now turn to bit fields, which are built on structures.

2.1 What is a bit field

A bit field is declared like a structure, with two differences:

1. The members of a bit field are usually of type int, unsigned int, or signed int.

2. Each member name is followed by a colon and a number.

For example:

struct A
{
	int a : 2;
	int b : 5;
	int c : 10;
	int d : 30;
};

This is a simple bit field declaration. It clearly resembles a structure declaration, with a few differences.

What do the colon and the number after it mean?

Let's start with the memory it occupies and go straight to the code:

#include <stdio.h>

struct A
{
	int a : 2;
	int b : 5;
	int c : 10;
	int d : 30;
};
int main()
{
	struct A sa = { 0 };
	printf("%zu\n", sizeof(sa));
	return 0;
}

Try to predict the output.

The output is 8.

The 'bit' in bit field refers to a binary digit.

That is, the number after the colon is the number of binary digits the member occupies:

struct A
{
	int a : 2;// 2 bits
	int b : 5;// 5 bits
	int c : 10;// 10 bits
	int d : 30;// 30 bits
};

As the comments show, each member is given only the stated number of bits. How does that differ from the ordinary int we learned about, which occupies 32 bits?

Suppose that when designing the structure we know the data stored in member a is very limited: it only needs to represent the four states 0, 1, 2, and 3, whose binary forms are 00, 01, 10, and 11. Two bits are enough for that. If we gave a a full 32-bit int and used only two bits, the other 30 bits would be wasted. Bit fields exist to save that space and make the members smaller. (To hold 0 to 3 reliably the member should be declared unsigned int, because a signed 2-bit field holds -2 to 1.)

In other words, if two bits are enough to express what the member needs to express, we give it two bits and no more. This is how bit fields save space.

2.2 Memory allocation for bit fields

Space for a bit field is allocated as needed in units of 4 bytes (int) or 1 byte (char).

Let's continue with the code above as an example:

struct A
{
	int a : 2;
	int b : 5;
	int c : 10;
	int d : 30;
};

The members of this bit field are of type int, so space is allocated four bytes, that is, 32 bits, at a time. a uses 2 bits, leaving 30; b uses 5 bits, leaving 25; c uses 10 bits, leaving 15. Finally d needs 30 bits, and the remaining 15 are not enough. So another int of 4 bytes, or 32 bits, is allocated; d uses 30 of them and 2 are left. The question is whether the 15 bits left over from the first int are still used.

That depends entirely on the compiler.

The C standard does not specify whether that space is used.

Bit fields therefore involve many uncertainties. They are not portable across platforms, and programs that care about portability should avoid them.

This does not affect the final result here, however. Whether or not d uses the 15 leftover bits, a second four-byte unit has been allocated, so this bit field occupies 8 bytes, which matches the output above.

2.3 In-depth study of bit field memory allocation in the VS environment

Next we look at the direction in which bits are allocated and how space is used on the VS platform.

Again we use code as an example:

struct S
{
	char a : 3;
	char b : 4;
	char c : 5;
	char d : 4;
};
int main()
{
	struct S s = { 0 };
	s.a = 10;
	s.b = 12;
	s.c = 3;
	s.d = 4;
	return 0;
}

The members here are of type char, which occupies one byte, so space is allocated one byte at a time. a occupies three bits, b four bits, c five bits, and d four bits. In main, struct S s = { 0 } first sets every bit of s to 0, and then a, b, c, and d are assigned in turn.

Let's first work out how the bit field is laid out.

Since the members are all of type char, one byte, that is, eight bits, is allocated at a time.

Assume first that bits within a byte are handed out from the low-order end to the high-order end. a takes 3 bits, leaving 5; b takes 4 bits, leaving 1. Assume further that leftover bits that are too few for the next member are not used. Then a new byte of eight bits is allocated for c, which uses 5 bits and leaves 3. That is not enough for d, so another byte is allocated and d takes 4 bits of it.

So far we have allocated three bytes in total.

Is this what actually happens?

Now look at the code in main.

It initializes the whole structure to 0 and then stores values in the members so that we can verify the storage order.

int main()
{
	struct S s = { 0 };
	s.a = 10;
	s.b = 12;
	s.c = 3;
	s.d = 4;
	return 0;
}

First, 10 is stored in a. The binary form of 10 is 1010, but a holds only 3 bits, so 010 is stored.

Second, 12 is stored in b. The binary form of 12 is 1100 and b holds 4 bits, so 1100 is stored.

Third, 3 is stored in c. The binary form of 3 is 11, but c holds five bits, so three 0s are added in front and 00011 is stored.

Fourth, 4 is stored in d. The binary form of 4 is 100 and d holds 4 bits, so one 0 is added in front and 0100 is stored.

This layout rests on two assumptions:

1. The bits of each allocated unit are used from right to left (low-order bits first).

2. When the bits remaining in a byte are not enough for the next member, they are wasted and a new byte is allocated.

To verify this we debug the program. For easier reading, we convert the binary contents to hexadecimal before comparing them with memory.

The three bytes are 0110 0010, 0000 0011, and 0000 0100 in binary, that is, 62 03 04 in hexadecimal.

Debugging shows

that memory holds 62 03 04, exactly as we assumed.

So a compiler like VS allocates bit fields in the way described above.

However, this does not show that other compilers store bit field members in the same way.

2.4 Cross-platform problems with bit fields

1. Whether an int bit field is treated as signed or unsigned is implementation-defined.

The behavior differs from platform to platform.
2. The maximum width of a bit field is not fixed. (It is at most 16 on a 16-bit machine and at most 32 on a 32-bit machine, so a width of 27 causes problems on a 16-bit machine.)

That is, when we give an int member a large width, for example int b: 30;

whether 30 is a valid width is open to question, and it may fail on a 16-bit machine.
3. Whether the members of a bit field are allocated from left to right or from right to left in memory is not defined by the standard.

4. When a structure contains two bit fields and the second is too large to fit in the bits left over by the first, it is not defined whether the leftover bits are discarded or used.

Given these problems, bit fields are not portable across platforms, so programs that care about portability should avoid them. Summary: compared with structures, bit fields can achieve the same effect and save space very well, but they have cross-platform issues.

That is everything I wanted to share this time. I hope it helps. If it did, please like, favorite, and share. Thank you for reading.


Origin blog.csdn.net/wangduduniubi/article/details/129657607