"C Traps and Pitfalls" - Chapter 1: Lexical Pitfalls

The part of the compiler responsible for breaking the program into symbols (tokens) is generally called the lexical analyzer, or "lexer".

Building a C program is divided into three stages: preprocessing, compilation, and linking. The compilation stage converts the preprocessed C code into assembly code.

In C, whitespace between symbols (including spaces, tabs, or newlines) is ignored.

1.1 = is different from ==

Note: = is the assignment operator and == is the comparison operator. Don't write == where = is meant, and don't write = where == is meant.

Some C compilers issue a warning when an expression of the form e1 = e2 appears in the condition of an if or loop statement. When we really do need to assign one variable to another and then check whether the new value is 0, we should not simply silence the warning (usually by wrapping the assignment in an extra pair of parentheses, or by turning the warning option off). Instead, the comparison should be written explicitly. That is, the following example

if(x = y)
	foo();

should be written as:

if((x = y) != 0)
	foo();

This way of writing also makes the intent of the code clear at a glance.

1.2 & and | are different from && and ||

Note: & and | are bitwise operators, while && and || are logical operators. Do not confuse them: both forms are valid C, so the compiler will usually not give any hint.

1.3 The "greedy method" in lexical analysis

General rule: each symbol should contain as many characters as possible.

Note: With the exception of strings and character constants, a symbol cannot contain embedded whitespace (spaces, tabs, or newlines).

For example, == is a single symbol, while = = is two symbols. The expression

a---b

has the same meaning as

a -- - b

but a different meaning from

a - -- b

Similarly, if / is the first character read for the next symbol and it is immediately followed by *, then regardless of context the two characters are treated as the single symbol /*, which marks the beginning of a comment.

For example:

y = x/*p;      // p points to the divisor

Here /* is taken as the beginning of a comment, and the compiler keeps reading characters until */ appears. In other words, the statement simply assigns the value of x to y and never looks at p. To express the intended division, the statement should be rewritten as:

y = x/(*p);

1.4 Integer constants

Integer constants that begin with 0 are treated as octal. Therefore, 10 and 010 have very different meanings: 010 is eight. In addition, many C compilers accept 8 and 9 as "octal" digits. For example, such a compiler reads 0195 as 1*8^2 + 9*8^1 + 5*8^0, which is 141 (decimal) or 0215 (octal). We certainly do not recommend writing constants this way, and the ANSI C standard prohibits it.

Note: Sometimes we may inadvertently write a decimal number as an octal number for the sake of format alignment in the context.

1.5 Characters and Strings

Distinguish between single quotes ('') and double quotes ("").

A character enclosed in single quotes actually represents an integer, often its ASCII value.

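A string enclosed in double quotes represents a pointer to the first character of an unnamed array that holds the characters between the quotes followed by a terminating '\0'. This array is typically stored in a constant data area.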

What does 'yes' mean?

Its meaning is not precisely defined by the language, but most compilers treat it as a single integer value obtained by combining the values of 'y', 'e', and 's' in a way that depends on the particular compiler implementation.

There are generally three processing methods:

  1. The extra characters are ignored, and the resulting integer value is that of the first character.

  2. Each character overwrites the previous one in turn, and the resulting integer value is that of the last character. VS6.0 and GCC v2.95 behave this way.

  3. The characters in '' are stored one byte at a time in the memory occupied by the integer variable. An example is given below (compile environment: VS2019):

    #include <stdio.h>

    int main(void)
    {
    	/* Multi-character constant: its value depends on the compiler. */
    	int a = '1234';
    	printf("%d\n", a);
    	return 0;
    }
    

    Inspecting the variable a in the debugger shows that, in VS2019, the characters in '' are stored one byte at a time in the memory occupied by the integer variable.

Exercises

  1. Q: Why does n-->0 mean n-- > 0 instead of n- -> 0?

    Answer: According to the greedy method, each symbol should contain as many characters as possible. The compiler has therefore already taken -- as a single symbol before it reads >.

  2. Q: What is the meaning of a+++++b?

    Answer: The only meaningful way to parse this expression is:

    a ++ + ++ b

    However, according to the greedy method, the expression must be decomposed into:

    a ++ ++ + b

    which is equivalent to:

    ((a++)++) + b

    This is not a valid expression: the result of a++ is not an lvalue, so the compiler will not accept it as the operand of the second ++ operator. The greedy rule therefore makes a+++++b a compile-time error. To get the first meaning, it must be written with explicit spaces.

Origin blog.csdn.net/m0_57304511/article/details/123408436