How to simplify C++ code

I am a minimalist, and I advocate a concise, clear coding style. This may also be why I don't like the whole Java ecosystem... Of course, the simplicity I am talking about assumes no loss of readability, that is, it must not hurt the code's own expressiveness. Obscuring the code just to make it shorter is not advisable either.

This article introduces 10 tips. More will be added in future updates, so stay tuned!

1. Make good use of emplace

C++11 introduced emplace (in-place construction) semantics for STL containers such as vector, map, unordered_map, and even stack and queue.

The convenience of emplace is that it constructs the object in place from the function arguments, instead of requiring an already constructed object the way vector's push_back and map's insert do.

For example, suppose we have this class:

class Point {
public:
    Point(int x, int y):_x(x),_y(y){}
private:
    int _x;
    int _y;
};

Before C++11, you would write roughly:

std::vector<Point> vp;
std::map<std::string, Point> mp;

Point p(1, 2);
vp.push_back(p);
vp.push_back(Point(3, 4));

Point p1(10, 20);
mp.insert(std::pair<std::string, Point>("key1", p1));
Point p2(100, 200);
mp.insert(std::make_pair("key2", p2));

After C++11:

std::vector<Point> vp;
std::map<std::string, Point> mp;

vp.emplace_back(1, 2);
vp.emplace_back(3, 4);

Point p1(10, 20);
Point p2(100, 200);
mp.emplace("key1", p1);
mp.emplace("key2", p2);

Note that there is no need to use emplace_back blindly. For example, when you already have a Point object and need to put it into the vector:

// At this point you already have a Point object p, so there is nothing to construct.
vp.push_back(p);
vp.emplace_back(p);

In this case the two forms perform almost identically (push_back is shorter, though that hardly matters). I have seen old projects replace every push_back with emplace_back after upgrading to C++11. That is harmless, but often unnecessary.

Of course, when you need to construct the object from arguments, emplace_back is clearly more concise. Even then, apart from being more verbose, push_back costs little more than emplace_back, because

vp.push_back(Point(3, 4));

calls this overload:

void push_back (value_type&& val);

Some careful readers pointed out that emplace_back is still slightly cheaper than push_back(value_type&& val), since it constructs the element in place while push_back constructs a temporary and then moves it. The two functions are implemented differently, so their performance cannot be exactly identical, but the gap is not large enough to dictate coding habits. In short: when the object to be put into the vector does not exist yet, construct it directly with emplace_back; when it already exists, either emplace_back or push_back will do.
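To make the difference concrete, here is a minimal sketch that counts constructor calls. Tracked, Counts, and count_ops are hypothetical names made up for this illustration, and reserve() is called first so that reallocation does not distort the counts.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instrumented type (not from the article): it counts how each
// instance came into being.
struct Tracked {
    static int constructed;  // built from (x, y)
    static int copied;       // copy constructor calls
    static int moved;        // move constructor calls

    Tracked(int x, int y) : _x(x), _y(y) { ++constructed; }
    Tracked(const Tracked& o) : _x(o._x), _y(o._y) { ++copied; }
    Tracked(Tracked&& o) noexcept : _x(o._x), _y(o._y) { ++moved; }

    int _x;
    int _y;
};
int Tracked::constructed = 0;
int Tracked::copied = 0;
int Tracked::moved = 0;

struct Counts {
    int constructed;
    int copied;
    int moved;
};

// Inserts one element built from (3, 4) and reports what the insertion cost.
Counts count_ops(bool use_emplace) {
    Tracked::constructed = Tracked::copied = Tracked::moved = 0;
    std::vector<Tracked> v;
    v.reserve(4);  // no reallocation, so only the insertion itself is counted
    if (use_emplace) {
        v.emplace_back(3, 4);        // constructed directly inside the vector
    } else {
        v.push_back(Tracked(3, 4));  // temporary first, then moved into the vector
    }
    assert(v.size() == 1);
    return Counts{Tracked::constructed, Tracked::copied, Tracked::moved};
}
```

With this sketch, count_ops(true) reports one construction and no move, while count_ops(false) reports one construction plus one move. For a type as small as Point, that extra move is negligible.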

2. Use auto without affecting readability, distinguish between auto& and auto&&

auto needs little explanation.

Many C++ programmers, when asked "Are you familiar with C++11? Tell me about it", answer "auto" and then have nothing more to say.

auto exists to shorten long type names (for example, types buried in deeply nested namespaces). auto& and auto&& (a universal reference) need little explanation either.

Of course, misusing auto also makes code less readable. For programmers like me who write C++ in vim without an IDE, abused auto is a nightmare: there are no type hints.
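Since the heading asks us to distinguish the three forms, here is a small sketch of what each one deduces. The function name auto_example and the sample strings are made up for illustration; the static_assert lines state the deduced types.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Returns the vector after the modifications so the effect can be observed.
std::vector<std::string> auto_example() {
    std::vector<std::string> names{"ann", "bob"};

    auto copy = names[0];   // std::string: a copy, names[0] is not affected
    copy += "!";

    auto& ref = names[0];   // std::string&: an alias of names[0]
    ref += "?";

    auto&& fwd = names[1];  // bound to an lvalue, so this is std::string&
    fwd += "#";

    auto&& tmp = std::string("temp");  // bound to a temporary: std::string&&
    (void)tmp;

    static_assert(std::is_same<decltype(copy), std::string>::value,
                  "plain auto drops the reference");
    static_assert(std::is_same<decltype(ref), std::string&>::value,
                  "auto& is an lvalue reference");
    static_assert(std::is_same<decltype(fwd), std::string&>::value,
                  "auto&& bound to an lvalue collapses to an lvalue reference");
    static_assert(std::is_same<decltype(tmp), std::string&&>::value,
                  "auto&& bound to a temporary is an rvalue reference");

    assert(copy == "ann!");
    return names;
}
```

Only the changes made through ref and fwd reach the container; the change made to copy does not.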

3. lambda expressions replace handwritten functions and function objects

When C++ programmers are asked "Are you familiar with C++11?", lambda expressions (or lambda objects) are probably the second new feature they name, right after auto.

With lambdas, the functions in the STL's algorithm header become more concise to use.

Besides replacing ordinary functions and function objects (classes that overload operator()), lambdas offer another convenience: they are closures. If the term closure is hard to grasp at first, think of it as the lambda's capture mechanism.

Through captures, a lambda can access variables other than its own parameters, and the captured state stays with the lambda object for as long as it lives.

The one thing to watch out for is that capturing by reference can leave a dangling reference (similar to a dangling pointer) by the time the lambda is actually called, which is undefined behavior and typically ends in a core dump.
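A short sketch of these points follows. The helper names sorted_desc, count_above, and make_adder are invented for this example. The first two pass a lambda to an STL algorithm, and the third shows why a lambda that outlives its scope should capture by value.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Sort in descending order with an inline comparator instead of a
// hand-written function or function object.
std::vector<int> sorted_desc(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
    return v;
}

int count_above(const std::vector<int>& v, int threshold) {
    // [&threshold] captures by reference. That is safe here because the
    // lambda is only called inside this function.
    return static_cast<int>(std::count_if(
        v.begin(), v.end(), [&threshold](int x) { return x > threshold; }));
}

std::function<int(int)> make_adder(int step) {
    // [step] copies the value into the closure. Writing [&step] instead would
    // leave a dangling reference once make_adder returns.
    return [step](int x) { return x + step; };
}

void lambda_example() {
    assert(sorted_desc({1, 3, 2}).front() == 3);
    assert(count_above({1, 5, 10}, 4) == 2);
    assert(make_adder(3)(4) == 7);
}
```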

4. Alias verbose types, especially std::function types

Look at a lengthy piece of code.

class FuncFactory {
public:
    void put_func(std::string, std::function<std::vector<std::string>(std::string)>);
    std::function<std::vector<std::string>(std::string)> get_func(std::string);
private:
    std::unordered_map<std::string, std::function<std::vector<std::string>(std::string)>> _func_map;
};

Simplify it with using:

using func_t = std::function<std::vector<std::string>(std::string)>;

class FuncFactory {
public:
    void put_func(std::string, func_t);
    func_t get_func(std::string);
private:
    std::unordered_map<std::string, func_t> _func_map;
};
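The alias also pays off in the implementation and at the call site. The member function bodies below are my own guess at what such a factory might do (the original only shows declarations), and func_factory_example is a hypothetical usage.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using func_t = std::function<std::vector<std::string>(std::string)>;

class FuncFactory {
public:
    // Assumed behavior: registering a name twice replaces the old function.
    void put_func(std::string name, func_t func) {
        _func_map[std::move(name)] = std::move(func);
    }
    // Assumed behavior: an empty func_t is returned for an unknown name.
    func_t get_func(std::string name) {
        auto it = _func_map.find(name);
        if (it == _func_map.end()) {
            return func_t();
        }
        return it->second;
    }
private:
    std::unordered_map<std::string, func_t> _func_map;
};

void func_factory_example() {
    FuncFactory factory;
    // Register a function that splits a string into one-character strings.
    factory.put_func("chars", [](std::string s) -> std::vector<std::string> {
        std::vector<std::string> out;
        for (char c : s) out.push_back(std::string(1, c));
        return out;
    });
    func_t f = factory.get_func("chars");
    assert(f && f("abc").size() == 3);
    assert(!factory.get_func("missing"));
}
```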

5. Use #pragma once in the header file to replace the old and shabby #ifndef #define #endif

Since the birth of the C language in the 1970s, header files have used #ifndef #define #endif to avoid repeated inclusion.

#ifndef HEADER_FILE
#define HEADER_FILE

...
#endif

C++ inherited this idiom. However, you can also write:

#pragma once
...

This syntax has existed for a long time but has never been part of the C++ standard. Many compiler vendors supported it early on, though. As of C++11 it is still not standardized, but because compilers support it so widely, it can be used with near-total confidence. It is widely used in Google's and Facebook's open source C++ projects. #ifndef #define #endif is finally on its way out.

Of course, in some cases this syntax has pitfalls too:

Unlike header guards, this pragma makes it impossible to mistakenly use the same macro name in multiple files. On the other hand, because files with #pragma once are excluded based on their filesystem-level identity, this does not prevent the header file from being included twice if it has multiple locations in the project.

You can refer to: https://zh.cppreference.com/w/cpp/preprocessor/impl

In short, #pragma once ensures uniqueness based on the file path of the header, whereas a macro guard can enforce single inclusion across multiple files. That matters, for example, when a codebase contains multiple versions of a header file...

In general, a project rarely needs multiple versions of the same header file. At least, I have never had such a need.

6. Make good use of for range to traverse containers, and you can also target repeated fields of PB (even mutable)

Still traversing containers with subscripts?

for (int i = 0; i < v.size(); ++i) {
    cout<<v[i]<<endl;
    v[i] *= 10;
}

Java and other languages have long had for-range loops that need no subscript, and C++11 has one too:

for (auto& e: v) {
    cout<<e<<endl;
    e *= 10;
}

It is best to iterate by reference (&); otherwise, if the container stores objects, each element is copied. If you do not need to modify the elements, iterate with const auto& instead.

In C++ engineering projects, protobuf is used a lot. A range-based for can also traverse a repeated field of a pb message:
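The difference between iterating by value and by reference is easy to check with a small sketch. The function names below are made up for this example.

```cpp
#include <cassert>
#include <vector>

std::vector<int> scale_by_value(std::vector<int> v) {
    for (auto e : v) {   // e is a copy of each element
        e *= 10;         // only the copy changes
    }
    return v;
}

std::vector<int> scale_by_reference(std::vector<int> v) {
    for (auto& e : v) {  // e is an alias of the element itself
        e *= 10;
    }
    return v;
}

int sum(const std::vector<int>& v) {
    int total = 0;
    for (const auto& e : v) {  // read-only access, no copy
        total += e;
    }
    return total;
}

void range_for_example() {
    assert((scale_by_value({1, 2}) == std::vector<int>{1, 2}));
    assert((scale_by_reference({1, 2}) == std::vector<int>{10, 20}));
    assert(sum({1, 2, 3}) == 6);
}
```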

syntax = "proto3";
message Student {
    string name = 1;
    int32 score = 2;
}
message Report {
    repeated Student student = 1;
}

In code:

// report is an object of type Report
for (auto& student: report.student()) {
    cout << student.name() << "'s score:" << student.score() << endl;
}

At work, most of the code I see that iterates over a pb repeated field does it this way. However, when the elements need to be modified during iteration (the mutable accessor returns a pointer, which cannot be used in a range-based for directly), many people fall back to the traditional for loop with a subscript. In fact there is no need: you can still use a range-based for. The traditional form looks like this:

for (int i = 0; i < report.student_size(); ++i) {
    report.mutable_student(i)->set_score(60); // long live the 60-point pass!
}

Too verbose! It can be written like this:

for (auto& student: *report.mutable_student()) {
    student.set_score(60);  // long live the 60-point pass!
}

7. Use do while or IIFE to skip part of continuous logic without ending the function

Have you ever run into this: a function contains a flat sequence of logic that goes through steps 1, 2, and 3 in order, followed by other logic (say steps 4 and 5). If step 1 fails, step 2 must not run; if step 2 fails, step 3 must not run. In either case, execution jumps straight to steps 4 and 5. Three implementations come to mind easily:

First: extract steps 1, 2, and 3 into functions, check each function's return value, and call the next one only on success. That works. But if the sequence is long, you have to extract many functions of only a few lines each, which ends up more verbose, not less.

Second: use exceptions. In Java it is quite natural to implement this with exceptions: wrap the sequence in a try/catch block and throw as soon as a step fails. C++ can do the same. However, exceptions in C++ carry many hidden hazards and are not as safe as in Java, and many engineering guidelines discourage throwing them. Throwing is also not free, and what we have here is just an early exit from a sequence, not an "exceptional" condition. Throwing and catching would hurt expressiveness even more...

Third: goto. I have seen code that uses goto for this. But goto should of course be strictly forbidden, so we simply skip this option.

In fact, there is a fourth solution: do while(0)

do {
    // Step 1
    ...
    if (step1_failed) {
        break;
    }
    // Step 2
    ...
    if (step2_failed) {
        break;
    }
    // Step 3
    ...
    if (step3_failed) {
        break;
    }
} while(0);

// Step 4
...
// Step 5
...

This works in any language that has do while, not just C++. Also, thanks to lambdas in C++11, you can write:

[&]() {  // [&] lets the steps use the surrounding local variables
    // Step 1
    ...
    if (step1_failed) {
        return;
    }
    // Step 2
    ...
    if (step2_failed) {
        return;
    }
    // Step 3
    ...
    if (step3_failed) {
        return;
    }
}();

// Step 4
...
// Step 5
...

This simply adds a pair of parentheses after an ordinary lambda expression, which invokes the lambda immediately after it is defined.

This technique is also known as IIFE (Immediately Invoked Function Expression). The term comes from JavaScript and is probably not the proper name in C++...
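As a runnable sketch of the pattern, consider a made-up scenario: validating a port number in three steps and then always writing a log line. The function process, the step rules, and the log messages are all assumptions for illustration. The lambda captures with [&] so that it can use the surrounding locals.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the log lines so that the control flow is observable.
std::vector<std::string> process(const std::string& input) {
    std::vector<std::string> log;
    int port = 0;

    [&]() {
        // Step 1: the input must not be empty.
        if (input.empty()) {
            log.push_back("step1 failed");
            return;
        }
        // Step 2: at most 5 characters, all of them digits.
        if (input.size() > 5) {
            log.push_back("step2 failed");
            return;
        }
        for (char c : input) {
            if (c < '0' || c > '9') {
                log.push_back("step2 failed");
                return;
            }
        }
        // Step 3: the value must fit in the port range.
        int value = std::stoi(input);
        if (value > 65535) {
            log.push_back("step3 failed");
            return;
        }
        port = value;
    }();

    // Step 4: always runs, whether or not steps 1-3 succeeded.
    log.push_back("port=" + std::to_string(port));
    return log;
}

void iife_example() {
    assert(process("8080").back() == "port=8080");
    assert(process("80a").front() == "step2 failed");
    assert(process("70000").front() == "step3 failed");
    assert(process("").size() == 2);
}
```

Each return leaves only the lambda, not process itself, so step 4 still runs after a failure.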

8. In some cases, use struct instead of class to avoid writing C++ classes as JavaBeans

For various reasons, programmers who move from Java to C++ like to write C++ classes as JavaBeans, with set() and get() everywhere.

There is nothing wrong with this kind of encapsulation: the data members are private and all access goes through interface functions. It is just overly dogmatic and verbose. In C++, I prefer to use a struct directly for pure data types (data only, no member functions). There is no need to write class and then add public: a struct is more intuitive!

[Of course, this may be controversial~]
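For comparison, here is the same data written both ways. StudentBean and Student are hypothetical types made up for this sketch.

```cpp
#include <cassert>
#include <string>

// JavaBean style: private data plus a getter and a setter for every field.
class StudentBean {
public:
    const std::string& get_name() const { return _name; }
    void set_name(const std::string& name) { _name = name; }
    int get_score() const { return _score; }
    void set_score(int score) { _score = score; }
private:
    std::string _name;
    int _score = 0;
};

// Plain data: a struct with public members and no member functions.
struct Student {
    std::string name;
    int score;
};

void struct_example() {
    StudentBean bean;
    bean.set_name("Ann");
    bean.set_score(90);

    Student s{"Ann", 90};  // aggregate initialization, no setters needed
    assert(s.name == bean.get_name());
    assert(s.score == bean.get_score());

    s.score += 5;          // direct access instead of set_score(get_score() + 5)
    assert(s.score == 95);
}
```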

9. The function returns the STL container or object directly. Don't return pointers, and don't need to add parameters to functions

Before C++11, what did you do when you wanted to return an STL container (or an object of some other complex type)?

The first:

void split(std::string str, std::string del, std::vector<std::string>& str_list) {
   // Parse str, split it on the delimiter del, and store the pieces in str_list
   ...
}

// Caller:
std::vector<std::string> str_list;
split("a:b:c:d", ":", str_list);

This is not very convenient to use. And in cases other than split, I may want to chain member calls on the return value in a single expression (foo().bar().xxx()). The form above clearly breaks up such a one-line statement.

The second:

std::shared_ptr<std::vector<std::string>> split(std::string str, std::string del) {
    std::shared_ptr<std::vector<std::string>> p_str_list = std::make_shared<std::vector<std::string>>();
    // Parse str, split it on the delimiter del, and store the pieces in p_str_list
    ...
    return p_str_list;
}

Or the most original version:

std::vector<std::string>* split(std::string str, std::string del) {
    std::vector<std::string>* p_str_list = new std::vector<std::string>;
    // Parse str, split it on the delimiter del, and store the pieces in p_str_list
    ...
    return p_str_list;
}

The caller must handle the return value carefully and delete the pointer itself to avoid memory leaks.

All of these are too verbose. Yet without exception, no senior engineer familiar with C++98 would recommend a function that returns an STL container directly. Things have changed since C++11, however. Novice programmers unfamiliar with C++98 end up writing the best solution:

std::vector<std::string> split(std::string str, std::string del) {
    std::vector<std::string> str_list;
    // ...
    return str_list;
}

Trust me, no problem.

This change has actually caused some awkwardness at work. When I write code like this and send it to senior colleagues for code review, I worry that it will be flagged as bad code. If it is, I am embarrassed; then I explain that this is fine in C++11, and the senior colleague is embarrassed.

To avoid this embarrassment I always add a comment near the code:

// it's ok in C++11
std::vector<std::string> split(std::string str, std::string del);

In fact, this worked before C++11 too, because compilers perform RVO and NRVO on their own. These optimizations are permitted but not required by the standard, and a change of compile options can turn them off. gcc enables them by default unless you explicitly disable them (-fno-elide-constructors). Still, when I worked in C++98 environments, I rarely saw objects returned directly this way. Moreover, not every function that returns an object triggers RVO. C++98 programmers who are unsure should use it with caution.

But starting with C++11, you no longer have to worry: when copy elision does not apply, the returned local object is moved instead of copied.
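For completeness, here is one possible body for split that returns the vector by value. The original elides the parsing logic, so the implementation below is only a sketch. Taking the parameters by const reference and treating an empty delimiter as "no split" are my own assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits str on every occurrence of del and returns the pieces by value.
std::vector<std::string> split(const std::string& str, const std::string& del) {
    std::vector<std::string> str_list;
    if (del.empty()) {  // assumption: an empty delimiter means "do not split"
        str_list.push_back(str);
        return str_list;
    }
    std::string::size_type begin = 0;
    std::string::size_type end = str.find(del);
    while (end != std::string::npos) {
        str_list.push_back(str.substr(begin, end - begin));
        begin = end + del.size();
        end = str.find(del, begin);
    }
    str_list.push_back(str.substr(begin));  // the piece after the last delimiter
    return str_list;  // elided or moved, never deep-copied
}

void split_example() {
    // The return value can be used directly in a chained expression.
    assert(split("a:b:c:d", ":").size() == 4);
    assert(split("a:b:c:d", ":").back() == "d");
}
```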

10. Take advantage of the default behavior of the [] operator of unordered_map/map

For example, we have a counting logic in our program that uses an unordered_map<string, int> (or map<string, int>) to count tags of a string type. I saw a colleague write this before:

// freq_map is an unordered_map<string, int>.
// Some computation has produced a string variable tag; now count it.
if (freq_map.find(tag) == freq_map.end()) {
    freq_map.emplace(tag, 1);
} else {
    freq_map[tag] += 1;
}

// Or like this
if (freq_map.find(tag) == freq_map.end()) {
    freq_map.emplace(tag, 0);
}
freq_map[tag] += 1;

In fact, none of this is needed. The two versions above were probably carried over from counting with a dict in Python (I wrote something similar when I worked on MapReduce tasks), but C++ does not need it: when the key does not exist, the [] operator of a C++ map inserts a value-initialized value for it. If the value is a primitive data type, that value is 0.

So you can directly write:

freq_map[tag]++;
// or
freq_map[tag] += 1;

Of course, precisely because of this default behavior of the [] operator, an item in Effective STL advises using m.insert() (emplace since C++11) to insert a key and value instead of m[key] = value, because the latter first constructs an empty object and then overwrites it. In the counting scenario here this is not a concern: a value has to be initialized when the key does not exist anyway, and since the value is a primitive type, initializing it to 0 and then incrementing it to 1 costs very little.
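Put together, the counting logic shrinks to a single statement inside the loop. count_tags and lookup below are hypothetical helpers written for this sketch. lookup is included as a reminder that [] also inserts when you only mean to read, so a read-only query should go through find().

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using freq_map_t = std::unordered_map<std::string, int>;

freq_map_t count_tags(const std::vector<std::string>& tags) {
    freq_map_t freq_map;
    for (const auto& tag : tags) {
        ++freq_map[tag];  // a missing key is inserted with value 0 first
    }
    return freq_map;
}

// Reads a count without inserting anything into the map.
int lookup(const freq_map_t& freq_map, const std::string& tag) {
    auto it = freq_map.find(tag);
    return it == freq_map.end() ? 0 : it->second;
}

void count_example() {
    freq_map_t freq = count_tags({"cpp", "java", "cpp"});
    assert(lookup(freq, "cpp") == 2);
    assert(lookup(freq, "java") == 1);
    assert(lookup(freq, "rust") == 0);
    assert(freq.size() == 2);  // lookup did not insert "rust"
}
```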


Origin http://43.154.161.224:23101/article/api/json?id=324504628&siteId=291194637