[Computer Systems: A Programmer's Perspective] Chapter 5: Optimizing Program Performance

After this chapter, the next topics are storage and networking. I am tackling the main issues first, so these notes record only part of the chapter.

 

  1. [Memory aliasing] When a function receives its arguments as pointers, the compiler must assume that they may refer to the same memory, so it cannot safely apply some optimizations. The same care is needed when writing a function similar to memcpy: given two pointers for an array copy, you have to check whether the region pointed to by dst overlaps the src array. Whether the optimization is done by the compiler or the code is written by hand, aliasing requires careful consideration.
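  2. [Code motion] Move a computation that is repeated on every iteration, but always gives the same result, out of the loop. For example, in for (int i = 0; i < vec_int.size(); ++i) the call vec_int.size() is evaluated on each pass. Of course, we cannot be sure whether the compiler will perform this optimization itself.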
  3. [Branch prediction & speculative execution]
    • ICU (Instruction Control Unit): reads the instruction sequence from the instruction cache and generates the operations to execute. The retirement unit keeps track of the ongoing processing and makes sure that it obeys the sequential semantics of the machine-level program.
    • EU (Execution Unit): performs the operations generated by the ICU.
  4. [Performance of Pentium III arithmetic operations] The latency and issue time of integer division (36 cycles) and floating-point division (38 cycles) are strikingly high. Performance differs from processor to processor. I didn't notice this when I first studied the material.
  5. [Loop unrolling] Optimization through parallelism. For example, multiply the elements at odd indices and the elements at even indices separately, then combine the two partial products. However, the degree of parallelism is limited by the number of registers.
  6. [Basic strategy to optimize program performance]
    • High-level design: select appropriate algorithms and data structures.
    • Basic coding principles:
      • Eliminate excessive function calls. When possible, move computations out of loops.
      • Eliminate unnecessary memory references. Introduce temporary variables to save intermediate results. After calculating the final value, store the result in an array or global variable.
    • Low-level optimization
      • Try various forms of pointer versus array code.
      • Loop unrolling
      • Iteration splitting
    • Finally, correctness is the most important!
  7. [Profiling] Unix systems provide the profiling program GPROF.
    • The timing is not very accurate, because it is based on a simple interval counting technique.
    • The call information is quite reliable.
    • By default, the timings for library functions are not displayed; they are counted as part of the calling function's time.
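
For item 1, here is a minimal sketch of why aliasing blocks an optimization, modeled on the twiddle1/twiddle2 functions used in the book. copy_longs is a hypothetical helper (not a library function) that shows one way a memcpy-like routine can stay correct when dst and src overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Adds *yp to *xp twice: two reads of *yp and two updates of *xp.
void twiddle1(long *xp, long *yp) {
    *xp += *yp;
    *xp += *yp;
}

// Looks equivalent, but reads *yp only once.
void twiddle2(long *xp, long *yp) {
    *xp += 2 * *yp;
}

// Hypothetical memmove-style copy of n longs that tolerates overlap.
void copy_longs(long *dst, const long *src, std::size_t n) {
    if (n == 0 || dst == src) return;
    assert(dst != nullptr && src != nullptr);
    // std::less gives a well-defined ordering even for unrelated pointers.
    if (std::less<const long *>{}(dst, src)) {
        // dst starts before src: copy front to back.
        for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
    } else {
        // dst starts after src: copy back to front so that source
        // elements are read before they are overwritten.
        for (std::size_t i = n; i-- > 0;) dst[i] = src[i];
    }
}
```

With distinct pointers both twiddle functions add 2 * *yp to *xp. With xp == yp and a starting value of 1, twiddle1 leaves 4 while twiddle2 leaves 3, so the compiler cannot replace one with the other.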
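
For item 2, code motion can be sketched roughly as follows. The function names are made up for illustration. For std::vector the size() call is cheap and a compiler may well hoist it on its own; the strlen case is where doing it by hand usually matters, because the call walks the whole string each time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// size() appears in the loop condition and is evaluated on every pass.
long sum_size_in_condition(const std::vector<int> &vec_int) {
    long sum = 0;
    for (std::size_t i = 0; i < vec_int.size(); ++i) sum += vec_int[i];
    return sum;
}

// Same loop with the size computed once before the loop.
long sum_size_hoisted(const std::vector<int> &vec_int) {
    long sum = 0;
    const std::size_t n = vec_int.size();
    for (std::size_t i = 0; i < n; ++i) sum += vec_int[i];
    return sum;
}

// Lower-cases an ASCII string in place. strlen is called once, not once
// per iteration, so the loop stays linear in the string length.
void lower_hoisted(char *s) {
    assert(s != nullptr);
    const std::size_t len = std::strlen(s);
    for (std::size_t i = 0; i < len; ++i) {
        if (s[i] >= 'A' && s[i] <= 'Z') {
            s[i] = static_cast<char>(s[i] - 'A' + 'a');
        }
    }
}
```

Both sum functions return the same value; only the number of size() evaluations differs.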
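
For item 5, the odd/even idea can be sketched as a loop unrolled by two with two independent accumulators. The function names are illustrative, and the element type double is an assumption made for this example. Since floating-point multiplication is not associative, the two versions may round differently on general inputs.

```cpp
#include <cassert>
#include <cstddef>

// Baseline: one accumulator, so each multiply waits for the previous one.
double product_simple(const double *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double acc = 1.0;
    for (std::size_t i = 0; i < n; ++i) acc *= data[i];
    return acc;
}

// Unrolled by two: even-indexed and odd-indexed elements go into separate
// accumulators, so the two multiplication chains do not depend on each other.
double product_two_accumulators(const double *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double acc_even = 1.0;
    double acc_odd = 1.0;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc_even *= data[i];
        acc_odd *= data[i + 1];
    }
    // Handle the leftover element when n is odd.
    for (; i < n; ++i) acc_even *= data[i];
    return acc_even * acc_odd;
}
```

Adding more accumulators extends the same pattern, up to the point where the accumulators no longer fit in registers.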
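
For the "eliminate unnecessary memory references" principle in item 6, a rough sketch with hypothetical function names: the first version updates the destination through a pointer on every iteration, the second accumulates in a local variable and stores once.

```cpp
#include <cassert>
#include <cstddef>

// Reads and writes *dest on every iteration.
void sum_through_pointer(const long *data, std::size_t n, long *dest) {
    assert(dest != nullptr);
    *dest = 0;
    for (std::size_t i = 0; i < n; ++i) *dest += data[i];
}

// Accumulates in a local variable, which can live in a register,
// and writes the final value to memory once.
void sum_with_temporary(const long *data, std::size_t n, long *dest) {
    assert(dest != nullptr);
    long acc = 0;
    for (std::size_t i = 0; i < n; ++i) acc += data[i];
    *dest = acc;
}
```

When dest does not point into data, both versions store the same sum. When it does, the results can differ, which is the aliasing problem from item 1 and the reason the compiler cannot always make this change on its own.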

 

Origin www.cnblogs.com/zhouys96/p/12702569.html