DP Reading: Kunpeng Processor Architecture and Programming (14) ACPI and Software Architecture Specific Tuning

operating system kernel

ACPI (Advanced Configuration and Power Interface) is an open standard for power management and hardware configuration. It was originally developed by Intel, Microsoft, and Toshiba, later joined by HP and Phoenix, and is now maintained by the UEFI Forum. It gives the operating system a common interface for managing a computer system's power states, performance, and configuration.

In a computer system, ACPI handles power management events such as startup, shutdown, hibernation, and wake-up. It also gives the operating system control over hardware devices, cooling devices, batteries, and other components.

By letting the operating system drive power management, ACPI improves energy efficiency and battery life while preserving system performance and stability.

[Figure: ACPI on ARM]

On ARM64 platforms, hardware can also be described with a Device Tree. The Device Tree sources for ARM64 live in the arch/arm64/boot/dts/ directory of the Linux source code. A Device Tree is a data structure that describes the system hardware; the kernel reads it during boot to discover and initialize devices.

The following is a simple ARM64 Device Tree example. It describes a made-up platform and is only meant to convey the basic concepts; a real device tree carries many more properties and configuration details.

git clone https://github.com/torvalds/linux.git
cd linux
cd scripts/dtc/

make
sudo make install PREFIX=/usr/local


// SPDX-License-Identifier: GPL-2.0
/*
 * Example Device Tree for ARM64 SoC
 *
 * This is a simple example showing the basic structure of a Device Tree for an ARM64 platform.
 */

/dts-v1/;

/ {
    compatible = "example,arm64";
    model = "Example ARM64 SoC";
    // The reg properties below use two cells for the address and two for the size
    #address-cells = <2>;
    #size-cells = <2>;

    memory@80000000 {
        device_type = "memory";
        reg = <0x0 0x80000000 0x0 0x80000000>; // 2 GiB of memory starting at 0x80000000
    };

    chosen {
        bootargs = "console=ttyAMA0,115200";
    };

    uart@9000000 {
        compatible = "example,uart";
        reg = <0x0 0x9000000 0x0 0x1000>;
        interrupt-parent = <&gic>;
        interrupts = <0 13 4>; // interrupt number 13, level triggered, active high
    };

    // Root interrupt controller: it has no interrupt-parent of its own
    gic: interrupt-controller {
        compatible = "example,gic-v3";
        interrupt-controller;
        #interrupt-cells = <3>;
    };

    gic_cpuif: interrupt-controller@5000 {
        compatible = "example,gic-v3-cpuif";
        reg = <0x0 0x5000 0x0 0x1000>;
        interrupt-parent = <&gic>;
        interrupts = <0 8 1>; // interrupt number 8, edge triggered, rising edge
    };
};

Kunpeng software porting

High-level software architecture

Kunpeng Software Migration Process

For a Java application, the Kunpeng software migration process includes the following steps:

  1. Prepare the JDK: install an ARM (AArch64) build of the JDK.
  2. Configure environment variables: set variables such as the JDK path.
  3. Compile: compile the Java source code into bytecode.
  4. Test: start the Java program and verify that it functions correctly.

These steps make up the complete porting process for a Java application on Kunpeng.

Compilation tool selection

The choice of compilation tool depends mainly on the programming language and the development environment. Some common tools:

  1. Microsoft Visual Studio: a complete set of development tools from Microsoft for languages such as C, C++, and C#. It includes UML tools, code control tools, and an integrated development environment (IDE).
  2. GCC: the GNU Compiler Collection (GCC) is a set of compilers that supports C, C++, Objective-C, Fortran, Ada, and other languages.
  3. Clang: a C/C++/Objective-C compiler front end built on LLVM, designed for fast compilation and clear diagnostics.
  4. Java Development Kit (JDK): required for developing Java applications; it contains the Java compiler (javac) and other tools.
  5. Python interpreters and compilers: Python is an interpreted language, so a separate compilation step is not required. Several implementations exist, such as CPython, Jython, and IronPython.

This list is only a sample; many other compilation tools exist. Choose the one that fits your specific needs and habits.

Compilation parameter migration case

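During migration, certain compilation parameters affect both the build result and how well the program adapts to the target platform. The following parameters often play a key role:

  1. -march and -mtune: these two parameters specify the target processor architecture. -march selects the instruction set the compiler may use, and -mtune selects the processor model to tune for. For example, -march=native makes the compiler optimize for the hardware it is running on, while -march=x86-64 optimizes for 64-bit x86 processors. A value that names an x86 architecture must be changed when building for Kunpeng, for example to -march=armv8-a.
  2. -fPIC (Position-Independent Code): generates position-independent code, which is essential when building shared libraries.
  3. -I: specifies the search path for header files. It may need to be changed during porting to match the new environment.
  4. -L: specifies the search path for library files. It may likewise need to be changed during porting.
  5. -l: specifies a library to link against. Porting may require linking against different library files to match the new environment.
  6. -D: defines a macro. Porting may require defining new macros to match the new environment.

These are only some common examples; many other compilation parameters can matter during migration. The specific choice depends on your needs and on the target platform.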
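
Compilation parameters interact with assumptions baked into the source. As a small illustration (not taken from the original text), the sketch below reports which architecture the compiler is targeting, using the __aarch64__ and __x86_64__ macros that GCC and Clang predefine, and whether plain char is signed for the current build. Plain char is typically signed on x86 Linux and unsigned on AArch64 Linux, which is why GCC's -fsigned-char option often comes up in porting work. The function names are hypothetical.

```c
#include <limits.h>
#include <stdio.h>

/* Name of the architecture this file was compiled for.
 * __aarch64__ and __x86_64__ are predefined by GCC and Clang. */
const char *target_arch_name(void) {
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86_64";
#else
    return "other";
#endif
}

/* Returns 1 when plain char is signed for this build, 0 otherwise.
 * Building with -fsigned-char or -funsigned-char changes the answer. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Prints a one-line summary, for example into a build log. */
void print_build_summary(void) {
    printf("arch=%s plain-char-signed=%d\n",
           target_arch_name(), plain_char_is_signed());
}
```

Calling print_build_summary() from a small test program on both the x86 build host and the Kunpeng target is a quick way to confirm that the flags in the build scripts produce the char behaviour the code expects.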

Source code modification case

A source code modification case changes or optimizes program source code to meet a specific need or fix a specific problem. The following is a simple example.

Suppose a C program has a function called calculate_average that receives an integer array and its length, calculates the average of the array, and returns the result.

double calculate_average(int* arr, int length) {
    double sum = 0.0;
    for (int i = 0; i < length; i++) {
        sum += arr[i];
    }
    return sum / length;
}

Now we want the function to ignore any negative numbers in the array. A simple conditional skips them, and a counter tracks how many values were actually included so that the average is taken over the non-negative values only.

double calculate_average(int* arr, int length) {
    double sum = 0.0;
    int count = 0; // number of non-negative values seen
    for (int i = 0; i < length; i++) {
        if (arr[i] >= 0) {
            sum += arr[i];
            count++;
        }
    }
    // Divide by the number of values included, not by the array length
    return count > 0 ? sum / count : 0.0;
}

With this modification, the function counts only the non-negative numbers in the array and returns their average.

Note that this is only a simple example; real source code modifications are often more complicated and involve more details. Before modifying source code, back it up, and test the change thoroughly (for example, by running it in a virtual machine) to make sure it does not introduce errors or break existing functionality.
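
A more typical porting change replaces an x86-only construct with an architecture-neutral wrapper. The sketch below is an illustration under assumptions, not code from the original: read_cycle_counter is a hypothetical helper that uses the __rdtsc intrinsic on x86-64, reads the AArch64 virtual counter register cntvct_el0 on ARM64 processors such as Kunpeng, and falls back to the standard clock() function elsewhere. The units differ per branch (TSC ticks, generic timer ticks, clock ticks), so values are only comparable on one platform.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__)
#include <x86intrin.h>   /* provides __rdtsc() on x86-64 */
#endif

/* Returns a monotonically increasing tick count for coarse timing. */
uint64_t read_cycle_counter(void) {
#if defined(__x86_64__)
    /* Original x86 code path: time stamp counter */
    return __rdtsc();
#elif defined(__aarch64__)
    /* AArch64 replacement: virtual counter of the generic timer */
    uint64_t ticks;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(ticks));
    return ticks;
#else
    /* Portable fallback: processor time from the C library */
    return (uint64_t)clock();
#endif
}
```

Keeping the architecture check inside one helper means the rest of the code base calls read_cycle_counter() and needs no further changes when another target is added.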

Kunpeng analysis scanning tool Dependency Advisor

Dependency Advisor is a tool that simplifies migrating customer applications to Kunpeng servers. It is installed mainly on x86 servers and is used to analyze portability and estimate porting effort. The tool supports the following checks:

  1. Checking the SO dependent libraries contained in a user software package (RPM, JAR, TAR, zip, or gzip file) and assessing their portability.
  2. Checking the SO dependent libraries under a specified software installation path and assessing their portability.
  3. Checking the build project files of C/C++ software and assessing their portability.
  4. Checking C/C++ source code and assessing the portability of the source files.

In addition, Dependency Advisor automatically analyzes the results and outputs guidance reports, including a software migration report and a migration effort assessment. It supports two working modes: command line and Web.

Kunpeng Code Migration Tool Porting Advisor

Porting Advisor is a code migration tool that can help developers migrate applications from the x86 platform to the Kunpeng platform. Specific features include:

  1. Analyze portability: Porting Advisor can analyze the user's source code and related dependencies to determine whether it can be migrated to the Kunpeng platform.
  2. Automatic analysis: Porting Advisor can automatically analyze the code content that needs to be modified, and give suggestions for modification.
  3. Provide guidance: Porting Advisor can help developers solve problems encountered in the migration process and provide corresponding solutions.

Using Porting Advisor for code migration can reduce the workload of manual investigation and improve the overall migration efficiency.

Kunpeng software performance tuning

Kunpeng software performance tuning process

The Kunpeng software performance tuning process includes the following steps:

  1. Establish a baseline: before optimization or monitoring begins, establish baseline data and an optimization goal. The baseline covers hardware configuration, networking, test models, and system operating data (CPU, memory, I/O, network throughput, response latency, and so on). A comprehensive evaluation of the system makes it possible to analyze performance bottlenecks and to measure how performance changes after each optimization. The optimization goal is the performance the system is expected to reach on the current hardware and software architecture.
  2. Stress testing and monitoring bottlenecks: stress the system with peak workloads or professional stress testing tools, and observe the system state with performance monitoring tools. Record the operating status of the system and programs in detail during the test; accurate historical records help in analyzing bottlenecks and confirming whether an optimization was effective.
  3. Performance analysis: analyze the stress test results to locate performance bottlenecks in CPU, memory, I/O, networking, and other areas.
  4. Optimization: apply optimization measures that match the analysis results. The exact method varies by system and application.
  5. Re-testing: after applying an optimization, repeat the stress test and performance monitoring to confirm its effect.
  6. Iteration: performance tuning is an iterative process; repeat the steps above until the optimization goal is met.

This is the general process of Kunpeng software performance tuning; the concrete steps may need to be adjusted to the specific situation.
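
Establishing a baseline and re-testing both come down to measuring the same workload the same way before and after a change. The following is a minimal sketch, assuming a CPU-bound workload and the standard clock() function; the helper names and the sample workload are hypothetical and stand in for the real code under test.

```c
#include <time.h>

typedef void (*workload_fn)(void);

/* Runs the workload `runs` times and returns the smallest CPU time in
 * seconds, which is less sensitive to noise than a single run.
 * Returns -1.0 when runs is not positive. */
double best_cpu_seconds(workload_fn fn, int runs) {
    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        fn();
        clock_t end = clock();
        double secs = (double)(end - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best) {
            best = secs;
        }
    }
    return best;
}

/* Relative improvement of `tuned` over `baseline`;
 * 0.25 means the tuned run took 25% less time. */
double improvement(double baseline, double tuned) {
    if (baseline <= 0.0) {
        return 0.0;
    }
    return (baseline - tuned) / baseline;
}

/* Stand-in workload; the volatile sink keeps the loop from being optimized away. */
static volatile unsigned long sink;

void sample_workload(void) {
    unsigned long acc = 0;
    for (unsigned long i = 0; i < 1000000UL; i++) {
        acc += i * i;
    }
    sink = acc;
}
```

Recording best_cpu_seconds(sample_workload, 5) before a change and again after it, then feeding both numbers to improvement(), gives one comparable figure per iteration of the tuning loop.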

CPU and memory subsystem performance tuning

The following measures can be taken to tune the CPU and memory subsystems:

  1. CPU:

    • Multithreading optimization: use multiple threads sensibly, splitting computing tasks across threads that run in parallel to make full use of multi-core CPUs.
    • Reduce context switching: avoid frequent switching between threads and the overhead each switch incurs.
    • Cache optimization: make full use of the CPU cache to avoid the cost of cache misses, for example by applying the principle of locality to data access patterns.
  2. Memory:

    • Memory allocation optimization: choose a memory allocation strategy that avoids frequent allocation and release. Techniques such as object pools and memory pools help here.
    • Memory access pattern optimization: apply the principle of locality to memory access, for example by making accesses sequential and aligned to reduce their randomness.
    • Memory compression and sharding: for data structures or objects that occupy a lot of memory, consider compressing or sharding them to reduce memory usage and improve access efficiency.
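
The locality principle mentioned for both cache and memory access can be illustrated with a matrix stored as one flat array. In the sketch below (function names are hypothetical), both functions compute the same sum, but the row-major version walks memory sequentially while the column-major version jumps a full row between accesses, which tends to cause more cache misses on large matrices.

```c
#include <stddef.h>

/* Row-major traversal: consecutive iterations touch adjacent elements. */
long sum_row_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            total += m[r * cols + c];
        }
    }
    return total;
}

/* Column-major traversal of the same layout: each step skips `cols`
 * elements, so neighbouring accesses land in different cache lines
 * once a row is wider than a cache line. */
long sum_col_major(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t c = 0; c < cols; c++) {
        for (size_t r = 0; r < rows; r++) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```

Timing both functions on a matrix that is much larger than the last-level cache is a simple way to see the effect on a given machine; the size of the gap depends on the hardware.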

In addition, the CPU and memory subsystem can be monitored and analyzed through performance monitoring and performance analysis tools to find performance bottlenecks and adopt corresponding optimization strategies. Different application scenarios and requirements may require different optimization methods, so it is recommended to optimize according to the specific situation, and perform performance testing and evaluation to verify the optimization effect.
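
As one concrete form of the memory pool technique named above, the following sketch pre-allocates a fixed number of equally sized blocks and hands them out from a free list, so the hot path never calls malloc or free. It is a minimal single-threaded illustration; the pool type and function names are hypothetical, and a production pool would also need locking and error checks.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct pool {
    unsigned char *storage;  /* single allocation holding all blocks */
    void *free_head;         /* singly linked list of free blocks */
    size_t block_size;       /* rounded-up size of each block */
    size_t capacity;         /* number of blocks */
} pool;

/* Allocates storage for `capacity` blocks. Returns 0 on success, -1 on failure. */
int pool_init(pool *p, size_t block_size, size_t capacity) {
    size_t align = _Alignof(max_align_t);
    if (block_size < sizeof(void *)) {
        block_size = sizeof(void *);   /* a free block must hold a link */
    }
    /* Round up so that every block is suitably aligned for any type */
    block_size = (block_size + align - 1) / align * align;

    p->storage = malloc(block_size * capacity);
    if (p->storage == NULL) {
        return -1;
    }
    p->block_size = block_size;
    p->capacity = capacity;
    p->free_head = NULL;
    /* Thread all blocks onto the free list, lowest address first */
    for (size_t i = capacity; i > 0; i--) {
        void *block = p->storage + (i - 1) * block_size;
        *(void **)block = p->free_head;
        p->free_head = block;
    }
    return 0;
}

/* Pops a block from the free list; returns NULL when the pool is exhausted. */
void *pool_alloc(pool *p) {
    void *block = p->free_head;
    if (block != NULL) {
        p->free_head = *(void **)block;
    }
    return block;
}

/* Pushes a block previously returned by pool_alloc back onto the free list. */
void pool_free(pool *p, void *block) {
    *(void **)block = p->free_head;
    p->free_head = block;
}

/* Releases the backing storage; all blocks become invalid. */
void pool_destroy(pool *p) {
    free(p->storage);
    p->storage = NULL;
    p->free_head = NULL;
}
```

Because allocation and release are a pointer swap each, the cost per object is constant, and all objects sit in one contiguous region, which also helps locality.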

Network Subsystem Performance Tuning

For performance tuning of the network subsystem, the following measures can be taken:

  1. Reduce network latency:

    • Use high-performance network equipment: choose high-performance network switches, routers and other network equipment to reduce the transmission delay of data packets.
    • Use faster network protocols: adopt protocols and extensions that shorten connection setup, such as TCP Fast Open or QUIC, to reduce handshake latency.
    • Optimizing the network topology: rationally plan the network topology, reduce the transmission distance of data packets, and reduce network delay.
  2. Increase network bandwidth:

    • Network load balancing: By configuring load balancing equipment or software, the network traffic is evenly distributed to multiple servers to improve the overall network bandwidth.
    • Data compression and acceleration: use data compression and acceleration technology to reduce the amount of data transmission, thereby increasing the available bandwidth.
    • Increase bandwidth capacity: upgrade network equipment to increase bandwidth capacity to meet high concurrent network requests.
  3. Optimize network protocols and policies:

    • TCP/IP parameter tuning: According to specific application scenarios, adjust the parameters of TCP/IP protocol, such as window size, congestion control algorithm, etc., to improve network transmission efficiency.
    • Data packet priority management: By configuring the QoS (Quality of Service) function of network equipment, different types of data packets are prioritized to ensure timely transmission of important data.
  4. Cache and buffer management:

    • CDN acceleration: use a Content Delivery Network (CDN) to cache static resources and speed up data transmission.
    • Application of caching technology: According to actual needs, rationally use caching technology to cache popular data and query results in memory to speed up data access.
    • Buffer size optimization: optimize the buffer size of network devices to avoid performance problems caused by too large or too small buffers.

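In addition, monitoring and analyzing metrics such as network traffic and latency helps locate network performance bottlenecks so that matching tuning strategies can be applied. Choose a tuning plan that takes the application's characteristics, the network environment, and the business requirements into account, and run performance tests and evaluations to verify the effect.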
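
Buffer size tuning can also be done per socket from application code. The sketch below uses the standard setsockopt and getsockopt calls with SO_RCVBUF; the helper name set_recv_buffer is hypothetical. On Linux the kernel typically doubles the requested value for bookkeeping and caps it at the system limit (net.core.rmem_max), so the helper reads the value back instead of assuming the request was applied as given.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Requests a receive buffer of `bytes` for socket `fd` and returns the
 * size the kernel reports afterwards, or -1 on error. */
int set_recv_buffer(int fd, int bytes) {
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) != 0) {
        return -1;
    }
    int effective = 0;
    socklen_t len = sizeof effective;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) != 0) {
        return -1;
    }
    return effective;
}
```

A caller would create the socket as usual, call set_recv_buffer(fd, 262144) before connecting or listening, and log the returned value so that the effective setting appears in the tuning records.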

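Disk I/O subsystem performance tuning

Common measures for tuning the performance of the disk I/O subsystem include:

  1. Use RAID: RAID can provide higher disk read/write performance as well as redundancy and fault tolerance.
  2. Block size optimization: adjust the disk block size to match the application's access pattern and the size of its data blocks.
  3. Operating system parameter tuning: adjust the relevant operating system parameters to improve disk I/O performance.
  4. File system selection and optimization: choose a file system suited to the application scenario and tune it accordingly.
  5. I/O cache and buffer management: use I/O caching and buffer management to reduce the number of disk I/O operations.
  6. Application optimization: optimize the application to eliminate unnecessary disk I/O operations.
  7. Disk performance monitoring and fault diagnosis: monitor disk performance metrics to spot potential problems early, and diagnose and repair faults as they appear.

Taken together, these measures improve the performance and reliability of the disk I/O subsystem. Tuning still has to be adapted to the specific situation and verified through performance testing and evaluation.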
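
At the application level, one way to reduce the number of disk I/O operations is to enlarge the user-space buffer so that many small writes are merged into a few large ones. The sketch below does this with the standard setvbuf function; the function name, record contents, and buffer size are hypothetical choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes `count` 16-byte records to `path` through a stdio buffer of
 * `buf_bytes` bytes. Returns the total bytes written, or -1 on error. */
long write_records_buffered(const char *path, size_t count, size_t buf_bytes) {
    if (buf_bytes == 0) {
        return -1;
    }
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        return -1;
    }

    char *buf = malloc(buf_bytes);
    /* setvbuf must be called before any other operation on the stream */
    if (buf == NULL || setvbuf(fp, buf, _IOFBF, buf_bytes) != 0) {
        fclose(fp);
        free(buf);
        return -1;
    }

    const char record[16] = "0123456789abcde";  /* 15 characters plus NUL */
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (fwrite(record, sizeof record, 1, fp) != 1) {
            total = -1;
            break;
        }
        total += (long)sizeof record;
    }

    /* fclose flushes the buffer, so the buffer is freed only afterwards */
    if (fclose(fp) != 0) {
        total = -1;
    }
    free(buf);
    return total;
}
```

With a 64 KiB buffer, 1000 records of 16 bytes fit in the buffer and reach the kernel in a single write when the file is closed, instead of in many small ones. The best buffer size depends on the workload and should be confirmed by measurement.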

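Application performance tuning

The following measures can be taken to tune application performance:

  1. Code optimization:

    • Optimize algorithms and data structures: choose efficient algorithms and data structures to cut unnecessary computation and memory consumption and raise execution efficiency.
    • Reduce resource usage: release resources promptly once they are no longer needed to avoid leaks. Use memory, file handles, database connections, and similar resources sensibly to avoid resource bottlenecks.
    • Concurrent programming optimization: use multithreading, multiprocessing, or asynchronous programming models where appropriate, taking advantage of multi-core processors and asynchronous operations to improve concurrency.
  2. Database optimization:

    • Index optimization: analyze the frequency and patterns of database queries and create suitable indexes to speed up data retrieval.
    • Connection management: maintain and manage database connections sensibly to reduce the cost of opening and closing them.
    • Batch operations and transaction management: submit multiple database operations as a batch or wrap them in a transaction to reduce the number of round trips and improve efficiency and data consistency.
  3. Caching:

    • Use caching: keep frequently read data in memory to reduce access to the underlying storage system (such as a database) and speed up reads.
    • Set a sensible caching policy: choose a policy that matches how often the data changes and how important it is, such as expiry-based eviction or LRU (least recently used).
  4. Network communication optimization:

    • Reduce the number of network requests: merge multiple requests and use batch operations to cut network overhead and latency.
    • Compress and accelerate data transmission: use data compression and acceleration techniques to reduce the volume of data sent over the network and increase speed.
  5. Performance monitoring and debugging:

    • Use performance monitoring tools: collect application metrics such as CPU usage, memory usage, and database query time to find performance bottlenecks and potential problems.
    • Run performance tests: simulate real usage scenarios with load tests and performance tests to evaluate system performance and to find and fix problems early.

Beyond these measures, further optimization can target the specific application scenario and its requirements. Focus on the application's bottlenecks and inefficient spots, and keep testing and improving to raise performance and responsiveness. At the same time, balance performance tuning against maintainability so that over-optimization does not make the code hard to understand and maintain.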
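
The LRU policy mentioned under caching can be sketched with a small fixed-size table. The version below is a minimal illustration with hypothetical names: it uses a logical clock and a linear scan, which is fine for a handful of entries, whereas a real cache would usually combine a hash table with a linked list.

```c
#include <stddef.h>

#define LRU_CAPACITY 4

typedef struct {
    int key;
    int value;
    unsigned long last_used;  /* logical time of the last access */
    int in_use;
} lru_entry;

typedef struct {
    lru_entry entries[LRU_CAPACITY];
    unsigned long clock;      /* incremented on every access */
} lru_cache;

void lru_init(lru_cache *c) {
    c->clock = 0;
    for (size_t i = 0; i < LRU_CAPACITY; i++) {
        c->entries[i].in_use = 0;
    }
}

/* Returns 1 and stores the value in *out on a hit, 0 on a miss. */
int lru_get(lru_cache *c, int key, int *out) {
    for (size_t i = 0; i < LRU_CAPACITY; i++) {
        lru_entry *e = &c->entries[i];
        if (e->in_use && e->key == key) {
            e->last_used = ++c->clock;  /* a hit refreshes the entry */
            *out = e->value;
            return 1;
        }
    }
    return 0;
}

/* Inserts or updates a key, evicting the least recently used entry when full. */
void lru_put(lru_cache *c, int key, int value) {
    lru_entry *slot = NULL;
    /* Pass 1: reuse the entry that already holds this key, if any */
    for (size_t i = 0; i < LRU_CAPACITY; i++) {
        lru_entry *e = &c->entries[i];
        if (e->in_use && e->key == key) {
            slot = e;
            break;
        }
    }
    /* Pass 2: otherwise take a free entry, or the least recently used one */
    if (slot == NULL) {
        for (size_t i = 0; i < LRU_CAPACITY; i++) {
            lru_entry *e = &c->entries[i];
            if (!e->in_use) {
                slot = e;
                break;
            }
            if (slot == NULL || e->last_used < slot->last_used) {
                slot = e;
            }
        }
    }
    slot->key = key;
    slot->value = value;
    slot->in_use = 1;
    slot->last_used = ++c->clock;
}
```

For example, after inserting keys 1 to 4 and then reading key 1, inserting key 5 evicts key 2, because key 2 is then the entry that was touched longest ago.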

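Basic software performance tuning

The following measures can be taken to tune the performance of basic software such as operating systems, databases, and Web servers:

  1. Add hardware resources:

    • Add CPU, memory, disk, and other hardware resources to raise the execution speed and concurrent processing capacity of the basic software.
    • Use faster storage or network devices to improve data access speed and transfer efficiency.
  2. Adjust software parameters:

    • Adjust operating system or database parameters to match the application's workload and the hardware configuration, aiming for the best performance.
    • For Web servers, adjust parameters such as connection pool size, caching policy, and request filtering to improve concurrency and response speed.
  3. Optimize software architecture and design:

    • Apply suitable software design patterns and development frameworks to improve code reuse and maintainability.
    • A sound software architecture improves the concurrent processing capacity and scalability of the basic software.
  4. Use resources sensibly:

    • For database systems, use database caching, well-chosen indexes, and similar means to reduce disk accesses and speed up data access.
    • For Web servers, cache frequently accessed data to lower the frequency of database access.
  5. Run performance tests and optimize:

    • Run load tests and performance tests on the basic software to find its bottlenecks.
    • Evaluate the performance effect of different parameter combinations or optimization techniques and choose the best option.
  6. System monitoring and debugging:

    • Use system monitoring tools to track performance, resource utilization, and other metrics in real time, and debug and repair promptly when the system behaves abnormally.
    • Collect log information to discover potential problems early and to diagnose the system.

Beyond these measures, tuning can also target the specific basic software. For database systems, for example, partitioning and sharding improve the parallelism and scalability of data processing. For Web servers, load balancing improves concurrent processing capacity and availability. In short, performance tuning of basic software requires weighing many aspects together.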
Application read and write optimization mechanism

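Kunpeng performance tuning tool Tuning Kit

Tuning Kit is a performance analysis and optimization tool for the Kunpeng computing platform. It collects performance data at every level, including processor hardware, operating system, processes/threads, and functions, derives system performance metrics from that data, and pinpoints system bottlenecks and hotspot functions.

Tuning Kit supports the following features:

  1. Panoramic analysis of system configuration: collects the hardware and software configuration of the whole system, analyzes it, and offers optimization suggestions for unreasonable settings.
  2. Panoramic analysis of system performance: following the industry's USE (utilization, saturation, errors) method, collects the operating state of resources such as CPU, memory, storage I/O, and network I/O, obtains their utilization, saturation, and error metrics, and identifies system bottlenecks.
  3. Optimization suggestions for selected system metrics: provided on the basis of existing benchmark values and optimization experience.
  4. System resource scheduling analysis: based on CPU scheduling events, analyzes the running state of CPU cores and processes/threads at each point in time as well as process/thread switching, and gives corresponding optimization suggestions.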

Tuning Kit can help users better understand system performance, identify and solve system bottlenecks, and improve overall system efficiency.
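
To make the "utilization" part of the USE method concrete, the sketch below computes CPU utilization from two samples of the aggregate cpu line in the Linux /proc/stat format. It only illustrates the metric and is not a description of how Tuning Kit works internally; the type and function names are hypothetical.

```c
#include <stdio.h>

typedef struct {
    unsigned long long busy;   /* time spent not idle and not waiting for I/O */
    unsigned long long total;  /* sum of the first eight time fields */
} cpu_sample;

/* Parses the aggregate "cpu ..." line of /proc/stat (not the per-core
 * "cpuN" lines). Returns 0 on success, -1 if the line does not match. */
int parse_cpu_line(const char *line, cpu_sample *out) {
    unsigned long long user, nice, sys, idle, iowait, irq, softirq, steal;
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &user, &nice, &sys, &idle, &iowait, &irq, &softirq, &steal);
    if (n != 8) {
        return -1;
    }
    out->total = user + nice + sys + idle + iowait + irq + softirq + steal;
    out->busy = out->total - idle - iowait;
    return 0;
}

/* CPU utilization between two samples, as a fraction between 0 and 1. */
double cpu_utilization(const cpu_sample *before, const cpu_sample *after) {
    unsigned long long total = after->total - before->total;
    if (total == 0) {
        return 0.0;
    }
    return (double)(after->busy - before->busy) / (double)total;
}
```

A monitoring loop would read the first line of /proc/stat at a fixed interval, parse it with parse_cpu_line, and pass consecutive samples to cpu_utilization; saturation and error metrics come from other sources such as run-queue length and device error counters.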

Kunpeng BoostKit
Kunpeng developer community: https://www.hikunpeng.com/developer/boostkit
Kunpeng BoostKit 23.0
Code Migration Module
Kunpeng official document address
https://www.hikunpeng.com/document/detail/zh/kunpengdevps/porting/qs/qs-pa-kunpengdevps.html
Kunpeng Xiaozhi
https://www.hikunpeng.com/zh/airobot

References:
[1] Dai Zhitao, Liu Jianpei. Kunpeng Processor Architecture and Programming: Huawei Intelligent Computing Technology Series [M]. Beijing: Tsinghua University Press, 2020.
[2] https://www.hikunpeng.com/
[3] Qi Zhengwei, Guan Haibing. Simple System Virtualization: Principles and Practice [M]. Beijing: Tsinghua University Press, 2021.

Origin blog.csdn.net/m0_74037814/article/details/132564623