Lichee_RV Learning Series: Porting STREAM

Lichee_RV Learning Series Article Catalog

Lichee_RV Learning Series: Getting to know the Lichee RV Dock, setting up the environment, and compiling the first program
Lichee_RV Learning Series: Porting dhrystone



1. Introduction to STREAM

official introduction

STREAM is an industry-recognized benchmark for measuring memory bandwidth. It supports four operations (Copy, Scale, Add, and Triad) and measures memory bandwidth through each of them.
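The four operations are easiest to see in code. The sketch below restates them in plain Python on small lists. It only illustrates what each kernel computes and how many bytes STREAM counts per pass; it is not the benchmark itself (the real loops live in stream.c). The names stream_kernels and bytes_moved are made up for this illustration.

```python
# Illustrative restatement of the four STREAM kernels (not the real benchmark).
BYTES_PER_ELEMENT = 8  # STREAM uses double-precision elements by default

# Arrays read or written per element by each kernel, as STREAM counts them.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_kernels(a, b, c, scalar=3.0):
    """Run Copy, Scale, Add and Triad once, in STREAM's order, on three lists."""
    n = len(a)
    for j in range(n):  # Copy:  c = a
        c[j] = a[j]
    for j in range(n):  # Scale: b = scalar * c
        b[j] = scalar * c[j]
    for j in range(n):  # Add:   c = a + b
        c[j] = a[j] + b[j]
    for j in range(n):  # Triad: a = b + scalar * c
        a[j] = b[j] + scalar * c[j]
    return a, b, c


def bytes_moved(kernel, n):
    """Bytes STREAM counts for one pass of `kernel` over n elements."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n


if __name__ == "__main__":
    a, b, c = stream_kernels([1.0] * 4, [2.0] * 4, [0.0] * 4)
    print(a[0], b[0], c[0])
    print(bytes_moved("Triad", 10_000_000))
```

Copy and Scale touch two arrays per element while Add and Triad touch three, which is why the byte counts differ between the kernels.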

2. Source code download

The source code can be downloaded from GitHub or from the official introduction page. The official page includes a Linux version; downloading that one is enough.

github source code

The core files are stream.c and stream.f. The source is written in C and Fortran, so it can be built with either gcc or a Fortran compiler; gcc is used here.
[Screenshot: contents of the STREAM source directory]

3. Porting the files

1. Makefile compilation

Only the Makefile in the source tree needs to be modified. It builds two executables for the board, a single-threaded one and a multi-threaded one, plus a native Ubuntu build.

# Cross-compilation toolchain path and prefix
CTOOL:=riscv64-unknown-linux-gnu-
CCL:=/home/allwinner/workspace/tina-D1-H/prebuilt/gcc/linux-x86/riscv/toolchain-thead-glibc/riscv64-glibc-gcc-thead_20200702
CC:=${CCL}/bin/${CTOOL}gcc
CFLAGS:=-O2 -finput-charset=UTF-8
STFLAGES:=-ffreestanding   -DSTREAM_ARRAY_SIZE=10000000 -mabi=lp64d  -march=rv64gcxthead  -mcmodel=medany -mexplicit-relocs 

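# Option reference
# -mtune=native -march=native: optimize for the CPU's instruction set. The build machine is also the machine that runs the binary here, so native is used.
# -O3: compiler optimization level.
# -mcmodel=medium: required when a single memory array is larger than 2 GB (has no effect below 2 GB).
# -fopenmp: enables multiprocessor support. When enabled, the program defaults to as many threads as the CPU has. The thread count can also be set before running.
# How to set it: export OMP_NUM_THREADS=x, where x is the number of threads you want
# -DSTREAM_ARRAY_SIZE=100000000: the option with the biggest effect on the results and the one that needs the most attention. It sets the size of the arrays a[], b[], c[] used in the computation.
# -DNTIMES=20: the number of runs; the best value among them is reported.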

all : stream_single_thread stream_multithreading ubuntu
.PHONY:all
# Single-threaded build
stream_single_thread:stream_single_thread.o
	$(CC) $< -o $@ 
stream_single_thread.o:stream.c
	$(CC)  $(CFLAGS) -c $< -o $@   ${STFLAGES}

# Multi-threaded build -- the XuanTie C906 does not seem to support the -fopenmp option
stream_multithreading:stream_multithreading.o
	$(CC) $< -o  $@ -fopenmp
stream_multithreading.o:stream.c
	$(CC)  $(CFLAGS) -c $< -o $@   ${STFLAGES} -fopenmp

# Native build on Ubuntu
ubuntu:ubuntu.o
	gcc $< -o $@ -fopenmp
ubuntu.o:stream.c
	gcc -mtune=native -march=native -O3 -mcmodel=medium  -DSTREAM_ARRAY_SIZE=100000000 -DNTIMES=20  -c $< -o $@

# Clean rule
.PHONY:clean
clean:
	-rm -f stream_single_thread stream_single_thread.o stream_multithreading stream_multithreading.o ubuntu ubuntu.o

Quite a few settings are involved here. Some configure the cross-compilation toolchain and some configure STREAM itself; for the STREAM options, see the official documentation.
The links below describe which compiler options are available for the Lichee RV:

Getting Started with RISC-V Embedded Development Part 1: Introduction to RISC-V GCC Toolchain
A detailed explanation of the gcc compiler for RISC-V processors and the meaning of the gcc compiler parameters in the Makefile
Official documentation on compilation support

You can also run the toolchain's gcc -v to see how it was configured, and gcc --help to see which options it accepts.
Below is the -v output of the toolchain used for the Lichee RV Dock. It contains a lot of information, for example the supported languages: C, C++, Fortran, etc.

cv/toolchain-thead-glibc/riscv64-glibc-gcc-thead_20200702/bin/riscv64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/home/allwinner/workspace/tina-D1-H/prebuilt/gcc/linux-x86/riscv/toolchain-thead-glibc/riscv64-glibc-gcc-thead_20200702/bin/riscv64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/home/allwinner/workspace/tina-D1-H/prebuilt/gcc/linux-x86/riscv/toolchain-thead-glibc/riscv64-glibc-gcc-thead_20200702/bin/../libexec/gcc/riscv64-unknown-linux-gnu/8.1.0/lto-wrapper
Target: riscv64-unknown-linux-gnu

Configured with: /ldhome/software/toolsbuild/slave/workspace/riscv64_build_linux_x86_64/build/../source/riscv/riscv-gcc/configure --target=riscv64-unknown-linux-gnu --with-mpc=/ldhome/software/toolsbuild/slave/workspace/riscv64_build_linux_x86_64/lib-for-gcc-x86_64-linux/ 
--with-mpfr=/ldhome/software/toolsbuild/slave/workspace/riscv64_build_linux_x86_64/lib-for-gcc-x86_64-linux/ 
--with-gmp=/ldhome/software/toolsbuild/slave/workspace/riscv64_build_linux_x86_64/lib-for-gcc-x86_64-linux/ 
--prefix=/ldhome/software/toolsbuild/slave/workspace/riscv64_build_linux_x86_64/install 
--with-sysroot=/ldhome/software/toolsbuild/slave/workspace/riscv64_build_linux_x86_64/install/sysroot 
--with-system-zlib 
--enable-shared --enable-tls --enable-languages=c,c++,fortran 
--disable-libmudflap --disable-libssp --disable-libquadmath --disable-nls --disable-bootstrap --src=../../source/riscv/riscv-gcc 
--enable-checking=yes --with-pkgversion='C-SKY RISCV Tools V1.8.4 B20200702' 
--enable-multilib --with-abi=lp64d --with-arch=rv64gcxthead 'CFLAGS_FOR_TARGET=-O2  -mcmodel=medany' 'CXXFLAGS_FOR_TARGET=-O2  -mcmodel=medany' CC=gcc CXX=g++

Thread model: posix
gcc version 8.1.0 (C-SKY RISCV Tools V1.8.4 B20200702) 
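The two values in this output that matter most for the Makefile are --with-arch and --with-abi, since they are the defaults that -march and -mabi should agree with. Rather than reading the long "Configured with" line by eye, they can be pulled out with a few lines of Python. This is a minimal sketch; the function name configured_defaults is made up, and the sample string is a shortened copy of the output above.

```python
import re


def configured_defaults(gcc_v_output):
    """Return the --with-arch / --with-abi values found in `gcc -v` output."""
    defaults = {}
    for key in ("arch", "abi"):
        match = re.search(r"--with-" + key + r"=(\S+)", gcc_v_output)
        if match:
            defaults[key] = match.group(1)
    return defaults


if __name__ == "__main__":
    # Shortened copy of the "Configured with" line shown above.
    sample = (
        "Configured with: ... --enable-multilib --with-abi=lp64d "
        "--with-arch=rv64gcxthead 'CFLAGS_FOR_TARGET=-O2  -mcmodel=medany'"
    )
    print(configured_defaults(sample))
```

In practice the input would be the saved text of gcc -v (the command writes it to stderr), for example the cat_gcc file mentioned below.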

2. Copying the files to the board

To run the program, copy the entire build folder to the development board. In particular the intermediate .o files must be copied along with the executables, otherwise the program fails to run. (I do not know why copying only the final executable causes an error; the Makefile and the build process would have to be reviewed to find out.)
The folder generally contains the files shown below. cat_gcc is just the file where I saved the gcc -v output and has nothing to do with the STREAM project.
[Screenshot: contents of the folder copied to the board]

4. Running results

Results on Ubuntu:

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 20 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 656919 microseconds.
   (= 656919 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2076.6     0.818867     0.770487     0.884801
Scale:           2032.1     0.828582     0.787348     0.904562
Add:             2339.3     1.069185     1.025955     1.135896
Triad:           2332.6     1.092763     1.028905     1.282664
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
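The numbers in this report can be checked by hand. STREAM derives the best rate from the minimum time and the bytes each kernel moves (two arrays for Copy and Scale, three for Add and Triad, 8 bytes per element). The sketch below redoes that arithmetic for the run above; the function names are made up for illustration.

```python
BYTES_PER_ELEMENT = 8
# Arrays read or written per element by each kernel, as STREAM counts them.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def mib_per_array(array_size):
    """Memory used by one array, in MiB (2**20 bytes)."""
    return array_size * BYTES_PER_ELEMENT / 2**20


def best_rate_mb_s(kernel, array_size, min_time_s):
    """Best rate in MB/s (1 MB = 10**6 bytes), computed from the minimum time."""
    moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * array_size
    return moved / min_time_s / 1e6


if __name__ == "__main__":
    n = 100_000_000  # -DSTREAM_ARRAY_SIZE used for the Ubuntu build
    print(f"per array: {mib_per_array(n):.1f} MiB, total: {3 * mib_per_array(n):.1f} MiB")
    print(f"Copy: {best_rate_mb_s('Copy', n, 0.770487):.1f} MB/s")
    print(f"Add:  {best_rate_mb_s('Add', n, 1.025955):.1f} MB/s")
```

The same functions reproduce the Lichee RV numbers when called with an array size of 10000000 and the minimum times from that run.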

Lichee RV single-threaded results:

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 186503 microseconds.
   (= 186503 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            1092.9     0.155637     0.146398     0.171233
Scale:            870.7     0.192386     0.183770     0.201332
Add:              937.5     0.262170     0.256007     0.278177
Triad:            915.9     0.272345     0.262036     0.308677
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
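To compare the two runs without copying numbers by hand, the result table can be parsed from the saved output. This is a small sketch, assuming the table layout shown in the reports here; parse_best_rates is a made-up helper name, and the sample strings are excerpts of the two reports.

```python
KERNELS = ("Copy", "Scale", "Add", "Triad")


def parse_best_rates(stream_output):
    """Map kernel name -> best rate (MB/s) from STREAM's result table."""
    rates = {}
    for line in stream_output.splitlines():
        fields = line.split()
        # Result rows look like: "Copy:  2076.6  0.818867  0.770487  0.884801"
        if len(fields) == 5 and fields[0].rstrip(":") in KERNELS:
            rates[fields[0].rstrip(":")] = float(fields[1])
    return rates


if __name__ == "__main__":
    ubuntu = """Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2076.6     0.818867     0.770487     0.884801
Triad:           2332.6     1.092763     1.028905     1.282664"""
    lichee = """Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            1092.9     0.155637     0.146398     0.171233
Triad:            915.9     0.272345     0.262036     0.308677"""
    pc = parse_best_rates(ubuntu)
    for kernel, rate in parse_best_rates(lichee).items():
        print(f"{kernel}: board/PC = {rate / pc[kernel]:.2f}")
```

Keep in mind that the two runs used different array sizes and NTIMES values, so the ratio is only a rough comparison.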

Lichee RV multi-threaded results:

./stream_multithreading: error while loading shared libraries: libgomp.so.1: cannot open shared object file: No such file or directory

This needs the libgomp.so library. I checked the latest gcc toolchain and it does not ship one, so you would probably have to get the sources, build the library with the cross toolchain, and copy it to /usr/lib on the board. I will not do that here (mainly because this port has already taken me too long); interested readers can build it themselves.
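Checking whether a toolchain or root filesystem contains the library can be scripted instead of browsing directories by hand. The sketch below walks a directory tree on the PC and lists every file whose name starts with libgomp.so. The function name find_shared_library is made up, and the directory to search is whatever you pass on the command line.

```python
import os


def find_shared_library(root, prefix="libgomp.so"):
    """Return sorted paths under `root` whose file name starts with `prefix`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(prefix):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    import sys

    # Directory to search, e.g. the toolchain folder; defaults to the current one.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_shared_library(root):
        print(path)
```

An empty result means the tree has no libgomp at all, which matches what I saw with this toolchain.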

5. Strange errors during porting and their solutions

Porting this took me a long time, about two days in fact. The main difficulties were choosing the compiler options and writing the Makefile correctly, plus one strange problem: on the PC the executable runs regardless of whether the intermediate .o files are present, but on the development board the .o files must sit next to the executable, otherwise it fails.

Problem 1: ELF not found, syntax error


There are usually two causes for this problem. The first is that the file was built with the wrong toolchain, for example a binary compiled for the PC is being run on the Lichee RV board.
The second is that some library files are missing (this is an answer I found on Stack Overflow). I got stuck here for a while and do not know the exact reason.

In my case the cause was the missing intermediate .o file: without it, execution fails. The fix is described above (copy the whole build folder to the board).
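The first cause is easy to rule out on the PC before copying anything: the ELF header records which machine a binary was built for. The sketch below reads the first 20 bytes of a file and reports the target. The function name elf_machine is made up; the header offsets and the machine codes (62 for x86-64, 243 for RISC-V) come from the ELF specification.

```python
import struct

# e_machine values from the ELF specification (only the two relevant here).
EM_NAMES = {62: "x86-64", 243: "RISC-V"}


def elf_machine(header):
    """Describe the target machine from the first 20 bytes of a file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)  # e_machine
    bits = {1: 32, 2: 64}.get(header[4], "?")  # EI_CLASS
    return f"{bits}-bit {EM_NAMES.get(machine, 'machine ' + str(machine))}"


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(elf_machine(f.read(20)))
```

If stream_single_thread reports x86-64 here, it was built with the PC's gcc rather than the cross toolchain and will not run on the board.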

Problem 2: /bin/sh xxx: not found

The binary uses instructions that are not supported. Most likely it was compiled with an -march value that does not match the target, and the compiler does not flag this at build time. Check which options the toolchain supports. For example, I used -march=rv64gc while the toolchain is configured for -march=rv64gcxthead: the program compiles, but the resulting file cannot be executed.
[Screenshot: the error message on the board]

6. Reference links

Memory performance test tool STREAM
STREAM memory bandwidth test tool under Linux and its internal implementation
Use Stream and MLC to test memory performance
A detailed explanation of the gcc compiler for RISC-V processors and the meaning of the gcc compiler parameters in the Makefile


Origin blog.csdn.net/weixin_46185705/article/details/128675699