Background
Rodinia 3.1 is a benchmark suite for GPU-CPU experiments. It provides kmeans, hotspot and other workloads in OpenMP, OpenCL and CUDA versions, but compiling and running it is quite laborious. The following is a record of my entire process, from downloading to running, for reference.
Steps
Download and unzip
Download the compressed package http://www.cs.virginia.edu/~kw5na/lava/Rodinia/Packages/Current/rodinia_3.1.tar.bz2, then upload it to the Ubuntu server and extract it
root@sundata:/data/szc# tar -jxvf rodinia_3.1.tar.bz2
Make sure you have installed CUDA, OpenCL for CUDA (for installation steps, see the article on installing OpenCL and OpenACC), gcc 7 and g++ 7
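Before going further, it can help to confirm that the required tools are actually on PATH. The following is a minimal sketch, assuming the compilers are installed under the names nvcc, gcc-7 and g++-7 (adjust these if your installation names them differently). The helper name missing_tools is hypothetical and not part of Rodinia.

```shell
# Print every command from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper name, not part of Rodinia.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed tool names; adjust them to match your installation.
missing=$(missing_tools nvcc gcc-7 g++-7)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```

An empty result only means the commands exist; it does not check their versions.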
Modify and copy files
We need to modify some files, configure some paths, and adapt the build to the sm and compute architecture of my GPU
1. Enter the rodinia directory and modify the common/make.config file
root@sundata:/data/szc/rodinia_3.1# vim common/make.config
Set CUDA_DIR, SDK_DIR and OPENCL_DIR to your own CUDA directory, the C directory under the OpenCL SDK directory, and the OpenCL SDK directory, respectively
# CUDA toolkit installation path
CUDA_DIR = /usr/local/cuda-10.0
# CUDA toolkit libraries
CUDA_LIB_DIR := $(CUDA_DIR)/lib
ifeq ($(shell uname -m), x86_64)
ifeq ($(shell if test -d $(CUDA_DIR)/lib64; then echo T; else echo F; fi), T)
CUDA_LIB_DIR := $(CUDA_DIR)/lib64
endif
endif
# CUDA SDK installation path
#SDK_DIR = $(HOME)/NVIDIA_GPU_Computing_SDK/C
SDK_DIR = /root/NVIDIA_GPU_Computing_SDK/C
#SDK_DIR =/if10/kw5na/NVIDIA_CUDA_Computing_SDK4/C
# OPENCL
# NVIDIA_DIR
OPENCL_DIR =/root/NVIDIA_GPU_Computing_SDK
OPENCL_INC = $(OPENCL_DIR)/OpenCL/common/inc
OPENCL_LIB = $(OPENCL_DIR)/OpenCL/common/lib -lOpenCL
# AMD_DIR
# OPENCL_DIR = /if10/kw5na/Packages/AMD-APP-SDK-v2.8-RC-lnx64
# OPENCL_INC = $(OPENCL_DIR)/include/
# OPENCL_LIB = $(OPENCL_DIR)/lib/x86_64/ -lOpenCL
#ifeq ($(shell uname -m), x86_64)
# ifeq ($(shell if test -d $(OPENCL_DIR)/lib/x86_64/; then echo T; else echo F; fi), T)
# OPENCL_LIB = $(OPENCL_DIR)/lib/x86_64/
# endif
#endif
2. Then modify the common/common.mk file and change SM_VERSIONS to sm_60
.........
.SUFFIXES : .cu .cu_dbg_o .c_dbg_o .cpp_dbg_o .cu_rel_o .c_rel_o .cpp_rel_o .cubin
# Add new SM Versions here as devices with new Compute Capability are released
# SM_VERSIONS := sm_10 sm_11 sm_12 sm_13
SM_VERSIONS := sm_60
CUDA_INSTALL_PATH ?= /usr/local/cuda-10.0
.........
3. Modify cuda/dwt2d/Makefile, cuda/cfd/Makefile, cuda/particlefilter/Makefile, cuda/lavaMD/makefile and cuda/hybridsort/Makefile, changing every sm_20 and compute_20 parameter from 20 to 60. (The number indicates the compute capability, which can be looked up at https://developer.nvidia.com/cuda-gpus#compute . Although the compute capability of my GeForce GTX 1070 Ti is 6.1, building with sm_60 also compiles successfully.)
4. Modify the cuda/mummergpu/src/suffix-tree.cpp file and add the header file <unistd.h>
5. Replace the contents of the openmp/mummergpu/src directory with the files from https://github.com/rmtheis/mummergpu/tree/master/mummergpu-2.0/src (copy them over directly)
6. Modify the opencl/cfd/euler3d.cpp file and comment out the if (file==NULL) check on lines 276~278
........
//float* normals;
{
std::ifstream file(data_file_name);
// if(file==NULL){
// throw(string("can not find/open file!"));
// }
file >> nel;
.........
7. Copy some files
root@sundata:/data/szc/rodinia_3.1# cp -r /usr/local/cuda-10.0/samples/common/inc/* /root/NVIDIA_GPU_Computing_SDK/OpenCL/common/inc/
root@sundata:/data/szc/rodinia_3.1# cp -r /usr/local/cuda-10.0/samples/common/inc/* /root/NVIDIA_GPU_Computing_SDK/C/common/inc/
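The sm_20/compute_20 replacement from step 3 can also be scripted instead of edited by hand in vim. This is only a sketch: the helper name retarget_arch is hypothetical, and it assumes the architecture appears literally as sm_20 or compute_20 in the makefile. A .bak copy of each file is kept so the change can be reviewed or undone.

```shell
# Rewrite sm_<old> and compute_<old> to <new> in each given makefile.
# retarget_arch is a hypothetical helper; a .bak copy of each file is kept.
retarget_arch() {
    old=$1; new=$2; shift 2
    for f in "$@"; do
        cp "$f" "$f.bak"
        sed -e "s/sm_${old}/sm_${new}/g" \
            -e "s/compute_${old}/compute_${new}/g" "$f.bak" > "$f"
    done
}

# Usage, run from the rodinia_3.1 root (file list taken from step 3):
# retarget_arch 20 60 cuda/dwt2d/Makefile cuda/cfd/Makefile \
#     cuda/particlefilter/Makefile cuda/lavaMD/makefile cuda/hybridsort/Makefile
```

Comparing each file with its .bak copy (for example with diff) shows exactly which lines changed.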
Compile
Compile submodule
Some components have to be compiled manually
1. Under the cuda/kmeans directory
root@sundata:/data/szc/rodinia_3.1/cuda/kmeans# gcc -g -fopenmp -O2 -c kmeans.c
2. Under the cuda/leukocyte/meschach_lib directory
root@sundata:/data/szc/rodinia_3.1/cuda/leukocyte/meschach_lib# cc -c -O -DHAVE_CONFIG_H *.c
3. Under the cuda/leukocyte/CUDA directory
root@sundata:/data/szc/rodinia_3.1/cuda/leukocyte/CUDA# gcc -g -O3 -Wall -I../meschach_lib *.c -c
4. Under the openmp/leukocyte/OpenMP directory
root@sundata:/data/szc/rodinia_3.1/openmp/leukocyte/OpenMP# cc -c -O -DHAVE_CONFIG_H *.c -I../meschach_lib
5. Under the opencl/leukocyte/OpenCL directory
root@sundata:/data/szc/rodinia_3.1/opencl/leukocyte/OpenCL# cc -c -O -DHAVE_CONFIG_H *.c -I ../../../openmp/leukocyte/meschach_lib -I /usr/local/cuda-10.0/include
Compile the project
6. Finally, go back to the project root directory and run make
root@sundata:/data/szc/rodinia_3.1# make
......
make[1]: Leaving directory '/data/szc/rodinia_3.1/opencl/dwt2d'
root@sundata:/data/szc/rodinia_3.1#
As long as no error is reported at the end, the build succeeded. If errors remain, modify the makefile or the source file, or compile manually, depending on the situation.
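Because the build output is long, a failure in the middle is easy to miss. One option, which is my own suggestion and not part of the procedure above, is to save the output to a log and list the failures afterwards. GNU make reports a failed recipe on a line containing `***`. The log file name build.log and the helper name failed_targets are hypothetical.

```shell
# List the lines GNU make prints when a recipe fails (they contain "***").
# failed_targets is a hypothetical helper name.
failed_targets() {
    grep -F '***' "$1" || true
}

# Usage from the rodinia_3.1 root. The -k option tells make to keep going
# after a failure, so a single run can report several broken benchmarks.
# make -k 2>&1 | tee build.log
# failed_targets build.log
```

An empty result from failed_targets suggests that no recipe failed, though compiler warnings still have to be checked in the log itself.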
Error handling
1) If the error is similar to the following
nvcc fatal : Value 'compute_20' is not defined for option 'gpu-architecture'
Modify the makefile in the corresponding directory and change XX_20 to XX_60 (or the value that matches your GPU's compute capability)
2) If it is a code error in a source file, modify that source file, for example by adding header files, commenting out code, or changing types.
3) If an XXX.o file cannot be found, enter the corresponding directory and compile it yourself. For the command, refer to the last compile command printed above the error message, but if that command compiles a .h file, change it to the .c file. If a header file cannot be found during compilation, locate that header and pass its directory to the compiler with the -I option, similar to this
root@sundata:/data/szc/rodinia_3.1/openmp/leukocyte/OpenMP# cc -c -O -DHAVE_CONFIG_H *.c -I/data/szc/rodinia_3.1/openmp/leukocyte/meschach_lib
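Locating the missing header, as described in case 3, can be done with find. The sketch below prints every directory under a root that contains a file with the given name; the helper name header_dirs is hypothetical, and matrix.h in the usage line is only an illustrative header name.

```shell
# Print the directory of every file named <header> found under <root>.
# header_dirs is a hypothetical helper name.
header_dirs() {
    find "$1" -name "$2" -type f | while IFS= read -r path; do
        dirname "$path"
    done | sort -u
}

# Usage: header_dirs /data/szc/rodinia_3.1 matrix.h
# Then pass one of the printed directories to the compiler with -I.
```

If several directories are printed, choose the one that belongs to the same programming model (cuda, openmp or opencl) as the code being compiled.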
View Results
Finally, let's take a look at the results generated in the bin directory
Compilation is now complete
Run
Take kmeans as an example; other examples may have problems
1. Enter the data/kmeans directory
root@sundata:/data/szc/rodinia_3.1# cd data/kmeans/
2. Copy the kmeans.cl file into this directory
root@sundata:/data/szc/rodinia_3.1/data/kmeans# cp /data/szc/rodinia_3.1/opencl/kmeans/kmeans.cl .
3. Run the test sample
root@sundata:/data/szc/rodinia_3.1/data/kmeans# ../../bin/linux/opencl/kmeans -i 204800.txt
WG size of kernel_swap = 256, WG size of kernel_kmeans = 256
I/O completed
Number of objects: 204800
Number of features: 34
iterated 19 times
Number of Iteration: 1
root@sundata:/data/szc/rodinia_3.1/data/kmeans# ../../bin/linux/cuda/kmeans -i 204800.txt
I/O completed
Number of objects: 204800
Number of features: 34
iterated 2 times
Number of Iteration: 1
root@sundata:/data/szc/rodinia_3.1/data/kmeans# ../../bin/linux/omp/kmeans -i 204800.txt
I/O completed
num of threads = 1
number of Clusters 5
number of Attributes 34
Time for process: 0.491860
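The three runs above differ only in the programming-model directory, so they can be wrapped in a small loop. This is a sketch under the assumption that the binaries live in opencl, cuda and omp subdirectories, as in the commands above. The helper name run_variants is hypothetical, and it skips any variant that was not built.

```shell
# Run one benchmark from each programming-model directory under <bin_root>.
# run_variants is a hypothetical helper; remaining arguments go to the binary.
run_variants() {
    bin_root=$1; app=$2; shift 2
    for model in opencl cuda omp; do
        exe="$bin_root/$model/$app"
        if [ -x "$exe" ]; then
            echo "== $model =="
            "$exe" "$@"
        else
            echo "== $model: $app not built, skipping =="
        fi
    done
}

# Usage from the data/kmeans directory:
# run_variants ../../bin/linux kmeans -i 204800.txt
```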
Error handling
Some test samples under bin may fail to run, or the executable may be missing entirely (for example, b+tree was not in my bin directory).
A feasible solution is to switch to the directory of the programming model you want to run (such as opencl), enter the application you want to test, and compile and run it there. For example, opencl/bfs can be run directly this way.
However, opencl/b+tree fails with an error right away, probably a memory overflow.
The CUDA version runs normally; just change sm_20 in the Makefile to sm_60 before compiling
root@sundata:/data/szc/rodinia_3.1/cuda/b+tree# vim Makefile
root@sundata:/data/szc/rodinia_3.1/cuda/b+tree# make KERNEL_DIM="-DRD_WG_SIZE_0=256"
Conclusion
In short, compiling and running rodinia is a very cumbersome process, during which many problems will be encountered, and we need to find ways to solve them according to the actual situation.
Finally, I am sharing my compiled results. Many of the executables can be run directly:
Link: https://pan.baidu.com/s/1a8x9M_tRN9wqeDU627xkAw
Extraction code: cgs4
If you have any questions, you can discuss them in the comment section, and I will reply as soon as I see them.