OpenACC: Computing Pi (Simple Version)

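▶ A simple program from the book that computes pi. Its main point is calling a user-defined function (declared with `#pragma acc routine seq`) from inside a parallel loop.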

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <math.h>
 4 #include <openacc.h>
 5 
 6 #define N   100
 7 
 8 #pragma acc routine seq
 9 float ff(const float x)
10 {    
11     return 4.0f / (1.0f + x * x);
12 }
13 
14 int main()
15 {
16     const float h = 1.0f / N;
17     float sumf = 0, result;
18            
19 #pragma acc parallel loop reduction(+:sumf)
20     for (int i = 0; i < N; i++)
 21         sumf += ff(h * (i - 0.5f));   // samples at h*(i - 0.5); the midpoint of sub-interval i (i from 0) is h*(i + 0.5)
22 
23     result = h * sumf;    
24     printf("\nN = %d, myPi = %f, diff = %e\n", N, result, result / 3.141592653589793238 - 1);
25     //getchar();
26     return 0;
27 }

● Output

D:\Code\OpenACC\OpenACCProject\OpenACCProject>pgcc main.c -acc -Minfo -o main_acc.exe
ff:
     10, Generating acc routine seq
         Generating Tesla code
     11, FMA (fused multiply-add) instruction(s) generated
main:
     19, Accelerator kernel generated
         Generating Tesla code
         20, #pragma acc loop gang, vector(100) /* blockIdx.x threadIdx.x */
             Generating reduction(+:sumf)
     19, Generating implicit copy(sumf)

D:\Code\OpenACC\OpenACCProject\OpenACCProject>main_acc.exe
launch CUDA kernel  file=D:\Code\OpenACC\OpenACCProject\OpenACCProject\main.c function=main line=19 device=0 threadid=1 num_gangs=1 num_workers=1 vector_length=100 grid=1 block=100 shared memory=1024
launch CUDA kernel  file=D:\Code\OpenACC\OpenACCProject\OpenACCProject\main.c function=main line=19 device=0 threadid=1 num_gangs=1 num_workers=1 vector_length=256 grid=1 block=256 shared memory=1024

N = 100, myPi = 3.161500, diff = 6.336546e-03
PGI: "acc_shutdown" not detected, performance results might be incomplete.
 Please add the call "acc_shutdown(acc_device_nvidia)" to the end of your application to ensure that the performance results are complete.

Accelerator Kernel Timing data
D:\Code\OpenACC\OpenACCProject\OpenACCProject\main.c
  main  NVIDIA  devicenum=0
    time(us): 11
    19: compute region reached 1 time
        19: kernel launched 1 time
            grid: [1]  block: [100]
            elapsed time(us): total=1000 max=1000 min=1000 avg=1000
        19: reduction kernel launched 1 time
            grid: [1]  block: [256]
             device time(us): total=0 max=0 min=0 avg=0
    19: data region reached 2 times
        19: data copyin transfers: 1
             device time(us): total=4 max=4 min=4 avg=4
        23: data copyout transfers: 1
             device time(us): total=7 max=7 min=7 avg=7

Reposted from www.cnblogs.com/cuancuancuanhao/p/9419429.html