Problem with the CBLAS_ORDER parameter of cblas_zgemm

The cblas_zgemm is a general complex matrix-matrix multiplication routine which will be intensively used in my project.

The prototype of cblas_zgemm is:

void cblas_zgemm(const enum CBLAS_ORDEROrder, const enum CBLAS_TRANSPOSE TransA,
                const enum CBLAS_TRANSPOSE TransB, const int M, const int N,
                const int K, const void *alpha, const void *A,
                const int lda, const void *B, const int ldb,
                const void *beta, void *C, const int ldc);

I founda interesting problem with the CBLAS_ORDER parameter of cblas_zgemm when one dimension of matrix A/B is one, such A is 4*1 matrix and Bis 1*4 matrix (both with no transpose).

The CBLAS_ORDER is an enumuration:

enum CBLAS_ORDER {CblasRowMajor=101,CblasColMajor=102};

That is, to define whether the matrix storage is row major or column major.

I wrote a test routine:

lapack_complex_double*tu;
 lapack_complex_double *tcb_sm_ap4[16];
 const double tmp_ap4_u[] = {1,0, -1,0, -1,0,-1,0, \
  1,0, 0,-1, 1,0, 0,1, \
  1,0, 1,0, -1,0, 1,0, \
  1,0, 0,1, 1,0, 0,-1, \
  1,0, -INV_SQRT_2,-INV_SQRT_2,0,-1, INV_SQRT_2,-INV_SQRT_2, \
  1,0, INV_SQRT_2,-INV_SQRT_2,0,1, -INV_SQRT_2,-INV_SQRT_2, \
  1,0, INV_SQRT_2,INV_SQRT_2,0,-1, -INV_SQRT_2,INV_SQRT_2, \
  1,0, -INV_SQRT_2,INV_SQRT_2,0,1, INV_SQRT_2,INV_SQRT_2, \
  1,0, -1,0, 1,0, 1,0, \
  1,0, 0,-1, -1,0, 0,-1, \
  1,0, 1,0, 1,0, -1,0, \
  1,0, 0,1, -1,0, 0,1, \
  1,0, -1,0, -1,0, 1,0, \
  1,0, -1,0, 1,0, -1,0, \
  1,0, 1,0, -1,0, -1,0, \
  1,0, 1,0, 1,0, 1,0};

 tu = newlapack_complex_double[16 * 4];
 memcpy(tu, tmp_ap4_u,sizeof(tmp_ap4_u));

 for(int i = 0; i< 16; ++i)
 {
  const double tmp_i4[] = {1,0,0,0, 0,0, 0,0, \
   0,0, 1,0,0,0, 0,0, \
   0,0, 0,0,1,0, 0,0, \
   0,0, 0,0,0,0, 1,0};
  tcb_sm_ap4[i] = newlapack_complex_double[4*4];
  memcpy(tcb_sm_ap4[i], tmp_i4,sizeof(tmp_i4)); 

  lapack_complex_doublealpha;
  cblas_zdotc_sub(4, tu+4*i,1,  tu+4*i, 1, &alpha);
  alpha =complex_div(make_complex(-2, 0), alpha);

  lapack_complex_doublebeta = make_complex(1, 0);
  cblas_zgemm(CblasColMajor,CblasNoTrans, CblasConjTrans, 4, 4, 1, &alpha,tu+4*i, 4, tu+4*i, 4, &beta, tcb_sm_ap4[i],4);

  //debug
  std::cout<< "i = "<< i <<", C = \n";
  print_matrix(4, 4,tcb_sm_ap4[i]);  
 } 

 delete[] tu;
 for(int i = 0; i < 16; ++i)
 {
  delete[] tcb_sm_ap4[i];
 }

Note that, the above routine is to generate the Wn matrix (Wn =I - 2uu'/u'u, where u' is the conjugate transpose of u)as specified in table table 6.3.4.2.3-2 of 36.211, which is the precoding codebooks for four-antenna port cases.

also note that the print_matrix function will print the complex matrix to console.

If the cblas_zgemm is called with Order=CblasRowMajor, you will get weird results:

i = 0, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 1, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 2, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 3, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 4, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 5, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 6, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 7, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 8, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 9, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 10, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 11, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 12, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
i = 13, C =
0.5 -0.5 -0.5 1.26509e-098-1.32849e+303j
-0.5 0.5 -0.5 1.26509e-098-1.32849e+303j
-0.5 -0.5 0.5 1.26509e-098-1.32849e+303j
1.26509e-098+1.32849e+303j 1.26509e-098+1.32849e+303j1.26509e-098+1.32849e+303j
 -1.#INF
i = 14, C =
0.5 -0.5 1.26509e-098-1.32849e+303j -0.25
-0.5 0.5 1.26509e-098-1.32849e+303j -0.25
1.26509e-098+1.32849e+303j 1.26509e-098+1.32849e+303j -1.#INF6.32543e-099+6.642
46e+302j
-0.25 -0.25 6.32543e-099-6.64246e+302j 0.875
i = 15, C =
0.5 1.26509e-098-1.32849e+303j -0.25 0.25
1.26509e-098+1.32849e+303j -1.#INF 6.32543e-099+6.64246e+302j-6.32543e-099-6.64
246e+302j
-0.25 6.32543e-099-6.64246e+302j 0.875 0.125
0.25 -6.32543e-099+6.64246e+302j 0.125 0.875
Press any key to continue . . .

But when replacing the Order with CblasColMajor, it works properly:

i = 0, C =
0.5 0.5 0.5 0.5
0.5 0.5 -0.5 -0.5
0.5 -0.5 0.5 -0.5
0.5 -0.5 -0.5 0.5
i = 1, C =
0.5 0.5j -0.5 -0.5j
-0.5j 0.5 -0.5j 0.5
-0.5 0.5j 0.5 -0.5j
0.5j 0.5 0.5j 0.5
i = 2, C =
0.5 -0.5 0.5 -0.5
-0.5 0.5 0.5 -0.5
0.5 0.5 0.5 0.5
-0.5 -0.5 0.5 0.5
i = 3, C =
0.5 -0.5j -0.5 0.5j
0.5j 0.5 0.5j 0.5
-0.5 -0.5j 0.5 0.5j
-0.5j 0.5 -0.5j 0.5
i = 4, C =
0.5 0.353553+0.353553j 0.5j -0.353553+0.353553j
0.353553-0.353553j 0.5 -0.353553-0.353553j -0.5j
-0.5j -0.353553+0.353553j 0.5 -0.353553-0.353553j
-0.353553-0.353553j 0.5j -0.353553+0.353553j 0.5
i = 5, C =
0.5 -0.353553+0.353553j -0.5j 0.353553+0.353553j
-0.353553-0.353553j 0.5 0.353553-0.353553j 0.5j
0.5j 0.353553+0.353553j 0.5 0.353553-0.353553j
0.353553-0.353553j -0.5j 0.353553+0.353553j 0.5
i = 6, C =
0.5 -0.353553-0.353553j 0.5j 0.353553-0.353553j
-0.353553+0.353553j 0.5 0.353553+0.353553j -0.5j
-0.5j 0.353553-0.353553j 0.5 0.353553+0.353553j
0.353553+0.353553j 0.5j 0.353553-0.353553j 0.5
i = 7, C =
0.5 0.353553-0.353553j -0.5j -0.353553-0.353553j
0.353553+0.353553j 0.5 -0.353553+0.353553j 0.5j
0.5j -0.353553-0.353553j 0.5 -0.353553+0.353553j
-0.353553+0.353553j -0.5j -0.353553-0.353553j 0.5
i = 8, C =
0.5 0.5 -0.5 -0.5
0.5 0.5 0.5 0.5
-0.5 0.5 0.5 -0.5
-0.5 0.5 -0.5 0.5
i = 9, C =
0.5 0.5j 0.5 0.5j
-0.5j 0.5 0.5j -0.5
0.5 -0.5j 0.5 -0.5j
-0.5j -0.5 0.5j 0.5
i = 10, C =
0.5 -0.5 -0.5 0.5
-0.5 0.5 -0.5 0.5
-0.5 -0.5 0.5 0.5
0.5 0.5 0.5 0.5
i = 11, C =
0.5 -0.5j 0.5 -0.5j
0.5j 0.5 -0.5j -0.5
0.5 0.5j 0.5 0.5j
0.5j -0.5 -0.5j 0.5
i = 12, C =
0.5 0.5 0.5 -0.5
0.5 0.5 -0.5 0.5
0.5 -0.5 0.5 0.5
-0.5 0.5 0.5 0.5
i = 13, C =
0.5 0.5 -0.5 0.5
0.5 0.5 0.5 -0.5
-0.5 0.5 0.5 0.5
0.5 -0.5 0.5 0.5
i = 14, C =
0.5 -0.5 0.5 0.5
-0.5 0.5 0.5 0.5
0.5 0.5 0.5 -0.5
0.5 0.5 -0.5 0.5
i = 15, C =
0.5 -0.5 -0.5 -0.5
-0.5 0.5 -0.5 -0.5
-0.5 -0.5 0.5 -0.5
-0.5 -0.5 -0.5 0.5
Press any key to continue . . .

Since operator A is a 4*1 matrix, row-major or column-major should have the same storage layout,but the cblas_zgemm seems prefer column-major.

Problem with the CBLAS_ORDER parameter of cblas_zgemm

猜你喜欢