The cblas_zgemm is a general complex matrix-matrix multiplication routine which will be intensively used in my project.
The prototype of cblas_zgemm is:
void cblas_zgemm(const enum CBLAS_ORDEROrder, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_TRANSPOSE TransB, const int M, const int N,
const int K, const void *alpha, const void *A,
const int lda, const void *B, const int ldb,
const void *beta, void *C, const int ldc);
I founda interesting problem with the CBLAS_ORDER parameter of cblas_zgemm when one dimension of matrix A/B is one, such A is 4*1 matrix and Bis 1*4 matrix (both with no transpose).
The CBLAS_ORDER is an enumuration:
enum CBLAS_ORDER {CblasRowMajor=101,CblasColMajor=102};
That is, to define whether the matrix storage is row major or column major.
I wrote a test routine:
lapack_complex_double*tu; lapack_complex_double *tcb_sm_ap4[16]; const double tmp_ap4_u[] = {1,0, -1,0, -1,0,-1,0, \ 1,0, 0,-1, 1,0, 0,1, \ 1,0, 1,0, -1,0, 1,0, \ 1,0, 0,1, 1,0, 0,-1, \ 1,0, -INV_SQRT_2,-INV_SQRT_2,0,-1, INV_SQRT_2,-INV_SQRT_2, \ 1,0, INV_SQRT_2,-INV_SQRT_2,0,1, -INV_SQRT_2,-INV_SQRT_2, \ 1,0, INV_SQRT_2,INV_SQRT_2,0,-1, -INV_SQRT_2,INV_SQRT_2, \ 1,0, -INV_SQRT_2,INV_SQRT_2,0,1, INV_SQRT_2,INV_SQRT_2, \ 1,0, -1,0, 1,0, 1,0, \ 1,0, 0,-1, -1,0, 0,-1, \ 1,0, 1,0, 1,0, -1,0, \ 1,0, 0,1, -1,0, 0,1, \ 1,0, -1,0, -1,0, 1,0, \ 1,0, -1,0, 1,0, -1,0, \ 1,0, 1,0, -1,0, -1,0, \ 1,0, 1,0, 1,0, 1,0}; tu = newlapack_complex_double[16 * 4]; memcpy(tu, tmp_ap4_u,sizeof(tmp_ap4_u)); for(int i = 0; i< 16; ++i) { const double tmp_i4[] = {1,0,0,0, 0,0, 0,0, \ 0,0, 1,0,0,0, 0,0, \ 0,0, 0,0,1,0, 0,0, \ 0,0, 0,0,0,0, 1,0}; tcb_sm_ap4[i] = newlapack_complex_double[4*4]; memcpy(tcb_sm_ap4[i], tmp_i4,sizeof(tmp_i4)); lapack_complex_doublealpha; cblas_zdotc_sub(4, tu+4*i,1, tu+4*i, 1, &alpha); alpha =complex_div(make_complex(-2, 0), alpha); lapack_complex_doublebeta = make_complex(1, 0); cblas_zgemm(CblasColMajor,CblasNoTrans, CblasConjTrans, 4, 4, 1, &alpha,tu+4*i, 4, tu+4*i, 4, &beta, tcb_sm_ap4[i],4); //debug std::cout<< "i = "<< i <<", C = \n"; print_matrix(4, 4,tcb_sm_ap4[i]); } delete[] tu; for(int i = 0; i < 16; ++i) { delete[] tcb_sm_ap4[i]; }Note that, the above routine is to generate the Wn matrix (Wn =I - 2uu'/u'u, where u' is the conjugate transpose of u)as specified in table table 6.3.4.2.3-2 of 36.211, which is the precoding codebooks for four-antenna port cases.
also note that the print_matrix function will print the complex matrix to console.
If the cblas_zgemm is called with Order=CblasRowMajor, you will get weird results:
i = 0, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 1, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 2, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 3, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 4, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 5, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 6, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 7, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 8, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 9, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 10, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 11, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 12, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 i = 13, C = 0.5 -0.5 -0.5 1.26509e-098-1.32849e+303j -0.5 0.5 -0.5 1.26509e-098-1.32849e+303j -0.5 -0.5 0.5 1.26509e-098-1.32849e+303j 1.26509e-098+1.32849e+303j 1.26509e-098+1.32849e+303j1.26509e-098+1.32849e+303j -1.#INF i = 14, C = 0.5 -0.5 1.26509e-098-1.32849e+303j -0.25 -0.5 0.5 1.26509e-098-1.32849e+303j -0.25 1.26509e-098+1.32849e+303j 1.26509e-098+1.32849e+303j -1.#INF6.32543e-099+6.642 46e+302j -0.25 -0.25 6.32543e-099-6.64246e+302j 0.875 i = 15, C = 0.5 1.26509e-098-1.32849e+303j -0.25 0.25 1.26509e-098+1.32849e+303j -1.#INF 6.32543e-099+6.64246e+302j-6.32543e-099-6.64 246e+302j -0.25 6.32543e-099-6.64246e+302j 0.875 0.125 0.25 -6.32543e-099+6.64246e+302j 0.125 0.875 Press any key to continue . . .But when replacing the Order with CblasColMajor, it works properly:
i = 0, C = 0.5 0.5 0.5 0.5 0.5 0.5 -0.5 -0.5 0.5 -0.5 0.5 -0.5 0.5 -0.5 -0.5 0.5 i = 1, C = 0.5 0.5j -0.5 -0.5j -0.5j 0.5 -0.5j 0.5 -0.5 0.5j 0.5 -0.5j 0.5j 0.5 0.5j 0.5 i = 2, C = 0.5 -0.5 0.5 -0.5 -0.5 0.5 0.5 -0.5 0.5 0.5 0.5 0.5 -0.5 -0.5 0.5 0.5 i = 3, C = 0.5 -0.5j -0.5 0.5j 0.5j 0.5 0.5j 0.5 -0.5 -0.5j 0.5 0.5j -0.5j 0.5 -0.5j 0.5 i = 4, C = 0.5 0.353553+0.353553j 0.5j -0.353553+0.353553j 0.353553-0.353553j 0.5 -0.353553-0.353553j -0.5j -0.5j -0.353553+0.353553j 0.5 -0.353553-0.353553j -0.353553-0.353553j 0.5j -0.353553+0.353553j 0.5 i = 5, C = 0.5 -0.353553+0.353553j -0.5j 0.353553+0.353553j -0.353553-0.353553j 0.5 0.353553-0.353553j 0.5j 0.5j 0.353553+0.353553j 0.5 0.353553-0.353553j 0.353553-0.353553j -0.5j 0.353553+0.353553j 0.5 i = 6, C = 0.5 -0.353553-0.353553j 0.5j 0.353553-0.353553j -0.353553+0.353553j 0.5 0.353553+0.353553j -0.5j -0.5j 0.353553-0.353553j 0.5 0.353553+0.353553j 0.353553+0.353553j 0.5j 0.353553-0.353553j 0.5 i = 7, C = 0.5 0.353553-0.353553j -0.5j -0.353553-0.353553j 0.353553+0.353553j 0.5 -0.353553+0.353553j 0.5j 0.5j -0.353553-0.353553j 0.5 -0.353553+0.353553j -0.353553+0.353553j -0.5j -0.353553-0.353553j 0.5 i = 8, C = 0.5 0.5 -0.5 -0.5 0.5 0.5 0.5 0.5 -0.5 0.5 0.5 -0.5 -0.5 0.5 -0.5 0.5 i = 9, C = 0.5 0.5j 0.5 0.5j -0.5j 0.5 0.5j -0.5 0.5 -0.5j 0.5 -0.5j -0.5j -0.5 0.5j 0.5 i = 10, C = 0.5 -0.5 -0.5 0.5 -0.5 0.5 -0.5 0.5 -0.5 -0.5 0.5 0.5 0.5 0.5 0.5 0.5 i = 11, C = 0.5 -0.5j 0.5 -0.5j 0.5j 0.5 -0.5j -0.5 0.5 0.5j 0.5 0.5j 0.5j -0.5 -0.5j 0.5 i = 12, C = 0.5 0.5 0.5 -0.5 0.5 0.5 -0.5 0.5 0.5 -0.5 0.5 0.5 -0.5 0.5 0.5 0.5 i = 13, C = 0.5 0.5 -0.5 0.5 0.5 0.5 0.5 -0.5 -0.5 0.5 0.5 0.5 0.5 -0.5 0.5 0.5 i = 14, C = 0.5 -0.5 0.5 0.5 -0.5 0.5 0.5 0.5 0.5 0.5 0.5 -0.5 0.5 0.5 -0.5 0.5 i = 15, C = 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 -0.5 -0.5 -0.5 -0.5 0.5 Press any key to continue . . .Since operator A is a 4*1 matrix, row-major or column-major should have the same storage layout,but the cblas_zgemm seems prefer column-major.