Benchmarking GPUs with cublasMatmulBench on Linux

cublasMatmulBench is not a tool distributed through official channels, but judging from material on NVIDIA's GitHub it appears to be a utility that circulates through certain channels, and it runs normally in a standard CUDA environment.

This post covers how to use it. Usage is fairly simple; the tool mainly benchmarks GEMM (matrix-multiply) performance.

Let's start with the help output:

./cublasMatmulBench -h <options>
-h          : display this help
-help       : display this help
Test-specific options :
-P={sss,sss_fast_tf32,ccc,hhh,ddd,zzz,hsh,hss,bisb_imma,bii_imma,qqssq,tss,tst}
    sss: input type: float, math type: float, scale type: float, output type: float
    sss_fast_tf32: input type: float, math type: float, scale type: float, output type: float
    ccc: input type: complex, math type: complex, scale type: complex, output type: complex
    hhh: input type: half, math type: half, scale type: half, output type: half
    ddd: input type: double, math type: double, scale type: double, output type: double
    zzz: input type: double complex, math type: double complex, scale type: double complex, output type: double complex
    hsh: input type: half, math type: float, output type: half , scale type: float
    hss: input type: half, math type: float, output type: float, scale type: float
    bisb_imma: input type: int8, math type: int32, scale type: float, output type: int8
    bii_imma: input type: int8, math type: int32, scale type: int32, output type: int32
    qqssq: input type a: fp8_e4m3, input type b: fp8_e4m3, input type c: bfloat16, math type: float, scale type: float, output type: fp8_e4m3
    tss: input type: bfloat16, math type: float, output type: float , scale type: float
    tst: input type: bfloat16, math type: float, output type: bfloat16 , scale type: float
-m=<int>  : number of rows of A and C
-n=<int>  : number of columns of B and C
-k=<int>  : number of columns of A and rows of B
-A=<float>  : value of alpha
-B=<float>  : value of beta
-T=<int> : run N times back to back  , good for power consumption, no results checking
-lda=<int> : leading dimension of A , m by default
-ldb=<int> : leading dimension of B , k by default
-ldc=<int> : leading dimension of C , m by default
-ta= op(A) {0=no transpose, 1=transpose, 2=hermitian}
-tb= op(B) {0=no transpose, 1=transpose, 2=hermitian}
-p=<int> : 0:fill all matrices with zero, otherwise fill with pseudorandom distribution
-m_outOfPlace=<int> : out of place (C != D), 0: disable, 1:enable
-m_epilogue={Default,Bias,Gelu,ReLu,ReLuBias,GeluBias}
-z<a|b|c|d>=<0|1> : zero-copy , A,B,C or D is pinned on the Host, 0: disable, 1:enable
-s : shows the CUDA configuration of the machines
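
A few of these options are worth noting before running the tests. alpha and beta follow the standard cuBLAS GEMM semantics: each iteration computes C = alpha * op(A) * op(B) + beta * C (written to a separate D when -m_outOfPlace=1), so passing -B=0 in the commands below skips reading and accumulating C. It is also useful to run

./cublasMatmulBench -s

first to confirm which GPU and CUDA configuration the benchmark sees.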

Then run the test for each precision:

INT8
./cublasMatmulBench -P=bisb_imma -m=8192 -n=3456 -k=16384 -T=1000 -ta=1 -B=0
FP16
./cublasMatmulBench -P=hsh -m=12288 -n=9216 -k=32768 -T=1000 -tb=1 -B=0
TF32
./cublasMatmulBench -P=sss_fast_tf32 -m=8192 -n=3456 -k=16384 -T=1000 -ta=1 -B=0
FP32
./cublasMatmulBench -P=sss -m=3456 -n=2048 -k=16384 -T=1000 -tb=1 -B=0
FP64
./cublasMatmulBench -P=ddd -m=3456 -n=2048 -k=16384 -T=1000 -tb=1 -B=0
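
A GEMM of size m x n x k performs roughly 2*m*n*k floating-point operations, so the per-iteration time reported by the benchmark can be converted into throughput. Below is a minimal sketch in Python; the dimensions match the FP16 command above, and the elapsed time is a placeholder you substitute with the per-GEMM time from the actual benchmark output:

def gemm_tflops(m, n, k, seconds_per_gemm):
    # One m x n x k GEMM does about 2*m*n*k floating-point operations
    # (each of the m*n*k multiply-accumulates counts as two ops).
    flops = 2.0 * m * n * k
    return flops / seconds_per_gemm / 1e12

# Example: the FP16 run above (m=12288, n=9216, k=32768).
# 0.01 s is a placeholder; use the measured per-GEMM time instead.
print(gemm_tflops(12288, 9216, 32768, seconds_per_gemm=0.01))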
