cublasMatmulBench is not a tool distributed through official channels, but judging from material on NVIDIA's GitHub it appears to be a tool obtained from certain sources, and it runs normally in our environment.
Today I'll introduce how to use this tool. Usage is fairly simple; it is mainly for GEMM (matrix multiplication) benchmarking.
Let's first look at the built-in help:
./cublasMatmulBench -h
<options>
 -h    : display this help
 -help : display this help
Test-specific options :
 -P={sss,sss_fast_tf32,ccc,hhh,ddd,zzz,hsh,hss,bisb_imma,bii_imma,qqssq,tss,tst}
    sss: input type: float, math type: float, scale type: float, output type: float
    sss_fast_tf32: input type: float, math type: float, scale type: float, output type: float
    ccc: input type: complex, math type: complex, scale type: complex, output type: complex
    hhh: input type: half, math type: half, scale type: half, output type: half
    ddd: input type: double, math type: double, scale type: double, output type: double
    zzz: input type: double complex, math type: double complex, scale type: double complex, output type: double complex
    hsh: input type: half, math type: float, output type: half, scale type: float
    hss: input type: half, math type: float, output type: float, scale type: float
    bisb_imma: input type: int8, math type: int32, scale type: float, output type: int8
    bii_imma: input type: int8, math type: int32, scale type: int32, output type: int32
    qqssq: input type a: fp8_e4m3, input type b: fp8_e4m3, input type c: bfloat16, math type: float, scale type: float, output type: fp8_e4m3
    tss: input type: bfloat16, math type: float, output type: float, scale type: float
    tst: input type: bfloat16, math type: float, output type: bfloat16, scale type: float
 -m=<int> : number of rows of A and C
 -n=<int> : number of columns of B and C
 -k=<int> : number of columns of A and rows of B
 -A=<float> : value of alpha
 -B=<float> : value of beta
 -T=<int> : run N times back to back, good for power consumption, no results checking
 -lda=<int> : leading dimension of A, m by default
 -ldb=<int> : leading dimension of B, k by default
 -ldc=<int> : leading dimension of C, m by default
 -ta= op(A) {0=no transpose, 1=transpose, 2=hermitian}
 -tb= op(B) {0=no transpose, 1=transpose, 2=hermitian}
 -p=<int> : 0: fill all matrices with zero, otherwise fill with pseudorandom distribution
 -m_outOfPlace=<int> : out of place (C != D), 0: disable, 1: enable
 -m_epilogue={Default,Bias,Gelu,ReLu,ReLuBias,GeluBias}
 -z<a|b|c|d>=<0|1> : zero-copy, A, B, C or D is pinned on the Host, 0: disable, 1: enable
 -s : shows the CUDA configuration of the machines
Then run tests at the different precisions.
INT8
./cublasMatmulBench -P=bisb_imma -m=8192 -n=3456 -k=16384 -T=1000 -ta=1 -B=0
FP16
./cublasMatmulBench -P=hsh -m=12288 -n=9216 -k=32768 -T=1000 -tb=1 -B=0
TF32
./cublasMatmulBench -P=sss_fast_tf32 -m=8192 -n=3456 -k=16384 -T=1000 -ta=1 -B=0
FP32
./cublasMatmulBench -P=sss -m=3456 -n=2048 -k=16384 -T=1000 -tb=1 -B=0
FP64
./cublasMatmulBench -P=ddd -m=3456 -n=2048 -k=16384 -T=1000 -tb=1 -B=0
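To sanity-check the throughput these runs produce, note that a single m x n x k GEMM performs 2*m*n*k floating-point operations (one multiply and one add per inner-product term). A minimal Python helper for converting a measured per-iteration time into TFLOPS; this is my own sketch, not part of the tool, and the gemm_tflops name and the example timing are purely illustrative:

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Theoretical TFLOPS of one m x n x k GEMM executed in `seconds`.

    A GEMM does m*n*k multiply-add pairs, i.e. 2*m*n*k floating-point ops.
    """
    return 2.0 * m * n * k / seconds / 1e12

# Example: the FP16 shape above (m=12288, n=9216, k=32768) with a
# hypothetical 12 ms per iteration (not a measured result):
print(f"{gemm_tflops(12288, 9216, 32768, 0.012):.1f} TFLOPS")
```

Dividing the total wall time of a -T=1000 run by 1000 gives the per-iteration time to plug in here.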
Copyright notice: unless otherwise stated, all articles on this site are original.
When reposting, please credit the source: https://sulao.cn/post/1065