cuda : speed-up by using CUBLAS_COMPUTE_32F instead of CUBLAS_COMPUTE_16F by ggerganov · Pull Request #3816 · ggml-org/llama.cpp · GitHub

cuda : speed-up by using CUBLAS_COMPUTE_32F instead of CUBLAS_COMPUTE_16F #3816

Closed

ggerganov wants to merge 1 commit into master
