ROCm Port by SlyEcho · Pull Request #1087 · ggml-org/llama.cpp · GitHub
ROCm Port #1087

Merged
merged 105 commits into from
Aug 25, 2023
105 commits
0fd8363
use hipblas based on cublas
SlyEcho Apr 19, 2023
54a63c1
Update Makefile for the Cuda kernels
SlyEcho Apr 20, 2023
0e005f7
Build file changes
SlyEcho Apr 20, 2023
d3e1984
add rpath
SlyEcho Apr 21, 2023
3677235
More build file changes
SlyEcho Apr 22, 2023
db7a012
Merge 'origin/master' into hipblas
SlyEcho Apr 23, 2023
3a004b2
add rpath
SlyEcho Apr 23, 2023
608aa33
change default GPU arch to match CMake
SlyEcho Apr 25, 2023
d571d16
Merge 'origin/master' into hipblas
SlyEcho Apr 25, 2023
ef51e9e
Merge branch 'ggerganov:master' into hipblas
SlyEcho Apr 26, 2023
ecc0565
only .cu file needs to be complied as device
SlyEcho Apr 27, 2023
a1caa48
add more cuda defines
SlyEcho Apr 28, 2023
3b4a531
Merge 'origin/master' into hipblas
SlyEcho Apr 28, 2023
2ab9d11
Merge 'origin/master' into hipblas
SlyEcho Apr 28, 2023
d194586
Merge 'origin/master' into hipblas
SlyEcho Apr 28, 2023
d8ea75e
Merge 'origin/master' into hipblas
SlyEcho Apr 29, 2023
c73def1
Merge 'origin/master' into hipblas
SlyEcho Apr 30, 2023
fcbc262
Merge 'origin/master' into hipblas
SlyEcho May 1, 2023
b67cc50
Merge 'origin/master' into hipblas
SlyEcho May 3, 2023
d83cfba
Merge 'origin/master' into hipblas
SlyEcho May 4, 2023
04c0d48
Move all HIP stuff to ggml-cuda.cu
SlyEcho May 4, 2023
1107194
Merge 'origin/master' into hipblas
SlyEcho May 5, 2023
289073a
Merge 'origin/master' into hipblas
SlyEcho May 6, 2023
baeb482
Revert to default copy
SlyEcho May 7, 2023
0aefa6a
Merge 'origin/master' into hipblas
SlyEcho May 7, 2023
a3296d5
Merge 'origin/master' into hipblas
SlyEcho May 7, 2023
070cbcc
occupanct function
SlyEcho May 7, 2023
127f68e
Merge 'origin/master' into hipblas
SlyEcho May 11, 2023
605560d
Merge 'origin/master' into hipblas
SlyEcho May 12, 2023
0fe6384
fix makefile
SlyEcho May 12, 2023
2956630
Merge 'origin/master' into hipblas
SlyEcho May 13, 2023
8bab456
Merge 'origin/master' into hipblas
SlyEcho May 14, 2023
a0b2d5f
Merge 'origin/master' into hipblas
SlyEcho May 16, 2023
c66115b
Merge 'origin/master' into hipblas
SlyEcho May 20, 2023
b19fefe
Forwardcompat
SlyEcho May 20, 2023
600ace3
update warp size
SlyEcho May 20, 2023
f80ce7a
Merge branch 'origin/master' into hipblas
SlyEcho May 24, 2023
174bf6a
Merge 'origin/master' into hipblas
SlyEcho May 25, 2023
a593a4f
Add missing parameters
SlyEcho May 25, 2023
30d921a
and makefile
SlyEcho May 25, 2023
4c8b3fb
add configurable vars
SlyEcho May 25, 2023
a4648c1
Merge 'origin/master' into hipblas
SlyEcho May 27, 2023
9fdaa1d
Add more defs
SlyEcho May 27, 2023
33091a9
Merge 'origin/master' into hipblas
SlyEcho Jun 6, 2023
5d6eb72
warp size fixes
SlyEcho Jun 6, 2023
1ba4ce4
Revert "warp size fixes"
SlyEcho Jun 6, 2023
fa5b3d7
fix makefile.
SlyEcho Jun 6, 2023
4362e80
Merge 'origin/master' into hipblas
SlyEcho Jun 6, 2023
85f902d
Merge 'origin/master' into hipblas
SlyEcho Jun 8, 2023
a836529
Merge 'origin/master' into hipblas
SlyEcho Jun 14, 2023
61df8e9
add cudaMemset
SlyEcho Jun 14, 2023
6f7c156
Merge 'origin/master' into hipblas
SlyEcho Jun 17, 2023
67e229b
Merge 'origin/master' into hipblas
SlyEcho Jun 17, 2023
5dd2fbe
Merge 'origin/master' into hipblas
SlyEcho Jun 19, 2023
df7346c
Merge 'origin/master' into hipblas
SlyEcho Jun 22, 2023
35a6031
Merge 'origin/master' into hipblas
SlyEcho Jun 25, 2023
c1e5c83
Merge 'origin/master' into hipblas
SlyEcho Jun 25, 2023
c8ae945
Merge 'origin/master' into hipblas
SlyEcho Jun 27, 2023
bb16eff
headers fix; add kquants_iter for hipblas and add gfx803 (#1)
YellowRoseCx Jun 28, 2023
04419f1
Merge 'origin/master' into hipblas
SlyEcho Jun 28, 2023
15db19a
Merge 'origin/master' into hipblas
SlyEcho Jul 2, 2023
c3e3733
ROCm fixes
SlyEcho Jul 2, 2023
7735c5a
Merge 'origin/master' into hipblas
SlyEcho Jul 4, 2023
80e4e54
Merge 'origin/master' into hipblas
SlyEcho Jul 9, 2023
e610466
Expand arch list and make it overrideable
SlyEcho Jul 11, 2023
8c2c497
Merge 'origin/master' into hipblas
SlyEcho Jul 11, 2023
afcb8fe
Add new config option
SlyEcho Jul 11, 2023
cd36b18
Merge 'origin/master' into hipblas
SlyEcho Jul 13, 2023
2ec4466
Update build flags.
SlyEcho Jul 13, 2023
3db70b5
Merge 'origin/master' into hipblas
SlyEcho Jul 17, 2023
1f6294d
Fix multi GPU on multiple amd architectures with rocblas_initialize()…
YellowRoseCx Jul 24, 2023
8e8054a
Add rocblas to build files
SlyEcho Jul 24, 2023
cde52d6
Merge 'origin/master' into hipblas
SlyEcho Jul 24, 2023
d2ade63
Merge 'origin/master' into hipblas
SlyEcho Jul 29, 2023
f8e3fc6
rocblas init stuff
SlyEcho Jul 29, 2023
4336231
add hipBLAS to README
SlyEcho Jul 29, 2023
c1664a0
Merge 'origin/master' into hipblas
SlyEcho Jul 31, 2023
c1cb70d
new build arg LLAMA_CUDA_MMQ_Y
SlyEcho Jul 31, 2023
d91456a
fix half2 decomposition
ardfork Jul 31, 2023
ab62128
Merge 'origin/master' into hipblas
SlyEcho Aug 8, 2023
4024f91
Add intrinsics polyfills for AMD
SlyEcho Aug 8, 2023
610ba4c
Merge 'origin/master' into hipblas
SlyEcho Aug 9, 2023
8f8ab6c
hipLDFLAG Path change Unix to multisystem in Makefile
YellowRoseCx Aug 9, 2023
29a59b5
Fix merge
SlyEcho Aug 10, 2023
f41920e
AMD assembly optimized __dp4a
Engininja2 Aug 10, 2023
42e055d
ws fix
SlyEcho Aug 10, 2023
e6b6ae5
Undo mess
SlyEcho Aug 11, 2023
c299c4a
New __dp4a assembly
Engininja2 Aug 11, 2023
b815e97
Merge 'origin/master' into hipblas
SlyEcho Aug 11, 2023
4e58a05
Allow overriding CC_TURING
SlyEcho Aug 11, 2023
6415610
gfx1100 support
SlyEcho Aug 12, 2023
70e2f7c
Merge 'origin/master' into hipblas
SlyEcho Aug 14, 2023
68e79cc
Merge 'origin/master' into hipblas
SlyEcho Aug 16, 2023
3de6a9a
reenable LLAMA_CUDA_FORCE_DMMV
SlyEcho Aug 16, 2023
bbbc0ce
makefile rewrite
SlyEcho Aug 16, 2023
c88c2a9
probably lld is not required
SlyEcho Aug 16, 2023
423db74
Merge 'origin/master' into hipblas
SlyEcho Aug 21, 2023
391dd9a
Merge 'origin/master' into hipblas
SlyEcho Aug 22, 2023
5d3e7b2
use "ROCm" instead of "CUDA"
SlyEcho Aug 22, 2023
7b84217
Merge 'origin/master' into hipblas
SlyEcho Aug 24, 2023
058f905
ignore all build dirs
SlyEcho Aug 24, 2023
a60231f
Add Dockerfiles
SlyEcho Aug 24, 2023
81ecaa4
fix llama-bench
SlyEcho Aug 24, 2023
238335f
fix -nommq help for non CUDA/HIP
SlyEcho Aug 24, 2023
9035cfc
Merge 'origin/master' into hipblas
SlyEcho Aug 25, 2023
Merge 'origin/master' into hipblas
SlyEcho committed Apr 23, 2023
commit db7a01297e691caaf670a3afd197d2802af78d67
8 changes: 4 additions & 4 deletions ggml-cuda.cu
@@ -1,11 +1,11 @@
#include <stdint.h>
#if defined(__HIP_PLATFORM_AMD__)
#include "hip/hip_runtime.h"
#define cudaStream_t hipStream_t
#define __half _Float16
#include <stdio.h>
#if defined(GGML_USE_HIPBLAS)
#include "hip/hip_fp16.h"
#else
#include <cuda_fp16.h>
#endif
#include <atomic>
#include "ggml-cuda.h"

typedef uint16_t ggml_fp16_t;
32 changes: 32 additions & 0 deletions ggml-cuda.h
@@ -1,5 +1,37 @@
#if defined(GGML_USE_HIPBLAS)
#include "hipblas/hipblas.h"
#include "hip/hip_runtime.h"
#define CUBLAS_COMPUTE_32F HIPBLAS_R_32F
#define CUBLAS_GEMM_DEFAULT HIPBLAS_GEMM_DEFAULT
#define CUBLAS_OP_N HIPBLAS_OP_N
#define CUBLAS_OP_T HIPBLAS_OP_T
#define CUBLAS_STATUS_SUCCESS HIPBLAS_STATUS_SUCCESS
#define cublasCreate hipblasCreate
#define cublasGemmEx hipblasGemmEx
#define cublasHandle_t hipblasHandle_t
#define cublasSetStream hipblasSetStream
#define cublasSgemm hipblasSgemm
#define cublasStatus_t hipblasStatus_t
#define CUDA_R_16F HIPBLAS_R_16F
#define CUDA_R_32F HIPBLAS_R_32F
#define cudaError_t hipError_t
#define cudaFree hipFree
#define cudaGetErrorString hipGetErrorString
#define cudaGetLastError hipGetLastError
#define cudaMalloc hipMalloc
#define cudaMemcpyAsync hipMemcpyAsync
#define cudaMemcpyDeviceToHost hipMemcpyDeviceToHost
#define cudaMemcpyHostToDevice hipMemcpyHostToDevice
#define cudaStream_t hipStream_t
#define cudaStreamCreateWithFlags hipStreamCreateWithFlags
#define cudaStreamNonBlocking hipStreamNonBlocking
#define cudaStreamSynchronize hipStreamSynchronize
#define cudaSuccess hipSuccess
#define GGML_USE_CUBLAS
#else
#include <cublas_v2.h>
#include <cuda_runtime.h>
#endif

#ifdef __cplusplus
extern "C" {
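
The block of `#define` lines in this hunk lets the shared CUDA source build against hipBLAS by renaming each CUDA/cuBLAS identifier to its HIP counterpart at preprocessing time. A minimal sketch of that aliasing pattern in plain C follows. It does not call CUDA or HIP: `USE_VENDOR_A`, `vendor_a_malloc`, `vendor_b_malloc`, and `roundtrip` are hypothetical names invented for illustration, and `malloc`/`free` stand in for device allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical build switch, playing the role GGML_USE_HIPBLAS plays above. */
#define USE_VENDOR_A 1

#if defined(USE_VENDOR_A)
/* Hypothetical "vendor A" API (stand-in for the HIP side). */
#define VENDOR_A_SUCCESS 0
static int vendor_a_malloc(void **ptr, size_t size) {
    *ptr = malloc(size);
    return *ptr ? VENDOR_A_SUCCESS : 1;
}
static void vendor_a_free(void *ptr) { free(ptr); }

/* Alias the "vendor B" names used by the shared code onto vendor A. */
#define VENDOR_B_SUCCESS VENDOR_A_SUCCESS
#define vendor_b_malloc  vendor_a_malloc
#define vendor_b_free    vendor_a_free
#else
/* Hypothetical "vendor B" API (stand-in for the CUDA side). */
#define VENDOR_B_SUCCESS 0
static int vendor_b_malloc(void **ptr, size_t size) {
    *ptr = malloc(size);
    return *ptr ? VENDOR_B_SUCCESS : 1;
}
static void vendor_b_free(void *ptr) { free(ptr); }
#endif

/* Shared code: written once against the vendor B names only. */
int roundtrip(size_t n) {
    void *buf = NULL;
    if (vendor_b_malloc(&buf, n) != VENDOR_B_SUCCESS) {
        return -1;  /* allocation failed */
    }
    vendor_b_free(buf);
    return 0;
}
```

Calling `roundtrip(64)` goes through `vendor_a_malloc` when `USE_VENDOR_A` is defined and through `vendor_b_malloc` otherwise, without any change to the shared function. Keeping the aliases in one header, as this commit does by moving them from ggml.c into ggml-cuda.h, means every file that includes the header sees the same mapping.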
36 changes: 1 addition & 35 deletions ggml.c
@@ -147,41 +147,7 @@ inline static void* ggml_aligned_malloc(size_t size) {
#include <Accelerate/Accelerate.h>
#elif defined(GGML_USE_OPENBLAS)
#include <cblas.h>
#elif defined(GGML_USE_CUBLAS) || defined(GGML_USE_HIPBLAS)

#if defined(GGML_USE_HIPBLAS)
#include "hipblas/hipblas.h"
#define CUBLAS_COMPUTE_32F HIPBLAS_R_32F
#define CUBLAS_GEMM_DEFAULT HIPBLAS_GEMM_DEFAULT
#define CUBLAS_OP_N HIPBLAS_OP_N
#define CUBLAS_OP_T HIPBLAS_OP_T
#define CUBLAS_STATUS_SUCCESS HIPBLAS_STATUS_SUCCESS
#define cublasCreate hipblasCreate
#define cublasGemmEx hipblasGemmEx
#define cublasHandle_t hipblasHandle_t
#define cublasSetStream hipblasSetStream
#define cublasSgemm hipblasSgemm
#define cublasStatus_t hipblasStatus_t
#define CUDA_R_16F HIPBLAS_R_16F
#define CUDA_R_32F HIPBLAS_R_32F
#define cudaError_t hipError_t
#define cudaFree hipFree
#define cudaGetErrorString hipGetErrorString
#define cudaGetLastError hipGetLastError
#define cudaMalloc hipMalloc
#define cudaMemcpyAsync hipMemcpyAsync
#define cudaMemcpyDeviceToHost hipMemcpyDeviceToHost
#define cudaMemcpyHostToDevice hipMemcpyHostToDevice
#define cudaStream_t hipStream_t
#define cudaStreamCreateWithFlags hipStreamCreateWithFlags
#define cudaStreamNonBlocking hipStreamNonBlocking
#define cudaStreamSynchronize hipStreamSynchronize
#define cudaSuccess hipSuccess
#define GGML_USE_CUBLAS
#else
#include <cublas_v2.h>
#include <cuda_runtime.h>
#endif
#elif defined(GGML_USE_CUBLAS) | defined(GGML_USE_HIPBLAS)
#include "ggml-cuda.h"
#endif
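
The `#elif` line just before `#include "ggml-cuda.h"` joins its two `defined(...)` tests with a single `|` where the earlier version used `||`. In a preprocessor expression `defined` evaluates to 0 or 1, so bitwise OR and logical OR select the same branch here. The sketch below checks that with hypothetical macros `FEATURE_A` and `FEATURE_B` (stand-ins for the two backend switches, not names from the project).

```c
/* Hypothetical feature switches: only FEATURE_B is defined. */
#define FEATURE_B 1

/* Bitwise OR of two defined() results, as in the #elif line above. */
#if defined(FEATURE_A) | defined(FEATURE_B)
#define BITWISE_OR_RESULT 1
#else
#define BITWISE_OR_RESULT 0
#endif

/* Logical OR of the same two tests. */
#if defined(FEATURE_A) || defined(FEATURE_B)
#define LOGICAL_OR_RESULT 1
#else
#define LOGICAL_OR_RESULT 0
#endif

/* Expose the compile-time results so they can be checked at run time. */
int bitwise_or_result(void) { return BITWISE_OR_RESULT; }
int logical_or_result(void) { return LOGICAL_OR_RESULT; }
```

Both functions return the same value for any combination of the two macros, so the single `|` changes nothing in behavior; `||` is simply the more conventional spelling.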
