RPC sync by saood06 · Pull Request #193 · ikawrakow/ik_llama.cpp · GitHub

RPC sync #193


Draft · saood06 wants to merge 16 commits into main

Conversation

saood06 (Collaborator) commented Feb 8, 2025

I grabbed all of the changes needed for llama.cpp/pull/11047, which were ggml-org/llama.cpp#9912 and ggml-org/llama.cpp#9040.

This compiles, but has not been tested yet.

ikawrakow (Owner) commented
I never use RPC and have never looked into the RPC code, so I'll have to rely on you for self-review and testing.

saood06 (Collaborator, Author) commented Feb 10, 2025

@jukofyork

> I strongly suspect something funky is going on

There is, see this comment: #180 (comment)

This fork has much faster PP speeds and Deepseek MLA support behind a flag (-mla); this PR should allow RPC to work, and I'm working on porting the option to override model tensor buffers.

This is something I've done for a while on my Windows builds, because on Windows long is not 8 bytes. On Linux this changes nothing, as both types are 8 bytes there.
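For context, a minimal sketch of the size difference (the struct names below are hypothetical and not from this PR): 64-bit Windows uses the LLP64 model, where long is 4 bytes, while 64-bit Linux uses LP64, where long is 8 bytes, so anything serialized for RPC should use fixed-width types.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical message layouts, only to illustrate the point: a field typed
// as `long` is 4 bytes on 64-bit Windows (LLP64) but 8 bytes on 64-bit Linux
// (LP64), so the two sides would disagree on the wire format.
struct msg_with_long   { long     value; }; // platform-dependent size
struct msg_fixed_width { uint64_t value; }; // 8 bytes everywhere

int main() {
    std::printf("sizeof(long)     = %zu\n", sizeof(long));      // 4 on Win64, 8 on Linux x86-64
    std::printf("sizeof(uint64_t) = %zu\n", sizeof(uint64_t));  // 8 on both
    std::printf("msg_with_long = %zu, msg_fixed_width = %zu\n",
                sizeof(msg_with_long), sizeof(msg_fixed_width));
    return 0;
}
```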
saood06 (Collaborator, Author) commented Feb 27, 2025

This has now been tested, and it does not currently work. I'm not sure why, as the errors I'm getting don't seem to have been encountered by anyone on llama.cpp.


```cpp
rpc_msg_get_alloc_size_rsp response;
bool status = send_rpc_cmd(sock, RPC_CMD_GET_ALLOC_SIZE, &request, sizeof(request), &response, sizeof(response));
GGML_ASSERT(status);
```
saood06 (Collaborator, Author) commented on this code:

The RPC client crashes here; it happens because the RPC server hits an issue on its side.
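To make the failure chain explicit, here is a minimal sketch (not the real ggml-rpc.cpp code; the stub below only simulates the helper's return value) of why the client aborts at GGML_ASSERT(status): send_rpc_cmd reports false when it cannot read a complete response from the server, for example because the server side bailed out and dropped the connection.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for send_rpc_cmd: the real helper in ggml-rpc.cpp
// writes the command plus request payload to the socket and then waits for
// the response; if the server dies or closes the connection before replying,
// the read fails and the helper returns false.
static bool send_rpc_cmd_stub(bool server_sent_full_response) {
    return server_sent_full_response;
}

int main() {
    // Simulate the server hitting an error (e.g. the null-tensor path in
    // get_alloc_size) and never sending rpc_msg_get_alloc_size_rsp back.
    bool status = send_rpc_cmd_stub(/*server_sent_full_response=*/false);
    std::printf("send_rpc_cmd returned %s\n", status ? "true" : "false");
    assert(status && "client aborts here, mirroring GGML_ASSERT(status)");
    return 0;
}
```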

```cpp
ggml_tensor * tensor = deserialize_tensor(ctx, &request.tensor);

if (tensor == nullptr) {
    GGML_PRINT_DEBUG("Null tensor pointer passed to server get_alloc_size function.\n");
```
saood06 (Collaborator, Author) commented on this code:

I'm fairly certain this is where the RPC server is crashing, although it doesn't print the message because I never ran with GGML_DEBUG enabled.
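For reference, a minimal sketch of why the message is silent unless debug output is enabled; the real macro definition in the ggml sources may differ in detail, but it follows this compile-time gating pattern:

```cpp
#include <cstdio>

// Sketch of ggml's debug-print gating. With GGML_DEBUG left at 0, the
// GGML_PRINT_DEBUG call in get_alloc_size compiles away to nothing, so the
// "Null tensor pointer..." message is never emitted even when that branch
// is taken.
#ifndef GGML_DEBUG
#define GGML_DEBUG 0
#endif

#if (GGML_DEBUG >= 1)
#define GGML_PRINT_DEBUG(...) printf(__VA_ARGS__)
#else
#define GGML_PRINT_DEBUG(...)
#endif

int main() {
    GGML_PRINT_DEBUG("this only appears when compiled with GGML_DEBUG >= 1\n");
    return 0;
}
```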

ubergarm (Contributor) commented

@saood06

I just came across another llama.cpp fork called prima.cpp which claims to have improved support for multi-device distributed inferencing.

I haven't tried it, just saw it on reddit today. Might be worth a shot given your GPU is in a different system than your big RAM box.

saood06 (Collaborator, Author) commented Apr 12, 2025

> @saood06
>
> I just came across another llama.cpp fork called prima.cpp which claims to have improved support for multi-device distributed inferencing.
>
> I haven't tried it, just saw it on reddit today. Might be worth a shot given your GPU is in a different system than your big RAM box.

Thanks for the link, it is interesting. I think it would work for dense models but not as well for MoE, because as far as I can tell it doesn't handle -ot (this commit looks relevant). I'd also need Windows support, which is on the roadmap (though I might see what the issue is by trying to build it on my machine, and see if I can fix it), and the GPU machine would have to run Windows (my big RAM box runs Clear Linux, and I have other servers that run FreeBSD and Proxmox).

saood06 mentioned this pull request on Jun 1, 2025.