Closed
Description
Regarding https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/server/__main__.py#L202
This line of code blocks https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/server/__main__.py#L226, which is responsible for sending the heartbeat signal. Although the handler is declared async, the event loop (the main thread) is blocked while create_chat_completion executes. If the model is large and the first message takes a long time to produce, the network connection will drop.
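A minimal sketch of the problem and one possible fix: a CPU-bound call made directly inside an async handler stalls the event loop, whereas offloading it with loop.run_in_executor (or Starlette's run_in_threadpool) lets a concurrent heartbeat coroutine keep ticking. The function and timing names below are stand-ins, not llama-cpp-python's actual code.

```python
import asyncio
import time


def create_chat_completion_blocking():
    # Stand-in for the CPU-bound llama_cpp inference call
    # (hypothetical 0.5 s delay).
    time.sleep(0.5)
    return "completion"


async def heartbeat(ticks):
    # Simulates the periodic heartbeat sender: records when each
    # tick actually fires.
    start = time.monotonic()
    for _ in range(3):
        ticks.append(time.monotonic() - start)
        await asyncio.sleep(0.1)


async def offloaded_handler():
    # Running the blocking call in a worker thread keeps the event
    # loop free, so heartbeat() is not starved. Calling
    # create_chat_completion_blocking() directly here instead would
    # freeze the loop for the full 0.5 s.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, create_chat_completion_blocking)


async def main():
    ticks = []
    await asyncio.gather(heartbeat(ticks), offloaded_handler())
    return ticks


if __name__ == "__main__":
    # With the executor, ticks land roughly at 0.0, 0.1, 0.2 s
    # instead of all waiting behind the 0.5 s completion.
    print(asyncio.run(main()))
```

With the direct (blocking) call, the first heartbeat would only go out after the whole completion finished, which matches the dropped-connection symptom described above.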