Computer Science > Networking and Internet Architecture

arXiv:2401.12961 (cs)

[Submitted on 23 Jan 2024 (v1), last revised 16 Jun 2024 (this version, v2)]

Title: Eloquent: A More Robust Transmission Scheme for LLM Token Streaming

Authors: Hanchen Li, Yuhan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, Junchen Jiang

Abstract: To render each generated token in real-time for users, the Large Language Model (LLM) server generates tokens one by one and streams each token (or group of a few tokens) through the network to the user right after generation, which we refer to as LLM token streaming. However, under unstable network conditions, the LLM token streaming experience could suffer greatly from stalls since one packet loss could block the rendering of later tokens even if the packets containing them arrive on time. With a measurement study, we show that current applications suffer from increased stalls under unstable networks.
For this emerging token streaming problem in LLM chatbots, which differs from previous multimedia and text applications, we propose a novel transmission scheme called Eloquent, which puts newly generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and, at the same time, can be rendered independently when received, avoiding the aforementioned stalls caused by missing packets. Through simulation under various networks, we show that Eloquent reduces the stall ratio (proportion of token rendering wait time) by 71.0% compared to the retransmission method commonly used by real chatbot applications and by 31.6% compared to the baseline packet duplication scheme. By tailoring Eloquent to fit the token-by-token generation of LLMs, we enable chatbots to respond like an eloquent speaker so that users can better enjoy pervasive AI.
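
The packing rule described in the abstract (each outgoing packet carries the currently unacknowledged tokens plus the newly generated ones) can be illustrated with a small sketch. This is not the authors' implementation: the names `Packet`, `Sender`, `Receiver`, and `demo` are hypothetical, and the sketch assumes cumulative acknowledgements counted in tokens and no limit on packet size, neither of which is stated in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Packet:
    start: int         # stream index of the first token in the payload
    tokens: list[str]  # unacknowledged tokens followed by the new tokens


@dataclass
class Sender:
    tokens: list[str] = field(default_factory=list)  # every token generated so far
    acked: int = 0  # assumption: tokens [0, acked) have been acknowledged

    def next_packet(self, new_tokens: list[str]) -> Packet:
        # Append the new tokens, then send everything not yet acknowledged.
        self.tokens.extend(new_tokens)
        return Packet(start=self.acked, tokens=self.tokens[self.acked:])

    def on_ack(self, count: int) -> None:
        # Cumulative ACK: the receiver reports how many tokens it holds.
        self.acked = max(self.acked, count)


@dataclass
class Receiver:
    rendered: list[str] = field(default_factory=list)

    def on_packet(self, pkt: Packet) -> int:
        have = len(self.rendered)
        if pkt.start <= have:
            # Skip tokens already rendered; render the rest immediately.
            self.rendered.extend(pkt.tokens[have - pkt.start:])
        return len(self.rendered)  # value to send back as the cumulative ACK


def demo() -> list[str]:
    sender, receiver = Sender(), Receiver()
    p1 = sender.next_packet(["Hello"])
    sender.on_ack(receiver.on_packet(p1))
    sender.next_packet([","])            # this packet is lost in transit
    p3 = sender.next_packet([" world"])  # still carries the unacked ","
    receiver.on_packet(p3)
    return receiver.rendered
```

In `demo`, the second packet never arrives, yet the third packet still contains the unacknowledged "," alongside the new token, so the receiver can render both without waiting for a retransmission.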

Comments:	In SIGCOMM Workshop on Networks for AI Computing (NAIC '24)
Subjects:	Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Cite as:	arXiv:2401.12961 [cs.NI]
	(or arXiv:2401.12961v2 [cs.NI] for this version)
	https://doi.org/10.48550/arXiv.2401.12961
Related DOI:	https://doi.org/10.1145/3672198.3673797

Submission history

From: Hanchen Li
[v1] Tue, 23 Jan 2024 18:45:27 UTC (2,918 KB)
[v2] Sun, 16 Jun 2024 17:17:41 UTC (2,843 KB)
