Queue-length-aware dispatching in large-scale heterogeneous systems

169 Accesses
Explore all metrics

Abstract

One dominant approach for reducing response times in large-scale systems is Join-the-Shortest-Queue-d: whenever a job arrives, the dispatcher queries d servers at random and then assigns the job to the queried server with the shortest queue. While $\textsf{JSQ}$-d is known to perform quite well in systems where all servers run at the same speed, this is not the case in systems that exhibit heterogeneity with respect to server speeds. Unfortunately, there is no straightforward way to extend $\textsf{JSQ}$-d (or other so-called power-of-d policies) to heterogeneous systems. Should a job be assigned to the queried server with the shortest queue even if much faster servers were among those queried? Should a job be assigned to the queried server where it is expected to complete the soonest even if there is an idle, albeit slower, server available among those queried? And for that matter, should we query faster servers more often than their slower counterparts? Recent work has introduced a framework for developing strong dispatching policies by pairing suitably chosen querying and assignment rules. Within this framework, prior work has focused on finding strong-performing dispatching policies that use only the idle/busy statuses of the queried servers, rather than detailed queue length information. In this paper, we overcome the challenge of evaluating the performance of—and finding strong-performing—general scalable dispatching policies that make use of both server speed and detailed queue length information, through a combination of mean field analysis and a sequence of modified optimization problems. We find that well-designed length-aware dispatching policies can significantly outperform their idleness-based counterparts in large-scale heterogeneous systems. While the best policies of this kind are often complicated to describe, we find that in the vast majority of cases the relatively simple Shortest Expected Wait policy performs nearly as well, without the need for an especially sophisticated and time-consuming optimization procedure.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A general “power-of-d” dispatching framework for heterogeneous systems

Article 28 April 2022

On the impact of job size variability on heterogeneity-aware load balancing

Article 23 September 2019

Understanding Slowdown in Large-Scale Heterogeneous Systems

Notes

As we have noted there may be multiple such choices of $\varvec{y}$, but which $\varvec{y}$ we choose is immaterial as $\alpha _i^{(\widehat{x}_i)}(\varvec{y})$ is the same for all such $\varvec{y}$. We need not resort to the axiom of choice in this definition, as we can impose a lexicographic order $\mathcal {A}$ and take $\varvec{x}$ to be the first (with respect to this order) $\varvec{y}\in \mathcal {A}$ satisfying the desired property.
Note that $\sum _{n=0}^\infty \alpha _i^{(n)}(\varvec{x})$ is the probability that a job is assigned to some class-i (regardless of its queue length) when the result of the query is $\varvec{x}$; under the dominance pruning the job will always be assigned to a class-i server of the shortest queue length.
More precisely, each row in the table corresponds to a different querying rule family over which the querying rule has been optimized.
Again, more precisely, each column in the table corresponds to a different assignment rule family over which the assignment rule has been optimized.
Note that, in the figure, we only include decisions pertaining to the $\widetilde{\varvec{x}}$ states that occur with probability at least $10^{-5}$; the assignment decision is immaterial for states that do not arise sufficiently frequently. The result is that, in our example policy, there are 39 values of $(i,\widetilde{\varvec{x}})\in \mathcal {S}\times \widetilde{\mathcal {A}}$ for which $\widetilde{\alpha }_i(\widetilde{\varvec{x}})>0$.
Note that the notation introduced here is slightly different from that used for $\textbf{IND}$ in [1]; nonetheless, the two approaches are equivalent.

References

Abdul Jaleel, J., Doroudi, S., Gardner, K., Wickeham, A.: A general "power-of-d’’ dispatching framework for heterogeneous systems. Queueing Syst. (2022). https://doi.org/10.1007/s11134-022-09736-z
Article Google Scholar
Abdul Jaleel, J., Wickeham, A., Doroudi, S., Gardner, K.: A general“power-of-d” dispatching framework for heterogeneous systems. In: Workshop on Mathematical Performance Modeling and Analysis (MAMA) (2020)
Aghajani, R., Li, X., Ramanan, K.: The PDE method for the analysis of randomized load balancing networks. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems 1(2), 1–28 (2017)
Banawan, S., Zeidat, N.: A comparative study of load sharing in heterogeneous multicomputer systems. In: Proceedings. 25th Annual Simulation Symposium, pp. 22–31. IEEE (1992)
Banawan, S.A., Zahorjan, J.: Load sharing in heterogeneous queueing systems. In: Proceedings of IEEE INFOCOM’89, pp. 731–739 (1989)
Bonomi, F.: On job assignment for a parallel system of processor sharing queues. IEEE Trans. Comput. 39(7), 858–869 (1990)
Article Google Scholar
van der Boor, M., Comte, C.: Load balancing in heterogeneous server clusters: insights from a product-form queueing model. In: 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), pp. 1–10. IEEE (2021)
Bramson, M., Lu, Y., Prabhakar, B.: Randomized load balancing with general service time distributions. ACM SIGMETRICS Perform. Eval. Rev. 38(1), 275 (2010). https://doi.org/10.1145/1811099.1811071
Article Google Scholar
Bramson, M., Lu, Y., Prabhakar, B.: Asymptotic independence of queues under randomized load balancing. Queueing Syst. (2012). https://doi.org/10.1007/s11134-012-9311-0
Article Google Scholar
Chen, H., Ye, H.Q.: Asymptotic optimality of balanced routing. Oper. Res. 60(1), 163–179 (2012)
Article Google Scholar
Comte, C.: Dynamic load balancing with tokens. Comput. Commun. 144, 76–88 (2019)
Article Google Scholar
Dunning, I., Huchette, J., Lubin, M.: Jump: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)
Article Google Scholar
Feng, H., Misra, V., Rubenstein, D.: Optimal state-free, size-aware dispatching for heterogeneous m/g/-type systems. Perform. Eval. 62(1), 475–492 (2005). https://doi.org/10.1016/j.peva.2005.07.031
Article Google Scholar
Gamarnik, D., Tsitsiklis, J.N., Zubeldia, M.: Stability, memory, and messaging trade-offs in heterogeneous service systems. Math. Oper. Res. 47(3), 1862–1874 (2022)
Article Google Scholar
Gardner, K., Jaleel, J.A., Wickeham, A., Doroudi, S.: Scalable load balancing in the presence of heterogeneous servers. Perform. Eval. (2020). https://doi.org/10.1016/j.peva.2020.102151
Article Google Scholar
Goren, G., Vargaftik, S., Moses, Y.: Stochastic coordination in heterogeneous load balancing systems. In: Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, pp. 403–414 (2021)
Gupta, V., Harchol-Balter, M., Sigman, K., Whitt, W.: Analysis of join-the-shortest-queue routing for web server farms. Perform. Eval. 64(9–12), 1062–1081 (2007)
Article Google Scholar
Hellemans, T., Bodas, T., Van Houdt, B.: Performance analysis of workload dependent load balancing policies. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems (2019). https://doi.org/10.1145/3341617.3326150
Hellemans, T., Van Houdt, B.: On the power-of-d-choices with least loaded server selection. ACM SIGMETRICS Perform. Eval. Rev. (2019). https://doi.org/10.1145/3292040.3219664
Article Google Scholar
Hyytiä, E.: Optimal routing of fixed size jobs to two parallel servers. INFOR Inf. Syst. Oper. Res. 51(4), 215–224 (2013). https://doi.org/10.3138/infor.51.4.215
Article Google Scholar
Izagirre, A., Makowski, A.: Light traffic performance under the power of two load balancing strategy: the case of server heterogeneity. SIGMETRICS Perform. Eval. Rev. 42(2), 18–20 (2014)
Article Google Scholar
Jaleel, J.A., Wickeham, A., Doroudi, S., Gardner, K.: A general “power-of-d’’ dispatching framework for heterogeneous systems. Perform. Eval. Rev. 48(2), 30–32 (2020)
Google Scholar
Koole, G.: A simple proof of the optimality of a threshold policy in a two-server queueing system. Syst. Control Lett. 26(5), 301–303 (1995)
Article Google Scholar
Larsen, R.L.: Control of Multiple Exponential Servers with Application to Computer Systems. Ph.D. thesis, College Park (1981)
Li, Q.L., Dai, G., Lui, J.C.S., Wang, Y.: The mean-field computation in a supermarket model with server multiple vacations. Discrete Event Dyn. Syst. 24(4), 473–522 (2014). https://doi.org/10.1007/s10626-013-0171-5
Article Google Scholar
Lin, W., Kumar, P.R.: Optimal control of a queueing system with two heterogeneous servers. IEEE Trans. Autom. Control 29(8), 696–703 (1984)
Article Google Scholar
Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J., Greenberg, A.: Join-idle-queue: a novel load balancing algorithm for dynamically scalable web services. Perform. Eval. 68(11), 1056–1071 (2011)
Article Google Scholar
Lubin, M., Dunning, I.: Computing in operations research using julia. INFORMS J. Comput. 27(2), 238–248 (2015). https://doi.org/10.1287/ijoc.2014.0623
Article Google Scholar
Luh, H.P., Viniotis, I.: Threshold control policies for heterogeneous server systems. Math. Methods Oper. Res. 55(1), 121–142 (2002)
Article Google Scholar
Mitzenmacher, M.: The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst. 12(10), 1094–1104 (2001)
Article Google Scholar
Mitzenmacher, M.: Analyzing distributed join-idle-queue: a fluid limit approach. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 312–318. IEEE (2016)
Mitzenmacher, M.D., Vocking, B.: The asymptotics of selecting the shortest of two, improved (1999)
Mukhopadhyay, A., Mazumdar, R.R.: Analysis of randomized join-the-shortest-queue (JSQ) schemes in large heterogeneous processor-sharing systems. IEEE Trans. Control Netw. Syst. 3(2), 116–126 (2015)
Article Google Scholar
Nelson, R.D., Philips, T.K.: An Approximation to the Response Time for Shortest Queue Routing, vol. 17. ACM (1989)
Rubinovitch, M.: The Slow Server Problem. J. Appl. Probab. 22(1), 205–213 (1985)
Article Google Scholar
Rubinovitch, M.: The slow server problem: a queue with stalling. J. Appl. Probab. 22(4), 879–892 (1985)
Article Google Scholar
Rykov, V.V., Efrosinin, D.V.: On the slow server problem. Autom. Remote Control 70(12), 2013–2023 (2009)
Article Google Scholar
Selen, J., Adan, I., Kapodistria, S.: Approximate performance analysis of generalized join the shortest queue routing. In: Proceedings of the 9th EAI International Conference on Performance Evaluation Methodologies and Tools, pp. 103–110. ICST (Institute for Computer Sciences, Social-Informatics) (2016)
Selen, J., Adan, I., Kapodistria, S., van Leeuwaarden, J.: Steady-state analysis of shortest expected delay routing. Queueing Syst. 84(3–4), 309–354 (2016)
Article Google Scholar
Sethuraman, J., Squillante, M.S.: Optimal stochastic scheduling in multiclass parallel queues. SIGMETRICS Perform. Eval. Rev. 27(1), 93–102 (1999). https://doi.org/10.1145/301464.301483
Article Google Scholar
Shneer, S., Stolyar, A.L.: Large-scale parallel server system with multi-component jobs. Queueing Syst. 98, 21–48 (2021)
Article Google Scholar
Stolyar, A.: Pull-based load distribution in large-scale heterogeneous service systems. Queueing Syst. 80(4), 341–361 (2015)
Article Google Scholar
Tantawi, A.N., Towsley, D.: Optimal static load balancing in distributed computer systems. J. ACM (JACM) 32(2), 445–465 (1985)
Article Google Scholar
Teschl, G.: Ordinary differential equations and dynamical systems, vol. 140. American Mathematical Society (2024)
Van Spilbeeck, I., Van Houdt, B.: On the impact of job size variability on heterogeneity-aware load balancing. Ann. Oper. Res. 293(1), 371–399 (2020)
Article Google Scholar
Vargaftik, S., Keslassy, I., Orda, A.: LSQ: load balancing in large-scale heterogeneous systems with multiple dispatchers. IEEE/ACM Trans. Netw. (2020). https://doi.org/10.1109/TNET.2020.2980061
Article Google Scholar
Vvedenskaya, N., Dobrushin, R., Karpelevich, F.: Queueing system with selection of the shortest of two queues: an asymptotic approach. Problemy Peredachi Informatsii 32(1), 20–34 (1996)
Google Scholar
Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006). https://doi.org/10.1007/s10107-004-0559-y
Article Google Scholar
Wang, C., Feng, C., Cheng, J.: Distributed join-the-idle-queue for low latency cloud services. IEEE/ACM Trans. Netw. 26(5), 2309–2319 (2018)
Article Google Scholar
Weber, R.R.: On the optimal assignment of customers to parallel servers. J. Appl. Probab. 15(2), 406–413 (1978)
Article Google Scholar
Weng, W., Zhou, X., Srikant, R.: Optimal load balancing with locality constraints. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems 4(3), 1–37 (2020)
Whitt, W.: Deciding which queue to join: some counterexamples. Oper. Res. 34(1), 55–62 (1986)
Article Google Scholar
Winston, W.: Optimality of the shortest line discipline. J. Appl. Probab. 14(1), 181–189 (1977)
Article Google Scholar
Zhou, X., Shroff, N., Wierman, A.: Asymptotically optimal load balancing in large-scale heterogeneous systems with multiple dispatchers. Perform. Eval. 145, 102146 (2021)
Article Google Scholar
Zhou, X., Wu, F., Tan, J., Sun, Y., Shroff, N.: Designing low-complexity heavy-traffic delay-optimal load balancing schemes: theory to algorithms. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems 1(2): 39 (2017)

Download references

Author information

Authors and Affiliations

University of Minnesota, Minneapolis, USA
Jazeem Abdul Jaleel & Sherwin Doroudi
Amherst College, Amherst, USA
Kristen Gardner

Authors

Jazeem Abdul Jaleel
View author publications
You can also search for this author in PubMed Google Scholar
Sherwin Doroudi
View author publications
You can also search for this author in PubMed Google Scholar
Kristen Gardner
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jazeem Abdul Jaleel.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: Validation via simulation

Recall that there are three places at which our analysis is approximate: (1) we assume that the number of servers $k\rightarrow \infty $, (2) we assume asymptotic independence, meaning that the states of any subset of d servers are mutually independent, and (3) we facilitate our analysis by introducing a finite buffer of size M at each of the servers. Our goal in this section is to validate these assumptions by comparing the performance metrics that we compute analytically to those obtained via simulation.

Below we present results for one particular system parameterization; the results for other parameterizations (not shown) are qualitatively similar. We set $d=3$, $s=3$ server speeds, $q_i = \frac{1}{3}$ for all $i\in \{1,2,3\}$, $\mu _1/\mu _3 = 3$, $\mu _2/\mu _3=1.5$, and $\lambda =0.85$; observe that this is one of the $12\,825$ problem instances described in Sect. 5.3. We show results for this particular problem instance because higher values of $\lambda $ and higher variability in the server speeds will tend to yield slower convergence between the simulated and analytical results. We treat each server’s queue in the simulated systems as having infinite buffer sizes.

Table 4 shows mean response time as a function of the number of servers k in the simulated system for the problem instance described above. Each row in the table results from averaging the mean response time across 10 runs of the simulation, where each run consists of $50\,000 k$ arrivals and we discard the first $15\%$ of the arrivals to ensure that the system has reached steady state. The $95\%$ confidence intervals are all less than 0.002, and the percent error terms (computed as $\left| (\mathbb {E}[T]^{\textrm{analysis}} - \mathbb {E}[T]^{\textrm{simulation}})/\mathbb {E}[T]^{\textrm{analysis}} \right| \cdot 100\%$) are all less than $0.9\%$. One can easily see that, as k increases, the simulated mean response time tends toward that produced by the analysis (in this case 1.6036). This indicates that our assumptions are reasonable and our analysis yields near-exact results, provided that the system under consideration is sufficiently large.

Table 4 Comparing simulated and analytical results for the scenario with $d=3$, $s=3$ server speeds, $q_i = \frac{1}{3}$ for all $i\in \{1,2,3\}$, $\mu _1/\mu _3 = 3$, $\mu _2/\mu _3=1.5$, and $\lambda =0.85$

Full size table

To further validate our results, we also compare the queue length distribution produced by simulation and by our analysis for the same parameter settings described above, and where $k=1350$ for the simulated results (see Table 5). Again, it is readily apparent that (outside of inconsequentially rare events with probabilities very close to zero) our analytical results match the simulated results very closely, providing further confirmation that our analytical approximation is highly accurate.

Table 5 Queue length probability distributions for each class when $k=1350$ with other parameters corresponding to those used in Table 4. A simulation value of 0 indicates that the queue length was never observed in the simulation

Full size table

Finally, we also verify that the introduction of the finite buffer (in this case our optimization methodology from Appendix D yields $M=6$) has virtually no impact on the system performance. Across these simulations, at most 1 out of every 1 million arrivals was ultimately assigned to a server with a queue length at or exceeding our buffer size.

Appendix: Solution existence and uniqueness

In this appendix, we will show that after pruning the space of our admissible assignment rules and introducing a finite buffer (if given an initial value condition), the system of differential equation associated with our system, i.e.,

$$\begin{aligned} \left( f_{i}^{(n)}\right) '= & {} \frac{\lambda }{q_i} \sum _{\varvec{d}\in \mathcal {D}} \left( p(\varvec{d})\sum _{\bar{\varvec{x}}\in \bar{\mathcal {A}}_{i}^{(n-1)}\left( \varvec{d}\right) } \left( \prod _{\begin{array}{c} \ell \in \mathcal {S} \end{array}}\left( \left( f_{\ell }^{({\bar{x}}_\ell )}\right) ^{d_\ell }-\left( f_{\ell }^{({\bar{x}}_\ell +1)}\right) ^{d_\ell }\right) \right) \widetilde{\alpha }_i\left( \widetilde{\varvec{x}}\right) \right) \\{} & {} \quad - \mu _i \left( f_{i}^{(n)} - f_{i}^{(n+1)}\right) \end{aligned}$$

(with equations for each $ (i,n)\in {\mathcal {S}}\times \{0,1,\ldots M\}$), has a unique solution.

We first introduce some useful notation. Letting $\textbf{f}\in {\mathbb {R}}^{sM}$ be given by

$$\begin{aligned}{} & {} \textbf{f}\equiv \left( f_{1}^{(1)},f_{1}^{(2)},\ldots ,f_{1}^{(M)},f_{2}^{(1)},f_{2}^{(2)},\ldots ,f_{2}^{(M)},\ldots ,\right. \\{} & {} \qquad \left. f_{s}^{(1)},f_{s}^{(2)},\ldots ,f_{s}^{(M)}\right) \end{aligned}$$

and understanding $f_{i}^{(0)}\equiv 1$ and $f_{i}^{(M+1)}\equiv 0$ for all $i\in \mathcal {S}$, we define the function $F:\mathbb {R}^{sM}\rightarrow {\mathbb {R}}^{sM}$ component-wise by

$$\begin{aligned}{} & {} F(\textbf{f})\equiv \\{} & {} \left( F_1^{(1)}(\textbf{f}),F_1^{(2)}(\textbf{f}),\ldots F_1^{(M)}(\textbf{f}),F_2^{(1)}(\textbf{f}),F_2^{(2)}(\textbf{f}),\ldots F_2^{(M)}(\textbf{f}),\ldots ,F_s^{(1)}(\textbf{f}),F_s^{(2)}(\textbf{f}),\ldots F_s^{(M)}(\textbf{f})\right) , \end{aligned}$$

where

$$\begin{aligned}{} & {} F_i^{(n)}(\textbf{f})\equiv \frac{\lambda }{q_i} \sum _{\varvec{d}\in \mathcal {D}} \left( p(\varvec{d})\sum _{\bar{\varvec{x}}\in \bar{\mathcal {A}}_{i}^{(n-1)}\left( \varvec{d}\right) } \left( \prod _{\begin{array}{c} \ell \in \mathcal {S} \end{array}}\left( \left( f_{\ell }^{({\bar{x}}_\ell )}\right) ^{d_\ell }-\left( f_{\ell }^{({\bar{x}}_\ell +1)}\right) ^{d_\ell }\right) \right) \widetilde{\alpha }_i\left( \widetilde{\varvec{x}}\right) \right) \\{} & {} \qquad - \mu _i \left( f_{i}^{(n)} - f_{i}^{(n+1)}\right) . \end{aligned}$$

Recalling that all $f_{i}^{(n)}$ are real-valued functions of time t, it follows that $\textbf{f}$ and $F(\textbf{f})$ are vector-valued functions of time t, and hence, we can rewrite the system of differential equations of interest as $(\textbf{f})'=F(\textbf{f})$, where differentiation is with respect to time t.

If F is Lipschitz continuous on the closed hypercube $[0,1]^{sM}$ (i.e., for some norm $\Vert \cdot \Vert $ there exists some constant $L\ge 0$ such that for all $\textbf{f},\textbf{g}\in [0,1]^{sM}$, we have $\left\Vert F(\textbf{f})-F(\textbf{g})\right\Vert \le L\left\Vert \textbf{f}-\textbf{g}\right\Vert $) then by the Picard-Lindelöf Theorem [44], the differential equation $(\textbf{f})'=F(\textbf{f})$ together with the initial value condition $F(\textbf{f})|_{t=t_0}=\textbf{f}_0$ has a unique solution (defined over some neighborhood of $t_0$) for all $\textbf{f}_0\in (0,1)^{sM}$.

In order to show that F is indeed Lipschitz continuous, it is sufficient to show that each component of F is Lipschitz continuous, i.e., for some norm $\Vert \cdot \Vert $ and for each $(i,n)\in \mathcal {S}\times \{1,\ldots ,M\}$ there exists some constant $L_i^{(n)}\ge 0$ such that $\left| F_i^{(n)}(\textbf{f})-F_i^{(n)}(\textbf{g})\right| \le L_i^{(n)} \left\Vert \textbf{f}-\textbf{g}\right\Vert $. Now observe that each $F_i^{(n)}(\textbf{f})$ is a finite-term multivariate polynomial in the variables $f_{1}^{(1)},f_{1}^{(2)},\ldots ,f_{s}^{(M)}$ (i.e., in the components of $\textbf{f}$) of degree at most d. To see that this is the case, observe that $F_i^{(n)}(\textbf{f})$ is a weighted sum of finitely many products of the form

$$\begin{aligned} \prod _{\begin{array}{c} \ell \in \mathcal {S} \end{array}}\left( \left( f_{\ell }^{({\bar{x}}_\ell )}\right) ^{d_\ell }-\left( f_{\ell }^{({\bar{x}}_\ell +1)}\right) ^{d_\ell }\right) \end{aligned}$$

in addition to the linear terms $-\mu _if_{i}^{(n)}$ and $\mu _if_{i}^{(n+1)}$. Moreover, these products are each polynomials of degree at most d in the variables $f_{1}^{(1)},f_{1}^{(2)},\ldots ,f_{s}^{(M)}$ and their weights (coefficients) are constant in these variables; the bound on the polynomial degree of these products follows from the fact that $\sum _{\ell \in \mathcal {S}}d_\ell =d$ for each $\varvec{d}\in \mathcal {D}$. Finally, since finite-degree finite-term polynomials are Lipschitz continuous over any compact set, each of the $F_i^{(n)}$ functions— and hence F itself—are Lipschitz continuous on the hypercube $[0,1]^{sM}$ as desired, and so the system of differential equations of interest admits a unique solution.

Appendix: Additional optimization problems

^{Footnote 6}

Appendix: Selection of $\varvec{h}$ and $M$ values and seeding procedure

Our methodology for heuristically solving the optimization problem for a dispatching policy family $\textbf{DPF}$ involves solving the problem multiple times, each time supporting this procedure with a solution from a more restricted family ($\textbf{DPF}^\dag \subsetneq \textbf{DPF}$) that was found earlier in our overall solution process. We note that we make use not only of the supporting solution $(\textbf{DPF}^\dag )^\star $—that is, of the $p(\varvec{d})$ and $\widetilde{\alpha }_i(\widetilde{\varvec{x}})$ values associated with the policy $(\textbf{DPF}^\dag )^\star $—but also

the vector $\varvec{h}^\dag \equiv (h_1^\dag ,h_2^\dag ,\ldots ,h_s^\dag )$ and the buffer size $M^\dag $ that were used in formulating the optimization problem for $\textbf{DPF}^\dag $,
the probabilities $\left( f_i^{(j)}\right) ^\dag $ associated with the solution,
and the response time yielded by $(\textbf{DPF}^\dag )^\star $, which we denote by ${\mathbb {E}}[T]^\dag $.

Specifically, we write that we heuristically solve the optimization problem for a dispatching policy family $\textbf{DPF}$ supported by the solution to the problem for the dispatching policy family $\textbf{DPF}^\dag $—or supported by $(\textbf{DPF}^\dag )^\star $ for short—to indicate that we carry out the following procedure:

(1)
Set
$$\begin{aligned} h_1= & {} \max \left\{ s+3,\min \left\{ h_1^\dag ,\min \left\{ j\in \left\{ 1,2,\ldots ,M^\dag \right\} :\right. \right. \right. \\{} & {} \left. \left. \left. \left( \forall i\in \mathcal {S}:\left( f_{i}^{(j)}\right) ^{\dag }\le 10^{-6}\right) \right\} +1\right\} \right\} , \end{aligned}$$
set $\textbf{h}=(h_1,h_1-1,\ldots ,h_1-s+1)$ and set $M=h_1$.

One obvious candidate for $h_1$ is $h_1^\dag $, the corresponding value that was used to find the supporting solution. However, because higher choices of $h_1$ will result in longer solving times, we consider using a lower value than $h_1^\dag $ if we deem this value to have been “unnecessarily high” for the supporting problem in retrospect. In particular, if there exists some queue length $j \le M^\dag $ such that $\left( f_{i}^{(j)}\right) ^{\dag } \le 10^{-6}$ for all server classes $i\in \mathcal {S}$ (that is, if all queues reach length j sufficiently infrequently), then we also consider $j+1$ as a candidate choice for $h_1$.

We further require $h_1 \ge s+3$ so that $h_s\ge 4$; this ensures that we retain sufficient information to differentiate among servers of different classes. Having selected $h_1$ according to the above considerations, we then set $\varvec{h}$ so that $h_i = h_{i-1}-1$ for all $i>1$. Finally, we set the buffer size $M = h_1$; by setting $h_1$ and M to the same value, we maximize the level of discriminatory power available to the assignment rule, given the buffer size.
(2)
Run the optimization problem associated with $\textbf{DPF}$ seeded with the solution to $\textbf{DPF}^\dag $ with the additional constraint that
$$\begin{aligned} {\mathbb {E}}[T]\le \left( 1+10^{-3}\right) {\mathbb {E}}[T]^\dag . \end{aligned}$$
If no solution is found, re-run this optimization problem without this constraint.

Introducing this constraint limits the “search” for new solutions that are superior to the supporting solution, while allowing for a small amount of “slack.” This constraint occasionally prevents the heuristic optimization procedure from finding the best solutions; hence, if no solution is found, we attempt to solve the problem without the constraint. In both cases we seed IPOPT with the supporting solution.
(3)
Consider the following three statements:
1. (a)
  The previous step did not yield a solution.
2. (b)
  The previous step yielded a solution that was significantly inferior to the supporting solution (i.e., it yielded ${\mathbb {E}}[T]\ge \left( 1+10^{-3}\right) {\mathbb {E}}[T]^\dag $).
3. (c)
  $\textbf{DPF}=\left\langle \textbf{QRF}, \textbf{ARF} \right\rangle $ for some $\textbf{QRF}\subsetneq \textbf{GEN}$ (i.e., the querying rule family under consideration is not $\textbf{GEN}$).
If at least one of the above statements is true, then do the following: solve the optimization problem associated with $\textbf{DPF}$ without any seeding (but still using the $\varvec{h}$ vector and $M$ value as set in step 1, noting that these may depend on the details of the supporting solution). If no solution is found, increment $M$ and all entries of $\varvec{h}$ by 1, and attempt to solve the problem again, up to a maximum of 4 times.

Seeding an optimization problem with a feasible solution sometimes allows IPOPT to find better solutions, but it may lead to the identification of a local solution that would be worse than that found by its “unseeded” counterpart. For this reason, we also typically solve the unseeded problem. However, solving the unseeded problem is very computationally expensive for problems allowing for general ($\textbf{GEN}$) querying, so if the querying rule under consideration is $\textbf{GEN}$ then we conduct this step only when the seeded problem did not yield a good solution.
(4)
Select the best solution (i.e., the solution that yields the lowest ${\mathbb {E}}[T]$) among those from the previous two steps as well as the supporting solution (associated with $\textbf{DPF}^\dag \subsetneq \textbf{DPF}$). Report this as the solution for $\textbf{DPF}$, and call the resulting policy $\textbf{DPF}^\star $.

$\textbf{DPF}^\star $ is set to be the best policy among all of those identified throughout this procedure; note that it is possible that the best policy may be the one corresponding to the supporting solution itself, i.e., $(\textbf{DPF}^\dag )^\star $.

Having defined what it means to solve the optimization problem associated with a family $\textbf{DPF}$ supported by a policy $(\textbf{DPF}^\dag )^\star $, we are now ready to describe the procedure that we use to identify solutions for all policy families of interest. The steps for our solution-finding procedure are as follows (see Fig. 10 for a visual aid):

(1)
Find $\left\langle \textsf{BR}, \textsf{AR} \right\rangle ^\star $ for each assignment rule $\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}$ and find $\left\langle \textbf{SRC}, \textsf{JSQ} \right\rangle ^\star $ by heuristically solving their associated optimization problems (using IPOPT), where in each case we first set $\varvec{h} = (15,14,...,15-s+1)$ and $M=15$. If no solution is found increment $M$ and all elements of $\varvec{h}$ by 1 and attempt to solve again. Repeat this procedure a maximum of 4 times.
(2)
Find $\left\langle \textsf{BR}, \textbf{CLD} \right\rangle ^\star $ by solving the associated optimization problem supported by whichever of the three policies $\left\langle \textsf{BR}, \textsf{AR} \right\rangle ^\star $, for $\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}$, yielded the lowest $\mathbb {E}[T]$. Also find $\left\langle \textbf{IID}, \textsf{AR} \right\rangle ^\star $ supported by $\left\langle \textsf{BR}, \textsf{AR} \right\rangle ^\star $, for each $\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}$.
(3)
Find $\left\langle \textbf{IID}, \textbf{CLD} \right\rangle ^\star $ by solving the associated optimization problem supported by whichever of the four policies $\left\langle \textbf{IID}, \textsf{AR} \right\rangle ^\star $ for $\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}$ and $\left\langle \textsf{BR}, \textbf{CLD} \right\rangle $ yielded the lowest ${\mathbb {E}}[T]$. Also find $\left\langle \textbf{IND}, \textsf{AR} \right\rangle ^\star $ supported by $\left\langle \textbf{IID}, \textsf{AR} \right\rangle ^\star $ for each $\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}$.
(4)
Find $\left\langle \textbf{IND}, \textbf{CLD} \right\rangle ^\star $ by solving the associated optimization problem supported by whichever of the four policies $\left\langle \textbf{IND}, \textsf{AR} \right\rangle ^\star $, for $\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}$, and $\left\langle \textbf{IID}, \textbf{CLD} \right\rangle ^\star $ yielded the lowest ${\mathbb {E}}[T]$. Also, for each $\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}$, find $\left\langle \textbf{GEN}, \textsf{AR} \right\rangle ^\star $ supported by whichever of the two policies $\left\langle \textbf{IND}, \textsf{AR} \right\rangle ^\star $ and $\left\langle \textbf{SRC}, \textsf{JSQ} \right\rangle ^\star $ yielded the lowest ${\mathbb {E}}[T]$.
(5)
Find $\left\langle \textbf{GEN}, \textbf{CLD} \right\rangle ^\star $ by solving the associated optimization problem supported by whichever of the four policies $\left\langle \textbf{GEN}, \textsf{AR} \right\rangle ^\star $, for $\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}$, and $\left\langle \textbf{IND}, \textbf{CLD} \right\rangle $ yielded the lowest ${\mathbb {E}}[T]$.

Note: in our numerical experiments, we obtained solutions for all policies for all 12 825 parameter settings described in Sect. 5.3except for 20 high load parameter instances ($\lambda =.95$); on these 20 instances, we obtained solutions for all policies (including $\left\langle \textbf{GEN}, \textsf{SED} \right\rangle $) except for $\left\langle \textsf{BR}, \textsf{SED} \right\rangle ^\star $, $\left\langle \textbf{IID}, \textsf{SED} \right\rangle ^\star $, $\left\langle \textbf{IND}, \textsf{SED} \right\rangle ^\star $, likely because at these parameter values, the $\textsf{SED}$-driven policies require a higher setting of $\varvec{h}$ and $M$ to obtain a solution. As a result, we could not support the optimization problems for $\left\langle \textbf{IID}, \textsf{SED} \right\rangle ^\star $ and $\left\langle \textbf{IND}, \textsf{SED} \right\rangle ^\star $ with a solution, and when finding $\left\langle \textbf{GEN}, \textsf{SED} \right\rangle ^\star $ for these parameter instances, we had to discard $\left\langle \textbf{IND}, \textsf{SED} \right\rangle ^\star $ as a potential candidate for supporting the solution process.

We also find strongly performing $\textbf{CID}$-driven dispatching policies using the methods presented in [1]; we then revise our $\textbf{CLD}$-driven policies if (due to the limitations associated with heuristic optimization) we are able to find a $\textbf{CID}$-driven policy that outperforms the best identified $\textbf{CLD}$-driven counterpart. Specifically, we execute these three steps:

(1)
Find $\left\langle \textsf{BR}, \textbf{CID} \right\rangle ^\star $, $\left\langle \textbf{IID}, \textbf{CID} \right\rangle ^\star $, $\left\langle \textbf{IND}, \textbf{CID} \right\rangle ^\star $, and $\left\langle \textbf{GEN}, \textbf{CID} \right\rangle ^\star $ using the methods presented in [1]. If for any two of these policies $\textbf{DPF}_1$ and $\textbf{DPF}_2$, we have $\textbf{DPF}_1\subseteq \textbf{DPF}_2$ and $\textbf{DPF}_1^\star $ outperforms $\textbf{DPF}_2^\star $ (e.g., if $\left\langle \textbf{IID}, \textbf{CID} \right\rangle $ outperforms $\left\langle \textbf{IND}, \textbf{CID} \right\rangle $), then we “overwrite” the policy $\textbf{DPF}_2^\star $ so that it is set to $\textbf{DPF}_1^\star $.
(2)
For each querying rule family $\textbf{QRF}\in \{\textsf{BR},\textbf{IID},\textbf{IND},\textbf{GEN}\}$ (where we abuse notation and treat $\textsf{BR}$ as the family $\{\textsf{BR}\}$), if $\left\langle \textbf{QRF}, \textbf{CID} \right\rangle ^\star $ yields a lower mean response time than $\left\langle \textbf{QRF}, \textbf{CLD} \right\rangle ^\star $, “overwrite” the policy $\left\langle \textbf{QRF}, \textbf{CLD} \right\rangle ^\star $ so that it is set to $\left\langle \textbf{QRF}, \textbf{CID} \right\rangle ^\star $.

Note: we cannot support an optimization problem associated with a $\textbf{CLD}$ -driven problem with the solution to a $\textbf{CID}$ -driven problem, because $\textbf{CID}$-driven policies do not necessarily obey our dominance pruning, and thus our formulation for the optimization problem associated with a $\textbf{CLD}$ -driven family may render $\textbf{CID}$ ph-driven policies infeasible.
(3)
For each assignment rule $\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}$, if $\left\langle \textsf{Q}_{\textbf{CID}}^\star , \textsf{AR} \right\rangle $—which is uniquely specified by pairing the querying rule of $\left\langle \textbf{GEN}, \textbf{CID} \right\rangle ^\star $ with the assignment rule $\textsf{AR}$—yields a lower mean response time than $\left\langle \textbf{GEN}, \textsf{AR} \right\rangle $, “overwrite” the policy $\left\langle \textbf{GEN}, \textsf{AR} \right\rangle $ so that it is set to $\left\langle \textsf{Q}_{\textbf{CID}}^\star , \textsf{AR} \right\rangle $.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Abdul Jaleel, J., Doroudi, S. & Gardner, K. Queue-length-aware dispatching in large-scale heterogeneous systems. Queueing Syst 108, 125–184 (2024). https://doi.org/10.1007/s11134-024-09918-x

Download citation

Received: 09 July 2022
Revised: 12 February 2024
Accepted: 22 May 2024
Published: 03 August 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s11134-024-09918-x