Abstract
One dominant approach for reducing response times in large-scale systems is Join-the-Shortest-Queue-d: whenever a job arrives, the dispatcher queries d servers at random and then assigns the job to the queried server with the shortest queue. While \(\textsf{JSQ}\)-d is known to perform quite well in systems where all servers run at the same speed, this is not the case in systems that exhibit heterogeneity with respect to server speeds. Unfortunately, there is no straightforward way to extend \(\textsf{JSQ}\)-d (or other so-called power-of-d policies) to heterogeneous systems. Should a job be assigned to the queried server with the shortest queue even if much faster servers were among those queried? Should a job be assigned to the queried server where it is expected to complete the soonest even if there is an idle, albeit slower, server available among those queried? And for that matter, should we query faster servers more often than their slower counterparts? Recent work has introduced a framework for developing strong dispatching policies by pairing suitably chosen querying and assignment rules. Within this framework, prior work has focused on finding strong-performing dispatching policies that use only the idle/busy statuses of the queried servers, rather than detailed queue length information. In this paper, we overcome the challenge of evaluating the performance of—and finding strong-performing—general scalable dispatching policies that make use of both server speed and detailed queue length information, through a combination of mean field analysis and a sequence of modified optimization problems. We find that well-designed length-aware dispatching policies can significantly outperform their idleness-based counterparts in large-scale heterogeneous systems. While the best policies of this kind are often complicated to describe, we find that in the vast majority of cases the relatively simple Shortest Expected Wait policy performs nearly as well, without the need for an especially sophisticated and time-consuming optimization procedure.









Similar content being viewed by others
Notes
As we have noted there may be multiple such choices of \(\varvec{y}\), but which \(\varvec{y}\) we choose is immaterial as \(\alpha _i^{(\widehat{x}_i)}(\varvec{y})\) is the same for all such \(\varvec{y}\). We need not resort to the axiom of choice in this definition, as we can impose a lexicographic order \(\mathcal {A}\) and take \(\varvec{x}\) to be the first (with respect to this order) \(\varvec{y}\in \mathcal {A}\) satisfying the desired property.
Note that \(\sum _{n=0}^\infty \alpha _i^{(n)}(\varvec{x})\) is the probability that a job is assigned to some class-i (regardless of its queue length) when the result of the query is \(\varvec{x}\); under the dominance pruning the job will always be assigned to a class-i server of the shortest queue length.
More precisely, each row in the table corresponds to a different querying rule family over which the querying rule has been optimized.
Again, more precisely, each column in the table corresponds to a different assignment rule family over which the assignment rule has been optimized.
Note that, in the figure, we only include decisions pertaining to the \(\widetilde{\varvec{x}}\) states that occur with probability at least \(10^{-5}\); the assignment decision is immaterial for states that do not arise sufficiently frequently. The result is that, in our example policy, there are 39 values of \((i,\widetilde{\varvec{x}})\in \mathcal {S}\times \widetilde{\mathcal {A}}\) for which \(\widetilde{\alpha }_i(\widetilde{\varvec{x}})>0\).
Note that the notation introduced here is slightly different from that used for \(\textbf{IND}\) in [1]; nonetheless, the two approaches are equivalent.
References
Abdul Jaleel, J., Doroudi, S., Gardner, K., Wickeham, A.: A general "power-of-d’’ dispatching framework for heterogeneous systems. Queueing Syst. (2022). https://doi.org/10.1007/s11134-022-09736-z
Abdul Jaleel, J., Wickeham, A., Doroudi, S., Gardner, K.: A general“power-of-d” dispatching framework for heterogeneous systems. In: Workshop on Mathematical Performance Modeling and Analysis (MAMA) (2020)
Aghajani, R., Li, X., Ramanan, K.: The PDE method for the analysis of randomized load balancing networks. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems 1(2), 1–28 (2017)
Banawan, S., Zeidat, N.: A comparative study of load sharing in heterogeneous multicomputer systems. In: Proceedings. 25th Annual Simulation Symposium, pp. 22–31. IEEE (1992)
Banawan, S.A., Zahorjan, J.: Load sharing in heterogeneous queueing systems. In: Proceedings of IEEE INFOCOM’89, pp. 731–739 (1989)
Bonomi, F.: On job assignment for a parallel system of processor sharing queues. IEEE Trans. Comput. 39(7), 858–869 (1990)
van der Boor, M., Comte, C.: Load balancing in heterogeneous server clusters: insights from a product-form queueing model. In: 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), pp. 1–10. IEEE (2021)
Bramson, M., Lu, Y., Prabhakar, B.: Randomized load balancing with general service time distributions. ACM SIGMETRICS Perform. Eval. Rev. 38(1), 275 (2010). https://doi.org/10.1145/1811099.1811071
Bramson, M., Lu, Y., Prabhakar, B.: Asymptotic independence of queues under randomized load balancing. Queueing Syst. (2012). https://doi.org/10.1007/s11134-012-9311-0
Chen, H., Ye, H.Q.: Asymptotic optimality of balanced routing. Oper. Res. 60(1), 163–179 (2012)
Comte, C.: Dynamic load balancing with tokens. Comput. Commun. 144, 76–88 (2019)
Dunning, I., Huchette, J., Lubin, M.: Jump: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)
Feng, H., Misra, V., Rubenstein, D.: Optimal state-free, size-aware dispatching for heterogeneous m/g/-type systems. Perform. Eval. 62(1), 475–492 (2005). https://doi.org/10.1016/j.peva.2005.07.031
Gamarnik, D., Tsitsiklis, J.N., Zubeldia, M.: Stability, memory, and messaging trade-offs in heterogeneous service systems. Math. Oper. Res. 47(3), 1862–1874 (2022)
Gardner, K., Jaleel, J.A., Wickeham, A., Doroudi, S.: Scalable load balancing in the presence of heterogeneous servers. Perform. Eval. (2020). https://doi.org/10.1016/j.peva.2020.102151
Goren, G., Vargaftik, S., Moses, Y.: Stochastic coordination in heterogeneous load balancing systems. In: Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, pp. 403–414 (2021)
Gupta, V., Harchol-Balter, M., Sigman, K., Whitt, W.: Analysis of join-the-shortest-queue routing for web server farms. Perform. Eval. 64(9–12), 1062–1081 (2007)
Hellemans, T., Bodas, T., Van Houdt, B.: Performance analysis of workload dependent load balancing policies. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems (2019). https://doi.org/10.1145/3341617.3326150
Hellemans, T., Van Houdt, B.: On the power-of-d-choices with least loaded server selection. ACM SIGMETRICS Perform. Eval. Rev. (2019). https://doi.org/10.1145/3292040.3219664
Hyytiä, E.: Optimal routing of fixed size jobs to two parallel servers. INFOR Inf. Syst. Oper. Res. 51(4), 215–224 (2013). https://doi.org/10.3138/infor.51.4.215
Izagirre, A., Makowski, A.: Light traffic performance under the power of two load balancing strategy: the case of server heterogeneity. SIGMETRICS Perform. Eval. Rev. 42(2), 18–20 (2014)
Jaleel, J.A., Wickeham, A., Doroudi, S., Gardner, K.: A general “power-of-d’’ dispatching framework for heterogeneous systems. Perform. Eval. Rev. 48(2), 30–32 (2020)
Koole, G.: A simple proof of the optimality of a threshold policy in a two-server queueing system. Syst. Control Lett. 26(5), 301–303 (1995)
Larsen, R.L.: Control of Multiple Exponential Servers with Application to Computer Systems. Ph.D. thesis, College Park (1981)
Li, Q.L., Dai, G., Lui, J.C.S., Wang, Y.: The mean-field computation in a supermarket model with server multiple vacations. Discrete Event Dyn. Syst. 24(4), 473–522 (2014). https://doi.org/10.1007/s10626-013-0171-5
Lin, W., Kumar, P.R.: Optimal control of a queueing system with two heterogeneous servers. IEEE Trans. Autom. Control 29(8), 696–703 (1984)
Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J., Greenberg, A.: Join-idle-queue: a novel load balancing algorithm for dynamically scalable web services. Perform. Eval. 68(11), 1056–1071 (2011)
Lubin, M., Dunning, I.: Computing in operations research using julia. INFORMS J. Comput. 27(2), 238–248 (2015). https://doi.org/10.1287/ijoc.2014.0623
Luh, H.P., Viniotis, I.: Threshold control policies for heterogeneous server systems. Math. Methods Oper. Res. 55(1), 121–142 (2002)
Mitzenmacher, M.: The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst. 12(10), 1094–1104 (2001)
Mitzenmacher, M.: Analyzing distributed join-idle-queue: a fluid limit approach. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 312–318. IEEE (2016)
Mitzenmacher, M.D., Vocking, B.: The asymptotics of selecting the shortest of two, improved (1999)
Mukhopadhyay, A., Mazumdar, R.R.: Analysis of randomized join-the-shortest-queue (JSQ) schemes in large heterogeneous processor-sharing systems. IEEE Trans. Control Netw. Syst. 3(2), 116–126 (2015)
Nelson, R.D., Philips, T.K.: An Approximation to the Response Time for Shortest Queue Routing, vol. 17. ACM (1989)
Rubinovitch, M.: The Slow Server Problem. J. Appl. Probab. 22(1), 205–213 (1985)
Rubinovitch, M.: The slow server problem: a queue with stalling. J. Appl. Probab. 22(4), 879–892 (1985)
Rykov, V.V., Efrosinin, D.V.: On the slow server problem. Autom. Remote Control 70(12), 2013–2023 (2009)
Selen, J., Adan, I., Kapodistria, S.: Approximate performance analysis of generalized join the shortest queue routing. In: Proceedings of the 9th EAI International Conference on Performance Evaluation Methodologies and Tools, pp. 103–110. ICST (Institute for Computer Sciences, Social-Informatics) (2016)
Selen, J., Adan, I., Kapodistria, S., van Leeuwaarden, J.: Steady-state analysis of shortest expected delay routing. Queueing Syst. 84(3–4), 309–354 (2016)
Sethuraman, J., Squillante, M.S.: Optimal stochastic scheduling in multiclass parallel queues. SIGMETRICS Perform. Eval. Rev. 27(1), 93–102 (1999). https://doi.org/10.1145/301464.301483
Shneer, S., Stolyar, A.L.: Large-scale parallel server system with multi-component jobs. Queueing Syst. 98, 21–48 (2021)
Stolyar, A.: Pull-based load distribution in large-scale heterogeneous service systems. Queueing Syst. 80(4), 341–361 (2015)
Tantawi, A.N., Towsley, D.: Optimal static load balancing in distributed computer systems. J. ACM (JACM) 32(2), 445–465 (1985)
Teschl, G.: Ordinary differential equations and dynamical systems, vol. 140. American Mathematical Society (2024)
Van Spilbeeck, I., Van Houdt, B.: On the impact of job size variability on heterogeneity-aware load balancing. Ann. Oper. Res. 293(1), 371–399 (2020)
Vargaftik, S., Keslassy, I., Orda, A.: LSQ: load balancing in large-scale heterogeneous systems with multiple dispatchers. IEEE/ACM Trans. Netw. (2020). https://doi.org/10.1109/TNET.2020.2980061
Vvedenskaya, N., Dobrushin, R., Karpelevich, F.: Queueing system with selection of the shortest of two queues: an asymptotic approach. Problemy Peredachi Informatsii 32(1), 20–34 (1996)
Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006). https://doi.org/10.1007/s10107-004-0559-y
Wang, C., Feng, C., Cheng, J.: Distributed join-the-idle-queue for low latency cloud services. IEEE/ACM Trans. Netw. 26(5), 2309–2319 (2018)
Weber, R.R.: On the optimal assignment of customers to parallel servers. J. Appl. Probab. 15(2), 406–413 (1978)
Weng, W., Zhou, X., Srikant, R.: Optimal load balancing with locality constraints. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems 4(3), 1–37 (2020)
Whitt, W.: Deciding which queue to join: some counterexamples. Oper. Res. 34(1), 55–62 (1986)
Winston, W.: Optimality of the shortest line discipline. J. Appl. Probab. 14(1), 181–189 (1977)
Zhou, X., Shroff, N., Wierman, A.: Asymptotically optimal load balancing in large-scale heterogeneous systems with multiple dispatchers. Perform. Eval. 145, 102146 (2021)
Zhou, X., Wu, F., Tan, J., Sun, Y., Shroff, N.: Designing low-complexity heavy-traffic delay-optimal load balancing schemes: theory to algorithms. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems 1(2): 39 (2017)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix: Validation via simulation
Recall that there are three places at which our analysis is approximate: (1) we assume that the number of servers \(k\rightarrow \infty \), (2) we assume asymptotic independence, meaning that the states of any subset of d servers are mutually independent, and (3) we facilitate our analysis by introducing a finite buffer of size M at each of the servers. Our goal in this section is to validate these assumptions by comparing the performance metrics that we compute analytically to those obtained via simulation.
Below we present results for one particular system parameterization; the results for other parameterizations (not shown) are qualitatively similar. We set \(d=3\), \(s=3\) server speeds, \(q_i = \frac{1}{3}\) for all \(i\in \{1,2,3\}\), \(\mu _1/\mu _3 = 3\), \(\mu _2/\mu _3=1.5\), and \(\lambda =0.85\); observe that this is one of the \(12\,825\) problem instances described in Sect. 5.3. We show results for this particular problem instance because higher values of \(\lambda \) and higher variability in the server speeds will tend to yield slower convergence between the simulated and analytical results. We treat each server’s queue in the simulated systems as having infinite buffer sizes.
Table 4 shows mean response time as a function of the number of servers k in the simulated system for the problem instance described above. Each row in the table results from averaging the mean response time across 10 runs of the simulation, where each run consists of \(50\,000 k\) arrivals and we discard the first \(15\%\) of the arrivals to ensure that the system has reached steady state. The \(95\%\) confidence intervals are all less than 0.002, and the percent error terms (computed as \(\left| (\mathbb {E}[T]^{\textrm{analysis}} - \mathbb {E}[T]^{\textrm{simulation}})/\mathbb {E}[T]^{\textrm{analysis}} \right| \cdot 100\%\)) are all less than \(0.9\%\). One can easily see that, as k increases, the simulated mean response time tends toward that produced by the analysis (in this case 1.6036). This indicates that our assumptions are reasonable and our analysis yields near-exact results, provided that the system under consideration is sufficiently large.
To further validate our results, we also compare the queue length distribution produced by simulation and by our analysis for the same parameter settings described above, and where \(k=1350\) for the simulated results (see Table 5). Again, it is readily apparent that (outside of inconsequentially rare events with probabilities very close to zero) our analytical results match the simulated results very closely, providing further confirmation that our analytical approximation is highly accurate.
Finally, we also verify that the introduction of the finite buffer (in this case our optimization methodology from Appendix D yields \(M=6\)) has virtually no impact on the system performance. Across these simulations, at most 1 out of every 1 million arrivals was ultimately assigned to a server with a queue length at or exceeding our buffer size.
Appendix: Solution existence and uniqueness
In this appendix, we will show that after pruning the space of our admissible assignment rules and introducing a finite buffer (if given an initial value condition), the system of differential equation associated with our system, i.e.,
(with equations for each \( (i,n)\in {\mathcal {S}}\times \{0,1,\ldots M\}\)), has a unique solution.
We first introduce some useful notation. Letting \(\textbf{f}\in {\mathbb {R}}^{sM}\) be given by
and understanding \(f_{i}^{(0)}\equiv 1\) and \(f_{i}^{(M+1)}\equiv 0\) for all \(i\in \mathcal {S}\), we define the function \(F:\mathbb {R}^{sM}\rightarrow {\mathbb {R}}^{sM}\) component-wise by
where
Recalling that all \(f_{i}^{(n)}\) are real-valued functions of time t, it follows that \(\textbf{f}\) and \(F(\textbf{f})\) are vector-valued functions of time t, and hence, we can rewrite the system of differential equations of interest as \((\textbf{f})'=F(\textbf{f})\), where differentiation is with respect to time t.
If F is Lipschitz continuous on the closed hypercube \([0,1]^{sM}\) (i.e., for some norm \(\Vert \cdot \Vert \) there exists some constant \(L\ge 0\) such that for all \(\textbf{f},\textbf{g}\in [0,1]^{sM}\), we have \(\left\Vert F(\textbf{f})-F(\textbf{g})\right\Vert \le L\left\Vert \textbf{f}-\textbf{g}\right\Vert \)) then by the Picard-Lindelöf Theorem [44], the differential equation \((\textbf{f})'=F(\textbf{f})\) together with the initial value condition \(F(\textbf{f})|_{t=t_0}=\textbf{f}_0\) has a unique solution (defined over some neighborhood of \(t_0\)) for all \(\textbf{f}_0\in (0,1)^{sM}\).
In order to show that F is indeed Lipschitz continuous, it is sufficient to show that each component of F is Lipschitz continuous, i.e., for some norm \(\Vert \cdot \Vert \) and for each \((i,n)\in \mathcal {S}\times \{1,\ldots ,M\}\) there exists some constant \(L_i^{(n)}\ge 0\) such that \(\left| F_i^{(n)}(\textbf{f})-F_i^{(n)}(\textbf{g})\right| \le L_i^{(n)} \left\Vert \textbf{f}-\textbf{g}\right\Vert \). Now observe that each \(F_i^{(n)}(\textbf{f})\) is a finite-term multivariate polynomial in the variables \(f_{1}^{(1)},f_{1}^{(2)},\ldots ,f_{s}^{(M)}\) (i.e., in the components of \(\textbf{f}\)) of degree at most d. To see that this is the case, observe that \(F_i^{(n)}(\textbf{f})\) is a weighted sum of finitely many products of the form
in addition to the linear terms \(-\mu _if_{i}^{(n)}\) and \(\mu _if_{i}^{(n+1)}\). Moreover, these products are each polynomials of degree at most d in the variables \(f_{1}^{(1)},f_{1}^{(2)},\ldots ,f_{s}^{(M)}\) and their weights (coefficients) are constant in these variables; the bound on the polynomial degree of these products follows from the fact that \(\sum _{\ell \in \mathcal {S}}d_\ell =d\) for each \(\varvec{d}\in \mathcal {D}\). Finally, since finite-degree finite-term polynomials are Lipschitz continuous over any compact set, each of the \(F_i^{(n)}\) functions— and hence F itself—are Lipschitz continuous on the hypercube \([0,1]^{sM}\) as desired, and so the system of differential equations of interest admits a unique solution.
Appendix: Additional optimization problems



Appendix: Selection of \(\varvec{h}\) and \(M\) values and seeding procedure
Our methodology for heuristically solving the optimization problem for a dispatching policy family \(\textbf{DPF}\) involves solving the problem multiple times, each time supporting this procedure with a solution from a more restricted family (\(\textbf{DPF}^\dag \subsetneq \textbf{DPF}\)) that was found earlier in our overall solution process. We note that we make use not only of the supporting solution \((\textbf{DPF}^\dag )^\star \)—that is, of the \(p(\varvec{d})\) and \(\widetilde{\alpha }_i(\widetilde{\varvec{x}})\) values associated with the policy \((\textbf{DPF}^\dag )^\star \)—but also
-
the vector \(\varvec{h}^\dag \equiv (h_1^\dag ,h_2^\dag ,\ldots ,h_s^\dag )\) and the buffer size \(M^\dag \) that were used in formulating the optimization problem for \(\textbf{DPF}^\dag \),
-
the probabilities \(\left( f_i^{(j)}\right) ^\dag \) associated with the solution,
-
and the response time yielded by \((\textbf{DPF}^\dag )^\star \), which we denote by \({\mathbb {E}}[T]^\dag \).
Specifically, we write that we heuristically solve the optimization problem for a dispatching policy family \(\textbf{DPF}\) supported by the solution to the problem for the dispatching policy family \(\textbf{DPF}^\dag \)—or supported by \((\textbf{DPF}^\dag )^\star \) for short—to indicate that we carry out the following procedure:
-
(1)
Set
$$\begin{aligned} h_1= & {} \max \left\{ s+3,\min \left\{ h_1^\dag ,\min \left\{ j\in \left\{ 1,2,\ldots ,M^\dag \right\} :\right. \right. \right. \\{} & {} \left. \left. \left. \left( \forall i\in \mathcal {S}:\left( f_{i}^{(j)}\right) ^{\dag }\le 10^{-6}\right) \right\} +1\right\} \right\} , \end{aligned}$$set \(\textbf{h}=(h_1,h_1-1,\ldots ,h_1-s+1)\) and set \(M=h_1\).
One obvious candidate for \(h_1\) is \(h_1^\dag \), the corresponding value that was used to find the supporting solution. However, because higher choices of \(h_1\) will result in longer solving times, we consider using a lower value than \(h_1^\dag \) if we deem this value to have been “unnecessarily high” for the supporting problem in retrospect. In particular, if there exists some queue length \(j \le M^\dag \) such that \(\left( f_{i}^{(j)}\right) ^{\dag } \le 10^{-6}\) for all server classes \(i\in \mathcal {S}\) (that is, if all queues reach length j sufficiently infrequently), then we also consider \(j+1\) as a candidate choice for \(h_1\).
We further require \(h_1 \ge s+3\) so that \(h_s\ge 4\); this ensures that we retain sufficient information to differentiate among servers of different classes. Having selected \(h_1\) according to the above considerations, we then set \(\varvec{h}\) so that \(h_i = h_{i-1}-1\) for all \(i>1\). Finally, we set the buffer size \(M = h_1\); by setting \(h_1\) and M to the same value, we maximize the level of discriminatory power available to the assignment rule, given the buffer size.
-
(2)
Run the optimization problem associated with \(\textbf{DPF}\) seeded with the solution to \(\textbf{DPF}^\dag \) with the additional constraint that
$$\begin{aligned} {\mathbb {E}}[T]\le \left( 1+10^{-3}\right) {\mathbb {E}}[T]^\dag . \end{aligned}$$If no solution is found, re-run this optimization problem without this constraint.
Introducing this constraint limits the “search” for new solutions that are superior to the supporting solution, while allowing for a small amount of “slack.” This constraint occasionally prevents the heuristic optimization procedure from finding the best solutions; hence, if no solution is found, we attempt to solve the problem without the constraint. In both cases we seed IPOPT with the supporting solution.
-
(3)
Consider the following three statements:
-
(a)
The previous step did not yield a solution.
-
(b)
The previous step yielded a solution that was significantly inferior to the supporting solution (i.e., it yielded \({\mathbb {E}}[T]\ge \left( 1+10^{-3}\right) {\mathbb {E}}[T]^\dag \)).
-
(c)
\(\textbf{DPF}=\left\langle \textbf{QRF}, \textbf{ARF} \right\rangle \) for some \(\textbf{QRF}\subsetneq \textbf{GEN}\) (i.e., the querying rule family under consideration is not \(\textbf{GEN}\)).
If at least one of the above statements is true, then do the following: solve the optimization problem associated with \(\textbf{DPF}\) without any seeding (but still using the \(\varvec{h}\) vector and \(M\) value as set in step 1, noting that these may depend on the details of the supporting solution). If no solution is found, increment \(M\) and all entries of \(\varvec{h}\) by 1, and attempt to solve the problem again, up to a maximum of 4 times.
Seeding an optimization problem with a feasible solution sometimes allows IPOPT to find better solutions, but it may lead to the identification of a local solution that would be worse than that found by its “unseeded” counterpart. For this reason, we also typically solve the unseeded problem. However, solving the unseeded problem is very computationally expensive for problems allowing for general (\(\textbf{GEN}\)) querying, so if the querying rule under consideration is \(\textbf{GEN}\) then we conduct this step only when the seeded problem did not yield a good solution.
-
(a)
-
(4)
Select the best solution (i.e., the solution that yields the lowest \({\mathbb {E}}[T]\)) among those from the previous two steps as well as the supporting solution (associated with \(\textbf{DPF}^\dag \subsetneq \textbf{DPF}\)). Report this as the solution for \(\textbf{DPF}\), and call the resulting policy \(\textbf{DPF}^\star \).
\(\textbf{DPF}^\star \) is set to be the best policy among all of those identified throughout this procedure; note that it is possible that the best policy may be the one corresponding to the supporting solution itself, i.e., \((\textbf{DPF}^\dag )^\star \).
Having defined what it means to solve the optimization problem associated with a family \(\textbf{DPF}\) supported by a policy \((\textbf{DPF}^\dag )^\star \), we are now ready to describe the procedure that we use to identify solutions for all policy families of interest. The steps for our solution-finding procedure are as follows (see Fig. 10 for a visual aid):
-
(1)
Find \(\left\langle \textsf{BR}, \textsf{AR} \right\rangle ^\star \) for each assignment rule \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\) and find \(\left\langle \textbf{SRC}, \textsf{JSQ} \right\rangle ^\star \) by heuristically solving their associated optimization problems (using IPOPT), where in each case we first set \(\varvec{h} = (15,14,...,15-s+1)\) and \(M=15\). If no solution is found increment \(M\) and all elements of \(\varvec{h}\) by 1 and attempt to solve again. Repeat this procedure a maximum of 4 times.
-
(2)
Find \(\left\langle \textsf{BR}, \textbf{CLD} \right\rangle ^\star \) by solving the associated optimization problem supported by whichever of the three policies \(\left\langle \textsf{BR}, \textsf{AR} \right\rangle ^\star \), for \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), yielded the lowest \(\mathbb {E}[T]\). Also find \(\left\langle \textbf{IID}, \textsf{AR} \right\rangle ^\star \) supported by \(\left\langle \textsf{BR}, \textsf{AR} \right\rangle ^\star \), for each \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\).
-
(3)
Find \(\left\langle \textbf{IID}, \textbf{CLD} \right\rangle ^\star \) by solving the associated optimization problem supported by whichever of the four policies \(\left\langle \textbf{IID}, \textsf{AR} \right\rangle ^\star \) for \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\) and \(\left\langle \textsf{BR}, \textbf{CLD} \right\rangle \) yielded the lowest \({\mathbb {E}}[T]\). Also find \(\left\langle \textbf{IND}, \textsf{AR} \right\rangle ^\star \) supported by \(\left\langle \textbf{IID}, \textsf{AR} \right\rangle ^\star \) for each \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\).
-
(4)
Find \(\left\langle \textbf{IND}, \textbf{CLD} \right\rangle ^\star \) by solving the associated optimization problem supported by whichever of the four policies \(\left\langle \textbf{IND}, \textsf{AR} \right\rangle ^\star \), for \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), and \(\left\langle \textbf{IID}, \textbf{CLD} \right\rangle ^\star \) yielded the lowest \({\mathbb {E}}[T]\). Also, for each \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), find \(\left\langle \textbf{GEN}, \textsf{AR} \right\rangle ^\star \) supported by whichever of the two policies \(\left\langle \textbf{IND}, \textsf{AR} \right\rangle ^\star \) and \(\left\langle \textbf{SRC}, \textsf{JSQ} \right\rangle ^\star \) yielded the lowest \({\mathbb {E}}[T]\).
-
(5)
Find \(\left\langle \textbf{GEN}, \textbf{CLD} \right\rangle ^\star \) by solving the associated optimization problem supported by whichever of the four policies \(\left\langle \textbf{GEN}, \textsf{AR} \right\rangle ^\star \), for \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), and \(\left\langle \textbf{IND}, \textbf{CLD} \right\rangle \) yielded the lowest \({\mathbb {E}}[T]\).
Note: in our numerical experiments, we obtained solutions for all policies for all 12 825 parameter settings described in Sect. 5.3except for 20 high load parameter instances (\(\lambda =.95\)); on these 20 instances, we obtained solutions for all policies (including \(\left\langle \textbf{GEN}, \textsf{SED} \right\rangle \)) except for \(\left\langle \textsf{BR}, \textsf{SED} \right\rangle ^\star \), \(\left\langle \textbf{IID}, \textsf{SED} \right\rangle ^\star \), \(\left\langle \textbf{IND}, \textsf{SED} \right\rangle ^\star \), likely because at these parameter values, the \(\textsf{SED}\)-driven policies require a higher setting of \(\varvec{h}\) and \(M\) to obtain a solution. As a result, we could not support the optimization problems for \(\left\langle \textbf{IID}, \textsf{SED} \right\rangle ^\star \) and \(\left\langle \textbf{IND}, \textsf{SED} \right\rangle ^\star \) with a solution, and when finding \(\left\langle \textbf{GEN}, \textsf{SED} \right\rangle ^\star \) for these parameter instances, we had to discard \(\left\langle \textbf{IND}, \textsf{SED} \right\rangle ^\star \) as a potential candidate for supporting the solution process.
We also find strongly performing \(\textbf{CID}\)-driven dispatching policies using the methods presented in [1]; we then revise our \(\textbf{CLD}\)-driven policies if (due to the limitations associated with heuristic optimization) we are able to find a \(\textbf{CID}\)-driven policy that outperforms the best identified \(\textbf{CLD}\)-driven counterpart. Specifically, we execute these three steps:
-
(1)
Find \(\left\langle \textsf{BR}, \textbf{CID} \right\rangle ^\star \), \(\left\langle \textbf{IID}, \textbf{CID} \right\rangle ^\star \), \(\left\langle \textbf{IND}, \textbf{CID} \right\rangle ^\star \), and \(\left\langle \textbf{GEN}, \textbf{CID} \right\rangle ^\star \) using the methods presented in [1]. If for any two of these policies \(\textbf{DPF}_1\) and \(\textbf{DPF}_2\), we have \(\textbf{DPF}_1\subseteq \textbf{DPF}_2\) and \(\textbf{DPF}_1^\star \) outperforms \(\textbf{DPF}_2^\star \) (e.g., if \(\left\langle \textbf{IID}, \textbf{CID} \right\rangle \) outperforms \(\left\langle \textbf{IND}, \textbf{CID} \right\rangle \)), then we “overwrite” the policy \(\textbf{DPF}_2^\star \) so that it is set to \(\textbf{DPF}_1^\star \).
-
(2)
For each querying rule family \(\textbf{QRF}\in \{\textsf{BR},\textbf{IID},\textbf{IND},\textbf{GEN}\}\) (where we abuse notation and treat \(\textsf{BR}\) as the family \(\{\textsf{BR}\}\)), if \(\left\langle \textbf{QRF}, \textbf{CID} \right\rangle ^\star \) yields a lower mean response time than \(\left\langle \textbf{QRF}, \textbf{CLD} \right\rangle ^\star \), “overwrite” the policy \(\left\langle \textbf{QRF}, \textbf{CLD} \right\rangle ^\star \) so that it is set to \(\left\langle \textbf{QRF}, \textbf{CID} \right\rangle ^\star \).
Note: we cannot support an optimization problem associated with a \(\textbf{CLD}\) -driven problem with the solution to a \(\textbf{CID}\) -driven problem, because \(\textbf{CID}\)-driven policies do not necessarily obey our dominance pruning, and thus our formulation for the optimization problem associated with a \(\textbf{CLD}\) -driven family may render \(\textbf{CID}\) ph-driven policies infeasible.
-
(3)
For each assignment rule \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), if \(\left\langle \textsf{Q}_{\textbf{CID}}^\star , \textsf{AR} \right\rangle \)—which is uniquely specified by pairing the querying rule of \(\left\langle \textbf{GEN}, \textbf{CID} \right\rangle ^\star \) with the assignment rule \(\textsf{AR}\)—yields a lower mean response time than \(\left\langle \textbf{GEN}, \textsf{AR} \right\rangle \), “overwrite” the policy \(\left\langle \textbf{GEN}, \textsf{AR} \right\rangle \) so that it is set to \(\left\langle \textsf{Q}_{\textbf{CID}}^\star , \textsf{AR} \right\rangle \).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Abdul Jaleel, J., Doroudi, S. & Gardner, K. Queue-length-aware dispatching in large-scale heterogeneous systems. Queueing Syst 108, 125–184 (2024). https://doi.org/10.1007/s11134-024-09918-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11134-024-09918-x