[go: up one dir, main page]

Skip to main content

Advertisement

Log in

Queue-length-aware dispatching in large-scale heterogeneous systems

  • Published:
Queueing Systems Aims and scope Submit manuscript

Abstract

One dominant approach for reducing response times in large-scale systems is Join-the-Shortest-Queue-d: whenever a job arrives, the dispatcher queries d servers at random and then assigns the job to the queried server with the shortest queue. While \(\textsf{JSQ}\)-d is known to perform quite well in systems where all servers run at the same speed, this is not the case in systems that exhibit heterogeneity with respect to server speeds. Unfortunately, there is no straightforward way to extend \(\textsf{JSQ}\)-d (or other so-called power-of-d policies) to heterogeneous systems. Should a job be assigned to the queried server with the shortest queue even if much faster servers were among those queried? Should a job be assigned to the queried server where it is expected to complete the soonest even if there is an idle, albeit slower, server available among those queried? And for that matter, should we query faster servers more often than their slower counterparts? Recent work has introduced a framework for developing strong dispatching policies by pairing suitably chosen querying and assignment rules. Within this framework, prior work has focused on finding strong-performing dispatching policies that use only the idle/busy statuses of the queried servers, rather than detailed queue length information. In this paper, we overcome the challenge of evaluating the performance of—and finding strong-performing—general scalable dispatching policies that make use of both server speed and detailed queue length information, through a combination of mean field analysis and a sequence of modified optimization problems. We find that well-designed length-aware dispatching policies can significantly outperform their idleness-based counterparts in large-scale heterogeneous systems. While the best policies of this kind are often complicated to describe, we find that in the vast majority of cases the relatively simple Shortest Expected Wait policy performs nearly as well, without the need for an especially sophisticated and time-consuming optimization procedure.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Notes

  1. As we have noted there may be multiple such choices of \(\varvec{y}\), but which \(\varvec{y}\) we choose is immaterial as \(\alpha _i^{(\widehat{x}_i)}(\varvec{y})\) is the same for all such \(\varvec{y}\). We need not resort to the axiom of choice in this definition, as we can impose a lexicographic order \(\mathcal {A}\) and take \(\varvec{x}\) to be the first (with respect to this order) \(\varvec{y}\in \mathcal {A}\) satisfying the desired property.

  2. Note that \(\sum _{n=0}^\infty \alpha _i^{(n)}(\varvec{x})\) is the probability that a job is assigned to some class-i (regardless of its queue length) when the result of the query is \(\varvec{x}\); under the dominance pruning the job will always be assigned to a class-i server of the shortest queue length.

  3. More precisely, each row in the table corresponds to a different querying rule family over which the querying rule has been optimized.

  4. Again, more precisely, each column in the table corresponds to a different assignment rule family over which the assignment rule has been optimized.

  5. Note that, in the figure, we only include decisions pertaining to the \(\widetilde{\varvec{x}}\) states that occur with probability at least \(10^{-5}\); the assignment decision is immaterial for states that do not arise sufficiently frequently. The result is that, in our example policy, there are 39 values of \((i,\widetilde{\varvec{x}})\in \mathcal {S}\times \widetilde{\mathcal {A}}\) for which \(\widetilde{\alpha }_i(\widetilde{\varvec{x}})>0\).

  6. Note that the notation introduced here is slightly different from that used for \(\textbf{IND}\) in [1]; nonetheless, the two approaches are equivalent.

References

  1. Abdul Jaleel, J., Doroudi, S., Gardner, K., Wickeham, A.: A general "power-of-d’’ dispatching framework for heterogeneous systems. Queueing Syst. (2022). https://doi.org/10.1007/s11134-022-09736-z

    Article  Google Scholar 

  2. Abdul Jaleel, J., Wickeham, A., Doroudi, S., Gardner, K.: A general“power-of-d” dispatching framework for heterogeneous systems. In: Workshop on Mathematical Performance Modeling and Analysis (MAMA) (2020)

  3. Aghajani, R., Li, X., Ramanan, K.: The PDE method for the analysis of randomized load balancing networks. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems 1(2), 1–28 (2017)

  4. Banawan, S., Zeidat, N.: A comparative study of load sharing in heterogeneous multicomputer systems. In: Proceedings. 25th Annual Simulation Symposium, pp. 22–31. IEEE (1992)

  5. Banawan, S.A., Zahorjan, J.: Load sharing in heterogeneous queueing systems. In: Proceedings of IEEE INFOCOM’89, pp. 731–739 (1989)

  6. Bonomi, F.: On job assignment for a parallel system of processor sharing queues. IEEE Trans. Comput. 39(7), 858–869 (1990)

    Article  Google Scholar 

  7. van der Boor, M., Comte, C.: Load balancing in heterogeneous server clusters: insights from a product-form queueing model. In: 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), pp. 1–10. IEEE (2021)

  8. Bramson, M., Lu, Y., Prabhakar, B.: Randomized load balancing with general service time distributions. ACM SIGMETRICS Perform. Eval. Rev. 38(1), 275 (2010). https://doi.org/10.1145/1811099.1811071

    Article  Google Scholar 

  9. Bramson, M., Lu, Y., Prabhakar, B.: Asymptotic independence of queues under randomized load balancing. Queueing Syst. (2012). https://doi.org/10.1007/s11134-012-9311-0

    Article  Google Scholar 

  10. Chen, H., Ye, H.Q.: Asymptotic optimality of balanced routing. Oper. Res. 60(1), 163–179 (2012)

    Article  Google Scholar 

  11. Comte, C.: Dynamic load balancing with tokens. Comput. Commun. 144, 76–88 (2019)

    Article  Google Scholar 

  12. Dunning, I., Huchette, J., Lubin, M.: Jump: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)

    Article  Google Scholar 

  13. Feng, H., Misra, V., Rubenstein, D.: Optimal state-free, size-aware dispatching for heterogeneous m/g/-type systems. Perform. Eval. 62(1), 475–492 (2005). https://doi.org/10.1016/j.peva.2005.07.031

    Article  Google Scholar 

  14. Gamarnik, D., Tsitsiklis, J.N., Zubeldia, M.: Stability, memory, and messaging trade-offs in heterogeneous service systems. Math. Oper. Res. 47(3), 1862–1874 (2022)

    Article  Google Scholar 

  15. Gardner, K., Jaleel, J.A., Wickeham, A., Doroudi, S.: Scalable load balancing in the presence of heterogeneous servers. Perform. Eval. (2020). https://doi.org/10.1016/j.peva.2020.102151

    Article  Google Scholar 

  16. Goren, G., Vargaftik, S., Moses, Y.: Stochastic coordination in heterogeneous load balancing systems. In: Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, pp. 403–414 (2021)

  17. Gupta, V., Harchol-Balter, M., Sigman, K., Whitt, W.: Analysis of join-the-shortest-queue routing for web server farms. Perform. Eval. 64(9–12), 1062–1081 (2007)

    Article  Google Scholar 

  18. Hellemans, T., Bodas, T., Van Houdt, B.: Performance analysis of workload dependent load balancing policies. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems (2019). https://doi.org/10.1145/3341617.3326150

  19. Hellemans, T., Van Houdt, B.: On the power-of-d-choices with least loaded server selection. ACM SIGMETRICS Perform. Eval. Rev. (2019). https://doi.org/10.1145/3292040.3219664

    Article  Google Scholar 

  20. Hyytiä, E.: Optimal routing of fixed size jobs to two parallel servers. INFOR Inf. Syst. Oper. Res. 51(4), 215–224 (2013). https://doi.org/10.3138/infor.51.4.215

    Article  Google Scholar 

  21. Izagirre, A., Makowski, A.: Light traffic performance under the power of two load balancing strategy: the case of server heterogeneity. SIGMETRICS Perform. Eval. Rev. 42(2), 18–20 (2014)

    Article  Google Scholar 

  22. Jaleel, J.A., Wickeham, A., Doroudi, S., Gardner, K.: A general “power-of-d’’ dispatching framework for heterogeneous systems. Perform. Eval. Rev. 48(2), 30–32 (2020)

    Google Scholar 

  23. Koole, G.: A simple proof of the optimality of a threshold policy in a two-server queueing system. Syst. Control Lett. 26(5), 301–303 (1995)

    Article  Google Scholar 

  24. Larsen, R.L.: Control of Multiple Exponential Servers with Application to Computer Systems. Ph.D. thesis, College Park (1981)

  25. Li, Q.L., Dai, G., Lui, J.C.S., Wang, Y.: The mean-field computation in a supermarket model with server multiple vacations. Discrete Event Dyn. Syst. 24(4), 473–522 (2014). https://doi.org/10.1007/s10626-013-0171-5

    Article  Google Scholar 

  26. Lin, W., Kumar, P.R.: Optimal control of a queueing system with two heterogeneous servers. IEEE Trans. Autom. Control 29(8), 696–703 (1984)

    Article  Google Scholar 

  27. Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J., Greenberg, A.: Join-idle-queue: a novel load balancing algorithm for dynamically scalable web services. Perform. Eval. 68(11), 1056–1071 (2011)

    Article  Google Scholar 

  28. Lubin, M., Dunning, I.: Computing in operations research using julia. INFORMS J. Comput. 27(2), 238–248 (2015). https://doi.org/10.1287/ijoc.2014.0623

    Article  Google Scholar 

  29. Luh, H.P., Viniotis, I.: Threshold control policies for heterogeneous server systems. Math. Methods Oper. Res. 55(1), 121–142 (2002)

    Article  Google Scholar 

  30. Mitzenmacher, M.: The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst. 12(10), 1094–1104 (2001)

    Article  Google Scholar 

  31. Mitzenmacher, M.: Analyzing distributed join-idle-queue: a fluid limit approach. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 312–318. IEEE (2016)

  32. Mitzenmacher, M.D., Vocking, B.: The asymptotics of selecting the shortest of two, improved (1999)

  33. Mukhopadhyay, A., Mazumdar, R.R.: Analysis of randomized join-the-shortest-queue (JSQ) schemes in large heterogeneous processor-sharing systems. IEEE Trans. Control Netw. Syst. 3(2), 116–126 (2015)

    Article  Google Scholar 

  34. Nelson, R.D., Philips, T.K.: An Approximation to the Response Time for Shortest Queue Routing, vol. 17. ACM (1989)

  35. Rubinovitch, M.: The Slow Server Problem. J. Appl. Probab. 22(1), 205–213 (1985)

    Article  Google Scholar 

  36. Rubinovitch, M.: The slow server problem: a queue with stalling. J. Appl. Probab. 22(4), 879–892 (1985)

    Article  Google Scholar 

  37. Rykov, V.V., Efrosinin, D.V.: On the slow server problem. Autom. Remote Control 70(12), 2013–2023 (2009)

    Article  Google Scholar 

  38. Selen, J., Adan, I., Kapodistria, S.: Approximate performance analysis of generalized join the shortest queue routing. In: Proceedings of the 9th EAI International Conference on Performance Evaluation Methodologies and Tools, pp. 103–110. ICST (Institute for Computer Sciences, Social-Informatics) (2016)

  39. Selen, J., Adan, I., Kapodistria, S., van Leeuwaarden, J.: Steady-state analysis of shortest expected delay routing. Queueing Syst. 84(3–4), 309–354 (2016)

    Article  Google Scholar 

  40. Sethuraman, J., Squillante, M.S.: Optimal stochastic scheduling in multiclass parallel queues. SIGMETRICS Perform. Eval. Rev. 27(1), 93–102 (1999). https://doi.org/10.1145/301464.301483

    Article  Google Scholar 

  41. Shneer, S., Stolyar, A.L.: Large-scale parallel server system with multi-component jobs. Queueing Syst. 98, 21–48 (2021)

    Article  Google Scholar 

  42. Stolyar, A.: Pull-based load distribution in large-scale heterogeneous service systems. Queueing Syst. 80(4), 341–361 (2015)

    Article  Google Scholar 

  43. Tantawi, A.N., Towsley, D.: Optimal static load balancing in distributed computer systems. J. ACM (JACM) 32(2), 445–465 (1985)

    Article  Google Scholar 

  44. Teschl, G.: Ordinary differential equations and dynamical systems, vol. 140. American Mathematical Society (2024)

  45. Van Spilbeeck, I., Van Houdt, B.: On the impact of job size variability on heterogeneity-aware load balancing. Ann. Oper. Res. 293(1), 371–399 (2020)

    Article  Google Scholar 

  46. Vargaftik, S., Keslassy, I., Orda, A.: LSQ: load balancing in large-scale heterogeneous systems with multiple dispatchers. IEEE/ACM Trans. Netw. (2020). https://doi.org/10.1109/TNET.2020.2980061

    Article  Google Scholar 

  47. Vvedenskaya, N., Dobrushin, R., Karpelevich, F.: Queueing system with selection of the shortest of two queues: an asymptotic approach. Problemy Peredachi Informatsii 32(1), 20–34 (1996)

    Google Scholar 

  48. Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006). https://doi.org/10.1007/s10107-004-0559-y

    Article  Google Scholar 

  49. Wang, C., Feng, C., Cheng, J.: Distributed join-the-idle-queue for low latency cloud services. IEEE/ACM Trans. Netw. 26(5), 2309–2319 (2018)

    Article  Google Scholar 

  50. Weber, R.R.: On the optimal assignment of customers to parallel servers. J. Appl. Probab. 15(2), 406–413 (1978)

    Article  Google Scholar 

  51. Weng, W., Zhou, X., Srikant, R.: Optimal load balancing with locality constraints. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems 4(3), 1–37 (2020)

  52. Whitt, W.: Deciding which queue to join: some counterexamples. Oper. Res. 34(1), 55–62 (1986)

    Article  Google Scholar 

  53. Winston, W.: Optimality of the shortest line discipline. J. Appl. Probab. 14(1), 181–189 (1977)

    Article  Google Scholar 

  54. Zhou, X., Shroff, N., Wierman, A.: Asymptotically optimal load balancing in large-scale heterogeneous systems with multiple dispatchers. Perform. Eval. 145, 102146 (2021)

    Article  Google Scholar 

  55. Zhou, X., Wu, F., Tan, J., Sun, Y., Shroff, N.: Designing low-complexity heavy-traffic delay-optimal load balancing schemes: theory to algorithms. In: Proceedings of the ACM on Measurement and Analysis of Computing Systems 1(2): 39 (2017)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jazeem Abdul Jaleel.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: Validation via simulation

Recall that there are three places at which our analysis is approximate: (1) we assume that the number of servers \(k\rightarrow \infty \), (2) we assume asymptotic independence, meaning that the states of any subset of d servers are mutually independent, and (3) we facilitate our analysis by introducing a finite buffer of size M at each of the servers. Our goal in this section is to validate these assumptions by comparing the performance metrics that we compute analytically to those obtained via simulation.

Below we present results for one particular system parameterization; the results for other parameterizations (not shown) are qualitatively similar. We set \(d=3\), \(s=3\) server speeds, \(q_i = \frac{1}{3}\) for all \(i\in \{1,2,3\}\), \(\mu _1/\mu _3 = 3\), \(\mu _2/\mu _3=1.5\), and \(\lambda =0.85\); observe that this is one of the \(12\,825\) problem instances described in Sect. 5.3. We show results for this particular problem instance because higher values of \(\lambda \) and higher variability in the server speeds will tend to yield slower convergence between the simulated and analytical results. We treat each server’s queue in the simulated systems as having infinite buffer sizes.

Table 4 shows mean response time as a function of the number of servers k in the simulated system for the problem instance described above. Each row in the table results from averaging the mean response time across 10 runs of the simulation, where each run consists of \(50\,000 k\) arrivals and we discard the first \(15\%\) of the arrivals to ensure that the system has reached steady state. The \(95\%\) confidence intervals are all less than 0.002, and the percent error terms (computed as \(\left| (\mathbb {E}[T]^{\textrm{analysis}} - \mathbb {E}[T]^{\textrm{simulation}})/\mathbb {E}[T]^{\textrm{analysis}} \right| \cdot 100\%\)) are all less than \(0.9\%\). One can easily see that, as k increases, the simulated mean response time tends toward that produced by the analysis (in this case 1.6036). This indicates that our assumptions are reasonable and our analysis yields near-exact results, provided that the system under consideration is sufficiently large.

Table 4 Comparing simulated and analytical results for the scenario with \(d=3\), \(s=3\) server speeds, \(q_i = \frac{1}{3}\) for all \(i\in \{1,2,3\}\), \(\mu _1/\mu _3 = 3\), \(\mu _2/\mu _3=1.5\), and \(\lambda =0.85\)

To further validate our results, we also compare the queue length distribution produced by simulation and by our analysis for the same parameter settings described above, and where \(k=1350\) for the simulated results (see Table 5). Again, it is readily apparent that (outside of inconsequentially rare events with probabilities very close to zero) our analytical results match the simulated results very closely, providing further confirmation that our analytical approximation is highly accurate.

Table 5 Queue length probability distributions for each class when \(k=1350\) with other parameters corresponding to those used in Table 4. A simulation value of 0 indicates that the queue length was never observed in the simulation

Finally, we also verify that the introduction of the finite buffer (in this case our optimization methodology from Appendix D yields \(M=6\)) has virtually no impact on the system performance. Across these simulations, at most 1 out of every 1 million arrivals was ultimately assigned to a server with a queue length at or exceeding our buffer size.

Appendix: Solution existence and uniqueness

In this appendix, we will show that after pruning the space of our admissible assignment rules and introducing a finite buffer (if given an initial value condition), the system of differential equation associated with our system, i.e.,

$$\begin{aligned} \left( f_{i}^{(n)}\right) '= & {} \frac{\lambda }{q_i} \sum _{\varvec{d}\in \mathcal {D}} \left( p(\varvec{d})\sum _{\bar{\varvec{x}}\in \bar{\mathcal {A}}_{i}^{(n-1)}\left( \varvec{d}\right) } \left( \prod _{\begin{array}{c} \ell \in \mathcal {S} \end{array}}\left( \left( f_{\ell }^{({\bar{x}}_\ell )}\right) ^{d_\ell }-\left( f_{\ell }^{({\bar{x}}_\ell +1)}\right) ^{d_\ell }\right) \right) \widetilde{\alpha }_i\left( \widetilde{\varvec{x}}\right) \right) \\{} & {} \quad - \mu _i \left( f_{i}^{(n)} - f_{i}^{(n+1)}\right) \end{aligned}$$

(with equations for each \( (i,n)\in {\mathcal {S}}\times \{0,1,\ldots M\}\)), has a unique solution.

We first introduce some useful notation. Letting \(\textbf{f}\in {\mathbb {R}}^{sM}\) be given by

$$\begin{aligned}{} & {} \textbf{f}\equiv \left( f_{1}^{(1)},f_{1}^{(2)},\ldots ,f_{1}^{(M)},f_{2}^{(1)},f_{2}^{(2)},\ldots ,f_{2}^{(M)},\ldots ,\right. \\{} & {} \qquad \left. f_{s}^{(1)},f_{s}^{(2)},\ldots ,f_{s}^{(M)}\right) \end{aligned}$$

and understanding \(f_{i}^{(0)}\equiv 1\) and \(f_{i}^{(M+1)}\equiv 0\) for all \(i\in \mathcal {S}\), we define the function \(F:\mathbb {R}^{sM}\rightarrow {\mathbb {R}}^{sM}\) component-wise by

$$\begin{aligned}{} & {} F(\textbf{f})\equiv \\{} & {} \left( F_1^{(1)}(\textbf{f}),F_1^{(2)}(\textbf{f}),\ldots F_1^{(M)}(\textbf{f}),F_2^{(1)}(\textbf{f}),F_2^{(2)}(\textbf{f}),\ldots F_2^{(M)}(\textbf{f}),\ldots ,F_s^{(1)}(\textbf{f}),F_s^{(2)}(\textbf{f}),\ldots F_s^{(M)}(\textbf{f})\right) , \end{aligned}$$

where

$$\begin{aligned}{} & {} F_i^{(n)}(\textbf{f})\equiv \frac{\lambda }{q_i} \sum _{\varvec{d}\in \mathcal {D}} \left( p(\varvec{d})\sum _{\bar{\varvec{x}}\in \bar{\mathcal {A}}_{i}^{(n-1)}\left( \varvec{d}\right) } \left( \prod _{\begin{array}{c} \ell \in \mathcal {S} \end{array}}\left( \left( f_{\ell }^{({\bar{x}}_\ell )}\right) ^{d_\ell }-\left( f_{\ell }^{({\bar{x}}_\ell +1)}\right) ^{d_\ell }\right) \right) \widetilde{\alpha }_i\left( \widetilde{\varvec{x}}\right) \right) \\{} & {} \qquad - \mu _i \left( f_{i}^{(n)} - f_{i}^{(n+1)}\right) . \end{aligned}$$

Recalling that all \(f_{i}^{(n)}\) are real-valued functions of time t, it follows that \(\textbf{f}\) and \(F(\textbf{f})\) are vector-valued functions of time t, and hence, we can rewrite the system of differential equations of interest as \((\textbf{f})'=F(\textbf{f})\), where differentiation is with respect to time t.

If F is Lipschitz continuous on the closed hypercube \([0,1]^{sM}\) (i.e., for some norm \(\Vert \cdot \Vert \) there exists some constant \(L\ge 0\) such that for all \(\textbf{f},\textbf{g}\in [0,1]^{sM}\), we have \(\left\Vert F(\textbf{f})-F(\textbf{g})\right\Vert \le L\left\Vert \textbf{f}-\textbf{g}\right\Vert \)) then by the Picard-Lindelöf Theorem [44], the differential equation \((\textbf{f})'=F(\textbf{f})\) together with the initial value condition \(F(\textbf{f})|_{t=t_0}=\textbf{f}_0\) has a unique solution (defined over some neighborhood of \(t_0\)) for all \(\textbf{f}_0\in (0,1)^{sM}\).

In order to show that F is indeed Lipschitz continuous, it is sufficient to show that each component of F is Lipschitz continuous, i.e., for some norm \(\Vert \cdot \Vert \) and for each \((i,n)\in \mathcal {S}\times \{1,\ldots ,M\}\) there exists some constant \(L_i^{(n)}\ge 0\) such that \(\left| F_i^{(n)}(\textbf{f})-F_i^{(n)}(\textbf{g})\right| \le L_i^{(n)} \left\Vert \textbf{f}-\textbf{g}\right\Vert \). Now observe that each \(F_i^{(n)}(\textbf{f})\) is a finite-term multivariate polynomial in the variables \(f_{1}^{(1)},f_{1}^{(2)},\ldots ,f_{s}^{(M)}\) (i.e., in the components of \(\textbf{f}\)) of degree at most d. To see that this is the case, observe that \(F_i^{(n)}(\textbf{f})\) is a weighted sum of finitely many products of the form

$$\begin{aligned} \prod _{\begin{array}{c} \ell \in \mathcal {S} \end{array}}\left( \left( f_{\ell }^{({\bar{x}}_\ell )}\right) ^{d_\ell }-\left( f_{\ell }^{({\bar{x}}_\ell +1)}\right) ^{d_\ell }\right) \end{aligned}$$

in addition to the linear terms \(-\mu _if_{i}^{(n)}\) and \(\mu _if_{i}^{(n+1)}\). Moreover, these products are each polynomials of degree at most d in the variables \(f_{1}^{(1)},f_{1}^{(2)},\ldots ,f_{s}^{(M)}\) and their weights (coefficients) are constant in these variables; the bound on the polynomial degree of these products follows from the fact that \(\sum _{\ell \in \mathcal {S}}d_\ell =d\) for each \(\varvec{d}\in \mathcal {D}\). Finally, since finite-degree finite-term polynomials are Lipschitz continuous over any compact set, each of the \(F_i^{(n)}\) functions— and hence F itself—are Lipschitz continuous on the hypercube \([0,1]^{sM}\) as desired, and so the system of differential equations of interest admits a unique solution.

Appendix: Additional optimization problems

figure e
figure f

Footnote 6

figure g

Appendix: Selection of \(\varvec{h}\) and \(M\) values and seeding procedure

Our methodology for heuristically solving the optimization problem for a dispatching policy family \(\textbf{DPF}\) involves solving the problem multiple times, each time supporting this procedure with a solution from a more restricted family (\(\textbf{DPF}^\dag \subsetneq \textbf{DPF}\)) that was found earlier in our overall solution process. We note that we make use not only of the supporting solution \((\textbf{DPF}^\dag )^\star \)—that is, of the \(p(\varvec{d})\) and \(\widetilde{\alpha }_i(\widetilde{\varvec{x}})\) values associated with the policy \((\textbf{DPF}^\dag )^\star \)—but also

  • the vector \(\varvec{h}^\dag \equiv (h_1^\dag ,h_2^\dag ,\ldots ,h_s^\dag )\) and the buffer size \(M^\dag \) that were used in formulating the optimization problem for \(\textbf{DPF}^\dag \),

  • the probabilities \(\left( f_i^{(j)}\right) ^\dag \) associated with the solution,

  • and the response time yielded by \((\textbf{DPF}^\dag )^\star \), which we denote by \({\mathbb {E}}[T]^\dag \).

Specifically, we write that we heuristically solve the optimization problem for a dispatching policy family \(\textbf{DPF}\) supported by the solution to the problem for the dispatching policy family \(\textbf{DPF}^\dag \)—or supported by \((\textbf{DPF}^\dag )^\star \) for short—to indicate that we carry out the following procedure:

  1. (1)

    Set

    $$\begin{aligned} h_1= & {} \max \left\{ s+3,\min \left\{ h_1^\dag ,\min \left\{ j\in \left\{ 1,2,\ldots ,M^\dag \right\} :\right. \right. \right. \\{} & {} \left. \left. \left. \left( \forall i\in \mathcal {S}:\left( f_{i}^{(j)}\right) ^{\dag }\le 10^{-6}\right) \right\} +1\right\} \right\} , \end{aligned}$$

    set \(\textbf{h}=(h_1,h_1-1,\ldots ,h_1-s+1)\) and set \(M=h_1\).

    One obvious candidate for \(h_1\) is \(h_1^\dag \), the corresponding value that was used to find the supporting solution. However, because higher choices of \(h_1\) will result in longer solving times, we consider using a lower value than \(h_1^\dag \) if we deem this value to have been “unnecessarily high” for the supporting problem in retrospect. In particular, if there exists some queue length \(j \le M^\dag \) such that \(\left( f_{i}^{(j)}\right) ^{\dag } \le 10^{-6}\) for all server classes \(i\in \mathcal {S}\) (that is, if all queues reach length j sufficiently infrequently), then we also consider \(j+1\) as a candidate choice for \(h_1\).

    We further require \(h_1 \ge s+3\) so that \(h_s\ge 4\); this ensures that we retain sufficient information to differentiate among servers of different classes. Having selected \(h_1\) according to the above considerations, we then set \(\varvec{h}\) so that \(h_i = h_{i-1}-1\) for all \(i>1\). Finally, we set the buffer size \(M = h_1\); by setting \(h_1\) and M to the same value, we maximize the level of discriminatory power available to the assignment rule, given the buffer size.

  2. (2)

    Run the optimization problem associated with \(\textbf{DPF}\) seeded with the solution to \(\textbf{DPF}^\dag \) with the additional constraint that

    $$\begin{aligned} {\mathbb {E}}[T]\le \left( 1+10^{-3}\right) {\mathbb {E}}[T]^\dag . \end{aligned}$$

    If no solution is found, re-run this optimization problem without this constraint.

    Introducing this constraint limits the “search” for new solutions that are superior to the supporting solution, while allowing for a small amount of “slack.” This constraint occasionally prevents the heuristic optimization procedure from finding the best solutions; hence, if no solution is found, we attempt to solve the problem without the constraint. In both cases we seed IPOPT with the supporting solution.

  3. (3)

    Consider the following three statements:

    1. (a)

      The previous step did not yield a solution.

    2. (b)

      The previous step yielded a solution that was significantly inferior to the supporting solution (i.e., it yielded \({\mathbb {E}}[T]\ge \left( 1+10^{-3}\right) {\mathbb {E}}[T]^\dag \)).

    3. (c)

      \(\textbf{DPF}=\left\langle \textbf{QRF}, \textbf{ARF} \right\rangle \) for some \(\textbf{QRF}\subsetneq \textbf{GEN}\) (i.e., the querying rule family under consideration is not \(\textbf{GEN}\)).

    If at least one of the above statements is true, then do the following: solve the optimization problem associated with \(\textbf{DPF}\) without any seeding (but still using the \(\varvec{h}\) vector and \(M\) value as set in step 1, noting that these may depend on the details of the supporting solution). If no solution is found, increment \(M\) and all entries of \(\varvec{h}\) by 1, and attempt to solve the problem again, up to a maximum of 4 times.

    Seeding an optimization problem with a feasible solution sometimes allows IPOPT to find better solutions, but it may lead to the identification of a local solution that would be worse than that found by its “unseeded” counterpart. For this reason, we also typically solve the unseeded problem. However, solving the unseeded problem is very computationally expensive for problems allowing for general (\(\textbf{GEN}\)) querying, so if the querying rule under consideration is \(\textbf{GEN}\) then we conduct this step only when the seeded problem did not yield a good solution.

  4. (4)

    Select the best solution (i.e., the solution that yields the lowest \({\mathbb {E}}[T]\)) among those from the previous two steps as well as the supporting solution (associated with \(\textbf{DPF}^\dag \subsetneq \textbf{DPF}\)). Report this as the solution for \(\textbf{DPF}\), and call the resulting policy \(\textbf{DPF}^\star \).

    \(\textbf{DPF}^\star \) is set to be the best policy among all of those identified throughout this procedure; note that it is possible that the best policy may be the one corresponding to the supporting solution itself, i.e., \((\textbf{DPF}^\dag )^\star \).

Fig. 10
figure 10

Flow-chart illustrating the “seeding procedure.” An arrow pointing one from dispatching policy family to another indicates that the solution of the former supports finding the solution of the latter; in some cases the best of several policies is chosen to be the supporting policy

Having defined what it means to solve the optimization problem associated with a family \(\textbf{DPF}\) supported by a policy \((\textbf{DPF}^\dag )^\star \), we are now ready to describe the procedure that we use to identify solutions for all policy families of interest. The steps for our solution-finding procedure are as follows (see Fig. 10 for a visual aid):

  1. (1)

    Find \(\left\langle \textsf{BR}, \textsf{AR} \right\rangle ^\star \) for each assignment rule \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\) and find \(\left\langle \textbf{SRC}, \textsf{JSQ} \right\rangle ^\star \) by heuristically solving their associated optimization problems (using IPOPT), where in each case we first set \(\varvec{h} = (15,14,...,15-s+1)\) and \(M=15\). If no solution is found increment \(M\) and all elements of \(\varvec{h}\) by 1 and attempt to solve again. Repeat this procedure a maximum of 4 times.

  2. (2)

    Find \(\left\langle \textsf{BR}, \textbf{CLD} \right\rangle ^\star \) by solving the associated optimization problem supported by whichever of the three policies \(\left\langle \textsf{BR}, \textsf{AR} \right\rangle ^\star \), for \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), yielded the lowest \(\mathbb {E}[T]\). Also find \(\left\langle \textbf{IID}, \textsf{AR} \right\rangle ^\star \) supported by \(\left\langle \textsf{BR}, \textsf{AR} \right\rangle ^\star \), for each \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\).

  3. (3)

    Find \(\left\langle \textbf{IID}, \textbf{CLD} \right\rangle ^\star \) by solving the associated optimization problem supported by whichever of the four policies \(\left\langle \textbf{IID}, \textsf{AR} \right\rangle ^\star \) for \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\) and \(\left\langle \textsf{BR}, \textbf{CLD} \right\rangle \) yielded the lowest \({\mathbb {E}}[T]\). Also find \(\left\langle \textbf{IND}, \textsf{AR} \right\rangle ^\star \) supported by \(\left\langle \textbf{IID}, \textsf{AR} \right\rangle ^\star \) for each \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\).

  4. (4)

    Find \(\left\langle \textbf{IND}, \textbf{CLD} \right\rangle ^\star \) by solving the associated optimization problem supported by whichever of the four policies \(\left\langle \textbf{IND}, \textsf{AR} \right\rangle ^\star \), for \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), and \(\left\langle \textbf{IID}, \textbf{CLD} \right\rangle ^\star \) yielded the lowest \({\mathbb {E}}[T]\). Also, for each \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), find \(\left\langle \textbf{GEN}, \textsf{AR} \right\rangle ^\star \) supported by whichever of the two policies \(\left\langle \textbf{IND}, \textsf{AR} \right\rangle ^\star \) and \(\left\langle \textbf{SRC}, \textsf{JSQ} \right\rangle ^\star \) yielded the lowest \({\mathbb {E}}[T]\).

  5. (5)

    Find \(\left\langle \textbf{GEN}, \textbf{CLD} \right\rangle ^\star \) by solving the associated optimization problem supported by whichever of the four policies \(\left\langle \textbf{GEN}, \textsf{AR} \right\rangle ^\star \), for \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), and \(\left\langle \textbf{IND}, \textbf{CLD} \right\rangle \) yielded the lowest \({\mathbb {E}}[T]\).

    Note: in our numerical experiments, we obtained solutions for all policies for all 12 825 parameter settings described in Sect. 5.3except for 20 high load parameter instances (\(\lambda =.95\)); on these 20 instances, we obtained solutions for all policies (including \(\left\langle \textbf{GEN}, \textsf{SED} \right\rangle \)) except for \(\left\langle \textsf{BR}, \textsf{SED} \right\rangle ^\star \), \(\left\langle \textbf{IID}, \textsf{SED} \right\rangle ^\star \), \(\left\langle \textbf{IND}, \textsf{SED} \right\rangle ^\star \), likely because at these parameter values, the \(\textsf{SED}\)-driven policies require a higher setting of \(\varvec{h}\) and \(M\) to obtain a solution. As a result, we could not support the optimization problems for \(\left\langle \textbf{IID}, \textsf{SED} \right\rangle ^\star \) and \(\left\langle \textbf{IND}, \textsf{SED} \right\rangle ^\star \) with a solution, and when finding \(\left\langle \textbf{GEN}, \textsf{SED} \right\rangle ^\star \) for these parameter instances, we had to discard \(\left\langle \textbf{IND}, \textsf{SED} \right\rangle ^\star \) as a potential candidate for supporting the solution process.

We also find strongly performing \(\textbf{CID}\)-driven dispatching policies using the methods presented in [1]; we then revise our \(\textbf{CLD}\)-driven policies if (due to the limitations associated with heuristic optimization) we are able to find a \(\textbf{CID}\)-driven policy that outperforms the best identified \(\textbf{CLD}\)-driven counterpart. Specifically, we execute these three steps:

  1. (1)

    Find \(\left\langle \textsf{BR}, \textbf{CID} \right\rangle ^\star \), \(\left\langle \textbf{IID}, \textbf{CID} \right\rangle ^\star \), \(\left\langle \textbf{IND}, \textbf{CID} \right\rangle ^\star \), and \(\left\langle \textbf{GEN}, \textbf{CID} \right\rangle ^\star \) using the methods presented in [1]. If for any two of these policies \(\textbf{DPF}_1\) and \(\textbf{DPF}_2\), we have \(\textbf{DPF}_1\subseteq \textbf{DPF}_2\) and \(\textbf{DPF}_1^\star \) outperforms \(\textbf{DPF}_2^\star \) (e.g., if \(\left\langle \textbf{IID}, \textbf{CID} \right\rangle \) outperforms \(\left\langle \textbf{IND}, \textbf{CID} \right\rangle \)), then we “overwrite” the policy \(\textbf{DPF}_2^\star \) so that it is set to \(\textbf{DPF}_1^\star \).

  2. (2)

    For each querying rule family \(\textbf{QRF}\in \{\textsf{BR},\textbf{IID},\textbf{IND},\textbf{GEN}\}\) (where we abuse notation and treat \(\textsf{BR}\) as the family \(\{\textsf{BR}\}\)), if \(\left\langle \textbf{QRF}, \textbf{CID} \right\rangle ^\star \) yields a lower mean response time than \(\left\langle \textbf{QRF}, \textbf{CLD} \right\rangle ^\star \), “overwrite” the policy \(\left\langle \textbf{QRF}, \textbf{CLD} \right\rangle ^\star \) so that it is set to \(\left\langle \textbf{QRF}, \textbf{CID} \right\rangle ^\star \).

    Note: we cannot support an optimization problem associated with a \(\textbf{CLD}\) -driven problem with the solution to a \(\textbf{CID}\) -driven problem, because \(\textbf{CID}\)-driven policies do not necessarily obey our dominance pruning, and thus our formulation for the optimization problem associated with a \(\textbf{CLD}\) -driven family may render \(\textbf{CID}\) ph-driven policies infeasible.

  3. (3)

    For each assignment rule \(\textsf{AR}\in \{\textsf{JSQ},\textsf{SED},\textsf{SEW}\}\), if \(\left\langle \textsf{Q}_{\textbf{CID}}^\star , \textsf{AR} \right\rangle \)—which is uniquely specified by pairing the querying rule of \(\left\langle \textbf{GEN}, \textbf{CID} \right\rangle ^\star \) with the assignment rule \(\textsf{AR}\)—yields a lower mean response time than \(\left\langle \textbf{GEN}, \textsf{AR} \right\rangle \), “overwrite” the policy \(\left\langle \textbf{GEN}, \textsf{AR} \right\rangle \) so that it is set to \(\left\langle \textsf{Q}_{\textbf{CID}}^\star , \textsf{AR} \right\rangle \).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Abdul Jaleel, J., Doroudi, S. & Gardner, K. Queue-length-aware dispatching in large-scale heterogeneous systems. Queueing Syst 108, 125–184 (2024). https://doi.org/10.1007/s11134-024-09918-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11134-024-09918-x

Keywords