1 Introduction

Optimization methods are essential in machine learning and are usually employed to find the model parameters and appropriate structures (Sra et al. 2011; Henning and Keifel 2013; Bull 2011; Sun et al. 2013). Among the various optimization methods available, random search approaches are direct search techniques that incorporate stochastic strategies into the search process to enable the algorithms to jump out of local optima with high probability, and they are widely used in machine learning (Bergstra and Bengio 2012). Metaheuristic methods are an important class of random search techniques. They are formally defined as an iterative generation-based process, which guides the search by using a subordinate heuristic and intelligently combining different concepts for exploring and exploiting the search space, the most popular methods in this class being the evolutionary algorithms (EAs) (Fortin et al. 2012; Pelikan 2012; Fournier and Teytaud 2011; Lu et al. 2011)

Particle swarm optimization (PSO) is a metaheuristic method attributed to be originally developed by Kennedy and Eberhart (1995), Eberhart and Kennedy (1995). It has been motivated by bird flocking and fish schooling mechanisms, and the swarm theory in particular. Unlike EAs, PSO has no evolution operators such as crossover and selection. The PSO algorithm performs an optimization task by iteratively improving a swarm of candidate solutions with respect to an objective (fitness) function. The candidate solutions, called particles, move through the problem space according to simple mathematical formulae describing the particles’ positions and velocities. The movement of each particle is influenced by its own experiences, and is also guided towards the current best known position.

In the last decade, PSO has gained increasing popularity due to its effectiveness in performing difficult optimization tasks. The reason why PSO is attractive is that it gets better solutions, in a faster and cheaper way compared to other methods, whereas has fewer parameters to adjust. It has been successfully used in many research and application areas (Poli 2007, 2008; Veeramachaneni et al. 2012). The algorithm has also been found useful in machine learning problems (Escalante et al. 2009).

To gain insights into how the algorithm works, some researchers have theoretically analyzed the PSO algorithm. These analyses mainly aimed for the behavior of the individual particle in the PSO algorithm, which is essential to the understanding of the search mechanism of the algorithm and to parameter selection (Kennedy 1998; Ozcan and Mohan 1999; Clerc and Kennedy 2002; van den Bergh 2002; Shi and Eberhart 1998a, b; Trelea 2003; Emara and Fattah 2004; Gavi and Passino 2003; Kadirkamanathan et al. 2006; Jiang et al. 2007; Poli 2009; Bonyadi et al. 2014; Cleghorn and Engelbrecht 2014). For example, Kennedy analysed a simplified particle behavior and demonstrated different particle trajectories for a range of design choices (Kennedy 1998). Clerc and Kennedy undertook a comprehensive analysis of the particle trajectory and its stability properties (Clerc and Kennedy 2002). As for the algorithm itself, Van den Bergh proved that the canonical PSO is not a global search algorithm, even not a local one (van den Bergh 2002; Van den Bergh and Engelbrecht 2006, 2010), by using the convergence criterion provided by Solis and Wets (1981).

In addition to the analyses mentioned above, there has been a considerable amount of work performed in improving the original version of the PSO through empirical studies. The original PSO proposed in Kennedy and Eberhart (1995) appeared to have weak local search ability, due to the slow convergence speed of the particles. It is universally known that the tradeoff between the local search (exploitation) and the global search (exploration) is vital for the performance of the algorithm. As such, the original PSO needs to accelerate the convergence speed of the particles in order to achieve a better balance between exploitation and exploration. The work in this area, which was first carried out by Shi and Eberhart, involved introducing an inertia weight into the update equation for velocities, in order to control the explosion in velocity values and partially help accelerate the convergence of individual particles (Shi and Eberhart 1998a, b). Clerc (1999) proposed another acceleration method by adding a constriction factor in the velocity update equation, in order to release the restriction on the particle’s velocity during the convergence history. The acceleration techniques were shown to work well, and the above two variants of PSO have laid the foundation for further enhancement of the PSO algorithm.

In general, two types of neighborhood topologies are used when the PSO algorithm is implemented. One is known as the global best topology or global best model (essentially the star model), which is employed in the PSO with inertia weight (PSO-In) and the PSO with constriction factor (PSO-Co). In this topology model, the search of the particles is guided by the global best position as well as their personal best positions. Although the algorithm with this model is able to efficiently obtain the best approximate solutions for many problems, some researchers argued that this model may be prone to encounter premature convergence when solving harder problems. If the global best particle sticks to a local or suboptimal point, it would mislead the other particles to move towards that point. In other words, other promising search areas might be missed. This had led to the investigation of other neighborhood topologies, known as the local best (lbest) models, first studied by Eberhart and Kennedy (1995) and subsequently in depth by many other researchers (Suganthan 1999; Kennedy 1999, 2002; Liang and Suganthan 2005; Mendes et al. 2004; Parrott and Li 2006; Bratton and Kennedy 2007; Kennedy and Mendes 2002; van den Bergh and Engelbrecht 2004; Lane et al. 2008; Li 2004). The objective there was to find other possible topologies to improve the performance of the PSO algorithm. Engelbrecht (2013) carried out a comprehensive and elaborate empirical comparison of the gbest PSO and lbest PSO algorithms, on a suite of 60 benchmark boundary constrained optimization problems of varying complexities. The statistics of their experimental results show that neither of the two types of algorithms can be considered to outperform the other, not even for specific problem classes in terms of convergence, exploration ability, and solution accuracy.

Another way to possibly improve the PSO algorithm is to directly sample new positions during the search. Thus, some researchers proposed several probabilistic PSO algorithms, which simulate the particle trajectories by directly sampling according to a certain probability distribution (Kennedy 2003, 2004, 2006; Sun et al. 2012; Krohling 2004; Secrest and Lamont 2003; Richer and Blackwell 2006). The Bare Bones PSO (BBPSO) family is a typical class of probabilistic PSO algorithms (Kennedy 2003). In BBPSO, each particle does not have a velocity vector, but its new position is sampled “around” a supposedly good one, according to a certain probability distribution, such as the Gaussian distribution in the original version (Kennedy 2003). Several other new BBPSO variants used other distributions which seem to generate better results (Kennedy 2004, 2006). Recently, some researchers employed stochastic process models, such as Markov chains, to analyse the convergence of the Bare Bone PSO (Poli and Langdon 2007; Zhang et al. 2014).

This paper is focused on the so-called random drift particle swarm optimization (RDPSO), which is inspired by the free electron model in metal conductors in an external electric field (Omar 1993). The basics of the original concept of the random drift model for PSO were sketched in our previous work (Sun et al. 2010). In the limited initial version of the algorithm (Sun et al. 2010), the velocity of the particle’s drift motion is simply expressed by the summation of the cognition part and the social part in the velocity update equation of the original PSO, which is not consistent with the physical meaning of the random drift model. This paper is to use a more concise form for the drift velocity, which is more in line with the physical meaning of the model, as well as a novel strategy for determining the random velocity, and thus it presents a new and different version of the RDPSO algorithm. In Sun et al. (2014a), an RDPSO version with double exponential distribution was validated by testing the algorithm on biochemical system identification problems, while in this paper, a Gaussian distribution is employed to sample the particles’ random velocities in the RDPSO algorithm. In Sun et al. (2014b), this version of RDPSO, along with two improved variants, were applied for training Hidden Markov Models for biological multiple sequence alignment, which is an important machine learning problem in bioinformatics.

This paper presents the extension of our previous work on the RDPSO algorithm. Its purpose is to gain an in-depth understanding of how the RDPSO works by making comprehensive theoretical analyses of the behavior of the individual particle in the RDPSO and the search behavior of the algorithm, and to undertake empirical studies on the issue of parameter selection for four RDPSO variants, which employ different random velocities and neighborhood topologies. Performance comparison between the RDPSO and other PSO variants is to be made by using fourteen benchmark functions from the CEC2005 benchmark suite in order to verify the effectiveness of the proposed algorithms.

The remainder of the paper is organized as follows. Section 2 describes the motivation and principle of the RDPSO algorithm. Section 3 presents the analyses of the RDPSO algorithm, and Sect. 4 presents the four RDPSO variants. Empirical studies on the parameter selection for the RDPSO algorithm and the performance comparison are provided in Sect. 5. Finally, the paper is concluded in Sect. 6.

2 Random drift particle swarm optimization (RDPSO)

2.1 Basic definitions and terminology for PSO

In a PSO with M individuals, each individual is treated as a volume-less particle in the N-dimensional space, with the current position vector and the velocity vector of particle i (\(1\le i\le M)\) at the \(n^{\mathrm{th}}\) iteration represented as \(X_{i,n} =(X_{i,n}^{1} ,X_{i,n}^{2} ,\ldots ,X_{i,n}^{N})\) and \(V_{i,n} =(V_{i,n}^{1} ,V_{i,n}^{2} ,\ldots ,V_{i,n}^{N})\). Each particle i also has the personal best (pbest) position vector \(P_{i,n} =(P_{i,n}^{1} ,P_{i,n}^{2} ,\ldots ,P_{i,n}^{N})\) at the \(n^{\mathrm{th}}\) iteration, which records the position giving the best fitness value (i.e. the objective function value) of the particle from the initialization to the current iteration. Besides, there is a vector \(G_{n} =(G_{n}^{1} ,G_{n}^{2} ,\ldots ,G_{n}^{N})\), known as the global best (gbest) position, recording the position with the best fitness value found by the whole particle swarm so far. Without loss of generality, we consider the following minimization problem:

$$\begin{aligned} {\begin{array}{ll} \hbox {Minimize }&{} {f(X)} \\ \end{array} }, \textit{s.t.} \quad X\in S\subseteq R^{N}, \end{aligned}$$
(1)

where f (X) is an objective function and S is the feasible space. Accordingly, \(P_{i,n} \) can be found by

$$\begin{aligned} P_{i,n} =\left\{ {{\begin{array}{lll} {X_{i,n} }\quad &{} {\hbox {if}}&{}\quad {f(X_{i,n} )<f(P_{i,n-1} )} \\ {P_{i,n-1} }\quad &{}{\hbox {if}}&{}\quad {f(X_{i,n} )\ge f(P_{i,n-1} )} \\ \end{array} }} \right. , \end{aligned}$$
(2)

and \(G_{n}\) updated by

$$\begin{aligned} G_{n} =P_{g,n} , \hbox {where}\quad g=\arg \mathop {\min } \limits _{1\le i\le M} [f(P_{i,n} )]. \end{aligned}$$
(3)

In the basic PSO algorithm, the particle updates its velocity and position at the \((n+1)^{\mathrm{th}}\) iteration according to the following equations:

$$\begin{aligned} V_{i,n+1}^{j}= & {} V_{i,n}^{j} +c_{1} r_{i,n}^{j} \left( P_{i,n}^{j} -X_{i,n}^{j} \right) +c_{2} R_{i,n}^{j} \left( G_{n}^{j} -X_{i,n}^{j} \right) , \end{aligned}$$
(4)
$$\begin{aligned} X_{i,n+1}^{j}= & {} X_{i,n}^{j} +V_{i,n+1}^{j} , \end{aligned}$$
(5)

for \(i=1,2,\ldots M;j=1,2\ldots ,N\), where \(c_{1}\) and \(c_{2}\) are known as the acceleration coefficients, and the parameters \(r_{i,n}^{j}\) and \(R_{i,n}^{j}\) are sequences of two different random numbers distributed uniformly on (0, 1), which is denoted by \(r_{i,n}^{j} ,R_{i,n}^{j} \sim U(0,1)\). Generally, the value of \(V_{i,n}^{j} \) is restricted within the interval \([-V_{\max } ,V_{\max }]\).

2.2 The motivation of the RDPSO algorithm

It has been demonstrated that the convergence of the whole particle swarm may be achieved if each particle converges to its local focus, \(p_{i,n} =(p_{i,n}^{1} ,p_{i,n}^{2} ,\ldots p_{i,n}^{N})\), defined by the following coordinates (Clerc and Kennedy 2002):

$$\begin{aligned} p_{i,n}^{j} =\frac{c_{1} r_{i,n}^{j} P_{i,n}^{j} +c_{2} R_{i,n}^{j} G_{n}^{j} }{c_{1} r_{i,n}^{j} +c_{2} R_{i,n}^{j} }, \quad \qquad {1\le j\le N} , \end{aligned}$$
(6)

or

$$\begin{aligned} p_{i,n}^{j} =\phi _{i,n}^{j} P_{i,n}^{j} +\left( 1-\phi _{i,n}^{j} \right) G_{n}^{j} , \end{aligned}$$
(7)

where \(\phi _{i,n}^{j} ={c_{1} r_{i,n}^{j} } \big /{(c_{1} r_{i,n}^{j} +c_{2} R_{i,n}^{j})}\), with regard to the random numbers \(r_{i,n}^{j}\) and \(R_{i,n}^{j}\) defined in Eqs. (4), (6). The acceleration coefficients \(c_{1}\) and \(c_{2}\) in the original PSO are generally set to be equal, namely, \(c_{1} =c_{2}\), and thus, \(\phi _{i,n}^{j}\) is a sequence of random numbers uniformly distributed on (0,1). As a result, Eq. (7) can be restated as

$$\begin{aligned} p_{i,n}^{j} =\phi _{i,n}^{j} P_{i,n}^{j} +\left( 1-\phi _{i,n}^{j} \right) G_{n}^{j} , \quad \varphi _{i,n}^{j} \sim U(0,1). \end{aligned}$$
(8)

In fact, as the particles are converging to their own local attractors, their current positions, pbest positions, local focuses and the gbest position are all converging to one point. Since \(p_{i,n}\) is a random point uniformly distributed within the hyper-rectangle with \(P_{i,n}\) and \(G_n\) being the two ends of its diagonal, the particle’s directional movement towards \(p_{i,n}\) makes the particle search around this hyper-rectangle and improves its fitness value locally. Hence, this directional movement essentially reflects the local search of the particle.

The motivation of the proposed RDPSO algorithm comes from the above trajectory analysis of the PSO and the free electron model in metal conductors placed in an external electric field (Omar 1993). According to this model, the movement of an electron is the superimposition of the thermal motion, which appears to be a random movement, and the drift motion (i.e., the directional motion) caused by the electric field. That is, the velocity of the electron can be expressed by \(V=VR+VD\), where VR and VD are called the random velocity and the drift velocity, respectively. The random motion (i.e., the thermal motion) exists even in the absence of the external electric field, while the drift motion is a directional movement in the opposite direction of the external electric field. The overall physical effect of the electron’s movement is that the electron careens towards the location of the minimum potential energy. In a non-convex-shaped metal conductor in an external electric field, there may be many locations of local minimum potential energies, which the drift motion generated by the electric force may drive the electron to. If the electron only had the drift motion, it might stick into a point of local minimum potential energy, just as a local optimization method converges to a local minimum of an optimization problem. The thermal motion can make the electron more volatile and, consequently, helps the electron to escape the trap of local minimum potential energy, just as a certain random search strategy is introduced into the local search technique to lead the algorithm to search globally. Therefore, the movement of the electron is a process of minimizing its potential energy. The goal of this process is essentially to find out the minimum solution of the minimization problem, with the position of the electron represented as a candidate solution and the potential energy function as the objective function of the problem.

2.3 Description of the RDPSO algorithm

Inspired by the above facts, we assume that the particle in the RDPSO behaves like an electron moving in a metal conductor in an external electric field. The movement of the particle is thus the superposition of the thermal and the drift motions. The thermal motion implements the global search of the particle, while the drift motion mainly implements the local search. The trajectory analysis, as described in the first paragraph of this subsection, indicates that, in the canonical PSO, the particle’s directional movement toward its local attractor \(p_{i,n}\) reflects the local search of the particle. In the proposed RDPSO, the drift motion of the particle is also defined as the directional movement towards \(p_{i,n} \). It represents the combination of the cognition part and the social part of the canonical PSO and, thus, is the main inheritance of the RDPSO algorithm from the canonical PSO algorithm. In the RDPSO algorithm, the ‘inertia part’ in the velocity equation of the canonical PSO is replaced by the random velocity component, which is the main difference between the RDPSO algorithm and the canonical PSO algorithm. Therefore, the velocity of the particle in the RDPSO algorithm has two components, i.e., the thermal or random component, and the drift component. Mathematically, the velocity of particle i in the \(j\hbox {th}\) dimension can be expressed by \(V_{i,n+1}^{j} =VR_{i,n+1}^{j} +VD_{i,n+1}^{j}\, (1\le i\le M,1\le j\le N), \hbox {where}\,VR_{i,n+1}^{j}\) and \(VD_{i,n+1}^{j}\) are the random velocity component and the drift velocity component, respectively.

A further assumption is that the value of the random velocity component \(VR_{i,n+1}^{j} \) follows the Maxwell velocity distribution law (Kittel and Kroemer 1980). Consequently, \(VR_{i,n+1}^{j} \) essentially follows a normal distribution (i.e., Gaussian distribution) whose probability density function is given by

$$\begin{aligned} f_{VR_{i,n+1}^{j} } (v)=\frac{1}{\sqrt{2\pi }\sigma _{i,n+1}^{j} } e^{\frac{-v^{2}}{2\left( \sigma _{i,n+1}^{j}\right) ^{2}}}, \end{aligned}$$
(9)

where \(\sigma _{i,n+1}^{j} \) is the standard deviation of the distribution. Using stochastic simulation, we can express \(VR_{i,n+1}^{j}\) as

$$\begin{aligned} VR_{i,n+1}^{j} =\sigma _{i,n+1}^{j} \varphi _{i,n+1}^{j} , \end{aligned}$$
(10)

where \(\varphi _{i,n+1}^{j}\) is a random number with a standard normal distribution, i.e., \(\varphi _{i,n+1}^{j} \sim N(0,1)\). \(\sigma _{i,n+1}^{j}\) must be determined in order to calculate \(VR_{i,n+1}^{j} \). An adaptive strategy is adopted for \(\sigma _{i,n+1}^{j}\):

$$\begin{aligned} \sigma _{i,n+1}^{j} =\alpha \big |C_{n}^{j} -X_{i,n}^{j} \big |, \end{aligned}$$
(11)

where \(C_{n} =(C_{n}^{1} ,C_{n}^{2} ,\ldots ,C_{n}^{N})\) is known as the mean best (mbest) position defined by the mean of the pbest positions of all the particles, namely,

$$\begin{aligned} {C_{n}^{j} =(1/M)\sum \nolimits _{i=1}^{M} {P_{i,n}^{j} ,} } \qquad {(1\le j\le N)}. \end{aligned}$$
(12)

Thus, Eq. (10) can be restated as

$$\begin{aligned} VR_{i,n+1}^{j} =\alpha \big |C_{n}^{j} -X_{i,n}^{j} \big |\varphi _{i,n+1}^{j} , \end{aligned}$$
(13)

where \(\alpha >0\) is an algorithmic parameter called the thermal coefficient. In the next section, where the search behavior of individual particles and the whole swarm is analyzed, we will find that this random velocity component drives the particle away from the global best position, so it indeed reflects the global search of the particle.

The role of the drift velocity component, \(VD_{i,n+1}^{j}\), is to implement the local search of the particle, which can be achieved by the directional movement toward \(p_{i,n}\), as has been mentioned above. In this paper we use the following simple linear expression for \(VD_{i,n+1}^{j}\):

$$\begin{aligned} VD_{i,n+1}^{j} =\beta \left( p_{i,n}^{j} -X_{i,n}^{j} \right) , \end{aligned}$$
(14)

where \(\beta >0\) is a deterministic constant and is another algorithmic parameter called the drift coefficient. This form of \(VD_{i,n+1}^{j}\)in Eq. (14) is more concise than the one in Sun et al. (2010) and it has a clear physical meaning that it reflects the particle’s directional movement towards \(p_{i,n}\). In Theorem 1 in the Appendix, it is proven that, if there is only drift motion and, i.e., \(V_{i,n+1}^{j} =VD_{i,n+1}^{j} , X_{i,n}^{j} \rightarrow p_{i,n}^{j}\) as \(n\rightarrow \infty \) when \(0<\beta <2\), meaning that the expression of \(VD_{i,n+1}^{j}\) in Eq. (14) can indeed guarantee the particle’s directional movement toward \(p_{i,n}\) as an overall result. More specifically, if \(0<\beta <1, X_{i,n}^{j}\) asymptotically converges to \(p_{i,n}^{j}\), which means that the sampling space of \(X_{i,n+1}\) does not cover the hyper-rectangle with \(P_{i,n}\) and \(G_{n}\) being the two ends of its diagonal. If \(\beta =1, X_{i,n+1}^{j}\) is identical to \(p_{i,n}^{j}\) so that the sampling space of \(X_{i,n+1}\) is exactly the hyper-rectangle. If \(1<\beta <2, X_{i,n}^{j}\) converges to \(p_{i,n}^{j}\) in oscillation and thus the sampling space of \(X_{i,n+1}\) covers the hyper-rectangle and even other neighborhoods of \(G_{n}\), where points with better fitness values may exist. As such, when we select the value of \(\beta \) for real application of the RDPSO algorithm, it may be desirable to set \(1\le \beta <2\) for good local search ability of the particles.

With the above specification, a novel set of update equations can be obtained for the particle of the RDPSO algorithm:

$$\begin{aligned} V_{i,n+1}^{j}= & {} \alpha \big |C_{n}^{j} -X_{i,n}^{j} \big |\varphi _{i,n+1}^{j} +\beta \left( p_{i,n}^{j} -X_{i,n}^{j}\right) , \end{aligned}$$
(15)
$$\begin{aligned} X_{i,n+1}^{j}= & {} X_{i,n}^{j} +V_{i,n+1}^{j}. \end{aligned}$$
(16)

The procedure of the algorithm is outlined below in Algorithm 1. Like in the canonical PSO, the value of \(V_{i,n}^{j}\) in the RDPSO is also restricted within the interval \([-V_{\max } ,V_{\max }]\) at each iteration.

figure a

3 Analysis of the RDPSO algorithm

3.1 Dynamical behaviour of the RDPSO particle

An analysis of the behavior of an individual particle in the RDPSO is essential to understanding how the RDPSO algorithm works and how to select the algorithmic parameters. Since the particle’s velocity is the superimposition of the thermal velocity and the drift velocity, the conditions for the particle’s position to converge or to be bounded are far more complex than those given in Sect. 2.3 when only the drift motion exists. In this subsection, we will carry out theoretical and empirical studies on the stochastic dynamical behavior of the particle in the RDPSO. Since each dimension of the particle’s position is updated independently, we only need to consider a single particle in a one-dimensional space without loss of generality. As such, Eqs. (15) and (16) can be simplified as

$$\begin{aligned} V_{n+1}= & {} \alpha |C-X_n |\varphi _{n+1} +\beta (p-X_n ), \end{aligned}$$
(17)
$$\begin{aligned} X_{n+1}= & {} X_{n+1} +V_{n+1} , \end{aligned}$$
(18)

where \(X_{n}\) and \(V_{n}\) denote the current position and the velocity of the particle respectively at the \(n^{\mathrm{th}}\) iteration, and the local focus of the particle and the mean best position are denoted by p and C, which are treated as probabilistically bounded random variables, i.e., \(P\{\sup |p|<\infty \}=1\) and \(P\{\sup |C|<\infty \}=1\). In Eq. (17), \(\{\varphi _{n}\}\) is a sequence of independent identically distributed random variables with \(\varphi _{n} \sim N(0,1)\).

Since the probability distribution of \(\varphi _{n}\) is symmetrical with respect to the ordinate, Eq. (17) has the following equivalence:

$$\begin{aligned} V_{n+1} =\alpha (X_{n} -C)\varphi _{n+1} -\beta (X_{n} -p), \end{aligned}$$
(19)

that is, the probability distributions of \(V_{n+1}\) in Eqs. (17) and (19) are the same. Based on Eqs. (19) and (18), several theorems on the dynamical behavior of an individual particle in RDPSO are proved in the Appendix. As shown by Theorem 2, the particle’s behavior is related to the convergence of \(\rho _{n} =\prod _{i=1}^{n} {\lambda _{i}}\), where \(\lambda _{n} =\alpha \varphi _{n} +(1-\beta )\) subject to a normal distribution, namely, \(\lambda _{n} \sim N(1-\beta ,\alpha ^{2})\). It is showed by Theorem 3 that if and only if \(\Delta =E(\ln |\lambda _n |)\le 0\), namely, the values of \(\alpha \) and \(\beta \) satisfy the following relationship:

$$\begin{aligned} \Delta =\frac{1}{\sqrt{2\pi }\alpha }\int _{-\infty }^{+\infty } {\ln |x|e^{-\frac{[x-(1-\beta )]^{2}}{2\alpha ^{2}}}} dx\le 0, \end{aligned}$$
(20)

\(\rho _{n}\) is probabilistically bounded and, thus, the position of the particle is probabilistically bounded too. In inequality (20), the value of \(\Delta \) is an improper integral which is undefined at \(x=0\). By a Dirichlet test, the improper integral in (20) is convergent if both \(\alpha \) and \(\beta \) are two finite numbers (Courant 1989).

Inequality (20) does not give any explicit constraint relation between \(\alpha \) and \(\beta \) due to the difficulty in calculating the improper integral in the inequality. A sufficient condition for \(\Delta <0\) (i.e. \({\mathop {\hbox {lim}}\nolimits _{n\rightarrow \infty }} \rho _{n} ={\mathop {\hbox {lim} }\nolimits _{n\rightarrow \infty }} \prod _{i=1}^{n} {\lambda _{i} } =0)\) is derived in Theorems 4. It says that if the values of \(\alpha \) and \(\beta \) are subject to the constraint:

$$\begin{aligned} 0<\alpha <1, \quad 0<\beta <2, \end{aligned}$$
(21)

\(\Delta <0\) and \(\rho _{n} =\prod _{i=1}^{n} {\lambda _{i} }\) converges to zero, which, as a consequence, ensures the probabilistic boundedness of the particle’s current position. Figure 1 traces some simulation results for the stochastic behaviour of the particle with different values of \(\alpha \) and \(\beta \), with C fixed at \(X=0.001, p\) fixed at the origin and the initial position of the particle set as \(X_{0} =1000\). Figure 1a–c show the results with \(\alpha \) and \(\beta \) satisfying constraint (21). It can be observed that the particle’s position oscillated around p and C, implying that the position is probabilistically bounded in these cases. Figure 1d–i show that the particle’s position is probabilistically bounded in some cases when \(\alpha \) and \(\beta \) do not satisfy constraint (21). This verifies that constraint (21) is a sufficient condition for \(\Delta <0\) or \({\mathop {\hbox {lim} }\nolimits _{n\rightarrow \infty }} \rho _{n} =0\). At other values of \(\alpha \) and \(\beta \) that do not satisfy (21), the value of \(\ln |X_{n} -p|\) reached 700 and stopped changing after a certain number of iterations, as shown in Fig. 1j–o. In such cases, the value of \(|X_{n} -p|\) reaches the maximum positive value that the computer can identify, so that it can be considered to have diverged to infinity.

Constraint (21) is of practical significance to the application of the RDPSO algorithm, although it does not give the necessary condition for \(\Delta \le 0\). In practice, the values of \(\alpha \) and \(\beta \) can generally be selected within the intervals given by (21), for a satisfactory algorithmic performance when the algorithm is applied to real-world problems. In Sect. 4, a detailed investigation into how to select these algorithmic parameters will be undertaken by using three widely used functions and then the algorithmic performance with these parameters values will be further tested on a set of benchmark functions from the CEC2005 benchmark suite.

Fig. 1
figure 1figure 1

The figure visualizes the simulation results for the behavior of the particle at different values of \(\alpha \) and \(\beta \). ac show that when the values of \(\alpha \) and \(\beta \) are selected within the intervals (0, 1) and (0, 2), the particle’s position is probabilistically bounded. di show that the particle’s position may be also probabilitcally bounded at some values of \(\alpha \) and \(\beta \) not satisfying constraint (21). jo show some cases that when \(\alpha \) and \(\beta \) do not satisfy constraint (21), \(\ln |X_n -p|\rightarrow +\infty \) (i.e.\(|X_{n} -p|\rightarrow +\infty )\) as n increases

3.2 The RDPSO’s search behavior

In the above analysis, it is assumed that each particle in the RDPSO updates its velocity and position independently, with the mean best position C and the local focus p being treated as independent probabilistically bounded random variables, and thus it is revealed that the behavior of an individual particle is related to the convergence or the boundedness of \(\rho _{n}\). However, the actual situation is more complex when the RDPSO algorithm runs in a real-world landscape. During the search process, each particle is influenced not only by \(\rho _{n}\) but also by the points \(C_{n}\) and \(p_{i,n}\), which cannot be treated as independent random variables anymore but are dependent on the other particles. As for \(C_{n}\), it is the mean of the pbest positions of all the particles, moving with each pbest position varying in the course of search. The local focus \(p_{i,n}\), is a random point associated with the pbest position of particle \(i (P_{i,n})\) and the gbest position \(G_{n}\) that rotates among the pbest positions of the member particles according to their fitness values. In contrast to \(C_{n}, p_{i,n}\), as well as \(P_{i,n}\) and \(G_{n}\), varies more dramatically, since \(C_{n}\) averages the changes of all the pbest positions.

Generally, the pbest positions of all the particles converge to a single point when the RDPSO algorithm is performing an optimization task, which implies that \(P\{{\mathop {\hbox {lim} }\limits _{n\rightarrow \infty }} |C_{n} -p_{i,n} |=0\}=1\) as indicated in the proof of Theorem 1. Referring to Eqs. (29)–(32), we can infer that if and only if \(\Delta <0, {\mathop {\hbox {lim} }\nolimits _{n\rightarrow \infty }} |X_{i,n} -C_{n} |=0\) or \({\mathop {\hbox {lim} }\nolimits _{n\rightarrow \infty }} |X_{i,n} -p_{i,n} |=0\). That means the current positions and the pbest positions of all the particles converge to a single point when \(\Delta <0\). It can also be found from Theorems 2 and 3 that, when \(\Delta =0\), the particle’s position is probabilistically bounded and oscillates around but does not converge to \(C_{n}\) or \(p_{i,n}\), even though \(P\{{\mathop {\hbox {lim} }\nolimits _{n\rightarrow \infty }} |C_{n} -p_{i,n} |=0\}=1\). When \(\Delta >0\), it is shown by Theorems 2 and 3 that the particle’s current position diverges and the explosion of the whole particle swarm happens.

In practical applications, it is always expected that the particle swarm in the RDPSO algorithm can converge to a single point, just as in the canonical PSO. Essentially, there are two movement trends, i.e. the random motion and the drift motion, for each particle in the RDPSO, as has been described in the motivation of the algorithm. These two motions reflect the global search and the local search, respectively. The drift motion, represented by the \(VD_{i,n+1}^{j}\) in the velocity update Eq. (15), draws the particle towards the local focus and makes the particle search in the vicinity of the gbest position and its pbest position so that the particle’s current and pbest positions can constantly come close to the gbest position. On the other hand, the random component \(VR_{i,n+1}^{j}\) results in a random motion, leading the particle to be volatile so that its current position may reach a point far from the gbest position and its pbest position. This component can certainly endue the particle a global search ability, which in the canonical PSO algorithm is given by the velocity at the last iteration, i.e. \(V_{i,n+1}^{j}\). Nevertheless, an important characteristic distinguishing the RDPSO from other randomized PSO methods is that the random component of the particle’s velocity uses an adaptive standard deviation for its distribution, i.e. \(\alpha |X_{i,n}^{j} -C_{n}^{j}|\). Such a random component makes the random motion of the particle have a certain orientation. The effect of \(VR_{i,n+1}^{j}\) is pulling or pushing the particle away from the gbest position by \(C_{n}^{j}\) as shown by Fig. 2, not just displacing the particle randomly as the mutation operation does in some variants of PSO and evolutionary algorithms. Figure 2a shows that, when \(C_{n}^{j}\) is at the left side of \(X_{i,n}^{j}\) and \(G_{n}^{j}, |X_{i,n}^{j} -C_{n}^{j} |=X_{i,n}^{j} -C_{n}^{j} \). The drift component \(\beta (p_{i,n}^{j} -X_{i,n}^{j})\) draws the particle right towards \(G_{n}^{j}\). If \(\varphi _{i,n+1}^{j} >0, \alpha |X_{i,n}^{j} -C_{n}^{j} |\varphi _{i,n+1}^{j} =\alpha (X_{i,n}^{j} -C_{n}^{j} )\varphi _{i,n+1}^{j} >0\), which makes the particle move to the right further and, thus, pushes \(X_{i,n}^{j}\) away from \(G_{n}^{j}\). If \(\varphi _{i,n+1}^{j} <0, \alpha (X_{i,n}^{j} -C_{n}^{j} )\varphi _{i,n+1}^{j} <0\), whose effect is that the particle’s position is pulled away from \(G_{n}^{j}\). Figure 2b illustrates the case when \(C_{n}^{j}\) is at the right side of \(X_{i,n}^{j}\) and \(G_{n}^{j}\). Only the effect of the sign of \(\varphi _{i,n+1}^{j}\) on the direction of the particle’s motion is opposite to that in Fig. 2a. Generally speaking, the longer the distance \(|X_{i,n}^{j} -C_{n}^{j} |\), the farther the particle’s position at next iteration \(X_{i,n+1}^{j}\) will be away from the gbest position. If the particle’s position is close to the gbest position, the random component can help the particle escape the gbest position easily, when the gbest position is stuck into a local optimal solution. As far as the whole particle swarm is concerned, the overall effect is that the RDPSO has a better balance between the global search and the local search, as illustrated below.

Fig. 2
figure 2

The figure shows that the mbest position \(C_{n}^{j}\) pulls or pushes the particle away from \(G_{n}^{j}\). The direction of the particle’s movement is determined by the sign of \(\varphi _{i,n+1}^{j} \)

Fig. 3
figure 3

The figure shows that \(C_{n}\) is shifted toward the lagged particles and thus far from the particles clustering around \(G_{n}\). The particles are pulled or pushed away from the neighbourhood of \(G_{n}\) and would search the landscape globally

In the RDPSO method, the swarm could not gather around the gbest position without waiting for the lagged particles. Figure 3 depicts the concept where the pbest positions of several particles, known as the lagged particles, are located far away from the rest of the particles and the gbest position \(G_{n}\), while the rest of the particles are nearer to the global best position, with their pbest positions located within a neighbourhood of the gbest position. The mbest position \(C_{n}\) would be shifted towards the lagged particles and be located outside the neighbourhood. When the lagged particles chase after their colleagues, that is, converge to \(G_{n}, C_{n}\) approaches \(G_{n}\) slowly. The current positions of the particles within the neighbourhood would be pulled or pushed outside the neighbourhood by \(C_{n}\), and the particles would explore the landscape globally around \(G_{n}\) so that the current \(G_{n}\) could skip out onto a better solution. As \(C_{n}\) careens toward the neighbourhood, the exploration scope of the particle becomes narrower. After the lagged particles move into the neighbourhood of the gbest position, \(C_{n}\) also enters the neighbourhood and the particles then perform the same search process based on a smaller neighbourhood of the gbest position. In the canonical PSO, each particle converges to the gbest position independently and has less opportunity to escape from the neighbourhood of the gbest position. When the speed of the particle is small, it is impossible for the particles within the neighbourhood to jump out of the neighbourhood. As a result, these particles would perform local search around the gbest position and only the lagged particles could search globally. Evident from the above analysis, the RDPSO algorithm generally has a better balance between exploration and exploitation than the canonical PSO.

Moreover, different from mutation operations that play minor roles in some variants of PSO and evolutionary algorithms, the random motion has an equally important role as the drift motion in the RDPSO. Owing to the random motion oriented by \(C_{n}\), the RDPSO achieves a good balance between the local and global searches during the search process. By the influences of both \(C_{n}\) and its local focus, each particle in the RDPSO have two movement trends, convergence and divergence, but the overall effect is their convergence to a common point of all the particles if \(\Delta <0\). The convergence rate of the algorithm depends on the values of \(\alpha \) and \(\beta \), which can be tuned to balance the local and global search, when the algorithm is used for a practical problem.

4 Four variants of RDPSO

In order to investigate the RDPSO in depth, some variants of the algorithm are proposed in this paper. Two methods are used for determining the random component of the velocity. One employs Eq. (13) for this component and the other replaces the mbest position in Eq. (13) by thepbest position of a randomly selected particle in the population at each iteration. For convenience, we denote the randomly selected pbest position by \({C}'_{n}\). For each particle, the probability for its pbest position to be selected as \({C}'_{n}\) is 1 / M. Consequently, the expected value of \({C}'_{n}\) equals to \(C_{n}\), that is,

$$\begin{aligned} E({C}'_{n} )=\sum _{i=1}^{M} {\frac{1}{M}} P_{i,n} =C_{n}. \end{aligned}$$
(22)

However, as the \({C}'_{n}\) appears to be more changeful than \(C_{n}\), the current position of each particle at each iteration shows to be more volatile than that of the particle with Eq. (13), which diversifies the particle swarm and in turn enhances the global search ability of the algorithm.

In addition to the global best model, the local best model is also examined for the RDPSO, in order to make a comprehensive empirical analysis of the RDPSO algorithm in different neighborhood topologies. The ring topology is a widely used neighborhood topology for the local best model (Li 2010; Engelbrecht 2013), in which each particle connects exactly to two neighbors. The standard PSO (SPSO) in Bratton and Kennedy (2007) is defined by the integration of the PSO-Co with the ring topology. Although there are various neighborhood topologies, we chose the ring topology for the RDPSO with the local best model. Thus, the combination of the two topologies with the two strategies for the random velocity component produces the four resulting RDPSO variations:

RDPSO-Gbest: The RDPSO algorithm with the global best model and the random velocity component described by Eq. (13).

RDPSO-Gbest-RP: The RDPSO algorithm using the global best model and employing a randomly selected pbest position to determine the random velocity component.

RDPSO-Lbest: The RDPSO algorithm with the ring neighborhood topology and the random velocity component in Eq. (13), where, however, the mbest position is the mean of the pbest positions of the neighbors of each particle and the particle itself, instead of the mean of the pbest positions of all the particles in the swarm.

RDPSO-Lbest-RP: The RDPSO algorithm using the ring neighborhood topology and employing the pbest position of a particle randomly selected from the neighbors of each particle and the particle itself.

5 Experimental results and discussion

5.1 Benchmark problems

The previous analysis of the RDPSO provides us with a deep insight into the mechanism of the algorithm. However, it is not sufficient to evaluate the effectiveness of the algorithm without comparing it with other methods on a set of benchmark problems. To evaluate the RDPSO in an empirical manner, the first fourteen functions from the CEC2005 benchmark suite (Suganthan et al. 2005) were employed for this purpose. Functions \(F_{1}\) to \(F_{5}\) are unimodal, functions \(F_{6}\) to \(F_{12}\) are multi-modal, and \(F_{13}\) and \(F_{14 }\) are two expanded functions. The mathematical expressions and properties of the functions are described in detail in (Suganthan et al. 2005). The codes in Matlab, C and Java for the functions can be found at http://www.ntu.edu.sg/home/EPNSugan/. The dimension of each tested benchmark function in our experiments is 30.

5.2 Empirical studies on the parameter selection of the RDPSO variants

Parameter selection is the major concern when a stochastic optimization algorithm is employed to solve a given problem. For the RDPSO, the algorithmic parameters include the swarm size, the maximum number of iterations, the thermal coefficient \(\alpha \) and the drift coefficient \(\beta \). As in the canonical PSO, the swarm size in the RDPSO is recommended to be set from 20 to 100. The selection of the maximum number of iterations depends on the problem to be solved. In the canonical PSO, the acceleration coefficients and the inertia weight (or the constriction factor) have been studied extensively and in depth since these parameters are very important for the convergence of the algorithm. For the RDPSO algorithm, \(\alpha \) and \(\beta \) play the same roles as the inertia weight and the acceleration coefficients for the canonical PSO. In Sect. 3, it was shown that it is sufficient to set \(\alpha \) and \(\beta \) according to (21), such that \(\Delta <0\), to prevent the individual particle from divergence and guarantee the convergence of the particle swarm. However, this does not mean that such values of \(\alpha \) and \(\beta \) can lead to a satisfactory performance of the RDPSO algorithm in practical applications. This section intends to find out, through empirical studies, suitable settings of \(\alpha \) and \(\beta \) so that the RDPSO may yield satisfactory algorithmic performance in general.

There are various control methods for the parameters \(\alpha \) and \(\beta \) when the RDPSO is applied to practical problems. A simple approach is to set them as fixed values when the algorithm is executed. Another method is to decrease the value of the parameter linearly during the course of the search process. In this work, we fixed the value of \(\beta \) in all the experiments and employed the two control methods for \(\alpha \), respectively.

To specify the value of \(\alpha \) and \(\beta \) for real applications of the RDPSO, we tested the RDPSO-Gbest, RDPSO-Gbest-RP, RDPSO-Lbest, and RDPSO-Lbest-RP with different parameter settings on three frequently used functions from the CEC2005 benchmark suite: Shifted Rosenbrock Function \((F_{6})\), Shifted Rotated Griewank’s Function \((F_{7})\), and Shifted Rastrigin’s Function \((F_{9})\), using the two methods for controlling \(\alpha \) with \(\beta \) fixed at 1.5 or 1.45. The initial position of each particle was determined randomly within the initialization range. One reason why only three functions were used for parameter selection is that we want to show that the RDPSO algorithm is not very sensitive to the parameter values, and parameter values found by optimizing these three functions can lead to good performance when optimizing other functions in general. Another reason is that these three functions are widely used in the existing literature and that the optimal parameter values for each function are very different, that is, the optimal parameter values for a function may have a poor performance when used for another function.

For each parameter configuration, each algorithm, using 40 particles, was tested for 100 runs on every benchmark function. To determine the effectiveness of each algorithm for the \(\alpha \) setting under a control method with a fixed value of \(\beta \) on each problem, the best objective function value (i.e., the best fitness value) found after 5000 iterations was averaged over 100 runs of tests for that parameter setting and the same benchmark function. The results (i.e., the mean best fitness values) obtained by the parameter settings with the same control method for \(\alpha \) were compared across the three benchmarks. The best parameter setting with each control method for \(\alpha \) was selected by ranking the averaged best objective function values for each problem, summing the ranks, and taking the value that had the lowest summed (or average) rank, provided that the performance is acceptable (in the top half of the rankings) in all the tests for a particular parameter configuration.

The rankings of the results for the RDPSO-Gbest are plotted in Fig. 4. When the fixed value method was used for \(\alpha \), it was set to a range of values subject to constraint (21), with \(\beta \) fixed at 1.5 or 1.45 in each case. Results obtained for other parameter settings were very poor and are not considered for ranking. The best average rank among all the tested parameter configurations occurs when \(\alpha =0.7\) and \(\beta =1.5\). When linearly varying \(\alpha \) was used, its initial value \(\alpha _{1}\) and final value \(\alpha _{2} \, (\alpha _{1} >\alpha _{2})\) were selected from a series of different values subject to constraint (21), with \(\beta \) set at 1.5 or 1.45. Only acceptable results are ranked and plotted in Fig. 4. It was found that with \(\beta =1.45\), decreasing \(\alpha \) linearly from 0.9 to 0.3 leads to the best performance among all the tested parameter settings.

Fig. 4
figure 4

The rankings of the mean best fitness values for each of the three benchmarks and the average rank for the RDPSO-Gbest

The rankings of the results for the RDPSO-Gbest-RP are visualized in Fig. 5. It is clear from these results that the value of \(\alpha \), whether it used the fixed value or time-varying method, should be set relatively small, so that the algorithm is comparable in performance with the RDPSO-Gbest, when \(\beta \) was given. Results obtained with \(\alpha \) outside the range [0.38, 0.58] were of poor quality and were not used for ranking. As shown in Fig. 5, when the fixed value method for \(\alpha \) was used, the best average ranks among all tested parameter settings were obtained by setting \(\alpha =0.5\) and \(\beta =1.45\). On the other hand, the algorithm exhibited the average best performance when \(\beta =1.45\) and \(\alpha \) was decreasing linearly from 0.6 to 0.2, for the method of linearly varying \(\alpha \).

Fig. 5
figure 5

The rankings of the mean best fitness values for each of the three benchmarks and the average rank for the RDPSO-Gbest-RP

Fig. 6
figure 6

The rankings of the mean best fitness values for each of the three Benchmarks and the average rank for the RDPSO-Lbest

Figure 6 shows the rankings of the results for the RDPSO-Lbest. For the fixed \(\alpha \) method, the results of the algorithm obtained with \(\alpha \) outside the range [0.6, 0.78] did not participate in ranking because of their poor qualities. The best average ranking among all the tested parameter configurations in this case occur when \(\alpha =0.7\) and \(\beta =1.5\). For the linearly varying \(\alpha \) method, it was identified that decreasing \(\alpha \) linearly from 0.9 to 0.3 with \(\beta =1.45\) could yield the average best quality results among all the tested parameter configurations.

Figure 7 plots the rankings of the results for the RDPSO-Lbest-RP. For fixed \(\alpha \), the best average ranking among all the tested parameter settings could be obtained when \(\alpha =0.7\) and \(\beta =1.45\). For time-varying \(\alpha \), the algorithm obtained the average best performance among all the tested parameter configurations when \(\alpha \) was decreasing linearly from 0.9 to 0.3 with \(\beta =1.45\).

Fig. 7
figure 7

The rankings of the mean best fitness values for each of the three Benchmarks and the average rank for the RDPSO-Lbest-RP

5.3 Performance comparisons among the RDPSO variants and other PSO variants

To explore the generalizability of the parameter selection methods for \(\alpha \) used for the RDPSO in the last subsection, and to the determine whether RDPSO can be as effective as other variants of PSO, a further performance comparison using the first fourteen benchmark functions of the CEC2005 benchmark suite was made among the RDPSO algorithms (i.e., the RDPSO-Gbest, RDPSO-Gbest-RP, RDPSO-Lbest and RDPSO-Lbest-RP) and other PSO variants, including the PSO with inertia weight (PSO-In) (Shi and Eberhart 1998a, b, 1999), the PSO with constriction factor (PSO-Co) (Clerc and Kennedy 2002; Clerc 1999), the PSO-In with local best model (PSO-In-Lbest) (Liang et al. 2006), the standard PSO (SPSO) (i.e. PSO-Co-Lbest) (Bratton and Kennedy 2007), the Gaussian bare bones PSO (GBBPSO) (Kennedy 2003, 2004), the comprehensive learning PSO (CLPSO) (Liang et al. 2006), the dynamic multiple swarm PSO (DMS-PSO) (Liang and Suganthan 2005), and the fully-informed particle swarm (FIPS) (Mendes et al. 2004). Each algorithm was run 100 times for each benchmark function, using 40 particles to search the global optimal fitness value. At each run, the particles in the algorithms started in new and randomly-generated positions, which are uniformly distributed within the search bounds. Each run of every algorithm lasted for 5000 iterations, and the best fitness value (objective function value) for each run was recorded.

For the four RDPSO variants, it was shown in the last subsection that the linearly decreasing \(\alpha \) with fixed \(\beta \) was stable in the search performance, although fixing both \(\alpha \) and \(\beta \) had better results in some cases. Thus, in this group of experiments for performance comparison, the linearly decreasing \(\alpha \) with fixed \(\beta \) was used for the RDPSO variants, and the parameters for each case were set as those indentified and recommended by the previous experiments on the three benchmark functions. These parameter configurations were selected from the experiments on the three functions, so they are far from optimal. The parameter configurations for other PSO variants were the same as those recommended by the existing publications. For the PSO-In, the inertia weight linearly decreased from 0.9 to 0.4 in the course of the run and we fixed the acceleration coefficients \((c_{1}\) and \(c_{2})\) at 2.0, as in the empirical study performed by Shi and Eberhart (1999). For the PSO-Co, the constriction factor was set to be \(\chi =0.7298\), and the acceleration coefficients \(c_{1}=c_{2}=2.05\), as recommended by Clerc and Kennedy (2002). Eberhart and Shi also used these values of the parameters when comparing the performance of the PSO-Co with that of the PSO-In (Eberhart and Shi 2000). For the SPSO, the ring topology was used and other parameters were set as those in the PSO-Co (Bratton and Kennedy 2007). Parameter configurations for the GBBPSO, FIPS, DMS-PSO and CLPSO were the same as those in Kennedy (2003), Mendes et al. (2004), Liang and Suganthan (2005), Liang et al. (2006), respectively. The justification for using the recommended parameter settings for these PSO variants is that in their related papers, the parameter configurations for these algorithms were tested on different benchmark functions, including those three functions used in our experiments for the RDPSO. The performance of these parameter settings were satisfactory and, thus, were recommended by the authors.

Tables 1 and 2 record the mean and the standard deviation of the best fitness values out of 100 runs of each algorithm on each benchmark function. The bold in the tables corresponds to the smallest ones among the mean best fitness values and the standard deviations obtained for each function by all the compared algorithms. To investigate whether the differences in the mean best fitness values among the algorithms were significant, a statistical multiple comparison procedure was implemented to determine the algorithmic performance ranking for each problem in a statistical manner. The procedure employed in this work is known as the “stepdown” procedure (Day and Quinn 1989). The algorithms that were not statistically different to each other were given the same rank; those that were not statistically different to more than one other groups of algorithms were ranked with the best-performing of these groups. For each algorithm, the resulting rank for each problem and the average rank across all the tested fourteen benchmark problems are shown in Table 3.

Table 1 Mean and standard deviation of the best fitness values after 100 runs of each algorithm for \(F_{1}\) to \(F_{7}\)
Table 2 Mean and standard deviation of the best fitness values after 100 runs of each algorithm for \(F_{8}\) to \(F_{14}\)
Table 3 Ranking by algorithms and problems obtained from “Stepdown” multiple comparisons

For the Shifted Sphere Function \((F_{1})\), the RDPSO-Lbest-RP generated better results than the other methods. The results for the Shifted Schwefel’s Problem 1.2 \((F_{2})\) show that the PSO-Co and the GBBPSO performed better than the others, but the performance of the CLPSO seems to be inferior to those of other competitors due to its slow convergence speed. For the Shifted Rotated High Conditioned Elliptic Function \((F_{3})\), the RDPSO-Gbest-RP outperformed the other methods in a statistical significance manner. The SPSO was the second best performing method for this function. The RDPSO-Gbest-RP showed to be the winner among all the tested algorithms for the Shifted Schwefel’s Problem 1.2 with Noise in Fitness \((F_{4})\), and the RDPSO-Gbest was the second best performing for this problem. \(F_{5}\) is the Schwefel’s Problem 2.6 with Global Optimum on the Bounds. For this benchmark, the RDPSO-Gbest-RP occupied the first place from the perspective of the statistical test. For benchmark \(F_{6}\), the Shifted Rosenbrock Function, both the RDPSOs with the ring topology outperformed the other algorithms. The results for the Shifted Rotated Griewank’s Function without Bounds \((F_{7})\) suggest that both the RDPSOs with local best model and the SPSO were able to find the solution to the function with better quality compared to the other methods. Benchmark \(F_{8}\) is the Shifted Rotated Ackley’s Function with Global Optimum on the Bounds. The SPSO and the PSO-In-Lbest yielded better results for this problem than the others. The Shifted Rastrigin’s Function \((F_{9})\) is a separable function, which the CLPSO algorithm was good at solving and obtained remarkably better results for. It can also be observed that the RPDOS-Gbest yielded a better result than the remainders. \(F_{10}\) is the Shifted Rotated Rastrigrin’s Function, which appears to be a more difficult problem than \(F_{9}\). For this benchmark, both the RDPSO-Lbest and RDPSO-Lbest-RP outperformed the other competitors in a statistically significant manner. The best result for the Shifted Rotated Weierstrass Function \((F_{11})\) was obtained by the RDPSO-Gbest-RP. The RDPSO-Gbest yielded the second best result which shows no statistical significance with that of the RDPSO-Gbest-RP. When searching the optima of Schewefel’s Problem 2.13 \((F_{12})\), the RDPSO-Gbest-RP was found to rank first in algorithmic performance from a statistical point of view.

\(F_{13}\) is the Shifted Expand Griewank’s plus Rosenbrock’s Function, for which the RDPSO-Lbest-RP, RDPSO-Lbest, and RDPSO-Gbest yielded better results than their competitors. There are no statistically significant differences in algorithmic performance between the three RDPSO variants. For the Shifted Rotated Expanded Scaffer’s F6 Function \((F_{14})\), all the RDPSO variants showed better performance than the others in a statistically significant manner.

The average ranks listed in Table 3 reveal that the RDPSO-Gbest-RP had the best overall performance for the fourteen benchmark functions among all the tested algorithms. Across the whole suite of benchmark functions, it had fairly stable performance with the worst rank being 6 for \(F_{9}\). The second best-performing was the RPDSO-Lbest. For seven of the benchmark functions, the algorithm had the first performance ranks. However, its result for \(F_{2}\) is unsatisfactory due to its slow convergence speed. The RDPSO-Gbest had the third best overall performance. Compared to the RDPSO-Gbest-RP, the RDPSO-Gbest performed somewhat unstable, with the resulting ranks for \(F_{1}\) being only 8. The fourth best performing was the RDPSO-Lbest-RP, which did not show satisfactory performance on \(F_{2}\) and \(F_{4}\). Nevertheless, it had a significant advantage over the SPSO, the next best performing one. Between random velocity components determined by the mbest position and the random selected pbest position, the two versions of the RDPSO with the mbest position obtained the total average rank of 2.79, while the two with the randomly selected pbest position had the total average rank of 2.90. This means that there is no remarkable performance difference for the tested functions between the two different methods for determining random velocity components. What can be found from the total average ranks is that the RDPSO algorithms were able to perform better by using the global best model (with the total average rank of 2.53) than the local best model (with the total average ranks of 3.00) for the first fourteen CEC2005 benchmark functions. In addition, the total average rank over all the versions of the RDPSO is 2.84, which implies that the RDPSOs with the linearly varying \(\alpha \) and fixed \(\beta \) had a satisfactory overall performance. Therefore, it is recommended that the linearly varying \(\alpha \) method with fixed \(\beta \) should be employed when the RDPSO is used for real applications with the values of the parameters tuned finely around the values used in the experiments in this work. More specifically, for the RDPSO-Gbest, RDPSO-Lbest and RDPSO-Lbest-RP, the initial and final values of \(\alpha \) can be selected from the intervals [0.8, 1.0] and [0.2, 0.4], respectively, depending on the problem to be solved. For the RDPSO-Gbest-RP, the initial and final values of \(\alpha \) can be selected from the intervals [0.5, 0.7] and [0.1, 0.3], respectively. The drift coefficient \(\beta \) can be valued on the interval [1.45, 1.5] for all he RDPSO variants.

Except the RDPSO algorithms, the best-performing algorithm was the SPSO, i.e. the PSO-Co-Lbdest, which yielded the best results for \(F_{7}\) and \(F_{8}\). The next best algorithm was DMS-PSO, obtaining the first performance rank for \(F_{3}\) and the worst rank for \(F_{6}\). The GBBPSO was the next best-performing method. This is an important probabilistic PSO variant and had good performance for unimodal functions. The FIPS, which also employs the ring topology, ranked the first when optimizing \(F_{2}\). From the total average ranks in Table 3, it is conclusive that incorporating the ring topology into the PSO-In and the PSO-Co could enhance the overall performance of the two PSO variants on the tested benchmark functions. What should be noticed is that the CLPSO is very effective in solving separable functions such as \(F_{9}\), but not in the rotated functions and unimodal ones due to its slower convergence speed, as has been indicated in the related publication (Liang et al. 2006).

6 Conclusion

In this paper, based on our preliminary previous work, we made a comprehensive study on the RDPSO algorithm, by analyzing the particle behavior and the search mechanism of the algorithm and empirically investigating the four newly proposed variants of the RDPSO algorithm.

A comprehensive analysis of the RDPSO algorithm and its variants was made in order to have a better understanding of the mechanism behind the algorithm. Firstly, the stochastic dynamical behavior of a single particle in the RDPSO was analyzed theoretically. We derived the sufficient and necessary condition as well as a sufficient condition for the particle’s current position to be probabilistically bounded. Secondly, the search behavior of the RDPSO algorithm itself was investigated by analyzing the interaction between the particles, and it was found that the RDPSO may have a good balance between the global and the local search, due to the designed random component of the particle’s velocity. In addition, four variants of the RDPSO algorithm were proposed by combining different random velocity components with different neighborhood topologies.

Empirical studies on the RDPSO algorithm were carried out on the first fourteen benchmark functions of the well-known CEC2005 benchmark suite. Two methods of controlling the algorithmic parameters were employed, and each RDPSO variant, with each control method, was first tested on three of the benchmark functions in order to identify the parameter values that can generate satisfactory algorithmic performance. Then, the RDPSO variants with linearly decreasing thermal coefficients and fixed drift coefficients, which were identified to have stable algorithmic performance, were further compared with other forms of PSO on the fourteen functions. The experimental results show that the RDPSO algorithm is comparable with, or even better, than the other compared PSO variants in finding the optimal solutions of the tested benchmark functions.