1. Introduction
Recently, media platforms such as YouTube, Spotify, Netflix, and Twitch have become increasingly popular, especially during the COVID-19 lockdown periods. These platforms typically provide recommendation lists to their users on mobile devices, tablets, or television screens, based on their item preferences. These recommendations, presented in horizontal or vertical lists on the main screens of many media platforms, are usually based on the user's past likes, trending items, or related demographic information. With the development of, and competition among, recommender system technologies, users now expect personalized or session-based recommendations on these platforms [1]. However, generating online recommendations in live systems is challenging due to the absence of initial or complete data: such recommendations must be produced from ongoing, noisy data streams rather than from data gathered in advance. Although recommender systems have widely used traditional deterministic models such as Collaborative Filtering (CF) and Content-Based Filtering (CBF) to address this problem, these models tend to offer the same recommendations to all users and require continuous updating and diversification of the home-screen recommendations as users' tastes change [2]. To address these limitations, researchers have recently turned to heuristic and deep learning methods to offer continuous and variable recommendations [3]. The vital processes of a recommender system are to increase the connected nodes of the user–item graph and to produce more accurate predictions between users and new items. While doing this, the system must find user-specific relations of high quality. One of the challenges to ensuring quality is the presence of cold-start users. While most recommender systems address the cold-start problem in offline settings, it is crucial to consider users' evolving preferences within the system itself, because all users can be considered cold start due to their ever-changing and unpredictable tastes. However, recommender systems mainly offer recommendation sets based on each user's past clicks, which can turn out to be similar, uncompelling, and of poor quality [2,4,5]. This challenge drives us to tackle another issue related to cold-start users: the over-specialization problem, where recommendations become too narrowly focused, limiting diversity and the discovery of unexplored content.
In this paper, we deal with the problems of recommendations for cold-start users, personalized recommendations, over-specialization, and reducing time complexity in recommender systems. AcoRec, the method proposed in this paper, is a promising alternative that can provide diverse recommendations for the issues mentioned above. We introduce the AcoRec framework, developed using the Continuous Ant Colony Optimization method ACOℝ described in [6], to enrich the variety of user–item relationships and diversify recommendations for users in the system. Based on ACOℝ, AcoRec employs various item-similarity or proximity models as input to generate user-specific, probabilistic, and highly diverse recommendations based on users' past clicks. As a meta-heuristic and hybrid framework, AcoRec seeks diverse recommendations, addressing the challenges of relevant recommendations for cold-start users and long-tail items. The primary approach of this study is to generate an initial prediction from the user's click vector using the selected item-similarity model. These initial predictions are then updated based on the user's clicks and a scale vector obtained by adjusting the diagonal elements of the selected item-similarity matrix to modulate the influence over the matrix. AcoRec treats the initial predictions as the preliminary pheromone values τ and the item-similarity model as the heuristic information η. Through this process, we establish new item connections for cold-start users based on their recent preferences within the context of the selected similarity model; the initial pheromone values encode the user's recent preferences for items within the scope of that model. AcoRec optimizes the likelihood of user–item interactions within the system to infer how the similarity model responds to user knowledge. It achieves this by maximizing the importance of items for the specific user through hyperparameter tuning in the continuous domain, which allows more precise optimization of the model's parameters. Unlike deterministic approaches, ACOℝ incorporates probabilistic elements and a degree of randomness to address the challenges mentioned above. Although Ant Colony Optimization (and, by extension, ACOℝ) has been used for decades, we revisit it with the help of advanced coding libraries, GPU capabilities, and techniques that have gained prominence with the rise of deep learning, reassessing its potential by leveraging modern computational advancements. However, we avoided employing deep learning models in this study to prevent potential complications in the backpropagation process that could arise from introducing randomness into the weights. Consequently, we identified ACOℝ as an effective solution for the specific needs of this study; other optimization methods could be explored in future work. During training, AcoRec identifies the valuable items for the relevant user based on the expected probabilities of those items. Our model then generates a top-N recommendation list that ranks the estimated probabilities of items for the user. These predictions can vary across sessions, which is the core concept of our novel model. AcoRec enables parallelization across multiple processors through a row-based, per-user recommendation structure. This further reduces estimation time, making it feasible to handle large item counts and user dimensions in recommendation systems, as detailed in Section 3.3 (see Algorithm 1).
Algorithm 1 AcoRec
Inputs: item-similarity model S ∈ ℝm×m; click vector of user ru ∈ ℝ1×m; Sc, the Frobenius norms of the columns of S; μ ← 0; σ ← 1; ant_size ← ant size; archive_size ← archive size; T ← epoch count; NDCG@100 as the weight template
Output: Predictionsu ← predictions of user u for the items

compute τ(u) according to Equation (8)
construct SolutionArchive(1...archive_size) ← {}
for epoch ← 1 to T do
    for each ant k ← 1 to ant_size do
        // sample the β variable from a Gaussian distribution with mean μ and deviation σ
        β* ← N(μ, σ)
        // estimate the probabilities for the ant, according to Equation (10)
        pk ← τ(u) × S × Sc^β*
        // fitness value for each ant
        fitness ← likelihood(r(u), pk) · NDCG@100(u, pk)
        SolutionArchive.insert(β*, fitness)
    end for
    // sort the solutions and keep only the best ones
    sort(SolutionArchive, by fitness, descending)
    trim(SolutionArchive, archive_size)
    update μ and σ from the SolutionArchive via Adam
end for
β ← μ
return τ(u) × S × Sc^β
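The loop above can be sketched in NumPy as a minimal illustration, not the authors' implementation: the Bernoulli log-likelihood stands in for the combined likelihood × NDCG@100 fitness, and a simple moment update replaces Adam.

```python
import numpy as np

def acorec_user(tau_u, S, Sc, r_u, ant_size=20, archive_size=10,
                epochs=30, lr=0.1, seed=0):
    """Sketch of Algorithm 1: tune a per-user scaling exponent beta by
    sampling candidates from a Gaussian and keeping the best in an archive."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0
    tauS = tau_u @ S                      # precomputed once per user
    archive = []                          # rows of (beta, fitness)
    for _ in range(epochs):
        for _ in range(ant_size):
            beta = rng.normal(mu, sigma)  # one ant = one sampled beta
            p = tauS * Sc ** beta         # Equation (10) with alpha = 1
            # stand-in fitness: Bernoulli log-likelihood of the clicks
            q = 1.0 / (1.0 + np.exp(-p))
            fitness = np.sum(r_u * np.log(q + 1e-12)
                             + (1 - r_u) * np.log(1 - q + 1e-12))
            archive.append((beta, fitness))
        archive.sort(key=lambda row: row[1], reverse=True)
        archive = archive[:archive_size]  # trim to the best solutions
        betas = np.array([b for b, _ in archive])
        mu += lr * (betas.mean() - mu)    # moment update in place of Adam
        sigma = max(betas.std(), 1e-3)
    return tauS * Sc ** mu                # predictions with the tuned beta
```

Each user is processed independently, which is what makes the row-based parallelization described above possible.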
In various scenarios, we evaluated our models on popular datasets such as MovieLens, Pinterest, and Netflix. We utilized state-of-the-art item-based similarity models (Gram, Cosine, and Jaccard) as inputs and initially compared our model with these simple baseline estimators. While our model offers recommendations that change across sessions, we aim to keep these items relevant to, and satisfying for, the user's preferences. We also observed an increase in the diversity of recommended items.
The rest of the paper is organized as follows. In Section 2, we review related work that employs similar approaches in the literature. Section 3 explains ACOℝ and our proposed method. In Section 4, the datasets, metrics, and methods used to evaluate our model are described. In Section 5, we compare our proposed method with state-of-the-art methods and present evaluation results. Section 6 discusses the results. Finally, Section 7 concludes this paper.
2. Related Work
To the best of our knowledge, no study in the existing literature employs ACOℝ in recommender systems to provide recommendations for cold-start users or long-tail items. Most applications of Ant Colony Optimization (ACO) in recommender systems use the discrete version of ACO to solve combinatorial problems such as item ranking, user clustering, and collaborative filtering. These studies have employed ACO as a core implementation to address such problems. For example, Sobecki et al. used actual data to recommend student courses based on ACO [7]. In addition, T-BAR, considered one of the more efficient probabilistic models, is also implemented using ACO [8]. Although T-BAR is effective in offering diverse user predictions, the problem of providing effective predictions to cold-start users persisted, so the authors proposed an updated DT-BAR (Dynamic T-BAR) to overcome it [9]. In another study, Massa proposed MoleTrust, a basic collaborative filtering model that incorporates Pearson similarity and trust in recommender systems [10]. Bedi and Sharma introduced the Trust-based Ant Recommender System (TARS), which produces recommendations by combining user trust assumptions with similarity based on Ant Colony Optimization (ACO); during training, TARS establishes new user relationships and generates predictions using updated, trusted users [11]. In contrast, the Semantic-enhanced Trust-based Ant Recommender System (STARS) is a more advanced model that addresses some of TARS's limitations, enhancing the original approach by incorporating semantic user similarity and clustering [12]. TCFACO investigated user trust statements and developed an ACO-based collaborative filtering method aimed at predicting user effectiveness [13]. In a different approach, Tengkiattrakul et al. combined SVD-based user factors with trustworthiness to enhance user similarity in ACO-based recommendations [14,15]. While TCFACO focuses on leveraging user trust for effectiveness predictions, Tengkiattrakul et al.'s work integrates matrix factorization techniques with trust metrics to improve similarity measures in the ACO framework. Bellaachia et al. introduced ALT-BAR, an averaged localized trust-based ant recommender system specifically designed to tackle the cold-start problem in recommendations [16]. Expanding on the TARS framework, Kaleroun et al. refined the model by integrating item deviation distance into the prediction formula; their enhanced model was tested against several challenges, including shilling attacks, cold-start users, sparse matrices, and grey-sheep users [17]. In contrast, Liao et al. focused on improving ranking accuracy through a different mechanism: they computed user and item pheromones separately and then combined them in the rating prediction process, highlighting the role of pheromone dynamics in ranking [18,19]. This approach diverges from trust-based models by emphasizing pheromone-based ranking strategies. Meanwhile, Nadi et al. explored a fuzzy-based ant colony system for website recommendations; their model utilized Jaccard-based user similarity and applied fuzzification to the user–item interaction matrix, presenting an alternative way of integrating user similarity and interaction into the recommendation process [20].
The typical approach in these ACO-based recommendation system studies is as follows:
1. Computing user similarities using metrics such as Cosine, Jaccard, Pearson, and trust measures.
2. Modeling users as nodes and selecting similar users through Ant Colony Optimization steps.
3. Predicting new recommendations from similar neighbors (users) with Resnick's prediction formula [21].
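Resnick's prediction formula used in the last step can be sketched as follows; this is a toy illustration, and the flat `neighbors` structure is a hypothetical simplification.

```python
def resnick_predict(user_mean, neighbors):
    """Resnick's formula: the user's mean rating plus the similarity-weighted
    deviation of each neighbor's rating from that neighbor's own mean.
    neighbors: list of (similarity, neighbor_rating, neighbor_mean)."""
    num = sum(sim * (r - mean) for sim, r, mean in neighbors)
    den = sum(abs(sim) for sim, _, _ in neighbors)
    return user_mean if den == 0 else user_mean + num / den
```

With no neighbors the prediction falls back to the user's own mean rating, which mirrors the cold-start weakness these ACO variants try to address.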
Conventional ACO applications for recommendation systems usually involve user-based computations, as outlined above. Given that the number of users typically exceeds the number of items, this leads to significant computational challenges: when a new user is added to the system, similarities with all other users must be recalculated. In contrast, our approach relies on lower-dimensional item-similarity matrices rather than user similarities. Additionally, the optimization process for the ants in our method requires minimal traversal paths rather than extensive graph-based exploration. Although, when modeling our work according to the ACO algorithm, the nodes in the graph structure represent the items and the edge values reflect the probability of the user's interest in a neighboring item, we opted to use ACOℝ for the system's parameter optimization. This choice was due to the inherent limitations of traditional ACO algorithms, such as their discrete nature and potential for premature convergence. ACOℝ, a more advanced variant, allows for continuous optimization of parameters, thus providing a more flexible and robust approach to fine-tuning the system's performance. The specific details and advantages of using ACOℝ for parameter optimization are discussed in the next section.
3. Proposed Method
Deterministic recommendation models are robust algorithms despite their simple structures; for instance, neighborhood or regression models can outperform many more complex models [22,23]. In deterministic recommendation models, users are given a set of recommendations {S} at time t1, and this set {S} remains the same as long as the model does not change between times t1 and t2. We might nevertheless accept these results as adequate or sufficient, based on the evaluation metrics [2]. Many researchers obtain evaluation results for algorithms by averaging the results of multiple experiments. Yet these results can vary depending on the choice of dataset, sampling methods, metrics, and hyperparameter evaluations [23,24].
In heuristic-based systems, outcomes can vary without updating the parameters or data, due to the randomness at their core, which can be an attractive feature for users. However, a challenge in providing diverse recommendation lists for a current user is that randomly recommended items may not match the user's taste. The recommendations that can be given to a user are boundless, but we have inferential approximations, such as a top-N recommendation list. These lists can be updated over time, but inadequate feedback may prevent them from changing. When the number of items m is significantly larger than the number of items to recommend n, the number of possible recommendation sets is C(m, n) = m! / (n!(m − n)!). Exhaustively evaluating all possible item sets is computationally intractable. Therefore, generating a top-N recommendation list can be considered a combinatorial optimization problem, and heuristic methods such as Ant Colony Optimization can be an effective solution.
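A quick check of this combinatorial count shows why exhaustive evaluation is hopeless; the catalogue size below is an arbitrary example.

```python
import math

# number of distinct top-10 sets from a 10,000-item catalogue: C(m, n)
m, n = 10_000, 10
candidates = math.comb(m, n)   # exact binomial coefficient m! / (n!(m-n)!)
print(candidates)              # an astronomically large count
```

Even for this modest catalogue the count exceeds 10^30, so any practical method must sample the space heuristically rather than enumerate it.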
3.1. Ant Colony Optimization
Ant Colony Optimization models are derived from the behavior of real ants to solve many optimization problems. Ants can discover the shortest path from a food source to the nest. While traveling, each ant deposits a chemical hormone called pheromone on the ground and is guided by the pheromones deposited by other ants. This makes ACO a suitable model for mimicking the behavior of users in recommendation systems, where nodes represent items and a set of nodes visited by ants can be recommended to the users. Initially, ants are randomly distributed over the nodes of the graph. An ant k at time t, being at node i, chooses the next node j with a probability given by the random proportional rule defined in Equation (1):

p_ij^k(t) = ([τ_ij(t)]^α [η_ij]^β) / (Σ_l∈u [τ_il(t)]^α [η_il]^β), j ∈ u, (1)

where u is the set of nodes in the neighborhood of i, τ is the pheromone value of the edge, and η is the desirability of the edge. After evaluating all the ants' tour costs in the current iteration, the pheromone values of each edge (i, j) are updated: pheromone evaporation is applied, and better solutions receive a higher amount of pheromone deposited by the ants.
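The random proportional rule of Equation (1) can be sketched as follows; a minimal illustration in which the `tau` and `eta` edge dictionaries are hypothetical.

```python
import random

def next_node(current, neighbors, tau, eta, alpha=1.0, beta=2.0, rng=random):
    """ACO random proportional rule: pick the next node j with probability
    proportional to tau[(i, j)]**alpha * eta[(i, j)]**beta."""
    weights = [tau[(current, j)] ** alpha * eta[(current, j)] ** beta
               for j in neighbors]
    return rng.choices(neighbors, weights=weights, k=1)[0]
```

An edge with zero pheromone and zero desirability is never chosen, while strongly marked edges dominate the draw, which is how the colony converges on good paths.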
3.2. Ant Colony Optimization in the Continuous Domain
Combinatorial optimization, as in classic ACO, deals with finding optimal combinations or permutations of the available problem components, as in the Travelling Salesman Problem (TSP). However, some problems cannot always be conveniently tackled with combinatorial optimization, especially when the bounds are wide and the sensitivity of the parameters is high. In such cases, algorithms that optimize continuous variables yield better results. Blum [25] attempted to extend ACO algorithms to tackle both discrete- and continuous-optimization problems. Two approaches have been presented for integrating ACO into the continuous domain. The first uses a familiar model of ant behavior, and the second carries the fundamental ACO graph structure into the continuous domain; this evolution can be seamless given proper discretization or probabilistic search-space sampling [26]. Following the second approach, Socha and Dorigo introduced the continuous Ant Colony Optimization algorithm ACOℝ [27], which uses a Gaussian kernel probability density function (pdf) as the distribution model and presents ACOℝ as a meta-heuristic framework. In ACOℝ, given a problem with n decision variables, a vector xj = {xj,1, xj,2, xj,3, ..., xj,n} sampled from a probability density function represents a solution constructed by an ant j, and f(xj) represents the objective function value of that solution. In ACOℝ, each ant corresponds to a row of the Solution Archive. During the iterations, the candidate solutions in the Solution Archive are ordered according to their objective function values. Each solution has an associated weight, ωj, which reflects the proportion of its quality relative to the whole archive. The weight of the jth solution is defined in Equation (2):

ωj = G(j) = (1 / (σ√(2π))) exp(−(j − μ)² / (2σ²)), with μ = 1 and σ = qk, (2)

where G(j) is the value of the Gaussian function with argument j, μ is the distribution mean, σ is the standard deviation, and q is the parameter controlling the deviation distance of the algorithm. When q is small, the high-fit solutions are promoted, and the selection spreads out as q increases. Staying with the original ACO's pheromone model, the algorithm updates the μ and σ values after each iteration to optimize the probability distribution. Once the initial Solution Archive is constructed, each ant selects a distribution from the Solution Archive using a fitness-proportionate selection function, such as the roulette-wheel selection algorithm; the selection probability of each row is obtained by dividing its weight by the sum of all weights, as in Equation (3):

p(j) = ωj / Σ_{r=1..k} ωr, (3)

where p(j) is the probability of selecting the jth row of the Solution Archive. The quality of each new solution is calculated with the objective function and merged into the Solution Archive. After sorting, the first k best solutions are kept, and the others are discarded for forthcoming iterations. For example, for a maximization problem, the Solution Archive constructed by k ants is ordered in descending order, where f(x1) ≥ f(x2) ≥ ⋯ ≥ f(xk) and ω1 ≥ ω2 ≥ ⋯ ≥ ωk. The sample Solution Archive structure is given in Figure 1.
In the search process, iterations aim to find the best solution and converge the model. After each iteration, a pheromone-update step (as in ACO) is performed by adding the k newly generated solutions to the Solution Archive. The combined solutions are then sorted and the worst k are eliminated, so the total number of solutions in the archive remains constant. This maintains the better solutions in the Solution Archive and guides the ants toward higher-quality regions of the search space.
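The archive weighting and roulette-wheel selection described above can be sketched as follows; a minimal illustration in which `q = 0.2` is an arbitrary choice.

```python
import math
import random

def archive_weights(k, q=0.2):
    """Gaussian-kernel weights over archive ranks 1..k: rank-1 solutions get
    the largest weight; a small q concentrates choice on the best ranks."""
    return [math.exp(-((j - 1) ** 2) / (2 * (q * k) ** 2))
            / (q * k * math.sqrt(2 * math.pi)) for j in range(1, k + 1)]

def select_rank(weights, rng=random):
    """Roulette-wheel selection over archive rows: probability of row j is
    its weight divided by the sum of all weights."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(weights)), weights=probs, k=1)[0]
```

The weights decrease monotonically with rank, so better-ranked solutions are sampled more often without ever excluding the rest of the archive.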
In this paper, we investigated the issues associated with recommender systems (RSs) noted in Section 1 and utilized ACOℝ to overcome these challenges. Additionally, we introduced novel enhancements to this method to address the challenges posed by RS problems, as detailed in the following section.
3.3. Stochastic Approach of AcoRec
This paper introduces AcoRec, a novel method that leverages Bayesian inference and users' past click history to predict their interest in items. The approach uses a vector pheromone model and adjusts user-specific hyper-parameters to optimize the expected outcomes, allowing seamless adaptation to session-based or real-time systems tailored to individual users. In AcoRec, the probabilistic transition rule for ant k, which mimics user u at time t, is given in Equation (4), where τ(u)t represents the pheromone values for user u on the items at time t, the selected input model provides the heuristic information, and α and β are the pheromone regularization and heuristic model adjustment parameters, respectively. Notably, normalization is not applied in the denominator. These parameters maximize the posterior information of the items for users, similar to the prediction process of item-based models. In item-based models, user scores for items are predicted using the base equation in (5):

r̂u = ru × S, (5)
where S is an m × m item-similarity matrix and ru is an item vector of size m, written as ru = [ru1, ..., rum], in which rui equals 1 if user u clicked item i and 0 otherwise, as given in Equation (6). If we regard the items that users clicked before as pheromone-traced items, AcoRec uses the ru vector as the pheromone vector and S as the heuristic information between the items for further optimization.
In this context, we estimate posterior probabilities by selecting from the item–item-similarity model the rows corresponding to items previously clicked by the user (assuming a symmetric matrix structure, where column values mirror row values, as in Hermitian matrices). These selected rows are assembled into a low-rank sub-matrix, from whose columns an Lp-norm vector is derived. The norms of the user-clicked items represent the user's actions as a pheromone vector (prior probabilities), analogous to social-network behavior. This serves as an initial pheromone interpolation, aligning with the foundational principles of ACO.
Let xu = [xu1, ..., xuq] be the subset vector of ru containing all items clicked by user u, where q is the number of clicked items. The Lp-norm of these clicked items is computed column-wise, as in Equation (7):

Lp(u)i = (Σ_{j∈xu} |Su[j, i]|^p)^(1/p), (7)
where Su is the subset matrix of S that keeps only the rows of the clicked items xu of user u, S is the item-similarity model, and i is the column index in the item-similarity model. In Equation (7), p = 1 gives the L1-norm, and p = 2 gives the L2-norm, also known as the Euclidean norm. We analyzed how the similarity model responds to user knowledge by examining the probability of user–item interactions within the system, and we established a relationship between general Lp-norm vectors and user clicks. Equation (8) differentiates between positive (clicked) and negative (not clicked) interactions for a user. The goal is to predict the likelihood of a user clicking on an item and to evaluate this prediction against the actual interaction. We used the Bernoulli transformation with Binary Cross-Entropy (BCE) for this conversion. The formula for τ(u) is given in Equation (8), where τ(u)t=0 represents the pheromone value of the items for user u at time t = 0, and Lp(u) represents the likelihood of user u interacting with (clicking) an item at time t. Based on the dataset characteristics, an appropriate squashing transformation, either tanh or sigmoid, must be chosen for Lp(u). Since our datasets are binary, we used these transformations to ensure that Lp(u) lies within the [0, 1] range, suitable for representing probabilities.
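The pheromone initialization described above can be sketched as follows; a simplified NumPy illustration assuming the sigmoid variant and p = 2, not the authors' exact Equation (8) with its full BCE term.

```python
import numpy as np

def init_pheromone(S, clicked, p=2):
    """Sketch of the tau(u) initialization: take the rows of S for the
    user's clicked items, compute the column-wise Lp-norm of that
    sub-matrix, and squash it to [0, 1] with a sigmoid so each entry
    reads as a probability."""
    Su = S[clicked, :]                        # rows of clicked items only
    lp = np.sum(np.abs(Su) ** p, axis=0) ** (1.0 / p)
    return 1.0 / (1.0 + np.exp(-lp))          # initial tau(u) in (0, 1)
```

Items strongly connected to the user's clicks receive higher initial pheromone, so the subsequent search starts from the user's recent preferences rather than from a uniform prior.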
In ACO, the pheromone model is pivotal in introducing randomness into the search space [27]. The initial estimation of pheromone values for user-clicked items is conducted through Equations (4), (7), and (8). Specifically, Equation (4) governs the tendency of pheromone values in ACO models via the α parameter, while the β parameter, which controls the model's heuristic knowledge, is deemed crucial for its performance [28]. After initializing the pheromone values with the user's prior clicks, we set α = 1 to fix the pheromone bias in our model, thereby focusing solely on adjusting the β parameter. Our observations indicate that controlling the heuristic knowledge with β enables us either to enhance the pheromone effect while mitigating bias or to diminish the pheromone effect while reinforcing the tendency towards heuristic knowledge. The parametric scaling of heuristic knowledge, which gauges the resistance of Euclidean-norm data to popularity, has been integrated into numerous models with favorable outcomes [29,30,31]. This scaling ensures a coherent development path and facilitates seamless integration with the user's selected item model. The scaling applied to the heuristic matrix S is defined by Equation (9). This approach aligns with the principles of the ACO model and provides a clear and consistent development trajectory.
where β is the scaling parameter and Sc denotes the Frobenius norms (L2-norms) of the columns of S. The β parameter adjusts the impact of high-norm values for both popular and long-tail items, dynamically adapting to the specific scenario to either emphasize or reduce their influence. Re-inserting the pheromone values from Equation (8) and the scaled heuristic model from Equation (9) into Equation (4) yields the following formula, denoted as Equation (10):

p(u) = τ(u)^α × S × Sc^β. (10)
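This column-norm scaling can be sketched as follows; a minimal illustration assuming the Sc^β factor multiplies the similarity matrix directly, as in the τ(u) × S × Sc^β term of Algorithm 1.

```python
import numpy as np

def scale_similarity(S, beta):
    """Scale each column of S by its L2 column norm raised to beta.
    Negative beta damps high-norm (popular) columns and promotes low-norm
    (long-tail) items; positive beta does the opposite."""
    Sc = np.linalg.norm(S, axis=0)            # per-column L2 norms
    return S * Sc ** beta
```

With beta = −1 every column is normalized to unit norm, which is the extreme case where popularity carries no weight at all; the optimizer searches the continuum between such extremes per user.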
Our scaling method differs significantly from others in an important aspect. While the referenced models [29,30,31] apply this scaling with a uniform parameter across all users, selected by grid search, our algorithm performs scaling-parameter tuning as an internal hyper-parameter optimization. This means that the scaling parameter is computed separately for each user, enabling personalized recommendations tailored to individual preferences. In Equation (10), by fixing the α value to 1, the term τ(u) × S becomes constant and pre-computable for all users. As a result, only the remaining Sc^β term depends on β, which significantly reduces computational cost: instead of performing an m × m matrix multiplication, this approach achieves O(m) complexity through an m × 1 vector multiplication, making it highly efficient in terms of the computational burden. The optimized β parameter reflects the user's preferences and behavior [31]. It is highly user-specific and can vary significantly among users based on their unique tastes. When β takes a negative value, it can highlight rare items for the user, offering a personalized touch to the recommendations. The optimal β can vary for each user, and a user may have multiple candidate values that maximize the posterior; these candidates may form either a tight distribution or a wide range. However, discrete probabilities may not provide certainty when searching for a hyper-parameter. Consequently, the optimization challenges of continuous domains have spurred new directions in Ant Colony Optimization research. Instead of relying on a discrete probability distribution, a pdf is employed to sample the hyper-parameter in the continuous domain. Conceptually, a node in a conventional ACO problem can be likened to a local parameter of a Gaussian distribution.
In our model, each ant samples candidate β parameters from a Gaussian distribution G(x) = N(μ, σ), with μ = 0 and σ = 1 initially. Points close to each other in the continuous domain produce similar results, facilitating stochastic exploration of the optimal β parameter. We discuss finding the maximizing β value in the ACOℝ domain and provide details in Algorithm 1.
During each iteration, we estimate each ant's probability values using Equation (10), followed by a non-linear transformation to adjust for varying β values and ensure comparable fitness measurements. Additionally, we found dropout beneficial in this process. To optimize the computation of pk, we precompute the product τ(u) × S before entering the loop, since τ(u) and S remain constant. The model estimates the fitness value of the selected β* parameter from the count of validation items appearing in the current top-N recommendation list.
The fitness process uses a likelihood evaluation function (e.g., Bernoulli, Gaussian) to assess the consistency between each ant's recommended list and the user's preferences; we used the Bernoulli likelihood in our experiments. We incorporated NDCG@100 (the formula for this metric is given in Section 4) as the weights, allowing us to evaluate the fitness value of each ant's list. The process above updates the solution matrix in ACOℝ. The row count of the solution space equals the archive-size parameter in our approach, with each row containing a sampled β value and its fitness value. Our approach diverges from ACOℝ in that the central Solution Archive is initially empty. All solutions are then sorted, and the best archive-size solutions, determined by their fitness values, are kept for the next iteration. At the end of each iteration, μ and σ are optimized by Adam [32] from the Solution Archive. This training shifts the μ of the distribution to concentrate on the best quality and the best β simultaneously. After each iteration, we applied evaporation to the Solution Archive, ensuring the model continuously improved throughout the iterations.
Algorithm 1 presents the methodology for generating user-specific recommendations. This row-based approach significantly improves computational efficiency by processing each user's data independently, allowing parallel execution and reducing overall computation time. Additionally, the input matrices involved in the process are precomputed, which further streamlines the workflow. Since τ(u) × S is precomputed, we can exclude this operation from the computational steps when analyzing the time complexity of Algorithm 1. Initializing an empty Solution Archive has negligible time complexity, O(1). The epoch loop runs T times, so the total complexity is multiplied by T, and the inner operations are repeated ant_size times per epoch. Sampling the β variable from the Gaussian distribution is O(1). Because τ(u) × S is precomputed, estimating pk reduces to a vector multiplication with Sc^β*, which is O(m), where m is the item-vector size. Given that both likelihood(r(u), pk) and NDCG@100(u, pk) are O(m) in the item size, the overall complexity of this step remains O(m). Inserting into an archive of size archive_size is O(1); sorting the archive is O(archive_size · lg(archive_size)) and trimming is O(archive_size). Updating μ and σ with Adam involves simple arithmetic over archive_size items, which is O(archive_size). Since τ(u) × S is precomputed, returning the final predictions is O(m). Assuming m > archive_size, the total time complexity for a single user is O(T · ant_size · m + T · (m + archive_size · lg(archive_size))). This approach also enhances scalability by enabling the distribution of computational tasks across multiple processors, thereby optimizing the performance of large-scale recommendation systems.
3.4. Heuristic Base of AcoRec and Item Model Selection
In ACO-based recommender systems, the distance between nodes is typically determined by the similarity or proximity between users or items, measured with distance metrics in the Euclidean space between nodes. Our model, by contrast, is designed to be low-dimensional and focuses on gauging a user’s interest in items rather than on inter-node distances. The relationship between items can be captured in various forms, such as similarity, proximity, dissimilarity, or correlation, depending on the method. Collaborative Filtering (CF) models consider the collaborative benefits of items, while Content-Based Filtering (CBF) models focus on items’ metadata (e.g., demography, mood, etc.). Graph similarity models are based on the relationships in the user–item network structure. Time-based models track the temporal sequences of item purchases. Latent factor-based models extract hidden components from low-rank computations. Demographic models consider collaborative behaviors within the same geographical areas.
This study evaluated three well-known item-based similarity measures, chosen for their computational simplicity and popularity. Let S be the m × m similarity matrix, let i and j be two items, let Sij denote the similarity between them, and let vi and vj be the corresponding item column vectors.
3.4.1. Gram Matrix (Gram)
The Gram (dot-product) similarity of two items is the inner product of their item vectors, as given by the formula in Equation (11).
3.4.2. Cosine Similarity (Cosine)
The cosine similarity between two items is the cosine of the angle between their rating vectors. It is computed as the inner product of the item vectors divided by the product of their norms, as shown in Equation (12).
3.4.3. Jaccard Similarity (Jaccard)
The Jaccard similarity between two items is defined as the ratio of the number of users that co-rated both items to the number of users that rated at least one of the items i or j, as described in Equation (13).
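Under the standard definitions just described (Equations (11)–(13) themselves are not reproduced here), the three measures can be computed from a user–item rating matrix V whose columns are the item vectors vi:

```python
import numpy as np

def gram(V):
    # Eq. (11): S_ij = <v_i, v_j>, i.e. the Gram matrix V^T V
    return V.T @ V

def cosine(V):
    # Eq. (12): S_ij = <v_i, v_j> / (||v_i|| * ||v_j||)
    norms = np.linalg.norm(V, axis=0)
    norms[norms == 0] = 1.0                 # guard items with no ratings
    return (V.T @ V) / np.outer(norms, norms)

def jaccard(V):
    # Eq. (13): S_ij = |users rating both i and j| / |users rating i or j|
    B = (V > 0).astype(float)               # binarize the ratings
    inter = B.T @ B                         # co-rating counts
    counts = B.sum(axis=0)                  # per-item rater counts
    union = counts[:, None] + counts[None, :] - inter
    union[union == 0] = 1.0                 # guard empty unions
    return inter / union
```

All three are item-by-item (m × m) matrices, so any of them can serve as the precomputed S used by AcoRec.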
5. Results
To assess the performance of our models, we conducted experiments in two scenarios. The first scenario was designed to evaluate the accuracy of our model in providing recommendations to cold-start users, who have few ratings in the system, making it challenging to offer them high-quality recommendations [42,43].
For the first scenario, we selected warm users from the probe set who were also present in the training set as candidate users. We randomly selected 100 users, each with at least one rating in the probe set and at least twenty ratings in the training set. We then transformed these warm users into cold-start users by reducing their rating counts in the training set, following examples from the existing literature. Whereas some studies defined cold-start users by keeping only three items in the training set [43], others used 5% of the user’s ratings [29]; still others experimented with counts ranging from 1 to 20 or used percentage rates [44]. For a more challenging approach, we kept between 5 and 10 random ratings for each selected user in the training set and removed the rest. This process turned the candidate users into cold-start users, each represented by a minimum of 5 and a maximum of 10 random ratings in the training set.
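The downsampling protocol above can be sketched as follows; the `train_ratings` mapping of user to rated item ids is a hypothetical structure chosen for illustration:

```python
import random

def make_cold_start(train_ratings, keep_min=5, keep_max=10, seed=42):
    """Turn warm users into cold-start users by keeping only
    keep_min..keep_max random training ratings per user."""
    rng = random.Random(seed)
    reduced = {}
    for user, items in train_ratings.items():
        # candidates have >= 20 ratings; min() guards smaller inputs
        k = rng.randint(keep_min, min(keep_max, len(items)))
        reduced[user] = rng.sample(items, k)  # keep k ratings, drop the rest
    return reduced
```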
The second scenario, focused on long-tail item recommendations, was designed to test how effectively recommendations accommodate a variety of less-popular items. Popular items are familiar to users and can become monotonous over time [45]. Therefore, recommending less-popular items can be more engaging. Traditional CF methods often concentrate on popular items or users, overshadowing diverse relationships. Since the quality of models depends on the diversity of recommendations they offer, these CF methods may struggle to generate diverse suggestions, especially with inadequate data [46].
To create an experimental environment suitable for the long-tail item scenario, we adopted the method described in [36]. As noted by the authors, the most prevalent 1.7% of items, accounting for 33% of the ratings in the Netflix dataset, were referred to as short-head items, whereas the remaining items were called long-tail items. Following this method, we sorted the items in all datasets by popularity, determined by rating frequency, in descending order. We marked items as short-head from top to bottom until the sum of their frequencies equaled or exceeded 33% of the total ratings, and marked the remaining items as long-tail. We kept long-tail items in the probe set and removed the others. We then created a test set from the probe set by randomly selecting 250 users who had rated at least one long-tail item. This process allowed us to randomly choose users with less-common tastes for each repeated holdout evaluation.
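The short-head/long-tail split described above (sorting by rating frequency and accumulating until 33% of the total ratings) can be sketched as:

```python
def split_long_tail(item_counts, head_share=0.33):
    """Partition items into short-head and long-tail sets, following the
    method of [36]: items are taken in descending order of rating
    frequency until their cumulative frequency reaches head_share of
    all ratings; everything else is long-tail."""
    total = sum(item_counts.values())
    ordered = sorted(item_counts, key=item_counts.get, reverse=True)
    short_head, running = set(), 0
    for item in ordered:
        if running >= head_share * total:   # threshold equaled or exceeded
            break
        short_head.add(item)
        running += item_counts[item]
    long_tail = set(item_counts) - short_head
    return short_head, long_tail
```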
Experimental results for both scenarios, based on the Recall, NDCG, and Coverage metrics, are summarized in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7. The best results for each column are highlighted in bold, while the second-best results are underlined. In the Coverage columns, since the random model performed well, as expected, we highlighted the second-best result in bold and the third-best result with an underline. We used three item-based models as input for AcoRec: Co-occurrence (AcoRecGram), Cosine Similarity (AcoRecCosine), and Jaccard Similarity (AcoRecJaccard).
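For reference, the evaluation measures used throughout these tables can be computed as follows, assuming the common binary-relevance definitions (the paper’s exact variants may differ slightly):

```python
import math

def recall_at_k(ranked, relevant, k):
    # fraction of a user's relevant items recovered in the top-k
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    # binary-relevance NDCG with log2 position discounts
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

def coverage(all_lists, catalogue_size):
    # percentage of the item catalogue that appears in any recommendation list
    seen = set().union(*all_lists) if all_lists else set()
    return 100.0 * len(seen) / catalogue_size
```

Recall and NDCG are averaged over test users, while Coverage is a single figure over all produced lists, which is why it trades off differently against the accuracy metrics.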
5.1. Cold-Start User Scenario
Table 2 presents the results of cold-start experiments conducted on the ML-1M dataset, which is notably less sparse than the other datasets analyzed in our study. All three of our models outperformed their respective base models, with the AcoRecGram model consistently delivering superior results across most metrics. This indicates that the AcoRec algorithm, when integrated with Gram similarity, provides highly effective recommendations. Notably, after AcoRecGram, the AcoRecCosine model emerged as the second-best performer, significantly enhancing the results of the BaseCosine model while also outperforming the other models across all metrics. The AcoRecJaccard model, meanwhile, surpassed its base model by generating more diverse recommendation lists than all other models. Jaccard similarity is particularly effective at identifying less-obvious connections between items, making it a powerful tool for enhancing diversity in recommendation systems. However, it is important to note that while Jaccard’s ability to find diverse connections can improve coverage, it may reduce relevance compared to the other models. The specific improvements of our models over their input models are discussed in detail in the Discussion section.
When comparing AcoRecGram to the closest competing models, including TARS, it demonstrates significant performance differences in several metrics. Specifically, AcoRecGram outperforms RecWalkK by 8.7%, 5.2%, 13.0%, and 5.6% in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, respectively. Compared to RecWalkPR, AcoRecGram shows improvements of 9.8%, 5.2%, 16.0%, and 6.2% across these same metrics. When compared to RP3ß, AcoRecGram offers performance gains of 11.2%, 6.3%, 9.6%, and 3.2%. Against the TARS model, AcoRecGram exhibits a significant performance improvement of 18.6%, 12.5%, 23.0%, and 12.1%. In terms of coverage, AcoRecGram provides 20.8% higher coverage than RecWalkK, 9.4% higher than RecWalkPR, 14.7% higher than RP3ß, and 175.6% higher than TARS. These results indicate that AcoRecGram offers distinct advantages over these models in both relevance and diversity.
In this experiment, RecWalkPR and RecWalkK demonstrate better performance compared to other models, aside from our AcoRecGram and AcoRecCosine models. However, RecWalkPR achieves better coverage than RecWalkK, indicating that while RecWalkK excels in list quality, RecWalkPR is more effective in covering a broader range of items. As sparsity decreases, the performance of state-of-the-art models designed for sparse datasets, such as EASER, declines. Consequently, while both EASER and RP3ß perform less effectively than RecWalk models in NDCG, they demonstrate better performance in Recall, achieving better results in Recall@10 and Recall@20. The UserKNN and TARS base models, while previously effective, lag behind in performance. Despite both models utilizing Pearson correlation for user similarity, TARS, which is based on ACO techniques, did not achieve better results compared to UserKNN. Among the evaluated models, RecWalkPR demonstrated the highest coverage outside of our models, which may be attributed to the influence of the PageRank algorithm utilized in RecWalkPR.
For AcoRecGram in this experiment, tanh was used for the likelihood in Equation (8). Dropout was not applied, the ant size was 200, and the archive size was 50. These parameter settings were essential to achieving the model’s best performance, and the results of the experiment are shown in Table 2.
Table 2. Comparison of Cold-Start User Scenario on ML-1M.
| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage (%) |
|---|---|---|---|---|---|
| Random | 0.0021 | 0.0028 | 0.0020 | 0.0036 | 53.01 |
| Popular | 0.0395 | 0.0570 | 0.0528 | 0.0951 | 0.97 |
| BaseGram | 0.0527 | 0.0714 | 0.0696 | 0.1130 | 2.54 |
| BaseCosine | 0.0653 | 0.0890 | 0.0835 | 0.1401 | 10.22 |
| BaseJaccard | 0.0583 | 0.0809 | 0.0752 | 0.1295 | 15.02 |
| UserKNN | 0.0682 | 0.0923 | 0.0858 | 0.1436 | 8.82 |
| TARS | 0.0662 | 0.0898 | 0.0820 | 0.1384 | 5.99 |
| RecWalkPR | 0.0715 | 0.0960 | 0.0870 | 0.1460 | 15.09 |
| RecWalkK | 0.0722 | 0.0960 | 0.0893 | 0.1469 | 13.67 |
| EASER | 0.0666 | 0.0924 | 0.0878 | 0.1509 | 13.38 |
| RP3ß | 0.0706 | 0.0950 | 0.0921 | 0.1503 | 14.39 |
| AcoRecGram | 0.0785 | 0.1010 | 0.1009 | 0.1551 | 16.51 |
| AcoRecCosine | 0.0731 | 0.0990 | 0.0926 | 0.1548 | 18.81 |
| AcoRecJaccard | 0.0662 | 0.0901 | 0.0861 | 0.1423 | 23.38 |
Table 3 presents the results of cold-start experiments for the Netflix dataset. All three of our models outperformed their respective base models, although AcoRecCosine and AcoRecJaccard did not show significant improvement. The AcoRecGram model maintained its strong performance, surpassing all other methods and achieving superior results across all metrics. In this dataset, the runner-up models varied depending on the metric: RecWalkPR for NDCG@10, RP3ß for NDCG@20 and Recall@20, AcoRecJaccard for Recall@10, and AcoRecCosine for Coverage. This variation reflects the diverse strengths and design focus of each model, as well as their interaction with the specific characteristics of the dataset and metrics. This observation highlights the AcoRecGram model’s ability to adapt effectively to different data and evaluation metrics.
Table 3. Comparison of Cold-Start User Scenario on Netflix.
| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage (%) |
|---|---|---|---|---|---|
| Random | 0.0013 | 0.0015 | 0.0016 | 0.0020 | 33.60 |
| Popular | 0.0010 | 0.0027 | 0.0017 | 0.0060 | 0.46 |
| BaseGram | 0.0631 | 0.0824 | 0.0802 | 0.1262 | 20.17 |
| BaseCosine | 0.0722 | 0.0925 | 0.0923 | 0.1424 | 27.13 |
| BaseJaccard | 0.0720 | 0.0938 | 0.0915 | 0.1443 | 27.15 |
| UserKNN | 0.0705 | 0.0901 | 0.0870 | 0.1340 | 24.77 |
| TARS | 0.0665 | 0.0849 | 0.0788 | 0.1238 | 25.39 |
| RecWalkPR | 0.0760 | 0.0960 | 0.0934 | 0.1429 | 28.12 |
| RecWalkK | 0.0756 | 0.0960 | 0.0925 | 0.1426 | 27.30 |
| EASER | 0.0755 | 0.0954 | 0.0941 | 0.1418 | 27.64 |
| RP3ß | 0.0754 | 0.1001 | 0.0920 | 0.1528 | 25.30 |
| AcoRecGram | 0.0784 | 0.1007 | 0.0981 | 0.1538 | 29.40 |
| AcoRecCosine | 0.0742 | 0.0953 | 0.0926 | 0.1430 | 29.32 |
| AcoRecJaccard | 0.0724 | 0.0950 | 0.0946 | 0.1443 | 28.66 |
When comparing AcoRecGram to other models, including TARS, significant performance differences are observed across several metrics. Specifically, AcoRecGram outperforms RecWalkK by 3.7%, 4.9%, 6.1%, and 4.9% in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, respectively. Compared to RecWalkPR, AcoRecGram shows improvements of 3.2%, 4.9%, 5.0%, and 7.6% across the same metrics. The AcoRecGram model also outperforms RP3ß by 4.0%, 0.6%, 6.6%, and 0.7%, and EASER by 3.8%, 5.6%, 4.3%, and 8.5%. When compared to the TARS model, AcoRecGram exhibits a significant performance gain of 17.9%, 18.6%, 24.5%, and 24.2% across these metrics. In terms of Coverage, AcoRecGram provides 7.7% higher coverage than RecWalkK, 4.6% higher coverage than RecWalkPR, 16.2% higher coverage than RP3ß, 6.4% higher coverage than EASER, and 15.8% higher coverage than TARS. An important observation is that while RP3ß produces results comparable to AcoRecGram in longer lists (NDCG@20, Recall@20), it does so at the expense of lower coverage.
Unlike the previous dataset, the results across different models are generally closer to each other in the Netflix dataset, with no single model clearly outperforming the others. It is also notable that while the RecWalk and EASER models perform well with shorter lists, their effectiveness diminishes with longer lists.
For AcoRecGram in this experiment, tanh was used for the likelihood in Equation (8). The dropout was set to 0.2, the ant size was 200, and the archive size was 20.
Table 4 presents the results of cold-start experiments conducted on the Pinterest dataset. All three of our models outperformed their respective base models; however, the AcoRecGram model did not show improvement in the NDCG@20 and Recall@20 metrics. On this dataset, AcoRecGram and AcoRecCosine demonstrated superiority over the other models across all metrics except for Coverage. In terms of Coverage, the RecWalkPR model offered more diverse recommendations, though it did not achieve the same level of success in NDCG and Recall values. While EASER and RP3ß trailed our models, they performed better than the remaining models, albeit with lower Coverage.
Table 4. Comparison of Cold-Start User Scenario on Pinterest.
| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage (%) |
|---|---|---|---|---|---|
| Random | 0.0024 | 0.0031 | 0.0033 | 0.0050 | 42.03 |
| Popular | 0.0062 | 0.0098 | 0.0088 | 0.0180 | 0.71 |
| BaseGram | 0.0679 | 0.1000 | 0.0940 | 0.1767 | 25.64 |
| BaseCosine | 0.0685 | 0.0991 | 0.0928 | 0.1720 | 33.57 |
| BaseJaccard | 0.0665 | 0.0947 | 0.0937 | 0.1652 | 33.99 |
| UserKNN | 0.0678 | 0.0982 | 0.0964 | 0.1746 | 26.20 |
| TARS | 0.0675 | 0.0936 | 0.0963 | 0.1624 | 30.97 |
| RecWalkPR | 0.0677 | 0.0935 | 0.0932 | 0.1608 | 36.96 |
| RecWalkK | 0.0688 | 0.0936 | 0.0964 | 0.1608 | 36.28 |
| EASER | 0.0701 | 0.0995 | 0.0990 | 0.1748 | 27.55 |
| RP3ß | 0.0705 | 0.0998 | 0.0986 | 0.1734 | 28.37 |
| AcoRecGram | 0.0711 | 0.1007 | 0.0996 | 0.1767 | 36.66 |
| AcoRecCosine | 0.0711 | 0.0998 | 0.1009 | 0.1760 | 35.01 |
| AcoRecJaccard | 0.0700 | 0.1000 | 0.0944 | 0.1718 | 35.06 |
When comparing our models with others, the performance of the AcoRecGram model showed slight differences compared to RP3ß and EASER, but outperformed TARS across all metrics. Specifically, in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, AcoRecGram outperformed RP3ß by 0.9%, 0.9%, 1.0%, and 1.9%, respectively. Compared to EASER, AcoRecGram showed improvements of 1.4%, 1.2%, 0.6%, and 1.1%, respectively. Against the TARS model, AcoRecGram exhibited performance gains of 5.3%, 7.6%, 3.4%, and 8.8%, respectively. In terms of Coverage, AcoRecGram provided 29.2% higher coverage than RP3ß and 33.1% higher coverage than EASER.
In this experiment, RecWalkPR and RecWalkK exhibited poorer performance than on the other datasets, except in Coverage. The RecWalk models’ reliance on the SLIM model as input likely influenced their overall success. The UserKNN and TARS models, once again, did not demonstrate notable success.
For AcoRecGram in this experiment, we used the tanh function for the likelihood conversion. The dropout rate was set to 0.5, the ant size was 50, and the archive size was 50. For AcoRecCosine, the sigmoid was used for the likelihood conversion. Dropout was not applied, the ant size was 250, and the archive size was 50.
5.2. Long-Tail Item Scenario
Table 5 presents the results of long-tail item experiments conducted on the ML-1M dataset. Our models outperformed all others, demonstrating their effectiveness even in scenarios where input models typically favor popular items. Remarkably, all three of our models ranked in the top three across all metric measurements. The balanced relationship between high Coverage and Recall highlights the superiority of our models. Notably, while the base models struggled in the long-tail item scenario, our models that utilized the base models as inputs achieved significant success.
When comparing our models with the closest competitors, including TARS, the AcoRecJaccard model displayed significant performance advantages over RecWalkK, RecWalkPR, RP3ß, EASER, and TARS across several metrics. Specifically, in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, AcoRecJaccard outperformed RecWalkK by 71.6%, 60.9%, 59.4%, and 48.9%, respectively. Compared to RecWalkPR, AcoRecJaccard showed performance improvements of 79.9%, 66.4%, 66.0%, and 52.5%, respectively. Against the RP3ß model, AcoRecJaccard demonstrated improvements of 45.8%, 38.7%, 39.7%, and 30.8%, respectively. Compared to the EASER model, AcoRecJaccard achieved performance gains of 98.6%, 79.2%, 77.8%, and 57.5%, respectively. Against the TARS model, AcoRecJaccard exhibited a remarkable performance increase of 636.5%, 418.7%, 472.7%, and 275.3%, respectively. In terms of Coverage, AcoRecJaccard provided 55.5% higher coverage than RecWalkK, 57.6% higher coverage than RecWalkPR, 22.1% higher coverage than RP3ß, 64.2% higher coverage than EASER, and 142.9% higher coverage than TARS.
For AcoRecJaccard in this experiment, we used the tanh function for the likelihood conversion. The dropout rate was set to 0.2, the ant size was 50, and the archive size was 50.
Table 5. Comparison of long-tail item scenario on ML-1M.
| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage (%) |
|---|---|---|---|---|---|
| Random | 0.0019 | 0.0028 | 0.0035 | 0.0058 | 80.62 |
| Popular | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 2.94 |
| BaseGram | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 3.20 |
| BaseCosine | 0.0071 | 0.0099 | 0.0106 | 0.0180 | 10.38 |
| BaseJaccard | 0.0125 | 0.0165 | 0.0177 | 0.0278 | 14.10 |
| UserKNN | 0.0370 | 0.0543 | 0.0566 | 0.1026 | 27.64 |
| TARS | 0.0137 | 0.0246 | 0.0231 | 0.0535 | 17.11 |
| RecWalkPR | 0.0561 | 0.0767 | 0.0797 | 0.1317 | 26.37 |
| RecWalkK | 0.0588 | 0.0793 | 0.0830 | 0.1349 | 26.73 |
| EASER | 0.0508 | 0.0712 | 0.0744 | 0.1275 | 25.31 |
| RP3ß | 0.0692 | 0.0920 | 0.0947 | 0.1535 | 34.05 |
| AcoRecGram | 0.0991 | 0.1262 | 0.1307 | 0.2003 | 49.35 |
| AcoRecCosine | 0.0946 | 0.1215 | 0.1272 | 0.1960 | 42.04 |
| AcoRecJaccard | 0.1009 | 0.1276 | 0.1323 | 0.2008 | 41.56 |
Table 6 presents the results of long-tail item experiments conducted on the Netflix dataset. In this experiment, our models demonstrated clear superiority over the others, with the exception of the RecWalk models. For shorter lists, RecWalk exhibited slightly better performance than our models: in terms of NDCG@10, NDCG@20, and Recall@10, our AcoRecGram model lagged behind RecWalkK by 3.2%, 2.3%, and 1.2%, respectively, and behind RecWalkPR by 2.2%, 1.4%, and 0.8%, respectively. However, for Recall@20, AcoRecGram outperformed RecWalkK by 0.2% and RecWalkPR by 0.6%.
In the Coverage metric, all three of our models outperformed all other models. It is important to note that the Coverage value for each model is based on its best NDCG@10 result. A key strength of our models is their ability to simultaneously enhance both recommendation accuracy and diversity. Achieving a Coverage result close to that of the Random model on the Netflix dataset underscores a highly successful outcome in terms of list diversity.
When comparing our models with others, the AcoRecGram model demonstrated significant performance improvements over the RP3ß, EASER, and TARS models across all metrics, except when compared to RecWalk. Specifically, AcoRecGram outperformed RP3ß in NDCG@10, NDCG@20, Recall@10, and Recall@20 by 12.7%, 10.9%, 16.0%, and 11.0%, respectively. Compared to EASER, AcoRecGram showed improvements of 24.2%, 21.5%, 21.0%, and 16.9%, respectively. Against the TARS model, AcoRecGram achieved a remarkable performance improvement of 165.5%, 141.6%, 143.5%, and 111.7%, respectively. In terms of Coverage, AcoRecGram provided 15.6% higher coverage than RP3ß, 23.2% higher than EASER, and 63.9% higher than TARS.
For AcoRecGram in this experiment, we used the tanh function for the likelihood conversion. The dropout rate was set to 0.2, the ant size was 100, and the archive size was 20.
Table 6. Comparison of long-tail item scenario on Netflix.
| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage (%) |
|---|---|---|---|---|---|
| Random | 0.0005 | 0.0012 | 0.0010 | 0.0028 | 61.22 |
| Popular | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.60 |
| BaseGram | 0.0311 | 0.0441 | 0.0422 | 0.0766 | 26.28 |
| BaseCosine | 0.0854 | 0.1040 | 0.1054 | 0.1537 | 40.03 |
| BaseJaccard | 0.0913 | 0.1093 | 0.1100 | 0.1563 | 40.59 |
| UserKNN | 0.0730 | 0.0890 | 0.0937 | 0.1358 | 42.83 |
| TARS | 0.0530 | 0.0678 | 0.0688 | 0.1076 | 36.23 |
| RecWalkPR | 0.1439 | 0.1661 | 0.1688 | 0.2267 | 52.73 |
| RecWalkK | 0.1454 | 0.1676 | 0.1696 | 0.2274 | 52.14 |
| EASER | 0.1133 | 0.1348 | 0.1384 | 0.1949 | 48.18 |
| RP3ß | 0.1249 | 0.1477 | 0.1444 | 0.2053 | 51.36 |
| AcoRecGram | 0.1407 | 0.1638 | 0.1675 | 0.2278 | 59.37 |
| AcoRecCosine | 0.1371 | 0.1605 | 0.1620 | 0.2259 | 58.36 |
| AcoRecJaccard | 0.1315 | 0.1557 | 0.1523 | 0.2147 | 53.39 |
Table 7 presents the results of long-tail item experiments conducted on the Pinterest dataset. Consistent with the ML-1M experiment, all three of our models ranked in the top three across all metrics. Among our models, AcoRecCosine was the most successful, outperforming all other models in accuracy metrics except for Coverage. Similar to the results from the Netflix dataset, AcoRecGram achieved a Coverage score close to that of the Random model, indicating a successful outcome in terms of list diversity.
When comparing AcoRecCosine to the closest competing models, including RecWalkK, RecWalkPR, RP3ß, EASER, and TARS, it demonstrated substantial performance improvements across several metrics. Specifically, AcoRecCosine outperformed RecWalkK in NDCG@10, NDCG@20, Recall@10, and Recall@20 by 33.2%, 26.7%, 34.8%, and 26.0%, respectively. Compared to RecWalkPR, AcoRecCosine showed improvements of 26.6%, 25.5%, 27.7%, and 27.3%, respectively. Against RP3ß, AcoRecCosine improved by 22.9%, 18.5%, 21.7%, and 16.9%, respectively. In comparison with the EASER model, it demonstrated enhancements of 58.6%, 46.8%, 54.6%, and 40.8%, respectively. Compared to TARS, AcoRecCosine exhibited the most significant gains, with improvements of 141.3%, 107.6%, 136.0%, and 94.2%, respectively. In terms of Coverage, AcoRecCosine achieved 20.2% higher coverage than RecWalkK, 20.9% higher than RecWalkPR, 9.1% higher than RP3ß, 20.1% higher than EASER, and 46.9% higher than TARS.
Other than our models, when RecWalkK, RecWalkPR, RP3ß, and EASER are evaluated across the entire dataset, we observe that the EASER model lagged behind the others. This may be attributed to its parametric nature, which may not effectively highlight niche items. Graph-based models like RecWalkK, RecWalkPR, and RP3ß appear more successful in capturing new relationships. Base models generally assess products based on co-occurrence frequency, which limits their effectiveness in long-tail item scenarios. Similarly, UserKNN and TARS models underperformed compared to base models, across all three datasets.
For AcoRecCosine in this experiment, we used the sigmoid function for the likelihood conversion. The dropout rate was set to 0.2, the ant size was 200, and the archive size was 50.
Table 7. Comparison of long-tail item scenario on Pinterest.
| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage (%) |
|---|---|---|---|---|---|
| Random | 0.0015 | 0.0020 | 0.0026 | 0.0041 | 70.87 |
| Popular | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.91 |
| BaseGram | 0.0206 | 0.0352 | 0.0299 | 0.0711 | 33.94 |
| BaseCosine | 0.0345 | 0.0529 | 0.0493 | 0.1006 | 48.68 |
| BaseJaccard | 0.0371 | 0.0536 | 0.0541 | 0.0995 | 48.73 |
| UserKNN | 0.0218 | 0.0362 | 0.0319 | 0.0723 | 35.31 |
| TARS | 0.0276 | 0.0432 | 0.0414 | 0.0855 | 47.07 |
| RecWalkPR | 0.0526 | 0.0715 | 0.0765 | 0.1304 | 57.18 |
| RecWalkK | 0.0500 | 0.0708 | 0.0725 | 0.1317 | 57.51 |
| EASER | 0.0420 | 0.0611 | 0.0632 | 0.1179 | 57.59 |
| RP3ß | 0.0542 | 0.0757 | 0.0803 | 0.1420 | 63.38 |
| AcoRecGram | 0.0666 | 0.0897 | 0.0977 | 0.1660 | 69.14 |
| AcoRecCosine | 0.0685 | 0.0948 | 0.1033 | 0.1799 | 65.22 |
| AcoRecJaccard | 0.0648 | 0.0882 | 0.0986 | 0.1652 | 59.16 |
5.3. Effect of Parameters
Figure 2 and Figure 3 illustrate the relationship between NDCG@10 and the number of iterations across different input models and scenarios. The horizontal x-axis represents the number of iterations, while the vertical y-axis represents NDCG@10. Our models are depicted with continuous lines; dashed horizontal lines indicate the best NDCG@10 values of the corresponding base models (BaseGram, BaseCosine, BaseJaccard), each in the same color as its model, and dashed vertical lines mark the epoch at which our models reach their best value. We tested our models on each dataset in ten steps, with their best parameter combinations, from 1 to 300 iterations. In both cold-start and long-tail item scenarios, our models started producing consistent results after a certain number of iterations. Figure 2 shows the training process of the models in the cold-start scenario. Across all datasets, model results stabilized after a specific number of iterations. For example, in the ML-1M dataset, the Gram model began producing similar results around the 80th iteration and reached its peak performance at the 120th iteration. The peak levels of the models are indicated by the dashed vertical lines in the graphs.
A notable feature of our study is the model’s rapid convergence and minimal stagnation, owing to its structure. Regarding the effect of the Gaussian distribution during training, sampling initially covered all localities evenly; as the focus space tightened over the iterations, the ants converged toward the same position of the distribution. Once the variance fell below a specified threshold, our model completed its training. During our experiments, we observed that the models quickly achieved high accuracy, and further iterations beyond this point did not impact performance.
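The variance-based stopping rule described above can be sketched generically; `step` stands in for one training iteration’s update of the Gaussian’s σ and is a hypothetical hook for illustration, not the paper’s API:

```python
def train_until_converged(step, sigma0=1.0, var_threshold=1e-3, max_iter=300):
    """Iterate until the Gaussian's variance drops below a threshold,
    mirroring the early-stopping behavior described in the text."""
    sigma = sigma0
    for it in range(1, max_iter + 1):
        sigma = step(sigma)              # one epoch tightens the focus space
        if sigma ** 2 < var_threshold:   # variance below threshold: stop
            return it, sigma
    return max_iter, sigma
```

With a geometrically shrinking σ, training halts after only a handful of iterations, which matches the rapid convergence reported for the models.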
Figure 4 and Figure 5 examine the relationship between our models’ ‘Ant Size’ and ‘Archive Size’ parameters. ‘Ant Size’ determines how many points are sampled from the Gaussian distribution, while ‘Archive Size’ is the number of points used as input for the Gaussian negative log-likelihood loss. In the figures, we set the Ant Size values to {200, 100, 50} and the Archive Size values to {50, 20, 10}. For each dataset, we selected the most successful <ant size, archive size> pair. To better expose the differences between the results, we normalized the values using max–min normalization and display them in Figure 4 and Figure 5. We found that the best parameter values vary depending on the dataset. For example, the AcoRecCosine model only produced meaningful results with the <50, 50> pair (i.e., ant size = 50 and archive size = 50) in the cold-start scenario on the Netflix dataset, the AcoRecGram model performed better with the <200, 20> pair, and the AcoRecJaccard model required a low ‘Ant Size’ value.
5.4. Experimental Environment and Tools
6. Discussion
While the similarity matrices (BaseGram, BaseCosine, and BaseJaccard) are not particularly effective when employed alone as recommendation models in either scenario (recommendations for cold-start users and recommendations of long-tail items), they exhibit strong performance when combined with AcoRec (i.e., AcoRecGram, AcoRecCosine, and AcoRecJaccard). We estimated the percentage improvements for each metric by comparing our three AcoRec models to their corresponding item-based similarity models in Table 2, Table 3 and Table 4.
Table 8 illustrates the percentage enhancement of each AcoRec model over its base item-similarity model in a cold-start scenario. The results demonstrate that our AcoRec models significantly enhance the performance of their base models. Notably, the improvements in the Gram model surpass those in the Jaccard and Cosine similarity models.
On the other hand, the baseline models perform poorly in the long-tail item scenario. This scenario requires highlighting less-prominent items, which the baseline models struggle to do because of their inherent focus on popular items.
Table 9 shows each AcoRec model’s percentage improvement over its base item-similarity model (i.e., BaseGram, BaseCosine, BaseJaccard) in the long-tail item scenario. The results indicate that our AcoRec models significantly enhance the performance of their base models, especially in the long-tail scenario. These improvements surpass those observed for cold-start users, demonstrating the model’s efficacy in highlighting diverse items.
The comparisons have shown that AcoRec models also provide further improvements on the Gram matrix. The Gram matrix, used as input without normalization, retains more inherent information about data relationships, proving beneficial during iterations. Notably, our models exclusively utilized implicit data, avoiding ethical concerns related to demographic, personal, or tracking data.
One of our observations from all the experiments is that, while a model may excel in one dataset, it can fail in another. However, the results of the experiments conducted in this study demonstrated that our models consistently delivered successful and stable results across all datasets. The fundamental reason for our study’s success across different scenarios is its parametric structure, which allows for flexibility in addressing diverse contexts. The cold-start and long-tail item scenarios require evaluating items under completely different conditions. In the cold-start scenario, models generally achieve success by highlighting popular items. This is evident from the success of the baseline models (Popular, Gram, Cosine, and Jaccard), which emphasize high-frequency relationships among items, predominantly found among popular items. In contrast, the long-tail item scenario focuses on the ability to highlight less-popular items. The β parameter in our algorithm is automatically tuned, allowing it to adapt to the specific requirements of each scenario and exhibit the desired behavior. Despite the failure of base algorithms in this scenario, our models have shown quite successful results using these inputs.
Another observation is that, in the experiments, an inverse correlation is observed between NDCG and Recall metrics with Coverage, where increased NDCG and Recall values led to decreased list diversity. One of the most crucial aspects of our models is their ability to improve both metrics simultaneously.
We established that data sparsity contributes to the cold-start issue and that effectively addressing it requires incorporating a popularity bias. Our heuristic AcoRec model showed promising results in mitigating data unavailability in both the cold-start and long-tail item scenarios.
If we were to discuss the drawbacks of our model, computing the input model (i.e., the similarity matrix used in our model) on high-dimensional datasets can incur computational costs. For example, computing similarities such as Gram, Cosine, or Jaccard can be challenging in high-dimensional spaces. However, such computations can be carried out as a pre-processing step, and no operations are performed on these inputs during our model’s training. Additionally, as mentioned in [40], the Gram matrix can be computed more efficiently using the Coppersmith–Winograd algorithm.