CN110134874A - A Collaborative Filtering Method for Optimizing User Similarity - Google Patents

Info

Publication number
CN110134874A
Authority
CN
China
Prior art keywords
user
similarity
item
formula
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910312071.8A
Other languages
Chinese (zh)
Inventor
安彦涵
张新鹏
吴汉舟
余江
王子驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-04-18
Filing date
2019-04-18
Publication date
2019-08-16
Application filed by University of Shanghai for Science and Technology
Priority to CN201910312071.8A
Publication of CN110134874A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention proposes a collaborative filtering method that optimizes user similarity and improves the accuracy of the recommendation algorithm without increasing server latency. The method standardizes each user's rating data, then computes the Pearson similarity, an evaluation weight based on user vector distance, and an asymmetric similarity weight, and uses the two weights to optimize the Pearson similarity so that traditional collaborative filtering recommends more accurately. The method applies to user-item rating data sets.

Description

A Collaborative Filtering Method for Optimizing User Similarity

Technical field

For recommendation systems based on collaborative filtering, the present invention proposes a collaborative filtering method that optimizes user similarity.

Background

The rapid development and spread of the Internet has made it far easier for users to obtain, share and disseminate information. At the same time, the sharp growth in the volume of information has lowered its utilization rate: users struggle to find, in time, the information that is actually useful to them, which is the information overload problem. One effective response is a recommendation system, which recommends content and products to users based on their needs and interests. Compared with a search engine, a recommendation system studies users' interests and preferences and performs personalized computation to discover what interests them, guiding users toward their own information needs and toward information that is useful to them. A good recommendation system not only provides personalized service but also builds close relationships among users, so that users come to rely on its recommendations.

Recommendation systems mainly fall into content filtering and collaborative filtering. A content-filtering system derives the features of the items a user cares about from the user's browsing or purchase history and recommends new items that best match those features. A collaborative-filtering system computes the similarity between users' histories, searches for other users whose preferences resemble the target user's, and recommends to the target user the items those users are interested in.

A content-filtering system considers only the target user, whereas a collaborative-filtering system draws on collective intelligence, that is, it gathers answers from the behavior and data of a large population, so its recommendations are more personalized. Collaborative filtering is therefore the most widely used and most effective algorithm in personalized recommendation services.

Collaborative-filtering recommendation systems are further divided into model-based and memory-based systems. The former use machine learning, data mining and statistics to train on users' historical data, build a corresponding user model, and use that model to produce predictions and recommendations; the techniques involved include matrix factorization and latent semantic analysis. The latter are divided into user-based and item-based collaborative filtering.

The traditional user-based collaborative filtering system measures similarity with the Pearson formula, but it does not preprocess the data set, does not consider the distance between users' rating vectors, and does not consider that the similarity relationship between users is unequal. All three omissions lower recommendation quality. The present invention therefore optimizes the user-based collaborative filtering algorithm on these three points to improve recommendation quality.

Summary of the invention

The present invention aims to reduce the mean absolute error of the traditional user-based collaborative filtering algorithm and so improve recommendation quality, by providing a collaborative filtering method that optimizes user similarity.

To achieve this, the invention proposes the following technical solution:

A collaborative filtering method for optimizing user similarity standardizes each user's rating vector, optimizes the Pearson similarity by combining an evaluation weight based on user vector distance with an asymmetric similarity weight, and finally predicts user ratings. The specific steps are as follows:

1) Prepare the experimental database: collect the ratings that a number of users have given to different items and build the experimental database;

2) Standardization preprocessing: standardize each user's rating vector with the Z-score method and generate the user-item rating matrix from the standardized rating vectors;

3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight based on user vector distance, and the asymmetric similarity weight; combine the two weights with the Pearson similarity to obtain the optimized similarity formula; compute the similarity of each user to every other user with that formula and generate the similarity matrix;

4) Predict ratings: from the similarity between the target user and the other users, determine the target user's neighbor set, and predict the target user's unrated items with the rating formula.

Compared with the prior art, the present invention has the following advantage:

The method optimizes user similarity inside the recommendation algorithm module of collaborative filtering, so that recommendation quality improves without increasing server latency.

Description of drawings

Figure 1 is a flow chart of the method of the present invention.

Detailed description

A specific embodiment of the present invention is described below with reference to the drawing.

This embodiment analyzes the MovieLens-100k data set (available from https://movielens.org/), which contains 100,000 ratings of 1,682 movies by 943 users. Ratings are integers from 1 to 5, where 1 is the lowest and 5 the highest, and every user has rated at least 20 movies. 80% of the data is used as the training set and 20% as the test set.
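As a minimal sketch of step 1), the ratings can be read from the tab-separated layout (user id, item id, rating, timestamp) used by the data set's `u.data` file and split 80/20. The function names, the random shuffle and the seed are assumptions for illustration; the text does not say how the split was made, and the sample lines are made up.

```python
import random


def parse_ratings(lines):
    """Parse lines of the form user_id<TAB>item_id<TAB>rating<TAB>timestamp."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split("\t")
        ratings.append((int(user), int(item), int(rating)))
    return ratings


def split_train_test(ratings, train_fraction=0.8, seed=0):
    """Shuffle and split the ratings (the random split is an assumption)."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up lines in the same layout, for illustration only.
sample_lines = [
    "1\t10\t5\t100",
    "1\t20\t3\t101",
    "2\t10\t4\t102",
    "2\t30\t1\t103",
    "3\t20\t2\t104",
]
train, test = split_train_test(parse_ratings(sample_lines))
```

With a real copy of the data set, `parse_ratings(open(path))` would replace the sample lines, where `path` points at the ratings file.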

As shown in Figure 1, the collaborative filtering method for optimizing user similarity standardizes each user's rating vector, optimizes the Pearson similarity by combining an evaluation weight based on user vector distance with an asymmetric similarity weight, and finally predicts user ratings. The specific steps are as follows:

1) Prepare the experimental database: collect the ratings that a number of users have given to different items and build the experimental database.

2) Standardization preprocessing: standardize each user's rating vector with the Z-score method and generate the user-item rating matrix from the standardized rating vectors, as follows:

Let the rating vector of the u-th user in the training set be R_u = (r_(u,1), r_(u,2), …, r_(u,m)), where r_(u,m) denotes user u's rating of item m. As shown in formula (1), R_u is standardized with the Z-score method, where z_(u,m) is user u's standardized rating of item m, r̄_u is the mean of the components of R_u, and σ_u is the standard deviation of the components of R_u:
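The image for formula (1) is not preserved in this text. From the definitions just given, the Z-score standardization would read as follows (a reconstruction from the surrounding definitions, not the original typesetting):

```latex
z_{(u,m)} = \frac{r_{(u,m)} - \bar{r}_u}{\sigma_u} \tag{1}
```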

The standardized rating vector of user u is written Z_u = (z_(u,1), z_(u,2), …, z_(u,m)); Z_u has mean 0 and standard deviation 1. A user-item rating matrix of size 943 × 1682 is then generated, where 943 is the number of users and 1682 is the number of items. Z_u is stored in row u of the matrix, and the items that user u has not rated are assigned the value 0.
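The standardization step can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the function name is hypothetical, the population standard deviation is used because it makes the standardized vector have standard deviation exactly 1, and a user whose ratings are all identical is mapped to zeros.

```python
import math


def standardize_user(ratings):
    """Z-score one user's ratings.

    ratings: dict mapping item id -> raw rating (rated items only).
    Returns (z, mean, std) where z maps item id -> standardized rating.
    """
    values = list(ratings.values())
    mean = sum(values) / len(values)
    # Population standard deviation (divide by the number of ratings).
    std = math.sqrt(sum((r - mean) ** 2 for r in values) / len(values))
    if std == 0:
        # Assumption: identical ratings carry no preference signal.
        return {item: 0.0 for item in ratings}, mean, std
    z = {item: (r - mean) / std for item, r in ratings.items()}
    return z, mean, std


z_u, mean_u, std_u = standardize_user({1: 5, 2: 3, 3: 1})
```

Keeping each user's ratings in a dictionary of rated items, rather than a dense row padded with zeros, preserves the set of rated items, which the later steps need: after standardization a stored 0 is indistinguishable from a rating equal to the user's mean.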

3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight based on user vector distance, and the asymmetric similarity weight; combine the two weights with the Pearson similarity to obtain the optimized similarity formula; compute the similarity of each user to every other user with that formula and generate the similarity matrix. Taking any two users u and v in the MovieLens-100k training set as an example, the similarity of user u to user v is computed as follows:

3.1) Compute the Pearson similarity: as shown in formula (2), the Pearson similarity formula measures the Pearson similarity sim_(u,v) of user u and user v, where S is the set of items rated by both user u and user v:
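The image for formula (2) is not preserved. The standard Pearson correlation over the co-rated set S, written on the standardized ratings because step 3) works from the matrix of step 2), would be as follows. Two points are assumptions: that the original uses z rather than r, and that z̄_u and z̄_v denote the means of each user's standardized ratings over S rather than over all of that user's ratings.

```latex
\mathrm{sim}_{(u,v)} =
\frac{\sum_{i \in S} \bigl(z_{(u,i)} - \bar{z}_u\bigr)\bigl(z_{(v,i)} - \bar{z}_v\bigr)}
     {\sqrt{\sum_{i \in S} \bigl(z_{(u,i)} - \bar{z}_u\bigr)^2}\,
      \sqrt{\sum_{i \in S} \bigl(z_{(v,i)} - \bar{z}_v\bigr)^2}} \tag{2}
```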

3.2) Compute the evaluation weight based on user vector distance: as shown in formula (3), compute the evaluation weight D_(u,v) of the user vector distance between Z_u and Z_v, where S is the set of items rated by both user u and user v, N(S) is the number of elements in S, and α is the threshold on the rating gap for a single item. The larger α is, the closer D_(u,v) is to 1; the smaller α is, the closer D_(u,v) is to 0:

3.3) Compute the asymmetric similarity weight: as shown in formula (4), compute the asymmetric similarity weight w_(u,v) of user u toward user v, where S is the set of items rated by both user u and user v, I_u is the set of items rated by user u, N(S) is the number of elements in S, and N(I_u) is the number of elements in I_u:
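The image for formula (4) is not preserved. Given that the only quantities defined for it are N(S) and N(I_u), the most natural reading is the share of user u's rated items that user v has also rated. This is an assumed reconstruction:

```latex
w_{(u,v)} = \frac{N(S)}{N(I_u)} \tag{4}
```

Under this reading the weight is asymmetric because w_(v,u) divides the same N(S) by N(I_v) instead. For example, if u has rated 40 items, v has rated 200, and they share 20, then w_(u,v) = 0.5 while w_(v,u) = 0.1.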

3.4) User similarity formula: as shown in formula (5), combining formulas (2), (3) and (4) gives the optimized similarity sim′_(u,v) of user u to user v:

sim′_(u,v) = D_(u,v) * w_(u,v) * sim_(u,v)    (5)

3.5) Compute the user similarity matrix: compute the similarity between every pair of users by formula (5), which yields a 943 × 943 user similarity matrix.
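Steps 3.1) to 3.4) can be sketched as follows. Only the product in formula (5) is taken from the text. The Pearson function uses means over the co-rated items, `asymmetric_weight` assumes the ratio N(S)/N(I_u), and `distance_weight` is a hypothetical stand-in for formula (3), whose image is lost: it returns the share of co-rated items whose standardized rating gap is at most α, which matches the described behavior (closer to 1 as α grows, closer to 0 as α shrinks) but may not be the patent's formula. The default `alpha=1.0` is likewise an assumption.

```python
import math


def pearson(zu, zv, common):
    """Pearson correlation of two users over their co-rated items."""
    n = len(common)
    if n < 2:
        return 0.0  # correlation is undefined for fewer than two items
    mean_u = sum(zu[i] for i in common) / n
    mean_v = sum(zv[i] for i in common) / n
    cov = sum((zu[i] - mean_u) * (zv[i] - mean_v) for i in common)
    var_u = sum((zu[i] - mean_u) ** 2 for i in common)
    var_v = sum((zv[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def distance_weight(zu, zv, common, alpha):
    """HYPOTHETICAL form of D(u,v): share of co-rated items within alpha."""
    close = sum(1 for i in common if abs(zu[i] - zv[i]) <= alpha)
    return close / len(common)


def asymmetric_weight(zu, common):
    """ASSUMED form of w(u,v): N(S) / N(I_u)."""
    return len(common) / len(zu)


def optimized_similarity(zu, zv, alpha=1.0):
    """sim'(u,v) = D(u,v) * w(u,v) * sim(u,v), as in formula (5)."""
    common = sorted(set(zu) & set(zv))  # the co-rated set S
    if not common:
        return 0.0
    return (distance_weight(zu, zv, common, alpha)
            * asymmetric_weight(zu, common)
            * pearson(zu, zv, common))


# zu and zv map item id -> standardized rating (rated items only).
zu = {1: 1.0, 2: -1.0, 3: 0.0, 4: 0.5}
zv = {1: 0.8, 2: -0.9}
sim_uv = optimized_similarity(zu, zv)  # u toward v
sim_vu = optimized_similarity(zv, zu)  # v toward u, generally different
```

Filling the 943 × 943 matrix then amounts to calling `optimized_similarity` for every ordered pair of users; because the weight is asymmetric, both (u, v) and (v, u) have to be computed.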

4) Predict ratings: from the similarity between the target user and the other users, determine the target user's neighbor set, and predict the target user's unrated items with the rating formula. In this example, an unrated item a of an arbitrary user u in the training set is used, and the predicted rating of user u for item a is computed as follows:

4.1) Determine the neighbor set: in the training set, find the set of users who have rated item a, written U_a = {u_(1,a), u_(2,a), …, u_(q,a)}, where u_(q,a) is the q-th user who has rated item a. Sort these q users by their similarity to user u in descending order, giving U′_a = {u′_(1,a), u′_(2,a), …, u′_(q,a)}. Then take the first k users of the sorted set U′_a as the neighbor set of user u, written U = {u′_(1,a), u′_(2,a), …, u′_(k,a)}.
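Neighbor selection as described in step 4.1) can be sketched like this. The function name and the data layout (a dictionary of rated items per user and a nested dictionary of similarities) are assumptions for illustration; the default `k=10` follows the neighbor set size used in this embodiment.

```python
def select_neighbors(target, item, ratings_by_user, sim, k=10):
    """Return up to k users who rated `item`, most similar to `target` first.

    ratings_by_user: dict user -> dict of that user's rated items.
    sim: nested dict where sim[u][v] is the similarity of u toward v.
    """
    # U_a: every other user who has rated the item.
    candidates = [
        v for v, rated in ratings_by_user.items()
        if v != target and item in rated
    ]
    # U'_a: sorted by similarity to the target, largest first.
    candidates.sort(key=lambda v: sim[target][v], reverse=True)
    # U: the first k users.
    return candidates[:k]


ratings_by_user = {
    1: {10: 0.5},
    2: {20: 1.0},
    3: {20: -0.3},
    4: {20: 0.2},
    5: {30: 0.7},
}
sim = {1: {2: 0.2, 3: 0.9, 4: -0.1, 5: 0.99}}
neighbors = select_neighbors(1, 20, ratings_by_user, sim, k=2)
```

User 5 is the most similar to user 1 but is excluded because user 5 has not rated item 20.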

4.2) Predict user u's rating of item a: compute the predicted rating p_(u,a) of user u for item a by formula (6), where U is the neighbor set of user u, r̄_u is the mean of the components of R_u, σ_u is the standard deviation of the components of R_u, z_(v,a) is user v's standardized rating of item a, and sim′_(u,v) is the similarity of user u to user v:
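The image for formula (6) is not preserved. A form consistent with the listed symbols takes the similarity-weighted average of the neighbors' standardized ratings and maps it back to user u's own rating scale. Normalizing by the sum of absolute similarities is the usual convention and is assumed here:

```latex
p_{(u,a)} = \bar{r}_u + \sigma_u \cdot
\frac{\sum_{v \in U} \mathrm{sim}'_{(u,v)} \, z_{(v,a)}}
     {\sum_{v \in U} \bigl|\mathrm{sim}'_{(u,v)}\bigr|} \tag{6}
```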

As in formula (7), the mean absolute error (MAE) is used to measure recommendation accuracy: the smaller the MAE, the smaller the error and the higher the accuracy. Here p_i is the predicted rating of the user for item i, r_i is the user's actual rating of item i in the test set, and n is the number of ratings in the test set:
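The image for formula (7) is not preserved. The mean absolute error over the n test ratings is, by definition:

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert p_i - r_i \rvert \tag{7}
```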

In this embodiment the neighbor set size is 10. The MAE of the invention is 0.74086, which is 3.06% lower than that of the original method using the plain Pearson similarity. Computing the user similarity matrix takes 61.3 seconds and rating prediction takes 7.04 seconds. In practice the user similarity matrix can be computed offline, so when the recommendation algorithm is used to predict item ratings online, the real-time computation time is almost the same as for the original method and users do not wait any longer.
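The prediction and evaluation steps can be sketched as below. The prediction follows the assumed reading of formula (6): a similarity-weighted average of the neighbors' standardized ratings, normalized by the sum of absolute similarities and mapped back to the target user's scale. The function names and the fallback to the user's mean when there are no usable neighbors are assumptions.

```python
def predict_rating(mean_u, std_u, neighbors):
    """Predict a rating on user u's original scale.

    neighbors: list of (similarity of u toward v, z-score of v for the item).
    """
    denom = sum(abs(s) for s, _ in neighbors)
    if denom == 0:
        # Assumption: with no usable neighbors, fall back to the user's mean.
        return mean_u
    weighted = sum(s * z for s, z in neighbors) / denom
    return mean_u + std_u * weighted


def mean_absolute_error(predicted, actual):
    """MAE over paired predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


p = predict_rating(3.0, 1.0, [(0.5, 1.0), (0.5, -0.5)])
mae = mean_absolute_error([p, 4.0], [3.0, 5.0])
```

Only `predict_rating` runs at request time; it reads precomputed similarities, which is why the online cost stays close to that of the plain Pearson method.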

Claims (4)

1. A collaborative filtering method for optimizing user similarity, which standardizes each user's rating vector, optimizes the Pearson similarity by combining an evaluation weight based on user vector distance with an asymmetric similarity weight, and finally predicts user ratings, characterized in that the specific steps are as follows:
1) Prepare the experimental database: collect the ratings that a number of users have given to different items and build the experimental database;
2) Standardization preprocessing: standardize each user's rating vector with the Z-score method and generate the user-item rating matrix from the standardized rating vectors;
3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight based on user vector distance, and the asymmetric similarity weight; combine the two weights with the Pearson similarity to obtain the optimized similarity formula; compute the similarity of each user to every other user with that formula and generate the similarity matrix;
4) Predict ratings: from the similarity between the target user and the other users, determine the target user's neighbor set, and predict the target user's unrated items with the rating formula.

2. The collaborative filtering method for optimizing user similarity according to claim 1, characterized in that step 2) is carried out as follows: let the rating vector of the u-th user in the training set be R_u = (r_(u,1), r_(u,2), …, r_(u,m)), where r_(u,m) denotes user u's rating of item m; R_u is standardized with the Z-score method according to formula (1), where z_(u,m) is user u's standardized rating of item m, r̄_u is the mean of the components of R_u, and σ_u is the standard deviation of the components of R_u; the standardized rating vector of user u is written Z_u = (z_(u,1), z_(u,2), …, z_(u,m)), with mean 0 and standard deviation 1; the user-item rating matrix is then generated, Z_u is stored in row u of the matrix, and the items that user u has not rated are assigned the value 0.

3. The collaborative filtering method for optimizing user similarity according to claim 1, characterized in that in step 3), taking any two users u and v in the training set as an example, the similarity of user u to user v is computed as follows:
3.1) Compute the Pearson similarity: the Pearson similarity sim_(u,v) of user u and user v is measured according to formula (2), where S is the set of items rated by both user u and user v;
3.2) Compute the evaluation weight based on user vector distance: the evaluation weight D_(u,v) of the user vector distance between Z_u and Z_v is computed according to formula (3), where S is the set of items rated by both user u and user v, N(S) is the number of elements in S, and α is the threshold on the rating gap for a single item; the larger α is, the closer D_(u,v) is to 1, and the smaller α is, the closer D_(u,v) is to 0;
3.3) Compute the asymmetric similarity weight: the asymmetric similarity weight w_(u,v) of user u toward user v is computed according to formula (4), where S is the set of items rated by both user u and user v, I_u is the set of items rated by user u, N(S) is the number of elements in S, and N(I_u) is the number of elements in I_u;
3.4) User similarity formula: combining formulas (2), (3) and (4) gives the optimized similarity of user u to user v as in formula (5):
sim′_(u,v) = D_(u,v) * w_(u,v) * sim_(u,v)    (5)
3.5) Compute the user similarity matrix: compute the similarity between every pair of users by formula (5), which yields the user similarity matrix.

4. The collaborative filtering method for optimizing user similarity according to claim 1, characterized in that in step 4), taking an unrated item a of an arbitrary user u in the training set as an example, the predicted rating of user u for item a is computed as follows:
4.1) Determine the neighbor set: in the training set, find the set of users who have rated item a, written U_a = {u_(1,a), u_(2,a), …, u_(q,a)}, where u_(q,a) is the q-th user who has rated item a; sort these q users by their similarity to user u in descending order, giving U′_a = {u′_(1,a), u′_(2,a), …, u′_(q,a)}; then take the first k users of the sorted set U′_a as the neighbor set of user u, written U = {u′_(1,a), u′_(2,a), …, u′_(k,a)};
4.2) Predict user u's rating of item a: compute the predicted rating p_(u,a) according to formula (6), where U is the neighbor set of user u, r̄_u is the mean of the components of R_u, σ_u is the standard deviation of the components of R_u, z_(v,a) is user v's standardized rating of item a, and sim′_(u,v) is the similarity of user u to user v; recommendation accuracy is measured by the mean absolute error (MAE) according to formula (7), where a smaller MAE means a smaller error and higher accuracy, p_i is the predicted rating of the user for item i, r_i is the user's actual rating of item i in the test set, and n is the number of ratings in the test set.
CN201910312071.8A 2019-04-18 2019-04-18 A Collaborative Filtering Method for Optimizing User Similarity Pending CN110134874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910312071.8A CN110134874A (en) 2019-04-18 2019-04-18 A Collaborative Filtering Method for Optimizing User Similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910312071.8A CN110134874A (en) 2019-04-18 2019-04-18 A Collaborative Filtering Method for Optimizing User Similarity

Publications (1)

Publication Number Publication Date
CN110134874A true CN110134874A (en) 2019-08-16

Family

ID=67570203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910312071.8A Pending CN110134874A (en) 2019-04-18 2019-04-18 A Collaborative Filtering Method for Optimizing User Similarity

Country Status (1)

Country Link
CN (1) CN110134874A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04124782A (en) * 1990-09-14 1992-04-24 N T T Data Tsushin Kk Method and device for feature extraction
CN103559622A (en) * 2013-07-31 2014-02-05 焦点科技股份有限公司 Characteristic-based collaborative filtering recommendation method
US20160314501A1 (en) * 2015-03-24 2016-10-27 Mxm Nation Inc. Scalable networked computing system for scoring user influence in an internet-based social network
CN106021558A (en) * 2016-05-27 2016-10-12 天津大学 Calculation method for user availability in collaborative filtering recommendation system
CN107943948A (en) * 2017-11-24 2018-04-20 中国科学院电子学研究所苏州研究院 A kind of improved mixing collaborative filtering recommending method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TASNIM ZAYET等: "A new weighting algorithm for collaborative filtering", 《IEEE XPLORE》 *
何汶坤等: "基于共同评分数量及差异度的协同过滤推荐算法", 《伊犁师范学院学报(自然科学版)》 *
李容等: "基于改进相似度的协同过滤算法研究", 《计算机科学》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110727876A (en) * 2019-09-02 2020-01-24 南京理工大学 Individual recommendation algorithm for intelligent retail system
CN110727876B (en) * 2019-09-02 2022-09-30 南京理工大学 Individual recommendation algorithm for intelligent retail system

Similar Documents

Publication Publication Date Title
CN110162706B (en) Personalized recommendation method and system based on interactive data clustering
CN104391849B (en) Collaborative filtering recommendation method incorporating temporal context information
Min et al. Detection of the customer time-variant pattern for improving recommender systems
Wang et al. A new collaborative filtering recommendation approach based on naive Bayesian method
CN111475744B (en) Personalized position recommendation method based on ensemble learning
CN103886486A (en) Electronic commerce recommending method based on support vector machine (SVM)
CN111681084A (en) An e-commerce platform recommendation method based on social relationship influencing factors
CN109977299A (en) A kind of proposed algorithm of convergence project temperature and expert's coefficient
CN112380451A (en) Favorite content recommendation method based on big data
CN113836393A (en) A cold-start recommendation method based on preference adaptive meta-learning
Lee et al. A hybrid collaborative filtering-based product recommender system using search keywords
CN114358807A (en) User portrayal method and system based on predictable user characteristic attributes
Nozari et al. A novel trust computation method based on user ratings to improve the recommendation
CN107545075A (en) A kind of restaurant recommendation method based on online comment and context aware
Chen Design and Implementation of a Personalized Recommendation System Based on Deep Learning Distributed Collaborative Filtering Algorithm on Social Media Platforms
Suhaim et al. Directional user similarity model for personalized recommendation in online social networks
Borges et al. A survey on recommender systems for news data
CN119537704A (en) A time-aware user portrait modeling method based on sentiment analysis
Su et al. Integrated mining of social and collaborative information for music recommendation
CN109885748A (en) Semantic feature-based optimization recommendation method
CN110134874A (en) A Collaborative Filtering Method for Optimizing User Similarity
Sharma et al. A review on collaborative filtering using knn algorithm
CN110727867A (en) Semantic entity recommendation method based on fuzzy mechanism
Attarde et al. Survey on recommendation system using data mining and clustering techniques
Darvishy et al. New attributes for neighborhood-based collaborative filtering in news recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190816