CN115687801B

CN115687801B - Position recommendation method based on position aging characteristics and time perception dynamic similarity

Info

Publication number: CN115687801B
Application number: CN202211180601.6A
Authority: CN
Inventors: 朱俊; 韩立新; 李振旺; 梁太波; 徐逸卿; 杨忆; 李景仙
Original assignee: Nanjing Vocational University of Industry Technology NUIT
Current assignee: Nanjing Vocational University of Industry Technology NUIT
Priority date: 2022-09-27
Filing date: 2022-09-27
Publication date: 2024-01-19
Anticipated expiration: 2042-09-27
Also published as: CN115687801A

Abstract

The invention discloses a position recommendation method based on position aging characteristics and time-aware dynamic similarity, comprising the following steps: generating a user-time-position three-dimensional scoring matrix from the original check-in data set; extracting the user-position two-dimensional scoring matrix of each time slot and calculating the dynamic similarity of positions in different time slots; predicting scores for unvisited locations using an improved item-based collaborative filtering method; calculating time-aware aging characteristic values of each position in different time slots; performing personalized probability density modeling with kernel density estimation to mine geographic influence; constructing a score prediction mechanism that integrates the user's historical preference, geographic distance influence, position aging characteristics and position dynamic similarity, and recommending the positions with the highest final prediction scores to the user; and defining a timeliness evaluation system for recommendation systems and comparing the prediction accuracy and recommendation timeliness of different recommendation systems. The invention is highly portable and has broad prospects for industrial application.

Description

Position recommendation method based on position aging characteristics and time perception dynamic similarity

Technical Field

The invention relates to a position recommendation method based on position aging characteristics and time-aware dynamic similarity, and belongs to the technical field of artificial intelligence and machine learning.

Background

In recent years, with the rapid spread of mobile smart devices and the rapid development of wireless communication technologies, location-based social networks (LBSNs) have become increasingly popular worldwide. Examples include the well-known foreign platforms Foursquare, Facebook, Twitter and Yelp, and the domestic platforms Douban, Sina Weibo and Dianping. As a bridge connecting the physical world with the virtual network, a location-based social network does not simply bolt location information onto a traditional social network; it forms a more complex social network by restructuring the traditional platform and its data structures. As consumers of information, users can learn from information shared by others and discover merchants and services of interest anytime and anywhere, which greatly facilitates daily life. As producers of information, users can share their own consumption experiences anytime and anywhere by checking in from their smart devices.

These advantages and conveniences attract large numbers of users and merchants to interact on location-based social network platforms, where large volumes of locations, pictures, audio, video and comments accumulate. This mass of data gives industry and academia a new opportunity to study user behavior and preferences, but it also makes it harder for users to find the goods or services they are interested in. To address the information overload problem of the big data era, the recommendation system has become an important information retrieval tool and an indispensable technical service in location-based social networks. By analyzing large-scale spatio-temporal data in the network, a recommendation system can mine users' behavior patterns, regularities and preferences, and recommend corresponding goods and services to users based on the available item information.

In location-based social networks, location recommendation is a research hotspot and an emerging application in the field of recommendation systems. Its main task is to mine users' check-in preferences at different places and recommend places that they may be interested in and are likely to check in at in the future. A location recommendation system helps users explore new and interesting places in a city, and helps merchants target advertisements precisely at potential customers, creating new business opportunities. At present, popular domestic social network platforms (such as Meituan, Ele.me, Douyin, Weibo and WeChat Moments) all provide location recommendation services.

As an important branch of recommendation systems, location recommendation descends directly from conventional recommendation systems in both its development and its key technologies. Some studies treat locations as ordinary items like movies or music and generate recommendations with conventional methods. Conventional recommendation techniques fall into two groups, content-based methods and collaborative filtering methods, and collaborative filtering is further divided into memory-based and model-based algorithms. Content-based recommendation matches the descriptive attributes of items against users' profile information and interest descriptions. It has clear advantages for the cold-start problem, but it is severely limited for rich media (such as video, pictures and music), where useful information and features are hard to extract. Memory-based collaborative filtering mainly comprises user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF). UBCF computes the similarity between users and recommends to the target user the items visited by similar users. IBCF computes the similarity between items and recommends items similar to those the user has already accessed. UBCF and IBCF follow a similar idea. However, the number of items any one user has visited is very small compared with the total number of items, which makes user similarity hard to calculate; from the item perspective, each item has more visit records, so similar items are easier to find. IBCF therefore tends to outperform UBCF in recommendation quality. Model-based collaborative filtering is a general term for a class of algorithms, typified by matrix factorization. For example, the low-dimensional orthogonal matrices produced by singular value decomposition (SVD) reduce the noise in the original matrix and can effectively reveal latent associations between users and items.

Unlike recommendation of traditional items (e.g., movies, music, jokes), the object of location recommendation is a place with geographic attributes. Users' visits to offline locations are highly susceptible to complex contextual factors and exhibit social, geographic and temporal characteristics. Socially, a user's choice of location is often influenced by friends' comments, since users trust what friends share more than what strangers share. Geographically, most check-ins occur within limited areas, such as around the user's home or office. Temporally, check-in activity follows specific patterns: for example, a user may check in near the office during the day and at a bar, movie theater or gym at night. These characteristics distinguish location recommendation from traditional recommendation. How to introduce the social context, location context and time context into traditional recommendation algorithms, and recommend a list of locations of interest to the user in real time, has therefore become an urgent need for social application platforms.

The check-in records, social relations, spatio-temporal data and other heterogeneous information in location-based social networks are valuable for modeling user behavior. Some recommendation systems already incorporate different types of context into location recommendation, but shortcomings remain, summarized as follows:

(1) Most location recommendation systems analyze and mine context such as user preference, social relationships and geographic influence to produce a single global recommendation list, and cannot make real-time recommendations according to the current time. However, users' needs differ across periods of the day: at 8 a.m. a user may go to a breakfast shop, at 4 p.m. to a coffee shop near the office, and at 8 p.m. to a movie theater or bar. Analyzing the user's current needs and preferences from time information and providing real-time location recommendations can therefore effectively improve recommendation accuracy and user satisfaction.

(2) The dynamic similarity of positions across time periods is ignored. Existing research does not consider how position similarity varies along the time dimension when mining it, and shares one global similarity matrix across all time periods. How to mine fine-grained characteristics of position similarity, so that different similarity matrices are generated for different times, is a technical problem to be solved.

(3) The aging characteristics of positions are ignored. The aging characteristic represents the novelty and popularity of a position at the current time, and prior studies have not mined it. In fact, positions visited frequently in the recent past are more popular with users than positions that were visited frequently long ago, and the aging characteristic of a position is correlated with its probability of being visited. A clear definition and calculation method for the position aging characteristic is therefore needed, and the result should be applied in the recommendation process to improve the timeliness of the recommendation system.

(4) A timeliness evaluation system for location recommendation is lacking. In a location-based social network, users' visiting habits are strongly affected by position aging characteristics; for example, a newly opened shopping mall usually attracts more customers. Whether a system can promptly recommend positions that match the user's recent interests is therefore an important evaluation criterion. However, existing recommendation research focuses mainly on the accuracy of results and neglects timeliness evaluation in location recommendation scenarios.

These shortcomings of existing location recommendation technology lead to significant defects in the design, development, deployment and operation of location-based social network platforms. In particular, on platforms with massive item information, they degrade the service quality of the recommendation system and in turn affect the sales performance of e-commerce systems.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a position recommendation method based on position aging characteristics and time-aware dynamic similarity, with the aim of building a real-time location recommendation system that is both accurate and timely. The method takes into account how position similarity changes along the time dimension and mines time-aware dynamic position similarity, which effectively improves the prediction accuracy of the location recommendation system. The invention also provides a definition and calculation method for the position aging characteristic and fuses the aging characteristic value into the recommendation process, which improves recommendation timeliness. In addition, taking systems theory as its theoretical basis, the invention treats the evaluation system as an essential component of the recommendation system, measuring not only prediction accuracy but also timeliness. It proposes a timeliness evaluation system for location recommendation, providing technical support for quantifying the novelty of the system's output.

The technical scheme adopted to solve the technical problem is as follows. A day is divided into 8 time slots; the user-time-position three-dimensional scoring matrix is split by time slot, the user-position two-dimensional scoring matrix of each time slot is extracted, and the dynamic similarity of positions in each time slot is calculated from each scoring matrix using the Jaccard coefficient. The traditional item-based collaborative filtering method is improved so that the user's scores for unvisited locations are calculated using the dynamic position similarity. A time-aware calculation method for the position aging characteristic is defined, and the aging characteristic value of each position in each time slot is mined one by one. Kernel density estimation is used to fit the personalized distribution of a user's check-ins, quantify the influence of geographic distance on check-ins, build a personalized probability density model, and calculate the user's check-in probability at unvisited locations. Finally, the user's historical preference, geographic distance influence, position aging characteristics and dynamic position similarity are considered together to calculate a final prediction score that fuses the user context, position context and time context; the final prediction scores of all unvisited locations are sorted, and the top-ranked positions are recommended to the user.

The method comprises the following specific processes:

Step 1: Collect and organize the original check-in data set, convert the specific time in each check-in record into a time slot, count each user's check-ins at each position in each time slot, and convert the check-in records into a user-time-position three-dimensional scoring matrix according to the counts.

Step 2: Split the user-time-position three-dimensional scoring matrix by time slot, extract the user-position two-dimensional scoring matrix of each time slot, and calculate the dynamic similarity of positions in each time slot from each scoring matrix using the Jaccard coefficient. Improve the traditional item-based collaborative filtering method, using the dynamic position similarity and the scores the user has already given to predict the user's scores for unvisited locations.

Step 3: For each position, record the time at which each user last accessed the position in each time slot, and calculate, user by user, the aging characteristic value of the position in each time slot. Within each time slot, average the position aging characteristic values over all users who have accessed the position to obtain the time-aware aging characteristic value of the position.

Step 4: Calculate the geographic distances between all positions in the check-in data set from their longitude and latitude, perform personalized probability density modeling using kernel density estimation, and mine the influence of geographic features for each user individually.

Step 5: Considering together the influence of the user context, position context and time context on check-in behavior, construct a score prediction mechanism that integrates the user's historical preference, geographic distance influence, position aging characteristics and dynamic position similarity, and recommend the positions with the highest final prediction scores to the user.

Step 6: Propose a timeliness evaluation index and define a timeliness evaluation system for recommendation systems. Using accuracy evaluation indexes and the timeliness evaluation index, compare the prediction accuracy and recommendation timeliness of the proposed recommendation system with those of other classical recommendation systems, and evaluate the accuracy and effectiveness of the proposed technique.

The beneficial effects are that:

1. Building on comprehensive consideration of context such as user preference and geographic influence, the position recommendation method based on position aging characteristics and time-aware similarity further considers the influence of time context on users' check-in behavior, extending the recommendation result from the binary "user-position" relation to a three-dimensional "user-time-position" model. The technical scheme analyzes users' current needs and preferences from time information and dynamically provides real-time location recommendations. This improves user stickiness on location-based social network platforms and helps merchants push advertisements to users in real time, bringing them greater commercial benefit.

2. The method takes into account how position similarity changes along the time dimension, mines time-aware position similarity, and achieves adaptive recommendation on top of item-based collaborative filtering, effectively improving the prediction accuracy of the location recommendation system. The scheme applies not only to item-based collaborative filtering systems but also to user-based collaborative filtering methods, and is therefore portable.

3. The invention mines the aging characteristic of positions and quantifies how a position's attraction to users decays over time, so that recommendations better match users' time-sensitive preferences. The method can offer users more locations that match their recent interests, which greatly improves user satisfaction with the social network platform and is of significant practical value.

4. The invention defines a timeliness evaluation system for location recommendation, provides a technical index for checking whether recommendations match users' recent interests, and fills a gap in the timeliness evaluation of location recommendation systems. In practical applications, feeding the timeliness evaluation results back into the recommendation system improves its ability to update and strengthens its robustness.

5. The method applies not only to location recommendation systems but also to personalized recommendation of other traditional items; it is highly portable and has broad prospects for industrial application.

Drawings

FIG. 1 is a flow chart of a location recommendation method based on location aging characteristics and time-aware similarity according to the present invention.

FIG. 2 is a flowchart showing the specific steps of a location recommendation method based on location aging characteristics and time-aware similarity according to the present invention.

FIG. 3 is a schematic diagram of the statistics of the number of positions accessed by users and the number of users in an embodiment of the present invention.

FIG. 4 is a box plot of Precision after 100 runs of the position recommendation method based on position aging characteristics and time-aware similarity in an embodiment of the present invention.

FIG. 5 is a box plot of Recall after 100 runs of the position recommendation method based on position aging characteristics and time-aware similarity in an embodiment of the present invention.

FIG. 6 is a box plot of the combined accuracy index F1 after 100 runs of the position recommendation method based on position aging characteristics and time-aware similarity in an embodiment of the present invention.

FIG. 7 is a bar chart comparing the Precision of the recommendation method with that of classical item-based collaborative filtering (IBCF), user-based collaborative filtering (UBCF) and the kernel density estimation-based access probability prediction method (KDE) in an embodiment of the present invention.

FIG. 8 is a bar chart comparing the Recall of the recommendation method with that of classical item-based collaborative filtering (IBCF), user-based collaborative filtering (UBCF) and the kernel density estimation-based access probability prediction method (KDE) in an embodiment of the present invention.

FIG. 9 is a bar chart comparing the combined accuracy index F1 of the recommendation method with that of classical item-based collaborative filtering (IBCF), user-based collaborative filtering (UBCF) and the kernel density estimation-based access probability prediction method (KDE) in an embodiment of the present invention.

FIG. 10 is a bar chart comparing the timeliness index of the recommendation method with that of classical item-based collaborative filtering (IBCF), user-based collaborative filtering (UBCF) and the kernel density estimation-based access probability prediction method (KDE) in an embodiment of the present invention.

Detailed Description

The invention will be described in further detail with reference to the drawings.

As shown in FIG. 1 and FIG. 2, the design and implementation of the present invention comprise the following steps:

Step 1: Collect and organize the original check-in data set, convert the specific time in each check-in record into a time slot, count each user's check-ins at each position in each time slot, and convert the check-in records into a user-time-position three-dimensional scoring matrix according to the counts. The operation steps are as follows:

Step 1-1: The original check-in data set C is collected, and the check-in time in each record is rounded to the hour. The rounded check-in time is denoted Time, with the set of values Time = {0, 1, 2, 3, …, 23}. The day is divided into 8 discrete time slots t, and the set of time slots is denoted T = {0, 1, 2, …, 7}. The rounded check-in time is converted to time slot t as follows:
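The conversion formula itself is not reproduced in this text. As an illustration only, the sketch below assumes that the 24 rounded hours are split into eight equal three-hour slots starting at midnight; the actual slot boundaries used by the method may differ, and the function name `hour_to_slot` is hypothetical.

```python
def hour_to_slot(hour: int, slots_per_day: int = 8) -> int:
    """Map a rounded check-in hour (0..23) to a time slot (0..slots_per_day - 1).

    Assumption: slots have equal width and the first slot starts at midnight.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 24 % slots_per_day != 0:
        raise ValueError("slots_per_day must divide 24")
    hours_per_slot = 24 // slots_per_day
    return hour // hours_per_slot


# Under this assumption, 08:00 falls in slot 2 and 20:00 in slot 6.
print(hour_to_slot(8), hour_to_slot(20))
```

Any other mapping from Time to T can be substituted without changing the later steps, since they only depend on the slot index t.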

Step 1-2: The check-in data set, once converted to time slots, is organized into n check-in records, denoted C = {c_1, c_2, …, c_n}. Each check-in record contains the user ID, the check-in time slot t, and the ID, longitude and latitude of the accessed position, written c_i = <userID, t, locationID, longitude, latitude>, i ∈ [1, n]. The set of all users in the check-in data set is denoted U and the set of all positions L; the numbers of users and positions are denoted NU and NL, respectively.

Step 1-3: The score of user u for position l in time slot t is defined as follows: if user u accessed position l during the check-in period corresponding to time slot t, then r_{u,t,l} = 1; otherwise, r_{u,t,l} = 0.

All scores together form the user-time-position three-dimensional scoring matrix R = {r_{u,t,l}}, u ∈ U, t ∈ [0, 7], l ∈ L. The scoring matrix R has NU × 8 rows and NL columns, where NU and NL are the total numbers of users and positions, respectively.
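A minimal sketch of how the binary scores of Step 1-3 could be assembled, assuming the check-in records have already been reduced to (userID, t, locationID) triples. The function name `build_score_tensor` and the sample IDs are hypothetical; NumPy is used only for convenience.

```python
import numpy as np


def build_score_tensor(records, users, locations, n_slots=8):
    """Return R with shape (NU, n_slots, NL), where R[u, t, l] = 1 if user u
    checked in at location l during slot t, and 0 otherwise."""
    user_index = {u: i for i, u in enumerate(users)}
    loc_index = {l: j for j, l in enumerate(locations)}
    R = np.zeros((len(users), n_slots, len(locations)), dtype=np.int8)
    for user_id, slot, location_id in records:
        R[user_index[user_id], slot, loc_index[location_id]] = 1
    return R


# Illustrative data only.
records = [("u1", 2, "cafe"), ("u1", 6, "cinema"), ("u2", 2, "cafe")]
R = build_score_tensor(records, ["u1", "u2"], ["cafe", "cinema"])

# The (NU x 8)-row, NL-column layout described in the text is a reshape away.
R_flat = R.reshape(-1, R.shape[2])
```

Keeping the time slot as its own axis makes the per-slot matrices of Step 2 a plain slice, `R[:, t, :]`.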

Step 2: Split the user-time-position three-dimensional scoring matrix by time slot, extract the user-position two-dimensional scoring matrix of each time slot, and calculate the dynamic similarity of positions in each time slot from each scoring matrix using the Jaccard coefficient. Improve the traditional item-based collaborative filtering method, using the dynamic position similarity and the scores the user has already given to predict the user's scores for unvisited locations. The specific operation steps are as follows:

Step 2-1: The user-time-position three-dimensional scoring matrix is split by the value of time slot t into eight user-position two-dimensional scoring matrices, from t = 0 to t = 7, denoted R_t = {r_{u,l}}, t ∈ [0, 7], u ∈ U, l ∈ L.

Step 2-2: For each two-dimensional scoring matrix R_t, the Jaccard coefficient, which is better suited to binary scores, is used to calculate position similarity. The time-aware dynamic similarity between positions l_i and l_j is calculated separately for each time slot t:

sim(l_i, l_j, t) = |U_{i,t} ∩ U_{j,t}| / |U_{i,t} ∪ U_{j,t}|    (2)

where i ∈ [1, NL], j ∈ [1, NL], t ∈ [0, 7]; U_{i,t} denotes the set of users who accessed position l_i in time slot t, and U_{j,t} denotes the set of users who accessed position l_j in time slot t.
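The Jaccard coefficient of Step 2-2 can be computed for all position pairs of one time slot at once. The sketch below assumes R_t is available as a binary NU × NL array; the function name `jaccard_similarity` is hypothetical, and returning 0 when neither position has any visitor in the slot is a convention chosen here, not something the text specifies.

```python
import numpy as np


def jaccard_similarity(R_t):
    """Pairwise Jaccard similarity between the columns (positions) of a
    binary user-by-position matrix for a single time slot."""
    B = (np.asarray(R_t) > 0).astype(np.int64)
    intersection = B.T @ B                     # |U_i,t ∩ U_j,t|
    visitors = B.sum(axis=0)                   # |U_i,t|
    union = visitors[:, None] + visitors[None, :] - intersection
    sim = np.zeros(intersection.shape, dtype=float)
    # Leave similarity at 0 where the union is empty (assumed convention).
    np.divide(intersection, union, out=sim, where=union > 0)
    return sim


# Position 0 is visited by users {0, 1}, position 1 by {0, 2}, position 2 by nobody.
R_t = [[1, 1, 0],
       [1, 0, 0],
       [0, 1, 0]]
sim_t = jaccard_similarity(R_t)
```

Running this once per slot yields eight similarity matrices, one for each t in [0, 7], instead of the single global matrix criticized in the Background.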

Step 2-3: A target user u_a in the location-based social network is selected as the object of the recommendation service, and the current recommendation time time_r is converted to the corresponding time slot t_r.

Step 2-4: The traditional item-based collaborative filtering method is improved: the dynamic position similarity and the scores already given by target user u_a are used to predict the user's score for an unvisited location l:

where u_a is the target user currently served by the recommendation system, t_r is the time slot in which the recommendation service is provided, l is a position not yet visited by the target user, L is the set of all positions, sim(l, l', t_r) is the dynamic similarity between positions l and l' in time slot t_r, and r_{u_a,t_r,l'} is the score of target user u_a for position l' in time slot t_r.
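The prediction formula of Step 2-4 is not reproduced in this text. The sketch below assumes the usual item-based collaborative filtering form, a similarity-weighted average of the target user's scores over all other positions l' in L, with the similarity taken from the slot t_r. This is an assumption about the form, not the patent's exact equation, and `predict_scores` is a hypothetical name.

```python
import numpy as np


def predict_scores(r_a, sim_t):
    """Predict scores of one user for unvisited positions in slot t_r.

    r_a   : length-NL binary vector of the user's scores in slot t_r
    sim_t : NL x NL dynamic similarity matrix for slot t_r
    Returns a length-NL vector; entries for visited positions are NaN.
    """
    r_a = np.asarray(r_a, dtype=float)
    s = np.array(sim_t, dtype=float)           # copy, so the input is untouched
    np.fill_diagonal(s, 0.0)                   # exclude l' = l from the sums
    numerator = s @ r_a                        # sum of sim(l, l', t_r) * r(u_a, t_r, l')
    denominator = s.sum(axis=1)                # sum of sim(l, l', t_r)
    pred = np.zeros_like(numerator)
    np.divide(numerator, denominator, out=pred, where=denominator > 0)
    pred[r_a > 0] = np.nan                     # only unvisited positions are candidates
    return pred


sim_example = [[1.0, 0.5, 0.0],
               [0.5, 1.0, 0.5],
               [0.0, 0.5, 1.0]]
pred = predict_scores([1, 0, 0], sim_example)
```

Because sim_t changes with t_r, the same user history can produce different predicted scores at different times of day, which is the point of the time-aware similarity.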

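Step 3: For each position, record the time at which each user last accessed the position in each time slot, and calculate, user by user, the aging characteristic value of the position in each time slot. Within each time slot, average the position aging characteristic values over all users who have accessed the position to obtain the time-aware aging characteristic value of the position. The implementation steps are as follows: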

Step 3-1: The earliest and latest times at which a check-in occurs in the check-in data set are recorded as minT and maxT, respectively.

Step 3-2: For each position l, the set of all users who have accessed l is denoted U_l. For a user u ∈ U_l, the time at which u last accessed position l in time slot t is recorded as recentT(u, l, t). For user u, the aging characteristic value timeline(u, l, t) of position l in time slot t is calculated as follows:

where minT and maxT are the earliest and latest check-in times in the check-in data set, respectively, and recentT(u, l, t) is the time at which user u last accessed position l in time slot t.

Step 3-3: The timeline(u, l, t) values of all users in U_l are averaged to obtain the time-aware aging characteristic value of position l in time slot t:

timeline(l, t) = (1 / |U_l|) Σ_{u ∈ U_l} timeline(u, l, t)    (5)

where U_l is the set of all users who have accessed position l, and timeline(u, l, t) is the aging characteristic value of position l for user u in time slot t.
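The per-user formula of Step 3-2 is not reproduced in this text. Since it is defined from minT, maxT and recentT(u, l, t), the sketch below assumes a min-max form, (recentT − minT) / (maxT − minT), so that the oldest check-in maps to 0 and the newest to 1. It also assumes that a user in U_l who never visited l in slot t contributes 0 to the average. Both are assumptions, and the function names are hypothetical.

```python
def timeline_user(recent_t, min_t, max_t):
    """Assumed per-user aging value: 0 for the oldest check-in time in the
    data set, 1 for the latest."""
    if max_t <= min_t:
        raise ValueError("max_t must be later than min_t")
    return (recent_t - min_t) / (max_t - min_t)


def timeline_location(recent_by_user, min_t, max_t):
    """Average the per-user aging values over all users in U_l.

    recent_by_user maps each user in U_l to recentT(u, l, t), or to None if
    that user never visited the position in slot t (assumed to count as 0).
    """
    if not recent_by_user:
        return 0.0
    values = [
        0.0 if ts is None else timeline_user(ts, min_t, max_t)
        for ts in recent_by_user.values()
    ]
    return sum(values) / len(values)


# Timestamps are illustrative numbers on an arbitrary scale.
value = timeline_location({"u1": 100, "u2": 50, "u3": None}, min_t=0, max_t=100)
```

With this form, a position whose visits in a slot are all recent scores close to 1, and one that was popular only early in the data set scores close to 0.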

Step 4: Calculate the geographic distances between all positions in the check-in data set from their longitude and latitude, perform personalized probability density modeling using kernel density estimation, and mine the influence of geographic features for each user individually. The implementation steps are as follows:

Step 4-1: All locations in the check-in data set C and their longitude and latitude are obtained, and the geographic distance between positions is calculated from the longitude and latitude of each location. Let the longitude and latitude of position l_i be lng_i and lat_i, written l_i = <lng_i, lat_i>, and those of position l_j be lng_j and lat_j, written l_j = <lng_j, lat_j>. The geographic distance between positions l_i and l_j is:

where R is the Earth's radius, R = 6371 km.
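The distance formula is not reproduced in this text. Because it is parameterized by the Earth's radius R = 6371 km, a great-circle distance is the natural reading; the sketch below uses the haversine formula as one common choice, which is an assumption (the original may use another spherical formula). The function name `haversine_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # R as given in the text


def haversine_km(lng_i, lat_i, lng_j, lat_j):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    d_phi = phi_j - phi_i
    d_lambda = math.radians(lng_j - lng_i)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# A quarter of the equator: from (lng 0, lat 0) to (lng 90, lat 0).
quarter = haversine_km(0.0, 0.0, 90.0, 0.0)
```

Applying the function to every pair of positions fills the NL × NL distance matrix; since the distance is symmetric, only the upper triangle needs to be computed.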

After the geographic distance between every pair of locations has been calculated, the distances form the distance matrix Dist = {dist_ij}, where dist_ij is the geographic distance between positions l_i and l_j, 1 ≤ i ≤ NL, 1 ≤ j ≤ NL. The matrix has NL rows and NL columns, where NL is the total number of locations in the check-in data set.

Step 4-2: The set of positions accessed by target user u_a is denoted L_a. The geographic distance d between each pair of positions in L_a is looked up in the distance matrix Dist to form the distance sample set X_a, and the distance distribution is estimated by a probability density function f over distance d:

where σ is the smoothing coefficient, also called the bandwidth, and K(·) is the Gaussian kernel function:

Step 4-3: Given the set L_a of positions accessed by target user u_a, the probability that u_a accesses candidate location l_i is:

where f is the probability density function given in equation 7.
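The formulas of Steps 4-2 and 4-3 are not reproduced in this text. The sketch below uses the standard kernel density estimator with a Gaussian kernel and bandwidth σ for f. For the access probability it assumes the mean of f over the distances from the candidate to each visited position; that averaging rule is an assumption, not a form confirmed by the text. All function names are hypothetical.

```python
import math
from itertools import combinations


def gaussian_kernel(x):
    """Standard Gaussian kernel K(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def density(d, samples, sigma):
    """Kernel density estimate f(d) from distance samples X_a with bandwidth sigma."""
    if not samples or sigma <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    total = sum(gaussian_kernel((d - x) / sigma) for x in samples)
    return total / (len(samples) * sigma)


def visit_probability(candidate, visited, dist, sigma):
    """Assumed form of the access probability: average density of the distances
    between the candidate and each position the user has already visited."""
    if len(visited) < 2:
        raise ValueError("need at least two visited positions to sample distances")
    samples = [dist(a, b) for a, b in combinations(visited, 2)]   # X_a
    return sum(density(dist(candidate, v), samples, sigma) for v in visited) / len(visited)


# Toy example with positions on a line, so distance is |a - b|.
p = visit_probability(2.0, [0.0, 1.0, 3.0], lambda a, b: abs(a - b), sigma=1.0)
```

In practice `dist` would be a lookup into the distance matrix Dist, and σ would be chosen per user, for example by a rule of thumb based on the spread of X_a; the text does not state which rule is used.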

Step 5: comprehensively considering the influence of the user context, the position context and the time context on the sign-in behavior of the user, constructing a scoring prediction mechanism integrating the historical preference of the user, the influence of the geographic distance, the position aging characteristic and the position dynamic similarity, and recommending a plurality of positions with higher final prediction scores to the user. The implementation steps are as follows:

Step 5-1: to calculate the target user u _a At the current recommended time slot t _r The final prediction score of the non-access address l is firstly subjected to min-max standardization processing on each pre-score generated in the step 2, the step 3 and the step 4:

wherein,is based on the improved collaborative filtering method based on the project, and the current recommended time slot t is utilized _r Calculating the score of the target user ua on the unvisited address l, as shown in the step 2-4; timeline (l, t) _r ) Is the candidate address l in the current recommended time slot t _r Based on the time-aware aging characteristic values, pr (l|L as shown in step 3-3 _{_a} ) Is based on the target user u _a Accessed location set L _{_a} The access probability of the candidate address L predicted by the kernel density estimation method is shown in step 4-3, where L is a set of all locations.

Step 5-2: Combining the location aging characteristic, the time-aware dynamic location similarity, and the influence of geographic distance on the user's check-in habits, calculate the final prediction score of target user u_a for candidate address l at t_r:

Step 5-3: Sort all addresses that target user u_a has not visited by final prediction score (equation 13), and recommend the top N locations to target user u_a.
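The ranking in step 5-3 can be sketched as follows. The sketch takes the final prediction scores as given and makes no assumption about how they were combined; the function name `recommend_top_n` is illustrative.

```python
import heapq

def recommend_top_n(final_scores: dict, visited: set, n: int) -> list:
    """Return the n unvisited locations with the highest final prediction score."""
    candidates = {loc: s for loc, s in final_scores.items() if loc not in visited}
    # Iterating a dict yields its keys; key=candidates.get ranks them by score.
    return heapq.nlargest(n, candidates, key=candidates.get)
```

For example, with scores `{"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.8}`, visited set `{"a"}`, and N = 2, the list returned is `["d", "c"]`.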

Step 6: Propose a timeliness evaluation index and define a timeliness evaluation system for the recommendation system. Using the accuracy evaluation indexes and the timeliness evaluation index, compare the prediction accuracy and recommendation timeliness of the recommendation system proposed by the invention with those of other classical recommendation systems, and thereby evaluate the accuracy and effectiveness of the proposed technique. The implementation steps are as follows:

Step 6-1: Randomly select NU × 10% of the users as the target user set TestU, where NU is the total number of users in the check-in dataset. For each target user u_a in TestU, run the recommendation algorithm to generate the recommendation list topNList(u_a, t_r) for the current recommendation time slot t_r.

Step 6-2: Calculate the timeliness index value of the recommendation method when it serves target user u_a in time slot t_r. This value is defined as the average, for target user u_a, of the aging characteristic values of all locations in the recommendation list:

where u_a is the target user, t_r is the time slot corresponding to the current recommendation time, topNList(u_a, t_r) is the recommendation list that the method provides to target user u_a in time slot t_r, and Timeliness(u_a, l, t_r) is the aging characteristic value of location l in time slot t_r for user u_a (equation 4).

Step 6-3: Calculate the timeliness of the recommendation method in time slot t_r:

where TestU is the target user set, and Timeliness(u, t_r) is the timeliness index value of the recommendation method when it serves target user u in time slot t_r (equation 14).

Step 6-4: Define the final Timeliness of the recommendation method as the average of the timeliness indexes over all time slots:

where T is the set of time slots, T = {0, 1, 2, 3, 4, 5, 6, 7}, and Timeliness(t) is the timeliness of the recommendation method in time slot t (equation 15).
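Steps 6-2 to 6-4 describe three nested averages: over the recommendation list, over the target users, and over the time slots. The sketch below illustrates that structure. The per-slot formula of step 6-3 is not legible in this text, so averaging over TestU is an assumption; the dictionary layout and the default of 0.0 for a missing aging value are also assumptions made for illustration.

```python
def user_timeliness(top_n_list, aging, user, slot):
    """Step 6-2: mean aging value over one user's recommendation list."""
    if not top_n_list:
        return 0.0
    # aging maps (user, location, slot) -> aging characteristic value;
    # a missing entry is treated as 0.0 (assumption).
    total = sum(aging.get((user, loc, slot), 0.0) for loc in top_n_list)
    return total / len(top_n_list)

def slot_timeliness(rec_lists, aging, slot):
    """Step 6-3 (assumed form): mean of the per-user index over TestU."""
    if not rec_lists:
        return 0.0
    total = sum(user_timeliness(lst, aging, u, slot) for u, lst in rec_lists.items())
    return total / len(rec_lists)

def overall_timeliness(rec_lists_by_slot, aging):
    """Step 6-4: mean of the per-slot index over all time slots."""
    if not rec_lists_by_slot:
        return 0.0
    total = sum(slot_timeliness(lists, aging, t) for t, lists in rec_lists_by_slot.items())
    return total / len(rec_lists_by_slot)
```

`rec_lists` maps each target user to that user's recommendation list for one slot, and `rec_lists_by_slot` maps each slot number to such a dictionary.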

Step 6-5: Calculate the precision and recall of the recommendation method in time slot t_r:

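where TestU is the set of all target users, and TP(u, t_r), FP(u, t_r), and FN(u, t_r) are, respectively, the number of true positives (recommended locations that the user actually visits), false positives (recommended locations that the user does not visit), and false negatives (locations that the user visits but that are not in the recommendation list).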

Step 6-6: Calculate the final precision and recall of the recommendation method as the averages of the corresponding per-slot values:

where T is the set of time slots, T = {0, 1, 2, 3, 4, 5, 6, 7}, and Precision(t) and Recall(t) are the precision and recall of the recommendation method in time slot t, respectively.

Step 6-7: calculating the comprehensive accuracy F1 value of the recommendation system:

where Precision and Recall are the overall precision and recall, respectively, of one run of the recommendation method.
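The accuracy indexes of steps 6-5 to 6-7 can be sketched as follows. F1 is the usual harmonic mean 2 · Precision · Recall / (Precision + Recall). The per-slot formulas are not legible in this text, so pooling TP, FP, and FN over all target users before dividing is an assumption, as are the function names and the zero returned for empty denominators.

```python
def precision_recall(rec_lists, ground_truth):
    """Precision and recall for one time slot, pooled over all target users."""
    tp = fp = fn = 0
    for user, recommended in rec_lists.items():
        rec_set = set(recommended)
        relevant = ground_truth.get(user, set())
        hits = len(rec_set & relevant)
        tp += hits                      # recommended and actually visited
        fp += len(rec_set) - hits       # recommended but not visited
        fn += len(relevant) - hits      # visited but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

`ground_truth` maps each target user to the set of locations that the user actually visits in the held-out data for the slot under evaluation.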

Step 6-8: Repeat steps 6-1 to 6-7 a number of times, denoted Ntimes. The final values of the timeliness index Timeliness, the precision Precision, the recall Recall, and the comprehensive accuracy index F1 of the recommendation method are the averages of the corresponding results over the Ntimes runs.

Step 6-9: Compare and analyze the results for each index. If the Precision of the position recommendation method based on position aging characteristics and time-aware dynamic similarity is greater than that of the other recommendation algorithms, the proposed technique has higher prediction accuracy. If its Recall is greater than that of the other algorithms, the proposed technique has stronger recall capability, that is, it retrieves a larger share of the locations that users actually visit. If its F1 value is greater than that of the other algorithms, the proposed technique has stronger overall prediction accuracy. If its Timeliness is greater than that of the other algorithms, the proposed technique better captures users' recent preferences and has stronger timeliness.

In the following, the location-based social network Brightkite is taken as a concrete example to describe in detail how the position recommendation method based on position aging characteristics and time-aware dynamic similarity works.

The Brightkite dataset, collected by the SNAP laboratory at Stanford University, contains the social relationships and check-in information of 58228 users of the Brightkite website from April 2008 to October 2010. The dataset contains 693362 locations, 4747281 user check-in records, and 214078 social relationships among users. It is one of the test datasets most commonly used by researchers of location recommendation systems.

The invention uses the check-in data of five popular areas in the Brightkite dataset for illustration: Los Angeles, San Francisco, New York, Maricopa, and King County.

Step 1-1: Collect the original check-in dataset C and round the check-in time in each check-in record down to the hour. The rounded check-in time is denoted Time, and the set of its values is Time = {0, 1, 2, 3, …, 23}. For example, a check-in at 00:33:56 gives Time = 0, and a check-in at 23:02:54 gives Time = 23.

The day is divided into 8 discrete time slots t, and the set of time slots is denoted T = {0, 1, 2, …, 7}. The rounded check-in time is converted to a time slot t as follows:

Specifically, the conversion between the rounded check-in time and the time slot t is shown in Table 1:

TABLE 1 conversion relationship between check-in time and time slot
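The conversion formula and the body of Table 1 did not survive in this text. The sketch below assumes that each slot covers three consecutive hours, so that t = floor(Time / 3). This agrees with the example in step 2-3 (22:01:34 falls in slot 7) and with the three check-in periods per slot mentioned in step 1-3, but it remains an assumption; the function name is illustrative.

```python
from datetime import time

def to_time_slot(check_in: time) -> int:
    """Map a check-in time to one of 8 slots, assuming 3 consecutive hours per slot."""
    rounded_hour = check_in.hour   # 00:33:56 -> 0, 23:02:54 -> 23
    return rounded_hour // 3       # assumed conversion: hours 0-2 -> 0, ..., 21-23 -> 7
```

Under this assumption, hours 0 to 2 fall in slot 0 and hours 21 to 23 fall in slot 7.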

Step 1-2: Organize the check-in dataset after conversion to time slots, which yields 56861 check-in records, denoted C = {c_1, c_2, …, c_56861}. Each check-in record contains the user ID of the check-in, the check-in time slot t, and the ID, longitude, and latitude of the visited location, denoted c_i = <userID, t, locationID, longitude, latitude>, i ∈ [1, 56861]. The set of all users in the check-in dataset is U and the set of all locations is L; the number of users is NU = 1963 and the number of locations is NL = 2574. In this embodiment, a schematic diagram of the statistics relating users' location visit counts to the number of users is shown in Fig. 3.

Step 1-3: The score of user u for location l in time slot t is defined as follows: if user u visited location l during any of the three check-in periods (Table 1) that correspond to time slot t, then r_{u,t,l} = 1; otherwise, r_{u,t,l} = 0.

All scores together form the user-time-location three-dimensional scoring matrix R = {r_{u,t,l}}, u ∈ U, t ∈ [0, 7], l ∈ L. The scoring matrix R has 1963 × 8 rows and 2574 columns.
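Because the scores are binary and most entries are zero, the scoring matrix can be held sparsely. The sketch below is one possible representation, not the layout used by the inventors: for each slot and user it stores only the set of locations whose score is 1. The record layout follows c_i = <userID, t, locationID, longitude, latitude>; the function names are illustrative.

```python
from collections import defaultdict

def build_scores(check_ins):
    """Return scores[t][u] = set of locations l with r_{u,t,l} = 1."""
    scores = defaultdict(lambda: defaultdict(set))
    for user_id, slot, location_id, _lng, _lat in check_ins:
        # Repeated check-ins at the same location and slot still give a score of 1.
        scores[slot][user_id].add(location_id)
    return scores

def score(scores, user, slot, location):
    """Look up r_{u,t,l}; anything not recorded is 0."""
    return 1 if location in scores.get(slot, {}).get(user, set()) else 0
```

Slicing by slot, as step 2-1 requires, is then just `scores[t]`.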

where i ∈ [1, 2574], j ∈ [1, 2574], t ∈ [0, 7], U_{i,t} is the set of users who visited location l_i in time slot t, and U_{j,t} is the set of users who visited location l_j in time slot t.
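Claim 1 states that the dynamic similarity is computed with the Jaccard coefficient. For two locations in one time slot, that is the size of the intersection of their visitor sets divided by the size of the union. A minimal sketch follows; returning 0.0 when neither location has visitors in the slot is an assumed convention.

```python
def dynamic_similarity(users_i: set, users_j: set) -> float:
    """Jaccard coefficient of the visitor sets U_{i,t} and U_{j,t} for one slot t."""
    union = users_i | users_j
    if not union:
        return 0.0  # assumed convention when neither location was visited in the slot
    return len(users_i & users_j) / len(union)
```

For example, visitor sets `{"a", "b", "c"}` and `{"b", "c", "d"}` share two of four distinct users, giving a similarity of 0.5. Because the sets depend on t, the same pair of locations can have different similarities in different slots.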

Step 2-3: Select a target user u_a in the location-based social network as the recipient of the recommendation service, and convert the current recommendation time Time_r into the corresponding time slot t_r. For example, if the current recommendation time is 22:01:34, then t_r = 7.

where u_a is the target user currently served by the recommendation system, t_r is the time slot in which the recommendation service is provided, l is a location that the target user has not visited, L is the set of all locations, sim(l, l', t_r) is the dynamic similarity between locations l and l' in time slot t_r, and the remaining term is the score of target user u_a for location l' in time slot t_r.
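The prediction formula itself is not legible in this text, and the source calls the method an improved item-based collaborative filter without the details surviving here. The sketch below therefore uses the classical similarity-weighted average as a stand-in, restricted to the current slot t_r. It should be read as an assumption about the general shape of the computation, not as the patent's formula.

```python
def predict_score(target, user_scores, similarity):
    """Classical item-based prediction for one unvisited location (stand-in formula).

    user_scores: {location l': score of the target user for l' in slot t_r}
    similarity:  callable (l, l') -> sim(l, l', t_r) for the current slot
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, r in user_scores.items():
        if other == target:
            continue
        s = similarity(target, other)
        weighted_sum += s * r
        weight_total += s
    return weighted_sum / weight_total if weight_total else 0.0
```

With binary scores, the result is the share of similarity mass that comes from locations the user has visited in the slot.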

The implementation steps are as follows:

Step 3-1: Record the earliest and the latest check-in times in the check-in dataset as minT and maxT, respectively. In this embodiment, the earliest check-in occurred at 00:37:58 on April 4, 2008, recorded as minT = 20080404003758; the latest check-in occurred at 14:31:04 on October 18, 2010, recorded as maxT = 20101018143104.

Step 3-2: For each location l, denote the set of all users who have visited l as U_l. For a user u ∈ U_l, record the time at which u most recently visited location l within time slot t as recentT(u, l, t). For user u, the aging characteristic value Timeliness(u, l, t) of location l in time slot t is calculated as follows:
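The formula of step 3-2 is missing from this text. Since it is defined in terms of minT, maxT, and recentT(u, l, t), the sketch below assumes a linear rescaling (recentT − minT) / (maxT − minT), so that more recent visits score closer to 1. Claim 1 states that the location-level value averages the per-user values over all users who visited the location; the second function follows that description. Both function names are illustrative.

```python
from datetime import datetime

def aging_value(recent_t: datetime, min_t: datetime, max_t: datetime) -> float:
    """Assumed per-user aging value: position of recentT between minT and maxT."""
    span = (max_t - min_t).total_seconds()
    if span <= 0:
        return 0.0
    return (recent_t - min_t).total_seconds() / span

def location_aging(recent_by_user: dict, min_t: datetime, max_t: datetime) -> float:
    """Time-aware aging value of a location in one slot: mean over its visitors U_l."""
    if not recent_by_user:
        return 0.0
    total = sum(aging_value(t, min_t, max_t) for t in recent_by_user.values())
    return total / len(recent_by_user)
```

With the embodiment's bounds, `aging_value` returns 0.0 for a visit at minT and 1.0 for a visit at maxT.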

After the geographic distance between every pair of addresses has been calculated, the distances form a distance matrix Dist = {dist_ij}, where dist_ij is the geographic distance between locations l_i and l_j, 1 ≤ i ≤ 2574, 1 ≤ j ≤ 2574. The matrix has 2574 rows and 2574 columns.
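The distance formula did not survive in this text. The sketch below assumes the great-circle (haversine) distance, a common choice for longitude and latitude pairs, together with the Earth radius of 6371 km stated in claim 3. The function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # value stated in claim 3

def haversine_km(lng_i, lat_i, lng_j, lat_j):
    """Great-circle distance in km between two points given in degrees (assumed formula)."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    d_phi = phi_j - phi_i
    d_lambda = math.radians(lng_j - lng_i)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_matrix(coords):
    """Symmetric matrix Dist = {dist_ij} for a list of (longitude, latitude) pairs."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(*coords[i], *coords[j])
            dist[i][j] = dist[j][i] = d
    return dist
```

Only the upper triangle is computed and mirrored, which for 2574 locations is roughly 3.3 million distance evaluations.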

where f is the probability density function as shown in equation 28.

Step 5-1: To calculate the final prediction score of target user u_a for an unvisited address l in the current recommendation time slot t_r, first apply min-max normalization to each pre-score generated in steps 2, 3, and 4:

where the first pre-score is the score of target user u_a for the unvisited address l in the current recommendation time slot t_r, computed with the improved item-based collaborative filtering method (step 2-4); Timeliness(l, t_r) is the time-aware aging characteristic value of candidate address l in the current recommendation time slot t_r (step 3-3); Pr(l | L_a) is the probability that candidate address l is visited, predicted by kernel density estimation from the set L_a of locations visited by target user u_a (step 4-3); and L is the set of all locations.

Step 5-3: Sort all addresses that target user u_a has not visited by final prediction score, and recommend the top N locations to target user u_a. N takes the values 10, 20, 30, 40, and 50 in turn.

Step 6-1: Randomly select 196 users as the target user set TestU. For each target user u_a in TestU, run the recommendation algorithm to generate the recommendation list topNList(u_a, t_r) for the current recommendation time slot t_r.

where u_a is the target user, t_r is the time slot corresponding to the current recommendation time, topNList(u_a, t_r) is the recommendation list that the method provides to target user u_a in time slot t_r, and Timeliness(u_a, l, t_r) is the aging characteristic value of location l in time slot t_r for user u_a (equation 25).

where TestU is the target user set, and Timeliness(u, t_r) is the timeliness index value of the recommendation method when it serves target user u in time slot t_r (equation 35).

where T is the set of time slots, T = {0, 1, 2, 3, 4, 5, 6, 7}, and Timeliness(t) is the timeliness of the recommendation method in time slot t (equation 36).

Step 6-8: Repeat steps 6-1 to 6-7 100 times. Box plots of the Precision, Recall, and comprehensive accuracy index F1 of the recommendation method over the 100 runs are shown in Figs. 4, 5, and 6, respectively.

The final values of the timeliness index Timeliness, the precision Precision, the recall Recall, and the comprehensive accuracy index F1 of the recommendation method are the averages of the corresponding results over the 100 runs. For N = 10, 20, 30, 40, and 50, the Precision, Recall, F1, and Timeliness results of the recommendation methods are shown in Tables 2, 3, 4, and 5, respectively; the value in bold in each row is the maximum of that row:

Table 2 Precision index values of the different recommendation methods

Table 3 Recall index values of the different recommendation methods

Table 4 F1 index values of the different recommendation methods

Table 5 Timeliness index values of the different recommendation methods

In this embodiment, bar charts comparing the Precision, Recall, comprehensive accuracy index F1, and Timeliness of the recommendation method of the invention with those of the classical item-based collaborative filtering method (IBCF), the user-based collaborative filtering method (UBCF), and the kernel density estimation based visit probability prediction method (KDE) are shown in Figs. 7, 8, 9, and 10, respectively.

Step 6-9: Compare and analyze the results for each index. The Precision of the position recommendation method based on position aging characteristics and time-aware dynamic similarity is greater than that of the other recommendation algorithms, which shows that the proposed technique has higher prediction accuracy. Its Recall is greater than that of the other algorithms, which shows that it has stronger recall capability. Its F1 value is greater than that of the other algorithms, which shows that it has stronger overall prediction accuracy. Its Timeliness is greater than that of the other algorithms, which shows that the proposed technique better captures users' recent preferences and has stronger timeliness.

Unlike existing location recommendation algorithms, the invention aims to build a real-time location recommendation system with high accuracy and high timeliness. It takes into account both the variation of location similarity across time periods and the inherent aging characteristic of locations. By mining the time-aware dynamic similarity of locations, it improves the prediction accuracy of the location recommendation system; by defining the location aging characteristic and giving a method for calculating it, it improves the recommendation timeliness of the system. In addition to measuring prediction accuracy, the invention designs a timeliness evaluation system for location recommendation systems, which provides technical support for quantifying the novelty of recommendation results. The proposed technique has broad application prospects in the market for location-based social networks.

The technical process described above is only a preferred embodiment of the invention and does not represent all of its details. Any modification, equivalent replacement, or improvement made by those skilled in the art within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. The position recommending method based on the position aging characteristics and the time perception dynamic similarity is characterized by comprising the following steps of:

step 1: collecting and organizing an original check-in data set, converting the specific time in each check-in record into a time slot, counting the check-ins of users at each location in the different time slots, and converting the check-in records into a user-time-location three-dimensional scoring matrix according to the statistical result;

step 2: dividing the user-time-location three-dimensional scoring matrix by time slot, extracting the user-location two-dimensional scoring matrix of each time slot, calculating for each scoring matrix the dynamic similarity of locations in the different time slots using the Jaccard coefficient, and predicting scores for unvisited addresses from the dynamic similarity of locations and the scores the user has already given, comprising:

step 2-1: dividing the user-time-location three-dimensional scoring matrix, according to the value of the time slot t, into eight user-location two-dimensional scoring matrices from t = 0 to t = 7, denoted R_t = {r_{u,l}}, t ∈ [0, 7], u ∈ U, l ∈ L;

wherein i ∈ [1, NL], j ∈ [1, NL], t ∈ [0, 7], NL denotes the number of all locations in the data set, U_{i,t} denotes the set of users who visited location l_i in time slot t, and U_{j,t} denotes the set of users who visited location l_j in time slot t;

step 2-3: selecting a target user u_a in the location-based social network as the recipient of the recommendation service, and converting the current recommendation time Time_r into the corresponding time slot t_r;

wherein u_a is the target user currently served by the recommendation system, t_r is the time slot in which the recommendation service is provided, l is a location that the target user has not visited, L is the set of all locations, sim(l, l', t_r) denotes the dynamic similarity between locations l and l' in time slot t_r, and the remaining term denotes the score of target user u_a for location l' in time slot t_r;

step 3: for each location, recording the specific time at which each user most recently visited the location in each time slot, calculating one by one, for each user, the aging characteristic value of the location in each time slot, and averaging, over all users who have visited the address, the aging characteristic values of the location in the different time slots to obtain the time-aware aging characteristic value of the location, comprising:

Step 3-1: recording the oldest time and the latest time of the sign-in behavior in the sign-in data set, and respectively recording as minT and maxT;

step 3-2: for each location l, denoting the set of all users who have visited l as U_l; if user u ∈ U_l, recording the time at which user u most recently visited location l within time slot t as recentT(u, l, t); for user u, the aging characteristic value Timeliness(u, l, t) of location l in time slot t is calculated as follows:

wherein minT and maxT are respectively the earliest time and the latest time of check-in behavior in the check-in data set, and recentT(u, l, t) is the time at which user u most recently visited location l within time slot t;

wherein U_l is the set of all users who have visited location l, and Timeliness(u, l, t) denotes the aging characteristic value of location l for user u in time slot t;

step 4: according to longitude and latitude information, calculating geographic distances among all positions in a sign-in data set, realizing personalized probability density modeling by using a kernel density estimation method, and mining geographic characteristic influence in a personalized way;

step 5: comprehensively considering the influence of the user context, the location context, and the time context on the user's check-in behavior, constructing a score prediction mechanism, and recommending a plurality of locations with higher final prediction scores to the user;

step 6: proposing a timeliness evaluation index, defining a timeliness evaluation system for the recommendation system, and using the accuracy evaluation indexes and the timeliness evaluation index to compare the prediction accuracy and the recommendation timeliness of the recommendation system of the invention with those of other classical recommendation systems, thereby evaluating the accuracy and effectiveness of the proposed technique, comprising:

step 6-1: randomly selecting NU × 10% of the users as the target user set TestU, wherein NU denotes the total number of users in the check-in data set, and, for each target user u_a in TestU, running the recommendation algorithm to generate the recommendation list topNList(u_a, t_r) for the current recommendation time slot t_r;

wherein u_a is the target user, t_r is the time slot corresponding to the current recommendation time, topNList(u_a, t_r) is the recommendation list that the method provides to target user u_a in time slot t_r, and Timeliness(u_a, l, t_r) is the aging characteristic value of location l in time slot t_r for user u_a;

wherein TestU is the target user set, and Timeliness(u, t_r) is the timeliness index value of the recommendation method when it serves target user u in time slot t_r;

step 6-4: defining the final Timeliness of the recommendation method as the average of the timeliness indexes over all time slots, as follows:

wherein T is the set of time slots, T = {0, 1, 2, 3, 4, 5, 6, 7}, and Timeliness(t) is the timeliness of the recommendation method in time slot t;

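wherein TestU is the set of all target users, and TP(u, t_r), FP(u, t_r), and FN(u, t_r) are respectively the number of true positives (recommended locations that the user actually visits), false positives (recommended locations that the user does not visit), and false negatives (locations that the user visits but that are not in the recommendation list);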

wherein T is the set of time slots, T = {0, 1, 2, 3, 4, 5, 6, 7}, and Precision(t) and Recall(t) are respectively the precision and recall of the recommendation method in time slot t;

wherein Precision and Recall are respectively the overall precision and recall of one run of the recommendation method;

step 6-8: repeating steps 6-1 to 6-7 a number of times, denoted Ntimes, wherein the final values of the timeliness index Timeliness, the precision Precision, the recall Recall, and the comprehensive accuracy index F1 of the recommendation method are the averages of the corresponding results over the Ntimes runs;

step 6-9: comparing and analyzing the results for each index: if the Precision of the method is greater than that of the other recommendation algorithms, the method has higher prediction accuracy; if the Recall of the method is greater than that of the other recommendation algorithms, the method has stronger recall capability; if the F1 value of the method is greater than that of the other recommendation algorithms, the method has stronger overall prediction accuracy; if the Timeliness of the method is greater than that of the other recommendation algorithms, the method better captures the recent preferences of users and has stronger timeliness.

2. The location recommendation method based on location aging characteristics and time-aware dynamic similarity according to claim 1, wherein step 1 of the method comprises:

step 1-1: collecting an original check-in data set C, rounding the check-in time in each check-in record down to the hour, and denoting the rounded check-in time as Time, the set of its values being Time = {0, 1, 2, 3, …, 23}; dividing the day into 8 discrete time slots t, the set of time slots being denoted T = {0, 1, 2, …, 7}, wherein the rounded check-in time is converted to a time slot t as follows:

step 1-2: organizing the check-in data set after conversion to time slots to obtain n check-in records, denoted C = {c_1, c_2, …, c_n}, wherein each check-in record contains the user ID, the check-in time slot t, and the ID, longitude, and latitude of the visited location, denoted c_i = <userID, t, locationID, longitude, latitude>, i ∈ [1, n]; the set of all users in the check-in data set is U, the set of all locations is L, and the numbers of users and locations are denoted NU and NL, respectively;

step 1-3: the score of user u for location l in time slot t is defined as follows: if user u visited location l during the check-in periods corresponding to time slot t, then r_{u,t,l} = 1; otherwise, r_{u,t,l} = 0;

3. The location recommendation method based on location aging characteristics and time-aware dynamic similarity according to claim 1, wherein step 4 of the method comprises:

step 4-1: acquiring all addresses in the check-in data set C together with their longitude and latitude information, and calculating the geographic distance between locations from the longitude and latitude of each address; letting the longitude and latitude of location l_i be lng_i and lat_i, denoted l_i = <lng_i, lat_i>, and the longitude and latitude of location l_j be lng_j and lat_j, denoted l_j = <lng_j, lat_j>, the geographic distance between locations l_i and l_j is:

where R is the earth radius, r=6371 km;

after the geographic distance between every pair of addresses has been calculated, a distance matrix Dist = {dist_ij} is formed, wherein dist_ij denotes the geographic distance between locations l_i and l_j, 1 ≤ i ≤ NL, 1 ≤ j ≤ NL, the matrix has NL rows and NL columns, and NL is the total number of addresses in the check-in data set;

step 4-2: denoting the set of locations visited by target user u_a as L_a, looking up from the distance matrix Dist the geographic distance d between each pair of locations in L_a to form the distance sample set X_a, and estimating the distance distribution by a probability density function f over the distance d:

wherein f is the probability density function.

4. The location recommendation method based on location aging characteristics and time-aware dynamic similarity according to claim 1, wherein step 5 of the method comprises:

step 5-1: to calculate the final prediction score of target user u_a for an unvisited address l in the current recommendation time slot t_r, first applying min-max normalization to each pre-score generated in steps 2, 3, and 4:

wherein the first pre-score is the score of target user u_a for the unvisited address l in the current recommendation time slot t_r, calculated with the improved item-based collaborative filtering method; Timeliness(l, t_r) is the time-aware aging characteristic value of candidate address l in the current recommendation time slot t_r; Pr(l | L_a) is the probability that candidate address l is visited, predicted by kernel density estimation from the set L_a of locations visited by target user u_a; and L is the set of all locations;

step 5-3: sorting all addresses that target user u_a has not visited by final prediction score, and recommending the top N locations to target user u_a.