CN111695917B - Commodity recommendation method, commodity recommendation system, electronic equipment and storage medium - Google Patents
Commodity recommendation method, commodity recommendation system, electronic equipment and storage medium Download PDFInfo
- Publication number
- CN111695917B CN111695917B CN201910181571.2A CN201910181571A CN111695917B CN 111695917 B CN111695917 B CN 111695917B CN 201910181571 A CN201910181571 A CN 201910181571A CN 111695917 B CN111695917 B CN 111695917B
- Authority
- CN
- China
- Prior art keywords
- cluster
- clustering
- behavior data
- center
- centers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 33
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 35
- 238000013507 mapping Methods 0.000 claims abstract description 27
- 238000004364 calculation method Methods 0.000 claims abstract description 21
- 230000006399 behavior Effects 0.000 claims description 97
- 230000002776 aggregation Effects 0.000 claims description 23
- 238000004590 computer program Methods 0.000 claims description 15
- 238000004220 aggregation Methods 0.000 claims description 12
- 230000003542 behavioural effect Effects 0.000 claims description 2
- 238000005259 measurement Methods 0.000 abstract 1
- 238000010586 diagram Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 238000003491 array Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0255—Targeted advertisements based on user history
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0255—Targeted advertisements based on user history
- G06Q30/0256—User search
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Engineering & Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a commodity recommendation method, a commodity recommendation system, electronic equipment and a storage medium. The commodity recommendation method comprises the following steps: acquiring a plurality of behavior data of a user in an European space; mapping all the behavior data to a Hamming space according to a local sensitive hash algorithm, wherein each line of data corresponds to one binary code in Yu Haiming spaces; selecting a plurality of first cluster centers from all binary codes; clustering all binary codes according to a plurality of first clustering centers to obtain a plurality of first cluster classes; for each first cluster, reselecting a first cluster center, and continuing to execute the step of clustering until the clustering termination condition is met; recommending commodities to the user according to the obtained first cluster classes. Because the exclusive or calculation between binary codes is quite simple, the clustering operation is performed based on the Hamming distance calculation similarity, the calculation amount of measurement similarity is greatly reduced, and the calculation complexity of commodity recommendation is further greatly reduced.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, a system, an electronic device, and a storage medium for recommending commodities.
Background
The user can show the purchase interests of the user through the actions such as searching, browsing the commodities and the like when shopping online, and recommend proper commodities to the user in order to meet the purchase interests of the user. The clustering is to divide a large number of data sets with unknown labels into a plurality of different categories according to the data characteristics existing in the data, and divide similar data into the same categories, and divide dissimilar data into different categories, thereby belonging to unsupervised learning. Commonly used clustering algorithms (such as K-means algorithms) generally use spatial distances such as euclidean distances to measure similarity between data. However, when the data set to be clustered is large, the time complexity and the space complexity of calculating the similarity based on the space distance are high, and the calculation for recommending the commodity is more complicated.
Disclosure of Invention
The technical problem to be solved by the embodiment of the invention is to overcome the defect of higher calculation complexity of commodity recommendation based on spatial distance clustering in the prior art, and provide a commodity recommendation method, a commodity recommendation system, electronic equipment and a storage medium.
The embodiment of the invention solves the technical problems through the following technical scheme:
the commodity recommending method is characterized by comprising the following steps:
acquiring a plurality of behavior data of a user in an European space, wherein the behavior data comprises at least one of search behavior data and browsing behavior data;
mapping all the behavior data to a Hamming space according to a local sensitive hash algorithm, wherein each line of data corresponds to one binary code in Yu Haiming spaces;
selecting a plurality of first cluster centers from all binary codes;
Clustering all binary codes according to a plurality of first clustering centers to obtain a plurality of first cluster classes;
For each first cluster, reselecting a first cluster center, and continuing to execute the step of clustering all binary codes according to the plurality of first cluster centers until a cluster termination condition is met;
And recommending commodities to the user according to the obtained first cluster after the clustering is ended.
Preferably, the step of clustering all binary codes according to the plurality of first clustering centers includes:
for each binary, calculating hamming distances of the binary from all first cluster centers;
Selecting the shortest hamming distance from all hamming distances;
and classifying the binary codes into a first cluster class of a first cluster center corresponding to the shortest Hamming distance.
Preferably, the step of selecting a plurality of first cluster centers among all binary codes includes:
Selecting a plurality of second hub classes among all the behavioral data;
setting binary codes corresponding to each second clustering center as first clustering centers;
For each first cluster class, the step of reselecting the first cluster center includes:
Setting behavior data corresponding to binary codes classified into the first cluster class to be classified into the same second cluster class, wherein different first cluster classes correspond to different second cluster classes;
reselecting a second cluster center among the second cluster types;
mapping the reselected second aggregation center to a Hamming space according to the local sensitive hash algorithm to obtain a recalculated binary code;
the recalculated binary code is set to the reselected first cluster center.
Preferably, in the step of selecting a plurality of second aggregation centers in all the behavior data, each second aggregation center corresponds to a different binary code;
and/or in the step of reselecting a second cluster center in the second cluster, the second cluster center is obtained by calculation according to Euclidean distances of all behavior data in the second cluster.
Preferably, the clustering termination condition includes:
Executing the step of reselecting the first cluster center for a preset number of times for each first cluster class, or enabling the variation of each first cluster center to be smaller than a preset variation;
and/or the locality sensitive hashing algorithm comprises SimHash.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements any of the above-mentioned commodity recommendation methods when executing the computer program.
A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of any of the above-mentioned commodity recommendation methods.
A merchandise recommendation system, the merchandise recommendation system comprising:
the behavior data acquisition module is used for acquiring a plurality of behavior data of a user in an European space, wherein the behavior data comprises at least one of search behavior data and browsing behavior data;
The mapping module is used for mapping all the behavior data to a Hamming space according to a local sensitive hash algorithm, wherein each line of data corresponds to one binary code in Yu Haiming spaces;
A first selection module for selecting a plurality of first cluster centers among all binary codes;
The clustering module is used for clustering all binary codes according to the plurality of first clustering centers to obtain a plurality of first cluster classes;
the second selection module is used for reselecting the first clustering center for each first cluster class and continuously calling the clustering module until the clustering termination condition is met;
and the commodity recommending module is used for recommending commodities to the user according to the obtained first cluster types after the clustering is ended.
Preferably, the clustering module includes:
A distance calculation unit for calculating Hamming distances between each binary code and all first cluster centers;
a shortest distance selecting unit for selecting a shortest hamming distance among all hamming distances;
and the classifying unit is used for classifying the binary codes into a first cluster class of a first cluster center corresponding to the shortest Hamming distance.
Preferably, the first selecting module includes:
A first selecting unit for selecting a plurality of second aggregation centers among all the behavior data;
the first setting unit is used for setting binary codes corresponding to each second clustering center as first clustering centers;
The second selection module includes:
a second setting unit, configured to set behavior data corresponding to binary codes classified in the first cluster class to be classified in the same second cluster class, where different first cluster classes correspond to different second cluster classes;
a second selecting unit, configured to reselect a second cluster center in the second cluster;
The mapping unit is used for mapping the reselected second aggregation center to the Hamming space according to the local sensitive hash algorithm to obtain a recalculated binary code;
and a third setting unit, configured to set the recalculated binary code as the reselected first cluster center.
Preferably, each second aggregation center selected by the first selecting unit corresponds to a different binary code;
and the second selecting unit calculates the second cluster center according to Euclidean distances of all the behavior data in the second cluster.
Preferably, the clustering termination condition includes:
For each first cluster, calling the second selection module for preset times, or enabling the variation of each first cluster center to be smaller than the preset variation;
and/or the locality sensitive hashing algorithm comprises SimHash.
The embodiment of the invention has the positive progress effects that: when the behavior data are clustered, the embodiment of the invention does not measure the similarity between the data according to the Euclidean distance and other spatial distances in the Euclidean space, but maps the behavior data in the Euclidean space to binary codes of the Hamming space through local sensitive hash, measures the similarity between the behavior data before conversion in the Euclidean space by measuring the similarity between the converted behavior data in the Hamming space, and greatly reduces the calculated amount of measuring the similarity, thereby greatly improving the clustering speed of the behavior data and further greatly reducing the calculation complexity of commodity recommendation.
Drawings
Fig. 1 is a flowchart of a commodity recommendation method according to embodiment 1 of the present invention.
Fig. 2 is a schematic hardware structure of an electronic device according to embodiment 2 of the present invention.
Fig. 3 is a flowchart of step S103 in the commodity recommendation method according to embodiment 4 of the present invention.
Fig. 4 is a flowchart of step S105 in the commodity recommendation method according to embodiment 4 of the present invention.
Fig. 5 is a schematic diagram of a commodity recommendation system according to embodiment 7 of the present invention.
Fig. 6 is a schematic block diagram of a commodity recommendation system according to embodiment 8 of the present invention.
Detailed Description
The invention is further illustrated by means of the following examples, which are not intended to limit the scope of the invention.
Example 1
The present embodiment provides a commodity recommendation method, and fig. 1 shows a flowchart of the present embodiment. Referring to fig. 1, the commodity recommendation method of the present embodiment includes:
s101, acquiring a plurality of behavior data of a user in an European space;
S102, mapping all behavior data to a Hamming space according to a local sensitive hash algorithm, wherein each line of data corresponds to one binary code in Yu Haiming spaces;
S103, selecting a plurality of first clustering centers from all binary codes;
s104, clustering all binary codes according to a plurality of first clustering centers to obtain a plurality of first cluster classes;
S105, for each first cluster, reselecting a first cluster center, and continuing to execute the step S104 until a cluster termination condition is met;
s106, recommending commodities to the user according to the obtained first cluster classes.
In the present embodiment, a plurality of behavior data in the european space is first acquired in step S101, where the behavior data may include, but is not limited to, search behavior data generated by a user on a line and browsing behavior data, and more specifically, the behavior data may include time data, session data, and the like.
Then in step S102, the behavior data are mapped to Hamming (Hamming) space one by one according to the locality sensitive hashing algorithm, so as to form binary codes corresponding to the behavior data one by one. The basic idea of the local sensitive hash algorithm (Locality-SENSITIVE HASHING, LSH) is that a hash function is designed to calculate hash values of two points in a high-dimensional space, and if the two points are closer, the probability that the hash values of the two points are the same is high; if the two points are far apart, then the probability that the hash values of the two points are the same is small. Thus, the binary code formed by the locality sensitive hashing algorithm in hamming space can reflect the similarity between behavior data in European space.
Specifically, in this embodiment SimHash may be used to map all behavior data to hamming space. Assuming that the behavior data includes X 1 = (1, 2,3, 4), the procedure for performing SimHash on X 1 is as follows:
Firstly, the components of X 1 are transformed by using a traditional Hash algorithm to obtain signature values, and the signature values are as follows:
next, if a value of 0 in the signature value is set to-1, there are:
Again, the weighting operation is performed on the current signature value, and assuming that the weights of the components in X 1 are 1, 3, 1, and 3, respectively, there are:
then, the current signature value is summed by bit, and then:
(1,2,3,4)→(-8,-2,0,-4)
Finally, the positive number in the signature value is set to 1, otherwise, the positive number is set to 0, and the following steps are included:
(1,2,3,4)→(-8,-2,0,-4)→(0,0,0,0)
That is, after the behavior data X 1 = (1, 2,3, 4) is mapped to the hamming space via SimHash, the resulting binary code is X' 1 = (0, 0).
Assuming that the behavior data also includes X 2 = (1, 2,3, 5), then after the behavior data X 2 is mapped to hamming space via SimHash described above, the resulting binary code is X' 2 = (0, 1).
Wherein the behavior data X 1 and X 2 are similar in European space, which differ only in the 4 th component (one 4 and one 5). After the SimHash transformation, behavior data X 1 and X 2 were also similar with a hamming distance of 1 in hamming space.
In step S103, N (N is a positive integer and may be set up in a customized manner according to the specific application) first cluster centers are selected from all binary codes, and are denoted as: k 1,K2,……,KN. In step S104, all binary codes in hamming space are clustered according to N first cluster centers, so as to obtain N first cluster classes corresponding to each first cluster center, which are marked as follows: c 1,C2,……,CN.
Specifically, for each binary code in hamming space, its hamming distance from the first cluster center K 1,K2,……,KN is calculated separately, noted as: d 1,D2,……,DN, from which the shortest hamming distance D M (m=1, 2, … …, N) is selected, then the binary code is categorized in the first cluster class C M of the first cluster center K M corresponding to the shortest hamming distance D M. Thus, clustering of all binary codes is achieved in hamming space.
In step S105, the first cluster center is reselected for each first cluster class, for example, K I (i=1, 2, … …, N) is used as the cluster center to obtain a first cluster class C I including a plurality of binary codes, the first cluster center K I is redetermined according to the plurality of binary codes, and the process goes to step S104 to cluster all binary codes in hamming space to obtain a first cluster class C I. The iteration is repeated until the clustering termination condition is met, and the clustering termination condition may include, but is not limited to, the iteration repetition number, that is, the iteration repeatedly executing step S104 or S105 is performed for a preset number of times, or the variation of the first cluster center K I in the hamming space is smaller than the preset variation and tends to converge.
After the clustering is terminated, the finally obtained first cluster classes can represent user behaviors of a plurality of classes, wherein the user behaviors of each class can reflect different shopping interests of the user. In step S106, the user is recommended with the appropriate merchandise according to the resulting plurality of first cluster classes, i.e., according to the inferred shopping interests.
Because the exclusive or calculation between the binary codes is quite simple, the clustering operation is performed based on the Hamming distance calculation similarity, the calculation amount of measuring the similarity is greatly reduced, the clustering speed of behavior data is greatly improved, and the calculation complexity of commodity recommendation is further greatly reduced.
Example 2
The present embodiment provides an electronic device, which may be expressed in the form of a computing device (for example, may be a server device), including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor may implement the commodity recommendation method provided in embodiment 1 when executing the computer program.
Fig. 2 shows a schematic hardware structure of the present embodiment, and as shown in fig. 2, the electronic device 9 specifically includes:
At least one processor 91, at least one memory 92, and a bus 93 for connecting the different system components (including the processor 91 and the memory 92), wherein:
The bus 93 includes a data bus, an address bus, and a control bus.
The memory 92 includes volatile memory such as Random Access Memory (RAM) 921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.
Memory 92 also includes a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 91 executes various functional applications and data processing such as the commodity recommendation method provided in embodiment 1 of the present invention by running a computer program stored in the memory 92.
The electronic device 9 may further communicate with one or more external devices 94 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 95. Also, the electronic device 9 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 96. The network adapter 96 communicates with other modules of the electronic device 9 via the bus 93. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in connection with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present application. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.
Example 3
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the commodity recommendation method provided in embodiment 1.
More specifically, among others, readable storage media may be employed including, but not limited to: portable disk, hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps of implementing the merchandise recommendation method in embodiment 1 when said program product is run on the terminal device.
Wherein the program code for carrying out the invention may be written in any combination of one or more programming languages, which program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on the remote device or entirely on the remote device.
Example 4
On the basis of embodiment 1, this embodiment provides a commodity recommendation method. Referring to fig. 3, in the present embodiment, step S103 specifically includes:
s1031, selecting a plurality of second aggregation centers from all behavior data;
s1032, setting the binary code corresponding to each second cluster center as the first cluster center.
Specifically, in this embodiment, N second aggregation centers are first selected from all behavior data, and are denoted as: l 1,L2,……,LN, and in step S102, the second aggregation center L 1,L2,……,LN maps to hamming space via a locality sensitive hashing algorithm to obtain N binary codes, denoted as: k 1,K2,……,KN, setting the binary code K 1,K2,……,KN as the first cluster center.
The difference between this embodiment and embodiment 1 is that the second cluster center is first selected in the euclidean space, and then the first cluster center corresponding to the second cluster center is determined in the hamming space according to the mapping relationship between the euclidean space and the hamming space. For example, N second cluster centers may be randomly selected among all behavior data according to the K-means algorithm, wherein the N first cluster centers are preferably different from each other, so that when selecting the second cluster centers, it should be noted that each second cluster center corresponds to a different binary code. For another example, N second aggregation centers may be selected from all behavior data directly according to the K-means++ algorithm, such that the second aggregation centers are far apart.
Referring to fig. 4, in the present embodiment, step S105 specifically includes:
s1051, setting behavior data corresponding to binary codes classified in a first cluster class to be classified in the same second cluster class;
s1052, in the second cluster, reselecting a second cluster center;
S1053, mapping the re-selected second aggregation center to a Hamming space according to a local sensitive hash algorithm to obtain a re-calculated binary code;
s1054, setting the recalculated binary code as the first cluster center of the reselection.
Specifically, in step S104, a first cluster class C I having K I as a cluster center and including a plurality of binary codes is obtained, and according to the mapping relationship between the euclidean space and the hamming space, the behavior data corresponding to the binary codes classified in the first cluster class C I may be classified in the second cluster class D I, that is, different first cluster classes correspond to different second cluster classes. In the second cluster D I, the second cluster center L I may be recalculated according to the euclidean distance of all the behavior data, and since the reselected second cluster center L I is calculated and may not exist in the behavior data, it is required to remap the binary code corresponding to the reselected second cluster center L I according to the locally sensitive hash algorithm to the hamming space, and further, the binary code obtained by the recalculation is set as the reselected first cluster center K I.
In this embodiment, the second clustering center is selected in the european style space, and then the first clustering center is determined according to the mapping relationship between the european style space and the hamming space, instead of directly selecting the first clustering center in the hamming space, another manner of determining the first clustering center is provided, and further another commodity recommendation method is provided.
Example 5
The present embodiment provides an electronic device, which may be expressed in the form of a computing device (for example, may be a server device), including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor may implement the commodity recommendation method provided in embodiment 4 when executing the computer program.
Example 6
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the merchandise recommendation method provided by embodiment 4.
Example 7
The embodiment provides a commodity recommendation system, and fig. 5 shows a schematic block diagram of the embodiment. Referring to fig. 5, the commodity recommendation system of the present embodiment includes:
The behavior data acquisition module 1 is used for acquiring a plurality of behavior data of a user in an European space;
The mapping module 2 is used for mapping all the behavior data to a Hamming space according to a local sensitive hash algorithm, wherein each line of data corresponds to one binary code in Yu Haiming spaces;
a first selecting module 3, configured to select a plurality of first cluster centers from all binary codes;
the clustering module 4 is used for clustering all binary codes according to a plurality of first clustering centers to obtain a plurality of first cluster types;
the second selecting module 5 is used for reselecting the first cluster center for each first cluster class, and continuously calling the clustering module 4 until the clustering termination condition is met;
And the commodity recommending module 6 is used for recommending commodities to the user according to the obtained first cluster types after the clustering is ended.
In the present embodiment, the behavior data acquisition module 1 is first invoked to acquire a plurality of behavior data in the european style space, wherein the behavior data may include, but is not limited to, search behavior data generated by a user on a line and browsing behavior data, and more specifically, the behavior data may include time data, session data, and the like.
And then, calling a mapping module 2 to map the behavior data to Hamming (Hamming) space one by one according to a local sensitive hash algorithm to form binary codes corresponding to the behavior data one by one. The basic idea of the local sensitive hash algorithm (Locality-SENSITIVE HASHING, LSH) is that a hash function is designed to calculate hash values of two points in a high-dimensional space, and if the two points are closer, the probability that the hash values of the two points are the same is high; if the two points are far apart, then the probability that the hash values of the two points are the same is small. Thus, the binary code formed by the locality sensitive hashing algorithm in hamming space can reflect the similarity between behavior data in European space.
Specifically, in this embodiment SimHash may be used to map all behavior data to hamming space. Assuming that the behavior data includes X 1 = (1, 2,3, 4), the procedure for performing SimHash on X 1 is as follows:
Firstly, the components of X 1 are transformed by using a traditional Hash algorithm to obtain signature values, and the signature values are as follows:
next, if a value of 0 in the signature value is set to-1, there are:
Again, the weighting operation is performed on the current signature value, and assuming that the weights of the components in X 1 are 1, 3, 1, and 3, respectively, there are:
then, the current signature value is summed by bit, and then:
(1,2,3,4)→(-8,-2,0,-4)
Finally, the positive number in the signature value is set to 1, otherwise, the positive number is set to 0, and the following steps are included:
(1,2,3,4)→(-8,-2,0,-4)→(0,0,0,0)
That is, after the behavior data X 1 = (1, 2,3, 4) is mapped to the hamming space via SimHash, the resulting binary code is X' 1 = (0, 0).
Assuming that the behavior data also includes X 2 = (1, 2,3, 5), then after the behavior data X 2 is mapped to hamming space via SimHash described above, the resulting binary code is X' 2 = (0, 1).
Wherein the behavior data X 1 and X 2 are similar in European space, which differ only in the 4 th component (one 4 and one 5). After the SimHash transformation, behavior data X 1 and X 2 were also similar with a hamming distance of 1 in hamming space.
The first selection module 3 selects N (N is a positive integer and may be set up in a customized manner according to a specific application) first cluster centers from all binary codes, and marks as: k 1,K2,……,KN. The clustering module 4 clusters all binary codes in the Hamming space according to the N first clustering centers to obtain N first cluster classes corresponding to the first clustering centers, and the N first cluster classes are recorded as follows: c 1,C2,……,CN.
Referring to fig. 5, the clustering module 4 includes a distance calculation unit 41, a shortest distance selection unit 42, and a classification unit 43. Specifically, the distance calculation unit 41 is configured to calculate, for each binary code in the hamming space, the hamming distance thereof from the first cluster center K 1,K2,……,KN, and is written as: d 1,D2,……,DN, the shortest distance selecting unit 42 is configured to select the shortest hamming distance D M (m=1, 2, … …, N) from the shortest distance, and the classifying unit 43 is configured to classify the binary code into the first cluster class C M of the first cluster center K M corresponding to the shortest hamming distance D M. Thus, clustering of all binary codes is achieved in hamming space.
For each first cluster class, the first cluster center is reselected, for example, K I (i=1, 2, … …, N) is used as a cluster center to obtain a first cluster class C I including a plurality of binary codes, the second selecting module 5 redetermines the first cluster center K I according to the plurality of binary codes, and clusters all binary codes in hamming space to obtain a first cluster class C I. The iteration is repeated until the clustering termination condition is met, and the clustering termination condition may include, but is not limited to, iteration repetition times, that is, a preset times clustering module 4 or a second selecting module 5 is called, or the variation of a first clustering center K I in the hamming space is smaller than a preset variation and tends to converge.
After the clustering is terminated, the finally obtained first cluster classes can represent user behaviors of a plurality of classes, wherein the user behaviors of each class can reflect different shopping interests of the user. The item recommendation module 6 is invoked to recommend the appropriate item to the user based on the resulting plurality of first cluster classes, i.e., based on the inferred shopping interests.
Because the exclusive or calculation between the binary codes is quite simple, the clustering operation is performed based on the Hamming distance calculation similarity, the calculation amount of measuring the similarity is greatly reduced, the clustering speed of behavior data is greatly improved, and the calculation complexity of commodity recommendation is further greatly reduced.
Example 8
On the basis of embodiment 7, this embodiment provides a commodity recommendation system, and fig. 6 shows a schematic block diagram of this embodiment. Referring to fig. 6, in the merchandise recommendation system of the present embodiment, the first selection module 3 specifically includes:
A first selecting unit 31 for selecting a plurality of second aggregation centers among all the behavior data;
The first setting unit 32 is configured to set the binary code corresponding to each second cluster center as the first cluster center.
Specifically, in the present embodiment, the first selection unit 31 first selects N second aggregation centers among all the behavior data, denoted as: l 1,L2,……,LN, and further, mapping the second aggregation center L 1,L2,……,LN to hamming space via a locality sensitive hashing algorithm to obtain N binary codes, which are recorded as: k 1,K2,……,KN, the first setting unit 32 may set the binary code K 1,K2,……,KN as the first cluster center.
The difference between this embodiment and embodiment 7 is that the second cluster center is selected in the euclidean space first, and then the first cluster center corresponding to the second cluster center is determined in the hamming space according to the mapping relationship between the euclidean space and the hamming space. For example, N second cluster centers may be randomly selected among all behavior data according to the K-means algorithm, wherein the N first cluster centers are preferably different from each other, so that when selecting the second cluster centers, it should be noted that each second cluster center corresponds to a different binary code. For another example, N second aggregation centers may be selected from all behavior data directly according to the K-means++ algorithm, such that the second aggregation centers are far apart.
In this embodiment, the second selecting module 5 specifically includes:
a second setting unit 51, configured to set behavior data corresponding to binary codes classified in the first cluster class to be classified in the same second cluster class;
a second selecting unit 52, configured to reselect a second cluster center in a second cluster;
a mapping unit 53, configured to map the re-selected second aggregation center to hamming space according to a locally sensitive hash algorithm, to obtain a re-calculated binary code;
a third setting unit 54 is configured to set the recalculated binary code to the reselected first cluster center.
Specifically, by calling the clustering module 4 to obtain a first cluster class C I having K I as a clustering center and including a plurality of binary codes, according to the mapping relationship between the european space and the hamming space, the second setting unit 51 may classify behavior data corresponding to the binary codes classified in the first cluster class C I into a second cluster class D I, that is, different first cluster classes correspond to different second cluster classes. In the second cluster D I, the second selecting unit 52 may recalculate and select the second cluster center L I according to the euclidean distance of all the behavior data, and since the recalculated second cluster center L I is calculated and may not exist in the behavior data, the mapping unit 53 needs to be called to remap to the hamming space according to the above-mentioned locally sensitive hash algorithm, so as to obtain the binary code corresponding to the reselected second cluster center L I, and further, the third setting unit 54 sets the recalculated binary code to be the reselected first cluster center K I.
In this embodiment, the second clustering center is selected in the european style space, and then the first clustering center is determined according to the mapping relationship between the european style space and the hamming space, instead of directly selecting the first clustering center in the hamming space, another manner of determining the first clustering center is provided, and further another commodity recommendation system is provided.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, but such changes and modifications fall within the scope of the invention.
Claims (12)
1. A commodity recommendation method, characterized in that the commodity recommendation method comprises:
acquiring a plurality of behavior data of a user in an European space, wherein the behavior data comprises at least one of search behavior data and browsing behavior data;
mapping all the behavior data to a Hamming space according to a local sensitive hash algorithm, wherein each line of data corresponds to one binary code in Yu Haiming spaces;
selecting a plurality of first cluster centers from all binary codes;
Clustering all binary codes according to a plurality of first clustering centers to obtain a plurality of first cluster classes;
For each first cluster, reselecting a first cluster center, and continuing to execute the step of clustering all binary codes according to the plurality of first cluster centers until a cluster termination condition is met;
And recommending commodities to the user according to the obtained first cluster after the clustering is ended.
2. The merchandise recommendation method according to claim 1, wherein said step of clustering all binary codes according to a plurality of first clustering centers comprises:
for each binary, calculating hamming distances of the binary from all first cluster centers;
Selecting the shortest hamming distance from all hamming distances;
and classifying the binary codes into a first cluster class of a first cluster center corresponding to the shortest Hamming distance.
3. The merchandise recommendation method of claim 1, wherein the step of selecting a plurality of first cluster centers among all binary codes comprises:
Selecting a plurality of second hub classes among all the behavioral data;
setting binary codes corresponding to each second clustering center as first clustering centers;
For each first cluster class, the step of reselecting the first cluster center includes:
Setting behavior data corresponding to binary codes classified into the first cluster class to be classified into the same second cluster class, wherein different first cluster classes correspond to different second cluster classes;
reselecting a second cluster center among the second cluster types;
mapping the reselected second aggregation center to a Hamming space according to the local sensitive hash algorithm to obtain a recalculated binary code;
the recalculated binary code is set to the reselected first cluster center.
4. The merchandise recommendation method according to claim 3, wherein in said step of selecting a plurality of second aggregation centers among all of the behavior data, each second aggregation center corresponds to a different binary code;
and/or in the step of reselecting a second cluster center in the second cluster, the second cluster center is obtained by calculation according to Euclidean distances of all behavior data in the second cluster.
5. The merchandise recommendation method of claim 1, wherein the cluster termination condition comprises:
Executing the step of reselecting the first cluster center for a preset number of times for each first cluster class, or enabling the variation of each first cluster center to be smaller than a preset variation;
and/or the locality sensitive hashing algorithm comprises SimHash.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the merchandise recommendation method of any one of claims 1-5 when the computer program is executed by the processor.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the merchandise recommendation method according to any one of claims 1-5.
8. A merchandise recommendation system, the merchandise recommendation system comprising:
the behavior data acquisition module is used for acquiring a plurality of behavior data of a user in an European space, wherein the behavior data comprises at least one of search behavior data and browsing behavior data;
The mapping module is used for mapping all the behavior data to a Hamming space according to a local sensitive hash algorithm, wherein each line of data corresponds to one binary code in Yu Haiming spaces;
A first selection module for selecting a plurality of first cluster centers among all binary codes;
The clustering module is used for clustering all binary codes according to the plurality of first clustering centers to obtain a plurality of first cluster classes;
the second selection module is used for reselecting the first clustering center for each first cluster class and continuously calling the clustering module until the clustering termination condition is met;
and the commodity recommending module is used for recommending commodities to the user according to the obtained first cluster types after the clustering is ended.
9. The merchandise recommendation system of claim 8, wherein the clustering module comprises:
A distance calculation unit for calculating Hamming distances between each binary code and all first cluster centers;
a shortest distance selecting unit for selecting a shortest hamming distance among all hamming distances;
and the classifying unit is used for classifying the binary codes into a first cluster class of a first cluster center corresponding to the shortest Hamming distance.
10. The merchandise recommendation system of claim 8 wherein the first selection module comprises:
A first selecting unit for selecting a plurality of second aggregation centers among all the behavior data;
the first setting unit is used for setting binary codes corresponding to each second clustering center as first clustering centers;
The second selection module includes:
a second setting unit, configured to set behavior data corresponding to binary codes classified in the first cluster class to be classified in the same second cluster class, where different first cluster classes correspond to different second cluster classes;
a second selecting unit, configured to reselect a second cluster center in the second cluster;
The mapping unit is used for mapping the reselected second aggregation center to the Hamming space according to the local sensitive hash algorithm to obtain a recalculated binary code;
and a third setting unit, configured to set the recalculated binary code as the reselected first cluster center.
11. The merchandise recommendation system of claim 10, wherein each second hub selected by said first selection unit corresponds to a different binary code;
And/or the second selecting unit calculates the second cluster center according to Euclidean distances of all behavior data in the second cluster.
12. The merchandise recommendation system of claim 8, wherein the cluster termination condition comprises:
For each first cluster, calling the second selection module for preset times, or enabling the variation of each first cluster center to be smaller than the preset variation;
and/or the locality sensitive hashing algorithm comprises SimHash.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910181571.2A CN111695917B (en) | 2019-03-11 | 2019-03-11 | Commodity recommendation method, commodity recommendation system, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910181571.2A CN111695917B (en) | 2019-03-11 | 2019-03-11 | Commodity recommendation method, commodity recommendation system, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111695917A CN111695917A (en) | 2020-09-22 |
CN111695917B true CN111695917B (en) | 2024-10-22 |
Family
ID=72474692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910181571.2A Active CN111695917B (en) | 2019-03-11 | 2019-03-11 | Commodity recommendation method, commodity recommendation system, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111695917B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210357515A1 (en) * | 2020-05-18 | 2021-11-18 | Gsi Technology Inc. | Secure similarity search for sensitive data |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113421136A (en) * | 2021-08-25 | 2021-09-21 | 深圳兆瑞优品科技有限公司 | Online shopping wind control method, device and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103631928A (en) * | 2013-12-05 | 2014-03-12 | 中国科学院信息工程研究所 | LSH (Locality Sensitive Hashing)-based clustering and indexing method and LSH-based clustering and indexing system |
CN105956093A (en) * | 2016-04-29 | 2016-09-21 | 浙江大学 | Individual recommending method based on multi-view anchor graph Hash technology |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101710334B (en) * | 2009-12-04 | 2012-01-25 | 大连理工大学 | Large-scale image library retrieving method based on image Hash |
CN101894130B (en) * | 2010-06-08 | 2011-12-21 | 浙江大学 | Sparse dimension reduction-based spectral hash indexing method |
US8625907B2 (en) * | 2010-06-10 | 2014-01-07 | Microsoft Corporation | Image clustering |
CN103064887B (en) * | 2012-12-10 | 2016-01-20 | 华为技术有限公司 | A kind of method and apparatus of recommendation information |
CN104199922B (en) * | 2014-09-01 | 2019-05-03 | 中国科学院自动化研究所 | A Large-scale Image Database Retrieval Method Based on Local Similarity Hash Algorithm |
CN104679835B (en) * | 2015-02-09 | 2017-10-31 | 浙江大学 | A kind of book recommendation method based on multi views Hash |
CN106951489A (en) * | 2017-03-13 | 2017-07-14 | 杭州师范大学 | A kind of personalized recommendation method and device for sparse big data |
CN108665333B (en) * | 2017-03-31 | 2021-04-30 | 北京京东尚科信息技术有限公司 | Commodity recommendation method and device, electronic equipment and storage medium |
CN107341178B (en) * | 2017-05-24 | 2020-05-29 | 北京航空航天大学 | A Data Retrieval Method Based on Adaptive Binary Quantization Hash Coding |
CN107067045A (en) * | 2017-05-31 | 2017-08-18 | 北京京东尚科信息技术有限公司 | Data clustering method, device, computer-readable medium and electronic equipment |
CN108052535B (en) * | 2017-11-15 | 2021-02-05 | 国家计算机网络与信息安全管理中心 | Visual feature parallel rapid matching method and system based on multiprocessor platform |
CN108491430B (en) * | 2018-02-09 | 2021-10-15 | 北京邮电大学 | An Unsupervised Hash Retrieval Method Based on Clustering Feature Directions |
CN108897847B (en) * | 2018-06-28 | 2021-05-14 | 中国人民解放军国防科技大学 | Multi-GPU density peak clustering method based on locality sensitive hashing |
CN109145143A (en) * | 2018-08-03 | 2019-01-04 | 厦门大学 | Sequence constraints hash algorithm in image retrieval |
-
2019
- 2019-03-11 CN CN201910181571.2A patent/CN111695917B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103631928A (en) * | 2013-12-05 | 2014-03-12 | 中国科学院信息工程研究所 | LSH (Locality Sensitive Hashing)-based clustering and indexing method and LSH-based clustering and indexing system |
CN105956093A (en) * | 2016-04-29 | 2016-09-21 | 浙江大学 | Individual recommending method based on multi-view anchor graph Hash technology |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210357515A1 (en) * | 2020-05-18 | 2021-11-18 | Gsi Technology Inc. | Secure similarity search for sensitive data |
Also Published As
Publication number | Publication date |
---|---|
CN111695917A (en) | 2020-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110941740B (en) | Video recommendation method and computer-readable storage medium | |
US20180342004A1 (en) | Cumulative success-based recommendations for repeat users | |
TW202008237A (en) | Method and device for training prediction model for new scenario | |
WO2021068513A1 (en) | Abnormal object recognition method and apparatus, medium, and electronic device | |
CN111612038B (en) | Abnormal user detection method and device, storage medium and electronic equipment | |
US20240046157A1 (en) | System and method for generating and optimizing artificial intelligence models | |
CN111125658B (en) | Method, apparatus, server and storage medium for identifying fraudulent user | |
US10642912B2 (en) | Control of document similarity determinations by respective nodes of a plurality of computing devices | |
JP7171471B2 (en) | LEARNING MODEL GENERATION SUPPORT DEVICE AND LEARNING MODEL GENERATION SUPPORT METHOD | |
CN111966886A (en) | Object recommendation method, object recommendation device, electronic equipment and storage medium | |
US20200293898A1 (en) | System and method for generating and optimizing artificial intelligence models | |
CN111260243A (en) | Risk assessment method, device, equipment and computer readable storage medium | |
CN113591881B (en) | Intention recognition method and device based on model fusion, electronic equipment and medium | |
CN112148880A (en) | Customer service dialogue corpus clustering method, system, equipment and storage medium | |
CN111695917B (en) | Commodity recommendation method, commodity recommendation system, electronic equipment and storage medium | |
US9201967B1 (en) | Rule based product classification | |
US11403688B2 (en) | Machine learning based procurement as a service | |
US20210365470A1 (en) | Apparatus for recommending feature and method for recommending feature using the same | |
CN113239259A (en) | Method and device for determining similar stores | |
US11487964B2 (en) | Comprehensive data science solution for segmentation analysis | |
US11954141B1 (en) | Systems and methods for ad-hoc event detection in content | |
WO2020056286A1 (en) | System and method for predicting average inventory with new items | |
US10572926B1 (en) | Using artificial intelligence to efficiently identify significant items in a database | |
US12111881B2 (en) | Item recommendation with application to automated artificial intelligence | |
JP7244449B2 (en) | Information processing device, information processing method and information processing program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |