HK1193192B - Method and system for sorting search results, and method and system for optimizing search result sorting
Description
Technical Field
The present application relates to the field of computer data processing technologies, and in particular, to a search result ranking method and system, and a search result ranking optimization method and system.
Background
With the development of electronic commerce, more and more users choose to purchase goods on e-commerce websites. An e-commerce website typically lists tens of millions or even hundreds of millions of products, and a user must find the desired product among them; keyword search is a common way to do so. In a search, the user enters a keyword, and the website returns search results related to that keyword for the user to screen.
In many cases, a keyword has a large number of search results, which must be displayed in some order, and a website has to weigh several factors when deciding how to sort them. For example, the relevance of the search results to the keyword, the previous click-through rate of the search results, their transaction history, and so forth may be used. For an e-commerce website, the main purpose is to increase sales volume; therefore, in addition to relevance, the website needs to consider how likely each search result is to lead to a transaction, for example its transaction conversion rate and favorable rating rate, when ranking the search results.
At present, when a typical e-commerce website sorts search results, it predicts relevance and transaction likelihood mainly by manually analyzing historical data, determining the features and weights of the search results (that is, of specific commodities) from experience, and calculating with a certain formula. Commodity features are factors that can influence whether a commodity sells, such as sales volume, favorable rating rate, and transaction conversion rate. Because the features and weights are set empirically, the choice is somewhat blind and subjective and often deviates from the actual situation. As a result, the returned ranking may differ greatly from what the user expects, and the results the user wants may appear far down the list. Because the number of search results is usually large, the server usually returns the ranked results in segments to reduce the amount of data transmitted: it returns part of the results first and returns further parts only when the user requests them. When the ranking differs greatly from the user's expectation, the user may repeatedly request the remaining results, or submit a new search request to the server through the client, in order to find the desired results. This increases the amount of data the server transmits, which increases the server load, occupies a large amount of network resources, and may even cause network congestion. Moreover, the ranked results returned by the server then contain a large amount of irrelevant data, whose transmission wastes server and network resources.
Disclosure of Invention
The present application provides a search result ranking method and system, and a search result ranking optimization method and system, which can solve the problems of increased server load and network congestion that arise when users repeatedly send search requests through the client because the search results differ from their expectations.
In order to solve the above problem, the present application discloses a search result ranking method, including the following steps:
acquiring an original feature set, wherein the original feature comprises preset features which may influence the sequencing of search results;
extracting effective features from the original feature set based on historical transaction data, wherein the effective features refer to features which can influence the sequencing of search results and are determined according to the historical transaction data;
determining initial weights of all effective features based on historical transaction data, and training the initial weights by using the historical transaction data and a preset training model to obtain final weights;
ranking the search results based on the final weight of the valid features.
Further, the extracting valid features from the raw feature set based on the historical transaction data includes:
selecting two groups of test products based on historical transaction data, wherein one group of test products is products with transaction records, and the other group of test products is products without transaction records;
extracting relevant data of the two groups of test products in a certain time period from historical transaction data respectively, and calculating characteristic values of original characteristics of the two groups of test products by using the relevant data;
and comparing the difference value of the characteristic values of the same original characteristics of the two groups of test products, and if the difference value exceeds a threshold value, selecting the original characteristics as effective characteristics.
Further, the extracting valid features from the raw feature set based on the historical transaction data includes:
extracting transaction data in a preset time period from historical transaction data, and calculating the transaction conversion rate of each product in the preset time period;
selecting, as test products, two groups of products whose transaction conversion rates differ by more than a threshold;
extracting transaction data of the two groups of test products in a certain time period after the preset time period from historical transaction data, and calculating a characteristic value of each original characteristic in an original characteristic set of the two groups of test products;
and comparing the difference value of the characteristic values of the same original characteristics of the two groups of test products, and if the difference value exceeds a threshold value, selecting the original characteristics as effective characteristics.
Further, the determining initial weights of the effective features based on the historical transaction data, and training the initial weights by using the historical transaction data and the training model to obtain final weights includes:
determining an initial weight of the valid features;
substituting the historical transaction data and the initial weight into a preset training model, and calculating theoretical data;
and comparing the theoretical data with the actual data, if the difference between the theoretical data and the actual data is within a preset range, determining the initial weight as the final weight of the effective features, and otherwise, returning to the step of determining the initial weight of the effective features.
Further, the ranking the search results based on the final weight of the valid features comprises:
determining an actual effective characteristic value of the search result;
calculating the predicted transaction conversion rate of the search result based on the final weight of the effective characteristic and the actual effective characteristic value;
and sorting the search results according to the predicted transaction conversion rate.
The application also discloses a search result sorting optimization method, which comprises the following steps:
respectively acquiring each group of alternative weight values of the effective characteristics of the search results;
respectively adopting each alternative weight value to calculate a theoretical sorting score of the search result at a certain preset time point, and sorting the search results according to the theoretical sorting score to obtain each group of sorting results;
for each group of sorting results, acquiring the search results ranked within a preset number of top positions, and acquiring transaction data of those search results after the preset time point;
calculating actual sorting scores of the search results which are arranged in the front in a preset number in each group of sorting results according to the transaction data;
and selecting the alternative weight values corresponding to a group of sorting results with the highest actual sorting scores as the final weight values of the effective features.
Further, the theoretical sorting score is a predicted value of a single feature or of a feature combination, and the actual sorting score is the actual value of the corresponding single feature or feature combination.
Further, the theoretical ranking score is a predicted transaction conversion rate, and the actual ranking score is an actual transaction conversion rate; or
the theoretical ranking score is a predicted favorable rating rate, and the actual ranking score is an actual favorable rating rate.
Further, the selecting, as the final weight value of the effective feature, the alternative weight value corresponding to the group of ranking results with the highest actual ranking score includes:
selecting, as the final weight values of the effective features, the alternative weight values corresponding to the group of sorting results whose actual sorting scores have the highest sum or average value.
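As an illustration only, the selection among groups of alternative weight values described above could be sketched as follows. The weighted-sum form of the theoretical score, the function names, and the sample data are assumptions made for this example and are not specified by the application; the group whose top-ranked results have the highest average actual score is chosen.

```python
from statistics import mean


def theoretical_score(weights, features):
    """Assumed scoring form: weighted sum of effective feature values."""
    return sum(w * features[name] for name, w in weights.items())


def pick_best_weights(candidates, products, actual_scores, top_n):
    """Return (index, average actual score) of the best candidate weight group.

    candidates    -- list of dicts: feature name -> alternative weight value
    products      -- dict: product id -> feature values at the preset time point
    actual_scores -- dict: product id -> actual score observed after that point
    """
    best_index, best_value = None, float("-inf")
    for index, weights in enumerate(candidates):
        ranked = sorted(
            products,
            key=lambda pid: theoretical_score(weights, products[pid]),
            reverse=True,
        )
        value = mean(actual_scores[pid] for pid in ranked[:top_n])
        if value > best_value:
            best_index, best_value = index, value
    return best_index, best_value


# Hypothetical data: two candidate weight groups and three products.
products = {
    "a": {"sales": 0.9, "rating": 0.2},
    "b": {"sales": 0.1, "rating": 0.9},
    "c": {"sales": 0.5, "rating": 0.5},
}
actual_scores = {"a": 0.02, "b": 0.10, "c": 0.05}  # e.g. actual conversion rates
candidates = [{"sales": 0.8, "rating": 0.2}, {"sales": 0.2, "rating": 0.8}]

best_index, best_value = pick_best_weights(candidates, products, actual_scores, top_n=2)
print(best_index, best_value)
```

Replacing `mean` with `sum` gives the "highest sum" variant; with a fixed preset number of top results the two pick the same group.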
The application also discloses a search result sorting optimization method, which comprises the following steps:
obtaining a sorting result sorted according to a theoretical sorting score of a search result at a certain preset time point, wherein the theoretical sorting score is obtained according to the final weight of the effective features and the actual effective feature value of each search result;
acquiring transaction data, after the preset time point, of the search results ranked within a preset number of top positions in the sorting result, and calculating actual sorting scores of those search results according to the transaction data;
and comparing the actual sorting score with the theoretical sorting score, and optimizing the final weight of the effective features if the difference value of the actual sorting score and the theoretical sorting score is greater than a threshold value.
Further, the theoretical ranking score is a predicted transaction conversion rate, and the actual ranking score is an actual transaction conversion rate; or
the theoretical ranking score is a predicted favorable rating rate, and the actual ranking score is an actual favorable rating rate.
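A minimal sketch of the check described in this second optimization method is given below. How the differences of individual search results are aggregated is not stated in the application; averaging the absolute differences over the top-ranked results is an assumption of this example, as are the function name and the sample figures.

```python
def needs_reoptimization(theoretical, actual, top_n, threshold):
    """Compare theoretical and actual scores of the top-ranked results.

    theoretical -- dict: product id -> theoretical score at the preset time point
    actual      -- dict: product id -> actual score computed from later transactions
    Returns (flag, gap), where gap is the mean absolute difference (assumed metric).
    """
    if top_n <= 0 or not theoretical:
        raise ValueError("need at least one ranked search result")
    ranked = sorted(theoretical, key=theoretical.get, reverse=True)[:top_n]
    gap = sum(abs(theoretical[p] - actual[p]) for p in ranked) / len(ranked)
    return gap > threshold, gap


# Hypothetical predicted and observed transaction conversion rates.
theoretical = {"a": 0.10, "b": 0.08, "c": 0.02}
actual = {"a": 0.04, "b": 0.07, "c": 0.02}

flag, gap = needs_reoptimization(theoretical, actual, top_n=2, threshold=0.02)
print(flag, gap)
```

When the flag is set, the final weights would be optimized again, for example by retraining on more recent transaction data.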
The application also discloses a search result ranking system, comprising:
the original feature set acquisition module is used for acquiring an original feature set, wherein the original feature set comprises preset features that may influence the ranking of search results;
the effective feature extraction module is used for extracting effective features from the original feature set based on historical transaction data, wherein the effective features refer to features which can influence the sequencing of the search results and are determined according to the historical transaction data;
the effective characteristic weight determining module is used for determining the initial weight of each effective characteristic based on historical transaction data and training the initial weight by utilizing the historical transaction data and a preset training model to obtain a final weight;
and the ranking module is used for ranking the search results based on the final weight of the effective characteristics.
Further, the valid feature extraction module includes:
the test product selection submodule is used for selecting two groups of test products based on historical transaction data, wherein one group of test products is products with transaction records, and the other group of test products is products without transaction records;
the feature value calculation submodule is used for respectively extracting relevant data of the two groups of test products in a certain time period from historical transaction data and calculating feature values of the original features of the two groups of test products by using the relevant data;
and the comparison submodule is used for comparing the difference value of the characteristic values of the same original characteristics of the two groups of test products, and if the difference value exceeds a threshold value, the original characteristics are selected as effective characteristics.
The application also discloses a search result ranking optimization system, which comprises:
the alternative weight value acquisition module is used for respectively acquiring each group of alternative weight values of the effective characteristics of the search results;
the theoretical sorting score calculating module is used for calculating the theoretical sorting scores of the search results at a certain preset time point by adopting the alternative weight values respectively, and sorting the search results according to the theoretical sorting scores to obtain various groups of sorting results;
the transaction data acquisition module is used for acquiring, for each group of sorting results, the search results ranked within a preset number of top positions, and acquiring transaction data of those search results after the preset time point;
the actual sorting score calculating module is used for calculating the actual sorting scores of the search results which are arranged in the front of the preset number in each group of sorting results according to the transaction data;
and the final weight determining module is used for selecting the alternative weight values corresponding to the group of sorting results with the highest actual sorting scores as the final weight values of the effective features.
The application also discloses a search result ranking optimization system, which comprises:
the theoretical sorting score calculating module is used for obtaining sorting results sorted according to the theoretical sorting scores of the search results at a certain preset time point, and the theoretical sorting scores are obtained according to the final weight of the effective features and the actual effective feature values of the search results;
the actual sorting score calculating module is used for acquiring transaction data, after the preset time point, of the search results ranked within a preset number of top positions in the sorting result, and calculating the actual sorting scores of those search results according to the transaction data;
and the optimization module is used for comparing the actual sorting score with the theoretical sorting score, and optimizing the final weight of the effective features if the difference value of the actual sorting score and the theoretical sorting score is greater than a threshold value.
Compared with the prior art, the method has the following advantages:
According to the search result ranking method and system, effective features that influence the ranking are selected using historical transaction data, the final weights of the effective features are determined in combination with the historical transaction data, and the search results are then ranked using these weights. In this process, besides determining each effective feature and its initial weight from the historical transaction data, the initial weights are also trained with the historical transaction data to obtain optimized final weights. This ensures the objectivity and accuracy of the final weights and improves the objectivity and accuracy of the ranking. It avoids the situation in which a user, unable to find the expected search results because of inaccurate ranking, repeatedly requests the remaining data or sends a new search request to the server through the client, and thus reduces the server load, the occupation of network resources, and the amount of data transmitted.
In addition, in the process of selecting effective features, two groups of test products with strongly contrasting transaction rates, one high and one low, are selected according to historical transaction data as the basis for testing. After the feature values of the two groups of test products are calculated from historical transaction data, the influence of each feature on the product transaction rate is determined by comparing the feature values of the same original feature between the two groups, so that effective features are accurately selected and the ranking accuracy is improved.
In the search result ranking optimization method and system, a certain time point and the transaction data after that time point are used either to determine an optimal set of weight values or to optimize weight values that have already been determined. That is, real historical transaction data are used to determine a relatively optimal ranking scheme or to optimize an existing one, so that the ranking is more objective and accurate. This likewise avoids the situation in which a user, unable to find the expected search results because of inaccurate ranking, repeatedly requests the remaining data or sends a new search request to the server through the client, thereby reducing the server load, the occupation of network resources, and the amount of data transmitted.
Of course, a product practicing the present application need not achieve all of the above advantages at the same time.
Drawings
FIG. 1 is a flowchart of a first embodiment of a search result ranking method of the present application;
FIG. 2 is a flowchart of a first embodiment of a search result ranking optimization method according to the present application;
FIG. 3 is a schematic diagram of two sets of ranking results of an example of search result ranking optimization of the present application;
FIG. 4 is a flowchart of a second embodiment of a search result ranking optimization method of the present application;
FIG. 5 is a schematic structural diagram of a first embodiment of a search result ranking system of the present application;
FIG. 6 is a schematic structural diagram of a first embodiment of a search result ranking optimization system according to the present application;
FIG. 7 is a schematic structural diagram of a second embodiment of the search result ranking optimization system of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
Referring to fig. 1, a first embodiment of a search result ranking method according to the present application is shown, which includes the following steps:
Step 101, obtaining an original feature set, where the original feature set includes preset features that may affect the ranking of search results.
The original feature set may be determined from historical transaction data or from experience. In general, the original feature set includes features such as transaction amount, transaction conversion rate, favorable rating rate, shipping speed, picture and text quality, and the like.
The original feature set of the search result can be preset, and can be directly obtained from a server or other databases when needed, or historical transaction data can be obtained from the server or the databases in real time and extracted by a real-time analysis method.
Step 102, extracting effective features from the original feature set based on historical transaction data, wherein the effective features are features that, as determined from the historical transaction data, can influence the ranking of the search results.
The historical transaction data can be directly read from the server, and the effective feature extraction from the original feature set based on the historical transaction data specifically comprises the following steps:
selecting two groups of test products based on historical transaction data, one group being products with transaction records and the other group being products without transaction records;
Extracting relevant data of the two groups of test products in a certain time period from historical transaction data respectively, and calculating characteristic values of original characteristics of the two groups of test products by using the relevant data;
and comparing the difference value of the characteristic values of the same original characteristics of the two groups of test products, and if the difference value exceeds a threshold value, selecting the original characteristics as effective characteristics.
The relevant data are used to calculate the specific value of each feature in the original feature set. Different features require different relevant data, which can be determined according to specific needs. For example, for the transaction amount feature, the relevant data is the number of transactions completed within a predetermined time period. For the favorable rating rate, the relevant data are the total number of evaluations and the number of favorable evaluations within the predetermined time period.
The formula for calculating the feature value of each original feature may be determined according to actual conditions, preferably with a view to representing the feature effectively. For example, if the raw transaction amount (the number of completed transactions) is used directly as the feature value, it can in theory be any natural number from 0 upward. In practice, however, the raw difference between two such values is not informative on its own. For transaction amounts of 0 and 1, the difference is 1, and it separates a product that has never sold from one that has, which is a substantial difference. For transaction amounts of 100 and 101, the difference is also 1 but merely indicates one more transaction. For this reason, the formula may be redefined so that the transaction amount serves as a parameter from which the final feature value is computed, rather than being the feature value itself. For example, with a transaction amount of n, the feature value may be 1 - 1/(1 + n): 0 transactions give a feature value of 0, 1 transaction gives 0.5, 100 transactions give 0.9901, and 101 transactions give 0.9902. In this way, changes in the transaction amount are represented more effectively. Similar processing may be applied to other features, as long as the features are represented effectively; the present application is not limited in this respect.
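The example formula 1 - 1/(1 + n) can be checked with a few lines of code. The function name below is an assumption made for this sketch; the formula and the four sample values come from the paragraph above.

```python
def transaction_feature(n):
    """Map a transaction count n >= 0 to a feature value in [0, 1)."""
    if n < 0:
        raise ValueError("transaction count must be non-negative")
    return 1 - 1 / (1 + n)


for count in (0, 1, 100, 101):
    print(count, round(transaction_feature(count), 4))

# The step from 0 to 1 transaction moves the feature far more than 100 to 101.
first_step = transaction_feature(1) - transaction_feature(0)
late_step = transaction_feature(101) - transaction_feature(100)
```

The mapping compresses large counts toward 1, so the first few transactions account for most of the change in the feature value.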
It is understood that in the foregoing steps the two groups of test products are selected according to whether a transaction record exists. To increase the contrast between the two groups, widen the range of products selected, and improve the accuracy of the result, it is preferable that one group consists of products whose number of transaction records is higher than a first threshold, and the other group consists of products with no transaction records or with a number of transaction records lower than a second threshold. The first and second thresholds can be set according to actual conditions; the first threshold can be set as high as possible and the second as low as possible, so that the two groups differ substantially and the effective features can then be extracted accurately.
It will be appreciated that, besides historical transaction data such as transaction records, test products may also be selected according to certain features. The main purpose of the present application is to rank the search results of an e-commerce website so that the results meeting the user's expectations appear as near the front as possible, which increases the probability that a product is purchased and avoids the user repeatedly sending search requests to the server through the client to find the desired results. That is, in addition to relevance, the present application gives priority to the transaction conversion rate of a product, namely the probability that the product is purchased after appearing in the search results, which is a feature with a large weight in the ranking. Generally, if a product has a high probability of being purchased after appearing in the search results, other users submitting the same search request are also more likely to purchase it. Therefore, test products can preferably be selected according to the transaction conversion rate, specifically as follows:
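The two-threshold selection of test products can be sketched as follows. The function name, the dictionary layout, and the threshold values are assumptions made for this illustration; the application leaves the thresholds to be set according to actual conditions.

```python
def select_test_groups(transaction_counts, first_threshold, second_threshold):
    """Split products into a high-transaction and a low-transaction test group.

    transaction_counts -- dict: product id -> number of transaction records
    Products between the two thresholds are left out to keep the contrast high.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be lower than first_threshold")
    high_group = sorted(
        pid for pid, n in transaction_counts.items() if n > first_threshold
    )
    low_group = sorted(
        pid for pid, n in transaction_counts.items()
        if n == 0 or n < second_threshold
    )
    return high_group, low_group


# Hypothetical transaction counts over the observation period.
counts = {"p1": 120, "p2": 0, "p3": 3, "p4": 45, "p5": 200}
high_group, low_group = select_test_groups(counts, first_threshold=100, second_threshold=5)
print(high_group, low_group)
```

Here "p4" falls between the thresholds and is excluded from both groups, which is what keeps the two groups clearly different.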
extracting transaction data in a preset time period from historical transaction data, and calculating the transaction conversion rate of each product in the preset time period;
selecting, as test products, two groups of products whose transaction conversion rates differ by more than a threshold;
extracting transaction data of the two groups of test products in a certain time period after the preset time period from historical transaction data, and calculating a characteristic value of each original characteristic in an original characteristic set of the two groups of test products;
and comparing the difference value of the characteristic values of the same original characteristics of the two groups of test products, and if the difference value exceeds a threshold value, selecting the original characteristics as effective characteristics.
The length of the preset time period can be set according to actual needs. A shorter length saves calculation time and reduces the amount of calculation; a longer length may be set to improve the accuracy of the results or when the system has sufficient computing power. The length may be, for example, one day, three days, ten days, thirty days, or another length, which is not limited by the present application. The length of the time period after the preset time period may also be set according to actual needs; preferably, so that the calculation results are comparable, it is set to the same length as the preset time period.
In this process, two groups of products whose transaction conversion rates differ greatly within a preset time period are selected as test products. In a specific implementation, a first conversion value and a second conversion value may be set, the difference between them being the threshold; if the transaction conversion rate of one group of products is higher than the first conversion value and that of another group is lower than the second conversion value, the two groups may be selected as test products. The feature value of each original feature in the original feature set is then calculated for the two groups using their transaction data from a certain time period after the preset time period. If the feature values of the same original feature differ greatly between the two groups, for example by more than a set threshold, the original feature can be taken as an effective feature. Because the two groups were chosen for their clearly different transaction conversion rates, a large difference in the value of an original feature indicates that the feature has a large influence on whether a product sells. Screening the original features in this way and extracting the relevant effective features makes the ranking more accurate.
Both methods above select effective features by relying on two groups of test products with a strong contrast in transactions (for example, one group with transaction records and the other without, or one group with a high transaction conversion rate and the other with a low one). If a feature has a large influence on whether a product sells, the feature values calculated from the transaction data will differ greatly between the two groups. If a feature has little or no influence, the feature values of the two groups will differ little or not at all. Effective features can therefore be screened out well by these methods, which improves the accuracy of the subsequent ranking of search results.
It can be understood that test products may also be selected with reference to other features. For example, if the ranking places more emphasis on the favorable rating rate, two groups of products whose favorable rating rates differ greatly may be selected as test products; the feature values of the original features of the two groups are then calculated in a manner similar to that described above, and the original features whose values differ greatly are selected as effective features. Similarly, if the ranking places more emphasis on the transaction amount, two groups of products whose transaction amounts differ greatly can be selected as test products. The specific selection follows a process similar to the foregoing methods and is not described again here.
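The conversion-rate-based screening can be sketched as follows. Several details are assumptions made for this example and are not fixed by the application: the conversion rate is taken as purchases divided by search impressions, each group's feature value is the mean over its products, and all names and figures are hypothetical.

```python
from statistics import mean


def split_by_conversion(stats, first_value, second_value):
    """Group products by transaction conversion rate in the preset period.

    stats -- dict: product id -> (purchases, search impressions)
    """
    high, low = [], []
    for pid, (purchases, impressions) in stats.items():
        rate = purchases / impressions if impressions else 0.0
        if rate > first_value:
            high.append(pid)
        elif rate < second_value:
            low.append(pid)
    return high, low


def select_effective_features(features, high, low, threshold):
    """Keep original features whose group means differ by more than threshold."""
    if not high or not low:
        raise ValueError("both test groups must be non-empty")
    effective = []
    for name in features[high[0]]:
        high_mean = mean(features[pid][name] for pid in high)
        low_mean = mean(features[pid][name] for pid in low)
        if abs(high_mean - low_mean) > threshold:
            effective.append(name)
    return effective


# Hypothetical statistics for the preset time period.
stats = {"p1": (30, 200), "p2": (1, 500), "p3": (12, 300), "p4": (45, 250)}
high, low = split_by_conversion(stats, first_value=0.10, second_value=0.01)

# Hypothetical feature values computed from the period that follows.
later_features = {
    "p1": {"volume": 0.95, "rating": 0.90, "image_quality": 0.60},
    "p2": {"volume": 0.10, "rating": 0.60, "image_quality": 0.55},
    "p4": {"volume": 0.85, "rating": 0.80, "image_quality": 0.50},
}
effective = select_effective_features(later_features, high, low, threshold=0.15)
print(high, low, effective)
```

In this made-up data the image quality feature is nearly identical across the two groups, so it is screened out, while volume and rating are kept.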
And 103, determining initial weights of the effective features based on the historical transaction data, and training the initial weights by using the historical transaction data and a preset training model to obtain final weights.
The initial weight and the final weight of each effective feature may be determined by model training; it is understood that the initial weight may also be set empirically. Taking a multidimensional linear model as an example, the initial weights of the effective features can be determined by multidimensional linear fitting. The initial weights are then substituted into the calculation formula together with the historical transaction data to calculate theoretical data, and the theoretical data are compared with the actual data; the smaller the difference, the more accurate the initial weights. If the difference is within a preset range, the initial weights are taken as the final weights of the effective features; otherwise, the initial weights are determined again and the calculation is repeated until the difference falls within the preset range.
Taking the transaction situation of a product as an example, the theoretical transaction situation of the product is first calculated from the initial weights and the historical transaction data, and the calculated theoretical transaction situation is then compared with the actual transaction situation. The smaller the difference, the more accurate the initial weights, which can then be used as the final weights of the effective features; otherwise, the weights need to be determined again until the difference between the theoretical and actual transaction situations reaches a minimum or falls within a preset range. In specific training, the transaction situation can be represented by the transaction conversion rate or by whether a transaction occurred. Since model training can be performed with a variety of machine learning methods, it is not described in detail in this application.
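The compare-and-redetermine loop of step 103 can be sketched as follows. This is only an illustration under stated assumptions: the model is taken to be a plain weighted sum, the weights are re-determined by a gradient-descent step (a technique chosen here for the sketch, not one named in the application), and the function names, tolerance, learning rate and sample data are all hypothetical.

```python
def predict(weights, features):
    """Theoretical value under an assumed linear model: weighted sum of feature values."""
    return sum(w * x for w, x in zip(weights, features))


def mean_abs_difference(weights, rows, actual):
    """Average absolute gap between theoretical and actual data."""
    return sum(abs(predict(weights, r) - a) for r, a in zip(rows, actual)) / len(rows)


def train_weights(initial_weights, rows, actual,
                  tolerance=1e-4, learning_rate=0.5, max_rounds=1000):
    """Start from the initial weights and re-determine them until the difference
    between theoretical and actual data is within the preset range (tolerance)."""
    weights = list(initial_weights)
    for _ in range(max_rounds):
        if mean_abs_difference(weights, rows, actual) <= tolerance:
            break  # difference within the preset range: accept as final weights
        errors = [predict(weights, r) - a for r, a in zip(rows, actual)]
        for j in range(len(weights)):
            # Gradient of the mean squared error with respect to weight j (assumed update rule).
            gradient = 2 * sum(e * r[j] for e, r in zip(errors, rows)) / len(rows)
            weights[j] -= learning_rate * gradient
    return weights


# Hypothetical historical data: two effective feature values per product and
# the actual transaction conversion rate observed for each product.
rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
actual = [0.05, 0.02, 0.07, 0.035]
final_weights = train_weights([0.0, 0.0], rows, actual)
```

Any other training method that reduces the gap between theoretical and actual data could be substituted for the gradient step without changing the surrounding loop.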
Step 104, sorting the search results based on the final weight of the effective features.
Preferably, ranking the search results based on the final weights of the valid features comprises:
determining an actual effective characteristic value of the search result;
calculating the predicted transaction conversion rate of the search result based on the final weight of the effective characteristic and the actual effective characteristic value;
and sorting the search results according to the predicted transaction conversion rate.
It is understood that the main ranking factor here is the predicted transaction conversion rate of the search results. In practical applications, the ranking may also be performed according to other factors, for example the positive-review rate of the search results. The main ranking factor is determined by the ranking objective: when the objective differs, the main factor may differ, and the ranking result changes accordingly. Whatever the main factor, the ranking score of each search result may be calculated, and the results ranked, according to the method described above.
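The three sub-steps of step 104 can be illustrated with a short sketch. It assumes that the predicted transaction conversion rate is a weighted sum of the actual effective feature values, which is one possible model rather than the formula prescribed by the application; the function names, weights and product data are hypothetical.

```python
def predicted_conversion(weights, feature_values):
    """Predicted transaction conversion rate as an assumed weighted sum."""
    return sum(weights[name] * feature_values[name] for name in weights)


def rank_results(results, weights):
    """Return result identifiers ordered from highest to lowest predicted rate."""
    return sorted(results,
                  key=lambda item: predicted_conversion(weights, results[item]),
                  reverse=True)


# Hypothetical final weights and actual effective feature values.
weights = {"conversion_rate": 0.6, "positive_review_rate": 0.4}
results = {
    "p1": {"conversion_rate": 0.1, "positive_review_rate": 0.9},
    "p2": {"conversion_rate": 0.3, "positive_review_rate": 0.8},
}
ranking = rank_results(results, weights)
```

Ranking by a different main factor only requires replacing `predicted_conversion` with the corresponding score function.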
The foregoing method is described in detail below with reference to specific examples. It is assumed that the original feature set contains five features: transaction amount, transaction conversion rate, positive-review rate, delivery speed, and picture and text quality.
The process of extracting the effective characteristics comprises the following steps:
According to Table 1 below, and assuming that the predetermined period is 30 days, the historical transaction data to be acquired include the number of transactions, the number of exposures, the number of positive reviews, the total number of reviews, the number of delivery days, the number of pictures and the number of characters. After the historical transaction data are obtained, the feature value of each original feature can be calculated using the corresponding calculation method.
TABLE 1 Feature value calculation methods and raw data
| Serial number | Raw data | Feature name | Calculation method |
| 1 | Number of transactions in 30 days (n) | Transaction amount | 1-1/(1+n) |
| 2 | Number of transactions in 30 days (n), number of exposures in 30 days (d) | Transaction conversion rate | (n+0.2)/(d+10) |
| 3 | Number of positive reviews in 30 days (g), total number of reviews in 30 days (f) | Positive-review rate | (g+8.5)/(f+10) |
| 4 | Delivery days (t) | Delivery speed | 3/t if t > 3; otherwise 1 |
| 5 | Number of pictures (i), number of characters (w) | Picture and text quality | (1-1/(1+i))*(1-1/(1+w)) |
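The calculation methods in Table 1 translate directly into small functions. The sketch below follows the formulas in the table; only the function names are introduced here for illustration.

```python
def transaction_amount(n):
    """Feature 1: n = number of transactions in 30 days."""
    return 1 - 1 / (1 + n)


def transaction_conversion_rate(n, d):
    """Feature 2: n = transactions in 30 days, d = exposures in 30 days."""
    return (n + 0.2) / (d + 10)


def positive_review_rate(g, f):
    """Feature 3: g = positive reviews in 30 days, f = total reviews in 30 days."""
    return (g + 8.5) / (f + 10)


def delivery_speed(t):
    """Feature 4: t = delivery days; 3/t when t exceeds 3, otherwise 1."""
    return 3 / t if t > 3 else 1.0


def picture_text_quality(i, w):
    """Feature 5: i = number of pictures, w = number of characters."""
    return (1 - 1 / (1 + i)) * (1 - 1 / (1 + w))
```

The additive constants in features 2 and 3 give a product with no exposures or reviews a default value (0.02 and 0.85 respectively) instead of a division by zero.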
The feature values of the five features calculated from the aforementioned historical transaction data are referred to as the initial feature values. Two groups of test products with strong contrast can be selected according to these initial feature values; assume that one group consists of products with a transaction conversion rate above 70% and the other of products with a transaction conversion rate below 1%. It will be appreciated that, since the test products are selected here on the basis of the transaction conversion rate, only the transaction conversion rate need be calculated at this stage, and the feature values of the other features may be omitted.
Next, historical transaction data of the two groups of test products in a certain time period after the 30 days, for example the following week or the following 30 days, are acquired, and the feature values of the five features of the two groups of test products are calculated from these data; these are referred to as the verification feature values.
Then, for each feature, the verification feature values of the two groups of test products are compared, and if the difference between the two verification feature values exceeds a threshold, the feature is determined to be an effective feature. Assume that the threshold is 0.3 and that the differences for the five features (transaction amount, transaction conversion rate, positive-review rate, delivery speed, and picture and text quality) are 0.6, 0.9, 0.8, 0.5 and 0.02, respectively. Only the picture and text quality difference (0.02) falls below the threshold, so the effective features finally selected are the transaction amount, the transaction conversion rate, the positive-review rate and the delivery speed.
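The threshold comparison in this example can be written as a one-line filter. The sketch uses the differences and the threshold given in the text; the function name and the dictionary keys are illustrative only.

```python
def select_effective_features(differences, threshold):
    """Keep the original features whose between-group difference exceeds the threshold."""
    return [name for name, diff in differences.items() if diff > threshold]


# Differences between the verification feature values of the two test groups.
differences = {
    "transaction_amount": 0.6,
    "transaction_conversion_rate": 0.9,
    "positive_review_rate": 0.8,
    "delivery_speed": 0.5,
    "picture_text_quality": 0.02,
}
effective_features = select_effective_features(differences, threshold=0.3)
```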
And finally, determining the final weight of the four effective features through a model training mode based on historical transaction data, acquiring the actual values of the four effective features in the search results, calculating the ranking score of each search result based on the determined final weight and actual values of the effective features, and ranking the search results according to the ranking score.
According to the search result ranking method and system, effective features that influence the ranking result are selected using historical transaction data, the final weights of the effective features are determined in combination with the historical transaction data, and the search results are finally ranked using those weights. In this process, in addition to determining each effective feature and its initial weight from the historical transaction data, the initial weights are trained with the historical transaction data to obtain optimized final weights. This ensures the objectivity and accuracy of the final weights and thus improves the objectivity and accuracy of the ranking result. It also avoids the situation in which, because inaccurate ranking fails to surface the expected search results, a user repeatedly requests the remaining data or sends new search requests to the server through a client; the burden on the server, the occupation of network resources and the amount of data transmitted are thereby reduced.
In addition, in the process of selecting effective characteristics, two groups of test products with high and low transaction rates and high contrast are selected according to historical transaction data to serve as test bases. After the characteristic values of the two groups of test products are respectively calculated according to historical transaction data, the influence of the characteristics on the product transaction rate is determined by comparing the difference of the characteristic values of the same original characteristics of the two groups of products, so that effective characteristics are accurately selected, and the sorting accuracy is improved.
Referring to fig. 2, a first embodiment of a search result ranking optimization method according to the present application is shown, which includes the following steps:
Step 201, obtaining each group of candidate weight values of the effective features of the search results respectively.
There are at least two groups of candidate weight values for the effective features; there may also be three or four groups.
Step 202, calculating theoretical sorting scores of the search results at a certain preset time point by using the candidate weight values respectively, and sorting the search results according to the theoretical sorting scores to obtain each group of sorting results.
The theoretical ranking score may be the predicted transaction conversion rate or the predicted positive-review rate of the search result, a score for another feature, or a score for a combination of features; it is determined mainly by the actual ranking purpose and is not limited in the present application.
Preferably, the embodiments of the present application are illustrated using the predicted transaction conversion rate. That is, the predicted transaction conversion rates of the search results at a certain predetermined time point are calculated using each group of candidate weight values, and the search results are sorted by predicted transaction conversion rate to obtain each group of ranking results.
Once the search results at the predetermined time point are determined, the effective features of the search results are first obtained, and their effective feature values are calculated from the actual data. The effective feature values are then combined with each group of candidate weight values in turn to calculate the corresponding predicted transaction conversion rates of the search results, and a different ranking result is obtained from each set of predicted transaction conversion rates.
For example, assuming that the search results at a certain predetermined time point are four in total, including a, b, c, and d, assuming that there are two sets of candidate weight values, it may occur that the ranking results calculated according to one set of weight values are a, b, c, and d; the ranking results calculated from the other set of weight values are d, c, a, b.
Step 203, acquiring, for each group of ranking results, the predetermined number of top-ranked search results, and acquiring the transaction data of those search results after the predetermined time point.
The predetermined number of top-ranked results may be chosen according to the number of actual search results and the computing power of the system. For example, if the number of actual search results is large and the computing power of the system is modest, the predetermined number may be set to a smaller value, such as the top 2% or 4%. If the computing power of the system allows, it may also be set to a larger value, for example the top 10%. Of course, more data gives more objective and accurate results, so several predetermined numbers may be used, such as 2%, 4%, 6%, 8% and 10%.
The specific range of the transaction data after the predetermined time point may be set according to actual conditions, and for example, the transaction data may be transaction data within one week after the predetermined time point, or may be transaction data of ten days, twenty days, or other time periods, as long as the transaction data can be acquired after the predetermined time point.
Step 204, calculating, according to the transaction data, the actual ranking scores of the predetermined number of top-ranked search results in each group of ranking results.
The actual ranking score refers to the actual ranking score of the search result calculated by the same method of calculating the theoretical ranking score according to the actual data. For example, taking the theoretical ranking score as the predicted transaction conversion rate as an example, the actual ranking score at this time is the actual transaction conversion rate.
Step 205, selecting the candidate weight values corresponding to the group of ranking results with the highest actual ranking scores as the final weight values of the effective features.
Because a theoretical ranking score is calculated for each search result when ranking, a search result with a higher theoretical ranking score is ranked higher. The higher the actual ranking scores of the top-ranked results, the better the ranking result matches the actual situation and the more accurate the ranking. In the strictest sense, the highest actual ranking score may mean that the actual ranking score of each selected search result in one ranking result is higher than that of the search result at the same position in the other ranking results. However, this is a relatively ideal outcome that may not be obtained in practice; therefore, to simplify the calculation, the group with the highest actual ranking score may be taken to be the group whose sum or average of actual ranking scores is highest.
Take the above two ranking results a, b, c, d and d, c, a, b as examples. Assume that the ranking is based on the transaction conversion rate, that is, the results are ranked by predicted transaction conversion rate, and select the search results in the first two positions of each group, namely a, b and d, c. Suppose the actual transaction conversion rates of the four search results (a, b, d, c), calculated from the transaction data, are 5%, 4%, 3% and 2%, respectively. The average actual transaction conversion rate of a and b, 4.5%, is then higher than that of d and c, 2.5%. Therefore, the candidate weight values corresponding to the ranking result a, b, c, d should be taken as the final weight values of the effective features.
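Steps 203 to 205 for this example can be sketched as below, using the average as the aggregate. The rankings and actual transaction conversion rates are those given in the text; the function names and the labels `set_1` and `set_2` for the two groups of candidate weight values are hypothetical.

```python
def top_k_mean(ranking, actual_rate, k):
    """Average actual transaction conversion rate of the first k results of a ranking."""
    top = ranking[:k]
    return sum(actual_rate[item] for item in top) / len(top)


def select_best_candidate(rankings, actual_rate, k):
    """Return the candidate whose top-k results have the highest average actual rate."""
    return max(rankings, key=lambda name: top_k_mean(rankings[name], actual_rate, k))


# One ranking per group of candidate weight values.
rankings = {
    "set_1": ["a", "b", "c", "d"],
    "set_2": ["d", "c", "a", "b"],
}
# Actual transaction conversion rates observed after the predetermined time point.
actual_rate = {"a": 0.05, "b": 0.04, "d": 0.03, "c": 0.02}

best = select_best_candidate(rankings, actual_rate, k=2)
```

Here `best` identifies the group of candidate weight values that produced the ranking a, b, c, d.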
In the following, the embodiment of the search result ranking optimization method is described in detail with reference to specific examples by taking the transaction conversion rate as an example.
Assume that at a certain time point T, a search based on a certain keyword yields a set of search results. According to the foregoing method, the effective features of this set of search results are fixed, and so are their effective feature values. Suppose there are two groups of weights for the effective features; the predicted transaction conversion rate of each search result is calculated with each group of weight values, and the search results are then sorted by predicted transaction conversion rate. Assume there are fifty search results in total. Because the weight values differ, two ranking results are obtained, denoted N and O, as shown in fig. 3. For each of N and O, the average actual transaction conversion rate of the top x% of search results over a period after T, for example one week, can be computed. If the average actual transaction conversion rate of the top x% of ranking result N is higher than that of the top x% of ranking result O, then the prediction underlying ranking result N at time point T is closer to the actual outcome. In other words, if one went back to time point T and ranked the search results using the weight values of ranking result N, the search results with higher transaction conversion rates after T would be ranked nearer the front, increasing their display opportunities and promoting more transactions.
Preferably, to obtain a more comprehensive and objective comparison, different values of x can be used to calculate the difference between the two ranking results. For example, the average actual transaction conversion rate can be calculated for the top 2% of products, then for the top 4%, 6%, 8% and so on, as shown in Table 2, so that the two ranking results are compared at a number of different points. It can be seen that the prediction effect of ranking result N is significantly better than that of ranking result O. The data can further be plotted as curves of the average actual transaction conversion rate, which shows the difference between the two ranking results more intuitively.
TABLE 2 Average actual transaction conversion rate of the top x% of the two ranking results (N and O)
| x% | 2% | 4% | 6% | 8% | 10% | ... |
| N | 0.038671 | 0.037019 | 0.036061 | 0.035228 | 0.034294 | ... |
| O | 0.031106 | 0.030587 | 0.029903 | 0.029179 | 0.028548 | ... |
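A row of Table 2 is the average actual transaction conversion rate of a ranking at several top-x% cutoffs. The sketch below shows how such a row might be computed; the five-item rankings and rates are hypothetical stand-ins for the real data, and the function names are illustrative.

```python
def top_percent_mean(ranking, actual_rate, percent):
    """Average actual rate of the top `percent` of a ranking (at least one item)."""
    count = max(1, len(ranking) * percent // 100)
    top = ranking[:count]
    return sum(actual_rate[item] for item in top) / count


def cutoff_profile(ranking, actual_rate, percents):
    """Average actual rate at each cutoff, keyed by percentage."""
    return {p: top_percent_mean(ranking, actual_rate, p) for p in percents}


# Hypothetical actual transaction conversion rates and two hypothetical rankings.
actual_rate = {"a": 0.05, "b": 0.04, "c": 0.02, "d": 0.03, "e": 0.01}
ranking_n = ["a", "b", "d", "c", "e"]
ranking_o = ["e", "c", "d", "b", "a"]

profile_n = cutoff_profile(ranking_n, actual_rate, [20, 40, 60])
profile_o = cutoff_profile(ranking_o, actual_rate, [20, 40, 60])
```

Comparing `profile_n` and `profile_o` cutoff by cutoff corresponds to reading Table 2 column by column.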
Preferably, to ensure that the improvement obtained with the final weights of the effective features used for ranking result N is statistically significant rather than accidental, significance verification may further be performed. There are many existing methods for significance verification, such as the T-test, a common method for comparing the means of two sets of samples. The P value in the T-test represents the probability of observing a difference at least as large as the one measured if the means of the two samples did not actually differ. It is generally considered that P ≤ 0.01 indicates a significant difference between the two samples. Assuming that there are 50 average actual transaction conversion rates for each ranking result in Table 2, the P value obtained by T-testing the 50 averages of the two ranking results is about 8.7E-07, far below 0.01, so the final weights of the effective features used for ranking result N are a statistically significant improvement over those used for ranking result O.
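As an illustration of the significance check, the sketch below computes a two-sample t statistic for the five averages of each ranking result that are listed in Table 2. It assumes the unequal-variance (Welch) form of the statistic, which the application does not specify, and it stops at the statistic: converting it to a P value requires the t distribution, typically taken from a statistics library. The function name is hypothetical, and the five listed values are only part of the 50 averages mentioned in the text.

```python
import math
import statistics


def welch_t_statistic(sample_a, sample_b):
    """Two-sample t statistic with unequal variances (Welch form, assumed here)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance returns the sample variance (n - 1 in the denominator).
    standard_error = math.sqrt(statistics.variance(sample_a) / len(sample_a)
                               + statistics.variance(sample_b) / len(sample_b))
    return (mean_a - mean_b) / standard_error


# The five averages listed in Table 2 for each ranking result.
averages_n = [0.038671, 0.037019, 0.036061, 0.035228, 0.034294]
averages_o = [0.031106, 0.030587, 0.029903, 0.029179, 0.028548]

t_value = welch_t_statistic(averages_n, averages_o)
```

A large positive `t_value` indicates that the mean for N exceeds the mean for O by many standard errors, which is the situation in which the T-test reports a small P value.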
It is understood that the foregoing method is described by taking the transaction conversion rate as an example; in practical applications, the ranking and optimization can be performed according to other features, such as the positive-review rate or the delivery speed. Preferably, the ranking and optimization can also be based on a combination of features. Different ranking formulas can be set for each case, but the main idea of the ranking is similar to the foregoing process of the present application and is not described again here.
The foregoing optimization method applies when the final weights of the effective features have not yet been determined and the best group must be selected from several candidate groups. When the final weights of the effective features have already been determined and are to be optimized, the following approach may be employed.
Referring to fig. 4, a second embodiment of the search result ranking optimization method of the present application is shown, which includes the following steps:
step 401, obtaining a ranking result ranked according to a theoretical ranking score of the search result at a certain preset time point, wherein the theoretical ranking score is obtained according to a final weight of the effective features and an actual effective feature value of each search result;
step 402, obtaining the transaction data of the search results arranged in the sorting results in the front preset number after the preset time point, and calculating the actual sorting score of the search results according to the transaction data;
step 403, comparing the actual sorting score with the theoretical sorting score, and if the difference between the actual sorting score and the theoretical sorting score is greater than a threshold value, optimizing the final weight of the effective features.
The optimized final weights of the effective features may be obtained by the model training method described for the foregoing ranking method, that is, by acquiring historical transaction data and determining the optimized final weight of each effective feature in combination with the training model; this is not described in detail here. The threshold may be set according to the feature to which the actual and theoretical ranking scores correspond. For example, if the actual ranking score and the theoretical ranking score are the actual and the predicted transaction conversion rate, respectively, the threshold may be set according to the range of difference generally tolerated for the transaction conversion rate, for example 0.2 or another value. For further details of this method embodiment, reference may be made to the first embodiment of the search result ranking optimization method.
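The comparison in step 403 can be sketched as a simple check. The sketch assumes that the scores of the top-ranked results are aggregated by their mean before comparison, which is one possible reading of the step; the function name and the sample scores are hypothetical, and the default threshold of 0.2 is the example value from the text.

```python
def needs_reoptimization(theoretical_scores, actual_scores, threshold=0.2):
    """Return True when the mean actual and mean theoretical ranking scores of the
    top-ranked results differ by more than the threshold."""
    mean_theoretical = sum(theoretical_scores) / len(theoretical_scores)
    mean_actual = sum(actual_scores) / len(actual_scores)
    return abs(mean_actual - mean_theoretical) > threshold


# Hypothetical predicted and actual transaction conversion rates of top-ranked results.
drifted = needs_reoptimization([0.5, 0.4], [0.1, 0.2])        # gap of about 0.3
consistent = needs_reoptimization([0.05, 0.04], [0.04, 0.03])  # gap of about 0.01
```

When the check returns True, the final weights would be retrained on more recent historical transaction data as described above.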
In the search result ranking optimization method, the optimal weight value is determined or the determined weight value is optimized by using a certain time point and transaction data after the time point, namely, the ranking mode of the relatively optimized search result is determined by using real historical transaction data or the ranking mode of the existing search result is optimized, so that the ranking result is more objective and accurate.
Referring to fig. 5, an embodiment of the search result ranking system of the present application is shown, which includes an original feature set obtaining module 10, an effective feature extracting module 20, an effective feature weight determining module 30, and a ranking module 40.
The original feature set obtaining module 10 is configured to obtain an original feature set, where the original features include preset features that may affect the ranking of the search results.
The effective feature extraction module 20 is configured to extract effective features from the original feature set based on the historical transaction data, where the effective features are features that are determined according to the historical transaction data and can affect the ranking of the search results. Preferably, the effective feature extraction module comprises a test product selection submodule, a feature value calculation submodule and a comparison submodule. The test product selection submodule is used for selecting two groups of test products based on historical transaction data, wherein one group consists of products with transaction records and the other of products without transaction records. The feature value calculation submodule is used for extracting the related data of the two groups of test products in a certain time period from the historical transaction data and calculating the feature value of each original feature of the two groups of test products from the related data. The comparison submodule is used for comparing the difference between the feature values of the same original feature of the two groups of test products and, if the difference exceeds a threshold value, selecting the original feature as an effective feature.
The effective feature weight determining module 30 is configured to determine an initial weight of each effective feature based on the historical transaction data, and to train the initial weight by using the historical transaction data and a predetermined training model to obtain a final weight.
The sorting module 40 is configured to sort the search results based on the final weight of the effective features.
Referring to fig. 6, a first embodiment of the search result ranking optimization system of the present application is shown, which includes a candidate weight value obtaining module 61, a theoretical sorting score calculating module 63, a transaction data acquisition module 65, an actual sorting score calculating module 67, and a final weight determining module 69.
The candidate weight value obtaining module 61 is configured to obtain each group of candidate weight values of the effective features of the search results.
The theoretical sorting score calculating module 63 is configured to calculate the theoretical sorting scores of the search results at a certain predetermined time point using each group of candidate weight values, and to sort the search results according to the theoretical sorting scores to obtain each group of sorting results.
The transaction data acquisition module 65 is configured to acquire, for each group of sorting results, the predetermined number of top-ranked search results, and to acquire the transaction data of those search results after the predetermined time point.
The actual sorting score calculating module 67 is configured to calculate, according to the transaction data, the actual sorting scores of the predetermined number of top-ranked search results in each group of sorting results.
The final weight determining module 69 is configured to select, as the final weight values of the effective features, the candidate weight values corresponding to the group of sorting results with the highest actual sorting score.
Referring to fig. 7, a second embodiment of the search result ranking optimization system of the present application is shown, which includes a theoretical ranking score calculating module 71, an actual ranking score calculating module 73, and an optimizing module 75.
The theoretical sorting score calculating module 71 is configured to obtain a sorting result ranked at a certain predetermined time point according to the theoretical sorting scores of the search results, where the theoretical sorting score is obtained from the final weights of the effective features and the actual effective feature values of each search result.
The actual sorting score calculating module 73 is configured to obtain the transaction data, after the predetermined time point, of the predetermined number of top-ranked search results in the sorting result, and to calculate the actual sorting scores of those search results according to the transaction data.
The optimizing module 75 is configured to compare the actual sorting score with the theoretical sorting score and, if the difference between them is greater than a threshold, to optimize the final weights of the effective features.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The search result ranking method and system, and the search result ranking optimization method and system provided by the present application are introduced in detail above, and a specific example is applied in the text to explain the principle and implementation of the present application, and the description of the above embodiment is only used to help understand the method and core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (15)
1. A method for ranking search results, comprising the steps of:
acquiring an original feature set, wherein the original feature comprises preset features which may influence the sequencing of search results;
extracting effective features from the original feature set based on historical transaction data, wherein the effective features refer to features which can influence the sequencing of search results and are determined according to the historical transaction data;
determining initial weights of all effective features based on historical transaction data, and training the initial weights by using the historical transaction data and a preset training model to obtain final weights;
ranking the search results based on the final weight of the valid features.
2. The method of search result ranking according to claim 1 wherein said extracting valid features from a raw set of features based on historical transactional data comprises:
selecting two groups of test products based on historical transaction data, wherein one group of test products is products with transaction records, and the other group of test products is products without transaction records;
extracting relevant data of the two groups of test products in a certain time period from historical transaction data respectively, and calculating characteristic values of original characteristics of the two groups of test products by using the relevant data;
and comparing the difference value of the characteristic values of the same original characteristics of the two groups of test products, and if the difference value exceeds a threshold value, selecting the original characteristics as effective characteristics.
3. The method of search result ranking according to claim 2 wherein said extracting valid features from a raw set of features based on historical transactional data comprises:
extracting transaction data in a preset time period from historical transaction data, and calculating the transaction conversion rate of each product in the preset time period;
selecting two groups of products with the difference value of the transaction conversion rate larger than a threshold value as test products;
extracting transaction data of the two groups of test products in a certain time period after the preset time period from historical transaction data, and calculating a characteristic value of each original characteristic in an original characteristic set of the two groups of test products;
and comparing the difference value of the characteristic values of the same original characteristics of the two groups of test products, and if the difference value exceeds a threshold value, selecting the original characteristics as effective characteristics.
4. The method of claim 1, wherein determining an initial weight for each valid feature based on historical transactional data and training the initial weights using the historical transactional data and a training model to obtain final weights comprises:
determining an initial weight of the valid features;
substituting the historical transaction data and the initial weight into a preset training model, and calculating theoretical data;
and comparing the theoretical data with the actual data, if the difference between the theoretical data and the actual data is within a preset range, determining the initial weight as the final weight of the effective features, and otherwise, returning to the step of determining the initial weight of the effective features.
5. The method of claim 1, wherein ranking search results based on the final weight of the valid features comprises:
determining an actual effective characteristic value of the search result;
calculating the predicted transaction conversion rate of the search result based on the final weight of the effective characteristic and the actual effective characteristic value;
and sorting the search results according to the predicted transaction conversion rate.
6. A search result ranking optimization method is characterized by comprising the following steps:
respectively acquiring each group of alternative weight values of the effective characteristics of the search results; the effective features are extracted from an original feature set based on historical transaction data, and the effective features refer to features which can influence the sequencing of search results and are determined according to the historical transaction data;
respectively adopting each alternative weight value to calculate a theoretical sorting score of the search result at a certain preset time point, and sorting the search results according to the theoretical sorting score to obtain each group of sorting results;
for each group of sorting results, acquiring a preset number of top-ranked search results, and acquiring transaction data of those search results after the preset time point;
calculating, according to the transaction data, actual sorting scores of the preset number of top-ranked search results in each group of sorting results;
and selecting the alternative weight values corresponding to a group of sorting results with the highest actual sorting scores as the final weight values of the effective features.
7. The search result ranking optimization method according to claim 6, wherein the theoretical ranking score is a predicted value of a single feature or of a combination of features, and the actual ranking score is the actual value of the single feature or combination of features corresponding to the theoretical ranking score.
8. The search result ranking optimization method of claim 7, wherein the theoretical ranking score is a predicted transaction conversion rate, and the actual ranking score is an actual transaction conversion rate; or
the theoretical ranking score is a predicted good rating, and the actual ranking score is an actual good rating.
9. The method according to any one of claims 6 to 8, wherein the selecting the candidate weight value corresponding to the group of ranking results with the highest actual ranking score as the final weight value of the effective feature comprises:
and selecting the alternative weight values corresponding to a group of sorting results with the highest actual sorting score sum or average value as the final weight values of the effective features.
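The selection among alternative weight groups in claims 6 to 9 can be illustrated as below. The sketch assumes a weighted sum as the theoretical sorting score and the mean observed conversion rate of the top-ranked results as the actual sorting score; the function name `choose_weights` and all data are hypothetical.

```python
def choose_weights(candidate_weights, results, actual_rate, top_n):
    """Return the candidate weight group whose top-N results had the highest mean actual score.

    candidate_weights: list of dicts mapping feature -> weight
    results: dict mapping result id -> feature values at the preset time point
    actual_rate: dict mapping result id -> rate observed after the preset time point
    """
    def theoretical_score(weights, features):
        # Assumed scoring form: plain weighted sum of the effective features.
        return sum(weights[f] * features[f] for f in weights)

    best_weights, best_mean = None, float("-inf")
    for weights in candidate_weights:
        ranked = sorted(
            results,
            key=lambda rid: theoretical_score(weights, results[rid]),
            reverse=True,
        )
        top = ranked[:top_n]
        mean_actual = sum(actual_rate[rid] for rid in top) / len(top)
        if mean_actual > best_mean:
            best_weights, best_mean = weights, mean_actual
    return best_weights

# Hypothetical search results, observed rates, and two alternative weight groups.
results = {
    "A": {"click_rate": 0.9, "good_rating": 0.1},
    "B": {"click_rate": 0.2, "good_rating": 0.8},
    "C": {"click_rate": 0.5, "good_rating": 0.5},
}
actual_rate = {"A": 0.02, "B": 0.10, "C": 0.06}
candidates = [
    {"click_rate": 1.0, "good_rating": 0.0},
    {"click_rate": 0.0, "good_rating": 1.0},
]
best = choose_weights(candidates, results, actual_rate, top_n=2)
```

Here the first group ranks A and C on top (mean actual rate 0.04) while the second ranks B and C on top (mean actual rate 0.08), so the second group is chosen. Replacing the mean with a sum would correspond to the other alternative named in claim 9.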
10. A search result ranking optimization method is characterized by comprising the following steps:
obtaining a sorting result sorted according to a theoretical sorting score of a search result at a certain preset time point, wherein the theoretical sorting score is obtained according to the final weight of the effective features and the actual effective feature value of each search result; the effective features are extracted from an original feature set based on historical transaction data, and the effective features refer to features which can influence the sequencing of search results and are determined according to the historical transaction data; the final weight of the effective features is obtained by determining the initial weight of each effective feature based on historical transaction data and training the initial weight by using the historical transaction data and a preset training model;
acquiring transaction data, after the preset time point, of a preset number of top-ranked search results in the sorting result, and calculating an actual sorting score of those search results according to the transaction data;
and comparing the actual sorting score with the theoretical sorting score, and optimizing the final weight of the effective features if the difference value of the actual sorting score and the theoretical sorting score is greater than a threshold value.
11. The search result ranking optimization method of claim 10, wherein the theoretical ranking score is a predicted transaction conversion rate, and the actual ranking score is an actual transaction conversion rate; or
the theoretical ranking score is a predicted good rating, and the actual ranking score is an actual good rating.
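The check in claims 10 and 11 may be sketched as follows. The sketch assumes the difference is measured as the mean absolute gap between predicted and actual rates over the top-ranked results; the function name `needs_reoptimization`, the threshold, and the data are hypothetical, and the optimization step itself is left out.

```python
def needs_reoptimization(predicted, actual, top_n, threshold):
    """Return True when the mean gap between actual and predicted scores of the top-N results exceeds threshold.

    predicted: dict mapping result id -> theoretical sorting score at the preset time point
    actual: dict mapping result id -> actual sorting score observed afterwards
    """
    top = sorted(predicted, key=predicted.get, reverse=True)[:top_n]
    gap = sum(abs(actual[rid] - predicted[rid]) for rid in top) / len(top)
    return gap > threshold

# Hypothetical predicted and observed conversion rates.
predicted = {"A": 0.10, "B": 0.08, "C": 0.02}
actual = {"A": 0.03, "B": 0.07, "C": 0.02}
flag = needs_reoptimization(predicted, actual, top_n=2, threshold=0.03)
```

For the two top-ranked results the mean gap is about 0.04, which exceeds the assumed threshold of 0.03, so the final weights would be flagged for optimization.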
12. A search result ranking system, comprising:
the original feature set acquisition module is used for acquiring an original feature set, wherein the original features comprise preset features which can influence the sorting of search results;
the effective feature extraction module is used for extracting effective features from the original feature set based on historical transaction data, wherein the effective features refer to features which can influence the sequencing of the search results and are determined according to the historical transaction data;
the effective characteristic weight determining module is used for determining the initial weight of each effective characteristic based on historical transaction data and training the initial weight by utilizing the historical transaction data and a preset training model to obtain a final weight;
and the ranking module is used for ranking the search results based on the final weight of the effective characteristics.
13. The search result ranking system of claim 12, wherein the effective feature extraction module comprises:
the test product selection submodule is used for selecting two groups of test products based on historical transaction data, wherein one group of test products is products with transaction records, and the other group of test products is products without transaction records;
the feature value calculation submodule is used for respectively extracting relevant data of the two groups of test products in a certain time period from historical transaction data and calculating feature values of the original features of the two groups of test products by using the relevant data;
and the comparison submodule is used for comparing the difference value of the characteristic values of the same original characteristics of the two groups of test products, and if the difference value exceeds a threshold value, the original characteristics are selected as effective characteristics.
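The module arrangement of claims 12 and 13 could be composed as in the sketch below. Each module is modeled as a plain callable, and the ranking module is assumed to score by a weighted sum; the class name `RankingSystem`, its field names, and the stub modules are hypothetical and stand in for the acquisition, extraction, and weight determination modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class RankingSystem:
    """Hypothetical composition of the claimed modules as interchangeable callables."""
    acquire_features: Callable[[], List[str]]                # original feature set acquisition
    extract_effective: Callable[[List[str]], List[str]]      # effective feature extraction
    determine_weights: Callable[[List[str]], Dict[str, float]]  # effective feature weight determination

    def rank(self, results):
        """Ranking module: order results by the weighted sum of effective features (assumed form)."""
        weights = self.determine_weights(self.extract_effective(self.acquire_features()))

        def score(result):
            return sum(w * result["features"].get(f, 0.0) for f, w in weights.items())

        return sorted(results, key=score, reverse=True)

# Stub modules for illustration only.
system = RankingSystem(
    acquire_features=lambda: ["click_rate", "title_length"],
    extract_effective=lambda feats: [f for f in feats if f == "click_rate"],
    determine_weights=lambda feats: {f: 1.0 for f in feats},
)
results = [
    {"id": "A", "features": {"click_rate": 0.1}},
    {"id": "B", "features": {"click_rate": 0.4}},
]
order = [r["id"] for r in system.rank(results)]
```

Passing the modules in as callables keeps each one replaceable, for example swapping the stub extraction step for a two-group comparison, without changing the ranking module.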
14. A search result ranking optimization system, comprising:
the alternative weight value acquisition module is used for respectively acquiring each group of alternative weight values of the effective characteristics of the search results; the effective features are extracted from an original feature set based on historical transaction data, and the effective features refer to features which can influence the sequencing of search results and are determined according to the historical transaction data;
the theoretical sorting score calculating module is used for calculating the theoretical sorting scores of the search results at a certain preset time point by adopting the alternative weight values respectively, and sorting the search results according to the theoretical sorting scores to obtain various groups of sorting results;
the transaction data acquisition module is used for acquiring, for each group of sorting results, a preset number of top-ranked search results, and acquiring transaction data of those search results after the preset time point;
the actual sorting score calculating module is used for calculating, according to the transaction data, the actual sorting scores of the preset number of top-ranked search results in each group of sorting results;
and the final weight determining module is used for selecting the alternative weight values corresponding to the group of sorting results with the highest actual sorting scores as the final weight values of the effective features.
15. A search result ranking optimization system, comprising:
the theoretical sorting score calculating module is used for obtaining sorting results sorted according to the theoretical sorting scores of the search results at a certain preset time point, and the theoretical sorting scores are obtained according to the final weight of the effective features and the actual effective feature values of the search results; the effective features are extracted from an original feature set based on historical transaction data, and the effective features refer to features which can influence the sequencing of search results and are determined according to the historical transaction data; the final weight of the effective features is obtained by determining the initial weight of each effective feature based on historical transaction data and training the initial weight by using the historical transaction data and a preset training model;
the actual sorting score calculating module is used for acquiring transaction data, after the preset time point, of a preset number of top-ranked search results in the sorting result, and calculating the actual sorting scores of those search results according to the transaction data;
and the optimization module is used for comparing the actual sorting score with the theoretical sorting score, and optimizing the final weight of the effective features if the difference value of the actual sorting score and the theoretical sorting score is greater than a threshold value.
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1193192A (en) | 2014-09-12 |
| HK1193192B (en) | 2018-08-31 |
Similar Documents
| Publication | Title |
|---|---|
| TWI554895B (en) | Search results sorting methods and systems, search results sorting optimization methods and systems |
| JP5693746B2 (en) | Product information ranking |
| US9117006B2 (en) | Recommending keywords |
| US9646079B2 (en) | Method and apparatus for identifiying similar questions in a consultation system |
| US10452662B2 (en) | Determining search result rankings based on trust level values associated with sellers |
| US8250066B2 (en) | Search results ranking method and system |
| CN104699725B (en) | data search processing method and system |
| US8893012B1 (en) | Visual indicator based on relative rating of content item |
| US20090164895A1 (en) | Extracting semantic relations from query logs |
| US20190311395A1 (en) | Estimating click-through rate |
| US20120185359A1 (en) | Ranking of query results based on individuals' needs |
| US20180308152A1 (en) | Data Processing Method and Apparatus |
| WO2014107682A1 (en) | Method and apparatus for generating webpage content |
| CN112232933A (en) | House source information recommendation method, device, equipment and readable storage medium |
| US10409818B1 (en) | Populating streams of content |
| CN114820123A (en) | Group purchase commodity recommendation method, device, equipment and storage medium |
| CN114282976B (en) | Vendor recommendation method and device, electronic equipment and medium |
| US8700625B1 (en) | Identifying alternative products |
| CN103514187B (en) | Method and device for providing search results |
| JP6160018B1 (en) | Information analysis apparatus, information analysis method, and information analysis program |
| HK1193192B (en) | Method and system for sorting search results, and method and system for optimizing search result sorting |
| HK1193192A (en) | Method and system for sorting search results, and method and system for optimizing search result sorting |
| CN119204767A (en) | Inventory management evaluation method, apparatus, device, medium and program product |
| CN107730369B (en) | Object feature processing method and device and electronic equipment |
| CN102737059B (en) | For determining the method for the accuracy information of resource description information, device and equipment |