Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs, unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the network architecture shown in FIG. 1, the e-commerce platform 82 is deployed on the Internet to provide corresponding services to its users, and the devices 80 of merchant users and the devices 81 of consumer users of the e-commerce platform 82 are likewise connected to the Internet to use the services provided by the e-commerce platform.
The exemplary e-commerce platform 82 matches supply and demand for products and/or services to the public by means of an Internet infrastructure. In the e-commerce platform 82, the products and/or services are presented as merchandise information; for simplicity of description, the concepts of merchandise, products, and the like are used in the present application to refer to the products and/or services in the e-commerce platform 82, which may specifically be physical products, digital products, tickets, service subscriptions, other offline fulfillment services, and the like.
In practice, each participating entity may access the e-commerce platform 82 in the identity of a user, and may use the various online services provided by the e-commerce platform 82 to participate in the business activities realized by it. These entities may be natural persons, legal persons, social organizations, and the like. Corresponding to the merchant and consumer roles in commerce, the users of the e-commerce platform 82 fall into two broad categories: merchant users and consumer users. An entity may use the online services of the e-commerce platform 82 in the identity of a merchant user, or in the identity of a consumer user, that is, a real or potential consumer of a merchant user. In actual business activities, the same entity may act both in the identity of a merchant user and in the identity of a consumer user, which may be flexibly understood.
The infrastructure on which the e-commerce platform 82 is deployed mainly comprises a background architecture and front-end equipment. The background architecture runs various online services through a service cluster, including platform-side middleware or front-end services, consumer-facing services, merchant-facing services, and the like, to enrich and perfect the service functions of the platform. The front-end equipment mainly comprises the terminal devices used by users as clients to access the e-commerce platform 82, including but not limited to various mobile terminals, personal computers, point-of-sale devices, and the like. For example, a merchant user may enter commodity information for his online store through his terminal device 80, or generate his commodity information by using an interface opened by the e-commerce platform; a consumer user may access a webpage of the online store implemented by the e-commerce platform 82 through his terminal device 81, trigger a shopping process via a shopping key provided on the webpage, and invoke the various online services provided by the e-commerce platform 82 during the shopping process, thereby placing an order.
In some embodiments, the e-commerce platform 82 may be implemented by a processing facility including a processor and memory that stores a set of instructions that, when executed, cause the e-commerce platform 82 to perform e-commerce and support functions in accordance with the present application. The processing facility may be part of a server, client, network infrastructure, mobile computing platform, cloud computing platform, fixed computing platform, or other computing platform, and may provide electronic connectivity among components of the e-commerce platform 82 such as merchant devices, payment gateways, application developers, marketing channels, transport providers, client devices, point-of-sale devices, and the like.
The e-commerce platform 82 may be implemented as online services such as cloud computing services, software as a service (SaaS), infrastructure as a service (IaaS), platform as a service (PaaS), desktop as a service (DaaS), hosted software as a service, mobile backend as a service (MBaaS), information technology management as a service (ITMaaS), and the like. In some embodiments, the various features of the e-commerce platform 82 may be implemented so as to operate on a variety of platforms and operating systems; for example, an administrator user of an online store enjoys the same or similar functionality whether on iOS, Android, HarmonyOS, web pages, or the like.
The e-commerce platform 82 may implement a respective independent station for each merchant to run his respective online store, providing the merchant with a respective commerce management engine instance with which to establish, maintain, and run one or more of his online stores in one or more independent stations. The commerce management engine instance can be used for content management, task automation, and data management of one or more online stores, and the various specific business processes of an online store can be configured through interfaces, built-in components, and the like, to support the realization of business activities. The independent station is an infrastructure of the e-commerce platform 82 with cross-border service functionality, based on which merchants can maintain their online stores in a more centralized and autonomous manner. Independent stations typically have merchant-specific domain names and storage space, with relative independence between different independent stations, and the e-commerce platform 82 may provide standardized or personalized technical support for a vast array of independent stations, so that merchant users may customize their own adapted commerce management engine instances and use them to maintain one or more online stores owned by them.
Background configuration and maintenance of the online store may be implemented by the merchant user logging in to his commerce management engine instance with an administrator identity. Supported by the various online services provided by the infrastructure of the e-commerce platform 82, the merchant user may configure various functions of his online store, consult various data, and so on. For example, the merchant user may manage various aspects of his online store, such as viewing recent activities of the online store, updating online store inventory, managing orders, recent access activities, total order activities, and the like; the merchant user may also, by acquiring reports or metrics, view more detailed information about the business and about visitors to the merchant's online store, such as a sales summary of the merchant's overall business, or specific sales and participation data of the active sales marketing channels.
The e-commerce platform 82 may provide a communications facility and an associated merchant interface for electronic communications and marketing, for example utilizing an electronic message aggregation facility to collect and analyze communication interactions between merchants, consumers, merchant devices, customer devices, point-of-sale devices, and the like, so as to aggregate and analyze the communications, for example to increase the potential for product sales. For example, a consumer may have a problem with a product, which may create a dialogue between the consumer and the merchant (or an automated processor-based agent acting on behalf of the merchant); the communication facility is responsible for the interaction and provides the merchant with an analysis of how to increase the probability of a sale.
In some embodiments, an application program suitable for installation on a terminal device may be provided to serve the access requirements of different users, so that various users can access the e-commerce platform 82 on the terminal device by running the application program, for example the merchant background module of an online store in the e-commerce platform 82. To support implementing business activities through these functions, the e-commerce platform 82 may implement the various supporting functions as middleware or online services and open corresponding interfaces, and a toolkit corresponding to the interface-accessed functions may then be embedded into the application program to implement function expansion and task realization. The commerce management engine may include a series of basic functions and expose those functions through APIs to online services and/or applications, which use the corresponding functions by remotely calling the corresponding APIs.
Supported by the various components of the commerce management engine instance, the e-commerce platform 82 may provide online shopping functionality, enabling merchants to establish contact with customers in a flexible and transparent manner. Consumer users may purchase items online, create merchandise orders, provide delivery addresses for the items in the merchandise orders, and complete payment confirmation of the merchandise orders. The merchant may then review and fulfill or cancel the order. An audit component carried by the commerce management engine instance may enforce compliant use of the business process to ensure that an order is suitable for fulfillment before it is actually fulfilled. Orders can sometimes be fraudulent and require verification (e.g., identification card checking); a payment method that requires the merchant to wait to ensure funds have been received can act to prevent such risk, and so on. The order risk may be generated by fraud detection tools, submitted by third parties through an order risk API, or the like. Before fulfillment, the merchant may need to acquire payment information, or wait to receive payment information, in order to mark the order as paid before preparing to deliver the product. Corresponding examinations of this kind can be made, and the audit flow may be implemented by a fulfillment component.
By means of the fulfillment component, merchants can review and adjust the job and trigger related fulfillment services, such as: a manual fulfillment service that can be used when a merchant picks and packages a product in a box, purchases a shipping label and enters its tracking number, or simply marks an item as fulfilled; a custom fulfillment service that can be defined to send a notification email; an API fulfillment service that can trigger a third-party application to create a fulfillment record at the third party; a legacy fulfillment service that can trigger a custom API call from the commerce management engine to a third party; and a gift card fulfillment service that can generate a number and activate the gift card. Merchants may print shipping slips using an order printer application. The fulfillment process may be performed when the items are packaged in boxes and ready for shipment, tracking, delivery, and verification by the consumer, and so on.
It can be seen that the services provided by the e-commerce platform expand around products as their core: the corresponding commodity data are basic data of the e-commerce platform, commodity information is provided through the commodity data, and the mining and utilization of the commodity data are the basis for realizing various technical services. The user transaction data and commodity data of the e-commerce platform provide basic support for the operation of the data processing system. Accordingly, the data processing system may run in any one or more servers of the cluster of the e-commerce platform, so as to realize various functions by utilizing the various commodity data provided by the e-commerce platform.
The method for dynamically adjusting a recommendation strategy of the present application may be programmed into a computer program product and deployed in a client or a server for operation. For example, in an exemplary application scenario of the present application, the method may be deployed in a server of an e-commerce customer service platform, so that the method can be executed by accessing an interface opened after the computer program product is run and by human-machine interaction with the process of the computer program product through a graphical user interface.
Referring to FIG. 2, the method for dynamically adjusting a recommendation strategy of the present application, in an exemplary embodiment thereof, includes the following steps:
Step S1100, determining a target recommendation scene from the various recommendation scenes associated with an independent station store, and acquiring a recommendation strategy set correspondingly constructed for the target recommendation scene based on a recommendation stage strategy library, the recommendation strategy set comprising a plurality of recommendation strategies;
In order to determine, from the numerous commodities sold in an independent station store, a relatively small number of commodities to be recommended to a user, the implemented recommendation process may be divided into a plurality of stages, typically four stages in the order of recall, coarse ranking, fine ranking, and re-ranking. Other divisions are possible, for example three stages in the order of recall, fine ranking, and filtering, which may be determined by those skilled in the art according to the needs of the business.
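The staged narrowing described above can be sketched as a simple funnel in Python. This is an illustrative assumption only: the stage functions, scoring formulas, and item counts below stand in for whatever models each stage actually uses, and are not prescribed by this embodiment.

```python
# Hypothetical sketch of a staged recommendation pipeline: each stage
# narrows the candidate set produced by the previous stage.

def recall(catalog, k=100):
    # Recall: cheaply fetch a broad candidate pool from the full catalog.
    return catalog[:k]

def coarse_rank(candidates, k=50):
    # Coarse ranking: trim candidates with a lightweight score.
    return sorted(candidates, key=lambda item: item["score"], reverse=True)[:k]

def fine_rank(candidates, k=20):
    # Fine ranking: re-score survivors with a heavier model (stubbed here).
    return sorted(candidates, key=lambda item: item["score"] * item["quality"],
                  reverse=True)[:k]

def re_rank(candidates, k=10):
    # Re-ranking: apply rules such as diversity or novelty (stubbed here).
    return candidates[:k]

def run_pipeline(catalog):
    items = catalog
    for stage in (recall, coarse_rank, fine_rank, re_rank):
        items = stage(items)
    return items

catalog = [{"id": i, "score": (i * 7) % 13, "quality": 1 + i % 3}
           for i in range(500)]
final_list = run_pipeline(catalog)
print(len(final_list))  # 10
```

A three-stage variant (recall, fine ranking, filtering) would simply use a different stage tuple in `run_pipeline`.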
The preset recommendation stage strategy library comprises at least one stage strategy for each stage of the recommendation process, together with the recommendation scenes to which each stage strategy is applicable. A stage strategy may be set for each stage: for example, store high-order recall, user-region recall, similar-commodity-information recall, user-behavior recall, and the like may be set for the recall stage; various fine-ranking models such as DeepFM, GBDT+LR, and ListNet, with prediction tasks such as CTR and/or CVR and/or CTCVR, may be set for the fine-ranking stage; and novelty-based ranking, diversity-based ranking, presentation-time-based ranking, and the like may be set for the re-ranking stage. Further, each stage strategy in the recommendation stage strategy library may be provided with its applicable recommendation scenes. For example, among the stage strategies set for the recall stage, consider a recommendation scene of a click action by the user: since the user clicks a commodity out of interest, in order to realize a recommendation meeting the user's expectation, the stage strategies applicable to this recommendation scene may be similar-commodity-title recall, similar-commodity-category recall, similar-commodity-picture recall, similar-commodity-information recall, and the like, so as to recall commodities similar to the commodity clicked by the user. For a recommendation scene of the home page, where the recommendation displays commodities the user may be interested in, the applicable stage strategies may be user-behavior recall, collaborative filtering recall, store high-click recall, store high-purchase recall, store high-order recall, user-region recall, and the like, so as to recall commodities the user is interested in. For the fine-ranking stage and the filtering stage, the stage strategies of these stages are generally universal and are commonly used for various recommendation scenes. Those skilled in the art will be able to set up the recommendation stage strategy library as needed for the business based on the disclosure herein.
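One possible in-memory representation of such a library is a mapping from pipeline stage to the stage strategies available there, each annotated with the recommendation scenes it applies to. All names and the schema below are illustrative assumptions, not a prescribed data model; a strategy marked universal (scenes set to `None`) matches every scene, mirroring how fine-ranking strategies are described as generally universal.

```python
# Illustrative stage strategy library: stage -> available stage strategies,
# each tagged with its applicable recommendation scenes (None = universal).
STAGE_STRATEGY_LIBRARY = {
    "recall": [
        {"name": "similar_title_recall",    "scenes": {"item_click"}},
        {"name": "similar_category_recall", "scenes": {"item_click"}},
        {"name": "user_behavior_recall",    "scenes": {"home_page"}},
        {"name": "store_top_order_recall",  "scenes": {"home_page"}},
    ],
    "fine_rank": [
        {"name": "DeepFM",  "scenes": None},
        {"name": "GBDT+LR", "scenes": None},
        {"name": "ListNet", "scenes": None},
    ],
    "re_rank": [
        {"name": "novelty_rerank",   "scenes": None},
        {"name": "diversity_rerank", "scenes": None},
    ],
}

def candidates_for_stage(stage, scene):
    """Return the names of stage strategies applicable to a given scene."""
    return [
        s["name"]
        for s in STAGE_STRATEGY_LIBRARY.get(stage, [])
        if s["scenes"] is None or scene in s["scenes"]
    ]

print(candidates_for_stage("recall", "home_page"))
```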
In the independent station store, recommendation scenes are preconfigured for different commodity recommendation events, where a recommendation scene may be the page and/or behavior that triggers the corresponding commodity recommendation event, or may be an activity formulated based on the operation conditions and/or marketing conditions of the independent station store. In addition, an event identifier is preset for each commodity recommendation event; the event identifier uniquely refers to the corresponding commodity recommendation event and distinguishes it from other commodity recommendation events, which can be flexibly realized by those skilled in the art. That is, each of the recommendation scenes associated with the independent station store is a recommendation scene that the independent station store can serve, and any one of these recommendation scenes can be determined as the target recommendation scene.
It is to be understood that the recommendation stage strategy library is used for constructing recommendation strategies for recommending commodities in a recommendation scene. Specifically, all the stage strategies in the recommendation stage strategy library that are applicable both to a stage of the recommendation process and to the target recommendation scene are determined as candidate stage strategies. Then, either every stage of the recommendation process is selected and the candidate stage strategies of the stages are combined in stage order to obtain a plurality of candidate recommendation strategies, a single candidate recommendation strategy thus comprising one candidate stage strategy for each stage; and/or only part of the stages of the recommendation process are selected, for example, where the stages of the recommendation process are recall, coarse ranking, fine ranking, and re-ranking, only recall and re-ranking are selected (those skilled in the art may set which part of the stages to select according to the service requirements), and the candidate stage strategies of the selected stages are combined in stage order to obtain a plurality of candidate recommendation strategies, a single candidate recommendation strategy thus comprising one candidate stage strategy for each selected stage. Further, a preset effect prediction model is adopted to predict, for each obtained candidate recommendation strategy, the recommendation effect score corresponding to its use in the target recommendation scene; all candidate recommendation strategies are ranked in descending order of predicted recommendation effect score, and a plurality of top-ranked candidate recommendation strategies are screened out and each added as a recommendation strategy to the recommendation strategy set, the recommendation strategy set being initialized in advance as empty. The number of candidate recommendation strategies screened out can be set by those skilled in the art according to requirements.
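Combining one candidate stage strategy per stage, in stage order, amounts to taking a Cartesian product over the per-stage candidate lists and then keeping the top-k combinations by predicted score. The sketch below assumes a stand-in scoring function (`predict_effect_score`, a deterministic stub) in place of the trained effect prediction model described in the text.

```python
# Hedged sketch of constructing the recommendation strategy set:
# Cartesian product of per-stage candidate stage strategies, scored by a
# stand-in effect-prediction model, keeping the top-k combinations.
from itertools import product

def predict_effect_score(strategy):
    # Stub standing in for the converged effect prediction model: any
    # function mapping a strategy tuple to a comparable score would do.
    return sum(len(name) for name in strategy) % 17

def build_strategy_set(stage_candidates, top_k=3):
    # stage_candidates: ordered list of per-stage candidate strategy lists.
    combos = list(product(*stage_candidates))
    combos.sort(key=predict_effect_score, reverse=True)
    return combos[:top_k]

stage_candidates = [
    ["user_behavior_recall", "store_top_order_recall"],  # recall stage
    ["DeepFM", "GBDT+LR"],                               # fine-ranking stage
    ["diversity_rerank"],                                # re-ranking stage
]
strategy_set = build_strategy_set(stage_candidates)
print(len(strategy_set))  # 3
```

Selecting only part of the stages simply means passing fewer per-stage lists to `build_strategy_set`.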
The effect prediction model is trained in advance to a convergence state, thereby acquiring the ability to predict the recommendation effect score corresponding to commodity recommendation in the recommendation scene to which a recommendation strategy is applicable; the specific training process is disclosed in subsequent embodiments and is omitted here.
Step S1200, starting an iterative process, and, after each recommendation strategy in the recommendation strategy set has been applied online, determining the recommendation strategy with the highest actually measured recommendation effect score as a preferred recommendation strategy;
Each recommendation strategy in the recommendation strategy set is deployed online and applied to the commodity recommendation service. In one embodiment, each time a user of the independent station store triggers a commodity recommendation event associated with the target recommendation scene, the terminal device of the user, on which the independent station store is carried, constructs an online recommendation request carrying the event identifier of the commodity recommendation event and sends it to a server. When the server receives such online recommendation requests, it determines, in response to each request and according to the unique event identifier, the recommendation scene preset for the event, namely the target recommendation scene, and then distributes the online recommendation requests one by one, in turn, to the recommendation strategies in the recommendation strategy set, so that each recommendation strategy is called to respond to the same number of online recommendation requests on average. Taking a single online recommendation request as an example, the stage strategy corresponding to each stage of the recommendation process in the corresponding single recommendation strategy is called, and a commodity recommendation list formed of a relatively small number of recommended commodities is correspondingly determined from the numerous commodities of the independent station store and pushed to the user who triggered the online recommendation request, so as to respond to the online recommendation request.
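The one-by-one, in-turn distribution described here is a plain round-robin. A minimal sketch, assuming illustrative strategy names:

```python
# Round-robin dispatch of online recommendation requests: requests for the
# target scene are assigned to the strategies in the set one by one in
# turn, so each strategy serves the same number of requests on average.
from itertools import cycle

strategies = ["strategy_1", "strategy_2", "strategy_3"]  # illustrative names
dispatcher = cycle(strategies)

assignments = [next(dispatcher) for _ in range(9)]  # nine incoming requests
counts = {s: assignments.count(s) for s in strategies}
print(counts)  # each strategy serves 3 of the 9 requests
```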
After each recommendation strategy is invoked to respond to each online recommendation request allocated to it, whether the corresponding user performs a target user behavior on the recommended commodities in the commodity recommendation list is monitored, and, among all the recommended commodities in the commodity recommendation list, the total number of recommended commodities on which the user performed the target user behavior and the total number of recommended commodities on which the user did not perform the target user behavior are determined. When the total number of online recommendation requests responded to by each recommendation strategy reaches a preset threshold, then, for each recommendation strategy, the totals of recommended commodities on which users performed the target user behavior, determined after the strategy responded to each online recommendation request, are accumulated to obtain a total number of effective commodities, and the totals of recommended commodities on which users did not perform the target user behavior are likewise accumulated to obtain a total number of ineffective commodities; the ratio of the total number of effective commodities to the total number of ineffective commodities is used as the actually measured recommendation effect score of the recommendation strategy. The target user behavior may be any one or more of clicking, adding to a shopping cart, purchasing, and repurchasing. The preset threshold may be flexibly set by those skilled in the art.
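The bookkeeping above can be sketched as a small tracker: per strategy, accumulate how many recommended commodities did and did not receive a target user behavior, and expose the ratio once the request threshold is reached. The class and counter names are illustrative assumptions.

```python
# Minimal sketch of computing the actually measured recommendation effect
# score: ratio of acted-on to non-acted-on recommended commodities, defined
# only after the strategy has served enough requests.
from collections import defaultdict

class EffectTracker:
    def __init__(self, request_threshold):
        self.request_threshold = request_threshold
        self.requests = defaultdict(int)
        self.acted = defaultdict(int)      # items with a target user behavior
        self.not_acted = defaultdict(int)  # items without one

    def record(self, strategy, acted_items, total_items):
        # Record one served request: acted_items of total_items recommended
        # commodities received the target user behavior.
        self.requests[strategy] += 1
        self.acted[strategy] += acted_items
        self.not_acted[strategy] += total_items - acted_items

    def measured_score(self, strategy):
        # Undefined until the request threshold is reached.
        if self.requests[strategy] < self.request_threshold:
            return None
        return self.acted[strategy] / max(self.not_acted[strategy], 1)

tracker = EffectTracker(request_threshold=2)
tracker.record("A", acted_items=3, total_items=10)
tracker.record("A", acted_items=5, total_items=10)
print(tracker.measured_score("A"))  # 8/12
```

The `max(..., 1)` guard is an added safety measure for the edge case where every recommended commodity received the target behavior.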
After the actually measured recommendation effect scores of all the recommendation strategies in the recommendation strategy set are obtained, all the recommendation strategies in the recommendation strategy set are ranked in descending order of actually measured recommendation effect score, and the top-ranked recommendation strategy is screened out as the preferred recommendation strategy.
Step S1300, reconstructing new recommendation strategies of the target recommendation scene based on the recommendation stage strategy library, replacing all recommendation strategies other than the preferred recommendation strategy in the recommendation strategy set with the new recommendation strategies, and continuing the iterative process, wherein the predicted recommendation effect score of each new recommendation strategy is higher than the actually measured recommendation effect score of the preferred recommendation strategy;
It can be understood that, based on the recommendation stage strategy library, a plurality of candidate recommendation strategies applicable to the target recommendation scene, other than the recommendation strategies previously added to the recommendation strategy set, can be reconstructed. The preset effect prediction model is then adopted to predict, for each obtained candidate recommendation strategy, the recommendation effect score corresponding to its use in the target recommendation scene; all candidate recommendation strategies are ranked in descending order of predicted recommendation effect score, and a plurality of top-ranked candidate recommendation strategies whose predicted recommendation effect scores are higher than the actually measured recommendation effect score of the existing preferred recommendation strategy are screened out, each as a new recommendation strategy. All recommendation strategies other than the preferred recommendation strategy in the recommendation strategy set are then replaced with the new recommendation strategies, and the iterative process continues. The numbers of candidate recommendation strategies reconstructed and screened can be set by those skilled in the art according to requirements.
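One iteration of this replacement step can be sketched as follows, under illustrative assumptions: strategies are opaque identifiers, `predict` stands in for the effect prediction model, and `preferred_measured` is the preferred strategy's actually measured score. Returning `None` models the termination case where no candidate beats the preferred strategy.

```python
# Hypothetical sketch of one iteration of the replacement step: rebuild
# candidates, exclude those already tried, keep only candidates whose
# predicted score beats the preferred strategy's measured score, and form
# the next strategy set around the preferred strategy.
def iterate_strategy_set(all_candidates, tried, preferred, preferred_measured,
                         predict, top_k=2):
    fresh = [c for c in all_candidates if c not in tried]
    better = [c for c in fresh if predict(c) > preferred_measured]
    better.sort(key=predict, reverse=True)
    new_strategies = better[:top_k]
    if not new_strategies:
        return None  # iteration terminates; preferred becomes practical
    return [preferred] + new_strategies

predict = {"s1": 0.9, "s2": 0.4, "s3": 0.7, "s4": 0.2}.get
result = iterate_strategy_set(
    all_candidates=["s1", "s2", "s3", "s4"],
    tried={"s1", "s2"},
    preferred="s1",
    preferred_measured=0.5,
    predict=predict,
    top_k=2,
)
print(result)  # ['s1', 's3']: only s3's predicted score beats 0.5
```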
Step S1400, ending the iterative process when no new recommendation strategy can be constructed, and obtaining the latest preferred recommendation strategy of the target recommendation scene as the practical recommendation strategy for commodity recommendation in the target recommendation scene.
It can be understood that if, after all candidate recommendation strategies other than the recommendation strategies previously added to the recommendation strategy set have been reconstructed based on the recommendation stage strategy library, and the preset effect prediction model has been adopted to predict the recommendation effect scores of the obtained candidate recommendation strategies, all the predicted recommendation effect scores are lower than the actually measured recommendation effect score of the latest preferred recommendation strategy in the recommendation strategy set, then no new recommendation strategy can be constructed. At this point, no candidate is better than the latest preferred recommendation strategy in the recommendation strategy set, so the iterative process is ended, and the preferred recommendation strategy is used as the practical recommendation strategy for providing, in response to online recommendation requests, corresponding commodity recommendation lists to the users who triggered those requests, thereby providing the commodity recommendation service for users of the independent station store.
It will be appreciated from the above embodiments that the present application has various advantages over the prior art, including at least the following:
According to the present application, a target recommendation scene of an independent station store is adapted to in the online environment, and a corresponding recommendation strategy set is constructed and applied based on a recommendation stage strategy library. Through continuous iteration, the recommendation strategy with the highest actually measured recommendation effect score in the set is determined as the preferred recommendation strategy, the non-preferred recommendation strategies are dynamically replaced, and new recommendation strategies with better predicted effect are introduced, ensuring that, after the iteration ends, the latest preferred recommendation strategy is the most practical current recommendation strategy. In this way, the actual effect of various recommendation strategies can be dynamically tested online, so that the recommendation strategies are continuously explored until the optimal recommendation strategy is finally obtained; the real-time dynamic update of the recommendation strategy is ensured, its actual effect is reliable, and the characteristics of the independent station store and of market or user demand can be adapted to. Moreover, suitability to the recommendation scene is achieved, the accuracy of commodity recommendation by the independent station store in the recommendation scene is ensured, and the recommendation effect is improved.
In a further embodiment, step S1200, in which, after each recommendation strategy in the recommendation strategy set is applied online, the recommendation strategy with the highest actually measured recommendation effect score is determined as the preferred recommendation strategy, includes the following steps:
Step S1210, calling each recommendation strategy in the recommendation strategy set to respond to at least one online recommendation request associated with the target recommendation scene, and determining the actually measured recommendation effect score of each recommendation strategy;
In one embodiment, the server distributes all the received online recommendation requests associated with the target recommendation scene to each recommendation policy in the recommendation policy set one by one in turn, so that each recommendation policy is invoked to respond to the same number of online recommendation requests on average, and a corresponding stage policy applicable to each stage in the recommendation process in the corresponding single recommendation policy is invoked by taking the single online recommendation request as an example, and a commodity recommendation list formed by a plurality of relatively small quantities of recommended commodities is correspondingly determined from a plurality of commodities in an independent station shop and pushed to a user triggering the online recommendation request so as to respond to the online recommendation request. After each online recommendation request allocated by each recommendation strategy is responded by invoking each recommendation strategy, whether the corresponding user makes a target user action on the recommended commodities in the commodity recommendation list or not is monitored, the total number of recommended commodities corresponding to the target user action made by the user and the total number of recommended commodities corresponding to the target user action not made by the user are determined in all the recommended commodities in the commodity recommendation list. 
When the total number of online recommendation requests answered by each recommendation policy reaches a preset threshold, then, for each recommendation policy, the per-request totals of recommended commodities on which users performed the target user action are accumulated to obtain the total number of converted commodities, and the per-request totals of recommended commodities on which users did not perform the target user action are accumulated to obtain the total number of unconverted commodities; the ratio of the total number of converted commodities to the total number of unconverted commodities is taken as the measured recommendation effect score of the recommendation policy. The preset threshold may be set as desired by those skilled in the art.
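By way of illustration only, the scoring just described can be sketched as follows; the function name and the per-request tuple representation are assumptions made for this sketch, not part of the disclosure:

```python
def measured_score(responses):
    """Measured recommendation effect score of one policy.

    responses: list of (acted_on, not_acted_on) commodity counts, one
    tuple per answered online recommendation request.
    """
    converted = sum(c for c, _ in responses)      # target action performed
    unconverted = sum(u for _, u in responses)    # target action not performed
    # Score is the ratio of converted to unconverted commodities;
    # guard the degenerate case where every commodity was acted on.
    return converted / unconverted if unconverted else float(converted)
```

For example, two answered requests with counts (2, 8) and (3, 7) yield a score of 5/15.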
In other alternative embodiments, for each recommendation policy in the recommendation policy set, the measured recommendation effect score of the policy is obtained if it is a preferred recommendation policy, and its predicted recommendation effect score is obtained otherwise. All recommendation policies are then sorted by recommendation effect score from low to high, and the server distributes the received online recommendation requests associated with the target recommendation scene according to the rank of each policy, so that higher-ranked policies respond to more online recommendation requests; the specific allocation numbers and implementation may be varied flexibly by those skilled in the art. In this way, recommendation policies with low recommendation effect scores are given more invocation opportunities, which reduces prediction error.
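One possible rank-weighted dispatch, assuming (as this paragraph suggests) that lower-scored policies should receive more traffic so their scores can be measured rather than merely predicted, could look like the following sketch; the slot-count scheme is an illustrative assumption:

```python
import itertools

def build_dispatch_cycle(policies_with_scores):
    """Round-robin cycle in which lower-scored policies appear more often.

    policies_with_scores: list of (policy_name, effect_score) pairs.
    Returns an infinite iterator of policy names for request dispatch.
    """
    ranked = sorted(policies_with_scores, key=lambda p: p[1])  # low -> high
    slots = []
    for rank, (policy, _score) in enumerate(ranked):
        # Lowest-scored policy gets len(ranked) slots, highest gets 1;
        # any monotonically decreasing scheme would serve the same purpose.
        slots.extend([policy] * (len(ranked) - rank))
    return itertools.cycle(slots)
```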
Step S1220, invoking the two recommendation policies with the higher measured recommendation effect scores to each respond to at least one online recommendation request associated with the target recommendation scene, determining new measured recommendation effect scores for the two policies, and updating the original measured recommendation effect scores;
Further, all recommendation policies in the recommendation policy set are sorted by measured recommendation effect score from high to low, and the two top-ranked policies are screened out. All subsequently received online recommendation requests associated with the target recommendation scene are then redirected: instead of being distributed over the whole recommendation policy set, they are distributed one by one in turn to the two screened policies, so that the two policies respond to the same number of online recommendation requests. New measured recommendation effect scores for the two policies are then determined in the same way as before, and the original measured scores are updated accordingly.
Step S1230, determining the policy with the higher measured recommendation effect score of the two recommendation policies as the preferred recommendation policy.
That is, according to the latest measured recommendation effect scores of the two recommendation policies, the one with the higher score is screened out as the preferred recommendation policy.
In this embodiment, each recommendation policy in the recommendation policy set first responds to the same number of online recommendation requests, which guarantees every policy a fair opportunity to demonstrate its effect and yields balanced measured data, so that the effect of each policy can be evaluated accurately. By monitoring users' reactions to the commodity recommendation lists, the numbers of effective and ineffective recommendations of each policy can be counted precisely and the measured recommendation effect score computed; this scoring mechanism, grounded in user behavior feedback, keeps the recommendation optimization close to users' actual needs. Finally, the two-stage optimization, which coarsely screens out the two best-performing policies and then finely selects the better of the two, achieves higher overall optimization efficiency while preserving the accuracy and reliability of the result.
In a further embodiment, step S1300, reconstructing a new recommendation policy of the target recommendation scene based on the recommendation phase policy library, includes the following steps:
Step S1310, reconstructing at least one candidate recommendation policy corresponding to the target recommendation scene based on the stage policies, set for each stage of the recommendation process, in the recommendation phase policy library;
Based on the recommendation phase policy library, a plurality of candidate recommendation policies applicable to the target recommendation scene, other than the recommendation policies previously added to the recommendation policy set, may be reconstructed.
Step S1320, predicting the recommendation effect score of each candidate recommendation policy by using a preset effect prediction model;
The effect prediction model is trained in advance to a convergence state, thereby acquiring the ability to predict the recommendation effect score corresponding to commodity recommendation in the recommendation scene to which a recommendation policy applies; the specific training process is disclosed in subsequent embodiments and is omitted here.
Step S1330, when the predicted recommendation effect score of a candidate recommendation policy is higher than the existing measured recommendation effect score of the latest preferred recommendation policy, determining the candidate recommendation policy to be a new recommendation policy.
All candidate recommendation policies are sorted by predicted recommendation effect score from high to low, and the top-ranked candidates whose predicted scores exceed the measured recommendation effect score of the existing preferred recommendation policy are screened out and each taken as a new recommendation policy.
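As a minimal sketch of this screening step (the function name and the cutoff `k` are assumptions for illustration):

```python
def select_new_policies(candidates_predicted, preferred_measured, k=3):
    """Keep the top-k candidates whose predicted recommendation effect
    score beats the measured score of the current preferred policy.

    candidates_predicted: dict candidate_name -> predicted score.
    preferred_measured: measured score of the latest preferred policy.
    """
    better = [(p, s) for p, s in candidates_predicted.items()
              if s > preferred_measured]
    better.sort(key=lambda ps: ps[1], reverse=True)   # high -> low
    return [p for p, _ in better[:k]]
```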
In this embodiment, a plurality of candidate recommendation policies are first reconstructed, the effect prediction model is used to predict their recommendation effect scores, and the candidates whose predicted scores exceed the latest measured recommendation effect score are promoted to new recommendation policies, ensuring that potentially high-quality recommendation policies are mined accurately.
In a further embodiment, before step S1320 of predicting the recommendation effect score of each candidate recommendation policy with the preset effect prediction model, the method includes the following steps:
Step S2300, obtaining a training sample from a prepared training set together with its supervision label, wherein the training sample comprises a historical recommendation policy and the recommendation scene to which it applies, and the supervision label is the measured recommendation effect score of the historical recommendation policy for commodity recommendation in that recommendation scene;
It can be understood that the historical recommendation policies previously deployed online in each independent station store realized by the merchant platform, together with the recommendation scene each of them served, can be determined in advance; each such scene is regarded as applicable to the corresponding historical recommendation policy. In addition, after each historical recommendation policy was applied online to serve its applicable recommendation scene and provide commodity recommendation lists to users, its real-time feedback was determined by monitoring: for each historical recommendation policy, after each response to an online recommendation request triggered in its applicable scene, the totals of recommended commodities on which users performed the target user action were accumulated to obtain the total number of converted commodities, the totals on which users did not perform the target user action were accumulated to obtain the total number of unconverted commodities, and the ratio of the former to the latter was taken as the measured recommendation effect score of the historical recommendation policy.
Each historical recommendation policy is then associated with its applicable recommendation scene to form a single training sample, the supervision label of each training sample is set to the measured recommendation effect score for commodity recommendation in the scene to which the sample's historical recommendation policy applies, and all training samples with their supervision labels are collected into a training set ready for use.
Step S2310, inputting the training sample into the effect prediction model to extract corresponding deep semantic features and obtain an effect feature vector;
The effect prediction model may be a logistic regression model suitable for a classification task, such as an FM model, a GBDT+LR model, an MLP model, or an MLR model; the choice can be made flexibly by those skilled in the art.
The hidden layers of the effect prediction model extract, from the training sample, features relevant to the recommendation effect to obtain deep semantic features; in this process the features may be obtained through low-order and/or high-order feature interaction, sequence position feature interaction, abstract feature representation, and/or multi-modal feature fusion. An effect feature vector, i.e., the vectorized representation of the deep semantic features, is then output.
Step S2320, performing classification mapping on the effect feature vector to predict the corresponding predicted recommendation effect score;
The effect feature vector is mapped by classification, typically to a probability in the range between zero and one, where values closer to zero indicate the recommendation is less likely to succeed and values closer to one indicate it is more likely to succeed. The mapped probability value is taken as the predicted recommendation effect score.
Step S2330, calculating a loss value for the predicted recommendation effect score according to the supervision label, and updating the effect prediction model according to the loss value until the model converges, so that it can be used to predict the recommendation effect score corresponding to commodity recommendation in the recommendation scene to which a recommendation policy applies.
When the cross-entropy loss value has not yet dropped to the preset threshold, the effect prediction model is deemed not to have converged; the model is then updated by gradient descent according to the cross-entropy loss, the weight parameters of each layer being corrected through back-propagation so that the model approaches convergence, after which further training samples and their supervision labels are drawn to continue training the model iteratively until it reaches a convergence state. The preset threshold may be set in advance by those skilled in the art based on the disclosure herein and as required.
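As a self-contained toy sketch of steps S2310 to S2330, the following uses a plain logistic-regression stand-in for the effect prediction model (the disclosure permits FM, GBDT+LR, MLP, or MLR models; the class and method names here are assumptions for illustration):

```python
import math

class EffectPredictor:
    """Minimal logistic-regression stand-in: feature vector -> probability
    in (0, 1), read as the predicted recommendation effect score."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))     # classification mapping to (0, 1)

    def train_step(self, x, label):
        """One gradient step on the cross-entropy loss; returns the loss."""
        p = self.predict(x)
        eps = 1e-12
        loss = -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
        grad = p - label                       # dL/dz for sigmoid + cross-entropy
        self.w = [wi - self.lr * grad * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * grad               # back-propagated bias correction
        return loss
```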
In this embodiment, by training the effect prediction model to a convergence state, the model learns the ability to predict the recommendation effect score corresponding to commodity recommendation in the recommendation scene to which a recommendation policy applies; the model is therefore applicable across multiple recommendation scenes, that is, it can predict recommendation effect scores for recommendation policies invoked in various different target recommendation scenes.
In a further embodiment, after step S1400 of obtaining the latest preferred recommendation policy of the target recommendation scene as a practical recommendation policy, the method includes the following steps:
Step S1510, determining the total number of times that the drop in the measured recommendation effect score of the practical recommendation policy exceeds a first preset threshold;
It is to be appreciated that the practical recommendation policy can be continually monitored to determine its current measured recommendation effect score; the specific implementation can be realized flexibly by those skilled in the art.
The first preset threshold measures whether a drop in the measured recommendation effect score is large enough. If the drop in the measured score of the practical recommendation policy exceeds the first preset threshold, the score has fallen excessively, beyond expectation; if the drop is less than or equal to the first preset threshold, the drop is not significant. The first preset threshold can be set by those skilled in the art as required.
Step S1520, when the total number of times exceeds a second preset threshold, restarting the iteration process to redetermine a new practical recommendation policy.
The second preset threshold may be preset to 1 or to a value greater than 1. If it is 1, the iteration process is restarted as soon as the count of excessive drops in the measured recommendation effect score exceeds it, so that a new practical recommendation policy is redetermined promptly and the policy with the best measured effect is kept up to date. If it is greater than 1, the range in which the count of drops exceeding the first preset threshold remains at or below the second preset threshold is treated as an acceptable oscillation range; only when the count exceeds this range is the restart of the iteration process triggered, a new practical recommendation policy redetermined, and the policy with the best measured effect updated in time. The second preset threshold may be set as desired by those skilled in the art based on the disclosure herein.
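The two-threshold restart logic of steps S1510 and S1520 can be sketched as follows; the class name, the fixed baseline, and the boolean return convention are assumptions made for this illustration:

```python
class RestartMonitor:
    """Counts drops of the measured effect score larger than
    drop_threshold (the first preset threshold); once the count exceeds
    count_threshold (the second preset threshold), signals that the
    iteration process should restart."""

    def __init__(self, baseline, drop_threshold, count_threshold):
        self.baseline = baseline              # reference measured score
        self.drop_threshold = drop_threshold
        self.count_threshold = count_threshold
        self.drops = 0

    def observe(self, measured_score):
        # An "excessive drop" event: fall below baseline by more than
        # the first preset threshold.
        if self.baseline - measured_score > self.drop_threshold:
            self.drops += 1
        return self.drops > self.count_threshold   # True -> restart iteration
```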
In this embodiment, a process of dynamically adjusting the practical recommendation policy is disclosed, so that changes in effect can be responded to in time, more accurate recommendations are provided, and the best-performing recommendation policy is maintained, preserving the user experience. In addition, flexible setting of the preset thresholds balances stability against responsiveness, and adjusting only as needed avoids wasting resources.
In a further embodiment, step S1200, in which each recommendation policy in the recommendation policy set is applied online and the recommendation policy with the highest measured recommendation effect score is determined as the preferred recommendation policy, includes the following steps:
Step S1201, for each recommendation policy in the recommendation policy set, obtaining the measured recommendation effect score of the policy if it is a preferred recommendation policy and its predicted recommendation effect score otherwise, and initializing an effect distribution for each recommendation policy according to its recommendation effect score, wherein the predicted recommendation effect score is obtained for the corresponding policy by using a preset effect prediction model;
It will be appreciated that for a recommendation policy in the set that is a preferred recommendation policy, its measured recommendation effect score has already been determined in an iteration process prior to this step, and can simply be obtained; for a policy that is not a preferred recommendation policy, the preset effect prediction model was employed in the construction process prior to this step, so its predicted recommendation effect score can likewise be obtained.
Then, following the algorithmic principle of Thompson sampling, the recommendation effect score of each policy in the recommendation policy set is multiplied by the same preset number of calls to obtain the α of that policy's effect distribution; subtracting each α from the preset number of calls yields the corresponding β; and for each policy the Beta(α, β) distribution constructed from its α and β is taken as its effect distribution. The preset number of calls can be set flexibly by those skilled in the art according to business requirements.
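The initialization just described can be written in a few lines; the function name is an assumption for this sketch:

```python
def init_effect_distribution(score, preset_calls):
    """Initialize a Beta(alpha, beta) effect distribution from a
    recommendation effect score, per the Thompson-sampling setup above:
    alpha = score * preset_calls, beta = preset_calls - alpha."""
    alpha = score * preset_calls
    beta = preset_calls - alpha
    return alpha, beta
```

For example, a score of 0.3 with 100 preset calls yields roughly Beta(30, 70).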
Step S1202, starting a sampling iteration process: randomly sampling each effect distribution to determine a corresponding expected effect probability, invoking the recommendation policy with the highest expected effect probability to apply online to a plurality of online recommendation requests associated with the target recommendation scene, determining the measured total numbers of effective and ineffective recommendations of that policy, updating its effect distribution according to those totals, and continuing the sampling iteration process;
For each effect distribution, a random number is generated from the distribution based on its parameters α and β and used as the expected effect probability. The recommendation policy whose expected effect probability is highest across all distributions is then determined, and the server allocates the plurality of online recommendation requests entirely to that policy, so that all of them are answered by invoking it; the server invokes no other recommendation policy until these online recommendation requests have been answered.
For each online recommendation request, the stage policy applicable to each stage of the recommendation process in the selected recommendation policy is invoked, a commodity recommendation list composed of a relatively small number of recommended commodities is determined from the commodities of the independent station store, and the list is pushed to the user who triggered the online recommendation request, thereby answering it. After the allocated online recommendation requests have been answered by invoking the recommendation policy, whether the corresponding user performs the target user action on the recommended commodities in the commodity recommendation list is monitored for each request, and, among all recommended commodities in the list, the total number on which the user performed the target user action and the total number on which the user did not are determined.
When the recommendation policy has finished answering the online recommendation requests, the per-request totals of recommended commodities on which users performed the target user action are accumulated to obtain the total number of converted commodities, taken as the total number of effective recommendations; the per-request totals of recommended commodities on which users did not perform the target user action are accumulated to obtain the total number of unconverted commodities, taken as the total number of ineffective recommendations. The α of the policy's effect distribution is then updated to the sum of its current value and the total number of effective recommendations, and the β to the sum of its current value and the total number of ineffective recommendations, completing the update of the effect distribution, after which the sampling iteration process continues.
Step S1203, ending the sampling iteration process when the number of sampling iterations exceeds a preset threshold, determining the measured recommendation effect score of each recommendation policy according to its latest effect distribution, and determining the policy with the highest measured recommendation effect score as the preferred recommendation policy.
A counter initialized to zero is constructed for the number of sampling iterations and incremented by one after each sampling iteration is executed. When reading the counter shows that the number of sampling iterations exceeds the preset threshold, the sampling iteration process is ended, and for the latest effect distribution of each recommendation policy, the ratio of its α to its β is taken as the measured recommendation effect score of the associated policy. All recommendation policies in the set are then sorted by measured recommendation effect score from high to low, and the top-ranked policy is screened out as the preferred recommendation policy. The preset threshold may be set as desired by those skilled in the art.
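One sampling iteration of steps S1202 and S1203 can be sketched as below; `serve_policy` stands in for answering the batch of online requests and is an assumed interface, as are the function and variable names:

```python
import random

def thompson_round(distributions, serve_policy):
    """One Thompson-sampling iteration over the policies' effect
    distributions.

    distributions: dict policy_name -> (alpha, beta), mutated in place.
    serve_policy(name): answers the batch of online requests with that
    policy and returns (effective_total, ineffective_total).
    """
    # Draw an expected effect probability from each Beta distribution.
    sampled = {name: random.betavariate(a, b)
               for name, (a, b) in distributions.items()}
    best = max(sampled, key=sampled.get)       # highest expected effect
    effective, ineffective = serve_policy(best)
    a, b = distributions[best]
    # Posterior update: alpha += effective total, beta += ineffective total.
    distributions[best] = (a + effective, b + ineffective)
    return best
```

After enough rounds, each policy's measured score can be read off its latest distribution as α/β, as described above.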
In this embodiment, the effect distribution of each recommendation policy in the set is initialized according to its recommendation effect score by applying the principle of the Thompson sampling algorithm; the distributions are then sampled randomly to determine the policy with the highest expected effect probability, which is applied to actual online recommendation requests. During this process, the totals of effective and ineffective recommendations are accumulated by monitoring users' reactions to the commodity recommendation lists, so the policies' effect distributions are updated with real-time feedback and the policy choice is adjusted dynamically, continuously learning and adapting to changes in user behavior. The sampling iteration balances exploration and exploitation, capturing user preferences more accurately and improving recommendation relevance and user satisfaction.
When the number of sampling iterations exceeds the preset threshold, the iteration process ends and the preferred recommendation policy is determined from the measured recommendation effect scores derived from the latest effect distributions, ensuring that the preference is based on sufficient data and iteratively optimized results, thereby improving the stability and reliability of the recommendation.
In a further embodiment, after step S1400 of obtaining the latest preferred recommendation policy of the target recommendation scene as a practical recommendation policy, the method includes the following steps:
Step S1501, in response to a recommendation policy setting event of a new independent station store, determining a plurality of independent station stores that belong to the same store class as the new independent station store associated with the event;
The new independent station store may be a newly created independent station store, or an independent station store whose operating duration is less than a preset duration, which can be preset by those skilled in the art based on the disclosure herein and as required by the business.
The e-commerce platform provides a recommendation policy setting service to each independent station store it realizes; when any independent station store chooses to use this service, a recommendation policy setting event for that store is triggered, so that a practical recommendation policy is set for the store.
In one embodiment, for each independent station store realized by the e-commerce platform, store operation features of the store are acquired, such as the store operation category, the store operation price range, and the store operation tags, wherein the store operation category comprises the different categories of commodities in the store, the store operation price range covers the prices of the store's main commodities, and the store operation tags comprise the different tags of the commodities in the store. The tags can be obtained by extracting keywords from the descriptive text of the corresponding commodities, the descriptive text comprising all text describing a commodity, such as the commodity title, commodity detail text, and commodity introduction text; the determination of each store operation feature can be implemented flexibly by those skilled in the art. Further, a clustering algorithm or a deep learning algorithm is adopted to determine, based on the store operation features, whether any two independent station stores belong to the same store class. The clustering algorithm can be any one of the K-Means algorithm, the DBSCAN algorithm, and the like; the deep learning algorithm can be a logistic regression model suitable for binary classification, or a two-tower model; either can be realized flexibly by those skilled in the art.
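Purely as an illustrative stand-in for the clustering or two-tower approaches the paragraph permits, a same-class test could compare numeric store operation feature vectors by cosine similarity; the function name, feature encoding, and threshold are all assumptions of this sketch:

```python
import math

def same_store_class(feat_a, feat_b, threshold=0.8):
    """Illustrative same-class test: cosine similarity of numeric store
    operation features (e.g., category mix, price band, tag profile).
    A real system might use K-Means, DBSCAN, or a two-tower model."""
    dot = sum(x * y for x, y in zip(feat_a, feat_b))
    na = math.sqrt(sum(x * x for x in feat_a))
    nb = math.sqrt(sum(y * y for y in feat_b))
    if not na or not nb:
        return False                     # empty feature vector: undecidable
    return dot / (na * nb) >= threshold
```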
Step S1502, for each of the recommendation scenes associated with the new independent station store, acquiring the practical recommendation policies with which the same-class independent station stores recommend commodities in that scene, together with their measured recommendation effect scores, screening out the practical recommendation policy with the highest measured recommendation effect, and setting it as the practical recommendation policy with which the new independent station store recommends commodities in that scene.
Similarly, for the new independent station store, recommendation scenes are preconfigured for its different commodity recommendation events, where a recommendation scene may be the page and/or behavior that triggers the corresponding commodity recommendation event, or an activity formulated based on the operating and/or marketing conditions of the store.
Taking one recommendation scene associated with the new independent station store as an example, the practical recommendation policies determined in previous iterations by the plurality of same-class independent station stores, together with their measured recommendation effect scores, are acquired, and the practical recommendation policy with the highest measured recommendation effect is screened out; this policy is the one most suitable for the new store to recommend commodities in that scene, so the screened practical recommendation policy is set as the practical recommendation policy with which the new independent station store recommends commodities in the scene.
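This cold-start inheritance can be summarized as a short sketch; the data layout and function name are assumptions for illustration:

```python
def cold_start_policy(same_class_stores, scene):
    """For a new store, inherit the practical policy with the highest
    measured effect score among same-class stores for the given scene.

    same_class_stores: dict store -> {scene: (policy, measured_score)}.
    Returns the inherited policy name, or None if no same-class store
    has a practical policy for this scene.
    """
    best = None
    for _store, scenes in same_class_stores.items():
        if scene in scenes:
            policy, score = scenes[scene]
            if best is None or score > best[1]:
                best = (policy, score)
    return best[0] if best else None
```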
In this embodiment, by determining a plurality of independent station stores similar to the new independent station store and then, for each recommendation scene associated with the new store, selecting the practical recommendation policy with the best measured recommendation effect among the practical recommendation policies of the same-class stores and setting it as the practical recommendation policy used by the new store in the corresponding scene, the cold-start problem of a new store's insufficient user volume is solved, and recommendation policies are adapted for the new store across its recommendation scenes.
Referring to fig. 3, a dynamic adjustment device for a recommendation policy, provided in adaptation to one of the purposes of the present application, is a functional embodiment of the dynamic adjustment method for a recommendation policy of the present application. The device includes an aggregate acquisition module 1100, an iteration start module 1200, an iteration continuation module 1300, and an iteration end module 1400. The aggregate acquisition module 1100 is configured to determine a target recommendation scene from the various recommendation scenes associated with an independent station shop, and to obtain a recommendation policy set correspondingly constructed for the target recommendation scene based on a recommendation stage policy library, where the recommendation policy set includes a plurality of recommendation policies. The iteration start module 1200 is configured to start an iteration process and determine, as the preferred recommendation policy, the recommendation policy with the highest actually measured recommendation effect score after each recommendation policy in the recommendation policy set is applied. The iteration continuation module 1300 is configured to reconstruct new recommendation policies for the target recommendation scene based on the recommendation stage policy library, replace all recommendation policies in the recommendation policy set other than the preferred recommendation policy with the new recommendation policies, and continue the iteration process. The iteration end module 1400 is configured to end the iteration process when no new recommendation policy whose actually measured recommendation effect score exceeds that of the latest preferred recommendation policy can be obtained, and to set the latest preferred recommendation policy as the practical recommendation policy of the target recommendation scene.
In a further embodiment, the iteration start module 1200 includes an actual measurement sub-module configured to invoke each recommendation policy in the recommendation policy set to respectively respond to at least one online recommendation request associated with the target recommendation scene, so as to determine an actually measured recommendation effect score for each recommendation policy; a policy two-choice sub-module configured to invoke the two recommendation policies with the higher actually measured recommendation effect scores to respectively respond to at least one further online recommendation request associated with the target recommendation scene, so as to determine new actually measured recommendation effect scores for those two recommendation policies and update their original scores; and a policy single-choice sub-module configured to determine, as the preferred recommendation policy, the one of the two recommendation policies with the highest actually measured recommendation effect score.
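By way of illustration only, the two-choice then single-choice selection above may be sketched as follows; the policy names, scores, and the `remeasure` callback standing in for online re-measurement are hypothetical:

```python
def pick_preferred(measured: dict[str, float], remeasure) -> str:
    """Shortlist the two policies with the highest measured scores,
    re-measure only those two online, and keep the better one."""
    top_two = sorted(measured, key=measured.get, reverse=True)[:2]
    for policy in top_two:                      # update original scores
        measured[policy] = remeasure(policy)
    return max(top_two, key=lambda p: measured[p])

# Hypothetical first-round scores and a stand-in for second-round measurement.
scores = {"policy_a": 0.50, "policy_b": 0.70, "policy_c": 0.60}
second_round = {"policy_b": 0.62, "policy_c": 0.66}
print(pick_preferred(scores, second_round.get))  # -> policy_c
```

Note that the second measurement can reverse the first-round ranking, which is why the embodiment re-measures the shortlisted pair before the single-choice step.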
In a further embodiment, the iteration continuation module 1300 includes a candidate construction sub-module configured to reconstruct at least one candidate recommendation policy corresponding to the target recommendation scene based on at least one stage policy set for each stage of the recommendation process in the recommendation stage policy library; a success prediction sub-module configured to predict a recommendation success score for each candidate recommendation policy by using a preset success prediction model; and a new policy confirmation sub-module configured to confirm a candidate recommendation policy as a new recommendation policy when its predicted recommendation success score is higher than the actually measured recommendation success score of the latest preferred recommendation policy.
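For illustration only, the confirmation step above may be sketched as a simple filter; the candidate names and scores are hypothetical:

```python
def confirm_new_policies(candidates, predict_score, preferred_measured):
    """Keep only candidates whose predicted success score is higher than
    the measured success score of the latest preferred policy."""
    return [c for c in candidates if predict_score(c) > preferred_measured]

# Hypothetical predicted scores for three reconstructed candidates, filtered
# against a measured score of 0.60 for the latest preferred policy.
predicted = {"cand_1": 0.55, "cand_2": 0.72, "cand_3": 0.68}
print(confirm_new_policies(predicted, predicted.get, 0.60))  # -> ['cand_2', 'cand_3']
```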
In a further embodiment, the success prediction sub-module comprises a training preparation sub-module, a feature extraction sub-module, a classification mapping sub-module, and a model tuning sub-module. The training preparation sub-module is used for obtaining training samples in a prepared training set and their supervision labels, where each training sample comprises a historical recommendation strategy and the recommendation scene to which it applies, and the supervision label is the actually measured recommendation success score obtained when commodities were recommended in that recommendation scene. The feature extraction sub-module is used for inputting the training samples into the success prediction model to extract corresponding deep semantic features, obtaining success feature vectors. The classification mapping sub-module is used for performing classification mapping on the success feature vectors to predict corresponding recommendation success scores. The model tuning sub-module is used for calculating loss values corresponding to the predicted recommendation success scores according to the supervision labels, and updating the success prediction model according to the loss values until the model converges, whereupon the model can be used for predicting the recommendation success score obtained when a recommendation strategy recommends commodities in its applicable recommendation scene.
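Purely as an illustrative sketch of the extract-features, map-to-score, compute-loss, update-model cycle described above: the toy model below replaces the deep semantic feature extractor with hand-crafted numeric features and fits a linear mapping by per-sample gradient descent on a squared loss. All field names, sample values, and labels are hypothetical:

```python
def extract_features(sample):
    # Stand-in for deep semantic feature extraction: two hypothetical
    # numeric features of a (strategy, scene) pair plus a bias term.
    return [sample["strategy_id"] / 10.0, sample["scene_id"] / 10.0, 1.0]

def train(samples, labels, lr=0.1, epochs=500):
    """Fit weights by repeated extract -> map -> loss -> update steps."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for sample, y in zip(samples, labels):
            x = extract_features(sample)
            pred = sum(wi * xi for wi, xi in zip(w, x))  # classification mapping
            err = pred - y                               # squared-loss gradient factor
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def predict(w, sample):
    x = extract_features(sample)
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy supervised data: each label is the measured success score of a
# historical (strategy, scene) pair; values are illustrative only.
samples = [{"strategy_id": s, "scene_id": 1} for s in (2, 4, 6, 8)]
labels = [0.2, 0.4, 0.6, 0.8]
weights = train(samples, labels)
```

A production embodiment would instead use a learned deep encoder and a classification head, but the training loop follows the same supervised structure.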
In a further embodiment, the iteration end module 1400 includes an event accumulation sub-module configured to determine the total number of events in which the drop in the actually measured recommendation effect score of the practical recommendation policy exceeds a first preset threshold, and a policy update sub-module configured to restart the iteration process to redetermine a new practical recommendation policy when that total number exceeds a second preset threshold.
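As an illustrative sketch only, the restart trigger above may be expressed as follows; the score history and both threshold values are hypothetical:

```python
def should_restart(score_history, drop_threshold, count_threshold):
    """Count events where the measured score of the practical policy drops
    by more than drop_threshold between consecutive measurements; the
    iteration restarts when the count exceeds count_threshold."""
    drops = sum(
        1
        for prev, cur in zip(score_history, score_history[1:])
        if prev - cur > drop_threshold
    )
    return drops > count_threshold

# Three drops larger than 0.2 occur, exceeding the count threshold of 2.
print(should_restart([0.80, 0.50, 0.70, 0.30, 0.60, 0.20], 0.2, 2))  # -> True
```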
In a further embodiment, the iteration start module 1200 includes a distribution initialization sub-module configured to, for each recommendation policy in the recommendation policy set, obtain the actually measured recommendation effect score of the policy if it is the preferred recommendation policy, or obtain its predicted recommendation effect score if it is not, where the predicted recommendation effect score is obtained for the corresponding recommendation policy by using a preset effect prediction model, and to correspondingly initialize and construct an effect distribution for each recommendation policy according to its recommendation effect score; a distribution update sub-module configured to start a sampling iteration process, randomly sample each effect distribution to determine a corresponding expected effect probability, invoke the recommendation policy with the highest expected effect probability online, correspondingly determine the total number of valid recommendations and the total number of invalid recommendations of that recommendation policy, and update its effect distribution according to those totals; and a preferred determination sub-module configured to end the sampling iteration process when a preset number of sampling iterations is exceeded, and to determine the recommendation policy with the highest actually measured recommendation effect score as the preferred recommendation policy.
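The sample-invoke-update cycle above resembles Thompson sampling over per-policy effect distributions. Purely as an illustrative sketch under that reading, the following models each policy's effect distribution as a Beta posterior over valid/invalid recommendation counts; the policy names, success rates, round count, and the simulated `serve_batch` callback are all hypothetical:

```python
import random

def sample_policy(stats):
    """stats maps policy -> (valid_count, invalid_count). Sample each
    Beta(valid + 1, invalid + 1) posterior and return the policy with the
    highest sampled success probability."""
    drawn = {p: random.betavariate(v + 1, i + 1) for p, (v, i) in stats.items()}
    return max(drawn, key=drawn.get)

def run_sampling_iterations(stats, serve_batch, rounds=300):
    for _ in range(rounds):
        policy = sample_policy(stats)
        valid, invalid = serve_batch(policy)  # counts measured online
        v, i = stats[policy]
        stats[policy] = (v + valid, i + invalid)
    # Preferred policy: highest empirical valid-recommendation rate.
    return max(stats, key=lambda p: stats[p][0] / max(1, sum(stats[p])))

# Simulated online serving with hypothetical per-policy success rates.
random.seed(7)
rates = {"policy_a": 0.2, "policy_b": 0.8}
def serve_batch(policy):
    return (1, 0) if random.random() < rates[policy] else (0, 1)

stats = {"policy_a": (0, 0), "policy_b": (0, 0)}
preferred = run_sampling_iterations(stats, serve_batch)
```

Because sampling concentrates traffic on the distribution that keeps winning, the weaker policy is invoked only often enough to rule it out, which matches the purpose of the distribution update sub-module.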
In a further embodiment, the iteration end module 1400 includes an event response sub-module configured to respond to a recommendation policy setting event of a new independent station shop and determine a plurality of independent station shops belonging to the same class of shops as the new independent station shop associated with the event; and a policy setting sub-module configured to obtain, for each recommendation scene associated with the new independent station shop, the practical recommendation policies used by the plurality of independent station shops to recommend goods in that recommendation scene together with their actually measured recommendation effect scores, screen out the practical recommendation policy with the highest actually measured recommendation effect, and set it as the practical recommendation policy used by the new independent station shop to recommend goods in that recommendation scene.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Fig. 4 schematically shows the internal structure of the computer device. The computer device includes a processor, a computer readable storage medium, a memory, and a network interface connected by a system bus. The computer readable storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store a control information sequence, and when the computer readable instructions are executed by the processor, the processor can implement a recommendation policy dynamic adjustment method. The processor of the computer device is used to provide computing and control capabilities, supporting the operation of the entire computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform the recommendation policy dynamic adjustment method of the present application. The network interface of the computer device is used for communicating with a terminal. It will be appreciated by persons skilled in the art that the architecture shown in fig. 4 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, may combine some of the components, or may have a different arrangement of components.
The processor in this embodiment is configured to execute the specific functions of each module and its sub-modules in fig. 3, and the memory stores the program codes and various data required for executing the above modules or sub-modules. The network interface is used for data transmission to and from the user terminal or the server. The memory in this embodiment stores the program codes and data required for executing all modules/sub-modules in the recommendation policy dynamic adjustment device of the present application, which the server can call to execute the functions of all sub-modules.
The present application also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the recommendation policy dynamic adjustment method of any of the embodiments of the present application.
Those skilled in the art will appreciate that all or part of the processes implementing the methods of the above embodiments of the present application may be implemented by a computer program instructing relevant hardware, where the computer program may be stored on a computer readable storage medium, and where the program, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a computer readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
In summary, the present application can fully mine the optimal recommendation strategy for each scene, and is thus expected to obtain the desired recommendation results.
Those of skill in the art will appreciate that the various operations, methods, steps in the flows, acts, schemes, and alternatives discussed in the present application may be alternated, altered, combined, or eliminated. Further, other steps, means, or operations in the various processes, methods, or procedures discussed herein may be alternated, altered, rearranged, split, combined, or eliminated. Further, steps, means, or operations in the prior art corresponding to those disclosed in the present application may likewise be alternated, altered, rearranged, split, combined, or eliminated.
The foregoing is only a partial embodiment of the present application, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the present application.