CN112256961B - User portrait generation method, device, equipment and medium - Google Patents
User portrait generation method, device, equipment and medium
- Publication number
- CN112256961B (application CN202011118110.XA)
- Authority
- CN
- China
- Prior art keywords
- behavior
- user
- time sequence
- product
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
Abstract
The application relates to the technical field of artificial intelligence and discloses a user portrait generation method, device, equipment and medium. The method comprises: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user; searching a preset model library for a behavior prediction model corresponding to the product identifier, the behavior prediction model being obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning; inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction, obtaining behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data. The method fully mines user behavior across changes in life stage, life state and consumption scene, improving both the accuracy and the granularity of the user portrait.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a user portrait generating method, apparatus, device, and medium.
Background
A user portrait is a digital abstraction of a user role, a model for analyzing and mining user behavior. Constructing an accurate user portrait can help an enterprise expand sales of emerging products: by knowing the environment a user is in and the products the user needs, targeted marketing can be conducted. Traditional user portrait models adopt a crowd model or a persona model, which can only analyze a user in a single scene and cannot follow changes in the user's life stage, life state, consumption scene and the like. The descriptive content of existing user portraits lacks individuation, and the granularity of the user portrait is coarse, so it is difficult to satisfy the requirements of multiple marketing scenes and of various roles, and difficult to track the long-term evolution of customer behavior. Under these difficulties, the improvement that user portraits bring to precise marketing is limited: the requirements of marketing-side business personnel cannot be met in real time, and the characteristic differences and demand differences of different types of users cannot be distinguished at fine granularity.
Disclosure of Invention
The main purpose of the present application is to provide a user portrait generation method, device, equipment and medium, aiming to solve the technical problems in the prior art that the improvement user portraits bring to precise marketing is limited, that the requirements of marketing-side business personnel cannot be met in real time, and that the characteristic differences and demand differences of different types of users cannot be distinguished at fine granularity.
In order to achieve the above object, the present application proposes a user portrait generating method, which includes:
acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to carry out probability prediction to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
Further, before the step of searching the behavior prediction model corresponding to the product identifier from the preset model library, the method further includes:
acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
determining a utility function set for the sample data based on a Markov decision process;
And carrying out maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier.
Further, the acquiring sample data of a plurality of typical users includes:
acquiring historical data of a plurality of typical users, the historical data comprising: state characteristic data of the typical user and purchasing behavior data of the typical user, wherein the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user;
carrying out time sequence construction on the state characteristic data of the typical user to obtain sample data of the state characteristic time sequence of the typical user;
and carrying out time sequence construction on the purchasing behavior data of the typical user according to the product identifier to obtain sample data of the purchasing behavior time sequence of the typical user.
Further, the sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of a typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user; the step of determining a utility function set for the sample data based on a Markov decision process includes:
acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user;
iteratively performing optimization solution on the maximum value behavior calculation formula by adopting a dynamic programming method to obtain a target maximum value behavior calculation formula;
and extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set.
Further, the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model includes:
performing linear superposition on utility functions in the utility function set to obtain a to-be-estimated personal utility function;
normalizing the to-be-estimated personal utility function by adopting a softmax function to obtain a normalized personal utility function;
and carrying out parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
Further, the step of performing parameter estimation on the normalized personal utility function by using a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model includes:
Assuming that there is a potential probability distribution under which the expert trajectories are generated, the known condition is the feature-matching constraint:

$$\sum_{j} w_j f_j = \tilde{f}$$

where f_j represents the feature expectation (here, the expected utility value each product brings to the customer, i.e. a term of the to-be-estimated personal utility function U_agent); \tilde{f} is the expert feature expectation (the weighted utility value of the various products to the customer); and w_j is the probability of each product being selected (i.e. the weights w_1, w_2, w_3, ..., w_n of the to-be-estimated personal utility function U_agent). The problem is converted into standard form, and solving for maximum entropy becomes the optimization problem:

$$\max_{w} \; -\sum_{j} w_j \log w_j \qquad \text{s.t.} \quad \sum_{j} w_j = 1, \quad \sum_{j} w_j f_j = \tilde{f}$$

where -\sum w \log w represents the entropy of the random variable; max denotes taking the maximum value; and the expressions following s.t. are the constraints of the calculation. By the Lagrangian multiplier method:

$$L(w, \lambda) = -\sum_{j} w_j \log w_j + \lambda_0 \Big(\sum_{j} w_j - 1\Big) + \lambda_1 \Big(\sum_{j} w_j f_j - \tilde{f}\Big)$$

After solving, differentiating with respect to the probability w yields the maximum entropy probability:

$$w_j = \frac{\exp(\lambda_j f_j)}{\sum_{k} \exp(\lambda_k f_k)}$$

where exp() is the exponential function with the natural constant e as its base; the parameter \lambda_j corresponds to the Lagrangian multiplier and can be solved by the maximum likelihood method; and f_j refers to the expected utility value that each product j brings to the customer.
Further, the step of determining the portrait of the target user based on the behavior prediction data includes:
comparing the behavior prediction data with a preset threshold value, and taking the comparison result as a prediction result;
And combining the prediction results corresponding to the product identifiers into a vector to serve as the portrait of the target user.
The application also provides a user portrait generating device, which comprises:
the data acquisition module is used for acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
the model acquisition module is used for searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
the prediction module is used for inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to carry out probability prediction so as to obtain behavior prediction data of the target user;
and the portrait module is used for determining the portrait of the target user according to the behavior prediction data.
The present application also proposes a computer device comprising a memory storing a computer program and a processor implementing the steps of any of the methods described above when the processor executes the computer program.
The present application also proposes a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method of any of the above.
The user portrait generation method, device, equipment and medium acquire the state characteristic time sequence and the purchasing behavior time sequence of the target user, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
Drawings
FIG. 1 is a schematic flow chart of a user portrait generation method according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of the structure of a user portrait generating device according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In order to solve the technical problems in the prior art that the improvement user portraits bring to precise marketing is limited, that the requirements of marketing-side business personnel cannot be met in real time, and that the characteristic differences and demand differences of different types of users cannot be distinguished at fine granularity, a user portrait generation method is provided, applied to the technical field of artificial intelligence. In the method, the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, and probability prediction is then performed with this behavior prediction model; the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
Referring to FIG. 1, the user portrait generation method includes:
S1: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
S2: searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
S3: inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction to obtain behavior prediction data of the target user;
S4: and determining the portrait of the target user according to the behavior prediction data.
According to this embodiment, the state characteristic time sequence and the purchasing behavior time sequence of the target user are acquired, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
For S1, the state characteristic time series and the purchase behavior time series of the target user may be acquired from the database.
The state characteristic time sequence and the purchasing behavior time sequence of the target user refer to the state characteristic time sequence and the purchasing behavior time sequence of the same user to be portrayed.
The state characteristic time sequence refers to a time sequence of state feature vectors of the user to be portrayed. Each state feature vector represents a plurality of items of user information; that is, the state characteristic time sequence includes a plurality of state feature vectors arranged in time order. User information includes, but is not limited to: personal information, financial status, purchased product information, loan records, and information browsing records. For example, the state characteristic time sequence may be expressed as {x_1, x_2, x_3, ..., x_n}, where each state feature vector in {x_1, x_2, x_3, ..., x_n} includes 6 vector elements respectively representing the data generation time, personal information, financial status, purchased product information, loan record, and information browsing record; that is, x_i, the i-th value in {x_1, x_2, x_3, ..., x_n} (the state feature vector at the i-th time), comprises these 6 vector elements. This example is not specifically limiting.
The purchasing behavior time sequence refers to a time sequence of purchasing behavior features of the user to be portrayed with respect to a product. The purchasing behavior time sequence includes a plurality of purchasing behavior features, each of which takes a value: for example, a purchasing behavior feature of 1 indicates that the product was purchased, and 0 indicates that it was not, this example being non-limiting. For example, the purchasing behavior time sequence may be expressed as {a_1, a_2, a_3, ..., a_n}, which records the purchasing behavior for one and the same product; each a_i takes the value 0 or 1, where a_i = 1 indicates that the product was purchased and a_i = 0 indicates that it was not, and a_i is the i-th value in {a_1, a_2, a_3, ..., a_n} (the purchasing behavior feature at the i-th time). This example is not specifically limiting.
Preferably, the number of state feature vectors in the state feature time sequence is the same as the number of purchasing behavior features in the purchasing behavior time sequence.
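For illustration only (not part of the claimed method), the following minimal Python sketch shows one plausible in-memory layout of the two aligned time sequences described above; all field names and example values are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StateFeature:
    """One state feature vector x_i: the 6 assumed vector elements."""
    timestamp: str            # data generation time
    personal_info: str        # e.g. an age/occupation code
    financial_status: float   # e.g. normalized income or balance
    products_held: int        # purchased product information
    loan_records: int         # number of loan records
    pages_browsed: int        # information browsing record

# State characteristic time sequence {x_1, ..., x_n}
state_series: List[StateFeature] = [
    StateFeature("2020-01", "A3", 0.62, 2, 1, 14),
    StateFeature("2020-02", "A3", 0.65, 2, 1, 9),
]

# Purchasing behavior time sequence {a_1, ..., a_n} for one product
# identifier: 1 = purchased at that step, 0 = not; same length as above.
purchase_series = {"product_id": "P001", "actions": [0, 1]}
```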
For S2, among the product identifiers of the preset model library, find the identifier that is the same as the product identifier, carried by the purchasing behavior time sequence, of the product purchased by the target user, and take the behavior prediction model corresponding to the found product identifier as the behavior prediction model corresponding to the product identifier.
The preset model library comprises at least one behavior prediction model, and each behavior prediction model carries a product identifier. The behavior prediction model is a model for performing probability prediction on the purchasing behavior of the target user.
The behavior prediction model is obtained by performing modeling and autonomous learning on sample data of a plurality of typical users based on a Markov decision process and maximum likelihood inverse reinforcement learning. That is, the behavior prediction model carries the same product identifier as the sample data of the plurality of typical users employed for the modeling and autonomous learning.
For S3, the state characteristic time sequence and the purchasing behavior time sequence are input into the behavior prediction model corresponding to the product identifier carried by the input purchasing behavior time sequence for probability prediction, and the behavior prediction data of the target user output by that model is obtained; that is, the product identifier corresponding to the behavior prediction data is the same as the product identifier carried by the purchasing behavior time sequence used for prediction.
The behavior prediction data refers to a probability prediction value of purchasing behavior of a product by a target user.
By repeating steps S2 to S3, probability prediction over the state characteristic time sequence and each of the multiple purchasing behavior time sequences can be completed. That is, each pass of steps S2 to S3 predicts the probability prediction value of the target user's purchasing behavior for one product only.
For S4, the portrait of the target user is used for describing whether the target user purchases the product.
For example, the portrait of the target user may be expressed as the vector [1 0 1 1], where the first vector element represents product one, the second represents product two, the third represents product three, and the fourth represents product four; a vector element value of 0 represents no purchase and a value of 1 represents purchase. The portrait [1 0 1 1] thus represents that the target user purchases product one, product three and product four, and does not purchase product two. This example is not specifically limiting.
For another example, the portrait of the target user may also be expressed as the set {product one: 1, product two: 0, product three: 1, product four: 1}, where a set element value of 0 represents no purchase and a value of 1 represents purchase; the portrait {product one: 1, product two: 0, product three: 1, product four: 1} then represents that the target user purchases product one, product three and product four, and does not purchase product two. This example is not specifically limiting.
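As a non-authoritative sketch of how steps S2 to S4 compose per product, the code below assumes a dict-like model library and a `predict_proba` interface; both are illustrative placeholders, not the patent's API.

```python
def build_portrait(state_series, purchase_series_by_product, model_library,
                   threshold=0.5):
    """Repeat S2-S3 for each product, then S4: threshold and assemble."""
    portrait = {}
    for product_id, actions in purchase_series_by_product.items():
        model = model_library[product_id]                    # S2: per-product model
        prob = model.predict_proba(state_series, actions)    # S3: probability
        portrait[product_id] = 1 if prob > threshold else 0  # S4: compare
    return portrait

# Yields e.g. {"product one": 1, "product two": 0, "product three": 1,
#              "product four": 1}, matching the examples above.
```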
In an embodiment, before the step of searching the behavior prediction model corresponding to the product identifier from the preset model library, the method further includes:
S021: acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
S022: determining a utility function set for the sample data based on a Markov decision process;
S023: performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier.
This embodiment determines the behavior prediction model based on a Markov decision process and maximum likelihood inverse reinforcement learning using sample data of a plurality of typical users; the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
For S021, sample data for a plurality of typical users may be obtained from the database.
The sample data of the typical user refers to data of a representative customer, and is determined according to historical customer data. Representative customers refer to customers in a class of customers who have a desire and behavior to purchase a product at an average level for the class of customers. Wherein clients with similar income level, similar education level, similar family member composition and similar work experience are divided into the same class of clients. It will be appreciated that there are other ways to categorize clients, such as clients of similar education and similar family members, into the same category of clients, and the examples are not specifically limited herein.
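A minimal sketch, assuming bucketed customer attributes, of how a class of customers and its average-level typical user might be derived; the column names and bands are illustrative assumptions, not the patent's data schema.

```python
import pandas as pd

# One row per historical customer; band columns are assumed discretizations.
customers = pd.DataFrame({
    "income_band":    ["mid", "mid", "high"],
    "education_band": ["bachelor", "bachelor", "master"],
    "family_size":    [3, 3, 4],
    "bought_P001":    [1, 0, 1],   # historical purchase flag for product P001
})

# Customers with similar income, education and family composition form one
# class; the per-class mean approximates the typical user's purchase level.
typical_users = (customers
                 .groupby(["income_band", "education_band", "family_size"])
                 .mean()
                 .reset_index())
```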
The sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of the typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user.
The state characteristic time sequence of the typical user refers to a time sequence of state characteristic vectors of the typical user.
The time series of purchasing behavior of the typical user refers to the time series of purchasing behavior characteristics of the typical user on a certain product.
Preferably, the number of state feature vectors in the state feature time sequence of the typical user is the same as the number of purchase behavior features in the purchase behavior time sequence of the typical user.
For S022, a relationship among states, behaviors and utility functions is established based on a Markov decision process according to the state characteristic time sequences of all the typical users and the purchasing behavior time sequences, carrying the same product identifier, of all the typical users. The utility functions are then optimized and solved, and the utility function set is determined from the optimization result: the utility functions are extracted from the optimization result and combined into a set, and this set of extracted utility functions is the utility function set.
Preferably, the number of utility functions in the utility function set is the same as the number of state feature vectors in the state feature time sequence of the typical user.
And for S023, when maximum likelihood inverse reinforcement learning is carried out according to the utility function set, integrating utility functions in the utility function set in a linear superposition mode, carrying out parameter estimation on an integration result by adopting maximum entropy inverse reinforcement learning, and completing parameter estimation to obtain the behavior prediction model, thereby fitting personal utility functions and purchasing behavior characteristics.
The product identifier carried by the behavior prediction model is the same as the product identifier corresponding to the time series of purchasing behavior of the typical user in step S022.
In one embodiment, the acquiring sample data of a plurality of typical users includes:
S0211: acquiring historical data of a plurality of typical users, the historical data comprising: state characteristic data of the typical user and purchasing behavior data of the typical user, wherein the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user;
S0212: carrying out time sequence construction on the state characteristic data of the typical user to obtain sample data of the state characteristic time sequence of the typical user;
S0213: carrying out time sequence construction on the purchasing behavior data of the typical user according to the product identifier to obtain sample data of the purchasing behavior time sequence of the typical user.
According to the embodiment, the time sequence construction is carried out on the state characteristic data of the typical user to obtain the sample data of the state characteristic time sequence of the typical user, the time sequence construction is carried out on the purchasing behavior data of the typical user according to the product identification to obtain the sample data of the purchasing behavior time sequence of the typical user, so that the sample data of the typical user realizes the description of the life stage, the life state and the consumption scene of the user, the construction of multi-view user portraits is facilitated, and the user portrayal requirements of complex scenes are met.
For S0211, acquiring historical customer data to be processed; and extracting typical user characteristics according to the historical client data to be processed to obtain the historical data of the plurality of typical users.
The historical data of each typical user corresponds to one typical user.
The state characteristic data is a data set.
Preferably, the number of the state feature data in the state feature data of the typical user is the same as the number of the purchase behavior data in the purchase behavior data of the typical user.
For S0212, extracting state characteristic data from the state characteristic data of the typical user; and constructing the extracted state characteristic data in a time sequence to obtain sample data of the typical user state characteristic time sequence.
For S0213, purchasing behavior data is extracted from the purchasing behavior data of the typical user according to the product identifier, and the extracted purchasing behavior data is constructed into a time sequence to obtain sample data of the purchasing behavior time sequence of the typical user. That is, the purchasing behavior time sequence for one product identifier is extracted each time, and the plurality of purchasing behavior time sequences corresponding to the same typical user can be determined through multiple extractions.
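The following sketch illustrates S0212-S0213 under an assumed record layout: sort one typical user's history by time, then build the state series once and one purchase-flag series per product identifier.

```python
from collections import defaultdict

def build_sample(history):
    """history: list of records like
    {"time": "2020-01", "state": [...], "purchases": {"P001": 1, "P002": 0}}
    (layout assumed for illustration)."""
    history = sorted(history, key=lambda r: r["time"])
    state_series = [r["state"] for r in history]             # S0212
    behavior_series = defaultdict(list)                      # S0213
    for record in history:
        for product_id, bought in record["purchases"].items():
            behavior_series[product_id].append(bought)
    return state_series, dict(behavior_series)
```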
In one embodiment, the sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of a typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user; the step of determining a utility function set based on a Markov decision process according to the state characteristic time sequences of all the typical users and the purchasing behavior time sequences, carrying the same product identifier, of all the typical users includes:
S0221: acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user;
S0222: iteratively performing optimization solution on the maximum value behavior calculation formula by adopting a dynamic programming method to obtain a target maximum value behavior calculation formula;
S0223: extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set.
The embodiment realizes that the utility function set is determined based on the Markov decision process by adopting sample data of a plurality of typical users, and the Markov decision process can fully mine user behaviors when the life stage, the life state and the consumption scene change.
For S0221, the maximum value behavior calculation formula is expressed as:

$$a^{*} = \arg\max_{a} \; p(a \mid x)\, U(x, a)$$

where p(a|x) is the probability of taking action a in state x, and U(x, a) is the utility function; x is a value in the state characteristic time sequence of the typical user, which is expressed as {x_1, x_2, x_3, ..., x_n}; a is a value in the purchasing behavior time sequence of the typical user, which is expressed as {a_1, a_2, a_3, ..., a_n}.
And for S0222, carrying out iteration optimization solution on the maximum value behavior calculation formula by adopting a dynamic programming method to obtain the target maximum value behavior calculation formula.
The optimization solution seeks an optimal strategy that lets a typical user always obtain a greater return than under other strategies during interaction with each state feature in the state characteristic time sequence. That is, the optimization solution maximizes p(a|x) U(x, a); the utility function U(x, a) extracted when this value is maximal is the most valuable utility function.
That is, an optimal strategy is sought that allows the individual to always obtain a greater return during interaction with the environment than under other strategies; this strategy can be denoted π. Once the optimal strategy π is found, the reinforcement learning problem is solved. In general, it is difficult to find the optimal strategy, but a better strategy, i.e. a locally optimal solution, can be determined by comparing the merits of several different strategies.
Preferably, the maximum value behavior calculation formula is optimized and solved iteratively by a dynamic programming method using the Bellman equation:

$$V(x_t) = \max_{a_t} \big[\, U(x_t, a_t) + \beta \, V(x_{t+1}) \,\big]$$

where V(x_t) represents the expectation of the utility function U starting from state x_t; U(x_t, a_t) represents the utility function value at x_t and a_t (time t); β is an attenuation factor taking a value between 0 and 1 (either endpoint may be included); x is a value in the state characteristic time sequence of the typical user, and a is a value in the purchasing behavior time sequence of the typical user. Preferably, the attenuation factor takes the value 0.9, which avoids excessive attenuation; t denotes time; and U is the utility function U(x, a).
And for S0223, extracting a utility function from the target maximum value behavior calculation formula, and putting the extracted utility function into the utility function set.
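For S0222, a tabular value-iteration sketch of the Bellman backup above, under simplifying assumptions: finite state/action indices, a known utility table U, and a deterministic successor table. All of these are placeholders for illustration, not the patent's implementation.

```python
import numpy as np

def value_iteration(U, next_state, beta=0.9, iters=100):
    """U: (n_states, n_actions) utility table U(x, a); next_state: same shape,
    index of the successor state x_{t+1}. Returns V and the greedy action."""
    V = np.zeros(U.shape[0])
    for _ in range(iters):
        # Bellman backup: V(x_t) = max_a [ U(x_t, a) + beta * V(x_{t+1}) ]
        Q = U + beta * V[next_state]
        V = Q.max(axis=1)
    # Greedy action per state: the "maximum value behavior" a*
    return V, Q.argmax(axis=1)
```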
In one embodiment, the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model includes:
S0231: performing linear superposition on the utility functions in the utility function set to obtain a to-be-estimated personal utility function;
S0232: normalizing the to-be-estimated personal utility function by adopting a softmax function to obtain a normalized personal utility function;
S0233: performing parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
This embodiment realizes maximum likelihood inverse reinforcement learning through linear superposition and normalization processing, realizing autonomous learning through the maximum likelihood inverse reinforcement learning and improving generalization capability.
For S0231, the utility function set is expressed as {U_1, U_2, U_3, ..., U_n}. The utility functions in the utility function set are linearly superposed to obtain the to-be-estimated personal utility function U_agent, specifically expressed as:

$$U_{agent} = w_1 U_1 + w_2 U_2 + w_3 U_3 + \cdots + w_n U_n$$

where w_1, w_2, w_3, ..., w_n are the parameters that need to be estimated.
For S0232, it is preferable that the to-be-estimated personal utility function is normalized by a softmax function.
The Softmax function is a normalized exponential function that "compresses" one K-dimensional vector z containing arbitrary real numbers into another K-dimensional real vector σ (z) such that each element ranges between (0, 1) and the sum of all elements is 1.
Applied here, the normalized personal utility function takes the form:

$$\sigma_j = \frac{e^{U(x,a)_j}}{\sum_{i=1}^{n} e^{U(x,a)_i}}$$

where U(x,a)_j refers to the term w_j U_j of U_agent in step S0231, and U(x,a)_i refers to the term w_i U_i of U_agent in step S0231; e is the natural constant of mathematics, a non-repeating infinite decimal and a transcendental number, approximately equal to 2.718281828459.
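A minimal sketch of S0231-S0232: linearly superpose the utility terms with the weights w to be estimated, then softmax-normalize across candidate actions. The matrix layout is an assumption for illustration.

```python
import numpy as np

def normalized_utility(w, U_terms):
    """w: (n,) weights w_1..w_n to estimate; U_terms: (k, n) utility values
    U_1..U_n evaluated for k candidate actions. Returns probabilities."""
    u_agent = U_terms @ w                  # U_agent = w_1*U_1 + ... + w_n*U_n
    z = np.exp(u_agent - u_agent.max())    # numerically stable softmax
    return z / z.sum()
```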
In one embodiment, the step of performing parameter estimation on the normalized personal utility function by using the maximum entropy inverse reinforcement learning method to obtain the behavior prediction model includes:
Assuming that there is a potential probability distribution under which the expert trajectories are generated, the known condition is the feature-matching constraint:

$$\sum_{j} w_j f_j = \tilde{f}$$

where f_j represents the feature expectation (here, the expected utility value each product brings to the customer, i.e. a term of the to-be-estimated personal utility function U_agent); \tilde{f} is the expert feature expectation (the weighted utility value of the various products to the customer); and w_j is the probability of each product being selected (i.e. the weights w_1, w_2, w_3, ..., w_n of the to-be-estimated personal utility function U_agent). The problem is converted into standard form, and solving for maximum entropy becomes the optimization problem:

$$\max_{w} \; -\sum_{j} w_j \log w_j \qquad \text{s.t.} \quad \sum_{j} w_j = 1, \quad \sum_{j} w_j f_j = \tilde{f}$$

where -\sum w \log w represents the entropy of the random variable; max denotes taking the maximum value; and the expressions following s.t. are the constraints of the calculation. By the Lagrangian multiplier method:

$$L(w, \lambda) = -\sum_{j} w_j \log w_j + \lambda_0 \Big(\sum_{j} w_j - 1\Big) + \lambda_1 \Big(\sum_{j} w_j f_j - \tilde{f}\Big)$$

After solving, differentiating with respect to the probability w yields the maximum entropy probability:

$$w_j = \frac{\exp(\lambda_j f_j)}{\sum_{k} \exp(\lambda_k f_k)}$$

where exp() is the exponential function with the natural constant e as its base; the parameter \lambda_j corresponds to the Lagrangian multiplier and can be solved by the maximum likelihood method; and f_j refers to the expected utility value that each product j brings to the customer.
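A sketch of the maximum likelihood solution of λ under the reconstructed form w_j = exp(λ_j f_j) / Z: simple gradient ascent on the log-likelihood of observed expert selections. The expert selection frequencies are assumed inputs derived from the typical users' trajectories.

```python
import numpy as np

def fit_lambda(f, w_expert, lr=0.5, iters=5000):
    """f: (k,) expected utility value f_j of each product; w_expert: (k,)
    observed selection frequencies in the typical users' trajectories."""
    f = np.asarray(f, dtype=float)
    w_expert = np.asarray(w_expert, dtype=float)
    lam = np.zeros_like(f)
    for _ in range(iters):
        logits = lam * f
        w = np.exp(logits - logits.max())   # max-entropy probabilities
        w /= w.sum()
        # log-likelihood gradient: d/d(lambda_j) = (w_expert_j - w_j) * f_j
        lam += lr * (w_expert - w) * f
    return lam, w
```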
In one embodiment, the step of determining the portrait of the target user based on the behavior prediction data comprises:
S61: comparing the behavior prediction data with a preset threshold value, and taking the comparison result as the prediction result;
determining that the prediction result corresponding to the product identifier is purchase when the behavior prediction data is higher than the preset threshold value, and otherwise determining that the prediction result corresponding to the product identifier is not purchase;
S62: combining the prediction results corresponding to the product identifiers into a vector as the portrait of the target user.
For S61, the preset threshold may be chosen from 0.5, 0.55, 0.6, 0.65, 0.7, 0.75 and 0.8, which is not specifically limited herein. Compared with a low preset threshold, a high preset threshold yields prediction results with higher accuracy but narrower coverage: narrowing the coverage means that some users with a purchase intention are predicted as not purchasing.
In S62, all the prediction results corresponding to the product identifiers may be combined into a vector, and the resulting vector used as the portrait of the target user.
It can be understood that all the prediction results corresponding to the product identifiers may instead be combined into a set, and the resulting set used as the portrait of the target user.
Referring to fig. 2, the present application further provides a user portrait generating device, where the device includes:
the data acquisition module 100 is configured to acquire a state feature time sequence and a purchasing behavior time sequence of a target user, where the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
The model obtaining module 200 is configured to find a behavior prediction model corresponding to the product identifier from a preset model library, where the behavior prediction model is a model obtained based on a markov decision process and maximum likelihood inverse reinforcement learning;
the prediction module 300 is configured to input the state feature time sequence and the purchasing behavior time sequence to the behavior prediction model corresponding to the product identifier to perform probability prediction to obtain behavior prediction data of the target user;
and a portrayal module 400 for determining a portrayal of the target user based on the behavior prediction data.
According to this embodiment, the state characteristic time sequence and the purchasing behavior time sequence of the target user are acquired, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
In one embodiment, the apparatus comprises: a model training module;
the model training module is used for acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users; determining a set of utility functions for the sample data based on a markov decision process; and carrying out maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier.
In one embodiment, the model training module comprises: a sample acquisition sub-module;
the sample obtaining submodule is used for obtaining historical data of a plurality of typical users, the historical data comprising: state characteristic data of the typical user and purchasing behavior data of the typical user, wherein the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user; carrying out time sequence construction on the state characteristic data of the typical user to obtain sample data of the state characteristic time sequence of the typical user; and carrying out time sequence construction on the purchasing behavior data of the typical user according to the product identifier to obtain sample data of the purchasing behavior time sequence of the typical user.
In one embodiment, the sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of a typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user;
the model training module further comprises: determining a sub-module by a utility function;
the utility function determining submodule is used for obtaining a maximum value behavior calculation formula obtained by determining the state characteristic time sequence and the purchasing behavior time sequence of the typical user; carrying out iteration on the maximum value behavior calculation formula by adopting a dynamic programming method to carry out optimization solution to obtain a target maximum value behavior calculation formula; and extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set.
In one embodiment, the model training module further comprises: a maximum likelihood reverse reinforcement learning sub-module;
the maximum likelihood inverse reinforcement learning sub-module is used for carrying out linear superposition on utility functions in the utility function set to obtain a to-be-estimated personal utility function; normalizing the to-be-estimated personal utility function by adopting a softmax function to obtain a normalized personal utility function; and carrying out parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
In one embodiment, the maximum likelihood inverse reinforcement learning submodule includes: a parameter estimation unit;
the parameter estimation unit is configured to assume that there is a potential probability distribution, under which an expert trajectory is generated, where the known conditions are:
where f represents the characteristic expectation (here, the expected utility value of each product to the customer, that is, the personal utility function U to be estimated agent ),Is expert in nature expectations (weighted utility values of various products to clients), the probability of each product being selected (i.e. the personal utility function U to be estimated agent W of (3) 1 ,w 2 ,w 3 ,……w n ) The problem is converted into a standard type, and the problem becomes an optimal problem when solving the maximum entropy:
s.t.∑w=1
wherein plogp represents the entropy of a random variable;is to find the maximum value; s.t. followed by calculationIs a limitation of (2);
by lagrangian multiplier method:
after solving, carrying out differential calculation on the probability w to obtain the maximum entropy probability as follows:
wherein exp () is an exponential function based on a natural constant e in higher mathematics; parameter lambda j Corresponding to the Lagrangian multiplier, the parameter can be solved by using a maximum likelihood method; f (f) j Refers to the expected utility value brought to customers by each j products.
In one embodiment, the portrait module 400 includes: a prediction result determining submodule and a portrait determining submodule;
The prediction result determining submodule is used for comparing the behavior prediction data with a preset threshold value and taking the comparison result as the prediction result;
the portrait determining submodule is used for combining the prediction results corresponding to the product identifiers into a vector as the portrait of the target user.
Referring to FIG. 3, a computer device is further provided in an embodiment of the present application; the computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface and a database connected by a system bus, wherein the processor is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data such as those of the user portrait generation method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the user portrait generation method, which comprises: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user; searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning; inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
According to this embodiment, the state characteristic time sequence and the purchasing behavior time sequence of the target user are acquired, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
An embodiment of the present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the user portrait generation method, comprising the steps of: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user; searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning; inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
According to the executed user portrait generation method, the state characteristic time sequence and the purchasing behavior time sequence of the target user are acquired, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-volatile computer-readable storage medium which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, apparatus, article, or method that comprises the element.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit its scope; any equivalent structure or equivalent process made using the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, falls within the scope of the claims of the present application.
Claims (6)
1. A user portrait generation method, the method comprising:
acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
searching a preset model library for a behavior prediction model corresponding to the product identifier, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to carry out probability prediction to obtain behavior prediction data of the target user; determining the portrait of the target user according to the behavior prediction data;
before the step of searching the behavior prediction model corresponding to the product identifier from the preset model library, the method further comprises the following steps:
acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
determining a utility function set of the sample data based on a Markov decision process;
performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier;
the sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of a typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user; the step of determining a utility function set of the sample data based on a Markov decision process includes:
acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user;
iteratively performing optimization solution on the maximum value behavior calculation formula by a dynamic programming method to obtain a target maximum value behavior calculation formula;
extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set;
the state characteristic time sequence is expressed as {x_1, x_2, x_3, …, x_n}, wherein each state feature vector in {x_1, x_2, x_3, …, x_n} comprises 6 vector elements which respectively represent data generation time, personal information, financial condition, purchased product information, loan records and information browsing records;
the purchasing behavior time sequence is expressed as {a_1, a_2, a_3, …, a_n}, wherein {a_1, a_2, a_3, …, a_n} describes the purchasing behavior for one and the same product; a_i takes the value 0 to indicate that the product was purchased and the value 1 to indicate that the product was not purchased;
the maximum value behavior calculation formula A is expressed as:

A = argmax_{p(a|x)} E[ Σ_t β^t · U(x_t, a_t) ]

wherein p(a|x) is the probability of taking behavior a in state x, U(x, a) is the utility function, and x is a value in the state characteristic time sequence of the typical user;
the maximum value behavior calculation formula is optimally solved by iterating the Bellman equation V with the dynamic programming method:

V(x_t) = max_{a_t} { U(x_t, a_t) + β · E[ V(x_{t+1}) ] }

wherein V(x_t) represents the expectation of the utility function U starting from state x_t; U(x_t, a_t) is the utility function value for state x_t and behavior a_t; β is an attenuation factor taking a value between 0 and 1; x is a value in the state characteristic time sequence of the typical user, and a is a value in the purchasing behavior time sequence of the typical user;
the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model comprises the following steps:
performing linear superposition on the utility functions in the utility function set to obtain a personal utility function to be estimated;
normalizing the personal utility function to be estimated by a softmax function to obtain a normalized personal utility function;
and carrying out parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
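The dynamic-programming step recited in claim 1 iterates a Bellman equation until the value function converges. The sketch below is a minimal value-iteration routine under simplifying assumptions (a finite state set, known transition probabilities `P`, and a tabulated utility `U`); all identifiers are illustrative, not the patented implementation.

```python
from typing import Dict, Iterable, Tuple

def value_iteration(states: Iterable[str], actions: Iterable[int],
                    U: Dict[Tuple[str, int], float],
                    P: Dict[Tuple[str, int, str], float],
                    beta: float = 0.9, tol: float = 1e-6) -> Dict[str, float]:
    """Iterate V(x) = max_a [ U(x, a) + beta * sum_x' P(x'|x, a) * V(x') ]."""
    states, actions = list(states), list(actions)
    V = {x: 0.0 for x in states}
    while True:
        delta = 0.0
        for x in states:
            best = max(U[(x, a)] + beta * sum(P.get((x, a, x2), 0.0) * V[x2]
                                              for x2 in states)
                       for a in actions)
            delta = max(delta, abs(best - V[x]))
            V[x] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return V
```

Here `beta` plays the role of the attenuation factor β; in the purchasing setting, `actions` would be {0, 1} (purchased / not purchased) and `states` a discretization of the state feature vectors.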
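The maximum likelihood inverse reinforcement learning step superposes basis utility functions linearly, softmax-normalizes the result into a policy p(a|x), and estimates the superposition weights from observed trajectories. Below is a minimal gradient-ascent sketch under those assumptions; the basis functions and all names are hypothetical, and a full maximum entropy IRL implementation would additionally account for the state dynamics.

```python
import math
from typing import Callable, List, Sequence, Tuple

def policy(weights: List[float],
           basis: List[Callable[[str, int], float]],
           x: str, actions: Sequence[int]) -> List[float]:
    """Softmax over the superposed utility U(x, a) = sum_k w_k * f_k(x, a)."""
    utils = [sum(w * f(x, a) for w, f in zip(weights, basis)) for a in actions]
    m = max(utils)                          # shift for numerical stability
    exps = [math.exp(u - m) for u in utils]
    z = sum(exps)
    return [e / z for e in exps]

def fit_weights(trajectories: List[List[Tuple[str, int]]],
                basis: List[Callable[[str, int], float]],
                actions: Sequence[int] = (0, 1),
                lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Maximize the likelihood of observed (state, behavior) pairs."""
    w = [0.0] * len(basis)
    for _ in range(epochs):
        grad = [0.0] * len(basis)
        for traj in trajectories:
            for x, a in traj:
                p = policy(w, basis, x, actions)
                for k, f in enumerate(basis):
                    # d log p(a|x) / d w_k = f_k(x, a) - E_p[f_k(x, .)]
                    expected = sum(p[i] * f(x, ai)
                                   for i, ai in enumerate(actions))
                    grad[k] += f(x, a) - expected
        w = [wk + lr * g for wk, g in zip(w, grad)]
    return w
```

The gradient f_k(x, a) − E_p[f_k] is the standard likelihood gradient of a softmax (Boltzmann) policy, which is what makes the learning autonomous: no hand-labeled reward signal is required beyond the observed purchasing behavior.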
2. The user portrait generation method according to claim 1, wherein the acquiring sample data of a plurality of typical users includes:
acquiring historical data of a plurality of typical users, the historical data comprising: state characteristic data of the typical users and purchasing behavior data of the typical users, wherein the purchasing behavior data of a typical user carries a product identifier of a product purchased by the typical user;
carrying out time sequence construction on the state characteristic data of the typical user to obtain sample data of the state characteristic time sequence of the typical user;
and constructing the time sequence of the typical user purchasing behavior data according to the product identifier to obtain sample data of the typical user purchasing behavior time sequence.
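Claim 2's time sequence construction amounts to sorting historical records by time and splitting them into a shared state feature sequence and per-product purchasing behavior sequences. A minimal sketch; the record field names (`timestamp`, `state_features`, `product_id`, `purchase_flag`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_time_series(history: List[dict]) -> Tuple[list, Dict[str, List[int]]]:
    """Sort historical records by time, then split state features from
    per-product purchasing behavior (keyed by product identifier)."""
    records = sorted(history, key=lambda r: r["timestamp"])
    state_series = [r["state_features"] for r in records]   # x_1 ... x_n
    behavior_series: Dict[str, List[int]] = defaultdict(list)
    for r in records:
        # 0 = purchased, 1 = not purchased, following the claim's convention
        behavior_series[r["product_id"]].append(r["purchase_flag"])
    return state_series, dict(behavior_series)
```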
3. The user portrait generation method according to claim 1, wherein the step of determining the portrait of the target user according to the behavior prediction data comprises:
comparing the behavior prediction data with a preset threshold value, and taking the comparison result as a prediction result;
and combining the prediction results corresponding to the product identifiers into a vector to serve as the portrait of the target user.
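Claim 3 reduces the behavior prediction data to a portrait by thresholding each product's predicted probability and concatenating the results into a vector. A minimal sketch, with the 0.5 default as an assumed placeholder for the preset threshold:

```python
from typing import Dict, List

def portrait_from_predictions(predictions: Dict[str, float],
                              threshold: float = 0.5) -> List[int]:
    """Compare each product's predicted probability against a preset
    threshold and combine the comparison results into a portrait vector."""
    # Fixed product order keeps vector components comparable across users.
    return [1 if predictions[pid] >= threshold else 0
            for pid in sorted(predictions)]
```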
4. A user portrait generation apparatus, the apparatus comprising:
the data acquisition module is used for acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
the model acquisition module is used for searching a preset model library for a behavior prediction model corresponding to the product identifier, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
the prediction module is used for inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction, so as to obtain behavior prediction data of the target user;
the portrait module is used for determining the portrait of the target user according to the behavior prediction data;
the model training module is used for acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users; determining a utility function set of the sample data based on a Markov decision process; and performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier;
the utility function determination submodule is used for acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user; iteratively performing optimization solution on the maximum value behavior calculation formula by a dynamic programming method to obtain a target maximum value behavior calculation formula; and extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set;
the state characteristic time sequence is expressed as {x_1, x_2, x_3, …, x_n}, wherein each state feature vector in {x_1, x_2, x_3, …, x_n} comprises 6 vector elements which respectively represent data generation time, personal information, financial condition, purchased product information, loan records and information browsing records;
the purchasing behavior time sequence is expressed as {a_1, a_2, a_3, …, a_n}, wherein {a_1, a_2, a_3, …, a_n} describes the purchasing behavior for one and the same product; a_i takes the value 0 to indicate that the product was purchased and the value 1 to indicate that the product was not purchased;
the maximum value behavior calculation formula A is expressed as:

A = argmax_{p(a|x)} E[ Σ_t β^t · U(x_t, a_t) ]

wherein p(a|x) is the probability of taking behavior a in state x, U(x, a) is the utility function, and x is a value in the state characteristic time sequence of the typical user;
the maximum value behavior calculation formula is optimally solved by iterating the Bellman equation V with the dynamic programming method:

V(x_t) = max_{a_t} { U(x_t, a_t) + β · E[ V(x_{t+1}) ] }

wherein V(x_t) represents the expectation of the utility function U starting from state x_t; U(x_t, a_t) is the utility function value for state x_t and behavior a_t; β is an attenuation factor taking a value between 0 and 1; x is a value in the state characteristic time sequence of the typical user, and a is a value in the purchasing behavior time sequence of the typical user;
the maximum likelihood inverse reinforcement learning submodule is used for performing linear superposition on the utility functions in the utility function set to obtain a personal utility function to be estimated; normalizing the personal utility function to be estimated by a softmax function to obtain a normalized personal utility function; and performing parameter estimation on the normalized personal utility function by a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
5. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 3 when the computer program is executed.
6. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 3.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011118110.XA CN112256961B (en) | 2020-10-19 | 2020-10-19 | User portrait generation method, device, equipment and medium |
PCT/CN2020/132601 WO2021189922A1 (en) | 2020-10-19 | 2020-11-30 | Method and apparatus for generating user portrait, and device and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011118110.XA CN112256961B (en) | 2020-10-19 | 2020-10-19 | User portrait generation method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112256961A CN112256961A (en) | 2021-01-22 |
CN112256961B true CN112256961B (en) | 2024-04-09 |
Family
ID=74243980
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011118110.XA Active CN112256961B (en) | 2020-10-19 | 2020-10-19 | User portrait generation method, device, equipment and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112256961B (en) |
WO (1) | WO2021189922A1 (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022190454A (en) * | 2021-06-14 | 2022-12-26 | 富士通株式会社 | Inverse reinforcement learning program, inverse reinforcement learning method, and information processor |
CN113516533A (en) * | 2021-06-24 | 2021-10-19 | 平安科技(深圳)有限公司 | Product recommendation method, device, equipment and medium based on improved BERT model |
CN113592551A (en) * | 2021-07-31 | 2021-11-02 | 广州小鹏汽车科技有限公司 | Method, device and equipment for analyzing and processing behavior data of vehicle purchasing user |
CN113988070B (en) * | 2021-10-09 | 2023-05-05 | 广州快决测信息科技有限公司 | Investigation problem generation method, investigation problem generation device, computer equipment and storage medium |
CN115994259A (en) * | 2021-10-20 | 2023-04-21 | 上海点掌文化科技股份有限公司 | User portrait generation method and device, storage medium and terminal |
CN113963440B (en) * | 2021-10-22 | 2025-03-14 | 北京明略软件系统有限公司 | A method and device for analyzing customer purchase intention |
CN114331512B (en) * | 2021-12-22 | 2023-08-25 | 重庆汇博利农科技有限公司 | Visual data modeling and big data portrayal method |
CN113988727B (en) * | 2021-12-28 | 2022-05-10 | 卡奥斯工业智能研究院(青岛)有限公司 | Resource scheduling method and system |
CN114663132A (en) * | 2022-03-02 | 2022-06-24 | 厦门文杉信息科技有限公司 | A kind of intelligent marketing method and device based on real-time user portrait |
CN115098931B (en) * | 2022-07-20 | 2022-12-16 | 江苏艾佳家居用品有限公司 | Small sample analysis method for mining personalized requirements of indoor design of user |
CN115907106A (en) * | 2022-11-04 | 2023-04-04 | 广东电网有限责任公司 | Micro-grid scheduling method, device, equipment and storage medium based on edge calculation |
CN115952908A (en) * | 2022-12-30 | 2023-04-11 | 北京科大讯飞教育科技有限公司 | Learning path planning method, system, device and storage medium |
CN117271905B (en) * | 2023-11-21 | 2024-02-09 | 杭州小策科技有限公司 | Crowd image-based lateral demand analysis method and system |
CN117710009B (en) * | 2023-11-21 | 2024-10-15 | 中国电子科技集团公司第十五研究所 | Petroleum demand prediction method and system |
CN118350680B (en) * | 2024-06-18 | 2024-09-20 | 国网山东省电力公司营销服务中心(计量中心) | Electricity fee anomaly identification method and system based on dynamic multi-target gravitation search |
CN119379347B (en) * | 2024-08-19 | 2025-05-06 | 南京弘竹泰信息技术有限公司 | User portrait generation cloud platform in enterprise digital operation |
CN118916398B (en) * | 2024-10-09 | 2024-12-27 | 每日互动股份有限公司 | Method, device, medium and equipment for obtaining user portrait prediction model |
CN119313985B (en) * | 2024-12-17 | 2025-05-30 | 上海孚厘科技有限公司 | Portrait marking method and device based on stream data and computer equipment |
CN119918642B (en) * | 2025-04-07 | 2025-06-20 | 合肥工业大学 | Expert portrait characterization and dynamic update method based on reinforcement learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013239065A (en) * | 2012-05-16 | 2013-11-28 | Nippon Telegr & Teleph Corp <Ntt> | Initial purchase estimation device, initial purchase estimation method and initial purchase estimation program |
CN106228314A (en) * | 2016-08-11 | 2016-12-14 | 电子科技大学 | The workflow schedule method of study is strengthened based on the degree of depth |
CN108594638A (en) * | 2018-03-27 | 2018-09-28 | 南京航空航天大学 | The in-orbit reconstructing methods of spacecraft ACS towards the constraint of multitask multi-index optimization |
CN110570279A (en) * | 2019-09-04 | 2019-12-13 | 深圳创新奇智科技有限公司 | Strategic recommendation method and device based on real-time user behavior |
CN111159534A (en) * | 2019-12-03 | 2020-05-15 | 泰康保险集团股份有限公司 | User portrait based aid decision making method and device, equipment and medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7835936B2 (en) * | 2004-06-05 | 2010-11-16 | Sap Ag | System and method for modeling customer response using data observable from customer buying decisions |
CN105761102B (en) * | 2016-02-04 | 2021-05-11 | 杭州朗和科技有限公司 | Method and device for predicting commodity purchasing behavior of user |
KR101813805B1 (en) * | 2016-09-28 | 2017-12-29 | 한양대학교 산학협력단 | Method and Apparatus for purchase probability prediction of user using machine learning |
KR102408476B1 (en) * | 2017-07-10 | 2022-06-14 | 십일번가 주식회사 | Method for predicing purchase probability based on behavior sequence of user and apparatus therefor |
CN107705155A (en) * | 2017-10-11 | 2018-02-16 | 北京三快在线科技有限公司 | A kind of consuming capacity Forecasting Methodology, device, electronic equipment and readable storage medium storing program for executing |
CN108492138B (en) * | 2018-03-19 | 2020-03-24 | 平安科技(深圳)有限公司 | Product purchase prediction method, server and storage medium |
- 2020-10-19: CN application CN202011118110.XA filed; granted as CN112256961B (Active)
- 2020-11-30: PCT application PCT/CN2020/132601 filed as WO2021189922A1 (Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN112256961A (en) | 2021-01-22 |
WO2021189922A1 (en) | 2021-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112256961B (en) | User portrait generation method, device, equipment and medium | |
JP7276757B2 (en) | Systems and methods for model fairness | |
CN109345302B (en) | Machine learning model training method and device, storage medium and computer equipment | |
Burnap et al. | Design and evaluation of product aesthetics: A human-machine hybrid approach | |
CN108764584B (en) | Enterprise electric energy substitution potential evaluation method | |
CN112182384B (en) | Content recommendation method and device based on countermeasure learning and computer equipment | |
CN113536105B (en) | Recommendation model training method and device | |
CN112905876A (en) | Information pushing method and device based on deep learning and computer equipment | |
CN108491511A (en) | Data digging method and device, model training method based on diagram data and device | |
WO2016165058A1 (en) | Social prediction | |
CN113762005B (en) | Feature selection model training and object classification methods, devices, equipment and media | |
Desirena et al. | Maximizing customer lifetime value using stacked neural networks: An insurance industry application | |
CN112380427A (en) | User interest prediction method based on iterative graph attention network and electronic device | |
CN112270571A (en) | A meta-model training method for cold-start advertisement click-through rate prediction model | |
CN118364317A (en) | Sample expansion method, sample expansion device, computer equipment and readable storage medium | |
CN119151608A (en) | Advertisement effect attribution assessment method, apparatus, computer device and storage medium | |
US11775887B2 (en) | Methods and systems for processing data having varied temporal characteristics to generate predictions related to management arrangements using random forest classifiers | |
Dikopoulou et al. | A new approach using mixed graphical model for automatic design of fuzzy cognitive maps from ordinal data | |
Tran et al. | Intervention recommendation for improving disability employment | |
Horvath et al. | Granger causality using neural networks | |
CN117436968A (en) | Method, device, computer equipment and storage medium for recommending products | |
Cuevas et al. | Otsu and Kapur segmentation based on harmony search optimization | |
CN114140848B (en) | Micro expression recognition method, system, equipment and storage medium based on KNN and DSN | |
Hu et al. | Learning mixed multinomial logits with provable guarantees | |
Kutiel et al. | What's behind the mask: Estimating uncertainty in image-to-image problems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||