CN112256961B - User portrait generation method, device, equipment and medium - Google Patents
User portrait generation method, device, equipment and medium
- Publication number
- CN112256961B (application CN202011118110.XA)
- Authority
- CN
- China
- Prior art keywords
- behavior
- user
- time sequence
- product
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
Abstract
The application relates to the technical field of artificial intelligence and discloses a user portrait generation method, device, equipment and medium. The method comprises: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user; searching a preset model library for a behavior prediction model corresponding to the product identifier, the behavior prediction model being obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning; inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction, obtaining behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data. The method fully mines user behavior across changes in life stage, life state and consumption scene, improving both the accuracy and the granularity of the user portrait.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a user portrait generating method, apparatus, device, and medium.
Background
A user portrait is a digital abstraction of a user role, a model for analyzing and mining user behavior. Constructing an accurate user portrait can help an enterprise expand sales of emerging products: by knowing the environment a user is in and the products the user needs, targeted marketing can be conducted. Traditional user portrait models adopt a crowd model or a persona model, which can only analyze a user in a single scene and cannot follow changes in the user's life stage, life state, consumption scene and the like. The descriptive content of existing user portraits lacks individuation, and the granularity of the user portrait is coarse, so it is difficult to satisfy the requirements of multiple marketing scenes and of various roles, and difficult to track the long-term evolution of customer behavior. Under these difficulties, the improvement that user portraits bring to precise marketing is limited: the requirements of marketing-side business personnel cannot be met in real time, and the characteristic differences and demand differences of different types of users cannot be distinguished at fine granularity.
Disclosure of Invention
The main purpose of the present application is to provide a user portrait generation method, device, equipment and medium, aiming to solve the technical problems in the prior art that the improvement user portraits bring to precise marketing is limited, that the requirements of marketing-side business personnel cannot be met in real time, and that the characteristic differences and demand differences of different types of users cannot be distinguished at fine granularity.
In order to achieve the above object, the present application proposes a user portrait generating method, which includes:
acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to carry out probability prediction to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
Further, before the step of searching the behavior prediction model corresponding to the product identifier from the preset model library, the method further includes:
acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
determining a utility function set for the sample data based on a Markov decision process;
And carrying out maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier.
Further, the acquiring sample data of a plurality of typical users includes:
acquiring historical data of a plurality of typical users, the historical data comprising: state characteristic data of the typical user and purchasing behavior data of the typical user, wherein the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user;
carrying out time sequence construction on the state characteristic data of the typical user to obtain sample data of the state characteristic time sequence of the typical user;
and carrying out time sequence construction on the purchasing behavior data of the typical user according to the product identifier to obtain sample data of the purchasing behavior time sequence of the typical user.
Further, the sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of a typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user; the step of determining a utility function set for the sample data based on a Markov decision process includes:
acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user;
iteratively performing optimization solution on the maximum value behavior calculation formula by adopting a dynamic programming method to obtain a target maximum value behavior calculation formula;
and extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set.
Further, the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model includes:
performing linear superposition on utility functions in the utility function set to obtain a to-be-estimated personal utility function;
normalizing the to-be-estimated personal utility function by adopting a softmax function to obtain a normalized personal utility function;
and carrying out parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
Further, the step of performing parameter estimation on the normalized personal utility function by using a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model includes:
Assuming that there is a potential probability distribution under which the expert trajectories are generated, the known condition is the feature-matching constraint:

$$\sum_{j} w_j f_j = \tilde{f}$$

where f_j represents the feature expectation (here, the expected utility value each product brings to the customer, i.e. a term of the to-be-estimated personal utility function U_agent); \tilde{f} is the expert feature expectation (the weighted utility value of the various products to the customer); and w_j is the probability of each product being selected (i.e. the weights w_1, w_2, w_3, ..., w_n of the to-be-estimated personal utility function U_agent). The problem is converted into standard form, and solving for maximum entropy becomes the optimization problem:

$$\max_{w} \; -\sum_{j} w_j \log w_j \qquad \text{s.t.} \quad \sum_{j} w_j = 1, \quad \sum_{j} w_j f_j = \tilde{f}$$

where -\sum w \log w represents the entropy of the random variable; max denotes taking the maximum value; and the expressions following s.t. are the constraints of the calculation. By the Lagrangian multiplier method:

$$L(w, \lambda) = -\sum_{j} w_j \log w_j + \lambda_0 \Big(\sum_{j} w_j - 1\Big) + \lambda_1 \Big(\sum_{j} w_j f_j - \tilde{f}\Big)$$

After solving, differentiating with respect to the probability w yields the maximum entropy probability:

$$w_j = \frac{\exp(\lambda_j f_j)}{\sum_{k} \exp(\lambda_k f_k)}$$

where exp() is the exponential function with the natural constant e as its base; the parameter \lambda_j corresponds to the Lagrangian multiplier and can be solved by the maximum likelihood method; and f_j refers to the expected utility value that each product j brings to the customer.
Further, the step of determining the portrait of the target user based on the behavior prediction data includes:
comparing the behavior prediction data with a preset threshold value, and taking the comparison result as a prediction result;
And combining the prediction results corresponding to the product identifiers into a vector to serve as the portrait of the target user.
The application also provides a user portrait generating device, which comprises:
the data acquisition module is used for acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
the model acquisition module is used for searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
the prediction module is used for inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to carry out probability prediction so as to obtain behavior prediction data of the target user;
and the portrait module is used for determining the portrait of the target user according to the behavior prediction data.
The present application also proposes a computer device comprising a memory storing a computer program and a processor implementing the steps of any of the methods described above when the processor executes the computer program.
The present application also proposes a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method of any of the above.
The user portrait generation method, device, equipment and medium acquire the state characteristic time sequence and the purchasing behavior time sequence of the target user, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
Drawings
FIG. 1 is a schematic flow chart of a user portrait generation method according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of the structure of a user portrait generating device according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In order to solve the technical problems in the prior art that the improvement user portraits bring to precise marketing is limited, that the requirements of marketing-side business personnel cannot be met in real time, and that the characteristic differences and demand differences of different types of users cannot be distinguished at fine granularity, a user portrait generation method is provided, applied to the technical field of artificial intelligence. In the method, the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, and probability prediction is then performed with this behavior prediction model; the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
Referring to FIG. 1, the user portrait generation method includes:
S1: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
S2: searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
S3: inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction to obtain behavior prediction data of the target user;
S4: and determining the portrait of the target user according to the behavior prediction data.
According to this embodiment, the state characteristic time sequence and the purchasing behavior time sequence of the target user are acquired, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
For S1, the state characteristic time series and the purchase behavior time series of the target user may be acquired from the database.
The state characteristic time sequence and the purchasing behavior time sequence of the target user refer to the state characteristic time sequence and the purchasing behavior time sequence of the same user to be portrayed.
The state characteristic time sequence refers to a time sequence of state feature vectors of the user to be portrayed. Each state feature vector represents a plurality of items of user information; that is, the state characteristic time sequence includes a plurality of state feature vectors arranged in time order. User information includes, but is not limited to: personal information, financial status, purchased product information, loan records, and information browsing records. For example, the state characteristic time sequence may be expressed as {x_1, x_2, x_3, ..., x_n}, where each state feature vector in {x_1, x_2, x_3, ..., x_n} includes 6 vector elements respectively representing the data generation time, personal information, financial status, purchased product information, loan record, and information browsing record; that is, x_i, the i-th value in {x_1, x_2, x_3, ..., x_n} (the state feature vector at the i-th time), comprises these 6 vector elements. This example is not specifically limiting.
The purchasing behavior time sequence refers to a time sequence of purchasing behavior features of the user to be portrayed with respect to a product. The purchasing behavior time sequence includes a plurality of purchasing behavior features, each of which takes a value: for example, a purchasing behavior feature of 1 indicates that the product was purchased, and 0 indicates that it was not, this example being non-limiting. For example, the purchasing behavior time sequence may be expressed as {a_1, a_2, a_3, ..., a_n}, which records the purchasing behavior for one and the same product; each a_i takes the value 0 or 1, where a_i = 1 indicates that the product was purchased and a_i = 0 indicates that it was not, and a_i is the i-th value in {a_1, a_2, a_3, ..., a_n} (the purchasing behavior feature at the i-th time). This example is not specifically limiting.
Preferably, the number of state feature vectors in the state feature time sequence is the same as the number of purchasing behavior features in the purchasing behavior time sequence.
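For illustration only (not part of the claimed method), the following minimal Python sketch shows one plausible in-memory layout of the two aligned time sequences described above; all field names and example values are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StateFeature:
    """One state feature vector x_i: the 6 assumed vector elements."""
    timestamp: str            # data generation time
    personal_info: str        # e.g. an age/occupation code
    financial_status: float   # e.g. normalized income or balance
    products_held: int        # purchased product information
    loan_records: int         # number of loan records
    pages_browsed: int        # information browsing record

# State characteristic time sequence {x_1, ..., x_n}
state_series: List[StateFeature] = [
    StateFeature("2020-01", "A3", 0.62, 2, 1, 14),
    StateFeature("2020-02", "A3", 0.65, 2, 1, 9),
]

# Purchasing behavior time sequence {a_1, ..., a_n} for one product
# identifier: 1 = purchased at that step, 0 = not; same length as above.
purchase_series = {"product_id": "P001", "actions": [0, 1]}
```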
For S2, among the product identifiers of the preset model library, find the identifier that is the same as the product identifier, carried by the purchasing behavior time sequence, of the product purchased by the target user, and take the behavior prediction model corresponding to the found product identifier as the behavior prediction model corresponding to the product identifier.
The preset model library comprises at least one behavior prediction model, and each behavior prediction model carries a product identifier. The behavior prediction model is a model for performing probability prediction on the purchasing behavior of the target user.
The behavior prediction model is obtained by performing modeling and autonomous learning on sample data of a plurality of typical users based on a Markov decision process and maximum likelihood inverse reinforcement learning. That is, the behavior prediction model carries the same product identifier as the sample data of the plurality of typical users employed for the modeling and autonomous learning.
For S3, the state characteristic time sequence and the purchasing behavior time sequence are input into the behavior prediction model corresponding to the product identifier carried by the input purchasing behavior time sequence for probability prediction, and the behavior prediction data of the target user output by that model is obtained; that is, the product identifier corresponding to the behavior prediction data is the same as the product identifier carried by the purchasing behavior time sequence used for prediction.
The behavior prediction data refers to a probability prediction value of purchasing behavior of a product by a target user.
By repeating steps S2 to S3, probability prediction over the state characteristic time sequence and each of the multiple purchasing behavior time sequences can be completed. That is, each pass of steps S2 to S3 predicts the probability prediction value of the target user's purchasing behavior for one product only.
For S4, the portrait of the target user is used for describing whether the target user purchases the product.
For example, the portrait of the target user may be expressed as the vector [1 0 1 1], where the first vector element represents product one, the second represents product two, the third represents product three, and the fourth represents product four; a vector element value of 0 represents no purchase and a value of 1 represents purchase. The portrait [1 0 1 1] thus represents that the target user purchases product one, product three and product four, and does not purchase product two. This example is not specifically limiting.
For another example, the portrait of the target user may also be expressed as the set {product one: 1, product two: 0, product three: 1, product four: 1}, where a set element value of 0 represents no purchase and a value of 1 represents purchase; the portrait {product one: 1, product two: 0, product three: 1, product four: 1} then represents that the target user purchases product one, product three and product four, and does not purchase product two. This example is not specifically limiting.
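As a non-authoritative sketch of how steps S2 to S4 compose per product, the code below assumes a dict-like model library and a `predict_proba` interface; both are illustrative placeholders, not the patent's API.

```python
def build_portrait(state_series, purchase_series_by_product, model_library,
                   threshold=0.5):
    """Repeat S2-S3 for each product, then S4: threshold and assemble."""
    portrait = {}
    for product_id, actions in purchase_series_by_product.items():
        model = model_library[product_id]                    # S2: per-product model
        prob = model.predict_proba(state_series, actions)    # S3: probability
        portrait[product_id] = 1 if prob > threshold else 0  # S4: compare
    return portrait

# Yields e.g. {"product one": 1, "product two": 0, "product three": 1,
#              "product four": 1}, matching the examples above.
```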
In an embodiment, before the step of searching the behavior prediction model corresponding to the product identifier from the preset model library, the method further includes:
S021: acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
S022: determining a utility function set for the sample data based on a Markov decision process;
S023: performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier.
This embodiment determines the behavior prediction model based on a Markov decision process and maximum likelihood inverse reinforcement learning using sample data of a plurality of typical users; the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
For S021, sample data for a plurality of typical users may be obtained from the database.
The sample data of the typical user refers to data of a representative customer, and is determined according to historical customer data. Representative customers refer to customers in a class of customers who have a desire and behavior to purchase a product at an average level for the class of customers. Wherein clients with similar income level, similar education level, similar family member composition and similar work experience are divided into the same class of clients. It will be appreciated that there are other ways to categorize clients, such as clients of similar education and similar family members, into the same category of clients, and the examples are not specifically limited herein.
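A minimal sketch, assuming bucketed customer attributes, of how a class of customers and its average-level typical user might be derived; the column names and bands are illustrative assumptions, not the patent's data schema.

```python
import pandas as pd

# One row per historical customer; band columns are assumed discretizations.
customers = pd.DataFrame({
    "income_band":    ["mid", "mid", "high"],
    "education_band": ["bachelor", "bachelor", "master"],
    "family_size":    [3, 3, 4],
    "bought_P001":    [1, 0, 1],   # historical purchase flag for product P001
})

# Customers with similar income, education and family composition form one
# class; the per-class mean approximates the typical user's purchase level.
typical_users = (customers
                 .groupby(["income_band", "education_band", "family_size"])
                 .mean()
                 .reset_index())
```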
The sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of the typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user.
The state characteristic time sequence of the typical user refers to a time sequence of state characteristic vectors of the typical user.
The time series of purchasing behavior of the typical user refers to the time series of purchasing behavior characteristics of the typical user on a certain product.
Preferably, the number of state feature vectors in the state feature time sequence of the typical user is the same as the number of purchase behavior features in the purchase behavior time sequence of the typical user.
For S022, a relationship among states, behaviors and utility functions is established based on a Markov decision process according to the state characteristic time sequences of all the typical users and the purchasing behavior time sequences, carrying the same product identifier, of all the typical users. The utility functions are then optimized and solved, and the utility function set is determined from the optimization result: the utility functions are extracted from the optimization result and combined into a set, and this set of extracted utility functions is the utility function set.
Preferably, the number of utility functions in the utility function set is the same as the number of state feature vectors in the state feature time sequence of the typical user.
And for S023, when maximum likelihood inverse reinforcement learning is carried out according to the utility function set, integrating utility functions in the utility function set in a linear superposition mode, carrying out parameter estimation on an integration result by adopting maximum entropy inverse reinforcement learning, and completing parameter estimation to obtain the behavior prediction model, thereby fitting personal utility functions and purchasing behavior characteristics.
The product identifier carried by the behavior prediction model is the same as the product identifier corresponding to the time series of purchasing behavior of the typical user in step S022.
In one embodiment, the acquiring sample data of a plurality of typical users includes:
S0211: acquiring historical data of a plurality of typical users, the historical data comprising: state characteristic data of the typical user and purchasing behavior data of the typical user, wherein the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user;
S0212: carrying out time sequence construction on the state characteristic data of the typical user to obtain sample data of the state characteristic time sequence of the typical user;
S0213: carrying out time sequence construction on the purchasing behavior data of the typical user according to the product identifier to obtain sample data of the purchasing behavior time sequence of the typical user.
According to the embodiment, the time sequence construction is carried out on the state characteristic data of the typical user to obtain the sample data of the state characteristic time sequence of the typical user, the time sequence construction is carried out on the purchasing behavior data of the typical user according to the product identification to obtain the sample data of the purchasing behavior time sequence of the typical user, so that the sample data of the typical user realizes the description of the life stage, the life state and the consumption scene of the user, the construction of multi-view user portraits is facilitated, and the user portrayal requirements of complex scenes are met.
For S0211, acquiring historical customer data to be processed; and extracting typical user characteristics according to the historical client data to be processed to obtain the historical data of the plurality of typical users.
The historical data of each typical user corresponds to one typical user.
The state characteristic data is a data set.
Preferably, the number of the state feature data in the state feature data of the typical user is the same as the number of the purchase behavior data in the purchase behavior data of the typical user.
For S0212, extracting state characteristic data from the state characteristic data of the typical user; and constructing the extracted state characteristic data in a time sequence to obtain sample data of the typical user state characteristic time sequence.
For S0213, purchasing behavior data is extracted from the purchasing behavior data of the typical user according to the product identifier, and the extracted purchasing behavior data is constructed into a time sequence to obtain sample data of the purchasing behavior time sequence of the typical user. That is, the purchasing behavior time sequence for one product identifier is extracted each time, and the plurality of purchasing behavior time sequences corresponding to the same typical user can be determined through multiple extractions.
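The following sketch illustrates S0212-S0213 under an assumed record layout: sort one typical user's history by time, then build the state series once and one purchase-flag series per product identifier.

```python
from collections import defaultdict

def build_sample(history):
    """history: list of records like
    {"time": "2020-01", "state": [...], "purchases": {"P001": 1, "P002": 0}}
    (layout assumed for illustration)."""
    history = sorted(history, key=lambda r: r["time"])
    state_series = [r["state"] for r in history]             # S0212
    behavior_series = defaultdict(list)                      # S0213
    for record in history:
        for product_id, bought in record["purchases"].items():
            behavior_series[product_id].append(bought)
    return state_series, dict(behavior_series)
```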
In one embodiment, the sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of a typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user; the step of determining a utility function set based on a Markov decision process according to the state characteristic time sequences of all the typical users and the purchasing behavior time sequences, carrying the same product identifier, of all the typical users includes:
S0221: acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user;
S0222: iteratively performing optimization solution on the maximum value behavior calculation formula by adopting a dynamic programming method to obtain a target maximum value behavior calculation formula;
S0223: extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set.
The embodiment realizes that the utility function set is determined based on the Markov decision process by adopting sample data of a plurality of typical users, and the Markov decision process can fully mine user behaviors when the life stage, the life state and the consumption scene change.
For S0221, the maximum value behavior calculation formula is expressed as:

$$a^{*} = \arg\max_{a} \; p(a \mid x)\, U(x, a)$$

where p(a|x) is the probability of taking action a in state x, and U(x, a) is the utility function; x is a value in the state characteristic time sequence of the typical user, which is expressed as {x_1, x_2, x_3, ..., x_n}; a is a value in the purchasing behavior time sequence of the typical user, which is expressed as {a_1, a_2, a_3, ..., a_n}.
And for S0222, carrying out iteration optimization solution on the maximum value behavior calculation formula by adopting a dynamic programming method to obtain the target maximum value behavior calculation formula.
The optimization solution seeks an optimal strategy that lets a typical user always obtain a greater return than under other strategies during interaction with each state feature in the state characteristic time sequence. That is, the optimization solution maximizes p(a|x) U(x, a); the utility function U(x, a) extracted when this value is maximal is the most valuable utility function.
That is, an optimal strategy is sought that allows the individual to always obtain a greater return during interaction with the environment than under other strategies; this strategy can be denoted π. Once the optimal strategy π is found, the reinforcement learning problem is solved. In general, it is difficult to find the optimal strategy, but a better strategy, i.e. a locally optimal solution, can be determined by comparing the merits of several different strategies.
Preferably, the maximum value behavior calculation formula is optimized and solved iteratively by a dynamic programming method using the Bellman equation:

$$V(x_t) = \max_{a_t} \big[\, U(x_t, a_t) + \beta \, V(x_{t+1}) \,\big]$$

where V(x_t) represents the expectation of the utility function U starting from state x_t; U(x_t, a_t) represents the utility function value at x_t and a_t (time t); β is an attenuation factor taking a value between 0 and 1 (either endpoint may be included); x is a value in the state characteristic time sequence of the typical user, and a is a value in the purchasing behavior time sequence of the typical user. Preferably, the attenuation factor takes the value 0.9, which avoids excessive attenuation; t denotes time; and U is the utility function U(x, a).
And for S0223, extracting a utility function from the target maximum value behavior calculation formula, and putting the extracted utility function into the utility function set.
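For S0222, a tabular value-iteration sketch of the Bellman backup above, under simplifying assumptions: finite state/action indices, a known utility table U, and a deterministic successor table. All of these are placeholders for illustration, not the patent's implementation.

```python
import numpy as np

def value_iteration(U, next_state, beta=0.9, iters=100):
    """U: (n_states, n_actions) utility table U(x, a); next_state: same shape,
    index of the successor state x_{t+1}. Returns V and the greedy action."""
    V = np.zeros(U.shape[0])
    for _ in range(iters):
        # Bellman backup: V(x_t) = max_a [ U(x_t, a) + beta * V(x_{t+1}) ]
        Q = U + beta * V[next_state]
        V = Q.max(axis=1)
    # Greedy action per state: the "maximum value behavior" a*
    return V, Q.argmax(axis=1)
```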
In one embodiment, the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model includes:
S0231: performing linear superposition on the utility functions in the utility function set to obtain a to-be-estimated personal utility function;
S0232: normalizing the to-be-estimated personal utility function by adopting a softmax function to obtain a normalized personal utility function;
S0233: performing parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
This embodiment realizes maximum likelihood inverse reinforcement learning through linear superposition and normalization processing, realizing autonomous learning through the maximum likelihood inverse reinforcement learning and improving generalization capability.
For S0231, the utility function set is expressed as {U_1, U_2, U_3, ..., U_n}. The utility functions in the utility function set are linearly superposed to obtain the to-be-estimated personal utility function U_agent, specifically expressed as:

$$U_{agent} = w_1 U_1 + w_2 U_2 + w_3 U_3 + \cdots + w_n U_n$$

where w_1, w_2, w_3, ..., w_n are the parameters that need to be estimated.
For S0232, it is preferable that the to-be-estimated personal utility function is normalized by a softmax function.
The Softmax function is a normalized exponential function that "compresses" one K-dimensional vector z containing arbitrary real numbers into another K-dimensional real vector σ (z) such that each element ranges between (0, 1) and the sum of all elements is 1.
Applied here, the normalized personal utility function takes the form:

$$\sigma_j = \frac{e^{U(x,a)_j}}{\sum_{i=1}^{n} e^{U(x,a)_i}}$$

where U(x,a)_j refers to the term w_j U_j of U_agent in step S0231, and U(x,a)_i refers to the term w_i U_i of U_agent in step S0231; e is the natural constant of mathematics, a non-repeating infinite decimal and a transcendental number, approximately equal to 2.718281828459.
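A minimal sketch of S0231-S0232: linearly superpose the utility terms with the weights w to be estimated, then softmax-normalize across candidate actions. The matrix layout is an assumption for illustration.

```python
import numpy as np

def normalized_utility(w, U_terms):
    """w: (n,) weights w_1..w_n to estimate; U_terms: (k, n) utility values
    U_1..U_n evaluated for k candidate actions. Returns probabilities."""
    u_agent = U_terms @ w                  # U_agent = w_1*U_1 + ... + w_n*U_n
    z = np.exp(u_agent - u_agent.max())    # numerically stable softmax
    return z / z.sum()
```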
In one embodiment, the step of performing parameter estimation on the normalized personal utility function by using the maximum entropy inverse reinforcement learning method to obtain the behavior prediction model includes:
Assuming that there is a potential probability distribution under which the expert trajectories are generated, the known condition is the feature-matching constraint:

$$\sum_{j} w_j f_j = \tilde{f}$$

where f_j represents the feature expectation (here, the expected utility value each product brings to the customer, i.e. a term of the to-be-estimated personal utility function U_agent); \tilde{f} is the expert feature expectation (the weighted utility value of the various products to the customer); and w_j is the probability of each product being selected (i.e. the weights w_1, w_2, w_3, ..., w_n of the to-be-estimated personal utility function U_agent). The problem is converted into standard form, and solving for maximum entropy becomes the optimization problem:

$$\max_{w} \; -\sum_{j} w_j \log w_j \qquad \text{s.t.} \quad \sum_{j} w_j = 1, \quad \sum_{j} w_j f_j = \tilde{f}$$

where -\sum w \log w represents the entropy of the random variable; max denotes taking the maximum value; and the expressions following s.t. are the constraints of the calculation. By the Lagrangian multiplier method:

$$L(w, \lambda) = -\sum_{j} w_j \log w_j + \lambda_0 \Big(\sum_{j} w_j - 1\Big) + \lambda_1 \Big(\sum_{j} w_j f_j - \tilde{f}\Big)$$

After solving, differentiating with respect to the probability w yields the maximum entropy probability:

$$w_j = \frac{\exp(\lambda_j f_j)}{\sum_{k} \exp(\lambda_k f_k)}$$

where exp() is the exponential function with the natural constant e as its base; the parameter \lambda_j corresponds to the Lagrangian multiplier and can be solved by the maximum likelihood method; and f_j refers to the expected utility value that each product j brings to the customer.
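A sketch of the maximum likelihood solution of λ under the reconstructed form w_j = exp(λ_j f_j) / Z: simple gradient ascent on the log-likelihood of observed expert selections. The expert selection frequencies are assumed inputs derived from the typical users' trajectories.

```python
import numpy as np

def fit_lambda(f, w_expert, lr=0.5, iters=5000):
    """f: (k,) expected utility value f_j of each product; w_expert: (k,)
    observed selection frequencies in the typical users' trajectories."""
    f = np.asarray(f, dtype=float)
    w_expert = np.asarray(w_expert, dtype=float)
    lam = np.zeros_like(f)
    for _ in range(iters):
        logits = lam * f
        w = np.exp(logits - logits.max())   # max-entropy probabilities
        w /= w.sum()
        # log-likelihood gradient: d/d(lambda_j) = (w_expert_j - w_j) * f_j
        lam += lr * (w_expert - w) * f
    return lam, w
```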
In one embodiment, the step of determining the portrait of the target user based on the behavior prediction data comprises:
S61: comparing the behavior prediction data with a preset threshold value, and taking the comparison result as the prediction result;
determining that the prediction result corresponding to the product identifier is purchase when the behavior prediction data is higher than the preset threshold value, and otherwise determining that the prediction result corresponding to the product identifier is not purchase;
S62: combining the prediction results corresponding to the product identifiers into a vector as the portrait of the target user.
For S61, the preset threshold may be chosen from 0.5, 0.55, 0.6, 0.65, 0.7, 0.75 and 0.8, which is not specifically limited herein. Compared with a low preset threshold, a high preset threshold yields prediction results with higher accuracy but narrower coverage: narrowing the coverage means that some users with a purchase intention are predicted as not purchasing.
In S62, all the prediction results corresponding to the product identifiers may be combined into a vector, and the resulting vector used as the portrait of the target user.
It can be understood that all the prediction results corresponding to the product identifiers may instead be combined into a set, and the resulting set used as the portrait of the target user.
Referring to fig. 2, the present application further provides a user portrait generating device, where the device includes:
the data acquisition module 100 is configured to acquire a state feature time sequence and a purchasing behavior time sequence of a target user, where the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
The model obtaining module 200 is configured to find a behavior prediction model corresponding to the product identifier from a preset model library, where the behavior prediction model is a model obtained based on a markov decision process and maximum likelihood inverse reinforcement learning;
the prediction module 300 is configured to input the state feature time sequence and the purchasing behavior time sequence to the behavior prediction model corresponding to the product identifier to perform probability prediction to obtain behavior prediction data of the target user;
and a portrayal module 400 for determining a portrayal of the target user based on the behavior prediction data.
According to this embodiment, the state characteristic time sequence and the purchasing behavior time sequence of the target user are acquired, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
In one embodiment, the apparatus comprises: a model training module;
the model training module is used for acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users; determining a set of utility functions for the sample data based on a markov decision process; and carrying out maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier.
In one embodiment, the model training module comprises: a sample acquisition sub-module;
the sample obtaining submodule is used for obtaining historical data of a plurality of typical users, the historical data comprising: state characteristic data of the typical user and purchasing behavior data of the typical user, wherein the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user; carrying out time sequence construction on the state characteristic data of the typical user to obtain sample data of the state characteristic time sequence of the typical user; and carrying out time sequence construction on the purchasing behavior data of the typical user according to the product identifier to obtain sample data of the purchasing behavior time sequence of the typical user.
In one embodiment, the sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of a typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user;
the model training module further comprises: determining a sub-module by a utility function;
the utility function determining submodule is used for obtaining a maximum value behavior calculation formula obtained by determining the state characteristic time sequence and the purchasing behavior time sequence of the typical user; carrying out iteration on the maximum value behavior calculation formula by adopting a dynamic programming method to carry out optimization solution to obtain a target maximum value behavior calculation formula; and extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set.
In one embodiment, the model training module further comprises: a maximum likelihood reverse reinforcement learning sub-module;
the maximum likelihood inverse reinforcement learning sub-module is used for carrying out linear superposition on utility functions in the utility function set to obtain a to-be-estimated personal utility function; normalizing the to-be-estimated personal utility function by adopting a softmax function to obtain a normalized personal utility function; and carrying out parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
In one embodiment, the maximum likelihood inverse reinforcement learning submodule includes: a parameter estimation unit;
the parameter estimation unit is configured to assume that there is a potential probability distribution, under which an expert trajectory is generated, where the known conditions are:
where f represents the characteristic expectation (here, the expected utility value of each product to the customer, that is, the personal utility function U to be estimated agent ),Is expert in nature expectations (weighted utility values of various products to clients), the probability of each product being selected (i.e. the personal utility function U to be estimated agent W of (3) 1 ,w 2 ,w 3 ,……w n ) The problem is converted into a standard type, and the problem becomes an optimal problem when solving the maximum entropy:
s.t.∑w=1
wherein plogp represents the entropy of a random variable;is to find the maximum value; s.t. followed by calculationIs a limitation of (2);
by lagrangian multiplier method:
after solving, carrying out differential calculation on the probability w to obtain the maximum entropy probability as follows:
wherein exp () is an exponential function based on a natural constant e in higher mathematics; parameter lambda j Corresponding to the Lagrangian multiplier, the parameter can be solved by using a maximum likelihood method; f (f) j Refers to the expected utility value brought to customers by each j products.
In one embodiment, the portrait module 400 includes: a prediction result determining submodule and a portrait determining submodule;
The prediction result determining submodule is used for comparing the behavior prediction data with a preset threshold value and taking the comparison result as the prediction result;
the portrait determining submodule is used for combining the prediction results corresponding to the product identifiers into a vector as the portrait of the target user.
Referring to FIG. 3, a computer device is further provided in an embodiment of the present application; the computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface and a database connected by a system bus, wherein the processor is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data such as those of the user portrait generation method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the user portrait generation method, which comprises: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user; searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning; inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
According to this embodiment, the state characteristic time sequence and the purchasing behavior time sequence of the target user are acquired, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
An embodiment of the present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the user portrait generation method, comprising the steps of: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user; searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning; inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
According to the executed user portrait generation method, the state characteristic time sequence and the purchasing behavior time sequence of the target user are acquired, thereby describing the user's life stage, life state and consumption scene, which facilitates constructing a multi-view user portrait and meets the user portrait requirements of complex scenes. Because a different purchasing behavior time sequence is adopted for each product, each behavior prediction model corresponds to one product, which refines the granularity of the user portrait. Because the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior across life stages, life states and consumption scenes, improving the accuracy of the user portrait, while the maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-volatile computer-readable storage medium which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, apparatus, article, or method that comprises the element.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit its scope; any equivalent structure or equivalent process made using the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, falls within the scope of the claims of the present application.
Claims (6)
1. A user portrait generation method, the method comprising:
acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
searching a preset model library for a behavior prediction model corresponding to the product identifier, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to carry out probability prediction to obtain behavior prediction data of the target user; determining the portrait of the target user according to the behavior prediction data;
before the step of searching the behavior prediction model corresponding to the product identifier from the preset model library, the method further comprises the following steps:
acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
determining a utility function set of the sample data based on a Markov decision process;
performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier;
the sample data includes: a state characteristic time sequence and a purchasing behavior time sequence of a typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user; the step of determining a utility function set of the sample data based on a Markov decision process includes:
acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user;
iteratively performing optimization solution on the maximum value behavior calculation formula by a dynamic programming method to obtain a target maximum value behavior calculation formula;
extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set;
the state characteristic time sequence is expressed as {x_1, x_2, x_3, …, x_n}, wherein each state feature vector in {x_1, x_2, x_3, …, x_n} comprises 6 vector elements which respectively represent data generation time, personal information, financial condition, purchased product information, loan records and information browsing records;
the purchasing behavior time sequence is expressed as {a_1, a_2, a_3, …, a_n}, wherein {a_1, a_2, a_3, …, a_n} describes the purchasing behavior for one and the same product; a_i takes the value 0 to indicate that the product was purchased and the value 1 to indicate that the product was not purchased;
the maximum value behavior calculation formula A is expressed as:

A = argmax_{p(a|x)} E[ Σ_t β^t · U(x_t, a_t) ]

wherein p(a|x) is the probability of taking behavior a in state x, U(x, a) is the utility function, and x is a value in the state characteristic time sequence of the typical user;
the maximum value behavior calculation formula is optimally solved by iterating the Bellman equation V with the dynamic programming method:

V(x_t) = max_{a_t} { U(x_t, a_t) + β · E[ V(x_{t+1}) ] }

wherein V(x_t) represents the expectation of the utility function U starting from state x_t; U(x_t, a_t) is the utility function value for state x_t and behavior a_t; β is an attenuation factor taking a value between 0 and 1; x is a value in the state characteristic time sequence of the typical user, and a is a value in the purchasing behavior time sequence of the typical user;
the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model comprises the following steps:
performing linear superposition on the utility functions in the utility function set to obtain a personal utility function to be estimated;
normalizing the personal utility function to be estimated by a softmax function to obtain a normalized personal utility function;
and carrying out parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
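The dynamic-programming step recited in claim 1 iterates a Bellman equation until the value function converges. The sketch below is a minimal value-iteration routine under simplifying assumptions (a finite state set, known transition probabilities `P`, and a tabulated utility `U`); all identifiers are illustrative, not the patented implementation.

```python
from typing import Dict, Iterable, Tuple

def value_iteration(states: Iterable[str], actions: Iterable[int],
                    U: Dict[Tuple[str, int], float],
                    P: Dict[Tuple[str, int, str], float],
                    beta: float = 0.9, tol: float = 1e-6) -> Dict[str, float]:
    """Iterate V(x) = max_a [ U(x, a) + beta * sum_x' P(x'|x, a) * V(x') ]."""
    states, actions = list(states), list(actions)
    V = {x: 0.0 for x in states}
    while True:
        delta = 0.0
        for x in states:
            best = max(U[(x, a)] + beta * sum(P.get((x, a, x2), 0.0) * V[x2]
                                              for x2 in states)
                       for a in actions)
            delta = max(delta, abs(best - V[x]))
            V[x] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return V
```

Here `beta` plays the role of the attenuation factor β; in the purchasing setting, `actions` would be {0, 1} (purchased / not purchased) and `states` a discretization of the state feature vectors.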
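The maximum likelihood inverse reinforcement learning step superposes basis utility functions linearly, softmax-normalizes the result into a policy p(a|x), and estimates the superposition weights from observed trajectories. Below is a minimal gradient-ascent sketch under those assumptions; the basis functions and all names are hypothetical, and a full maximum entropy IRL implementation would additionally account for the state dynamics.

```python
import math
from typing import Callable, List, Sequence, Tuple

def policy(weights: List[float],
           basis: List[Callable[[str, int], float]],
           x: str, actions: Sequence[int]) -> List[float]:
    """Softmax over the superposed utility U(x, a) = sum_k w_k * f_k(x, a)."""
    utils = [sum(w * f(x, a) for w, f in zip(weights, basis)) for a in actions]
    m = max(utils)                          # shift for numerical stability
    exps = [math.exp(u - m) for u in utils]
    z = sum(exps)
    return [e / z for e in exps]

def fit_weights(trajectories: List[List[Tuple[str, int]]],
                basis: List[Callable[[str, int], float]],
                actions: Sequence[int] = (0, 1),
                lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Maximize the likelihood of observed (state, behavior) pairs."""
    w = [0.0] * len(basis)
    for _ in range(epochs):
        grad = [0.0] * len(basis)
        for traj in trajectories:
            for x, a in traj:
                p = policy(w, basis, x, actions)
                for k, f in enumerate(basis):
                    # d log p(a|x) / d w_k = f_k(x, a) - E_p[f_k(x, .)]
                    expected = sum(p[i] * f(x, ai)
                                   for i, ai in enumerate(actions))
                    grad[k] += f(x, a) - expected
        w = [wk + lr * g for wk, g in zip(w, grad)]
    return w
```

The gradient f_k(x, a) − E_p[f_k] is the standard likelihood gradient of a softmax (Boltzmann) policy, which is what makes the learning autonomous: no hand-labeled reward signal is required beyond the observed purchasing behavior.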
2. The user portrait generation method according to claim 1, wherein the acquiring sample data of a plurality of typical users includes:
acquiring historical data of a plurality of typical users, the historical data comprising: state characteristic data of the typical users and purchasing behavior data of the typical users, wherein the purchasing behavior data of a typical user carries a product identifier of a product purchased by the typical user;
carrying out time sequence construction on the state characteristic data of the typical user to obtain sample data of the state characteristic time sequence of the typical user;
and constructing the time sequence of the typical user purchasing behavior data according to the product identifier to obtain sample data of the typical user purchasing behavior time sequence.
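Claim 2's time sequence construction amounts to sorting historical records by time and splitting them into a shared state feature sequence and per-product purchasing behavior sequences. A minimal sketch; the record field names (`timestamp`, `state_features`, `product_id`, `purchase_flag`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_time_series(history: List[dict]) -> Tuple[list, Dict[str, List[int]]]:
    """Sort historical records by time, then split state features from
    per-product purchasing behavior (keyed by product identifier)."""
    records = sorted(history, key=lambda r: r["timestamp"])
    state_series = [r["state_features"] for r in records]   # x_1 ... x_n
    behavior_series: Dict[str, List[int]] = defaultdict(list)
    for r in records:
        # 0 = purchased, 1 = not purchased, following the claim's convention
        behavior_series[r["product_id"]].append(r["purchase_flag"])
    return state_series, dict(behavior_series)
```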
3. The user portrait generation method according to claim 1, wherein the step of determining the portrait of the target user according to the behavior prediction data comprises:
comparing the behavior prediction data with a preset threshold value, and taking the comparison result as a prediction result;
and combining the prediction results corresponding to the product identifiers into a vector to serve as the portrait of the target user.
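Claim 3 reduces the behavior prediction data to a portrait by thresholding each product's predicted probability and concatenating the results into a vector. A minimal sketch, with the 0.5 default as an assumed placeholder for the preset threshold:

```python
from typing import Dict, List

def portrait_from_predictions(predictions: Dict[str, float],
                              threshold: float = 0.5) -> List[int]:
    """Compare each product's predicted probability against a preset
    threshold and combine the comparison results into a portrait vector."""
    # Fixed product order keeps vector components comparable across users.
    return [1 if predictions[pid] >= threshold else 0
            for pid in sorted(predictions)]
```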
4. A user portrait generation apparatus, the apparatus comprising:
the data acquisition module is used for acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
the model acquisition module is used for searching a preset model library for a behavior prediction model corresponding to the product identifier, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
the prediction module is used for inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction, so as to obtain behavior prediction data of the target user;
the portrait module is used for determining the portrait of the target user according to the behavior prediction data;
the model training module is used for acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users; determining a utility function set of the sample data based on a Markov decision process; and performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identifier;
the utility function determination submodule is used for acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user; iteratively performing optimization solution on the maximum value behavior calculation formula by a dynamic programming method to obtain a target maximum value behavior calculation formula; and extracting utility functions from the target maximum value behavior calculation formula and combining the extracted utility functions into the utility function set;
the state characteristic time sequence is expressed as {x_1, x_2, x_3, …, x_n}, wherein each state feature vector in {x_1, x_2, x_3, …, x_n} comprises 6 vector elements which respectively represent data generation time, personal information, financial condition, purchased product information, loan records and information browsing records;
the purchasing behavior time sequence is expressed as {a_1, a_2, a_3, …, a_n}, wherein {a_1, a_2, a_3, …, a_n} describes the purchasing behavior for one and the same product; a_i takes the value 0 to indicate that the product was purchased and the value 1 to indicate that the product was not purchased;
the maximum value behavior calculation formula A is expressed as:

A = argmax_{p(a|x)} E[ Σ_t β^t · U(x_t, a_t) ]

wherein p(a|x) is the probability of taking behavior a in state x, U(x, a) is the utility function, and x is a value in the state characteristic time sequence of the typical user;
the maximum value behavior calculation formula is optimally solved by iterating the Bellman equation V with the dynamic programming method:

V(x_t) = max_{a_t} { U(x_t, a_t) + β · E[ V(x_{t+1}) ] }

wherein V(x_t) represents the expectation of the utility function U starting from state x_t; U(x_t, a_t) is the utility function value for state x_t and behavior a_t; β is an attenuation factor taking a value between 0 and 1; x is a value in the state characteristic time sequence of the typical user, and a is a value in the purchasing behavior time sequence of the typical user;
the maximum likelihood inverse reinforcement learning submodule is used for performing linear superposition on the utility functions in the utility function set to obtain a personal utility function to be estimated; normalizing the personal utility function to be estimated by a softmax function to obtain a normalized personal utility function; and performing parameter estimation on the normalized personal utility function by a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
5. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 3 when the computer program is executed.
6. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 3.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011118110.XA CN112256961B (en) | 2020-10-19 | 2020-10-19 | User portrait generation method, device, equipment and medium |
PCT/CN2020/132601 WO2021189922A1 (en) | 2020-10-19 | 2020-11-30 | Method and apparatus for generating user portrait, and device and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011118110.XA CN112256961B (en) | 2020-10-19 | 2020-10-19 | User portrait generation method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112256961A CN112256961A (en) | 2021-01-22 |
CN112256961B true CN112256961B (en) | 2024-04-09 |
Family
ID=74243980
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011118110.XA Active CN112256961B (en) | 2020-10-19 | 2020-10-19 | User portrait generation method, device, equipment and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112256961B (en) |
WO (1) | WO2021189922A1 (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022190454A (en) * | 2021-06-14 | 2022-12-26 | 富士通株式会社 | Inverse reinforcement learning program, inverse reinforcement learning method, and information processor |
CN113516533A (en) * | 2021-06-24 | 2021-10-19 | 平安科技(深圳)有限公司 | Product recommendation method, device, equipment and medium based on improved BERT model |
CN113592551A (en) * | 2021-07-31 | 2021-11-02 | 广州小鹏汽车科技有限公司 | Method, device and equipment for analyzing and processing behavior data of vehicle purchasing user |
CN113988070B (en) * | 2021-10-09 | 2023-05-05 | 广州快决测信息科技有限公司 | Investigation problem generation method, investigation problem generation device, computer equipment and storage medium |
CN115994259A (en) * | 2021-10-20 | 2023-04-21 | 上海点掌文化科技股份有限公司 | User portrait generation method and device, storage medium and terminal |
CN113963440B (en) * | 2021-10-22 | 2025-03-14 | 北京明略软件系统有限公司 | A method and device for analyzing customer purchase intention |
CN114331512B (en) * | 2021-12-22 | 2023-08-25 | 重庆汇博利农科技有限公司 | Visual data modeling and big data portrayal method |
CN113988727B (en) * | 2021-12-28 | 2022-05-10 | 卡奥斯工业智能研究院(青岛)有限公司 | Resource scheduling method and system |
CN114663132A (en) * | 2022-03-02 | 2022-06-24 | 厦门文杉信息科技有限公司 | A kind of intelligent marketing method and device based on real-time user portrait |
CN115098931B (en) * | 2022-07-20 | 2022-12-16 | 江苏艾佳家居用品有限公司 | Small sample analysis method for mining personalized requirements of indoor design of user |
CN115907106A (en) * | 2022-11-04 | 2023-04-04 | 广东电网有限责任公司 | Micro-grid scheduling method, device, equipment and storage medium based on edge calculation |
CN115952908A (en) * | 2022-12-30 | 2023-04-11 | 北京科大讯飞教育科技有限公司 | Learning path planning method, system, device and storage medium |
CN117271905B (en) * | 2023-11-21 | 2024-02-09 | 杭州小策科技有限公司 | Crowd image-based lateral demand analysis method and system |
CN117710009B (en) * | 2023-11-21 | 2024-10-15 | 中国电子科技集团公司第十五研究所 | Petroleum demand prediction method and system |
CN118350680B (en) * | 2024-06-18 | 2024-09-20 | 国网山东省电力公司营销服务中心(计量中心) | Electricity fee anomaly identification method and system based on dynamic multi-target gravitation search |
CN119379347B (en) * | 2024-08-19 | 2025-05-06 | 南京弘竹泰信息技术有限公司 | User portrait generation cloud platform in enterprise digital operation |
CN118916398B (en) * | 2024-10-09 | 2024-12-27 | 每日互动股份有限公司 | Method, device, medium and equipment for obtaining user portrait prediction model |
CN119313985B (en) * | 2024-12-17 | 2025-05-30 | 上海孚厘科技有限公司 | Portrait marking method and device based on stream data and computer equipment |
CN119918642B (en) * | 2025-04-07 | 2025-06-20 | 合肥工业大学 | Expert portrait characterization and dynamic update method based on reinforcement learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013239065A (en) * | 2012-05-16 | 2013-11-28 | Nippon Telegr & Teleph Corp <Ntt> | Initial purchase estimation device, initial purchase estimation method and initial purchase estimation program |
CN106228314A (en) * | 2016-08-11 | 2016-12-14 | 电子科技大学 | The workflow schedule method of study is strengthened based on the degree of depth |
CN108594638A (en) * | 2018-03-27 | 2018-09-28 | 南京航空航天大学 | The in-orbit reconstructing methods of spacecraft ACS towards the constraint of multitask multi-index optimization |
CN110570279A (en) * | 2019-09-04 | 2019-12-13 | 深圳创新奇智科技有限公司 | Strategic recommendation method and device based on real-time user behavior |
CN111159534A (en) * | 2019-12-03 | 2020-05-15 | 泰康保险集团股份有限公司 | User portrait based aid decision making method and device, equipment and medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7835936B2 (en) * | 2004-06-05 | 2010-11-16 | Sap Ag | System and method for modeling customer response using data observable from customer buying decisions |
CN105761102B (en) * | 2016-02-04 | 2021-05-11 | 杭州朗和科技有限公司 | Method and device for predicting commodity purchasing behavior of user |
KR101813805B1 (en) * | 2016-09-28 | 2017-12-29 | 한양대학교 산학협력단 | Method and Apparatus for purchase probability prediction of user using machine learning |
KR102408476B1 (en) * | 2017-07-10 | 2022-06-14 | 십일번가 주식회사 | Method for predicing purchase probability based on behavior sequence of user and apparatus therefor |
CN107705155A (en) * | 2017-10-11 | 2018-02-16 | 北京三快在线科技有限公司 | A kind of consuming capacity Forecasting Methodology, device, electronic equipment and readable storage medium storing program for executing |
CN108492138B (en) * | 2018-03-19 | 2020-03-24 | 平安科技(深圳)有限公司 | Product purchase prediction method, server and storage medium |
- 2020-10-19: CN application CN202011118110.XA filed; granted as CN112256961B (Active)
- 2020-11-30: PCT application PCT/CN2020/132601 filed as WO2021189922A1 (Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN112256961A (en) | 2021-01-22 |
WO2021189922A1 (en) | 2021-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112256961B (en) | User portrait generation method, device, equipment and medium | |
JP7276757B2 (en) | Systems and methods for model fairness | |
CN109345302B (en) | Machine learning model training method and device, storage medium and computer equipment | |
Burnap et al. | Design and evaluation of product aesthetics: A human-machine hybrid approach | |
CN108764584B (en) | Enterprise electric energy substitution potential evaluation method | |
CN112182384B (en) | Content recommendation method and device based on countermeasure learning and computer equipment | |
CN113536105B (en) | Recommendation model training method and device | |
CN112905876A (en) | Information pushing method and device based on deep learning and computer equipment | |
CN108491511A (en) | Data digging method and device, model training method based on diagram data and device | |
WO2016165058A1 (en) | Social prediction | |
CN113762005B (en) | Feature selection model training and object classification methods, devices, equipment and media | |
Desirena et al. | Maximizing customer lifetime value using stacked neural networks: An insurance industry application | |
CN112380427A (en) | User interest prediction method based on iterative graph attention network and electronic device | |
CN112270571A (en) | A meta-model training method for cold-start advertisement click-through rate prediction model | |
CN118364317A (en) | Sample expansion method, sample expansion device, computer equipment and readable storage medium | |
CN119151608A (en) | Advertisement effect attribution assessment method, apparatus, computer device and storage medium | |
US11775887B2 (en) | Methods and systems for processing data having varied temporal characteristics to generate predictions related to management arrangements using random forest classifiers | |
Dikopoulou et al. | A new approach using mixed graphical model for automatic design of fuzzy cognitive maps from ordinal data | |
Tran et al. | Intervention recommendation for improving disability employment | |
Horvath et al. | Granger causality using neural networks | |
CN117436968A (en) | Method, device, computer equipment and storage medium for recommending products | |
Cuevas et al. | Otsu and Kapur segmentation based on harmony search optimization | |
CN114140848B (en) | Micro expression recognition method, system, equipment and storage medium based on KNN and DSN | |
Hu et al. | Learning mixed multinomial logits with provable guarantees | |
Kutiel et al. | What's behind the mask: Estimating uncertainty in image-to-image problems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||