CN117216553A - Pre-training method, adjustment method, and recommendation method for a recommendation model, and related products

Info

Publication number: CN117216553A
Application number: CN202310993505.1A
Authority: CN
Inventors: 赵鑫; 谢若冰; 孙文奇; 卞书青; 周杰
Original assignee: Tencent Technology Shenzhen Co Ltd; Renmin University of China
Current assignee: Tencent Technology Shenzhen Co Ltd; Renmin University of China
Priority date: 2023-08-08
Filing date: 2023-08-08
Publication date: 2023-12-12

Abstract

This application discloses a pre-training method, an adjustment method, and a recommendation method for a recommendation model, as well as related products. First, a single-source-domain behavior sequence of an object is obtained, together with the multi-modal information corresponding to the content in that sequence; the multi-modal information includes information of at least two different modalities. At the start of pre-training, the multi-modal information is processed to obtain a multi-modal vector representation of the content in a multi-domain universal content representation space. During pre-training, the recommendation model to be trained predicts, based on a behavior vector representation, the first content of the same source domain that the object triggers after triggering the last content of the single-source-domain behavior sequence, and a preliminary recommendation model is finally obtained. Because information of at least two different modalities is used, the modalities can complement one another to build a sufficient training data set, so the model obtained after pre-training can be more robust.

Description

Pre-training method, adjustment method, and recommendation method for a recommendation model, and related products

Technical Field

This application relates to the technical field of multi-domain recommendation, and in particular to a pre-training method, an adjustment method, and a recommendation method for a recommendation model, and to related products.

Background

In the technical field of multi-domain recommendation (MDR), considerable progress has been made in training and building recommendation models. Once built, a recommendation model can be used to make predictions for an object so that content of interest can be recommended to that object. However, most existing training schemes for recommendation models use text information as the bridge across domains and overlook the value of other modalities; image information, for example, is the main source of information about content in many domains. As a result, some recommendation models trained with the related art face training data sets with insufficient modal information, which in turn leaves those models insufficiently robust.

How to improve the robustness of recommendation models has therefore become a technical problem that urgently needs to be solved in this field.

Summary of the Invention

The embodiments of this application provide a pre-training method, an adjustment method, and a recommendation method for a recommendation model, and related products, with the aim of improving the robustness of the recommendation model.

A first aspect of this application provides a pre-training method for a recommendation model, including:

obtaining a single-source-domain behavior sequence of an object, where the single-source-domain behavior sequence includes multiple contents from the same source domain, and the multiple contents are ordered from earliest to latest by the time at which the object triggered them;

obtaining multi-modal information corresponding to the content in the single-source-domain behavior sequence, where the multi-modal information includes information of at least two different modalities;

using the multi-modal information corresponding to the content as input to a recommendation model to be trained, and processing the input multi-modal information with the recommendation model to be trained to obtain a multi-modal vector representation of the content in a multi-domain universal content representation space;

obtaining, according to the order of the content in the single-source-domain behavior sequence and the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space, a behavior vector representation of the object in the source domain to which the single-source-domain behavior sequence belongs;

predicting, by the recommendation model to be trained and based on the behavior vector representation, the first content of the same source domain that the object triggers after triggering the last content of the single-source-domain behavior sequence;

iteratively adjusting the parameters of the recommendation model to be trained according to the difference between the predicted first content of the same source domain and the first content of the same source domain that the object actually triggered after the last content of the single-source-domain behavior sequence, until the adjusted model satisfies a pre-training cutoff condition, at which point pre-training ends and a preliminary recommendation model is obtained.
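The forward pass described by the steps of the first aspect can be sketched as follows. This is a minimal illustration only, not the model architecture of this application: the mean fusion in `fuse_modalities`, the recency-weighted mean in `behavior_vector`, the dot-product scoring, the cross-entropy loss, and the toy catalog are all assumptions introduced here.

```python
import math

def fuse_modalities(text_vec, image_vec):
    """Assumed fusion: element-wise mean of two equal-length modality vectors."""
    return [(t + i) / 2.0 for t, i in zip(text_vec, image_vec)]

def behavior_vector(ordered_item_vecs):
    """Assumed sequence encoder: recency-weighted mean, later items weigh more."""
    weights = [position + 1 for position in range(len(ordered_item_vecs))]
    total = float(sum(weights))
    dim = len(ordered_item_vecs[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, ordered_item_vecs)) / total
        for d in range(dim)
    ]

def next_item_probs(behavior, candidate_vecs):
    """Softmax over dot-product scores between the behavior vector and candidates."""
    scores = [sum(b * c for b, c in zip(behavior, vec)) for vec in candidate_vecs]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

# Hypothetical single-source-domain catalog: item id -> (text vector, image vector).
catalog = {
    "a": ([1.0, 0.0], [1.0, 0.0]),
    "b": ([0.0, 1.0], [0.0, 1.0]),
    "c": ([1.0, 1.0], [1.0, 0.0]),
}
item_vecs = {item: fuse_modalities(t, i) for item, (t, i) in catalog.items()}

sequence = ["b", "a"]         # contents ordered by trigger time, earliest first
actual_next = "c"             # first same-domain content actually triggered afterwards
candidates = ["a", "b", "c"]

behavior = behavior_vector([item_vecs[item] for item in sequence])
probs = next_item_probs(behavior, [item_vecs[item] for item in candidates])
predicted = candidates[probs.index(max(probs))]
# Cross-entropy between the prediction and the content actually triggered.
loss = -math.log(probs[candidates.index(actual_next)])
```

The loss value plays the role of the "difference" between the predicted and the actually triggered content; a real implementation would back-propagate it through learned encoders rather than through the fixed functions used here.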

A second aspect of this application provides a model adjustment method for adjusting the preliminary recommendation model obtained by the pre-training of the first aspect, so as to migrate the preliminary recommendation model from the source domain to a target domain, including:

obtaining a multi-domain mixed-flow behavior sequence of a target object, where the multi-domain mixed-flow behavior sequence includes multiple contents from multiple domains, the multiple contents are ordered from earliest to latest by the time at which the target object triggered them, and the multiple domains involved in the sequence include the target domain;

obtaining, through the preliminary recommendation model and based on the multi-modal information corresponding to each content in the multi-domain mixed-flow behavior sequence, a multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space;

obtaining, according to the order of the contents in the multi-domain mixed-flow behavior sequence and the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space, a multi-domain mixed-flow behavior vector representation of the target object corresponding to the multi-domain mixed-flow behavior sequence;

predicting, by the preliminary recommendation model and based on the multi-domain mixed-flow behavior vector representation, the first content of the target domain that the target object triggers after triggering the last content of the multi-domain mixed-flow behavior sequence;

iteratively adjusting the parameters of the preliminary recommendation model according to the difference between the predicted content of the target domain and the first content of the target domain that the target object actually triggered after the last content of the multi-domain mixed-flow behavior sequence, until the adjusted model satisfies a preset fine-tuning cutoff condition, at which point the adjustment ends and a target recommendation model is obtained.
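How one fine-tuning sample might be assembled from a raw interaction log is sketched below. The `Event` record, the `build_finetune_sample` helper, the `cut_time` split point, and the domain names are hypothetical and introduced only for illustration; the text above specifies only that the sequence mixes several domains in time order, includes the target domain, and is paired with the first target-domain content triggered after its last content.

```python
from typing import List, NamedTuple, Optional, Tuple

class Event(NamedTuple):
    timestamp: int   # time at which the target object triggered the content
    domain: str      # domain the content belongs to
    item_id: str     # content identifier

def build_finetune_sample(
    events: List[Event], target_domain: str, cut_time: int
) -> Optional[Tuple[List[Event], Event]]:
    """Return (mixed-flow sequence, label) or None if no valid sample exists.

    The sequence holds every event up to cut_time, from all domains, earliest
    first. The label is the first target-domain event after cut_time.
    """
    ordered = sorted(events, key=lambda e: e.timestamp)
    history = [e for e in ordered if e.timestamp <= cut_time]
    future = [e for e in ordered if e.timestamp > cut_time]
    # The mixed-flow sequence must involve the target domain.
    if not any(e.domain == target_domain for e in history):
        return None
    label = next((e for e in future if e.domain == target_domain), None)
    if label is None:
        return None
    return history, label

events = [
    Event(3, "video", "v1"),
    Event(1, "book", "b1"),
    Event(2, "art", "a1"),
    Event(4, "video", "v2"),
    Event(5, "art", "a2"),
    Event(6, "book", "b2"),
]
sample = build_finetune_sample(events, target_domain="art", cut_time=3)
history, label = sample
```

Note that the label skips the intervening `v2` event: the prediction target is the first content of the target domain, not simply the next content in the mixed stream.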

A third aspect of this application provides a recommendation method for making recommendations with the target recommendation model obtained in the second aspect, including:

obtaining a historical behavior sequence of an object to be recommended to, where the historical behavior sequence includes at least content belonging to the target domain, and the contents of the historical behavior sequence are ordered from earliest to latest by the time at which the object triggered them;

obtaining, through the target recommendation model and based on the multi-modal information corresponding to each content in the historical behavior sequence, a multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space;

obtaining, according to the order of the contents in the historical behavior sequence and the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space, a historical behavior vector representation of the object corresponding to the historical behavior sequence;

predicting, by the target recommendation model and based on the historical behavior vector representation, the first content of the target domain that the object triggers after triggering the last content of the historical behavior sequence;

recommending to the object the first content of the target domain predicted by the target recommendation model.
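The inference flow of the third aspect can be illustrated with a small sketch. It assumes the multi-modal vector of every content has already been computed, and it stands in for the target recommendation model with a recency-weighted mean and dot-product scoring; the function `recommend_next`, the item names, and the vectors are all hypothetical.

```python
def recommend_next(history, item_vectors, target_candidates):
    """Pick the target-domain candidate that best matches the behavior vector.

    history: list of (timestamp, item_id) pairs in any order.
    item_vectors: item_id -> precomputed multi-modal vector (assumed given).
    target_candidates: item ids belonging to the target domain.
    """
    # Order the history by trigger time, earliest first.
    ordered_ids = [item_id for _, item_id in sorted(history, key=lambda e: e[0])]
    vecs = [item_vectors[item_id] for item_id in ordered_ids]

    # Assumed sequence encoder: recency-weighted mean of the content vectors.
    weights = [position + 1 for position in range(len(vecs))]
    total = float(sum(weights))
    dim = len(vecs[0])
    behavior = [
        sum(w * vec[d] for w, vec in zip(weights, vecs)) / total for d in range(dim)
    ]

    def score(candidate):
        return sum(b * c for b, c in zip(behavior, item_vectors[candidate]))

    return max(target_candidates, key=score)

item_vectors = {
    "song1": [1.0, 0.0],
    "film1": [0.0, 1.0],
    "film2": [0.9, 0.2],
    "film3": [0.1, 0.9],
}
# The history contains target-domain ("film") content, as the method requires.
history = [(2, "song1"), (1, "film1")]
recommended = recommend_next(history, item_vectors, ["film2", "film3"])
```

Here the most recent interaction (`song1`) dominates the behavior vector, so the candidate closest to it in the shared representation space is returned even though it comes from a different domain than `song1`.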

A fourth aspect of this application provides a pre-training apparatus for a recommendation model, including:

a behavior sequence acquisition module, configured to obtain a single-source-domain behavior sequence of an object, where the single-source-domain behavior sequence includes multiple contents from the same source domain, and the multiple contents are ordered from earliest to latest by the time at which the object triggered them;

a multi-modal information acquisition module, configured to obtain multi-modal information corresponding to the content in the single-source-domain behavior sequence, where the multi-modal information includes information of at least two different modalities;

an information input determination module, configured to use the multi-modal information corresponding to the content as input to a recommendation model to be trained, and to process the input multi-modal information with the recommendation model to be trained to obtain a multi-modal vector representation of the content in a multi-domain universal content representation space;

a behavior representation construction module, configured to obtain, according to the order of the content in the single-source-domain behavior sequence and the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space, a behavior vector representation of the object in the source domain to which the single-source-domain behavior sequence belongs;

a same-source-domain content prediction module, configured to predict, by the recommendation model to be trained and based on the behavior vector representation, the first content of the same source domain that the object triggers after triggering the last content of the single-source-domain behavior sequence;

a preliminary recommendation model obtaining module, configured to iteratively adjust the parameters of the recommendation model to be trained according to the difference between the predicted first content of the same source domain and the first content of the same source domain that the object actually triggered after the last content of the single-source-domain behavior sequence, until the adjusted model satisfies a pre-training cutoff condition, at which point pre-training ends and a preliminary recommendation model is obtained.

A fifth aspect of this application provides a model adjustment apparatus, including:

a mixed-flow behavior sequence acquisition module, configured to obtain a multi-domain mixed-flow behavior sequence of a target object, where the multi-domain mixed-flow behavior sequence includes multiple contents from multiple domains, the multiple contents are ordered from earliest to latest by the time at which the target object triggered them, and the multiple domains involved in the sequence include the target domain;

a multi-modal vector representation obtaining module, configured to obtain, through the preliminary recommendation model and based on the multi-modal information corresponding to each content in the multi-domain mixed-flow behavior sequence, a multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space;

a mixed-flow behavior vector representation obtaining module, configured to obtain, according to the order of the contents in the multi-domain mixed-flow behavior sequence and the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space, a multi-domain mixed-flow behavior vector representation of the target object corresponding to the multi-domain mixed-flow behavior sequence;

a target-domain content prediction module, configured to predict, by the preliminary recommendation model and based on the multi-domain mixed-flow behavior vector representation, the first content of the target domain that the target object triggers after triggering the last content of the multi-domain mixed-flow behavior sequence;

a target recommendation model obtaining module, configured to iteratively adjust the parameters of the preliminary recommendation model according to the difference between the predicted content of the target domain and the first content of the target domain that the target object actually triggered after the last content of the multi-domain mixed-flow behavior sequence, until the adjusted model satisfies a preset fine-tuning cutoff condition, at which point the adjustment ends and a target recommendation model is obtained.

A sixth aspect of this application provides a recommendation apparatus, including:

a historical behavior sequence acquisition module, configured to obtain a historical behavior sequence of an object to be recommended to, where the historical behavior sequence includes at least content belonging to the target domain, and the contents of the historical behavior sequence are ordered from earliest to latest by the time at which the object triggered them;

a historical multi-modal vector representation obtaining module, configured to obtain, through the target recommendation model and based on the multi-modal information corresponding to each content in the historical behavior sequence, a multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space;

a historical behavior vector representation obtaining module, configured to obtain, according to the order of the contents in the historical behavior sequence and the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space, a historical behavior vector representation of the object corresponding to the historical behavior sequence;

a historical target-domain content prediction module, configured to predict, by the target recommendation model and based on the historical behavior vector representation, the first content of the target domain that the object triggers after triggering the last content of the historical behavior sequence;

a target-domain content recommendation module, configured to recommend to the object the first content of the target domain predicted by the target recommendation model.

A seventh aspect of this application provides a computer device, the device including a processor and a memory:

the memory is configured to store a computer program and to transmit the computer program to the processor;

the processor is configured to execute, according to instructions in the computer program, the steps of the pre-training method for a recommendation model provided in the first aspect, the steps of the model adjustment method provided in the second aspect, or the steps of the recommendation method provided in the third aspect.

An eighth aspect of this application provides a computer-readable storage medium configured to store a computer program which, when executed by a computer device, implements the steps of the pre-training method for a recommendation model provided in the first aspect, the steps of the model adjustment method provided in the second aspect, or the steps of the recommendation method provided in the third aspect.

A ninth aspect of this application provides a computer program product including a computer program which, when executed by a computer device, implements the steps of the pre-training method for a recommendation model provided in the first aspect, the steps of the model adjustment method provided in the second aspect, or the steps of the recommendation method provided in the third aspect.

As can be seen from the above technical solutions, the embodiments of this application have the following advantages:

In the technical solution of this application, a single-source-domain behavior sequence of an object is first obtained, together with the multi-modal information corresponding to the content in that sequence, where the multi-modal information includes information of at least two different modalities. At the start of pre-training, the multi-modal information corresponding to the content is used as input to the recommendation model to be trained, which processes it to obtain a multi-modal vector representation of the content in a multi-domain universal content representation space. During pre-training, a behavior vector representation of the object in the source domain to which the sequence belongs is obtained from the order of the content in the single-source-domain behavior sequence and from the multi-modal vector representation of each content in the multi-domain universal content representation space. Based on this behavior vector representation, the recommendation model to be trained predicts the first content of the same source domain that the object triggers after triggering the last content of the sequence. Finally, the parameters of the recommendation model to be trained are adjusted iteratively according to the difference between the predicted content and the first content of the same source domain that the object actually triggered after the last content of the sequence, until the adjusted model satisfies the pre-training cutoff condition; pre-training then ends and a preliminary recommendation model is obtained. Because information of at least two different modalities is used, the modalities can complement one another to build a sufficient training data set, so the model obtained after pre-training can be more robust. In addition, the content and behavior-sequence representations learned through pre-training can be more general and robust, so the preliminary recommendation model can be migrated effectively and then fine-tuned, which helps the model reach its best prediction and recommendation performance.
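The phrase "iteratively adjust the parameters until the pre-training cutoff condition is satisfied" can be made concrete with a toy training loop. The sketch below is an assumption-laden illustration: it trains a plain softmax scorer by gradient descent and treats "mean loss below a threshold, or a maximum number of passes reached" as the cutoff condition, whereas the text above does not specify the model, the optimizer, or the cutoff condition. The names `pretrain`, `loss_threshold`, and `max_steps` are hypothetical.

```python
import math

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def pretrain(samples, num_items, dim, lr=0.5, loss_threshold=0.1, max_steps=1000):
    """Fit one weight row per candidate item with per-sample gradient descent.

    samples: list of (behavior_vector, index_of_actually_triggered_item).
    Stops when the mean cross-entropy of a pass drops below loss_threshold
    or after max_steps passes (the assumed cutoff condition).
    """
    weights = [[0.0] * dim for _ in range(num_items)]
    mean_loss = float("inf")
    steps = 0
    while steps < max_steps and mean_loss >= loss_threshold:
        steps += 1
        total_loss = 0.0
        for behavior, actual in samples:
            logits = [sum(w * b for w, b in zip(row, behavior)) for row in weights]
            probs = softmax(logits)
            total_loss += -math.log(probs[actual])
            # Gradient of cross-entropy w.r.t. each row: (p_item - y_item) * behavior.
            for item in range(num_items):
                error = probs[item] - (1.0 if item == actual else 0.0)
                for d in range(dim):
                    weights[item][d] -= lr * error * behavior[d]
        mean_loss = total_loss / len(samples)
    return weights, mean_loss, steps

# Two toy behavior vectors, each followed by a different item.
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
weights, final_loss, steps_used = pretrain(samples, num_items=2, dim=2)
```

The loss threshold and the pass limit are interchangeable with other stopping rules, such as a plateau in validation accuracy; the choice here is only for brevity.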

Brief Description of the Drawings

Figure 1 is a schematic diagram of multi-domain modal information in the related art, provided by an embodiment of this application;

Figure 2 is a scenario architecture diagram for the pre-training method, adjustment method, and recommendation method for a recommendation model provided in an embodiment of this application;

Figure 3 is a flowchart of a pre-training method for a recommendation model in a practical application scenario, provided by an embodiment of this application;

Figure 4 is a flowchart of a pre-training method for a recommendation model provided by an embodiment of this application;

Figure 5 is a schematic structural diagram of a recommendation model provided by an embodiment of this application;

Figure 6 is a schematic structural diagram of the multi-modal content representation constructor of a recommendation model provided by an embodiment of this application;

Figure 7 is a schematic structural diagram of the multi-domain mapper of a recommendation model provided by an embodiment of this application;

Figure 8 is a schematic diagram of the processing performed by a recommendation model provided by an embodiment of this application;

Figure 9 is a flowchart of a model adjustment method provided by an embodiment of this application;

Figure 10 is a schematic diagram of the processing performed during model adjustment, provided by an embodiment of this application;

Figure 11 is a schematic diagram of the adjustment performed during model adjustment, provided by an embodiment of this application;

Figure 12 is a flowchart of a recommendation method provided by an embodiment of this application;

Figure 13 is a chart comparing, in a practical application, the gains of the recommendation model of this solution with those of related-art models, provided by an embodiment of this application;

Figure 14 is a schematic structural diagram of the pre-training apparatus for a recommendation model provided by an embodiment of this application;

Figure 15 is a schematic structural diagram of the model adjustment apparatus provided by an embodiment of this application;

Figure 16 is a schematic structural diagram of the recommendation apparatus provided by an embodiment of this application;

Figure 17 is a schematic structural diagram of a server in an embodiment of this application;

Figure 18 is a schematic structural diagram of a terminal device in an embodiment of this application.

Detailed Description

The embodiments of this application are described below with reference to the accompanying drawings.

First, several terms that may be used in the following embodiments of this application are explained.

Multi-modality: in the field of artificial intelligence, this usually refers to the combined use of several kinds of perceptual information, such as images, text, and speech, to help artificial intelligence understand the external world more accurately.

In the technical field of multi-domain recommendation (MDR), considerable progress has been made in training and building recommendation models. Once built, a recommendation model can be used to make predictions for an object so that content of interest can be recommended to that object. However, most existing training schemes for recommendation models use text information as the bridge across domains and overlook the value of other modalities; image information, for example, is the main source of information about content in many domains.

Figure 1 is a schematic diagram of multi-domain modal information in the related art, provided by an embodiment of this application. As Figure 1 shows, in the "books" domain text-modality information is clearly more important than image-modality information. With the rapid development of micro-video and communication technology, however, image information and other visual-modality information has gradually become the main source of information about content in many domains; in the "arts and handicrafts" domain, for example, image-modality information is more important than text-modality information. If model pre-training relies on a single modality alone (that is, text), some recommendation models trained with the related art will inevitably face training data sets with insufficient modal information, which in turn leaves those models insufficiently robust. How to improve the robustness of recommendation models has therefore become a technical problem that urgently needs to be solved in this field.
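A small numerical illustration of why a single modality can be insufficient is given below. The two handicraft items, their vectors, and the use of concatenation as the multi-modal representation are hypothetical choices made only for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Two hypothetical "arts and handicrafts" items: same text, different images.
vase = {"text": [1.0, 0.0], "image": [1.0, 0.0]}
painting = {"text": [1.0, 0.0], "image": [0.0, 1.0]}

# Text alone cannot tell the items apart.
text_only_similarity = cosine(vase["text"], painting["text"])

# Concatenating text and image vectors (an assumed fusion) separates them.
multimodal_similarity = cosine(
    vase["text"] + vase["image"], painting["text"] + painting["image"]
)
```

With text alone the two items are indistinguishable (similarity 1.0), while the concatenated representation yields a similarity of 0.5, so a model trained on both modalities receives a signal that a text-only model never sees.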

In view of the above problems, this application provides a pre-training method, an adjustment method, and a recommendation method for a recommendation model, and related products, with the aim of improving the robustness of the recommendation model. In the technical solution provided by this application, a single-source-domain behavior sequence of an object is first obtained, together with the multi-modal information corresponding to the content in that sequence, where the multi-modal information includes information of at least two different modalities. At the start of pre-training, the multi-modal information corresponding to the content is used as input to the recommendation model to be trained, which processes it to obtain a multi-modal vector representation of the content in a multi-domain universal content representation space. During pre-training, a behavior vector representation of the object in the source domain to which the sequence belongs is obtained from the order of the content in the single-source-domain behavior sequence and from the multi-modal vector representation of each content in the multi-domain universal content representation space. Based on this behavior vector representation, the recommendation model to be trained predicts the first content of the same source domain that the object triggers after triggering the last content of the sequence. Finally, the parameters of the recommendation model to be trained are adjusted iteratively according to the difference between the predicted content and the first content of the same source domain that the object actually triggered after the last content of the sequence, until the adjusted model satisfies the pre-training cutoff condition; pre-training then ends and a preliminary recommendation model is obtained. Because information of at least two different modalities is used, the modalities can complement one another to build a sufficient training data set, so the model obtained after pre-training can be more robust. In addition, the content and behavior-sequence representations learned through pre-training can be more general and robust, so the preliminary recommendation model can be migrated effectively and then fine-tuned, which helps the model reach its best prediction and recommendation performance.

Artificial Intelligence (AI) comprises the theories, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

The pre-training method, model adjustment method, and recommendation method provided in this application mainly involve machine learning. Machine Learning (ML) is an interdisciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and how they can reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications span all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

The pre-training method provided by the embodiments of this application may be executed by a terminal device; for example, the single-source-domain behavior sequence of an object is obtained on the terminal device. Likewise, the model adjustment method may be executed by a terminal device; for example, the multi-domain mixed-stream behavior sequence of a target object is obtained on the terminal device. The recommendation method may also be executed by a terminal device; for example, the historical behavior sequence of an object to be recommended to is obtained on the terminal device. As examples, terminal devices may include, but are not limited to, mobile phones, desktop computers, tablet computers, notebook computers, handheld computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, and aircraft. The pre-training method may also be executed by a server; that is, the single-source-domain behavior sequence of the object can be obtained on the server. The model adjustment method may be executed by a server, for example with the multi-domain mixed-stream behavior sequence of the target object obtained on the terminal device. The recommendation method may be executed by a server, for example with the historical behavior sequence of the object to be recommended to obtained on the terminal device. The pre-training method, the model adjustment method, or the recommendation method may also be executed jointly by a terminal device and a server. The embodiments of this application therefore place no limitation on the entity that implements the technical solution.

Figure 2 shows an example scenario architecture for the pre-training method, adjustment method, and recommendation method of a recommendation model. The figure includes a server and terminal devices of several forms. The server shown in Figure 2 may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers. The server may also be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

To facilitate understanding of the technical solutions provided by the embodiments of this application, a pre-training method for a recommendation model is introduced next in the context of a practical application scenario.

Refer to Figure 3, which is a schematic diagram of a pre-training method for a recommendation model in a practical application scenario provided by an embodiment of this application. In this scenario, the processing device is a server 300 with a model training function.

First, the server 300 obtains the single-source-domain behavior sequence of an object and the multi-modal information corresponding to the content in that sequence. The multi-modal information includes information of at least two different modalities (for example, modality-A information and modality-B information). The single-source-domain behavior sequence includes multiple content items from the same source domain (for example, content a and content b, where content a corresponds to modality-A information and content b corresponds to modality-B information), ordered from earliest to latest by the time at which the object triggered them. To make the vector representations corresponding to the multi-modal information more general, the server 300 inputs the multi-modal information (that is, the modality-A and modality-B information) to the recommendation model to be trained, which processes it to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space. The behavior vector representation of the object is then obtained from the ordering of the content in the sequence and the multi-modal vector representation of each content item in that space, so that the server 300 can have the model to be trained predict, from the behavior vector representation, the first same-source-domain content that the object triggers after the last content of the sequence. Finally, the server 300 iteratively adjusts the parameters of the model according to the difference between the predicted content and the first same-source-domain content that the object actually triggered after the last content of the sequence, until the adjusted model satisfies the pre-training cutoff condition; pre-training then ends and the preliminary recommendation model is obtained. As this process shows, the application uses information of at least two different modalities, so that the modalities complement one another and a sufficient training data set can be built, which makes the pre-trained model more robust.

Figure 4 is a flow chart of a pre-training method for a recommendation model provided by an embodiment of this application. The pre-training method shown in Figure 4 includes:

S401: Obtain the single-source-domain behavior sequence of an object.

The single-source-domain behavior sequence includes multiple content items from the same source domain, ordered from earliest to latest by the time at which the object triggered them. For example, if the content items are content a, content b, and content c, where content a was triggered before content b and content b was triggered before content c, the obtained single-source-domain behavior sequence is {content a, content b, content c}. Note also that, because an object can trigger content on different platforms, this application is not limited to obtaining content triggered on a single platform; the platforms may include e-commerce platforms and are not specifically limited here. For example, platform 1 pushes information about content a to the object; when the object clicks that information in platform 1, platform 1 automatically jumps to platform 2, where content a is triggered.
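As an illustration only, the ordering rule of S401 can be sketched in Python as follows. The `Event` record, its field names, and the example domain labels are hypothetical and do not come from the application; the sketch merely filters events to one source domain and orders them by trigger time.

```python
from collections import namedtuple

# Hypothetical event record; the field names are illustrative assumptions.
Event = namedtuple("Event", ["content_id", "source_domain", "trigger_time"])


def build_single_domain_sequence(events, source_domain):
    """Return the content ids of one source domain, earliest-triggered first."""
    same_domain = [e for e in events if e.source_domain == source_domain]
    same_domain.sort(key=lambda e: e.trigger_time)
    return [e.content_id for e in same_domain]


# Events may arrive out of order and may mix several source domains.
events = [
    Event("c", "video", 30),
    Event("a", "video", 10),
    Event("x", "news", 15),
    Event("b", "video", 20),
]
sequence = build_single_domain_sequence(events, "video")
```

Here `sequence` holds contents a, b, and c in trigger order, matching the example in the text; the "news" event is excluded because it belongs to a different source domain.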

S402: Obtain the multi-modal information corresponding to the content in the single-source-domain behavior sequence.

The multi-modal information includes information of at least two different modalities, for example modality-A information and modality-B information. Where the content items are content a, content b, and content c, the multi-modal information corresponding to content a may include only modality-A information, only modality-B information, or both modality-A and modality-B information; the same applies to content b and content c. In practice, the multi-modal information actually associated with a content item depends on the information that can be obtained. In this way the different modalities complement one another, and the content can be represented more comprehensively.

S403: Input the multi-modal information corresponding to the content to the recommendation model to be trained, which processes the input multi-modal information to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space.

The multi-domain universal content representation space is a space that supports processing content from multiple source domains to obtain universal modal vector representations. It is not limited to processing content from a single source domain; it also covers processing content from the target domain, so that during subsequent fine-tuning each content item in the target domain can be processed through this space to obtain a universal multi-modal vector representation. This makes the vector representations corresponding to the multi-modal information more general; that is, every vector representation output through the multi-domain universal content representation space can be recognized in a unified way. In this step, the multi-modal information corresponding to the content is input to the recommendation model to be trained to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space.

S404: Obtain the behavior vector representation of the object in the source domain to which the single-source-domain behavior sequence belongs, based on the ordering of the content in the sequence and the multi-modal vector representation of each content item in the multi-domain universal content representation space.

In this step, the behavior vector representation is the vector representation obtained by arranging the multi-modal vector representations in order. Specifically, the multi-modal vector representation of each content item in the multi-domain universal content representation space is combined with the position of that content item in the single-source-domain behavior sequence to obtain the behavior vector representation of the object in the source domain to which the sequence belongs.

S405: The recommendation model to be trained predicts, based on the behavior vector representation, the first content of the same source domain that the object triggers after triggering the last content of the single-source-domain behavior sequence.

In this step, the first content of the same source domain is content in the same source domain other than the content already in the single-source-domain behavior sequence. For example, the sequence contains content a, content b, and content c, and content d is the content predicted by the model, where content d belongs to the same source domain as content a, content b, and content c. The behavior vector representation obtained from the content in the sequence is thus used to predict content that is not in the sequence but belongs to the same source domain; that is, the model uses content a, content b, and content c to predict whether the next content the object will click is content d.
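The next-content prediction task of S405 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that candidates are scored by the dot product between the behavior vector and each candidate's vector; the application does not state the scoring function, and the function names and toy vectors below are hypothetical.

```python
def split_prefix_and_target(sequence):
    """Split a chronological sequence into (model input, next-content label)."""
    if len(sequence) < 2:
        raise ValueError("need at least two content items")
    return sequence[:-1], sequence[-1]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict_next(behavior_vector, candidate_vectors):
    """Return the candidate id with the highest dot-product score (assumed scoring)."""
    return max(candidate_vectors,
               key=lambda cid: dot(behavior_vector, candidate_vectors[cid]))


# Contents a, b, c form the input; content d is what the object triggered next.
prefix, target = split_prefix_and_target(["a", "b", "c", "d"])

# Toy vectors standing in for representations in the shared content space.
candidates = {"d": [0.9, 0.1], "e": [0.1, 0.9]}
predicted = predict_next([1.0, 0.2], candidates)
```

During pre-training, the difference between `predicted` and `target` is what drives the parameter update in S406.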

S406: Iteratively adjust the parameters of the recommendation model to be trained according to the difference between the predicted first same-source-domain content and the first same-source-domain content that the object actually triggered after the last content of the single-source-domain behavior sequence, until the adjusted model satisfies the pre-training cutoff condition; pre-training then ends and the preliminary recommendation model is obtained.

In this step, the recommendation model to be trained learns the difference between the predicted first same-source-domain content and the first same-source-domain content that the object actually triggered after the last content of the single-source-domain behavior sequence. The parameters of the model are adjusted iteratively according to this difference until the adjusted model satisfies the pre-training cutoff condition, at which point pre-training ends and the preliminary recommendation model is obtained. Because information of at least two different modalities is used, the modalities complement one another and a sufficient training data set can be built, which makes the pre-trained model more robust.

Note that the pre-training cutoff condition includes a first condition and a second condition, where the first condition concerns the prediction loss and the second condition concerns the combined contrastive-learning loss. The first condition is that the prediction loss is less than a first loss threshold, where the prediction loss is obtained from the gap between the predicted first same-source-domain content and the actually triggered first same-source-domain content.

The second condition is that the combined contrastive-learning loss is less than a second loss threshold, where the combined contrastive-learning loss is the loss of a cross-domain sequence-content contrastive learning task and a cross-domain sequence-sequence contrastive learning task. In the cross-domain sequence-content task, the first same-source-domain content actually triggered after the last content of the single-source-domain behavior sequence serves as the positive example, and content of other source domains that appears in the other single-source-domain behavior sequences input to the model in the same batch serves as the negative examples; this task strengthens the fusion and adaptation of the universal representations of different domains. In the cross-domain sequence-sequence task, the data-missing sequence corresponding to the single-source-domain behavior sequence serves as the positive example, and the single-source-domain behavior sequences of other source domains input to the model in the same batch serve as the negative examples. The data-missing sequence is obtained either by randomly discarding content from the single-source-domain behavior sequence or by randomly discarding one or more modalities of information for the content in the sequence. For example, the text-modality information or the image-modality information corresponding to content in the sequence can be discarded.
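The two ways of building a data-missing sequence can be sketched as follows. This is a minimal illustration under stated assumptions: each content item is a dictionary with hypothetical `text` and `image` fields, the drop probability is a free parameter, and the sketch always keeps at least one content item and at least one modality per item, which the application does not require.

```python
import random


def drop_contents(sequence, drop_prob, rng):
    """Randomly discard whole content items; keep at least one (assumption)."""
    kept = [item for item in sequence if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(sequence)]


def drop_modalities(sequence, drop_prob, rng):
    """Randomly blank one modality per item, leaving at least one (assumption)."""
    augmented = []
    for item in sequence:
        item = dict(item)  # copy so the original sequence is not modified
        present = [m for m in ("text", "image") if item.get(m) is not None]
        if len(present) > 1 and rng.random() < drop_prob:
            item[rng.choice(present)] = None
        augmented.append(item)
    return augmented


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sequence = [
    {"id": "a", "text": "title a", "image": "image a"},
    {"id": "b", "text": "title b", "image": "image b"},
    {"id": "c", "text": "title c", "image": "image c"},
]
positive_by_content_drop = drop_contents(sequence, 0.3, rng)
positive_by_modality_drop = drop_modalities(sequence, 0.5, rng)
```

Either augmented sequence can serve as the positive example for the original sequence in the sequence-sequence task.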

In one possible implementation, the combined contrastive-learning loss can be expressed by a loss function $\mathcal{L}$ as follows:

$$\mathcal{L} = \mathcal{L}_{1} + \lambda\,\mathcal{L}_{2}$$

where $\mathcal{L}_{1}$ denotes the first loss function, $\mathcal{L}_{2}$ denotes the second loss function, and $\lambda$ is a hyperparameter that controls the weight of the second loss function $\mathcal{L}_{2}$ in the loss function $\mathcal{L}$. Once the adjusted model satisfies the condition on this loss function, pre-training ends and the preliminary recommendation model is obtained.

Specifically, the first loss function $\mathcal{L}_{1}$ and the second loss function $\mathcal{L}_{2}$ are expressed as follows:

$$\mathcal{L}_{1} = -\sum_{j=1}^{T} \log \frac{\exp\left(s_j \cdot v_j / \tau\right)}{\sum_{j'=1}^{T} \exp\left(s_j \cdot v_{j'} / \tau\right)}, \qquad \mathcal{L}_{2} = -\sum_{j=1}^{T} \log \frac{\exp\left(s_j \cdot \tilde{s}_j / \tau\right)}{\sum_{j'=1}^{T} \exp\left(s_j \cdot s_{j'} / \tau\right)}$$

where the first loss function $\mathcal{L}_{1}$ is the loss of the cross-domain sequence-content contrastive learning task, the second loss function $\mathcal{L}_{2}$ is the loss of the cross-domain sequence-sequence contrastive learning task, $T$ is the total number of single-source-domain behavior sequences input to the model in the same batch, $\tau$ is the temperature hyperparameter, $s_j$ is the single-source-domain behavior sequence for which the prediction is made, $v_j$ is the positive example in the sequence-content task, $v_{j'}$ is a negative example in the sequence-content task, $\tilde{s}_j$ is the positive example in the sequence-sequence task, and $s_{j'}$ is a negative example in the sequence-sequence task.
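To make the roles of T, τ, λ, the positive examples, and the negative examples concrete, the following Python sketch computes a contrastive loss of the common InfoNCE form. The InfoNCE form, the dot-product similarity, and all function names are assumptions made for illustration; the sketch is not presented as the exact formula of the application.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchors, positives, pool, tau):
    """Sum over j of -log(exp(a_j.p_j / tau) / sum_k exp(a_j.pool_k / tau))."""
    total = 0.0
    for anchor, positive in zip(anchors, positives):
        pos_logit = dot(anchor, positive) / tau
        logits = [dot(anchor, other) / tau for other in pool]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - pos_logit
    return total


def combined_loss(first_loss, second_loss, lam):
    """Weighted sum: the second loss is scaled by the hyperparameter lam."""
    return first_loss + lam * second_loss


# Toy batch with T = 2 sequences, their next contents, and their augmented copies.
seq_vectors = [[1.0, 0.0], [0.0, 1.0]]
next_content_vectors = [[1.0, 0.0], [0.0, 1.0]]
augmented_seq_vectors = [[0.9, 0.1], [0.1, 0.9]]

first = info_nce(seq_vectors, next_content_vectors, next_content_vectors, tau=1.0)
second = info_nce(seq_vectors, augmented_seq_vectors, seq_vectors, tau=1.0)
loss = combined_loss(first, second, lam=0.5)
```

In this sketch the pool for the first term is the batch of next-content vectors and the pool for the second term is the batch of sequence vectors, so each sequence is contrasted against the other entries of the same batch.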

The specific structure of the recommendation model to be trained is introduced next. Figure 5 is a schematic structural diagram of a recommendation model provided by an embodiment of this application.

The recommendation model to be trained includes a content representation construction module, a behavior representation construction module, and a prediction module. The input of the behavior representation construction module is connected to the output of the content representation construction module, and its output is connected to the input of the prediction module. The content representation construction module performs the multi-modal vector representation task: it processes the input multi-modal information to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space. The behavior representation construction module obtains the behavior vector representation: based on the ordering of the content in the single-source-domain behavior sequence and the multi-modal vector representation of each content item in the multi-domain universal content representation space, it obtains the behavior vector representation of the object in the source domain to which the sequence belongs. The prediction module performs the content prediction task: based on the behavior vector representation, it predicts the first same-source-domain content that the object triggers after triggering the last content of the sequence, and, when the pre-training cutoff condition is not satisfied, it adjusts the parameters of the behavior representation construction module and/or the content representation construction module through backpropagation.
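The wiring of the three modules can be sketched as follows. The class name and the three toy modules are hypothetical stand-ins chosen only to make the data flow runnable: averaging two modality vectors, averaging content vectors (which ignores order, unlike a real behavior module), and dot-product scoring are illustrative assumptions, not the modules of the application.

```python
class RecommendationModel:
    """Chains the modules in the described order: content -> behavior -> prediction."""

    def __init__(self, content_module, behavior_module, prediction_module):
        self.content_module = content_module
        self.behavior_module = behavior_module
        self.prediction_module = prediction_module

    def predict(self, sequence_info, candidates):
        content_vectors = [self.content_module(info) for info in sequence_info]
        behavior_vector = self.behavior_module(content_vectors)
        return self.prediction_module(behavior_vector, candidates)


def toy_content_module(info):
    # Stand-in: element-wise average of the image and text vectors.
    return [(a + b) / 2 for a, b in zip(info["image"], info["text"])]


def toy_behavior_module(content_vectors):
    # Stand-in: mean of the content vectors; a real module would use the order.
    count = len(content_vectors)
    return [sum(column) / count for column in zip(*content_vectors)]


def toy_prediction_module(behavior_vector, candidates):
    # Stand-in: pick the candidate with the highest dot-product score.
    def score(cid):
        return sum(a * b for a, b in zip(behavior_vector, candidates[cid]))
    return max(candidates, key=score)


model = RecommendationModel(toy_content_module, toy_behavior_module,
                            toy_prediction_module)
sequence_info = [
    {"image": [1.0, 0.0], "text": [1.0, 0.0]},
    {"image": [0.0, 1.0], "text": [1.0, 0.0]},
]
predicted = model.predict(sequence_info, {"d": [1.0, 0.0], "e": [0.0, 1.0]})
```

Keeping the modules as separate callables mirrors the description, in which the output of the content module feeds the behavior module and the output of the behavior module feeds the prediction module.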

Further, the content representation construction module includes a multi-modal content representation constructor and a multi-domain mapper. Step S403 (inputting the multi-modal information corresponding to the content to the recommendation model to be trained, which processes the input multi-modal information to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space) includes the following steps SA1-SA2 (note that steps SA1-SA2 are not shown in the drawings):

SA1: Input the multi-modal information corresponding to the content to the multi-modal content representation constructor, which performs joint representation learning on the multi-modal information to obtain the multi-modal representation of the content in the source domain to which it belongs.

Step SA1 is described next with reference to Figure 6, which is a schematic structural diagram of the multi-modal content representation constructor of a recommendation model provided by an embodiment of this application. The multi-modal content representation constructor includes a joint vision-and-language representation model and an adaptation layer. The adaptation layer includes an image-modality self-attention module, a text-modality self-attention module, and an image-text cross-modal self-attention module, and the at least two different modalities of information include image-modality information and text-modality information. In Figure 6, the image-modality information is denoted A1 and the text-modality information is denoted A2.

The image-modality information A1 and the text-modality information A2 corresponding to the content are input together to the joint vision-and-language representation model, which performs joint representation learning on A1 and A2 to obtain the first image-modality vector representation B1 and the first text-modality vector representation B2 of the content in the source domain to which it belongs. The joint vision-and-language representation model may be the VilBERT model, but is not specifically limited here; in practice, any other model that can jointly represent vision and language may be used.

After the first image-modality vector representation B1 and the first text-modality vector representation B2 are obtained, the image-modality self-attention module in the adaptation layer learns from B1 and outputs the second image-modality vector representation C1 of the content in the source domain to which it belongs; the image-text cross-modal self-attention module in the adaptation layer learns jointly from B1 and B2 and outputs the first cross-modal vector representation C1C2 of the content in that source domain; and the text-modality self-attention module in the adaptation layer learns from B2 and outputs the second text-modality vector representation C2 of the content in that source domain. Thus, when the multi-modal content representation constructor includes both the joint vision-and-language representation model and the adaptation layer, the multi-modal representation of the content in its source domain includes the second image-modality vector representation, the second text-modality vector representation, and the first cross-modal vector representation.
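The three branches of the adaptation layer can be illustrated with a minimal Python sketch. It assumes scaled dot-product self-attention with identity query, key, and value projections, assumes B1 and B2 are lists of vectors, and models the cross-modal branch as self-attention over their concatenation; the application does not specify any of these details, so they are illustrative only.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * value[i] for w, value in zip(weights, vectors))
                        for i in range(dim)])
    return outputs


def adaptation_layer(b1, b2):
    """Return (C1, C2, C1C2) from image vectors b1 and text vectors b2."""
    c1 = self_attention(b1)          # image-modality branch
    c2 = self_attention(b2)          # text-modality branch
    c1c2 = self_attention(b1 + b2)   # cross-modal branch over both modalities
    return c1, c2, c1c2


b1 = [[1.0, 0.0], [1.0, 0.0]]  # toy first image-modality vectors
b2 = [[0.0, 1.0]]              # toy first text-modality vector
c1, c2, c1c2 = adaptation_layer(b1, b2)
```

In the cross-modal branch each output vector mixes image and text information, whereas the two single-modality branches only attend within their own modality.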

In one possible implementation, the multi-modal content representation constructor includes only the joint vision-and-language representation model. The implementation is the same as the process above, except that the adaptation layer is not needed to obtain the multi-modal vector representation. In this case, the multi-modal representation of the content in its source domain includes the first image-modality vector representation and the first text-modality vector representation. Specifically, the image-modality information and the text-modality information corresponding to the content are input together to the joint vision-and-language representation model, which performs joint representation learning on them to obtain the first image-modality vector representation and the first text-modality vector representation of the content in its source domain. Note that the first image-modality vector representation and the first text-modality vector representation can then be input directly to the multi-domain mapper of the content representation construction module, and the multi-modal vector representation obtained in this way is the same as that obtained by inputting the second image-modality vector representation, the second text-modality vector representation, and the first cross-modal vector representation to the multi-domain mapper.

In another possible implementation, the first image modality vector representation and the first text modality vector representation produced by joint representation learning over the image modality information and the text modality information can be obtained by the following formula (here the visual and language joint representation model is the VilBERT model):

$$x_i,\ y_i = \mathrm{VilBERT}\big([\,[\mathrm{IMG}];\ e_1, \ldots, e_j;\ [\mathrm{CLS}];\ w_1, \ldots, w_c\,]\big)$$

where x_i and y_i denote the first image modality vector representation and the first text modality vector representation, respectively; [IMG]; e_1, ..., e_j denotes the image modality information, with j the number of image modality items; and [CLS]; w_1, ..., w_c denotes the text modality information, with c the number of text modality items.

SA2: Map the multi-modal representation into the multi-domain universal content representation space through the multi-domain mapper to obtain the multi-modal vector representation of the content in that space.

Step SA2 is described next with reference to Figure 7, which is a schematic structural diagram of the multi-domain mapper of a recommendation model provided by an embodiment of this application. The multi-domain mapper includes a mapping layer, a concatenation layer, and a multi-layer perceptron. First, the multi-domain mapper maps the second image modality vector representation C1 of the content in its source domain into the multi-domain universal content representation space to obtain the first mapping result E1; maps the first cross-modal vector representation C1C2 of the content in its source domain into that space to obtain the third mapping result E1E2; and maps the second text modality vector representation C2 of the content in its source domain into that space to obtain the second mapping result E2. Specifically, the mapping layer includes a whitening layer and a hybrid expert network (that is, mixture-of-experts) layer. The whitening layer includes a first whitening module, a second whitening module, and a third whitening module corresponding to the image modality, the text modality, and the image-text cross-modality, respectively. The hybrid expert network layer includes a first hybrid expert network, a second hybrid expert network, and a third hybrid expert network corresponding to the image modality, the text modality, and the image-text cross-modality, respectively. All three hybrid expert networks use a gating mechanism oriented to the multiple domains covered by the multi-domain universal content representation space, and these domains include the source domain to which the content belongs.

Further, to reduce redundancy among parameters and exclude interfering parameter information, the first whitening module whitens the second image modality vector representation C1 to obtain the first whitening result D1; the third whitening module whitens the first cross-modal vector representation C1C2 to obtain the third whitening result D1D2; and the second whitening module whitens the second text modality vector representation C2 to obtain the second whitening result D2.

In one possible implementation, the whitening result output by the whitening layer can be obtained by the following formula:

where the two left-hand-side symbols of the formula both denote whitening results output by the whitening layer; x_i and y_i denote the first image modality vector representation and the first text modality vector representation, respectively; and b and W₁ are trainable parameters, with b the bias and W₁ the weight. It should be noted that the trainable parameters include a set for the whitening result of the image modality vector representation, a set for the whitening result of the text modality vector representation, and a set for the whitening result of the cross-modal vector representation, which helps improve the robustness of modality learning.
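The following is a minimal sketch of a per-modality whitening step. The whitening formula itself is not reproduced in this text, so the linear form `(x - b) · W₁` used here is an assumption, chosen only because it is consistent with the description of a trainable bias b and a trainable weight W₁. The function name `whiten`, the `params` dictionary, and all numeric values are hypothetical.

```python
def whiten(x, b, w1):
    """Apply an assumed linear whitening transform (x - b) @ W1 to one vector.

    x  : list of floats, input vector of length d_in
    b  : list of floats, trainable bias of length d_in
    w1 : list of rows, trainable weight matrix of shape d_in x d_out
    """
    centered = [xi - bi for xi, bi in zip(x, b)]
    d_out = len(w1[0])
    return [
        sum(centered[r] * w1[r][c] for r in range(len(centered)))
        for c in range(d_out)
    ]


# One independent (b, W1) pair per modality, as the text describes.
# The values are illustrative, not trained parameters.
params = {
    "image": {"b": [1.0, 2.0], "w1": [[1.0, 0.0], [0.0, 0.5]]},
    "text": {"b": [0.0, 0.0], "w1": [[2.0, 0.0], [0.0, 2.0]]},
    "cross": {"b": [0.5, 0.5], "w1": [[1.0, 1.0], [0.0, 1.0]]},
}

d1 = whiten([3.0, 4.0], **params["image"])      # stands in for D1
d1d2 = whiten([1.5, 2.5], **params["cross"])    # stands in for D1D2
```

Keeping a separate parameter set per modality mirrors the first, second, and third whitening modules; whether the parameters are shared in any way is not stated in the text.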

To further adapt parameters that differ widely from one another, the first hybrid expert network then processes the first whitening result D1 through its gating mechanism to obtain the first mapping result E1; the third hybrid expert network processes the third whitening result D1D2 through its gating mechanism to obtain the third mapping result E1E2; and the second hybrid expert network processes the second whitening result D2 through its gating mechanism to obtain the second mapping result E2. Finally, the concatenation layer concatenates the first mapping result E1, the third mapping result E1E2, and the second mapping result E2 in that order, and the multi-layer perceptron reduces the dimensionality of the concatenated result to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space.
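The last two stages of the mapper (concatenation followed by dimensionality reduction) can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the helper names, the use of ReLU between perceptron layers, and all weights are hypothetical, and a single linear layer is used in the example for readability.

```python
def concatenate(*vectors):
    """Concatenate vectors in the order given (E1, E1E2, E2 in the text)."""
    joined = []
    for vector in vectors:
        joined.extend(vector)
    return joined


def linear(x, weight, bias):
    """Compute x @ weight + bias for a row vector x and an n x m weight."""
    return [
        sum(x[r] * weight[r][c] for r in range(len(x))) + bias[c]
        for c in range(len(bias))
    ]


def mlp_reduce(x, layers):
    """Apply each (weight, bias) layer; ReLU between layers is an assumption."""
    for index, (weight, bias) in enumerate(layers):
        x = linear(x, weight, bias)
        if index < len(layers) - 1:
            x = [max(0.0, value) for value in x]
    return x


e1, e1e2, e2 = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
spliced = concatenate(e1, e1e2, e2)  # 6-dimensional concatenated result
# A 6 x 2 weight that sums the first three and the last three entries.
layers = [([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3, [0.0, 0.0])]
reduced = mlp_reduce(spliced, layers)  # 2-dimensional representation
```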

In another possible implementation, the mapping result output by the hybrid expert network layer can be obtained by the following formula:

$$\mathrm{FFN}(x) = \mathrm{GeLU}(xW_1 + b_1)\,W_2 + b_2$$

where v_i denotes the mapping result output by the hybrid expert network layer; g_k denotes a weight; G denotes the number of gates in the hybrid expert network layer; k indexes the k-th gate and ranges over the G gates; the remaining term of the formula denotes the output obtained by passing the whitening result through the k-th gate; GeLU is the activation function; W₁ and W₂ both denote matrices of random parameters; and b₁ and b₂ denote the biases of those matrices. It should be noted that the G gating mechanisms in the hybrid expert network layer are independent of one another and can be preset in advance, and their number may exceed the number of layers in the hybrid expert network layer.
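A small sketch may help connect the FFN formula to the gated combination described above. The FFN follows the formula as given; the gating part is an assumption, since the combining formula is not reproduced in this text: here the weights g_k come from a softmax over a linear router and the output is the weighted sum of G expert outputs. The names `mixture_of_experts` and `router` and all parameter values are hypothetical.

```python
import math


def gelu(z):
    """Exact GeLU, computed with the Gaussian error function."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def matvec(x, w):
    """Row vector x (length n) times matrix w (n x m)."""
    return [sum(x[r] * w[r][c] for r in range(len(x))) for c in range(len(w[0]))]


def ffn(x, w1, b1, w2, b2):
    """FFN(x) = GeLU(x W1 + b1) W2 + b2."""
    hidden = [gelu(h + b) for h, b in zip(matvec(x, w1), b1)]
    return [o + b for o, b in zip(matvec(hidden, w2), b2)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, router):
    """Assumed gating: g = softmax(x @ router); output = sum_k g_k * FFN_k(x)."""
    gates = softmax(matvec(x, router))
    outputs = [ffn(x, *expert) for expert in experts]
    dim = len(outputs[0])
    return [sum(g * out[c] for g, out in zip(gates, outputs)) for c in range(dim)]


identity = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    (identity, [0.0, 0.0], identity, [1.0, 1.0]),  # expert 1: (W1, b1, W2, b2)
    (identity, [0.0, 0.0], identity, [3.0, 3.0]),  # expert 2
]
router = [[0.0, 0.0], [0.0, 0.0]]  # all-zero router gives equal gate weights
v = mixture_of_experts([1.0, 0.0], experts, router)
```

With an all-zero router both gates equal 0.5, so the output is the plain average of the two expert outputs; a trained router would weight experts differently for different domains.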

Further, the behavior representation construction module includes an object behavior encoder. Step S404 (obtaining the behavior vector representation of the object in the source domain of the single-source domain behavior sequence, according to the order of the contents in that sequence and the multi-modal vector representation of each content in the multi-domain universal content representation space) includes the following steps SB1 to SB3 (which are not shown in the drawings):

SB1: Obtain the position information of each content in the single-source domain behavior sequence according to the order of the contents in that sequence.

SB2: Obtain the content representation sequence corresponding to the single-source domain behavior sequence according to the multi-modal vector representation of each content in the multi-domain universal content representation space and the position information of each content in the sequence.

Specifically, the position information of each content in the single-source domain behavior sequence is first encoded as a position vector representation. Then, for each content, its multi-modal vector representation in the multi-domain universal content representation space and its position vector representation in the sequence are added to obtain the vector superposition result for that content. Finally, the content representation sequence corresponding to the single-source domain behavior sequence is built from the vector superposition results of all contents in the sequence.
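Steps SB1 and SB2 can be sketched as follows: order the contents by position, look up a position vector for each, and add it element-wise to the content's multi-modal vector. The function name, the lookup-table form of the position encoding, and the numeric values are assumptions made for illustration only.

```python
def build_content_sequence(items, position_table):
    """Return vector superposition results ordered by position.

    items          : list of (name, position, content_vector), positions 1-based
    position_table : position_table[p - 1] is the vector for position p
    """
    ordered = sorted(items, key=lambda item: item[1])
    sequence = []
    for name, position, content_vec in ordered:
        position_vec = position_table[position - 1]
        summed = [c + p for c, p in zip(content_vec, position_vec)]
        sequence.append((name, summed))
    return sequence


# Three contents at positions 3, 1 and 2 (illustrative vectors).
items = [
    ("a", 3, [1.0, 1.0]),
    ("b", 1, [2.0, 2.0]),
    ("c", 2, [3.0, 3.0]),
]
position_table = [[0.5, 0.0], [0.25, 0.0], [0.125, 0.0]]
sequence = build_content_sequence(items, position_table)
```

The resulting list is what would be passed to the object behavior encoder in step SB3.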

SB3: Input the content representation sequence into the object behavior encoder, which encodes it to obtain the behavior vector representation of the object in the source domain of the single-source domain behavior sequence.

Steps SB1-SB2 are described next with reference to Figure 8, which is a schematic diagram of the processing of a recommendation model provided by an embodiment of this application. Figure 8 shows the content representation (that is, the vector superposition result described above) of each content in the single-source domain sequence, which consists of content a, content b, and content c. As Figure 8 shows, the vector superposition result of content a is at position 3 in the single-source domain sequence, that of content b is at position 1, and that of content c is at position 2. The vector superposition results of content a, content b, and content c are then arranged into the content representation sequence {1, 2, 3} and input into the object behavior encoder, which encodes the sequence to obtain the behavior vector representation of the object in the source domain of the single-source domain behavior sequence.

It should be noted that the object behavior encoder in this application may use a transformer architecture, which includes a multi-head self-attention layer (denoted MHAttn(·)) and a point-wise feed-forward network (denoted FFN(·)). The point-wise feed-forward network includes a multi-layer perceptron with ReLU activation. In summary, after the multi-modal information is obtained from the object's single-source domain sequence, the multi-modal information corresponding to the content is input into the recommendation model to be trained. The content representation construction module processes it to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space; the behavior representation construction module then processes the multi-modal vector representation to obtain the behavior vector representation; and finally, based on the prediction module, the parameters of the content representation construction module and/or the behavior representation construction module are adjusted backward whenever the pre-training cutoff condition is not met, yielding the preliminary recommendation model. In this way, a preliminary recommendation model is obtained by pre-training on the object's single-source domain sequences. To make the preliminary recommendation model more general, it can be transferred from the source domain to the target domain, where its parameters are adjusted to obtain a more accurate recommendation model. The model adjustment method for the preliminary recommendation model is described next.

Figure 9 is a flow chart of a model adjustment method provided by an embodiment of this application. The model adjustment method shown in Figure 9 includes:

S901: Obtain the multi-domain mixed flow behavior sequence of the target object.

The multi-domain mixed flow behavior sequence includes multiple contents from multiple domains, ordered from earliest to latest by the time at which the target object triggered them. It should be noted that the domains involved in the multi-domain mixed flow behavior sequence include the target domain. The model adjustment method is used to adjust the preliminary recommendation model obtained by the pre-training process described above, so that the preliminary recommendation model is transferred from the source domain to the target domain.

S902: Based on the multi-modal information corresponding to each content in the multi-domain mixed flow behavior sequence, obtain, through the preliminary recommendation model, the multi-modal vector representation of each content in the multi-domain universal content representation space.

It can be understood that the multi-modal vector representation of each content of the multi-domain mixed flow behavior sequence in the multi-domain universal content representation space is obtained in this step in the same way as the multi-modal vector representation of each content of the single-source domain behavior sequence is obtained in the process described above.

S903: According to the order of the contents in the multi-domain mixed flow behavior sequence and the multi-modal vector representation of each content in the multi-domain universal content representation space, obtain the multi-domain mixed flow behavior vector representation of the target object corresponding to the multi-domain mixed flow behavior sequence.

It can be further understood that the multi-domain mixed flow behavior vector representation of the target object is obtained in this step in the same way as the behavior vector representation of the object in the source domain of the single-source domain behavior sequence is obtained in the process described above.

S904: Based on the multi-domain mixed flow behavior vector representation, the preliminary recommendation model predicts the first target-domain content that the target object triggers after triggering the last content of the multi-domain mixed flow behavior sequence.

Step S904 is described next with reference to Figure 10, which is a schematic diagram of the processing performed when adjusting the model, provided by an embodiment of this application. Figure 10 shows the content representation (that is, the vector superposition result described above) of each content in the multi-domain mixed flow sequence. The multiple domains include domain A, domain B, and domain C. Domain A contains content a and content b, whose vector superposition results are at position 1 and position 3, respectively; domain B contains content c, whose vector superposition result is at position 2; and domain C contains content d, whose vector superposition result is at position 4. The resulting multi-domain mixed flow behavior vector representation is therefore {(content a)1, (content c)2, (content b)3, (content d)4}. This representation is then input into the object behavior encoder, and the preliminary recommendation model predicts that the first target-domain content triggered by the target object after the last content of the multi-domain mixed flow behavior sequence is content n.
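The interleaving in the Figure 10 example can be reproduced with a short sketch that merges per-domain contents into one sequence ordered by position. The function name `merge_domains` and the dictionary layout are hypothetical; only the contents, domains, and positions come from the example above.

```python
def merge_domains(per_domain):
    """Merge per-domain contents into one sequence ordered by position.

    per_domain : dict mapping domain name -> list of (content, position)
    Returns a list of (content, domain) pairs in trigger order.
    """
    merged = [
        (position, content, domain)
        for domain, entries in per_domain.items()
        for content, position in entries
    ]
    merged.sort()  # positions are unique, so this orders by trigger time
    return [(content, domain) for _, content, domain in merged]


per_domain = {
    "A": [("a", 1), ("b", 3)],
    "B": [("c", 2)],
    "C": [("d", 4)],
}
mixed = merge_domains(per_domain)
```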

S905: According to the difference between the predicted target-domain content and the first target-domain content that the target object actually triggers after the last content of the multi-domain mixed flow behavior sequence, iteratively adjust the parameters of the preliminary recommendation model until the preset fine-tuning cutoff condition is met, then end the adjustment to obtain the target recommendation model.

Step S905 is described next with reference to Figure 11, which is a schematic diagram of the adjustment of the model provided by an embodiment of this application. To better capture the relationships among the behavior vector representations of the contents, a content id is added to the behavior vector representation of each content, and every content id is distinct. Figure 11 shows the multi-domain mixed flow behavior vector representation being combined with the id of each corresponding content to predict the target-domain content that will be triggered. In Figure 11, the id of (content a)1 is id1, the id of (content c)2 is id2, the id of (content b)3 is id3, the id of (content d)4 is id4, and the target domain is domain D. The preliminary recommendation model predicts that the first target-domain content triggered by the target object after the last content of the multi-domain mixed flow behavior sequence is content n. Then, according to the difference between the first target-domain content actually triggered, content e, and content n, the parameters of the preliminary recommendation model are iteratively adjusted until the preset fine-tuning cutoff condition is met, and the adjustment ends with the target recommendation model. It should also be noted that this application focuses on adjusting the parameters of the multi-domain mapper part of the preliminary recommendation model.

In one possible implementation, the multi-domain mixed flow behavior vector representation and the predicted content can be obtained by the following formula:

where p_j denotes the position of the content in the object behavior encoder; F^l denotes the multi-domain mixed flow behavior vector representation, with the formula's input term taken as F^l; F^{l+1} denotes the predicted first triggered target-domain content; and the value of l can be set as needed.

It should also be noted that, in this application, the probability of the predicted content can be obtained by the following formula:

where the symbols of the formula denote, respectively, the probability of the content predicted for the next moment, the behavior sequence up to time t, the content to be predicted at the next moment, and the content id corresponding to the content to be predicted at the next moment.
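Because the probability formula is not reproduced in this text, the sketch below shows only one common way such a probability could be computed, and should be read as an assumption rather than the application's formula: each candidate is scored by the dot product between the behavior vector representation and the sum of the candidate's content vector and id vector, and the scores are normalized with a softmax. All names and values are hypothetical.

```python
import math


def next_content_probabilities(behavior_vec, candidates):
    """Return a probability for each candidate content.

    behavior_vec : behavior vector representation of the sequence up to time t
    candidates   : dict mapping name -> (content_vector, id_vector)
    Assumed scoring: softmax over behavior_vec . (content_vector + id_vector).
    """
    scores = {}
    for name, (content_vec, id_vec) in candidates.items():
        item_vec = [c + e for c, e in zip(content_vec, id_vec)]
        scores[name] = sum(s * v for s, v in zip(behavior_vec, item_vec))
    peak = max(scores.values())
    exps = {name: math.exp(score - peak) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


candidates = {
    "e": ([2.0, 0.0], [0.0, 0.0]),
    "n": ([0.0, 1.0], [0.0, 0.0]),
}
probs = next_content_probabilities([1.0, 0.0], candidates)
predicted = max(probs, key=probs.get)
```

During fine-tuning, the gap between the predicted content and the content actually triggered would drive the parameter updates described in step S905.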

In summary, adjusting the preliminary recommendation model with multi-domain mixed flow sequences yields a more accurate target recommendation model. The target recommendation model can be used to recommend to an object the content it is most interested in. The recommendation method that uses the target recommendation model to recommend content to the object to be recommended is described next.

Figure 12 is a flow chart of a recommendation method provided by an embodiment of this application. The recommendation method shown in Figure 12 uses the target recommendation model obtained by the model adjustment process described above, and includes:

S1201: Obtain the historical behavior sequence of the object to be recommended.

The historical behavior sequence contains at least content belonging to the target domain, and the contents of the historical behavior sequence are ordered from earliest to latest by the time at which the object to be recommended triggered them.

S1202: Based on the multi-modal information corresponding to each content in the historical behavior sequence, obtain, through the target recommendation model, the multi-modal vector representation of each content in the multi-domain universal content representation space.

It can be understood that the multi-modal vector representation of each content of the historical behavior sequence in the multi-domain universal content representation space is obtained in this step in the same way as the multi-modal vector representation of each content of the single-source domain behavior sequence is obtained in the process described above.

S1203: According to the order of the contents in the historical behavior sequence and the multi-modal vector representation of each content in the multi-domain universal content representation space, obtain the historical behavior vector representation of the object to be recommended corresponding to the historical behavior sequence.

It can be further understood that the historical behavior vector representation of the object to be recommended is obtained in this step in the same way as the behavior vector representation of the object in the source domain of the single-source domain behavior sequence is obtained in the process described above.

S1204: Based on the historical behavior vector representation, the target recommendation model predicts the first target-domain content that the object to be recommended triggers after triggering the last content of the historical behavior sequence.

S1205: Recommend to the object to be recommended the first target-domain content predicted by the target recommendation model.

Table 1 summarizes the data sets used for the pre-training method, adjustment method, and recommendation method of a recommendation model provided by an embodiment of this application. Avg.n denotes the average length of the behavior sequences, and Sparsity denotes the data sparsity. The household, clothing, and office data sets are source-domain data sets, and the food, instrument, electronics, art, and sports data sets are target-domain data sets. It should be noted that, in this application, the contents triggered by each object are arranged into a sequence in chronological order, and the data sets can be split using the leave-one-out method.

Table 1
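The leave-one-out split mentioned above can be illustrated with a short sketch. The text does not spell out the exact protocol, so the convention used here (last content for testing, second-to-last for validation, the rest for training) is an assumption, as are the function name and the sample sequence.

```python
def leave_one_out(sequence):
    """Split one chronologically ordered behavior sequence.

    Assumed convention: the last content is the test target, the
    second-to-last is the validation target, and the rest is training data.
    """
    if len(sequence) < 3:
        raise ValueError("need at least three contents to split")
    return {
        "train": sequence[:-2],
        "valid": (sequence[:-2], sequence[-2]),  # (history, target)
        "test": (sequence[:-1], sequence[-1]),   # (history, target)
    }


split = leave_one_out(["a", "b", "c", "d", "e"])
```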

Table 2 compares the pre-training method, adjustment method, and recommendation method of a recommendation model provided by an embodiment of this application with existing solutions. Compared with existing solutions, the proposed method considers content text, content images, and content IDs together as input data, and combines pre-trained models, domain adaptation, mixed flows, and cross-domain techniques as its transfer learning techniques. As a result, a model built with this solution can be more stable, more general, and more robust.

Table 2

Table 3 gives a further comparison of results for the pre-training method, adjustment method, and recommendation method of a recommendation model provided by an embodiment of this application. This application uses recall (Recall) and/or normalized discounted cumulative gain (NDCG) to evaluate performance. The cutoff used for recall and gain may be 5, or alternatively 10, 15, 20, and so on, and can be set according to actual needs. Compared with existing solutions, this solution improves recall and gain by +2.90% to +14.49% on average (+14.49% is the result obtained with a recall cutoff of 20), which indicates that the multi-modal vector representations learned through the cross-modal mapper are more robust and more informative.

Table 3
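For reference, the two metrics can be computed as follows in the next-content setting, where each test sequence has exactly one correct content. This is the standard single-target form of Recall@K and NDCG@K, given as a sketch; the function names and the sample ranking are hypothetical and are not taken from the reported results.

```python
import math


def recall_at_k(ranked, target, k):
    """1.0 if the target appears among the top-k ranked contents, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k):
    """NDCG@K with one relevant content: 1 / log2(rank + 1), or 0.0 if missed.

    With a single relevant content the ideal DCG is 1, so no further
    normalization is needed.
    """
    top = ranked[:k]
    if target not in top:
        return 0.0
    rank = top.index(target) + 1  # 1-based rank of the target
    return 1.0 / math.log2(rank + 1)


ranked = ["n", "e", "x"]  # model ranking, best first
recall5 = recall_at_k(ranked, "e", 5)
ndcg5 = ndcg_at_k(ranked, "e", 5)
```

Reported scores are typically the average of these per-sequence values over all test sequences.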

Referring to Figure 13, which compares the gain of the recommendation model of this solution with that of a related-art model in practical use, provided by an embodiment of this application. The line charts in Figure 13 use gain (NDCG@10) as the metric: the vertical axis of each chart is the gain and the horizontal axis is the data loss rate. Specifically, line chart 1301 in Figure 13 covers the art data set, where line 1 assumes image data is lost in this solution, line 2 assumes text data is lost in this solution, and line 3 assumes text data is lost in the existing solution. Line chart 1302 covers the electronics data set, where line 4 assumes image data is lost in this solution, line 5 assumes text data is lost in this solution, and line 6 assumes text data is lost in the existing solution. Line chart 1303 covers the food data set, where line 7 assumes image data is lost in this solution, line 8 assumes text data is lost in this solution, and line 9 assumes text data is lost in the existing solution. It can be seen that the proposed method significantly outperforms the existing method whether image data or text data is lost, and that it is more robust and more informative in practical use.

Based on the pre-training method for a recommendation model provided in the foregoing embodiments, this application also provides a corresponding pre-training device for a recommendation model, described below with reference to Figure 14. Figure 14 is a schematic structural diagram of the pre-training device for a recommendation model provided by an embodiment of this application. The pre-training device shown in Figure 14 includes:

a behavior sequence acquisition module 1401, configured to acquire a single-source domain behavior sequence of an object; the single-source domain behavior sequence includes multiple contents of the same source domain, ordered from earliest to latest by the time at which the object triggered them;

a multi-modal information acquisition module 1402, configured to acquire multi-modal information corresponding to the contents in the single-source domain behavior sequence; the multi-modal information includes information of at least two different modalities;

an information input determination module 1403, configured to take the multi-modal information corresponding to a content as the input of the recommendation model to be trained, which processes the input multi-modal information to obtain a multi-modal vector representation of the content in a multi-domain universal content representation space;

a behavior representation construction module 1404, configured to obtain a behavior vector representation of the object in the source domain to which the single-source domain behavior sequence belongs, according to the ordering of the contents in the single-source domain behavior sequence and the multi-modal vector representation of each of those contents in the multi-domain universal content representation space;

a same-source-domain content prediction module 1405, configured to have the recommendation model to be trained predict, based on the behavior vector representation, the first content of the same source domain that the object triggers after triggering the last content of the single-source domain behavior sequence;

a preliminary recommendation model obtaining module 1406, configured to iteratively adjust the parameters of the recommendation model to be trained according to the difference between the predicted first content of the same source domain and the first content of the same source domain that the object actually triggered after the last content of the single-source domain behavior sequence, until the adjusted model satisfies a pre-training cutoff condition, at which point pre-training ends and a preliminary recommendation model is obtained.
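As a minimal sketch of how one pre-training sample could be assembled for modules 1401 to 1406: the trigger events of one object in one source domain are ordered by time, the last content is held out as the actually triggered next content, and the remainder is the behavior sequence. The event format `(timestamp, content_id)` and the function name are assumptions made for illustration.

```python
def build_pretraining_sample(events):
    """Split one object's events from a single source domain into (sequence, label).

    events: list of (timestamp, content_id) pairs, all from the same source domain.
    The returned sequence is ordered from earliest to latest trigger time, and the
    label is the first same-source-domain content triggered after the sequence ends.
    """
    if len(events) < 2:
        raise ValueError("need at least two triggered contents")
    ordered = [content_id for _, content_id in sorted(events, key=lambda e: e[0])]
    return ordered[:-1], ordered[-1]
```

The model's prediction for the sequence would then be compared with the label to compute the prediction loss.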

Optionally, the recommendation model to be trained includes a content representation construction module, a behavior representation construction module, and a prediction module;

the content representation construction module is configured to process the input multi-modal information to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space;

the behavior representation construction module 1404 is configured to obtain the behavior vector representation of the object in the source domain to which the single-source domain behavior sequence belongs, according to the ordering of the contents in the single-source domain behavior sequence and the multi-modal vector representation of each of those contents in the multi-domain universal content representation space;

the prediction module is configured to predict, based on the behavior vector representation, the first content of the same source domain that the object triggers after triggering the last content of the single-source domain behavior sequence, and, when the pre-training cutoff condition is not satisfied, to adjust the parameters of the behavior representation construction module and/or the content representation construction module through back propagation.

Optionally, the content representation construction module includes a multi-modal content representation constructor and a multi-domain mapper, and the information input determination module 1403 includes:

a multi-modal information learning unit, configured to input the multi-modal information corresponding to a content into the multi-modal content representation constructor, which performs joint representation learning on the multi-modal information to obtain a multi-modal representation of the content in its source domain;

a multi-modal information mapping unit, configured to map the multi-modal representation into the multi-domain universal content representation space through the multi-domain mapper, to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space.

Optionally, the multi-modal content representation constructor includes a joint vision-language representation model and an adaptation layer, the adaptation layer includes an image-modality self-attention module, a text-modality self-attention module, and an image-text cross-modal self-attention module, and the multi-modal information learning unit includes:

a modal information joint learning unit, configured to input the image-modality information and the text-modality information corresponding to a content together into the joint vision-language representation model, which performs joint representation learning on the image-modality information and the text-modality information to obtain a first image-modality vector representation and a first text-modality vector representation of the content in its source domain;

a modal vector representation learning unit, configured to learn the first image-modality vector representation through the image-modality self-attention module, to learn the first text-modality vector representation through the text-modality self-attention module, and to jointly learn the first image-modality vector representation and the first text-modality vector representation through the image-text cross-modal self-attention module, so as to obtain a second image-modality vector representation, a second text-modality vector representation, and a first cross-modal vector representation of the content in its source domain, output respectively by the image-modality self-attention module, the text-modality self-attention module, and the image-text cross-modal self-attention module.

Optionally, the multi-domain mapper includes a mapping layer, a concatenation layer, and a multi-layer perceptron, and the multi-modal information mapping unit includes:

a vector mapping result obtaining unit, configured to map, through the multi-domain mapper, the second image-modality vector representation, the second text-modality vector representation, and the first cross-modal vector representation of the content in its source domain into the multi-domain universal content representation space, to obtain a first mapping result corresponding to the second image-modality vector representation, a second mapping result corresponding to the second text-modality vector representation, and a third mapping result corresponding to the first cross-modal vector representation;

a mapping result concatenation unit, configured to concatenate, through the concatenation layer, the first mapping result, the third mapping result, and the second mapping result in that order, to obtain a concatenation result;

a concatenation result dimensionality reduction unit, configured to reduce the dimensionality of the concatenation result through the multi-layer perceptron, to obtain the multi-modal vector representation of the content in the multi-domain universal content representation space.
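The concatenation and dimensionality-reduction steps can be illustrated with a small standard-library sketch. The two-layer perceptron with a ReLU activation, the layer sizes, and all helper names (`linear`, `init_linear`, `fuse`) are assumptions; the application only states that a multi-layer perceptron reduces the dimensionality of the concatenation result.

```python
import random


def linear(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(rng, in_dim, out_dim):
    """Hypothetical initializer: small uniform weights and zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def fuse(image_vec, cross_vec, text_vec, layer1, layer2):
    """Concatenate the first, third, and second mapping results, then reduce with an MLP."""
    concatenated = image_vec + cross_vec + text_vec  # list concatenation, order as in the text
    hidden = relu(linear(concatenated, *layer1))
    return linear(hidden, *layer2)


rng = random.Random(0)
dim = 4
layer1 = init_linear(rng, 3 * dim, 8)   # 12 -> 8
layer2 = init_linear(rng, 8, dim)       # 8 -> 4
content_vec = fuse([1.0] * dim, [0.5] * dim, [0.0] * dim, layer1, layer2)
```

Here three 4-dimensional mapping results are concatenated into a 12-dimensional vector and reduced back to 4 dimensions.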

Optionally, the mapping layer includes a whitening layer and a mixture-of-experts network layer. The whitening layer includes a first whitening module, a second whitening module, and a third whitening module corresponding respectively to the image modality, the text modality, and the image-text cross-modality. The mixture-of-experts network layer includes a first mixture-of-experts network, a second mixture-of-experts network, and a third mixture-of-experts network corresponding respectively to the image modality, the text modality, and the image-text cross-modality. The vector mapping result obtaining unit includes:

a vector representation whitening unit, configured to whiten the second image-modality vector representation, the second text-modality vector representation, and the first cross-modal vector representation through the first whitening module, the second whitening module, and the third whitening module respectively, to obtain a first whitening result, a second whitening result, and a third whitening result corresponding respectively to the second image-modality vector representation, the second text-modality vector representation, and the first cross-modal vector representation;

a whitening result gating unit, configured to process the first whitening result, the second whitening result, and the third whitening result through the gating mechanism of the first mixture-of-experts network, the second mixture-of-experts network, and the third mixture-of-experts network respectively, to obtain the first mapping result, the second mapping result, and the third mapping result.
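A minimal sketch of one modality's mapping path, assuming the whitening module is a learned shift followed by a linear transform and the mixture-of-experts network is a softmax-gated sum of linear experts. Both forms are common choices used here for illustration; the application does not specify them in this passage, and all names are hypothetical.

```python
import math


def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def whiten(vec, shift, transform):
    """Assumed parametric whitening: subtract a shift, then apply a linear map."""
    centered = [v - s for v, s in zip(vec, shift)]
    return matvec(transform, centered)


def mixture_of_experts(vec, experts, gate_matrix):
    """Gate weights come from a softmax; the output is the gated sum of expert outputs."""
    gate = softmax(matvec(gate_matrix, vec))
    outputs = [matvec(expert, vec) for expert in experts]
    dim = len(outputs[0])
    return [sum(g * out[i] for g, out in zip(gate, outputs)) for i in range(dim)]


identity = [[1.0, 0.0], [0.0, 1.0]]
doubled = [[2.0, 0.0], [0.0, 2.0]]
whitened = whiten([1.0, 2.0], shift=[1.0, 1.0], transform=identity)
mapped = mixture_of_experts(whitened, [identity, doubled], gate_matrix=[[0.0, 0.0], [0.0, 0.0]])
```

With a zero gate matrix both experts receive weight 0.5, so `mapped` is the average of the two expert outputs. In the described structure one such path would exist for each of the image, text, and cross-modal representations.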

Optionally, the behavior representation construction module includes:

a position information obtaining unit, configured to obtain the position information of each content in the single-source domain behavior sequence according to the ordering of the contents in the sequence;

a content representation sequence obtaining unit, configured to obtain a content representation sequence corresponding to the single-source domain behavior sequence, according to the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space and the position information of each content in the sequence;

a sequence encoding unit, configured to take the content representation sequence as the input of the object behavior encoder, which encodes the content representation sequence to obtain the behavior vector representation of the object in the source domain to which the single-source domain behavior sequence belongs.

Optionally, the content representation sequence obtaining unit includes:

a position vector encoding unit, configured to encode the position information of a content in the single-source domain behavior sequence into a position vector representation;

a vector sum obtaining unit, configured to add the multi-modal vector representation of a content in the multi-domain universal content representation space to the position vector representation of the same content in the single-source domain behavior sequence, to obtain the vector sum corresponding to that content;

a content representation sequence construction unit, configured to construct the content representation sequence corresponding to the single-source domain behavior sequence from the vector sums corresponding to the contents in the sequence.
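The three units above amount to adding a position vector to each content vector, element by element. The sketch below assumes the position vectors come from a lookup table indexed by position; the table and the function name are illustrative.

```python
def build_content_representation_sequence(content_vectors, position_table):
    """Add the position vector for index i to the i-th content's multi-modal vector.

    content_vectors: list of equal-length float lists, in trigger order.
    position_table: list of position vectors (assumed lookup table), one per position.
    """
    if len(content_vectors) > len(position_table):
        raise ValueError("sequence is longer than the position table")
    return [
        [c + p for c, p in zip(vec, position_table[i])]
        for i, vec in enumerate(content_vectors)
    ]
```

The resulting list is what would be passed to the object behavior encoder.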

Optionally, the device further includes a prediction loss adjustment module:

for the prediction loss adjustment module, the pre-training cutoff condition includes a first condition and a second condition, where the first condition concerns the prediction loss and the second condition concerns the combined contrastive learning loss. The first condition is that the prediction loss is less than a first loss threshold, where the prediction loss is obtained from the difference between the predicted first content of the same source domain and the actually triggered first content of the same source domain;

the second condition is that the combined contrastive learning loss is less than a second loss threshold, where the combined contrastive learning loss is the loss associated with the cross-domain sequence-to-content contrastive learning task and the cross-domain sequence-to-sequence contrastive learning task;

in the cross-domain sequence-to-content contrastive learning task, the first content of the same source domain actually triggered after the last content of the single-source domain behavior sequence serves as the positive example, and contents of other source domains in the other single-source domain behavior sequences input to the model in the same batch serve as negative examples;

in the cross-domain sequence-to-sequence contrastive learning task, the data-missing sequence corresponding to the single-source domain behavior sequence serves as the positive example, and single-source domain behavior sequences of other source domains input to the model in the same batch serve as negative examples; the data-missing sequence is obtained by randomly discarding contents of the single-source domain behavior sequence, or by randomly discarding one or more modalities of information corresponding to the contents of the single-source domain behavior sequence.
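The data-missing sequences and the contrastive objective can be sketched as follows. The InfoNCE form of the loss, the temperature, the rule that at least one content and one modality are kept, and the reading that both threshold conditions must hold are assumptions for illustration; the passage itself only names the positives, the negatives, and the two thresholds.

```python
import math
import random


def drop_contents(sequence, drop_prob, rng):
    """Randomly discard contents; keep at least one so the sequence stays non-empty."""
    kept = [item for item in sequence if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(sequence)]


def drop_modalities(sequence, drop_prob, rng):
    """sequence: list of {modality: feature} dicts. Randomly discard modalities per content."""
    result = []
    for content in sequence:
        kept = {m: f for m, f in content.items() if rng.random() >= drop_prob}
        if not kept:  # assumed safeguard: never remove every modality of a content
            m = rng.choice(sorted(content))
            kept = {m: content[m]}
        result.append(kept)
    return result


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive loss: negative log-probability of the positive among all candidates."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


def pretraining_finished(prediction_loss, contrastive_loss, first_threshold, second_threshold):
    """Assumed reading: stop only when both losses are below their thresholds."""
    return prediction_loss < first_threshold and contrastive_loss < second_threshold
```

For the sequence-to-content task, `anchor` would be the behavior vector and `positive` the vector of the actually triggered next content; for the sequence-to-sequence task, `positive` would be the behavior vector of the data-missing sequence.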

Based on the model adjustment method provided in the foregoing embodiments, this application also provides a corresponding model adjustment device, described below with reference to Figure 15. Figure 15 is a schematic structural diagram of the model adjustment device provided by an embodiment of this application. The model adjustment device shown in Figure 15 includes:

a mixed-flow behavior sequence acquisition module 1501, configured to acquire a multi-domain mixed-flow behavior sequence of a target object; the multi-domain mixed-flow behavior sequence includes multiple contents from multiple domains, ordered from earliest to latest by the time at which the target object triggered them; the multiple domains involved in the multi-domain mixed-flow behavior sequence include the target domain;

a multi-modal vector representation obtaining module 1502, configured to obtain, through the preliminary recommendation model and based on the multi-modal information corresponding to each content in the multi-domain mixed-flow behavior sequence, the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space;

a mixed-flow behavior vector representation obtaining module 1503, configured to obtain the multi-domain mixed-flow behavior vector representation of the target object corresponding to the multi-domain mixed-flow behavior sequence, according to the ordering of the contents in the sequence and the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space;

a target domain content prediction module 1504, configured to have the preliminary recommendation model predict, based on the multi-domain mixed-flow behavior vector representation, the first content of the target domain that the target object triggers after triggering the last content of the multi-domain mixed-flow behavior sequence;

a target recommendation model obtaining module 1505, configured to iteratively adjust the parameters of the preliminary recommendation model according to the difference between the predicted content of the target domain and the first content of the target domain that the target object actually triggered after the last content of the multi-domain mixed-flow behavior sequence, until the adjusted model satisfies a preset fine-tuning cutoff condition, at which point adjustment ends and a target recommendation model is obtained.
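One way to build a fine-tuning sample for modules 1501 to 1505 is sketched below: events from all domains are merged in time order, the last target-domain content is held out as the label, and everything before it forms the mixed-flow sequence. The `(timestamp, domain, content_id)` event format and the function name are assumptions.

```python
def build_finetuning_sample(events, target_domain):
    """Return (mixed_sequence, label) for fine-tuning on a target domain.

    events: list of (timestamp, domain, content_id) triples from several domains.
    mixed_sequence keeps (domain, content_id) pairs in trigger order; label is the
    first target-domain content triggered after the sequence ends.
    """
    ordered = sorted(events, key=lambda e: e[0])
    target_positions = [i for i, e in enumerate(ordered) if e[1] == target_domain]
    if len(target_positions) < 2:
        # one target-domain content is needed inside the sequence and one as the label
        raise ValueError("need at least two target-domain contents")
    label_index = target_positions[-1]
    sequence = [(domain, cid) for _, domain, cid in ordered[:label_index]]
    return sequence, ordered[label_index][2]
```

Contents triggered after the held-out label are ignored in this sketch.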

Based on the recommendation method provided in the foregoing embodiments, this application also provides a corresponding recommendation device, described below with reference to Figure 16. Figure 16 is a schematic structural diagram of the recommendation device provided by an embodiment of this application. The recommendation device shown in Figure 16 includes:

a historical behavior sequence acquisition module 1601, configured to acquire a historical behavior sequence of an object to be recommended to; the historical behavior sequence contains at least content belonging to the target domain, and the contents in the historical behavior sequence are ordered from earliest to latest by the time at which the object triggered them;

a historical multi-modal vector representation obtaining module 1602, configured to obtain, through the target recommendation model and based on the multi-modal information corresponding to each content in the historical behavior sequence, the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space;

a historical behavior vector representation obtaining module 1603, configured to obtain the historical behavior vector representation of the object corresponding to the historical behavior sequence, according to the ordering of the contents in the historical behavior sequence and the multi-modal vector representation of each content of the sequence in the multi-domain universal content representation space;

a historical target domain content prediction module 1604, configured to have the target recommendation model predict, based on the historical behavior vector representation, the first content of the target domain that the object triggers after triggering the last content of the historical behavior sequence;

a target domain content recommendation module 1605, configured to recommend to the object the first content of the target domain predicted by the target recommendation model.
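The final prediction and recommendation step can be sketched as a ranking over candidate target-domain contents. Scoring by the dot product between the historical behavior vector and each candidate's multi-modal vector is an assumption made for illustration, as are the names used.

```python
def recommend(behavior_vector, candidates, top_k=1):
    """Rank target-domain candidates by dot product with the behavior vector.

    candidates: dict mapping content_id -> multi-modal vector in the shared space.
    Returns the ids of the top_k highest-scoring contents, best first.
    """
    def score(vec):
        return sum(b * v for b, v in zip(behavior_vector, vec))

    ranked = sorted(candidates, key=lambda cid: score(candidates[cid]), reverse=True)
    return ranked[:top_k]
```

With `top_k=1` the result corresponds to the single predicted first content of the target domain.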

An embodiment of this application provides a computer device, which may be a server. Figure 17 is a schematic structural diagram of a server provided by an embodiment of this application. The server 900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 922 (for example, one or more processors), a memory 932, and one or more storage media 930 (for example, one or more mass storage devices) storing application programs 942 or data 944. The memory 932 and the storage medium 930 may provide transient or persistent storage. The program stored in the storage medium 930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Furthermore, the central processing unit 922 may be configured to communicate with the storage medium 930 and to execute, on the server 900, the series of instruction operations in the storage medium 930.

The server 900 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input/output interfaces 958, and/or one or more operating systems 941.

The CPU 922 is configured to perform the following steps:

iteratively adjusting the parameters of the recommendation model to be trained according to the difference between the predicted first content of the same source domain and the first content of the same source domain that the object actually triggered after the last content of the single-source domain behavior sequence, until the adjusted model satisfies the pre-training cutoff condition, at which point pre-training ends and a preliminary recommendation model is obtained;

or;

iteratively adjusting the parameters of the preliminary recommendation model according to the difference between the predicted content of the target domain and the first content of the target domain that the target object actually triggered after the last content of the multi-domain mixed-flow behavior sequence, until the adjusted model satisfies the preset fine-tuning cutoff condition, at which point adjustment ends and a target recommendation model is obtained;

or;

An embodiment of this application also provides another computer device, which may be a terminal device. As shown in Figure 18, for ease of description only the parts related to the embodiments of this application are shown; for specific technical details not disclosed here, refer to the method part of the embodiments of this application. The following takes a mobile phone as an example of the terminal device:

Figure 18 is a block diagram of part of the structure of a mobile phone provided by an embodiment of this application. Referring to Figure 18, the mobile phone includes components such as a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a processor 1080, and a power supply 1090. Those skilled in the art will understand that the structure shown in Figure 18 does not limit the mobile phone, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

The components of the mobile phone are described in detail below with reference to Figure 18:

The RF circuit 1010 may be used to receive and send signals while information is being sent or received or during a call. In particular, it receives downlink information from the base station and passes it to the processor 1080 for processing, and it sends uplink data to the base station. Generally, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1010 may communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.

The memory 1020 may be used to store software programs and modules. The processor 1080 performs the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, application programs required for at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the mobile phone (such as audio data or a phone book), and the like. In addition, the memory 1020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or flash memory device, or other volatile solid-state storage devices.

The input unit 1030 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 1031 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Optionally, the touch panel 1031 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and sends the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 1080; it can also receive and execute commands sent by the processor 1080. The touch panel 1031 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1031, the input unit 1030 may include other input devices 1032. Specifically, the other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick, and the like.

The display unit 1040 may be used to display information input by the user or provided to the user, as well as the various menus of the mobile phone. The display unit 1040 may include a display panel 1041; optionally, the display panel 1041 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1031 may cover the display panel 1041. When the touch panel 1031 detects a touch operation on or near it, it transmits the operation to the processor 1080 to determine the type of the touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 according to the type of the touch event. Although in Figure 18 the touch panel 1031 and the display panel 1041 are shown as two independent components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 1031 and the display panel 1041 may be integrated to implement the input and output functions of the mobile phone.

The mobile phone may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel 1041 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 1041 and/or the backlight when the mobile phone is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, can detect the magnitude and direction of gravity. It can be used in applications that recognize the posture of the mobile phone (such as switching between landscape and portrait orientation, related games, and magnetometer attitude calibration) and in functions related to vibration recognition (such as a pedometer or tap detection). Other sensors that may also be configured on the mobile phone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.

The audio circuit 1060, the speaker 1061, and the microphone 1062 may provide an audio interface between the user and the mobile phone. The audio circuit 1060 may convert received audio data into an electrical signal and transmit it to the speaker 1061, which converts it into a sound signal for output. Conversely, the microphone 1062 converts a collected sound signal into an electrical signal, which the audio circuit 1060 receives and converts into audio data; the audio data is then output to the processor 1080 for processing and sent through the RF circuit 1010 to, for example, another mobile phone, or output to the memory 1020 for further processing.

WiFi is a short-range wireless transmission technology. Through the WiFi module 1070, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although Figure 18 shows the WiFi module 1070, it can be understood that the module is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.

The processor 1080 is the control center of the mobile phone. It connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1020 and calling the data stored in the memory 1020, thereby carrying out overall data and information collection for the mobile phone. Optionally, the processor 1080 may include one or more processing units; preferably, the processor 1080 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor need not be integrated into the processor 1080.

The mobile phone also includes a power supply 1090 (such as a battery) that supplies power to the various components. Preferably, the power supply may be logically connected to the processor 1080 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.

Although not shown, the mobile phone may also include a camera, a Bluetooth module, and the like, which are not described here.

In this embodiment of the present application, the processor 1080 included in the mobile phone also has the following functions:

or;

Embodiments of the present application also provide a computer-readable storage medium for storing a computer program. When the computer program runs on a computer device, it causes the computer device to execute any implementation of the pre-training method for the recommendation model described in the foregoing embodiments, or any implementation of the model adjustment method described in the foregoing embodiments, or any implementation of the recommendation method described in the foregoing embodiments.

Embodiments of the present application also provide a computer program product including a computer program which, when run on a computer device, causes the computer device to execute any implementation of the pre-training method for the recommendation model described in the foregoing embodiments, or any implementation of the model adjustment method described in the foregoing embodiments, or any implementation of the recommendation method described in the foregoing embodiments.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems and devices described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative; the division into systems is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple systems may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The systems described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store a computer program, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A pre-training method for a recommendation model, characterized by including:

Obtain the single-source domain behavior sequence of the object; the single-source domain behavior sequence includes multiple contents from the same source domain, and the multiple contents are sorted from first to last according to the time triggered by the object;

Obtain multi-modal information corresponding to the content in the single-source domain behavior sequence; the multi-modal information includes information of at least two different modalities;

The multi-modal information corresponding to the content is used as the input of the recommendation model to be trained, and the recommendation model to be trained processes the input multi-modal information to obtain a multi-modal vector representation of the content in a multi-domain universal content representation space;

According to the ordering of the content in the single-source domain behavior sequence and the multi-modal vector representation of each content in the single-source domain behavior sequence in the multi-domain universal content representation space, a behavior vector representation of the object in the source domain to which the single-source domain behavior sequence belongs is obtained;

The recommendation model to be trained predicts, based on the behavior vector representation, the first content of the same source domain triggered after the object triggers the end content of the single-source domain behavior sequence;

Iteratively adjust the parameters of the recommendation model to be trained according to the difference between the predicted first content of the same source domain and the first content of the same source domain actually triggered by the object after the end content of the single-source domain behavior sequence, until the adjusted model meets the pre-training cutoff condition, at which point pre-training ends and a preliminary recommendation model is obtained.

2. The method according to claim 1, characterized in that the recommendation model to be trained includes a content representation construction module, a behavior representation construction module and a prediction module, wherein the input end and the output end of the behavior representation construction module are respectively connected to the output end of the content representation construction module and the input end of the prediction module;

The content representation construction module is used to process and obtain a multi-modal vector representation of the content in a multi-domain universal content representation space based on the input multi-modal information;

The behavior representation construction module is used to obtain, according to the ordering of the content in the single-source domain behavior sequence and the multi-modal vector representation of each content in the single-source domain behavior sequence in the multi-domain universal content representation space, the behavior vector representation of the object in the source domain to which the single-source domain behavior sequence belongs;

The prediction module is used to predict, based on the behavior vector representation, the first content of the same source domain triggered by the object after triggering the end content of the single-source domain behavior sequence, and, when the pre-training cutoff condition is not met, to adjust the parameters of the behavior representation construction module and/or the content representation construction module through back propagation.

3. The method according to claim 2, characterized in that the content representation construction module includes: a multi-modal content representation constructor and a multi-domain mapper;

Using the multi-modal information corresponding to the content as the input of the recommendation model to be trained, and obtaining, through processing of the input multi-modal information by the recommendation model to be trained, the multi-modal vector representation of the content in the multi-domain universal content representation space, includes:

The multi-modal information corresponding to the content is input into the multi-modal content representation constructor, and the multi-modal information is jointly represented through the multi-modal content representation constructor to obtain a multi-modal representation of the content in the source domain to which it belongs;

The multi-modal representation is mapped to the multi-domain universal content representation space by the multi-domain mapper, thereby obtaining a multi-modal vector representation of the content in the multi-domain universal content representation space.

4. The method according to claim 3, wherein the information of at least two different modalities includes image modality information and text modality information; the multi-modal content representation constructor includes a visual and language joint representation model and an adaptation layer, and the adaptation layer includes an image modality self-attention module, a text modality self-attention module, and an image-text cross-modality self-attention module;

Inputting the multi-modal information corresponding to the content into the multi-modal content representation constructor, and jointly representing the multi-modal information through the multi-modal content representation constructor to obtain the multi-modal representation of the content in the source domain to which it belongs, includes:

The image modality information and the text modality information corresponding to the content are jointly input into the visual and language joint representation model, and joint representation learning is performed on the image modality information and the text modality information through the visual and language joint representation model, to obtain a first image modality vector representation and a first text modality vector representation of the content in the source domain to which it belongs;

The first image modality vector representation is learned through the image modality self-attention module, the first text modality vector representation is learned through the text modality self-attention module, and the first image modality vector representation and the first text modality vector representation are jointly learned through the image-text cross-modality self-attention module, to obtain a second image modality vector representation, a second text modality vector representation, and a first cross-modal vector representation of the content in the source domain to which it belongs, output respectively by the image modality self-attention module, the text modality self-attention module, and the image-text cross-modality self-attention module.

5. The method according to claim 4, wherein the multi-domain mapper includes a mapping layer, a splicing layer and a multi-layer perceptron;

Mapping the multi-modal representation to the multi-domain universal content representation space through the multi-domain mapper to obtain a multi-modal vector representation of the content in the multi-domain universal content representation space includes:

Through the multi-domain mapper, the second image modality vector representation, the second text modality vector representation and the first cross-modal vector representation of the content in the source domain to which it belongs are respectively mapped to the multi-domain universal content representation space, to obtain a first mapping result corresponding to the second image modality vector representation, a second mapping result corresponding to the second text modality vector representation, and a third mapping result corresponding to the first cross-modal vector representation;

The first mapping result, the third mapping result and the second mapping result are sequentially spliced through the splicing layer to obtain a splicing result;

The multi-layer perceptron performs dimensionality reduction processing on the splicing result to obtain a multi-modal vector representation of the content in the multi-domain universal content representation space.
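As a minimal sketch of the splicing and dimensionality-reduction steps of claim 5, the following Python fragment concatenates three mapping results in the stated order (first, third, second) and passes the result through a small multi-layer perceptron. The vector width, the hidden width, the random initialisation, the ReLU activation, and all function names are illustrative assumptions and are not taken from the application.

```python
import random


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero bias (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def splice_and_reduce(first, second, third, layers):
    # Splice in the order the claim states: first, then third, then second.
    spliced = first + third + second
    hidden = spliced
    for index, (weights, bias) in enumerate(layers):
        hidden = linear(hidden, weights, bias)
        if index < len(layers) - 1:  # ReLU between layers is an assumption
            hidden = [max(0.0, v) for v in hidden]
    return spliced, hidden


rng = random.Random(0)
dim = 4  # hypothetical width of each mapping result
first_map = [rng.random() for _ in range(dim)]   # image modality
second_map = [rng.random() for _ in range(dim)]  # text modality
third_map = [rng.random() for _ in range(dim)]   # image-text cross-modality
layers = [init_layer(3 * dim, 6, rng), init_layer(6, dim, rng)]
spliced, content_vector = splice_and_reduce(first_map, second_map, third_map, layers)
print(len(spliced), len(content_vector))
```

In this sketch the perceptron reduces the spliced vector from three times the per-modality width back to a single width, which is one plausible reading of the dimensionality reduction described in the claim.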

6. The method according to claim 5, characterized in that the mapping layer includes a whitening layer and a hybrid expert network layer; the whitening layer includes a first whitening module, a second whitening module and a third whitening module corresponding respectively to the image modality, the text modality and the image-text cross-modality; the hybrid expert network layer includes a first hybrid expert network, a second hybrid expert network and a third hybrid expert network corresponding respectively to the image modality, the text modality and the image-text cross-modality; the first hybrid expert network, the second hybrid expert network and the third hybrid expert network all adopt a gating mechanism oriented to the multiple domains involved in the multi-domain universal content representation space; the multiple domains include the source domain to which the content belongs;

Mapping, through the multi-domain mapper, the second image modality vector representation, the second text modality vector representation and the first cross-modal vector representation of the content in the source domain to which it belongs respectively to the multi-domain universal content representation space, to obtain the first mapping result corresponding to the second image modality vector representation, the second mapping result corresponding to the second text modality vector representation, and the third mapping result corresponding to the first cross-modal vector representation, includes:

The second image modality vector representation, the second text modality vector representation and the first cross-modal vector representation are whitened through the first whitening module, the second whitening module and the third whitening module respectively, to obtain a first whitening result, a second whitening result and a third whitening result corresponding respectively to the second image modality vector representation, the second text modality vector representation and the first cross-modal vector representation;

The first whitening result, the second whitening result and the third whitening result are processed with the gating mechanism through the first hybrid expert network, the second hybrid expert network and the third hybrid expert network respectively, to obtain the first mapping result, the second mapping result and the third mapping result.
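The whitening and gated hybrid expert steps of claim 6 can be illustrated, for a single modality, by the sketch below. It assumes a parametric whitening of the form "subtract a learned shift, then apply a learned linear map" and a softmax gate that weights the outputs of several linear experts; the claim itself does not specify either form, so both, along with the sizes and names used, are hypothetical.

```python
import math
import random


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def whiten(x, shift, transform):
    # Assumed parametric form: subtract a learned shift, then apply a learned linear map.
    return matvec(transform, [v - s for v, s in zip(x, shift)])


def gated_experts(x, gate_matrix, experts):
    # One gate weight per expert; the output is the gate-weighted sum of expert outputs.
    gates = softmax(matvec(gate_matrix, x))
    outputs = [matvec(expert, x) for expert in experts]
    mixed = [sum(g * out[i] for g, out in zip(gates, outputs))
             for i in range(len(outputs[0]))]
    return gates, mixed


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, n_experts = 4, 3  # hypothetical sizes
second_image_vector = [rng.random() for _ in range(dim)]
shift = [0.5] * dim
transform = random_matrix(dim, dim, rng)
gate_matrix = random_matrix(n_experts, dim, rng)
experts = [random_matrix(dim, dim, rng) for _ in range(n_experts)]

first_whitening_result = whiten(second_image_vector, shift, transform)
gates, first_mapping_result = gated_experts(first_whitening_result, gate_matrix, experts)
```

The same two functions would be applied with separate parameters to the text modality and the image-text cross-modality to produce the second and third mapping results.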

7. The method of claim 2, wherein the behavior representation construction module includes an object behavior encoder;

Obtaining, according to the ordering of the content in the single-source domain behavior sequence and the multi-modal vector representation of each content in the single-source domain behavior sequence in the multi-domain universal content representation space, the behavior vector representation of the object in the source domain to which the single-source domain behavior sequence belongs includes:

According to the sorting of the content in the single-source domain behavior sequence, obtain the position information of the content in the single-source domain behavior sequence;

According to the multi-modal vector representation of each content in the single-source domain behavior sequence in the multi-domain universal content representation space, and the position information of the content in the single-source domain behavior sequence, a content representation sequence corresponding to the single-source domain behavior sequence is obtained;

The content representation sequence is used as the input of the object behavior encoder, and the content representation sequence is encoded by the object behavior encoder to obtain the behavior vector representation of the object in the source domain to which the single-source domain behavior sequence belongs.

8. The method according to claim 7, characterized in that obtaining the content representation sequence corresponding to the single-source domain behavior sequence, according to the multi-modal vector representation of each content in the single-source domain behavior sequence in the multi-domain universal content representation space and the position information of the content in the single-source domain behavior sequence, includes:

Encoding the position information of the content in the single-source domain behavior sequence into a position vector representation;

Add the multi-modal vector representation of the same content in the multi-domain universal content representation space and the position vector representation in the single-source domain behavior sequence to obtain the vector superposition result corresponding to the content;

According to the vector superposition results corresponding to each content in the single-source domain behavior sequence, a content representation sequence corresponding to the single-source domain behavior sequence is constructed.
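The vector superposition of claim 8 can be sketched as follows. The sketch assumes that position information is encoded by looking up a row in a position table indexed by the content's place in the sequence; the claim does not state how the position vector is produced, so the table and the function name are hypothetical.

```python
def build_content_representation_sequence(content_vectors, position_table):
    """Add each content's vector to the position vector of its slot in the sequence."""
    sequence = []
    for position, content_vector in enumerate(content_vectors):
        position_vector = position_table[position]  # position info encoded as a vector
        sequence.append([c + p for c, p in zip(content_vector, position_vector)])
    return sequence


# Three contents in trigger order, each with a 2-dimensional multi-modal vector.
content_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
position_table = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
content_representation_sequence = build_content_representation_sequence(
    content_vectors, position_table
)
```

The resulting list keeps the original order of the behavior sequence, so it can be passed directly to a sequence encoder.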

9. The method according to any one of claims 1 to 5, characterized in that the pre-training cutoff condition includes a first condition and a second condition, wherein the first condition is a condition on the prediction loss and the second condition is a condition on the comprehensive contrastive learning loss; the first condition includes: the prediction loss is less than a first loss threshold, wherein the prediction loss is obtained from the difference between the predicted first content of the same source domain and the first content of the same source domain actually triggered;

The second condition includes: the comprehensive contrastive learning loss is less than a second loss threshold, wherein the comprehensive contrastive learning loss is the loss associated with a cross-domain sequence-content contrastive learning task and a cross-domain sequence-sequence contrastive learning task;

In the cross-domain sequence-content contrastive learning task, the first content of the same source domain actually triggered after the end content of the single-source domain behavior sequence is used as a positive example, and the contents of other source domains involved in other single-source domain behavior sequences input into the model in the same batch as the single-source domain behavior sequence are used as negative examples;

In the cross-domain sequence-sequence contrastive learning task, the data-missing sequence corresponding to the single-source domain behavior sequence is used as a positive example, and the single-source domain behavior sequences of other source domains input into the model in the same batch as the single-source domain behavior sequence are used as negative examples; the data-missing sequence is obtained by randomly discarding content in the single-source domain behavior sequence, or by randomly discarding one or more modalities of information corresponding to content in the single-source domain behavior sequence.
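The two ingredients of claim 9, a contrastive loss over one positive example and several in-batch negative examples, and a data-missing sequence built by random discarding, can be sketched as below. The claim does not name a loss function; an InfoNCE-style loss with a temperature is substituted here as an assumption, and the function names, temperature value, and drop probability are all hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: low when the anchor is closer to the positive than to the negatives."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, negative) / temperature for negative in negatives]
    peak = max(logits)  # log-sum-exp with the maximum subtracted for stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


def drop_items(sequence, drop_prob, rng):
    """Build a data-missing sequence by randomly discarding content, keeping the order."""
    kept = [item for item in sequence if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(sequence)]  # never return an empty sequence


# Sequence-content task: the behavior vector is the anchor, the content actually
# triggered next is the positive, contents from other sequences are the negatives.
behavior_vector = [1.0, 0.0]
next_content = [1.0, 0.0]
other_domain_contents = [[0.0, 1.0], [-1.0, 0.0]]
sequence_content_loss = info_nce(behavior_vector, next_content, other_domain_contents)

# Sequence-sequence task: the positive is the encoding of a data-missing sequence.
data_missing_sequence = drop_items(["a", "b", "c", "d", "e"], 0.3, random.Random(0))
```

Discarding one or more modalities instead of whole contents would follow the same pattern, with the discarding applied to each content's modality inputs rather than to the sequence itself.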

10. A model adjustment method, characterized in that it is used to adjust the preliminary recommendation model obtained by pre-training with the method according to any one of claims 1 to 9, so as to migrate the preliminary recommendation model from the source domain to a target domain; the model adjustment method includes:

Obtain the multi-domain mixed flow behavior sequence of the target object; the multi-domain mixed flow behavior sequence includes multiple contents in multiple domains, and the multiple contents in the multiple domains are sorted from earliest to latest according to the time at which they were triggered by the target object; the multiple domains involved in the multi-domain mixed flow behavior sequence include the target domain;

Based on the multi-modal information corresponding to each content in the multi-domain mixed flow behavior sequence, the multi-modal vector representation of each content in the multi-domain mixed flow behavior sequence in the multi-domain universal content representation space is obtained through the preliminary recommendation model;

According to the ordering of each content in the multi-domain mixed flow behavior sequence and the multi-modal vector representation of each content in the multi-domain mixed flow behavior sequence in the multi-domain universal content representation space, a multi-domain mixed flow behavior vector representation of the target object corresponding to the multi-domain mixed flow behavior sequence is obtained;

The preliminary recommendation model predicts, based on the multi-domain mixed flow behavior vector representation, the content of the first target domain triggered by the target object after triggering the end content of the multi-domain mixed flow behavior sequence;

Iteratively adjust the parameters of the preliminary recommendation model according to the difference between the predicted content of the first target domain and the content of the first target domain actually triggered by the target object after the end content of the multi-domain mixed flow behavior sequence, until the adjusted model meets the preset fine-tuning cutoff condition, at which point the adjustment ends and the target recommendation model is obtained.

11. A recommendation method, characterized in that the target recommendation model adjusted by the model adjustment method of claim 10 is used for recommendation; the recommendation method includes:

Obtain the historical behavior sequence of the object to be recommended; the historical behavior sequence includes at least content belonging to the target domain, and the contents in the historical behavior sequence are sorted from earliest to latest according to the time at which they were triggered by the object to be recommended;

Based on the multi-modal information corresponding to each content in the historical behavior sequence, through the target recommendation model, the multi-modal vector representation of each content in the historical behavior sequence in the multi-domain universal content representation space is obtained respectively;

According to the ordering of each content in the historical behavior sequence and the multi-modal vector representation of each content in the historical behavior sequence in the multi-domain universal content representation space, a historical behavior vector representation of the object to be recommended corresponding to the historical behavior sequence is obtained;

The target recommendation model predicts, based on the historical behavior vector representation, the content of the first target domain triggered after the object to be recommended triggers the end content of the historical behavior sequence;

The content of the first target domain predicted by the target recommendation model is recommended to the object to be recommended.

12. A pre-training device for a recommendation model, characterized by including:

The behavior sequence acquisition module is used to obtain the single-source domain behavior sequence of the object; the single-source domain behavior sequence includes multiple contents of the same source domain, and the multiple contents are sorted from earliest to latest according to the time at which they were triggered by the object;

A multi-modal information acquisition module is used to obtain multi-modal information corresponding to the content in the single-source domain behavior sequence; the multi-modal information includes information of at least two different modalities;

The information input determination module is used to take the multi-modal information corresponding to the content as the input of the recommendation model to be trained, and to obtain, by the recommendation model to be trained and based on the input multi-modal information, the multi-modal vector representation of the content in the multi-domain universal content representation space;

A behavior representation construction module, configured to obtain, based on the sorting of the contents in the single-source domain behavior sequence and the multi-modal vector representation of each content in the single-source domain behavior sequence in the multi-domain universal content representation space, the behavior vector representation of the object in the source domain to which the single-source domain behavior sequence belongs;

A same source domain content prediction module, configured to predict, by the recommendation model to be trained and based on the behavior vector representation, the first content of the same source domain that the object triggers after triggering the end content of the single-source domain behavior sequence;

The preliminary recommendation model acquisition module is used to iteratively adjust the parameters of the recommendation model to be trained, based on the difference between the predicted first content of the same source domain and the first content of the same source domain actually triggered by the object after the end content of the single-source domain behavior sequence, until the adjusted model meets the pre-training cutoff condition, whereupon pre-training is complete and the preliminary recommendation model is obtained.
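The iterative adjustment performed by the last module of claim 12 can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the content vectors are treated as fixed, the only trainable parameters are one hypothetical weight per sequence position, the "difference" is a squared error between the predicted and the actual next-content vector, and the cutoff condition is a loss threshold or a step limit. The claims do not prescribe any of these choices.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def predict(weights: Sequence[float], sequence: Sequence[Vector]) -> Vector:
    """Predicted next-content vector: weighted sum of the sequence's content vectors."""
    dim = len(sequence[0])
    return [sum(w * x[i] for w, x in zip(weights, sequence)) for i in range(dim)]


def squared_error(prediction: Vector, target: Vector) -> float:
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


def pretrain(sequence: Sequence[Vector],
             actual_next: Vector,
             learning_rate: float = 0.1,
             loss_cutoff: float = 1e-6,
             max_steps: int = 1000) -> Tuple[List[float], float]:
    """Gradient descent on the position weights until the assumed cutoff is met."""
    weights = [1.0 / len(sequence)] * len(sequence)
    for _ in range(max_steps):
        prediction = predict(weights, sequence)
        if squared_error(prediction, actual_next) < loss_cutoff:
            break  # assumed pre-training cutoff condition
        residual = [p - t for p, t in zip(prediction, actual_next)]
        # d(loss)/d(w_t) = 2 * residual . x_t for the squared-error loss above
        gradients = [2.0 * dot(residual, x) for x in sequence]
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    return weights, squared_error(predict(weights, sequence), actual_next)
```

In a realistic setting the trainable parameters would belong to the multi-modal encoder and the sequence model rather than to per-position weights; the loop structure (predict, compare with the actually triggered content, update, check the cutoff) is the part this sketch is meant to show.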

13. A model adjustment device, characterized in that it includes:

The mixed flow behavior sequence acquisition module is used to obtain the multi-domain mixed flow behavior sequence of the target object; the multi-domain mixed flow behavior sequence includes multiple contents from multiple domains, sorted from earliest to latest by the time at which the target object triggered them; the domains involved in the multi-domain mixed flow behavior sequence include the target domain;

A multi-modal vector representation obtaining module, configured to obtain, through the preliminary recommendation model and based on the multi-modal information corresponding to each content in the multi-domain mixed flow behavior sequence, the multi-modal vector representation of each content in the multi-domain mixed flow behavior sequence in the multi-domain universal content representation space;

A mixed flow behavior vector representation acquisition module, configured to obtain, according to the sorting of the contents in the multi-domain mixed flow behavior sequence and the multi-modal vector representation of each content in the multi-domain mixed flow behavior sequence in the multi-domain universal content representation space, the multi-domain mixed flow behavior vector representation of the target object corresponding to the multi-domain mixed flow behavior sequence;

A target domain content prediction module, configured to predict, through the preliminary recommendation model and based on the multi-domain mixed flow behavior vector representation, the first content of the target domain that the target object triggers after triggering the end content of the multi-domain mixed flow behavior sequence;

A target recommendation model acquisition module, configured to iteratively adjust the parameters of the preliminary recommendation model, based on the difference between the predicted first content of the target domain and the first content of the target domain actually triggered by the target object after the end content of the multi-domain mixed flow behavior sequence, until the adjusted model meets the preset fine-tuning cutoff condition, whereupon the adjustment is complete and the target recommendation model is obtained.

14. A recommendation device, characterized in that it includes:

The historical behavior sequence acquisition module is used to obtain the historical behavior sequence of the object to be recommended; the historical behavior sequence includes at least content belonging to the target domain, and the contents in the historical behavior sequence are sorted from earliest to latest by the time at which the object to be recommended triggered them;

The historical multi-modal vector representation acquisition module is used to obtain, through the target recommendation model and based on the multi-modal information corresponding to each content in the historical behavior sequence, the multi-modal vector representation of each content in the historical behavior sequence in the multi-domain universal content representation space;

A historical behavior vector representation acquisition module, configured to obtain, according to the sorting of the contents in the historical behavior sequence and the multi-modal vector representation of each content in the historical behavior sequence in the multi-domain universal content representation space, the historical behavior vector representation of the object to be recommended corresponding to the historical behavior sequence;

A historical target domain content prediction module, configured to predict, through the target recommendation model and based on the historical behavior vector representation, the first content of the target domain that the object to be recommended triggers after triggering the end content of the historical behavior sequence;

A target domain content recommendation module is configured to recommend the first content of the target domain predicted by the target recommendation model to the object to be recommended.

15. A computer device, characterized in that the device includes a processor and a memory:

The memory is used to store a computer program and transmit the computer program to the processor;

The processor is configured to execute, according to the instructions in the computer program, the steps of the pre-training method of the recommendation model according to any one of claims 1 to 9, the steps of the model adjustment method according to claim 10, or the steps of the recommendation method according to claim 11.