
CN115829653A - Method, device, equipment and medium for determining relevancy of advertisement text - Google Patents


Info

Publication number
CN115829653A
Authority
CN
China
Prior art keywords
advertisement
text
texts
edge
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211541802.4A
Other languages
Chinese (zh)
Inventor
谭云飞
刘晓庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211541802.4A
Publication of CN115829653A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a method, apparatus, device, and medium for determining the relevance of advertisement text, and relates to the technical field of artificial intelligence, in particular to natural language processing. The implementation scheme is: acquiring first and second feature vector representations corresponding respectively to first and second advertisement texts; and determining the relevance between the first and second advertisement texts based on the first and second feature vector representations, wherein the feature vector representation of an advertisement text is obtained by the following determination process: acquiring historical advertisement data; acquiring revenue information of the advertisement texts; determining, based on the co-occurrence relationship among the multiple advertisement texts in the historical advertisement data, at least one associated text corresponding to each advertisement text; and determining the feature vector representation of each advertisement text based on the semantics and revenue information of that advertisement text and of its corresponding associated texts.

Figure 202211541802

Description

Method, Apparatus, Device, and Medium for Determining the Relevance of Advertisement Text

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to natural language processing, and specifically to a method, apparatus, electronic device, computer-readable storage medium, and computer program product for determining the relevance of advertisement text.

Background

Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning); it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing. Artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.

In advertising and marketing scenarios, advertisements usually need to be actively recommended to users, in particular advertisements related to a specific piece of content. A recommendation strategy therefore needs to be determined based on the relevance between candidate content and that specific content.

The approaches described in this section are not necessarily approaches that have been previously conceived or employed. Unless otherwise indicated, it should not be assumed that any approach described in this section qualifies as prior art merely by virtue of its inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered to have been recognized in any prior art.

Summary

The present disclosure provides a method, apparatus, electronic device, computer-readable storage medium, and computer program product for determining the relevance of advertisement text.

According to one aspect of the present disclosure, a method for determining the relevance of advertisement text is provided, including: acquiring a first feature vector representation corresponding to a first advertisement text and a second feature vector representation corresponding to a second advertisement text; and determining the relevance between the first advertisement text and the second advertisement text based on the first feature vector representation and the second feature vector representation, wherein the first feature vector representation and the second feature vector representation are obtained by the following determination process: acquiring historical advertisement data containing a plurality of advertisement texts, the plurality of advertisement texts including the first advertisement text and the second advertisement text; acquiring revenue information of each of the plurality of advertisement texts; for each of the plurality of advertisement texts, determining, based on the co-occurrence relationship among the plurality of advertisement texts in the historical advertisement data, at least one associated text corresponding to that advertisement text from among at least one other advertisement text; and for each of the plurality of advertisement texts, determining the feature vector representation corresponding to that advertisement text based on the semantics and revenue information of that advertisement text and the semantics and revenue information of the corresponding at least one associated text.

According to another aspect of the present disclosure, a method for recommending advertisement text is also provided, including: acquiring a target advertisement text and a plurality of candidate advertisement texts; determining the relevance between the target advertisement text and the plurality of candidate advertisement texts by using the above method for determining the relevance of advertisement text; and determining at least one advertisement text to be recommended from the plurality of candidate advertisement texts based on the relevance between the target advertisement text and the plurality of candidate advertisement texts.
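The recommendation method above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names `recommend` and `token_overlap`, the parameter `k`, and the sample texts are hypothetical, and `token_overlap` is only a stand-in for the feature-vector-based relevance that the disclosure describes.

```python
def token_overlap(a: str, b: str) -> float:
    """Placeholder relevance: Jaccard overlap of whitespace tokens.

    This is NOT the disclosed method; it only stands in for a relevance
    function so that the ranking step can be shown end to end.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def recommend(target, candidates, relevance, k=1):
    """Return the k candidate texts with the highest relevance to the target."""
    ranked = sorted(candidates, key=lambda c: relevance(target, c), reverse=True)
    return ranked[:k]


picked = recommend(
    "running shoes",
    ["trail running shoes", "office chair", "running socks"],
    relevance=token_overlap,
    k=2,
)
print(picked)
```

Passing the relevance function as a parameter keeps the ranking step independent of how relevance is computed, so a feature-vector-based measure could be substituted without changing `recommend`.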

According to another aspect of the present disclosure, an apparatus for determining the relevance of advertisement text is provided, including: a first acquisition unit configured to acquire a first feature vector representation corresponding to a first advertisement text and a second feature vector representation corresponding to a second advertisement text; and a first determination unit configured to determine the relevance between the first advertisement text and the second advertisement text based on the first feature vector representation and the second feature vector representation, wherein the first feature vector representation and the second feature vector representation are obtained by a feature vector determination unit, and the feature vector determination unit includes: a first acquisition subunit configured to acquire historical advertisement data containing a plurality of advertisement texts, the plurality of advertisement texts including the first advertisement text and the second advertisement text; a second acquisition subunit configured to acquire revenue information of each of the plurality of advertisement texts; a first determination subunit configured to, for each of the plurality of advertisement texts, determine, based on the co-occurrence relationship among the plurality of advertisement texts in the historical advertisement data, at least one associated text corresponding to that advertisement text from among at least one other advertisement text; and a second determination subunit configured to, for each of the plurality of advertisement texts, determine the feature vector representation corresponding to that advertisement text based on the semantics and revenue information of that advertisement text and the semantics and revenue information of the corresponding at least one associated text.

According to another aspect of the present disclosure, an apparatus for recommending advertisement text is also provided, including: a second acquisition unit configured to acquire a target advertisement text and a plurality of candidate advertisement texts; the apparatus for determining the relevance of advertisement text as described above, configured to determine the relevance between the target advertisement text and the plurality of candidate advertisement texts; and a second determination unit configured to determine at least one advertisement text to be recommended from the plurality of candidate advertisement texts based on the relevance between the target advertisement text and the plurality of candidate advertisement texts.

According to another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the above method for determining the relevance of advertisement text.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the above method for determining the relevance of advertisement text.

According to another aspect of the present disclosure, a computer program product is provided, including a computer program, wherein the computer program, when executed by a processor, implements the above method for determining the relevance of advertisement text.

According to one or more embodiments of the present disclosure, the relevance between advertisement texts can be determined more accurately.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief Description of the Drawings

The drawings illustrate embodiments by way of example and form part of the specification; together with the written description, they serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, like reference numbers designate similar, but not necessarily identical, elements.

FIG. 1 shows a schematic diagram of an exemplary system in which the various methods described herein may be implemented, according to an exemplary embodiment of the present disclosure;

FIG. 2 shows a flowchart of a process for determining a feature vector representation of an advertisement text, according to an exemplary embodiment of the present disclosure;

FIG. 3 shows a flowchart of a method for determining the relevance of advertisement text, according to an exemplary embodiment of the present disclosure;

FIG. 4 shows a schematic diagram of a text graph, according to an exemplary embodiment of the present disclosure;

FIG. 5 shows a structural block diagram of a feature vector determination unit, according to an exemplary embodiment of the present disclosure;

FIG. 6 shows a structural block diagram of an apparatus for determining the relevance of advertisement text, according to an exemplary embodiment of the present disclosure;

FIG. 7 shows a structural block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.

In the present disclosure, unless otherwise stated, the use of the terms "first", "second", and so on to describe various elements is not intended to limit the positional, temporal, or importance relationship of these elements; such terms are only used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, while in other cases, depending on the context, they may refer to different instances.

The terminology used in the description of the various examples in the present disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of an element is not specifically limited, there may be one or more of that element. In addition, the term "and/or" as used in the present disclosure covers any one of, and all possible combinations of, the listed items.

In the related art, one implementation determines the relevance between advertisement texts from their semantic similarity; alternatively, the relevance can be determined from the co-occurrence relationship between advertisement texts in historical data. However, both approaches have limited ability to express the relevance features among multiple advertisement texts, and neither takes into account the revenue information corresponding to the advertisement texts.

In view of this, the present disclosure provides a method for determining the relevance of advertisement text. The method constructs a text graph from the co-occurrence relationship among multiple advertisement texts in historical advertisement data, determines the feature vector representation of the advertisement text corresponding to each node from the semantics and revenue information of that node and of its associated nodes in the text graph, and then uses the feature vector representations to determine the relevance between advertisement texts. In this way, the relevance between advertisement texts is determined not only from the associations between texts captured by the graph structure of the text graph but also from the revenue information, which improves accuracy.
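As a rough illustration of building a text graph from co-occurrence, the following Python sketch counts how often two advertisement texts appear together. It rests on assumptions not stated in this section: that the historical advertisement data can be grouped into "sessions" of texts that appeared together, that edges are undirected, and that edge weights are raw co-occurrence counts. The name `build_text_graph` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_text_graph(sessions):
    """Build an undirected co-occurrence graph over advertisement texts.

    sessions: iterable of lists of texts that appeared together (what counts
    as "together" is an assumption of this sketch).
    Returns {text: {neighbor_text: co_occurrence_count}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        unique_texts = sorted(set(session))
        for text in unique_texts:
            graph[text]  # touch the key so isolated texts still become nodes
        for a, b in combinations(unique_texts, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(neighbors) for node, neighbors in graph.items()}


sessions = [
    ["running shoes", "sports socks"],
    ["running shoes", "sports socks", "fitness tracker"],
    ["fitness tracker"],
]
graph = build_text_graph(sessions)
print(graph["running shoes"])
```

Each node's neighbors in such a graph are natural candidates for its associated texts; a threshold on the count could be applied to keep only the stronger associations.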

Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.

FIG. 1 shows a schematic diagram of an exemplary system 100 in which the various methods and apparatuses described herein may be implemented, according to an embodiment of the present disclosure. Referring to FIG. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. The client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.

In an embodiment of the present disclosure, the server 120 may run one or more services or software applications that enable the method for determining the relevance of advertisement text to be performed.

In some embodiments, the server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example to users of the client devices 101, 102, 103, 104, 105, and/or 106 under a software-as-a-service (SaaS) model.

In the configuration shown in FIG. 1, the server 120 may include one or more components that implement the functions performed by the server 120. These components may include software components executable by one or more processors, hardware components, or a combination thereof. Users operating the client devices 101, 102, 103, 104, 105, and/or 106 may in turn use one or more client applications to interact with the server 120 and make use of the services provided by these components. It should be understood that various system configurations are possible, which may differ from the system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.

A user may use the client devices 101, 102, 103, 104, 105, and/or 106 to send advertisement text. A client device may provide an interface that enables its user to interact with it, and may also output information to the user via that interface. Although FIG. 1 depicts only six client devices, those skilled in the art will understand that the present disclosure can support any number of client devices.

The client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general-purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart-screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, and sensors or other sensing devices. These computer devices can run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, and Linux or Linux-like operating systems (such as GOOGLE Chrome OS), or various mobile operating systems, such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular phones, smartphones, tablet computers, personal digital assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. Gaming systems may include various handheld gaming devices, Internet-enabled gaming devices, and the like. A client device can execute a variety of applications, such as Internet-related applications, communication applications (for example, email applications), and Short Message Service (SMS) applications, and can use a variety of communication protocols.

The network 110 may be any type of network known to those skilled in the art that can support data communication using any of a variety of available protocols (including but not limited to TCP/IP, SNA, and IPX). By way of example only, the one or more networks 110 may be a local area network (LAN), an Ethernet-based network, a token ring, a wide area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a blockchain network, a public switched telephone network (PSTN), an infrared network, a wireless network (for example, Bluetooth or Wi-Fi), and/or any combination of these and/or other networks.

The server 120 may include one or more general-purpose computers, dedicated server computers (for example, PC (personal computer) servers, UNIX servers, or midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running virtual operating systems, or other computing architectures involving virtualization (for example, one or more flexible pools of logical storage devices that can be virtualized to maintain virtual storage devices for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.

The computing units in the server 120 may run one or more operating systems, including any of the operating systems described above and any commercially available server operating system. The server 120 may also run any of a variety of additional server applications and/or middle-tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.

In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. The server 120 may also include one or more applications to display the data feeds and/or real-time events via one or more display devices of the client devices 101, 102, 103, 104, 105, and 106.

In some implementations, the server 120 may be a server of a distributed system or a server combined with a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in the cloud computing service system that addresses the management difficulty and weak business scalability of traditional physical hosts and virtual private server (VPS) services.

The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The databases 130 may reside in various locations. For example, a database used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with it via a network-based or dedicated connection. The databases 130 may be of different types. In some embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to commands.

In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by applications may be of different types, such as key-value stores, object stores, or regular stores backed by a file system.

The system 100 of FIG. 1 may be configured and operated in various ways so that the various methods and apparatuses described in the present disclosure can be applied.

FIG. 2 shows a flowchart of a process 200 for determining a feature vector representation of an advertisement text according to an exemplary embodiment of the present disclosure. As shown in FIG. 2, the process 200 includes:

Step S201: acquiring historical advertisement data containing a plurality of advertisement texts, the plurality of advertisement texts including the first advertisement text and the second advertisement text;

Step S202: acquiring revenue information of each of the plurality of advertisement texts;

Step S203: for each of the plurality of advertisement texts, determining, based on the co-occurrence relationship among the plurality of advertisement texts in the historical advertisement data, at least one associated text corresponding to that advertisement text from among at least one other advertisement text; and

Step S204: for each of the plurality of advertisement texts, determining the feature vector representation corresponding to that advertisement text based on the semantics and revenue information of that advertisement text and the semantics and revenue information of the corresponding at least one associated text.
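Step S204 can be sketched in Python under strong simplifying assumptions. This section does not specify how semantics and revenue are combined, so the sketch assumes that each text already has a semantic embedding (a list of floats), that revenue is a single normalized number, and that a text's feature vector is a weighted average of its own features and the mean features of its associated texts. The names `feature_vector` and `self_weight` and all sample values are hypothetical.

```python
def feature_vector(text, associated, semantic, revenue, self_weight=0.5):
    """Combine a text's semantics and revenue with those of its associated texts.

    associated: {text: [associated texts]}, e.g. the outcome of step S203.
    semantic:   {text: list of floats}, an assumed precomputed embedding.
    revenue:    {text: float}, an assumed normalized revenue score.
    """
    def node_features(t):
        # Assumption: revenue is appended to the embedding as one extra dimension.
        return list(semantic[t]) + [revenue[t]]

    own = node_features(text)
    neighbors = [node_features(t) for t in associated.get(text, [])]
    if not neighbors:
        return own
    mean = [sum(v[i] for v in neighbors) / len(neighbors) for i in range(len(own))]
    return [self_weight * o + (1 - self_weight) * m for o, m in zip(own, mean)]


semantic = {
    "running shoes": [1.0, 0.0],
    "sports socks": [0.8, 0.2],
    "fitness tracker": [0.0, 1.0],
}
revenue = {"running shoes": 0.9, "sports socks": 0.3, "fitness tracker": 0.6}
associated = {"running shoes": ["sports socks", "fitness tracker"]}

vectors = {t: feature_vector(t, associated, semantic, revenue) for t in semantic}
print(vectors["running shoes"])
```

A learned aggregation over the graph could replace the fixed weighted mean; the fixed weights are used here only to keep the example short and deterministic.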

FIG. 3 shows a flowchart of a method 300 for determining the relevance of advertisement text according to an exemplary embodiment of the present disclosure. As shown in FIG. 3, the method 300 includes:

Step S301: acquiring a first feature vector representation corresponding to a first advertisement text and a second feature vector representation corresponding to a second advertisement text, wherein the first feature vector representation and the second feature vector representation are obtained by the process 200; and

Step S302: determining the relevance between the first advertisement text and the second advertisement text based on the first feature vector representation and the second feature vector representation.
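This section does not state which measure step S302 uses to turn two feature vectors into a relevance value. As one common choice, and purely as an assumption, the sketch below uses cosine similarity; the function name `cosine_relevance` is hypothetical.

```python
import math


def cosine_relevance(u, v):
    """Cosine similarity of two equal-length vectors (an assumed relevance measure)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention of this sketch: a zero vector is relevant to nothing
    return dot / (norm_u * norm_v)


first_vector = [0.7, 0.3, 0.675]
second_vector = [0.8, 0.2, 0.3]
print(cosine_relevance(first_vector, second_vector))
```

Cosine similarity ignores vector magnitude, so if revenue is encoded as one dimension of the vector, its influence depends on how that dimension is scaled relative to the semantic dimensions.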

Thus, with the determination process 200, the co-occurrence relationships among the advertisement texts in the historical advertisement data are used to determine the associated texts of each advertisement text, and the feature vector representation of each advertisement text is then determined from the semantics and revenue information of that text and of its associated texts. The resulting feature vector representation characterizes more accurately how the advertisement text correlates with other advertisement texts, and also characterizes its revenue information. Further, determining the relevance between advertisement texts from these feature vector representations combines the associations indicated by text co-occurrence with revenue information, which improves accuracy.

In some examples, the advertisement text may be, for example, the title text of an advertisement recommended to a user, or the product-name text that a user actively queries, as long as it can represent information about the object being marketed; the present disclosure does not limit this.

In some examples, when the advertisement text is the title text of an advertisement recommended to a user, the revenue information of the advertisement text may be determined based on the cost of the advertisement. It should be understood that the revenue information may also include other content, for example revenue information of the marketing object represented by the advertisement text, such as product sales, page views, or video play counts; the present disclosure does not limit this.

In some examples, determining the relevance between the first advertisement text and the second advertisement text in step S302 may include: determining the relevance based on the similarity between the first feature vector representation and the second feature vector representation. The similarity may be determined, for example, by computing the Euclidean distance between the two feature vector representations, or by computing other measures such as cosine similarity or Manhattan distance.
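The three measures named above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the mapping from a distance to a similarity score (`distance_to_similarity`) is an assumption, since the text does not say how a distance is turned into a relevance value.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def distance_to_similarity(distance):
    """Assumed mapping (not from the source): distance 0 gives 1, larger distances approach 0."""
    return 1.0 / (1.0 + distance)
```

Cosine similarity ignores vector length, whereas the two distances do not, so the choice matters if revenue information affects the magnitude of the feature vectors.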

In some examples, after the process 200 has determined the feature vector representations of the plurality of advertisement texts, the advertisement texts and their feature vector representations may be stored in a database, so that in step S301 the first and second feature vector representations can be looked up in the database using the first and second advertisement texts, which makes relevance determination more efficient.

According to some embodiments, determining in step S203, for each advertisement text in the plurality of advertisement texts, at least one associated text corresponding to the advertisement text from at least one other advertisement text based on the co-occurrence relationships among the plurality of advertisement texts in the historical advertisement data includes: constructing, based on those co-occurrence relationships, a text graph comprising a plurality of nodes in one-to-one correspondence with the plurality of advertisement texts, wherein, for any two advertisement texts among the plurality of advertisement texts, a connecting edge is established between the nodes corresponding to the two advertisement texts in response to determining that the co-occurrence relationship between them satisfies a preset condition; and, for each node in the plurality of nodes, determining at least one associated node corresponding to the node from at least one other node based on the connection relationships among the plurality of nodes, so as to obtain the at least one associated text corresponding to the respective advertisement text.

Thus, with the determination process 200, a text graph is constructed from the co-occurrence relationships among the advertisement texts in the historical advertisement data, and the feature vector representation of the advertisement text corresponding to each node is determined from the semantics and revenue information of that node and of its associated nodes in the text graph. The resulting feature vector representation characterizes more accurately how the advertisement text correlates with other advertisement texts, and also characterizes its revenue information. Further, determining the relevance between advertisement texts from these feature vector representations combines the associations between texts captured by the graph structure of the text graph with revenue information, which improves accuracy.

According to some embodiments, determining in step S204, for each advertisement text in the plurality of advertisement texts, the feature vector representation of the advertisement text based on the semantics and revenue information of the advertisement text and of the corresponding at least one associated text includes: for each connecting edge of at least one connecting edge between the node corresponding to the advertisement text and the corresponding at least one associated node, determining an edge vector representation of the connecting edge based on the semantics and revenue information of the advertisement texts respectively corresponding to the two endpoints of the connecting edge; and determining the feature vector representation of the advertisement text based on the edge vector representations of the at least one connecting edge. In this way, an edge vector representation characterizes the relevance features of the advertisement texts at the two endpoints of each edge, and the edge vectors between each node and its associated nodes are then aggregated to obtain a more accurate feature vector representation of the text corresponding to each node.

In some examples, determining the feature vector representation of the advertisement text corresponding to the node based on the edge vector representations of the at least one connecting edge may include: computing the average of the edge vector representations of the at least one connecting edge to obtain the feature vector representation of the advertisement text corresponding to the node. This step may also be performed in other ways, for example by computing the feature vector representation from the edge vector representations of the at least one connecting edge according to a preset formula.
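The averaging variant can be sketched as an element-wise mean over the edge vectors incident to a node. The function name and the plain-list vector representation are illustrative assumptions.

```python
def mean_edge_vectors(edge_vectors):
    """Element-wise mean of the edge vectors attached to one node."""
    if not edge_vectors:
        raise ValueError("a node needs at least one connecting edge")
    dim = len(edge_vectors[0])
    if any(len(vec) != dim for vec in edge_vectors):
        raise ValueError("all edge vectors must have the same dimension")
    count = len(edge_vectors)
    return [sum(vec[i] for vec in edge_vectors) / count for i in range(dim)]


# Two hypothetical edge vectors for the same node.
node_vector = mean_edge_vectors([[1.0, 2.0], [3.0, 4.0]])
```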

According to some embodiments, determining the edge vector representation of the connecting edge based on the semantics and revenue information of the advertisement texts respectively corresponding to its two endpoints includes: determining the semantic similarity between the advertisement texts respectively corresponding to the two endpoints; and determining the edge vector representation of the connecting edge based on the semantic similarity and the revenue information of the advertisement texts respectively corresponding to the two endpoints. In this way, semantic similarity is used to characterize the correlation between the two advertisement texts, which is more convenient and accurate.

In some examples, semantic feature vector representations of the advertisement texts respectively corresponding to the two endpoints may be determined first, and the semantic similarity may then be determined from the similarity between the two semantic feature vector representations. A semantic feature vector representation may be obtained, for example, by inputting the advertisement text into a language model, or by querying a database that stores a plurality of texts and their semantic feature vectors; the present disclosure does not limit this.

According to some embodiments, determining the semantic similarity between the advertisement texts respectively corresponding to the two endpoints includes: inputting the advertisement texts respectively corresponding to the two endpoints of the connecting edge into a pre-trained language model to obtain the semantic similarity output by the pre-trained language model, wherein the pre-trained language model is trained using annotated corpus data. In this way, the semantic similarity between two advertisement texts is obtained from a pre-trained language model, which improves efficiency and accuracy.

In some examples, the pre-trained language model may be, for example, an ERNIE model.

According to some embodiments, the determination process 200 further includes: for each connecting edge of the at least one connecting edge between the node corresponding to the advertisement text and the corresponding at least one associated node, determining the co-occurrence frequency between the advertisement texts respectively corresponding to the two endpoints of the connecting edge, wherein the edge vector representation of the connecting edge is determined based on the semantics and revenue information of the advertisement texts respectively corresponding to the two endpoints together with the co-occurrence frequency. In this way, the co-occurrence frequency of the two advertisement texts is taken into account, which indicates the correlation between them more precisely.

In some examples, the co-occurrence frequency may be the co-occurrence count cnt. Fusion information S of the co-occurrence count cnt and the revenue information acp may then be determined by the following formula:

S = a*cnt + b*acp

In this example, a and b are weight values that may be set according to actual needs.

According to some embodiments, the determination process 200 further includes: normalizing the co-occurrence frequency and the revenue information separately to obtain a normalized co-occurrence frequency and normalized revenue information, wherein the edge vector representation of the connecting edge is determined based on the semantics and normalized revenue information of the advertisement texts respectively corresponding to the two endpoints of the connecting edge together with the normalized co-occurrence frequency. In this way, the value ranges of the co-occurrence frequency and the revenue information are scaled to the same interval, which simplifies computation and yields a more accurate edge vector representation.

In some examples, the normalization may be performed using the following formula:

X′ = (x − x_min) / (x_max − x_min)

where x is the original revenue information or co-occurrence frequency, x_min is the minimum of the revenue information or co-occurrence frequency over all data, x_max is the corresponding maximum over all data, and X′ is the normalized revenue information or normalized co-occurrence frequency.
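As a small worked sketch, the code below normalizes co-occurrence counts and revenue values and then fuses them with S = a*cnt + b*acp. It assumes the normalization is min-max scaling, which is what the definitions of the minimum and maximum suggest; the equal weights, the sample numbers, and the handling of a constant column are illustrative assumptions.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; a constant column maps to 0.0 (assumed convention)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(x - lowest) / (highest - lowest) for x in values]


def fuse(cnt, acp, a=0.5, b=0.5):
    """Fusion information S = a*cnt + b*acp with configurable weights."""
    return a * cnt + b * acp


# Hypothetical per-edge co-occurrence counts and revenue values.
co_counts = [2, 10, 6]
revenues = [1.0, 3.0, 5.0]

norm_cnt = min_max_normalize(co_counts)
norm_acp = min_max_normalize(revenues)
scores = [fuse(c, r) for c, r in zip(norm_cnt, norm_acp)]
```

Without normalization, whichever quantity has the larger raw range would dominate S regardless of the weights a and b.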

According to some embodiments, determining the edge vector representation of the connecting edge based on the semantics and revenue information of the advertisement texts respectively corresponding to its two endpoints includes: inputting the advertisement texts respectively corresponding to the two endpoints of the connecting edge, together with their revenue information, into an edge vector encoding model to obtain the edge vector representation output by the edge vector encoding model, wherein the edge vector encoding model is trained as follows: acquiring a sample text graph comprising a plurality of nodes in one-to-one correspondence with a plurality of sample texts, and revenue information of each sample text in the plurality of sample texts, the sample text graph including a plurality of connecting edges that connect the plurality of nodes; for each connecting edge of the plurality of connecting edges included in the sample text graph, inputting the sample texts respectively corresponding to the two endpoints of the connecting edge, together with their revenue information, into the edge vector encoding model to obtain the edge vector representation of the connecting edge output by the edge vector encoding model; determining feature vector representations of the plurality of sample texts corresponding to the plurality of nodes based on the edge vector representations of the plurality of connecting edges; acquiring the true relevance between a first sample text and a second sample text in the plurality of sample texts; determining a predicted relevance between the first sample text and the second sample text based on a first feature vector representation and a second feature vector representation respectively corresponding to the first sample text and the second sample text; and adjusting parameters of the edge vector encoding model based on the true relevance and the predicted relevance. In this way, edge vector representations are obtained from the edge vector encoding model, the edge vector representations output by the model are used to perform the task of predicting relevance between texts, and the model is trained on that task, which improves efficiency and accuracy.

In some examples, the co-occurrence frequency of the advertisement texts respectively corresponding to the two endpoints of the connecting edge may also be input into the edge vector encoding model at the same time; for example, the fusion information S described above may be input into the edge vector encoding model, so as to obtain an edge vector representation that characterizes the relevance features between the two advertisement texts more accurately.

According to some embodiments, the plurality of advertisement texts include a plurality of historical query texts and a plurality of historical recommended texts, and the historical advertisement data include a plurality of text pairs, each of which comprises one historical query text and a historical recommended text that was recommended to a user based on that historical query text. Establishing an edge of the text graph with the nodes corresponding to the two advertisement texts as vertices, in response to determining that the co-occurrence relationship between the two advertisement texts satisfies the preset condition, includes: establishing the edge in response to determining that the historical advertisement data include a text pair consisting of the two advertisement texts. In this way, the correlation expressed by the mapping between historical query texts and historical recommended texts in the historical advertisement data is fully exploited to construct the edges of the text graph, so that each edge accurately indicates the correlation between its two endpoint nodes.

In some examples, the co-occurrence frequency between two advertisement texts may be determined based on the frequency with which the corresponding text pair appears in the historical advertisement data.
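A minimal sketch of building the text graph from (query text, recommended text) pairs and counting how often each pair occurs is given below. Treating a pair as unordered, skipping pairs whose two texts are identical, and the function and variable names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_text_graph(text_pairs):
    """Return (adjacency, pair_counts) for an undirected text graph.

    adjacency maps each text to the set of texts it co-occurs with;
    pair_counts maps frozenset({text_a, text_b}) to its co-occurrence count.
    """
    pair_counts = Counter()
    for query_text, recommended_text in text_pairs:
        if query_text == recommended_text:
            continue  # assumed: a text does not form an edge with itself
        pair_counts[frozenset((query_text, recommended_text))] += 1

    adjacency = defaultdict(set)
    for pair in pair_counts:
        text_a, text_b = tuple(pair)
        adjacency[text_a].add(text_b)
        adjacency[text_b].add(text_a)
    return dict(adjacency), pair_counts


# Hypothetical historical advertisement data.
history = [
    ("running shoes", "trail sneakers"),
    ("running shoes", "trail sneakers"),
    ("running shoes", "sports socks"),
]
adjacency, pair_counts = build_text_graph(history)
```

Here the preset condition for an edge is simply that the pair appears at least once; a stricter condition, such as a minimum count, could be applied to `pair_counts` before the adjacency is built.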

According to some embodiments, determining, for each node in the text graph, at least one associated node corresponding to the node from a plurality of other nodes based on the connection relationships among the plurality of nodes includes: for each other node of the plurality of other nodes, determining that the other node is an associated node in response to the number of connection hops between that other node and the node being no greater than a preset threshold. In this way, other nodes that are closer to the node are selected as associated nodes, and distant nodes are prevented from influencing the feature vector representation of the node's advertisement text, which improves accuracy.
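The hop-count criterion can be sketched with a breadth-first search that stops expanding once the threshold is reached. The adjacency-dictionary representation, the function name, and the small example graph are illustrative assumptions.

```python
from collections import deque


def nodes_within_hops(adjacency, start, max_hops):
    """Map every node reachable from start in at most max_hops hops to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # neighbours of this node would exceed the threshold
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]  # the node is not its own associated node
    return hops


# Hypothetical undirected graph with edges a-b, a-c, a-d, b-e, d-f, d-g.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "e"},
    "c": {"a"},
    "d": {"a", "f", "g"},
    "e": {"b"},
    "f": {"d"},
    "g": {"d"},
}
one_hop = nodes_within_hops(graph, "a", 1)
two_hops = nodes_within_hops(graph, "a", 2)
```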

According to some embodiments, determining, for each node in the text graph, at least one associated node corresponding to the node from a plurality of other nodes based on the connection relationships among the plurality of nodes includes: determining a sampling probability for each of the plurality of other nodes based on the connection relationships among the plurality of nodes and a preset rule, wherein, according to the preset rule, an other node with fewer connection hops to the node has a higher sampling probability than an other node with more connection hops to the node; and randomly sampling the plurality of other nodes according to the sampling probabilities to obtain the at least one associated node. In this way, the associated nodes of each node are obtained by layered random sampling in which closer nodes have higher sampling probabilities, which reduces the number of associated nodes and simplifies vector computation while preserving accuracy.

Fig. 4 shows a schematic diagram of a text graph according to an exemplary embodiment of the present disclosure. In this example, for the node a at the center, two layers of its neighbor nodes may, for example, be sampled. Specifically, nodes b, c, and d, which are one hop from a, are first-layer nodes, and nodes e, f, and g, which are two hops from a, are second-layer nodes. The sampling frequency for each layer of nodes may decrease as the layer number increases; for example, it may be determined by an exponential decay function.
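One way to realize layer-dependent sampling is sketched below. The particular decay form p = base * exp(-decay * (hops - 1)), the independent per-node draw, and all names are assumptions; the text only states that the sampling frequency decreases with the layer number and may follow an exponential decay function.

```python
import math
import random


def layer_sampling_probability(hops, base=1.0, decay=1.0):
    """Assumed decay form: layer 1 gets `base`, each further layer is scaled by exp(-decay)."""
    return min(1.0, base * math.exp(-decay * (hops - 1)))


def sample_associated_nodes(hops_by_node, base=1.0, decay=1.0, rng=None):
    """Keep each candidate node independently with its layer's probability."""
    if rng is None:
        rng = random.Random()
    sampled = []
    for node in sorted(hops_by_node):
        probability = layer_sampling_probability(hops_by_node[node], base, decay)
        if rng.random() < probability:
            sampled.append(node)
    return sampled


# Hop counts from node a for the graph of Fig. 4.
hops_from_a = {"b": 1, "c": 1, "d": 1, "e": 2, "f": 2, "g": 2}
picked = sample_associated_nodes(hops_from_a, rng=random.Random(0))
```

With `base=1.0` the first-layer nodes are always kept, while each second-layer node is kept with probability exp(-1), roughly 0.37.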

In this example, the steps described above yield edge vector representations for edges a-b, a-c, a-d, b-e, d-f, and d-g. Aggregation may then be performed layer by layer: the feature vector representation of a node in a given layer is determined from the edge vector representations of the connecting edges between that node and the nodes of the next layer, together with the feature vector representations of those next-layer nodes. For example, the feature vector representation of node d may be determined from the edge vector representations of edges d-f and d-g, and the feature vector representation of the advertisement text corresponding to node a may then be determined from the edge vector representations of edges a-b, a-c, and a-d together with the feature vector representations of nodes b, c, and d.
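The layer-by-layer aggregation for the graph of Fig. 4 can be sketched recursively. How edge vectors and next-layer feature vectors are combined is not specified, so this sketch assumes a plain element-wise mean over both, and assumes that a node with no next-layer edges contributes only through its incoming edge vector. The edge vector values are made up for illustration.

```python
def mean_vectors(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(vec[i] for vec in vectors) / count for i in range(len(vectors[0]))]


def aggregate(node, children, edge_vectors):
    """Feature vector of `node`, or None if it has no edges to a next layer."""
    contributions = []
    for child in children.get(node, ()):
        contributions.append(edge_vectors[(node, child)])
        child_vector = aggregate(child, children, edge_vectors)
        if child_vector is not None:
            contributions.append(child_vector)
    if not contributions:
        return None
    return mean_vectors(contributions)


# Sampled two-layer neighbourhood of node a, as in Fig. 4.
children = {"a": ["b", "c", "d"], "b": ["e"], "d": ["f", "g"]}
# Hypothetical two-dimensional edge vectors.
edge_vectors = {
    ("a", "b"): [1.0, 0.0],
    ("a", "c"): [0.0, 1.0],
    ("a", "d"): [1.0, 1.0],
    ("b", "e"): [2.0, 2.0],
    ("d", "f"): [0.0, 4.0],
    ("d", "g"): [4.0, 0.0],
}
vector_d = aggregate("d", children, edge_vectors)
vector_a = aggregate("a", children, edge_vectors)
```

Node d is computed first from edges d-f and d-g, and its result then feeds into node a alongside edges a-b, a-c, and a-d, mirroring the order described in the text.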

According to another aspect of the present disclosure, an advertisement text recommendation method is also provided, including: acquiring a target advertisement text and a plurality of candidate advertisement texts; determining the relevance between the target advertisement text and each of the plurality of candidate advertisement texts using the above method for determining the relevance of advertisement texts; and determining at least one advertisement text to be recommended from the plurality of candidate advertisement texts based on the relevance between the target advertisement text and the plurality of candidate advertisement texts.

In some examples, the plurality of candidate advertisement texts may be sorted by their relevance to the target advertisement text, and at least one advertisement text to be recommended may be determined from the sorted result.
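The sort-and-select step can be sketched as follows. Using cosine similarity as the relevance score, taking the top k results, and the example texts and vectors are assumptions for illustration; the feature vectors would in practice come from the determination process 200.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(target_vector, candidate_vectors, top_k=2):
    """Return the top_k candidate texts ordered by descending relevance to the target."""
    scored = [
        (cosine_similarity(target_vector, vector), text)
        for text, vector in candidate_vectors.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Hypothetical feature vectors for three candidate advertisement texts.
candidates = {
    "trail sneakers": [1.0, 0.0],
    "sports socks": [1.0, 1.0],
    "garden hose": [0.0, 1.0],
}
recommended = recommend([1.0, 0.0], candidates, top_k=2)
```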

According to another aspect of the present disclosure, an apparatus for determining the relevance of advertisement texts is also provided. Fig. 5 shows a structural block diagram of a feature vector determining unit 500 according to an exemplary embodiment of the present disclosure. Fig. 6 shows a structural block diagram of an apparatus 600 for determining the relevance of advertisement texts according to an exemplary embodiment of the present disclosure.

As shown in Fig. 5, the feature vector determining unit 500 includes:

a first acquiring subunit 501 configured to acquire historical advertisement data comprising a plurality of advertisement texts, the plurality of advertisement texts including the first advertisement text and the second advertisement text;

a second acquiring subunit 502 configured to acquire revenue information of each advertisement text in the plurality of advertisement texts;

a first determining subunit 503 configured to determine, for each advertisement text in the plurality of advertisement texts and based on the co-occurrence relationships among the plurality of advertisement texts in the historical advertisement data, at least one associated text corresponding to the advertisement text from at least one other advertisement text; and

a second determining subunit 504 configured to determine, for each advertisement text in the plurality of advertisement texts, a feature vector representation corresponding to the advertisement text based on the semantics and revenue information of the advertisement text and the semantics and revenue information of the corresponding at least one associated text.

The operations of subunits 501 to 504 of the unit 500 are similar to those of steps S201 to S204 described above and are not repeated here.

As shown in Fig. 6, the apparatus 600 for determining the relevance of advertisement texts includes:

a first acquiring unit 601 configured to acquire a first feature vector representation corresponding to a first advertisement text and a second feature vector representation corresponding to a second advertisement text; and

a first determining unit 602 configured to determine the relevance between the first advertisement text and the second advertisement text based on the first feature vector representation and the second feature vector representation, wherein the first feature vector representation and the second feature vector representation are obtained using the feature vector determining unit 500.

The operations of units 601 and 602 of the apparatus 600 are similar to those of steps S301 and S302 described above and are not repeated here.

According to some embodiments, the first determining subunit includes: a construction module configured to construct, based on the co-occurrence relationships among the plurality of advertisement texts in the historical advertisement data, a text graph comprising a plurality of nodes in one-to-one correspondence with the plurality of advertisement texts, wherein, for any two advertisement texts among the plurality of advertisement texts, a connecting edge is established between the nodes corresponding to the two advertisement texts in response to determining that the co-occurrence relationship between them satisfies a preset condition; and a first determination module configured to determine, for each node in the plurality of nodes, at least one associated node corresponding to the node from at least one other node based on the connection relationships among the plurality of nodes, so as to obtain the at least one associated text corresponding to the respective advertisement text.

According to some embodiments, the second determining subunit includes: a second determination module configured to determine, for each connecting edge of at least one connecting edge between the node corresponding to the advertisement text and the corresponding at least one associated node, an edge vector representation of the connecting edge based on the semantics and revenue information of the advertisement texts respectively corresponding to the two endpoints of the connecting edge; and a third determination module configured to determine the feature vector representation of the advertisement text based on the edge vector representations of the at least one connecting edge.

According to some embodiments, the second determination module is configured to: determine the semantic similarity between the advertisement texts respectively corresponding to the two endpoints; and determine the edge vector representation of the connecting edge based on the semantic similarity and the revenue information of the advertisement texts respectively corresponding to the two endpoints.

According to some embodiments, the second determination module is configured to: input the advertisement texts respectively corresponding to the two endpoints of the connecting edge into a pre-trained language model to obtain the semantic similarity output by the pre-trained language model, wherein the pre-trained language model is trained using annotated corpus data.

根据一些实施例,所述特征向量确定单元还包括:第三确定子单元,被配置为针对与该广告文本对应的节点与相应的至少一个关联节点之间的至少一个连接边中的每个连接边,确定所述连接边的两个端点分别对应的广告文本之间的共现频率,并且其中,所述第二确定模块被配置为基于所述连接边的两个端点分别对应的广告文本的语义和收益信息以及所述共现频率,确定该连接边的边向量表示。According to some embodiments, the feature vector determining unit further includes: a third determining subunit configured to, for each connection in at least one connection edge between the node corresponding to the advertisement text and the corresponding at least one associated node edge, determining the co-occurrence frequency between the advertisement texts respectively corresponding to the two endpoints of the connecting edge, and wherein, the second determination module is configured to be based on the Semantic and revenue information, as well as the co-occurrence frequency, determine the edge vector representation of the connected edge.

According to some embodiments, the feature vector determining unit further includes: a processing subunit configured to normalize the co-occurrence frequency and the revenue information separately to obtain a normalized co-occurrence frequency and normalized revenue information, and wherein the first determining module is configured to determine the edge vector representation of the connecting edge based on the semantics and normalized revenue information of the advertisement texts respectively corresponding to the two endpoints of the connecting edge and on the normalized co-occurrence frequency.
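As an illustration of the normalization step, the following Python sketch rescales co-occurrence frequencies and revenue values independently so that the two quantities become comparable. The disclosure does not name a normalization scheme; min-max scaling and the helper name `min_max_normalize` are assumptions made for this example.

```python
def min_max_normalize(values):
    """Rescale values linearly into [0, 1] (assumed scheme).

    If all values are equal there is no spread to rescale, so every
    entry is mapped to 0.0.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Raw values on very different scales, normalized separately.
cooccurrence = [1, 3, 5]
revenue = [10.0, 20.0, 50.0]
norm_cooccurrence = min_max_normalize(cooccurrence)
norm_revenue = min_max_normalize(revenue)
```

Normalizing each quantity on its own keeps a large revenue range from dominating the co-occurrence signal when both feed into the same edge vector.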

According to some embodiments, the second determining module is configured to: input the advertisement texts respectively corresponding to the two endpoints of the connecting edge, together with their revenue information, into an edge vector encoding model to obtain the edge vector representation output by the edge vector encoding model, wherein the edge vector encoding model is trained as follows: acquiring a sample text graph containing a plurality of nodes in one-to-one correspondence with a plurality of sample texts, and the revenue information of each of the plurality of sample texts, the sample text graph including a plurality of connecting edges that connect the plurality of nodes; for each of the plurality of connecting edges included in the sample text graph, inputting the sample texts respectively corresponding to the two endpoints of the connecting edge, together with their revenue information, into the edge vector encoding model to obtain the edge vector representation of the connecting edge output by the edge vector encoding model; determining, based on the edge vector representations of the plurality of connecting edges, the feature vector representations of the plurality of sample texts corresponding to the plurality of nodes; acquiring the true relevance between a first sample text and a second sample text among the plurality of sample texts; determining the predicted relevance between the first sample text and the second sample text based on a first feature vector representation and a second feature vector representation respectively corresponding to the first sample text and the second sample text; and adjusting the parameters of the edge vector encoding model based on the true relevance and the predicted relevance.

According to some embodiments, the plurality of advertisement texts include a plurality of historical query texts and a plurality of historical recommendation texts, the historical advertisement data includes a plurality of text pairs, and each of the plurality of text pairs includes one historical query text and a historical recommendation text recommended to a user based on that historical query text, and wherein the construction module is configured to: in response to determining that the historical advertisement data includes a text pair consisting of the two advertisement texts, establish an edge of the text graph with the nodes corresponding to the two advertisement texts as its vertices.
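The graph construction described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the graph is stored as an undirected adjacency map, every text pair found in the historical data yields one edge, and `build_text_graph` and the sample texts are hypothetical.

```python
from collections import defaultdict


def build_text_graph(text_pairs):
    """Build an undirected adjacency map from (query text, recommended text)
    pairs. Each distinct text becomes a node; each pair becomes an edge."""
    graph = defaultdict(set)
    for query, recommended in text_pairs:
        graph[query].add(recommended)
        graph[recommended].add(query)
    return dict(graph)


# Hypothetical historical data; the last pair repeats an existing edge.
pairs = [
    ("running shoes", "sneakers sale"),
    ("running shoes", "trail shoes"),
    ("sneakers sale", "running shoes"),
]
graph = build_text_graph(pairs)
```

Using sets for the neighbor lists means a repeated pair does not create a duplicate edge; a co-occurrence count would have to be tracked separately if the co-occurrence frequency is needed later.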

According to some embodiments, the first determining module is configured to: for each of the plurality of other nodes, in response to the number of connection hops between that other node and the node being not greater than a preset threshold, determine that other node to be an associated node.
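One way to apply the hop threshold is a breadth-first search that stops expanding once the threshold is reached. The Python sketch below assumes the text graph is an adjacency map from node to neighbor set; the function name `associated_nodes` and the toy graph are hypothetical.

```python
from collections import deque


def associated_nodes(graph, start, max_hops):
    """Return {node: hop count} for every node reachable from `start`
    within `max_hops` hops, excluding `start` itself."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the preset threshold
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


# A chain a - b - c - d; with a threshold of 2, node d is out of range.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
within_two = associated_nodes(graph, "a", 2)
```

Because breadth-first search visits nodes in order of increasing distance, the first hop count recorded for a node is its shortest connection hop count.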

According to some embodiments, the first determining module is configured to: determine a sampling probability for each of the plurality of other nodes based on the connection relationships among the plurality of nodes and a preset rule, wherein, according to the preset rule, among the plurality of other nodes, a node with a smaller number of connection hops to the node has a higher sampling probability than a node with a larger number of connection hops to the node; and randomly sample the plurality of other nodes based on the sampling probabilities to obtain the at least one associated node.
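The preset rule only requires that closer nodes be more likely to be sampled; the concrete rule is not given. The Python sketch below assumes a weight of 1/hops for illustration, and the function names and hop counts are hypothetical. Hop counts are assumed to be at least 1.

```python
import random


def sampling_probabilities(hop_counts):
    """Map each node to a probability proportional to 1 / hops (assumed
    rule), so that fewer hops means a higher sampling probability."""
    weights = {node: 1.0 / hops for node, hops in hop_counts.items()}
    total = sum(weights.values())
    return {node: weight / total for node, weight in weights.items()}


def sample_associated_nodes(hop_counts, k, seed=None):
    """Draw k nodes at random (with replacement) using the probabilities."""
    probs = sampling_probabilities(hop_counts)
    nodes = list(probs)
    rng = random.Random(seed)
    return rng.choices(nodes, weights=[probs[n] for n in nodes], k=k)


hop_counts = {"b": 1, "c": 2, "d": 4}
probs = sampling_probabilities(hop_counts)
sampled = sample_associated_nodes(hop_counts, k=2, seed=0)
```

`random.Random.choices` samples with replacement, so the same node may be drawn twice; deduplicating the result or sampling without replacement are alternative choices that the text leaves open.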

According to another aspect of the present disclosure, an advertisement text recommendation apparatus is also provided, including: a second acquiring unit configured to acquire a target advertisement text and a plurality of candidate advertisement texts; the apparatus for determining the relevance of advertisement texts described above, configured to determine the relevance between the target advertisement text and the plurality of candidate advertisement texts; and a second determining unit configured to determine at least one advertisement text to be recommended from the plurality of candidate advertisement texts based on the relevance between the target advertisement text and the plurality of candidate advertisement texts.
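The recommendation step can be sketched as ranking candidates by the relevance of their feature vectors to the target. The disclosure does not fix the relevance measure here; cosine similarity, the top-k selection, and the names `cosine_similarity` and `recommend` are assumptions for this Python example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either
    vector has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(target_vec, candidate_vecs, top_k):
    """Return the top_k candidate texts, most relevant first."""
    ranked = sorted(
        candidate_vecs.items(),
        key=lambda item: cosine_similarity(target_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Hypothetical feature vectors for one target and three candidates.
target = [1.0, 0.0]
candidates = {"ad_a": [1.0, 0.0], "ad_b": [0.0, 1.0], "ad_c": [1.0, 1.0]}
top = recommend(target, candidates, top_k=2)
```

A relevance threshold could replace the fixed `top_k` cutoff if the number of recommended texts should vary with the quality of the candidates.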

According to another aspect of the present disclosure, an electronic device is also provided, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the above method for determining the relevance of advertisement texts.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is also provided, wherein the computer instructions are used to cause a computer to perform the above method for determining the relevance of advertisement texts.

According to another aspect of the present disclosure, a computer program product is also provided, including a computer program, wherein the computer program, when executed by a processor, implements the above method for determining the relevance of advertisement texts.

Referring to FIG. 7, a structural block diagram of an electronic device 700 that can serve as a server or a client of the present disclosure will now be described; it is an example of a hardware device to which aspects of the present disclosure can be applied. The electronic device is intended to represent various forms of digital electronic computer equipment, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 7, the device 700 includes a computing unit 701, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random-access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

A number of components in the device 700 are connected to the I/O interface 705, including an input unit 706, an output unit 707, the storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the device 700; it can receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 707 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 708 may include, but is not limited to, a magnetic disk and an optical disc. The communication unit 709 allows the device 700 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.

The computing unit 701 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 701 performs the methods and processing described above, such as the method for determining the relevance of advertisement texts. For example, in some embodiments, the method for determining the relevance of advertisement texts may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method for determining the relevance of advertisement texts described above can be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method for determining the relevance of advertisement texts in any other suitable manner (for example, by means of firmware).

Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described here can be implemented on a computer that has a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, speech input, or tactile input).

The systems and techniques described here can be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, and a blockchain network.

A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired result of the technical solutions disclosed in the present disclosure can be achieved.

Although the embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it should be understood that the methods, systems, and devices described above are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but is defined only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced by equivalent elements. In addition, the steps may be performed in an order different from that described in the present disclosure. Further, the various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described here may be replaced by equivalent elements that appear after the present disclosure.

Claims (27)

1. A method for determining the relevance of advertisement texts, comprising:
acquiring a first feature vector representation corresponding to a first advertisement text and a second feature vector representation corresponding to a second advertisement text; and
determining the relevance between the first advertisement text and the second advertisement text based on the first feature vector representation and the second feature vector representation,
wherein the first feature vector representation and the second feature vector representation are obtained using the following determination process:
acquiring historical advertisement data containing a plurality of advertisement texts, the plurality of advertisement texts including the first advertisement text and the second advertisement text;
acquiring revenue information of each of the plurality of advertisement texts;
for each of the plurality of advertisement texts, determining, based on the co-occurrence relationships among the plurality of advertisement texts in the historical advertisement data, at least one associated text corresponding to the advertisement text from at least one other advertisement text other than the advertisement text; and
for each of the plurality of advertisement texts, determining the feature vector representation corresponding to the advertisement text based on the semantics and revenue information of the advertisement text and the semantics and revenue information of the corresponding at least one associated text.

2. The method according to claim 1, wherein, for each of the plurality of advertisement texts, determining, based on the co-occurrence relationships among the plurality of advertisement texts in the historical advertisement data, at least one associated text corresponding to the advertisement text from at least one other advertisement text other than the advertisement text comprises:
constructing, based on the co-occurrence relationships among the plurality of advertisement texts in the historical advertisement data, a text graph containing a plurality of nodes in one-to-one correspondence with the plurality of advertisement texts, wherein, for any two advertisement texts among the plurality of advertisement texts, in response to determining that the co-occurrence relationship between the two advertisement texts satisfies a preset condition, a connecting edge is established based on the nodes corresponding to the two advertisement texts; and
for each of the plurality of nodes, determining, based on the connection relationships among the plurality of nodes, at least one associated node corresponding to the node from at least one other node, so as to obtain the at least one associated text corresponding to the respective advertisement text.

3. The method according to claim 2, wherein, for each of the plurality of advertisement texts, determining the feature vector representation of the advertisement text based on the semantics and revenue information of the advertisement text and the semantics and revenue information of the corresponding at least one associated text comprises:
for each connecting edge of at least one connecting edge between the node corresponding to the advertisement text and the corresponding at least one associated node, determining an edge vector representation of the connecting edge based on the semantics and revenue information of the advertisement texts respectively corresponding to the two endpoints of the connecting edge; and
determining the feature vector representation of the advertisement text based on the edge vector representation of the at least one connecting edge.

4. The method according to claim 3, wherein determining the edge vector representation of the connecting edge based on the semantics and revenue information of the advertisement texts respectively corresponding to the two endpoints of the connecting edge comprises:
determining the semantic similarity between the advertisement texts respectively corresponding to the two endpoints; and
determining the edge vector representation of the connecting edge based on the semantic similarity and the revenue information of the advertisement texts respectively corresponding to the two endpoints.

5. The method according to claim 4, wherein determining the semantic similarity between the advertisement texts respectively corresponding to the two endpoints comprises:
inputting the advertisement texts respectively corresponding to the two endpoints of the connecting edge into a pre-trained language model to obtain the semantic similarity output by the pre-trained language model, wherein the pre-trained language model is trained using labeled corpus data.

6. The method according to any one of claims 3-5, wherein the determination process further comprises:
for each connecting edge of the at least one connecting edge between the node corresponding to the advertisement text and the corresponding at least one associated node, determining the co-occurrence frequency between the advertisement texts respectively corresponding to the two endpoints of the connecting edge,
and wherein the edge vector representation of the connecting edge is determined based on the semantics and revenue information of the advertisement texts respectively corresponding to the two endpoints of the connecting edge and on the co-occurrence frequency.

7. The method according to claim 6, wherein the determination process further comprises:
normalizing the co-occurrence frequency and the revenue information separately to obtain a normalized co-occurrence frequency and normalized revenue information,
and wherein the edge vector representation of the connecting edge is determined based on the semantics and normalized revenue information of the advertisement texts respectively corresponding to the two endpoints of the connecting edge and on the normalized co-occurrence frequency.

8. The method according to any one of claims 2-7, wherein determining the edge vector representation of the connecting edge based on the semantics and revenue information of the advertisement texts respectively corresponding to the two endpoints of the connecting edge comprises:
inputting the advertisement texts respectively corresponding to the two endpoints of the connecting edge, together with their revenue information, into an edge vector encoding model to obtain the edge vector representation output by the edge vector encoding model,
wherein the edge vector encoding model is trained as follows:
acquiring a sample text graph containing a plurality of nodes in one-to-one correspondence with a plurality of sample texts, and the revenue information of each of the plurality of sample texts, the sample text graph including a plurality of connecting edges that connect the plurality of nodes;
for each of the plurality of connecting edges included in the sample text graph, inputting the sample texts respectively corresponding to the two endpoints of the connecting edge, together with their revenue information, into the edge vector encoding model to obtain the edge vector representation of the connecting edge output by the edge vector encoding model;
determining, based on the edge vector representations of the plurality of connecting edges, the feature vector representations of the plurality of sample texts corresponding to the plurality of nodes;
acquiring the true relevance between a first sample text and a second sample text among the plurality of sample texts;
determining the predicted relevance between the first sample text and the second sample text based on a first feature vector representation and a second feature vector representation respectively corresponding to the first sample text and the second sample text; and
adjusting the parameters of the edge vector encoding model based on the true relevance and the predicted relevance.

9. The method according to any one of claims 2-8, wherein the plurality of advertisement texts include a plurality of historical query texts and a plurality of historical recommendation texts, the historical advertisement data includes a plurality of text pairs, and each of the plurality of text pairs includes one historical query text and a historical recommendation text recommended to a user based on that historical query text,
and wherein, in response to determining that the co-occurrence relationship between the two advertisement texts satisfies the preset condition, establishing an edge of the text graph with the nodes corresponding to the two advertisement texts as its vertices comprises:
in response to determining that the historical advertisement data includes a text pair consisting of the two advertisement texts, establishing an edge of the text graph with the nodes corresponding to the two advertisement texts as its vertices.

10. The method according to any one of claims 2-9, wherein, for each node in the text graph, determining, based on the connection relationships among the plurality of nodes, at least one associated node corresponding to the node from a plurality of other nodes comprises:
for each of the plurality of other nodes, in response to the number of connection hops between that other node and the node being not greater than a preset threshold, determining that other node to be an associated node.

11. The method according to any one of claims 2-9, wherein, for each node in the text graph, determining, based on the connection relationships among the plurality of nodes, at least one associated node corresponding to the node from a plurality of other nodes comprises:
determining a sampling probability for each of the plurality of other nodes based on the connection relationships among the plurality of nodes and a preset rule, wherein, according to the preset rule, among the plurality of other nodes, a node with a smaller number of connection hops to the node has a higher sampling probability than a node with a larger number of connection hops to the node; and
randomly sampling the plurality of other nodes based on the sampling probabilities to obtain the at least one associated node.

12. A method for recommending advertisement texts, comprising:
acquiring a target advertisement text and a plurality of candidate advertisement texts;
determining the relevance between the target advertisement text and the plurality of candidate advertisement texts using the method according to any one of claims 1-11; and
determining at least one advertisement text to be recommended from the plurality of candidate advertisement texts based on the relevance between the target advertisement text and the plurality of candidate advertisement texts.

13. An apparatus for determining the relevance of advertisement texts, comprising:
a first acquiring unit configured to acquire a first feature vector representation corresponding to a first advertisement text and a second feature vector representation corresponding to a second advertisement text; and
a first determining unit configured to determine the relevance between the first advertisement text and the second advertisement text based on the first feature vector representation and the second feature vector representation,
wherein the first feature vector representation and the second feature vector representation are obtained using a feature vector determining unit, the feature vector determining unit comprising:
a first acquiring subunit configured to acquire historical advertisement data containing a plurality of advertisement texts, the plurality of advertisement texts including the first advertisement text and the second advertisement text;
a second acquiring subunit configured to acquire revenue information of each of the plurality of advertisement texts;
a first determining subunit configured to, for each of the plurality of advertisement texts, determine, based on the co-occurrence relationships among the plurality of advertisement texts in the historical advertisement data, at least one associated text corresponding to the advertisement text from at least one other advertisement text other than the advertisement text; and
a second determining subunit configured to determine, for each advertisement
text among the plurality of advertisement texts, based on the semantics and revenue information of the advertisement text and the corresponding semantics and revenue information corresponding to at least one associated text The feature vector representation corresponding to the advertisement text. 14.如权利要求13所述的装置,其中,所述第一确定子单元包括:14. The apparatus according to claim 13, wherein the first determining subunit comprises: 构建模块,被配置为基于所述历史广告数据中所述多个广告文本之间的共现关系,构建包含与所述多个广告文本一一对应的多个节点的文本图,其中,针对所述多个文本那种的任意两个广告文本,响应于确定所述两个广告文本之间的共现关系满足预设条件,基于所述两个广告文本对应的节点建立连接边;以及A construction module configured to construct a text graph containing a plurality of nodes corresponding to the plurality of advertisement texts based on the co-occurrence relationship between the plurality of advertisement texts in the historical advertisement data, wherein, for the Any two advertisement texts of the plurality of texts, in response to determining that the co-occurrence relationship between the two advertisement texts satisfies a preset condition, establishing a connection edge based on nodes corresponding to the two advertisement texts; and 第一确定模块,被配置为针对所述多个节点中的每个节点,基于所述多个节点之间的连接关系,从至少一个其他节点中确定与该节点对应的至少一个关联节点,以得到与相应的广告文本对应的至少一个关联文本。The first determination module is configured to, for each node in the plurality of nodes, determine at least one associated node corresponding to the node from at least one other node based on the connection relationship between the plurality of nodes, to At least one associated text corresponding to the corresponding advertisement text is obtained. 15.如权利要求14所述的装置,其中,所述第二确定子单元包括:15. 
The device according to claim 14, wherein the second determining subunit comprises: 第二确定模块,被配置为针对与该广告文本对应的节点与相应的至少一个关联节点之间的至少一个连接边中的每个连接边,基于所述连接边的两个端点分别对应的广告文本的语义和收益信息,确定该连接边的边向量表示;以及The second determination module is configured to, for each connection edge in at least one connection edge between the node corresponding to the advertisement text and the corresponding at least one associated node, based on the advertisement corresponding to the two endpoints of the connection edge Semantic and yield information of the text, determining the edge vector representation of the connected edge; and 第三确定模块,被配置为基于所述至少一个连接边的边向量表示,确定该广告文本的特征向量表示。The third determining module is configured to determine the feature vector representation of the advertisement text based on the edge vector representation of the at least one connected edge. 16.如权利要求15所述的装置,其中,所述第二确定模块被配置为:16. The apparatus of claim 15, wherein the second determining module is configured to: 确定所述两个端点分别对应的广告文本之间的语义相似度;以及determining the semantic similarity between the advertisement texts respectively corresponding to the two endpoints; and 基于所述语义相似度和所述两个端点分别对应的广告文本的收益信息,确定该连接边的边向量表示。Based on the semantic similarity and the revenue information of the advertisement text corresponding to the two endpoints, determine the edge vector representation of the connecting edge. 17.如权利要求16所述的装置,其中,所述第二确定模块被配置为:17. The apparatus of claim 16, wherein the second determining module is configured to: 将所述连接边的两个端点分别对应的广告文本输入预训练语言模型,以获取所述预训练语言模型所输出的所述语义相似度,其中,所述预训练语言模型是利用标注语料数据进行训练得到的。Inputting the advertisement texts corresponding to the two endpoints of the connecting edge into the pre-training language model to obtain the semantic similarity output by the pre-training language model, wherein the pre-training language model utilizes labeled corpus data obtained by training. 18.如权利要求15-17中任一项所述的装置,其中,所述特征向量确定单元还包括:18. 
The device according to any one of claims 15-17, wherein the feature vector determining unit further comprises: 第三确定子单元,被配置为针对与该广告文本对应的节点与相应的至少一个关联节点之间的至少一个连接边中的每个连接边,确定所述连接边的两个端点分别对应的广告文本之间的共现频率,The third determining subunit is configured to, for each connecting edge in at least one connecting edge between the node corresponding to the advertisement text and the corresponding at least one associated node, determine the corresponding frequency of co-occurrence between ad texts, 并且其中,所述第二确定模块被配置为基于所述连接边的两个端点分别对应的广告文本的语义和收益信息以及所述共现频率,确定该连接边的边向量表示。And wherein, the second determining module is configured to determine the edge vector representation of the connecting edge based on the semantic and revenue information of the advertisement text corresponding to the two endpoints of the connecting edge and the co-occurrence frequency. 19.如权利要求18所述的装置,其中,所述特征向量确定单元还包括:19. The device according to claim 18, wherein the eigenvector determining unit further comprises: 处理子单元,被配置为对所述共现频率和所述收益信息分别执行归一化处理,以得到归一化共现频率和归一化收益信息,a processing subunit configured to perform normalization processing on the co-occurrence frequency and the revenue information respectively to obtain a normalized co-occurrence frequency and a normalized revenue information, 并且其中,所述第一确定模块被配置为基于所述连接边的两个端点分别对应的广告文本的语义和归一化收益信息以及所述归一化共现频率,确定该连接边的边向量表示。And wherein, the first determining module is configured to determine the edge of the connecting edge based on the semantics and normalized revenue information of the advertisement text corresponding to the two endpoints of the connecting edge and the normalized co-occurrence frequency Vector representation. 20.如权利要求15-19中任一项所述的装置,其中,所述第二确定模块被配置为:20. 
The apparatus according to any one of claims 15-19, wherein the second determining module is configured to: 将所述连接边的两个端点分别对应的广告文本及其收益信息输入边向量编码模型,以得到所述边向量编码模型所输出的所述边向量表示,inputting the advertisement text corresponding to the two endpoints of the connecting edge and its revenue information into the edge vector encoding model to obtain the edge vector representation output by the edge vector encoding model, 其中,所述边向量编码模型是利用如下方式进行训练得到的:Wherein, the edge vector coding model is obtained by training in the following manner: 获取包含与多个样本文本一一对应的多个节点的样本文本图和所述多个样本文本中每个样本文本的收益信息,所述样本文本图中包括多个用于连接所述多个节点的多个连接边;Obtaining a sample text graph containing a plurality of nodes one-to-one corresponding to a plurality of sample texts and revenue information of each sample text in the plurality of sample texts, the sample text graph includes multiple Multiple connection edges of nodes; 针对样本文本图所包括的多个连接边中的每个连接边,将所述连接边的两个端点分别对应的样本文本及其收益信息输入所述边向量编码模型,以得到所述边向量编码模型所输出的所述连接板的边向量表示;For each connection edge among the plurality of connection edges included in the sample text graph, input the sample text corresponding to the two end points of the connection edge and its revenue information into the edge vector encoding model to obtain the edge vector an edge vector representation of said connected plate output by the encoding model; 基于所述多个连接边的边向量表示,确定所述多个节点对应的多个样本文本的特征向量表示;Based on the edge vector representations of the plurality of connection edges, determine feature vector representations of a plurality of sample texts corresponding to the plurality of nodes; 获取所述多个样本文本中的第一样本文本和第二样本文本之间的真实相关度;Acquiring the real correlation between the first sample text and the second sample text among the plurality of sample texts; 基于与所述第一样本文本和第二样本文本分别对应的第一特征向量表示和第二特征向量表示,确定所述第一样本文本和第二样本文本之间的预测相关度;以及determining a predicted correlation between the first sample text and the second sample text based on the first feature vector representation and the second feature vector representation respectively corresponding to the 
first sample text and the second sample text; and 基于所述真实相关度和所述预测相关度,调整所述边向量编码模型的参数。Adjusting parameters of the edge vector coding model based on the true correlation and the predicted correlation. 21.如权利要求14-20中任一项所述的装置,其中,所述多个广告文本包括多个历史查询文本和多个历史推荐文本,所述历史广告数据包括多个文本对,所述多个文本对中的每个文本对均包括一个历史查询文本和基于所述历史查询文本向用户推荐的历史推荐文本,21. The device according to any one of claims 14-20, wherein the plurality of advertisement texts includes a plurality of historical query texts and a plurality of historical recommendation texts, the historical advertisement data includes a plurality of text pairs, the Each of the plurality of text pairs includes a historical query text and historical recommendation text recommended to users based on the historical query text, 并且其中,所述构建模块被配置为:And where the building blocks are configured as: 响应于确定所述历史广告数据包括由所述两个广告文本组成的文本对,以所述两个广告文本对应的节点为顶点建立文本图的边。In response to determining that the historical advertisement data includes a text pair consisting of the two advertisement texts, an edge of the text graph is established with nodes corresponding to the two advertisement texts as vertices. 22.如权利要求14-21中任一项所述的装置,其中,所述第一确定模块被配置为:22. The apparatus according to any one of claims 14-21, wherein the first determining module is configured to: 针对所述多个其他节点中的每个其他节点,响应于该其他节点与所述节点之间的连接跳数不大于预设阈值,确定该其他节点为所述关联节点。For each of the plurality of other nodes, in response to the number of connection hops between the other node and the node being not greater than a preset threshold, it is determined that the other node is the associated node. 23.如权利要求14-21中任一项所述的装置,其中,所述第一确定模块被配置为:23. 
The apparatus according to any one of claims 14-21, wherein the first determining module is configured to: 基于所述多个节点之间的连接关系和预设规则,确定所述多个其他节点各自的采样概率,其中,根据所述预设规则,所述多个其他节点中与所述节点之间的连接跳数更小的节点的采样概率大于与所述节点之间的连接跳数更大的节点的采样概率;以及Based on the connection relationship between the plurality of nodes and preset rules, determine the respective sampling probabilities of the plurality of other nodes, wherein, according to the preset rules, among the plurality of other nodes and between the nodes The sampling probability of a node with a smaller connection hop is greater than the sampling probability of a node with a larger connection hop between the node; and 基于所述采样概率,对所述多个其他节点进行随机采样,以得到所述至少一个关联节点。Randomly sampling the plurality of other nodes based on the sampling probability to obtain the at least one associated node. 24.一种广告文本推荐装置,包括:24. An advertising text recommendation device, comprising: 第二获取单元,被配置为获取目标广告文本和多个候选广告文本;a second acquiring unit configured to acquire target advertisement text and a plurality of candidate advertisement texts; 如权利要求13-23中任一项所述的装置,被配置为确定所述目标广告文本和所述多个候选广告文本之间的相关度;以及The apparatus according to any one of claims 13-23, configured to determine a degree of relevance between the target advertisement text and the plurality of candidate advertisement texts; and 第二确定单元,被配置为基于所述目标广告文本和所述多个候选广告文本之间的相关度,从所述多个候选广告文本中确定至少一个待推荐广告文本。The second determining unit is configured to determine at least one advertisement text to be recommended from the multiple candidate advertisement texts based on the correlation between the target advertisement text and the multiple candidate advertisement texts. 25.一种电子设备,包括:25. 
An electronic device comprising: 至少一个处理器;以及at least one processor; and 与所述至少一个处理器通信连接的存储器;其中a memory communicatively coupled to the at least one processor; wherein 所述存储器存储有可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行权利要求1-12中任一项所述的方法。The memory stores instructions executable by the at least one processor, the instructions are executed by the at least one processor, so that the at least one processor can perform any one of claims 1-12 Methods. 26.一种存储有计算机指令的非瞬时计算机可读存储介质,其中,所述计算机指令用于使计算机执行根据权利要求1-12中任一项所述的方法。26. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method according to any one of claims 1-12. 27.一种计算机程序产品,包括计算机程序,其中,所述计算机程序在被处理器执行时实现根据权利要求1-12中任一项所述的方法。27. A computer program product comprising a computer program, wherein said computer program, when executed by a processor, implements the method according to any one of claims 1-12.
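The graph construction and associated-node selection recited in the claims above (edges created from query/recommendation text pairs, a hop-count threshold, and hop-dependent random sampling) can be illustrated with a minimal Python sketch. This is not the applicant's implementation. All function names are hypothetical, and the geometric `decay` weighting is only one assumed instance of the "preset rule", which the claims leave unspecified beyond requiring that nodes at fewer hops receive a higher sampling probability.

```python
import random
from collections import defaultdict, deque


def build_text_graph(text_pairs):
    """Undirected text graph: one node per text, one edge per (query, recommendation) pair."""
    graph = defaultdict(set)
    for query, recommendation in text_pairs:
        if query == recommendation:
            continue  # skipping self-pairs is an assumption of this sketch
        graph[query].add(recommendation)
        graph[recommendation].add(query)
    return graph


def hop_counts(graph, start):
    """Breadth-first search: minimum connection hops from `start` to each reachable other node."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]  # keep only the *other* nodes
    return hops


def associated_by_threshold(graph, node, max_hops):
    """Threshold variant: every other node within `max_hops` hops is an associated node."""
    return {other for other, h in hop_counts(graph, node).items() if h <= max_hops}


def sampling_probabilities(graph, node, decay=0.5):
    """Sampling variant: probability proportional to decay**hops (assumed rule, 0 < decay < 1)."""
    hops = hop_counts(graph, node)
    if not hops:
        return {}
    weights = {other: decay ** h for other, h in hops.items()}
    total = sum(weights.values())
    return {other: w / total for other, w in weights.items()}


def sample_associated(graph, node, k, decay=0.5, rng=None):
    """Draw k samples (with replacement) and return the distinct associated nodes drawn."""
    rng = rng or random.Random()
    probs = sampling_probabilities(graph, node, decay)
    if not probs:
        return set()
    others = sorted(probs)
    drawn = rng.choices(others, weights=[probs[o] for o in others], k=k)
    return set(drawn)


if __name__ == "__main__":
    pairs = [("q1", "a1"), ("q1", "a2"), ("q2", "a2"), ("q3", "a3")]
    g = build_text_graph(pairs)
    print(associated_by_threshold(g, "q1", max_hops=1))
    print(sampling_probabilities(g, "q1"))
    print(sample_associated(g, "q1", k=2, rng=random.Random(0)))
```

Because the sketch samples with replacement, it may return fewer than `k` distinct nodes. Sampling without replacement would satisfy the claim language equally well, and the choice here is purely for brevity.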

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211541802.4A CN115829653A (en) 2022-12-02 2022-12-02 Method, device, equipment and medium for determining relevancy of advertisement text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211541802.4A CN115829653A (en) 2022-12-02 2022-12-02 Method, device, equipment and medium for determining relevancy of advertisement text

Publications (1)

Publication Number Publication Date
CN115829653A true CN115829653A (en) 2023-03-21

Family

ID=85543869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211541802.4A Pending CN115829653A (en) 2022-12-02 2022-12-02 Method, device, equipment and medium for determining relevancy of advertisement text

Country Status (1)

Country Link
CN (1) CN115829653A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290631A (en) * 2008-05-28 2008-10-22 北京百问百答网络技术有限公司 Network advertisement automatic delivery method and its system
CN102160079A (en) * 2008-09-19 2011-08-17 摩托罗拉移动公司 Selection of associated content for content items
US20130110594A1 (en) * 2011-10-28 2013-05-02 Microsoft Corporation Ad copy determination
CN103186676A (en) * 2013-04-08 2013-07-03 湖南农业大学 Method for searching thematic knowledge self growth form focused crawlers
WO2020230938A1 (en) * 2019-05-14 2020-11-19 주식회사 슈퍼갈땐슈퍼맨 Device for recommending receipt advertisements by using products purchased by customers in connection with pos terminal
CN112446728A (en) * 2019-09-04 2021-03-05 百度在线网络技术(北京)有限公司 Advertisement recall method, device, equipment and storage medium
CN111428514A (en) * 2020-06-12 2020-07-17 北京百度网讯科技有限公司 Semantic matching method, device, equipment and storage medium
CN113392180A (en) * 2021-01-07 2021-09-14 腾讯科技(深圳)有限公司 Text processing method, device, equipment and storage medium
CN114677165A (en) * 2022-03-10 2022-06-28 北京弗罗达教育科技有限公司 Contextual online advertisement delivery method, device, server and storage medium
CN115269989A (en) * 2022-08-03 2022-11-01 百度在线网络技术(北京)有限公司 Object recommendation method and device, electronic equipment and storage medium
CN115375361A (en) * 2022-08-23 2022-11-22 飞书深诺数字科技(上海)股份有限公司 Method and device for selecting target population for online advertisement delivery and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Song Shengli; Wang Shaolong; Chen Ping: "Chinese text semantic representation method for text classification", Journal of Xidian University, no. 02, 30 November 2012 (2012-11-30), pages 95 - 103 *

Similar Documents

Publication Publication Date Title
CN113807440B (en) Method, apparatus, and medium for processing multimodal data using neural networks
CN114612749B (en) Neural network model training method and device, electronic device and medium
CN113722594B (en) Training method and device of recommendation model, electronic equipment and medium
CN114443989B (en) Sorting method, training method of ranking model, device, electronic device and medium
CN116541536A (en) Knowledge-enhanced content generation system, data generation method, device, and medium
CN115359309A (en) Training method and device, equipment and medium of target detection model
CN114219046B (en) Model training method, matching method, device, system, electronic equipment and medium
WO2024027125A1 (en) Object recommendation method and apparatus, electronic device, and storage medium
CN114791982B (en) Object recommendation method and device
CN116306862A (en) Training method, apparatus and medium for text processing neural network
CN115577081A (en) Dialogue method and device, equipment and medium
CN114281990A (en) Document classification method and device, electronic equipment and medium
CN114821233B (en) Training method and device, equipment and medium of target detection model
CN117273107A (en) Training method and training device for text generation model
CN116597454A (en) Image processing method, training method and device of image processing model
CN116244529A (en) Interest point retrieval method, device, electronic device and storage medium
CN115601555A (en) Image processing method and apparatus, device and medium
CN115809364B (en) Object recommendation methods and model training methods
CN114861658B (en) Address information analysis method and device, equipment and medium
CN115600646B (en) Language model training method, device, medium and equipment
CN114205164B (en) Flow classification method and device, training method and device, equipment and medium
CN112765975B (en) Word segmentation disambiguation processing method, device, equipment and medium
CN116050543A (en) Data processing method, device, electronic device, medium and chip
CN112954025B (en) Method, device, equipment, and medium for pushing information based on hierarchical knowledge graph
CN115578501A (en) Image processing method, device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination