

A model training method and related equipment

Info

Publication number
CN119522428A
Authority
CN
China
Prior art keywords: weight, information, node, graph, gnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280097659.6A
Other languages
Chinese (zh)
Inventor
李凯迪
王神迪
李小慧
吴艺晖
曹琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN119522428A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/51 Monitoring users, programs or devices to maintain the integrity of platforms at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A model training method comprises the steps of: obtaining information of a graph; obtaining a first feature representation of each node and a second feature representation of each edge in the graph according to the information of the graph; obtaining a first weight through a first neural network according to the first feature representation of each node, wherein the first weight is the weight of the node; fusing the first weight with the corresponding first feature representation to obtain a third feature representation; and obtaining a second weight through a second neural network according to the second feature representation of each edge, wherein the second weight is the weight of the edge. The first neural network and the second neural network obtained through training can be used as an interpreter of the graph to judge the importance of each node and of the relationships between nodes, so that a more complete graph interpretation result can be obtained.

Description

Model training method and related equipment

Technical Field
The application relates to the field of artificial intelligence, in particular to a model training method and related equipment.
Background
Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
A graph is a data structure including at least one node and at least one edge. In some scenarios, the nodes in a graph may be mapped to objects (or referred to as entities), and the edges in the graph may be mapped to relationships between the entities. A graph may be directed or undirected. Of course, a graph may also include data other than nodes and edges, such as labels of the nodes and labels of the edges. In an exemplary scenario of friend recommendation, each node in the graph may represent a user, each edge in the graph may represent a social relationship between different users, and the data of each node is the user's portrait data and behavior data, such as the user's age, occupation, hobbies, and education. As another example, in a commodity recommendation scenario, each node in the graph may represent a user or a commodity, and each edge may represent an interaction between a user and a commodity, such as a purchase relationship or a collection relationship. As another example, in a financial risk-control scenario, each node in the graph may represent an account, a transaction, or funds.
The interpreter of a graph is used to obtain, according to the information of the graph (for example, the information of the nodes and the information of the edges), the degree of influence of each node on the state of a certain node. In the prior art, graph interpretation is realized by a perturbation-based interpretation method. The idea of this method is as follows: the information of the graph is input into a graph neural network (GNN) that implements a target task (the target task may be, for example, predicting the state of a certain node); the input information of the graph is perturbed, and the output of the GNN model is observed, so as to determine the effect of each node in the input information on the predicted state of the node; this effect can be taken as the degree of influence of that node on the state of the node.
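For concreteness, the perturbation idea described above can be sketched as follows; this is our own toy illustration of the prior-art method, not the application's method, and `gnn` is assumed to be any callable implementing the target task with the signature shown:

```python
# Toy perturbation-based interpretation: zero out one node's features at a
# time and measure how much the GNN's prediction for a target node changes.
import torch

@torch.no_grad()
def perturbation_importance(gnn, node_feat, edge_index, edge_weight, target):
    base = gnn(node_feat, edge_index, edge_weight)[target]
    scores = []
    for i in range(node_feat.size(0)):
        perturbed = node_feat.clone()
        perturbed[i] = 0.0                     # perturb: remove node i's features
        out = gnn(perturbed, edge_index, edge_weight)[target]
        scores.append((out - base).abs().sum().item())  # larger = more influential
    return scores
```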
However, the above approach can only obtain the degree of influence of the nodes on the state of a node, and the interpretation result is incomplete (for example, the degree of influence of the edges between nodes on the state of the node cannot be obtained).
Disclosure of Invention
Compared with the prior art, the model training method provided by the application can obtain a more complete graph interpretation result.
The application provides a model training method, which is applied to a cloud side server or terminal equipment, and comprises the following steps:
Obtaining information of a graph, wherein the graph comprises a plurality of nodes and edges between the nodes, the information of the graph comprises information of the nodes and information of the edges, each node corresponds to an object, the information of a node comprises attributes of the object, and the information of an edge comprises the relationship between objects;
Taking the case where the object corresponding to a node is a person as an example, the attributes of the person may be at least one of gender, age, occupation, income, hobbies, and education, where the gender may be male or female, the age may be a number between 0 and 100, the occupation may be teacher, programmer, chef, and the like, the hobbies may be basketball, tennis, running, and the like, and the education may be primary school, junior high school, senior high school, university, and the like; the application does not limit the specific types of the attributes of the object. Taking the case where the object corresponding to a node is an item as an example, the item may be a physical item or a virtual item, for example an application (APP), audio/video, a web page, news information, and the like, and the attributes of the item may be at least one of item name, developer, installation-package size, category, and rating, where, taking an APP as an example, the category may be chat, racing game, office, and the like, and the rating may be scores, comments, and the like given for the item. Taking the case where the objects corresponding to the nodes are persons as an example, the relationship between the objects may be a kinship relationship or an economic relationship (such as equity association, trade association, and the like).
The method further comprises the steps of: obtaining a first feature representation of each node and a second feature representation of each edge according to the information of the graph; obtaining a first weight through a first neural network according to the first feature representation of each node, wherein the first weight is the weight of the node; fusing the first weight with the corresponding first feature representation to obtain a third feature representation; obtaining a second weight through a second neural network according to the second feature representation of each edge, wherein the second weight is the weight of the edge; obtaining a first loss through the graph neural network GNN according to the third feature representation and the second weight, wherein the first loss is used for determining a loss function; and updating the first neural network, the second neural network, and the GNN according to the loss function.
Fusing the first weight with the corresponding first feature representation is equivalent to applying a perturbation to the first feature representation, where the first neural network obtains the magnitude of the applied perturbation according to the first feature representation. Since the input to the subsequent task network (e.g., the GNN) is the feature after the perturbation is applied (i.e., the third feature representation), as the model is updated there is a trend that nodes with a larger influence on the accuracy of the task performed by the network are given larger and larger first weights (i.e., smaller and smaller perturbations), while nodes with a smaller influence are given smaller and smaller first weights (i.e., larger and larger perturbations); the first weight can therefore characterize the degree of influence of the node. Similarly, the second weight may be input into the subsequent task network (e.g., the GNN) as the weight applied to the edge when the task network processes the information of the corresponding edge; for example, there may exist in the task network a parameter set for the weight of each edge (typically the weight of each edge is the same by default), and this parameter may be set to the corresponding second weight. In this way, the perturbation applied to the second feature representation is equivalent to the effect of the second neural network, which obtains the magnitude of the applied perturbation according to the second feature representation. As the model is updated, there is a trend that edges with a larger influence on the accuracy of the task performed by the network are given larger and larger second weights (i.e., smaller and smaller perturbations), while edges with a smaller influence are given smaller and smaller second weights (i.e., larger and larger perturbations); the second weight can therefore characterize the degree of influence of the edge.
The first neural network and the second neural network obtained through the above training can be used as an interpreter of the graph to judge the importance of each node and of the relationships between nodes, which amounts to obtaining a more complete graph interpretation result compared with the prior art.
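To make the training step concrete, the following is a minimal PyTorch sketch under our own assumptions; the module names (Scorer, SimpleGNN), shapes, and layer choices are illustrative, not the application's actual implementation:

```python
# Hypothetical sketch of the training step described above. Scorer stands in
# for the first/second neural networks; SimpleGNN stands in for the task GNN.
import torch
import torch.nn as nn

class Scorer(nn.Module):
    """Maps a feature representation to a weight in (0, 1)."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))
    def forward(self, x):
        return torch.sigmoid(self.mlp(x))        # one weight per node/edge

class SimpleGNN(nn.Module):
    """One round of edge-weighted message passing followed by a classifier."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        self.cls = nn.Linear(dim, num_classes)
    def forward(self, h, edge_index, edge_weight):
        src, dst = edge_index                    # head/tail node indices, each [E]
        msg = self.lin(h)[src] * edge_weight.unsqueeze(-1)
        agg = torch.zeros_like(h).index_add_(0, dst, msg)  # sum messages per node
        return self.cls(torch.relu(h + agg))

def training_step(node_feat, edge_feat, edge_index, labels,
                  node_scorer, edge_scorer, gnn, criterion):
    w_node = node_scorer(node_feat)              # first weight (per node)
    h3 = w_node * node_feat                      # fusion: third feature representation
    w_edge = edge_scorer(edge_feat).squeeze(-1)  # second weight (per edge)
    logits = gnn(h3, edge_index, w_edge)
    return criterion(logits, labels)             # first loss
```

A training loop would backpropagate the returned first loss through both scorers and the GNN, matching the joint update of the three networks described above.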
In one possible implementation, the first weight may represent the forward degree of influence of the corresponding node on the GNN when executing the target task. However, it can happen that the processing accuracy of the network is high when the first weight of a node is set large, and remains high (or degrades only slightly) when the first weight of that node is set small; in that case the actual influence of the node is low even though its forward weight is large, so a weight of a single dimension (such as the forward influence degree alone) cannot accurately characterize the actual degree of influence of the node. In the embodiment of the application, the actual degree of influence of a node is therefore characterized accurately by passing weights of multiple dimensions through the feedforward process of the model.
In one possible implementation, a third weight may be obtained according to the first weight, where the third weight is a weight of a node and indicates the reverse (adverse) degree of influence of the corresponding node on the GNN when performing the target task; the third weight is fused with the corresponding first feature representation to obtain a fourth feature representation, and a second loss is obtained through the graph neural network GNN according to the fourth feature representation, where the second loss is used to determine the loss function. For example, when the first weight is larger, the third weight is smaller: the first loss represents the accuracy of the model when the weight of the node is large, and the second loss represents the accuracy of the model when the weight of the node is small. If the processing accuracy of the network is high when the first weight of a node is set large, and is still high (or degrades little) when the first weight is set small, the first weight will gradually become smaller as the model is updated, so the actual degree of influence of the node can be described more accurately, and the accuracy of the network is improved.
In one possible implementation, the first weight is represented as a positive number less than 1, and the sum of the third weight and the corresponding first weight is 1. For example, the first weight is 0.9 and the third weight is 0.1.
In one possible implementation, the first weight may be referred to as a positive mask of the node and the third weight may be referred to as a negative mask of the node.
Similarly, for the edges, a loss function may be constructed by using positive and negative masks. For example, the GNN is used to execute the target task, and the second weight indicates the forward degree of influence of the corresponding edge on the GNN when executing the target task; a fourth weight may be obtained according to the second weight, where the fourth weight is the weight of the edge and indicates the reverse degree of influence of the corresponding edge on the GNN when executing the target task; and a third loss is obtained through the graph neural network GNN according to the fourth weight, where the third loss is used to determine the loss function.
In one possible implementation, the second weight is represented as a positive number less than 1, and the sum of the fourth weight and the corresponding second weight is 1.
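Continuing the sketch above, the positive/negative-mask losses might be assembled as follows; how the three losses are combined into the final loss function is not specified in this text, so the signs in the last line are an assumption:

```python
# Hypothetical loss built from positive and negative masks, reusing
# Scorer/SimpleGNN from the previous sketch. The combination (reward accuracy
# under the positive masks, penalize accuracy under the negative masks) is
# our assumption, not a formula from the application.
def mask_losses(node_feat, edge_feat, edge_index, labels,
                node_scorer, edge_scorer, gnn, criterion):
    w_node = node_scorer(node_feat)              # first weight: positive node mask
    w_edge = edge_scorer(edge_feat).squeeze(-1)  # second weight: positive edge mask
    w_node_neg = 1.0 - w_node                    # third weight: sums with first to 1
    w_edge_neg = 1.0 - w_edge                    # fourth weight: sums with second to 1

    # first loss: model accuracy when highly weighted nodes/edges are kept
    loss1 = criterion(gnn(w_node * node_feat, edge_index, w_edge), labels)
    # second loss: accuracy when important nodes are suppressed (fourth feature repr.)
    loss2 = criterion(gnn(w_node_neg * node_feat, edge_index, w_edge), labels)
    # third loss: accuracy when important edges are suppressed
    loss3 = criterion(gnn(w_node * node_feat, edge_index, w_edge_neg), labels)

    # assumed combination: important elements should help (low loss1) and
    # their removal should hurt (high loss2 / loss3)
    return loss1 - loss2 - loss3
```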
In one possible implementation, the embedded representation of each node may be fused (e.g., concatenated) with the information of the node to obtain the first feature representation, which can characterize the heterogeneous information of the node.
For each edge, the first feature representations of the nodes at the two ends of the edge and the information of the edge may be fused to obtain the second feature representation of the edge, which characterizes the heterogeneous information of the edge.
In this way, in heterogeneous graphs that include different types of nodes, different types of nodes correspond to features of different dimensions, and features of the same dimension may have different meanings; different types of edges in a heterogeneous graph likewise require differentiated characterization. For a node, this embodiment obtains a feature characterizing the node's heterogeneous information by fusing the graph-structure information (the embedded representation of the node) with the original feature (the information of the node). For an edge, this embodiment obtains a feature characterizing the edge's heterogeneous information by fusing the embedded representations of the head and tail nodes of the edge with the attributes carried by the edge (the information of the edge). This realizes accurate heterogeneous-information representation of nodes and edges in a heterogeneous graph.
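A minimal sketch of this feature construction, assuming concatenation as the fusion operation (the text also mentions weighting); tensor names and dimensions are illustrative:

```python
# Hypothetical feature construction: node features fuse the structural
# embedding with the node's own attributes; edge features fuse the two
# endpoint features with the edge's attributes.
import torch

def build_features(node_emb, node_info, edge_index, edge_info):
    # first feature representation: [embedding | raw node attributes]
    h = torch.cat([node_emb, node_info], dim=-1)        # [N, d_emb + d_node]
    src, dst = edge_index                               # head/tail node indices
    # second feature representation: [head feature | tail feature | edge attributes]
    e = torch.cat([h[src], h[dst], edge_info], dim=-1)  # [E, 2*(d_emb+d_node) + d_edge]
    return h, e
```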
In one possible implementation, the first feature representation includes features of multiple dimensions, the first weight including a weight corresponding to the features of each dimension, or
The second feature representation includes features of multiple dimensions, and the second weight includes a weight corresponding to the features of each dimension.
In one possible implementation, the first neural network or the second neural network is a neural network based on an attention mechanism.
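The text leaves the scorer architecture open beyond being attention-based; one possible realization, which also produces the per-dimension weights mentioned above, is sketched below under our own assumptions:

```python
# Hypothetical attention-style scorer: produces one weight per feature
# dimension rather than a single scalar. The query/key formulation is an
# assumption, not the application's stated design.
import torch
import torch.nn as nn

class AttentionScorer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
    def forward(self, x):                     # x: [N, dim] or [E, dim]
        logits = self.query(x) * self.key(x)  # element-wise attention logits
        return torch.sigmoid(logits)          # per-dimension weights in (0, 1)
```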
In one possible implementation, the fusing includes weighting.
In one possible implementation, the objects are persons, different nodes correspond to different persons, the edges indicate kinship or economic relationships between the persons, and the GNN is used for predicting, according to the information of the graph, whether at least one person is subject to economic risk.
In a second aspect, an embodiment of the present application provides a data processing method, including:
Obtaining information of a graph, wherein the graph comprises a plurality of nodes and edges between the nodes, the information of the graph comprises information of the nodes and information of the edges, each node corresponds to an object, the information of a node comprises attributes of the object, and the information of an edge comprises the relationship between objects;
Obtaining a first feature representation of each node and a second feature representation of each edge according to the information of the graph;
Obtaining a first weight through a first neural network according to the first feature representation of each node, wherein the first weight is the weight of the node;
Obtaining a second weight through a second neural network according to the second feature representation of each edge, wherein the second weight is the weight of the edge and is used for representing the importance of the corresponding edge in the graph.
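At inference time the two trained scorers alone serve as the graph's interpreter; a hedged sketch, reusing `build_features` from the earlier sketch (all names remain illustrative):

```python
# Hypothetical inference: the trained scorers produce node and edge
# importance directly from the graph's information; the task GNN is not
# needed at explanation time.
import torch

@torch.no_grad()
def explain_graph(node_emb, node_info, edge_index, edge_info,
                  node_scorer, edge_scorer):
    h, e = build_features(node_emb, node_info, edge_index, edge_info)
    node_importance = node_scorer(h)  # first weights: importance of each node
    edge_importance = edge_scorer(e)  # second weights: importance of each edge
    return node_importance, edge_importance
```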
In one possible implementation, the first feature representation includes an embedded representation (embedding) of the node obtained through the feature network and information of the corresponding node, or
The second feature representation includes a first feature representation of nodes at both ends of the edge and information of the corresponding edge.
In one possible implementation, the first feature representation includes features of multiple dimensions, the first weight including a weight corresponding to the features of each dimension, or
The second feature representation includes features of multiple dimensions, and the second weight includes a weight corresponding to the features of each dimension.
In one possible implementation, the first neural network or the second neural network is a neural network based on an attention mechanism.
In a third aspect, the present application provides a model training apparatus, the apparatus comprising:
The acquisition module is used for acquiring information of a graph, wherein the graph comprises a plurality of nodes and edges between the nodes, the information of the graph comprises information of the nodes and information of the edges, each node corresponds to an object, the information of a node comprises attributes of the object, and the information of an edge comprises the relationship between objects;
the processing module is used for obtaining a first feature representation of each node and a second feature representation of each edge according to the information of the graph;
obtaining a first weight through a first neural network according to the first feature representation of each node, wherein the first weight is the weight of the node;
obtaining a second weight through a second neural network according to the second feature representation of each edge, wherein the second weight is the weight of the edge;
fusing the first weight with the corresponding first feature representation to obtain a third feature representation, and obtaining a first loss through the graph neural network GNN according to the third feature representation and the second weight, wherein the first loss is used for determining a loss function;
and the model updating module is used for updating the first neural network, the second neural network, and the GNN according to the loss function.
The first neural network and the second neural network obtained through training in the above manner can be used as an interpreter of the graph to judge the importance degree of each node and the relation between the nodes, which is equivalent to obtaining a more complete graph interpretation result compared with the prior art.
In one possible implementation, the GNN is configured to execute a target task, the first weight indicates a forward influence degree of the corresponding node on the GNN when executing the target task, and the obtaining module is further configured to:
Acquiring a third weight according to the first weight, wherein the third weight is the weight of the node, the third weight indicates the reverse degree of influence of the corresponding node on the GNN when the target task is executed, and the third weight is used for being fused with the corresponding first feature representation to obtain a fourth feature representation;
and the processing module is further used for obtaining a second loss through the graph neural network GNN according to the fourth feature representation, wherein the second loss is used for determining the loss function.
In one possible implementation, the first weight is represented as a positive number less than 1, and the sum of the third weight and the corresponding first weight is 1.
In one possible implementation, the GNN is configured to perform a target task, the second weight indicates a forward influence degree of the corresponding edge on the GNN when performing the target task, and the obtaining module is further configured to:
Acquiring a fourth weight according to the second weight, wherein the fourth weight is the weight of the edge, and the fourth weight indicates the reverse influence degree of the corresponding edge on the GNN when the target task is executed;
And the processing module is further used for obtaining a third loss through the graph neural network GNN according to the fourth weight, and the third loss is used for determining a loss function.
In one possible implementation, the second weight is represented as a positive number less than 1, and the sum of the fourth weight and the corresponding second weight is 1.
In one possible implementation, the first feature representation includes an embedded representation (embedding) of the node obtained through the feature network and information of the corresponding node, or
The second feature representation includes embedded representations of nodes at both ends of the edge and information of the edge.
In one possible implementation, the first feature representation includes features of multiple dimensions, the first weight including a weight corresponding to the features of each dimension, or
The second feature representation includes features of multiple dimensions, and the second weight includes a weight corresponding to the features of each dimension.
In one possible implementation, the first neural network or the second neural network is a neural network based on an attention mechanism.
In one possible implementation, the fusing includes weighting.
In one possible implementation, the objects are persons, different nodes correspond to different persons, the edges indicate kinship or economic relationships between the persons, and the GNN is used for predicting, according to the information of the graph, whether at least one person is subject to economic risk.
In a fourth aspect, an embodiment of the present application provides a data processing apparatus, including:
an acquisition module, used for acquiring information of a graph, wherein the graph comprises a plurality of nodes and edges between the nodes, the information of the graph comprises information of the nodes and information of the edges, each node corresponds to an object, the information of a node comprises attributes of the object, and the information of an edge comprises the relationship between objects;
the processing module is used for obtaining a first feature representation of each node and a second feature representation of each edge according to the information of the graph;
obtaining a first weight through a first neural network according to the first feature representation of each node, wherein the first weight is the weight of the node;
and obtaining a second weight through a second neural network according to the second feature representation of each edge, wherein the second weight is the weight of the edge and is used for representing the importance of the corresponding edge in the graph.
In one possible implementation, the first feature representation includes an embedded representation (embedding) of the node obtained through the feature network and information of the corresponding node, or
The second feature representation includes a first feature representation of nodes at both ends of the edge and information of the corresponding edge.
In one possible implementation, the first feature representation includes features of multiple dimensions, the first weight including a weight corresponding to the features of each dimension, or
The second feature representation includes features of multiple dimensions, and the second weight includes a weight corresponding to the features of each dimension.
In one possible implementation, the first neural network or the second neural network is a neural network based on an attention mechanism.
In a fifth aspect, an embodiment of the present application provides a training device, which may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to execute the program in the memory, so as to perform any of the optional methods according to the first aspect.
In a sixth aspect, an embodiment of the present application provides an execution apparatus, which may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to execute the program in the memory, so as to perform any of the optional methods of the second aspect.
In a seventh aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored therein which, when run on a computer, causes the computer to perform the method of the first aspect and any optional implementation thereof, or the method of the second aspect and any optional implementation thereof.
In an eighth aspect, embodiments of the present application provide a computer program product comprising code which, when executed, carries out the method of the first aspect and any optional implementation thereof, or the method of the second aspect and any optional implementation thereof.
In a ninth aspect, the application provides a chip system comprising a processor configured to support an execution device or training device in implementing the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the execution device or the training device. The chip system may consist of chips, or may include chips and other discrete devices.
Drawings
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 7 is a schematic representation of a loss function;
FIG. 8 is a schematic illustration of a flow of a model training method;
FIG. 9 is a schematic diagram of a graph interpretation result;
FIG. 10 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a model training device according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 13 is a schematic diagram of an execution device according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a training device according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a chip according to an embodiment of the present application.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments of the invention only and is not intended to be limiting of the invention.
Embodiments of the present application are described below with reference to the accompanying drawings. As one of ordinary skill in the art can know, with the development of technology and the appearance of new scenes, the technical scheme provided by the embodiment of the application is also applicable to similar technical problems.
The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely illustrative of the manner in which embodiments of the application have been described in connection with the description of the objects having the same attributes. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
First, an application scenario of the present application is described. The present application may be, but is not limited to, applied to a graph interpretation class application program, or to a cloud service provided by a cloud side server; these are described separately below:
1. Graph interpretation class application
The product form of embodiments of the application may be a graph interpretation class application. The graph interpretation class application may run on a terminal device or on a server on the cloud side.
In one possible implementation, the graph interpretation class application may generate the importance of the nodes and the degree of association between nodes based on the information of the input graph (including information of the nodes and information of the edges). In the embodiment of the application, the information of the nodes may be attribute information of objects, and the information of the edges may be relationships between the objects. The attribute information may be of various kinds: taking persons as the objects, it may include but is not limited to gender, age, occupation, hobbies, and the like; the objects may also be items, such as applications (APP), in which case the object features extracted from training samples of an APP market may be the name (identifier), type, size, and the like of the APP, and the object features extracted from training samples of an e-commerce APP may be the name of a commodity, the category of the commodity, the price interval, and the like. The relationships between objects may be kinship or economic relationships (such as equity association, trade association, and the like).
The graph interpretation class application in the embodiments of the present application is next described separately from the functional architecture and the product architecture implementing the functions.
Referring to fig. 1, fig. 1 is a schematic diagram of the functional architecture of the graph interpretation class application in an embodiment of the present application:
In one possible implementation, embodiments of the present application include a system (e.g., a graph interpretation class application) capable of generating the importance of nodes and the degree of association between nodes based on the information of an input graph, where inputting different parameter values to the system causes different graph interpretation results to be generated. As shown in FIG. 1, a graph interpretation class application 102 may receive input parameters 101 and produce a graph interpretation result 103. The graph interpretation class application 102 is executable on at least one computer system, for example, and includes computer code that, when executed by one or more computers, causes the computers to perform a graph-interpretation-related method.
In one possible implementation, the graph interpretation class software may run in a terminal device on the end side or in a server on the cloud side.
For example, the terminal device may have the graph interpretation class software installed, and the actions including data input, data processing, and data output may all be performed by the terminal device.
For example, the terminal device may be provided with a client of the graph interpretation class software, and the actions of data input and data output may be performed by the terminal device; that is, the terminal device may transmit the data required for data processing to a cloud side server, and after the cloud side server performs the data processing, the processing result is returned to the terminal device on the end side, which produces output based on the processing result.
The physical architecture for running the graph interpretation class application in an embodiment of the present application is described next.
Referring to fig. 2, fig. 2 is a schematic diagram of the physical architecture of a graph interpretation class application according to an embodiment of the present application:
Referring to fig. 2, fig. 2 shows a schematic diagram of a system architecture. The system may include a terminal 100 and a server 200. Wherein the server 200 may include one or more servers (illustrated in fig. 2 as including one server as an example), the server 200 may provide illustration services for one or more terminals.
The terminal 100 may have a graph interpretation class application installed on it, or may open a web page related to graph interpretation; the application and the web page may provide a graph interpretation interface. The terminal 100 may receive the relevant parameters input by the user on the graph interpretation interface and send those parameters to the server 200, and the server 200 may obtain a processing result based on the received parameters and return the processing result to the terminal 100.
It should be appreciated that in some alternative implementations, the terminal 100 may also derive the graph interpretation result by itself based on the received parameters, without requiring the cooperation of a server, and embodiments of the present application are not limited in this regard.
Next, the product form of the terminal 100 of fig. 2 will be described;
the terminal 100 in the embodiment of the present application may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), or the like, which is not limited in the embodiment of the present application.
Fig. 3 shows an alternative hardware configuration of the terminal 100.
Referring to fig. 3, the terminal 100 may include a radio frequency unit 110, a memory 120, an input unit 130, a display unit 140, a camera 150 (optional), an audio circuit 160 (optional), a speaker 161 (optional), a microphone 162 (optional), a processor 170, an external interface 180, a power supply 190, and the like. It will be appreciated by those skilled in the art that fig. 3 is merely an example of a terminal or multifunction device and is not limiting of the terminal or multifunction device and may include more or fewer components than shown, or may combine certain components, or different components.
The input unit 130 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the portable multifunction device. In particular, the input unit 130 may comprise a touch screen 131 (optional) and/or other input devices 132. The touch screen 131 may collect touch operations on or near the user (e.g., operations of the user on or near the touch screen using any suitable object such as a finger, a joint, a stylus, etc.), and drive the corresponding connection means according to a preset program. The touch screen may detect a touch action of a user on the touch screen, convert the touch action into a touch signal, send the touch signal to the processor 170, and receive a command sent by the processor 170 and execute the command, where the touch signal includes at least touch point coordinate information. The touch screen 131 may provide an input interface and an output interface between the terminal 100 and a user. In addition, the touch screen may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 130 may include other input devices in addition to the touch screen 131. In particular, other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys 132, switch keys 133, etc.), a trackball, mouse, joystick, etc.
The input device 132 may receive parameters related to graph interpretation, such as the information of the graph in the embodiment of the present application.
The display unit 140 may be used to display information input by a user or information provided to the user, the various menus of the terminal 100, an interactive interface, file display, and/or play of any of the multimedia files. In an embodiment of the present application, the display unit 140 may be used to display the interface of the graph interpretation class application, a schematic of the graph interpretation result, and the like.
The memory 120 may be used to store instructions and data, and the memory 120 may mainly include a storage instruction area and a storage data area, where the storage data area may store various data, such as multimedia files, text, etc., and the storage instruction area may store software elements, such as operating systems, applications, instructions required for at least one function, etc., or a subset, an extension set thereof. And may also include non-volatile random access memory, and providing processor 170 with support for controlling software and applications, including managing hardware, software, and data resources in the computing processing device. And is also used for storing multimedia files and storing running programs and applications.
The processor 170 is a control center of the terminal 100, connects various parts of the entire terminal 100 using various interfaces and lines, and performs various functions of the terminal 100 and processes data by executing or executing instructions stored in the memory 120 and calling data stored in the memory 120, thereby controlling the terminal device as a whole. Alternatively, the processor 170 may include one or more processing units, and preferably the processor 170 may integrate an application processor and a modem processor, wherein the application processor primarily processes operating systems, user interfaces, application programs, etc., and the modem processor primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 170. In some embodiments, the processor, memory, may be implemented on a single chip, or they may be implemented separately on separate chips in some embodiments. The processor 170 may be further configured to generate corresponding operation control signals to corresponding components of the computing processing device, and to read and process data in the software, and in particular, to read and process data and programs in the memory 120, so that each functional module therein performs a corresponding function, thereby controlling the corresponding components to act as required by the instructions.
The memory 120 may be used to store software code related to the graph interpretation method, and the processor 170 may execute the steps of the graph interpretation method, and may also schedule other units (such as the input unit 130 and the display unit 140) to implement corresponding functions.
The radio frequency unit 110 (optional) may be configured to receive and transmit information or signals during a call, for example, to receive downlink information from a base station, pass it to the processor 170 for processing, and transmit uplink data to the base station. Typically, RF circuitry includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the radio frequency unit 110 may also communicate with network devices and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including, but not limited to, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
In this embodiment of the present application, the rf unit 110 may send parameters such as information of the graph to the server 200, and receive the graph interpretation result sent by the server 200.
It should be appreciated that the radio frequency unit 110 is optional and may be replaced with another communication interface, such as a network port.
The terminal 100 also includes a power supply 190 (e.g., a battery) for powering the various components, which may be logically connected to the processor 170 via a power management system, such as a power management system that performs functions such as charge, discharge, and power consumption management.
The terminal 100 further includes an external interface 180, which may be a standard Micro USB interface, or a multi-pin connector, which may be used to connect the terminal 100 to communicate with other devices, or may be used to connect a charger to charge the terminal 100.
Although not shown, the terminal 100 may further include a flash, a wireless fidelity (WiFi) module, a Bluetooth module, sensors with different functions, and the like, which are not described herein. Some or all of the methods described hereinafter may be applied to the terminal 100 shown in fig. 3.
Next, the product form of the server 200 in fig. 4 will be described;
Fig. 4 provides a schematic structural diagram of a server 200, and as shown in fig. 4, the server 200 includes a bus 201, a processor 202, a communication interface 203, and a memory 204. Communication between processor 202, memory 204, and communication interface 203 is via bus 201.
Bus 201 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 4, but this does not mean there is only one bus or one type of bus.
The processor 202 may be any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 204 may include volatile memory, such as random access memory (RAM). The memory 204 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 204 may be used to store software code related to the graph interpretation method, and the processor 202 may execute the steps of the graph interpretation method, or may schedule other units to implement corresponding functions.
It should be understood that the terminal 100 and the server 200 may be centralized or distributed devices, and the processors (e.g., the processor 170 and the processor 202) in the terminal 100 and the server 200 may be hardware circuits (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller), or a combination of these hardware circuits. For example, the processor may be a hardware system with an instruction execution function, such as a CPU or DSP, or a hardware system without an instruction execution function, such as an ASIC or FPGA, or a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function.
It should be understood that the graph interpretation method in the embodiments of the present application involves AI-related operations; when performing AI operations, the instruction execution architecture of the terminal device and the server is not limited to the processor-plus-memory architecture of fig. 3 and fig. 4. The system architecture provided by the embodiment of the present application is described in detail below with reference to fig. 5.
Fig. 5 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in fig. 5, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data acquisition system 560.
The execution device 510 includes a computing module 511, an I/O interface 512, a preprocessing module 513, and a preprocessing module 514. The calculation module 511 may include a target model/rule 501 therein, with the preprocessing module 513 and preprocessing module 514 being optional.
The executing device 510 may be a terminal device or a server running a graph interpretation class application.
The data acquisition device 560 is used to acquire training samples. The training samples may be attribute information of objects and relationships between the objects. The attribute information may be of various kinds: taking persons as the objects, it may include but is not limited to gender, age, occupation, hobbies, and the like; the objects may also be items, such as applications (APP), in which case the object features extracted from training samples of an APP market may be the name (identifier), type, size, and the like of the APP, and the object features extracted from training samples of an e-commerce APP may be the name of a commodity, the category of the commodity, the price interval, and the like. The relationships between objects may be kinship or economic relationships (such as equity association, trade association, and the like). Label features may be used to indicate whether a sample is a positive or a negative example, for example whether a person is subject to economic risk. After the training samples are collected, the data acquisition device 560 stores the training samples in the database 530.
The training device 520 may train the neural networks to be trained (e.g., the first neural network, the second neural network, and the graph neural network in the embodiments of the present application) based on the training samples maintained in the database 530, to obtain the target model/rule 501.
It should be noted that, in practical applications, the training samples maintained in the database 530 are not necessarily all acquired by the data acquisition device 560, but may be received from other devices. It should be further noted that the training device 520 is not necessarily completely based on the training samples maintained by the database 530 to perform training of the target model/rule 501, and it is also possible to obtain the training samples from the cloud or other places to perform model training, which should not be taken as a limitation of the embodiments of the present application.
The target model/rule 501 obtained by training according to the training device 520 may be applied to different systems or devices, such as the execution device 510 shown in fig. 5, where the execution device 510 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (augmented reality, AR)/Virtual Reality (VR) device, a vehicle-mounted terminal, or a server.
Specifically, the training device 520 may pass the trained model to the execution device 510.
In fig. 5, an execution device 510 is configured with an input/output (I/O) interface 512 for data interaction with external devices, and a user may input data (e.g., the information of the graph in the embodiments of the present application) to the I/O interface 512 through a client device 540.
The preprocessing module 513 and the preprocessing module 514 are used for preprocessing according to the input data received by the I/O interface 512. It should be appreciated that there may be no pre-processing module 513 and pre-processing module 514 or only one pre-processing module. When the preprocessing module 513 and the preprocessing module 514 are not present, the calculation module 511 may be directly employed to process the input data.
In preprocessing input data by the execution device 510, or in performing processing related to computation or the like by the computation module 511 of the execution device 510, the execution device 510 may call data, codes or the like in the data storage system 550 for corresponding processing, or may store data, instructions or the like obtained by corresponding processing in the data storage system 550.
Finally, the I/O interface 512 provides the processing result (e.g., the graph interpretation result in the embodiments of the present application) to the client device 540, and thus to the user.
In the case shown in FIG. 5, the user may manually give input data, which may be manipulated through an interface provided by I/O interface 512. In another case, the client device 540 may automatically send the input data to the I/O interface 512, and if the client device 540 is required to automatically send the input data requiring authorization from the user, the user may set the corresponding permissions in the client device 540. The user may view the results output by the execution device 510 at the client device 540, and the specific presentation may be in the form of a display, a sound, an action, or the like. The client device 540 may also be used as a data collection terminal to collect input data from the input I/O interface 512 and output data from the output I/O interface 512 as new sample data, and store the new sample data in the database 530. Of course, instead of being collected by the client device 540, the I/O interface 512 may directly store the input data of the I/O interface 512 and the output result of the I/O interface 512 as new sample data into the database 530.
It should be noted that fig. 5 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 5, the data storage system 550 is an external memory with respect to the execution device 510, and in other cases, the data storage system 550 may be disposed in the execution device 510. It should be appreciated that the execution device 510 described above may be deployed in a client device 540.
From the inference side of the model:
in an embodiment of the present application, the computing module 511 of the execution device 520 may obtain codes stored in the data storage system 550 to implement an illustrative method.
In an embodiment of the present application, the computing module 511 of the execution device 520 may include a hardware circuit (such as an application SPECIFIC INTEGRATED circuit, ASIC), a field-programmable gate array (FPGA), a general purpose processor, a digital signal processor (DIGITAL SIGNAL processing, DSP), a microprocessor, or a microcontroller, etc.), or a combination of these hardware circuits, for example, the training device 520 may be a hardware system with an instruction execution function, such as a CPU, DSP, etc., or a hardware system without an instruction execution function, such as an ASIC, FPGA, etc., or a combination of the above hardware systems without an instruction execution function and a hardware system with an instruction execution function.
Specifically, the computing module 511 of the execution device 520 may be a hardware system with an instruction execution function, and the connection relation prediction method provided by the embodiment of the present application may be a software code stored in a memory, and the computing module 511 of the execution device 520 may obtain the software code from the memory and execute the obtained software code to implement the illustrated method provided by the embodiment of the present application.
It should be understood that the computation module 511 of the execution device 510 may be a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function; some steps of the illustrated method provided in the embodiment of the present application may also be implemented by the hardware system without an instruction execution function in the computation module 511, which is not limited herein.
From the training side of the model:
In the embodiment of the present application, the training device 520 may obtain code stored in a memory (not shown in fig. 5; the memory may be integrated into the training device 520 or disposed separately from it) to implement the steps related to model training in the embodiment of the present application.
In an embodiment of the present application, the training device 520 may include a hardware circuit (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller) or a combination of such hardware circuits. For example, the training device 520 may be a hardware system with an instruction execution function, such as a CPU or a DSP, or a hardware system without an instruction execution function, such as an ASIC or an FPGA, or a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function.
It should be understood that the training device 520 may be a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function; some of the steps related to model training according to the embodiments of the present application may also be implemented by the hardware system without an instruction execution function in the training device 520, which is not limited herein.
2. Cloud services provided by a server:
In one possible implementation, the server may provide graph interpretation services to the end side through an application programming interface (application programming interface, API).
The terminal device may send relevant parameters (such as information of a graph) to the server through an API provided by the cloud, and the server may obtain a processing result based on the received parameters, and return the processing result (such as a graph interpretation result) to the terminal.
For the description of the terminal and the server, reference may be made to the above embodiments; details are not repeated here.
Because the embodiments of the present application involve extensive use of neural networks, for ease of understanding, related terms and concepts of the neural networks involved in the embodiments of the present application are described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an arithmetic unit that takes $x_s$ (i.e., input data) and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

$$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is an activation function of the neural unit, used to introduce a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together many of the above single neural units, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of that local receptive field, and the local receptive field may be an area composed of several neural units.
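For orientation only, the computation of a single neural unit can be sketched in a few lines (a minimal example; the names and the choice of sigmoid are illustrative, not part of the embodiments):

```python
import torch

def neural_unit(x, w, b):
    """Single neural unit: weighted sum of the inputs plus a bias,
    passed through a sigmoid activation, i.e. f(sum_s W_s * x_s + b)."""
    return torch.sigmoid(torch.dot(w, x) + b)

# Example with n = 3 input components x_s.
x = torch.tensor([0.5, -1.0, 2.0])
w = torch.tensor([0.1, 0.4, -0.2])
b = torch.tensor(0.3)
print(neural_unit(x, w, b))  # a scalar output signal in (0, 1)
```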
(2) Deep neural network
Deep neural networks (Deep Neural Network, DNN), also known as multi-layer neural networks, can be understood as neural networks having many hidden layers; there is no particular metric for "many" here. According to the positions of the different layers, the layers of a DNN can be divided into three types: the input layer, the hidden layers, and the output layer. Typically the first layer is the input layer, the last layer is the output layer, and all intermediate layers are hidden layers. The layers are fully connected, that is, any neuron in the $i$-th layer is connected to every neuron in the $(i+1)$-th layer. Although a DNN appears complex, the work of each layer is not complex: it is simply the linear relational expression $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since the number of DNN layers is large, the coefficients $W$ and offset vectors $\vec{b}$ are also numerous. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the $k$-th neuron of the $(L-1)$-th layer to the $j$-th neuron of the $L$-th layer is defined as $W^{L}_{jk}$. It should be noted that the input layer has no $W$ parameters. In deep neural networks, more hidden layers make the network more capable of characterizing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", meaning that it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final objective is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of the many layers).
(3) Graph (Graph):
A graph is a data structure including at least one node and at least one edge. In some scenarios, the nodes in a graph may map to entities, and the edges in a graph may map to relationships between entities. The graph may be a directed graph or an undirected graph. Of course, the graph may also include data other than nodes and edges, such as labels of the nodes and labels of the edges. In an exemplary scenario of friend recommendation, each node in the graph may represent a user, each edge may represent a social relationship between different users, and the data of each node is portrait data and behavioral data of the user, such as the user's age, occupation, hobbies, and education. As another example, in a merchandise recommendation scenario, each node in the graph may represent a user or a commodity, and each edge may represent an interaction relationship between a user and a commodity, such as a purchase relationship or a collection relationship. As another example, in a financial risk control scenario, each node in the graph may represent an account, a transaction, or funds, and the edges may represent flow relationships of funds; for example, a loop in the graph may represent recurring transfers. As yet another example, in the scenario of determining connection relationships between network elements in a network system, each node in the graph may represent a network element, such as a router, a switch, or a terminal, and each edge may represent a connection relationship between different network elements.
(4) Graph neural network (graph neural network, GNN):
A GNN is a deep learning method that exploits structural information and can be used to compute the current state of a node. The information propagation of a graph neural network is carried out according to a given graph structure, and the state of each node can be updated according to its adjacent nodes. Specifically, according to the structure around the current node, a neural network is used as the aggregation function of node information: the information of all adjacent nodes is passed to the current node and combined with the current node's own state to update it.
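As a rough sketch of this message passing (a minimal homogeneous-graph example, not the specific networks of the embodiments), each node sums the states of its adjacent nodes and combines the aggregate with its own state through a small neural network:

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    """One round of message passing: aggregate the states of adjacent
    nodes, then update each node's state from (own state, aggregate)."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):
        # h: [num_nodes, dim] node states; adj: [num_nodes, num_nodes] adjacency matrix.
        agg = adj @ h  # sum of the states of adjacent nodes
        return torch.relu(self.update(torch.cat([h, agg], dim=-1)))
```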
(5) Loss function
In training a deep neural network, since the output of the deep neural network is expected to be as close as possible to the value actually desired, the weight vectors of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually desired target value, according to the difference between the two (of course, there is usually an initialization process before the first update, i.e., pre-configuring parameters for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted so that it predicts lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Thus, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes the process of reducing this loss as much as possible.
(6) Back propagation algorithm
A convolutional neural network can use the back propagation (back propagation, BP) algorithm to correct the parameters in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, the input signal is passed forward until the output produces an error loss, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, and aims at obtaining the parameters of the optimal super-resolution model, such as the weight matrices.
(7) Attention mechanism (attention mechanism)
The attention mechanism mimics the internal process of biological observation behavior, i.e., a mechanism that aligns internal experience with external sensation to increase the fineness of observation in a partial region, enabling rapid screening of high-value information from a large amount of information with limited attention resources. Attention mechanisms can quickly extract important features of sparse data and are thus widely used for natural language processing tasks, particularly machine translation. The self-attention mechanism (self-attention mechanism) is an improvement of the attention mechanism that reduces reliance on external information and is better at capturing the internal dependencies of data or features. The essential idea of the attention mechanism can be written as the following formula:

$$\text{Attention}(Query, Source) = \sum_{i=1}^{L_x} \text{Similarity}(Query, Key_i) \cdot Value_i$$
where $L_x = |\mathrm{Source}|$ represents the length of Source. The meaning of the formula is that the constituent elements in Source are imagined as a series of (Key, Value) data pairs; given an element Query in the Target, the weight coefficient of the Value corresponding to each Key is obtained by computing the similarity or correlation between the Query and that Key, and the Values are then weighted and summed to obtain the final Attention value. In essence, the attention mechanism performs a weighted summation over the Values of the elements in Source, with Query and Key used to compute the weight coefficients of the corresponding Values. Conceptually, attention can be understood as selectively screening a small amount of important information out of a large amount of information and focusing on it, while ignoring most of the unimportant information. The focusing process is embodied in the computation of the weight coefficients: the larger a weight, the more focus falls on its corresponding Value; that is, the weight represents the importance of the information, and the Value is the corresponding information. The self-attention mechanism can be understood as internal attention (intra attention): whereas the attention mechanism occurs between the element Query of the Target and all elements in the Source, the self-attention mechanism refers to an attention mechanism occurring among the elements within the Source or within the Target, and may also be understood as the attention mechanism in the special case Target = Source; the specific computation process is the same, only the computation objects change.
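A minimal instance of the formula above, taking the similarity as a dot product normalized by softmax (one common choice, not mandated by the text):

```python
import torch
import torch.nn.functional as F

def attention(query, keys, values):
    """Attention(Query, Source) = sum_i Similarity(Query, Key_i) * Value_i,
    with the similarities normalized into weight coefficients."""
    scores = keys @ query               # dot-product similarity, shape [Lx]
    weights = F.softmax(scores, dim=0)  # weight coefficients over the Lx pairs
    return weights @ values            # weighted sum of the Values

q = torch.randn(8)        # element Query in the Target
K = torch.randn(5, 8)     # Lx = 5 Key-Value pairs in the Source
V = torch.randn(5, 16)
print(attention(q, K, V).shape)  # torch.Size([16])
```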
An interpreter of a graph is used to obtain, according to the information of the graph (for example, the information of the nodes and the information of the edges), the degree of influence of each node on the state of a certain node. In the prior art, graph interpretation is realized by perturbation-based interpretation methods. The idea is to input the information of the graph into a graph neural network (graph neural network, GNN) that implements a target task (the target task may be predicting the state of a certain node), perturb the input information of the graph, and observe the output of the GNN model, thereby determining the effect of each node in the input information on the predicted node state; this effect can be used as the degree of influence of each node on the state of the node in question.
However, the above approach can only obtain the degree of influence of nodes on a node's state, so the interpretation result is incomplete (for example, the degree of influence of the edges between nodes on a node's state cannot be obtained).
To solve the above problems, the present application provides a model training method. Referring to fig. 6, fig. 6 is a schematic diagram of an embodiment of the model training method provided by an embodiment of the present application. As shown in fig. 6, the model training method includes:
601. Obtain information of a graph, where the graph includes a plurality of nodes and edges between the nodes, the information of the graph includes information of the nodes and information of the edges, each node corresponds to an object, the information of a node includes attributes of the object, and the information of an edge includes the relationship between objects.
In the embodiment of the present application, the execution body of step 601 may be a server on the cloud side: the server may receive the information of the graph sent from the terminal device and thereby obtain the information of the graph.
In an embodiment of the present application, the execution subject of step 601 may be a terminal device, which may be a portable mobile device, such as, but not limited to, a mobile or portable computing device (e.g., a smart phone), a personal computer, a server computer, a handheld device (e.g., a tablet) or laptop, a multiprocessor system, a game console or controller, a microprocessor-based system, a set top box, a programmable consumer electronics, a mobile phone, a mobile computing and/or communication device with a wearable or accessory form factor (e.g., a watch, glasses, a headset or an earplug), a network PC, a minicomputer, a mainframe computer, a distributed computing environment including any of the above systems or devices, and the like.
For convenience of description, the execution body is referred to below as the training device, without distinguishing its specific form.
For the graph to be interpreted, its information can be obtained: the graph may include a plurality of nodes and edges between the nodes, the information of the graph includes information of the nodes and information of the edges, each node corresponds to one object, the information of a node includes attributes of the object, and the information of an edge includes the relationship between objects.
Taking the case where the object corresponding to a node is a person as an example, the attributes of the person may be at least one of gender, age, occupation, income, hobbies, and education level. The gender may be male or female; the age may be a number between 0 and 100; the occupation may be teacher, programmer, chef, and the like; the hobbies may be basketball, tennis, running, and the like; and the education level may be primary school, junior high school, high school, university, and the like. The application does not limit the specific types of the attributes of the object.
Taking the case where the object corresponding to a node is an item as an example, the item may be a physical item or a virtual item, for example an application (APP), audio/video, a web page, news information, and the like. The attributes of the item may be at least one of the item name, developer, installation package size, category, and favorable rating. Taking an application as the item, the category may be chat, running games, office, and the like, and the favorable rating may be a score, a comment, or the like given for the item.
Taking the case where the objects corresponding to the nodes are persons as an example, the relationship between the objects may be a kinship relationship or an economic relationship (such as equity association, trade association, and the like).
In training, the information of the graph may be used as the input of the neural network to be trained. In addition, label information (or the ground truth) used in the training process may be obtained; it is specifically related to the task to be implemented by the GNN in the neural network to be trained. For example, the GNN may be used to predict the state of the object corresponding to a node (for example, to solve the classification problem corresponding to that state): the GNN may predict whether an economic risk exists for the object corresponding to each node (for example, whether insufficient repayment capability may occur), and the label information may then indicate whether an economic risk actually occurred for the object corresponding to the node.
602. Obtain, according to the information of the graph, a first feature representation and a second feature representation, where the first feature representation is a feature representation of a node and the second feature representation is a feature representation of an edge.
In one possible implementation, the information of the graph may be input into a feature extraction network (implemented in the feed forward process of training) to obtain a first feature representation for each node and a second feature representation for each edge.
For each node, information including the node itself and nearby nodes (e.g., the information of the node's k-order subgraph, k greater than 1) may be input into the feature extraction network to obtain an embedded representation (embedding) of the node. The embedded representation of a node may include features of multiple dimensions (also known as channels).
In one possible implementation, the embedded representation of each node may be fused with the information of the node (e.g., spliced) to obtain the first feature representation, which can be regarded as heterogeneous information of the node.
For each edge, the first feature representations of the nodes at the two ends of the edge and the information of the edge itself can be fused to obtain the second feature representation of the edge (i.e., heterogeneous information of the edge).
The motivation is that, in a heterogeneous graph including different types of nodes, different node types correspond to features of different dimensions, and even features of the same dimension may have different meanings. Different types of edges in the heterogeneous graph likewise require differentiated characterization. For nodes, this embodiment obtains features representing the heterogeneous information of a node by fusing the graph structure information (the embedded representation of the node) with the original features (the information of the node). For edges, this embodiment obtains features representing the heterogeneous information of an edge by extracting the embedded representations of the head and tail nodes of the edge and fusing them with the attributes carried by the edge (the information of the edge). Accurate heterogeneous-information representation of the nodes and edges in the heterogeneous graph is thereby realized.
In one possible implementation, the feature extraction network described above may be, but is not limited to, a multi-layer pyramid model. The embedded representation may be in the form of a feature vector.
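The construction of the two feature representations can be sketched as follows, assuming the splice-based fusion mentioned above (the function names are illustrative):

```python
import torch

def first_feature(node_embedding, node_info):
    """First feature representation of a node: the embedding produced by
    the feature extraction network, spliced with the node's own information."""
    return torch.cat([node_embedding, node_info], dim=-1)

def second_feature(head_feature, tail_feature, edge_info):
    """Second feature representation of an edge: the first feature
    representations of the nodes at both ends, spliced with the edge's
    own information."""
    return torch.cat([head_feature, tail_feature, edge_info], dim=-1)
```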
603. Obtain, according to the first feature representation of each node, a first weight through a first neural network, where the first weight is the weight of the node and is used for fusing with the corresponding first feature representation to obtain a third feature representation.
In one possible implementation, the first feature representation of each node may be input into a first neural network. The first neural network may be an attention-based neural network, and it derives the corresponding first weight from the input feature representation (as the network is updated, this weight gradually acquires a definite semantic meaning).
In one possible implementation, different neural networks (for example, neural networks with different parameter values) may be used for different types of nodes. Performing the above step for each node yields a feature mask of the nodes (the mask may include the first weight corresponding to each node).
The first weights may be fused (e.g., weighted, i.e., a weight-based product operation) with the corresponding first feature representations to obtain the third feature representations. Weighting the first feature representation based on the first weight corresponds to applying a perturbation to the first feature representation, so the first neural network in effect determines the magnitude of the applied perturbation from the first feature representation. Since the input to the subsequent task network (e.g., the GNN) is the feature after the perturbation is applied (i.e., the third feature representation), as the model is updated there is a tendency for nodes with a larger influence on the accuracy of the task performed by the network to be given larger and larger first weights (i.e., smaller and smaller perturbations), and for nodes with a smaller influence to be given smaller and smaller first weights (i.e., larger and larger perturbations). The first weight can therefore characterize the degree of influence of the node.
In one possible implementation, the first feature representation may include features of multiple dimensions, and the first weight may include a weight for the feature of each dimension; when fusing, each weight may be multiplied with the feature of the corresponding dimension in the first feature representation.
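A minimal sketch of the first neural network and the per-dimension weighting, assuming sigmoid-bounded weights in (0, 1) (the class name and the single linear layer are illustrative choices):

```python
import torch
import torch.nn as nn

class NodeMaskNet(nn.Module):
    """Maps a first feature representation to per-dimension first weights,
    which perturb the features to yield the third feature representation."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, feat_dim)

    def forward(self, first_feat):
        w1 = torch.sigmoid(self.score(first_feat))  # first weights, one per dimension
        third_feat = w1 * first_feat                # fusion by weighting
        return w1, third_feat
```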
604. Obtain, according to the second feature representation of each edge, a second weight through a second neural network, where the second weight is the weight of the edge.
In one possible implementation, the second feature representation of each edge may be input into a second neural network. The second neural network may be an attention-based neural network, and it derives the corresponding second weight from the input feature representation (as the network is updated, this weight gradually acquires a definite semantic meaning).
In one possible implementation, the second weight may be input into the subsequent task network (e.g., the GNN) as the weight applied to the edge when the task network processes the information of the corresponding edge. For example, the task network may have a parameter set for the weight of each edge (typically the weight of every edge is the same by default), and this parameter can be set to the corresponding second weight. In this way, the second neural network in effect determines, from the second feature representation, the magnitude of the perturbation applied to the edge. As the model is updated, there is a tendency for edges with a larger influence on the accuracy of the task performed by the network to be given larger and larger second weights (i.e., smaller and smaller perturbations), and for edges with a smaller influence to be given smaller and smaller second weights (i.e., larger and larger perturbations). The degree of influence of an edge can therefore be characterized by the second weight.
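One way this can look in code, assuming the task network scales each edge's message by its second weight during aggregation (a sketch; edge_index is assumed to hold (source, target) index pairs):

```python
import torch
import torch.nn as nn

class EdgeMaskNet(nn.Module):
    """Maps a second feature representation to a scalar second weight per edge."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, second_feat):
        return torch.sigmoid(self.score(second_feat)).squeeze(-1)

def weighted_aggregate(h, edge_index, edge_weight):
    """Neighbor aggregation in which each edge's message is scaled by its
    second weight before being summed into the target node."""
    # h: [N, dim]; edge_index: [2, E] long tensor; edge_weight: [E].
    src, dst = edge_index
    agg = torch.zeros_like(h)
    agg.index_add_(0, dst, edge_weight.unsqueeze(-1) * h[src])
    return agg
```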
605. Obtain, according to the third feature representation and the second weight, a first loss through the graph neural network GNN, where the first loss is used to determine a loss function.
In one possible implementation, the third feature representations and the second weights obtained above may be input into the task network (e.g., the GNN) to obtain an output result (e.g., if the GNN is used to implement a target task, the output result is the execution result of the target task). For example, if the target task is to predict whether the person corresponding to each node has an economic risk, the output result may be the prediction of whether the person corresponding to each node has an economic risk.
Based on the output result and the label information of the graph, a first loss (representing a difference between the output result and the label information) can be obtained, and a loss function can be determined based on the first loss.
In one possible implementation, the first weight may represent the degree of positive influence of the corresponding node on the GNN when performing the target task. However, it can happen that the processing precision of the network is high when the first weight of a node is set large, and is still high (or drops only slightly) when the first weight of that node is set small; in that case the node's actual degree of influence is low, so a weight of a single dimension (such as the degree of positive influence alone) cannot accurately represent the actual degree of influence of the node. In the embodiment of the present application, the actual degree of influence of a node is accurately characterized through the feedforward process of the model by weights of multiple dimensions.
In one possible implementation, a third weight may be obtained according to the first weight, where the third weight is also a weight of the node and represents the degree of reverse influence of the corresponding node on the GNN when performing the target task. The third weight is used for fusing with the corresponding first feature representation to obtain a fourth feature representation, and a second loss is obtained through the graph neural network GNN according to the fourth feature representation, where the second loss is used to determine the loss function. For example, the third weight may be small when the first weight is large; the first loss then represents the accuracy of the model when the weight of the node is large, and the second loss represents the accuracy of the model when the weight of the node is small. If the processing precision of the network is high when the first weight of a node is set large, and is still high (or drops only slightly) when the first weight is set small, the first weight of that node can gradually become smaller as the model is updated. The actual degree of influence of the node can thus be described more accurately, improving the precision of the network.
In one possible implementation, the first weight is represented as a positive number less than 1, and the sum of the third weight and the corresponding first weight is 1. For example, the first weight is 0.9 and the third weight is 0.1.
In one possible implementation, the first weight may be referred to as a positive mask of the node and the third weight may be referred to as a negative mask of the node.
Similarly, for edges, a loss function may also be constructed using positive and negative masks. For example, when the GNN is used to execute a target task and the second weight indicates the degree of positive influence of the corresponding edge on the GNN when performing the target task, a fourth weight may be obtained according to the second weight, where the fourth weight is a weight of the edge and indicates the degree of reverse influence of the corresponding edge on the GNN when performing the target task. According to the fourth weight, a third loss is obtained through the graph neural network GNN, where the third loss is used to determine the loss function.
In one possible implementation, the second weight is represented as a positive number less than 1, and the sum of the fourth weight and the corresponding second weight is 1.
A specific example of a loss function is described below, which may include three parts:
(1) the classification accuracy of the model after the (positive) mask is applied;
(2) the classification accuracy of the model after the negative mask is applied;
(3) the variance of the mask.
Items (1) and (2) together drive the optimization toward generating masks that benefit the prediction of the GNN model: important nodes and edges are allocated more weight, the weights of irrelevant features and edges are continuously reduced, and the causal relationship is continuously strengthened.
Item (3) serves to increase the variance of the mask, so that the mask better distinguishes between nodes and edges, improving the interpretation quality. A specific representation of the loss function may be as shown in fig. 7, where masked_pred represents the prediction result of the GNN model after the mask is applied, neg_masked_pred represents the prediction result of the GNN model after the negative mask is applied, and var represents the variance function.
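Under the assumption that the target task is classification with a cross-entropy loss, and with the negative mask taken as 1 minus the mask as described above, the three parts might combine as follows (the signs and the absence of extra coefficients are assumptions; the exact form is given in fig. 7):

```python
import torch
import torch.nn.functional as F

def interpreter_loss(masked_pred, neg_masked_pred, label, node_mask, edge_mask):
    """(1) classification accuracy with the mask applied: loss to minimize;
    (2) classification accuracy with the negative mask applied: to degrade;
    (3) variance of the masks: encouraged to be large so that the weights
        discriminate well between important and unimportant elements."""
    loss_pos = F.cross_entropy(masked_pred, label)           # part (1)
    loss_neg = F.cross_entropy(neg_masked_pred, label)       # part (2)
    var_term = torch.var(node_mask) + torch.var(edge_mask)   # part (3)
    return loss_pos - loss_neg - var_term
```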
606. Updating the first attention network, the second attention network and the GNN according to the loss function.
During training, the interpreter (the first neural network and the second neural network) is updated by back propagation through optimizing the loss function.
During inference, as shown in fig. 8, the k-order subgraph where the node to be explained is located can be input; through feature extraction and a forward computation of the first neural network and the second neural network, a feature mask and an edge mask are obtained, serving respectively as the node feature interpretation and the edge interpretation.
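In sketch form, inference is a single forward pass (extract_subgraph and extract_features are hypothetical helpers standing in for the feature extraction described earlier; NodeMaskNet and EdgeMaskNet are the sketches above):

```python
def explain_node(node_id, graph, node_mask_net, edge_mask_net, k=2):
    """One forward pass of the trained interpreter; no retraining needed."""
    subgraph = extract_subgraph(graph, node_id, k)          # k-order subgraph (hypothetical helper)
    first_feats, second_feats = extract_features(subgraph)  # as in step 602 (hypothetical helper)
    feature_mask, _ = node_mask_net(first_feats)            # node feature interpretation
    edge_mask = edge_mask_net(second_feats)                 # edge interpretation
    return feature_mask, edge_mask
```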
Next, simulation data in the risk control domain is used to demonstrate the effects and performance of the embodiments of the present application.
The node types mainly include legal persons and customers, and the edge types include cardholder, equity association, and the like. The prediction model is a heterogeneous graph transformer (Heterogeneous Graph Transformer, HGT), whose function is to predict whether a customer is a high-risk customer (whether there is a risk of debt default). The basic requirement of the interpretation is to provide, for a customer predicted to be high-risk, an explanation in the feature dimension and the relationship dimension. During model training, the following steps may be performed:
(1) Extract a subgraph of the node and the corresponding label from the data as input.
(2) Extract heterogeneous information and generate the feature inputs and edge inputs.
(3) Input the feature inputs and edge inputs from step (2) into the corresponding feature attention network and edge attention network, compute the feature mask and edge mask, and then obtain new features based on the positive mask.
(4) Input the outputs of step (3) into the HGT prediction model for inference and prediction. Compute the loss according to the loss function and back-propagate to update the parameters of the interpreter.
Repeat steps (1)-(4) until the model converges or the number of iterations reaches a preset limit (a code sketch of this loop follows step (5) below).
(5) Use the trained interpreter for inference to obtain the interpretation for the corresponding high-risk customer.
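Steps (1)-(4) then amount to the following loop (a sketch built on the snippets above; sample_subgraph, extract_features, and the hgt_model call signature are hypothetical, and the HGT prediction model itself stays frozen while the interpreter is updated):

```python
import torch

optimizer = torch.optim.Adam(
    list(node_mask_net.parameters()) + list(edge_mask_net.parameters()))

for step in range(max_steps):
    subgraph, label = sample_subgraph(data)                  # step (1), hypothetical helper
    first_feats, second_feats = extract_features(subgraph)   # step (2), hypothetical helper
    node_mask, third_feats = node_mask_net(first_feats)      # step (3): positive masks
    edge_mask = edge_mask_net(second_feats)
    masked_pred = hgt_model(third_feats, edge_mask)          # step (4): frozen HGT model
    neg_masked_pred = hgt_model((1 - node_mask) * first_feats, 1 - edge_mask)
    loss = interpreter_loss(masked_pred, neg_masked_pred, label,
                            node_mask, edge_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                         # updates the interpreter only
```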
Fig. 9 shows the interpretation effect of the above embodiment. Dark gray nodes represent customers predicted to be high-risk, and light gray nodes represent normal customers.
As shown in fig. 9, an explanation in the feature dimension and the relationship dimension is given for customer 1, who is predicted to be high-risk. The feature-dimension explanation shows that the three features of customer 1's age, accumulated disbursement amount, and behavior score contribute most to predicting customer 1 as high-risk; the relationship-dimension explanation shows that an equity association exists between customer 1 and the high-risk legal person 4, which makes customer 1 high-risk. In terms of interpretation efficiency, GNNExplainer takes 4-7 s to generate a single-sample interpretation, whereas the time for the present invention to generate a single-sample interpretation is 10 ms.
As shown in Table 1, the embodiment of the present application designs a heterogeneous information extraction module, generates the feature mask and edge mask with attention networks, and optimizes the interpreter with a causal-enhancement-based loss function, ensuring that the interpretation is locally optimal with respect to all the training data. Once trained, the interpreter needs no retraining when generating interpretations, so interpretation is fast and efficient.
TABLE 1
As shown in Table 2, the performance of the present invention is compared with that of the prior-art methods on public data sets. It can be seen that the interpretation accuracy of the method is substantially higher than that of the existing methods, and the average single-sample interpretation time is improved by a factor of 5.
TABLE 2
The embodiment of the application provides a model training method, including: obtaining information of a graph, where the graph includes a plurality of nodes and edges between the nodes, the information of the graph includes information of the nodes and information of the edges, each node corresponds to one object, the information of a node includes attributes of the object, and the information of an edge includes the relationship between objects; obtaining, according to the information of the graph, a first feature representation and a second feature representation, where the first feature representation is a feature representation of a node and the second feature representation is a feature representation of an edge; obtaining, according to the first feature representation of each node, a first weight through a first neural network, where the first weight is the weight of the node and is used for fusing with the corresponding first feature representation to obtain a third feature representation; obtaining, according to the second feature representation of each edge, a second weight through a second neural network, where the second weight is the weight of the edge; obtaining, according to the third feature representation and the second weight, a first loss through the graph neural network GNN, where the first loss is used to determine a loss function; and updating the first neural network, the second neural network, and the GNN according to the loss function.
The first weight may be fused with the corresponding first feature representation, which corresponds to applying a perturbation to the first feature representation; the first neural network in effect determines the magnitude of the applied perturbation from the first feature representation. Since the input to the subsequent task network (e.g., the GNN) is the feature after the perturbation is applied (i.e., the third feature representation), as the model is updated there is a tendency for nodes with a larger influence on the accuracy of the task performed by the network to be given larger and larger first weights (i.e., smaller and smaller perturbations), and for nodes with a smaller influence to be given smaller and smaller first weights (i.e., larger and larger perturbations); the first weight can thus characterize the degree of influence of the node. Similarly, the second weight may be input into the subsequent task network (e.g., the GNN) as the weight applied to the edge when the task network processes the information of the corresponding edge; for example, the task network may have a parameter set for the weight of each edge (typically the weight of every edge is the same by default), and this parameter can be set to the corresponding second weight. In this way, the second neural network determines, from the second feature representation, the magnitude of the perturbation applied to the edge; as the model is updated, edges with a larger influence on the accuracy of the task tend to be given larger and larger second weights (smaller and smaller perturbations), while edges with a smaller influence tend to be given smaller and smaller second weights (larger and larger perturbations), so the degree of influence of an edge can be characterized by the second weight.
The first neural network and the second neural network obtained through the above training can serve as an interpreter of the graph to judge the importance of each node and of the relationships between nodes, which, compared with the prior art, yields a more complete graph interpretation result.
The above introduces the method from the perspective of model training; the following introduces the data processing method provided by the embodiment of the present application from the perspective of model inference:
Referring to fig. 10, fig. 10 is a flowchart of a data processing method according to an embodiment of the present application. As shown in fig. 10, the data processing method includes:
1001. Obtain information of a graph, where the graph includes a plurality of nodes and edges between the nodes, the information of the graph includes information of the nodes and information of the edges, each node corresponds to an object, the information of a node includes attributes of the object, and the information of an edge includes the relationship between objects;
For a specific description of step 1001, reference may be made to the description of step 601 in the above embodiment, which is not repeated here.
1002. Obtain, according to the information of the graph, a first feature representation and a second feature representation, where the first feature representation is a feature representation of a node and the second feature representation is a feature representation of an edge;
For a specific description of step 1002, reference may be made to the description of step 602 in the above embodiment, which is not repeated here.
1003. Obtain, according to the first feature representation of each node, a first weight through a first neural network, where the first weight is the weight of the node;
the first neural network may be obtained through a model training method corresponding to fig. 6.
1004. Obtain, according to the second feature representation of each edge, a second weight through a second neural network, where the second weight is the weight of the edge and is used to represent the importance of the corresponding edge in the graph.
The second neural network may be obtained through a model training method corresponding to fig. 6.
In one possible implementation, the first feature representation includes an embedded representation (embedding) of the node and information of the corresponding node obtained through the feature network, or
The second feature representation includes a first feature representation of nodes at both ends of the edge and information of the corresponding edge.
In one possible implementation, the first feature representation includes features of multiple dimensions, the first weight including a weight corresponding to the features of each of the dimensions, or
The second feature representation includes features of a plurality of dimensions, and the second weight includes a weight corresponding to the features of each of the dimensions.
In one possible implementation, the first neural network or the second neural network is an attention-mechanism-based neural network.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application. As shown in fig. 11, the model training apparatus 1100 includes:
An obtaining module 1101, configured to obtain information of a graph, where the graph includes a plurality of nodes and edges between the nodes, the information of the graph includes information of the nodes and information of the edges, each node corresponds to an object, the information of a node includes attributes of the object, and the information of an edge includes the relationship between objects;
The specific description of the obtaining module 1101 may refer to the description of step 601 in the above embodiment, which is not repeated herein.
A processing module 1102, configured to obtain a first feature representation and a second feature representation according to the information of the graph, where the first feature representation is a feature representation of a node, and the second feature representation is a feature representation of an edge;
The specific description of the processing module 1102 may refer to the descriptions of steps 602 to 605 in the above embodiments, and will not be repeated here.
obtain, according to the first feature representation of each node, a first weight through a first neural network, where the first weight is the weight of the node and is used for fusing with the corresponding first feature representation to obtain a third feature representation;
Obtaining a second weight through a second neural network according to the second characteristic representation of each side, wherein the second weight is the weight of the side;
according to the third characteristic representation and the second weight, obtaining a first loss through a graph neural network GNN, wherein the first loss is used for determining a loss function;
A model updating module 1103 is configured to update the first attention network, the second attention network, and the GNN according to the loss function.
The specific description of the model updating module 1103 may refer to the description of step 606 in the above embodiment, which is not repeated here.
The first neural network and the second neural network obtained through training in the above manner can be used as an interpreter of the graph to judge the importance degree of each node and the relation between the nodes, which is equivalent to obtaining a more complete graph interpretation result compared with the prior art.
In one possible implementation, the GNN is configured to execute a target task, the first weight is a forward influence degree of a corresponding node on the GNN when executing the target task, and the acquiring module is further configured to:
Acquiring a third weight according to the first weight, wherein the third weight is the weight of a node, and the third weight is the reverse influence degree of the corresponding node on the GNN when the target task is executed;
The processing module is further configured to obtain, according to the fourth feature representation, a second loss through the graph neural network GNN, where the second loss is used to determine the loss function.
In one possible implementation, the first weight is represented as a positive number less than 1, and the sum of the third weight and the corresponding first weight is 1.
In one possible implementation, the GNN is configured to perform a target task, the second weight indicates a degree of positive influence of a corresponding edge on the GNN when performing the target task, and the acquiring module is further configured to:
Acquiring a fourth weight according to the second weight, wherein the fourth weight is the weight of an edge, and the fourth weight indicates the reverse influence degree of the corresponding edge on the GNN when the target task is executed;
and the processing module is further configured to obtain a third loss through the graph neural network GNN according to the fourth weight, where the third loss is used to determine the loss function.
In one possible implementation, the first weight is represented as a positive number less than 1, and the sum of the fourth weight and the corresponding second weight is 1.
In one possible implementation, the first feature representation includes an embedded representation (embedding) of the node and information of the corresponding node obtained through the feature network, or
The second feature representation includes embedded representations of nodes at both ends of the edge and information of the edge.
In one possible implementation, the first feature representation includes features of multiple dimensions, the first weight including a weight corresponding to the features of each of the dimensions, or
The second feature representation includes features of a plurality of dimensions, and the second weight includes a weight corresponding to the feature of each of the dimensions.
In one possible implementation, the first neural network or the second neural network is an attention-mechanism-based neural network.
In one possible implementation, the fusing includes weighting.
In one possible implementation, the objects are persons, different nodes correspond to different persons, the edges indicate kinship or economic relationships between the persons, and the GNN is used to predict, according to the information of the graph, whether at least one of the persons has an economic risk.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 12, a data processing apparatus 1200 according to an embodiment of the present application includes:
An obtaining module 1201, configured to obtain information of a graph, where the graph includes a plurality of nodes and edges between the nodes, the information of the graph includes information of the nodes and information of the edges, each node corresponds to an object, the information of a node includes attributes of the object, and the information of an edge includes the relationship between objects;
the specific description of the acquiring module 1201 may refer to the description of step 1001 in the above embodiment, which is not repeated here.
A processing module 1202, configured to obtain a first feature representation and a second feature representation according to the information of the graph, where the first feature representation is a feature representation of a node, and the second feature representation is a feature representation of an edge;
Obtaining a first weight according to the first characteristic representation of each node through a first neural network, wherein the first weight is the weight of the node;
And obtaining a second weight according to the second characteristic representation of each side through a second neural network, wherein the second weight is the weight of the side, and the second weight is used for representing the importance degree of the corresponding side in the graph.
The specific description of the processing module 1202 may refer to the descriptions of steps 1002 to 1004 in the above embodiments, and will not be repeated here.
In one possible implementation, the first feature representation includes an embedded representation (embedding) of the node and information of the corresponding node obtained through the feature network, or
The second feature representation includes a first feature representation of nodes at both ends of the edge and information of the corresponding edge.
In one possible implementation, the first feature representation includes features of multiple dimensions, the first weight including a weight corresponding to the features of each of the dimensions, or
The second feature representation includes features of a plurality of dimensions, and the second weight includes a weight corresponding to the feature of each of the dimensions.
In one possible implementation, the first neural network or the second neural network is an attention-mechanism-based neural network.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an execution device provided in an embodiment of the present application. The execution device 1300 may be embodied as a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a server, and the like, which is not limited here. The data processing apparatus described in the embodiment corresponding to fig. 12 may be deployed on the execution device 1300 to implement the function of the data processing method in the embodiment corresponding to fig. 10. Specifically, the execution device 1300 includes a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (the number of processors 1303 in the execution device 1300 may be one or more), where the processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of the application, the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected by a bus or in another manner.
Memory 1304 may include read only memory and random access memory and provides instructions and data to processor 1303. A portion of the memory 1304 may also include non-volatile random access memory (non-volatile random access memory, NVRAM). The memory 1304 stores a processor and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for performing various operations.
The processor 1303 controls operations of the execution device. In a specific application, the individual components of the execution device are coupled together by a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, etc. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.
The method disclosed in the above embodiments of the present application may be applied to the processor 1303 or implemented by the processor 1303. The processor 1303 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 1303 or by instructions in the form of software. The processor 1303 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or microcontroller, a vision processing unit (VPU), a tensor processing unit (TPU), or another processor suitable for AI computation, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor 1303 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be embodied as being completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304 and completes, in combination with its hardware, steps 1001 to 1004 of the above embodiment.
The receiver 1301 may be used to receive input numeric or character information and to generate signal inputs related to performing relevant settings and function control of the device. The transmitter 1302 may be used to output digital or character information via the first interface, the transmitter 1302 may be further used to send instructions to the disk pack via the first interface to modify data in the disk pack, and the transmitter 1302 may further include a display device such as a display screen.
Referring to fig. 14, fig. 14 is a schematic structural diagram of a training device provided by an embodiment of the present application. Specifically, the training device 1400 is implemented by one or more servers. The training device 1400 may vary considerably with configuration or performance, and may include one or more central processing units (CPU) 1414 (e.g., one or more processors), a memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may be transitory or persistent storage. The program stored on the storage medium 1430 may include one or more modules (not shown), each of which may include a series of instruction operations for the training device. Furthermore, the central processing unit 1414 may be configured to communicate with the storage medium 1430 and execute, on the training device 1400, the series of instruction operations in the storage medium 1430.
The training device 1400 may further include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and one or more operating systems 1441 such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
Specifically, the training device may perform steps 601 to 606 in the foregoing embodiments.
An embodiment of the present application further provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the foregoing execution device or the steps performed by the foregoing training device.
An embodiment of the present application further provides a computer-readable storage medium storing a program for signal processing that, when run on a computer, causes the computer to perform the steps performed by the foregoing execution device or the steps performed by the foregoing training device.
The execution device, the training device, or the terminal device provided in the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that a chip in the execution device performs the data processing method described in the foregoing embodiments, or a chip in the training device performs the model training method described in the foregoing embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, referring to fig. 15, fig. 15 is a schematic structural diagram of a chip according to an embodiment of the present application. The chip may be embodied as a neural network processing unit (NPU) 1500. The NPU 1500 is mounted on a host CPU as a coprocessor, and the host CPU allocates tasks to it. The core part of the NPU is the operation circuit 1503; the controller 1504 controls the operation circuit 1503 to extract matrix data from memory and perform multiplication.
Through cooperation among its internal components, the NPU 1500 may implement the model training method provided in the embodiment described in fig. 6 and the data processing method provided in the embodiment described in fig. 10.
More specifically, in some implementations, the operation circuit 1503 in the NPU 1500 internally includes multiple processing units (PE). In some implementations, the operation circuit 1503 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1503 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 1502 and buffers it on each PE in the operation circuit. The operation circuit then fetches the data of matrix A from the input memory 1501 and performs a matrix operation with matrix B, and the obtained partial or final result of the matrix is stored in the accumulator 1508.
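As a purely illustrative aid, the following is a minimal host-side software analogue of this computation, written in Python with NumPy. The function name, the tiling granularity, and the buffer names echoing the hardware blocks (input memory 1501, weight memory 1502, accumulator 1508) are assumptions for illustration only, not the actual hardware interface.

```python
# A minimal sketch of accumulating partial matrix products, loosely
# mimicking how the operation circuit stores partial/final results in the
# accumulator. Names and tiling are illustrative, not the hardware API.
import numpy as np

def npu_matmul_sketch(a: np.ndarray, b: np.ndarray, tile: int = 4) -> np.ndarray:
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    accumulator = np.zeros((m, n))          # plays the role of accumulator 1508
    for start in range(0, k, tile):
        a_tile = a[:, start:start + tile]   # matrix A data from input memory 1501
        b_tile = b[start:start + tile, :]   # matrix B data buffered from weight memory 1502
        accumulator += a_tile @ b_tile      # partial result accumulated
    return accumulator

# The accumulated result equals the full product C = A x B.
A = np.arange(6, dtype=float).reshape(2, 3)
B = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(npu_matmul_sketch(A, B), A @ B)
```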
The unified memory 1506 is configured to store input data and output data. Weight data is transferred directly to the weight memory 1502 through the direct memory access controller (DMAC) 1505, and input data is likewise transferred to the unified memory 1506 through the DMAC.
The bus interface unit (BIU) 1510 is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1509. Specifically, the bus interface unit 1510 is used by the instruction fetch buffer 1509 to fetch instructions from the external memory, and by the direct memory access controller 1505 to fetch the raw data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly configured to transfer input data from the external memory DDR to the unified memory 1506, to transfer weight data to the weight memory 1502, or to transfer input data to the input memory 1501.
The vector calculation unit 1507 includes multiple operation processing units and, when necessary, performs further processing on the output of the operation circuit 1503, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. It is mainly used for non-convolutional/fully-connected-layer computation in a neural network, such as batch normalization, pixel-level summation, and up-sampling of a feature plane.
In some implementations, the vector calculation unit 1507 can store a processed output vector to the unified memory 1506. For example, the vector calculation unit 1507 may apply a linear or nonlinear function to the output of the operation circuit 1503, for example linearly interpolating the feature plane extracted by a convolutional layer, or accumulating a vector of values to generate an activation value. In some implementations, the vector calculation unit 1507 generates a normalized value, a pixel-level summed value, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1503, for example for use in a subsequent layer of the neural network.
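As an illustration of this kind of post-processing, the sketch below applies a parameter-free batch-normalization-style rescaling followed by a ReLU nonlinearity to a raw matrix-multiplication output. The choice of ReLU and the omission of learned scale/shift parameters are assumptions; the functions actually applied by the vector calculation unit 1507 depend on the network being executed.

```python
# A sketch of vector-unit-style post-processing: normalize the raw output
# of the matrix operation, then apply a nonlinear activation. ReLU and the
# parameter-free normalization are illustrative assumptions.
import numpy as np

def vector_unit_sketch(raw: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    mean = raw.mean(axis=0)
    var = raw.var(axis=0)
    normalized = (raw - mean) / np.sqrt(var + eps)  # batch-normalization-style rescaling
    return np.maximum(normalized, 0.0)              # ReLU as an example nonlinear function

raw_output = np.array([[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]])
print(vector_unit_sketch(raw_output))  # activation values ready for the next layer
```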
The instruction fetch buffer (instruction fetch memory) 1509, connected to the controller 1504, is configured to store instructions used by the controller 1504.
The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories. The external memory is a memory external to the NPU hardware architecture.
Any of the processors mentioned above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control the execution of the programs of the foregoing methods.
It should be further noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided in the present application, a connection relationship between modules indicates that they have a communication connection, which may specifically be implemented as one or more communication buses or signal lines.
From the foregoing description of the implementations, a person skilled in the art can clearly understand that the present application may be implemented by software plus necessary general-purpose hardware, or certainly by dedicated hardware including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structures used to implement the same function can be varied, such as an analog circuit, a digital circuit, or a dedicated circuit. For the present application, however, a software program implementation is preferable in most cases. Based on such an understanding, the technical solutions of the present application, essentially or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for causing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of the present application.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used for implementation, the embodiments may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a training device or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Claims (23)

  1. A method of model training, the method comprising:
    obtaining information of a graph, wherein the graph comprises a plurality of nodes and edges among the nodes, the information of the graph comprises information of the nodes and information of the edges, each node corresponds to an object, the information of the nodes comprises attributes of the objects, and the information of the edges comprises relationships among the objects;
    obtaining a first feature representation and a second feature representation according to the information of the graph, wherein the first feature representation is a feature representation of a node, and the second feature representation is a feature representation of an edge;
    obtaining, according to the first feature representation of each node, a first weight through a first neural network, wherein the first weight is the weight of the node;
    obtaining, according to the second feature representation of each edge, a second weight through a second neural network, wherein the second weight is the weight of the edge;
    obtaining, according to the third feature representation and the second weight, a first loss through a graph neural network (GNN), wherein the first loss is used for determining a loss function; and
    updating the first attention network, the second attention network, and the GNN according to the loss function.
  2. The method of claim 1, wherein the GNN is configured to perform a target task, the first weight indicates a degree of positive influence of the corresponding node on the GNN when performing the target task, and the method further comprises:
    acquiring a third weight according to the first weight, wherein the third weight is the weight of a node, and the third weight indicates a degree of reverse influence of the corresponding node on the GNN when performing the target task; and
    obtaining, according to the fourth feature representation, a second loss through the graph neural network GNN, wherein the second loss is used for determining the loss function.
  3. The method of claim 2, wherein the first weight is represented as a positive number less than 1 and the sum of the third weight and the corresponding first weight is 1.
  4. The method according to any one of claims 1 to 3, wherein the GNN is configured to perform a target task, the second weight indicates a degree of positive influence of the corresponding edge on the GNN when performing the target task, and the method further comprises:
    acquiring a fourth weight according to the second weight, wherein the fourth weight is the weight of an edge, and the fourth weight indicates a degree of reverse influence of the corresponding edge on the GNN when performing the target task; and
    obtaining, according to the fourth weight, a third loss through the graph neural network GNN, wherein the third loss is used for determining the loss function.
  5. The method of claim 4, wherein the second weight is represented as a positive number less than 1, and the sum of the fourth weight and the corresponding second weight is 1.
  6. The method according to any one of claims 1 to 5, wherein
    the first feature representation comprises an embedded representation (embedding) of the corresponding node obtained through a feature network, and information of the node; or
    the second feature representation comprises the first feature representations of the nodes at both ends of the edge and information of the edge.
  7. The method of any one of claims 1 to 6, wherein the first feature representation comprises features of a plurality of dimensions, and the first weight comprises a weight corresponding to the feature of each of the dimensions; or
    the second feature representation comprises features of a plurality of dimensions, and the second weight comprises a weight corresponding to the feature of each of the dimensions.
  8. The method of any one of claims 1 to 7, wherein the first neural network or the second neural network is an attention-mechanism-based neural network.
  9. The method of any one of claims 1 to 8, wherein the fusing comprises:
    weighting.
  10. The method according to any one of claims 1 to 9, wherein the objects are persons, different nodes correspond to different persons, the edges indicate kinship or economic relationships between the persons, and the GNN is configured to predict, based on the information of the graph, whether at least one of the persons is at economic risk.
  11. A model training apparatus, the apparatus comprising:
    an acquisition module, configured to acquire information of a graph, wherein the graph comprises a plurality of nodes and edges between the nodes, the information of the graph comprises information of the nodes and information of the edges, each node corresponds to an object, the information of the nodes comprises attributes of the objects, and the information of the edges comprises relationships among the objects;
    a processing module, configured to: obtain a first feature representation and a second feature representation according to the information of the graph, wherein the first feature representation is a feature representation of a node, and the second feature representation is a feature representation of an edge;
    obtain, according to the first feature representation of each node, a first weight through a first neural network, wherein the first weight is the weight of the node;
    obtain, according to the second feature representation of each edge, a second weight through a second neural network, wherein the second weight is the weight of the edge; and
    obtain, according to the third feature representation and the second weight, a first loss through a graph neural network (GNN), wherein the first loss is used for determining a loss function; and
    a model updating module, configured to update the first attention network, the second attention network, and the GNN according to the loss function.
  12. The apparatus of claim 11, wherein the GNN is configured to perform a target task, the first weight indicates a degree of positive influence of the corresponding node on the GNN when performing the target task, and the acquisition module is further configured to:
    acquire a third weight according to the first weight, wherein the third weight is the weight of a node, and the third weight indicates a degree of reverse influence of the corresponding node on the GNN when performing the target task; and
    the processing module is further configured to obtain, according to the fourth feature representation, a second loss through the graph neural network GNN, wherein the second loss is used for determining the loss function.
  13. The apparatus of claim 12, wherein the first weight is represented as a positive number less than 1 and the sum of the third weight and the corresponding first weight is 1.
  14. The apparatus according to any one of claims 11 to 13, wherein the GNN is configured to perform a target task, the second weight indicates a degree of positive influence of the corresponding edge on the GNN when performing the target task, and the acquisition module is further configured to:
    acquire a fourth weight according to the second weight, wherein the fourth weight is the weight of an edge, and the fourth weight indicates a degree of reverse influence of the corresponding edge on the GNN when performing the target task; and
    the processing module is further configured to obtain, according to the fourth weight, a third loss through the graph neural network GNN, wherein the third loss is used for determining the loss function.
  15. The apparatus of claim 14, wherein the second weight is represented as a positive number less than 1, and the sum of the fourth weight and the corresponding second weight is 1.
  16. The apparatus according to any one of claims 11 to 15, wherein
    the first feature representation comprises an embedded representation (embedding) of the corresponding node obtained through a feature network, and information of the node; or
    the second feature representation comprises the embedded representations of the nodes at both ends of the edge and information of the edge.
  17. The apparatus of any one of claims 11 to 16, wherein the first feature representation comprises features of a plurality of dimensions, and the first weight comprises a weight corresponding to the feature of each of the dimensions; or
    the second feature representation comprises features of a plurality of dimensions, and the second weight comprises a weight corresponding to the feature of each of the dimensions.
  18. The apparatus of any one of claims 11 to 17, wherein the first neural network or the second neural network is an attention-mechanism-based neural network.
  19. The apparatus of any one of claims 11 to 18, wherein the fusing comprises:
    weighting.
  20. The apparatus according to any one of claims 11 to 19, wherein the objects are persons, different nodes correspond to different persons, the edges indicate kinship or economic relationships between the persons, and the GNN is configured to predict, based on the information of the graph, whether at least one of the persons is at economic risk.
  21. A computing device, comprising a memory and a processor, wherein the memory stores code, and the processor is configured to acquire the code and perform the method of any one of claims 1 to 10.
  22. A computer storage medium storing one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method of any one of claims 1 to 10.
  23. A computer program product comprising code that, when executed, implements the method of any one of claims 1 to 10.
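For illustration only, the following is a minimal runnable PyTorch sketch of the training procedure recited in claims 1 to 5. The network shapes, the use of simple MLPs (rather than attention networks) to produce the node and edge weights, the fusion step that forms the third feature representation by weighting the node features, the complementary weights 1 - w for the reverse-influence losses, and the way the three losses are combined into one loss function are all assumptions made for the sketch; the claims do not fix these implementation details.

```python
# A hedged sketch of claims 1-5: learn node weights (first weights) and edge
# weights (second weights), compute a first loss on the weighted graph, and
# second/third losses with the complementary weights 1 - w. The loss
# combination below is one plausible choice, not the claimed one.
import torch
import torch.nn as nn

class WeightNet(nn.Module):
    """Maps a feature representation to a weight in (0, 1); an MLP stand-in for an attention network."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x).squeeze(-1)

class TinyGNN(nn.Module):
    """One round of edge-weighted message passing followed by node classification."""
    def __init__(self, dim: int, classes: int = 2):
        super().__init__()
        self.lin = nn.Linear(dim, classes)
    def forward(self, x, edges, edge_weight):
        src, dst = edges
        agg = x.clone()
        agg.index_add_(0, dst, edge_weight.unsqueeze(-1) * x[src])  # weighted neighbor sum
        return self.lin(agg)

dim, n_nodes = 8, 5
node_feat = torch.randn(n_nodes, dim)                  # first feature representations (nodes)
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])     # toy graph: edges 0->1, 1->2, 2->3, 3->4
edge_feat = node_feat[edges[0]] + node_feat[edges[1]]  # second feature representations (assumed)
labels = torch.randint(0, 2, (n_nodes,))               # toy node labels for the target task

node_net, edge_net, gnn = WeightNet(dim), WeightNet(dim), TinyGNN(dim)
opt = torch.optim.Adam([*node_net.parameters(), *edge_net.parameters(), *gnn.parameters()], lr=1e-2)
ce = nn.CrossEntropyLoss()

for step in range(20):
    w_node = node_net(node_feat)              # first weights, in (0, 1)
    w_edge = edge_net(edge_feat)              # second weights, in (0, 1)
    fused = w_node.unsqueeze(-1) * node_feat  # third feature representation (assumed weighting fusion)
    loss1 = ce(gnn(fused, edges, w_edge), labels)                                   # first loss
    loss2 = ce(gnn((1 - w_node).unsqueeze(-1) * node_feat, edges, w_edge), labels)  # second loss (third weights 1 - w)
    loss3 = ce(gnn(fused, edges, 1 - w_edge), labels)                               # third loss (fourth weights 1 - w)
    loss = loss1 - 0.1 * (loss2 + loss3)      # assumed combination: complementary weights should predict poorly
    opt.zero_grad(); loss.backward(); opt.step()
```

One design note: driving the complementary losses up (the minus sign above) expresses the intuition that the features outside the positively weighted part should carry little task signal; any concrete weighting of the three terms is a hyperparameter choice the claims leave open.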
CN202280097659.6A 2022-06-30 2022-06-30 A model training method and related equipment Pending CN119522428A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103117 WO2024000512A1 (en) 2022-06-30 2022-06-30 Model training method and related device

Publications (1)

Publication Number Publication Date
CN119522428A true CN119522428A (en) 2025-02-25

Family

ID=89383844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280097659.6A Pending CN119522428A (en) 2022-06-30 2022-06-30 A model training method and related equipment

Country Status (2)

Country Link
CN (1) CN119522428A (en)
WO (1) WO2024000512A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118536550B (en) * 2024-05-17 2025-04-18 中国人民解放军国防科技大学 Optimization method for large language models based on layer-by-layer parameter editing
CN119624466B (en) * 2025-02-13 2025-07-18 北京银行股份有限公司 Customer service method and system based on machine learning technology

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11625580B2 (en) * 2019-05-31 2023-04-11 Apple Inc. Neural network wiring discovery
CN113850381B (en) * 2021-09-15 2024-09-17 支付宝(杭州)信息技术有限公司 Graphic neural network training method and device
CN113989574B (en) * 2021-11-04 2024-04-02 中国科学技术大学 Image interpretation method, image interpretation device, electronic device and storage medium
CN114637923B (en) * 2022-05-19 2022-09-02 之江实验室 Data information recommendation method and device based on hierarchical attention-graph neural network

Also Published As

Publication number Publication date
WO2024000512A1 (en) 2024-01-04

Similar Documents

Publication Publication Date Title
WO2021047593A1 (en) Method for training recommendation model, and method and apparatus for predicting selection probability
EP4510036A1 (en) Circuit wiring determination method and related device
WO2024083121A1 (en) Data processing method and apparatus
EP4538894A1 (en) Operation prediction method and related apparatus
CN118043802A (en) Recommendation model training method and device
US20250225398A1 (en) Data processing method and related apparatus
WO2023217127A1 (en) Causation determination method and related device
EP4398128A1 (en) Recommendation method and related device
CN116049536A (en) A kind of recommended method and related device
CN119522428A (en) A model training method and related equipment
WO2024213099A1 (en) Data processing method and apparatus
CN116204709A (en) Data processing method and related device
CN115618065A (en) A data processing method and related equipment
CN115686908A (en) A data processing method and related equipment
WO2025016380A1 (en) Data processing method and related device
WO2024230757A1 (en) Data processing method and related apparatus
WO2024012360A1 (en) Data processing method and related apparatus
CN115048560B (en) Data processing method and related device
CN116308640A (en) Recommendation method and related device
CN115630297A (en) Model training method and related equipment
WO2025044967A1 (en) Data processing method and apparatus
CN117669691A (en) Data processing method and device
CN116956204A (en) Network structure determining method, data predicting method and device of multi-task model
CN115545738A (en) A kind of recommended method and related device
CN116910358A (en) A data processing method and related devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination