CN113609863B - Method, device and computer equipment for training and using a data conversion model
- Publication number: CN113609863B
- Application number: CN202110155510.6A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F40/30 — Handling natural language data; semantic analysis
- G06F40/289 — Handling natural language data; natural language analysis; recognition of textual entities; phrasal analysis, e.g. finite state techniques or chunking
- G06N3/044 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; recurrent networks, e.g. Hopfield networks
Abstract
The application provides a method, an apparatus and computer equipment for training and using a data conversion model, applicable to fields such as artificial intelligence and cloud security, and intended to solve the problem of low attention-mechanism accuracy. The method for training the data conversion model comprises at least the following steps: obtaining element correlation probabilities between each training output semantic position in a training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence; obtaining, based on the obtained element correlation probabilities, the global training output semantic element and the local training output semantic element corresponding to each training output semantic position; and determining, based on the obtained global and local training output semantic elements, the target training output semantic element corresponding to each training output semantic position, so as to obtain the training output semantic element sequence.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a computer device for training and using a data conversion model.
Background
With the continuous development of technology, in many fields a machine can intelligently process initial data based on a trained model and obtain target data from the model's output, allowing the machine to act intelligently much as a person would. For example, in the field of natural language processing, a machine may translate initial data in a first language into target data in a second language based on a trained model. As another example, in the field of computer vision, a machine may convert initial data in the form of images into target data in the form of text based on a trained model.
The data conversion model within the trained model usually focuses attention on the global features of the initial data. However, global features capture the initial data only at a macroscopic level, so the target data obtained from the trained model can likewise characterize the initial data only macroscopically; the accuracy of the data conversion model's attention mechanism is therefore low.
Disclosure of Invention
The embodiment of the application provides a method, a device and computer equipment for training and using a data conversion model, which are used for solving the problem of low accuracy of an attention mechanism.
In a first aspect, a method of training a data conversion model is provided, comprising:
training the data conversion model using a sample input semantic element sequence set to obtain a trained data conversion model, wherein in one training process, for a sample input semantic element sequence in the sample input semantic element sequence set, at least the following operations are performed:
obtaining element correlation probabilities between each training output semantic position in a training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence;
and obtaining, based on the obtained element correlation probabilities, the global training output semantic element and the local training output semantic element corresponding to each training output semantic position, and determining, based on the obtained global and local training output semantic elements, the target training output semantic element corresponding to each training output semantic position, so as to obtain the training output semantic element sequence, wherein each global training output semantic element is related to every sample input semantic element, and each local training output semantic element is related to only part of the sample input semantic elements.
In a second aspect, there is provided a method of using a data conversion model, comprising:
obtaining an input semantic element sequence to be processed, and obtaining element correlation probabilities between each conversion output semantic position in a conversion output semantic element sequence and each to-be-processed input semantic element in the to-be-processed input semantic element sequence;
and obtaining, based on the obtained element correlation probabilities, the global conversion output semantic element and the local conversion output semantic element corresponding to each conversion output semantic position, and determining, based on the obtained global and local conversion output semantic elements, the target conversion output semantic element corresponding to each conversion output semantic position, so as to obtain the conversion output semantic element sequence, wherein each global conversion output semantic element is related to every to-be-processed input semantic element, and each local conversion output semantic element is related to only part of the to-be-processed input semantic elements.
In a third aspect, an apparatus for training a data conversion model is provided, comprising:
a training module, configured to train the data conversion model using a sample input semantic element sequence set to obtain a trained data conversion model, wherein in one training process, for a sample input semantic element sequence in the sample input semantic element sequence set, at least the following operations are performed:
an acquisition module, configured to obtain element correlation probabilities between each training output semantic position in a training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence;
the training module is further configured to obtain, based on the obtained element correlation probabilities, the global training output semantic element and the local training output semantic element corresponding to each training output semantic position, and to determine, based on the obtained global and local training output semantic elements, the target training output semantic element corresponding to each training output semantic position, so as to obtain the training output semantic element sequence, wherein each global training output semantic element is related to every sample input semantic element, and each local training output semantic element is related to only part of the sample input semantic elements.
In a fourth aspect, there is provided an apparatus using a data conversion model, comprising:
an acquisition module, configured to obtain a to-be-processed input semantic element sequence, and to obtain element correlation probabilities between each conversion output semantic position in a conversion output semantic element sequence and each to-be-processed input semantic element in the to-be-processed input semantic element sequence;
a conversion module, configured to obtain, based on the obtained element correlation probabilities, the global conversion output semantic element and the local conversion output semantic element corresponding to each conversion output semantic position, and to determine, based on the obtained global and local conversion output semantic elements, the target conversion output semantic element corresponding to each conversion output semantic position, so as to obtain the conversion output semantic element sequence, wherein each global conversion output semantic element is related to every to-be-processed input semantic element, and each local conversion output semantic element is related to only part of the to-be-processed input semantic elements.
In a fifth aspect, there is provided a computer device comprising:
A memory for storing program instructions;
and a processor, configured to invoke the program instructions stored in the memory and to execute the method according to the first or second aspect in accordance with the obtained program instructions.
In a sixth aspect, there is provided a storage medium storing computer executable instructions for causing a computer to perform the method of the first or second aspect.
In the embodiment of the application, the global training output semantic element and the local training output semantic element corresponding to each training output semantic position are obtained based on the element correlation probabilities between each training output semantic position in the training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence. Attention is thus focused not only on the global features of the sample input semantic element sequence but also on the local features of the part of the sample input semantic elements corresponding to each training output semantic position; the sequence is described from multiple angles, so the data conversion model can perform data conversion on it more accurately.
Moreover, the data conversion model obtains, from the global and local training output semantic elements, the target training output semantic element corresponding to each training output semantic position, yielding the training output semantic element sequence. Because this multi-angle description of the sample input semantic element sequence is the basis for obtaining the training output semantic element sequence, the obtained sequence represents the semantics of the sample input semantic element sequence more accurately, improving the accuracy of the data conversion model.
Likewise, when the model is used, the global conversion output semantic element and the local conversion output semantic element corresponding to each conversion output semantic position are obtained based on the element correlation probabilities between each conversion output semantic position in the conversion output semantic element sequence and each to-be-processed input semantic element in the to-be-processed input semantic element sequence. Attention is focused not only on the global features of the to-be-processed input semantic element sequence but also on the local features of the part of the to-be-processed input semantic elements corresponding to each conversion output semantic position, describing the sequence from multiple angles so that the data conversion model can perform data conversion on it more accurately.
The data conversion model then obtains, from the global and local conversion output semantic elements, the target conversion output semantic element corresponding to each conversion output semantic position, yielding the conversion output semantic element sequence. Taking this multi-angle description as the basis means the obtained conversion output semantic element sequence represents the semantics of the to-be-processed input semantic element sequence more accurately, improving the accuracy of the data conversion model.
Drawings
FIG. 1 is an application scenario of a method for training and using a data conversion model according to an embodiment of the present application;
FIG. 2a is a schematic diagram of a method for training and using a data conversion model according to an embodiment of the present application;
FIG. 2b is a schematic diagram of a method for training and using a data conversion model according to an embodiment of the present application;
FIG. 3 is a second flow chart of a method for training and using a data conversion model according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for training and using a data transformation model according to an embodiment of the present application;
FIG. 5a is a schematic diagram II of a method for training and using a data conversion model according to an embodiment of the present application;
FIG. 5b is a flowchart illustrating a method for training and using a data transformation model according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for training and using a data transformation model according to an embodiment of the present application;
FIG. 7a is a schematic diagram of a method for training and using a data transformation model according to an embodiment of the present application;
FIG. 7b is a schematic diagram II of a method for training and using a data transformation model according to an embodiment of the present application;
FIG. 7c is a schematic diagram III of a method for training and using a data conversion model according to an embodiment of the present application;
FIG. 8 is a first line chart for a method of training and using a data transformation model according to an embodiment of the present application;
FIG. 9 is a second line chart for a method of training and using a data transformation model according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a training apparatus using a data conversion model according to an embodiment of the present application;
FIG. 11 is a schematic diagram II of a training apparatus using a data conversion model according to an embodiment of the present application;
FIG. 12 is a schematic diagram III of a training apparatus using a data conversion model according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
(1) Deep learning (Deep Learning, DL) and neural networks (Neural Network, NN):
Deep learning is a branch of machine learning; it attempts to abstract data at a high level using multiple processing layers with complex structures, or multiple processing layers composed of multiple nonlinear transformations.
A neural network is a deep learning model that, in machine learning and cognitive science, mimics the structure and function of biological neural networks.
(2) Machine translation (Machine Translation, MT) and neural network machine translation (Neural Machine Translation, NMT):
Machine translation is the automatic translation of text in one language into text in another language by means of a computer or similar device.
Neural network machine translation is the latest generation of machine translation technology based on neural networks.
(3) Cross-attention mechanism (Cross-Attention Mechanism, or Encoder-Decoder Attention):
The cross-attention mechanism is a method of establishing dependency relationships between the hidden states of the encoding model and the decoding model in a neural network; that is, it is a neural network structure that directs attention from the decoding model onto the encoding model.
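As a rough illustration of this structure (not part of the patent text), the sketch below implements plain scaled dot-product cross-attention in NumPy; the function names, shapes and the omission of learned query/key/value projections are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states):
    # decoder_states: (T_out, d) hidden states of the decoding model (queries).
    # encoder_states: (T_in, d) hidden states of the encoding model (keys/values).
    # Each row of `weights` sums to 1 and plays the role of the element
    # correlation probabilities discussed later in this description.
    d = decoder_states.shape[-1]
    scores = decoder_states @ encoder_states.T / np.sqrt(d)  # (T_out, T_in)
    weights = softmax(scores, axis=-1)
    context = weights @ encoder_states                       # (T_out, d)
    return context, weights
```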
(4) Self-attention networks (Self-Attention Network, SAN) and Transformers:
A self-attention network is a neural network structure model based on the self-attention mechanism.
The Transformer is a SAN-based encoder-decoder framework, and is currently the most popular sequence-to-sequence generation model structure.
(5) Machine translation evaluation index (Bilingual Evaluation Understudy, BLEU):
The machine translation evaluation index BLEU is used to evaluate a translation model; a higher BLEU value indicates a better translation effect of the translation model.
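As a hedged illustration of how BLEU is computed (a simplified single-reference, sentence-level variant; real evaluations would use an established implementation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    # Geometric mean of modified n-gram precisions times a brevity penalty.
    # The floor on zero counts is a toy smoothing choice, not the standard one.
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        log_precisions.append(math.log(max(overlap / total, 1e-9)))
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("A is dating a girl".split(), "A is dating a girl".split()))  # 1.0
```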
Embodiments of the application relate to cloud technology and artificial intelligence (AI). They are designed based on cloud computing and cloud storage within cloud technology, and on computer vision (CV), speech technology, natural language processing (NLP) and machine learning (ML) within artificial intelligence.
Cloud technology refers to a hosting technology that unifies hardware, software, network and other resources in a wide area network or a local area network to realize the computation, storage, processing and sharing of data. It is the general name for the network, information, integration, management-platform and application technologies applied under the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support: the background services of technical network systems, such as video websites, picture websites and other portals, require large amounts of computing and storage resources. With the rapid development and application of the internet industry, each article may in the future carry its own identification mark that must be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data need strong backend system support, which can only be realized through cloud computing.
Cloud computing is a computing model that distributes computing tasks over a resource pool formed by large numbers of computers, enabling various application systems to acquire computing power, storage space and information services as needed. The network that provides the resources is referred to as the "cloud". From the user's perspective, the resources in the cloud are infinitely expandable and can be acquired at any time, used on demand, expanded at any time and paid for according to use.
As a basic capability provider of cloud computing, a cloud computing resource pool (cloud platform for short, generally called an IaaS (Infrastructure as a Service) platform) is established, in which multiple types of virtual resources are deployed for external clients to select and use.
Divided by logical function, a PaaS (Platform as a Service) layer can be deployed on the IaaS (Infrastructure as a Service) layer, and a SaaS (Software as a Service) layer can be deployed above the PaaS layer, or SaaS can be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container; SaaS is the various kinds of business software, such as web portals and SMS mass senders. Generally, SaaS and PaaS are upper layers relative to IaaS.
Cloud storage is a new concept extended and developed from the concept of cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system) is one that, through functions such as cluster applications, grid technology and distributed storage file systems, uses application software or application interfaces to make a large number of storage devices of different types in a network (storage devices are also called storage nodes) work cooperatively and jointly provide data storage and service access to the outside.
At present, the storage method of the storage system is as follows: when logical volumes are created, each logical volume is allocated physical storage space, which may consist of the disks of one or several storage devices. A client stores data on a logical volume, that is, on a file system; the file system divides the data into many parts, each of which is an object containing not only the data itself but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage-location information of each object, so that when the client requests access to the data, the file system can let the client access the data according to that storage-location information.
The process by which the storage system allocates physical storage space to a logical volume is, specifically, as follows: physical storage space is divided in advance into stripes according to the estimated capacity of the objects to be stored on the logical volume (an estimate that tends to leave a large margin over the capacity actually needed) and the configuration of the redundant array of independent disks (RAID); a logical volume can be understood as a stripe, whereby physical storage space is allocated to the logical volume.
Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision. The artificial intelligence technology mainly comprises a computer vision technology, a natural language processing technology, machine learning, deep learning and other directions.
With research and advancement of artificial intelligence technology, artificial intelligence is being developed for research and application in various fields, such as common smart homes, intelligent recommendation systems, virtual assistants, smart speakers, smart marketing, smart translation, autopilot, robots, smart medicine, etc., and it is believed that with the development of technology, artificial intelligence will be applied in more fields and with increasing importance.
Computer vision is the science of how to make a machine "see": using cameras and computers in place of human eyes to recognize, track and measure targets, and further performing graphic processing so that the result becomes an image more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies theories and technologies for building artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content and behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
The key technologies of speech technology are automatic speech recognition (ASR), speech synthesis (TTS) and voiceprint recognition. Enabling computers to listen, see, speak and feel is the future direction of human-computer interaction, and speech has become one of the most promising modes of human-computer interaction.
Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics; research in this field involves natural language, i.e. the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic question answering, knowledge graph techniques, and the like.
Machine learning is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
The application fields of training and using the data conversion model provided by the embodiment of the application are briefly described below.
With the continuous development of technology, in many fields a machine can intelligently process initial data based on a trained model and obtain target data from the model's output, allowing the machine to act intelligently much as a person would. However, the data conversion model within the trained model usually focuses attention on the global features of the initial data; since global features capture the initial data only at a macroscopic level, the target data obtained from the trained model can likewise characterize the initial data only macroscopically, making the obtained target data less accurate.
For example, in the field of translation, a machine may translate an initial text or initial speech in a first language into a target text or target speech in a second language based on a trained model. As another example, in the field of video auditing, a machine may translate an initial video frame in the form of an image into target text in the form of text, etc., based on a trained model, as described in the examples below.
Taking the field of translation as an example: because a target account uses Chinese, when English is needed to communicate with other accounts, after receiving the Chinese text or Chinese speech input by the target account, the machine must convert it into English text or English speech to be presented to the other accounts, so that the target account can communicate with them without knowing English. However, a Chinese word may correspond to several English words, and choosing different English words to translate it may yield different semantics. If an English word is chosen according to only the macroscopic features of all the words in the Chinese sentence, the translated English sentence may easily differ in meaning from the original Chinese sentence, so that the other accounts cannot understand, or misunderstand, the meaning of the target account. For example, the Chinese sentence "A is in contact with a girl" may be translated into the English sentence "A is socializing with a girl", when in fact it should be translated as "A is dating a girl".
Taking the field of video auditing as an example: when a target account uploads a video to a video platform, the machine needs to identify the content expressed by the video frames to determine whether the video passes review. For each video frame, each image region must be converted into text to determine what the frame expresses. However, an image region may correspond to several texts, and choosing different texts to express it yields different frame content. If an image region is expressed using only the macroscopic features of the video frame, the obtained content may differ from what the original frame actually shows and cause audit errors. For example, if an image region contains a textbook, the machine might express the region as "reading", although "reading" and "textbook" actually represent completely different scenes.
It can be seen that the attention mechanism of the data transformation model in the trained model is less accurate.
In order to solve the problem of low accuracy of the attention mechanism of the data conversion model, the application provides a method for training the data conversion model and a method for using the data conversion model. The method of training the data conversion model is described first.
The method for training the data conversion model uses a sample input semantic element sequence set to train the data conversion model until the training loss of the data conversion model meets a preset convergence condition, thereby obtaining the trained data conversion model. In one training process, the following operations are performed for a sample input semantic element sequence in the sample input semantic element sequence set:
Element correlation probabilities between each training output semantic position in the training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence are obtained. The data conversion model is then adopted to obtain, based on all the obtained element correlation probabilities, the global training output semantic element and the local training output semantic element corresponding to each training output semantic position. From each pair of global and local training output semantic elements, the target training output semantic element corresponding to each training output semantic position is determined, so as to obtain the training output semantic element sequence. A training loss is then determined based on the sample input semantic element sequence and the training output semantic element sequence, and the model parameters of the data conversion model are adjusted based on the training loss.
As an embodiment, the global training output semantic element corresponding to each training output semantic position characterizes the semantic features of all sample input semantic elements in the sample input semantic element sequence with respect to that output semantic position, while the local training output semantic element corresponding to each training output semantic position characterizes the semantic features of part of the sample input semantic elements with respect to that position.
In the embodiment of the application, a data conversion model is adopted, and based on each global training output semantic element and each local training output semantic element, each target training output semantic element corresponding to each training output semantic position is obtained, so that a training output semantic element sequence is obtained. The multi-dimensional description of the sample input semantic element sequence is taken as the basis for obtaining the training output semantic element sequence, so that the obtained training output semantic element sequence can more accurately represent the semantics of the sample input semantic element sequence, and the accuracy of the data conversion model is improved.
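To make these operations concrete, here is a minimal NumPy sketch of the per-position computation; choosing the "part of the sample input semantic elements" as the top-k most relevant ones, and fusing the two views with a fixed gate, are assumptions of this sketch, since the description above fixes neither mechanism.

```python
import numpy as np

def target_training_outputs(probs, sample_elements, k=2, gate=0.5):
    # probs: (T_out, T_in) element correlation probabilities.
    # sample_elements: (T_in, d) sample input semantic element sequence.
    # The top-k locality and the fixed gate are illustrative assumptions.

    # Global training output semantic elements: attend to every sample element.
    global_out = probs @ sample_elements                      # (T_out, d)

    # Local training output semantic elements: keep only the k most relevant
    # sample elements per output position, renormalise, then attend.
    drop = np.argsort(probs, axis=-1)[:, :-k]                 # all but the top-k
    local_probs = probs.copy()
    np.put_along_axis(local_probs, drop, 0.0, axis=-1)
    local_probs /= local_probs.sum(axis=-1, keepdims=True)
    local_out = local_probs @ sample_elements                 # (T_out, d)

    # Target training output semantic elements: fuse the two views.
    return gate * global_out + (1.0 - gate) * local_out

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(6), size=5)     # 5 output positions, 6 sample elements
elems = rng.normal(size=(6, 8))               # sample input semantic elements, dim 8
targets = target_training_outputs(probs, elems)  # (5, 8)
```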
The application scenario of the method for training and using the data conversion model provided by the application is described below.
Referring to FIG. 1, an application scenario of the method for training a data conversion model provided by the present application is shown. The application scenario includes a client 101 and a server 102. The client 101 includes a first client 1011 and a second client 1012. The client 101 and the server 102 can communicate with each other; for example, the first client 1011 can communicate with the server 102, the second client 1012 can communicate with the server 102, and the first client 1011 can communicate with the second client 1012. The communication may use wired communication technology, for example over a network cable or a serial cable, or wireless communication technology, for example Bluetooth or wireless fidelity (WiFi), which is not specifically limited.
The first client 1011 broadly refers to a device that can provide a sample input semantic element sequence or a to-be-processed input semantic element sequence to the server 102, and the second client 1012 broadly refers to a device that can present output data generated based on the training output semantic element sequence or the conversion output semantic element sequence. The client 101 is, for example, a terminal device, a third-party application accessible from a terminal device, or a web page accessible from a terminal device. The terminal device is, for example, a mobile phone, a tablet computer or a personal computer. The server 102 generally refers to a device that can use the data conversion model for data processing, such as a terminal device or a server; the server is, for example, a cloud server or a local server. Both the client 101 and the server 102 can adopt cloud computing to reduce the occupation of local computing resources, and cloud storage to reduce the occupation of local storage resources.
As an embodiment, the first client 1011 and the server 102 may be the same device; or the second client 1012 and the server 102 may be the same device; or the first client 1011 and the second client 1012 may be the same device; or the first client 1011, the second client 1012, and the server 102 may be the same device, etc., without limitation. In the embodiment of the present application, the first client 1011, the second client 1012, and the server 102 are respectively different devices for explanation.
The method for training the data conversion model provided by the embodiment of the application is described below.
Since the input data of the data conversion model is generally a vector or matrix characterizing the features of the initial data, and the output data of the data conversion model is generally a vector or matrix characterizing the features of the target data, the input data cannot fully stand in for the initial data, nor can the output data fully stand in for the target data. Meanwhile, accurate target data is needed in real scenarios, so when the server 102 trains the data conversion model it may train it jointly with an encoding model, which provides the conversion model's input data from the initial data, and a decoding model, which obtains the target data from the conversion model's output; alternatively, the server 102 may train the data conversion model together with whatever other models it is combined with in its actual usage scenario, which is not specifically limited. In this way the data conversion model can be trained against the target data and the initial data, improving the accuracy of training the data conversion model. The embodiment of the application is described by taking the combined training of the encoding model, the data conversion model and the decoding model as an example.
FIG. 2a is a schematic diagram of the combination of an encoding model, a data conversion model and a decoding model. In the process of training the data conversion model, in the training pass for one piece of input data, the server 102 inputs one piece of initial data from the initial data set into the encoding model, which encodes the initial data to obtain the corresponding sample input semantic element sequence. The server 102 inputs the sample input semantic element sequence into the data conversion model, which performs data conversion processing on it to obtain the training output semantic element sequence. The server 102 then inputs the training output semantic element sequence into the decoding model, which decodes it to obtain the corresponding target data.
The training module in the server 102 determines a codec training loss based on the initial data and the target data; this loss can serve as the training loss of the data conversion model. The server 102 adjusts the model parameters of the encoding model, the data conversion model and the decoding model based on the codec training loss. One adjustment of these parameters corresponds to one round of learning by the three models on one piece of initial data. Learning continues over each piece of initial data in the initial data set until the codec training loss meets a preset convergence condition, which indicates that the training loss of the data conversion model meets the condition; the trained data conversion model is then obtained from the current model parameters of the data conversion model.
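A schematic PyTorch rendering of this joint training step might look as follows; the GRU layers, the linear decoding head, the equal source and target lengths and the cross-entropy codec training loss are placeholders chosen to keep the sketch runnable, not the patent's actual models.

```python
import torch
from torch import nn

class EncoderConverterDecoder(nn.Module):
    # Hypothetical wiring of FIG. 2a: encoding model -> data conversion
    # model -> decoding model.
    def __init__(self, vocab_in, vocab_out, d=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_in, d)
        self.encoder = nn.GRU(d, d, batch_first=True)    # encoding model
        self.converter = nn.GRU(d, d, batch_first=True)  # data conversion model
        self.decoder = nn.Linear(d, vocab_out)           # decoding model

    def forward(self, initial_ids):
        sample_elems, _ = self.encoder(self.embed(initial_ids))  # sample input semantic elements
        train_elems, _ = self.converter(sample_elems)            # training output semantic elements
        return self.decoder(train_elems)                         # logits over target sub-data

model = EncoderConverterDecoder(vocab_in=10_000, vocab_out=10_000)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

initial_ids = torch.randint(0, 10_000, (8, 12))  # a batch of toy initial data
target_ids = torch.randint(0, 10_000, (8, 12))   # the corresponding target data

optimizer.zero_grad()
logits = model(initial_ids)
loss = loss_fn(logits.reshape(-1, 10_000), target_ids.reshape(-1))  # codec training loss
loss.backward()
optimizer.step()  # adjusts encoder, converter and decoder parameters together
```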
An exemplary description of the process of combining the coding model, the data conversion model and the decoding model is presented below based on the schematic diagram of fig. 2 a. Please refer to fig. 2b, which is a flow chart illustrating the combination of the encoding model, the data conversion model and the decoding model.
S201, based on the coding model, obtaining a sample input semantic element sequence.
The server 102 obtains an initial data set and processes each piece of initial data in it. The server 102 may obtain the initial data set in various ways: it may receive the initial data set sent by the first client 1011, obtain it from an associated storage device, or download it from a designated link, which is not specifically limited.
For one piece of initial data in the initial data set, the server 102 inputs the initial data into the encoding model, which performs feature extraction on each initial sub-datum to obtain the sample input semantic element corresponding to each initial sub-datum; each sample input semantic element characterizes the semantics of its corresponding initial sub-datum. After obtaining the sample input semantic elements, the server 102 may determine their arrangement order in the sample input semantic element sequence, and having done so, obtains the sample input semantic element sequence.
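As a toy rendering of this step, with an invented vocabulary and a random embedding table standing in for the encoding model:

```python
import numpy as np

# Each initial sub-datum yields one sample input semantic element (here a
# simple embedding lookup); vocabulary and dimensions are invented.
rng = np.random.default_rng(0)
vocab = {"A": 0, "at": 1, "with": 2, "one": 3, "girl": 4, "in-contact": 5}
embedding_table = rng.normal(size=(len(vocab), 8))

initial_sub_data = ["A", "at", "with", "one", "girl", "in-contact"]
sample_input_semantic_elements = embedding_table[[vocab[t] for t in initial_sub_data]]
print(sample_input_semantic_elements.shape)  # (6, 8): one element per sub-datum
```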
There are various ways in which the server 102 orders the sample input semantic elements, two of which are described below as examples.
Ordering method one:
Arrange the sample input semantic elements according to the order of the initial sub-data in the initial data.
Because each initial sub-datum corresponds to one sample input semantic element, the sample input semantic elements can be arranged according to the positions of their corresponding initial sub-data in the initial data.
For example, the initial data is the Chinese sentence "A is in contact with a girl". It may be divided into six initial sub-data: "A", "at", "with", "one", "girl" and "in-contact"; or into four initial sub-data: "A", "at-with", "a girl" and "in-contact". The way the initial sub-data are divided can be trained into the encoding model according to actual usage and is not specifically limited. Taking the six-way division as an example, the sample input semantic elements in the sample input semantic element sequence may be arranged in the order "A", "at", "with", "one", "girl", "in-contact".
Ordering method two:
Arrange the sample input semantic elements according to the arrangement rule of the target sub-data learned by the encoding model and the decoding model.
Since the order of the target sub-data in the target data output by the decoding model may differ from the order of the corresponding initial sub-data in the initial data, once the sample input semantic elements corresponding to the initial sub-data are obtained, their order in the sample input semantic element sequence can be determined according to the target-side arrangement rule learned by the encoding and decoding models. The training output semantic elements in the training output semantic element sequence output by the data conversion model are then already arranged according to that learned rule, and so are the target sub-data output by the decoding model, so neither the data conversion model nor the decoding model needs to perform a separate re-ordering step.
For example, the initial data is the Chinese sentence "A is in contact with a girl", with the six initial sub-data "A", "at", "with", "one", "girl" and "in-contact". The target data is the English sentence "A is dating a girl", with the target sub-data "A", "is", "dating", "a" and "girl". The target-side arrangement rule learned by the encoding and decoding models is the order "A", "is", "dating", "a", "girl". The sample input semantic elements in the sample input semantic element sequence may therefore be arranged in the corresponding order "A", "at", "with", "in-contact", "one" and "girl". A toy sketch of the two ordering methods follows.
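```python
# Toy illustration of the two ordering methods; the learned target-side
# order used by method two is the reconstruction given in the example
# above, and all token names are hypothetical.
initial_sub_data = ["A", "at", "with", "one", "girl", "in-contact"]

# Method one: keep the source order of the initial sub-data.
order_one = list(initial_sub_data)

# Method two: follow the target-side arrangement rule learned by the
# encoding and decoding models, so that neither the data conversion model
# nor the decoding model has to re-order anything later.
learned_order = ["A", "at", "with", "in-contact", "one", "girl"]
order_two = sorted(initial_sub_data, key=learned_order.index)

print(order_one)  # ['A', 'at', 'with', 'one', 'girl', 'in-contact']
print(order_two)  # ['A', 'at', 'with', 'in-contact', 'one', 'girl']
```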
S202, obtaining a training output semantic element sequence based on a data conversion model.
After obtaining the sample input semantic element sequence from the encoding model, the server 102 may input it into the data conversion model. The server 102 uses the data conversion model to perform data conversion processing on each sample input semantic element in the sequence and obtains the training output semantic elements. The server 102 then sorts the training output semantic elements based on the arrangement rule learned by the encoding and decoding models, obtaining the training output semantic element sequence.
For example, the initial data is the Chinese sentence "A is in contact with a girl", with the six initial sub-data "A", "at", "with", "one", "girl" and "in-contact"; the target data is the English sentence "A is dating a girl", with the target sub-data "A", "is", "dating", "a" and "girl". The sample input semantic elements are in the source order "A", "at", "with", "one", "girl", "in-contact". The server 102 determines that the target-side arrangement rule learned by the encoding and decoding models is the order "A", "is", "dating", "a", "girl". The server 102 may therefore arrange the training output semantic elements in the order "A", "is", "dating", "a", "girl" to obtain the training output semantic element sequence.
S203, obtaining target data based on the decoding model.
After the server 102 obtains the training output semantic element sequence, it inputs the sequence into the decoding model and obtains the target sub-datum corresponding to each training output semantic element in the sequence. If the training output semantic elements are arranged according to the target-side arrangement rule learned by the encoding and decoding models, the obtained target sub-data can simply be arranged in the same order as the training output semantic elements to obtain the target data corresponding to the initial data. If the training output semantic elements are arranged in the source order of the initial sub-data, or in some other order, the target sub-data are arranged according to the learned target-side arrangement rule to obtain the target data corresponding to the initial data. For example, if the target sub-data arranged according to the rule are "A", "is", "dating", "a" and "girl", the obtained target data is "A is dating a girl".
A training process of the data conversion model is specifically described below based on FIG. 2a. Referring to FIG. 3, a flowchart of a method for training a data conversion model according to an embodiment of the application is shown.
S301, obtaining element correlation probabilities between each training output semantic position in the training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence.
Because the data conversion model is trained in combination with the encoding model and decoding model, the server 102 may determine element correlation probabilities between corresponding sample input semantic elements and training output semantic elements based on the data correlation probabilities between the initial sub-data and the target sub-data.
In the training process, the model parameters of the coding model and the model parameters of the decoding model are continuously adjusted based on the target data and the initial data, so that the coding model and the decoding model can learn the associated positions of the target sub-data in the target data and the data associated probabilities of the target sub-data and the initial sub-data respectively when the target sub-data are determined. The greater the probability of data correlation between an associated location and an initial sub-data, the more relevant the target sub-data at the associated location is to the initial sub-data; the smaller the probability of data correlation between an associated location and an initial sub-data, the less relevant the target sub-data at the associated location is to the initial sub-data.
For example, among the target data, the target sub-data at each associated position are "a", "is", "dating", "a", and "girl". The initial sub-data are "a", "in", "with", "one", "girl" and "in contact". The probability of data correlation between the target sub-data "dating" and each initial sub-data in the initial data is 0.18, the probability of data correlation between "dating" and "a" is 0.02, the probability of data correlation between "dating" and "between" is 0.05, the probability of data correlation between "dating" and "with" is 0.03, the probability of data correlation between "dating" and "one" is 0.04, the probability of data correlation between "dating" and "girl" is 0.68, and the probability of data correlation between "dating" and "with" is 0.68. The association position of "dating" in the target data is most relevant to "exchange", and the association position of "dating" in the target data is least relevant to "at". Thus, when the target sub-data "dating" is obtained based on the initial data "a is in contact with a girl," more attention should be paid to "in contact with" and less attention should be paid to "in".
After obtaining the associated position of each target sub-data in the target data and the data correlation probability between each target sub-data and each initial sub-data, the server 102 determines each data correlation probability as an element correlation probability between each training output semantic position in the corresponding training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence.
For example, the data correlation probability between the associated position of "dating" in the target data and the initial sub-data "A" is determined as the element correlation probability between the training output semantic position of the training output semantic element corresponding to "dating" in the training output semantic element sequence and the sample input semantic element corresponding to "A".
S302, a data conversion model is adopted, and global training output semantic elements and local training output semantic elements corresponding to the training output semantic positions are respectively obtained based on the obtained element correlation probabilities.
After obtaining the sample input semantic element sequence, the server 102 may obtain, for each training output semantic position in the training output semantic element sequence, a training output semantic element at each training output semantic position, so as to obtain the training output semantic element sequence.
For a training output semantic location, the server 102 may determine the training output semantic element corresponding to that location according to the global training output semantic element and the local training output semantic element corresponding to the location. The global training output semantic element characterizes, for the training output semantic position, the semantic features of all sample input semantic elements in the sample input semantic element sequence, while the local training output semantic element characterizes, for the training output semantic position, the semantic features of only part of the sample input semantic elements in the sequence.
For example, the initial data is the Chinese sentence "A is in contact with a girl", and the sample input semantic element sequence includes the sample input semantic elements corresponding to the initial sub-data "A", "in", "with", "one", "girl", and "in contact". The target data is the English sentence "A is dating a girl", and the training output semantic element sequence comprises the training output semantic elements corresponding to the target sub-data "A", "is", "dating", "a", and "girl". Then, for the third associated position in the training output semantic element sequence, that is, for the training output semantic element corresponding to the target sub-data "dating", the server 102 may determine that element from a macroscopic perspective by attending to the sample input semantic elements corresponding to all of "A", "in", "with", "one", "girl", and "in contact". That is, the global training output semantic element characterizes the semantic features of every sample input semantic element from a macroscopic perspective.
Further, during training the model tends to concentrate its attention on the sample input semantic element most relevant to the training output semantic element, which creates an overfitting situation. To reduce the impact of overfitting on model accuracy, attention can also be paid to the other sample input semantic elements that are next most relevant to the training output semantic element.
For example, for the third associated position in the training output semantic element sequence, i.e., for the training output semantic element corresponding to the target sub-data "dating", if too much attention is focused on the sample input semantic element corresponding to the initial sub-data "in contact" and little attention is paid to the sample input semantic elements corresponding to the other initial sub-data, the server 102 may easily translate "in contact" into "socializing with". By dispersing part of the attention on "in contact" to "one" and "girl", the server 102 may determine the training output semantic element corresponding to the target sub-data "dating" from a microscopic perspective, reducing the likelihood of producing "socializing with" instead. That is, the local training output semantic element characterizes the semantic features of part of the sample input semantic elements from a microscopic perspective.
The process of obtaining the global training output semantic element and the local training output semantic element corresponding to a training output semantic position is specifically described below. Referring to fig. 4, a flowchart of obtaining the global training output semantic element and the local training output semantic element corresponding to a training output semantic position is shown.
S401, obtaining global training output semantic elements corresponding to training output semantic positions based on the obtained element correlation probabilities.
Based on the obtained element correlation probabilities, the first element weight of each sample input semantic element in the sample input semantic element sequence corresponding to the training output semantic position is determined. The server 102 uses the element correlation probability between the training output semantic location and the first sample input semantic element in the sequence as the first element weight of the first sample input semantic element corresponding to that location, uses the element correlation probability between the training output semantic location and the second sample input semantic element as the first element weight of the second sample input semantic element, and so on, thereby obtaining the first element weight of each sample input semantic element in the sequence corresponding to the training output semantic position.
After obtaining the first element weights of the sample input semantic elements corresponding to the training output semantic positions, the server 102 performs weighted summation processing on the sample input semantic elements based on the first element weights, and determines global training output semantic elements corresponding to the training output semantic positions.
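As an illustration, the weighted summation of S401 can be sketched in a few lines of Python. This is a minimal sketch, not the patented implementation itself: the vectors, dimensions, and function names are hypothetical, and the element correlation probabilities are used directly as the first element weights, as described above.

```python
import numpy as np

def global_training_output(element_corr_probs: np.ndarray,
                           sample_elements: np.ndarray) -> np.ndarray:
    # element_corr_probs: shape (J,), the element correlation probabilities
    # between one training output semantic position and the J sample input
    # semantic elements; they serve directly as the first element weights.
    # sample_elements: shape (J, d), one d-dimensional vector per element.
    return element_corr_probs @ sample_elements  # weighted summation

# Probabilities from the "dating" example; the 4-d element vectors are made up.
probs = np.array([0.18, 0.02, 0.05, 0.03, 0.04, 0.68])
elements = np.random.rand(6, 4)
global_element = global_training_output(probs, elements)  # shape (4,)
```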
S402, determining target related sample input semantic elements corresponding to the training output semantic positions based on the obtained element related probabilities.
Based on the obtained element correlation probabilities, the server 102 determines a maximum value, i.e., a maximum element correlation probability, of the element correlation probabilities corresponding to the training output semantic positions. After obtaining the maximum element correlation probability corresponding to the training output semantic position, the server 102 determines, in the sample input semantic element sequence, a sample input semantic element corresponding to the maximum element correlation probability, and obtains the most relevant sample input semantic element corresponding to the training output semantic position.
After obtaining the most relevant sample input semantic element corresponding to the training output semantic position, the server 102 determines at least one sample input semantic element that meets a preset correlation condition with the most relevant sample input semantic element in the sample input semantic element sequence, and determines the obtained at least one sample input semantic element as a target relevant sample input semantic element corresponding to the training output semantic position.
As an embodiment, there are various methods for determining at least one sample input semantic element satisfying a preset correlation condition with the most relevant sample input semantic element, and specifically, the preset correlation condition may be set according to an actual use scenario, and two determination methods are described below as an example.
The first determination method comprises the following steps:
And determining at least one sample input semantic element with the distance between the sample input semantic position and the sample input semantic position of the most relevant sample input semantic element within a preset distance range as a target relevant sample input semantic element corresponding to the training output semantic position.
For a training output semantic position, after the corresponding most relevant sample input semantic element in the sample input semantic element sequence is determined, the most relevant sample input semantic element and the sample input semantic elements arranged immediately before and after it may be determined as the target relevant sample input semantic elements; or the sample input semantic element that is arranged behind the most relevant sample input semantic element and separated from it by two sample input semantic elements may be determined as a target relevant sample input semantic element, and so on. The target relevant sample input semantic elements can be determined according to any preset distance range, without limitation.
For example, the training output semantic element corresponding to "dating" is most relevant to the sample input semantic element corresponding to "in contact"; then the sample input semantic element corresponding to "in contact" and the two sample input semantic elements arranged before it, namely those corresponding to "one" and "girl", may be determined as the target relevant sample input semantic elements corresponding to the training output semantic element of "dating".
And a determination method II:
And determining at least one sample input semantic element with the error between the element correlation probability and the maximum element correlation probability within a preset error range as a target correlation sample input semantic element corresponding to the training output semantic position.
For a training output semantic position, after the corresponding most relevant sample input semantic element in the sample input semantic element sequence is determined, at least one sample input semantic element whose element correlation probability is smaller than, but close to, the maximum element correlation probability may be determined as a target relevant sample input semantic element corresponding to the training output semantic position; how many sample input semantic elements are selected, i.e., the preset error range, can be set according to the actual situation and is not specifically limited. Alternatively, at least one sample input semantic element whose element correlation probability differs from the maximum element correlation probability by a specified value may be determined as a target relevant sample input semantic element corresponding to the training output semantic position, which is likewise not specifically limited.
For example, the training output semantic element corresponding to "dating" is most relevant to the sample input semantic element corresponding to "in contact", and the maximum element correlation probability is 0.68. The two sample input semantic elements whose element correlation probabilities rank immediately after this maximum, namely those corresponding to "A" (0.18) and "with" (0.05), may be determined as the target relevant sample input semantic elements corresponding to the training output semantic element of "dating". Alternatively, the sample input semantic elements whose element correlation probabilities differ from the maximum element correlation probability of "in contact" by 0.64 to 0.65, namely those corresponding to "one" and "girl", can be determined as the target relevant sample input semantic elements corresponding to the training output semantic element of "dating".
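A minimal sketch of the two determination methods, assuming the element correlation probabilities are held in an array indexed by sample input semantic position; the window radius and the error band are illustrative parameters, not values fixed by the embodiment.

```python
import numpy as np

def select_by_distance(probs: np.ndarray, radius: int) -> np.ndarray:
    # Method 1: take elements whose position is within a preset distance
    # range of the most relevant sample input semantic element.
    r0 = int(np.argmax(probs))
    return np.arange(max(0, r0 - radius), min(len(probs), r0 + radius + 1))

def select_by_error(probs: np.ndarray, lo: float, hi: float) -> np.ndarray:
    # Method 2: take elements whose gap to the maximum element correlation
    # probability falls within a preset error range [lo, hi].
    gap = probs.max() - probs
    keep = (gap >= lo) & (gap <= hi)
    keep[np.argmax(probs)] = True  # always keep the most relevant element
    return np.flatnonzero(keep)

probs = np.array([0.18, 0.02, 0.05, 0.03, 0.04, 0.68])
select_by_distance(probs, radius=2)       # -> "one", "girl", "in contact"
select_by_error(probs, lo=0.64, hi=0.65)  # -> "one", "girl" (+ "in contact")
```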
S403, determining the local training output semantic elements corresponding to the training output semantic positions based on the obtained element correlation probabilities.
After obtaining the target related sample input semantic elements corresponding to the training output semantic locations, the server 102 determines a second element weight corresponding to the training output semantic locations for each target related sample input semantic element, respectively. The process of determining, by the server 102, the second element weight of each target related sample input semantic element corresponding to the training output semantic position is the same as the process of determining, by the server 102, the first element weight of each sample input semantic element corresponding to the training output semantic position in S401, and will not be described in detail herein.
After obtaining the second element weights of the input semantic elements of the target related samples corresponding to the training output semantic positions, the server 102 performs weighted summation processing on the input semantic elements of the target related samples based on the second element weights, and determines local training output semantic elements corresponding to the training output semantic positions.
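The local training output semantic element can then be sketched as a weighted summation over only the selected elements. How the second element weights are recomputed on the subset is not spelled out beyond "the same process as S401"; the renormalization below is one plausible reading, not the definitive implementation.

```python
import numpy as np

def local_training_output(element_corr_probs: np.ndarray,
                          sample_elements: np.ndarray,
                          selected: np.ndarray) -> np.ndarray:
    # Restrict attention to the target relevant sample input semantic
    # elements; renormalizing yields the second element weights (assumed).
    w = element_corr_probs[selected]
    w = w / w.sum()
    return w @ sample_elements[selected]

probs = np.array([0.18, 0.02, 0.05, 0.03, 0.04, 0.68])
elements = np.random.rand(6, 4)
local_element = local_training_output(probs, elements, np.array([3, 4, 5]))
```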
It should be noted that there is no necessary sequence relationship between S401 and S402 to S403: S401 and S402 to S403 may be executed simultaneously, S401 may be executed first and S402 to S403 later, or S402 to S403 may be executed first and S401 later.
S303, based on the obtained global training output semantic elements and the local training output semantic elements, respectively determining target training output semantic elements corresponding to the training output semantic positions so as to obtain training output semantic element sequences.
After obtaining each global training output semantic element and each local training output semantic element, the server 102 may determine, for each training output semantic position in the training output semantic element sequence, a corresponding target training output semantic element to obtain the training output semantic element sequence.
For a training output semantic location, the server 102 may obtain a global weight of a global training output semantic element and a local weight of a local training output semantic element corresponding to the training output semantic location. After global training output semantic elements and local training output semantic elements corresponding to the training output semantic positions are obtained, weighting and summing processing is carried out on the global training output semantic elements and the local training output semantic elements according to the global weights and the local weights, and target training output semantic elements corresponding to the training output semantic positions are obtained.
The global weight and the local weight satisfy a preset weight relationship; for example, the sum of the global weight and the local weight is the preset value 1. All possible weight relationships are not listed one by one here.
The global weight and the local weight can be learned by continuously adjusting the model parameters of the data conversion model during training. If the global and local weights were learned directly from the sample input semantic element sequence set, then for any other input semantic element sequence set the data conversion model would need to retrain them, giving the model certain limitations. Therefore, the global weight may be set to depend on the corresponding training output semantic position, so that the local weight also depends on that position. Once the data conversion model has learned which training output semantic element each training output semantic position should output, the global and local weights are tied to the positions themselves; for other input semantic element sequence sets they do not need to be retrained, which improves the flexibility of the data conversion model.
An example description of the process of setting global weights to be relevant to corresponding training output semantic locations is presented below.
And aiming at a training output semantic position, performing linear transformation processing on a position identifier corresponding to the training output semantic position to obtain a position identifier corresponding to the training output semantic position after linear transformation. The location identifier is used to represent the training output semantic location, for example, a numerical value representing the sequence of the training output semantic location in the training output semantic element sequence, or a corresponding training output semantic element, etc., which is not particularly limited.
And determining the position identifier after linear transformation as the global weight of the global training output semantic element corresponding to the training output semantic position, and referring to a formula (1).
g = W · Q_i (1)
Where g represents the global weight of the global training output semantic element, W represents a linear transformation process (for example, a constant or a constant matrix), i represents the training output semantic position, and Q_i represents the position identifier corresponding to the training output semantic position.
Or carrying out normalization processing on the position identifier after linear transformation to obtain a normalized position identifier, determining the normalized position identifier as the global weight of the global training output semantic element corresponding to the training output semantic position, and referring to formula (2).
g = δ(W · Q_i) (2)
Where δ (·) represents a normalization process, such as a sigmoid function.
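A sketch combining formula (2) with the weighted summation described above, where the local weight is taken as 1 - g per the preset weight relationship; W, the position identifier vector, and the dimensions are illustrative assumptions.

```python
import numpy as np

def target_training_output(global_elem: np.ndarray,
                           local_elem: np.ndarray,
                           W: np.ndarray,
                           q_pos: np.ndarray) -> np.ndarray:
    g = 1.0 / (1.0 + np.exp(-(W @ q_pos)))   # formula (2): g = sigmoid(W * Q_i)
    return g * global_elem + (1.0 - g) * local_elem  # local weight is 1 - g

d = 4
W = np.random.rand(d)        # parameters of the linear transformation
q_pos = np.random.rand(d)    # position identifier of this semantic position
target = target_training_output(np.random.rand(d), np.random.rand(d), W, q_pos)
```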
S304, determining training loss based on the sample input semantic element sequence and the training output semantic element sequence, and adjusting model parameters of the data conversion model based on the training loss.
After the training output semantic element sequence is obtained, a training loss of the data conversion model may be determined based on the sample input semantic element sequence and the training output semantic element sequence. Since the coding model, the data conversion model, and the decoding model are trained jointly, a codec training loss may be determined based on the input of the coding model and the output of the decoding model, and this codec training loss is taken as the training loss of the data conversion model. After the training loss is obtained, it may be determined whether it satisfies a preset convergence condition; if so, S305 is performed. If the preset convergence condition is not met, the model parameters of the data conversion model are adjusted based on the training loss, completing one learning pass of the data conversion model over the sample input semantic element sequence.
There may be various processes for determining the codec training loss based on the input of the coding model and the output of the decoding model, for example, determining the codec training loss based on the similarity between the initial data and the target data; or determining a codec training loss based on maximum likelihood estimation of the target data compared to the initial data, etc., without limitation.
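As one hedged example, the maximum-likelihood variant of the codec training loss can be computed as a negative log-likelihood over the decoder's output distributions; the function name and array shapes are illustrative, not part of the embodiment.

```python
import numpy as np

def codec_training_loss(decoder_probs: np.ndarray,
                        target_ids: np.ndarray) -> float:
    # decoder_probs: shape (T, vocab), one distribution per target sub-data.
    # target_ids: shape (T,), indices of the reference target sub-data.
    rows = np.arange(len(target_ids))
    picked = decoder_probs[rows, target_ids]
    return float(-np.log(picked + 1e-12).mean())
```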
S305, obtaining a trained data conversion model until the training loss of the data conversion model meets a preset convergence condition.
After model parameters of the data conversion model are adjusted based on the training loss of the data conversion model, and a learning process of the data conversion model based on sample input semantic elements is achieved, training of the data conversion model can be continued based on a next sample input semantic element sequence in the sample input semantic element sequence set until the training loss of the data conversion model meets a preset convergence condition, and a trained data conversion model is obtained. In the training process, new model parameters are not introduced, and the accuracy of the data conversion model is improved on the premise of not increasing the training burden of the data conversion model.
As an embodiment, the server 102 may send the data obtained in each training process to the second client 1012 for data display or visual display, so that the user may intuitively see the training process of the whole data conversion model based on the second client 1012, so as to facilitate the user to analyze the training process or adjust the data conversion model.
Based on the same inventive concept, the embodiment of the present application further provides a method for using the data conversion model, and the method for using the data conversion model is described below, referring to fig. 5a, which is a schematic diagram of the method for using the data conversion model.
A to-be-processed input semantic element sequence is obtained, and a data conversion model is applied to it. The to-be-processed input semantic element sequence may be obtained by means of a coding model, or by other models or other means, without specific limitation. Then, element correlation probabilities between each conversion output semantic position in the conversion output semantic element sequence and each to-be-processed input semantic element in the to-be-processed input semantic element sequence are obtained.
After obtaining the relevant probabilities of the elements, a trained data conversion model is adopted, and global conversion output semantic elements and local conversion output semantic elements corresponding to the conversion output semantic positions are respectively obtained based on the obtained relevant probabilities of the elements. After each global conversion output semantic element and each local conversion output semantic element are obtained, respectively determining target conversion output semantic elements corresponding to each conversion output semantic position based on each obtained global conversion output semantic element and each local conversion output semantic element so as to obtain a conversion output semantic element sequence.
As an embodiment, the global conversion output semantic element corresponding to each conversion output semantic position is used for representing semantic features of all the to-be-processed input semantic elements in the to-be-processed input semantic element sequence aiming at the corresponding output semantic position, and the local conversion output semantic element corresponding to each conversion output semantic position is used for representing semantic features of part of to-be-processed input semantic elements in the to-be-processed input semantic element sequence aiming at the corresponding conversion output semantic position.
In the embodiment of the application, a data conversion model is adopted, and based on each global conversion output semantic element and each local conversion output semantic element, each target conversion output semantic element corresponding to each conversion output semantic position is obtained, so that a conversion output semantic element sequence is obtained. The multi-dimensional description of the input semantic element sequence to be processed is taken as the basis for obtaining the conversion output semantic element sequence, so that the obtained conversion output semantic element sequence can more accurately represent the semantics of the input semantic element sequence to be processed, and the accuracy of the data conversion model is improved.
The method for using the data conversion model provided by the embodiment of the application is described below.
In the embodiment of the present application, the server 102 obtains the to-be-processed input semantic element sequence based on the coding model. For example, the server 102 receives initial data sent by the first client 1011, namely the Chinese sentence meaning "I love drinking water", and inputs the initial data into the coding model to obtain the to-be-processed input semantic element sequence output by the coding model, i.e., the to-be-processed input semantic elements corresponding to the words "I", "love", "drink", and "water".
The server 102 inputs the obtained to-be-processed input semantic element sequence into the data conversion model and obtains the conversion output semantic element sequence output by the data conversion model. The server 102 then inputs the conversion output semantic element sequence into a decoding model and obtains the target sub-data corresponding to each conversion output semantic element output by the decoding model. Each conversion output semantic element corresponds to one target sub-data, i.e., "I", "like", "drink", and "water", so that the target data "I like drink water" is obtained and Chinese-English text translation is implemented.
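The end-to-end flow can be summarized in a short sketch; encoder, converter, and decoder below are stand-ins for the coding model, the data conversion model, and the decoding model, not concrete APIs from the embodiment.

```python
def translate(initial_data, encoder, converter, decoder):
    pending = encoder(initial_data)    # to-be-processed input semantic elements
    converted = converter(pending)     # conversion output semantic elements
    return decoder(converted)          # target sub-data joined into target data
```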
For another example, the server 102 receives an image transmitted from the first client 1011, and the image includes an image area including a lawn, an image area including a person, and an image area including a dog. The server 102 inputs the image into the coding model, obtains the to-be-processed input semantic elements corresponding to each image area output by the coding model, and the server 102 obtains the to-be-processed input semantic element sequences.
The server 102 inputs the obtained to-be-processed input semantic element sequence into the data conversion model, and obtains a conversion output semantic element sequence output by the data conversion model. The server 102 inputs the obtained conversion output semantic element sequence into a decoding model, and obtains target sub-data corresponding to each conversion output semantic element output by the decoding model, including "target a", "lawn" and "dog", so as to obtain target data "target a walks the dog on the lawn".
The process of obtaining the input semantic element sequence to be processed by the server 102 based on the coding model is similar to the process of obtaining the sample input semantic element sequence by the server 102 based on the coding model described in S201, and is not described herein. The process of obtaining the target data by the server 102 based on the decoding model is similar to the process of obtaining the target data by the server 102 based on the decoding model described in S203, and will not be described herein.
Referring to fig. 5b, a flowchart of a method for using a data conversion model according to an embodiment of the present application is shown.
S501, obtaining a to-be-processed input semantic element sequence, and obtaining element correlation probabilities between each conversion output semantic position in the conversion output semantic element sequence and each to-be-processed input semantic element in the to-be-processed input semantic element sequence.
The process of the server 102 obtaining the conversion output semantic positions in the conversion output semantic element sequence and the element correlation probabilities between the to-be-processed input semantic elements in the to-be-processed input semantic element sequence is similar to the process of the server 102 obtaining the training output semantic positions in the training output semantic element sequence and the element correlation probabilities between the sample input semantic elements in the sample input semantic element sequence in S301, and will not be described again here.
S502, a data conversion model is adopted, and global conversion output semantic elements and local conversion output semantic elements corresponding to the conversion output semantic positions are respectively obtained based on the obtained element correlation probabilities.
The process of the server 102 adopting the data conversion model to obtain the global conversion output semantic elements and the local conversion output semantic elements corresponding to the conversion output semantic positions respectively based on the obtained relevant probabilities of the elements is similar to the process of the server 102 adopting the data conversion model to obtain the global training output semantic elements and the local training output semantic elements corresponding to the training output semantic positions respectively based on the obtained relevant probabilities of the elements in S302, and is not described herein again.
S503, based on the obtained global conversion output semantic elements and local conversion output semantic elements, respectively determining target conversion output semantic elements corresponding to the conversion output semantic positions to obtain a conversion output semantic element sequence.
The process of the server 102 determining, based on the obtained global conversion output semantic elements and the local conversion output semantic elements, the target conversion output semantic elements corresponding to the conversion output semantic positions respectively to obtain the conversion output semantic element sequence is similar to the process of determining, based on the obtained global training output semantic elements and the local training output semantic elements, the target training output semantic elements corresponding to the training output semantic positions respectively to obtain the training output semantic element sequence in S303, and is not described herein.
As one embodiment, the server 102 may visually display the process of using the data conversion model on the second client 1012, so that the user may intuitively see the whole process of using the data conversion model based on the second client 1012, and the user may conveniently analyze the process of using the data conversion model, adjust the data conversion model, or the like.
As an embodiment, to determine whether the data conversion model, for a conversion output semantic location, diverts attention to relevant to-be-processed input semantic elements other than the most relevant one, the server 102 may evaluate the perceptibility of the data conversion model after obtaining the conversion output semantic elements. There are various ways to determine the perceptibility of the data conversion model; one of them is exemplified below. Referring to fig. 6, a flowchart of determining the perceptibility of the data conversion model is shown.
S601, obtaining local entropy corresponding to the conversion output semantic element sequence based on the relevant probabilities of the elements.
The server 102 may obtain an element correlation probability distribution corresponding to each conversion output semantic location based on element correlation probabilities between each conversion output semantic location and each to-be-processed input semantic element in the to-be-processed input semantic element sequence.
For example, the to-be-processed input semantic element sequence is {f_1, f_2, ……, f_j}, the conversion output semantic element sequence is {e_1, e_2, ……, e_i}, and the element correlation probability between a conversion output semantic position pos and the first to-be-processed input semantic element f_1 in the to-be-processed input semantic element sequence is P(f_1|pos). The element correlation probability distribution P_pos corresponding to the conversion output semantic position pos is then given by formula (3).
P_pos = {P(f_1|pos), P(f_2|pos), ……, P(f_j|pos)} (3)
As an embodiment, if the decoding model includes a plurality of decoding layers, an element correlation probability distribution corresponding to each conversion output semantic position can be determined for each decoding layer; the element correlation probability distribution corresponding to a conversion output semantic position pos in the n-th decoding layer is denoted P_pos^n, referring to formula (4).
P_pos^n = {P^n(f_1|pos), P^n(f_2|pos), ……, P^n(f_j|pos)} (4)
After the element correlation probability distribution corresponding to each conversion output semantic position is obtained, the local entropy LE corresponding to the conversion output semantic element sequence is determined, referring to formula (5), i.e., as the average entropy of the element correlation probability distributions over all conversion output semantic positions and, where applicable, all decoding layers.
LE = -(1/(N·i)) Σ_n Σ_pos Σ_j P^n(f_j|pos) · log P^n(f_j|pos) (5)
Where N represents the number of decoding layers and i represents the number of conversion output semantic positions.
The smaller the local entropy LE corresponding to the conversion output semantic element sequence is, the more the attention of the data conversion model is focused on one input semantic element to be processed when the conversion output semantic element sequence is obtained; the larger the local entropy LE corresponding to the conversion output semantic element sequence is, the more the attention of the data conversion model is dispersed on different input semantic elements to be processed when the conversion output semantic element sequence is obtained.
S602, obtaining an accuracy analysis result of the data conversion model based on a comparison result of the obtained local entropy and a preset local entropy threshold.
After obtaining the local entropy corresponding to the conversion output semantic element sequence, the server 102 may compare the obtained local entropy with a preset local entropy threshold and obtain an accuracy analysis result of the data conversion model based on the comparison result. For example, if the obtained local entropy is greater than a preset local entropy threshold, the accuracy analysis result indicates that the attention of the data conversion model is dispersed over different to-be-processed input semantic elements, i.e., the attended to-be-processed input semantic elements are relatively accurate. Or, if the obtained local entropy is greater than a preset first local entropy threshold and smaller than a preset second local entropy threshold, the accuracy analysis result indicates that the attention of the data conversion model is relatively concentrated, dispersed over only part of the to-be-processed input semantic elements in the to-be-processed input semantic element sequence, i.e., the attended to-be-processed input semantic elements are accurate, and so on.
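Under the reconstruction of formulas (3) to (5) above, the local entropy check might look as follows; the threshold value is illustrative, and a single decoding layer is assumed for brevity.

```python
import numpy as np

def local_entropy(attn: np.ndarray) -> float:
    # attn: shape (I, J) -- one element correlation probability distribution
    # per conversion output semantic position, as in formula (3).
    attn = np.clip(attn, 1e-12, 1.0)
    return float((-attn * np.log(attn)).sum(axis=1).mean())

attn = np.array([[0.18, 0.02, 0.05, 0.03, 0.04, 0.68]])
le = local_entropy(attn)
if le > 1.2:  # preset local entropy threshold (illustrative value)
    print("attention dispersed over different input semantic elements")
```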
The above method for using the data conversion model is described below as an example in connection with an application scenario.
Suppose that the data conversion model is required to translate the Chinese sentence "A is in contact with a girl" into "A is dating a girl". After receiving the Chinese sentence sent by the first client 1011, the server 102 inputs it as initial data into the coding model. The server 102 obtains the to-be-processed input semantic elements corresponding to the initial sub-data "A", "in", "with", "one", "girl", and "in contact" output by the coding model, and from these obtains the to-be-processed input semantic element sequence ["A", "in", "with", "one", "girl", "in contact"].
Taking the third associated position, that is, the position of the target sub-data "dating" in the target data, as an example, the server 102 determines that the data correlation probability between the third associated position and "A" is 0.18, between the third associated position and "in" is 0.02, between the third associated position and "with" is 0.05, between the third associated position and "one" is 0.03, between the third associated position and "girl" is 0.04, and between the third associated position and "in contact" is 0.68.
Then, the server 102 obtains that the element correlation probability between the third conversion output semantic position and the to-be-processed input semantic element corresponding to "A" is 0.18, that for "in" is 0.02, that for "with" is 0.05, that for "one" is 0.03, that for "girl" is 0.04, and that for "in contact" is 0.68.
The server 102 takes the element correlation probability as the first element weight of the corresponding input semantic element to be processed, and performs weighted summation processing on the input semantic element to be processed to obtain a global conversion output semantic element corresponding to the third conversion output semantic position. Thus, the server 102 may obtain a global transformation output semantic element corresponding to each transformation output semantic location.
According to the preset correlation condition, the server 102 determines that the target relevant to-be-processed input semantic elements are those corresponding to "one", "girl", and "in contact". The server 102 determines that the data correlation probability between the third associated position and "one" is 0.14, between the third associated position and "girl" is 0.15, and between the third associated position and "in contact" is 0.71. Then, the server 102 obtains that the element correlation probability between the third conversion output semantic position and the to-be-processed input semantic element corresponding to "one" is 0.14, that for "girl" is 0.15, and that for "in contact" is 0.71.
The server 102 takes the element correlation probability as the second element weight of the corresponding target correlation to-be-processed input semantic element, and performs weighted summation processing on the to-be-processed input semantic element to obtain the local conversion output semantic element corresponding to the third conversion output semantic position. Thus, the server 102 may obtain a local transformation output semantic element corresponding to each transformation output semantic location.
Based on the global weight and the local weight, the server 102 respectively performs weighted summation processing on the global conversion output semantic element and the local conversion output semantic element corresponding to each conversion output semantic position, and obtains the target conversion output semantic element corresponding to each conversion output semantic position. Thus, the server 102 may obtain a sequence of transformed output semantic elements.
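Plugging the numbers of this scenario into the sketches above gives a worked example; the element vectors and the gate value g are made up, and only the probabilities come from the scenario.

```python
import numpy as np

elems = np.random.rand(6, 2)   # "A", "in", "with", "one", "girl", "in contact"
global_w = np.array([0.18, 0.02, 0.05, 0.03, 0.04, 0.68])  # first element weights
local_w = np.array([0.14, 0.15, 0.71])   # second element weights of the subset
global_elem = global_w @ elems           # global conversion output element
local_elem = local_w @ elems[3:6]        # over "one", "girl", "in contact"
g = 0.6                                  # assumed global weight for this position
target_elem = g * global_elem + (1 - g) * local_elem
```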
After the server 102 obtains the conversion output semantic element sequence based on the data conversion model, the conversion output semantic element sequence may be input into the decoding model to obtain the target sub-data "A", "is", "dating", "a", and "girl" output by the decoding model for the conversion output semantic elements, thereby obtaining the target data "A is dating a girl". After obtaining the target data, the server 102 may send it to the second client 1012 so that the second client 1012 displays the target data "A is dating a girl". It can be seen that the data conversion model pays additional attention to "one" and "girl", so that "in contact" is translated not as "socializing with" but as "dating", which is obviously more accurate.
The data conversion model may be applied to any model, such as a recurrent neural network (RNN), please refer to fig. 7a (1), or a self-attention network (SAN), please refer to fig. 7a (2). In the RNN, each target sub-data is related only to the corresponding initial sub-data, so the accuracy of the target data is low; the SAN obtains each target sub-data based on all the initial sub-data, so the accuracy of the target data is higher.
The self-attention network SAN includes the autoregressive machine translation model (Autoregressive Machine Translation, AT), please refer to fig. 7b (1), the non-autoregressive neural machine translation model (Non-Autoregressive Machine Translation, NAT), please refer to fig. 7b (2), and the like. Taking the Transformer model as an example, please refer to fig. 7c, which is a schematic diagram of the Transformer model. The multi-head self-attention layer masks the target sub-data on the right side in the target data, ensuring that the learning process proceeds from left to right; for example, target sub-data that has not yet been output is masked, ensuring that learning is based only on the target sub-data already output.
For AT, given the initial data X = X_1, X_2, ……, X_j and the target data Y = Y_1, Y_2, ……, Y_i, the conditional probability P(Y|X; θ) of the target data is given by formula (7).
P(Y|X; θ) = ∏_{k=1}^{i} P(Y_k | Y_{<k}, X; θ) (7)
Where Y_{<k} represents the target sub-data generated before position k, and θ represents the model parameters of the AT.
In training the AT, the maximum likelihood estimate of the conditional probability, L(θ), is determined, please refer to formula (8).
L(θ) = Σ_{k=1}^{i} log P(Y_k | Y_{<k}, X; θ) (8)
For NAT, the conditional probability P(Y|X) releases the conditional dependence on Y_{<k}, please refer to formula (9).
P(Y|X; θ) = ∏_{k=1}^{i} P(Y_k | X; θ) (9)
Since the AT generates the target sub-data one by one, each target sub-data depends on the target sub-data generated before it, please refer to table 1. The AT model therefore has higher complexity, so the efficiency of obtaining the target data is lower. In the NAT, each target sub-data is generated independently, so all target sub-data can be generated simultaneously to obtain the target data, and the efficiency of obtaining the target data is higher, please refer to table 2 and the sketch after it.
TABLE 1
TABLE 2
Model | Time (ms) | BLEU evaluation index
NAT | 210 | 4.76
AT | 353 | 2.83
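The difference between the two factorizations can be sketched as follows; step is a stand-in for one decoding step of the respective model, not an API of the embodiment.

```python
def at_decode(x, step, length):
    # AT, formula (7): each target sub-data conditions on those before it,
    # so generation is necessarily sequential.
    y = []
    for _ in range(length):
        y.append(step(x, tuple(y)))
    return y

def nat_decode(x, step, length):
    # NAT, formula (9): positions are conditionally independent given x,
    # so all target sub-data could be produced in parallel.
    return [step(x, k) for k in range(length)]
```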
In the embodiment of the present application, the data conversion model is applied to conditional masked language models (Conditional Masked Language Models, CMLMs) as an example to describe the data conversion model provided in the embodiment of the present application. The method for training the conditional masked language model will not be described in detail here; in one training process, the following operations may be performed for a sample input semantic element sequence in the sample input semantic element sequence set.
The element correlation probability P(r|pos) between each training output semantic position in the training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence is obtained, please refer to formula (10).
P(r|pos) = softmax_r(Q_pos · K_r / √d_k) (10)
Where Q_pos represents the training output semantic position, K represents the correlation positions of the sample input semantic elements in the sample input semantic element sequence, and d_k represents the dimension of K.
Based on the element correlation probabilities P(r|pos), for a training output semantic position Q_pos, the global training output semantic element C_pos^global of the training output semantic position is determined, please refer to formula (11).
C_pos^global = Σ_r P(r|pos) · V_r (11)
Where V represents the sample input semantic elements.
Based on the partial element correlation probabilities P'(r|pos), the local training output semantic element C_pos^local = Σ_{r∈win(r_0)} P'(r|pos) · V_r of the training output semantic position is determined, where the partial element correlation probability P'(r|pos) is given by formula (12).
P'(r|pos) = exp(e_{pos,r}) / Σ_{r'∈win(r_0)} exp(e_{pos,r'}), for r ∈ win(r_0) (12)
Where e_{pos,r} = Q_pos · K_r / √d_k is the unnormalized score underlying formula (10), r represents the associated position of a target relevant sample input semantic element in the sample input semantic element sequence, r_0 represents the associated position of the sample input semantic element most relevant to the training output semantic position Q_pos, win represents the preset correlation condition, and P'(r|pos) represents the partial element correlation probability.
Based on the global training output semantic element C_pos^global and the local training output semantic element C_pos^local, the target training output semantic element CCAN(Q_pos, K, V) is determined, please refer to formula (13).
CCAN(Q_pos, K, V) = g · C_pos^global + (1 - g) · C_pos^local (13)
Where the global weight g of the global training output semantic element may be related to a linear transformation of Q_pos, please refer to formula (14).
g = δ(W · Q_pos) (14)
Where W represents the parameters of the linear transformation and δ(·) represents the sigmoid function.
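Collecting formulas (10) to (14) into one sketch; the scaled dot-product scores, the fixed window win around the most relevant position, and the masked softmax for the partial probabilities are the reconstruction choices made above, not text fixed by the embodiment.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ccan(Q, K, V, W, win=1):
    # Q: (I, d) training output semantic positions; K, V: (J, d) sample
    # input semantic elements; W: (d,) gate parameters of formula (14).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    P = softmax(scores)                                  # formula (10)
    c_global = P @ V                                     # formula (11)
    r0 = P.argmax(axis=1)                                # most relevant positions
    idx = np.arange(K.shape[0])
    mask = np.abs(idx[None, :] - r0[:, None]) <= win
    P_local = softmax(np.where(mask, scores, -np.inf))   # formula (12)
    c_local = P_local @ V
    g = 1.0 / (1.0 + np.exp(-(Q @ W)))                   # formula (14)
    return g[:, None] * c_global + (1 - g)[:, None] * c_local  # formula (13)
```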
Determining training loss of the data conversion model based on the sample input semantic element sequence and the training output semantic element sequence, and adjusting model parameters of the data conversion model based on the training loss of the data conversion model.
After the training of the conditional masked language model combined with the data conversion model is completed, test experiments may be performed on data sets with the trained model. The data set may be the small-scale WMT16 Romanian-English (Ro-En), the medium-scale WMT14 English-German (En-De), the large-scale WMT17 Chinese-English (Zh-En), or the WAT17 Japanese-English (Ja-En), whose language pairs have different word orders. Before the initial data is input into the Transformer model, it may be preprocessed by BPE word segmentation with 32K merge operations; finally BLEU is used as the evaluation index and statistical significance testing is performed.
As an example, a knowledge distillation method may be used to simplify the training data. For the teacher models, Transformer-Base and Transformer-Big models are trained on the initial data set; for the big model, a large-batch strategy (458K words per batch) is employed to optimize performance. The conditional masked language model may include 6 encoding layers and 6 decoding layers, where the decoding layers are trained with the conditional masked language model objective, using 8 attention heads, a hidden size of 512, and a feed-forward network size of 2048.
Referring to table 3, the conditional masked language model combined with the data conversion model is compared with conventional NAT models on the WMT16 Ro-En, WMT14 En-De, WMT17 Zh-En, and WAT17 Ja-En data sets. According to table 3, the translation performance of the conditional masked language model combined with the data conversion model is higher.
TABLE 3 Table 3
Through the global weight g, the local weight 1 - g can be interpreted as the degree of importance attached to local information, i.e., the localness. The localness of each decoder layer is calculated, see fig. 8. During the data conversion process, localness continues to decline, meaning that the data conversion model keeps dispersing its attention, until the penultimate layer, where localness increases and the data conversion model begins to concentrate. That is, the middle layers are mainly responsible for data conversion and therefore disperse attention, while the top layers need to generate the target data and therefore begin to concentrate attention.
The data conversion model should pay more attention to the most relevant to-be-processed input semantic element while also attending to the other relevant to-be-processed input semantic elements. For the accuracy of the attention of the data conversion model, please refer to fig. 9; different attention accuracies are obtained by setting different values of win. The conditional masked language model combined with the data conversion model is always better than the baseline (ΔAccuracy), illustrating the increased accuracy of the attention of the conditional masked language model.
To verify the language characteristics learned by the conditional masked language model combined with the data conversion model, the learned representations may be quantified from a linguistic perspective using probing tasks. The probing tasks can be divided into three types: "surface tasks" probe simple surface properties learned from sentence embeddings, "syntactic tasks" quantify how much syntax is retained, and "semantic tasks" evaluate the deeper semantic representation capability, please refer to table 4. It can be seen that the conditional masked language model combined with the data conversion model retains rich grammatical and semantic information.
TABLE 4 Table 4
Based on the same inventive concept, the embodiment of the present application provides a device for training a data conversion model, which is equivalent to the server 102 discussed above, and can implement the functions corresponding to the method for training the data conversion model. Referring to fig. 10, the apparatus includes a training module 1001 and an obtaining module 1002, where:
The training module 1001 is configured to train the data conversion model by using a sample input semantic element sequence set to obtain a trained data conversion model, wherein in the training process, at least the following operations are executed for a sample input semantic element sequence in the sample input semantic element sequence set:
The obtaining module 1002 is configured to obtain element correlation probabilities between each training output semantic position in the training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence;
The training module 1001 is further configured to: adopt the data conversion model, respectively obtain, based on the obtained element correlation probabilities, the global training output semantic element and the local training output semantic element corresponding to each training output semantic position, and respectively determine, based on the obtained global and local training output semantic elements, the target training output semantic element corresponding to each training output semantic position, so as to obtain the training output semantic element sequence, wherein the global training output semantic elements are related to all sample input semantic elements and the local training output semantic elements are related to part of the sample input semantic elements.
In one possible embodiment, the obtaining module 1002 is specifically configured to:
Based on the arrangement rule of target sub-data learned by the coding model and the decoding model, respectively determining the associated position of the target sub-data corresponding to each training output semantic position in the target data and the data correlation probability between each initial sub-data corresponding to each sample input semantic position in the initial data, wherein the sample input semantic element sequence is obtained by adopting the coding model to code the initial data, and the target data is obtained by adopting the decoding model corresponding to the coding model to decode the training output semantic element sequence;
based on the obtained respective data correlation probabilities, respective training output semantic locations in the training output semantic element sequence are determined, together with element correlation probabilities between respective sample input semantic elements in the sample input semantic element sequence.
In one possible embodiment, the training module 1001 is specifically configured to:
For each training output semantic position, the following operations are respectively executed:
Based on the obtained element correlation probabilities, obtaining sample input semantic elements in the sample input semantic element sequence respectively, wherein the sample input semantic elements correspond to first element weights of one training output semantic position in each training output semantic position;
And carrying out weighted summation processing on each sample input semantic element based on the first element weight corresponding to each sample input semantic element to obtain a global training output semantic element corresponding to the training output semantic position.
In one possible embodiment, the training module 1001 is specifically configured to:
For each training output semantic position, the following operations are respectively executed:
Based on the obtained correlation probability of each element, determining a target correlation sample input semantic element corresponding to one training output semantic position in each training output semantic position in a sample input semantic element sequence;
Based on the obtained element correlation probabilities, obtaining target correlation sample input semantic elements in the sample input semantic element sequence respectively, wherein the target correlation sample input semantic elements correspond to second element weights of a training output semantic position;
and carrying out weighted summation processing on the input semantic elements of each target related sample based on the second element weights corresponding to the input semantic elements of each target related sample to obtain a local training output semantic element corresponding to the training output semantic position.
In one possible embodiment, the training module 1001 is specifically configured to:
Based on the obtained element correlation probabilities, determining the maximum element correlation probability corresponding to a training output semantic position, and obtaining the most relevant sample input semantic element corresponding to the training output semantic position in the sample input semantic element sequence;
And in the sample input semantic element sequence, determining at least one sample input semantic element which meets the preset correlation condition with the most relevant sample input semantic element as a target relevant sample input semantic element corresponding to the training output semantic position.
In one possible embodiment, the training module 1001 is specifically configured to:
In a sample input semantic element sequence, determining at least one sample input semantic element with a distance between a sample input semantic position and a sample input semantic position of the most relevant sample input semantic element within a preset distance range as a target relevant sample input semantic element corresponding to a training output semantic position; or alternatively
And in the sample input semantic element sequence, determining at least one sample input semantic element with the error between the element correlation probability and the maximum element correlation probability within a preset error range as a target correlation sample input semantic element corresponding to the training output semantic position.
In one possible embodiment, the training module 1001 is specifically configured to:
For each training output semantic position, the following operations are respectively executed:
Obtaining global weights of global training output semantic elements and local weights of local training output semantic elements corresponding to one training output semantic position in each training output semantic position;
and carrying out weighted summation processing on the global training output semantic elements and the local training output semantic elements corresponding to one training output semantic position based on the obtained global weight and the local weight to obtain target training output semantic elements corresponding to one training output semantic position.
In one possible embodiment, the training module 1001 is specifically configured to:
performing linear transformation on the position identifier corresponding to the training output semantic position, to obtain the linearly transformed position identifier corresponding to that position;
determining the linearly transformed position identifier as the global weight of the global training output semantic element corresponding to that training output semantic position;
and determining the weight coefficient that satisfies a preset weight relation with the obtained global weight as the local weight of the local training output semantic element corresponding to that training output semantic position.
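A sketch of one way to realize this weighting; the sigmoid squashing and the complementary relation g + l = 1 are assumptions (the text only requires a linear transformation of the position identifier and a preset weight relation), and `W` and `b` are hypothetical learnable parameters.

```python
import numpy as np

def fuse_global_local(global_elem, local_elem, pos_id, W, b):
    """Target output semantic element from its global and local parts."""
    # Global weight from a linear transform of the position identifier,
    # squashed to (0, 1) with a sigmoid (an added assumption).
    g = 1.0 / (1.0 + np.exp(-(W @ pos_id + b)))
    l = 1.0 - g  # local weight via the assumed preset weight relation
    return g * global_elem + l * local_elem
```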
In one possible embodiment, the training module 1001 is further configured to:
after the training output semantic element sequence is obtained, determining the coding and decoding training loss corresponding to the coding model and the decoding model, based on the similarity between the target data corresponding to the training output semantic element sequence and the initial data corresponding to the sample input semantic element sequence;
and determining the coding and decoding training loss as the training loss of the data conversion model, and adjusting the model parameters of the data conversion model based on that training loss.
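The text does not fix the similarity measure, so the sketch below uses one minus cosine similarity between vector summaries of the decoded target data and the initial data; token-level cross entropy against reference data would be an equally plausible substitute.

```python
import numpy as np

def codec_training_loss(target_vec, initial_vec):
    """Similarity-based coding and decoding training loss (assumed form)."""
    cos = target_vec @ initial_vec / (
        np.linalg.norm(target_vec) * np.linalg.norm(initial_vec) + 1e-9)
    return 1.0 - cos  # small when target data closely matches initial data
```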
Based on the same inventive concept, an embodiment of the present application provides a device using a data conversion model, which is equivalent to the server 102 discussed above and can implement the functions corresponding to the method of using the data conversion model. Referring to fig. 11, the apparatus includes an obtaining module 1101 and a conversion module 1102, where:
Obtaining module 1101: configured to obtain a to-be-processed input semantic element sequence, and to obtain element correlation probabilities between each conversion output semantic position in a conversion output semantic element sequence and each to-be-processed input semantic element in the to-be-processed input semantic element sequence;
Conversion module 1102: configured to, using the data conversion model, obtain the global conversion output semantic element and the local conversion output semantic element corresponding to each conversion output semantic position based on the obtained element correlation probabilities, and to determine the target conversion output semantic element corresponding to each conversion output semantic position based on the obtained global and local conversion output semantic elements, so as to obtain a conversion output semantic element sequence, where a global conversion output semantic element is related to every to-be-processed input semantic element and a local conversion output semantic element is related to only part of the to-be-processed input semantic elements.
In one possible embodiment, the obtaining module 1101 is specifically configured to:
based on the arrangement rule of target sub-data learned by the coding model and the decoding model, determining the data correlation probability between the associated position of the target sub-data corresponding to each conversion output semantic position in the target data and each initial sub-data corresponding to each to-be-processed input semantic position in the initial data, where the to-be-processed input semantic element sequence is obtained by encoding the initial data with the coding model, and the target data is obtained by decoding the conversion output semantic element sequence with the decoding model corresponding to the coding model;
and determining the element correlation probabilities between each conversion output semantic position in the conversion output semantic element sequence and each to-be-processed input semantic element in the to-be-processed input semantic element sequence, based on the obtained data correlation probabilities.
In one possible embodiment, the conversion module 1102 is specifically configured to:
for each conversion output semantic position, performing the following operations with the data conversion model:
based on the obtained element correlation probabilities, obtaining the first element weight of each to-be-processed input semantic element in the to-be-processed input semantic element sequence with respect to one of the conversion output semantic positions;
and performing weighted summation over the to-be-processed input semantic elements, based on the first element weight corresponding to each of them, to obtain the global conversion output semantic element corresponding to that conversion output semantic position.
In one possible embodiment, the conversion module 1102 is specifically configured to:
For each conversion output semantic position, the following operations are respectively executed:
based on the obtained element correlation probabilities, determining, in the to-be-processed input semantic element sequence, the target relevant to-be-processed input semantic elements corresponding to one of the conversion output semantic positions;
based on the obtained element correlation probabilities, obtaining the second element weight of each of those target relevant to-be-processed input semantic elements with respect to that conversion output semantic position;
and performing weighted summation over the target relevant to-be-processed input semantic elements, based on the second element weight corresponding to each of them, to obtain the local conversion output semantic element corresponding to that conversion output semantic position.
In one possible embodiment, the conversion module 1102 is specifically configured to:
based on the obtained element correlation probabilities, determining the maximum element correlation probability corresponding to a conversion output semantic position, and thereby obtaining, in the to-be-processed input semantic element sequence, the most relevant to-be-processed input semantic element corresponding to that position;
and determining, in the to-be-processed input semantic element sequence, at least one to-be-processed input semantic element that satisfies a preset correlation condition with the most relevant to-be-processed input semantic element as a target relevant to-be-processed input semantic element corresponding to that conversion output semantic position.
In one possible embodiment, the conversion module 1102 is specifically configured to:
determining, in the to-be-processed input semantic element sequence, at least one to-be-processed input semantic element whose to-be-processed input semantic position lies within a preset distance range of the to-be-processed input semantic position of the most relevant to-be-processed input semantic element, as a target relevant to-be-processed input semantic element corresponding to the conversion output semantic position; or
determining, in the to-be-processed input semantic element sequence, at least one to-be-processed input semantic element whose element correlation probability differs from the maximum element correlation probability by no more than a preset error range, as a target relevant to-be-processed input semantic element corresponding to the conversion output semantic position.
In one possible embodiment, the conversion module 1102 is specifically configured to:
For each conversion output semantic position, the following operations are respectively executed:
obtaining the global weight of the global conversion output semantic element and the local weight of the local conversion output semantic element corresponding to one of the conversion output semantic positions;
and performing weighted summation over that global conversion output semantic element and local conversion output semantic element, based on the obtained global weight and local weight, to obtain the target conversion output semantic element corresponding to that conversion output semantic position.
In one possible embodiment, the conversion module 1102 is further configured to:
after the conversion output semantic element sequence is obtained, obtaining the local entropy corresponding to the conversion output semantic element sequence, based on the obtained element correlation probabilities;
and obtaining an accuracy analysis result of the data conversion model based on the comparison between the obtained local entropy and a preset local entropy threshold, where the accuracy analysis result characterizes whether the to-be-processed input semantic elements attended to when determining the target conversion output semantic element corresponding to each conversion output semantic position are accurate.
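Under an assumed reading of "local entropy" as the average Shannon entropy of the element correlation probabilities across conversion output semantic positions, the accuracy check can be sketched as follows; a concentrated (low-entropy) distribution suggests the attended-to input elements are accurate.

```python
import numpy as np

def accuracy_check(corr_probs_per_position, entropy_threshold):
    """Returns (is_accurate, local_entropy) for one converted sequence."""
    entropies = [-(p * np.log(p + 1e-9)).sum() for p in corr_probs_per_position]
    local_entropy = float(np.mean(entropies))
    return local_entropy <= entropy_threshold, local_entropy
```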
Based on the same inventive concept, an embodiment of the present application provides a computer device; the computer device 1200 is described below.
Referring to fig. 12, the apparatus for training and using the data conversion model may run on a computer device 1200. Application software corresponding to the current and historical versions of the program for training and using the data conversion model may be installed on the computer device 1200. The computer device 1200 includes a display unit 1240, a processor 1280 and a memory 1220, where the display unit 1240 includes a display panel 1241 for displaying the interface with which the user interacts, among other content.
In one possible embodiment, the display panel 1241 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The processor 1280 is configured to read a computer program and then execute the method defined by that program; for example, the processor 1280 reads the program or files for training and using the data conversion model, so that the program runs on the computer device 1200 and the corresponding interface is displayed on the display unit 1240. The processor 1280 may include one or more general-purpose processors, and may also include one or more digital signal processors (DSPs) for performing the relevant operations that implement the technical solutions provided by the embodiments of the present application.
The memory 1220 generally includes internal memory and external memory; the internal memory may be random access memory (RAM), read-only memory (ROM), cache memory, and the like, while the external memory may be a hard disk, an optical disc, a USB disk, a floppy disk, a tape drive, and the like. The memory 1220 is used to store computer programs, including the application programs corresponding to the respective clients, and other data, which may include data generated after the operating system or the application programs run, including system data (e.g., configuration parameters of the operating system) and user data. In an embodiment of the present application, program instructions are stored in the memory 1220, and the processor 1280 executes those instructions to implement any of the methods of training and using a data conversion model discussed in the previous figures.
The display unit 1240 described above is used to receive input digital information, character information, or touch/contactless gesture operations, and to generate signal inputs related to user settings and function controls of the computer device 1200. Specifically, in an embodiment of the present application, the display unit 1240 may include a display panel 1241. The display panel 1241, e.g., a touch screen, may collect touch operations on or near it (e.g., operations the user performs on or near the display panel 1241 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a predetermined program.
In one possible embodiment, the display panel 1241 may include a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 1280; it can also receive and execute commands from the processor 1280.
The display panel 1241 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the display unit 1240, the computer device 1200 may also include an input unit 1230, which may include a graphical input device 1231 and other input devices 1232; the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
Beyond the above, the computer device 1200 may also include a power supply 1290 for powering the other modules, audio circuitry 1260, a near field communication module 1270, and RF circuitry 1210. The computer device 1200 may also include one or more sensors 1250, such as acceleration sensors, light sensors, and pressure sensors. The audio circuitry 1260 specifically includes a speaker 1261, a microphone 1262, and the like; for example, the computer device 1200 can collect the user's voice through the microphone 1262 and perform the corresponding operations.
There may be one or more processors 1280, and the processor 1280 and the memory 1220 may be coupled or relatively independent.
As an example, the processor 1280 in fig. 12 may be used to implement the functions of the training module 1001 and the acquisition module 1002 in fig. 10, and may also be used to implement the functions of the acquisition module 1101 and the conversion module 1102 in fig. 11.
As an example, the processor 1280 in fig. 12 may be used to implement the functionality corresponding to the test equipment 103 discussed previously.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated units of the invention are implemented in the form of software functional modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solutions of the embodiments of the present invention, or the part contributing over the prior art, may be embodied as a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disc.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (13)
1. A method of training a data conversion model, comprising:
Training the data conversion model with a sample input semantic element sequence set to obtain a trained data conversion model, wherein in one training process, for a sample input semantic element sequence in the sample input semantic element sequence set, at least the following operations are executed:
obtaining element correlation probabilities between each training output semantic position in a training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence;
using the data conversion model, respectively executing the following operations for each training output semantic position:
based on the obtained element correlation probabilities, obtaining the first element weight of each sample input semantic element in the sample input semantic element sequence with respect to one training output semantic position;
performing weighted summation on the sample input semantic elements based on the first element weight corresponding to each sample input semantic element, to obtain a global training output semantic element corresponding to the one training output semantic position;
determining, in the sample input semantic element sequence, target relevant sample input semantic elements corresponding to the one training output semantic position based on the obtained element correlation probabilities, and obtaining the second element weight of each obtained target relevant sample input semantic element with respect to the one training output semantic position;
performing weighted summation on the target relevant sample input semantic elements based on the second element weight corresponding to each target relevant sample input semantic element, to obtain a local training output semantic element corresponding to the one training output semantic position;
and determining, based on the obtained global training output semantic elements and local training output semantic elements, the target training output semantic element corresponding to each training output semantic position respectively, so as to obtain the training output semantic element sequence, wherein a global training output semantic element is related to every sample input semantic element and a local training output semantic element is related to part of the sample input semantic elements.
2. The method of claim 1, wherein obtaining element-related probabilities between each training output semantic position in the training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence comprises:
based on the arrangement rule of target sub-data learned by a coding model and a decoding model, determining the data correlation probability between the associated position of the target sub-data corresponding to each training output semantic position in the target data and each initial sub-data corresponding to each sample input semantic position in the initial data, wherein the sample input semantic element sequence is obtained by encoding the initial data with the coding model, and the target data is obtained by decoding the training output semantic element sequence with the decoding model corresponding to the coding model;
and determining the element correlation probabilities between each training output semantic position in the training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence, based on the obtained data correlation probabilities.
3. The method of claim 1, wherein determining, in the sample input semantic element sequence, target relevant sample input semantic elements corresponding to one of the training output semantic positions based on the obtained element correlation probabilities comprises:
based on the obtained element correlation probabilities, determining the maximum element correlation probability corresponding to the one training output semantic position, and thereby obtaining, in the sample input semantic element sequence, the most relevant sample input semantic element corresponding to the one training output semantic position; and
determining, in the sample input semantic element sequence, at least one sample input semantic element that satisfies a preset correlation condition with the most relevant sample input semantic element as a target relevant sample input semantic element corresponding to the one training output semantic position.
4. The method according to claim 3, wherein determining, in the sample input semantic element sequence, at least one sample input semantic element that satisfies a preset correlation condition with the most relevant sample input semantic element as a target relevant sample input semantic element corresponding to the one training output semantic position comprises:
determining, in the sample input semantic element sequence, at least one sample input semantic element whose sample input semantic position lies within a preset distance range of the sample input semantic position of the most relevant sample input semantic element, as a target relevant sample input semantic element corresponding to the one training output semantic position; or
determining, in the sample input semantic element sequence, at least one sample input semantic element whose element correlation probability differs from the maximum element correlation probability by no more than a preset error range, as a target relevant sample input semantic element corresponding to the one training output semantic position.
5. The method according to any one of claims 1 to 4, wherein determining, based on the obtained global training output semantic elements and local training output semantic elements, the target training output semantic elements corresponding to the training output semantic positions, respectively, includes:
for each training output semantic position, respectively executing the following operations:
obtaining the global weight of the global training output semantic element and the local weight of the local training output semantic element corresponding to one of the training output semantic positions;
and performing weighted summation on the global training output semantic element and the local training output semantic element corresponding to the one training output semantic position, based on the obtained global weight and local weight, to obtain the target training output semantic element corresponding to the one training output semantic position.
6. The method of claim 5, wherein obtaining global weights for global training output semantic elements and local weights for local training output semantic elements corresponding to one of the respective training output semantic locations comprises:
performing linear transformation on the position identifier corresponding to the one training output semantic position, to obtain the linearly transformed position identifier corresponding to the one training output semantic position;
determining the linearly transformed position identifier as the global weight of the global training output semantic element corresponding to the one training output semantic position;
and determining the weight coefficient that satisfies a preset weight relation with the obtained global weight as the local weight of the local training output semantic element corresponding to the one training output semantic position.
7. The method of claim 2, further comprising, after obtaining the training output sequence of semantic elements:
determining the coding and decoding training loss corresponding to the coding model and the decoding model, based on the similarity between the target data corresponding to the training output semantic element sequence and the initial data corresponding to the sample input semantic element sequence;
and determining the coding and decoding training loss as the training loss of the data conversion model, and adjusting the model parameters of the data conversion model based on the training loss.
8. A method of using a data transformation model, comprising:
Obtaining a to-be-processed input semantic element sequence, and obtaining element correlation probabilities between each conversion output semantic position in a conversion output semantic element sequence and each to-be-processed input semantic element in the to-be-processed input semantic element sequence;
using the data conversion model, respectively executing the following operations for each conversion output semantic position:
based on the obtained element correlation probabilities, obtaining the first element weight of each to-be-processed input semantic element in the to-be-processed input semantic element sequence with respect to one conversion output semantic position;
performing weighted summation on the to-be-processed input semantic elements based on the first element weight corresponding to each to-be-processed input semantic element, to obtain a global conversion output semantic element corresponding to the one conversion output semantic position;
determining, in the to-be-processed input semantic element sequence, target relevant to-be-processed input semantic elements corresponding to the one conversion output semantic position based on the obtained element correlation probabilities, and obtaining the second element weight of each obtained target relevant to-be-processed input semantic element with respect to the one conversion output semantic position;
performing weighted summation on the target relevant to-be-processed input semantic elements based on the second element weight corresponding to each target relevant to-be-processed input semantic element, to obtain a local conversion output semantic element corresponding to the one conversion output semantic position;
and determining, based on the obtained global conversion output semantic elements and local conversion output semantic elements, the target conversion output semantic element corresponding to each conversion output semantic position respectively, so as to obtain a conversion output semantic element sequence, wherein a global conversion output semantic element is related to every to-be-processed input semantic element and a local conversion output semantic element is related to part of the to-be-processed input semantic elements.
9. The method of claim 8, further comprising, after obtaining the sequence of transformed output semantic elements:
based on the obtained element correlation probabilities, obtaining the local entropy corresponding to the conversion output semantic element sequence;
and obtaining an accuracy analysis result of the data conversion model based on the comparison between the obtained local entropy and a preset local entropy threshold, wherein the accuracy analysis result characterizes whether the to-be-processed input semantic elements attended to when determining the target conversion output semantic element corresponding to each conversion output semantic position are accurate.
10. An apparatus for training a data transformation model, comprising:
A training module, configured to train the data conversion model with a sample input semantic element sequence set to obtain a trained data conversion model, wherein in one training process, for a sample input semantic element sequence in the sample input semantic element sequence set, at least the following operations are executed:
an acquisition module, configured to obtain element correlation probabilities between each training output semantic position in a training output semantic element sequence and each sample input semantic element in the sample input semantic element sequence;
the training module being further configured to, using the data conversion model, respectively execute the following operations for each training output semantic position:
based on the obtained element correlation probabilities, obtaining the first element weight of each sample input semantic element in the sample input semantic element sequence with respect to one training output semantic position;
performing weighted summation on the sample input semantic elements based on the first element weight corresponding to each sample input semantic element, to obtain a global training output semantic element corresponding to the one training output semantic position;
determining, in the sample input semantic element sequence, target relevant sample input semantic elements corresponding to the one training output semantic position based on the obtained element correlation probabilities, and obtaining the second element weight of each obtained target relevant sample input semantic element with respect to the one training output semantic position;
performing weighted summation on the target relevant sample input semantic elements based on the second element weight corresponding to each target relevant sample input semantic element, to obtain a local training output semantic element corresponding to the one training output semantic position;
and determining, based on the obtained global training output semantic elements and local training output semantic elements, the target training output semantic element corresponding to each training output semantic position respectively, so as to obtain the training output semantic element sequence, wherein a global training output semantic element is related to every sample input semantic element and a local training output semantic element is related to part of the sample input semantic elements.
11. An apparatus for using a data transformation model, comprising:
An acquisition module, configured to obtain a to-be-processed input semantic element sequence, and to obtain element correlation probabilities between each conversion output semantic position in a conversion output semantic element sequence and each to-be-processed input semantic element in the to-be-processed input semantic element sequence;
a conversion module, configured to, using the data conversion model, respectively execute the following operations for each conversion output semantic position:
based on the obtained element correlation probabilities, obtaining the first element weight of each to-be-processed input semantic element in the to-be-processed input semantic element sequence with respect to one conversion output semantic position;
performing weighted summation on the to-be-processed input semantic elements based on the first element weight corresponding to each to-be-processed input semantic element, to obtain a global conversion output semantic element corresponding to the one conversion output semantic position;
determining, in the to-be-processed input semantic element sequence, target relevant to-be-processed input semantic elements corresponding to the one conversion output semantic position based on the obtained element correlation probabilities, and obtaining the second element weight of each obtained target relevant to-be-processed input semantic element with respect to the one conversion output semantic position;
performing weighted summation on the target relevant to-be-processed input semantic elements based on the second element weight corresponding to each target relevant to-be-processed input semantic element, to obtain a local conversion output semantic element corresponding to the one conversion output semantic position;
and determining, based on the obtained global conversion output semantic elements and local conversion output semantic elements, the target conversion output semantic element corresponding to each conversion output semantic position respectively, so as to obtain a conversion output semantic element sequence, wherein a global conversion output semantic element is related to every to-be-processed input semantic element and a local conversion output semantic element is related to part of the to-be-processed input semantic elements.
12. A computer device, comprising:
A memory for storing program instructions;
a processor, configured to invoke the program instructions stored in the memory and to execute the method according to any one of claims 1 to 9 in accordance with the obtained program instructions.
13. A storage medium storing computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 9.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110155510.6A | 2021-02-04 | 2021-02-04 | Method, device and computer equipment for training and using data conversion model |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN113609863A (en) | 2021-11-05 |
| CN113609863B (en) | 2024-05-07 |
Family

- Family ID: 78336404
- Family Applications (1): CN202110155510.6A, granted as CN113609863B (en), status Active

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN113609863B (en) |
Families Citing this family (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114201992A (en) * | 2021-12-10 | 2022-03-18 | 北京声智科技有限公司 | State detection method, device, detection equipment and storage medium |
| CN115034236B (en) * | 2022-05-09 | 2024-09-06 | 江苏科技大学 | Chinese-English machine translation method based on knowledge distillation |
Citations (7)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110532554A (en) * | 2019-08-26 | 2019-12-03 | 南京信息职业技术学院 | Chinese abstract generation method, system and storage medium |
| CN110795945A (en) * | 2019-10-30 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Semantic understanding model training method, semantic understanding device and storage medium |
| CN110852110A (en) * | 2018-07-25 | 2020-02-28 | 富士通株式会社 | Target sentence extraction method, question generation method, and information processing apparatus |
| CN111666766A (en) * | 2019-03-05 | 2020-09-15 | 阿里巴巴集团控股有限公司 | Data processing method, device and equipment |
| CN111859997A (en) * | 2020-06-16 | 2020-10-30 | 北京百度网讯科技有限公司 | Model training method and device in machine translation, electronic equipment and storage medium |
| CN111859919A (en) * | 2019-12-02 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Text error correction model training method, device, electronic device and storage medium |
| CN111933129A (en) * | 2020-09-11 | 2020-11-13 | 腾讯科技(深圳)有限公司 | Audio processing method, language model training method and device and computer equipment |
Family Cites Families (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014144800A1 (en) * | 2013-03-15 | 2014-09-18 | Cornell University | Computer system methods for generating combined language content |
| EP3460685A1 (en) * | 2017-09-12 | 2019-03-27 | Bricsys NV | Improved semantic classification of an entity in a building information model |

2021-02-04: CN application CN202110155510.6A filed; granted as patent CN113609863B (status: Active).
Legal Events

| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40056127 |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |