Disclosure of Invention
To solve the above technical problems, the invention provides a neural network-based task matching data security matching method.
The task matching data security matching method based on the neural network adopts the following technical scheme:
the embodiment of the invention provides a neural network-based task matching data security matching method, which comprises the following steps:
Collecting text data sets of different data sources;
For each text data in the text data set of the same data source, constructing a structural multivalent matrix of each text data according to the correlation relationship between each text data and the dependency relationship between words, acquiring the whole text meaning matching degree between any two text data according to the semantic information difference between any two text data, acquiring the data dependency structure similarity between any two text data according to the structure feature similarity condition between any two text data, constructing the homologous data correlation coefficient between any two text data based on the whole text meaning matching degree and the data dependency structure similarity;
Acquiring transaction association rule matching cost between any two words in different data sources according to association rule differences between any two words in different data sources; for any two pieces of text data between different data sources, acquiring the text characteristic space information matching degree between any two pieces of text data according to the space information difference between the any two pieces of text data; acquiring multi-section valuable matching cost between any two pieces of text data based on text feature space information matching degree and task multi-dimensional matching cost weight;
And obtaining the matching degree between any two pieces of text data in different data sources by adopting a twin network, and screening the matching data of each piece of text data in each data source according to the matching degree.
Preferably, the construction of the structural multivalent matrix of each text data according to the correlation relationship between each text data and the dependency relationship between words includes:
using the Jaccard coefficients between each piece of text data and all other text data in the text data set of the same data source as the input of the Otsu threshold algorithm to obtain a segmentation threshold;
forming the similar data set of each piece of text data from all text data in the same text data set whose Jaccard coefficient is larger than the segmentation threshold;
counting all dependency relationship types and corresponding frequencies of all text data in the similar data set of each piece of text data;
and, for each piece of text data, taking the frequency of the r-th dependency relationship corresponding to the p-th word in the text data as the element in the p-th row and r-th column of the structural multivalent matrix of the text data.
Preferably, the obtaining the whole text meaning matching degree between any two text data according to the semantic information difference between any two text data includes:
obtaining the ED edit distance between any two pieces of text data; taking the absolute value of the difference between the numbers of words in the two pieces of text data as the exponent of an exponential function with the natural constant as its base; and calculating the DTW distance between the structural multivalent matrices of the two pieces of text data;
calculating the product of the result of the exponential function and the DTW distance, and taking the sum of the product and the ED edit distance as the whole text meaning matching degree between the two pieces of text data.
Preferably, the obtaining the data dependency structure similarity between any two pieces of text data according to the structure feature similarity between any two pieces of text data includes:
for each word in each piece of text data, using the ELMo model to obtain the word vector of the word, taking the word vector as the first element of the single dependency structure down-conversion sequence of the word, and taking the dependency relationship frequencies in the corresponding row vector of the structural multivalent matrix of the text data containing the word, sorted in descending order, as the second to last elements of the sequence;
taking any two words, one from each of the two pieces of text data, as a word pair, calculating the Pearson correlation coefficient between the single dependency structure down-conversion sequences of each word pair, and calculating the mean of the Pearson correlation coefficients of all word pairs in the two pieces of text data;
and taking the product of the number of words present in both pieces of text data and the mean as the data dependency structure similarity between the two pieces of text data.
Preferably, the constructing the homologous data association coefficient between any two text data based on the whole text meaning matching degree and the data dependency structure similarity includes:
constructing a first exponential function with the natural constant as its base and the whole text meaning matching degree between any two pieces of text data as its exponent; constructing a second exponential function with the natural constant as its base and the data dependency structure similarity between the two pieces of text data as its exponent; and taking the ratio of the result of the second exponential function to the result of the first exponential function as the homologous data association coefficient between the two pieces of text data.
Preferably, the obtaining the transaction association rule matching cost between any two words in different data sources according to the association rule difference between any two words in different data sources includes:
acquiring an association rule and an association rule confidence coefficient of each word in a text data set of each data source by adopting an Apriori algorithm;
For any two words in different data sources, acquiring the number of association rules in the data source where the words are located, and recording the sum of the number of association rules of any two words as a first sum;
for any two words in different data sources, obtaining the minimum value and the average value of the confidence of the association rules of each word in its data source, calculating the product of the minimum value and the average value for each word, and recording the sum of the two products as a second sum;
And taking the ratio of the first sum value to the second sum value as the transaction association rule matching cost between any two words in different data sources.
Preferably, the obtaining the matching degree of the text feature space information between any two text data according to the space information difference between any two text data includes:
taking each text data in each data source as a node, taking the homologous data association coefficient between any two text data in each data source as the edge weight between the corresponding nodes, and constructing a label graph of each data source from the nodes and the edge weights between the nodes;
for a node pair formed by any two nodes directly connected with corresponding nodes in the label graph of any two text data, calculating the average value of the correlation coefficients of the homologous data of the node pair;
And taking the sum of the sum and the first Euclidean distance as the matching degree of text feature space information between any two pieces of text data.
Preferably, the obtaining the task multidimensional matching cost weight between any two text data according to the association rule difference between words in any two text data includes:
acquiring the sum of the reciprocal of the transaction association rule matching cost between all any two words in any two pieces of text data;
Acquiring a set of matching cost levels corresponding to transaction association rule matching costs of all any two words in any two pieces of text data, wherein the same transaction association rule matching cost is used as the same matching cost level;
and calculating the Jaccard coefficient between the sets of the two pieces of text data, and multiplying the reciprocal of the sum of the Jaccard coefficient and a preset parameter adjustment factor by the sum of reciprocals to obtain the task multidimensional matching cost weight between the two pieces of text data.
Preferably, the obtaining the multiple sections of valuable matching cost between any two pieces of text data based on the text feature space information matching degree and the task multidimensional matching cost weight includes:
And calculating the product of the matching degree of the text characteristic space information between any two pieces of text data and the multi-dimensional matching cost weight of the task for any two pieces of text data, and taking the reciprocal of the sum of the product and a preset parameter adjusting factor as the multi-section valuable matching cost between any two pieces of text data.
Preferably, the screening the matching data of each piece of text data in each data source according to the matching degree includes:
And regarding each piece of text data in each data source, taking the text data corresponding to the maximum value of the matching degree between each piece of text data and all text data in all the rest data sources as the matching data of each piece of text data.
The invention has at least the following beneficial effects:
According to the invention, through analyzing the similarity condition among text data in text data sets in a plurality of data sources in one-time energy transaction process of an enterprise, the similar data sets of the text data are screened, so that word analysis is conveniently carried out on text data with similar characteristics, the number of samples for word relation analysis is increased, and the quality of relation analysis is improved; meanwhile, the invention constructs the whole text meaning matching degree between any two pieces of text data based on semantic information features among different text data in homologous data, mines the whole difference information of the text data, words and the structural multivalent matrix, analyzes the data matching condition from the whole text data, then constructs the data dependency structure similarity between any two text data according to the single dependency structure down-conversion sequence difference of all word combinations in different text data, analyzes the similarity between the data structures from the structural feature, combines the semantic correlation and the lexical structural similarity to construct the homologous data association coefficient, and simultaneously considers the semantic information and the syntactic structural feature of the text data to accurately evaluate the association strength between the text data in the same data source;
in addition, the invention mines association rules from the text data of the different data sources and uses them to determine the transaction association rule matching cost when text data of different data sources are matched. Some words belong to association rules with high confidence but occur too frequently, which distorts their importance during matching; the matching cost of words with different frequencies is therefore constrained by both the number and the confidence of their association rules. Analyzing in depth how the rule relationships between words weight the matching of text data reflects the underlying word rule architecture of the text data at a deeper level and increases the reliability of the matching analysis. A label graph is constructed from the homologous data association coefficients of the text data, and the spatial information of each node in the label graph is mined; mapping the association information to a low-dimensional continuous space helps identify the similarity between nodes and thus helps judge the data matching condition. Finally, based on the matching cost of individual words and the overall matching of the text data, a multi-section valuable matching index between text data in different data sources is determined and used as the measurement mode of the downstream network in the twin network, which improves the accuracy of text data matching between different data sources.
Detailed Description
To further explain the technical means adopted by the invention to achieve its intended purpose and the effects thereof, the specific implementation, structure, features and effects of the neural network-based task matching data security matching method according to the invention are described in detail below with reference to the accompanying drawings and the preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the task matching data security matching method based on the neural network provided by the invention with reference to the accompanying drawings.
An embodiment of the invention provides the neural network-based task matching data security matching method.
Specifically, the following method for matching task matching data based on a neural network is provided, please refer to fig. 1, and the method comprises the following steps:
Step S001: collect text data of different data sources when the enterprise A carries out an energy transaction, and preprocess the collected data.
In the embodiment, task matching data when the enterprise A carries out energy transaction is used as basic data of security matching. When an enterprise A carries out energy transaction, the energy consumption of the enterprise A, the purchase amount of each energy source, the actual condition of checking the energy use of the enterprise A and the like are required to be counted, namely, a plurality of tasks are involved in the flow of the energy transaction, and sample spaces of different tasks are different, for example, the energy use data of an enterprise database and the energy types purchased by the enterprise A and the purchase amount data of each energy source recorded by an energy supply center are different.
In this embodiment, the number of data sources required for processing multiple tasks is recorded as M, and the value of M is set by the implementer according to the actual situation, and in this embodiment, the value is 3. The 3 data sources are a third-party auditing institution, an enterprise database and an energy supply center respectively. When the enterprise a conducts an energy transaction, the supply of the seller and the demand of the enterprise a include a plurality of specific branches, such as the discharge condition, purchase amount, purchase time, selling amount of the seller, etc., of the enterprise a, and the above data are recorded in text form in different data sources.
For any original text data that contains time data, the time data is first converted into the form of a time stamp. Next, in this embodiment, all original text data in each data source is used as input, each piece of original text data is converted into string form by using the dataframe.str function of a Python tool library, the converted result is used as text data, and the Chinese characters in each piece of text data are marked. For any one data source, all text data in the data source form a text data set.
So far, the text data used for task matching during energy transaction in the embodiment, namely the text data set of each data source, is obtained and used for carrying out data matching among different data sources in the follow-up process.
Step S002: construct a homologous data association coefficient based on the semantic relevance and lexical structure similarity between text data in the same data source, and determine a multi-section valuable matching index between text data in different data sources based on the label graph of each data source and the matching cost.
The embodiment aims at matching text data of different data sources through a neural network, and data security matching is completed based on a matching result. For datasets of different data sources, the present embodiment contemplates matching using the data structures and entity attribute tags of text data in the different data sources.
Specifically, when the label graph is constructed, each text data is represented by a node, an edge represents the association relationship between two text data, and the same text data at different moments can be treated as the same node.
Further, a label graph is constructed for the text data set of each data source in order to reflect the relevance of entities and attributes in different data sources. This is because different data sources record different content for the same energy use. For example, when the A enterprise purchases electricity once, the name of the A enterprise, the time, the purchase amount, etc. are recorded in the energy supply center, while data such as the purchasing staff, the unit price and the signing staff are recorded in the database of the A enterprise; the entities corresponding to these different data in one electricity purchase should all have a certain association with the A enterprise.
For each piece of text data, the Jaccard coefficient of the character string sets between that text data and every other text data in the text data set of the same data source is calculated. The Jaccard coefficients between the text data and all other text data are taken as input, a segmentation threshold is obtained with the Otsu threshold algorithm, and the set of all text data whose Jaccard coefficient is larger than the segmentation threshold is taken as the similar data set of the text data. The Jaccard coefficient and the Otsu threshold algorithm are known techniques and are not described in detail in this embodiment.
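The screening of the similar data set can be illustrated with a minimal sketch. It assumes that the "character string set" of a text is simply its set of characters and that the Otsu threshold is applied directly to the list of coefficient values rather than to a histogram; the function names `jaccard`, `otsu_threshold` and `similar_set` are hypothetical and are not taken from the embodiment.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient of the character sets of two strings (assumed set definition)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def otsu_threshold(values):
    """Return the cut that maximizes the between-class variance (Otsu's criterion)."""
    vals = sorted(values)
    n = len(vals)
    best_t, best_var = vals[0], -1.0
    for i in range(1, n):
        low, high = vals[:i], vals[i:]
        w0, w1 = len(low) / n, len(high) / n
        mu0, mu1 = sum(low) / len(low), sum(high) / len(high)
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var = between
            best_t = (low[-1] + high[0]) / 2  # midpoint between the two classes
    return best_t


def similar_set(index, texts):
    """Indices of the texts whose Jaccard coefficient with texts[index] exceeds the threshold."""
    scores = {j: jaccard(texts[index], t) for j, t in enumerate(texts) if j != index}
    if not scores:
        return []
    threshold = otsu_threshold(scores.values())
    return [j for j, s in scores.items() if s > threshold]
```

With only one other text in the data set, the threshold equals that single coefficient and the similar data set is empty, because the comparison is strict.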
Each text data is taken as input, and the dependency relationship between any two words in each text data is obtained by dependency syntax parsing, which is a known technique whose specific process is not repeated. Next, the dependency relationships between any two words in every text data of the similar data set of each text data are obtained, and the types and frequencies of the dependency relationships are counted to form the structural multivalent matrix of each text data. The structural multivalent matrix of the a-th text data is shown in fig. 2, where m is the number of words in the a-th text data, H is the number of dependency relationship types in all text data, $q_1$ to $q_H$ are the 1st to H-th dependency relationships, $w_1$ to $w_m$ are the 1st to m-th words, and $i_{11}$ is the frequency with which the first word $w_1$ of the a-th text data has the first dependency relationship $q_1$.
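As a rough illustration of the structural multivalent matrix, the sketch below assumes that a dependency parser has already produced (word, relation) pairs for the text and its similar data set; the parser itself, the relation labels and the function name `structural_matrix` are assumptions made for the example.

```python
from collections import Counter


def structural_matrix(words, arcs, relation_types):
    """Build an m x H matrix: row p is a word, column r is a dependency type.

    words          -- the m words of one text, in order
    arcs           -- (word, relation) pairs pooled from the text and its similar data set
    relation_types -- the H dependency types, in a fixed column order
    """
    counts = Counter(arcs)  # frequency of each (word, relation) combination
    return [[counts[(w, r)] for r in relation_types] for w in words]


words = ["enterprise", "buys", "power"]
arcs = [("enterprise", "nsubj"), ("power", "obj"),
        ("enterprise", "nsubj"), ("buys", "root")]
matrix = structural_matrix(words, arcs, ["nsubj", "obj", "root"])
```

Here the row for "enterprise" is `[2, 0, 0]`, since the word appears twice as a subject in the pooled arcs and never in the other two relations.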
For any one data source, all sentences in the text data set of the data source are used as input of an ELMo model (Embeddings from Language Models), and the ELMo model is used to obtain the word vector of each word; the ELMo model is a known technique and its specific process is not repeated. Next, for the x-th word in the structural multivalent matrix of each text data, the word vector of the x-th word is taken as the first element and the dependency relationship frequencies of the x-th word, sorted in descending order, are taken as the second to last elements; the resulting sequence is the single dependency structure down-conversion sequence of the x-th word.
Based on the analysis, a homologous data association coefficient is constructed here for characterizing the degree of association of structural features and semantic information between two text data. Calculating a homologous data association coefficient between the a text data and the b text data:
$$S_{ab} = ED(C_a, C_b) + \exp(|N_a - N_b|) \times DTW(Y_a, Y_b)$$
$$l_{ab} = m_1 \times \frac{1}{N_a N_b} \sum_{x=1}^{N_a} \sum_{y=1}^{N_b} P(d_x, d_y)$$
$$L_{ab} = \frac{\exp(l_{ab})}{\exp(S_{ab})}$$
where $S_{ab}$ is the whole text meaning matching degree between the a-th and b-th text data; $C_a$ and $C_b$ are the character strings of the a-th and b-th text data; $ED(C_a, C_b)$ is the ED edit distance between the character strings $C_a$ and $C_b$; $\exp()$ is the exponential function with the natural constant e as its base; $N_a$ and $N_b$ are the numbers of words in the a-th and b-th text data; $Y_a$ and $Y_b$ are the structural multivalent matrices of the a-th and b-th text data; and $DTW(Y_a, Y_b)$ is the DTW distance between the matrices $Y_a$ and $Y_b$. The ED edit distance and the DTW distance are known techniques and their specific processes are not repeated;
$l_{ab}$ is the data dependency structure similarity between the a-th and b-th text data; $m_1$ is the number of words that exist in both the a-th and b-th text data; x and y index the x-th word of the a-th text data and the y-th word of the b-th text data; $d_x$ is the single dependency structure down-conversion sequence of the x-th word of the a-th text data; $d_y$ is the single dependency structure down-conversion sequence of the y-th word of the b-th text data; and $P(d_x, d_y)$ is the Pearson correlation coefficient between the sequences $d_x$ and $d_y$;
$L_{ab}$ is the homologous data association coefficient between the a-th and b-th text data.
The more likely the a-th and b-th text data were generated by the same energy transaction of the A enterprise, the stronger the association between the two pieces of text data. In that case the character strings are more similar and the value of $ED(C_a, C_b)$ is smaller; the two text data are more likely to lie in each other's similar data sets, the elements of their structural multivalent matrices $Y_a$ and $Y_b$ are more similar, and the values of $\exp(|N_a - N_b|)$ and $DTW(Y_a, Y_b)$ are smaller, so $S_{ab}$ is smaller and the whole-text meanings of the two text data match more closely. Likewise, the more words exist in both text data, the larger $m_1$; the more similar the dependency structures of the words, the larger $P(d_x, d_y)$ and hence the larger $l_{ab}$. A smaller $S_{ab}$ and a larger $l_{ab}$ both give a larger $L_{ab}$, that is, a stronger association between the a-th and b-th text data.
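The three quantities described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the embodiment's implementation: the DTW distance treats each matrix row as one time step and compares rows by Euclidean distance, each down-conversion sequence is assumed to be a flat list of numbers (word vector components followed by the sorted frequencies) of equal length, and the mean is taken over all cross pairs of words. All function names are hypothetical.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def dtw(A, B) -> float:
    """DTW distance between two matrices, aligning rows (assumed row-wise Euclidean cost)."""
    inf = float("inf")
    D = [[inf] * (len(B) + 1) for _ in range(len(A) + 1)]
    D[0][0] = 0.0
    for i, ra in enumerate(A, 1):
        for j, rb in enumerate(B, 1):
            D[i][j] = math.dist(ra, rb) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(A)][len(B)]


def pearson(x, y) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy) if sx and sy else 0.0


def whole_text_matching(ca, cb, na, nb, ya, yb) -> float:
    """S_ab = ED(C_a, C_b) + exp(|N_a - N_b|) * DTW(Y_a, Y_b)."""
    return edit_distance(ca, cb) + math.exp(abs(na - nb)) * dtw(ya, yb)


def dependency_similarity(seqs_a, seqs_b, m1) -> float:
    """l_ab = m_1 times the mean Pearson coefficient over all cross word pairs."""
    pairs = [pearson(dx, dy) for dx in seqs_a for dy in seqs_b]
    return m1 * sum(pairs) / len(pairs)


def association_coefficient(s_ab, l_ab) -> float:
    """L_ab = exp(l_ab) / exp(S_ab), written as a single exponential for stability."""
    return math.exp(l_ab - s_ab)
```

Computing the ratio as `exp(l_ab - s_ab)` avoids overflow when either exponent is large; the value is mathematically the same as the ratio of the two exponential functions.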
Further, when any two pieces of text data in two data sources are matched, the matching results of different words contribute differently to the matching degree between the text data. For example, the text data generated by the A enterprise during one energy transaction record time in different forms: the A enterprise data center records the date as October 9, 2021, whereas the energy supply center records the same date in a different written form. Similarly, the purchasing personnel recorded in the A enterprise data center are staff $I_1$, $I_2$ and $I_3$, whereas the energy supply center records only the signing person $I_1$. The words corresponding to the two attributes of time and personnel therefore have different influences when text data of different data sources are matched, and the matching degree of the two text data can be determined directly from the matching results of the time and of part of the personnel, because energy transactions are normally infrequent and do not occur several times in one day. That is, when text data of different data sources are matched, the associated combinations of different words in the text data make different matching contributions.
Therefore, in this embodiment, the text data set of each data source is used as a basic database of the transaction library, and the Apriori data mining algorithm is used to obtain the association rules of all the words in the text data set of each data source and the confidence level corresponding to each association rule, where the Apriori data mining algorithm is a known technology, and the specific process is not repeated.
For the association rules of the text data sets of any one data source, the relevance and the association degree between different words are different, the words in the text data form association rules with different confidence degrees, when the text data of different data sources are matched, different matching costs are considered to be set according to the confidence degree weights of the association rules, the matching cost between the words corresponding to the association rules with larger confidence degree weights is smaller, and the reason for the setting is that the more reliable the relevance between the words in the association rules with larger confidence degrees and the text data is, the more accurate the matching result is. The calculation formula of the transaction association rule matching cost between the xth word and the p-th word in different data sources is as follows:
$$D_{xp} = \frac{n_x + n_p}{\mu_{x,\min}\,\bar{\mu}_x + \mu_{p,\min}\,\bar{\mu}_p}$$
where $D_{xp}$ is the transaction association rule matching cost between the x-th word and the p-th word; $n_x$ and $n_p$ are the numbers of association rules containing the x-th and p-th words in their respective data sources; $\mu_{x,\min}$ and $\mu_{p,\min}$ are the minimum confidences of all association rules containing the x-th and p-th words in their respective data sources; and $\bar{\mu}_x$ and $\bar{\mu}_p$ are the mean confidences of the association rules containing the x-th and p-th words, respectively.
The larger the confidences of the association rules containing the x-th and p-th words in their respective data sources, the larger the values of $\mu_{x,\min}$, $\mu_{p,\min}$, $\bar{\mu}_x$ and $\bar{\mu}_p$, and the smaller the value of $D_{xp}$. The sum of $n_x$ and $n_p$ is taken as the numerator because some words in the text data belong to high-confidence association rules but occur too frequently, which distorts their importance in matching; the matching cost of words with different frequencies is thus constrained by both the number and the confidence of the association rules.
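The cost can be sketched as the ratio of the first sum (rule counts) to the second sum (products of minimum and mean confidence), as stated in the disclosure. The sketch assumes that the Apriori step has already produced, for each word, the list of confidences of the rules containing it; the function name `rule_matching_cost` and the input layout are hypothetical.

```python
from statistics import mean


def rule_matching_cost(conf_x, conf_p) -> float:
    """Transaction association rule matching cost between two words.

    conf_x, conf_p -- non-empty lists with the confidences of the association
                      rules that contain each word in its own data source
    """
    first_sum = len(conf_x) + len(conf_p)            # n_x + n_p
    second_sum = (min(conf_x) * mean(conf_x)         # min * mean for word x
                  + min(conf_p) * mean(conf_p))      # min * mean for word p
    return first_sum / second_sum


cost = rule_matching_cost([0.8, 0.6], [0.5])
```

A word that appears in many rules raises the numerator, so a frequent word is penalized even when its rules are confident, which is the constraint the paragraph above describes.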
According to the steps, the homologous data association coefficient between any two pieces of text data in each data source is obtained respectively, and the homologous data association coefficient between any two pieces of text data is used as the weight of the edge between the corresponding two nodes to obtain the label graph of each data source.
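A label graph of this kind can be held in a nested dictionary. The sketch below assumes that every pair of text data in a data source is connected and that a callable returning the homologous data association coefficient is available; `build_label_graph` and the toy `coefficient` function are hypothetical stand-ins.

```python
from itertools import combinations


def build_label_graph(texts, coefficient):
    """Return graph[a][b] = edge weight between text a and text b (undirected)."""
    graph = {i: {} for i in range(len(texts))}
    for a, b in combinations(range(len(texts)), 2):
        weight = coefficient(texts[a], texts[b])
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


# Toy coefficient for illustration only: texts sharing a first character are close.
def coefficient(s, t):
    return 1.0 if s[0] == t[0] else 0.25


graph = build_label_graph(["a1", "a2", "b1"], coefficient)
```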
Because the data recording forms and record carriers differ, the text data sets of different data sources may contain special characters and abbreviations, and the concatenation of many such feature words may also cause the text to lose its semantics. As a result, when two pieces of text data contain many abbreviations and special character combinations, the text similarity between the full name and the abbreviation of the same attribute value is low, so text similarity alone can hardly measure the true similarity, and other information must be introduced to assist matching.
Specifically, a label graph of each data source is taken as input, the Node2vec algorithm is adopted to acquire the spatial information of each Node in the input label graph, the Node2vec algorithm learns the representation of the nodes by designing a flexible exploration mode for the neighbors of the nodes of the graph, finally, the nodes in the graph are mapped to a low-dimensional continuous space, and the spatial information of the graph can be recorded and stored, and the Node2vec algorithm is a known technology, and the specific process is not repeated. For text data with higher matching degree in different data sources, the spatial information of the corresponding nodes must have larger similarity.
Further, the transaction association rule matching costs between any two words in any two data sources are obtained, and each distinct value of the transaction association rule matching cost is treated as one matching cost level. When two text data of two data sources are matched, the greater the number of matching cost levels, the more unstable the matching cost of different words in the two text data, and the greater the data matching cost.
Based on the analysis, a plurality of sections of valuable matching indexes are constructed and used for representing the cost when matching between two text data in different data sources. Calculating a plurality of sections of valuable matching indexes between the a-th text data and the k-th text data in two data sources:
$R_{ak}$ is the text feature space information matching degree between the a-th and k-th text data; $o_a$ and $o_k$ are the spatial information of the nodes corresponding to the a-th and k-th text data in their label graphs; $M_1$ and $M_2$ are the numbers of nodes directly connected to the nodes corresponding to the a-th and k-th text data in the label graphs; j and h index the j-th and h-th nodes directly connected to the nodes corresponding to the a-th and k-th text data, respectively; $L_{aj}$ is the homologous data association coefficient between the node corresponding to the a-th text data and the j-th node; $L_{kh}$ is the homologous data association coefficient between the node corresponding to the k-th text data and the h-th node; $o_j$ and $o_h$ are the spatial information of the j-th and h-th nodes; and $dist(o_a, o_k)$ and $dist(o_j, o_h)$ are the Euclidean distances between $o_a$ and $o_k$ and between $o_j$ and $o_h$, respectively;
$U_{ak}$ is the task multidimensional matching cost weight between the a-th and k-th text data; $G_a$ and $G_k$ are the sets of matching cost levels corresponding to the transaction association rule matching costs of all words in the a-th and k-th text data, respectively; $Jac(G_a, G_k)$ is the Jaccard coefficient between the sets $G_a$ and $G_k$; $m_2$ and $m_3$ are the numbers of words in the a-th and k-th text data, respectively; $a_x$ and $k_p$ are the x-th word of the a-th text data and the p-th word of the k-th text data, respectively; and $D(a_x, k_p)$ is the transaction association rule matching cost between the words $a_x$ and $k_p$;
$V_{ak}$ is the multi-section valuable matching index between the a-th and k-th text data in the two data sources; the preset parameter adjustment factor prevents the denominator from being 0 and is set to 0.001.
When energy transaction data are matched between two data sources, the greater the matching degree between the a-th and k-th text data, the more likely they were generated by the same energy transaction, the more similar their spatial information $o_a$ and $o_k$ in the low-dimensional continuous space and the more similar the connection structures of their corresponding nodes in the label graphs; the values of $dist(o_a, o_k)$ and $dist(o_j, o_h)$ are then smaller and the value of $R_{ak}$ is smaller. The more unstable the matching cost of different words in the a-th and k-th text data, the lower the similarity between the distributions of their matching cost levels and the smaller the value of $Jac(G_a, G_k)$. The greater the confidence of the association rules involving the words of the a-th and k-th text data, the more reliable the association between the words and the text data, the more accurate the matching result, and the smaller the value of $D(a_x, k_p)$.
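The weight and the final index can be sketched from their verbal definitions in the disclosure: the weight multiplies the reciprocal of (Jaccard coefficient plus adjustment factor) by the sum of reciprocal word-pair costs, and the index is the reciprocal of (matching degree times weight, plus adjustment factor). The sketch assumes the text feature space information matching degree is supplied as a number and that the two sets of matching cost levels are already built; all names are hypothetical.

```python
EPS = 0.001  # preset parameter adjustment factor used in this embodiment


def jaccard_sets(ga: set, gk: set) -> float:
    """Jaccard coefficient between two sets of matching cost levels."""
    union = ga | gk
    return len(ga & gk) / len(union) if union else 1.0


def task_cost_weight(costs, ga, gk, eps=EPS) -> float:
    """Task multidimensional matching cost weight.

    costs -- costs[x][p] is the rule matching cost between word x of one text
             and word p of the other text (all values must be positive)
    """
    inverse_sum = sum(1.0 / d for row in costs for d in row)
    return inverse_sum / (jaccard_sets(ga, gk) + eps)


def matching_index(r_ak: float, u_ak: float, eps=EPS) -> float:
    """Multi-section valuable matching index: 1 / (R_ak * U_ak + eps)."""
    return 1.0 / (r_ak * u_ak + eps)


u = task_cost_weight([[2.0, 4.0]], {2.0, 4.0}, {2.0})
v = matching_index(1.5, u)
```

With these definitions a smaller matching degree value, which corresponds to closer node embeddings, yields a larger index for a fixed weight.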
At this point, the multi-section valuable matching index between any two pieces of text data in different data sources has been obtained; it serves as the distance measure in the twin network.
Step S003: based on the multi-section valuable matching index between the text data, a twin network is used to obtain the data matching result between the different data sources in the energy transaction of enterprise A.
The multi-section valuable matching index between any two pieces of text data in different data sources is acquired according to the above steps, and data matching is completed based on this index.
Specifically, two pieces of text data, one from each of the two data sources, are used as the input of a twin network, and the twin network yields the matching degree between the two inputs. The output of the upstream network of the twin network is the high-dimensional features of the two pieces of text data in the same feature space, and the multi-section valuable matching index between the two pieces of text data is used as the measurement distance in the downstream network. A twin network (also known as a Siamese network) is a coupled architecture built from two sub-networks that share weights; its structure is shown in fig. 3. The twin network is a known technique, so its specific process is not described further.
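The weight-sharing idea described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the network of the embodiment: the single tanh layer, the random weights, the helper names `make_encoder` and `twin_distance`, and the input vectors are all hypothetical, and a plain Euclidean distance stands in for the multi-section valuable matching index, whose formula is not reproduced here.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Build one set of weights; both branches of the twin network reuse it."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(vector):
        # Single tanh layer standing in for the upstream network (assumption).
        return [math.tanh(sum(w * v for w, v in zip(row, vector)))
                for row in weights]

    return encode


def euclidean(u, v):
    """Placeholder metric; the source uses its own matching index here."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def twin_distance(encode, x_a, x_k, metric=euclidean):
    # Both inputs pass through the SAME encoder (shared weights),
    # then the downstream stage compares the two embeddings.
    return metric(encode(x_a), encode(x_k))


encode = make_encoder(in_dim=4, out_dim=3)
x_a = [0.2, 0.1, 0.0, 0.7]  # hypothetical feature vector of one record
x_k = [0.2, 0.1, 0.0, 0.6]  # hypothetical feature vector of another record
print(twin_distance(encode, x_a, x_a))  # 0.0 for identical inputs
print(twin_distance(encode, x_a, x_k))
```

Because the metric is passed in as a parameter, a function implementing the multi-section valuable matching index could replace `euclidean` without changing the shared encoder.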
For any piece of text data in each data source, the text data in the remaining data sources that has the maximum matching degree with it is taken as its matching data. The flow chart of matching data acquisition is shown in fig. 4.
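The screening rule above amounts to taking, for each record, the candidate with the largest matching degree. A minimal sketch follows; the function name `select_matches`, the dictionary layout, the record identifiers, and the matching degrees are hypothetical values chosen for illustration.

```python
def select_matches(match_degree):
    """For each record, return the candidate with the largest matching degree.

    match_degree maps a record id in one data source to a dict of
    {candidate id in another data source: matching degree}.
    """
    matches = {}
    for record_id, candidates in match_degree.items():
        if candidates:  # skip records that have no candidates at all
            matches[record_id] = max(candidates, key=candidates.get)
    return matches


# Hypothetical matching degrees between records of two data sources.
degrees = {
    "a1": {"k1": 0.91, "k2": 0.35},
    "a2": {"k1": 0.12, "k2": 0.78},
}
print(select_matches(degrees))  # {'a1': 'k1', 'a2': 'k2'}
```

In this sketch a tie is resolved in favor of the candidate listed first; the source does not state how ties are handled.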
This embodiment is completed.
In summary, the embodiment of the invention analyzes the similarity among text data in the text data sets of multiple data sources in a single energy transaction process of an enterprise and screens a similar data set for each text data. This makes it convenient to perform word analysis on text data with similar characteristics, increases the number of samples available for word relation analysis, and improves the quality of that analysis. Meanwhile, the embodiment constructs the whole text meaning matching degree between any two text data based on the semantic information features among different text data in homologous data, mines the overall difference information of the character strings, words, and structural multivalent matrices of the text data, and analyzes the data matching condition from the overall perspective of the text data. It then constructs the data dependency structure similarity between any two text data according to the single structure down-conversion sequence difference of all word combinations in different text data, analyzing the similarity between data structures from the perspective of structural features. Finally, it combines the semantic correlation and the lexical structure similarity to construct the homologous data association coefficient, which considers both the semantic information and the structural features of the text data and can therefore accurately evaluate the association strength between text data in the same data source.
Meanwhile, the embodiment of the invention mines association rules from the text data of different data sources and uses them to determine the transaction association rule matching cost when text data of different data sources are matched. This cost takes into account that some words in the text data, although their association rules have high confidence, occur too frequently, which affects their importance during matching. The matching cost of words with different frequencies is therefore constrained by both the number and the confidence of the association rules. By analyzing in depth how the rule relations among words affect the weight of text data matching, the embodiment reflects the underlying word rule architecture of the text data at a deeper level and improves the reliability of text data matching analysis. In addition, the spatial information of each node in the tag graph is mined from the homologous data association coefficients of the text data; mapping the association information into a low-dimensional continuous space helps to identify the similarity between nodes and thus assists in judging the data matching condition. Finally, based on the matching costs of individual words and of the whole text, the multi-section valuable matching index between text data in different data sources is determined and used as the measurement mode of the downstream network in the twin network, which improves the accuracy of data matching.
It should be noted that the sequence of the embodiments of the present invention is for description only and does not represent the relative merits of the embodiments. The foregoing describes specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments.
The embodiments described above are only intended to illustrate the technical solutions of the present application, not to limit them. Modifications to the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, that do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application shall all fall within the protection scope of the present application.