Disclosure of Invention
In order to improve the accuracy of HAZOP analysis report risk classification, the application provides a risk classification method and system based on deep learning.
In a first aspect, the present application provides a risk classification method based on deep learning, which adopts the following technical scheme:
a risk classification method based on deep learning, comprising:
collecting HAZOP analysis reports to be classified, and recording the HAZOP analysis reports as first reports;
First modeling, establishing a T-BTM model, wherein the T-BTM model comprises a BERT sub-model, a TextCNN sub-model, a BILSTM sub-model and a full connection layer;
The first input, namely taking a first report as the input of the BERT sub-model, generating a word vector with a label by utilizing an embedding layer of the BERT sub-model, and recording the word vector as a first word vector;
first obtaining, namely respectively inputting the first word vectors into the n self-attention layers of the BERT sub-model, obtaining n output features, and setting a weight for each feature;
First calculation, namely calculating first input data based on the n features, wherein the calculation model of the first input data is as follows:
A = \sum_{i=1}^{n} w_i f_i ;
wherein A is the first input data; f_i is the i-th feature; w_i is the weight of the i-th feature;
Second input, namely inputting the first input data into the TextCNN sub-model to obtain first output data;
Third input, namely inputting the first input data into the BILSTM sub-model to obtain second output data;
Second calculation, namely performing linear combination on the first output data and the second output data to obtain third output data;
Third calculation, namely inputting the third output data into the full connection layer of the T-BTM model to obtain fourth output data, and normalizing the fourth output data by a Softmax function to obtain the risk type of the first report.
By adopting the technical scheme, a T-BTM model is built comprising a BERT sub-model, a TextCNN sub-model, a BILSTM sub-model and a full connection layer. The embedding layer of the BERT sub-model generates high-quality first word vectors; based on the first word vectors, the n self-attention layers of the BERT sub-model extract n features; first input data are calculated from the n features and input both into the TextCNN sub-model to capture local features and into the BILSTM sub-model to capture long-distance dependencies. The outputs of the two sub-models are linearly combined so that local features and long-distance features are fused, and the combined features are input into the full connection layer, which learns a higher-level feature representation and outputs third output data; this output is normalized by a Softmax function, yielding the risk type of the HAZOP analysis report. Because important word features are denser in a HAZOP analysis report than in general text, the T-BTM model combines feature information extracted from different self-attention layers of the BERT sub-model and learns local and long-distance features through the TextCNN and BILSTM sub-models, so that it can learn as much sentence-level feature information as possible from the text structure of the HAZOP analysis report.
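The data flow described above can be sketched in simplified form. The following is a minimal Python illustration, not the claimed implementation: the feature vectors, branch outputs, and combination weights are toy stand-ins for the real BERT, TextCNN, and BILSTM outputs.

```python
import math

def weighted_feature_sum(features, weights):
    # First calculation: A = sum_i w_i * f_i over the n self-attention features
    n = len(features)
    dim = len(features[0])
    return [sum(weights[i] * features[i][d] for i in range(n)) for d in range(dim)]

def softmax(logits):
    # Normalize the fourth output data into a probability distribution over risk types
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-ins for the n features extracted by the BERT self-attention layers
features = [[0.2, 0.5], [0.1, 0.3], [0.4, 0.1]]
weights = [0.5, 0.3, 0.2]

A = weighted_feature_sum(features, weights)    # first input data
B = [a * 1.1 for a in A]                       # stand-in for the TextCNN branch output
C = [a * 0.9 for a in A]                       # stand-in for the BILSTM branch output
D = [0.6 * b + 0.4 * c for b, c in zip(B, C)]  # linear combination (third output data)
probs = softmax(D)                             # risk-type probabilities
```

The final Softmax output sums to 1, and the index of its largest entry would be taken as the predicted risk type.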
Optionally, after performing the first modeling, before performing the first input, further comprising:
training the T-BTM model, wherein the training comprises secondary acquisition, training and verification;
Second collection, namely collecting historical HAZOP analysis reports, recording them as second reports, and dividing the second reports into a training set and a verification set;
training the T-BTM model by adopting a training set, and recording the prediction probability of the risk type of each sample in the training process;
And (3) verifying the T-BTM model by adopting a verification set.
By adopting the technical scheme, historical HAZOP analysis reports are adopted as the training set and the verification set. The training set is used to train the T-BTM model; during training, the T-BTM model optimizes its prediction capability by continuously adjusting parameters, so that the prediction probability gradually approaches the real risk type. This process not only improves the classification accuracy of the T-BTM model but also enhances its robustness. The verification set is used to verify the T-BTM model: performance indexes on the verification set (such as accuracy, recall and F1 score) indicate whether the T-BTM model is over-fitted or under-fitted, so that the training strategy can be adjusted. Through the close combination of second collection, training and verification, the T-BTM model not only learns enough knowledge during training but also retains a certain generalization capability.
Optionally, after performing training and before performing verification, the method further comprises:
calculating loss, which comprises defining a loss function, fourth calculation, optimization, first judgment and second judgment;
Defining a loss function, wherein the calculation model of the loss function is as follows:
L = -\log(p_c) - \sum_{i=1}^{M} \log(1 - p_i) ;
wherein p_c is the prediction probability corresponding to the correct risk type; p_i is the prediction probability corresponding to the i-th incorrect risk type; M is the number of incorrect risk types;
Fourth calculation, namely calculating the loss of the T-BTM model by using the loss function based on the risk type and the prediction probability;
Optimizing, namely updating parameters of a T-BTM model by using a gradient descent method to obtain updating times and minimum loss;
the first judgment, namely judging whether the minimum loss is smaller than a preset minimum loss threshold value, if so, executing verification, and if not, executing the second judgment;
Second judgment, namely judging whether the updating times are larger than a preset iteration times threshold; if so, executing verification; if not, executing optimization.
By adopting the technical scheme, the application provides an improved cross entropy loss function that calculates a generalized cross entropy loss while taking the incorrect labels and their probabilities into account. After T-BTM model training, as the probability of the correct label approaches 1, the term corresponding to the incorrect labels approaches 0, and the loss function of the application strengthens the information transmitted by back propagation during T-BTM model training.
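The original formula for this loss is not legible in the text above; the sketch below assumes one form consistent with the description, namely the negative log of the correct-class probability plus penalty terms over the incorrect classes that vanish as the correct probability approaches 1.

```python
import math

def generalized_ce_loss(p_correct, p_incorrect):
    # Assumed form: -log(p_c) - sum_i log(1 - p_i).
    # As p_c approaches 1, each incorrect probability p_i approaches 0,
    # so the incorrect-label terms approach 0, matching the description.
    loss = -math.log(p_correct)
    for p in p_incorrect:
        loss -= math.log(1.0 - p)
    return loss

confident = generalized_ce_loss(0.9, [0.05, 0.05])  # near-correct prediction
uncertain = generalized_ce_loss(0.5, [0.25, 0.25])  # uncertain prediction
```

A confident, correct prediction yields a smaller loss than an uncertain one, so gradient descent pushes the model toward assigning high probability to the correct risk type.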
Optionally, after performing the second acquisition, before performing the training, further comprising:
collecting keywords, namely collecting keywords of the second report through web crawler technology;
establishing a word stock, which comprises collecting corpus, word segmentation and first combination;
Collecting corpus, namely acquiring an accident report and acquiring accident information;
word segmentation, namely converting accident information into text data, and performing word segmentation operation on the text data to obtain first data;
And the first combination is to combine the keywords with the first data to obtain second data, and the second data is used as a dictionary.
By adopting the technical scheme, the application collects keywords of the second report through web crawler technology, increasing the number of professional words in the training set so that key information related to risk analysis, accident causes, safety measures and the like can be extracted. The accident reports obtained by the application contain abundant practical cases and lessons learned, which are significant for improving the model's capabilities in risk identification, accident prevention and the like. Converting accident information into text data and performing word segmentation are basic steps of text processing: word segmentation divides continuous text into independent word units, helping the T-BTM model better understand and process the text data. The keywords are combined with the first data (i.e., the text data after word segmentation), and the combined content serves as a dictionary containing all words the T-BTM model needs to recognize during training, which is significant for improving the classification accuracy and generalization capability of the T-BTM model.
Optionally, after the word stock is built, before the training is performed, the method further includes:
converting the first data and the keywords into Word vectors through a Word2Vec model;
screening, including first extraction, fifth calculation, sorting, second extraction and second combination;
First extraction, namely extracting one keyword from the dictionary and recording it as a first keyword;
Fifth calculation, namely calculating the similarity of the first keywords and all the first data in the dictionary, and recording the similarity as the first similarity;
Sorting, namely re-sorting the first similarities in descending order and recording the result as a first sequence;
Second extraction, namely extracting the first m pieces of first data in the first sequence;
Second combination, namely combining the first keyword with the corresponding m pieces of first data to obtain third data, and taking the third data as a sub-dictionary;
Third judgment, namely judging whether the dictionary still contains first keywords whose similarity with all the first data has not been calculated; if so, executing screening; if not, executing updating the training set;
updating the training set, namely taking all sub-dictionaries as a new training set.
By adopting the technical scheme, the Word2Vec model converts the first data and the keywords into word vectors, which capture semantic relations among words so that the T-BTM model can better understand the intrinsic meaning of the text data during training. One keyword is extracted from the dictionary as a starting point and recorded as a first keyword; the similarity between the first keyword and all first data in the dictionary is calculated, finding the first data closest in meaning to the first keyword. The similarities are sorted in descending order, and the first m pieces of first data, i.e., those most relevant to the first keyword, are extracted from the sorted sequence. The first keyword is combined with its corresponding m pieces of first data to obtain third data, which serves as a sub-dictionary that helps the T-BTM model focus on first data relevant to the first keyword during training. All sub-dictionaries are taken as a new training set; because this training set focuses on first data relevant to the keywords, the performance of the T-BTM model on specific tasks is improved.
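The screening steps can be sketched as follows. This is an illustrative Python sketch: the two-dimensional vectors and the example words are toy stand-ins for real Word2Vec embeddings, and cosine similarity is assumed as the similarity measure, which the text does not specify.

```python
import math

def cosine(u, v):
    # Assumed similarity measure between two word vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def build_sub_dictionary(keyword_vec, data_vecs, m):
    # Fifth calculation + sorting + second extraction: rank all first data by
    # similarity to the first keyword and keep the top m entries
    scored = [(cosine(keyword_vec, vec), word) for word, vec in data_vecs.items()]
    scored.sort(reverse=True)
    return [word for _, word in scored[:m]]

# Toy vectors standing in for Word2Vec embeddings of segmented first data
data_vecs = {
    "leak": [0.9, 0.1],
    "valve": [0.8, 0.3],
    "canteen": [0.0, 1.0],
}
top = build_sub_dictionary([1.0, 0.2], data_vecs, 2)  # keyword vector is hypothetical
```

The two risk-related words rank above the unrelated one, so the resulting sub-dictionary concentrates the training set on data relevant to the keyword.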
Optionally, after performing the third determination, before performing updating the training set, further comprising:
Deleting, namely deleting the first data which are not stored in the sub-dictionary in the dictionary.
By adopting the technical scheme, after the new sub-dictionary is combined, the first data which are not stored in the sub-dictionary are deleted from the original dictionary, so that redundancy is reduced.
Optionally, after performing the third calculation, further comprising:
establishing an SPN model, wherein the SPN model comprises a multi-head self-attention layer, a multi-head cross-attention layer and a full-connection layer;
Sixth calculation, namely taking the first word vectors as the input of the multi-head self-attention layer, calculating the attention weight between each first word vector and the remaining word vectors, and taking the attention weights as the input of the multi-head cross-attention layer to obtain fifth output data;
Seventh calculation, namely inputting the fifth output data into the full connection layer of the SPN model to output sixth output data, and normalizing the sixth output data by a Softmax function to obtain a prediction result of the entity-relation triples in the first report.
By adopting the technical scheme, the SPN model can more comprehensively capture semantic information in the text of the HAZOP analysis report by introducing self-attention and cross-attention mechanisms, and particularly the complex relationship between the entity and the relationship is beneficial to improving the accuracy of the SPN model in the entity-relationship extraction task. The multi-head mechanism enables the SPN model to learn the characteristics of data from multiple angles, and the generalization capability of the SPN model is enhanced. By visualizing the attention weights, it is possible to learn what parts of the information the SPN model focuses on when making predictions.
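The attention-weight computation at the core of these layers can be sketched as follows. This is a single-head, projection-free simplification in Python: real multi-head attention additionally splits the vectors into heads and applies learned query/key/value projections, which are omitted here.

```python
import math

def attention_weights(query, keys):
    # Scaled dot-product scores of one word vector against the others,
    # normalized with Softmax, as in a self-attention layer
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy word vectors; the first and third keys align with the query direction
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights([1.0, 0.0], keys)
```

The weights form a probability distribution over the other word vectors; visualizing them shows which parts of the input the model attends to, as noted above.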
Optionally, after performing the sixth calculation, before performing the seventh calculation, further comprising:
Third modeling, namely establishing an ACMix model, wherein the ACMix model comprises a full connection layer and a self-attention layer, and the calculation model of the ACMix model is as follows:
F_{out} = \alpha F_{fc} + \beta F_{att} ;
wherein F_{fc} is the output of the full connection layer of the ACMix model; F_{att} is the output of the self-attention layer of the ACMix model; \alpha is the weight of the full connection layer output; \beta is the weight of the self-attention layer output; F_{out} is the output of the ACMix model;
Eighth calculation, namely taking the features as the input of the ACMix model to obtain seventh output data;
Ninth calculation, namely combining the fifth output data and the seventh output data, and taking the combination as new fifth output data.
By adopting the technical scheme, the application establishes the ACMix model, and the ACMix model can learn more comprehensive and deep characteristic representation by combining the output of the full-connection layer and the self-attention layer, so that the accuracy of entity-relation triplet prediction is improved. The ACMix model reduces the dependence on single characteristics or single models by weighting and fusing the characteristic representations of different sources, thereby improving the robustness and generalization capability of the ACMix model.
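The weighted fusion performed by the ACMix model reduces to a simple element-wise combination of the two branch outputs. The Python sketch below is illustrative only; the vectors and the weights alpha and beta are toy values (in practice the weights would be learned).

```python
def acmix_fuse(fc_out, att_out, alpha, beta):
    # Weighted fusion of the full-connection-layer output and the
    # self-attention-layer output: F_out = alpha * F_fc + beta * F_att
    return [alpha * f + beta * a for f, a in zip(fc_out, att_out)]

fused = acmix_fuse([1.0, 2.0], [3.0, 1.0], 0.7, 0.3)  # toy branch outputs and weights
```

Because the fused representation draws on both branches, no single feature source dominates, which is the robustness property described above.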
Optionally, the prediction result is represented by an offset from the beginning to the end of the entity.
By adopting the technical scheme, representing the prediction result by offsets can remarkably improve the accuracy of entity extraction; because an offset corresponds directly to a specific position in the text, whether an extracted entity is correct can be judged by whether the offset in the prediction result is correct.
In a second aspect, the present application provides a risk classification system based on deep learning, which adopts the following technical scheme:
A deep learning-based risk classification system, comprising:
the data acquisition module is used for acquiring HAZOP analysis reports to be classified;
The building model module is used for building a T-BTM model, wherein the T-BTM model comprises a BERT sub-model, a TextCNN sub-model, a BILSTM sub-model and a full connection layer;
the data processing module comprises a first processing unit, a second processing unit and a third processing unit;
the first processing unit is used for inputting the HAZOP analysis report to be classified into the BERT sub-model, obtaining a first word vector after the processing of an embedding layer of the BERT sub-model, inputting the first word vector into an attention layer of the BERT sub-model and outputting a plurality of characteristics;
a second processing unit for inputting the first input data into the TextCNN sub-model to obtain first output data;
a third processing unit for inputting the first input data into the BILSTM sub-model to obtain second output data;
The computing module comprises a first computing unit, a second computing unit and a third computing unit;
a first calculation unit configured to calculate first input data based on a plurality of features;
a second calculation unit for performing linear combination on the first output data and the second output data to obtain third output data;
a third calculation unit for inputting the third output data into the full connection layer to obtain fourth output data, and normalizing the fourth output data by a Softmax function to obtain the risk type of the first report.
By adopting the technical scheme, the T-BTM model is constructed and comprises the BERT sub-model, the TextCNN sub-model, the BILSTM sub-model and the full-connection layer, and the BERT sub-model can capture the context information in the text and improve the quality of text representation through the strong pre-training capability and the depth bidirectional coding. The TextCNN sub-model is good at capturing local features of text, such as keywords, phrases, etc., and is very effective for recognition of short text or specific patterns. The BILSTM sub-model is capable of handling long-term dependency problems in sequence data, and is helpful for understanding timing relationships and semantic structures in text. The full connection layer is used as a core part of the classifier and is responsible for integrating the output of each submodel and carrying out final classification decision. The computing module is used for computing input and output data, combining the output data of each sub-module to obtain total output data, inputting the total output data to the full-connection layer, processing the output of the full-connection layer through a Softmax function, and outputting probability distribution of each risk type to obtain the most probable risk type. The system can automatically collect and process a large amount of text data, and quickly identify the risk type in the HAZOP analysis report.
In summary, the present application includes at least one of the following beneficial technical effects:
1. The T-BTM model comprises a BERT sub-model, a TextCNN sub-model, a BILSTM sub-model and a full connection layer. The embedding layer of the BERT sub-model generates high-quality first word vectors; based on the first word vectors, the n self-attention layers of the BERT sub-model extract n features; first input data are calculated from the n features and input both into the TextCNN sub-model to capture local features and into the BILSTM sub-model to capture long-distance dependencies. The outputs of the two sub-models are linearly combined so that local features and long-distance features are fused, and the combined features are input into the full connection layer, which learns a higher-level feature representation and outputs third output data; this output is normalized by a Softmax function, yielding the risk type of the HAZOP analysis report. Because important word features are denser in a HAZOP analysis report than in general text, the T-BTM model combines feature information extracted from different self-attention layers of the BERT sub-model and learns local and long-distance features through the TextCNN and BILSTM sub-models, so that it can learn as much sentence-level feature information as possible from the text structure of the HAZOP analysis report.
2. According to the application, keywords of the second report are collected through web crawler technology, increasing the number of professional words in the training set so that key information related to risk analysis, accident causes, safety measures and the like can be extracted. The accident reports obtained by the application contain abundant practical cases and lessons learned, which are significant for improving the model's capabilities in risk identification, accident prevention and the like. Converting accident information into text data and performing word segmentation are basic steps of text processing: word segmentation divides continuous text into independent word units, helping the T-BTM model better understand and process the text data. The keywords are combined with the first data (i.e., the text data after word segmentation), and the combined content serves as a dictionary containing all words the T-BTM model needs to recognize during training, which is significant for improving the classification accuracy and generalization capability of the T-BTM model.
3. The application builds a T-BTM model, which comprises a BERT sub-model, a TextCNN sub-model, a BILSTM sub-model and a full connection layer, wherein the BERT sub-model can capture context information in a text through strong pre-training capability and depth bidirectional coding, and improve the quality of text representation. The TextCNN sub-model is good at capturing local features of text, such as keywords, phrases, etc., and is very effective for recognition of short text or specific patterns. The BILSTM sub-model is capable of handling long-term dependency problems in sequence data, and is helpful for understanding timing relationships and semantic structures in text. The full connection layer is used as a core part of the classifier and is responsible for integrating the output of each submodel and carrying out final classification decision. The computing module is used for computing input and output data, combining the output data of each sub-module to obtain total output data, inputting the total output data to the full-connection layer, processing the output of the full-connection layer through a Softmax function, and outputting probability distribution of each risk type to obtain the most probable risk type. The system can automatically collect and process a large amount of text data, and quickly identify the risk type in the HAZOP analysis report.
Detailed Description
The application is described in further detail below in connection with fig. 1 to 3.
Embodiment 1 discloses a risk classification method based on deep learning. Referring to fig. 1, the method comprises the following steps: S1: first collection; S2: first modeling; S3: training of the T-BTM model; S4: data processing; S5: entity-relation extraction. Firstly, HAZOP analysis reports to be classified are collected; then a T-BTM model is built and trained; then the HAZOP analysis reports to be classified are processed through the T-BTM model; and finally entity-relation extraction is performed on the processed HAZOP analysis reports. The specific process of the method is as follows:
S1, collecting HAZOP analysis reports to be classified, and recording the HAZOP analysis reports as first reports.
The first acquisition step is focused on collecting and collating HAZOP (hazard and operability analysis) analysis reports to be subjected to safety analysis. These analytical reports contain detailed analysis of potential hazards and operational problems in industrial processes, equipment or systems, and are important documents to ensure production safety and prevent accidents. In this step, the latest HAZOP analysis report is automatically retrieved and downloaded through a designated path or database interface, so as to ensure timeliness and integrity of the data. These HAZOP analysis reports are noted as "first reports" and stored in a special folder or database for ready recall and access.
S2, first modeling, and building a T-BTM model, wherein the T-BTM model comprises a BERT sub-model, a TextCNN sub-model, a BILSTM sub-model and a full connection layer.
Specifically, the first report is input into the embedding layer of the BERT sub-model to generate labeled word vectors, recorded as first word vectors. The first word vectors are respectively input into the n self-attention layers of the BERT sub-model, which extract features of the HAZOP analysis report and convert unstructured text information into feature vectors. Meanwhile, the BILSTM sub-model processes long-distance text feature information in the HAZOP analysis report, and the TextCNN sub-model is introduced to process local text information in the HAZOP analysis report so as to obtain more complete chemical professional word features. Through the local connection and weight sharing characteristics of the convolutional neural network (CNN), the TextCNN sub-model rapidly learns local features in the text, further increasing the T-BTM model's understanding of the text content.
As shown in FIG. 2, the S3: T-BTM model training includes S31: second acquisition, S32: reset training set, S33: training, S34: calculate loss, and S35: validation.
S31: second collection, namely collecting historical HAZOP analysis reports, recording them as second reports, and dividing the second reports into a training set and a verification set according to a preset proportion.
S32: resetting the training set, including S321: collecting keywords, S322: establishing a word stock, and S323: first reset.
And S321, collecting keywords, and collecting keywords of the second report through a web crawler technology.
The present embodiment utilizes advanced web crawler technology to automatically collect keywords related to the second report. A web crawler is an automated script or program that browses the world wide web and extracts information. The crawler program is tailored to this task to perform targeted searches and parse content in the websites, databases, or file systems that store these reports.
The crawler will simulate browser behavior, access the specified URL addresses, extract text data in the HTML page, and use regular expressions, XPath, or other parsing techniques to identify and extract keywords related to the HAZOP analysis report, which may include, but are not limited to, process names, device types, chemicals, security risk categories, failure modes, etc.
In order to improve the comprehensiveness and accuracy of keyword collection, the crawler can further be set to traverse related links recursively, enlarging the search range; this embodiment also follows the ethical norms of web crawler use by setting the permissions of the crawler program.
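The page-parsing step of such a crawler can be sketched as follows. This Python sketch is illustrative and offline: the sample page and the fixed keyword vocabulary are assumptions, the standard-library HTML parser stands in for a full crawler, and a production system would use richer patterns or XPath as described above.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collect the visible text of an HTML page, as a crawler would after download
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def extract_keywords(html, vocabulary):
    # Match a fixed vocabulary of HAZOP-related terms against the page text
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    return [w for w in vocabulary if re.search(re.escape(w), text, re.IGNORECASE)]

# Hypothetical downloaded page; no network access is performed here
page = "<html><body><p>Reactor overpressure caused a toluene leak.</p></body></html>"
found = extract_keywords(page, ["overpressure", "leak", "corrosion"])
```

Only the vocabulary terms actually present in the page text are returned, which is the filtering behavior the crawler relies on when harvesting report keywords.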
S322: establishing a word stock, which comprises S3221: collecting corpus, S3222: word segmentation, and S3223: first combination.
S3221: collecting corpus, namely acquiring accident reports and obtaining accident information. An accident report is a description of an industrial accident, in particular a complete accident analysis report. Its full text includes the basic conditions of the enterprise involved, the course of the accident, rescue and aftermath handling, the causes and nature of the accident, handling suggestions for the responsible personnel and institutions involved, and handling suggestions for related problems. The causes and nature of the accident are an important part of a chemical accident report, covering direct and indirect causes of the accident, in particular the industrial equipment, personnel operations, industrial materials and safety devices involved.
In order to ensure the accuracy of the accident information, after the accident information is acquired, whether garbled or illegal characters exist is checked manually; if so, they are removed manually; if not, S3222: word segmentation is executed.
S3222: word segmentation, namely converting the accident information into text data and performing word segmentation on the text data to obtain first data. This embodiment uses specialized word segmentation tools or algorithms, such as jieba or HanLP, to segment the text data.
S3223: first combination, namely integrating the keywords and the first data and taking the integrated content as a dictionary. For example, the keywords and the first data are stored together in a database as a dictionary. As another example, the keywords are stored in one sequence table and the first data in another, and both sequence tables are stored together in the database. As a further example, one keyword and all the first data are stored in one sequence table; if the second report has several keywords, a corresponding number of sequence tables are stored together in the database.
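The segmentation and combination steps can be sketched as follows. In this Python sketch, a simple regex tokenizer stands in for jieba or HanLP (an assumption for illustration), and the dictionary mirrors the third storage scheme above: one table of all segmented first data per keyword.

```python
import re

def segment(text):
    # Stand-in tokenizer; the embodiment uses tools such as jieba or HanLP
    return re.findall(r"[A-Za-z]+", text.lower())

def build_dictionary(keywords, first_data):
    # One sequence table per keyword, each holding all segmented first data
    # (the third storage scheme described above)
    return {kw: list(first_data) for kw in keywords}

# Hypothetical accident-report sentence and keywords
first_data = segment("Valve failure led to a pump trip.")
dictionary = build_dictionary(["leak", "overpressure"], first_data)
```

The resulting mapping contains every word the model must recognize during training, keyed by the crawled keywords.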
S323: first reset, comprising S3231: conversion, S3232: screening, S3233: third judgment, S3234: deletion, and S3235: updating the training set.
S3231: conversion. For each word in the first data, the Word2Vec model outputs a fixed-length vector, which is the representation of that word in vector space. Similarly, this step also converts the collected keywords into word vectors for subsequent interaction and comparison with the word vectors of the first data.
S3232, screening, including S32321, first extraction, S32322, fifth calculation, S32323, ordering, S32324, second extraction, and S32325, second combination.
And S32321, first extraction, namely extracting one keyword in the dictionary based on the dictionary established in the first combination in S3223, and marking the keyword as a first keyword.
And S32322, fifth calculation, namely calculating the similarity of the first keyword and all the first data in the dictionary, obtaining a series of similarity scores, and recording the similarity scores as first similarity, wherein the first similarity reflects the semantic proximity degree between the first keyword and each first data.
S32323: sorting, namely re-sorting the first similarities in descending order and recording the result as a first sequence.
S32324, second extraction, namely extracting first m pieces of first data in the first sequence according to the sequencing result of the first sequence.
S32325: second combination, namely combining the first keyword with the m pieces of first data extracted in the second extraction to serve as a sub-dictionary. The combination of the first keyword and the m pieces of first data can take several forms: storing the keyword together with the m pieces of first data in a database as a dictionary; storing the keyword in one sequence table and the m pieces of first data in another, with both sequence tables stored together in the database; or storing one keyword and its m pieces of first data in one sequence table, so that if the second report has several keywords, a corresponding number of sequence tables are stored together in the database.
S3233: third judgment, namely judging whether the dictionary still contains first keywords whose similarity with all the first data has not been calculated; if so, executing S3232: screening based on those first keywords; if not, executing S3234: deletion.
S3234: deletion, namely deleting from the dictionary the first data not stored in any sub-dictionary.
And S3235, updating the training set, and taking all the sub-dictionaries as a new training set.
And S33, training, namely, training the T-BTM model by adopting a new training set obtained in the updated training set, and recording the prediction probability of the risk type of each sample in the training process, wherein the prediction probability is used for calculating the loss in the step S34.
Step S34, calculating the loss, includes S341, defining a loss function, S342, fourth calculation, S343, optimizing, S344, first judgment, and S345, second judgment.
S341, defining a loss function, wherein the calculation model of the loss function is as follows:
;
wherein p_c is the prediction probability corresponding to the correct risk type; p_i is the prediction probability corresponding to the i-th incorrect risk type; and M is the number of incorrect risk types.
And S342, fourth calculation, namely calculating the loss with the loss function, based on the risk type of each sample in the training set and the prediction probability of each risk type for that sample.
And S343, optimizing, namely iteratively updating the parameters of the T-BTM model by gradient descent, incrementing the update count by one after each update, with the initial update count defined as 0. The minimum loss is obtained by continually iterating updates of the T-BTM model parameters.
S344, first judgment, namely, after each update, judging whether the minimum loss is smaller than a preset minimum loss threshold; if yes, executing S35, verification; if no, executing S345, second judgment.
S345, second judgment, namely judging whether the update count is larger than a preset iteration count threshold; if yes, executing S35, verification; if no, executing S343, optimizing.
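The optimization loop of S343 to S345, with its two stopping conditions, can be sketched on a toy quadratic loss. The learning rate, the thresholds, and the quadratic stand-in loss below are illustrative placeholders, not values from this application.

```python
# Minimal sketch of S343-S345: gradient descent with an update counter,
# a loss threshold (first judgment) and an iteration cap (second judgment).
def train(theta, lr=0.1, loss_threshold=1e-4, max_updates=1000):
    updates = 0                      # initial update count is 0 (S343)
    loss = theta * theta             # toy stand-in for the T-BTM loss
    while True:
        grad = 2 * theta             # gradient of the toy loss
        theta -= lr * grad           # one gradient-descent update
        updates += 1
        loss = theta * theta
        if loss < loss_threshold:    # first judgment (S344): loss small enough
            break
        if updates > max_updates:    # second judgment (S345): iteration cap
            break
    return theta, loss, updates

theta, loss, updates = train(theta=5.0)
print(updates, loss)
```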
And S35, verification, namely verifying the T-BTM model with a verification set, wherein the verification set is used to check that the performance of the T-BTM model meets expectations. If the verification result does not meet expectations, S34, calculating the loss, may be executed again to further optimize the T-BTM model.
As shown in fig. 3, S4: data processing includes S41: first input, S42: first acquisition, S43: first calculation, S44: second input, S45: third input, S46: second calculation, and S47: third calculation.
S41, a first input, namely, using a first report as an input of the BERT sub-model, generating a word vector with labels by utilizing an embedding layer of the BERT sub-model, and recording the word vector as a first word vector.
S42, first obtaining, namely respectively inputting the first word vectors into the n self-attention layers of the BERT sub-model, outputting n features, and setting a weight for each feature.
S43, first calculation, namely calculating the first input data based on the n features and the weight of each feature, wherein the calculation model of the first input data is as follows:
A = w_1·T_1 + w_2·T_2 + … + w_n·T_n;
wherein A is the first input data; T_i is the i-th feature; and w_i is the weight of the i-th feature.
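The weighted combination in S43 can be illustrated with a minimal numeric sketch; the feature values, dimensions, and weights below are illustrative.

```python
import numpy as np

# Sketch of S43: combine the n self-attention features T_1..T_n into the
# first input data A by a weighted sum. Shapes and weights are illustrative.
n, d = 3, 4                                       # n features, each d-dimensional
T = np.arange(n * d, dtype=float).reshape(n, d)   # stand-in features T_1..T_n
w = np.array([0.5, 0.3, 0.2])                     # per-feature weights
A = np.tensordot(w, T, axes=1)                    # A = sum_i w_i * T_i
print(A)
```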
S44, second input, namely inputting the first input data into the TextCNN sub-model, which performs convolution and pooling with convolution kernels of different sizes to obtain the local features of the HAZOP analysis report, namely the first output data.
S45, third input, namely inputting the first input data into the BILSTM sub-model to obtain the long-distance features of the first input data, namely the second output data.
S46, second calculation, namely linearly combining the first output data and the second output data to obtain the third output data.
S47, third calculation, namely inputting the third output data into the full connection layer of the T-BTM model to obtain the fourth output data, and normalizing the fourth output data with the Softmax function to obtain the risk type of the first report.
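Steps S44 to S47 can be illustrated at the vector level. The toy sub-model outputs, the equal 0.5/0.5 combination weights, and the identity fully connected layer are illustrative stand-ins for the trained sub-models, not values from this application.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Vector-level sketch of S44-S47: the TextCNN output (local features) and
# the BiLSTM output (long-distance features) are linearly combined, passed
# through a fully connected layer, and normalized with Softmax.
cnn_out = np.array([0.2, 1.0, -0.3])       # first output data (TextCNN)
lstm_out = np.array([0.4, 0.6, 0.1])       # second output data (BiLSTM)
combined = 0.5 * cnn_out + 0.5 * lstm_out  # third output data (S46)
W = np.eye(3)                              # toy fully connected layer (S47)
logits = W @ combined                      # fourth output data
probs = softmax(logits)
print(int(probs.argmax()))                 # predicted risk type index
```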
Assume that a classification case for a certain sample has 10 different risk type labels (risk type label 1, risk type label 2, ..., risk type label 10). Of the 10 risk type labels, only 1 is the correct label of the current sample and the remaining 9 are wrong labels; let risk type label 3 be the correct label. When a common deep learning model is used for prediction, the risk types and prediction probabilities of the sample are as follows:
In the first case, the prediction probability of risk type label 3 is 0.46, the prediction probability of risk type label 2 is also 0.46, and the remaining risk type labels together account for 0.08.
After training with the T-BTM model of this embodiment, the risk types and prediction probabilities of the sample are as follows:
In the second case, the prediction probability of risk type label 3 is still 0.46, and each of the remaining risk type labels has a prediction probability of 0.06.
In the first case, the wrong label (risk type label 2) strongly misleads the model away from the correct label (risk type label 3); in the second case, the correct label (risk type label 3) can be found correctly. A conventional cross entropy loss function would produce the same output in both cases, because it only focuses on how well the predicted probability distribution matches the true label and pays no attention to the wrong labels. The loss function adopted in this embodiment considers all the wrong labels while performing a generalized cross entropy loss calculation, so that the wrong labels approach a relatively average state and the influence of extreme values among the wrong labels on the correct label is reduced. Through iterative training, the probability of the correct label approaches 1 and the loss function approaches 0.
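The point about conventional cross entropy can be checked numerically with the two distributions above: since the loss depends only on the probability assigned to the correct label, both cases yield exactly the same value.

```python
import math

# Distributions from the two cases above; label 3 (index 2) is correct.
case1 = [0.01] * 10
case1[2] = 0.46          # correct label
case1[1] = 0.46          # strongly misleading wrong label
case2 = [0.06] * 10
case2[2] = 0.46          # correct label; wrong labels near-uniform

# Conventional cross entropy: -log(probability of the correct label).
ce1 = -math.log(case1[2])
ce2 = -math.log(case2[2])
print(ce1, ce2)          # identical: CE ignores the wrong-label distribution
```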
S5, entity-relation extraction, including S51, second modeling, S52, sixth calculation, S53, third modeling, S54, eighth calculation, S55, ninth calculation and S56, seventh calculation.
S51, second modeling, namely establishing an SPN model, wherein the SPN model comprises a multi-head self-attention layer, a multi-head cross-attention layer and a full connection layer. The multi-head self-attention layer captures internal dependencies between the input word vectors: each head independently calculates attention weights, and these weights are then combined to generate a richer representation. The multi-head cross-attention layer receives the output of the multi-head self-attention layer as input and may perform cross-attention calculations with representations of other sequences (e.g., context or query) to further fuse the information.
S52, sixth calculation, namely taking the first word vectors as the input of the multi-head self-attention layer, calculating for each first word vector the attention weights between it and the remaining word vectors, and taking these attention weights as the input of the multi-head cross-attention layer to obtain the fifth output data.
S53, third modeling, namely establishing an ACMix model, wherein the ACMix model comprises a full connection layer and a self-attention layer, and the calculation model of the ACMix model is as follows:
F = α·F_fc + β·F_att;
wherein F_fc is the output of the full connection layer of the ACMix model; F_att is the output of the self-attention layer of the ACMix model; α is the weight of the full connection layer output; β is the weight of the self-attention layer output; and F is the output of the ACMix model.
The ACMix model also includes a plurality of different convolution kernels for mapping the output of the BERT sub-model to different feature sets.
S54, eighth calculation, namely inputting the n features into the convolution layer of the ACMix model to obtain a plurality of feature sets, and then dividing the feature sets into N groups, each group containing three features.
And taking the three features as the query, key and value in the self-attention layer to participate in the calculation of the self-attention model and obtain the attention result.
And processing the data of the N groups of feature sets through the full connection layer of the ACMix model, and performing a shift operation on the results to obtain N new, different groups of feature sets.
The output is then calculated with the calculation model of the ACMix model:
F_7 = α·F_fc + β·F_att;
wherein F_fc is the N groups of different feature sets output by the full connection layer of the ACMix model; F_att is the attention result output by the self-attention layer of the ACMix model; α is the weight of the full connection layer output; β is the weight of the self-attention layer output; and F_7 is the seventh output data output by the ACMix model.
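The weighted combination of the two ACMix branches in S54 can be sketched numerically; the branch outputs and the weights alpha and beta below are illustrative, not learned values.

```python
import numpy as np

# Sketch of the ACMix combination: the full-connection-layer branch output
# and the self-attention branch output are merged by scalar branch weights.
F_fc = np.array([1.0, 2.0])           # full connection layer branch output
F_att = np.array([0.0, 4.0])          # self-attention branch output
alpha, beta = 0.7, 0.3                # branch weights (illustrative)
F_out = alpha * F_fc + beta * F_att   # seventh output data
print(F_out)
```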
S55, ninth calculation, namely combining the fifth output data and the seventh output data, and taking the result as the new fifth output data.
S56, seventh calculation, namely inputting the fifth output data from S55, ninth calculation, into the full connection layer, outputting the sixth output data, and normalizing the sixth output data with the Softmax function to obtain the prediction result of the entity-relation triplets in the first report, represented by the offsets of the head and the tail of each entity.
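A minimal sketch of the head/tail-offset representation in S56. The logits are illustrative stand-ins for the sixth output data, assuming one score per token position for the head and for the tail of an entity.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Sketch of S56: the sixth output data is normalized with Softmax, and an
# entity is reported as (head offset, tail offset) into the token sequence.
head_logits = np.array([0.1, 2.0, 0.3, 0.2])  # one score per token position
tail_logits = np.array([0.0, 0.1, 0.2, 3.0])
head = int(softmax(head_logits).argmax())
tail = int(softmax(tail_logits).argmax())
print((head, tail))                            # predicted entity span offsets
```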
This embodiment provides a risk classification method based on deep learning. First, the HAZOP analysis report to be classified is collected as the first report. A T-BTM model is then built, comprising a BERT sub-model, a TextCNN sub-model, a BILSTM sub-model and a full connection layer. The first report is input into the BERT sub-model to generate tagged word vectors (the first word vectors), and the self-attention layers of BERT extract features and set their weights. Based on these features and their weights, the first input data is calculated. The first input data is then fed into the TextCNN and BILSTM sub-models, respectively, to obtain two independent outputs (the first output data and the second output data). These two outputs are linearly combined to generate the third output data, which is finally normalized through the full connection layer and the Softmax function to predict the risk type of the first report. The method fully utilizes the advantages of the different sub-models and improves the accuracy and efficiency of HAZOP analysis report classification.
Embodiment 2: this embodiment discloses a risk classification system based on deep learning. The system includes:
And the data acquisition module is used for acquiring HAZOP analysis reports to be classified.
And the building model module is used for building a T-BTM model, wherein the T-BTM model comprises a BERT sub-model, a TextCNN sub-model, a BILSTM sub-model and a full connection layer.
The data processing module comprises a first processing unit, a second processing unit and a third processing unit.
The first processing unit is used for inputting the HAZOP analysis report to be classified into the BERT sub-model and converting the text into tagged word vectors (the first word vectors) through the embedding layer of the BERT sub-model; these word vectors are rich in the semantic information of the text. The first word vectors are then fed into the self-attention layers of the BERT sub-model, and a plurality of key features are extracted through the attention mechanism for subsequent calculation and classification.
A second processing unit, used for inputting the first input data into the TextCNN sub-model, which performs convolution and pooling with convolution kernels of different sizes to obtain the local features of the HAZOP analysis report, namely the first output data.
A third processing unit, used for inputting the first input data into the BILSTM sub-model to obtain the long-distance features of the first input data, namely the second output data.
The calculation module comprises a first calculation unit, a second calculation unit and a third calculation unit;
and a first calculation unit for calculating first input data based on the plurality of features output by the BERT sub-model.
a second calculation unit, used for linearly combining the first output data and the second output data to obtain the third output data; and
a third calculation unit, used for inputting the third output data into the full connection layer to obtain the fourth output data, and normalizing the fourth output data with the Softmax function to obtain the risk type of the first report.
This embodiment provides a risk classification system based on deep learning. The data acquisition module collects the HAZOP analysis reports to be classified, and the model building module then builds an integrated T-BTM model, which combines the advantages of the BERT sub-model, the TextCNN sub-model, the BILSTM sub-model and the full connection layer. The units of the data processing module convert the reports into word vectors and extract features, and the TextCNN and BILSTM sub-models produce output data from those inputs. The calculation module integrates these outputs and ultimately determines the risk type of the report through the full connection layer and the Softmax function.
The above embodiments are not intended to limit the scope of the application; equivalent changes to the structure, shape and principle of the application are therefore covered by the scope of the application.