CN115422356A - Media data processing method, system, computer equipment and storage medium
- Publication number
- CN115422356A (Application CN202211051820.4A)
- Authority
- CN
- China
- Prior art keywords
- parameter
- layer
- data
- media data
- cnn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The application relates to a media data processing method, system, computer device and storage medium, comprising the following steps: acquiring a media data text and converting it into a plurality of equal-length digital sequence data sets according to a predetermined rule; establishing a hybrid neural network based on a Transformer module, a CNN module and an LSTM module, wherein the Transformer module comprises an ENCODER and a DECODER, the CNN module comprises 5 or 8 convolutional layers, and an average pooling layer is arranged between the convolutional layers; and dividing the equal-length digital sequence data sets into a positive data set and a negative data set in a ratio of 8:2 and training the hybrid neural network with the positive and negative data sets. According to the invention, preprocessing the media data text improves the distinctiveness of its features, and the hybrid neural network enriches the feature hierarchy of the data, yielding accurate classification results.
Description
Technical Field
The present application relates to the field of data processing, and in particular to a method, system, computer device and storage medium for processing media data such as news text.
Background
Media data processing is a branch of deep learning, an artificial intelligence technology whose realization depends on continuous breakthroughs in deep learning methods. Over more than a decade of development, techniques such as word vector construction and manual slicing and segmentation of feature data have gradually matured. At present, news text data sets are still labeled and partitioned manually, which requires a very large amount of work before a high-value news text data set can be obtained.
Traditional neural network models mainly learn transformations from data points (vectors) to data points (vectors), while Recurrent Neural Networks learn transformations from sequences of data points to sequences of data points. Commonly used sequence models include the Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM) and the Transformer. The Transformer, which can be understood as a successor within the family of recurrent sequence models, performs well in many application scenarios (e.g., machine translation and media data processing) and is the current mainstream in the research field. Media data represented by news texts vary in length, are variable in content and differ little in form, so effectively distinguishing massive media data is an urgent technical problem.
Disclosure of Invention
In order to solve the above technical problems, the application provides a deep-learning-based media data processing method, system, computer device and storage medium, mainly applied to the recognition of English news semantic texts, and in particular to the recognition of data of the same type with very large internal differences.
In one aspect, the present application provides a media data processing method, including the following steps: acquiring a media data text and converting it into a plurality of equal-length digital sequence data sets according to a predetermined rule;
establishing a hybrid neural network based on a Transformer module, a CNN module and an LSTM module, wherein the Transformer module comprises an ENCODER and a DECODER, the CNN module comprises 5 or 8 convolutional layers, and an average pooling layer is arranged between the convolutional layers;
dividing the equal-length digital sequence data sets into a positive data set and a negative data set in a ratio of 8:2, and training the hybrid neural network using the positive data set and the negative data set.
Further, the predetermined rule includes translating the media data text into an english text if the media data text is a non-english text.
Further, the predetermined rule includes: the English letters are converted into numbers.
Further, the predetermined rule includes converting characters into numbers, wherein the characters include invisible characters (such as spaces and paragraph marks) and visible characters (such as punctuation).
Further, the CNN module includes: a layer-1 CNN whose Max_len parameter is configured to 20, whose hidden_dim hidden layer is configured to 45, whose kernel_size parameter is 3 and whose bias parameter is True; the layer-1 network uses an average pooling operation, i.e., after the layer-1 convolution is completed and before the layer-2 convolution is entered, the data undergoes average pooling and then enters the layer-2 CNN;
a layer-2 CNN whose Max_len parameter is configured to 45, hidden_dim to 75, kernel_size to 3 and bias to True; the average pooling is configured with a stride parameter of None, a padding parameter of 0, a ceil_mode parameter of False and a count_include_pad parameter of True; the layer-2 network applies average pooling and feeds the layer-3 CNN;
a layer-3 CNN whose Max_len parameter is configured to 75, hidden_dim to 105, kernel_size to 3 and bias to True; the average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True; the layer-3 network applies average pooling and feeds the layer-4 CNN;
a layer-4 CNN whose Max_len parameter is configured to 105, hidden_dim to 135, kernel_size to 3 and bias to True; the average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True; the layer-4 network applies average pooling and feeds the layer-5 CNN;
a layer-5 CNN whose Max_len parameter is configured to 135, hidden_dim to 165, kernel_size to 3 and bias to True; the average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True.
Further, the Transformer module has a Self-Attention module for performing sliding-window processing on the equal-length digital sequence data sets.
Further, the predetermined rule includes a random constraint mechanism for replacing special characters in the media data text with special numbers.
In another aspect, the present application provides a media data processing system, comprising:
the data preprocessing module is used for acquiring a media data text and converting it into a plurality of equal-length digital sequence data sets according to a predetermined rule;
the modeling module is used for establishing a hybrid neural network based on a Transformer module, a CNN module and an LSTM module, wherein the Transformer module comprises an ENCODER and a DECODER, the CNN module comprises 5 or 8 convolutional layers, and an average pooling layer is arranged between the convolutional layers;
and the model training module is used for dividing the equal-length digital sequence data sets into a positive data set and a negative data set in a ratio of 8:2 and training the hybrid neural network using the positive and negative data sets.
In another aspect, the present application provides a computer device comprising a processor configured to run a program, wherein the program, when running, performs the media data processing method.
In yet another aspect, the present application provides a non-volatile storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of a media data processing method.
Through early-stage data processing and feature conversion, the technology reconstructs the text data in the data preprocessing stage, further amplifies the features of the text data through the neural network, effectively extracts richly layered data features, and, taking the internal pattern changes within the data as the classification basis, efficiently constructs an effective training data set.
The method is based on the self-attention mechanism within the Transformer and mainly transforms the features of short sequences through a two-layer recurrent neural network, so that sequence patterns under different modes are converted into a clearer sequence transformation mode; the transformation of the mode is determined solely by the probability distribution of the data. Meanwhile, since the complexity of the parameters is further reduced, the features are further amplified by the 5-layer or 8-layer CNN after average pooling, and the LSTM finally provides stability. The features obtained by the Transformer are thereby further amplified, realizing the classification of unordered sequences.
The application adopts a Transformer + 5-layer or 8-layer CNN + LSTM hybrid neural network that handles the preprocessing of massive media data well, and the neural network parameters configured by the application are also points of protection. After the early-stage processing of the data, it is input into the special CNN, which further exposes the random constraint characteristics and achieves a good effect. The learning-rate configuration in the hybrid network, the average pooling, the stride design of the network and the size and length design of each stage of the network are all determined according to the particularity of the data, so the design of the network is inventive.
When sharpening data features, a random constraint mechanism is introduced mainly in the data preprocessing stage: randomly constrained symbols are inserted into the spaces between the original English words, and average pooling is then applied to obtain data with distinct layers, with which the model achieves good results. The random constraint features set by the random constraint mechanism change the data structure and increase the differences between features.
Drawings
In order to more clearly illustrate the detailed description of the present application or the technical solutions in the prior art, the drawings used in the detailed description or the prior art description will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is an overall flow diagram of a media data processing method according to some embodiments of the present application;
fig. 2 is a flowchart illustrating a media data preprocessing process in a media data processing method according to some embodiments of the present application;
FIG. 3 is a diagram of a neural network structure of a media data processing method according to some embodiments of the present application;
FIG. 4 is a schematic diagram of a media data processing system according to some embodiments of the present application;
FIG. 5 is a schematic diagram of a computer device for media data processing according to some embodiments of the present application;
Detailed Description
The technical solutions of the present application will be described clearly and completely with reference to the accompanying drawings. It should be understood that the described embodiments are only some, not all, embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The main work of the application is to design a model that automatically performs multi-class classification of texts, distinguishing news data from other data and thereby identifying which data is news and which is not. The significance of this design is considerable: future semantic judgment work becomes more convenient, and the system can automatically compare corpora to judge the real meaning expressed by news.
The data used by the method is news text, anonymized at the character level and integrated into 14 candidate classification categories: finance, lottery, real estate, stock, home, education, science and technology, society, fashion, politics, sports, constellation, games and entertainment. The technology of the application classifies these data types well; introducing the Transformer deepens the hierarchy of the deep learning model, and the hybrid neural network designed in this way is an innovation of the present technology, achieving the desired news data classification effect.
On the basis of the Transformer, the method completes performance tuning by adjusting the CNN parameters and controlling the pooling process, and differentiates the data during data processing. The technology of the application is a hybrid neural network combining multi-layer CNN and LSTM on top of the Transformer technology, and the tuned parameters of the model are a key technology of the application.
The problem solved by the application is mainly the effective classification of a collected data set of 200,000 news items; its significance is that, when performing semantic recognition, the AI can understand not only everyday speech but also news.
The data preprocessing adopted by the application is: news -> English -> English uppercase -> keep spaces and other symbolic features -> convert uppercase letters into digital features -> reconstruct the data as a one-dimensional Tensor (converting the original news data into sequence data). This finally yields a one-dimensional matrix of the news data set. The data is then processed to equal length, mainly by appending 0s so that all sequence data have the same length, constructing data suitable for model training; the data can then be put into the Transformer + 5-layer CNN + LSTM hybrid model of the present technology. Each intermediate layer of the model uses average pooling, the learning rate is configured at three per thousand (0.003), and the batch size is 50. After 3000 iterations, the final performance stabilized at 92.5%.
Fig. 1 is an overall flowchart of a media data processing method in some embodiments of the present application, as shown in fig. 1, including:
first, media data is preprocessed.
Fig. 2 is a schematic diagram of a media data preprocessing process of a media data processing method according to some embodiments of the present application.
As shown in fig. 2, taking a news data text as an example, the media data preprocessing adopted in the present application can be summarized as: news text -> English news text -> English letters converted to uppercase -> invisible symbols such as spaces and paragraph marks and other visible symbol features retained -> uppercase letters and symbols converted into digital features -> data reconstructed as a one-dimensional Tensor (converting the initial news data into sequence data). Processing in this way yields a one-dimensional matrix of the news data set; the one-dimensional matrices are then processed to equal length, mainly by padding the long one-dimensional matrix with 0s to obtain several equal-length short one-dimensional matrices, constructing data suitable for model training. The preprocessing of the data mainly focuses on defining the data structure and cleaning invalid data. The application mainly uses the list form in place of a one-dimensional matrix, and the relevant matrix operations are then performed inside the neural network. The specific steps are as follows:
reading and converting news text data.
About 200,000 news items are collected first, and the collected news text is preprocessed. The characteristics of the target sequence data are clarified and labeled accurately, taking the data, news, semantics and data body as features, and the relevance between the data and previous features is judged from the content in order to determine the category of the data.
Anonymization is performed at the character level, and the data is integrated into 14 candidate classification categories: finance, lottery, real estate, stock, home, education, science and technology, society, fashion, politics, sports, constellation, games and entertainment. Each piece of data takes the form of a one-dimensional Tensor, and the body of the data is English.
Several pieces of news data are collected and read. News text data often differ in length: short items may be dozens of characters and long items hundreds. The texts may be Chinese news, English news, Japanese, French and so on. News that is unrelated is grouped into one category; the amount of such data is very small while its feature differentiation is very large.
Given the characteristics of news texts, features cannot be captured accurately in classification tasks. To address the preprocessing of massive media data, the data preprocessing stage does not directly apply semantic segmentation; instead, starting from the original news text, a whole piece of complete semantic data is translated directly into English data by calling translation interfaces or tools such as Google's translation API, the Baidu Translate API or memoQ. For example, "Sichuan power consumption strain eased; general industrial and commercial power consumption fully restored" is translated into "The power shortage in Sichuan has eased and general industrial and commercial power consumption has fully recovered".
If necessary, software such as Grammarly can be used for further polishing. In the polished translated English data, punctuation appears within sentences and spaces separate the words. All lowercase letters are uniformly converted into uppercase, and the processed text is then digitized letter by letter according to a simple correspondence: the 26 letter features A, B, C, ... in the valid segments of words are replaced by 0, 1, 2, ..., 24, 25; the space segmentation between two words is replaced by 50; and where the space feature itself is kept, the space is replaced by an "_" underscore represented by the number 41. For example, the conversion of uppercase letters to numbers uses the relation A-->0, B-->1, C-->2, D-->3, E-->4, F-->5, G-->6, H-->7, I-->8, J-->9, K-->10, L-->11, M-->12, N-->13, O-->14, P-->15, Q-->16, R-->17, S-->18, T-->19, U-->20, V-->21, W-->22, X-->23, Y-->24, Z-->25. Invisible characters such as spaces, tabs and paragraph marks, and visible punctuation such as commas and periods, are handled by rules of the form: ' '-->30, ','-->31, '.'-->32, '!'-->33, ..., '?'-->75, '-'-->77, ..., '('-->81, ')'-->82, ':'-->83, '/'-->84, '+'-->85, '%'-->86. Through these operations the data is finally converted into serialized one-dimensional Tensor data, and each news item is expressed as a one-dimensional Tensor, for example [0 1 11 24 33 54 98 11 ...].
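To make the digitization concrete, the following is a minimal Python sketch of the letter-and-symbol encoding, assuming the A-Z -> 0-25 letter codes and the space -> 30 code stated in the text; the remaining punctuation codes in `SYMBOL_CODES` follow the partly recoverable table above and should be read as illustrative assumptions.

```python
# Minimal sketch of the letter/symbol digitization described above.
# A..Z -> 0..25 and ' ' -> 30 follow the text; the other punctuation codes
# are assumptions reconstructed from the mapping table above.
LETTER_CODES = {chr(ord('A') + i): i for i in range(26)}
SYMBOL_CODES = {' ': 30, ',': 31, '.': 32, '!': 33, '?': 75, '(': 81,
                ')': 82, ':': 83, '/': 84, '+': 85, '%': 86}

def encode_news(text: str) -> list[int]:
    """Uppercase the text and map each retained character to its number."""
    codes = []
    for ch in text.upper():
        if ch in LETTER_CODES:
            codes.append(LETTER_CODES[ch])
        elif ch in SYMBOL_CODES:
            codes.append(SYMBOL_CODES[ch])
        # characters outside both tables are dropped
    return codes

print(encode_news("The power shortage in Sichuan has eased."))
```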
Further, under a switch-style control flow in Python, the features retained in the one-dimensional Tensor are region-differentiated. In some embodiments of the present application, symbols other than spaces, such as commas, periods and exclamation marks, are erased from the one-dimensional Tensor data set, leaving only the space features; all space features within the data are kept intact and replaced with the number 30. After data processed in this way is input into the neural network, average pooling greatly enhances the feature saliency, and the classification accuracy on news data improves markedly.
The goals of the data required by the deep learning model are sorted out: the 200,000 long English semantic data items are converted into digital sequences, and the data is placed again into a two-dimensional Tensor for further feature conversion.
All 200,000 pieces of data are converted into a two-dimensional Tensor, which is read into the program, giving an English text data set as a two-dimensional Tensor. First, the two questions are taken out separately and split into two sentences, and the two sentences are then processed into one long sequence sentence.
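A short sketch of the equal-length step under the padding rule described earlier (append 0s until every sequence reaches the longest length, then stack into one two-dimensional Tensor); the helper name and shapes are assumptions.

```python
import torch

# Pad every digit sequence with 0s to a common length and stack the results
# into one two-dimensional Tensor, as described in the preprocessing above.
def to_equal_length(sequences: list[list[int]]) -> torch.Tensor:
    max_len = max(len(s) for s in sequences)
    padded = [s + [0] * (max_len - len(s)) for s in sequences]
    return torch.tensor(padded)  # shape: (num_items, max_len)

dataset = to_equal_length([[0, 1, 11, 24], [33, 54]])
print(dataset.shape)  # torch.Size([2, 4])
```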
A random constraint technology mechanism is designed: a brand-new controllable random constraint mechanism built on the basis of randomized constraints. The purpose of the random constraint mechanism is to improve the layered structure of the data. News data has many categories with relatively complicated features and large feature differences; if the features are not expressed hierarchically, classification becomes difficult and consumes much computing power. Improving the layering of the data therefore improves the efficiency of the deep-learning-based neural network and saves a great deal of computation.
The random constraint mechanism in the application is mainly controlled by controllable random constraint numbers; the randomness is constrained by a program with a set of switch mechanisms. According to different practical scenarios, the parameters can be customized to achieve further feature layering of the data, and a series of parameters controls the variation range and richness. Meanwhile, because the controllable random constraint mechanism performs a secondary cut on the letter-type English data, the features it produces are further amplified through the magnifying glass of deep learning, realizing effective segmentation of the news text data.
The random constraint mechanism of the application adopts a data processing method designed around a parameter interpolation mechanism, which is a directional constraint technique combining random numbers and probability density.
The random constraint characteristic characters of the data are characters such as '!', '@', '#', '&', etc., which are converted into special numbers: '!' = '70', '@' = '71', '#' = '72', '%' = '73', '&' = '74', '=' = '75'. The random interpolation is inserted as numbers, and the directional probability constraint means that, according to the density, the numbers are distributed into the obtained semantic sequence following a certain probability distribution of the data. The English data was converted into letters in the previous step and then into a one-dimensional Tensor of the digital sequence, but the insertion into the Tensor follows the variation rule of the data's fluctuation.
The frequency of occurrence of letters differs from sentence to sentence. Suppose one item of the 200,000-item data set is M; after translation into English it becomes M-E, and after digitization M-E-N. The M-E-N data contains k different letters with probability distribution p1, p2, ..., pk, and the special numbers are interpolated according to this distribution, as sketched below.
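The sketch below illustrates one possible reading of this directional probability constraint: special numbers are interpolated at positions weighted by the frequency of the codes already present in the sequence. The exact sampling rule is an assumption, since the text only requires that insertion follow the data's probability distribution.

```python
import random
from collections import Counter

SPECIAL_NUMBERS = [70, 71, 72, 73, 74, 75]  # codes for '!', '@', '#', '%', '&', '='

def insert_constraints(seq: list[int], n_insert: int = 5) -> list[int]:
    """Interpolate special numbers at frequency-weighted positions (assumed rule)."""
    freq = Counter(seq)
    weights = [freq[c] for c in seq]   # positions holding frequent codes are favored
    out = list(seq)
    for _ in range(n_insert):
        pos = random.choices(range(len(seq)), weights=weights, k=1)[0]
        out.insert(pos, random.choice(SPECIAL_NUMBERS))
    return out

print(insert_constraints([0, 1, 11, 24, 33, 54, 11]))
```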
Design of the deep-learning-based hybrid neural network model for media data under the random constraint mechanism
FIG. 3 is a block diagram of a hybrid neural network model in some embodiments of the present application.
As shown in fig. 3, the application combines a Transformer with a 5-layer or 8-layer CNN + LSTM hybrid neural network. The Transformer mechanism is the core method of the application: its excellent self-attention mechanism makes data features more distinct for pattern search and feature transformation over sequences. For sequence processing, the Transformer's training mode has a high affinity with natural language and is widely used in natural language processing. The recurrent network used in the application differs somewhat from a standard Transformer: its parameter settings and the increase in the number of network layers were confirmed mainly against the irregular sequence data of length 7700 in the present scenario. Unlike the traditional Transformer, the self-attention mechanism adopted here is designed for different patterns of sequence data; because the data is non-image with small quantity and dimensionality, a two-layer recurrent network is used, reducing expansion at the parallel layer, which would otherwise introduce a large amount of redundancy.
The Self-Attention mainly performs sliding-window processing on the S sequence data sets of length 70, obtaining their small sequences in turn; it is characterized in that features are extracted sequentially in windows of 10-20 data points. With a window of 20 data points, 50 small sequences can be extracted; with a window of 10, 60 small sequences. An attention-mechanism operation is performed on each small sequence (the configuration of the Transformer), as sketched below.
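A sketch of the sliding-window extraction, matching the counts above: over a length-70 sequence, a window of 20 yields 50 small sequences and a window of 10 yields 60 (stride 1, with the range running to len(seq) - window).

```python
# Sliding-window extraction over a length-70 digit sequence: windows of
# 10-20 items at stride 1 give 60 or 50 small sequences respectively.
def sliding_windows(seq: list[int], window: int) -> list[list[int]]:
    return [seq[i:i + window] for i in range(len(seq) - window)]

s = list(range(70))
assert len(sliding_windows(s, 20)) == 50
assert len(sliding_windows(s, 10)) == 60
```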
Parallel layers are set up, and recurrent-network training is performed simultaneously on the 50-60 small sequences obtained from sequence S, i.e., S1, S2, ..., extracting small features with high similarity and thereby obtaining the subsequence SS1 with the highest degree of commonality among the small sequences. SS1 is a modular feature of the data. The Transformer then inputs the processed data SS1 into the CNN for further processing.
The layer-1 CNN has a Max_len parameter configured to 20, a hidden_dim hidden layer configured to 45, a kernel_size parameter of 3 and a bias parameter of True. The layer-1 network uses average pooling: after the CNN completes the layer-1 convolution and before it enters the layer-2 convolution, the data undergoes an average pooling operation and then enters the layer-2 CNN.
The layer-2 CNN has Max_len configured to 45, hidden_dim to 75, kernel_size 3 and bias True. The average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True; the layer-2 network applies average pooling and feeds the layer-3 CNN.
The layer-3 CNN has Max_len configured to 75, hidden_dim to 105, kernel_size 3 and bias True. The average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True; the layer-3 network applies average pooling and feeds the layer-4 CNN.
The layer-4 CNN has Max_len configured to 105, hidden_dim to 135, kernel_size 3 and bias True. The average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True; the layer-4 network applies average pooling and feeds the layer-5 CNN.
The layer-5 CNN has Max_len configured to 135, hidden_dim to 165, kernel_size 3 and bias True. The average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True.
As an optional implementation of the present application, the above 5-layer CNN network model may also take an 8-layer structure built on top of the former 5-layer CNN network: the layer-5 network applies average pooling and feeds the layer-6 CNN.
The layer-6 CNN has Max_len configured to 165, hidden_dim to 200, kernel_size 3 and bias True. The average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True; the layer-6 network applies average pooling and feeds the layer-7 CNN.
The layer-7 CNN has Max_len configured to 200, hidden_dim to 240, kernel_size 3 and bias True. The average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True; the layer-7 network applies average pooling and feeds the layer-8 CNN.
The layer-8 CNN has Max_len configured to 240, hidden_dim to 280, kernel_size 3 and bias True. The average pooling is configured with stride None, padding 0, ceil_mode False and count_include_pad True; the layer-8 network applies average pooling and feeds its output into the LSTM model.
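The following PyTorch sketch assembles the 5- or 8-layer stack from the parameters just listed (kernel_size=3, bias=True, and the stated AvgPool1d settings). The pooling kernel size, which the text does not give, is assumed to be 2, and treating Max_len as the input channel count is likewise an assumption.

```python
import torch
import torch.nn as nn

# hidden_dim per layer, layers 1..8, per the configuration above
HIDDEN_DIMS = [45, 75, 105, 135, 165, 200, 240, 280]

def build_cnn(num_layers: int = 5, in_channels: int = 20) -> nn.Sequential:
    """Stack Conv1d + AvgPool1d blocks; num_layers is 5 or 8."""
    assert num_layers in (5, 8)
    blocks, prev = [], in_channels  # layer-1 Max_len is 20
    for dim in HIDDEN_DIMS[:num_layers]:
        blocks.append(nn.Conv1d(prev, dim, kernel_size=3, bias=True))
        # average pooling between convolutions; the pooling kernel of 2 is assumed
        blocks.append(nn.AvgPool1d(kernel_size=2, stride=None, padding=0,
                                   ceil_mode=False, count_include_pad=True))
        prev = dim
    return nn.Sequential(*blocks)

cnn = build_cnn(num_layers=5)
x = torch.randn(8, 20, 256)   # (batch, channels, sequence length) is an assumed shape
print(cnn(x).shape)           # torch.Size([8, 165, 6])
```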
The output of the layer-5 or layer-8 CNN is input into the LSTM model, with the LSTM parameters configured as embedding_dim = 87, hidden_dim = 117, num_layers = 3, output_size = 2 and padding = 1. The word-vector conversion method torch.nn.Embedding is then used with an internal length parameter of 180, the parameters processed by the embedding method are fed into the LSTM model, and the output passes through a fully connected layer to finally form a feature number. Categories are divided according to this number, finally realizing internal-regularity classification of unordered sequences. This is similar to human language: similar spoken language patterns are automatically grouped into the same type, and different languages are divided in turn by their different internal patterns, achieving classification at the pattern level.
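A minimal sketch of this LSTM stage under the parameters just listed (embedding_dim=87, hidden_dim=117, num_layers=3, output_size=2). How the padding parameter of 1 and the internal length parameter of 180 are wired in is not fully specified, so they are read here as padding_idx and vocabulary size; both are assumptions.

```python
import torch
import torch.nn as nn

class LSTMHead(nn.Module):
    """LSTM + fully connected classification stage, per the parameters above."""
    def __init__(self, vocab_size: int = 180, embedding_dim: int = 87,
                 hidden_dim: int = 117, num_layers: int = 3, output_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=1)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.embedding(x)        # (batch, seq, embedding_dim)
        out, _ = self.lstm(emb)        # (batch, seq, hidden_dim)
        return self.fc(out[:, -1])     # classify from the last time step

logits = LSTMHead()(torch.randint(0, 180, (8, 64)))
print(logits.shape)  # torch.Size([8, 2])
```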
Model training process.
After the preprocessed training samples are divided into two parts, T_Pos and T_Neg, and input into the hybrid neural network model, the data selects positive and negative samples from the two sets according to the previously configured batch, applies the random constraint, and trains the hybrid neural network model in the ratio 8:2, i.e., Pos:Neg = 8:2, as sketched below.
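A small sketch of drawing one training batch at the stated 8:2 positive:negative ratio; the batch size of 50 follows the configuration described below, while the helper itself is an assumption.

```python
import random

def sample_batch(t_pos: list, t_neg: list, batch_size: int = 50,
                 pos_ratio: float = 0.8) -> list:
    """Draw batch_size samples at the Pos:Neg = 8:2 ratio described above."""
    n_pos = int(batch_size * pos_ratio)              # 40 positives per 50-item batch
    batch = random.sample(t_pos, n_pos) + random.sample(t_neg, batch_size - n_pos)
    random.shuffle(batch)                            # mix positives and negatives
    return batch
```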
First, the number of rounds, the epochs parameter, is configured; training uses 1000-3000 epochs, and positive and negative samples are re-screened in each round of training so that the training and test data obtained in each round differ, ensuring the fairness of the training process. The random constraint mechanism introduced here, which trains small-scale data into large-scale structures, is also an essential technique in large-scale data development.
The following is the configuration of the learning rate and batch. During training, the learning rate is reduced to three per thousand (0.003). This brings slow convergence to the massive-media-data preprocessing problem: the earlier data processing already exploits the data to the maximum extent, and it is difficult to add features from that side, so from the perspective of training-parameter adjustment, setting the rate to 0.003 can cause slow convergence and local optima, and makes gradient explosion a risk. From the overall perspective, however, training proceeds in a good direction, and setting the batch parameter to 50 reduces the occurrence of gradient explosion, so the model always trends upward; after training, it can really be put into use. Although this process takes a long time, to reduce overly long training the CNN is improved and the CNN + LSTM mode is used; when parameter adjustment, data conversion and similar measures fail, a new model is constructed by modifying the final form of the model structure for better performance. After the improvement, convergence speed and training accuracy improve by 3-4% compared with before. The final accuracy stays above 81%, with a maximum of 92.5%.
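A minimal sketch of this training configuration (learning rate 0.003, batch size 50, up to 3000 epochs). The optimizer (Adam), the loss (cross-entropy), the stand-in model and the synthetic batches are all assumptions; the real pipeline would feed batches drawn at the 8:2 ratio from the earlier sketch.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(180, 64), nn.ReLU(), nn.Linear(64, 2))  # stand-in
optimizer = torch.optim.Adam(model.parameters(), lr=3e-3)  # three per thousand
criterion = nn.CrossEntropyLoss()

for epoch in range(3000):                    # epochs configured at 1000-3000
    x = torch.randn(50, 180)                 # one 50-item batch (synthetic)
    y = torch.randint(0, 2, (50,))           # positive/negative labels
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```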
When the LSTM is adopted in the model, the input parameter is configured as 26 and expanded gradually from small to large; all other parameters use the standard LSTM configuration.
The method adopts a Transformer + 5-layer or 8-layer CNN + LSTM hybrid neural network that handles the preprocessing of massive media data well, and the configuration of the neural network parameters is also a key technical means of the method. After the early-stage processing of the data, it is input into the special 8-layer CNN, which further exposes the random constraint characteristics and achieves a good effect. The learning-rate configuration in the hybrid network, the average pooling, the stride design of the network and the size and length design of each stage of the network are all determined according to the particularity of the data, so the design of the network is inventive.
When sharpening data features, a random constraint mechanism is introduced mainly in the data preprocessing stage: randomly constrained symbols are inserted into the spaces between the original English words, and average pooling is then applied to obtain data with distinct layers, with which the model achieves good results. The random constraint features set by the random constraint mechanism change the data structure and increase the differences between features.
The technology of the application provides a deep learning model with better performance while reducing cost; the neural network can be trained and deployed on semantic training data whether on lightweight edge devices such as single-chip microcomputers or in a central data center. The commercial value of the application lies in training on news data sets at large scale and achieving good results at low cost. The combination of big data and artificial intelligence in the cloud will be very common in the future, and once such AI systems are perfected, they will play an important role in judging and analyzing situations such as epidemics under the strong computing power of cloud computing.
The present embodiment further provides a media data processing system, please refer to fig. 4, which includes:
the data preprocessing module is used for acquiring a media data text and converting it into a plurality of equal-length digital sequence data sets according to a predetermined rule;
the modeling module is used for establishing a hybrid neural network based on a Transformer module, a CNN module and an LSTM module, wherein the Transformer module comprises an ENCODER and a DECODER, the CNN module comprises 5 or 8 convolutional layers, and an average pooling layer is arranged between the convolutional layers;
and the model training module is used for dividing the equal-length digital sequence data sets into a positive data set and a negative data set in a ratio of 8:2 and training the hybrid neural network using the positive and negative data sets.
Some embodiments of the present application also provide computer apparatus. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62 and a network interface 63 communicatively connected to each other via a system bus. It is noted that only a computer device 6 having components 61-63 is shown, but it should be understood that not all of the shown components need be implemented; more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash Card (FlashCard), and the like, which are provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various types of application software, such as program codes of a media data processing method. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 62 is typically arranged to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the program code stored in the memory 61 or process data, for example, execute the program code of the media data processing method.
The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
The present application further provides another embodiment, which is a non-transitory computer-readable storage medium storing a media data processing program, the media data processing program being executable by at least one processor to cause the at least one processor to perform the steps of the media data processing method as described above. The non-transitory computer-readable storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
The numbers of the disclosed embodiments in the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, and a program that can be implemented by the hardware and can be instructed by the program to be executed by the relevant hardware may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic or optical disk, and the like.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.
Claims (10)
1. A media data processing method, comprising the steps of:
acquiring a media data text, and converting the media data text into a plurality of digital sequence data sets with equal length according to a preset rule;
establishing a hybrid neural network based on a Transformer module, a CNN module and an LSTM module, wherein the Transformer module comprises an ENCODER and a DECODER, the CNN module comprises 5 or 8 convolutional layers, and an average pooling layer is arranged between the convolutional layers;
dividing the equal-length digital sequence data sets into a positive data set and a negative data set in a ratio of 8:2, and training the hybrid neural network using the positive data set and the negative data set.
2. A method for processing media data according to claim 1, characterized by: the predetermined rule includes translating the media data text into an English text if the media data text is a non-English text.
3. A media data processing method according to claim 2, characterized in that: the predetermined rule includes: the English letters are converted into numbers.
4. A method of processing media data according to claim 2 or 3, characterized in that: the predetermined rule includes converting characters into numbers, wherein the characters include invisible characters and visible characters.
5. A media data processing method according to claim 1, characterized in that the CNN module comprises: a layer-1 CNN whose Max_len parameter is configured to 20, whose hidden_dim hidden layer is configured to 45, whose kernel_size parameter is 3 and whose bias parameter is True, the layer-1 network using an average pooling operation such that, after the layer-1 convolution is completed and before the layer-2 convolution is entered, the data undergoes average pooling and then enters the layer-2 CNN;
a layer-2 CNN whose Max_len parameter is configured to 45, hidden_dim to 75, kernel_size to 3 and bias to True, the average pooling being configured with a stride parameter of None, a padding parameter of 0, a ceil_mode parameter of False and a count_include_pad parameter of True, the layer-2 network applying average pooling and feeding the layer-3 CNN;
a layer-3 CNN whose Max_len parameter is configured to 75, hidden_dim to 105, kernel_size to 3 and bias to True, the average pooling being configured with stride None, padding 0, ceil_mode False and count_include_pad True, the layer-3 network applying average pooling and feeding the layer-4 CNN;
a layer-4 CNN whose Max_len parameter is configured to 105, hidden_dim to 135, kernel_size to 3 and bias to True, the average pooling being configured with stride None, padding 0, ceil_mode False and count_include_pad True, the layer-4 network applying average pooling and feeding the layer-5 CNN;
a layer-5 CNN whose Max_len parameter is configured to 135, hidden_dim to 165, kernel_size to 3 and bias to True, the average pooling being configured with stride None, padding 0, ceil_mode False and count_include_pad True.
6. A media data processing method according to claim 1, characterized in that: the Transformer module has a Self-Attention module for performing sliding-window processing on the equal-length digital sequence data sets.
7. The media data processing method according to any one of claims 1 to 6, wherein the predetermined rule comprises a random constraint mechanism for replacing special characters in the media data text with special numbers.
8. A media data processing system, comprising:
the data preprocessing module is used for acquiring a media data text and converting it into a plurality of equal-length digital sequence data sets according to a predetermined rule;
the modeling module is used for establishing a hybrid neural network based on a Transformer module, a CNN module and an LSTM module, wherein the Transformer module comprises an ENCODER and a DECODER, the CNN module comprises 5 or 8 convolutional layers, and an average pooling layer is arranged between the convolutional layers;
and the model training module is used for dividing the equal-length digital sequence data sets into a positive data set and a negative data set in a ratio of 8:2 and training the hybrid neural network using the positive and negative data sets.
9. A computer device, characterized in that the computer device comprises a processor for executing a program, wherein the program when executed performs the media data processing method of any one of claims 1-7.
10. A non-volatile storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the media data processing method according to any one of the claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211051820.4A CN115422356A (en) | 2022-08-31 | 2022-08-31 | Media data processing method, system, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115422356A (en) | 2022-12-02
Family
ID=84199846
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211051820.4A Pending CN115422356A (en) | 2022-08-31 | 2022-08-31 | Media data processing method, system, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115422356A (en) |
- 2022-08-31: CN application CN202211051820.4A filed; publication CN115422356A (en), status active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3896597A2 (en) | Method, apparatus for text generation, device and storage medium | |
CN108287858B (en) | Semantic extraction method and device for natural language | |
CN110110041A (en) | Wrong word correcting method, device, computer installation and storage medium | |
CN109948149B (en) | Text classification method and device | |
CN112257446B (en) | Named entity recognition method, named entity recognition device, named entity recognition computer equipment and named entity recognition readable storage medium | |
CN110826298B (en) | Statement coding method used in intelligent auxiliary password-fixing system | |
CN113312453B (en) | A model pre-training system for cross-language dialogue understanding | |
US20200311345A1 (en) | System and method for language-independent contextual embedding | |
CN110610180A (en) | Method, device and equipment for generating recognition set of wrongly-recognized words and storage medium | |
CN111984792A (en) | Website classification method and device, computer equipment and storage medium | |
US11615247B1 (en) | Labeling method and apparatus for named entity recognition of legal instrument | |
CN113160917B (en) | Electronic medical record entity relation extraction method | |
CN112528649A (en) | English pinyin identification method and system for multi-language mixed text | |
CN111506726A (en) | Short text clustering method and device based on part-of-speech coding and computer equipment | |
Dilawari et al. | Neural attention model for abstractive text summarization using linguistic feature space | |
CN118551004A (en) | Knowledge retrieval graph-based Chinese dialogue knowledge retrieval method and system | |
CN110633456B (en) | Language identification method, language identification device, server and storage medium | |
Inunganbi et al. | Handwritten Meitei Mayek recognition using three‐channel convolution neural network of gradients and gray | |
CN114842982B (en) | Knowledge expression method, device and system for medical information system | |
CN114780577B (en) | SQL statement generation method, device, equipment and storage medium | |
CN115422356A (en) | Media data processing method, system, computer equipment and storage medium | |
CN118113864A (en) | Text emotion classification method and device, electronic equipment and storage medium | |
CN107784328A (en) | The old character recognition method of German, device and computer-readable recording medium | |
CN114611489A (en) | Text logic condition extraction AI model construction method, extraction method and system | |
CN115617959A (en) | Question answering method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||