CN114780727B

CN114780727B - Text classification method, device, computer equipment and medium based on reinforcement learning

Info

Publication number: CN114780727B
Application number: CN202210433355.4A
Authority: CN
Inventors: 王伟; 张黔; 陈焕坤; 郑毅
Original assignee: China Resources Digital Technology Co Ltd
Current assignee: China Resources Digital Technology Co Ltd
Priority date: 2022-04-24
Filing date: 2022-04-24
Publication date: 2025-02-25
Anticipated expiration: 2042-04-24
Also published as: CN114780727A

Abstract

The embodiment of the application belongs to the technical field of artificial intelligence and relates to a text classification method based on reinforcement learning, which comprises the steps of obtaining training text corpus, extracting semantic features of the training text corpus to obtain semantic feature vectors, inputting the semantic feature vectors into a trained clustering model, outputting semantic clusters, extracting keywords from all the semantic clusters, forming semantic feature queues corresponding to the semantic clusters according to the extracted keywords, selecting keywords from each semantic feature queue as target keywords, generating word semantic vectors based on the target keywords, inputting the word semantic vectors into a pre-built initial classification model for training to obtain a trained target classification model, obtaining texts to be classified, inputting the texts to be classified into the target classification model, and outputting text classification results. The application also provides a text classification device, computer equipment and medium based on reinforcement learning. The application can improve the accuracy of text classification.

Description

Text classification method, device, computer equipment and medium based on reinforcement learning

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to a text classification method, apparatus, computer device, and medium based on reinforcement learning.

Background

Text classification is a common task in the field of natural language understanding, and many approaches have been developed. They can generally be divided into two categories: supervised learning and unsupervised learning. In supervised learning, feature information that represents text semantics is extracted and a model is trained to complete classification. In unsupervised learning, the features of texts are learned autonomously through clustering and similar methods, forming clusters of texts with similar features and thereby completing classification.

However, current text classification models ignore sample imbalance: the model tends to learn the characteristics of the majority categories and to ignore the characteristics of the minority categories, so the learned model overfits easily and text classification is inaccurate.

Disclosure of Invention

The embodiments of the application aim to provide a text classification method, a text classification device, computer equipment and a medium based on reinforcement learning, so as to solve the technical problem in the related art that text classification is inaccurate because of sample imbalance.

In order to solve the above technical problems, the embodiments of the present application provide a text classification method based on reinforcement learning, which adopts the following technical scheme:

acquiring a training text corpus, and extracting semantic features of the training text corpus to obtain a semantic feature vector;

inputting the semantic feature vectors into a trained clustering model, and outputting semantic clusters;

Extracting keywords from all the semantic clusters, and forming a semantic feature queue corresponding to each semantic cluster according to the extracted keywords;

Selecting keywords from each semantic feature queue as target keywords, and generating word sense vectors based on the target keywords;

inputting the word sense vector into a pre-constructed initial classification model for training to obtain a trained target classification model;

and obtaining a text to be classified, inputting the text to be classified into the target classification model, and outputting a text classification result.

Further, before the step of inputting the semantic feature vector into the trained cluster model and outputting the semantic cluster, the method further comprises:

Inputting the semantic feature vector into a pre-constructed neural network model, and outputting a clustering result;

determining a clustering loss function according to the clustering result;

adjusting model parameters of the neural network model based on the cluster loss function;

and when the iteration ending condition is met, generating a clustering model according to the model parameters.

Further, the step of determining a cluster loss function according to the clustering result includes:

calculating the contour coefficient of each cluster in the clustering result;

Obtaining training rewards according to the contour coefficients;

And obtaining the clustering loss function based on the clustering result and the training reward score.

Further, the step of forming a semantic feature queue corresponding to each semantic cluster according to the extracted keywords includes:

calculating the similarity between the keywords of each semantic cluster;

sorting the keywords according to the similarity to obtain a sorting result;

And generating a semantic feature queue corresponding to each semantic cluster based on the sequencing result.

Further, the step of generating a word sense vector based on the target keyword includes:

Extracting features of the target keywords to obtain keyword vectors;

and splicing the keyword vector and the semantic feature vector to obtain a word semantic vector.

Further, the step of inputting the word sense vector into a pre-constructed initial classification model to train and obtaining a trained target classification model includes:

inputting the word sense vector into a pre-constructed initial classification model to obtain a prediction classification result;

determining a classification loss function according to the prediction classification result;

adjusting model parameters of the initial classification model according to the classification loss function;

and when the iteration ending condition is met, generating a target classification model based on the model parameters.

Further, the step of determining a classification loss function according to the prediction classification result includes:

calculating to obtain a classified rewarding value according to the prediction classification result;

And obtaining a classification loss function based on the classification reward value and the prediction classification result.

In order to solve the technical problems, the embodiment of the application also provides a text classification device based on reinforcement learning, which adopts the following technical scheme:

the semantic feature extraction module is used for obtaining training text corpus, and extracting semantic features of the training text corpus to obtain semantic feature vectors;

The clustering module is used for inputting the semantic feature vectors into the trained clustering model and outputting semantic clusters;

The keyword extraction module is used for extracting keywords from all the semantic clusters and forming a semantic feature queue corresponding to each semantic cluster according to the extracted keywords;

The vector generation module is used for selecting keywords from each semantic feature queue as target keywords and generating word sense vectors based on the target keywords;

the training module is used for inputting the word meaning vector into a pre-constructed initial classification model for training to obtain a trained target classification model;

The classification module is used for acquiring texts to be classified, inputting the texts to be classified into the target classification model, and outputting text classification results.

In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:

The computer device includes a memory having stored therein computer readable instructions which when executed implement the steps of the reinforcement learning based text classification method described above.

In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:

The computer readable storage medium has stored thereon computer readable instructions which when executed by a processor implement the steps of the reinforcement learning based text classification method as described above.

Compared with the prior art, the embodiment of the application has the following main beneficial effects:

In the present application, a training text corpus is obtained and semantic features are extracted from it to obtain semantic feature vectors; the semantic feature vectors are input into a trained clustering model, which outputs semantic clusters; keywords are extracted from all the semantic clusters, and a semantic feature queue corresponding to each semantic cluster is formed according to the extracted keywords; keywords are selected from each semantic feature queue as target keywords, and word semantic vectors are generated based on the target keywords; the word semantic vectors are input into a pre-built initial classification model for training to obtain a trained target classification model; and a text to be classified is obtained and input into the target classification model, which outputs a text classification result. By extracting semantic features from the training text corpus, clustering the extracted semantic feature vectors into semantic clusters of different categories, and training the classification model with these semantic clusters, the classification model learns the semantic features of the different categories in the training text corpus, so that text classification accuracy can be improved.

Drawings

In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of a reinforcement learning based text classification method in accordance with the present application;

FIG. 3 is a schematic diagram of an embodiment of a reinforcement learning based text classification device in accordance with the present application;

FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs, the terms used in the description herein are used for the purpose of describing particular embodiments only and are not intended to limit the application, and the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the above description of the drawings are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.

The application provides a text classification method based on reinforcement learning, which relates to artificial intelligence and can be applied to a system architecture 100 shown in fig. 1, wherein the system architecture 100 can comprise terminal equipment 101, 102 and 103, a network 104 and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that, the text classification method based on reinforcement learning provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the text classification device based on reinforcement learning is generally disposed in the server/terminal device.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to fig. 2, a flow chart of one embodiment of a reinforcement learning based text classification method according to the present application is shown, comprising the steps of:

Step S201, obtaining a training text corpus, and extracting semantic features of the training text corpus to obtain semantic feature vectors.

The training text corpus may be obtained from public data sets, including, but not limited to, a Chinese news data set, the THUCNews data set, the online_shopping_10_cats data set, and the like. An original text corpus is obtained from a public data set and preprocessed, for example by word segmentation and stop word removal. The preprocessed corpus is then randomly divided into a training set and a test set according to a preset proportion; the training set, which is a set of texts, serves as the training text corpus.

In this embodiment, semantic feature extraction may be performed on the training text corpus using a semantic feature extraction model, which includes, but is not limited to, a CNN (Convolutional Neural Network) model, an RNN (Recurrent Neural Network) model, an LSTM (Long Short-Term Memory) model, a BERT (Bidirectional Encoder Representations from Transformers) model, and the like.
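The random division by a preset proportion can be illustrated with a minimal sketch. The function name `split_corpus`, the 0.8 ratio and the fixed seed are assumptions made for illustration only; the text does not specify them.

```python
import random

def split_corpus(texts, train_ratio=0.8, seed=0):
    """Shuffle a copy of the corpus and split it by a preset proportion.

    train_ratio and seed are illustrative assumptions, not values from the text.
    """
    shuffled = list(texts)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Toy corpus standing in for preprocessed (segmented, stop-word-free) texts.
corpus = [f"document {i}" for i in range(10)]
train_set, test_set = split_corpus(corpus, train_ratio=0.8)
```

Here `train_set` plays the role of the training text corpus and `test_set` is held out for evaluation.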

As a specific implementation, the training text corpus may be input to a BERT-based pre-training language model for semantic feature extraction.

Step S202, inputting the semantic feature vectors into a trained clustering model, and outputting semantic clusters.

In this embodiment, the semantic feature vectors may be clustered by a trained clustering model, where the clustering algorithm used by the clustering model includes, but is not limited to, a K-means algorithm, and a Single-Pass algorithm.

Taking a Single-Pass algorithm as an example, the clustering process is described in detail, and the steps comprise:

Step A, selecting any semantic feature vector as the cluster center of a first class cluster;

Step B, selecting any one semantic feature vector from the unprocessed semantic feature vectors, calculating the similarity value between this semantic feature vector and each of the existing class clusters, selecting the class cluster with the largest similarity value as the nearest class cluster of the semantic feature vector, and obtaining the similarity value of the nearest class cluster.

It should be appreciated that when the algorithm starts there is only one class cluster, namely the one generated in step A, and at that point "all existing class clusters" refers to that cluster alone. As the algorithm runs, each semantic feature vector either creates a new class cluster or is assigned to an existing one, so the number of class clusters grows, and "all existing class clusters" refers to all class clusters generated so far.

Step C, comparing the similarity value of the nearest class cluster with a similarity threshold: if the similarity value is larger than the similarity threshold, assigning the semantic feature vector selected in step B to the nearest class cluster and updating the center of the nearest class cluster; otherwise, taking the semantic feature vector selected in step B as the cluster center of a new class cluster.

In this embodiment, the center of a class cluster is the average of the semantic feature vectors in the class cluster, calculated as follows:

$$C = \frac{1}{n}\sum_{i=1}^{n} d_i$$

where $C$ represents the centroid vector of the class cluster, $n$ represents the number of semantic feature vectors in the class cluster, and $d_i$ represents the i-th semantic feature vector in the class cluster.

Updating means recalculating the average of the semantic feature vectors of the class cluster after one semantic feature vector has been added.

Step D, judging whether all semantic feature vectors in the set of semantic feature vectors to be processed have been processed; if not, returning to step B to select the next unprocessed semantic feature vector; otherwise, outputting the clustering result.
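Steps A to D can be sketched as follows. This is a minimal illustration, assuming cosine similarity as the similarity measure and processing the vectors in input order; the names `single_pass`, `cosine` and `mean_vector` and the threshold value are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Cluster center: the average of the member vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def single_pass(vectors, threshold):
    clusters = []  # each cluster: {"members": [...], "center": [...]}
    for vec in vectors:
        if not clusters:
            # Step A: the first vector becomes the center of the first cluster.
            clusters.append({"members": [vec], "center": list(vec)})
            continue
        # Step B: find the nearest existing cluster by similarity to its center.
        best = max(clusters, key=lambda c: cosine(vec, c["center"]))
        # Step C: join the nearest cluster or open a new one.
        if cosine(vec, best["center"]) > threshold:
            best["members"].append(vec)
            best["center"] = mean_vector(best["members"])
        else:
            clusters.append({"members": [vec], "center": list(vec)})
    # Step D: every vector has been processed; return the clustering result.
    return clusters

points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = single_pass(points, threshold=0.8)
```

With these toy points the two near-horizontal vectors fall into one cluster and the two near-vertical vectors into another. Because Single-Pass is order dependent, a different input order can give a different result.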

Step S203, extracting keywords from all the semantic clusters, and forming a semantic feature queue corresponding to each semantic cluster according to the extracted keywords.

In this embodiment, a preset number of keywords are extracted for each semantic cluster. Methods for extracting keywords include, but are not limited to, the TF-IDF (term frequency-inverse document frequency) algorithm, the LDA algorithm, and the like.

The extracted keywords are sorted by similarity to form a semantic feature queue. Specifically, the similarity between the keywords of each semantic cluster is calculated, the keywords are ranked according to the similarity to obtain a ranking result, and a semantic feature queue corresponding to each semantic cluster is generated based on the ranking result.
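As a rough illustration of TF-IDF keyword extraction per cluster, the sketch below treats the tokens of each semantic cluster as one document and computes inverse document frequency across clusters. That treatment, the function name `top_keywords` and the value of `k` are assumptions; the text only names TF-IDF as one possible algorithm.

```python
import math
from collections import Counter

def top_keywords(cluster_tokens, k=2):
    """Return the k highest-scoring TF-IDF terms for each cluster.

    cluster_tokens: one non-empty token list per semantic cluster (assumed layout).
    """
    n_clusters = len(cluster_tokens)
    # Document frequency: the number of clusters in which each term appears.
    df = Counter(term for tokens in cluster_tokens for term in set(tokens))
    result = []
    for tokens in cluster_tokens:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_clusters / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result

clusters = [
    ["match", "goal", "team", "goal", "news"],
    ["stock", "market", "stock", "news", "bank"],
]
keywords = top_keywords(clusters, k=2)
```

The term "news" appears in both clusters, so its inverse document frequency is zero and it is never selected, which is the behavior TF-IDF is meant to give for terms that do not distinguish clusters.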

The similarity between the keywords of each semantic cluster may be computed with a similarity algorithm including, but not limited to, cosine similarity, Levenshtein distance, and the like. After the similarity between the keywords has been calculated, the keywords are arranged in descending order of similarity, forming a semantic feature queue for the semantic cluster in which similarity runs from high to low.
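One possible way to build such a queue is sketched below. The text does not state which pairs of keywords are compared, so the sketch assumes each keyword is scored by its mean cosine similarity to the other keywords of the same cluster; `build_feature_queue` and the toy embeddings are hypothetical.

```python
import math
from collections import deque

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_feature_queue(keyword_vectors):
    """keyword_vectors: dict mapping each keyword of one cluster to its vector."""
    words = list(keyword_vectors)

    def mean_similarity(word):
        # ASSUMPTION: score = average similarity to the cluster's other keywords.
        others = [w for w in words if w != word]
        if not others:
            return 0.0
        total = sum(cosine(keyword_vectors[word], keyword_vectors[w]) for w in others)
        return total / len(others)

    # Highest similarity first, matching a queue ordered from high to low.
    return deque(sorted(words, key=mean_similarity, reverse=True))

vectors = {"goal": [1.0, 0.0], "match": [0.8, 0.6], "bank": [0.0, 1.0]}
queue = build_feature_queue(vectors)
```

In this toy example "match" is the keyword most similar to the rest of the cluster, so it sits at the head of the queue.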

Step S204, selecting keywords from each semantic feature queue as target keywords, and generating word sense vectors based on the target keywords.

In this embodiment, the text classification model is trained by reinforcement learning. Before training, the following definitions are made:

Before each training round, a keyword is randomly selected from each semantic feature queue, the corresponding vector is obtained using the semantic feature extraction model, and the vector is multiplied by a preset coefficient.

The classification reward is defined as the predicted label class value divided by the correct label value of the sample, multiplied by the reciprocal of the proportion of that class in the whole sample set. For example, assume that the samples have 5 categories in total, of which category 1 accounts for 1/10, category 2 for 1/5, category 3 for 1/4, category 4 for 1/3 and category 5 for 7/60. Then the reward coefficient of category 1 is 10, that of category 2 is 5, that of category 3 is 4, that of category 4 is 3, and that of category 5 is 60/7, or about 8.57.
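The reward coefficients in this example are simply the reciprocals of the class proportions, which can be checked with a few lines. The sketch covers only the coefficient part of the reward definition, and the function name `reward_coefficients` is hypothetical.

```python
from fractions import Fraction

def reward_coefficients(class_proportions):
    """Reward coefficient of each class: the reciprocal of its sample share."""
    return {label: 1 / share for label, share in class_proportions.items()}

# Class proportions taken from the worked example in the text.
proportions = {
    1: Fraction(1, 10),
    2: Fraction(1, 5),
    3: Fraction(1, 4),
    4: Fraction(1, 3),
    5: Fraction(7, 60),
}
coefficients = reward_coefficients(proportions)
```

Rarer classes receive larger coefficients, so a correct prediction on a minority class contributes more reward than one on a majority class.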

Before each round of training, a preset number of keywords are randomly selected from each semantic feature queue to serve as target keywords, and specifically, one keyword is randomly selected.

And extracting the characteristics of the target keywords, obtaining vectors corresponding to the target keywords by using the semantic characteristic extraction model, and multiplying the vectors by preset coefficients to obtain word semantic vectors corresponding to the target keywords.

It should be noted that, the preset coefficient is the reciprocal of the absolute value of the contour coefficient of the semantic cluster where the target keyword is located.

And splicing the keyword vector and the semantic feature vector obtained through the semantic feature extraction model to obtain a word semantic vector.

In the embodiment, the keyword vectors and the semantic feature vectors selected from different semantic clusters are spliced, and the obtained word semantic vectors are used for training the classification model, so that the classification model can learn different semantic features in the training sample, and the accuracy of model classification is improved.
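A minimal sketch of this construction follows, assuming the keyword vector is first scaled by the preset coefficient and then concatenated with the semantic feature vector. The order of concatenation and the name `word_sense_vector` are assumptions.

```python
def word_sense_vector(keyword_vec, semantic_vec, contour_coefficient):
    """Scale the keyword vector and splice it with the semantic feature vector.

    The preset coefficient is the reciprocal of the absolute value of the
    contour coefficient of the keyword's semantic cluster, as stated in the text.
    A contour coefficient of exactly zero is not handled in this sketch.
    """
    coeff = 1.0 / abs(contour_coefficient)
    scaled = [coeff * x for x in keyword_vec]
    # ASSUMPTION: keyword part first, semantic feature part second.
    return scaled + list(semantic_vec)

vec = word_sense_vector([0.2, 0.4], [1.0, 2.0, 3.0], contour_coefficient=-0.5)
```

The resulting vector has the combined dimensionality of both inputs, so the input layer of the classification model has to be sized accordingly.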

Step S205, inputting the word sense vector into a pre-constructed initial classification model for training to obtain a trained target classification model.

In this embodiment, the pre-built initial classification model is a multi-layer neural network model with N actions corresponding to the classification, the word sense vector is input into the multi-layer neural network model, and training and updating are performed on the multi-layer neural network model according to the classification reward value, so that the classification reward value is maximized, where N is a natural number greater than zero.

As a specific implementation, the multi-layer neural network model comprises an input layer, a first hidden layer, a second hidden layer and an output layer. The input layer receives the word sense vector $v$. The first hidden layer has weight matrix $w_1$ and bias $b_1$ and uses a ReLU activation function, so its output is $o_1 = \mathrm{relu}(w_1 v + b_1)$. The second hidden layer has weight matrix $w_2$ and bias $b_2$ and also uses a ReLU activation function, so its output is $o_2 = \mathrm{relu}(w_2 o_1 + b_2)$. The output layer is a softmax layer: $o_2$ is input into the softmax layer to obtain $o_3$, and $o_3$ is the class probability $Pa$ predicted in each training round.

In order to achieve better text classification, more hidden layers can be set according to the actual situation.
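The forward pass through the two hidden layers and the softmax layer can be sketched in plain Python. The weights below are toy identity matrices chosen only so the result is easy to follow; they are not values from the text.

```python
import math

def relu(xs):
    return [max(0.0, x) for x in xs]

def affine(w, x, b):
    """Compute w @ x + b for a weight matrix given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def forward(v, w1, b1, w2, b2):
    o1 = relu(affine(w1, v, b1))   # first hidden layer
    o2 = relu(affine(w2, o1, b2))  # second hidden layer
    return softmax(o2)             # o3: class probabilities

v = [1.0, -1.0]
w1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0]]
b2 = [0.0, 0.0]
probs = forward(v, w1, b1, w2, b2)
```

As in the description, the softmax layer is applied directly to the second hidden layer output, so the width of that layer equals the number of classes N.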

Step S206, obtaining a text to be classified, inputting the text to be classified into a target classification model, and outputting a text classification result.

And acquiring the text to be classified, inputting the text to be classified into the target classification model, and outputting a text classification result.

According to the method, the semantic feature extraction is carried out on the training text corpus, the extracted semantic feature vectors are clustered to obtain different semantic clusters, and the classification model is trained according to the different semantic clusters, so that the classification model learns the semantic features of different categories in the training text corpus, and the text classification accuracy can be improved.

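In some optional implementations of this embodiment, before the step of inputting the semantic feature vector into the trained clustering model and outputting the semantic cluster, the method further includes:

inputting the semantic feature vector into a pre-constructed neural network model, and outputting a clustering result;

determining a clustering loss function according to the clustering result;

adjusting model parameters of the neural network model based on the clustering loss function;

and when the iteration end condition is met, generating a clustering model according to the model parameters.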

In this embodiment, the pre-built neural network model may have the same structure as the classification model, comprising an input layer, a first hidden layer, a second hidden layer and an output layer. The input layer receives the semantic feature vector $x$. The first hidden layer has weight matrix $w_1$ and bias $b_1$ and uses a ReLU activation function, so its output is $o_1 = \mathrm{relu}(w_1 x + b_1)$. The second hidden layer has weight matrix $w_2$ and bias $b_2$ and also uses a ReLU activation function, so its output is $o_2 = \mathrm{relu}(w_2 o_1 + b_2)$. The output layer is a softmax layer: $o_2$ is input into the softmax layer to obtain $o_3$, and $o_3$ is the probability $Pc$ of each action. A different number of hidden layers can be set according to actual needs.

In this embodiment, model parameters of the neural network model are adjusted based on the loss function, and when the iteration end condition is satisfied, a cluster model is generated according to the model parameters.

Specifically, the model parameters of the neural network model are adjusted based on the value of the loss function, and iterative training continues until the loss function value hardly changes, that is, until the model converges; at that point the performance of the model is considered to have reached its best state. Convergence satisfies the iteration end condition, and the final neural network model, with the finally adjusted model parameters, is output as the clustering model.

According to the method, the neural network model pre-constructed through reinforcement learning training is used as a clustering model, so that the clustering precision can be improved, and the text classification efficiency can be improved.

In some optional implementations, the step of determining the cluster loss function according to the clustering result includes:

Calculating the contour coefficient of each cluster in the clustering result;

obtaining training reward points according to the contour coefficients;

and obtaining a clustering loss function based on the clustering result and the training reward score.

The contour coefficient here is the silhouette coefficient, an index commonly used in clustering algorithms to measure the quality of a clustering result.

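For each semantic feature vector $o$ in cluster $D$, the average distance $a(o)$ between $o$ and the other objects in the cluster to which $o$ belongs is calculated as follows:

$$a(o) = \frac{\sum_{o' \in D,\, o' \neq o} \mathrm{dist}(o, o')}{|D| - 1}$$

$b(o)$ is the minimum average distance from $o$ to all clusters not containing $o$, calculated as follows:

$$b(o) = \min_{D_j:\, o \notin D_j} \frac{\sum_{o' \in D_j} \mathrm{dist}(o, o')}{|D_j|}$$

The contour coefficient is:

$$s(o) = \frac{b(o) - a(o)}{\max\{a(o),\, b(o)\}}$$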
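The contour (silhouette) coefficient described here can be computed as in the following sketch, which assumes Euclidean distance and takes the coefficient of a cluster to be the mean over its members. Both choices, and the function names, are assumptions; the text does not fix the distance measure.

```python
import math

def point_coefficient(clusters, ci, pi):
    """Contour coefficient s(o) of point pi in cluster ci."""
    o = clusters[ci][pi]
    own = [p for j, p in enumerate(clusters[ci]) if j != pi]
    if not own:
        return 0.0  # common convention for a single-member cluster
    # a(o): mean distance to the other members of o's own cluster.
    a = sum(math.dist(o, p) for p in own) / len(own)
    # b(o): smallest mean distance to any cluster that does not contain o.
    b = min(
        sum(math.dist(o, p) for p in c) / len(c)
        for k, c in enumerate(clusters)
        if k != ci
    )
    denom = max(a, b)
    return (b - a) / denom if denom else 0.0

def cluster_coefficient(clusters, ci):
    """ASSUMPTION: a cluster's coefficient is the mean of its members' s(o)."""
    n = len(clusters[ci])
    return sum(point_coefficient(clusters, ci, pi) for pi in range(n)) / n

clusters = [[(0.0, 0.0), (0.0, 1.0)], [(3.0, 0.0), (3.0, 1.0)]]
s = point_coefficient(clusters, 0, 0)
s_cluster = cluster_coefficient(clusters, 0)
```

Values close to 1 indicate a point that sits well inside its own cluster, while values near 0 or below indicate a point on or past the boundary with a neighboring cluster.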

In this embodiment, the clustering model is trained by reinforcement learning. Before training, the following definition is made:

The reward value is defined as follows: when the contour coefficient is less than the preset threshold $Tg$, a reward value of $1 + 1/|s(o)|$ is given; otherwise, a reward value of $-|s(o)|$ is given.

At the end of each training period, the training reward score $S_1$ accumulated up to that point in time is calculated by the following formula:

where $\gamma$ is the gain attenuation coefficient, $n$ is the number of training periods, $i = 1$ to $(n-1)$, and $S_t$ is the reward score obtained in the t-th training period; that is, $S_t = 1 + 1/|s(o)|$ when the contour coefficient is smaller than the preset threshold $Tg$, and $S_t = -|s(o)|$ otherwise.

In an embodiment, obtaining the clustering loss function based on the clustering result and the training reward score specifically includes:

The logarithm of the clustering result is calculated and multiplied by the training reward score, and the negative of the product is taken as the clustering loss function. The clustering loss function is therefore calculated as:

$$\mathrm{Loss} = -S_1 \times \log Pc_t$$

where $Pc_t$ represents the probability of the action in the t-th round.
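The per-period reward and the clustering loss can be illustrated as follows. In this sketch the accumulated score $S_1$ is passed in as a plain number rather than computed from the per-period rewards, and the threshold and probability values are illustrative assumptions.

```python
import math

def period_reward(s, threshold):
    """Reward for one training period, following the definition in the text:
    1 + 1/|s| when the contour coefficient s is below the threshold Tg,
    and -|s| otherwise.
    """
    return 1 + 1 / abs(s) if s < threshold else -abs(s)

def cluster_loss(score, action_prob):
    """Loss = -S1 * log(Pc_t) for a single round."""
    return -score * math.log(action_prob)

# Illustrative values only: Tg = 0.6, two contour coefficients, Pc_t = 0.5.
r_low = period_reward(0.5, threshold=0.6)
r_high = period_reward(0.8, threshold=0.6)
loss = cluster_loss(score=3.0, action_prob=0.5)
```

For a positive score, the loss falls as the probability of the chosen action rises, which is the usual shape of a reward-weighted log-likelihood objective.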

In this embodiment, the model is iteratively updated using the clustering loss function obtained from the clustering result and the training reward score, so that the training reward score is maximized and the accuracy of the model is ensured.

In some optional implementations of this embodiment, the step of inputting the word sense vector into the pre-constructed initial classification model for training to obtain the trained target classification model includes:

inputting the word sense vector into a pre-constructed initial classification model to obtain a prediction classification result;

determining a classification loss function according to the prediction classification result;

adjusting model parameters of the initial classification model according to the classification loss function;

and when the iteration end condition is met, generating a target classification model based on the model parameters.

Specifically, the model parameters of the initial classification model are adjusted based on the value of the classification loss function, and iterative training continues until the loss function value hardly changes, that is, until the model converges; at that point the performance of the model is considered to have reached its best state. Convergence satisfies the iteration end condition, and the final classification model, with the finally adjusted model parameters, is output as the target classification model.

In this embodiment, the classification model is trained with word sense vectors obtained by splicing the keyword vectors of different semantic clusters with the semantic feature vectors, so that the classification model can learn the different categories and the implicit semantic features in the training text corpus, further improving the accuracy of text classification.

In this embodiment, the step of determining the classification loss function according to the prediction classification result includes:

calculating a classification reward value according to the prediction classification result;

obtaining a classification loss function based on the classification reward value and the prediction classification result.

Specifically, the logarithm of the prediction classification result $Pa$ is calculated and multiplied by the classification reward value $S_2$, and the negative of the product is taken as the classification loss function. The classification loss function is therefore calculated as:

$$\mathrm{Loss} = -S_2 \times \log Pa_t$$

where $Pa_t$ represents the classification label probability output by training in the t-th round.

In this embodiment, the model is iteratively updated using the classification loss function obtained from the prediction classification result and the classification reward value, so that the classification reward value is maximized, the accuracy of the classification model is ensured, and the accuracy of text classification is improved.
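The classification loss formula (the negative of the classification reward value S₂ multiplied by the logarithm of the label probability Pa_t) can be illustrated with a short sketch. The function name and the small clamp `eps`, which guards against taking the logarithm of zero, are assumptions added for illustration; how S₂ itself is obtained is taken as given here.

```python
import math


def classification_loss(reward: float, label_prob: float, eps: float = 1e-12) -> float:
    """Loss = -S2 * log(Pa_t), with the probability clamped to [eps, 1]."""
    p = min(max(label_prob, eps), 1.0)
    return -reward * math.log(p)


# With a positive reward, a higher label probability gives a lower loss.
confident = classification_loss(reward=2.0, label_prob=0.9)
uncertain = classification_loss(reward=2.0, label_prob=0.1)
```

Because the loss scales with the reward, minimizing it pushes up the probability of outputs that earned a large positive reward, which is consistent with the stated goal of maximizing the classification reward value.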

The application is operational with numerous general purpose or special purpose computer system environments or configurations, such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, programmable consumer electronics, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by computer readable instructions stored in a computer readable storage medium; when executed, the instructions may include the steps of the method embodiments described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or it may be a random access memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

With further reference to FIG. 3, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a text classification apparatus based on reinforcement learning. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus is applicable to various electronic devices.

As shown in FIG. 3, the text classification device 300 based on reinforcement learning according to the present embodiment includes a semantic feature extraction module 301, a clustering module 302, a keyword extraction module 303, a vector generation module 304, a training module 305, and a classification module 306, wherein:

the semantic feature extraction module 301 is configured to obtain a training text corpus, and perform semantic feature extraction on the training text corpus to obtain a semantic feature vector;

the clustering module 302 is configured to input the semantic feature vector into a trained clustering model, and output semantic clusters;

the keyword extraction module 303 is configured to extract keywords from all the semantic clusters, and form a semantic feature queue corresponding to each semantic cluster according to the extracted keywords;

the vector generation module 304 is configured to select a keyword from each semantic feature queue as a target keyword, and generate a word semantic vector based on the target keyword;

the training module 305 is configured to input the word semantic vector into a pre-constructed initial classification model for training, so as to obtain a trained target classification model;

and the classification module 306 is configured to obtain a text to be classified, input the text to be classified into the target classification model, and output a text classification result.

According to the text classification device based on reinforcement learning, semantic feature extraction is performed on the training text corpus, the extracted semantic feature vectors are clustered to obtain semantic clusters of different categories, and the classification model is trained by using the semantic clusters of different categories, so that the classification model learns the semantic features of different categories in the training text corpus, and the text classification accuracy can be improved.
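The division of the device into modules 301 to 306 can be pictured as a simple pipeline. The sketch below is only one hypothetical way to wire such modules together; the class name, field names, and callable signatures are assumptions made for illustration, and each module is left as a pluggable function rather than a concrete model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Vector = List[float]


@dataclass
class TextClassificationDevice:
    """Hypothetical wiring of modules 301-306; every field is a pluggable callable."""

    extract_features: Callable[[List[str]], List[Vector]]                        # module 301
    cluster: Callable[[List[Vector]], List[int]]                                 # module 302
    build_queues: Callable[[List[str], List[int]], Dict[int, List[str]]]         # module 303
    build_vectors: Callable[[Dict[int, List[str]], List[Vector]], List[Vector]]  # module 304
    train: Callable[[List[Vector]], Callable[[str], str]]                        # module 305
    model: Optional[Callable[[str], str]] = None

    def fit(self, corpus: List[str]) -> "TextClassificationDevice":
        """Run modules 301-305 in order and keep the trained classifier."""
        features = self.extract_features(corpus)
        cluster_ids = self.cluster(features)
        queues = self.build_queues(corpus, cluster_ids)
        word_vectors = self.build_vectors(queues, features)
        self.model = self.train(word_vectors)
        return self

    def classify(self, text: str) -> str:                                        # module 306
        """Classify a new text with the trained target classification model."""
        if self.model is None:
            raise RuntimeError("call fit() before classify()")
        return self.model(text)
```

Keeping each module behind a callable makes it straightforward to swap in a different feature extractor or clustering model without touching the rest of the pipeline.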

In some optional implementations of the present embodiment, the reinforcement learning-based text classification device 300 further includes a cluster training module including a clustering sub-module, a calculation sub-module, an adjustment sub-module, and a generation sub-module, wherein:

the clustering sub-module is used for inputting the semantic feature vector into a pre-constructed neural network model and outputting a clustering result;

the calculation sub-module is used for determining a clustering loss function according to the clustering result;

the adjustment sub-module is used for adjusting model parameters of the neural network model based on the clustering loss function;

and the generation sub-module is used for generating a clustering model according to the model parameters when the iteration end condition is met.

In this embodiment, the pre-constructed neural network model, trained through reinforcement learning, is used as the clustering model, which improves the clustering precision and in turn the text classification efficiency.

In this embodiment, the calculation sub-module is further configured to:

calculate the silhouette coefficient of each cluster in the clustering result;

obtain a training reward according to the silhouette coefficients;
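The per-cluster coefficient referred to above (rendered in some translations as "contour coefficient", commonly known as the silhouette coefficient) can be computed as in the following sketch. The standard silhouette definition with Euclidean distance is used; taking the training reward as the mean of the per-cluster coefficients is an assumption made for illustration, since the text does not state how the reward is derived from them.

```python
import math
from typing import Dict, List, Sequence


def cluster_silhouettes(points: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, float]:
    """Mean silhouette coefficient of each cluster, using Euclidean distance."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores: Dict[int, List[float]] = {lab: [] for lab in clusters}
    for i, lab in enumerate(labels):
        own = clusters[lab]
        # By convention a singleton cluster (or a single-cluster result) scores 0.
        if len(own) == 1 or len(clusters) < 2:
            scores[lab].append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores[lab].append((b - a) / denom if denom > 0 else 0.0)
    return {lab: sum(vals) / len(vals) for lab, vals in scores.items()}


def training_reward(per_cluster: Dict[int, float]) -> float:
    """Assumed reward: the average silhouette coefficient over all clusters."""
    return sum(per_cluster.values()) / len(per_cluster)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = training_reward(cluster_silhouettes(points, [0, 0, 1, 1]))  # well-separated clusters
bad = training_reward(cluster_silhouettes(points, [0, 1, 0, 1]))   # clusters mixed together
```

A coefficient close to 1 indicates compact, well-separated clusters, so a reward built from it grows as the clustering improves.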

In some optional implementations of the present embodiment, the keyword extraction module 303 includes a similarity calculation sub-module, a sorting sub-module, and a generation sub-module, wherein:

the similarity calculation sub-module is used for calculating the similarity between the keywords of each semantic cluster;

the sorting sub-module is used for sorting the keywords according to the similarity to obtain a sorting result;

and the generation sub-module is used for generating a semantic feature queue corresponding to each semantic cluster based on the sorting result.
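The similarity, sorting, and queue-generation steps can be sketched as below for a single semantic cluster. The text does not fix the similarity measure or the sort key, so this sketch assumes cosine similarity between keyword vectors and ranks each keyword by its mean similarity to the other keywords in the cluster; the function names and the example vectors are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_feature_queue(keyword_vectors: Dict[str, Sequence[float]]) -> Deque[str]:
    """Sort one cluster's keywords by mean similarity to the others, most similar first."""
    words = list(keyword_vectors)

    def score(word: str) -> float:
        others = [o for o in words if o != word]
        if not others:
            return 0.0
        return sum(cosine(keyword_vectors[word], keyword_vectors[o]) for o in others) / len(others)

    return deque(sorted(words, key=score, reverse=True))


# Hypothetical 2-dimensional keyword vectors for one semantic cluster.
queue = build_feature_queue({
    "loan": (1.0, 0.0),
    "credit": (0.9, 0.1),
    "weather": (0.0, 1.0),
})
```

With a queue ordered this way, selecting a target keyword from the head of the queue picks the keyword most representative of its cluster.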

In this embodiment, the vector generation module 304 includes an extraction sub-module and a splicing sub-module, wherein:

the extraction sub-module is used for performing feature extraction on the target keyword to obtain a keyword vector;

and the splicing sub-module is used for splicing the keyword vector and the semantic feature vector to obtain a word semantic vector.
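Splicing is read here as plain vector concatenation, which is an assumption; the text does not specify the order of the two parts or any further transformation. A minimal sketch with hypothetical names:

```python
from typing import List, Sequence


def splice(keyword_vector: Sequence[float], semantic_vector: Sequence[float]) -> List[float]:
    """Concatenate the keyword vector and the semantic feature vector (keyword part first)."""
    return list(keyword_vector) + list(semantic_vector)


word_semantic_vector = splice([0.2, 0.8], [0.5, 0.1, 0.4])
```

The resulting vector has a length equal to the sum of the two input lengths, so the classification model's input layer would need to be sized accordingly.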

In some alternative implementations of the present embodiment, the training module 305 includes a classification sub-module, a calculation sub-module, an adjustment sub-module, and an output sub-module, where:

the classification sub-module is used for inputting the word semantic vector into a pre-constructed initial classification model to obtain a prediction classification result;

the calculation sub-module is used for determining a classification loss function according to the prediction classification result;

the adjustment sub-module is used for adjusting model parameters of the initial classification model according to the classification loss function;

and the output sub-module is used for generating a target classification model based on the model parameters when the iteration end condition is met.

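In this embodiment, the calculation sub-module is further configured to derive the classification loss function based on the classification reward value and the prediction classification result.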

In order to solve the above technical problems, an embodiment of the application also provides a computer device. Referring specifically to FIG. 4, FIG. 4 is a block diagram of the basic structure of the computer device according to this embodiment.

The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only a computer device 4 having components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The computer device can interact with a user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.

The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store the operating system and the various application software installed on the computer device 4, such as the computer readable instructions of the text classification method based on reinforcement learning. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the text classification method based on reinforcement learning.

The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.

In the computer device provided by this embodiment, the steps of the text classification method based on reinforcement learning in the above embodiments are implemented when the processor executes the computer readable instructions stored in the memory: semantic feature extraction is performed on the training text corpus, the extracted semantic feature vectors are clustered to obtain different semantic clusters, and the classification model is trained according to the different semantic clusters, so that the accuracy of text classification can be improved.

The application also provides another embodiment, namely a computer readable storage medium. The computer readable storage medium stores computer readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the text classification method based on reinforcement learning described above. By extracting semantic features from the training text corpus, clustering the extracted semantic feature vectors to obtain different semantic clusters, and training a classification model according to the different semantic clusters, the accuracy of text classification can be improved.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods according to the embodiments of the present application.

It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the application but do not limit the scope of the patent claims. The application may be embodied in many different forms; these embodiments are provided so that the disclosure will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their elements. All equivalent structures made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims

1. A text classification method based on reinforcement learning, comprising the steps of:

Obtaining a text to be classified, inputting the text to be classified into the target classification model, and outputting a text classification result;

wherein, before the step of inputting the semantic feature vector into the trained cluster model and outputting the semantic cluster, the method further comprises the following steps:

determining a clustering loss function according to the clustering result;

the step of determining a clustering loss function according to the clustering result comprises the following steps:

calculating the silhouette coefficient of each cluster in the clustering result;

obtaining a training reward according to the silhouette coefficients;

2. The reinforcement learning based text classification method of claim 1, further comprising, prior to said step of inputting said semantic feature vectors into a trained cluster model, outputting semantic clusters:

3. The reinforcement learning-based text classification method of claim 1, wherein said step of forming a semantic feature queue corresponding to each of said semantic clusters according to the extracted keywords comprises:

calculating the similarity between the keywords of each semantic cluster;

sorting the keywords according to the similarity to obtain a sorting result;

4. The reinforcement learning based text classification method of claim 1, wherein said step of generating a word semantic vector based on said target keyword comprises:

extracting features of the target keyword to obtain a keyword vector;

5. The reinforcement learning based text classification method of claim 1, wherein said step of inputting said word semantic vector into a pre-constructed initial classification model for training to obtain a trained target classification model comprises:

6. The reinforcement learning based text classification method of claim 5, wherein said step of determining a classification loss function based on said predictive classification result comprises:

7. A reinforcement learning-based text classification device, comprising:

the classification module is used for acquiring texts to be classified, inputting the texts to be classified into the target classification model and outputting text classification results;

The text classification device based on reinforcement learning further comprises a clustering training module, the clustering training module comprising a clustering sub-module and a calculation sub-module, wherein:

the calculation sub-module is used for determining a clustering loss function according to the clustering result;

the calculation sub-module is further configured for:

calculating the silhouette coefficient of each cluster in the clustering result;

obtaining a training reward according to the silhouette coefficients;

8. A computer device comprising a memory having stored therein computer readable instructions which when executed implement the steps of the reinforcement learning based text classification method of any of claims 1 to 6.

9. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the reinforcement learning based text classification method of any of claims 1 to 6.