Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms used in the description are for the purpose of describing particular embodiments only and are not intended to limit the application. The terms "comprising" and "having", and any variations thereof, in the description, the claims and the above description of the drawings are intended to cover a non-exclusive inclusion. The terms "first", "second" and the like in the description, the claims and the above-described figures are used to distinguish between different objects and not necessarily to describe a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions of the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
The application provides a text classification method based on reinforcement learning. The method relates to artificial intelligence and can be applied to the system architecture 100 shown in fig. 1. The system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium that provides communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, for example to receive or send messages. Various communication client applications, such as a web browser, a shopping application, a search application, an instant messaging tool, a mailbox client and social platform software, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the text classification method based on reinforcement learning provided by the embodiments of the present application is generally executed by the server or a terminal device; accordingly, the text classification device based on reinforcement learning is generally disposed in the server or the terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow chart of one embodiment of a reinforcement learning based text classification method according to the present application is shown, comprising the steps of:
Step S201, obtaining a training text corpus, and extracting semantic features of the training text corpus to obtain semantic feature vectors.
The training text corpus may be obtained from public data sets including, but not limited to, a Chinese news data set, the THUCNews data set and the online_shopping_10_cats data set. Specifically, an original text corpus is obtained from a public data set and preprocessed, for example by word segmentation and stop-word removal. The preprocessed corpus is then randomly divided into a training set and a test set according to a preset ratio; the training set, which is a set of texts, serves as the training text corpus.
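As an illustration of the preprocessing and split described above, the sketch below removes stop words and randomly divides a corpus into a training set and a test set by a preset ratio. It assumes the texts are already segmented into space-separated words; Chinese text would first need a word segmenter, which is not shown. The function names, the default 0.8 ratio and the fixed seed are illustrative assumptions.

```python
import random


def preprocess(texts, stop_words):
    """Split pre-segmented texts into words and drop stop words.

    Assumption: words are already separated by spaces; a real Chinese
    pipeline would run a word segmenter before this step.
    """
    return [[w for w in text.split() if w not in stop_words] for text in texts]


def split_corpus(items, train_ratio=0.8, seed=42):
    """Randomly divide items into (training set, test set) by a preset ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


corpus = preprocess(["the market rose today", "a new phone is on sale"],
                    {"the", "a", "is", "on"})
train_set, test_set = split_corpus(corpus, train_ratio=0.5)
```

Seeding the shuffle keeps the training/test division reproducible between runs, which is convenient when comparing models trained on the same split.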
In this embodiment, semantic features are extracted from the training text corpus using a semantic feature extraction model. Such models include, but are not limited to, a CNN (convolutional neural network) model, an RNN (recurrent neural network) model, an LSTM (long short-term memory network) model and a BERT (Bidirectional Encoder Representations from Transformers) model.
As a specific implementation, the training text corpus may be input to a BERT-based pre-training language model for semantic feature extraction.
Step S202, inputting the semantic feature vectors into a trained clustering model, and outputting semantic clusters.
In this embodiment, the semantic feature vectors may be clustered by a trained clustering model. The clustering algorithm used by the clustering model includes, but is not limited to, the K-means algorithm and the Single-Pass algorithm.
Taking the Single-Pass algorithm as an example, the clustering process comprises the following steps:
Step A, selecting any semantic feature vector as the cluster center of the first class cluster;
Step B, selecting any unprocessed semantic feature vector from the remaining semantic feature vectors, calculating its similarity value to each existing class cluster, taking the class cluster with the largest similarity value as the nearest class cluster of the semantic feature vector, and recording that similarity value.
It should be appreciated that when the algorithm starts there is only one class cluster, namely the one generated in step A, so "all existing class clusters" refers to that single cluster. As the algorithm runs, each semantic feature vector either joins an existing class cluster or becomes the center of a new one, so the number of class clusters grows; "all existing class clusters" always refers to all class clusters generated so far.
Step C, comparing the similarity value of the nearest class cluster with a similarity threshold: if the similarity value is greater than the threshold, the semantic feature vector selected in step B is assigned to the nearest class cluster and the center of that cluster is updated; otherwise, the semantic feature vector selected in step B becomes the cluster center of a new class cluster.
In this embodiment, the center of a class cluster is the mean of the semantic feature vectors in that cluster, calculated as follows:
$$C = \frac{1}{n}\sum_{i=1}^{n} d_i$$
where C is the centroid vector of the class cluster, n is the number of semantic feature vectors in the class cluster, and $d_i$ is the i-th semantic feature vector in the class cluster.
Updating means recalculating the mean of the semantic feature vectors of the class cluster after a semantic feature vector has been added to it.
Step D, judging whether all semantic feature vectors in the set to be processed have been processed; if not, returning to step B to select the next unprocessed vector, otherwise outputting the clustering result.
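Steps A to D can be sketched in Python as follows. This is a minimal illustration, not the application's implementation. It assumes that the "similarity value of a class cluster" is the cosine similarity between a vector and the cluster's center, processes the vectors in input order, and uses plain lists instead of a numerical library.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def single_pass(vectors, threshold):
    """Single-Pass clustering; returns (cluster centers, cluster label per vector)."""
    centers, members, labels = [], [], []
    for vec in vectors:  # step D: loop until every vector has been processed
        if not centers:
            # Step A: the first vector becomes the center of the first cluster.
            centers.append(list(vec))
            members.append([vec])
            labels.append(0)
            continue
        # Step B: find the nearest existing cluster and its similarity value.
        sims = [cosine(vec, c) for c in centers]
        best = max(range(len(centers)), key=sims.__getitem__)
        if sims[best] > threshold:
            # Step C (join): add the vector and recompute the center as the mean.
            members[best].append(vec)
            n = len(members[best])
            centers[best] = [sum(col) / n for col in zip(*members[best])]
            labels.append(best)
        else:
            # Step C (new cluster): the vector becomes a new cluster center.
            centers.append(list(vec))
            members.append([vec])
            labels.append(len(centers) - 1)
    return centers, labels


centers, labels = single_pass([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], threshold=0.8)
print(labels)  # [0, 0, 1, 1]
```

Because the result depends on processing order, shuffling the input can change the clusters. This order sensitivity is a known property of Single-Pass clustering.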
Step S203, extracting keywords from all the semantic clusters, and forming a semantic feature queue corresponding to each semantic cluster according to the extracted keywords.
In this embodiment, a preset number of keywords are extracted from each semantic cluster. Keyword extraction methods include, but are not limited to, the TF-IDF (term frequency-inverse document frequency) algorithm and the LDA (latent Dirichlet allocation) algorithm.
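A minimal TF-IDF sketch for per-cluster keyword extraction is shown below. It assumes each semantic cluster is treated as one "document" made of its tokenized texts, uses the plain IDF log(N / df) without smoothing, and breaks score ties alphabetically. These are illustrative choices, and LDA-based extraction is not shown.

```python
import math
from collections import Counter


def cluster_keywords(clusters, top_k):
    """Return the top_k TF-IDF keywords of each cluster.

    clusters: list of clusters, each a list of tokenized texts (lists of words).
    Assumption: each whole cluster is treated as one document for TF and IDF.
    """
    docs = [Counter(word for text in cluster for word in text) for cluster in clusters]
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in doc)  # clusters containing each word
    keywords = []
    for doc in docs:
        total = sum(doc.values())
        scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in doc.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken alphabetically
        keywords.append(ranked[:top_k])
    return keywords


clusters = [
    [["stock", "market", "rise"], ["stock", "fund"]],
    [["phone", "screen"], ["phone", "market"]],
]
print(cluster_keywords(clusters, top_k=2))  # [['stock', 'fund'], ['phone', 'screen']]
```

A word that appears in every cluster ("market" here) gets an IDF of zero. This is the intended effect: such a word does not distinguish one semantic cluster from another.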
The extracted keywords are then ranked by similarity to form a semantic feature queue. Specifically, the similarity between the keywords of each semantic cluster is calculated, the keywords are ranked by similarity to obtain a ranking result, and the semantic feature queue corresponding to each semantic cluster is generated based on the ranking result.
The similarity between the keywords of each semantic cluster may be computed with algorithms including, but not limited to, cosine similarity and Levenshtein distance. After the similarities are calculated, the keywords are sorted in descending order of similarity, forming a semantic feature queue for the semantic cluster in which similarity runs from high to low.
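The sketch below builds a semantic feature queue from one cluster's keywords using a normalized Levenshtein similarity. The text does not spell out how keywords are "ranked by similarity". Here each keyword is assumed to be scored by its mean similarity to the other keywords in the same cluster and placed in descending order. The helper names are hypothetical.

```python
from collections import deque


def levenshtein(a, b):
    """Edit distance between strings a and b (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]: 1 - distance / length of the longer word."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def build_queue(keywords):
    """ASSUMPTION: rank each keyword by its mean similarity to the other keywords
    of the cluster, highest first, and return the ranking as a queue."""
    if len(keywords) < 2:
        return deque(keywords)

    def mean_sim(i):
        others = [similarity(keywords[i], k) for j, k in enumerate(keywords) if j != i]
        return sum(others) / len(others)

    order = sorted(range(len(keywords)), key=mean_sim, reverse=True)
    return deque(keywords[i] for i in order)


print(list(build_queue(["phone", "stocks", "stock"])))  # ['stock', 'stocks', 'phone']
```

For semantic rather than surface similarity, the keyword vectors from the feature extraction model could be compared with cosine similarity instead. The ranking step itself would not change.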
Step S204, selecting keywords from each semantic feature queue as target keywords, and generating word sense vectors based on the target keywords.
In this embodiment, the text classification model is trained by reinforcement learning, and the following definitions are made before training:
Before each training round, a keyword is randomly selected from each semantic feature queue. Its vector is obtained with the semantic feature extraction model and multiplied by a preset coefficient.
The classification reward is defined as the predicted label class value divided by the correct label value of the sample, multiplied by the reciprocal of the proportion of that class in the whole sample set. For example, assume the samples fall into 5 categories in total, with category 1 accounting for 1/10, category 2 for 1/5, category 3 for 1/4, category 4 for 1/3 and category 5 for 7/60. The reward coefficients are then 10 for category 1, 5 for category 2, 4 for category 3, 3 for category 4 and 60/7 (about 8.57) for category 5.
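The reward coefficients in the example can be checked with exact fractions. The sketch assumes class labels are the integers 1 to 5. It reads the reward definition literally: the predicted label value divided by the correct label value, times the coefficient of the correct class. That reading and the function names are assumptions for illustration.

```python
from fractions import Fraction


def reward_coefficients(proportions):
    """Reward coefficient of each class = 1 / (share of that class among all samples)."""
    if sum(proportions.values()) != 1:
        raise ValueError("class proportions must sum to 1")
    return {cls: 1 / share for cls, share in proportions.items()}


def classification_reward(pred_label, true_label, coefficients):
    """Literal reading: (predicted label value / correct label value) * coefficient."""
    return Fraction(pred_label, true_label) * coefficients[true_label]


shares = {1: Fraction(1, 10), 2: Fraction(1, 5), 3: Fraction(1, 4),
          4: Fraction(1, 3), 5: Fraction(7, 60)}
coef = reward_coefficients(shares)
print({c: round(float(k), 2) for c, k in coef.items()})
# {1: 10.0, 2: 5.0, 3: 4.0, 4: 3.0, 5: 8.57}
```

Because the coefficients are inversely proportional to class frequency, a correct prediction on a rare class earns a larger reward. This counteracts class imbalance in the training samples.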
Before each training round, a preset number of keywords are randomly selected from each semantic feature queue as target keywords; in this embodiment, one keyword is selected at random.
Features are then extracted from the target keywords. The semantic feature extraction model produces the vector corresponding to each target keyword, and this vector is multiplied by the preset coefficient to obtain the keyword vector of that target keyword.
It should be noted that the preset coefficient is the reciprocal of the absolute value of the silhouette coefficient of the semantic cluster in which the target keyword is located.
The keyword vector is then concatenated with the semantic feature vector obtained through the semantic feature extraction model to obtain the word sense vector.
In this embodiment, the keyword vectors selected from different semantic clusters are concatenated with the semantic feature vectors, and the resulting word sense vectors are used to train the classification model. The classification model can therefore learn the different semantic features in the training samples, which improves its classification accuracy.
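The sketch below follows the description. Once per training round, it draws one keyword at random from each semantic feature queue and scales that keyword's vector by 1/|s|, where s is the silhouette coefficient of the keyword's cluster. It then concatenates the scaled keyword vectors with a sample's semantic feature vector. The keyword-to-vector lookup is a hypothetical stand-in for the semantic feature extraction model, and placing the semantic feature vector first in the concatenation is an assumption.

```python
import random


def round_keyword_vectors(queues, keyword_vectors, silhouettes, rng):
    """Once per training round: pick one target keyword per semantic feature queue
    and scale its vector by 1 / |silhouette coefficient of that keyword's cluster|.

    queues: {cluster_id: list of keywords}; keyword_vectors is a hypothetical lookup
    standing in for the semantic feature extraction model.
    """
    scaled = []
    for cluster_id in sorted(queues):
        keyword = rng.choice(queues[cluster_id])
        coef = 1.0 / abs(silhouettes[cluster_id])  # undefined if the silhouette is 0
        scaled.append([coef * x for x in keyword_vectors[keyword]])
    return scaled


def word_sense_vector(feature_vec, scaled_keyword_vecs):
    """Concatenate a sample's semantic feature vector with the scaled keyword vectors."""
    out = list(feature_vec)
    for vec in scaled_keyword_vecs:
        out.extend(vec)
    return out


rng = random.Random(0)
kw = round_keyword_vectors(
    queues={0: ["stock"], 1: ["phone"]},
    keyword_vectors={"stock": [1.0, 2.0], "phone": [0.5, 0.5]},
    silhouettes={0: 0.5, 1: -0.25},
    rng=rng,
)
print(word_sense_vector([0.1, 0.2], kw))  # [0.1, 0.2, 2.0, 4.0, 2.0, 2.0]
```

The keyword vectors are computed once per round and reused for every sample in that round. This matches "before each training round" rather than drawing new keywords for each sample.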
Step S205, inputting the word sense vector into a pre-constructed initial classification model for training to obtain a trained target classification model.
In this embodiment, the pre-built initial classification model is a multi-layer neural network model with N actions, each corresponding to a class, where N is a natural number greater than zero. The word sense vector is input into the multi-layer neural network model, and the model is trained and updated according to the classification reward value so as to maximize that value.
As a specific implementation, the multi-layer neural network model comprises an input layer, a first hidden layer, a second hidden layer and an output layer. The input layer receives the word sense vector v. The first hidden layer has weight matrix $w_1$, bias $b_1$ and a ReLU activation, so its output is $o_1 = \mathrm{ReLU}(w_1 v + b_1)$. The second hidden layer has weight matrix $w_2$, bias $b_2$ and a ReLU activation, so its output is $o_2 = \mathrm{ReLU}(w_2 o_1 + b_2)$. The output layer is a softmax layer: $o_2$ is fed into it to obtain $o_3 = \mathrm{softmax}(o_2)$, which is the class probability Pa predicted in each training round.
To achieve better text classification, more hidden layers can be added as needed.
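A plain-Python sketch of this forward pass follows. Like the text, it feeds $o_2$ directly into the softmax, so the second hidden layer is assumed to have N units, one per class. Weights are nested lists rather than tensors, and the example values are made up for illustration.

```python
import math


def relu(xs):
    return [max(0.0, x) for x in xs]


def affine(W, x, b):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def forward(v, w1, b1, w2, b2):
    """o1 = relu(w1 v + b1); o2 = relu(w2 o1 + b2); o3 = softmax(o2) = Pa."""
    o1 = relu(affine(w1, v, b1))
    o2 = relu(affine(w2, o1, b2))
    return softmax(o2)


# Made-up weights: 2-dim input, 3 first-layer units, then N = 2 classes.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, -10.0]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b2 = [0.0, 0.0]
pa = forward([1.0, 2.0], w1, b1, w2, b2)
```

Many implementations add a separate linear layer before the softmax instead of applying it directly to a ReLU output. The sketch keeps the structure described in the text.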
Step S206, obtaining a text to be classified, inputting the text to be classified into a target classification model, and outputting a text classification result.
Specifically, the text to be classified is acquired and input into the target classification model, which outputs the text classification result.
In this method, semantic features are extracted from the training text corpus, and the extracted semantic feature vectors are clustered into different semantic clusters. The classification model is then trained on these semantic clusters so that it learns the semantic features of the different categories in the training text corpus, which can improve text classification accuracy.
In some optional implementations of this embodiment, before the step of inputting the semantic feature vectors into the trained clustering model and outputting the semantic clusters, the method further includes:
inputting the semantic feature vectors into a pre-constructed neural network model and outputting a clustering result;
determining a clustering loss function according to the clustering result;
adjusting model parameters of the neural network model based on the clustering loss function;
and, when an iteration end condition is met, generating the clustering model according to the model parameters.
In this embodiment, the pre-built neural network model may have the same structure as the classification model, comprising an input layer, a first hidden layer, a second hidden layer and an output layer. The input layer receives the semantic feature vector x. The first hidden layer has weight matrix $w_1$, bias $b_1$ and a ReLU activation, so its output is $o_1 = \mathrm{ReLU}(w_1 x + b_1)$. The second hidden layer has weight matrix $w_2$, bias $b_2$ and a ReLU activation, so its output is $o_2 = \mathrm{ReLU}(w_2 o_1 + b_2)$. The output layer is a softmax layer: $o_2$ is fed into it to obtain $o_3 = \mathrm{softmax}(o_2)$, which is the probability Pc of each action. The number of hidden layers can be adjusted as needed.
In this embodiment, model parameters of the neural network model are adjusted based on the loss function, and when the iteration end condition is satisfied, a cluster model is generated according to the model parameters.
Specifically, the model parameters of the neural network model are adjusted based on the value of the loss function, and iterative training continues until the model's performance stops improving and the loss value barely changes, i.e., the model converges. The iteration end condition is met when the model converges, and the neural network model with the finally adjusted parameters is then output as the clustering model.
In this method, the pre-constructed neural network model is trained by reinforcement learning and used as the clustering model. This can improve clustering precision and, in turn, the efficiency of text classification.
In some optional implementations, the step of determining the cluster loss function according to the clustering result includes:
calculating the silhouette coefficient of each cluster in the clustering result;
obtaining a training reward score according to the silhouette coefficients;
and obtaining the clustering loss function based on the clustering result and the training reward score.
The silhouette coefficient, a standard index for evaluating clustering quality in clustering algorithms, is calculated as follows.
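For each semantic feature vector o in cluster D, the average distance a(o) between o and the other objects in the cluster to which o belongs is calculated as follows:
$$a(o) = \frac{1}{|D| - 1}\sum_{o' \in D,\ o' \neq o} \operatorname{dist}(o, o')$$
b(o) is the minimum average distance from o to the objects of any cluster that does not contain o:
$$b(o) = \min_{D' \neq D} \frac{1}{|D'|}\sum_{o' \in D'} \operatorname{dist}(o, o')$$
The silhouette coefficient is:
$$s(o) = \frac{b(o) - a(o)}{\max\{a(o),\ b(o)\}}$$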
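The silhouette computation can be sketched as follows. The sketch assumes Euclidean distance (via `math.dist`, Python 3.8+) and takes the per-cluster silhouette coefficient as the mean of s(o) over the cluster's members. A point in a singleton cluster is given s(o) = 0, a common convention that the text does not specify. For production use, scikit-learn's `silhouette_samples` computes the same per-point values.

```python
import math


def silhouette_per_cluster(points, labels):
    """Return {cluster label: mean s(o) over the cluster's points}."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    def mean_dist(i, cluster, exclude_self):
        idx = [j for j, lab in enumerate(labels)
               if lab == cluster and not (exclude_self and j == i)]
        return sum(math.dist(points[i], points[j]) for j in idx) / len(idx)

    def s_of(i):
        own = labels[i]
        if labels.count(own) == 1:
            return 0.0  # convention for singleton clusters
        a = mean_dist(i, own, exclude_self=True)                                    # a(o)
        b = min(mean_dist(i, c, exclude_self=False) for c in clusters if c != own)  # b(o)
        return 0.0 if max(a, b) == 0 else (b - a) / max(a, b)                      # s(o)

    per_point = [s_of(i) for i in range(len(points))]
    return {c: sum(s for s, lab in zip(per_point, labels) if lab == c) / labels.count(c)
            for c in clusters}


scores = silhouette_per_cluster([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or misassigned points.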
In this embodiment, the clustering model is trained by reinforcement learning, and the following definitions are made before training:
The reward value is defined as follows: when the silhouette coefficient s(o) is less than a preset threshold $T_g$, the reward value is $1 + 1/|s(o)|$; otherwise, the reward value is $-|s(o)|$.
At the end of each training period, the training reward score $S_1$ accumulated up to that point is calculated by the following formula:
where γ is the reward discount (attenuation) coefficient, n is the number of training periods, i ranges from 1 to n-1, and $S_t$ is the reward score obtained in the t-th training period. That is, $S_t = 1 + 1/|s(o)|$ when the silhouette coefficient is less than the preset threshold $T_g$, and $S_t = -|s(o)|$ otherwise.
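The per-period reward is easy to express in code. The formula for $S_1$ is not reproduced in this text, so the accumulation below uses the standard discounted sum $S_1 = \sum_{t=1}^{n} \gamma^{t-1} S_t$ purely as an assumption; the exact indexing of the original formula may differ. The function names are hypothetical.

```python
def period_reward(s, threshold):
    """Reward S_t for one training period given the silhouette coefficient s = s(o)."""
    if s < threshold:
        if s == 0:
            raise ZeroDivisionError("1 / |s(o)| is undefined when s(o) == 0")
        return 1.0 + 1.0 / abs(s)
    return -abs(s)


def training_score(rewards, gamma):
    """ASSUMED form: S_1 = sum_{t=1..n} gamma**(t-1) * S_t (standard discounted return)."""
    # enumerate starts at 0, so gamma**t here equals gamma**(t-1) for 1-based t.
    return sum(gamma ** t * r for t, r in enumerate(rewards))


rewards = [period_reward(0.5, threshold=0.6), period_reward(0.8, threshold=0.6)]
print(rewards)  # [3.0, -0.8]
score = training_score(rewards, gamma=0.9)
```

A discount factor below 1 weights earlier periods more heavily than later ones in this assumed form.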
In an embodiment, obtaining the clustering loss function based on the clustering result and the training reward score specifically includes:
The logarithm of the clustering result (the action probability Pc) is calculated and multiplied by the training reward score, and the negative of this product is taken as the clustering loss function:
$$\mathrm{Loss} = -S_1 \times \log Pc_t$$
where $Pc_t$ is the probability of the action taken in the t-th round.
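For a softmax output, this loss has a simple gradient with respect to the logits, and that gradient is what drives the parameter update. The sketch below is illustrative. It assumes the clustering model's output layer is a softmax over logits and that the action index for round t is known; backpropagation into the hidden layers is omitted.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def clustering_loss_and_grad(logits, action, score):
    """Loss = -S1 * log Pc_t for the action taken in round t, plus d(Loss)/d(logits)."""
    pc = softmax(logits)
    loss = -score * math.log(pc[action])
    # For a softmax, d(-S1 * log p_a)/dz_k = -S1 * (1[k == a] - p_k).
    grad = [-score * ((1.0 if k == action else 0.0) - p) for k, p in enumerate(pc)]
    return loss, grad


loss, grad = clustering_loss_and_grad([0.0, 0.0], action=0, score=2.0)
print(grad)  # [-1.0, 1.0]
```

With a positive $S_1$, a gradient-descent step raises the logit of the chosen action; with a negative $S_1$, it lowers it. This is how minimizing the loss maximizes the training reward score.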
In this embodiment, the model is iteratively updated with the clustering loss function derived from the clustering result and the training reward score, so that the training reward score is maximized and the accuracy of the model is ensured.
In some optional implementations of this embodiment, the step of inputting the word sense vector into the pre-constructed initial classification model to perform training, and obtaining the trained target classification model includes:
inputting the word sense vector into a pre-constructed initial classification model to obtain a prediction classification result;
determining a classification loss function according to the prediction classification result;
adjusting model parameters of the initial classification model according to the classification loss function;
and, when an iteration end condition is met, generating the target classification model based on the model parameters.
Specifically, the model parameters of the initial classification model are adjusted based on the value of the classification loss function, and iterative training continues until the model's performance stops improving and the loss value barely changes, i.e., the model converges. The iteration end condition is met when the model converges, and the classification model with the finally adjusted parameters is then output as the target classification model.
The classification model is trained on word sense vectors obtained by concatenating the keyword vectors of different semantic clusters with the semantic feature vectors. This enables the model to learn the different categories and the implicit semantic features in the training text corpus, further improving text classification accuracy.
In this embodiment, the step of determining the classification loss function according to the prediction classification result includes:
calculating a classification reward value according to the prediction classification result;
and obtaining a classification loss function based on the classification reward value and the prediction classification result.
Specifically, the logarithm of the prediction classification result Pa is calculated and multiplied by the classification reward value $S_2$, and the negative of this product is taken as the classification loss function:
$$\mathrm{Loss} = -S_2 \times \log Pa_t$$
where $Pa_t$ is the probability of the classification label output in the t-th training round.
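To show the classification loss at work, the sketch below trains a policy with the REINFORCE-style update implied by $\mathrm{Loss} = -S_2 \times \log Pa_t$. To stay short, it replaces the multi-layer network with a single linear softmax layer, samples the predicted class from Pa, and takes the reward as a caller-supplied function. The +1/-1 reward in the usage example is a hypothetical stand-in for the classification reward defined earlier.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def logits_of(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def train_policy(samples, n_classes, reward_fn, epochs=200, lr=0.1, seed=0):
    """REINFORCE-style training of a linear softmax classifier.

    samples: list of (word_sense_vector, true_label) pairs.
    reward_fn(pred, true) returns the classification reward S2 for the sampled action.
    """
    rng = random.Random(seed)
    dim = len(samples[0][0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in samples:
            pa = softmax(logits_of(W, b, x))
            a = rng.choices(range(n_classes), weights=pa)[0]  # sample the predicted class
            s2 = reward_fn(a, y)
            # Loss = -S2 * log Pa_t; its gradient w.r.t. logit k is -S2 * (1[k == a] - p_k).
            for k in range(n_classes):
                g = -s2 * ((1.0 if k == a else 0.0) - pa[k])
                b[k] -= lr * g
                W[k] = [w - lr * g * xi for w, xi in zip(W[k], x)]
    return W, b


def predict(W, b, x):
    logits = logits_of(W, b, x)
    return max(range(len(logits)), key=logits.__getitem__)


samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
# Hypothetical stand-in reward: +1 for a correct prediction, -1 otherwise.
W, b = train_policy(samples, n_classes=2, reward_fn=lambda a, y: 1.0 if a == y else -1.0)
print(predict(W, b, [1.0, 0.0]), predict(W, b, [0.0, 1.0]))  # 0 1
```

Sampling the action from Pa, rather than always taking the most probable class, is what lets the reward signal reach classes the model currently considers unlikely.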
In this embodiment, the model is iteratively updated with the classification loss function derived from the prediction classification result and the classification reward value, so that the classification reward value is maximized. This ensures the accuracy of the classification model and improves text classification accuracy.
The application is operational with numerous general-purpose or special-purpose computer system environments or configurations, such as personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by computer-readable instructions stored in a computer-readable storage medium; when executed, the instructions may carry out the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or may be a random access memory (RAM).
It should be understood that although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These are not necessarily completed at the same moment and may be performed at different times; they are also not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a text classification apparatus based on reinforcement learning, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus is particularly applicable to various electronic devices.
As shown in fig. 3, the text classification device 300 based on reinforcement learning according to the present embodiment includes a semantic feature extraction module 301, a clustering module 302, a keyword extraction module 303, a vector generation module 304, a training module 305, and a classification module 306. Wherein:
The semantic feature extraction module 301 is configured to obtain a training text corpus, and perform semantic feature extraction on the training text corpus to obtain a semantic feature vector;
The clustering module 302 is configured to input the semantic feature vector into a trained clustering model, and output a semantic cluster;
the keyword extraction module 303 is configured to extract keywords from all the semantic clusters, and form a semantic feature queue corresponding to each semantic cluster according to the extracted keywords;
the vector generation module 304 is configured to select a keyword from each semantic feature queue as a target keyword, and generate a word sense vector based on the target keyword;
The training module 305 is configured to input the word sense vector into a pre-constructed initial classification model for training, so as to obtain a trained target classification model;
the classification module 306 is configured to obtain a text to be classified, input the text to be classified into the target classification model, and output a text classification result.
With the text classification device based on reinforcement learning of this embodiment, semantic features are extracted from the training text corpus, and the extracted semantic feature vectors are clustered into semantic clusters of different categories. The classification model is trained using these semantic clusters so that it learns the semantic features of the different categories in the training text corpus, which can improve text classification accuracy.
In some optional implementations of the present embodiment, the text classification device 300 based on reinforcement learning further includes a cluster training module comprising a clustering sub-module, a calculation sub-module, an adjustment sub-module and a generation sub-module, wherein:
the clustering sub-module is used for inputting the semantic feature vector into a pre-constructed neural network model and outputting a clustering result;
the calculation sub-module is used for determining a clustering loss function according to the clustering result;
the adjustment submodule is used for adjusting model parameters of the neural network model based on the clustering loss function;
and the generation submodule is used for generating the clustering model according to the model parameters when the iteration end condition is met.
According to the embodiment, the neural network model pre-constructed through reinforcement learning training is used as a clustering model, so that the clustering precision can be improved, and the text classification efficiency is further improved.
In this embodiment, the calculation submodule is further configured to:
calculate the silhouette coefficient of each cluster in the clustering result;
obtain a training reward score according to the silhouette coefficients;
and obtain the clustering loss function based on the clustering result and the training reward score.
In this embodiment, the model is iteratively updated with the clustering loss function derived from the clustering result and the training reward score, so that the training reward score is maximized and the accuracy of the model is ensured.
In some optional implementations of the present embodiment, the keyword extraction module 303 includes a similarity calculation sub-module, a ranking sub-module, and a generation sub-module, where:
The similarity calculation submodule is used for calculating the similarity between the keywords of each semantic cluster;
the ranking sub-module is used for ranking the keywords according to the similarity to obtain a ranking result;
and the generation submodule is used for generating the semantic feature queue corresponding to each semantic cluster based on the ranking result.
In this embodiment, the vector generation module 304 includes an extraction sub-module and a splicing sub-module, wherein:
the extraction submodule is used for extracting features of the target keywords to obtain keyword vectors;
and the splicing sub-module is used for concatenating the keyword vector and the semantic feature vector to obtain the word sense vector.
In the embodiment, the keyword vectors and the semantic feature vectors selected from different semantic clusters are spliced, and the obtained word semantic vectors are used for training the classification model, so that the classification model can learn different semantic features in the training sample, and the accuracy of model classification is improved.
In some alternative implementations of the present embodiment, the training module 305 includes a classification sub-module, a calculation sub-module, an adjustment sub-module, and an output sub-module, where:
the classification submodule is used for inputting the word sense vector into a pre-constructed initial classification model to obtain a prediction classification result;
the calculation sub-module is used for determining a classification loss function according to the prediction classification result;
the adjustment submodule is used for adjusting model parameters of the initial classification model according to the classification loss function;
and the output submodule is used for generating the target classification model based on the model parameters when the iteration end condition is met.
The classification model is trained on word sense vectors obtained by concatenating the keyword vectors of different semantic clusters with the semantic feature vectors. This enables the model to learn the different categories and the implicit semantic features in the training text corpus, further improving text classification accuracy.
In this embodiment, the calculation submodule is further configured to:
calculate a classification reward value according to the prediction classification result;
and obtain the classification loss function based on the classification reward value and the prediction classification result.
In this embodiment, the model is iteratively updated with the classification loss function derived from the prediction classification result and the classification reward value, so that the classification reward value is maximized. This ensures the accuracy of the classification model and improves text classification accuracy.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, a network interface 43 communicatively connected to each other via a system bus. It should be noted that only computer device 4 having components 41-43 is shown in the figures, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and its hardware includes, but is not limited to, a microprocessor, an Application SPECIFIC INTEGRATED Circuit (ASIC), a Programmable gate array (Field-Programmable GATE ARRAY, FPGA), a digital Processor (DIGITAL SIGNAL Processor, DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The computer device may perform human-computer interaction with a user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.
The memory 41 includes at least one type of readable storage medium. Examples include flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 4. The memory 41 may also comprise both an internal storage unit and an external storage device of the computer device 4. In this embodiment, the memory 41 is typically used to store the operating system and the various application software installed on the computer device 4, such as the computer readable instructions of the text classification method based on reinforcement learning. The memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer readable instructions stored in the memory 41 or to process data, for example, to execute the computer readable instructions of the text classification method based on reinforcement learning.
The network interface 43 may comprise a wireless or wired network interface. It is typically used to establish communication connections between the computer device 4 and other electronic devices.
In the computer device of this embodiment, the processor executes the computer readable instructions stored in the memory to implement the steps of the text classification method based on reinforcement learning described in the above embodiment. Semantic features are extracted from the training text corpus, and the extracted semantic feature vectors are clustered to obtain different semantic clusters. The classification model is then trained according to these semantic clusters, which improves the classification accuracy for the text in the training text corpus.
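The specification does not name a clustering algorithm. The sketch below uses plain k-means as a stand-in to show how semantic feature vectors can be grouped into semantic clusters. The deterministic initialization (the first `k` vectors), the function name `kmeans`, and the parameter names are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Sequence[float]], k: int, max_iters: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Cluster semantic feature vectors with k-means (a stand-in; the source names no algorithm)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumed deterministic initialization: the first k vectors become the centroids.
    centroids = [list(map(float, p)) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iters):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    feats = [[0, 0], [0, 1], [10, 10], [10, 11]]
    labels, centroids = kmeans(feats, k=2)
    print(labels)     # → [0, 0, 1, 1]
    print(centroids)  # → [[0.0, 0.5], [10.0, 10.5]]
```

The resulting cluster labels indicate which semantic cluster each training text falls into. The classification model can then be trained according to these clusters.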
The application further provides another embodiment, namely a computer readable storage medium storing computer readable instructions. The instructions can be executed by at least one processor to cause the at least one processor to perform the steps of the text classification method based on reinforcement learning described above. The method extracts semantic features from the training text corpus, clusters the extracted semantic feature vectors into different semantic clusters, and trains a classification model according to those clusters, which improves the accuracy of text classification.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software together with a necessary general-purpose hardware platform. They may also be implemented by hardware alone, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied as a software product stored in a storage medium (e.g., ROM/RAM, a magnetic disk, or an optical disk). The software product comprises instructions for causing a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to perform the methods according to the embodiments of the present application.
The above-described embodiments are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the present application and do not limit the scope of the patent claims. This application may be embodied in many different forms; the embodiments are provided so that the present disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or substitute equivalents for some of their elements. Any equivalent structure made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the application.