CN109635274B - Text input prediction method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN109635274B CN109635274B CN201811256223.9A CN201811256223A CN109635274B CN 109635274 B CN109635274 B CN 109635274B CN 201811256223 A CN201811256223 A CN 201811256223A CN 109635274 B CN109635274 B CN 109635274B
- Authority
- CN
- China
- Prior art keywords
- word vector
- text
- sample
- word
- sample word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/274—Converting codes to words; Guess-ahead of partial word inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The application discloses a text input prediction method, a device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring a training text, and converting N words in the training text into corresponding N sample word vectors according to a preset dictionary to obtain a sample word vector sequence; selecting M sample word vectors and inputting them into a neural process model to obtain corresponding predicted word vectors, wherein M is less than N; constructing a loss function according to the difference between the predicted word vector and the (M+1)-th sample word vector in the sample word vector sequence; adjusting parameters in the neural process model according to the loss function until the loss function meets a preset ending condition; acquiring a target text, and converting the target text into a corresponding target word vector according to the preset dictionary; and inputting the target word vector into a prediction function to obtain a predicted word. The application predicts the input text by using a neural network, and the prediction function can be obtained quickly and accurately through the neural process model without setting a kernel function in advance.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for predicting text input, a computer device, and a storage medium.
Background
In order to speed up text input by the user, a recurrent neural network model can be used to predict the text that the user is likely to input next based on the text the user has already input. In the prior art, a Gaussian process is adopted to fit the function in the training stage of the recurrent neural network model, but the Gaussian process requires a large amount of computation and a kernel function that must be set in advance, which limits the computational efficiency and training efficiency of text input prediction tools.
Disclosure of Invention
The application mainly aims to provide a text input prediction method, a device, computer equipment and a storage medium, so as to solve the technical problem that the conventional recurrent neural network model is computationally inefficient when used for text input prediction.
In order to achieve the above object, the present application provides a text input prediction method, including:
acquiring a training text, and converting N words in the training text into N corresponding sample word vectors according to a preset dictionary to obtain a sample word vector sequence;
selecting M sample word vectors from the sample word vector sequence and inputting them into a neural process model to obtain corresponding predicted word vectors, wherein M is smaller than N;
constructing a loss function according to the difference between the predicted word vector and the (M+1)-th sample word vector in the sample word vector sequence;
adjusting parameters in the neural process model according to the loss function until the loss function meets a preset ending condition;
acquiring a target text, and converting the target text into a corresponding target word vector according to a preset dictionary;
and inputting the target word vector into a prediction function in the neural process model to obtain a predicted word.
Preferably, the step of selecting M sample word vectors from the sample word vector sequence and inputting them into the neural process model to obtain corresponding predicted word vectors includes:
obtaining a global latent variable according to the input sample word vector;
and obtaining the corresponding predicted word vector according to the sample word vector and the global latent variable.
Preferably, the neural process model is:

$$p(y_{1:n} \mid x_{1:n}) = \int p(z) \prod_{i=1}^{n} \mathcal{N}\big(y_i \mid g(x_i, z),\, \sigma^2\big)\, dz$$

where $x_i$ is the input sample word vector, $y_i$ is the predicted word vector output for the input sample word vector $x_i$, $p(z)$ is a multivariate normal distribution, $z$ is the global latent variable, $g(x_i, z)$ is the prediction function, and $\sigma^2$ is random noise.
Preferably, the step of constructing a loss function according to the difference between the predicted word vector and the (M+1)-th sample word vector in the sample word vector sequence includes:
calculating the mean square error between the predicted word vector and the (M+1)-th sample word vector in the sample word vector sequence;
and constructing the loss function according to the mean square error.
Preferably, the step of adjusting parameters in the neural process model according to the loss function until the loss function meets a preset end condition, and obtaining a prediction function includes:
judging whether the loss function is larger than a preset threshold;
if the loss function is larger than the preset threshold value, performing convex function optimization on the loss function;
adjusting parameters in the neural process model according to the convex function optimized loss function;
and recalculating a predicted word vector by adopting the parameter-adjusted neural process model until the loss function constructed from the difference between the predicted word vector and the (M+1)-th sample word vector in the sample word vector sequence is smaller than or equal to the preset threshold.
Preferably, the step of obtaining the training text and converting N words in the training text into corresponding N sample word vectors according to a preset dictionary to obtain a sample word vector sequence includes:
acquiring a training text;
word segmentation is carried out on the training text according to the part of speech and word length to obtain a word group;
and converting N words in the training text into corresponding N sample word vectors according to a preset dictionary to obtain a sample word vector sequence.
Preferably, the step of inputting the target word vector into the prediction function to obtain a predicted word includes:
sequentially inputting the target word vectors into the prediction function according to the sequence of words corresponding to the target word vectors in the target text to obtain predicted word vectors;
calculating the distance between the predicted word vector and each standard word vector in the preset dictionary;
determining the standard word vector corresponding to the minimum value in the distance;
and setting the predicted word as the word corresponding to the standard word vector.
In addition, to achieve the above object, the present application further provides a text processing apparatus, which is characterized by comprising a model trainer and a text processor, the model trainer comprising:
the first vectorization module is used for acquiring a training text, converting N words in the training text into N corresponding sample word vectors according to a preset dictionary, and obtaining a sample word vector sequence;
the first input module is used for selecting M sample word vectors from the sample word vector sequences to input a neural process model;
the first output module is used for outputting predicted word vectors corresponding to the M sample word vectors through the neural process model;
the difference judging module is used for constructing a loss function according to the difference of the predicted word vector and the M+1st sample word vector in the sample word vector sequence;
the adjusting module is used for adjusting parameters in the neural process model according to the loss function until the loss function meets a preset ending condition;
the text processor includes:
the second vectorization module is used for acquiring a target text and converting the target text into a corresponding target word vector according to a preset dictionary;
and the second calculation module is used for inputting the target word vector into a prediction function in the neural process model to obtain a predicted word.
In addition, to achieve the above object, the present application also provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program implementing the steps of the above method when executed by the processor.
In addition, in order to achieve the above object, the present application also provides a storage medium, wherein the storage medium stores a computer program, and the computer program realizes the steps of the above method when being executed by a processor.
According to the text input prediction method, device, computer equipment and storage medium provided by the embodiments of the application, the training text is trained by adopting a neural process model, so that the process of setting a kernel function can be omitted, and the limitations that other neural network models impose on function design are overcome; meanwhile, the efficient computing capability of the neural process model shortens the training time of the model. The prediction function obtained through the neural process model can predict the text to be input according to the target text, thereby improving the text input efficiency of the user.
Drawings
FIG. 1 is a schematic diagram of a computer device in a hardware operating environment according to an embodiment of the present application;
FIG. 2 is a flowchart of a text input prediction method according to a first embodiment of the present application;
FIG. 3 is a flowchart of a text input prediction method according to a second embodiment of the present application;
FIG. 4 is a flowchart of a text input prediction method according to a third embodiment of the present application;
FIG. 5 is a flowchart of a text input prediction method according to a fourth embodiment of the present application;
FIG. 6 is a flowchart of a text input prediction method according to a fifth embodiment of the present application;
fig. 7 is a flowchart of a sixth embodiment of the text input prediction method of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to fig. 1, fig. 1 is a schematic diagram of a computer device structure of a hardware running environment according to an embodiment of the present application.
The computer equipment of the embodiment of the application can be a server, and can also be computer equipment with a data processing function such as a smart phone, a tablet personal computer, a portable computer and the like.
As shown in fig. 1, the computer device may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory, such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
Optionally, the computer device may further include an audio circuit, a WiFi module, a touch screen, and so on, which are not described herein. The computer device can acquire, through the input unit, text data input by the user that needs to be processed, and use the acquired text data as target text or training text; it can also receive, through the network interface, text data to be processed transmitted by other devices, and use the received text data as target text or training text; and it can further acquire text data presented on the display screen and selected by the user through the input unit, and use the acquired text data as training text.
Those skilled in the art will appreciate that the computer device structure shown in FIG. 1 is not limiting of the computer device and may include more or fewer components than shown, or may be combined with certain components, or a different arrangement of components.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a text input prediction method according to the present application, where the text input prediction method includes:
step S100, obtaining a training text, and converting N words in the training text into N corresponding sample word vectors according to a preset dictionary to obtain a sample word vector sequence;
The training text is text data comprising a plurality of words composed according to language logic. Text data is a string of one or more characters in sequence, for example, "the weather today" or "my name is Wang Pingpang".
Specifically, the training text can first be segmented to obtain each word forming the training text; each word is then vectorized to obtain its corresponding word vector, and the sample word vectors are combined in order to obtain the sample word vector sequence. The sample word vector sequence comprises each sample word vector obtained by vectorizing each word in text form, and the order of the sample word vectors is consistent with the order of the corresponding words in the training text. It will be appreciated that the sample word vector corresponding to the first word in the training text comes first in the sample word vector sequence. A word vector is data that expresses a word in text form in mathematical form. For example, the word "weather" in text form may be expressed in the mathematical form "[0 0 0 1 0 0 0 0 0 0 0 ...]", in which case "[0 0 0 1 0 0 0 0 0 0 0 ...]" is the word vector of "weather"; it is to be understood that the word vector into which a word in text form is converted is not limited here, as long as the word can be expressed mathematically. For example, if the training text is "my name is Wang Pingpang" and word segmentation yields "I / 's / name / is / Wang Pingpang", then the word vector of "I" is V1, the word vector of "'s" is V2, the word vector of "name" is V3, the word vector of "is" is V4, and the word vector of "Wang Pingpang" is V5, and the word vectors included in the sample word vector sequence are V1, V2, V3, V4 and V5 in order.
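By way of illustration only, the following is a minimal sketch of this vectorization step, assuming a one-hot encoding and a small hypothetical dictionary (the application does not fix a particular encoding, and all names here are illustrative):

```python
import numpy as np

def build_sample_vectors(words, dictionary):
    """Convert a segmented word list into one-hot sample word vectors,
    preserving the order of the words in the training text."""
    vocab_size = len(dictionary)
    sequence = []
    for word in words:
        vec = np.zeros(vocab_size)
        vec[dictionary[word]] = 1.0  # one-hot position taken from the preset dictionary
        sequence.append(vec)
    return sequence

# hypothetical dictionary and segmentation mirroring the example above
dictionary = {"I": 0, "'s": 1, "name": 2, "is": 3, "Wang Pingpang": 4}
sample_sequence = build_sample_vectors(
    ["I", "'s", "name", "is", "Wang Pingpang"], dictionary)  # V1..V5 in order
```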
Step S200, selecting M sample word vectors from the sample word vector sequence to be input into a neural process model to obtain corresponding predicted word vectors, wherein M is less than N;
the Neural process model (Neural process) is one of Neural network models, and is a machine learning model which needs to be trained, and particularly, the Neural process model can learn the word arrangement sequence logic in the sample text, and has the capability of predicting the next word to be input according to the current input text after learning training is completed. That is, the neural process model can perform linear or nonlinear transformation on the input data through a series of parameters and operation logic to obtain an operation result. The parameters and the arithmetic logic can reflect the correspondence of the input and the output.
The neural process model is similar to a Gaussian process: it represents infinitely many different functions at unobserved positions and can capture the uncertainty of prediction on the basis of a given observation result, thereby realizing function approximation, obtaining a prediction rule, and hence a prediction function. Compared with a Gaussian process, the neural process model is computationally efficient and overcomes many limitations on function design by directly learning an implicit kernel from data; that is, no kernel function needs to be set when the neural process model is configured, which accelerates text prediction.
The selection of the M sample word vectors is random: the value of M need not be the same each time step S200 is performed, and the positions in the sample text of the words corresponding to the M sample word vectors are likewise not fixed.
Step S300, constructing a loss function according to the difference between the predicted word vector and the M+1st sample word vector in the sample word vector sequence;
the predicted word vector is predicted by a neural network model, and the M+1st sample word vector in the sample word vector sequence is actually existing in the training text, that is, the M+1st sample word vector is a true value of the relative predicted word vector. The computer device may construct a loss function based on the difference between the predicted word vector and the actual value, and adjust parameters in the neural network model in a direction that minimizes the loss function, so that the output of the adjusted neural network model better meets the requirements.
Step S400, adjusting parameters in the neural process model according to the loss function until the loss function meets a preset ending condition;
In fact, the training process of the model needs to be repeated a plurality of times. After the parameters in the neural process model are adjusted according to the loss function, the sample word vector sequence is input into the parameter-adjusted neural process model to calculate a new predicted word vector; that is, steps S200 and S300 are repeated until the loss function meets the preset end condition, and training then stops.
The loss function reflects the accuracy of the prediction result output by the model: the smaller the loss function, the higher the prediction accuracy. A person skilled in the art can set a preset threshold for the loss function, and training stops when the loss function is smaller than the preset threshold. Alternatively, a preset threshold can be set for the number of times steps S200 and S300 are repeated, i.e., for the number of iterations of the neural process model, and training stops when the number of iterations is greater than that preset threshold.
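As shown below, the training loop formed by steps S200 to S400 under these stopping rules can be sketched in a few lines; the model interface, threshold value, and iteration cap are illustrative assumptions rather than the disclosed implementation:

```python
import torch

def train(model, sample_vectors, optimizer, threshold=1e-3, max_iters=10000):
    """Repeat steps S200-S300 and adjust parameters (step S400) until the
    loss meets a preset end condition (loss threshold or iteration cap)."""
    N = sample_vectors.shape[0]
    for step in range(max_iters):                     # iteration cap as one end condition
        M = torch.randint(1, N, (1,)).item()          # M is chosen at random, M < N
        predicted = model(sample_vectors[:M])         # predicted word vector (step S200)
        target = sample_vectors[M]                    # the (M+1)-th sample word vector
        loss = torch.mean((predicted - target) ** 2)  # mean square error loss (step S300)
        if loss.item() <= threshold:                  # loss threshold as the other end condition
            break
        optimizer.zero_grad()
        loss.backward()                               # adjust parameters (step S400)
        optimizer.step()
```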
Step S500, obtaining a target text, and converting the target text into a corresponding target word vector according to a preset dictionary;
and step S600, inputting the target word vector into a prediction function in the neural process model to obtain a predicted word.
Wherein the target text is text data to be predicted. The user may input the target text through an input unit of the computer device, thereby acquiring the target text. And vectorizing the target text according to the preset dictionary to form a target word vector, and calculating a predicted word vector through a prediction function to obtain a predicted word.
Steps S100 to S600 may be performed in the same computer device. In another embodiment, steps S100 to S400 may be performed in one computer device and steps S500 to S600 in another. For example: steps S100 to S400 are run in a server to obtain the prediction function; the server sends the prediction function to the mobile terminal used by the user; and the mobile terminal receives the prediction function and executes steps S500 to S600 to obtain the predicted word.
In the text input prediction method, the training text is trained by adopting a neural process model, so that the process of setting a kernel function can be omitted, and the limitations that many neural network models impose on function design are overcome; meanwhile, the efficient computing capability of the neural process model shortens the training time of the model. The prediction function obtained through the neural process model can predict the text to be input according to the target text, thereby improving the text input efficiency of the user.
Further, referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of the text input prediction method according to the present application, based on the above embodiment, the step S200 includes:
step S210, obtaining a global latent variable according to the input sample word vector;
global latent variables are latent variables that can be referenced by all objects or functions in the neural process model. Latent variables are variables that cannot be directly observed, as opposed to observable variables, but can be inferred from other variables observed. Since the corresponding prediction function cannot be derived directly from the sample word vector x and the prediction word vector y corresponding to the sample word vector x, the global latent variable z is introduced into the neural process model to derive the prediction function.
And step S220, obtaining the corresponding predicted word vector according to the sample word vector and the global latent variable.
Specifically, when step S210 is performed, the word vector following the M-th sample word vector is treated as unknown; inputting the M sample word vectors into the neural process model yields the output predicted word vector y, i.e., a prediction of the (M+1)-th sample word vector, which is assumed unknown. It will be appreciated that in step S210 the input sample word vectors are used to obtain the global latent variable z, and the predicted word vector y is then obtained from the global latent variable z.
Further, the neural process model is:

$$p(y_{1:n} \mid x_{1:n}) = \int p(z) \prod_{i=1}^{n} \mathcal{N}\big(y_i \mid g(x_i, z),\, \sigma^2\big)\, dz$$

where $x_i$ is the input sample word vector, $y_i$ is the predicted word vector output for the input sample word vector $x_i$, $p(z)$ is a multivariate normal distribution, $z$ is the global latent variable, $g(x_i, z)$ is the prediction function, and $\sigma^2$ is random noise.
This logical process can be embodied by a function, provided that predicting the (M+1)-th word vector from the M word vectors in the training text is treated as a finite-dimensional marginal distribution. For a finite word vector sequence $x_{1:n}$, the predicted word vector $y$ can then be obtained. Owing to the introduction of the global latent variable $z$, the generative mapping can be set as $y_i = g(x_i, z)$, from which the prediction function is obtained.
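For illustration, the sketch below shows one way the global latent variable z can enter the computation, loosely following the neural process literature: an encoder aggregates the input sample word vectors into a distribution over z, and a decoder plays the role of g(x_i, z). The network shapes and the mean aggregation are assumptions, not part of the disclosure:

```python
import torch
import torch.nn as nn

class NeuralProcessSketch(nn.Module):
    """Minimal neural process sketch: inputs -> global latent z -> g(x, z)."""
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 2 * z_dim))  # mean / log-variance of z
        self.decoder = nn.Sequential(nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, x_dim))      # plays the role of g(x_i, z)

    def forward(self, context_x):
        # aggregate all context points into one representation, then parameterize p(z)
        r = self.encoder(context_x).mean(dim=0)
        mu, logvar = r.chunk(2)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # sample the global latent z
        # decode the prediction for the next word vector from the last input and z
        return self.decoder(torch.cat([context_x[-1], z]))
```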
Further, referring to fig. 4, fig. 4 is a flowchart of a third embodiment of the text input prediction method according to the present application, based on the above embodiment, the step S300 includes:
Step S310, calculating the mean square error between the predicted word vector and the (M+1)-th sample word vector in the sample word vector sequence;
The mean square error is calculated as follows:

$$J = (y_1 - y_2)^2$$

where $y_1$ is the predicted word vector output by the neural process model and $y_2$ is the actual true value, i.e., the (M+1)-th sample word vector.
And step S320, constructing the loss function according to the mean square error.
The loss function is used to measure the degree of inconsistency between the predicted value f(x) of the model and the true value y. It is a non-negative real-valued function, generally denoted L(y, f(x)); the smaller the loss function, the better the accuracy of the model. The specific type of loss function employed is not limited in the present application.
In this embodiment, the difference between the predicted word vector and the true value is compared, so as to construct a loss function, and parameters in the neural process model are further modified through the loss function, so that the purpose of fitting the distribution of the predicted function can be achieved.
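In code the loss of this embodiment is a one-liner; a sketch assuming the word vectors are NumPy arrays:

```python
import numpy as np

def mse_loss(y1, y2):
    """Mean square error between the predicted word vector y1 and the
    true value y2, i.e., the (M+1)-th sample word vector."""
    return np.mean((y1 - y2) ** 2)
```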
Further, referring to fig. 5, fig. 5 is a flowchart of a fourth embodiment of the text input prediction method according to the present application, based on the above embodiment, the step S400 includes:
step S410, determining whether the loss function is greater than the preset threshold;
the preset threshold is a value set by a person skilled in the art according to actual needs.
Step S420, if the loss function is greater than the preset threshold, performing convex function optimization on the loss function;
If the loss function is smaller than or equal to the preset threshold, the prediction function in the neural process model is obtained and step S500 is executed; that is, if the mean square error is smaller than or equal to the preset threshold, the degree of difference between the predicted value output by the neural process model and the true value meets the requirement, and adjustment of the neural process model can stop.
A convex function is a class of function defined on a real linear space. Convex function optimization here means minimizing the loss function; the specific optimization method adopted is not limited in the present application. In an embodiment, the adaptive moment estimation method (Adaptive Moment Estimation, ADAM) is adopted. Compared with other adaptive learning-rate algorithms, ADAM converges faster and learns more effectively, and it corrects problems found in other optimization techniques, such as vanishing learning rates, overly slow convergence, or large fluctuations of the loss function caused by high-variance parameter updates.
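A sketch of plugging ADAM into the parameter-adjustment step follows; the framework and the stand-in model are assumptions, since the application names no library:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 128)  # stand-in for the neural process model's parameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM optimization

x = torch.randn(5, 128)
loss = torch.mean((model(x) - x) ** 2)  # stand-in loss function
optimizer.zero_grad()
loss.backward()    # gradients of the optimized loss
optimizer.step()   # parameter adjustment (step S430)
```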
Step S430, adjusting parameters in the neural process model according to the convex function optimized loss function;
Specifically, since the neural process model is

$$p(y_{1:n} \mid x_{1:n}) = \int p(z) \prod_{i=1}^{n} \mathcal{N}\big(y_i \mid g(x_i, z),\, \sigma^2\big)\, dz,$$

the parameters $z$, $p(z)$, $\mathcal{N}(y_i \mid g(x_i, z), \sigma^2)$ and $g(x_i, z)$ in the neural process model can be adjusted according to the convex-function-optimized loss function so that $p(y_{1:n} \mid x_{1:n})$ reaches the fitted distribution.
And S440, recalculating a predicted word vector by adopting the neural process model with the parameters adjusted until a loss function constructed by the difference between the predicted word vector and the M+1st sample word vector in the sample word vector sequence is smaller than or equal to the preset threshold.
And (3) recalculating the predicted word vector by adopting the neural process model with the adjusted parameters, namely repeating the steps S200 and S300 by adopting the neural process model with the adjusted parameters.
Further, referring to fig. 6, fig. 6 is a flowchart of a fifth embodiment of the text input prediction method according to the present application, based on the above embodiment, the step S100 includes:
step S110, obtaining training texts;
step S120, word segmentation is carried out on the training text according to the part of speech and word length to obtain a word group;
Word segmentation is the process of dividing a continuous character sequence into individual characters or character sequences. Part of speech (POS) is data reflecting the category to which a word belongs; the parts of speech comprise 12 categories, such as adjectives, prepositions and nouns. Word length is the number of characters contained in a word. Part of speech and word length can to a great extent influence the meaning of a word, and thereby influence the prediction of the next word.
Specifically, the computer device may perform word segmentation on the training text by using a preset word segmentation manner to obtain a plurality of characters or character sequences, where the characters or character sequences form word groups according to the sequence of occurrence of the characters or character sequences in the text. The computer equipment determines the part of speech corresponding to each word in the word sequence according to a preset vocabulary, and counts the word length corresponding to each word. The preset word segmentation mode can be word segmentation modes based on character matching, semantic understanding, punctuation mark division or statistics. The computer device may set a word length threshold for the words obtained by word segmentation such that the word length of each word obtained by word segmentation does not exceed the word length threshold.
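As one possible illustration of this segmentation step, the sketch below uses the open-source jieba tokenizer for part-of-speech tagging and enforces a word-length threshold; the choice of tokenizer and the threshold value are assumptions, not part of the disclosure:

```python
import jieba.posseg as pseg  # jieba is one possible Chinese segmenter; the application names none

def segment(text, max_word_len=4):
    """Segment text into (word, part-of-speech, word-length) triples,
    splitting any token longer than the word-length threshold."""
    words = []
    for token in pseg.cut(text):
        word, pos = token.word, token.flag
        if len(word) <= max_word_len:
            words.append((word, pos, len(word)))
        else:
            # enforce the word-length threshold by splitting overly long tokens
            for i in range(0, len(word), max_word_len):
                chunk = word[i:i + max_word_len]
                words.append((chunk, pos, len(chunk)))
    return words
```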
Step S130, converting N words in the training text into corresponding N sample word vectors according to a preset dictionary, and obtaining a sample word vector sequence.
The preset dictionary is a database or function prepared in advance by one skilled in the art; words are converted into corresponding word vectors according to the preset dictionary. Generally, the word vectors corresponding to semantically similar words are also similar.
The computer equipment vectorizes each word according to the content, the part of speech and the word length of the word to obtain a corresponding word vector of the word, thereby obtaining a sample word vector sequence consisting of sample word vectors. Wherein the computer device may utilize a neural network model to convert the word into a word vector, in this embodiment, the type of neural network model that converts the word into a word vector is not limited.
Further, referring to fig. 7, fig. 7 is a flowchart of a sixth embodiment of the text input prediction method according to the present application, based on the above embodiment, the step S600 includes:
step S610, according to the sequence of words corresponding to each target word vector in the target text, sequentially inputting the target word vectors into the prediction function to obtain predicted word vectors;
Before step S610, the user inputs character data into the computer device through the input unit, and the words in the target text are converted into corresponding target word vectors according to the preset dictionary to obtain the target word vector sequence; for this step, reference may be made to steps S120 to S130 in the fifth embodiment.
The prediction function is a trained distribution function that captures the logical rules of the context. Therefore, after the target word vector is input, the predicted word vector corresponding to the target word vector, i.e., the word vector with the largest probability of occurring after the target word vector, can be obtained by calculation.
Step S620, calculating the distance between the predicted word vector and each standard word vector in the preset dictionary;
word vectors are themselves high-dimensional vectors, typically in the tens of thousands to hundreds of thousands of dimensions. The distance between every two word vectors can be calculated through cosine similarity or Euclidean distance, and the closer the distance is, the higher the identity is. In this embodiment, the distance between word vectors is calculated using the euclidean distance. Euclidean distance refers to the arithmetic square root of the sum of squares of the difference in each dimension of the two word vectors.
Step S630, determining the standard word vector corresponding to the minimum value in the distance;
step S640, setting the predicted word as the word corresponding to the standard word vector.
Because the corresponding relation between the words and the word vectors is arranged in the preset dictionary, the words corresponding to the predicted word vectors can be obtained through the predicted word vectors, and the words are predicted words which are to be input after the target text is predicted after the target text is analyzed.
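A sketch of steps S620 to S640 follows, assuming the preset dictionary maps each word to a NumPy standard word vector:

```python
import numpy as np

def nearest_word(predicted_vec, dictionary):
    """Return the word whose standard word vector is closest, by Euclidean
    distance, to the predicted word vector."""
    best_word, best_dist = None, float("inf")
    for word, standard_vec in dictionary.items():            # preset dictionary: word -> vector
        dist = np.linalg.norm(predicted_vec - standard_vec)  # Euclidean distance (step S620)
        if dist < best_dist:                                 # track the minimum distance (step S630)
            best_word, best_dist = word, dist
    return best_word                                         # the predicted word (step S640)
```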
In addition, the embodiment of the application also provides a text processing device, which comprises a model trainer and a text processor, wherein the model trainer comprises:
the first vectorization module 11 is configured to obtain a training text, and convert N words in the training text into corresponding N sample word vectors according to a preset dictionary, so as to obtain a sample word vector sequence;
a first input module 12 for selecting M sample word vectors from the sequence of sample word vectors to input into a neural process model;
the first output module 13 is configured to output predicted word vectors corresponding to the M sample word vectors by using the neural process model;
a construction module 14, configured to construct a loss function according to the difference between the predicted word vector and the m+1st sample word vector in the sample word vector sequence;
the adjusting module 15 is configured to adjust parameters in the neural process model according to the loss function until the loss function meets a preset end condition;
the text processor includes:
the second vectorization module 21 is configured to obtain a target text, and convert the target text into a corresponding target word vector according to a preset dictionary;
a second calculation module 22, configured to input the target word vector into a prediction function in the neural process model, so as to obtain a predicted word.
Further, in yet another embodiment, the model trainer further includes a first calculation module 16, where the first calculation module 16 is configured to obtain a global latent variable according to the input sample word vector;
the first output module 13 is configured to obtain the corresponding predicted word vector according to the sample word vector and the global latent variable.
Further, in yet another embodiment, the first calculating module 16 is further configured to calculate the mean square error between the predicted word vector and the (M+1)-th sample word vector in the sample word vector sequence;
the construction module 14 is configured to construct the loss function according to the mean square error;
the model trainer further comprises a judging module 17 for judging whether the loss function is larger than a preset threshold;
the construction module 14 is further configured to, if the loss function is greater than the preset threshold, adjust the parameters in the neural process model according to the loss function by the adjustment module 15 until the loss function meets a preset end condition, and obtain a prediction function. .
Further, in yet another embodiment, the adjustment module 15 is further configured to perform convex function optimization on the loss function;
the adjustment module 15 is further configured to adjust parameters in the neural process model according to the convex function optimized loss function.
Further, in yet another embodiment, the first vectorization module 11 is further configured to obtain training text;
word segmentation is carried out on the training text according to the part of speech and word length to obtain a word group;
and converting N words in the training text into corresponding N sample word vectors according to a preset dictionary to obtain a sample word vector sequence.
Further, in yet another embodiment, the second vectorization module 21 is further configured to obtain a target text, and convert the words in the target text into corresponding target word vectors according to a preset dictionary, so as to obtain a target word vector sequence;
the text processor further includes:
the second input module 23 is configured to sequentially input the target word vectors into the prediction function according to the sequence of the words corresponding to the target word vectors in the target text;
a second output module 24, configured to obtain a predicted word vector according to the prediction function and the target word vector;
the second calculation module 22 is further configured to calculate a distance between the predicted word vector and each standard word vector in the preset dictionary;
determining the standard word vector corresponding to the minimum value in the distance;
the synthesis module 25 is configured to set the predicted word as the word corresponding to the standard word vector.
Furthermore, the embodiment of the application also provides a computer device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program is executed by the processor to realize the steps of the method.
The specific embodiments of the computer device of the present application are substantially the same as the embodiments of the text input prediction method described above, and will not be described herein.
Furthermore, the embodiments of the present application also propose a storage medium having stored thereon a computer program which, when executed by a processor, implements the operations of the embodiments of the method as described above.
The specific embodiments of the computer readable storage medium of the present application are substantially the same as the embodiments of the text input prediction method described above, and will not be described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, including several instructions for causing a computer device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
Claims (8)
1. A text input prediction method, comprising the steps of:
acquiring a training text, and converting N words in the training text into N corresponding sample word vectors according to a preset dictionary to obtain a sample word vector sequence;
selecting M sample word vectors from the sample word vector sequence to input a neural process model to obtain corresponding predicted word vectors, wherein M < N;
constructing a loss function according to the difference between the predicted word vector and the (M+1)-th sample word vector in the sample word vector sequence;
adjusting parameters in the neural process model according to the loss function until the loss function meets a preset ending condition to obtain a prediction function;
acquiring a target text, and converting the target text into a corresponding target word vector according to a preset dictionary;
inputting the target word vector into a prediction function in the neural process model to obtain a predicted word;
the step of selecting M sample word vectors from the sample word vector sequences to be input into a neural process model to obtain corresponding predicted word vectors comprises the following steps:
obtaining a global latent variable according to the input sample word vector;
obtaining the corresponding predicted word vector according to the sample word vector and the global latent variable;
the neural process model is:

$$p(y_{1:n} \mid x_{1:n}) = \int p(z) \prod_{i=1}^{n} \mathcal{N}\big(y_i \mid g(x_i, z),\, \sigma^2\big)\, dz$$

where $x_i$ is the input sample word vector, $y_i$ is the predicted word vector output for the input sample word vector $x_i$, $p(z)$ is a multivariate normal distribution, $z$ is the global latent variable, $g(x_i, z)$ is the prediction function, and $\sigma^2$ is random noise.
2. The text input prediction method of claim 1, wherein the step of constructing a loss function according to the difference between the predicted word vector and the (M+1)-th sample word vector in the sequence of sample word vectors comprises:
calculating the mean square error between the predicted word vector and the (M+1)-th sample word vector in the sample word vector sequence;
and constructing the loss function according to the mean square error.
3. The text input prediction method of claim 2, wherein the step of adjusting parameters in the neural process model according to the loss function until the loss function satisfies a preset end condition, to obtain a prediction function, comprises:
judging whether the loss function is larger than a preset threshold value or not;
if the loss function is larger than the preset threshold value, performing convex function optimization on the loss function;
adjusting parameters in the neural process model according to the convex function optimized loss function;
and recalculating a predicted word vector by adopting the neural process model with the parameters adjusted until a loss function constructed by the difference between the predicted word vector and the M+1th sample word vector in the sample word vector sequence is smaller than or equal to the preset threshold value.
4. The method of claim 1, wherein the step of obtaining training text and converting N words in the training text into corresponding N sample word vectors according to a predetermined dictionary to obtain a sample word vector sequence comprises:
acquiring a training text;
word segmentation is carried out on the training text according to the part of speech and word length to obtain a word group;
and converting N words in the training text into corresponding N sample word vectors according to a preset dictionary to obtain a sample word vector sequence.
5. The method of claim 1, wherein the step of inputting the target word vector into the predictive function to obtain a predicted word comprises:
sequentially inputting the target word vectors into the prediction function according to the sequence of words corresponding to the target word vectors in the target text to obtain predicted word vectors;
calculating the distance between the predicted word vector and each standard word vector in the preset dictionary;
determining the standard word vector corresponding to the minimum value in the distance;
and setting the predicted word as the word corresponding to the standard word vector.
6. A text processing apparatus comprising a model trainer and a text processor, the model trainer comprising:
the first vectorization module is used for acquiring a training text, converting N words in the training text into N corresponding sample word vectors according to a preset dictionary, and obtaining a sample word vector sequence;
the first input module is used for selecting M sample word vectors from the sample word vector sequences to input a neural process model;
the first output module is used for outputting predicted word vectors corresponding to the M sample word vectors through the neural process model;
the difference judging module is used for constructing a loss function according to the difference of the predicted word vector and the M+1st sample word vector in the sample word vector sequence;
the adjusting module is used for adjusting parameters in the neural process model according to the loss function until the loss function meets a preset ending condition to obtain a prediction function;
the text processor includes:
the second vectorization module is used for acquiring a target text and converting the target text into a corresponding target word vector according to a preset dictionary;
the second calculation module is used for inputting the target word vector into a prediction function in the neural process model to obtain a predicted word;
the first input module is further used for obtaining a global latent variable according to the input sample word vector; obtaining the corresponding predicted word vector according to the sample word vector and the global latent variable;
the neural process model is:

$$p(y_{1:n} \mid x_{1:n}) = \int p(z) \prod_{i=1}^{n} \mathcal{N}\big(y_i \mid g(x_i, z),\, \sigma^2\big)\, dz$$

where $x_i$ is the input sample word vector, $y_i$ is the predicted word vector output for the input sample word vector $x_i$, $p(z)$ is a multivariate normal distribution, $z$ is the global latent variable, $g(x_i, z)$ is the prediction function, and $\sigma^2$ is random noise.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor performs the steps of the method according to any one of claims 1 to 5.
8. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811256223.9A CN109635274B (en) | 2018-10-25 | 2018-10-25 | Text input prediction method, device, computer equipment and storage medium |
PCT/CN2018/122814 WO2020082561A1 (en) | 2018-10-25 | 2018-12-21 | Text input prediction method and apparatus, computer device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811256223.9A CN109635274B (en) | 2018-10-25 | 2018-10-25 | Text input prediction method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109635274A CN109635274A (en) | 2019-04-16 |
CN109635274B true CN109635274B (en) | 2023-10-27 |
Family
ID=66066742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811256223.9A Active CN109635274B (en) | 2018-10-25 | 2018-10-25 | Text input prediction method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109635274B (en) |
WO (1) | WO2020082561A1 (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112015859B (en) * | 2019-05-31 | 2023-08-18 | 百度在线网络技术(北京)有限公司 | Knowledge hierarchy extraction method and device for text, computer equipment and readable medium |
CN110188360B (en) * | 2019-06-06 | 2023-04-25 | 北京百度网讯科技有限公司 | Model training method and device |
CN110362742A (en) * | 2019-06-18 | 2019-10-22 | 平安普惠企业管理有限公司 | Curriculum information matching process, device, computer equipment and storage medium |
CN110415022B (en) * | 2019-07-05 | 2023-08-18 | 创新先进技术有限公司 | Method and device for processing user behavior sequence |
CN110955789B (en) * | 2019-12-31 | 2024-04-12 | 腾讯科技(深圳)有限公司 | Multimedia data processing method and equipment |
CN110795935A (en) * | 2020-01-06 | 2020-02-14 | 广东博智林机器人有限公司 | Training method and device for character word vector model, terminal and storage medium |
US12073819B2 (en) | 2020-06-05 | 2024-08-27 | Google Llc | Training speech synthesis neural networks using energy scores |
CN114201576A (en) * | 2020-09-17 | 2022-03-18 | 广东博智林机器人有限公司 | Neural network language model and character information prediction method and device |
CN113779241B (en) * | 2021-03-11 | 2025-04-15 | 北京沃东天骏信息技术有限公司 | Information acquisition method and device, computer readable storage medium, and electronic device |
CN112883185B (en) * | 2021-03-30 | 2024-08-16 | 中国工商银行股份有限公司 | Problem recommendation method and device based on machine learning |
CN113112007B (en) * | 2021-06-11 | 2021-10-15 | 平安科技(深圳)有限公司 | Method, device and equipment for selecting sequence length in neural network and storage medium |
CN113539246B (en) * | 2021-08-20 | 2022-10-18 | 贝壳找房(北京)科技有限公司 | Voice recognition method and device |
CN114089841B (en) * | 2021-11-23 | 2025-05-09 | 北京百度网讯科技有限公司 | Text generation method, device, electronic device and storage medium |
CN114218352B (en) * | 2021-11-29 | 2024-12-31 | 华能(浙江)能源开发有限公司清洁能源分公司 | Method, device, storage medium and electronic device for abnormal monitoring of power generation equipment |
CN114565085A (en) * | 2022-03-03 | 2022-05-31 | 上海艾瑞德生物科技有限公司 | Training method for concentration detection model, concentration detection method, device, electronic device and storage medium |
CN114880990B (en) * | 2022-05-16 | 2024-07-05 | 马上消费金融股份有限公司 | Punctuation mark prediction model training method, punctuation mark prediction method and punctuation mark prediction device |
CN115270125A (en) * | 2022-08-11 | 2022-11-01 | 江苏安超云软件有限公司 | IDS log classification prediction method, device, equipment and storage medium |
CN115600114A (en) * | 2022-09-30 | 2023-01-13 | 成都卫士通信息产业股份有限公司(Cn) | Model training method, similarity calculation method, device, equipment and medium |
CN117932280B (en) * | 2024-03-25 | 2024-06-25 | 之江实验室 | Long sequence data prediction method, long sequence data prediction device, computer equipment, medium and long sequence data prediction product |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001273293A (en) * | 2000-03-23 | 2001-10-05 | Nippon Telegr & Teleph Corp <Ntt> | Word estimation method and apparatus, and recording medium storing word estimation program |
CN107944014A (en) * | 2017-12-11 | 2018-04-20 | 河海大学 | A kind of Chinese text sentiment analysis method based on deep learning |
CN108334497A (en) * | 2018-02-06 | 2018-07-27 | 北京航空航天大学 | The method and apparatus for automatically generating text |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9653071B2 (en) * | 2014-02-08 | 2017-05-16 | Honda Motor Co., Ltd. | Method and system for the correction-centric detection of critical speech recognition errors in spoken short messages |
CN107247702A (en) * | 2017-05-05 | 2017-10-13 | 桂林电子科技大学 | A kind of text emotion analysis and processing method and system |
CN107705784B (en) * | 2017-09-28 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Text regularization model training method and device, and text regularization method and device |
CN108108428B (en) * | 2017-12-18 | 2020-05-12 | 苏州思必驰信息科技有限公司 | Method, input method and system for constructing language model |
- 2018
- 2018-10-25 CN CN201811256223.9A patent/CN109635274B/en active Active
- 2018-12-21 WO PCT/CN2018/122814 patent/WO2020082561A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001273293A (en) * | 2000-03-23 | 2001-10-05 | Nippon Telegr & Teleph Corp <Ntt> | Word estimation method and apparatus, and recording medium storing word estimation program |
CN107944014A (en) * | 2017-12-11 | 2018-04-20 | 河海大学 | A kind of Chinese text sentiment analysis method based on deep learning |
CN108334497A (en) * | 2018-02-06 | 2018-07-27 | 北京航空航天大学 | The method and apparatus for automatically generating text |
Also Published As
Publication number | Publication date |
---|---|
WO2020082561A1 (en) | 2020-04-30 |
CN109635274A (en) | 2019-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635274B (en) | Text input prediction method, device, computer equipment and storage medium | |
US11803731B2 (en) | Neural architecture search with weight sharing | |
CN109947919B (en) | Method and apparatus for generating text matching model | |
US11928601B2 (en) | Neural network compression | |
CN107526725B (en) | Method and device for generating text based on artificial intelligence | |
US11080589B2 (en) | Sequence processing using online attention | |
US20200265192A1 (en) | Automatic text summarization method, apparatus, computer device, and storage medium | |
US20150095017A1 (en) | System and method for learning word embeddings using neural language models | |
CN107180084B (en) | Word bank updating method and device | |
US20180046614A1 (en) | Dialogie act estimation method, dialogie act estimation apparatus, and medium | |
CN112329476B (en) | Text error correction method and device, equipment and storage medium | |
KR102315984B1 (en) | Event prediction device, prediction model generator and event prediction program | |
US20210209447A1 (en) | Information processing apparatus, control method, and program | |
CN110569505B (en) | A text input method and device | |
CN111639247A (en) | Method, apparatus, device and computer-readable storage medium for evaluating quality of review | |
CN108038208B (en) | Training method and device of context information recognition model and storage medium | |
CN112084769B (en) | Dependency syntax model optimization method, apparatus, device and readable storage medium | |
CN110222328B (en) | Method, device and equipment for labeling participles and parts of speech based on neural network and storage medium | |
JP7596549B2 (en) | Generating neural network outputs by enriching latent embeddings using self-attention and mutual attention operations | |
JP6243072B1 (en) | Input / output system, input / output program, information processing device, chat system | |
CN114627863A (en) | Speech recognition method and device based on artificial intelligence | |
CN111554276B (en) | Speech recognition method, device, equipment and computer readable storage medium | |
CN117744632B (en) | Method, device, equipment and medium for constructing vulnerability information keyword extraction model | |
CN111695591A (en) | AI-based interview corpus classification method, device, computer equipment and medium | |
CN109299246B (en) | Text classification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |