CN112987941B - Method and device for generating candidate words

Info

Publication number: CN112987941B
Application number: CN201911298337.4A
Authority: CN
Inventors: 刘世军
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2019-12-17
Filing date: 2019-12-17
Publication date: 2024-02-13
Anticipated expiration: 2039-12-17
Also published as: CN112987941A

Abstract

The invention discloses a method and a device for generating candidate words. The method comprises: receiving, in real time, a current input string entered by a user; searching a word stock for a candidate word corresponding to the current input string; if one exists, displaying that candidate word as the first candidate word; if not, segmenting the current input string into substrings; acquiring from the word stock the candidate words corresponding to each substring, together with their attributes; selecting one candidate word from the candidate words corresponding to each substring to form a target phrase; and displaying the target phrase as the first candidate word. In the scheme provided by the embodiment of the invention, the word stock is searched first when a candidate word is generated. Only when the word stock contains no matching candidate word is the current input string split into several substrings from which a target phrase is formed, with the attributes of the candidate words for each substring taken into account. This can effectively improve the accuracy with which the candidate word expected by the user is generated, and thereby the user's input efficiency.

Description

Method and device for generating candidate words

Technical Field

The invention relates to the technical field of input methods, in particular to a method and a device for generating candidate words.

Background

An input method is an encoding method for entering symbols into a computer or other device, and is an indispensable tool for communication between humans and computers. For Chinese input, the Pinyin input method is among the most widely used. An input method system usually records common words in a word stock. The system first searches the word stock according to the pinyin string entered by the user; if no match is found, it composes a word by a uniform procedure: the received pinyin string is segmented, and the word with the highest word frequency for each resulting substring is selected to form the word. For example, the user enters the pinyin string "zhoukoudianyizhi" (Zhoukoudian site), but no corresponding word is found in the word stock. The input method system then splits "zhoukoudianyizhi" into the two substrings "zhoukoudian" and "yizhi"; because the highest-frequency words for these substrings are "Zhoukoudian" and "always" (a homophone of "site"), the candidate word "Zhoukoudian always" is generated. As another example, when the user enters the pinyin string "dengdaichaoshi" (waiting for timeout), the input method system generates the candidate word "waiting for supermarket", since "supermarket" is a homophone of "timeout". Such candidate words are clearly not what the user intends; they reduce input efficiency and degrade the input experience.
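The frequency-only composition described above can be illustrated with a minimal Python sketch. The word stock contents, the English glosses standing in for the Chinese words, and all frequency values are invented for illustration; `compose_by_frequency` is a hypothetical name and does not refer to any existing input method system.

```python
# Toy word stock: pinyin substring -> list of (word gloss, word frequency).
# Glosses and frequencies are illustrative assumptions, not real data.
WORD_STOCK = {
    "zhoukoudian": [("Zhoukoudian", 900)],
    "yizhi": [("always", 800), ("consistent", 500), ("site", 100)],
}


def compose_by_frequency(substrings, word_stock):
    """Pick the highest-frequency word for each substring, ignoring context."""
    return [max(word_stock[s], key=lambda wf: wf[1])[0] for s in substrings]


print(compose_by_frequency(["zhoukoudian", "yizhi"], WORD_STOCK))
```

With these assumed frequencies the second substring resolves to "always" rather than "site", which is the kind of unintended candidate the background describes.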

Disclosure of Invention

The embodiment of the invention provides a method and a device for generating candidate words, which are used for improving the user input efficiency and improving the user input experience.

Therefore, the invention provides the following technical scheme:

a method of generating a candidate word, the method comprising:

receiving a current input string input by a user in real time;

searching whether candidate words corresponding to the current input string exist in a word stock or not, wherein each word in the word stock has an attribute mark;

if yes, the candidate word is displayed as a first candidate word;

if not, the current input string is segmented to obtain each substring;

acquiring each candidate word corresponding to each substring and the attribute thereof from the word stock;

selecting one candidate word from the candidate words corresponding to each substring to form a target word group, wherein the attributes of the candidate words in the target word group are related;

and displaying the target phrase as a first candidate word.

Optionally, selecting one candidate word from the candidate words corresponding to each substring to form the target word group includes:

following the order of the substrings in the current input string, selecting in turn one candidate word from the candidate words corresponding to each subsequent substring, according to the attribute of the candidate word corresponding to the preceding substring, to form the target word.

Optionally, the selecting one candidate word from the candidate words corresponding to the following substring in turn according to the attribute of the candidate word corresponding to the preceding substring to form the target word includes:

sequentially judging whether, among the candidate words corresponding to the subsequent substring, there is a candidate word related to the attribute of the candidate word corresponding to the preceding substring;

if so, selecting candidate words with related attributes from candidate words corresponding to the subsequent substring to form target words;

if not, selecting the candidate word with the highest word frequency from the candidate words corresponding to the subsequent substring to form a target word.

Optionally, before searching the word stock for a candidate word corresponding to the current input string, the method further includes:

acquiring the attribute of the on-screen word;

and according to the attribute of the on-screen word, performing word frequency adjustment on each candidate word corresponding to the current input string.

Optionally, the word frequency adjustment of each candidate word corresponding to the current input string according to the attribute of the on-screen word includes:

determining the weight of each candidate word corresponding to the current input string according to the similarity between the attribute of each candidate word corresponding to the current input string and the attribute of the on-screen word;

and determining the word frequency of each candidate word according to the weight of each candidate word.

Optionally, the method further comprises:

presetting a binary relation library, wherein the binary relation library is a word library formed by candidate words with association relation;

before each candidate word and the attribute thereof corresponding to each substring are obtained from the word stock, searching whether binary phrases corresponding to each substring exist in the binary relation library or not;

and if so, displaying the binary phrase as a first candidate word.

Optionally, the attribute includes: part of speech of the candidate word, and/or semantic description of the candidate word.

An apparatus for generating candidate words, the apparatus comprising:

the receiving module is used for receiving the current input string input by the user in real time;

the word stock searching module is used for searching whether candidate words corresponding to the current input string exist in a word stock or not, and each word in the word stock is provided with an attribute mark;

the display module is used for displaying the candidate word as a first candidate word when the word library searching module searches the candidate word corresponding to the current input string;

the segmentation module is used for segmenting the current input string to obtain each substring when the candidate word corresponding to the current input string is not found by the word library searching module;

the first acquisition module is used for acquiring each candidate word and attribute thereof corresponding to each substring from the word stock;

the word forming module is used for selecting one candidate word from the candidate words corresponding to each substring to form a target word group, wherein the attributes of the candidate words in the target word group are related;

the display module is further configured to display the target phrase as a first candidate word.

Optionally, the word forming module is specifically configured to sequentially select one candidate word from candidate words corresponding to a subsequent sub-string according to the sequence of each sub-string in the current input string and the attribute of the candidate word corresponding to the previous sub-string to form the target word.

Optionally, the word forming module includes:

a judging unit, configured to sequentially judge whether candidate words corresponding to the previous substring exist among candidate words corresponding to the next substring, where the candidate words are related to the attribute of the candidate word corresponding to the previous substring;

a candidate word selecting unit, configured to select, when the judging unit determines that there is a candidate word related to an attribute of a candidate word corresponding to a previous sub-string, a candidate word related to the attribute from candidate words corresponding to a subsequent sub-string to form a target word; when the judging unit judges that no candidate word related to the attribute of the candidate word corresponding to the previous sub-string exists, the candidate word with the highest word frequency is selected from the candidate words corresponding to the subsequent sub-string to form a target word.

Optionally, the apparatus further comprises:

the second acquisition module is used for acquiring the attribute of the on-screen word before the word library searching module searches whether the candidate word corresponding to the current input string exists in the word library;

and the word frequency adjustment module is used for adjusting the word frequency of each candidate word corresponding to the current input string according to the attribute of the on-screen word.

Optionally, the word frequency adjustment module includes:

the weight determining unit is used for determining the weight of each candidate word corresponding to the current input string according to the similarity between the attribute of each candidate word corresponding to the current input string and the attribute of the on-screen word;

and the word frequency determining unit is used for determining the word frequency of each candidate word according to the weight of each candidate word.

Optionally, the device is preset with a binary relation library, and the binary relation library is a word library formed by candidate words with association relation; the apparatus further comprises:

the binary relation library searching module is used for searching whether binary phrases corresponding to each substring exist in the binary relation library before the first obtaining module obtains each candidate word corresponding to each substring and the attribute thereof from the word library;

the display module is further configured to display the binary phrase as a first candidate word when the binary phrase corresponding to each sub-string is found by the binary relation library search module.

A computer device, comprising: one or more processors, memory;

the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions to implement the methods described above.

A readable storage medium having stored thereon instructions that are executed to implement the method described previously.

In the method for generating candidate words provided by the embodiment of the invention, an attribute mark is set for each word in the word stock. When a candidate word for the current input string is generated, the word stock is searched first. If the word stock contains no candidate word corresponding to the current input string, the current input string is split into several substrings and a target word, that is, the candidate word, is formed from them. Because the attributes of the candidate words for each substring are taken into account when the target word is formed, the generated candidate word better matches the user's expectation, which improves both input efficiency and the input experience.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.

FIG. 1 is a flow chart of a method of generating candidate words according to an embodiment of the invention;

FIG. 2 is another flow chart of a method of generating candidate words according to an embodiment of the invention;

FIG. 3 is yet another flow chart of a method of generating candidate words according to an embodiment of the invention;

FIG. 4 is a block diagram of an apparatus for generating candidate words according to an embodiment of the present invention;

FIG. 5 is another block diagram of an apparatus for generating candidate words according to an embodiment of the present invention;

FIG. 6 is a block diagram of yet another configuration of an apparatus for generating candidate words according to an embodiment of the present invention;

FIG. 7 is a block diagram illustrating an apparatus for a method of generating candidate words, according to an example embodiment;

FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

In order to make the solution of the embodiment of the present invention better understood by those skilled in the art, the embodiment of the present invention is further described in detail below with reference to the accompanying drawings and embodiments.

In order to improve the efficiency of inputting Chinese by a user by using a Chinese input method and improve the user input experience, the embodiment of the invention provides a method and a device for generating candidate words.

The method for generating candidate words provided by the embodiment of the invention is described below.

As shown in fig. 1, a flowchart of a method for generating candidate words according to an embodiment of the present invention includes the following steps:

step 101, receiving a current input string input by a user in real time.

In practice, a user may enter Chinese with a Chinese input method on a computer, mobile phone, or other device, for example with a Pinyin input method. Chinese input is usually performed phrase by phrase, for example by entering an input string, such as a pinyin string, that corresponds to a single phrase; to improve efficiency, a user may sometimes enter an input string that corresponds to several phrases.

It should be noted that the pinyin string is only one kind of input string and is used here merely as an example; the input string in the embodiment of the present invention may be any other type of character string.

In general, an input string consisting of a single phrase may be called a unary input string, one consisting of two phrases a binary input string, and one consisting of three or more phrases an N-ary (multi-element) input string. The embodiment of the invention does not limit the number of elements in the input string.

Step 102, searching whether candidate words corresponding to the current input string exist in a word stock, wherein each word in the word stock has an attribute mark; if so, executing step 103; otherwise, steps 104 to 107 are performed.

An input method system generally has a word stock in which a large number of words are recorded. Because the word stock is usually large, it can satisfy most of a user's Chinese input needs. Therefore, after receiving the current input string entered by the user, the input method system first searches the word stock for a candidate word corresponding to the current input string.

Unlike the prior art, in the embodiment of the invention, attribute marking is performed on candidate words in a word stock in advance. The attribute tags include, but are not limited to: parts of speech of candidate words, such as nouns, verbs, adjectives, etc.; semantic descriptions of candidate words, such as name of person, place name, color, industry, etc. It should be noted that, those skilled in the art may perform attribute labeling on candidate words in the word stock from different angles, and the added attribute may be a one-dimensional attribute, i.e. labeling from only one angle, or may be a multidimensional attribute, i.e. labeling from multiple angles. The embodiment of the invention does not limit the angle of the attribute mark and the dimension of the added attribute.
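One way to picture an attribute-marked word stock is the following Python sketch. The `Entry` layout, the `lookup` helper, the English glosses, and the frequency values are all assumptions made for illustration; the embodiment does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One word-stock entry with its attribute marks (hypothetical layout)."""
    word: str          # English gloss standing in for the Chinese word
    freq: int          # word frequency (invented value)
    attrs: frozenset   # one or more attribute marks, e.g. {"place name"}


# Input string -> entries. A multidimensional attribute is simply a larger set.
WORD_STOCK = {
    "zhoukoudian": [Entry("Zhoukoudian", 900, frozenset({"place name"}))],
    "yizhi": [
        Entry("site", 100, frozenset({"place name", "noun"})),
        Entry("always", 800, frozenset({"state"})),
    ],
}


def lookup(input_string, word_stock=WORD_STOCK):
    """Return the entries for an input string, most frequent first."""
    return sorted(word_stock.get(input_string, []), key=lambda e: -e.freq)


print([e.word for e in lookup("yizhi")])
print(lookup("zhoukoudianyizhi"))  # no entry for the whole string
```

An empty result from `lookup` corresponds to the "not found" branch of step 102.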

Step 103, displaying the candidate word as the first candidate word.

After searching in the word stock by utilizing the step 102, if a candidate word corresponding to the input string exists, the candidate word is displayed in the first position of the candidate column as a first candidate word.

Homophones are common in Chinese, so the candidate column often offers several candidate words for a single input string.

It will be appreciated that the closer the first candidate word is to the target word the user wishes to input, the more efficiently the user can input Chinese.

Step 104, segmenting the current input string to obtain each substring.

When no candidate word for the current input string is found in the word stock, the cause may be that the input string is too long, contains too many elements, or the like. To generate a candidate word for the current input string, the string can therefore first be reduced in length or in number of elements, that is, split into several substrings, so that a word can be formed from the candidate words corresponding to the individual substrings.

Segmentation algorithms for input strings are disclosed in the prior art; for example, the current input string may be segmented by syllable. The embodiment of the present invention does not limit the specific segmentation method, for which reference may be made to the prior art.
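Since the embodiment leaves the segmentation method open, the following Python sketch swaps in one well-known option, greedy forward maximum matching against the keys of the word stock. It is not the method of the embodiment; `segment` and the key set are hypothetical.

```python
def segment(input_string, known_keys):
    """Greedy forward maximum matching against the word-stock keys."""
    substrings = []
    pos = 0
    while pos < len(input_string):
        # Try the longest remaining prefix first, then shorter ones.
        for end in range(len(input_string), pos, -1):
            piece = input_string[pos:end]
            if piece in known_keys:
                substrings.append(piece)
                pos = end
                break
        else:
            # No key matches at this offset; a real system would need a fallback.
            raise ValueError(f"cannot segment at offset {pos}: {input_string!r}")
    return substrings


KEYS = {"zhou", "zhoukou", "zhoukoudian", "yizhi"}  # hypothetical keys
print(segment("zhoukoudianyizhi", KEYS))
```

Greedy matching never backtracks, so it can fail on strings that a more careful search would segment; that trade-off is accepted here to keep the sketch short.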

Step 105, each candidate word and its attribute corresponding to each substring are obtained from the word stock.

Because the words in the word stock are marked with the attributes in advance, after the current input string is segmented and a plurality of sub-strings are obtained, each candidate word corresponding to each sub-string and the attribute of each candidate word can be obtained by searching the word stock.

For example, the user enters the current input string "zhoukoudianyizhi" (Zhoukoudian site), and no candidate word corresponding to it is found in the word stock. The input string is therefore segmented by a segmentation algorithm into the two substrings "zhoukoudian" and "yizhi". The following candidate words and their attributes are then obtained by searching the word stock, as shown in the table below:

Candidate words for "zhoukoudian"	Zhoukoudian	Zhoukou	Zhou	shaft
Attribute	place name	place name	surname	mechanical component

Candidate words for "yizhi"	always	one by one	consistent	site
Attribute	state	quantifier	adjective	place name, noun

Step 106, selecting one candidate word from the candidate words corresponding to each substring to form a target word group, wherein the attributes of the candidate words in the target word group are related.

After each candidate word and the attribute thereof are obtained, a target word is formed by selecting one candidate word from the candidate words corresponding to each substring, and then the candidate word is generated.

It should be noted that, because each candidate word has an attribute mark, word formation can take into account the relatedness between the attributes of the candidate words. Attributes are related when they have some association with one another, for example when the candidate words share an attribute, or when the candidate words form a grammatical collocation (such as a verb-object phrase).

Continuing the "zhoukoudianyizhi" (Zhoukoudian site) example, the attribute of the candidate word "Zhoukoudian" corresponding to "zhoukoudian" is place name, and among the candidate words for "yizhi" the one most related to that attribute is "site". The candidate words "Zhoukoudian" and "site" are therefore selected from the candidates for the two substrings to form the target word; that is, the candidate word "Zhoukoudian site" is generated for the current input string "zhoukoudianyizhi".

Step 107, displaying the target phrase as the first candidate word.

As can be seen from the above, in the method for generating candidate words provided by the embodiment of the present invention, an attribute mark is set for each word in the word stock. When a candidate word for the current input string is generated, the word stock is searched first. If the word stock contains no candidate word corresponding to the current input string, the current input string is split into several substrings and a target word, that is, the candidate word, is formed from them. Because the attributes of the candidate words for each substring are taken into account when the target word is formed, the generated candidate word better matches the user's expectation, which improves both input efficiency and the input experience.

The "attribute related" condition mentioned in step 106 is explained below. For a binary input string, the attributes of the candidate words corresponding to the two substrings are either related or unrelated. For an N-ary input string with three or more elements, there are three cases: the attributes are fully related, partially related, or completely unrelated.

In one embodiment of the invention, following the order of the substrings in the current input string, one candidate word is selected in turn from the candidate words corresponding to each subsequent substring, according to the attribute of the candidate word corresponding to the preceding substring, to form the target word. This embodiment thus determines the candidate word for a subsequent substring from the attribute of the candidate word for the preceding substring, and candidate words can be selected in this way whether the input string is split into two substrings (a binary input string) or more (an N-ary input string).

In a specific implementation, it may be judged in sequence whether, among the candidate words corresponding to the subsequent substring, there is a candidate word related to the attribute of the candidate word corresponding to the preceding substring; if so, the attribute-related candidate word is selected from the candidate words corresponding to the subsequent substring to form the target word; if not, the candidate word with the highest word frequency is selected from the candidate words corresponding to the subsequent substring to form the target word.
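The sequential selection with a frequency fallback can be sketched in Python as follows. Several points are assumptions not fixed by the text: relatedness is reduced to sharing at least one attribute mark, the candidate for the first substring is taken by highest frequency, ties among related candidates are broken by frequency, and the glosses and frequencies are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str
    freq: int
    attrs: frozenset


def related(a, b):
    """Assumed relatedness test: the two entries share an attribute mark."""
    return bool(a.attrs & b.attrs)


def pick_next(prev, candidates):
    """Prefer candidates related to prev; otherwise fall back to frequency."""
    pool = [c for c in candidates if related(prev, c)] or candidates
    return max(pool, key=lambda c: c.freq)


def form_target(substrings, stock):
    """Walk the substrings in order, choosing one candidate for each."""
    chosen = [max(stock[substrings[0]], key=lambda c: c.freq)]  # assumption
    for s in substrings[1:]:
        chosen.append(pick_next(chosen[-1], stock[s]))
    return [c.word for c in chosen]


STOCK = {
    "zhoukoudian": [Entry("Zhoukoudian", 900, frozenset({"place name"}))],
    "yizhi": [
        Entry("always", 800, frozenset({"state"})),
        Entry("consistent", 500, frozenset({"adjective"})),
        Entry("site", 100, frozenset({"place name", "noun"})),
    ],
}
print(form_target(["zhoukoudian", "yizhi"], STOCK))
```

Because each choice depends only on the previously chosen candidate, the same loop handles two substrings or many, which mirrors the remark that the approach applies to binary and N-ary input strings alike.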

Typically, a user enters related words in a single input string, for example "tianqiyubao" (weather forecast) or "guominshengchanzongzhi" (gross national product). In practice, a user may also enter words with unrelated attributes in one input string, for example "huoguozhijin" (hot pot, paper towels). In the latter case, the candidate words for the split substrings may be unrelated (for a binary input string), or partially related or completely unrelated (for an N-ary input string). In the embodiment of the invention, the candidate word for each subsequent substring is determined from the attribute of the candidate word for the preceding substring, which in effect reduces the number of elements of the N-ary input string. A word selection scheme is given for both cases, namely where the candidate word for the subsequent substring is related to the candidate word for the preceding substring and where it is not: when the attributes are related, the attribute-related candidate word is preferred for word formation; when they are unrelated, the candidate word with the highest word frequency may be selected. This largely meets the user's needs.

As shown in FIG. 2, another flowchart of a method of generating candidate words according to an embodiment of the invention includes the steps of:

step 201, receiving a current input string input by a user in real time.

Step 202, obtaining the attribute of the on-screen words.

After the input method system receives the current input string input by the user, the attribute of the on-screen word corresponding to the previous input string can be determined by querying the word stock.

Step 203, performing word frequency adjustment on each candidate word corresponding to the current input string according to the attribute of the on-screen word.

In some scenarios, such as a student writing a paper or a clerk editing a document, the user makes several consecutive inputs, and the words entered in two successive inputs are very likely to be related. For example, when the user enters "Zhoukoudian site" in two inputs, the on-screen word corresponding to the previous input string is "Zhoukoudian" (attribute: place name) and the current input string is "yizhi". Word frequency adjustment is then performed on each candidate word corresponding to the current input string according to the attribute mark of the on-screen word "Zhoukoudian"; after the adjustment, "site" (attribute: place name), which is related to the attribute of "Zhoukoudian", becomes the first candidate word.

In one implementation, word frequency adjustment may be performed on each candidate word corresponding to the current input string as follows:

(1) And determining the weight of each candidate word corresponding to the current input string according to the similarity between the attribute of each candidate word corresponding to the current input string and the attribute of the on-screen word.

A large number of algorithms for calculating the similarity have been disclosed in the prior art, and the embodiment of the present invention may perform similarity calculation according to the prior art, for example, calculate the similarity according to the matching degree of the words describing the attributes, where a higher matching degree indicates a more similar attribute, and conversely, a lower matching degree indicates a less similar attribute. In one mode, a correspondence between the similarity and the weight may be preset, so that the weight of each candidate word corresponding to the current input string may be determined by searching the correspondence.

(2) And determining the word frequency of each candidate word according to the weight of each candidate word.

After the weight of each candidate word corresponding to the current input string is determined, the word frequency of each candidate word can be weighted through a weighting algorithm disclosed in the prior art, so that the purpose of adjusting the word frequency of each candidate word is achieved.
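The two sub-steps above can be sketched in Python under stated assumptions: attribute similarity is computed as Jaccard overlap of attribute sets, the preset similarity-to-weight correspondence is a small threshold table, and the adjusted word frequency is the original frequency multiplied by the weight. None of these specific choices, nor the numeric values, come from the embodiment.

```python
def jaccard(a, b):
    """Attribute similarity as set overlap (one possible matching degree)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Preset correspondence between similarity and weight (invented values),
# ordered from the highest similarity threshold to the lowest.
WEIGHT_TABLE = [(0.5, 10.0), (0.25, 3.0), (0.0, 1.0)]


def weight_for(similarity):
    """Look up the weight for a similarity value in the preset table."""
    for threshold, weight in WEIGHT_TABLE:
        if similarity >= threshold:
            return weight
    return 1.0


def adjust(candidates, on_screen_attrs):
    """Return (word, adjusted frequency) pairs, highest adjusted value first."""
    adjusted = [
        (word, freq * weight_for(jaccard(attrs, on_screen_attrs)))
        for word, freq, attrs in candidates
    ]
    return sorted(adjusted, key=lambda pair: -pair[1])


YIZHI = [
    ("always", 800, {"state"}),
    ("consistent", 500, {"adjective"}),
    ("site", 100, {"place name", "noun"}),
]
print(adjust(YIZHI, {"place name"}))  # on-screen word: "Zhoukoudian"
```

With these assumed values, "site" overtakes "always" after adjustment because it shares the place name attribute with the on-screen word.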

Step 204, searching whether candidate words corresponding to the current input string exist in a word stock, wherein each word in the word stock has an attribute mark; if so, execute step 205; otherwise, steps 206 to 209 are performed.

Step 205, displaying the candidate word as the first candidate word.

Step 206, segmenting the current input string to obtain each substring.

Step 207, obtaining each candidate word and its attribute corresponding to each substring from the word stock.

Step 208, selecting one candidate word from the candidate words corresponding to each substring to form a target word group, wherein the attributes of the candidate words in the target word group are related.

Step 209, displaying the target phrase as the first candidate word.

It should be noted that step 201 in the method embodiment shown in FIG. 2 is similar to step 101 in the method embodiment shown in FIG. 1, and steps 204 to 209 are similar to steps 102 to 107, respectively. For details, refer to the method embodiment shown in FIG. 1; they are not repeated here.

As can be seen from the above, in the method for generating candidate words provided in the embodiment of the present invention, an attribute mark is set for each word in the word stock. When a candidate word for the current input string is generated, word frequency adjustment is first performed on each candidate word corresponding to the current input string, using the attribute of the on-screen word corresponding to the previous input string. The word stock is then searched. If the word stock contains no candidate word corresponding to the current input string, the current input string is split into several substrings and a target word, that is, the candidate word, is formed from them. Because both the attributes of the candidate words for each substring and the relatedness between context words in consecutive inputs are taken into account, the generated candidate word better matches the user's expectation, which further improves input efficiency and the input experience.

As shown in fig. 3, there is still another flowchart of a method for generating candidate words according to an embodiment of the present invention, in which a binary relation library is preset, where the binary relation library is a word library formed by candidate words having an association relation.

In order to meet the specific input requirement of a user, a phrase formed by two words with association relations can be recorded in a binary relation library, so that expected candidate words can be quickly generated when the user inputs an input string corresponding to the phrase.

In one implementation, a large training corpus may be prepared in advance and word-segmented to obtain a number of segmented words; binary phrases are formed from the segmented words, the word frequency of each binary phrase is counted by a word frequency counting method of the prior art, and binary phrases whose word frequency exceeds a set value are recorded in the binary relation library.
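A minimal Python sketch of this construction follows. It assumes the corpus has already been word-segmented into token lists and that a binary phrase is a pair of adjacent tokens; `build_binary_library`, the toy corpus, and the threshold value are hypothetical.

```python
from collections import Counter


def build_binary_library(segmented_corpus, min_count):
    """Count adjacent word pairs and keep those seen more than min_count times."""
    counts = Counter()
    for tokens in segmented_corpus:
        # zip pairs each token with its right-hand neighbour.
        counts.update(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in counts.items() if n > min_count}


# Toy pre-segmented corpus (English glosses, invented for illustration).
CORPUS = [
    ["weather", "forecast", "today"],
    ["check", "weather", "forecast"],
    ["weather", "today"],
]
print(build_binary_library(CORPUS, min_count=1))
```

Only the pair ("weather", "forecast") occurs more than once in this toy corpus, so it is the only binary phrase recorded.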

It should be noted that this is merely a specific way to build the binary relational library, and should not be construed as limiting the scheme of the present invention.
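The corpus-based construction described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `build_binary_relation_library`, the `threshold` parameter, and the choice of pairing adjacent words are hypothetical and are not specified by this embodiment, and the corpus is assumed to have been segmented into words already.

```python
from collections import Counter


def build_binary_relation_library(corpus, threshold):
    """Count two-word phrases in a segmented corpus and keep the frequent ones.

    corpus: iterable of sentences, each a list of already-segmented words.
    threshold: the "set value"; only phrases counted more often are kept.
    """
    counts = Counter()
    for words in corpus:
        # Assumption: a binary phrase is a pair of adjacent word segments.
        counts.update(zip(words, words[1:]))
    # Record only the binary phrases whose frequency exceeds the set value.
    return {pair: n for pair, n in counts.items() if n > threshold}


corpus = [
    ["waiting", "timeout"],
    ["waiting", "timeout"],
    ["waiting", "supermarket"],
]
library = build_binary_relation_library(corpus, threshold=1)
```

With this toy corpus and a set value of 1, only the pair ("waiting", "timeout") is recorded, because it is the only pair that occurs more than once.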

As shown in fig. 3, this embodiment of the method includes the steps of:

Step 301, receiving a current input string input by a user in real time.

Step 302, searching whether a candidate word corresponding to the current input string exists in a word stock, wherein each word in the word stock has an attribute mark; if so, performing step 303; otherwise, performing steps 304 to 309.

Step 303, displaying the candidate word as a first candidate word.

Step 304, segmenting the current input string to obtain each substring.

Step 305, searching whether a binary phrase corresponding to each substring exists in the pre-established binary relation library; if so, performing step 306; otherwise, performing step 307.

Step 306, displaying the binary phrase as a first candidate word.

Step 307, obtaining each candidate word corresponding to each substring, and its attribute, from the word stock.

Step 308, selecting one candidate word from the candidate words corresponding to each substring to form a target phrase, wherein the attributes of the candidate words in the target phrase are related.

Step 309, displaying the target phrase as a first candidate word.

It should be noted that steps 301 to 304 and steps 307 to 309 in the method embodiment shown in fig. 3 are similar to steps 101 to 104 and steps 105 to 107 in the method embodiment shown in fig. 1, respectively. For specific details, refer to the method embodiment shown in fig. 1; they are not repeated here.
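The lookup order of steps 301 to 309 can be sketched in Python as follows. This is a minimal sketch, not the implementation of the embodiment: the names `segment` and `first_candidate`, the greedy longest-match segmentation, the layout of the word stock (input string mapped to a list of `(word, attribute, frequency)` tuples), the keying of the binary relation library by a tuple of substrings, and the `compose` callback that stands in for steps 307 and 308 are all assumptions made for illustration.

```python
def segment(input_string, lexicon):
    """Split the input into substrings by greedy longest match (assumed strategy)."""
    parts, i = [], 0
    while i < len(input_string):
        for j in range(len(input_string), i, -1):
            if input_string[i:j] in lexicon:
                parts.append(input_string[i:j])
                i = j
                break
        else:
            # No known substring starts here: keep the single character as a piece.
            parts.append(input_string[i])
            i += 1
    return parts


def first_candidate(input_string, lexicon, binary_library, compose):
    # Steps 302-303: a direct hit in the word stock is displayed first.
    # Assumption: among several hits, the most frequent word is chosen.
    entries = lexicon.get(input_string)
    if entries:
        return max(entries, key=lambda e: e[2])[0]
    # Step 304: otherwise segment the input string.
    substrings = segment(input_string, lexicon)
    # Steps 305-306: a binary phrase for the substrings takes priority.
    phrase = binary_library.get(tuple(substrings))
    if phrase is not None:
        return phrase
    # Steps 307-309: fall back to composing a target phrase from the word stock.
    return compose(substrings, lexicon)


lexicon = {
    "dengdai": [("waiting", "verb", 90)],
    "chaoshi": [("supermarket", "noun", 80), ("timeout", "verb", 40)],
}
binary_library = {("dengdai", "chaoshi"): "waiting timeout"}
result = first_candidate("dengdaichaoshi", lexicon, binary_library,
                         compose=lambda subs, lex: "+".join(subs))
```

In this toy setup the input "dengdaichaoshi" is not in the word stock, so it is segmented and resolved through the binary relation library; the `compose` callback is only reached when that library has no matching entry.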

As can be seen from the above, in the method for generating candidate words provided by this embodiment of the present invention, an attribute mark is set for each word in the word stock. When candidate words corresponding to the current input string are generated, the word stock is searched first. When no candidate word corresponding to the current input string exists in the word stock, the current input string is split into substrings, and the binary relation library is searched for a binary phrase corresponding to the substrings; if one exists, the binary phrase is directly displayed as the first candidate word. When no corresponding binary phrase exists in the binary relation library, a target phrase, namely the candidate word, is formed from the substrings, with the attribute of each substring taken into account. Because the binary phrases in the binary relation library already reflect the association relation between the substrings, the generated candidate words better meet the user's expectations, which further improves user input efficiency and the user input experience.

Correspondingly, an embodiment of the present invention also provides a device for generating candidate words, which is described in detail below.

Fig. 4 is a structural diagram of an apparatus for generating candidate words according to an embodiment of the present invention. The apparatus includes the following modules:

a receiving module 10, configured to receive a current input string input by a user in real time;

a word stock searching module 20, configured to search whether a candidate word corresponding to the current input string exists in a word stock, where each word in the word stock has an attribute mark;

a display module 30, configured to display the candidate word corresponding to the current input string as a first candidate word when the candidate word is found;

a segmentation module 40, configured to segment the current input string to obtain each substring when no candidate word corresponding to the current input string is found;

a first obtaining module 50, configured to obtain, from the word stock, each candidate word corresponding to each substring and its attribute;

a word forming module 60, configured to select one candidate word from the candidate words corresponding to each substring to form a target phrase, where the attributes of the candidate words in the target phrase are related;

the display module 30 is further configured to display the target phrase as a first candidate word.

As can be seen from the above, with the device for generating candidate words provided by this embodiment of the present invention, an attribute mark is set for each word in the word stock. When candidate words corresponding to the current input string are generated, the word stock is searched first. When no candidate word corresponding to the current input string exists in the word stock, the current input string is split into a plurality of substrings, and a target phrase, namely the candidate word, is formed from them. Because the attribute of each substring is considered when the target phrase is formed, the generated candidate words better meet the user's expectations, which improves user input efficiency and the user input experience.

In one embodiment of the present invention, the word forming module 60 is specifically configured to sequentially select, according to the sequence of each sub-string in the current input string, one candidate word from the candidate words corresponding to the following sub-string according to the attribute of the candidate word corresponding to the preceding sub-string, so as to form the target word.

In one implementation, the word forming module 60 may include the following elements:

a candidate word selecting unit, configured to: when the judging unit determines that a candidate word related to the attribute of the candidate word corresponding to the preceding substring exists, select that candidate word from the candidate words corresponding to the subsequent substring to form the target word; and when the judging unit determines that no such candidate word exists, select the candidate word with the highest word frequency from the candidate words corresponding to the subsequent substring to form the target word.
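The selection rule of the word forming module 60 can be sketched in Python as follows. This is an illustrative sketch only: the function name `form_target_phrase`, the representation of candidates as `(word, attribute, frequency)` tuples, the `related` set of attribute pairs used to decide whether two attributes are related, the choice of the most frequent candidate for the first substring and among several related candidates, and the assumption that every substring has at least one candidate are all hypothetical and are not specified by this embodiment.

```python
def form_target_phrase(substrings, lexicon, related):
    """Pick one candidate per substring, preferring attribute-related words.

    lexicon: substring -> list of (word, attribute, frequency) tuples.
    related: set of (previous_attribute, attribute) pairs treated as related.
    """
    chosen = []
    prev_attr = None
    for sub in substrings:
        candidates = lexicon[sub]  # assumed non-empty for every substring
        # Judging step: is any candidate related to the previous word's attribute?
        matches = [c for c in candidates
                   if prev_attr is not None and (prev_attr, c[1]) in related]
        # Selecting step: use related candidates if any, else all candidates,
        # and take the most frequent word from that pool.
        pool = matches or candidates
        word, attr, _freq = max(pool, key=lambda c: c[2])
        chosen.append(word)
        prev_attr = attr
    return " ".join(chosen)


lexicon = {
    "dengdai": [("waiting", "time", 90)],
    "chaoshi": [("supermarket", "place", 80), ("timeout", "time", 40)],
}
with_attributes = form_target_phrase(["dengdai", "chaoshi"], lexicon, {("time", "time")})
frequency_only = form_target_phrase(["dengdai", "chaoshi"], lexicon, set())
```

In this toy word stock, treating the attributes of "waiting" and "timeout" as related yields "waiting timeout", whereas an empty `related` set falls back to the highest word frequency and yields "waiting supermarket", the kind of result the Background describes as undesired.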

Fig. 5 is another block diagram of an apparatus for generating candidate words according to an embodiment of the present invention, corresponding to the method embodiment shown in fig. 2. On the basis of the apparatus embodiment shown in fig. 4, the apparatus further includes the following modules:

a second obtaining module 70, configured to obtain an attribute of the on-screen word before the word stock searching module 20 searches whether a candidate word corresponding to the current input string exists in the word stock;

a word frequency adjustment module 80, configured to perform word frequency adjustment on each candidate word corresponding to the current input string according to the attribute of the on-screen word.

In one implementation, the word frequency adjustment module 80 may include the following units:

Therefore, when candidate words corresponding to the current input string are generated, the attribute of the on-screen word corresponding to the previous input string is first used to adjust the word frequency of each candidate word corresponding to the current input string, and the word stock is then searched. When no candidate word corresponding to the current input string exists in the word stock, the current input string is split into a plurality of substrings, and a target phrase, namely the candidate word, is formed from them. Because both the attribute of each substring and the correlation between context words in continuous input are considered when the target phrase is formed, the generated candidate words better meet the user's expectations, which further improves user input efficiency and the user input experience.
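The word frequency adjustment based on the on-screen word can be sketched in Python as follows. The text here does not give the weighting scheme, so this is a sketch under loud assumptions: the function name `adjust_word_frequencies`, the `related` set of attribute pairs, the multiplicative `boost` weight, and its default value of 2.0 are all hypothetical.

```python
def adjust_word_frequencies(candidates, on_screen_attr, related, boost=2.0):
    """Re-rank candidates using the attribute of the previously committed word.

    candidates: list of (word, attribute, frequency) tuples.
    on_screen_attr: attribute of the on-screen word for the previous input string.
    related: set of (on_screen_attribute, candidate_attribute) pairs.
    boost: hypothetical weight applied to related candidates.
    """
    adjusted = []
    for word, attr, freq in candidates:
        # Assumed weighting: related candidates get the boost, others keep weight 1.
        weight = boost if (on_screen_attr, attr) in related else 1.0
        adjusted.append((word, attr, freq * weight))
    # The candidate with the highest adjusted frequency comes first.
    return sorted(adjusted, key=lambda c: c[2], reverse=True)


candidates = [("supermarket", "place", 80), ("timeout", "time", 50)]
ranked = adjust_word_frequencies(candidates, "time", {("time", "time")})
```

Here a candidate whose attribute is related to that of the on-screen word overtakes a more frequent but unrelated candidate; any other monotone weighting would serve the same illustrative purpose.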

Fig. 6 is a further structural block diagram of an apparatus for generating candidate words according to an embodiment of the present invention, corresponding to the method embodiment shown in fig. 3. The apparatus is preset with a binary relation library, which is a word library composed of candidate words that have an association relation. On the basis of the apparatus embodiment shown in fig. 4, the apparatus further includes the following module:

a binary relation library searching module 90, configured to search whether a binary phrase corresponding to each substring exists in the binary relation library before the first obtaining module 50 obtains each candidate word and its attribute corresponding to each substring from the word stock.

In this embodiment, the display module 30 is further configured to display the binary phrase as the first candidate word when the binary relation library searching module 90 finds the binary phrase corresponding to each substring.

Of course, the binary relation library searching module 90 described above is equally applicable to the embodiment shown in fig. 5.

As can be seen from the above, with the device for generating candidate words provided by this embodiment of the present invention, an attribute mark is set for each word in the word stock. When candidate words corresponding to the current input string are generated, the word stock is searched first. When no candidate word corresponding to the current input string exists in the word stock, the current input string is split into substrings, and the binary relation library is searched for a binary phrase corresponding to the substrings; if one exists, the binary phrase is directly displayed as the first candidate word. When no corresponding binary phrase exists in the binary relation library, a target phrase, namely the candidate word, is formed from the substrings, with the attribute of each substring taken into account. Because the binary phrases in the binary relation library already reflect the association relation between the substrings, the generated candidate words better meet the user's expectations, which further improves user input efficiency and the user input experience.

The specific manner in which the modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and is not described in detail here.

It should be noted that the method and the device for generating the candidate words in the embodiment of the invention can be applied to various terminal devices, such as mobile phones, computers, notebooks and other devices.

An embodiment of the present invention provides a computer apparatus including: one or more processors, memory; the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions to implement the foregoing method of generating candidate words.

Embodiments of the present invention provide a readable storage medium having instructions stored thereon that are executed to implement the foregoing method of generating candidate words.

Fig. 7 is a block diagram illustrating an apparatus 800 for a method of generating candidate words, according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.

Referring to fig. 7, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The power component 806 provides power to the various components of the device 800. The power component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.

The multimedia component 808 includes a screen between the device 800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor component 814 may detect an on/off state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor component 814 may also detect a change in position of the apparatus 800 or of one component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the memory 804 including instructions executable by the processor 820 of the apparatus 800 to perform the above-described method of generating candidate words. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

A non-transitory computer readable storage medium, which when executed by a processor of a mobile terminal, causes the mobile terminal to perform a key false touch error correction method, the method comprising: in the process of inputting by a user, obtaining pressing information when each key is triggered; determining false triggering keys according to the obtained pressing information; correcting errors of the false triggering keys; and determining each candidate word corresponding to the corrected complete input string.

Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 1900 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) that store applications 1942 or data 1944. The memory 1932 and the storage medium 1930 may be transitory or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.

The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

A non-transitory computer readable storage medium stores instructions that, when executed by a processor of an apparatus, cause the apparatus to perform the above-described method of generating candidate words.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims

1. A method of generating candidate words, the method comprising:

receiving a current input string input by a user in real time;

acquiring the attribute of the on-screen word;

determining word frequency of each candidate word according to the weight of each candidate word;

if yes, the candidate word is displayed as a first candidate word;

if not, the current input string is segmented to obtain each substring;

and displaying the target phrase as a first candidate word.

2. The method of generating candidate words according to claim 1, wherein selecting one candidate word from the candidate words corresponding to each substring to form the target word group comprises:

3. The method of generating candidate words according to claim 2, wherein sequentially selecting one candidate word from candidate words corresponding to a subsequent substring according to the attribute of the candidate word corresponding to the preceding substring to form a target word comprises:

4. The method of generating candidate words as defined in claim 1, further comprising:

and if so, displaying the binary phrase as a first candidate word.

5. The method of generating candidate words as defined in any one of claims 1 to 4, wherein the attributes comprise: part of speech of the candidate word, and/or semantic description of the candidate word.

6. An apparatus for generating candidate words, the apparatus comprising:

the second acquisition module is used for acquiring the attribute of the on-screen words;

the word frequency determining unit is used for determining the word frequency of each candidate word according to the weight of each candidate word;

7. The apparatus for generating candidate words according to claim 6, wherein the word forming module is specifically configured to sequentially select one candidate word from candidate words corresponding to a subsequent sub-string according to the attribute of the candidate word corresponding to a previous sub-string and according to the sequence of each sub-string in the current input string.

8. The apparatus for generating candidate words as defined in claim 7, wherein the word forming module comprises:

9. The apparatus for generating candidate words according to claim 6, wherein the apparatus is provided with a binary relation library in advance, the binary relation library being a word library composed of candidate words having an association relation; the apparatus further comprises:

10. The apparatus for generating candidate words according to any one of claims 6 to 9, wherein the attributes include: part of speech of the candidate word, and/or semantic description of the candidate word.

11. A computer device, comprising: one or more processors, memory;

the memory is for storing computer executable instructions and the processor is for executing the computer executable instructions to implement the method of any one of claims 1 to 5.

12. A readable storage medium having stored thereon instructions executable to implement the method of any of claims 1 to 5.