
CN113589947A - Data processing method and device and electronic equipment

Info

Publication number: CN113589947A
Application number: CN202010366774.1A
Authority: CN (China)
Prior art keywords: pinyin, sentence, target, sequence, syllable
Legal status: Granted (active)
Other languages: Chinese (zh)
Other versions: CN113589947B (en)
Inventor: 姚波怀
Current assignee: Beijing Sogou Technology Development Co Ltd
Original assignee: Beijing Sogou Technology Development Co Ltd
Application filed by Beijing Sogou Technology Development Co Ltd; priority to CN202010366774.1A
Publication of application: CN113589947A
Publication of grant: CN113589947B

Classifications

    • G06F3/0237: Character input methods using prediction or retrieval techniques (G Physics; G06F Electric digital data processing; G06F3/023 Arrangements for converting discrete items of information into a coded form; G06F3/0233 Character input methods)
    • G06F3/0236: Character input methods using selection techniques to select from displayed items (same ancestry as above)
    • G06F40/232: Orthographic correction, e.g. spell checking or vowelisation (G Physics; G06F40 Handling natural language data; G06F40/20 Natural language analysis)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Embodiments of the invention provide a data processing method, a data processing device, and an electronic device. The method comprises the following steps: acquiring an input sequence and input associated information; performing long sentence prediction based on the input associated information by using a sentence prediction model to obtain a plurality of sentence candidates; and screening the plurality of sentence candidates by using the input sequence to determine target sentence candidates. Because long sentence prediction combines the input sequence with the input associated information, the accuracy of long sentence prediction is improved and the input efficiency of the user is improved.

Description

Data processing method and device and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, and an electronic device.
Background
With the development of computer technology, electronic devices such as mobile phones and tablet computers have become increasingly popular, bringing great convenience to people's daily life, study, and work. These devices typically have an input method application (input method for short) installed, so that a user can input information through the input method.
During input, the input method can predict various types of candidates matching the input sequence, such as sentence candidates, name candidates, and association candidates, for the user to select from, thereby improving the user's input efficiency. In the prior art, however, the prediction of sentence candidates is not accurate, so the user's input requirements are not well met and the user's input efficiency is not effectively improved.
Disclosure of Invention
The embodiment of the invention provides a data processing method, which aims to improve the input efficiency of a user by improving the accuracy of long sentence prediction.
Correspondingly, the embodiment of the invention also provides a data processing device and electronic equipment, which are used for ensuring the realization and application of the method.
In order to solve the above problem, an embodiment of the present invention discloses a data processing method, which specifically includes: acquiring an input sequence and input associated information; long sentence prediction is carried out based on the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates; and screening the plurality of sentence candidates by adopting the input sequence to determine target sentence candidates.
Optionally, the input sequence includes a pinyin sequence, and the screening the plurality of sentence candidates by using the input sequence to determine target sentence candidates includes: analyzing the pinyin sequence to obtain a corresponding target syllable network; performing phonetic notation on each sentence candidate to obtain a phonetic notation sequence corresponding to each sentence candidate; and matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidates.
Optionally, the screening the plurality of sentence candidates by using the input sequence to determine a target sentence candidate includes: converting the input sequence into corresponding word candidates; and matching the word candidates with the sentence candidates to determine target sentence candidates.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: correcting errors of the pinyin sequence to obtain a corresponding error correction sequence; and analyzing the pinyin sequence and the error correction sequence to obtain a corresponding target syllable network.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier completely matched with the pinyin in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate comprises: and aiming at a target syllable path in the target syllable network, determining the pinyin of the pinyin identifier in the phonetic notation sequence and the sentence candidate matched with the pinyin prefix of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier matched with the pinyin prefix in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate comprises: and aiming at a target syllable path in the target syllable network, determining the sentence candidate of which the pinyin of the pinyin identifier in the phonetic notation sequence is completely matched with the pinyin of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
Optionally, the method further comprises the step of training a sentence prediction model: collecting corpora; sentence granularity division is carried out on the corpus to obtain training data; and/or performing word granularity division on the corpus to obtain training data; and training the sentence prediction model by adopting the training data.
Optionally, the performing long sentence prediction based on the input associated information by using a sentence prediction model to obtain a plurality of sentence candidates includes: long sentence prediction is carried out on the basis of a part of the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates; before the screening the sentence candidates using the input sequence to determine the target sentence candidate, the method further includes: a plurality of sentence candidates is filtered based on another part of the input associated information.
The embodiment of the invention also discloses a data processing device, which specifically comprises: the acquisition module is used for acquiring the input sequence and the input associated information; the prediction module is used for adopting a sentence prediction model to carry out long sentence prediction based on the input associated information to obtain a plurality of sentence candidates; and the first screening module is used for screening the sentence candidates by adopting the input sequence and determining target sentence candidates.
Optionally, the input sequence includes a pinyin sequence, and the first filtering module includes: the analysis submodule is used for analyzing the pinyin sequence to obtain a corresponding target syllable network; the phonetic notation submodule is used for phonetic notation of each sentence candidate to obtain a phonetic notation sequence corresponding to each sentence candidate; and the first matching submodule is used for matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate.
Optionally, the first screening module comprises: a conversion sub-module for converting the input sequence into corresponding word candidates; and the second matching submodule is used for matching the word candidates with all sentence candidates and determining target sentence candidates.
Optionally, the parsing sub-module includes: the error correction analysis unit is used for carrying out error correction on the pinyin sequence to obtain a corresponding error correction sequence; and analyzing the pinyin sequence and the error correction sequence to obtain a corresponding target syllable network.
Optionally, the parsing sub-module includes: the first analysis and conversion unit is used for analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier completely matched with the pinyin in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the first matching sub-module includes: and the first candidate determining unit is used for determining the pinyin of the pinyin identifier in the phonetic notation sequence and the sentence candidate matched with the pinyin prefix of the pinyin identifier corresponding to the target syllable path as the target sentence candidate aiming at a target syllable path in the target syllable network.
Optionally, the parsing sub-module includes: the second analysis and conversion unit is used for analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier matched with the pinyin prefix in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the first matching sub-module includes: and the second candidate determining unit is used for determining the pinyin of the pinyin identifier in the phonetic notation sequence and the sentence candidate of which the pinyin identifier corresponding to the target syllable path is completely matched as the target sentence candidate aiming at one target syllable path in the target syllable network.
Optionally, the method further comprises: the training module is used for collecting the linguistic data; sentence granularity division is carried out on the corpus to obtain training data; and/or performing word granularity division on the corpus to obtain training data; and training the sentence prediction model by adopting the training data.
Optionally, the prediction module is configured to perform long-sentence prediction based on a part of the input associated information by using a sentence prediction model to obtain a plurality of sentence candidates; the device further comprises: and a second filtering module, configured to filter the sentence candidates based on another part of the input associated information before the input sequence is used to filter the sentence candidates and determine a target sentence candidate.
An embodiment of the present invention also discloses an electronic device, including a memory, and one or more programs, where the one or more programs are stored in the memory, and configured to be executed by one or more processors, and the one or more programs include instructions for: acquiring an input sequence and input associated information; long sentence prediction is carried out based on the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates; and screening the plurality of sentence candidates by adopting the input sequence to determine target sentence candidates.
Optionally, the input sequence includes a pinyin sequence, and the screening the plurality of sentence candidates by using the input sequence to determine target sentence candidates includes: analyzing the pinyin sequence to obtain a corresponding target syllable network; performing phonetic notation on each sentence candidate to obtain a phonetic notation sequence corresponding to each sentence candidate; and matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidates.
Optionally, the screening the plurality of sentence candidates by using the input sequence to determine a target sentence candidate includes: converting the input sequence into corresponding word candidates; and matching the word candidates with the sentence candidates to determine target sentence candidates.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: correcting errors of the pinyin sequence to obtain a corresponding error correction sequence; and analyzing the pinyin sequence and the error correction sequence to obtain a corresponding target syllable network.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier completely matched with the pinyin in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate comprises: and aiming at a target syllable path in the target syllable network, determining the pinyin of the pinyin identifier in the phonetic notation sequence and the sentence candidate matched with the pinyin prefix of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier matched with the pinyin prefix in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate comprises: and aiming at a target syllable path in the target syllable network, determining the sentence candidate of which the pinyin of the pinyin identifier in the phonetic notation sequence is completely matched with the pinyin of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
Optionally, further comprising instructions for training a sentence prediction model: collecting corpora; sentence granularity division is carried out on the corpus to obtain training data; and/or performing word granularity division on the corpus to obtain training data; and training the sentence prediction model by adopting the training data.
Optionally, the performing long sentence prediction based on the input associated information by using a sentence prediction model to obtain a plurality of sentence candidates includes: long sentence prediction is carried out on the basis of a part of the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates; before the filtering the plurality of sentence candidates using the input sequence and determining the target sentence candidate, further comprising instructions for: a plurality of sentence candidates is filtered based on another part of the input associated information.
The embodiment of the invention also discloses a readable storage medium, and when the instructions in the storage medium are executed by a processor of the electronic equipment, the electronic equipment can execute the data processing method according to any one of the embodiments of the invention.
The embodiment of the invention has the following advantages:
in the embodiment of the invention, an input sequence and input associated information can be acquired; a sentence prediction model is then used to perform long sentence prediction based on the input associated information to obtain a plurality of sentence candidates, and the input sequence is used to screen these sentence candidates to determine target sentence candidates. Because long sentence prediction combines the input sequence with the input associated information, the accuracy of long sentence prediction is improved and the user's input efficiency is improved.
Drawings
FIG. 1 is a flow chart of the steps of one data processing method embodiment of the present invention;
FIG. 2 is a flow chart of the steps of one embodiment of a model training method of the present invention;
FIG. 3 is a flow chart of the steps of an alternative embodiment of a data processing method of the present invention;
FIG. 4 is a flow chart of the steps of yet another alternative embodiment of a data processing method of the present invention;
FIG. 5 is a flow chart of the steps of yet another alternative embodiment of a data processing method of the present invention;
FIG. 6 is a flow chart of the steps of yet another alternative embodiment of a data processing method of the present invention;
FIG. 7 is a block diagram of an embodiment of a data processing apparatus of the present invention;
FIG. 8 is a block diagram of an alternate embodiment of a data processing apparatus of the present invention;
FIG. 9 illustrates a block diagram of an electronic device for data processing in accordance with an exemplary embodiment;
fig. 10 is a schematic structural diagram of an electronic device for data processing according to another exemplary embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a data processing method according to the present invention is shown, which may specifically include the following steps:
and 102, acquiring an input sequence and input associated information.
In the embodiment of the invention, long sentence prediction can be carried out in the process of inputting the input sequence by the user, and corresponding sentence candidates are generated.
The embodiment of the invention can be applied to long sentence prediction under various input modes, for example, long sentence prediction in a stroke input scene, in a pinyin input scene, or in a voice input scene; the embodiment of the invention is not limited in this respect.
In addition, the embodiment of the invention can be applied to long sentence prediction in multiple language scenes, for example, Chinese input scenes, English input scenes, or Korean input scenes; the embodiment of the invention is not limited in this respect either.
Correspondingly, the input sequence may include a stroke sequence, a pinyin sequence, a foreign-language character string, and the like, which is not limited in the embodiment of the invention.
While the user is inputting through the input method, the input sequence entered by the user and the input associated information can be obtained, and corresponding sentence candidates are then predicted based on the obtained input sequence and the input associated information.
The input associated information may include information related to the input, such as the above information (i.e., the context already entered, described further below) and input environment information, which is not limited in this embodiment of the present invention.
In an example of the present invention, a manner of performing long sentence prediction based on the obtained input sequence and input related information may refer to the following steps 104 to 106:
and step 104, performing long sentence prediction based on the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates.
And 106, screening the plurality of sentence candidates by adopting the input sequence to determine target sentence candidates.
In the embodiment of the invention, a sentence prediction model can be trained in advance; and then long sentence prediction is carried out by adopting the trained sentence prediction model. Here, the training process of the sentence prediction model is explained in the following embodiments. The input associated information may be input into a sentence prediction model, and the sentence prediction model performs long-sentence prediction based on the input associated information to obtain a plurality of sentence candidates. And then, screening the plurality of sentence candidates by adopting the input sequence, and screening a target sentence candidate from the plurality of sentence candidates.
It should be noted that, the step 106 may be executed by the sentence prediction model or by another module, and the embodiment of the present invention is not limited thereto.
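As a rough illustration of steps 102 to 106, the following Python sketch (not part of the original disclosure) shows how a prediction module and a screening step could be wired together; the SentencePredictionModel interface and the matches callback are assumptions introduced only for illustration.

```python
from typing import Callable, List

class SentencePredictionModel:
    """Assumed interface of a trained long-sentence prediction model."""
    def predict(self, associated_info: str, top_k: int = 10) -> List[str]:
        raise NotImplementedError  # e.g. decoding from a neural language model

def predict_target_sentences(input_sequence: str,
                             associated_info: str,
                             model: SentencePredictionModel,
                             matches: Callable[[str, str], bool]) -> List[str]:
    # Step 104: long-sentence prediction driven by the input associated information only.
    sentence_candidates = model.predict(associated_info)
    # Step 106: screen candidates with the input sequence (matching strategy supplied by caller).
    return [s for s in sentence_candidates if matches(input_sequence, s)]
```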
In summary, in the embodiment of the present invention, an input sequence and input associated information may be obtained, a sentence prediction model is first used to perform long-sentence prediction based on the input associated information to obtain a plurality of sentence candidates, and then the input sequence is used to screen the plurality of sentence candidates to determine target sentence candidates; the long sentence prediction is carried out by combining the input sequence and the input associated information, so that the accuracy of the long sentence prediction is improved, and the input efficiency of a user is improved.
The following describes a training process of the sentence prediction model.
Referring to fig. 2, a flowchart illustrating steps of an embodiment of a model training method according to the present invention is shown, which may specifically include the following steps:
step 202, collecting corpora.
In the embodiment of the invention, corpora can be collected, training data is generated from the collected corpora, and the sentence prediction model is trained with the training data. The corpora may be collected in multiple ways: for example, sentences entered by users through the input method may be collected as corpora, or text, abstracts, and the like from web pages may be collected as corpora; the embodiment of the present invention is not limited in this regard.
Step 204, performing sentence granularity division on the corpus to obtain training data; and/or performing word granularity division on the corpus to obtain training data.
In the embodiment of the invention, the corpus can be divided to generate training data. One way to divide the corpus and generate training data is to divide the corpus at sentence granularity: the corpus is divided into a plurality of sentences, sentence by sentence, and then two adjacent, semantically related sentences are used as one group of training data, so that multiple groups of training data can be obtained. The sentences may include single sentences and compound sentences. A single sentence is a sentence composed of a phrase or a single word, from which no clause can be separated out; a clause is a syntactic unit that is structurally similar to a single sentence but lacks a complete sentence intonation. A compound sentence is composed of two or more clauses that are closely related in meaning and do not structurally contain one another.
In addition, in order to provide more comprehensive sentence candidates and thus a better input experience for the user, each compound sentence can be divided into a plurality of clauses, using punctuation marks as separators. Two adjacent clauses in each compound sentence can then be used as one group of training data to augment the training data generated above.
One way to divide the corpus and generate the training data may be: and performing word granularity division on the corpus to obtain training data. The word can be used as the granularity, and the corpus can be divided into a plurality of words; two words that are adjacent and semantically related are then employed as a set of training data.
In the embodiment of the invention, the word granularity can be determined by natural language processing, and the corpus is then divided into a plurality of words at that granularity. Alternatively, the word granularity can be determined from the user's screen-up (candidate commit) operations during input, and the corpus is divided into words accordingly; the embodiment of the present invention is not limited in this regard.
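By way of illustration, the following sketch splits a raw corpus into adjacent-pair training data at sentence granularity and, for compound sentences, at clause granularity; the punctuation sets are assumptions, and word-granularity division (which needs a word segmenter) is omitted.

```python
import re
from typing import List, Tuple

SENTENCE_END = re.compile(r'(?<=[。！？!?])')   # assumed sentence-ending punctuation
CLAUSE_SEP = re.compile(r'[，,；;、]')          # assumed clause-separating punctuation

def make_training_pairs(corpus: str) -> List[Tuple[str, str]]:
    """Sentence-granularity division: adjacent sentences form one group of training data;
    adjacent clauses within a compound sentence are added to augment the data."""
    sentences = [s.strip() for s in SENTENCE_END.split(corpus) if s.strip()]
    pairs: List[Tuple[str, str]] = list(zip(sentences, sentences[1:]))
    for sentence in sentences:
        clauses = [c for c in CLAUSE_SEP.split(sentence) if c]
        pairs.extend(zip(clauses, clauses[1:]))
    return pairs
```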
And step 206, training the sentence prediction model by using the training data.
The following description will take an example of training the sentence prediction model using a set of training data.
In an example of the present invention, when the training data is obtained by dividing the corpus at sentence granularity, each group of training data may include two sentences or two clauses. The following takes a group of training data comprising two sentences as an example. The former sentence in the group of training data can be input into the sentence prediction model, and the sentence prediction model performs forward computation to obtain a sentence candidate. The sentence candidate is then compared with the latter sentence in the group, and the weights of the sentence prediction model are adjusted. The sentence prediction model is trained with multiple groups of training data in this manner until a set ending condition is met.
In an embodiment of the present invention, when the training data is obtained by dividing the corpus at word granularity, each group of training data may include two groups of words. The former group of words in the training data can be input into the sentence prediction model, and the sentence prediction model performs forward computation to obtain word candidates. The word candidates are then compared with the latter group of words in the training data, and the weights of the sentence prediction model are adjusted. The sentence prediction model is trained with multiple groups of training data in this manner until a set ending condition is met.
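The training procedure described above could be sketched as follows; the forward, compare, and adjust_weights methods are placeholders for whatever model framework is actually used, and the ending condition shown (average loss below a threshold) is only one possible choice.

```python
def train_sentence_model(model, training_pairs, max_epochs=10, target_loss=0.1):
    """Schematic training loop: forward-compute on the former item of each pair,
    compare the prediction with the latter item, and adjust the model's weights
    until the set ending condition is met."""
    for _ in range(max_epochs):
        total_loss = 0.0
        for former, latter in training_pairs:
            prediction = model.forward(former)        # forward computation
            loss = model.compare(prediction, latter)  # e.g. cross-entropy against the latter item
            model.adjust_weights(loss)                # e.g. one back-propagation step
            total_loss += loss
        if total_loss / len(training_pairs) < target_loss:
            break
```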
In the embodiment of the present invention, there are several ways to screen the plurality of sentence candidates with the input sequence to determine the target sentence candidates. One way is to screen the plurality of sentence candidates with the syllable network corresponding to the pinyin sequence. The following description takes a pinyin sequence as the input sequence as an example:
referring to fig. 3, a flowchart illustrating steps of an alternative embodiment of the data processing method of the present invention is shown, which may specifically include the following steps:
step 302, obtaining a pinyin sequence and inputting associated information.
The pinyin sequence may include a single pinyin or a plurality of pinyins, which is not limited in this embodiment of the present invention. The inputting of the associated information may include: the above information and/or the input environment information may also include other information, which is not limited in the embodiments of the present invention; the above information may include interaction information and/or content in an edit box.
And step 304, performing long sentence prediction based on the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates.
When the above information comprises interaction information, the sentence prediction model can perform long sentence prediction based on the input associated information and/or the interaction information; long sentence prediction can thus be performed at the beginning of a sentence.
When the above information includes content in an edit box, the sentence prediction model may predict a next sentence of the content in the edit box based on the input environment information; further, long sentence prediction can be performed in a sentence or at the end of a sentence.
When the above information includes the content and the interaction information in the edit box, the sentence prediction model may predict a next sentence of the content in the edit box based on the input environment information and/or the interaction information; further, long sentence prediction can be performed in a sentence or at the end of a sentence.
Then, the pinyin sequence can be analyzed to obtain a corresponding target syllable network; and then screening a plurality of sentence candidates predicted by the sentence prediction model according to the target syllable network. Wherein, one way of analyzing the pinyin sequence to obtain the corresponding target syllable network may refer to steps 306 to 310:
step 306, resolving the pinyin sequence into multiple forms of pinyin.
In the embodiment of the present invention, the same pinyin sequence may correspond to multiple forms of pinyin. For example, for the pinyin sequence "fangan", the corresponding forms of pinyin may include "fang'an", "fan'gan", "fa'n'gan", and the like. Therefore, the pinyin sequence can be analyzed and resolved into multiple forms of pinyin, where each form of pinyin may include the pinyin of at least one syllable. For example, the form "fang'an" corresponds to a pinyin that includes two syllables, "fang" and "an"; the form "fan'gan" corresponds to a pinyin that includes two syllables, "fan" and "gan"; and the form "fa'n'gan" corresponds to a pinyin that includes three syllables, "fa", "n", and "gan".
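A minimal sketch of resolving a pinyin sequence into multiple forms is given below; the syllable inventory is deliberately tiny (just enough to reproduce the "fangan" example), whereas a real input method would use the full pinyin syllable table.

```python
from typing import List

# Illustrative syllable inventory; a real implementation would use the complete pinyin table.
VALID_SYLLABLES = {"fa", "fan", "fang", "gan", "an", "n"}

def parse_forms(pinyin_sequence: str) -> List[List[str]]:
    """Enumerate every segmentation of the sequence into valid syllables,
    e.g. "fangan" -> [["fa","n","gan"], ["fan","gan"], ["fang","an"]]."""
    if not pinyin_sequence:
        return [[]]
    forms: List[List[str]] = []
    for end in range(1, len(pinyin_sequence) + 1):
        head = pinyin_sequence[:end]
        if head in VALID_SYLLABLES:
            for rest in parse_forms(pinyin_sequence[end:]):
                forms.append([head] + rest)
    return forms
```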
Step 308, aiming at the target form of pinyin, converting the target form of pinyin into a pinyin identifier completely matched with the target form of pinyin to obtain a syllable path corresponding to the target form of pinyin.
And step 310, generating a target syllable network by adopting a plurality of syllable paths.
In the embodiment of the invention, one form of pinyin can be selected from multiple forms of pinyin as the pinyin of a target form; and then converting the target form of pinyin into a corresponding syllable path.
When the target form pinyin is converted into the corresponding syllable path, the embodiment of the invention can convert the target form pinyin into the pinyin identifier completely matched with the target form pinyin to obtain the syllable path corresponding to the target form pinyin.
Wherein one way to convert the target form of pinyin to a pinyin identifier that is a perfect match to the target form of pinyin is: for each syllable in the target form of pinyin, the pinyin for the syllable may be converted to a pinyin identifier that is a perfect match to the pinyin for the syllable. The complete matching may mean that the pinyin corresponding to the pinyin identifier is identical to the pinyin of the syllable in the pinyin of the target form.
The mapping relationship between pinyins and pinyin identifiers (such as pinyin IDs) can be queried to determine, for each syllable in the target form of pinyin, the pinyin identifier that completely matches the pinyin of that syllable. The pinyin identifiers of all syllables in this form of pinyin are then used as syllable nodes, and the syllable nodes are connected in the order of the syllables to obtain the syllable path corresponding to this form of pinyin. For example, for the form "fang'an", if the pinyin "fang" of the first syllable corresponds to pinyin identifier 100 and the pinyin "an" of the second syllable corresponds to pinyin identifier 20, the syllable path corresponding to "fang'an" is: 100 → 20. As another example, for the form "fa'n'gan", if the pinyin "fa" of the first syllable corresponds to identifier 88, the pinyin "n" of the second syllable corresponds to identifier 9, and the pinyin "gan" of the third syllable corresponds to identifier 200, the syllable path corresponding to "fa'n'gan" is: 88 → 9 → 200. That is, a syllable path may include at least two syllable nodes, and each syllable node may correspond to one pinyin identifier.
Then, generating a target syllable network corresponding to the pinyin sequence by adopting syllable paths corresponding to the pinyins in various forms; the target syllable network may then include a plurality of syllable paths. Of course, when the pinyin sequence can only be resolved to one form of pinyin, then the target syllable network may contain only one syllable path.
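Continuing the example, a syllable path can be represented as a list of pinyin identifiers and the target syllable network as the collection of such paths; the identifier values below follow the preceding paragraph, except the identifier for "fan", which is an assumed value added so the example covers all three forms.

```python
from typing import Dict, List

# Pinyin -> pinyin identifier mapping from the example above ("fan": 150 is assumed).
PINYIN_ID: Dict[str, int] = {"fang": 100, "an": 20, "fa": 88, "n": 9, "gan": 200, "fan": 150}

def form_to_syllable_path(form: List[str]) -> List[int]:
    """Exact matching: each syllable becomes a node carrying the identifier that
    completely matches its pinyin, e.g. ["fang", "an"] -> [100, 20]."""
    return [PINYIN_ID[syllable] for syllable in form]

def build_target_syllable_network(forms: List[List[str]]) -> List[List[int]]:
    """The target syllable network is the set of syllable paths, one per form of pinyin."""
    return [form_to_syllable_path(form) for form in forms]
```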
And step 312, performing phonetic notation on each sentence candidate to obtain a phonetic notation sequence corresponding to each sentence candidate.
In the embodiment of the invention, each sentence candidate predicted by the sentence prediction model can be phonetically annotated to obtain the phonetic notation sequence corresponding to each sentence candidate. For each sentence candidate, each character in the sentence candidate is annotated in turn to obtain the pinyin of each character. Then the mapping relationship between pinyins and pinyin identifiers (such as pinyin IDs) is queried, and the pinyin of each character in the sentence candidate is converted into the corresponding pinyin identifier, yielding the phonetic notation sequence corresponding to the sentence candidate. That is, each phonetic notation sequence may include a plurality of corresponding pinyin identifiers.
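A phonetic notation step could look like the sketch below, assuming a character-to-pinyin annotator such as the third-party pypinyin library and reusing the illustrative pinyin-identifier mapping; real candidates would need a complete mapping and handling of polyphonic characters.

```python
from typing import Dict, List
from pypinyin import lazy_pinyin  # third-party annotator, assumed available

def annotate_sentence(sentence: str, pinyin_id: Dict[str, int]) -> List[int]:
    """Phonetic notation sequence of a sentence candidate: one pinyin identifier per character."""
    return [pinyin_id[p] for p in lazy_pinyin(sentence)]

# Example (assuming "fang" and "an" are in the mapping): annotate_sentence("方案", PINYIN_ID) -> [100, 20]
```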
Then, the target syllable network can be matched with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate; reference may be made to step 314 as follows:
in the embodiment of the invention, each syllable path in a target syllable network can be respectively adopted to screen a plurality of sentence candidates output by the sentence prediction model; and determining the sentence candidates screened by adopting one syllable path in the target syllable network as target sentence candidates. The following description will be given taking an example of screening by using one syllable path in the target syllable network.
Step 314, aiming at a target syllable path in the target syllable network, determining the pinyin of the pinyin identifier in the phonetic notation sequence and the sentence candidate matched with the pinyin prefix of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
In the embodiment of the present invention, one syllable path selected from the target syllable network each time and used for screening the target sentence candidates may be referred to as a target syllable path.
In the embodiment of the invention, the number of words corresponding to the candidate sentences predicted by the sentence prediction model is greater than or equal to the number of syllables corresponding to the pinyin sequence; that is, the word number of the sentence candidate is greater than or equal to the number of syllable nodes corresponding to the target syllable path. Therefore, the number of syllable nodes corresponding to the target syllable path may be determined first, wherein for convenience of the following description, the number of syllable nodes corresponding to the target syllable path may be represented by N, where N is a positive integer. The pinyin identifications of the N syllable nodes of the target syllable path may then be matched with the pinyin identifications of the first N characters of each sentence candidate to screen the target sentence candidate from the plurality of sentence candidates. For each sentence candidate, the pinyin identifier of the ith character in the sentence candidate can be matched with the pinyin identifier of the ith syllable node in the target syllable path, i is a positive integer and the value range is 1-N.
Most users, when inputting a pinyin sequence, are used to typing only the first pinyin character or the first few pinyin characters of the target character. The embodiment of the invention can therefore provide sentence candidates related to the intended input even when the user has not entered the complete pinyin sequence, improving the user's input efficiency: sentence candidates whose pinyin (of the pinyin identifiers in the phonetic notation sequence) prefix-matches the pinyin of the pinyin identifiers corresponding to the target syllable path can be determined as target sentence candidates, which improves the comprehensiveness of the screened target sentence candidates.
The matching starts from the first pinyin identifier corresponding to the target syllable path. It is first judged whether the pinyin of that identifier contains only an initial consonant, or contains both an initial consonant and a final. If it contains both an initial and a final, it is judged whether the pinyin of the first pinyin identifier in the sentence candidate's phonetic notation sequence completely matches the pinyin of the first pinyin identifier corresponding to the target syllable path; if it matches completely, the second pinyin identifier in the phonetic notation sequence is matched against the second pinyin identifier of the target syllable path in the same way, and otherwise the sentence candidate is determined not to be a target sentence candidate. If the pinyin of the first pinyin identifier corresponding to the target syllable path contains only an initial consonant, it is judged whether the pinyin of the first pinyin identifier in the phonetic notation sequence matches that initial consonant; if it does, matching continues with the second pinyin identifier in the same way, and otherwise the sentence candidate is determined not to be a target sentence candidate. Matching proceeds in this manner until the pinyin of the (N-1)-th pinyin identifier in the phonetic notation sequence has been matched against the (N-1)-th pinyin identifier of the target syllable path. For the N-th pinyin identifier of the target syllable path, it is judged whether the pinyin of the N-th pinyin identifier in the phonetic notation sequence matches the pinyin of the N-th pinyin identifier of the target syllable path as a prefix, that is, whether the pinyin of the N-th pinyin identifier in the phonetic notation sequence contains the pinyin of the N-th pinyin identifier of the target syllable path. If it does, the sentence candidate is determined to be a target sentence candidate; otherwise, it is determined not to be a target sentence candidate.
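The matching rule described above can be condensed into the following sketch; the initial-consonant list and the identifier-to-pinyin mapping passed in are assumptions, and the code treats a path syllable that is exactly an initial consonant as the "initial only" case.

```python
from typing import Dict, List

INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
            "j", "q", "x", "r", "z", "c", "s", "y", "w")

def initial_of(pinyin: str) -> str:
    """Initial consonant of a pinyin syllable ('' for zero-initial syllables)."""
    for ini in INITIALS:
        if pinyin.startswith(ini):
            return ini
    return ""

def prefix_match_path(path: List[int], notation: List[int], id_to_pinyin: Dict[int, str]) -> bool:
    """Match a target syllable path against a candidate's phonetic notation sequence:
    exact (or initial-only) match on the first N-1 nodes, prefix match on the N-th node."""
    n = len(path)
    if len(notation) < n:
        return False
    for i in range(n - 1):
        path_py, cand_py = id_to_pinyin[path[i]], id_to_pinyin[notation[i]]
        if path_py in INITIALS:                       # only an initial consonant was typed
            if initial_of(cand_py) != path_py:
                return False
        elif cand_py != path_py:                      # initial + final: complete match required
            return False
    # N-th node: the candidate's pinyin only needs to start with the typed pinyin.
    return id_to_pinyin[notation[n - 1]].startswith(id_to_pinyin[path[n - 1]])
```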
In summary, in the embodiment of the present invention, the pinyin sequence is first analyzed into multiple forms of pinyin, and for a target form of pinyin, the target form of pinyin is converted into a pinyin identifier that is completely matched with the target form of pinyin, so as to obtain a syllable path corresponding to the target form of pinyin, and then a plurality of syllable paths are used to generate a target syllable network; then aiming at a target syllable path in the target syllable network, through the sentence candidate matched with the pinyin prefix of the pinyin identifier corresponding to the target syllable path and the pinyin identifier in the phonetic notation sequence, more comprehensive target sentence candidates are screened out, sentence candidates related to target input of the user are provided for the user, and the input efficiency of the user is improved.
Referring to fig. 4, a flowchart illustrating steps of another alternative embodiment of the data processing method of the present invention is shown, which may specifically include the following steps:
step 402, obtaining a pinyin sequence and inputting associated information.
And step 404, performing long sentence prediction by adopting a sentence prediction model based on the input associated information to obtain a plurality of sentence candidates.
The steps 402 to 404 are similar to the steps 302 to 304, and are not described herein again.
Wherein, one way of analyzing the pinyin sequence to obtain the corresponding target syllable network may refer to steps 406 to 410:
step 406, analyzing the pinyin sequence into multiple forms of pinyin.
This step 406 is similar to the step 306 described above and will not be described herein again.
Step 408, aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier matched with the pinyin prefix in the target form to obtain a syllable path corresponding to the pinyin in the target form.
In the embodiment of the invention, one form of pinyin can be selected from multiple forms of pinyin as the pinyin of a target form; the target form of pinyin is then converted to the corresponding syllable path. Wherein, the pinyin of one form can comprise pinyin of M syllables, and M is a positive integer.
Most users, when inputting a pinyin sequence, are used to typing only the first pinyin character or the first few pinyin characters of the target character. The embodiment of the invention can therefore provide sentence candidates related to the intended input even when the user has not entered the complete pinyin sequence, improving the user's input efficiency: when converting the target form of pinyin into the corresponding syllable path, the target form of pinyin is converted into the pinyin identifiers that prefix-match it, yielding the syllable path corresponding to the target form of pinyin. A more comprehensive target syllable network can thus be obtained, and more comprehensive sentence candidates can be screened out based on it.
One way of converting the target form of pinyin into pinyin identifiers that prefix-match it is as follows. The pinyin of the M-th syllable in the target form of pinyin is converted into the pinyin identifiers that prefix-match the pinyin of the M-th syllable, where prefix matching means that the pinyin corresponding to the pinyin identifier contains (begins with) the pinyin of that syllable in the target form. Among the first M-1 syllables of the target form, it is determined which syllables have pinyins containing both an initial consonant and a final, and which contain only an initial consonant; syllables whose pinyin contains both an initial and a final are converted into the pinyin identifiers that completely match their pinyin, and syllables whose pinyin contains only an initial consonant are converted into the pinyin identifiers that match that initial consonant.
The target form of pinyin can be converted into pinyin identifiers that prefix-match it by querying the mapping relationship between pinyins and pinyin identifiers (such as pinyin IDs). For example, if the pinyin of the M-th syllable in the target form is "h", the pinyins whose prefix matches "h" include "h", "hen", "he", "heng", "ha", and so on; then the pinyin identifier 99 corresponding to "h", the identifier 120 corresponding to "hen", the identifier 110 corresponding to "he", the identifier 122 corresponding to "heng", and the identifier 105 corresponding to "ha" are all determined as pinyin identifiers that prefix-match the pinyin of the M-th syllable. As another example, if the pinyin of a syllable among the first M-1 syllables of the target form is "he", only the identifier 110 corresponding to "he" is used as the pinyin identifier of that syllable.
Wherein, a syllable path can comprise a plurality of syllable nodes, and a syllable node can correspond to a pinyin identifier.
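A sketch of this prefix-expanding conversion follows; the mapping passed in is the same kind of illustrative pinyin-to-identifier table used earlier, and initial-only syllables among the first M-1 positions are simply kept as the identifier of the initial itself in this simplification.

```python
from typing import Dict, List

def prefix_matched_ids(partial: str, pinyin_id: Dict[str, int]) -> List[int]:
    """Identifiers of every pinyin whose spelling begins with the (possibly incomplete)
    last syllable, e.g. "h" -> identifiers of "h", "he", "hen", "heng", "ha", ..."""
    return [ident for pinyin, ident in pinyin_id.items() if pinyin.startswith(partial)]

def form_to_prefix_paths(form: List[str], pinyin_id: Dict[str, int]) -> List[List[int]]:
    """Convert one form of pinyin into syllable paths: exact identifiers for the first
    M-1 syllables, and one path per prefix-matched identifier for the M-th syllable."""
    head = [pinyin_id[s] for s in form[:-1]]
    return [head + [ident] for ident in prefix_matched_ids(form[-1], pinyin_id)]
```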
And step 410, generating a target syllable network by adopting a plurality of syllable paths.
This step 410 is similar to the step 310 described above, and will not be described herein again.
Step 412, performing phonetic notation on each sentence candidate to obtain a phonetic notation sequence corresponding to each sentence candidate.
This step 412 is similar to the step 312, and will not be described herein again.
And 414, aiming at a target syllable path in the target syllable network, determining the pinyin of the pinyin identifier in the phonetic notation sequence and the sentence candidate of which the pinyin of the pinyin identifier corresponding to the target syllable path is completely matched as the target sentence candidate.
In the embodiment of the invention, in the process of analyzing the pinyin sequence into the corresponding target syllable network, prefix matching conversion is already carried out on the pinyin of each form corresponding to the pinyin sequence. Therefore, in the process of adopting the target syllable network screening, aiming at a target syllable path, sentence candidates with the pinyin of the pinyin identifier in the phonetic notation sequence completely matched with the pinyin of the pinyin identifier corresponding to the target syllable path can be determined as target sentence candidates. For convenience of subsequent description, the number of syllables included in the target syllable path may be referred to as N, that is, the number of pinyin identifiers in the target syllable path; n is a positive integer.
Starting from the first pinyin identifier corresponding to the target syllable path, it is judged whether the pinyin of the first pinyin identifier in the sentence candidate's phonetic notation sequence completely matches the pinyin of the first pinyin identifier corresponding to the target syllable path. If it matches completely, the second pinyin identifier in the phonetic notation sequence is matched against the second pinyin identifier of the target syllable path in the same way; otherwise, the sentence candidate is determined not to be a target sentence candidate. Matching proceeds in this manner until the pinyin of the N-th pinyin identifier in the phonetic notation sequence is matched against the pinyin of the N-th pinyin identifier corresponding to the target syllable path. If the pinyin of the N-th pinyin identifier in the phonetic notation sequence completely matches the pinyin of the N-th pinyin identifier corresponding to the target syllable path, the sentence candidate can be determined as a target sentence candidate; otherwise, the sentence candidate is determined not to be a target sentence candidate.
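Because prefix expansion already happened while the network was built, the screening step in this embodiment reduces to an exact node-by-node comparison, as in the sketch below.

```python
from typing import List

def full_match_path(path: List[int], notation: List[int]) -> bool:
    """Every pinyin identifier on the target syllable path must equal the identifier
    at the same position in the candidate's phonetic notation sequence."""
    n = len(path)
    return len(notation) >= n and all(notation[i] == path[i] for i in range(n))
```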
In summary, in the embodiments of the present invention, the pinyin sequence may be firstly analyzed into multiple forms of pinyin; then converting the target form pinyin into a pinyin identifier matched with the target form pinyin prefix to obtain a syllable path corresponding to the target form pinyin, and generating a target syllable network by adopting a plurality of syllable paths; thereby obtaining a comprehensive target syllable network; and aiming at a target syllable path in the target syllable network, the pinyin of the pinyin identifier in the phonetic notation sequence is completely matched with the pinyin of the pinyin identifier corresponding to the target syllable path to obtain sentence candidates, so that more comprehensive target sentence candidates can be screened, sentence candidates related to target input of the user are provided, and the input efficiency of the user is improved.
In another embodiment of the present invention, another way of determining the target sentence candidates by screening the plurality of sentence candidates using the input sequence may be to screen the plurality of sentence candidates using word candidates corresponding to the input sequence to determine the target sentence candidates.
Referring to fig. 5, a flowchart illustrating steps of yet another alternative embodiment of a data processing method of the present invention is shown.
Step 502, obtaining an input sequence and input associated information.
And step 504, performing long sentence prediction by adopting a sentence prediction model based on the input associated information to obtain a plurality of sentence candidates.
This step 504 is similar to the step 304, and is not described herein again.
In an embodiment of the present invention, a manner of screening the sentence candidates by using the input sequence to determine the target sentence candidate may refer to steps 506 to 508:
step 506, converting the input sequence into corresponding word candidates.
And step 508, matching the word candidates with the sentence candidates to determine target sentence candidates.
When the input sequence is a pinyin sequence, the pinyin sequence can be analyzed to obtain a corresponding target syllable network; and then converting the pinyin sequence into corresponding word candidates based on the target syllable network. For a way of analyzing the pinyin sequence to obtain the corresponding target syllable network, reference may be made to the above steps 406 to 408, which are not described herein again. When the target syllable network comprises a plurality of syllable paths, each syllable path can be converted into a corresponding word candidate; wherein each syllable path may be converted into at least one word candidate.
Of course, when the input sequence is another sequence or a foreign language character string, the input sequence may be converted into a corresponding word candidate in another manner, which is not limited in the embodiment of the present invention.
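The following is a minimal sketch of converting syllable paths into word candidates. It assumes a hypothetical flat lexicon keyed by syllable-path tuples; the embodiment does not specify the lookup structure, and a real input method would typically use a much larger dictionary or a language model, so every name and entry below is illustrative only.

```python
# Hypothetical lexicon mapping a syllable path (tuple of pinyin identifiers)
# to the words it can spell.
LEXICON = {
    ("wo",): ["我"],
    ("wo", "xiang"): ["我想"],
    ("xiang", "chu"): ["想出"],
}

def paths_to_word_candidates(syllable_paths):
    """Convert each syllable path of the target syllable network into word candidates."""
    candidates = []
    for path in syllable_paths:
        # each known syllable path contributes at least one word candidate
        candidates.extend(LEXICON.get(tuple(path), []))
    return candidates

print(paths_to_word_candidates([["wo", "xiang"], ["xiang", "chu"]]))  # ['我想', '想出']
```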
In the embodiment of the present invention, when the input sequence is converted into a plurality of word candidates, each word candidate corresponding to the input sequence may be matched with each sentence candidate to determine the target sentence candidates. One word candidate is selected as the target word candidate from the plurality of word candidates corresponding to the input sequence, and the number of characters of the target word candidate is determined as Y, where Y is a positive integer. The Y characters of the target word candidate are then matched with the first Y characters of each sentence candidate: starting from the first character of the target word candidate, each character is matched in turn with the character at the corresponding position in the sentence candidate, until the Y-th character of the target word candidate is matched with the Y-th character of the sentence candidate. A sentence candidate whose first Y characters match the target word candidate is determined to be a target sentence candidate.
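A minimal sketch of this first-Y-character screening follows; the function name and the example strings are assumptions for illustration only.

```python
def screen_by_word_candidate(target_word, sentence_candidates):
    """Keep the sentence candidates whose first Y characters equal the target word candidate."""
    y = len(target_word)                      # Y = number of characters in the target word candidate
    targets = []
    for sentence in sentence_candidates:
        # the first Y characters of the sentence candidate must equal the target word candidate
        if len(sentence) >= y and sentence[:y] == target_word:
            targets.append(sentence)
    return targets

print(screen_by_word_candidate("I want", ["I want to go out to play",
                                          "he wants to go out"]))   # ['I want to go out to play']
```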
In summary, in the embodiment of the present invention, the input sequence may be converted into corresponding word candidates, and the word candidates are then matched with each sentence candidate to determine the target sentence candidates; the sentence candidates are thus screened directly, without phonetic notation, which improves screening efficiency.
The user may make input errors during input. Therefore, in an optional embodiment of the present invention, the parsing the pinyin sequence to obtain a corresponding target syllable network includes: correcting errors in the pinyin sequence to obtain a corresponding error correction sequence; and parsing the pinyin sequence and the error correction sequence to obtain the corresponding target syllable network. In this way, sentence candidates that hit the user's requirement can be given even when the user's input contains errors, which further improves input efficiency and user experience. The syllable network obtained by parsing the error correction sequence together with the pinyin sequence is likewise referred to as the target syllable network. The manner of parsing the error correction sequence into the corresponding target syllable network is similar to the manner of parsing the pinyin sequence, and is not repeated here.
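The patent does not specify how the error correction sequence is produced, so the following sketch only illustrates one plausible approach under explicit assumptions: a hypothetical keyboard-adjacency table is used to generate single-character substitution variants of the typed pinyin sequence, and each variant would then be parsed into the target syllable network together with the original sequence.

```python
# Hypothetical, heavily truncated keyboard-adjacency table; a real table would cover all keys.
ADJACENT_KEYS = {"q": "wa", "w": "qes", "x": "zcs"}

def correction_sequences(pinyin_sequence, max_variants=10):
    """Generate single-substitution error-correction variants of the typed pinyin sequence."""
    variants = []
    for i, ch in enumerate(pinyin_sequence):
        for repl in ADJACENT_KEYS.get(ch, ""):
            variants.append(pinyin_sequence[:i] + repl + pinyin_sequence[i + 1:])
            if len(variants) >= max_variants:
                return variants
    return variants

print(correction_sequences("xiangchuq"))   # e.g. ['ziangchuq', 'ciangchuq', 'siangchuq', ...]
```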
Of course, when the input sequence is other sequences or foreign language character strings, the input sequence can also be corrected to obtain an error correction sequence; the plurality of sentence candidates are then screened based on the input sequence and the error correction sequence to determine target sentence candidates.
Referring to fig. 6, a flowchart illustrating the steps of yet another alternative embodiment of a data processing method of the present invention is shown.
Step 602, acquiring an input sequence and input associated information.
And step 604, performing long sentence prediction based on a part of the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates.
In the embodiment of the invention, a part of the input associated information can be input into the sentence prediction model, and then the sentence prediction model carries out long sentence prediction based on the part of the input associated information to obtain a plurality of sentence candidates.
When the input associated information includes the above information, the above information may be divided into two parts: a first part and a second part, the first part preceding the second part. The first part and the other information in the input associated information are then input into the sentence prediction model. For example, the information before the punctuation mark closest to the end of the text may be taken as the first part, and the information after that punctuation mark as the second part; for the above information "good weather, I", "good weather" is the first part and "I" is the second part.
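A minimal sketch of this split is shown below, assuming that "the punctuation closest to the end" means the last punctuation mark in the above information; the function name and the punctuation set are assumptions for illustration.

```python
import re

def split_above_info(above_text):
    """Split the above information into (first part, second part) at the last punctuation mark."""
    matches = list(re.finditer(r"[,，。.!?！？;；]", above_text))
    if not matches:
        return above_text, ""                 # no punctuation: everything is the first part
    cut = matches[-1].end()
    return above_text[:cut].rstrip(), above_text[cut:].lstrip()

print(split_above_info("good weather, I"))    # ('good weather,', 'I')
```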
Step 606, a plurality of sentence candidates are filtered based on another part of the input association information.
Then, the other part of the input associated information is used to screen the plurality of sentence candidates output by the sentence prediction model, and the sentence candidates that completely match the other part of the input associated information are retained. For example, continuing the above example, the sentence candidates output by the sentence prediction model include: "I want to go out to play", "we go out to play with a bar", "he wants to go out to play", and "you want to go out to play with"; the second part of the above information, "I", may be used to retain the sentence candidates "I want to go out to play" and "we go out to play with a bar" (in the original Chinese, both of these candidates begin with the same character as the second part).
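The following sketch illustrates this first screening stage under the assumption that "completely matched" means the candidate begins with the second part of the above information; the Chinese strings in the example are hypothetical back-translations of the candidates quoted above, included only to show why both retained candidates share the same leading character.

```python
def screen_by_second_part(second_part, sentence_candidates):
    """Retain the sentence candidates that begin with the second part of the above information."""
    return [c for c in sentence_candidates if c.startswith(second_part)]

# Hypothetical example strings (back-translations of the candidates in the text above).
print(screen_by_second_part("我", ["我想出去玩", "我们出去玩吧", "他想出去玩"]))
# ['我想出去玩', '我们出去玩吧']
```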
And 608, adopting the input sequence to filter the filtered sentence candidates again to determine target sentence candidates.
Then, on the basis of the screening with the other part of the input associated information, the input sequence is used to screen the already-screened sentence candidates again, and the target sentence candidates are selected.
For example, continuing the above example, if the pinyin sequence input by the user is "xiangchuq", then "I want to go out to play" may be selected as the target sentence candidate from the sentence candidates "I want to go out to play" and "we go out to play with a bar" obtained by screening with the other part of the input associated information.
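A minimal sketch of this second screening stage is given below. `annotate_pinyin` is a hypothetical helper that returns the flat pinyin annotation of the part of a candidate that follows the already-matched second part; in the embodiment itself this comparison is performed through the target syllable network rather than a plain string prefix test.

```python
def screen_by_pinyin(pinyin_sequence, candidates, annotate_pinyin):
    """Keep the candidates whose pinyin annotation starts with the typed pinyin sequence."""
    kept = []
    for cand in candidates:
        # e.g. "xiangchuq" is a prefix of the annotation "xiangchuquwan"
        if annotate_pinyin(cand).startswith(pinyin_sequence):
            kept.append(cand)
    return kept
```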
In addition, when the sentence prediction model outputs each sentence candidate, it may also output a candidate score for that sentence candidate, so that the target sentence candidates can later be sorted by their candidate scores and presented according to the sorted result.
When there is a large amount of input associated information, only a part of it is input into the sentence prediction model to obtain the plurality of sentence candidates output by the sentence prediction model, which reduces the computational load of the model. Therefore, after the target sentence candidates are obtained, they may be sorted based on the complete input associated information together with the candidate scores of the sentence candidates, which improves the accuracy of the ranking of the target sentence candidates. Of course, the target sentence candidates may also be sorted based on the complete input associated information alone, which is not limited in the embodiment of the present invention.
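The ranking step can be sketched as follows; the optional `rescore` callback stands in for re-scoring with the complete input associated information, which the embodiment mentions but does not detail, so its signature is an assumption.

```python
def rank_candidates(scored_candidates, full_context, rescore=None):
    """scored_candidates: list of (sentence, model_score) pairs; returns sentences, best first."""
    if rescore is not None:
        # optionally re-score each target sentence candidate with the complete context
        scored_candidates = [(s, rescore(s, full_context)) for s, _ in scored_candidates]
    return [s for s, _ in sorted(scored_candidates, key=lambda p: p[1], reverse=True)]
```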
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 7, a block diagram of a data processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
an obtaining module 702, configured to obtain an input sequence and input association information;
a prediction module 704, configured to perform long-sentence prediction based on the input associated information by using a sentence prediction model to obtain a plurality of sentence candidates;
a first filtering module 706, configured to filter the sentence candidates by using the input sequence, and determine a target sentence candidate.
Referring to fig. 8, a block diagram of an alternative embodiment of a data processing apparatus of the present invention is shown.
In an alternative embodiment of the present invention, the input sequence includes a pinyin sequence, and the first filtering module 706 includes: the analysis submodule 7062 is configured to analyze the pinyin sequence to obtain a corresponding target syllable network; the phonetic notation submodule 7064 is configured to perform phonetic notation on each sentence candidate to obtain a phonetic notation sequence corresponding to each sentence candidate; a first matching sub-module 7066, configured to match the target syllable network with the phonetic notation sequence of each sentence candidate, and determine a target sentence candidate.
In an optional embodiment of the present invention, the first filtering module 706 includes: a conversion submodule 7068, configured to convert the input sequence into corresponding word candidates; and a second matching sub-module 70610, configured to match the word candidates with the sentence candidates, and determine target sentence candidates.
In an alternative embodiment of the present invention, the parsing sub-module 7062 includes: an error correction analysis unit 70622, configured to perform error correction on the pinyin sequence to obtain a corresponding error correction sequence; and analyzing the pinyin sequence and the error correction sequence to obtain a corresponding target syllable network.
In an alternative embodiment of the present invention, the parsing sub-module 7062 includes: a first parsing and converting unit 70624, configured to parse the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier completely matched with the pinyin in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
In an optional embodiment of the present invention, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the first matching sub-module 7066 includes: a first candidate determining unit 70662, configured to determine, for a target syllable path in the target syllable network, a sentence candidate in which a pinyin identified by a pinyin identifier in a ZhuYin sequence matches a pinyin prefix of a pinyin identifier corresponding to the target syllable path, as a target sentence candidate.
In an alternative embodiment of the present invention, the parsing sub-module 7062 includes: a second parsing and converting unit 70626, configured to parse the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier matched with the pinyin prefix in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
In an optional embodiment of the present invention, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the first matching sub-module 7066 includes: a second candidate determining unit 70664, configured to determine, for a target syllable path in the target syllable network, a sentence candidate in which the pinyin for the pinyin identifier in the ZhuYin sequence completely matches the pinyin for the pinyin identifier corresponding to the target syllable path, as a target sentence candidate.
In an optional embodiment of the present invention, the method further comprises: a training module 708 for collecting corpora; sentence granularity division is carried out on the corpus to obtain training data; and/or performing word granularity division on the corpus to obtain training data; and training the sentence prediction model by adopting the training data.
In an optional embodiment of the present invention, the prediction module 704 is configured to perform long-sentence prediction based on a part of the input associated information by using a sentence prediction model to obtain a plurality of sentence candidates; the device further comprises: a second filtering module 710, configured to filter the sentence candidates based on another part of the input association information before the sentence candidates are filtered by using the input sequence to determine the target sentence candidate.
In summary, in the embodiment of the present invention, an input sequence and input associated information may be obtained, a sentence prediction model is first used to perform long-sentence prediction based on the input associated information to obtain a plurality of sentence candidates, and then the input sequence is used to screen the plurality of sentence candidates to determine target sentence candidates; and furthermore, long sentence prediction is carried out by combining the input sequence and the input associated information, so that the accuracy of the long sentence prediction is improved, and the input efficiency of a user is improved.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
Fig. 9 is a block diagram illustrating an architecture of an electronic device 900 for data processing in accordance with an example embodiment. For example, the electronic device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 9, electronic device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls overall operation of the electronic device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing element 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the device 900. Examples of such data include instructions for any application or method operating on the electronic device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 906 provides power to the various components of the electronic device 900. Power components 906 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 900.
The multimedia components 908 include a screen that provides an output interface between the electronic device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 900 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 900 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.
I/O interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the electronic device 900. For example, the sensor component 914 may detect the open/closed state of the device 900 and the relative positioning of components, such as the display and keypad of the electronic device 900; it may also detect a change in the position of the electronic device 900 or of a component of the electronic device 900, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and a change in the temperature of the electronic device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices. The electronic device 900 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 904 comprising instructions, executable by the processor 920 of the electronic device 900 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a data processing method, the method comprising: acquiring an input sequence and input associated information; long sentence prediction is carried out based on the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates; and screening the plurality of sentence candidates by adopting the input sequence to determine target sentence candidates.
Optionally, the input sequence includes a pinyin sequence, and the screening the plurality of sentence candidates by using the input sequence to determine the target sentence candidates includes: analyzing the pinyin sequence to obtain a corresponding target syllable network; performing phonetic notation on each sentence candidate to obtain a phonetic notation sequence corresponding to each sentence candidate; and matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidates.
Optionally, the screening the plurality of sentence candidates by using the input sequence to determine a target sentence candidate includes: converting the input sequence into corresponding word candidates; and matching the word candidates with the sentence candidates to determine target sentence candidates.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: correcting errors of the pinyin sequence to obtain a corresponding error correction sequence; and analyzing the pinyin sequence and the error correction sequence to obtain a corresponding target syllable network.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier completely matched with the pinyin in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate comprises: and aiming at a target syllable path in the target syllable network, determining the pinyin of the pinyin identifier in the phonetic notation sequence and the sentence candidate matched with the pinyin prefix of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier matched with the pinyin prefix in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate comprises: and aiming at a target syllable path in the target syllable network, determining the sentence candidate of which the pinyin of the pinyin identifier in the phonetic notation sequence is completely matched with the pinyin of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
Optionally, the method further comprises the step of training a sentence prediction model: collecting corpora; sentence granularity division is carried out on the corpus to obtain training data; and/or performing word granularity division on the corpus to obtain training data; and training the sentence prediction model by adopting the training data.
Optionally, the performing long sentence prediction based on the input associated information by using a sentence prediction model to obtain a plurality of sentence candidates includes: long sentence prediction is carried out on the basis of a part of the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates; before the screening the sentence candidates using the input sequence to determine the target sentence candidate, the method further includes: a plurality of sentence candidates is filtered based on another part of the input associated information.
Fig. 10 is a schematic structural diagram of an electronic device 1000 for data processing according to another exemplary embodiment of the present invention. The electronic device 1000 may be a server, whose configuration and performance may vary considerably; it may include one or more Central Processing Units (CPUs) 1022 (e.g., one or more processors), a memory 1032, and one or more storage media 1030 (e.g., one or more mass storage devices) storing an application 1042 or data 1044. The memory 1032 and the storage medium 1030 may be transient or persistent storage. The program stored on the storage medium 1030 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processor 1022 may be configured to communicate with the storage medium 1030 and execute, on the server, the series of instruction operations in the storage medium 1030.
The server may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, one or more keyboards 1056, and/or one or more operating systems 1041, such as Windows Server(TM), Mac OS X(TM), Unix(TM), Linux(TM), FreeBSD(TM), and the like.
An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: acquiring an input sequence and input associated information; performing long sentence prediction based on the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates; and screening the plurality of sentence candidates by adopting the input sequence to determine target sentence candidates.
Optionally, the input sequence includes a pinyin sequence, and the screening the plurality of sentence candidates by using the input sequence to determine the target sentence candidates includes: analyzing the pinyin sequence to obtain a corresponding target syllable network; performing phonetic notation on each sentence candidate to obtain a phonetic notation sequence corresponding to each sentence candidate; and matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidates.
Optionally, the screening the plurality of sentence candidates by using the input sequence to determine a target sentence candidate includes: converting the input sequence into corresponding word candidates; and matching the word candidates with the sentence candidates to determine target sentence candidates.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: correcting errors of the pinyin sequence to obtain a corresponding error correction sequence; and analyzing the pinyin sequence and the error correction sequence to obtain a corresponding target syllable network.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier completely matched with the pinyin in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate comprises: and aiming at a target syllable path in the target syllable network, determining the pinyin of the pinyin identifier in the phonetic notation sequence and the sentence candidate matched with the pinyin prefix of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
Optionally, the analyzing the pinyin sequence to obtain a corresponding target syllable network includes: analyzing the pinyin sequence into multiple forms of pinyin; aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier matched with the pinyin prefix in the target form to obtain a syllable path corresponding to the pinyin in the target form; and generating a target syllable network by adopting a plurality of syllable paths.
Optionally, the syllable path includes a plurality of syllable nodes, one syllable node corresponds to one pinyin identifier, and the phonetic notation sequence includes pinyin identifiers of a plurality of characters; the matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate comprises: and aiming at a target syllable path in the target syllable network, determining the sentence candidate of which the pinyin of the pinyin identifier in the phonetic notation sequence is completely matched with the pinyin of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
Optionally, further comprising instructions for training a sentence prediction model: collecting corpora; sentence granularity division is carried out on the corpus to obtain training data; and/or performing word granularity division on the corpus to obtain training data; and training the sentence prediction model by adopting the training data.
Optionally, the performing long sentence prediction based on the input associated information by using a sentence prediction model to obtain a plurality of sentence candidates includes: long sentence prediction is carried out on the basis of a part of the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates; before the filtering the plurality of sentence candidates using the input sequence and determining the target sentence candidate, further comprising instructions for: a plurality of sentence candidates is filtered based on another part of the input associated information.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The data processing method, the data processing apparatus and the electronic device provided by the present invention are described in detail above, and specific examples are applied herein to illustrate the principles and embodiments of the present invention, and the descriptions of the above embodiments are only used to help understand the method and the core ideas of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A data processing method, comprising:
acquiring an input sequence and input associated information;
long sentence prediction is carried out based on the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates;
and screening the plurality of sentence candidates by adopting the input sequence to determine target sentence candidates.
2. The method of claim 1, wherein the input sequence comprises a pinyin sequence, and wherein the employing the input sequence to filter the sentence candidates to determine target sentence candidates comprises:
analyzing the pinyin sequence to obtain a corresponding target syllable network;
phonetic notation is carried out on each sentence candidate to obtain a phonetic notation sequence corresponding to each sentence candidate;
and matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate.
3. The method of claim 1, wherein the filtering the plurality of sentence candidates using the input sequence to determine target sentence candidates comprises:
converting the input sequence into corresponding word candidates;
and matching the word candidates with the sentence candidates to determine target sentence candidates.
4. The method of claim 2, wherein the parsing the pinyin sequence to obtain a corresponding target syllable network comprises:
correcting errors of the pinyin sequence to obtain a corresponding error correction sequence;
and analyzing the pinyin sequence and the error correction sequence to obtain a corresponding target syllable network.
5. The method of claim 2, wherein the parsing the pinyin sequence to obtain a corresponding target syllable network comprises:
analyzing the pinyin sequence into multiple forms of pinyin;
aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier completely matched with the pinyin in the target form to obtain a syllable path corresponding to the pinyin in the target form;
and generating a target syllable network by adopting a plurality of syllable paths.
6. The method of claim 5, wherein the syllable path comprises a plurality of syllable nodes, one syllable node corresponding to one pinyin identifier, and the ZhuYin sequence comprises pinyin identifiers for a plurality of words;
the matching the target syllable network with the phonetic notation sequence of each sentence candidate to determine the target sentence candidate comprises:
and aiming at a target syllable path in the target syllable network, determining the pinyin of the pinyin identifier in the phonetic notation sequence and the sentence candidate matched with the pinyin prefix of the pinyin identifier corresponding to the target syllable path as the target sentence candidate.
7. The method of claim 2, wherein the parsing the pinyin sequence to obtain a corresponding target syllable network comprises:
analyzing the pinyin sequence into multiple forms of pinyin;
aiming at the pinyin in the target form, converting the pinyin in the target form into a pinyin identifier matched with the pinyin prefix in the target form to obtain a syllable path corresponding to the pinyin in the target form;
and generating a target syllable network by adopting a plurality of syllable paths.
8. A data processing apparatus, comprising:
the acquisition module is used for acquiring the input sequence and the input associated information;
the prediction module is used for adopting a sentence prediction model to carry out long sentence prediction based on the input associated information to obtain a plurality of sentence candidates;
and the first screening module is used for screening the sentence candidates by adopting the input sequence and determining target sentence candidates.
9. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring an input sequence and input associated information;
long sentence prediction is carried out based on the input associated information by adopting a sentence prediction model to obtain a plurality of sentence candidates;
and screening the plurality of sentence candidates by adopting the input sequence to determine target sentence candidates.
10. A readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method according to any of method claims 1-7.
CN202010366774.1A 2020-04-30 2020-04-30 Data processing method and device and electronic equipment Active CN113589947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010366774.1A CN113589947B (en) 2020-04-30 2020-04-30 Data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010366774.1A CN113589947B (en) 2020-04-30 2020-04-30 Data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113589947A true CN113589947A (en) 2021-11-02
CN113589947B CN113589947B (en) 2024-08-09

Family

ID=78237624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010366774.1A Active CN113589947B (en) 2020-04-30 2020-04-30 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113589947B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114063792A (en) * 2020-08-03 2022-02-18 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458681A (en) * 2007-12-10 2009-06-17 株式会社东芝 Voice translation method and voice translation apparatus
JP2010257201A (en) * 2009-04-24 2010-11-11 Nec Corp Input prediction device, portable terminal, input predicting method, and program
CN102866782A (en) * 2011-07-06 2013-01-09 哈尔滨工业大学 Input method and input method system for improving sentence generating efficiency
CN103578464A (en) * 2013-10-18 2014-02-12 威盛电子股份有限公司 Language model building method, speech recognition method and electronic device
CN105607753A (en) * 2015-12-15 2016-05-25 上海嵩恒网络科技有限公司 Long sentence input method and long sentence input system for five strokes
CN107688398A (en) * 2016-08-03 2018-02-13 中国科学院计算技术研究所 Determine the method and apparatus and input reminding method and device of candidate's input
CN110221707A (en) * 2018-03-01 2019-09-10 北京搜狗科技发展有限公司 A kind of English input method, device and electronic equipment
CN110673748A (en) * 2019-09-27 2020-01-10 北京百度网讯科技有限公司 Method and device for providing candidate long sentences in input method
CN110874145A (en) * 2018-08-30 2020-03-10 北京搜狗科技发展有限公司 Input method and device and electronic equipment

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458681A (en) * 2007-12-10 2009-06-17 株式会社东芝 Voice translation method and voice translation apparatus
JP2010257201A (en) * 2009-04-24 2010-11-11 Nec Corp Input prediction device, portable terminal, input predicting method, and program
CN102866782A (en) * 2011-07-06 2013-01-09 哈尔滨工业大学 Input method and input method system for improving sentence generating efficiency
CN103578464A (en) * 2013-10-18 2014-02-12 威盛电子股份有限公司 Language model building method, speech recognition method and electronic device
CN105607753A (en) * 2015-12-15 2016-05-25 上海嵩恒网络科技有限公司 Long sentence input method and long sentence input system for five strokes
CN107688398A (en) * 2016-08-03 2018-02-13 中国科学院计算技术研究所 Determine the method and apparatus and input reminding method and device of candidate's input
US20180302350A1 (en) * 2016-08-03 2018-10-18 Tencent Technology (Shenzhen) Company Limited Method for determining candidate input, input prompting method and electronic device
CN110221707A (en) * 2018-03-01 2019-09-10 北京搜狗科技发展有限公司 A kind of English input method, device and electronic equipment
CN110874145A (en) * 2018-08-30 2020-03-10 北京搜狗科技发展有限公司 Input method and device and electronic equipment
CN110673748A (en) * 2019-09-27 2020-01-10 北京百度网讯科技有限公司 Method and device for providing candidate long sentences in input method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MINGHUI JIANG: "Approximation Algorithms for Predicting RNA Secondary Structures with Arbitrary Pseudoknots", IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, vol. 7, no. 2, 31 December 2010 (2010-12-31), pages 323 - 332, XP058264871, DOI: 10.1109/TCBB.2008.109 *
张玮; 孙乐; 冯元勇; 李文波; 黄瑞红: "Application of lexical collocation and user models in pinyin input methods" (词汇搭配和用户模型在拼音输入法中的应用), Journal of Chinese Information Processing (中文信息学报), no. 04, 15 July 2007 (2007-07-15), pages 107 - 112 *
胡雨晴; 纪明宇; 王晨龙: "A sentence similarity calculation method based on dependency syntax" (基于依存句法的句子相似度计算方法), Intelligent Computer and Applications (智能计算机与应用), no. 04, 1 April 2020 (2020-04-01), pages 123 - 128 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114063792A (en) * 2020-08-03 2022-02-18 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment
CN114063792B (en) * 2020-08-03 2025-04-25 北京搜狗科技发展有限公司 Data processing method, device and electronic equipment

Also Published As

Publication number Publication date
CN113589947B (en) 2024-08-09

Similar Documents

Publication Publication Date Title
CN107632980B (en) Voice translation method and device for voice translation
CN107291690B (en) Punctuation adding method and device and punctuation adding device
CN108399914B (en) Voice recognition method and device
CN111368541B (en) Named entity identification method and device
CN107564526B (en) Processing method, apparatus and machine-readable medium
RU2733816C1 (en) Method of processing voice information, apparatus and storage medium
CN108628813B (en) Processing method and device for processing
CN107291704B (en) Processing method and device for processing
CN107274903B (en) Text processing method and device for text processing
CN111832315B (en) Semantic recognition method, semantic recognition device, electronic equipment and storage medium
CN110069624B (en) Text processing method and device
CN108628819B (en) Processing method and device for processing
CN114154459A (en) Speech recognition text processing method, device, electronic device and storage medium
CN113539233B (en) Voice processing method and device and electronic equipment
CN107422872B (en) Input method, input device and input device
CN113589954B (en) Data processing method and device and electronic equipment
CN109887492B (en) Data processing method and device and electronic equipment
CN109471538B (en) Input method, input device and input device
CN105913841B (en) Voice recognition method, device and terminal
CN113589947B (en) Data processing method and device and electronic equipment
CN112837668B (en) Voice processing method and device for processing voice
KR102327790B1 (en) Information processing methods, devices and storage media
CN112331194B (en) Input method and device and electronic equipment
CN109979435B (en) Data processing method and device for data processing
CN113589946B (en) Data processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant