CN109597983A - Spelling error correction method and device - Google Patents

Spelling error correction method and device

Info

Publication number
CN109597983A
Authority
CN
China
Prior art keywords
phonetic
error correction
similar
pinyin
spelling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710928606.5A
Other languages
Chinese (zh)
Other versions
CN109597983B (en
Inventor
陈克凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201710928606.5A
Publication of CN109597983A
Application granted
Publication of CN109597983B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a spelling error correction method and device and relates to the technical field of data processing. It addresses the low accuracy of existing spelling correction, which associates and corrects input solely according to the positions of the letters on the keyboard and is therefore one-sided. The method comprises: obtaining a pinyin to be corrected; extracting, from a preset database, a plurality of similar pinyins corresponding to the pinyin to be corrected; calculating, according to a keyboard edit distance set on the basis of Chinese pinyin rules, the error correction probability of each similar pinyin with respect to the pinyin to be corrected; and outputting, according to the error correction probability, the corrected pinyin corresponding to the pinyin to be corrected. The invention is suitable for correcting pinyin spelling errors.

Description

Spelling error correction method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a spelling error correction method and device.
Background
With the continuous development of network technology, more and more people use computers in daily life to work, shop, and search for information, so users interact with smart devices very frequently. Users typically enter information into a device by keyboard input, handwriting on a touch screen, voice input, and similar means. When typing on a keyboard, however, users often strike the wrong key while spelling. For example, a user who wants to enter the Chinese word for "telephone" may type "dianhia" instead of the correct pinyin "dianhua". The incorrect pinyin entered by the user must then be corrected and associated with the intended text.
At present, error correction for misspellings mainly associates and corrects input according to the positions of the letters on the keyboard. This approach is one-sided, so the accuracy of spelling correction is low.
Summary of the invention
In view of the above problems, the present invention provides a spelling error correction method and device whose main purpose is to correct the pinyin entered by a user by taking Chinese pinyin rules into account.
To solve the above technical problem, in a first aspect the present invention provides a spelling error correction method, comprising:
obtaining a pinyin to be corrected, the pinyin to be corrected being a plurality of characters entered by a user;
extracting, from a preset database, a plurality of similar pinyins corresponding to the pinyin to be corrected, wherein the preset database stores all words and the correctly spelled pinyin of each word, and a similar pinyin is a pinyin that differs from the pinyin to be corrected by a preset number of characters;
calculating, according to a keyboard edit distance set on the basis of Chinese pinyin rules, the error correction probability of each similar pinyin with respect to the pinyin to be corrected;
outputting, according to the error correction probability, the corrected pinyin corresponding to the pinyin to be corrected.
Optionally, the method further comprises:
obtaining and extracting, from the preset database, the cumulative search count corresponding to each similar pinyin.
Optionally, calculating the error correction probability of each similar pinyin with respect to the pinyin to be corrected comprises:
calculating, according to the keyboard edit distance set on the basis of Chinese pinyin rules and a preset coefficient, the comprehensive edit distance between each similar pinyin and the pinyin to be corrected, the preset coefficient being set according to the number of characters by which the similar pinyin differs from the pinyin to be corrected;
determining the product of the cumulative search count of each similar pinyin and the reciprocal of its comprehensive edit distance as the error correction probability of that similar pinyin.
Optionally, outputting, according to the error correction probability, the pinyin corresponding to the pinyin to be corrected comprises:
sorting the similar pinyins according to the error correction probability;
extracting the similar pinyins whose error correction probability exceeds a preset probability threshold;
outputting the extracted similar pinyins according to a preset rule.
Optionally, before the pinyin to be corrected is obtained, the method further comprises:
obtaining a pinyin to be detected;
detecting whether a correctly spelled pinyin corresponding to the pinyin to be detected exists in the preset database;
if it exists, outputting the correctly spelled pinyin corresponding to the pinyin to be detected;
if it does not exist, determining the pinyin to be detected as the pinyin to be corrected.
In a second aspect, the present invention further provides a spelling error correction device, comprising:
an acquiring unit, configured to obtain a pinyin to be corrected, the pinyin to be corrected being a plurality of characters entered by a user;
an extraction unit, configured to extract, from a preset database, a plurality of similar pinyins corresponding to the pinyin to be corrected, wherein the preset database stores all words and the correctly spelled pinyin of each word, and a similar pinyin is a pinyin that differs from the pinyin to be corrected by a preset number of characters;
a computing unit, configured to calculate, according to a keyboard edit distance set on the basis of Chinese pinyin rules, the error correction probability of each similar pinyin with respect to the pinyin to be corrected;
an output unit, configured to output, according to the error correction probability, the corrected pinyin corresponding to the pinyin to be corrected.
Optionally, the acquiring unit is further configured to obtain, from the preset database, the cumulative search count corresponding to each similar pinyin;
the extraction unit is further configured to extract the cumulative search counts, obtained by the acquiring unit, corresponding to each similar pinyin.
Optionally, the computing unit comprises:
a computing module, configured to calculate, according to the keyboard edit distance set on the basis of Chinese pinyin rules and a preset coefficient, the comprehensive edit distance between each similar pinyin and the pinyin to be corrected, the preset coefficient being set according to the number of characters by which the similar pinyin differs from the pinyin to be corrected;
a determining module, configured to determine the product of the cumulative search count of each similar pinyin and the reciprocal of its comprehensive edit distance as the error correction probability of that similar pinyin.
Optionally, the output unit comprises:
a sorting module, configured to sort the similar pinyins according to the error correction probability;
an extraction module, configured to extract the similar pinyins whose error correction probability exceeds a preset probability threshold;
an output module, configured to output the extracted similar pinyins according to a preset rule.
Optionally, the device further comprises a detection unit;
the acquiring unit is further configured to obtain a pinyin to be detected;
the detection unit is configured to detect whether a correctly spelled pinyin corresponding to the pinyin to be detected exists in the preset database;
the output unit is further configured to output, if it exists, the correctly spelled pinyin corresponding to the pinyin to be detected;
the determination unit is further configured to determine, if it does not exist, the pinyin to be detected as the pinyin to be corrected.
To achieve the above object, according to a third aspect of the present invention, a storage medium is provided. The storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the spelling error correction method described above.
To achieve the above object, according to a fourth aspect of the present invention, a processor is provided. The processor is configured to run a program, wherein the program, when running, executes the spelling error correction method described above.
With the above technical solution, the spelling error correction method and device provided by the present invention address the following shortcoming of the prior art: pinyin entered by a user is corrected mainly by association based on the positions of the letters on the keyboard, which makes the correction one-sided. In the present invention, after the pinyin to be corrected is obtained, all similar pinyins that differ from it by a preset number of characters are extracted; the error correction probability between each extracted similar pinyin and the pinyin to be corrected is then calculated in turn using the keyboard edit distance set on the basis of Chinese pinyin rules; and the pinyin corresponding to the pinyin to be corrected is determined from the calculated probabilities. Compared with the prior art, the present invention therefore combines the keyboard edit distance set according to Chinese pinyin rules with how frequently users search web page content, so the incorrect pinyin entered by a user can be corrected more comprehensively and accurately, which improves the accuracy of spelling correction.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a spelling error correction method provided by an embodiment of the present invention;
Fig. 2 shows a flowchart of another spelling error correction method provided by an embodiment of the present invention;
Fig. 3 shows a block diagram of a spelling error correction device provided by an embodiment of the present invention;
Fig. 4 shows a block diagram of another spelling error correction device provided by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
To improve the accuracy of spelling correction, an embodiment of the present invention provides a spelling error correction method. As shown in Fig. 1, the method comprises:
101. Obtain the pinyin to be corrected.
The pinyin to be corrected is a plurality of characters entered by a user. It may be the pinyin characters of a single word, the pinyin characters of a phrase, the characters of an English word, and so on. The embodiment may be applied to a browser page, a page within an application (app), and the like; the embodiment of the present invention places no specific limitation on this.
It should be noted that the executing subject of the embodiment may be a device configured in a web page for correcting spelling. When the device detects that the user has entered an unrecognized pinyin in the web page, that pinyin needs to be corrected; an acquisition instruction is triggered, and correction of the pinyin then proceeds.
102. Extract, from a preset database, a plurality of similar pinyins corresponding to the pinyin to be corrected.
The preset database stores all words and the correctly spelled pinyin of each word, and a similar pinyin is a pinyin that differs from the pinyin to be corrected by a preset number of characters. The number of similar pinyins corresponding to the pinyin to be corrected may be 3, 10, 20, and so on. A pinyin that differs from the pinyin to be corrected by a preset number of characters may contain one or more characters that differ from it, or may have one or more characters more or fewer than it. For example, if the pinyin to be corrected entered by the user is kaihin, the corresponding similar pinyins may include {kaixin, kaibin, kaiyin, kuaijin}.
Specifically, in step 102, once the pinyin to be corrected has been obtained, the similar pinyins may be found by directly traversing a pre-created database. Alternatively, all pinyins that differ from the pinyin to be corrected by the preset number of characters may first be generated and then compared against the database to obtain the final set of similar pinyins.
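The second approach in step 102 (generate candidates, then compare against the database) might be sketched as follows. This is only an illustration under stated assumptions: the preset number of differing characters is taken to be 1, the alphabet is lowercase a to z, and `DATABASE` is a hypothetical stand-in for the preset database; none of these names come from the patent.

```python
import string


def candidates_within_one_edit(pinyin: str) -> set[str]:
    """Strings one replacement, insertion, or deletion away from `pinyin`."""
    letters = string.ascii_lowercase
    splits = [(pinyin[:i], pinyin[i:]) for i in range(len(pinyin) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    replaces = {left + c + right[1:]
                for left, right in splits if right for c in letters}
    inserts = {left + c + right for left, right in splits for c in letters}
    # The input itself is not a candidate correction.
    return (deletes | replaces | inserts) - {pinyin}


def similar_pinyins(pinyin: str, database: set[str]) -> set[str]:
    """Keep only the generated candidates that exist in the database."""
    return candidates_within_one_edit(pinyin) & database


# Hypothetical database contents, for illustration only.
DATABASE = {"kaixin", "kaibin", "kaiyin", "kuaijin", "dianhua"}
print(sorted(similar_pinyins("kaihin", DATABASE)))
```

With a preset quantity of 1, kuaijin is not returned because it is two edits away from kaihin; a larger preset quantity would be needed to reach it.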
103. Calculate, according to the keyboard edit distance set on the basis of Chinese pinyin rules, the error correction probability of each similar pinyin with respect to the pinyin to be corrected.
The keyboard edit distance set on the basis of Chinese pinyin rules includes keyboard edit distances set according to different Chinese pinyin rules. For example, according to the flat-tongue and retroflex (curled-tongue) rule in Chinese pinyin, the keyboard edit distance between c and ch, between s and sh, and so on is set to 0.5; according to the initial and final rules in Chinese pinyin, the keyboard edit distance for bu is set to 0.5 while the edit distance for bt is 1; and according to the pinyin habits of regional accents, the keyboard edit distance between lu and lv is set to 0.5, the keyboard edit distance between mo and me is set to 0.5, and so on. It should be noted that, in the embodiment, all pinyins subject to a special Chinese pinyin rule may be saved in the database in advance together with their keyboard edit distances. The keyboard edit distances of the remaining pinyins, to which no special rule applies, may be calculated by prior-art methods and saved as well, which yields the keyboard edit distance set on the basis of Chinese pinyin rules referred to in this step. In this step, therefore, the pinyin to be corrected may first be traversed and compared with the keyboard edit distances saved in the database to obtain the keyboard edit distance of each similar pinyin.
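One possible way to hold the rule-based distances described in step 103 is a lookup table keyed by unordered pairs, with a fallback for pairs that have no special rule. The sketch below is an assumption, not the patent's data model: the table name, the use of `frozenset` keys, and the flat fallback value of 1.0 (standing in for the unspecified prior-art calculation) are all hypothetical.

```python
# Pairs covered by a special Chinese pinyin rule, with the distances
# quoted in the text. The table layout itself is hypothetical.
PINYIN_RULE_DISTANCE = {
    frozenset({"c", "ch"}): 0.5,   # flat-tongue vs. retroflex
    frozenset({"s", "sh"}): 0.5,   # flat-tongue vs. retroflex
    frozenset({"lu", "lv"}): 0.5,  # regional accent
    frozenset({"mo", "me"}): 0.5,  # regional accent
}

# Assumed stand-in for the prior-art distance used when no rule applies.
DEFAULT_DISTANCE = 1.0


def keyboard_edit_distance(a: str, b: str) -> float:
    """Look up the rule-based distance between two pinyin fragments."""
    if a == b:
        return 0.0
    return PINYIN_RULE_DISTANCE.get(frozenset({a, b}), DEFAULT_DISTANCE)


print(keyboard_edit_distance("ch", "c"))   # covered by a rule
print(keyboard_edit_distance("lu", "la"))  # falls back to the default
```

Keying on an unordered pair makes the lookup symmetric, so c to ch and ch to c share one entry.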
Further, the error correction probability indicates how likely it is that the incorrect pinyin entered by the user is associated with each similar pinyin in the pinyin set. The larger the calculated error correction probability of a similar pinyin, the more likely that pinyin is associated with the pinyin to be corrected; conversely, the smaller the calculated error correction probability, the less likely that association is.
104. Output, according to the error correction probability, the corrected pinyin corresponding to the pinyin to be corrected.
The corrected pinyin is the correct pinyin offered to the user for selection. The corrected pinyin finally output in this step may be a single pinyin, or several pinyins arranged in descending order of error correction probability so that the user can choose among them. As described in step 103, the error correction probability indicates how likely each similar pinyin is to be associated with the pinyin to be corrected, so by calculating the error correction probability of every similar pinyin, the result of processing the pinyin to be corrected can be output according to the probabilities obtained.
In the prior art, pinyin entered by a user is corrected mainly by association based on the positions of the letters on the keyboard, which makes the correction one-sided. In the spelling error correction method provided by the embodiment of the present invention, after the pinyin to be corrected is obtained, all similar pinyins that differ from it by a preset number of characters are extracted; the error correction probability between each extracted similar pinyin and the pinyin to be corrected is then calculated in turn using the keyboard edit distance set on the basis of Chinese pinyin rules; and the pinyin corresponding to the pinyin to be corrected is determined from the calculated probabilities. Compared with the prior art, the present invention therefore combines the keyboard edit distance set according to Chinese pinyin rules with how frequently users search web page content, so the incorrect pinyin entered by a user can be corrected more comprehensively and accurately, which improves the accuracy of spelling correction.
Further, as a refinement and extension of the embodiment shown in Fig. 1, an embodiment of the present invention provides another spelling error correction method, as shown in Fig. 2.
201. Obtain the pinyin to be corrected.
The pinyin to be corrected is a plurality of characters entered by a user. For an explanation of the pinyin to be corrected and of the characters, refer to the corresponding description in step 101, which is not repeated here.
To avoid wasting resources, the method of the embodiment may further comprise, before step 201: obtaining a pinyin to be detected; detecting whether a correctly spelled pinyin corresponding to the pinyin to be detected exists in the preset database; if it exists, outputting the correctly spelled pinyin corresponding to the pinyin to be detected; and if it does not exist, determining the pinyin to be detected as the pinyin to be corrected. The preset database stores all words and the correctly spelled pinyin of each word, together with cumulative search count information for each correctly spelled pinyin, which records how many times users have searched for that pinyin.
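The pre-check before step 201 amounts to a membership test against the preset database. A minimal sketch, assuming the database can be modelled as a mapping from correctly spelled pinyin to cumulative search count; the names `PRESET_DATABASE` and `precheck`, the result labels, and the counts are hypothetical.

```python
# Hypothetical preset database: correctly spelled pinyin -> cumulative
# search count. The counts are made up for illustration.
PRESET_DATABASE = {"dianhua": 120, "kaixin": 45}


def precheck(pinyin: str, database: dict[str, int]) -> tuple[str, str]:
    """Decide whether the input can be output as is or needs correction."""
    if pinyin in database:
        # A correctly spelled pinyin exists: output it directly.
        return ("output", pinyin)
    # Otherwise the input becomes the pinyin to be corrected.
    return ("to_correct", pinyin)


print(precheck("dianhua", PRESET_DATABASE))
print(precheck("dianhia", PRESET_DATABASE))
```

Only inputs labelled `to_correct` would go on to the similar-pinyin extraction, which is how the check saves work on correctly spelled input.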
It should be noted that, in the embodiment, a different preset database may be configured for each web page. For each web page, the correctly spelled pinyin data for all content contained in the page is stored in its database; each time a user searches the page's content, the search is recorded, and the resulting statistics are saved against the pinyin corresponding to the page content. Alternatively, web pages may be classified and a database configured for each category. Counting how often content is searched allows the users' search counts to serve as a reference factor when correcting pinyin.
202. Extract, from the preset database, a plurality of similar pinyins corresponding to the pinyin to be corrected.
A similar pinyin corresponding to the pinyin to be corrected is a pinyin that differs from the pinyin to be corrected by a preset number of characters. For an explanation of this concept, refer to the corresponding description in step 102, which is not repeated here.
Specifically, step 202 may first generate a corresponding lookup function from the obtained pinyin to be corrected and then search the preset database with that function, although it is not limited to this. For example, when the pinyin to be corrected is "niulai", each letter of the pinyin may be replaced in turn to generate a lookup function, such as one generated from "niula_" or "niu_ai"; letters may also be added to or removed from the pinyin. The function is then used to search the database for all similar pinyins that differ from the pinyin to be corrected by the preset number of characters.
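The wildcard lookup described for "niulai" could be sketched with regular expressions, where each "_" stands for exactly one lowercase letter. This is an assumed realisation of the "lookup function"; the patent does not say how it is implemented, and the database contents and function names below are hypothetical. Only single-letter replacement is covered, not added or removed letters.

```python
import re


def wildcard_patterns(pinyin: str) -> list[str]:
    """Replace each letter in turn with '_', e.g. 'niu_ai' and 'niula_'."""
    return [pinyin[:i] + "_" + pinyin[i + 1:] for i in range(len(pinyin))]


def lookup_similar(pinyin: str, database: set[str]) -> set[str]:
    """Find database entries that match any one-letter wildcard pattern."""
    matches = set()
    for i in range(len(pinyin)):
        # One wildcard position that matches a single lowercase letter.
        regex = re.compile(
            re.escape(pinyin[:i]) + "[a-z]" + re.escape(pinyin[i + 1:])
        )
        matches |= {entry for entry in database if regex.fullmatch(entry)}
    return matches - {pinyin}


# Hypothetical database contents, for illustration only.
DATABASE = {"niunai", "niulan", "pinyin"}
print(wildcard_patterns("niulai"))
print(sorted(lookup_similar("niulai", DATABASE)))
```

Scanning the whole database for every pattern is linear in its size; a real system would more likely index the patterns, but that choice is outside what the text specifies.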
In the embodiment, this step may be followed by: obtaining and extracting, from the preset database, the cumulative search count corresponding to each similar pinyin. Since all candidate corrected pinyins for the pinyin to be corrected have been obtained after step 202, the cumulative search count of each similar pinyin found is extracted at this point so that each similar pinyin can be processed further. This avoids the tedious and time-consuming work of separately looking up and extracting each pinyin that differs from the pinyin to be corrected by the preset number of characters, and so improves the efficiency of pinyin correction.
Specifically, each similar pinyin and its cumulative search count may be written into a data table. Alternatively, all extracted similar pinyins and their cumulative search counts may be saved directly, with a preset separator between the similar pinyins. The embodiment of the present invention places no specific limitation on this.
For example, if the pinyin to be corrected is wanba, writing each similar pinyin and its cumulative search count into a data table yields the similar pinyin data table shown in Table 1:
Table 1
Pinyin    Cumulative search count
wanha     5
wanna     1
rana      2
wanga     1
wangba    200
The pinyin data table in Table 1 shows that the similar pinyins corresponding to the pinyin to be corrected comprise five pinyins; users have cumulatively searched for wangba 200 times, for wanga once, and so on. Because all pinyins that differ from the pinyin to be corrected by the preset number of characters are saved together, they can be read directly from the data table when the pinyin data is processed further. This avoids the extraction errors and tedium of picking entries one by one out of a large amount of unordered data, and so improves the accuracy and convenience of pinyin correction.
In the embodiment, after this step the pinyin to be corrected and the data table created for it may also be saved. When a newly entered pinyin to be corrected is found to have been seen before, the data table saved for it can be retrieved directly, with no need to extract the pinyins and build the set again. This saves time and avoids wasting resources, and so improves the efficiency of pinyin correction.
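Saving the data table per pinyin to be corrected behaves like a cache keyed by the input. A minimal in-memory sketch follows; the cache variable, the `build_table` callback, and the idea of keeping the cache in a plain dictionary are assumptions for illustration, since the text does not say where or how the table is stored.

```python
from typing import Callable

# Hypothetical in-memory cache: pinyin to be corrected -> its data table
# (similar pinyin -> cumulative search count).
_table_cache: dict[str, dict[str, int]] = {}


def similar_table(
    pinyin: str, build_table: Callable[[str], dict[str, int]]
) -> dict[str, int]:
    """Return the saved table for `pinyin`, building it only on first use."""
    if pinyin not in _table_cache:
        _table_cache[pinyin] = build_table(pinyin)
    return _table_cache[pinyin]


build_calls = []


def build_wanba_table(pinyin: str) -> dict[str, int]:
    """Stand-in builder that returns the Table 1 data and records each call."""
    build_calls.append(pinyin)
    return {"wanha": 5, "wanna": 1, "rana": 2, "wanga": 1, "wangba": 200}


similar_table("wanba", build_wanba_table)
similar_table("wanba", build_wanba_table)  # served from the cache
print(len(build_calls))
```

A cache like this would need invalidating whenever the cumulative search counts change, a point the text does not address.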
203. Calculate, according to the keyboard edit distance set on the basis of Chinese pinyin rules and a preset coefficient, the comprehensive edit distance between each similar pinyin and the pinyin to be corrected.
The preset coefficient is set according to the number of characters by which the similar pinyin differs from the pinyin to be corrected. That is, when one character has been added to or deleted from the similar pinyin relative to the pinyin to be corrected, the coefficient is 1, and when N characters have been added or deleted, the coefficient is N. Likewise, when the similar pinyin differs from the pinyin to be corrected in one character, the coefficient is 1, and when it differs in N characters, the coefficient is N.
Specifically, step 203 may multiply the keyboard edit distance set on the basis of Chinese pinyin rules by the preset coefficient and take the product as the comprehensive edit distance between each similar pinyin and the pinyin to be corrected. For an explanation of the keyboard edit distance set on the basis of pinyin rules, refer to the corresponding description in step 103, which is not repeated here.
Continuing the example from step 202, in which the pinyin to be corrected is wanba, take the similar pinyin wangba to illustrate the calculation. Because wangba has one more character than wanba, the coefficient set according to the preset rule is 1. The keyboard edit distance between ang and an, set in advance on the basis of Chinese pinyin rules, is 0.5. The comprehensive edit distance between wangba and wanba is therefore 1 × 0.5 = 0.5. The comprehensive edit distances of the other similar pinyins are obtained in the same way: 1 × 1 = 1 between wanha and wanba, 1 × 1 = 1 between wanna and wanba, 1 × 2 = 2 between rana and wanba, and 1 × 1 = 1 between wanga and wanba.
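The multiplication in step 203 can be checked with a few lines of code. The sketch below takes the two factors for each similar pinyin exactly as they are written in the wanba example; the dictionary layout and the function name are hypothetical, and the code does not attempt to derive the factors from the strings.

```python
def comprehensive_edit_distance(coefficient: float,
                                keyboard_distance: float) -> float:
    """Comprehensive edit distance = preset coefficient x keyboard distance."""
    return coefficient * keyboard_distance


# (first factor, second factor) as quoted in the wanba example.
FACTORS = {
    "wangba": (1, 0.5),
    "wanha": (1, 1),
    "wanna": (1, 1),
    "rana": (1, 2),
    "wanga": (1, 1),
}

DISTANCES = {
    pinyin: comprehensive_edit_distance(coefficient, distance)
    for pinyin, (coefficient, distance) in FACTORS.items()
}
for pinyin, value in DISTANCES.items():
    print(pinyin, value)
```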
204. Determine the product of the cumulative search count of each similar pinyin and the reciprocal of its comprehensive edit distance as the error correction probability of that similar pinyin.
The error correction probability indicates how likely it is that the incorrect pinyin entered by the user is associated with each similar pinyin. Specifically, step 204 is calculated as: cumulative search count × (1 / comprehensive edit distance). Following step 203 and using the cumulative search count of each similar pinyin, the error correction probabilities are calculated in turn: 200 × (1/0.5) = 400 for wangba, 5 × (1/1) = 5 for wanha, 1 × (1/1) = 1 for wanna, 2 × (1/2) = 1 for rana, and 1 × (1/1) = 1 for wanga.
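The calculation in step 204 can be reproduced directly from Table 1 and the comprehensive edit distances of step 203. In the sketch below the variable and function names are hypothetical, and the guard against a non-positive distance is an added assumption, since the text does not say what happens when the distance is zero.

```python
# Cumulative search counts from Table 1.
SEARCH_COUNTS = {"wanha": 5, "wanna": 1, "rana": 2, "wanga": 1, "wangba": 200}

# Comprehensive edit distances from the wanba example in step 203.
COMPREHENSIVE_DISTANCE = {
    "wanha": 1.0, "wanna": 1.0, "rana": 2.0, "wanga": 1.0, "wangba": 0.5,
}


def error_correction_probability(count: int, distance: float) -> float:
    """Cumulative search count x (1 / comprehensive edit distance)."""
    if distance <= 0:
        # Assumption: a zero distance means the input is already correct,
        # so this case should not reach the scoring step.
        raise ValueError("comprehensive edit distance must be positive")
    return count * (1.0 / distance)


SCORES = {
    pinyin: error_correction_probability(count, COMPREHENSIVE_DISTANCE[pinyin])
    for pinyin, count in SEARCH_COUNTS.items()
}
print(SCORES)
```

The values produced (400, 5, 1, 1, 1) are unnormalised scores rather than probabilities in the range 0 to 1, although the text calls them error correction probabilities.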
205. The pinyin corresponding to the pinyin to be corrected is output according to the error correction probability.
For an explanation of the concept of the corrected pinyin, refer to the corresponding description in step 103; it is not repeated here.
Specifically, step 205 may include: sorting the similar pinyin according to the error correction probability; extracting the similar pinyin whose error correction probability exceeds a preset probability threshold; and outputting the extracted similar pinyin according to a preset rule. The preset probability threshold may be 90%, 85%, 70% or the like, and is not specifically limited in the embodiments of the present invention. For example, suppose the similar pinyin corresponding to the pinyin to be corrected comprise 5 pinyin {pinyin 1, pinyin 2, pinyin 3, pinyin 4, pinyin 5}, and the error correction probabilities of these five pinyin with respect to the pinyin to be corrected are calculated to be 98%, 23%, 3%, 67% and 85% respectively. Sorting the pinyin by error correction probability from high to low gives {pinyin 1, pinyin 5, pinyin 4, pinyin 2, pinyin 3}. If the preset probability threshold is 80%, the pinyin whose error correction probability exceeds the threshold are pinyin 1 and pinyin 5; pinyin 1 and pinyin 5 are then extracted and output in descending order of error correction probability: pinyin 1, pinyin 5.
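The sort-then-filter procedure of step 205 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `select_corrections` is hypothetical, probabilities are written as fractions rather than percentages, and the "preset rule" for output is taken to be descending order of probability, as in the example.

```python
def select_corrections(probabilities, threshold):
    """Sort candidates by probability (high to low) and keep those above the threshold."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [pinyin for pinyin, probability in ranked if probability > threshold]


EXAMPLE = {
    "pinyin 1": 0.98,
    "pinyin 2": 0.23,
    "pinyin 3": 0.03,
    "pinyin 4": 0.67,
    "pinyin 5": 0.85,
}

if __name__ == "__main__":
    print(select_corrections(EXAMPLE, 0.80))
```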
It should be noted that, in the embodiments of the present invention, a quantity threshold may be set; when the cumulative search count corresponding to a pinyin among the similar pinyin exceeds this threshold, the order of the similar pinyin may be adjusted appropriately, for example by placing the pinyin that exceeds the threshold first in the output. In the embodiments of the present invention, qualifying pinyin are obtained by setting a preset probability threshold and screening against it, and all of the resulting pinyin are output for the user to choose from. This avoids the one-sided correction that results when only a single corrected pinyin is provided, thereby making pinyin error correction more comprehensive and improving the user experience.
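One possible reading of the quantity threshold is a stable reordering that moves heavily searched pinyin to the front while otherwise preserving the existing order. The Python sketch below illustrates that reading only; the function name `promote_frequent`, the sample counts and the choice of a stable partition are assumptions, since the text says only that the order may be adjusted appropriately.

```python
def promote_frequent(ranked, search_counts, count_threshold):
    """Move pinyin whose cumulative search count exceeds the threshold to the front.

    The relative order inside each group is preserved.
    """
    frequent = [p for p in ranked if search_counts.get(p, 0) > count_threshold]
    others = [p for p in ranked if search_counts.get(p, 0) <= count_threshold]
    return frequent + others


if __name__ == "__main__":
    ranked = ["pinyin 1", "pinyin 5", "pinyin 4"]
    counts = {"pinyin 1": 3, "pinyin 5": 500, "pinyin 4": 7}
    print(promote_frequent(ranked, counts, 100))
```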
Further, as an implementation of the method shown in Fig. 1 above, an embodiment of the present invention also provides a spelling error correction device for implementing the method shown in Fig. 1. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, this device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be understood that the device in this embodiment can implement everything in the foregoing method embodiment. As shown in Fig. 3, the device includes: an acquiring unit 31, an extraction unit 32, a computing unit 33 and an output unit 34, wherein
The acquiring unit 31 may be used to obtain the pinyin to be corrected, the pinyin to be corrected being multiple characters input by a user.
The extraction unit 32 may be used to extract, from a preset database, multiple similar pinyin corresponding to the pinyin to be corrected obtained by the acquiring unit 31. The preset database stores all words and the correctly spelled pinyin corresponding to each word, and a similar pinyin is a pinyin that differs from the pinyin to be corrected by a preset number of characters.
The computing unit 33 may be used to calculate, according to the keyboard edit distance set based on Chinese pinyin rules, the error correction probability of each similar pinyin extracted by the extraction unit 32 with respect to the pinyin to be corrected.
The output unit 34 may be used to output, according to the error correction probability calculated by the computing unit 33, the corrected pinyin corresponding to the pinyin to be corrected.
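The extraction performed by the extraction unit can be illustrated with a small Python sketch. It rests on assumptions that the text does not confirm: "differs by a preset number of characters" is interpreted as a plain Levenshtein distance of at most `max_diff`, the preset database is reduced to a list of correctly spelled pinyin, and the names `levenshtein` and `extract_similar_pinyin` are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute each cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute or match
            ))
        previous = current
    return previous[-1]


def extract_similar_pinyin(target, database_pinyin, max_diff):
    """Return database entries within max_diff characters of the target."""
    return [p for p in database_pinyin if levenshtein(p, target) <= max_diff]


if __name__ == "__main__":
    database = ["wangba", "wanha", "wanna", "rana", "wanga", "beijing"]
    print(extract_similar_pinyin("wanba", database, 2))
```

With the candidates from the running example, a limit of 2 characters keeps wangba, wanha, wanna, rana and wanga and discards the unrelated entry.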
Further, as an implementation of the method shown in Fig. 2 above, an embodiment of the present invention also provides another spelling error correction device for implementing the method shown in Fig. 2. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, this device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be understood that the device in this embodiment can implement everything in the foregoing method embodiment. As shown in Fig. 4, the device includes: an acquiring unit 41, an extraction unit 42, a computing unit 43 and an output unit 44, wherein
The acquiring unit 41 may be used to obtain the pinyin to be corrected, the pinyin to be corrected being multiple characters input by a user.
The extraction unit 42 may be used to extract, from a preset database, multiple similar pinyin corresponding to the pinyin to be corrected obtained by the acquiring unit 41. The preset database stores all words and the correctly spelled pinyin corresponding to each word, and a similar pinyin is a pinyin that differs from the pinyin to be corrected by a preset number of characters.
The computing unit 43 may be used to calculate, according to the keyboard edit distance set based on Chinese pinyin rules, the error correction probability of each similar pinyin extracted by the extraction unit 42 with respect to the pinyin to be corrected.
The output unit 44 may be used to output, according to the error correction probability calculated by the computing unit 43, the corrected pinyin corresponding to the pinyin to be corrected.
Further, the device also includes: a detection unit 45 and a determination unit 46.
The acquiring unit 41 may also be used to obtain a pinyin to be detected.
The detection unit 45 may be used to detect whether a correctly spelled pinyin corresponding to the pinyin to be detected exists in the preset database.
The output unit 44 may also be used to output, if one exists, the correctly spelled pinyin corresponding to the pinyin to be detected.
The determination unit 46 may be used to determine, if none exists, the pinyin to be detected as the pinyin to be corrected.
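The division of work between the detection, output and determination units can be sketched as a single lookup. In this Python illustration the preset database is modelled as a dictionary from correctly spelled pinyin to a word; that layout, the sample entries and the name `check_pinyin` are assumptions made for the example.

```python
def check_pinyin(pinyin, preset_database):
    """Return ("correct", pinyin) if the database knows it, else ("to_be_corrected", pinyin)."""
    if pinyin in preset_database:
        # A correctly spelled pinyin exists: output it directly, no correction needed.
        return ("correct", pinyin)
    # No match: the input becomes the pinyin to be corrected.
    return ("to_be_corrected", pinyin)


if __name__ == "__main__":
    database = {"wangba": "网吧", "wanna": "晚娜"}  # hypothetical entries
    print(check_pinyin("wangba", database))
    print(check_pinyin("wanba", database))
```

Performing this check first means the more expensive similar-pinyin extraction runs only for input that is absent from the database.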
Further,
The computing unit 43 may specifically be used to calculate, according to the keyboard edit distance set based on Chinese pinyin rules and a preset coefficient, the final edit distance of each similar pinyin with respect to the pinyin to be corrected.
The determination unit 46 may also be used to determine the product of the cumulative search count corresponding to each similar pinyin and the reciprocal of the final edit distance as the error correction probability of each similar pinyin.
Further, the output unit 44 includes:
A sorting module 4401, which may be used to sort the similar pinyin according to the error correction probability.
An extraction module 4402, which may be used to extract the similar pinyin whose error correction probability exceeds a preset probability threshold.
An output module 4403, which may be used to output the extracted similar pinyin according to a preset rule.
In the spelling error correction device provided by this embodiment of the present invention, the device includes an acquiring unit, an extraction unit, a computing unit and an output unit. When correcting pinyin input by a user, the prior art mainly associates and corrects according to the positions of the letters on the keyboard, which makes the correction one-sided. In the present invention, after the pinyin to be corrected is obtained, all similar pinyin that differ from it by a preset number of characters are extracted; then, using the keyboard edit distance set based on Chinese pinyin rules, the error correction probability between each extracted similar pinyin and the pinyin to be corrected is calculated in turn, and the pinyin corresponding to the pinyin to be corrected is determined according to the calculated error correction probabilities. Compared with the prior art, the present invention therefore combines, when correcting pinyin input by a user, the keyboard edit distance set based on Chinese pinyin rules with how frequently users search for web content, so that incorrect pinyin input by a user can be corrected more comprehensively and accurately, improving the accuracy of spelling error correction. In addition, after the pinyin input by the user is obtained, it is first detected whether content corresponding to that pinyin exists in the preset database, which ensures that correction is performed only when the database contains no content corresponding to the input pinyin. This avoids the waste of resources caused by correcting pinyin that the user entered correctly, thereby saving resources.
The text processing apparatus includes a processor and a memory. The above acquiring unit 31, extraction unit 32, computing unit 33, output unit 34 and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the accuracy of spelling error correction is improved by adjusting kernel parameters.
The memory may include non-permanent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored; when the program is executed by a processor, the spelling error correction method is implemented.
An embodiment of the present invention provides a processor for running a program, wherein the spelling error correction method is executed when the program runs.
An embodiment of the present invention provides a device including a processor, a memory, and a program stored in the memory and runnable on the processor. When executing the program, the processor performs the following steps: obtaining a pinyin to be corrected, the pinyin to be corrected being multiple characters input by a user; extracting, from a preset database, multiple similar pinyin corresponding to the pinyin to be corrected, wherein the preset database stores all words and the correctly spelled pinyin corresponding to each word, and a similar pinyin is a pinyin that differs from the pinyin to be corrected by a preset number of characters; calculating, according to the keyboard edit distance set based on Chinese pinyin rules, the error correction probability of each similar pinyin with respect to the pinyin to be corrected; and outputting, according to the error correction probability, the corrected pinyin corresponding to the pinyin to be corrected.
Further, the method also includes:
Obtaining and extracting, from the preset database, the cumulative search count corresponding to each similar pinyin.
Further, calculating the error correction probability of each similar pinyin with respect to the pinyin to be corrected includes:
Calculating, according to the keyboard edit distance set based on Chinese pinyin rules and a preset coefficient, the final edit distance of each similar pinyin with respect to the pinyin to be corrected, the preset coefficient being set according to the number of characters by which the similar pinyin differs from the pinyin to be corrected;
Determining the product of the cumulative search count corresponding to each similar pinyin and the reciprocal of the final edit distance as the error correction probability of each similar pinyin.
Further, outputting the pinyin corresponding to the pinyin to be corrected according to the error correction probability includes:
Sorting the similar pinyin according to the error correction probability;
Extracting the similar pinyin whose error correction probability exceeds a preset probability threshold;
Outputting the extracted similar pinyin according to a preset rule.
Further, before obtaining the pinyin to be corrected, the method also includes:
Obtaining a pinyin to be detected;
Detecting whether a correctly spelled pinyin corresponding to the pinyin to be detected exists in the preset database;
If one exists, outputting the correctly spelled pinyin corresponding to the pinyin to be detected;
If none exists, determining the pinyin to be detected as the pinyin to be corrected.
The device in the embodiments of the present invention may be a server, a PC, a PAD, a mobile phone or the like.
An embodiment of the present invention also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: obtaining a pinyin to be corrected, the pinyin to be corrected being multiple characters input by a user; extracting, from a preset database, multiple similar pinyin corresponding to the pinyin to be corrected, wherein the preset database stores all words and the correctly spelled pinyin corresponding to each word, and a similar pinyin is a pinyin that differs from the pinyin to be corrected by a preset number of characters; calculating, according to the keyboard edit distance set based on Chinese pinyin rules, the error correction probability of each similar pinyin with respect to the pinyin to be corrected; and outputting, according to the error correction probability, the corrected pinyin corresponding to the pinyin to be corrected.
Further, the method also includes:
Obtaining and extracting, from the preset database, the cumulative search count corresponding to each similar pinyin.
Further, calculating the error correction probability of each similar pinyin with respect to the pinyin to be corrected includes:
Calculating, according to the keyboard edit distance set based on Chinese pinyin rules and a preset coefficient, the final edit distance of each similar pinyin with respect to the pinyin to be corrected, the preset coefficient being set according to the number of characters by which the similar pinyin differs from the pinyin to be corrected;
Determining the product of the cumulative search count corresponding to each similar pinyin and the reciprocal of the final edit distance as the error correction probability of each similar pinyin.
Further, outputting the pinyin corresponding to the pinyin to be corrected according to the error correction probability includes:
Sorting the similar pinyin according to the error correction probability;
Extracting the similar pinyin whose error correction probability exceeds a preset probability threshold;
Outputting the extracted similar pinyin according to a preset rule.
Further, before obtaining the pinyin to be corrected, the method also includes:
Obtaining a pinyin to be detected;
Detecting whether a correctly spelled pinyin corresponding to the pinyin to be detected exists in the preset database;
If one exists, outputting the correctly spelled pinyin corresponding to the pinyin to be detected;
If none exists, determining the pinyin to be detected as the pinyin to be corrected.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory and the like) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface and a memory.
The memory may include non-permanent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity or device that includes the element.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory and the like) that contain computer-usable program code.
The above are only embodiments of the present application and are not intended to limit it. Various modifications and changes to the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A spelling error correction method, characterized in that the method includes:
Obtaining a pinyin to be corrected, the pinyin to be corrected being multiple characters input by a user;
Extracting, from a preset database, multiple similar pinyin corresponding to the pinyin to be corrected, wherein the preset database stores all words and the correctly spelled pinyin corresponding to each word, and a similar pinyin is a pinyin that differs from the pinyin to be corrected by a preset number of characters;
Calculating, according to the keyboard edit distance set based on Chinese pinyin rules, the error correction probability of each similar pinyin with respect to the pinyin to be corrected;
Outputting, according to the error correction probability, the corrected pinyin corresponding to the pinyin to be corrected.
2. The method according to claim 1, characterized in that the method also includes:
Obtaining and extracting, from the preset database, the cumulative search count corresponding to each similar pinyin.
3. The method according to claim 2, characterized in that calculating the error correction probability of each similar pinyin with respect to the pinyin to be corrected includes:
Calculating, according to the keyboard edit distance set based on Chinese pinyin rules and a preset coefficient, the final edit distance of each similar pinyin with respect to the pinyin to be corrected, the preset coefficient being set according to the number of characters by which the similar pinyin differs from the pinyin to be corrected;
Determining the product of the cumulative search count corresponding to each similar pinyin and the reciprocal of the final edit distance as the error correction probability of each similar pinyin.
4. The method according to claim 1, characterized in that outputting the pinyin corresponding to the pinyin to be corrected according to the error correction probability includes:
Sorting the similar pinyin according to the error correction probability;
Extracting the similar pinyin whose error correction probability exceeds a preset probability threshold;
Outputting the extracted similar pinyin according to a preset rule.
5. The method according to claim 1, characterized in that, before obtaining the pinyin to be corrected, the method also includes:
Obtaining a pinyin to be detected;
Detecting whether a correctly spelled pinyin corresponding to the pinyin to be detected exists in the preset database;
If one exists, outputting the correctly spelled pinyin corresponding to the pinyin to be detected;
If none exists, determining the pinyin to be detected as the pinyin to be corrected.
6. A spelling error correction device, characterized in that the device includes:
An acquiring unit, for obtaining a pinyin to be corrected, the pinyin to be corrected being multiple characters input by a user;
An extraction unit, for extracting, from a preset database, multiple similar pinyin corresponding to the pinyin to be corrected, wherein the preset database stores all words and the correctly spelled pinyin corresponding to each word, and a similar pinyin is a pinyin that differs from the pinyin to be corrected by a preset number of characters;
A computing unit, for calculating, according to the keyboard edit distance set based on Chinese pinyin rules, the error correction probability of each similar pinyin with respect to the pinyin to be corrected;
An output unit, for outputting, according to the error correction probability, the corrected pinyin corresponding to the pinyin to be corrected.
7. The device according to claim 6, characterized in that
The acquiring unit is also used to obtain, from the preset database, the cumulative search count corresponding to each similar pinyin;
The extraction unit is also used to extract the cumulative search count corresponding to each similar pinyin obtained by the acquiring unit.
8. The device according to claim 7, characterized in that the computing unit includes:
A computing module, for calculating, according to the keyboard edit distance set based on Chinese pinyin rules and a preset coefficient, the final edit distance of each similar pinyin with respect to the pinyin to be corrected, the preset coefficient being set according to the number of characters by which the similar pinyin differs from the pinyin to be corrected;
A determining module, for determining the product of the cumulative search count corresponding to each similar pinyin and the reciprocal of the final edit distance as the error correction probability of each similar pinyin.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, the device where the storage medium is located is controlled to execute the spelling error correction method according to any one of claims 1 to 5.
10. A processor, characterized in that the processor is used to run a program, wherein, when the program runs, the spelling error correction method according to any one of claims 1 to 5 is executed.
CN201710928606.5A 2017-09-30 2017-09-30 Spelling error correction method and device Active CN109597983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710928606.5A CN109597983B (en) 2017-09-30 2017-09-30 Spelling error correction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710928606.5A CN109597983B (en) 2017-09-30 2017-09-30 Spelling error correction method and device

Publications (2)

Publication Number Publication Date
CN109597983A true CN109597983A (en) 2019-04-09
CN109597983B CN109597983B (en) 2022-11-04

Family

ID=65956394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710928606.5A Active CN109597983B (en) 2017-09-30 2017-09-30 Spelling error correction method and device

Country Status (1)

Country Link
CN (1) CN109597983B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831177A (en) * 2012-07-31 2012-12-19 聚熵信息技术(上海)有限公司 Statement error correction method and system
US20140298168A1 (en) * 2013-03-28 2014-10-02 Est Soft Corp. System and method for spelling correction of misspelled keyword
CN106202153A (en) * 2016-06-21 2016-12-07 广州智索信息科技有限公司 The spelling error correction method of a kind of ES search engine and system
CN106847288A (en) * 2017-02-17 2017-06-13 上海创米科技有限公司 The error correction method and device of speech recognition text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郑文曦等: "自动拼写校对的算法设计和系统实现", 《科技和产业》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739514A (en) * 2019-07-31 2020-10-02 北京京东尚科信息技术有限公司 Voice recognition method, device, equipment and medium
CN111739514B (en) * 2019-07-31 2023-11-14 北京京东尚科信息技术有限公司 Voice recognition method, device, equipment and medium
CN111028834A (en) * 2019-10-30 2020-04-17 支付宝(杭州)信息技术有限公司 Voice message reminding method and device, server and voice message reminding equipment
CN111028834B (en) * 2019-10-30 2023-01-20 蚂蚁财富(上海)金融信息服务有限公司 Voice message reminding method and device, server and voice message reminding equipment
CN111694985A (en) * 2020-06-17 2020-09-22 北京字节跳动网络技术有限公司 Search method, search device, electronic equipment and computer-readable storage medium
CN111694985B (en) * 2020-06-17 2022-03-01 北京字节跳动网络技术有限公司 Search method, search device, electronic equipment and computer-readable storage medium
CN112765231A (en) * 2021-01-04 2021-05-07 珠海格力电器股份有限公司 Data processing method and device and computer readable storage medium
CN112560452A (en) * 2021-02-25 2021-03-26 智者四海(北京)技术有限公司 Method and system for automatically generating error correction corpus
CN115437511A (en) * 2022-11-07 2022-12-06 北京澜舟科技有限公司 Pinyin Chinese character conversion method, conversion model training method and storage medium
CN115905297A (en) * 2023-01-04 2023-04-04 脉策(上海)智能科技有限公司 Method, apparatus and medium for retrieving data
CN115905297B (en) * 2023-01-04 2023-12-15 脉策(上海)智能科技有限公司 Method, apparatus and medium for retrieving data

Also Published As

Publication number Publication date
CN109597983B (en) 2022-11-04

Similar Documents

Publication Publication Date Title
CN109597983A (en) A kind of spelling error correction method and device
US11860684B2 (en) Few-shot named-entity recognition
US11016997B1 (en) Generating query results based on domain-specific dynamic word embeddings
CN105989040B (en) Intelligent question and answer method, device and system
CN107704503A (en) User's keyword extracting device, method and computer-readable recording medium
CN105975459B (en) Weight labeling method and device for lexical items
CN110019668A (en) A kind of text searching method and device
US10831993B2 (en) Method and apparatus for constructing binary feature dictionary
CN106610931B (en) Topic name extraction method and device
TWI710917B (en) Data processing method and device
CN106970912A (en) Chinese sentence similarity calculating method, computing device and computer-readable storage medium
CN109344406A (en) Part-of-speech tagging method, apparatus and electronic equipment
US20170185653A1 (en) Predicting Knowledge Types In A Search Query Using Word Co-Occurrence And Semi/Unstructured Free Text
US8290925B1 (en) Locating product references in content pages
US20210109994A1 (en) Natural language processing using joint sentiment-topic modeling
CN106598997B (en) Method and device for calculating text theme attribution degree
CN110019784B (en) Text classification method and device
CN109597982B (en) Abstract text recognition method and device
CN110717008B (en) Search result ordering method and related device based on semantic recognition
CN111597336A (en) Processing method and device of training text, electronic equipment and readable storage medium
CN115563268A (en) Text abstract generation method and device, electronic equipment and storage medium
CN105095826B (en) A kind of character recognition method and device
CN111475641B (en) Data extraction method and device, storage medium and equipment
CN107861950A (en) Detection method and device for abnormal text
CN110019665A (en) Text searching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Address before: 100086 8th Floor A, Cuigong Hotel, No. 76 Zhichun Road, Shuangyushu Area, Haidian District, Beijing

Applicant before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

GR01 Patent grant