CN100440207C - Chinese dictionary search engine and method for quick positioning words in Chinese dictionary - Google Patents
- Publication number
- CN100440207C
- Authority
- CN
- China
- Prior art keywords
- chinese
- dictionary
- chinese words
- gbk
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present invention discloses a method for quickly locating a Chinese character in a Chinese dictionary. An index table is formed from the in-dictionary offsets of the Chinese characters in the dictionary, each offset being measured from the start position of the dictionary; the in-dictionary offset corresponding to an input Chinese character is obtained from the index table according to the character's GB2312/GBK code; and the character is located directly from that offset and the start position of the dictionary. The invention also discloses a Chinese dictionary search engine, which locates an input Chinese character in the Chinese dictionary by means of the GB2312/GBK code ordering of Chinese characters. Locating is fast and occupies few resources.
Description
Technical field
The present invention relates to a Chinese dictionary search engine and to a method for quickly locating characters in a Chinese dictionary.
Background art
Common text search techniques include traversal and binary search. Searches of Chinese text generally use traversal, that is, the entries are compared with the search term one by one.
Such search methods are highly random: a simple algorithm takes considerable time to produce a result, while a more complex one places heavy demands on processing efficiency and memory. They are therefore unsuitable for searching a medium-sized body of Chinese text such as a Chinese dictionary. In short, locating a character in a Chinese dictionary with existing search methods is slow and therefore inefficient.
Summary of the invention
The invention provides a Chinese dictionary search engine and a method for quickly locating characters in a Chinese dictionary, in order to solve the inefficiency of the prior art when locating a character in a Chinese dictionary.
To solve the above problem, the invention provides the following technical solutions:
A method for quickly locating characters in a Chinese dictionary, the Chinese characters in the dictionary being encoded in GB2312/GBK; the method comprises the following steps:
A. establishing a correspondence between the GB2312/GBK code of each Chinese character in the Chinese dictionary and the in-dictionary offset of that character relative to the start position of the dictionary;
B. determining, from the GB2312/GBK table number given by the first byte of the input Chinese character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character;
C. calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
D. determining, from said start position and said in-table offset, the position of the character's code in the index table, and reading the in-dictionary offset;
E. judging whether the GB2312/GBK code of the input character is null; if so, ending the locating and indicating that the input character is not in the Chinese dictionary; otherwise executing step F;
F. locating the input character directly in the Chinese dictionary from the start storage position of the dictionary and said in-dictionary offset.
A method for quickly locating characters in a Chinese dictionary, the Chinese characters in the dictionary being encoded in GB2312/GBK; the method comprises the following steps:
A. storing the in-dictionary offsets of the Chinese characters in the dictionary, each relative to the start position of the dictionary, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding characters, to form an index table;
B. determining, from the GB2312/GBK table number given by the first byte of the input Chinese character and the start position of the index table, the position in the index table of the in-dictionary offset of the first character of the GB2312/GBK table containing the input character;
C. calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
D. determining, from the position of the offset of said first character and said in-table offset, the position in the index table of the in-dictionary offset corresponding to the input character;
E. judging whether the in-dictionary offset corresponding to the input character is null; if so, ending the locating and indicating that the input character is not in the Chinese dictionary; otherwise reading the in-dictionary offset from that position in the index table, and locating the input character directly in the Chinese dictionary from the start storage position of the dictionary and said in-dictionary offset.
A Chinese dictionary search engine for quickly locating an input Chinese character in a Chinese dictionary, the Chinese characters in the dictionary being encoded in GB2312/GBK; the search engine comprises:
a first module, for establishing and storing the correspondence between the GB2312/GBK code of each Chinese character in the dictionary and the in-dictionary offset of that character relative to the start position of the dictionary;
a second module, for receiving an input Chinese character and using its GB2312/GBK code to obtain the corresponding in-dictionary offset from the first module;
a third module, for judging whether the GB2312/GBK code of the input character is null; if so, ending the locating and indicating that the input character is not in the Chinese dictionary; otherwise locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset, and outputting it;
wherein the second module comprises:
a first unit, for calculating, from the GB2312/GBK table number given by the first byte of the input character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character; wherein the index table stores said correspondence, with the GB2312/GBK codes serving as the index and arranged in ascending order;
a second unit, for calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
a third unit, for determining, from the results of the first unit and the second unit, the position of the character's code in the index table, and reading the in-dictionary offset.
A Chinese dictionary search engine for quickly locating an input Chinese character in a Chinese dictionary, the Chinese characters in the dictionary being encoded in GB2312/GBK; the search engine comprises:
a first module, for storing the in-dictionary offsets of the Chinese characters in the dictionary, each relative to the start position of the dictionary, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding characters;
a second module, for receiving an input Chinese character and calculating, from the GB2312/GBK table number given by the first byte of its code and the row number and column number in the GB2312/GBK table given by the second byte, the position in the first module of the in-dictionary offset corresponding to the character;
a third module, for judging whether the in-dictionary offset corresponding to the character is null; if so, ending the locating and indicating that the input character is not in the Chinese dictionary; otherwise reading the in-dictionary offset from that position in the index table, locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset, and outputting it;
wherein the second module comprises:
a first unit, for calculating, from the GB2312/GBK table number given by the first byte of the input character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character; wherein the index table stores said correspondence, with the GB2312/GBK codes serving as the index and arranged in ascending order;
a second unit, for calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
a third unit, for determining, from the results of the first unit and the second unit, the position of the character's code in the index table, and reading the in-dictionary offset.
The invention stores the in-dictionary offsets of the Chinese characters in the dictionary, each relative to the start position of the dictionary, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding characters, to form an index table. The in-dictionary offset of an input character is looked up by its GB2312/GBK code, and the character is then located from that offset and the start position of the dictionary. The search time for each character is therefore essentially fixed and controllable. Locating a character is not only fast and efficient but also occupies few resources, saving memory and storage space.
Description of drawings
Fig. 1 and Fig. 2 are schematic diagrams of the two kinds of index table in the present invention;
Fig. 3 and Fig. 4 are flow charts of character locating using the index tables shown in Fig. 1 and Fig. 2 respectively;
Fig. 5 is a structural block diagram of the Chinese dictionary search engine of the present invention.
Embodiment
The present invention locates an input Chinese character in a Chinese dictionary by means of the GB2312 or GBK code ordering of Chinese characters. The core idea is as follows: an index table is formed from the in-dictionary offsets of the characters in the dictionary, each relative to the start position of the dictionary; the in-dictionary offset corresponding to an input character is obtained from the index table according to the character's GB2312/GBK code; and the character is finally located directly from that offset and the start position of the dictionary.
GBK and GB2312 are both national-standard Chinese character code tables, each describing a Chinese character with two bytes. GBK is an extension of GB2312. This embodiment is described mainly with GBK encoding as the example.
The basic GBK table format is as follows, taking table CC as the example. The table has column labels 0 to F and row labels 4 to F; the Chinese characters in its cells were lost in machine translation and are not reproduced here.
CC  0 1 2 3 4 5 6 7 8 9 A B C D E F
[rows 4 to F, up to 16 characters per row]
A GBK-encoded Chinese character occupies two bytes. The first byte is the table number, ranging from 81 to FE; the high four bits of the second byte are the row label, ranging from 4 to F; and the low four bits of the second byte are the column label, ranging from 0 to F. For example, the character "Tan" in table CC, row B, column 7 is encoded as CCB7.
GBK has FE - 81 + 1 = 254 - 129 + 1 = 126 tables in total. Each table has F - 4 + 1 = 15 - 4 + 1 = 12 rows and F - 0 + 1 = 15 - 0 + 1 = 16 columns. Each table therefore contains 16 (columns) × 12 (rows) = 192 characters, for a total of 126 (tables) × 192 (characters) = 24192 characters.
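The byte layout and the counts above can be checked with a short Python sketch. The function and constant names are illustrative assumptions and do not appear in the patent; the code works on the numeric value of the two-byte code only.

```python
def split_gbk_code(code):
    """Split a two-byte GBK code into (table, row, column).

    table  = first byte (81..FE)
    row    = high four bits of the second byte (4..F)
    column = low four bits of the second byte (0..F)
    """
    first, second = code >> 8, code & 0xFF
    if not 0x81 <= first <= 0xFE:
        raise ValueError("first byte outside 81..FE")
    row, column = second >> 4, second & 0x0F
    if row < 0x4:
        raise ValueError("row label outside 4..F")
    return first, row, column


# Counts as stated in the description.
TABLES = 0xFE - 0x81 + 1              # 126 tables
ROWS = 0xF - 0x4 + 1                  # 12 rows per table
COLUMNS = 0xF - 0x0 + 1               # 16 columns per table
CELLS_PER_TABLE = ROWS * COLUMNS      # 192 characters per table
TOTAL_CELLS = TABLES * CELLS_PER_TABLE  # 24192 characters in total
```

For the example code CCB7, `split_gbk_code(0xCCB7)` yields table CC, row B, column 7.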
The order of the head characters in the dictionary can be arbitrary. However, the words that begin with a given character must immediately follow that character, so that once the character has been located, the words beginning with it can be looked up quickly. For example:
The people
The people
Personnel
Ah
Auntie
The younger sister
In this embodiment, the index table can be built in either of two ways (though it is not limited to these two):
The first way uses the correspondence between the GBK code of each Chinese character in the dictionary and the in-dictionary offset of that character relative to the start position of the dictionary, with the GBK code serving as the index.
As shown in Fig. 1, the GBK codes in the index table are arranged in ascending order, and the offset of each character relative to the start position of the dictionary occupies 4 bytes. As the figure shows, the first GBK table, numbered "81", is at the very front of the index table, and the characters of that table are arranged row by row. The index code at the start of the table is "8140", which denotes the first character of GBK table 81 and corresponds to that character's offset in the dictionary relative to the dictionary start position; the code after it, "8141", denotes the second character of GBK table 81; and so on.
To obtain the in-dictionary offset of an input character, the corresponding offset is looked up in the index table using the character's GBK code.
The second way builds the index table by storing the in-dictionary offsets of the characters, each relative to the start position of the dictionary, in the order of the GBK code tables and of the characters' GBK codes.
As shown in Fig. 2, the in-dictionary offsets in the index table are arranged in the ascending order of the GBK codes of the corresponding characters. As the figure shows, the first 4 consecutive bytes of the index table hold the in-dictionary offset of the first character (GBK code "8140") of the first GBK table, numbered "81"; the next 4 consecutive bytes hold the offset of the second character; and so on.
While the index table is being built, if the character represented by some GBK code is not in the dictionary, the code at the corresponding position is set to null in the index table of Fig. 1, and the in-dictionary offset at the corresponding position is set to null in the index table of Fig. 2. Setting to null means writing a specific marker that differs from every valid code or in-dictionary offset.
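A minimal Python sketch of building the second kind of index table (offsets only, 4 bytes per slot) is given here for illustration. The null marker FFFFFFFF, the little-endian byte order, and the names `build_offset_index` and `slot_number` are assumptions; the patent only requires that the null marker differ from every valid offset.

```python
import struct

TABLES = 126            # GBK tables 81..FE
CELLS_PER_TABLE = 192   # 12 rows x 16 columns
NULL_OFFSET = 0xFFFFFFFF  # assumed null marker, not specified by the patent


def slot_number(code):
    """Zero-based slot of a GBK code when all codes are laid out in ascending order."""
    return ((code >> 8) - 0x81) * CELLS_PER_TABLE + ((code & 0xFF) - 0x40)


def build_offset_index(offsets):
    """Build the index table from a mapping {GBK code: in-dictionary offset}.

    Slots whose character is not in the dictionary keep the null marker.
    """
    index = bytearray(struct.pack("<I", NULL_OFFSET) * (TABLES * CELLS_PER_TABLE))
    for code, offset in offsets.items():
        struct.pack_into("<I", index, slot_number(code) * 4, offset)
    return bytes(index)
```

With this layout the index table has a fixed size of 24192 × 4 bytes regardless of how many characters the dictionary actually contains, which is what makes a lookup a single computed read.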
The character locating process therefore differs according to the index table used. The process using the index table of Fig. 1 is shown in Fig. 3:
Step 1: establish the correspondence between the GBK code of each Chinese character in the dictionary and the in-dictionary offset of that character relative to the start position of the dictionary, as shown in Fig. 1.
Step 2: from the GBK table number given by the first byte of the input character and the start position of the index table, determine the start position in the index table of the code of the first character of the GBK table containing the input character.
Step 3: from the row number and column number in the GBK table given by the second byte of the input character, calculate the in-table offset of the character relative to the first character of that table.
Step 4: from said start position and said in-table offset, determine the position of the character's code in the index table, and read the in-dictionary offset.
Step 5: locate the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset.
So that a character absent from the dictionary is reported promptly, step 4 judges, before the in-dictionary offset is read, whether the corresponding GBK code is null; if so, locating ends and the input character is reported as not being in the dictionary; otherwise step 5 follows.
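Steps 2 to 5 with the Fig. 1 style index can be sketched in Python as follows. The sketch assumes that each entry stores the 2-byte code immediately followed by its 4-byte offset (6 bytes per entry), that 0000 is the null code, and that the dictionary start position is 0; the names `ENTRY`, `build_code_index`, and `locate_with_code_index` are hypothetical.

```python
import struct

ENTRY = struct.Struct(">HI")  # assumed layout: 2-byte GBK code + 4-byte offset
NULL_CODE = 0x0000            # assumed null marker for an absent character
CELLS_PER_TABLE = 192


def build_code_index(offsets):
    """Step 1: store (code, offset) pairs in ascending code order."""
    index = bytearray(ENTRY.size * 126 * CELLS_PER_TABLE)  # zero-filled: all null
    for code, offset in offsets.items():
        slot = ((code >> 8) - 0x81) * CELLS_PER_TABLE + ((code & 0xFF) - 0x40)
        ENTRY.pack_into(index, slot * ENTRY.size, code, offset)
    return bytes(index)


def locate_with_code_index(index, code, dictionary_start=0):
    """Return the position of the character in the dictionary, or None if absent."""
    table_start = ((code >> 8) - 0x81) * CELLS_PER_TABLE * ENTRY.size  # step 2
    in_table = ((code & 0xFF) - 0x40) * ENTRY.size                     # step 3
    stored_code, offset = ENTRY.unpack_from(index, table_start + in_table)  # step 4
    if stored_code == NULL_CODE:
        return None                    # character is not in the dictionary
    return dictionary_start + offset   # step 5
```

The lookup performs one multiplication-and-add to find the entry and one read, so its cost does not depend on the size of the dictionary.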
The character locating process using the index table of Fig. 2 is shown in Fig. 4:
So that a character absent from the dictionary is reported promptly, step 14 also judges, before the offset is read, whether the offset is null; if so, locating ends and the input character is reported as not being in the dictionary; otherwise step 15 follows.
As can be seen from the above, the two processes differ in the index table used and in how the offset is obtained from it. With the index table of Fig. 1, the GBK code of the input character is used to find the position of that code in the index table; with the index table of Fig. 2, the GBK code of the input character is used to find the position of the corresponding offset in the index table. The calculation is described below for an in-dictionary offset that occupies 4 bytes.
(1) Calculate the position number of the GBK table containing the character from the first byte of the input character's GBK code:
7E (total number of tables) - (FE - first byte) - 1, where 81 is the first table number and FE the last, and table 81 has position number 0.
(2) Calculate the start position in the index table of the GBK table containing the input character:
position number × 192 (characters per table) × N. For the index of Fig. 1, N is 2, the number of bytes in each character's GBK code (if each code is stored contiguously with its in-dictionary offset, that is, two bytes of code immediately followed by the 4-byte offset, N is 2 + 4 = 6 bytes). For Fig. 2, N is 4, the number of bytes occupied by each in-dictionary offset.
(3) Calculate the in-table offset of the character from the second byte of the input character's GBK code:
[192 (characters per table) - (256 - second byte (hex))] × N, where 256 is the last character code (FF) + 1. For the index of Fig. 1, N is 2 + 4, the number of bytes occupied by each character's GBK code together with its offset. For Fig. 2, N is 4, the number of bytes occupied by each in-dictionary offset.
(4) Adding the in-table offset to the start position of the GBK table in the index table gives the position in the index table from which the in-dictionary offset of the input character is read.
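Formulas (1) to (4) can be transcribed into Python almost literally. The sketch below applies them to a Fig. 2 style index (N = 4); the function names, the little-endian storage, and the null marker FFFFFFFF are assumptions made for illustration.

```python
import struct


def table_position_number(first_byte):
    """Formula (1): 7E - (FE - first byte) - 1, so table 81 has number 0."""
    return 0x7E - (0xFE - first_byte) - 1


def table_start(first_byte, n):
    """Formula (2): position number x 192 x N."""
    return table_position_number(first_byte) * 192 * n


def in_table_offset(second_byte, n):
    """Formula (3): [192 - (256 - second byte)] x N."""
    return (192 - (256 - second_byte)) * n


def read_dictionary_offset(index, code, null_offset=0xFFFFFFFF):
    """Formula (4) for a Fig. 2 style index: N = 4 bytes per slot."""
    position = table_start(code >> 8, 4) + in_table_offset(code & 0xFF, 4)
    (offset,) = struct.unpack_from("<I", index, position)
    return None if offset == null_offset else offset
```

Formula (1) simplifies to first byte - 81 and formula (3) to (second byte - 40) × N, which is why the whole lookup reduces to a fixed amount of arithmetic followed by one read.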
In accordance with the above description, the invention likewise provides a Chinese dictionary search engine that quickly locates an input Chinese character in a Chinese dictionary; the search engine works together with the processor, memory, and input device of a computer to locate characters. As shown in Fig. 5, the search engine comprises:
a first module, for establishing and storing the correspondence between the GB2312/GBK code of each Chinese character in the dictionary and the in-dictionary offset of that character relative to the start position of the dictionary;
a second module, for receiving an input Chinese character and using its GB2312/GBK code to obtain the corresponding in-dictionary offset from the first module;
a third module, for locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset, and outputting it.
The second module comprises:
a first unit, for calculating, from the GB2312/GBK table number given by the first byte of the input character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character;
a second unit, for calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
a third unit, for determining, from the results of the first unit and the second unit, the position of the character's code in the index table, and reading the in-dictionary offset.
Likewise, another Chinese dictionary search engine (whose structure is also as shown in Fig. 5) comprises:
a first module, for storing the in-dictionary offsets of the Chinese characters in the dictionary, each relative to the start position of the dictionary, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding characters;
a second module, for receiving an input Chinese character, calculating, from the GB2312/GBK table number given by the first byte of its code and the row number and column number in the GB2312/GBK table given by the second byte, the position in the first module of the in-dictionary offset corresponding to the character, and reading the in-dictionary offset from the first module;
a third module, for locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset, and outputting it.
The second module comprises:
a first unit, for calculating, from the GB2312/GBK table number given by the first byte of the input character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character;
a second unit, for calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
a third unit, for determining, from the results of the first unit and the second unit, the position of the character's code in the index table, and reading the in-dictionary offset.
Locating Chinese characters encoded in GB2312 works in the same way as the method above and is not described again. Obviously, the index table of the invention can also take other forms, and those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.
Claims (5)
1. A method for quickly locating characters in a Chinese dictionary, the Chinese characters in the dictionary being encoded in GB2312/GBK, characterized in that the method comprises the following steps:
A. establishing a correspondence between the GB2312/GBK code of each Chinese character in the Chinese dictionary and the in-dictionary offset of that character relative to the start position of the dictionary;
B. determining, from the GB2312/GBK table number given by the first byte of the input Chinese character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character;
C. calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
D. determining, from said start position and said in-table offset, the position of the character's code in the index table, and reading the in-dictionary offset;
E. judging whether the GB2312/GBK code of the input character is null; if so, ending the locating and indicating that the input character is not in the Chinese dictionary; otherwise executing step F;
F. locating the input character directly in the Chinese dictionary from the start storage position of the dictionary and said in-dictionary offset.
2. The method of claim 1, characterized in that said correspondence is stored in an index table, with the GB2312/GBK codes serving as the index and arranged in ascending order.
3. A method for quickly locating a word in a Chinese dictionary, wherein the Chinese words in the Chinese dictionary are encoded in GB2312/GBK, characterized in that the method comprises the following steps:
A. storing the in-dictionary offsets of the Chinese words in the Chinese dictionary, relative to the start position of the Chinese dictionary, in the order of the GB2312/GBK codes of the corresponding Chinese words in the GB2312/GBK code table, to form an index table;
B. determining, from the GB2312/GBK table number indicated by the first byte of an input Chinese word and the start position of the index table, the position in the index table of the in-dictionary offset corresponding to the first Chinese word of the GB2312/GBK table that contains the input Chinese word;
C. calculating, from the row number and column number in the GB2312/GBK table indicated by the second byte of the input Chinese word, the in-table offset of the input Chinese word relative to the first Chinese word of that GB2312/GBK table;
D. determining the position in the index table of the in-dictionary offset corresponding to the input Chinese word from the position corresponding to said first Chinese word and said in-table offset, and reading the in-dictionary offset;
E. judging whether the in-dictionary offset corresponding to the input Chinese word is empty; if so, ending the locating process and indicating that the input Chinese word is not in the Chinese dictionary; otherwise, reading the in-dictionary offset from its position in the index table and locating the input Chinese word directly in the Chinese dictionary from the start storage position of the Chinese dictionary and said in-dictionary offset.
4. A Chinese dictionary search engine for quickly locating an input Chinese word in a Chinese dictionary, wherein the Chinese words in the Chinese dictionary are encoded in GB2312/GBK, characterized in that it comprises:
a first module, configured to establish and store a correspondence between the GB2312/GBK code of each Chinese word in the Chinese dictionary and the in-dictionary offset of that Chinese word relative to the start position of the Chinese dictionary;
a second module, configured to receive the input Chinese word and to use the GB2312/GBK code of the input Chinese word to obtain the corresponding in-dictionary offset from the first module;
a third module, configured to judge whether the GB2312/GBK code of the input Chinese word is empty; if so, to end the locating process and indicate that the input Chinese word is not in the Chinese dictionary; otherwise, to locate the input Chinese word directly in the Chinese dictionary from the start storage position of the Chinese dictionary and said in-dictionary offset, and to output it;
wherein the second module comprises:
a first unit, configured to calculate, from the GB2312/GBK table number indicated by the first byte of the input Chinese word and the start position of an index table, the start position in the index table of the code of the first Chinese word of the GB2312/GBK table that contains the input Chinese word, wherein the index table stores said correspondence, with the GB2312/GBK codes serving as the index and arranged in ascending code order;
a second unit, configured to calculate, from the row number and column number in the GB2312/GBK table indicated by the second byte of the input Chinese word, the in-table offset of the input Chinese word relative to the first word of that GB2312/GBK table;
a third unit, configured to determine the position of the code of the input Chinese word in the index table from the results of the first unit and the second unit, and to read the in-dictionary offset.
5. A Chinese dictionary search engine for quickly locating an input Chinese word in a Chinese dictionary, wherein the Chinese words in the Chinese dictionary are encoded in GB2312/GBK, characterized in that it comprises:
a first module, configured to store, in the order of the GB2312/GBK codes of the corresponding Chinese words in the GB2312/GBK code table, the in-dictionary offsets of the Chinese words in the Chinese dictionary relative to the start position of the Chinese dictionary;
a second module, configured to receive the input Chinese word and to calculate the position in the first module of the in-dictionary offset corresponding to the input Chinese word, from the GB2312/GBK table number indicated by the first byte of the code of the input Chinese word and from the row number and column number in the GB2312/GBK table indicated by the second byte;
a third module, configured to judge whether the in-dictionary offset corresponding to the input Chinese word is empty; if so, to end the locating process and indicate that the input Chinese word is not in the Chinese dictionary; otherwise, to read the in-dictionary offset from its position in the index table, to locate the input Chinese word directly in the Chinese dictionary from the start storage position of the Chinese dictionary and said in-dictionary offset, and to output it;
wherein the second module comprises:
a first unit, configured to calculate, from the GB2312/GBK table number indicated by the first byte of the input Chinese word and the start position of the index table, the start position in the index table of the code of the first Chinese word of the GB2312/GBK table that contains the input Chinese word, wherein the index table stores said correspondence, with the GB2312/GBK codes serving as the index and arranged in ascending code order;
a second unit, configured to calculate, from the row number and column number in the GB2312/GBK table indicated by the second byte of the input Chinese word, the in-table offset of the input Chinese word relative to the first word of that GB2312/GBK table;
a third unit, configured to determine the position of the code of the input Chinese word in the index table from the results of the first unit and the second unit, and to read the in-dictionary offset.
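The three-module arrangement recited in the engine claims might be sketched as a single small class. The sketch rests on several assumptions that are not taken from the patent: the class and function names are hypothetical, each entry is a single two-byte GBK character, dictionary records are stored as a 2-byte length prefix followed by UTF-8 text, and `None` marks an empty index slot. The index holds one slot per possible GBK two-byte code, so a lookup is a direct array access.

```python
FIRST_LO, FIRST_HI = 0x81, 0xFE     # GBK first-byte range
SECOND_LO, SECOND_HI = 0x40, 0xFE   # GBK second-byte range (0x7F is unused)
TABLES = FIRST_HI - FIRST_LO + 1    # one table per first byte
SLOTS = SECOND_HI - SECOND_LO + 1   # slots per table, including the unused one


def gbk_slot(code: bytes) -> int:
    """Map a two-byte GBK code to its slot number in the index table."""
    if len(code) != 2:
        raise ValueError("expected a two-byte GBK code")
    first, second = code
    if not (FIRST_LO <= first <= FIRST_HI and SECOND_LO <= second <= SECOND_HI):
        raise ValueError("bytes outside the GBK two-byte range")
    return (first - FIRST_LO) * SLOTS + (second - SECOND_LO)


class DictionarySearchEngine:
    def __init__(self, entries):
        # "First module": build the dictionary blob and the index table.
        # entries maps a single Chinese character to its dictionary text.
        self.index = [None] * (TABLES * SLOTS)   # None means "not in dictionary"
        blob = bytearray()
        for char, text in entries.items():
            self.index[gbk_slot(char.encode("gbk"))] = len(blob)
            record = text.encode("utf-8")
            blob += len(record).to_bytes(2, "big") + record
        self.blob = bytes(blob)

    def lookup(self, char):
        # "Second module": compute the slot and read the in-dictionary offset.
        offset = self.index[gbk_slot(char.encode("gbk"))]
        # "Third module": report a miss, or jump straight to the record.
        if offset is None:
            return None
        size = int.from_bytes(self.blob[offset:offset + 2], "big")
        return self.blob[offset + 2:offset + 2 + size].decode("utf-8")
```

Using `None` rather than a numeric sentinel keeps offset 0, which is a valid record position, distinct from an empty slot. The cost of the design is a fixed-size table with an entry for every code, including codes that have no dictionary record.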
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB200410104045XA CN100440207C (en) | 2004-12-31 | 2004-12-31 | Chinese dictionary search engine and method for quick positioning words in Chinese dictionary |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB200410104045XA CN100440207C (en) | 2004-12-31 | 2004-12-31 | Chinese dictionary search engine and method for quick positioning words in Chinese dictionary |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1632798A (en) | 2005-06-29 |
CN100440207C (en) | 2008-12-03 |
Family
ID=34848200
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB200410104045XA (Expired - Fee Related), CN100440207C (en) | Chinese dictionary search engine and method for quick positioning words in Chinese dictionary | 2004-12-31 | 2004-12-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100440207C (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101118777B (en) * | 2007-08-22 | 2011-06-22 | Wuxi Vimicro Co., Ltd. | Playing method of multimedia container format file and indexes reading method thereof |
CN102609510B (en) * | 2012-02-06 | 2014-05-28 | Agricultural Bank of China Limited | Chinese name data processing method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1196535A (en) * | 1997-04-15 | 1998-10-21 | Inventec Corporation | The method of automatic labeling of pronunciation symbols |
CN1295295A (en) * | 1999-11-04 | 2001-05-16 | Inventec Group (Xi'an) Electronic Technology Co., Ltd. | Word looking-up method for electronic dictionary with fast polling index structure |
Also Published As
Publication number | Publication date |
---|---|
CN1632798A (en) | 2005-06-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US8010344B2 (en) | Dictionary word and phrase determination | |
CN101199122B (en) | Using language models to expand wildcards | |
US8412517B2 (en) | Dictionary word and phrase determination | |
CN110019647B (en) | Keyword searching method and device and search engine | |
CN107545044A (en) | A kind of tables of data method for building up, electronic equipment and storage medium | |
CN102385609A (en) | Enhancing search-result relevance ranking using uniform resource locators for queries containing non-encoding characters | |
US9158758B2 (en) | Retrieval of prefix completions by way of walking nodes of a trie data structure | |
CN109918682B (en) | Text labeling method and device | |
KR20140056231A (en) | Detecting source languages of search queries | |
CN100464333C (en) | File name generating method and device in file distribution system | |
CN101715579A (en) | Language independent index storage system and retrieval method | |
WO2014047214A1 (en) | Hierarchical ordering of strings | |
CN104281275A (en) | Method and device for inputting English | |
CN100440207C (en) | Chinese dictionary search engine and method for quick positioning words in Chinese dictionary | |
CN103064847A (en) | Indexing equipment, indexing method, search device, search method and search system | |
CN101930474A (en) | Chinese character simple stroke search method | |
CN113553410B (en) | Long document processing method, processing device, electronic equipment and storage medium | |
CN102385597B (en) | The fault-tolerant searching method of a kind of POI | |
CN101122905A (en) | Method for associating classical book database with historical geographic information system for supporting four bytes | |
CN110619112A (en) | Pronunciation marking method and device for Chinese characters, electronic equipment and storage medium | |
US7366984B2 (en) | Phonetic searching using multiple readings | |
CN2869995Y (en) | Chinese-character searching engine | |
CN110489603A (en) | A kind of method for information retrieval, device and vehicle device | |
CN101331483A (en) | Method and apparatus for manipulation of data file | |
CN102004598B (en) | Media player and character input method thereof |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
C17 | Cessation of patent right | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20081203; Termination date: 20111231 |