GB2166573A - Encoding Chinese characters

Info

Publication number: GB2166573A
Application number: GB08526483A
Authority: GB
Inventors: Kam-Fu Wong
Original assignee: Individual
Current assignee: Individual
Priority date: 1984-10-29
Filing date: 1985-10-28
Publication date: 1986-05-08
Also published as: CN85108032B; CN85108032A; GB2166573B; GB8526483D0; HK52890A; GB8427281D0

Abstract

A Chinese character to be keyed in is divided into a "leading part" and a "body part". The leading part is keyed in as a whole. The body part is keyed in in terms of its constituent strokes. The keyboard has function keys, leading part keys and body part stroke keys. The body part stroke keys are in the positions which are used most frequently. The invention can also be used for indexing a Chinese dictionary.

Description

SPECIFICATION

Materialistic system for encoding Chinese characters and the feeding keyboard thereof

The invention relates to a system for encoding Chinese characters, by means of which Chinese characters can be fed into an electronic computer, telex or teleprinter machine or the like. The system can also be used to index a Chinese directory.

To date, more than 400 methods for encoding Chinese characters have been proposed, including the method of encoding Chinese characters in numerical notation described and claimed in United Kingdom Patent No. 2 100 899, but only a few are workable. Generally speaking, systems for encoding and inputting Chinese characters fall into five categories.

1. Phonetically Alphabetic system. This kind of system is suitable only for professional computer operators who have a good command of Chinese alphabetic spelling. A character can be identified only when its alphabetic spelling is 100 per cent correct. Moreover, since Chinese has many homonyms, the system is very hard or even impossible to use for a person who does not know Chinese alphabetic spelling or has only scant knowledge of it.

2. Large keyboard system. Under this system, a whole character is fed into the computer by a single key. Only those characters which are on the keyboard can be fed into the computer. The system needs a very large keyboard which occupies a large space. Furthermore, it takes a long time to become familiar with the keyboard, so few people are willing to learn it unless they have to, and nobody can use it to type or feed characters into a computer without looking at it.

3. Parts system. Under this system, the Chinese character is divided into several parts, each of which is given a code (or digit) and can then be fed into the computer. The advantage of this system is that a Chinese character which can be easily divided can be fed into the computer quickly, for example '.#+A=#,#+1:#.. Such feeding depends on 100 per cent correct division of the character; otherwise the character cannot be identified. Some characters, for example it * 7+, * '', are almost impossible to divide into parts, particularly for those who are not conversant with Chinese characters. A further difficulty is that there is no strict rule for such division, so the division of a character varies from person to person and it is not easy to divide a character correctly.

4. Number code system. This is a better system which is relatively easy to learn, but it usually involves a long code, and some characters may share the same code. To encode a character under this system, the character is first divided into several parts and each part is then given a digit. Since there is no strict rule for dividing a character into parts, and a stroke may be given a different digit in different characters, people often hesitate when working out the code of a character.

5. Word, phrase and sentence system. This encoding system follows the logic of the Chinese language closely. Under this system, Chinese characters compose 200,000 phrases and sentences, and a whole phrase or whole sentence is fed into the computer at once. The system is suitable only for a computer which has good performance and a large storage space.

To sum up, each of the five systems mentioned above has its own defects. It is too difficult to divide Chinese characters into parts and encode them. In addition, it is very inconvenient to encode a Chinese character stroke by stroke, and the resulting code is quite long. A Chinese character consists of a leading part and a body part or strokes, but how to divide them is very difficult.

It is therefore an object of the present invention to provide a simple encoding system for Chinese characters and, in particular, a system whereby the character can be encoded in a form suitable for simple and quick entry into an electronic computer.

According to the present invention, each Chinese character is considered as a plane figure which can be divided into two parts, a leading part and a body part. The leading part is represented by one code element and the body part by up to four code elements (digits). The code element for the leading part is fed into the computer first, followed by the code elements for the body part, so that the whole Chinese character is fed into the computer.

Since the encoding system of the present invention is not based on the phonetic alphabet or on the stroke order of handwriting, it is especially suitable for those who are not familiar with Chinese characters. By means of the encoding method of the present invention, anyone can feed Chinese characters into a computer without difficulty, whether or not he or she knows the phonetic alphabet and the handwriting rules for Chinese characters.

The keyboard of the present invention can be used for both Chinese characters and English, so it is applicable anywhere in the world. Moreover, one can type on it without looking at it.

A brief description of the attached drawings follows.

Figure 1 shows a keyboard arranged according to the encoding system of the present invention.

Figure 2 shows the relation between 10 basic strokes and the corresponding keys on the keyboard of the encoding system of this invention.

Figure 3 shows an example index of a Chinese dictionary compiled using the encoding system of this invention.

According to the present invention there are 82 leading parts, which are arranged on 35 keys. Three leading parts are arranged on each key in the second line (see Fig. 1), and two leading parts on each key in the third and fifth lines. With the help of the leading-part-keys the operator can easily divide a Chinese character into a leading part and a body part without having to think about how to divide it. This reduces hesitation during the division, reduces the burden on memory, reduces the mental effort required and increases the encoding speed.
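As a quick consistency check on these figures, the short sketch below assumes that every key in the second line carries exactly three leading parts and every key in the third and fifth lines carries exactly two. The text does not state how many keys each line has, so `n_three` and `n_two` are hypothetical names for those counts, and Fig. 1 remains the authority on the actual layout.

```python
# Find key counts satisfying: n_three + n_two = 35 keys
# and 3 * n_three + 2 * n_two = 82 leading parts.
TOTAL_KEYS = 35
TOTAL_LEADING_PARTS = 82

solutions = [
    (n_three, TOTAL_KEYS - n_three)
    for n_three in range(TOTAL_KEYS + 1)
    if 3 * n_three + 2 * (TOTAL_KEYS - n_three) == TOTAL_LEADING_PARTS
]

print(solutions)
```

Under the stated assumption the two totals admit a single split, which is what one would expect if the stated counts are mutually consistent.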

For the body part there are 10 stroke shapes #, #, #, #, #, #, #, #, #, # which are represented respectively by the digits 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 (i.e. keys A, S, D, F, G, H, J, K, L, +). Since for most Chinese characters the body-part-keys must be pressed 3 or 4 times, it is appropriate to arrange the body-part-keys in the positions which are most frequently used, so that finger movement is reduced and encoding speed is increased.
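The correspondence between stroke digits and keys listed above can be written down as a small lookup table. The following is only an illustrative sketch: the digits and key labels are taken from the text, while the names `DIGIT_TO_KEY`, `KEY_TO_DIGIT` and `keys_for_body_code` are hypothetical.

```python
# Digits 1..9 and 0 label the ten stroke shapes; the key labels are
# copied from the text in the same order.
STROKE_DIGITS = "1234567890"
BODY_PART_KEYS = ["A", "S", "D", "F", "G", "H", "J", "K", "L", "+"]

DIGIT_TO_KEY = dict(zip(STROKE_DIGITS, BODY_PART_KEYS))
KEY_TO_DIGIT = {key: digit for digit, key in DIGIT_TO_KEY.items()}


def keys_for_body_code(body_code):
    """Return the keys to press for a body-part code such as '7337'."""
    return [DIGIT_TO_KEY[digit] for digit in body_code]


print(keys_for_body_code("7337"))
```

Because all ten keys sit in one row, a body-part code of three or four digits is typed without leaving that row.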

During encoding, one first divides a Chinese character into a leading part and a body part, finds the corresponding leading-part-key on the keyboard and presses it; the code representing the leading part is then fed into the computer. For example, if someone wants to feed the Chinese character " ## " into the computer, he first divides " ## " into " # " and " # ". The " # " is the leading part. On the keyboard he can find the leading-part-key for the leading part " # " and press it; the leading part " # " is now fed into the computer. The lower part and the right part of a character, however, cannot be called the leading part, and thus these parts cannot be fed in by leading-part-keys.

By way of example, the character " ## " has a part " # " on the right. Although the " # " part has a key on the keyboard representing the same shape as a leading part, the " # " part of the character cannot be input by that key. The character " ## " thus can only be treated as a connected character. Another example is " # ". The " # " part cannot be input by the same leading-part-key. This character " # " can only be input as a connected character.

After the leading part has been fed into the computer, one turns to encoding the body part of the character. At most 4 strokes are extracted from the body part. The basic rule for taking strokes is upper level first, lower second. For example, for the character " # ", the leading part is " # " and the first four strokes of the body part are "-", "I", "I" and "-", with corresponding code elements (digits) 7, 3, 3 and 7, so the code of the character " # " is # 7337.

As another example, the Chinese character " ## " can be divided into the leading part " # " and the body part " # ". The strokes of this body part are "-" and " # ", corresponding to the digits 7 and 3, so the code for " ## " is # 73.
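The two examples above both follow the same pattern: one code element for the leading part, then at most four stroke digits. A minimal sketch of that assembly step is given below. The function name `encode` and the key label "T" are hypothetical stand-ins (the actual leading-part symbols are not legible in this text), and the body strokes are assumed to have been converted to digits already.

```python
MAX_BODY_STROKES = 4  # at most four strokes are taken from the body part


def encode(leading_key, body_stroke_digits):
    """Join a leading-part key with the first four body-stroke digits."""
    if any(ch not in "0123456789" for ch in body_stroke_digits):
        raise ValueError("body strokes must be given as digits 0-9")
    return leading_key + body_stroke_digits[:MAX_BODY_STROKES]


# "T" is a hypothetical label for a leading-part key.
print(encode("T", "7337"))    # four strokes, all used
print(encode("T", "73"))      # a short body part gives a short code
print(encode("T", "733712"))  # strokes beyond the fourth are ignored
```

Truncating rather than rejecting long stroke sequences mirrors the statement that strokes are extracted at most four times.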

However, the body part of some Chinese characters can be divided into a left part and a right part. For those characters there is another rule for taking strokes: left first, right second.

By way of example, the body part " # " of the character " ### " can be divided into " # " and " # ". In encoding, the strokes of the left part " # " are taken first and the strokes of the right part " # " second, each following the rule upper first, lower second.

For characters which can be divided into an outside part and an inside part, the rule for taking strokes is outside first, inside second. Within the inside or outside sub-part the rule is still upper first, lower second. For example, the character " ## " has the body part " # ", whose outside part is " # " and whose inside part is " # "; the first four strokes of the body part are "L", "I", "7" and " # ".

The connected characters are those characters which have no leading parts, so the code for such characters is taken stroke by stroke. At most five strokes are taken from a connected character. The rule for taking the code of connected characters is the same as the rule for the body part of other characters. For example, the code for the character " # " is 656, and the code for the character " # " is 07547.
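The stroke-taking rules above (upper before lower, left before right, outside before inside, applied again within each sub-part) amount to an ordered walk over the parts of a character that stops once enough strokes have been collected. The sketch below models this under assumptions of its own: the `Stroke` and `Split` classes and the function `take_strokes` are hypothetical, and the example digits are invented for illustration rather than taken from a real character.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Stroke:
    digit: str  # one of "0".."9", the code of a basic stroke shape


@dataclass
class Split:
    first: "Part"   # upper, left or outside sub-part (taken first)
    second: "Part"  # lower, right or inside sub-part (taken second)


Part = Union[Stroke, Split]


def take_strokes(part, limit):
    """Collect stroke digits in rule order, stopping after `limit` strokes."""
    digits = []

    def visit(node):
        if len(digits) >= limit:
            return
        if isinstance(node, Stroke):
            digits.append(node.digit)
        else:
            visit(node.first)
            visit(node.second)

    visit(part)
    return "".join(digits)


# Hypothetical body part: a left half (two strokes, upper then lower)
# and a right half (three strokes, upper to lower).
EXAMPLE = Split(
    Split(Stroke("7"), Stroke("3")),
    Split(Stroke("5"), Split(Stroke("8"), Stroke("6"))),
)

print(take_strokes(EXAMPLE, limit=4))  # limit for a body part
print(take_strokes(EXAMPLE, limit=5))  # limit for a connected character
```

The same walk serves both cases; only the limit differs, four for a body part and five for a connected character.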

The keyboard of the present invention can be a normal English keyboard (see Fig. 1). The first line of keys serves as function keys, by means of which the feeding mode can be selected. For example, F1 selects the feeding mode of the encoding system according to the present invention, F2 selects the feeding mode of the Phonetically Alphabetic system, and so on.

The second, third and fifth lines of the keyboard serve as "leading-part-keys". The 82 basic leading parts are arranged on 35 keys.

In the fourth line, the first 10 keys from the left serve as "body-part-keys", each of which corresponds to two similar strokes (see Fig. 2).

The lowest bar key on the keyboard serves as the feeding key, which is pressed after encoding. This key is also the space key.

Furthermore, the last two lines at the bottom of the computer screen act as a "note-window", which shows 22 Chinese characters and their codes in two lines. The operator can see the Chinese characters and their codes on the screen, so that he can learn the codes by looking at the "note-window" and need not look up the codes of hard-to-encode characters in the code handbook.

By way of example, suppose we want to feed into the computer the character " # ", whose leading part is " # ".

First step: press the leading-part-key " # ". 22 Chinese characters which have the leading part " # " or " # " are shown on the "note-window"; most of them have a code composed of 4 digits.

Second step: since the body part of " # " is " # ", the first stroke of which is "/", corresponding to digit 5, press the body-part-key "5". Those characters with the leading part " # " or " # " whose body part has the first stroke "/" are shown on the "note-window". These characters have at most 3 digits of code left.

Third step: since the second stroke of the body part " # " is "7", corresponding to digit 8, press the body-part-key "8". A number of characters which have the second stroke "7" in their body part are shown on the "note-window". These characters have at most 2 digits left.

Fourth step: since the third stroke of the body part " # " is " # ", corresponding to digit 6, press the body-part-key "6". A number of characters which have the third stroke " # " in their body part are shown on the "note-window".

Fifth step: the fourth stroke of the body part " # " corresponds to digit 4, so press the body-part-key "4". The character " # " is the one that remains on the "note-window"; by pressing the bar key, one can then feed the character " # " into the computer.
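The narrowing of the "note-window" in the steps above behaves like a prefix search over a table of codes. The following sketch illustrates that behaviour only; it is not the patented implementation. `CODEBOOK`, its entries, the key label "N" and the function `note_window` are all hypothetical, chosen so that the digits 5, 8, 6, 4 from the example can be replayed.

```python
NOTE_WINDOW_SIZE = 22  # the note-window shows 22 characters and their codes

# Hypothetical codebook: code -> character label. "N" and "T" stand in
# for leading-part keys; the labels stand in for Chinese characters.
CODEBOOK = {
    "N5864": "char-A",
    "N5861": "char-B",
    "N58": "char-C",
    "N7337": "char-D",
    "T5864": "char-E",
}


def note_window(codebook, typed):
    """Return up to 22 (code, character) pairs whose code starts with `typed`."""
    matches = sorted(
        (code, char) for code, char in codebook.items() if code.startswith(typed)
    )
    return matches[:NOTE_WINDOW_SIZE]


# Replay the key presses: leading part, then digits 5, 8, 6, 4.
for length in range(1, 6):
    typed = "N5864"[:length]
    print(typed, note_window(CODEBOOK, typed))
```

Each further key press can only shrink the candidate list, which is why the operator can stop and press the bar key as soon as the wanted character is the one remaining.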

In the same way, Chinese characters can be fed into the computer by means of the Phonetically Alphabetic system. For example, to feed the character " ## ", which has the phonetic spelling "Fei":

First step: press the key "F". A number of characters whose phonetic spelling begins with F will be shown on the "note-window". The area at the far right shows the letters fed into the computer.

Second step: press the key "e". The "note-window" will then show many characters.

Third step: press the key "i". All characters which have the same phonetic spelling "Fei", and their notes, are shown on the "note-window". You can now select " ## " from them.

Another advantage of the present invention is that one character may have two or more codes. This brings much convenience to the operator. If a character has only one code, finding it depends on 100% correct encoding and feeding; otherwise the character cannot be found. If there are two or more codes for one character, encoding becomes much easier. For example, the character " # " has two codes, Y0591 and YWI, through either of which the character " # " can be fed into the computer, so that the feeding speed and efficiency are improved.
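One simple way to picture several codes leading to one character is a lookup table in which more than one key points at the same entry. The sketch below uses the two codes quoted above; the label "char-X" stands in for the character itself, and the names `CODES_BY_CHARACTER`, `build_lookup` and `LOOKUP` are hypothetical.

```python
# Codes Y0591 and YWI are quoted in the text; "char-X" is a placeholder
# for the character they both select.
CODES_BY_CHARACTER = {"char-X": ["Y0591", "YWI"]}


def build_lookup(codes_by_character):
    """Invert character -> codes into code -> list of characters."""
    lookup = {}
    for char, codes in codes_by_character.items():
        for code in codes:
            lookup.setdefault(code, []).append(char)
    return lookup


LOOKUP = build_lookup(CODES_BY_CHARACTER)
print(LOOKUP["Y0591"], LOOKUP["YWI"])
```

Storing a list per code also leaves room for the opposite case, where one code is shared by several characters and the operator chooses among them.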

This system for encoding Chinese characters can also be used for indexing a Chinese dictionary. Under this system, characters with the same code are put on the same page of the dictionary. This is a new development in the indexing of Chinese characters. Since the leading part code and the body part code are used as the page number in the Chinese dictionary, searching for words in such a dictionary becomes simpler.

For example, suppose we want to search for the character " ## ". The procedure for finding the character " ## " is:

(1) Determine the leading part code first. From Fig. 3 you can find that the code for " # " is 17, which corresponds to page 17 in the dictionary.

(2) Determine the body part code. In this case the body part is " # ". The first stroke of " # " is " # ", corresponding to digit 5.

(3) The character " ## " should therefore be on page "17-5" (see the following chart).

[Chart: excerpt of the dictionary index listing the characters filed under pages of the form 17-n, including 17-6, 17-7, 17-8 and 17-9.]
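The page lookup in steps (1) to (3) combines the leading part code with the first body-stroke digit. A minimal sketch follows, assuming the page label is formed exactly as in the "17-5" example; the function name `index_page` is hypothetical, and returning the bare leading part code for an empty body part is an assumption not stated in the text.

```python
def index_page(leading_part_code, body_code):
    """Form a page label from the leading-part code and first stroke digit."""
    if not body_code:
        # Assumption: with no body strokes, fall back to the leading-part page.
        return str(leading_part_code)
    return f"{leading_part_code}-{body_code[0]}"


print(index_page(17, "5"))
```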

Claims

1. A method of encoding and inputting Chinese characters in which each character is considered as a plane figure with at least a first part having at least one stroke and, optionally, one or more additional parts together called a body part having at least one stroke, characterized in that, where the first part corresponds to one of a number of specified leading parts, this first part is input as a whole and each stroke of the body part is input separately and, where the first part does not correspond to any of the specified leading parts, the strokes of the first and body parts are input separately.

2. A method, according to Claim 1, for encoding and inputting Chinese characters in a way that each character has been divided into two parts, characterized in that each character is considered as a plane figure and divided into two parts, the first part called leading part and the second part called body part, the leading part of the character is input as a whole, while the body part is input from stroke to stroke.

3. The method, as claimed in Claim 1 or Claim 2, in which the principle for encoding Chinese characters is to encode the leading part before encoding the body part.

4. The method, as claimed in Claim 3, in which there are eighty two basic leading parts extracted from all Chinese characters, each leading part is input by only one key.

5. The method, as claimed in Claim 2, in which at most four strokes are taken from the body part of the character for encoding and inputting the body part.

6. The method, as claimed in Claim 5, in which each of the four strokes is one of ten basic strokes provided for encoding and inputting the body part.

7. The method, as claimed in Claim 6, in which the rule for taking strokes from the body part is upper first, lower second; if the body part can be divided into left part and right part, or outside part and inside part, the rules for taking parts are left first, right second or outside first, inside second; and if a part can be divided into sub-parts, in each sub-part the rule for taking strokes is still upper first, lower second.

8. The method, as claimed in any preceding claim, for encoding and inputting Chinese characters into a machine having a screen, wherein a "note-window" is provided in two lines on the bottom of the screen.

9. The method, as claimed in any preceding claim, wherein connected characters are input by taking and encoding at most five strokes from the character by the same rule as for the body part.

10. The method as claimed in any preceding claim, wherein a character can be encoded in many different ways.

11. A dictionary of Chinese characters in which the characters have been encoded by a method as claimed in any preceding claim and the order of the codes in the dictionary is such that all those starting with the same digit have been grouped together and sub-divided.

12. A keyboard through which the encoded Chinese character can be input, characterized in that the keyboard is based on an ordinary Roman letter typewriter keyboard.

13. The keyboard, as claimed in Claim 12, in which thirty five keys in three lines of the keyboard represent the eighty two basic leading parts, two or three leading parts will be input by pressing one leading part key.

14. The keyboard, as claimed in Claim 13, wherein a row of function keys are provided on the keyboard for selecting different input modes.

15. The keyboard, as claimed in Claim 14, wherein a row of body part keys are placed in the positions which are in the most frequent use on the keyboard, i.e. where the hands of the operator are usually rested.

16. The keyboard, as claimed in Claim 15, wherein the Roman alphabet along with the leading parts or strokes actuated by the same key can be selected by pressing different function keys.

17. The keyboard, as claimed in Claim 16, in which after encoding, the Chinese character can automatically be fed into computer by pressing the space key one time.

18. The keyboard, as claimed in Claim 17, wherein the keyboard is formed as shown in Fig.