JPH08265167A - Data compression device

Info

Publication number: JPH08265167A
Application number: JP6234795A
Authority: JP
Inventors: Masaru Kobayashi; 優小林
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 1995-03-22
Filing date: 1995-03-22
Publication date: 1996-10-11

Abstract

(57) [Abstract]
[Purpose] To reduce the size of a data compression device and to lower its cost.
[Configuration] The control unit 102 determines whether the input character stored in the input register 101 is the first character of a substring. According to the result, it changes the addresses held in the match position pointers 109 to 112 while using the comparison circuit 106 to compare characters read from the dictionary memory 103 with the input character, and thereby detects the longest matching sequence. The match length of this sequence is counted by the match length counter 114, and the match length and the match position output from the subtractor 116 are output from the code output unit 115 together with a match flag. The control unit 102 also writes each input character into the dictionary memory 103 as it arrives, changing the write address held in the input pointer 113.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a technique for compressing data while it is being input.

[0002]

[Prior Art] In recent years, the amount of data handled in fields such as computing and communications has grown rapidly. As data volumes grow, so do the cost of storing data and the cost of communication, such as telephone charges. To hold these costs down, data compression techniques, which compress a data string by removing the redundancy it contains, have come into wide use. Data compression already plays a very important role in fields such as file archivers, distributed file systems, data communication, voice and image communication, and computer networks, and its importance is expected to grow further.

[0003] Data compression methods are broadly divided according to whether distortion arises when the compressed data string is decoded back into the original (input) data string: noisy compression (lossy coding) introduces distortion, and noiseless compression (lossless coding) does not. Noisy compression is normally used for data that can tolerate some distortion, such as images and audio, whereas noiseless compression is used for data in which no distortion is permitted, such as files (for example, programs) and documents.

[0004] One noiseless compression method in wide use today is the Lempel-Ziv method (hereinafter, the LZ method). The LZ method is a kind of adaptive coding: it extracts the statistical properties of a character string while reading it and encodes accordingly. Here, a character means the smallest processing unit of the input data string to be encoded.

[0005] FIG. 5 is a block diagram showing the configuration of a conventional data compression device 500 that uses the LZ method. The conventional data compression device 500 is described below with reference to FIG. 5.

[0006] As shown in FIG. 5, the data compression device 500 comprises: an input memory 501 that stores an input data string (character string) IW received from outside; a dictionary memory 502 in which substrings of the input memory 501 whose encoding has finished are stored in sequence; a comparison circuit 503 that compares a character read from the input memory 501 with a character read from the dictionary memory 502; a control unit 504 that controls the device 500 as a whole; a code memory 505 and a flag register 506 in which the codewords generated by the control unit 504 are stored; and an output memory 507 used to output the codewords stored in the code memory 505 and the flag register 506.

[0007] In this configuration, the character string stored in the dictionary memory 502 serves as the dictionary. The control unit 504 changes the addresses from which characters are read in the input memory 501 and the dictionary memory 502 and uses the comparison circuit 503 to check whether the characters read from the two memories match. In this way it searches the dictionary memory 502 for a substring (a matching sequence) that matches the substring of the input memory 501 currently being encoded.

[0008] The control unit 504 searches for a matching sequence (substring) as follows, for example. First, it looks for a character that matches the first character of the substring currently being encoded, scanning the dictionary memory 502 in order from the oldest stored character. When it finds such a character, it stores the address at which that character is held (hereinafter, the match position). It then compares, one after another, the characters that follow the found character in the dictionary memory 502 with the characters that follow the first character of the substring in the input memory 501, and counts the number of characters compared (the match length) for as long as they match. Counting continues until two characters are found to differ, and the counted value (match length) is stored in association with the start address (match position) stored earlier. This completes the extraction of one matching sequence. The extraction is repeated, and the sequence with the greatest match length among those extracted is taken as the longest matching sequence.
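
The conventional search described above can be sketched in Python as follows. This is only an illustration, not the circuit of FIG. 5: the function name `longest_match` and the use of Python strings to stand in for the dictionary memory 502 and for the pending substring in the input memory 501 are assumptions made for the example.

```python
from typing import Tuple


def longest_match(dictionary: str, lookahead: str) -> Tuple[int, int]:
    """Return (match_position, match_length) of the longest run in
    `dictionary` that matches the start of `lookahead`.

    Returns (-1, 0) when no dictionary character matches the first
    character of `lookahead`. Names and types are illustrative only.
    """
    best_pos, best_len = -1, 0
    # Scan from the oldest stored character, as the text describes.
    for pos in range(len(dictionary)):
        if dictionary[pos] != lookahead[:1]:
            continue  # first character does not match here
        # Count how many consecutive characters match from this position.
        length = 0
        while (length < len(lookahead)
               and pos + length < len(dictionary)
               and dictionary[pos + length] == lookahead[length]):
            length += 1
        # Keep the candidate with the greatest match length.
        if length > best_len:
            best_pos, best_len = pos, length
    return best_pos, best_len
```

Note that every candidate position requires re-reading the pending substring from its first character, which is why this approach needs the input to be buffered.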

[0009] The control unit 504 encodes on the basis of the search result. Specifically, if the longest matching sequence found as above has at least a predetermined length, the control unit sets the match flag to "1" and stores it in the flag register 506, and stores the match position and match length of the longest matching sequence in the code memory 505. Conversely, if the length of the longest matching sequence is below the predetermined number, which includes the case where no character matching the first character of the substring being encoded was found in the dictionary memory 502, the control unit sets the match flag to "0" and stores it in the flag register 506, and stores the data representing the first character (for example, its ASCII code) in the code memory 505 unchanged.
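
The flag decision above can be illustrated with a short Python sketch. The threshold value `MIN_MATCH = 2` and the tuple layout of the codeword are assumptions made for the example; the text only speaks of a "predetermined number" and does not specify a bit layout.

```python
MIN_MATCH = 2  # assumed value; the source only says "predetermined number"


def make_codeword(first_char, match_pos, match_len, min_match=MIN_MATCH):
    """Build an illustrative codeword from a search result.

    (1, match_pos, match_len) stands for match flag "1" plus position and
    length; (0, first_char) stands for match flag "0" plus the raw character.
    """
    if match_len >= min_match:
        return (1, match_pos, match_len)
    # Covers both "no match found" and "match too short".
    return (0, first_char)
```

In the device of FIG. 5 the flag would go to the flag register 506 and the remaining fields to the code memory 505; the single tuple here merges the two only for readability.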

[0010] The control unit 504 thus encodes the character string in the input memory 501 substring by substring, and also updates the contents (character strings) held in the input memory 501 and the dictionary memory 502. Specifically, when the encoding of a substring in the input memory 501 finishes, for example, the encoded substring is newly registered in the dictionary memory 502, and as many new characters of the input data string IW as the substring contained are written into the input memory 501.

[0011] When as many match flags as the flag register 506 has bits have been stored in it, the control unit 504 transfers the contents of the flag register 506 and the code memory 505 to the output memory 507. The codewords stored in the output memory 507 are then output externally as the compressed code output OW.

[0012] Repeating the above operation compresses the input data string (character string) IW. The compressed code output OW emitted from the output memory 507 as needed is sent, for example, to a magnetic storage device, where it is stored on a magnetic disk.

[0013] The compressed code output (codewords) OW is decoded by using the previously decoded character string as the dictionary. When the match flag is "0", the character represented by the codeword is output; when it is "1", a substring is extracted from the decoded character string held as the dictionary, as specified by the codeword, and copied.
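
A minimal Python sketch of this decoding step follows. It assumes, purely for illustration, that a codeword is a tuple `(0, char)` or `(1, position, length)` and that `position` is an index into the text decoded so far; the source does not define the codeword layout or how positions are expressed.

```python
def decode(codewords):
    """Rebuild the original text from illustrative codewords.

    (0, ch)          -> match flag "0": emit the character itself.
    (1, pos, length) -> match flag "1": copy `length` characters starting
                        at index `pos` of the already decoded text.
    """
    out = []
    for cw in codewords:
        if cw[0] == 0:
            out.append(cw[1])
        else:
            _, pos, length = cw
            # Copy one character at a time so that a copy may overlap
            # the text it is currently producing.
            for i in range(length):
                out.append(out[pos + i])
    return "".join(out)
```

For example, `decode([(0, "a"), (0, "b"), (0, "c"), (1, 0, 3), (0, "d")])` yields `"abcabcd"`.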

[0014]

[Problem to Be Solved by the Invention] However, because the conventional data compression device 500 described above finds the longest matching sequence by repeatedly extracting matching sequences, the character string to be encoded has to be kept in storage. It is held in the input memory 501, and providing such a memory made the device 500 larger.

[0015] A data compression device is connected, for example, to a computer's bus and used to compress data to be stored in the computer's main memory or external storage. Implementation as an LSI (Large Scale Integrated Circuit) is therefore desired so that the device can be used connected to a bus or the like. A small-scale data compression device has been sought to make LSI implementation easier and to hold down its cost.

[0016] An object of the present invention is to reduce the size of a data compression device and to lower its cost.

[0017]

[Means for Solving the Problem] The data compression device of the present invention presupposes a storage means that stores previously input character strings, and compresses the amount of data representing a character string by referring to the stored character string and detecting substrings that are input repeatedly. It comprises the following means.

[0018] First, a character discrimination means determines whether an input character is the first character of a substring. When the character discrimination means determines that the input character is a first character, a match position storage means stores, as a match position, the address in the storage means at which a character matching the input character is stored.

[0019] Each time a character following the one identified as a first character by the character discrimination means is input, a reading means reads the characters stored in the storage means, changing the address successively from the match position in step with the input.

[0020] A comparison means compares the input character with the character read from the storage means by the reading means to determine whether they match, and a writing means writes each input character into the storage means as it is input.

[0021] A counting means counts the length of the substring stored in the storage means by counting the number of times the comparison means has found a match since the character discrimination means identified an input character as a first character.

[0022] When the comparison result of the comparison means becomes a mismatch after the character discrimination means has identified an input character as a first character, a code generation means generates a codeword representing the substring stored in the storage means from the match position and the number of characters counted by the counting means.

[0023] In the above configuration, it is desirable that: a plurality of match position storage means be provided; the reading means, each time a character is input, read from the storage means the characters stored at the addresses obtained by successively advancing from each of the match positions held in the plurality of match position storage means; the comparison means, each time the reading means reads a character from the storage means, compare that character with the input character in turn; the counting means count once if any of the characters read by the reading means was found by the comparison means to match the input character; and the code generation means generate a codeword representing the substring stored in the storage means if none of the characters read by the reading means was found to match the input character.

[0024]

[Operation] Each time a character is input, the data compression device of the present invention determines whether the character is the first character of a substring and, depending on the result, searches whether the character is stored in the dictionary (storage means). It encodes by detecting repeatedly input substrings while newly registering each input character in the dictionary.

[0025] For example, when the input character is determined to be the first character of a substring, the device searches for multiple dictionary addresses (match positions) at which a character matching the input character is stored. Each character input after that is compared, as it arrives, with the characters read from the addresses obtained by advancing consecutively from each match position. By counting the number of input characters for which at least one of these characters matched, the device fixes the longest matching sequence and its match length. Once the longest matching sequence is fixed, the device generates a codeword from its match position and match length, thereby encoding the input substring that corresponds to it. While encoding in this way, the device newly registers each input character in the dictionary as it arrives.

[0026] Because the match positions are found in advance as described above, encoding can proceed in step with character input, and input characters can be registered in the dictionary as they arrive. This removes the need to store the character string being encoded, so the number of memories and their capacity can be reduced. The device can therefore be made smaller and cheaper.

[0027]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a data compression device 100 according to this embodiment.

[0028] The input data string ID of the data compression device 100 is supplied from an external device not specifically shown (for example, a CPU), for example in ASCII code, which represents one character in 8 bits. The external device supplies the input data string ID together with a 1-bit write signal; when the write signal becomes active, one character of the input data string ID is stored in the input register 101.

[0029] The write signal is supplied to the control unit 102 as well as to the input register 101. The control unit 102 controls the device 100 as a whole and starts the encoding process (described below) each time the write signal becomes active.

[0030] The character stored in the input register 101 (hereinafter, the input character) is output to the data selector 104 and to the previous-data register 105. Although not specifically shown, the previous-data register 105 stores the input character (the character held in the input register 101) when a write signal supplied from the control unit 102 becomes active.

[0031] The data selector 104 receives the characters held in the previous-data register 105 and the input register 101, and outputs the one selected by a select signal from the control unit 102 to the comparison circuit 106. The comparison circuit 106 compares the character output by the data selector 104 with the character read from the dictionary memory 103 and outputs the result to the control unit 102.

[0032] The dictionary memory 103 in this embodiment has a storage capacity (dictionary size) of 4 Kbytes and stores 4 Kbytes of previously input characters (the encoded input). Its read and write addresses are specified by the pointers connected to the address selector 107.

[0033] The address selector 107 is connected to a dictionary head pointer 108, match position pointers 109 to 112, and an input pointer 113. It selects one of these according to a select signal from the control unit 102.

[0034] These pointers are described with reference to FIG. 2, which shows examples of the addresses in the dictionary memory 103 indicated by each pointer. FIG. 2(a) shows the case where the encoded input is smaller than the dictionary size (4 Kbytes in this embodiment), and FIG. 2(b) the case where it is larger; the hatched portions represent the areas in which characters are stored. P1 to P4 indicate the addresses designated (held) by the match position pointers 109 to 112, respectively.

[0035] As noted above, the dictionary memory 103 stores 4 Kbytes of previously input characters, so each of the pointers 108 to 113 is 12 bits wide (2¹² = 4096).

[0036] The dictionary head pointer 108 holds the address, output by the control unit 102, of the oldest stored character in the character string held in the dictionary memory 103. Accordingly, when the encoded input is smaller than the dictionary size (see FIG. 2(a)), the dictionary head pointer 108 holds the start address of the dictionary memory 103. When the encoded input is larger than the dictionary size, as shown in FIG. 2(b), the address it designates is not necessarily the start address; it moves (slides) through the dictionary memory 103 each time a character is written.

[0037] The input pointer 113 holds the address, output by the control unit 102, at which a new character is to be written into the dictionary memory 103. Accordingly, when the encoded input is smaller than the dictionary size (see FIG. 2(a)), the input pointer 113 holds the address of the start of the area of the dictionary memory 103 to which no character has yet been written. When the encoded input is larger than the dictionary size, as shown in FIG. 2(b), it holds the address of the oldest stored character, that is, the same value as the address held by the dictionary head pointer 108.

[0038] The address held in the input pointer 113 (and in the dictionary head pointer 108) is changed so as to cycle through the dictionary memory 103. The dictionary memory 103 therefore always holds the 4 Kbytes of characters input most recently.
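
The pointer behaviour described above amounts to a circular buffer with 12-bit addresses. The Python sketch below illustrates it; the class name `RingDictionary` and its attributes are hypothetical, and the rule that the head pointer stays at the start address until the memory is full is taken from the description of FIG. 2(a) and FIG. 2(b).

```python
class RingDictionary:
    """Illustrative model of a 4-Kbyte circular dictionary memory."""

    SIZE = 1 << 12          # 2**12 = 4096 addresses, so 12-bit pointers
    MASK = SIZE - 1         # wraps an incremented address back to 0

    def __init__(self):
        self.mem = [None] * self.SIZE
        self.head = 0       # address of the oldest stored character
        self.inp = 0        # address at which the next character is written
        self.count = 0      # number of characters currently stored

    def write(self, ch):
        """Store one input character and advance the pointers."""
        self.mem[self.inp] = ch
        self.inp = (self.inp + 1) & self.MASK
        if self.count < self.SIZE:
            # Memory not yet full: the oldest character stays where it is.
            self.count += 1
        else:
            # Memory full: the oldest character was just overwritten,
            # so the head slides forward together with the input pointer.
            self.head = (self.head + 1) & self.MASK
```

Once 4096 characters have been written, `head` and `inp` hold the same value and move together, which matches the situation shown for FIG. 2(b).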

[0039] The match position pointers 109 to 112 hold addresses from which characters are read from the dictionary memory 103. When the control unit 102 determines that the input character stored in the input register 101 is the first character of a substring (this determination is described later), each of them holds an address (match position) at which a character matching the input character is stored. To simplify the explanation that follows, the input character currently in the input register 101 is assumed to be a first character. How the type of an input character is determined is explained along the way, as the operation of the control unit 102 is described in turn for this first character and for the characters input after it.

[0040] In this embodiment, the search for match positions proceeds by changing the address successively, starting from the address indicated by the dictionary head pointer 108. Once the match position pointer 109 holds a match position, that is, once the comparison result of the comparison circuit 106 indicates a match, the control unit 102 stores the current value of the match position pointer 109 in the match position pointer 110. It then has the address selector 107 select the match position pointer 110 and, incrementing that pointer, compares the characters read from the dictionary memory 103 with the input character one after another until the match position pointer 110 holds a match position. The remaining match position pointers 111 and 112 are then made to hold match positions in the same way.

[0041] The search for characters matching the input character ends either when all of the match position pointers 109 to 112 hold match positions or when every character stored in the dictionary memory 103 has been examined, that is, when the characters from the address in the dictionary head pointer 108, cycling around, up to the address just before the input pointer 113 have been examined. Because of the order in which the match positions are captured, the match positions held by the match position pointers 109 to 112 are as shown in FIGS. 2(a) and 2(b).
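
The match position search can be sketched as a single pass over the circular dictionary that stops after four hits. In the Python sketch below, the function name `find_match_positions` and its parameters are hypothetical; `count` is the number of characters currently stored, so the scan runs from the head address up to the address just before the input pointer.

```python
def find_match_positions(mem, head, count, ch, max_pointers=4):
    """Return up to `max_pointers` addresses at which `ch` is stored.

    mem   -- list modelling the dictionary memory
    head  -- address of the oldest stored character
    count -- number of characters currently stored
    """
    size = len(mem)
    positions = []
    for offset in range(count):
        addr = (head + offset) % size      # walk the memory cyclically
        if mem[addr] == ch:
            positions.append(addr)         # one match position pointer filled
            if len(positions) == max_pointers:
                break                      # all pointers hold a position
    return positions
```

Because the scan starts at the oldest character, the first pointer receives the oldest match and later pointers receive progressively newer ones.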

[0042] If this search finds a character matching the input character in the input register 101, the control unit 102 stores the input character in the previous-data register 105, has the address selector 107 select the input pointer 113, and writes the input character at the address designated by the input pointer 113. After writing the input character into the dictionary memory 103, the control unit 102 increments the match position pointers that hold match positions, the dictionary head pointer 108, and the input pointer 113, and then requests the external device to output a character. This request is made, for example, by sending the external device an interrupt signal to that effect.

[0043] If, on the other hand, the search finds no character matching the input character, the control unit 102 outputs a match flag set to "0" to the code output unit 115 and has the input character output to the code output unit 115 via the data selector 104. It then writes the input character into the dictionary memory 103, increments the dictionary head pointer 108 and the input pointer 113, and requests the external device to output the next character. In this case the encoding of the current input character is complete, so the control unit 102 treats the next character received from the external device as a first character.

[0044] When the next character arrives from the external device, that is, when a new character is stored in the input register 101, the control unit 102 resumes the encoding process. If no character matching the previous input character was found in the dictionary memory 103, the newly input character is a first character, as described above, so the search that loads match positions into the match position pointers 109 to 112 is carried out again.

【００４５】[0045]

On the other hand, when match positions are held in the match position pointers 109 to 112 (in practice, it suffices that at least one of these pointers holds a match position), the characters stored at the addresses indicated by the incremented match position pointers 109 to 112 are compared in turn with the newly input character, starting with the character stored at the address indicated by the match position pointer 109. This is done by sequentially switching the pointer selected by the address selector 107.

【００４６】[0046]

If the comparison shows that none of the characters read from the addresses indicated by the match position pointers 109 to 112 matches the input character, the control unit 102 outputs a match flag of "0" to the code output unit 115 and causes the character in the previous data register 105 to be input to the code output unit 115 via the data selector 104. This completes the encoding of everything before the input character, so the control unit 102 treats the input character as a first character and performs the match position search described above again.

【００４７】[0047]

Conversely, if one of the characters read from the addresses indicated by the match position pointers 109 to 112 matches the input character, the control unit 102 resets (sets to "0") the match length counter 114, which counts the length (number of characters) of the matching sequence stored in the dictionary memory 103. After resetting the match length counter 114, it writes the input character to the address indicated by the input pointer 113, then increments the match position pointer that indicated the matching character, the input pointer 113, and the dictionary head pointer 108, and waits for the character that follows the current input character.

【００４８】[0048]

After the match length counter 114 has been reset, only the match position pointers from which a character matching the input character was read are incremented, and the match length counter 114 continues to count up until the comparison circuit 106 finds that none of the characters read from the dictionary memory 103 matches the input character. While counting continues, the input character is written into the dictionary memory 103 after the comparison circuit 106 has finished comparing the characters.

【００４９】[0049]

When none of the characters read from the dictionary memory 103 matches the input character, or when the next character is input after the match length counter 114 has counted up to the preset maximum match length (described later), the control unit 102 determines that the longest matching sequence for the partial character string currently being encoded has been fixed, and outputs a match flag of "1" to the code output unit 115. It also causes the code output unit 115 to receive, in order, the count value (match length) from the match length counter 114 and, from the subtractor 116, the result of subtracting the count value of the match length counter 114 plus 2 from the output value of the address selector 107. At this time, the address selector 107 is made to select the match position pointer that was incremented to the end. The subtraction result of the subtractor 116 is therefore the match position of the longest matching sequence among the matching sequences that start at the match positions held in the match position pointers 109 to 112.

【００５０】[0050]

Once the longest matching sequence has been fixed in this way, encoding is complete up to the character input just before the input character stored in the input register 101, so the control unit 102 treats this input character as a first character. Accordingly, after the code output unit 115 has received the count value of the match length counter 114 and the subtraction result of the subtractor 116, the control unit 102 performs the search for match positions to be held in the match position pointers 109 to 112, as described above.
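
The character-by-character procedure of paragraphs [0043] to [0050] can be sketched in software roughly as follows. This is only an illustrative model, not the circuit itself: the function name `compress`, the token tuples, and the helper names are hypothetical. It also assumes things the text does not state here: the dictionary never wraps around, candidate addresses are taken from the head of the dictionary in ascending order, the first surviving pointer supplies the match position, and whatever is pending at the end of the input is flushed.

```python
MAX_POINTERS = 4   # match position pointers 109 to 112
MAX_COUNT = 15     # largest 4-bit count value, i.e. 17 matched characters


def compress(text):
    """Return a list of ("lit", ch) and ("match", count, position) tokens."""
    dictionary = []   # models dictionary memory 103 (assumed not to wrap)
    out = []
    pointers = []     # live match position pointers; empty = expect a first character
    pending = ""      # models previous data register 105
    count = -1        # -1: match length counter 114 has not been reset yet

    def flush():
        # Emit the code word for what has been matched so far.
        if count < 0:
            out.append(("lit", pending))             # only one character matched
        else:
            position = pointers[0] - (count + 2)     # subtractor 116
            out.append(("match", count, position))

    def start(ch):
        # Treat ch as a first character and search the dictionary for it.
        nonlocal pointers, pending, count
        found = [i for i, c in enumerate(dictionary) if c == ch][:MAX_POINTERS]
        if found:
            pointers = [p + 1 for p in found]        # point at the following characters
            pending, count = ch, -1
        else:
            pointers = []
            out.append(("lit", ch))                  # unmatched character

    for ch in text:
        if not pointers:
            start(ch)
        else:
            hits = [p for p in pointers if dictionary[p] == ch]
            if hits and count < MAX_COUNT:
                count += 1                           # first hit resets the count to 0
                pointers = [p + 1 for p in hits]     # only matching pointers advance
            else:
                flush()
                start(ch)
        dictionary.append(ch)                        # every input character is written
    if pointers:
        flush()                                      # assumed end-of-input handling
    return out
```

Calling `compress("AB=0;BB=0;C")` on the character sequence used in the FIG. 4 example yields six unmatched-character tokens, one match token with count value 2 and match position 1, and a final unmatched "C".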

【００５１】[0051]

Although not specifically illustrated, the code output unit 115 consists of, for example, a selector and a shift register, and is controlled by the control unit 102. Under that control, the code output unit 115 switches the selector and bit-shifts the shift register so that the shift register receives, in order, the match flag and then, depending on the value of the match flag, either a character (ASCII code) or the match length (count value of the match length counter 114) and the match position (output of the subtractor 116) as a code word. The code words stored in the shift register are cut out in 8-bit units as needed and output externally. What is cut out of the shift register and output is the compressed code output OD.
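
As a rough software analogue of this shift register, the sketch below accumulates code-word bits and cuts them out in 8-bit units. The class name `CodeOutput`, its attributes, and the use of bit strings are hypothetical choices made for readability; the text does not specify how the hardware is organized internally.

```python
class CodeOutput:
    """Toy model of code output unit 115: collect bits, emit whole bytes."""

    def __init__(self):
        self.bits = ""                # bits still waiting in the shift register
        self.emitted = bytearray()    # compressed code output OD so far

    def push(self, code_word_bits: str) -> None:
        # Append one code word, then cut out every complete 8-bit unit.
        self.bits += code_word_bits
        while len(self.bits) >= 8:
            self.emitted.append(int(self.bits[:8], 2))
            self.bits = self.bits[8:]
```

Because code words are 9 or 17 bits long, a few bits usually remain in the register after each push; they leave with the next code word.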

【００５２】[0052]

The compressed code output OD output from the code output unit 115 is described here with reference to FIG. 3. FIG. 3 illustrates the data format of the compressed code output OD.

【００５３】[0053]

As described above, in this embodiment the match flag is set to "1" or "0" depending on whether a partial character string matching the input partial character string is stored in the dictionary memory 103, and, as shown in FIG. 3, the data format of the code word differs according to the value of the 1-bit match flag.

【００５４】[0054]

When the match flag is set to "1", the code word consists of the match flag, a 4-bit match length, and a 12-bit match position, for a word length of 17 bits. When the match flag is set to "0", the code word consists of the match flag and the input character (unmatched character) in 8 bits, for a word length of 9 bits.
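
The two code-word layouts can be illustrated with a small packing sketch. The function names are hypothetical, and writing each field most significant bit first is an assumption; the text gives only the field order and widths.

```python
def pack_literal(ch: str) -> str:
    """Match flag "0" followed by the 8-bit character: 9 bits."""
    code = ord(ch)
    if not 0 <= code < 256:
        raise ValueError("character must fit in 8 bits")
    return "0" + format(code, "08b")


def pack_match(length_code: int, position: int) -> str:
    """Match flag "1", 4-bit match length, 12-bit match position: 17 bits."""
    if not 0 <= length_code < 16:
        raise ValueError("match length field is 4 bits")
    if not 0 <= position < 4096:
        raise ValueError("match position field is 12 bits")
    return "1" + format(length_code, "04b") + format(position, "012b")
```

For instance, `pack_literal("A")` gives a 9-bit string and `pack_match(2, 1)` a 17-bit string, matching the word lengths stated above.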

【００５５】[0055]

Because this embodiment uses the data format described above, the first character of a partial character string to be encoded is treated as an unmatched character unless the dictionary memory 103 holds a partial character string that matches it in two or more characters, so that the amount of data after encoding does not exceed the original amount. For this reason, when the match flag is set to "1", the 4-bit match length represents 2 to 17 characters (17 being the maximum match length).
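
The two-character threshold follows from the word lengths: two characters cost 18 bits as two 9-bit unmatched code words but 17 bits as one match code word, whereas a single character would cost 17 bits as a match against 9 bits as a literal. A minimal sketch of the resulting length mapping, with hypothetical function names, is:

```python
MIN_MATCH = 2    # shortest sequence encoded as a match
MAX_MATCH = 17   # longest sequence a 4-bit field can express


def encode_length(n_chars: int) -> int:
    """Map a match of 2..17 characters onto the 4-bit field value 0..15."""
    if not MIN_MATCH <= n_chars <= MAX_MATCH:
        raise ValueError("match length must be between 2 and 17 characters")
    return n_chars - MIN_MATCH


def decode_length(field: int) -> int:
    """Recover the number of matched characters from the field value."""
    return field + MIN_MATCH
```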

【００５６】[0056]

The above is an outline of the compression operation of the data compression device 100 according to this embodiment. The compression operation is now described using a specific example. FIG. 4 illustrates a specific example of the compression operation of the data compression device 100. In the figure, (a) shows the input character string 401 input as the input data string ID, (b) shows the addresses in the dictionary memory 103 indicated by the pointers 108 to 113 when the seventh character "B" of the input character string 401 is input, and (c) shows the compressed code output OD produced when the input character string 401 is input.

【００５７】[0057]

The input character string 401 shown in FIG. 4(a) is stored in the input register 101 one character at a time, from left to right, as the input data string ID. The figure indicates that the seventh character "B" of the input character string 401 is currently stored in the input register 101 as the current input value.

【００５８】[0058]

As shown in FIG. 4(b), input characters are written into the dictionary memory 103 as they arrive, so when the seventh character "B" is stored in the input register 101, the dictionary memory 103 holds the six previously input characters in order, starting at the address indicated by the dictionary head pointer 108.

【００５９】[0059]

The first to fifth characters "AB=0;" are each of a kind that is being input for the first time, so the control unit 102 cannot find them in the dictionary memory 103. Each of these characters is therefore treated as an unmatched character and is output from the input register 101 to the code output unit 115 via the data selector 104, whereupon the code output unit 115 outputs it as a code word (compressed code output OD) consisting of a match flag set to "0" and the character (see FIG. 4(c)).

【００６０】[0060]

The sixth character "B", on the other hand, matches the second input character, so the search of the dictionary memory 103 results in the address (match position) at which the second character is written, indicated by P1 in FIG. 4(b), being held in the match position pointer 109. When the seventh character "B" is input, the comparison circuit 106 compares it with the third character, which is read from the address indicated by the incremented match position pointer 109. The two do not match, so the sixth character is treated as an unmatched character. At this time the sixth character is stored in the previous data register 105; it is output to the code output unit 115 via the data selector 104 and then output with a match flag set to "0" added.

【００６１】[0061]

As described above, the seventh character does not match the third character, so the control unit 102 treats the seventh character as a first character and searches the dictionary memory 103 for characters matching it. The seventh character matches the second and sixth characters, so as a result of this search the address (match position) at which the second character is written, indicated by P1 in FIG. 4(b), is held in the match position pointer 109, and the address (match position) at which the sixth character is written, indicated by P2 in FIG. 4(b), is held in the match position pointer 110. When the search is complete, the control unit 102 writes the seventh character to the address in the dictionary memory 103 indicated by the input pointer 113, increments the match position pointers 109 and 110, the dictionary head pointer 108, and the input pointer 113, and then waits for the next character (the eighth character "=").

【００６２】[0062]

When the eighth character is input, it is compared in turn with the characters read from the addresses indicated by the incremented match position pointers 109 and 110, that is, the third and seventh characters. The eighth character does not match the seventh character but does match the third, so the control unit 102 resets the match length counter 114 at this point and writes the eighth character to the address indicated by the input pointer 113. It then increments the match position pointer 109, the dictionary head pointer 108, and the input pointer 113, and waits for the next character (the ninth character "0").

【００６３】[0063]

As shown in FIG. 4(a), the ninth and tenth characters "0;" match the fourth and fifth characters. Thereafter, until the eleventh character "C" is input, the control unit 102 increments the match position pointer 109 while successively comparing the character read from the address it indicates with the input character. As a result, the match length counter 114 counts up until the eleventh character is input, and its count value at the time the eleventh character is input is "2". Assuming that the start address of the dictionary memory 103 is "0", the value of the match position pointer 109 at this time is "5".

【００６４】[0064]

When the eleventh character is input, it is compared with the character at the address indicated by the match position pointer 109, that is, the sixth character. As shown in FIG. 4(a), the two do not match, so the comparison circuit 106 outputs a mismatch result to the control unit 102. On receiving this result, the control unit 102 determines that the longest matching sequence has been fixed, and then determines whether its match length is two characters or more, that is, whether to generate a code word with the match flag set to "1". In this case the longest matching sequence is a four-character partial character string starting at the match position indicated by P1 in FIG. 4(b), so the control unit 102 outputs a match flag set to "1" to the code output unit 115. It also causes the code output unit 115 to receive, in order, the count value of the match length counter 114 and the subtraction result of the subtractor 116.

【００６５】[0065]

At this time, the control unit 102 has the address selector 107 select the match position pointer 109. The subtraction result output from the subtractor 116 to the code output unit 115 is therefore the value indicating P1 in FIG. 4(b).
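
The subtraction can be checked against the numbers given for this example: the match position pointer 109 holds 5 and the match length counter 114 holds 2, so the subtractor yields 5 - (2 + 2) = 1, the address of the second character (P1) when the start address is 0. As a one-line sketch, with a hypothetical function name:

```python
def match_position(pointer_value: int, count_value: int) -> int:
    """Subtractor 116: selected pointer value minus (count value + 2)."""
    return pointer_value - (count_value + 2)
```

The "+ 2" reflects that the counter starts at 0 only when the second matching character arrives, so the pointer has advanced two positions more than the count value.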

【００６６】[0066]

FIG. 4(c) shows the compressed code output OD when the input character string 401 has been encoded up to its tenth character. When the input character string 401 is encoded, the compression operation (encoding process) proceeds as described above, so, as shown in FIG. 4(c), six code words with the match flag set to "0" follow one another, and the seventh code word has the match flag set to "1".

【００６７】[0067]

This seventh code word represents the four-character partial character string consisting of the seventh to tenth characters. As described above, the match length is a value counted up with two characters as "0", so the match length here is "2". The match position output in the code word is the actual match position, so the match position here is "1" (this is the value when the start address of the dictionary memory 103 is "0"). The compressed code output OD shown in FIG. 4(c) is output from the code output unit 115 as needed by cutting out the code words stored in its shift register in 8-bit units.
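
One way to see that these seven code words carry the first ten characters is to expand them again. Decoding is not described in this part of the text, so the following is purely an illustrative sketch under assumptions: the function name `decompress` and the token tuples are hypothetical, match positions are absolute addresses counted from 0, and the dictionary does not wrap.

```python
def decompress(tokens):
    """Expand ("lit", ch) and ("match", count, position) tokens into text."""
    out = []
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, count, position = token
            # count value 0 stands for two characters, so copy count + 2;
            # copying one character at a time also handles overlapping matches
            for i in range(count + 2):
                out.append(out[position + i])
    return "".join(out)


# Code words of FIG. 4(c): six unmatched characters, then one match.
fig4_tokens = [("lit", c) for c in "AB=0;B"] + [("match", 2, 1)]
```

Here `decompress(fig4_tokens)` reproduces the ten characters "AB=0;BB=0;".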

【００６８】[0068]

As described above, this embodiment fixes the match position in the dictionary memory 103 at which the first character of the partial character string to be encoded is stored and, each time a character following that first character is input, compares the newly input character with the character stored at the address advanced from the match position in step with the input. Because encoding proceeds in this way, no memory is needed to hold characters that have not yet been encoded, and the share of the device taken up by memory can be reduced. This keeps down the scale and cost of the device and makes LSI implementation easier, so the present invention can be widely applied to, for example, data compression devices that are connected directly to a computer bus.

【００６９】[0069]

In addition, this embodiment uses the code output unit 115 to output the compressed code output OD externally as it is produced. Compared with the conventional data compression device 500 shown in FIG. 5, the number of memories is therefore reduced, that is, the input memory 501, the code memory 505, and the like are unnecessary, which has the further effect of simplifying the control of the individual parts of the device.

【００７０】[0070]

As for the processing time required for encoding, a comparison of the two depends on differences in how they search for the longest matching sequence, such as how extraction of a matching sequence is cut off and how many match position pointers are used. However, because the time required for data input is a small fraction of the total, there is basically no large difference in processing time.

【００７１】[0071]

In this embodiment the number of match position pointers holding match positions is four, but the number is not limited to this. The fewer the match position pointers, the faster data compression can be performed, but the more likely it is that the compression ratio (= amount of data after compression / amount of data before compression) becomes large. Although it depends on the processing speed required of the compression, the type of file to be encoded, and so on, the number is preferably four or more in order to keep the compression ratio from becoming large.
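
With the definition used here, a smaller compression ratio means better compression. As a worked illustration using the FIG. 4 example (six 9-bit code words and one 17-bit code word for ten 8-bit characters), the ratio is 71 / 80, or about 0.89. The helper below is a hypothetical sketch of that arithmetic.

```python
LITERAL_BITS = 9    # match flag + 8-bit character
MATCH_BITS = 17     # match flag + 4-bit match length + 12-bit match position


def compression_ratio(n_literals: int, n_matches: int, n_input_chars: int) -> float:
    """Amount of data after compression divided by the amount before."""
    after = n_literals * LITERAL_BITS + n_matches * MATCH_BITS
    before = n_input_chars * 8
    return after / before
```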

【００７２】[0072]

This embodiment also provides the pointers 108 to 113 that hold addresses in the dictionary memory 103, but the control unit 102 may, for example, take over the role of these pointers, and it may likewise take over the roles of the match length counter 114 and the subtractor 116. The present invention can thus be applied flexibly, including with respect to the data format of the compressed code output OD.

【００７３】[0073]

[Effects of the Invention] As described above, according to the present invention, each time a character is input it is determined whether the character is the first character of a partial character string, and, according to the result, the dictionary (storage means) is searched for the input character so that repeatedly input partial character strings are detected. Character strings can therefore be encoded as the characters are input, and input characters can be registered in the dictionary as they arrive. This removes the need to store the character string to be encoded, so the device can be made smaller and its cost reduced.

【００７４】[0074]

Furthermore, by extracting a plurality of match positions in the dictionary at which a character matching the character determined to be a first character is stored, and performing the above search from them, the compression ratio can be kept from becoming large.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the data compression device according to this embodiment.

FIG. 2 is a diagram showing an example of the addresses in the dictionary memory indicated by each pointer.

FIG. 3 is a diagram illustrating the data format of the compressed code output.

FIG. 4 is a diagram specifically illustrating an example of the compression operation.

FIG. 5 is a block diagram showing the configuration of a conventional data compression device.

[Explanation of symbols]

100 Data compression device
101 Input register
102 Control unit
103 Dictionary memory
106 Comparison circuit
107 Address selector
108 Dictionary head pointer
109 to 112 Match position pointers
113 Input pointer
114 Match length counter
115 Output control unit
116 Subtractor

Claims

[Claims]

1. A data compression device that has storage means for storing previously input character strings and that compresses the amount of data representing a character string by referring to the character strings stored in the storage means and detecting repeatedly input partial character strings, the device comprising: character determination means for determining whether an input character is the first character of a partial character string; match position storage means for storing, as a match position, the address in the storage means at which a character matching the input character is stored when the character determination means determines that the input character is a first character; reading means for reading a character stored in the storage means, each time a character following the character determined to be a first character by the character determination means is input, while sequentially changing the address from the match position in accordance with the input of the character; comparison means for comparing the input character with the character read from the storage means by the reading means; counting means for counting the length of the partial character string stored in the storage means by counting the number of times the comparison means finds a match after the character determination means has determined that the input character is a first character; and code generation means for generating, when the comparison result of the comparison means is a mismatch after the character determination means has determined that the input character is a first character, a code word representing the partial character string stored in the storage means from the match position and the number of characters counted by the counting means.

2. The data compression apparatus according to claim 1, further comprising a writing unit that writes the input character into the storage unit whenever the character is input.

3. The data compression device according to claim 1 or 2, wherein a plurality of the match position storage means are provided; the reading means reads from the storage means, each time a character is input, the characters stored at the addresses sequentially changed from each of the match positions stored in the plurality of match position storage means; the counting means counts once when, among the characters read from the storage means by the reading means, there is a character that the comparison means finds to match the input character; and the code generation means generates the code word representing the partial character string stored in the storage means when, among the characters read from the storage means by the reading means, there is no character that the comparison means finds to match the input character.