JPH07253968A - Character string data processor

Info

Publication number: JPH07253968A
Application number: JP6257800A
Authority: JP
Inventors: Eisaku Nakatani; 栄作中谷
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 1994-10-24
Filing date: 1994-10-24
Publication date: 1995-10-03

Abstract

PURPOSE: To change a specified character string to a predetermined character size, add decoration to it, place it at a predetermined position, and further place other character string data at prescribed positions. CONSTITUTION: Input character string data is stored in the document storage area 28 of a RAM 13. When a CPU 11 determines that specific character string data is present in the document storage area 28, the identified character string data is changed to the predetermined character size, decoration information is added, the data is placed at the predetermined position, and the other character string data is placed at the prescribed positions.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a character string data processing device used in word processors and the like, and more particularly to a character string data processing device having an automatic editing function that changes a specific character string to a predetermined character size, adds decoration to it, places it at a predetermined position, and further places other character string data at prescribed positions.

[0002]

[Prior Art] In a character string data processing device such as a word processor, the format of a document is generally defined by setting the number of columns per line. A document is created according to this format, and the created document is printed by a printer. How characters are rendered on screen or in print depends largely on the hardware (storage capacity, printer, built-in fonts, screen resolution, and so on), and this is where individual word processors differ. In general, however, most devices offer full-width, half-width, quarter-size, double-width, double-height, and quadruple-size characters, allow the pitch to be specified per character or per line, and provide underline, character decoration, and ruled-line functions. In addition, when composing text, many document processing devices allow a particular character or line to be emphasized by underlining it or enlarging its character size. These operations are called character decoration.

[0003] Character string data processing devices also allow a created document to be printed in a desired format. The format is set by entering or selecting desired values for setting items such as the column pitch, the line pitch, the number of columns per page, the number of lines per page, and the character point size. A printing device such as a printer then outputs the document according to the format thus set. Japanese word processors are also known that have a print image display function, which shows on the display screen, before output, what the printed result will look like in the set format.

[0004]

[Problems to Be Solved by the Invention] In such conventional character string data processing devices, however, the operator can freely set the document format when creating a document. When many kinds of documents have been created, several document formats (for example, the order of the "heading", "date", and so on, and format information such as underlines and character decoration) become mixed according to each operator's habits, which makes the documents hard to read. That is, although various format settings can be made to suit individual tastes and purposes, whenever one person tries to create a document that matches the format of a document created by another, the format must be changed each time, and repeating such format changes consumes a great deal of labor and time. In particular, document formats must be kept consistent in product manuals and in papers, dictionaries, and similar works written jointly by several people. Conventional character string data processing devices, however, could do little more than convert a predetermined single-column document format (for example, the format of the journal of the Information Processing Society of Japan) into a two-column document format, and so could not convert a document into an original document format. Furthermore, with a conventional device the constituent elements had to be entered in the same order as in the converted format, so the document before conversion was also constrained.

[0005] An object of the present invention is to make it possible to change a specific character string to a predetermined character size, add decoration to it, place it at a predetermined position, and further place other character string data at prescribed positions.

[0006]

[Means for Solving the Problems] The means of the present invention are as follows. Character string data storage means stores input character string data. Determination means determines whether specific character string data is present in the character string data storage means. Control means changes the specific character string data thus identified to a predetermined character size, adds decoration information to it, places it at a predetermined position, and further places other character string data at prescribed positions.

[0007]

[Operation] The means of the present invention operate as follows. Input character string data is stored in the character string data storage means. The determination means determines whether specific character string data is present in the character string data storage means. The control means changes the specific character string data thus identified to a predetermined character size, adds decoration information to it, places it at a predetermined position, and further places other character string data at prescribed positions. Accordingly, a specific character string can be changed to a predetermined character size, decorated, and placed at a predetermined position, and other character string data can be placed at prescribed positions.

[0008]

[Embodiment] An embodiment will be described below with reference to FIGS. 1 to 17. FIGS. 1 to 17 show one embodiment of the document processing device 10, applied here to a word processor.

[0009] First, the configuration will be described. FIG. 1 is a block diagram of the document processing device 10. In this figure, reference numeral 11 denotes a CPU that controls the entire device and also controls the document minimum division processing, document layout information extraction processing, document arrangement information extraction processing, learning processing, document structure analysis processing, document arrangement conversion processing, and document layout conversion processing described later. The CPU 11 controls the various operations of the document processing device according to a microprogram stored in the ROM 12 described below. The following are each connected to the CPU 11: a ROM 12 that stores predetermined programs and fixed data such as character patterns; a RAM 13 that temporarily stores data used in computation, computation results, and the like; a keyboard control unit 15 that controls a keyboard 14; an OCR control unit 17 that controls an OCR (optical character reader) 16; a CRT control unit 19 that controls a CRT 18, which displays input image data, layout information, and arrangement information on screen; an external storage device 20, such as a floppy disk, that stores document files; an external storage control unit 21 that controls writing data to and reading data from the external storage device 20; a printer control unit 23 that controls a printer 22, which prints out documents; a document analysis device 24 that analyzes the structure of the document data stored in the document storage area 28 described later; a document layout information extraction device 25 that extracts layout information (placement information) of document elements from the analysis result of the document analysis device 24; a document arrangement information extraction device 26 that extracts arrangement information of document elements from the analysis result of the document analysis device 24; and a document format conversion device 27 that converts document data into a unified document format based on learning data obtained by learning the extracted information.

[0010] The ROM 12 is a fixed memory that stores an IPL program that serves as the OS (Operating System) at system start-up; an IOCS (Input Output Control System) program for controlling input and output of the keyboard 14, the OCR 16, the CRT 19, and the external storage device 20; a keyword dictionary used to search for keywords; character font data; and the like.

[0011] The storage area of the RAM 13 is divided according to purpose. Specifically, it comprises a document storage area 28 that stores input document data; a document layout information learning area 29 in which the layout information (placement information) extracted by the document layout information extraction device 25 is learned and stored as learning data; a document arrangement information learning area 30 in which the arrangement information extracted by the document arrangement information extraction device 26 is learned and stored as learning data; and a work area 31 that temporarily holds data during computation. Of these storage areas, the document layout information learning area 29 and the document arrangement information learning area 30, which hold learning data, use non-volatile memory such as EEPROM, or RAM with backup power, so that their contents are retained after the power is turned off.

[0012] The CPU 11 reads document data stored in the external storage device 20 and stores it in the document storage area 28 of the RAM 13. The CPU 11 also controls the devices described above and the RAM 13 to carry out document format learning and document format conversion. Specifically, the document analysis device 24, under the control of the CPU 11, takes the document data stored in the document storage area 28 one line at a time and analyzes its structure (the analysis method is described in detail later), extracts document layout information and document arrangement information from the analysis result, and stores the extracted information as learning data in the document layout information learning area 29 and the document arrangement information learning area 30. When document data that has not been unified is to be unified, the document data to be unified is read from the document storage area 28, its format is converted by the document format conversion device 27 using the learning data read from the document layout information learning area 29 and the document arrangement information learning area 30, and the result is output to the CRT 18, the printer 22, or the like.

[0013] In the work area 31, the processes described later with reference to FIGS. 6 to 17 (namely, document minimum division processing, document layout information extraction processing, document arrangement information extraction processing, learning processing, document structure analysis processing, and document arrangement conversion processing) create the document minimum division block table (Table 1), the document component block table (Table 2), the document component layout information table (Table 3), the document arrangement information table (Table 4), and the document structure analysis table for input example document 2 (Table 5).

[0014] The keyboard 14 is an operation panel provided with keys for entering alphanumeric characters, hiragana, and so on, together with function keys such as cursor movement keys, an execute key, and a cancel key. When any key on the keyboard 14 is operated, the keyboard control unit 15 converts it into the predetermined key code corresponding to that key and outputs the code to the CPU 11. The OCR 16 reads characters printed or handwritten on forms and the like with an optical scanner, performs identification and decision processing, and then encodes the characters. The printer 22 prints out the document stored in the RAM 13 according to format information set in advance by the author or format information produced by document format conversion using the learning function.

[0015] Next, the operation of this embodiment will be described. The document processing device 10 according to this embodiment has a document format learning function, which analyzes the structure of document data in order to specify and learn a document format, and a document format conversion function, which uses the learned document format to convert the document data to be converted into a document in a different format.

[0016] The document format learning function determines the structure of an already laid-out document (how characters connect and how constituent elements are classified) from the presence or absence of title symbols and periods, from keywords, and so on, obtains format information and character arrangement information from this determination, and learns that information. The document format conversion function determines the structure of a document by the same means as document format learning, then reorders the constituent elements according to the learned information and further arranges the characters.

[0017] The document format learning function and the document format conversion function are described concretely below with reference to FIGS. 2 to 17.

[0018] FIG. 2 shows an input example document (input example document 1) on which document format learning is performed. FIG. 3 shows an input example document (input example document 2) whose format is converted using the learned result. FIG. 4 shows an intermediate example document (intermediate example document 1), that is, the document partway through conversion into output example document 1. FIG. 5 shows the output example document (output example document 1) after document format conversion. In this embodiment, document format learning is performed on input example document 1, and the learned result is used to convert input example document 2 into output example document 1.

[0019] For convenience of explanation, this embodiment uses example text that contains only full-width characters and in which no line contains more than one document component block (described later with reference to FIGS. 6 and 7).

[0020] Document Format Learning

Document format learning consists of document minimum division processing (FIGS. 6 and 7), document layout information extraction processing (FIGS. 8 to 11), document arrangement information extraction processing (FIGS. 12 to 14), and learning processing. It is assumed that input example document 1 shown in FIG. 2 has already been stored in the document storage area 28 of the RAM 13.

[0021] Document Minimum Division Processing

FIG. 6 is a flowchart of the document minimum division processing, which divides document data into minimum blocks in order to analyze the structure of the document. Executing this processing creates the document minimum division block table (Table 1) shown in FIG. 7. In FIG. 6, the symbols Sn (n = 1, 2, ...) denote the steps of the flow.

[0022] First, in step S1, document data is taken one line at a time from the document storage area 28 of the RAM 13, and in step S2 it is determined whether a line could be taken. If no line could be taken, the end of the document is assumed and this flow ends. If a line was taken, the document line fetch pointer is updated in step S3. Next, valid characters are searched for in step S4. If step S5 determines that a valid character exists, processing proceeds to step S6; if not, processing returns to step S1 to fetch the next line of document data. During the valid character search, data other than valid character data (for example, spaces) is skipped. Because of this search, if a line contains, for example, two runs of valid characters separated by spaces, one block is created and then a second block can be created on the same line. The number of blocks created can therefore exceed the number of lines in the document data.

[0023] In step S6, the document line position and the start column position are set in the document minimum division block table (Table 1) shown in FIG. 7, and in step S7 characters are scanned until a space, an unentered position, or a line break appears. Next, in step S8, the end column position is set in Table 1, and in step S9 the "block attribute" (described later) is set in Table 1. Processing then returns to step S4 and the above steps are repeated.
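The flow of steps S1 to S9 can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation. The names `MinBlock` and `split_min_blocks` are hypothetical, every character is assumed to occupy one column (the embodiment uses full-width characters only), a "valid character" is assumed to be anything other than an ASCII or full-width space, and the block attribute is reduced to a plain string rather than the bit field described later.

```python
from typing import List, NamedTuple

# Assumption: only ASCII and full-width (ideographic) spaces count as non-valid.
SPACES = {" ", "\u3000"}


class MinBlock(NamedTuple):
    line: int        # document line position (1-based)
    start_col: int   # start column position (1-based)
    end_col: int     # end column position (1-based, inclusive)
    delimiter: str   # "space" or "line_end": what terminated the block


def split_min_blocks(lines: List[str]) -> List[MinBlock]:
    """Divide each line into runs of valid characters (minimum blocks)."""
    blocks: List[MinBlock] = []
    for line_no, text in enumerate(lines, start=1):      # S1-S3: fetch next line
        col = 0
        while col < len(text):
            if text[col] in SPACES:                      # S4-S5: skip non-valid data
                col += 1
                continue
            start = col                                  # S6: remember start column
            while col < len(text) and text[col] not in SPACES:
                col += 1                                 # S7: scan to space or line end
            delimiter = "space" if col < len(text) else "line_end"
            # S8-S9: record end column and (simplified) attribute
            blocks.append(MinBlock(line_no, start + 1, col, delimiter))
    return blocks


if __name__ == "__main__":
    sample = ["\u3000" * 24 + "平成３年１０月２３日", "", "見出し\u3000本文"]
    for block in split_min_blocks(sample):
        print(block)
```

A line with no valid characters yields no block, and a line with two runs separated by a space yields two blocks, so the block count need not equal the line count.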

[0024] As a result of the document minimum division processing, a document minimum division block table (Table 1) such as that shown in FIG. 7 is created in the work area 31 of the RAM 13. This table stores the document data taken line by line from the document storage area 28, divided into minimum blocks 1-1, 1-2, ..., 1-5, 1-6 delimited by unentered positions, line breaks, and spaces. As shown in FIG. 7, each block entry of the table consists of a "document line position", "start column position", and "end column position" based on the document's delimiter information, and a "block attribute" indicating what delimited the minimum division block and whether it contains a period or a title symbol. The "block attribute", "document line position", "start column position", and "end column position" are set for each minimum division block.

[0025] The "block attribute" is set as bit information in which 8 bits are divided into two groups of 4 bits, and the meaning of each bit is as follows. When the bit is on, bit "0" means the block contains a period; bit "1" means it contains a title symbol; bit "2" means it is delimited by a space; bit "3" means it is delimited by an unentered position or a line break; bit "4" means characters continue to the end of the line; and bits "5" to "7" are unused. For example, the block attribute "08H" of minimum division block 1-1 represents bit 3 (8 = 2^3), "delimited by an unentered position or a line break". The block attribute "10H" of minimum division block 1-5 represents bit 4, "characters continue to the end of the line". The block attribute "09H" of minimum division block 1-6 is "08H" with bit "0" added, and therefore represents both bit 3, "delimited by an unentered position or a line break", and bit 0, "contains a period".
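The bit assignments above can be written down as a small flag type. The following Python sketch is illustrative only: the names `BlockAttr` and `describe` are hypothetical and do not appear in the patent, and only the bit positions and the three example values (08H, 10H, 09H) are taken from the text.

```python
from enum import IntFlag
from typing import List


class BlockAttr(IntFlag):
    """Block attribute bits as described in the text (bits 5-7 are unused)."""
    HAS_PERIOD = 0x01         # bit 0: block contains a period
    HAS_TITLE_SYMBOL = 0x02   # bit 1: block contains a title symbol
    SPLIT_BY_SPACE = 0x04     # bit 2: delimited by a space
    SPLIT_BY_NEWLINE = 0x08   # bit 3: delimited by an unentered position or line break
    RUNS_TO_LINE_END = 0x10   # bit 4: characters continue to the end of the line


def describe(value: int) -> List[str]:
    """Return the names of all attribute bits that are set in value."""
    return [flag.name for flag in BlockAttr if value & flag]


if __name__ == "__main__":
    for attr in (0x08, 0x10, 0x09):
        print(f"{attr:02X}H ->", describe(attr))
```

Combining flags with bitwise OR reproduces the third example: `BlockAttr.SPLIT_BY_NEWLINE | BlockAttr.HAS_PERIOD` equals 09H.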

[0026] Thus, in the document minimum division processing, the document analysis device 24 takes document data one line at a time from the document storage area 28, divides each line into minimum division blocks delimited by unentered positions, line breaks, and spaces, and creates the document minimum division block table (Table 1) in the work area 31. At this time it also sets the block attribute, which indicates what delimited each minimum division block and whether the block contains a period or a title symbol.

[0027] The document minimum division processing is now described concretely, taking input example document 1 (FIG. 2) as an example. First, the first line is taken from the document data of input example document 1 shown in FIG. 2, and valid characters are searched for. The valid characters here are 「平成３年１０月２３日」 (October 23, Heisei 3, i.e. 1991), so the "document line position" is line "01", the "start column position" is column "25", and the "end column position" is column "34", and this information is set in the document minimum division block table (Table 1) as minimum division block 1-1. Because these valid characters are delimited by a line break and so form a minimum division block, the "block attribute" is "08H", and its bit information is "3: delimited by an unentered position or a line break". In other words, the first line is taken, the spaces up to column 24 are skipped, and the date string starting at column 25 becomes minimum division block 1-1 of Table 1 with block attribute "08H". Next, the second line is taken, but it contains no valid characters, only the line itself, so no minimum division block is created. Then the third line is taken. A search from the start of this line finds the valid characters 「出張報告書」 (business trip report), which are delimited by a line break, so the "block attribute" is "08H", the "document line position" is line "03", the "start column position" is column "13", and the "end column position" is column "22". This information is set as minimum division block 1-2 of Table 1. Input example document 1 is analyzed in this way through to its last line, and all the minimum division blocks are created.

[0028] Document Layout Information Extraction Processing

FIG. 8 is a flowchart of the document layout information extraction processing, which extracts layout information of document elements from the result of analyzing the document data structure. This flow determines whether the document minimum division blocks can be concatenated and assigns attributes to the blocks. Executing this processing creates the document component block table (Table 2) shown in FIG. 10 and the document layout information table (Table 3) shown in FIG. 11.

[0029] First, in step S11, a document minimum division block created by the document minimum division processing (FIG. 6) is fetched, and in step S12 it is determined whether a block could be fetched. If a block was fetched, the fetch pointer is updated in step S13, and in step S14 the keyword dictionary shown in FIG. 9 is consulted to check whether the character string of the fetched block contains a keyword.

【００３０】ここで、キーワード辞書とは、文書構成要
素特有の用語を辞書化したものであり、ＲＯＭ１２に記
憶されている。キーワード辞書は、文書構成要素分類に
よって分類されており、最小分割ブロックから文書構成
要素ブロックを作成する段階で、そのブロックの構成要
素分類を決定する要素の一つとなる。例えば、その一例
として図９に示すようなキーワード辞書がある。図９に
示すキーワード辞書は、日付に関するキーワード「平
成」、「昭和」、「明治」、…、「年」、「月」、
「日」等と、宛先・差出人に関するキーワード「株式会
社」、「有限会社」、…、「営業部」、「企画部」等を
備えている。このキーワード辞書を参照することによっ
て文書最小分割ブロックの文字列の中にキーワード辞書
に登録されたキーワードがあるか否かをチェックする。
例えば、文書最小分割ブロックテーブル（テーブル１）
の最小分割ブロック１−１（図２の入力例文書１の１行
目を分割したブロック）の文字列には、「平成」、
「月」、「日」の日付に関するキーワードがあり、ま
た、図２の入力例文書１の５行目を分割したブロックの
文字列には、「営業部」という宛名・差出人に関するキ
ーワードがある。なお、キーワード辞書を参照して該当
する文字列が見つかっただけでは、これによって直ちに
所定の文書構成要素ブロックであるとすることはでき
ず、次の文字列・ブロックとの連結状態をみることによ
って初めて文書構成要素ブロックであると判断すること
ができる。例えば、上述した最小分割ブロック１−１の
文字列には「平成」などがあるから、上記ステップＳ１
４のキーワードチェックの段階で日付情報のキーワード
として候補となり、また、この文字列は改行で終わって
いるから結果的に日付の文書構成要素ブロックとされ
る。ところが、キーワード辞書に登録されている同じ
「平成」のキーワードであっても、例えば図２の入力例
文書１の８行目の文字列「平成」では改行等がなく、そ
の前後の文字列から通常の文書の文字列がつながってい
ると判別されるから日付の文書構成ブロックと判断され
ない。図８に示す文書レイアウト情報抽出処理フローに
戻って、ステップＳ１４でキーワードチェックが済むと
ステップＳ１５で文書を取出した文書最小分割ブロック
が次の文書最小分割ブロックと連結するかを判断し、連
結するときはステップＳ１１に戻って上記処理を繰り返
すことによって文書最小分割ブロック同士を連結する。
また、取出した文書最小分割ブロックが次の文書最小分
割ブロックと連結しないときはステップＳ１７に進む。
また、上記ステップＳ１２で文書最小分割ブロックがな
いときはそのままステップＳ１７に進む。このようにし
て、キーワードがチェックされた後、文書最小分割ブロ
ックの連結が決定されると文書構成要素ブロックが作成
できることになり、図１０に示す文書構成要素ブロック
テーブル（テーブル２）が作成される。また、この文書
構成要素ブロックの位置情報は以下に述べるステップＳ
１７及びＳ１８で図１１に示す文書レイアウト情報テー
ブル（テーブル３）にセットされることになる。すなわ
ち、ステップＳ１７で上記キーワードチェック及びタイ
トル記号等を参照して「構成要素分類（図１０参照）」
を決定しこの「構成要素分類」を図１０に示す文書構成
要素ブロックテーブル（テーブル２）にセットするとと
もに、連結情報を基に決定された「先頭最小分割ブロッ
ク番号」及び「最終最小分割ブロック番号」を文書構成
要素ブロックテーブル（テーブル２）にセットし、さら
に、上記「構成要素分類」を図１１に示す文書レイアウ
ト情報テーブル（テーブル３）にセットする。次いで、
ステップＳ１８で上記文書構成要素ブロックの位置情報
を文書レイアウト情報テーブル（テーブル３）にセット
し、ステップＳ１９でまだ取出すべき文書最小分割ブロ
ックがあるか否かをチェックし、ステップＳ２０で取出
すべき次の文書最小分割ブロックがあると判別されたと
きはステップＳ１１に戻って次の文書最小分割ブロック
について同様の配置情報抽出処理を繰り返す。また、全
ブロックが終了していると判別されたときには本フロー
の処理を終える。Here, the keyword dictionary is a dictionary of terms peculiar to document components and is stored in the ROM 12. The keyword dictionary is organized by document component classification and is one of the factors that determine the component classification of a block when document component blocks are created from minimum division blocks. FIG. 9 shows an example. The keyword dictionary of FIG. 9 contains date keywords such as "Heisei", "Showa", "Meiji", ..., "year", "month", and "day", and addressee/sender keywords such as "corporation", "limited company", ..., "sales department", and "planning department". By referring to this dictionary, it is checked whether the character string of a document minimum division block contains a registered keyword. For example, the character string of minimum division block 1-1 of the document minimum division block table (Table 1), which is the block obtained by dividing the first line of input example document 1 in FIG. 2, contains the date keywords "Heisei", "month", and "day", and the character string of the block obtained by dividing the fifth line of input example document 1 contains the addressee/sender keyword "sales department". Note that finding a matching character string in the keyword dictionary is not by itself sufficient to conclude that the block is a particular document component block; that judgment can be made only after examining how the block connects to the next character string or block. For example, because the character string of minimum division block 1-1 contains "Heisei" and other date keywords, it becomes a candidate for date information at the keyword check of step S14, and because this character string ends with a line feed, it is ultimately judged to be a date document component block. In contrast, the same registered keyword "Heisei" in the character string on the eighth line of input example document 1 in FIG. 2 is not followed by a line feed, and the surrounding character strings show that it is part of ordinary running text, so it is not judged to be a date document component block. Returning to the document layout information extraction flow of FIG. 8, after the keyword check of step S14, step S15 determines whether the extracted document minimum division block connects to the next document minimum division block. If it does, the process returns to step S11 and repeats the above processing, thereby connecting the document minimum division blocks. If it does not, the process proceeds to step S17. If no document minimum division block could be taken out in step S12, the process likewise proceeds directly to step S17. Once the keyword has been checked and the connection of document minimum division blocks has been decided in this way, a document component block can be created, and the document component block table (Table 2) shown in FIG. 10 is built. The position information of this document component block is set in the document layout information table (Table 3) shown in FIG. 11 in steps S17 and S18 described below. Specifically, in step S17 the "component classification" (see FIG. 10) is determined with reference to the keyword check and to title symbols and the like, and is set in the document component block table (Table 2) shown in FIG. 10; the "first minimum division block number" and "last minimum division block number", determined from the connection information, are also set in the document component block table (Table 2); and the "component classification" is additionally set in the document layout information table (Table 3) shown in FIG. 11. Next, in step S18 the position information of the document component block is set in the document layout information table (Table 3). In step S19 it is checked whether any document minimum division block remains to be taken out; if step S20 determines that there is a next block, the process returns to step S11 and repeats the same layout information extraction for it. When all blocks are determined to have been processed, this flow ends.
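
The keyword check of step S14 and the rule that a keyword hit only makes a block a candidate can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary holds only the keywords named in the text, and the names `KEYWORD_DICTIONARY`, `find_keyword_classes`, and `is_date_block` are hypothetical. The real connection test of step S15 is reduced here to a single "ends with a line feed" flag.

```python
# Hypothetical subset of the keyword dictionary of FIG. 9, grouped by
# document component classification (only keywords named in the text).
KEYWORD_DICTIONARY = {
    "date": ["平成", "昭和", "明治", "年", "月", "日"],
    "address_sender": ["株式会社", "有限会社", "営業部", "企画部"],
}


def find_keyword_classes(text):
    """Return the classifications whose keywords occur in the block text (step S14)."""
    return [
        cls
        for cls, words in KEYWORD_DICTIONARY.items()
        if any(word in text for word in words)
    ]


def is_date_block(text, ends_with_line_feed):
    """A keyword hit alone is only a candidate; the connection state decides.

    Assumption: the connection check is simplified to one boolean flag.
    """
    return "date" in find_keyword_classes(text) and ends_with_line_feed
```

With this sketch, a first-line date that ends in a line feed is accepted, while the same era keyword inside running text that continues on the same line is rejected.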

【００３１】上記文書レイアウト情報抽出処理により、
ＲＡＭ１３のワーク領域３１に図１０に示す文書構成要
素ブロックテーブル（テーブル２）及び図１１に示す文
書レイアウト情報テーブル（テーブル３）が作成される
ことになる。Through the above document layout information extraction process, the document component block table (Table 2) shown in FIG. 10 and the document layout information table (Table 3) shown in FIG. 11 are created in the work area 31 of the RAM 13.

【００３２】上記文書構成要素ブロックテーブル（テー
ブル２）は、文書最小分割ブロックの１つ１つについて
文書要素を分類しこれを文書構成要素ブロック２−１，
２−２，…毎に分類・記憶するものである。文書構成要
素ブロックテーブル（テーブル２）の文書構成要素ブロ
ックの１ブロックは図１０に示すように日付、タイトル
等の最小分割文書の構成要素を示す「構成要素分類」
と、最小分割ブロックの先頭及び最終番号を示す「先頭
最小分割ブロック番号」及び「最終最小分割ブロック番
号」とからなり、「構成要素分類」の内容は次のような
ものである。すなわち、「０１」が日付、「０２」が作
成者、「０３」が差出人、「０４」が宛名、「０５」が
大見出し、「０６」が序文、「０７」が本文、「０８」
が追記文、「０９」がその他であることをそれぞれ示
す。例えば、文書構成要素ブロック２−１は、先頭最小
分割ブロック番号が「０１」の最小分割ブロック（前記
図７の最小分割ブロック１−１に該当する）ものであ
り、この最小分割ブロックは図７の文書最小分割ブロッ
クテーブル（テーブル１）のブロック属性から分かるよ
うに次に連結される最小分割ブロックはないから最終最
小分割ブロック番号は「０１」となる。また、この文書
構成要素ブロック２−１の構成要素分類は「０１」の
「日付」である（図２の入力例文書１の１行目参照）。
また、文書構成要素ブロック２−２は、先頭最小分割ブ
ロック番号が「０２」の最小分割ブロック（前記図７の
最小分割ブロック２−１に該当する）ものであり、この
最小分割ブロックは文書最小分割ブロックテーブル（テ
ーブル１）のブロック属性から分かるように次に連結さ
れる最小分割ブロックはないから最終最小分割ブロック
番号は「０２」となる。また、この文書構成要素ブロッ
ク２−２の構成要素分類は「０５」の「大見出し」であ
る（図２の入力例文書１の３行目参照）。さらに、文書
構成要素ブロック２−５は、先頭最小分割ブロック番号
が「０５」の最小分割ブロック（前記図７の最小分割ブ
ロック１−５に該当する）ものであり、この最小分割ブ
ロックは図２の入力例文書１の８行目及び９行目から明
かなように次に図７の最小分割ブロック１−６が連結さ
れるから最終最小分割ブロック番号は「０６」となる。
また、この文書構成要素ブロック２−５の構成要素分類
は「０６」の「序文」である。ここで、「序文」という
のは図２の入力例文書１に示すように「記」の後に続く
本文の前に置かれる文書をいう。The document component block table (Table 2) classifies each of the document minimum division blocks by document element and stores the result per document component block 2-1, 2-2, .... As shown in FIG. 10, one document component block of Table 2 consists of a "component classification", which indicates the component of the minimally divided document such as date or title, and a "first minimum division block number" and "last minimum division block number", which indicate the first and last minimum division blocks it covers. The "component classification" codes are as follows: "01" is date, "02" is creator, "03" is sender, "04" is addressee, "05" is major heading, "06" is preface, "07" is body text, "08" is postscript, and "09" is other. For example, document component block 2-1 begins with the minimum division block whose number is "01" (corresponding to minimum division block 1-1 in FIG. 7). As the block attribute in the document minimum division block table (Table 1) of FIG. 7 shows, no minimum division block is connected after it, so its last minimum division block number is also "01". The component classification of document component block 2-1 is "01", that is, "date" (see the first line of input example document 1 in FIG. 2). Document component block 2-2 begins with the minimum division block whose number is "02" (corresponding to minimum division block 2-1 in FIG. 7). As the block attribute in the document minimum division block table (Table 1) shows, no minimum division block is connected after it, so its last minimum division block number is "02". Its component classification is "05", that is, "major heading" (see the third line of input example document 1 in FIG. 2). Document component block 2-5 begins with the minimum division block whose number is "05" (corresponding to minimum division block 1-5 in FIG. 7). As is apparent from the eighth and ninth lines of input example document 1 in FIG. 2, minimum division block 1-6 of FIG. 7 is connected after it, so its last minimum division block number is "06". The component classification of document component block 2-5 is "06", that is, "preface". Here, "preface" means the text placed before the body text that follows the "記" heading, as shown in input example document 1 in FIG. 2.
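
One possible in-memory shape for the entries of Table 2 is sketched below. The classification codes and the three example entries come from the description above; the class name `ComponentBlock`, its field names, and the use of integers for block numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Component classification codes as listed in the description.
COMPONENT_CLASSES = {
    "01": "date", "02": "creator", "03": "sender", "04": "addressee",
    "05": "major heading", "06": "preface", "07": "body text",
    "08": "postscript", "09": "other",
}


@dataclass
class ComponentBlock:
    """One document component block of Table 2 (hypothetical field names)."""
    classification: str   # code such as "01"
    first_min_block: int  # first minimum division block number
    last_min_block: int   # last minimum division block number

    @property
    def label(self):
        return COMPONENT_CLASSES[self.classification]

    @property
    def min_block_count(self):
        # Number of connected minimum division blocks in this component block.
        return self.last_min_block - self.first_min_block + 1


# The three blocks described for input example document 1.
table2 = {
    "2-1": ComponentBlock("01", 1, 1),  # date, single block
    "2-2": ComponentBlock("05", 2, 2),  # major heading, single block
    "2-5": ComponentBlock("06", 5, 6),  # preface, blocks 05 and 06 connected
}
```

Storing only the first and last block numbers is enough because connected minimum division blocks are always consecutive in Table 1.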

【００３３】一方、上記レイアウト情報テーブル（テー
ブル３）の文書レイアウト情報ブロックの１ブロック
は、上述した文書構成要素ブロックで用いた「構成要素
分類」と、そのブロックの順番を記憶する「位置情報」
からなり、「位置情報」の内容は「０１」が左よせ、
「０２」がセンタリング、「０４」が右よせ、である。One document layout information block of the document layout information table (Table 3), on the other hand, consists of the "component classification" used in the document component block described above and "position information" that stores the order of that block. The "position information" codes are "01" for left alignment, "02" for centering, and "04" for right alignment.

【００３４】このように、上記文書レイアウト情報抽出
処理では、図６の文書最小分割処理で作成した文書最小
分割ブロックの１つ１つについて図９に示したキーワー
ド辞書及びタイトル記号（例えば、１．、２．、−、
○、☆等）を参考にしながら文書構成要素ブロックテー
ブル（テーブル２）を作成し、同時にそのブロックの位
置情報を格納した文書レイアウト情報テーブル（テーブ
ル３）をワーク領域３１に作成する。この文書レイアウ
ト情報テーブル（テーブル３）に格納される順序が文書
上の文書構成要素ブロックのレイアウト順序を表わすこ
とになる。例えば、この文書レイアウト情報テーブル
（テーブル３）上で「大見出し」より「日付」が先にあ
れば、それは文書レイアウトでも「大見出し」より「日
付」が先にレイアウトされることを表わす。すなわち、
最初に文書最小分割ブロックに切り分けしたものを、あ
る一定のグループに纏められるものは纏めてその連結さ
れたブロックに文書構造要素を表わすブロック属性とそ
の順番（位置情報）を順次文書レイアウト情報テーブル
（テーブル３）に学習データとして記憶しておくように
する。そして、後述する文書フォーマット変換を行なう
場合には、この文書レイアウト情報テーブル（テーブル
３）に従って配置変換すべき文書データの文書レイアウ
トが変換されることになる。As described above, the document layout information extraction process creates the document component block table (Table 2) by examining each document minimum division block created by the document minimum division process of FIG. 6 against the keyword dictionary shown in FIG. 9 and title symbols (for example, 1., 2., −, ○, ☆), and at the same time creates, in the work area 31, the document layout information table (Table 3) holding the position information of each block. The order in which entries are stored in Table 3 represents the layout order of the document component blocks in the document. For example, if "date" comes before "major heading" in Table 3, "date" is laid out before "major heading" in the document layout as well. In other words, the document is first cut into document minimum division blocks, blocks that can be grouped are connected, and for each connected block the block attribute representing its document structure element and its order (position information) are stored in sequence in Table 3 as learning data. When the document format conversion described later is performed, the layout of the document data to be converted is changed according to this document layout information table (Table 3).
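
How a learned Table 3 could drive layout is sketched below: the entry order gives the output order, and the position code gives the alignment. The position codes and the date entry (classification "01", position "04") are from the description; the other two entries, the function names, and the fixed column width are hypothetical. The sketch also assumes half-width characters so that `str.ljust`, `str.center`, and `str.rjust` pad correctly.

```python
# Position codes as listed for Table 3.
POSITION = {"01": "left", "02": "center", "04": "right"}

# Hypothetical learned Table 3: (component classification, position code),
# stored in layout order. Only the first entry is taken from the text.
layout_table = [("01", "04"), ("05", "02"), ("06", "01")]


def place(text, position_code, columns):
    """Pad one line of text to the given width according to its position code."""
    kind = POSITION[position_code]
    if kind == "left":
        return text.ljust(columns)
    if kind == "center":
        return text.center(columns)
    return text.rjust(columns)


def lay_out(blocks, table, columns=40):
    """Emit block texts in the learned order, skipping classes the input lacks.

    blocks: dict mapping component classification code to its text.
    """
    return [
        place(blocks[cls], pos, columns)
        for cls, pos in table
        if cls in blocks
    ]
```

For instance, `lay_out({"05": "NOTICE", "01": "1991.10.23"}, layout_table, 20)` yields the right-aligned date first and the centered heading second, regardless of the order in which the input dictionary lists them.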

【００３５】以下、入力例文書１を例に採り上記文書レ
イアウト情報抽出処理を具体的に説明する。先ず、文書
最小分割ブロックテーブル（テーブル１）の先頭の文書
最小分割ブロック１−１が取出される。次に、キーワー
ド辞書を参照すると、この文書最小ブロック１−１の文
字列の中に「日付」に関するキーワードが含まれている
ことが分かる。この文書最小ブロック１−１が「日付」
の記述のみで構成されていること及び文書の先頭であっ
て、かつ右よせされていることなどから、「日付」の文
書構成要素ブロックであると判断される。次の最小分割
ブロック１−２は前記最小分割ブロック１−１と１行離
れているため、連結されず別の文書構成要素ブロックと
される。この段階で、構成要素分類「０１（日付）」を
持つ文書構成要素テーブル（テーブル２）の文書構成要
素ブロック２−１が作成され、この文書構成要素ブロッ
ク２−１は最小分割ブロック１−１のみで構成されてい
るので「先頭最小分割ブロック番号」及び「最終最小分
割ブロック番号」はともに「０１（最小分割ブロック１
−１を示す）」がセットされる。と同時に、文書レイア
ウト情報テーブル（テーブル３）の文書レイアウト情報
ブロック３−１が作成され、構成要素分類情報「０１
（日付）」及び位置情報「０４（右よせ）」が文書レイ
アウト情報ブロック３−１にセットされる。最小分割ブ
ロックが連結される例としては、入力例文書１の行位置
０８行目がある。この行位置０８行目の最小分割ブロッ
ク１−５は行末まで文字が続いているので、次の最小分
割ブロック１−６と連結される。また、最小分割ブロッ
ク１−６が句点を含み、本文の前にあることなどの条件
から、この文書構成要素ブロックは序文と見なされ、文
書構成要素ブロック３−５が作成される。このようにし
て、文書構成要素ブロックテーブル（テーブル２）及び
文書レイアウト情報テーブル（テーブル３）が作成され
る。The document layout information extraction process is now described concretely, taking input example document 1 as the example. First, the leading document minimum division block 1-1 of the document minimum division block table (Table 1) is taken out. Consulting the keyword dictionary then shows that the character string of block 1-1 contains keywords relating to "date". Because block 1-1 consists solely of a date expression, is at the beginning of the document, and is right-aligned, it is judged to be a "date" document component block. The next minimum division block 1-2 is one line apart from block 1-1, so it is not connected and is treated as a separate document component block. At this stage, document component block 2-1 of the document component block table (Table 2) is created with component classification "01 (date)"; because it consists only of minimum division block 1-1, both its "first minimum division block number" and "last minimum division block number" are set to "01" (indicating minimum division block 1-1). At the same time, document layout information block 3-1 of the document layout information table (Table 3) is created, and the component classification "01 (date)" and the position information "04 (right alignment)" are set in it. Line 08 of input example document 1 is an example in which minimum division blocks are connected. The characters of minimum division block 1-5 on line 08 run to the end of the line, so it is connected to the next minimum division block 1-6. Further, because minimum division block 1-6 contains a full stop and precedes the body text, among other conditions, this document component block is regarded as a preface, and document component block 3-5 is created. The document component block table (Table 2) and the document layout information table (Table 3) are created in this way.

【００３６】上記文書構成要素ブロックテーブル（テー
ブル２）及び文書レイアウト情報テーブル（テーブル
３）が作成されることによって文書レイアウト情報（配
置情報）を利用した学習ができることになるが、本実施
例に係る文書処理装置１０では上述した文書レイアウト
情報抽出処理に加えて、書式パターンや倍角、アンダー
ライン等のアレンジ情報も学習可能にするために、上記
文書レイアウト情報抽出処理で行ったレイアウト情報抽
出処理と同様の処理をアレンジ情報（書式パターン情報
・修飾情報・個別情報等）抽出処理として行なうように
する。Creating the document component block table (Table 2) and the document layout information table (Table 3) makes learning based on the document layout information (placement information) possible. In addition to the document layout information extraction process described above, the document processing apparatus 10 of this embodiment also makes arrangement information such as format patterns, double-size characters, and underlines learnable. To that end, processing similar to the layout information extraction is performed as an arrangement information (format pattern information, decoration information, individual information, and so on) extraction process.

【００３７】文書アレンジ情報抽出処理図１２は文書データ構造の解析結果から文書要素のアレ
ンジ情報を抽出する文書アレンジ情報抽出処理を示すフ
ローチャートであり、本処理を実行することにより図１
４に示す文書アレンジ情報テーブル（テーブル４）が作
成される。また、図１３は修飾情報を文書アレンジ情報
テーブル（テーブル４）にセットする修飾情報セット処
理を示すフローチャートである。Document Arrangement Information Extraction Process. FIG. 12 is a flowchart of the document arrangement information extraction process, which extracts the arrangement information of the document elements from the analysis result of the document data structure. Executing this process creates the document arrangement information table (Table 4) shown in FIG. 14. FIG. 13 is a flowchart of the decoration information setting process, which sets decoration information in the document arrangement information table (Table 4).

【００３８】図１２において、先ず、ステップＳ２１で
前記文書レイアウト情報抽出処理（図８）で作成した文
書構成要素ブロックテーブル（テーブル２）の文書構成
要素ブロックを取出し、ステップＳ２２で文書構成要素
ブロックの取出しができたか否かを判別する。文書構成
要素ブロックの取出しができなかったときは次ブロック
なしと判断して本フローの処理を終え、文書構成要素ブ
ロックの取出しができたときはステップＳ２３で取出し
ポインタを更新する。次いで、ステップＳ２４で取出し
た文書構成要素ブロックテーブル（テーブル２）の文書
構成要素ブロックの「構成要素分類（図１０参照）」を
図１３に示す文書アレンジ情報テーブル（テーブル４）
にセットする。すなわち、前記文書構成要素ブロックテ
ーブル（テーブル２）の文書構成要素ブロックの「構成
要素分類」と同一の構成要素分類情報が文書アレンジ情
報テーブル（テーブル４）にセットされる。次いで、ス
テップＳ２３で後述する「構成要素分類別書式パター
ン」を文書アレンジ情報テーブル（テーブル４）にセッ
トする。次いで、ステップＳ２４で「修飾情報」を文書
アレンジ情報テーブル（テーブル４）にセットし、ステ
ップＳ２５で「個別情報」を文書アレンジ情報テーブル
（テーブル４）にセットしてステップＳ２１に戻って上
記文書アレンジ抽出処理を繰り返す。この場合、取出し
た文書構成要素ブロックの文書に「修飾情報（例えば、
倍角、アンダーライン、網かけ等）」がなければ修飾情
報サイズのみ（すなわち、ワード情報のみ）がセットさ
れることとなり、「修飾情報」があるときには図１３に
示す修飾情報セット処理フローで修飾情報がセットされ
る。「文書成要素分類」によって固有のアレンジ情報が
あったときはそのアレンジ情報は文書アレンジ情報ブロ
ックの「個別情報」にセットされる。In FIG. 12, step S21 first takes a document component block out of the document component block table (Table 2) created by the document layout information extraction process (FIG. 8), and step S22 determines whether a block was actually taken out. If not, it is judged that there is no next block and this flow ends; if so, the extraction pointer is updated in step S23. Next, in step S24 the "component classification" (see FIG. 10) of the extracted document component block is set in the document arrangement information table (Table 4) shown in FIG. 13. That is, the same component classification information as that of the document component block in Table 2 is set in Table 4. Next, in step S23 the "format pattern by component classification", described later, is set in the document arrangement information table (Table 4). Then in step S24 the "decoration information" is set in Table 4, in step S25 the "individual information" is set in Table 4, and the process returns to step S21 to repeat the document arrangement extraction. If the text of the extracted document component block has no decoration (for example, double-size characters, underline, or shading), only the decoration information size (that is, only the word information) is set; if it has decoration, the decoration information is set by the decoration information setting flow shown in FIG. 13. Any arrangement information specific to the document component classification is set in the "individual information" of the document arrangement information block.

【００３９】図１３は修飾情報を文書アレンジ情報テー
ブル（テーブル４）にセットする修飾情報セット処理の
フローチャートであり、修飾情報の一例としてアンダー
ライン修飾をセットする例を示す。FIG. 13 is a flowchart of the decoration information setting process, which sets decoration information in the document arrangement information table (Table 4); it shows the setting of an underline decoration as one example of decoration information.

【００４０】先ず、ステップＳ３１でアンダーライン修
飾されているかをチェックし、ステップＳ３２でアンダ
ーライン修飾があると判別されたときにはステップＳ３
３でアンダーラインの線種（例えば、細実線アンダーラ
イン、太実線アンダーライン等）を取込み、ステップＳ
３４でこの取込んだ情報を基に修飾情報を作成して文書
アレンジ情報テーブル（テーブル４）にセットする。ま
た、上記ステップＳ３２でアンダーライン修飾がないと
判別されたときにはそのままステップＳ３５に進む。次
いで、ステップＳ３５でその他の修飾情報（例えば、網
かけ）について同様の処理を行ってその修飾情報を文書
アレンジ情報テーブル（テーブル４）にセットして本フ
ローの処理を終える。First, step S31 checks whether the text is underlined. If step S32 determines that an underline decoration is present, step S33 reads the underline line type (for example, thin solid underline or thick solid underline), and step S34 creates decoration information from what was read and sets it in the document arrangement information table (Table 4). If step S32 determines that there is no underline decoration, the process proceeds directly to step S35. In step S35, the same processing is performed for the other decoration information (for example, shading), that decoration information is set in the document arrangement information table (Table 4), and this flow ends.
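
The flow of FIG. 13 can be sketched roughly as below: the underline is handled first (steps S31 to S34), then every other decoration in the same way (step S35). The function name `collect_decorations` and the representation of a block's decorations as a dictionary of kind to pattern are assumptions for illustration; the source does not say how decorations are held before they are set in Table 4.

```python
def collect_decorations(block_decorations):
    """Build the decoration entries for one document component block.

    block_decorations: hypothetical dict such as {"underline": "thick solid"}.
    Returns a list of (kind, pattern) pairs, underline first.
    """
    entries = []

    # S31/S32: is the block underlined?
    line_type = block_decorations.get("underline")
    if line_type is not None:
        # S33/S34: read the line type and record the decoration.
        entries.append(("underline", line_type))

    # S35: treat the remaining decorations (for example shading) the same way.
    for kind, pattern in block_decorations.items():
        if kind != "underline":
            entries.append((kind, pattern))

    return entries
```

An undecorated block yields an empty list, which matches the case in which only the size field is stored.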

【００４１】上記文書アレンジ抽出処理及び修飾情報セ
ット処理により、ＲＡＭ１３のワーク領域３１に図１４
に示すよう文書アレンジ情報テーブル（テーブル４）が
作成されることになる。Through the document arrangement extraction process and the decoration information setting process described above, the document arrangement information table (Table 4) shown in FIG. 14 is created in the work area 31 of the RAM 13.

【００４２】上記文書アレンジ情報テーブル（テーブル
４）は、文書構成要素ブロックの１つ１つについてアレ
ンジ情報ブロック４−１，４−２，…毎に記憶するもの
である。文書アレンジ情報テーブル（テーブル４）の文
書アレンジ情報ブロックの１ブロックは図１４に示すよ
うに文書の構成要素を示す「構成要素分類」、「書式パ
ターン」、「修飾情報サイズ」及び「個別情報サイズ」
と、修飾情報がある場合にセットされる「修飾情報」と
からなり、このうち、「構成要素分類」は前記文書構成
要素ブロックの「構成要素分類」と同一である。また、
「書式パターン」の内容としては、例えば構成要素分類
が「日付（平成３年１０月２３日）」の書式パターンで
は「０１」が平成３年１０月２３日、「０２」が１９９
１．１０．２３、「０３」がその他、である。また、
「修飾情報サイズ」及び「個別情報サイズ」には修飾情
報を格納する上記文書アレンジ情報ブロックのサイズ
（バイト数で表わす）であり、例えば「０００４」は４
バイト分のサイズがこのブロック内に確保されることを
示す。The document arrangement information table (Table 4) stores, for each document component block, one arrangement information block 4-1, 4-2, .... As shown in FIG. 14, one document arrangement information block of Table 4 consists of a "component classification" indicating the document component, a "format pattern", a "decoration information size", an "individual information size", and "decoration information", which is set only when decoration is present. The "component classification" is the same as that of the document component block. As an example of "format pattern" contents, for the component classification "date" (October 23, Heisei 3), "01" is the era form 平成３年１０月２３日, "02" is the form 1991.10.23, and "03" is other. The "decoration information size" and "individual information size" hold the size, expressed in bytes, of the area of the document arrangement information block that stores the corresponding information; for example, "0004" indicates that four bytes are reserved in the block.
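
The date format patterns named above ("01" era form, "02" dotted form, "03" other) could be recognized along the following lines. The regular expressions and the function name `classify_date_format` are assumptions made for this sketch; the source only gives one example string per pattern and does not describe how the pattern is detected. In Python, `\d` in a `str` pattern also matches full-width digits, which is why the same expressions cover both ３ and 3.

```python
import re

# Assumed shapes of the two named patterns; only three era names are listed
# because those are the ones the keyword dictionary example mentions.
ERA_FORM = re.compile(r"(平成|昭和|明治)\s*\d+年\s*\d+月\s*\d+日")
DOTTED_FORM = re.compile(r"\d{4}[.．]\d{1,2}[.．]\d{1,2}")


def classify_date_format(text):
    """Return the format pattern code for a date string."""
    text = text.strip()
    if ERA_FORM.fullmatch(text):
        return "01"
    if DOTTED_FORM.fullmatch(text):
        return "02"
    return "03"
```

During conversion, a block classified as "date" whose detected code differs from the learned code would be the one that needs its format pattern changed.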

【００４３】また、文書アレンジ情報テーブル（テーブ
ル４）の個別情報部分には、本文などのアレンジ情報の
段落字下げ情報やタイトル番号の種類（「１．」「」
など）の情報のようなその文書構成要素ブロック特有の
アレンジ情報が格納される。一方、修飾情報があったと
きにセットされる「修飾情報」は、上位４ビットで修飾
種を表わし、下位４ビットで修飾パターンを表わす（但
し、修飾種により異なる）ものとする。例えば、上位４
ビットについて「０１」が文字サイズ変更、「０２」が
アンダーライン、「０４」が網かけ、「０８」がその他
の文字修飾、である。また、例えば、下位４ビットは文
字サイズ変更、アンダーライン、網かけについて夫々
「００」が全角、細実線、網かけ１、「０１」が半角、
太実線、網かけ２、「０２」が横倍角、細破線、網かけ
３、「０３」が縦倍角、太破線、網かけ４、である。The individual information part of the document arrangement information table (Table 4) stores arrangement information specific to the document component block, such as paragraph indentation information for body text and the type of title numbering ("1.", "", and so on). The "decoration information", which is set when decoration is present, uses its upper 4 bits for the decoration type and its lower 4 bits for the decoration pattern (whose meaning depends on the decoration type). For the upper 4 bits, for example, "01" is character size change, "02" is underline, "04" is shading, and "08" is other character decoration. For the lower 4 bits, the values for character size change, underline, and shading respectively are: "00" for full-width, thin solid line, shading 1; "01" for half-width, thick solid line, shading 2; "02" for double-width, thin broken line, shading 3; and "03" for double-height, thick broken line, shading 4.
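
The upper-nibble/lower-nibble layout of the decoration information can be illustrated with simple bit operations. The code values are those listed in the description; the function names `pack_decoration`, `unpack_decoration`, and `describe`, and the choice to hold one decoration in one byte, are assumptions for this sketch.

```python
# Upper 4 bits: decoration type.
DECORATION_TYPES = {
    0x1: "character size change",
    0x2: "underline",
    0x4: "shading",
    0x8: "other character decoration",
}

# Lower 4 bits: pattern, whose meaning depends on the decoration type.
DECORATION_PATTERNS = {
    0x1: ["full-width", "half-width", "double-width", "double-height"],
    0x2: ["thin solid line", "thick solid line",
          "thin broken line", "thick broken line"],
    0x4: ["shading 1", "shading 2", "shading 3", "shading 4"],
}


def pack_decoration(kind, pattern):
    """Combine a type code and a pattern code into one byte."""
    return ((kind & 0xF) << 4) | (pattern & 0xF)


def unpack_decoration(value):
    """Split one byte into (type code, pattern code)."""
    return (value >> 4) & 0xF, value & 0xF


def describe(value):
    """Return readable names for a packed decoration byte."""
    kind, pattern = unpack_decoration(value)
    return DECORATION_TYPES[kind], DECORATION_PATTERNS[kind][pattern]
```

For example, a thick solid underline packs to `0x21`, and `describe(0x21)` returns `("underline", "thick solid line")`.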

【００４４】このように、上記文書アレンジ抽出処理で
は、図８の文書レイアウト情報抽出処理で作成された文
書構成要素ブロックテーブル（テーブル２）の文書構成
要素ブロックの１つ１つについてアレンジ情報（書式パ
ターン情報・修飾情報・個別情報など）を抽出し、文書
アレンジ情報テーブル（テーブル４）をワーク領域３１
に作成する。また、文書構成要素分類によって固有のア
レンジ情報は文書アレンジ情報テーブル（テーブル４）
の個別情報部分に格納される。As described above, the document arrangement extraction process extracts arrangement information (format pattern information, decoration information, individual information, and so on) for each document component block of the document component block table (Table 2) created by the document layout information extraction process of FIG. 8, and creates the document arrangement information table (Table 4) in the work area 31. Arrangement information specific to a document component classification is stored in the individual information part of the document arrangement information table (Table 4).

【００４５】以下、入力例文書１を例に採り上記文書ア
レンジ情報抽出処理を具体的に説明する。先ず、文書最
構成要素ブロックテーブル（テーブル２）の先頭の文書
構成要素ブロック２−１が取出されると同時に、文書ア
レンジ情報テーブル（テーブル４９に文書アレンジ情報
ブロック４−１を作成し、この文書アレンジ情報テーブ
ル（テーブル４）に文書構造要素ブロック２−１と同一
の構成要素分類情報をセットする。次に構成要素分類別
書式パターン情報をセットすることになるが、この場合
「構成要素分類」が「日付」であるので日付の書式パタ
ーン分類の「０１（「平成＊＊年＊＊月＊＊日」パター
ン）」が文書アレンジ情報ブロック４−１にセットされ
る。次の修飾情報は文書上ブロック何の修飾も行われて
いないので、自分を含めたサイズ「０００２（ワード情
報）」のみがセットされる。次に文書構成要素ブロック
２−２が取出され文書アレンジ情報ブロック４−２にセ
ットされることになる。このようにして文書アレンジ情
報テーブル（テーブル４）が作成されていく。The document arrangement information extraction process is now described concretely, taking input example document 1 as the example. First, the leading document component block 2-1 of the document component block table (Table 2) is taken out; at the same time, document arrangement information block 4-1 is created in the document arrangement information table (Table 4), and the same component classification information as that of document component block 2-1 is set in it. Next, the format pattern information for the component classification is set. In this case the "component classification" is "date", so the date format pattern code "01" (the "Heisei ** year ** month ** day" pattern) is set in document arrangement information block 4-1. As for the decoration information that follows, the block carries no decoration in the document, so only the size "0002" (word information), which includes the size field itself, is set. Next, document component block 2-2 is taken out and set in document arrangement information block 4-2. The document arrangement information table (Table 4) is built up in this way.

【００４６】学習処理学習処理は、上述の処理で得られた文書レイアウト情報
及び文書アレンジ情報を、文書フォーマット変換等で使
用できる形態にして記憶しておく処理である。Learning Process. The learning process stores the document layout information and document arrangement information obtained by the above processing in a form that can be used for document format conversion and the like.

【００４７】すなわち、文書解析装置２４により前述の
処理でワーク領域３１上に作成された文書レイアウト情
報テーブル（テーブル３）及び文書アレンジ情報テーブ
ル（テーブル４）が、それぞれ文書レイアウト情報学習
情報２９及び文書アレンジ情報学習領域３０に格納され
ることで学習が行われる。この文書レイアウト情報学習
領域２９及び文書アレンジ学習領域３０は文書作成装置
の電源をオフしても内容は保持されるものとし、不揮発
性メモリ等により構成される。また、これらの学習領域
２９，３０の学習データは、外部記憶装置２０に保存す
ることもできる。That is, learning is performed by storing the document layout information table (Table 3) and the document arrangement information table (Table 4), which the document analysis device 24 created in the work area 31 by the above processing, in the document layout information learning area 29 and the document arrangement information learning area 30, respectively. These learning areas 29 and 30 retain their contents even when the document creating apparatus is powered off and are implemented with non-volatile memory or the like. The learning data in learning areas 29 and 30 can also be saved to the external storage device 20.

【００４８】以上により文書最小分割処理（図６及び図
７）、文書レイアウト情報抽出処理（図８〜図１１）、
文書アレンジ情報抽出処理（図１２〜図１４）及び学習
処理からなる文書フォーマット学習の説明を終え、次に
学習情報に基づいて文書フォーマットを変換する文書フ
ォーマット変換について詳細に説明する。This concludes the description of document format learning, which consists of the document minimum division process (FIGS. 6 and 7), the document layout information extraction process (FIGS. 8 to 11), the document arrangement information extraction process (FIGS. 12 to 14), and the learning process. Document format conversion, which converts a document's format based on the learned information, is described in detail next.

【００４９】文書フォーマット変換文書フォーマット変換は、文書構造解析処理（図１
５）、文書アレンジ変換処理（図１６）及び文書レイア
ウト変換処理（図１７）からなる。また、前述した学習
処理が終了し、文書フォーマット学習情報が文書レイア
ウト情報学習領域２９及び文書アレンジ情報学習領域３
０に既に格納されているものとし、図３に示す入力例文
書２を入力例文書１の学習結果に従って変換して出力例
文書１として出力する場合を例に採り説明する。Document Format Conversion. Document format conversion consists of a document structure analysis process (FIG. 15), a document arrangement conversion process (FIG. 16), and a document layout conversion process (FIG. 17). The description assumes that the learning process described above has finished and that the document format learning information is already stored in the document layout information learning area 29 and the document arrangement information learning area 30, and takes as its example the case where input example document 2 shown in FIG. 3 is converted according to the learning result for input example document 1 and output as output example document 1.

【００５０】文書構造解析処理この文書構造解析処理は、入力文の文書構造を解析して
ブロックの範囲と構成要素分類を持つ文書構造解析テー
ブル（テーブル５）をワーク領域３１に作成する処理で
あり、この文書構造解析処理は前述した文書最小分割処
理（図６及び図７）及び文書レイアウト情報抽出処理
（図８〜図１１）と結果として作成されるテーブルのフ
ォーマットが多少異なるだけで解析手順は略同一であ
る。すなわち、図６及び図８に示す処理フローと同様な
処理によって文書構造解析テーブル（テーブル５）を作
成することができ、この文書構造解析テーブル（テーブ
ル５）は前記図１０の文書構成要素ブロックテーブル
（テーブル２）に相当する。Document Structure Analysis Process. This process analyzes the document structure of the input text and creates, in the work area 31, a document structure analysis table (Table 5) holding the range and component classification of each block. Its analysis procedure is substantially the same as that of the document minimum division process (FIGS. 6 and 7) and the document layout information extraction process (FIGS. 8 to 11) described above; only the format of the resulting table differs slightly. That is, the document structure analysis table (Table 5) can be created by processing similar to the flows of FIGS. 6 and 8, and it corresponds to the document component block table (Table 2) of FIG. 10.

【００５１】上記文書構造解析テーブル（テーブル５）
は、入力例文書２の文書最小分割ブロックの１つ１つに
ついて文書構成要素を分類しこれを文書構造解析ブロッ
ク５−１，５−５，…毎に分類記憶するものである。ま
た、文書構造解析ブロックテーブル（テーブル５）の文
書構造解析要素ブロックの１ブロックは日付、タイトル
等の文書の構成要素を示す「構成要素分類」と、ブロッ
クの範囲を示す「開始行位置」及び「終了行位置」から
なる。The document structure analysis table (Table 5) classifies each document minimum division block of input example document 2 by document component and stores the result per document structure analysis block 5-1, 5-5, .... One document structure analysis block of Table 5 consists of a "component classification" indicating the document component, such as date or title, and a "start line position" and "end line position" indicating the range of the block.

[0052] In the case of document format conversion, only the document structure analysis table (Table 5) is required, and the document arrangement information table (Table 5) mentioned above is not necessary. This is because document format conversion determines the structure of the document by the same means as format learning, then rearranges the components according to the learned information and arranges the characters, so the document arrangement information table (Table 4) of the document whose format is to be followed is sufficient. It is therefore enough to know what kind of block each block is; the layout information (placement information) and arrangement information created during document format learning are simply applied to that block.

[0053] Document Arrangement Conversion Process
FIG. 16 is a flowchart of the document arrangement conversion process, which arranges the input document (input example document 2) based on the learned arrangement information.

[0054] First, the document structure of the input document is analyzed in step S41, a document structure analysis block is taken out of the document structure analysis table (Table 5) of the input document in step S42, and step S43 determines whether a block could be taken out. The document structure is analyzed in exactly the same way as in the analysis procedure of the document minimum division process and the document layout information extraction process described above. If no document structure analysis block could be taken out, it is judged that there is no next block and the flow ends; if a block could be taken out, the extraction pointer is updated in step S44. Next, in step S45, the document arrangement information table (Table 4) is searched for a block having the same "component classification", and step S46 checks whether there is format pattern change information. If step S47 determines that the format pattern is to be changed, the format pattern is changed in step S48; otherwise the process proceeds directly to step S49. That is, the format pattern is changed according to what was learned in the document arrangement information block, stored in the document arrangement information learning area 30, that has the same component classification information. As a result, the format of the document is changed to the format pattern (for example, the document size) of the document targeted by the arrangement conversion.

[0055] Next, step S49 checks whether there is modification information. If step S50 determines that there is, character modification is performed in step S51 and the process proceeds to step S52; if there is none, the process proceeds directly to step S52. Step S52 checks whether there is individual arrangement information. If step S53 determines that there is, the individual arrangement is performed in step S54 and the process proceeds to step S55; if there is none, the process proceeds directly to step S55. In step S55, the document that has undergone arrangement conversion such as the format pattern change, character modification, and individual arrangement is output to the work area 31 as an intermediate document. This completes the document arrangement conversion for the block concerned, and the process returns to step S42. The above processing is repeated until no document structure analysis blocks remain.
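The loop of steps S42 to S55 can be sketched in a few lines. This is a hedged illustration, not the implementation in the specification: the function name `arrange_convert`, the dictionary keys (`component`, `start`, `end`, `format`, `modify`, `individual`), and the representation of learned arrangement information as callables are all assumptions.

```python
def arrange_convert(structure_table, arrangement_table, lines):
    """Apply learned arrangement information block by block (S42 to S55)."""
    # Index the learned arrangement blocks by component classification (S45).
    learned = {entry["component"]: entry for entry in arrangement_table}
    intermediate = []
    for block in structure_table:                      # S42 to S44: next block
        text = lines[block["start"] - 1 : block["end"]]
        entry = learned.get(block["component"], {})
        # S46 to S54: format pattern, character modification, individual
        # arrangement, each applied only when learned information exists.
        for key in ("format", "modify", "individual"):
            convert = entry.get(key)
            if convert is not None:
                text = [convert(line) for line in text]
        intermediate.append((block["component"], text))  # S55: intermediate output
    return intermediate
```

A block whose classification has no learned entry passes through unchanged, which mirrors the "proceeds directly" branches of the flowchart.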

[0056] Thus, in the document arrangement conversion process, starting from the first component block of the document structure analysis table (Table 5) created by the document structure analysis process of FIG. 15, the same component block is taken out of the document arrangement information table (Table 4) stored in the document arrangement information learning area 30. The document is then arranged (for example, the format pattern is changed or characters are modified) according to what was learned in the document arrangement information block having the same component. The result of this arrangement conversion is output to the work area 31 as an intermediate document. The intermediate document temporarily stored in the work area 31 has undergone arrangement conversion, but its layout positions have not yet been changed; it is shown as intermediate example document 1 in FIG. 4.

[0057] The document arrangement conversion process is described concretely below, taking input example document 2 and intermediate example document 1 as examples. First, the leading block 5-1 of the document structure analysis table (Table 5) of input example document 2 is taken out. The component classification information of this block 5-1 is "date (01)", and arrangement is performed according to document arrangement information block 4-1, which is stored in the document arrangement information learning area 30 and has the same component classification information. The document data "１９９１．１１．１５" stored in the document storage area 28 is changed to "平成３年１１月１５日" (November 15, Heisei 3) by the format pattern information of block 4-1. Arrangement according to modification information and individual information would normally follow, but block 4-1 has no such arrangement information, so the data is left as is. The arranged intermediate document is output to the work area 31. In this way, intermediate example document 1 shown in FIG. 4 is created in the work area 31. Intermediate example document 1 is the content of input example document 2 (FIG. 3) converted according to the arrangement information of input example document 1 (FIG. 2). Accordingly, in intermediate example document 1 the layout positions of input example document 1 have not yet been applied; only arrangement information such as the format pattern has been changed. For example, the date "１９９１．１１．１５" at line position 01 of input example document 2 is converted, in line with the arrangement information "平成＊＊年＊＊月＊＊日" (Heisei ** year ** month ** day) learned from the date on line 01 of input example document 1, to "平成３年１１月１５日" as shown at line position 01 of intermediate example document 1. Likewise, the full-width headline (title) "出張報告書" (business trip report) at line position 03 of input example document 2 is converted, in line with the double-width underlined headline (title) "出張報告書" at line position 03 of input example document 1, to a double-width underlined "出張報告書" as shown at line position 03 of intermediate example document 1. In this way, the document after all arrangement conversion other than document layout (placement) conversion is created in the work area 31 as intermediate example document 1.
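The date example above can be reproduced with a short function. This is a simplified sketch under stated assumptions: the function name `to_heisei` is hypothetical, the learned pattern is hard-coded rather than read from an arrangement information block, and the era arithmetic (Heisei year = Western year minus 1988) ignores era boundaries, so it is only meaningful for dates that actually fall in the Heisei era.

```python
import re

# Map ASCII digits to full-width digits, as used in the example document.
FULLWIDTH = str.maketrans("0123456789", "０１２３４５６７８９")


def to_heisei(date_text):
    """Rewrite 'YYYY.MM.DD' into the learned pattern 平成＊＊年＊＊月＊＊日."""
    # \d matches full-width digits too; accept both '.' and '．' separators.
    m = re.fullmatch(r"(\d{4})[.．](\d{1,2})[.．](\d{1,2})", date_text)
    if m is None:
        return date_text  # not a date in the expected form: leave unchanged
    year, month, day = (int(g) for g in m.groups())
    era_year = year - 1988  # simplification: no check of era start or end dates
    if era_year < 1:
        return date_text
    return f"平成{era_year}年{month}月{day}日".translate(FULLWIDTH)


print(to_heisei("１９９１．１１．１５"))  # 平成３年１１月１５日
```

Text that does not match the date form is returned unchanged, matching the behavior of a block with no applicable format pattern.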

[0058] Intermediate example document 1 created in the work area 31 is converted to the final output example document 1 (FIG. 5) by the document layout conversion process described below, which completes the series of document format learning and document format conversion.

[0059] Document Layout Conversion Process
FIG. 17 is a flowchart of the document layout conversion process, which rearranges the arrangement-converted document created as the intermediate document, based on the learned document layout information, and outputs the final output document.

[0060] First, in step S61, a document layout information block is taken out of the document layout information table (Table 3) created by the document layout information extraction process (FIG. 8), and step S62 determines whether a block could be taken out. If no document layout information block could be taken out, it is judged that there is no next block and the flow ends; if a block could be taken out, the extraction pointer is updated in step S63. Next, step S64 checks whether the document structure analysis table (Table 5) of the input text contains a corresponding document structure analysis block. If step S65 determines that there is a corresponding block, the corresponding portion of the intermediate document is output in step S66 to the document storage area 28 as the final document after document format conversion (output example document 1), and the process returns to step S61; this is repeated until no document layout information blocks remain. If step S65 determines that there is no corresponding block, the process returns to step S61 and the above processing is repeated.
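Steps S61 to S66 amount to walking the learned layout order and emitting whichever intermediate blocks match. The sketch below assumes a hypothetical representation: the layout table is a list of component classifications in learned order, and the intermediate document is a list of `(component, lines)` pairs. The function name `layout_convert` and the sample classifications are likewise illustrative.

```python
def layout_convert(layout_table, intermediate):
    """Emit intermediate blocks in the learned layout order (S61 to S66)."""
    # Group the intermediate document's lines by component classification.
    by_component = {}
    for component, text in intermediate:
        by_component.setdefault(component, []).extend(text)

    output = []
    for component in layout_table:          # S61 to S63: next layout block
        if component in by_component:       # S64, S65: corresponding block exists?
            output.extend(by_component[component])  # S66: output that portion
        # No corresponding block: continue with the next layout block.
    return output


# Example: the learned layout places the headline before the sender.
learned_order = ["date", "title", "sender"]
middle = [("date", ["d"]), ("sender", ["s"]), ("title", ["t"])]
print(layout_convert(learned_order, middle))  # ['d', 't', 's']
```

Because the output order is driven entirely by the layout table, components present in the learned layout but absent from the input are simply skipped.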

[0061] Thus, in the document layout conversion process, the intermediate document created in the work area 31 by the document arrangement conversion process is output to the document storage area 28 in accordance with the document layout information table (Table 3) learned in the document layout information learning area 29.

[0062] To explain concretely using intermediate example document 1: first, the leading block 3-1 of the document layout information table (Table 3) is taken out, the document structure analysis block 5-1 having the same component classification information "date" is found in the document structure analysis table (Table 5), and the corresponding portion of intermediate example document 1 in the work area 31 is output to the document storage area 28. Next, since in the document layout information table (Table 3) the "headline" of document layout information block 3-2 comes first, the portion of intermediate example document 1 corresponding to the "headline" is output to the document storage area 28. As a result, the positions of the "headline" and the "sender" are swapped between input example document 2 and output example document 1. In this way, the whole of intermediate example document 1 is layout-converted into output example document 1. Ultimately, intermediate example document 1, which carries the content of input example document 2, is rearranged according to the layout information of input example document 1 and output.

[0063] As described above, the document processing device 10 of this embodiment comprises the document analysis device 24, which analyzes the structure of the document data stored in the document storage area 28; the document layout information extraction device 25, which extracts layout information of document elements from the analysis result of the document analysis device 24; and the document layout information learning area 29, in which the layout information extracted by the document layout information extraction device 25 is learned. The document analysis device 24, controlled by the CPU 11, takes out the document data stored in the document storage area 28 one line at a time and analyzes its structure; document layout information is extracted from the analysis result and stored as learning data in the document layout information learning area 29. Therefore, if the learned placement information is displayed on the CRT 18 or printed by the printer 22 when another document is created, a document having the same layout as the original can easily be created by referring to the displayed or printed placement information.

[0064] The document processing device 10 of this embodiment has the effect that a document format can be learned easily. Workability can be improved further if, when the document data is read, characters are read directly and encoded using the OCR 16 or the like and the learning is then performed.

[0065] In this embodiment a document component block is a unit of lines, but the invention is not limited to this; by adding, for example, structure management in units of columns, two or more document component blocks on the same line can be processed in the same way.

[0066] The classification of components and the extraction and learning of document layout information in this embodiment are examples. Needless to say, more detailed classification, extraction, and learning are possible and can be realized by methods similar to those disclosed in this embodiment.

[0067] This embodiment has shown an example of learning the format of documents that have already been laid out, such as input example documents 1 and 2, but a plain-text document created without regard to layout (with no indentation or the like) can also be laid out in the learned format.

[0068] In the document format conversion example of this embodiment, format learning data learned by the document format learning function was used, but the invention is not limited to this; needless to say, format learning data held in, for example, an external storage device may be loaded into the learning area and used.

[0069] In this embodiment the document layout conversion process is performed after the document arrangement conversion process, but the layout conversion process may of course be performed first, followed by the arrangement conversion process.

[0070] Furthermore, this embodiment is an example in which the document processing device 10 is applied to a Japanese word processor, but it goes without saying that the invention can be applied to any other device having a document format learning function, for example a personal computer.

【００７１】[0071]

[Effects of the Invention] According to the present invention, a specific character string can automatically be changed to a predetermined character size, given a modification, and arranged at a predetermined position, and other character string data can automatically be arranged at prescribed positions. The user can therefore enter characters without having to think about changing a specific character string to a character size such as half-width or double-width, underlining it, or placing each character at its prescribed position. This greatly reduces the burden on the user and makes the device highly practical.

[Brief description of drawings]

FIG. 1 is a block diagram of the document processing device.

FIG. 2 is a diagram showing input example document 1 of the document processing device.

FIG. 3 is a diagram showing input example document 2 of the document processing device.

FIG. 4 is a diagram showing the intermediate example document of the document processing device.

FIG. 5 is a diagram showing the output example document of the document processing device.

FIG. 6 is a flowchart showing the document minimum division process of the document processing device.

FIG. 7 is a diagram showing the document minimum division block table (Table 1) of the document processing device.

FIG. 8 is a flowchart showing the document layout information extraction process of the document processing device.

FIG. 9 is a diagram showing the structure of the keyword dictionary of the document processing device.

FIG. 10 is a diagram showing the document component block table (Table 2) of the document processing device.

FIG. 11 is a diagram showing the document layout information table (Table 3) of the document processing device.

FIG. 12 is a flowchart showing the document arrangement information extraction process of the document processing device.

FIG. 13 is a flowchart showing the modification information setting process of the document processing device.

FIG. 14 is a diagram showing the document arrangement information table (Table 4) of the document processing device.

FIG. 15 is a diagram showing the document structure analysis table (Table 5) of input example document 2 of the document processing device.

FIG. 16 is a flowchart showing the document arrangement conversion process of the document processing device.

FIG. 17 is a flowchart showing the document layout conversion process of the document processing device.

[Explanation of symbols]

10 Document processing device
11 CPU
12 ROM
13 RAM
14 Keyboard
16 OCR
18 CRT
20 External storage device
22 Printer
24 Document analysis device
25 Document layout information extraction device
26 Document arrangement information extraction device
27 Document format conversion device
28 Document storage area
29 Document layout information learning area
30 Document arrangement information learning area
31 Work area

Claims

[Claims]

1. A character string data processing device comprising: character string data storage means for storing input character string data; judging means for judging whether or not specific character string data is present in the character string data storage means; and means for changing the judged specific character string data to a predetermined character size, adding modification information, arranging it at a predetermined position, and further arranging other character string data at a prescribed position.