JP7512798B2

JP7512798B2 - Information processing device and computer program

Info

Publication number: JP7512798B2
Application number: JP2020162029A
Authority: JP
Inventors: 荘介下山
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2024-07-09
Anticipated expiration: 2040-09-28
Also published as: JP2022054812A

Description

本発明は、情報処理装置及びコンピュータプログラムに関する。 The present invention relates to an information processing device and a computer program.

雑誌、書籍、新聞等のレイアウト作成を支援する種々の手法が提案されている。特許文献１には、ドキュメントから複数のコンテンツを抽出し、抽出した複数のコンテンツ間の意味的な関連性の度合いに基づいてドキュメント上の各コンテンツの位置を決定し、決定した位置にコンテンツを配置した新たなドキュメントを生成する情報処理装置が開示されている。 Various methods have been proposed to assist in creating layouts for magazines, books, newspapers, etc. Patent Document 1 discloses an information processing device that extracts multiple pieces of content from a document, determines the position of each piece of content on the document based on the degree of semantic relevance between the extracted pieces of content, and generates a new document in which the content is placed in the determined position.

特開２００９－１６９５３６号公報 JP 2009-169536 A

特許文献１の情報処理装置では、ドキュメント内のコンテンツに記載されたテキストの一致の程度に応じてコンテンツ間の類似度合いを算出し、算出した類似度合いに基づいてコンテンツを配置している。しかし、ドキュメント内のコンテンツはテキストに限定されるわけではなく、図などの他のコンテンツも含まれるので、特許文献１の情報処理装置では、関連性のあるコンテンツの配置を精度よく行うことができない可能性がある。 The information processing device of Patent Document 1 calculates the degree of similarity between contents in a document according to the degree of matching of the text written in the contents, and arranges the contents based on the calculated degree of similarity. However, since the content in a document is not limited to text, but also includes other content such as figures, the information processing device of Patent Document 1 may not be able to arrange related content with a high degree of accuracy.

本発明は、斯かる事情に鑑みてなされたものであり、文書内のコンテンツの関連性を精度よく判定できる情報処理装置及びコンピュータプログラムを提供することを目的とする。 The present invention has been made in consideration of the above circumstances, and aims to provide an information processing device and computer program that can accurately determine the relevance of content within a document.

本発明の実施の形態に係る情報処理装置は、複数のコンテンツを含む文書データを取得する取得部と、取得した文書データに含まれる前記複数のコンテンツのカテゴリを特定する特定部と、特定したカテゴリのコンテンツの組を生成する生成部と、カテゴリごとにコンテンツに対応する領域とその他の領域を異なる画素値で画像化した第１マップのセットを生成する第１マップ生成部と、一つの組に対してカテゴリごとにコンテンツに対応する領域とその他の領域を異なる画素値で画像化した第２マップのセットを前記コンテンツの組毎に生成する第２マップ生成部と、前記第１マップのセット、及び前記コンテンツの組毎の前記第２マップのセットを、コンテンツ間の関連性を学習済みの学習済みモデルに入力して、前記コンテンツの組毎のコンテンツ間の関連性の有無を判定する判定部とを備える。 An information processing device according to an embodiment of the present invention includes: an acquisition unit that acquires document data including a plurality of pieces of content; an identification unit that identifies the categories of the plurality of pieces of content included in the acquired document data; a generation unit that generates sets of content of the identified categories; a first map generation unit that generates a set of first maps in which, for each category, the areas corresponding to the content and the other areas are imaged with different pixel values; a second map generation unit that generates, for each set of content, a set of second maps in which, for that one set and for each category, the area corresponding to the content and the other areas are imaged with different pixel values; and a determination unit that inputs the set of first maps and the set of second maps for each set of content into a trained model that has learned the relevance between pieces of content, and determines, for each set of content, whether the pieces of content are related.

本発明の実施の形態に係るコンピュータプログラムは、コンピュータに、複数のコンテンツを含む文書データを取得し、取得した文書データに含まれる前記複数のコンテンツのカテゴリを特定し、特定したカテゴリのコンテンツの組を生成し、カテゴリごとにコンテンツに対応する領域とその他の領域を異なる画素値で画像化した第１マップのセットを生成し、一つの組に対してカテゴリごとにコンテンツに対応する領域とその他の領域を異なる画素値で画像化した第２マップのセットを前記コンテンツの組毎に生成し、前記第１マップのセット、及び前記コンテンツの組毎の前記第２マップのセットを、コンテンツ間の関連性を学習済みの学習済みモデルに入力して、前記コンテンツの組毎のコンテンツ間の関連性の有無を判定する、処理を実行させる。 A computer program according to an embodiment of the present invention causes a computer to execute processing of: acquiring document data including a plurality of pieces of content; identifying the categories of the plurality of pieces of content included in the acquired document data; generating sets of content of the identified categories; generating a set of first maps in which, for each category, the areas corresponding to the content and the other areas are imaged with different pixel values; generating, for each set of content, a set of second maps in which, for that one set and for each category, the area corresponding to the content and the other areas are imaged with different pixel values; inputting the set of first maps and the set of second maps for each set of content into a trained model that has learned the relevance between pieces of content; and determining, for each set of content, whether the pieces of content are related.

本発明によれば、文書内のコンテンツの関連性を精度よく判定できる。 The present invention makes it possible to accurately determine the relevance of content within a document.

本実施の形態の情報処理装置の構成の一例を示すブロック図である。 FIG. 1 is a block diagram showing an example of the configuration of an information processing device according to the present embodiment.
カテゴリ特定方法の一例を示す模式図である。 FIG. 2 is a schematic diagram showing an example of a category identification method.
所要のカテゴリのコンテンツ同士の関連性を示す関連グラフの一例を示す模式図である。 FIG. 3 is a schematic diagram showing an example of an association graph showing the relevance between contents of the required categories.
コンテンツの関連性の判定を行うための一連の処理の流れを示す模式図である。 FIG. 4 is a schematic diagram showing the flow of a series of processes for determining the relevance of content.
コンテンツの組の一例を示す模式図である。 FIG. 5 is a schematic diagram showing an example of sets of content.
コンテンツ組マップの一例を示す模式図である。 FIG. 6 is a schematic diagram showing an example of a content set map.
コンテンツ全体マップの一例を示す模式図である。 FIG. 7 is a schematic diagram showing an example of an entire content map.
座標マップの一例を示す模式図である。 FIG. 8 is a schematic diagram showing an example of a coordinate map.
特徴マップの構成の一例を示す模式図である。 FIG. 9 is a schematic diagram showing an example of the configuration of a feature map.
コンテンツ組マップの他の例を示す模式図である。 FIG. 10 is a schematic diagram showing another example of the content set map.
関連性判定部の学習方法の一例を示す模式図である。 FIG. 11 is a schematic diagram showing an example of a learning method of the relevance determination unit.
情報処理装置によるコンテンツの関連性判定結果の一例を示す模式図である。 FIG. 12 is a schematic diagram showing an example of a content relevance determination result by the information processing device.
クラスタコンテンツに対する操作の一例を示す模式図である。 FIG. 13 is a schematic diagram showing an example of an operation on cluster content.
情報処理装置によるコンテンツの関連性判定の処理手順の一例を示すフローチャートである。 FIG. 14 is a flowchart showing an example of a processing procedure for determining the relevance of content by the information processing device.

以下、本発明の実施の形態を図面に基づいて説明する。図１は本実施の形態の情報処理装置５０の構成の一例を示すブロック図である。情報処理装置５０は、通信ネットワーク１を介してサーバ１０に接続することができる。サーバ１０は、例えば、文書データを蓄積するデータサーバとすることができる。情報処理装置５０は、通信ネットワーク１を介して、サーバ１０から文書データを取得することができる。また、情報処理装置５０にはスキャナ２０を接続することができる。情報処理装置５０は、スキャナ２０で読み取って得られた文書データを取得することができる。文書データは、雑誌、書籍、新聞等の版面データであり、複数のコンテンツを含む。コンテンツは、文書内にレイアウトされる各要素である。 The following describes an embodiment of the present invention with reference to the drawings. FIG. 1 is a block diagram showing an example of the configuration of an information processing device 50 according to this embodiment. The information processing device 50 can be connected to a server 10 via a communication network 1. The server 10 can be, for example, a data server that stores document data. The information processing device 50 can acquire document data from the server 10 via the communication network 1. A scanner 20 can also be connected to the information processing device 50. The information processing device 50 can acquire document data obtained by reading with the scanner 20. The document data is page data for magazines, books, newspapers, etc., and includes multiple contents. The contents are the elements laid out in the document.

情報処理装置５０は、装置全体を制御する制御部５１、通信部５２、記憶部５３、カテゴリ特定部５４、コンテンツ組生成部５５、マップ生成部５６、関連性判定部５７、表示パネル５８、表示処理部５９、及び操作部６０を備える。情報処理装置５０は、例えば、パーソナルコンピュータ、タブレット、スマートフォン等で構成することができる。制御部５１は、ＣＰＵ（Central Processing Unit）、ＲＯＭ（Read Only Memory）及びＲＡＭ（Random Access Memory）などで構成することができる。 The information processing device 50 includes a control unit 51 that controls the entire device, a communication unit 52, a storage unit 53, a category identification unit 54, a content set generation unit 55, a map generation unit 56, a relevance determination unit 57, a display panel 58, a display processing unit 59, and an operation unit 60. The information processing device 50 can be configured as, for example, a personal computer, a tablet, or a smartphone. The control unit 51 can be configured with a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.

通信部５２は、通信ネットワーク１を介して、サーバ１０との間で通信を行う機能を有し、所要の情報の送受信を行うことができる。より具体的には、通信部５２は、サーバ１０から文書データを取得することができる。また、通信部５２は、スキャナ２０との間のインタフェース機能も備え、スキャナ２０から文書データを取得することができる。 The communication unit 52 has a function of communicating with the server 10 via the communication network 1, and can transmit and receive required information. More specifically, the communication unit 52 can acquire document data from the server 10. The communication unit 52 also has an interface function with the scanner 20, and can acquire document data from the scanner 20.

記憶部５３は、半導体メモリ又はハードディスク等で構成され、通信部５２を介して取得した文書データを記憶することができる。また、記憶部５３は、情報処理装置５０内の処理結果などの所要のデータを記憶することができる。 The storage unit 53 is composed of a semiconductor memory or a hard disk, and can store document data acquired via the communication unit 52. The storage unit 53 can also store required data such as the processing results in the information processing device 50.

表示パネル５８は、液晶ディスプレイ又は有機ＥＬ（Electro Luminescence）ディスプレイで構成することができる。 The display panel 58 can be composed of a liquid crystal display or an organic EL (Electro Luminescence) display.

表示処理部５９は、表示パネル５８に所要の情報を表示するための処理を行う。 The display processing unit 59 performs processing to display the required information on the display panel 58.

操作部６０は、例えば、キーボード、マウス等で構成することができる。また、操作部６０は、タッチパネル等で構成され、表示パネル５８上で文字の入力操作、表示パネル５８に表示されたアイコン、画像又は文字等に対する操作を行うようにしてもよい。 The operation unit 60 can be configured with, for example, a keyboard, a mouse, etc. The operation unit 60 may also be configured with a touch panel or the like, and may be used to input characters on the display panel 58, and to perform operations on icons, images, characters, etc. displayed on the display panel 58.

カテゴリ特定部５４は、特定部としての機能を有し、通信部５２を介して、取得した文書データに含まれる複数のコンテンツのカテゴリを特定する。カテゴリには、例えば、タイトル、本文（テキスト）、図（画像）、キャプション（図の説明文）などが含まれる。カテゴリの特定は、例えば、機械学習された分類器、カテゴリを表す特徴量を抽出する画像解析、カテゴリの種類を識別する画像認識、パターンマッチングなど種々の画像処理技術を用いることができる。 The category identification unit 54 functions as an identification unit and identifies the categories of the multiple contents contained in the document data acquired via the communication unit 52. The categories include, for example, title, main text, figure (image), and caption (explanatory text for a figure). The categories can be identified using various image processing techniques, such as a machine-learned classifier, image analysis that extracts feature values representing a category, image recognition that identifies the type of category, and pattern matching.

図２はカテゴリ特定方法の一例を示す模式図である。文書は、複数のコンテンツがレイアウトされ、それぞれのコンテンツは、タイトル、本文、図、キャプションなどのいずれかのカテゴリに分類される。カテゴリ特定部５４は、文書内にレイアウトされた各コンテンツのカテゴリを特定することができる。図２の例では、タイトル、本文、図、キャプションなどのカテゴリの中から、所要のカテゴリとして図及びキャプションを特定している。右側の図において、破線で囲まれたコンテンツのカテゴリが、図又はキャプションのいずれかである。なお、タイトル、本文、図、キャプションなどのカテゴリのうち、どのカテゴリを特定するかは、いずれのカテゴリのコンテンツ同士の関連性を対象とするかに応じて、予め設定することができる。以下、本明細書では、所要のカテゴリとして、図及びキャプションを例にして説明する。 Figure 2 is a schematic diagram showing an example of a category identification method. A document has multiple contents laid out, and each content is classified into one of the categories of title, text, figure, caption, etc. The category identification unit 54 can identify the category of each content laid out in the document. In the example of Figure 2, figures and captions are identified as the required category from among the categories of title, text, figure, caption, etc. In the diagram on the right, the category of content surrounded by a dashed line is either figure or caption. It should be noted that which category to identify from the categories of title, text, figure, caption, etc. can be set in advance depending on which category of content relationships is the target. In the following description, figures and captions are used as examples of the required categories.

図３は所要のカテゴリのコンテンツ同士の関連性を示す関連グラフの一例を示す模式図である。所要のカテゴリは、図及びキャプションとする。左図は、特定したカテゴリのコンテンツを表し、具体的には、カテゴリが図であるコンテンツfigure object1、figure object2、figure object3、及びカテゴリがキャプションであるコンテンツcaption object1、caption object2、caption object3がレイアウトされた図を示す。各コンテンツ同士の関連性の有無を判定することにより、右図に示すような正解グラフ（関連グラフ）を得ることができる。正解グラフに示すように、図fig１とキャプションcap1、及びcap2それぞれとが関連性があり、図fig2とキャプションcap3との間、及び図fig3とキャプションcap3との間で関連性がある。図では、関連性があるコンテンツ同士を線分で繋いでいる。 Figure 3 is a schematic diagram showing an example of an association graph showing the association between contents of a required category. The required categories are figures and captions. The left diagram shows the contents of the specified category, specifically, a diagram in which the contents figure object1, figure object2, and figure object3, which are in the category of figures, and the contents caption object1, caption object2, and caption object3, which are in the category of captions, are laid out. By determining whether or not there is an association between each piece of content, a correct answer graph (association graph) such as that shown in the right diagram can be obtained. As shown in the correct answer graph, there is an association between figure fig1 and caption cap1 and cap2, respectively, and there is an association between figure fig2 and caption cap3, and between figure fig3 and caption cap3. In the diagram, related contents are connected by lines.
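
The association graph of Figure 3 can be held as a simple adjacency structure. The following is a minimal sketch, not the embodiment's implementation; the identifiers `edges` and `build_association_graph` are hypothetical names introduced for illustration, and the edge list reproduces the related pairs stated above.

```python
from collections import defaultdict

# Related (figure, caption) pairs taken from the correct-answer graph of Figure 3.
edges = [("fig1", "cap1"), ("fig1", "cap2"), ("fig2", "cap3"), ("fig3", "cap3")]


def build_association_graph(pairs):
    """Return an undirected adjacency map: content id -> set of related content ids."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


graph = build_association_graph(edges)
print(sorted(graph["fig1"]))  # captions related to fig1
print(sorted(graph["cap3"]))  # figures related to cap3
```

An undirected representation is used here because the description treats relevance as a symmetric relation between two contents.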

次に、コンテンツの関連性の判定方法について説明する。 Next, we will explain how to determine the relevance of content.

図４はコンテンツの関連性の判定を行うための一連の処理の流れを示す模式図である。コンテンツ組生成部５５は、生成部としての機能を有し、カテゴリ特定部５４が特定したカテゴリのコンテンツの組を生成することができる。 Figure 4 is a schematic diagram showing the flow of a series of processes for determining the relevance of content. The content set generation unit 55 functions as a generation unit and can generate sets of content of the categories identified by the category identification unit 54.

図５はコンテンツの組の一例を示す模式図である。特定したカテゴリを、図（ｆｉｇ）とキャプション（ｃａｐ）とし、文書内に図のコンテンツが３つ（ｆｉｇ１、ｆｉｇ２、ｆｉｇ３）とキャプションのコンテンツが３つ（ｃａｐ１、ｃａｐ２、ｃａｐ３）存在するとする。なお、ｆｉｇ１は、図３のfigure object1に対応し、ｃａｐ１は、図３のcaption object1に対応する。他のコンテンツも同様である。コンテンツ組生成部５５は、図中、(1)～(9)で示す、カテゴリが異なるコンテンツの全ての組（９通りの組）を生成することができる。なお、ここでは、カテゴリが異なるコンテンツの組の全てを生成しているが、同じカテゴリのコンテンツの組（例えば、ｆｉｇ２とｆｉｇ３）を生成してもよい。 Figure 5 is a schematic diagram showing an example of sets of content. Assume that the identified categories are figure (fig) and caption (cap), and that the document contains three figure contents (fig1, fig2, fig3) and three caption contents (cap1, cap2, cap3). Note that fig1 corresponds to figure object1 in Figure 3, and cap1 corresponds to caption object1 in Figure 3. The same applies to the other contents. The content set generation unit 55 can generate all sets of contents in different categories (nine sets), shown as (1) to (9) in the figure. Although all sets of contents in different categories are generated here, sets of contents in the same category (for example, fig2 and fig3) may also be generated.
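
The enumeration of the nine sets can be sketched as a Cartesian product of the two categories. This is only an illustration under the assumption that each content is referred to by a string identifier; the variable names are hypothetical.

```python
from itertools import product

figures = ["fig1", "fig2", "fig3"]
captions = ["cap1", "cap2", "cap3"]

# Every cross-category set: one figure combined with one caption.
content_sets = list(product(figures, captions))

for index, (fig, cap) in enumerate(content_sets, start=1):
    print(f"({index}) {fig}-{cap}")
```

Same-category sets such as fig2 and fig3, which the description also permits, could be added with `itertools.combinations(figures, 2)`.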

マップ生成部５６は、特定したカテゴリのコンテンツのレイアウトなどを画像化した特徴マップを生成する。具体的には、マップ生成部５６は、コンテンツ組マップ、コンテンツ全体マップ、及び座標マップを生成することができる。以下、特徴マップの詳細について説明する。 The map generation unit 56 generates a feature map that images the layout and other properties of the contents of the identified categories. Specifically, the map generation unit 56 can generate a content set map, an entire content map, and a coordinate map. The feature map is described in detail below.

図６はコンテンツ組マップの一例を示す模式図である。コンテンツ組マップは、要素組のセマンティックマップとも称する。コンテンツ組マップは、コンテンツ組生成部５５で生成したコンテンツの組それぞれの文書内配置を画像化したものであり、コンテンツの組毎に生成することができる。すなわち、コンテンツ組マップは、一つの組に対してカテゴリごとにコンテンツに対応する領域とその他の領域を異なる画素値で画像化したマップ（第２マップ）であり、マップ生成部５６は、コンテンツ組マップのセットを生成することができる。カテゴリ毎にチャネルを割り当てることができ、図の例では、チャネル１に図が割り当てられ、チャネル２にキャプションが割り当てられている。 Figure 6 is a schematic diagram showing an example of a content set map. The content set map is also called a semantic map of an element set. The content set map is an image of the layout within a document of each content set generated by the content set generation unit 55, and can be generated for each content set. In other words, the content set map is a map (second map) in which the area corresponding to the content and the other areas are imaged with different pixel values for each category for one set, and the map generation unit 56 can generate a set of content set maps. A channel can be assigned for each category, and in the example shown, figures are assigned to channel 1 and captions are assigned to channel 2.

コンテンツの組をｆｉｇ１－ｃａｐ２とすると、図に関するコンテンツ組マップ（チャネル１のマップ）は、文書内のｆｉｇ１のレイアウト（配置）を画像化したものとすることができ、キャプションに関するコンテンツ組マップ（チャネル２のマップ）は、文書内のｃａｐ２のレイアウト（配置）を画像化したものとすることができる。すなわち、１つのコンテンツの組に対応して、チャネル１、２それぞれの特徴マップを生成することができる。 If the content set is fig1-cap2, the content set map for the figure (channel 1 map) can be an image of the layout (placement) of fig1 within the document, and the content set map for the caption (channel 2 map) can be an image of the layout (placement) of cap2 within the document. In other words, feature maps for channels 1 and 2 can be generated corresponding to one content set.

また、コンテンツの組をｆｉｇ２－ｃａｐ１とすると、図に関するコンテンツ組マップ（チャネル１のマップ）は、文書内のｆｉｇ２のレイアウト（配置）を画像化したものとすることができ、キャプションに関するコンテンツ組マップ（チャネル２のマップ）は、文書内のｃａｐ１のレイアウト（配置）を画像化したものとすることができる。以下、同様にして、コンテンツの全ての組に対して、コンテンツ組マップを生成することができる。コンテンツの組が９通りある場合、図及びキャプションに関するコンテンツ組マップをそれぞれ９個生成することができる。 Furthermore, if the content set is fig2-cap1, then the content set map for figures (channel 1 map) can be an image of the layout (placement) of fig2 within the document, and the content set map for captions (channel 2 map) can be an image of the layout (placement) of cap1 within the document. In the same manner, content set maps can be generated for all content sets. If there are nine content sets, nine content set maps each for figures and captions can be generated.

コンテンツ組マップの画素値は、例えば、コンテンツ領域の画素値を１とし、コンテンツ領域以外の背景の画素値を０とすることができるが、これに限定されるものではない。 The pixel values of the content set map can be, for example, 1 for the content area and 0 for the background outside the content area, but are not limited to this.
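
A two-channel content set map with pixel values 1 and 0 can be sketched as follows. This is a minimal illustration, assuming that each content is described by a hypothetical bounding box `(x0, y0, x1, y1)` in pixel coordinates with exclusive right and bottom edges; the box format, the `layout` values, and the function names are not taken from the source.

```python
def rasterize(boxes, height, width):
    """Return a height x width mask: 1 inside any box, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask


def content_set_map(fig_id, cap_id, layout, height, width):
    """Channel 1 holds only the figure of the set, channel 2 only its caption."""
    return [
        rasterize([layout[fig_id]], height, width),
        rasterize([layout[cap_id]], height, width),
    ]


# Hypothetical layout on a 6 x 6 page.
layout = {"fig1": (1, 1, 4, 3), "cap2": (1, 4, 4, 5)}
set_map = content_set_map("fig1", "cap2", layout, height=6, width=6)

for row in set_map[0]:
    print(row)
```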

図７はコンテンツ全体マップの一例を示す模式図である。コンテンツ全体マップは、要素全体のセマンティックマップとも称する。コンテンツ全体マップは、特定したカテゴリのコンテンツ全体の文書内配置を画像化したものである。すなわち、コンテンツ全体マップは、カテゴリごとにコンテンツに対応する領域とその他の領域を異なる画素値で画像化したマップ（第１マップ）である。マップ生成部５６は、コンテンツ全体マップのセットをコンテンツの組毎に生成する。カテゴリ毎にチャネルを割り当てることができ、図の例では、チャネル１に図が割り当てられ、チャネル２にキャプションが割り当てられている。 Figure 7 is a schematic diagram showing an example of an entire content map. The entire content map is also called a semantic map of all elements. The entire content map is an image of the layout of the entire content of a specified category within a document. In other words, the entire content map is a map (first map) in which the area corresponding to the content for each category and the other areas are imaged with different pixel values. The map generation unit 56 generates a set of entire content maps for each set of content. A channel can be assigned to each category; in the example shown, the figure is assigned to channel 1 and the caption is assigned to channel 2.

コンテンツ全体マップは、特定したカテゴリ毎に生成することができる。図に関するコンテンツ全体マップ（チャネル１のマップ）は、文書内のｆｉｇ１、ｆｉｇ２、ｆｉｇ３のレイアウト（配置）を画像化したものとすることができる。キャプションに関するコンテンツ全体マップ（チャネル２のマップ）は、文書内のｃａｐ１、ｃａｐ２、ｃａｐ３のレイアウト（配置）を画像化したものとすることができる。 An overall content map can be generated for each identified category. The overall content map for figures (channel 1 map) can be an image of the layout (placement) of fig1, fig2, and fig3 within the document. The overall content map for captions (channel 2 map) can be an image of the layout (placement) of cap1, cap2, and cap3 within the document.

コンテンツ全体マップの画素値は、例えば、コンテンツ領域の画素値を１とし、コンテンツ領域以外の背景の画素値を０とすることができるが、これに限定されるものではない。 The pixel values of the entire content map can be, for example, 1 for the content area and 0 for the background outside the content area, but are not limited to this.
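
The entire content map differs from the per-set map only in that every content of a category is drawn into that category's channel. The sketch below assumes a hypothetical list of `(category, box)` entries with boxes given as `(x0, y0, x1, y1)` and exclusive right and bottom edges; none of these names or values come from the source.

```python
def entire_content_map(contents, categories, height, width):
    """Return one binary channel per category covering all contents of that category."""
    channels = []
    for category in categories:
        mask = [[0] * width for _ in range(height)]
        for content_category, (x0, y0, x1, y1) in contents:
            if content_category != category:
                continue
            for y in range(y0, y1):
                for x in range(x0, x1):
                    mask[y][x] = 1
        channels.append(mask)
    return channels


# Hypothetical page of 4 rows x 6 columns with two figures and two captions.
contents = [
    ("figure", (0, 0, 2, 2)),
    ("figure", (3, 0, 5, 2)),
    ("caption", (0, 2, 2, 3)),
    ("caption", (3, 2, 5, 3)),
]
entire_map = entire_content_map(contents, ["figure", "caption"], height=4, width=6)

for row in entire_map[0]:
    print(row)
```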

図８は座標マップの一例を示す模式図である。座標マップは、文書内の位置情報を画像化したものである。文書の画像が（ｍ×ｎ）画素で構成されるとする（ｍ：ｙ軸方向の画素数、ｎ：ｘ軸方向の画素数）。座標マップは、チャネル１が割り当てられる、ｘ軸の座標マップ、及びチャネル２が割り当てられる、ｙ軸の座標マップで構成することができる。ｘ軸の座標マップは、画素値がｘ軸方向の位置を表すものであり、ｘ軸方向の画素０、１、２、…、ｎそれぞれに対して、画素値ｘ₀、ｘ₁、ｘ₂、…、ｘ_nが与えられている。図の例では、画素値ｘ₀、ｘ₁、ｘ₂、…、ｘ_nを、０、０．１、０．２、…としている。ｙ軸の座標マップは、画素値がｙ軸方向の位置を表すものであり、ｙ軸方向の画素０、１、２、…、ｍそれぞれに対して、画素値ｙ₀、ｙ₁、ｙ₂、…、ｙ_nが与えられている。図の例では、画素値ｙ₀、ｙ₁、ｙ₂、…、ｙ_nを、０、０．１、０．２、…としている。画素値の値が大きくなるに応じて画像は明るくなる。 FIG. 8 is a schematic diagram showing an example of a coordinate map. The coordinate map is an image of position information in a document. Assume that the image of the document is composed of (m×n) pixels (m: number of pixels in the y-axis direction, n: number of pixels in the x-axis direction). The coordinate map can be composed of an x-axis coordinate map, to which channel 1 is assigned, and a y-axis coordinate map, to which channel 2 is assigned. In the x-axis coordinate map, pixel values represent positions in the x-axis direction, and pixel values x₀, x₁, x₂, ..., x_n are given to pixels 0, 1, 2, ..., n in the x-axis direction, respectively. In the example shown in the figure, pixel values x₀, x₁, x₂, ..., x_n are set to 0, 0.1, 0.2, .... In the y-axis coordinate map, pixel values represent positions in the y-axis direction, and pixel values y₀, y₁, y₂, ..., y_n are given to pixels 0, 1, 2, ..., m in the y-axis direction, respectively. In the illustrated example, pixel values y₀, y₁, y₂, ..., y_n are set to 0, 0.1, 0.2, .... As the pixel value increases, the image becomes brighter.
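
The two coordinate channels can be sketched as follows. The source only gives the example values 0, 0.1, 0.2, ...; scaling the positions linearly to the range 0 to 1 is an assumption made here, and `coordinate_maps` is a hypothetical name.

```python
def coordinate_maps(height, width):
    """Return (x_map, y_map), each height x width, with positions scaled to [0, 1]."""
    x_scale = max(width - 1, 1)   # avoid division by zero for a one-pixel axis
    y_scale = max(height - 1, 1)
    x_map = [[x / x_scale for x in range(width)] for _ in range(height)]
    y_map = [[y / y_scale for _ in range(width)] for y in range(height)]
    return x_map, y_map


# With 11 columns the x values step by 0.1, as in the illustrated example.
x_map, y_map = coordinate_maps(height=3, width=11)
print(x_map[0][:3])
print([row[0] for row in y_map])
```

Every row of the x-axis map is identical and every column of the y-axis map is identical, so the pair encodes the absolute position of each pixel independently of the page content.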

図９は特徴マップの構成の一例を示す模式図である。特徴マップは、図（チャネル１）及びキャプション（チャネル２）それぞれのコンテンツ組マップ、図（チャネル１）及びキャプション（チャネル２）それぞれのコンテンツ全体マップ、及び座標マップ（ｘ軸及びｙ軸）の最終的に６チャネルのマップを結合することにより、構成することができる。なお、座標マップは必須の構成ではないが、座標マップを用いることにより、コンテンツ間の関連性の判定精度を向上させることができる。 Figure 9 is a schematic diagram showing an example of the configuration of a feature map. The feature map can be constructed by combining the maps of six channels, namely the content set maps for the figure (channel 1) and caption (channel 2), the overall content maps for the figure (channel 1) and caption (channel 2), and the coordinate map (x-axis and y-axis). Note that the coordinate map is not a required configuration, but by using the coordinate map, the accuracy of determining the relevance between contents can be improved.
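
Combining the maps into a six-channel feature map amounts to concatenating channels of identical size. The following sketch uses nested lists and a hypothetical function name `build_feature_map`; the channel order (content set map, entire content map, coordinate map) is an assumption, since the source does not fix an order.

```python
def build_feature_map(set_map, entire_map, coord_map=None):
    """Concatenate channels; the coordinate map is optional, as in the description."""
    channels = list(set_map) + list(entire_map)
    if coord_map is not None:
        channels += list(coord_map)
    height, width = len(channels[0]), len(channels[0][0])
    for channel in channels:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("all channels must share the same height and width")
    return channels


# Tiny 2 x 2 placeholders standing in for real maps.
set_map = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]        # figure / caption of one set
entire_map = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]     # all figures / all captions
coord_map = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]  # x / y

feature_map = build_feature_map(set_map, entire_map, coord_map)
print(len(feature_map))  # number of channels
```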

関連性判定部５７は、判定部としての機能を有し、特徴マップを入力することにより、コンテンツの組毎にコンテンツ間の関連性の有無を判定することができる。すなわち、関連性判定部５７は、コンテンツ組マップのセット、及びコンテンツ全体マップのセットを、コンテンツ間の関連性を学習済みの学習済みモデルに入力して、コンテンツの組毎のコンテンツ間の関連性の有無を判定することができる。学習済みモデルは、例えば、畳み込みニューラルネットワークで構成することができるが、これに限定されるものではなく、他のニューラルネットワークで構成してもよい。また、ＳＶＭ（Support Vector Machine）、ベイジアンネットワークなど他の機械学習済みのモデルを用いてもよい。 The relevance determination unit 57 functions as a determination unit and, given a feature map as input, can determine for each set of content whether the contents are related. That is, the relevance determination unit 57 can input the set of content set maps and the set of entire content maps into a trained model that has learned the relevance between contents, and determine for each set of content whether the contents are related. The trained model can be configured as, for example, a convolutional neural network, but is not limited to this and may be configured as another type of neural network. Other machine-learned models such as an SVM (Support Vector Machine) or a Bayesian network may also be used.

上述のように、コンテンツの組毎に、文書内の当該コンテンツのレイアウトを画像化して得られたコンテンツ組マップを用いてコンテンツ間の関連性の有無を判定するので、単にコンテンツ同士の相対位置（相対座標）に基づいて判定する場合に比べて、関連性の有無の判定に用いられる情報量を増やすことができ、コンテンツ間の関連性の有無の判定を精度よく行うことができる。 As described above, for each set of content, the presence or absence of a relationship between the contents is determined using a content set map obtained by imaging the layout of the content within the document. This allows a greater amount of information to be used to determine the presence or absence of a relationship compared to when the determination is simply based on the relative positions (relative coordinates) of the contents, making it possible to more accurately determine the presence or absence of a relationship between the contents.

また、コンテンツ全体マップを用いることにより、コンテンツｆｉｇ１及びｃａｐ１のレイアウト（コンテンツ組マップ）だけでなく、文書内の他のコンテンツとの関連性、例えば、コンテンツｆｉｇ１とｃａｐ１以外のカテゴリがキャプションのコンテンツとの関連性、及びコンテンツｃａｐ１とｆｉｇ１以外のカテゴリが図のコンテンツとの関連性も考慮することができ、文書内のコンテンツの関連性の有無をさらに精度良く判定することができる。 In addition, by using the entire content map, it is possible to take into account not only the layout of contents fig1 and cap1 (the content set map) but also their relationships with other contents in the document, for example, the relationship between content fig1 and caption-category contents other than cap1, and the relationship between content cap1 and figure-category contents other than fig1. This makes it possible to determine even more accurately whether contents in the document are related.

さらに、コンテンツ全体マップ（コンテンツの各組に対して共通）及びコンテンツ組マップ（コンテンツの組毎に異なる）に加えて、座標マップ（コンテンツの各組に対して共通）を用いて、コンテンツの組毎に関連性を判定することにより、コンテンツ全体マップ及びコンテンツ組マップ内の各コンテンツの文書内での位置関係を把握できる情報を加味して関連性を判定できるので、文書内のコンテンツの関連性の有無をさらに精度良く判定することができる。 Furthermore, by using a coordinate map (common to each set of content) in addition to the overall content map (common to each set of content) and the content set map (different for each set of content), relevance can be determined for each set of content by taking into account information that can grasp the positional relationship within the document of each piece of content in the overall content map and content set map, so that the relevance of content within a document can be determined with even greater accuracy.

図１０はコンテンツ組マップの他の例を示す模式図である。図６に例示したコンテンツ組マップは、図にチャネル１を割り当て、キャプションにチャネル２を割り当てて、２つのチャネルを用いて、コンテンツ組マップを生成するものであった。図１０の例では、マップ生成部５６は、特定したカテゴリ毎にコンテンツの画素値が異なるコンテンツ組マップを生成することができる。例えば、図１０に示すように、図を模様のない矩形で表し、キャプションを模様（斜線）のある矩形で表す。文書内の図の領域の画素の画素値をａとし、キャプションの領域の画素の画素値をｂとし、図及びキャプション以外の領域の画素の画素値をｃとすることができる。画素値の違いは、輝度の違いを表すものでもよく、色の違いを表すものでもよい。これにより、１つのコンテンツの組に対応して、１チャネルのコンテンツ組マップを生成することができる。 Figure 10 is a schematic diagram showing another example of a content set map. The content set map illustrated in Figure 6 uses two channels, with channel 1 assigned to the figure and channel 2 assigned to the caption. In the example of Figure 10, the map generation unit 56 can generate a content set map in which the pixel values of the contents differ for each identified category. For example, as shown in Figure 10, the figure is represented by a rectangle with no pattern, and the caption is represented by a rectangle with a pattern (diagonal lines). The pixel value of the pixels in the figure area in the document can be a, the pixel value of the pixels in the caption area can be b, and the pixel value of the pixels in the area other than the figure and the caption can be c. The difference in pixel value may represent a difference in brightness or a difference in color. This makes it possible to generate a one-channel content set map for each content set.
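
The one-channel variant can be sketched by writing a different value per category into a single image. The concrete values chosen for a, b, and c below, the box format `(x0, y0, x1, y1)` with exclusive right and bottom edges, and the function name are all assumptions for illustration.

```python
def single_channel_set_map(fig_box, cap_box, height, width, a=1.0, b=0.5, c=0.0):
    """Return one height x width image: a inside the figure, b inside the caption, c elsewhere."""
    image = [[c] * width for _ in range(height)]
    for value, (x0, y0, x1, y1) in ((a, fig_box), (b, cap_box)):
        for y in range(y0, y1):
            for x in range(x0, x1):
                image[y][x] = value
    return image


# Hypothetical set: a figure in the top rows with its caption directly below.
image = single_channel_set_map((0, 0, 2, 2), (0, 2, 2, 3), height=4, width=3)
for row in image:
    print(row)
```

Compared with the two-channel form, this halves the number of input channels, at the cost of requiring the values a, b, and c to be mutually distinguishable.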

なお、図示していないが、マップ生成部５６は、特定したカテゴリ毎にコンテンツの画素値が異なるコンテンツ全体マップを生成することができる。すなわち、文書内の各図の領域の画素の画素値をａとし、各キャプションの領域の画素の画素値をｂとし、各図及び各キャプション以外の領域の画素の画素値をｃとすることができる。これにより、１チャネルのコンテンツ全体マップを生成することができる。 Although not shown, the map generator 56 can generate an entire content map in which the pixel values of the content differ for each identified category. That is, the pixel value of the pixels in the area of each figure in the document can be a, the pixel value of the pixels in the area of each caption can be b, and the pixel value of the pixels in the area other than each figure and each caption can be c. This makes it possible to generate an entire content map for one channel.

次に、関連性判定部５７の学習方法について説明する。 Next, we will explain the learning method of the relevance determination unit 57.

図１１は関連性判定部５７の学習方法の一例を示す模式図である。予め多数の文書データを収集しておき、各文書内のカテゴリが図のコンテンツと、カテゴリがキャプションのコンテンツの全ての組を生成し、生成したコンテンツの組に、関連ありのラベル、及び関連なしのラベルを付与する。生成したコンテンツの組に基づいて、特徴マップを生成する。訓練用データは、関連ありのラベルが付与された特徴マップ、及び関連なしのラベルが付与された特徴マップを含めることができる。 Figure 11 is a schematic diagram showing an example of a learning method for the relevance determination unit 57. A large amount of document data is collected in advance, and all pairs of content in the category of figures and content in the category of captions in each document are generated, and related and unrelated labels are assigned to the generated content pairs. A feature map is generated based on the generated content pairs. The training data can include feature maps that have been assigned related labels and feature maps that have been assigned unrelated labels.

図１１の例では、コンテンツｆｉｇ１とコンテンツｃａｐ２とは関連性があるので、コンテンツｆｉｇ１とコンテンツｃａｐ２とに基づいて生成された特徴マップと、関連ありのラベルを用いてニューラルネットワークを学習させて、関連性判定部５７を生成することができる。訓練用データには、他の関連性のあるコンテンツの組についての特徴マップと関連ありのラベルが含まれることは言うまでもない。 In the example of FIG. 11, content fig1 and content cap2 are related, so the relevance determination unit 57 can be generated by training a neural network with the feature map generated from content fig1 and content cap2 together with the "related" label. Needless to say, the training data also includes feature maps and "related" labels for other related content pairs.

同様に、コンテンツｆｉｇ２とコンテンツｃａｐ１とは関連性がないので、コンテンツｆｉｇ２とコンテンツｃａｐ１とに基づいて生成された特徴マップと、関連なしのラベルを用いてニューラルネットワークを学習させて、関連性判定部５７を生成することができる。訓練用データには、他の関連性のないコンテンツの組についての特徴マップと関連なしのラベルが含まれることは言うまでもない。 Similarly, since content fig2 and content cap1 are unrelated, a neural network can be trained using a feature map generated based on content fig2 and content cap1 and an unrelated label to generate a relevance determination unit 57. Needless to say, the training data includes feature maps and unrelated labels for other unrelated content pairs.
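
The labeling of training samples described above can be sketched as follows. The encoding of "related" as 1 and "unrelated" as 0, the function name `label_pairs`, and the use of the related pairs from Figure 3 as ground truth are assumptions for illustration; in practice each pair would be converted into a feature map before training.

```python
from itertools import product


def label_pairs(figures, captions, related_pairs):
    """Return [((figure, caption), label)] with label 1 = related, 0 = unrelated."""
    related = set(related_pairs)
    return [
        ((fig, cap), 1 if (fig, cap) in related else 0)
        for fig, cap in product(figures, captions)
    ]


figures = ["fig1", "fig2", "fig3"]
captions = ["cap1", "cap2", "cap3"]
ground_truth = [("fig1", "cap1"), ("fig1", "cap2"), ("fig2", "cap3"), ("fig3", "cap3")]

samples = label_pairs(figures, captions, ground_truth)
labels = dict(samples)
print(labels[("fig1", "cap2")], labels[("fig2", "cap1")])
```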

なお、ニューラルネットワークの学習において、損失関数は、二乗誤差関数など適宜決定することができるが、バイナリ交差（クロス）エントロピーを用いてもよい。交差エントロピーは、訓練用データが教師データと同じになる確率の対数関数で表されるので、教師データと学習結果との差が大きい場合、損失関数が大きくなり、学習の都度の損失関数の減少幅が大きくなり、学習速度を早くできる。 In training the neural network, the loss function can be chosen as appropriate, for example a squared error function, and binary cross-entropy may also be used. Cross-entropy is expressed as a logarithmic function of the probability that the training data becomes the same as the teacher data. Therefore, when the difference between the teacher data and the learning result is large, the loss becomes large, the loss decreases by a larger amount at each training step, and learning can proceed faster.
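
The difference between the two loss functions can be illustrated numerically. The sketch below uses the standard textbook definitions of binary cross-entropy and squared error for a single predicted probability `p` and label `y`; the clipping constant `eps` and the function names are assumptions, and the source does not specify how the loss is computed in the embodiment.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """-[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def squared_error(p, y):
    return (p - y) ** 2


# A related pair (label 1) predicted with decreasing confidence.
for p in (0.99, 0.5, 0.01):
    print(p, round(binary_cross_entropy(p, 1), 4), round(squared_error(p, 1), 4))
```

For a badly wrong prediction the logarithm makes the cross-entropy grow without bound, whereas the squared error cannot exceed 1, which is consistent with the larger loss described above when the teacher data and the learning result differ greatly.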

図１２は情報処理装置５０によるコンテンツの関連性判定結果の一例を示す模式図である。図１２Ａ及び図１２Ｂでは、文書内に３つの画像と３つのキャプションがレイアウトされている。図１２Ａは比較例であり、図１２Ｂは本実施の形態の情報処理装置５０による関連性判定結果を示す。図１２Ａの比較例では、コンテンツ同士の相対位置だけが考慮されている。このため、文書内における距離の近いコンテンツ同士に関連性があると判定してしまう傾向があり、点線で示すように、関連性のあるコンテンツを検出できない場合、あるいは、一点鎖線で示すように、関連性のないコンテンツを関連性があると誤検出する場合がある。これに対して、図１２Ｂに示すように、本実施の形態の情報処理装置５０によれば、関連性のあるコンテンツを全て正確に検出していることが分かる。 Figure 12 is a schematic diagram showing an example of a content relevance determination result by the information processing device 50. In Figures 12A and 12B, three images and three captions are laid out in a document. Figure 12A is a comparative example, and Figure 12B shows a relevance determination result by the information processing device 50 of this embodiment. In the comparative example of Figure 12A, only the relative positions of the contents are taken into consideration. For this reason, there is a tendency to determine that contents close to each other in a document are related, and as shown by the dotted line, there are cases where related content cannot be detected, or unrelated content is erroneously detected as related, as shown by the dashed and dotted line. In contrast, as shown in Figure 12B, it can be seen that the information processing device 50 of this embodiment accurately detects all related content.

In FIGS. 12C and 12D, two images and two captions are laid out in a document. FIG. 12C is a comparative example, and FIG. 12D shows the relevance determination result obtained by the information processing device 50 of this embodiment. In the comparative example of FIG. 12C, only the relative positions of the contents are considered. Consequently, as shown by the dash-dot line, unrelated content may be erroneously detected as related. In contrast, as shown in FIG. 12D, the information processing device 50 of this embodiment correctly detects all related content.

The display processing unit 59 functions as an output unit and can output, based on the determination result of the relevance determination unit 57, identification information that identifies the relevance of the contents of the categories in the document. The identification information may be, for example, a line segment connecting related contents in the document, as illustrated in FIG. 12, or a frame of the same color surrounding the related contents. The identification information may be displayed at all times; when constant display is unnecessary, it may instead be displayed in response to a predetermined operation so that the association can be seen. This allows the user to easily recognize related contents.

FIG. 13 is a schematic diagram showing an example of an operation on cluster content. Cluster content is content that the relevance determination unit 57 has determined to be related. As shown in FIG. 13, a document in which multiple contents are arranged (for example, one page, or the equivalent of a two-page spread) is displayed on the display panel 58. In the example of FIG. 13, the displayed contents are a title, main text A, main text B, image (figure) A (figure object A), caption A (caption object A), and caption B (caption object B). It is assumed that image (figure) A and captions A and B are related to one another.

As shown in the left diagram of FIG. 13, when the icon 100 is brought close to image A (or to the periphery of image A, or to caption A or B) and a touch operation and a drag operation are performed, captions A and B can be moved together with image A in the same way, as shown in the right diagram. Image A, caption A, and caption B constitute one cluster content 101.

In this way, when the display processing unit 59 receives an operation to select cluster content displayed on the display panel 58, it can display each of the contents associated by the cluster content in the selected display mode. For example, when an operation is performed to select one content in the cluster content displayed on the display panel 58, or the periphery of that content, and the selection is moved (dragged) on the display panel 58, all of the contents in the cluster content are displayed in the selected display mode and the entire cluster content can be moved (dragged). This eliminates the need to repeat the same operation for each related content and improves the operability of the contents in the document.
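The cluster behavior described above can be sketched as follows: related pairs are merged into clusters, and a drag on any member moves the whole cluster. This is a minimal illustration under stated assumptions; the union-find grouping, the function names, and the sample coordinates are hypothetical and are not specified by the embodiment.

```python
def build_clusters(related_pairs):
    """Merge pairs judged as related into clusters (simple union-find)."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path compression
            item = parent[item]
        return item

    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for item in parent:
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())

def drag(positions, clusters, picked, dx, dy):
    """Move the picked content and every content in the same cluster."""
    members = next((c for c in clusters if picked in c), {picked})
    return {
        name: (x + dx, y + dy) if name in members else (x, y)
        for name, (x, y) in positions.items()
    }

# Figure A is related to captions A and B, as in the FIG. 13 example.
clusters = build_clusters([("figure_A", "caption_A"), ("figure_A", "caption_B")])
positions = {"title": (0, 0), "figure_A": (10, 20),
             "caption_A": (10, 40), "caption_B": (30, 40)}
moved = drag(positions, clusters, "caption_B", dx=5, dy=-5)
```

Dragging caption B here moves figure A and caption A by the same offset, while the title stays where it was.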

FIG. 14 is a flowchart showing an example of the processing procedure for content relevance determination by the information processing device 50. For convenience, the control unit 51 is described below as the entity that performs the processing. The control unit 51 acquires document data including multiple contents (S11) and identifies the categories of the contents (S12). The categories include, for example, title, main text, figure (image), and caption (explanatory text for a figure). Here, figure and caption can be identified as the required categories.

The control unit 51 generates content pairs for the identified categories (S13). The content pairs can be, for example, all of the pairs illustrated in FIG. 5. The control unit 51 generates an overall content map (S14). The overall content map can be, for example, the map illustrated in FIG. 7.

The control unit 51 generates a coordinate map (S15). The coordinate map can be, for example, the map illustrated in FIG. 8. The control unit 51 generates a content group map (S16). The content group map can be, for example, one of the maps illustrated in FIG. 6.

The control unit 51 combines the overall content map, the content group map, and the coordinate map to construct a feature map (S17). The feature map can be, for example, the map illustrated in FIG. 9. The coordinate map does not have to be used; in that case, the processing of step S15 is unnecessary.
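Step S17 can be pictured as stacking the individual maps as channels of one input. The sketch below is only an illustration: the function names, the nested-list representation, and the channel counts (two overall maps, two pair maps, two coordinate maps) are assumptions, and the coordinate maps are optional as noted above.

```python
def blank(height, width, value=0.0):
    """Hypothetical helper: a height x width map filled with one value."""
    return [[value] * width for _ in range(height)]

def build_feature_map(overall_maps, pair_maps, coord_maps=()):
    """Stack the maps as channels; coord_maps may be omitted (no step S15)."""
    channels = list(overall_maps) + list(pair_maps) + list(coord_maps)
    shapes = {(len(ch), len(ch[0])) for ch in channels}
    if len(shapes) != 1:
        raise ValueError("all maps must share the same height and width")
    return channels

H, W = 4, 6
overall = [blank(H, W), blank(H, W)]  # figure map, caption map
pair = [blank(H, W), blank(H, W)]     # maps for one content pair
coords = [blank(H, W), blank(H, W)]   # x-axis map, y-axis map

with_coords = build_feature_map(overall, pair, coords)
without_coords = build_feature_map(overall, pair)
```

Under these assumptions the feature map has six channels with the coordinate maps and four without them.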

The control unit 51 inputs the feature map to the relevance determination unit 57 and determines whether the contents of the pair are related (S18). The control unit 51 then determines whether all content pairs have been judged (S19). If not all content pairs have been judged (NO in S19), it selects an unprocessed content pair (S20) and continues the processing from step S16.

When all content pairs have been judged (YES in S19), the control unit 51 records the related contents as cluster content (S21), outputs identification information that identifies the relevance of the contents (S22), and ends the processing. The identification information can be, for example, the line segments connecting contents illustrated in FIG. 12.
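The loop over steps S16 to S21 can be sketched as below. This is a hedged outline only: `judge_document`, `make_pair_maps`, the callable `model`, and the 0.5 threshold are hypothetical stand-ins, and the toy model simply looks up a fixed score instead of running a trained network.

```python
def judge_document(pairs, make_pair_maps, overall_maps, model, threshold=0.5):
    """Judge every content pair and return the pairs judged as related."""
    related = []
    for pair in pairs:                                      # S19/S20: next pair
        pair_maps = make_pair_maps(pair)                    # S16: maps for this pair
        feature_map = list(overall_maps) + list(pair_maps)  # S17: combine maps
        score = model(feature_map)                          # S18: trained model output
        if score >= threshold:                              # assumed decision rule
            related.append(pair)
    return related                                          # S21: record related pairs

# Toy stand-ins so the sketch runs; a real system would rasterize maps
# and call the trained relevance determination model here.
pairs = [("fig1", "cap1"), ("fig1", "cap2")]
toy_scores = {("fig1", "cap1"): 0.9, ("fig1", "cap2"): 0.1}
related = judge_document(
    pairs,
    make_pair_maps=lambda pair: [pair],
    overall_maps=[],
    model=lambda feature_map: toy_scores[feature_map[-1]],
)
```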

In the above example, a content group map is generated in step S16 each time a content pair is selected, but the configuration is not limited to this. For example, content group maps may be generated for all pairs in step S16, and the processing of step S17 may then be repeated each time a content pair is selected to construct the feature map.

The information processing device 50 can also be realized using a computer equipped with, for example, a CPU (for example, a multiprocessor with multiple processor cores), a GPU (Graphics Processing Unit), and RAM. A computer program that defines the processing procedure shown in FIG. 14 (and that can be recorded on a recording medium) is loaded into the RAM of the computer and executed by the CPU (processor), whereby the information processing device 50 is realized on the computer.

In the above example, the presence or absence of relevance is determined between content whose category is figure and content whose category is caption. However, the categories are not limited to figure and caption, and relevance to content of other categories can also be determined. Relevance among multiple contents whose category is figure may also be determined. The user may be allowed to set which categories of content are subject to relevance determination.

The information processing device of this embodiment includes: an acquisition unit that acquires document data including multiple contents; an identification unit that identifies the categories of the multiple contents included in the acquired document data; a generation unit that generates content pairs of the identified categories; a first map generation unit that generates a set of first maps in which, for each category, the areas corresponding to the contents and the other areas are imaged with different pixel values; a second map generation unit that generates, for each content pair, a set of second maps in which, for each category of that one pair, the area corresponding to the content and the other areas are imaged with different pixel values; and a determination unit that inputs the set of first maps and the set of second maps for each content pair into a trained model that has learned the relevance between contents, and determines whether the contents of each content pair are related.

The computer program of this embodiment causes a computer to execute processing of: acquiring document data including multiple contents; identifying the categories of the multiple contents included in the acquired document data; generating content pairs of the identified categories; generating a set of first maps in which, for each category, the areas corresponding to the contents and the other areas are imaged with different pixel values; generating, for each content pair, a set of second maps in which, for each category of that one pair, the area corresponding to the content and the other areas are imaged with different pixel values; and inputting the set of first maps and the set of second maps for each content pair into a trained model that has learned the relevance between contents, and determining whether the contents of each content pair are related.

The acquisition unit acquires document data including multiple contents. The contents are the individual elements laid out in the document. The document data is page data for magazines, books, newspapers, and the like; it may be acquired from a data server that stores document data or read by a reading device such as a scanner.

The identification unit identifies the categories of the multiple contents included in the acquired document data. The categories include, for example, title, main text, figure (image), and caption (explanatory text for a figure). The identification unit can identify the category of each element laid out in the document.

The generation unit generates content pairs of the identified categories. Suppose the identified categories are figure (fig) and caption (cap), and the document contains three figures (fig1, fig2, fig3) and three captions (cap1, cap2, cap3). The generation unit can, for example, generate all pairs of contents of different categories (the fig1-cap1 pair, the fig1-cap2 pair, and so on). In this case, nine pairs can be generated.
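The nine pairs in this example are the Cartesian product of the figures and the captions. A minimal sketch, in which the function name `generate_pairs` is an assumption for illustration:

```python
from itertools import product

figures = ["fig1", "fig2", "fig3"]
captions = ["cap1", "cap2", "cap3"]

def generate_pairs(figs, caps):
    """Return every (figure, caption) combination as a candidate pair."""
    return list(product(figs, caps))

pairs = generate_pairs(figures, captions)
```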

The first map generation unit generates a set of first maps in which, for each category, the areas corresponding to the contents and the other areas are imaged with different pixel values. A first map can be generated for each identified category. For example, when the category is figure, the first map for figures can be an image of the layout (arrangement) of fig1, fig2, and fig3 in the document; for example, the pixel values corresponding to fig1, fig2, and fig3 can be set to 1, and the pixel values corresponding to everything else in the document can be set to 0. Similarly, when the category is caption, the first map for captions can be an image of the layout (arrangement) of cap1, cap2, and cap3 in the document; for example, the pixel values corresponding to cap1, cap2, and cap3 can be set to 1, and the pixel values corresponding to everything else in the document can be set to 0.
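The first maps described above amount to one binary mask per category. The following sketch assumes each content is an axis-aligned box `(x0, y0, x1, y1)` on a small pixel grid; the box coordinates, the grid size, and the helper name `rasterize` are hypothetical.

```python
def rasterize(boxes, height, width):
    """Return a height x width grid with 1 inside any box and 0 elsewhere."""
    grid = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = 1
    return grid

WIDTH, HEIGHT = 8, 6
# Hypothetical layout: each caption sits directly below its figure.
layout = {
    "fig": {"fig1": (0, 0, 3, 2), "fig2": (4, 0, 7, 2), "fig3": (0, 3, 3, 5)},
    "cap": {"cap1": (0, 2, 3, 3), "cap2": (4, 2, 7, 3), "cap3": (0, 5, 3, 6)},
}

# One first map per category, covering all contents of that category.
first_maps = {
    category: rasterize(items.values(), HEIGHT, WIDTH)
    for category, items in layout.items()
}
```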

The second map generation unit generates, for each content pair, a set of second maps in which, for each category of that one pair, the area corresponding to the content and the other areas are imaged with different pixel values. When the content pair is fig1-cap1, the second map for the figure can be an image of the layout (arrangement) of fig1 in the document, and the second map for the caption can be an image of the layout (arrangement) of cap1 in the document. That is, two second maps can be generated for one content pair. When the content pair is fig1-cap2, the second map for the figure can be an image of the layout of fig1 in the document, and the second map for the caption can be an image of the layout of cap2 in the document. Second maps can be generated in the same way for all content pairs. When there are nine content pairs, nine second maps for figures and nine second maps for captions can be generated.
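The second maps differ from the first maps only in that each mask contains a single content of the pair. A minimal sketch under the same assumptions as before (hypothetical box coordinates, grid size, and helper names):

```python
from itertools import product

def rasterize(boxes, height, width):
    """Return a height x width grid with 1 inside any box and 0 elsewhere."""
    grid = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = 1
    return grid

WIDTH, HEIGHT = 8, 6
figs = {"fig1": (0, 0, 3, 2), "fig2": (4, 0, 7, 2), "fig3": (0, 3, 3, 5)}
caps = {"cap1": (0, 2, 3, 3), "cap2": (4, 2, 7, 3), "cap3": (0, 5, 3, 6)}

# For each pair: (figure-only map, caption-only map).
second_maps = {
    (f, c): (rasterize([figs[f]], HEIGHT, WIDTH), rasterize([caps[c]], HEIGHT, WIDTH))
    for f, c in product(figs, caps)
}

fig_map, cap_map = second_maps[("fig1", "cap2")]
```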

The determination unit inputs the set of first maps and the set of second maps for each content pair into a trained model that has learned the relevance between contents, and determines whether the contents of each content pair are related.

For example, whether contents fig1 and cap1 of the content pair (fig1-cap1) are related is determined based on the second maps obtained by imaging the layout of contents fig1 and cap1 in the document. Compared with a determination based simply on the relative position (relative coordinates) of fig1 and cap1, this increases the amount of information used for the determination and enables a more accurate determination.

In addition, by using the first maps, it is possible to consider not only the layout of contents fig1 and cap1 (the second maps) but also their relationships with other contents in the document, for example the relationship between content fig1 and the caption-category contents other than cap1, and the relationship between content cap1 and the figure-category contents other than fig1. The relevance of the contents in the document can therefore be determined even more accurately.

In the information processing device of this embodiment, the first map generation unit generates a first map in which the pixel values of the contents differ for each identified category, and the second map generation unit generates a second map in which the pixel values of the contents differ for each identified category.

The first map generation unit generates a first map in which the pixel values of the contents differ for each identified category. The second map generation unit generates a second map in which the pixel values of the contents differ for each identified category. When the content pair is fig1-cap1, the pixels corresponding to fig1 in the document can be given pixel value a, the pixels corresponding to cap1 can be given pixel value b, and the pixels corresponding to the parts of the document other than fig1 and cap1 can be given pixel value c. The difference in pixel value may represent a difference in luminance or a difference in color. As a result, one first map and one second map can be generated for one content pair.
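The single-map variant can be sketched by painting each category with its own value. The concrete values chosen for a, b, and c, the box coordinates, and the helper name `paint` are assumptions for illustration only.

```python
# Hypothetical pixel values: a for figures, b for captions, c for background.
A, B, C = 1.0, 0.5, 0.0

def paint(regions, height, width, background):
    """Paint (value, box) regions onto a grid filled with the background value."""
    grid = [[background] * width for _ in range(height)]
    for value, (x0, y0, x1, y1) in regions:
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = value
    return grid

fig1 = (0, 0, 3, 2)
cap1 = (0, 2, 3, 3)

# One second map for the pair fig1-cap1 instead of two binary masks.
second_map = paint([(A, fig1), (B, cap1)], height=6, width=8, background=C)
```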

The information processing device of this embodiment includes an output unit that outputs, based on the determination result of the determination unit, identification information that identifies the relevance of the contents of the categories in the document.

The output unit outputs, based on the determination result of the determination unit, identification information that identifies the relevance of the contents of the categories in the document. The identification information may be, for example, a line segment connecting related contents in the document or a frame of the same color surrounding related contents. It need not be displayed at all times and may instead take a display form in which the association becomes visible in response to a predetermined operation. This allows the user to easily recognize related contents.

The information processing device of this embodiment includes a third map generation unit that generates a third map in which position information within the document is imaged, and the determination unit determines, based on the third map, whether the contents of each content pair are related.

The third map generation unit generates a third map in which position information within the document is imaged. Suppose the image of the document consists of (m×n) pixels (m: number of pixels in the y-axis direction, n: number of pixels in the x-axis direction). The third map can consist of an x-axis coordinate map and a y-axis coordinate map. In the x-axis coordinate map, the pixel value represents the position in the x-axis direction: pixel values x_0, x_1, x_2, ..., x_n are given to pixels 0, 1, 2, ..., n in the x-axis direction. In the y-axis coordinate map, the pixel value represents the position in the y-axis direction: pixel values y_0, y_1, y_2, ..., y_m are given to pixels 0, 1, 2, ..., m in the y-axis direction.
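A pair of coordinate maps of this kind can be sketched as follows. The embodiment does not state which values are assigned to each position; normalizing positions to the range 0 to 1 is an assumption made here for illustration, as is the function name `coordinate_maps`. Pixels are indexed from 0 to width - 1 and 0 to height - 1 in this sketch.

```python
def coordinate_maps(height, width):
    """Return (x_map, y_map); each value is the normalized pixel position.

    Assumes height > 1 and width > 1 so that the normalization is defined.
    """
    x_map = [[x / (width - 1) for x in range(width)] for _ in range(height)]
    y_map = [[y / (height - 1) for _ in range(width)] for y in range(height)]
    return x_map, y_map

x_map, y_map = coordinate_maps(height=4, width=5)
```

Every row of `x_map` is identical and every column of `y_map` is identical, so the two maps together tell the model where each pixel lies on the page.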

The determination unit receives the third map as input and determines whether the contents of each content pair are related. That is, relevance is determined for each content pair using the third map (common to all content pairs) in addition to the first maps (common to all content pairs) and the second maps (different for each content pair). Because the determination can take into account information for extracting the positional relationship, within the document, of each content in the first and second maps, the presence or absence of relevance between contents in the document can be determined even more accurately.

REFERENCE SIGNS LIST
1 Communication network
10 Server
20 Scanner
50 Information processing device
51 Control unit
52 Communication unit
53 Storage unit
54 Category identification unit
55 Content group generation unit
56 Map generation unit
57 Relevance determination unit
58 Display panel
59 Display processing unit
60 Operation unit

Claims

1. An information processing device comprising:
an acquisition unit that acquires document data including a plurality of contents;
an identification unit that identifies categories of the plurality of contents included in the acquired document data;
a generation unit that generates sets of contents of the identified categories;
a first map generation unit that generates a set of first maps in which, for each category, an area corresponding to the content and other areas are imaged with different pixel values;
a second map generation unit that generates, for each set of contents, a set of second maps in which, for each category of the one set, an area corresponding to the content and other areas are imaged with different pixel values; and
a determination unit that inputs the set of first maps and the set of second maps for each set of contents into a trained model that has learned the relevance between contents, and determines whether or not there is a relevance between the contents for each set of contents.

2. The information processing device according to claim 1, wherein
the first map generation unit generates a first map in which pixel values of the content differ for each of the identified categories, and
the second map generation unit generates a second map in which pixel values of the content differ for each of the identified categories.

3. The information processing device according to claim 1 or 2, further comprising an output unit that outputs identification information that identifies the relevance of the content of the category in the document based on the determination result of the determination unit.

4. The information processing device according to claim 1, further comprising a third map generation unit that generates a third map by imaging position information within a document, wherein
the determination unit determines whether or not there is a relevance between the contents for each of the sets of contents based on the third map.

5. A computer program that causes a computer to execute a process comprising:
acquiring document data including a plurality of contents;
identifying categories of the plurality of contents included in the acquired document data;
generating sets of contents of the identified categories;
generating a set of first maps in which, for each category, an area corresponding to the content and other areas are imaged with different pixel values;
generating, for each set of contents, a set of second maps in which, for each category of the one set, an area corresponding to the content and other areas are imaged with different pixel values; and
inputting the set of first maps and the set of second maps for each set of contents into a trained model that has learned the relevance between contents, and determining whether or not there is a relevance between the contents for each set of contents.