JP7545061B2

JP7545061B2 - Information processing system, information processing method, and program

Info

Publication number: JP7545061B2
Application number: JP2022029784A
Authority: JP
Inventors: 義治進
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2022-02-28
Filing date: 2022-02-28
Publication date: 2024-09-04
Anticipated expiration: 2042-02-28
Also published as: JP2023125592A

Description

The present invention relates to an information processing system, an information processing method, and a program.

As the number of electronic documents within companies increases, document search systems that efficiently find the documents needed for business are becoming more important. A document search system presents the user with a set of documents related to the search conditions the user enters. A typical example accepts a search query as a character string and retrieves related documents based on the search keywords contained in the query.

Some search systems display the character strings surrounding each occurrence of a search keyword in the retrieved documents and additionally make the keyword identifiable, for example by highlighting it (highlight function). Hereinafter, a word that is the target of such identifiable display is called an identification word.

The highlight function lets users efficiently find words of interest across the multiple documents returned as search results, so that they can quickly grasp which document is the one they are looking for.

Non-Patent Document 1 discloses a highlight function in a document search system.

https://www.hitachi-systems.com/ind/srpartner/product/highlight/index.html

Non-Patent Document 1 discloses a function that highlights the keywords used in a search in a color specified by the user.

However, depending on how the search is performed, merely highlighting the keywords used in the search may not let the user efficiently grasp the characteristics of the retrieved documents.

Therefore, an object of the present invention is to provide a mechanism that allows search results to be checked efficiently.

The information processing system of the present invention comprises: search means for performing a document search using a search query specified by a user; display control means for controlling display of the results of the search by the search means; and feature word acquisition means for acquiring, from each retrieved document, the feature words of that document, wherein the display control means performs control so that the feature words in the documents retrieved by the search means are displayed identifiably as identification words.

The information processing system of the present invention also comprises: search means for performing a document search using a search query specified by a user; display control means for controlling display of the results of the search by the search means; and related word acquisition means for acquiring related words, which are words related to the documents retrieved by the search means, wherein the display control means performs control so that the related words contained in the documents retrieved by the search means are displayed identifiably as identification words.

According to the present invention, search results can be checked efficiently.

FIG. 1 is a diagram showing an example of the system configuration of a document search system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the hardware configuration of the document search system and a client terminal according to the embodiment.
FIG. 3 is a diagram showing an example of a set of documents to be searched that is stored in the document DB according to the embodiment.
FIG. 4 is a diagram showing an example of data used as search conditions in the embodiment.
FIG. 5 is a diagram showing an example of data used as search results in the embodiment.
FIG. 6 is a diagram showing an example of a priority rule table for the search result list in the embodiment.
FIG. 7 is a diagram showing an example of a priority rule table for the search result details in the embodiment.
FIG. 8 is a diagram showing an example of an identification word candidate source table created by the identification word candidate creation unit in the embodiment.
FIG. 9 is a diagram showing an example of an identification word candidate table created by the identification word candidate creation unit in the embodiment.
FIG. 10 is a flowchart showing the search process performed by the search processing unit in the embodiment.
FIG. 11 is a flowchart showing the identification word candidate creation process performed by the identification word candidate creation unit in the embodiment.
FIG. 12 is a diagram showing an example of a search result screen in the embodiment.
FIG. 13 is a diagram showing an example of a state in which identification word candidates are displayed on the search result screen in the embodiment.
FIG. 14 is a diagram showing an example of a state in which an identification word has been selected on the search result screen in the embodiment.
FIG. 15 is a diagram showing an example of a state in which identification word candidates are displayed on the search result details screen in the embodiment.
FIG. 16 is a diagram showing an example of a state in which an identification word has been selected on the search result details screen in the embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a diagram showing an example of the system configuration of a document search system 100 according to an embodiment of the present invention.

The document search system 100 consists of a document registration device 110, a document DB 120, a document search device 130, and a feature word update device 140.

The document registration device 110 is a device for registering the documents that users will search, and consists of a document receiving unit 111, a keyword extraction unit 112, and a document registration processing unit 113.

The document receiving unit 111 is a functional unit that accepts documents to be registered. A user (client terminal) can send any document to the document receiving unit 111 through a Web browser or the like. Alternatively, a crawler may collect documents automatically and send them.

The keyword extraction unit 112 is a functional unit that extracts, from a document accepted by the document receiving unit 111, the keywords that are candidates for the feature words of that document, together with their occurrence frequencies. Feature words are described in detail later. The keyword extraction unit 112 extracts keywords using a known morphological analysis technique. The morphemes to be extracted may be limited to specific parts of speech, such as proper nouns, depending on the intended use of the document search system. Alternatively, character strings that match a predetermined pattern may be extracted as keywords without using morphological analysis.
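The pattern-based alternative mentioned above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the pattern (runs of katakana, or alphanumeric tokens) and the function name `extract_keywords` are assumptions chosen for the example, and a real system would more likely rely on a morphological analyzer.

```python
import re
from collections import Counter

# Hypothetical pattern: runs of two or more katakana characters, or
# alphanumeric tokens starting with a letter. Not taken from the patent.
KEYWORD_PATTERN = re.compile(r"[ァ-ヶー]{2,}|[A-Za-z][A-Za-z0-9]+")


def extract_keywords(text):
    """Return every pattern match in `text` with its occurrence frequency."""
    return Counter(match.group(0) for match in KEYWORD_PATTERN.finditer(text))


counts = extract_keywords("モバイル画面の設計。モバイル向けAPI仕様。")
```

The resulting `Counter` corresponds to the "keyword: occurrence frequency" pairs that are later stored with the document.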

The document registration processing unit 113 associates the document accepted by the document receiving unit 111 with the keywords extracted by the keyword extraction unit 112 and stores them in the document DB 120.

The document DB 120 has fields for a document ID that uniquely identifies a document, a document name, a body, "keyword: occurrence frequency" pairs holding the values extracted by the keyword extraction unit 112, and feature words. FIG. 3 shows an example of the data stored in the document DB 120. The method for creating feature words is described later. Although these five items are given as the configuration used to explain the present idea, additional items used by a document search system may also be provided, such as a URL indicating the location of the document, the size of the document, and the author of the document.
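As a rough sketch, one record of the document DB could be modeled as below. The class name, field names, and sample values are all hypothetical; the source only names the five items and the optional extras.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentRecord:
    """One row of the document DB (field names are illustrative)."""
    doc_id: int                     # uniquely identifies the document
    name: str                       # document name
    body: str                       # body text
    keyword_freq: Dict[str, int] = field(default_factory=dict)  # keyword: occurrence frequency
    feature_words: List[str] = field(default_factory=list)      # filled in later by the feature word update
    extra: Dict[str, str] = field(default_factory=dict)         # optional items such as URL, size, author


# Sample values are invented for illustration only.
record = DocumentRecord(
    doc_id=1,
    name="Product X screen design",
    body="The specifications of the management screen of Product X are as follows.",
    keyword_freq={"design": 4, "screen": 3, "mobile": 2},
)
```

Keeping `feature_words` empty at registration time mirrors the description: the registration step stores keywords and frequencies, and feature words are written by a separate update step.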

The document search device 130 consists of a search processing unit 131, a search condition storage unit 132, a search operation storage unit 133, a search result storage unit 134, an identification word candidate creation unit 135, a priority rule table (search result list) 136, and a priority rule table (search result details) 137.

The search processing unit 131 is a functional unit that accepts a search operation from a user, interprets it to generate the search conditions used to query the search DB, and retrieves the documents that match those conditions from the document DB 120. It can retrieve the documents related to the search conditions in order of score. The search process performed by the search processing unit 131 is described in detail later. The user can also send identification words to the search processing unit 131 together with the search operation.

The search condition storage unit 132 is a functional unit that stores the search conditions of the searches performed by the user. As shown in FIG. 4, the search conditions consist of a "search query", "feature words of the similar document search query", and a "keyword filter".

The search operation storage unit 133 is a functional unit that stores the search operations performed by the user. At least three kinds of values can be stored in the search operation storage unit 133: "search by search query", "similar document search", and "add keyword filter". For "add keyword filter", the keyword string can be stored as additional information.

The search result storage unit 134 is a functional unit that stores the results of the searches performed by the user. As shown in FIG. 5, the search results consist of a "document list", which is the set of documents matching the search conditions extracted from the document DB 120; "related words", which is a list of words related to the search results; and "snippets", which are the portions of the body to be displayed identifiably, extracted for each document ID in the document list.

Each time it executes a search, the search processing unit 131 updates the information stored in the search condition storage unit 132, the search operation storage unit 133, and the search result storage unit 134.

The identification word candidate creation unit 135 performs the identification word candidate creation process, which creates an identification word candidate table such as that shown in FIG. 9. The identification word candidate table holds words and their priorities. The identification word candidate creation process is described later.

The priority rule table (search result list) 136 holds priority rules, such as those shown in FIG. 6, that are used in the identification word candidate creation process for the search result list. The priority rule table (search result details) 137 holds priority rules, such as those shown in FIG. 7, that are used in the identification word candidate creation process for the search result details. How these tables are used is explained together with the identification word candidate creation process. The system administrator can set the values of these priority rule tables when the document search system is built.

The feature word update device 140 extracts, for each document stored in the document DB, its characteristic keywords as feature words and updates the corresponding record. Feature words can be selected using tf-idf, one of the indices that express how characteristic a word is. The feature word update device 140 obtains the occurrence frequency of each word from the "keyword: occurrence frequency" field of the document DB 120 and extracts up to N keywords as feature words in descending order of tf-idf value. The system administrator can set the value of N when the document search system is built. For example, in the document DB 120 of FIG. 3, the feature words of document 1 are the three words "design", "screen", and "mobile".
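The selection described above can be sketched as follows. The source names tf-idf but does not fix a formula, so this example assumes one common variant (relative term frequency multiplied by the natural log of the inverse document frequency); the function name and sample data are likewise illustrative.

```python
import math


def select_feature_words(keyword_freqs, n):
    """Pick up to n feature words per document by descending tf-idf.

    keyword_freqs maps doc_id -> {keyword: occurrence frequency}.
    The tf-idf variant used here is an assumption, not taken from the patent.
    """
    num_docs = len(keyword_freqs)
    # Document frequency: in how many documents each keyword appears.
    doc_freq = {}
    for freqs in keyword_freqs.values():
        for keyword in freqs:
            doc_freq[keyword] = doc_freq.get(keyword, 0) + 1

    result = {}
    for doc_id, freqs in keyword_freqs.items():
        total = sum(freqs.values())
        scores = {
            keyword: (count / total) * math.log(num_docs / doc_freq[keyword])
            for keyword, count in freqs.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda kw: (-scores[kw], kw))
        result[doc_id] = ranked[:n]
    return result


docs = {
    1: {"design": 4, "screen": 3, "product": 5},
    2: {"product": 6, "price": 2},
    3: {"product": 2, "manual": 3},
}
feature_words = select_feature_words(docs, n=2)
```

In this sample, "product" appears in every document, so its inverse document frequency is zero and it is not chosen for document 1 even though it is the most frequent keyword there.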

Although FIG. 1 has been described as an example in which the document search system 100 is made up of three devices (the document registration device 110, the document search device 130, and the feature word update device 140) and the document DB 120, the document search system of the present invention is not limited to this configuration. A single device may provide the functions of all of these devices.

FIG. 2 is a block diagram showing an example of the hardware configuration of an information processing device that can be used as the document search system 100 of the present invention or as each of its devices.

As shown in FIG. 2, in the information processing device, a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a storage device 204, an input controller 205, an audio controller 206, a video controller 207, a memory controller 208, and a communication I/F controller 209 are connected via a system bus 200.

The CPU 201 provides overall control of the devices and controllers connected to the system bus 200.

The ROM 202 or the external memory 213 holds the BIOS (Basic Input/Output System) and the OS (Operating System), which are control programs executed by the CPU 201, as well as the computer-readable and executable programs that implement the present information processing method and the various data they require (including data tables).

The RAM 203 functions as the main memory, work area, and the like of the CPU 201. When executing processing, the CPU 201 loads the necessary programs and the like from the ROM 202 or the external memory 213 into the RAM 203 and executes the loaded programs to carry out various operations.

The input controller 205 controls input from input devices such as a keyboard 210 and a pointing device such as a mouse (not shown). When the input device is a touch panel, the user can give various instructions by pressing (touching with a finger or the like) the icons, cursors, and buttons displayed on the touch panel.

The touch panel may be one that can detect positions touched by multiple fingers, such as a multi-touch screen.

The video controller 207 controls display on an external output device such as a display 212. The display includes the display of a notebook computer that is integrated with the main body. The external output device is not limited to a display and may be, for example, a projector. A device that can accept the touch operations described above also serves as an input device.

The video controller 207 can control a video memory (VRAM) used for display control. Part of the RAM 203 can be used as the video memory area, or a separate dedicated video memory can be provided.

The memory controller 208 controls access to the external memory 213. The external memory may be an external storage device (hard disk) that stores the boot program, various applications, font data, user files, edited files, and various data; a flexible disk (FD); or a CompactFlash (registered trademark) memory connected to a PCMCIA card slot via an adapter.

The communication I/F controller 209 connects to and communicates with external devices via a network and executes communication control processing on the network. For example, it supports communication using TCP/IP, communication over telephone lines such as ISDN, and communication over mobile phone lines such as 4G and 5G.

The CPU 201 enables display on the display 212 by, for example, rasterizing outline fonts into a display information area in the RAM 203. The CPU 201 also allows the user to give instructions with a mouse cursor (not shown) or the like on the display 212.

Next, the search process that the search processing unit 131 executes when it receives a search request from a client terminal in the embodiment of the present invention is described with reference to the flowchart of FIG. 10.

First, in step S1001, the search conditions contained in the search operation received from the client are combined with the search conditions stored in the search condition storage unit 132 to create a single set of search conditions.

As shown in FIG. 4, the search conditions include a "search query", "feature words of the similar document search query", and a "keyword filter". As FIG. 4 shows, the search query consists of keywords. When the search conditions include a search query, a full-text search is performed on the bodies of the documents registered in the document DB 120.

The feature words of the similar document search query are the keywords used as search conditions in a similar document search, in which the user sends a document at hand to the document search system 100 and the document search system 100 retrieves documents similar to it from the search DB. On receiving a document from the user, the document search system 100 determines the characteristic words of that document from a statistic such as tf-idf and automatically extracts them as feature words. A search whose conditions include the feature words of the similar document search query retrieves the documents for which the match rate between the keywords of a document registered in the document DB and the feature words of the similar document search query is at or above a threshold.
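The threshold test can be illustrated with a short sketch. The source does not define how the match rate is calculated, so this example assumes it is the fraction of the query's feature words that also appear among a document's keywords; the function names and the threshold value are hypothetical.

```python
def match_rate(query_feature_words, doc_keywords):
    """Fraction of the query's feature words found among the document's keywords.

    This definition of the match rate is an assumption made for illustration.
    """
    query = set(query_feature_words)
    if not query:
        return 0.0
    return len(query & set(doc_keywords)) / len(query)


def similar_documents(query_feature_words, docs, threshold):
    """Return the IDs of documents whose match rate is at or above the threshold."""
    return [
        doc_id
        for doc_id, keywords in docs.items()
        if match_rate(query_feature_words, keywords) >= threshold
    ]


docs = {
    1: {"specifications", "screen", "design"},
    2: {"price"},
    3: {"Product Y", "specifications", "screen"},
}
hits = similar_documents(["Product Y", "specifications", "screen"], docs, threshold=0.6)
```

Other definitions, such as a Jaccard coefficient over both keyword sets, would fit the description equally well; only the comparison against a threshold comes from the source.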

A keyword filter is a narrowing condition that contains keywords. A search whose conditions include a keyword filter retrieves only the documents in the document DB that contain those keywords.

When the search conditions include more than one of "search query", "feature words of the similar document search query", and "keyword filter", the final document list of the search results is determined by taking the AND of the individual conditions.

For example, suppose that, with nothing stored in the search condition storage unit 132, the search operation "operation type: search by search query, search query: 'Product X specifications'" is received from the user. Search conditions whose search query is "Product X specifications" are then generated. Later processing stores the search condition "search query: 'Product X specifications'" in the search condition storage unit 132.

Suppose that the search operation "operation type: add keyword filter, keyword: 'screen'" is then received from the user. The search condition stored in the search condition storage unit 132, "search query: 'Product X specifications'", is combined with the keyword filter "screen" generated from the received operation, producing search conditions that require both "search query: 'Product X specifications'" and "keyword filter: 'screen'".
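The two-step example above can be traced with a small sketch of step S1001. The class, the operation type strings, and the rule that a new search query replaces the stored one are assumptions for illustration; the `matches` helper uses naive substring checks in place of a real full-text search and omits the similar document condition.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SearchCondition:
    """Stored search conditions (field names are illustrative)."""
    query: Optional[str] = None
    similar_features: Tuple[str, ...] = ()
    keyword_filters: Tuple[str, ...] = ()


def merge_operation(saved, op_type, value):
    """Combine a received search operation with the stored conditions."""
    if op_type == "query":
        return replace(saved, query=value)
    if op_type == "similar":
        return replace(saved, similar_features=tuple(value))
    if op_type == "add_keyword_filter":
        return replace(saved, keyword_filters=saved.keyword_filters + (value,))
    raise ValueError(f"unknown operation type: {op_type}")


def matches(cond, body):
    """AND of the query terms and keyword filters, using substring checks."""
    query_ok = cond.query is None or all(t in body for t in cond.query.split())
    filters_ok = all(keyword in body for keyword in cond.keyword_filters)
    return query_ok and filters_ok


cond = merge_operation(SearchCondition(), "query", "Product X specifications")
cond = merge_operation(cond, "add_keyword_filter", "screen")
```

After the second call, `cond` holds both the search query and the keyword filter, which is the combined condition described in the text.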

In step S1002, the documents that match the search conditions are retrieved from the document DB 120 and sorted in descending order of score, a value indicating how well each document matches the search conditions. For efficient search processing, the document registration processing unit 113 can build an inverted index, a well-known technique, and use it at search time.
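A minimal sketch of an inverted index and a scored lookup is shown below. The source does not specify how the score is computed, so this example assumes the score is the sum of the occurrence frequencies of the matched terms; the function names and data are illustrative.

```python
from collections import defaultdict


def build_inverted_index(keyword_freqs):
    """Map each keyword to {doc_id: occurrence frequency}."""
    index = defaultdict(dict)
    for doc_id, freqs in keyword_freqs.items():
        for keyword, count in freqs.items():
            index[keyword][doc_id] = count
    return index


def search(index, terms):
    """Return IDs of documents containing all terms, highest score first.

    The score (sum of term frequencies) is an assumption for illustration.
    """
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Documents that appear in every posting list.
    hits = set(postings[0]).intersection(*postings[1:])
    scored = [(sum(p[doc_id] for p in postings), doc_id) for doc_id in hits]
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


index = build_inverted_index({
    1: {"product": 2, "screen": 3},
    2: {"product": 5},
    3: {"product": 1, "screen": 1},
})
ranked = search(index, ["product", "screen"])
```

Because the index is built once at registration time, each query only touches the posting lists of its own terms instead of scanning every document body.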

In step S1003, the related words of the search results are determined. A subset of the keywords contained in the retrieved documents is used as the related words. One concrete way to obtain them is to select the keywords that are shared by roughly half of the retrieved documents. With this selection method, when the retrieved documents are a mixture of documents on several topics, the selected related words can be expected to separate those topics appropriately.
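The "roughly half" heuristic could be sketched as follows. The source gives no numeric definition of "roughly half", so the `tolerance` parameter and the ordering by closeness to one half are assumptions.

```python
from collections import Counter


def related_words(doc_keywords, tolerance=0.2):
    """Select keywords contained in roughly half of the retrieved documents.

    doc_keywords maps doc_id -> iterable of keywords. A keyword qualifies when
    the share of documents containing it is within `tolerance` of 0.5
    (the tolerance is a hypothetical parameter).
    """
    num_docs = len(doc_keywords)
    if num_docs == 0:
        return []
    doc_freq = Counter(kw for kws in doc_keywords.values() for kw in set(kws))
    candidates = [
        (abs(count / num_docs - 0.5), keyword)
        for keyword, count in doc_freq.items()
        if abs(count / num_docs - 0.5) <= tolerance
    ]
    # Closest to one half first; ties are broken alphabetically.
    return [keyword for _, keyword in sorted(candidates)]


retrieved = {
    1: {"product", "screen", "mobile"},
    2: {"product", "screen"},
    3: {"product", "price"},
    4: {"product", "price", "contract"},
}
words = related_words(retrieved)
```

In this sample, "product" occurs in every document and therefore cannot separate them, while "screen" and "price" each cover half of the documents and split the results into two topics.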

In step S1004, the search condition storage unit 132 is updated with the generated search conditions, and the search operation storage unit 133 is updated with the search operation received from the user.

In step S1005, processing branches depending on whether the user has specified identification words. If identification words have been specified, they are used as the identification words and processing proceeds to step S1010. If not, processing proceeds to step S1006.

In step S1006, processing branches depending on whether the document search system is in the simple identification word selection mode. The system administrator of the document search system can set whether this mode is enabled. If the mode is enabled, processing proceeds to step S1007; otherwise, processing proceeds to step S1008.

If the simple identification word selection mode is enabled (step S1006: YES), in step S1007 the words are extracted from the search query stored in the search condition storage unit by splitting it on whitespace, and those words are used as the identification words.
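Step S1007 amounts to a whitespace split, as in the sketch below. The function name is hypothetical, and the sample query assumes the user separated the two words with a full-width space, which Python's `str.split()` with no arguments treats as whitespace.

```python
def simple_identification_words(search_query):
    """Split the stored search query on whitespace (step S1007)."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the full-width space (U+3000) common in Japanese input.
    return search_query.split()


words = simple_identification_words("製品X　仕様")
```

An empty or all-whitespace query yields an empty list, in which case no word would be displayed identifiably.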

If the simple identification word selection mode is not enabled (step S1006: NO), in step S1008 the identification word candidate creation unit is made to perform the identification word candidate creation process, and its result is received. The identification word candidate creation process is described later. The result is an identification word candidate table holding words and priorities, such as that shown in FIG. 9.

In step S1009, the words that satisfy a predetermined condition (for example, the N words with the highest priority, or the words whose priority is at or above a threshold) are selected from the identification word candidate table as the identification words. Here, N is a constant defined in the document search system 100.
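Both selection conditions named in step S1009 can be sketched over a candidate table represented as a word-to-priority mapping. The function names and the priority values are invented for illustration, and higher numbers are assumed to mean higher priority.

```python
def select_top_n(candidates, n):
    """Pick the n words with the highest priority."""
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def select_by_threshold(candidates, threshold):
    """Pick every word whose priority is at or above the threshold."""
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, priority in ranked if priority >= threshold]


# Hypothetical identification word candidate table (word -> priority).
candidate_table = {"screen": 5, "mobile": 3, "design": 4, "Product X": 2}

top_two = select_top_n(candidate_table, 2)
above_three = select_by_threshold(candidate_table, 3)
```

The top-N rule keeps the number of highlighted words fixed, while the threshold rule lets it vary with how many candidates score well.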

Users may be accustomed to having the words in their search query highlighted. In that case, the system administrator can set the document search system to the simple identification word selection mode so that the words of the search query are highlighted, which gives behavior that feels natural to those users.

On the other hand, to highlight the words that are more likely to interest the user, the system administrator can instead configure the document search system to perform the identification word candidate creation process.

ステップＳ１０１０において、検索結果として得られた各文書の本文から識別単語周辺の文字列をスニペットとして抽出する。周辺の文字列として、識別単語の前後Ｎ文字を用いることができる。ここでＮは文書検索システム１００において定義された定数である。 In step S1010, a character string around the identified word is extracted as a snippet from the body of each document obtained as a search result. The surrounding character string can be N characters before and after the identified word. Here, N is a constant defined in the document search system 100.

さらに、クライアント端末において識別単語部分が識別表示されるように、抽出したスニペットの識別単語の部分を識別表示タグで囲む。識別表示の方法としては、識別単語を太字で表示する方法、他の文字列とは異なるフォントで表示する方法、マーカーで色付けして表示する方法、他の文字列とは異なる文字色で表示する方法など、識別単語が識別可能になる表示形態であればいずれでも良い。 Furthermore, the identified word portion of the extracted snippet is enclosed in an identification display tag so that the identified word portion is displayed in an identifiable manner on the client terminal. The method of identifying and displaying the identified word may be any display format that makes the identified word identifiable, such as displaying the identified word in bold, displaying it in a font different from other character strings, displaying it in a colored marker, or displaying it in a character color different from other character strings.

スニペットを効率よく抽出するために、公知の技術である転置インデックスを用いて本文内における識別単語の位置を取得することができる。 To extract snippets efficiently, the position of identified words in the text can be obtained using a known technique called inverted indexing.
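A positional inverted index of the kind referred to above can be sketched as follows. This is a simplified illustration, not the index of any particular search engine: the function name `build_positional_index` is hypothetical, and a plain substring scan over a fixed vocabulary stands in for the tokenizer a real system would use.

```python
from collections import defaultdict


def build_positional_index(docs: dict[str, str],
                           vocabulary: list[str]) -> dict[str, dict[str, list[int]]]:
    """Map each term to {document ID: [character offsets of the term]}."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, body in docs.items():
        for term in vocabulary:
            start = body.find(term)
            while start != -1:
                index[term][doc_id].append(start)
                start = body.find(term, start + 1)
    # Convert to plain dicts so that lookups of unknown terms raise KeyError.
    return {term: dict(postings) for term, postings in index.items()}


docs = {"doc1": "管理画面はモバイル向けには提供しない。モバイル向けの検索画面の設計は以下の通り"}
index = build_positional_index(docs, ["モバイル", "画面"])
print(index["モバイル"]["doc1"])
```

With the offsets available from the index, the snippet window can be cut directly from the body without rescanning the whole text at query time.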

識別単語が複数ある場合、各識別単語および周辺文脈の抽出結果を文字列結合したものをスニペットとして用いることができる。 When there are multiple identification words, the extracted results of each identification word and surrounding context can be combined into a string and used as a snippet.

例えば図１２に示す検索画面においては、ユーザーは明示的に識別単語を指定しておらず、検索クエリ１２０１に「製品Ｘ仕様」が設定されており、類似文書検索クエリの特徴語１２０２に「製品Ｙ」「仕様」「画面」という特徴語が設定されており、キーワードフィルター１２０３に「画面」「モバイル」というキーワードが設定されている。なお、画面上では「絞り込み条件」というラベルによりキーワードフィルターを表示している。ここで、仮にステップＳ１００６において識別単語簡易選択の動作モードがＹｅｓであったとすると、ステップＳ１００７により識別単語として「製品Ｘ」「仕様」という２件の単語が選出される。そのとき、検索処理部は検索結果の一番目の文書において、「製品Ｘ」および「仕様」の周辺文脈として本文内から「製品Ｘの管理画面の仕様は以下の通りとする」というスニペットを抽出する。さらに「製品Ｘ」「仕様」を識別表示タグで囲む。その結果、クライアント端末では検索結果の一番目の文書において、「製品Ｘの管理画面の仕様は以下の通りとする」「管理画面の仕様は以下の通りとする」という２つのテキストからなるスニペットが表示され、「製品Ｘ」「仕様」は識別表示される。 For example, in the search screen shown in FIG. 12, the user has not explicitly specified an identification word; "Product X specifications" is set in the search query 1201, the feature words "Product Y", "Specifications", and "Screen" are set in the feature words 1202 of the similar document search query, and the keywords "Screen" and "Mobile" are set in the keyword filter 1203. Note that the keyword filter is displayed on the screen under the label "Refinement conditions". If the identification word simple selection mode is Yes in step S1006, the two words "Product X" and "Specifications" are selected as identification words in step S1007. The search processing unit then extracts, from the body of the first document in the search results, the snippet "The specifications of the management screen of Product X are as follows" as the surrounding context of "Product X" and "Specifications", and encloses "Product X" and "Specifications" in identification display tags. As a result, for the first document in the search results, the client terminal displays a snippet consisting of two pieces of text, "The specifications of the management screen of Product X are as follows" and "The specifications of the management screen are as follows", with "Product X" and "Specifications" displayed in an identifiable manner.

また、図１４の検索画面において、ユーザーは識別単語として「モバイル」を指定している。この場合、検索処理部は識別単語として「モバイル」を選出する。以下同様にして、クライアント端末では検索結果の一番目の文書において、「管理画面はモバイル向けには提供しない」「モバイル向けの検索画面の設計は以下の通り」という２つのテキストからなるスニペットが表示され、「モバイル」は識別表示される。 In addition, on the search screen in Figure 14, the user specifies "mobile" as the identification word. In this case, the search processing unit selects "mobile" as the identification word. Similarly, on the client terminal, in the first document of the search results, a snippet consisting of two pieces of text, "The management screen is not provided for mobile devices" and "The design of the search screen for mobile devices is as follows," is displayed, with "mobile" being identified.

ステップＳ１０１１において、文書ＤＢ１２０から得られた文書一覧と求めた関連語とスニペットにより検索結果保存部１３４を更新する。 In step S1011, the search result storage unit 134 is updated with the document list obtained from the document DB 120 and the related words and snippets found.

ステップＳ１０１２において、検索結果をクライアント端末へ返す。クライアント端末では図１２のような検索結果画面が表示される。 In step S1012, the search results are returned to the client terminal. The search results screen shown in FIG. 12 is displayed on the client terminal.

次に、図１１のフローチャートを用いて、本発明の実施形態における識別単語候補作成部１３５が実行する識別単語候補作成処理について説明する。 Next, the identification word candidate creation process executed by the identification word candidate creation unit 135 in an embodiment of the present invention will be described using the flowchart in FIG. 11.

また、参考例として、検索条件保存部１３２に図４の検索条件が、検索結果保存部１３４に図５の検索結果が保存されているものとし、検索操作保存部１３３に保存されている検索操作が「キーワードフィルターの追加・キーワード『モバイル』」であるとする。また、優先度ルール表として優先度ルール表（検索結果一覧）１３６を用いて、その中身が図６であるとする。 As a reference example, assume that the search conditions in FIG. 4 are stored in the search condition storage unit 132, the search results in FIG. 5 are stored in the search result storage unit 134, and the search operation stored in the search operation storage unit 133 is "Add keyword filter, keyword 'mobile'." Also assume that the priority rule table (search result list) 136 is used as the priority rule table, and that its contents are as shown in FIG. 6.

ステップＳ１１０１からＳ１１０５にかけて、識別単語候補ソース表の作成処理が行われる。識別単語候補ソース表は図８のように単語と、その単語の取得元および取得元詳細からなる表である。取得元詳細は空のことがありうる。同じ単語が２回以上出現する場合もある。 In steps S1101 to S1105, the process of creating the identification word candidate source table is performed. The identification word candidate source table is a table consisting of words, their source of acquisition, and source details, as shown in FIG. 8. The source details may be empty. The same word may appear more than once.

ステップＳ１１０１において、検索条件保存部１３２に保存されている検索クエリから単語を識別単語候補ソース表に加える。 In step S1101, words from the search query stored in the search condition storage unit 132 are added to the identification word candidate source table.

例えば、図４の検索条件には検索クエリ「製品Ｘ仕様」が含まれるが、この検索クエリの文字列を空白で区切り「製品Ｘ」「仕様」という単語が得られる。「製品Ｘ」は取得元を「検索クエリ」とし、検索クエリ内の左から１番目に得られた単語であるため取得元詳細を「前から１番目」として識別単語候補ソース表に加える。同様に「仕様」は取得元を「検索クエリ」とし、検索クエリの左から２番目に得られた単語であるため取得元詳細を「前から２番目」として識別単語候補ソース表に加える。 For example, the search conditions in Figure 4 include the search query "Product X specifications", but by separating the string of this search query with spaces, the words "Product X" and "Specifications" are obtained. The source of "Product X" is the "search query", and since it is the first word obtained from the left in the search query, it is added to the identification word candidate source table with the source details set to "first from the front". Similarly, the source of "Specifications" is the "search query", and since it is the second word obtained from the left in the search query, it is added to the identification word candidate source table with the source details set to "second from the front".
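Step S1101 can be sketched in Python as below. The function name `query_to_source_entries`, the dictionary keys, and the English label strings are hypothetical; the sketch also assumes that the stored query contains a space between "製品Ｘ" and "仕様", as the splitting described above implies.

```python
def query_to_source_entries(query: str) -> list[dict[str, str]]:
    """Split a search query on whitespace and build source-table rows (step S1101)."""
    entries = []
    # str.split() with no argument splits on any Unicode whitespace,
    # which includes the full-width space (U+3000) common in Japanese input.
    for position, word in enumerate(query.split(), start=1):
        entries.append({
            "word": word,
            "source": "search query",
            "source_detail": f"#{position} from the front",
        })
    return entries


for entry in query_to_source_entries("製品Ｘ 仕様"):
    print(entry)
```

Recording the position as a source detail keeps the information needed by any priority rule that depends on word order in the query.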

ステップＳ１１０２において、検索条件保存部１３２に保存されている類似文書検索クエリの特徴語から単語を識別単語候補ソース表に加える。 In step S1102, words are added to the identification word candidate source table from the feature words of the similar document search query stored in the search condition storage unit 132.

例えば、図４の検索条件には類似文書検索クエリの特徴語として「製品Ｙ」「仕様」「画面」が含まれるため、「製品Ｙ」「仕様」「画面」という単語が得られる。これらの単語は、取得元を「類似文書検索クエリの特徴語」として識別単語候補ソース表に加える。 For example, the search conditions in FIG. 4 include "Product Y", "Specifications", and "Screen" as feature words of the similar document search query, so the words "Product Y", "Specifications", and "Screen" are obtained. These words are added to the identification word candidate source table with the source set to "feature word of similar document search query".

ステップＳ１１０３において、検索条件保存部１３２に保存されているキーワードフィルターから単語を識別単語候補ソース表に加える。 In step S1103, words are added to the identification word candidate source table from the keyword filter stored in the search condition storage unit 132.

例えば、図４の検索条件にはキーワードフィルターとして「画面」「モバイル」が含まれるため、「画面」「モバイル」という単語が得られる。これらの単語は、取得元を「キーワードフィルター」として識別単語候補ソース表に加える。 For example, the search conditions in Figure 4 include "screen" and "mobile" as keyword filters, so the words "screen" and "mobile" are obtained. These words are added to the identified word candidate source table with the acquisition source as "keyword filter."

ステップＳ１１０４において、検索結果保存部１３４に保存されている関連語から単語を識別単語候補ソース表に加える。 In step S1104, words are added to the identification word candidate source table from the related words stored in the search result storage unit 134.

例えば、図５の検索結果には関連語として「企画」「設計」「提案」が含まれるため、「企画」「設計」「提案」という単語が得られる。これらの単語は、取得元を「検索結果の関連語」として識別単語候補ソース表に加える。 For example, the search results in FIG. 5 include the related words "planning", "design", and "proposal", so the words "planning", "design", and "proposal" are obtained. These words are added to the identification word candidate source table with the source set to "related word of search results".

ステップＳ１１０５において、検索結果保存部１３４に保存されている文書一覧の特徴語（その文書において特徴的な単語であって、ｔｆ－ｉｄｆのような統計値から求められる単語）を識別単語候補ソース表に加える。 In step S1105, the characteristic words (words that are characteristic of the document and can be determined from statistical values such as tf-idf) in the document list stored in the search result storage unit 134 are added to the identification word candidate source table.
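The description mentions tf-idf only as one example of a statistic for finding feature words. The following Python sketch shows one common tf-idf variant (relative term frequency multiplied by the logarithm of the inverse document frequency); the function name `feature_words`, the pre-tokenized input, and the sample documents are all assumptions for illustration.

```python
import math
from collections import Counter


def feature_words(docs: list[list[str]], doc_index: int, k: int) -> list[str]:
    """Return the k highest tf-idf words of docs[doc_index].

    `docs` is a list of already tokenized documents.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    target = docs[doc_index]
    tf = Counter(target)
    scores = {
        word: (count / len(target)) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Highest score first; ties are broken by character code order.
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]


# Hypothetical tokenized corpus; "仕様" occurs in every document, so its idf is 0.
corpus = [
    ["設計", "画面", "モバイル", "設計", "仕様"],
    ["仕様", "製品"],
    ["仕様", "企画"],
]
print(feature_words(corpus, 0, 3))
```

Words that appear in every document receive a score of zero, which is why a ubiquitous word such as "仕様" does not surface as a feature word in this sample.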

例えば、図５の検索結果の一番目の文書には特徴語として「設計」「画面」「モバイル」が含まれるため、「設計」「画面」「モバイル」という単語が得られる。これらの単語は、取得元を「検索結果の特徴語」とし、検索スコアが最も高い文書から得た特徴語であるため取得元詳細を「文書の検索スコア１位」として識別単語候補ソース表に加える。検索結果の２番目、３番目の文書についても同様に特徴語を加えることができる。 For example, the first document in the search results in FIG. 5 has the feature words "design", "screen", and "mobile", so the words "design", "screen", and "mobile" are obtained. These words are added to the identification word candidate source table with the source set to "feature word of search results" and, because they come from the document with the highest search score, with the source details set to "document search score rank 1". Feature words of the second and third documents in the search results can be added in the same way.
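Steps S1102 to S1105 can be summarized in a short Python sketch that appends rows of (word, source, source details). The function name `build_source_entries`, the dictionary layout of the stored conditions and results, and the English label strings are hypothetical; the sample values are the ones quoted above from FIG. 4 and FIG. 5, and the document list is assumed to be ordered by descending search score.

```python
def build_source_entries(conditions: dict, results: dict) -> list[tuple[str, str, str]]:
    """Collect (word, source, source_detail) rows for steps S1102 to S1105."""
    rows = []
    for word in conditions["similar_doc_feature_words"]:           # S1102
        rows.append((word, "feature word of similar document search query", ""))
    for word in conditions["keyword_filters"]:                     # S1103
        rows.append((word, "keyword filter", ""))
    for word in results["related_words"]:                          # S1104
        rows.append((word, "related word of search results", ""))
    for rank, doc in enumerate(results["documents"], start=1):     # S1105
        for word in doc["feature_words"]:
            rows.append((word, "feature word of search results",
                         f"search score rank {rank}"))
    return rows


conditions = {
    "similar_doc_feature_words": ["製品Ｙ", "仕様", "画面"],
    "keyword_filters": ["画面", "モバイル"],
}
results = {
    "related_words": ["企画", "設計", "提案"],
    "documents": [{"feature_words": ["設計", "画面", "モバイル"]}],
}
rows = build_source_entries(conditions, results)
for row in rows:
    print(row)
```

The same word is deliberately allowed to appear in several rows ("モバイル" appears twice here), because each row later contributes its own priority increment.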

ステップＳ１１０６において、識別単語候補ソース表のエントリ一覧について繰り返す処理を開始する。 In step S1106, a loop over the entries of the identification word candidate source table begins.

ステップＳ１１０７において、取得元および検索操作保存部１３３に保存されている検索操作に応じて、優先度ルールから加算する優先度を計算（算出）する。 In step S1107, the priority to be added is calculated based on the priority rules according to the acquisition source and the search operation stored in the search operation storage unit 133.

例えば、図８の識別単語候補ソース表におけるエントリ８０１の単語「モバイル」の優先度は、図６の優先度ルール表（検索結果一覧）により以下のように計算される。まず取得元がキーワードフィルターであるため、ルール６０４より優先度は＋３００される。 For example, the priority of the word "mobile" in entry 801 of the identification word candidate source table in FIG. 8 is calculated as follows using the priority rule table (search result list) in FIG. 6. First, because the source is the keyword filter, the priority is increased by 300 according to rule 604.

また、検索操作保存部１３３に保存されている直近の検索操作の操作種別が「キーワードフィルターの追加」であり、かつ追加されたキーワードが「モバイル」であるため、ルール６０５より優先度は＋８００される。 In addition, since the operation type of the most recent search operation stored in the search operation storage unit 133 is "add keyword filter" and the added keyword is "mobile," the priority is increased by +800 based on rule 605.

一般に、ユーザーが最後に行った検索操作はユーザーが直前に興味を持った内容を反映していると考えられる。そのため、ユーザーが最後に行った検索操作に関連する識別単語は、優先度を上げて優先的に表示することが有益と考えられる。ルール６０１、６０２、６０３も同様に、ユーザーが直前に興味を持った内容の優先度を高めるためのルールである。 In general, it is believed that the last search operation performed by a user reflects the content in which the user was most recently interested. For this reason, it is believed to be beneficial to increase the priority of identification words related to the last search operation performed by the user and display them preferentially. Rules 601, 602, and 603 are also rules for increasing the priority of content in which the user was most recently interested.

最終的にはエントリ８０１の単語「モバイル」の優先度は１１００となる。 Finally, the priority of the word "mobile" in entry 801 becomes 1100.
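The calculation for entry 801 can be reproduced with the following Python sketch. Only the two increments quoted in the worked example (+300 and +800) are taken from the description; the function name `priority_increment`, the dictionary keys, and the way the rules are expressed as conditions are assumptions, since the rule table of FIG. 6 is not reproduced here.

```python
def priority_increment(entry: dict, last_operation: dict) -> int:
    """Sum the priority points that apply to one source-table entry (step S1107)."""
    points = 0
    # Source-based rule from the worked example: keyword filter words get +300.
    if entry["source"] == "keyword filter":
        points += 300
    # Recency rule from the worked example: the keyword added by the most
    # recent "add keyword filter" operation gets a further +800.
    if (last_operation["type"] == "add keyword filter"
            and last_operation["keyword"] == entry["word"]):
        points += 800
    return points


entry_801 = {"word": "モバイル", "source": "keyword filter"}
last_operation = {"type": "add keyword filter", "keyword": "モバイル"}
print(priority_increment(entry_801, last_operation))
```

A keyword filter word that was not part of the most recent operation, such as "画面" in this example, would receive only the source-based points under these assumptions.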

もう一つの例として、図８の識別単語候補ソース表におけるエントリ８０２の単語「設計」の優先度は以下のように計算される。まず取得元が「検索結果の特徴語」由来であるため、ルール６０４より優先度は＋５０される。 As another example, the priority of the word "design" in entry 802 of the identification word candidate source table in FIG. 8 is calculated as follows. First, since the source is "feature word of search results", the priority is increased by 50 according to rule 604.
さらに、取得元詳細において文書の検索スコア順位が１位であるため、ルール６０５より優先度は－１される。 Furthermore, since the source details indicate that the document ranks first by search score, the priority is reduced by 1 according to rule 605.

このルールは、検索順位が高いほど優先度を高くする（下げ幅を小さくする）というルールである。 This rule states that the higher the search ranking, the higher the priority (the smaller the drop).

検索順位が高い文書はユーザーの指定した検索条件への一致度合いが高く、ユーザーが興味を持つ可能性が高い文書といえる。そのため、検索順位が高い文書に特徴的に現れる識別単語は、優先度を上げて優先的に表示することが有益と考えられる。 Documents with high search rankings match the search criteria specified by the user to a high degree, and are therefore likely to be of interest to the user. For this reason, it is considered beneficial to give priority to identifying words that appear characteristically in documents with high search rankings, and display them preferentially.
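The rank-dependent adjustment can be sketched in Python as follows. The description gives only one data point (rank 1 yields +50 and then -1); the assumption made here, that the reduction equals the rank number, is one simple function consistent with "the higher the ranking, the smaller the drop" and is not stated in the source. The function name `feature_word_priority` is likewise hypothetical.

```python
def feature_word_priority(rank: int, base: int = 50) -> int:
    """Priority points for a feature word taken from the document at `rank`.

    Assumption: the reduction equals the rank (rank 1 -> -1, rank 2 -> -2, ...).
    """
    if rank < 1:
        raise ValueError("rank is 1-based")
    return base - rank


for rank in (1, 2, 3):
    print(rank, feature_word_priority(rank))
```

Any monotonically increasing penalty would preserve the stated property; the linear form is chosen here only because it matches the single example value.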

ステップＳ１１０８では、識別単語候補表におけるその単語エントリの優先度を更新する。その際の優先度の値は、識別単語候補表に既にその単語が存在する場合、既存の優先度とステップＳ１１０７で求めた値を足したものである。一方、識別単語候補表にその単語が存在していない場合、ステップＳ１１０７で求めた値を優先度の初期値として単語を追加する。 In step S1108, the priority of the word entry in the identification word candidate table is updated. If the word already exists in the identification word candidate table, the priority value is the sum of the existing priority and the value determined in step S1107. On the other hand, if the word does not exist in the identification word candidate table, the word is added with the value determined in step S1107 as the initial priority value.

例えば、図８の識別単語候補ソース表において、単語「モバイル」は取得元が「キーワードフィルター」であるエントリ８０１と、取得元が「検索結果文書の特徴語」であるエントリ８０３の２つのエントリがある。 For example, in the identification word candidate source table of FIG. 8, the word "mobile" has two entries: entry 801, whose source is "keyword filter", and entry 803, whose source is "feature word of search result documents".
エントリ８０１において優先度が１１００得られ、エントリ８０３において優先度が４９得られたとすると、最終的に単語「モバイル」の優先度はそれらの和である１１４９となる。 If entry 801 yields a priority of 1100 and entry 803 yields a priority of 49, the final priority of the word "mobile" is their sum, 1149.
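The update rule of step S1108 amounts to summing per-entry increments by word. A minimal Python sketch, with the hypothetical function name `accumulate_priorities` and the increments taken from the worked examples above:

```python
def accumulate_priorities(increments: list[tuple[str, int]]) -> dict[str, int]:
    """Build the identification word candidate table from per-entry increments."""
    table: dict[str, int] = {}
    for word, points in increments:
        # New word: points become the initial priority.
        # Existing word: points are added to the existing priority.
        table[word] = table.get(word, 0) + points
    return table


# Entry 801 ("モバイル", 1100), entry 802 ("設計", 49), entry 803 ("モバイル", 49).
table = accumulate_priorities([("モバイル", 1100), ("設計", 49), ("モバイル", 49)])
print(table)
```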

ステップＳ１１０９では、識別単語の一覧に未処理の単語が残っていれば処理をステップＳ１１０６に戻し、全て処理が終了していればステップＳ１１１０に進む。 In step S1109, if there are unprocessed words remaining in the list of identified words, the process returns to step S1106, and if all processing has been completed, the process proceeds to step S1110.

ステップＳ１１１０では識別単語候補表を優先度の降順に並べ替える。なお、同一の優先度である単語はどのような順番にしてもよい。順番を一意にしたい場合、単語の文字コード順にしてもよい。 In step S1110, the identification word candidate table is sorted in descending order of priority. Note that words with the same priority may be sorted in any order. If a unique order is desired, the order may be in the order of the word's character code.
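The ordering of step S1110, including the optional tie-break by character code, can be written as a single sort with a composite key. The function name `sort_candidates` is hypothetical, and the priority of "画面" in the sample is invented to produce a tie.

```python
def sort_candidates(table: dict[str, int]) -> list[tuple[str, int]]:
    """Sort by descending priority; ties are ordered by character code."""
    # Python compares strings code point by code point, which gives the
    # "character code order" tie-break mentioned in the description.
    return sorted(table.items(), key=lambda item: (-item[1], item[0]))


ordered = sort_candidates({"設計": 49, "モバイル": 1149, "画面": 49})
print(ordered)
```

Adding the tie-break makes the output deterministic, which is convenient when the same candidate list is requested repeatedly by the client terminal.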

最終的な識別単語候補表は図９のようになる。仮に検索処理のステップＳ１００６で識別単語簡易選択の動作モードがＮｏであり、文書検索システムが優先度上位１単語を識別単語として選択する設定になっている場合、識別単語は「モバイル」となる。これは単純に検索クエリから識別単語を選んだ場合とは別の結果になる。こうして、直近の検索操作に関係が強いキーワードフィルターの単語など、ユーザーが興味を持つ可能性が高い単語を自動で優先して識別表示することができる。 The final identification word candidate table will look like that shown in Figure 9. If the operation mode for simple identification word selection is No in step S1006 of the search process, and the document search system is set to select the top priority word as the identification word, the identification word will be "mobile." This will be a different result from when an identification word is simply selected from the search query. In this way, it is possible to automatically prioritize and display words that are likely to interest the user, such as words in the keyword filter that are closely related to the most recent search operation.

識別単語候補作成処理は、検索処理の一部として呼ばれるだけでなく、クライアント端末からのリクエストに応じて単体で実行されることもある。ユーザーが識別単語を手動入力する際、候補をユーザーに提示して識別単語を簡易に入力できるようにすることを目的とする。 The identification word candidate creation process is not only called as part of the search process, but can also be executed independently in response to a request from a client terminal. The purpose is to present candidates to the user when manually inputting an identification word, making it easy to input the identification word.

図１３を例として説明する。クライアント端末に検索結果画面が表示されているとき、ユーザーはクライアント端末を操作し、識別単語を手動で入力するフォーム１３０１にフォーカスを当てる。このとき、クライアント端末は文書検索システムへ識別単語候補表をリクエストする。リクエストを受け取った文書検索システムは識別単語候補作成処理を実施し、識別単語候補表をクライアント端末へ返す。なお、この場合には作成された識別単語候補表を上位だけに絞り込むことをせず、全件クライアント端末へ返す。なお、現在の検索結果において識別単語になっている単語のエントリを返さないこともできる。 An explanation will be given using Figure 13 as an example. When the search result screen is displayed on the client terminal, the user operates the client terminal and focuses on form 1301 for manually entering identification words. At this time, the client terminal requests an identification word candidate table from the document search system. Upon receiving the request, the document search system performs an identification word candidate creation process and returns the identification word candidate table to the client terminal. Note that in this case, the created identification word candidate table is not narrowed down to just the top entries, but the entire table is returned to the client terminal. It is also possible not to return entries for words that are identification words in the current search results.
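The server-side handling of the candidate request can be sketched as follows in Python. The function name `candidates_for_form` and its parameters are hypothetical; the sketch only captures the two behaviours stated above: the full table is returned without truncation, and words that are already identification words may optionally be left out.

```python
def candidates_for_form(
    sorted_candidates: list[tuple[str, int]],
    current_identification_words: tuple[str, ...] = (),
    exclude_current: bool = False,
) -> list[tuple[str, int]]:
    """Return the whole candidate table, optionally without current identification words."""
    if not exclude_current:
        return list(sorted_candidates)
    current = set(current_identification_words)
    return [(word, prio) for word, prio in sorted_candidates if word not in current]


# Hypothetical candidate table, already sorted by descending priority.
table = [("モバイル", 1149), ("画面", 49), ("設計", 49)]
print(candidates_for_form(table))
print(candidates_for_form(table, ("モバイル",), exclude_current=True))
```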

識別単語候補表を受け取ったクライアント端末は、フォーム１３０１の下部に識別単語候補一覧１３０２を表示する。識別単語候補一覧１３０２は、識別単語候補表の単語を並べたものである。上に表示される単語ほど優先度が高く、上下が同じ位置では左に表示される単語ほど優先度が高い。 Upon receiving the identification word candidate table, the client terminal displays an identification word candidate list 1302 at the bottom of form 1301. The identification word candidate list 1302 lists the words in the identification word candidate table. Words displayed higher have higher priority, and among words at the same vertical position, words displayed further to the left have higher priority.

ユーザーがクライアント端末を操作して識別単語候補一覧１３０２に含まれる単語をクリックすると、フォーム１３０１にその単語が入力される。それと同時に、クライアント端末は、クリックされた単語を識別単語として現在と同じ検索条件で検索するよう、文書検索システムへリクエストを送る。その結果、文書検索システムにおいて検索処理が行われ、クライアント端末に検索結果が返ってくる。クライアント端末は返ってきた検索結果をもとに、検索結果の表示を更新する。なお、検索条件が同じであるため、検索結果の文書および順位は同一になり、実際にはスニペットの表示だけ更新されることになる。 When the user operates the client terminal and clicks a word in the identification word candidate list 1302, that word is entered into form 1301. At the same time, the client terminal sends a request to the document search system to run a search with the clicked word as the identification word and with the same search conditions as the current ones. As a result, the search process is performed in the document search system, and the search results are returned to the client terminal. The client terminal updates the display of the search results based on the returned results. Note that because the search conditions are the same, the documents and their ranking in the search results are unchanged, and in practice only the snippet display is updated.

例えば図１３において単語１３０３である「モバイル」がクリックされたとき、検索画面は図１４のように変化する。すなわち、検索結果の文書は同一であり、スニペットの表示が「モバイル」周辺の文字列に変化する。 For example, when the word 1303 "mobile" in FIG. 13 is clicked, the search screen changes to that shown in FIG. 14. In other words, the document in the search results is the same, and the snippet display changes to the character string surrounding "mobile."

さらに、クライアント端末に表示される画面として、特定の文書の詳細を表す図１５のような検索結果詳細画面も存在する。 In addition, a search result details screen like that shown in Figure 15 can be displayed on the client terminal, showing details of a specific document.

これは、検索リクエストの返り値である検索結果として得られた文書から、１件の文書のみを詳細に表示する画面である。この画面によりユーザーは興味を持った文書について、本文をより詳細に確認することができる。 This is a screen that displays in detail only one of the documents returned as the result of a search request. This screen allows users to examine the body of a document that interests them in more detail.
検索結果詳細画面においても、ユーザーが識別単語を手動で設定し、識別単語の周辺文脈を確認する機能がある。このときも、文書検索システムは識別単語の候補をユーザーに提示してユーザーが識別単語を入力できるよう支援を行う。 The search result details screen also has a function that allows the user to manually set an identification word and check its surrounding context. Here too, the document search system assists the user by presenting identification word candidates so that the user can input an identification word.

識別単語候補提示の流れは、検索結果一覧画面のものと類似している。 The flow of presenting identification word candidates is similar to that of the search result list screen.
ユーザーがクライアント端末を操作し、識別単語を手動で入力するフォーム１５０１にフォーカスを当てたとき、クライアント端末は文書検索システムへ識別単語候補表をリクエストする。ただしこのとき、クライアント端末は優先度ルール表（検索結果詳細）を用いて優先度を決定するようにリクエストを行う。文書検索システムは優先度ルール表（検索結果詳細）を用いて求めた識別単語候補表をクライアント端末へ返す。識別単語候補表を受け取ったクライアント端末は、フォーム１５０１の下部に識別単語候補一覧１５０２を表示する。 When a user operates the client terminal and focuses on form 1501 for manually inputting an identification word, the client terminal requests an identification word candidate table from the document search system. In this case, however, the client terminal requests that the priority be determined using the priority rule table (search result details). The document search system returns the identification word candidate table obtained using the priority rule table (search result details) to the client terminal. Upon receiving the identification word candidate table, the client terminal displays an identification word candidate list 1502 at the bottom of form 1501.

ユーザーがクライアント端末を操作して識別単語候補一覧１５０２に含まれる単語をクリックすると、フォーム１５０１にその単語が入力される。それと同時に、クライアント端末は、クリックされた単語を識別単語として、現在詳細表示している文書の文書ＩＤを検索条件として検索するよう、文書検索システムへリクエストを送る。その結果文書検索システムにおいて検索処理が行われ、当該文書１件のスニペットを含む検索結果がクライアント端末に返される。端末は返された検索結果をもとに、本文の表示をスニペットに置き換える。なお、検索システムは本文以外でも、タイトルのようなテキスト項目からスニペットを抽出できる構成にすることが可能であり、本文以外のフィールドの表示を抽出されたスニペットで置き換えることができる。 When the user operates the client terminal and clicks a word in the identification word candidate list 1502, that word is entered into form 1501. At the same time, the client terminal sends a request to the document search system to run a search with the clicked word as the identification word and with the document ID of the document currently displayed in detail as the search condition. As a result, the search process is performed in the document search system, and search results containing a snippet for that one document are returned to the client terminal. Based on the returned search results, the client terminal replaces the displayed body text with the snippet. Note that the search system can also be configured to extract snippets from text fields other than the body, such as the title, in which case the display of those fields can likewise be replaced with the extracted snippets.

例えば図１５において識別単語候補一覧にある単語１５０３「モバイル」がクリックされると、図１６のように本文が「モバイル」周辺の文字列を表示するよう変化する。タイトルは「モバイル」を含まないため、空欄を表示する。 For example, when the word 1503 "mobile" in the list of identified word candidates in FIG. 15 is clicked, the text changes to show the characters around "mobile" as shown in FIG. 16. Because the title does not include "mobile," a blank is displayed.

検索結果詳細に関する識別単語候補表の作成処理について説明する。基本的には検索結果一覧によるものと同様、図１１のフローチャートに従って処理を行う。ただし、ステップＳ１１０５において、詳細表示対象の文書の特徴語のみを識別単語候補ソース表に加える対象として、それ以外の文書の特徴語は加えない。他の文書の関連語が詳細表示中の文書に関係ある可能性は低いためである。また、ステップＳ１１０７において優先度ルールを計算する際、優先度ルール表（検索結果詳細）１３７を用いて計算を行う。図７に優先度ルール表（検索結果詳細）１３７の例を記載している。基本的には優先度ルール表（検索結果一覧）１３６と同じであるが、検索結果の特徴語に関してはルール７０１のように、取得元が当該文書の特徴語である場合に適用されるルールとなる。また、検索スコアの順位は利用しない。 The process of creating an identification word candidate table for search result details will be described. Basically, the process is performed according to the flowchart in FIG. 11, similar to the process for the search result list. However, in step S1105, only the characteristic words of the document to be displayed in detail are added to the identification word candidate source table, and characteristic words of other documents are not added. This is because related words of other documents are unlikely to be related to the document being displayed in detail. In addition, when calculating the priority rule in step S1107, the calculation is performed using the priority rule table (search result details) 137. An example of the priority rule table (search result details) 137 is shown in FIG. 7. It is basically the same as the priority rule table (search result list) 136, but for the characteristic words of the search results, the rule is applied when the source is a characteristic word of the document, as in rule 701. In addition, the search score ranking is not used.
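The change to step S1105 for the details screen can be sketched as a filter on the document being displayed. The function name `detail_feature_entries`, the document IDs, the dictionary layout, and the English source label are hypothetical; the feature words of the second document are invented for the example.

```python
def detail_feature_entries(documents: list[dict],
                           target_doc_id: str) -> list[tuple[str, str, str]]:
    """Source-table rows for the details screen: feature words of the target document only."""
    return [
        # Source details stay empty because the search score rank is not used here.
        (word, "feature word of this document", "")
        for doc in documents
        if doc["id"] == target_doc_id
        for word in doc["feature_words"]
    ]


documents = [
    {"id": "doc1", "feature_words": ["設計", "画面", "モバイル"]},
    {"id": "doc2", "feature_words": ["企画"]},
]
print(detail_feature_entries(documents, "doc1"))
```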

以上説明した通り、本発明では、検索クエリとして指定された単語だけでなく、検索された文書から取得される、当該文書を特徴付ける単語や、検索された文書に関連する単語についても識別表示の対象とすることが可能となるため、文書検索を行ったユーザは、検索結果を効率的に確認することが可能となる。 As described above, the present invention makes it possible to distinguish and display not only the words specified as the search query, but also words that characterize the searched document and words related to the searched document, allowing users who perform document searches to efficiently check the search results.

本発明は、例えば、システム、装置、方法、プログラムもしくは記録媒体等としての実施態様をとることが可能である。具体的には、複数の機器から構成されるシステムに適用しても良いし、また、一つの機器からなる装置に適用しても良い。 The present invention can be embodied, for example, as a system, device, method, program, or recording medium. Specifically, it may be applied to a system made up of multiple devices, or to an apparatus made up of a single device.

また、本発明におけるプログラムは、図１０、図１１に示すフローチャートの処理方法をコンピュータが実行可能なプログラムであり、本発明の記憶媒体は図１０、図１１の処理方法をコンピュータが実行可能なプログラムが記憶されている。なお、本発明におけるプログラムは図１０、図１１の各装置の処理方法ごとのプログラムであってもよい。 The program of the present invention is a program that enables a computer to execute the processing method of the flowcharts shown in Figures 10 and 11, and the storage medium of the present invention stores a program that enables a computer to execute the processing method of Figures 10 and 11. Note that the program of the present invention may be a program for each processing method of each device in Figures 10 and 11.

以上のように、前述した実施形態の機能を実現するプログラムを記録した記録媒体を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはＣＰＵやＭＰＵ）が記録媒体に格納されたプログラムを読み出し、実行することによっても本発明の目的が達成されることは言うまでもない。 As described above, it goes without saying that the object of the present invention can be achieved by supplying a recording medium on which a program that realizes the functions of the above-mentioned embodiments is recorded to a system or device, and having the computer (or CPU or MPU) of that system or device read and execute the program stored on the recording medium.

この場合、記録媒体から読み出されたプログラム自体が本発明の新規な機能を実現することになり、そのプログラムを記録した記録媒体は本発明を構成することになる。 In this case, the program read from the recording medium itself realizes the novel functions of the present invention, and the recording medium on which the program is recorded constitutes the present invention.

プログラムを供給するための記録媒体としては、例えば、フレキシブルディスク、ハードディスク、光ディスク、光磁気ディスク、ＣＤ－ＲＯＭ、ＣＤ－Ｒ、ＤＶＤ－ＲＯＭ、磁気テープ、不揮発性のメモリカード、ＲＯＭ、ＥＥＰＲＯＭ、シリコンディスク等を用いることが出来る。 Recording media for supplying the program may include, for example, a flexible disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a DVD-ROM, a magnetic tape, a non-volatile memory card, a ROM, an EEPROM, a silicon disk, etc.

また、コンピュータが読み出したプログラムを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムの指示に基づき、コンピュータ上で稼働しているＯＳ（オペレーティングシステム）等が実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。 Furthermore, it goes without saying that not only are the functions of the above-mentioned embodiments realized by the computer executing a program it has read, but also that the functions of the above-mentioned embodiments can be realized by an operating system (OS) or the like running on the computer carrying out some or all of the actual processing based on the instructions of the program.

さらに、記録媒体から読み出されたプログラムが、コンピュータに挿入された機能拡張ボードやコンピュータに接続された機能拡張ユニットに備わるメモリに書き込まれた後、そのプログラムコードの指示に基づき、その機能拡張ボードや機能拡張ユニットに備わるＣＰＵ等が実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。 Furthermore, it goes without saying that this also includes cases where a program read from a recording medium is written into a memory provided on a function expansion board inserted into a computer or a function expansion unit connected to a computer, and then a CPU or the like provided on the function expansion board or function expansion unit performs some or all of the actual processing based on the instructions of the program code, thereby realizing the functions of the above-mentioned embodiments.

また、本発明は、複数の機器から構成されるシステムに適用しても、ひとつの機器から成る装置に適用しても良い。また、本発明は、システムあるいは装置にプログラムを供給することによって達成される場合にも適応できることは言うまでもない。この場合、本発明を達成するためのプログラムを格納した記録媒体を該システムあるいは装置に読み出すことによって、そのシステムあるいは装置が、本発明の効果を享受することが可能となる。 The present invention may be applied to a system made up of multiple devices, or to a device made up of a single device. Needless to say, the present invention can also be applied to cases where the effects of the present invention are achieved by supplying a program to a system or device. In this case, the effects of the present invention can be enjoyed by reading a recording medium that stores a program for achieving the present invention into the system or device.

さらに、本発明を達成するためのプログラムをネットワーク上のサーバ、データベース等から通信プログラムによりダウンロードして読み出すことによって、そのシステムあるいは装置が、本発明の効果を享受することが可能となる。なお、上述した各実施形態およびその変形例を組み合わせた構成も全て本発明に含まれるものである。 Furthermore, by downloading and reading a program for achieving the present invention from a server, database, etc. on a network using a communication program, the system or device can enjoy the effects of the present invention. Note that the present invention also includes configurations that combine the above-mentioned embodiments and their variations.

１００文書検索システム 100 Document search system

Claims

1. An information processing system comprising:
search means for searching documents using a search query specified by a user;
display control means for controlling display of a result of the search by the search means; and
feature word acquisition means for acquiring, for each document retrieved by the search means, feature words that characterize the document,
wherein the display control means performs control such that a word selected, according to a priority calculated based on information including the acquisition source, from among identification word candidates including words obtained from the search query specified by the user and feature words acquired by the feature word acquisition means is associated with the document as an identification word for the document and displayed in an identifiable manner.

2. The information processing system according to claim 1, characterized in that the feature word acquisition means acquires feature words in the document based on the tf-idf values of words contained in the document.

3. The information processing system according to claim 1, wherein the display control means controls so that a predetermined number of identification words are identifiable and displayed in descending order of priority.

4. The information processing system according to claim 1, further comprising reception means for receiving an identification word designated by a user,
wherein the display control means performs control such that, when the reception means receives a designation of an identification word from the user, that identification word is displayed in an identifiable manner.

5. The information processing system according to claim 4, wherein the display control means performs control to display candidates for the identification word, and
the reception means receives the designation of an identification word by receiving a selection, from among the displayed identification word candidates, of a word to be displayed in an identifiable manner.

6. The information processing system according to any one of claims 1 to 5, characterized in that the display control means controls the display of a character string surrounding an identification word that is to be displayed in an identifiable manner from documents searched by the search means as a snippet.

7. The information processing system according to claim 6, wherein the display control means displays the identification word included in the snippet in an identifiable manner.

8. An information processing method in an information processing system, comprising:
a search step in which search means of the information processing system searches documents using a search query specified by a user;
a display control step in which display control means of the information processing system controls display of a result of the search performed in the search step; and
a feature word acquisition step in which feature word acquisition means of the information processing system acquires, for each document retrieved in the search step, feature words that characterize the document,
wherein the display control step performs control such that a word selected, according to a priority calculated based on information including the acquisition source, from among identification word candidates including words obtained from the search query specified by the user and feature words acquired in the feature word acquisition step is associated with the document as an identification word for the document and displayed in an identifiable manner.

9. A program for causing a computer to function as each of the means according to any one of claims 1 to 7.