JP7652480B1 - Information processing system, information processing method, and program

Info

Publication number: JP7652480B1
Application number: JP2025019025A
Authority: JP
Inventors: 敦光岡
Original assignee: Individual
Current assignee: Individual
Priority date: 2025-02-07
Filing date: 2025-02-07
Publication date: 2025-03-27
Anticipated expiration: 2045-02-07

Abstract

[Problem] To enable effective selection of videos.
[Solution] An information processing system comprising: a video data storage unit that stores a plurality of items of video data; a question acquisition unit that acquires a question from a user; and a playback video determination unit that gives a large language model a prompt including information about the video data, the question, and an instruction to select the video data to provide in response to the question, thereby determining, from among the stored video data, the video data with which to answer the question.
[Selected Figure] Figure 1

Description

The present invention relates to an information processing system, an information processing method, and a program.

Systems that answer questions from users have been provided (see, for example, Patent Document 1).

Patent Document 1: JP 2006-039881 A

In recent years there has been a demand to provide information through video rather than text, but it is difficult to answer user questions appropriately with video.

The present invention was made in view of this background, and aims to provide a technology that enables videos to be selected effectively.

The main aspect of the present invention for solving the above problem is an information processing system comprising: a video data storage unit that stores a plurality of items of video data; a question acquisition unit that acquires a question from a user; and a playback video determination unit that gives a large language model a prompt including information about the video data, the question, and an instruction to select the video data to provide in response to the question, and thereby determines, from among the stored video data, the video data with which to answer the question.

Other problems disclosed in this application, and their solutions, are made clear in the description of the embodiments and in the drawings.

According to the present invention, videos can be selected effectively.

FIG. 1 is a diagram illustrating an example of the overall configuration of the information processing system.
FIG. 2 is a diagram illustrating an example of the hardware configuration of the management server 2.
FIG. 3 is a diagram illustrating an example of the software configuration of the management server 2.
FIG. 4 is a diagram illustrating the operation of the management server 2.

<System Overview>
An information processing system according to an embodiment of the present invention is described below. The information processing system of this embodiment answers questions from a user with videos, and uses a large language model (LLM) to determine which of a plurality of prepared videos to provide to the user and in what order.

FIG. 1 is a diagram showing an example of the overall configuration of the information processing system. The information processing system of this embodiment includes a management server 2. The management server 2 is communicably connected to a user terminal 1 via a communication network. The communication network is, for example, the Internet, and is built from a public telephone network, a mobile phone network, a wireless communication path, Ethernet (registered trademark), or the like.

The user terminal 1 is a computer operated by a user. The user terminal 1 may be, for example, a smartphone, a tablet computer, or a personal computer.

The management server 2 may be a general-purpose computer such as a workstation or a personal computer, or may be realized logically by cloud computing.

<Management Server>
FIG. 2 is a diagram showing an example of the hardware configuration of the management server 2. The illustrated configuration is only an example, and other configurations may be used. The management server 2 includes a CPU 201, a memory 202, a storage device 203, a communication interface 204, an input device 205, and an output device 206. The storage device 203 stores various data and programs and is, for example, a hard disk drive, a solid state drive, or a flash memory. The communication interface 204 is an interface for connecting to a communication network and is, for example, an adapter for connecting to Ethernet (registered trademark), a modem for connecting to a public telephone network, a wireless communication device for wireless communication, or a USB (Universal Serial Bus) connector or RS232C connector for serial communication. The input device 205 is used to input data and is, for example, a keyboard, a mouse, a touch panel, a button, or a microphone. The output device 206 is used to output data and is, for example, a display, a printer, or a speaker. Each functional unit of the management server 2 described below is realized by the CPU 201 reading a program stored in the storage device 203 into the memory 202 and executing it, and each storage unit of the management server 2 is realized as part of the storage area provided by the memory 202 and the storage device 203.

FIG. 3 is a diagram showing an example of the software configuration of the management server 2. The management server 2 includes a video data storage unit 231, a question acquisition unit 211, a playback video determination unit 212, and a playback processing unit 213.

<Storage Unit>
The video data storage unit 231 stores video data. In this embodiment, the video data storage unit 231 stores, together with the video data, information about the video data (hereinafter, video information). The video information can include any information (metadata) representing the content, attributes, and the like of the video data: for example, information identifying the video (such as a video ID), the video title, a description of the video, and the video data itself. The metadata in the video information can also include the playback time of the video, its creation date and time, its creator, the attributes of its target viewers, its category, and keywords related to it. The metadata can further include a thumbnail image representing the content of the video, subtitle data, and transcript text generated from the audio of the video. Note that the video information need not contain the video data itself: the video data may be stored as, for example, a file, with the video information holding information for acquiring that file (hereinafter, file acquisition information; for example, a path on a file system or a URL identifying data in cloud storage), and the video data may then be acquired based on the file acquisition information. The video data storage unit 231 stores a plurality of items of video data (and the video information for each). The title in the video information may be set as the file name. The description of a video is assumed to be text data, but may also be video, audio, or the like.
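As a minimal sketch of how such video information might be held in memory, the following uses a Python dataclass. The class names, field names, and the in-memory dictionary are assumptions made for illustration; the embodiment does not prescribe a schema or a storage technology.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class VideoInfo:
    """Video information for one stored video (field names are illustrative)."""
    video_id: str
    title: str
    description: str = ""
    # File acquisition information: a file-system path or a cloud-storage URL.
    file_location: Optional[str] = None
    duration_sec: Optional[float] = None
    created_at: Optional[datetime] = None
    creator: Optional[str] = None
    category: Optional[str] = None
    keywords: list[str] = field(default_factory=list)
    transcript: Optional[str] = None


class VideoStore:
    """In-memory stand-in for the video data storage unit 231."""

    def __init__(self) -> None:
        self._videos: dict[str, VideoInfo] = {}

    def register(self, info: VideoInfo) -> None:
        self._videos[info.video_id] = info

    def get(self, video_id: str) -> VideoInfo:
        return self._videos[video_id]

    def all(self) -> list[VideoInfo]:
        return list(self._videos.values())


store = VideoStore()
store.register(VideoInfo(
    video_id="v001",
    title="How to reset your password",
    description="Steps on the login screen.",
    file_location="/videos/v001.mp4",  # hypothetical path
    duration_sec=30.0,
))
```

Keeping only `file_location` in the record, rather than the video bytes, corresponds to the file acquisition information variant described above.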

This embodiment assumes that a large number of short items of video data (for example, about 15 seconds to 1 minute each) are registered in the video data storage unit 231. For example, one item of video data can be registered per topic or subtopic. As described below, this embodiment dynamically generates an answer by joining these short items of video data together.

<Functional Units>
The question acquisition unit 211 acquires a question from a user. The question can be received from, for example, the user terminal 1. Alternatively, a staff member who received a question from the user may enter that question, together with a designation of the user, on the staff member's user terminal 1. The question acquisition unit 211 can, for example, acquire as the question a message that the user entered into a bot or the like.

The playback video determination unit 212 determines the video data to be played as the answer to the user's question. The playback video determination unit 212 gives the large language model a prompt including information about the video data, the question, and an instruction to select the video data to provide in response to the question, and thereby determines, from among the stored video data, the video data with which to answer the question.

The information about the video data can be at least one of the title and the description included in the video information. It may also include the various metadata described above. The information about the video data included in the prompt can, for example, cover all of the video data stored in the video data storage unit 231.
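The three prompt components named above (information about the video data, the question, and the selection instruction) can be assembled as in the following sketch. The dictionary keys, the wording of the instruction, and the requested answer format are assumptions for illustration only.

```python
def build_selection_prompt(videos, question):
    """Assemble a prompt from video information, the question, and an instruction.

    `videos` is a list of dicts with hypothetical keys "id", "title",
    and (optionally) "description".
    """
    lines = ["You are given a catalog of short videos."]
    for v in videos:
        lines.append(
            f'- id={v["id"]} | title={v["title"]} | '
            f'description={v.get("description", "")}'
        )
    lines.append("")
    lines.append(f"Question: {question}")
    lines.append(
        "Instruction: select the videos that should be provided in response "
        "to the question. Answer with a comma-separated list of ids."
    )
    return "\n".join(lines)


videos = [
    {"id": "v001", "title": "Reset password", "description": "Login screen steps"},
    {"id": "v002", "title": "Update billing"},
]
prompt = build_selection_prompt(videos, "How do I reset my password?")
```

Listing every stored video keeps the sketch simple; with a large catalog the prompt may exceed the model's context limit, in which case some pre-filtering would be needed.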

The large language model can, for example, be provided in the management server 2. In this case, the management server 2 stores a program for running the large language model in the storage device 203, and the CPU 201 realizes the large language model by executing that program. The large language model may instead be provided in a server other than the management server 2. In this case, the management server 2 can access the large language model through an external API (Application Programming Interface). For example, the management server 2 can send an HTTP (Hypertext Transfer Protocol) request to a predetermined URL (Uniform Resource Locator) and obtain the processing result of the large language model as the response. Examples of usable large language models include GPT-3 (Generative Pre-trained Transformer 3), PaLM (Pathways Language Model), Chinchilla, and OPT (Open Pre-trained Transformer). Fine-tuned versions of these large language models may also be used.
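For the external-API case, the HTTP exchange might look like the following sketch, which uses only the Python standard library. The endpoint URL and the JSON request and response shapes are hypothetical; a real provider defines its own URL, authentication, and payload format.

```python
import json
import urllib.request

# Hypothetical endpoint; not a real service.
LLM_ENDPOINT = "https://llm.example.com/v1/generate"


def build_llm_request(prompt, endpoint=LLM_ENDPOINT):
    """Build a POST request carrying the prompt as JSON (assumed payload shape)."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_llm(prompt, endpoint=LLM_ENDPOINT, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_llm_request(prompt, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_llm_request("Which videos answer: how do I reset my password?")
```

Separating request construction from the network call makes the payload easy to inspect without contacting any server.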

The playback video determination unit 212 can determine not only the video data to be played but also its playback order.

The playback video determination unit 212 can use various methods to determine the playback order of the video data provided in response to a question. For example, it can compare the content of the selected video data with the content of the question and arrange the video data in the order in which matters related to the question are explained. It can also compare the contents of the selected items of video data with one another and arrange them so that items with related content are consecutive. It can further determine the playback order based on a priority, importance, or the like set in advance for the video data.

In this embodiment, the playback video determination unit 212 determines the playback order using the large language model. For example, the playback video determination unit 212 can give the large language model a prompt including information about the video data, the question, and an instruction to select the video data to provide in response to the question and to decide its playback order, and thereby determine, from among the stored video data, the video data with which to answer the question and the order in which to play it.

The playback order determined by the playback video determination unit 212 can be expressed in various formats. For example, it can be expressed as a sequence of numbers assigned to the video data; in this case, the playback video determination unit 212 assigns numbers to the selected items of video data in the order in which they should be played. The playback order can also be expressed as a list of video data file names; in this case, the playback video determination unit 212 generates a list of the file names of the selected video data arranged in playback order. The playback order can further be expressed as a list of playback start and end times; in this case, the playback video determination unit 212 determines a playback start time and a playback end time for each selected item of video data and generates a list of them arranged in playback order.
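The three representations can be derived from one ordered list of video IDs, as in the sketch below. The function names are illustrative, and the start and end times are interpreted here as cumulative offsets on a single combined timeline, which is one possible reading of the description rather than something the source specifies.

```python
def as_numbered(ordered_ids):
    """Map each video id to its 1-based playback number."""
    return {video_id: n for n, video_id in enumerate(ordered_ids, start=1)}


def as_file_list(ordered_ids, file_names):
    """Return the file names arranged in playback order."""
    return [file_names[video_id] for video_id in ordered_ids]


def as_timeline(ordered_ids, durations):
    """Return (id, start, end) tuples, with times accumulated in playback order."""
    timeline = []
    start = 0.0
    for video_id in ordered_ids:
        end = start + durations[video_id]
        timeline.append((video_id, start, end))
        start = end
    return timeline


order = ["b", "a"]
numbered = as_numbered(order)
files = as_file_list(order, {"a": "a.mp4", "b": "b.mp4"})
timeline = as_timeline(order, {"a": 30.0, "b": 15.0})
```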

In addition to selecting the video data to be played, the playback video determination unit 212 may generate the reason for selecting each video (that is, the reason why the video should be viewed in light of the question). The playback video determination unit 212 can have the large language model generate the reason by giving it a prompt that includes an instruction to explain why the video should be viewed. In this case, the playback video determination unit 212 can give the large language model a prompt including information about the video data, the question, and an instruction to work out both the video data to provide in response to the question and the reason for it. Alternatively, the reason may be generated with a separate prompt after the videos have been selected. In this case, the playback video determination unit 212 can give the large language model a prompt including the question, information about the selected video data, and an instruction to explain why that video data was selected. The playback video determination unit 212 may also include in a single prompt an instruction to select the video data and to work out both the reason for the selection and the playback order.
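When selection, order, and reasons are requested in one prompt, the reply has to be parsed back into structured data. The sketch below assumes the model was instructed to reply with a JSON array of objects having "id" and "reason" keys, listed in playback order; that output format is an assumption of this example, not something the embodiment fixes.

```python
import json


def parse_selection(llm_text, known_ids):
    """Parse the reply into [(video_id, reason), ...] in playback order.

    Entries whose id is not in `known_ids`, and repeated ids, are dropped,
    which guards against ids the model made up.
    """
    items = json.loads(llm_text)
    result = []
    seen = set()
    for item in items:
        video_id = item.get("id")
        if video_id in known_ids and video_id not in seen:
            seen.add(video_id)
            result.append((video_id, item.get("reason", "")))
    return result


reply = (
    '[{"id": "v2", "reason": "Shows where the reset link is."},'
    ' {"id": "v9", "reason": "Unknown video."},'
    ' {"id": "v1", "reason": "Explains password rules."}]'
)
selection = parse_selection(reply, known_ids={"v1", "v2", "v3"})
```

A reply that is not valid JSON raises `json.JSONDecodeError`; a caller would need to decide whether to retry or fall back.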

The playback video determination unit 212 may also instruct the large language model to analyze the relevance between keywords in the title or description of the selected video data and keywords in the question, and to generate sentences containing the highly relevant keywords as part of the reason. The playback video determination unit 212 can also instruct the large language model to generate text summarizing the content of the selected video data, compare that text with the question, extract the portions suited to explaining the content of the question, and generate the extracted portions as part of the reason. The playback video determination unit 212 can further instruct the large language model to refer to the titles and descriptions of other video data related to the selected video data and analyze their relevance to the question, thereby generating the reason why the selected video data is appropriate as an answer to the question.

The playback processing unit 213 performs processing for playing the video data in the playback order. The playback processing unit 213 can output (for example, transmit to the user terminal 1) a list of the video data determined by the playback video determination unit 212. It may output this list arranged in the determined playback order. It may also arrange the list in the determined playback order and attach to each item of video data the reason why it should be viewed.

The playback processing unit 213 can perform control so that the video data determined by the playback video determination unit 212 is played continuously in the determined playback order. For example, the playback processing unit 213 can transmit the determined video data to the user terminal 1 consecutively in the determined playback order so that it is played continuously on the user terminal 1. For example, the playback processing unit 213 can deliver the video data by streaming, using any of various streaming protocols such as HTTP Live Streaming (HLS), Real-Time Messaging Protocol (RTMP), Real-Time Streaming Protocol (RTSP), and MPEG-DASH. The playback processing unit 213 can also deliver the video data by download. In this case, the playback processing unit 213 can have the user terminal 1 download the video data and then perform control so that the video data saved in the local storage of the user terminal 1 is played. The playback processing unit 213 may also transmit the determined video data to the user terminal 1 with the playback order attached, so that the user terminal 1 plays it in that order.

The playback processing unit 213 may also transmit, together with the video data, the reason why the video should be viewed, and cause the user terminal 1 to display that reason before, during, and/or after playback of the video data.
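One way to send the playback order and the viewing reasons to the user terminal 1 is a small JSON payload, sketched below. The payload keys, the `show_reason` values, and the example URLs are all hypothetical; the embodiment does not define a wire format.

```python
import json

ALLOWED_TIMINGS = {"before", "during", "after"}


def build_playlist_payload(selection, urls, show_reason="before"):
    """Serialize [(video_id, reason), ...] into a JSON playlist.

    `urls` maps each video id to the location the terminal should fetch.
    `show_reason` tells the terminal when to display the reason.
    """
    if show_reason not in ALLOWED_TIMINGS:
        raise ValueError(f"show_reason must be one of {sorted(ALLOWED_TIMINGS)}")
    items = [
        {
            "order": n,
            "video_id": video_id,
            "url": urls[video_id],
            "reason": reason,
            "show_reason": show_reason,
        }
        for n, (video_id, reason) in enumerate(selection, start=1)
    ]
    return json.dumps({"items": items}, ensure_ascii=False)


payload = build_playlist_payload(
    [("v2", "Shows where the reset link is."), ("v1", "Explains password rules.")],
    urls={"v1": "https://cdn.example.com/v1.mp4", "v2": "https://cdn.example.com/v2.mp4"},
    show_reason="before",
)
```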

<Operation>
FIG. 4 is a diagram illustrating the operation of the management server 2.

The management server 2 acquires a question from the user (S301), gives the question and the video titles and descriptions to the LLM to have it generate the videos to be played and their order (S302), and performs control so that the selected videos are played in that order (S303).
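Steps S301 to S303 can be sketched end to end as follows. The LLM and the player are passed in as plain callables so that the flow can be exercised with a stub; the prompt wording and the comma-separated reply format are assumptions of this example.

```python
def answer_with_videos(question, catalog, llm, play):
    """Run S301 to S303 for one question.

    catalog: dict of video id -> {"title": ..., "description": ...}
    llm:     callable taking a prompt string and returning reply text
    play:    callable invoked once per selected video id, in playback order
    """
    # S301: the question has been acquired and is passed in.
    listing = "\n".join(
        f'{vid}: {info["title"]} / {info["description"]}'
        for vid, info in catalog.items()
    )
    prompt = (
        f"Videos:\n{listing}\n\nQuestion: {question}\n"
        "Reply with the ids of the videos to play, in playback order, "
        "separated by commas."
    )
    # S302: have the LLM choose the videos and their order.
    reply = llm(prompt)
    ordered = []
    for token in reply.split(","):
        vid = token.strip()
        if vid in catalog and vid not in ordered:
            ordered.append(vid)
    # S303: play the selected videos in the decided order.
    for vid in ordered:
        play(vid)
    return ordered


catalog = {
    "v1": {"title": "Password rules", "description": "Length and characters"},
    "v2": {"title": "Reset link", "description": "Where to find it"},
}
played = []
# Stub LLM that also returns an id missing from the catalog.
result = answer_with_videos(
    "How do I reset my password?", catalog, lambda prompt: "v2, v1, v9", played.append
)
```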

As described above, the information processing system of this embodiment can play, from among video data prepared in advance, the video data that is appropriate as an answer to a question from a user. It can also play the appropriate video data in an appropriate order. It can further display the reason why the proposed video data should be viewed.

This embodiment has been described above, but it is intended to facilitate understanding of the present invention and is not to be construed as limiting it. The present invention may be modified or improved without departing from its spirit, and the present invention includes equivalents thereof.

For example, the processing performed by each functional unit of the management server 2 described above may be executed by any of the functional units. A different functional unit that executes part of the processing of the functional units described above may also be added. The functional units of the management server 2 may also be distributed across a plurality of computers.

Likewise, the information stored in each storage unit of the management server 2 may be stored in any of the storage units. That is, the information stored in the plurality of storage units described above may be stored in a single storage unit, and part of the information stored in one storage unit may be stored in another.

<Modification 1>
The information about the video data can include a wider variety of information, such as the creation date and time of the video, its creator, its playback time, and the attributes of its target viewers.

For example, the video information can include timestamp data indicating when the video was created. Videos can then be selected based on their creation date and time, for example by preferentially selecting the most recent videos or, conversely, by selecting older ones.

For example, the video information can include creator information indicating who created the video. The creator information can include, for example, the creator's name, affiliation, title, and area of expertise. Videos can then be selected based on their creator, for example by preferentially selecting videos by a specific creator or by selecting a balanced mix of videos by multiple creators.

For example, the video information can include playback time information indicating the playback time of the video. Videos can then be selected based on their playback time, for example by preferentially selecting long or short videos, or by selecting videos so that the total playback time falls within a predetermined time.
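Keeping the total playback time within a limit can be done in several ways. The sketch below uses a simple greedy pass over candidates that are already ranked by relevance, skipping any video that would exceed the limit; this is one illustrative strategy, not the method prescribed by the embodiment, and it does not guarantee the best possible use of the time budget.

```python
def fit_within_limit(ranked_ids, durations, limit_sec):
    """Pick videos in ranked order while the running total stays within limit_sec.

    Returns (chosen_ids, total_seconds).
    """
    chosen = []
    total = 0.0
    for video_id in ranked_ids:
        if total + durations[video_id] <= limit_sec:
            chosen.append(video_id)
            total += durations[video_id]
    return chosen, total


ranked = ["a", "b", "c"]            # most relevant first
durations = {"a": 40, "b": 30, "c": 15}
chosen, total = fit_within_limit(ranked, durations, limit_sec=60)
```

Here "b" is skipped because adding it after "a" would reach 70 seconds, while "c" still fits.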

For example, the video information can include viewer attribute information indicating the attributes of the target viewers. The viewer attribute information can include, for example, the expected age group, gender, occupation, and interests of the viewers. Videos can then be selected to match the attributes of the user who asked the question. The video information for the video data selected in this way may be given in the prompt, so that the video data to propose to the user is chosen from among that pre-selected video data.
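The two-stage idea (narrow the catalog by viewer attributes first, then put only the remaining videos into the prompt) can be sketched as a filter. The "audience" key, its representation as sets of allowed values, and the rule that unknown user attributes impose no constraint are assumptions of this example.

```python
def filter_by_audience(videos, user_attrs):
    """Keep videos whose target-audience attributes do not conflict with the user.

    Each video may carry an "audience" dict mapping an attribute name to the
    set of values it targets. Attributes the user has not provided are ignored.
    """
    kept = []
    for video in videos:
        target = video.get("audience", {})
        if all(
            user_attrs[name] in allowed
            for name, allowed in target.items()
            if name in user_attrs
        ):
            kept.append(video)
    return kept


videos = [
    {"id": "v1", "audience": {"age_group": {"20s", "30s"}}},
    {"id": "v2", "audience": {"age_group": {"60s"}}},
    {"id": "v3"},  # no audience restriction
]
candidates = filter_by_audience(videos, {"age_group": "20s"})
```

The resulting `candidates` list is what would then be described in the prompt given to the large language model.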

<Modification 2>
The playback video determination unit 212 may select video data without using a large language model. For example, the playback video determination unit 212 can generate in advance vectorized data from the title and description of each item of video data, and select the video data related to a question by calculating the cosine distance between that data and the vectorized question acquired by the question acquisition unit 211.

For example, the playback video determination unit 212 can perform morphological analysis on the title and description of each item of video data and vectorize each item based on the frequency of occurrence of each morpheme. The playback video determination unit 212 can then perform morphological analysis on the question acquired by the question acquisition unit 211 and vectorize it in the same way. It can then calculate the cosine distance between the question vector and the vector of each item of video data, and select video data in ascending order of cosine distance. This allows the playback video determination unit 212 to appropriately select video data related to the content of the question.
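The frequency-based vectorization and cosine-distance ranking can be sketched as follows. Lowercased whitespace splitting stands in for morphological analysis here, which is an assumption that only suits space-delimited text; Japanese text would need a real morphological analyzer. Cosine distance is taken as $1 - \cos\theta$, so smaller values mean closer vectors.

```python
import math
from collections import Counter


def tokenize(text):
    # Stand-in for morphological analysis (assumption: space-delimited text).
    return text.lower().split()


def tf_vector(text):
    """Sparse term-frequency vector: token -> count."""
    return Counter(tokenize(text))


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_cosine(question, video_texts, top_k=3):
    """Return up to top_k video ids, smallest cosine distance first."""
    q = tf_vector(question)
    ranked = sorted(
        video_texts.items(),
        key=lambda kv: cosine_distance(q, tf_vector(kv[1])),
    )
    return [video_id for video_id, _ in ranked[:top_k]]


video_texts = {
    "v1": "reset password login",
    "v2": "change billing address",
    "v3": "password rules",
}
top = rank_by_cosine("how to reset password", video_texts, top_k=2)
```

In practice the video vectors would be computed once in advance, as the text describes, rather than on every query as in this sketch.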

The playback video determination unit 212 may also use information other than the title and description when vectorizing the video data. For example, it may vectorize the video data using metadata assigned to the video data (keywords, categories, and so on) or user ratings of the video data (number of views, number of likes, and so on). This allows the playback video determination unit 212 to select video data that is more relevant to the question.

The playback video determination unit 212 may also select the video data related to a question by a method other than cosine distance. For example, it may calculate the Euclidean distance between the question vector and the vector of each item of video data and select video data in ascending order of Euclidean distance. It may also calculate the inner product of the question vector and the vector of each item of video data and select video data in descending order of inner product. By choosing among these methods as appropriate, the playback video determination unit 212 can select video data that is more relevant to the question.
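The two alternative measures differ in sort direction (smaller Euclidean distance is better, larger inner product is better) and can rank the same vectors differently, as the sketch below shows. The dense list-of-floats vectors and the function names are illustrative assumptions.

```python
import math


def euclidean_distance(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def inner_product(u, v):
    return sum(x * y for x, y in zip(u, v))


def rank(question_vec, video_vecs, metric="euclidean"):
    """Return video ids, best match first, under the chosen metric."""
    if metric == "euclidean":
        # Smaller distance means more similar: ascending order.
        return sorted(
            video_vecs,
            key=lambda vid: euclidean_distance(question_vec, video_vecs[vid]),
        )
    if metric == "inner":
        # Larger inner product means more similar: descending order.
        return sorted(
            video_vecs,
            key=lambda vid: inner_product(question_vec, video_vecs[vid]),
            reverse=True,
        )
    raise ValueError(f"unknown metric: {metric}")


question_vec = [1.0, 0.0]
video_vecs = {"a": [0.9, 0.1], "b": [3.0, 3.0]}
by_distance = rank(question_vec, video_vecs, "euclidean")
by_inner = rank(question_vec, video_vecs, "inner")
```

With unnormalized vectors the inner product favors the long vector "b", while Euclidean distance favors the nearby vector "a"; normalizing the vectors first would remove that length effect.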

<Modification 3>
The playback video determination unit 212 can determine the playback order of the video data provided in response to a question based on how closely the contents of the items of video data are related to one another. For example, it can extract the keywords in the titles and descriptions of the video data and determine a playback order in which items sharing many keywords are played consecutively. It can also refer to metadata representing the content of the video data (for example, tags representing the objects and scenes shown in the video) and determine a playback order in which items with highly similar metadata are played consecutively. This allows video data with closely related content to be played consecutively.
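One simple way to place videos that share many keywords next to each other is a greedy chain: start from a chosen first video and repeatedly append the remaining video that shares the most keywords with the one just placed. This is an illustrative heuristic under assumed inputs (keyword sets already extracted, a caller-supplied starting video), not the procedure the embodiment requires.

```python
def order_by_shared_keywords(keywords, first):
    """Greedy ordering so that consecutive videos share many keywords.

    keywords: dict of video id -> set of keywords
    first:    id of the video to play first
    Ties are broken by video id so the result is deterministic.
    """
    remaining = set(keywords) - {first}
    order = [first]
    while remaining:
        last_keywords = keywords[order[-1]]
        next_id = max(
            sorted(remaining),
            key=lambda vid: len(last_keywords & keywords[vid]),
        )
        order.append(next_id)
        remaining.remove(next_id)
    return order


keywords = {
    "a": {"login", "password"},
    "b": {"billing", "invoice"},
    "c": {"password", "reset"},
    "d": {"invoice", "refund"},
}
order = order_by_shared_keywords(keywords, first="a")
```

The same loop works with any pairwise similarity, for example one computed from scene or object tags instead of keyword overlap.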

The playback video determination unit 212 can also determine the playback order of the video data provided in response to a question based on the degree of relevance between the question and the content of the video data. For example, the playback video determination unit 212 can compare keywords contained in the question with keywords contained in the title or description of the video data, and determine a playback order in which video data with a higher degree of keyword match are played first. The playback video determination unit 212 can also calculate the similarity between the content of the question and the metadata representing the content of the video data, and determine a playback order in which video data with higher similarity are played first. This allows video data that is highly relevant to the content of the question to be played preferentially.

Furthermore, the playback video determination unit 212 can determine the playback order by combining the relevance between the contents of the video data with the degree of relevance between the question and the content of the video data. For example, the playback video determination unit 212 can first select video data that is highly relevant to the question, and then determine the playback order based on the relevance among the selected video data.
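The two-stage approach described above can be sketched as follows, assuming that keywords have already been extracted into sets. The function names, the use of Jaccard similarity, the greedy chaining strategy, and the sample keywords are all illustrative assumptions; the description does not prescribe a particular similarity measure or ordering algorithm.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_and_order(question_keywords, video_keywords, top_k=3):
    """Select the top_k videos by keyword overlap with the question,
    then order them so that similar videos are played consecutively.

    video_keywords is a hypothetical mapping {video_id: set_of_keywords}.
    """
    # Stage 1: rank by the number of keywords shared with the question.
    ranked = sorted(
        video_keywords,
        key=lambda vid: len(question_keywords & video_keywords[vid]),
        reverse=True,
    )
    selected = ranked[:top_k]
    if not selected:
        return []

    # Stage 2: start from the most relevant video, then repeatedly append
    # the remaining video most similar to the one just placed.
    order = [selected[0]]
    remaining = selected[1:]
    while remaining:
        last = video_keywords[order[-1]]
        nxt = max(remaining, key=lambda vid: jaccard(last, video_keywords[vid]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical keyword sets for illustration only.
question_kw = {"reset", "password", "account"}
videos_kw = {
    "v1": {"password", "reset", "login"},
    "v2": {"account", "billing", "invoice"},
    "v3": {"password", "login", "security"},
    "v4": {"cooking", "pasta"},
}
playback_order = select_and_order(question_kw, videos_kw, top_k=3)
```

In this example "v4" is dropped in the first stage because it shares no keywords with the question, and "v3" is placed directly after "v1" in the second stage because those two share the most keywords.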

<Modification 4>
When generating a reason for selecting a video, the playback video determination unit 212 may generate the reason by comparing the content of the selected video with the content of the question and extracting the highly relevant parts. For example, the playback video determination unit 212 can compare text data of the selected video, such as its description and subtitle data, with the text data of the question; extract words that appear in both as well as words that are synonyms or near-synonyms; pull the sentences, paragraphs, and the like that contain the extracted words from the text data of the video and the question; and summarize them using a text summarization algorithm, thereby generating the reason for selecting the video.
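The extraction step described above (before summarization) might look like the following sketch. It assumes whitespace-delimited English text, a small hand-written stop-word list, and a hypothetical synonym dictionary; Japanese text would need a morphological tokenizer instead, and the summarization step is intentionally left out.

```python
import re

# Hypothetical stop words that should not count as a match.
STOP_WORDS = {"how", "do", "i", "my", "the", "a", "to", "is"}


def tokenize(text):
    """Lowercase the text and return its set of non-stop-word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def supporting_sentences(question, video_text, synonyms=None):
    """Return sentences of video_text that share a word (or a listed
    synonym) with the question, in their original order."""
    synonyms = synonyms or {}
    question_words = tokenize(question)

    # Expand the question words with any synonyms supplied by the caller.
    expanded = set(question_words)
    for word in question_words:
        expanded |= set(synonyms.get(word, ()))

    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", video_text) if s.strip()
    ]
    return [s for s in sentences if tokenize(s) & expanded]


# Hypothetical sample data for illustration only.
matches = supporting_sentences(
    "How do I reset my password?",
    "This video covers account basics. Open settings to change your passcode. "
    "Then sign out.",
    synonyms={"password": ["passcode"]},
)
```

The returned sentences would then be passed to whatever summarization method is used to produce the final reason text.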

Furthermore, the playback video determination unit 212 may attach, to the reason for selecting the video, the playback start time and end time of the part of the selected video that is highly relevant to the content of the question. Based on this start time and end time information, the playback processing unit 213 may play only the relevant part of the video, or may play the entire video before or after playing the relevant part.
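The three playback options described above can be sketched as a small planning function. The `Segment` type, the mode names, and the convention that an end time of `None` means "play to the end" are assumptions made for illustration; the description does not define a data format for the start and end times.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A span of one video to play; end_sec=None means play to the end."""
    video_id: str
    start_sec: float
    end_sec: Optional[float]


def build_playback_plan(
    video_id: str,
    start_sec: float,
    end_sec: float,
    mode: str = "relevant_only",
) -> List[Segment]:
    """Turn the relevant part's start/end times into a list of segments."""
    if start_sec < 0 or end_sec <= start_sec:
        raise ValueError("expected 0 <= start_sec < end_sec")

    relevant = Segment(video_id, start_sec, end_sec)
    full = Segment(video_id, 0.0, None)

    if mode == "relevant_only":
        return [relevant]
    if mode == "relevant_then_full":
        return [relevant, full]
    if mode == "full_then_relevant":
        return [full, relevant]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical video ID and times for illustration only.
plan = build_playback_plan("v1", 30.0, 75.0, mode="relevant_then_full")
```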

<Modification 5>

The playback processing unit 213 can have the video data played back in various ways. For example, the playback processing unit 213 can deliver the video data by streaming. In this case, the playback processing unit 213 transmits the video data to the user terminal 1 in real time, and the user terminal 1 plays the video data while receiving it.

The playback processing unit 213 can also have the video data downloaded before it is played. In this case, the playback processing unit 213 transmits the video data to the user terminal 1, and the user terminal 1 receives and stores the video data. The user terminal 1 can then play the stored video data.

The playback processing unit 213 can also combine streaming and downloading of the video data. For example, the playback processing unit 213 can have part of the video data downloaded before playback starts, and deliver the remaining part by streaming.

<Disclosures>
The present disclosure also includes the following configurations.
[Item 1]
An information processing system comprising:
a video data storage unit that stores a plurality of video data;
a question acquisition unit that acquires a question from a user; and
a playback video determination unit that provides, to a large-scale language model, a prompt including information about the video data, the question, and an instruction to select the video data to be provided in response to the question, and thereby determines, from among the video data, the video data for answering the question.
[Item 2]
The information processing system according to Item 1, wherein
the playback video determination unit includes in the prompt an instruction to create the video data to be provided in response to the question and a playback order, selects, from among the video data, the video data for answering the question, and determines the playback order of the selected video data.
[Item 3]
The information processing system according to Item 2, wherein
the playback video determination unit further provides the prompt, including an instruction to explain a reason for selecting the video data, to the large-scale language model to create the reason.
[Item 4]
The information processing system according to Item 2, further comprising
a playback processing unit that plays back the video data in the playback order.
[Item 5]
An information processing method in which a computer executes:
a step of storing a plurality of video data;
a step of acquiring a question from a user; and
a step of providing, to a large-scale language model, a prompt including information about the video data, the question, and an instruction to select the video data to be provided in response to the question, and thereby determining, from among the video data, the video data for answering the question.
[Item 6]
A program for causing a computer to execute:
a step of storing a plurality of video data;
a step of acquiring a question from a user; and
a step of providing, to a large-scale language model, a prompt including information about the video data, the question, and an instruction to select the video data to be provided in response to the question, and thereby determining, from among the video data, the video data for answering the question.

1 User terminal
2 Management server

Claims

1. An information processing system comprising:
a video data storage unit that stores a plurality of video data;
a question acquisition unit that acquires a question from a user; and
a playback video determination unit that provides, to a large-scale language model, a prompt including information about the video data, the question, and an instruction to select the video data to be provided in response to the question, and thereby determines, from among the video data, the video data for answering the question.

2. The information processing system according to claim 1, wherein
the playback video determination unit includes in the prompt an instruction to create the video data to be provided in response to the question and a playback order, selects, from among the video data, the video data for answering the question, and determines the playback order of the selected video data.

3. The information processing system according to claim 2, wherein
the playback video determination unit further provides the prompt, which includes an instruction to explain a reason for selecting the video data, to the large-scale language model to create the reason.

4. The information processing system according to claim 2, further comprising
a playback processing unit that plays back the video data in the playback order.

5. An information processing method in which a computer executes:
a step of storing a plurality of video data;
a step of acquiring a question from a user; and
a step of providing, to a large-scale language model, a prompt including information about the video data, the question, and an instruction to select the video data to be provided in response to the question, and thereby determining, from among the video data, the video data for answering the question.

6. A program for causing a computer to execute:
a step of storing a plurality of video data;
a step of acquiring a question from a user; and
a step of providing, to a large-scale language model, a prompt including information about the video data, the question, and an instruction to select the video data to be provided in response to the question, and thereby determining, from among the video data, the video data for answering the question.