JP7616968B2

JP7616968B2 - Information processing device, method for generating teacher data, method for generating trained model, and program

Info

Publication number: JP7616968B2
Application number: JP2021135053A
Authority: JP
Inventors: Yusuke Miki (裕介三木)
Original assignee: Kanadevia Corporation (カナデビア株式会社)
Priority date: 2021-08-20
Filing date: 2021-08-20
Publication date: 2025-01-17
Anticipated expiration: 2041-08-20
Also published as: CN115713460A; JP2023029006A

Description

The present invention relates to an information processing device and the like that generate teacher data for use in machine learning.

Machine learning using teacher data has long been practiced. Machine learning generally requires a large amount of teacher data, but such data can be difficult to obtain. One known way to address this is to generate new teacher data from existing teacher data, thereby increasing the total amount available.

For example, Patent Document 1 below discloses a machine learning system that classifies teacher images into multiple patterns, identifies patterns with few instances, and generates new teacher images belonging to those underrepresented patterns by spatially flipping the teacher images or changing their color tone.

Patent Document 1: JP 2018-169672 A

In machine learning for generating a trained model that detects a detection target appearing in an image, the teacher data consists of a teacher image showing the detection target, associated with a label indicating the position and extent of that target. Typically, the label consists of the representative coordinates of a rectangular region enclosing the detection target (e.g., the coordinates of the region's center), together with the region's width and height. Width and height may also be read as horizontal length and vertical length.
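The following is an illustrative sketch only. The function name `axis_aligned_label` and the point-list input are assumptions and are not part of the disclosure. It computes the conventional axis-aligned label (center coordinates, width, height) from points on the object's outline:

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]

def axis_aligned_label(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (cx, cy, w, h) of the smallest axis-aligned rectangle enclosing the points."""
    xs, ys = zip(*points)
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Representative coordinates = rectangle center; w, h = horizontal/vertical lengths.
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)

# Hypothetical outline points (x to the right, y downward).
print(axis_aligned_label([(10, 40), (30, 10), (50, 20), (20, 60)]))  # (30.0, 35.0, 40, 50)
```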

Fig. 4 illustrates a conventional technique, showing an example of how labels indicating the position and extent of a detection target are set in a teacher image. Teacher image IMG1 on the left side of Fig. 4 shows detection target OB1, and a rectangular region R1 enclosing OB1 is set. The representative coordinates, width, and height of R1 are associated with IMG1 as a label.

Suppose detection target OB1 is a rod-shaped object captured at an angle, as in teacher image IMG1 of Fig. 4. A large proportion of rectangular region R1 is then background in which OB1 does not appear (the upper-right and lower-left parts of R1). Training with such labels is undesirable because the background around OB1 then has a large influence on learning. In IMG1 of Fig. 4, for example, part of a plate-shaped object appears inside R1, which may affect the learning result.
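One rough way to quantify this is to model the rod as a w × h rectangle tilted by α. This is a simplification, and the hypothetical function below is not from the source. The sketch compares the rectangle's area with that of its axis-aligned bounding box, whose sides are w|cos α| + h|sin α| and w|sin α| + h|cos α|:

```python
import math

def background_fraction(w: float, h: float, alpha_deg: float) -> float:
    """Share of the axis-aligned bounding box of a w x h rectangle tilted by
    alpha_deg that is NOT covered by the rectangle (i.e. background)."""
    a = math.radians(alpha_deg)
    box_w = w * abs(math.cos(a)) + h * abs(math.sin(a))
    box_h = w * abs(math.sin(a)) + h * abs(math.cos(a))
    return 1.0 - (w * h) / (box_w * box_h)

# Hypothetical rod, 100 px long and 10 px thick.
print(round(background_fraction(100, 10, 0), 3))   # 0.0
print(round(background_fraction(100, 10, 45), 3))  # 0.835
```

Under this model, roughly 83% of the axis-aligned box around a 10:1 rod tilted at 45° is background, while an untilted rod leaves none.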

It is therefore conceivable to set a tilted rectangular region R2, as in teacher image IMG2 on the right side of Fig. 4. This greatly reduces the proportion of background compared with R1. In this case, the inclination angle α is associated with IMG2 as part of the label, in addition to the representative coordinates, width, and height of R2.

However, tilting the rectangular region causes a problem when the original image is transformed to increase the variety of teacher images. If that transformation changes the aspect ratio (the ratio of height to width) of the original image, the position and extent indicated by the label in the resulting teacher image may not match the position and extent of the detection target actually shown in it.

This is explained with reference to Fig. 5, which illustrates the problem with the conventional technique. In the example of Fig. 5, the portion within the cut-out area indicated by the dash-dot line is cut out of image IMG3, which, like teacher images IMG1 and IMG2 above, shows detection target OB1. The cut-out portion is used as original image IMG3', and teacher image IMG4 is generated by resizing IMG3'.

More specifically, image IMG3 is 608 pixels wide and 608 pixels high. As in IMG2 of Fig. 4, a rectangular region R3 tilted at angle α is set on detection target OB1 in IMG3. Original image IMG3', cut out of IMG3, is 566 pixels wide and 383 pixels high. Although the bottom edge of IMG3' extends beyond IMG3, this kind of cut-out is permissible when generating teacher images.

Original image IMG3' is then scaled to produce teacher image IMG4, which is 544 pixels wide and 544 pixels high. Specifically, IMG4 is generated by scaling IMG3' by a factor of 544/566 in the width direction and 544/383 in the height direction. Because these factors differ, IMG3' and IMG4 have different aspect ratios. The change in aspect ratio is in itself desirable, since it increases the variety of teacher images.

In IMG4, rectangular region R4, which indicates the position and extent of detection target OB1, is obtained by multiplying the width of rectangular region R3 by 544/566 and its height by 544/383. When R4 is set by scaling the width and height of R3 in the same way as IMG3', part of OB1 may protrude beyond R4, as shown in Fig. 5.

This happens because changing the aspect ratio of IMG3' also changes the inclination angle of OB1. The inclination angle α of R4, however, remains the same as that of R3 in IMG3'.
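The angle change can be checked numerically. Under a scaling by s_x in the width direction and s_y in the height direction, a direction at angle α maps to an angle β with tan β = (s_y / s_x) tan α. Two originally perpendicular axes are generally no longer perpendicular after such a scaling. The hypothetical sketch below shows both effects, using the Fig. 5 factors 544/566 and 544/383 and an assumed tilt of 30° (the source does not state α):

```python
import math

def scaled_direction_angle(alpha_deg: float, sx: float, sy: float) -> float:
    """Angle (deg, from the width axis) of a direction that was at alpha_deg
    before x is scaled by sx and y by sy."""
    a = math.radians(alpha_deg)
    return math.degrees(math.atan2(sy * math.sin(a), sx * math.cos(a)))

def scaled_corner_angle(alpha_deg: float, sx: float, sy: float) -> float:
    """Angle (deg) between a rectangle's two originally perpendicular axes after scaling."""
    a = math.radians(alpha_deg)
    u = (sx * math.cos(a), sy * math.sin(a))     # long axis after scaling
    v = (-sx * math.sin(a), sy * math.cos(a))    # short axis after scaling
    cos_t = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(cos_t))

sx, sy = 544 / 566, 544 / 383   # magnifications from the Fig. 5 example
alpha = 30.0                     # hypothetical tilt of R3, for illustration only
print(scaled_direction_angle(alpha, sx, sy))  # object axis angle after scaling (no longer 30)
print(scaled_corner_angle(alpha, sx, sy))     # corner angle after scaling (no longer 90)
```

With these factors, the scaled axes meet at roughly 71° instead of 90°. A rectangle that keeps angle α, as R4 does, therefore cannot follow the outline of the distorted object.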

Thus, when the rectangular region set in the original image is tilted and a teacher image with a different aspect ratio is generated from it, the label associated with the teacher image does not correctly indicate the position and extent of the detection target.

One aspect of the present invention aims to provide an information processing device and the like capable of generating teacher data whose labels correctly indicate the position and extent of the detection target in the teacher image, even in such cases.

To solve the above problem, an information processing device according to one aspect of the present invention includes:
- a teacher image generation unit that generates a teacher image by applying, to an original image showing a detection target, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction;
- a coordinate transformation unit that, based on the magnification of the transformation, transforms the coordinates of each vertex of a rectangular region indicating the position and extent of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and extent of the detection target in the teacher image; and
- a teacher data generation unit that generates, based on the coordinates transformed by the coordinate transformation unit, a label indicating the position and extent of the detection target in the teacher image, and associates the label with the teacher image to form teacher data.

To solve the above problem, a teacher data generation method according to one aspect of the present invention is executed by one or more information processing devices and includes:
- a teacher image generation step of generating a teacher image by applying, to an original image showing a detection target, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction;
- a coordinate transformation step of transforming, based on the magnification of the transformation, the coordinates of each vertex of a rectangular region indicating the position and extent of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and extent of the detection target in the teacher image; and
- a teacher data generation step of generating, based on the coordinates transformed in the coordinate transformation step, a label indicating the position and extent of the detection target in the teacher image, and associating the label with the teacher image to form teacher data.

To solve the above problem, a method for generating a trained model according to one aspect of the present invention is executed by one or more information processing devices and includes:
- a teacher image generation step of generating a teacher image by applying, to an original image showing a detection target, a transformation that enlarges or reduces the image in at least one of the height direction and the width direction;
- a coordinate transformation step of transforming, based on the magnification of the transformation, the coordinates of each vertex of a rectangular region indicating the position and extent of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and extent of the detection target in the teacher image;
- a teacher data generation step of generating, based on the coordinates transformed in the coordinate transformation step, a label indicating the position and extent of the detection target in the teacher image, and associating the label with the teacher image to form teacher data; and
- a learning step of generating, by machine learning using the teacher data generated in the teacher data generation step, a trained model for detecting the detection target in an image.

According to one aspect of the present invention, a teacher image may be generated from an original image in which a tilted rectangular region is set, with an aspect ratio different from that of the original. Even in this case, teacher data can be generated whose labels correctly indicate the position and extent of the detection target in the teacher image.

Fig. 1 is a block diagram showing an example of the configuration of the main parts of an information processing device according to an embodiment of the present invention.
Fig. 2 is a diagram showing an example of teacher images and labels generated by the information processing device.
Fig. 3 is a flowchart showing an example of processing executed by the information processing device.
Fig. 4 is a diagram illustrating a conventional technique, showing an example of setting labels that indicate the position and extent of a detection target in a teacher image.
Fig. 5 is a diagram illustrating a problem with the conventional technique.

[Device configuration]
The configuration of an information processing device 1 according to an embodiment of the present invention is described with reference to Fig. 1, which is a block diagram showing an example of the main configuration of information processing device 1. Information processing device 1 may be, for example, a personal computer, a server, or a workstation.

As shown in the figure, information processing device 1 includes a control unit 10 that centrally controls each part of the device and a storage unit 11 that stores various data used by the device. It also includes a communication unit 12 for communicating with other devices, an input unit 13 that accepts input of various data, and an output unit 14 for outputting various data.

Control unit 10 includes a teacher image generation unit 101, a coordinate transformation unit 102, a teacher data generation unit 103, a learning unit 104, and an inference unit 105. Storage unit 11 stores an image 111, teacher data 112, and a trained model 113.

Image 111 is an image showing a detection target, and it is associated with a label indicating the position and extent of that detection target. The label may indicate, for example, the representative coordinates, width, height, and tilt angle of a rectangular region enclosing the detection target. Width and height may also be read as horizontal length and vertical length. When generating a trained model that identifies the detection target as well as detecting it, an identifier of the detection target may be added to the label in addition to the above information.

Image 111 may be input via input unit 13 or received via communication unit 12. For example, input unit 13 may be a USB (Universal Serial Bus) interface to which an imaging device that photographs the detection target is connected. Information processing device 1 may then receive images captured by the imaging device via input unit 13 and store them as image 111. Alternatively, the imaging device may transmit captured images over a LAN (Local Area Network), a wireless LAN, or the like. Information processing device 1 then receives them via communication unit 12 and stores them as image 111.

Teacher image generation unit 101 generates a teacher image from an original image cut out of image 111, which shows the detection target. It applies to the original image a transformation that enlarges or reduces it in at least one of the height and width directions. Image 111 may also be used as the original image as it is. The method of generating teacher images is described in detail later with reference to Fig. 2.

Coordinate transformation unit 102 converts the coordinates of each vertex of the rectangular region indicating the position and extent of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and extent of the detection target in the teacher image. It does so based on the magnification of the transformation used to generate the teacher image, that is, the enlargement/reduction factors in the height and width directions. The coordinate transformation method is described in detail later with reference to Fig. 2.

Teacher data generation unit 103 uses the coordinates transformed by coordinate transformation unit 102 to generate a label indicating the position and extent of the detection target in the teacher image generated by teacher image generation unit 101. It then associates the label with the teacher image to form teacher data and stores that data in storage unit 11 as teacher data 112. The method of generating labels is described in detail later with reference to Fig. 2.

Learning unit 104 generates a trained model for detecting the detection target in an image, by machine learning using teacher data 112 generated by teacher data generation unit 103. It stores the generated model in storage unit 11 as trained model 113. The model may be, for example, a neural network model (including a deep neural network model). In each training iteration, a teacher image from the teacher data is input into the model being trained. The error between the resulting inference and the correct answer indicated by the teacher data is computed, and the model is updated by backpropagating that error. This is repeated while changing the teacher data (the error backpropagation method).

Inference unit 105 detects the detection target in an image using trained model 113. More specifically, inference unit 105 first reads the target image. The image may be stored in advance in storage unit 11, or it may be input from an imaging device or the like via communication unit 12 or input unit 13. Inference unit 105 then inputs the image into trained model 113. From the resulting detections, it causes output unit 14 to output those whose confidence is equal to or greater than a threshold.
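The final filtering step can be sketched as follows. The `Detection` fields and the example values are assumptions for illustration; the source only states that results with confidence at or above a threshold are output:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    """Hypothetical detection record: tilted-box parameters plus a confidence score."""
    cx: float
    cy: float
    width: float
    height: float
    angle_deg: float
    confidence: float

def filter_detections(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only detections whose confidence is equal to or greater than the threshold."""
    return [d for d in detections if d.confidence >= threshold]

raw = [Detection(120, 80, 200, 20, 35, 0.92),
       Detection(300, 210, 180, 18, 40, 0.50),
       Detection(50, 400, 90, 12, 10, 0.31)]
print(filter_detections(raw, threshold=0.5))  # keeps the first two (0.50 >= 0.5)
```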

Trained model 113 is generated by training on teacher data 112, whose labels correctly indicate the position and extent of the detection target. Inference unit 105, which performs detection using trained model 113, can therefore detect objects with high accuracy. The detection result indicates the position and extent of the detection target in the image. It may be output as the representative coordinates, width, height, and tilt angle of the region enclosing the detection target. Alternatively, it may be output as a rectangle representing that region drawn on the target image.

As described above, information processing device 1 includes:
- teacher image generation unit 101, which generates a teacher image by applying, to an original image showing the detection target, a transformation that enlarges or reduces it in at least one of the height and width directions;
- coordinate transformation unit 102, which, based on the magnification of the transformation, transforms the coordinates of each vertex of the rectangular region indicating the position and extent of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and extent of the detection target in the teacher image; and
- teacher data generation unit 103, which generates, based on the coordinates transformed by coordinate transformation unit 102, a label indicating the position and extent of the detection target in the teacher image, and associates the label with the teacher image to form teacher data 112.

With this configuration, a teacher image is generated by enlarging or reducing an original image showing the detection target in at least one of the height and width directions. This makes it possible to generate a wide variety of teacher images, including ones whose aspect ratio (the ratio of height to width) differs from that of the original image.

Also with this configuration, the four vertices of the rectangular region in the original image are transformed using the magnification applied when generating the teacher image, giving the four vertices of a quadrangular region. The label for the generated teacher image is generated from the coordinates of these four vertices and then associated with the teacher image to form teacher data.

The four vertices of the quadrangular region are obtained with the same magnification that was applied to generate the teacher image. They therefore correctly indicate the position and extent of the detection target in the teacher image. This makes it possible to generate teacher data 112 in which the labels associated with the teacher images correctly indicate the position and extent of the detection target.

In summary, this configuration generates teacher data 112 with labels that correctly indicate the position and extent of the detection target in the teacher image. This holds even when the rectangular region set in the original image is tilted and the teacher image has a different aspect ratio from the original image.

Information processing device 1 also includes learning unit 104, which generates trained model 113 for detecting the detection target in images. Learning unit 104 does this by machine learning using teacher data 112 generated by teacher data generation unit 103. With this configuration, trained model 113 can be generated automatically.

[Example of generating teacher images and labels]
Fig. 2 shows an example of teacher images and labels generated by information processing device 1. In Fig. 2, the portion within the cut-out area indicated by the dash-dot line is cut out of image IMG5 (608 × 608 pixels) to obtain original image IMG5'. IMG5' is then scaled to obtain teacher image IMG6 (544 × 544 pixels). The width and height of IMG6 may be determined using random numbers, for example. A teacher image may also be generated from an original image that contains only part of detection target OB1 rather than all of it. Training with such teacher images makes it possible to detect OB1 in images in which only part of it is visible.

In this way, teacher image generation unit 101 generates teacher image IMG6 from original image IMG5', which is cut out of image IMG5 showing the detection target. It applies to IMG5' a transformation that enlarges or reduces it in at least one of the height and width directions. Teacher image generation unit 101 may determine the position and size of the cut-out area and the transformation magnification using random numbers, for example.
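The hypothetical sketch below draws these random parameters. Two choices are assumptions: the crop size range (50% to 100% of the source) and the allowance for the crop to overhang the image edge by up to a quarter of its size. The source only states that the cut-out may extend beyond the image and that random numbers may be used. The width and height magnifications follow directly from the crop and output sizes:

```python
import random
from dataclasses import dataclass

@dataclass
class CropScale:
    """Hypothetical record of one augmentation: the crop box in the source image
    (which may extend past its edges) and the output teacher-image size."""
    x0: int
    y0: int
    crop_w: int
    crop_h: int
    out_w: int
    out_h: int

    @property
    def sx(self) -> float:
        """Width-direction magnification."""
        return self.out_w / self.crop_w

    @property
    def sy(self) -> float:
        """Height-direction magnification."""
        return self.out_h / self.crop_h

def sample_crop_scale(img_w: int, img_h: int, out_w: int, out_h: int,
                      rng: random.Random, min_frac: float = 0.5) -> CropScale:
    """Draw crop width and height independently, so the aspect ratio generally changes."""
    crop_w = rng.randint(int(img_w * min_frac), img_w)
    crop_h = rng.randint(int(img_h * min_frac), img_h)
    # Assumed rule: the crop may overhang each border by up to a quarter of its size.
    x0 = rng.randint(-crop_w // 4, img_w - crop_w + crop_w // 4)
    y0 = rng.randint(-crop_h // 4, img_h - crop_h + crop_h // 4)
    return CropScale(x0, y0, crop_w, crop_h, out_w, out_h)

p = sample_crop_scale(608, 608, 544, 544, random.Random(0))
print(p, p.sx, p.sy)
```

For the Fig. 5 example, `CropScale(0, 0, 566, 383, 544, 544)` gives `sx = 544/566` and `sy = 544/383`.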

Once teacher image IMG6 has been generated, coordinate transformation unit 102 prepares for the transformation based on the magnification used to generate IMG6. It first obtains the coordinates of vertices P5a to P5d of rectangular region R5, which indicates the position and extent of detection target OB1 in original image IMG5'. These coordinates are calculated from the representative coordinates, width, height, and tilt angle of R5 given in the label associated with image IMG5.
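The following is a minimal sketch of computing the four vertices from such a label. It assumes that the label stores the center as the representative coordinates, that `w` is the side making the tilt angle with the width (x) axis, and that image coordinates have their origin at the top-left. The `offset` parameter (the cut-out's top-left corner in IMG5) is an assumed way of converting from IMG5 coordinates to IMG5' coordinates:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def rotated_rect_vertices(cx: float, cy: float, w: float, h: float, angle_deg: float,
                          offset: Point = (0.0, 0.0)) -> List[Point]:
    """Corners, in order around the rectangle, of a w x h rectangle centred at (cx, cy)
    whose w side makes angle_deg with the x (width) axis.

    offset is subtracted from every corner, e.g. the top-left corner of the cut-out
    area, to express the corners in cut-out (original image) coordinates."""
    a = math.radians(angle_deg)
    ux, uy = math.cos(a) * w / 2, math.sin(a) * w / 2    # half-vector along the w side
    vx, vy = -math.sin(a) * h / 2, math.cos(a) * h / 2   # half-vector along the h side
    ox, oy = offset
    corners = [(cx - ux - vx, cy - uy - vy), (cx + ux - vx, cy + uy - vy),
               (cx + ux + vx, cy + uy + vy), (cx - ux + vx, cy - uy + vy)]
    return [(x - ox, y - oy) for x, y in corners]

# Hypothetical label in IMG5 coordinates, with a cut-out whose top-left is (40, 100).
print(rotated_rect_vertices(300, 280, 260, 30, 35, offset=(40, 100)))
```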

Coordinate transformation unit 102 then transforms the coordinates of vertices P5a to P5d into the coordinates of vertices P6a to P6d of quadrangular region R6, which indicates the position and extent of detection target OB1 in teacher image IMG6. For example, it multiplies the x coordinates of P5a to P5d by the width-direction magnification used to generate IMG6, and their y coordinates by the height-direction magnification.
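The per-axis multiplication described here can be written as follows. This sketch treats coordinates as continuous, with the origin at the top-left corner of the original image (an assumption). The function name is hypothetical:

```python
from typing import List, Tuple

Point = Tuple[float, float]

def scale_vertices(vertices: List[Point], sx: float, sy: float) -> List[Point]:
    """Map vertices from original-image to teacher-image coordinates:
    x is multiplied by the width-direction magnification sx,
    y by the height-direction magnification sy."""
    return [(x * sx, y * sy) for x, y in vertices]

# With the Fig. 5 factors, the far corner of a 566 x 383 original image
# maps to (approximately) the far corner of the 544 x 544 teacher image.
sx, sy = 544 / 566, 544 / 383
print(scale_vertices([(0, 0), (566, 383)], sx, sy))
```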

Because the height-direction and width-direction magnifications applied when generating teacher image IMG6 differ, original image IMG5' and IMG6 have different aspect ratios. Detection target OB1 in IMG6 is therefore distorted, with its height-to-width balance changed. Quadrangular region R6, which corresponds to tilted rectangular region R5, then becomes a parallelogram that follows the outline of the distorted OB1. Scaling the two axes separately preserves parallel lines but not right angles. If the aspect ratio does not change between the original image and the teacher image, the detection target in the teacher image is enlarged or reduced without distortion. In that case the rectangular region of the original image is likewise scaled without distortion, so the transformed region is also a rectangle.
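The statement that R6 is a parallelogram, and a rectangle only when both magnifications are equal, can be checked numerically. This self-contained sketch (all names hypothetical) scales a tilted rectangle. It then tests whether opposite sides remain equal vectors and whether adjacent sides remain perpendicular:

```python
import math

def rect_corners(cx, cy, w, h, angle_deg):
    """Corners, in order, of a w x h rectangle centred at (cx, cy) tilted by angle_deg."""
    a = math.radians(angle_deg)
    u = (math.cos(a) * w / 2, math.sin(a) * w / 2)
    v = (-math.sin(a) * h / 2, math.cos(a) * h / 2)
    return [(cx + su * u[0] + sv * v[0], cy + su * u[1] + sv * v[1])
            for su, sv in ((-1, -1), (1, -1), (1, 1), (-1, 1))]

def scale(points, sx, sy):
    """Multiply x by sx and y by sy."""
    return [(x * sx, y * sy) for x, y in points]

def is_parallelogram(p, tol=1e-6):
    """Opposite sides are equal vectors: p1 - p0 == p2 - p3."""
    return all(abs((p[1][k] - p[0][k]) - (p[2][k] - p[3][k])) < tol for k in (0, 1))

def has_right_angles(p, tol=1e-6):
    """Adjacent sides p0->p1 and p0->p3 are perpendicular (zero dot product)."""
    ax, ay = p[1][0] - p[0][0], p[1][1] - p[0][1]
    bx, by = p[3][0] - p[0][0], p[3][1] - p[0][1]
    return abs(ax * bx + ay * by) < tol

r5 = rect_corners(300, 250, 220, 24, 30)      # hypothetical tilted R5
r6 = scale(r5, 544 / 566, 544 / 383)           # unequal magnifications
r6_uniform = scale(r5, 0.9, 0.9)               # equal magnifications
print(is_parallelogram(r6), has_right_angles(r6))                  # True False
print(is_parallelogram(r6_uniform), has_right_angles(r6_uniform))  # True True
```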

Next, teacher data generation unit 103 generates a label indicating the position and extent of detection target OB1 in teacher image IMG6. The label is based on the coordinates transformed by coordinate transformation unit 102, that is, the coordinates of vertices P6a to P6d. Specifically, teacher data generation unit 103 calculates the following four values from these coordinates:

(1) Representative coordinates of the quadrangular region R6
(2) Length of the long sides of the quadrangular region R6
(3) Tilt angle of the long sides of the quadrangular region R6
(4) Distance between the opposing long sides of the quadrangular region R6

Item (1) indicates the position of the detection target OB1 in the teacher image IMG6. Any coordinates that identify the position of the quadrangular region R6 may serve as the representative coordinates. For example, the teacher data generation unit 103 may use the intersection P6e of the diagonals of R6. It may instead use the top-left vertex of R6, that is, vertex P6a.
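Here is a minimal sketch of item (1). It assumes vertices are (x, y) tuples ordered P6a, P6b, P6c, P6d around the quadrilateral, so that P6a-P6c and P6b-P6d are the diagonals. This ordering follows from the text, where P6a-P6d and P6b-P6c are the long sides. `diagonal_intersection` is a hypothetical helper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def diagonal_intersection(q: List[Point]) -> Point:
    """Intersection P6e of diagonals a-c and b-d of quadrilateral a, b, c, d."""
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = q
    rx, ry = cx - ax, cy - ay          # direction of diagonal a-c
    sx, sy = dx - bx, dy - by          # direction of diagonal b-d
    denom = rx * sy - ry * sx          # 2-D cross product r x s
    if denom == 0:
        raise ValueError("diagonals are parallel")
    # Solve a + t*r = b + u*s for t by taking the cross product with s.
    t = ((bx - ax) * sy - (by - ay) * sx) / denom
    return (ax + t * rx, ay + t * ry)


p6 = [(10, 10), (10, 12), (16, 18), (16, 16)]   # parallelogram P6a..P6d
print(diagonal_intersection(p6))  # (13.0, 14.0) -> P6e
print(p6[0])                      # (10, 10)     -> alternative: vertex P6a
```

For a parallelogram the result equals the midpoint of either diagonal. The general line intersection shown here also works for a non-parallelogram quadrilateral.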

Item (2) indicates the height H of the detection target in the teacher image IMG6. The teacher data generation unit 103 can calculate it as the distance between vertices P6a and P6d, or between vertices P6b and P6c. The length of the short sides of the quadrangular region R6 may be calculated instead of the length of the long sides.
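Below is a minimal sketch of item (2) and its short-side alternative. It assumes the same P6a, P6b, P6c, P6d vertex ordering, in which P6a-P6d is a long side and P6a-P6b is a short side. Both function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def long_side_length(q: List[Point]) -> float:
    """Height H: length of long side P6a-P6d (equal to P6b-P6c in a parallelogram)."""
    a, _, _, d = q
    return math.dist(a, d)


def short_side_length(q: List[Point]) -> float:
    """Alternative: length of short side P6a-P6b."""
    a, b, _, _ = q
    return math.dist(a, b)


p6 = [(10, 10), (10, 12), (16, 18), (16, 16)]
print(long_side_length(p6), short_side_length(p6))  # about 8.485 and 2.0
```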

Item (3) indicates the tilt angle of the detection target in the teacher image IMG6. The teacher data generation unit 103 can calculate it as the angle β that the line through vertices P6a and P6d makes with the width direction of the teacher image IMG6. It can equally use the angle that the line through vertices P6b and P6c makes with the width direction. If the short-side length is calculated for item (2) instead of the long-side length, the tilt angle of the short sides is calculated instead.
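The sketch below computes item (3). It assumes image coordinates (x to the right, y downward) and folds the angle into [0, 180) degrees, because a line has no direction. Both the range and the sign convention are assumptions, not taken from the source. It also shows why β generally differs from the original tilt α when the aspect ratio changes: the direction (cos α, sin α) becomes (sx cos α, sy sin α), so tan β = (sy / sx) tan α.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def tilt_angle_deg(p: Point, q: Point) -> float:
    """Angle beta between line p-q and the width (x) direction, in [0, 180).

    With y pointing downward, a positive angle means the line descends
    to the right on screen.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    # Angles that differ by 180 degrees describe the same line.
    return math.degrees(math.atan2(dy, dx)) % 180.0


def scaled_angle_deg(alpha_deg: float, sx: float, sy: float) -> float:
    """Tilt of a line at alpha_deg after scaling x by sx and y by sy."""
    a = math.radians(alpha_deg)
    return math.degrees(math.atan2(sy * math.sin(a), sx * math.cos(a))) % 180.0


print(tilt_angle_deg((10, 10), (16, 16)))  # about 45 (line P6a -> P6d)
print(scaled_angle_deg(30.0, 2.0, 0.5))    # about 8.2: the tilt flattens
```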

Item (4) indicates the width W of the detection target in the teacher image IMG6. The teacher data generation unit 103 can calculate it as the distance between the line through vertices P6a and P6d and the line through vertices P6b and P6c. If the short-side length is calculated for item (2) instead, the distance between the short sides is calculated instead.
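Because the two long sides of a parallelogram are parallel, the distance between them equals the distance from vertex P6b to the line P6a-P6d. That distance is |(P6d - P6a) × (P6b - P6a)| / |P6d - P6a|. Here is a minimal sketch under the same vertex-ordering assumption; `width_between_long_sides` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def width_between_long_sides(q: List[Point]) -> float:
    """Width W: distance between line P6a-P6d and line P6b-P6c (assumed parallel)."""
    (ax, ay), (bx, by), _, (dx, dy) = q
    ux, uy = dx - ax, dy - ay
    # 2-D cross product = parallelogram area; divide by the base to get the height.
    cross = ux * (by - ay) - uy * (bx - ax)
    return abs(cross) / math.hypot(ux, uy)


p6 = [(10, 10), (10, 12), (16, 18), (16, 16)]
print(width_between_long_sides(p6))  # about 1.414 (sqrt(2))
```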

Alternatively, the teacher data generation unit 103 may use the distance between vertices P6a and P6b, or between vertices P6c and P6d, as the width of the quadrangular region R6. This value is slightly larger than the actual width of OB1.
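The sketch below compares this simpler estimate with the perpendicular width. The short-side length can never be smaller than the perpendicular distance between the long sides, and it equals that distance only when the quadrilateral is a rectangle. The gap grows as the parallelogram becomes more skewed. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def approx_width(q: List[Point]) -> float:
    """Simpler width estimate: length of short side P6a-P6b."""
    return math.dist(q[0], q[1])


def true_width(q: List[Point]) -> float:
    """Perpendicular distance between long sides P6a-P6d and P6b-P6c."""
    (ax, ay), (bx, by), _, (dx, dy) = q
    ux, uy = dx - ax, dy - ay
    return abs(ux * (by - ay) - uy * (bx - ax)) / math.hypot(ux, uy)


p6 = [(10, 10), (10, 12), (16, 18), (16, 16)]
print(approx_width(p6), true_width(p6))  # 2.0 versus about 1.414
```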

As shown in FIG. 2, the teacher data generation unit 103 may also correct vertices P6a to P6d to P6a' to P6d' and identify the rectangle whose four vertices are P6a' to P6d'. It then generates a label indicating the position and range of that rectangle. Specifically, the teacher data generation unit 103 moves vertices P6a to P6d so that two conditions hold: the resulting rectangle has the same area as the parallelogram with vertices P6a to P6d, and its diagonals intersect at P6e. For example, the long sides of the rectangle may be made equal in length to the long sides of the parallelogram, and the short sides equal to the height of the parallelogram (the distance between its long sides). This identifies the rectangle with vertices P6a' to P6d' shown in FIG. 2. The coordinates of P6a' to P6d' may then be used, for example, as the label indicating its position and range. Alternatively, the short sides of the rectangle may be made equal in length to the short sides of the parallelogram, and the long sides equal to the parallelogram's height measured between its short sides.
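Here is a minimal sketch of the first variant, in which the long sides keep the parallelogram's long-side length and the short sides equal its height. It assumes the input is a parallelogram ordered P6a, P6b, P6c, P6d with P6a-P6d as a long side. Under that assumption P6e is the midpoint of P6a-P6c, and the area is preserved because length × height is the parallelogram's area. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def parallelogram_to_rectangle(q: List[Point]) -> List[Point]:
    """Return P6a'..P6d': a rectangle with the same centre P6e and area as q."""
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = q
    ex, ey = (ax + cx) / 2, (ay + cy) / 2      # diagonal intersection P6e
    length = math.hypot(dx - ax, dy - ay)      # long-side length (kept)
    ux, uy = (dx - ax) / length, (dy - ay) / length
    vx, vy = -uy, ux                           # unit normal to the long side
    if vx * (bx - ax) + vy * (by - ay) < 0:    # orient it from a-d towards b-c
        vx, vy = -vx, -vy
    height = abs(ux * (by - ay) - uy * (bx - ax))  # distance between long sides
    hl, hh = length / 2, height / 2
    return [
        (ex - ux * hl - vx * hh, ey - uy * hl - vy * hh),  # P6a'
        (ex - ux * hl + vx * hh, ey - uy * hl + vy * hh),  # P6b'
        (ex + ux * hl + vx * hh, ey + uy * hl + vy * hh),  # P6c'
        (ex + ux * hl - vx * hh, ey + uy * hl - vy * hh),  # P6d'
    ]


p6 = [(10, 10), (10, 12), (16, 18), (16, 16)]
print([(round(x, 6), round(y, 6)) for x, y in parallelogram_to_rectangle(p6)])
# [(10.5, 10.5), (9.5, 11.5), (15.5, 17.5), (16.5, 16.5)]
```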

Finally, the teacher data generation unit 103 associates a label indicating the four calculated values with the teacher image IMG6 to form teacher data. In this way, the teacher data generation unit 103 may use the vertex coordinates obtained by the coordinate conversion unit 102 to calculate four values:
(1) the representative coordinates of the quadrangular region, as information indicating the position of the detection target;
(2) the length of one side of the quadrangular region, as information indicating the height of the detection target;
(3) the tilt angle of that side, as information indicating the tilt angle of the detection target; and
(4) the distance between that side and the opposite side, as information indicating the width of the detection target.
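The four calculations can be combined into one label. In the sketch below, the dictionary keys are illustrative and not a format defined by the source. It assumes a parallelogram ordered P6a, P6b, P6c, P6d whose long side is P6a-P6d, and it uses the diagonal midpoint as the representative coordinates.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def make_label(q: List[Point]) -> Dict[str, float]:
    """Compute the four label values from parallelogram vertices P6a..P6d."""
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = q
    ux, uy = dx - ax, dy - ay
    side = math.hypot(ux, uy)
    return {
        "cx": (ax + cx) / 2,                                     # (1) P6e, x
        "cy": (ay + cy) / 2,                                     # (1) P6e, y
        "height": side,                                          # (2) side P6a-P6d
        "angle_deg": math.degrees(math.atan2(uy, ux)) % 180.0,   # (3) tilt of that side
        "width": abs(ux * (by - ay) - uy * (bx - ax)) / side,    # (4) distance to opposite side
    }


label = make_label([(10, 10), (10, 12), (16, 18), (16, 16)])
print(label)
```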

With this configuration, teacher data can be generated whose labels go beyond the representative coordinates, height, and width commonly used in object detection. The labels also include a tilt angle, which is useful when learning rod-shaped detection targets.

[Processing flow]
The flow of processing executed by the information processing device 1 is described below with reference to FIG. 3, a flowchart showing an example of that processing. The setting information required for learning is assumed to have been loaded before the processing in FIG. 3 starts. This setting information includes, for example, the storage location of the images 111, the range of random numbers used when generating teacher images, the number of images 111 read at a time, and the condition for ending learning.

In S1, the teacher image generation unit 101 acquires images 111 from the storage unit 11. The number of images 111 to acquire, which may be one or more, can be specified in the setting information as described above. The teacher image generation unit 101 may also use a random number to decide which images 111 to acquire.

In S2, the teacher image generation unit 101 determines the crop range for the image 111 acquired in S1. For example, it may determine the position, width, and height of the crop range using random numbers.

In S3 (teacher image generation step), the teacher image generation unit 101 crops the image 111 read in S1 to the crop range determined in S2. The result is the original image from which the teacher image is generated. The unit then generates the teacher image by enlarging or reducing this original image in at least one of the height and width directions. The magnification in each direction is the ratio of the teacher image's height or width to the original image's height or width. The teacher image generation unit 101 may determine the height and width of the teacher image using random numbers.
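The sketch below samples the crop range (S2) and the teacher-image size (S3), then derives the two magnifications. The sampling ranges (`min_frac`, `out_range`) and all names are hypothetical placeholders for the setting information. The pixel operations themselves would be done with an image library, for example Pillow's `Image.crop` and `Image.resize`, and are not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class CropResize:
    x0: int        # crop origin within image 111
    y0: int
    crop_w: int    # size of the original image (the crop)
    crop_h: int
    out_w: int     # size of the teacher image
    out_h: int

    @property
    def sx(self) -> float:   # width-direction magnification
        return self.out_w / self.crop_w

    @property
    def sy(self) -> float:   # height-direction magnification
        return self.out_h / self.crop_h


def sample_crop_resize(img_w: int, img_h: int, rng: random.Random,
                       min_frac: float = 0.5,
                       out_range: tuple = (256, 640)) -> CropResize:
    """Randomly choose a crop range (S2) and a teacher-image size (S3)."""
    crop_w = rng.randint(max(1, int(img_w * min_frac)), img_w)
    crop_h = rng.randint(max(1, int(img_h * min_frac)), img_h)
    x0 = rng.randint(0, img_w - crop_w)
    y0 = rng.randint(0, img_h - crop_h)
    # Width and height are drawn independently, so the aspect ratio usually changes.
    out_w = rng.randint(*out_range)
    out_h = rng.randint(*out_range)
    return CropResize(x0, y0, crop_w, crop_h, out_w, out_h)


params = sample_crop_resize(1920, 1080, random.Random(0))
print(params, params.sx, params.sy)
```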

In S4 (coordinate conversion step), the coordinate conversion unit 102 first calculates the coordinates of the four vertices indicating the range of the detection target in the original image. It then converts these coordinates according to the magnification of the transformation applied to the original image. This yields the coordinates of the four vertices indicating the range of the detection target in the teacher image generated in S3. The calculation of each vertex's coordinates is as described with reference to FIG. 2 and is not repeated here.
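The sketch below covers S4 under one assumption that the source does not state: the detection target's vertices are stored in the coordinates of the full image 111. They are then first shifted by the crop origin into original-image coordinates, and afterwards scaled. Targets that the crop cuts off are not handled. `to_teacher_coords` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_teacher_coords(vertices: List[Point], x0: float, y0: float,
                      sx: float, sy: float) -> List[Point]:
    """Map vertices given in image-111 coordinates into the teacher image.

    1. Subtract the crop origin (x0, y0) to get original-image coordinates.
    2. Multiply by the width/height magnifications sx and sy.
    """
    return [((x - x0) * sx, (y - y0) * sy) for x, y in vertices]


# Crop starting at (100, 50), then resize by 2x in width and 0.5x in height.
print(to_teacher_coords([(110, 60), (130, 90)], 100, 50, 2.0, 0.5))
# [(20.0, 5.0), (60.0, 20.0)]
```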

In S5, the teacher data generation unit 103 works from the vertex coordinates calculated by the coordinate conversion unit 102 in S4. From them, it calculates the representative coordinates, width, height, and tilt angle of the detection target as information indicating its position and range in the teacher image generated in S3. The calculation of these values is as described with reference to FIG. 2 and is not repeated here.

In S6 (teacher data generation step), the teacher data generation unit 103 generates a label from the values calculated in S5 and associates it with the teacher image generated in S3 to form teacher data. It then stores the generated teacher data in the storage unit 11 as teacher data 112. The generation of labels from the converted coordinates is as described with reference to FIG. 2 and is not repeated here.

In S7 (learning step), the learning unit 104 performs machine learning using the teacher data 112 generated in S6. For example, when training a neural network model, the learning unit 104 inputs a teacher image from the teacher data 112 into the model being trained. It then calculates the error between the model's output and the label values associated with that teacher image. Finally, the learning unit 104 updates the weights of the neural network model by backpropagation based on the calculated error.

In S8, the learning unit 104 determines whether to end learning. If learning is to end (YES in S8), the learning unit 104 stores the most recently updated model in the storage unit 11 as the trained model 113, and the processing in FIG. 3 ends. Otherwise (NO in S8), the processing returns to S1 and new images 111 are read. The end condition for S8 may be predefined in the setting information. For example, learning may end once the model has been updated at least a predetermined number of times, or once the error has fallen to or below a threshold.
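The loop below sketches S1 to S8 together with both example end conditions. The S1 to S7 work is abstracted into a hypothetical `step` callable that performs one update and returns the error. The stand-in step is plain gradient descent on a one-parameter toy loss, not a neural network. It only makes the termination logic observable.

```python
from typing import Callable


def train_until_done(step: Callable[[], float], max_updates: int,
                     error_threshold: float) -> int:
    """Repeat one S1-S7 iteration via `step` until the S8 end condition holds.

    Returns the number of updates performed.
    """
    updates = 0
    while True:
        error = step()
        updates += 1
        # S8: stop after enough updates, or once the error is small enough.
        if updates >= max_updates or error <= error_threshold:
            return updates


def make_toy_step(lr: float = 0.25) -> Callable[[], float]:
    """Stand-in for S1-S7: gradient descent on the loss (w - 3)^2."""
    w = 0.0

    def step() -> float:
        nonlocal w
        w -= lr * 2 * (w - 3)   # the gradient of (w - 3)^2 is 2(w - 3)
        return (w - 3) ** 2
    return step


print(train_until_done(make_toy_step(), max_updates=100, error_threshold=0.01))  # 5
print(train_until_done(make_toy_step(), max_updates=3, error_threshold=0.01))    # 3
```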

If multiple images 111 are read in S1, steps S2 to S6 generate teacher data from each of them and store it as teacher data 112. In S7, learning is then performed using that teacher data 112.

The processing in S1 to S6 of FIG. 3 constitutes the teacher data generation method. Adding the model-training steps S7 and S8 to that method gives S1 to S8, which constitute the trained model generation method. S1 to S6 and S7 to S8 need not be performed consecutively. For example, S1 may acquire as many images 111 as learning requires, S2 to S6 may then generate the required amount of teacher data 112 from them, and S7 to S8 may be performed afterward.

To increase the variety of teacher data, the color and rotation angle of the teacher image may also be changed relative to the original image. Various conventional color-conversion and rotation techniques can be applied for this purpose.
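If the teacher image is rotated, its label vertices must be rotated in the same way. The sketch below assumes rotation about a given centre with the canvas size unchanged. The sign convention is an assumption and must match whatever routine rotates the pixels. Rotation preserves right angles, so the rotated region keeps its shape and its label values can be computed as before. `rotate_vertices` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_vertices(vertices: List[Point], angle_deg: float,
                    cx: float, cy: float) -> List[Point]:
    """Rotate label vertices about (cx, cy) by angle_deg.

    In image coordinates (y downward), a positive angle turns points
    clockwise on screen.
    """
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in vertices]


print(rotate_vertices([(2, 1)], 90, 1, 1))  # approximately [(1.0, 2.0)]
```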

As described above, the teacher data generation method executed by the information processing device 1 includes the following steps:
a teacher image generation step (S3) of generating a teacher image by enlarging or reducing an original image showing the detection target in at least one of the height and width directions;
a coordinate conversion step (S4) of converting, based on the magnification of that transformation, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and range of the detection target in the teacher image; and
a teacher data generation step (S6) of generating, based on the converted coordinates, a label indicating the position and range of the detection target in the teacher image, and associating it with the teacher image to form teacher data.
This makes it possible to generate teacher data whose labels correctly indicate the position and range of the detection target in the teacher image.

As described above, the trained model generation method executed by the information processing device 1 includes the following steps:
a teacher image generation step (S3) of generating a teacher image by enlarging or reducing an original image showing the detection target in at least one of the height and width directions;
a coordinate conversion step (S4) of converting, based on the magnification of that transformation, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and range of the detection target in the teacher image;
a teacher data generation step (S6) of generating, based on the converted coordinates, a label indicating the position and range of the detection target in the teacher image, and associating it with the teacher image to form teacher data; and
a learning step (S7) of generating a trained model for detecting the detection target from an image, by machine learning using the teacher data generated in the teacher data generation step.
This makes it possible to automatically generate a trained model for detecting the detection target from an image, using teacher data whose labels correctly indicate the position and range of the detection target.

[Modifications]
Each process described in the above embodiment may be executed by any entity; the examples above are not limiting. That is, an information processing system with the same functions as the information processing device 1 can be built from multiple information processing devices that communicate with one another. For example, teacher image generation, coordinate conversion, teacher data generation, learning, and inference may each be executed by a different information processing device. As another example, an information processing system may include three devices: one that executes the processing from teacher image generation through teacher data generation (that is, the teacher data generation method), one that generates a trained model using the generated teacher data, and one that performs inference using that trained model. Thus, the teacher data generation method described above can be implemented by one or more information processing devices, and the same applies to the trained model generation method. An imaging device that captures images of the detection target may also be included as a component of the information processing system.

[Software implementation example]
The functions of the information processing device 1 (hereinafter, "the device") can be implemented by a program that causes a computer to function as the device. Specifically, the program causes the computer to function as each control block of the device, in particular each unit included in the control unit 10.

In this case, the device includes, as hardware for executing the program, a computer having at least one control device (for example, a processor) and at least one storage device (for example, a memory). The functions described in the above embodiments are realized when the control device and the storage device execute the program.

The control device may be, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a combination of these. The storage device may include a high-speed storage unit that can write and read data quickly, and a large-capacity storage unit with more storage capacity than the high-speed storage unit. The high-speed storage unit may be, for example, a high-speed access memory such as SDRAM (Synchronous Dynamic Random-Access Memory). The large-capacity storage unit may be, for example, an HDD (Hard Disk Drive), an SSD (Solid-State Drive), an SD (Secure Digital) card, or an eMMC (embedded MultiMediaCard).

The program may be recorded on one or more non-transitory computer-readable recording media. These media may or may not be included in the device. If they are not, the program may be supplied to the device via any wired or wireless transmission medium.

Some or all of the functions of each control block may also be implemented by logic circuits formed on an integrated circuit (IC chip) or the like. For example, an integrated circuit on which logic circuits functioning as the control blocks are formed is also within the scope of the present invention. The functions of the control blocks may also be implemented by, for example, a quantum computer.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

1 Information processing device
101 Teacher image generation unit
102 Coordinate conversion unit
103 Teacher data generation unit
104 Learning unit

Claims

1. An information processing device comprising:
a teacher image generation unit that generates a teacher image by enlarging or reducing an original image showing a detection target in at least one of a height direction and a width direction;
a coordinate conversion unit that converts, based on the magnification of the transformation, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and range of the detection target in the teacher image; and
a teacher data generation unit that generates, based on the coordinates converted by the coordinate conversion unit, a label indicating the position, the range, and the tilt angle of the detection target shown in the teacher image, and associates the label with the teacher image to form teacher data,
wherein the teacher data generation unit calculates, from the coordinates of each vertex of the quadrangular region, the tilt angle of one side of the quadrangular region as information indicating the tilt angle of the detection target in the label.

2. The information processing device according to claim 1, wherein the teacher data generation unit calculates, from the coordinates of each vertex of the quadrangular region:
(1) representative coordinates of the quadrangular region as information indicating the position of the detection target in the label; and
(2) the length of one side of the quadrangular region and the distance between that side and the side opposite to it as information indicating the range of the detection target in the label.

3. The information processing device according to claim 1 or 2, further comprising a learning unit that generates a trained model for detecting the detection target from an image by machine learning using the teacher data generated by the teacher data generation unit.

4. A method for generating teacher data, executed by one or more information processing devices, the method comprising:
a teacher image generation step of generating a teacher image by enlarging or reducing an original image showing a detection target in at least one of a height direction and a width direction;
a coordinate conversion step of converting, based on the magnification of the transformation, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and range of the detection target in the teacher image; and
a teacher data generation step of generating, based on the coordinates converted in the coordinate conversion step, a label indicating the position, the range, and the tilt angle of the detection target shown in the teacher image, and associating the label with the teacher image to form teacher data,
wherein the teacher data generation step calculates, from the coordinates of each vertex of the quadrangular region, the tilt angle of one side of the quadrangular region as information indicating the tilt angle of the detection target in the label.

5. A method for generating a trained model, executed by one or more information processing devices, the method comprising:
a teacher image generation step of generating a teacher image by enlarging or reducing an original image showing a detection target in at least one of a height direction and a width direction;
a coordinate conversion step of converting, based on the magnification of the transformation, the coordinates of each vertex of a rectangular region indicating the position and range of the detection target in the original image into the coordinates of each vertex of a quadrangular region indicating the position and range of the detection target in the teacher image;
a teacher data generation step of generating, based on the coordinates converted in the coordinate conversion step, a label indicating the position, the range, and the tilt angle of the detection target shown in the teacher image, and associating the label with the teacher image to form teacher data; and
a learning step of generating a trained model for detecting the detection target from an image by machine learning using the teacher data generated in the teacher data generation step,
wherein the teacher data generation step calculates, from the coordinates of each vertex of the quadrangular region, the tilt angle of one side of the quadrangular region as information indicating the tilt angle of the detection target in the label.

6. A program for causing a computer to function as the information processing device according to claim 1, the program causing the computer to function as the teacher image generation unit, the coordinate conversion unit, and the teacher data generation unit.