JP7711764B2 - Image processing method and image processing device

Info

Publication number: JP7711764B2
Application number: JP2023558635A
Authority: JP
Inventors: カイラン; ヤンリー; ニージャン
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2021-03-24
Filing date: 2021-03-24
Publication date: 2025-07-23
Anticipated expiration: 2041-03-24
Also published as: WO2022198526A1; JP2024512583A

Description

Embodiments of the present disclosure relate generally to the field of image processing and, more particularly, to methods, apparatuses, and computer-readable media for image processing.

In clinical practice, an accurate diagnosis requires considering every aspect of the available evidence. For example, in transnasal endoscopic diagnosis, physicians often manually switch between different light sources at the same camera position to examine suspicious lesions or tumors. Lesions or tumors may exhibit different features under different light sources, which can provide a large amount of information that helps physicians make more accurate and reliable decisions. It is therefore reasonable to assume that these features can also improve the performance of image processing tasks (e.g., semantic segmentation, instance segmentation, and/or object identification) on transnasal endoscopic images.

In general, exemplary embodiments of the present disclosure provide methods, apparatuses, and computer-readable media for image processing.

In a first aspect, a method of image processing is provided. The method includes obtaining a plurality of images of an object captured under different light sources, the plurality of images including a target image and at least one related image, and generating a segmentation label for the target image based on segmentation results of the plurality of images.

In a second aspect, a method of image processing is provided. The method includes obtaining a plurality of images of an object captured under different light sources, registering the plurality of images, and identifying the object based on the registered plurality of images.

In a third aspect, an apparatus for image processing is provided. The apparatus includes at least one processor configured to obtain a plurality of images of an object captured under different light sources, the plurality of images including a target image and at least one related image, and to generate a segmentation label for the target image based on segmentation results of the plurality of images.

In a fourth aspect, an apparatus for image processing is provided. The apparatus includes at least one processor configured to obtain a plurality of images of an object captured under different light sources, register the plurality of images, and identify the object based on the registered plurality of images.

In a fifth aspect, a computer-readable storage medium storing instructions is provided. The instructions, when executed on at least one processor, cause the at least one processor to perform the method according to the first or second aspect of the present disclosure.

In a sixth aspect, a computer program product including machine-executable instructions is provided. The machine-executable instructions, when executed on at least one processor, cause the at least one processor to perform the method according to the first or second aspect of the present disclosure.

It should be understood that this Summary is not intended to identify key or essential features of the embodiments of the present invention, nor to limit the scope of the present invention. Other features of the embodiments will be readily understood from the following description.

The above and other objects, features, and advantages of the present disclosure will become more apparent from the more detailed description of some embodiments of the present disclosure given with reference to the drawings, in which like reference numerals generally refer to like components within the embodiments of the present disclosure.
FIG. 1 illustrates an exemplary image processing system in which embodiments of the present invention may be implemented.
FIG. 2A is a schematic diagram of image processing according to some embodiments of the present disclosure.
FIG. 2B is a schematic diagram of image processing according to some embodiments of the present disclosure.
FIG. 2C is a schematic diagram of image processing according to some embodiments of the present disclosure.
FIG. 3 is a schematic diagram of image processing according to some embodiments of the present disclosure.
FIG. 4 is a schematic diagram of image registration according to some embodiments of the present disclosure.
FIG. 5 illustrates an exemplary method of image processing according to some embodiments of the present disclosure.
FIG. 6 illustrates an exemplary method of image processing according to some embodiments of the present disclosure.
FIG. 7 is a schematic block diagram of an apparatus suitable for implementing embodiments of the present disclosure.

The principles of the present disclosure will now be described with reference to some exemplary embodiments. It should be understood that these embodiments are described for illustrative purposes only, to help those skilled in the art understand and implement the present disclosure, and do not imply any limitation on the scope of the present disclosure. The disclosure described herein can be implemented in various ways other than those described below.

In the following description and claims, unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains.

As used herein, the singular forms "a," "an," and "the" include the plural unless the context clearly indicates otherwise. The term "includes" and its variants should be understood as open-ended terms meaning "including, but not limited to." The term "based on" should be understood as "based at least in part on." The terms "one embodiment" and "an embodiment" should be understood as "at least one embodiment." The term "another embodiment" should be understood as "at least one other embodiment." The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may be included below.

In some examples, values, procedures, or devices are referred to as "best," "lowest," "highest," "minimum," "maximum," and the like. It will be understood that such descriptions are intended to indicate that a selection can be made from among many functional alternatives in use, and that such a selection need not be better, smaller, higher, or otherwise more preferable than other selections.

As used herein, a "neural network" or "network" can process an input and provide a corresponding output, and typically includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer. A neural network typically includes many layers connected in sequence, where the output of a preceding layer is provided as the input of the next layer, the input layer receives the input of the neural network, and the output of the output layer serves as the final output of the neural network. Each layer of a neural network includes one or more nodes (also referred to as processing nodes or neurons), and each node processes the input from the preceding layer. Hereinafter, the terms "neural network," "model," "network," and "neural network model" are used interchangeably.
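As a minimal sketch of the layered structure described above, the following Python fragment chains two fully connected layers so that the output of one layer becomes the input of the next. The helper names `dense` and `forward`, the ReLU activation, and all weight values are assumptions made for illustration only; the disclosure does not prescribe a particular layer type.

```python
def dense(weights, biases):
    """Build one layer: each node takes a weighted sum of the previous layer's outputs."""
    def layer(inputs):
        # ReLU activation is an illustrative assumption.
        return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
                for row, b in zip(weights, biases)]
    return layer


def forward(layers, inputs):
    """Feed the output of each layer to the next layer in sequence."""
    for layer in layers:
        inputs = layer(inputs)
    return inputs


# Hypothetical network: 2 inputs -> hidden layer with 2 nodes -> 1 output node.
hidden = dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
output = dense([[1.0, 1.0]], [0.0])
print(forward([hidden, output], [2.0, 1.0]))
```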

As mentioned above, images of the same object captured under different light sources can exhibit different features and provide a large amount of information useful for improving image processing performance. Conventional image processing solutions perform image segmentation (e.g., semantic segmentation or instance segmentation) or object identification on a single image to obtain a processing result. However, conventional solutions cannot exploit these features or this information to improve image processing performance. Furthermore, because images of the same object may be captured under different light sources at different times, the object may be slightly distorted from one image to another. In this case, performing image segmentation or object identification directly on the images may degrade segmentation or identification accuracy.

Embodiments of the present disclosure provide a solution for image processing that addresses the above problems and one or more other potential problems. In some embodiments, a plurality of images of an object captured under different light sources may be obtained, where the plurality of images includes a target image and at least one related image. A segmentation label for the target image may be generated based on the segmentation results of the plurality of images. In this way, the segmentation results of the images captured under different light sources are combined into a final segmentation result for the target image, which can improve the segmentation accuracy for the target image. In some other embodiments, a plurality of images of an object captured under different light sources may be obtained and registered, and the object may be identified based on the registered images. In this way, image registration removes the effect of slight deformation of the object between different images, which can improve the accuracy of object identification.

Hereinafter, some exemplary embodiments of the present disclosure will be described with reference to the accompanying drawings. However, those skilled in the art will readily understand that the detailed description given herein with respect to these drawings is provided for illustrative purposes only and does not imply any limitation on the scope of the present disclosure.

FIG. 1 is a diagram illustrating an exemplary image processing system 100 in which embodiments of the present invention may be implemented. As shown in FIG. 1, the system 100 may include an image acquisition device 110 and an image processing device 120. In some embodiments, the devices 110 and 120 may be implemented in different physical devices. Alternatively, the devices 110 and 120 may be implemented in the same physical device. It should be understood that the configuration of the system 100 is shown for illustrative purposes only and does not imply any limitation on the scope of the present disclosure. Embodiments of the present disclosure may also be applied to other systems having different configurations.

The image acquisition device 110 may collect images 101 to be processed by the image processing device 120. In some embodiments, the image acquisition device 110 may collect a plurality of images of an object captured under different light sources at the same camera position. For example, the different light sources may be associated with different wavelengths or different combinations of wavelengths. In some embodiments, the image acquisition device 110 may be a medical support device or an endoscopy support device. The images 101 may be medical images, such as endoscopic images. For example, as described above, in transnasal endoscopic diagnosis a physician may manually switch between different light sources at the same camera position to examine a suspicious lesion and capture a plurality of images of the nasal lesion. The images 101 collected by the image acquisition device 110 may be provided to the image processing device 120. The image processing device 120 may process the images 101 to generate an image processing result 102.

In some embodiments, the image processing device 120 may perform an image segmentation task. For example, the plurality of images of the same object may include a target image and at least one related image. The image processing device 120 may generate a segmentation label for the target image based on the segmentation results of the plurality of images. The image processing result 102 may indicate the segmentation label for the target image. As used herein, the "segmentation result" of an image may indicate, for each pixel in the image, the respective probabilities that the pixel belongs to different predetermined categories. For example, a segmentation result may be represented as a heat map in which the brightness of a pixel indicates the probability that the pixel belongs to a given category. The "segmentation label" of an image may indicate that each pixel in the image belongs to one of the predetermined categories. For example, a segmentation label may be represented as a vector or array indicating the category of each pixel, or as a visual image in which pixels of different categories are identified by different colors.
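The distinction between a segmentation result (per-pixel probabilities) and a segmentation label (one category per pixel) can be sketched as follows. This is an illustrative sketch only: the nested-list layout `result[y][x][category]`, the helper names `heat_map` and `to_label`, and the two-category example values are assumptions, not part of the disclosure.

```python
def heat_map(result, category):
    """Brightness (0-255) of each pixel encodes its probability for `category`."""
    return [[round(255 * probs[category]) for probs in row] for row in result]


def to_label(result):
    """Assign each pixel the index of its most probable category."""
    return [[max(range(len(probs)), key=probs.__getitem__) for probs in row]
            for row in result]


# Hypothetical 2x2 image with two categories (0 = background, 1 = lesion).
result = [
    [[0.8, 0.2], [0.4, 0.6]],
    [[0.2, 0.8], [0.6, 0.4]],
]
print(heat_map(result, 1))  # heat map for category 1
print(to_label(result))     # segmentation label as a 2D array
```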

In semantic segmentation, the segmentation result of an image may indicate, for each pixel in the image, the respective probabilities that the pixel belongs to different predetermined semantic categories. The segmentation label of the image may indicate that each pixel in the image belongs to one of the predetermined semantic categories. Examples of semantic categories include, but are not limited to, background, person, animal, and vehicle. In instance segmentation, the segmentation result of an image may indicate, for each pixel in the image, the respective probabilities that the pixel belongs to different predetermined instance categories. The segmentation label of the image may indicate that each pixel in the image belongs to one of the predetermined instance categories. For example, a semantic segmentation network may classify pixels corresponding to different persons in an image into the same semantic category, e.g., person, whereas an instance segmentation network may classify these pixels into different instance categories corresponding to the different persons. In the following, some embodiments are described with reference to semantic segmentation. However, it should be understood that these embodiments are also applicable to instance segmentation.

FIGS. 2A to 2C are schematic diagrams of image processing according to some embodiments of the present disclosure. As shown in FIGS. 2A to 2C, the image processing device 120 may include a registration module 121 and a segmentation module 122. The segmentation module 122 may be a semantic segmentation module or an instance segmentation module. In the example shown in FIGS. 2A to 2C, the segmentation module 122 is a semantic segmentation module. The images processed by the image processing device 120 may include images 201, 202, and 203 of a nasal lesion captured under different light sources.

In FIG. 2A, for example, the image 201 may be selected as the target image on which semantic segmentation is performed. The images 202 and 203 are related images. The registration module 121 may register the related images 202 and 203 to the target image 201 to obtain registered images 204 and 205. In some embodiments, for each of the related images 202 and 203, the registration module 121 may generate a transformed image by registering the related image to the target image 201 based on a first transformation, and then generate a registered image by registering the transformed image to the target image 201 based on a second transformation. In some embodiments, the first transformation may be an affine transformation or a rigid transformation. Examples of the first transformation include, but are not limited to, translation, rotation, and scaling. In some embodiments, the second transformation may be a deformable transformation or a non-rigid transformation. For example, the second transformation may refer to a transformation of the pixels or content of the image. In some embodiments, the registration module 121 may use a trained image registration network to register the related images 202 and 203 to the target image 201, as described below with reference to FIG. 4.
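The two-stage registration described above (a global first transformation followed by a pixel-level second transformation) can be sketched in plain Python. This is a toy illustration under several assumptions: the first transformation is reduced to an integer translation, the second to a per-pixel integer displacement field, and both are supplied by hand rather than estimated by a trained network. The helper names `warp` and `register` are hypothetical.

```python
def warp(image, sample_from):
    """Build an image whose pixel (x, y) is read from sample_from(x, y) in `image`."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx, sy = sample_from(x, y)
            # Pixels sampled from outside the source image are filled with 0.
            row.append(image[sy][sx] if 0 <= sx < w and 0 <= sy < h else 0)
        out.append(row)
    return out


def register(related, shift, field):
    """Apply a global translation, then a per-pixel displacement field."""
    dx, dy = shift
    # First transformation (here: translation only) -> transformed image.
    transformed = warp(related, lambda x, y: (x - dx, y - dy))
    # Second transformation (here: integer displacement per pixel) -> registered image.
    return warp(transformed,
                lambda x, y: (x - field[y][x][0], y - field[y][x][1]))


# Hypothetical related image: a single bright pixel at the top-left corner.
related = [[1, 0, 0],
           [0, 0, 0],
           [0, 0, 0]]
# Displacement field: zero everywhere except a one-pixel push to the right in row 1.
field = [[(0, 0), (0, 0), (0, 0)],
         [(0, 0), (1, 0), (1, 0)],
         [(0, 0), (0, 0), (0, 0)]]
print(register(related, (1, 1), field))
```

The translation moves the bright pixel to the centre, and the displacement field then shifts it one further pixel to the right, mimicking a coarse alignment followed by a local correction.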

As shown in FIG. 2A, the target image 201 and the registered images 204 and 205 may be input to the segmentation module 122. The segmentation module 122 may perform semantic segmentation on the images 201, 204, and 205 to generate their respective semantic segmentation results. The semantic segmentation may be performed using a trained semantic segmentation network, or using any other suitable algorithm now known or developed in the future. The segmentation module 122 may generate a final segmentation result 211 by combining the semantic segmentation results of the images 201, 204, and 205 based on the weights of the images 201, 204, and 205. In some embodiments, the respective weights associated with the images 201, 204, and 205 may be predetermined. For example, the weight associated with the target image 201 may be the highest because no transformation is applied to the target image 201. The weight associated with the registered image 204 and the weight associated with the registered image 205 may be the same as or different from each other. The segmentation module 122 may determine a weighted sum of the semantic segmentation results of the images 201, 204, and 205 as the final segmentation result 211. The final segmentation result 211 may be input to an argmax function 123 to generate a semantic segmentation label 212 for the target image 201. For example, as described above, the semantic segmentation label 212 may indicate the semantic category of each pixel in the target image 201. In some embodiments, the semantic segmentation label 212 may be provided to a doctor or an automated diagnostic system for disease diagnosis.
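The weighted combination followed by argmax can be sketched as below. The weight values (0.5, 0.25, 0.25), the probability values, and the helper names `fuse` and `argmax_label` are assumptions for illustration; the disclosure states only that the weights may be predetermined and that the target image may carry the highest weight.

```python
def fuse(results, weights):
    """Weighted sum of per-pixel category probabilities across images."""
    h, w, k = len(results[0]), len(results[0][0]), len(results[0][0][0])
    return [[[sum(wt * r[y][x][c] for r, wt in zip(results, weights))
              for c in range(k)]
             for x in range(w)]
            for y in range(h)]


def argmax_label(result):
    """Pick the most probable category for each pixel."""
    return [[max(range(len(p)), key=p.__getitem__) for p in row] for row in result]


# Hypothetical segmentation results for a 1x2 image with two categories.
r201 = [[[0.9, 0.1], [0.45, 0.55]]]  # target image 201
r204 = [[[0.3, 0.7], [0.2, 0.8]]]    # registered image 204
r205 = [[[0.4, 0.6], [0.6, 0.4]]]    # registered image 205
weights = [0.5, 0.25, 0.25]          # target image weighted highest

final = fuse([r201, r204, r205], weights)  # plays the role of result 211
label = argmax_label(final)                # plays the role of label 212
print(label)
```

In this example the first pixel keeps category 0 even though both registered images favour category 1, because the target image carries the largest weight.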

In some embodiments, for a group of images of the same object captured under different light sources, each image can in turn be selected as the target image, so that a group of segmentation labels can be generated for the group of images. In some embodiments, the group of segmentation labels may be provided directly to a doctor or an automated diagnostic system for disease diagnosis. Alternatively, the group of segmentation labels may be combined in a suitable manner and then provided to a doctor or an automated diagnostic system for disease diagnosis.

As shown in FIG. 2B, for example, the image 203 may be selected as the target image, and the images 202 and 201 may be selected as related images. The registration module 121 may register the related images 202 and 201 to the target image 203 to obtain registered images 206 and 207. The segmentation module 122 may perform semantic segmentation on the images 203, 206, and 207 and combine their semantic segmentation results into a final segmentation result 221. The final segmentation result 221 may be input to the argmax function 123 to generate a semantic segmentation label 222 for the target image 203. As shown in FIG. 2C, for example, the image 202 may be selected as the target image, and the images 203 and 201 may be selected as related images. The registration module 121 may register the related images 203 and 201 to the target image 202 to obtain registered images 208 and 209. The segmentation module 122 may perform semantic segmentation on the images 202, 208, and 209 and combine their semantic segmentation results into a final segmentation result 231. The final segmentation result 231 may be input to the argmax function 123 to generate a semantic segmentation label 232 for the target image 202.

For example, the semantic segmentation labels 212, 222, and 232 may be provided directly to a doctor or an automated diagnostic system for disease diagnosis. Alternatively, the semantic segmentation labels 212, 222, and 232 may be combined in a suitable manner and then provided to a doctor or an automated diagnostic system for disease diagnosis.
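The disclosure leaves open how a group of labels is "combined in a suitable manner". One possible choice, shown here purely as an assumption, is a per-pixel majority vote. The sketch further assumes that the three labels have already been brought onto a common pixel grid, which the text does not state; the function name `majority_vote` and the label values are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Per pixel, keep the category chosen by most of the labels."""
    h, w = len(labels[0]), len(labels[0][0])
    return [[Counter(lab[y][x] for lab in labels).most_common(1)[0][0]
             for x in range(w)]
            for y in range(h)]


# Hypothetical 1x3 labels standing in for labels 212, 222, and 232.
label_212 = [[0, 1, 1]]
label_222 = [[0, 1, 0]]
label_232 = [[1, 1, 0]]
print(majority_vote([label_212, label_222, label_232]))
```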

In some embodiments, the image processing device 120 shown in FIG. 1 may perform an object identification task. For example, in response to obtaining a plurality of images of an object captured under different light sources, the image processing device 120 may register the plurality of images and identify the object based on the registered images.

FIG. 3 is a schematic diagram of image processing according to some embodiments of the present disclosure. As shown in FIG. 3, the image processing device 120 may include the same registration module 121 as in FIGS. 2A to 2C and an object identification module 124. The images processed by the image processing device 120 may include images 301, 302, and 303 of a nasal lesion captured under different light sources.

In FIG. 3, for example, the image 301 may be selected as the target image. The images 302 and 303 are related images. The registration module 121 may register the related images 302 and 303 to the target image 301 to obtain registered images 304 and 305. The images 301, 304, and 305 may be input to the object identification module 124. In some embodiments, the object identification module 124 may identify the object based on the images 301, 304, and 305. For example, the object identification module 124 may perform object identification on each of the images 301, 304, and 305, using a trained object identification network or any suitable algorithm now known or developed in the future, to obtain their respective object identification results. The object identification module 124 may then combine the object identification results into a final object identification result 306 based on their respective weights. Alternatively, in some embodiments, the object identification module 124 may perform semantic segmentation or instance segmentation on each of the images 301, 304, and 305 to obtain their respective segmentation results, and combine the segmentation results into a final segmentation result based on their respective weights. The object identification module 124 may then identify the object based on the final segmentation result to obtain the final object identification result 306. For example, the final object identification result 306 may be provided to a doctor or an automated diagnostic system for disease diagnosis.
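The first alternative above, in which per-image identification results are combined by weight, might look like the following sketch. The score dictionaries, the category names "lesion" and "normal", the weights, and the function name `fuse_identification` are all hypothetical; the disclosure does not specify the form of an object identification result.

```python
def fuse_identification(results, weights):
    """Combine per-image category scores by weight and pick the best category."""
    combined = {}
    for scores, weight in zip(results, weights):
        for name, score in scores.items():
            combined[name] = combined.get(name, 0.0) + weight * score
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical identification scores for images 301, 304, and 305.
r301 = {"lesion": 0.55, "normal": 0.45}
r304 = {"lesion": 0.80, "normal": 0.20}
r305 = {"lesion": 0.30, "normal": 0.70}

best, combined = fuse_identification([r301, r304, r305], [0.5, 0.25, 0.25])
print(best, combined)
```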

In some embodiments, for a group of images of the same object captured under different light sources, each image can in turn be selected as the target image, so that a group of object identification results can be generated for the group of images. In some embodiments, the group of object identification results may be provided directly to a doctor or an automated diagnostic system for disease diagnosis. Alternatively, the group of object identification results may be combined in a suitable manner and then provided to a doctor or an automated diagnostic system for disease diagnosis.

図４は本開示のいくつかの実施形態にかかる画像レジストレーションの模式図である。図４は、図２Ａ～２Ｃ及び図３に示すレジストレーションモジュール１２１において使用されることができる画像レジストレーションネットワーク４００を示す。画像レジストレーションネットワーク４００は、サブネットワーク４１０及び４２０を含む。サブネットワーク４１０は、第１の変換に基づいて画像をレジストレーションするように訓練されてもよい。上述したように、第１の変換は、アフィン変換又は剛体変換であってもよい。第１の変換の例は、平行移動、回転、スケーリング（ｓｃａｌｉｎｇ）などを含んでもよいが、これらに限定されない。サブネットワーク４２０は、第２の変換に基づいて画像をレジストレーションするように訓練されてもよい。上述したように、第２の変換は、変形可能な変換又は非剛体変換であってもよい。例えば、第２の変換は、画像の画素又はコンテンツの変換を指してもよい。画像レジストレーションネットワーク４００は、１グループの訓練画像ペアに基づいて訓練されてもよく、ここで、各訓練画像ペアは、固定画像と、固定画像に対する移動画像とを含む。該１グループの訓練画像ペア内の固定画像と移動画像とは、同じ画素数を有してもよい。 FIG. 4 is a schematic diagram of image registration according to some embodiments of the present disclosure. FIG. 4 illustrates an image registration network 400 that can be used in the registration module 121 illustrated in FIGS. 2A-2C and 3. The image registration network 400 includes sub-networks 410 and 420. The sub-network 410 may be trained to register images based on a first transformation. As described above, the first transformation may be an affine transformation or a rigid transformation. Examples of the first transformation may include, but are not limited to, translation, rotation, scaling, and the like. The sub-network 420 may be trained to register images based on a second transformation. As described above, the second transformation may be a deformable transformation or a non-rigid transformation. For example, the second transformation may refer to a transformation of pixels or content of an image. The image registration network 400 may be trained based on a group of training image pairs, where each training image pair includes a fixed image and a moving image relative to the fixed image. The fixed and moving images in the group of training image pairs may have the same number of pixels.

図４に示すように、訓練段階において、固定画像４０２（以下では「Ｆ」とも表される）と移動画像４０１（以下では「Ｍ」とも表される）とを含む各訓練画像ペアをサブネットワーク４１０に入力して、第１の変換関数［数式］を抽出する。例えば、第１の変換関数［数式］は、アフィン変換関数又は剛体変換関数である。第１の変換関数［数式］に基づいて移動画像４０１を固定画像４０２にレジストレーションすることにより、第１の変換画像４０３を生成することができる。例えば、第１の変換画像４０３は、［数式］として表されてもよい。第１の変換画像４０３及び固定画像４０２は、サブネットワーク４２０に入力されて、第２の変換関数を抽出してもよい。例えば、第２の変換関数［数式］は、変形可能な変換関数又は非剛体変換関数である。第２の変換関数［数式］に基づいて第１の変換画像４０３を固定画像４０２にレジストレーションすることにより、第２の変換画像４０４を生成することができる。例えば、第２の変換画像４０４は、［数式］として表されてもよい。 As shown in FIG. 4, during the training phase, each training image pair including a fixed image 402 (hereinafter also denoted as “F”) and a moving image 401 (hereinafter also denoted as “M”) is input to sub-network 410 to extract a first transformation function [formula not reproduced in this text]. For example, the first transformation function is an affine transformation function or a rigid transformation function. A first transformed image 403 can be generated by registering the moving image 401 to the fixed image 402 based on the first transformation function. For example, the first transformed image 403 may be expressed as [formula not reproduced in this text]. The first transformed image 403 and the fixed image 402 may be input to sub-network 420 to extract a second transformation function [formula not reproduced in this text]. For example, the second transformation function is a deformable or non-rigid transformation function. A second transformed image 404 can be generated by registering the first transformed image 403 to the fixed image 402 based on the second transformation function. For example, the second transformed image 404 may be expressed as [formula not reproduced in this text].
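
The two-stage data flow in this paragraph (moving image, then first transformed image, then second transformed image) can be sketched as below. This is a minimal illustration under stated assumptions, not the disclosed networks: in the disclosure the transformation functions are extracted by sub-networks 410 and 420, whereas here they are supplied by hand. The names `sample`, `warp_affine`, and `warp_flow`, the six-parameter affine layout, the per-pixel displacement field, and nearest-neighbour sampling are all assumptions chosen to keep the example dependency-free.

```python
from typing import List, Tuple

Image = List[List[float]]
# Assumed affine layout: x_src = a*x + b*y + tx, y_src = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]
# Assumed deformable transform: one (dx, dy) displacement per output pixel
Flow = List[List[Tuple[float, float]]]


def sample(img: Image, x: float, y: float) -> float:
    """Nearest-neighbour lookup; positions outside the image read as 0.0."""
    h, w = len(img), len(img[0])
    xi, yi = int(round(x)), int(round(y))
    return img[yi][xi] if 0 <= xi < w and 0 <= yi < h else 0.0


def warp_affine(moving: Image, affine: Affine) -> Image:
    """First stage: resample the moving image through a global affine map."""
    a, b, tx, c, d, ty = affine
    h, w = len(moving), len(moving[0])
    return [[sample(moving, a * x + b * y + tx, c * x + d * y + ty)
             for x in range(w)] for y in range(h)]


def warp_flow(img: Image, flow: Flow) -> Image:
    """Second stage: resample through a per-pixel displacement field."""
    h, w = len(img), len(img[0])
    return [[sample(img, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(w)] for y in range(h)]


fixed = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
# The moving image is the fixed image shifted right by one pixel,
# with an extra local distortion in the bottom row.
moving = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

affine = (1.0, 0.0, 1.0, 0.0, 1.0, 0.0)          # undo the global shift
flow = [[(0.0, 0.0)] * 3, [(0.0, 0.0)] * 3, [(-1.0, 0.0)] * 3]  # fix bottom row

first_transformed = warp_affine(moving, affine)
second_transformed = warp_flow(first_transformed, flow)
```

In this toy case the affine stage removes the global shift but leaves the bottom row misaligned, and the displacement field then corrects that residual local difference, which is the division of labour the paragraph describes.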

いくつかの実施形態において、画像レジストレーションネットワーク４００を訓練するためのターゲット損失は、固定画像４０２と、第１の変換画像４０３と、第２の変換画像４０４とに基づいて決定されてもよい。画像レジストレーションネットワーク４００のネットワークパラメータは、ターゲット損失が最小になるように、反復的に更新されてもよい。いくつかの実施形態において、画像レジストレーションネットワーク４００を訓練するためのターゲット損失は、１グループの損失の加重合計として決定されてもよい。 In some embodiments, a target loss for training the image registration network 400 may be determined based on the fixed image 402, the first transformed image 403, and the second transformed image 404. Network parameters of the image registration network 400 may be iteratively updated such that the target loss is minimized. In some embodiments, the target loss for training the image registration network 400 may be determined as a weighted sum of a group of losses.

図４に示すように、いくつかの実施形態において、第１の類似度損失４４１は、固定画像４０２（すなわち、Ｆ）及び第１の変換画像４０３（すなわち、［数式］）に基づいて決定されてもよい。例えば、第１の類似度損失４４１は、次式として表されてもよい。［数式］ここで、Ｎは、単一の画像（即ち、固定画像又は移動画像）に含まれる画素数を表す。 As shown in FIG. 4, in some embodiments, the first similarity loss 441 may be determined based on the fixed image 402 (i.e., F) and the first transformed image 403. For example, the first similarity loss 441 may be expressed as [formula not reproduced in this text], where N represents the number of pixels contained in a single image (i.e., a fixed image or a moving image).

図４に示すように、いくつかの実施形態において、第２の類似度損失４４２は、固定画像４０２（すなわち、Ｆ）及び第２の変換画像４０４（すなわち、［数式］）に基づいて決定されてもよい。例えば、第２の類似度損失４４２は、次式として表されてもよい。［数式］ As shown in FIG. 4, in some embodiments, the second similarity loss 442 may be determined based on the fixed image 402 (i.e., F) and the second transformed image 404. For example, the second similarity loss 442 may be expressed as [formula not reproduced in this text].

図４に示すように、いくつかの実施形態において、第３の類似度損失４４３は、第１の変換画像４０３（すなわち、［数式］）及び第２の変換画像４０４（すなわち、［数式］）に基づいて決定されてもよい。例えば、第３の類似度損失４４３は、次式として表されてもよい。［数式］ここで［数式］である。第３の類似度損失［数式］は、レジストレーション精度を向上させるための、［数式］と［数式］とについての逆方向類似度損失（backward similarity loss）である。 As shown in FIG. 4, in some embodiments, the third similarity loss 443 may be determined based on the first transformed image 403 and the second transformed image 404. For example, the third similarity loss 443 may be expressed as [formula not reproduced in this text], where [definition not reproduced in this text]. The third similarity loss is a backward similarity loss between two quantities [symbols not reproduced in this text] and is intended to improve registration accuracy.

図４に示すように、いくつかの実施形態において、空間平滑性損失４４４は、第１の変換画像４０３（すなわち、［数式］）及び第２の変換４３２（すなわち、［数式］）に基づいて決定されてもよい。例えば、空間平滑性損失４４４は、次式として表されてもよい。［数式］ここで、空間平滑性損失は、空間的に平滑な変形を強制するように、［数式］に正則化制約を設ける。いくつかの実施形態において、ターゲット損失Ｌは、上記の全ての損失の加重合計として、すなわち、次式として決定することができる。［数式］ここで、［数式］である。 As shown in FIG. 4, in some embodiments, the spatial smoothness loss 444 may be determined based on the first transformed image 403 and the second transformation 432. For example, the spatial smoothness loss 444 may be expressed as [formula not reproduced in this text]. Here, the spatial smoothness loss places a regularization constraint on [symbol not reproduced in this text] so as to enforce spatially smooth deformations. In some embodiments, the target loss L can be determined as a weighted sum of all the losses above, i.e., as [formula not reproduced in this text], where [definition not reproduced in this text].
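
Because the formulas for the individual losses are not reproduced in this text, the following is only a sketch of how a weighted sum of three similarity terms and one smoothness term might be computed. Every concrete form here is an assumption: mean squared error is used for all three similarity terms (the disclosure describes the third as a backward similarity loss whose exact form is not shown), the smoothness term is taken as the mean squared forward difference of a per-pixel displacement field, and the function names and weights are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]
Flow = List[List[Tuple[float, float]]]  # assumed per-pixel (dx, dy) field


def mse(a: Image, b: Image) -> float:
    """Mean squared difference over the N pixels of two same-sized images."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b)) / n


def smoothness(flow: Flow) -> float:
    """Assumed regularizer: mean squared forward difference of the field."""
    h, w = len(flow), len(flow[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            if x + 1 < w:  # horizontal neighbour
                ex, ey = flow[y][x + 1]
                total += (ex - dx) ** 2 + (ey - dy) ** 2
            if y + 1 < h:  # vertical neighbour
                ex, ey = flow[y + 1][x]
                total += (ex - dx) ** 2 + (ey - dy) ** 2
    return total / (h * w)


def target_loss(fixed: Image, first: Image, second: Image, flow: Flow,
                weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the four loss terms (weights are hypothetical)."""
    w1, w2, w3, w4 = weights
    return (w1 * mse(fixed, first)      # first similarity loss (assumed MSE)
            + w2 * mse(fixed, second)   # second similarity loss (assumed MSE)
            + w3 * mse(first, second)   # stand-in for the third similarity loss
            + w4 * smoothness(flow))    # spatial smoothness loss (assumed form)


fixed = [[0.0, 1.0], [1.0, 0.0]]
first = [[0.0, 0.0], [1.0, 0.0]]
second = [[0.0, 1.0], [1.0, 0.0]]
flow = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]

loss_equal = target_loss(fixed, first, second, flow)
loss_custom = target_loss(fixed, first, second, flow, (1.0, 2.0, 0.5, 0.1))
```

The point of the sketch is the structure of the objective: each term can be computed independently from the fixed image, the two transformed images, and the deformable transform, and the weights control their relative influence during training.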

上記に鑑みて、本開示の実施形態は、画像処理のための解決策を提供することが分かる。本開示のいくつかの実施形態によれば、異なる光源の下でキャプチャされた物体に関する画像のセグメンテーション結果を合成してターゲット画像についての最終的なセグメンテーション結果を得ることにより、ターゲット画像についての画像セグメンテーション（例えば、セマンティックセグメンテーション又はインスタンスセグメンテーション）の精度を向上させることができる。追加として、異なる画像間での物体のわずかな変形の影響を画像レジストレーションにより除去することにより、画像セグメンテーション及び／又は物体識別の精度を向上させることができる。 In view of the above, it can be seen that embodiments of the present disclosure provide a solution for image processing. According to some embodiments of the present disclosure, the accuracy of image segmentation (e.g., semantic segmentation or instance segmentation) for a target image can be improved by combining segmentation results of images for an object captured under different light sources to obtain a final segmentation result for the target image. Additionally, the accuracy of image segmentation and/or object identification can be improved by removing the effects of slight deformations of the object between different images through image registration.

図５は本開示のいくつかの実施形態にかかる画像処理の例示的な方法５００を示す図である。方法５００は、図１に示すような画像処理装置１２０において実現できる。方法５００は、図示されていない追加のブロックを含んでもよく、且つ／又は図示されているいくつかのブロックを省略してもよく、本開示の範囲はこの点で限定されないことを理解すべきである。 FIG. 5 illustrates an exemplary method 500 of image processing according to some embodiments of the present disclosure. Method 500 may be implemented in image processing device 120 such as that shown in FIG. 1. It should be understood that method 500 may include additional blocks not shown and/or omit some of the blocks shown, and that the scope of the present disclosure is not limited in this respect.

ブロック５１０において、画像処理装置１２０は、異なる光源の下でキャプチャされた同じ物体に関する複数の画像を取得してもよく、ここで、該複数の画像は、ターゲット画像と少なくとも１つの関連画像とを含む。 In block 510, the image processing device 120 may obtain multiple images of the same object captured under different light sources, where the multiple images include a target image and at least one related image.

ブロック５２０において、画像処理装置１２０は、複数の画像のセグメンテーション結果に基づいて、ターゲット画像についてのセグメンテーションラベルを生成してもよい。 In block 520, the image processing device 120 may generate a segmentation label for the target image based on the segmentation results of the multiple images.

いくつかの実施形態において、方法５００は、該少なくとも１つの関連画像を該ターゲット画像にレジストレーションすることにより、少なくとも１つのレジストレーション画像を生成することと、該ターゲット画像と該少なくとも１つのレジストレーション画像とについて、セマンティックセグメンテーション又はインスタンスセグメンテーションを実行することにより、該セグメンテーション結果を生成することと、をさらに含んでもよい。 In some embodiments, the method 500 may further include generating at least one registered image by registering the at least one related image to the target image, and generating the segmentation result by performing semantic segmentation or instance segmentation on the target image and the at least one registered image.

いくつかの実施形態において、該少なくとも１つのレジストレーション画像を生成することは、該少なくとも１つの関連画像のうちの各関連画像について、第１の変換に基づいて該少なくとも１つの関連画像を該ターゲット画像にレジストレーションすることにより、変換画像を生成することと、第２の変換に基づいて該変換画像を該ターゲット画像にレジストレーションすることにより、レジストレーション画像を生成することと、を含む。 In some embodiments, generating the at least one registration image includes, for each related image of the at least one related image, generating a transformed image by registering the at least one related image to the target image based on a first transformation, and generating a registration image by registering the transformed image to the target image based on a second transformation.

いくつかの実施形態において、第１の変換はアフィン変換であり、第２の変換は変形可能な変換である。 In some embodiments, the first transformation is an affine transformation and the second transformation is a deformable transformation.

いくつかの実施形態において、該少なくとも１つの関連画像を該ターゲット画像にレジストレーションすることは、訓練済みの画像レジストレーションネットワークを使用することにより、該少なくとも１つの関連画像を該ターゲット画像にレジストレーションすることを含む。 In some embodiments, registering the at least one related image to the target image includes registering the at least one related image to the target image by using a trained image registration network.

いくつかの実施形態において、方法５００は、１グループの訓練画像ペアに基づいて該画像レジストレーションネットワークを訓練することをさらに含んでもよく、ここで、各訓練画像ペアは、固定画像と、該固定画像に対する移動画像とを含む。 In some embodiments, the method 500 may further include training the image registration network based on a group of training image pairs, where each training image pair includes a fixed image and a moving image relative to the fixed image.

いくつかの実施形態において、該画像レジストレーションネットワークを訓練することは、アフィン変換に基づいて該移動画像を該固定画像にレジストレーションすることにより第１の変換画像を生成することと、変形可能な変換に基づいて該第１の変換画像を該固定画像にレジストレーションすることにより第２の変換画像を生成することと、該固定画像と、該第１の変換画像と、該第２の変換画像とに基づいて、該画像レジストレーションネットワークを訓練するためのターゲット損失を決定することと、該ターゲット損失が最小化されるように該画像レジストレーションネットワークを訓練することと、を含む。 In some embodiments, training the image registration network includes generating a first transformed image by registering the moving image to the fixed image based on an affine transformation, generating a second transformed image by registering the first transformed image to the fixed image based on a deformable transformation, determining a target loss for training the image registration network based on the fixed image, the first transformed image, and the second transformed image, and training the image registration network such that the target loss is minimized.

いくつかの実施形態において、該ターゲット損失を決定することは、該固定画像と該第１の変換画像とに基づいて第１の類似度損失を決定することと、該固定画像と該第２の変換画像とに基づいて第２の類似度損失を決定することと、該第１の変換画像と該変形可能な変換に対応する関数とに基づいて、空間平滑性損失を決定することと、該第１の変換画像と該第２の変換画像とに基づいて第３の類似度損失を決定することと、該第１の類似度損失と、該第２の類似度損失と、該空間平滑性損失と、該第３の類似度損失との加重合計に基づいて該ターゲット損失を決定することと、を含む。 In some embodiments, determining the target loss includes determining a first similarity loss based on the fixed image and the first transformed image, determining a second similarity loss based on the fixed image and the second transformed image, determining a spatial smoothness loss based on the first transformed image and a function corresponding to the deformable transformation, determining a third similarity loss based on the first transformed image and the second transformed image, and determining the target loss based on a weighted sum of the first similarity loss, the second similarity loss, the spatial smoothness loss, and the third similarity loss.

いくつかの実施形態において、該ターゲット画像と該少なくとも１つのレジストレーション画像とについて、セマンティックセグメンテーション又はインスタンスセグメンテーションを実行することは、訓練済みのセマンティックセグメンテーションネットワークを使用することにより、該ターゲット画像と該少なくとも１つのレジストレーション画像とについて該セマンティックセグメンテーションを実行すること、又は、訓練済みのインスタンスセグメンテーションネットワークを使用することにより、該ターゲット画像と該少なくとも１つのレジストレーション画像とについて該インスタンスセグメンテーションを実行すること、を含む。 In some embodiments, performing semantic segmentation or instance segmentation on the target image and the at least one registration image includes performing the semantic segmentation on the target image and the at least one registration image by using a trained semantic segmentation network, or performing the instance segmentation on the target image and the at least one registration image by using a trained instance segmentation network.

いくつかの実施形態において、該ターゲット画像についての該セマンティックセグメンテーションラベルを生成することは、該セグメンテーション結果の加重合計に基づいて該ターゲット画像についての最終的なセグメンテーション結果を決定することと、該最終的なセグメンテーション結果に基づいて該ターゲット画像についての該セグメンテーションラベルを生成することと、を含む。 In some embodiments, generating the semantic segmentation label for the target image includes determining a final segmentation result for the target image based on a weighted sum of the segmentation results, and generating the segmentation label for the target image based on the final segmentation result.
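
The weighted combination of segmentation results into a label can be sketched as follows. The disclosure does not specify the representation of a segmentation result or how a label is derived from the weighted sum, so the per-class probability maps, the normalization by the sum of the weights, the per-pixel argmax, and the names `fuse_segmentations` and `to_label` are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed layout: prob_map[c][y][x] is the probability of class c at (y, x)
ProbMap = List[List[List[float]]]


def fuse_segmentations(prob_maps: Sequence[ProbMap],
                       weights: Sequence[float]) -> ProbMap:
    """Per-pixel weighted average of the class probabilities of each image."""
    total = sum(weights)
    n_classes = len(prob_maps[0])
    h, w = len(prob_maps[0][0]), len(prob_maps[0][0][0])
    return [[[sum(wk * pm[c][y][x] for wk, pm in zip(weights, prob_maps)) / total
              for x in range(w)]
             for y in range(h)]
            for c in range(n_classes)]


def to_label(fused: ProbMap) -> List[List[int]]:
    """Pick the class with the highest fused probability at every pixel."""
    n_classes = len(fused)
    h, w = len(fused[0]), len(fused[0][0])
    return [[max(range(n_classes), key=lambda c: fused[c][y][x])
             for x in range(w)]
            for y in range(h)]


# Two classes (0 = background, 1 = lesion) on a 1x2 image.
target_result = [[[0.4, 0.8]], [[0.6, 0.2]]]     # from the target image
registered_a = [[[0.7, 0.1]], [[0.3, 0.9]]]      # from a registered image
registered_b = [[[0.8, 0.2]], [[0.2, 0.8]]]      # from another registered image

fused = fuse_segmentations([target_result, registered_a, registered_b],
                           weights=[0.5, 0.25, 0.25])
label = to_label(fused)
```

In this toy input the target image alone would label the pixels `[[1, 0]]`, while the fused result yields `[[0, 1]]`, which illustrates how evidence from images under other light sources can change the label assigned to the target image.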

いくつかの実施形態において、異なる光源は、異なる波長又は波長の異なる組み合わせに関連付けられている。 In some embodiments, different light sources are associated with different wavelengths or different combinations of wavelengths.

図６は本開示のいくつかの実施形態にかかる画像処理の例示的な方法６００を示す図である。方法６００は、図１に示すような画像処理装置１２０において実現できる。方法６００は、図示されていない追加のブロックを含んでもよく、且つ／又は図示されているいくつかのブロックを省略してもよく、本開示の範囲はこの点で限定されないことを理解すべきである。 FIG. 6 illustrates an exemplary method 600 of image processing according to some embodiments of the present disclosure. Method 600 may be implemented in an image processing device 120 such as that shown in FIG. 1. It should be understood that method 600 may include additional blocks not shown and/or omit some of the blocks shown, and the scope of the present disclosure is not limited in this respect.

ブロック６１０において、画像処理装置１２０は、異なる光源の下でキャプチャされた同じ物体に関する複数の画像を取得してもよい。 In block 610, the image processing device 120 may acquire multiple images of the same object captured under different light sources.

ブロック６２０において、画像処理装置１２０は、複数の画像をレジストレーションしてもよい。 In block 620, the image processing device 120 may register the multiple images.

ブロック６３０において、画像処理装置１２０は、レジストレーションされた複数の画像に基づいて物体を識別してもよい。 In block 630, the image processing device 120 may identify the object based on the registered images.

いくつかの実施形態において、該複数の画像は、ターゲット画像と少なくとも１つの関連画像とを含み、該複数の画像をレジストレーションすることは、該少なくとも１つの関連画像を該ターゲット画像にレジストレーションすることにより少なくとも１つのレジストレーション画像を生成することを含み、ここで、レジストレーションされた該複数の画像は、該少なくとも１つのレジストレーション画像と該ターゲット画像とを含む。 In some embodiments, the plurality of images includes a target image and at least one related image, and registering the plurality of images includes generating at least one registered image by registering the at least one related image to the target image, where the registered plurality of images includes the at least one registered image and the target image.

いくつかの実施形態において、レジストレーションされた該複数の画像に基づいて該物体を識別することは、該ターゲット画像と該少なくとも１つのレジストレーション画像とについて、セマンティックセグメンテーション又はインスタンスセグメンテーションを実行することにより、セグメンテーション結果を生成することと、該セグメンテーション結果に基づいて該物体を識別することと、を含む。 In some embodiments, identifying the object based on the registered images includes performing semantic or instance segmentation on the target image and the at least one registered image to generate a segmentation result, and identifying the object based on the segmentation result.
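
The flow of blocks 610 to 630, together with the segmentation-based identification just described, can be sketched as a composition of steps. This is a sketch under assumptions: `identify_from_images` and its four callables are hypothetical placeholders for the registration network, the segmentation network, the combination rule, and the identification step, and the toy stand-ins below (identity registration, thresholding, majority vote) are not the disclosed techniques.

```python
from typing import Callable, List, Sequence, TypeVar

Img = TypeVar("Img")
Seg = TypeVar("Seg")
Out = TypeVar("Out")


def identify_from_images(
    images: Sequence[Img],
    target_index: int,
    register: Callable[[Img, Img], Img],
    segment: Callable[[Img], Seg],
    combine: Callable[[List[Seg]], Seg],
    identify: Callable[[Seg], Out],
) -> Out:
    """Register related images to the target, segment each, then identify."""
    target = images[target_index]
    registered = [target] + [register(img, target)
                             for i, img in enumerate(images)
                             if i != target_index]
    segmentations = [segment(img) for img in registered]
    return identify(combine(segmentations))


# Toy stand-ins operating on one-dimensional "images" of intensities.
def no_op_register(img: List[float], target: List[float]) -> List[float]:
    return img  # assumes the images are already aligned


def threshold_segment(img: List[float]) -> List[int]:
    return [1 if p > 0.5 else 0 for p in img]


def majority_vote(segs: List[List[int]]) -> List[int]:
    return [1 if sum(col) * 2 >= len(segs) else 0 for col in zip(*segs)]


def lesion_present(mask: List[int]) -> str:
    return "lesion" if any(mask) else "none"


result = identify_from_images(
    [[0.1, 0.9, 0.8], [0.2, 0.7, 0.4], [0.0, 0.6, 0.9]],
    target_index=0,
    register=no_op_register,
    segment=threshold_segment,
    combine=majority_vote,
    identify=lesion_present,
)
```

Passing the steps as callables keeps the sketch independent of any particular network, in line with the statement that any suitable algorithm may be used.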

いくつかの実施形態において、該少なくとも１つのレジストレーション画像を生成することは、該少なくとも１つの関連画像のうちの各関連画像について、第１の変換に基づいて該少なくとも１つの関連画像を該ターゲット画像にレジストレーションすることにより、変換画像を生成することと、第２の変換に基づいて、該変換画像を該ターゲット画像にレジストレーションすることにより、レジストレーション画像を生成することと、を含む。 In some embodiments, generating the at least one registration image includes, for each related image of the at least one related image, generating a transformed image by registering the at least one related image to the target image based on a first transformation, and generating a registration image by registering the transformed image to the target image based on a second transformation.

いくつかの実施形態において、方法６００は、１グループの訓練画像ペアに基づいて該画像レジストレーションネットワークを訓練することをさらに含んでもよく、ここで、各訓練画像ペアは、固定画像と、該固定画像に対する移動画像とを含む。 In some embodiments, the method 600 may further include training the image registration network based on a group of training image pairs, where each training image pair includes a fixed image and a moving image relative to the fixed image.

図７は本開示の実施形態を実現するのに使用できる装置７００の概略ブロック図である。例えば、画像収集装置１１０及び／又は画像処理装置１２０は、装置７００により実現することができる。例えば、装置７００は、異なる光源の下で疑わしい病変又は腫瘍に関する画像をキャプチャすることができる医療補助装置又は内視鏡補助装置を実現するために使用されることができる。図７に示すように、装置７００は、リードオンリーメモリ（ＲＯＭ）７０２に記憶されたコンピュータプログラム命令、又は記憶部７０８からランダムアクセスメモリ（ＲＡＭ）７０３にアップロードされたコンピュータプログラム命令に基づいて、様々な適切な動作及び処理を実行することができる中央処理装置（ＣＰＵ）７０１を含む。ＲＡＭ７０３には、装置７００のオペレーションに必要とされる各種のプログラムとデータとがさらに記憶されている。ＣＰＵ７０１、ＲＯＭ７０２及びＲＡＭ７０３は、バス７０４を介して相互接続されている。入出力（Ｉ／Ｏ）インターフェース７０５もバス７０４に接続されている。 FIG. 7 is a schematic block diagram of an apparatus 700 that can be used to implement an embodiment of the present disclosure. For example, the image acquisition device 110 and/or the image processing device 120 can be implemented by the apparatus 700. For example, the apparatus 700 can be used to implement a medical support device or an endoscopy support device that can capture images of a suspicious lesion or tumor under different light sources. As shown in FIG. 7, the apparatus 700 includes a central processing unit (CPU) 701 that can perform various suitable operations and processes based on computer program instructions stored in a read-only memory (ROM) 702 or uploaded from a storage unit 708 to a random access memory (RAM) 703. The RAM 703 further stores various programs and data required for the operation of the apparatus 700. The CPU 701, the ROM 702, and the RAM 703 are interconnected via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Ｉ／Ｏインターフェース７０５には、キーボード、マウスなどの入力部７０６と、各種のディスプレイとスピーカなどの出力部７０７と、磁気ディスクと光ディスクなどの記憶部７０８と、ネットワークカード、モデム、無線通信トランシーバなどの通信部７０９と、を含む構成要素が接続されている。通信部７０９により、装置７００が、インターネット及び／又は電気通信ネットワークなどのコンピュータネットワークを介して他の装置とデータ／情報を交換することを可能にする。 The I/O interface 705 is connected to components including an input section 706 such as a keyboard and a mouse, an output section 707 such as various displays and speakers, a storage section 708 such as a magnetic disk and an optical disk, and a communication section 709 such as a network card, a modem, a wireless communication transceiver, etc. The communication section 709 enables the device 700 to exchange data/information with other devices via a computer network such as the Internet and/or a telecommunications network.

上述した方法又は処理、例えば、方法５００及び／又は６００は、処理ユニット７０１により実行することができる。例えば、いくつかの実現において、方法５００は、記憶部７０８のようなマシン可読媒体中に実体的に含まれるコンピュータソフトウェアプログラムとして実現されてもよい。いくつかの実現において、コンピュータプログラムは、ＲＯＭ７０２及び／又は通信部７０９によって、装置７００上に部分的又は完全にロード及び／又はマウントされてもよい。コンピュータプログラムがＲＡＭ７０３にアップロードされ、ＣＰＵ７０１により実行されたときに、上述した方法５００及び／又は６００の１つ又は複数のステップを実行することができる。 The above-mentioned methods or processes, e.g., methods 500 and/or 600, may be executed by the processing unit 701. For example, in some implementations, the method 500 may be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 708. In some implementations, the computer program may be partially or fully loaded and/or mounted on the device 700 by the ROM 702 and/or the communication unit 709. When the computer program is uploaded to the RAM 703 and executed by the CPU 701, it may perform one or more steps of the above-mentioned methods 500 and/or 600.

いくつかの実施形態において、画像処理装置は回路を備え、前記回路は、異なる光源の下でキャプチャされた物体に関する複数の画像であって、ターゲット画像と少なくとも１つの関連画像とを含む複数の画像を取得し、前記複数の画像のセグメンテーション結果に基づいて、前記ターゲット画像についてのセグメンテーションラベルを生成するように設定されている。 In some embodiments, the image processing device comprises a circuit configured to obtain a plurality of images of an object captured under different illuminants, the plurality of images including a target image and at least one related image, and generate a segmentation label for the target image based on segmentation results of the plurality of images.

いくつかの実施形態において、前記回路はさらに、前記少なくとも１つの関連画像を前記ターゲット画像にレジストレーションすることにより、少なくとも１つのレジストレーション画像を生成し、前記ターゲット画像と前記少なくとも１つのレジストレーション画像とについて、セマンティックセグメンテーション又はインスタンスセグメンテーションを実行することにより、前記セグメンテーション結果を生成するように設定されている。 In some embodiments, the circuitry is further configured to generate at least one registered image by registering the at least one related image to the target image, and to generate the segmentation result by performing semantic segmentation or instance segmentation on the target image and the at least one registered image.

いくつかの実施形態において、画像処理装置は回路を備え、前記回路は、異なる光源の下でキャプチャされた物体に関する複数の画像を取得し、前記複数の画像をレジストレーションし、レジストレーションされた前記複数の画像に基づいて前記物体を識別するように設定されている。 In some embodiments, the image processing device includes circuitry configured to obtain a plurality of images of an object captured under different illuminants, register the plurality of images, and identify the object based on the registered plurality of images.

いくつかの実施形態において、該複数の画像は、ターゲット画像と少なくとも１つの関連画像とを含み、前記回路はさらに、前記少なくとも１つの関連画像を前記ターゲット画像にレジストレーションすることにより少なくとも１つのレジストレーション画像を生成するように設定され、ここで、レジストレーションされた前記複数の画像は、前記少なくとも１つのレジストレーション画像と前記ターゲット画像とを含む。 In some embodiments, the plurality of images includes a target image and at least one related image, and the circuitry is further configured to generate at least one registered image by registering the at least one related image to the target image, where the registered plurality of images includes the at least one registered image and the target image.

いくつかの実施形態において、前記回路はさらに、前記ターゲット画像と前記少なくとも１つのレジストレーション画像とについて、セマンティックセグメンテーション又はインスタンスセグメンテーションを実行することにより、セグメンテーション結果を生成し、前記セグメンテーション結果に基づいて前記物体を識別するように設定されている。 In some embodiments, the circuitry is further configured to perform semantic or instance segmentation on the target image and the at least one registered image to generate a segmentation result and identify the object based on the segmentation result.

いくつかの実施形態において、前記回路はさらに、前記少なくとも１つの関連画像のうちの各関連画像について、第１の変換に基づいて前記少なくとも１つの関連画像を前記ターゲット画像にレジストレーションすることにより、変換画像を生成し、第２の変換に基づいて前記変換画像を前記ターゲット画像にレジストレーションすることにより、レジストレーション画像を生成するように設定されている。 In some embodiments, the circuitry is further configured to, for each related image of the at least one related image, generate a transformed image by registering the at least one related image to the target image based on a first transformation, and generate a registered image by registering the transformed image to the target image based on a second transformation.

いくつかの実施形態において、前記回路はさらに、訓練済みの画像レジストレーションネットワークを使用することにより、前記少なくとも１つの関連画像を前記ターゲット画像にレジストレーションするように設定されている。 In some embodiments, the circuitry is further configured to register the at least one related image to the target image by using a trained image registration network.

いくつかの実施形態において、前記回路はさらに、１グループの訓練画像ペアに基づいて前記画像レジストレーションネットワークを訓練するように設定され、ここで、各訓練画像ペアは、固定画像と、前記固定画像に対する移動画像とを含む。 In some embodiments, the circuitry is further configured to train the image registration network based on a group of training image pairs, where each training image pair includes a fixed image and a moving image relative to the fixed image.

いくつかの実施形態において、前記回路はさらに、アフィン変換に基づいて前記移動画像を前記固定画像にレジストレーションすることにより第１の変換画像を生成し、変形可能な変換に基づいて前記第１の変換画像を前記固定画像にレジストレーションすることにより第２の変換画像を生成し、前記固定画像と、前記第１の変換画像と、前記第２の変換画像とに基づいて、前記画像レジストレーションネットワークを訓練するためのターゲット損失を決定し、前記ターゲット損失が最小化されるように前記画像レジストレーションネットワークを訓練するように設定されている。 In some embodiments, the circuitry is further configured to generate a first transformed image by registering the moving image to the fixed image based on an affine transformation, generate a second transformed image by registering the first transformed image to the fixed image based on a deformable transformation, determine a target loss for training the image registration network based on the fixed image, the first transformed image, and the second transformed image, and train the image registration network such that the target loss is minimized.

いくつかの実施形態において、前記回路はさらに、前記固定画像と前記第１の変換画像とに基づいて第１の類似度損失を決定し、前記固定画像と前記第２の変換画像とに基づいて第２の類似度損失を決定し、前記第１の変換画像と前記変形可能な変換に対応する関数とに基づいて、空間平滑性損失を決定し、前記第１の変換画像と前記第２の変換画像とに基づいて第３の類似度損失を決定し、前記第１の類似度損失と、前記第２の類似度損失と、前記空間平滑性損失と、前記第３の類似度損失との加重合計に基づいて前記ターゲット損失を決定するように設定されている。 In some embodiments, the circuitry is further configured to determine a first similarity loss based on the fixed image and the first transformed image, determine a second similarity loss based on the fixed image and the second transformed image, determine a spatial smoothness loss based on the first transformed image and a function corresponding to the deformable transformation, determine a third similarity loss based on the first transformed image and the second transformed image, and determine the target loss based on a weighted sum of the first similarity loss, the second similarity loss, the spatial smoothness loss, and the third similarity loss.

いくつかの実施形態において、前記回路はさらに、訓練済みのセマンティックセグメンテーションネットワークを使用することにより、前記ターゲット画像と前記少なくとも１つのレジストレーション画像とについて前記セマンティックセグメンテーションを実行するか、又は訓練済みのインスタンスセグメンテーションネットワークを使用することにより、前記ターゲット画像と前記少なくとも１つのレジストレーション画像とについて前記インスタンスセグメンテーションを実行するように設定されている。 In some embodiments, the circuitry is further configured to perform the semantic segmentation of the target image and the at least one registered image by using a trained semantic segmentation network, or to perform the instance segmentation of the target image and the at least one registered image by using a trained instance segmentation network.

いくつかの実施形態において、前記回路はさらに、前記セグメンテーション結果の加重合計に基づいて前記ターゲット画像についての最終的なセグメンテーション結果を決定し、前記最終的なセグメンテーション結果に基づいて前記ターゲット画像についての前記セグメンテーションラベルを生成するように設定されている。 In some embodiments, the circuitry is further configured to determine a final segmentation result for the target image based on the weighted sum of the segmentation results, and generate the segmentation label for the target image based on the final segmentation result.

本開示は、システム、方法及び／又はコンピュータプログラム製品として実現されてもよい。本開示がシステムとして実装される場合、本明細書で説明される構成要素は、単一の装置上で実現されることに加えて、クラウドコンピューティングアーキテクチャの形で実現されてもよい。クラウドコンピューティング環境では、これらの構成要素は遠隔的に配置され、本開示に記載された機能を実現するためにともに作動することができる。クラウドコンピューティングは、コンピューティング、ソフトウェア、データアクセス及びストレージサービスを提供することができる。これらのサービスを提供するシステムやハードウェアの物理的な場所や設定をエンドユーザが知る必要はない。クラウドコンピューティングは、適切なプロトコルを使用して、広域ネットワーク（例えばインターネット）上でサービスを提供することができる。例えば、クラウドコンピューティングプロバイダは、広域ネットワークを介してアプリケーションを提供し、これらは、ブラウザ又は任意の他のコンピューティングコンポーネントを介してアクセスすることができる。クラウドコンピューティングコンポーネント及び対応するデータは、遠隔のサーバに記憶することができる。クラウドコンピューティング環境内のコンピューティングリソースを、遠隔のデータセンターに集中させてもよく、これらのコンピューティングリソースを分散させてもよい。クラウドコンピューティングインフラストラクチャは、たとえ共有されたデータセンターがユーザにとって単一のアクセスポイントに見えても、これらの共有されたデータセンターを介してサービスを提供することができる。したがって、クラウドコンピューティングアーキテクチャは、本明細書に記載された様々な機能を遠隔のサービスプロバイダから提供するために使用することができる。代替として、これらの機能は、従来のサーバから提供されてもよく、直接又は他の方法でクライアント装置上にインストールされてもよい。追加として、本開示は、コンピュータプログラム製品として実現されてもよい。コンピュータプログラム製品は、本開示の様々な態様を実行するためのコンピュータ可読プログラム命令がロードされたコンピュータ可読記憶媒体を含んでもよい。 The present disclosure may be realized as a system, a method, and/or a computer program product. When the present disclosure is implemented as a system, the components described herein may be realized in the form of a cloud computing architecture in addition to being realized on a single device. In a cloud computing environment, these components may be located remotely and work together to realize the functions described in the present disclosure. Cloud computing may provide computing, software, data access, and storage services. The physical location and configuration of the systems and hardware that provide these services need not be known to the end user. Cloud computing may provide services over a wide area network (e.g., the Internet) using an appropriate protocol. For example, a cloud computing provider may provide applications over a wide area network, which may be accessed through a browser or any other computing component. Cloud computing components and corresponding data may be stored on a remote server. 
Computing resources in a cloud computing environment may be centralized in a remote data center, or these computing resources may be distributed. A cloud computing infrastructure may provide services through these shared data centers, even if the shared data centers appear to users as a single access point. Thus, a cloud computing architecture may be used to provide various functions described herein from a remote service provider. Alternatively, these functions may be provided from a conventional server or may be installed directly or otherwise on a client device. Additionally, the present disclosure may be embodied as a computer program product. The computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for carrying out various aspects of the present disclosure.

コンピュータ可読記憶媒体は、命令実行装置による使用のために命令を保持及び記憶することができる有形装置であってもよい。コンピュータ可読記憶媒体は、例えば、電子記憶装置、磁気記憶装置、光学記憶装置、電磁記憶装置、半導体記憶装置、又は上記の任意の適切な組み合わせであってもよいが、これらに限定されない。コンピュータ可読記憶媒体のより具体的な例の非網羅的なリストは、ポータブルコンピュータフロッピーディスク、ハードディスク、ランダムアクセスメモリ（ＲＡＭ）、リードオンリーメモリ（ＲＯＭ）、消去可能プログラマブルリードオンリーメモリ（ＥＰＲＯＭ又はフラッシュメモリ）、スタティックランダムアクセスメモリ（ＳＲＡＭ）、ポータブル光ディスクリードオンリーメモリ（ＣＤ－ＲＯＭ）、デジタル多用途ディスク（ＤＶＤ）、メモリスティック、フロッピーディスク、パンチカード又は命令が記録された溝内の隆起構造のような機械的に符号化された装置、及び前述の任意の適切な組み合わせを含む。本明細書で使用されるように、コンピュータ可読記憶媒体は、それ自体が、電波又は他の自由に伝搬する電磁波、導波路又は他の送信媒体を伝搬する電磁波（例えば、光ケーブルを経過する光パルス）、又はワイヤを介して送信される電気信号などの一時的な信号として解釈されるべきではない。 A computer-readable storage medium may be a tangible device capable of holding and storing instructions for use by an instruction execution device. A computer-readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. A non-exhaustive list of more specific examples of computer-readable storage media includes portable computer floppy disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable optical disk read-only memory (CD-ROM), digital versatile disk (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves in which instructions are recorded, and any suitable combination of the foregoing. As used herein, a computer-readable storage medium should not be construed as a transitory signal, such as an electric wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse traveling through an optical cable), or an electrical signal transmitted over a wire.

本明細書に記載されたコンピュータ可読プログラム命令は、コンピュータ可読記憶媒体から対応する計算／処理装置に、又は、インターネット、ローカルエリアネットワーク、広域ネットワーク及び／又は無線ネットワークなどのネットワークを介して、外部コンピュータ又は外部記憶装置にダウンロードされてもよい。ネットワークは、銅送信ケーブル、光送信ファイバ、無線送信、ルータ、ファイアウォール、スイッチ、ゲートウェイコンピュータ及び／又はエッジサーバを含んでもよい。各計算／処理装置内のネットワークアダプタカード又はネットワークインターフェースは、ネットワークからコンピュータ可読プログラム命令を受信し、それぞれの計算／処理装置内のコンピュータ可読記憶媒体に記憶するためにコンピュータ可読プログラム命令を転送する。 The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a corresponding computing/processing device or to an external computer or storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmissions, wireless transmissions, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

本開示のオペレーションを実行するためのコンピュータ可読プログラム命令は、アセンブラ命令、命令セットアーキテクチャ（ＩＳＡ）命令、マシン命令、マシン依存命令、マイクロコード、ファームウェア命令、状態設定データ、又はＳｍａｌｌｔａｌｋ、Ｃ＋＋などのオブジェクト指向プログラミング言語と、「Ｃ」プログラミング言語などの従来の手続き型プログラミング言語とを含む１つ又は複数のプログラミング言語の任意の組み合わせで書かれたソースコード又はオブジェクトコードであってもよい。コンピュータ可読プログラム命令は、完全にユーザのコンピュータ上で、部分的にユーザのコンピュータ上で、独立したソフトウェアパッケージとして、部分的にユーザのコンピュータ上で且つ部分的に遠隔のコンピュータ上で、又は完全に遠隔のコンピュータ又はサーバ上で実行してもよい。後者のシナリオにおいて、遠隔のコンピュータは、ローカルエリアネットワーク（ＬＡＮ）又は広域ネットワーク（ＷＡＮ）を含む任意のタイプのネットワークを介してユーザのコンピュータに接続されてもよく、又は（例えば、インターネットサービスプロバイダを利用してインターネットを介して）外部コンピュータに接続されてもよい。いくつかの実施形態において、例えば、プログラマブル論理回路、フィールドプログラマブルゲートアレイ（ＦＰＧＡ）、又はプログラマブル論理アレイ（ＰＬＡ）を含む電子回路は、本開示の態様を実行するために、コンピュータ可読プログラム命令の状態情報を利用して電子回路をパーソナライズすることにより、コンピュータ可読プログラム命令を実行してもよい。 The computer-readable program instructions for carrying out the operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry including, for example, a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.

本開示の実施形態にかかる方法、装置（システム）、及びコンピュータプログラム製品のフローチャート図及び／又はブロック図を参照して、本明細書で本開示の態様について説明する。フローチャート図及び／又はブロック図の各ブロック、ならびにフローチャート図及び／又はブロック図内のブロックの組み合わせは、コンピュータ可読プログラム命令により実現されてもよいことを、理解すべきである。 Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, may be implemented by computer-readable program instructions.

これらのコンピュータ可読プログラム命令は、コンピュータ又は他のプログラマブルデータ処理装置のプロセッサを介して実行する命令がフローチャート及び／又はブロック図の一つ又は複数のブロック内で指定された機能／動作を実現するための手段を生成するように、汎用コンピュータ、専用コンピュータ、又は他のプログラマブルデータ処理装置のプロセッサに提供されて、マシンを生成してもよい。これらのコンピュータ可読プログラム命令は、コンピュータ、プログラマブルデータ処理装置及び／又は他の装置を特定の方法で機能するように指示することができるコンピュータ可読記憶媒体に記憶されてもよく、その結果、命令を記憶しているコンピュータ可読記憶媒体は、フローチャート及び／又はブロック図の１つ又は複数のブロック内で規定された機能／動作の態様を実現する命令を含む製品を含む。 These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that the instructions, which execute through the processor of the computer or other programmable data processing device, generate means for implementing the functions/operations specified in one or more blocks of the flowcharts and/or block diagrams to produce a machine. These computer-readable program instructions may be stored on a computer-readable storage medium capable of directing a computer, programmable data processing device, and/or other device to function in a particular manner, such that the computer-readable storage medium storing the instructions includes a product including instructions implementing aspects of the functions/operations defined in one or more blocks of the flowcharts and/or block diagrams.

コンピュータ可読プログラム命令は、コンピュータ、他のプログラマブルデータ処理装置、又は他の装置にロードされて、一連のオペレーションステップがコンピュータ、他のプログラマブルデータ処理装置、又は他の装置上で実行され、それによって、コンピュータ、他のプログラマブルデータ処理装置、又は他の装置上で実行される命令が、フローチャート及び／又はブロック図ブロックに規定された機能／動作を実現するように、コンピュータ実現プロセスを生成してもよい。 The computer readable program instructions may be loaded into a computer, other programmable data processing device, or other device to generate a computer implemented process such that a series of operational steps are executed on the computer, other programmable data processing device, or other device, thereby causing the instructions executing on the computer, other programmable data processing device, or other device to implement the functions/acts specified in the flowchart and/or block diagram blocks.

フローチャート及びブロック図は、本開示の様々な実施形態にかかる、システム、方法及びコンピュータプログラム製品の可能な実現のアーキテクチャ、機能及びオペレーションを示す。この点において、フローチャート又はブロック図内の各ブロックは、規定された論理機能を実現するための１つ又は複数の実行可能命令を含むコードのモジュール、スニペット、又は部分を表してもよい。いくつかの代替的な実現において、ブロックに記録された機能は、図に記録された順序と異なる順序で発生してもよい。例えば、関連する機能によっては、連続して示される２つのブロックは実際には実質的に同時に実行されてもよく、又はこれらのブロックは時には逆の順序で実行されてもよい。ブロック図及び／又はフローチャート図内の各ブロック、ならびにブロック図及び／又はフローチャート図内のブロックの組み合わせは、特定の機能又は動作を実行する専用ハードウェアベースのシステム、又は専用ハードウェア及びコンピュータ命令の組み合わせにより実装されてもよいことにも留意すべきである。 The flowcharts and block diagrams illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that includes one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from the order noted in the figures. For example, depending on the functionality involved, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order. It should also be noted that each block in the block diagrams and/or flowchart illustrations, as well as combinations of blocks in the block diagrams and/or flowchart illustrations, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

本開示の様々な実施形態の説明が例示の目的で提示されたが、開示された実施形態を網羅的に又は限定的に説明することを意図するものではない。説明された実施形態の範囲及び精神から逸脱することなく、多くの修正及び変更は当業者にとって明らかである。本明細書で使用される用語は、実施形態の原理、市場で見出される技術についての実用的な応用又は技術的改良を最もよく説明するため、又は当業者が本明細書で開示された実施形態を理解することを可能にするために選択されている。 The description of various embodiments of the present disclosure has been presented for illustrative purposes, but is not intended to be an exhaustive or limiting description of the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terms used in this specification have been selected to best explain the principles of the embodiments, practical applications or technical improvements of the technology found in the market, or to enable those skilled in the art to understand the embodiments disclosed herein.

Claims

1. A method of image processing, comprising:
obtaining a plurality of images of an object captured under different light sources, the plurality of images including a target image and at least one related image;
generating a segmentation label for the target image based on segmentation results of the plurality of images;
generating at least one registered image by registering the at least one related image to the target image; and
performing semantic or instance segmentation on the target image and the at least one registered image to generate the segmentation results,
wherein registering the at least one related image to the target image includes:
registering the at least one related image to the target image by using a trained image registration network,
wherein the method further comprises:
training the image registration network based on a group of training image pairs,
wherein each training image pair includes a fixed image and a moving image relative to the fixed image,
wherein training the image registration network includes:
generating a first transformed image by registering the moving image to the fixed image based on an affine transformation;
generating a second transformed image by registering the first transformed image to the fixed image based on a deformable transformation;
determining a target loss for training the image registration network based on the fixed image, the first transformed image, and the second transformed image; and
training the image registration network such that the target loss is minimized,
and wherein determining the target loss includes:
determining a first similarity loss based on the fixed image and the first transformed image;
determining a second similarity loss based on the fixed image and the second transformed image;
determining a spatial smoothness loss based on the first transformed image and a function corresponding to the deformable transformation;
determining a third similarity loss based on the first transformed image and the second transformed image; and
determining the target loss based on a weighted sum of the first similarity loss, the second similarity loss, the spatial smoothness loss, and the third similarity loss.
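
For illustration only, the weighted-sum target loss recited in the claim above can be sketched as follows. The claim does not fix the similarity measure, the form of the smoothness penalty, or the weights, so everything concrete here is an assumption: mean squared error stands in for each similarity loss, the smoothness term is a squared finite-difference penalty on a displacement field standing in for the function corresponding to the deformable transformation, and the names `mse`, `smoothness`, `target_loss` and the default weights are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]                 # 2-D grayscale image as nested lists
Field = List[List[Tuple[float, float]]]   # per-pixel (dy, dx) displacement


def mse(a: Image, b: Image) -> float:
    """Mean squared error, used here as a stand-in similarity loss."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += (va - vb) ** 2
            count += 1
    return total / count


def smoothness(field: Field) -> float:
    """Mean squared difference between neighbouring displacements."""
    total, count = 0.0, 0
    h, w = len(field), len(field[0])
    for y in range(h):
        for x in range(w):
            dy, dx = field[y][x]
            # Compare with the neighbour below and the neighbour to the right.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w:
                    ody, odx = field[ny][nx]
                    total += (ody - dy) ** 2 + (odx - dx) ** 2
                    count += 1
    return total / count if count else 0.0


def target_loss(fixed: Image, first: Image, second: Image, field: Field,
                weights: Sequence[float] = (1.0, 1.0, 0.1, 0.5)) -> float:
    """Weighted sum of the four loss terms (weights are assumed values)."""
    w1, w2, w_smooth, w3 = weights
    return (w1 * mse(fixed, first)          # first similarity loss
            + w2 * mse(fixed, second)       # second similarity loss
            + w_smooth * smoothness(field)  # spatial smoothness loss
            + w3 * mse(first, second))      # third similarity loss
```

In an actual training loop the four terms would be computed on network outputs with a differentiable framework; the pure-Python version above only shows how the terms combine.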

2. A method of image processing, comprising:
obtaining a plurality of images of an object captured under different light sources;
registering the plurality of images; and
identifying the object based on the registered plurality of images,
wherein the plurality of images includes a target image and at least one related image, and registering the plurality of images includes:
generating at least one registered image by registering the at least one related image to the target image,
wherein the registered plurality of images includes the at least one registered image and the target image,
wherein registering the at least one related image to the target image includes:
registering the at least one related image to the target image by using a trained image registration network,
wherein the method further comprises:
training the image registration network based on a group of training image pairs,
wherein each training image pair includes a fixed image and a moving image relative to the fixed image,
wherein training the image registration network includes:
generating a first transformed image by registering the moving image to the fixed image based on an affine transformation;
generating a second transformed image by registering the first transformed image to the fixed image based on a deformable transformation;
determining a target loss for training the image registration network based on the fixed image, the first transformed image, and the second transformed image; and
training the image registration network such that the target loss is minimized,
and wherein determining the target loss includes:
determining a first similarity loss based on the fixed image and the first transformed image;
determining a second similarity loss based on the fixed image and the second transformed image;
determining a spatial smoothness loss based on the first transformed image and a function corresponding to the deformable transformation;
determining a third similarity loss based on the first transformed image and the second transformed image; and
determining the target loss based on a weighted sum of the first similarity loss, the second similarity loss, the spatial smoothness loss, and the third similarity loss.

3. The method of claim 2, wherein identifying the object based on the registered plurality of images includes:
performing semantic or instance segmentation on the target image and the at least one registered image to generate a segmentation result; and
identifying the object based on the segmentation result.

4. The method according to claim 1 or 2, wherein generating the at least one registered image includes:
for each related image of the at least one related image,
generating a transformed image by registering the at least one related image to the target image based on a first transformation; and
generating a registered image by registering the transformed image to the target image based on a second transformation.

5. The method of claim 4, wherein the first transformation is an affine transformation and the second transformation is a deformable transformation.
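
As a rough illustration of the two-stage registration described above (a first, affine transformation followed by a second, deformable transformation), the sketch below warps a small image in two passes. It is not the claimed trained network: in the claims the transformations come from a trained image registration network, whereas here the affine parameters and the displacement field are simply passed in. Nearest-neighbour sampling, zero padding outside the image, and the names `warp`, `affine_stage`, `deformable_stage` and `register` are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Field = List[List[Tuple[float, float]]]   # per-pixel (dy, dx) displacement


def warp(image: Image, mapping: Callable[[int, int], Tuple[float, float]]) -> Image:
    """Resample `image`: output pixel (y, x) takes the value at mapping(y, x).

    Nearest-neighbour sampling; samples that fall outside the image become 0.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = mapping(y, x)
            iy, ix = int(round(sy)), int(round(sx))
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = image[iy][ix]
    return out


def affine_stage(image: Image, matrix, offset) -> Image:
    """First transformation: source position = matrix @ (y, x) + offset."""
    (a, b), (c, d) = matrix
    ty, tx = offset
    return warp(image, lambda y, x: (a * y + b * x + ty, c * y + d * x + tx))


def deformable_stage(image: Image, field: Field) -> Image:
    """Second transformation: each pixel samples from its own displaced position."""
    return warp(image, lambda y, x: (y + field[y][x][0], x + field[y][x][1]))


def register(related: Image, matrix, offset, field: Field) -> Image:
    """Coarse affine alignment followed by local deformable refinement."""
    transformed = affine_stage(related, matrix, offset)
    return deformable_stage(transformed, field)
```

The split mirrors the coarse-to-fine idea: the affine stage removes global misalignment so that the deformable stage only has to model small local displacements.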

6. The method according to claim 1 or 3, wherein performing semantic or instance segmentation on the target image and the at least one registered image includes:
performing the semantic segmentation on the target image and the at least one registered image by using a trained semantic segmentation network; or
performing the instance segmentation on the target image and the at least one registered image by using a trained instance segmentation network.

7. The method of claim 1, wherein generating the segmentation label for the target image includes:
determining a final segmentation result for the target image based on a weighted sum of the segmentation results; and
generating the segmentation label for the target image based on the final segmentation result.
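
The weighted-sum fusion in the claim above can be pictured with a short sketch. It assumes, purely for illustration, that each segmentation result is a per-pixel list of class probabilities and that the label is obtained by taking the highest-scoring class per pixel; the claims specify neither the result format, nor the weights, nor how the label is derived from the final result. The names `fuse_segmentations` and `to_label` are hypothetical.

```python
from typing import List, Sequence

ProbMap = List[List[List[float]]]   # indexed as [row][col][class]


def fuse_segmentations(results: Sequence[ProbMap],
                       weights: Sequence[float]) -> ProbMap:
    """Per-pixel, per-class weighted sum of several segmentation results."""
    if len(results) != len(weights):
        raise ValueError("one weight is required per segmentation result")
    h, w, c = len(results[0]), len(results[0][0]), len(results[0][0][0])
    fused = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for result, weight in zip(results, weights):
        for y in range(h):
            for x in range(w):
                for k in range(c):
                    fused[y][x][k] += weight * result[y][x][k]
    return fused


def to_label(fused: ProbMap) -> List[List[int]]:
    """Assumed labelling rule: pick the class with the largest fused score."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in fused]
```

For example, with one result for the target image and one for a registered image, equal weights let the registered image overturn a weakly confident prediction on the target image, while weights of 1 and 0 reduce to the target-only prediction.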

8. The method according to any one of the preceding claims, wherein the different light sources are associated with different wavelengths or different combinations of wavelengths.

9. An image processing apparatus comprising at least one processor, wherein the at least one processor is configured to:
acquire a plurality of images of an object captured under different light sources, the plurality of images including a target image and at least one related image;
generate a segmentation label for the target image based on segmentation results of the plurality of images;
generate at least one registered image by registering the at least one related image to the target image;
perform semantic or instance segmentation on the target image and the at least one registered image to generate the segmentation results;
register the at least one related image to the target image by using a trained image registration network; and
train the image registration network based on a group of training image pairs,
wherein each training image pair includes a fixed image and a moving image relative to the fixed image,
and wherein the at least one processor is further configured to:
generate a first transformed image by registering the moving image to the fixed image based on an affine transformation;
generate a second transformed image by registering the first transformed image to the fixed image based on a deformable transformation;
determine a target loss for training the image registration network based on the fixed image, the first transformed image, and the second transformed image;
train the image registration network such that the target loss is minimized;
determine a first similarity loss based on the fixed image and the first transformed image;
determine a second similarity loss based on the fixed image and the second transformed image;
determine a spatial smoothness loss based on the first transformed image and a function corresponding to the deformable transformation;
determine a third similarity loss based on the first transformed image and the second transformed image; and
determine the target loss based on a weighted sum of the first similarity loss, the second similarity loss, the spatial smoothness loss, and the third similarity loss.

10. An image processing apparatus comprising at least one processor, wherein the at least one processor is configured to:
obtain a plurality of images of an object captured under different light sources;
register the plurality of images; and
identify the object based on the registered plurality of images,
wherein the plurality of images includes a target image and at least one related image,
wherein the at least one processor is further configured to generate at least one registered image by registering the at least one related image to the target image,
wherein the registered plurality of images includes the at least one registered image and the target image,
wherein the at least one processor is further configured to:
register the at least one related image to the target image by using a trained image registration network; and
train the image registration network based on a group of training image pairs,
wherein each training image pair includes a fixed image and a moving image relative to the fixed image,
and wherein the at least one processor is further configured to:
generate a first transformed image by registering the moving image to the fixed image based on an affine transformation;
generate a second transformed image by registering the first transformed image to the fixed image based on a deformable transformation;
determine a target loss for training the image registration network based on the fixed image, the first transformed image, and the second transformed image;
train the image registration network such that the target loss is minimized;
determine a first similarity loss based on the fixed image and the first transformed image;
determine a second similarity loss based on the fixed image and the second transformed image;
determine a spatial smoothness loss based on the first transformed image and a function corresponding to the deformable transformation;
determine a third similarity loss based on the first transformed image and the second transformed image; and
determine the target loss based on a weighted sum of the first similarity loss, the second similarity loss, the spatial smoothness loss, and the third similarity loss.

11. The apparatus of claim 10, wherein the at least one processor is further configured to:
perform semantic or instance segmentation on the target image and the at least one registered image to generate a segmentation result; and
identify the object based on the segmentation result.

12. The apparatus according to claim 9 or 10, wherein the at least one processor is further configured to:
for each related image of the at least one related image,
generate a transformed image by registering the at least one related image to the target image based on a first transformation; and
generate a registered image by registering the transformed image to the target image based on a second transformation.

13. The apparatus of claim 12, wherein the first transformation is an affine transformation and the second transformation is a deformable transformation.

14. The apparatus of claim 9 or 11, wherein the at least one processor is further configured to:
perform the semantic segmentation on the target image and the at least one registered image by using a trained semantic segmentation network; or
perform the instance segmentation on the target image and the at least one registered image by using a trained instance segmentation network.

15. The apparatus of claim 9, wherein the at least one processor is further configured to:
determine a final segmentation result for the target image based on a weighted sum of the segmentation results; and
generate the segmentation label for the target image based on the final segmentation result.

16. The apparatus according to any one of claims 9 to 15, wherein the different light sources are associated with different wavelengths or different combinations of wavelengths.