CN115713535B - Image segmentation model determination method and image segmentation method - Google Patents
- Publication number: CN115713535B
- Application number: CN202211386108.XA
- Authority: CN (China)
- Prior art keywords: image, sample, image sample, feature, decoding
- Legal status: Active
Abstract
The embodiments of this specification provide an image segmentation model determination method and an image segmentation method. The image segmentation model determination method includes: determining a first image sample set and a second image sample set containing a target object; determining a feature extraction model from the first image samples in the first image sample set; inputting a second image sample into the feature extraction model to obtain a second feature image of the second image sample; and determining an image segmentation model from the second feature image and the sample label. The method trains, on the unlabeled first image sample set, a feature extraction model capable of multi-scale image feature extraction, so that the feature extraction model learns rich, high-level semantic information from the first image samples; the feature extraction model is then used as a feature extractor and combined with a small labeled second image sample set to train an image segmentation model, so that the image segmentation model can subsequently perform accurate image segmentation.
Description
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to an image segmentation model determination method.
Background
Organ segmentation based on medical images plays an important role in preoperative planning, radiotherapy planning, early disease screening, and the like. However, factors such as performance differences between medical devices, differences in imaging parameters, differences between patients, and differences in lesion morphology make medical image data highly diverse, and labeling such diverse data is very time-consuming and labor-intensive, requires professional medical workers, and is therefore costly.
Therefore, given a large amount of diverse, unlabeled medical image data, how to accurately segment human tissue in medical images by using the large amount of unlabeled data together with a small amount of labeled data is a technical problem that currently needs to be solved.
Disclosure of Invention
In view of this, the embodiments of this specification provide two image segmentation model determination methods. One or more embodiments of this specification further relate to two image segmentation model determination apparatuses, an image segmentation method, an image segmentation apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical drawbacks in the prior art.
According to a first aspect of embodiments of the present specification, there is provided an image segmentation model determination method, including:
determining a first image sample set and a second image sample set containing a target object, wherein the second image sample set includes second image samples and sample labels corresponding to the second image samples;
determining a feature extraction model from first image samples in the first image sample set, wherein the feature extraction model includes an encoder, and the encoder includes at least two coding layers that extract image features at different scales and a dictionary-learning-based vector quantization module arranged after at least one of the coding layers;
inputting a second image sample into the feature extraction model to obtain a second feature image of the second image sample; and
determining an image segmentation model from the second feature image and the sample label.
According to a second aspect of embodiments of the present specification, there is provided an image segmentation model determination apparatus including:
a sample determination module configured to determine a first image sample set and a second image sample set containing a target object, wherein the second image sample set includes second image samples and sample labels corresponding to the second image samples;
a first model determination module configured to determine a feature extraction model from first image samples in the first image sample set, wherein the feature extraction model includes an encoder, and the encoder includes at least two coding layers that extract image features at different scales and a dictionary-learning-based vector quantization module arranged after at least one of the coding layers;
a first feature image obtaining module configured to input a second image sample into the feature extraction model to obtain a second feature image of the second image sample; and
a second model determination module configured to determine an image segmentation model from the second feature image and the sample label.
According to a third aspect of embodiments of the present specification, there is provided an image segmentation method including:
receiving a CT image of a target part of a human body input by a user, and inputting the CT image into a feature extraction model to obtain a feature image of the CT image; and
inputting the feature image into an image segmentation model to obtain an image segmentation result for the CT image, and displaying the image segmentation result to the user, wherein the feature extraction model and the image segmentation model are the feature extraction model and the image segmentation model in the image segmentation model determination method described above.
According to a fourth aspect of embodiments of the present specification, there is provided an image segmentation apparatus comprising:
an image receiving module configured to receive a CT image of a target part of a human body input by a user, and to input the CT image into a feature extraction model to obtain a feature image of the CT image; and
an image classification module configured to input the feature image into an image segmentation model to obtain an image segmentation result for the CT image and to display the image segmentation result to the user, wherein the feature extraction model and the image segmentation model are the feature extraction model and the image segmentation model in the image segmentation model determination method described above.
According to a fifth aspect of embodiments of the present specification, there is provided an image segmentation model determination method, including:
in response to an image segmentation model processing request sent by a user, displaying an image input interface to the user;
receiving, through the image input interface, a first image sample set and a second image sample set containing a target object input by the user, wherein the second image sample set includes second image samples and sample labels corresponding to the second image samples;
determining a feature extraction model from first image samples in the first image sample set, wherein the feature extraction model includes an encoder, and the encoder includes at least two coding layers that extract image features at different scales and a dictionary-learning-based vector quantization module arranged after at least one of the coding layers;
inputting a second image sample into the feature extraction model to obtain a second feature image of the second image sample; and
determining an image segmentation model from the second feature image and the sample label, and returning the image segmentation model to the user.
According to a sixth aspect of embodiments of the present specification, there is provided an image segmentation model determination apparatus including:
an interface display module configured to display an image input interface to a user in response to an image segmentation model processing request sent by the user;
a sample receiving module configured to receive, through the image input interface, a first image sample set and a second image sample set containing a target object input by the user, wherein the second image sample set includes second image samples and sample labels corresponding to the second image samples;
a third model determination module configured to determine a feature extraction model from first image samples in the first image sample set, wherein the feature extraction model includes an encoder, and the encoder includes at least two coding layers that extract image features at different scales and a dictionary-learning-based vector quantization module arranged after at least one of the coding layers;
a second feature image obtaining module configured to input a second image sample into the feature extraction model to obtain a second feature image of the second image sample; and
a fourth model determination module configured to determine an image segmentation model from the second feature image and the sample label and to return the image segmentation model to the user.
According to a seventh aspect of embodiments of the present specification, there is provided a computing device comprising:
a memory and a processor;
wherein the memory is configured to store computer-executable instructions that, when executed by the processor, implement the steps of the image segmentation model determination method or the image segmentation method described above.
According to an eighth aspect of embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the above-described image segmentation model determination method or image segmentation method.
According to a ninth aspect of the embodiments of the present specification, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above-described image segmentation model determination method or image segmentation method.
One embodiment of this specification implements an image segmentation model determination method, including: determining a first image sample set and a second image sample set containing a target object, where the second image sample set includes second image samples and sample labels corresponding to the second image samples; determining a feature extraction model from first image samples in the first image sample set, where the feature extraction model includes an encoder, and the encoder includes at least two coding layers that extract image features at different scales and a dictionary-learning-based vector quantization module arranged after at least one of the coding layers; inputting a second image sample into the feature extraction model to obtain a second feature image of the second image sample; and determining an image segmentation model from the second feature image and the sample label.
Specifically, the method trains, on the unlabeled first image sample set, a feature extraction model that extracts image features based on dictionary learning at different scales. By combining dictionary learning with the encoder, the feature extraction model learns rich, high-level semantic information from the first image samples and can subsequently perform multi-scale image feature extraction. The feature extraction model is then used as a feature extractor and combined with a small labeled second image sample set to train an image segmentation model, so that the image segmentation model can accurately segment images containing the target object.
Drawings
Fig. 1 is a schematic view of a specific scene of an image segmentation method applied to CT image segmentation of a human heart according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for determining an image segmentation model according to one embodiment of the present disclosure;
FIG. 3 is a process flow diagram of a method for determining an image segmentation model according to one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the network structure of the unsupervised multi-level sparse vector quantization variational autoencoder SVQ-VAE in an image segmentation model determination method according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the visualization of semantic information learned by different decoding layers while reconstructing an original image, in an image segmentation model determination method according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an image segmentation model determination device according to an embodiment of the present disclosure;
FIG. 7 is a flow chart of a method of image segmentation provided in one embodiment of the present disclosure;
FIG. 8 is a flow chart of another image segmentation model determination method provided by one embodiment of the present disclosure;
FIG. 9 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. However, this specification can be implemented in many other ways than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; therefore, this specification is not limited by the specific implementations disclosed below.
The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in this specification, in one or more embodiments, and in the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly, "second" may also be referred to as "first". The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
First, terms related to one or more embodiments of the present specification will be explained.
CT: computed Tomography, i.e. computerized tomography, is a cross-sectional scan of a part of the human body, one after the other, with a detector of extremely high sensitivity, using precisely collimated X-ray beams, gamma rays, ultrasound waves, etc.
SVQ-VAE: sparse Vector Quantization Variational Auto-Encoder, a sparse vector quantization variation automatic encoder.
MLP: multi-Layer persistence, a Multi-Layer Perceptron, a simple forward structured neural network.
To address the above technical problems, a pre-training-based method for segmenting target parts of the human body can be adopted, for example one implemented with MIM (Masked Image Modeling), such as MAE (Masked AutoEncoder): a pre-training-plus-fine-tuning paradigm mainly applied to human target part segmentation tasks. Its problems are that it is not known what the pre-training has actually learned, and that the decoder used for human target part segmentation is a heavyweight model, so the resulting effect is poor.
Based on this, in the present specification, two image segmentation model determination methods are provided. One or more embodiments of the present specification relate to two kinds of image segmentation model determination apparatuses, an image segmentation method, an image segmentation apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail in the following embodiments one by one.
Referring to fig. 1, fig. 1 is a schematic view of a specific scene of an image segmentation method applied to CT image segmentation of a human heart according to an embodiment of the present disclosure.
Fig. 1 includes a CT scanner 102, a terminal 104, and a server 106.
In a specific implementation, the CT scanner 102 performs a CT scan of a user whose cardiac CT image is to be segmented, obtaining a CT image of the user's heart; the terminal 104 acquires the cardiac CT image from the CT scanner 102 and sends it to the server 106; the server 106 inputs the cardiac CT image into a pre-trained feature extraction model for feature extraction, inputs the extracted feature image of the CT image into an image processing model, outputs through the image processing model a heart segmentation image (i.e., a mask image) corresponding to the cardiac CT image, and returns the heart segmentation image to the terminal 104. The user operating the terminal 104 (e.g., a doctor) can then determine the heart condition of the scanned user from the heart segmentation image, for example for heart disease screening or preoperative planning. Here, the feature extraction model can be understood as an unsupervised feature learning model, based on a sparse multi-level vector quantization variational autoencoder with dictionary-learning modules arranged after at least two coding layers, trained on unlabeled historical cardiac CT images; the image segmentation model can be understood as a deep learning model trained by using this unsupervised feature learning model as a feature extractor in combination with a small number of labeled historical cardiac CT images.
In addition to the heart segmentation image, the image processing model can also output object labels for the cardiac CT image, such as left atrium, right atrium, left ventricle, and right ventricle, based on the label of each segmented part in the heart segmentation image.
The above describes the image segmentation method provided by the embodiments of this specification as applied to the specific scenario of cardiac CT image segmentation: a sparse multi-level vector quantization variational autoencoder is provided that learns overcomplete dictionaries at different scales on a large amount of unlabeled data and obtains rich semantic representations through these dictionaries; finally, a simple MLP model is trained on a small amount of labeled data to achieve accurate multi-organ image segmentation and improve the segmentation accuracy of the image processing model.
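The server-side flow in this scenario can be pictured with a short sketch. Everything below is an illustrative assumption rather than the patent's implementation: the two stand-in modules take the place of the trained SVQ-VAE feature extractor and the MLP-based image processing model, and the five output classes (background plus four heart chambers) are assumed.

```python
import torch
import torch.nn as nn

# Stand-ins for the two pre-trained models described above (assumed shapes).
feature_extractor = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
segmentation_head = nn.Conv2d(16, 5, 1)  # assumed: background + 4 heart chambers

def segment_ct(ct_image: torch.Tensor) -> torch.Tensor:
    """Fig. 1 server flow: CT image -> feature image -> heart segmentation mask."""
    with torch.no_grad():
        feature_image = feature_extractor(ct_image)  # feature extraction model
        logits = segmentation_head(feature_image)    # image processing model
        return logits.argmax(dim=1)                  # per-pixel object labels

ct = torch.randn(1, 1, 256, 256)  # one single-channel cardiac CT slice
print(segment_ct(ct).shape)       # torch.Size([1, 256, 256])
```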
Referring to fig. 2, fig. 2 shows a flowchart of a method for determining an image segmentation model according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 202: determining a first image sample set and a second image sample set containing a target object.
Here, the second image sample set includes second image samples and sample labels corresponding to the second image samples.
Specifically, the image segmentation model determination method provided in the embodiments of this specification can be applied not only in the medical field, to segment CT images of various target parts of humans or animals (such as organs like the heart and liver), but also in other fields where the data is complex and diverse and labeling costs are high, such as satellite or planetary images in the aerospace field.
For ease of understanding, the embodiments of this specification take the application of the image segmentation model in the medical field as an example for detailed description.
When the image segmentation model is applied in the medical field, the target object can be understood as a target part of the human body, such as an organ like the heart or liver. The first image sample set containing the target object can be understood as a first image sample set containing a target part of the human body, for example a first image sample set containing human hearts, i.e., a set of multiple first image samples each containing a different human heart; similarly, the second image sample set containing the target object can be understood as a set of multiple second image samples each containing a different human heart. In this case, the first image sample set and the second image sample set containing the target object are a first CT image sample set and a second CT image sample set containing the target part of the human body.
The sample label corresponding to a second image sample can be understood as a segmented image of that second image sample, for example a segmented image of a human heart: in a second image sample containing a human heart, the left atrium, right atrium, left ventricle, right ventricle, and so on are segmented and labeled, and the segmented and labeled image can then be understood as the sample label corresponding to that second image sample.
In practical applications, the first image samples in the first image sample set and the second image samples in the second image sample set may partially overlap, or the second image samples may be taken from the first image sample set. The first image samples are subsequently used for unsupervised training of the feature extraction model, which requires no labeled sample data, while the second image samples are subsequently used for supervised training of the image segmentation model, which requires data labeling to generate sample labels. Therefore, to improve the training effect, a large number of unlabeled first image samples can be used for the unsupervised feature extraction model training, while a small number of labeled second image samples is used for the supervised image segmentation model training. Thus, the number of first image samples in the first image sample set is greater than the number of second image samples in the second image sample set.
Taking the human heart as the target object, determining a first image sample set and a second image sample set containing the target object can be understood as acquiring or receiving first and second image sample sets containing different human hearts, where the first image sample set contains unlabeled first image samples and the second image sample set contains labeled second image samples.
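As a concrete picture of this data split, the following sketch builds a large unlabeled first sample set and a small labeled second sample set drawn from it; the file names, counts, and label-path convention are all hypothetical.

```python
import random

# hypothetical corpus of cardiac CT scan paths; names and counts are illustrative
all_scans = [f"ct/heart_{i:04d}.nii" for i in range(1000)]

# large unlabeled first image sample set, for unsupervised pre-training
first_image_sample_set = all_scans

# small labeled second image sample set drawn from the same corpus; as noted
# above, it may overlap with the first set
labeled = random.sample(all_scans, 50)
second_image_sample_set = [(scan, scan.replace(".nii", "_label.nii"))
                           for scan in labeled]

assert len(first_image_sample_set) > len(second_image_sample_set)
```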
Step 204: determining a feature extraction model from the first image samples in the first image sample set.
Here, the feature extraction model includes an encoder, and the encoder includes at least two coding layers that extract image features at different scales and a dictionary-learning-based vector quantization module arranged after at least one coding layer.
The feature extraction model can be understood as a feature extraction model with an encoder-decoder structure, such as one with a Unet structure. Unet is a relatively simple image segmentation algorithm that extracts target features through four downsampling steps and restores the original size through four upsampling steps; it is essentially based on the encoder-decoder idea, with the encoder comprising four coding layers and the corresponding decoder comprising four symmetric decoding layers.
For ease of understanding, the following embodiments take a feature extraction model with a Unet structure as an example.
Specifically, determining a feature extraction model from the first image samples in the first image sample set can be understood as training a feature extraction model with a Unet structure on multiple first image samples from the first image sample set. The feature extraction model includes an encoder and a decoder, where the encoder includes at least two coding layers that extract image features at different scales and a dictionary-learning-based vector quantization module arranged after at least one coding layer; the goal of dictionary learning is to extract the most essential features of the data (analogous to the words or entries in a dictionary).
In a specific implementation, the feature extraction model is based on an encoder-decoder structure, and training it from the first image samples in the first image sample set proceeds as follows:
Determining a feature extraction model from the first image samples in the first image sample set includes:
inputting a first image sample from the first image sample set into the current coding layer of the encoder of the feature extraction model for encoding, to obtain an initial feature image of the first image sample at the current coding layer;
when it is determined that the current coding layer is followed by a dictionary-learning-based vector quantization module, determining a target feature image of the first image sample at the current coding layer according to the vector quantization module;
downsampling the initial feature image of the first image sample at the current coding layer and inputting it into the next coding layer, to obtain a target feature image of the first image sample at the next coding layer; and
inputting the target feature images of the first image sample at all coding layers into the decoding layers of the decoder of the feature extraction model for decoding, and training the feature extraction model according to the decoding results, where the coding layers in the encoder and the decoding layers in the decoder are arranged symmetrically (a sketch of this encoder follows these steps).
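The encoder side of these steps can be sketched in PyTorch as follows. The channel widths, dictionary sizes, and sparsity level are assumed values, and the `DictVQ` module uses a crude one-shot top-k sparse coding in place of a full OMP decomposition (an OMP sketch is given later, in the discussion of the vector quantization module); a coding layer without a VQ module would simply pass its initial feature image through as the target feature image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DictVQ(nn.Module):
    """Dictionary-learning vector quantization: feature ~= dictionary @ sparse code."""
    def __init__(self, dim, n_atoms, k=4):
        super().__init__()
        # overcomplete dictionary: more atoms than feature dimensions
        self.dictionary = nn.Parameter(torch.randn(dim, n_atoms))
        self.k = k

    def forward(self, feat):                                  # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        x = feat.permute(0, 2, 3, 1).reshape(-1, c)           # one vector per position
        D = F.normalize(self.dictionary, dim=0)               # unit-norm atoms
        corr = x @ D                                          # correlation with atoms
        topk = corr.abs().topk(self.k, dim=1).indices         # keep k strongest atoms
        code = torch.zeros_like(corr).scatter(1, topk, corr.gather(1, topk))
        quantized = code @ D.t()                              # dictionary @ sparse code
        return quantized.reshape(b, h, w, c).permute(0, 3, 1, 2)

class SVQEncoder(nn.Module):
    """Four coding layers, each followed by a dictionary-learning VQ module."""
    def __init__(self, chans=(16, 32, 64, 128)):
        super().__init__()
        in_ch = (1,) + chans[:-1]
        self.blocks = nn.ModuleList(
            nn.Conv2d(i, o, 3, padding=1) for i, o in zip(in_ch, chans))
        self.vqs = nn.ModuleList(DictVQ(c, n_atoms=4 * c) for c in chans)

    def forward(self, x):
        targets = []
        for block, vq in zip(self.blocks, self.vqs):
            initial = F.relu(block(x))    # initial feature image of this coding layer
            targets.append(vq(initial))   # target feature image; a layer without a VQ
                                          # module would append `initial` itself
            x = F.max_pool2d(initial, 2)  # downsample into the next coding layer
        return targets                    # one target feature image per scale

targets = SVQEncoder()(torch.randn(1, 1, 256, 256))
print([tuple(t.shape) for t in targets])
# [(1, 16, 256, 256), (1, 32, 128, 128), (1, 64, 64, 64), (1, 128, 32, 32)]
```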
To make the feature extraction effect of the feature extraction model better, the embodiments of this specification use a multi-level sparse vector quantization variational autoencoder (SVQ-VAE) to perform unsupervised representation learning on the diverse data. In practical applications, a dictionary-learning-based vector quantization module need not be arranged after every coding layer of the SVQ-VAE; for a better feature extraction effect, however, the embodiments of this specification take the case where such a module is arranged after each coding layer as an example when training the feature extraction model, so that the model can learn an overcomplete dictionary at the different scales of the different coding layers of the SVQ-VAE and thereby extract multi-scale image features.
Taking one first image sample in the first image sample set as an example, training of the feature extraction model will be described in detail.
Specifically, a first image sample from the first image sample set is input into the current coding layer of the feature extraction model (for example, the first coding layer) for encoding, to obtain the initial feature image of the first image sample at the current coding layer, i.e., the encoded initial feature image. When it is determined that the current coding layer is followed by a dictionary-learning-based vector quantization module, the target feature image of the first image sample at the current coding layer is determined according to the vector quantization module. The initial feature image of the first image sample at the current coding layer is then downsampled, for example from 256 pixels to 128 pixels, and, if a next coding layer exists (for example, the second coding layer), input into that next coding layer, where processing yields the target feature image of the first image sample at the next coding layer. This continues until the target feature image of the first image sample at every coding layer is obtained; finally, the target feature images of the first image sample at all coding layers are decoded by the decoding layers of the decoder of the feature extraction model, and the feature extraction model is trained according to the decoding results.
In the image segmentation model determination method provided by the embodiments of this specification, a sparse vector quantization autoencoder (SVQ-VAE) arranged in the feature extraction model performs unsupervised representation learning on the diverse first image samples. Through the SVQ-VAE, an overcomplete dictionary can be learned at the different scales of the different coding layers, so that the trained feature extraction model can subsequently extract multi-scale features from images, improving feature richness.
Specifically, if the next coding layer is also followed by a dictionary-learning-based vector quantization module, then after the initial feature image of the first image sample at the current coding layer is downsampled and input into the next coding layer, the target feature image there is obtained in the same way as at the current coding layer, completing the encoder's feature learning on the first image sample at another scale. The specific implementation steps are as follows:
Downsampling the initial feature image of the first image sample at the current coding layer, inputting it into the next coding layer, and obtaining the target feature image of the first image sample at the next coding layer includes:
downsampling the initial feature image of the first image sample at the current coding layer and inputting it into the next coding layer;
encoding at the next coding layer to obtain an initial feature image of the first image sample at the next coding layer; and
when it is determined that the next coding layer is followed by a dictionary-learning-based vector quantization module, determining the target feature image of the first image sample at the next coding layer according to the vector quantization module.
The dictionary-learning-based vector quantization module arranged after the next coding layer is the same as the one arranged after the current coding layer in the above embodiment, and is not described again here.
The following takes the current coding layer as the first coding layer and the next coding layer as the second coding layer as a detailed example.
Specifically, the initial feature image of the first image sample at the first coding layer is downsampled and input into the second coding layer; after encoding at the second coding layer, the initial feature image of the first image sample at the second coding layer is obtained; then, when it is determined that the second coding layer is followed by a dictionary-learning-based vector quantization module, the target feature image of the first image sample at the second coding layer is determined according to the vector quantization module.
When the second coding layer is also followed by a next coding layer (such as a third coding layer) that is likewise followed by a dictionary-learning-based vector quantization module, the target feature image of the first image sample at the third coding layer is obtained in the same way as at the second coding layer; in this way, the target feature images of the first image sample at all coding layers of the feature extraction model can be obtained.
In practical applications, if the feature extraction model has a Unet structure, dictionary-learning-based vector quantization modules can be arranged after all four coding layers. The advantage is that, for example, the first coding layer sees more of the overall appearance of the first image sample, so the overcomplete dictionary decomposed there captures more large-scale features, while the fourth coding layer sees more of the details of the first image sample, so the overcomplete dictionary decomposed there captures more detailed features. Overall, the encoders at the four coding layers decompose four overcomplete dictionaries that each focus on image features of the first image sample at a different scale, improving the richness of the extracted image features.
In a specific implementation, the target feature image of the first image sample at the first coding layer and the target feature image at the second coding layer are determined by the same steps, as follows:
Determining the target feature image of the first image sample at the current coding layer according to the vector quantization module includes:
decomposing the initial feature image into an overcomplete dictionary and a sparse code according to the vector quantization module; and
determining the target feature image of the first image sample at the current coding layer from the overcomplete dictionary and the sparse code.
Correspondingly, determining the target feature image of the first image sample at the next coding layer according to the vector quantization module includes:
decomposing the initial feature image into an overcomplete dictionary and a sparse code according to the vector quantization module; and
determining the target feature image of the first image sample at the next coding layer from the overcomplete dictionary and the sparse code.
The following again takes the current coding layer as the first coding layer and the next coding layer as the second coding layer as a detailed example.
Specifically, according to the vector quantization module after the first coding layer, the initial feature image of the first image sample at the first coding layer is decomposed into a corresponding overcomplete dictionary and sparse code by a preset algorithm (such as Orthogonal Matching Pursuit, OMP); the target feature image of the first image sample at the first coding layer is then computed from the overcomplete dictionary and the sparse code, for example by multiplying them, the product being the target feature image of the first image sample at the first coding layer.
Similarly, at the second coding layer, according to the vector quantization module after the second coding layer, the initial feature image of the first image sample at the second coding layer is decomposed into a corresponding overcomplete dictionary and sparse code by the preset algorithm; the target feature image of the first image sample at the second coding layer is then computed from the overcomplete dictionary and the sparse code, for example as their product (a sketch of this decomposition follows).
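As an illustration of the decomposition step, here is a minimal NumPy sketch of OMP applied to one feature vector; the dimension, dictionary size, and sparsity level are assumed values.

```python
import numpy as np

def omp(D, x, k):
    """Decompose x ~= D @ s, where s has at most k nonzero entries."""
    residual, support = x.astype(float).copy(), []
    s = np.zeros(D.shape[1])
    for _ in range(k):
        # greedily pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit coefficients on the support by least squares (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        s[:] = 0.0
        s[support] = coef
        residual = x - D @ s
    return s

rng = np.random.default_rng(0)
dim, n_atoms = 64, 256                      # 256 atoms for 64 dims: overcomplete
D = rng.standard_normal((dim, n_atoms))
D /= np.linalg.norm(D, axis=0)              # unit-norm dictionary atoms
feature_vector = rng.standard_normal(dim)   # one position of an initial feature image

sparse_code = omp(D, feature_vector, k=8)
target_feature = D @ sparse_code            # target feature = dictionary @ sparse code
print(np.count_nonzero(sparse_code), np.linalg.norm(feature_vector - target_feature))
```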
In the embodiments of this specification, when a dictionary-learning-based vector quantization module is arranged after a coding layer of the feature extraction model, the vector quantization module can decompose the initial feature image of the first image sample at that coding layer into an overcomplete dictionary and a sparse code, and compute from them the target feature image of the first image sample at the current scale of that coding layer, so that the decoding layers of the subsequent decoder learn very stable and rich semantic information about the first image sample while reconstructing it from the target feature images at the different scales of the coding layers.
When neither the current coding layer nor the next coding layer is followed by a dictionary-learning-based vector quantization module, the target feature images of the first image sample at the current coding layer and at the next coding layer are obtained as follows:
After obtaining the initial feature image of the first image sample at the current coding layer, the method further includes:
when the current coding layer is not followed by a dictionary-learning-based vector quantization module, determining the initial feature image of the first image sample at the current coding layer as the target feature image of the first image sample at the current coding layer.
Correspondingly, after obtaining the initial feature image of the first image sample at the next coding layer, the method further includes:
when the next coding layer is not followed by a dictionary-learning-based vector quantization module, determining the initial feature image of the first image sample at the next coding layer as the target feature image of the first image sample at the next coding layer.
The following again takes the current coding layer as the first coding layer and the next coding layer as the second coding layer as a detailed example.
Specifically, when it is determined that the first coding layer is not followed by a dictionary-learning-based vector quantization module, the initial feature image of the first image sample at the first coding layer is determined to be its target feature image at the first coding layer.
Similarly, when it is determined that the second coding layer is not followed by a dictionary-learning-based vector quantization module, the initial feature image of the first image sample at the second coding layer is determined to be its target feature image at the second coding layer.
In the embodiments of this specification, when no dictionary-learning-based vector quantization module is arranged after a coding layer of the feature extraction model, the initial feature image of the first image sample at that coding layer can be used directly as the target feature image and participate in subsequent decoder training, improving the training efficiency of the feature extraction model.
After the target feature images of the first image sample at all coding layers of the feature extraction model are obtained, the feature extraction model can be trained by combining them with the decoder of the feature extraction model. The specific implementation is as follows:
Inputting the target feature images of the first image sample at all coding layers into the decoding layers of the decoder of the feature extraction model for decoding, and training the feature extraction model according to the decoding results, includes:
inputting the target feature image of the first image sample at the last coding layer into the current decoding layer of the decoder of the feature extraction model that corresponds to the last coding layer for decoding, to obtain an initial decoding feature image of the first image sample at the current decoding layer;
inputting the target feature image of the first image sample at the previous coding layer, together with the initial decoding feature image of the first image sample at the current decoding layer, into the next decoding layer, to obtain an initial decoding feature image of the first image sample at the next decoding layer; and
training the feature extraction model according to the initial decoding feature images of the first image sample at all decoding layers.
Specifically, the target feature image of the first image sample at the last coding layer (such as the fourth coding layer) is input into the corresponding current decoding layer (such as the first decoding layer) for decoding, to obtain the initial decoding feature image of the first image sample at the current decoding layer; the target feature image of the first image sample at the previous coding layer (such as the third coding layer) and its initial decoding feature image at the current decoding layer are then input into the next decoding layer (such as the second decoding layer), to obtain the initial decoding feature image of the first image sample at that decoding layer; finally, the feature extraction model is trained according to the initial decoding feature images of the first image sample at each decoding layer.
In the image segmentation model determination method provided by the embodiments of this specification, the target feature image of each coding layer is combined with the output of the preceding decoding layer and input into the current decoding layer for decoding, so that rich semantic information of the first image sample at the different scales of the different coding layers is learned while the first image sample is reconstructed (a sketch of this decoder follows).
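A minimal sketch of this decoding order, with channel widths matching the encoder sketch above (all assumed): the deepest target feature image enters the first decoding layer, and each subsequent decoding layer fuses the upsampled output of the preceding decoding layer with the target feature image of the corresponding shallower coding layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVQDecoder(nn.Module):
    """Four decoding layers arranged symmetrically to the encoder sketch above."""
    def __init__(self, chans=(16, 32, 64, 128)):
        super().__init__()
        # first decoding layer consumes the deepest target feature image alone
        self.first = nn.Conv2d(chans[3], chans[2], 3, padding=1)
        # each later decoding layer fuses (upsampled previous output, skip target)
        self.fuse = nn.ModuleList([
            nn.Conv2d(2 * chans[2], chans[1], 3, padding=1),  # 2nd decoding layer
            nn.Conv2d(2 * chans[1], chans[0], 3, padding=1),  # 3rd decoding layer
            nn.Conv2d(2 * chans[0], chans[0], 3, padding=1),  # 4th decoding layer
        ])

    def forward(self, targets):                 # targets ordered shallow -> deep
        d = F.relu(self.first(targets[-1]))     # initial decoding feature image 1
        decoded = [d]
        for conv, skip in zip(self.fuse, reversed(targets[:-1])):
            up = F.interpolate(d, scale_factor=2, mode="nearest")
            d = F.relu(conv(torch.cat([up, skip], dim=1)))  # fuse with skip target
            decoded.append(d)
        return decoded  # initial decoding feature images of all decoding layers

targets = [torch.randn(1, c, r, r)
           for c, r in zip((16, 32, 64, 128), (256, 128, 64, 32))]
print([tuple(d.shape) for d in SVQDecoder()(targets)])
```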
Specifically, training the feature extraction model according to the initial decoding feature images of the first image sample at all decoding layers includes:
determining a target decoding feature image corresponding to the first image sample from the initial decoding feature images of the first image sample at all decoding layers, and training the feature extraction model according to the target decoding feature image.
In practical applications, after the initial decoding feature images of the first image sample at all decoding layers are obtained, the target decoding feature image corresponding to the first image sample can be obtained from them, for example by concatenating the initial decoding feature images of the first image sample at all decoding layers; finally, the feature extraction model is trained quickly and accurately according to the target decoding feature image. The specific implementation is as follows:
Training the feature extraction model according to the target decoding feature image includes:
adjusting the network parameters of the feature extraction model and the overcomplete dictionaries according to the target decoding feature image; and
obtaining the trained feature extraction model once the end-of-training condition is satisfied.
The preset end-of-training condition can be set according to the practical application; for example, it can be that the number of iterations exceeds a preset threshold (e.g., 20,000), or a portion of the initial training samples can be split off as a validation set and training of the feature extraction model ends when the loss function on the validation set satisfies a preset condition.
Specifically, the network parameters of the feature extraction model and the overcomplete dictionary of each coding layer are adjusted according to the target decoding feature image until training satisfies the preset end-of-training condition, yielding a trained feature extraction model that can obtain rich semantic information at different scales (a minimal training-loop sketch follows).
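A minimal sketch of such a training loop, under the assumptions that the loss is a simple reconstruction error and that training stops after a fixed iteration budget; the model is a tiny stand-in autoencoder, and since the overcomplete dictionaries in the sketches above are ordinary learnable parameters, the same optimizer step would adjust them too.

```python
import torch
import torch.nn as nn

# Tiny stand-in for the SVQ-VAE sketched above; its dictionaries are ordinary
# learnable parameters, so the same optimizer step would adjust them as well.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

max_iters = 200  # stand-in for a budget such as the 20,000 iterations above
for step in range(max_iters):
    batch = torch.randn(4, 1, 64, 64)            # stand-in unlabeled first image samples
    recon = model(batch)                         # reconstruction from decoded features
    loss = nn.functional.mse_loss(recon, batch)  # assumed reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                             # adjusts weights and dictionaries alike
```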
Step 206: inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample.
After the feature extraction model is obtained through the training steps of the above embodiments, when training the image segmentation model, the second feature image of each second image sample is first extracted by the feature extraction model, and the image segmentation model can then be trained according to the second feature images and the corresponding sample labels. The specific implementation is as follows:
Inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample includes:
inputting the second image sample into the feature extraction model, obtaining a target decoding feature image of the second image sample, and determining the target decoding feature image as the second feature image of the second image sample.
The following takes one second image sample as an example.
Specifically, a second image sample is input into the feature extraction model, the target decoding feature image of the second image sample is obtained, and that target decoding feature image is determined to be the second feature image of the second image sample (see the sketch below).
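In code, this step is just a frozen forward pass through the trained feature extraction model; the stand-in module below is an assumption.

```python
import torch
import torch.nn as nn

# stand-in for the trained feature extraction model (assumed, not the real one)
feature_extraction_model = nn.Conv2d(1, 16, 3, padding=1)
feature_extraction_model.eval()

second_image_sample = torch.randn(1, 1, 256, 256)
with torch.no_grad():
    # the target decoding feature image of the second image sample is used
    # directly as its second feature image
    second_feature_image = feature_extraction_model(second_image_sample)
print(tuple(second_feature_image.shape))  # (1, 16, 256, 256)
```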
Step 208: determining an image segmentation model from the second feature image and the sample label.
Here, the sample label is the image segmentation result of the second image sample.
Correspondingly, determining an image segmentation model from the second feature image and the sample label includes:
training an image segmentation model from the second feature image and the image segmentation result of the second image sample, where the image segmentation model includes a multi-layer perceptron image segmentation model.
Specifically, the image segmentation result of the second image sample can be understood as a segmentation result produced for the second image sample by color labeling or similar means, for example labeling the target parts with different colors or distinguishing them with different symbols in the second image sample.
Image segmentation models include, but are not limited to, multi-layer perceptron (MLP) image segmentation models and other lightweight image segmentation models.
In a specific implementation, an image segmentation model can be trained from the second feature image corresponding to each second image sample in the second image sample set and the sample label, i.e., the image segmentation result, corresponding to each second image sample. Because the image segmentation model is built on top of the feature extraction model, it can be trained with only a small number of second image samples and still achieve a good image segmentation effect (a sketch of this supervised stage follows).
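A minimal sketch of this supervised stage, with assumed feature-channel and class counts: a per-pixel MLP head (implemented as 1×1 convolutions, so the same perceptron is applied at every spatial position) is trained with cross-entropy on second feature images and their sample labels while the feature extractor stays frozen.

```python
import torch
import torch.nn as nn

n_feat, n_classes = 16, 5                     # assumed: 16 feature channels, 5 organ labels

# per-pixel MLP head: 1x1 convolutions apply the same MLP at every position
mlp_head = nn.Sequential(
    nn.Conv2d(n_feat, 64, 1), nn.ReLU(),
    nn.Conv2d(64, n_classes, 1),
)
optimizer = torch.optim.Adam(mlp_head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for step in range(100):
    # stand-ins for a second feature image and its sample label (mask)
    second_feature_image = torch.randn(4, n_feat, 128, 128)
    sample_label = torch.randint(0, n_classes, (4, 128, 128))
    loss = criterion(mlp_head(second_feature_image), sample_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```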
In addition, to verify the training effect of the feature extraction model, the output of the decoding layers of the feature extraction model can be visualized semantically. The specific implementation is as follows:
After obtaining the initial decoding feature images of the first image sample at the decoding layers, the method further includes:
displaying the initial decoding feature images of the first image sample at all decoding layers through an image display interface.
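One simple way to realize such a visualization, assuming four decoding layers with the shapes used in the sketches above, is to render the channel-mean of each initial decoding feature image as a heat map.

```python
import torch
import matplotlib.pyplot as plt

# stand-ins for the initial decoding feature images of four decoding layers
decoded = [torch.randn(1, c, r, r)
           for c, r in zip((64, 32, 16, 16), (32, 64, 128, 256))]

fig, axes = plt.subplots(1, len(decoded), figsize=(12, 3))
for ax, feat in zip(axes, decoded):
    ax.imshow(feat[0].mean(dim=0).numpy(), cmap="viridis")  # channel-mean heat map
    ax.set_title(f"decoding layer output {feat.shape[-1]}x{feat.shape[-1]}")
    ax.axis("off")
fig.savefig("decoder_semantics.png")  # stand-in for the image display interface
```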
In the image segmentation model determination method provided by the embodiments of this specification, a feature extraction model that extracts image features based on dictionary learning at different scales is trained on the unlabeled first image sample set; by combining dictionary learning with the encoder, the feature extraction model learns rich, high-level semantic information from the first image samples and can subsequently perform multi-scale image feature extraction. The feature extraction model is then used as a feature extractor and combined with a small labeled second image sample set to train an image segmentation model, so that the image segmentation model can accurately segment images containing the target object.
The image segmentation model determination method provided in this specification is further described below with reference to fig. 3, taking its application in the medical field as an example. Fig. 3 is a process flow diagram of an image segmentation model determination method according to an embodiment of the present disclosure, which specifically includes the following steps.
Specifically, the image segmentation model determination method consists of two parts. The first part is feature extraction model training, i.e., unsupervised multi-level sparse vector quantization variational autoencoder learning (Unsupervised Hierarchical Sparse VQ-VAE Learning); the second part is image segmentation model training, i.e., semantic segmentation.
In the following, the feature extraction model in the image segmentation model determination method has a Unet structure and includes an encoder (i.e., the SVQ-VAE) and a decoder; the encoder includes four coding layers, each followed by a dictionary-learning-based vector quantization module, and the decoder includes four decoding layers. Taking an MLP as the image segmentation model, the method is described in detail.
Step one: taking an unlabeled cardiac CT image as an example, the first part, training of the feature extraction model, is described in detail.
Specifically, the cardiac CT image is input to a first encoding layer of an encoder to be encoded, a first encoded image (corresponding to an initial feature image in the above embodiment) is obtained, the first encoded image is decomposed into a first overcomplete dictionary and a first sparse code by a vector quantization module based on dictionary learning, and then a first vector quantization code (corresponding to a target feature image in the above embodiment) of the first encoded image at a feature extraction scale of the first encoding layer is calculated according to the first overcomplete dictionary and the first sparse code; then inputting the first vector quantization code into a corresponding fourth decoding layer, down-sampling the first coded image, inputting the first coded image into a second coding layer for coding to obtain a second coded image, decomposing the second coded image into a second overcomplete dictionary and a second sparse code through a dictionary learning-based vector quantization module, and calculating the second vector quantization code of the second coded image under the feature extraction scale of the second coding layer according to the second overcomplete dictionary and the second sparse code; the second vector quantization encoding is then input to a corresponding third decoding layer.
Likewise, the second encoded image is downsampled and input to the third encoding layer for encoding, yielding a third encoded image; the dictionary-learning-based vector quantization module decomposes it into a third overcomplete dictionary and a third sparse code, from which the third vector-quantized encoding at the feature extraction scale of the third encoding layer is computed and input to the corresponding second decoding layer. The third encoded image is in turn downsampled and input to the fourth encoding layer, yielding a fourth encoded image, which is decomposed into a fourth overcomplete dictionary and a fourth sparse code; the fourth vector-quantized encoding at the feature extraction scale of the fourth encoding layer is computed from them and input to the corresponding first decoding layer for decoding, obtaining the first decoded image of the cardiac CT image at the first decoding layer. This per-layer encode-quantize flow is sketched below.
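For illustration only, the following PyTorch-style sketch shows one encoding layer followed by a dictionary-learning-based vector quantization module. All names (SparseVQ, EncodingLayer, num_atoms, ista_steps, lam) are our assumptions, and ISTA is used here as one standard sparse-coding solver; the patent does not fix a particular solver.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseVQ(nn.Module):
    """Decomposes an encoded feature map into an overcomplete dictionary D and
    sparse codes A, and returns A @ D as the vector-quantized encoding."""
    def __init__(self, channels, num_atoms, ista_steps=10, lam=0.1):
        super().__init__()
        # Overcomplete: num_atoms > channels.
        self.dictionary = nn.Parameter(torch.randn(num_atoms, channels))
        self.ista_steps, self.lam = ista_steps, lam
        self.dict_loss = torch.tensor(0.0)

    def forward(self, z):
        b, c, h, w = z.shape
        x = z.permute(0, 2, 3, 1).reshape(-1, c)         # one feature vector per position
        D = F.normalize(self.dictionary, dim=1)          # (num_atoms, channels)
        with torch.no_grad():                            # sparse coding via a few ISTA steps
            L = torch.linalg.matrix_norm(D, ord=2) ** 2  # Lipschitz constant of the data term
            a = torch.zeros(x.size(0), D.size(0), device=z.device)
            for _ in range(self.ista_steps):
                v = a - ((a @ D - x) @ D.t()) / L
                a = torch.sign(v) * torch.relu(v.abs() - self.lam / L)  # soft threshold
        zq = (a @ D).reshape(b, h, w, c).permute(0, 3, 1, 2)
        self.dict_loss = F.mse_loss(zq, z.detach())      # gradient path that updates the dictionary
        return z + (zq - z).detach()                     # straight-through to the encoder

class EncodingLayer(nn.Module):
    """One encoding layer plus its vector quantization module."""
    def __init__(self, cin, cout, num_atoms):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())
        self.vq = SparseVQ(cout, num_atoms)

    def forward(self, x):
        feat = self.conv(x)               # "initial feature image" at this scale
        zq = self.vq(feat)                # "target feature image" (vector-quantized encoding)
        return zq, F.avg_pool2d(feat, 2)  # code for the decoder; downsampled input for the next layer
```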
The first decoding layer upsamples the first decoded image and inputs it to the second decoding layer, which fuses the third vector-quantized encoding with the upsampled first decoded image to obtain the second decoded image of the cardiac CT image at the second decoding layer.
The second decoding layer upsamples the second decoded image and inputs it to the third decoding layer, which fuses the second vector-quantized encoding with the upsampled second decoded image to obtain the third decoded image of the cardiac CT image at the third decoding layer.
The third decoding layer upsamples the third decoded image and inputs it to the fourth decoding layer, which fuses the first vector-quantized encoding with the upsampled third decoded image to obtain the fourth decoded image of the cardiac CT image at the fourth decoding layer.
Finally, the first, second, third, and fourth decoded images are concatenated to obtain the target decoded feature image corresponding to the cardiac CT image. The network parameters of the feature extraction model are then adjusted based on this target decoded feature image, the overcomplete dictionaries of the encoding layers are further adjusted on subsequent cardiac CT images, and the trained feature extraction model is obtained once training meets the preset end condition, as sketched below.
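A minimal sketch of the decoder pass and the reconstruction-driven training step, under the same assumptions as above; here "fusing" is implemented as channel concatenation, and recon_head is an assumed projection back to image space:

```python
import torch
import torch.nn.functional as F

def decode_and_reconstruct(zqs, dec_layers, recon_head, image):
    """zqs: vector-quantized encodings [zq1, zq2, zq3, zq4], shallowest to deepest;
    dec_layers: the four decoding layers, the first consuming the deepest code;
    recon_head: maps the fourth decoded image back to image space (assumption)."""
    d1 = dec_layers[0](zqs[3])                                        # first decoded image
    d2 = dec_layers[1](torch.cat([F.interpolate(d1, scale_factor=2.0), zqs[2]], dim=1))
    d3 = dec_layers[2](torch.cat([F.interpolate(d2, scale_factor=2.0), zqs[1]], dim=1))
    d4 = dec_layers[3](torch.cat([F.interpolate(d3, scale_factor=2.0), zqs[0]], dim=1))
    # Target decoded feature image: all decoded images resized and concatenated.
    target = torch.cat([F.interpolate(d, size=d4.shape[-2:]) for d in (d1, d2, d3, d4)], dim=1)
    loss = F.mse_loss(recon_head(d4), image)                          # reconstruction drives training
    return target, loss
```

In a full training loop, the per-layer dict_loss terms from the quantizers would be added to this reconstruction loss before backpropagation.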
Referring to fig. 4, fig. 4 is a schematic diagram of the network structure of the unsupervised hierarchical sparse vector-quantized variational autoencoder (SVQ-VAE) in the image segmentation model determining method according to an embodiment of the present disclosure.
In fig. 4, X denotes a cardiac CT image. The original image X is decomposed into an overcomplete dictionary and sparse codes; the vector-quantized encoding of X is obtained from the dictionary and sparse codes corresponding to X; X1 is then reconstructed from this encoding by the corresponding decoding layer; and the network parameters of the structure are adjusted backward based on X1.
Step two: a portion of labeled cardiac CT images is selected from the cardiac CT images used to train the feature extraction model, and the feature extraction model produces, for each, the target feature image concatenated from the first, second, third, and fourth decoded images; an image segmentation model is then trained from the target feature image corresponding to each cardiac CT image and its sample label.
Of course, in practical applications, other labeled cardiac CT images may also be used to train the image segmentation model, which is not limited herein.
As described above, the image segmentation model determining method is implemented in two stages. In the first stage, representation learning is performed on a large number of unlabeled CT images of a target human body part (such as the heart). A hierarchical sparse vector-quantized variational autoencoder is provided in the feature extraction model so that, while guaranteeing sparsity, the model learns overcomplete dictionaries at the different scales of the different encoding layers; the learned sparse codes and overcomplete dictionaries yield the vector-quantized encodings (i.e., encoded images) of the CT images at each encoding layer, and these encodings are finally reconstructed by the decoders of the corresponding decoding layers, so that the decoders learn rich and stable semantic information during reconstruction and the feature extraction model is obtained by training.
Meanwhile, during reconstruction, the outputs of the decoding layers can be clustered in a simple way to inspect the semantics learned by the different decoding layers.
Referring to fig. 5, fig. 5 is a schematic diagram showing a semantic information visualization result learned by different decoding layers in a reconstruction process of an original image in an image segmentation model determination method according to an embodiment of the present disclosure.
Fig. 5 includes the original image, the visualization result of the decoder of the second decoding layer, and the visualization result of the decoder of the third decoding layer. As can be seen from fig. 5, the decoder of the third decoding layer learns relatively fine-grained semantic information, such as fat at the heart edge and information related to blood vessels, while the decoder of the second decoding layer learns large-scale semantic information, such as information about the heart as a whole. One way to produce such maps is sketched below.
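A small sketch of the simple clustering mentioned above; the use of k-means and the cluster count are our assumptions, not the patent's:

```python
import numpy as np
from sklearn.cluster import KMeans

def semantic_map(decoded, n_clusters=6):
    """Cluster a decoding layer's per-pixel features; color-coding the returned
    cluster-index map gives a visualization in the spirit of fig. 5."""
    h, w, c = decoded.shape
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(decoded.reshape(-1, c))
    return labels.reshape(h, w)

decoded = np.random.randn(64, 64, 16)  # toy stand-in for one decoding layer's output
print(semantic_map(decoded))
```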
In the second stage, after the feature extraction model has been trained, a small number of labeled CT images of the target human body part (such as the heart) are acquired, and a pixel-level MLP classifier (namely the image segmentation model) is trained, so that CT images of the target part can subsequently be segmented with the image segmentation model.
Specifically, a labeled cardiac CT image is input into the feature extraction model of the first part to obtain the output results of the different decoding layers; these outputs are upsampled to the specified image size and, after flattening, concatenated into an N x M training set X, where N is the number of image positions (e.g., H x W x D) and M is the sum of the channel counts of the decoder outputs of the different decoding layers. The mask corresponding to the cardiac CT image is likewise flattened into an N x 1 label vector Y (namely the sample labels), and an MLP classifier is trained on X and Y to obtain the final image segmentation model; a sketch of this construction follows.
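A toy, runnable sketch of building X and Y and fitting the pixel-level classifier; the shapes, channel counts, and use of scikit-learn's MLPClassifier are our assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

H, W, D = 8, 8, 4                                                  # toy H x W x D grid
decoder_outputs = [np.random.randn(H, W, D, c) for c in (16, 32)]  # upsampled decoder outputs (toy)
mask = np.random.randint(0, 3, size=(H, W, D))                     # toy per-voxel labels

# Flatten each output to (N, C_i) and concatenate into X of shape (N, M);
# flatten the mask into Y of shape (N,), with N = H*W*D and M = sum of the C_i.
X = np.concatenate([o.reshape(-1, o.shape[-1]) for o in decoder_outputs], axis=1)
Y = mask.reshape(-1)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=50).fit(X, Y)
print(clf.predict(X[:5]))                                          # per-voxel class predictions
```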
According to the image segmentation model determining method provided by the embodiments of the present specification, dictionary learning is tightly combined with the VQ-VAE. Unlike the typical VQ-VAE, which learns a codebook and selects vector-quantized encodings by distance, this scheme constrains the learned overcomplete dictionary: vector-quantized encodings are generated from the learned sparse codes and overcomplete dictionary while keeping the codes as sparse as possible, replacing the coarse distance-based selection and achieving a better effect. Schematically, and in our own notation rather than the patent's, the two selection rules read:
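$$z_q = e_{k},\qquad k = \arg\min_{j} \lVert z_e(x) - e_j \rVert_2 \quad \text{(plain VQ-VAE)}$$

$$z_q = D\,\alpha^{*},\qquad \alpha^{*} = \arg\min_{\alpha} \lVert z_e(x) - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1 \quad \text{(this scheme)}$$

The first rule snaps each encoder output $z_e(x)$ to its nearest codebook entry; the second reconstructs it as a sparse combination of dictionary atoms, which is the constraint described above.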
According to the image segmentation model determining method provided by the embodiments of the present specification, the feature extraction model is trained with the designed unsupervised hierarchical sparse vector-quantized variational autoencoder, so that the segmentation task built on it can be realized with a very simple MLP model; since the MLP structure is simple, a good training effect is obtained with only a small amount of labeled data. Specifically, the multi-layer design of the SVQ-VAE realizes multi-scale representation learning in the feature extraction model, and the introduction of dictionary learning alleviates the difficulty of iterating the VQ-VAE overcomplete dictionary, yielding stable and rich semantic representation learning results.
Corresponding to the above method embodiments, the present disclosure further provides an image segmentation model determining apparatus embodiment, and fig. 6 shows a schematic structural diagram of an image segmentation model determining apparatus provided in one embodiment of the present disclosure. As shown in fig. 6, the apparatus includes:
a sample determination module 602 configured to determine a first image sample set containing a target object and a second image sample set, wherein the second image sample set includes a second image sample and a sample label corresponding to the second image sample;
a first model determining module 604 configured to determine a feature extraction model from a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder comprising at least two encoding layers performing image feature extraction of different scales, and a dictionary learning-based vector quantization module disposed after at least one encoding layer;
a first feature image obtaining module 606 configured to input the second image sample into the feature extraction model, obtaining a second feature image of the second image sample;
a second model determination module 608 is configured to determine an image segmentation model from the second feature image and the sample label.
Optionally, the first model determination module 604 is further configured to:
Inputting a first image sample in the first image sample set into a current coding layer of an encoder of a feature extraction model for coding to obtain an initial feature image of the first image sample in the current coding layer;
under the condition that the current coding layer is determined to be provided with a vector quantization module based on dictionary learning, determining a target feature image of the first image sample in the current coding layer according to the vector quantization module;
after the initial characteristic image of the first image sample at the current coding layer is downsampled, inputting the next coding layer of the current coding layer, and obtaining a target characteristic image of the first image sample at the next coding layer;
and inputting the target feature images of the first image sample at all encoding layers into the decoding layers of the decoder of the feature extraction model for decoding, and training the feature extraction model according to the decoding results, wherein the encoding layers in the encoder and the decoding layers in the decoder are arranged symmetrically.
Optionally, the first model determination module 604 is further configured to:
downsampling the initial feature image of the first image sample at the current encoding layer and inputting it into the next encoding layer;
encoding at the next encoding layer to obtain the initial feature image of the first image sample at the next encoding layer;
And determining a target characteristic image of the first image sample at the next coding layer according to the vector quantization module under the condition that the next coding layer is determined to be provided with the vector quantization module based on dictionary learning.
Optionally, the first model determination module 604 is further configured to:
decomposing the initial characteristic image into an overcomplete dictionary and sparse coding according to the vector quantization module;
Determining a target characteristic image of the first image sample in the current coding layer according to the overcomplete dictionary and the sparse coding;
accordingly, the determining, according to the vector quantization module, the target feature image of the first image sample at the next coding layer includes:
decomposing the initial characteristic image into an overcomplete dictionary and sparse coding according to the vector quantization module;
And determining a target characteristic image of the first image sample in the next coding layer according to the overcomplete dictionary and the sparse coding.
Optionally, the apparatus further comprises:
A first image determining module configured to determine an initial feature image of the first image sample at the current encoding layer as a target feature image of the first image sample at the current encoding layer, if it is determined that the current encoding layer is not provided with a vector quantization module based on dictionary learning;
Accordingly, the device further comprises:
And a second image determining module configured to determine an initial feature image of the first image sample at the next encoding layer as a target feature image of the first image sample at the next encoding layer, if it is determined that the vector quantization module based on dictionary learning is not provided at the next encoding layer.
Optionally, the first model determination module 604 is further configured to:
Inputting a target feature image of the first image sample in a last coding layer into a current decoding layer of a decoder of the feature extraction model, which corresponds to the last coding layer, for decoding, and obtaining an initial decoding feature image of the first image sample in the current decoding layer;
inputting the target feature image of the first image sample at the preceding encoding layer, together with the decoded feature image of the first image sample at the current decoding layer, into the next decoding layer, and obtaining the initial decoded feature image of the first image sample at that decoding layer;
And training to obtain the feature extraction model according to the initial decoding feature images of the first image sample in all decoding layers.
Optionally, the first model determination module 604 is further configured to:
And determining a target decoding characteristic image corresponding to the first image sample according to the initial decoding characteristic images of the first image sample in all decoding layers, and training to obtain the characteristic extraction model according to the target decoding characteristic image.
Optionally, the first model determination module 604 is further configured to:
according to the target decoding feature image, adjusting network parameters of the feature extraction model and the overcomplete dictionary;
and obtaining the trained feature extraction model once the training end condition is met.
Optionally, the first feature image obtaining module 606 is further configured to:
And inputting the second image sample into the feature extraction model, obtaining a target decoding feature image of the second image sample, and determining the target decoding feature image as the second feature image of the second image sample.
Optionally, the sample label is an image segmentation result of the second image sample;
accordingly, the second model determination module 608 is further configured to:
and training to obtain an image segmentation model according to the second characteristic image and an image segmentation result of the second image sample, wherein the image segmentation model comprises a multi-layer perceptron image segmentation model.
Optionally, the first image sample set and the second image sample set containing the target object are a first CT image sample set and a second CT image sample set containing a target portion of a human body.
Optionally, the apparatus further comprises:
and the visualization module is configured to display the initial decoding characteristic images of the first image samples in all decoding layers through an image display interface.
According to the image segmentation model determining apparatus provided by the embodiments of the present specification, a feature extraction model whose encoder comprises at least two encoding layers, each followed by a dictionary-learning-based vector quantization module, is trained on an unlabeled first image sample set; the vector quantization modules arranged after the encoding layers learn overcomplete dictionaries of the first image samples at different scales, realizing multi-scale feature extraction, so that the feature extraction model learns rich, high-level semantic information of the first image samples. The feature extraction model is then used as a feature extractor and, together with only a small labeled second image sample set, an image segmentation model is trained that can accurately segment images containing the target object.
The above is a schematic version of an image segmentation model determination apparatus of the present embodiment. It should be noted that, the technical solution of the image segmentation model determining device and the technical solution of the image segmentation model determining method belong to the same concept, and details of the technical solution of the image segmentation model determining device, which are not described in detail, can be referred to the description of the technical solution of the image segmentation model determining method.
Referring to fig. 7, fig. 7 shows a flowchart of an image segmentation method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 702: receiving a CT image of a human body target part input by a user, and inputting the CT image into a feature extraction model to obtain a feature image of the CT image;
step 704: inputting the characteristic image into an image segmentation model, obtaining an image segmentation result of the CT image, and displaying the image segmentation result to the user.
The feature extraction model and the image segmentation model are the feature extraction model and the image segmentation model in the image segmentation model determining method.
Specifically, a target human body part may be understood as a human organ, such as the heart or liver; a CT image of the target human body part is accordingly a CT image of a human organ, such as a cardiac CT image or a liver CT image.
According to the image segmentation method provided by the embodiments of the present specification, after a CT image of the target human body part input by the user is received, it is input into the feature extraction model trained by the above image segmentation model determination method to obtain a stable and rich feature image of the CT image, and this feature image is then input into the image segmentation model trained by the same method, which quickly and accurately produces the segmented image of the CT image of the target part. A minimal inference sketch follows.
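The sketch below strings the two trained models together for inference; the interfaces of feature_extractor and mlp_head are assumptions:

```python
import torch

@torch.no_grad()
def segment_ct(ct_image, feature_extractor, mlp_head):
    """ct_image: (1, 1, H, W) tensor; feature_extractor returns the target decoded
    feature image (1, M, H, W); mlp_head is the trained pixel-level classifier."""
    feats = feature_extractor(ct_image)
    _, m, h, w = feats.shape
    logits = mlp_head(feats.permute(0, 2, 3, 1).reshape(-1, m))
    return logits.argmax(dim=1).reshape(h, w)  # per-pixel class map to display to the user
```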
In addition, an embodiment of the present specification also provides an image segmentation apparatus including:
the image receiving module is configured to receive a CT image of a human body target part input by a user, input the CT image into a feature extraction model and obtain a feature image of the CT image;
And the image classification module is configured to input the characteristic image into an image segmentation model, obtain an image segmentation result of the CT image and display the image segmentation result to the user, wherein the characteristic extraction model and the image segmentation model are the characteristic extraction model and the image segmentation model in the image segmentation model determining method.
The above is a schematic solution of an image segmentation apparatus of the present embodiment. It should be noted that, the technical solution of the image segmentation apparatus and the technical solution of the image segmentation method belong to the same concept, and details of the technical solution of the image segmentation apparatus, which are not described in detail, can be referred to the description of the technical solution of the image segmentation method.
Referring to fig. 8, fig. 8 shows a flowchart of another image segmentation model determination method provided according to an embodiment of the present specification, specifically including the following steps.
Step 802: responding to an image segmentation model processing request sent by a user, and displaying an image input interface for the user;
step 804: receiving a first image sample set and a second image sample set which are input by the user through the image input interface and contain target objects, wherein the second image sample set comprises a second image sample and sample labels corresponding to the second image sample;
step 806: and determining a feature extraction model according to the first image sample in the first image sample set.
The feature extraction model comprises an encoder, wherein the encoder comprises at least two coding layers for extracting image features with different scales, and a vector quantization module which is arranged behind at least one coding layer and is based on dictionary learning;
Step 808: inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample;
step 810: and determining an image segmentation model according to the second characteristic image and the sample label, and returning the image segmentation model to the user.
According to this image segmentation model determining method, the feature extraction model is trained with the designed unsupervised hierarchical sparse vector-quantized variational autoencoder, so that the segmentation task based on the feature extraction model can be realized with only a very simple MLP image segmentation model; because that model is simple in structure, a good training effect is obtained with only a small amount of labeled data, greatly improving training efficiency.
In addition, an embodiment of the present specification further provides an image segmentation model determining apparatus, including:
the interface display module is configured to respond to an image segmentation model processing request sent by a user and display an image input interface for the user;
The sample receiving module is configured to receive a first image sample set and a second image sample set which are input by the user through the image input interface and contain target objects, wherein the second image sample set comprises a second image sample and sample labels corresponding to the second image sample;
A third model determining module configured to determine a feature extraction model from a first image sample in the first image sample set, wherein the feature extraction model includes an encoder including at least two encoding layers performing image feature extraction of different scales, and a dictionary learning-based vector quantization module disposed behind at least one of the encoding layers;
a second feature image obtaining module configured to input the second image sample into the feature extraction model, obtaining a second feature image of the second image sample;
and a fourth model determination module configured to determine an image segmentation model from the second feature image and the sample tag and return the image segmentation model to the user.
The above is another exemplary embodiment of the image segmentation model determination apparatus of the present embodiment. It should be noted that, the technical solution of the image segmentation model determining device and the technical solution of the other image segmentation model determining method belong to the same concept, and details of the technical solution of the other image segmentation model determining device, which are not described in detail, can be referred to the description of the technical solution of the other image segmentation model determining method.
Referring to fig. 9, fig. 9 illustrates a block diagram of a computing device 900 provided in accordance with one embodiment of the present description. The components of computing device 900 include, but are not limited to, memory 910 and processor 920. Processor 920 is coupled to memory 910 via bus 930 with database 950 configured to hold data.
Computing device 900 also includes an access device 940 that enables computing device 900 to communicate via one or more networks 960. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the internet. The access device 940 may include one or more of any type of network interface, wired or wireless, such as a network interface card (NIC), an IEEE 802.11 wireless local area network (WLAN) interface, a worldwide interoperability for microwave access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, or a near field communication (NFC) interface.
In one embodiment of the present description, the above-described components of computing device 900 and other components not shown in FIG. 9 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 9 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 900 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC, personal Computer). Computing device 900 may also be a mobile or stationary server.
Wherein the processor 920 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the image segmentation model determination method or the image segmentation method described above.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the above-mentioned image segmentation model determining method or the image segmentation method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the above-mentioned image segmentation model determining method or the image segmentation method.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the above-described image segmentation model determination method or image segmentation method.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the above-mentioned image segmentation model determining method or the image segmentation method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the above-mentioned image segmentation model determining method or the image segmentation method.
An embodiment of the present specification also provides a computer program, wherein the computer program, when executed in a computer, causes the computer to execute the steps of the above-described image segmentation model determination method or image segmentation method.
The above is an exemplary version of a computer program of the present embodiment. It should be noted that, the technical solution of the computer program and the technical solution of the above image segmentation model determining method or the image segmentation method belong to the same concept, and details of the technical solution of the computer program, which are not described in detail, can be referred to the description of the technical solution of the above image segmentation model determining method or the image segmentation method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be adjusted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely used to help clarify the present specification. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This specification is to be limited only by the claims and the full scope and equivalents thereof.
Claims (14)
1. An image segmentation model determination method, comprising:
Determining a first image sample set and a second image sample set containing a target object, wherein the second image sample set comprises a second image sample and a sample label corresponding to the second image sample;
Determining a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder, the encoder comprises at least two coding layers for extracting image features with different scales, and a vector quantization module which is arranged behind at least one coding layer and is based on dictionary learning, and the vector quantization module is used for enabling the feature extraction model to learn an overcomplete dictionary under different scales of the different coding layers so as to realize multi-scale feature extraction of the first image sample;
inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample;
and determining an image segmentation model according to the second characteristic image and the sample label.
2. The image segmentation model determination method according to claim 1, the determining a feature extraction model from a first image sample in the first image sample set, comprising:
Inputting a first image sample in the first image sample set into a current coding layer of an encoder of a feature extraction model for coding to obtain an initial feature image of the first image sample in the current coding layer;
under the condition that the current coding layer is determined to be provided with a vector quantization module based on dictionary learning, determining a target feature image of the first image sample in the current coding layer according to the vector quantization module;
after the initial characteristic image of the first image sample at the current coding layer is downsampled, inputting the next coding layer of the current coding layer, and obtaining a target characteristic image of the first image sample at the next coding layer;
and inputting the target feature images of the first image sample at all encoding layers into the decoding layers of the decoder of the feature extraction model for decoding, and training the feature extraction model according to the decoding results, wherein the encoding layers in the encoder and the decoding layers in the decoder are arranged symmetrically.
3. The image segmentation model determination method according to claim 2, wherein downsampling the initial feature image of the first image sample at the current encoding layer, inputting it into the next encoding layer, and obtaining the target feature image of the first image sample at the next encoding layer comprises:
downsampling the initial feature image of the first image sample at the current encoding layer and inputting it into the next encoding layer;
encoding at the next encoding layer to obtain the initial feature image of the first image sample at the next encoding layer;
And determining a target characteristic image of the first image sample at the next coding layer according to the vector quantization module under the condition that the next coding layer is determined to be provided with the vector quantization module based on dictionary learning.
4. The image segmentation model determination method according to claim 3, the determining, according to the vector quantization module, a target feature image of the first image sample at the current encoding layer, comprising:
decomposing the initial characteristic image into an overcomplete dictionary and sparse coding according to the vector quantization module;
Determining a target characteristic image of the first image sample in the current coding layer according to the overcomplete dictionary and the sparse coding;
accordingly, the determining, according to the vector quantization module, the target feature image of the first image sample at the next coding layer includes:
decomposing the initial characteristic image into an overcomplete dictionary and sparse coding according to the vector quantization module;
And determining a target characteristic image of the first image sample in the next coding layer according to the overcomplete dictionary and the sparse coding.
5. The image segmentation model determination method according to claim 3, wherein after obtaining the initial feature image of the first image sample at the current encoding layer, the method further comprises:
Under the condition that the current coding layer is not provided with a vector quantization module based on dictionary learning, determining an initial characteristic image of the first image sample at the current coding layer as a target characteristic image of the first image sample at the current coding layer;
accordingly, after obtaining the initial feature image of the first image sample at the next encoding layer, the method further comprises:
and determining the initial characteristic image of the first image sample at the next coding layer as a target characteristic image of the first image sample at the next coding layer under the condition that the next coding layer is not provided with a vector quantization module based on dictionary learning.
6. The image segmentation model determination method according to claim 2, wherein inputting the target feature images of the first image sample at all encoding layers into the decoding layers of the decoder of the feature extraction model for decoding, and training the feature extraction model according to the decoding results, comprises:
Inputting a target feature image of the first image sample in a last coding layer into a current decoding layer of a decoder of the feature extraction model, which corresponds to the last coding layer, for decoding, and obtaining an initial decoding feature image of the first image sample in the current decoding layer;
inputting the target feature image of the first image sample at the preceding encoding layer, together with the decoded feature image of the first image sample at the current decoding layer, into the next decoding layer, and obtaining the initial decoded feature image of the first image sample at that decoding layer;
And training to obtain the feature extraction model according to the initial decoding feature images of the first image sample in all decoding layers.
7. The method for determining the image segmentation model according to claim 6, wherein training to obtain the feature extraction model according to the initial decoded feature images of the first image sample at all decoding layers comprises:
And determining a target decoding characteristic image corresponding to the first image sample according to the initial decoding characteristic images of the first image sample in all decoding layers, and training to obtain the characteristic extraction model according to the target decoding characteristic image.
8. The image segmentation model determination method according to claim 1 or 6, the inputting the second image sample into the feature extraction model, obtaining a second feature image of the second image sample, comprising:
And inputting the second image sample into the feature extraction model, obtaining a target decoding feature image of the second image sample, and determining the target decoding feature image as the second feature image of the second image sample.
9. The image segmentation model determination method as set forth in claim 1, the sample label being an image segmentation result of the second image sample;
accordingly, the determining an image segmentation model according to the second feature image and the sample label includes:
and training to obtain an image segmentation model according to the second characteristic image and an image segmentation result of the second image sample, wherein the image segmentation model comprises a multi-layer perceptron image segmentation model.
10. The method of claim 1, wherein the first and second image sample sets including the target object are first and second CT image sample sets including a target portion of a human body.
11. The image segmentation model determination method according to claim 6, wherein after obtaining the initial decoded feature image of the first image sample at the last decoding layer, the method further comprises:
and displaying the initial decoding characteristic images of the first image sample in all decoding layers through an image display interface.
12. An image segmentation method, comprising:
receiving a CT image of a human body target part input by a user, and inputting the CT image into a feature extraction model to obtain a feature image of the CT image;
Inputting the characteristic image into an image segmentation model to obtain an image segmentation result of the CT image, and displaying the image segmentation result to the user, wherein the characteristic extraction model and the image segmentation model are the characteristic extraction model and the image segmentation model in the image segmentation model determining method according to any one of claims 1-11.
13. An image segmentation model determination method, comprising:
responding to an image segmentation model processing request sent by a user, and displaying an image input interface for the user;
receiving a first image sample set and a second image sample set which are input by the user through the image input interface and contain target objects, wherein the second image sample set comprises a second image sample and sample labels corresponding to the second image sample;
Determining a feature extraction model according to a first image sample in the first image sample set, wherein the feature extraction model comprises an encoder, the encoder comprises at least two coding layers for extracting image features with different scales, and a vector quantization module which is arranged behind at least one coding layer and is based on dictionary learning, and the vector quantization module is used for enabling the feature extraction model to learn an overcomplete dictionary under different scales of the different coding layers so as to realize multi-scale feature extraction of the first image sample;
inputting the second image sample into the feature extraction model to obtain a second feature image of the second image sample;
And determining an image segmentation model according to the second characteristic image and the sample label, and returning the image segmentation model to the user.
14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the image segmentation model determination method of any one of claims 1 to 11, or the image segmentation method of claim 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211386108.XA CN115713535B (en) | 2022-11-07 | 2022-11-07 | Image segmentation model determination method and image segmentation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115713535A CN115713535A (en) | 2023-02-24 |
CN115713535B true CN115713535B (en) | 2024-05-14 |
Family
ID=85232569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211386108.XA Active CN115713535B (en) | 2022-11-07 | 2022-11-07 | Image segmentation model determination method and image segmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115713535B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116342884B (en) * | 2023-03-28 | 2024-02-06 | 阿里云计算有限公司 | Image segmentation and model training method and server |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111126481A (en) * | 2019-12-20 | 2020-05-08 | 湖南千视通信息科技有限公司 | Training method and device of neural network model |
CN112102321A (en) * | 2020-08-07 | 2020-12-18 | 深圳大学 | Focal image segmentation method and system based on deep convolutional neural network |
CN113205524A (en) * | 2021-05-17 | 2021-08-03 | 广州大学 | Blood vessel image segmentation method, device and equipment based on U-Net |
CN113222888A (en) * | 2021-03-19 | 2021-08-06 | 复旦大学 | Textile yarn weaving size detection method based on depth texture characteristics |
CN113283433A (en) * | 2021-04-13 | 2021-08-20 | 北京工业大学 | Image semantic segmentation method, system, electronic device and storage medium |
WO2021179205A1 (en) * | 2020-03-11 | 2021-09-16 | 深圳先进技术研究院 | Medical image segmentation method, medical image segmentation apparatus and terminal device |
CN114445632A (en) * | 2022-02-08 | 2022-05-06 | 支付宝(杭州)信息技术有限公司 | Image processing method and device |
CN114495129A (en) * | 2022-04-18 | 2022-05-13 | 阿里巴巴(中国)有限公司 | Character detection model pre-training method and device |
CN114648787A (en) * | 2022-02-11 | 2022-06-21 | 华为技术有限公司 | Face image processing method and related equipment |
CN114820633A (en) * | 2022-04-11 | 2022-07-29 | 北京三快在线科技有限公司 | Semantic segmentation method, training device and training equipment of semantic segmentation model |
CN115131289A (en) * | 2022-05-24 | 2022-09-30 | 阿里巴巴(中国)有限公司 | Training methods for image processing models |
CN115222746A (en) * | 2022-08-16 | 2022-10-21 | 浙江柏视医疗科技有限公司 | Multi-task heart substructure segmentation method based on space-time fusion |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111932555B (en) * | 2020-07-31 | 2025-04-01 | 上海商汤善萃医疗科技有限公司 | Image processing method and device, and computer readable storage medium |
PH12021553280A1 (en) * | 2021-12-20 | 2023-07-10 | Sensetime Int Pte Ltd | Sequence recognition method and apparatus, electronic device, and storage medium |
2022-11-07: Application CN202211386108.XA filed in China; granted as CN115713535B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN115713535A (en) | 2023-02-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |