CN111882558B

CN111882558B - Image processing method and device, electronic device and storage medium

Info

Publication number: CN111882558B
Application number: CN202010801035.0A
Authority: CN
Inventors: 王文集; 夏清; 胡志强
Original assignee: Shanghai Shangtang Shancui Medical Technology Co ltd
Current assignee: Shanghai Shangtang Shancui Medical Technology Co ltd
Priority date: 2020-08-11
Filing date: 2020-08-11
Publication date: 2025-02-25
Anticipated expiration: 2040-08-11
Also published as: KR20220034844A; WO2022032998A1; TW202219831A; CN111882558A; JP2022547372A

Abstract

The present disclosure relates to an image processing method and device, an electronic device and a storage medium, the method comprising: obtaining local images of one or more scales based on an image to be processed; performing feature extraction processing on the one or more local images respectively to obtain feature maps corresponding to the local images of one or more scales; segmenting the image to be processed based on the feature maps corresponding to the local images of one or more scales to obtain a segmentation result. According to the image processing method of an embodiment of the present disclosure, by performing screenshot processing of the image to be processed at one or more scales and performing feature extraction processing on the local image, feature maps of one or more scales can be obtained, which is conducive to obtaining local fine features and global features, and can simultaneously obtain fine features of smaller targets and more complex global distribution features, thereby improving the accuracy of segmentation processing.

Description

Image processing method and device, electronic equipment and storage medium

Technical Field

The disclosure relates to the field of computer technology, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.

Background

In the image processing, for the case where the target is small and the distribution is complex, it is difficult to obtain a good segmentation effect, for example, in the medical image processing field, in a computed tomography angiography (Computed tomography angiography, CTA) image, it is difficult to accurately segment the coronary vessel because the coronary vessel is small and the target is complex and is easily disturbed by noise.

Disclosure of Invention

The disclosure provides an image processing method and device, electronic equipment and a storage medium.

According to one aspect of the disclosure, an image processing method is provided, and the image processing method comprises the steps of obtaining one or more scale local images based on an image to be processed, respectively carrying out feature extraction processing on the one or more scale local images to obtain feature images corresponding to the one or more scale local images, and dividing the image to be processed based on the feature images corresponding to the one or more scale local images to obtain a division result.

According to the image processing method of the embodiment of the disclosure, the feature map with one or more scales can be obtained by performing screenshot processing with one or more scales on the image to be processed and performing feature extraction processing on the local image, so that local fine features and global features can be obtained, fine features of smaller targets and more complex global distribution features can be obtained at the same time, and accuracy of segmentation processing is improved.

In one possible implementation manner, the acquiring the local image with one or more scales based on the image to be processed includes performing screenshot processing with multiple scales on the image to be processed to acquire multiple local images, wherein the multiple local images include a first local image with a reference size and a second local image with a size larger than the reference size, and the image centers of the multiple local images are the same.

In one possible implementation manner, the feature extraction processing is performed on the one or more partial images to obtain feature images corresponding to the one or more scale partial images, and the feature extraction processing is performed on the plurality of partial images to obtain a first feature image corresponding to the first partial image and a second feature image corresponding to the second partial image.

In one possible implementation manner, the segmentation of the image to be processed based on the feature map corresponding to the local image with one or more scales to obtain a segmentation result includes performing superposition processing on the first feature map and the second feature map to obtain a third feature map, and performing activation processing on the third feature map to obtain a segmentation result of a target region in the image to be processed.

In one possible implementation manner, the feature extraction processing is performed on the multiple partial images respectively to obtain a first feature map corresponding to a first partial image and a second feature map corresponding to a second partial image, and the feature extraction processing includes performing downsampling processing on the second partial image to obtain a third partial image with a reference size, and performing feature extraction processing on the first partial image and the third partial image respectively to obtain a first feature map corresponding to the first partial image and a second feature map corresponding to the second partial image.

In this way, the local images with different sizes can meet the input requirements of the feature extraction network through downsampling, so that feature graphs with multiple scales are obtained, and the feature information with multiple scales is obtained.

In one possible implementation manner, the first feature map and the second feature map are subjected to superposition processing to obtain a third feature map, the method comprises the steps of carrying out up-sampling processing on the second feature map to obtain a fourth feature map, wherein the ratio of the sizes of the fourth feature map to the first feature map is the same as the ratio of the sizes of the second partial image to the first partial image, carrying out clipping processing on the fourth feature map to obtain a fifth feature map, the fifth feature map is consistent with the first feature map in size, and carrying out weighted summation processing on the first feature map and the fifth feature map to obtain the third feature map.

In this way, the sizes of the plurality of feature images can be unified through up-sampling and clipping processing, and weighted average is performed to fuse the plurality of feature images, so that the obtained third feature image not only contains more feature information, but also is beneficial to determining the details and distribution of the target, reducing noise interference and improving segmentation accuracy.

In one possible implementation, the image processing method is implemented through a neural network, the neural network comprises a plurality of feature extraction networks, an overlay network and an activation layer, wherein the method further comprises training the plurality of feature extraction networks, the overlay network and the activation layer through a sample image to obtain a plurality of trained feature extraction networks, an overlay network and an activation layer.

In one possible implementation manner, training the feature extraction networks, the superposition network and the activation layer through sample images to obtain a plurality of trained feature extraction networks, superposition networks and activation layers, wherein the training process comprises the steps of performing screenshot processing on the sample images to obtain a fourth sample partial image with a reference size and a fifth sample partial image with a larger reference size, performing downsampling processing on the fifth sample partial image to obtain a sixth sample partial image with the reference size, respectively inputting the fourth sample partial image and the sixth sample partial image into the corresponding feature extraction networks to perform feature extraction processing to obtain a third sample feature image corresponding to the fourth sample partial image and a fourth sample feature image corresponding to the sixth sample partial image, performing upsampling processing and clipping processing on the fourth sample feature image to obtain a fifth sample feature image, inputting the third sample feature image and the fifth sample feature image into the superposition network to obtain a sixth sample feature image, inputting the sixth sample feature image into the activation layer to perform feature extraction processing, respectively inputting the fourth sample partial image and the fourth sample feature image corresponding to obtain a third sample feature image corresponding to the fourth sample feature image and a fourth sample feature image corresponding to the sixth sample feature image, performing the upsampling processing and clipping processing on the fourth sample feature image to obtain a fifth sample feature image, and obtaining a target feature image loss, and determining the superimposed feature images and the feature loss of the activation layer, and the training network and the feature extraction network.

In this way, the neural network can be trained through the pre-trained feature extraction network and the sample image, and the training accuracy of the neural network is improved. Further, through training the superimposed network to obtain the superimposed weight, proper weight parameters can be selected in the training process, the effect of feature fusion is improved, detail features and global features are optimized, and the accuracy of the neural network is improved.

In one possible implementation manner, the third sample feature map and the fifth sample feature map are input into the superposition network to obtain a sixth sample feature map, and the third sample feature map and the fifth sample feature map are subjected to weighted summation through the superposition network to obtain the sixth sample feature map.

In a possible implementation manner, the training of the feature extraction networks, the superposition networks and the activation layer through the sample image to obtain the trained feature extraction networks, superposition networks and activation layer further comprises the steps of performing activation processing on the fourth sample feature image to obtain a third sample target area, determining third network loss of the feature extraction network corresponding to the fourth sample feature image according to labeling information of the third sample target area and the sample image, and training the feature extraction network corresponding to the fourth sample feature image according to the third network loss.

In this way, the prediction information in the larger partial image may be utilized to further train the feature extraction network, which may further improve the accuracy of the neural network.

According to another aspect of the disclosure, an image processing device is provided, which comprises a local image module, a feature extraction module and a segmentation module, wherein the local image module is used for acquiring one or more scale local images based on the to-be-processed image, the feature extraction module is used for respectively carrying out feature extraction processing on the one or more local images to obtain feature images corresponding to the one or more scale local images, and the segmentation module is used for segmenting the to-be-processed image based on the feature images corresponding to the one or more scale local images to obtain segmentation results.

In one possible implementation, the local image module is further configured to perform screenshot processing of multiple scales on an image to be processed to obtain multiple local images, wherein the multiple local images include a first local image of a reference size and a second local image of a size larger than the reference size, and the image centers of the multiple local images are the same.

In a possible implementation manner, the feature extraction module is further configured to perform feature extraction processing on the plurality of local images respectively to obtain a first feature map corresponding to the first local image and a second feature map corresponding to the second local image.

In a possible implementation manner, the segmentation module is further configured to perform superposition processing on the first feature map and the second feature map to obtain a third feature map, and perform activation processing on the third feature map to obtain a segmentation result of a target region in the image to be processed.

In one possible implementation manner, the feature extraction module is further configured to perform downsampling processing on the second local image to obtain a third local image with a reference size, and perform feature extraction processing on the first local image and the third local image to obtain a first feature map corresponding to the first local image and a second feature map corresponding to the second local image.

In one possible implementation manner, the segmentation module is further configured to perform upsampling processing on the second feature map to obtain a fourth feature map, wherein the ratio of the dimensions of the fourth feature map to the dimensions of the first feature map are the same as the ratio of the dimensions of the second partial image to the dimensions of the first partial image, perform cropping processing on the fourth feature map to obtain a fifth feature map, wherein the fifth feature map is consistent with the dimensions of the first feature map, and perform weighted summation processing on the first feature map and the fifth feature map to obtain the third feature map.

In one possible implementation, the image processing apparatus is executed by a neural network, where the neural network includes a plurality of feature extraction networks, an overlay network, and an activation layer, and the apparatus further includes a training module configured to train the plurality of feature extraction networks, the overlay network, and the activation layer through a sample image, to obtain a plurality of trained feature extraction networks, an overlay network, and an activation layer.

In one possible implementation, the training module is further configured to perform a multi-scale screenshot process on a sample image to obtain a fourth sample partial image with a reference size and a fifth sample partial image with a reference size, perform a downsampling process on the fifth sample partial image to obtain a sixth sample partial image with the reference size, input the fourth sample partial image and the sixth sample partial image into a corresponding feature extraction network respectively to perform a feature extraction process to obtain a third sample feature image corresponding to the fourth sample partial image and a fourth sample feature image corresponding to the sixth sample partial image, perform an upsampling process and a clipping process on the fourth sample feature image to obtain a fifth sample feature image, input the third sample feature image and the fifth sample feature image into the overlay network to obtain a sixth sample feature image, input the sixth sample feature image into the activation layer to obtain a second sample target area of the sample image, perform a feature extraction process on the second sample target area and the sixth sample partial image respectively, and obtain a fourth sample feature image corresponding to the sixth sample feature image, perform an upsampling process and a fifth sample feature image corresponding to the fourth sample feature image, input the fourth sample feature image corresponding to the fourth sample feature image, input the third sample feature image obtained to the fifth sample feature image, input the third sample feature image obtained by the third feature image is input to the activation layer, and the overlay feature layer is obtained according to the second feature loss, the overlay feature layer is obtained, the training network is obtained.

In a possible implementation, the training module is further configured to perform a weighted summation process on the third sample feature map and the fifth sample feature map through the overlay network to obtain the sixth sample feature map.

In a possible implementation manner, the training module is further configured to perform activation processing on the fourth sample feature map to obtain a third sample target area, determine a third network loss of a feature extraction network corresponding to the fourth sample feature map according to labeling information of the third sample target area and the sample image, and train the feature extraction network corresponding to the fourth sample feature map according to the third network loss.

According to another aspect of the present disclosure, there is provided an electronic device comprising a processor, a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.

According to another aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.

FIG. 1 shows a flow chart of an image processing method according to an embodiment of the present disclosure;

2A, 2B, 2C and 2D illustrate schematic diagrams of screenshot processing of an image to be processed according to an embodiment of the disclosure;

FIG. 3 shows a schematic application diagram of an image processing method according to an embodiment of the present disclosure;

fig. 4 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure;

FIG. 5 illustrates a block diagram of an electronic device according to an embodiment of the disclosure;

Fig. 6 shows a block diagram of an electronic device according to an embodiment of the disclosure.

Detailed Description

Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean that a exists alone, while a and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.

Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.

Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure, as shown in fig. 1, the method including:

in step S11, acquiring a local image of one or more scales based on the image to be processed;

In step S12, feature extraction processing is performed on the one or more partial images, so as to obtain feature graphs corresponding to the one or more scale partial images;

In step S13, the image to be processed is segmented based on the feature map corresponding to the local image with one or more scales, so as to obtain a segmentation result.

In a possible implementation manner, the image processing method may be performed by an electronic device such as a terminal device or a server, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal digital assistant (Personal DIGITAL ASSISTANT, PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like, and the method may be implemented by a processor invoking computer readable instructions stored in a memory. Or the method may be performed by a server.

In one possible implementation, the image to be processed may include a medical image, for example, a computed tomography angiography (Computed tomography angiography, CTA) image, and the target region includes a region where a target such as a coronary vessel is located, where the coronary vessel target is small, has a complex distribution, is associated with other vessels (such as a pulmonary vessel), and is susceptible to noise interference from the other vessels. It should be understood that the image to be processed may also be other images, such as a portrait image, a street view image, etc., and the object in the image to be processed may include a person's five sense organs, a pedestrian on a street, a vehicle, etc., and the type of the image to be processed and the type of the object are not limited by the present disclosure.

In one possible implementation, in step S11, a screenshot of one or more scales may be performed on the image to be processed, and a local image of one or more scales is obtained. Any scale of screenshot process can be performed on the image to be processed to obtain a local image of the scale, for example, a smaller scale screenshot can be performed to obtain a local image representing a detail feature, or a larger scale screenshot can be performed to obtain a local image representing a global feature.

In one possible implementation manner, the image to be processed may also be subjected to multi-scale screenshot to obtain a multi-scale local image, and step S11 may include performing multi-scale screenshot processing on the image to be processed to obtain a plurality of local images, where the plurality of local images include a first local image with a reference size and a second local image with a size larger than the reference size, and the image centers of the plurality of local images are the same.

In one possible implementation, the image to be processed may be subjected to a plurality of scale screenshot processes, obtaining a plurality of partial images of different sizes, including a first partial image of a reference size and a second partial image of a size greater than the reference size. In the screenshot, the image centers can be kept consistent, for example, the screenshot can be performed by taking the image center of the image to be processed as the center, and the image center of each obtained local image is consistent with the image center of the image to be processed. Or any pixel point of the image to be processed is taken as the image center of the local image, screenshot is carried out, and the obtained image center of each local image is taken as the pixel point. Among the plurality of partial images obtained, the second partial image includes more content than the first partial image. The present disclosure does not limit the selection of the reference size and the shape of the partial image.

Fig. 2A, 2B, 2C, and 2D illustrate schematic diagrams of screenshot processing of an image to be processed according to an embodiment of the disclosure. As shown in fig. 2A, the image to be processed may be subjected to a plurality of scale capturing processes, that is, a capturing process with a reference size as a capturing size is performed to obtain a first partial image x1 (as shown in fig. 2B) of the reference size, and a capturing process with a capturing size larger than the reference size is performed to obtain a second partial image x2 (as shown in fig. 2C) and a second partial image x3 (as shown in fig. 2D) larger than the reference size. The first partial image x1 and the second partial images x2 and x3 have the same image center, the second partial image x2 contains more content than the first partial image x1, and the second partial image x3 contains more content than the second partial image x 2. The first partial image x1 of the reference size contains finer local detail features (e.g. detail features of the coronary vessel itself) and the second partial images x2 and x3 contain more global distribution features (e.g. distribution of the coronary vessel and links to other vessels), e.g. links between the object in the first partial image x1 and other regions in the second partial image x2 or x3 (e.g. links between the coronary vessel in the first partial image x1 and vessels in other regions in the second partial image x2 or x 3).

In a possible implementation manner, in step S12, if the screenshot obtains a local image with one scale, feature extraction processing may be performed on the local image to obtain a feature map. If the local images with multiple scales are obtained, the feature extraction processing can be respectively carried out on the multiple local images, so as to obtain a feature map of each local image. Step S12 may include performing feature extraction processing on the plurality of partial images, respectively, to obtain a first feature map corresponding to the first partial image and a second feature map corresponding to the second partial image.

In an example, the feature extraction processing may be performed on each local image through a feature extraction network, which may be a deep learning neural network such as a convolutional neural network, and the type of the feature extraction network is not limited in the present disclosure.

In an example, the number of feature extraction networks may be the same as the number of partial images, i.e. one partial image is extracted per feature extraction network. For example, the first partial image x1 may be input to the feature extraction network 1 for feature extraction processing, the second partial image x2 may be input to the feature extraction network 2 for feature extraction processing, and the second partial image x3 may be input to the feature extraction network 3 for feature extraction processing.

In one possible implementation, the plurality of feature extraction networks may be the same feature extraction network, the sizes of the images input to the feature extraction network may be kept consistent, the reference size of the first partial image may be used as the input size of the input feature extraction network, and the second partial image larger than the reference size may be processed to reduce the size of the second partial image so as to meet the input requirement of the feature extraction network. In an example, other dimensions may be used as the input dimensions of the input feature extraction network, for example, a dimension may be preset as the input dimension, and partial images inconsistent with the input dimension may be processed to meet the input requirement of the feature extraction network. The present disclosure does not limit the choice of input dimensions.

In one possible implementation manner, when the reference size is taken as an input size, the feature extraction processing is respectively performed on the plurality of local images to obtain a first feature map corresponding to a first local image and a second feature map corresponding to a second local image, and the method can comprise the steps of performing downsampling processing on the second local image to obtain a third local image of the reference size, and performing feature extraction processing on the first local image and the third local image to obtain a first feature map corresponding to the first local image and a second feature map corresponding to the second local image.

In one possible implementation, the second partial image has a size greater than the reference size, i.e. greater than the input size, and the second partial image may be downsampled to obtain a third partial image of the reference size, i.e. the third partial image meets the input size requirement of the feature extraction network.

In one possible implementation manner, the first partial image and the third partial image may be subjected to feature extraction processing, and in an example, the corresponding feature extraction networks may be input to perform feature extraction processing, for example, the first partial image x1 may be input to the feature extraction network 1 to perform feature extraction processing, to obtain a first feature map, the third partial image corresponding to the second partial image x2 may be input to the feature extraction network 2 to perform feature extraction processing, to obtain a second feature map corresponding to x2, and the third partial image corresponding to the second partial image x3 may be input to the feature extraction network 3to perform feature extraction processing, to obtain a second feature map corresponding to x 3.

In one possible implementation, the first feature map may contain fine local detail features (e.g., detail features of the coronary vessel itself), e.g., detail features (e.g., shape, contour, etc.) of the object (e.g., coronary vessel) in the first local image x1, and the second feature map may have a larger receptive field, containing more global distribution features, e.g., distribution of coronary vessels in the second local image x2 or x3 and links with other vessels.

In one possible implementation manner, when the plurality of feature extraction networks are the same feature extraction network and the sizes of the first partial image and the third partial image that are input are the same, the feature images output by each feature extraction network, that is, the sizes of the first feature image and the second feature image are the same, but the second feature image is a processing result of the second partial image, so that the size of a corresponding region of the second feature image in the original image to be processed is larger than the size of a corresponding region of the first feature image in the original image to be processed, and the feature information included in the second feature image is different from that of the first feature image.

In one possible implementation manner, the image to be processed may be segmented according to the feature map including different feature information, so as to obtain a segmentation result, and step S13 may include performing superposition processing on the first feature map and the second feature map to obtain a third feature map, and performing activation processing on the third feature map to obtain a segmentation result of the target region in the image to be processed.

In one possible implementation manner, the stacking processing is performed on the first feature map and the second feature map to obtain a third feature map, which may include performing upsampling processing on the second feature map to obtain a fourth feature map, where a ratio of dimensions of the fourth feature map to the first feature map is the same as a ratio of dimensions of the second partial image to the first partial image, performing clipping processing on the fourth feature map to obtain a fifth feature map, where the fifth feature map is consistent with the first feature map in dimension, and performing weighted summation processing on the first feature map and the fifth feature map to obtain the third feature map.

In one possible implementation manner, the second feature map may be subjected to an upsampling process to obtain a fourth feature map, for example, the upsampling process may be performed by using an interpolation method or the like, and the method of the upsampling process is not limited in this disclosure.

In one possible implementation, the first partial image and the second partial image are both part of an image to be processed, the first partial image is smaller in size than the second partial image, and the first partial image is the same as the second partial image in center, i.e., the first partial image is part of the second partial image. The first feature map is a feature map of the first partial image, and the fourth feature map is a feature map of the second partial image, so that the ratio of the dimensions of the fourth feature map to the dimensions of the first feature map is the same as the ratio of the dimensions of the second partial image to the dimensions of the first partial image.

In an example, the size of the second partial image x2 is 8 times (three-dimensional image) that of the first partial image x1, the first partial image is a part of the second partial image (the first partial image x1 is consistent with the central area of the second partial image x2, and the length and width of the first partial image x1 are both one half of the second partial image x 2), then the ratio of the size of the fourth feature map corresponding to the second partial image x2 to that of the first feature map is also 8, and the length and width of the first feature map are both one half of the fourth feature map corresponding to the second partial image x2, and the central area of the fourth feature map corresponding to the second partial image x2 is the same as the corresponding area of the first partial image x1 in the image to be processed. In the example, the size of the second partial image x3 is 27 times that of the first partial image x1, the first partial image is a part of the second partial image (the first partial image x1 is consistent with the central area of the second partial image x3, and the length and width of the first partial image x1 are both one third of that of the second partial image x 3), then the ratio of the size of the fourth feature map corresponding to the second partial image x3 to that of the first feature map is 27, the length and width of the first feature map are both one third of that of the fourth feature map corresponding to the second partial image x3, and the central area of the fourth feature map corresponding to the second partial image x3 is the same as that of the first partial image x1 in the image to be processed.

In an example, the first feature map may be of a size consistent with the first partial image x1, the fourth feature map corresponding to the second partial image x2 may be of a size consistent with the second partial image x2, and the fourth feature map corresponding to the second partial image x3 may be of a size consistent with the second partial image x 3. The present disclosure does not limit the size of the first feature map and the fourth feature map.

In one possible implementation, the fourth feature map may be cropped to obtain a fifth feature map that is consistent with the first feature map in size. In an example, a central region of the fourth feature map (a region corresponding to the central region in the image to be processed is the same as a region corresponding to the first feature map in the image to be processed) may be reserved, and other regions may be cropped to obtain the fifth feature map. In an example, the fifth feature map corresponding to the second partial image x2 is a central region of the fourth feature map corresponding to the second partial image x2 (the corresponding region in the image to be processed is the same as the first partial image x 1), and the fifth feature map corresponding to the second partial image x3 is a central region of the fourth feature map corresponding to the second partial image x3 (the corresponding region in the image to be processed is the same as the first partial image x 1). The fifth feature map contains a part of the global features of the central region of the second partial image, for example, the global features of the second partial image x2 or x3 in the central region thereof, specifically, other regions than the central region in the fourth feature map are cropped, and only the central region (i.e., the fifth feature map) remains, so that the fifth feature map contains features of the central region of the fourth feature map, for example, the receptive field of the fifth feature map is larger than the first feature map, and can contain, for example, distribution information of coronary vessels in the region corresponding to x1 of the image to be processed, which is advantageous for determining the distribution of coronary vessels, and reducing noise interference of other vessels (for example, pulmonary vessels). That is, the fifth feature map and the first feature map are both feature maps of the region corresponding to x1, but because the parameters (such as the weight and the receptive field) of each feature extraction network are different, the features of the fifth feature map are different from those of the first feature map, so that the feature information amount of the region corresponding to x1 can be increased in the above manner, a richer basis is provided for the segmentation process, and the accuracy of the segmentation process is improved.

In one possible implementation, the first feature map and the fifth feature map may be subjected to a weighted summation process to obtain the third feature map. For example, a pixel-by-pixel weighted summation may be performed on the first and fifth feature maps. In an example, the weight of the first feature map is α, the weight of the fifth feature map corresponding to the second partial image x2 is β, the weight of the fifth feature map corresponding to the second partial image x3 is γ, and according to the weights, the first feature map, the fifth feature map corresponding to the second partial image x2, and the fifth feature map corresponding to the second partial image x3 may be subjected to pixel-by-pixel weighted summation, so as to obtain the third feature map. The third feature map contains both the detailed features of the first feature map and the global features of the fifth feature map.

In an example, the weighted summation process may be performed by a superposition network, which may be a deep learning neural network such as a convolutional neural network, the present disclosure is not limited to the type of superposition network.

In one possible implementation, the third feature map may be subjected to an activation process, for example, the activation process may be performed according to feature information included in the third feature map, to obtain a segmentation result of a target region where a target (for example, a coronary vessel) is located, for example, the third feature map may be subjected to an activation process by a softmax activation function to obtain a probability map, and may be subjected to an activation process by another activation function, for example, a RELU activation function, or the like, which is not limited in the present disclosure. In the probability map, the probability of each pixel point represents the probability of the pixel point in the target area (for example, in the case where the probability is greater than or equal to the probability threshold (for example, 50%), the pixel point may be considered to be in the target area), and the position of the target area where the target is located may be determined based on the probability of each pixel point in the probability map. In an example, the segmentation process may be performed by an activation layer. In an example, the target region may be a region where a target in a region corresponding to x1 in the image to be processed is located, for example, a coronary vessel in the region corresponding to x1 may be segmented.

In one possible implementation manner, the image processing method is implemented through a neural network, where the neural network includes a plurality of feature extraction networks, an overlay network, and an activation layer, for example, feature extraction processing may be performed on a plurality of local images through the feature extraction networks, overlay processing may be performed on a first feature map and a fifth feature map through the overlay network, and segmentation processing may be performed on a third feature map through the activation layer. The neural network may be trained prior to performing the above image processing method using the neural network.

In one possible implementation, the method further comprises training the plurality of feature extraction networks, the overlay network, and the activation layer through the sample image to obtain a plurality of trained feature extraction networks, overlay networks, and activation layers.

In one possible implementation, the sample image may comprise a medical image, e.g., a CTA image, and the sample image may comprise labeling information for a target (e.g., a coronary vessel).

In one possible implementation, the feature extraction network may be pre-trained. A plurality of scale screenshot processes may be performed on the sample image to obtain a first sample partial image of the reference size and a second sample partial image of a size greater than the reference size. The total number of the first sample partial images and the second sample partial images is the same as the number of the feature extraction networks. In an example, the image centers of the first sample partial image and the second sample partial image are the same, and the region taken by the first sample partial image is the center region of the second sample partial image. For example, the sample image may be captured to obtain a first sample partial image y1 of the reference size, a second sample partial image y2 of a size larger than the reference size, and a second sample partial image y3 of a size larger than the second sample partial image y 2.

In one possible implementation, the second sample partial image may be subjected to a downsampling process to obtain a third sample partial image of the reference size. The plurality of feature extraction networks may be the same feature extraction network, and the sizes of the images input to the feature extraction network may be kept uniform, for example, a reference size of the first sample partial image may be used as the input size of the input feature extraction network, and a downsampling process may be performed on the second sample partial image larger than the reference size to reduce the size of the second sample partial image, thereby obtaining a third sample partial image, so that the third sample partial image meets the input requirement of the feature extraction network, that is, the size of the third sample partial image is equal to the reference size. In an example, the second sample partial images y2 and y3 may be respectively subjected to downsampling processing to obtain third sample partial images of the reference size.

In one possible implementation manner, the first sample partial image and the third sample partial image may be respectively input into a corresponding feature extraction network to perform feature extraction processing, so as to obtain a first sample feature map corresponding to the first sample partial image and a second sample feature map corresponding to the second sample partial image. In an example, the first sample partial image y1 may be input to the feature extraction network 1 to perform feature extraction processing to obtain a first sample feature map, the third sample partial image corresponding to the second sample partial image y2 may be input to the feature extraction network 2 to perform feature extraction processing to obtain a second sample feature map corresponding to y2, and the third sample partial image corresponding to the second sample partial image y3 may be input to the feature extraction network 3 to perform feature extraction processing to obtain a second sample feature map corresponding to y 3.

In one possible implementation manner, the first sample feature map and the second sample feature map may be respectively subjected to an activation process, for example, by using a softmax function, to obtain first sample target areas corresponding to the plurality of feature extraction networks respectively. In an example, the activation processing may be performed on the first sample feature map, the second sample feature map corresponding to y2, and the second sample feature map corresponding to y3, respectively, to obtain the first sample target region in the first sample partial image y1, the first sample target region in the second sample partial image y2, and the first sample target region in the second sample partial image y3, respectively. That is, the sample target region is determined in the plurality of sample partial images by the feature extraction process and the activation process of the feature extraction network, respectively, and there may be an error in the sample target region.

In one possible implementation, the first network loss of the plurality of feature extraction networks may be determined according to the labeling information of the sample image and the first sample target area, respectively. In an example, the first network loss of the feature extraction network 1 may be determined from the labeling information of the region corresponding to y1 in the sample image and the first sample target region in the first sample partial image y 1. The first network loss of the feature extraction network 2 may be determined from the labeling information of the region corresponding to y2 in the sample image and the first sample target region in the second sample partial image y 2. The first network loss of the feature extraction network 3 may be determined from the labeling information of the region corresponding to y3 in the sample image and the first sample target region in the second sample partial image y 3. In an example, the first network loss may include a cross entropy loss and a collective similarity loss (dice loss), the cross entropy loss and the collective similarity loss of each feature extraction network may be determined according to the first sample target area and corresponding labeling information, and the cross entropy loss and the collective similarity loss may be weighted and summed to obtain the first network loss of each feature extraction network, where the determining manner of the first network loss is not limited in the disclosure.

In one possible implementation, the plurality of feature extraction networks may be trained based on the first network loss to obtain a plurality of pre-trained feature extraction networks. In an example, each feature extraction network may be trained based on a first network loss of each feature extraction network, e.g., feature extraction network 1 may be trained based on a first network loss of feature extraction network 1, feature extraction network 2 may be trained based on a first network loss of feature extraction network 2, and feature extraction network 3 may be trained based on a first network loss of feature extraction network 3. In an example, the network parameters of the feature extraction network may be adjusted by back-propagating the first network loss through a gradient descent method, which may be iteratively performed a plurality of times until a first training condition is satisfied, which may include a number of training times, i.e., the first training condition is satisfied when the number of training times is greater than or equal to a preset number, which may include a magnitude or convergence of the first network loss, e.g., the first network loss is less than or equal to a preset threshold, or the first training condition is satisfied when the first network loss converges to a preset interval, which is not limited by the present disclosure. In case the first training condition is fulfilled, a pre-trained feature extraction network may be obtained.

In one possible implementation, the neural network may be trained based on a pre-trained feature extraction network.

In one possible implementation, the sample image may be subjected to a multi-scale screenshot process to obtain a fourth sample partial image of the reference size, and a fifth sample partial image that is larger than the reference size. The total number of the fourth sample partial images and the fifth sample partial images is the same as the number of the feature extraction networks. In an example, the image centers of the fourth sample partial image and the fifth sample partial image are the same, and the region taken by the fourth sample partial image is the center region of the fifth sample partial image. For example, the sample image may be captured to obtain a fourth sample partial image z1 of the reference size, a fifth sample partial image z2 of a size greater than the reference size, and a fifth sample partial image z3 of a size greater than the fifth sample partial image z 2.

In one possible implementation, the fifth sample partial image may be subjected to a downsampling process to obtain a sixth sample partial image of a reference size. The plurality of feature extraction networks may be feature extraction networks of the same structure, and the sizes of the images input to the feature extraction networks may be kept uniform, for example, a reference size of the fourth sample partial image may be used as an input size of the input feature extraction network, and a downsampling process may be performed on the fifth sample partial image larger than the reference size to reduce the size of the fifth sample partial image, thereby obtaining a sixth sample partial image, which satisfies the input requirement of the feature extraction network, that is, the size of the sixth sample partial image is equal to the reference size. In an example, the fifth sample partial images z2 and z3 may be respectively subjected to downsampling processing to obtain sixth sample partial images of the reference size.

In one possible implementation manner, the fourth sample partial image and the sixth sample partial image may be respectively input into a corresponding feature extraction network to perform feature extraction processing, so as to obtain a third sample feature map corresponding to the fourth sample partial image and a fourth sample feature map corresponding to the sixth sample partial image. In an example, the fourth sample partial image z1 may be input to the feature extraction network 1 to perform feature extraction processing to obtain a third sample feature map, the sixth sample partial image corresponding to the fifth sample partial image z2 may be input to the feature extraction network 2 to perform feature extraction processing to obtain a fourth sample feature map corresponding to z2, and the sixth sample partial image corresponding to the fifth sample partial image z3 may be input to the feature extraction network 3 to perform feature extraction processing to obtain a fourth sample feature map corresponding to z 3.

In one possible implementation, the fourth sample feature map may be subjected to an upsampling process and a clipping process to obtain a fifth sample feature map. In an example, the plurality of feature extraction networks are the same feature extraction network, and the size of the fourth sample partial image and the sixth sample partial image that are input are the same, and therefore, the obtained third sample feature map and fourth sample feature map are the same in size. The fourth sample feature map may be up-sampled such that a ratio of dimensions of the up-sampled fourth sample feature map and the third sample feature map is the same as a ratio of the fifth sample partial image and the fourth sample partial image. For example, the third sample feature map and the fourth sample feature map may be consistent with a fourth sample partial image size, and the up-sampled fourth sample feature map may be consistent with a fifth sample partial image size.

In an example, a central region of the fifth sample feature map may be reserved, and other regions may be cropped to obtain a fifth sample feature map, where a region of the fifth sample feature map corresponding to a region of the third sample feature map corresponding to the sample image, for example, each region of the fourth sample partial image z1 corresponding to the sample image.

In one possible implementation, the third sample feature map and the fifth sample feature map may be input into the overlay network to obtain a sixth sample feature map. The third sample characteristic diagram and the fifth sample characteristic diagram are input into the superposition network to obtain a sixth sample characteristic diagram, and the method comprises the step of carrying out weighted summation processing on the third sample characteristic diagram and the fifth sample characteristic diagram through the superposition network to obtain the sixth sample characteristic diagram.

In an example, the overlay network may perform a pixel-by-pixel weighted summation of the three sample feature map and the fifth sample feature map. In an example, the third sample feature map has a weight of α, the fifth sample feature map corresponding to the fifth sample partial image z2 has a weight of β, the fifth sample feature map corresponding to the fifth sample partial image z3 has a weight of γ, and α, β, and γ may be network parameters of the overlay network, and the value of the weight may be determined by training the overlay network. After the superimposed network processing, a sixth sample feature map may be obtained, where the sixth sample feature map may include more feature information.

In one possible implementation, the sixth sample feature map may be input to the activation layer to obtain a second sample target region of the sample image. In an example, the activation layer may perform activation processing on the sixth sample feature map through a softmax function to obtain a target region where a target (e.g., a coronary vessel) is located. The target region may be a region where a target in a corresponding region of the fourth sample partial image z1 in the sample image is located, for example, a region where a coronary vessel in the corresponding region of z1 in the sample image is located, that is, a second sample target region may be segmented. The second sample target region may be subject to error.

In one possible implementation, a second network loss of the neural network may be determined from the second sample target region and the labeling information of the sample image. The second network loss may be determined according to labeling information of the sample image (for example, labeling information of z1 in a corresponding region in the sample image) and a second sample target region obtained by the neural network, and in an example, the second network loss may include a cross entropy loss and a collective similarity loss (dice), cross entropy loss and collective similarity loss of each neural network may be determined according to the second sample target region and the labeling information, and the cross entropy loss and the collective similarity loss may be weighted and summed to obtain the second network loss of the neural network, where a determination manner of the second network loss is not limited in the present disclosure.

In one possible implementation, the plurality of feature extraction networks, the overlay network, and the activation layer may be trained based on the second network loss, to obtain a plurality of trained feature extraction networks, overlay networks, and activation layers. In an example, the network parameters of the neural network may be adjusted by back-propagating the second network loss by a gradient descent method, which may be iteratively performed a plurality of times until a second training condition is satisfied, which may include a number of training times, i.e., the second training condition is satisfied when the number of training times is greater than or equal to a preset number, which may include a magnitude or convergence of the second network loss, e.g., the second network loss is less than or equal to a preset threshold, or the second training condition is satisfied when the second network loss converges to a preset interval, which is not limited by the present disclosure. After the second training condition is met, a plurality of trained feature extraction networks, superposition networks and an activation layer can be obtained, and the method can be used for dividing targets in images.

In one possible implementation manner, the parameters of the feature extraction network can be further adjusted, so that the accuracy of the neural network is further improved. The method comprises the steps of training the feature extraction networks, the superposition network and the activation layer through sample images to obtain a plurality of trained feature extraction networks, the superposition network and the activation layer, activating the fourth sample feature images to obtain a third sample target area, determining third network loss of the feature extraction network corresponding to the fourth sample feature images according to labeling information of the third sample target area and the sample images, and training the feature extraction network corresponding to the fourth sample feature images according to the third network loss.

In an example, the fourth sample feature map may be subjected to an activation process (e.g., an activation process may be performed by a softmax function) prior to the upsampling and cropping process of the fourth sample feature map to obtain the third sample target region. For example, the third sample target region is predicted by the fourth sample feature map obtained by the feature extraction network 2 and the feature extraction network 3. In an example, the feature extraction network 2 may predict a third sample target region in the fifth sample partial image z2 and the feature extraction network 3 may predict a third sample target region in the fifth sample partial image z 3.

In an example, the third network loss of the feature extraction network 2 may be obtained using the labeling information of the corresponding region of z2 in the sample image and the third sample target region in the fifth sample partial image z2, and the third network loss of the feature extraction network 3 may be obtained using the labeling information of the corresponding region of z3 in the sample image and the third sample target region in the fifth sample partial image z3, which in an example may include a cross entropy loss and a set similarity loss, which is not limited by the present disclosure.

In an example, the third network penalty of each feature extraction network may be utilized to train each feature extraction network to continue adjusting parameters of the feature extraction network to improve the accuracy of the neural network using the sample target region predicted in the larger partial images (e.g., z2 and z 3) and the labeling information.

After the training is completed, the trained neural network may be used in an image segmentation process to segment a target region in an image (e.g., a CTA image).

According to the image processing method of the embodiment of the disclosure, the screenshot processing of a plurality of scales is carried out on the image to be processed, and the local images with different sizes can all meet the input requirements of the feature extraction network through downsampling, so that the feature images with the plurality of scales are obtained. The sizes of the plurality of feature maps can be unified through up-sampling and clipping processing, and weighted average is carried out so as to fuse the plurality of feature maps, so that the obtained third feature map contains more feature information, is favorable for determining the details of the target, is favorable for determining the distribution of the target, and reduces noise interference. Further, the neural network can be trained through the pre-trained feature extraction network and the sample image, so that the training precision of the neural network is improved. And the superimposed weight can be obtained by training the superimposed network, so that proper weight parameters can be selected in the training process, the effect of feature fusion is improved, detail features and global features are optimized, and the feature extraction network is further trained by utilizing the prediction information in a larger local area, thereby being beneficial to improving the accuracy of the neural network and the accuracy of segmentation processing.

Fig. 3 illustrates an application diagram of an image processing method according to an embodiment of the present disclosure, and as illustrated in fig. 3, a neural network for image segmentation may include a feature extraction network 1, a feature extraction network 2, a feature extraction network 3, an overlay network, and an activation layer.

In one possible implementation, the neural network described above may be trained through a sample image, which may include a medical image, e.g., a CTA image, which may include labeling information for a target (e.g., coronary blood vessel). The sample image may be preprocessed, for example, the sample image may be subjected to preprocessing such as resampling and normalization, and the preprocessed sample image may be subjected to screenshot processing to obtain a local image x1 with a reference size, a local image x2 with a size larger than the reference size, and a local image x3 with a size larger than x2, and the screenshot mode refers to fig. 3, three local images with the same image center but different sizes, that is, x1, x2, and x3, may be intercepted in the sample image, further, the local images x2 and x3 may be downsampled, the sizes thereof may be reduced to the reference size, and the local images x1, the x2 with the reference size, and the x3 with the reference size may be respectively input into the feature extraction network 1, the feature extraction network 2, and the feature extraction network 3 for training (for example, training is performed according to the output result of each feature extraction network and the labeling information of the sample image), and the pre-trained feature extraction network 1, the feature extraction network 2, and the feature extraction network 3 may be obtained.

In one possible implementation, the neural network may be trained continuously through the pre-trained feature extraction network 1, feature extraction network 2, and feature extraction network 3, as well as the sample images (which may be the same as the sample images described above, or other sample images). In an example, the sample image may be preprocessed and screenshot processed to obtain a partial image x1 of a reference size, a partial image x2 of a size larger than the reference size, and a partial image x3 of a size larger than x 2. The partial images x2, x3 may be downsampled, the size thereof may be reduced to a reference size, and the partial image x1, the reference size x2, and the reference size x3 may be input to the feature extraction network 1, the feature extraction network 2, and the feature extraction network 3, respectively, the feature extraction network 1 may obtain a feature map corresponding to the partial image x1, the feature extraction network 2 may obtain a feature map corresponding to the partial image x2, and the feature extraction network 3 may obtain a feature map corresponding to the partial image x3.

In one possible implementation, the feature map corresponding to the partial image x2 and the feature map corresponding to the partial image x3 may be upsampled and the center region corresponding to the partial image x1 may be cropped. Further, the feature map corresponding to the partial image x1 may be subjected to a superimposition process with the two clipped feature maps, for example, the three feature maps may be respectively weighted and summed with weights α, β, and γ through a superimposition network, and the superimposition result may be input to an activation layer to be subjected to an activation process, so as to obtain a second sample target region, that is, a target region predicted by the partial image x1 in a corresponding region in the sample image. The network loss of the neural network can be determined according to the labeling information of the target area and the sample image, and the neural network can be trained by utilizing the network loss.

In a possible implementation manner, the feature graphs output by the feature extraction network 2 and the feature extraction network 3 may be respectively subjected to activation processing, so as to respectively obtain a third sample target area, that is, a target area predicted by x2 in a corresponding area in the sample image and a target area predicted by x3 in a corresponding area in the sample image, further, the network losses of the feature extraction network 2 and the feature extraction network 3 may be respectively determined by using labeling information of the third sample target area and the sample image, and the feature extraction network 2 and the feature extraction network 3 may be trained.

In one possible implementation manner, after the training process is completed, the neural network may be used to perform segmentation processing on the image to be processed, for example, the image to be processed may be preprocessed, and screenshot processing is performed on the image to be processed, so as to obtain a local image x1 of a reference size, a local image x2 of a size greater than the reference size, and a local image x3 of a size greater than x 2. The partial images x2, x3 may be downsampled to a reference size, and the partial image x1, the reference size x2, and the reference size x3 may be input to the feature extraction network 1, the feature extraction network 2, and the feature extraction network 3, respectively. Further, after up-sampling and clipping the feature graphs output from the feature extraction network 2 and the feature extraction network 3, the feature graphs are superimposed with the feature graphs output from the feature extraction network 1, that is, input into a superimposing network. The superposition result output by the superposition network can be input into the activation layer for activation processing, and a target region where a target is located in a corresponding region of the x1 image to be processed can be obtained, for example, the image to be processed is a computed tomography angiography (Computed tomography angiography, CTA) image, and the neural network can divide a region where a coronary vessel is located in the corresponding region of the x1 image to be processed.

In one possible implementation manner, the image processing method may be used in the segmentation processing of the coronary blood vessel, and may utilize feature information of multiple scales to improve the segmentation accuracy, obtain the region where the coronary blood vessel is located, and provide a basis for subsequent diagnosis (for example, diagnosis of plaque, blood vessel blockage, stenosis, etc. in the blood vessel). The image processing method can also be used in image segmentation processing in other fields, for example, the image processing method can be used in the fields of portrait segmentation, object segmentation and the like, and the application field of the image processing method is not limited in the disclosure.

Fig. 4 shows a block diagram of an image processing apparatus according to an embodiment of the disclosure, as shown in fig. 4, the apparatus includes a local image module 11 configured to acquire one or more scale local images based on an image to be processed, a feature extraction module 12 configured to perform feature extraction processing on the one or more local images, respectively, to obtain feature maps corresponding to the one or more scale local images, and a segmentation module 13 configured to segment the image to be processed based on the feature maps corresponding to the one or more scale local images, to obtain a segmentation result.

It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form a combined embodiment without departing from the principle logic, and are limited to the description of the present disclosure. It will be appreciated by those skilled in the art that in the above-described methods of the embodiments, the particular order of execution of the steps should be determined by their function and possible inherent logic.

In addition, the disclosure further provides an image processing apparatus, an electronic device, a computer readable storage medium, and a program, where the foregoing may be used to implement any one of the image processing methods provided in the disclosure, and corresponding technical schemes and descriptions and corresponding descriptions referring to method parts are not repeated.

In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.

The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a non-volatile computer readable storage medium.

The embodiment of the disclosure also provides electronic equipment, which comprises a processor and a memory for storing instructions executable by the processor, wherein the processor is configured to call the instructions stored by the memory so as to execute the method.

The disclosed embodiments also provide a computer program product comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the image processing method as provided in any of the embodiments above.

The disclosed embodiments also provide another computer program product for storing computer readable instructions that, when executed, cause a computer to perform the operations of the image processing method provided in any of the above embodiments.

The electronic device may be provided as a terminal, server or other form of device.

Fig. 5 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure. For example, electronic device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.

Referring to FIG. 5, the electronic device 800 can include one or more of a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen between the electronic device 800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only an edge of a touch or slide action, but also a duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to, a home button, a volume button, an activate button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800, a relative positioning of the components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of a user's contact with the electronic device 800, an orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communication between the electronic device 800 and other devices, either wired or wireless. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi,2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including computer program instructions executable by processor 820 of electronic device 800 to perform the above-described methods.

Fig. 6 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to FIG. 6, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server ^TM,Mac OS X^TM,Unix^TM,Linux^TM,FreeBSD^TM or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.

The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, punch cards or intra-groove protrusion structures such as those having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.

The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as SMALLTALK, C ++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.

The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. An image processing method, comprising:

acquiring local images of a plurality of scales based on the image to be processed;

respectively carrying out feature extraction processing on a plurality of local images to obtain feature images corresponding to the local images with a plurality of scales;

Dividing the image to be processed based on the feature images corresponding to the local images with the multiple scales to obtain a division result;

the step of respectively performing feature extraction processing on the plurality of local images to obtain feature graphs corresponding to the local images with the plurality of scales comprises the following steps:

Respectively carrying out feature extraction processing on the plurality of partial images to obtain a first feature image corresponding to the first partial image and a second feature image corresponding to the second partial image;

The segmentation of the image to be processed based on the feature images corresponding to the local images with multiple scales to obtain a segmentation result comprises the following steps:

performing up-sampling processing on the second feature map to obtain a fourth feature map, wherein the ratio of the sizes of the fourth feature map to the first feature map is the same as the ratio of the sizes of the second partial image to the first partial image;

carrying out weighted summation processing on the first characteristic diagram and the fifth characteristic diagram to obtain a third characteristic diagram;

And activating the third feature map to obtain a segmentation result of the target region in the image to be processed.

2. The image processing method according to claim 1, wherein the acquiring a local image of a plurality of scales based on the image to be processed includes:

And performing screenshot processing of a plurality of scales on the image to be processed to obtain a plurality of local images, wherein the local images comprise a first local image with a reference size and a second local image with a larger reference size, and the image centers of the local images are the same.

3. The method according to claim 1, wherein performing feature extraction processing on the plurality of partial images, respectively, to obtain a first feature map corresponding to a first partial image and a second feature map corresponding to a second partial image, comprises:

performing downsampling processing on the second partial image to obtain a third partial image with a reference size;

And respectively carrying out feature extraction processing on the first partial image and the third partial image to obtain a first feature map corresponding to the first partial image and a second feature map corresponding to the second partial image.

4. The method of claim 1, wherein the image processing method is implemented by a neural network comprising a plurality of feature extraction networks, an overlay network, and an activation layer,

Wherein the method further comprises:

and training the plurality of feature extraction networks, the superposition network and the activation layer through the sample image to obtain a plurality of trained feature extraction networks, superposition networks and activation layers.

5. The method of claim 4, wherein training the plurality of feature extraction networks, overlay networks, and activation layers through the sample image to obtain a trained plurality of feature extraction networks, overlay networks, and activation layers, comprises:

Performing screenshot processing on the sample image in multiple scales to obtain a fourth sample local image with a reference size and a fifth sample local image with a size larger than the reference size;

Downsampling the fifth sample partial image to obtain a sixth sample partial image with a reference size;

respectively inputting the fourth sample local image and the sixth sample local image into a corresponding feature extraction network to perform feature extraction processing, and obtaining a third sample feature map corresponding to the fourth sample local image and a fourth sample feature map corresponding to the sixth sample local image;

Performing up-sampling processing and clipping processing on the fourth sample feature map to obtain a fifth sample feature map;

inputting the third sample feature map and the fifth sample feature map into the overlay network to obtain a sixth sample feature map;

inputting the sixth sample feature map into the activation layer for activation processing to obtain a second sample target area of the sample image;

Determining a second network loss of the neural network according to the second sample target area and the labeling information of the sample image;

And training the plurality of feature extraction networks, the superposition network and the activation layer according to the second network loss to obtain the plurality of trained feature extraction networks, the superposition network and the activation layer.

6. The method of claim 5, wherein inputting the third sample feature map and the fifth sample feature map into the overlay network, obtaining a sixth sample feature map, comprises:

and carrying out weighted summation processing on the third sample characteristic diagram and the fifth sample characteristic diagram through the superposition network to obtain the sixth sample characteristic diagram.

7. The method of claim 5, wherein training the plurality of feature extraction networks, overlay networks, and activation layers with the sample image results in a trained plurality of feature extraction networks, overlay networks, and activation layers, further comprising:

Activating the fourth sample feature map to obtain a third sample target area;

Determining a third network loss of the feature extraction network corresponding to the fourth sample feature map according to the labeling information of the third sample target region and the sample image;

and training a feature extraction network corresponding to the fourth sample feature map according to the third network loss.

8. An image processing apparatus, comprising:

The local image module is used for acquiring local images of multiple scales based on the image to be processed;

The feature extraction module is used for respectively carrying out feature extraction processing on the plurality of local images to obtain feature images corresponding to the local images with the plurality of scales;

The segmentation module is used for segmenting the image to be processed based on the feature images corresponding to the local images with the multiple scales to obtain segmentation results;

The feature extraction module is further configured to perform feature extraction processing on the plurality of partial images respectively to obtain a first feature map corresponding to the first partial image and a second feature map corresponding to the second partial image;

The segmentation module is further configured to perform up-sampling processing on the second feature map to obtain a fourth feature map, wherein the ratio of the sizes of the fourth feature map to the first feature map is the same as the ratio of the sizes of the second partial image to the first partial image, perform clipping processing on the fourth feature map to obtain a fifth feature map, the size of the fifth feature map is consistent with that of the first feature map, perform weighted summation processing on the first feature map and the fifth feature map to obtain a third feature map, and perform activation processing on the third feature map to obtain a segmentation result of a target region in the image to be processed.

9. An electronic device, comprising:

A processor;

A memory for storing processor-executable instructions;

Wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 7.