CN109815965B - Image filtering method and device and storage medium - Google Patents
- Publication number
- CN109815965B CN109815965B CN201910112755.3A CN201910112755A CN109815965B CN 109815965 B CN109815965 B CN 109815965B CN 201910112755 A CN201910112755 A CN 201910112755A CN 109815965 B CN109815965 B CN 109815965B
- Authority
- CN
- China
- Prior art keywords
- convolution
- image
- network
- sub
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Embodiments of the present application disclose an image filtering method, an image filtering device, and a storage medium. In the embodiments, tissue images of multiple modalities of a target tissue are acquired, and a quality control network model to be used is determined, the quality control network model comprising a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence. The tissue image features of the multiple modalities are fused based on the cross-channel convolution sub-network to obtain a first fusion feature; dense feature extraction is performed on the first fusion feature based on the convolution sub-network to obtain a second fusion feature; the second fusion feature is classified based on the classification sub-network to obtain a quality identification result of the tissue images; and the tissue images of the multiple modalities are filtered according to the quality identification result to obtain filtered tissue images.
Description
Technical Field
The present application relates to the field of image recognition, and in particular, to an image filtering method, an image filtering device, and a storage medium.
Background
Medical imaging is the non-invasive acquisition of images of the internal tissue of a human body or a part thereof. Medical images are used in medical treatment and medical research. For example, a medical image may be an Optical Coherence Tomography (OCT) image. OCT images offer higher definition for observing the fundus structure than other examination methods, and are therefore effective for diagnosing macular hole, central serous chorioretinopathy, cystoid macular edema, and the like.
Currently, medical images are mainly filtered manually: images of unsatisfactory quality are screened out by human inspection. Because this filtering depends on manual selection, the efficiency and accuracy of image filtering are low.
Disclosure of Invention
In view of this, embodiments of the present application provide an image filtering method, an image filtering apparatus, and a storage medium, which can improve efficiency and accuracy of image filtering.
In a first aspect, an embodiment of the present application provides an image filtering method, including:
acquiring tissue images of a plurality of modalities of a target tissue;
determining a quality control network model to be used, the quality control network model comprising a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence;
fusing the tissue image features of the multiple modalities based on the cross-channel convolution sub-network to obtain a first fusion feature;
performing dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature;
classifying the second fusion feature based on the classification sub-network to obtain a quality identification result of the tissue images;
and filtering the tissue images of the plurality of modalities according to the quality identification result to obtain a filtered tissue image.
In a second aspect, an embodiment of the present application provides an image filtering apparatus, including:
an acquisition module for acquiring tissue images of a plurality of modalities of a target tissue;
a determination module for determining a quality control network model to be used, the quality control network model comprising a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence;
a cross-channel convolution module for fusing the tissue image features of the plurality of modalities based on the cross-channel convolution sub-network to obtain a first fusion feature;
a convolution module for performing dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature;
a classification module for classifying the second fusion feature based on the classification sub-network to obtain a quality identification result of the tissue images;
and a filtering module for filtering the tissue images of the plurality of modalities according to the quality identification result to obtain a filtered tissue image.
In a third aspect, a storage medium is provided in this application, and a computer program is stored thereon, and when the computer program runs on a computer, the computer is caused to execute the image filtering method provided in any embodiment of this application.
In the embodiments of the present application, tissue images of multiple modalities of a target tissue are obtained, and a quality control network model to be used is determined, the quality control network model comprising a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence. The tissue image features of the multiple modalities are fused based on the cross-channel convolution sub-network to obtain a first fusion feature; dense feature extraction is performed on the first fusion feature based on the convolution sub-network to obtain a second fusion feature; the second fusion feature is classified based on the classification sub-network to obtain a quality identification result of the tissue images; and the tissue images of the multiple modalities are filtered according to the quality identification result to obtain filtered tissue images, thereby improving the efficiency and accuracy of image filtering.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic view of an application scenario of an image filtering method according to an embodiment of the present application.
Fig. 2 is a first flowchart of an image filtering method according to an embodiment of the present application.
Fig. 3 is a second flowchart of the image filtering method according to the embodiment of the present application.
Fig. 4 is a third flow chart of the image filtering method according to the embodiment of the present application.
Fig. 5 is a schematic diagram of a fundus image and an OCT image provided in an embodiment of the present application.
Fig. 6 is a schematic view of the Inception A structure provided in an embodiment of the present application.
Fig. 7 is a schematic view of the Inception B structure provided in an embodiment of the present application.
Fig. 8 is a schematic view of the Inception C structure provided in an embodiment of the present application.
Fig. 9 is a schematic view of the Inception V4 network structure provided in an embodiment of the present application.
Fig. 10 is a schematic view of a stem structure provided in an embodiment of the present application.
Fig. 11 is a schematic diagram of a reduction structure provided in an embodiment of the present application.
Fig. 12 is a schematic diagram of a DenseNet network structure provided in an embodiment of the present application.
Fig. 13 is a schematic diagram of a hard sample boosting sampler provided by an embodiment of the present application.
Fig. 14 is a schematic structural diagram of a quality control network model provided in an embodiment of the present application.
Fig. 15 is a schematic diagram of cross-channel convolution according to an embodiment of the present application.
Fig. 16 is a schematic diagram of dense connections provided by an embodiment of the present application.
Fig. 17 is a schematic structural diagram of Inception A with added shortcut connections according to an embodiment of the present application.
Fig. 18 is a schematic structural diagram of Inception B with added shortcut connections according to an embodiment of the present application.
Fig. 19 is a schematic structural diagram of Inception C with added shortcut connections according to an embodiment of the present application.
Fig. 20 is a schematic structural diagram of an image scanning type identification network model provided in an embodiment of the present application.
Fig. 21 is a schematic diagram of experimental results provided in the examples of the present application.
Fig. 22 is a schematic view of a first structure of an image filtering apparatus according to an embodiment of the present application.
Fig. 23 is a second structural schematic diagram of an image filtering apparatus according to an embodiment of the present application.
Fig. 24 is a schematic diagram of a network device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
In the description that follows, specific embodiments of the present application are described with reference to steps and symbols executed by one or more computers, unless otherwise indicated. These steps and operations are at times referred to as being computer-executed: a processing unit of the computer manipulates electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which may be reconfigured or otherwise altered in a manner well known to those skilled in the art. The data maintains a data structure, i.e., a physical location in memory with particular characteristics defined by the data format. However, while the principles of the application are described in these terms, this is not meant to be limiting; those of ordinary skill in the art will recognize that various steps and operations described below may also be implemented in hardware.
The term "module" as used herein may be considered a software object executing on the computing system. The different components, modules, engines, and services described herein may be considered as implementation objects on the computing system. The apparatus and method described herein may be implemented in software, but may also be implemented in hardware, and are within the scope of the present application.
The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
An execution subject of the image filtering method may be the image filtering apparatus provided in the embodiments of the present application, or a network device integrated with the image filtering apparatus, where the image filtering apparatus may be implemented in hardware or software. The network device may be a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of the image filtering method according to an embodiment of the present application, taking as an example an image filtering apparatus integrated in a network device. The network device may obtain tissue images of multiple modalities of a target tissue and determine a quality control network model to be used, the quality control network model comprising a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence; fuse the tissue image features of the multiple modalities based on the cross-channel convolution sub-network to obtain a first fusion feature; perform dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature; classify the second fusion feature based on the classification sub-network to obtain a quality identification result of the tissue images; and filter the tissue images of the multiple modalities according to the quality identification result to obtain the filtered tissue images.
Referring to fig. 2, fig. 2 is a schematic flow chart of an image filtering method according to an embodiment of the present disclosure. The specific flow of the image filtering method provided by the embodiment of the application can be as follows:
201. tissue images of multiple modalities of a target tissue are acquired.
The target tissue may be a certain tissue of a living body; for example, the target tissue may be a living tissue, a part of a living tissue, or a tissue composed of a plurality of living tissues. For example, the target tissue may be an eye, a blood vessel, subcutaneous tissue, and the like. A living body is an independent individual that has a life form and can respond to external stimulation; for example, the living body may be a human body or an animal body.
The tissue images of the target tissue in the multiple modalities may be images of the target tissue acquired by different imaging technologies, for example, the tissue images of the multiple modalities may be OCT images imaged by using near infrared rays and optical interference principles and medical images imaged by means of optical lenses, and the like.
In one embodiment, for example, when the target tissue is an eye, the tissue images of the target tissue in the plurality of modalities may be OCT images using near infrared rays and optical interference principles, fundus images acquired by a fundus camera, and the like.
In practical applications, there are various ways to acquire tissue images of multiple modalities of a target tissue. For example, when the tissue images of multiple modalities are an OCT image and a fundus image, the OCT image may be acquired by an optical coherence tomography device and the fundus image by a fundus camera; alternatively, the OCT image and the fundus image may be obtained locally, or downloaded via a network, and so on.
As shown in fig. 5, the fundus image is acquired by a fundus camera or the like; from it, the vitreous body, retina, choroid, and optic nerve can be examined for disease. Fundus lesions can occur in various systemic diseases, such as hypertension, nephropathy, diabetes, toxemia of pregnancy, sarcoidosis, certain blood diseases, and central nervous system diseases, and may even be the main reason a patient seeks medical care; examination of the fundus can therefore provide important diagnostic data.
As shown in fig. 5, the OCT image is produced by Optical Coherence Tomography, an imaging technique that has developed rapidly in the last decade. Its basic principle is weak-coherence optical interferometry: the back-reflected or backscattered signals returned from different depth levels of biological tissue in response to incident weakly coherent light are detected, and through scanning, two-dimensional or three-dimensional structural images of the biological tissue can be obtained.
202. A quality control network model to be used is determined.
(1) Determining the quality control network model.
The quality control network model is a network model that can acquire quality information of a tissue image of a target tissue. For example, as shown in fig. 14, the quality control network model may include a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence. The convolution sub-network comprises at least one convolution module and a parallel dimension reduction structure, and the classification sub-network may comprise an average pooling layer, a classifier, and the like. The quality control network model may be an improvement on the Inception V4 network model, and so on.
Existing network models such as the Inception V4 network model and the DenseNet network model improve performance along different dimensions: the Inception V4 network model attends to network width but not network depth, while the DenseNet network model attends to network depth but not network width. Therefore, the dense-connection idea of the DenseNet network model can be added to the Inception V4 network model to obtain a network model that takes both network width and network depth into account.
Moreover, both the Inception V4 network model and the DenseNet network model only consider a single modality (one image) as input. Since the tissue images may include multiple modalities (for example, an OCT image and a fundus image), a single-input network model cannot effectively process tissue images of multiple modalities at the same time, and the lack of information from a certain modality may cause the network model to classify part of the data inaccurately. Therefore, by adding a cross-channel convolution sub-network to the quality control network model, information from multiple modalities can be fused, so that the network device can process multi-modal input simultaneously.
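To make the cross-channel fusion concrete, the following is a minimal NumPy sketch, not taken from the patent: a 1 x 1 convolution across the concatenated modality channels reduces to a matrix product over the channel dimension. All shapes, weights, and variable names are illustrative.

```python
import numpy as np

def cross_channel_fusion(feature_maps, weights):
    """Fuse per-modality feature maps with a 1x1 (cross-channel) convolution.

    feature_maps: list of arrays shaped (C_i, H, W), one per modality.
    weights: array shaped (C_out, sum(C_i)), i.e. the 1x1 kernels.
    """
    stacked = np.concatenate(feature_maps, axis=0)   # (C_total, H, W)
    c, h, w = stacked.shape
    flat = stacked.reshape(c, -1)                    # (C_total, H*W)
    fused = weights @ flat                           # (C_out, H*W)
    return fused.reshape(weights.shape[0], h, w)

oct_feat = np.ones((2, 4, 4))           # toy OCT-branch features
fundus_feat = np.full((3, 4, 4), 2.0)   # toy fundus-branch features
w = np.ones((1, 5)) / 5.0               # one output channel averaging all inputs
out = cross_channel_fusion([oct_feat, fundus_feat], w)
print(out.shape)  # (1, 4, 4); each fused pixel is (1+1+2+2+2)/5 = 1.6
```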
As shown in fig. 9, the Inception V4 network model decomposes an n x n convolution kernel into two convolutions of 1 x n and n x 1. For example, a 3 x 3 convolution is equivalent to a 1 x 3 convolution followed by a 3 x 1 convolution, which costs less than a single 3 x 3 convolution. Fig. 10 is a schematic diagram of the stem structure in the Inception V4 network model. In addition, the parallel dimension reduction structure (Reduction) in the Inception V4 network model, shown in fig. 11, may be used to replace a single pooling layer and also increases the sampling information of the network for objects of different sizes.
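The cost saving from this factorization can be verified with a quick parameter count (biases ignored; the channel sizes below are arbitrary, chosen only for illustration):

```python
def conv_params(kh, kw, c_in, c_out):
    """Weight count of a kh x kw convolution layer, ignoring biases."""
    return kh * kw * c_in * c_out

# A single 3x3 convolution versus the factored 1x3 + 3x1 pair,
# both mapping 64 input channels to 64 output channels.
full = conv_params(3, 3, 64, 64)
factored = conv_params(1, 3, 64, 64) + conv_params(3, 1, 64, 64)
print(full, factored)  # 36864 24576: the factored form needs one third fewer weights
```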
The Inception V4 network model widens the network, and its basic unit (convolution sub-module) comprises four convolution branches; that is, the convolution sub-module comprises a plurality of convolution branches. As shown in figs. 6, 7 and 8, the basic units (convolution sub-modules) of the Inception V4 network model include the Inception A unit, the Inception B unit, the Inception C unit, and the like. The detail branches of the Inception A unit (the second and third convolution branches from the left) contain fewer convolution layers and smaller convolution kernels, for attending to the detail parts of the image; the global branches (the first and fourth convolution branches from the left) contain a pooling layer and more convolution layers, for attending to the global parts of the image. Finally, the four convolution branches are concatenated and output to the next layer to increase the diversity of the features. The Inception B unit and the Inception C unit adopt connection patterns different from the Inception A unit, but likewise optimize the network model by widening the network.
The DenseNet network model is a convolutional neural network model with dense connections. As shown in fig. 12, the DenseNet network model improves performance by increasing the depth of the network. In the DenseNet network model there is a direct connection between any two layers; that is, the input of each layer includes the outputs of all previous layers, for example as a channel-wise concatenation (union) of those outputs, or as their element-wise sum, and so on. The feature map learned by a layer can thus be passed directly to all subsequent layers as input. Dense connection alleviates the vanishing-gradient problem, strengthens feature propagation, and encourages feature reuse, thereby greatly reducing the number of parameters.
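The dense-connection pattern can be sketched in a few lines of NumPy, using stand-in layers in place of real convolution blocks; the growth rate and shapes below are illustrative assumptions, not values from the patent:

```python
import numpy as np

growth = 2  # channels added per layer (the "growth rate")

def conv_layer(inp):
    # Stand-in for a Conv-BN-ReLU block that produces `growth` new channels.
    return np.ones((growth,) + inp.shape[1:])

def dense_block(x, n_layers):
    """Each layer receives the channel-wise concatenation (union) of the
    block input and the outputs of all earlier layers."""
    features = [x]
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=0)  # direct connection to every prior layer
        features.append(conv_layer(inp))
    return np.concatenate(features, axis=0)

x = np.zeros((4, 8, 8))   # 4 input channels
out = dense_block(x, 3)
print(out.shape)          # (10, 8, 8): 4 input + 3 layers * 2 channels each
```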
(2) Training the quality control network model.
The image filtering method can further comprise training of the quality control network model.
In an embodiment, specifically, the image filtering method may further include:
acquiring a training set, wherein the training set comprises a plurality of training samples;
sampling the training set according to the sampling probability of the sample image to obtain a target training sample;
training a quality control network model based on the target training sample to obtain a trained quality control network model;
obtaining a loss function corresponding to a sample image in the target training sample based on the trained quality control network model;
and updating the sampling probability of the sample image in the target training sample according to the loss function, and returning to the step of sampling the training set according to the sampling probability of the sample image until the sampling termination condition is met.
(a) A training set is obtained.
The training set includes a plurality of training samples, and the training set and the training samples both include a plurality of sample images, for example, the training set includes B training samples, the training samples are subsets of the training set, the training set includes M sample images, and the training samples include N sample images.
The training set comprises a plurality of sample images; the sample images are tissue sample images of multiple modalities of the target tissue and carry labeled quality information. For example, images of unqualified quality may include images degraded by human factors such as blurred shooting, cropping, and position error, or by non-human factors such as refractive media opacity and high myopia. The sample images may be a number of fundus images and OCT images with labeled quality information; the number ratio of fundus images to OCT images may be 1 : 1, with fundus images corresponding one-to-one to OCT images, and so on.
In one embodiment, for example, the ratio of the number of fundus images to the number of OCT images may be other ratios, but it is desirable to ensure that each OCT image corresponds to one fundus image.
In practical applications, the training set may be obtained in various manners, for example, the training set may be formed by acquiring sample images through a medical instrument, or may be obtained through a network, a database, a local network, or the like. Specifically, 4476 sample images may be acquired as a training set, the training set accounts for 70% of the total images, 1919 tissue images of the target tissue may be acquired as a test set, the test set accounts for 30% of the total images, the sample images in the test set do not include labeled quality information, and the total images include the test set and the training set.
The sample images in the training set can be preprocessed to eliminate irrelevant information in the images, recover useful real information, enhance the detectability of relevant information, and simplify the data to the greatest extent, thereby improving training accuracy.
In an embodiment, specifically, the image filtering method may further include:
carrying out size adjustment on the sample image to obtain a sample image after size adjustment;
carrying out pixel adjustment on the sample image after size adjustment to obtain a sample image after pixel adjustment;
performing data enhancement on the sample image after the pixel adjustment to obtain an enhanced sample image;
and taking the enhanced sample image as the training sample image.
In one embodiment, for example, the size of the fundus images in the sample images may be 496 x 496, and the size of the OCT images may be 496 x 496, 496 x 768, or 496 x 1024, and so on. The input images of the quality control network model may be a 496 x 496 fundus image and a 496 x 768 OCT image. Since the OCT images in the sample images come in more than one size, the OCT images need to be resized. For example, when the image width is less than 768, both sides of the image are zero-filled (black borders); when the image width is greater than 768, both sides of the image are symmetrically cropped, so that the width of all OCT images is 768, thereby unifying the image size.
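The width-unification rule just described (zero-fill when narrower than 768, symmetric crop when wider) can be sketched as follows, assuming single-channel images; the function name is illustrative:

```python
import numpy as np

TARGET_W = 768  # target OCT width from the description

def uniform_width(img):
    """Zero-pad or symmetrically crop an (H, W) image to width TARGET_W."""
    h, w = img.shape
    if w < TARGET_W:
        left = (TARGET_W - w) // 2
        right = TARGET_W - w - left
        return np.pad(img, ((0, 0), (left, right)))  # 0-fill: black borders
    if w > TARGET_W:
        left = (w - TARGET_W) // 2
        return img[:, left:left + TARGET_W]          # symmetric crop
    return img

print(uniform_width(np.ones((496, 496))).shape)   # (496, 768)
print(uniform_width(np.ones((496, 1024))).shape)  # (496, 768)
```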
The sample images of uniform size may then be normalized, for example by subtracting the image mean and dividing by the image variance. Random rotation of -30 to +30 degrees, random horizontal flipping, random elastic deformation, or random speckle noise may then be applied to the images, increasing the amount of sample data and improving the generalization ability of the model; adding noise data also improves the robustness of the network model.
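The per-image normalization step can be sketched as below. Note one assumption: the description divides by the image variance, but dividing by the standard deviation is the more common practice, so the divisor here is a judgment call and `img.var()` can be substituted if the literal reading is intended.

```python
import numpy as np

def normalize(img, eps=1e-8):
    """Per-image normalization: subtract the mean and divide by the spread.

    Divides by the standard deviation (a common choice); the description's
    literal wording is 'variance', so swap in img.var() if that is intended.
    """
    return (img - img.mean()) / (img.std() + eps)

x = np.array([[0.0, 2.0], [4.0, 6.0]])
z = normalize(x)
print(round(z.mean(), 6))  # 0.0: the normalized image is centered at zero
```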
(b) Sampling the training set according to the sampling probability of the sample images to obtain a target training sample.
The sampling probability of a sample image is the probability that the image is drawn when sampling from the training set; for example, when the training set includes M sample images, the initial sampling probability of each sample image is 1/M.
In practical application, a target training sample containing a certain number of sample images can be obtained by sampling from the training set according to the sampling probabilities of the sample images. For example, when the training set includes M sample images and each sampling probability is 1/M, N sample images may be drawn to form a target training sample.
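Drawing a target training sample of N images by per-image probability can be sketched with NumPy's weighted choice; M, N, and the seed below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility
M, N = 100, 16                   # training-set size and batch size (illustrative)
probs = np.full(M, 1.0 / M)      # uniform initial sampling probabilities (1/M each)
batch = rng.choice(M, size=N, replace=True, p=probs)  # draw N image indices
print(batch.shape)               # (16,)
```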
In an embodiment, for example, the step of sampling the training set according to the sampling probability of the sample image to obtain the target training sample may be further implemented by a sampler.
A sampler is a tool used in deep learning training; it can sample a target training sample from the training set and input it into the network model for training.
The sampler used in conventional deep learning may first initialize all sample images in a training set so that each has the same probability of being sampled; for example, if the training set includes M sample images, each may be sampled with probability 1/M to form a target training sample for network model training. This is sampling without replacement: one round of network model training is completed after all sample images in the training set have been sampled. However, this sampling method cannot repeatedly train on difficult samples and causes over-learning of simple samples, thereby limiting the improvement of network model performance.
In an embodiment, since an ordinary sampler cannot adjust the sampling probability of difficult samples, a hard sample boosting sampler may be used to sample the training set. For example, as shown in fig. 13, the hard sample boosting sampler may initialize the sample images in the training set so that each has the same probability of being sampled, and then sample the training set to obtain the target training sample.
(c) Training the quality control network model based on the target training sample to obtain the trained quality control network model.
In practical application, after a target training sample is obtained by sampling from a training set, the quality control network model can be trained based on the target training sample to obtain a trained quality control network model.
(d) Obtaining a loss function corresponding to the sample images in the target training sample based on the trained quality control network model.
The loss function is a concept from machine learning: algorithms in machine learning maximize or minimize an objective function, and the class of objective functions that is minimized is called the loss function. The loss function measures the model's predictive ability based on its prediction results.
In practical application, after the quality control network model is trained with the target training sample, a loss function value corresponding to each sample image in the target training sample can be obtained. If the loss function value of a sample image is large, the sample image is considered a difficult sample.
(e) Updating the sampling probability of the sample images in the target training sample according to the loss function, and returning to the step of sampling the training set according to the sampling probability until the sampling termination condition is met.
The sampling termination condition is the condition under which sampling can be terminated and the trained quality control network model obtained. For example, when sampling without replacement, the termination condition may be that all images in the training set have been trained on. When sampling with replacement, the termination condition may be that the number of trained sample images reaches the number of sample images in the training set.
In practical application, the sampling probability of the sample image in the target training sample can be updated according to the loss function corresponding to the obtained sample image, and then the sampling step is continued until the sampling termination condition is met.
In order to realize repeated sampling of difficult samples, the difficult samples in the target training sample can be identified according to the loss functions of the sample images, and the difficult samples then sampled repeatedly, thereby improving the accuracy of the trained network model.
Specifically, the step of "updating the sampling probability of the sample image in the target training sample according to the loss function" may include:
sorting the sample images in the target training sample by their loss function values according to a preset rule to obtain a ranking of the sample images in the target training sample;
and updating the sampling probability of the sample images in the target training sample according to the ranking of the sample images in the target training sample.
In practical application, the sample images in the target training sample can be sorted by loss function value according to a preset rule to obtain a ranking of the sample images, and the sampling probability of the sample images then updated accordingly. For example, the samples can be ranked by the loss function values of the sample images in the target training sample, and the sample image with the largest loss function regarded as a difficult sample; its sampling probability can then be doubled. Because sampling is performed with replacement, the difficult sample acquires a larger sampling probability and therefore has a greater chance of being sampled and trained multiple times.
In an embodiment, for example, suppose the training set contains M sample images from which B training samples are drawn. One training sample may be obtained from the training set as the target training sample based on the hard sample boosting sampler, with each sample image initially having a sampling probability of 1/M; the target training sample contains N sample images. After training on this sample, the loss functions corresponding to the N sample images in the target training sample are obtained, the sample images are ranked by loss function value, and the sampling probability of the sample image with the largest loss function is adjusted from 1/M to 2/M, so as to increase the sampling probability of the difficult sample.
In an embodiment, for example, if after the adjustment the target training sample obtained at the next sampling again contains the sample image with sampling probability 2/M, and that image again has the largest loss function, its sampling probability may be further adjusted from 2/M to 4/M to continue increasing the sampling probability of the difficult sample.
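The sampling-probability update described above can be sketched in a few lines of Python. The function names and data structures are illustrative assumptions; the patent specifies only the rule itself (sampling with replacement, and doubling the probability of the sample image with the largest loss function):

```python
import random

def update_sampling_probs(probs, losses, boost=2.0):
    """Double the sampling weight of the sample with the largest loss.

    probs:  dict mapping sample index -> sampling weight (initially 1/M each)
    losses: dict mapping sample index -> loss from the last training pass
    Names and structures are illustrative; the patent specifies only the rule.
    """
    hardest = max(losses, key=losses.get)   # difficult sample: largest loss
    probs[hardest] *= boost                 # 1/M -> 2/M -> 4/M ...
    return probs

def sample_batch(probs, n):
    """Draw n sample images with replacement, weighted by current probabilities."""
    indices = list(probs)
    weights = [probs[i] for i in indices]
    return random.choices(indices, weights=weights, k=n)

# M = 5 images, uniform start; suppose image 3 had the largest loss
probs = {i: 1.0 / 5 for i in range(5)}
probs = update_sampling_probs(probs, {0: 0.1, 1: 0.2, 2: 0.15, 3: 0.9, 4: 0.3})
batch = sample_batch(probs, n=3)            # image 3 is now twice as likely
```

Because `random.choices` draws with replacement, a boosted difficult sample can appear in many later batches, which is exactly the repeated training the sampler is designed to achieve.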
In one embodiment, for example, the parameters of the quality control network model may be pre-trained on the ImageNet dataset using the Inception V4 network model, and the newly added convolutional layers may be initialized with a Gaussian distribution with mean 0 and variance 0.01.
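The Gaussian initialization of the newly added convolutional layers can be sketched as follows; the kernel shape is a hypothetical example, since only the mean (0) and variance (0.01) are specified:

```python
import numpy as np

def init_new_conv_weights(shape, mean=0.0, variance=0.01, rng=None):
    """Draw weights for a newly added convolutional layer from a Gaussian with
    mean 0 and variance 0.01 (standard deviation 0.1), as described above."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(loc=mean, scale=variance ** 0.5, size=shape)

# Hypothetical 64 x 64 x 3 x 3 kernel (out_channels, in_channels, kH, kW)
w = init_new_conv_weights((64, 64, 3, 3), rng=np.random.default_rng(0))
```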
The ImageNet dataset is a dataset widely used in the field of deep learning for images; research tasks such as image classification, localization, and detection can be performed on it.
In one embodiment, for example, Adam-based gradient descent may be used to solve for the convolutional layer parameters w and bias parameters b of the quality control network model, and the learning rate may be attenuated by 90% every 20K iterations.
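The step decay described here can be written as a simple schedule. The base learning rate of 1e-3 is an assumption for illustration; the patent fixes only the decay rule (attenuate by 90%, i.e. multiply by 0.1, every 20K iterations):

```python
def learning_rate(base_lr, iteration, decay_every=20_000, decay_factor=0.1):
    """Step schedule: multiply the learning rate by 0.1 (a 90% attenuation)
    every 20K iterations while Adam solves for the parameters w and b."""
    return base_lr * decay_factor ** (iteration // decay_every)

lr_start = learning_rate(1e-3, 0)        # 1e-3 at the start
lr_20k = learning_rate(1e-3, 20_000)     # 1e-4 after the first decay
lr_45k = learning_rate(1e-3, 45_000)     # 1e-5 after the second decay
```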
203. And fusing the tissue image features of the plurality of modalities based on the cross-channel convolution sub-network to obtain a first fusion feature.
The cross-channel convolution sub-network can fuse the features of various images so as to extract the fused features, and therefore the accuracy of the network model is improved.
In practical application, for example, the OCT image and the fundus image may be input into the quality control network model for quality recognition to obtain quality information about the OCT image and the fundus image. The quality control network model can then remove clinically unusable data: images that are blurred, cropped, or incorrectly positioned due to human factors, and images affected by non-human factors such as refractive media opacity and high myopia.
As shown in fig. 14, in order to improve the accuracy of the quality identification result of the tissue image obtained by the quality control network model, the quality identification may be performed by using a network model including a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network.
In practical applications, for example, the quality control network model may include a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence. The fundus image and the OCT image can be input into a quality control network model, and the characteristics of the fundus image and the characteristics of the OCT image are fused through a cross-channel convolution sub-network to obtain a first fusion characteristic.
As shown in fig. 15, in order to effectively process the fundus image and the OCT image at the same time, a cross-channel convolution sub-network may be used to fuse the features of the fundus image and the OCT image, so as to improve the feature diversity and thus improve the accuracy of the network model.
In an embodiment, specifically, the step of "fusing tissue image features of a plurality of modalities based on the cross-channel convolution sub-network to obtain a first fused feature" may include:
extracting tissue image features of a plurality of modalities based on the feature extraction unit;
performing feature reformation on the tissue image features of the plurality of modalities based on the modality reformation unit to obtain reformed features;
and fusing the reformed features based on the feature fusion unit to obtain a first fusion feature.
The cross-channel convolution sub-network comprises a feature extraction unit, a mode reforming unit and a feature fusion unit. The feature extraction unit is a unit that can extract image features, and may include a stem network structure, for example. The modality reforming unit is a unit capable of reforming image characteristics. The feature fusion unit is a unit capable of fusing image features, for example, the feature fusion unit may fuse image features through three-dimensional convolution.
In practical applications, for example, factors affecting image quality can not only be observed completely on the OCT image but can also be partially observed on the fundus image. The OCT image and the fundus image therefore need to be checked together, so a cross-channel convolution sub-network is introduced, and feature richness is increased by extracting features from both the OCT image and the fundus image.
In practical application, for example, as shown in fig. 15, in the cross-channel convolution sub-network the feature extraction unit may extract the features of the OCT image and the features of the fundus image to obtain a 2 × W × H × C feature map, the modality reforming unit may reshape the feature map into a C × 2 × H × W feature map, and the feature fusion unit may fuse the bimodal information together using a 2 × 1 × 1 three-dimensional convolution to obtain the first fusion feature, thereby completing the cross-channel convolution.
The three-dimensional convolution is a convolution mode, and the features of the multiple channels are convoluted and combined through the three-dimensional filter, so that the fused features can be obtained.
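A minimal NumPy sketch can make the shapes above concrete: the modality reforming step turns the 2 × W × H × C feature map into C × 2 × H × W, and a 2 × 1 × 1 three-dimensional convolution reduces to a per-position weighted sum over the two modalities. The function and kernel values are illustrative assumptions:

```python
import numpy as np

def cross_channel_fuse(feats, w, b=0.0):
    """Fuse bimodal features with a 2 x 1 x 1 three-dimensional convolution.

    feats: array of shape (2, W, H, C) -- stacked OCT / fundus feature maps
           from the feature extraction unit.
    w:     array of shape (2,) -- the 2 x 1 x 1 kernel, one weight per modality.
    The shapes follow the patent; the function itself is an illustrative sketch.
    """
    reformed = feats.transpose(3, 0, 2, 1)   # modality reforming: -> (C, 2, H, W)
    # A 2 x 1 x 1 convolution over the modality axis is a per-position weighted
    # sum that collapses the two modalities into one (C, H, W) fused map.
    return np.tensordot(w, reformed, axes=([0], [1])) + b

oct_and_fundus = np.random.rand(2, 30, 32, 16)           # W=30, H=32, C=16
fused = cross_channel_fuse(oct_and_fundus, w=np.array([0.5, 0.5]))
print(fused.shape)                                       # (16, 32, 30) = (C, H, W)
```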
204. And carrying out dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature.
The convolution sub-network may perform a dense extraction of features of the image, and may include at least one convolution module, which may include a plurality of convolution sub-modules densely connected.
In one embodiment, the convolution sub-network may include at least one convolution module, and may include only a single convolution module; for example, the convolution sub-network may contain only the first convolution module, and so on.
For example, as shown in fig. 12, feature extraction may follow the concept of DenseNet, and the current convolution sub-module may be connected to all the preceding convolution sub-modules in the convolution module, so that the input features undergo multiple rounds of feature extraction, thereby diversifying feature extraction and improving the accuracy of the network model.
In practical applications, for example, the first fused feature may be input into a convolution sub-network to perform dense feature extraction, so as to obtain a second fused feature.
In order to improve the accuracy of feature extraction, at least one convolution module in a convolution sub-network may be used to extract features.
In an embodiment, specifically, the step of performing dense feature extraction on the first fused feature based on the convolution sub-network to obtain the second fused feature may include:
determining the first fusion feature as a current input feature of a current convolution module;
carrying out dense feature extraction on the current input features based on a convolution submodule in the current convolution module;
taking the extracted features as current input features, and selecting the rest convolution modules in the convolution sub-network as current convolution modules;
and returning to the step of executing the dense feature extraction of the current input features based on the convolution submodule in the current convolution module until a first feature extraction termination condition is met, and obtaining second fusion features.
The first feature extraction termination condition may be a condition for determining whether the feature extraction step in the convolution sub-network is terminated, for example, feature extraction may be performed on all convolution modules in the convolution sub-network, and the condition is used as the first feature extraction termination condition.
In one embodiment, a plurality of convolution modules may be included in the convolution sub-network, and a plurality of convolution sub-modules may be included in a convolution module. The first fusion feature acquired through the cross-channel convolution sub-network can be determined as the current input feature of the current convolution module; dense feature extraction is then performed on the current input feature based on the convolution sub-modules in the current convolution module; the extracted features are taken as the current input features, and one of the remaining convolution modules in the convolution sub-network is selected as the current convolution module; the step of performing dense feature extraction on the current input features based on the convolution sub-modules in the current convolution module is then executed again, until the first feature extraction termination condition is met and the second fusion feature is obtained.
In an embodiment, the convolution module may include a plurality of convolution sub-modules, or only a single convolution sub-module; for example, the convolution module may include only a first convolution sub-module, and so on.
For example, as shown in fig. 14, a convolution sub-network may include a first convolution module, a second convolution module, and a third convolution module. The convolution module may include a plurality of convolution sub-modules, for example, the first convolution module may include a plurality of first convolution sub-modules.
The first convolution module is determined as the current convolution module, and the first fusion feature obtained through the cross-channel convolution sub-network is determined as the current input feature of the first convolution module. Dense feature extraction is performed on the current input feature based on the convolution sub-modules in the first convolution module; the extracted features are taken as the current input features, and the second convolution module is selected as the current convolution module. The step of performing dense feature extraction on the current input features based on the convolution sub-modules in the current convolution module is then executed again, until the first, second, and third convolution modules in the convolution sub-network have all completed feature extraction and the second fusion feature is obtained.
For example, a convolution sub-network may include a first convolution module, a second convolution module, and a third convolution module, and the three convolution modules may be connected in series. And determining the first fusion features acquired through the cross-channel convolution sub-network as the current input features of the first convolution module, and then carrying out feature dense extraction on the current input features based on the convolution sub-modules in the first convolution module. And then, carrying out dense feature extraction on the features extracted by the first convolution module based on a convolution submodule in the second convolution module, carrying out dense feature extraction on the features extracted by the second convolution module based on a convolution submodule in the third convolution module, and finally taking the features extracted by the third convolution module as second fusion features.
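The control flow of the convolution sub-network described above can be sketched as a simple loop; the module internals are toy stand-ins, since the patent fixes only the chaining of modules:

```python
def run_convolution_subnetwork(first_fused_feature, modules):
    """Chain the convolution modules in sequence: the output of each module
    becomes the current input feature of the next. The first feature extraction
    termination condition is met once every module has run. The modules here
    are illustrative stand-ins; the patent fixes only this control flow."""
    current_input = first_fused_feature
    for module in modules:
        current_input = module(current_input)
    return current_input                      # the second fusion feature

trace = []
def make_module(name):
    def module(x):
        trace.append(name)                    # record execution order
        return x + 1                          # toy "feature extraction"
    return module

modules = [make_module("m1"), make_module("m2"), make_module("m3")]
out = run_convolution_subnetwork(0, modules)  # first, second, third module in turn
```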
In order to improve the transmission of information and gradients in the network model, each layer can take gradients directly from the loss function and receive the original input signal, which makes deeper network models trainable. The concept of DenseNet can therefore be added to the quality control network model to improve network performance through feature reuse: any two layers can be directly connected, that is, the input of each layer includes the outputs of all preceding layers.
In an embodiment, specifically, the step of "performing dense extraction of features on the current input features based on the convolution sub-module in the current convolution module" may include:
determining the current input characteristic of the current convolution module as the target input characteristic of the current convolution submodule;
performing feature extraction on the target input features based on the current convolution submodule to obtain target extraction features;
fusing the target extraction features and history extraction features based on the current convolution submodule, wherein the history extraction features are extracted by all history convolution submodules before the current convolution submodule in the current convolution module;
taking the fused features as target input features, and selecting the other convolution sub-modules in the current convolution module as current convolution sub-modules;
and returning to execute the step of performing feature extraction on the target input features based on the current convolution submodule to obtain target extraction features until a second feature extraction termination condition is met.
And the target input features are features which need to be extracted by the current convolution submodule. The history extraction features are extracted by all history convolution submodules before the current convolution submodule in the current convolution module. The convolution module includes a plurality of densely connected convolution sub-modules.
The second feature extraction termination condition may be a condition for judging whether the feature extraction step in the current convolution module is terminated, for example, feature extraction may be performed on all convolution sub-modules in the current convolution module, and the condition is used as the second feature extraction termination condition. For example, when the current convolution module is the first convolution module, feature extraction may be performed on all first convolution sub-modules in the first convolution module, and the feature extraction may be used as a second feature extraction termination condition.
In practical application, in order to increase the reuse rate of features, all convolution sub-modules in the same convolution module can be densely connected, so that every feature path between modules of the same modality is reused. The current input feature of the current convolution module is determined as the target input feature of the current convolution sub-module; feature extraction is then performed on the target input feature based on the current convolution sub-module to obtain the target extracted feature; the target extracted feature is fused with the history extracted features based on the current convolution sub-module; the fused feature is taken as the target input feature, and one of the remaining convolution sub-modules in the convolution module is selected as the current convolution sub-module. Finally, the step of performing feature extraction on the target input feature based on the current convolution sub-module to obtain the target extracted feature is executed again, until the second feature extraction termination condition is met.
For example, as shown in fig. 16, in order to increase the reuse rate of features, the convolution sub-modules may be densely connected so that every feature path between modules of the same modality is reused. The first convolution module may include a plurality of first convolution sub-modules, all connected in a dense-connection manner, that is, each first convolution sub-module is connected to every preceding first convolution sub-module.
The current input feature of the first convolution module can be determined as the target input feature of the current first convolution sub-module, and feature extraction performed on the target input feature based on the current first convolution sub-module to obtain the target extracted feature. The target extracted feature is then fused with the history extracted features based on the current first convolution sub-module; the fused feature is taken as the target input feature, and one of the remaining first convolution sub-modules in the first convolution module is selected as the current first convolution sub-module. Finally, the step of performing feature extraction on the target input feature based on the current first convolution sub-module to obtain the target extracted feature is executed again, until all the first convolution sub-modules in the first convolution module have performed feature extraction.
For example, the convolution module includes four convolution sub-modules, which are a first convolution sub-module, a second convolution sub-module, a third convolution sub-module, and a fourth convolution sub-module. And four convolution sub-modules in the convolution module are connected in a dense connection mode. For example, the convolution sub-modules are arranged in the convolution module according to the order of the first convolution sub-module, the second convolution sub-module, the third convolution sub-module, and the fourth convolution sub-module. And determining the current input features of the input convolution module as the target input features of the first convolution submodule.
The four convolution submodules in the convolution module are connected in a dense connection mode, that is, the features extracted by the first convolution submodule can be directly input into the second convolution submodule, the third convolution submodule and the fourth convolution submodule respectively.
The input of the second convolution submodule is the feature extracted by the first convolution submodule, and the feature extracted by the second convolution submodule can be directly input into the third convolution submodule and the fourth convolution submodule respectively.
The input of the third convolution submodule is the features extracted by the first convolution submodule and the second convolution submodule, and the features extracted by the third convolution submodule can be directly input into the fourth convolution submodule.
The input of the fourth convolution submodule is the features extracted by the first convolution submodule, the second convolution submodule and the third convolution submodule, and the features extracted by the fourth convolution submodule can be used as the features extracted by the convolution module.
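The dense connectivity just described can be sketched with NumPy concatenations standing in for the actual convolutions; each sub-module receives the block input together with the outputs of all preceding sub-modules:

```python
import numpy as np

def dense_block(x, submodules):
    """DenseNet-style block: each sub-module receives the concatenation of the
    block input and the outputs of every earlier sub-module (dense connections).
    The submodules are illustrative stand-ins for the convolution sub-modules."""
    features = [x]                                        # input + output history
    for submodule in submodules:                          # second termination
        target_input = np.concatenate(features, axis=-1)  # condition: all have run
        features.append(submodule(target_input))
    return np.concatenate(features, axis=-1)              # reuse every feature path

# Four toy sub-modules, each producing 2 new channels from its target input
sub = lambda t: t[..., :2] * 0.5
x = np.ones((4, 4, 3))                                    # H x W x C block input
out = dense_block(x, [sub, sub, sub, sub])
print(out.shape)                                          # (4, 4, 11) = 3 + 4*2 channels
```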
In order to improve feature richness, the original features can be reused by means of a shortcut connection, thereby improving the accuracy of the network model. In an embodiment, specifically, the step of "performing feature extraction on the target input feature based on the current convolution submodule to obtain a target extracted feature" may include:
respectively extracting the features of the target input features based on the plurality of convolution branches to obtain a plurality of convolution output features;
and fusing the target input features and the plurality of convolution output features to obtain target extraction features.
The convolution submodule comprises a plurality of convolution branches. The convolution branch can comprise a plurality of convolution layers, and the extraction of the characteristics can be realized through the convolution branch.
In practical application, feature extraction can be performed on the target input features based on each of the plurality of convolution branches to obtain a plurality of convolution output features, and the target input features and the plurality of convolution output features are then fused to obtain the target extracted features.
For example, as shown in figs. 17, 18, and 19, in order to increase feature diversity, a shortcut connection may be added to each of the Inception-A unit, the Inception-B unit, and the Inception-C unit among the basic units (convolution sub-modules) of the Inception V4 network model, and the original input features are reused through the shortcut connection. The target extracted features are obtained by fusing, through a fusion layer (Filter Concat), the convolution output features of the plurality of convolution branches with the target input features carried over the shortcut connection.
In an embodiment, the basic units (convolution sub-modules) of the Inception V4 network model are not limited to the Inception-A unit, the Inception-B unit, and the Inception-C unit; they may include one or more of these units and may further include other basic units. The order of the basic units (convolution sub-modules) of the Inception V4 network model can also be changed according to actual conditions, and so on.
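The shortcut-connected sub-module structure can be sketched as follows; the branch functions are toy stand-ins for the Inception-style convolution branches, and only the fusion pattern (Filter Concat of the shortcut input with every branch output) follows the description:

```python
import numpy as np

def submodule_with_shortcut(x, branches):
    """Sketch of a convolution sub-module: several convolution branches in
    parallel plus a shortcut connection that reuses the original input feature,
    merged by a fusion layer (Filter Concat). The branch functions are toy
    stand-ins for the Inception-A/B/C branch convolutions."""
    branch_outputs = [branch(x) for branch in branches]   # parallel branches
    # Filter Concat: fuse the shortcut (original input) with each branch output
    return np.concatenate([x] + branch_outputs, axis=-1)

x = np.random.rand(8, 8, 4)                               # H x W x C input
branches = [lambda t: t * 2.0, lambda t: t + 1.0]         # toy "branches"
y = submodule_with_shortcut(x, branches)
print(y.shape)                                            # (8, 8, 12)
```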
205. And classifying the second fusion characteristics based on the classification sub-network to obtain a quality identification result of the tissue image.
The classification sub-network may classify the features of the image and may include a fully connected layer, a pooling layer, and the like; for example, the classification sub-network may include an Average Pooling layer, a SoftMax layer, and so on.
The quality identification result may be a determination of whether a tissue image is a clinically usable image. For example, clinically unusable images may include: images blurred, cropped, or incorrectly positioned due to human factors, or images affected by non-human factors such as refractive media opacity and high myopia.
In practical applications, for example, the second fusion features may be classified by a classification sub-network, so as to obtain quality recognition results of the fundus image and the OCT image, such as probabilities related to image quality information.
206. And filtering the tissue images of the plurality of modes according to the quality identification result to obtain a filtered tissue image.
In practical application, after the quality identification result of the tissue images is obtained, the tissue images can be filtered according to the quality identification result to obtain the filtered tissue images. For example, clinically unusable images can be identified from the quality identification result, including images blurred, cropped, or incorrectly positioned due to human factors and images affected by non-human factors such as refractive media opacity and high myopia, and these images can be filtered out and deleted. What remains after deletion is the filtered tissue image, which may include a clinically usable fundus image and a clinically usable OCT image.
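The filtering step can be sketched as a simple threshold on the quality identification result; the image identifiers, probabilities, and the 0.5 threshold are assumptions for illustration:

```python
def filter_by_quality(images, quality_probs, threshold=0.5):
    """Keep images whose predicted 'clinically usable' probability passes a
    threshold. The threshold and the probability representation are assumed
    for this sketch; the patent states only that images are filtered according
    to the quality identification result."""
    kept, removed = [], []
    for image, p_usable in zip(images, quality_probs):
        (kept if p_usable >= threshold else removed).append(image)
    return kept, removed

images = ["fundus_1", "oct_1", "fundus_2"]   # hypothetical image identifiers
probs = [0.92, 0.88, 0.31]                   # low score: e.g. blurred or cropped
kept, removed = filter_by_quality(images, probs)
```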
In an embodiment, the image filtering method may further include filtering out images whose scan mode does not satisfy the requirements. As shown in fig. 3, the image filtering method may further include the following steps:
301. the type of scan of the filtered tissue image is identified.
The scanning type of the filtered tissue image may be the scan type used when the tissue image was captured. For example, for a clinically usable fundus image among the filtered tissue images, the scanning type may be whether the green scan line of the fundus image passes through the macula lutea, and so on.
In one embodiment, to improve the accuracy of the scan type for obtaining the filtered tissue image, a network model may be used to obtain the scan type.
Specifically, the step of "identifying the scan type of the filtered tissue image" may comprise:
and identifying the filtered tissue image based on an image scanning type identification network model to obtain the scanning type of the filtered tissue image.
(1) And identifying the filtered tissue image based on the image scanning type identification network model to obtain the scanning type of the filtered tissue image.
The image scanning type identification network model is a network model capable of obtaining the scanning type of the filtered tissue image; for example, it may be a network model obtained by improving the Inception V4 network model.
In practical applications, for example, the clinically usable fundus images among the filtered tissue images can be input into the image scanning type identification network model to obtain their scanning types, and fundus images whose scans do not meet the standard can be removed through the model, including: images whose annular (circular) scan does not cover the optic papilla, and images whose line scan does not cover the macular region or the optic papilla, and so on. The OCT images corresponding to the fundus images whose scans do not meet the standard can then be removed, and the remaining OCT images whose scans meet the standard are the target images.
In order to improve the accuracy of obtaining the scanning type of the filtered tissue image by the image scanning type identification network model, the scanning type identification can be performed by adopting a network model comprising a convolution sub-network and a classification sub-network.
In practical applications, for example, as shown in fig. 20, the image scan type recognition network model may include a convolution sub-network and a classification sub-network connected in sequence. The filtered tissue image can be input into an image scanning type identification network model, the image features are extracted through a convolution sub-network, and then the extracted features are classified through a classification sub-network, so that the scanning type of the filtered tissue image is obtained.
In an embodiment, the concept of DenseNet can be added into the image scanning type identification network model, so that network performance is improved through feature reuse.
Wherein the convolution sub-network comprises at least one convolution module. The convolution module includes a plurality of convolution sub-modules that are densely connected. The convolution sub-module includes a plurality of convolution branches.
In practical applications, for example, in order to increase the reuse rate of features, the convolution sub-modules may be densely connected, each path of features between modules in the same mode may be reused, and features extracted by historical convolution sub-modules before the current convolution sub-module are fused.
In order to improve feature richness, the original features can be reused by adding a shortcut connection, improving the accuracy of the network model.
In practical applications, for example, in order to increase feature diversity, a shortcut connection may be added to the Inception-A unit, the Inception-B unit, and the Inception-C unit among the basic units (convolution sub-modules) of the Inception V4 network model, and the original features are reused through the shortcut connection. In the convolution sub-module, the features output by the plurality of convolution branches and the features carried over the shortcut connection are fused through a fusion layer (Filter Concat).
The structures of the dense connections and the shortcut connections are similar to those in the quality control network model and are not described again here.
302. And filtering the filtered tissue image according to the scanning type to obtain a target image.
In practical application, after the scanning type of the filtered tissue image is obtained, the filtered tissue image can be filtered according to the scanning type to obtain the target image. For example, when the filtered tissue image is a clinically usable fundus image, the images whose scanning type does not meet the standard are filtered out and deleted, including images whose circular scan does not cover the optic papilla, images whose line scan does not cover the macular region, and the like. The OCT images corresponding to the fundus images whose scanning type does not meet the standard are also deleted, and the remaining OCT images whose scanning type meets the standard are the target images.
(2) And (4) training an image scanning type recognition network model.
The image filtering method can further comprise the step of training the image scanning type recognition network model.
In practical applications, the training set may be obtained in various ways, for example, the training set may be formed by acquiring sample images through a medical instrument, or may be obtained through a network, a database, a local network, or the like.
In an embodiment, for example, the fundus images among the sample images in the training set may have a size of 496 × 496; the fundus images may be normalized and then subjected to data augmentation, so as to increase the amount of training data and improve the generalization capability of the model, and to add noise data and improve the robustness of the model.
In practical applications, a target training sample including a certain number of sample images may be obtained by sampling from a training set.
In an embodiment, sampling from the training set to obtain the target training sample may also be performed by a sampler, for example.
In an embodiment, since an ordinary sampler cannot adjust the sampling probability of difficult samples, a hard example boosting sampler (Hard Example Boosting Sampler) may be used to sample the training set. In practical application, this sampler first initializes all sample images so that each sample image has the same sampling probability. For example, suppose B training samples are drawn in total from a training set containing M sample images; each sample image is sampled with a probability of 1/M, and a target training sample containing N sample images is obtained for network model training. After training on this sample, the loss functions corresponding to the N sample images in the target training sample are obtained; a sample image with a large loss value can be considered a difficult sample. The sample images in the target training sample are then sorted by loss value from large to small, and the sampling probability 1/M of the sample image with the largest loss is doubled, so that this difficult sample is more likely to be drawn the next time the sampler samples. Difficult samples can therefore be sampled repeatedly; the sampler samples with replacement.
In practical application, the image scan type recognition network model may be trained with the training samples; for example, the training samples may be fed into the image scan type recognition network model to be trained, and the resulting trained model is then used as the image scan type recognition network model.
In one embodiment, for example, the parameters of the image scan type recognition network model may be initialized from an Inception V4 network model pre-trained on the ImageNet data set, and the newly added convolutional layers may be initialized using a Gaussian distribution with a variance of 0.01 and a mean of 0.
In one embodiment, for example, Adam-based gradient descent may be used to solve for the convolutional-layer weight parameters w and bias parameters b of the image scan type recognition network model, with the learning rate decayed by 90% every 20K iterations.
As can be seen from the above, in the embodiments of the present application, tissue images of multiple modalities of a target tissue are acquired, and a quality control network model to be used is determined, where the quality control network model includes a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence. The tissue image features of the multiple modalities are fused based on the cross-channel convolution sub-network to obtain a first fusion feature; dense feature extraction is performed on the first fusion feature based on the convolution sub-network to obtain a second fusion feature; the second fusion feature is classified based on the classification sub-network to obtain a quality recognition result of the tissue images; and the tissue images of the multiple modalities are filtered according to the quality recognition result to obtain filtered tissue images. In this scheme, the idea of dense connection is added to the network model, so that both the width and the depth of the network are taken into account, and the richness and reusability of features within the same modality are increased. Through cross-channel convolution, information from images of multiple modalities can be combined simultaneously, adding features from different modalities. By adopting the hard example boosting sampler, the sampling rate of difficult samples is increased and repeated sampling of difficult samples is realized, improving the efficiency and accuracy of image filtering.
The method described in the above embodiments is further illustrated in detail by way of example.
Referring to fig. 4, in the present embodiment, an example in which the image filtering apparatus is specifically integrated in a network device will be described.
401. A network device acquires tissue images of multiple modalities of a target tissue.
In practical applications, there are various ways to acquire the tissue images of multiple modalities of the target tissue. For example, when the tissue images are an OCT image and a fundus image, the OCT image may be acquired by an optical coherence tomography device and the fundus image by a fundus camera; alternatively, the OCT image and the fundus image may be obtained from local storage, downloaded over a network, or the like.
402. The network device determines a quality control network model to be used.
(1) Determination of the quality control network model.
As shown in fig. 14, the quality control network model may include a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence, where the convolution sub-network includes at least one convolution module and the classification sub-network may include an average pooling layer, a classifier, and the like. The quality control network model may be a network model obtained by improving the Inception V4 network model: the idea of dense connection from the DenseNet network model can be added to the Inception V4 network model, yielding a network model that takes both network width and network depth into account.
Moreover, both the Inception V4 network model and the DenseNet network model take only a single modality (a single image) as input. Since the tissue images may span multiple modalities, a single-structure network model cannot effectively process tissue images of multiple modalities at the same time, and the missing modality information may make the model's classification of part of the data inaccurate. Therefore, by adding a cross-channel convolution sub-network to the quality control network model, information from multiple modalities can be fused, so that the network device can process multi-modal input simultaneously.
(2) Training of the quality control network model.
The image filtering method can further comprise training of the quality control network model.
(a) A training set is obtained.
In practical applications, the training set may be obtained in various manners; for example, the training set may be formed by acquiring sample images through a medical instrument, or may be obtained through a network, a database, a local network, or the like. Specifically, 4476 sample images may be acquired as the training set, accounting for 70% of all images, and 1919 tissue images of the target tissue may be acquired as the test set, accounting for 30%; the sample images in the test set carry no labeled quality information, and the full image set consists of the training set and the test set.
The sample images in the training set can be preprocessed to eliminate irrelevant information in the images, useful real information is recovered, the detectability of the relevant information is enhanced, and the data is simplified to the maximum extent, so that the training accuracy is improved.
In one embodiment, for example, the size of the fundus images in the sample images may be 496 × 496, and the size of the OCT images may be 496 × 496, 496 × 768, or 496 × 1024, and so on. The input images of the quality control network model may be a 496 × 496 fundus image and a 496 × 768 OCT image. Since the OCT images in the sample images come in more than one size, their widths need to be adjusted. For example, when the image width is less than 768, zero filling (black edges) is applied to both sides of the image; when the image width is greater than 768, both sides of the image are cropped symmetrically. All OCT images then have a width of 768, achieving a uniform image size.
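The width-unification rule above can be sketched as follows. The row-of-pixels representation is a simplification; the behavior follows the text: zero-fill (black edges) on both sides when the width is under 768, symmetric cropping on both sides when it is over.

```python
TARGET_WIDTH = 768  # common OCT width stated in the text

def unify_width(row, target=TARGET_WIDTH):
    """Pad or crop a single image row (list of pixel values) to `target`."""
    width = len(row)
    if width < target:                      # zero filling (black edges) on both sides
        pad = target - width
        left = pad // 2
        right = pad - left
        return [0] * left + row + [0] * right
    if width > target:                      # symmetric cropping on both sides
        left = (width - target) // 2
        return row[left:left + target]
    return row

def unify_image(img, target=TARGET_WIDTH):
    """Apply the same pad/crop to every row of a 2-D image."""
    return [unify_width(r, target) for r in img]
```

A 496-wide row gains 136 black pixels on each side; a 1024-wide row loses 128 pixels from each side.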
The size-unified sample images may then be normalized, for example by subtracting the image mean and dividing by the image variance. The images may then undergo random rotation of -30 to +30 degrees, random horizontal flipping, random elastic deformation, or random speckle-noise addition, which increases the amount of sample data, improves the generalization capability of the model, and adds noise data to improve the robustness of the network model.
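A minimal sketch of the normalization step, following the wording above (subtract the image mean, divide by the image variance — many pipelines divide by the standard deviation instead, so the divisor here simply mirrors the text):

```python
def normalize(pixels):
    """Normalize a flat list of pixel values: subtract mean, divide by variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n   # population variance
    if var == 0:                                     # constant image: map to zeros
        return [0.0] * n
    return [(p - mean) / var for p in pixels]
```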
(b) Sampling the training set according to the sampling probability of the sample images to obtain a target training sample.
In practical application, a target training sample including a certain number of sample images can be obtained by sampling from a training set according to the sampling probability of the sample images. For example, when the training set includes M sample images and the sampling probability of the sample images is 1/M, N sample images may be collected and a target training sample including N sample images may be obtained.
In an embodiment, for example, the step of sampling the training set according to the sampling probability of the sample image to obtain the target training sample may be further implemented by a sampler.
In one embodiment, a hard example boosting sampler (Hard Example Boosting Sampler) may be used to sample the training set. For example, the sampler may first initialize the sample images in the training set so that every sample image has the same probability of being sampled, and then sample the training set to obtain the target training sample.
(c) Training the quality control network model based on the target training sample to obtain a trained quality control network model.
In practical application, after a target training sample is obtained by sampling from a training set, the quality control network model can be trained based on the target training sample to obtain a trained quality control network model.
(d) Obtaining the loss functions corresponding to the sample images in the target training sample based on the trained quality control network model.
In practical application, after the quality control network model is trained on the target training sample, the loss functions corresponding to the sample images in the target training sample can be obtained. If the loss function value corresponding to a sample image is large, that sample image is considered a difficult sample.
(e) Updating the sampling probability of the sample images in the target training sample according to the loss functions, and returning to the step of sampling the training set according to the sampling probability of the sample images until the sampling termination condition is met.
In practical application, the sampling probability of the sample image in the target training sample can be updated according to the loss function corresponding to the obtained sample image, and then the sampling step is continued until the sampling termination condition is met.
In order to realize repeated sampling of the difficult samples, the difficult samples in the target training samples can be judged according to the loss function of the sample images, and then the difficult samples are repeatedly sampled, so that the accuracy of the training network model is improved.
In practical application, the sample images in the target training sample can be sorted by their loss function values according to a preset rule, and the sampling probabilities of the sample images updated accordingly. For example, the sample images in the target training sample can be ranked by loss function value; the sample image with the largest loss can be regarded as a difficult sample, and its sampling probability doubled. Because sampling is done with replacement, difficult samples then have a larger probability of being drawn and trained on multiple times.
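The probability-update rule can be sketched as follows. The details (doubling only the single hardest image per batch, raw weights renormalized on demand) are assumptions consistent with the description above, not the patent's exact implementation.

```python
import random

class HardExampleBoostingSampler:
    """Sampling with replacement; the hardest example's probability is doubled."""

    def __init__(self, num_images):
        # initialize: every one of the M sample images starts with equal weight (probability 1/M)
        self.weights = [1.0] * num_images

    def probabilities(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def sample(self, batch_size):
        # draw a target training sample of N images, with replacement,
        # proportional to the current weights
        return random.choices(range(len(self.weights)), weights=self.weights, k=batch_size)

    def update(self, batch_indices, losses):
        # rank the batch by loss (descending) and double the weight of the
        # hardest example so it is more likely to be re-sampled next time
        ranked = sorted(zip(batch_indices, losses), key=lambda pair: -pair[1])
        hardest = ranked[0][0]
        self.weights[hardest] *= 2.0
```

After one update, the hardest image's probability goes from 1/M toward 2/M, matching the doubling rule described above.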
In one embodiment, for example, the parameters of the quality control network model may be initialized from an Inception V4 network model pre-trained on the ImageNet data set, and the newly added convolutional layers may be initialized using a Gaussian distribution with a variance of 0.01 and a mean of 0.
In one embodiment, for example, Adam-based gradient descent may be used to solve for the convolutional-layer weight parameters w and bias parameters b of the quality control network model, with the learning rate decayed by 90% every 20K iterations.
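The stated schedule, decay by 90% every 20K iterations, can be written as a step-decay function. The base learning rate of 1e-3 is an illustrative assumption; the text does not give one.

```python
def learning_rate(iteration, base_lr=1e-3, decay=0.1, step=20_000):
    """Step decay: multiply the rate by `decay` (i.e. reduce it by 90%)
    once every `step` iterations."""
    return base_lr * decay ** (iteration // step)
```

So with the assumed base rate, iterations 0-19999 train at 1e-3, iterations 20000-39999 at 1e-4, and so on.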
403. The network device fuses the tissue image features of the multiple modalities based on the cross-channel convolution sub-network to obtain a first fusion feature.
In practical application, for example, the OCT image and the fundus image may be input into the quality control network model for quality recognition, yielding quality information for both images. The quality control network model can remove clinically unavailable data: images that are blurred, cropped, or incorrectly positioned due to human factors, as well as images affected by non-human factors such as refractive media opacity or high myopia.
As shown in fig. 14, in order to improve the accuracy of the quality identification result of the quality control network model acquiring the tissue image, the quality identification may be performed using a network model including a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network.
In practical applications, for example, the quality control network model may include a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence. The fundus image and the OCT image can be input into a quality control network model, and the characteristics of the fundus image and the characteristics of the OCT image are fused through a cross-channel convolution sub-network to obtain a first fusion characteristic.
As shown in fig. 15, in order to effectively process the fundus image and the OCT image at the same time, features of the fundus image and the OCT image may be fused using a cross-channel convolution sub-network to improve feature diversity, thereby improving accuracy of a network model.
In practical application, for example, as shown in fig. 15, within the cross-channel convolution sub-network a feature extraction unit may extract the features of the OCT image and of the fundus image to obtain a 2 × W × H × C feature map, a modality reshaping unit may reshape this feature map into C × 2 × H × W, and a feature integration unit may merge the bimodal information using a 2 × 1 × 1 three-dimensional convolution to obtain the first fusion feature, thereby completing the cross-channel convolution.
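Per channel and per pixel, a 2 × 1 × 1 convolution over the modality axis reduces to a weighted sum of the two modality values. The sketch below implements exactly that per-channel weighted sum on nested lists; the weights are illustrative assumptions standing in for learned kernel parameters.

```python
def cross_channel_fuse(feat_a, feat_b, w_a=0.5, w_b=0.5):
    """Fuse two modality feature maps (e.g. fundus and OCT) of shape [C][H][W].

    Equivalent, per channel and pixel, to a 2x1x1 convolution across the
    modality axis with kernel weights (w_a, w_b). Returns a [C][H][W] map.
    """
    fused = []
    for ca, cb in zip(feat_a, feat_b):            # iterate over the C channels
        fused.append([[w_a * a + w_b * b for a, b in zip(ra, rb)]
                      for ra, rb in zip(ca, cb)]) # weighted sum per pixel
    return fused
```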
404. The network device performs dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature.
In practical applications, for example, the first fused feature may be input into a convolution sub-network to obtain the second fused feature.
As shown in fig. 14, the convolution sub-network may include a first convolution module, a second convolution module, and a third convolution module. Each convolution module may include a plurality of convolution sub-modules; for example, the first convolution module may include a plurality of first convolution sub-modules. The first fusion feature obtained through the cross-channel convolution sub-network is taken as the current input feature of the first convolution module, and features are extracted from it by the convolution sub-modules in the first convolution module. The features extracted by the first convolution module are then processed by the convolution sub-modules in the second convolution module, whose output is in turn processed by the convolution sub-modules in the third convolution module; finally, the features extracted by the third convolution module serve as the second fusion feature.
For example, in order to increase the reuse rate of features, all convolution sub-modules in the same convolution module may be densely connected, so that every path of features within the same modality module is reused. The first convolution module may include a plurality of first convolution sub-modules connected in a dense-connection manner, with each first convolution sub-module connected to all preceding first convolution sub-modules.
For example, a convolution module may include four convolution sub-modules: a first, a second, a third, and a fourth convolution sub-module, connected in a dense-connection manner and arranged in that order within the module. The current input features of the convolution module are taken as the target input features of the first convolution sub-module.
The four convolution submodules in the convolution module are connected in a dense connection mode, that is, the features extracted by the first convolution submodule can be directly input into the second convolution submodule, the third convolution submodule and the fourth convolution submodule respectively.
The input of the second convolution submodule is the feature extracted by the first convolution submodule, and the feature extracted by the second convolution submodule can be directly input into the third convolution submodule and the fourth convolution submodule respectively.
The input of the third convolution submodule is the features extracted by the first convolution submodule and the second convolution submodule, and the features extracted by the third convolution submodule can be directly input into the fourth convolution submodule.
The input of the fourth convolution submodule is the features extracted by the first convolution submodule, the second convolution submodule and the third convolution submodule, and the features extracted by the fourth convolution submodule can be used as the features extracted by the convolution module.
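The dense-connection wiring of the four sub-modules described above can be sketched as a loop in which each sub-module receives the module input plus every earlier sub-module's output. The per-sub-module transforms are placeholder callables standing in for the actual Inception-style convolutions.

```python
def dense_block(input_features, sub_modules):
    """Run `sub_modules` (callables) with dense connections.

    Each sub-module is called on a list containing the block's input
    features plus the outputs of all preceding sub-modules, mirroring
    the wiring of the four convolution sub-modules described above.
    """
    collected = list(input_features)   # features visible to the next sub-module
    out = None
    for sub_module in sub_modules:
        out = sub_module(collected)    # sees input + all previous outputs
        collected = collected + [out]  # densely forwarded to later sub-modules
    return out                         # the last sub-module's output is the block's
```

With a toy "sum the inputs" sub-module and input `[1]`, the four stages produce 1, 2, 4, 8, showing how each stage consumes everything before it.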
In order to increase feature diversity, a short-circuit connection can be added to the Inception A unit, the Inception B unit, and the Inception C unit in the basic unit (convolution sub-module) of the Inception V4 network model, and the original input features are reused through the short-circuit connection. The target extraction features are obtained by fusing the convolution output features produced by the plurality of convolution branches with the target input features carried over the short-circuit connection; that is, the target input features and the convolution output features are merged by a fusion layer (Filter Concat).
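The short-circuit fusion can be sketched in the same spirit: the fusion layer (Filter Concat) concatenates the outputs of the convolution branches together with the untouched input features, so the original features are reused. The branch transforms here are placeholder callables, not the actual Inception branches.

```python
def inception_unit_with_shortcut(input_features, branches):
    """Run each branch on the input, then Filter Concat the branch
    outputs together with the short-circuited (unmodified) input."""
    branch_outputs = [branch(input_features) for branch in branches]
    fused = []
    for out in branch_outputs:          # concatenate every branch's features
        fused.extend(out)
    fused.extend(input_features)        # the short-circuit connection: reuse input
    return fused
```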
405. The network device classifies the second fusion feature based on the classification sub-network to obtain the quality recognition result of the tissue images.
In practical applications, for example, the second fusion features are input into the classification sub-network, and the quality recognition results of the fundus image and the OCT image are obtained, for example, the probability related to the image quality information may be obtained.
406. The network device filters the tissue images of the multiple modalities according to the quality recognition result to obtain filtered tissue images.
In practical application, after the quality recognition result of the tissue images is obtained, the tissue images can be filtered according to it. For example, clinically unavailable images can be identified from the quality recognition result, such as images that are blurred, cropped, or incorrectly positioned due to human factors, and images affected by non-human factors such as refractive media opacity or high myopia, and these images are filtered out and deleted. The images remaining after deletion are the filtered tissue images, which may include a clinically usable fundus image and a clinically usable OCT image.
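The filtering step can be sketched as a threshold on the quality probabilities produced by the classification sub-network. The 0.5 threshold and the (image, probability) record layout are illustrative assumptions.

```python
def filter_by_quality(image_pairs, threshold=0.5):
    """Keep only images whose 'clinically available' probability clears the threshold.

    image_pairs: list of (image_id, p_available) tuples, where p_available is
    the quality recognition result from the classification sub-network.
    """
    return [img for img, p in image_pairs if p >= threshold]
```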
407. The network device identifies a scan type of the filtered tissue image.
In one embodiment, to improve the accuracy of the scan type for obtaining the filtered tissue image, a network model may be used to obtain the scan type.
(1) Identifying the filtered tissue image based on the image scan type recognition network model to obtain the scan type of the filtered tissue image.
The image scan type recognition network model is a network model capable of obtaining the scan type of the filtered tissue image; for example, it may be a network model obtained by improving the Inception V4 network model.
In practical applications, for example, the clinically usable fundus images among the filtered tissue images can be input into the image scan type recognition network model to obtain their scan types, and fundus images whose scan type does not meet the standard, such as annular scans of the optic papilla or line scans of the macular region, can be removed by the model. The OCT images corresponding to fundus images whose scan type does not meet the standard are deleted, and the remaining OCT images whose scan type meets the standard are the target images.
In order to improve the accuracy of obtaining the scanning type of the filtered tissue image by the image scanning type identification network model, the scanning type identification can be performed by adopting a network model comprising a convolution sub-network and a classification sub-network.
In practical applications, for example, as shown in fig. 20, the image scan type recognition network model may include a convolution sub-network and a classification sub-network connected in sequence. The filtered tissue image can be input into an image scanning type identification network model, the image features are extracted through a convolution sub-network, and then the extracted features are classified through a classification sub-network, so that the scanning type of the filtered tissue image is obtained.
In an embodiment, the idea of DenseNet can be added to the image scan type recognition network model, improving network performance through feature reuse.
Wherein the convolution sub-network comprises at least one convolution module. The convolution module includes a plurality of convolution sub-modules that are densely connected. The convolution sub-module includes a plurality of convolution branches.
In practical applications, for example, in order to increase the reuse rate of features, the convolution sub-modules may be densely connected, each path of features between modules in the same mode may be reused, and features extracted by historical convolution sub-modules before the current convolution sub-module are fused.
In order to improve feature richness, the original features can be reused by adding short-circuit connections, improving the accuracy of the network model.
In practical applications, for example, in order to increase feature diversity, a short-circuit connection may be added to the Inception A unit, the Inception B unit, and the Inception C unit in the basic unit (convolution sub-module) of the Inception V4 network model, and the original features are reused through the short-circuit connection. The convolution sub-module fuses the features output by the plurality of convolution branches with the features carried over the short-circuit connection; that is, the input features and the branch outputs are merged by a fusion layer (Filter Concat).
408. The network device filters the filtered tissue image according to the scan type to obtain a target image.
In practical application, after the scan type of the filtered tissue image is obtained, the filtered tissue image can be filtered according to the scan type to obtain a target image. For example, when the filtered tissue images are clinically available fundus images, fundus images whose scan type does not meet the standard, such as annular scans of the optic papilla or line scans of the macular region, are filtered out and deleted, and the OCT images corresponding to those non-conforming fundus images are deleted as well. The remaining OCT images, whose scan type meets the standard, are the target images.
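The scan-type filtering can be sketched the same way: fundus images whose predicted scan type is outside an accepted set are dropped together with their paired OCT images, and the surviving OCT images are the target images. The type labels and the accepted set below are illustrative assumptions, not the patent's exact taxonomy.

```python
# Assumed label set for illustration; the text only says some scan types
# "do not meet the standard" without enumerating the accepted ones.
ACCEPTED_TYPES = {"macular_line_scan", "papillary_line_scan"}

def filter_by_scan_type(pairs, accepted=ACCEPTED_TYPES):
    """pairs: list of (fundus_id, scan_type, oct_id) triples.

    Returns the OCT ids whose paired fundus image has an accepted scan
    type; non-conforming fundus images and their OCT images are dropped.
    """
    return [oct_id for _, scan_type, oct_id in pairs if scan_type in accepted]
```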
(2) Training of the image scan type recognition network model.
The image filtering method can further comprise the step of training the image scanning type recognition network model.
In practical applications, the training set may be obtained in various ways, for example, the training set may be formed by acquiring sample images through a medical instrument, or may be obtained through a network, a database, a local network, or the like.
In an embodiment, for example, the size of the fundus images in the sample images in the training set may be 496 × 496; the fundus images may be normalized and then subjected to data augmentation, so as to increase the amount of training data, improve the generalization capability of the model, and add noise data to improve the robustness of the model.
In practical applications, a target training sample including a certain number of sample images may be obtained by sampling from a training set.
In an embodiment, sampling from the training set to obtain the target training sample may also be performed by a sampler, for example.
In an embodiment, since an ordinary sampler cannot adjust the sampling probability of difficult samples, a hard example boosting sampler (Hard Example Boosting Sampler) can be used to sample the training set.
In practical application, the sampler initializes all sample images so that each has the same probability of being sampled. For example, suppose B training samples are drawn in total from a training set containing M sample images. Based on the hard example boosting sampler, a training sample can be drawn from the training set as the target training sample, with each sample image sampled at probability 1/M and the target training sample containing N sample images. After training on this sample, the loss functions corresponding to the N sample images in the target training sample are obtained, the sample images are sorted by loss value, and the sampling probability of the sample image with the largest loss is adjusted from 1/M to 2/M, increasing the sampling probability of that difficult sample.
In an embodiment, for example, if after this adjustment the next target training sample again contains the sample image whose probability is now 2/M and its loss remains the largest, its sampling probability may be further adjusted from 2/M to 4/M, again increasing the sampling probability of the difficult sample.
In practical application, the image scan type recognition network model may be trained with the training samples; for example, the training samples may be fed into the image scan type recognition network model to be trained, and the resulting trained model is then used as the image scan type recognition network model.
In one embodiment, for example, the parameters of the image scan type recognition network model may be initialized from an Inception V4 network model pre-trained on the ImageNet data set, and the newly added convolutional layers may be initialized using a Gaussian distribution with a variance of 0.01 and a mean of 0.
In one embodiment, for example, Adam-based gradient descent may be used to solve for the convolutional-layer weight parameters w and bias parameters b of the image scan type recognition network model, with the learning rate decayed by 90% every 20K iterations.
The image filtering method can effectively improve the accuracy of the quality control network model and the image scan type recognition network model, and can be applied to assisting image data cleaning, constructing image databases, and serving as a quality control module in the development of subsequent algorithms.
As shown in fig. 21, the distribution of clinical images was tested on 100 randomly selected images using the image filtering method. The scan types include four categories: the macular line-scan type, the papillary line-scan type, the macular transverse-scan type, and the papillary annular-scan type. The OCT image quality types include two categories: clinically available and poor quality. The experimental results were 74 images of the macular line-scan type, 8 of the papillary line-scan type, 13 of the macular transverse-scan type, and 5 of the papillary annular-scan type; 78 images were clinically available and 22 were of poor quality. The accuracy of the image scan type recognition network model was 99%, and the accuracy of the quality control network model was 99%.
The following table shows that the image scan type recognition network model can greatly improve the accuracy of discriminating unqualified scan types while maintaining the accuracy on papillary and macular scans.
As can be seen from the above, in the embodiment of the present application, a network device acquires tissue images of multiple modalities of a target tissue and determines a quality control network model to be used, where the quality control network model includes a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence. The tissue image features of the multiple modalities are fused based on the cross-channel convolution sub-network to obtain a first fusion feature; dense feature extraction is performed on the first fusion feature based on the convolution sub-network to obtain a second fusion feature; the second fusion feature is classified based on the classification sub-network to obtain a quality recognition result of the tissue images; and the tissue images of the multiple modalities are filtered according to the quality recognition result to obtain filtered tissue images. In this scheme, the idea of dense connection is added to the network model, so that both the width and the depth of the network are taken into account, and the richness and reusability of features within the same modality are increased. Through cross-channel convolution, information from images of multiple modalities can be combined simultaneously, adding features from different modalities. By adopting the hard example boosting sampler, the sampling rate of difficult samples is increased and repeated sampling of difficult samples is realized, improving the efficiency and accuracy of image filtering.
In order to better implement the method, the embodiment of the present application further provides an image filtering apparatus, which may be specifically integrated in a network device, such as a terminal or a server.
For example, as shown in fig. 22, the image filtering apparatus may include an acquisition module 221, a determination module 222, a cross-channel convolution module 223, a convolution module 224, a classification module 225, and a filtering module 226, as follows:
an obtaining module 221, configured to obtain tissue images of multiple modalities of a target tissue;
a determining module 222 for determining a quality control network model to be used, the quality control network model comprising: the cross-channel convolution sub-network, the convolution sub-network and the classification sub-network are connected in sequence;
a cross-channel convolution module 223, configured to fuse tissue image features of multiple modalities based on the cross-channel convolution sub-network to obtain a first fusion feature;
a convolution module 224, configured to perform dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature;
a classification module 225, configured to classify the second fusion feature based on the classification subnetwork, so as to obtain a quality identification result of the tissue image;
a filtering module 226, configured to filter the tissue images in the multiple modalities according to the quality identification result, so as to obtain a filtered tissue image.
In an embodiment, the cross-channel convolution module 223 may be specifically configured to:
extracting tissue image features of a plurality of modalities based on the feature extraction unit;
performing feature reformation on the tissue image features of the plurality of modalities based on the modality reformation unit to obtain reformed features;
and fusing the reformed features based on the feature fusion unit to obtain a first fusion feature.
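The three steps implemented by the cross-channel convolution module above (per-modality extraction, modal reshaping, fusion) can be sketched as a 1x1 cross-channel convolution over stacked modality channels. This is a hedged numpy sketch: the modality names, feature shapes, and channel counts are hypothetical, and the matrix multiply stands in for a learned 1x1 convolution.

```python
import numpy as np

def cross_channel_fuse(modal_features, weights):
    """Fuse per-modality feature maps with a 1x1 cross-channel convolution:
    stack the modalities on the channel axis, then mix channels per pixel."""
    stacked = np.concatenate(modal_features, axis=0)  # (C_total, H, W) — modal reshaping
    c_total, h, w = stacked.shape
    flat = stacked.reshape(c_total, -1)               # (C_total, H*W)
    fused = weights @ flat                            # per-pixel mix across all modalities
    return fused.reshape(-1, h, w)                    # (C_out, H, W) — first fusion feature

rng = np.random.default_rng(1)
modality_a = rng.normal(size=(4, 8, 8))   # hypothetical features of modality A
modality_b = rng.normal(size=(4, 8, 8))   # hypothetical features of modality B
kernel = rng.normal(size=(6, 8))          # stand-in for a learned 1x1 kernel: 8 in -> 6 out
first_fusion = cross_channel_fuse([modality_a, modality_b], kernel)
```

Each output channel is a weighted combination of channels from both modalities at the same pixel, which is how cross-channel convolution combines information from multiple modalities simultaneously.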
In an embodiment, the convolution module 224 may be specifically configured to:
determining the first fusion feature as a current input feature of a current convolution module;
carrying out dense feature extraction on the current input features based on a convolution submodule in the current convolution module;
taking the extracted features as current input features, and selecting the rest convolution modules in the convolution sub-network as current convolution modules;
and returning to the step of executing the dense feature extraction of the current input features based on the convolution submodule in the current convolution module until a first feature extraction termination condition is met, and obtaining second fusion features.
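The dense feature extraction loop described above can be sketched for a single convolution module. This is a minimal numpy sketch under stated assumptions: matrix multiplies stand in for real spatial convolutions, and the growth rate and submodule count are illustrative, not taken from the patent.

```python
import numpy as np

def dense_module(x, sub_weights):
    """One convolution module: each submodule receives the concatenation of
    the module input and all previous submodule outputs (dense connection)."""
    features = [x]                                    # history of extracted features
    for w in sub_weights:
        inp = np.concatenate(features, axis=0)        # fuse history: (C_in, H, W)
        out = np.maximum(w @ inp.reshape(inp.shape[0], -1), 0)  # 1x1 "conv" + ReLU
        features.append(out.reshape(-1, *x.shape[1:]))
    return np.concatenate(features, axis=0)           # module output keeps all features

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 8, 8))                        # e.g. the first fusion feature
# Each submodule adds 2 channels, so submodule i must accept 4 + 2*i input channels:
sub_weights = [rng.normal(size=(2, 4 + 2 * i)) for i in range(3)]
second_fusion = dense_module(x, sub_weights)          # (4 + 2*3, 8, 8)
```

Chaining several such modules, each consuming the previous module's output as its current input feature, reproduces the outer loop of the method: the dense connections increase feature richness and reuse within a modality without requiring a very deep network.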
In an embodiment, referring to fig. 23, the image filtering apparatus may further include:
a scan type obtaining module 227, configured to identify a scan type of the filtered tissue image;
a target image obtaining module 228, configured to filter the filtered tissue image according to the scanning type, so as to obtain a target image.
In an embodiment, the scan type obtaining module 227 may be specifically configured to:
and identifying the filtered tissue image based on an image scanning type identification network model to obtain the scanning type of the filtered tissue image.
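The two-stage filtering implemented by the modules above (quality control first, then scan type) can be sketched as two successive predicate filters. The quality and scan-type labels here are hypothetical stand-ins for the outputs of the two network models.

```python
from dataclasses import dataclass

@dataclass
class TissueImage:
    name: str
    quality: str    # illustrative label from the quality control model
    scan_type: str  # illustrative label from the scan type identification model

def two_stage_filter(images, wanted_types):
    """Stage 1: drop poor-quality images; stage 2: keep only wanted scan types."""
    filtered = [im for im in images if im.quality == "available"]
    return [im for im in filtered if im.scan_type in wanted_types]

images = [
    TissueImage("a", "available", "macular line scan"),
    TissueImage("b", "poor", "macular line scan"),
    TissueImage("c", "available", "papillary line scan"),
]
target = two_stage_filter(images, {"macular line scan"})  # only image "a" survives
```

Splitting the pipeline this way means the scan-type model never sees images the quality control model has already rejected, which is the ordering the embodiment describes.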
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, in the embodiment of the present application, the obtaining module 221 obtains tissue images of multiple modalities of a target tissue, and the determining module 222 determines a quality control network model to be used, where the quality control network model includes a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence. The cross-channel convolution module 223 fuses the tissue image features of the multiple modalities based on the cross-channel convolution sub-network to obtain a first fusion feature; the convolution module 224 performs dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature; the classification module 225 classifies the second fusion feature based on the classification sub-network to obtain a quality identification result of the tissue image; and the filtering module 226 filters the tissue images of the multiple modalities according to the quality identification result to obtain a filtered tissue image. In this scheme, the idea of dense connection is added to the network model, so that both the width and the depth of the network are taken into account, increasing the richness and reusability of features within the same modality. Through cross-channel convolution, information in images of multiple modalities can be combined simultaneously, enriching the features of different modalities. By adopting the novel sampler, the sampling rate of difficult samples is increased and repeated sampling of difficult samples is realized, which improves image filtering efficiency and accuracy.
The embodiment of the present application further provides a network device, which may be a server or a terminal, and integrates any one of the image filtering apparatuses provided in the embodiment of the present application. As shown in fig. 24, fig. 24 is a schematic structural diagram of a network device provided in an embodiment of the present application, specifically:
the network device may include components such as a processor 241 of one or more processing cores, memory 242 of one or more computer-readable storage media, a power supply 243, and an input unit 244. Those skilled in the art will appreciate that the network device configuration shown in fig. 24 does not constitute a limitation of network devices and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 241 is a control center of the network device, connects various parts of the entire network device using various interfaces and lines, performs various functions of the network device and processes data by running or executing software programs and/or modules stored in the memory 242 and calling data stored in the memory 242, thereby performing overall monitoring of the network device. Optionally, processor 241 may include one or more processing cores; preferably, the processor 241 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 241.
The memory 242 may be used to store software programs and modules, and the processor 241 executes various functional applications and data processing by running the software programs and modules stored in the memory 242. The memory 242 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the network device, and the like. Further, the memory 242 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 242 may also include a memory controller to provide the processor 241 access to the memory 242.
The network device further includes a power supply 243 for supplying power to each component. Preferably, the power supply 243 may be logically connected to the processor 241 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The power supply 243 may also include one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The network device may also include an input unit 244, the input unit 244 operable to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the network device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 241 in the network device loads the executable file corresponding to the process of one or more application programs into the memory 242 according to the following instructions, and the processor 241 runs the application programs stored in the memory 242, so as to implement various functions as follows:
acquiring tissue images of a plurality of modalities of a target tissue, and determining a quality control network model to be used, wherein the quality control network model includes a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence; fusing the tissue image features of the multiple modalities based on the cross-channel convolution sub-network to obtain a first fusion feature; performing dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature; classifying the second fusion feature based on the classification sub-network to obtain a quality identification result of the tissue image; and filtering the tissue images of the multiple modalities according to the quality identification result to obtain a filtered tissue image.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, in the embodiments of the present application, tissue images of multiple modalities of a target tissue are acquired, and a quality control network model to be used is determined, where the quality control network model includes a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence. The tissue image features of the multiple modalities are fused based on the cross-channel convolution sub-network to obtain a first fusion feature; dense feature extraction is performed on the first fusion feature based on the convolution sub-network to obtain a second fusion feature; the second fusion feature is classified based on the classification sub-network to obtain a quality identification result of the tissue image; and the tissue images of the multiple modalities are filtered according to the quality identification result to obtain a filtered tissue image. In this scheme, the idea of dense connection is added to the network model, so that both the width and the depth of the network are taken into account, increasing the richness and reusability of features within the same modality. Through cross-channel convolution, information in images of multiple modalities can be combined simultaneously, enriching the features of different modalities. By adopting the novel sampler, the sampling rate of difficult samples is increased and repeated sampling of difficult samples is realized, which improves image filtering efficiency and accuracy.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the image filtering methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
acquiring tissue images of a plurality of modalities of a target tissue, and determining a quality control network model to be used, wherein the quality control network model includes a cross-channel convolution sub-network, a convolution sub-network, and a classification sub-network connected in sequence; fusing the tissue image features of the multiple modalities based on the cross-channel convolution sub-network to obtain a first fusion feature; performing dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature; classifying the second fusion feature based on the classification sub-network to obtain a quality identification result of the tissue image; and filtering the tissue images of the multiple modalities according to the quality identification result to obtain a filtered tissue image.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any image filtering method provided in the embodiments of the present application, the beneficial effects that can be achieved by any image filtering method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The foregoing detailed description is directed to an image filtering method, an image filtering device, and a storage medium provided in embodiments of the present application, and specific examples are applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing embodiments are only used to help understand the methods and core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (12)
1. An image filtering method, comprising:
acquiring tissue images of a plurality of modalities of a target tissue;
determining a quality control network model to be used, the quality control network model comprising: the cross-channel convolution sub-network, the convolution sub-network and the classification sub-network are connected in sequence; wherein the convolution sub-network comprises at least one convolution module and a parallel dimension reduction structure, and the convolution module comprises at least one densely connected convolution sub-module;
fusing the tissue image features of multiple modalities based on the cross-channel convolution sub-network to obtain a first fusion feature;
carrying out dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature;
classifying the second fusion characteristics based on the classification sub-network to obtain a quality identification result of the tissue image;
and filtering the tissue images of the plurality of modes according to the quality identification result to obtain a filtered tissue image.
2. The image filtering method according to claim 1, wherein performing dense feature extraction on the first fused feature based on the convolution sub-network to obtain a second fused feature comprises:
randomly determining a convolution module as a current convolution module;
determining the first fusion feature as a current input feature of a current convolution module;
carrying out dense feature extraction on the current input features based on a convolution submodule in the current convolution module;
taking the extracted features as current input features, and selecting the rest convolution modules in the convolution sub-network as current convolution modules;
and returning to the step of performing dense feature extraction on the current input features based on the convolution submodule in the current convolution module until all the convolution modules in the convolution sub-network perform feature extraction to obtain second fusion features.
3. The image filtering method according to claim 2, wherein performing dense feature extraction on the current input feature based on a convolution submodule in the current convolution module comprises:
randomly determining a convolution submodule of a current convolution module as a current convolution submodule;
determining the current input characteristic of the current convolution module as the target input characteristic of the current convolution submodule;
performing feature extraction on the target input features based on the current convolution submodule to obtain target extraction features;
fusing the target extraction features and history extraction features based on the current convolution submodule, wherein the history extraction features are extracted by all history convolution submodules before the current convolution submodule in the current convolution module;
taking the fused features as target input features, and selecting the other convolution sub-modules in the current convolution module as current convolution sub-modules;
and returning to the step of performing feature extraction on the target input features based on the current convolution submodule to obtain target extraction features until feature extraction is performed on all convolution submodules in the current convolution module.
4. The image filtering method according to claim 3, wherein the convolution sub-module comprises a plurality of convolution branches;
performing feature extraction on the target input features based on the current convolution submodule to obtain target extraction features, wherein the feature extraction includes:
respectively extracting the features of the target input features based on the plurality of convolution branches to obtain a plurality of convolution output features;
and fusing the target input features and the plurality of convolution output features to obtain target extraction features.
5. The image filtering method according to claim 1, wherein the cross-channel convolution sub-network includes a feature extraction unit, a modal reconstruction unit, and a feature fusion unit;
fusing tissue image features of a plurality of modalities based on the cross-channel convolution sub-network to obtain a first fused feature, comprising:
extracting tissue image features of a plurality of modalities based on the feature extraction unit;
performing feature reformation on the tissue image features of the plurality of modalities based on the modality reformation unit to obtain reformed features;
and fusing the reformed features based on the feature fusion unit to obtain a first fusion feature.
6. The image filtering method according to claim 1, further comprising:
identifying a scan type of the filtered tissue image;
and filtering the filtered tissue image according to the scanning type to obtain a target image.
7. The image filtering method according to claim 6, wherein the image scan type recognition network model comprises: a convolution sub-network and a classification sub-network which are connected in sequence; the convolution sub-network comprises at least one convolution module comprising a plurality of densely connected convolution sub-modules;
identifying a scan type of the filtered tissue image, comprising:
and identifying the filtered tissue image based on an image scanning type identification network model to obtain the scanning type of the filtered tissue image.
8. The image filtering method according to claim 1, further comprising:
acquiring a training set, wherein the training set comprises a plurality of training samples;
sampling the training set according to the sampling probability of the sample image to obtain a target training sample;
training a quality control network model based on the target training sample to obtain a trained quality control network model;
obtaining a loss function corresponding to a sample image in the target training sample based on the trained quality control network model;
and updating the sampling probability of the sample image in the target training sample according to the loss function, and returning to the step of sampling the training set according to the sampling probability of the sample image until the sampling termination condition is met.
9. The image filtering method according to claim 8, wherein updating the sampling probability of the sample image in the target training sample according to the loss function comprises:
sequencing the sample images in the target training sample according to a preset rule according to the loss function to obtain the sequencing of the sample images in the target training sample;
and updating the sampling probability of the sample images in the target training sample according to the sequence of the sample images in the target training sample.
10. An image filtering apparatus, comprising:
an acquisition module for acquiring tissue images of a plurality of modalities of a target tissue;
a determination module for determining a quality control network model to be used, the quality control network model comprising: the cross-channel convolution sub-network, the convolution sub-network and the classification sub-network are connected in sequence; wherein the convolution sub-network comprises at least one convolution module and a parallel dimension reduction structure, and the convolution module comprises at least one densely connected convolution sub-module;
the cross-channel convolution module is used for fusing the tissue image characteristics of a plurality of modes based on the cross-channel convolution sub-network to obtain a first fusion characteristic;
the convolution module is used for carrying out dense feature extraction on the first fusion feature based on the convolution sub-network to obtain a second fusion feature;
the classification module is used for classifying the second fusion characteristics based on the classification sub-network to obtain a quality identification result of the tissue image;
and the filtering module is used for filtering the tissue images in the plurality of modes according to the quality identification result to obtain a filtered tissue image.
11. A network device comprising a memory storing instructions and a processor loading the instructions to perform the steps of the method according to any one of claims 1 to 10.
12. A storage medium having stored thereon a computer program, characterized in that, when the computer program is run on a computer, it causes the computer to execute the image filtering method according to any one of claims 1 to 10.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910112755.3A CN109815965B (en) | 2019-02-13 | 2019-02-13 | Image filtering method and device and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109815965A CN109815965A (en) | 2019-05-28 |
| CN109815965B true CN109815965B (en) | 2021-07-06 |
Family
ID=66606526
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910112755.3A Active CN109815965B (en) | 2019-02-13 | 2019-02-13 | Image filtering method and device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109815965B (en) |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110428475B (en) * | 2019-06-21 | 2021-02-05 | 腾讯科技(深圳)有限公司 | Medical image classification method, model training method and server |
| CN110503636B (en) * | 2019-08-06 | 2024-01-26 | 腾讯医疗健康(深圳)有限公司 | Parameter adjustment method, lesion prediction method, parameter adjustment device and electronic equipment |
| CN110458829B (en) * | 2019-08-13 | 2024-01-30 | 腾讯医疗健康(深圳)有限公司 | Image quality control method, device, equipment and storage medium based on artificial intelligence |
| CN110533106A (en) * | 2019-08-30 | 2019-12-03 | 腾讯科技(深圳)有限公司 | Image classification processing method, device and storage medium |
| CN111242920A (en) * | 2020-01-10 | 2020-06-05 | 腾讯科技(深圳)有限公司 | Biological tissue image detection method, device, equipment and medium |
| CN112690809B (en) * | 2020-02-04 | 2021-09-24 | 首都医科大学附属北京友谊医院 | Method, apparatus, server and storage medium for determining cause of equipment abnormality |
| CN111506760B (en) * | 2020-03-30 | 2021-04-20 | 杭州电子科技大学 | A Difficulty Perception-Based Deeply Integrated Metric Image Retrieval Method |
| CN111695616B (en) * | 2020-05-29 | 2024-09-24 | 平安科技(深圳)有限公司 | Lesion classification method based on multi-mode data and related products |
| CN113269279B (en) * | 2021-07-16 | 2021-10-15 | 腾讯科技(深圳)有限公司 | Multimedia content classification method and related device |
| CN119848862A (en) * | 2024-12-17 | 2025-04-18 | 武汉纺织大学 | Code image vulnerability detection method enhanced by pixel row oversampling |
| CN119810574B (en) * | 2025-03-12 | 2025-06-13 | 齐鲁工业大学(山东省科学院) | Improved convolutional neural network-based cloth defect terahertz detection method and system |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103093087A (en) * | 2013-01-05 | 2013-05-08 | 电子科技大学 | Multimodal brain network feature fusion method based on multi-task learning |
| CN106203497A (en) * | 2016-07-01 | 2016-12-07 | 浙江工业大学 | A kind of finger vena area-of-interest method for screening images based on image quality evaluation |
| CN106909905A (en) * | 2017-03-02 | 2017-06-30 | 中科视拓(北京)科技有限公司 | A kind of multi-modal face identification method based on deep learning |
| CN107184224A (en) * | 2017-05-18 | 2017-09-22 | 太原理工大学 | A kind of Lung neoplasm diagnostic method based on bimodal extreme learning machine |
| CN107610123A (en) * | 2017-10-11 | 2018-01-19 | 中共中央办公厅电子科技学院 | A kind of image aesthetic quality evaluation method based on depth convolutional neural networks |
| US9934436B2 (en) * | 2014-05-30 | 2018-04-03 | Leidos Innovations Technology, Inc. | System and method for 3D iris recognition |
| CN108182456A (en) * | 2018-01-23 | 2018-06-19 | 哈工大机器人(合肥)国际创新研究院 | A kind of target detection model and its training method based on deep learning |
| CN108449596A (en) * | 2018-04-17 | 2018-08-24 | 福州大学 | A 3D Stereoscopic Image Quality Assessment Method Combining Aesthetics and Comfort |
| CN108537773A (en) * | 2018-02-11 | 2018-09-14 | 中国科学院苏州生物医学工程技术研究所 | Intelligence auxiliary mirror method for distinguishing is carried out for cancer of pancreas and pancreas inflammatory disease |
| CN109063728A (en) * | 2018-06-20 | 2018-12-21 | 燕山大学 | A kind of fire image deep learning mode identification method |
| CN109285149A (en) * | 2018-09-04 | 2019-01-29 | 杭州比智科技有限公司 | Appraisal procedure, device and the calculating equipment of quality of human face image |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| ATE472140T1 (en) * | 2007-02-28 | 2010-07-15 | Fotonation Vision Ltd | SEPARATION OF DIRECTIONAL ILLUMINATION VARIABILITY IN STATISTICAL FACIAL MODELING BASED ON TEXTURE SPACE DECOMPOSITIONS |
| US9292917B2 (en) * | 2011-11-23 | 2016-03-22 | Siemens Aktiengesellschaft | Method and system for model-based fusion of computed tomography and non-contrasted C-arm computed tomography |
Non-Patent Citations (2)
| Title |
|---|
| Quality Assessment of Transperineal Ultrasound Images of the Male Pelvic Region Using Deep Learning;Saskia Camps 等;《2018 IEEE International Ultrasonics Symposium (IUS)》;20181022;第1-4页 * |
| 基于卷积神经网络的无参考混合失真图像质量评价;武利秀 等;《光 学 技 术》;20180930;第44卷(第5期);第555-561页 * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109815965B (en) | Image filtering method and device and storage medium | |
| Da Rocha et al. | Diabetic retinopathy classification using VGG16 neural network | |
| Yasashvini et al. | Diabetic retinopathy classification using CNN and hybrid deep convolutional neural networks | |
| CN118115466A (en) | Fundus pseudo focus detection method | |
| US20160092721A1 (en) | A system and method for remote medical diagnosis | |
| CN113158821B (en) | Method and device for processing eye detection data based on multiple modes and terminal equipment | |
| Dubey et al. | Recent developments on computer aided systems for diagnosis of diabetic retinopathy: a review | |
| Wan et al. | Retinal image enhancement using cycle-constraint adversarial network | |
| CN118969172A (en) | Bone tumor treatment plan acquisition method, system, terminal and storage medium | |
| Almazroa et al. | An automatic image processing system for glaucoma screening | |
| Kauppi | Eye fundus image analysis for automatic detection of diabetic retinopathy | |
| Tahvildari et al. | Application of artificial intelligence in the diagnosis and management of corneal diseases | |
| Selvathi et al. | Fundus image classification using wavelet based features in detection of glaucoma | |
| Preity et al. | Automated Computationally Intelligent Methods for Ocular Vessel Segmentation and Disease Detection: A Review | |
| Siyah et al. | Early detection and classification of diabetic retinopathy through analysis of retinal medical images using deep learning | |
| Cruz-Vega et al. | Nuclear cataract database for biomedical and machine learning applications | |
| US20220328186A1 (en) | Automatic cervical cancer diagnosis system | |
| Ahmed et al. | Advancing diabetic retinopathy diagnosis: leveraging optical coherence tomography imaging with convolutional neural networks | |
| Manjushree et al. | Automated detection of diabetic retinopathy using deep learning in retinal fundus images: Analysis | |
| Abualigah et al. | Hybrid Classification Approach Utilizing DenseUNet+ for Diabetic Macular Edema Disorder Detection. | |
| US20240020830A1 (en) | System and methods of predicting parkinson's disease based on retinal images using machine learning | |
| Maisea et al. | A review on CNN-based approaches for diabetic retinopathy | |
| Sood et al. | Deep Learning Framework Design for Diabetic Retinopathy Abnormalities Classification | |
| Smits et al. | Machine learning in the detection of the glaucomatous disc and visual field | |
| Song et al. | Spatial-frequency fusion for retinal vessel segmentation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |
