Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described fully below with reference to the drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the present application.
The following terms are used herein, see table 1.
TABLE 1
Example 1
An embodiment of the present application provides a convolutional neural network model which comprises at least a first module, a second module, a third module, a fourth module, a fifth module and a sixth module connected in sequence, as shown in fig. 1.
The first module is configured to reduce the number of channels of the received target feature map;
the second module and the third module are each configured to perform, in sequence, a first conversion on the reduced number of channels and a dimension reduction on the size of the target feature map;
the fourth module is configured to perform a second conversion based on the first-converted number of channels;
the fifth module is configured to obtain a feature vector from each channel plane after the second conversion;
and the sixth module is configured to output a classification result according to the feature vector.
Specifically, the target feature map is obtained by preprocessing a target image in which both micro factors and macro factors must be considered. In some examples, the preprocessing may include: cutting the target image into a plurality of image blocks, extracting a feature vector from each image block, and generating the target feature map corresponding to the target image from these feature vectors. Further, generating the target feature map from the feature vectors may include: obtaining the feature length of the image blocks, and stitching the feature vectors according to that feature length to generate the target feature map corresponding to the target image. Compared with images that can be processed by a general convolutional neural network model in the related art, the target feature map has a much larger number of channels and a much smaller size. Other details of the target feature map are given in the third embodiment and are not repeated here.
It will be appreciated that, in the related art, a target image of this kind is a macro image and occupies a large amount of memory; a pathological image, for example, typically occupies several hundred megabytes (MB) or even several gigabytes (GB). If the whole sample image corresponding to the target image is input to a convolutional neural network model at one time, the video memory of a common GPU is far from sufficient, so training and learning of the model cannot be carried out in practice. If instead the whole sample image is split into a plurality of sample image blocks which are learned independently, the associations among the blocks are ignored, and this lack of macroscopic consideration of the whole sample image leads to the narrow conclusion of "seeing the trees but not the forest". Therefore, with the existing convolutional neural network models in the related art, the accuracy of feature recognition on pathological images is low. In the embodiment of the present application, the feature recognition of 1 pathological image is converted into the feature recognition of 1 pathological feature map, so the data size of the pathological image is reduced by roughly three orders of magnitude. For example, if the pathology image is segmented into pathology image blocks of 2048×2048 with feature length 2048, the reduction ratio is 2048×2048×3/(2048×4) = 1536. Therefore, when the convolutional neural network model performs feature recognition on the pathological feature map, the consumption of GPU video memory is greatly reduced while recognition accuracy is maintained.
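The data-reduction arithmetic above can be checked directly. This is a sketch; the factor 4 is assumed here to mean 4 bytes per float32 feature value, which the text does not state explicitly:

```python
# One 2048x2048 RGB pathology image block: 1 byte per channel per pixel.
block_bytes = 2048 * 2048 * 3

# Its extracted feature vector: 2048 values, assumed stored as float32
# (4 bytes each) -- this interpretation of the "x4" is an assumption.
feature_bytes = 2048 * 4

reduction = block_bytes / feature_bytes
print(reduction)  # -> 1536.0
```

This confirms the factor of 1536 quoted in the text.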
In the embodiment of the present application, the target image is taken as a pathological image as an example. However, it will be understood by those skilled in the art that other types of images that need to consider micro-factors and macro-factors in the images may also be used as inputs to the convolutional neural network model provided in embodiments of the present application, and the types of the target images are not specifically limited in the embodiments of the present application.
In the embodiment of the present application, a pathological feature map of the pathological image is used as the input of the convolutional neural network model. The dimension of the pathological feature map is M×N×K, where K is the feature length of each pathological image block and is a fixed value. In practice, pathological images differ in size, so the values of M and N differ between images and are not fixed. A convolutional neural network as known in the related art cannot process such input directly: the number of channels of an image it can handle is 1 (a grayscale image) or 3 (a color RGB image), the width × height of the images it handles is much larger than that of the pathological feature map, and the feature content of such images survives scaling largely intact. In summary, in order to expand the receptive field over the pathological image and combine its micro features and macro features, the embodiments of the present application provide a convolutional neural network model designed around the characteristics of the input pathological feature map: unfixed size, many channels and small spatial extent.
The convolutional neural network model provided by the embodiment of the present application is suitable for pathological images. Specifically, the first module first reduces the number of channels of the received target feature map, for example reducing the number of channels of the pathological feature map to 1/4 of the original; the second module and the third module then, in sequence, perform a first conversion on the reduced number of channels and a dimension reduction on the size of the target feature map, reducing the size of the pathological feature map to 1/2 of the original; after the fourth module performs a second conversion based on the first-converted number of channels, the fifth module obtains a feature vector from each channel plane, so that the sixth module can output a classification result according to that feature vector.
The sixth module is a classifier, which may be binary or multi-class; this is not specifically limited here. When the target image is a pathological image, a binary classifier is suitable for screening for a specific tumor, such as papillary thyroid carcinoma, where the binary result is negative or positive; a multi-class classifier is suitable for scenarios of screening for multiple tumors, where the multi-class result may be a tumor classification, a tumor grade, or the like.
Compared with the related art, the specific structure of the convolutional neural network model provided by the embodiment of the present application is redesigned so that it is suitable for processing the target feature map obtained by preprocessing a target image in which both micro factors and macro factors must be considered.
Example 2
This embodiment of the present application provides a specific implementation of the convolutional neural network model according to the first embodiment; see figs. 1 and 2.
The first module comprises a convolution operation layer, a normalization layer and a nonlinear activation function layer;
the second module and the third module each comprise a convolution module and a down-convolution block;
the fourth module comprises the convolution module, the normalization layer and the nonlinear activation function layer;
the fifth module comprises a global maximum pooling layer and a full connection layer;
the sixth module is a classifier.
For ease of understanding, the structure of the convolutional neural network model is described below with an example in which the number of channels of the input pathological feature map is 2048 and a 2-class classification identifies whether the pathological image shows papillary carcinoma.
First module: a two-dimensional convolution layer with 3×3 kernels reduces the number of channels of the pathological feature map from 2048 to 512, followed in sequence by a normalization (BN) layer and a nonlinear activation function (ReLU) layer. The BN layer improves the stability of the convolutional neural network model, and the ReLU layer improves its nonlinear learning capacity.
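A minimal PyTorch sketch of this first module (an illustration under assumptions, not the authors' code; the padding choice and layer names are assumed):

```python
import torch
import torch.nn as nn

# First module: 3x3 convolution reducing 2048 channels to 512,
# followed by BatchNorm and ReLU, as described above.
first_module = nn.Sequential(
    nn.Conv2d(2048, 512, kernel_size=3, padding=1),  # channel reduction 2048 -> 512
    nn.BatchNorm2d(512),                             # BN layer: training stability
    nn.ReLU(inplace=True),                           # ReLU layer: nonlinearity
)

# A pathological feature map with M=20 rows, N=30 columns, K=2048 features per tile.
x = torch.randn(1, 2048, 20, 30)
y = first_module(x)
print(y.shape)  # torch.Size([1, 512, 20, 30])
```

With `padding=1` the spatial size is preserved, so only the channel count changes at this stage.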
Second module: one convolution module ConvGroup raises the number of channels of the pathological feature map from 512 to 1024, and one down-convolution block reduce_block then lowers it from 1024 back to 512; the down-convolution block reduce_block also halves the width and height of the pathological feature map, reducing its spatial dimension.
Third module: identical in structure and function to the second module. It increases the depth of the convolutional neural network model and improves its fitting capacity.
Fourth module: one convolution module ConvGroup raises the number of channels of the pathological feature map from 512 to 1024, followed in sequence by a normalization (BN) layer and a nonlinear activation function (ReLU) layer. As before, the BN layer improves the stability of the convolutional neural network model, and the ReLU layer improves its nonlinear learning capacity.
Fifth module: a two-dimensional global max-pooling layer GlobalMaxPool2d takes the maximum value over each channel plane of the pathological feature map (whose dimension is M×N×K), yielding one feature vector of length 1024; a fully connected layer Linear then reduces this 1024-length vector to length 2, so that the sixth module can perform binary classification.
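A sketch of this fifth module in PyTorch (illustrative; `AdaptiveMaxPool2d(1)` is one standard way to express global max pooling):

```python
import torch
import torch.nn as nn

# Fifth module sketch: per-channel global max pooling collapses each of the
# 1024 channel planes to its maximum value, then a fixed-size fully connected
# layer maps the resulting 1024-vector to 2 logits for binary classification.
fifth_module = nn.Sequential(
    nn.AdaptiveMaxPool2d(1),   # (B, 1024, M, N) -> (B, 1024, 1, 1)
    nn.Flatten(),              # -> (B, 1024)
    nn.Linear(1024, 2),        # -> (B, 2)
)

# Works for any spatial size, which is why feature maps of varying M, N are accepted.
for m, n in [(10, 12), (60, 33)]:
    out = fifth_module(torch.randn(1, 1024, m, n))
    print(out.shape)  # torch.Size([1, 2]) in both cases
```

The global pooling step is what decouples the classifier head from the unfixed M and N of the input.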
It can be understood that, before the result finally enters the sixth module for output, the fifth module applies per-channel global max pooling followed by a fixed-size fully connected layer, which is what allows the convolutional neural network to accept pathological feature maps of different sizes as input. Because the pathological feature map has many channels but a small spatial size, the model reduces the spatial size of the pathological feature map only gradually during processing.
Sixth module: may be a Softmax classifier. For example, the classification result of the pathological feature map can be output through a Softmax classifier; this classification result is the identification result, e.g. the probability of positive or negative. The normalization formula of the Softmax classifier is as follows:

a_ij = exp(z_ij) / Σ_{c=1..C} exp(z_ic)

In the training stage of the convolutional neural network model, training can be performed based on a plurality of pathological sample images. Here a_ij denotes the probability that the i-th pathology sample image belongs to the j-th class; z_ij denotes the feature value of the i-th pathology sample image on the j-th class; and C denotes the total number of classes. Similarly, in the use stage of the convolutional neural network model, inference can be performed on 1 batch of pathological images, that is, a plurality of pathological images at a time; this is not described again here.
In some examples, the length-2 output for each pathology image may have the vector format: (probability that the pathology image is a normal pathology image; probability that the pathology image is a papillary carcinoma slice). In practical application, the class corresponding to the index of the maximum classification probability in this vector may be taken as the classification result of the pathology image.
Further, in some embodiments of the present application, the convolution module may include a convolution block and a feature channel stitching module; the convolution block comprises the normalization layer, the nonlinear activation function layer and the two-dimensional convolution operation layer which are sequentially connected.
Fig. 3 shows the processing flow of the convolution module. One convolution module ConvGroup comprises 2 convolution blocks. For an input feature x, the convolution module first generates a feature y1 through a convolution block conv_block; the feature-channel stitching module then concatenates x and y1 along the feature-channel dimension, and the stitched output is denoted x‖y1. Taking x‖y1 as input, another convolution block conv_block generates a feature y2; the feature-channel stitching module then concatenates x, y1 and y2 along the feature-channel dimension, and the stitched output is denoted x‖y1‖y2.
Fig. 4 shows the processing flow of the convolution block. Each convolution block comprises a normalization (BN) layer, a nonlinear activation function (ReLU) layer and a two-dimensional convolution (Conv2d) layer connected in sequence, where the Conv2d layer may specifically use a 3×3 convolution kernel.
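The pre-activation convolution block and the two-block ConvGroup with channel concatenation can be sketched as follows. This is an illustrative reconstruction: the growth of 256 channels per block is inferred from the 512 → 1024 channel increase in the second module, since 512 + 2×256 = 1024.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # BN -> ReLU -> 3x3 Conv2d, the pre-activation ordering of Fig. 4.
    return nn.Sequential(
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
    )

class ConvGroup(nn.Module):
    """Two conv_blocks with dense channel concatenation: x -> x||y1 -> x||y1||y2."""
    def __init__(self, in_ch, growth):
        super().__init__()
        self.block1 = conv_block(in_ch, growth)
        self.block2 = conv_block(in_ch + growth, growth)

    def forward(self, x):
        y1 = self.block1(x)
        y2 = self.block2(torch.cat([x, y1], dim=1))  # input is x||y1
        return torch.cat([x, y1, y2], dim=1)         # output is x||y1||y2

group = ConvGroup(in_ch=512, growth=256)             # 512 + 2*256 = 1024 channels out
out = group(torch.randn(1, 512, 16, 16))
print(out.shape)  # torch.Size([1, 1024, 16, 16])
```

The concatenation scheme resembles a DenseNet-style block: each conv_block sees all earlier features, and the channel count grows additively.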
In some embodiments of the present application, the down convolution block includes the normalization layer, the two-dimensional convolution operation layer, and an average pooling layer connected in sequence.
Fig. 5 shows the processing flow of the down-convolution block. The down-convolution block reduce_block is specifically a channel/size reduction block comprising a normalization (BN) layer, a two-dimensional convolution (Conv2d) layer and a two-dimensional average pooling (AvgPool2d) layer connected in sequence. The Conv2d layer may specifically use a 1×1 convolution kernel to reduce the number of channels, and the AvgPool2d layer reduces the spatial size of the pathological feature map, i.e. halves its width and height.
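A sketch of the reduce_block for the 1024 → 512 channel reduction used by the second and third modules (illustrative; a 2×2 pooling window is assumed for the halving):

```python
import torch
import torch.nn as nn

# Down-convolution block sketch: BN -> 1x1 Conv2d (channel reduction)
# -> 2x2 average pooling (halves width and height), per Fig. 5.
def reduce_block(in_ch, out_ch):
    return nn.Sequential(
        nn.BatchNorm2d(in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),  # 1x1 kernel reduces channels only
        nn.AvgPool2d(kernel_size=2),              # halves the spatial size
    )

block = reduce_block(1024, 512)                   # 1024 -> 512 channels
out = block(torch.randn(1, 1024, 32, 32))
print(out.shape)  # torch.Size([1, 512, 16, 16])
```

The 1×1 convolution touches only the channel dimension, leaving the spatial reduction entirely to the pooling layer.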
Compared with the related art, the specific structure of the convolutional neural network model is redesigned to suit the processing of the target feature map obtained by preprocessing a target image in which both micro factors and macro factors must be considered.
In practical application, thyroid pathology images were used as the data set to predict whether an image shows papillary carcinoma; the identification result is a binary negative/positive result. The data set contains 1245 thyroid pathology images in total, each carrying a negative or positive label: 1019 positive images and 226 negative images. The samples were randomly divided at a 7:3 ratio (training set : validation/test set = 871 : 374) and the convolutional neural network model was trained, yielding the confusion matrix of the prediction results shown in Table 2 below:
TABLE 2
The corresponding evaluation metrics are shown in Table 3 below:
TABLE 3
Compared with other algorithms, the convolutional neural network model provided by the embodiment of the present application achieves higher recognition accuracy, as shown in Table 4 below:
TABLE 4
Example 3
The third embodiment of the present application is an application embodiment provided according to any one or more of the first and second embodiments. In this embodiment of the present application, the target image is specifically a pathological image.
Some embodiments of the present application provide a system for identifying a pathological image, where the system includes an image segmentation unit, a feature extraction unit, an image feature generation unit, and an image input unit;
the image segmentation unit is used for segmenting the pathological image into a plurality of pathological image blocks;
the feature extraction unit is used for extracting pathological features of the pathological image blocks;
the image feature generation unit is used for generating a pathological feature map corresponding to the pathological image according to the pathological feature;
the image input unit is configured to input the pathological feature map to a pre-trained convolutional neural network model according to any one of the first to second embodiments, so that the convolutional neural network model performs feature recognition on the pathological feature map, and a recognition result is obtained.
The respective units described above are specifically described below.
In some examples, a pathological slice of cell tissue can be scanned by a microscope, generating a pathological image that occupies a large amount of memory; the scanned pathological image is in svs format and can be opened and browsed with tool software such as ASAP. The image segmentation unit may then segment the pathology image into M×N pathology image blocks, where M denotes the number of rows and N the number of columns, as shown in fig. 6. The size of a pathology image block may be 512×512, 1024×1024, 2048×2048, or the like. In some examples, the image segmentation unit may be the OpenSlide library interface; for example, the OpenSlide library interface may be used to segment an svs-format pathology image.
For example, assuming the pathological image has a size of W×H pixels and the image segmentation unit segments it into pathology image blocks of 2048×2048, the number of rows is M = ⌈H/2048⌉ and the number of columns is N = ⌈W/2048⌉, where ⌈·⌉ denotes rounding up. In some examples, edge portions of the pathology image with fewer than 2048 remaining rows and/or columns of pixels may be padded with blanks.
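The tile-count computation can be sketched as follows (the mapping of image height to rows and width to columns is an assumption, and the example slide size is hypothetical):

```python
import math

def tile_counts(width, height, tile=2048):
    """Rows M and columns N of tile x tile blocks covering a W x H image.
    Edge blocks with fewer than `tile` pixels are assumed padded with blanks."""
    m = math.ceil(height / tile)  # number of rows
    n = math.ceil(width / tile)   # number of columns
    return m, n

# A hypothetical whole-slide image of 100000 x 60000 pixels:
print(tile_counts(100000, 60000))  # (30, 49)
```

The ceiling accounts for the blank-padded edge tiles, so every pixel of the slide is covered.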
In some examples, the pathological features of the pathology image blocks may be extracted by the feature extraction unit. The feature extraction unit may be a feature extraction model trained on the basis of a convolutional neural network disclosed in the related art, such as ResNet. For example, the ResNet50 convolutional neural network disclosed in the related art may be taken, its last fully connected layer and all layers after it deleted, and further training performed to obtain the feature extraction model. The implementation of the feature extraction model mainly comprises a training stage and a use stage.
Specifically, in the training stage, transfer learning can be used: a ResNet convolutional neural network whose weight parameters were determined on the ImageNet image set serves as a pre-trained model, and a large number of pathology image blocks are then input to this pre-trained model for fine-tuning, thereby obtaining a feature extraction model for extracting the pathological features of the pathology image blocks. In some examples, before inputting the pathology image blocks for fine-tuning, a simple manual labeling of whether a lesion area exists in each block can be performed; in some other examples, the feature extraction model may instead be trained in an unsupervised manner, which the embodiments of the present application do not specifically limit. In the use stage, the pathology image blocks of a pathology image are input to the feature extraction model, which performs feature extraction on each block to obtain a feature vector representing its pathological features, for example F ∈ R^2048, where R denotes the real numbers and 2048 is the dimension of feature extraction with the ResNet50 convolutional neural network.
Further, in some embodiments of the present application, the image feature generating unit may include an obtaining module and a stitching module; the obtaining module is configured to obtain the feature length of the pathology image blocks, and the stitching module is configured to stitch the feature vectors of the pathology image blocks according to that feature length to generate the pathological feature map corresponding to the pathology image.
For example, the obtaining module may determine the feature length of each pathology image block from the dimension of feature extraction performed by the ResNet convolutional neural network; with ResNet50, for example, the feature extraction dimension is 2048, so the feature length of each pathology image block is 2048. The stitching module then stitches according to the arrangement of the pathology image blocks and the feature length of each block to generate the pathological feature map corresponding to the pathology image. For example, if the pathology image is segmented into M×N blocks, the dimension of the pathological feature map is M×N×K, where K denotes the feature length of each pathology image block.
After preprocessing by the image segmentation unit, the feature extraction unit and the image feature generation unit, each pathology image forms 1 corresponding M×N×K-dimensional pathological feature map, where K is a fixed value; as mentioned above, with ResNet50 the feature extraction dimension is 2048, so the feature length of each block is 2048. In practice pathology images differ in size, so the values of M and N differ from image to image. Sample statistics over actual thyroid pathology images show that the values of M and N are mainly distributed in [10, 60], and typically M ≠ N.
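The stitching step itself amounts to arranging the per-tile vectors on the M×N grid and moving the feature length into the channel axis (a sketch with random stand-in features; the channels-first layout is an assumption matching common 2D-CNN input conventions):

```python
import numpy as np

# Assemble a pathological feature map from per-tile feature vectors: given an
# M x N grid of tiles, each with a K=2048-dim feature, stack them into a
# K x M x N array (channels-first, ready for a 2D CNN).
M, N, K = 12, 47, 2048
tile_features = np.random.randn(M, N, K).astype(np.float32)  # (M, N, K)

feature_map = np.transpose(tile_features, (2, 0, 1))          # (K, M, N)
print(feature_map.shape)  # (2048, 12, 47)
```

Each "pixel" of the resulting map is one tile's descriptor, which is why the map has many channels but a small spatial extent.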
Further, the image input unit may input the pathological feature map of dimension M×N×K to a pre-trained convolutional neural network model according to any one of the first and second embodiments, so that the convolutional neural network model performs feature recognition on the pathological feature map and obtains a recognition result.
Compared with the related art, the pathological image identification system provided by the embodiment of the present application performs feature recognition on pathological images as follows: segmenting the pathology image into a plurality of pathology image blocks and extracting their pathological features satisfies the need for microscopic characteristics, such as local nuclear morphology, within the pathology image; generating a pathological feature map from those pathological features satisfies the need for macroscopic characteristics, such as the growth pattern of cells in certain areas; and converting the feature recognition of 1 pathology image into the feature recognition of 1 pathological feature map reduces the data size of the pathology image by roughly three orders of magnitude. For example, if the pathology image is segmented into pathology image blocks of 2048×2048 with feature length 2048, the reduction ratio is 2048×2048×3/(2048×4) = 1536. Therefore, when the convolutional neural network model performs feature recognition on the pathological feature map, the consumption of GPU video memory is greatly reduced while recognition accuracy is maintained.
Example 4
Some embodiments of the present application provide a method for identifying a pathological image, which is applied to the system according to the third embodiment, and the method may include the following steps, as shown in fig. 7:
step S101, segmenting the pathological image into a plurality of pathological image blocks;
step S102, extracting pathological features of the pathological image blocks;
step S103, generating a pathological feature map corresponding to the pathological image according to the pathological feature;
step S104, inputting the pathological feature map to a pre-trained convolutional neural network model according to any one of the above embodiments, so that the convolutional neural network model performs feature recognition on the pathological feature map to obtain a recognition result.
It should be noted that the implementation details of the third embodiment apply equally to this embodiment and, to avoid repetition, are not described again here.
In addition, an embodiment of the present application further provides an electronic device, the structure of which is shown in fig. 7. The device includes a memory 11 for storing computer readable instructions and a processor 12 for executing them; when executed by the processor, the computer readable instructions trigger the processor to execute the identification method described above.
The methods and/or embodiments of the present application may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. The above-described functions defined in the method of the present application are performed when the computer program is executed by a processing unit.
It should be noted that, the computer readable medium described in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowchart or block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or which may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more computer-readable instructions executable by a processor to implement the steps of the methods and/or techniques of the various embodiments of the present application described above.
In a typical configuration of the present application, the terminals and the devices of the service network each include one or more processors (CPUs), graphics processing units (GPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
In addition, an embodiment of the present application also provides a computer program stored in a computer device, which, when executed, causes the computer device to perform the method described above.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In some embodiments, the software programs of the present application may be executed by a processor to implement the above steps or functions. Likewise, the software programs of the present application (including associated data structures) may be stored on a computer-readable recording medium, such as RAM, a magnetic or optical drive, a diskette, and the like. In addition, some steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the apparatus claims may also be implemented by a single unit or means in software or hardware. The terms "first", "second", and the like are used to denote names and do not imply any particular order.