Introduction

In the 3D visualization of medical imaging, the maximum intensity projection (MIP) method provides intuitive images whose display effect is similar to that of X-ray imaging. The algorithm is simple in principle, and the amount of calculation is small. More importantly, MIP can be displayed in real time, so it is widely used in the field of medical imaging, for example, to display the three-dimensional structure of blood vessels, the details of bronchi, and so forth [1,2,3]. In addition, MIP plays an important role in the diagnosis of diseases and in the postprocessing of medical imaging. For instance, Zheng et al. [4] applied MIP images to convolutional neural networks (CNNs) to improve the effectiveness of automatic lung nodule detection. The annotations of 2D MIP images were adopted to enhance the performance of neural networks in segmenting the linear structures of 3D magnetic resonance angiography (MRA) images [5]. Furthermore, Harvey et al. [6] explored the effects of MIP images of cerebral CT angiography (CTA) based on photon-counting CT and showed that the quality of CTA-MIP images is better than that of traditional CT images. Recently, the world has undergone tremendous changes due to the coronavirus disease 2019 (COVID-19). In order to solve this worldwide difficulty as soon as possible, scientists must first better understand the nature of this virus from different perspectives. MIP technology plays a key role in the better understanding and diagnosis of COVID-19 [7,8,9,10].

The MIP technology treats each voxel of the three-dimensional volume data as a small light source. Then, according to the theory of image-space rendering, rays are cast along a certain direction through the data field, and the maximum intensity value encountered along each ray is projected onto the corresponding pixel of the screen to form the final projection [11,12,13]. Because of this projection logic, MIP has several limitations, such as the overlap of blood vessels, bones, and internal organs, or inconspicuous visualization of small blood vessels and trachea caused by unrelated structures with similar intensity. Accordingly, several improved algorithms have been proposed [14,15,16,17]; however, these earlier algorithms are mainly aimed at reducing the amount of calculation or enhancing the depth information of the projection. As a result, after more than 10 or even 20 years, these algorithms have not been widely adopted, and the traditional MIP algorithm is still used in the 3D visualization of medical imaging. Based on these considerations, we consider that the main function of MIP technology is to display the stenosis, dilation, and morphological course of blood vessels or trachea. By now, high-performance post-processing software can display high-quality MIP images well [18], and if the three-dimensionality of the image needs to be displayed, the volume rendering (VR) method can replace the projection method. Therefore, the question worth studying is how to generate MIP images of even better quality, which can show tiny details more clearly. This "from high quality to higher quality" research direction can better promote clinically high-precision diagnosis. However, to the best of our knowledge, even though deep learning has become mainstream in recent years, very few studies have addressed this issue.
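
For readers unfamiliar with this projection logic, the following minimal sketch illustrates the core operation for a parallel projection along one axis of the volume; the array name `slab` and its shape are purely illustrative.

```python
import numpy as np

def maximum_intensity_projection(volume: np.ndarray, axis: int = 0) -> np.ndarray:
    """MIP along the slab axis: for each ray (each (row, col) position),
    keep the maximum intensity encountered across the stacked slices."""
    return volume.max(axis=axis)

# Example: project a slab of 20 slices (values are placeholders).
slab = np.random.rand(20, 512, 512).astype(np.float32)
mip_image = maximum_intensity_projection(slab, axis=0)  # shape (512, 512)
```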

Furthermore, artificial intelligence technology has been widely used in medical image processing, and various network structures have been proposed for image synthesis, image fusion, and so forth [19,20,21,22]. The logic of image synthesis or image fusion is to transform one or several medical images into the desired medical image through neural network technology. In simple terms, the MIP technology aims to synthesize a projection image from volumetric data or a stack of continuous 2D images. Inspired by this, the purpose of the present study is to explore the potential of using neural networks to synthesize projection images and to compare the results with those afforded by the traditional MIP technology. Accordingly, we propose a new network structure based on the hybrid of U-net and radial basis function neural network (U-RBFNN), which combines a deep neural network and a shallow neural network. To this end, we first adopted a U-net to extract features of the input images and generate enhanced output images. Then, these output images were used as parallel inputs of a radial basis function (RBF) neural network to synthesize the final projection image. The characteristic of the proposed U-RBFNN is to fully combine the learning capabilities of different types of neural networks: feature priors are first extracted by the convolutional neural network, point-to-point fusion is then performed by the shallow neural network operating on data points, and finally the learning abilities of the two networks are superimposed. Additionally, our aim is to obtain higher-quality images than traditional MIP images. Accordingly, for training, we adopted a transfer learning style: we first applied a certain blur processing to the initial images and then used traditional MIP images as the gold standard for supervised training. For a better comparison, in addition to the traditional MIP images, we also evaluated two neural network structures based on image fusion. For the experimental datasets, in order to increase the robustness and generality of the proposed algorithm, we verified it on three open datasets covering different human body parts and different slab thicknesses. Finally, in addition to radiologists' subjective observation, five metrics were implemented to objectively evaluate the performance of the different methods. The results demonstrated that the performance of the proposed algorithm was significantly better than that of the traditional MIP technology and the other two neural network-based structures. Overall, the contributions of this study can be summarized as follows:

  1. The present study is the first to introduce neural network technology to synthesize the maximum intensity projection (MIP) images to achieve superior image performance.

  2. This study is the first to combine the convolution-based U-net neural network and the radial basis function neural network (RBFNN). The results proved that the CNN can effectively activate the intelligence of the shallow neural network and achieve good effects.

  3. The results obtained on a large number of open databases demonstrated the robustness, generality, and applicability of the proposed algorithm.

Methods and Materials

The overall flow chart of the proposed hybrid of U-net and radial basis function neural network (U-RBFNN) is shown in Fig. 1. The generation of maximum intensity projection (MIP) images requires specific 3D volumetric data or continuous 2D data. Accordingly, we first fed the 2D images used for MIP into the U-net, and output images with enhanced detail were obtained after feature extraction and up-sampling. Subsequently, these output images were fed into the RBFNN in parallel and, finally, the MIP image was obtained by running the RBFNN.

Fig. 1
figure 1

The detailed flow chart of the proposed method

Radial Basis Function Neural Network

The radial basis function neural network (RBFNN) is a single-hidden-layer feedforward neural network based on function approximation, proposed in the late 1980s. The structure of the classic RBFNN includes three layers: input layer, hidden layer, and output layer (see Fig. 2). With the maturity of the technology, RBFNN has received considerable attention from researchers in various fields due to its simple structure, strong nonlinear approximation ability, and good generalization ability. It is widely used in many research fields, including pattern classification, function approximation, and data mining [23,24,25]. In RBFNN, the Gaussian function is the most commonly used radial basis function to activate the relationship between the input layer and the hidden layer [24]. The expression of the Gaussian function G is as follows (see Eq. 1):

Fig. 2
figure 2

The structure of the traditional radial basis function neural network (RBFNN), which includes three layers: input layer, hidden layer, and output layer

$$G\left({x}_{i},{c}_{pi},{\sigma }_{pi}\right)=\exp\left(-\frac{1}{2{\sigma }_{pi}^{2}}{\parallel {x}_{i}-{c}_{pi}\parallel }^{2}\right)$$
(1)

where \({x}_{i}\) is the input variable of the ith neuron of the input layer; p denotes the pth neuron of the hidden layer, which also corresponds to the pth Gaussian function; \({c}_{pi}\) and \({\sigma }_{pi}\) are the center and width of the pth Gaussian function for the ith input neuron; and \(\parallel \cdot \parallel\) is the Euclidean norm.

After the correlation between each hidden layer neuron and each input layer neuron was activated, the response of a given hidden neuron to the entire input layer was obtained using Eq. (2). Finally, the relationship between the hidden layer neurons and the output layer neurons was a linear weighting (see Eq. 3).

$$R\left(x,c_p,\sigma_p\right)={\textstyle\prod_{i=1}^k}G(x_i,c_{pi},\sigma_{pi})$$
(2)
$$Y_o={\textstyle\sum_{p=1}^l}\omega_{po}R\left(x,c_p,\sigma_p\right)$$
(3)

where \(R\left(x,{c}_{p},{\sigma }_{p}\right)\) is the value of the pth hidden neuron, k is the number of input neurons, Yo is the value of the oth output neuron, l is the number of hidden neurons, and ωpo is the connection weight between the pth neuron of the hidden layer and the oth neuron of the output layer.

Accordingly, each input neuron of the RBFNN actually receives a variable value. In a previous study, we succeeded in substituting medical images of different modalities into the RBFNN for medical image fusion [26]. From the point of view of variable points, we used the pixels at the same position of different modal images to synthesize, through neural network technology, the pixels at the corresponding positions of the fused image. Following the same logic, our aim in the present study was to substitute the pixels at the same position of the continuous 2D data into the RBFNN in parallel. The output layer of the neural network is a single neuron, which represents the pixel at the corresponding position of the MIP image; in turn, the entire projection image can be obtained. In addition, according to our previous results [25, 26], in medical image processing, one of the keys to improving the intelligence of the RBFNN is to effectively select or calculate the feature points of the pending images, which constitute the neurons of the input layer of the neural network. Based on this, in the present study, we applied the U-net neural network as a priori processing for the feature extraction of the input layer of the RBFNN.
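
The following sketch illustrates how Eqs. (1)–(3) are applied pixel-wise to synthesize a projection image from k parallel input slices. The parameter arrays (`centers`, `sigmas`, `weights`) are hypothetical placeholders for the values that would be learned during training; this is an illustration of the forward computation, not the study's released implementation.

```python
import numpy as np

def rbf_hidden(x, centers, sigmas):
    """Eqs. (1)-(2): x has shape (k,), one pixel value per input slice;
    centers and sigmas have shape (l, k), one Gaussian per hidden neuron and input."""
    # Per-input Gaussian responses, Eq. (1)
    g = np.exp(-((x[None, :] - centers) ** 2) / (2.0 * sigmas ** 2))
    # Product over the k inputs gives each hidden-neuron value, Eq. (2)
    return g.prod(axis=1)                               # shape (l,)

def rbf_output(x, centers, sigmas, weights):
    """Eq. (3): linear combination of hidden responses -> one output pixel."""
    return weights @ rbf_hidden(x, centers, sigmas)     # scalar

def synthesize_projection(slices, centers, sigmas, weights):
    """Pixel-wise synthesis of a projection image from k parallel input slices."""
    k, h, w = slices.shape
    out = np.empty((h, w), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            out[i, j] = rbf_output(slices[:, i, j], centers, sigmas, weights)
    return out
```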

U-Net Neural Network

From the image level, many previous studies have applied U-net to better highlight the detailed features of an image and have obtained good effects [27, 28]. The structure of the U-net in this study, including an encoder path and a decoder path, is shown in Fig. 3. The encoder part adopted a typical convolutional neural network structure, with 4 down-sampling steps and 5 levels of convolutions. Each group of convolutions consisted of two convolution operations with a 3 × 3 kernel, each followed by batch normalization and a rectified linear unit (ReLU) activation function. From the 2nd layer to the 5th layer, a down-sampling max pooling operation with a 2 × 2 kernel and a stride of 2 was applied. Through down-sampling, the size of the feature map kept shrinking while the number of channels kept increasing. The decoder part performed the expansion processing: the up-sampling operation (with a 2 × 2 kernel) was performed first, and the result was then concatenated with the feature map of the corresponding layer on the down-sampling path and processed by convolutions. The other convolution operations were consistent with those of the corresponding down-sampling level, until the final image was output through a 1 × 1 convolution. Of note, in order to keep the image size consistent, we implemented zero padding.
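
The following is a minimal PyTorch sketch of an encoder–decoder that matches this description (five convolution levels of two 3 × 3 convolutions with batch normalization and ReLU, four 2 × 2 max-pooling down-samplings, 2 × 2 transposed-convolution up-samplings with skip concatenation, and a final 1 × 1 convolution, with zero padding to preserve the image size). The channel widths are assumptions, since they are not specified in the text.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with zero padding, each followed by BN and ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, widths=(64, 128, 256, 512, 1024)):
        super().__init__()
        self.encoders = nn.ModuleList()
        prev = in_ch
        for w in widths:                                  # 5 convolution levels
            self.encoders.append(double_conv(prev, w))
            prev = w
        self.pool = nn.MaxPool2d(2)                       # 2x2 down-sampling (applied 4 times)
        self.ups, self.decoders = nn.ModuleList(), nn.ModuleList()
        for w in reversed(widths[:-1]):                   # 4 up-sampling steps
            self.ups.append(nn.ConvTranspose2d(prev, w, 2, stride=2))
            self.decoders.append(double_conv(2 * w, w))   # concatenated skip doubles channels
            prev = w
        self.head = nn.Conv2d(prev, out_ch, 1)            # final 1x1 convolution

    def forward(self, x):
        skips = []
        for i, enc in enumerate(self.encoders):
            x = enc(x)
            if i < len(self.encoders) - 1:                # keep feature map before pooling
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))      # skip connection from encoder path
        return self.head(x)
```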

Fig. 3
figure 3

The structure of U-net for activating the intelligence of RBFNN

Experimental Data

To increase the robustness and versatility of the proposed method, we adopted three open datasets. To explore morphological changes of the weak and small bronchi of the lung parenchyma, we used the Lung Image Database Consortium and Image Database Resource Initiative (L-IDRI) dataset [29]. Subsequently, the Cancer Imaging Archive-pancreas (CIA-P) dataset was selected to observe the projection of the shape details of the blood vessels in the liver and pancreas [30]. Finally, the Information Extraction from Images-Magnetic Resonance Angiography (IXI-MRA) dataset was adopted to check the morphological changes of the blood vessels in the brain [31]. To diversify the slab thicknesses of the experimental data, the number of 2D slices used for each projection based on L-IDRI, CIA-P, and IXI-MRA was selected as 10–25, 20–35, and 70–100, respectively, depending on whether it belonged to the training dataset or the test dataset. In addition, for the L-IDRI data, in order to perform the projection processing without being disturbed by irrelevant structures, we extracted the lung parenchyma. First, we transformed the image into a binary image according to a threshold and the "Find" function [32], to roughly distinguish the information inside and outside the lung parenchyma; the threshold was determined as the mean of the maximum and minimum pixel values. Subsequently, the maximum connected component [33] was computed to connect as many small components as possible. Furthermore, for the remaining gaps in the lung parenchyma, we used erosion processing [34] to fill the entire lung parenchyma. Next, in order to prevent unnecessary details from interfering with the final extraction, we removed the interfering details of the original image through hole filling [35] and the Find function. Finally, based on the eroded binary image and the processed original image, we successfully extracted the lung parenchyma with the Find function. The whole processing is depicted in Fig. 4.
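
A hedged sketch of this extraction pipeline, using NumPy, SciPy, and scikit-image, is given below. The thresholding rule (mean of the maximum and minimum pixel values), the largest connected component, the morphological clean-up, and the hole filling follow the steps described above, but the specific function choices and structuring-element size are assumptions rather than the exact "Find"-based implementation used in the study.

```python
import numpy as np
from scipy import ndimage
from skimage import morphology, segmentation

def extract_lung_parenchyma(ct_slice: np.ndarray) -> np.ndarray:
    """Rough sketch of the lung parenchyma extraction described in the text."""
    # 1) Binarize: threshold at the mean of the maximum and minimum pixel values.
    threshold = (ct_slice.max() + ct_slice.min()) / 2.0
    binary = ct_slice < threshold                       # lung air is darker than tissue
    binary = segmentation.clear_border(binary)          # drop air touching the image border

    # 2) Keep the largest connected component (the lung region).
    labels, _ = ndimage.label(binary)
    sizes = np.bincount(labels.ravel()); sizes[0] = 0   # ignore background label 0
    lung_mask = labels == sizes.argmax()

    # 3) Morphological clean-up to bridge small gaps in the mask.
    lung_mask = morphology.binary_closing(lung_mask, morphology.disk(5))

    # 4) Fill remaining holes inside the parenchyma.
    lung_mask = ndimage.binary_fill_holes(lung_mask)

    # 5) Apply the mask to the original slice ("Find"-style indexing).
    return np.where(lung_mask, ct_slice, ct_slice.min())
```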

Fig. 4
figure 4

The whole process of lung parenchyma extraction

Training Processing

In medical image processing based on deep learning, the determination or characterization of the ground truth data has always been a challenging task [36, 37]. In the present study, we aimed to obtain better MIP images than the traditional ones through the proposed neural network; therefore, we had no gold standard to rely on, and a targeted transfer learning strategy was adopted. To train the U-net, according to the characteristics of the key organ structures, gray values, contour and stripe trends, and so forth, we carefully selected 2000, 2000, and 1000 images from the L-IDRI, CIA-P, and IXI-MRA datasets, respectively. Furthermore, we augmented the dataset by applying blurring and adding noise locally or globally. The processed data were used as the input data for training the U-net, while the unprocessed data were used as the ground truth. The Adam optimization algorithm [38] was used to train the U-net. The learning rate ranged from \(10^{-3}\) to \(10^{-6}\): the initial learning rate was \(10^{-3}\), and it decreased exponentially with the increase of epochs until \(10^{-6}\). The batch size was set to 5. In order to prevent overfitting, early stopping based on 8 epochs was implemented. Additionally, the loss function was the mean square error (MSE) [27]. To train the RBFNN, we adopted the classic gradient descent method (GDM) [39], with the learning rate of the shallow network set to 0.01 and the number of iterations in the range of 200–300. The loss function of the RBFNN was also the classic MSE [25]. More importantly, the original 2D images used to synthesize the MIP image were used as the input of the network, and MIP images synthesized by the traditional method were used as the ground truth of the RBFNN. The numbers of ground truth data used for training based on the L-IDRI, CIA-P, and IXI-MRA datasets were 200, 200, and 250, respectively.
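
The U-net training configuration described above can be sketched as follows (Adam optimizer, learning rate decayed exponentially from \(10^{-3}\) toward \(10^{-6}\), batch size 5, early stopping after 8 epochs without improvement, MSE loss). The model and data-loader objects are placeholders; this is an illustrative sketch, not the study's actual training script.

```python
import torch
import torch.nn as nn

def train_unet(model, train_loader, val_loader, device, num_epochs=200, patience=8):
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Exponential decay from 1e-3 toward 1e-6 across the planned epochs.
    gamma = (1e-6 / 1e-3) ** (1.0 / num_epochs)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

    best_val, stagnant_epochs = float("inf"), 0
    for epoch in range(num_epochs):
        model.train()
        for degraded, clean in train_loader:          # (augmented input, ground truth)
            degraded, clean = degraded.to(device), clean.to(device)
            optimizer.zero_grad()
            loss = criterion(model(degraded), clean)
            loss.backward()
            optimizer.step()
        scheduler.step()

        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                           for x, y in val_loader) / len(val_loader)
        if val_loss < best_val:
            best_val, stagnant_epochs = val_loss, 0
        else:
            stagnant_epochs += 1
            if stagnant_epochs >= patience:           # early stop after 8 stagnant epochs
                break
```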

Evaluation Metrics

As mentioned above, to the best of our knowledge, none of the previous studies has applied a neural network algorithm to the MIP technology. Therefore, in order to better assess the proposed algorithm, in addition to comparing it with the traditional MIP (T-MIP), we also compared it with two image fusion–based deep convolutional neural networks: IFCNN-1 [40] and IFCNN-2 [20]. In subjective observation, the radiologists can directly evaluate the shape of a key part under the projection, the stripe texture, and so forth, to judge the quality of the projection. In addition to subjective evaluation, objective indicators are also essential. The most common and classic criteria for evaluating image quality are image sharpness, image contrast, and so on. Since there are no reference images, in the present study we adopted the following five classic no-reference image quality metrics: histogram entropy (HISE) [41], image contrast (CONT) [42], Brenner's (BREN) [43], Tenengrad (TENG) [41], and Tenengrad variance (TENV) [44]. For all metrics, the larger the value, the higher the image quality.
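
For illustration, the Brenner, Tenengrad, and histogram-entropy measures can be computed as in the sketch below. These follow the commonly used definitions of these sharpness measures; the exact variants in the cited references may differ in normalization or kernel choice.

```python
import numpy as np
from scipy import ndimage

def brenner(img: np.ndarray) -> float:
    # Brenner's measure: sum of squared differences between pixels two positions apart.
    img = img.astype(np.float64)
    diff = img[:, 2:] - img[:, :-2]
    return float((diff ** 2).sum())

def tenengrad(img: np.ndarray) -> float:
    # Tenengrad: mean squared Sobel gradient magnitude.
    img = img.astype(np.float64)
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    return float((gx ** 2 + gy ** 2).mean())

def histogram_entropy(img: np.ndarray, bins: int = 256) -> float:
    # Shannon entropy of the gray-level histogram.
    hist, _ = np.histogram(img, bins=bins)
    p = hist.astype(np.float64) / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```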

Results

Results on Visual Observation

For the L-IDRI, CIA-P, and IXI-MRA datasets, we tested 1000, 1000, and 250 data, respectively. Specifically, 3 groups of L-IDRI data (labeled L-IDRI Data-1, L-IDRI Data-2, and L-IDRI Data-3), 2 groups of CIA-P data (labeled CIA-P Data-1 and CIA-P Data-2), and 2 groups of IXI-MRA data (labeled IXI-MRA Data-1 and IXI-MRA Data-2) are shown in Figs. 5, 6, 7, 8, 9, 10 and 11. In Fig. 5, panels a–d denote the projection images obtained by the traditional MIP (T-MIP) method, the two image fusion convolutional neural network methods (IFCNN-1 and IFCNN-2), and the proposed method, respectively. Panels e–h in the second row show the magnified images of the region of interest (ROI) in the respective panels a–d, to make it easier to observe the performances of the methods. Figures 6, 7, 8, 9, 10 and 11 follow the layout of Fig. 5. In terms of visually judging image quality, the most important criterion is image definition. In addition, we also adopted fidelity, to observe whether the projection images were distorted; texture continuity, to observe whether trachea, blood vessels, and other structures in the image were intact; and detail imaging, to observe whether more details, especially tiny structural details, were displayed. In order to increase authority and reduce subjective differences, we randomly selected 10 pieces of data from each dataset, and all the data were judged and compared by four experienced radiologists. Additionally, we adopted a 5-point evaluation method; specifically, for each of the four criteria, the radiologists scored from 0 to 5, with 0 being the lowest and 5 being the highest. For the different types of data, the scores are shown in Tables 1, 2 and 3.

Fig. 5
figure 5

Projection performance of L-IDRI Data-1 based on four methods. a–d The projection images obtained by the T-MIP method, the IFCNN-1 method, the IFCNN-2 method, and the proposed method, respectively. e–h The magnified images of the region of interest (ROI) in the respective a–d

Fig. 6
figure 6

Projection performance of L-IDRI Data-2 based on four methods. a–d The projection images obtained by the T-MIP method, the IFCNN-1 method, the IFCNN-2 method, and the proposed method, respectively. e–h The magnified images of the region of interest (ROI) in the respective a–d

Table 1 The 5-point evaluation table based on the L-IDRI data, where four radiologists scored four metrics. DE, FI, TC, DI, and GT stand for definition, fidelity, texture continuity, detail imaging, and grand total, respectively. Bold represents the best performance

Subsequently, we analyzed the different datasets in further detail. In the three groups of L-IDRI data in Figs. 5, 6 and 7, we clearly observed that the performances of the two IFCNN-based methods were the worst: the images were severely distorted, and the bronchi, especially the tiny bronchi, were blurred. Compared with the traditional MIP method, the proposed method showed superior performance in definition and in the imaging of small tracheal details. As can be seen in Fig. 7, which shows the results after image inversion processing, the proposed method was clearly superior to the other three methods. In addition, according to the scores of the four radiologists, the proposed method obtained the highest score (see Table 1). For the CIA-P dataset, our aim was to observe the projection of blood vessels in the liver. Since the gray values of the blood vessels and of the other tissues around the liver differ relatively little, it was very important to visualize the blood vessels more clearly in order to observe related lesions. As can be seen in Figs. 8 and 9, some small blood vessels were clearly displayed in the image produced by the proposed method but not in those of the other three methods. The comprehensive performance of our method was also optimal according to the 5-point evaluation (see Table 2). Finally, we selected a thick slab to project the blood vessels of the brain. As shown in Figs. 10 and 11, for both blood vessel continuity and microvascular imaging, our proposed method yielded more details without distortion of the image. The results in Table 3 further confirm the superior performance of the proposed method.

Fig. 7
figure 7

Projection performance of L-IDRI Data-3 based on four methods. a–d The projection images obtained by the T-MIP method, the IFCNN-1 method, the IFCNN-2 method, and the proposed method, respectively. e–h The magnified images of the region of interest (ROI) in the respective a–d

Fig. 8
figure 8

Projection performance of CIA-P Data-1 based on four methods. a–d The projection images obtained by the T-MIP method, the IFCNN-1 method, the IFCNN-2 method, and the proposed method, respectively. e–h The magnified images of the region of interest (ROI) in the respective a–d

Fig. 9
figure 9

Projection performance of CIA-P Data-2 based on four methods. a–d The projection images obtained by the T-MIP method, the IFCNN-1 method, the IFCNN-2 method, and the proposed method, respectively. e–h The magnified images of the region of interest (ROI) in the respective a–d

Table 2 The 5-point evaluation table based on the CIA-P data, where four radiologists scored four metrics. DE, FI, TC, DI, and GT stand for definition, fidelity, texture continuity, detail imaging, and grand total, respectively. Bold represents the best performance
Fig. 10
figure 10

Projection performance of IXI-MRA Data-1 based on four methods. a–d The projection images obtained by the T-MIP method, the IFCNN-1 method, the IFCNN-2 method, and the proposed method, respectively. e–h The magnified images of the region of interest (ROI) in the respective a–d

Fig. 11
figure 11

Projection performance of IXI-MRA Data-2 based on four methods. a–d The projection images obtained by the T-MIP method, the IFCNN-1 method, the IFCNN-2 method, and the proposed method, respectively. e–h The magnified images of the region of interest (ROI) in the respective a–d

Table 3 The 5-point evaluation table based on the IXI-MRA data, where four radiologists scored four metrics. DE, FI, TC, DI, and GT stand for definition, fidelity, texture continuity, detail imaging, and grand total, respectively. Bold represents the best performance

Results on No-Reference Image Quality Metrics

In addition to subjective observation, another important criterion is the evaluation of objective metrics. As mentioned above, for the three datasets, we tested 1000 L-IDRI data, 1000 CIA-P data, and 250 IXI-MRA data, respectively. Since there is no reference image, we analyzed the definition of the images produced by the four methods. In order to objectively evaluate the total of 2250 data, we adopted the following five definition-based metrics: histogram entropy (HISE), image contrast (CONT), Brenner's (BREN), Tenengrad (TENG), and Tenengrad variance (TENV). Tables 4, 5 and 6 show the means and standard deviations of the respective test datasets for the five metrics.

Table 4 For the four methods, the mean and standard deviation (SD) performances of the five metrics based on the L-IDRI dataset. Bold represents the best performance
Table 5 For the four methods, the mean and standard deviation (SD) performances of the five metrics based on the CIA-P dataset. Bold represents the best performance
Table 6 For the four methods, the mean and standard deviation (SD) performances of the five metrics based on the IXI-MRA dataset. Bold represents the best performance

Taken together, the proposed method achieved the highest value for almost every metric. The exception was that, for the CIA-P and IXI-MRA datasets, the two IFCNN-based methods achieved better HISE performance. Combined with the image appearance, we conclude that the most likely cause is that the image distortion produced artifacts that interfered with the judgment of HISE. In short, compared with the traditional MIP technology, the proposed method produced higher quality projection images on all three datasets. In addition, all experimental results were statistically significant (p < 0.01).

Discussion

Analysis of the Performances of Two IFCNN Structures for Projection Synthesis

As described in "Results", regardless of subjective observation or objective analysis, the two convolutional neural network methods based on image fusion (IFCNN-1 and IFCNN-2) did not achieve good results for synthesizing MIP images. Simply put, the overall framework of IFCNN-1 is based on the classic fully connected convolutional neural network, while IFCNN-2 uses a fully convolutional network structure with feature map fusion processing. These two methods achieved good results in image fusion, which involves only a few input images. Conversely, IFCNN-1 and IFCNN-2 performed poorly on the MIP task, which requires many parallel input images. This finding can be attributed to two factors. First, the principle of the MIP technology is to project the details or elements with the maximum density, so the feature extraction must be more precisely targeted than in image fusion; the layer-by-layer convolution processing based on windows or patches may blur this recognition. In addition, due to the large number of input images, many parallel network branches are needed, so training such neural networks is a considerable challenge, and under the existing conditions the networks cannot learn effectively.

Analysis of the Performances of the Proposed Network Structures for Projection Synthesis

In a previous study, we successfully applied the point-level radial basis function neural network to medical image fusion. Based on those results, in the present study, we adopted a shallow, simple neural network for the MIP technology, given that the performance of parallel, complex network structures was not good for synthesizing projection images. As is widely known, the traditional shallow neural network (SNN) is not as popular as the convolutional neural network in two-dimensional or even multi-dimensional image processing: due to its simple structure, the SNN cannot effectively learn complex image structures. However, in the present study, we found that when targeted feature point extraction is performed first and the radial basis function neural network (RBFNN) is then implemented to analyze and process these specific feature points, combined with effective learning, the intelligence of the RBFNN can also be effectively activated. More importantly, as the feature points do not need to be trained and the RBFNN operates at the pixel (point) level, the training burden is much smaller. In the field of medical imaging, labeled data are actually not abundant [45]; therefore, it is necessary to investigate training with small sample data. Based on this, we adopted the U-net network to provide the feature points for the RBFNN. Furthermore, in order to show more intuitively why we chose the network structure based on the hybrid of U-net and RBFNN, we conducted another set of comparative experiments on four structures: the traditional MIP method (T-MIP); the hybrid of U-net and traditional MIP (U-MIP), in which the U-net was first implemented to obtain feature or enhanced images, and the projection image was then obtained by applying the traditional MIP method to these enhanced images; the RBFNN-only structure; and the hybrid of U-net and RBFNN (the proposed method). For all structures, we again tested 1000 L-IDRI data. Figure 12 shows the comparison on one of the L-IDRI data based on the four methods, where a–d denote the T-MIP method, the RBFNN-only method, the U-MIP method, and the proposed method, respectively. As can be seen in Fig. 12, the proposed hybrid structure based on U-net and RBFNN is the best in terms of intuitive effects, such as detail imaging and image definition. Not surprisingly, based on the objective judgment of the five image definition metrics, the proposed hybrid structure of U-net and RBFNN yielded the best performance for the 1000 sets of test data (see Fig. 13).

Fig. 12
figure 12

The image performances based on L-IDRI data. a–d The MIP based on the traditional method (T-MIP), the radial basis function neural network-only method (RBFNN), the MIP based on the U-net neural network (U-MIP), and the proposed method, respectively

Fig. 13
figure 13

For L-IDRI dataset, the objective comparison between the proposed three structures and the traditional MIP based on five no-reference quality evaluation metrics. The values of all metrics are the mean values of the test data

Analysis of the Performances of Different Numbers of Hidden Layer Neurons Based on RBFNN

Furthermore, due to factors such as the different types of data, the choice of slab thickness of the input data, and the amount of adopted data, the number of hidden layer neurons in the RBFNN is not fixed. The optimal number of hidden neurons mainly depends on the training data, the input variables, and other factors; however, the amount of calculation and the generalization effect should also be considered [46, 47]. For the L-IDRI dataset, we selected 50 as the number of hidden neurons. To select this number, we conducted a set of comparative experiments with different numbers of hidden layer neurons: 25, 30, 35, 40, 45, 50, 60, 70, and 100. The actual output of each trained neural network was compared with the ideal output (gold standard) through the mean square error (MSE), the peak signal to noise ratio (PSNR), and the structural similarity (SSIM) [48]. The larger the values of PSNR and SSIM, the better (i.e., closer to the ideal output images) the image quality, while for MSE the opposite holds. For a more intuitive observation, we report the mean metric values over all test data. The comparison based on different numbers of neurons is shown in Fig. 14. In terms of the overall performance, 50 neurons obtained the best results. Finally, also taking image factors into account, we chose 50 as the optimal number of hidden layer neurons.
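
A hedged sketch of how such a sweep over candidate hidden-layer sizes could be scored is shown below, using the PSNR and SSIM implementations from scikit-image; `train_rbfnn` and the test arrays are placeholders for the training procedure and data described above.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_hidden_sizes(candidate_sizes, train_rbfnn, test_inputs, test_targets):
    """candidate_sizes, e.g. (25, 30, 35, 40, 45, 50, 60, 70, 100);
    train_rbfnn(l) is assumed to return a model with l hidden neurons."""
    results = {}
    for l in candidate_sizes:
        model = train_rbfnn(l)
        mse, psnr, ssim = [], [], []
        for x, target in zip(test_inputs, test_targets):
            pred = model(x)
            data_range = target.max() - target.min()
            mse.append(np.mean((pred - target) ** 2))
            psnr.append(peak_signal_noise_ratio(target, pred, data_range=data_range))
            ssim.append(structural_similarity(target, pred, data_range=data_range))
        # Report the mean of each metric over the test data, as in Fig. 14.
        results[l] = (np.mean(mse), np.mean(psnr), np.mean(ssim))
    return results
```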

Fig. 14
figure 14

For L-IDRI dataset, based on mean square error (MSE), peak signal to noise ratio (PSNR), and structural similarity (SSIM), the performances of different numbers of hidden layer neurons

Significance of Proposed Method and Further Study

In the present study, we aimed to establish whether the quality of the projection images obtained through the proposed neural network would be better than that of the traditional projection images, which would help doctors determine small lesions more accurately and in a more timely manner. In the field of medical imaging, today's medical equipment and software-based post-processing methods can produce high-quality images; however, in some cases, they still cannot meet the requirements of accurate diagnosis. Therefore, we believe that the use of artificial intelligence technology to further improve already high-quality images is a future development direction. Accordingly, we made a preliminary attempt in the field of synthesized projection images. To the best of our knowledge, our study is the first to apply neural network technology to synthesize projection images. In further research, we will improve the network structure and training methods, apply more clinical data, and strive to obtain higher quality MIP images.

Conclusion

In the present study, we aimed to obtain higher-quality maximum intensity projection (MIP) images to help radiologists diagnose diseases precisely. To this end, we proposed a hybrid structure based on the U-net network and the radial basis function neural network (RBFNN) to synthesize MIP images. Compared with the traditional MIP method and other network structures, through the radiologists' judgment and objective metric analysis, the quality of the images obtained by the proposed method was found to be optimal. In addition, the application of a large amount of data also demonstrated the robustness and generality of the proposed method.