Iterative Reconstruction of Micro Computed Tomography Scans Using Multiple Heterogeneous GPUs
<p>FSAs for flow control of OTFSM calculation for OS-MLTR using heterogeneous GPUs.</p> "> Figure 2
<p>(<b>A</b>) QRM HA phantom. (<b>B</b>) Image reconstructed using OS-MLTR from a full scan condition, and (<b>C</b>) QRM HA phantom five cylindrical sectional view.</p> "> Figure 3
<p>CT images reconstructed with OS-MLTR algorithm method. (SID = 127 mm, SOD = 78.59 mm, Detector size = 1944 × 1536 pixels, detector pixel size = 0.075 × 0.075 mm, reconstruction image = 1944 × 1944 pixels).</p> ">
Abstract
:1. Introduction
2. Background Information
Algorithm 1: OTFSM calculation for OS-MLTR using a single GPU |
Input: Projection, AngleList, StartSlice, EndSlice, BlankProj, TotalProjCnt, SetNumber, SubSetNumber, GPU_ID, SetIterCnt; Output: SliceImage SetCudaDevice(GPU_ID);/Activates a CUDA device with the GPU_ID float ts = SumSM(0→TotalProjCnt);//The term of in Equation (4) float cur_proj_angle; uint SetIndex, SliceN, cur_proj_index; boolean StopCriteria; float SM, SMT; //Dynamic allocation and transpose of the system matrix float U1, U2, cur_guess, ; for SliceN: StartSlice→EndSlice While(! StopCriteria || iter_cnt < SetIterCnt){ for SetIndex: 0→SetNumber //OS-MLTR outer loop SetIndex = 0; U1 = 0; U2 = 0; cur_guess = 0; for SubSetIndex: 0→SubSetNumber //OS-MLTR inner loop cur_proj_index = SubSetIndex × SetNumber + SetIndex; cur_proj_angle = AngleList[cur_proj_index]; rojection[cur_proj_index]; gpu_OTFSM(cur_proj_angle, &SM, &SMT); P = SMT × cur_guess; //Forward-projection = BlankProjexp(−P); U1 = SM × () + U1; //Backprojection U2 = SM × () + U2; cur_guess = (U1U2) + cur_guess; SliceImage = cur_guess; StopCriteria = Check_StoppingCriteria();//Stopping Criteria is RMSE |
3. Materials and Methods
3.1. Proposed Method
Algorithm 2: FSAs for OTFSM calculation for OS-MLTR using heterogeneous GPUs |
Input: Projection, AngleList, StartSlice, EndSlice, BlankProj, TotalProjCnt, SetNumber, SubSetNumber, SetIterCnt; Output: SliceImage struct recon_param_struct; uint SetIndex, SliceN, TotalGPU, StopCriteria; TotalGPU = Gather_available_GPU(); float ts = SumSM(0→TotalProjCnt);//The term of in Equation (4) uint gpu_Idle_queue[TotalGPU], gpu_suspend_queue[TotalGPU];//The flow control queue pthread_t pthd_idle[TotalGPU], pthd_suspend[TotalGPU]; float U1[TotalGPU], U2[TotalGPU], cur_guess; for SliceN: StartSlice→EndSlice While(! StopCriteria || iter_cnt < SetIterCnt) for SetIndex: 0→SetNumber SetIndex = 0; cur_guess = 0; U1[0→TotalGPU] = 0; U2[0→TotalGPU] = 0; Launch_FSA(OSEM_M, OSEM_Inner_S1, OSEM_Inner_s2); While (SetIndex == SetNumber) if(!is_empty(gpu_idle_queue)& is_get_subset_index) recon_param_struct->proj_index=SubSetIndex×SetNumber+SetIndex; gpu_id = pop(gpu_idle_queu); SetIndex++; if(OSEM_Inner_s1.S7) pthd_idle[gpu_id] = pthread_create(func_inner_s1, gpu_id); if(!is_empty(gpu_suspend_queue)& is_ready(cur_guess)) inner_s2_id = pop(gpu_suspend_queue); pthd_suspedn[gpu_id]=pthread_create(func_inner_s2,inner_s2_id); if(check_pthd_join(pthd_idle[indx:0→TotalGPU])) gpu_id = indx; push(gpu_idle_queue, gpu_id); if(check_pthd_join(pthd_suspend[inner_s2_id]) push(gpu_idle_queue, inner_s2_id); if(is_OSEM_M.S5 & !is_empty(gpu_idle_queue) gpu_id = pop(gpu_idle_queu); pthd_idle[gpu_id] = pthread_create(func_outer, gpu_id); if(check_pthd_join(pthd_idle[gpu_id])) gpu_id = indx; push(gpu_idle_queue, gpu_id); SliceImage = cur_guess; StopCriteria = Check_StoppingCriteria();//Stopping Criteria is RMSE |
Algorithm 3: OTFSM calculation |
Function: func_inner_s1(gpu_id, reco_param_struc){ setCudaDevice(gpu_id); uint proj_index = reco_param_struct->proj_index; float SM = recon_param_struc->SM; float SMT = recon_param_struct->SMT; float proj_angle = recon_param_struct->AngleList[proj_index]; gpu_OTFSM(proj_angle, SM, SMT); } |
Algorithm 4: Refinement step for back-projection calculation |
Function: func_inner_s2(gpu_id, cur_guess, recon_param_struct){ setCudaDevice(gpu_id); uint proj_index = reco_param_struct->proj_index; float SM = recon_param_struc->SM; float SMT = recon_param_struct->SMT; float U1 = recon_param_struct->U1; float U2 = recon_param_struct->U2; float y = Projection[proj_index]; float P = SMT × cur_guess; float exp(−P); U1 = SM × () + U1; U2 = SM × () + U2; } |
Algorithm 5: Calculation of the current predicted image |
Function: func_outer(gpu_id, cur_guess, recon_param_struct){ setCudaDevice(gpu_id); float U1 = recon_param_struct->U1; float U2 = recon_param_struct->U2; cur_guess = (U1U2) + cur_guess;} |
3.2. Experiments
4. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Gordon, R.; Bender, R.; Herman, G.T. Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. J. Theor. Biol. 1970, 29, 471–481. [Google Scholar] [CrossRef]
- Herman, G.T.; Meyer, L.B. Algebraic reconstruction techniques can be made computationally efficient [positron emission tomography application]. IEEE Trans. Med. Imaging 1993, 12, 600–609. [Google Scholar] [CrossRef]
- Wei, W.; Biwen, Y.; Jiexian, W. Application of a simultaneous iterations reconstruction technique for a 3-D water vapor tomography system. Geod. Geodyn. 2013, 4, 41–45. [Google Scholar] [CrossRef]
- Andersen, A.H.; Kak, A.C. Simultaneous Algebraic Reconstruction Technique (SART): A superior implementation of the ART algorithm. Ultrason. Imaging 1984, 6, 81–94. [Google Scholar] [CrossRef] [PubMed]
- Shepp, L.A.; Vardi, Y. Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imaging 1982, 1, 113–122. [Google Scholar] [CrossRef] [PubMed]
- Kamphuis, C.; Beekman, F.J. Accelerated iterative transmission CT reconstruction using an ordered subsets convex algorithm. IEEE Trans. Med. Imaging 1998, 17, 1101–1105. [Google Scholar] [CrossRef] [PubMed]
- Levitan, E.; Herman, G.T. A maximum a posteriori probability expectation maximization algorithm for image reconstruction in emission tomography. IEEE Trans. Med. Imaging 1987, 6, 185–192. [Google Scholar] [CrossRef] [PubMed]
- Feldkamp, L.A.; Davis, L.C.; Kress, J.W. Practical cone-beam algorithm. J. Opt. Soc. Am. A 1984, 1, 612–619. [Google Scholar] [CrossRef]
- Klaus, M.; Fang, X.; Neophytos, N. Why do commodity graphics hardware boards (GPUs) work so well for acceleration of computed tomography? In Computational Imaging V; SPIE: Bellingham, WA, USA, 2007; p. 64980N. [Google Scholar]
- Xu, F.; Xu, W.; Jones, M.; Keszthelyi, B.; Sedat, J.; Agard, D.; Mueller, K. On the efficiency of iterative ordered subset reconstruction algorithms for acceleration on GPUs. Comput. Methods Programs Biomed. 2010, 98, 261–270. [Google Scholar] [CrossRef]
- Beister, M.; Kolditz, D.; Kalender, W.A. Iterative reconstruction methods in X-ray CT. Phys. Medica Eur. J. Med. Phys. 2012, 28, 94–108. [Google Scholar] [CrossRef]
- Brodtkorb, A.R.; Hagen, T.R.; Sætra, M.L. Graphics processing unit (GPU) programming strategies and trends in GPU computing. J. Parallel Distrib. Comput. 2013, 73, 4–13. [Google Scholar] [CrossRef]
- Leeser, M.; Mukherjee, S.; Brock, J. Fast reconstruction of 3D volumes from 2D CT projection data with GPUs. BMC Res. Notes 2014, 7, 582. [Google Scholar] [CrossRef] [PubMed]
- Park, H.G.; Shin, Y.G.; Lee, H. A Fully GPU-Based Ray-Driven Backprojector via a Ray-Culling Scheme with Voxel-Level Parallelization for Cone-Beam CT Reconstruction. Technol. Cancer Res. Treat. 2015, 14, 709–720. [Google Scholar] [CrossRef]
- Xie, L.; Hu, Y.; Yan, B.; Wang, L.; Yang, B.; Liu, W.; Zhang, L.; Luo, L.; Shu, H.; Chen, Y. An Effective CUDA Parallelization of Projection in Iterative Tomography Reconstruction. PLoS ONE 2015, 10, e0142184. [Google Scholar] [CrossRef]
- Chen, P.; Wahib, M.; Takizawa, S.; Takano, R.; Matsuoka, S. iFDK: A scalable framework for instant high-resolution image reconstruction. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, 17–19 November 2019; Association for Computing Machinery: Denver, CO, USA, 2019; p. 84. [Google Scholar]
- Hofmann, H.G.; Keck, B.; Rohkohl, C.; Hornegger, J. Comparing performance of many-core CPUs and GPUs for static and motion compensated reconstruction of C-arm CT data. Med. Phys. 2011, 38, 468–473. [Google Scholar] [CrossRef] [PubMed]
- Matenine, D.; Côté, G.; Mascolo-Fortin, J.; Goussard, Y.; Després, P. System matrix computation vs. storage on GPU: A comparative study in cone beam CT. Med. Phys. 2018, 45, 579–588. [Google Scholar] [CrossRef]
- Zhu, Y.; Zhao, Y.; Zhao, X. A multi-thread scheduling method for 3D CT image reconstruction using multi-GPU. J. X-ray Sci. Technol. 2012, 20, 187–197. [Google Scholar] [CrossRef] [PubMed]
- Blas, J.G.; Abella, M.; Isaila, F.; Carretero, J.; Desco, M. Surfing the optimization space of a multiple-GPU parallel implementation of a X-ray tomography reconstruction algorithm. J. Syst. Softw. 2014, 95, 166–175. [Google Scholar] [CrossRef]
- Orr, L.J.; Jimenez, E.S.; Thompson, K.R. Cluster-based approach to a multi-GPU CT reconstruction algorithm. In Proceedings of the 2014 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), Seattle, WA, USA, 8–15 November 2014. [Google Scholar]
- Yu, X.; Nikitin, V.; Ching, D.J.; Aslan, S.; Gürsoy, D.; Biçer, T. Scalable and accurate multi-GPU-based image reconstruction of large-scale ptychography data. Sci. Rep. 2022, 12, 5334. [Google Scholar] [CrossRef]
- Zhu, Y.; Wang, Q.; Li, M.; Jiang, M.; Zhang, P. Image reconstruction by Mumford–Shah regularization for low-dose CT with multi-GPU acceleration. Phys. Med. Biol. 2019, 64, 155017. [Google Scholar] [CrossRef]
- Rockmore, A.J.; Macovski, A. A Maximum Likelihood Approach to Emission Image Reconstruction from Projections. IEEE Trans. Nucl. Sci. 1976, 23, 1428–1432. [Google Scholar] [CrossRef]
- Lange, K.; Carson, R. EM reconstruction algorithms for emission and transmission tomography. J. Comput. Assist. Tomogr. 1984, 8, 306–316. [Google Scholar]
- Mumcuoglu, E.U.; Leahy, R.; Cherry, S.R.; Zhou, Z. Fast gradient-based methods for Bayesian reconstruction of transmission and emission PET images. IEEE Trans. Med. Imaging 1994, 13, 687–701. [Google Scholar] [CrossRef] [PubMed]
- Nuyts, J.; Dupont, P.; Mortelmans, L. Iterative Reconstruction of Transmission Sinograms with Low Signal to Noise Ratio. In Computer Intensive Methods in Control and Signal Processing: The Curse of Dimensionality; Kárný, M., Warwick, K., Eds.; Birkhäuser Boston: Boston, MA, USA, 1997; pp. 237–248. [Google Scholar]
- Nuyts, J.; De Man, B.; Dupont, P.; Defrise, M.; Suetens, P.; Mortelmans, L. Iterative reconstruction for helical CT: A simulation study. Phys. Med. Biol. 1998, 43, 729–737. [Google Scholar] [CrossRef] [PubMed]
- Man, B.D.; Nuyts, J.; Dupont, P.; Marchal, G.; Suetens, P. Reduction of metal streak artifacts in X-ray computed tomography using a transmission maximum a posteriori algorithm. IEEE Trans. Nucl. Sci. 2000, 47, 977–981. [Google Scholar] [CrossRef]
- Fessler, J.A.; Ficaro, E.P.; Clinthorne, N.H.; Lange, K. Grouped-coordinate ascent algorithms for penalized-likelihood transmission image reconstruction. IEEE Trans. Med. Imaging 1997, 16, 166–175. [Google Scholar] [CrossRef]
- Joseph, P.M. An Improved Algorithm for Reprojecting Rays through Pixel Images. IEEE Trans. Med. Imaging 1982, 1, 192–196. [Google Scholar] [CrossRef] [PubMed]
- Siddon, R.L. Fast calculation of the exact radiological path for a three-dimensional CT array. Med. Phys. 1985, 12, 252–255. [Google Scholar] [CrossRef]
State No. | State Name |
---|---|
S0 | INITIAL |
S1 | IDLE |
S2 | CHECK_DEVICE_QUEUE |
S3 | CREATE_THREAD_OF_OSEM_INNER |
S4 | WAIT_OSEM_INNER_COMPLETE |
S5 | OSEM_OUTER |
S6 | OSEM_OUTER_COMPLETE |
Condition | State Trans. | Description |
---|---|---|
C1 | S0→S1 | OS-MLTR parameters are initialized. |
C2 | S1→S2 | Set the flow control to the initial state and build up the idle and suspended queues. |
C3 | S2→S2 | The idle queue is empty. |
C4 | S2→S3 | The idle queue is not empty. |
C5 | S4→S2 | The OSEM inner loop is completed. |
C6 | S3→S4 | The function of the OSEM inner loop is launched. |
C7 | S4→S4 | The OSEM inner loop is not complete. |
C8 | S2→S5 | The OSEM inner loop is complete, and the idle queue is empty. |
C9 | S5→S6 | The function of the OSEM outer loop is launched. |
C10 | S6→S6 | The OSEM outer loop is not complete. |
C11 | S6→S1 | The OSEM outer loop is completed. |
State No. | State Name |
---|---|
S0 | OSEM_S1_INITIAL |
S1 | OSEM_S1_IDLE |
S2 | GET_SUBSET_INDEX |
S3 | GET_IDLE_DEVIC |
S4 | CHECK_DEVICE_MEM |
S5 | WAIT_DEVICE_MEM_CHECK |
S6 | SUSPEND_DEVICE_QUEUE |
S7 | PTH_CREATE_OSEM_S1 |
S8 | STOP_OSEM_S1_FSM |
Condition | State Trans. | Description |
---|---|---|
C1 | S0→S1 | Set image scanning parameters, gather properties of available GPUs, and flush the idle and suspended queues. |
C2 | S1→S8 | SM calculation of the last projection with the subset has been completed. |
C3 | S1→S1 | Waiting for an idle GPU. |
C4 | S1→S2 | The idle queue is not empty. |
C5 | S2→S1 | Waiting for an idle GPU. |
C6 | S2→S3 | Obtain a candidate GPU ID from the idle queue. |
C7 | S3→S3 | Waiting for an idle GPU |
C8 | S3→S4 | Check the onboard memory usage of the candidate GPU |
C9 | S4→S5 | Waiting for the memory usage of the candidate GPU to be checked |
C10 | S5→S6 | The memory usage is over the limit, and the current subset index is abandoned; the GPU ID is pushed into the suspended queue. |
C11 | S5→S7 | Launch a thread task for OTFSM calculation with the candidate GPU |
C12 | S6→S1 | Push the GPU ID into the suspended queue and wait for the current predicted image to be ready |
C13 | S7→S8 | Return to the idle state |
C14 | S8→S8 | Completed OTFSM calculation of the subset |
C15 | S8→S1 | Return to the idle state |
State No. | State Name |
---|---|
S0 | OSEM_S2_IDLE |
S1 | GET_CURRENT_GUESS |
S2 | GET_SUSPEND_DEVICE |
S3 | PTH_CREATE_OSEM_S2 |
S4 | OSEM_INNER_S2_COMPLETE |
S5 | SET_DEVICE_IDEL |
Condition | State Trans. | Description |
---|---|---|
C1 | S0→S1 | The suspended queue is not empty. |
C2 | S0→S0 | The suspended queue is empty. |
C3 | S1→S2 | The current predicted image is ready. |
C4 | S1→S1 | The current predicted image token is busy. |
C5 | S2→S3 | Draw a candidate GPU from the suspended queue, and assign the current predicted token. |
C6 | S2→S2 | Wait until the suspended queue is not empty. |
C7 | S3→S4 | Launch a thread task to update the predicted image with the candidate GPU. |
C8 | S4→S4 | Wait for the GPU to update the newest current predicted image. |
C9 | S4→S5 | Complete the thread task. |
C10 | S5→S0 | Push the candidate GPU back to the idle queue and return to the idle state. |
State No. | State Name |
---|---|
S0 | OSEM_S1_INITIAL |
S1 | OSEM_S1_IDLE |
S2 | SET_JOB_Q |
S3 | PTH_CREATE_OSEM_S1 |
S4 | WAIT_JOB_COMPLETE |
S5 | SUBSET_PENDING |
S6 | STOP_OSEM_S1_FSM |
State No. | State Name |
---|---|
S0 | OSEM_S2_IDLE |
S1 | GET_CURRENT_GUESS |
S3 | PTH_CREATE_OSEM_S2 |
S4 | OSEM_INNER_S2_COMPLETE |
S5 | SET_DEVICE_IDEL |
Condition | State Trans. | Description |
---|---|---|
C1 | S0→S1 | Set the initial value for the threading synchronization control variables |
C2 | S1→S2 | The threading synchronization variable of the device idle is true |
C3 | S1→S6 | The threading synchronization variable of the last projection with the subset is true |
C4 | S2→S3 | Set the degree index in the job queue for the system matrix calculation |
C5 | S2→S1 | The synchronization variable of the device idle is false |
C6 | S2→S6 | The threading synchronization variable of the last projection with the subset is true |
C7 | S3→S4 | Create a threading task to process system matrix calculation with the job queue |
C8 | S4→S6 | The threading synchronization variable of the last projection with the subset is true |
C9 | S4→S5 | Wait for a threading task of the system matrix calculation completion |
C10 | S5→S1 | Set the threading synchronization variable of the device suspend that waits for refinement operation |
C11 | S6→S6 | Wait for the refinement operation completion |
C12 | S6→S1 | The threading synchronization variable of the last projection with the subset is true, and the refinement operation completion |
Condition | State Trans. | Description |
---|---|---|
C1 | S0→S0 | Wait for the system matrix calculation completion of the job queue |
C2 | S0→S1 | System matrix calculation completion |
C3 | S1→S1 | Wait for current guess image is ready and to do the refinement operation |
C4 | S1→S2 | Current guess image ready and create a thread task to do refinement operation |
C5 | S2→S2 | The device threading synchronization variable of refinement operation completion is false |
C6 | S2→S3 | The device threading synchronization variable of refinement operation completion is true |
C7 | S3→S3 | Wait for the refinement operation completion |
C8 | S3→S4 | The refinement operation completion |
C9 | S4→S0 | Set the threading synchronization of device status to idle |
Nvidia GPU Type | CUDA Core | Memory Size | Memory Width | Bandwidth |
---|---|---|---|---|
Titan-Xp | 3840 | 12 GB | 384 bits | 547 GB/s |
GeForce GTX1060 | 1280 | 6 GB | 192 bits | 192 GB/s |
GeForce GTX1050 | 768 | 4 GB | 128 bits | 112 GB/s |
Scenario | GPUs 4 | 243 × 243 | 486 × 486 | 972 × 972 | 1944 × 1944 | 3888 × 3888 |
---|---|---|---|---|---|---|
A 5 | 1 | 7.27 | 17.05 | 40.58 | 127.83 | 479.69 |
B 6 | 1, 2, 3 | 9.84 | 25.24 | 74.43 | 279.76 | 923.18 |
C 7 | 1, 1, 3 | 1.39 | 3.11 | 7.91 | 24.68 | 95.30 |
D 7 | 1, 2, 3 | 2.10 | 5.23 | 11.99 | 34.66 | 124.20 |
E 8 | 1, 1 | 1.32 | 2.80 | 7.17 | 21.60 | 81.12 |
F 8 | 1, 2 | 1.98 | 4.09 | 9.58 | 29.00 | 114.73 |
G 8 | 1, 3 | 1.76 | 3.89 | 9.99 | 31.30 | 122.52 |
H 8 | 2, 3 | 2.54 | 6.05 | 17.59 | 64.37 | 275.21 |
I 8 | 3, 3 | 2.43 | 6.21 | 19.56 | 76.81 | 333.29 |
SNR 1,2 | Run Time 4 | HA0 | HA100 | HA200 | HA400 | HA800 |
---|---|---|---|---|---|---|
OS-MLTR_s1 3 | 6484.32 | 20.7417 | 26.3722 | 30.8319 | 40.2266 | 58.2329 |
OS-MLTR_s5 3 | 1369.56 | 10.4133 | 13.3618 | 15.4639 | 20.2634 | 29.3281 |
OS-MLTR_s10 3 | 762.36 | 8.6491 | 11.0100 | 12.8613 | 16.7671 | 24.3092 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chou, W.-H.; Wu, C.-H.; Jin, S.-C.; Chen, J.-C. Iterative Reconstruction of Micro Computed Tomography Scans Using Multiple Heterogeneous GPUs. Sensors 2024, 24, 1947. https://doi.org/10.3390/s24061947
Chou W-H, Wu C-H, Jin S-C, Chen J-C. Iterative Reconstruction of Micro Computed Tomography Scans Using Multiple Heterogeneous GPUs. Sensors. 2024; 24(6):1947. https://doi.org/10.3390/s24061947
Chicago/Turabian StyleChou, Wen-Hsiang, Cheng-Han Wu, Shih-Chun Jin, and Jyh-Cheng Chen. 2024. "Iterative Reconstruction of Micro Computed Tomography Scans Using Multiple Heterogeneous GPUs" Sensors 24, no. 6: 1947. https://doi.org/10.3390/s24061947