This project attempts to remove bone structures from plain-film X-Ray images to make the soft tissue structures easier to appreciate.
A general PyTorch environment:
conda create --name fastai
conda activate fastai
conda install -c pytorch -c fastai fastai
pip3 install torchvision
conda install -c conda-forge pydicom
Check utilization of the GPU:
nvidia-smi -q -g 0 -d UTILIZATION -l
Integration of physics-based visualization with machine learning to improve medical image-based decision support
Together with classification, image segmentation and fused overlays, we demonstrate that changing the appearance of an image is an effective tool to help clinicians detect abnormalities and disease processes. Image reading is primarily the domain of trained radiologists; some processes are difficult to detect on imaging but important to rule out if a patient presents with general symptoms such as shortness of breath. In such a case the radiologist will attempt to establish the individual characteristics in the image relevant to the most common, most likely disease process. Once that process is ruled out, given the evidence in the image, he or she proceeds to the next likely process and inspects the image under a changed set of characteristics until a cause can be established.
This process is guided by favorable conditions during the visual inspection of the images. Room illumination is kept low and constant, high-quality digital screens (10-bit) are used in a fixed arrangement, image brightness and contrast are either preset or calculated from the image intensity histogram (based on the proportion of the volume of interest covered by the image modality), and a fixed disease-specific workflow is set up automatically by the reading station based on the type of medical image detected. This includes the arrangement of data views as multi-planar reconstructions, the set of measurement tools suitable for the current workflow and, if appropriate, the use of inverted views.
Over time new visualization features have been added to the set of tools provided by image viewing systems. One such example is cine loops for dynamic data; another is automatic registration for prior-current comparison workflows. Both techniques are general enough to be useful in more than a single workflow and require specific input data features to become active. Following the same general strategy, we aim to enhance the capabilities of a general purpose PACS station with novel decision support tools based on image data. The questions addressed are a) how can we integrate advanced image adjustments into established workflows, b) how can we make the introduction of novel data processing techniques safe, transparent and easy to interpret by the user/radiologist, which includes presenting the inherent limitations of the specific data processing pipeline, and c) can we make a data visualization steerable by the user so that it becomes useful in a staged interrogation of a given image modality.
As outlined above, both the image presentation system and the user attempt to keep the appearance of the image constant. Overall brightness and contrast are kept fixed at a level that makes the relevant features in the image most salient and therefore easiest to interpret. Under these circumstances, how then can we introduce changes in image appearance that are useful to the expert user? A simple example is computed tomography (CT) generated volumetric images, which can be viewed with different sets of brightness and contrast settings specifically tuned to viewing the lungs, soft tissue, or bone structures. A single image modality is therefore used in more than one staged interrogation of the patient.
Image appearance in this case is also affected by the image reconstruction algorithm. The image can be reconstructed using specialized, tissue-appropriate reconstruction kernels, all using the same captured projection images. Whereas each of these reconstructed images is best at highlighting specific tissue types (e.g. bone in bone reconstruction kernel images), the separation of tissue density into different image intensity ranges as well as the calibration of the image intensities remains a general feature of CT, permitting the use of brightness and contrast to highlight specific structures. In imaging systems such settings are accessible through predefined keyboard shortcuts, limiting the need for manual adjustment. More specialized mouse-based tools are used to define local image regions of interest; they trigger the recalculation of image brightness and contrast appropriate to the regional image statistics.
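As an illustration of such window/level presets and region-of-interest driven adjustment, the sketch below maps a CT slice in Hounsfield units to an 8-bit display range. The preset values are typical textbook numbers and the percentile-based ROI window is a simple assumption, not the behavior of any specific reading station.

```python
import numpy as np

# Illustrative window/level presets in Hounsfield units: (center, width).
# These are common textbook values, not taken from a specific vendor.
PRESETS = {
    'lung':        (-600, 1500),
    'soft_tissue': (40, 400),
    'bone':        (300, 1500),
}

def apply_window(hu_slice, center, width):
    """Map a HU image to an 8-bit display range using a window center/width."""
    lo, hi = center - width / 2, center + width / 2
    windowed = np.clip(hu_slice, lo, hi)
    return ((windowed - lo) / (hi - lo) * 255).astype(np.uint8)

def window_from_roi(hu_slice, roi_mask, p_lo=1, p_hi=99):
    """Derive a window from regional image statistics (percentiles of an ROI)."""
    lo, hi = np.percentile(hu_slice[roi_mask], [p_lo, p_hi])
    return (lo + hi) / 2, hi - lo   # center, width

# Example: bone view of a slice stored as a HU numpy array
# bone_view = apply_window(ct_slice, *PRESETS['bone'])
```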
Other image modalities such as MR do not share this correlation of structural components with image intensity ranges. They are complementary in use: they provide, for example, superior contrast for soft-tissue compartments, can be tailored to highlight dynamic processes such as perfusion or neuronal activation using blood-oxygen-level dependent imaging, or can be used to map out tissue micro-structure as in diffusion tensor imaging. In general each MR image modality is specifically tailored to map out a different domain and is acquired and viewed in sequence based on the presenting symptoms of the patient. Whereas intensity range adjustment cannot in general be used to highlight different features in these images, automated contrast adjustments are applied to provide a uniform appearance for image reading based on the body part and modality imaged.
Other examples of image modalities for which it is difficult to provide tailored views that highlight specific anatomy are 2D plain-film images such as plain-film X-Rays. As single projection images of the body, individual structures such as bones overlap in the picture with soft tissue and, for example, lung tissue regions. Such images of the chest are prescribed for patients who present, for example, with general breathing problems. Easy to administer, the obtained images can help rule out broken bones, airway obstructions or a partially collapsed lung. As an example of the usefulness of adjustable images in clinical practice we will focus on plain-film X-Ray images, as they are widely used in clinical practice and difficult to read even for experienced radiologists. We will show how to enhance individual features in these images to support a staged radiology read, and we will illustrate how such image adjustments can be introduced into clinical practice to provide a reading experience that is easy to understand and interpret.
Some of the changes implemented by our method produce subtle effects in the output images. On the one hand this is by design, to allow the experienced radiologist to apply prior training to the image interpretation; hopefully this will also increase the acceptance of these methods in clinical practice. On the other hand, especially for tasks that are difficult to read, such as lung tissue regions with low contrast in the presence of heavily obstructing bone structures, our algorithm can generate images that appear similar to structures indicating other diseases. It is therefore especially important to provide context for the implemented image adjustments to the reader and to make sure that the image adjustments accurately reflect existing structures. Interpretability and transparency of the methods are addressed by focusing on image enhancements, tight control of the generated training data, and improvements in the presentation of images during the radiology read.
We use five computed tomography datasets (spiral CT, GE) with varying resolution (2x 512 by 512 by 0.5 mm, 2x 512 by 512 by 3 mm, 1x 256 by 256 by 6 mm), all reconstructed with a standard soft-tissue kernel.
Improving the salience of image features such as bone structures and lung tissue in plain-film X-Ray images can be achieved by enhancing or diminishing the contrast of texture features linked to these imaged body structures; the same features are used by the experienced radiologist to establish diagnostically relevant information. Machine learning, and in particular deep learning with the U-net architecture, is an established tool for training image processing tasks using supervised learning with a perceptual feature loss. Given appropriate training data, such models can be applied as image filters to enhance input images and highlight trained features.
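As a concrete illustration of a perceptual feature loss, the sketch below combines a pixel-wise L1 term with VGG-16 feature (perceptual) and Gram-matrix terms, loosely following the fastai super-resolution example. The layer indices and weights are illustrative choices, not values taken from this work.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16_bn

def gram(x):
    # Gram matrix of feature maps, normalized by their size
    n, c, h, w = x.shape
    f = x.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

class FeatureGramLoss(nn.Module):
    """Pixel L1 + VGG-16 feature (perceptual) + Gram-matrix loss."""
    def __init__(self, layer_ids=(22, 32, 42), weights=(5.0, 15.0, 2.0)):
        super().__init__()
        vgg = vgg16_bn(pretrained=True).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.layer_ids = set(layer_ids)   # ReLU layers before the last three max-pools
        self.weights = list(weights)

    def _features(self, x):
        feats, out = [], x
        for i, layer in enumerate(self.vgg):
            out = layer(out)
            if i in self.layer_ids:
                feats.append(out)
        return feats

    def forward(self, pred, target):
        loss = F.l1_loss(pred, target)
        for fp, ft, w in zip(self._features(pred), self._features(target), self.weights):
            loss = loss + w * F.l1_loss(fp, ft)                           # feature (perceptual) term
            loss = loss + (w ** 2) * 5e3 * F.l1_loss(gram(fp), gram(ft))  # Gram (texture) term
        return loss
```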
{\bf Generation of training data:} The essential task in training these supervised networks is the generation of appropriate training data: pairs of input and expected output images. This leads to the need to specify the expected image appropriate for a given step in the radiology workflow. Imagine, in the case of plain-film X-Rays, that we could image the chest of a patient once and then, in the same pose, image the same patient with the bones removed. Imagine also that we could selectively enhance the density of the lung tissue in the patient and again, in the same pose, produce another plain-film X-Ray image. Such tissue specific adjustments would alter the visual appearance of the image in well defined ways. So how, in practice, do we remove the bones from patients for imaging? How do we increase the density of lung tissue? Remember that these steps only have to be performed for the generation of training, test and validation data. Once the algorithm has learned to identify individual components in the image based on location or texture features, it can be applied to novel data with the hope of generating images optimized for each workflow step of the radiology read.
In this work we use a physically based rendering technique to generate what-if images suitable for supervised learning. The appearance of plain-film X-Ray images can be simulated using volumetric CT data and digitally reconstructed radiograph (DRR) rendering. The scanned CT data is placed in a virtual position resembling the pose used for plain-film X-Ray images. Using the density information of the volumetric data, a projection image is formed by DRR rendering that resembles the appearance of X-Ray images. By changing the position and orientation of the CT volume in the view frame, a series of images is generated as input images for the supervised learning. The target image for each input image is generated as follows. Bone removal is trivially implemented in CT images by using a fixed Hounsfield unit range based on the density of bone structures. Volume elements that belong to bone structures according to this criterion are replaced with a fixed density value suitable for soft tissue (Amira software, ThermoFisher). A lung segmentation algorithm is used to automatically segment lung tissue in the CT volume data (in-house software for lung segmentation available at github.com/MMIV-CENTER/LungSegmentation). The image intensity in the lung region of interest is adjusted to increase the brightness of higher density areas, without affecting the image intensity in other organs. Amira software (ThermoFisher) is used for the composition of the intensity modulated output images as well as for the digitally reconstructed radiograph (DRR) rendering of original data and intensity modulated volumes in the same pose. For each of the four patient datasets 100 equally spaced pose positions are generated covering a rotation angle of
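The volume manipulations described above were carried out in Amira; the minimal numpy sketch below illustrates the same ideas. The bone threshold, the soft-tissue replacement value, the lung gain and the simplified parallel-beam DRR are all illustrative assumptions, not the values or renderer used in this work.

```python
import numpy as np

BONE_HU_THRESHOLD = 300   # assumed lower bound for bone density (HU); value not given in the text
SOFT_TISSUE_HU = 40       # assumed replacement density for removed bone voxels

def remove_bones(ct_hu):
    """Replace voxels above the bone threshold with a soft-tissue density."""
    out = ct_hu.copy()
    out[out >= BONE_HU_THRESHOLD] = SOFT_TISSUE_HU
    return out

def enhance_lung(ct_hu, lung_mask, gain=0.5):
    """Brighten higher density areas inside the lung mask only (hypothetical adjustment)."""
    out = ct_hu.astype(np.float32).copy()
    lung = out[lung_mask]
    out[lung_mask] = lung + gain * (lung - lung.min())
    return out

def simple_drr(ct_hu, axis=1):
    """Crude parallel-beam DRR: line integral of attenuation along one axis
    (the renderings in this work were produced with Amira, not this function)."""
    mu = np.clip((ct_hu + 1000.0) / 1000.0, 0.0, None)   # rough HU -> relative attenuation
    projection = mu.sum(axis=axis)
    return np.exp(-0.02 * projection)                     # arbitrary scaling for display
```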
The image pairs are used to train an image-to-image machine learning model of the kind used for super-resolution, in particular a U-net architecture with a ResNet-34 encoder, a feature and Gram-matrix loss, and one-cycle learning with 60 iterations. Such models are used for general image generation tasks. In this case we train the model to retain the input resolution (8-bit grayscale, 1024 by 1024) and to change only its appearance. The training is done using data augmentation and a 10% hold-out set to monitor training progress.
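A minimal fastai (v1) training sketch along these lines is shown below. The folder layout, batch size and learning rate are assumptions, and the perceptual loss refers to the feature/Gram sketch above rather than the exact configuration used in this work.

```python
from fastai.vision import *

# Hypothetical folder layout: DRR renderings of the original volumes and the
# tissue-adjusted target renderings share file names in two parallel folders.
path_in = Path('data/drr_original')
path_out = Path('data/drr_adjusted')

src = (ImageImageList.from_folder(path_in)
       .split_by_rand_pct(0.1, seed=42)                  # 10% hold-out to monitor training
       .label_from_func(lambda x: path_out / x.name))

data = (src.transform(get_transforms(), size=1024, tfm_y=True)  # augmentation on input and target
        .databunch(bs=2)
        .normalize(imagenet_stats, do_y=True))

feat_loss = FeatureGramLoss().to(defaults.device)        # perceptual feature/Gram loss as sketched earlier

learn = unet_learner(data, models.resnet34, loss_func=feat_loss,
                     blur=True, norm_type=NormType.Weight)
learn.fit_one_cycle(60, max_lr=1e-3)                     # one-cycle schedule, 60 cycles
```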
In order to provide context to the reader we selected a data presentation approach that allows for a direct comparison of the original image and the derived image with tissue specific adjustments. Using an animation/cine loop, the original image is blended into the image whose appearance has been adjusted for the specific task. This type of presentation is available in most general purpose image reading systems and is therefore familiar to the reader, who can control the speed of the presentation.
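One simple way to generate such a blend sequence offline, for example as an animated loop the reader can step through, is sketched below; the frame count, timing and file names are illustrative.

```python
import numpy as np
import imageio

def blend_loop(original, adjusted, n_frames=20):
    """Linearly blend from the original image to the adjusted image and back."""
    alphas = np.concatenate([np.linspace(0, 1, n_frames), np.linspace(1, 0, n_frames)])
    frames = []
    for a in alphas:
        frame = (1 - a) * original.astype(np.float32) + a * adjusted.astype(np.float32)
        frames.append(np.clip(frame, 0, 255).astype(np.uint8))
    return frames

# Example: save as an animated GIF (file name and timing are arbitrary)
# imageio.mimsave('blend_loop.gif', blend_loop(original, adjusted), duration=0.1)
```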
Image enhancement stands in contrast to the task of segmentation, which usually generates masks for selected image regions to support a quantitative imaging workflow. Inter- and intra-operator variability makes it difficult to arrive at generally agreed-upon segmentation results. Workflows to adjust generated masks are also difficult to establish, but they promise adjustable pipelines that can use reader feedback to improve and adapt the segmentation in order to keep up with changes in imaging equipment, reconstruction software, and edge cases that might be underrepresented in the initial training data. By focusing on image enhancement, an arguably easier task, our framework provides access to machine learning algorithms that support existing decision workflows.
The approach of generating what-if training data for supervised learning is very efficient in terms of its automated generation of high quality, well controlled training data from a few volumetric CT images using pose adjustments. Our approach generalizes to other organs in the body. A variety of regional enhancement strategies can also be used, making it easy to adapt the approach to individual workflow steps. As an example, workflow steps focused on a specific organ in the field of view can use any published or commercially available segmentation algorithm to create masks suitable for adjusting contrast properties per organ. The combination of segmentation algorithms, physics-based rendering and deep learning allows for previously impossible what-if images, such as patients without bones. By replacing components of this workflow with alternatives, we hope that our pipeline allows the design of new applications that deliver images suitable for a larger number of clinical workflows. The integration of our images into medical viewing stations allows for the effective and transparent evaluation of the component segmentation and machine learning algorithms.