Feature Selection for Automatic Tuberculosis Screening in Frontal Chest Radiographs

Article in Journal of Medical Systems · June 2018

DOI: 10.1007/s10916-018-0991-9



J Med Syst (2018) 42: 146
https://doi.org/10.1007/s10916-018-0991-9

SYSTEMS-LEVEL QUALITY IMPROVEMENT

Feature Selection for Automatic Tuberculosis Screening in Frontal Chest Radiographs
Szilárd Vajda1 · Alexandros Karargyris2 · Stefan Jaeger4 · K.C. Santosh3 · Sema Candemir4 · Zhiyun Xue4 ·
Sameer Antani4 · George Thoma4

Received: 9 October 2017 / Accepted: 12 June 2018 / Published online: 29 June 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract
To detect pulmonary abnormalities such as Tuberculosis (TB), automatic analysis and classification of chest radiographs can serve as a reliable alternative to more sophisticated and technologically demanding methods (e.g. culture or sputum smear analysis). In target areas such as Kenya, TB is highly prevalent and often co-occurs with HIV, while resources and medical assistance are limited. In these regions, an automatic screening system can provide a cost-effective solution for a large rural population. Our fully automatic TB screening system processes incoming CXRs (chest X-rays) by first applying image preprocessing techniques to enhance image quality, followed by an adaptive segmentation based on model selection. The delineated lung regions are described by a multitude of image features. These characteristics are then optimized by a feature selection strategy to provide the best description for the classifier, which finally decides whether the analyzed image is normal or abnormal. Our goal is to find the optimal feature set from a larger pool of generic image features, originally used for problems such as object detection and image retrieval. For performance evaluation, the area under the ROC curve (AUC) and accuracy (ACC) were considered. Using a neural network classifier on two publicly available data collections, namely the Montgomery and the Shenzhen datasets, we achieved a maximum AUC of 0.99 and a maximum accuracy of 97.03%. Further, we compared our results with existing state-of-the-art systems and with radiologists' decisions.

Keywords Tuberculosis · Chest x-ray · Automatic chest x-ray analysis · Feature selection · Neural networks · HOG ·
Automatic TB screening

https://ceb.nlm.nih.gov/repos/chestImages.php

This article is part of the Topical Collection on Advanced Computational Intelligence and Soft Computing in Medical Imaging

Szilárd Vajda, szilard.vajda@cwu.edu
Alexandros Karargyris, akararg@us.ibm.com
Stefan Jaeger, stefan.jaeger@nih.gov
K.C. Santosh, Santosh.KC@usd.edu
Sema Candemir, sema.candemir@nih.gov
Zhiyun Xue, zhiyun.xue@nih.gov
Sameer Antani, sameer.antani@nih.gov
George Thoma, george.thoma@nih.gov

1 Central Washington University, Ellensburg, WA, USA
2 IBM Almaden Research, San Jose, CA, USA
3 University of South Dakota, Vermillion, SD, USA
4 National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

Introduction
According to the 2017 WHO report [41], Tuberculosis (TB) is considered one of the major threats to life alongside HIV (human immunodeficiency virus), causing 1.3 million deaths among the 10.4 million people who develop the disease each year. Cure rates over 90% have been described in clinical studies. However, TB still remains a major challenge, as it occurs in tandem with HIV in 1.7 million of the reported 10.4 million cases. Among the people developing the disease, 90% are adults, 65% are male, and 56% come from only five countries: Indonesia, Pakistan, India, the Philippines and China.

TB is an infectious disease caused by the bacillus Mycobacterium tuberculosis, which typically affects the lungs. It spreads through the air when people with active TB cough, sneeze, or otherwise expel infectious bacteria [52]. TB is most prevalent in Africa and Southeast Asia, where widespread poverty and malnutrition reduce resistance to the disease. The most common method for diagnosing TB worldwide is sputum smear microscopy (developed more than 100 years ago), in which bacteria are observed in sputum samples examined under a microscope. Following recent developments in TB diagnostics, rapid molecular tests are increasingly used to diagnose TB and drug-resistant TB, as highlighted in WHO's reports [40, 41]. In countries with more developed laboratory capacity, TB cases are also diagnosed via culture methods (the current reference standard). However, these methods are currently rather expensive and not easily applicable in low-resourced regions such as Africa. In these areas, chest X-ray (CXR) is still the most prominent TB detection method in use.

Tuberculosis is exhibited in CXR images in the form of cavitations, consolidations, infiltrates, blunted costophrenic angles, opacities, pleural effusion and thickening, pneumonia, horizontal fissure displacement, hilar enlargement and small broadly distributed nodules [52], among other radiological manifestations. These changes can often be detected in CXRs as corrupted and/or deformed lung profiles [27], disruptions in the lung shapes, intensity changes in the lung tissue [23], texture abnormalities [8], etc. Some prominent TB manifestations can be observed in Fig. 1. Besides designing and developing a deployable and reliable CXR screening system, our major aim is to select the best, mutually complementary features. These specially selected characteristics help the underlying classifier produce the complex decision surface necessary to distinguish normal CXRs from abnormal ones.

Fig. 1 Different Tuberculosis manifestations in chest X-Ray images

Our objective with the feature selection [17] implemented for our CXR classification scenario was three-fold: i) improve the prediction performance of the underlying classifier, ii) provide an optimal feature set suitable to describe abnormalities such as TB in the lung, and iii) provide a direct comparison of our results with those published by Jaeger et al. [23]. In addition to the main goal of finding an optimal feature set providing high classification accuracy, our secondary goal was to select a fast and well-performing classifier such as an artificial neural network [3, 53]. Such a network is able to define the complex non-linear decision surfaces necessary to distinguish TB cases from non-TB cases relying only on features. Feature selection also shortens the overall processing time, since only a reduced number of features has to be extracted and used in the classification process.

In this paper, we propose an end-to-end system capable of detecting different lung abnormalities in CXRs using only image processing and machine learning. The rest of the paper is structured as follows: “Related work” gives a brief overview of the state of the art; “Methods” discusses the methods in use, involving lung segmentation, feature description, feature selection and classification; “Experiments” provides a brief description of the chest X-ray collections used, the evaluation protocols and the different results. Finally, a brief summary highlighting the strengths of our paper is provided in “Conclusion”.

Related work
Recently, we note an increased focus on automatic chest radiography [2, 11, 24, 25, 28, 29, 31, 44], due to the more affordable prices of X-ray machines and the huge potential of automatic image processing [16]. Such tools analyze these digital images without any external human involvement [30]. Even though many papers have been published in the last few years on computer-aided diagnosis
(CAD) targeting chest x-ray images [26], only a limited number of systems can accurately read chest radiographs [14, 31].

Due to the widely reported success of deep convolutional neural networks, different works have appeared in the medical image analysis field in recent years [2, 22, 28, 29, 31]. Instead of using traditional feature extraction followed by classification, researchers in this new paradigm task the networks with automatically extracting [18] the separating characteristics from X-rays, MRIs, etc. However, such methodologies need very large training sets [21], and some small deformations such as calcifications or infiltrates might not be detected properly [2]. For this reason, we focused our current research on the classical solution, in which well-defined image characteristics describe the different abnormalities and which remains usable for smaller datasets.

Nodule detection is becoming one of the popular research focuses, due to the well-defined nature of the problem. Some commercial systems are even available on the market [46], helping radiologists to localize and diagnose lung cancer [19]. However, nodules are only one of many manifestations of TB, besides consolidations, infiltrates, blunted costophrenic angles, opacities, pleural effusion and thickening, pneumonia, horizontal fissure displacement, or hilar enlargement. Due to the high complexity of detecting these different types of TB manifestations, recent studies concentrate more on specific topics, such as lung segmentation [5, 8], temporal subtraction for bone scans [47], or other aspects such as detection of catheters and pneumothorax, texture analysis or shape analysis [15].

Since the seminal work of Lodwick et al. [33], which reduced human involvement in lung cancer detection by converting the visual images of roentgenograms into numerical sequences, research has shifted its focus to more sophisticated and automatic feature extraction methods. These features are able to describe the different phenomena encountered in CXRs. Van Ginneken et al. [15] identified these possible features as being texture related, as the patterns are diffuse. The analysis of pixel neighborhood intensities can reveal certain specific characteristics. As the authors mention, it is hard for a radiologist “to get a clue” why these image features relate to certain diseases. However, to mimic the radiologists' reading habits, computer scientists should transcribe the reading knowledge in a more formal way. The extraction of all types of image characteristics (intensity, shape profiles, wavelets, etc.) should be followed by feeding these characteristics into sophisticated classifiers such as neural networks, Support Vector Machines (SVM), etc. Noise, caused by the image acquisition, the size of the lung region of the analyzed subject, etc. [36], can skew the results. To reduce this type of artifact, different methods such as rib segmentation [1], rib suppression or histogram equalization [23] have been implemented. The perceptual errors committed by human readers can be corrected with focused analysis using systematic search strategies, coning devices, etc.

Depeursinge et al. [10] proposed a study comparing five different classifiers applied to three types of feature groups: gray-level histograms, air components and quincunx wavelet frame coefficients with B-spline wavelets. Similar attempts have been proposed by Jaeger et al. [23], involving classifiers such as SVM, multi-layer perceptron, decision tree and linear regression. In both cases, SVMs provided the best performance. The work [54] presents a rather small-scale experiment (77 images) on nodule detection only, involving feature extraction. Starting from 67 different characteristics, mainly intensity values, wavelets, Gabor coefficients and multi-scale Hurst features, a genetic algorithm (GA) minimizing the overall classification error reduced the number of features to 25. However, there is no direct comparison showing the importance of the feature selection.

In general, there is no clear understanding of why some features perform better than others, nor a clear view of how these image features actively contribute to the classification. Therefore, a clear understanding of the features and their combination is necessary in order to provide a well-defined framework for future pulmonary disease detection and classification.

Methods
This section describes the different processing steps of the system: lung segmentation using an atlas-based approach, feature description and selection, and finally classification, which provides the user with a confidence measure for each analyzed image belonging to the normal or abnormal class.

Lung segmentation
In our system, we use an atlas-based lung segmentation algorithm. The atlas is a set of CXRs from several patients together with their expert-delineated lung boundaries. The system first chooses the models most similar to the patient X-ray by measuring lung shape similarities. Then, it warps the selected models to the patient X-ray using a registration algorithm. This algorithm uses the scale invariant feature transform (SIFT) flow (i.e., SIFT-flow) registration approach [32], which computes corresponding pixels of image pairs according to their SIFT feature similarity. The average of the registered models constitutes the patient-specific lung model. The system then combines the CXR intensity values and the lung model in an objective function to decide on the final boundary. The segmentation solves the objective function with a graph-cut energy minimization approach [4].

The system produces state-of-the-art results on a public set (cf. the JSRT set [45]), reaching 0.954 ± 0.0015 coverage. Similar scores were reported for the Montgomery collection, where 0.941 ± 0.034 coverage was obtained. For more details about this stage, we refer to the work by Candemir et al. [5].

Features description
To characterize normal and TB-suspicious CXR lung segments, we considered three different feature sets. Feature Set A is inspired by object detection [16, 37] and was used successfully in previous work [23]. Feature Set B was used successfully by Rahman et al. [42] for a medical CBIR system. Finally, we considered basic shape features, which can also be powerful for characterizing abnormalities. For pleural effusion, for instance, the lower part of the lung is not visible due to the fluid accumulated in the thoracic cavity, producing a blunt costophrenic angle [35] and a considerably modified lung shape [52].

All features were extracted only from the lung regions detected by the atlas-based segmentation method (see “Lung segmentation”), after a histogram equalization step that enhances the overall contrast of the analyzed CXR images.

Set A: A versatile and compact feature set combining shape, edge and texture descriptors. The final feature representation is built by concatenating the different descriptors (histograms) extracted from the segmented lung regions. In particular, the following shape and texture descriptors were considered: Intensity Histogram (IH), Gradient Magnitude Histogram (GM), Shape Descriptor Histogram (SD), Curvature Descriptor Histogram (CD), Histogram of Oriented Gradients (HOG) [9], and Local Binary Patterns (LBP) [39]. A modified multiscale approach proposed by Frangi et al. [13] is used to compute the eigenvalues of the Hessian matrix needed for the shape and curvature descriptors. The Hessian describes the second-order curvature properties of the local image intensity surface. The normalization makes these descriptors intensity invariant. Jaeger et al. [23] determined that quantizing these features into 32 bins provides good discrimination performance. The size of the feature descriptor is therefore 192 (6 descriptors × 32 bins).

Set B: A rather diversified, low-level feature collection involving intensity, edge, texture, color and shape moment features. The final feature representation is built by concatenating the different descriptors (histograms) extracted from the segmented lung regions. In particular, the following descriptors were considered: Color Layout Descriptor (CLD) and Edge Histogram Descriptor (EHD) from the MPEG-7 standard [34], Color and Edge Direction Descriptor (CEDD) [6], Fuzzy Color and Texture Histogram (FCTH) [7], the Tamura texture descriptor, the Gabor texture feature [20], and other texture features such as primitive length (PL), edge frequency (EF), and autocorrelation (AC) [48]. This feature set is larger, comprising 595 features.

Set C: A focused feature collection involving only shape measurements calculated from the lung shapes, as provided by the standard MATLAB implementation. For our purpose, size, orientation, eccentricity, extent, centroid, and bounding box were considered. For each lung segment, 6 different features were extracted and later concatenated, giving a feature set of dimension 12. For details, please refer to the help provided for MATLAB's regionprops¹.

¹ http://www.mathworks.com/help/images/ref/regionprops.html

Set C contains only similar types of features, while Sets A and B are mixtures of all sorts of features, as they were originally used for different pattern recognition tasks [16, 37, 42]. Therefore, we did not see the need to classify the features based on their properties and nature.

Feature selection and classification
In many systems devoted to CXR analysis [8, 14, 23, 24, 27], the authors do not specifically motivate their choice of the particular features in use. Rather, they borrow well-known features from image processing [16]. Apart from generic color, edge or texture features, which are applicable to all kinds of object detection [37] and content-based image retrieval (CBIR) [42] tasks, these works do not consider features specifically crafted to characterize abnormalities such as TB. While some features can complement each other by improving the discriminating power of the descriptor [50], others might work to each other's detriment; thus, selecting features from a larger pool is necessary and useful.

For our purpose, we considered a wrapper-type feature selection model [43, 49]. Instead of aiming to reach a certain accuracy level, which is often used as the selection criterion, we conducted an exhaustive search among the different feature combinations. Given n different features, the number of possible combinations is:

$$N = \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n - 1 \qquad (1)$$
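Equation (1) can be checked with a short enumeration. The sketch below is written in Python and assumes a pool of 17 descriptors, 6 from Set A, 10 from Set B, and the Set C shape block treated as a single unit, following the counts given in “Results”. The Set B names are our reading of the text (GLCM is named among the selected Set B features but not in the Set B list), and the helper names are hypothetical.

```python
from itertools import combinations
from math import comb

# Hypothetical descriptor pool (assumed grouping, see the lead-in).
SET_A = ["IH", "GM", "SD", "CD", "HOG", "LBP"]
SET_B = ["CLD", "EHD", "CEDD", "FCTH", "Tamura", "Gabor", "PL", "EF", "AC", "GLCM"]
SET_C = ["Shape"]  # the 12 shape values handled as one block
POOL = SET_A + SET_B + SET_C


def count_subsets(n):
    """Left-hand side of Eq. (1): sum of C(n, k) for k = 1..n."""
    return sum(comb(n, k) for k in range(1, n + 1))


def all_nonempty_subsets(pool):
    """Yield every non-empty subset of `pool`, smallest subsets first."""
    for k in range(1, len(pool) + 1):
        yield from combinations(pool, k)


n = len(POOL)
n_subsets = count_subsets(n)
# Each candidate subset is trained and tested with 10-fold cross-validation.
n_training_runs = 10 * n_subsets
```

For n = 17 this gives 2^17 − 1 = 131,071 candidate subsets and, under 10-fold cross-validation, 1,310,710 training runs, which is why running the search in parallel pays off. Generating the subsets lazily keeps memory use flat even though their number grows exponentially.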
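The Set A layout from “Features description” (six histograms of 32 bins each, concatenated into a 192-dimensional vector) and the total of 799 dimensions mentioned for the classifier input can be illustrated as follows. This is a minimal sketch under stated assumptions: each descriptor is assumed to be already reduced to a list of scalar values scaled to [0, 1] inside the lung mask (for example, equalized intensities or gradient magnitudes). How the original system computes and normalizes HOG, LBP, SD and CD before binning is not reproduced here, and the function names are hypothetical.

```python
N_BINS = 32  # bin count reported for the Set A histograms
SET_A_NAMES = ("IH", "GM", "SD", "CD", "HOG", "LBP")

# Dimensionalities stated in the text for Sets A, B and C.
SET_DIMS = {"A": len(SET_A_NAMES) * N_BINS, "B": 595, "C": 12}
TOTAL_DIM = sum(SET_DIMS.values())  # size of the full {A,B,C} vector


def histogram(values, n_bins=N_BINS, lo=0.0, hi=1.0):
    """Normalized n_bins histogram of `values` over [lo, hi].

    Out-of-range values are clipped into the first or last bin.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def set_a_vector(descriptor_values):
    """Concatenate one 32-bin histogram per Set A descriptor, in a fixed order.

    `descriptor_values` maps each name in SET_A_NAMES to a list of scalars
    in [0, 1] taken from inside the lung mask (hypothetical preprocessing).
    """
    vector = []
    for name in SET_A_NAMES:
        vector.extend(histogram(descriptor_values[name]))
    return vector
```

Keeping the descriptor order fixed ensures that every image yields a vector with the same layout, so the position of each bin means the same thing across the whole collection.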

For each feature combination, the corresponding feature subsets were concatenated, and a train/test procedure was launched to establish the best combination. The combination providing the maximum ACC or AUC was retained as the winning combination. Due to the exponential nature of the experiment, a parallel version of the wrapper method was implemented; this was possible because the selection/training/testing phases for each feature combination are completely independent. We are aware that this exhaustive search might take longer than a random search that stops at a certain threshold, but since the search has to be performed only once, we preferred performance maximization over speed.

For classification, a neural network-based classifier was considered. Our choice of classifier might sound somewhat arbitrary, but we conducted preliminary experiments using a Support Vector Machine (SVM), and the results were not encouraging: they were in the range of the original experiments conducted by Jaeger et al. [23]. Neural networks in particular are known for their capacity to estimate complex decision surfaces [3] and to handle multi-class problems. Due to the large number of features to be handled (up to 799 dimensions) and the lack of information about possible correlations among the different feature components, a fully connected multi-layer perceptron network was used. The number of neurons in the input layer was set by the dimensionality of the input feature vector, and the number of output neurons by the possible outcomes: normal and abnormal. The number of neurons in the hidden layer was estimated over several trial runs, and 10 hidden neurons were finally found to be optimal. For training, error backpropagation was used with a learning rate of 0.001 and a momentum of 0.2, values also established over several trial runs. For the hidden layer, we aimed to use as few neurons as possible to keep the complexity, and thus the recognition time, low. More sophisticated network optimizations involving pruning algorithms [51, 53] could be considered; however, this kind of optimization goes beyond the scope of the current research.

Experiments
This section describes the data in use and the evaluation protocols considered for the experiments. Finally, the different experimental setups are described, followed by some comparisons.

Data
For the experiments, two different, publicly available CXR data collections were considered. The images in these studies were de-identified by the data providers and are exempted from IRB review at their institutions. The data was exempted from IRB review (No. 5357) by the NIH Office of Human Research Protection Program.

The Montgomery dataset, a representative subset of a larger image repository, was collected over many years within the tuberculosis control program of the Department of Health and Human Services of Montgomery County. The set contains 138 posteroanterior CXRs, of which 80 are normal, while the remaining 58 are abnormal cases (presenting some sort of abnormality indicating TB). The Shenzhen dataset is from Shenzhen No. 3 Hospital (Shenzhen, China), one of the largest hospitals in China for infectious diseases. The CXR images belong to outpatient clinics. The collection contains 342 normal and 334 abnormal cases. For more details about the data, please refer to [23].

Evaluation protocols
In order to properly evaluate the performance of the system, accuracy (ACC) and the area under the ROC curve (AUC) were selected [12]. We considered these measures rather than others such as the MCC (Matthews correlation coefficient) because we wanted to compare our results directly with those presented in [23]. Besides the ACC, the AUC is necessary to understand the behavior of the underlying classifier. Given the application in question, namely deciding whether a CXR contains abnormalities, there is a strong need to control the true positive rate, as nobody whose CXR contains some type of abnormality should be missed [38]. The ROC curve also allows us to adjust the classification threshold to the needs of our application.

Each of our experiments follows a 10-fold cross-validation protocol. The reported results are the average scores over the different folds.

Results
Different experiments were conducted on the Montgomery and the Shenzhen collections. First, results are reported using feature set A, feature set B, and feature set C, respectively, as input for the classifier. The goal of these experiments was to show the importance of each feature set separately, as well as their strength when combined.
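The exhaustive wrapper procedure described in “Feature selection and classification”, combined with the 10-fold protocol from “Evaluation protocols”, can be sketched as below. This is an illustration, not the authors' implementation: to keep it short and dependency-free, a nearest-centroid classifier stands in for the multi-layer perceptron, and a thread pool stands in for the parallel execution. All names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def nearest_centroid_accuracy(train, test):
    """Stand-in classifier (NOT the paper's MLP): assign each test sample
    to the class whose training centroid is closest (squared Euclidean)."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return sum(predict(x) == y for x, y in test) / len(test)


def cv_accuracy(samples, k=10, seed=0):
    """Mean accuracy over k folds; samples are shuffled once with a fixed seed,
    so every feature subset is evaluated on the same folds."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [samples[i] for i in order if i not in held_out]
        test = [samples[i] for i in fold]
        scores.append(nearest_centroid_accuracy(train, test))
    return sum(scores) / k


def wrapper_search(blocks, labels, k=10, workers=4):
    """Score every non-empty subset of feature blocks; return (best_acc, subset).

    `blocks` maps a block name to one feature list per image (same image order).
    """
    names = sorted(blocks)
    subsets = [s for r in range(1, len(names) + 1) for s in combinations(names, r)]

    def score(subset):
        # Concatenate the selected blocks image by image.
        samples = [([v for name in subset for v in blocks[name][i]], labels[i])
                   for i in range(len(labels))]
        return cv_accuracy(samples, k), subset

    # Subsets are independent, so they can be evaluated concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return max(pool.map(score, subsets))
```

To mirror the classifier described in the text more closely, the stand-in could be replaced by, for example, scikit-learn's `MLPClassifier(hidden_layer_sizes=(10,), solver="sgd", learning_rate_init=0.001, momentum=0.2)`; for CPU-bound training, a process pool is the better choice than threads. As noted in “Results”, scoring every subset on the same folds that are later reported makes the winning score optimistic.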
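The measures used in “Evaluation protocols”, and the recall-constrained false positive rates reported in Table 5, can be computed from a classifier's output scores as sketched below. This is a minimal illustration, assuming scores in which higher values mean “more likely abnormal” and labels coded as 1 (abnormal) and 0 (normal). The AUC is computed via its pairwise (Mann-Whitney) formulation, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random abnormal case scores higher than
    a random normal case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fpr_at_recall(scores, labels, target_recall):
    """Lowest false positive rate among thresholds whose recall (TPR) reaches
    target_recall; a case is flagged abnormal when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        if tp / n_pos >= target_recall:
            best = min(best, fp / n_neg)
    return best


def accuracy(scores, labels, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(labels)
```

Choosing the threshold that reaches the target recall with the fewest false positives corresponds to moving the operating point along the ROC curve, as described in the text. The quadratic loops are adequate for collections of the size used here (138 and 676 images).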

Table 1 shows the discriminative power of feature set A, which involves fewer features than set B. The intensity histogram, the local binary patterns, the histogram of oriented gradients, etc. seem to be more powerful than the features borrowed from the MPEG-7 standard. A similar trend can be observed in the work of Jaeger et al. [23]. The increased scores in our case also suggest that the neural network estimates the decision surface better than a support vector machine (SVM).

Table 1 Accuracy (ACC) measures reported for the different feature representations for different data collections

Dataset       Set A (%)   Set B (%)   Set C (%)   Set {A,B,C} (%)
Montgomery    78.30       72.47       65.82       69.45
Shenzhen      95.57       81.06       70.40       92.00

While the first three columns of the table show individual results for the different feature sets, the last column involves all the features described in detail in “Features description”. This extended feature set tests the joint representational power of these features by stitching them together. The combined feature set involving sets A, B, and C cannot surpass the individual results generated by set A, apparently because sets B and C introduce a certain amount of confusion.

Due to the limited number of blunt costophrenic angles appearing in the analyzed collections, the shape features have only limited descriptive power. The majority of the CXRs available in our collections have TB manifestations inside the lung regions, and not so much along the boundaries, where they usually involve severe shape deformations. However, shape features (Set C) can still be considered a reliable source for separating abnormal CXRs from normal ones when the TB manifestation is visible in the lung shape, as in pleural effusion [35, 52].

In Table 2, the same conditions were considered as for Table 1, but this time the AUC is measured instead of the ACC, to show the real strength of the neural classifier by varying the decision threshold applied to its output. The results achieved for feature set A are very promising, with an almost perfect score for the Shenzhen data (AUC = 0.99) and a promising score (AUC = 0.87) for the Montgomery collection. These scores indicate that it is possible to set up a classifier which provides almost perfect classification rates. Following the trend discussed earlier, feature sets B and C provide moderate results, due to their limited ability to describe and capture the specific shape and orientation of the lungs, ribs, etc. The ROC curves for the Montgomery and Shenzhen collections using set A are shown in Fig. 2.

Table 2 Area under the curve (AUC) measures reported for the different feature representations for different data collections

Dataset       Set A   Set B   Set C   Set {A,B,C}
Montgomery    0.87    0.72    0.71    0.79
Shenzhen      0.99    0.90    0.77    0.97

Fig. 2 ROC curves for the Montgomery and Shenzhen collections using feature set A for classification

However, these characteristics are classical image features used in object recognition or content-based image retrieval, and their particular contribution to the final recognition had not been analyzed. Therefore, our feature selection experiment considered a pool of 17 different features: 6 from Set A, 10 from Set B, and 1 from Set C. Tables 3 and 4 show the optimized feature sets that provide the highest accuracy and the maximal area under the curve, respectively. Our optimization criteria for the best selection of features were max{ACC} and max{AUC}.

For feature selection, each possible feature combination was trained and tested with 10-fold cross-validation, and the average scores were reported. The scores for the optimized feature collections (see Tables 3 and 4) are considerably better than the results obtained with the original feature sets (see Tables 1 and 2), with a net ACC gain of 6.45 percentage points (Montgomery) and 1.46 percentage points (Shenzhen). The AUC gain reaches 0.04 (from 0.87 to 0.91) for Montgomery, while for the Shenzhen data the same AUC of 0.99 was achieved, a result which is acceptable considering the importance of correctly classifying the true positive cases (abnormalities).

Note that, due to the nature of the evaluation protocol (no dedicated training/test set), all folds contributed to the selection of the best feature collection; therefore, the results could be biased [49].

To support our choice of selection, we report the standard deviation (σ) of the results over the 10 folds. While the standard deviation is low for the Shenzhen collection (see Table 4), the spread is much higher for the Montgomery collection (see Table 3). The rather high σ for the Montgomery collection can be explained by its relatively low number of CXRs and its unbalanced class distribution. In this case, the standard deviation can be seen as an indication of how far the results might deviate on a dedicated test set.

Table 3 Results for the optimized feature set involving the Montgomery collection

Optimization   Result    σ       Selected features
max{ACC}       84.75%    11.16   {FCTH, EF, IH, GM, SD, CD, LBP}
max{AUC}       0.91      0.11    {FCTH, GLCM, EF, IH, SD, CD}

Table 4 Results for the optimized feature set involving the Shenzhen collection

Optimization   Result    σ       Selected features
max{ACC}       97.03%    1.71    {CLD, Gabor, GLCM, EF, IH, HOG, LBP}
max{AUC}       0.99      0.005   {Gabor, EF, GM, HOG, LBP}

While the results reported for the feature selection might be biased, the accuracy gain is substantial; moreover, extracting all features takes much more time, which affects the overall processing time of each chest radiograph. Beyond the increased accuracy and area under the curve, we would also like to focus our discussion on the selected features. Among them, features from the original feature set A, such as IH, CD, LBP and HOG, clearly dominate. From set B, features such as FCTH, GLCM, Gabor and EF were selected, showing the strength of these particular features in the overall evaluation. As one can see, these features describe texture, intensity, edginess, etc., properties that are valuable for distinguishing normal and abnormal CXRs. Beyond the comparison with the baseline system [23], our ultimate goal is to find the best possible feature combination and to deploy the system in Kenya to accurately detect TB-positive patients. Therefore, although our feature selection scores are somewhat biased for direct evaluation, we identified the best feature descriptors to be used for the chest X-ray images to be analyzed in the future.

The optimization also indicates that shape features do not contribute to the classification; their use should rather be considered in a pre-filtering phase, before a thorough analysis of the radiograph starts. Such a filter, together with a costophrenic angle estimator [35], could help to quickly identify lung shape abnormalities, which are strong indicators of different types of lung diseases.

As our screening application is to be used in a TB-prevalent area such as Kenya, it is important to provide healthcare providers with a reliable screening tool that avoids misclassifying abnormal cases. While it is still acceptable to identify a healthy lung as abnormal, none of the abnormal cases should be missed. Therefore, Table 5 shows results for both collections at recall values of 0.90, 0.95, and 0.99.

The false positive rate (FPR) for the Shenzhen collection is rather promising and acceptable. The results for the Montgomery collection are more modest. One possible explanation could be the reduced number of samples in that collection. It is known that statistical classifiers such as neural networks need a multitude of different samples to adjust their weights through the learning process [3]. This condition is better fulfilled for the Shenzhen collection, where the number of samples is over 300 for both abnormal and normal cases.

To compare our results directly, we considered the system proposed by Jaeger et al. [23]. In that paper, the authors focus on exactly the same data and use the same type of experiments as in this paper. Accuracy (ACC) and area under the curve (AUC) were considered as quality measures, both being adequate to assess the quality of the system. While the authors of the previously mentioned work report different types of results, for the sake of clarity, only their
146 Page 8 of 11 J Med Syst (2018) 42: 146

Table 5 False positive rates for recall values of 0.90, 0.95 and 0.99 for Table 7 Comparison of human consensus performance with ground
Montgomery and the Shenzhen collections truth of Montgomery collection [23]

Dataset 0.90 0.95 0.99 Consensus

Montgomery 0.261 0.261 0.261 + –


Shenzhen 0.003 0.011 0.062 Ground truth + 58 0 58
– 25 55 80
83 55 138

best scores will be mentioned in the comparison. To our


best knowledge, there are no other works reporting results
on the Montgomery and Shenzhen benchmark collections.
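The feature subsets in Tables 3 and 4 come from a wrapper approach: each candidate combination of descriptors is scored by training and cross-validating a classifier on just those descriptors. The sketch below illustrates one such wrapper, an exhaustive search over feature groups, under loud assumptions: the data are tiny and synthetic, the classifier is a nearest-centroid model, and the criterion is accuracy. None of these are the paper's classifier, features, or search strategy, and the names `wrapper_select`, `cv_accuracy`, "informative" and "noise" are hypothetical.

```python
import itertools
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def nearest_centroid_accuracy(train_x: List[Vector], train_y: List[int],
                              test_x: List[Vector], test_y: List[int]) -> float:
    """Fit one centroid per class on the training split, then score the test split."""
    centroids: Dict[int, Vector] = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = 0
    for x, y in zip(test_x, test_y):
        # Predict the class whose centroid is closest in Euclidean distance.
        pred = min(centroids, key=lambda c: math.dist(x, centroids[c]))
        correct += int(pred == y)
    return correct / len(test_y)


def cv_accuracy(x: List[Vector], y: List[int], k: int = 5) -> float:
    """Mean accuracy over k interleaved folds (fold j holds samples with i % k == j)."""
    scores = []
    for j in range(k):
        test_idx = [i for i in range(len(y)) if i % k == j]
        train_idx = [i for i in range(len(y)) if i % k != j]
        scores.append(nearest_centroid_accuracy(
            [x[i] for i in train_idx], [y[i] for i in train_idx],
            [x[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(scores) / k


def wrapper_select(groups: Dict[str, List[Vector]], y: List[int],
                   k: int = 5) -> Tuple[float, Tuple[str, ...]]:
    """Score every non-empty subset of feature groups by CV accuracy; keep the best."""
    names = sorted(groups)
    best: Tuple[float, Tuple[str, ...]] = (-1.0, ())
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(names, r):
            # Concatenate the chosen descriptors into one vector per sample.
            x = [[v for g in subset for v in groups[g][i]] for i in range(len(y))]
            score = cv_accuracy(x, y, k)
            if score > best[0]:  # strict '>' keeps the first (smallest) subset on ties
                best = (score, subset)
    return best


rng = random.Random(42)
labels = [i % 2 for i in range(40)]  # 20 "normal" (0) and 20 "abnormal" (1) samples
feature_groups = {
    # Hypothetical descriptor that separates the classes by construction.
    "informative": [[10.0 * lab + (i % 3) * 0.1] for i, lab in enumerate(labels)],
    # Hypothetical descriptor drawn independently of the label.
    "noise": [[rng.uniform(-50, 50), rng.uniform(-50, 50)] for _ in labels],
}
score, subset = wrapper_select(feature_groups, labels)
print(subset, score)  # -> ('informative',) 1.0
```

Exhaustive search evaluates 2^n − 1 subsets, which stays tractable for about a dozen descriptor groups; greedy forward or backward selection is the usual alternative for larger n. Because the winning subset is chosen with the same folds that produce its score, the returned score is optimistically biased; nested cross-validation, with the selection repeated inside each outer training split, would avoid this.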
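Table 5 fixes the recall and reports the resulting FPR, i.e. it picks an operating point on the ROC curve. A minimal sketch of that threshold mechanism follows, assuming each CXR receives a real-valued abnormality score (higher means more likely abnormal) and that a case is flagged when its score reaches the threshold. The function name `fpr_at_recall` and the toy scores are hypothetical, not the authors' code or data.

```python
from typing import Sequence, Tuple


def fpr_at_recall(scores: Sequence[float], labels: Sequence[int],
                  target_recall: float) -> Tuple[float, float]:
    """Return (threshold, FPR) at the strictest threshold whose recall >= target.

    labels: 1 = abnormal (positive), 0 = normal (negative).
    A CXR is flagged abnormal when its score >= threshold.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must lie in (0, 1]")
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both normal and abnormal cases")
    # Walk the empirical ROC curve from the strictest threshold to the most lenient.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        if tp / n_pos >= target_recall:
            fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
            return t, fp / n_neg
    # Not reached: the lowest score flags every case, which gives recall 1.0.
    raise RuntimeError("no threshold reached the target recall")


scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
for recall in (0.90, 0.95, 0.99):
    print(recall, fpr_at_recall(scores, labels, recall))  # -> (0.3, 0.4) each time
```

With only five positives in the toy data, recall moves in steps of 0.2, so all three targets map to the same operating point. Identical FPRs across recall levels, as in the Montgomery row of Table 5, can arise this way, although the sketch does not establish that this is the cause there. The quadratic scan is fine for a few hundred images; sorting once and accumulating counts would make it linear after the sort.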
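The radiologists' consensus accuracy follows directly from the cells of Table 7: (58 + 55) / 138 of the Montgomery images are classified correctly. The short sketch below makes that arithmetic explicit and also derives the sensitivity and specificity implied by the same table. It assumes "+" denotes abnormal for both ground truth and consensus, and the helper name `confusion_metrics` is hypothetical.

```python
from typing import Dict


def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, sensitivity (recall) and specificity of a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # abnormal CXRs correctly flagged
        "specificity": tn / (tn + fp),  # normal CXRs correctly cleared
    }


# Cells of Table 7: ground truth "+" = abnormal, consensus "+" = read as abnormal.
table7 = confusion_metrics(tp=58, fn=0, fp=25, tn=55)
print({name: round(100 * value, 2) for name, value in table7.items()})
# -> {'accuracy': 81.88, 'sensitivity': 100.0, 'specificity': 68.75}
```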
As one can see in Table 6, our system improves both ACC (by 11.47 percentage points) and AUC (by 0.11) for the Shenzhen collection, which is a substantial gain. For the Montgomery collection, the accuracy is unchanged (78.3%) and the AUC improves by only 0.01, a difference too small to be significant. A comparison based on the feature-selection scores (Tables 3 and 4) instead of the results obtained with feature set A (Tables 1 and 2) would look even more favorable; however, such a comparison would not be fair, because the feature-selection scores are biased, as discussed above.

Table 6 ACC and AUC comparisons between the results reported by Jaeger et al. [23] and the results produced by our system (Our)
Dataset        ACC [23]   ACC (Our)   AUC [23]   AUC (Our)
Montgomery     78.3%      78.3%       0.86       0.87
Shenzhen       84.10%     95.57%      0.88       0.99

We also compared the performance of our system with that of human readers, using the results reported by Jaeger et al. [23]. In those experiments, two independent radiologists read the CXRs of the Montgomery collection. This reading was completely independent of our experiments and was based only on visual inspection of the frontal chest X-rays. Table 7 presents the detailed confusion matrix, showing how the human readers classified the CXRs into normal and abnormal cases. The radiologists' accuracy of 81.88% (113 of 138 images classified correctly) is still higher than the 78.30% achieved by our system under exactly the same conditions on the Montgomery collection. With more specific features and more training samples, we are confident that the scores of automatic systems will increase gradually. All these results lead us to conclude that automatic screening systems are necessary and helpful: supported by the medical expertise of radiologists, machines can classify chest radiographs with high accuracy and reliability, to the benefit of the overall diagnostic process.

Conclusion
In this paper, we presented a fully automatic frontal chest radiograph screening system able to recognize healthy lungs and to spot abnormal ones showing different types of tuberculosis manifestations. Given the specific target group (Kenya's rural population), with limited resources and limited medical personnel, the development of such mobile screening systems is important and of great benefit to the public health efforts currently under way in Kenya.
Besides describing the automatic CXR screening system, our main goal was to gain a deeper understanding of why some features carry the information needed to separate abnormal from normal cases while others do not. Most current systems simply borrow well-known features from the literature, originally designed for general-purpose object detection or content-based image retrieval, and apply a classification scheme on top of them. Our solution instead uses wrapper-based feature selection to find the feature combination that minimizes the classification error rate and maximizes the area under the ROC curve.
Starting from three different feature sets used in a previous study, we selected, in a data-driven manner, the feature combinations that maximize the overall performance of the classification systems for the different CXR collections. The selected features include Gabor, Fuzzy Color and Texture Histogram, Intensity Histogram, Shape Descriptor, Local Binary Pattern, Curvature Descriptor, Histogram of Oriented Gradients, and Edge Frequency, all of which can be considered in the future for similar classification tasks. These features mainly capture overall image quality, edginess, and texture, properties that apparently distinguish between normal and abnormal CXRs. However, we are aware that these results are reported for only two frontal chest X-ray collections, namely the Montgomery and Shenzhen collections. These publicly available collections contain only a limited number of X-rays, but besides our main goal of identifying important and descriptive features from a larger collection, we
also wanted to provide a direct comparison of our results with those published by Jaeger et al. [23], hence the choice of the data.
Our classification shows a net improvement of up to 11.47 percentage points in accuracy and 0.11 in the area under the curve for the Shenzhen collection. With feature selection, the scores can go even higher. Admittedly, our feature selection scheme is biased; however, it allowed us to identify the feature subset with which the system to be deployed in Kenya, once trained, could achieve the best recognition score. Our experiments on false positive rates at fixed recall values of 0.90, 0.95, and 0.99 show that a threshold mechanism based on ROC analysis can be defined that provides high specificity (at least 0.938 for the Shenzhen collection; see Table 5). This is a necessity for such medical applications.
To further improve the automatic part of the classification process, features could be extracted automatically from the analyzed lung regions using an encoder-type network; combining both types of features could increase performance. Besides the features, special attention can also be paid to the classifier. Instead of using one classifier to identify all sorts of TB manifestations, specialized classifiers could better identify particular anomalies such as infiltrates, calcifications, or pleural effusion.
While the system identifies the normal cases, the precise diagnosis of abnormal cases could be deferred to better-equipped healthcare facilities such as hospitals or clinics, where more in-depth investigations can take place. The comparison of our results with medical experts' readings shows that automatic systems such as ours can be considered for the screening process. Such computer-aided diagnosis systems can work side by side with medical experts, providing a second opinion and actively supporting the pulmonary diagnosis of patients.

Acknowledgments This research is supported in part by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine, and Lister Hill National Center for Biomedical Communications (LHNCBC). The authors are grateful to Mr. Rodney Long for the fruitful discussions during the development of this project.

Funding Information This research was supported in part by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine (NLM), and Lister Hill National Center for Biomedical Communications (LHNCBC).

Compliance with Ethical Standards

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval All images used in this study were collected prior to this study during routine clinical care. They were de-identified at source and have been exempted from review (NIH IRB# 5357).

References

1. Banik, S., Rangayyan, R. M., and Boag, G. S., Automatic segmentation of the ribs, the vertebral column, and the spinal canal in pediatric computed tomographic images. J. Digit. Imaging 23(3):301–322, 2010.
2. Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., and Greenspan, H., Chest pathology detection using deep learning with non-medical training. In: 12th IEEE International Symposium on Biomedical Imaging, ISBI 2015, Brooklyn, April 16-19, 2015, pp. 294–297. https://doi.org/10.1109/ISBI.2015.7163871, 2015.
3. Bishop, C. M., Neural networks for pattern recognition. New York: Oxford University Press, Inc., 1995.
4. Boykov, Y., Veksler, O., and Zabih, R., Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23(11):1222–1239, 2001.
5. Candemir, S., Jaeger, S., Palaniappan, K., Musco, J. P., Singh, R. K., Xue, Z., Karargyris, A., Antani, S., Thoma, G. R., and McDonald, C. J., Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE Trans. Med. Imaging 33(2):577–590, 2014.
6. Chatzichristofis, S. A., and Boutalis, Y. S., CEDD: Color and edge directivity descriptor: A compact descriptor for image indexing and retrieval. In: Proceedings of the 6th International Conference on Computer Vision Systems, ICVS'08, pp. 312–322. Berlin: Springer, 2008.
7. Chatzichristofis, S. A., and Boutalis, Y. S., FCTH: Fuzzy color and texture histogram - a low level feature for accurate image retrieval. In: Proceedings of the 2008 Ninth International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS '08, pp. 191–196. Washington: IEEE Computer Society, 2008.
8. Chauhan, A., Chauhan, D., and Rout, C., Role of Gist and PHOG Features in Computer-Aided Diagnosis of Tuberculosis without Segmentation. PLoS ONE 9(11): e112980. https://doi.org/10.1371/journal.pone.0112980, 2014.
9. Dalal, N., and Triggs, B., Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 20–26 June 2005, San Diego, pp. 886–893, 2005.
10. Depeursinge, A., Iavindrasana, J., Hidki, A., Cohen, G., Geissbühler, A., Platon, A., Poletti, P., and Müller, H., Comparative performance analysis of state-of-the-art classification algorithms applied to lung tissue categorization. J. Digit. Imaging 23(1):18–30, 2010.
11. Doi, K., Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Comput. Med. Imaging Graph. 31(4–5):198–211. https://doi.org/10.1016/j.compmedimag.2007.02.002, 2007. http://www.sciencedirect.com/science/article/pii/S0895611107000262. Computer-aided Diagnosis (CAD) and Image-guided Decision Support.
12. Fawcett, T., An introduction to ROC analysis. Pattern Recogn. Lett. 27(8):861–874, 2006.
13. Frangi, A. F., Niessen, W. J., Vincken, K. L., and Viergever, M. A., Multiscale vessel enhancement filtering. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI'98, First International Conference, Cambridge, October 11-13, 1998, pp. 130–137, 1998.
14. van Ginneken, B., ter Haar Romeny, B. M., and Viergever, M. A., Computer-aided diagnosis in chest radiography: A survey. IEEE Trans. Med. Imaging 20(12):1228–1241, 2001.
15. van Ginneken, B., Hogeweg, L., and Prokop, M., Computer-aided diagnosis in chest radiography: Beyond nodules. Eur. J. Radiol. 72(2):226–230. https://doi.org/10.1016/j.ejrad.2009.05.061, 2009. http://www.sciencedirect.com/science/article/pii/S0720048X09003581. Digital Radiography.
16. Gonzalez, R. C., and Woods, R. E., Digital image processing. 3rd ed. Upper Saddle River: Prentice-Hall, Inc., 2006.
17. Guyon, I., and Elisseeff, A., An introduction to variable and feature selection. J. Mach. Learn. Res. 3:1157–1182, 2003. http://dl.acm.org/citation.cfm?id=944919.944968.
18. Hinton, G., and Salakhutdinov, R., Reducing the dimensionality of data with neural networks. Science 313(5786):504–507, 2006.
19. de Hoop, B., Schaefer-Prokop, C., Gietema, H. A., de Jong, P. A., van Ginneken, B., van Klaveren, R. J., and Prokop, M., Screening for lung cancer with digital chest radiography: Sensitivity and number of secondary work-up CT examinations. Radiology 255(2):629–637, 2010.
20. Howarth, P., Yavlinsky, A., Heesch, D., and Ruger, S., Medical image retrieval using texture, locality and colour. In: Peters, C., Clough, P., Gonzalo, J., Jones, G., Kluck, M., and Magnini, B. (Eds.) Multilingual Information Access for Text, Speech and Images, Lecture Notes in Computer Science, Vol. 3491, pp. 740–749. Berlin: Springer, 2005.
21. Hwang, S., Kim, H., Jeong, J., and Kim, H., A novel approach for tuberculosis screening based on deep convolutional neural networks. In: Medical Imaging 2016: Computer-Aided Diagnosis, San Diego, 27 February - 3 March 2016, p. 97852W, 2016.
22. Islam, M. T., Aowal, M. A., Minhaz, A. T., and Ashraf, K., Abnormality detection and localization in chest x-rays using deep convolutional neural networks. CoRR arXiv:1705.09850, 2017.
23. Jaeger, S., Karargyris, A., Candemir, S., Folio, L., Siegelman, J., Callaghan, F. M., Xue, Z., Palaniappan, K., Singh, R. K., Antani, S., Thoma, G. R., Wang, Y., Lu, P., and McDonald, C. J., Automatic tuberculosis screening using chest radiographs. IEEE Trans. Med. Imaging 33(2):233–245, 2014.
24. Jaeger, S., Karargyris, A., Candemir, S., Siegelman, J., Folio, L., Antani, S., and Thoma, G., Automatic screening for tuberculosis in chest radiographs: a survey. Quant. Imaging Med. Surg. 3(2):89, 2013.
25. Karargyris, A., Siegelman, J., Tzortzis, D., Jaeger, S., Candemir, S., Xue, Z., Santosh, K. C., Vajda, S., Antani, S. K., Folio, L., and Thoma, G. R., Combination of texture and shape features to detect pulmonary abnormalities in digital chest x-rays. Int. J. Comput. Assist. Radiol. Surg. 11(1):99–106, 2016. https://doi.org/10.1007/s11548-015-1242-x.
26. Katsuragawa, S., and Doi, K., Computer-aided diagnosis in chest radiography. Comput. Med. Imaging Graph. 31(4–5):212–223. https://doi.org/10.1016/j.compmedimag.2007.02.003, 2007. http://www.sciencedirect.com/science/article/pii/S0895611107000286. Computer-aided Diagnosis (CAD) and Image-guided Decision Support.
27. KC, S., Vajda, S., Antani, S., and Thoma, G., Automatic pulmonary abnormality screening using thoracic edge map. In: Int. Symposium on Computer-Based Medical Systems, pp. 360–361, 2015.
28. Kim, H. E., and Hwang, S., Scale-invariant feature learning using deconvolutional neural networks for weakly-supervised semantic segmentation. CoRR arXiv:1602.04984, 2016.
29. Kooi, T., Litjens, G. J. S., van Ginneken, B., Gubern-Mérida, A., Sánchez, C. I., Mann, R., den Heeten, A., and Karssemeijer, N., Large scale deep learning for computer aided detection of mammographic lesions. Med. Image Anal. 35:303–312. https://doi.org/10.1016/j.media.2016.07.007, 2017.
30. Li, Q., Recent progress in computer-aided diagnosis of lung nodules on thin-section CT. Comput. Med. Imaging Graph. 31(4–5):248–257. https://doi.org/10.1016/j.compmedimag.2007.02.005, 2007. http://www.sciencedirect.com/science/article/pii/S0895611107000316. Computer-aided Diagnosis (CAD) and Image-guided Decision Support.
31. Litjens, G. J. S., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., van der Laak, J. A. W. M., van Ginneken, B., and Sánchez, C. I., A survey on deep learning in medical image analysis. Med. Image Anal. 42:60–88, 2017. https://doi.org/10.1016/j.media.2017.07.005.
32. Liu, C., Yuen, J., and Torralba, A., SIFT flow: Dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5):978–994, 2011.
33. Lodwick, G. S., Keats, T. E., and Dorst, J. P., The coding of roentgen images for computer analysis as applied to lung cancer. Radiology 81(2):185–200, 1963.
34. Lux, M., Caliph & Emir: MPEG-7 photo annotation and retrieval. In: Proceedings of the 17th ACM International Conference on Multimedia, MM '09, pp. 925–926. New York: ACM, 2009.
35. Maduskar, P., Hogeweg, L., Philipsen, R., and van Ginneken, B., 2013.
36. McAdams, H. P., Samei, E., James Dobbins, I., Tourassi, G. D., and Ravin, C. E., Recent advances in chest radiography. Radiology 241(3):663–683, 2006.
37. Murphy, K. P., Torralba, A., Eaton, D., and Freeman, W. T., Object detection and localization using local and global features. In: Toward Category-level Object Recognition, pp. 382–400, 2006.
38. Obuchowski, N. A., ROC analysis. Fundamentals of Clinical Research for Radiologists 184(2):364–372, 2005.
39. Ojala, T., Pietikäinen, M., and Harwood, D., A comparative study of texture measures with classification based on featured distributions. Pattern Recogn. 29(1):51–59, 1996.
40. Organization, W. H., Global tuberculosis report. http://apps.who.int/iris/bitstream/10665/75938/1/9789241564502_eng.pdf. Online; accessed 23-March-2015, 2012.
41. Organization, W. H., Global tuberculosis report. http://apps.who.int/iris/bitstream/10665/137094/1/9789241564809_eng.pdf. Online; accessed 20-April-2018, 2017.
42. Rahman, M. M., You, D., Simpson, M. S., Antani, S., Demner-Fushman, D., and Thoma, G. R., Interactive cross and multimodal biomedical image retrieval based on automatic region-of-interest (ROI) identification and classification. IJMIR 3(3):131–146, 2014.
43. Saeys, Y., Inza, I. N., and Larrañaga, P., A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517, 2007.
44. Santosh, K. C., Vajda, S., Antani, S. K., and Thoma, G. R., Edge map analysis in chest x-rays for automatic pulmonary abnormality screening. Int. J. Comput. Assist. Radiol. Surg. 11(9):1637–1646. https://doi.org/10.1007/s11548-016-1359-6, 2016.
45. Shiraishi, J., Katsuragawa, S., Ikezoe, J., Matsumoto, T., Kobayashi, T., Komatsu, K., Matsui, M., Fujita, H., Kodera, Y., and Doi, K., Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists' detection of pulmonary nodules. Am. J. Roentgenol. 174:71–74, 2000.
46. Shiraishi, J., Li, F., and Doi, K., Computer-aided diagnosis for improved detection of lung nodules by use of posterior-anterior and lateral chest radiographs. Acad. Radiol. 14(1):28–37. https://doi.org/10.1016/j.acra.2006.09.057, 2007. http://www.sciencedirect.com/science/article/pii/S1076633206005599.
47. Shiraishi, J., Li, Q., Appelbaum, D., and Doi, K., Computer-aided diagnosis and artificial intelligence in clinical imaging. Semin. Nucl. Med. 41(6):449–462. https://doi.org/10.1053/j.semnuclmed.2011.06.004, 2011. http://www.sciencedirect.com/science/article/pii/S0001299811000742. Image Perception in Nuclear Medicine.
48. Singh, S., and Sharma, M., Texture analysis experiments with MeasTex and VisTex benchmarks. In: Singh, S., Murshed, N., and Kropatsch, W. (Eds.) Advances in Pattern Recognition - ICAPR 2001, Lecture Notes in Computer Science, pp. 419–426. Berlin: Springer, 2001.
49. Smialowski, P., Frishman, D., and Kramer, S., Pitfalls of supervised feature selection. Bioinformatics 26(3):440–443, 2010.
50. Vajda, S., Rangoni, Y., and Cecotti, H., Semi-automatic ground truth generation using unsupervised clustering and limited manual labeling: Application to handwritten character recognition. Pattern Recogn. Lett. 58(0):23–28, 2015.
51. Wang, S. H., Muhammad, K., Lv, Y., Sui, Y., Han, L., and Zhang, Y. D., Identification of alcoholism based on wavelet Renyi entropy and three-segment encoded Jaya algorithm. Complexity 2018:13, 2018.
52. Weinberger, S., Cockrill, B., and Mandel, J., Principles of pulmonary medicine. Elsevier Health Sciences, 2013.
53. Zhang, Y., Sun, Y., Phillips, P., Liu, G., Zhou, X., and Wang, S., A multilayer perceptron based smart pathological brain detection system by fractional Fourier entropy. J. Med. Syst. 40(7):1–11, 2016.
54. Zhu, Y., Tan, Y., Hua, Y., Wang, M., Zhang, G., and Zhang, J., Feature selection and performance evaluation of support vector machine (SVM)-based classifier for differentiating benign and malignant pulmonary nodules by computed tomography. J. Digit. Imaging 23(1):51–65, 2010.
