
2018 24th International Conference on Pattern Recognition (ICPR)
Beijing, China, August 20-24, 2018

How do Convolutional Neural Networks Learn Design?

Shailza Jolly∗, Brian Kenji Iwana†, Ryohei Kuroki†, Seiichi Uchida†
∗University of Kaiserslautern, Kaiserslautern, Germany
Email: sjolly@rhrk.uni-kl.de
†Department of Advanced Information Technology, Kyushu University, Fukuoka, Japan
Email: {brian, kuroki, uchida}@human.ait.kyushu-u.ac.jp

Abstract—In this paper, we aim to understand the design principles in book cover images, which are carefully crafted by experts. Book covers are designed in a unique way, specific to genres, which conveys important information to their readers. By using Convolutional Neural Networks (CNN) to predict book genres from cover images, visual cues which distinguish genres can be highlighted and analyzed. In order to understand these visual cues contributing towards the decision of a genre, we present the application of Layer-wise Relevance Propagation (LRP) to the book cover image classification results. We use LRP to explain the pixel-wise contributions of book cover design and highlight the design elements contributing towards particular genres. In addition, with the use of state-of-the-art object and text detection methods, insights about genre-specific book cover designs are discovered.

I. INTRODUCTION

Visual design renders specific impressions to transmit information, which enriches a product's value. However, these visual designs, despite their importance, are rarely analyzed objectively or statistically. Analyzing these visual designs enables us to understand the information they carry.

An interesting target of visual design analysis is book cover design, where the design of a book cover can indicate the genre. Each book cover is carefully designed by typographers, and their designs represent the book contents in an intuitive way for better sales. This association of books with specific genres is based on the differences in their underlying cover designs [1]. A slight change in book cover design can reflect a change in book genre, which makes design learning a challenging task for book covers.

In order to understand the design elements used for machine-aided book cover classification, we employ Convolutional Neural Networks (CNN) [2]. In recent years, CNNs have achieved state-of-the-art results in isolated character recognition [3], [4] and large-scale image recognition [5], [6]. Notably, Iwana et al. [1] demonstrated that CNNs can be used for genre classification based on book cover images, although with a high level of difficulty. However, that study was subjective, and not enough explanation was given as to why the CNN performed as it did.

To interpret the reasoning behind a CNN's prediction, we use a method called Layer-wise Relevance Propagation (LRP) [7]. LRP decomposes the output function onto its input variables and highlights the input pixels contributing towards the network decision. It produces a layer-wise relevance heatmap by recursively multiplying the relevance of higher layers by the normalized feature maps of the target layer. The heatmaps can help us discover the input image elements which have an effect on the classification result.

The main contributions of this paper are threefold. Firstly, we classify the book cover images using one-vs-others classification with CNNs. Secondly, the models built by the CNNs are analyzed using LRP. With LRP, we demonstrate the design elements specifically relevant to the classification of book cover images and show that certain objects have a strong relevance to particular genres. Finally, we use state-of-the-art object detection and text detection methods, namely the Single Shot Multibox Detector (SSD) [8] and the Efficient and Accurate Scene Text Detector (EAST) [9], to quantitatively reinforce the results found by LRP. This reveals the specific elements by which CNNs classify book cover images for genre classification.

The organization is as follows. Section II provides related work in design understanding and genre classification as well as feature visualization of CNNs. Section III reviews the data and tools used for understanding book cover design. Section IV presents an analysis of the CNN's understanding of book cover design. In Section V, we demonstrate the use of LRP combined with SSD and EAST for quantitative analysis. Finally, Section VI draws a conclusion.

II. RELATED WORK

A. Genre Classification

Artistic style understanding and subjective genre classification is a budding field in machine learning. For example, recent attempts have been made to identify the artistic style and quality of paintings and photographs [10], [11] with neural network models. In addition, there have been trials to classify music by genre [12], [13], book covers by genre [1], movie posters by genre [14], paintings by genre [15], and text by genre [16], [17]. Also, in a general sense, document classification can be considered genre classification, and deep CNNs are the state of the art in the document classification domain [18]–[20].

B. Visualization inside of CNNs

There is a desire to visualize features and determine pixel-wise attention and relevance within the hidden layers of CNNs. However, this is not a straightforward task [21]. Erhan et al. [21] proposed using gradient descent to maximize a node's activation in order to visualize the features it employs. Similar work has been done for large-scale image classification [22]. Zeiler and Fergus [23] used deconvolutional neural networks to visualize features learned by CNNs. In addition, they created heatmaps by monitoring class changes under systematic cover-up of portions of the images. Class Activation Maps (CAM) [24], GradCAM [25], and GradCAM++ [26] reveal the parts of images which are most important to a class using global average pooling (GAP).

Recently, LRP has been used in the field of text classification [27], where classification scores were projected back onto input features to extract the words relevant to a specific prediction. The method has also shown success in model understanding in the fields of sentiment analysis [28], action recognition [29], and age and gender classification [30]. As far as the authors are aware, this is the first time LRP has been used for the understanding of genre or design classification.
III. DATA AND TOOLS FOR UNDERSTANDING BOOK COVER DESIGN

A. Amazon Book Cover Dataset

We used the Book Cover Image to Genre dataset¹, Task 1. The dataset consists of 57,000 book cover images divided into 30 classes of equal size. In the experiments, we used the predefined training set and test set, modified for one-vs-others classification. In this way, genre-wise training sets were prepared with an equal distribution of positive and negative data samples.

¹ https://github.com/uchidalab/book-dataset
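As a concrete illustration of this setup, one such genre-wise binary split might be built as in the sketch below; the (image_path, genre) pair format and the function name are our own assumptions, not part of the dataset's published tooling.

```python
# Sketch of a one-vs-others split: positives are one genre, negatives an
# equally sized random sample drawn from all of the other genres.
import random

def one_vs_others(samples, target_genre, seed=0):
    """samples: list of (image_path, genre) pairs; returns (path, 0/1) pairs."""
    positives = [(path, 1) for path, genre in samples if genre == target_genre]
    others = [(path, 0) for path, genre in samples if genre != target_genre]
    random.seed(seed)
    negatives = random.sample(others, len(positives))  # equal distribution
    split = positives + negatives
    random.shuffle(split)
    return split
```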
B. Convolutional Neural Networks

CNNs are able to tackle image recognition by implementing convolutions of learned, filter-like shared weights which maintain the structural qualities of images while acting as feature extractors [2]. For the experiments, we implement CNNs to tackle book genre classification. To use the book cover images with a CNN, they were preprocessed by scaling them to 112×112 pixels by 3 color channels and by normalizing the values to be between -1 and 1. The CNN used for the experiments has six convolutional layers with Rectified Linear Unit (ReLU) activations and a softmax output layer. The convolutional layers consist of three layers of 10 nodes with 5×5 convolutional filters, one layer of 25 nodes with a 4×4 filter, one layer of 50 nodes with a 3×3 filter, and one layer of 100 nodes with a 1×1 filter. A 2×2 max pooling layer with stride 2 was used between each convolutional layer. Finally, the CNNs were trained using gradient descent with a batch size of 25 at a learning rate of 0.001 for 50,000 iterations.
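For concreteness, the architecture described above might be sketched in PyTorch as follows; the authors did not publish code, so anything beyond the stated layer sizes, such as the 'same'-style padding and the two-way output for the one-vs-others setup, is an assumption.

```python
# A minimal sketch of the genre classifier described in the text, assuming
# 'same'-style padding; the input is a (N, 3, 112, 112) batch scaled to [-1, 1].
import torch.nn as nn

class BookCoverCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # three layers of 10 nodes with 5x5 filters
            nn.Conv2d(3, 10, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(10, 10, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(10, 10, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2, 2),
            # one layer of 25 nodes with a 4x4 filter
            nn.Conv2d(10, 25, 4, padding=2), nn.ReLU(), nn.MaxPool2d(2, 2),
            # one layer of 50 nodes with a 3x3 filter
            nn.Conv2d(25, 50, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            # one layer of 100 nodes with a 1x1 filter
            nn.Conv2d(50, 100, 1), nn.ReLU(),
        )
        # two-way output for one-vs-others (the softmax lives in the loss)
        self.classifier = nn.Linear(100 * 3 * 3, 2)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

Training, per the text, would then use plain gradient descent (e.g. torch.optim.SGD with lr=0.001) on batches of 25 covers.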
The accuracy results for each genre are summarized in Fig. 1. In particular, the CNNs had difficulties with the reference classes, such as "Engineering & Transportation," "Health, Fitness and Dieting," "History," "Medical Books," and "Reference." Conversely, "Children's Books," "Romance," and "Test Preparation" had high accuracies. However, beyond classification accuracy, the purpose of this paper is to understand why the CNNs performed as they did and to reveal the relevant parts of the images.

[Fig. 1. CNN accuracy by genre.]

C. Layer-wise Relevance Propagation

The LRP algorithm and the LRP toolbox [31] aim to explain the reasoning behind the decision made by a network model, which allows users to validate classifier results. LRP is mainly derived from Deep Taylor Decomposition [32], a method of decomposing a network's output predictions onto its input variables. The result of such a decomposition is visualized in the form of a heatmap highlighting each pixel's importance for the prediction.

LRP explains the output function, i.e., the classifier's decision, which helps us to derive all of the crucial pixels for a particular prediction. Figure 2 shows the technique: the output value given by the network is decomposed backwards layer by layer until it reaches the input. This backward decomposition of the network's prediction uses local redistribution rules for assigning a relevance value $R_i$ to each neuron contributing towards the output, namely

$$\sum_i R_i = \sum_j R_j = \cdots = \sum_k R_k = f(x), \qquad (1)$$

where $f(x)$ is the prediction function, $R_i$ is the relevance of node $i$ in the target layer, $R_j$ is the relevance of node $j$ of the previous layer, and $R_k$ is the relevance of node $k$ of the highest layer. The total amount of relevance is conserved in this equation.

[Fig. 2. Feed forward neural network with the (left) forward pass and the (right) backward relevance calculation. The function f(x) is the prediction outcome given input x. The variables a_i and a_j are the inputs for nodes i and j, respectively. R_i is the relevance of node i and R_j is the relevance of node j.]

For the experiment, we used the α-β decomposition formula defined by

$$R_i = \sum_j \left( \alpha \frac{(a_i w_{ij})^+}{\sum_i (a_i w_{ij})^+} + \beta \frac{(a_i w_{ij})^-}{\sum_i (a_i w_{ij})^-} \right) R_j, \qquad (2)$$

where α and β are hyperparameters weighting the positive values $(a_i w_{ij})^+ / \sum_i (a_i w_{ij})^+$ and the negative values $(a_i w_{ij})^- / \sum_i (a_i w_{ij})^-$, respectively. Furthermore, $w_{ij}$ is the weight between nodes $i$ and $j$, and $a_i$ is the input to node $i$. This decomposition allows for the separation of the positive connections from the negative connections: the positive term propagates activating input messages, while the negative term propagates deactivating input values.
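A minimal NumPy sketch of the rule in Eq. (2) for a single fully connected layer might look like the following; the epsilon stabilizer and the function interface are our additions, not something specified by the paper or the LRP toolbox.

```python
# Sketch of the alpha-beta relevance redistribution of Eq. (2) for one
# fully connected layer; relevance flows from the J output nodes back
# onto the I input nodes.
import numpy as np

def lrp_alpha_beta(a, W, R_j, alpha=2.0, beta=-1.0, eps=1e-9):
    """a: (I,) inputs to the layer; W: (I, J) weights; R_j: (J,) relevance."""
    z = a[:, None] * W                       # contributions a_i * w_ij
    z_pos = np.clip(z, 0, None)              # (a_i w_ij)^+
    z_neg = np.clip(z, None, 0)              # (a_i w_ij)^-
    s_pos = z_pos.sum(axis=0) + eps          # normalizers, summed over i
    s_neg = z_neg.sum(axis=0) - eps
    # Eq. (2): mix the normalized positive and negative parts and
    # redistribute each R_j onto the inputs
    return ((alpha * z_pos / s_pos + beta * z_neg / s_neg) * R_j).sum(axis=1)
```

With α = 2 and β = −1, as used in Section IV, α + β = 1, so the redistributed relevance is (up to the stabilizer) conserved, in agreement with Eq. (1).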

D. Single-Shot Multibox Detector

To develop a better understanding of the objects within book cover images, we employed SSD [8], a state-of-the-art deep neural network based object detection method. SSD is a feed-forward CNN which produces a multi-scale collection of fixed-size bounding boxes and scores for object detection within the boxes. A final non-maximal suppression step determines the final detections. The result of SSD is a set of bounding box regions with object classification labels. Using SSD, it is possible to accurately detect multiple objects of different classes within images.
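As a sketch of the final filtering step mentioned above, a generic greedy non-maximal suppression could look like the following; the IoU threshold and the (x1, y1, x2, y2) box format are assumptions rather than SSD's exact configuration.

```python
# Generic greedy non-maximal suppression: keep the highest-scoring boxes
# and drop any remaining box that overlaps a kept one too strongly.
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences."""
    order = np.argsort(scores)[::-1]         # best box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]      # suppress heavy overlaps
    return keep
```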
E. Efficient and Accurate Scene Text Detector

For humans, text is an important component of book covers; it is where the title, authors, and additional information are conveyed. However, a CNN may place a different importance on text than humans do. Thus, to analyze the relevance of text in book covers, we use EAST [9] as a text detector. EAST uses a multi-channel Fully Convolutional Network (FCN) and non-maximal suppression on predicted geometric shapes to detect multi-oriented text lines and word boxes.
IV. HOW CNNS UNDERSTAND BOOK COVER DESIGN: QUALITATIVE ANALYSIS

In this section, we present LRP results for the main genres. The analysis helped us to deduce the book cover design elements contributing towards a prediction by the CNN. We used the α-β decomposition formula with values of α = 2 and β = −1, which is suggested for networks using ReLU activation functions because it emphasizes the positive elements and de-emphasizes the negative ones [7]. This is important because the ReLU activation function sets negative values to zero. In the heatmaps generated by LRP under this decomposition, pixels adding a positive contribution are shown in red, and pixels adding a negative contribution are shown in blue.
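A relevance map can be rendered in this red/blue convention with a symmetric diverging colormap, as in the sketch below; this is our own visualization recipe, not the LRP toolbox's exact rendering.

```python
# Render a pixel-wise relevance map with red for positive and blue for
# negative contributions, centered so that zero relevance maps to white.
import numpy as np
import matplotlib.pyplot as plt

def show_heatmap(relevance):
    """relevance: (H, W) LRP map, e.g. summed over the color channels."""
    bound = np.abs(relevance).max() + 1e-12   # symmetric range around zero
    plt.imshow(relevance, cmap="bwr", vmin=-bound, vmax=bound)
    plt.colorbar(label="relevance")
    plt.axis("off")
    plt.show()
```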

A. Sports & Outdoors

Under this genre, many book covers show pictures of players playing indoor and outdoor games. Figure 3 (a) shows LRP results on these covers, which present the significance of the player's picture on the cover. The first image in Fig. 3 (a) supports this fact, with the LRP relevance centered on players who are either playing a sport or making a player-like gesture, while the car in the background adds no contribution. The second image in Fig. 3 (a) emphasizes the animal's importance for this genre's prediction.

B. Engineering & Transportation

For this genre, almost all the covers with vehicle pictures were classified correctly by the network. With LRP, in Fig. 3 (b), the parts of the image containing cars or motorbikes add more relevance than the others. The last image in Fig. 3 (b) presents a case where the contribution of the person in the image was dominated by that of the vehicle.

C. Romance

As is obvious from the genre name, pictures of couples on the cover carry high relevance, and the LRP results showed this to be true. However, among the pictures presented in Fig. 3 (c), LRP depicted the women as adding more relevance than the men or other elements. The reason could reside in their physical appearance, hair, and choice of dress. The same is demonstrated in the last picture of Fig. 3 (c), in which the woman's hair adds the most relevance, with zero relevance coming from the animal on the book cover.

D. Children's Books

Almost all children's book covers contain pictures of cartoon characters, and LRP on covers from this genre showed these cartoon characters to have higher relevance. An interesting result is shown in the first picture of Fig. 3 (d), where the person acts as an adversarial identity and the importance of the cartoons on the cover is highlighted. Some covers showed more relevance for one object within a set of objects; for example, in the last picture of Fig. 3 (d), some cartoons have higher relevance than others. This may be due to object placement and orientation. With the help of this information, one can make smart choices about characters, cartoons, and color patterns.

E. Cookbooks, Food & Wine Books

Book covers in this genre most commonly contain pictures of different kinds of food. The results in Fig. 3 (e) show these food pictures to contain high relevance for this genre. However, on carefully analyzing the LRP results, we discovered that the shapes of dishes, such as bowls or spoons, add significant relevance to the genre's prediction. This marks the significance of dish-shape designs on covers from this genre.

[Fig. 3. Correctly recognized book covers with object classes by SSD and text by EAST highlighted: (a) "Sports & Outdoors," (b) "Engineering & Transportation," (c) "Romance," (d) "Children's Books," (e) "Cookbooks, Food & Wine," (f) "Test Preparation."]

F. Test Preparation

This genre contains covers with both textual and pictorial information, as shown in Fig. 3 (f), with most of the contribution coming from the large text content on the covers. The images in Fig. 3 (f) show large text adding more relevance than images of people. In the first image of Fig. 3 (f), despite the large face of a girl, the relevance is concentrated on the text area of the book cover.

Such analysis helped us to find design elements specific to the presented genres. To become more familiar with the designs, we also present some cases where the network was not able to correctly classify the genre. Figure 4 shows some of these misclassifications, mainly from the presented genres; the correct genre names are written below each image. From the analysis presented above, one can easily decode the reason behind these misclassifications: the designs of these book covers are not aligned with their genres, which makes it natural for the network to misclassify them. Here, the "Sports & Outdoors" cover contains birds, the "Romance" cover contains text, the "Cookbooks, Food & Wine Books" cover contains no food picture, and the "Test Preparation" cover lacks any other significant feature. LRP justifies the misclassification of all these covers by highlighting the mentioned objects as contributing towards the "other" class in the one-vs-others setup.

[Fig. 4. Misclassified book covers with the correct genre names written below each cover: "Sports & Outdoors," "Romance," "Cookbooks, Food & Wine," "Test Preparation."]

V. HOW CNNS UNDERSTAND BOOK COVER DESIGN: QUANTITATIVE ANALYSIS

A. Experiment Setup

In order to quantitatively analyze LRP, we propose using SSD to detect objects and EAST to detect text within the book cover images. We then use LRP to compare the relevance of the objects bounded by the detection methods. The SSD was trained on the 2012 PASCAL Visual Object Classes (VOC) Challenge dataset [33]. The VOC dataset contains 20 classes, including "person," six animal classes, eight vehicle classes, and seven indoor object classes. While an SSD trained with VOC is intended for natural scene images, it can be used with book cover images because book covers often contain many of the shared classes, such as "person" and "car." Similarly, EAST was trained on the 2015 ICDAR Robust Reading Competition dataset [34], meant for scene text detection. Despite being trained for scene text, as shown in Fig. 3, EAST performs remarkably well at detecting text on book covers.

To extract object and text bounding box information, the book covers were prepared by scaling the images to 512×512 pixels by 3 color channels. It is important to note that the images used for SSD and EAST were larger than the images used by the CNN for genre classification. This is because the detection methods are much more effective at the higher resolution. To accommodate this, the bounding boxes were scaled post-detection and projected onto the LRP heatmaps.

[Fig. 5. Average object-wise relevance for text detected by EAST and each object class detected by SSD for each book genre. Only object-genre combinations with five or more data points are shown.]

The relevance of an object $R_{obj}$ is calculated as the sum of the relevance within its bounding box, or

$$R_{obj} = \sum_{(n,m) \in B} R_{(n,m)}, \qquad (3)$$

where $R_{(n,m)}$ is the relevance at pixel coordinates $(n, m)$ within bounding box $B$.
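Equation (3), together with the post-detection box scaling described above, might be implemented as in the following sketch; the 512-to-112 scale factor follows from the two input sizes stated in the text, while the function interface is an assumption.

```python
# Sketch of Eq. (3): project a detection box from the 512x512 detector
# input onto the 112x112 LRP heatmap and sum the relevance inside it.
import numpy as np

def object_relevance(heatmap, box, det_size=512):
    """heatmap: (112, 112) LRP map; box: (x1, y1, x2, y2) at detector scale."""
    scale = heatmap.shape[0] / det_size           # e.g. 112 / 512
    x1, y1, x2, y2 = (int(round(c * scale)) for c in box)
    return heatmap[y1:y2, x1:x2].sum()            # R_obj = sum of R_(n,m) in B
```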

B. LRP with Object Detection

A macro view of the genres can be seen by examining the average relevance of the object classes. Figure 5 illustrates the average object-wise relevance of each object class as detected by SSD and EAST for each book genre using the test set book cover images. It should be noted that detected objects such as "bottle" and "tvmonitor" were overfit to certain book cover images because many books have plain covers which resemble bottle labels or televisions. However, this does not mean that the information is useless. For example, from Fig. 5, "bottle" is more relevant for reference and nonfiction genres, where plain covers are common.

In addition, by examining the distribution of $R_{obj}$ for specific object classes, such as "person," it is possible to create associations between genres and detected objects. For example, the relevance of "person," $R_{person}$, for each genre is shown in Fig. 6. The figure demonstrates that detected "person"s within certain genres are more relevant than in other genres. For instance, the genres of "Romance" and "Mystery, Thriller & Suspense" put a high average relevance on "person." This indicates that "person" is important for the CNNs of those categories. In addition, as mentioned in Section IV-F and shown in Fig. 3 (f), people are common in "Test Preparation" covers but are not necessarily relevant. This is supported by Fig. 6, which indicates that, on average, "person" has very little relevance there. Distributions for the other object classes are provided in the supplemental material.

[Fig. 6. Box plot of the relevance of "person," R_person, for each genre. The boxes represent the first through third quartile and the mean is in red. The whiskers mark the minimum and maximum datum.]

C. LRP with Text Detection

Figure 5 also reveals that the average relevance of text is low. The reasoning behind this phenomenon can be explained by Fig. 7. The figure shows that the majority of the detected text boxes have a very small relevance $R_{text}$, but some text boxes have a higher relevance. For most genres, the title text contains a significant amount of relevance as determined by LRP, but the small descriptive text carries very little relevance. Figure 3 (f) in particular demonstrates this, with the large title text having a high relevance and much of the smaller descriptive text having near-zero relevance.

[Fig. 7. Box plot of the relevance of "text," R_text, for each genre. The boxes represent the first through third quartile and the mean is in red. The whiskers mark the minimum and maximum datum.]

VI. CONCLUSION

In this paper, we presented the importance of design in book covers belonging to specific genres. The application of LRP to the book cover dataset revealed genre-specific book cover features. The method described the most relevant parts of an input book cover contributing towards a genre prediction by the CNN. We also presented a quantitative analysis of LRP using an object detection method, SSD, and a text detection method, EAST. The analysis further demonstrates that genre classification heavily relies on specific objects for each genre.

VII. ACKNOWLEDGEMENT

This research was partially supported by MEXT-Japan (Grant No. J17H06100).
REFERENCES

[1] B. K. Iwana, S. T. R. Rizvi, S. Ahmed, A. Dengel, and S. Uchida, "Judging a book by its cover," arXiv preprint arXiv:1610.09204, 2016.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[3] D. Ciregan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proc. IEEE Conf. Comput. Vision and Pattern Recognition, 2012, pp. 3642–3649.
[4] S. Uchida, S. Ide, B. K. Iwana, and A. Zhu, "A further step to perfect accuracy by training CNN with larger data," in Int. Conf. Frontiers in Handwriting Recognition, 2016.
[5] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[6] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vision and Pattern Recognition, 2015, pp. 1–9.
[7] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, "On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation," PLoS ONE, vol. 10, no. 7, p. e0130140, 2015.
[8] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in European Conf. Comput. Vision, Springer, 2016, pp. 21–37.
[9] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, "EAST: An efficient and accurate scene text detector," in IEEE Conf. Comput. Vision and Pattern Recognition, 2017, pp. 2642–2651.
[10] R. Datta, D. Joshi, J. Li, and J. Z. Wang, "Image retrieval: Ideas, influences, and trends of the new age," ACM Computing Surveys, vol. 40, no. 2, p. 5, 2008.
[11] S. Karayev, M. Trentacoste, H. Han, A. Agarwala, T. Darrell, A. Hertzmann, and H. Winnemoeller, "Recognizing image style," arXiv preprint arXiv:1311.3715, 2013.
[12] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech and Audio Process., vol. 10, no. 5, pp. 293–302, 2002.
[13] C. McKay and I. Fujinaga, "Automatic genre classification using large high-level musical feature sets," in Int. Soc. of Music Inform. Retrieval, vol. 2004, 2004, pp. 525–530.
[14] W.-T. Chu and H.-J. Guo, "Movie genre classification based on poster images with deep neural networks," in Proc. Workshop Multimodal Understanding of Social, Affective and Subjective Attributes, 2017, pp. 39–45.
[15] J. Zujovic, L. Gandy, S. Friedman, B. Pardo, and T. N. Pappas, "Classifying paintings by artistic genre: An analysis of features & classifiers," in IEEE Int. Workshop Multimedia Sig. Process., 2009, pp. 1–5.
[16] A. Finn and N. Kushmerick, "Learning to classify documents according to genre," J. Amer. Soc. for Inform. Sci. and Technology, vol. 57, no. 11, pp. 1506–1518, 2006.
[17] P. Petrenz and B. Webber, "Stable classification of text genres," Computational Linguistics, vol. 37, no. 2, pp. 385–393, 2011.
[18] L. Kang, J. Kumar, P. Ye, Y. Li, and D. Doermann, "Convolutional neural networks for document image classification," in Int. Conf. Pattern Recognition, 2014, pp. 3168–3172.
[19] A. W. Harley, A. Ufkes, and K. G. Derpanis, "Evaluation of deep convolutional nets for document image classification and retrieval," in Int. Conf. Document Anal. and Recognition, 2015, pp. 991–995.
[20] M. Z. Afzal, S. Capobianco, M. I. Malik, S. Marinai, T. M. Breuel, A. Dengel, and M. Liwicki, "DeepDocClassifier: Document classification with deep convolutional neural network," in Int. Conf. Document Anal. and Recognition, 2015, pp. 1111–1115.
[21] D. Erhan, Y. Bengio, A. Courville, and P. Vincent, "Visualizing higher-layer features of a deep network," Tech. Rep., University of Montreal, vol. 1341, p. 3, 2009.
[22] C. Olah, A. Mordvintsev, and L. Schubert, "Feature visualization," Distill, 2017, https://distill.pub/2017/feature-visualization.
[23] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conf. Comput. Vision, Springer, 2014, pp. 818–833.
[24] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proc. IEEE Conf. Comput. Vision and Pattern Recognition, 2016, pp. 2921–2929.
[25] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra, "Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization," arXiv preprint arXiv:1610.02391, 2016.
[26] A. Chattopadhyay, A. Sarkar, P. Howlader, and V. N. Balasubramanian, "Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks," arXiv preprint arXiv:1710.11063, 2017.
[27] L. Arras, F. Horn, G. Montavon, K.-R. Müller, and W. Samek, "'What is relevant in a text document?': An interpretable machine learning approach," PLoS ONE, 2017.
[28] L. Arras, G. Montavon, K.-R. Müller, and W. Samek, "Explaining recurrent neural network predictions in sentiment analysis," in Proc. Workshop Computational Approaches to Subjectivity, Sentiment and Social Media Anal., Association for Computational Linguistics, 2017, pp. 159–168.
[29] V. Srinivasan, S. Lapuschkin, C. Hellge, K.-R. Müller, and W. Samek, "Interpretable human action recognition in compressed domain," in IEEE Int. Conf. Acoustics, Speech and Sig. Process., 2017.
[30] F. Arbabzadeh, G. Montavon, K.-R. Müller, and W. Samek, "Identifying individual facial expressions by deconstructing a neural network," in German Conf. Pattern Recognition, ser. Lecture Notes Comput. Science, vol. 9796, B. Rosenhahn and B. Andres, Eds., Springer International Publishing, 2016, pp. 344–354.
[31] S. Lapuschkin, A. Binder, G. Montavon, K.-R. Müller, and W. Samek, "The LRP toolbox for artificial neural networks," J. Mach. Learning Research, vol. 17, no. 1, pp. 3938–3942, 2016.
[32] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K.-R. Müller, "Explaining nonlinear classification decisions with deep Taylor decomposition," Pattern Recognition, vol. 65, pp. 211–222, 2017.
[33] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," Int. J. Comput. Vision, vol. 88, no. 2, pp. 303–338, Jun. 2010.
[34] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu et al., "ICDAR 2015 competition on robust reading," in Int. Conf. Document Anal. and Recognition, 2015, pp. 1156–1160.
