
Semi-supervised Domain Adaptation via Minimax Entropy

Kuniaki Saito¹, Donghyun Kim¹, Stan Sclaroff¹, Trevor Darrell² and Kate Saenko¹
¹Boston University, ²University of California, Berkeley
{keisaito, donhk, sclaroff, saenko}@bu.edu, trevor@eecs.berkeley.edu

Abstract

Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision. However, we show that these techniques perform poorly when even a few labeled examples are available in the target domain. To address this semi-supervised domain adaptation (SSDA) setting, we propose a novel Minimax Entropy (MME) approach that adversarially optimizes an adaptive few-shot model. Our base model consists of a feature encoding network, followed by a classification layer that computes the features' similarity to estimated prototypes (representatives of each class). Adaptation is achieved by alternately maximizing the conditional entropy of unlabeled target data with respect to the classifier and minimizing it with respect to the feature encoder. We empirically demonstrate the superiority of our method over many baselines, including conventional feature alignment and few-shot methods, setting a new state of the art for SSDA. Our code is available at http://cs-people.bu.edu/keisaito/research/MME.html.

Figure 1: We address the task of semi-supervised domain adaptation. Top: Existing domain-classifier based methods align source and target distributions but can fail by generating ambiguous features near the task decision boundary. Bottom: Our method estimates a representative point of each class (prototype) and extracts discriminative features using a novel minimax entropy technique.

1. Introduction

Deep convolutional neural networks [16] have significantly improved image classification accuracy with the help of large quantities of labeled training data, but often generalize poorly to new domains. Recent unsupervised domain adaptation (UDA) methods [11, 19, 20, 28, 37] improve generalization on unlabeled target data by aligning distributions, but can fail to learn discriminative class boundaries on target domains (see Fig. 1). We show that in the Semi-Supervised Domain Adaptation (SSDA) setting, where a few target labels are available, such methods often do not improve performance relative to just training on labeled source and target examples, and can even make it worse.

We propose a novel approach for SSDA that overcomes the limitations of previous methods and significantly improves the accuracy of deep classifiers on novel domains with only a few labels per class. Our approach, which we call Minimax Entropy (MME), is based on optimizing a minimax loss on the conditional entropy of unlabeled data, as well as the task loss; this reduces the distribution gap while learning discriminative features for the task.

We exploit a cosine similarity-based classifier architecture recently proposed for few-shot learning [12, 5]. The classifier (top layer) predicts a K-way class probability vector by computing cosine similarity between K class-specific weight vectors and the output of a feature extractor (lower layers), followed by a softmax. Each class weight vector is an estimated "prototype" that can be regarded as a representative point of that class. While this approach outperformed more advanced methods in few-shot learning and we confirmed its effectiveness in our setting, as we show below it is still quite limited. In particular, it does not leverage unlabeled data in the target domain.

Our key idea is to minimize the distance between the class prototypes and neighboring unlabeled target samples, thereby extracting discriminative features. The problem is how to estimate domain-invariant prototypes without many
Figure 2: Top: baseline few-shot learning method, which estimates class prototypes by weight vectors, yet does not consider
unlabeled data. Bottom: our model extracts discriminative and domain-invariant features using unlabeled data through a
domain-invariant prototype estimation. Step 1: we update the estimated prototypes in the classifier to maximize the entropy
on the unlabeled target domain. Step 2: we minimize the entropy with respect to the feature extractor to cluster features
around the estimated prototype.
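The two steps in Figure 2 can be illustrated with a toy, pure-Python sketch (not the paper's implementation): a single 2-D unlabeled target feature stands in for the feature extractor's output, gradients are taken by finite differences rather than backpropagation, and the prototype weight vectors are updated by entropy gradient ascent (Step 1) while the feature is updated by entropy gradient descent (Step 2). The temperature, learning rate, and all numeric values are illustrative.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    # Shannon entropy of a class-probability vector.
    return -sum(q * math.log(q) for q in p if q > 0)

def predict(feature, protos, T=0.1):
    # Similarity-based classifier: l2-normalize the feature, then
    # compare it with each prototype weight vector, scaled by 1/T.
    n = math.sqrt(sum(v * v for v in feature))
    f = [v / n for v in feature]
    return softmax([sum(a * b for a, b in zip(f, w)) / T for w in protos])

def num_grad(fn, params, eps=1e-5):
    # Central-difference gradient of a scalar function of a flat list.
    g = []
    for i in range(len(params)):
        hi, lo = params[:], params[:]
        hi[i] += eps
        lo[i] -= eps
        g.append((fn(hi) - fn(lo)) / (2 * eps))
    return g

protos = [[1.0, 0.0], [0.0, 1.0]]   # estimated prototypes (one per class)
feat = [0.8, 0.2]                   # stand-in for F(x) on an unlabeled target image
h0 = entropy(predict(feat, protos))

# Step 1: gradient *ascent* on the prototypes -> entropy maximization.
flat = protos[0] + protos[1]
flat = [w + 0.5 * g for w, g in zip(flat, num_grad(
    lambda p: entropy(predict(feat, [p[:2], p[2:]])), flat))]
protos = [flat[:2], flat[2:]]
h1 = entropy(predict(feat, protos))

# Step 2: gradient *descent* on the feature -> entropy minimization.
feat = [v - 0.5 * g for v, g in zip(feat, num_grad(
    lambda f: entropy(predict(f, protos)), feat))]
h2 = entropy(predict(feat, protos))

assert h1 > h0  # Step 1 moved prototypes toward the target feature
assert h2 < h1  # Step 2 re-clustered the feature around a prototype
```

One alternating round raises the entropy while the prototypes move, then lowers it again while the feature moves, which is the minimax game in miniature; in the actual method the ascent direction is obtained with a gradient reversal layer rather than a separate update.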

labeled target examples. The prototypes are dominated by the source domain, as shown in the leftmost side of Fig. 2 (bottom), as the vast majority of labeled examples come from the source. To estimate domain-invariant prototypes, we move weight vectors toward the target feature distribution. Entropy on target examples represents the similarity between the estimated prototypes and target features. A uniform output distribution with high entropy indicates that the examples are similar to all prototype weight vectors. Therefore, we move the weight vectors towards the target by maximizing the entropy of unlabeled target examples in the first adversarial step. Second, we update the feature extractor to minimize the entropy of the unlabeled examples, to make them better clustered around the prototypes. This process is formulated as a minimax game between the weight vectors and the feature extractor, and is applied over the unlabeled target examples.

Our method offers a new state of the art in performance on SSDA; as reported below, we reduce the error relative to baseline few-shot methods which ignore unlabeled data by 8.5%, relative to current best-performing alignment methods by 8.8%, and relative to a simple model jointly trained on source and target by 11.3% in one adaptation scenario. Our contributions are summarized as follows:

• We highlight the limitations of state-of-the-art domain adaptation methods in the SSDA setting;

• We propose a novel adversarial method, Minimax Entropy (MME), designed for the SSDA task;

• We show our method's superiority to existing methods on benchmark datasets for domain adaptation.

2. Related Work

Domain Adaptation. Semi-supervised domain adaptation (SSDA) is a very important task [8, 40, 1]; however, it has not been fully explored, especially with regard to deep learning based methods. We revisit this task and compare our approach to recent semi-supervised learning and unsupervised domain adaptation methods. The main challenge in domain adaptation (DA) is the gap in feature distributions between domains, which degrades the source classifier's performance. Most recent work has focused on unsupervised domain adaptation (UDA) and, in particular, feature distribution alignment. The basic approach measures the distance between feature distributions in source and target, then trains a model to minimize this distance. Many UDA methods utilize a domain classifier to measure the distance [11, 37, 19, 20, 33]. The domain classifier is trained to discriminate whether input features come from the source or target, whereas the feature extractor is trained to deceive the domain classifier to match feature distributions. UDA has been applied to various applications such as image classification [27], semantic segmentation [32], and object detection [6, 29]. Some methods minimize task-specific decision boundaries' disagreement on target examples [30, 28] to push target features far from decision boundaries. In this respect, they increase between-class variance of target features; on the other hand, we propose to make target features well-clustered around estimated prototypes. Our MME approach can reduce within-class variance as well as increase between-class variance, which results in more discriminative features. Interestingly, we empirically observe that UDA methods [11, 20, 28] often fail to improve accuracy in SSDA.

Semi-supervised learning (SSL). Generative [7, 31],

model-ensemble [17], and adversarial approaches [22] have boosted performance in semi-supervised learning, but do not address domain shift. Conditional entropy minimization (CEM) is a widely used method in SSL [13, 10]. However, we found that CEM fails to improve performance when there is a large domain gap between the source and target domains (see experimental section). MME can be regarded as a variant of entropy minimization which overcomes the limitation of CEM in domain adaptation.

Few-shot learning (FSL). Few-shot learning [35, 39, 26] aims to learn novel classes given a few labeled examples and labeled "base" classes. SSDA and FSL make different assumptions: FSL does not use unlabeled examples and aims to acquire knowledge of novel classes, while SSDA aims to adapt to the same classes in a new domain. However, both tasks aim to extract discriminative features given a few labeled examples from a novel domain or novel classes. We employ a network architecture with ℓ2 normalization on features before the last linear layer and a temperature parameter T, which was proposed for face verification [25] and applied to few-shot learning [12, 5]. Generally, classification of a feature vector with a large norm results in confident output. To make the output more confident, networks can try to increase the norm of features. However, this does not necessarily increase the between-class variance, because increasing the norm does not change the direction of vectors. ℓ2 normalization on feature vectors can solve this issue. To make the output more confident, the network instead focuses on making the directions of features from the same class closer to each other while separating different classes. This simple architecture was shown to be very effective for few-shot learning [5] and we build our method on it in our work.

Figure 3: Overview of the model. Labeled source examples, labeled target examples D_t = {(x_i^t, y_i^t)}_{i=1}^{m_t}, and unlabeled target examples D_u = {(x_i^u)}_{i=1}^{m_u} pass through the feature extractor F and ℓ2 normalization into the classifier C (weight matrix W, temperature T), which outputs p(x) = σ((1/T) W^T F(x)) through a softmax σ. The gradient of the entropy loss on unlabeled target examples is flipped by a gradient reversal layer [11, 37].
sha1_base64="7LJVnfaswH0P4hKinZRIP5cYjKo=">AAACZHichVHLSsNAFD2Nr1ofrRZBEEQsiqtyI4Liquimyz5sLVSRJI4amiYhSQu1+AO6VVy4UhARP8ONP+CiPyCIywpuXHibBkRFvcPMnDlzz50zM6pt6K5H1ApJPb19/QPhwcjQ8MhoNDY2XnStmqOJgmYZllNSFVcYuikKnu4ZomQ7QqmqhthUK+ud/c26cFzdMje8hi22q8q+qe/pmuIxlU3vxBKUJD9mfgI5AAkEkbFiN9jCLixoqKEKARMeYwMKXG5lyCDYzG2jyZzDSPf3BY4QYW2NswRnKMxWeNznVTlgTV53arq+WuNTDO4OK2cwR490S216oDt6pvdfazX9Gh0vDZ7VrlbYO9Hjyfzbv6oqzx4OPlV/evawhxXfq87ebZ/p3ELr6uuH5+38am6uOU9X9ML+L6lF93wDs/6qXWdF7gIR/gD5+3P/BMXFpExJObuUSK0FXxHGFGaxwO+9jBTSyKDA5wqc4BRnoSdpWIpLE91UKRRo4vgS0vQHlkyJyA==</latexit><latexit
sha1_base64="7LJVnfaswH0P4hKinZRIP5cYjKo=">AAACZHichVHLSsNAFD2Nr1ofrRZBEEQsiqtyI4Liquimyz5sLVSRJI4amiYhSQu1+AO6VVy4UhARP8ONP+CiPyCIywpuXHibBkRFvcPMnDlzz50zM6pt6K5H1ApJPb19/QPhwcjQ8MhoNDY2XnStmqOJgmYZllNSFVcYuikKnu4ZomQ7QqmqhthUK+ud/c26cFzdMje8hi22q8q+qe/pmuIxlU3vxBKUJD9mfgI5AAkEkbFiN9jCLixoqKEKARMeYwMKXG5lyCDYzG2jyZzDSPf3BY4QYW2NswRnKMxWeNznVTlgTV53arq+WuNTDO4OK2cwR490S216oDt6pvdfazX9Gh0vDZ7VrlbYO9Hjyfzbv6oqzx4OPlV/evawhxXfq87ebZ/p3ELr6uuH5+38am6uOU9X9ML+L6lF93wDs/6qXWdF7gIR/gD5+3P/BMXFpExJObuUSK0FXxHGFGaxwO+9jBTSyKDA5wqc4BRnoSdpWIpLE91UKRRo4vgS0vQHlkyJyA==</latexit><latexit
sha1_base64="7LJVnfaswH0P4hKinZRIP5cYjKo=">AAACZHichVHLSsNAFD2Nr1ofrRZBEEQsiqtyI4Liquimyz5sLVSRJI4amiYhSQu1+AO6VVy4UhARP8ONP+CiPyCIywpuXHibBkRFvcPMnDlzz50zM6pt6K5H1ApJPb19/QPhwcjQ8MhoNDY2XnStmqOJgmYZllNSFVcYuikKnu4ZomQ7QqmqhthUK+ud/c26cFzdMje8hi22q8q+qe/pmuIxlU3vxBKUJD9mfgI5AAkEkbFiN9jCLixoqKEKARMeYwMKXG5lyCDYzG2jyZzDSPf3BY4QYW2NswRnKMxWeNznVTlgTV53arq+WuNTDO4OK2cwR490S216oDt6pvdfazX9Gh0vDZ7VrlbYO9Hjyfzbv6oqzx4OPlV/evawhxXfq87ebZ/p3ELr6uuH5+38am6uOU9X9ML+L6lF93wDs/6qXWdF7gIR/gD5+3P/BMXFpExJObuUSK0FXxHGFGaxwO+9jBTSyKDA5wqc4BRnoSdpWIpLE91UKRRo4vgS0vQHlkyJyA==</latexit><latexit
latexit
<
H

sha1_base64="pqpsMzL0PDji+m5tsXqyLGOmol4=">AAACbXichVHLSgMxFD0d3/VVFUFQRCw+ViUjguKq6Malr6rYFpkZ0xo6L2bSgpb+gGvBhSgoiIif4cYfcOEniAsXCm5ceGc6IFrUG5KcnNxzc5Loril8ydhjTGlqbmlta++Id3Z19/Qm+vo3fafsGTxjOKbjbeuaz01h84wU0uTbrsc1Szf5ll5aCva3KtzzhWNvyAOX5y2taIuCMDRJ1E7O0uS+Xqi6td1EkqVYGGONQI1AElGsOIlr5LAHBwbKsMBhQxI2ocGnloUKBpe4PKrEeYREuM9RQ5y0ZcrilKERW6KxSKtsxNq0Dmr6odqgU0zqHinHMMEe2A17Zffslj2xj19rVcMagZcDmvW6lru7vUdD6+//qiyaJfa/VH96lihgPvQqyLsbMsEtjLq+cnjyur6wNlGdZJfsmfxfsEd2RzewK2/G1SpfO0WcPkD9+dyNYHMmpRJenU2mF6OvaMcwxjFN7z2HNJaxggyda+MYZziPvSiDyogyWk9VYpFmAN9CmfoE1aiOEA==</latexit>
sha1_base64="pqpsMzL0PDji+m5tsXqyLGOmol4=">AAACbXichVHLSgMxFD0d3/VVFUFQRCw+ViUjguKq6Malr6rYFpkZ0xo6L2bSgpb+gGvBhSgoiIif4cYfcOEniAsXCm5ceGc6IFrUG5KcnNxzc5Loril8ydhjTGlqbmlta++Id3Z19/Qm+vo3fafsGTxjOKbjbeuaz01h84wU0uTbrsc1Szf5ll5aCva3KtzzhWNvyAOX5y2taIuCMDRJ1E7O0uS+Xqi6td1EkqVYGGONQI1AElGsOIlr5LAHBwbKsMBhQxI2ocGnloUKBpe4PKrEeYREuM9RQ5y0ZcrilKERW6KxSKtsxNq0Dmr6odqgU0zqHinHMMEe2A17Zffslj2xj19rVcMagZcDmvW6lru7vUdD6+//qiyaJfa/VH96lihgPvQqyLsbMsEtjLq+cnjyur6wNlGdZJfsmfxfsEd2RzewK2/G1SpfO0WcPkD9+dyNYHMmpRJenU2mF6OvaMcwxjFN7z2HNJaxggyda+MYZziPvSiDyogyWk9VYpFmAN9CmfoE1aiOEA==</latexit><latexit
sha1_base64="pqpsMzL0PDji+m5tsXqyLGOmol4=">AAACbXichVHLSgMxFD0d3/VVFUFQRCw+ViUjguKq6Malr6rYFpkZ0xo6L2bSgpb+gGvBhSgoiIif4cYfcOEniAsXCm5ceGc6IFrUG5KcnNxzc5Loril8ydhjTGlqbmlta++Id3Z19/Qm+vo3fafsGTxjOKbjbeuaz01h84wU0uTbrsc1Szf5ll5aCva3KtzzhWNvyAOX5y2taIuCMDRJ1E7O0uS+Xqi6td1EkqVYGGONQI1AElGsOIlr5LAHBwbKsMBhQxI2ocGnloUKBpe4PKrEeYREuM9RQ5y0ZcrilKERW6KxSKtsxNq0Dmr6odqgU0zqHinHMMEe2A17Zffslj2xj19rVcMagZcDmvW6lru7vUdD6+//qiyaJfa/VH96lihgPvQqyLsbMsEtjLq+cnjyur6wNlGdZJfsmfxfsEd2RzewK2/G1SpfO0WcPkD9+dyNYHMmpRJenU2mF6OvaMcwxjFN7z2HNJaxggyda+MYZziPvSiDyogyWk9VYpFmAN9CmfoE1aiOEA==</latexit><latexit
sha1_base64="pqpsMzL0PDji+m5tsXqyLGOmol4=">AAACbXichVHLSgMxFD0d3/VVFUFQRCw+ViUjguKq6Malr6rYFpkZ0xo6L2bSgpb+gGvBhSgoiIif4cYfcOEniAsXCm5ceGc6IFrUG5KcnNxzc5Loril8ydhjTGlqbmlta++Id3Z19/Qm+vo3fafsGTxjOKbjbeuaz01h84wU0uTbrsc1Szf5ll5aCva3KtzzhWNvyAOX5y2taIuCMDRJ1E7O0uS+Xqi6td1EkqVYGGONQI1AElGsOIlr5LAHBwbKsMBhQxI2ocGnloUKBpe4PKrEeYREuM9RQ5y0ZcrilKERW6KxSKtsxNq0Dmr6odqgU0zqHinHMMEe2A17Zffslj2xj19rVcMagZcDmvW6lru7vUdD6+//qiyaJfa/VH96lihgPvQqyLsbMsEtjLq+cnjyur6wNlGdZJfsmfxfsEd2RzewK2/G1SpfO0WcPkD9+dyNYHMmpRJenU2mF6OvaMcwxjFN7z2HNJaxggyda+MYZziPvSiDyogyWk9VYpFmAN9CmfoE1aiOEA==</latexit><latexit
<latexit
p
Lce (p, y)

Backward path for unlabeled target examples

ms
domain Ds = {(xsi , yi s )}i=1

ature parameter T . C takes kF


Backward path for labeled source and target examples

F (x)
Ds , Dt , and Du and evaluate on Du .
Target
= Entropy

Unlabeled
Labeled Example

sha1_base64="SRTudCVhkHiiWVupo5SXcuwrI+4=">AAACinichVG7SgNBFD2u7/hI1EawWQwRBZGJCL4aUQsLC19RwUjYXSfJ4L7YnQTikh/QD7CwUhARK1stbfwBCz9BLBVsLLzZLIiKeoeZOXPmnjtnZnTXFL5k7LFBaWxqbmlta491dHZ1xxM9vZu+U/IMnjEc0/G2dc3nprB5Rgpp8m3X45qlm3xL31+o7W+VuecLx96QFZfvWlrBFnlhaJKoXCKlLucCg1fVrMnzcljNWpos6vnArY6qFTXriUJRjqi5RJKNsTDUnyAdgSSiWHESF8hiDw4MlGCBw4YkbEKDT20HaTC4xO0iIM4jJMJ9jipipC1RFqcMjdh9Ggu02olYm9a1mn6oNugUk7pHShUp9sAu2Qu7Z1fsib3/WisIa9S8VGjW61ru5uKH/etv/6osmiWKn6o/PUvkMRV6FeTdDZnaLYy6vnxw/LI+s5YKhtgZeyb/p+yR3dEN7PKrcb7K104Qow9If3/un2BzfCxNeHUiOTcffUUbBjCIYXrvScxhCSvI0LlHuMYNbpVOZVyZVmbrqUpDpOnDl1AWPwBKgZby</latexit>
sha1_base64="SRTudCVhkHiiWVupo5SXcuwrI+4=">AAACinichVG7SgNBFD2u7/hI1EawWQwRBZGJCL4aUQsLC19RwUjYXSfJ4L7YnQTikh/QD7CwUhARK1stbfwBCz9BLBVsLLzZLIiKeoeZOXPmnjtnZnTXFL5k7LFBaWxqbmlta491dHZ1xxM9vZu+U/IMnjEc0/G2dc3nprB5Rgpp8m3X45qlm3xL31+o7W+VuecLx96QFZfvWlrBFnlhaJKoXCKlLucCg1fVrMnzcljNWpos6vnArY6qFTXriUJRjqi5RJKNsTDUnyAdgSSiWHESF8hiDw4MlGCBw4YkbEKDT20HaTC4xO0iIM4jJMJ9jipipC1RFqcMjdh9Ggu02olYm9a1mn6oNugUk7pHShUp9sAu2Qu7Z1fsib3/WisIa9S8VGjW61ru5uKH/etv/6osmiWKn6o/PUvkMRV6FeTdDZnaLYy6vnxw/LI+s5YKhtgZeyb/p+yR3dEN7PKrcb7K104Qow9If3/un2BzfCxNeHUiOTcffUUbBjCIYXrvScxhCSvI0LlHuMYNbpVOZVyZVmbrqUpDpOnDl1AWPwBKgZby</latexit><latexit
sha1_base64="SRTudCVhkHiiWVupo5SXcuwrI+4=">AAACinichVG7SgNBFD2u7/hI1EawWQwRBZGJCL4aUQsLC19RwUjYXSfJ4L7YnQTikh/QD7CwUhARK1stbfwBCz9BLBVsLLzZLIiKeoeZOXPmnjtnZnTXFL5k7LFBaWxqbmlta491dHZ1xxM9vZu+U/IMnjEc0/G2dc3nprB5Rgpp8m3X45qlm3xL31+o7W+VuecLx96QFZfvWlrBFnlhaJKoXCKlLucCg1fVrMnzcljNWpos6vnArY6qFTXriUJRjqi5RJKNsTDUnyAdgSSiWHESF8hiDw4MlGCBw4YkbEKDT20HaTC4xO0iIM4jJMJ9jipipC1RFqcMjdh9Ggu02olYm9a1mn6oNugUk7pHShUp9sAu2Qu7Z1fsib3/WisIa9S8VGjW61ru5uKH/etv/6osmiWKn6o/PUvkMRV6FeTdDZnaLYy6vnxw/LI+s5YKhtgZeyb/p+yR3dEN7PKrcb7K104Qow9If3/un2BzfCxNeHUiOTcffUUbBjCIYXrvScxhCSvI0LlHuMYNbpVOZVyZVmbrqUpDpOnDl1AWPwBKgZby</latexit><latexit
sha1_base64="SRTudCVhkHiiWVupo5SXcuwrI+4=">AAACinichVG7SgNBFD2u7/hI1EawWQwRBZGJCL4aUQsLC19RwUjYXSfJ4L7YnQTikh/QD7CwUhARK1stbfwBCz9BLBVsLLzZLIiKeoeZOXPmnjtnZnTXFL5k7LFBaWxqbmlta491dHZ1xxM9vZu+U/IMnjEc0/G2dc3nprB5Rgpp8m3X45qlm3xL31+o7W+VuecLx96QFZfvWlrBFnlhaJKoXCKlLucCg1fVrMnzcljNWpos6vnArY6qFTXriUJRjqi5RJKNsTDUnyAdgSSiWHESF8hiDw4MlGCBw4YkbEKDT20HaTC4xO0iIM4jJMJ9jipipC1RFqcMjdh9Ggu02olYm9a1mn6oNugUk7pHShUp9sAu2Qu7Z1fsib3/WisIa9S8VGjW61ru5uKH/etv/6osmiWKn6o/PUvkMRV6FeTdDZnaLYy6vnxw/LI+s5YKhtgZeyb/p+yR3dEN7PKrcb7K104Qow9If3/un2BzfCxNeHUiOTcffUUbBjCIYXrvScxhCSvI0LlHuMYNbpVOZVyZVmbrqUpDpOnDl1AWPwBKgZby</latexit><latexit
<latexit
sha1_base64="pSwD012ky3kYeCYh62VUh3X0tsU=">AAACcXichVG7SgNBFD1Z3/EVtVFsgkGJiOGujWIl2lj6igqJht11kizui91JQIM/4A8oWCmIiJ9h4w9Y+AliGcHGwrubBVFR7zAzZ87cc+fMjO5ZZiCJnhJKW3tHZ1d3T7K3r39gMDU0vB24Nd8QecO1XH9X1wJhmY7IS1NaYtfzhWbrltjRD1fC/Z268APTdbbkkSf2bK3imGXT0CRT+7Or2aKtyapebngn06VUhnIURfonUGOQQRxrbuoGRRzAhYEabAg4kIwtaAi4FaCC4DG3hwZzPiMz2hc4QZK1Nc4SnKExe8hjhVeFmHV4HdYMIrXBp1jcfVamMUmPdEtNeqA7eqb3X2s1ohqhlyOe9ZZWeKXB09HNt39VNs8S1U/Vn54lyliIvJrs3YuY8BZGS18/PmtuLm5MNqboil7Y/yU90T3fwKm/GtfrYuMCSf4A9ftz/wTbczmVcuo6ZZaW46/oxjgmkOX3nscSVrGGPJ/r4xyXuEo0lTElrUy0UpVErBnBl1BmPgAB/o76</latexit>
sha1_base64="pSwD012ky3kYeCYh62VUh3X0tsU=">AAACcXichVG7SgNBFD1Z3/EVtVFsgkGJiOGujWIl2lj6igqJht11kizui91JQIM/4A8oWCmIiJ9h4w9Y+AliGcHGwrubBVFR7zAzZ87cc+fMjO5ZZiCJnhJKW3tHZ1d3T7K3r39gMDU0vB24Nd8QecO1XH9X1wJhmY7IS1NaYtfzhWbrltjRD1fC/Z268APTdbbkkSf2bK3imGXT0CRT+7Or2aKtyapebngn06VUhnIURfonUGOQQRxrbuoGRRzAhYEabAg4kIwtaAi4FaCC4DG3hwZzPiMz2hc4QZK1Nc4SnKExe8hjhVeFmHV4HdYMIrXBp1jcfVamMUmPdEtNeqA7eqb3X2s1ohqhlyOe9ZZWeKXB09HNt39VNs8S1U/Vn54lyliIvJrs3YuY8BZGS18/PmtuLm5MNqboil7Y/yU90T3fwKm/GtfrYuMCSf4A9ftz/wTbczmVcuo6ZZaW46/oxjgmkOX3nscSVrGGPJ/r4xyXuEo0lTElrUy0UpVErBnBl1BmPgAB/o76</latexit><latexit
sha1_base64="pSwD012ky3kYeCYh62VUh3X0tsU=">AAACcXichVG7SgNBFD1Z3/EVtVFsgkGJiOGujWIl2lj6igqJht11kizui91JQIM/4A8oWCmIiJ9h4w9Y+AliGcHGwrubBVFR7zAzZ87cc+fMjO5ZZiCJnhJKW3tHZ1d3T7K3r39gMDU0vB24Nd8QecO1XH9X1wJhmY7IS1NaYtfzhWbrltjRD1fC/Z268APTdbbkkSf2bK3imGXT0CRT+7Or2aKtyapebngn06VUhnIURfonUGOQQRxrbuoGRRzAhYEabAg4kIwtaAi4FaCC4DG3hwZzPiMz2hc4QZK1Nc4SnKExe8hjhVeFmHV4HdYMIrXBp1jcfVamMUmPdEtNeqA7eqb3X2s1ohqhlyOe9ZZWeKXB09HNt39VNs8S1U/Vn54lyliIvJrs3YuY8BZGS18/PmtuLm5MNqboil7Y/yU90T3fwKm/GtfrYuMCSf4A9ftz/wTbczmVcuo6ZZaW46/oxjgmkOX3nscSVrGGPJ/r4xyXuEo0lTElrUy0UpVErBnBl1BmPgAB/o76</latexit><latexit
sha1_base64="pSwD012ky3kYeCYh62VUh3X0tsU=">AAACcXichVG7SgNBFD1Z3/EVtVFsgkGJiOGujWIl2lj6igqJht11kizui91JQIM/4A8oWCmIiJ9h4w9Y+AliGcHGwrubBVFR7zAzZ87cc+fMjO5ZZiCJnhJKW3tHZ1d3T7K3r39gMDU0vB24Nd8QecO1XH9X1wJhmY7IS1NaYtfzhWbrltjRD1fC/Z268APTdbbkkSf2bK3imGXT0CRT+7Or2aKtyapebngn06VUhnIURfonUGOQQRxrbuoGRRzAhYEabAg4kIwtaAi4FaCC4DG3hwZzPiMz2hc4QZK1Nc4SnKExe8hjhVeFmHV4HdYMIrXBp1jcfVamMUmPdEtNeqA7eqb3X2s1ohqhlyOe9ZZWeKXB09HNt39VNs8S1U/Vn54lyliIvJrs3YuY8BZGS18/PmtuLm5MNqboil7Y/yU90T3fwKm/GtfrYuMCSf4A9ftz/wTbczmVcuo6ZZaW46/oxjgmkOX3nscSVrGGPJ/r4xyXuEo0lTElrUy0UpVErBnBl1BmPgAB/o76</latexit><latexit
<latexit
= Cross Entropy Loss

3.1. Similarity based Network Architecture


3. Minimax Entropy Domain Adaptation
−H(p)
Lce (p, y)

the normalized feature vector is used as an input to C

each class. An architecture of our method is shown in Fig. 3.


weight vectors can be regarded as estimated prototypes for
ized features of the corresponding class. In this respect, the
of a weight vector has to be representative to the normal-
tion. In order to classify examples correctly, the direction
kF (x)k ), where σ indicates a softmax func-
layer to obtain the probabilistic output p ∈ Rn . We denote
kF (x)k . The output of C is fed into a softmax-
(x)k as an input and out-
where K represents the number of classes and a temper-
which consists of weight vectors W = [w1 , w2 , . . . , wK ]
form ℓ2 normalization on the output of the network. Then,
we employ a deep convolutional neural network and per-
tractor F and a classifier C. For the feature extractor F ,
Inspired by [5], our base model consists of a feature ex-
. Our goal is to train the model on
, as well as unlabeled target images
are also given a limited number of labeled target images
. In the target domain, we
source images and the corresponding labels in the source
In semi-supervised domain adaptation, we are given
2) whereas F is trained to minimize it (Step 2 in Fig. 2). To achieve the adversarial learning, the sign of gradients for entropy
C which has weight vectors (W) and temperature T . W is trained to maximize entropy on unlabeled target (Step 1 in Fig.
a few labeled target examples, and unlabeled target examples. Our model consists of the feature extractor F and the classifier
Figure 3: An overview of the model architecture and MME. The inputs to the network are labeled source examples (y=label),
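As a concrete illustration, the classifier's forward computation can be sketched as follows (a minimal NumPy sketch with hypothetical shapes, not the authors' released code):

```python
import numpy as np

def classifier_probs(W, f, T=0.05):
    """p(x) = softmax((1/T) * W^T f / ||f||): similarity of the normalized
    feature to each prototype w_i, sharpened by temperature T, then softmax."""
    f_norm = f / np.linalg.norm(f)   # l2-normalize the feature vector
    logits = (W.T @ f_norm) / T      # shape (K,): scaled similarities
    logits = logits - logits.max()   # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

rng = np.random.default_rng(0)
d, K = 512, 126                      # illustrative feature dim / class count
W = rng.normal(size=(d, K))          # weight vectors act as estimated prototypes
p = classifier_probs(W, rng.normal(size=d))
```

The small temperature (0.05 in the paper) sharpens the similarity scores, so only features pointing close to a prototype direction receive confident predictions.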
3.2. Training Objectives

We estimate domain-invariant prototypes by performing entropy maximization with respect to the estimated prototypes. Then, we extract discriminative features by performing entropy minimization with respect to the feature extractor. Entropy maximization prevents overfitting that can reduce the expressive power of the representations. Therefore, entropy maximization can be considered as the step of selecting prototypes that will not cause overfitting to the source examples. In our method, the prototypes are parameterized by the weight vectors of the last linear layer. First, we train $F$ and $C$ to classify labeled source and target examples correctly and utilize an entropy minimization objective to extract discriminative features for the target domain. We use a standard cross-entropy loss to train $F$ and $C$ for classification:

$$\mathcal{L} = \mathbb{E}_{(x,y)\in\mathcal{D}_s,\mathcal{D}_t}\,\mathcal{L}_{ce}\big(p(x), y\big). \qquad (1)$$

With this classification loss, we ensure that the feature extractor generates discriminative features with respect to the source and a few labeled target examples. However, the model is trained on the source domain and a small fraction of target examples for classification. This does not learn discriminative features for the entire target domain. Therefore, we propose minimax entropy training using unlabeled target examples.

A conceptual overview of our proposed adversarial learning is illustrated in Fig. 2. We assume that there exists a single domain-invariant prototype for each class, which can be a representative point for both domains. The estimated prototype will be near the source distribution because source labels are dominant. Then, we propose to estimate the position of the prototype by moving each $w_i$ toward target features using unlabeled data in the target domain. To achieve this, we increase the entropy measured by the similarity between $W$ and unlabeled target features. Entropy is calculated as follows,

$$H = -\mathbb{E}_{x\in\mathcal{D}_u} \sum_{i=1}^{K} p(y=i\,|\,x)\,\log p(y=i\,|\,x) \qquad (2)$$

where $K$ is the number of classes and $p(y=i\,|\,x)$ represents the probability of prediction to class $i$, namely the $i$-th dimension of $p(x) = \sigma\!\big(\frac{1}{T}\frac{W^{\top}F(x)}{\|F(x)\|}\big)$. To have higher entropy, that is, to have uniform output probability, each $w_i$ should be similar to all target features. Thus, increasing the entropy encourages the model to estimate the domain-invariant prototypes as shown in Fig. 2.

To obtain discriminative features on unlabeled target examples, we need to cluster unlabeled target features around the estimated prototypes. We propose to decrease the entropy on unlabeled target examples by the feature extractor $F$. The features should be assigned to one of the prototypes to decrease the entropy, resulting in the desired discriminative features. Repeating this prototype estimation (entropy maximization) and entropy minimization process yields discriminative features.

To summarize, our method can be formulated as adversarial learning between $C$ and $F$. The task classifier $C$ is trained to maximize the entropy, whereas the feature extractor $F$ is trained to minimize it. Both $C$ and $F$ are also trained to classify labeled examples correctly. The overall adversarial learning objective functions are:

$$\hat{\theta}_F = \underset{\theta_F}{\mathrm{argmin}}\; \mathcal{L} + \lambda H, \qquad \hat{\theta}_C = \underset{\theta_C}{\mathrm{argmin}}\; \mathcal{L} - \lambda H \qquad (3)$$

where $\lambda$ is a hyper-parameter to control the trade-off between minimax entropy training and classification on labeled examples. Our method can be formulated as iterative minimax training. To simplify the training process, we use a gradient reversal layer [11] to flip the gradient between $C$ and $F$ with respect to $H$. With this layer, we can perform the minimax training with one forward and one backward pass, which is illustrated in Fig. 3.

3.3. Theoretical Insights

As shown in [2], we can measure domain-divergence by using a domain classifier. Let $h \in \mathcal{H}$ be a hypothesis, and $\epsilon_s(h)$ and $\epsilon_t(h)$ be the expected risk on source and target respectively; then $\epsilon_t(h) \leq \epsilon_s(h) + d_{\mathcal{H}}(p, q) + C_0$, where $C_0$ is a constant for the complexity of the hypothesis space and the risk of an ideal hypothesis for both domains, and $d_{\mathcal{H}}(p, q)$ is the $\mathcal{H}$-divergence between $p$ and $q$:

$$d_{\mathcal{H}}(p, q) \triangleq 2\sup_{h\in\mathcal{H}}\Big|\Pr_{x^s\sim p}\big[h(f^s) = 1\big] - \Pr_{x^t\sim q}\big[h(f^t) = 1\big]\Big| \qquad (4)$$

where $f^s$ and $f^t$ denote the features in the source and target domain respectively. In our case the features are outputs of the feature extractor. The $\mathcal{H}$-divergence relies on the capacity of the hypothesis space $\mathcal{H}$ to distinguish distributions $p$ and $q$. This theory states that the divergence between domains can be measured by training a domain classifier, and features with low divergence are the key to having a well-performing task-specific classifier. Inspired by this, many methods [11, 3, 37, 36] train a domain classifier to discriminate different domains while also optimizing the feature extractor to minimize the divergence.

Our proposed method is also connected to Eq. 4. Although we do not have a domain classifier or a domain classification loss, our method can be considered as minimizing domain-divergence through minimax training on unlabeled target examples. We choose $h$ to be a classifier that decides a binary domain label of a feature by the value of the entropy, namely,

$$h(f) = \begin{cases} 1, & \text{if } H(C(f)) \geq \gamma, \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

where $C$ denotes our classifier, $H$ denotes entropy, and $\gamma$ is a threshold to determine a domain label.
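The entropy-threshold "domain classifier" of Eq. 5 can be sketched numerically as follows (a hypothetical NumPy illustration; the threshold γ is a free parameter):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)   # guard against log(0)
    return float(-(p * np.log(p)).sum())

def h(p, gamma):
    """Eq. 5: label a feature 'target' (1) when its prediction entropy
    is at least gamma, 'source' (0) otherwise."""
    return 1 if entropy(p) >= gamma else 0

K = 10
uniform = np.full(K, 1.0 / K)            # maximally uncertain prediction
peaked = np.zeros(K); peaked[0] = 1.0    # confident, near-zero-entropy prediction
gamma = 0.5 * np.log(K)                  # illustrative threshold
```

Under the assumption in the text that source predictions have low entropy, this $h$ separates the two domains, which is what lets the divergence bound be driven by high-entropy target features.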
Here, we assume $C$ outputs the probability of the class prediction for simplicity. Eq. 4 can be rewritten as follows,

$$d_{\mathcal{H}}(p, q) \triangleq 2\sup_{h\in\mathcal{H}}\Big|\Pr_{f^s\sim p}\big[h(f^s) = 1\big] - \Pr_{f^t\sim q}\big[h(f^t) = 1\big]\Big|$$
$$= 2\sup_{C\in\mathcal{C}}\Big|\Pr_{f^s\sim p}\big[H(C(f^s)) \geq \gamma\big] - \Pr_{f^t\sim q}\big[H(C(f^t)) \geq \gamma\big]\Big|$$
$$\leq 2\sup_{C\in\mathcal{C}}\Pr_{f^t\sim q}\big[H(C(f^t)) \geq \gamma\big].$$

In the last inequality, we assume that $\Pr_{f^s\sim p}\big[H(C(f^s)) \geq \gamma\big] \leq \Pr_{f^t\sim q}\big[H(C(f^t)) \geq \gamma\big]$. This assumption should be realistic because we have access to many labeled source examples and train the entire network to minimize the classification loss. Minimizing the cross-entropy loss (Eq. 1) on source examples ensures that the entropy on a source example is very small. Intuitively, this inequality states that the divergence can be bounded by the ratio of target examples having entropy greater than $\gamma$. Therefore, we can obtain the upper bound by finding the $C$ that achieves maximum entropy for all target features. Our objective is finding features that achieve the lowest divergence. We suppose there exists a $C$ that achieves the maximum in the inequality above; then the objective can be rewritten as,

$$\min_{f^t}\;\max_{C\in\mathcal{C}}\;\Pr_{f^t\sim q}\big[H(C(f^t)) \geq \gamma\big] \qquad (6)$$

Finding the minimum with respect to $f^t$ is equivalent to finding a feature extractor $F$ that achieves that minimum. Thus, we derive the minimax objective of our proposed learning method in Eq. 3. To sum up, our maximum entropy process can be regarded as measuring the divergence between domains, whereas our entropy minimization process can be regarded as minimizing the divergence. In our experimental section, we observe that our method actually reduces domain-divergence (Fig. 6c). In addition, target features produced by our method look aligned with source features and are just as discriminative. These come from the effect of the domain-divergence minimization.

4. Experiments

4.1. Setup

We randomly selected one or three labeled examples per class as the labeled training target examples (one-shot and three-shot settings, respectively). We selected three other labeled examples as the validation set for the target domain. The validation examples are used for early stopping, choosing the hyper-parameter $\lambda$, and training scheduling. The other target examples are used for training without labels; their labels are only used to evaluate classification accuracy (%). All examples of the source are used for training.

Datasets. Most of our experiments are done on a subset of DomainNet [24], a recent benchmark dataset for large-scale domain adaptation that has many classes (345) and six domains. As the labels of some domains and classes are very noisy, we pick 4 domains (Real, Clipart, Painting, Sketch) and 126 classes. We focus on the adaptation scenarios where the target domain is not real images, and construct 7 scenarios from the four domains. See our supplemental material for more details. Office-Home [38] contains 4 domains (Real, Clipart, Art, Product) with 65 classes. This dataset is one of the benchmark datasets for unsupervised domain adaptation. We evaluated our method on 12 scenarios in total. Office [27] contains 3 domains (Amazon, Webcam, DSLR) with 31 classes. Webcam and DSLR are small domains and some classes do not have many examples, while Amazon has many examples. To evaluate on a domain with enough examples, we have 2 scenarios where we set Amazon as the target domain and DSLR and Webcam as the source domains.

Implementation Details. All experiments are implemented in PyTorch [23]. We employ AlexNet [16] and VGG16 [34] pre-trained on ImageNet. To investigate the effect of deeper architectures, we use ResNet34 [14] in experiments on DomainNet. We remove the last linear layer of these networks to build $F$, and add a $K$-way linear classification layer $C$ with a randomly initialized weight matrix $W$. The value of the temperature $T$ is set to 0.05 following the results of [25] in all settings. Every iteration, we prepare two mini-batches, one consisting of labeled examples and the other of unlabeled target examples. Half of the labeled examples come from the source and half from the labeled target. Using the two mini-batches, we calculate the objective in Eq. 3. To implement the adversarial learning in Eq. 3, we use a gradient reversal layer [11, 37] to flip the gradient with respect to the entropy loss. The sign of the gradient is flipped between $C$ and $F$ during backpropagation. We adopt SGD with momentum of 0.9. In all experiments, we set the trade-off parameter $\lambda$ in Eq. 3 to 0.1. This is decided by the validation performance on Real to Clipart experiments. We show the performance sensitivity to this parameter in our supplemental material, as well as more details including learning rate scheduling.

Baselines. S+T [5, 25] is a model trained with the labeled source and labeled target examples without using unlabeled target examples. DANN [11] employs a domain classifier to match feature distributions. This is one of the most popular methods in UDA. For a fair comparison, we modify this method so that it is trained with the labeled source, labeled target, and unlabeled target examples. ADR [28] utilizes a task-specific decision boundary to align features and ensure that they are discriminative on the target. CDAN [20] is one of the state-of-the-art methods on UDA and performs domain alignment on features that are conditioned on the output of classifiers. In addition, it utilizes entropy minimization on target examples. CDAN integrates domain-classifier based alignment and entropy minimization. Comparison with these UDA methods (DANN, ADR, CDAN) reveals how much gain will be obtained compared to the existing domain alignment-based methods.
AlexNet
Method  R to C     R to P     P to C     C to S     S to P     R to S     P to R     MEAN
S+T     43.3/47.1  42.4/45.0  40.1/44.9  33.6/36.4  35.7/38.4  29.1/33.3  55.8/58.7  40.0/43.4
DANN    43.3/46.1  41.6/43.8  39.1/41.0  35.9/36.5  36.9/38.9  32.5/33.4  53.6/57.3  40.4/42.4
ADR     43.1/46.2  41.4/44.4  39.3/43.6  32.8/36.4  33.1/38.9  29.1/32.4  55.9/57.3  39.2/42.7
CDAN    46.3/46.8  45.7/45.0  38.3/42.3  27.5/29.5  30.2/33.7  28.8/31.3  56.7/58.7  39.1/41.0
ENT     37.0/45.5  35.6/42.6  26.8/40.4  18.9/31.1  15.1/29.6  18.0/29.6  52.2/60.0  29.1/39.8
MME     48.9/55.6  48.0/49.0  46.7/51.7  36.3/39.4  39.4/43.0  33.3/37.9  56.8/60.7  44.2/48.2

VGG
Method  R to C     R to P     P to C     C to S     S to P     R to S     P to R     MEAN
S+T     49.0/52.3  55.4/56.7  47.7/51.0  43.9/48.5  50.8/55.1  37.9/45.0  69.0/71.7  50.5/54.3
DANN    43.9/56.8  42.0/57.5  37.3/49.2  46.7/48.2  51.9/55.6  30.2/45.6  65.8/70.1  45.4/54.7
ADR     48.3/50.2  54.6/56.1  47.3/51.5  44.0/49.0  50.7/53.5  38.6/44.7  67.6/70.9  50.2/53.7
CDAN    57.8/58.1  57.8/59.1  51.0/57.4  42.5/47.2  51.2/54.5  42.6/49.3  71.7/74.6  53.5/57.2
ENT     39.6/50.3  43.9/54.6  26.4/47.4  27.0/41.9  29.1/51.0  19.3/39.7  68.2/72.5  36.2/51.1
MME     60.6/64.1  63.3/63.5  57.0/60.7  50.9/55.4  60.5/60.9  50.2/54.8  72.2/75.3  59.2/62.1

ResNet
Method  R to C     R to P     P to C     C to S     S to P     R to S     P to R     MEAN
S+T     55.6/60.0  60.6/62.2  56.8/59.4  50.8/55.0  56.0/59.5  46.3/50.1  71.8/73.9  56.9/60.0
DANN    58.2/59.8  61.4/62.8  56.3/59.6  52.8/55.4  57.4/59.9  52.2/54.9  70.3/72.2  58.4/60.7
ADR     57.1/60.7  61.3/61.9  57.0/60.7  51.0/54.4  56.0/59.9  49.0/51.1  72.0/74.2  57.6/60.4
CDAN    65.0/69.0  64.9/67.3  63.7/68.4  53.1/57.8  63.4/65.3  54.5/59.0  73.2/78.5  62.5/66.5
ENT     65.2/71.0  65.9/69.2  65.4/71.1  54.6/60.0  59.7/62.1  52.1/61.1  75.0/78.6  62.6/67.6
MME     70.0/72.2  67.7/69.7  69.0/71.7  56.3/61.8  64.8/66.8  61.0/61.9  76.1/78.5  66.4/68.9

Table 1: Accuracy on the DomainNet dataset (%) for one-shot and three-shot settings on 4 domains. R: Real, C: Clipart, P: Painting, S: Sketch. Each cell shows 1-shot/3-shot accuracy. Our MME method outperformed the other baselines for all adaptation scenarios and for all three networks, except for only one case where it performs similarly to ENT.
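The gradient reversal trick used to implement the minimax objective of Eq. 3 (Secs. 3.2 and 4.1) can be sketched in PyTorch as follows (an illustrative re-implementation, not the authors' released code):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambda in the
    backward pass, so the layers before it maximize what the layers after
    it minimize."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=0.1):
    return GradReverse.apply(x, lamb)

# In MME, the entropy H of unlabeled target predictions is computed on
# activations passed through this layer, so a single backward pass updates
# C toward maximizing H while F minimizes it.
x = torch.randn(4, 8, requires_grad=True)
out = grad_reverse(x, lamb=0.1)
out.sum().backward()
# x.grad now equals -0.1 times the upstream gradient of ones
```

This is why the paper reports a single forward and backward pass per iteration instead of alternating optimization steps.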

ENT [13] is a model trained with labeled source and target and unlabeled target using standard entropy minimization. Entropy is calculated on unlabeled target examples and the entire network is trained to minimize it. The difference from MME is that ENT does not have a maximization process; thus comparison with this baseline clarifies its importance.

Note that all methods except for CDAN are trained with exactly the same architecture used in our method. In the case of CDAN, we could not find any advantage of using our architecture. The details of the baseline implementations are in our supplemental material.

AlexNet
Method  Office-Home (1-shot/3-shot)  Office (1-shot/3-shot)
S+T     44.1/50.0                    50.2/61.8
DANN    45.1/50.3                    55.8/64.8
ADR     44.5/49.5                    50.6/61.3
CDAN    41.2/46.2                    49.4/60.8
ENT     38.8/50.9                    48.1/65.1
MME     49.2/55.2                    56.5/67.6

VGG
Method  Office-Home (1-shot/3-shot)  Office (1-shot/3-shot)
S+T     57.4/62.9                    68.7/73.3
DANN    60.0/63.9                    69.8/75.0
ADR     57.4/63.0                    69.4/73.7
CDAN    55.8/61.8                    65.9/72.9
ENT     51.6/64.8                    70.6/75.3
MME     62.7/67.6                    73.4/77.0

Table 2: Results on the Office-Home and Office datasets (%). The value is the accuracy averaged over all adaptation scenarios. Performance on each setting is summarized in the supplementary material.

4.2. Results

Overview. The main results on the DomainNet dataset are shown in Table 1. First, our method outperformed the other baselines for all adaptation scenarios and all three networks except for one case. On average, our method outperformed S+T by 9.5% and 8.9% in the ResNet one-shot and three-shot settings respectively. The results on Office-Home and Office are summarized in Table 2, where MME also outperforms all baselines. Due to the limited space, we show the results averaged over all adaptation scenarios.

Comparison with UDA Methods. Generally, the baseline UDA methods need strong base networks such as VGG or ResNet to perform better than S+T. Interestingly, these methods cannot improve the performance in some cases. The superiority of MME over existing UDA methods is supported by Tables 1 and 2. Since CDAN uses entropy minimization and ENT significantly hurts the performance for AlexNet and VGG, CDAN does not consistently improve the performance for AlexNet and VGG.

Comparison with Entropy Minimization. ENT does not improve performance in some cases because it does not account for the domain gap. Comparing results on one-shot and three-shot, entropy minimization gains performance with the help of labeled examples. As we have more labeled target examples, the estimation of prototypes will be more accurate without any adaptation.
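Both ENT and MME optimize the batch form of the entropy in Eq. 2 on unlabeled target predictions; they differ only in whether any part of the network is trained to maximize it. A minimal NumPy sketch with illustrative values:

```python
import numpy as np

def batch_entropy(P):
    """Eq. 2 over a mini-batch: mean Shannon entropy of the predicted
    class distributions P, with shape (batch, K)."""
    P = np.clip(P, 1e-12, 1.0)   # guard against log(0)
    return float(-(P * np.log(P)).sum(axis=1).mean())

K = 4
confident = np.eye(K)                  # one-hot rows: entropy near 0
uncertain = np.full((K, K), 1.0 / K)   # uniform rows: entropy log(K)
```

ENT minimizes this quantity with the whole network; MME additionally lets the classifier weights push it up, which is what moves the prototypes toward the target domain.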
In the case of ResNet, entropy minimization often improves accuracy. There are two potential reasons. First, ResNet pre-trained on ImageNet has a more discriminative representation than the other networks. Therefore, given a few labeled target examples, the model can extract more discriminative features, which contributes to the performance gain of entropy minimization. Second, ResNet has batch-normalization (BN) layers [15]. It is reported that BN has the effect of aligning feature distributions [4, 18]. Hence, entropy minimization was done on aligned feature representations, which improved the performance. When there is a large domain gap, such as C to S, S to P, and R to S in Table 1, BN is not enough to handle the domain gap. Therefore, our proposed method performs much better than entropy minimization in such cases. We show an analysis of BN in our supplemental material, revealing its effectiveness for entropy minimization.

Method  R-C    R-P    P-C    C-S    S-P    R-S    P-R    Avg
Source  41.1   42.6   37.4   30.6   30.0   26.3   52.3   37.2
DANN    44.7   36.1   35.8   33.8   35.9   27.6   49.3   37.6
ADR     40.2   40.1   36.7   29.9   30.6   25.9   51.5   36.4
CDAN    44.2   39.1   37.8   26.2   24.8   24.3   54.6   35.9
ENT     33.8   43.0   23.0   22.9   13.9   12.0   51.2   28.5
MME     47.6   44.7   39.9   34.0   33.0   29.0   53.5   40.2

Table 3: Results on the DomainNet dataset in the unsupervised domain adaptation setting (%).

[Figure 4: two accuracy curves, (a) AlexNet and (b) VGG.] Figure 4: Accuracy vs. the number of labeled target examples. The ENT method needs more labeled examples to obtain similar performance to our method.

Method                   R to C (1-shot/3-shot)  R to S (1-shot/3-shot)
S+T (Standard Linear)    41.4/44.3               26.5/28.7
S+T (Few-shot [5, 25])   43.3/47.1               29.1/33.3
MME (Standard Linear)    44.9/47.7               30.0/32.2
MME (Few-shot [5, 25])   48.9/55.6               33.3/37.9

Table 4: Comparison of classifier architectures on the DomainNet dataset using AlexNet, showing the effectiveness of the architecture proposed in [5, 25].

4.3. Analysis

Varying Number of Labeled Examples. First, we show the results in the unsupervised domain adaptation setting in Table 3. Our method performed better than the other methods on average. In addition, only our method improved performance compared to the source-only model in all settings. Furthermore, we observe the behavior of our method when the number of labeled examples in the target domain varies from 0 to 20 per class, which corresponds to 2520 labeled examples in total. The results are shown in Fig. 4. Our method works much better than S+T given a few labeled examples. On the other hand, ENT needs 5 labeled examples per class to improve performance. As we add more labeled examples, the performance gap between ENT and ours is reduced. This result is quite reasonable, because prototype estimation will become more accurate without any adaptation as we have more labeled target examples.

Effect of Classifier Architecture. We introduce an ablation study on the classifier network architecture proposed in [5, 25] with AlexNet on DomainNet. As shown in Fig. 3, we employ $\ell_2$ normalization and temperature scaling. In this experiment, we compared it with a model having a standard linear layer without $\ell_2$ normalization and temperature. The result is shown in Table 4. By using the network architecture proposed in [5, 25], we can improve the performance of both our method and the baseline S+T model (the model trained only on source examples and a few labeled target examples). Therefore, we can argue that the network architecture is an effective technique to improve performance when we are given a few labeled examples from the target domain.

Feature Visualization. In addition, we plot the learned features with t-SNE [21] in Fig. 5. We employ the scenario Real to Clipart of DomainNet using AlexNet as the pre-trained backbone. Fig. 5 (a-d) visualizes the target features and estimated prototypes. The color of a cross represents its class; black points are the prototypes. With our method, the target features are clustered around their prototypes and do not have a large variance within a class. We visualize features on the source domain (red crosses) and target domain (blue crosses) in Fig. 5 (e-h). As we discussed in the method section, our method aims to minimize domain-divergence. Indeed, target features are well-aligned with source features with our method. Judging from Fig. 5f, entropy minimization (ENT) also tries to extract discriminative features, but it fails to find domain-invariant prototypes.

Quantitative Feature Analysis. We quantitatively investigate the characteristics of the features we obtain using the same adaptation scenario. First, we perform the analysis on the eigenvalues of the covariance matrix of the target features. We follow the analysis done in [9]. Eigenvectors represent the components of the features and eigenvalues represent their contributions. If the features are highly discrimina-
(a) Ours (b) ENT (c) DANN (d) S+T

(e) Ours (f) ENT (g) DANN (h) S+T


Figure 5: Feature visualization with t-SNE. (a-d) We plot the class prototypes (black circles) and features on the target domain
(crosses). The color of a cross represents its class. We observed that features on our method show more discrimative features
than other methods. (e-h) Red: Features of the source domain. Blue: Features of the target domain. Our method’s features
are well-aligned between domains compared to other methods.

(a) Eigenvalues (b) Entropy (c) A-distance


Figure 6: (a) Eigenvalues of the covariance matrix of the features on the target domain. Eigenvalues reduce quickly in our
method, which shows that features are more discriminative than other methods. (b) Our method achieves lower entropy than
baselines except ENT. (c) Our method clearly reduces domain-divergence compared to S+T.

tive, only a few components are needed to summarize them. model for semi-supervised domain adaptation (SSDA). Our
Therefore, in such a case, the first few eigenvalues are ex- model consists of a feature encoding network, followed by a
pected to be large, and the rest to be small. The features are classification layer that computes the features’ similarity to
clearly summarized by fewer components in our method as a set of estimated prototypes (representatives of each class).
shown in Fig. 6a. Second, we show the change of entropy Adaptation is achieved by alternately maximizing the con-
value on the target in Fig. 6b. ENT diminishes the entropy ditional entropy of unlabeled target data with respect to the
quickly, but results in poor performance. This indicates that classifier and minimizing it with respect to the feature en-
the method increases the confidence of predictions incor- coder. We empirically demonstrated the superiority of our
rectly while our method achieves higher accuracy at the method over many baselines, including conventional feature
same time. Finally, in Fig. 6c, we calculated A-distance alignment and few-shot methods, setting a new state of the
by training a SVM as a domain classifier as proposed in [2]. art for SSDA.
Our method greatly reduces the distance compared to S+T.
The claim that our method reduces a domain divergence is
empirically supported with this result. 6. Acknowledgements
5. Conclusion
We proposed a novel Minimax Entropy (MME) ap- This work was supported by Honda, DARPA, BAIR,
proach that adversarially optimizes an adaptive few-shot BDD, and NSF Award No. 1535797.

8057
References

[1] Shuang Ao, Xiang Li, and Charles X Ling. Fast generalized distillation for semi-supervised domain adaptation. In AAAI, 2017.
[2] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In NIPS, 2007.
[3] Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. Domain separation networks. In NIPS, 2016.
[4] Fabio Maria Cariucci, Lorenzo Porzi, Barbara Caputo, Elisa Ricci, and Samuel Rota Bulò. Autodial: Automatic domain alignment layers. In ICCV, 2017.
[5] Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, and Jia-Bin Huang. A closer look at few-shot classification. arXiv, 2018.
[6] Yuhua Chen, Wen Li, Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Domain adaptive Faster R-CNN for object detection in the wild. In CVPR, 2018.
[7] Zihang Dai, Zhilin Yang, Fan Yang, William W Cohen, and Ruslan R Salakhutdinov. Good semi-supervised learning that requires a bad GAN. In NIPS, 2017.
[8] Jeff Donahue, Judy Hoffman, Erik Rodner, Kate Saenko, and Trevor Darrell. Semi-supervised domain adaptation with instance constraints. In CVPR, 2013.
[9] Abhimanyu Dubey, Otkrist Gupta, Ramesh Raskar, and Nikhil Naik. Maximum-entropy fine grained classification. In NIPS, 2018.
[10] Ayse Erkan and Yasemin Altun. Semi-supervised learning via generalized maximum entropy. In AISTATS, 2010.
[11] Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In ICML, 2014.
[12] Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In CVPR, 2018.
[13] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In NIPS, 2005.
[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[15] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv, 2015.
[16] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
[17] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv, 2016.
[18] Yanghao Li, Naiyan Wang, Jianping Shi, Jiaying Liu, and Xiaodi Hou. Revisiting batch normalization for practical domain adaptation. arXiv, 2016.
[19] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I Jordan. Learning transferable features with deep adaptation networks. In ICML, 2015.
[20] Mingsheng Long, Zhangjie Cao, Jianmin Wang, and Michael I Jordan. Conditional adversarial domain adaptation. In NIPS, 2018.
[21] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. JMLR, 9(11):2579–2605, 2008.
[22] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, and Shin Ishii. Distributional smoothing with virtual adversarial training. arXiv, 2015.
[23] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
[24] Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. ICCV, 2019.
[25] Rajeev Ranjan, Carlos D Castillo, and Rama Chellappa. L2-constrained softmax loss for discriminative face verification. arXiv, 2017.
[26] Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. arXiv, 2016.
[27] Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In ECCV, 2010.
[28] Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, and Kate Saenko. Adversarial dropout regularization. In ICLR, 2018.
[29] Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, and Kate Saenko. Strong-weak distribution alignment for adaptive object detection. arXiv, 2018.
[30] Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, and Tatsuya Harada. Maximum classifier discrepancy for unsupervised domain adaptation. In CVPR, 2018.
[31] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In NIPS, 2016.
[32] Swami Sankaranarayanan, Yogesh Balaji, Arpit Jain, Ser Nam Lim, and Rama Chellappa. Learning from synthetic data: Addressing domain shift for semantic segmentation. In CVPR, 2018.
[33] Rui Shu, Hung H Bui, Hirokazu Narui, and Stefano Ermon. A DIRT-T approach to unsupervised domain adaptation. In ICLR, 2018.
[34] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv, 2014.
[35] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS, 2017.
[36] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In CVPR, 2017.
[37] Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv, 2014.
[38] Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. Deep hashing network for unsupervised domain adaptation. In CVPR, 2017.
[39] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NIPS, 2016.
[40] Ting Yao, Yingwei Pan, Chong-Wah Ngo, Houqiang Li, and Tao Mei. Semi-supervised domain adaptation with subspace learning for visual recognition. In CVPR, 2015.
