Unsupervised Learning of Disentangled Representation via Auto-Encoding: A Survey
Figure 1. An illustration of the notation used in this paper. For X = {x_1, x_2, …, x_N}, a set of N observations, disentangled representation learning is expected to identify the distinct generative factors V = {ν_1, ν_2, …, ν_n} that explain these observations and encode them with independent latent variables Z = {z_1, …, z_n} in latent space.
Figure 2. The structure of the variational auto-encoder (VAE). The stochastic encoder q_φ(z∣x), also called the inference model, learns stochastic mappings between an observed X-space (input data) and a latent Z-space (hidden representation). The generative model p_θ(x∣z), a stochastic decoder, reconstructs the data given the hidden representation.
Figure 3. Schematic overview of different choices for augmenting the evidence lower bound (ELBO) of the VAE. To improve disentanglement, most approaches regularize the original VAE objective by (i) up-weighting the KL term of the ELBO with an adjustable hyperparameter β, yielding the β-VAE approach, or (ii) adding terms to the ELBO, such as mutual information, total correlation or covariance penalties, yielding the InfoMax-VAE, Factor-VAE/β-TCVAE and DIP-VAE approaches, respectively.
Abstract
1. Introduction
2. Background
2.1. Auto-Encoders
2.2. Variational Auto-Encoders (VAEs)
2.3. Reconstruction Error
2.4. Mutual Information Theory
3. Methods
3.1. β-Variational Auto-Encoder
3.2. InfoMax-Variational Auto-Encoder
3.3. Factor Variational Auto-Encoder
3.4. β-Total Correlation Variational Auto-Encoder
3.5. DIP-Variational Auto-Encoder
4. Metrics
4.1. Z-diff Score
- Randomly select a generative factor ν_k.
- Create a batch of paired vectors (x_1,l, x_2,l), where the value of the chosen factor ν_k is kept fixed and equal within each pair while the other generative factors are chosen randomly, for a batch of L sample pairs.
- Map each generated pair to a pair of latent variables (z_1,l, z_2,l) using the inference model q_φ(z∣x).
- Compute the absolute linear difference between the latent variables of each pair: z_diff,l = |z_1,l − z_2,l|.
- The mean of all pair differences in a batch, z_diff = (1/L) Σ_l z_diff,l, gives a single instance in the final training set. These steps are repeated for each generative factor in order to create a substantial training set.
- Train a linear classifier on the generated training set to predict which generative factor was fixed.
- The Z-diff score, also known as the β-VAE metric, is the accuracy of this classifier.
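The steps above can be sketched in a few lines of code. This is an illustrative implementation, not the authors' original one: the `sample_pairs` and `encode` callables, the batch sizes, and the choice of logistic regression as the linear classifier are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def z_diff_score(sample_pairs, encode, n_factors, n_batches=500, L=64):
    """Sketch of the Z-diff (beta-VAE) metric.

    sample_pairs(k, L) -> (x1, x2): L observation pairs agreeing on factor k,
        with the remaining factors sampled at random (hypothetical helper).
    encode(x) -> latent codes of shape (L, latent_dim).
    """
    features, labels = [], []
    for _ in range(n_batches):
        k = np.random.randint(n_factors)        # randomly select a factor
        x1, x2 = sample_pairs(k, L)             # pairs sharing factor k's value
        z1, z2 = encode(x1), encode(x2)
        diff = np.abs(z1 - z2).mean(axis=0)     # mean |z1 - z2| over the batch
        features.append(diff)                   # one training instance per batch
        labels.append(k)
    clf = LogisticRegression(max_iter=1000).fit(features, labels)
    return clf.score(features, labels)          # classifier accuracy = Z-diff score
```

With a perfectly disentangled encoder, the dimension tied to the fixed factor has near-zero difference, so the linear classifier easily identifies which factor was held fixed.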
4.2. Z-min Variance Score
- Randomly choose a generative factor ν_k.
- Generate a batch of L vectors, where the value of the selected factor ν_k is held fixed across the batch while the other generative factors are randomly selected.
- Map each generated vector x_l to a latent code z_l using the inference model q_φ(z∣x).
- Normalize each variable within the latent representation using its empirical standard deviation calculated on the dataset.
- Calculate the empirical variance in each dimension of the normalized representations.
- The factor index k and the index of the latent dimension with the lowest variance give one training point for the classifier. These steps are repeated for each generative factor in order to create a substantial training set.
- Train a majority-vote classifier on the generated training set to predict which generative factor was fixed.
- The Z-min variance score is the accuracy of this classifier.
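A minimal sketch of this procedure follows; the `sample_fixed` and `encode` callables and the reference batch used for the normalizing standard deviations are illustrative assumptions, not the published implementation:

```python
import numpy as np

def z_min_variance_score(sample_fixed, encode, n_factors, n_batches=500, L=64):
    """Sketch of the Z-min variance (FactorVAE) metric.

    sample_fixed(k, L) -> batch of L observations with factor k held fixed
        (hypothetical helper); encode(x) -> latent codes (L, latent_dim).
    """
    # Empirical std of each latent dimension over a pooled reference batch
    z_ref = encode(np.concatenate([sample_fixed(k, L) for k in range(n_factors)]))
    global_std = z_ref.std(axis=0) + 1e-8
    latent_dim = z_ref.shape[1]

    votes = np.zeros((n_factors, latent_dim))   # vote table: factor x dimension
    for _ in range(n_batches):
        k = np.random.randint(n_factors)
        z = encode(sample_fixed(k, L)) / global_std   # normalized codes
        d = z.var(axis=0).argmin()              # dimension with lowest variance
        votes[k, d] += 1
    # Majority-vote classifier: each dimension predicts its most frequent factor;
    # its accuracy over the vote table is the Z-min variance score.
    return votes.max(axis=0).sum() / votes.sum()
```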
4.3. Mutual Information Gap (MIG Score)
- Calculate the mutual information between each latent variable and each known generative factor.
- Each generative factor may have high mutual information with several latent variables. Therefore, for every factor, rank the latent variables according to the amount of information they store about this factor.
- Calculate the difference between the top two values of mutual information for each generative factor.
- Normalize this difference by dividing by the entropy of the corresponding generative factor.
- The Mutual Information Gap (MIG) score is the average of these normalized differences [41]: MIG = (1/n) Σ_j (1/H(ν_j)) ( I(z_{i_j}; ν_j) − max_{i ≠ i_j} I(z_i; ν_j) ), where i_j = argmax_i I(z_i; ν_j).
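The computation can be sketched for discrete (or histogram-discretized) data; the binning scheme and the use of scikit-learn's plug-in mutual information estimator are assumptions of this sketch, not part of the original metric definition:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mig_score(factors, codes, n_bins=20):
    """Sketch of the Mutual Information Gap (MIG).

    factors: (N, n_factors) ground-truth factor values.
    codes:   (N, latent_dim) latent codes, discretized by histogram binning.
    """
    def discretize(a):
        return np.digitize(a, np.histogram(a, n_bins)[1][:-1])

    gaps = []
    for j in range(factors.shape[1]):
        v = discretize(factors[:, j])
        # Mutual information between factor j and every latent dimension
        mi = np.array([mutual_info_score(v, discretize(codes[:, i]))
                       for i in range(codes.shape[1])])
        top2 = np.sort(mi)[-2:]                  # two largest MI values
        h = mutual_info_score(v, v)              # entropy H(v_j), in nats
        gaps.append((top2[1] - top2[0]) / h)     # normalized gap for factor j
    return float(np.mean(gaps))                  # MIG = mean over factors
```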
4.4. Attribute Predictability Score (SAP)
- For each generative factor, compute the R² score of a linear regression (for continuous factors) or the classification score (balanced accuracy, for categorical factors) when predicting the j-th generative factor using only the i-th variable of the latent representation.
- For each factor, compute the difference between the scores of the two most-predictive latent dimensions.
- The mean of these differences is the Attribute Predictability Score (SAP) [43].
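For continuous factors, the SAP computation reduces to a grid of one-dimensional linear regressions. A minimal sketch (the clipping of negative R² values to zero is an assumption of this sketch):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def sap_score(factors, codes):
    """Sketch of SAP for continuous factors, using per-dimension linear R^2."""
    n_factors, latent_dim = factors.shape[1], codes.shape[1]
    r2 = np.zeros((latent_dim, n_factors))
    for i in range(latent_dim):
        for j in range(n_factors):
            # Predict factor j from latent dimension i alone
            reg = LinearRegression().fit(codes[:, [i]], factors[:, j])
            r2[i, j] = max(reg.score(codes[:, [i]], factors[:, j]), 0.0)
    top2 = np.sort(r2, axis=0)[-2:, :]   # two most-predictive codes per factor
    return float((top2[1] - top2[0]).mean())   # mean gap = SAP
```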
5. Discussion
5.1. Methods
5.2. Metrics
5.2.1. Properties of a Disentangled Representation
5.2.2. Comparison
6. Conclusions and Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Bengio, Y. Learning deep architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1–127. [Google Scholar] [CrossRef]
- Higgins, I.; Amos, D.; Pfau, D.; Racaniere, S.; Matthey, L.; Rezende, D.; Lerchner, A. Towards a definition of disentangled representations. arXiv 2018, arXiv:1812.02230. [Google Scholar]
- Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Szabó, A.; Hu, Q.; Portenier, T.; Zwicker, M.; Favaro, P. Challenges in disentangling independent factors of variation. arXiv 2017, arXiv:1711.02245. [Google Scholar]
- Suter, R.; Miladinovic, D.; Schölkopf, B.; Bauer, S. Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; Volume 97, pp. 6056–6065. [Google Scholar]
- Le-Khac, P.; Healy, G.; Smeaton, A. Contrastive Representation Learning: A Framework and Review. IEEE Access 2020, 8, 193907–193934. [Google Scholar] [CrossRef]
- Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52. [Google Scholar] [CrossRef]
- Báscones, D.; González, C.; Mozos, D. Hyperspectral Image Compression Using Vector Quantization, PCA and JPEG2000. Remote Sens. 2018, 10, 907. [Google Scholar] [CrossRef] [Green Version]
- Stone, J. Independent component analysis: An introduction. Trends Cogn. Sci. 2002, 6, 59–64. [Google Scholar] [CrossRef]
- Naik, G.; Kumar, D. An overview of independent component analysis and its applications. Informatica 2011, 35, 63–82. [Google Scholar]
- Henry, E.; Hofrichter, J. Singular value decomposition: Application to analysis of experimental data. Methods Enzymol. 1992, 210, 129–192. [Google Scholar]
- Montero, M.; Ludwig, C.; Costa, R.; Malhotra, G.; Bowers, J. The Role of Disentanglement in Generalisation. Available online: https://openreview.net/forum?id=qbH974jKUVy (accessed on 15 February 2023).
- Shen, Z.; Liu, J.; He, Y.; Zhang, X.; Xu, R.; Yu, H.; Cui, P. Towards out-of-distribution generalization: A survey. arXiv 2021, arXiv:2108.13624. [Google Scholar]
- Duan, S.; Matthey, L.; Saraiva, A.; Watters, N.; Burgess, C.; Lerchner, A.; Higgins, I. Unsupervised model selection for variational disentangled representation learning. arXiv 2019, arXiv:1905.12614. [Google Scholar]
- Zheng, H.; Lapata, M. Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning. arXiv 2022, arXiv:2212.05982. [Google Scholar]
- Dittadi, A.; Träuble, F.; Locatello, F.; Wüthrich, M.; Agrawal, V.; Winther, O.; Bauer, S.; Schölkopf, B. On the transfer of disentangled representations in realistic settings. arXiv 2020, arXiv:2010.14407. [Google Scholar]
- Montero, M.; Bowers, J.; Costa, R.; Ludwig, C.; Malhotra, G. Lost in Latent Space: Disentangled Models and the Challenge of Combinatorial Generalisation. arXiv 2022, arXiv:2204.02283. [Google Scholar]
- Locatello, F.; Tschannen, M.; Bauer, S.; Rätsch, G.; Schölkopf, B.; Bachem, O. Disentangling factors of variation using few labels. arXiv 2019, arXiv:1905.01258. [Google Scholar]
- Schölkopf, B.; Janzing, D.; Peters, J.; Sgouritsa, E.; Zhang, K.; Mooij, J. On causal and anticausal learning. arXiv 2012, arXiv:1206.6471. [Google Scholar]
- Higgins, I.; Matthey, L.; Pal, A.; Burgess, C.; Glorot, X.; Botvinick, M.; Mohamed, S.; Lerchner, A. beta-vae: Learning basic visual concepts with a constrained variational framework. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Ridgeway, K. A survey of inductive biases for factorial representation-learning. arXiv 2016, arXiv:1612.05299. [Google Scholar]
- Wang, Q.; Zhou, H.; Li, G.; Guo, J. Single Image Super-Resolution Method Based on an Improved Adversarial Generation. Appl. Sci. 2022, 12, 6067. [Google Scholar] [CrossRef]
- Revell, G. Madeleine: Poetry and Art of an Artificial Intelligence. Arts 2022, 11, 83. [Google Scholar] [CrossRef]
- Tsai, Y.; Liang, P.; Zadeh, A.; Morency, L.; Salakhutdinov, R. Learning factorized multimodal representations. arXiv 2018, arXiv:1806.06176. [Google Scholar]
- Hsu, W.; Glass, J. Disentangling by partitioning: A representation learning framework for multimodal sensory data. arXiv 2018, arXiv:1805.11264. [Google Scholar]
- Xu, Z.; Lin, T.; Tang, H.; Li, F.; He, D.; Sebe, N.; Timofte, R.; Van Gool, L.; Ding, E. Predict, prevent, and evaluate: Disentangled text-driven image manipulation empowered by pre-trained vision-language model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18229–18238. [Google Scholar]
- Zou, W.; Ding, J.; Wang, C. Utilizing BERT Intermediate Layers for Multimodal Sentiment Analysis. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 11–15 July 2022; pp. 1–6. [Google Scholar]
- Liu, X.; Sanchez, P.; Thermos, S.; O’Neil, A.; Tsaftaris, S. Learning disentangled representations in the imaging domain. Med. Image Anal. 2022, 80, 102516. [Google Scholar] [CrossRef] [PubMed]
- Chartsias, A.; Joyce, T.; Papanastasiou, G.; Semple, S.; Williams, M.; Newby, D.; Dharmakumar, R.; Tsaftaris, S. Disentangled representation learning in cardiac image analysis. Med. Image Anal. 2019, 58, 101535. [Google Scholar] [CrossRef] [Green Version]
- Hsieh, J.; Liu, B.; Huang, D.; Fei-Fei, L.; Niebles, J. Learning to decompose and disentangle representations for video prediction. Adv. Neural Inf. Process. Syst. 2018, 31, 515–524. [Google Scholar]
- Denton, E.L. Unsupervised learning of disentangled representations from video. Adv. Neural Inf. Process. Syst. 2017, 30, 4417–4426. [Google Scholar]
- Comas, A.; Zhang, C.; Feric, Z.; Camps, O.; Yu, R. Learning disentangled representations of videos with missing data. Adv. Neural Inf. Process. Syst. 2020, 33, 3625–3635. [Google Scholar]
- Guen, V.; Thome, N. Disentangling physical dynamics from unknown factors for unsupervised video prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11474–11484. [Google Scholar]
- Fan, K.; Joung, C.; Baek, S. Sequence-to-Sequence Video Prediction by Learning Hierarchical Representations. Appl. Sci. 2020, 10, 8288. [Google Scholar] [CrossRef]
- Zou, Y.; Liu, H.; Gui, T.; Wang, J.; Zhang, Q.; Tang, M.; Li, H.; Wang, D. Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents. arXiv 2022, arXiv:2203.02898. [Google Scholar]
- Dougrez-Lewis, J.; Liakata, M.; Kochkina, E.; He, Y. Learning disentangled latent topics for twitter rumour veracity classification. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; pp. 3902–3908. [Google Scholar]
- Zhu, Q.; Zhang, W.; Liu, T.; Wang, W. Neural stylistic response generation with disentangled latent variables. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Bangkok, Thailand, 1–6 August 2021; pp. 4391–4401. [Google Scholar]
- Lake, B.; Ullman, T.; Tenenbaum, J.; Gershman, S. Building machines that learn and think like people. Behav. Brain Sci. 2017, 40, e253. [Google Scholar] [CrossRef] [Green Version]
- Kingma, D.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Burgess, C.; Higgins, I.; Pal, A.; Matthey, L.; Watters, N.; Desjardins, G.; Lerchner, A. Understanding disentangling in β-VAE. arXiv 2018, arXiv:1804.03599. [Google Scholar]
- Chen, R.; Li, X.; Grosse, R.; Duvenaud, D. Isolating Sources of Disentanglement in Variational Autoencoders. arXiv 2018, arXiv:1802.04942. [Google Scholar]
- Kim, H.; Mnih, A. Disentangling by factorising. arXiv 2018, arXiv:1802.05983. [Google Scholar]
- Kumar, A.; Sattigeri, P.; Balakrishnan, A. Variational inference of disentangled latent concepts from unlabeled observations. arXiv 2017, arXiv:1711.00848. [Google Scholar]
- Rezaabad, A.; Vishwanath, S. Learning representations by maximizing mutual information in variational autoencoders. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 2729–2734. [Google Scholar]
- Hejna, J.; Vangipuram, A.; Liu, K. Improving Latent Representations via Explicit Disentanglement. 2020. Available online: http://joeyhejna.com/files/disentanglement.pdf (accessed on 13 December 2022).
- Locatello, F.; Bauer, S.; Lucic, M.; Rätsch, G.; Gelly, S.; Schölkopf, B.; Bachem, O. A sober look at the unsupervised learning of disentangled representations and their evaluation. arXiv 2020, arXiv:2010.14766. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
- Cho, W.; Choi, Y. LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators. Sensors 2022, 22, 8761. [Google Scholar] [CrossRef]
- Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Adv. Neural Inf. Process. Syst. 2016, 29, 2172–2180. [Google Scholar]
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
- Lin, Z.; Thekumparampil, K.; Fanti, G.; Oh, S. Infogan-cr: Disentangling generative adversarial networks with contrastive regularizers. arXiv 2019, arXiv:1906.06034. [Google Scholar]
- Xiao, T.; Hong, J.; Ma, J. Dna-gan: Learning disentangled representations from multi-attribute images. arXiv 2017, arXiv:1711.05415. [Google Scholar]
- Jeon, I.; Lee, W.; Pyeon, M.; Kim, G. Ib-gan: Disentangled representation learning with information bottleneck generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 7926–7934. [Google Scholar]
- Jing, L.; Tian, Y. Self-supervised visual feature learning with deep neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4037–4058. [Google Scholar] [CrossRef] [PubMed]
- Ericsson, L.; Gouk, H.; Loy, C.; Hospedales, T. Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Process. Mag. 2022, 39, 42–62. [Google Scholar] [CrossRef]
- Schiappa, M.; Rawat, Y.; Shah, M. Self-supervised learning for videos: A survey. ACM Comput. Surv. 2022. [Google Scholar] [CrossRef]
- Xie, Y.; Arildsen, T.; Tan, Z. Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective. In Proceedings of the 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), Gold Coast, Australia, 25–28 October 2021; pp. 1–6. [Google Scholar]
- Zhang, Z.; Zhang, L.; Zheng, X.; Tian, J.; Zhou, J. Self-supervised adversarial example detection by disentangled representation. arXiv 2021, arXiv:2105.03689. [Google Scholar]
- Kaya, B.; Timofte, R. Self-supervised 2D image to 3D shape translation with disentangled representations. In Proceedings of the 2020 International Conference on 3D Vision (3DV), Fukuoka, Japan, 25–28 November 2020; pp. 1039–1048. [Google Scholar]
- Wang, T.; Yue, Z.; Huang, J.; Sun, Q.; Zhang, H. Self-supervised learning disentangled group representation as feature. Adv. Neural Inf. Process. Syst. 2021, 34, 18225–18240. [Google Scholar]
- Locatello, F.; Bauer, S.; Lucic, M.; Raetsch, G.; Gelly, S.; Schölkopf, B.; Bachem, O. Challenging common assumptions in the unsupervised learning of disentangled representations. Int. Conf. Mach. Learn. 2019, 97, 4114–4124. [Google Scholar]
- Baldi, P. Autoencoders, unsupervised learning, and deep architectures. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, Bellevue, WA, USA, 27 June 2012; pp. 37–49. [Google Scholar]
- Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.; Bottou, L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408. [Google Scholar]
- Pham, C.; Ladjal, S.; Newson, A. PCA-AE: Principal Component Analysis Autoencoder for Organising the Latent Space of Generative Networks. J. Math. Imaging Vis. 2022, 64, 569–585. [Google Scholar] [CrossRef]
- Bank, D.; Koenigstein, N.; Giryes, R. Autoencoders. arXiv 2020, arXiv:2003.05991. [Google Scholar]
- Song, C.; Liu, F.; Huang, Y.; Wang, L.; Tan, T. Auto-encoder Based Data Clustering. In Proceedings of the CIARP, Havana, Cuba, 20–23 November 2013. [Google Scholar]
- Gogoi, M.; Begum, S. Image classification using deep autoencoders. In Proceedings of the 2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Tamil Nadu, India, 14–16 December 2017; pp. 1–5. [Google Scholar]
- Zhang, Y.; Lee, K.; Lee, H. Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification. In Proceedings of the 33rd International Conference on Machine Learning, New York City, NY, USA, 19–24 June 2016; Volume 48, pp. 612–621. [Google Scholar]
- Hoffman, M.; Blei, D.; Wang, C.; Paisley, J. Stochastic variational inference. J. Mach. Learn. Res. 2013, 14, 1303–1347. [Google Scholar]
- Jha, A.; Anand, S.; Singh, M.; Veeravasarapu, V. Disentangling factors of variation with cycle-consistent variational auto-encoders. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 805–820. [Google Scholar]
- Doersch, C. Tutorial on variational autoencoders. arXiv 2016, arXiv:1606.05908. [Google Scholar]
- Kingma, D.; Welling, M. An introduction to variational autoencoders. arXiv 2019, arXiv:1906.02691. [Google Scholar]
- Rezende, D.; Mohamed, S.; Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. Int. Conf. Mach. Learn. 2014, 32, 1278–1286. [Google Scholar]
- Asperti, A.; Trentin, M. Balancing reconstruction error and kullback-leibler divergence in variational autoencoders. IEEE Access 2020, 8, 199440–199448. [Google Scholar] [CrossRef]
- Hu, M.; Liu, Z.; Liu, J. Learning Unsupervised Disentangled Capsule via Mutual Information. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8. [Google Scholar]
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 3730–3738. [Google Scholar]
- Aubry, M.; Maturana, D.; Efros, A.; Russell, B.; Sivic, J. Seeing 3d chairs: Exemplar part-based 2d-3d alignment using a large dataset of cad models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3762–3769. [Google Scholar]
- Paysan, P.; Knothe, R.; Amberg, B.; Romdhani, S.; Vetter, T. A 3D face model for pose and illumination invariant face recognition. In Proceedings of the 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, Genova, Italy, 2–4 September 2009; pp. 296–301. [Google Scholar]
- Matthey, L.; Higgins, I.; Hassabis, D.; Lerchner, A. dSprites: Disentanglement Testing Sprites Dataset. 2017. Available online: https://github.com/deepmind/dsprites-dataset/ (accessed on 13 December 2022).
- Kullback, S. Information Theory and Statistics; Dover Publications: New York, NY, USA, 1997. [Google Scholar]
- Hoffman, M.; Johnson, M. Elbo surgery: Yet another way to carve up the variational evidence lower bound. In Proceedings of the Workshop in Advances in Approximate Bayesian Inference, NIPS, Barcelona, Spain, 9 December 2016; Volume 1. [Google Scholar]
- Makhzani, A.; Frey, B. Pixelgan autoencoders. Adv. Neural Inf. Process. Syst. 2017, 30, 1972–1982. [Google Scholar]
- Watanabe, S. Information Theoretical Analysis of Multivariate Correlation. IBM J. Res. Dev. 1960, 4, 66–82. [Google Scholar] [CrossRef]
- Nguyen, X.; Wainwright, M.; Jordan, M. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Trans. Inf. Theory 2010, 56, 5847–5861. [Google Scholar] [CrossRef] [Green Version]
- Sugiyama, M.; Suzuki, T.; Kanamori, T. Density-ratio matching under the Bregman divergence: A unified framework of density-ratio estimation. Ann. Inst. Stat. Math. 2011, 64, 1009–1044. [Google Scholar] [CrossRef]
- Harrison, R. Introduction to monte carlo simulation. AIP Conf. Proc. 2010, 1204, 17–21. [Google Scholar] [PubMed] [Green Version]
- Eastwood, C.; Williams, C. A framework for the quantitative evaluation of disentangled representations. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Zaidi, J.; Boilard, J.; Gagnon, G.; Carbonneau, M. Measuring disentanglement: A review of metrics. arXiv 2020, arXiv:2012.09276. [Google Scholar]
- Sepliarskaia, A.; Kiseleva, J.; Rijke, M. Evaluating disentangled representations. arXiv 2019, arXiv:1910.05587. [Google Scholar]
- Ridgeway, K.; Mozer, M. Learning deep disentangled embeddings with the f-statistic loss. Adv. Neural Inf. Process. Syst. 2018, 31, 185–194. [Google Scholar]
- Chen, X.; Kingma, D.; Salimans, T.; Duan, Y.; Dhariwal, P.; Schulman, J.; Sutskever, I.; Abbeel, P. Variational lossy autoencoder. arXiv 2016, arXiv:1611.02731. [Google Scholar]
- Zhao, S.; Song, J.; Ermon, S. Towards deeper understanding of variational autoencoding models. arXiv 2017, arXiv:1702.08658. [Google Scholar]
- Zhang, K. On mode collapse in generative adversarial networks. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, 14–17 September 2021; pp. 563–574. [Google Scholar]
- Alemi, A.; Poole, B.; Fischer, I.; Dillon, J.; Saurous, R.; Murphy, K. Fixing a broken ELBO. Int. Conf. Mach. Learn. 2018, 80, 159–168. [Google Scholar]
- Liu, J.; Yuan, Z.; Pan, Z.; Fu, Y.; Liu, L.; Lu, B. Diffusion Model with Detail Complement for Super-Resolution of Remote Sensing. Remote Sens. 2022, 14, 4834. [Google Scholar] [CrossRef]
- Benrhouma, O.; Alkhodre, A.; AlZahrani, A.; Namoun, A.; Bhat, W. Using Singular Value Decomposition and Chaotic Maps for Selective Encryption of Video Feeds in Smart Traffic Management. Appl. Sci. 2022, 12, 3917. [Google Scholar] [CrossRef]
- Andriyanov, N. Methods for preventing visual attacks in convolutional neural networks based on data discard and dimensionality reduction. Appl. Sci. 2021, 11, 5235. [Google Scholar] [CrossRef]
- Samuel, D.; Cuzzolin, F. Svd-gan for real-time unsupervised video anomaly detection. In Proceedings of the British Machine Vision Conference (BMVC), Virtual, 22–25 November 2021. [Google Scholar]
Methods | Metrics |
---|---|
β-VAE [20] | Z-diff Score |
Factor-VAE [42] | Z-min Variance Score |
β-TCVAE [41] | Mutual Information Gap (MIG) |
DIP-VAE [43] | Attribute Predictability Score (SAP) |
InfoMax-VAE [44] | - |
Term | Mathematical Expression |
---|---|
Prior | p(z) |
Generative Model (Decoder) | p_θ(x∣z) |
Inference Model (Encoder) | q_φ(z∣x) |
Data Log-Likelihood | log p_θ(x) |
Kullback–Leibler Divergence | D_KL(q_φ(z∣x) ∥ p_θ(z∣x)) |
Evidence Lower Bound (ELBO) | E_{q_φ(z∣x)}[log p_θ(x∣z)] − D_KL(q_φ(z∣x) ∥ p(z)) |
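As a concrete numerical illustration of how these terms combine, the following sketch evaluates the ELBO for a diagonal-Gaussian encoder with a unit-Gaussian prior and a Bernoulli decoder; the function name and arguments are illustrative, but the closed-form KL divergence between diagonal Gaussians is standard:

```python
import numpy as np

def elbo_terms(x, x_recon, mu, log_var):
    """Sketch of the ELBO for q_phi(z|x) = N(mu, diag(exp(log_var))),
    prior p(z) = N(0, I), and a Bernoulli decoder."""
    eps = 1e-7
    # Reconstruction term: E_q[log p_theta(x|z)] (Bernoulli log-likelihood)
    rec = np.sum(x * np.log(x_recon + eps) + (1 - x) * np.log(1 - x_recon + eps))
    # KL(q_phi(z|x) || p(z)), closed form for diagonal Gaussians vs. N(0, I)
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    return rec - kl, rec, kl    # ELBO = reconstruction - KL
```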
Datasets | Encoder | Decoder |
---|---|---|
2D Shape, 3D Shape, 3D Chairs, 3D Faces | Input: 64 × 64 × number of channels; Conv 32 × 4 × 4 (stride 2), 32 × 4 × 4 (stride 2), 64 × 4 × 4 (stride 2), 64 × 4 × 4 (stride 2); FC 256; ReLU activations. | Input: latent code z; FC 256 ReLU; FC 4 × 4 × 64 ReLU; Upconv 64 × 4 × 4 (stride 2), 32 × 4 × 4 (stride 2), 32 × 4 × 4 (stride 2), 4 × 4 × number of channels (stride 2); ReLU activations. Bernoulli decoder. |
CelebA | Input: 64 × 64 × 3; Conv 32 × 4 × 4 (stride 2), 32 × 4 × 4 (stride 2), 64 × 4 × 4 (stride 2), 64 × 4 × 4 (stride 2); FC 256; ReLU activations. | Input: latent code z; FC 256 ReLU; FC 4 × 4 × 64 ReLU; Upconv 64 × 4 × 4 (stride 2), 32 × 4 × 4 (stride 2), 32 × 4 × 4 (stride 2), 4 × 4 × number of channels (stride 2); ReLU activations. Gaussian decoder. |
Method | Regularizer Added to the ELBO |
---|---|
VAE | − |
β-VAE | β · D_KL(q_φ(z∣x) ∥ p(z)) (KL term up-weighted by β) |
InfoMax-VAE | + α · I_q(x; z), the mutual information between data and latent code |
Factor-VAE | − γ · TC(z), the total correlation D_KL(q(z) ∥ Π_j q(z_j)), estimated with a discriminator |
β-TCVAE | − β · TC(z), with the KL term decomposed into index-code mutual information, total correlation and dimension-wise KL |
DIP-VAE | − λ · penalty driving Cov_{q_φ}[z] towards the identity (off-diagonal entries → 0, diagonal entries → 1) |
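As one worked example of such a regularizer, the DIP-VAE covariance penalty on a batch of encoder means can be written in a few lines; the weights `lambda_od` and `lambda_d` are illustrative defaults, not values prescribed here:

```python
import numpy as np

def dip_vae_regularizer(mu, lambda_od=10.0, lambda_d=5.0):
    """Sketch of the DIP-VAE covariance regularizer.

    Penalizes the covariance of the encoder means mu (batch, latent_dim)
    so that it matches the identity: off-diagonal entries -> 0,
    diagonal entries -> 1.
    """
    cov = np.cov(mu, rowvar=False)               # (latent_dim, latent_dim)
    off_diag = cov - np.diag(np.diag(cov))       # zero out the diagonal
    return (lambda_od * np.sum(off_diag ** 2)
            + lambda_d * np.sum((np.diag(cov) - 1.0) ** 2))
```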
Metric | Satisfy Modularity | Satisfy Compactness | Satisfy Informativeness |
---|---|---|---|
Z-diff Score | Yes | No | No |
Z-min Variance Score | Yes | Yes | No |
SAP | No | Yes | Yes |
MIG | Yes | Yes | Yes |
Eddahmani, I.; Pham, C.-H.; Napoléon, T.; Badoc, I.; Fouefack, J.-R.; El-Bouz, M. Unsupervised Learning of Disentangled Representation via Auto-Encoding: A Survey. Sensors 2023, 23, 2362. https://doi.org/10.3390/s23042362