Abstract
Co-creation with AI is a growing trend, and AI generation of images from textual descriptions has shown advanced and attractive capabilities. However, commonly trained machine-learning models and existing AI-based systems may fail to produce satisfying results for personal use or for novice users of painting and AI co-creation, possibly because they understand personal textual expressions poorly or because trained text-to-image models offer little customization. We therefore support the creation of flexible and diverse visual content from textual descriptions by developing neural-network models. In our modeling, a Transformer captures word-visual co-occurrence to generate visual tokens, and images are synthesized by decoding these tokens. To improve visual and textual representations and their relevance while increasing diversity, we apply contrastive learning to texts, to images, or to text-image pairs. In experiments on a bird dataset, we show that rendering quality requires neural networks of sufficient scale and a training process that ends with fine-grained training at relatively low learning rates. We further show that contrastive learning can improve visual and textual representations and their relevance.
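To make the contrastive component mentioned above concrete, the following is a minimal, hypothetical sketch of a symmetric InfoNCE-style loss on text-image pairs, assuming batches of matched text and image embeddings produced by separate encoders; the function names, dimensions, and temperature value are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def text_image_contrastive_loss(text_emb: torch.Tensor,
                                image_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    # Normalize so dot products become cosine similarities.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares text i with image j;
    # matched pairs lie on the diagonal.
    logits = text_emb @ image_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: pull matched pairs together,
    # push mismatched pairs apart in both directions.
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2i + loss_i2t) / 2

# Usage with random embeddings standing in for encoder outputs.
if __name__ == "__main__":
    text_emb = torch.randn(8, 256)    # hypothetical text-encoder outputs
    image_emb = torch.randn(8, 256)   # hypothetical image-encoder outputs
    print(text_image_contrastive_loss(text_emb, image_emb).item())

In a token-based pipeline such as the one described, a loss of this kind would typically be combined with the autoregressive cross-entropy objective of the Transformer that predicts visual tokens; the exact weighting and placement are design choices of the paper and are not shown here.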
Acknowledgement
This work was supported by the Japan Science and Technology Agency (JST CREST: JPMJCR19F2, Research Representative: Prof. Yoichi Ochiai, University of Tsukuba, Japan) and by the University of Tsukuba (Basic Research Support Program Type A).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lu, JL., Ochiai, Y. (2022). Customizable Text-to-Image Modeling by Contrastive Learning on Adjustable Word-Visual Pairs. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2022. Lecture Notes in Computer Science, vol 13336. Springer, Cham. https://doi.org/10.1007/978-3-031-05643-7_30
DOI: https://doi.org/10.1007/978-3-031-05643-7_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05642-0
Online ISBN: 978-3-031-05643-7
eBook Packages: Computer Science (R0)