
CN114092713B - Image generation method, device, readable medium and electronic device - Google Patents

Image generation method, device, readable medium and electronic device

Info

Publication number
CN114092713B
Authority
CN
China
Prior art keywords
image
target
initial
training
model parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111435770.5A
Other languages
Chinese (zh)
Other versions
CN114092713A (en
Inventor
高永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202111435770.5A
Publication of CN114092713A
Application granted
Publication of CN114092713B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image generation method, an apparatus, a readable medium, and an electronic device in the field of image processing technology. The method includes: acquiring an original image in response to a user request; extracting original image features from the original image, determining initial model parameters based on the original image features, and generating an initial image corresponding to a target object according to the initial model parameters; and extracting initial image features from the initial image, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image. The disclosure uses an initial generation model to generate a low-precision initial image and then a target generation model to generate a high-precision target image, which can improve the similarity between the target image and the original image and thereby the expressiveness of the target image.

Description

Image generation method, device, readable medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image generating method, an image generating device, a readable medium, and an electronic apparatus.
Background
With the continuous development of electronic information technology, a wide variety of applications (apps) have appeared in the application market to meet the diverse demands of users. Game applications, particularly MMORPGs (massively multiplayer online role-playing games), often allow players to edit their in-game character's appearance according to their own preferences rather than use default templates, in order to improve immersion and interactivity. However, because players' operating skill varies, it is often difficult for them to create a character that meets their needs. To make this easier, a player may upload an image that satisfies the demand (e.g., a photograph of the player) and have a corresponding avatar generated from that image.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides an image generation method, the method comprising:
acquiring an original image in response to a user request;
extracting original image features from the original image, determining initial model parameters based on the original image features, and generating an initial image corresponding to a target object according to the initial model parameters; and
extracting initial image features from the initial image, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image.
In a second aspect, the present disclosure provides an image generation apparatus, the apparatus comprising:
an acquisition module, configured to acquire an original image in response to a user request;
an initial generation module, configured to extract original image features from the original image, determine initial model parameters based on the original image features, and generate an initial image corresponding to a target object according to the initial model parameters; and
a target generation module, configured to extract initial image features from the initial image, generate target model parameters based on the initial image features, and correct the target object in the initial image according to the target model parameters to obtain a target image.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which when executed by a processing device performs the steps of the method of the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon; and
a processing device for executing the computer program in the storage device to carry out the steps of the method of the first aspect of the present disclosure.
According to the above technical solution, the present disclosure first acquires an original image in response to a user request, then extracts original image features from the original image, determines initial model parameters based on the original image features, and generates an initial image corresponding to the target object according to the initial model parameters. Finally, it extracts initial image features from the initial image, generates target model parameters based on the initial image features, and corrects the target object in the initial image according to the target model parameters to obtain a target image. The present disclosure generates a low-precision initial image with the initial generation model and then a high-precision target image with the target generation model, which can improve the similarity between the target image and the original image and thereby the expressiveness of the target image.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flowchart illustrating an image generation method according to an exemplary embodiment;
FIG. 2 is a schematic diagram of the connection between an initial generative model and a target generative model;
FIG. 3 is a flowchart illustrating another image generation method according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating another image generation method according to an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating an initial generation model according to an exemplary embodiment;
FIG. 6 is a schematic diagram of a target generation model, shown in accordance with an exemplary embodiment;
FIG. 7 is a flowchart illustrating training of an initial generation model according to an exemplary embodiment;
FIG. 8 is a flowchart illustrating training of a target generation model according to an exemplary embodiment;
FIG. 9 is a flowchart illustrating another way of training a target generation model according to an exemplary embodiment;
FIG. 10 is a flowchart illustrating another way of training a target generation model according to an exemplary embodiment;
FIG. 11 is a block diagram of an image generation apparatus according to an exemplary embodiment;
FIG. 12 is a block diagram of another image generation apparatus shown according to an exemplary embodiment;
FIG. 13 is a block diagram of another image generation apparatus shown according to an exemplary embodiment;
FIG. 14 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but are provided to provide a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "a", "one", and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
FIG. 1 is a flowchart illustrating an image generation method according to an exemplary embodiment. As shown in FIG. 1, the method includes:
Step 101, acquiring an original image in response to a user request.
For example, upon receiving a user request, an original image uploaded by the user (e.g., a player) may be obtained, where the original image includes a target object. The target object may be the user, a person specified by the user (e.g., a historical figure or a celebrity), or an animal or article specified by the user. The original image is a 2D image; it may be, for example, a photograph of the target object taken in a real scene or a portrait of the target object, which is not particularly limited in this disclosure. After the original image is obtained, feature point alignment may first be performed to determine the feature points included in the original image. Taking a human face as the target object, for example, the original image may first be processed by face alignment (facial feature point alignment) to determine feature points such as the left eye, right eye, and nose. Face alignment may be performed by ASM (Active Shape Model), AAM (Active Appearance Model), CLM (Constrained Local Model), or the like, which is not particularly limited in this disclosure.
Step 102, extracting original image features from the original image, determining initial model parameters based on the original image features, and generating an initial image corresponding to the target object according to the initial model parameters.
For example, feature extraction may be performed on the feature-point-aligned original image. Specifically, the original image may be input into a pre-trained feature extraction network that converts it into a vector of a specified dimension characterizing the original image, i.e., the original image features. Corresponding initial model parameters can then be determined based on the original image features, and an initial image corresponding to the target object can be generated according to the initial model parameters. The initial image includes the target object. For example, if the original image is a photograph of the target object taken in a real scene, the initial image may be a cartoon image (which can also be understood as an avatar) of the target object. Because the initial image is generated from the complete original image, it can approximate the original image as a whole but not in its details, so its accuracy in imitating the original image is low.
Step 103, extracting initial image features from the initial image, generating target model parameters based on the initial image features, and correcting a target object in the initial image according to the target model parameters to obtain a target image.
For example, after the initial image is obtained, feature extraction may be performed on it, again using a feature extraction network. Corresponding target model parameters can then be determined based on the initial image features, and the target object in the initial image is corrected according to the target model parameters to generate a target image corresponding to the target object. The target image also includes the target object. In one implementation, step 102 may be implemented by a pre-trained initial generation model and step 103 by a pre-trained target generation model. The two models are connected in cascade, as shown in FIG. 2: the input of the initial generation model is the original image, the output of the initial generation model (i.e., the initial image) is the input of the target generation model, and the output of the target generation model is the target image.
Because the target image is obtained by correcting the target object on the basis of the initial image, it can approximate the original image both as a whole and in its details. It therefore imitates the original image more accurately and is more similar to the original image than the initial image is, which improves the expressiveness of the target image.
In summary, the present disclosure first acquires an original image in response to a user request, then extracts original image features from the original image, determines initial model parameters based on the original image features, and generates an initial image corresponding to the target object according to the initial model parameters. Finally, it extracts initial image features from the initial image, generates target model parameters based on the initial image features, and corrects the target object in the initial image according to the target model parameters to obtain a target image. The present disclosure generates a low-precision initial image with the initial generation model and then a high-precision target image with the target generation model, which can improve the similarity between the target image and the original image and thereby the expressiveness of the target image.
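The cascade of the two models can be illustrated with a minimal Python sketch. The names `make_cascade`, `toy_initial_model`, and `toy_target_model` are hypothetical, and the toy functions only stand in for trained networks; nothing here reflects the actual models of the disclosure.

```python
from typing import Callable, List

Image = List[List[int]]  # hypothetical stand-in for a 2D image


def make_cascade(initial_model: Callable[[Image], Image],
                 target_model: Callable[[Image], Image]) -> Callable[[Image], Image]:
    """Chain two models: original image -> initial image -> target image."""
    def generate(original: Image) -> Image:
        initial = initial_model(original)  # step 102: coarse initial image
        return target_model(initial)       # step 103: corrected target image
    return generate


# Toy stand-ins (assumptions for illustration, not trained networks).
def toy_initial_model(img: Image) -> Image:
    # Coarse result: quantize every pixel down to a multiple of 10.
    return [[p // 10 * 10 for p in row] for row in img]


def toy_target_model(img: Image) -> Image:
    # "Correction": add a small offset to every pixel.
    return [[p + 1 for p in row] for row in img]


generate = make_cascade(toy_initial_model, toy_target_model)
target = generate([[123, 456], [78, 9]])
```

The only point of the sketch is the data flow: the target model never sees the original image directly, only the output of the initial model.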
FIG. 3 is a flowchart illustrating another image generation method according to an exemplary embodiment. As shown in FIG. 3, the method further includes:
Step 104, dividing the initial image into a plurality of partial images according to the key parts of the target object.
For example, after the initial image is obtained, it may be divided according to the key parts of the target object to obtain a plurality of partial images. Taking a human face as the target object, the initial image can be divided according to the distribution of facial features, and the resulting partial images may overlap. For example, the eyes, eyebrows, nose, and mouth lie in the upper part of the face, while the nose, mouth, and face contour lie in the lower part, so the top three quarters of the initial image can be taken as one partial image (including the eyes, eyebrows, nose, and mouth) and the bottom three quarters as another partial image (including the nose, mouth, and face contour). As another example, the eyes and eyebrows lie in the upper part of the face, the nose and mouth in the middle part, and the face contour in the lower part, so the top third of the initial image can be taken as one partial image (including the eyes and eyebrows), the bottom third as another (including the face contour), and the remainder as a third (including the nose and mouth). The manner of dividing the initial image is not particularly limited by the present disclosure.
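The two division schemes described above can be sketched in Python as row slices. This is a minimal illustration that assumes the image is a list of pixel rows and that the face fills the frame; the function names are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of pixel values; a hypothetical stand-in


def split_overlapping(image: Image) -> Dict[str, Image]:
    """Top 3/4 and bottom 3/4 of the rows; the two crops overlap in the middle."""
    height = len(image)
    cut = (3 * height) // 4
    return {
        "upper": image[:cut],           # eyes, eyebrows, nose, mouth
        "lower": image[height - cut:],  # nose, mouth, face contour
    }


def split_thirds(image: Image) -> Dict[str, Image]:
    """Top 1/3, bottom 1/3, and the remaining middle rows; no overlap."""
    height = len(image)
    third = height // 3
    return {
        "upper": image[:third],                 # eyes, eyebrows
        "middle": image[third:height - third],  # nose, mouth
        "lower": image[height - third:],        # face contour
    }


# A 12-row toy image whose pixel values equal the row index.
image = [[row] * 4 for row in range(12)]
parts = split_overlapping(image)
thirds = split_thirds(image)
```

With 12 rows, the overlapping scheme yields two 9-row crops that share rows 3 to 8, while the thirds scheme yields three disjoint 4-row crops.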
Accordingly, the implementation manner of step 103 may be:
Global features are extracted from the initial image, and local features corresponding to each local image are extracted from the plurality of local images.
For example, in the process of extracting the features of the initial image, feature extraction may be performed on the initial image and the plurality of local images respectively, so as to obtain global features capable of representing the whole initial image, and local features capable of representing each local image.
FIG. 4 is a flowchart illustrating another image generation method according to an exemplary embodiment. As shown in FIG. 4, the implementation of step 103 may include:
Step 1031, determining global model parameters according to the global features, and determining corresponding local model parameters according to each local feature.
Step 1032, determining the target model parameters based on the global model parameters and the plurality of local model parameters.
Step 1033, correcting the target object in the initial image according to the target model parameters to obtain a target image.
For example, global model parameters may be generated according to global features capable of characterizing the entire initial image, and a plurality of local model parameters may be generated according to local features capable of characterizing each local image, and then the global model parameters and the plurality of local model parameters are synthesized into target model parameters, and finally a target image is generated according to the target model parameters.
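The text does not state how the global and local model parameters are synthesized. As one possible reading, the Python sketch below assumes that every parameter vector has the same length and combines them by a weighted average; the function name `combine_parameters` and the weight `global_weight` are assumptions made for illustration.

```python
from typing import List, Sequence


def combine_parameters(global_params: Sequence[float],
                       local_params: Sequence[Sequence[float]],
                       global_weight: float = 0.5) -> List[float]:
    """Weighted average of one global and several local parameter vectors.

    Assumption: the global vector gets `global_weight`, and the local
    vectors share the remaining weight equally.
    """
    if not local_params:
        return list(global_params)
    size = len(global_params)
    if any(len(p) != size for p in local_params):
        raise ValueError("all parameter vectors must have the same length")
    local_weight = (1.0 - global_weight) / len(local_params)
    combined = []
    for i in range(size):
        value = global_weight * global_params[i]
        value += sum(local_weight * p[i] for p in local_params)
        combined.append(value)
    return combined


target_params = combine_parameters([0.2, 0.8], [[0.4, 0.6], [0.0, 1.0]])
```

Other combination rules, such as concatenation followed by a learned mapping, would fit the description equally well.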
In one implementation, the implementation of step 102 may be:
Extracting original image features from an original image through a pre-trained initial generation model, determining initial model parameters based on the original image features, and generating an initial image corresponding to a target object according to the initial model parameters. The initial generation model is obtained through training according to a plurality of sample images.
Accordingly, the implementation manner of step 103 may be:
Extracting initial image characteristics from an initial image through a pre-trained target generation model, generating target model parameters based on the initial image characteristics, and correcting a target object in the initial image according to the target model parameters to obtain a target image.
The target generation model is trained according to a plurality of image groups, and each image group comprises a training image generated through an initial generation model according to a sample image and a plurality of local training images obtained by dividing the training image.
Illustratively, step 102 may be implemented by a pre-trained initial generative model, and step 103 may be implemented by a pre-trained target generative model. Correspondingly, the connection relationship between the initial generation model and the target generation model is shown in fig. 2, the input of the initial generation model is an original image, the output of the initial generation model (i.e. the initial image) is used as the input of the target generation model, and the output of the target generation model is the target image.
The initial generation model is trained on a plurality of sample images, and the target generation model is trained on a plurality of image groups. Each image group may include a training image generated from a sample image by the initial generation model and a plurality of local training images obtained by dividing that training image. That is, a plurality of sample images may be acquired in advance, and for each sample image a corresponding image group is generated using the initial generation model, so that each image group corresponds to one sample image. The sample image may be any image including a sample object (which may be understood as any person or object); that is, the style of the sample image is not limited. For example, the sample image may have the same style as the original image uploaded by the user in step 101, or may have a different style.
Furthermore, the training images may be divided according to the key portion of the sample object, so as to obtain a plurality of local training images, where the plurality of local training images may overlap. The division manner of the training image is the same as that of the initial image, and will not be described here again.
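How image groups might be assembled can be sketched in Python as follows. The `ImageGroup` class, `build_image_groups`, and the toy model and split functions are hypothetical names used only to show the data layout; they are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # hypothetical stand-in for a 2D image


@dataclass
class ImageGroup:
    training_image: Image               # output of the initial generation model
    local_training_images: List[Image]  # crops of the training image


def build_image_groups(sample_images: List[Image],
                       initial_model: Callable[[Image], Image],
                       split: Callable[[Image], List[Image]]) -> List[ImageGroup]:
    """Build one image group per sample image."""
    groups = []
    for sample in sample_images:
        training_image = initial_model(sample)
        groups.append(ImageGroup(training_image, split(training_image)))
    return groups


# Toy stand-ins (assumptions for illustration only).
def toy_initial_model(img: Image) -> Image:
    return [[p // 10 * 10 for p in row] for row in img]


def toy_split(img: Image) -> List[Image]:
    half = len(img) // 2
    return [img[:half], img[half:]]


samples = [[[11, 12], [13, 14]], [[21, 22], [23, 24]]]
groups = build_image_groups(samples, toy_initial_model, toy_split)
```

Note that the local training images are cut from the generated training image, not from the sample image.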
Each image group includes both a complete training image and local training images that reflect local regions, so the trained target generation model can learn both the overall features and the local features of the training image. The target image generated by the target generation model can therefore approximate the original image both as a whole and in its details, so that it imitates the original image more accurately and is more similar to the original image than the initial image is, which improves the expressiveness of the target image.
In one implementation, the initial generation model may include a recognition network, a converter (which may be understood as a translator), and a generator, as shown in FIG. 5. The recognition network performs feature extraction on the original image to obtain original image features that characterize it. The converter determines the corresponding initial model parameters from the original image features, and the generator then generates the initial image from the initial model parameters. The recognition network may be understood as a high-dimensional mapping table that converts the original image into a vector of a specified dimension. For example, if the original image is a 512×512 photograph, inputting it into the recognition network yields a 256-dimensional vector (i.e., the original image features). The generator may be, for example, the generator of a GAN (Generative Adversarial Network), or may have another structure, which is not particularly limited in this disclosure.
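The three-stage structure of FIG. 5 can be sketched in Python as a composition of three callables. The class name and the toy stages below are hypothetical, and the toy functions bear no relation to real networks; the sketch only shows how features, model parameters, and the image pass from one stage to the next.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]
Vector = List[float]


@dataclass
class InitialGenerationModel:
    recognition_network: Callable[[Image], Vector]  # image -> features
    converter: Callable[[Vector], Vector]           # features -> model parameters
    generator: Callable[[Vector], Image]            # model parameters -> image

    def __call__(self, original: Image) -> Image:
        features = self.recognition_network(original)
        params = self.converter(features)
        return self.generator(params)


# Toy stages (assumptions for illustration only).
def toy_recognition(img: Image) -> Vector:
    return [sum(row) / len(row) for row in img]  # one feature per row


def toy_converter(features: Vector) -> Vector:
    return [min(1.0, max(0.0, f)) for f in features]  # clamp into [0, 1]


def toy_generator(params: Vector) -> Image:
    return [[p, p] for p in params]  # render each parameter as a row


model = InitialGenerationModel(toy_recognition, toy_converter, toy_generator)
initial_image = model([[0.0, 1.0], [2.0, 4.0]])
```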
In another implementation, the initial generation model may include a recognition network, a segmentation network, a converter, a shape guidance model, a synthesizer, and a generator. The original image can be input into the recognition network and the segmentation network respectively to obtain the original image features and a segmentation result. The segmentation result characterizes which channel each pixel in the original image belongs to. The segmentation network segments the original image according to a plurality of channels to obtain one mask image of the original image per channel; these mask images are the segmentation result. In each mask image, a pixel with value 1 belongs to the channel corresponding to that mask image, and a pixel with value 0 does not. For example, if the original image is a 512×512 photograph including a human face, the segmentation network can segment it according to 19 channels, which can include the background, skin, nose, left eye, right eye, left eyebrow, right eyebrow, left ear, and so on, and the segmentation result is 19 mask images of 512×512. In the mask image corresponding to the background, a pixel with value 1 indicates that the corresponding position in the original image belongs to the background, and a pixel with value 0 indicates that it does not. Likewise, in the mask image corresponding to the left eyebrow, a pixel with value 1 indicates that the corresponding position in the original image belongs to the left eyebrow, and a pixel with value 0 indicates that it does not. The segmentation result can therefore reflect the positions of all parts of the human face in the original image.
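The relationship between per-pixel channel labels and per-channel mask images can be shown with a short Python sketch. It assumes, for illustration, that the segmentation yields one integer channel index per pixel; the function name `label_map_to_masks` is hypothetical.

```python
from typing import List

LabelMap = List[List[int]]  # one channel index per pixel (assumed input format)
Mask = List[List[int]]      # binary image of the same size


def label_map_to_masks(labels: LabelMap, num_channels: int) -> List[Mask]:
    """Return one binary mask per channel: 1 where the pixel belongs to it."""
    return [
        [[1 if value == channel else 0 for value in row] for row in labels]
        for channel in range(num_channels)
    ]


# Toy 2x3 label map with three channels (e.g. 0 = background, 1 = skin, 2 = nose).
labels = [[0, 0, 1],
          [0, 2, 1]]
masks = label_map_to_masks(labels, 3)
```

For the 512×512 example in the text with 19 channels, the same function would return 19 masks of 512×512 each.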
The original image features may then be input to the converter, which outputs first model parameters. At the same time, the segmentation result is input into the shape guidance model, which outputs second model parameters. The shape guidance model may have the structure of an RNN (Recurrent Neural Network), a CNN (Convolutional Neural Network), an LSTM (Long Short-Term Memory network), or another neural network; for example, it may include an input layer, a plurality of convolutional layers, and an output layer, which is not particularly limited in this disclosure. After the first and second model parameters are obtained, they may be input into the synthesizer, which outputs the initial model parameters. The synthesizer integrates the first model parameters and the second model parameters into the initial model parameters, where all three have the same dimensions. For example, the synthesizer may compute a weighted sum of the first model parameters and the second model parameters to obtain the initial model parameters. The synthesizer may also be an MLP (Multi-Layer Perceptron) that maps the first model parameters and the second model parameters to the initial model parameters. Finally, the initial model parameters are input into the generator, whose output is the initial image.
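The weighted-sum variant of the synthesizer can be sketched in Python as follows. The function name `synthesize` and the parameter `weight_first` are hypothetical, and the choice that the two weights sum to 1 is an assumption; the text only says that a weighted sum is computed over vectors of equal dimension.

```python
from typing import List, Sequence


def synthesize(first_params: Sequence[float],
               second_params: Sequence[float],
               weight_first: float = 0.5) -> List[float]:
    """Element-wise weighted sum of two parameter vectors of equal length."""
    if len(first_params) != len(second_params):
        raise ValueError("parameter vectors must have the same dimension")
    weight_second = 1.0 - weight_first  # assumption: weights sum to 1
    return [
        weight_first * a + weight_second * b
        for a, b in zip(first_params, second_params)
    ]


# First parameters from the converter, second from the shape guidance model.
initial_params = synthesize([1.0, 0.0, 0.5], [0.0, 1.0, 0.5], weight_first=0.75)
```

The output has the same dimension as both inputs, which matches the constraint stated in the text.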
The original image features characterize features in the original image, so the initial generation model can learn the expression and demeanor of the target object in the original image. The segmentation result reflects where each part of the target object is located in the original image, so the initial generation model can also learn shape information such as the proportions and positions of those parts. Therefore, the initial image can resemble the original image in both shape and expression, which improves the expressiveness of the initial image.
Further, the target generation model may likewise include an identification network, a converter, and a generator. The identification network identifies the initial image so as to extract initial image features that represent the initial image. The converter determines the corresponding target model parameters according to the initial image features, and finally the generator generates the target image according to the target model parameters. In another implementation, the target generation model may likewise include an identification network, a segmentation network, a converter, a shape guidance model, a synthesizer, and a generator. The process of generating the target image is then the same as the process of generating the initial image and is not described again here.
In yet another implementation, another structure of the target generation model is shown in FIG. 6 and may include a plurality of generation sub-models and one generator. The generation sub-models correspond to the initial image and the plurality of local images, respectively, and each generation sub-model includes an identification network and a converter. Specifically, the generation sub-model corresponding to the initial image first uses its identification network to extract features of the initial image to obtain global features, and then uses its converter to determine the corresponding global model parameters according to the global features. The generation sub-model corresponding to each local image first uses its identification network to extract features of that local image to obtain the corresponding local features, and then uses its converter to determine the corresponding local model parameters according to the local features. That is, the plurality of generation sub-models yield the global model parameters corresponding to the initial image and the local model parameters corresponding to each of the plurality of local images.
Further, the target generation model may integrate the global model parameters and the plurality of local model parameters into the target model parameters, and finally use the generator to generate the target image according to the target model parameters. Because the target image is generated from both global and local model parameters, the target generation model captures both the overall features and the local features of the target object in the initial image. The target image can therefore stay close to the original image both as a whole and in its details, which further improves the similarity between the target image and the original image and the expressiveness of the target image.
How to determine the target model parameters from the global model parameters and the plurality of local model parameters is specifically described below:
In one implementation, a weighted sum of the global model parameters and the plurality of local model parameters may be computed to obtain the target model parameters. In another implementation, according to the parts included in each local image, the parameters for those parts may be selected from the corresponding local model parameters and combined with the global model parameters by weighted summation. For example, suppose there are two local images, PICa and PICb, where PICa includes the eyes, eyebrows, nose, and mouth, and PICb includes the nose, mouth, and facial contour. Then the parameters corresponding to the eyes and eyebrows in the local model parameters of PICa can be weighted and summed (or averaged) with the parameters corresponding to the eyes and eyebrows in the global model parameters, and the result is used as the parameters corresponding to the eyes and eyebrows in the target model parameters. Next, the parameters corresponding to the facial contour in the local model parameters of PICb (which can be understood as face shape parameters) are weighted and summed with the parameters corresponding to the facial contour in the global model parameters, and the result is used as the parameters corresponding to the facial contour in the target model parameters. Finally, the parameters corresponding to the nose and mouth in the local model parameters of PICa, those in the local model parameters of PICb, and those in the global model parameters are weighted and summed, and the result is used as the parameters corresponding to the nose and mouth in the target model parameters, which yields the complete target model parameters.
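The part-wise merging in the PICa/PICb example can be sketched as follows. This is only one possible reading: it uses a plain average (equal weights) for every part, represents each part's parameters by a single number, and the names `merge_model_parameters` and `coverage` are hypothetical.

```python
def merge_model_parameters(global_params, local_params, coverage):
    """Merge global and local model parameters part by part.

    global_params: {part: value} for every part of the target object.
    local_params:  {local_image: {part: value}}.
    coverage:      {local_image: set of parts visible in that local image}.
    Each part is averaged over the global value and the values from
    every local image that contains the part (equal weights, an assumption).
    """
    target_params = {}
    for part, global_value in global_params.items():
        contributions = [global_value]
        for image_name, parts in coverage.items():
            if part in parts:
                contributions.append(local_params[image_name][part])
        target_params[part] = sum(contributions) / len(contributions)
    return target_params


coverage = {
    "PICa": {"eyes", "eyebrows", "nose", "mouth"},
    "PICb": {"nose", "mouth", "contour"},
}
global_params = {"eyes": 0.25, "eyebrows": 0.5, "nose": 0.0, "mouth": 1.0, "contour": 0.5}
local_params = {
    "PICa": {"eyes": 0.75, "eyebrows": 1.0, "nose": 0.5, "mouth": 1.0},
    "PICb": {"nose": 1.0, "mouth": 1.0, "contour": 1.0},
}

target_params = merge_model_parameters(global_params, local_params, coverage)
```

Here the eyes and eyebrows draw on PICa and the global parameters, the facial contour on PICb and the global parameters, and the nose and mouth on all three sources, mirroring the example in the text.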
It should be noted that the model parameters mentioned in this disclosure (including the initial model parameters, the target model parameters, the global model parameters, and the local model parameters) may be understood as parameters that characterize each part of the target object in terms of geometry and visual appearance. Taking a human face as the target object, the model parameters are face-customization ("face-pinching") parameters and can include the face shape, nose shape, mouth shape, eye shape, eyebrow shape, distribution of facial features, beard position, and the like. The target object may also be a cat, in which case the model parameters may include the cat's face shape, nose shape, mouth shape, eye shape, whisker position, coat pattern, limb proportions, and the like. As another example, the target object may be a vehicle, in which case the model parameters may include the vehicle's windshield shape, hood shape, headlight shape, taillight shape, wheel distribution, and the like.
FIG. 7 is a flowchart illustrating training of an initial generation model according to an exemplary embodiment. As shown in FIG. 7, the initial generation model is trained as follows:
Step A: acquire a plurality of sample images generated by a game engine according to a preset rule, and determine the real model parameters used by the game engine when generating each sample image.
Step B: extract sample image features from each sample image through the initial generation model, and determine initial training model parameters based on the sample image features.
Step C: train the initial generation model according to the initial training model parameters and the real model parameters corresponding to the sample image.
For example, training the initial generation model first requires a sample input set and a corresponding sample output set. The sample input set includes a plurality of sample inputs, and the sample output set includes a sample output corresponding to each sample input. Specifically, a game engine whose images conform to the specified style can be selected. The game engine is then used to generate, according to the preset rule, a plurality of sample images that conform to the specified style, and the real model parameters used by the game engine when generating each sample image are recorded. The preset rule may be that the game engine generates sample images randomly, or that certain constraints are first set for the game engine (for example, a threshold on the proportions of the facial features or a range for the distance between the two eyes) and sample images are then generated randomly within them. In this way, the plurality of sample images can be used as the sample input set, and the real model parameters corresponding to the sample images can be used as the sample output set. The sample input set is then used as the input of the initial generation model, and the initial generation model is trained using the sample output set.
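The construction of the sample input set and sample output set under a constrained preset rule might look like the sketch below. Everything named here is hypothetical: `CONSTRAINTS` and its ranges are invented, and `render_with_engine` is only a stand-in for a real game engine, returning a placeholder record instead of an image.

```python
import random

# Hypothetical constraints: the allowed range of each real model parameter.
CONSTRAINTS = {
    "eye_distance": (0.30, 0.45),
    "nose_width": (0.10, 0.20),
    "mouth_width": (0.25, 0.40),
}


def sample_real_parameters(rng, constraints):
    """Draw one random parameter set that stays inside every range."""
    return {name: rng.uniform(low, high) for name, (low, high) in constraints.items()}


def render_with_engine(params):
    """Stand-in for the game engine; returns a placeholder 'image' record."""
    return {"rendered_from": dict(params)}


def build_training_set(num_samples, constraints, seed=0):
    """Return (sample_inputs, sample_outputs) with matching indices."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sample_inputs, sample_outputs = [], []
    for _ in range(num_samples):
        real_params = sample_real_parameters(rng, constraints)
        sample_inputs.append(render_with_engine(real_params))  # sample image
        sample_outputs.append(real_params)                     # its ground truth
    return sample_inputs, sample_outputs


sample_inputs, sample_outputs = build_training_set(5, CONSTRAINTS)
```

Because the engine's parameters are recorded at generation time, every sample image comes with exact ground-truth parameters and no manual labeling is needed.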
Specifically, a sample image is input into the initial generation model, which first performs feature extraction on the sample image to obtain sample image features and then determines the corresponding initial training model parameters according to the sample image features. The loss function of the initial generation model may be determined from the initial training model parameters and the real model parameters corresponding to the sample image. For example, the difference (or mean square error) between the initial training model parameters and the real model parameters may be used as the loss function, or the L1 loss or L2 loss between them may be used, which is not specifically limited in this disclosure. Then, with the goal of reducing the loss function, the parameters of the neurons in the initial generation model, such as their weights and biases, are corrected using a back propagation algorithm. These steps are repeated until the loss function satisfies a preset condition, for example, until the loss function is smaller than a preset loss threshold. Further, when training the initial generation model, the identification network, the converter, and the generator included in it may be trained jointly or separately (e.g., by modifying only the parameters of the neurons in the generator), which is not specifically limited in this disclosure.
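The loop of computing a loss, correcting the parameters, and stopping once the loss falls below a preset threshold can be sketched with a deliberately tiny model. As a loudly stated simplification, a one-feature linear model with a hand-written gradient replaces the neural network and its back propagation; the learning rate, threshold, and data are illustrative assumptions.

```python
def mean_squared_error(predicted, real):
    """Mean square error between predicted and real model parameters."""
    return sum((p - r) ** 2 for p, r in zip(predicted, real)) / len(real)


def train(features, real_params, learning_rate=0.05, loss_threshold=1e-4, max_steps=5000):
    """Gradient descent on a linear stand-in model: param = weight * feature + bias."""
    weight, bias = 0.0, 0.0
    n = len(features)
    loss = mean_squared_error([weight * x + bias for x in features], real_params)
    for _ in range(max_steps):
        if loss < loss_threshold:  # preset condition reached
            break
        predicted = [weight * x + bias for x in features]
        # Gradients of the mean square error with respect to weight and bias.
        grad_weight = sum(2 * (p - r) * x for p, r, x in zip(predicted, real_params, features)) / n
        grad_bias = sum(2 * (p - r) for p, r in zip(predicted, real_params)) / n
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
        loss = mean_squared_error([weight * x + bias for x in features], real_params)
    return weight, bias, loss


# Toy data: the "real model parameter" is 2 * feature + 1.
features = [0.0, 1.0, 2.0, 3.0]
real_params = [1.0, 3.0, 5.0, 7.0]
weight, bias, final_loss = train(features, real_params)
```

In a real setting the update step would be performed by an automatic differentiation framework over all network weights, but the stopping rule is the same.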
FIG. 8 is a flowchart illustrating training of a target generation model according to an exemplary embodiment. As shown in FIG. 8, the target generation model is trained as follows:
Step D: generate a training image from the sample image through the initial generation model.
Step E: divide the training image into a plurality of local training images according to the key parts of the target object.
Step F: generate a target training image through the target generation model according to the training image and the plurality of local training images.
Step G: train the target generation model according to the sample image and the target training image.
For example, training the target generation model also requires a sample input set and a corresponding sample output set. The sample input set includes a plurality of sample inputs, and the sample output set includes a sample output corresponding to each sample input. Specifically, since the sample images may have any style (i.e., the sample images do not need to be labeled in advance), a large number of sample images may be collected at random. The initial generation model is then used to generate a training image corresponding to each sample image, and each training image is divided into a plurality of local training images according to the key parts of the target object. A training image and its local training images form an image group. The plurality of image groups may be used as the sample input set, and the sample images as the sample output set. Finally, the sample input set is used as the input of the target generation model, and the target generation model is trained using the sample output set.
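Forming an image group from a training image can be sketched as a set of crops. The disclosure does not say how key parts are located, so this sketch assumes hypothetical bounding boxes given as `(top, left, bottom, right)`; the names `crop`, `build_image_group`, and the box values are illustrative only.

```python
def crop(image, box):
    """Cut the rectangle (top, left, bottom, right) out of a row-major image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def build_image_group(training_image, key_part_boxes):
    """Bundle a training image with one local training image per key part."""
    return {
        "training_image": training_image,
        "local_training_images": {
            part: crop(training_image, box) for part, box in key_part_boxes.items()
        },
    }


# Toy 4 x 4 "training image" whose pixel value encodes its position.
training_image = [[row * 4 + col for col in range(4)] for row in range(4)]

# Hypothetical key-part boxes; bottom and right are exclusive.
key_part_boxes = {
    "eyes": (0, 0, 2, 4),
    "mouth": (2, 1, 4, 3),
}

image_group = build_image_group(training_image, key_part_boxes)
```

Each such group would be one sample input, with the sample image that produced the training image serving as the corresponding sample output.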
For example, the loss function may be determined from the target training image output by the target generation model and the sample image. For instance, the difference (or mean square error) between the target training image and the sample image may be used as the loss function, or the perceptual loss, recognition loss, feature map loss, L1 loss, or L2 loss between the target training image and the sample image may be used, which is not particularly limited in this disclosure. Then, with the goal of reducing the loss function, the parameters of the neurons in the target generation model, such as their weights and biases, are corrected using a back propagation algorithm. These steps are repeated until the loss function satisfies a preset condition, for example, until the loss function is smaller than a preset loss threshold. In this way, the target generation model can be trained in a self-supervised manner without labeling the sample images in advance.
Further, when training the target generation model, the identification network, the converter, and the generator included in it may be trained jointly or separately (e.g., by modifying only the parameters of the neurons in the generator), which is not specifically limited in this disclosure.
FIG. 9 is a flowchart illustrating another way of training the target generation model according to an exemplary embodiment. As shown in FIG. 9, step F may include:
Step F1: extract global training features from the training image, and extract the local training features corresponding to each local training image from the plurality of local training images.
Step F2: determine global training model parameters according to the global training features, and determine the corresponding local training model parameters according to each local training feature.
Step F3: determine target training model parameters according to the global training model parameters and the plurality of local training model parameters.
Step F4: generate the target training image according to the target training model parameters.
For example, to generate the target training image, feature extraction may be performed on the training image and on the plurality of local training images, respectively, to obtain a global training feature and a plurality of local training features. Global training model parameters are then determined according to the global training feature, and the corresponding local training model parameters are determined according to each local training feature. Target training model parameters are then determined according to the global training model parameters and the local training model parameters, in the same way as the target model parameters are determined from the global model parameters and the plurality of local model parameters, which is not described again here. Finally, the target training image is generated according to the target training model parameters. In addition, while the target training image is generated, the plurality of local training model parameters can be used to generate an intermediate training image corresponding to each set of local training model parameters.
FIG. 10 is a flowchart illustrating another way of training the target generation model according to an exemplary embodiment. As shown in FIG. 10, step G may be implemented as follows:
Step G1: determine a global loss according to the sample image and the target training image.
Step G2: determine a local loss according to the sample image and a plurality of intermediate training images, where the intermediate training images are generated according to each set of local training model parameters.
Step G3: determine a comprehensive loss according to the global loss and the local loss, and train the target generation model using a back propagation algorithm with the goal of reducing the comprehensive loss.
The loss function of the target generation model is described in detail below. It is a comprehensive loss with two parts, a global loss and a local loss. The global loss is determined from the sample image and the target training image, and the local loss is determined from the sample image and the intermediate training images.
In particular, the local loss may be further divided into a loss corresponding to each local training image, i.e., a loss between the sample image and the intermediate training image corresponding to that local training image. For example, the local loss may be determined by Formula 1:
$$L_{part} = \sum_{i=1}^{N} L_i, \qquad L_i = \alpha L_{i,1} + \beta L_{i,2} \qquad \text{(Formula 1)}$$
where $L_{part}$ represents the local loss, $L_i$ represents the loss corresponding to the $i$-th of the $N$ local training images, $L_{i,1}$ represents the keypoint-based L1 loss between the sample image and the intermediate training image corresponding to the $i$-th local training image, $L_{i,2}$ represents the feature-map-based L1 loss between the sample image and the intermediate training image corresponding to the $i$-th local training image, and $\alpha$ and $\beta$ represent preset weights.
The global loss may be determined according to Formula 2:
$$L_{all} = \lambda_1 L_1 + \lambda_2 L_2 + \lambda_3 L_3 + \lambda_4 L_4 + \lambda_5 L_5 \qquad \text{(Formula 2)}$$
where $L_{all}$ represents the global loss, $L_1$ represents the keypoint-based L1 loss between the sample image and the target training image, $L_2$ represents the feature-map-based L1 loss between the sample image and the target training image, $L_3$ represents the MSE loss between the sample image and the target training image, $L_4$ represents the perceptual loss between the sample image and the target training image, and $L_5$ represents the recognition loss between the sample image and the target training image. $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$, and $\lambda_5$ represent the respective preset weights.
Finally, the comprehensive loss may be determined based on the global loss and the local loss. For example, the global loss and the local loss may be weighted and summed according to Formula 3 to yield the comprehensive loss:
$$L_{mix} = \eta L_{all} + \mu L_{part} \qquad \text{(Formula 3)}$$
where $L_{mix}$ represents the comprehensive loss, $\eta$ represents the weight corresponding to the global loss, and $\mu$ represents the weight corresponding to the local loss. The parameters of the neurons in the target generation model can then be corrected using a back propagation algorithm with the goal of reducing the comprehensive loss.
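The way the individual loss terms are weighted and combined can be sketched numerically. This sketch assumes the per-image local losses are summed into the local loss, takes the individual loss terms as already computed numbers, and uses made-up weights; the function names are hypothetical.

```python
def local_loss(per_image_terms, alpha, beta):
    """Sum over local training images of alpha * keypoint_l1 + beta * feature_map_l1.

    per_image_terms: list of (keypoint_l1, feature_map_l1) pairs, one per
    local training image. Summing over images is an assumption of this sketch.
    """
    return sum(alpha * keypoint_l1 + beta * feature_l1 for keypoint_l1, feature_l1 in per_image_terms)


def global_loss(terms, weights):
    """Weighted sum of the five global loss terms."""
    if len(terms) != len(weights):
        raise ValueError("one weight is required per loss term")
    return sum(weight * term for weight, term in zip(weights, terms))


def comprehensive_loss(l_all, l_part, eta, mu):
    """Weighted sum of the global loss and the local loss."""
    return eta * l_all + mu * l_part


# Illustrative, made-up loss values and weights.
per_image_terms = [(1.0, 2.0), (3.0, 4.0)]        # two local training images
l_part = local_loss(per_image_terms, alpha=0.5, beta=0.25)

global_terms = [1.0, 2.0, 3.0, 4.0, 5.0]          # keypoint, feature map, MSE, perceptual, recognition
global_weights = [1.0, 0.5, 0.5, 0.25, 0.2]
l_all = global_loss(global_terms, global_weights)

l_mix = comprehensive_loss(l_all, l_part, eta=1.0, mu=0.5)
```

With these numbers the local loss is 3.5, the global loss is 5.5, and the comprehensive loss is 7.25; in training, this single scalar would be the quantity minimized by back propagation.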
In an application scenario, the method may further comprise the steps of:
Step 1) Obtain the entered specified face type.
Step 2) Determine, from a plurality of pre-trained initial generation models and a plurality of pre-trained target generation models, the specified initial generation model and the specified target generation model corresponding to the specified face type, where the specified initial generation model is trained according to a plurality of sample images having the specified face type, and the specified target generation model is trained according to the specified initial generation model and the plurality of sample images.
Accordingly, the implementation manner of step 102 may be:
Extracting original image features from the original image through the specified initial generation model, determining initial model parameters based on the original image features, and generating, according to the initial model parameters, an initial image that corresponds to the target object and has the specified face type.
The implementation manner of step 103 may be:
Extracting initial image features from the initial image through the specified target generation model, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image having the specified face type.
For example, in embodiments of the present disclosure, the specified face type may also be entered by the user, e.g., the user may enter the specified face type at the same time when uploading the original image. The face type is used to characterize specific features of the face, and may be, for example, a canine face type, a feline face type, a rabbit face type, a baby face type, etc., where the specified face type is a user-specified face type. Accordingly, the initial generation model and the target generation model corresponding to each face type can be trained in advance for a plurality of face types.
After the specified face type is acquired, a specified initial generation model and a specified target generation model corresponding to the specified face type may be selected from a plurality of initial generation models and target generation models to generate a target image, so that the resulting target image can have the specified face type. The user can select different face types on the basis of selecting the target object, and the expressive force of the target image is further improved.
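Selecting the model pair for a specified face type can be sketched as a lookup in a registry of pre-trained models. The registry, the stub models that merely return descriptive strings, and all names below are hypothetical placeholders for real trained networks.

```python
from typing import Callable, Dict, NamedTuple


class ModelPair(NamedTuple):
    """An initial generation model and its matching target generation model."""
    initial_model: Callable[[str], str]
    target_model: Callable[[str], str]


def make_stub_pair(face_type: str) -> ModelPair:
    """Build placeholder models that only describe what they would do."""
    def initial_model(original_image: str) -> str:
        return f"initial({original_image}, {face_type})"

    def target_model(initial_image: str) -> str:
        return f"target({initial_image}, {face_type})"

    return ModelPair(initial_model, target_model)


# One pre-trained pair per supported face type (stubs in this sketch).
MODEL_REGISTRY: Dict[str, ModelPair] = {
    face_type: make_stub_pair(face_type)
    for face_type in ("canine", "feline", "rabbit", "baby")
}


def generate_target_image(original_image: str, face_type: str) -> str:
    """Run the two-stage generation with the models for the given face type."""
    if face_type not in MODEL_REGISTRY:
        raise ValueError(f"no models trained for face type: {face_type}")
    pair = MODEL_REGISTRY[face_type]
    initial_image = pair.initial_model(original_image)  # coarse result
    return pair.target_model(initial_image)             # refined result


result = generate_target_image("photo.png", "feline")
```

Keeping the two models together in one registry entry guards against pairing an initial generation model of one face type with a target generation model of another.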
In summary, the present disclosure first obtains an original image in response to a user request, then extracts original image features from the original image, determines initial model parameters based on the original image features, and generates an initial image corresponding to a target object according to the initial model parameters. Finally, it extracts initial image features from the initial image, generates target model parameters based on the initial image features, and corrects the target object in the initial image according to the target model parameters to obtain a target image. The present disclosure uses the initial generation model to generate a low-precision initial image and then the target generation model to generate a high-precision target image, which improves the similarity between the target image and the original image and the expressiveness of the target image.
Fig. 11 is a block diagram of an image generation apparatus according to an exemplary embodiment, and as shown in fig. 11, the apparatus 200 may include:
an acquisition module 201, configured to acquire an original image in response to a user request.
The initial generation module 202 is configured to extract an original image feature from the original image, determine an initial model parameter based on the original image feature, and generate an initial image corresponding to the target object according to the initial model parameter.
The target generating module 203 is configured to extract an initial image feature from the initial image, generate a target model parameter based on the initial image feature, and correct a target object in the initial image according to the target model parameter to obtain a target image.
Fig. 12 is a block diagram of another image generation apparatus according to an exemplary embodiment, and as shown in fig. 12, the apparatus 200 may further include:
The dividing module 204 is configured to divide the initial image into a plurality of local images according to the key parts of the target object.
Accordingly, the target generation module 203 may be configured to:
global features are extracted from the initial image, and local features corresponding to each local image are extracted from the plurality of local images.
Fig. 13 is a block diagram of another image generation apparatus according to an exemplary embodiment. As shown in Fig. 13, the target generation module 203 may include:
The extraction submodule 2031 is configured to determine global model parameters according to global features, and determine corresponding local model parameters according to each local feature.
A determination submodule 2032 for determining a target model parameter from the global model parameter and the plurality of local model parameters.
The generating submodule 2033 is configured to correct the target object in the initial image according to the target model parameter, so as to obtain a target image.
In one implementation, the initial generation module 202 may be configured to extract original image features from an original image by a pre-trained initial generation model, determine initial model parameters based on the original image features, and generate an initial image corresponding to the target object according to the initial model parameters. The initial generation model is obtained through training according to a plurality of sample images.
The target generation module 203 may be configured to:
Extracting initial image characteristics from an initial image through a pre-trained target generation model, generating target model parameters based on the initial image characteristics, and correcting a target object in the initial image according to the target model parameters to obtain a target image.
The target generation model is trained from a plurality of image groups, each image group including a training image generated by the initial generation model from the sample image, and a plurality of local training images obtained by dividing the training image.
In one implementation, the initial generation model is trained as follows:
Step A: acquire a plurality of sample images generated by a game engine according to a preset rule, and determine the real model parameters used by the game engine when generating each sample image.
Step B: extract sample image features from each sample image through the initial generation model, and determine initial training model parameters based on the sample image features.
Step C: train the initial generation model according to the initial training model parameters and the real model parameters corresponding to the sample image.
In another implementation, the target generation model is trained as follows:
Step D: generate a training image from the sample image through the initial generation model.
Step E: divide the training image into a plurality of local training images according to the key parts of the target object.
Step F: generate a target training image through the target generation model according to the training image and the plurality of local training images.
Step G: train the target generation model according to the sample image and the target training image.
In yet another implementation, step F may include:
Step F1: extract global training features from the training image, and extract the local training features corresponding to each local training image from the plurality of local training images.
Step F2: determine global training model parameters according to the global training features, and determine the corresponding local training model parameters according to each local training feature.
Step F3: determine target training model parameters according to the global training model parameters and the plurality of local training model parameters.
Step F4: generate the target training image according to the target training model parameters.
In yet another implementation, step G may be implemented as follows:
Step G1: determine a global loss according to the sample image and the target training image.
Step G2: determine a local loss according to the sample image and a plurality of intermediate training images, where the intermediate training images are generated according to each set of local training model parameters.
Step G3: determine a comprehensive loss according to the global loss and the local loss, and train the target generation model using a back propagation algorithm with the goal of reducing the comprehensive loss.
The specific manner in which each module of the apparatus in the above embodiments performs its operations has been described in detail in connection with the method embodiments and is not described again here.
In summary, the present disclosure first obtains an original image in response to a user request, then extracts original image features from the original image, determines initial model parameters based on the original image features, and generates an initial image corresponding to a target object according to the initial model parameters. Finally, it extracts initial image features from the initial image, generates target model parameters based on the initial image features, and corrects the target object in the initial image according to the target model parameters to obtain a target image. The present disclosure uses the initial generation model to generate a low-precision initial image and then the target generation model to generate a high-precision target image, which improves the similarity between the target image and the original image and the expressiveness of the target image.
Referring now to fig. 14, there is shown a schematic diagram of an electronic device (e.g., an execution body in the embodiment shown in the present disclosure, which may be a terminal device or a server) 300 suitable for implementing the embodiment of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 14 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 14, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 307 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and communication devices 309. The communication devices 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While Fig. 14 illustrates an electronic device 300 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via a communication device 309, or installed from a storage device 308, or installed from a ROM 302. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, fiber optic cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the terminal device and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be included in the electronic device or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to obtain an original image in response to a user request, extract original image features from the original image, determine initial model parameters based on the original image features, and generate an initial image corresponding to a target object according to the initial model parameters, extract initial image features from the initial image, generate target model parameters based on the initial image features, and correct the target object in the initial image according to the target model parameters to obtain a target image.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a module does not limit the module itself; for example, the acquisition module may also be described as "a module that acquires an original image".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, example 1 provides an image generation method, including acquiring an original image in response to a user request, extracting original image features from the original image, determining initial model parameters based on the original image features, and generating an initial image corresponding to a target object according to the initial model parameters, extracting initial image features from the initial image, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image.
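The two-stage flow of example 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function `generate_target_image`, its callable parameters, and the toy stand-ins (`mean_pixel`, `render_flat`, `shift_pixels`) are all hypothetical names introduced here, and images are modelled as plain lists of pixel rows.

```python
from typing import Callable, List

Image = List[List[float]]   # toy grayscale image: rows of pixel values
Vector = List[float]        # features or model parameters


def generate_target_image(
    original: Image,
    extract_features: Callable[[Image], Vector],
    initial_params_from: Callable[[Vector], Vector],
    render: Callable[[Vector], Image],
    target_params_from: Callable[[Vector], Vector],
    correct: Callable[[Image, Vector], Image],
) -> Image:
    """Run the two stages in order and return the target image."""
    # Stage 1: original image -> features -> initial parameters -> initial image.
    initial_params = initial_params_from(extract_features(original))
    initial_image = render(initial_params)
    # Stage 2: initial image -> features -> target parameters -> corrected image.
    target_params = target_params_from(extract_features(initial_image))
    return correct(initial_image, target_params)


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def mean_pixel(image: Image) -> Vector:
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]


def render_flat(params: Vector) -> Image:
    # Renders a 2x2 image filled with the first parameter value.
    return [[params[0], params[0]], [params[0], params[0]]]


def shift_pixels(image: Image, params: Vector) -> Image:
    # "Corrects" the image by adding the first parameter to every pixel.
    return [[p + params[0] for p in row] for row in image]


target = generate_target_image(
    original=[[0.0, 2.0], [4.0, 6.0]],
    extract_features=mean_pixel,
    initial_params_from=lambda f: [f[0]],
    render=render_flat,
    target_params_from=lambda f: [f[0] * 0.5],
    correct=shift_pixels,
)
```

Passing the stages in as callables keeps the sketch independent of any particular network architecture; in practice the initial and target generation models would supply these steps.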
According to one or more embodiments of the present disclosure, example 2 provides the method of example 1, further comprising dividing the initial image into a plurality of local images according to key parts of the target object, and extracting initial image features for the initial image comprises extracting global features for the initial image and extracting local features corresponding to each of the local images for the plurality of local images.
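The division into local images and the extraction of global and local features in example 2 might look like the sketch below. It assumes that each key part is given as a rectangular bounding box and uses a deliberately simple hand-written feature (mean and value range) in place of a learned extractor; `crop`, `mean_and_range`, `extract_features`, and the box layout are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the local image for one key part out of the full image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def mean_and_range(image: Image) -> List[float]:
    """Toy feature vector: mean pixel value and max-min range."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


def extract_features(image: Image, key_parts: Dict[str, Box]):
    """Return (global features, {part name: local features})."""
    local_images = {name: crop(image, box) for name, box in key_parts.items()}
    global_features = mean_and_range(image)
    local_features = {
        name: mean_and_range(local) for name, local in local_images.items()
    }
    return global_features, local_features


# A 4x4 image whose pixels are 0.0 .. 15.0 in row-major order.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
key_parts = {"eyes": (0, 0, 2, 4), "mouth": (2, 1, 4, 3)}
global_features, local_features = extract_features(image, key_parts)
```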
According to one or more embodiments of the present disclosure, example 3 provides the method of example 2, wherein the generating a target model parameter based on the initial image feature and correcting the target object in the initial image according to the target model parameter to obtain a target image includes determining a global model parameter according to the global feature and determining a corresponding local model parameter according to each of the local features, determining the target model parameter according to the global model parameter and a plurality of the local model parameters, and correcting the target object in the initial image according to the target model parameter to obtain the target image.
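The disclosure does not state here how the global and local model parameters are combined into the target model parameters. One plausible scheme, shown purely as an assumption, is that each key part owns certain indices of the parameter vector and the local estimate is blended with the global one at those indices. The function `merge_parameters`, the `part_indices` mapping, and the `local_weight` factor are hypothetical.

```python
from typing import Dict, List


def merge_parameters(
    global_params: List[float],
    local_params: Dict[str, List[float]],
    part_indices: Dict[str, List[int]],
    local_weight: float = 0.5,
) -> List[float]:
    """Blend local estimates into the global parameter vector.

    part_indices[name] lists which entries of the global vector the
    key part `name` controls (an assumption made for this sketch).
    """
    target = list(global_params)
    for name, params in local_params.items():
        for index, value in zip(part_indices[name], params):
            target[index] = (
                (1.0 - local_weight) * global_params[index] + local_weight * value
            )
    return target


target_params = merge_parameters(
    global_params=[1.0, 2.0, 3.0, 4.0],
    local_params={"eyes": [3.0], "mouth": [0.0, 8.0]},
    part_indices={"eyes": [0], "mouth": [2, 3]},
)
```

Entries not claimed by any key part (index 1 above) keep their global value, so the local branches only refine the regions they observe.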
According to one or more embodiments of the present disclosure, example 4 provides the method of example 1, wherein the extracting original image features from the original image, determining initial model parameters based on the original image features, and generating an initial image corresponding to a target object according to the initial model parameters includes: extracting the original image features from the original image by a pre-trained initial generation model, determining the initial model parameters based on the original image features, and generating the initial image corresponding to the target object according to the initial model parameters, the initial generation model being obtained by training on a plurality of sample images. The extracting initial image features from the initial image, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image includes: extracting the initial image features from the initial image by a pre-trained target generation model, generating the target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain the target image, the target generation model being obtained by training on a plurality of image groups, each image group including a training image generated by the initial generation model from a sample image and a plurality of local training images obtained by dividing the training image.
According to one or more embodiments of the present disclosure, example 5 provides the method of example 4, wherein the initial generation model is trained by acquiring a plurality of sample images generated by a game engine according to a preset rule, determining real model parameters used when the game engine generates the corresponding sample images, extracting sample image features from each sample image through the initial generation model, determining initial training model parameters based on the sample image features, and training the initial generation model according to the initial training model parameters and the real model parameters corresponding to the sample images.
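Because the game engine supplies the real model parameters for every sample image, the training in example 5 is a supervised regression from image features to parameters. The sketch below illustrates that idea with a linear model fitted by stochastic gradient descent on a squared error; the linear model, the loss, the learning rate, and the name `train_parameter_regressor` are assumptions for illustration, and the sample image features are taken as given rather than extracted by a network.

```python
from typing import List


def train_parameter_regressor(
    sample_features: List[List[float]],
    real_params: List[List[float]],
    learning_rate: float = 0.1,
    epochs: int = 200,
) -> List[List[float]]:
    """Fit weights so that weights @ features approximates the real parameters."""
    n_in = len(sample_features[0])
    n_out = len(real_params[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for features, truth in zip(sample_features, real_params):
            # Initial training model parameters predicted from the features.
            predicted = [
                sum(w * x for w, x in zip(row, features)) for row in weights
            ]
            # Gradient step on the squared error against the engine's parameters.
            for o in range(n_out):
                error = predicted[o] - truth[o]
                for i in range(n_in):
                    weights[o][i] -= learning_rate * error * features[i]
    return weights


# Toy data: the "engine" used parameter = 2 * f0 + 3 * f1 for each sample.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
engine_params = [[2.0], [3.0], [5.0]]
weights = train_parameter_regressor(features, engine_params)
```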
According to one or more embodiments of the present disclosure, example 6 provides the method of example 4 or example 5, wherein the target generation model is trained by generating the training image from the sample image through the initial generation model, dividing the training image into a plurality of local training images according to key parts of the target object, generating a target training image from the training image and the plurality of local training images through the target generation model, and training the target generation model according to the sample image and the target training image.
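The image groups used to train the target generation model can be assembled as in the following sketch, which assumes rectangular key-part boxes and represents the initial generation model as an arbitrary callable. The names `build_image_groups` and `crop` and the dictionary layout of a group are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def crop(image: Image, box: Box) -> Image:
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def build_image_groups(
    sample_images: List[Image],
    initial_model: Callable[[Image], Image],
    key_parts: Dict[str, Box],
) -> List[Dict[str, object]]:
    """Build one image group per sample image."""
    groups = []
    for sample in sample_images:
        # Training image generated by the initial generation model.
        training_image = initial_model(sample)
        groups.append(
            {
                "sample_image": sample,
                "training_image": training_image,
                # Local training images obtained by dividing the training image.
                "local_training_images": {
                    name: crop(training_image, box)
                    for name, box in key_parts.items()
                },
            }
        )
    return groups


groups = build_image_groups(
    sample_images=[[[1.0, 2.0], [3.0, 4.0]]],
    initial_model=lambda img: [[2.0 * p for p in row] for row in img],
    key_parts={"top": (0, 0, 1, 2), "bottom_right": (1, 1, 2, 2)},
)
```

Keeping the sample image inside each group is a convenience of this sketch, since the losses of the later training step compare generated images against it.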
According to one or more embodiments of the present disclosure, example 7 provides the method of example 6, wherein generating a target training image from the training image and the plurality of local training images by the target generation model includes extracting global training features for the training image, extracting local training features corresponding to each of the local training images for the plurality of local training images, determining global training model parameters from the global training features, and determining corresponding local training model parameters from each of the local training features, determining target training model parameters from the global training model parameters and the plurality of local training model parameters, and generating the target training image from the target training model parameters.
According to one or more embodiments of the present disclosure, example 8 provides the method of example 7, the training the target generation model from the sample image and the target training image comprising determining a global loss from the sample image and the target training image, determining a local loss from the sample image and a plurality of intermediate training images, the intermediate training images being generated from each of the local training model parameters, determining a composite loss from the global loss and the local loss, and training the target generation model using a back propagation algorithm with the goal of reducing the composite loss.
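The composite loss of example 8 can be illustrated numerically. The text does not fix the distance measure or the way the two losses are combined, so the sketch assumes a mean squared error for both, averages the local loss over the intermediate training images, and sums the two terms with a hypothetical weight `local_weight`.

```python
from typing import List

Image = List[List[float]]


def mse(a: Image, b: Image) -> float:
    """Mean squared pixel difference between two equally sized images."""
    squared = [
        (pa - pb) ** 2 for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b)
    ]
    return sum(squared) / len(squared)


def composite_loss(
    sample_image: Image,
    target_training_image: Image,
    intermediate_training_images: List[Image],
    local_weight: float = 1.0,
) -> float:
    # Global loss: sample image versus the final target training image.
    global_loss = mse(sample_image, target_training_image)
    # Local loss: sample image versus each intermediate training image,
    # one per set of local training model parameters, averaged.
    local_loss = sum(
        mse(sample_image, image) for image in intermediate_training_images
    ) / len(intermediate_training_images)
    return global_loss + local_weight * local_loss


sample = [[0.0, 0.0], [0.0, 0.0]]
target_training = [[1.0, 1.0], [1.0, 1.0]]
intermediates = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]
loss = composite_loss(sample, target_training, intermediates, local_weight=0.5)
```

In an actual training loop this scalar would be computed inside an automatic-differentiation framework so that back propagation can reduce it; the pure-Python version above only shows how the terms fit together.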
According to one or more embodiments of the present disclosure, example 9 provides an image generating apparatus, including an acquisition module configured to acquire an original image in response to a user request, an initial generation module configured to extract an original image feature of the original image, determine an initial model parameter based on the original image feature, and generate an initial image corresponding to a target object according to the initial model parameter, and a target generation module configured to extract an initial image feature of the initial image, generate a target model parameter based on the initial image feature, and correct the target object in the initial image according to the target model parameter, so as to obtain a target image.
According to one or more embodiments of the present disclosure, example 10 provides a computer-readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method described in any one of examples 1 to 8.
According to one or more embodiments of the present disclosure, example 11 provides an electronic device comprising a storage device having stored thereon a computer program, and a processing device for executing the computer program in the storage device to implement the steps of the method described in any one of examples 1 to 8.
The foregoing description covers only the preferred embodiments of the present disclosure and explains the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the present disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. One example is an embodiment formed by substituting the features described above with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims. The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments and is not repeated here.

Claims (10)

1. An image generation method, the method comprising:
acquiring an original image in response to a user request;
extracting original image features from the original image through a pre-trained initial generation model, determining initial model parameters based on the original image features, and generating an initial image corresponding to a target object according to the initial model parameters;
extracting initial image features from the initial image through a pre-trained target generation model, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image;
wherein the initial generation model is obtained through training on a plurality of sample images, the target generation model is obtained through training on a plurality of image groups, and each image group comprises a training image generated by the initial generation model from a sample image and a plurality of local training images obtained by dividing the training image.
2. The method according to claim 1, wherein the method further comprises:
dividing the initial image into a plurality of local images according to key parts of the target object;
wherein the extracting initial image features from the initial image comprises:
extracting global features from the initial image, and extracting, from the plurality of local images, local features corresponding to each local image.
3. The method of claim 2, wherein generating target model parameters based on the initial image features and correcting the target object in the initial image according to the target model parameters to obtain a target image comprises:
determining global model parameters according to the global features, and determining corresponding local model parameters according to each local feature;
determining the target model parameters according to the global model parameters and the plurality of local model parameters;
and correcting the target object in the initial image according to the target model parameters to obtain the target image.
4. The method of claim 1, wherein the initial generation model is trained by:
acquiring a plurality of sample images generated by a game engine according to a preset rule, and determining the real model parameters used by the game engine when generating each corresponding sample image;
extracting sample image features from each sample image through the initial generation model, and determining initial training model parameters based on the sample image features;
and training the initial generation model according to the initial training model parameters and the real model parameters corresponding to the sample image.
5. The method according to claim 1 or 4, wherein the target generation model is trained by:
generating the training image through the initial generation model according to the sample image;
dividing the training image into a plurality of local training images according to key parts of the target object;
generating a target training image through the target generation model according to the training image and the plurality of local training images;
and training the target generation model according to the sample image and the target training image.
6. The method of claim 5, wherein generating a target training image from the training image and the plurality of local training images by the target generation model comprises:
extracting global training features from the training image, and extracting, from the plurality of local training images, local training features corresponding to each local training image;
determining global training model parameters according to the global training features, and determining corresponding local training model parameters according to each local training feature;
determining target training model parameters according to the global training model parameters and the plurality of local training model parameters;
and generating the target training image according to the target training model parameters.
7. The method of claim 6, wherein the training the target generation model from the sample image and the target training image comprises:
determining a global loss according to the sample image and the target training image;
determining a local loss according to the sample image and a plurality of intermediate training images, wherein the intermediate training images are generated according to each of the local training model parameters;
and determining a comprehensive loss according to the global loss and the local loss, and training the target generation model by using a back propagation algorithm with the aim of reducing the comprehensive loss.
8. An image generation apparatus, the apparatus comprising:
an acquisition module, configured to acquire an original image in response to a user request;
an initial generation module, configured to extract original image features from the original image through a pre-trained initial generation model, determine initial model parameters based on the original image features, and generate an initial image corresponding to a target object according to the initial model parameters;
a target generation module, configured to extract initial image features from the initial image through a pre-trained target generation model, generate target model parameters based on the initial image features, and correct the target object in the initial image according to the target model parameters to obtain a target image;
wherein the initial generation model is obtained through training on a plurality of sample images, the target generation model is obtained through training on a plurality of image groups, and each image group comprises a training image generated by the initial generation model from a sample image and a plurality of local training images obtained by dividing the training image.
9. A computer readable medium on which a computer program is stored, characterized in that the program, when being executed by a processing device, carries out the steps of the method according to any one of claims 1-7.
10. An electronic device, comprising:
a storage device having a computer program stored thereon;
a processing device for executing the computer program in the storage device to carry out the steps of the method according to any one of claims 1-7.
CN202111435770.5A 2021-11-29 2021-11-29 Image generation method, device, readable medium and electronic device Active CN114092713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111435770.5A CN114092713B (en) 2021-11-29 2021-11-29 Image generation method, device, readable medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111435770.5A CN114092713B (en) 2021-11-29 2021-11-29 Image generation method, device, readable medium and electronic device

Publications (2)

Publication Number Publication Date
CN114092713A CN114092713A (en) 2022-02-25
CN114092713B true CN114092713B (en) 2025-03-07

Family

ID=80305501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111435770.5A Active CN114092713B (en) 2021-11-29 2021-11-29 Image generation method, device, readable medium and electronic device

Country Status (1)

Country Link
CN (1) CN114092713B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989904A (en) * 2020-09-30 2021-06-18 北京字节跳动网络技术有限公司 Method for generating style image, method, device, equipment and medium for training model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10410362B2 (en) * 2016-11-14 2019-09-10 Htc Corporation Method, device, and non-transitory computer readable storage medium for image processing
JP7042092B2 (en) * 2018-01-26 2022-03-25 日本放送協会 Image information converter and its program
CN111626218B (en) * 2020-05-28 2023-12-26 腾讯科技(深圳)有限公司 Image generation method, device, equipment and storage medium based on artificial intelligence
CN112149634B (en) * 2020-10-23 2024-05-24 北京神州数码云科信息技术有限公司 Image generator training method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114092713A (en) 2022-02-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant