Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
FIG. 1 is a flow chart illustrating an image generation method according to an exemplary embodiment. As shown in FIG. 1, the method comprises:
Step 101, in response to a user request, acquiring an original image.
For example, upon receiving a user request initiated by a user (e.g., a player), an original image uploaded by the user may be obtained, where the original image includes a target object. The target object may be the user himself or herself, a person specified by the user (e.g., a historical figure, a celebrity, etc.), or an animal or an object specified by the user. The original image is a 2D image, and may be, for example, a photograph of the target object taken in a real scene, or another image of the target object, which is not limited in this disclosure. After the original image is obtained, a feature point alignment process may be performed to determine feature points included in the original image. Taking a human face as an example of the target object, face feature point alignment (Face Alignment) may first be performed on the original image to determine the feature points included in the original image, for example: left eye, right eye, nose, etc. The alignment of the facial feature points may be performed by ASM (Active Shape Model), AAM (Active Appearance Model), CLM (Constrained Local Model), and the like, which is not limited in this disclosure.
Step 102, extracting original image features from the original image, determining initial model parameters based on the original image features, and generating an initial image corresponding to the target object according to the initial model parameters.
For example, feature extraction may be performed on the original image after feature point alignment to extract the original image features. Specifically, the original image may be input into a pre-trained feature extraction network to convert the original image into a vector of a specified dimension that can characterize the original image; this vector is the original image feature. Then, the corresponding initial model parameters can be determined based on the original image features, and an initial image corresponding to the target object can be generated. The initial image includes the target object. For example, if the original image is a photograph of the target object taken in a real scene, the initial image may be an animation image (which may also be understood as an avatar image) of the target object. Because the initial image is generated from the complete original image, it can approximate the original image as a whole but not in its details, so the accuracy with which it simulates the original image is low.
Step 103, extracting initial image features from the initial image, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image.
For example, after the initial image is obtained, feature extraction may be performed again, this time on the initial image, to extract the initial image features. Similarly, the feature extraction may be performed using a feature extraction network. Then, the corresponding target model parameters can be determined based on the initial image features, and the target object in the initial image is corrected according to the target model parameters, so that a target image corresponding to the target object is generated. The target image also includes the target object. In one implementation, step 102 may be implemented by a pre-trained initial generation model, and step 103 may be implemented by a pre-trained target generation model. Accordingly, the initial generation model and the target generation model are connected in cascade. As shown in FIG. 2, the input of the initial generation model is the original image, the output of the initial generation model (i.e., the initial image) is used as the input of the target generation model, and the output of the target generation model is the target image.
Because the target image is obtained by correcting the target object on the basis of the initial image, the target image can remain similar to the original image both as a whole and in its details, so the original image is simulated with higher accuracy. Compared with the initial image, the target image is more similar to the original image, which improves the expressive force of the target image.
In summary, the present disclosure first acquires an original image in response to a user request, then extracts original image features from the original image, determines initial model parameters based on the original image features, and generates an initial image corresponding to a target object according to the initial model parameters. Finally, initial image features are extracted from the initial image, target model parameters are generated based on the initial image features, and the target object in the initial image is corrected according to the target model parameters to obtain a target image. In the present disclosure, the initial generation model is used to generate a low-precision initial image, and the target generation model is then used to generate a high-precision target image, so that the similarity between the target image and the original image can be improved, and the expressive force of the target image is improved.
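As a non-limiting illustration, the cascade of steps 102 and 103 can be sketched as plain function composition. The names `generate_target_image`, `toy_initial_model` and `toy_target_model` are hypothetical, and the two toy models are simple stand-ins chosen for illustration rather than the networks of this disclosure.

```python
from typing import Callable, List

Image = List[List[float]]  # hypothetical stand-in: a 2D grid of pixel intensities


def generate_target_image(
    original: Image,
    initial_model: Callable[[Image], Image],
    target_model: Callable[[Image], Image],
) -> Image:
    """Cascade two generation models: the output of the first feeds the second."""
    initial_image = initial_model(original)      # step 102: coarse initial image
    target_image = target_model(initial_image)   # step 103: corrected target image
    return target_image


# Toy stand-ins (assumptions for illustration only, not real networks).
def toy_initial_model(img: Image) -> Image:
    # Coarse approximation: keep only one decimal of each pixel value.
    return [[round(p, 1) for p in row] for row in img]


def toy_target_model(img: Image) -> Image:
    # Small correction applied on top of the coarse result.
    return [[p + 0.01 for p in row] for row in img]


result = generate_target_image([[0.123, 0.456]], toy_initial_model, toy_target_model)
```

The only point of the sketch is the data flow: the target model never sees the original image directly, only the initial image produced from it.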
FIG. 3 is a flow chart illustrating another image generation method according to an exemplary embodiment. As shown in FIG. 3, the method further comprises:
Step 104, dividing the initial image into a plurality of local images according to the key parts of the target object.
For example, after the initial image is obtained, it may be divided according to the key parts of the target object to obtain a plurality of local images. Taking a human face as an example of the target object, the initial image may be divided according to the distribution of the facial features to obtain a plurality of local images, and the plurality of local images may overlap one another. For example, the eyes, eyebrows, nose, and mouth are located in the upper part of the face, and the nose, mouth, and face contour are located in the lower part of the face, so the top 3/4 of the initial image can be taken as one local image (including the eyes, eyebrows, nose, and mouth), and the bottom 3/4 of the initial image can be taken as another local image (including the nose, mouth, and face contour). As another example, the eyes and eyebrows are located at the top of the face, the nose and mouth are located in the middle of the face, and the face contour is located at the bottom of the face, so the top 1/3 of the initial image can be taken as one local image (including the eyes and eyebrows), the bottom 1/3 can be taken as another local image (including the face contour), and the remaining part can be taken as a further local image (including the nose and mouth). The manner of dividing the initial image is not particularly limited in this disclosure.
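As a non-limiting illustration, the two division schemes described above can be sketched on an image stored as a list of pixel rows. The function names `split_overlapping` and `split_thirds` and the list-of-rows image format are assumptions made for this sketch.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of pixel values; a hypothetical stand-in


def split_overlapping(image: Image, fraction: float = 0.75) -> Dict[str, Image]:
    """Return the top `fraction` and the bottom `fraction` of the rows.

    With fraction > 0.5 the two local images overlap in the middle.
    """
    height = len(image)
    n_rows = int(round(height * fraction))
    return {
        "upper": image[:n_rows],            # e.g. eyes, eyebrows, nose, mouth
        "lower": image[height - n_rows:],   # e.g. nose, mouth, face contour
    }


def split_thirds(image: Image) -> Dict[str, Image]:
    """Return three non-overlapping bands: top third, remainder, bottom third."""
    height = len(image)
    third = height // 3
    return {
        "top": image[:third],                    # e.g. eyes and eyebrows
        "middle": image[third:height - third],   # e.g. nose and mouth
        "bottom": image[height - third:],        # e.g. face contour
    }


# A toy 8-row "image" in which every pixel of a row holds the row index.
toy = [[r] * 4 for r in range(8)]
parts = split_overlapping(toy)
bands = split_thirds(toy)
```

For the 8-row toy image, the upper and lower local images each hold 6 rows and share rows 2 to 5, which is the overlap the text refers to.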
Accordingly, the implementation manner of step 103 may be:
Extracting global features from the initial image, and extracting the local features corresponding to each local image from the plurality of local images.
For example, in the process of extracting the initial image features, feature extraction may be performed on the initial image and the plurality of local images respectively to obtain a global feature capable of characterizing the entire initial image and a local feature capable of characterizing each local image.
FIG. 4 is a flowchart illustrating another image generation method according to an exemplary embodiment. As shown in FIG. 4, the implementation of step 103 may include:
Step 1031, determining global model parameters according to the global features, and determining corresponding local model parameters according to each local feature.
Step 1032, determining target model parameters according to the global model parameters and the plurality of local model parameters.
Step 1033, correcting the target object in the initial image according to the target model parameters to obtain a target image.
For example, a global model parameter may be generated according to a global feature that can represent the entire initial image, a plurality of local model parameters may be generated according to a local feature that can represent each local image, the global model parameter and the plurality of local model parameters may be integrated into a target model parameter, and finally, a target image may be generated according to the target model parameter.
In one implementation, the implementation of step 102 may be:
Extracting original image features from the original image through a pre-trained initial generation model, determining initial model parameters based on the original image features, and generating an initial image corresponding to the target object according to the initial model parameters. The initial generation model is obtained by training according to a plurality of sample images.
Accordingly, the implementation manner of step 103 may be:
Extracting initial image features from the initial image through a pre-trained target generation model, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image.
The target generation model is obtained by training according to a plurality of image groups, and each image group comprises: a training image generated from a sample image by the initial generation model, and a plurality of local training images obtained by dividing the training image.
Illustratively, step 102 may be implemented by a pre-trained initial generation model, and step 103 may be implemented by a pre-trained target generation model. The connection between the initial generation model and the target generation model is as shown in FIG. 2: the input of the initial generation model is the original image, the output of the initial generation model (i.e., the initial image) is used as the input of the target generation model, and the output of the target generation model is the target image.
The initial generation model is obtained by training according to a plurality of sample images, and the target generation model is obtained by training with a plurality of image groups. Each image group may include a training image generated from a sample image by using the initial generation model, and a plurality of local training images obtained by dividing the training image. That is, a plurality of sample images may be acquired in advance, and for each sample image, a corresponding image group is generated using the initial generation model, each image group corresponding to one sample image. The sample image may be any image including a sample object (which may be understood as any person or object); that is, the style of the sample image is not limited. For example, the sample image may have the same style as the original image uploaded by the user in step 101, or may have a different style from the original image.
Further, the training image may be divided according to a key portion of the sample object to obtain a plurality of local training images, and the plurality of local training images may be overlapped. The division manner of the training images is the same as that of the initial images, and is not described herein again.
Because the image group comprises both the complete training image and the local training images that reflect local regions, the trained target generation model can learn both the overall features and the local features of the training image. Therefore, the target image generated by the target generation model can remain similar to the original image both as a whole and in its details, so the original image is simulated with high accuracy. Compared with the initial image, the target image is more similar to the original image, which improves the expressive force of the target image.
In one implementation, the initial generation model may include a recognition network, a converter (which may be understood as a translator), and a generator, as shown in FIG. 5. The recognition network is used to perform feature extraction on the original image so as to extract original image features that can characterize the original image. The converter is used to determine the corresponding initial model parameters according to the original image features, and finally the generator is used to generate the initial image according to the initial model parameters. In particular, the recognition network can be understood as a high-dimensional mapping table capable of converting the original image into a vector of a given dimension. For example, if the original image is a 512 × 512 photograph, inputting the original image into the recognition network yields a 256-dimensional vector (i.e., the original image features). The generator may be, for example, the generator in a GAN (Generative Adversarial Network), and may also have other structures, which is not specifically limited in this disclosure.
In another implementation, the initial generation model may include a recognition network, a segmentation network, a converter, a shape guidance model, a synthesizer, and a generator. The original image can be input into the recognition network and the segmentation network respectively to obtain the original image features and a segmentation result. The segmentation result can represent the channel to which each pixel in the original image belongs. The segmentation network can segment the original image according to a plurality of channels to obtain a mask image of the original image for each channel; these mask images form the segmentation result. In each mask image, a pixel with a value of 1 belongs to the channel corresponding to that mask image, and a pixel with a value of 0 does not. For example, if the original image is a 512 × 512 photograph including a human face, the segmentation network can segment the original image according to 19 channels, which may include: background, skin, nose, left eye, right eye, left eyebrow, right eyebrow, left ear, etc., resulting in nineteen 512 × 512 mask images as the segmentation result. A pixel with a value of 1 in the mask image corresponding to the background indicates that the corresponding position in the original image belongs to the background, and a pixel with a value of 0 indicates that the corresponding position does not belong to the background. As another example, a pixel with a value of 1 in the mask image corresponding to the left eyebrow indicates that the corresponding position in the original image belongs to the left eyebrow, and a pixel with a value of 0 indicates that the corresponding position does not belong to the left eyebrow. Therefore, the segmentation result can reflect the position of each part of the human face within the original image.
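As a non-limiting illustration, the relationship between per-pixel channel labels and the per-channel mask images can be sketched as follows. The function name `labels_to_masks`, the per-pixel label-map input format, and the channel ids used in the toy data are assumptions made for this sketch; the disclosure does not specify how the segmentation network represents its output internally.

```python
from typing import List

LabelMap = List[List[int]]   # per-pixel channel index (hypothetical input format)
Mask = List[List[int]]       # binary mask with the same height and width


def labels_to_masks(labels: LabelMap, num_channels: int) -> List[Mask]:
    """Build one binary mask per channel: 1 where the pixel belongs to it, else 0."""
    return [
        [[1 if value == channel else 0 for value in row] for row in labels]
        for channel in range(num_channels)
    ]


# Toy 2x3 label map with 3 channels: 0 = background, 1 = skin, 2 = nose (assumed ids).
toy_labels = [
    [0, 1, 1],
    [0, 2, 1],
]
masks = labels_to_masks(toy_labels, num_channels=3)
```

Each pixel is set to 1 in exactly one of the masks, so the set of masks carries the same position information as the label map.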
The original image features may then be input into the converter, which outputs first model parameters. Meanwhile, the segmentation result is input into the shape guidance model, which outputs second model parameters. The shape guidance model may have the structure of a neural network such as an RNN (Recurrent Neural Network), a CNN (Convolutional Neural Network), or an LSTM (Long Short-Term Memory network), and may include, for example, an input layer, a plurality of convolutional layers, and an output layer, which is not limited in this disclosure. After the first model parameters and the second model parameters are obtained, they may be input into the synthesizer to obtain the initial model parameters output by the synthesizer. The synthesizer can integrate the first model parameters and the second model parameters into the initial model parameters, where the first model parameters, the second model parameters, and the initial model parameters have the same dimensions. For example, the synthesizer may be configured to perform a weighted summation of the first model parameters and the second model parameters to obtain the initial model parameters. The synthesizer may also be an MLP (Multi-Layer Perceptron) capable of mapping the first model parameters and the second model parameters to the initial model parameters. Finally, the initial model parameters are input into the generator, and the output of the generator is the initial image. Because the original image features can characterize the features in the original image, the initial generation model can learn the expression of the target object in the original image; and because the segmentation result can reflect the position of each part of the target object within the original image, the initial generation model can also learn the shape of each part in the original image, such as its proportion and position. Therefore, the initial image can be similar to the original image in both shape and expression, which improves the expressive force of the initial image.
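As a non-limiting illustration, the weighted-summation variant of the synthesizer can be sketched as follows. The function name `synthesize`, the single scalar `weight`, and the toy values are assumptions; the disclosure does not specify how the weights are chosen.

```python
from typing import List


def synthesize(first: List[float], second: List[float], weight: float = 0.5) -> List[float]:
    """Weighted sum of two parameter vectors of equal dimension.

    `weight` is applied to the first model parameters and `1 - weight`
    to the second model parameters (an assumed weighting scheme).
    """
    if len(first) != len(second):
        raise ValueError("first and second model parameters must have the same dimension")
    return [weight * a + (1.0 - weight) * b for a, b in zip(first, second)]


# Toy values: first parameters from the converter, second from the shape guidance model.
initial_params = synthesize([0.2, 0.8], [0.4, 0.4], weight=0.5)
```

The output has the same dimension as both inputs, consistent with the requirement stated above.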
Further, the target generation model may also include a recognition network, a converter and a generator, where the recognition network is used to perform recognition on the initial image so as to extract initial image features that can characterize the initial image. The converter is used to determine the corresponding target model parameters according to the initial image features, and finally the generator is used to generate the target image according to the target model parameters. In another implementation, the target generation model may likewise include a recognition network, a segmentation network, a converter, a shape guidance model, a synthesizer, and a generator. The process of generating the target image is the same as the process of generating the initial image, and is not described herein again.
In yet another implementation, another structure of the target generation model, as shown in FIG. 6, may include a plurality of generation submodels and one generator. The plurality of generation submodels correspond to the initial image and the plurality of local images, respectively. Each generation submodel includes a recognition network and a converter. Specifically, the generation submodel corresponding to the initial image first uses its recognition network to perform feature extraction on the initial image to obtain the global features, and then uses its converter to determine the corresponding global model parameters according to the global features. The generation submodel corresponding to each local image first uses its recognition network to perform feature extraction on that local image to obtain the corresponding local features, and then uses its converter to determine the corresponding local model parameters according to the local features. That is, by using the plurality of generation submodels, the global model parameters corresponding to the initial image and the local model parameters corresponding to each of the plurality of local images can be obtained.
Furthermore, the target generation model can integrate the global model parameters and the plurality of local model parameters into target model parameters, and finally, the generator is used for generating a target image according to the target model parameters. The target image is generated by combining the global model parameters and the local model parameters, that is, the target generation model can extract the overall characteristics of the target object in the initial image and can also extract the local characteristics of the target object in the initial image, so that the target image can be kept similar to the original image on the whole and can be kept similar to the original image on the details, the similarity between the target image and the original image is further improved, and the expressive force of the target image is improved.
How to determine the target model parameters according to the global model parameters and the plurality of local model parameters is specifically described as follows:
In one implementation, the global model parameters and the plurality of local model parameters may be weighted and summed to obtain the target model parameters. In another implementation, some of the parameters may be selected from each set of local model parameters according to the parts included in the corresponding local image, and the selected parameters may be weighted and summed with the global model parameters. For example, suppose there are two local images PICa and PICb, where PICa includes the eyes, eyebrows, nose, and mouth, and PICb includes the nose, mouth, and face contour. Then, the parameters corresponding to the eyes and eyebrows in the local model parameters of PICa and the parameters corresponding to the eyes and eyebrows in the global model parameters may be weighted and summed (or averaged), and the result is used as the parameters corresponding to the eyes and eyebrows in the target model parameters. Next, the parameters corresponding to the face contour (which can be understood as face shape parameters) in the local model parameters of PICb and the parameters corresponding to the face contour in the global model parameters are weighted and summed, and the result is used as the parameters corresponding to the face contour in the target model parameters. Finally, the parameters corresponding to the nose and mouth in the local model parameters of PICa, the parameters corresponding to the nose and mouth in the local model parameters of PICb, and the parameters corresponding to the nose and mouth in the global model parameters are weighted and summed, and the result is used as the parameters corresponding to the nose and mouth in the target model parameters, thereby obtaining the complete target model parameters.
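As a non-limiting illustration, the per-part combination in the PICa/PICb example can be sketched with equal weights, i.e., the averaging option mentioned above. The function name `fuse_by_part`, the representation of each part by a single scalar, and the toy values are assumptions made for this sketch.

```python
from typing import Dict, List

Params = Dict[str, float]  # part name -> parameter value (one scalar per part for simplicity)


def fuse_by_part(global_params: Params, local_params: List[Params]) -> Params:
    """For each part, average the global value with every local value that covers it."""
    target: Params = {}
    for part, global_value in global_params.items():
        values = [global_value]
        # Only local images that actually contain this part contribute.
        values += [local[part] for local in local_params if part in local]
        target[part] = sum(values) / len(values)
    return target


global_p = {"eyes": 0.4, "nose": 0.6, "contour": 0.2}
pic_a = {"eyes": 0.6, "nose": 0.3}        # PICa covers the eyes and nose
pic_b = {"nose": 0.9, "contour": 0.4}     # PICb covers the nose and face contour
target_p = fuse_by_part(global_p, [pic_a, pic_b])
```

In this toy run the eyes combine two sources (global and PICa), the face contour combines two sources (global and PICb), and the nose combines three, mirroring the three cases in the example above. Unequal weights could be substituted for the plain average.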
It should be noted that the model parameters (including the initial model parameters, the target model parameters, the global model parameters, and the local model parameters) referred to in this disclosure may be understood as parameters that can characterize the respective parts of the target object in terms of geometry and visual appearance. Taking a human face as an example of the target object, the model parameters are face-customization ("face-pinching") parameters, and may include: face shape, nose shape, mouth shape, eye shape, eyebrow shape, distribution of the facial features, beard position, etc. The target object may also be a cat, in which case the model parameters may include: the cat's face shape, nose shape, mouth shape, eye shape, whisker position, coat pattern, limb proportions, etc. As another example, the target object may also be a vehicle, and the model parameters may include: windshield shape, hood shape, headlight shape, taillight shape, wheel distribution, etc. of the vehicle.
FIG. 7 is a flowchart illustrating a method for training an initial generation model according to an exemplary embodiment. As shown in FIG. 7, the initial generation model is trained by:
Step A, obtaining a plurality of sample images generated by a game engine according to a preset rule, and determining the real model parameters used by the game engine when generating each corresponding sample image.
Step B, extracting sample image features of each sample image through the initial generation model, and determining initial training model parameters based on the sample image features.
Step C, training the initial generation model according to the initial training model parameters and the real model parameters corresponding to the sample image.
For example, training the initial generation model first requires obtaining a sample input set and a corresponding sample output set for training. The sample input set includes a plurality of sample inputs, and the sample output set includes a sample output corresponding to each sample input. Specifically, a game engine whose images conform to a specified style may be selected. A plurality of sample images conforming to the specified style are then generated by the game engine according to a preset rule, and the real model parameters used by the game engine when generating each sample image are recorded at the same time. The preset rule may be that the game engine generates the sample images randomly, or generates them randomly after certain constraints (for example, a threshold for the proportions of the facial features, a range for the distance between the two eyes, etc.) have been set for the game engine. In this way, the plurality of sample images can be used as the sample input set, and the real model parameters corresponding to the plurality of sample images can be used as the sample output set. The sample input set is then used as the input of the initial generation model, and the initial generation model is trained using the sample output set.
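As a non-limiting illustration, collecting (sample image, real model parameters) pairs under range constraints can be sketched as follows. The `render` callable stands in for the game engine and is purely hypothetical, as are the names `sample_params`, `build_training_pairs`, `toy_render` and the parameter names `eye_distance` and `nose_width`; no real engine API is implied.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Ranges = Dict[str, Tuple[float, float]]


def sample_params(rng: random.Random, ranges: Ranges) -> Params:
    """Draw each parameter uniformly from its allowed range (the preset constraint)."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def build_training_pairs(
    render: Callable[[Params], List[float]],
    ranges: Ranges,
    count: int,
    seed: int = 0,
) -> List[Tuple[List[float], Params]]:
    """Generate `count` sample images and record the parameters used for each one."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        params = sample_params(rng, ranges)
        pairs.append((render(params), params))  # (sample input, sample output)
    return pairs


def toy_render(params: Params) -> List[float]:
    # Hypothetical stand-in for a game engine: the "image" just echoes the parameters.
    return [params["eye_distance"], params["nose_width"]]


ranges = {"eye_distance": (0.3, 0.5), "nose_width": (0.1, 0.2)}
pairs = build_training_pairs(toy_render, ranges, count=5)
```

Because the parameters are recorded at generation time, every sample image comes with its ground-truth output and no manual labeling is needed.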
Specifically, a sample image is input into the initial generation model, which first performs feature extraction on the sample image to obtain sample image features and then determines the corresponding initial training model parameters according to the sample image features. The loss function of the initial generation model may be determined based on the initial training model parameters and the real model parameters corresponding to the sample image. For example, the difference (or mean square error) between the initial training model parameters and the real model parameters may be used as the loss function, and the L1 loss or L2 loss between the initial training model parameters and the real model parameters may also be used as the loss function, which is not particularly limited by the present disclosure. Thereafter, with the goal of reducing the loss function, parameters of neurons in the initial generation model, such as the weights and biases of the neurons, are corrected using a back propagation algorithm. These steps are repeated until the loss function meets a preset condition, for example, the loss function is smaller than a preset loss threshold. Further, when training the initial generation model, the recognition network, the converter and the generator included in the initial generation model may be jointly trained, or may be trained separately (for example, only the parameters of neurons in the generator are modified), which is not specifically limited by the present disclosure.
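As a non-limiting illustration, the loop of computing a mean square error loss, correcting a weight and a bias, and stopping once the loss falls below a preset threshold can be sketched as follows. The sketch replaces the neural network with a single linear unit trained by plain gradient descent, so it is a simplified stand-in for back propagation rather than the training procedure of this disclosure; the names `mse` and `train`, the learning rate, and the toy data are assumptions.

```python
from typing import List, Tuple


def mse(pred: List[float], real: List[float]) -> float:
    """Mean squared error between predicted and real model parameters."""
    return sum((p - r) ** 2 for p, r in zip(pred, real)) / len(real)


def train(
    features: List[float],
    real_params: List[float],
    lr: float = 0.1,
    loss_threshold: float = 1e-6,
    max_steps: int = 10_000,
) -> Tuple[float, float, float]:
    """Fit param = w * feature + b by gradient descent; stop once loss < threshold."""
    w, b = 0.0, 0.0
    n = len(features)
    loss = mse([w * x + b for x in features], real_params)
    for _ in range(max_steps):
        if loss < loss_threshold:   # preset condition met
            break
        preds = [w * x + b for x in features]
        # Gradients of the MSE with respect to the weight and the bias.
        grad_w = sum(2 * (p - r) * x for p, r, x in zip(preds, real_params, features)) / n
        grad_b = sum(2 * (p - r) for p, r in zip(preds, real_params)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        loss = mse([w * x + b for x in features], real_params)
    return w, b, loss


# Toy data: the "real" parameters follow 2 * feature + 1.
xs = [0.0, 0.5, 1.0, 1.5]
ys = [2 * x + 1 for x in xs]
w, b, final_loss = train(xs, ys)
```

Swapping `mse` for an L1 loss would only change the loss and gradient expressions; the stopping rule stays the same.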
FIG. 8 is a flowchart illustrating a method for training a target generation model according to an exemplary embodiment. As shown in FIG. 8, the target generation model is trained by:
Step D, generating a training image through the initial generation model according to a sample image.
Step E, dividing the training image into a plurality of local training images according to the key parts of the target object.
Step F, generating a target training image through the target generation model according to the training image and the plurality of local training images.
Step G, training the target generation model according to the sample image and the target training image.
For example, training the target generation model also requires obtaining a sample input set and a corresponding sample output set for training. The sample input set includes a plurality of sample inputs and the sample output set includes a sample output corresponding to each sample input. Specifically, because the sample images may have any style (i.e., the sample images do not need to be labeled in advance), a large number of sample images may be randomly acquired, then a training image corresponding to each sample image is generated according to each sample image by using an initial generation model, and then the training image is divided into a plurality of local training images according to the key parts of the target object. Then the training image and a plurality of local training images are used as an image group. Multiple image groups may be used as a sample input set, while sample images are used as a sample output set. And finally, taking the sample input set as the input of the target generation model, and training the target generation model by using the sample output set.
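As a non-limiting illustration, assembling one image group together with its expected output can be sketched as follows. The names `build_image_group`, `toy_initial_model` and `toy_split`, the dictionary layout, and the toy models are assumptions made for this sketch.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # rows of pixel values; a hypothetical stand-in


def build_image_group(
    sample_image: Image,
    initial_model: Callable[[Image], Image],
    split: Callable[[Image], List[Image]],
) -> Dict[str, object]:
    """Create one training example for the target generation model."""
    training_image = initial_model(sample_image)           # step D
    return {
        "training_image": training_image,
        "local_training_images": split(training_image),    # step E
        "expected_output": sample_image,                    # the sample image supervises itself
    }


def toy_initial_model(img: Image) -> Image:
    # Hypothetical coarse generator: round every pixel down to an even value.
    return [[p // 2 * 2 for p in row] for row in img]


def toy_split(img: Image) -> List[Image]:
    # Two overlapping halves (each extended by one row past the midpoint).
    half = len(img) // 2
    return [img[:half + 1], img[half - 1:]]


sample = [[1, 2], [3, 4], [5, 6], [7, 8]]
group = build_image_group(sample, toy_initial_model, toy_split)
```

The expected output is the unmodified sample image, which is why no advance labeling of the sample images is required.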
For example, a loss function may be determined based on the target training image output by the target generation model and the sample image. For example, the difference (or mean square error) between the target training image and the sample image may be used as the loss function, and a perceptual loss, a recognition loss, a feature map loss, an L1 loss, or an L2 loss between the target training image and the sample image may also be used as the loss function, which is not particularly limited in this disclosure. Thereafter, with the goal of reducing the loss function, parameters of neurons in the target generation model, such as the weights and biases of the neurons, are corrected using a back propagation algorithm. These steps are repeated until the loss function meets a preset condition, for example, the loss function is smaller than a preset loss threshold. Therefore, the target generation model can be trained in a self-supervised manner, without labeling the sample images in advance.
Further, the training of the target generation model may be performed by jointly training the recognition network, the converter and the generator included in the target generation model, or by separately training (for example, only modifying parameters of neurons in the generator), which is not limited in this disclosure.
FIG. 9 is a flowchart illustrating another method for training a target generative model, according to an exemplary embodiment, and as shown in FIG. 9, the implementation of step F may include:
Step F1, extracting global training features from the training image, and extracting local training features corresponding to each of the local training images from the plurality of local training images.
Step F2, determining global training model parameters according to the global training features, and determining corresponding local training model parameters according to each local training feature.
Step F3, determining target training model parameters according to the global training model parameters and the plurality of local training model parameters.
Step F4, generating a target training image according to the target training model parameters.
For example, to generate the target training image, feature extraction may be performed on the training image and on the plurality of local training images, respectively, to obtain a global training feature and a plurality of local training features. Global training model parameters are then determined according to the global training feature, and corresponding local training model parameters are determined according to each local training feature. Target training model parameters are then determined according to the global training model parameters and the plurality of local training model parameters. Specifically, the method for determining the target training model parameters is the same as the method for determining the target model parameters according to the global model parameters and the plurality of local model parameters, and details are not repeated here. Finally, the target training image is generated according to the target training model parameters. Furthermore, when the target training image is generated, an intermediate training image corresponding to each local training model parameter may also be generated by using the plurality of local training model parameters.
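The data flow of steps F1 to F4 can be sketched as below. Every stage is passed in as a callable, because this passage does not restate how features are extracted, how parameters are derived, or how global and local parameters are merged; `extract_features`, `features_to_params`, `merge_params` and `render` are hypothetical placeholders, not names from the disclosure.

```python
from typing import Any, Callable, Dict, Tuple


def generate_target_training_image(
    training_image: Any,
    local_training_images: Dict[str, Any],
    extract_features: Callable[[Any], Any],        # hypothetical feature extractor
    features_to_params: Callable[[Any], Any],      # hypothetical feature-to-parameter mapping
    merge_params: Callable[[Any, Dict[str, Any]], Any],  # hypothetical merge rule (not specified here)
    render: Callable[[Any], Any],                  # hypothetical image generator
) -> Tuple[Any, Dict[str, Any]]:
    # Step F1: global training features and one local training feature per local image.
    global_feature = extract_features(training_image)
    local_features = {name: extract_features(img)
                      for name, img in local_training_images.items()}

    # Step F2: global training model parameters and per-part local training model parameters.
    global_params = features_to_params(global_feature)
    local_params = {name: features_to_params(f) for name, f in local_features.items()}

    # Step F3: target training model parameters from the global and local parameters.
    target_params = merge_params(global_params, local_params)

    # Step F4: the target training image, plus one intermediate training image
    # per set of local training model parameters (used later for the local loss).
    target_training_image = render(target_params)
    intermediate_images = {name: render(p) for name, p in local_params.items()}
    return target_training_image, intermediate_images
```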
FIG. 10 is a flowchart illustrating another method for training a target generation model according to an exemplary embodiment, and as shown in FIG. 10, step G may be implemented by:
Step G1, determining a global loss according to the sample image and the target training image.
Step G2, determining a local loss according to the sample image and a plurality of intermediate training images, wherein each intermediate training image is generated according to a corresponding local training model parameter.
Step G3, determining a comprehensive loss according to the global loss and the local loss, and training the target generation model by using a back propagation algorithm with the goal of reducing the comprehensive loss.
For example, a specific loss function of the target generation model is described below. The loss function is a comprehensive loss and includes two parts: a global loss and a local loss. The global loss is determined based on the sample image and the target training image, and the local loss is determined based on the sample image and the intermediate training images.
Specifically, the local loss may be further divided into a loss corresponding to each local training image, that is, a loss between the sample image and the intermediate training image corresponding to each local training image. For example, the local loss can be determined by the formula one:
$$L_i = \alpha L_{i,1} + \beta L_{i,2} \qquad \text{(Formula one)}$$
where $L_{\mathrm{part}}$ denotes the local loss, $L_i$ denotes the loss corresponding to the i-th local training image, $L_{i,1}$ denotes the keypoint-based L1 loss between the sample image and the intermediate training image corresponding to the i-th local training image, $L_{i,2}$ denotes the feature-map-based L1 loss between the sample image and the intermediate training image corresponding to the i-th local training image, and $\alpha$ and $\beta$ denote preset weights.
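As a small numeric illustration of formula one, the per-part loss can be computed from two L1 terms as sketched below. The helper names and the flat-list representation of keypoints and feature maps are assumptions. Formula one as given defines only the per-part loss; aggregating the per-part losses into the local loss by a plain sum is an assumption of this sketch.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equally long flat sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def part_loss(sample_keypoints, inter_keypoints,
              sample_features, inter_features, alpha, beta):
    """Formula one for a single local training image: alpha * L_{i,1} + beta * L_{i,2}."""
    l_i1 = l1_loss(sample_keypoints, inter_keypoints)   # keypoint-based L1 loss
    l_i2 = l1_loss(sample_features, inter_features)     # feature-map-based L1 loss
    return alpha * l_i1 + beta * l_i2


def local_loss(parts, alpha, beta):
    """ASSUMPTION: the local loss is the sum of the per-part losses.

    `parts` is a list of (sample_keypoints, inter_keypoints,
    sample_features, inter_features) tuples, one per local training image.
    """
    return sum(part_loss(*p, alpha, beta) for p in parts)
```

For instance, with keypoint values `[0.0, 0.0]` versus `[1.0, 1.0]` and feature values `[2.0, 2.0]` versus `[2.0, 4.0]`, both L1 terms equal 1.0, so with weights 0.5 and 2.0 the per-part loss is 2.5.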
The global loss can be determined according to formula two:
$$L_{\mathrm{all}} = \lambda_1 L_1 + \lambda_2 L_2 + \lambda_3 L_3 + \lambda_4 L_4 + \lambda_5 L_5 \qquad \text{(Formula two)}$$
where $L_{\mathrm{all}}$ denotes the global loss, $L_1$ denotes the keypoint-based L1 loss between the sample image and the target training image, $L_2$ denotes the feature-map-based L1 loss between the sample image and the target training image, $L_3$ denotes the MSE loss between the sample image and the target training image, $L_4$ denotes the perceptual loss between the sample image and the target training image, and $L_5$ denotes the recognition loss between the sample image and the target training image. $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ and $\lambda_5$ each denote a preset weight.
Finally, the comprehensive loss can be determined from the global loss and the local loss. For example, the global loss and the local loss may be weighted and summed according to formula three to obtain the comprehensive loss:
$$L_{\mathrm{mix}} = \eta L_{\mathrm{all}} + \mu L_{\mathrm{part}} \qquad \text{(Formula three)}$$
where $L_{\mathrm{mix}}$ denotes the comprehensive loss, $\eta$ denotes the weight of the global loss, and $\mu$ denotes the weight of the local loss. The parameters of the neurons in the target generation model can then be modified using a back propagation algorithm with the goal of reducing the comprehensive loss.
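The weighted sums in formula two and formula three are simple enough to check by hand. The sketch below assumes the five global terms and the local loss have already been computed; the function names and the example weights are illustrative only and are not values given in the disclosure.

```python
def global_loss(terms, weights):
    """Formula two: weighted sum of the five global loss terms L1..L5."""
    if len(terms) != 5 or len(weights) != 5:
        raise ValueError("formula two uses exactly five terms and five weights")
    return sum(w * t for w, t in zip(weights, terms))


def comprehensive_loss(l_all, l_part, eta, mu):
    """Formula three: weighted sum of the global loss and the local loss."""
    return eta * l_all + mu * l_part


# Worked example with made-up numbers:
# terms   = keypoint L1, feature-map L1, MSE, perceptual, recognition
terms = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.0, 0.5, 0.0, 0.25, 0.2]        # hypothetical preset weights
l_all = global_loss(terms, weights)         # 1.0 + 1.0 + 0.0 + 1.0 + 1.0
l_mix = comprehensive_loss(l_all, 2.5, eta=0.5, mu=2.0)
```

Setting a weight to zero, as for the MSE term in the example, simply drops that term from the objective, which is one way the preset weights let the same formula cover several training configurations.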
In an application scenario, the method may further comprise the steps of:
Step 1) acquiring an entered specified face type.
Step 2) determining, from a plurality of initial generation models and a plurality of target generation models which are trained in advance, a specified initial generation model and a specified target generation model corresponding to the specified face type, wherein the specified initial generation model is obtained by training according to a plurality of sample images having the specified face type, and the specified target generation model is obtained by training according to the specified initial generation model and the plurality of sample images.
Accordingly, the implementation manner of step 102 may be:
extracting original image features from the original image through the specified initial generation model, determining initial model parameters based on the original image features, and generating, according to the initial model parameters, an initial image that corresponds to the target object and has the specified face type.
The implementation manner of step 103 may be:
extracting initial image features from the initial image through the specified target generation model, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image having the specified face type.
For example, in the embodiment of the present disclosure, the specified face type may also be entered by the user, for example, the user may enter the specified face type at the same time when the original image is uploaded. The face type is used to represent some specific features of the face, for example, the face type may be a canine face type, a feline face type, a rabbit face type, a baby face type, etc., and the specified face type is a face type specified by the user. Accordingly, the initial generative model and the target generative model corresponding to each face type may be trained in advance for a plurality of face types.
After acquiring the specified face type, a specified initial generative model and a specified target generative model corresponding to the specified face type may be selected among the plurality of initial generative models and target generative models to generate a target image, so that the resulting target image can have the specified face type. The user can select different face types on the basis of selecting the target object, and the expressive force of the target image is further improved.
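Selecting a model pair by face type amounts to a keyed lookup followed by the usual two-stage generation. A minimal sketch, assuming the pre-trained models are available as callables in a dictionary keyed by face type; `ModelPair`, `select_models` and `generate_with_face_type` are hypothetical names introduced for the example.

```python
from typing import Any, Callable, Dict, NamedTuple


class ModelPair(NamedTuple):
    """A specified initial generation model and its matching target generation model."""
    initial_model: Callable[[Any], Any]
    target_model: Callable[[Any], Any]


def select_models(registry: Dict[str, ModelPair], face_type: str) -> ModelPair:
    """Pick the model pair trained in advance for the specified face type."""
    try:
        return registry[face_type]
    except KeyError:
        raise ValueError(f"no models trained for face type {face_type!r}") from None


def generate_with_face_type(original_image: Any, face_type: str,
                            registry: Dict[str, ModelPair]) -> Any:
    pair = select_models(registry, face_type)
    # Step 102 with the specified initial generation model.
    initial_image = pair.initial_model(original_image)
    # Step 103 with the specified target generation model.
    return pair.target_model(initial_image)
```

Raising an error for an unknown face type is a choice of the sketch; falling back to a default pair would be an equally plausible design.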
In summary, the present disclosure first acquires an original image in response to a user request, then extracts original image features from the original image, determines initial model parameters based on the original image features, and generates an initial image corresponding to a target object according to the initial model parameters. Finally, initial image features are extracted from the initial image, target model parameters are generated based on the initial image features, and the target object in the initial image is corrected according to the target model parameters to obtain a target image. In the present disclosure, the initial generation model is used to generate a low-precision initial image, and the target generation model is then used to generate a high-precision target image, so that the similarity between the target image and the initial image can be improved, and the expressive force of the target image is improved.
Fig. 11 is a block diagram illustrating an image generation apparatus according to an exemplary embodiment, and as shown in fig. 11, the apparatus 200 may include:
An acquisition module 201, configured to acquire an original image in response to a user request.
An initial generation module 202, configured to extract original image features from the original image, determine initial model parameters based on the original image features, and generate an initial image corresponding to the target object according to the initial model parameters.
A target generation module 203, configured to extract initial image features from the initial image, generate target model parameters based on the initial image features, and correct the target object in the initial image according to the target model parameters to obtain a target image.
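The three modules listed above can be pictured as three stages wired in sequence. The sketch below models each module as an injected callable; the class name, field names and the `handle` method are assumptions for illustration and say nothing about how the modules are actually realized in software or hardware.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageGenerationApparatus:
    """Toy composition of modules 201, 202 and 203."""
    acquisition_module: Callable[[Any], Any]          # module 201: request -> original image
    initial_generation_module: Callable[[Any], Any]   # module 202: original image -> initial image
    target_generation_module: Callable[[Any], Any]    # module 203: initial image -> target image

    def handle(self, user_request: Any) -> Any:
        original_image = self.acquisition_module(user_request)
        initial_image = self.initial_generation_module(original_image)
        return self.target_generation_module(initial_image)
```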
Fig. 12 is a block diagram illustrating another image generation apparatus according to an exemplary embodiment, and as shown in fig. 12, the apparatus 200 may further include:
A dividing module 204, configured to divide the initial image into a plurality of local images according to the key parts of the target object.
Accordingly, the target generation module 203 may be configured to:
extract global features from the initial image, and extract local features corresponding to each local image from the plurality of local images.
Fig. 13 is a block diagram illustrating another image generation apparatus according to an exemplary embodiment, and as shown in fig. 13, the target generation module 203 may include:
The extracting sub-module 2031 is configured to determine global model parameters according to the global features, and determine corresponding local model parameters according to each local feature.
The determining submodule 2032 is configured to determine the target model parameter according to the global model parameter and the plurality of local model parameters.
The generating sub-module 2033 is configured to correct the target object in the initial image according to the target model parameter, so as to obtain a target image.
In one implementation, the initial generation module 202 may be configured to: extract original image features from the original image through a pre-trained initial generation model, determine initial model parameters based on the original image features, and generate an initial image corresponding to the target object according to the initial model parameters. The initial generation model is trained from a plurality of sample images.
The target generation module 203 may be configured to:
extract initial image features from the initial image through a pre-trained target generation model, generate target model parameters based on the initial image features, and correct the target object in the initial image according to the target model parameters to obtain a target image.
The target generation model is trained from a plurality of image groups, each image group comprising: a training image generated from a sample image through the initial generation model, and a plurality of local training images obtained by dividing the training image.
In one implementation, the initial generative model is trained by:
Step A, obtaining a plurality of sample images generated by a game engine according to a preset rule, and determining the real model parameters used when the game engine generates each corresponding sample image.
Step B, extracting sample image features of each sample image through the initial generation model, and determining initial training model parameters based on the sample image features.
Step C, training the initial generation model according to the initial training model parameters and the real model parameters corresponding to the sample image.
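Steps A to C describe supervised parameter regression: the game engine supplies both the image and the parameters that produced it, so the predicted parameters can be compared directly with the real ones. The toy sketch below shows that idea only. The "engine" is a hypothetical three-pixel renderer, the model is a small linear map trained by stochastic gradient descent on a squared error, and all names are assumptions.

```python
import random


def toy_engine_render(params):
    """Hypothetical stand-in for a game engine: two parameters -> a 3-pixel image."""
    a, b = params
    return [a, b, a + b]


def make_dataset(n, seed=0):
    """Step A: sample images together with the real parameters that generated them."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        params = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        data.append((toy_engine_render(params), params))
    return data


def predict(weights, image):
    """Step B: map an image to predicted (initial training) model parameters."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]


def param_loss(pred, real):
    """Mean squared error between predicted and real model parameters."""
    return sum((p - r) ** 2 for p, r in zip(pred, real)) / len(real)


def train_initial_model(data, lr=0.1, epochs=200):
    """Step C: adjust the model so predicted parameters approach the real ones."""
    weights = [[0.0] * 3 for _ in range(2)]
    for _ in range(epochs):
        for image, real in data:
            pred = predict(weights, image)
            for i in range(2):
                err = pred[i] - real[i]
                for j in range(3):
                    # d(param_loss)/d(weights[i][j]) = err * image[j] for two parameters
                    weights[i][j] -= lr * err * image[j]
    return weights
```

The point of the arrangement is that ground truth comes for free: any number of labeled pairs can be produced by sampling parameters and rendering them.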
In another implementation, the target generation model is trained by:
Step D, generating a training image through the initial generation model according to the sample image.
Step E, dividing the training image into a plurality of local training images according to the key parts of the target object.
Step F, generating a target training image through the target generation model according to the training image and the plurality of local training images.
Step G, training the target generation model according to the sample image and the target training image.
In another implementation, the implementation of step F may include:
Step F1, extracting global training features from the training image, and extracting local training features corresponding to each of the local training images from the plurality of local training images.
Step F2, determining global training model parameters according to the global training features, and determining corresponding local training model parameters according to each local training feature.
Step F3, determining target training model parameters according to the global training model parameters and the plurality of local training model parameters.
Step F4, generating a target training image according to the target training model parameters.
In yet another implementation, step G may be implemented by:
Step G1, determining a global loss according to the sample image and the target training image.
Step G2, determining a local loss according to the sample image and a plurality of intermediate training images, wherein each intermediate training image is generated according to a corresponding local training model parameter.
Step G3, determining a comprehensive loss according to the global loss and the local loss, and training the target generation model by using a back propagation algorithm with the goal of reducing the comprehensive loss.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In summary, the present disclosure first acquires an original image in response to a user request, then extracts original image features from the original image, determines initial model parameters based on the original image features, and generates an initial image corresponding to a target object according to the initial model parameters. Finally, initial image features are extracted from the initial image, target model parameters are generated based on the initial image features, and the target object in the initial image is corrected according to the target model parameters to obtain a target image. In the present disclosure, the initial generation model is used to generate a low-precision initial image, and the target generation model is then used to generate a high-precision target image, so that the similarity between the target image and the initial image can be improved, and the expressive force of the target image is improved.
Referring now to fig. 14, a schematic structural diagram of an electronic device (e.g., an execution main body in the embodiment of the present disclosure, which may be a terminal device or a server) 300 suitable for implementing the embodiment of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 14 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 14, the electronic device 300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage device 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data necessary for the operation of the electronic device 300. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 14 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 309, or installed from the storage means 308, or installed from the ROM 302. The computer program, when executed by the processing device 301, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the terminal devices and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: responding to a user request, and acquiring an original image; extracting original image characteristics from the original image, determining initial model parameters based on the original image characteristics, and generating an initial image corresponding to a target object according to the initial model parameters; extracting initial image features from the initial image, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not in some cases constitute a limitation of the module itself, and for example, an acquisition module may also be described as a "module that acquires an original image".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides an image generation method according to one or more embodiments of the present disclosure, including: responding to a user request, and acquiring an original image; extracting original image characteristics from the original image, determining initial model parameters based on the original image characteristics, and generating an initial image corresponding to a target object according to the initial model parameters; extracting initial image features from the initial image, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image.
Example 2 provides the method of example 1, further comprising, in accordance with one or more embodiments of the present disclosure: dividing the initial image into a plurality of local images according to the key part of the target object; the extracting initial image features from the initial image comprises: and extracting global features from the initial image, and extracting local features corresponding to each local image from the plurality of local images.
Example 3 provides the method of example 2, wherein generating target model parameters based on the initial image features and modifying the target object in the initial image according to the target model parameters to obtain a target image, includes: determining global model parameters according to the global features, and determining corresponding local model parameters according to each local feature; determining the target model parameters according to the global model parameters and the plurality of local model parameters; and correcting the target object in the initial image according to the target model parameters to obtain the target image.
Example 4 provides the method of example 1, wherein extracting original image features from the original image, determining initial model parameters based on the original image features, and generating an initial image corresponding to a target object according to the initial model parameters, comprises: extracting the original image features of the original image through a pre-trained initial generation model, determining the initial model parameters based on the original image features, and generating an initial image corresponding to the target object according to the initial model parameters; the initial generation model is obtained by training according to a plurality of sample images; the extracting initial image features from the initial image, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain a target image includes: extracting the initial image features of the initial image through a pre-trained target generation model, generating target model parameters based on the initial image features, and correcting the target object in the initial image according to the target model parameters to obtain the target image; the target generation model is trained from a plurality of image groups, each of the image groups comprising: a training image generated from the sample image through the initial generation model, and a plurality of local training images obtained by dividing the training image.
Example 5 provides the method of example 4, the initial generative model being trained in the following manner: acquiring a plurality of sample images generated by a game engine according to a preset rule, and determining real model parameters used when the game engine generates the corresponding sample images; extracting sample image features for each sample image through the initial generation model, and determining initial training model parameters based on the sample image features; and training the initial generation model according to the initial training model parameters and the real model parameters corresponding to the sample image.
Example 6 provides the method of example 4 or example 5, the target generation model being trained by: generating the training image through the initial generation model according to the sample image; dividing the training image into a plurality of local training images according to the key part of the target object; generating a target training image through the target generation model according to the training image and the plurality of local training images; and training the target generation model according to the sample image and the target training image.
Example 7 provides the method of example 6, wherein generating a target training image through the target generation model according to the training image and the plurality of local training images includes: extracting global training features from the training image, and extracting local training features corresponding to each local training image from the plurality of local training images; determining global training model parameters according to the global training features, and determining corresponding local training model parameters according to each local training feature; determining target training model parameters according to the global training model parameters and the plurality of local training model parameters; and generating the target training image according to the target training model parameters.
Example 8 provides the method of example 7, wherein training the target generation model according to the sample image and the target training image includes: determining a global loss according to the sample image and the target training image; determining a local loss according to the sample image and a plurality of intermediate training images, wherein each intermediate training image is generated according to a corresponding local training model parameter; and determining a comprehensive loss according to the global loss and the local loss, and training the target generation model by using a back propagation algorithm with the goal of reducing the comprehensive loss.
Example 9 provides, in accordance with one or more embodiments of the present disclosure, an image generation apparatus comprising: an acquisition module configured to acquire an original image in response to a user request; an initial generation module configured to extract original image features of the original image, determine initial model parameters based on the original image features, and generate an initial image corresponding to a target object according to the initial model parameters; and a target generation module configured to extract initial image features from the initial image, generate target model parameters based on the initial image features, and correct the target object in the initial image according to the target model parameters to obtain a target image.
Example 10 provides, in accordance with one or more embodiments of the present disclosure, a computer-readable medium having stored thereon a computer program that, when executed by a processing device, implements the steps of the method of any one of examples 1 to 8.
Example 11 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising: a storage device having a computer program stored thereon; and a processing device configured to execute the computer program in the storage device to implement the steps of the method of any one of examples 1 to 8.
The foregoing description is merely an illustration of the preferred embodiments of the disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions that are disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.