CN116152403B - Image generation method and device, storage medium and electronic equipment
- Publication number: CN116152403B (application CN202310027752.6A)
- Authority: CN (China)
- Prior art keywords: image, virtual, dimensional, sample, avatar
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G06N20/00—Machine learning
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30168—Image quality inspection
Abstract
This specification discloses an image generation method and device, a storage medium and electronic equipment. The method includes: generating a first virtual three-dimensional avatar based on target two-dimensional images of at least one reference angle of a target object; performing quality evaluation processing on the first virtual three-dimensional avatar based on an acquired reference virtual three-dimensional avatar and the first virtual three-dimensional avatar to obtain avatar evaluation information of the first virtual three-dimensional avatar; and determining the target virtual three-dimensional avatar corresponding to the target object based on the avatar evaluation information and the first virtual three-dimensional avatar.
Description
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an image generation method and device, a storage medium, and an electronic device.
Background
With the rapid development of computer technology, virtual scenes such as the metaverse and virtual reality have been applied ever more widely in recent years. Related applications such as the metaverse are still in a stage of rapid development, and most research and application effort is focused on the generation of three-dimensional virtual avatars.
Disclosure of Invention
This specification provides an image generation method and device, a storage medium and electronic equipment. The technical solutions are as follows:
In a first aspect, this specification provides an image generation method, the method comprising:
Generating a first virtual three-dimensional avatar corresponding to a target object based on a target two-dimensional image of at least one reference angle of the target object;
Acquiring a reference virtual three-dimensional avatar, and performing quality evaluation processing on the first virtual three-dimensional avatar based on the reference virtual three-dimensional avatar and the first virtual three-dimensional avatar to obtain avatar evaluation information of the first virtual three-dimensional avatar;
and determining a target virtual three-dimensional avatar corresponding to the target object based on the avatar evaluation information and the first virtual three-dimensional avatar.
In a second aspect, this specification provides an image generation apparatus, the apparatus comprising:
an avatar generation module, used for generating a first virtual three-dimensional avatar corresponding to a target object based on a target two-dimensional image of at least one reference angle of the target object;
a quality evaluation module, used for acquiring a reference virtual three-dimensional avatar and performing quality evaluation processing on the first virtual three-dimensional avatar based on the reference virtual three-dimensional avatar and the first virtual three-dimensional avatar to obtain avatar evaluation information of the first virtual three-dimensional avatar;
and an avatar optimization module, used for determining a target virtual three-dimensional avatar corresponding to the target object based on the avatar evaluation information and the first virtual three-dimensional avatar.
In a third aspect, this specification provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the above method steps.
In a fourth aspect, this specification provides an electronic device, which may include a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to perform the above method steps.
In a fifth aspect, this specification provides a computer program product storing at least one instruction adapted to be loaded by a processor to perform the method steps of one or more embodiments of this specification.
The technical solutions provided by some embodiments of this specification have the following beneficial effects:
In one or more embodiments of this specification, an electronic device generates a first virtual three-dimensional avatar based on target two-dimensional images of at least one reference angle of a target object. By introducing an acquired reference virtual three-dimensional avatar, the device can evaluate the first avatar to obtain avatar evaluation information, and then optimize the avatar based on that information and the first avatar to determine the target virtual three-dimensional avatar corresponding to the target object. Generation quality is thus supervised against the reference avatar after the avatar has been generated, enabling accurate quality control, while the evaluation information helps produce a high-quality target avatar. Both the quality and the efficiency of avatar generation are taken into account, ensuring the normal operation of avatar generation transactions.
Drawings
In order to illustrate the technical solutions of this specification or of the prior art more clearly, the drawings required by the embodiments or the prior-art descriptions are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this specification; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an application scenario of the avatar generation system provided in this specification;
FIG. 2 is a schematic flow chart of an avatar generation method provided in this specification;
FIG. 3 is a schematic diagram of the training flow of the avatar generation model provided in this specification;
FIG. 4 is a schematic flow chart of the target avatar determination process provided in this specification;
FIG. 5 is a schematic structural diagram of the avatar generation apparatus provided in this specification;
FIG. 6 is a schematic structural diagram of the electronic device provided in this specification;
FIG. 7 is a schematic diagram of the architecture of the operating system and user space provided in this specification;
FIG. 8 is an architecture diagram of the Android operating system of FIG. 7;
FIG. 9 is an architecture diagram of the iOS operating system of FIG. 7.
Detailed Description
The technical solutions in the embodiments of this specification are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the present disclosure without inventive effort fall within the scope of protection of the present disclosure.
In the description of this specification, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and should not be construed as indicating or implying relative importance. It should also be noted that, unless expressly specified and limited otherwise, "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements not listed or inherent to it. The specific meaning of the terms in this specification will be understood by those of ordinary skill in the art according to the specific circumstances. In addition, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship.
In the related art, three-dimensional avatar generation is usually implemented by an avatar generation model. In practical applications attention is typically concentrated on the generation stage, with little paid to the quality of the generated avatar. After such a method is deployed in an online, device-side application scenario, avatar quality is neither monitored nor controlled; the quality of generated avatars is therefore uneven, and some low-quality avatars lead to a poor user experience. In addition, once the method is deployed online, error calibration of the avatar generation model can only be performed offline and cannot be carried out while the online avatar generation task is kept running. The image generation methods in the related art therefore have certain limitations.
The present specification is described in detail below with reference to specific examples.
Referring to FIG. 1, a schematic diagram of an application scenario of the avatar generation system provided in this specification is shown. As shown in FIG. 1, the avatar generation system includes at least a client cluster and, in some embodiments, may further include a service platform 100.
The client cluster may include at least one client. As shown in FIG. 1, it specifically includes a client 1 corresponding to user 1, a client 2 corresponding to user 2, ..., and a client n corresponding to user n, where n is an integer greater than 0.
Each client in the client cluster may be a communication-enabled electronic device including, but not limited to: a wearable device, a handheld device, a personal computer, a tablet computer, a vehicle-mounted device, a smartphone, a computing device, or another processing device connected to a wireless modem. Electronic devices may be called different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent, cellular telephone, cordless telephone, personal digital assistant (PDA), or an electronic device in a 5G network or a future evolved network.
The service platform 100 may be a separate server device provided with a shared three-dimensional virtual environment containing a plurality of three-dimensional character objects, each client being able to control one or more of them. As an example of this embodiment, a client may run an application program involving an avatar generation scenario, such as a 3D game application, a 3D image processing program, a 3D avatar generation program, or a 3D expression management program.
The service platform 100 may be, for example, rack-mounted, blade, tower or cabinet server equipment, or a workstation, mainframe computer or other hardware device with strong computing capability. It may also be a server cluster composed of multiple servers arranged symmetrically: each server is functionally equivalent in the transaction link and can provide services to the outside independently, that is, without the assistance of another server.
In one or more embodiments of this specification, the service platform 100 may establish a communication connection with at least one client in the client cluster and, over that connection, exchange the data involved in the avatar generation process. For example, a client may collect target two-dimensional images of several reference angles of the current target object, execute the avatar generation method of one or more embodiments of this specification, and send the resulting target avatar to the service platform 100, which then updates the target avatar into the three-dimensional virtual environment to share it with other clients.
It should be noted that the service platform 100 communicates with at least one client in the client cluster through a network, which may be wireless or wired. Wireless networks include, but are not limited to, cellular, wireless local area, infrared or Bluetooth networks; wired networks include, but are not limited to, Ethernet, universal serial bus (USB) or controller area networks. In one or more embodiments, data exchanged over the network (e.g., a target compression package) is represented using techniques and/or formats including Hypertext Markup Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN) or Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may be used instead of, or in addition to, the above.
The avatar generation system embodiment provided in this specification belongs to the same concept as the avatar generation method of one or more embodiments. The execution subject of the method may be the service platform 100 described above, or the terminal device corresponding to a client, as determined by the actual application environment. The implementation of the system embodiment is described in detail in the following method embodiment and is not repeated here.
Based on the scenario shown in FIG. 1, the avatar generation method provided in one or more embodiments of this specification is described in detail below.
Referring to FIG. 2, a schematic flow chart of an avatar generation method is provided for one or more embodiments of this specification. The method may be implemented by a computer program and may run on an avatar generation device based on the von Neumann architecture. The computer program may be integrated into an application or run as an independent tool application. The avatar generation device may be an electronic device such as a service platform or a terminal device.
Specifically, the avatar generation method includes the following steps:
S102: generating a first virtual three-dimensional avatar corresponding to a target object based on a target two-dimensional image of at least one reference angle of the target object.
The avatar generation method of one or more embodiments may be applied in scenarios including, but not limited to, the metaverse, augmented reality, virtual reality, visual special effects, or a combination of these. In such virtual scenarios the method generates, for a virtual object in the virtual space, a virtual three-dimensional avatar from two-dimensional images of a physical target object in the real world.
In some application scenarios, multiple target two-dimensional images of multiple reference angles of the target object are collected in the real physical world, and the execution subject generates the virtual three-dimensional avatar of the corresponding virtual object based on those images.
The target object is a physical object in real-world space, such as a user, an animal or a plant.
It can be understood that an image acquisition component (such as a camera) collects target two-dimensional images of different reference angles of the target object, and at least one avatar generation approach is used to generate the first virtual three-dimensional avatar of the virtual object corresponding to the target object.
Optionally, the avatar generation may use an avatar generation model, such as a neural radiance field (NeRF) model, which receives target two-dimensional images of multiple reference angles (such as RGB user images) and outputs the first virtual three-dimensional avatar, i.e. a 3D avatar, of the virtual object corresponding to the target object.
Optionally, the avatar generation may use a three-dimensional avatar engine, which extracts feature data from the target two-dimensional images of the several reference angles and synthesizes them to generate the corresponding first virtual three-dimensional avatar.
In one or more embodiments of this specification, taking the target object as a user as an example, the virtual three-dimensional avatar includes at least one of a three-dimensional face avatar and a three-dimensional torso avatar. When it includes the three-dimensional face avatar, the face avatar includes at least one of hair, eyes, eyebrows, nose, mouth, ears, face shape and makeup; when it includes the three-dimensional torso avatar, the torso avatar includes at least one of legs, feet, hands, arms, neck, chest, abdomen and buttocks, without limitation here.
Optionally, the angle values and the number of the reference angles may be preset based on the actual application scenario; neither is specifically limited here.
Optionally, the terminal device on the end side may collect target two-dimensional images of at least one reference angle of the target object and send them to the service platform, which generates the first virtual three-dimensional avatar; alternatively, the terminal device itself generates the first virtual three-dimensional avatar based on the target two-dimensional images of the several reference angles.
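Purely as an illustration, the following Python sketch makes the input/output contract of step S102 concrete: multi-angle 2D images in, one virtual 3D avatar out. The model interface (a `generate` method taking images and angles) is a hypothetical assumption and is not specified by this patent.

```python
# Minimal sketch of step S102 under an assumed model interface.
from typing import List, Sequence
import numpy as np

def generate_first_avatar(model, images: List[np.ndarray],
                          angles: Sequence[float]):
    """images: RGB arrays captured at the matching reference angles."""
    assert len(images) == len(angles), "one image per reference angle"
    # a NeRF-style generator consumes all views jointly (assumed call)
    return model.generate(images, angles)  # -> first virtual 3D avatar
```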
S104: acquiring a reference virtual three-dimensional image, and performing image quality evaluation processing on the first virtual three-dimensional image based on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain image evaluation information of the first virtual three-dimensional image;
The reference virtual three-dimensional image and the first virtual three-dimensional image form a Pair of Pair-wise image groups;
Illustratively, the reference avatar is not a reference avatar for the target object acquired at the current time, and in terms of the avatar generation time dimension, the reference avatar is not a three-dimensional avatar at the same time or in the same time period as the first avatar, and there is usually no association between the reference avatar and the first avatar.
Optionally, the reference virtual three-dimensional avatar is a pre-labeled avatar used as the reference for the current quality evaluation; the acquisition objects of the reference avatar and of the first avatar may be different target objects.
The reference avatar may be an avatar previously selected manually, or generated in advance based on an expert service.
Optionally, reference avatars of high quality may be screened or generated in advance and stored, for example those whose quality is greater than a first quality threshold. By comparing the feature difference between the reference avatar and the first avatar, the smaller the difference, the better the evaluated quality.
Optionally, reference avatars of low quality may instead be screened or generated in advance and stored, for example those whose quality is smaller than a second quality threshold. For a low-quality reference, the larger the feature difference between the two, the better the evaluated quality.
It can be appreciated that the reference avatar, whether of high or low quality, and the currently generated first avatar form a pair-wise group even though they belong to different objects. The pair assists the quality evaluation of the first avatar: quality can be measured along both the overall dimension and local dimensions of the avatar. Introducing a high-quality reference avatar into the evaluation yields avatar evaluation information that can then guide quality adjustment and optimization of the first avatar, helping to generate a high-quality target avatar.
In practical applications, evaluating the quality of the first virtual three-dimensional avatar directly is subject to considerable chance and is therefore unstable; this is difficult to overcome even when a machine learning model performs the evaluation. In one or more embodiments of this specification, introducing a reference virtual three-dimensional avatar enables pair-wise quality evaluation: quality is judged by comparing the feature difference between the two avatars, which greatly improves the accuracy of the evaluation.
In one or more embodiments of this specification, the avatar evaluation information may be an image local quality map, which feeds back a region quality evaluation (e.g., a score or rating) for at least one local avatar region of the first avatar; it may also be an avatar evaluation parameter, such as an overall avatar score or rating, that feeds back the overall quality of the first avatar.
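The two forms of evaluation information described above can be pictured with a small container type. This is an illustrative sketch only; the field names are assumptions, not structures defined by this patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AvatarEvaluation:
    # per-region quality scores of the local avatar regions
    # (the "image local quality map" M described above)
    local_quality_map: np.ndarray
    # overall avatar evaluation parameter q, e.g. an overall quality score
    overall_score: float
```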
S106: determining a target virtual three-dimensional avatar corresponding to the target object based on the avatar evaluation information and the first virtual three-dimensional avatar.
The target virtual three-dimensional avatar is the avatar obtained by performing second-stage optimization on the first virtual three-dimensional avatar.
In one possible implementation, low-quality regions of the first avatar are optimized in the second stage based on the image local quality map: the region to be optimized is determined from the region quality evaluation (e.g., score or rating) of the corresponding local region in the map, and local second-stage optimization is then performed on that region of the first avatar to obtain the target avatar.
In another possible implementation, low-quality parts of the first avatar are optimized in the same way: the part to be optimized is determined from the quality evaluation of the corresponding local part in the map, and second-stage optimization is performed on that part of the first avatar to obtain the target avatar.
After the first avatar is initially generated, the pair-wise group formed with the reference avatar is used for quality evaluation, yielding accurate evaluation information. That information feeds back which local parts are of poor quality, so second-stage optimization can target exactly those parts, taking both generation quality and generation efficiency into account.
In one or more embodiments of this specification, the electronic device generates the first virtual three-dimensional avatar based on target two-dimensional images of at least one reference angle of the target object. By introducing the acquired reference avatar, the device evaluates the first avatar and obtains avatar evaluation information, then adjusts the avatar based on that information and the first avatar to determine the target avatar corresponding to the target object. Generation quality is thus supervised against the reference avatar after the avatar has been generated, enabling accurate quality control, while the evaluation information helps obtain a high-quality target avatar. Both the quality and the efficiency of avatar generation are taken into account, ensuring the normal operation of avatar generation transactions.
Optionally, based on the above embodiments, generating the first virtual three-dimensional avatar corresponding to the target object from a target two-dimensional image of at least one reference angle may proceed as follows.
Specifically, the electronic device may input target two-dimensional images of at least one reference angle of the target object into an avatar generation model, which outputs the first virtual three-dimensional avatar corresponding to the target object.
The avatar generation model is generated by training an initial avatar generation model based on an image reconstruction supervision signal, an image edge supervision signal and a high-frequency information supervision signal.
Schematically, the avatar generation model is obtained by pre-training with these three supervision signals. Because edge supervision and high-frequency-information retention directly feed back the details of avatar generation, supervising the model in this way greatly optimizes the generation process and improves the quality of the generated virtual three-dimensional avatar.
In one possible embodiment, as shown in FIG. 3, which is a schematic diagram of the training flow of the avatar generation model provided in this specification, an initial avatar generation model is created and its model training process may include the following steps:
S202: acquiring a sample two-dimensional image of at least one reference sample angle of a sample object.
In the model training stage of the initial avatar generation model, several sample objects may be determined, and for any sample object, sample two-dimensional images of different reference sample angles are acquired.
It should be noted that the number and the angle values of the reference sample angles of each sample object are determined by the actual training scenario and are not limited here.
In one or more embodiments of this specification, the sample two-dimensional images of at least one reference sample angle serve as the input of the initial avatar generation model during training; the initial model may be built from one or more of the machine learning networks listed below.
S204: performing model training on the initial avatar generation model with the sample two-dimensional images; during training, the initial model generates a three-dimensional avatar from each group of sample two-dimensional images, obtaining a first virtual three-dimensional sample avatar.
The initial avatar generation model may be implemented based on one or more machine learning networks, including but not limited to a convolutional neural network (CNN) model, a deep neural network (DNN) model, a recurrent neural network (RNN) model, an embedding model, a gradient boosting decision tree (GBDT) model and a logistic regression (LR) model.
For example, the initial avatar generation model may be a neural radiance field model chosen from among machine-learning-based models. Different types of models can be used in different scenarios; the method is applicable to all types of avatar generation models.
During training, the initial model is trained over multiple rounds. Each round may be based on sample two-dimensional images of several reference sample angles of a sample object: the model extracts sample object features from each image and then generates a three-dimensional avatar from those features, obtaining the first virtual three-dimensional sample avatar.
Illustratively, the first virtual three-dimensional sample avatar can be understood as the avatar output by the initial model during each round of forward propagation.
S206: determining a first projection sample image of the first virtual three-dimensional sample avatar at the reference sample angle, acquiring first image edge information of the first projection sample image and sample image edge information of the sample two-dimensional image, and acquiring first image high-frequency information of the first projection sample image and sample image high-frequency information of the sample two-dimensional image.
It can be appreciated that in each round of training, the model loss is computed for the back-propagation algorithm, and the model parameters of the initial avatar generation model are adjusted based on that loss until the model satisfies the training-end condition, yielding the avatar generation model.
Further, in the back-propagation of each round, the first projection sample image of the first virtual three-dimensional sample avatar at the reference sample angle is determined: assuming the reference sample angle is theta, the projection of the sample avatar at angle theta is computed to obtain the first projection sample image. When there are several reference sample angles, there are correspondingly several first projection sample images.
The first image edge information is obtained by extracting image edge features of the first projection sample image through at least one network layer, and may be characterized as an image edge feature vector.
Illustratively, based on the first image edge information and the sample image edge information, edge information can be attached to the reconstructed 3D avatar projection in each training round to serve as the edge supervision signal.
The sample image edge information is obtained by extracting image edge features of the sample two-dimensional image through at least one network layer, and may likewise be characterized as an image edge feature vector.
The first image high-frequency information is obtained by extracting image high-frequency features of the first projection sample image through at least one network layer; the sample image high-frequency information is obtained by extracting image high-frequency features of the sample two-dimensional image through at least one network layer. Based on the two, high-frequency information can be attached to the reconstructed 3D avatar projection in each training round to serve as the high-frequency supervision signal.
S208: determining the image reconstruction supervision signal, the image edge supervision signal and the high-frequency information supervision signal based on the first projection sample image and the sample two-dimensional image, and adjusting the model parameters of the initial avatar generation model based on these signals to obtain the avatar generation model.
It will be appreciated that the model-loss back-propagation of the initial avatar generation model corresponds to the image reconstruction supervision signal, the image edge supervision signal and the high-frequency information supervision signal.
Illustratively, determining the three supervision signals from the first projection sample image and the sample two-dimensional image may proceed as follows:
an image reconstruction loss is determined with a third loss formula based on the first projection sample image and the sample two-dimensional image and used as the image reconstruction supervision signal; an image edge loss is determined with a fourth loss formula based on the first image edge information and the sample image edge information and used as the image edge supervision signal; and an image high-frequency loss is determined with a fifth loss formula based on the first image high-frequency information and the sample image high-frequency information and used as the high-frequency supervision signal.
Wherein the third loss formula is:
Loss_3 = || Proj_theta(NERF(x)) - x_theta ||
where Loss_3 is the image reconstruction loss, NERF(x) is the first virtual three-dimensional sample avatar, Proj_theta(NERF(x)) is the first projection sample image, and x_theta is the sample two-dimensional image.
Schematically, theta denotes the reference sample angle; NERF(x) denotes the 3D avatar generated by the model, i.e. the first virtual three-dimensional sample avatar; and Proj_theta(NERF(x)) denotes the projected (2D) image of that 3D avatar at angle theta.
Wherein the fourth loss formula is:
Loss_4 = || edge(Proj_theta(NERF(x))) - edge(x_theta) ||
where Loss_4 is the image edge loss, edge(Proj_theta(NERF(x))) is the first image edge information, and edge(x_theta) is the sample image edge information.
Illustratively, edge(x_theta) may be obtained by extracting edge information from the corresponding image with an edge detection operator in at least one network layer.
Wherein the fifth loss formula is:
Loss_5 = || DCT_hp(Proj_theta(NERF(x))) - DCT_hp(x_theta) ||
where Loss_5 is the image high-frequency loss, DCT_hp(Proj_theta(NERF(x))) is the first image high-frequency information, and DCT_hp(x_theta) is the sample image high-frequency information.
Illustratively, DCT_hp(·) denotes extracting the high-frequency part via a DCT transform through at least one network layer to obtain the high-frequency information of the corresponding image.
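As a concrete reading of the three supervision signals, the sketch below computes Loss_3, Loss_4 and Loss_5 for one projection/target pair. The choice of Sobel as the edge operator, a DCT high-pass obtained by zeroing the low-frequency coefficient block, and mean absolute error as the distance are all assumptions; the patent fixes only the structure of the losses, not these details.

```python
import numpy as np
from scipy.ndimage import sobel
from scipy.fft import dctn

def edge_map(img: np.ndarray) -> np.ndarray:
    # gradient magnitude as a simple edge operator (Sobel; assumed choice)
    return np.hypot(sobel(img, axis=0), sobel(img, axis=1))

def dct_highpass(img: np.ndarray, cutoff: int = 8) -> np.ndarray:
    # DCT_hp: keep only high-frequency DCT coefficients
    coeffs = dctn(img, norm="ortho")
    coeffs[:cutoff, :cutoff] = 0.0  # drop the low-frequency block
    return coeffs

def supervision_losses(proj: np.ndarray, target: np.ndarray):
    """proj: Proj_theta(NERF(x)); target: x_theta (grayscale, same shape)."""
    loss3 = np.mean(np.abs(proj - target))                              # reconstruction
    loss4 = np.mean(np.abs(edge_map(proj) - edge_map(target)))          # edge
    loss5 = np.mean(np.abs(dct_highpass(proj) - dct_highpass(target)))  # high-frequency
    return loss3, loss4, loss5
```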
In one or more embodiments of this specification, the avatar evaluation information includes an image local quality map and an avatar evaluation parameter. Performing the quality evaluation processing on the first avatar based on the reference avatar and the first avatar may proceed as follows:
the electronic device performs image difference comparison processing on the reference virtual three-dimensional avatar and the first virtual three-dimensional avatar to obtain the image local quality map and the avatar evaluation parameter.
Illustratively, the image local quality map feeds back the region quality evaluation (e.g., a score or rating) of at least one local region of the avatar; it can be understood as being composed of several local image regions together with the score or rating of each region.
Illustratively, the avatar evaluation parameter evaluates the overall quality of the first avatar, such as an overall avatar score or rating.
In one possible implementation, a combined quality evaluation model may be trained in advance and used to perform the quality evaluation processing on the first avatar, outputting the image local quality map and the avatar evaluation parameter.
Specifically, the electronic device may take the reference avatar and the first avatar as a three-dimensional avatar pair, understood as a pair-wise avatar group, and input the pair into the combined quality evaluation model.
Through the combined quality evaluation model, the electronic device extracts the first avatar feature of the reference virtual three-dimensional avatar and the second avatar feature of the first virtual three-dimensional avatar, performs image difference comparison processing based on the two features, and outputs the image local quality map and the avatar evaluation parameter.
The first avatar feature is the feature obtained by feature extraction on the reference virtual three-dimensional avatar;
the second avatar feature is the feature obtained by feature extraction on the first virtual three-dimensional avatar.
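A minimal PyTorch sketch of such a pair-wise evaluator is given below, assuming the two avatars are compared via rendered image tensors. The architecture (a shared convolutional encoder, channel concatenation as the fusion step, a 1x1-convolution head for the local quality map M and a pooled linear head for the overall score q) is an assumption for illustration; the patent only requires an image feature encoding network plus a feature fusion network.

```python
import torch
import torch.nn as nn

class PairwiseQualityModel(nn.Module):
    """Hypothetical combined quality evaluation model (layer sizes assumed)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # image feature encoding network
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.map_head = nn.Conv2d(128, 1, 1)   # local quality map M
        self.score_head = nn.Linear(128, 1)    # overall evaluation parameter q

    def forward(self, first_avatar: torch.Tensor, reference: torch.Tensor):
        f1 = self.encoder(reference)           # first avatar feature f1
        f2 = self.encoder(first_avatar)        # second avatar feature f2
        fused = torch.cat([f1, f2], dim=1)     # feature fusion: compare the pair
        m = self.map_head(fused)               # per-region quality map
        q = self.score_head(fused.mean(dim=(2, 3)))  # pooled overall score
        return m, q
```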
Optionally, an initial combined quality evaluation model may be created in advance, and its model training process may include the following steps:
B2: acquiring at least one group of three-dimensional avatar sample pairs for the initial combined quality evaluation model, each pair comprising a first virtual three-dimensional sample avatar and a reference virtual three-dimensional sample avatar; the reference sample avatar carries a local quality map label and an avatar evaluation parameter label, and the initial combined quality evaluation model includes at least an image feature encoding network and a feature fusion network.
The initial combined quality evaluation model may be built from one or more of the machine learning models.
In the model training phase, a large number of pre-selected avatars may be used as reference avatars. Typically these are high-quality avatars selected manually in advance, and in some embodiments they may include the reference avatars actually used online.
Further, when labeling the reference virtual three-dimensional sample avatar, an expert service can be invoked to manually produce the corresponding local quality map as the local quality map label, and to set the avatar evaluation parameter (such as an overall score) as the avatar evaluation parameter label.
The first virtual three-dimensional sample avatar is an avatar generated by the avatar generation model described above and is used as a sample of the initial combined quality evaluation model in its training phase.
Illustratively, the initial combined quality evaluation model includes at least an image feature encoding network and a feature fusion network;
the image feature encoding network may be a feature encoder;
the feature fusion network may be a feature fuser.
B4: inputting the three-dimensional avatar sample pairs into the initial combined quality evaluation model for model training; during training, the image feature encoding network extracts the first avatar sample feature of the reference virtual three-dimensional sample avatar and the second avatar sample feature of the first virtual three-dimensional sample avatar, and the feature fusion network produces, from the two features, a sample image local quality map and a sample avatar evaluation parameter.
Optionally, the image feature encoding network may be divided into a first encoding network, which extracts the first avatar sample feature of the reference sample avatar, and a second encoding network, which extracts the second avatar sample feature of the first sample avatar.
It can be understood that in the forward propagation of training, the encoding network extracts the first avatar sample feature f1 of the reference sample avatar and the second avatar sample feature f2 of the first sample avatar, and the feature fusion network then performs image difference comparison processing on f1 and f2 to obtain the sample image local quality map M and the sample avatar evaluation parameter q.
The sample image local quality map feeds back the region quality evaluation (e.g., a score or rating) of at least one local region of the sample avatar; it can be understood as being composed of several local image regions together with the score or rating of each region.
The sample avatar evaluation parameter evaluates the overall quality of the first virtual three-dimensional sample avatar, such as an overall avatar score or rating.
B6: determining a first model loss based on the sample image local quality map, the sample avatar evaluation parameter, the local quality map label and the avatar evaluation parameter label, and adjusting the model parameters of the initial combined quality evaluation model with that loss to obtain the combined quality evaluation model.
It can be appreciated that in the back-propagation of training, the first model loss is determined from the above four quantities and used to adjust the model parameters until the initial model satisfies the training-end condition, yielding the combined quality evaluation model.
In one or more embodiments of this specification, the training-end condition may include, for example, the value of the loss function being less than or equal to a preset loss-function threshold, or the number of iterations reaching a preset threshold. The specific condition is determined by the actual situation and is not limited here.
Illustratively, determining the first model loss from the sample image local quality map, the sample avatar evaluation parameter, the local quality map label and the avatar evaluation parameter label may proceed as follows:
the electronic device obtains an overall quality regression loss with a first loss formula based on the sample avatar evaluation parameter and its label, and obtains a local quality regression loss with a second loss formula based on the sample image local quality map and its label;
the electronic device then obtains the first model loss from the overall quality regression loss and the local quality regression loss.
Wherein the first loss formula is:
L1 = || q - q_GT ||
where L1 is the overall quality regression loss, q is the sample avatar evaluation parameter, and q_GT is the avatar evaluation parameter label.
Wherein the second loss formula is:
L2 = || M - M_GT ||
where L2 is the local quality regression loss, M is the sample image local quality map, and M_GT is the local quality map label.
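Under the assumption that both regressions use a mean-squared error and that the first model loss is their unweighted sum (the patent fixes neither the norm nor the weighting), the two losses reduce to a few lines:

```python
import torch
import torch.nn.functional as F

def first_model_loss(q: torch.Tensor, q_gt: torch.Tensor,
                     m: torch.Tensor, m_gt: torch.Tensor) -> torch.Tensor:
    l1 = F.mse_loss(q, q_gt)   # overall quality regression loss L1
    l2 = F.mse_loss(m, m_gt)   # local quality regression loss L2
    return l1 + l2             # first model loss (equal weighting assumed)
```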
In the above process, a pair-wise combined quality evaluation model is trained, improving on related evaluation methods: the model receives both the avatar generated in the previous stage and a reference avatar as input and, by comparing the difference between the two, outputs a local quality map together with an overall avatar evaluation parameter (such as a quality score). Such deep, fine-grained quality evaluation provides a solid basis for the optimization of the subsequent stage.
Illustratively, in one or more embodiments of this specification, FIG. 4 is a schematic flow chart of an exemplary target avatar determination process. Optionally, based on the above embodiments, determining the target virtual three-dimensional avatar corresponding to the target object from the avatar evaluation information and the first avatar may include the following steps:
S302: determining the avatar evaluation parameter in the avatar evaluation information.
In one or more embodiments of this specification, the avatar evaluation information may include the avatar evaluation parameter (e.g., an overall score).
S304: if the avatar evaluation parameter is greater than or equal to an evaluation parameter threshold, taking the first virtual three-dimensional avatar as the target virtual three-dimensional avatar of the target object.
The evaluation parameter threshold is a threshold or critical value set for the avatar evaluation parameter. In some embodiments the parameter is greater than or equal to the threshold; the avatar generated in the first stage then meets the output expectation, its quality meets the output requirement, and it can be output directly as the final target avatar. In other embodiments the parameter is smaller than the threshold; the first-stage avatar then usually does not meet the output expectation, its quality does not meet the requirement, and second-stage local tuning must be performed on it to generate the final target avatar.
S306: if the avatar evaluation parameter is smaller than the evaluation parameter threshold, performing local avatar adjustment processing on the first virtual three-dimensional avatar to obtain a second virtual three-dimensional avatar, and taking the second avatar as the target virtual three-dimensional avatar of the target object.
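The branch in S304/S306 amounts to a simple threshold test, sketched below. This is illustrative only; tune_avatar stands for the avatar tuning model described next and is a hypothetical callable, and the evaluation container is the one sketched earlier.

```python
def determine_target_avatar(first_avatar, evaluation, threshold, tune_avatar):
    """evaluation: the AvatarEvaluation container sketched earlier."""
    if evaluation.overall_score >= threshold:
        return first_avatar  # S304: quality meets the output requirement
    # S306: second-stage local tuning guided by the local quality map
    return tune_avatar(first_avatar, evaluation.local_quality_map)
```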
In one possible embodiment, it may be:
The electronic equipment determines an image local quality map in the image evaluation information, and performs image local adjustment processing on the first virtual three-dimensional image based on the image local quality map to obtain a second virtual three-dimensional image.
Specifically, the first virtual three-dimensional image is input into a virtual image tuning model, the image local quality map is called through the virtual image tuning model to perform image local tuning processing on a target local area of the first virtual three-dimensional image, and a second virtual three-dimensional image is output.
In one possible implementation, the avatar tuning model is controlled to perform two-stage avatar optimization on low-quality areas and/or low-quality parts in the first virtual three-dimensional image based on the image local quality map; namely, the area or part to be optimized is determined based on the region quality evaluation (such as a score or grade) of the corresponding local avatar area and/or local avatar part in the image local quality map, and local two-stage optimization is performed on the area or part to be optimized in the first virtual three-dimensional image to obtain the target virtual three-dimensional image, as sketched below.
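As an illustration of this routing and region-selection logic, a hypothetical Python sketch follows; the threshold values, the dict-based representation of the local quality map, and the tune_regions callable are assumptions for illustration, not part of the patent.

```python
from typing import Callable, Dict

def select_target_avatar(avatar,
                         eval_score: float,
                         local_quality_map: Dict[str, float],
                         tune_regions: Callable,
                         score_threshold: float = 0.8,
                         region_threshold: float = 0.6):
    """Route a one-stage avatar either to direct output or to two-stage tuning.

    local_quality_map maps a region/part name (e.g. "face", "hand") to its
    region quality score; tune_regions stands in for the avatar tuning model.
    """
    if eval_score >= score_threshold:
        return avatar  # the one-stage avatar already meets the output expectation
    # collect regions or parts whose local quality falls below the threshold
    to_tune = [r for r, s in local_quality_map.items() if s < region_threshold]
    return tune_regions(avatar, to_tune)  # two-stage local tuning
```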
Optionally, an initial avatar tuning model may be created in advance, and a model training process of the initial avatar tuning model is as follows:
A2: acquiring at least one first virtual three-dimensional sample image aiming at an initial virtual image tuning model, wherein the first virtual three-dimensional sample image corresponds to a sample image local quality map and a sample image evaluation parameter;
The initial avatar tuning model may be created based on one machine learning model or a combination of several machine learning models.
Optionally, the training sample data of the initial avatar tuning model is the output of the avatar generation model, which is used as the first virtual three-dimensional sample image of this stage; further, the first virtual three-dimensional sample image is an image whose image evaluation parameter is smaller than the evaluation parameter threshold.
Further, when the first virtual three-dimensional sample image is obtained, the image local quality map of the first virtual three-dimensional sample image from the previous stage is also obtained as the sample image local quality map, and the image evaluation parameter of the first virtual three-dimensional sample image is obtained as the sample image evaluation parameter. Optionally, an expert service can be introduced to correct the sample image local quality map to obtain an accurate map, and to correct the sample image evaluation parameter to obtain an accurate parameter.
A4: inputting the first virtual three-dimensional sample image into the initial virtual image tuning model for model training;
A6: in the model training process, calling the sample image local quality map through the initial virtual image tuning model to perform image local adjustment processing on a target local sample area of the first virtual three-dimensional sample image to obtain a second virtual three-dimensional sample image, and calling the combined quality evaluation model to determine a second sample image local quality map and a second sample image evaluation parameter of the second virtual three-dimensional sample image;
in one or more embodiments of the present disclosure, the model training process of the initial avatar tuning model corresponds to a multi-round local tuning process, and the combined quality evaluation model is recalled for re-evaluation for the second virtual three-dimensional sample image obtained after each round of local tuning, so as to obtain a new second sample image local quality map and a second sample image evaluation parameter.
A8: and carrying out loss maximization detection processing on the local quality map of the second sample image and the image evaluation parameter of the second sample image to obtain a maximization detection result, and carrying out model parameter adjustment on the initial virtual image tuning model based on the maximization detection result to obtain the virtual image tuning model.
In the multi-round local tuning process, the maximization detection processing is adopted as the model tuning target for back-propagation training of the model. The maximization detection processing may be to set a preset number of tuning rounds and perform local tuning multiple times within that range, so that the determined image local quality map and image evaluation parameter converge to a maximum extremum. Alternatively, the maximization detection processing may be to set reference thresholds, for example setting a threshold map for the image local quality map, monitoring that after multiple rounds of local tuning the local scores of the image local quality map of the second virtual three-dimensional sample image are greater than the local threshold scores in the threshold map, and monitoring that the image evaluation parameter of the second virtual three-dimensional sample image after multiple rounds of local tuning is greater than the threshold evaluation parameter.
In a possible implementation manner, the performing a loss maximization detection process on the second sample image local quality map and the second sample image evaluation parameter to obtain a maximization detection result may be:
presetting a sixth loss calculation formula;
calculating the image tuning loss through the sixth loss calculation formula, and carrying out loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter based on the image tuning loss to obtain a maximization detection result;
If the maximization detection result is of a loss normal type, taking the second virtual three-dimensional sample image as the first virtual three-dimensional sample image, and executing again the steps of calling the sample image local quality map through the initial virtual image tuning model to perform image local adjustment processing on a target local sample area of the first virtual three-dimensional sample image to obtain a second virtual three-dimensional sample image, calling the combined quality evaluation model to determine a second sample image local quality map and a second sample image evaluation parameter of the second virtual three-dimensional sample image, and performing loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter to obtain the maximization detection result;
And if the maximization detection result is of a loss maximization type, taking the initial avatar tuning model as the avatar tuning model;
wherein the sixth loss calculation formula satisfies the following formula:
Loss_refine=maximum(q)+maximum(M)
The Loss_refine is the avatar tuning loss, the maximum(q) is an image evaluation parameter maximizing function, the q is the second sample image evaluation parameter, the maximum(M) is a local quality map maximizing function, and the M is the second sample image local quality map.
Schematically, the loss normal type means that the image tuning loss has not reached its maximum value; in this case, the maximization detection result is determined to be of the loss normal type.
Schematically, the model training process generally involves multiple rounds of local tuning. Each round of local tuning uses the sixth loss calculation formula to compute the image tuning loss as the model tuning target and determines whether the image tuning loss reaches its maximum value. If not, namely the image tuning loss does not reach the maximum value, the maximization detection result is determined to be of the loss normal type; model parameters are then adjusted based on the image tuning loss, local tuning continues to be performed on the virtual three-dimensional sample image by the initial virtual image tuning model with the adjusted parameters, the image tuning loss continues to be calculated by the sixth loss calculation formula, and so on. If the image tuning loss reaches the maximum value, the maximization detection result is determined to be of the loss maximization type, and the initial virtual image tuning model is taken as the virtual image tuning model;
The maximization detection processing may be to set a preset number of tuning rounds and perform local tuning multiple times within that range, determining the image local quality map and the image evaluation parameter so as to calculate the image tuning loss and make it converge to a maximum extremum. If the image tuning loss does not converge to the maximum extremum, the maximization detection result is determined to be of the loss normal type and local tuning continues; if the image tuning loss converges to the maximum extremum, the maximization detection result is determined to be of the loss maximization type, and the initial avatar tuning model is taken as the avatar tuning model.
Illustratively, the maximization detection processing may also be to set reference thresholds, for example setting a threshold map for the image local quality map and a threshold evaluation parameter for the image evaluation parameter, and monitoring the image local quality map and the image evaluation parameter of the second virtual three-dimensional sample image after multiple rounds of local tuning. If the local scores of the current image local quality map are greater than or equal to the local threshold scores in the threshold map and the image evaluation parameter is greater than or equal to the threshold evaluation parameter, the maximization detection result is determined to be of the loss maximization type, and the initial virtual image tuning model is taken as the virtual image tuning model; if a local score of the current image local quality map is smaller than the local threshold score in the threshold map and/or the image evaluation parameter is smaller than the threshold evaluation parameter, the maximization detection result is determined to be of the loss normal type, and local tuning continues, as in the sketch below.
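Combining the round-count and threshold variants described above, the multi-round tuning loop might be sketched as follows; the tuner and evaluator callables, the thresholds, and the dict-valued quality map are illustrative assumptions.

```python
def train_tuning_model(tuner, evaluator, avatar, quality_map,
                       max_rounds=10, score_threshold=0.9, region_threshold=0.8):
    """Multi-round local tuning with loss-maximization detection.

    tuner(avatar, quality_map) -> locally tuned avatar
    evaluator(avatar)          -> (new local quality map, new evaluation score q)
    """
    for _ in range(max_rounds):              # preset number of tuning rounds
        avatar = tuner(avatar, quality_map)  # one round of local tuning
        quality_map, q = evaluator(avatar)   # re-evaluate with the combined model
        # sixth loss: the objective a full implementation would drive upward
        # by back-propagating through the tuning model each round
        loss_refine = q + sum(quality_map.values())
        # threshold variant of maximization detection: loss-maximization type
        # once the overall score and every local score clear their thresholds
        if q >= score_threshold and all(s >= region_threshold
                                        for s in quality_map.values()):
            break                            # otherwise: loss normal type, continue
    return tuner, avatar
```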
In the model training process, model training is carried out on the initial avatar tuning model until the initial avatar tuning model meets the training end condition, so as to obtain the avatar tuning model.
In one or more embodiments of the present disclosure, the training end condition may include, for example, the value of the loss function satisfying a preset loss function threshold, the number of iterations reaching a preset number threshold, and so on. The specific training end condition may be determined based on actual conditions and is not specifically limited herein.
In the two-stage optimization link, two-stage optimization is performed on the virtual three-dimensional image whose overall quality in the previous step does not meet the requirement. The basis of the optimization is the generated image local quality map: the parts or local areas with poor quality in the local quality map are optimized in a targeted manner at this stage, thereby improving the quality of the virtual three-dimensional image.
The image generating apparatus provided in the present specification will be described in detail below with reference to fig. 5. The image generating apparatus shown in fig. 5 is used to perform the methods of the embodiments shown in figs. 1 to 4 of the present specification; for convenience of explanation, only the portions relevant to the present specification are shown, and for the specific technical details that are not disclosed, please refer to the embodiments shown in figs. 1 to 4 of the present specification.
Referring to fig. 5, a schematic structural diagram of the image generating apparatus of the present specification is shown. The image generating apparatus 1 may be implemented as all or a part of a user terminal by software, hardware or a combination of both. According to some embodiments, the image generating apparatus 1 includes an image generation module 11, a quality evaluation module 12 and an image optimization module 13, which are specifically configured to:
The image generation module 11 is used for generating a first virtual three-dimensional image corresponding to a target object based on a target two-dimensional image of at least one reference angle of the target object;
A quality evaluation module 12, configured to acquire a reference virtual three-dimensional image, and perform image quality evaluation processing on the first virtual three-dimensional image based on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain image evaluation information of the first virtual three-dimensional image;
And the image optimization module 13 is used for determining a target three-dimensional image corresponding to the target object based on the image evaluation information and the first three-dimensional image.
Optionally, the image evaluation information includes an image local quality map and an image evaluation parameter, and the quality evaluation module 12 is configured to:
and performing image difference comparison processing on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain an image local quality map and an image evaluation parameter.
Optionally, the quality evaluation module 12 is configured to:
Taking the reference three-dimensional avatar and the first three-dimensional avatar as a three-dimensional avatar pair, and inputting the three-dimensional avatar pair into a combined quality evaluation model;
And respectively extracting a first avatar characteristic of the reference three-dimensional avatar and a second avatar characteristic of the first three-dimensional avatar by the combined quality evaluation model, and carrying out image difference comparison processing based on the first avatar characteristic and the second avatar characteristic to output an image local quality map and an image evaluation parameter.
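A minimal PyTorch sketch of such a pairwise evaluation model is given below; it assumes the two avatars are compared through rendered 2-D views and that the encoder, fusion network, and output heads take the simple forms shown, none of which are fixed by the patent.

```python
import torch
import torch.nn as nn

class CombinedQualityModel(nn.Module):
    """Pairwise quality evaluation: shared encoder, feature fusion, two heads."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # image feature coding network (architecture is illustrative)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        self.fusion = nn.Conv2d(2 * feat_dim, feat_dim, 1)  # feature fusion network
        self.map_head = nn.Conv2d(feat_dim, 1, 1)           # image local quality map M
        self.score_head = nn.Linear(feat_dim, 1)            # image evaluation parameter q

    def forward(self, reference, candidate):
        f_ref = self.encoder(reference)   # first avatar feature
        f_cand = self.encoder(candidate)  # second avatar feature
        fused = torch.relu(self.fusion(torch.cat([f_ref, f_cand], dim=1)))
        local_map = self.map_head(fused)                 # per-region quality
        score = self.score_head(fused.mean(dim=(2, 3)))  # overall quality score
        return local_map, score
```

Sharing one encoder for both inputs is a common Siamese design choice; the patent only requires that both avatar features come from the image feature coding network.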
Optionally, the quality evaluation module 12 is configured to:
acquiring at least one group of three-dimensional virtual image sample pairs aiming at an initial combined quality evaluation model, wherein the three-dimensional virtual image sample pairs comprise a first virtual three-dimensional sample image and a reference virtual three-dimensional sample image, the reference virtual three-dimensional sample image corresponds to a local quality map label and an image evaluation parameter label, and the initial combined quality evaluation model at least comprises an image feature coding network and a feature fusion network;
Inputting the three-dimensional virtual image sample pair into the initial combined quality evaluation model for model training; in the model training process, a first virtual image sample characteristic of the reference virtual three-dimensional sample image and a second virtual image sample characteristic of the first virtual three-dimensional sample image are respectively extracted through the image feature coding network, and a sample image local quality map and a sample image evaluation parameter are obtained by adopting the feature fusion network based on the first virtual image sample characteristic and the second virtual image sample characteristic;
Determining a first model loss based on the sample image local quality map, the sample image evaluation parameter, the local quality map label and the image evaluation parameter label, and performing model parameter adjustment on the initial combined quality evaluation model by adopting the first model loss to obtain a combined quality evaluation model.
Optionally, the quality evaluation module 12 is configured to:
Obtaining overall quality regression loss by adopting a first loss calculation formula based on the image evaluation parameters of the sample image and the image evaluation parameter labels, and obtaining local quality regression loss by adopting a second loss calculation formula based on the local quality map of the sample image and the local quality map labels;
Obtaining a first model loss based on the overall quality regression loss and the local quality regression loss;
wherein the first loss calculation formula satisfies the following formula:
L1=||q−q_GT||²
The L1 is the overall quality regression loss, the q is the image evaluation parameter of the sample image, and the q_GT is the image evaluation parameter label;
wherein the second loss calculation formula satisfies the following formula:
L2=||M−M_GT||²
The L2 is the local quality regression loss, the M is the local quality map of the sample image, and the M_GT is the local quality map label.
Optionally, the image generating module 11 is configured to:
Inputting a target two-dimensional image of at least one reference angle of a target object into an avatar generation model, and outputting a first avatar corresponding to the target object;
The avatar generation model is generated by training an initial avatar generation model based on an image reconstruction supervisory signal, an image edge supervisory signal and a high-frequency information supervisory signal.
Optionally, the image generating module 11 is configured to:
acquiring a sample two-dimensional image of at least one reference sample angle of a sample object;
Performing model training on the initial virtual image generation model by adopting the sample two-dimensional images, and performing three-dimensional virtual image generation on each sample two-dimensional image through the initial virtual image generation model in the model training process to obtain a first virtual three-dimensional sample image;
Determining a first projection sample image of the first virtual three-dimensional sample image at the reference sample angle, acquiring first image edge information of the first projection sample image and sample image edge information of the sample two-dimensional image, and acquiring first image high-frequency information of the first projection sample image and sample image high-frequency information of the sample two-dimensional image;
And determining the image reconstruction monitoring signal, the image edge monitoring signal and the high-frequency information monitoring signal based on the first projection sample image and the sample two-dimensional image, and performing model parameter adjustment on an initial avatar generation model based on the image reconstruction monitoring signal, the image edge monitoring signal and the high-frequency information monitoring signal to obtain an avatar generation model.
Optionally, the image generating module 11 is configured to:
Determining an image reconstruction loss by adopting a third loss calculation formula based on the first projection sample image and the sample two-dimensional image, taking the image reconstruction loss as an image reconstruction monitor signal, determining an image edge loss by adopting a fourth loss calculation formula based on the first image edge information and the sample image edge information, taking the image edge loss as an image edge monitor signal, determining an image high-frequency loss by adopting a fifth loss calculation formula based on the first image high-frequency information and the sample image high-frequency information, and taking the image high-frequency loss as an image high-frequency monitor signal;
Wherein the third loss calculation formula satisfies the following formula:
Loss3=||Proj_theta(NERF(x))−x_theta||²
The Loss3 is the image reconstruction loss, the NERF(x) is the first virtual three-dimensional sample image, the Proj_theta(NERF(x)) is the first projection sample image, and the x_theta is the sample two-dimensional image;
wherein the fourth loss calculation formula satisfies the following formula:
Loss4=||edge(Proj_theta(NERF(x)))−edge(x_theta)||²
The Loss4 is the image edge loss, the edge(Proj_theta(NERF(x))) is the first image edge information, and the edge(x_theta) is the sample image edge information;
wherein the fifth loss calculation formula satisfies the following formula:
Loss5=||DCT_hp(Proj_theta(NERF(x)))−DCT_hp(x_theta)||²
The Loss5 is the image high-frequency loss, the DCT_hp(Proj_theta(NERF(x))) is the first image high-frequency information, and the DCT_hp(x_theta) is the sample image high-frequency information.
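The three supervision signals could be sketched as follows, assuming squared-error distances, a Sobel operator for the edge information, and a DCT high-pass (via SciPy) for the high-frequency information; the operators and the keep fraction are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Sobel edge magnitude of a 2-D grayscale image."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = sum(kx[i, j] * pad[i:i + h, j:j + w] for i in range(3) for j in range(3))
    gy = sum(ky[i, j] * pad[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return np.hypot(gx, gy)

def high_frequency(img: np.ndarray, keep: float = 0.25) -> np.ndarray:
    """DCT high-pass: zero out the low-frequency corner of the spectrum."""
    coeff = dctn(img, norm="ortho")
    h, w = img.shape
    coeff[: int(h * keep), : int(w * keep)] = 0.0  # drop low frequencies
    return coeff

def supervision_losses(projection: np.ndarray, target: np.ndarray):
    """Loss3/Loss4/Loss5 between a projected sample image and the 2-D sample."""
    loss3 = np.mean((projection - target) ** 2)                            # reconstruction
    loss4 = np.mean((sobel_edges(projection) - sobel_edges(target)) ** 2)  # edge
    loss5 = np.mean((high_frequency(projection)
                     - high_frequency(target)) ** 2)                       # high-frequency
    return loss3, loss4, loss5
```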
Optionally, the image optimization module 13 is configured to:
Determining image evaluation parameters in the image evaluation information;
if the image evaluation parameter is greater than or equal to an evaluation parameter threshold, taking the first virtual three-dimensional image as a target virtual three-dimensional image of the target object;
And if the image evaluation parameter is smaller than the evaluation parameter threshold, performing image local adjustment processing on the first virtual three-dimensional image to obtain a second virtual three-dimensional image, and taking the second virtual three-dimensional image as the target virtual three-dimensional image of the target object.
Optionally, the image optimization module 13 is configured to:
and determining an image local quality map in the image evaluation information, and performing image local adjustment processing on the first virtual three-dimensional image based on the image local quality map to obtain a second virtual three-dimensional image.
Optionally, the image optimization module 13 is configured to:
inputting the first virtual three-dimensional image into a virtual image tuning model, calling the image local quality map through the virtual image tuning model to perform image local tuning processing on a target local area of the first virtual three-dimensional image, and outputting a second virtual three-dimensional image.
Optionally, the image optimization module 13 is configured to:
Acquiring at least one first virtual three-dimensional sample image aiming at an initial virtual image tuning model, wherein the first virtual three-dimensional sample image corresponds to a sample image local quality map and a sample image evaluating parameter;
Inputting the first virtual three-dimensional sample image into the initial virtual image tuning model for model training;
In the model training process, calling the sample image local quality map through the initial virtual image tuning model to perform image local adjustment processing on a target local sample area of the first virtual three-dimensional sample image to obtain a second virtual three-dimensional sample image, and calling the combined quality evaluation model to determine a second sample image local quality map and a second sample image evaluation parameter of the second virtual three-dimensional sample image;
and carrying out loss maximization detection processing on the local quality map of the second sample image and the image evaluation parameter of the second sample image to obtain a maximization detection result, and carrying out model parameter adjustment on the initial virtual image tuning model based on the maximization detection result to obtain the virtual image tuning model.
Optionally, the image optimization module 13 is configured to:
calculating the image tuning loss through the sixth loss calculation formula, and carrying out loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter based on the image tuning loss to obtain a maximization detection result;
If the maximization detection result is of a loss normal type, taking the second virtual three-dimensional sample image as the first virtual three-dimensional sample image, and executing again the steps of calling the sample image local quality map through the initial virtual image tuning model to perform image local adjustment processing on a target local sample area of the first virtual three-dimensional sample image to obtain a second virtual three-dimensional sample image, calling the combined quality evaluation model to determine a second sample image local quality map and a second sample image evaluation parameter of the second virtual three-dimensional sample image, and performing loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter to obtain the maximization detection result;
if the maximization detection result is of a loss maximization type, the initial avatar tuning model is used as the avatar tuning model;
Wherein the sixth loss calculation satisfies the following formula:
Loss_refine=maximum(q)+maximum(M)
The Loss_refine is the avatar tuning loss, the maximum(q) is an image evaluation parameter maximizing function, the q is the second sample image evaluation parameter, the maximum(M) is a local quality map maximizing function, and the M is the second sample image local quality map.
It should be noted that, when the image generating apparatus provided in the above embodiments performs the image generating method, the division of the above functional modules is only used for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the image generating apparatus and the image generating method provided in the foregoing embodiments belong to the same concept, and the detailed implementation process is embodied in the method embodiments, which is not described herein again.
The foregoing description is provided for the purpose of illustration only and does not represent the advantages or disadvantages of the embodiments.
The electronic equipment generates a first virtual three-dimensional image based on a target two-dimensional image of at least one reference angle of a target object. By introducing the acquired reference virtual three-dimensional image, image evaluation of the first virtual three-dimensional image can be realized to obtain image evaluation information; image tuning is then performed based on the image evaluation information and the first virtual three-dimensional image, so that the target virtual three-dimensional image corresponding to the target object can be determined.
The present disclosure further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are adapted to be loaded by a processor and execute the image generating method according to the embodiment shown in fig. 1 to fig. 4, and the specific execution process may refer to the specific description of the embodiment shown in fig. 1 to fig. 4, which is not repeated herein.
The present disclosure further provides a computer program product storing at least one instruction, where the at least one instruction is loaded and executed by a processor to perform the image generating method of the embodiments shown in fig. 1 to fig. 4; the specific execution process may refer to the specific description of the embodiments shown in fig. 1 to fig. 4, which is not repeated herein.
Referring to fig. 6, a block diagram of an electronic device according to an exemplary embodiment of the present disclosure is shown. The electronic device in this specification may include one or more of the following: processor 110, memory 120, input device 130, output device 140, and bus 150. The processor 110, the memory 120, the input device 130, and the output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 connects various portions of the overall electronic device using various interfaces and lines, and performs various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and invoking data stored in the memory 120. Alternatively, the processor 110 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 110 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs and the like; the GPU is responsible for rendering and drawing display content; the modem is used to handle wireless communications. It will be appreciated that the modem may also not be integrated into the processor 110 and may be implemented by a single communication chip alone.
The memory 120 may include a random access memory (RAM) or a read-only memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system (which may be an Android system, including a system developed in depth on the basis of the Android system, an IOS system developed by Apple Inc., including a system developed in depth on the basis of the IOS system, or another system), instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described below, and the like. The stored data area may also store data created by the electronic device in use, such as phonebooks, audiovisual data, and chat log data.
Referring to FIG. 7, the memory 120 may be divided into an operating system space in which the operating system is running and a user space in which native and third party applications are running. In order to ensure that different third party application programs can achieve better operation effects, the operating system allocates corresponding system resources for the different third party application programs. However, the requirements of different application scenarios in the same third party application program on system resources are different, for example, under the local resource loading scenario, the third party application program has higher requirement on the disk reading speed; in the animation rendering scene, the third party application program has higher requirements on the GPU performance. The operating system and the third party application program are mutually independent, and the operating system often cannot timely sense the current application scene of the third party application program, so that the operating system cannot perform targeted system resource adaptation according to the specific application scene of the third party application program.
In order to enable the operating system to distinguish specific application scenes of the third-party application program, data communication between the third-party application program and the operating system needs to be communicated, so that the operating system can acquire current scene information of the third-party application program at any time, and targeted system resource adaptation is performed based on the current scene.
Taking an operating system as an Android system as an example, as shown in fig. 8, the programs and data stored in the memory 120 may be divided into a Linux kernel layer 320, a system runtime library layer 340, an application framework layer 360 and an application layer 380, where the Linux kernel layer 320, the system runtime library layer 340 and the application framework layer 360 belong to the operating system space, and the application layer 380 belongs to the user space. The Linux kernel layer 320 provides the underlying drivers for various hardware of the electronic device, such as display drivers, audio drivers, camera drivers, Bluetooth drivers, Wi-Fi drivers, and power management. The system runtime library layer 340 provides the main feature support for the Android system through C/C++ libraries; for example, the SQLite library provides support for databases, the OpenGL/ES library provides support for 3D graphics, and the Webkit library provides support for browser kernels. Also provided in the system runtime library layer 340 is an Android runtime library (Android runtime), which mainly provides some core libraries that allow developers to write Android applications in the Java language. The application framework layer 360 provides various APIs that may be used in building applications, such as activity management, window management, view management, notification management, content providers, package management, call management, resource management, and location management; developers can also build their own applications by using these APIs. At least one application program runs in the application layer 380; these application programs may be native applications of the operating system, such as a contacts program, a short message program, a clock program, or a camera application, or third-party applications developed by third-party developers, such as game applications, instant messaging programs, or photo beautification programs.
Taking an operating system as an IOS system as an example, the programs and data stored in the memory 120 are shown in fig. 9. The IOS system includes: a core operating system layer 420 (Core OS layer), a core services layer 440 (Core Services layer), a media layer 460 (Media layer), and a touchable layer 480 (Cocoa Touch layer). The core operating system layer 420 includes the operating system kernel, drivers, and underlying program frameworks that provide more hardware-like functionality for use by the program frameworks in the core services layer 440. The core services layer 440 provides the system services and/or program frameworks required by applications, such as a foundation (Foundation) framework, an account framework, an advertisement framework, a data storage framework, a network connection framework, a geographic location framework, and a sports framework. The media layer 460 provides interfaces for applications related to audiovisual aspects, such as graphics and image related interfaces, audio technology related interfaces, video technology related interfaces, and the wireless play (AirPlay) interface of the audio and video transmission technology. The touchable layer 480 provides various commonly used interface-related frameworks for application development and is responsible for user touch interactions on the electronic device, such as a local notification service, a remote push service, an advertisement framework, a game tool framework, a message user interface (UI) framework, a user interface UIKit framework, and a map framework.
Among the frameworks illustrated in fig. 9, the frameworks related to most applications include, but are not limited to: the foundation framework in the core services layer 440 and the UIKit framework in the touchable layer 480. The foundation framework provides many basic object classes and data types and provides the most basic system services for all applications, independent of the UI. The classes provided by the UIKit framework form a basic UI class library for creating touch-based user interfaces; iOS applications can provide UIs based on the UIKit framework, so it provides the application's infrastructure for building user interfaces, drawing, handling user interaction events, responding to gestures, and so on.
The manner and principle of implementing data communication between the third party application program and the operating system in the IOS system may refer to the Android system, and this description is not repeated here.
The input device 130 is configured to receive input instructions or data, and includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is used to output instructions or data, and includes, but is not limited to, a display device, a speaker, and the like. In one example, the input device 130 and the output device 140 may be combined as a touch display screen, which is used for receiving a touch operation on or near it by a user using a finger, a touch pen, or any other suitable object, and for displaying the user interface of each application program. The touch display screen is typically provided on the front panel of the electronic device. It may be designed as a full screen, a curved screen, or a contoured screen, or as a combination of a full screen and a curved screen, or a combination of a contoured screen and a curved screen, which is not limited in this specification.
In addition, those skilled in the art will appreciate that the configuration of the electronic device shown in the above figures does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than illustrated, combine certain components, or have a different arrangement of components. For example, the electronic device further includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (WiFi) module, a power supply, and a Bluetooth module, which are not described herein.
In this specification, the execution subject of each step may be the electronic device described above. Optionally, the execution subject of each step is an operating system of the electronic device. The operating system may be an android system, an IOS system, or other operating systems, which is not limited in this specification.
The electronic device of the present specification may further be equipped with a display device, which may be any device capable of realizing a display function, for example: a cathode ray tube display (CRT), a light-emitting diode display (LED), an electronic ink screen, a liquid crystal display (LCD), a plasma display panel (PDP), and the like. A user may utilize the display device on the electronic device to view displayed text, images, video, etc. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (augmented reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook or desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
In the electronic device shown in fig. 6, the processor 110 may be configured to invoke an application program stored in the memory 120, and specifically perform the following operations:
Generating a first virtual three-dimensional image corresponding to a target object based on a target two-dimensional image of at least one reference angle of the target object;
Acquiring a reference virtual three-dimensional image, and performing image quality evaluation processing on the first virtual three-dimensional image based on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain image evaluation information of the first virtual three-dimensional image;
and determining a target virtual three-dimensional image corresponding to the target object based on the image evaluation information and the first virtual three-dimensional image.
In one embodiment, the image evaluation information includes an image local quality map and an image evaluation parameter, and when performing the image quality evaluation processing on the first virtual three-dimensional image based on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain the image evaluation information of the first virtual three-dimensional image, the processor 110 performs the following operations:
and performing image difference comparison processing on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain an image local quality map and an image evaluation parameter.
In one embodiment, the processor 110 performs the image difference comparison processing on the reference avatar and the first avatar to obtain an image local quality map and an image evaluation parameter, and performs the following steps:
Taking the reference three-dimensional avatar and the first three-dimensional avatar as a three-dimensional avatar pair, and inputting the three-dimensional avatar pair into a combined quality evaluation model;
And respectively extracting a first avatar characteristic of the reference three-dimensional avatar and a second avatar characteristic of the first three-dimensional avatar by the combined quality evaluation model, and carrying out image difference comparison processing based on the first avatar characteristic and the second avatar characteristic to output an image local quality map and an image evaluation parameter.
In one embodiment, the processor 110, when executing the avatar generation method, further performs the steps of:
acquiring at least one group of three-dimensional virtual image sample pairs aiming at an initial combined quality evaluation model, wherein the three-dimensional virtual image sample pairs comprise a first virtual three-dimensional sample image and a reference virtual three-dimensional sample image, the reference virtual three-dimensional sample image corresponds to a local quality map label and an image evaluation parameter label, and the initial combined quality evaluation model at least comprises an image feature coding network and a feature fusion network;
Inputting the three-dimensional virtual image sample pair into the initial combined quality evaluation model for model training; in the model training process, a first virtual image sample characteristic of the reference virtual three-dimensional sample image and a second virtual image sample characteristic of the first virtual three-dimensional sample image are respectively extracted through the image feature coding network, and a sample image local quality map and a sample image evaluation parameter are obtained by adopting the feature fusion network based on the first virtual image sample characteristic and the second virtual image sample characteristic;
Determining a first model loss based on the sample image local quality map, the sample image evaluation parameter, the local quality map label and the image evaluation parameter label, and performing model parameter adjustment on the initial combined quality evaluation model by adopting the first model loss to obtain a combined quality evaluation model.
In one embodiment, when performing the determining of the first model loss based on the sample image local quality map, the sample image evaluation parameter, the local quality map label, and the image evaluation parameter label, the processor 110 performs the following steps:
Obtaining overall quality regression loss by adopting a first loss calculation formula based on the image evaluation parameters of the sample image and the image evaluation parameter labels, and obtaining local quality regression loss by adopting a second loss calculation formula based on the local quality map of the sample image and the local quality map labels;
Obtaining a first model loss based on the overall quality regression loss and the local quality regression loss;
wherein the first loss calculation formula satisfies the following formula:
L1=||q−q_GT||²
The L1 is the overall quality regression loss, the q is the image evaluation parameter of the sample image, and the q_GT is the image evaluation parameter label;
wherein the second loss calculation formula satisfies the following formula:
L2=||M−M_GT||²
The L2 is the local quality regression loss, the M is the local quality map of the sample image, and the M_GT is the local quality map label.
In one embodiment, when executing the generating of a first virtual three-dimensional image corresponding to a target object based on a target two-dimensional image of at least one reference angle of the target object, the processor 110 performs the following steps:
Inputting a target two-dimensional image of at least one reference angle of a target object into an avatar generation model, and outputting a first avatar corresponding to the target object;
The avatar generation model is generated by training an initial avatar generation model based on an image reconstruction supervisory signal, an image edge supervisory signal and a high-frequency information supervisory signal.
In one embodiment, the processor 110, when executing the avatar generation method, further performs the steps of:
acquiring a sample two-dimensional image of at least one reference sample angle of a sample object;
Performing model training on the initial virtual image generation model by adopting the sample two-dimensional images, and performing three-dimensional virtual image generation on each sample two-dimensional image through the initial virtual image generation model in the model training process to obtain a first virtual three-dimensional sample image;
Determining a first projection sample image of the first virtual three-dimensional sample image at the reference sample angle, acquiring first image edge information of the first projection sample image and sample image edge information of the sample two-dimensional image, and acquiring first image high-frequency information of the first projection sample image and sample image high-frequency information of the sample two-dimensional image;
And determining the image reconstruction monitoring signal, the image edge monitoring signal and the high-frequency information monitoring signal based on the first projection sample image and the sample two-dimensional image, and performing model parameter adjustment on an initial avatar generation model based on the image reconstruction monitoring signal, the image edge monitoring signal and the high-frequency information monitoring signal to obtain an avatar generation model.
In one embodiment, the processor 110, in performing the determining the image reconstruction supervisory signal, the image edge supervisory signal and the high frequency information supervisory signal based on the first projected sample image and the sample two-dimensional image, performs the steps of:
Determining an image reconstruction loss by adopting a third loss calculation formula based on the first projection sample image and the sample two-dimensional image, taking the image reconstruction loss as an image reconstruction monitor signal, determining an image edge loss by adopting a fourth loss calculation formula based on the first image edge information and the sample image edge information, taking the image edge loss as an image edge monitor signal, determining an image high-frequency loss by adopting a fifth loss calculation formula based on the first image high-frequency information and the sample image high-frequency information, and taking the image high-frequency loss as an image high-frequency monitor signal;
Wherein the third loss calculation formula satisfies the following formula:
Loss3=||Proj_theta(NERF(x))−x_theta||²
The Loss3 is the image reconstruction loss, the NERF(x) is the first virtual three-dimensional sample image, the Proj_theta(NERF(x)) is the first projection sample image, and the x_theta is the sample two-dimensional image;
wherein the fourth loss calculation formula satisfies the following formula:
Loss4=||edge(Proj_theta(NERF(x)))−edge(x_theta)||²
The Loss4 is the image edge loss, the edge(Proj_theta(NERF(x))) is the first image edge information, and the edge(x_theta) is the sample image edge information;
wherein the fifth loss calculation formula satisfies the following formula:
Loss5=||DCT_hp(Proj_theta(NERF(x)))−DCT_hp(x_theta)||²
The Loss5 is the image high-frequency loss, the DCT_hp(Proj_theta(NERF(x))) is the first image high-frequency information, and the DCT_hp(x_theta) is the sample image high-frequency information.
In one embodiment, the processor 110, when executing the determining the target avatar corresponding to the target object based on the avatar evaluation information and the first avatar, performs the following steps:
Determining image evaluation parameters in the image evaluation information;
if the image evaluation parameter is greater than or equal to an evaluation parameter threshold, taking the first virtual three-dimensional image as a target virtual three-dimensional image of the target object;
And if the image evaluation parameter is smaller than the evaluation parameter threshold, performing image local adjustment processing on the first virtual three-dimensional image to obtain a second virtual three-dimensional image, and taking the second virtual three-dimensional image as the target virtual three-dimensional image of the target object.
In one embodiment, the processor 110 performs the following steps when performing the local avatar adjustment process on the first avatar to obtain a second avatar:
and determining an image local quality map in the image evaluation information, and performing image local adjustment processing on the first virtual three-dimensional image based on the image local quality map to obtain a second virtual three-dimensional image.
In one embodiment, the processor 110 performs the image local adjustment process on the first avatar based on the image local quality map to obtain a second avatar, including:
inputting the first virtual three-dimensional image into a virtual image tuning model, calling the image local quality map through the virtual image tuning model to perform image local tuning processing on a target local area of the first virtual three-dimensional image, and outputting a second virtual three-dimensional image.
In one embodiment, the processor 110, when executing the avatar generation method, further performs the steps of:
Acquiring at least one first virtual three-dimensional sample image aiming at an initial virtual image tuning model, wherein the first virtual three-dimensional sample image corresponds to a sample image local quality map and a sample image evaluating parameter;
Inputting the first virtual three-dimensional sample image into the initial virtual image tuning model for model training;
In the model training process, calling the sample image local quality map through the initial virtual image tuning model to perform image local adjustment processing on a target local sample area of the first virtual three-dimensional sample image to obtain a second virtual three-dimensional sample image, and calling the combined quality evaluation model to determine a second sample image local quality map and a second sample image evaluation parameter of the second virtual three-dimensional sample image;
and carrying out loss maximization detection processing on the local quality map of the second sample image and the image evaluation parameter of the second sample image to obtain a maximization detection result, and carrying out model parameter adjustment on the initial virtual image tuning model based on the maximization detection result to obtain the virtual image tuning model.
In one embodiment, when performing the loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter to obtain a maximization detection result, the processor 110 performs the following steps:
calculating the image tuning loss through the sixth loss calculation formula, and carrying out loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter based on the image tuning loss to obtain a maximization detection result;
If the maximization detection result is of a loss normal type, taking the second virtual three-dimensional sample image as the first virtual three-dimensional sample image, and executing again the steps of calling the sample image local quality map through the initial virtual image tuning model to perform image local adjustment processing on a target local sample area of the first virtual three-dimensional sample image to obtain a second virtual three-dimensional sample image, calling the combined quality evaluation model to determine a second sample image local quality map and a second sample image evaluation parameter of the second virtual three-dimensional sample image, and performing loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter to obtain the maximization detection result;
if the maximization detection result is of a loss maximization type, the initial avatar tuning model is used as the avatar tuning model;
Wherein the sixth loss calculation satisfies the following formula:
Loss_refine=maximum(q)+maximum(M)
The Loss_refine is the avatar tuning loss, the maximum(q) is an image evaluation parameter maximizing function, the q is the second sample image evaluation parameter, the maximum(M) is a local quality map maximizing function, and the M is the second sample image local quality map.
In one or more embodiments of the present disclosure, the electronic device generates a first avatar based on a target two-dimensional image of at least one reference angle of a target object. By introducing the acquired reference avatar, avatar evaluation of the first avatar can be realized to obtain avatar evaluation information; avatar optimization is then performed based on the avatar evaluation information and the first avatar to determine the target avatar corresponding to the target object. In this way, after the avatar is generated, the avatar generation quality is supervised in combination with the reference avatar, accurate quality control is realized through this supervision, and a high-quality target avatar is obtained in combination with the avatar evaluation information, thereby giving consideration to both the quality and the efficiency of avatar generation and ensuring normal operation of avatar generation transactions.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
It should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals according to the embodiments of the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, object features, interactive behavior features, user information, and the like referred to in this specification are all acquired with sufficient authorization.
The foregoing disclosure is only illustrative of the preferred embodiments of the present invention and is not to be construed as limiting the scope of the present invention; equivalent variations that follow the meaning of the claims of the present invention remain within its scope.
Claims (15)
1. An image generation method, the method comprising:
Generating a first virtual three-dimensional image corresponding to a target object based on a target two-dimensional image of at least one reference angle of the target object, wherein the target object is an entity object;
Acquiring a reference virtual three-dimensional image, and performing image quality evaluation processing on the first virtual three-dimensional image based on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain image evaluation information of the first virtual three-dimensional image, wherein the image evaluation information comprises an image local quality map and an image evaluation parameter;
if the image evaluation parameter is greater than or equal to an evaluation parameter threshold, taking the first virtual three-dimensional image as a target virtual three-dimensional image of the target object;
And if the image evaluation parameter is smaller than the evaluation parameter threshold, performing image local adjustment processing on the first virtual three-dimensional image based on the image local quality map to obtain a second virtual three-dimensional image, and taking the second virtual three-dimensional image as a target virtual three-dimensional image of the target object.
2. The method of claim 1, wherein the performing the image quality evaluation processing on the first virtual three-dimensional image based on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain the image evaluation information of the first virtual three-dimensional image comprises:
performing image difference comparison processing on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain the image local quality map and the image evaluation parameter.
3. The method according to claim 2, wherein the performing the image difference comparison processing on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain the image local quality map and the image evaluation parameter comprises:
taking the reference virtual three-dimensional image and the first virtual three-dimensional image as a virtual three-dimensional image pair, and inputting the virtual three-dimensional image pair into a combined quality evaluation model; and
extracting, by the combined quality evaluation model, a first image feature of the reference virtual three-dimensional image and a second image feature of the first virtual three-dimensional image respectively, and performing the image difference comparison processing based on the first image feature and the second image feature to output the image local quality map and the image evaluation parameter.
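One plausible realization of the combined quality evaluation model of claim 3 is a twin (shared-weight) encoder followed by a fusion head with two outputs. The PyTorch sketch below assumes the two virtual three-dimensional images have already been rendered or rasterized to fixed-size image tensors; every layer choice and dimension is an illustrative assumption, not the patent's architecture.

```python
# Hypothetical sketch of the combined quality evaluation model of claim 3.
import torch
import torch.nn as nn

class CombinedQualityEvaluator(nn.Module):
    """Twin-encoder + fusion sketch: outputs (local quality map M, score q)."""

    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        # Image feature coding network, shared by both inputs.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        # Feature fusion network: compares the two feature maps.
        self.fusion = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.map_head = nn.Conv2d(feat_ch, 1, 1)   # image local quality map M
        self.score_head = nn.Linear(feat_ch, 1)    # image evaluation parameter q

    def forward(self, reference, generated):
        f_ref = self.encoder(reference)    # first image feature
        f_gen = self.encoder(generated)    # second image feature
        fused = self.fusion(torch.cat([f_ref, f_gen], dim=1))
        m = torch.sigmoid(self.map_head(fused))                      # (B, 1, H, W)
        q = torch.sigmoid(self.score_head(fused.mean(dim=(2, 3))))  # (B, 1)
        return m, q
```

For example, `m, q = CombinedQualityEvaluator()(ref, gen)` on tensors of shape `(B, 3, H, W)` returns a per-pixel quality map `m` and one score `q` per sample.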
4. A method according to claim 3, the method further comprising:
acquiring, for an initial combined quality evaluation model, at least one virtual three-dimensional image sample pair, wherein the virtual three-dimensional image sample pair comprises a first virtual three-dimensional sample image and a reference virtual three-dimensional sample image, the reference virtual three-dimensional sample image corresponds to a local quality map label and an image evaluation parameter label, and the initial combined quality evaluation model comprises at least an image feature coding network and a feature fusion network;
inputting the virtual three-dimensional image sample pair into the initial combined quality evaluation model for model training; in the model training process, respectively extracting, through the image feature coding network, a first image sample feature of the reference virtual three-dimensional sample image and a second image sample feature of the first virtual three-dimensional sample image, and obtaining a sample image local quality map and a sample image evaluation parameter by means of the feature fusion network based on the first image sample feature and the second image sample feature; and
determining a first model loss based on the sample image local quality map, the sample image evaluation parameter, the local quality map label, and the image evaluation parameter label, and performing model parameter adjustment on the initial combined quality evaluation model by using the first model loss to obtain the combined quality evaluation model.
5. The method of claim 4, wherein the determining the first model loss based on the sample image local quality map, the sample image evaluation parameter, the local quality map label, and the image evaluation parameter label comprises:
obtaining an overall quality regression loss by using a first loss calculation formula based on the sample image evaluation parameter and the image evaluation parameter label, and obtaining a local quality regression loss by using a second loss calculation formula based on the sample image local quality map and the local quality map label; and
obtaining the first model loss based on the overall quality regression loss and the local quality regression loss;
wherein the first loss calculation formula satisfies:
L1 = ||q - q_GT||
where L1 is the overall quality regression loss, q is the sample image evaluation parameter, and q_GT is the image evaluation parameter label;
and the second loss calculation formula satisfies:
L2 = ||M - M_GT||
where L2 is the local quality regression loss, M is the sample image local quality map, and M_GT is the local quality map label.
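The published text renders the first and second loss calculation formulas as images, so the norm form shown with claim 5 is itself a reconstruction. Under that assumption (plain L1 regression between prediction and label), the first model loss of claims 4 and 5 can be computed as below; the weighting term `w_local` is a hypothetical addition not present in the claims.

```python
# Sketch of the first model loss of claims 4-5, assuming L1 regression for
# both terms; the exact published formulas are images and are reconstructed.
import torch.nn.functional as F

def first_model_loss(q_pred, q_label, m_pred, m_label, w_local=1.0):
    overall = F.l1_loss(q_pred, q_label)  # overall quality regression (q vs. q_GT)
    local = F.l1_loss(m_pred, m_label)    # local quality regression (M vs. M_GT)
    return overall + w_local * local
```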
6. The method of claim 1, wherein the generating the first virtual three-dimensional image corresponding to the target object based on the target two-dimensional image of at least one reference angle of the target object comprises:
inputting the target two-dimensional image of at least one reference angle of the target object into an avatar generation model, and outputting the first virtual three-dimensional image corresponding to the target object;
wherein the avatar generation model is obtained by training an initial avatar generation model based on an image reconstruction supervisory signal, an image edge supervisory signal, and a high-frequency information supervisory signal.
7. The method of claim 6, the method further comprising:
acquiring a sample two-dimensional image of at least one reference sample angle of a sample object;
performing model training on the initial avatar generation model by using the sample two-dimensional images, and in the model training process, performing three-dimensional image generation on each sample two-dimensional image through the initial avatar generation model to obtain a first virtual three-dimensional sample image;
determining a first projection sample image of the first virtual three-dimensional sample image at the reference sample angle, acquiring first image edge information of the first projection sample image and sample image edge information of the sample two-dimensional image, and acquiring first image high-frequency information of the first projection sample image and sample image high-frequency information of the sample two-dimensional image; and
determining the image reconstruction supervisory signal, the image edge supervisory signal, and the high-frequency information supervisory signal based on the first projection sample image and the sample two-dimensional image, and performing model parameter adjustment on the initial avatar generation model based on the image reconstruction supervisory signal, the image edge supervisory signal, and the high-frequency information supervisory signal to obtain the avatar generation model.
8. The method of claim 7, wherein the determining the image reconstruction supervisory signal, the image edge supervisory signal, and the high-frequency information supervisory signal based on the first projection sample image and the sample two-dimensional image comprises:
determining an image reconstruction loss by using a third loss calculation formula based on the first projection sample image and the sample two-dimensional image, and taking the image reconstruction loss as the image reconstruction supervisory signal; determining an image edge loss by using a fourth loss calculation formula based on the first image edge information and the sample image edge information, and taking the image edge loss as the image edge supervisory signal; and determining an image high-frequency loss by using a fifth loss calculation formula based on the first image high-frequency information and the sample image high-frequency information, and taking the image high-frequency loss as the high-frequency information supervisory signal;
wherein the third loss calculation formula satisfies:
Loss_3 = ||Proj_theta(NERF(x)) - x_theta||
where Loss_3 is the image reconstruction loss, NERF(x) is the first virtual three-dimensional sample image, Proj_theta(NERF(x)) is the first projection sample image, and x_theta is the sample two-dimensional image;
the fourth loss calculation formula satisfies:
Loss_4 = ||edge(Proj_theta(NERF(x))) - edge(x_theta)||
where Loss_4 is the image edge loss, edge(Proj_theta(NERF(x))) is the first image edge information, and edge(x_theta) is the sample image edge information;
and the fifth loss calculation formula satisfies:
Loss_5 = ||DCT_hp(Proj_theta(NERF(x))) - DCT_hp(x_theta)||
where Loss_5 is the image high-frequency loss, DCT_hp(Proj_theta(NERF(x))) is the first image high-frequency information, and DCT_hp(x_theta) is the sample image high-frequency information.
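The three supervisory signals of claims 7 and 8 compare the projection of the generated 3D sample against the 2D sample image in pixel, edge, and high-frequency space. In the PyTorch sketch below, a Sobel filter stands in for the unspecified edge operator and an FFT-based high-pass stands in for the DCT high-pass DCT_hp; both substitutions, the use of L1 distances, and the single-channel (grayscale) input assumption are illustrative choices, not the patent's definitions.

```python
# Hypothetical sketch of the claims 7-8 supervisory signals; substitutions
# (Sobel for edge, FFT high-pass for DCT_hp, L1 distances) are assumptions.
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Stand-in for edge(.): Sobel gradient magnitude on (B, 1, H, W) input."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      dtype=img.dtype, device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)  # Sobel-y is the transpose of Sobel-x
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def fft_highpass(img, cutoff=4):
    """Stand-in for DCT_hp(.): zero a centered low-frequency block."""
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    h, w = spec.shape[-2], spec.shape[-1]
    ch, cw = h // 2, w // 2
    spec[..., ch - cutoff:ch + cutoff, cw - cutoff:cw + cutoff] = 0
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real

def generation_losses(projected, target):
    """projected: Proj_theta(NERF(x)); target: the 2D sample image x_theta."""
    loss3 = F.l1_loss(projected, target)                              # reconstruction
    loss4 = F.l1_loss(sobel_edges(projected), sobel_edges(target))    # edge
    loss5 = F.l1_loss(fft_highpass(projected), fft_highpass(target))  # high-frequency
    return loss3 + loss4 + loss5
```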
9. The method of claim 1, wherein the performing the image local adjustment processing on the first virtual three-dimensional image based on the image local quality map to obtain the second virtual three-dimensional image comprises:
inputting the first virtual three-dimensional image into an avatar tuning model, calling the image local quality map through the avatar tuning model to perform the image local adjustment processing on a target local area of the first virtual three-dimensional image, and outputting the second virtual three-dimensional image.
10. The method of claim 9, the method further comprising:
acquiring, for an initial avatar tuning model, at least one first virtual three-dimensional sample image, wherein the first virtual three-dimensional sample image corresponds to a sample image local quality map and a sample image evaluation parameter;
inputting the first virtual three-dimensional sample image into the initial avatar tuning model for model training;
in the model training process, calling the sample image local quality map through the initial avatar tuning model to perform image local adjustment processing on a target local sample area of the first virtual three-dimensional sample image to obtain a second virtual three-dimensional sample image, and calling a combined quality evaluation model to determine a second sample image local quality map and a second sample image evaluation parameter of the second virtual three-dimensional sample image; and
performing loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter to obtain a maximization detection result, and performing model parameter adjustment on the initial avatar tuning model based on the maximization detection result to obtain the avatar tuning model.
11. The method according to claim 10, wherein the performing the loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter to obtain the maximization detection result comprises:
calculating an image tuning loss through a sixth loss calculation formula, and performing the loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter based on the image tuning loss to obtain the maximization detection result;
if the maximization detection result is of a loss-normal type, taking the second virtual three-dimensional sample image as the first virtual three-dimensional sample image, and repeating the steps of calling the sample image local quality map through the initial avatar tuning model to perform the image local adjustment processing on the target local sample area of the first virtual three-dimensional sample image to obtain a second virtual three-dimensional sample image, calling the combined quality evaluation model to determine the second sample image local quality map and the second sample image evaluation parameter of the second virtual three-dimensional sample image, and performing the loss maximization detection processing on the second sample image local quality map and the second sample image evaluation parameter to obtain the maximization detection result; and
if the maximization detection result is of a loss-maximized type, taking the initial avatar tuning model as the avatar tuning model;
wherein the sixth loss calculation formula satisfies:
Loss_refine = maximum(q) + maximum(M)
where Loss_refine is the image tuning loss, maximum(q) is an image evaluation parameter maximizing function, q is the second sample image evaluation parameter, maximum(M) is a local quality map maximizing function, and M is the second sample image local quality map.
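Claim 11 describes a loop: locally adjust, re-evaluate, and stop once Loss_refine can no longer be increased. The sketch below approximates "loss maximization detection" as a plateau test on the objective q + mean(M); the callables, the tolerance, and the iteration cap are all hypothetical, and the plateau test is one assumption about how the loss-normal/loss-maximized distinction could be decided.

```python
# Sketch of the claims 10-11 tuning loop; all names and thresholds are
# illustrative assumptions rather than the patent's procedure.

def tune_until_maximized(avatar, quality_map, refine_step, evaluate_pair,
                         max_iters=50, tol=1e-4):
    """Iteratively apply local adjustment until the objective stops improving.

    refine_step(avatar, quality_map): one local-adjustment pass of the
        (initial) avatar tuning model, returning the adjusted avatar.
    evaluate_pair(avatar): the combined quality evaluation model, returning
        (local_quality_map, evaluation_parameter) for the adjusted avatar.
    """
    prev = float("-inf")
    for _ in range(max_iters):
        avatar = refine_step(avatar, quality_map)         # local adjustment
        quality_map, q = evaluate_pair(avatar)            # re-evaluate
        objective = float(q) + float(quality_map.mean())  # Loss_refine proxy
        if objective - prev < tol:                        # loss-maximized type
            break                                         # tuning has converged
        prev = objective                                  # loss-normal type: iterate
    return avatar
```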
12. An image generation apparatus, the apparatus comprising:
an image generation module, configured to generate a first virtual three-dimensional image corresponding to a target object based on a target two-dimensional image of at least one reference angle of the target object, wherein the target object is an entity object;
a quality evaluation module, configured to acquire a reference virtual three-dimensional image and perform image quality evaluation processing on the first virtual three-dimensional image based on the reference virtual three-dimensional image and the first virtual three-dimensional image to obtain image evaluation information of the first virtual three-dimensional image, wherein the image evaluation information comprises an image local quality map and an image evaluation parameter; and
an image optimization module, configured to take the first virtual three-dimensional image as a target virtual three-dimensional image of the target object if the image evaluation parameter is greater than or equal to an evaluation parameter threshold, and, if the image evaluation parameter is smaller than the evaluation parameter threshold, perform image local adjustment processing on the first virtual three-dimensional image based on the image local quality map to obtain a second virtual three-dimensional image and take the second virtual three-dimensional image as the target virtual three-dimensional image of the target object.
13. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 11.
14. A computer program product storing at least one instruction adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 11.
15. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310027752.6A CN116152403B (en) | 2023-01-09 | 2023-01-09 | Image generation method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310027752.6A CN116152403B (en) | 2023-01-09 | 2023-01-09 | Image generation method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116152403A CN116152403A (en) | 2023-05-23 |
CN116152403B (en) | 2024-06-07
Family
ID=86359483
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310027752.6A Active CN116152403B (en) | 2023-01-09 | 2023-01-09 | Image generation method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116152403B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016126425A (en) * | 2014-12-26 | 2016-07-11 | Kddi株式会社 | Free viewpoint image generation apparatus, method and program |
CN108510437A (en) * | 2018-04-04 | 2018-09-07 | 科大讯飞股份有限公司 | A kind of virtual image generation method, device, equipment and readable storage medium storing program for executing |
KR20190139580A (en) * | 2018-06-08 | 2019-12-18 | 연세대학교 산학협력단 | high-definition heart image producing apparatus using deep learning and method therefor |
CN112116589A (en) * | 2020-09-30 | 2020-12-22 | 腾讯科技(深圳)有限公司 | Method, device and equipment for evaluating virtual image and computer readable storage medium |
CN112541963A (en) * | 2020-11-09 | 2021-03-23 | 北京百度网讯科技有限公司 | Three-dimensional virtual image generation method and device, electronic equipment and storage medium |
CN113112580A (en) * | 2021-04-20 | 2021-07-13 | 北京字跳网络技术有限公司 | Method, device, equipment and medium for generating virtual image |
CN113223159A (en) * | 2021-05-27 | 2021-08-06 | 哈尔滨工程大学 | Single remote sensing image three-dimensional modeling method based on target texture virtualization processing |
CN113240778A (en) * | 2021-04-26 | 2021-08-10 | 北京百度网讯科技有限公司 | Virtual image generation method and device, electronic equipment and storage medium |
CN115345980A (en) * | 2022-10-18 | 2022-11-15 | 北京百度网讯科技有限公司 | Generation method and device of personalized texture map |
2023-01-09 CN CN202310027752.6A patent/CN116152403B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN116152403A (en) | 2023-05-23 |
Similar Documents
Publication | Title
---|---
CN109189544B (en) | Method and device for generating dial plate
WO2020211573A1 (en) | Method and device for processing image
CN115049068B (en) | Model processing method, device, storage medium and electronic device
CN114004905A (en) | Method, device and equipment for generating character style image and storage medium
CN115131603A (en) | Model processing method and device, storage medium and electronic equipment
CN116152403B (en) | Image generation method and device, storage medium and electronic equipment
CN116798129A (en) | Living body detection method and device, storage medium and electronic equipment
CN117056507A (en) | Long text analysis method, long text analysis model training method and related equipment
CN116129534A (en) | Image living body detection method and device, storage medium and electronic equipment
CN116071527B (en) | Object processing method and device, storage medium and electronic equipment
CN118095493A (en) | Facts assessment model training method, facts assessment method and device
CN116343350A (en) | Living body detection method and device, storage medium and electronic equipment
CN116228391A (en) | Risk identification method and device, storage medium and electronic equipment
CN113223128A (en) | Method and apparatus for generating image
CN116246014B (en) | Image generation method and device, storage medium and electronic equipment
CN116188251A (en) | Model construction method, virtual image generation method, device, equipment and medium
CN116881465A (en) | Enterprise relation graph generation method and device, storage medium and electronic equipment
CN116168451A (en) | Image living body detection method and device, storage medium and electronic equipment
CN118537206A (en) | Model construction method, image generation method, device, equipment and medium
CN116778585A (en) | Living body detection method and device, storage medium and electronic equipment
CN116229585A (en) | Image living body detection method and device, storage medium and electronic equipment
CN118568206B (en) | Pre-training language model back door detection method and device and electronic equipment
CN112764649B (en) | Virtual image generation method, device, equipment and storage medium
CN116522996A (en) | Training method of recommendation model, recommendation method and related device
CN118035546A (en) | Training method for text intention recognition model, text intention recognition method and device
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant |