CN118096498B - Image generation method and electronic device - Google Patents
- Publication number
- CN118096498B (application number CN202410365189.8A)
- Authority
- CN
- China
- Prior art keywords
- image data
- electronic device
- control module
- processor
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/047—Probabilistic or stochastic networks
- G06N3/0475—Generative networks
- G06T11/00—2D [Two Dimensional] image generation
Abstract
An image generation method and an electronic device relate to the technical field of neural network models. In a first time interval, the electronic device takes the second feature vector of the image data to be processed as input and runs the control module to obtain the first feature map of the image data to be processed. In a second time interval, which overlaps the first time interval, the electronic device takes the first feature vector of the requirement text corresponding to the image data to be processed as input and runs the first encoder to obtain the latent code of the requirement text. The electronic device then takes the latent code and the first feature map as input and runs the first decoder to generate the target image data. In this way, the electronic device can run the control module and the first encoder in parallel, which shortens the process of generating the target image data and improves the efficiency with which the target image data is generated.
Description
Technical Field
The embodiments of the present application relate to neural network models, and in particular to an image generation method and an electronic device.
Background Art
Artificial-intelligence-generated content (AI Generated Content, AIGC) technology can be used to generate text, images, or videos. In fields such as painting, design, film, and television, AIGC technology can automatically generate image data based on requirements entered by the user, reducing manual production costs.
In the existing technology, generating image data with AIGC technology (also known as AI painting) takes a long time and cannot meet users' demands for high efficiency.
Summary of the Invention
An embodiment of the present application provides an image generation method and an electronic device. The electronic device runs the control module of a control network and the encoder of the U-Net in an SD model in parallel, which shortens the time the electronic device takes to generate an image with the control network, improves the efficiency with which the electronic device generates images, and meets users' demands for high efficiency.
To achieve the above objectives, the embodiments of the present application adopt the following technical solutions.
In a first aspect, an embodiment of the present application provides an image generation method applied to an electronic device, wherein a control module and a Stable Diffusion (SD) model are configured in the electronic device, and the SD model includes a first encoder and a first decoder.
The image generation method includes: in a first time interval, the electronic device takes the second feature vector of the image data to be processed as input and runs the control module to obtain the first feature map of the image data to be processed. In a second time interval, the electronic device takes the first feature vector of the requirement text corresponding to the image data to be processed as input and runs the first encoder of the SD model to obtain the latent code of the requirement text. The requirement text can be used to describe the content requirements and/or quality requirements of the target image data, and the second time interval overlaps the first time interval. In this way, the electronic device can run the control module and the first encoder in parallel during the overlapping time interval. The electronic device then takes the latent code and the first feature map as input and runs the first decoder of the SD model to generate the target image data. For example, the requirement text may be "a student graduation photo", indicating the content of the target image data, or "a clear student graduation photo", indicating both the content and the quality of the target image data.
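The scheduling described in this aspect can be illustrated with a minimal sketch. The sketch below assumes placeholder callables run_control_module, run_first_encoder, and run_first_decoder standing in for the three networks; it only illustrates the overlapping time intervals and is not the claimed implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_target_image(first_feature_vector, second_feature_vector,
                          run_control_module, run_first_encoder, run_first_decoder):
    # The first and second time intervals overlap: the control module (image
    # branch) and the first encoder (text branch) run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        feature_map_future = pool.submit(run_control_module, second_feature_vector)
        latent_code_future = pool.submit(run_first_encoder, first_feature_vector)
        first_feature_map = feature_map_future.result()   # control module output
        latent_code = latent_code_future.result()         # first encoder output
    # The first decoder runs only after both branches have finished.
    return run_first_decoder(latent_code, first_feature_map)
```

With this arrangement, the wall-clock cost of the two branches is roughly the larger of their individual costs rather than their sum.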
The image data to be processed includes image data that needs to be processed by the control module and the SD model. For example, the image data to be processed may be initial image data captured by the camera of the electronic device, image data selected by the user from the gallery of the electronic device, or authorized image data obtained (or received) by the electronic device from a third-party platform. Third-party platforms include, but are not limited to, other electronic devices communicatively connected to the electronic device.
The image data to be processed and the corresponding requirement text form one set of input data; both are used as inputs when the electronic device executes the image generation method. The electronic device further includes a text encoder and/or a variational auto-encoder (VAE) encoder. The electronic device takes the requirement text as input and runs the text encoder to obtain the first feature vector of the requirement text. The electronic device takes the image data to be processed as input and runs the VAE encoder to obtain the second feature vector of the image data to be processed.
In a possible implementation, the electronic device takes the first feature vector of the requirement text and the second feature vector of the image data to be processed together as input and runs the control module to obtain the first feature map of the image data to be processed. In this way, the first feature map can include the content carried in the requirement text, making the image feature information carried in the first feature map richer.
With the above method, the electronic device can run the control module and the first encoder simultaneously within a certain overlapping time interval. Compared with running the control module and the first encoder sequentially, this reduces the time spent running the control module and the first encoder, thereby shortening the time the electronic device takes to generate the target image data and improving the generation efficiency of the target image data.
In a possible implementation of the first aspect, the SD model includes a U-Net and a VAE decoder, and the U-Net includes an encoder and a decoder. The first decoder includes a first sub-decoder and a second sub-decoder. The first encoder may refer to the encoder of the U-Net, the first sub-decoder may refer to the decoder of the U-Net in the SD model, and the second sub-decoder may refer to the VAE decoder in the SD model.
Specifically, the electronic device takes the latent code and the first feature map as input and runs the first sub-decoder to obtain the second feature map, and then takes the second feature map as input and runs the second sub-decoder to generate the target image data.
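As a sketch of this decomposition, with unet_decoder and vae_decoder as assumed placeholders for the first and second sub-decoders:

```python
def run_first_decoder(latent_code, first_feature_map, unet_decoder, vae_decoder):
    # First sub-decoder: the U-Net decoder combines the latent code with the
    # first feature map into the second feature map.
    second_feature_map = unet_decoder(latent_code, first_feature_map)
    # Second sub-decoder: the VAE decoder turns the second feature map into
    # the target image data.
    return vae_decoder(second_feature_map)
```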
In a possible implementation of the first aspect, after the electronic device takes the first feature vector of the requirement text corresponding to the image data to be processed as input and runs the first encoder to obtain the latent code of the requirement text, the method further includes: the electronic device obtains a flag parameter, where the flag parameter is used to indicate the running state of the control module, and the running state is either finished or not finished. In this way, after the first encoder finishes, the electronic device can determine the running state of the control module from the flag parameter. In response to the flag parameter indicating that the running state of the control module is finished, the electronic device can take the latent code output by the first encoder and the first feature map output by the control module as input data and run the first decoder to generate the target image data.
With the above implementation, the electronic device can obtain the flag parameter after the first encoder finishes. If the flag parameter indicates that the running state of the control module is finished, the electronic device can run the first decoder to process the latent code output by the first encoder and the first feature map output by the control module, obtaining the target image data output by the first decoder. In this way, the electronic device runs the first decoder only after both the first encoder and the control module have finished, so that the first decoder can correctly receive and process the latent code and the first feature map, thereby obtaining the target image data.
In a possible implementation, the flag parameter can be stored in the memory of the electronic device, so that the electronic device can obtain the flag parameter from memory at any time after the first encoder finishes. It can be understood that, before each execution of the image generation method, the flag parameter stored in the memory of the electronic device indicates that the running state of the control module is not finished; only after the control module finishes running is the flag parameter updated to indicate that the running state of the control module is finished, which allows the electronic device to determine the running state of the control module promptly and accurately.
In a possible implementation of the first aspect, the electronic device may obtain the flag parameter periodically. Each time the electronic device obtains the flag parameter, the method further includes: in response to the flag parameter indicating that the running state of the control module is not finished, the electronic device does not run the first decoder and instead waits for the control module to finish running. In response to the flag parameter indicating that the running state of the control module is finished, the electronic device takes the latent code and the first feature map as input and runs the first decoder to generate the target image data.
It can be understood that, once the flag parameter obtained by the electronic device indicates that the running state of the control module is finished, the electronic device may stop obtaining the flag parameter.
Specifically, in general, the first encoder in the SD model takes longer to run than the control module. Therefore, when the electronic device obtains the flag parameter for the first time, the flag parameter may already indicate that the running state of the control module is finished. In this case, the electronic device may likewise stop obtaining the flag parameter, take the latent code and the first feature map as input, and run the first decoder to generate the target image data.
With the above implementation, the electronic device periodically obtains the flag parameter, can determine the state of the control module from it, and runs the first decoder as soon as the running state of the control module is updated to finished, processing the latent code and the first feature map to obtain the target image data. When the running state of the control module is not finished, the electronic device does not run the first decoder. In this way, the first decoder runs only after it can correctly receive the latent code output by the first encoder and the first feature map output by the control module.
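A minimal sketch of this flag-based handshake is given below, assuming the two branches run on separate threads or processors. A threading.Event stands in for the memory-resident flag parameter, and all model callables are placeholders rather than the patent's implementation.

```python
import threading
import time

control_done = threading.Event()   # flag parameter; cleared = "not finished", set = "finished"
shared = {}                        # stands in for memory shared between the two branches

def control_branch(run_control_module, second_feature_vector):
    shared["first_feature_map"] = run_control_module(second_feature_vector)
    control_done.set()             # updated only after the control module has finished

def encoder_branch(run_first_encoder, run_first_decoder, first_feature_vector,
                   poll_interval=0.001):
    latent_code = run_first_encoder(first_feature_vector)
    # After the first encoder finishes, periodically check the flag parameter.
    while not control_done.is_set():
        time.sleep(poll_interval)  # not finished yet: wait, do not run the first decoder
    target_image = run_first_decoder(latent_code, shared["first_feature_map"])
    control_done.clear()           # reset to "not finished" for the next request
                                   # (in the patent this reset is done by the first processor)
    return target_image
```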
In a possible implementation of the first aspect, before the first time interval, the flag parameter in the electronic device indicates that the running state of the control module is not finished. In this way, before the control module finishes running, the flag parameter accurately indicates that the control module has not yet finished, and the electronic device does not run the first decoder. Furthermore, after the electronic device runs the control module and obtains the first feature map of the image data to be processed, the electronic device can determine that the control module has finished running. At this point, the electronic device can update the flag parameter to indicate that the running state of the control module is finished.
With the above implementation, the flag parameter in the electronic device is updated to finished only after the control module has finished running. In this way, after the first encoder finishes, the electronic device runs the first decoder only when the control module has finished, avoiding a situation in which an incorrect flag parameter causes the first decoder to fail to correctly receive the first feature map output by the control module. In other words, with the above approach the first decoder runs only after it has correctly received the first feature map output by the control module.
In a possible implementation of the first aspect, after the electronic device runs the first decoder and generates the target image data, the method further includes: the electronic device updates the flag parameter to indicate that the running state of the control module is not finished.
With the above implementation, the electronic device can update the flag parameter promptly after the target image data is generated, so that the flag parameter indicates that the running state of the control module is not finished. This avoids the erroneous situation in which, when the electronic device reads the flag parameter again while the control module has not yet finished running, the flag parameter still indicates that the running state of the control module is finished.
In a possible implementation of the first aspect, before the electronic device, in the first time interval, takes the second feature vector of the image data to be processed as input and runs the control module to obtain the first feature map of the image data to be processed, the method further includes: the electronic device runs a camera application in the foreground, and the camera of the electronic device captures the image data to be processed. In this way, in the first time interval, the electronic device can take the second feature vector of the image data captured by the camera as input and run the control module to obtain the first feature map of the image data to be processed. In the second time interval, the electronic device takes the first feature vector of the requirement text corresponding to the image data to be processed as input and runs the first encoder to obtain the latent code of the requirement text. The electronic device takes the latent code and the first feature map as input and runs the first decoder to generate the target image data. The electronic device then displays a first interface, the first interface includes a shooting preview area, and the shooting preview area includes the target image data.
With the above implementation, when the electronic device runs the camera application in the foreground, the camera captures the image to be processed. The electronic device can use the control module and the SD model to process the captured image, generate target image data corresponding to the image data to be processed and the requirement text, and display it on the first interface. In this way, by running the control module and the first encoder in parallel, the electronic device improves the efficiency of generating the target image data, so that the target image data can be displayed in real time in the shooting preview interface of the camera application, which in turn improves the shooting efficiency of the electronic device.
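A simplified preview loop along these lines might look like the following sketch, reusing the generate_target_image sketch shown earlier; the camera and display calls are hypothetical placeholders, not an actual camera API.

```python
def preview_loop(camera, vae_encoder, requirement_feature_vector,
                 generate_target_image, show_in_preview):
    # Illustrative only: every captured frame is encoded, run through the
    # parallel pipeline, and shown in the shooting preview area of the first
    # interface.
    while camera.is_running():
        frame = camera.capture_frame()              # image data to be processed
        second_feature_vector = vae_encoder(frame)  # VAE encoder output
        target_image = generate_target_image(requirement_feature_vector,
                                             second_feature_vector)
        show_in_preview(target_image)
```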
In a possible implementation of the first aspect, the electronic device includes a plurality of target texts and a plurality of target feature vectors in one-to-one correspondence with the plurality of target texts. The plurality of target texts and the plurality of target feature vectors can be stored in the memory of the electronic device. The target feature vector corresponding to each target text is the feature vector obtained after the electronic device runs the text encoder to perform feature extraction on that target text.
Before the electronic device, in the second time interval, takes the first feature vector of the requirement text corresponding to the image data to be processed as input and runs the first encoder to obtain the latent code of the requirement text, the method further includes: upon receiving a setting operation from the user, the electronic device can determine, from the plurality of target texts, the requirement text that matches the user's setting operation, as well as the first feature vector corresponding to that requirement text. The user's setting operation may include content requirements and/or quality requirements for the target image data set by the user on the display interface of the camera application by tapping, sliding, dragging, or other gestures. Alternatively, the user's setting operation may include an option, indicating a content or quality requirement for the target image data, that the user selected from the plurality of target text options of the electronic device before the electronic device ran the camera application in the foreground.
For example, the plurality of target texts includes texts such as "beautify", "whiten", "thin face", and "remove freckles"; the electronic device then stores texts such as "whiten", "thin face", and "clear, bright", together with the feature vector corresponding to "whiten", the feature vector corresponding to "thin face", and the feature vector corresponding to "clear, bright". If the electronic device receives a user setting operation of tapping the "thin face" option on the display interface of the camera application, the user's requirement is "thin face", and the electronic device can use "thin face" among the target texts as the requirement text and the feature vector corresponding to "thin face" as the first feature vector.
With the above implementation, the electronic device pre-stores a plurality of target texts and a plurality of feature vectors in one-to-one correspondence with them. After the camera captures the image data, the electronic device can determine the requirement text and the corresponding first feature vector directly from the user's setting operation, without the text encoder having to perform feature extraction on the requirement text, which shortens the time taken to generate the target image data and improves the efficiency with which the electronic device generates the target image data.
In a possible implementation, the plurality of target texts in the electronic device also includes a default requirement text, and the plurality of target feature vectors in one-to-one correspondence with the target texts also includes a feature vector corresponding to the default requirement text. When the electronic device has not received a setting operation from the user, the electronic device can use the default requirement text among the plurality of target texts as the requirement text, and the feature vector corresponding to the default requirement text as the first feature vector.
For example, if the default requirement text is "clear, bright", then when the electronic device has not received a setting operation from the user, the electronic device can use "clear, bright" as the requirement text and the feature vector corresponding to "clear, bright" as the first feature vector.
With the above implementation, when no setting operation is received from the user, the electronic device can directly obtain the default requirement text and the first feature vector corresponding to the default text and directly process the image data captured by the camera to obtain the target image data, without having to obtain a new requirement text, which improves the efficiency of generating the target image data.
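The pre-computed lookup described above can be sketched as a simple table keyed by requirement text; the short vectors below are illustrative placeholders, not values from the patent.

```python
DEFAULT_REQUIREMENT_TEXT = "clear, bright"

# Pre-computed text-encoder outputs; the vectors are illustrative placeholders.
PRECOMPUTED_FIRST_FEATURE_VECTORS = {
    "clear, bright": [0.12, 0.40, 0.33],
    "thin face":     [0.58, 0.07, 0.91],
    "whiten":        [0.21, 0.66, 0.05],
}

def select_requirement(user_selection=None):
    """Return (requirement text, first feature vector) for a user setting
    operation, falling back to the default requirement text when none is given."""
    if user_selection in PRECOMPUTED_FIRST_FEATURE_VECTORS:
        return user_selection, PRECOMPUTED_FIRST_FEATURE_VECTORS[user_selection]
    return DEFAULT_REQUIREMENT_TEXT, PRECOMPUTED_FIRST_FEATURE_VECTORS[DEFAULT_REQUIREMENT_TEXT]
```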
In a possible implementation, the input data of the control module and the SD model also includes an iteration count requirement. The electronic device can store a plurality of iteration count requirements and the feature vectors of the iteration count requirements in one-to-one correspondence with them. When the electronic device receives a setting operation from the user, it can also determine, from the plurality of iteration count requirements, the iteration count requirement that matches the user's setting operation, as well as the third feature vector corresponding to that iteration count requirement. The electronic device can take the third feature vector and the second feature vector as input and run the control module to obtain the first feature map, and can take the third feature vector and the first feature vector as input and run the first encoder to obtain the latent code.
In a possible implementation of the first aspect, the electronic device includes a first processor, a second processor, and a third processor; the control module is deployed on the third processor, and the SD model is deployed on the second processor.
Before the first time interval, the method further includes: the first processor generates a first control instruction and a second control instruction, where the first control instruction includes the first feature vector of the requirement text corresponding to the image data to be processed and the second control instruction includes the second feature vector of the image data to be processed; the first processor sends the first control instruction to the second processor and sends the second control instruction to the third processor.
In the first time interval, the electronic device taking the second feature vector of the image data to be processed as input and running the control module to obtain the first feature map of the image data to be processed includes: in the first time interval, the third processor, based on the second control instruction, takes the second feature vector as input and runs the control module to obtain the first feature map of the image data to be processed.
In the second time interval, the electronic device taking the first feature vector of the requirement text corresponding to the image data to be processed as input and running the first encoder to obtain the latent code of the requirement text includes: in the second time interval, the second processor, based on the first control instruction, takes the first feature vector as input and runs the first encoder to obtain the latent code of the requirement text.
The electronic device taking the latent code and the first feature map as input and running the first decoder to generate the target image data includes: the second processor obtains the first feature map, takes the latent code and the first feature map as input, runs the first decoder, and generates the target image data. The first processor may include a CPU. The second processor may include a GPU and the third processor an NPU; alternatively, the second processor may include an NPU and the third processor a GPU.
Specifically, if the third processor stores the first feature map after the control module finishes running, the second processor can obtain the first feature map from the third processor. The third processor may also store the first feature map in memory after the control module finishes running, in which case the second processor can obtain the first feature map from memory.
With the above implementation, the electronic device can deploy the control module and the SD model on different processors. By having the first processor control the running states of the second processor and the third processor, the encoder of the U-Net in the SD model on the second processor and the control module on the third processor can run in parallel, which reduces the time taken by the electronic device to generate the target image data and improves the efficiency with which the electronic device generates the target image data.
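As an illustration of this processor split, the first processor can be thought of as pushing two control instructions onto per-processor work queues; the queues below are hypothetical stand-ins for the CPU-to-GPU/NPU instruction channels of a real accelerator runtime, and are only a sketch of the dispatch pattern.

```python
import queue

# Hypothetical per-processor work queues standing in for the instruction channels.
second_processor_queue = queue.Queue()   # hosts the SD model (e.g. a GPU)
third_processor_queue = queue.Queue()    # hosts the control module (e.g. an NPU)

def first_processor_dispatch(first_feature_vector, second_feature_vector):
    # First control instruction: text branch, for the processor hosting the SD model.
    second_processor_queue.put({"op": "run_first_encoder", "payload": first_feature_vector})
    # Second control instruction: image branch, for the processor hosting the control module.
    third_processor_queue.put({"op": "run_control_module", "payload": second_feature_vector})
    # Each processor's worker pops its queue and runs its model; the second
    # processor then checks the flag parameter before running the first decoder.
```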
In a possible implementation of the first aspect, after the first feature map of the image data to be processed is obtained, the method further includes: in response to the control module finishing running, the first processor updates the flag parameter to indicate that the running state of the control module is finished.
After the latent code of the requirement text is obtained, the method further includes: the second processor obtains the flag parameter, where the flag parameter is used to indicate the running state of the control module, and the running state is either finished or not finished.
The second processor obtaining the first feature map, taking the latent code and the first feature map as input, running the first decoder, and generating the target image data includes: in response to the flag parameter indicating that the running state of the control module is finished, the second processor obtains the first feature map, takes the latent code and the first feature map as input, runs the first decoder, and generates the target image data.
After the target image data is generated, the method further includes: the first processor updates the flag parameter to indicate that the running state of the control module is not finished.
Specifically, when the control module finishes running, the first processor in the electronic device promptly updates the flag parameter to indicate that the running state of the control module is finished. After the first encoder finishes running, the second processor obtains the flag parameter, which indicates the running state of the control module. If the flag parameter indicates that the running state of the control module is not finished, the second processor does not run the first decoder. If the flag parameter indicates that the running state of the control module is finished, the second processor runs the first decoder. After the second processor generates the target image data, the first processor can update the flag parameter to indicate that the running state of the control module is not finished.
With the above implementation, when the encoder of the U-Net in the SD model finishes running, the second processor promptly obtains the flag parameter to determine the running state of the control module. When the flag parameter indicates that the running state of the control module is finished, the second processor responds promptly, runs the first decoder, and processes the first feature map output by the control module and the latent code output by the U-Net encoder to obtain the target image data. When the flag parameter indicates that the running state of the control module is not finished, the second processor does not run the first decoder. This ensures that the first decoder runs only after it has correctly received the first feature map output by the control module and the latent code output by the U-Net encoder.
In a second aspect, an embodiment of the present application provides an electronic device. The electronic device includes a memory and one or more processors, the memory being coupled to the processors. The memory stores computer program code, the computer program code includes computer instructions, and when the computer instructions are executed by the processors, the electronic device is caused to perform the method in the first aspect and any of its possible designs.
Specifically, the one or more processors may be the first processor, the second processor, and the third processor.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium having computer instructions stored thereon. When the computer instructions run on an electronic device, the electronic device is caused to perform the method in the first aspect and any of its possible designs.
In a fourth aspect, an embodiment of the present application further provides a computer program product including computer instructions. When the computer program product runs on an electronic device, the electronic device is caused to perform the method in the first aspect and any of its possible designs.
It can be understood that, for the beneficial effects achievable by the electronic device, the computer-readable storage medium, and the computer program product provided above, reference may be made to the beneficial effects of the first aspect and any of its possible designs, which are not repeated here.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of an AI painting scene provided in an embodiment of the present application;
FIG. 2 is a first schematic diagram of an image generation scenario provided in an embodiment of the present application;
FIG. 3 is a first flow chart of a control network provided in an embodiment of the present application;
FIG. 4 is a second flow chart of a control network provided in an embodiment of the present application;
FIG. 5 is a second schematic diagram of an image generation scenario provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of the use of a control network provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a control network provided in an embodiment of the present application;
FIG. 8 is a hardware structure diagram of an electronic device provided in an embodiment of the present application;
FIG. 9 is a software architecture diagram of an electronic device provided in an embodiment of the present application;
FIG. 10 is an interaction diagram of an implementation of an image generation method provided in an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application. In the description of the present application, unless otherwise specified, "/" means "or"; for example, A/B can mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships can exist; for example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone.
In addition, to facilitate a clear description of the technical solutions of the embodiments of the present application, words such as "first" and "second" are used in the embodiments of the present application to distinguish identical or similar items with substantially the same functions and effects. Those skilled in the art will understand that words such as "first" and "second" are used only for descriptive purposes and do not limit quantity or execution order, nor should they be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Features defined as "first" and "second" may therefore explicitly or implicitly include one or more of those features, and "first" and "second" do not necessarily denote different items. In the description of this embodiment, unless otherwise specified, "at least one" means one or more, and "multiple" means two or more.
In the embodiments of the present application, words such as "exemplarily" or "for example" are used to indicate examples, illustrations, or explanations. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be interpreted as being preferred over or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplarily" or "for example" is intended to present related concepts in a concrete manner.
Artificial-intelligence-generated content (AI Generated Content, AIGC) technology uses artificial intelligence to generate content related to a user's requirements; it can generate requirement-related content based on the information contained in the input data. The types of content generated by AIGC technology include, but are not limited to, text, images, and videos. For example, a user can enter a short requirement text, and AIGC technology analyzes the requirement text to generate a picture corresponding to it. For instance, if the user enters "output an image of a female character", the AIGC technology can output an image containing a female character.
It should be noted that the actual input data of the AIGC technology described here includes only text. When the input data is of a non-text type (such as a picture, audio, or video), a text extraction model can extract the requirement text from the picture, audio, or video input and feed it to the AIGC technology, which can then analyze the requirement text and generate a corresponding picture. For example, when the input data is a picture, the text extraction model can extract the words "an image of a female character" from the picture as the requirement text, and the AIGC technology analyzes "an image of a female character" and outputs image data containing a female character.
The process of generating image data with AIGC technology is also called AI painting. AI painting is usually implemented with machine learning models, including the Stable Diffusion (SD) model. The SD model includes a text encoder, a U-Net, and a variational auto-encoder decoder (VAE decoder). The input data of the SD model includes the requirement text. The text encoder performs feature extraction on the requirement text to obtain feature vector 1 corresponding to the requirement text. The U-Net performs iterative denoising on feature vector 1 and further extracts features to obtain feature vector 2 corresponding to the requirement text. The VAE decoder converts feature vector 2 into image data 1. In this way, the SD model can output image data 1 corresponding to the requirement text.
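A minimal sketch of this pipeline is shown below. It follows the simplified description above (text encoder, iterative U-Net denoising, VAE decoding); the assumption that denoising starts from a supplied initial latent, and all module callables, are placeholders rather than details from the text.

```python
def stable_diffusion_generate(requirement_text, initial_latent,
                              text_encoder, unet, vae_decoder, steps=20):
    feature_vector_1 = text_encoder(requirement_text)   # text encoder output
    latent = initial_latent                              # assumed starting latent (e.g. noise)
    for _ in range(steps):                               # iterative denoising by the U-Net
        latent = unet(latent, feature_vector_1)
    feature_vector_2 = latent                            # U-Net output
    return vae_decoder(feature_vector_2)                 # image data 1
```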
Referring to FIG. 1, the user inputs the requirement text "Draw a student graduation photo" into the SD model; the text encoder performs feature extraction on the requirement text "Draw a student graduation photo" and outputs feature vector 1. The U-Net performs iterative denoising on feature vector 1 to obtain feature vector 2. The VAE decoder processes feature vector 2 to obtain image data 101 containing a student graduation scene.
However, the SD model generates image data 1 on the basis of feature vector 1 extracted from the requirement text, and the requirement text can only indicate the content that image data 1 should contain; it cannot indicate image features (for example, three-channel RGB values, texture, and so on). In other words, although the image data 1 generated by the SD model can be close, at the content level, to the information contained in the requirement text, the quality of image data 1 is not high and cannot fully match the user's requirements. Continuing with the example of FIG. 1 above, image data 101 contains content related to the "student" feature in the requirement text "Draw a student graduation photo" (such as the character image shown in FIG. 1), but image data 101 does not include information related to "graduation" and therefore does not match the "graduation" feature in the requirement text "Draw a student graduation photo".
To address the above problem, a control network (ControlNet) has been proposed in some implementations. ControlNet adds a variational auto-encoder (VAE) encoder and a control (Control) module on top of the SD model architecture. The input data of the VAE encoder is image data 2; the VAE encoder performs feature extraction on image data 2 to obtain feature vector 3, which is fed into the Control module. The Control module can downsample feature vector 1 and feature vector 3 to obtain a feature map, which may also be called feature vector 4 or the first feature map, and inputs feature vector 4 into the SD model in ControlNet. The SD model in ControlNet can optimize feature vector 4 according to the requirement text to obtain image data 3 that matches the requirement text. In this way, the user can control the content of the input data of the Control module (that is, image data 2) to adjust feature vector 4 output by the Control module, and thereby adjust the content of image data 3 generated by ControlNet. Furthermore, ControlNet can extract image feature information from image data 2 and generate image data 3 containing that image feature information. Thus, compared with image data 1 generated by the SD model based only on the requirement text, image data 3 generated by ControlNet is of higher quality.
Referring to FIG. 2, the user inputs the requirement text "Draw a student graduation photo" into the SD model in the control network and inputs image data 201 into the VAE encoder in the control network; feature vector 1 output by the text encoder of the SD model and feature vector 3 output by the VAE encoder are fed together into the Control module. The Control module inputs feature vector 4 into the SD model in the control network. The U-Net of the SD model in the control network outputs feature vector 2 based on feature vector 4 and feature vector 1 corresponding to "Draw a student graduation photo", and the VAE decoder of the SD model generates image data 202 based on feature vector 2. Image data 202 includes the image feature information carried by image data 201 (such as the university buildings, the natural scenery such as the sun and clouds, and the texture information contained in image data 201); that is, image data 202 contains more content related to the "graduation" feature in the requirement text. Compared with image data 101 shown in FIG. 1, image data 202 better matches the requirement text "Draw a student graduation photo".
Specifically, the U-Net in ControlNet includes two parts, an encoder and a decoder. The encoder downsamples feature vector 1 to obtain a latent code corresponding to the requirement text; this latent code may also be called feature vector 5. After the Control module downsamples feature vector 3 to obtain the corresponding feature vector 4, the decoder can receive feature vector 5 output by the encoder and feature vector 4 output by the Control module, and upsample feature vector 4 and feature vector 5 to obtain feature vector 2. The VAE decoder then generates the corresponding image data 3 from feature vector 2. In the process in which ControlNet generates image data 3 from the requirement text and image data 2, after the text encoder in the SD model outputs feature vector 1 and the VAE encoder outputs feature vector 3, and given the relationship between the input data of the decoder in the U-Net and the output data of the Control module, ControlNet must first run the Control module to obtain feature vector 4 before running the U-Net in the SD model. Only then can the decoder in the U-Net, after the encoder finishes running, receive feature vector 5 output by the encoder in the U-Net and feature vector 4 output by the Control module, and generate a feature map, which may also be called feature vector 2 or the second feature map.
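The sequential ControlNet data flow described here can be sketched as follows, using the document's numbering of the feature vectors; all module callables are assumed placeholders. This is the baseline ordering that the parallel scheme below improves on.

```python
def controlnet_generate_sequential(requirement_text, image_data_2,
                                   text_encoder, vae_encoder, control_module,
                                   unet_encoder, unet_decoder, vae_decoder):
    feature_vector_1 = text_encoder(requirement_text)                     # text branch
    feature_vector_3 = vae_encoder(image_data_2)                          # image branch
    # Baseline ordering: the Control module must finish before the U-Net runs.
    feature_vector_4 = control_module(feature_vector_1, feature_vector_3) # first feature map
    feature_vector_5 = unet_encoder(feature_vector_1)                     # latent code
    feature_vector_2 = unet_decoder(feature_vector_5, feature_vector_4)   # second feature map
    return vae_decoder(feature_vector_2)                                  # image data 3
```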
参见图3, SD模型中的文本编码器对需求文本进行特征提取处理,输出特征向量1、VAE编码器对图像数据2进行特征提取处理,输出特征向量3之后,ControlNet先运行控制网络中控制模块得到特征向量4,然后将特征向量1和特征向量4输入到U型网络中,顺次运行U型网络中的编码器、解码器,以及控制网络中VAE解码器,输出图像数据3。Referring to Figure 3, the text encoder in the SD model performs feature extraction on the required text and outputs feature vector 1. The VAE encoder performs feature extraction on the image data 2 and outputs feature vector 3. Then, ControlNet first runs the control module in the control network to obtain feature vector 4, and then inputs feature vector 1 and feature vector 4 into the U-type network. The encoder and decoder in the U-type network and the VAE decoder in the control network are sequentially run to output image data 3.
然而,采用上述的先运行Control模块再运行SD模型中的Unet方式,导致ControlNet在迭代环节的耗时较长,从而导致ControlNet生成图像数据3的过程耗时较长,无法满足用户对拍摄功能的高时效需求。However, the above-mentioned method of first running the Control module and then running the Unet in the SD model causes ControlNet to take a long time in the iteration phase, which results in a long time for ControlNet to generate image data 3, and cannot meet the user's high timeliness requirements for the shooting function.
例如,Control模块的运行耗时为100ms,Unet中编码器和解码器的运行耗时均为150ms。ControlNet先运行Control模块再运行Unet需要耗时400ms,无法满足用户的高效率需求。For example, the running time of the Control module is 100ms, and the running time of the encoder and decoder in Unet is 150ms. It takes 400ms for ControlNet to run the Control module first and then run Unet, which cannot meet the high efficiency requirements of users.
针对上述问题,本申请实施例提出一种图像生成方法。在根据需求文本和图像数据2生成图像数据3的过程中,在SD模型中的文本编码器输出特征向量1、VAE编码器输出特征向量3之后,ControlNet可以并行运行ControlNet中的Control模块和Unet中的编码器。在Control模块和Unet中的编码器运行结束后,ControlNet再运行Unet中的解码器。Unet中的解码器可以接收到Control模块输出的特征向量4及编码器输出的特征向量5,并根据特征向量4、特征向量5生成特征向量2,VAE解码器根据特征向量2生成对应的图像数据3。这样,可以减小ControlNet生成图像数据3的耗时。为了便于说明,Unet中的编码器也可以称为第一编码器,特征向量1也可以称为第一特征向量,特征向量3也可以称为第二特征向量,图像数据2也可以称为待处理图像数据,图像数据3也可以称为目标图像数据。运行ControlNet中的Control模块的时间区间,也可以称作第一时间区间。运行Unet中的编码器的时间区间,也可以称作第二时间区间。并行运行ControlNet中的Control模块和Unet中的编码器的时间区间,也可以称为第一时间区间和第二时间区间上存在的重叠的时间区间。In view of the above problems, an embodiment of the present application proposes an image generation method. In the process of generating image data 3 according to the required text and image data 2, after the text encoder in the SD model outputs feature vector 1 and the VAE encoder outputs feature vector 3, ControlNet can run the Control module in ControlNet and the encoder in Unet in parallel. After the Control module and the encoder in Unet are finished running, ControlNet runs the decoder in Unet again. The decoder in Unet can receive the feature vector 4 output by the Control module and the feature vector 5 output by the encoder, and generate feature vector 2 according to feature vector 4 and feature vector 5, and the VAE decoder generates the corresponding image data 3 according to feature vector 2. In this way, the time consumption of ControlNet to generate image data 3 can be reduced. For the sake of convenience, the encoder in Unet can also be called the first encoder, the feature vector 1 can also be called the first feature vector, the feature vector 3 can also be called the second feature vector, the image data 2 can also be called the image data to be processed, and the image data 3 can also be called the target image data. The time interval for running the Control module in ControlNet can also be called the first time interval. The time interval for running the encoder in Unet can also be called the second time interval. The time interval for running the Control module in ControlNet and the encoder in Unet in parallel may also be referred to as an overlapping time interval between the first time interval and the second time interval.
示例性的,图4提供了一种控制网络的流程示意图。如图4所示,用户输入需求文本和图像数据2, SD模型中的文本编码器对需求文本进行特征提取处理得到特征向量1、VAE编码器对图像数据2进行特征提取处理得到特征向量3之后,控制网络并行运行U型网络中的编码器和控制模块。其中,U型网络中的编码器对特征向量1进行下采样处理得到特征向量5,控制模块对特征向量3进行下采样处理得到特征向量4。在U型网络中的编码器和控制模块运行结束之后,控制网络运行U型网络中的解码器。U型网络中的解码器对特征向量4和特征向量5进行上采样处理得到特征向量2。最后,控制网络中的VAE解码器对特征向量2进行处理,得到图像数据3。Exemplarily, FIG4 provides a flow chart of a control network. As shown in FIG4, the user inputs the demand text and image data 2, the text encoder in the SD model performs feature extraction processing on the demand text to obtain feature vector 1, and the VAE encoder performs feature extraction processing on the image data 2 to obtain feature vector 3, and then the control network runs the encoder and control module in the U-type network in parallel. Among them, the encoder in the U-type network performs downsampling processing on feature vector 1 to obtain feature vector 5, and the control module performs downsampling processing on feature vector 3 to obtain feature vector 4. After the encoder and control module in the U-type network are finished running, the control network runs the decoder in the U-type network. The decoder in the U-type network performs upsampling processing on feature vector 4 and feature vector 5 to obtain feature vector 2. Finally, the VAE decoder in the control network processes feature vector 2 to obtain image data 3.
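The flow of Figure 4 can be condensed into a minimal scheduling sketch. The run_* functions below are hypothetical placeholders for the deployed sub-models (simulated here with fixed delays); only the ordering, and the overlapped run of the control module and the U-type network encoder, mirror the flow described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_text_encoder(prompt):
    return "feature vector 1"          # from the requirement text

def run_vae_encoder(image):
    return "feature vector 3"          # from image data 2

def run_unet_encoder(fv1):
    time.sleep(0.15)                   # stands in for the ~150 ms encoder run
    return "feature vector 5"

def run_control_module(fv3):
    time.sleep(0.10)                   # stands in for the ~100 ms control-module run
    return "feature vector 4"

def run_unet_decoder(fv5, fv4):
    return "feature vector 2"

def run_vae_decoder(fv2):
    return "image data 3"

def generate_image(prompt, image_2):
    fv1 = run_text_encoder(prompt)
    fv3 = run_vae_encoder(image_2)
    # The U-type network encoder and the control module run in overlapping time
    # intervals, so the control module's latency is hidden behind the encoder run.
    with ThreadPoolExecutor(max_workers=2) as pool:
        encoder_future = pool.submit(run_unet_encoder, fv1)
        control_future = pool.submit(run_control_module, fv3)
        fv5, fv4 = encoder_future.result(), control_future.result()
    fv2 = run_unet_decoder(fv5, fv4)
    return run_vae_decoder(fv2)

print(generate_image("clear, bright", "image data 2"))
```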
一般情况下,Control模块运行的时长小于Unet中的编码器的运行时长。因此,ControlNet并行运行Control模块和编码器的耗时可以视为Unet中的编码器的运行时长。也就是说,ControlNet并行运行Control模块和编码器可以节约掉Control模块运行的时长,提高了ControlNet生成图像数据3的效率。Generally, the running time of the Control module is shorter than the running time of the encoder in Unet. Therefore, the time consumed by ControlNet to run the Control module and the encoder in parallel can be regarded as the running time of the encoder in Unet. In other words, by running the Control module and the encoder in parallel, ControlNet saves the running time of the Control module and improves the efficiency with which it generates image data 3.
例如,Control模块的运行耗时为100ms,Unet中编码器和解码器的运行耗时均为150ms,Control模块和编码器的并行运行耗时为150ms,然后Unet中解码器的运行耗时为150ms。这样,Control模块和Unet的运行总耗时为300ms,相比先运行Control模块再运行Unet的400ms,ControlNet生成图像数据3的过程缩短了100ms的耗时。For example, the running time of the Control module is 100 ms, and the running times of the encoder and the decoder in Unet are both 150 ms. The parallel run of the Control module and the encoder takes 150 ms, and the subsequent run of the decoder in Unet takes 150 ms. In this way, the total running time of the Control module and Unet is 300 ms; compared with the 400 ms taken when the Control module is run before Unet, the process of ControlNet generating image data 3 is shortened by 100 ms.
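The saving can be checked with a short calculation using the example figures above:

```python
# All values in milliseconds, taken from the example above.
control, unet_encoder, unet_decoder = 100, 150, 150
serial = control + unet_encoder + unet_decoder           # 400 ms: Control module first, then Unet
parallel = max(control, unet_encoder) + unet_decoder     # 300 ms: overlapped run
print(serial, parallel, serial - parallel)               # 400 300 100
```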
基于ControlNet具有根据需求文本和图像数据2生成与用户需求匹配的图像数据3的能力,在实际应用中,ControlNet还可以作为图像优化模型嵌入进电子设备中(如手机、平板电脑、桌面型、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、增强现实(augmented reality,AR)或者虚拟现实(virtual reality,VR)设备等,本申请实施例对该电子设备的具体形态不作特殊限制)。Since ControlNet has the ability to generate image data 3 that matches user needs from the requirement text and image data 2, in practical applications ControlNet can also be embedded into an electronic device as an image optimization model (such as a mobile phone, tablet computer, desktop computer, handheld computer, notebook computer, ultra-mobile personal computer (UMPC), augmented reality (AR) or virtual reality (VR) device, etc.; the embodiments of the present application do not specifically limit the form of the electronic device).
具体地,电子设备将初始图像数据作为图像数据2输入ControlNet中,以及将需求文本输入至ControlNet中。电子设备可以采用ControlNet根据需求文本对初始图像数据进行内容优化得到图像数据3。其中,初始图像数据可以是电子设备驱动摄像头拍摄得到的初始图像数据。或者,初始图像数据可以是用户输入的图像数据。又或者,初始图像数据可以是电子设备中图库内存储的图像数据。再或者,初始图像数据可以是电子设备从第三方平台获取(或接收)的已授权的图像数据。第三方平台可以包括与电子设备通信的其他电子设备。这样,电子设备可以生成与用户需求匹配的高质量的图像数据3,如电子设备中相机应用拍摄生成美颜图像,优化电子设备中图库应用的图像数据,或者电子设备生成AI绘画图像等。Specifically, the electronic device inputs the initial image data as image data 2 into ControlNet, and inputs the requirement text into ControlNet. The electronic device can use ControlNet to optimize the content of the initial image data according to the requirement text to obtain image data 3. Among them, the initial image data can be the initial image data obtained by the electronic device driving the camera to shoot. Or, the initial image data can be image data input by the user. Alternatively, the initial image data can be image data stored in a gallery in the electronic device. Alternatively, the initial image data can be authorized image data obtained (or received) by the electronic device from a third-party platform. The third-party platform may include other electronic devices that communicate with the electronic device. In this way, the electronic device can generate high-quality image data 3 that matches the user's needs, such as the camera application in the electronic device shoots to generate beauty images, optimizes the image data of the gallery application in the electronic device, or the electronic device generates AI painting images, etc.
应理解,在拍摄场景中,电子设备采用ControlNet根据需求文本对摄像头采集的初始图像数据进行内容优化,可以快速生成高质量的图像数据3并作为电子设备拍摄预览界面的照片,提高了电子设备拍摄高质量的照片的效率。特别地,针对用户通过放大变焦倍率来拍摄范围更大或更远场景下的照片时,以及针对用户拍摄较暗场景下的照片时,电子设备采用ControlNet强大的图像生成能力,可以快速拍摄出更清晰、更明亮的高质量照片。It should be understood that in the shooting scene, the electronic device uses ControlNet to optimize the content of the initial image data collected by the camera according to the demand text, and can quickly generate high-quality image data 3 and use it as the photo of the preview interface of the electronic device, thereby improving the efficiency of the electronic device in taking high-quality photos. In particular, when the user zooms in to take photos of scenes with a larger range or farther away, and when the user takes photos of darker scenes, the electronic device uses ControlNet's powerful image generation capabilities to quickly take clearer, brighter, and high-quality photos.
示例性的,用户使用电子设备拍摄照片的一般要求均为清晰、明亮。对应的,电子设备可以将清晰、明亮作为默认需求文本。这样,电子设备采用ControlNet对摄像头拍摄的初始图像数据进行优化,输出更清晰、更明亮的图像数据3作为电子设备拍摄预览界面的照片。For example, the general requirements of users for taking photos with electronic devices are clarity and brightness. Correspondingly, the electronic device can use clarity and brightness as the default requirement text. In this way, the electronic device uses ControlNet to optimize the initial image data taken by the camera, and outputs clearer and brighter image data 3 as the photo taken by the electronic device preview interface.
以电子设备为手机为例,如图5所示,手机中相机应用驱动摄像头拍摄得到初始图像数据301,该初始图像数据301具有不清晰的问题。手机内部署的控制网络基于“清晰、明亮”的需求文本302对初始图像数据301进行优化,可以得到清晰、明亮的图像数据303来作为手机中相机应用拍摄预览界面显示的照片。Taking a mobile phone as an example, as shown in FIG5 , the camera application in the mobile phone drives the camera to shoot and obtains initial image data 301, which has the problem of being unclear. The control network deployed in the mobile phone optimizes the initial image data 301 based on the "clear and bright" requirement text 302, and can obtain clear and bright image data 303 as the photo displayed on the preview interface of the camera application in the mobile phone.
在一些实施例中,电子设备内部署的ControlNet后,可以将ControlNet 中Control模块和Unet分别部署在不同的处理器上。电子设备可以控制两个处理器实现并行运行ControlNet 中Control模块和Unet中的编码器。这样,电子设备可以在采用ControlNet生成高质量的图像的同时提高生成图像的效率。In some embodiments, after ControlNet is deployed in an electronic device, the Control module and Unet in ControlNet can be deployed on different processors respectively. The electronic device can control two processors to run the Control module in ControlNet and the encoder in Unet in parallel. In this way, the electronic device can generate high-quality images using ControlNet while improving the efficiency of generating images.
示例性的,电子设备将Control模块部署在神经网络处理器(neural-networkprocessing unit,NPU)上,将Unet部署在图形处理器(graphics processing unit,GPU)上。又示例性的,电子设备将Control模块部署在GPU上,将Unet部署在NPU上。Exemplarily, the electronic device deploys the Control module on a neural-network processing unit (NPU) and deploys Unet on a graphics processing unit (GPU). In another exemplary embodiment, the electronic device deploys the Control module on a GPU and deploys Unet on an NPU.
参见图6,图6示出的电子设备将控制模块部署在NPU上,将U型网络部署在GPU上。电子设备在GPU运行SD模型中的文本编码器结束、VAE编码器运行结束之后,电子设备可以控制NPU和GPU并行运行控制模块和U型网络中的编码器(图6中示出的实线箭头表示顺次运行顺序)。U型网络中的编码器运行结束后,GPU中的U型网络暂停运行,在控制模块也运行结束后(如图6中虚线箭头表示在U型网络中的编码器运行结束后暂停运行,等待控制模块运行结束后才继续运行的运行顺序),电子设备控制GPU运行U型网络中的解码器以及VAE解码器。Referring to Figure 6, the electronic device shown in Figure 6 deploys the control module on the NPU and the U-shaped network on the GPU. After the text encoder of the SD model running on the GPU and the VAE encoder have finished running, the electronic device can control the NPU and the GPU to run the control module and the encoder in the U-shaped network in parallel (the solid arrows in Figure 6 indicate the sequential running order). After the encoder in the U-shaped network finishes running, the U-shaped network on the GPU pauses; after the control module has also finished running (the dotted arrow in Figure 6 indicates this order: pause after the encoder in the U-shaped network finishes, and resume only after the control module finishes), the electronic device controls the GPU to run the decoder in the U-shaped network and the VAE decoder.
具体地,电子设备可以采用中央处理器( Central Processing Unit ,CPU)上一个跨硬件平台部署模型库,实现控制NPU上Control模块和GPU上Unet并行运行的操作。以跨硬件平台部署模型库为NNAdapter为例,电子设备采用NNAdapter分别向NPU和GPU发送并行运行的控制指令,以适配电子设备并行运行NPU上的Control模块以及GPU上Unet的逻辑。为了便于说明,CPU可以称为第一处理器,GPU也可以称为第二处理器,NPU也可以称为第三处理器,或者,GPU也可以称为第三处理器,NPU也可以称为第二处理器。Specifically, the electronic device can use a cross-hardware platform deployment model library on the central processing unit (CPU) to implement the operation of controlling the parallel operation of the Control module on the NPU and the Unet on the GPU. Taking the cross-hardware platform deployment model library as NNAdapter as an example, the electronic device uses NNAdapter to send parallel operation control instructions to the NPU and GPU respectively to adapt the logic of the electronic device to run the Control module on the NPU and the Unet on the GPU in parallel. For ease of explanation, the CPU can be called the first processor, the GPU can be called the second processor, the NPU can be called the third processor, or the GPU can be called the third processor and the NPU can be called the second processor.
在一些实施例中, NNAdapter中包括“model manager”参数,“model manager”参数可以用于控制处理器的运行状态。CPU可以在NNAdapter中设置不同的“model manager”参数,“model manager=GPU”指示被控处理器为GPU,“model manager=NPU”指示被控处理器为NPU。 CPU根据“model manager=GPU”参数生成第一控制指令,根据“model manager=NPU”参数生成第二控制指令。CPU上NNAdapter分别向GPU发送第一控制指令,向NPU发送第二控制指令。GPU接收到第一控制指令时开始运行Unet,NPU接收到第二控制指令时开始运行Control模块。这样,CPU可以实现GPU和NPU两个处理器的运行状态的控制操作。In some embodiments, the NNAdapter includes a "model manager" parameter, and the "model manager" parameter can be used to control the operating state of the processor. The CPU can set different "model manager" parameters in the NNAdapter, "model manager=GPU" indicates that the controlled processor is a GPU, and "model manager=NPU" indicates that the controlled processor is an NPU. The CPU generates a first control instruction according to the "model manager=GPU" parameter, and generates a second control instruction according to the "model manager=NPU" parameter. The NNAdapter on the CPU sends the first control instruction to the GPU and the second control instruction to the NPU respectively. When the GPU receives the first control instruction, it starts to run Unet, and when the NPU receives the second control instruction, it starts to run the Control module. In this way, the CPU can implement the control operation of the operating state of the two processors, the GPU and the NPU.
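The dispatch described above can be pictured with the following sketch; the instruction layout and the send_to() helper are assumptions made for illustration and are not NNAdapter's actual interface.

```python
# Illustrative dispatch of the first and second control instructions (not NNAdapter's API).
def send_to(processor, instruction):
    print(f"dispatch to {processor}: {instruction}")

def build_instruction(model_manager, model):
    # "model manager" names the controlled processor: GPU runs Unet, NPU runs the Control module.
    return {"model manager": model_manager, "model": model}

first_instruction = build_instruction("GPU", "Unet")
second_instruction = build_instruction("NPU", "Control module")

send_to("GPU", first_instruction)    # the GPU starts running Unet on receipt
send_to("NPU", second_instruction)   # the NPU starts running the Control module on receipt
```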
并且,电子设备可以在Unet运行的过程中设置中断等待机制,这样,电子设备可以在Unet中编码器运行结束时进入中断等待状态,直至Control 模块运行结束之后,电子设备再运行Unet中的解码器。另外,电子设备还可以设置一个标志位参数,标志位参数用于指示Control模块的运行状态。标志位参数可以包括“0”或“1”。其中,标志位参数为“0”时,指示Control模块的运行状态为未运行结束,标志位参数为“1”时,指示Control模块的结束状态为运行结束。Furthermore, the electronic device can set an interrupt waiting mechanism during the operation of Unet, so that the electronic device can enter the interrupt waiting state when the encoder in Unet ends, and the electronic device can run the decoder in Unet after the Control module ends. In addition, the electronic device can also set a flag parameter, which is used to indicate the operating status of the Control module. The flag parameter can include "0" or "1". Among them, when the flag parameter is "0", it indicates that the operating status of the Control module is not completed, and when the flag parameter is "1", it indicates that the end status of the Control module is completed.
应理解,在每次生成图像数据3之后,电子设备均将标志位参数设置为指示Control模块运行状态为未运行结束。It should be understood that after each generation of the image data 3, the electronic device sets the flag parameter to indicate that the running state of the Control module is not completed.
具体地,一般Unet的运行框架在运行Unet的过程中,只接收一次输入数据,然后顺次执行Unet中的各模块,直到Unet运行结束,输出特征向量2。为并行运行Control模块和Unet中编码器,电子设备可以在Unet的运行框架设置中断等待机制。Unet的运行框架在Unet中的编码器运行结束之后进入中断等待状态,等到控制模块运行结束时,将控制模块输出的特征向量4和Unet中编码器输出的特征向量5一同作为输入数据,运行Unet中的解码器。Specifically, in the process of running Unet, the running framework of Unet generally receives input data only once and then executes each module in Unet in sequence until Unet finishes running and outputs feature vector 2. To run the Control module and the encoder in Unet in parallel, the electronic device can set an interrupt-waiting mechanism in the running framework of Unet. The running framework of Unet enters the interrupt-waiting state after the encoder in Unet finishes running; when the control module finishes running, the feature vector 4 output by the control module and the feature vector 5 output by the encoder in Unet are used together as input data to run the decoder in Unet.
以GPU中Unet的运行框架为GPU图像(Fast Artificial Intelligence Technology by Honor,FAITH)框架为例,电子设备可以在FAITH框架中设置中断等待机制。电子设备在SD模型中的文本编码器运行结束、VAE编码器运行结束之后,电子设备中的NNAdapter同时向GPU发送携带“model manager=GPU”的第一控制指令,向NPU发送携带“model manager=NPU”的第二控制指令。然后,GPU基于第一控制指令控制FAITH框架运行Unet,NPU基于第二控制指令运行Control模块。FAITH框架在Unet中的编码器运行结束之后进入中断等待状态,并获取标志位参数,根据标志位参数确定Control模块的运行状态。其中,GPU运行Unet中编码器的过程与NPU运行Control模块的过程具有重叠的时间区间。Control模块运行结束时,电子设备将标志位参数设置为指示Control模块运行状态为运行结束。FAITH框架在标志位参数显示Control模块运行状态为运行结束时,开始运行Unet中的解码器,然后GPU运行SD模型中的VAE解码器,最终输出图像数据3。FAITH框架在标志位参数显示Control模块的运行状态为未运行结束时继续中断等待状态,直至标志位参数显示Control模块运行结束。Taking the case where the running framework of Unet on the GPU is the GPU image (Fast Artificial Intelligence Technology by Honor, FAITH) framework as an example, the electronic device can set an interrupt-waiting mechanism in the FAITH framework. After the text encoder and the VAE encoder in the SD model finish running, the NNAdapter in the electronic device simultaneously sends a first control instruction carrying "model manager=GPU" to the GPU and a second control instruction carrying "model manager=NPU" to the NPU. Then, the GPU controls the FAITH framework to run Unet based on the first control instruction, and the NPU runs the Control module based on the second control instruction. After the encoder in Unet finishes running, the FAITH framework enters the interrupt-waiting state, obtains the flag parameter, and determines the running state of the Control module from the flag parameter. The process of the GPU running the encoder in Unet and the process of the NPU running the Control module have an overlapping time interval. When the Control module finishes running, the electronic device sets the flag parameter to indicate that the Control module has finished running. When the flag parameter shows that the Control module has finished running, the FAITH framework starts running the decoder in Unet, then the GPU runs the VAE decoder in the SD model, and finally image data 3 is output. When the flag parameter shows that the Control module has not finished running, the FAITH framework remains in the interrupt-waiting state until the flag parameter shows that the Control module has finished running.
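A minimal sketch of this interrupt-waiting behaviour is shown below. A threading.Event stands in for the flag-bit parameter (cleared = not finished, set = finished), and sleeps stand in for the model runs; the actual FAITH/NNAdapter mechanisms are not reproduced here.

```python
import threading
import time

control_done = threading.Event()   # flag bit: cleared (0) before each generation
results = {}

def npu_control_module(fv3):
    time.sleep(0.10)               # simulated Control-module run (~100 ms)
    results["fv4"] = "feature vector 4"
    control_done.set()             # flag bit set to 1: Control module finished

def gpu_unet(fv1):
    time.sleep(0.15)               # simulated Unet-encoder run (~150 ms)
    results["fv5"] = "feature vector 5"
    control_done.wait()            # interrupt-wait until the flag bit shows "finished"
    results["fv2"] = ("decoded from", results["fv4"], results["fv5"])  # Unet decoder step

gpu = threading.Thread(target=gpu_unet, args=("feature vector 1",))
npu = threading.Thread(target=npu_control_module, args=("feature vector 3",))
gpu.start(); npu.start(); gpu.join(); npu.join()
control_done.clear()               # reset the flag to "not finished" after this generation
print(results["fv2"])
```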
在一些实施例中,电子设备将控制网络的运行过程记录在离线日志(offlinelog)中。In some embodiments, the electronic device records the operation process of the control network in an offline log.
示例性的,offline log可以用于记录运行控制网络中每一个模块的处理器,以及控制网络中每一个模块的运行时间。例如,offline log包括控制网络中控制模块运行在NPU上,运行时间为10:00:00-10:00:20,控制网络中U型网络的编码器运行在GPU上,运行时间为10:00:00-10:00:30,控制网络中U型网络的解码器运行在GPU上,运行时间为10:00:30-10:01:00。Exemplarily, the offline log can be used to record the processor that runs each module in the control network and the running time of each module in the control network. For example, the offline log includes that the control module in the control network runs on the NPU and the running time is 10:00:00-10:00:20, the encoder of the U-type network in the control network runs on the GPU and the running time is 10:00:00-10:00:30, and the decoder of the U-type network in the control network runs on the GPU and the running time is 10:00:30-10:01:00.
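For illustration, the example entries above could be kept as structured records such as the following (field names are illustrative, not a defined log format):

```python
offline_log = [
    {"module": "control module",         "processor": "NPU", "start": "10:00:00", "end": "10:00:20"},
    {"module": "U-type network encoder", "processor": "GPU", "start": "10:00:00", "end": "10:00:30"},
    {"module": "U-type network decoder", "processor": "GPU", "start": "10:00:30", "end": "10:01:00"},
]
```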
在一些实施例中,控制网络还包括迭代次数编码器,控制网络的输入数据还包括迭代次数需求。迭代次数编码器对迭代次数需求进行特征提取处理,得到迭代次数需求对应的特征向量6之后,输入SD模型和控制模块中。为了便于说明,特征向量6也可以称为第三特征向量。In some embodiments, the control network further includes an iteration number encoder, and the input data of the control network further includes an iteration number requirement. The iteration number encoder performs feature extraction processing on the iteration number requirement, obtains a feature vector 6 corresponding to the iteration number requirement, and then inputs it into the SD model and the control module. For ease of explanation, the feature vector 6 may also be referred to as a third feature vector.
在一些实施例中,控制网络中控制模块的输入数据还包括随机变量(random),控制模块对随机变量进行下采样处理,可以得到包含更丰富的图像特征信息的特征向量2。也就是说,将随机变量输入控制网络中控制模块,可以进一步提高控制网络生成的图像数据3的质量。In some embodiments, the input data of the control module in the control network also includes random variables (random), and the control module performs downsampling processing on the random variables to obtain a feature vector 2 containing richer image feature information. In other words, inputting the random variables into the control module in the control network can further improve the quality of the image data 3 generated by the control network.
参见图7,本申请实施例提供了一种控制网络的结构图。该控制网络包括SD模型和控制模块。其中,SD模型的输入数据包括需求文本(Prompt)、迭代次数需求(Time)。控制网络中的文本编码器对需求文本进行特征提取处理,得到对应的特征向量1。VAE编码器对图像数据2进行特征提取处理,得到与图像数据2对应的特征向量3。Referring to FIG. 7 , an embodiment of the present application provides a structural diagram of a control network. The control network includes an SD model and a control module. The input data of the SD model includes a requirement text (Prompt) and a number of iterations required (Time). The text encoder in the control network performs feature extraction processing on the requirement text to obtain a corresponding feature vector 1. The VAE encoder performs feature extraction processing on the image data 2 to obtain a feature vector 3 corresponding to the image data 2.
迭代次数需求指示U型网络中的编码器(Unet encoder)、解码器(Unet decoder)需要迭代的次数,控制网络中的迭代次数编码器(Time Encoder)对迭代次数需求进行特征提取处理,得到迭代次数对应的特征向量6。例如,迭代次数需求为20,则U型网络中的编码器、解码器需要迭代20次,也即U型网络可以经过20次迭代降噪,得到特征向量2。The iteration requirement indicates the number of iterations required by the encoder (Unet encoder) and decoder (Unet decoder) in the U-type network, and the iteration encoder (Time Encoder) in the control network performs feature extraction processing on the iteration requirement to obtain a feature vector 6 corresponding to the iteration number. For example, if the iteration requirement is 20, the encoder and decoder in the U-type network need to iterate 20 times, that is, the U-type network can be denoised after 20 iterations to obtain a feature vector 2.
U型网络中的编码器包括SD模型第一编码器(SD Encoder Block1)、SD模型第二编码器(SD Encoder Block2)、SD模型第三编码器(SD Encoder Block3)、SD模型第四编码器(SD Encoder Block4)。其中,SD模型第一编码器的输出数据为64*64的矩阵,SD模型第二编码器的输出数据为32*32的矩阵,SD模型第三编码器的输出数据为16*16的矩阵, SD模型第四编码器的输出数据为8*8的矩阵,这样U型网络中的编码器可以通过逐级降噪提取得到特征向量5。The encoders in the U-type network include the first encoder of the SD model (SD Encoder Block1), the second encoder of the SD model (SD Encoder Block2), the third encoder of the SD model (SD Encoder Block3), and the fourth encoder of the SD model (SD Encoder Block4). The output data of the first encoder of the SD model is a 64*64 matrix, the output data of the second encoder of the SD model is a 32*32 matrix, the output data of the third encoder of the SD model is a 16*16 matrix, and the output data of the fourth encoder of the SD model is an 8*8 matrix. In this way, the encoders in the U-type network can extract the feature vector 5 through step-by-step noise reduction.
U型网络中的解码器包括SD模型第一解码器(SD Decoder Block1)、SD模型第二解码器(SD Decoder Block2)、SD模型第三解码器(SD Decoder Block3)、SD模型第四解码器(SD Decoder Block4)和SD模型第五解码器(SD Middle Block)。SD模型第一编码器与SD模型第一解码器、SD模型第二编码器与SD模型第二解码器、SD模型第三编码器与SD模型第三解码器、SD模型第四编码器与SD模型第四解码器一一对应,每个编码器的输出数据可以作为与之一一对应的解码器的输入数据。The decoders in the U-shaped network include the SD model first decoder (SD Decoder Block1), the SD model second decoder (SD Decoder Block2), the SD model third decoder (SD Decoder Block3), the SD model fourth decoder (SD Decoder Block4) and the SD model fifth decoder (SD Middle Block). The SD model first encoder corresponds to the SD model first decoder, the SD model second encoder corresponds to the SD model second decoder, the SD model third encoder corresponds to the SD model third decoder, and the SD model fourth encoder corresponds to the SD model fourth decoder, and the output data of each encoder can be used as the input data of the decoder corresponding to it.
控制模块的输入数据包括特征向量3(Condition)、与需求文本对应的特征向量1、与迭代次数需求对应的迭代次数特征向量6。其中,一层零卷积层(zero convolution)对特征向量3进行卷积处理后得到特征向量7,输入控制模块的SD模型编码器A(SD EncoderBlock A)中。在一些可能的实施例中,控制模块的输入数据还包括随机变量。The input data of the control module includes feature vector 3 (Condition), feature vector 1 corresponding to the requirement text, and feature vector 6 corresponding to the iteration number requirement. Among them, a zero convolution layer (zero convolution) convolves feature vector 3 to obtain feature vector 7, which is input into the SD model encoder A (SD EncoderBlock A) of the control module. In some possible embodiments, the input data of the control module also includes random variables.
控制模块包括SD模型编码器A、SD模型编码器B(SD Encoder Block B)、SD模型编码器C(SD Encoder Block C)、SD模型编码器D(SD Encoder Block D)和SD模型编码器E(SDMiddle Block)。控制模块中SD模型编码器A与U型网络中SD模型第一解码器、SD模型编码器B与U型网络中SD模型第二解码器、SD模型编码器C与U型网络中SD模型第三解码器、SD模型编码器D与U型网络中SD模型第四解码器、SD模型编码器E与U型网络中SD模型第五解码器一一对应。其中,SD模型编码器A的输出数据为64*64的矩阵,SD模型编码器B的输出数据为32*32的矩阵,SD模型编码器C的输出数据为16*16的矩阵,SD模型编码器D的输出数据为8*8的矩阵,SD模型编码器E的输出数据为8*8的矩阵。这样,控制模块可以通过多层编码器逐级降噪进一步提取得到与特征向量1、特征向量7以及特征向量6对应的特征向量8等。The control module includes an SD model encoder A, an SD model encoder B (SD Encoder Block B), an SD model encoder C (SD Encoder Block C), an SD model encoder D (SD Encoder Block D) and an SD model encoder E (SDMiddle Block). In the control module, the SD model encoder A corresponds to the first decoder of the SD model in the U-type network, the SD model encoder B corresponds to the second decoder of the SD model in the U-type network, the SD model encoder C corresponds to the third decoder of the SD model in the U-type network, the SD model encoder D corresponds to the fourth decoder of the SD model in the U-type network, and the SD model encoder E corresponds to the fifth decoder of the SD model in the U-type network. Among them, the output data of the SD model encoder A is a 64*64 matrix, the output data of the SD model encoder B is a 32*32 matrix, the output data of the SD model encoder C is a 16*16 matrix, the output data of the SD model encoder D is an 8*8 matrix, and the output data of the SD model encoder E is an 8*8 matrix. In this way, the control module can further extract the feature vector 8 corresponding to the feature vector 1, the feature vector 7 and the feature vector 6 through the multi-layer encoder step-by-step noise reduction.
控制模块中的每个编码器的输出数据(包括特征向量8)都经过一个零卷积层处理得到特征向量4,作为输入数据输入U型网络中与之一一对应的解码器中。The output data (including feature vector 8) of each encoder in the control module is processed by a zero convolution layer to obtain feature vector 4, which is input into the decoder corresponding to each one in the U-type network as input data.
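In the published ControlNet design, a zero convolution is a 1x1 convolution whose weights and bias are initialized to zero; a minimal PyTorch-style sketch is shown below (the channel count and tensor shape are illustrative assumptions).

```python
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution with zero-initialized weights and bias.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

fv8 = torch.randn(1, 320, 8, 8)    # assumed shape of a control-module encoder output
fv4 = zero_conv(320)(fv8)          # feature vector 4 passed to the matching U-net decoder
print(fv4.shape, float(fv4.abs().sum()))   # output is all zeros until the weights are updated
```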
在U型网络中的编码器和控制模块运行结束后,U型网络中的各解码器对U型网络中与之一一对应的每个编码器输出的特征向量5、以及控制模块中与之一一对应的每个编码器输出的特征向量4进行上采样处理,得到特征向量2。After the encoder in the U-shaped network and the control module finish running, each decoder in the U-shaped network upsamples the feature vector 5 output by its corresponding encoder in the U-shaped network and the feature vector 4 output by its corresponding encoder in the control module, to obtain feature vector 2.
然后,通过VAE解码器对特征向量2进行解码处理,得到对应的图像数据3。Then, the feature vector 2 is decoded by the VAE decoder to obtain the corresponding image data 3.
这样,电子设备采用ControlNet根据文字需求对图像数据2进行优化,可以生成与用户需求匹配的高质量的图像数据3。并且,电子设备并行运行Control模块和Unet中编码器,可以缩短电子设备采用ControlNet生成图像数据3的耗时,提高了图像数据3的生成效率。In this way, the electronic device uses ControlNet to optimize the image data 2 according to the text requirements, and can generate high-quality image data 3 that matches the user's requirements. In addition, the electronic device runs the Control module and the encoder in Unet in parallel, which can shorten the time it takes for the electronic device to use ControlNet to generate the image data 3, and improve the efficiency of generating the image data 3.
本申请提供的图像生成方法可以在包含下述硬件及软件结构的电子设备上执行,参见图8,为本申请实施例提供的一种电子设备的硬件结构图。The image generation method provided in the present application can be executed on an electronic device including the following hardware and software structures. See Figure 8, which is a hardware structure diagram of an electronic device provided in an embodiment of the present application.
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接头130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,摄像模组193,显示屏191,以及用户标识模块(subscriber identification module,SIM)卡接口192等。其中,传感器模块180可以包括压力传感器180A,触摸传感器180B,环境光传感器180C等。The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a camera module 193, a display screen 191, and a subscriber identification module (SIM) card interface 192. The sensor module 180 may include a pressure sensor 180A, a touch sensor 180B, an ambient light sensor 180C, and the like.
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。It is to be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently. The components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。The processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or integrated into one or more processors.
处理器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。The processor can generate operation control signals based on instruction opcodes and timing signals to complete the control of instruction fetching and execution.
处理器110中还可以设置存储器,用于存储指令和数据。The processor 110 may also be provided with a memory for storing instructions and data.
在一些实施例中,处理器110可以包括一个或多个接口。处理器110可以通过以上至少一种接口连接触摸传感器、音频模块、无线通信模块、显示器、摄像头等模块。可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。In some embodiments, the processor 110 may include one or more interfaces. The processor 110 may be connected to a touch sensor, an audio module, a wireless communication module, a display, a camera and other modules through at least one of the above interfaces. It is understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is only a schematic illustration and does not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
USB接头130是一种符合USB标准规范的接口,可以用于连接电子设备100和外围设备,具体可以是Mini USB接头,Micro USB接头,USB Type C接头等。The USB connector 130 is an interface that complies with USB standard specifications and can be used to connect the electronic device 100 and peripheral devices. Specifically, it can be a Mini USB connector, a Micro USB connector, a USB Type C connector, etc.
充电管理模块140用于接收充电器的充电输入。The charging management module 140 is used to receive a charging input from a charger.
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏191,摄像模组193,和无线通信模块160等供电。The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 to power the processor 110, the internal memory 121, the display screen 191, the camera module 193, and the wireless communication module 160.
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。The wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
电子设备100可以通过GPU,显示屏191,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏191和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 100 can realize the display function through a GPU, a display screen 191, and an application processor. The GPU is a microprocessor for image processing, which connects the display screen 191 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
显示屏191用于显示图像,视频等。显示屏191包括显示面板。The display screen 191 is used to display images, videos, etc. The display screen 191 includes a display panel.
电子设备100可以通过摄像模组193,ISP,视频编解码器,GPU,显示屏191以及应用处理器AP、神经网络处理器NPU等实现摄像功能。The electronic device 100 can realize the camera function through the camera module 193, ISP, video codec, GPU, display screen 191, application processor AP, neural network processor NPU, etc.
摄像模组193可用于采集拍摄对象的彩色图像数据以及深度数据。ISP 可用于处理摄像模组193采集的彩色图像数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将该电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像模组193中。The camera module 193 can be used to collect color image data and depth data of the photographed object. The ISP can be used to process the color image data collected by the camera module 193. For example, when taking a photo, the shutter is opened, and the light is transmitted to the camera photosensitive element through the lens. The light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converts it into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image. The ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP can be set in the camera module 193.
在一些实施例中,电子设备100可以包括1个或多个摄像模组193。In some embodiments, the electronic device 100 may include one or more camera modules 193 .
在一些实施例中,处理器110中的CPU或GPU或NPU可以对摄像模组193所采集的图像数据进行处理。在一些实施例中,NPU可以通过人脸识别算法,确定被拍摄人物的人脸所在位置,然后通过ControlNet对被拍摄人物的人脸图像进行美化处理,生成美化后的人脸图像。在一些实施例中,CPU或GPU或NPU还可用于根据摄像模组193(可以是3D感测模组)所采集的深度数据和已识别出的骨骼点来确认被拍摄人物的身材(如身体比例、骨骼点之间的身体部位的胖瘦情况),然后ControlNet对拍摄图像中被拍摄人物身体所对应的位置进行处理,以使得该拍摄图像中该被拍摄人物的体型被美化,得到美化后的人物图像。In some embodiments, the CPU or GPU or NPU in the processor 110 can process the image data collected by the camera module 193. In some embodiments, the NPU can determine the location of the face of the person being photographed through a face recognition algorithm, and then beautify the face image of the person being photographed through ControlNet to generate a beautified face image. In some embodiments, the CPU or GPU or NPU can also be used to confirm the body shape of the person being photographed (such as body proportions, fatness and thinness of body parts between bone points) based on the depth data collected by the camera module 193 (which can be a 3D sensing module) and the identified bone points, and then ControlNet processes the position corresponding to the body of the person being photographed in the captured image, so that the body shape of the person being photographed in the captured image is beautified to obtain a beautified image of the person.
NPU为神经网络(neural-network ,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像生成、图像识别,人脸识别,语音识别,文本理解等。NPU is a neural network (NN) computing processor. By drawing on the structure of biological neural networks, such as the transmission mode between neurons in the human brain, it can quickly process input information and can also continuously self-learn. Through NPU, applications such as intelligent cognition of electronic device 100 can be realized, such as: image generation, image recognition, face recognition, voice recognition, text understanding, etc.
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。或将音乐,视频等文件从电子设备传输至外部存储卡中。The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and videos are stored in the external memory card. Or files such as music and videos are transferred from the electronic device to the external memory card.
内部存储器121可以用于存储计算机可执行程序代码,该可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,执行电子设备100的各种功能方法或数据处理。The internal memory 121 can be used to store computer executable program codes, which include instructions. The internal memory 121 may include a program storage area and a data storage area. Among them, the program storage area may store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc. The data storage area may store data created during the use of the electronic device 100 (such as audio data, a phone book, etc.), etc. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), etc. The processor 110 executes various functional methods or data processing of the electronic device 100 by running instructions stored in the internal memory 121, and/or instructions stored in a memory provided in the processor.
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
传感器模块180用于采集设备数据或经用户授权后采集用户的行为数据。例如,采集用户常输入的需求文本“瘦脸”、“瘦身”等。The sensor module 180 is used to collect device data or collect user behavior data after authorization by the user, for example, collecting the user's frequently input demand texts such as "slim face" and "slim body".
按键190可以包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。The key 190 may include a power key, a volume key, etc. The key 190 may be a mechanical key or a touch key. The electronic device 100 may receive key input and generate key signal input related to user settings and function control of the electronic device 100.
上述电子设备中的软件系统可以采用分层架构,事件驱动架构或云架构。本申请实施例以电子设备为分层架构的Android™系统为例,示例性说明电子设备的软件结构。The software system in the above electronic device can adopt a layered architecture, an event-driven architecture or a cloud architecture. The embodiment of the present application takes the electronic device as an Android™ system with a layered architecture as an example to illustrate the software structure of the electronic device.
参照图9,为本申请实施例提供的电子设备的一种软件架构图。如图9所示,分层架构可将电子设备的软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android™系统分为四层,从上至下分别为应用程序层(Application layer)、应用程序框架层、系统库和内核层。Referring to FIG9 , a software architecture diagram of an electronic device provided in an embodiment of the present application is shown. As shown in FIG9 , the layered architecture can divide the software of the electronic device into several layers, each layer having a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android™ system is divided into four layers, namely, from top to bottom, the application layer (Application layer), the application framework layer, the system library, and the kernel layer.
其中,应用程序层可以包括多种应用程序包,如通话、短信和图库等。The application layer may include a variety of application packages, such as calls, text messages, and gallery.
在一些实施例中,应用程序包还可以包括具有图像生成功能的应用,如AI绘画、相机、美颜相机等。示例性的,AI绘画可以接收用户输入的需求文本,并根据用户操作从电子设备中图库内存储的图像数据中确定图像数据2。AI绘画运行ControlNet对图像数据2、需求文本进行处理,生成满足用户需求的图像数据3。In some embodiments, the application package may also include applications with image generation functions, such as AI painting, camera, beauty camera, etc. Exemplarily, AI painting may receive the required text input by the user, and determine the image data 2 from the image data stored in the gallery in the electronic device according to the user operation. AI painting runs ControlNet to process the image data 2 and the required text, and generates image data 3 that meets the user's requirements.
相机可以将“清晰、明亮”作为默认的需求文本,并将摄像头拍摄的初始图像数据作为图像数据2。相机运行ControlNet生成更清晰、明亮的图像数据3,作为相机拍摄的图像数据。The camera may use "clear and bright" as the default requirement text and use the initial image data captured by the camera as image data 2. The camera runs ControlNet to generate clearer and brighter image data 3 as the image data captured by the camera.
美颜相机可以根据不同美颜模式确定出对应的需求文本,并将摄像头拍摄的初始图像数据或者电子设备中图库内存储的图像数据作为图像数据2。美颜相机运行ControlNet生成与美颜模式对应的图像数据3,作为美颜相机拍摄的图像数据。例如,美颜模式为“瘦脸”,则美颜相机运行ControlNet对图像数据2中人脸所在位置进行优化,达到“瘦脸”的效果,从而生成对应的图像数据3,作为美颜相机拍摄的图像数据。The beauty camera can determine the corresponding demand text according to different beauty modes, and use the initial image data taken by the camera or the image data stored in the gallery of the electronic device as image data 2. The beauty camera runs ControlNet to generate image data 3 corresponding to the beauty mode as the image data taken by the beauty camera. For example, if the beauty mode is "face thinning", the beauty camera runs ControlNet to optimize the position of the face in image data 2 to achieve the effect of "face thinning", thereby generating the corresponding image data 3 as the image data taken by the beauty camera.
应用程序框架层为应用程序层的APP提供应用编程接口(applicationprogramming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。示例性的,应用程序框架层可以包括窗口管理器,资源管理器、视图系统等。The application framework layer provides an application programming interface (API) and a programming framework for the APP in the application layer. The application framework layer includes some predefined functions. Exemplarily, the application framework layer may include a window manager, a resource manager, a view system, etc.
系统库可以包括多个功能模块,如三维图形处理库(如OpenGL ES)和媒体库(Media Libraries)等。The system library may include multiple functional modules, such as a 3D graphics processing library (such as OpenGL ES) and a media library.
内核层是硬件和软件之间的层。内核层可以包含显示驱动、摄像头驱动、音频驱动等各种硬件驱动,用于驱动相应的硬件工作。The kernel layer is the layer between hardware and software. The kernel layer can include various hardware drivers such as display driver, camera driver, audio driver, etc., which are used to drive the corresponding hardware work.
应理解,图9所示,软件架构的分层仅为示例性的,软件架构可以包括更多或者更少的软件层。It should be understood that the layering of the software architecture shown in FIG. 9 is merely exemplary, and the software architecture may include more or fewer software layers.
以Control模块部署在NPU上、Unet部署在GPU上、并以相机应用为例,在一些实施例中,结合电子设备的软硬件结构,本实施例可以通过如图10所示的步骤实现图像生成方法。Taking the Control module deployed on the NPU, Unet deployed on the GPU, and the camera application as an example, in some embodiments, combined with the software and hardware structure of the electronic device, this embodiment can implement the image generation method through the steps shown in Figure 10.
S900、相机应用接收操作a。S900: The camera application receives operation a.
具体地,用户在相机应用的界面上执行操作a,操作a为拍摄操作,相机应用接收到该操作a。Specifically, the user performs operation a on the interface of the camera application, where operation a is a shooting operation, and the camera application receives operation a.
S901、相机应用响应于接收到操作a,向摄像头发送第一帧请求,第一帧请求用于指示摄像头采集图像数据2。S901 . In response to receiving operation a, the camera application sends a first frame request to the camera, where the first frame request is used to instruct the camera to collect image data 2 .
具体地,相机应用响应于接收到的操作a生成第一帧请求。其中,第一帧请求用于指示摄像头采集图像数据2。图像数据2可以作为ControlNet中Control模块的输入数据。Specifically, the camera application generates a first frame request in response to the received operation a. The first frame request is used to instruct the camera to collect image data 2. The image data 2 can be used as input data of the Control module in ControlNet.
S902、摄像头响应于第一帧请求采集图像数据2。S902: The camera collects image data 2 in response to the first frame request.
具体地,摄像头响应于第一帧请求,启动并采集初始图像数据作为图像数据2。Specifically, the camera starts and collects initial image data as image data 2 in response to the first frame request.
S903、摄像头向相机应用发送图像数据2。S903: The camera sends image data 2 to the camera application.
具体地,摄像头采集到图像数据2之后向相机应用发送图像数据2,便于相机应用指示CPU运行ControlNet对图像数据2进行优化,从而得到图像数据3。Specifically, after the camera collects the image data 2 , it sends the image data 2 to the camera application, so that the camera application instructs the CPU to run ControlNet to optimize the image data 2 , thereby obtaining the image data 3 .
S904、相机应用响应于接收到操作a,从内存中读取特征向量1和特征向量6。S904 : In response to receiving operation a, the camera application reads feature vector 1 and feature vector 6 from the memory.
具体地,在用户执行操作a之后,相机应用响应于接收到操作a还需要获取需求文本。电子设备内存可以预先存储多种需求文本、与多种需求文本具有一一对应关系的特征向量1、多种迭代次数需求以及与多种迭代次数需求具有一一对应关系的特征向量6。这样,相机应用接收到的操作a后,可以确定操作a携带的需求文本参数和迭代次数需求参数之后,可以直接从内存中读取到对应的特征向量1和特征向量6。为了便于说明,上述多种需求文本也可以称为多个目标文本,上述与多种需求文本具有一一对应关系的特征向量1也可以称为与多个目标文本一一对应的多个目标特征向量。Specifically, after the user performs operation a, the camera application needs to obtain the requirement text in response to receiving operation a. The electronic device memory can pre-store multiple requirement texts, feature vectors 1 that have a one-to-one correspondence with the multiple requirement texts, multiple iteration number requirements, and feature vectors 6 that have a one-to-one correspondence with the multiple iteration number requirements. In this way, after the camera application receives operation a, it can determine the requirement text parameters and iteration number requirement parameters carried by operation a, and then directly read the corresponding feature vector 1 and feature vector 6 from the memory. For the sake of convenience, the above-mentioned multiple requirement texts can also be referred to as multiple target texts, and the above-mentioned feature vector 1 that has a one-to-one correspondence with the multiple requirement texts can also be referred to as multiple target feature vectors that correspond one-to-one to the multiple target texts.
具体地,操作a可以携带用户设置的需求文本参数和迭代次数需求参数。例如,用户通过在相机应用的界面选择需求文本参数和迭代次数需求参数,或者在相机应用的界面内输入设置需求文本参数和迭代次数需求参数。其中,需求文本参数可以是具体的需求文本内容或者与内存中存储的需求文本对应的ID,迭代次数需求参数可以包括迭代次数的值,或者与内存中存储的迭代次数需求对应的ID。例如,需求文本参数为需求文本“瘦脸”,或者需求文本参数为“XQ001”,用于指示内存中存储的需求文本“瘦脸”。还例如,迭代次数需求参数可以包括迭代次数需求15,或者迭代次数需求参数为“DD001”,用于指示内存中存储的迭代次数需求15次。Specifically, operation a can carry the requirement text parameter and the iteration number requirement parameter set by the user. For example, the user selects the requirement text parameter and the iteration number requirement parameter on the interface of the camera application, or enters them in the interface of the camera application. The requirement text parameter can be the specific requirement text content or an ID corresponding to a requirement text stored in the memory; the iteration number requirement parameter can include the value of the iteration count, or an ID corresponding to an iteration number requirement stored in the memory. For example, the requirement text parameter is the requirement text "thin face", or the requirement text parameter is "XQ001", which indicates the requirement text "thin face" stored in the memory. As another example, the iteration number requirement parameter can include an iteration number requirement of 15, or the iteration number requirement parameter is "DD001", which indicates the iteration number requirement of 15 stored in the memory.
示例性的,内存中存储有分别与ID为XQ001的需求文本“瘦脸”、ID为XQ002的需求文本“放大眼睛”、ID为XQ003的需求文本“清晰、明亮”一一对应的特征向量1。以及内存中存储有分别与ID为DD001的迭代次数需求“15”,ID为DD002的迭代次数需求“20”以及ID为DD003的迭代次数需求“30” 一一对应的特征向量6。用户在相机应用界面执行操作a后,操作a携带有需求文本参数“瘦脸”以及ID为“DD001”的迭代次数需求参数。相机应用响应于接收到操作a,可以从内存中读取ID为XQ001的需求文本“瘦脸”对应的特征向量1以及ID为DD001的迭代次数需求“15” 对应的特征向量6。Exemplarily, the memory stores feature vectors 1 that correspond one-to-one to the requirement text "thin face" with ID XQ001, the requirement text "enlarge eyes" with ID XQ002, and the requirement text "clear and bright" with ID XQ003. And the memory stores feature vectors 6 that correspond one-to-one to the iteration number requirement "15" with ID DD001, the iteration number requirement "20" with ID DD002, and the iteration number requirement "30" with ID DD003. After the user performs operation a in the camera application interface, operation a carries the requirement text parameter "thin face" and the iteration number requirement parameter with ID "DD001". In response to receiving operation a, the camera application can read feature vector 1 corresponding to the requirement text "thin face" with ID XQ001 and feature vector 6 corresponding to the iteration number requirement "15" with ID DD001 from the memory.
具体地,电子设备中存储的多种需求文本中包括默认需求文本,多种迭代次数需求中包括默认迭代次数需求。在操作a不携带需求文本参数和迭代次数需求参数时,相机应用可以直接从内存中读取默认需求文本具有对应关系的特征向量1,以及与默认迭代次数需求具有对应关系的特征向量6。Specifically, the multiple requirement texts stored in the electronic device include a default requirement text, and the multiple iteration number requirements include a default iteration number requirement. When operation a does not carry a requirement text parameter and an iteration number requirement parameter, the camera application can directly read a feature vector 1 having a corresponding relationship with the default requirement text and a feature vector 6 having a corresponding relationship with the default iteration number requirement from the memory.
示例性的,一般情况下,用户使用相机应用拍摄图片、视频的需求为“清晰、明亮”,因此,电子设备可以将“清晰、明亮”作为相机应用的默认需求文本。为了提高图像生成的效率,进一步的,电子设备可以预先通过SD模型中的文本编码器对默认需求文本“清晰、明亮”进行特征提取处理,得到特征向量1并存储在内存中。这样,在操作a不携带需求文本参数时,相机应用可以直接从内存中读取特征向量1。可以理解的是,相机应用的默认需求文本还可以根据实际需求进行具体设定,上述将“清晰、明亮”作为相机应用的默认需求文本仅是一种示例,在此不对相机应用的默认需求文本做具体限定。ControlNet的迭代次数需求可以也设定默认值,例如迭代次数需求为20次,电子设备采用迭代次数编码器对20次迭代次数需求进行特征提取处理得到特征向量6,并存储在内存中。这样,在操作a不携带迭代次数需求参数时,相机应用可以直接从内存中读取特征向量6。For example, in general, the user's requirement for taking pictures and videos with a camera application is "clear and bright", so the electronic device can use "clear and bright" as the default requirement text of the camera application. In order to improve the efficiency of image generation, further, the electronic device can perform feature extraction processing on the default requirement text "clear and bright" through the text encoder in the SD model in advance, obtain feature vector 1 and store it in the memory. In this way, when operation a does not carry the requirement text parameter, the camera application can directly read the feature vector 1 from the memory. It can be understood that the default requirement text of the camera application can also be specifically set according to actual needs. The above-mentioned "clear and bright" as the default requirement text of the camera application is only an example, and the default requirement text of the camera application is not specifically limited here. The iteration number requirement of ControlNet can also set a default value. For example, the iteration number requirement is 20 times, and the electronic device uses the iteration number encoder to perform feature extraction processing on the 20 iteration number requirements to obtain feature vector 6 and store it in the memory. In this way, when operation a does not carry the iteration number requirement parameter, the camera application can directly read the feature vector 6 from the memory.
电子设备预先将多种需求文本的特征向量1存储在内存中,相机应用在拍摄过程中可以根据用户的操作a直接从内存中读取出与用户需求匹配的特征向量1,而无需文本编码器对操作a对应的需求文本进行特征提取处理,进一步减少了电子设备拍摄过程的耗时。The electronic device stores feature vectors 1 of multiple demand texts in memory in advance. During the shooting process, the camera application can directly read the feature vector 1 matching the user demand from the memory according to the user's operation a, without the need for a text encoder to perform feature extraction processing on the demand text corresponding to operation a, thereby further reducing the time consumption of the electronic device's shooting process.
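A sketch of this in-memory lookup is given below, using the IDs from the example above; the data structure and function are illustrative, and the stored values stand in for the precomputed feature vectors.

```python
PROMPT_FEATURES = {        # ID -> (requirement text, precomputed feature vector 1)
    "XQ001": ("thin face",     "fv1_thin_face"),
    "XQ002": ("enlarge eyes",  "fv1_enlarge_eyes"),
    "XQ003": ("clear, bright", "fv1_clear_bright"),
}
ITERATION_FEATURES = {     # ID -> (iteration count, precomputed feature vector 6)
    "DD001": (15, "fv6_15"),
    "DD002": (20, "fv6_20"),
    "DD003": (30, "fv6_30"),
}
DEFAULT_PROMPT_ID = "XQ003"       # default requirement text: "clear, bright"
DEFAULT_ITERATION_ID = "DD002"    # default iteration number requirement: 20

def read_feature_vectors(prompt_id=None, iteration_id=None):
    # Operation a may carry explicit parameters; otherwise the defaults are used.
    _, fv1 = PROMPT_FEATURES[prompt_id or DEFAULT_PROMPT_ID]
    _, fv6 = ITERATION_FEATURES[iteration_id or DEFAULT_ITERATION_ID]
    return fv1, fv6

print(read_feature_vectors("XQ001", "DD001"))   # parameters carried by operation a
print(read_feature_vectors())                   # no parameters: default text and count
```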
可以理解的是,步骤S901和步骤S904均为相机应用响应于接收到操作a之后执行的步骤,上述两个步骤的执行顺序不以图10中所示的顺序为限制。同理,与步骤S901、S904关联的步骤也不以图10中所示的顺序为限制。例如,相机应用响应于接收到操作a,先执行步骤S904,后执行步骤S901。这种情况下,与步骤S901关联的步骤S902-S903的执行顺序仍位于步骤S901之后。It is understandable that both step S901 and step S904 are steps executed by the camera application in response to receiving operation a, and the execution order of these two steps is not limited to the order shown in Figure 10. Similarly, the steps associated with steps S901 and S904 are not limited to the order shown in Figure 10. For example, in response to receiving operation a, the camera application first executes step S904 and then executes step S901. In this case, steps S902-S903, which are associated with step S901, are still executed after step S901.
S905、相机应用在接收到特征向量1和特征向量6之后,生成图像生成指令,图像生成指令包括特征向量1、特征向量6及图像数据2。S905 . After receiving feature vector 1 and feature vector 6 , the camera application generates an image generation instruction, where the image generation instruction includes feature vector 1 , feature vector 6 , and image data 2 .
具体地,相机应用在接收到特征向量1和特征向量6之后,基于操作a生成对应的图像生成指令。图像生成指令包括特征向量1、特征向量6及图像数据2。图像生成指令用于指示CPU运行ControlNet对特征向量1、特征向量6以及图像数据2进行处理,得到对应的图像数据3。Specifically, after receiving feature vector 1 and feature vector 6, the camera application generates a corresponding image generation instruction based on operation a. The image generation instruction includes feature vector 1, feature vector 6, and image data 2. The image generation instruction is used to instruct the CPU to run ControlNet to process feature vector 1, feature vector 6, and image data 2 to obtain the corresponding image data 3.
S906、相机应用向CPU发送图像生成指令。S906: The camera application sends an image generation instruction to the CPU.
具体地,相机应用向CPU发送图像生成指令,以指示CPU根据图像生成指令运行ControlNet。Specifically, the camera application sends an image generation instruction to the CPU to instruct the CPU to run ControlNet according to the image generation instruction.
S907、CPU响应于接收到图像生成指令,生成第三控制指令,第三控制指令包括图像数据2。S907 . In response to receiving the image generation instruction, the CPU generates a third control instruction, where the third control instruction includes image data 2 .
具体地,CPU响应于接收到图像生成指令开始运行ControlNet,根据图像生成指令生成第三控制指令,第三控制指令包括图像数据2,第三控制指令用于控制NPU运行VAE编码器,对图像数据2进行优化处理得到特征向量3。Specifically, the CPU starts running ControlNet in response to receiving an image generation instruction, and generates a third control instruction based on the image generation instruction. The third control instruction includes image data 2. The third control instruction is used to control the NPU to run the VAE encoder and optimize the image data 2 to obtain a feature vector 3.
S908、CPU向NPU发送第三控制指令。S908. The CPU sends a third control instruction to the NPU.
具体地,CPU向NPU发送第三控制指令。Specifically, the CPU sends a third control instruction to the NPU.
S909、NPU响应于接收到第三控制指令,运行控制网络中的VAE编码器。S909 . In response to receiving the third control instruction, the NPU runs the VAE encoder in the control network.
具体地,NPU响应于接收到第三控制指令,根据第三控制指令运行控制网络中的VAE编码器,VAE编码建立线程对图像数据2进行特征提取处理,得到与图像数据2对应的特征向量3。Specifically, in response to receiving the third control instruction, the NPU runs the VAE encoder in the control network according to the third control instruction, and the VAE encoding establishment thread performs feature extraction processing on the image data 2 to obtain a feature vector 3 corresponding to the image data 2.
应理解,如图10示出的VAE编码器部署在NPU上, VAE解码器部署在GPU上,此处仅为对VAE编码器、VAE解码器部署情况的示例,并不构成对VAE编码器、VAE解码器的具体限定。It should be understood that the VAE encoder is deployed on the NPU and the VAE decoder is deployed on the GPU as shown in FIG10 . This is merely an example of the deployment of the VAE encoder and VAE decoder and does not constitute a specific limitation on the VAE encoder and VAE decoder.
示例性的,VAE编码器也可以部署于GPU上,不与控制网络中的控制模块部署在同一个处理器上。这种情况下,GPU需要在VAE编码器运行结束之后,将VAE编码器运行结果(包括特征向量3)发送至NPU,便于NPU将特征向量3作为输入,运行控制网络中的控制模块。For example, the VAE encoder can also be deployed on the GPU, and not on the same processor as the control module in the control network. In this case, the GPU needs to send the VAE encoder operation results (including feature vector 3) to the NPU after the VAE encoder is finished running, so that the NPU can use feature vector 3 as input to run the control module in the control network.
S910、VAE编码器运行结束。S910, VAE encoder operation ends.
具体地,VAE编码器运行结束NPU跳出当前线程,此时CPU可以确定VAE编码器运行结束。可以理解的是,此时VAE编码器的第三运行结果(包括特征向量3)存储在NPU中,在NPU运行控制模块时,VAE编码器的第三运行结果可以作为控制模块的输入数据。Specifically, when the VAE encoder finishes running, the NPU jumps out of the current thread, and the CPU can determine that the VAE encoder has finished running. It can be understood that at this time, the third running result of the VAE encoder (including feature vector 3) is stored in the NPU, and when the NPU runs the control module, the third running result of the VAE encoder can be used as input data of the control module.
S911、CPU生成第一控制指令和第二控制指令,第一控制指令包括特征向量1和特征向量6,第二控制指令包括特征向量1和特征向量6。S911. The CPU generates a first control instruction and a second control instruction. The first control instruction includes feature vector 1 and feature vector 6, and the second control instruction includes feature vector 1 and feature vector 6.
具体地,CPU生成第一控制指令和第二控制指令,第一控制指令包括特征向量1和特征向量6,用于指示GPU运行控制网络中U型网络内的编码器,对特征向量1和特征向量6进行下采样处理。第二控制指令包括特征向量1和特征向量6,第二控制指令用于指示NPU运行控制网络中的控制模块对特征向量1、特征向量3及特征向量6进行下采样处理。Specifically, the CPU generates a first control instruction and a second control instruction, the first control instruction includes feature vector 1 and feature vector 6, and is used to instruct the GPU to run the encoder in the U-shaped network in the control network to downsample feature vector 1 and feature vector 6. The second control instruction includes feature vector 1 and feature vector 6, and the second control instruction is used to instruct the NPU to run the control module in the control network to downsample feature vector 1, feature vector 3, and feature vector 6.
S912、CPU向GPU发送第一控制指令。S912: The CPU sends a first control instruction to the GPU.
S913、CPU向NPU 发送第二控制指令。S913: The CPU sends a second control instruction to the NPU.
具体地,CPU向GPU发送第一控制指令之后,向NPU发送第二控制指令,可以使GPU运行U型网络的编码器的过程与NPU运行控制模块的过程具有重叠的时间区间。相比等控制模块运行结束之后才运行U型网络的编码器的方式来说,可以减少控制网络的运行耗时,从而提高图像数据3的生成效率。Specifically, after the CPU sends the first control instruction to the GPU, it sends the second control instruction to the NPU, so that the process of the GPU running the encoder of the U-type network and the process of the NPU running the control module have an overlapping time interval. Compared with the method of running the encoder of the U-type network after the control module is finished running, the operation time of the control network can be reduced, thereby improving the generation efficiency of the image data 3.
步骤S912和步骤S913均为CPU生成第一控制指令和第二控制指令之后执行的步骤，上述两个步骤的执行顺序不以图10中所示的顺序为限制。例如，步骤S911之后，先执行步骤S913后执行步骤S912。或者，执行步骤S911之后，同时执行步骤S912和步骤S913。Step S912 and step S913 are both executed after the CPU generates the first control instruction and the second control instruction, and the execution order of these two steps is not limited to the order shown in FIG. 10. For example, after step S911, step S913 may be executed first and then step S912. Alternatively, after step S911 is executed, step S912 and step S913 may be executed simultaneously.
在一种可能的实施例中，CPU同时向GPU发送第一控制指令、向NPU发送第二控制指令。这样，GPU接收到第一控制指令和NPU接收到第二控制指令的时间几乎相同，从而实现CPU同时控制GPU和NPU上控制网络的运行状态。In a possible embodiment, the CPU sends the first control instruction to the GPU and the second control instruction to the NPU at the same time. In this way, the time at which the GPU receives the first control instruction and the time at which the NPU receives the second control instruction are almost the same, so that the CPU can simultaneously control the running states of the control network on the GPU and on the NPU.
S914、GPU响应于接收到第一控制指令,运行U型网络中的编码器。S914: In response to receiving the first control instruction, the GPU runs the encoder in the U-type network.
S915、NPU响应于接收到第二控制指令,运行控制网络中的控制模块。S915. In response to receiving the second control instruction, the NPU runs a control module in the control network.
具体地,GPU响应于接收到第一控制指令,建立一个线程来运行U型网络中的编码器, NPU响应于接收到第二控制指令,建立一个线程来运行控制网络中的控制模块。可以理解的是,GPU和NPU分别建立线程,可以实现GPU上U型网络中的编码器和NPU上控制网络中的控制模块并行运行,GPU运行U型网络的编码器的线程与NPU运行控制模块的线程具有重叠的时间区间。Specifically, in response to receiving the first control instruction, the GPU establishes a thread to run the encoder in the U-type network, and in response to receiving the second control instruction, the NPU establishes a thread to run the control module in the control network. It can be understood that the GPU and the NPU establish threads respectively, so that the encoder in the U-type network on the GPU and the control module in the control network on the NPU can run in parallel, and the thread of the GPU running the encoder of the U-type network and the thread of the NPU running the control module have overlapping time intervals.
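A minimal sketch, assuming the two modules are exposed as plain Python callables, of how launching them on separate threads gives the GPU-side encoder and the NPU-side control module overlapping time intervals; the sleep durations are placeholders, not measured runtimes.

```python
import threading
import time

def run_unet_encoder():
    time.sleep(0.05)          # placeholder for the encoder of the U-type network
    print("U-type network encoder finished on GPU")

def run_control_module():
    time.sleep(0.03)          # placeholder for the control module
    print("control module finished on NPU")

gpu_thread = threading.Thread(target=run_unet_encoder)
npu_thread = threading.Thread(target=run_control_module)
gpu_thread.start()            # first control instruction -> GPU
npu_thread.start()            # second control instruction -> NPU, overlapping with the GPU work
gpu_thread.join()
npu_thread.join()
```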
在一种可能的实施例中,在同一时刻,GPU响应于接收到第一控制指令,建立一个线程来运行U型网络中的编码器,并且NPU响应于接收到第二控制指令,建立一个线程来运行控制网络中的控制模块。这样,GPU上U型网络中的编码器和NPU上控制网络中的控制模块可以同时开始运行。In a possible embodiment, at the same time, the GPU, in response to receiving the first control instruction, establishes a thread to run the encoder in the U-type network, and the NPU, in response to receiving the second control instruction, establishes a thread to run the control module in the control network. In this way, the encoder in the U-type network on the GPU and the control module in the control network on the NPU can start running at the same time.
可以理解的是,步骤S914、步骤S915的执行顺序不以图10中所示的顺序为限制。例如,在执行步骤S912和步骤S913之后,先执行步骤S915后执行步骤S914。It is understandable that the execution order of step S914 and step S915 is not limited to the order shown in Fig. 10. For example, after executing step S912 and step S913, step S915 is executed first and then step S914.
S916、控制模块运行结束。S916: The control module operation ends.
具体地，NPU上的控制模块运行结束时，NPU跳出当前线程，此时CPU可以确定控制模块运行结束。Specifically, when the control module on the NPU finishes running, the NPU exits the current thread, and at this time the CPU can determine that the control module has finished running.
S917、CPU更新内存中的标志位参数为控制模块运行结束。S917: The CPU updates the flag parameter in the memory to indicate that the control module operation has ended.
具体地, 电子设备在每一次执行完图像生成流程之后,设置内存中存储的标志位参数为指示Control模块未运行结束,这样,在每次执行图像生成流程之前,电子设备内存中存储的标志位参数为指示Control模块未运行结束。在NPU上的控制模块运行结束时,CPU及时更新内存中的标志位参数为控制模块运行结束。例如,在执行图像生成流程之前,电子设备内存中存储的标志位参数为 “0”指示Control模块未运行结束。在NPU上的控制模块运行结束时,CPU及时更新内存中的标志位参数为“1”指示控制模块运行结束。Specifically, after each execution of the image generation process, the electronic device sets the flag parameter stored in the memory to indicate that the Control module has not completed operation. In this way, before each execution of the image generation process, the flag parameter stored in the memory of the electronic device indicates that the Control module has not completed operation. When the control module on the NPU completes operation, the CPU promptly updates the flag parameter in the memory to indicate that the control module has completed operation. For example, before executing the image generation process, the flag parameter stored in the memory of the electronic device is "0", indicating that the Control module has not completed operation. When the control module on the NPU completes operation, the CPU promptly updates the flag parameter in the memory to "1", indicating that the control module has completed operation.
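A minimal sketch of the flag-bit bookkeeping described above, using a plain dictionary as a stand-in for the shared memory; the function names and the dictionary key are assumptions for illustration.

```python
# Illustrative only: a dict stands in for the device memory holding the flag bit.
CONTROL_NOT_FINISHED = "0"
CONTROL_FINISHED = "1"

memory = {"control_module_flag": CONTROL_NOT_FINISHED}

def reset_flag_before_generation(memory):
    # Before each image generation pass, the flag indicates "not finished".
    memory["control_module_flag"] = CONTROL_NOT_FINISHED

def mark_control_module_finished(memory):
    # Updated by a CPU-side routine once the control module on the NPU ends (step S917).
    memory["control_module_flag"] = CONTROL_FINISHED

reset_flag_before_generation(memory)
mark_control_module_finished(memory)
```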
S918、NPU响应于控制模块运行结束,存储控制模块的第一运行结果。S918. In response to the control module completing its operation, the NPU stores a first operation result of the control module.
具体地，控制模块的运行时长一般小于U型网络中编码器的运行时长，并且考虑到设备异常等情况，控制模块存在先运行结束的可能。因此，在控制模块运行结束时，NPU可以先将控制模块的第一运行结果（包括特征向量4）存储至内存中。在GPU上U型网络的编码器运行结束时，GPU可以从内存中读取控制模块的第一运行结果。Specifically, the running time of the control module is generally shorter than that of the encoder in the U-type network, and, also considering situations such as device anomalies, the control module may finish running first. Therefore, when the control module finishes running, the NPU may first store the first running result of the control module (including feature vector 4) in the memory. When the encoder of the U-type network on the GPU finishes running, the GPU can read the first running result of the control module from the memory.
在另一些实施例中,NPU响应于控制模块运行结束,可以向GPU发送控制模块的第一运行结果,便于GPU运行U型网络中的解码器将控制模块的第一运行结果(包括特征向量4)作为输入数据进行上采样处理。In other embodiments, in response to the completion of the control module operation, the NPU may send the first operation result of the control module to the GPU, so that the GPU runs the decoder in the U-type network and uses the first operation result of the control module (including feature vector 4) as input data for upsampling processing.
可以理解的是,步骤S917和步骤S918均为控制模块运行结束之后执行的步骤,上述两个步骤的执行顺序不以图10中所示的顺序为限制。例如,控制模块运行结束之后,先执行步骤S918后执行步骤S917。It is understandable that both step S917 and step S918 are steps executed after the control module is finished running, and the execution order of the above two steps is not limited to the order shown in Figure 10. For example, after the control module is finished running, step S918 is executed first and then step S917.
S919、GPU响应于U型网络中的编码器运行结束,从内存中读取标志位参数,标志位参数指示控制模块运行结束。S919. In response to the encoder in the U-type network finishing its operation, the GPU reads a flag parameter from the memory, where the flag parameter indicates that the control module has finished its operation.
具体地，GPU响应于U型网络中的编码器运行结束，进入中断等待状态，并可以周期性地从内存中读取标志位参数；在标志位参数指示控制模块运行结束时，GPU可以运行U型网络中的解码器。其中，GPU进入中断等待状态具体为运行U型网络的框架暂停执行，也就是说运行U型网络的框架不继续运行U型网络中的解码器。示例性的，GPU采用FAITH框架运行U型网络，FAITH框架响应于U型网络中的编码器运行结束，不继续运行U型网络中的解码器，而是周期性地从内存中读取标志位参数。标志位参数用于指示控制模块的运行状态。示例性的，标志位参数为"0"指示控制模块未运行结束，标志位参数为"1"指示控制模块运行结束。在标志位参数为"1"指示控制模块运行结束时，FAITH框架可以运行U型网络中的解码器。Specifically, in response to the encoder in the U-type network finishing its operation, the GPU enters an interrupt waiting state and can periodically read the flag bit parameter from the memory; when the flag bit parameter indicates that the control module has finished running, the GPU can run the decoder in the U-type network. Here, the GPU entering the interrupt waiting state specifically means that the framework running the U-type network suspends execution, that is, the framework does not go on to run the decoder in the U-type network. For example, the GPU uses the FAITH framework to run the U-type network; in response to the encoder in the U-type network finishing its operation, the FAITH framework does not continue to run the decoder, but periodically reads the flag bit parameter from the memory. The flag bit parameter is used to indicate the running state of the control module. For example, a flag bit parameter of "0" indicates that the control module has not finished running, and a flag bit parameter of "1" indicates that the control module has finished running. When the flag bit parameter is "1", indicating that the control module has finished running, the FAITH framework can run the decoder in the U-type network.
其中，由于图像生成的耗时较短，读取标志位的周期较小，一般为ms级。GPU周期性地读取标志位参数，可以及时确定控制模块的运行状态。上述读取标志位的周期可以根据实际情况具体设定，例如，读取标志位的周期为2ms，或者读取标志位的周期为5ms，在此不做具体限定。Since image generation takes a short time, the period for reading the flag bit is small, generally on the millisecond level. By periodically reading the flag bit parameter, the GPU can determine the running state of the control module in a timely manner. The above period for reading the flag bit can be set according to actual conditions; for example, the period may be 2 ms or 5 ms, which is not specifically limited here.
需要说明的是,GPU响应于U型网络中的编码器运行结束,从内存中读取标志位参数时,标志位参数也可能指示控制模块未运行结束,此时GPU继续保持中断等待状态,直至读取到的标志位参数指示控制模块运行结束。It should be noted that when the GPU responds to the completion of the encoder operation in the U-type network and reads the flag parameter from the memory, the flag parameter may also indicate that the control module has not completed the operation. At this time, the GPU continues to maintain the interrupt waiting state until the flag parameter read indicates that the control module has completed the operation.
S920、GPU从内存中读取控制模块的第一运行结果。S920: The GPU reads a first operation result of the control module from the memory.
具体地,GPU在标志位参数指示控制模块运行结束时,从内存中读取控制模块的第一运行结果,便于将U型网络中编码器的第二运行结果和控制模块的第一运行结果作为U型网络中解码器的输入数据。Specifically, when the flag parameter indicates that the control module has finished running, the GPU reads the first running result of the control module from the memory, so as to use the second running result of the encoder in the U-type network and the first running result of the control module as input data of the decoder in the U-type network.
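The following sketch illustrates the polling behaviour of S919–S920 under the assumptions above: the GPU-side framework waits on the flag at a millisecond-scale period and then reads the control module's first running result from memory. The 2 ms period, the timeout guard, and the "control_module_first_result" key (assumed to have been written at step S918) are values introduced for this example, not requirements of the method.

```python
import time
import numpy as np

def wait_and_fetch_control_result(memory, period_s=0.002, timeout_s=1.0):
    waited = 0.0
    while memory.get("control_module_flag") != "1":
        if waited >= timeout_s:
            raise TimeoutError("control module did not finish in time")
        time.sleep(period_s)      # assumed 2 ms polling period
        waited += period_s
    # Flag is "1": read the first running result stored at step S918.
    return memory["control_module_first_result"]

# Demonstration with the flag and the result already written (steps S917/S918).
memory = {"control_module_flag": "1",
          "control_module_first_result": np.zeros(512, dtype=np.float32)}
feature_vector_4 = wait_and_fetch_control_result(memory)
```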
S921、GPU运行U型网络中的解码器。S921, GPU runs the decoder in the U-type network.
具体地,GPU在读取到控制模块的第一运行结果(包括特征向量4)时,将U型网络中编码器的第二运行结果(包括特征向量5)和上述第一运行结果叠加输入到解码器中,运行U型网络中的解码器。Specifically, when the GPU reads the first operation result (including feature vector 4) of the control module, it superimposes the second operation result (including feature vector 5) of the encoder in the U-type network and the above-mentioned first operation result and inputs them into the decoder to run the decoder in the U-type network.
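A sketch of the superimposition in S921, assuming it is a simple elementwise sum of the encoder's second running result (feature vector 5) and the control module's first running result (feature vector 4); the actual fusion operation inside the network may differ.

```python
import numpy as np

feature_vector_4 = np.random.rand(64).astype(np.float32)   # control module output
feature_vector_5 = np.random.rand(64).astype(np.float32)   # U-type network encoder output

# Assumed superimposition: an elementwise sum forms the decoder's input.
decoder_input = feature_vector_5 + feature_vector_4
# decoder_input is then upsampled by the decoder of the U-type network.
```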
S922、GPU运行控制网络中的VAE解码器,得到图像数据3。S922, GPU runs the VAE decoder in the control network to obtain image data 3.
具体地,GPU在U型网络中的解码器运行结束时,顺序运行控制网络中的VAE解码器,得到图像数据3。Specifically, when the GPU finishes running the decoder in the U-type network, it sequentially runs the VAE decoder in the control network to obtain image data 3.
S923、GPU向相机应用发送图像数据3。S923. GPU sends image data 3 to the camera application.
具体地,GPU将控制网络输出的图像数据3发送至相机应用。Specifically, the GPU sends the image data 3 output by the control network to the camera application.
S924、相机应用显示第一界面,第一界面包括图像数据3。S924. The camera application displays a first interface, where the first interface includes image data 3.
具体地,相机应用在接收到图像数据3时,显示包括上述图像数据3的第一界面。此时,图像数据3可以作为相机应用拍摄预览的照片。Specifically, when the camera application receives the image data 3, it displays a first interface including the image data 3. At this time, the image data 3 can be used as a photo taken and previewed by the camera application.
采用上述图像生成流程,电子设备可以在相机应用拍摄的过程中,采用控制网络优化摄像头采集的初始图像数据,并行运行控制网络中的控制模块和U型网络中的编码器,以提高控制网络基于初始图像数据生成图像数据3的效率,从而快速在界面显示满足用户需求的图片,提高电子设备拍摄高质量照片的效率。By adopting the above-mentioned image generation process, the electronic device can use the control network to optimize the initial image data collected by the camera during the camera application shooting process, and run the control module in the control network and the encoder in the U-shaped network in parallel to improve the efficiency of the control network in generating image data 3 based on the initial image data, thereby quickly displaying pictures that meet user needs on the interface and improving the efficiency of the electronic device in taking high-quality photos.
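Putting the steps above together, the following compact sketch mirrors the parallelised flow end to end; every helper is a trivial stand-in introduced for the example (not the real VAE, control module, or U-type network), and the flag-bit wait is folded into the thread joins.

```python
import threading
import numpy as np

# Trivial stand-ins for the network components, sized so the shapes line up.
def vae_encode(img):        return img.mean(axis=0)                             # feature vector 3
def control_module(fv3):    return 0.5 * fv3                                    # feature vector 4
def unet_encoder(text_fv):  return np.resize(text_fv, 512).astype(np.float32)   # feature vector 5
def unet_decoder(fv):       return np.outer(fv, fv)[:64, :64]                   # latent feature map
def vae_decode(latent):     return (255 * (latent - latent.min()) /
                                    (np.ptp(latent) + 1e-6)).astype(np.uint8)   # image data 3

def generate_image(image_data_2, text_features):
    fv3 = vae_encode(image_data_2)                        # S909-S910, VAE encoder on the NPU
    out = {}
    gpu = threading.Thread(target=lambda: out.update(fv5=unet_encoder(text_features)))
    npu = threading.Thread(target=lambda: out.update(fv4=control_module(fv3)))
    gpu.start(); npu.start()                              # S912-S915, overlapping execution
    gpu.join(); npu.join()                                # S916-S920, flag wait folded into join()
    latent = unet_decoder(out["fv5"] + out["fv4"])        # S921, superimposed decoder input
    return vae_decode(latent)                             # S922

image_data_3 = generate_image(np.random.rand(512, 512), np.random.rand(77, 8))
```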
本申请另一些实施例提供了一种电子设备,该电子设备可以包括:存储器和一个或多个处理器。该存储器和处理器耦合。该存储器用于存储计算机程序代码,该计算机程序代码包括计算机指令。当处理器执行计算机指令时,电子设备可执行上述方法实施例中的各个步骤。该电子设备的结构可以参考图8所示的结构。Some other embodiments of the present application provide an electronic device, which may include: a memory and one or more processors. The memory is coupled to the processor. The memory is used to store computer program code, and the computer program code includes computer instructions. When the processor executes the computer instructions, the electronic device may perform each step in the above method embodiment. The structure of the electronic device may refer to the structure shown in FIG8.
本申请实施例还提供一种计算机可读存储介质,该计算机可读存储介质包括计算机指令,当所述计算机指令在上述电子设备上运行时,使得该电子设备执行上述方法实施例中的各个步骤。An embodiment of the present application also provides a computer-readable storage medium, which includes computer instructions. When the computer instructions are executed on the above-mentioned electronic device, the electronic device executes each step in the above-mentioned method embodiment.
本申请实施例还提供一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行上述方法实施例中的各个步骤。The embodiment of the present application also provides a computer program product. When the computer program product is run on a computer, the computer is enabled to execute each step in the above method embodiment.
通过以上实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。Through the description of the above implementation methods, technical personnel in the relevant field can clearly understand that for the convenience and simplicity of description, only the division of the above-mentioned functional modules is used as an example. In actual applications, the above-mentioned functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the modules or units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another device, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed in multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the present embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on this understanding, the technical solution of the embodiment of the present application is essentially or the part that contributes to the prior art or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium, including several instructions to enable a device (which can be a single-chip microcomputer, chip, etc.) or a processor (processor) to execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage medium includes: U disk, mobile hard disk, read only memory (ROM), random access memory (RAM), disk or optical disk and other media that can store program code.
以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。The above contents are only specific implementation methods of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions within the technical scope disclosed in the present application shall be included in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202410365189.8A (CN118096498B) | 2024-03-28 | 2024-03-28 | Image generation method and electronic device
Publications (2)

Publication Number | Publication Date
---|---
CN118096498A | 2024-05-28
CN118096498B | 2024-09-06
Family
- ID: 91143839

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
---|---|---|---|---
CN202410365189.8A (CN118096498B) | Image generation method and electronic device | 2024-03-28 | 2024-03-28 | Active

Country Status (1)

Country | Link
---|---
CN (1) | CN118096498B (en)
Also Published As
- CN118096498A (2024-05-28)
Legal Events
- PB01 — Publication
- SE01 — Entry into force of request for substantive examination
- GR01 — Patent grant
- CP03 — Change of name, title or address: Patentee changed from Honor Device Co., Ltd. (3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong, China) to Honor Terminal Co., Ltd. (Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040, China)