
CN117495894A - Image generation processing method and electronic equipment - Google Patents


Info

Publication number
CN117495894A
Authority
CN
China
Prior art keywords
image
target
area
background
design base
Prior art date
Legal status
Pending
Application number
CN202311272120.2A
Other languages
Chinese (zh)
Inventor
王维民
朱大鹏
裴立
陈国文
易纪斌
Current Assignee
Hangzhou Alibaba Overseas Internet Industry Co ltd
Original Assignee
Hangzhou Alibaba Overseas Internet Industry Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Alibaba Overseas Internet Industry Co ltd
Priority to CN202311272120.2A
Publication of CN117495894A
Priority to PCT/CN2024/106268 (published as WO2025066457A1)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/194 - Segmentation; Edge detection involving foreground-background segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/06 - Buying, selling or leasing transactions
    • G06Q30/0601 - Electronic shopping [e-shopping]
    • G06Q30/0641 - Shopping interfaces
    • G06Q30/0643 - Graphical representation of items or shoppers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present application disclose an image generation processing method and an electronic device. The method comprises: determining an image to be processed, and segmenting a target subject image from the image to be processed; generating a design base map with a transparent background for the target subject image, and backing it up; complementing the transparent background portion of the design base map with a background image by using an artificial intelligence (AI) large language model to obtain an AI-generated image, wherein the area of the background image in the AI-generated image is larger than the area of the transparent background in the design base map; and covering the backed-up design base map over the AI-generated image and performing image fusion processing to generate a target image, wherein the target image includes the complete target subject image, which occludes the edge at the junction between the background image and the target subject image. By the embodiments of the present application, high-quality images can be produced more efficiently and at lower cost.

Description

Image generation processing method and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular to an image generation processing method and an electronic device.
Background
In a commodity information service system, a consumer's first impression of a commodity is formed mainly through the commodity main image, which represents the commodity's outward appearance. The quality of the main image therefore directly affects the commodity's click-through rate, and in turn indexes such as store traffic and the browse-to-purchase conversion rate.
To obtain a high-quality commodity main image, some merchants hire professional design teams to shoot the commodity in real scenes, but this approach is expensive. Alternatively, merchants or technicians in the system can perform matting, drawing and similar processing with offline tools; compared with hiring a professional team this reduces production cost, but because the tools have a technical threshold and are complex to operate, production efficiency remains low. As a result, a large number of relatively low-quality commodity images still exist in practice, and it is difficult for them to improve commodity click-through rates.
How to produce high-quality commodity images at lower cost and higher efficiency is therefore a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The present application provides an image generation processing method and an electronic device, which can produce high-quality images at lower cost and more efficiently.
The present application provides the following solutions:
an image generation processing method, comprising:
determining an image to be processed, and segmenting a target subject image from the image to be processed;
generating a design base map with a transparent background for the target subject image, and backing up the design base map;
complementing the transparent background portion of the design base map with a background image by using an artificial intelligence (AI) large language model to obtain an AI-generated image, wherein the area of the background image in the AI-generated image is larger than the area of the transparent background in the design base map, so that the background image forms an occlusion of the edge portion of the target subject image;
and covering the backed-up design base map over the AI-generated image and performing image fusion processing to generate a target image, wherein the target image includes the complete target subject image, and the target subject image occludes the edge at the junction between the background image and the target subject image.
Wherein the complementing of the transparent background portion of the design base map with a background image by using the AI large language model comprises:
generating a mask map according to the position and size of the target subject image in the design base map, wherein the mask map includes a first area corresponding to the area where the target subject image is located in the design base map, and the portion outside the first area is a second area;
performing first processing on the mask map so that the first area is reduced and the second area is enlarged;
and calling the AI large language model to generate a background image by taking the design base map and the first-processed mask map as input information, so that, with the image content of the reduced first area in the design base map kept unchanged, the generated background image is complemented into the enlarged second area to obtain the AI-generated image.
Wherein the performing of first processing on the mask map comprises:
performing Gaussian blur processing on the mask map so that the first area in the mask map is reduced and the second area is enlarged.
Wherein the image to be processed is an image related to a commodity;
and the segmenting of the target subject image from the image to be processed comprises:
identifying the category to which the commodity included in the image to be processed belongs;
and performing target detection in the image to be processed according to the identified commodity category, and segmenting the target subject image from the image to be processed according to the detected target area.
Wherein, if there is an extra white edge around the segmented target subject image, the method further comprises:
selecting, starting from the outermost side of the white edge and moving inward, the pixel points within a target width range as pixel points to be processed;
raising the pixel values of the pixel points to be processed on the transparency channel to the power of N, so as to remove or weaken the white edge; wherein N is a natural number greater than 1.
Wherein, if the segmented target subject image has jagged edges, the method further comprises:
performing Gaussian blur processing on the area where the jagged edges are located, so as to remove or weaken the jagged edges.
Wherein the generating of a design base map with a transparent background for the target subject image comprises:
providing a canvas interface, and displaying the segmented target subject image in the canvas interface;
and determining the position and size of the target subject image in the canvas according to moving and/or zooming operations performed on the target subject image by the user in the canvas interface, and generating the design base map with the transparent background.
Wherein the AI large language model is further associated with a control plug-in, and the control plug-in is used for controlling the background image generated by the AI large language model so as to strengthen the association between the generated background image and the target subject image;
during control, the control plug-in reads the area where the target subject image is located, and controls the area outside the target subject image with a control weight lower than that used for the area where the target subject image is located;
and the area of the target subject image read by the control plug-in is larger than the actual area of the target subject image.
Wherein the method further comprises:
determining target background style and/or scene description information, so as to add the target background style and/or scene description information to the input information when the AI large language model is called, and generating, by the AI large language model, a background image consistent with the target background style and/or scene description information.
An image generation processing method, comprising:
determining an image to be processed, and segmenting a target subject image from the image to be processed;
generating a design base map with a transparent background for the target subject image;
and, while an AI large language model complements the transparent background area in the design base map with a background image, controlling the generation process of the background image, reading the area where the target subject image is located during control, and controlling the area outside the target subject image with a control weight lower than that used for the area where the target subject image is located; wherein the read area of the target subject image is larger than the actual area of the target subject image.
Wherein the reading of the area where the target subject image is located comprises:
reading the area of the target subject image according to the mask map corresponding to the design base map;
wherein the mask map is generated by: generating an original mask map according to the position and size of the target subject image in the design base map, wherein the original mask map includes a first area corresponding to the area where the target subject image is located in the design base map, and the portion outside the first area is a second area;
and performing second processing on the original mask map to enlarge the first area and reduce the second area, so that the area corresponding to the target subject image is read according to the mask map obtained after the second processing.
An image segmentation processing method, comprising:
determining a target subject image segmented from an original image;
if there is an extra white edge around the segmented target subject image, selecting, starting from the outermost side of the white edge and moving inward, the pixel points within a target width range as pixel points to be processed;
and raising the pixel values of the pixel points to be processed on the transparency channel to the power of N, so as to remove or weaken the white edge; wherein N is a natural number greater than 1.
An image generation processing apparatus, comprising:
a subject segmentation unit, configured to determine an image to be processed and segment a target subject image from the image to be processed;
a design base map generating unit, configured to generate a design base map with a transparent background for the target subject image and back it up;
an image generation unit, configured to complement the transparent background portion of the design base map with a background image by using an artificial intelligence AI large language model to obtain an AI-generated image, wherein the area of the background image in the AI-generated image is larger than the area of the transparent background in the design base map, forming an occlusion of the edge portion of the target subject image;
and a fusion processing unit, configured to cover the backed-up design base map over the AI-generated image and perform image fusion processing on it to generate a target image, wherein the target image includes the complete target subject image, and the target subject image occludes the edge at the junction between the background image and the target subject image.
An image generation processing apparatus, comprising:
a subject segmentation unit, configured to determine an image to be processed and segment a target subject image from the image to be processed;
a design base map generating unit, configured to generate a design base map with a transparent background for the target subject image;
and a generation control unit, configured to control the generation process of the background image while the AI large language model complements the transparent background area in the design base map with the background image, read the area where the target subject image is located during control, and control the area outside the target subject image with a control weight lower than that used for the area where the target subject image is located; wherein the read area of the target subject image is larger than the actual area of the target subject image.
An image segmentation processing apparatus, comprising:
a subject segmentation result determining unit, configured to determine a target subject image segmented from an original image;
a to-be-processed pixel determining unit, configured to, if there is an extra white edge around the segmented target subject image, select, starting from the outermost side of the white edge and moving inward, the pixel points within a target width range as pixel points to be processed;
and a computing processing unit, configured to raise the pixel values of the pixel points to be processed on the transparency channel to the power of N to remove or weaken the white edge; wherein N is a natural number greater than 1.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any of the preceding claims.
An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions which, when read and executed by the one or more processors, perform the steps of the method of any of the preceding claims.
According to the specific embodiments provided by the present application, the present application discloses the following technical effects:
according to the embodiment of the application, the main image can be segmented from the image to be processed appointed by the user, and the background image is generated by using the AI large model as the main image under the condition that the main image is kept unchanged. In the process of generating the background image, in order to enable the edge part of the main image and the background image to realize more natural fusion, the AI large model can be firstly enabled to generate a background image with a circle larger than the background image, after the background image with the circle larger than the background image is complemented into the design base image to obtain the AI generated image, the original design base image can be further covered on the AI generated image through an image fusion technology, so that the edge at the junction between the background image generated by the AI large model and the main image can be blocked by the main image, even if the edge of the background image has pure color or speculative content generated by the AI large model, the edge of the background image is not displayed, and therefore, more natural fusion between the main image and the background image can be realized, and a generated image with higher quality can be obtained.
In a preferred implementation, after the subject image is segmented from the image to be processed, processing such as removal of "white edges" and "jaggies" may be performed to further improve the final image effect. When removing white edges, the transparency-channel pixel values of the pixel points in the white-edge area are raised to the power of N, so that white-edge removal can be carried out automatically.
In addition, in order for the background image generated by the AI large model to interact better with the subject image in terms of outline, light-and-shadow relations and the like, a plug-in of the AI large model can be used to control the generation process of the background image. To avoid edge diffusion of the subject image or an overly uniform background image caused by setting the control weight too low or too high, the control logic of the plug-in is improved: instead of using the same control weight for the whole image, the area outside the subject image is controlled with a lower weight, so that control is applied mainly to the area where the subject image is located. This prevents the AI large model from generating an overly uniform background image while reducing the probability of edge diffusion of the subject image. Meanwhile, to further avoid edge diffusion, the subject image area read by the control plug-in can be made slightly larger than the actual subject image, which better resolves the edge diffusion problem.
Of course, it is not necessary for any product implementing the present application to achieve all of the advantages described above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application;
FIG. 2 is a flow chart of a first method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of "white edges" generated during segmentation of a subject image;
FIG. 4 is a schematic diagram of an image processing flow provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of processing a mask map according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the ControlNet principle;
FIG. 7 is a schematic diagram of an AI-generated image and its fusion with the design base map, provided in an embodiment of the application;
FIG. 8 is a flow chart of a second method provided by an embodiment of the present application;
FIG. 9 is a flow chart of a third method provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application fall within the scope of protection of the present application.
First, it should be noted that in a commodity information service system, the quality of a commodity image depends not only on whether the image is sufficiently clear and whether the shooting angle is well chosen, but also, more importantly, on whether the impression expressed by the commodity subject and the background in the image is unified. For example, for a relatively delicate commodity, a cluttered shooting background, or a background whose quality cannot set off the delicacy of the commodity, compromises the quality of the commodity image. In practice, choosing a reasonable shooting angle and taking a clear commodity photo has a relatively low technical threshold, but choosing a suitable background image for the commodity has a relatively high one: it requires the merchant to have sufficient aesthetic sense, and a merchant who lacks it must, in the prior art, turn to a professional design team for help. Although images can also be beautified with some drawing tools, most rely on manual operation, and a technical threshold and complexity still exist.
In view of the above, in the embodiments of the present application, an image generation service may be provided for users such as merchants. The user may be allowed to upload or designate a specific image to be processed; a specific subject image is then segmented from it, and, with the content of that part of the image kept unchanged, a suitable background image is generated by using the capability of an AI (Artificial Intelligence) large language model (Large Language Model, LLM, hereinafter "AI large model"), after which a complete image is composed. Because the AI large model has strong image-generation capability, understanding of multi-modal information and so on, it can generate a more reasonable background image with a more unified impression for a specific subject image. Apart from the earlier training of the AI large model, no additional labor cost or technical threshold is needed during use, so images can be produced at lower cost and higher efficiency. Of course, in a specific implementation, specific background style and/or scene description information can be selected by the user or specified by the system by default, so that the background image generated by the AI large model meets a specific style or scene requirement.
In this way, when a merchant needs to generate an image such as a commodity main image, if the commodity has been photographed in advance and a photo containing the commodity subject has been obtained, and the quality of the subject content in the photo is relatively high so that only the background needs optimization, the image generation service provided by the embodiments of the present application can be used. The photo is uploaded to the service, which uses the AI large model to generate for it a more reasonable background image with a more unified impression with respect to the commodity subject. No professional photography is required, and no manual background replacement through existing drawing tools is needed, thereby reducing cost and improving efficiency.
The AI large model may refer to a foundation model (Foundation Model), specifically a model with a huge number of parameters, trained on massive data, that can adapt to a series of downstream tasks. AI large models are characterized by their huge parameter scale (with continuous iteration of models, the parameter count has generally grown exponentially, from hundreds of millions to trillions and beyond), and in terms of modality support they have gradually developed from supporting a single task in a single modality, such as image, text, speech or video, to supporting multiple tasks across multiple modalities. That is, a large model generally has efficient understanding of multi-modal information, cross-modal perception, transfer and execution across differentiated tasks, and may even possess multi-modal perception capabilities similar to those embodied by the human brain.
From another perspective, "AI large model" is short for an artificial-intelligence pre-trained large model; it combines the two notions of pre-training and large model, which together produce a new mode of artificial intelligence: after pre-training on a large-scale dataset, the model can support various downstream applications without fine-tuning, or with fine-tuning on only a small amount of data. That is, the AI large model benefits from its "large-scale pre-training plus fine-tuning" paradigm, which adapts well to different downstream tasks and exhibits strong generality. Such a general AI large model only needs corresponding fine-tuning in different downstream application scenarios, with parameters shared, to obtain excellent performance, breaking through the limitation that traditional AI models are difficult to generalize to other tasks.
From the viewpoint of the processing results, the above AI large model is also a generative model (Generative Model). Such models not only "understand" how data are generated based on learned features, but can also "create" new data on that basis.
With the above capabilities and the support of its existing knowledge, the basic AI large model may be pre-trained according to the scene requirements in the embodiments of the present application so that it acquires background-image production capability. For example, some high-definition, high-quality images may be collected in advance: images in which the subject and background express a unified impression may be used as positive samples, non-unified ones as negative samples, and so on; these are input into the AI large model, which then produces specific background-image content on demand. The content produced by the model can be verified manually, and its accuracy and other information fed back to the AI large model, so that the AI large model learns iteratively and the content it finally produces has higher accuracy.
In the process of producing background images through the AI large model, in order to make the finally generated image look more realistic (that is, more like an image obtained by photographing a subject such as a commodity in a scene with the specific background, rather than like a commodity image pasted onto the background image), the embodiments of the present application provide several improvements.
First, since the subject image portion must be kept unchanged while the background is regenerated, a specific approach may be to first identify and segment the subject image portion (i.e., the portion of the image to be processed belonging to the foreground or main subject) from the original image to be processed, and then regenerate only the background, with that portion as content to remain unchanged. To achieve this, after the subject image portion is segmented, a design base map with a transparent background may be generated for the subject image, and a mask (Mask) map of the design base map may then be obtained. The mask map covers the region of the design base map that belongs to the subject image, marking it as a region that the AI need not generate; the other regions are the ones the AI large model needs to generate.
After the above processing, the mask corresponding to the design base map could theoretically be used directly, with the AI large model generating the background image; however, this can cause unnatural blending between the edge of the subject image and the background. To solve this problem, the embodiments of the present application make an improvement: the AI large language model is used to generate a background image one circle larger, and the original design base map is then covered over the AI-generated image, so that the edge of the AI-generated background image (mainly the edge at its junction with the subject image) is occluded by the subject image and is not displayed, improving the quality of the generated image.
To achieve this, after the mask corresponding to the design base map is generated, it is not used directly as the mask; instead, the region corresponding to the subject image in the mask is first shrunk inward, and accordingly the surrounding region in which the background image needs to be generated is enlarged. That is, the area of the target subject image read by the AI large model is made smaller than the actual one, so that the generated background image is slightly larger; when this background image is complemented into the design base map, it occludes the edge portion belonging to the subject image. Of course, in the image generated at this point, the subject image lacks its edge content, so the original design base map can be covered over the AI-generated image, making the subject content in the finally generated image complete and better blended into the surrounding background, as sketched below.
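As an illustration of the covering step, the fusion can be sketched with Pillow's alpha compositing. This is a minimal sketch under the assumption that both images share the same canvas size; the function name is hypothetical, and a production fusion pipeline may be more elaborate.

    from PIL import Image

    def fuse_design_over_ai(ai_generated: Image.Image,
                            design_base: Image.Image) -> Image.Image:
        # Both images share the canvas size by construction of the pipeline.
        ai_rgba = ai_generated.convert("RGBA")
        base_rgba = design_base.convert("RGBA")
        # Alpha compositing: opaque subject pixels cover the AI content
        # (including the background's inner edge near the subject), while
        # transparent pixels keep the generated background visible.
        return Image.alpha_composite(ai_rgba, base_rgba)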
To further improve the effect of the finally generated image, some improvements may also be made when segmenting the subject image from the original image to be processed. For example, a circle of "white edge" may exist around a subject image segmented with a common segmentation algorithm, and some edges may be jagged; these flaws can also be processed to further improve the realism of the finally generated image.
In addition, although a background image can in theory be generated completely freely by the AI large model, if it is, there may be no interaction or association between the background image and the subject image; even if the two are unified in impression, there may be defects in overall coordination and the like. Therefore, in a preferred mode, the generation process of the background image can be controlled through a control plug-in of the AI large model, so that, while a certain degree of freedom is kept, the generated background image interacts better with the subject content. However, since the control plug-in by default controls the generation of the entire image, a conflict arises: if the control weight is too low, the edges of the subject image may diffuse; if it is too high, the generated background image is uniform and not rich enough. For this, the embodiments of the present application also provide a corresponding improvement. The improvements concerning white-edge removal, jaggy removal, and the control plug-in are described in detail below.
From a system architecture perspective, referring to FIG. 1, the embodiments of the present application may provide an image generation service for merchants, technicians in the system, and so on. The service may provide a server on which the specific AI large model runs. The client may be an independent client program (App), a web page, or the like; through the client a user can upload an image or designate a commodity-related image, initiate an image generation request, and have the server process it. Since invoking the AI large model to generate an image may take some time, the server may also manage requests with a task queue. For example, after the user selects the desired style description information and creates a task, the task enters the server's task queue; the image generation service in the server then pulls tasks from the queue, and the generated images are stored in a storage system for the user to download, and so on.
Specific embodiments provided in the embodiments of the present application are described in detail below.
Example 1
First, this embodiment provides an image generation processing method from the perspective of the server. Referring to FIG. 2, the method may include:
s201: and determining an image to be processed, and segmenting a target subject image from the image to be processed.
The image to be processed is an image for which a background image needs to be generated. Different usage scenarios may be supported for determining it: for example, it may be a picture manually uploaded by the user; or, because the service may be associated with a specific commodity information service system, input may be given by specifying a commodity link or the like, in which case an image satisfying the conditions may be parsed out of the commodity link as the image to be processed, and so on.
In addition, when the user uploads the image to be processed, the uploaded image can be pre-checked, and images in which a person occludes the commodity, or which contain multiple commodity subjects, and the like, can be filtered out. For example, a target detection service in the system may be invoked to detect all targets included in the image and determine, by a certain threshold, whether a human body is present; an element identification service is then invoked to determine whether the number of commodity elements in the image exceeds a threshold, and so on. If the image uploaded by the user contains no human body and the number of commodity elements does not exceed the threshold (e.g., three; this judgment mainly ensures the quality of the finally generated image, since if one image contains too many commodity elements, each element may be too small to serve as a commodity main image, so such images may be filtered out), the check passes. That is, the uploaded image can be used as the image to be processed, the subject image is segmented in a subsequent step, and a background image is generated.
After the image to be processed is determined, the target subject image may be segmented from it, a so-called "matting" process. In a specific implementation, the subject image may be segmented in a traditional matting manner. Alternatively, because the image to be processed in the embodiments of the present application may be a commodity-related image and the commodity itself carries category information, that information may be used to assist image segmentation and improve segmentation accuracy. In this case, the category of the commodity included in the image to be processed is first identified, target detection is then performed in the image according to the identified commodity category, and the target subject image is segmented from the image according to the detected target area. For example, the subject segmentation capability may use Segment Anything (a foundation model for image segmentation) plus Grounding DINO (for target detection; the detected target is a region, and if the region contains multiple targets that are not easily distinguished, it can cover all of them) as its technical base. After the commodity category is predicted by a category prediction capability (the category name may also be machine-translated into English by a deployed translation model to fit Grounding DINO's input language), the Grounding DINO target detection algorithm identifies the position of the commodity in the image according to the category, Segment Anything is then called, and a Mask image of the commodity is segmented. A pixel mapping between the Mask image and the original commodity image then yields the subject segmentation result, as sketched below.
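For intuition, the pipeline just described can be sketched as follows. This is a minimal sketch: the four callables are hypothetical placeholders standing in for the category classifier, the translation model, Grounding DINO and Segment Anything respectively, not real library APIs.

    from typing import Callable
    import numpy as np

    def segment_subject(image: np.ndarray,
                        predict_category: Callable[[np.ndarray], str],
                        translate: Callable[[str], str],
                        detect_boxes: Callable[[np.ndarray, str], np.ndarray],
                        segment_with_boxes: Callable[[np.ndarray, np.ndarray], np.ndarray],
                        ) -> np.ndarray:
        category = predict_category(image)       # e.g. a category name in Chinese
        prompt = translate(category)             # English prompt for Grounding DINO
        boxes = detect_boxes(image, prompt)      # detected commodity region(s)
        mask = segment_with_boxes(image, boxes)  # boolean H x W subject mask
        # Pixel mapping back onto the original image: keep subject pixels,
        # make everything else transparent (RGBA output).
        alpha = mask.astype(np.uint8) * 255
        return np.dstack([image, alpha])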
Here, since the image to be processed may include multiple commodity elements, category identification may output a probability value for each of several categories the image may contain. The probability value is also affected by the position, size and so on of the specific element in the image; the category with the highest probability can then be taken as the identified subject commodity category, i.e., the commodity category mainly expressed in the image to be processed, and the subject image is then segmented according to that identified category.
The subject segmentation result obtained in the above manner may be rough, which mainly manifests in two aspects: 1) the commodity's edges may have an extra white fringe (a "white edge"); for example, 31 in FIG. 3 is a subject image in which a circle of white edge is visible around the subject, and if subsequent image generation is performed directly on this segmentation result, the white edge will also exist in the newly generated image; 2) there may also be some jagged edges. Therefore, to ensure final image quality, the segmentation result can be further refined, including optimizations such as white-edge removal and jaggy removal.
To remove the white edge, the inventors of the present application first analyzed its characteristics and found the following: the white edge formed around the subject image during segmentation is a circle of semi-transparent extra pixels, consisting of the pixel points within a target width range inward from the outermost side of the white edge; this width range is usually about 5 pixels, and the pixels exhibit a gradient in transparency from completely transparent to completely opaque, from outside to inside. Based on these characteristics, the embodiments of the present application provide a specific white-edge removal method: each pixel point on the extra pixel band is taken as a pixel point to be processed, and the pixel values of these pixels on the transparency channel are raised to the power of N (N may be a natural number greater than 1, e.g., typically 2 or 3). Pixels with high original transparency thereby become more transparent, pixels with low original transparency become more opaque, and the transparency curve changes from linear to exponential, so the white edge narrows, possibly to the point where it cannot be perceived by the naked eye, achieving the purpose of removing or weakening the white edge.
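This white-edge treatment can be sketched as follows. Here semi-transparency is used as a simple proxy for the roughly 5-pixel band that the application selects from the outermost edge inward; the band-selection detail is simplified, and the function name is an assumption.

    import numpy as np

    def remove_white_edge(rgba: np.ndarray, n: int = 2) -> np.ndarray:
        # rgba: H x W x 4 uint8 array holding the segmented subject.
        out = rgba.copy()
        alpha = out[..., 3].astype(np.float64) / 255.0  # normalize to [0, 1]
        # The white-edge band consists of semi-transparent pixels.
        band = (alpha > 0.0) & (alpha < 1.0)
        # Raising alpha to the power of N turns the linear fade into an
        # exponential one: nearly transparent fringe pixels fade out while
        # nearly opaque pixels stay, so the visible fringe narrows.
        alpha[band] = alpha[band] ** n
        out[..., 3] = np.round(alpha * 255.0).astype(np.uint8)
        return out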
As for the jagged edges, some anti-aliasing methods exist in the prior art; the embodiments of the present application compared them and verified that Gaussian blur is better suited to the specific scene characteristics here. Therefore, Gaussian blur processing can be performed on the area where the jagged edges are located, so as to remove or weaken them, as in the sketch below.
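A sketch of the anti-aliasing step; for simplicity it blurs the entire transparency channel rather than only the detected jagged regions, and the kernel size is an illustrative assumption.

    import cv2
    import numpy as np

    def soften_jagged_edges(rgba: np.ndarray, ksize: int = 5) -> np.ndarray:
        # Blur only the transparency channel so the color content stays
        # untouched; a smooth alpha ramp visually removes the staircase.
        out = rgba.copy()
        out[..., 3] = cv2.GaussianBlur(out[..., 3], (ksize, ksize), 0)
        return out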
After white-edge and jaggy removal, the segmented target subject image has higher quality, and a better fusion effect can be obtained when it is fused with the background image generated by the AI large model.
S202: generating a design base map with a transparent background for the target subject image, and backing it up.
After the target subject image is segmented from the image to be processed, a design base map with a transparent background may be generated for it. A design base map is usually the drawing a designer needs to complete a scheme design, plan, or analysis drawing, also known as the first drawing of the drawing stage. In the embodiments of the present application, the target subject image can be added to a canvas with a transparent background, and this canvas carrying the target subject image on a transparent background becomes the design base map. The design base map has the same size as the finally generated target image, and the subsequent AI large model complements the background image on the basis of this design base map.
The design base map may be generated in various ways. In one way, the canvas size, and the position and size of the target subject image in the canvas, may be set by the system by default to generate the transparent-background design base map. Alternatively, the canvas size and the position, size, etc. of the target subject image in the canvas may be adjusted manually by the user. For example, in a specific implementation, a canvas interface may be provided in the client interface; after subject segmentation of the image to be processed is completed, this canvas interface is entered, a canvas is displayed at a default size, and the target subject image is displayed at a default position and size in the canvas. The user may then adjust the canvas size according to their needs or preferences, and may move, scale, etc. the target subject image to change its position and size in the canvas. After the user completes these operations, the design base map may be generated.
That is, the design base map includes the canvas, which determines the size of the finally generated image, the target subject image on the transparent background, and information such as the position and size of the target subject image in the canvas. For example, as shown in FIG. 4, assuming the original input image to be processed is "pic1", after subject segmentation the obtained target subject image may be as shown at "pic2"; the target subject image may then be added to a canvas and dragged or scaled to generate the design base map shown at "pic3". Note that, in order to show the canvas size and the position and size of the subject image in the canvas, the design base map at "pic3" shows a checkerboard-style background; this is actually the background of the captured client operation interface, and a person skilled in the art will understand that the design base map actually has a transparent background. A sketch of this assembly step follows.
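As an illustration, a design base map could be assembled roughly as follows with Pillow; the canvas size, position and scale defaults are illustrative assumptions, not values from the application.

    from PIL import Image

    def make_design_base(subject: Image.Image,
                         canvas_size: tuple = (1024, 1024),
                         position: tuple = (256, 256),
                         scale: float = 1.0) -> Image.Image:
        # A fully transparent canvas; its size fixes the final image size.
        canvas = Image.new("RGBA", canvas_size, (0, 0, 0, 0))
        if scale != 1.0:  # user zoom operation
            w, h = subject.size
            subject = subject.resize((round(w * scale), round(h * scale)))
        # Paste with the subject's own alpha as the mask so the rest of
        # the canvas stays transparent.
        canvas.paste(subject, position, mask=subject)
        return canvas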
After the design base map is generated, it may be backed up, that is, one copy is stored separately: one copy is used to generate the AI-generated image, and the other is used for further processing of the AI-generated image, as described in detail later.
S203: complementing the transparent background portion of the design base map with a background image by using an artificial intelligence AI large language model, to obtain an AI-generated image, wherein the area of the background image in the AI-generated image is larger than the area of the transparent background in the design base map, forming an occlusion of the edge portion of the target subject image.
After the design base map with the transparent background is generated, the AI large language model can be used to complement its transparent background portion with a background image to obtain the AI-generated image. In the embodiments of the present application, the area of the background image in the AI-generated image may be larger than the area of the transparent background in the design base map; that is, the AI large model may generate a background image "one circle larger", so that the background image forms an occlusion of the edge portion of the target subject image.
There may be various ways to achieve this. In one way, a mask map may first be generated according to the position and size of the target subject image in the design base map; the size of the mask map may also be consistent with the design base map, and the mask map includes a first area corresponding to the area where the target subject image is located in the design base map, the portion outside the first area being a second area. In this embodiment, the first area may be black and the second area white, so that the first area represents the area to be masked, that is, the area that does not need AI generation, or whose image content must be kept unchanged. For example, "pic4" in FIG. 4 is the mask map generated in the corresponding example (only the black first-area portion is shown; the white second-area portion is not). A minimal sketch of deriving such a mask map follows.
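A minimal sketch of deriving the mask map from the design base map's transparency channel, assuming an 8-bit RGBA array; the function name is an assumption.

    import numpy as np

    def make_mask_map(design_base_rgba: np.ndarray) -> np.ndarray:
        # design_base_rgba: H x W x 4 array of the design base map.
        alpha = design_base_rgba[..., 3]
        # Black (0) first area where the subject is present: content to keep.
        # White (255) second area: content for the AI large model to generate.
        return np.where(alpha > 0, 0, 255).astype(np.uint8)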
After the mask map is obtained, as described above, it could in theory be used directly as the mask of the design base map; that is, the design base map and the mask map are input into the AI large model, which generates the background image. When generating the background image, the AI large model does not touch the region where the subject image is located; it generates background only for the region outside the subject, and then complements the design base map with the generated background. That is, the image output by the AI large model is produced by stitching the generated background image together with the subject portion of the design base map. Although the AI large model may refer to some information from the subject portion when generating the background, this is mainly pixel-color information; the model may not know what the subject actually is, so at the edge of the background image (mainly the edge bordering the subject) it may generate white or some other color, or speculatively generate some content, and so on. In any case, that edge portion of the background image is then difficult to transition naturally into the subject content; the result still looks like an image made by pasting the subject onto the background, rather than one obtained by photographing the subject in a scene corresponding to the background.
Therefore, in the embodiments of the present application, the mask map may first be processed so that its first area (i.e., the area corresponding to the subject image) is reduced and, correspondingly, its second area is enlarged. Specifically, the first area may be shrunk inward while the shape of its edge is preserved. For example, as shown in FIG. 5, assume FIG. 5(A) is the original mask: the black region in the middle represents the area where the target subject image is located, i.e., the first area, and the surrounding white region is the second area. If this mask is used directly as the mask of the design base map, the AI large model generates a background image for the second area, for example as shown by the stripe-filled region in FIG. 5(B). In the embodiments of the present application, however, the first area of the mask map is first reduced so that the second area is enlarged, and the background image is generated afterwards. For example, as shown in FIG. 5(C), the middle black region is smaller in area than the black region in FIG. 5(A), while the edge shape of the black region may remain unchanged. When the processed mask map is then used as the mask of the design base map for background generation, the area of the background image generated by the AI large model is larger than that generated in FIG. 5(B), as shown by the stripe-filled area in FIG. 5(D).
That is, if the function of the mask map is viewed as "hollowing out" the portion of the design base map belonging to the subject image region, so that the AI large model generates a background image for the other regions while keeping that portion's content unchanged, then after the first area of the mask map is reduced, the "hollowed-out" portion of the design base map becomes smaller, and accordingly the AI large model generates a bit more background-image content.
There may be various implementations for reducing the first area and expanding the second area of the mask. For example, in one implementation, the mask may be processed with Gaussian blur (specifically large-kernel Gaussian blur, where a larger kernel size produces a stronger blur effect), so that the white portion of the mask diffuses and the black area shrinks, as sketched below.
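A sketch of this first processing, assuming the black-subject/white-background mask above; the kernel size and the re-binarization threshold are illustrative assumptions.

    import cv2
    import numpy as np

    def first_process_mask(mask: np.ndarray, ksize: int = 31,
                           threshold: int = 16) -> np.ndarray:
        # Large-kernel Gaussian blur lets the white second area bleed a
        # few pixels into the black first area...
        blurred = cv2.GaussianBlur(mask, (ksize, ksize), 0)
        # ...and re-binarizing at a low threshold turns every pixel the
        # white bled into white, shrinking the black subject region
        # inward while roughly keeping its edge shape.
        return np.where(blurred > threshold, 255, 0).astype(np.uint8)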
After the first-processed mask map is obtained, the design base map and the processed mask map are used as input information, and the AI large model is called to generate the background image. The AI large model uses the processed mask map as the mask of the design base map and generates a background image for the enlarged second area. In a specific implementation, Stable Diffusion or the like may be chosen as the AI large model in the embodiments of the present application; of course, before the model is used, it may also be fine-tuned for the scene in the embodiments of the present application, for example with fine-tuning training that includes collecting some high-quality commodity images, and the like.
However, if the AI large model generates completely freely, there may be no interaction or association between the background image and the subject image in terms of outline, light-and-shadow relations, depth of field and so on; even if the two are unified in impression, overall coordination may be deficient. Therefore, in a preferred mode, the generation process of the background image can be controlled through a control plug-in of the AI large model: on the basis of keeping a certain degree of freedom, the generated background image interacts better with the subject content in outline, light-and-shadow relations, depth of field and the like, improving overall picture quality. For example, for the Stable Diffusion model, control can be achieved with the ControlNet plug-in. Specifically, a control map for ControlNet control may be generated by preprocessing the design base map; for the example in FIG. 4, it may be as shown at "pic5". The control map may resemble a line sketch, that is, it may be generated by extracting the lines contained in the subject image, which may include lines depicting the subject's edges, shadows, and so on. By inputting such a control map into the ControlNet plug-in, the background generation of the Stable Diffusion model can be controlled so that the generated background image interacts with the subject content.
However, ControlNet is a network that exerts control on each layer of the Stable Diffusion backbone, and by default it controls the whole image, which creates a contradiction: if the control weight is too low, the edges of the subject image may diffuse (giving the subject a distorted visual effect); if the control weight is too high, the generated background image is too uniform, and it is difficult to find a weight that reaches a good balance. Against this background, the embodiments of the present application also modify the control logic of ControlNet: during control, the position of the subject image in the design base map is read, and for the portion outside the subject image the control weight is lowered, reducing control over the background image.
For ease of understanding, the principle of ControlNet is briefly described below. ControlNet can steer an image-generating AI large model such as Stable Diffusion through various conditions, such as edge detection, sketch processing or pose, and supports many input types, including sketches, edge images, semantic segmentation maps, human keypoint features, Hough-transform line detection, depth maps, human skeletons and so on, in order to generate images closer to the user's needs. Taking the Stable Diffusion model as an example: without ControlNet, the original neural network may be as shown in FIG. 6(A). ControlNet locks the original neural network of the Stable Diffusion model as a "locked copy", and clones the original network into a "trainable copy" on which the control conditions are applied. That is, ControlNet first replicates the weights of the Stable Diffusion model to obtain a trainable copy. By contrast, the original Stable Diffusion model was pre-trained on billions of pictures, so its parameters are "locked", while the trainable copy only needs to be trained on a small task-specific dataset to learn conditional control. As shown in FIG. 6(B), the locked model and the trainable copy are connected through 1x1 convolution layers known as "zero convolution" layers. The weights and biases of the zero convolution layers are initialized to 0, so training is very fast, close to the speed of fine-tuning a Stable Diffusion model, and can even be done on personal devices. Specifically, the control condition passes through a zero convolution and is added to the original input; the sum enters the ControlNet's cloned network block (the trainable copy), and that network's output, after another zero convolution, is added to the output of the original network to obtain the final output.
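The wiring can be sketched in PyTorch as follows. This is a minimal sketch of the zero-convolution idea only; a real ControlNet block also receives timestep and text conditioning and operates on U-Net feature maps, all omitted here, and the class and function names are assumptions.

    import copy
    import torch
    import torch.nn as nn

    def zero_conv(channels: int) -> nn.Conv2d:
        # A "zero convolution": 1x1 conv whose weight and bias start at
        # zero, so the trainable copy initially contributes nothing.
        conv = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(conv.weight)
        nn.init.zeros_(conv.bias)
        return conv

    class ControlledBlock(nn.Module):
        """One backbone block F with its control branch:
        y = F(x) + zero_conv(F_copy(x + zero_conv(cond)))."""

        def __init__(self, block: nn.Module, channels: int):
            super().__init__()
            self.trainable_copy = copy.deepcopy(block)  # learns the control
            self.locked = block.requires_grad_(False)   # frozen pre-trained weights
            self.zero_in = zero_conv(channels)
            self.zero_out = zero_conv(channels)

        def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
            controlled = self.trainable_copy(x + self.zero_in(cond))
            return self.locked(x) + self.zero_out(controlled)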
In the prior art, the control weight of ControlNet is the same across the whole image; however, as described above, in the scenario of the embodiment of the present application, setting the control weight either too high or too low causes a corresponding problem, and a suitable single weight is hard to find. Therefore, in the embodiment of the present application, a region-divided control scheme is adopted. Specifically, ControlNet may control in units of pixels; when the pixel currently to be controlled is read, it can first be determined whether the pixel lies within the area of the subject image: if so, control is performed with a first control weight; otherwise, if the current pixel lies outside the area of the subject image, control may be performed with a second control weight, where the first control weight is higher than the second control weight. That is, the control exerted by ControlNet on the background image generated by the AI large model is reduced, preserving a high degree of freedom in generation and preventing the background image from becoming too monotonous. Meanwhile, pixels within the area of the subject image can be controlled with the higher weight, reducing the probability that edge diffusion deforms the subject image. It should be noted that the network structure of the Stable Diffusion model may contain multiple layers, each corresponding to a different image sampling resolution, for example 8×8, 16×16, 32×32, 64×64, and so on; correspondingly, ControlNet may also contain multiple layers, each controlling the corresponding layer of the Stable Diffusion model. In this case, the region-divided weight scheme described above may be applied when controlling each layer.
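A minimal sketch of the region-divided weighting, assuming the subject mask is available as a tensor and that the ControlNet residual for each backbone layer can be intercepted before it is added back (the weight values 1.0 and 0.3 are placeholders; the description fixes no particular values):

```python
import torch
import torch.nn.functional as F

def region_weight_map(subject_mask, first_weight=1.0, second_weight=0.3):
    """subject_mask: (1, 1, H, W) tensor, 1 inside the (enlarged) subject
    area, 0 elsewhere. Higher weight on the subject, lower elsewhere."""
    return subject_mask * first_weight + (1.0 - subject_mask) * second_weight

def add_weighted_control(backbone_out, control_residual, weight_map):
    """Scale the ControlNet residual per pixel before adding it to the
    Stable Diffusion backbone output at one layer. The same call works
    for every layer resolution (64x64, 32x32, 16x16, 8x8, ...)."""
    w = F.interpolate(weight_map, size=control_residual.shape[-2:], mode="bilinear")
    return backbone_out + control_residual * w
```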
In addition, when ControlNet control is performed by regions as described above, the read position of the subject image may be slightly larger than the actual position of the subject image; otherwise edge diffusion may not be completely avoided. That is, ControlNet is mainly used to create interaction between the background image and the subject image, but the interaction area does not follow the subject edge exactly; it is slightly larger than the subject area, so that the generated image looks as if the subject were genuinely situated in the scene of the background image rather than pasted onto it.
Specifically, to achieve the above purpose, second processing may be performed on the mask map corresponding to the design base map, so that the first area is enlarged and the second area is reduced. The mask map obtained after the second processing may then be provided to the control plug-in, so that during control the plug-in applies a smaller control weight to the reduced second area than to the enlarged first area.
In specific implementation, the second processing on the mask map can be performed in multiple ways. For example, in one way, the mask map corresponding to the design base map is first colour-inverted and then Gaussian-blurred, so that the first area in the mask map is enlarged and the second area is reduced. That is, inverting the mask turns the first (subject) region white, and the blur spreads it outward; subject and non-subject positions can then be distinguished by the pixel values in the resulting mask map.
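A sketch of this second processing with OpenCV; the kernel size is a placeholder, and the convention assumed here is that the first (subject) region is black in the original mask map:

```python
import cv2

mask = cv2.imread("mask_pic4.png", cv2.IMREAD_GRAYSCALE)

# Inverse colour: the subject (first) region becomes white.
inverted = 255 - mask

# Gaussian blur bleeds the white region outward, enlarging the first
# area; the non-subject (second) area shrinks correspondingly.
blurred = cv2.GaussianBlur(inverted, (31, 31), 0)

# Every pixel the spread has touched is treated as "subject" when the
# control plug-in reads the mask; the rest remains "non-subject".
enlarged = ((blurred > 0) * 255).astype("uint8")
cv2.imwrite("mask_for_control.png", enlarged)
```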
After the AI large model generates the background image for the second area, the background image can be merged into the design base map to obtain the AI generated image. In the AI generated image, the image content covered by the reduced first area of the mask map remains unchanged, while the other areas hold the AI-generated background. For example, as shown in fig. 7, assume that fig. 7 (A) is the design base map, with the subject image in the middle and a transparent background elsewhere; fig. 7 (B) shows the state after the background portion of the design base map has been completed using the mask map of fig. 5 (C) as the mask. It can be seen that, because the first area of the mask map was reduced, the background image generated by the AI large model occludes the edge portion of the subject image; that is, the subject image is incomplete in this state.
The example shown in fig. 7 is given only to illustrate that the edge of the subject image is occluded by the background image and cannot be fully displayed; in practical applications, the background image generated by the AI large model is of far higher quality than that shown in fig. 7, with no obvious black edges or the like.
In addition, when the AI large model is invoked to generate an image, information such as the required background style and/or scene description can be supplied in addition to the design base map and the processed mask map, so that the AI large model generates a background image matching that style or scene. Such background style and/or scene description information may be selected and submitted by the user; for example, as shown in fig. 4, a "blue sea" style may be selected, and the selection interface may provide a legend for each style for the user's reference. Alternatively, a default style or scene may be used without user selection. Such style or scene description information can serve as the prompt words for the AI large model, enabling it to generate a background image of the corresponding style or scene.
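A toy illustration of turning a selected style into prompt words; the style names and wording below are invented for the example and are not specified by this description:

```python
# Hypothetical style table mapping a user-selected style to positive and
# negative prompt words for the AI large model.
STYLE_PROMPTS = {
    "blue sea": ("product photo on a sunny beach, blue sea, soft daylight, "
                 "photorealistic, high detail",
                 "blurry, distorted, extra objects, watermark, text"),
    "studio":   ("product photo, clean studio backdrop, soft shadows",
                 "blurry, cluttered background, watermark"),
}

def build_prompts(style_name, default="studio"):
    """Return (positive, negative) prompt words, falling back to a
    default style when the user makes no selection."""
    return STYLE_PROMPTS.get(style_name, STYLE_PROMPTS[default])
```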
Where multiple optional styles or scenes are provided, the AI large model can be trained separately on each style or scene, with different model parameters used for different styles or scenes, so that the AI large model can produce high-quality background images across a variety of styles or scenes.
S204: cover the backed-up design base map onto the AI generated image and perform image fusion processing to generate a target image, where the target image contains the complete target subject image, and the target subject image occludes the edge at the junction between the background image and the target subject image.
Because the first area of the mask map was reduced while the AI large model generated the background image, the edge area of the subject image in the AI generated image is occluded by the background image. Therefore, after the AI generated image is obtained, the previously backed-up original design base map can be covered onto the AI generated image again, and specific image fusion processing can be applied so that the two blend better. In this way, the edge of the AI-generated background image (the edge at its junction with the subject) is occluded by the subject image from the original design base map; even if the AI-generated background contains some solid-colour or otherwise guessed content at that edge, it will not be displayed, and the subject image blends into the background image more naturally. For example, as shown in fig. 7 (C), in the finally generated target image the subject content is still complete, and the edge portion of the AI-generated background that borders the subject content is occluded. In addition, since the segmented subject image may, in the preferred embodiments, have undergone "white edge" removal, "jagged edge" removal, and similar processing, the final image looks more like a subject photographed in a real scene rather than a cut-out pasted in with a drawing tool.
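The covering step itself can be as simple as alpha compositing; the sketch below assumes the two images share the same size, and omits the additional fusion processing mentioned above:

```python
from PIL import Image

ai_generated = Image.open("pic6.png").convert("RGBA")  # subject edges occluded
design_base = Image.open("pic3.png").convert("RGBA")   # backed-up base map

# Wherever the design base map is opaque (the subject), it covers the AI
# result and hides the guessed pixels at the subject/background junction.
target = Image.alpha_composite(ai_generated, design_base)
target.convert("RGB").save("pic7.png")
```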
After the target image is obtained, it can be scored; if the scoring result meets the required condition, the image can be saved to the storage system on the server side for the user to download. For example, assuming the image to be processed is related to a commodity, then after a background image has been added by the method provided in the embodiment of the present application, the image can be published to a commodity information service system, used as the commodity's main picture, and so on.
For a better understanding of the specific implementation of the embodiments of the present application, a complete implementation procedure is described below in conjunction with the example shown in fig. 4 (of course, in practical applications, some steps may be omitted); a code sketch of the whole pipeline follows the numbered steps.
41: before image generation, a network structure for the AI large model is first selected; in addition, the model can be trained with a number of high-quality, high-definition images, and the specific generation parameters can be determined;
42: when an image is to be generated, the original image pic1 to be processed is first determined;
43: subject segmentation is performed on the original image, and the subject image pic2 is segmented from it;
44: a design base map pic3 with a transparent background is generated from the subject segmentation result;
45: a mask map pic4 corresponding to the design base map is generated;
46: optionally, a control image pic5 for ControlNet control is generated from the design base map;
47: optionally, the desired scene is selected by the user;
48: prompt word text (Prompt) for the dialogue with the AI large model is generated from a representative picture corresponding to the specific scene, and may include both positive and negative prompt words;
49: the design base map pic3, the mask map pic4, and the ControlNet control image pic5 are taken as inputs, and the AI large model performs background generation to obtain the AI generated image pic6. For the input to the AI large model, the subject area of the original mask map can be reduced and the corresponding non-subject area enlarged; for the input to ControlNet, the subject area of the original mask map can be enlarged and the corresponding non-subject area reduced, so that different control weights can be set per sub-region, preventing the generated background image from becoming monotonous while avoiding edge diffusion of the subject image;
410: the AI generated image pic6 is fused with the backed-up design base map pic3 to obtain the final generated image pic7, which can be stored on the server;
411: the user obtains the download image pic8 from the download address provided by the server.
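The steps above can be tied together roughly as follows. This sketch leans on the Hugging Face diffusers library's ControlNet inpainting pipeline as one plausible realization; the model identifiers are assumptions, and the per-region control-weight modification of step 49 would additionally require patching the ControlNet forward pass, which stock diffusers does not expose:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

design_base = Image.open("pic3.png").convert("RGB")   # step 44
mask = Image.open("mask_shrunk.png")                  # step 45, subject shrunk
control_map = Image.open("control_map_pic5.png")      # step 46

positive = "product photo on a sunny beach, blue sea, photorealistic"  # step 48
negative = "blurry, distorted, extra objects, watermark, text"

# Step 49: background generation -> AI generated image pic6.
pic6 = pipe(prompt=positive, negative_prompt=negative,
            image=design_base, mask_image=mask,
            control_image=control_map).images[0]

# Step 410: cover the backed-up design base map over the AI result.
pic3_rgba = Image.open("pic3.png").convert("RGBA")
pic7 = Image.alpha_composite(pic6.convert("RGBA"), pic3_rgba)
pic7.convert("RGB").save("pic7.png")
```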
In summary, according to the embodiment of the present application, a subject image can be segmented from an image to be processed specified by the user, and, while the subject image is kept unchanged, a background image can be generated for it using an AI large model. During background generation, in order to fuse the edge of the subject image with the background more naturally, the AI large model can generate a background image that is "one circle" larger than strictly needed; after this background is merged into the design base map to obtain the AI generated image, the backed-up design base map is covered onto the AI generated image via image fusion, so that the edge at the junction between the AI-generated background and the subject is occluded by the subject image. Even if the edge of the background image contains solid-colour or speculative content produced by the AI large model, it is not displayed; the subject image and background image therefore fuse more naturally, yielding a generated image of higher quality.
In a preferred implementation, after the subject image is segmented from the image to be processed, processing such as "white edge" and "jagged edge" removal can be performed to further improve the final image quality. When removing the white edge, the transparency-channel pixel values of the pixels in the white-edge area are raised to the N-th power, so that the white edge can be removed in an automated way.
In addition, in order for the background image generated by the AI large model to interact better with the subject image in terms of contour, light-and-shadow relationship, and so on, a plug-in program of the AI large model can control the background generation process; and, to avoid edge diffusion of the subject image or a monotonous background caused by setting the control weight too low or too high, the control logic of the plug-in is improved. That is, instead of applying the same control weight over the whole image, the weights are differentiated: areas outside the subject image can be controlled with a lower weight, so that control is concentrated on the area where the subject image is located, preventing the AI large model from producing an over-uniform background while reducing the probability of subject edge diffusion. Meanwhile, to further avoid subject edge diffusion, the subject area read by the control plug-in can be slightly larger than the actual subject image, which better resolves the edge-diffusion problem.
Embodiment Two
Embodiment One mentions that, while the AI large model generates the background image, a plug-in program can control the generation process so that the generated background interacts better with the subject image in terms of contour, light-and-shadow relationship, depth of field, and so on; during control, regions outside the subject area can use a lower control weight to keep the generated background from becoming monotonous, and the subject area as read can be slightly larger than the actual subject image to avoid edge diffusion. This method can stand on its own, and is therefore protected separately in this embodiment. Specifically, Embodiment Two provides an image generation processing method which, referring to fig. 8, may include:
S801: determining an image to be processed, and segmenting a target subject image from the image to be processed;
S802: generating a design base map with a transparent background for the target subject image;
S803: in the process in which the AI large language module completes a background image for the transparent background area in the design base map, controlling the generation process of the background image, reading the area of the target subject image during control, and controlling the area outside the target subject image with a control weight lower than that used for the area where the target subject image is located; where the read area of the target subject image is larger than the actual area of the target subject image.
In specific implementation, the area of the target subject image can be read from the mask map corresponding to the design base map;
where the mask map is generated as follows: an original mask map is generated according to the position and size of the target subject image in the design base map, the original mask map including a first area corresponding to the area of the target subject image in the design base map, with the part outside the first area being a second area;
second processing is then performed on the original mask map to enlarge the first area and reduce the second area, so that the area corresponding to the target subject image is read from the mask map obtained after the second processing.
Embodiment Three
Embodiment One also mentions the "white edge" problem that may exist after subject image segmentation, for which the embodiment of the present application provides a corresponding solution; this solution can also be applied to scenarios beyond the embodiments of the present application, and is therefore protected separately as Embodiment Three. Specifically, Embodiment Three further provides an image segmentation processing method which, referring to fig. 9, may include:
S901: determining a target subject image segmented from an original image;
S902: if an additional white edge exists around the segmented target subject image, selecting, starting from the outermost side of the white edge and moving inward, the pixels within a target width range as the pixels to be processed;
S903: raising the transparency-channel pixel values of the pixels to be processed to the N-th power to remove or weaken the white edge, where N is a natural number greater than 1.
That is, in whatever application scenario and with whatever algorithm the subject image was segmented, as long as a ring of "white edge" surrounds the segmented subject image, the method provided in the embodiment of the present application can be used to remove or weaken it.
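A minimal NumPy sketch of the alpha-power operation, assuming an RGBA input; for brevity the band of pixels to process is approximated by the semi-transparent fringe rather than measured exactly a target width in from the outermost edge, and N = 3 is an arbitrary example value:

```python
import numpy as np
from PIL import Image

def weaken_white_edge(path, n=3):
    """Raise the normalized alpha of the fringe pixels to the N-th power
    (N > 1). Since 0 < alpha < 1 there, alpha**n < alpha: the fringe
    becomes more transparent and the white edge fades, while fully
    opaque subject pixels (alpha == 1) are left unchanged."""
    img = np.asarray(Image.open(path).convert("RGBA")).astype(np.float32)
    alpha = img[:, :, 3] / 255.0
    fringe = (alpha > 0) & (alpha < 1.0)   # approximate white-edge band
    alpha[fringe] = alpha[fringe] ** n
    img[:, :, 3] = alpha * 255.0
    return Image.fromarray(img.astype(np.uint8))

weaken_white_edge("segmented_subject.png").save("subject_clean.png")
```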
For details of Embodiments Two and Three, reference may be made to the description of Embodiment One and other parts of this specification; they are not repeated here.
It should be noted that the embodiments of the present application may involve the use of user data. In practical applications, user-specific personal data may be used in the schemes described herein within the scope permitted by the applicable laws and regulations of the relevant country (for example, with the user's explicit consent, after the user has actually been notified, and so on).
Corresponding to the first embodiment, the embodiment of the present application further provides an image generation processing apparatus, which may include:
a subject segmentation unit, configured to determine an image to be processed and segment a target subject image from the image to be processed;
a design base map generating unit, configured to generate a design base map with a transparent background for the target subject image and perform backup processing;
an image generation unit, configured to use an artificial intelligence AI large language module to complete a background image for the transparent background portion in the design base map, obtaining an AI generated image in which the area of the background image is larger than that of the transparent background in the design base map, occluding the edge portion of the target subject image;
and a fusion processing unit, configured to cover the backed-up design base map onto the AI generated image and perform image fusion processing, so as to generate a target image that contains the complete target subject image, the target subject image occluding the edge at the junction between the background image and the target subject image.
Specifically, the image generation unit may be configured to:
generate a mask map according to the position and size of the target subject image in the design base map, the mask map including a first area corresponding to the area of the target subject image in the design base map, with the part outside the first area being a second area;
perform first processing on the mask map so that the first area is reduced and the second area is enlarged;
and invoke the artificial intelligence AI large language module to generate a background image with the design base map and the first-processed mask map as input information, so that, with the image content of the reduced first area in the design base map kept unchanged, the generated background image is completed into the enlarged second area, obtaining the AI generated image.
Specifically, the first area in the mask map can be reduced and the second area enlarged by performing Gaussian blur processing on the mask map.
The image to be processed may be an image related to a commodity;
the subject segmentation unit may specifically be configured to:
identify the category to which the commodity included in the image to be processed belongs;
and perform target detection in the image to be processed according to the identified commodity category, and segment the target subject image from the image to be processed according to the detected target area.
If an additional white edge exists around the segmented target subject image, the apparatus further includes:
a white edge processing unit, configured to select, starting from the outermost side of the white edge and moving inward, the pixels within a target width range as the pixels to be processed, and to raise the transparency-channel pixel values of the pixels to be processed to the N-th power to remove or weaken the white edge, where N is a natural number greater than 1.
In addition, if the segmented target subject image has a jagged edge, the apparatus may further include:
a jagged-edge processing unit, configured to perform Gaussian blur processing on the area where the jagged edge is located, so as to remove or weaken the jagged edge.
Specifically, the design base map generating unit may be configured to:
provide a canvas interface and display the segmented target subject image in the canvas interface;
and determine the position and size of the target subject image in the canvas according to the move and/or zoom operations performed on the target subject image by the user through the canvas interface, generating the design base map with the transparent background.
The AI large language module may also be associated with a control plug-in, the control plug-in being configured to control the background image generated by the AI large language module so as to strengthen the association between the generated background image and the target subject image;
the control plug-in reads the area of the target subject image during control, and controls the area outside the target subject image with a control weight lower than that used for the area where the target subject image is located;
and the area of the target subject image read by the control plug-in is larger than the actual area of the target subject image.
In addition, the apparatus may further include:
a style/scene determining unit, configured to determine target background style and/or scene description information, so that the target background style and/or scene description information is added to the input information when the AI large language module is invoked, and the AI large language module generates a background image consistent with the target background style and/or scene description information.
Corresponding to Embodiment Two, the embodiment of the present application further provides an image generation processing apparatus, which may include:
a subject segmentation unit, configured to determine an image to be processed and segment a target subject image from the image to be processed;
a design base map generating unit, configured to generate a design base map with a transparent background for the target subject image;
and a generation control unit, configured to control the generation process of the background image while the AI large language module completes a background image for the transparent background area in the design base map, read the area of the target subject image during control, and control the area outside the target subject image with a control weight lower than that used for the area where the target subject image is located; where the read area of the target subject image is larger than the actual area of the target subject image.
Specifically, the generation control unit may read the area of the target subject image from the mask map corresponding to the design base map;
where the mask map is generated as follows: an original mask map is generated according to the position and size of the target subject image in the design base map, the original mask map including a first area corresponding to the area of the target subject image in the design base map, with the part outside the first area being a second area;
second processing is then performed on the original mask map to enlarge the first area and reduce the second area, so that the area corresponding to the target subject image is read from the mask map obtained after the second processing.
Corresponding to Embodiment Three, the embodiment of the present application further provides an image segmentation processing apparatus, which may include:
a subject segmentation result determining unit, configured to determine a target subject image segmented from an original image;
a to-be-processed pixel determining unit, configured to, if an additional white edge exists around the segmented target subject image, select, starting from the outermost side of the white edge and moving inward, the pixels within a target width range as the pixels to be processed;
and a computing unit, configured to raise the transparency-channel pixel values of the pixels to be processed to the N-th power to remove or weaken the white edge, where N is a natural number greater than 1.
In addition, the embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the method of any one of the foregoing method embodiments.
And an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions which, when read and executed by the one or more processors, perform the steps of the method of any one of the foregoing method embodiments.
Fig. 10 illustrates an architecture of an electronic device, which may include a processor 1010, a video display adapter 1011, a disk drive 1012, an input/output interface 1013, a network interface 1014, and a memory 1020, among others. The processor 1010, the video display adapter 1011, the disk drive 1012, the input/output interface 1013, the network interface 1014, and the memory 1020 may be communicatively connected by a communication bus 1030.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is configured to execute the relevant programs so as to implement the technical solutions provided herein.
The memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system 1021 for controlling the operation of the electronic device 1000 and a basic input output system (BIOS) for controlling its low-level operation. In addition, a web browser 1023, a data storage management system 1024, an image generation processing system 1025, and the like may also be stored. The image generation processing system 1025 may be an application program that implements the operations of the foregoing steps in the embodiments of the present application. In general, when the solution is implemented in software or firmware, the relevant program code is stored in the memory 1020 and executed by the processor 1010.
The input/output interface 1013 is used to connect with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 1014 is used to connect communication modules (not shown) to enable communication interactions of the device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1030 includes a path to transfer information between components of the device (e.g., processor 1010, video display adapter 1011, disk drive 1012, input/output interface 1013, network interface 1014, and memory 1020).
It should be noted that, although the above description shows only the processor 1010, the video display adapter 1011, the disk drive 1012, the input/output interface 1013, the network interface 1014, the memory 1020, and the bus 1030, in a specific implementation the device may include other components necessary for proper operation. Furthermore, those skilled in the art will understand that the device may also include only the components necessary to implement the solution of the present application, and not all the components shown in the drawings.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in certain parts thereof.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to across them, and each embodiment focuses on its differences from the others. In particular, the system and apparatus embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments. The systems and apparatus described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
The image generation processing method and the electronic device provided by the present application have been described above in detail. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the method of the present application and its core idea. At the same time, for those of ordinary skill in the art, there will be changes in the specific implementations and the scope of application based on the idea of the present application. In view of the foregoing, the contents of this specification should not be construed as limiting the present application.

Claims (14)

1. An image generation processing method, comprising:
determining an image to be processed, and segmenting a target subject image from the image to be processed;
generating a design base map with a transparent background for the target subject image, and performing backup processing;
completing, by using an artificial intelligence AI large language module, a background image for the transparent background portion in the design base map to obtain an AI generated image, wherein the area of the background image in the AI generated image is larger than that of the transparent background in the design base map, occluding the edge portion of the target subject image;
and covering the backed-up design base map onto the AI generated image and performing image fusion processing to generate a target image, wherein the target image contains the complete target subject image, and the target subject image occludes the edge at the junction between the background image and the target subject image.
2. The method according to claim 1, wherein completing the background image for the transparent background portion in the design base map by using the AI large language module comprises:
generating a mask map according to the position and size of the target subject image in the design base map, wherein the mask map comprises a first area corresponding to the area of the target subject image in the design base map, and the part outside the first area is a second area;
performing first processing on the mask map so that the first area is reduced and the second area is enlarged;
and invoking the AI large language module to generate a background image with the design base map and the first-processed mask map as input information, so that, with the image content of the reduced first area in the design base map kept unchanged, the generated background image is completed into the enlarged second area to obtain the AI generated image.
3. The method according to claim 2, wherein performing the first processing on the mask map comprises:
performing Gaussian blur processing on the mask map so that the first area in the mask map is reduced and the second area is enlarged.
4. The method according to claim 1, wherein the image to be processed is an image related to a commodity, and segmenting the target subject image from the image to be processed comprises:
identifying the category to which the commodity included in the image to be processed belongs;
and performing target detection in the image to be processed according to the identified commodity category, and segmenting the target subject image from the image to be processed according to the detected target area.
5. The method according to claim 1, wherein, if an additional white edge exists around the segmented target subject image, the method further comprises:
selecting, starting from the outermost side of the white edge and moving inward, the pixels within a target width range as the pixels to be processed;
and raising the transparency-channel pixel values of the pixels to be processed to the N-th power to remove or weaken the white edge, wherein N is a natural number greater than 1.
6. The method according to claim 1, wherein, if the segmented target subject image has a jagged edge, the method further comprises:
performing Gaussian blur processing on the area where the jagged edge is located, so as to remove or weaken the jagged edge.
7. The method according to claim 1, wherein generating the design base map with the transparent background for the target subject image comprises:
providing a canvas interface, and displaying the segmented target subject image in the canvas interface;
and determining the position and size of the target subject image in the canvas according to the move and/or zoom operations performed on the target subject image by the user through the canvas interface, and generating the design base map with the transparent background.
8. The method according to claim 1, wherein the AI large language module is further associated with a control plug-in, and the control plug-in is configured to control the background image generated by the AI large language module so as to strengthen the association between the generated background image and the target subject image;
the control plug-in reads the area of the target subject image during control, and controls the area outside the target subject image with a control weight lower than that used for the area where the target subject image is located;
and the area of the target subject image read by the control plug-in is larger than the actual area of the target subject image.
9. The method according to any one of claims 1 to 8, further comprising:
determining target background style and/or scene description information, so that the target background style and/or scene description information is added to the input information when the AI large language module is invoked, and the AI large language module generates a background image consistent with the target background style and/or scene description information.
10. An image generation processing method, comprising:
determining an image to be processed, and segmenting a target subject image from the image to be processed;
generating a design base map with a transparent background for the target subject image;
and controlling the generation process of the background image in the process in which the AI large language module completes a background image for the transparent background area in the design base map, reading the area of the target subject image during control, and controlling the area outside the target subject image with a control weight lower than that used for the area where the target subject image is located; wherein the read area of the target subject image is larger than the actual area of the target subject image.
11. The method according to claim 10, wherein reading the area of the target subject image comprises:
reading the area of the target subject image from the mask map corresponding to the design base map;
wherein the mask map is generated as follows: an original mask map is generated according to the position and size of the target subject image in the design base map, the original mask map comprising a first area corresponding to the area of the target subject image in the design base map, with the part outside the first area being a second area;
and second processing is performed on the original mask map to enlarge the first area and reduce the second area, so that the area corresponding to the target subject image is read from the mask map obtained after the second processing.
12. An image segmentation processing method, comprising:
determining a target subject image segmented from an original image;
if an additional white edge exists around the segmented target subject image, selecting, starting from the outermost side of the white edge and moving inward, the pixels within a target width range as the pixels to be processed;
and raising the transparency-channel pixel values of the pixels to be processed to the N-th power to remove or weaken the white edge, wherein N is a natural number greater than 1.
13. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method of any one of claims 1 to 12.
14. An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions which, when read and executed by the one or more processors, perform the steps of the method of any one of claims 1 to 12.
CN202311272120.2A 2023-09-27 2023-09-27 Image generation processing method and electronic equipment Pending CN117495894A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202311272120.2A CN117495894A (en) 2023-09-27 2023-09-27 Image generation processing method and electronic equipment
PCT/CN2024/106268 WO2025066457A1 (en) 2023-09-27 2024-07-18 Image generation processing method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311272120.2A CN117495894A (en) 2023-09-27 2023-09-27 Image generation processing method and electronic equipment

Publications (1)

Publication Number Publication Date
CN117495894A true CN117495894A (en) 2024-02-02

Family

ID=89667929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311272120.2A Pending CN117495894A (en) 2023-09-27 2023-09-27 Image generation processing method and electronic equipment

Country Status (2)

Country Link
CN (1) CN117495894A (en)
WO (1) WO2025066457A1 (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082366B (en) * 2021-03-12 2024-07-19 中国移动通信集团广东有限公司 Image synthesis method and system
US11915429B2 (en) * 2021-08-31 2024-02-27 Gracenote, Inc. Methods and systems for automatically generating backdrop imagery for a graphical user interface
CN116740120A (en) * 2023-06-09 2023-09-12 深圳市超像素智能科技有限公司 Background replacement method, device, electronic equipment and readable storage medium
CN116797507A (en) * 2023-06-30 2023-09-22 成都恒图科技有限责任公司 Intelligent fusion method for image main body and new background
CN117495894A (en) * 2023-09-27 2024-02-02 杭州阿里巴巴海外互联网产业有限公司 Image generation processing method and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025066457A1 (en) * 2023-09-27 2025-04-03 杭州阿里巴巴海外互联网产业有限公司 Image generation processing method and electronic device
CN118823039A (en) * 2024-09-14 2024-10-22 中科星图数字地球合肥有限公司 A method, device and medium for eliminating specific targets in panoramic images
CN118823039B (en) * 2024-09-14 2024-12-10 中科星图数字地球合肥有限公司 Method, equipment and medium for eliminating specific target in panoramic image

Also Published As

Publication number Publication date
WO2025066457A1 (en) 2025-04-03

Similar Documents

Publication Publication Date Title
CN109670558B (en) Digital image completion using deep learning
US8824821B2 (en) Method and apparatus for performing user inspired visual effects rendering on an image
JP7059318B2 (en) Learning data generation method and system for classifier learning with regional characteristics
CN114372931B (en) A method, device, storage medium and electronic device for blurring a target object
CN117495894A (en) Image generation processing method and electronic equipment
CN112634282B (en) Image processing method and device and electronic equipment
US12210800B2 (en) Modifying digital images using combinations of direct interactions with the digital images and context-informing speech input
JP7592160B2 (en) Method and device for training an image processing model, image processing method and device, electronic device, and computer program
Beyeler OpenCV with Python blueprints
US20240127510A1 (en) Stylized glyphs using generative ai
JP7656721B2 (en) Subdividing and removing objects from media items
AU2023270207A1 (en) Modifying digital images via scene-based editing using image understanding facilitated by artificial intelligence
AU2023270205A1 (en) Dilating object masks to reduce artifacts during inpainting
US20240361891A1 (en) Implementing graphical user interfaces for viewing and interacting with semantic histories for editing digital images
AU2023270204A1 (en) Detecting and modifying object attributes
US20240404138A1 (en) Automatic removal of lighting effects from an image
KR20220012785A (en) Apparatus and method for developing object analysis model based on data augmentation
Mir et al. Invisibility Cloak using Color Extraction and Image Segmentation with OpenCV
CN112927321B (en) Intelligent image design method, device, equipment and storage medium based on neural network
US20250104196A1 (en) Generating non-destructive synthetic lens blur with in-focus edge rendering
US20250104197A1 (en) Interactively refining a digital image depth map for non destructive synthetic lens blur
US20250104198A1 (en) Interactively adjusting light source brightness in digital images with non-destructive synthetic lens blur
US20240362758A1 (en) Generating and implementing semantic histories for editing digital images
US20250005763A1 (en) Modifying digital images via adaptive rendering order of image objects
US20250209779A1 (en) Toroidal segmentation of repeating patterns in images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40106580

Country of ref document: HK