CN111784726B - Portrait matting method and device - Google Patents
- Publication number
- CN111784726B (application CN201910912853.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- original image
- sample
- trimap
- mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T 7/194—Segmentation; Edge detection involving foreground-background segmentation (Image analysis)
- G06N 3/045—Combinations of networks (Neural networks; Architecture)
- G06N 3/08—Learning methods (Neural networks)
- G06T 5/30—Erosion or dilatation, e.g. thinning (Image enhancement or restoration using local operators)
- G06T 2207/20081—Training; Learning (Indexing scheme for image analysis or image enhancement)
- G06T 2207/20084—Artificial neural networks [ANN] (Indexing scheme for image analysis or image enhancement)
- G06T 2207/30196—Human being; Person (Subject of image; Context of image processing)
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of the present application disclose a portrait matting method and device. One embodiment of the method comprises the following steps: acquiring an original image in which a portrait is presented; inputting the original image into a pre-trained trimap image generation model to obtain a trimap of the original image; determining a mask of the original image based on the original image, the trimap of the original image, and a pre-trained matting model; and cropping the portrait image out of the original image using the mask of the original image. This implementation keeps the trimap generation model and the matting model decoupled, which makes it convenient to modify or enhance either model.
Description
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a portrait matting method and device.
Background
With the rapid development of smartphones and internet technology, taking photos with a mobile phone and editing and sharing pictures have become an important part of people's lives. Portrait photos, as one form of photo, play an important role in daily entertainment, social networking and sharing. A portrait photo generally refers to a photo consisting of a portrait plus a background. For such photos, a fairly common need is to separate the portrait from the background, which is commonly referred to as portrait segmentation or portrait matting.
Disclosure of Invention
The embodiment of the application provides a portrait matting method and device.
In a first aspect, an embodiment of the present application provides a portrait matting method, including: acquiring an original image in which a portrait is presented; inputting the original image into a pre-trained trimap image generation model to obtain a trimap of the original image; determining a mask of the original image based on the original image, the trimap of the original image, and a pre-trained matting model; and cropping the portrait image out of the original image using the mask of the original image.
In some embodiments, determining the mask of the original image based on the original image, the trimap of the original image, and the pre-trained matting model includes: correcting the trimap of the original image; and inputting the original image and the corrected trimap into the pre-trained matting model to obtain the mask of the original image.
In some embodiments, correcting the trimap of the original image includes: determining at least one connected region of the foreground region in the trimap of the original image; determining, among the at least one connected region of the foreground region, a connected region whose area is smaller than a preset first area threshold as a first target connected region; and changing the first target connected region from foreground to unknown to obtain a corrected trimap.
In some embodiments, correcting the trimap of the original image includes: determining at least one connected region of the background region in the trimap of the original image; determining, among the at least one connected region of the background region, a connected region whose area is smaller than a preset second area threshold as a second target connected region; and changing the second target connected region from background to unknown to obtain a corrected trimap.
In some embodiments, the trimap generation model is trained by: acquiring a first training sample set, wherein a first training sample comprises a first sample image and a first sample trimap, and the first sample trimap is generated by performing morphological transformation on a mask corresponding to the first sample image; and using the first sample image and the first sample trimap in a first training sample of the first training sample set as the input and the expected output of a first initial model, respectively, and training the first initial model with a machine learning method to obtain the trimap generation model.
In some embodiments, the matting model is trained by: acquiring a second training sample set, wherein a second training sample comprises a second sample image, a second sample trimap and a sample mask, and a portrait is presented in the second sample image; and using the second sample image and the second sample trimap in a second training sample of the second training sample set as the input of a second initial model, using the sample mask corresponding to the input second sample image and second sample trimap as the expected output of the second initial model, and training the second initial model with a machine learning method to obtain the matting model.
In a second aspect, an embodiment of the present application provides a portrait matting apparatus, including: an acquisition unit configured to acquire an original image in which a portrait is presented; an input unit configured to input the original image into a pre-trained trimap image generation model to obtain a trimap of the original image; a determining unit configured to determine a mask of the original image based on the original image, the trimap of the original image, and a pre-trained matting model; and a cropping unit configured to crop the portrait image out of the original image using the mask of the original image.
In some embodiments, the determining unit is further configured to determine the mask of the original image based on the original image, the trimap of the original image, and the pre-trained matting model by: correcting the trimap of the original image; and inputting the original image and the corrected trimap into the pre-trained matting model to obtain the mask of the original image.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
According to the portrait matting method and device provided by the embodiments of the present application, an original image in which a portrait is presented is first acquired; then, the original image is input into a pre-trained trimap image generation model to obtain a trimap of the original image; next, a mask of the original image is determined based on the original image, the trimap of the original image, and a pre-trained matting model; finally, the portrait image is cropped out of the original image using the mask of the original image. In this way, the trimap generation model and the matting model remain decoupled, which makes it convenient to modify or enhance either model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which various embodiments of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a portrait matting method according to the present application;
FIG. 3 is a flow chart of yet another embodiment of a portrait matting method according to the present application;
FIG. 4 is a schematic diagram of an application scenario of a portrait matting method according to the present application;
FIG. 5 is a schematic structural view of an embodiment of a portrait matting apparatus according to the present application;
FIG. 6 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which the portrait matting method or device of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages (e.g., the terminal devices 101, 102, 103 may send captured original images presenting portraits to the server 105; the server 105 may also send pre-trained trimap generation models and pre-trained matting models to the terminal devices 101, 102, 103), etc. Various communication client applications, such as image processing applications, camera applications, face recognition applications, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may first acquire an original image in which a portrait is presented; then, the original image may be input into a pre-trained trimap image generation model to obtain a trimap of the original image; next, a mask of the original image may be determined based on the original image, the trimap of the original image, and a pre-trained matting model; finally, the portrait image may be cropped out of the original image using the mask of the original image.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have cameras and support information interaction, including but not limited to smartphones, tablet computers, laptop portable computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, for example, a server that analyzes an original image in which a portrait is presented so as to extract the portrait image from the original image. The server 105 may first acquire an original image in which a portrait is presented; then, the original image may be input into a pre-trained trimap image generation model to obtain a trimap of the original image; next, a mask of the original image may be determined based on the original image, the trimap of the original image, and a pre-trained matting model; finally, the portrait image may be cropped out of the original image using the mask of the original image.
It should be noted that the server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that, the portrait matting method provided by the embodiment of the present application may be executed by the terminal devices 101, 102, 103, or may be executed by the server 105.
It should be further noted that the terminal devices 101, 102, 103 may locally store a pre-trained trimap image generation model and a pre-trained matting model, in which case the terminal devices 101, 102, 103 may themselves determine the trimap of the original image and the mask of the original image and thereby extract the portrait image from the original image. In this case, the exemplary system architecture 100 may not include the network 104 and the server 105.
Alternatively, the original image in which the portrait is presented may be stored locally in the server 105, and the server 105 may obtain it locally. In this case, the exemplary system architecture 100 may not include the terminal devices 101, 102, 103 or the network 104.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a portrait matting method according to the present application is shown. The portrait matting method comprises the following steps:
Step 201, an original image in which a portrait is presented is acquired.
In the present embodiment, the execution subject of the portrait matting method (for example, the server shown in fig. 1) may acquire an original image in which a portrait is presented. A portrait is a two-dimensional or three-dimensional depiction of a person, typically including facial features such as the eyes, nose, mouth, ears and eyebrows, and sometimes also the general appearance of the person's body, i.e., an overall outline of the person.
Step 202, inputting the original image into a pre-trained trimap image generating model to obtain the trimap image of the original image.
In this embodiment, the execution subject may input the original image acquired in step 201 into a pre-trained trimap image generation model to obtain a trimap of the original image. A trimap indicates the foreground region, the background region, and the unknown (uncertain) region of the original image. Here, the trimap generation model may be used to characterize the correspondence between an original image and the trimap of the original image, i.e., the trimap generation model may generate the trimap of the original image based on the original image. Such a trimap generation model may be trained in various ways.
As an example, a correspondence table storing the correspondences between a plurality of sample images and the corresponding trimaps of the sample images (for example, trimaps drawn by a technician based on the masks corresponding to the sample images), generated by the technician based on statistics over a large number of sample images and their trimaps, may be used as the trimap generation model.
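The patent also allows the trimap generation model to be a neural network (see the training procedure below). As a purely illustrative sketch, not part of the patent text, a trimap is often stored as a single-channel image in which the background is 0, the unknown region is 128, and the foreground is 255; the per-class output of such a network could then be converted into a trimap roughly as follows (the 0/128/255 encoding and the output layout are assumptions):

```python
import numpy as np

# Illustrative only: `trimap_scores` is assumed to be the (3, H, W) per-class
# output of a trimap generation network for one image, with channel 0 =
# background, 1 = unknown, 2 = foreground. The 0/128/255 encoding is a common
# convention, not something the patent mandates.
def scores_to_trimap(trimap_scores: np.ndarray) -> np.ndarray:
    """Convert per-class scores to a single-channel 0/128/255 trimap."""
    class_map = trimap_scores.argmax(axis=0)       # (H, W) values in {0, 1, 2}
    trimap = np.zeros(class_map.shape, dtype=np.uint8)
    trimap[class_map == 1] = 128                   # unknown region
    trimap[class_map == 2] = 255                   # foreground region
    return trimap                                  # background stays 0
```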
Step 203, determining a mask of the original image based on the original image, the trimap of the original image, and a pre-trained matting model.
In this embodiment, the execution subject may determine the mask of the original image based on the original image acquired in step 201, the trimap of the original image obtained in step 202, and a pre-trained matting model. A mask delimits a selection: the region outside the selection is masked out. The mask here generally refers to a mask with an alpha channel, which records the transparency (or semi-transparency) of each pixel of a picture. Specifically, the execution subject may input the original image and the trimap of the original image into the pre-trained matting model to obtain the mask of the original image.
Here, the matting model may be used to characterize the correspondence between an original image together with its trimap and the mask of the original image, i.e., the matting model may generate the mask of the original image based on both the original image and the trimap of the original image. Such a matting model may be trained in a variety of ways.
As an example, a correspondence table storing the correspondences between a plurality of sample images together with their trimaps and the masks of the sample images, generated by a technician based on statistics over a large number of sample images, trimaps of the sample images, and masks of the sample images, may be used as the matting model.
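When the matting model is a neural network, the original image and the trimap must be combined into a single input. The patent does not specify how; a minimal sketch, assuming a PyTorch model that takes the RGB image concatenated with the trimap as a 4-channel input, might look like this (all names and shapes are illustrative assumptions):

```python
import torch

# Assumption: `matting_model` is a PyTorch module taking a 4-channel
# (RGB + trimap) tensor and returning a 1-channel alpha mask. The patent does
# not specify how the image and trimap are combined; concatenation along the
# channel dimension is just one common choice.
def predict_mask(matting_model: torch.nn.Module,
                 image: torch.Tensor,    # (3, H, W), float values in [0, 1]
                 trimap: torch.Tensor    # (1, H, W), float values in [0, 1]
                 ) -> torch.Tensor:
    x = torch.cat([image, trimap], dim=0).unsqueeze(0)   # (1, 4, H, W)
    with torch.no_grad():
        alpha = matting_model(x)                         # (1, 1, H, W)
    return alpha.squeeze(0).squeeze(0).clamp(0.0, 1.0)   # (H, W) mask
```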
Step 204, using the mask of the original image to extract the portrait image from the original image.
In this embodiment, the execution subject may crop the portrait image out of the original image using the mask of the original image determined in step 203. Since the mask marks the region outside the selection, the execution subject may take the region inside the selection as the portrait image, thereby cropping the portrait image out of the original image.
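A minimal sketch of this step, assuming the mask is an alpha matte with values in [0, 1] and using NumPy (the patent does not prescribe a particular library or output format); the portrait is returned as an RGBA image whose background pixels become transparent:

```python
import numpy as np

# A minimal sketch of step 204, assuming `image` is an (H, W, 3) uint8 RGB
# array and `mask` is an (H, W) float alpha matte in [0, 1]; both the input
# format and the RGBA output are illustrative choices.
def crop_portrait(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return the portrait as an RGBA image with a transparent background."""
    alpha = (mask * 255.0).clip(0, 255).astype(np.uint8)
    return np.dstack([image, alpha])                # (H, W, 4) RGBA portrait
```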
In some optional implementations of this embodiment, the trimap image generation model may be trained by the above execution subject, or by another execution subject used for training the trimap generation model, through the following steps:
In step S1, a first training sample set may be acquired.
Here, a first training sample in the first training sample set may include a first sample image and a first sample trimap. The first sample trimap may be generated by performing morphological transformation on the mask corresponding to the first sample image. A mask delimits a selection: the region outside the selection is masked out, while the region inside the selection is retained. The mask here generally refers to a mask with an alpha channel. The alpha channel records the transparency (or semi-transparency) of each pixel of a picture.
The key problem of the matting technique is to solve the following formula (1):
I_p = α_p·F_p + (1 - α_p)·B_p (1)
where I_p is the pixel value of the p-th pixel of the image, which is a known quantity; α_p is the transparency of the p-th pixel, F_p is the foreground pixel value of the p-th pixel, and B_p is the background pixel value of the p-th pixel; these three variables are unknowns. Intuitively, the original image can be regarded as a superposition of the foreground and the background weighted by the transparency α_p. For pixels that are definitely foreground, α = 1; for pixels that are definitely background, α = 0; for pixels that cannot be determined to be foreground or background (the unknown region), α lies between 0 and 1.
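The following snippet is only a numerical illustration of formula (1) with made-up values, not data from the patent; it shows how a pixel's observed value interpolates between its foreground and background values according to α_p:

```python
import numpy as np

# Made-up single-channel example values, not patent data.
alpha = np.array([[1.0, 0.6],
                  [0.2, 0.0]])                     # transparency per pixel
F = np.full((2, 2), 200.0)                         # foreground pixel values
B = np.full((2, 2), 30.0)                          # background pixel values

I = alpha * F + (1.0 - alpha) * B                  # formula (1)
print(I)   # [[200. 132.] [ 64.  30.]]: alpha=1 -> pure foreground, alpha=0 -> pure background
```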
Here, the mask corresponding to an image may be morphologically transformed to obtain a trimap of the image as follows: the mask corresponding to the image may be subjected to a dilation (dilate) transformation and an erosion (erode) transformation to obtain the trimap of the image. It should be noted that dilation and erosion generally operate on the foreground region (the white, highlighted part). Dilation expands the foreground region in the image, and the portion added by the dilation operation is taken as an unknown region; it is easy to see that the background region (the black part) naturally becomes smaller after the dilation operation. Erosion shrinks the foreground region in the image, and the portion removed by the erosion operation is taken as an unknown region; it is likewise easy to see that the foreground region (the white part) naturally becomes smaller after the erosion operation.
When the morphological transformation is performed with different parameters (dilation parameter, erosion parameter), the resulting trimaps differ. In order to enhance the effect of the trimaps generated by the trained trimap generation model at application time, each first sample trimap here is typically generated by morphologically transforming the mask corresponding to the first sample image with different parameters.
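A hedged sketch of this mask-to-trimap construction, assuming OpenCV (which the patent does not name) and the 0/128/255 trimap encoding used above; the kernel sizes stand in for the dilation and erosion parameters that are varied between samples, and the value ranges shown are illustrative only:

```python
import cv2
import numpy as np

def mask_to_trimap(mask: np.ndarray, dilate_px: int, erode_px: int) -> np.ndarray:
    """Build a trimap from a binary mask: the eroded mask becomes foreground (255),
    pixels outside the dilated mask become background (0), the rest is unknown (128)."""
    fg = (mask > 127).astype(np.uint8) * 255
    dilated = cv2.dilate(fg, np.ones((dilate_px, dilate_px), np.uint8))
    eroded = cv2.erode(fg, np.ones((erode_px, erode_px), np.uint8))
    trimap = np.full(mask.shape, 128, dtype=np.uint8)   # start as unknown
    trimap[dilated == 0] = 0                            # definite background
    trimap[eroded == 255] = 255                         # definite foreground
    return trimap

# Different dilation/erosion parameters give different trimaps for the same mask;
# the ranges below are illustrative, not values taken from the patent.
rng = np.random.default_rng(0)
dilate_px, erode_px = int(rng.integers(5, 30)), int(rng.integers(5, 30))
# first_sample_trimap = mask_to_trimap(first_sample_mask, dilate_px, erode_px)
```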
Step S2, a first sample image and a first sample trimap image in a first training sample in the first training sample set may be respectively used as an input and an expected output of a first initial model, and the first initial model may be trained by using a machine learning method, so as to obtain a trimap image generating model.
Here, the first sample image in a first training sample of the first training sample set may be input into the first initial model to obtain a trimap of the first sample image, the first sample trimap in that first training sample may be taken as the expected output of the first initial model, and the first initial model may be trained with a machine learning method. Specifically, a preset loss function may first be used to calculate the difference between the obtained trimap and the first sample trimap in the first training sample; for example, a standard cross-entropy loss function may be used as the loss function. Then, the network parameters of the first initial model may be adjusted based on the calculated difference, and training may be ended when a preset training end condition is satisfied. For example, the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold. The first initial model may be a convolutional neural network, a deep neural network, or the like.
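A hedged sketch of this training step, assuming PyTorch (not named in the patent), a first initial model whose output is a per-class score map, and first sample trimaps encoded as class indices; the optimizer, learning rate, and loop structure are illustrative choices rather than requirements of the patent:

```python
import torch
import torch.nn as nn

# Illustrative training sketch. `model` outputs (N, 3, H, W) class scores for
# background/unknown/foreground; `loader` yields (first_sample_image, target)
# pairs with targets as (N, H, W) class indices. Optimizer, learning rate and
# stopping rule are placeholder choices.
def train_trimap_model(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
    criterion = nn.CrossEntropyLoss()            # standard cross-entropy loss
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                      # a simple preset stopping rule
        for image, target in loader:
            optimizer.zero_grad()
            pred = model(image)                  # predicted trimap scores
            loss = criterion(pred, target)       # difference vs. first sample trimap
            loss.backward()                      # adjust network parameters
            optimizer.step()
    return model
```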
Here, the trained trimap generation model may be based on a DeepLabV3+ and ResNet-50 structure, which may include an encoder that downsamples the input image by a factor of 16 and then upsamples by a factor of 4. DeepLabV3+ can adjust the receptive field of the filters and control the resolution of the feature responses computed by the convolutional neural network.
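As a rough stand-in for such a structure, torchvision ships a DeepLabV3 (not V3+) segmentation model with a ResNet-50 backbone; the sketch below only shows how a three-class (background/unknown/foreground) variant could be instantiated and is an assumption, not the patent's model:

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Assumption: a three-class (background/unknown/foreground) segmentation head.
model = deeplabv3_resnet50(weights=None, num_classes=3)

x = torch.randn(1, 3, 512, 512)      # dummy input batch
scores = model(x)["out"]             # (1, 3, 512, 512) per-class score map
print(scores.shape)
```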
In some optional implementations of this embodiment, the matting model may be trained by the above execution subject, or by another execution subject used for training the matting model, through the following steps:
In step S1, a second training sample set may be acquired.
Here, a second training sample in the second training sample set may include a second sample image, a second sample trimap, and a sample mask. A portrait is typically presented in the second sample image. The second sample trimap may be generated by performing morphological transformation on the mask corresponding to the second sample image. Here, when the mask corresponding to the second sample image is morphologically transformed, the parameters applied during the transformation are generally set so as to enlarge the range of the morphological transformation.
Step S2, a second sample image and a second sample trimap image in a second training sample in the second training sample set may be used as input of a second initial model, a sample mask corresponding to the input second sample image and second sample trimap image may be used as expected output of the second initial model, and the second initial model may be trained by using a machine learning method to obtain a matting model.
Here, the second sample image and the second sample trimap in a second training sample of the second training sample set may be input into the second initial model to obtain a mask of the second sample image, the sample mask in that second training sample may be taken as the expected output of the second initial model, and the second initial model may be trained with a machine learning method. Specifically, a preset loss function may first be used to calculate the difference between the obtained mask and the sample mask in the second training sample; for example, a regression loss function may be used as the loss function. Then, the network parameters of the second initial model may be adjusted based on the calculated difference, and training may be ended when a preset training end condition is satisfied. For example, the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold. The second initial model may be a convolutional neural network, a deep neural network, or the like.
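A hedged sketch of this training step, again assuming PyTorch, a second initial model that takes the image and trimap concatenated as a 4-channel input, and an L1 loss as one possible regression loss; all of these are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Illustrative training sketch. `model` takes a (N, 4, H, W) tensor (RGB image
# concatenated with the trimap) and predicts an (N, 1, H, W) mask; `loader`
# yields (image, trimap, sample_mask) triples. L1 is one possible regression loss.
def train_matting_model(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
    criterion = nn.L1Loss()                          # regression loss on the mask
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for image, trimap, sample_mask in loader:
            optimizer.zero_grad()
            x = torch.cat([image, trimap], dim=1)    # (N, 4, H, W) combined input
            pred_mask = model(x)                     # predicted mask
            loss = criterion(pred_mask, sample_mask) # difference vs. sample mask
            loss.backward()
            optimizer.step()
    return model
```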
According to the method provided by the embodiments of the present application, the original image is input into a pre-trained trimap image generation model to obtain the trimap of the original image, and then the mask of the original image is determined based on the original image, the trimap of the original image, and a pre-trained matting model. This keeps the trimap generation model and the matting model decoupled, making it convenient to modify or enhance either model.
With further reference to fig. 3, a flow 300 of yet another embodiment of the portrait matting method is shown. The flow 300 of the portrait matting method includes the following steps:
Step 301, an original image in which a portrait is presented is acquired.
Step 302, inputting the original image into a pre-trained trimap image generating model to obtain a trimap image of the original image.
In this embodiment, steps 301-302 may be performed in a similar manner to steps 201-202, and will not be described again.
Step 303, correcting the trimap image of the original image.
In this embodiment, the execution subject may correct the trimap of the original image obtained in step 302. As an example, if the execution subject is a terminal device, the execution subject may present the original image and the trimap of the original image, and the user may correct the presented trimap of the original image. If the execution subject is a server, the execution subject may send the original image and the trimap of the original image to a target terminal device (the terminal device of the person who corrects the trimap); the target terminal device may then present the original image and the trimap of the original image, and the user may correct the presented trimap of the original image; finally, the target terminal device may send the corrected trimap of the original image back to the execution subject.
Step 304, the original image and the corrected trimap are input into a pre-trained matting model to obtain a mask of the original image.
In this embodiment, the execution subject may input the original image acquired in step 301 and the corrected trimap obtained in step 303 into a pre-trained matting model to obtain a mask of the original image.
Here, the matting model may be used to characterize the correspondence between an original image together with its trimap and the mask of the original image, i.e., the matting model may generate the mask of the original image based on both the original image and the trimap of the original image. Such a matting model may be trained in a variety of ways.
As an example, a correspondence table storing the correspondences between a plurality of sample images together with their trimaps and the masks of the sample images, generated by a technician based on statistics over a large number of sample images, trimaps of the sample images, and masks of the sample images, may be used as the matting model.
Step 305, using the mask of the original image to extract the portrait image from the original image.
In this embodiment, step 305 may be performed in a similar manner to step 204, and will not be described herein.
In some optional implementations of this embodiment, the execution subject may correct the trimap of the original image as follows. The execution subject may first determine at least one connected region of the foreground region in the trimap of the original image. In an image, the smallest unit is a pixel, and each pixel has 8 neighboring pixels around it. There are two common adjacency relations: 4-adjacency and 8-adjacency. In 4-adjacency, the four points above, below, to the left of, and to the right of a pixel are adjacent to it. In 8-adjacency, the eight points above, below, to the left, to the right, and at the four diagonal positions are adjacent to it. If pixel A is adjacent to pixel B, A and B are said to be connected; from this it follows, without proof, that if A is connected to B and B is connected to C, then A is connected to C. Visually, points that are connected to one another form one region, while points that are not connected form different regions. Such a set of mutually connected points is called a connected region. Here, the adjacency relation of the pixels in a connected region is generally 8-adjacency. Then, for each of the at least one connected region of the foreground region, the execution subject may determine the area of that connected region, and determine a connected region whose area is smaller than a preset first area threshold as a first target connected region. Finally, the execution subject may change the region attribute of the first target connected region in the trimap from foreground to unknown, obtaining a corrected trimap.
In some optional implementations of this embodiment, the execution subject may correct the trimap of the original image as follows. The execution subject may first determine at least one connected region of the background region in the trimap of the original image. Here, the adjacency relation of the pixels in a connected region is generally 8-adjacency. Then, for each of the at least one connected region of the background region, the execution subject may determine the area of that connected region, and determine a connected region whose area is smaller than a preset second area threshold as a second target connected region. Finally, the execution subject may change the region attribute of the second target connected region in the trimap from background to unknown, obtaining a corrected trimap.
It should be noted that the execution subject may both change the first target connected region in the trimap of the original image from foreground to unknown and change the second target connected region in the trimap of the original image from background to unknown, so as to obtain the corrected trimap.
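A hedged sketch of this correction, assuming OpenCV's connected-component analysis with 8-adjacency and the 0/128/255 trimap encoding used earlier; the function relabels both small foreground regions and small background regions as unknown, and the thresholds are supplied by the caller:

```python
import cv2
import numpy as np

def correct_trimap(trimap: np.ndarray,
                   fg_area_threshold: int,
                   bg_area_threshold: int) -> np.ndarray:
    """Relabel small foreground (255) and background (0) connected regions as
    unknown (128), using 8-adjacency as described above."""
    corrected = trimap.copy()
    for value, threshold in ((255, fg_area_threshold), (0, bg_area_threshold)):
        region = (trimap == value).astype(np.uint8)
        num, labels, stats, _ = cv2.connectedComponentsWithStats(region, connectivity=8)
        for label in range(1, num):                    # label 0 is everything else
            if stats[label, cv2.CC_STAT_AREA] < threshold:
                corrected[labels == label] = 128       # change region to unknown
    return corrected
```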
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the portrait matting method in this embodiment highlights step 303 of correcting the trimap of the original image and step 304 of inputting the original image and the corrected trimap into a pre-trained matting model to obtain the mask of the original image. The scheme described in this embodiment can therefore correct the generated trimap, thereby improving the accuracy of the matting result.
With continued reference to fig. 4, fig. 4 is a schematic diagram of an application scenario of the portrait matting method according to the present embodiment. In the application scenario of fig. 4, the execution subject of the portrait matting method (e.g., a server or a terminal device) may first acquire an original image 401 in which a portrait is presented. The execution subject may then input the original image 401 into a pre-trained trimap image generation model 402 to obtain a trimap 403 of the original image 401. Next, the execution subject may correct the trimap 403 of the original image to obtain a corrected trimap 404. The execution subject may then input the original image 401 and the corrected trimap 404 into a pre-trained matting model 405 to obtain a mask 406 of the original image 401. Finally, the execution subject may extract the portrait image 407 from the original image 401 using the mask 406 of the original image.
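Putting the pieces together, a hedged end-to-end sketch of this scenario might look as follows; `trimap_model` and `matting_model` are assumed to be opaque callables wrapping the pre-trained models, the helper functions `correct_trimap` and `crop_portrait` are the ones sketched earlier, and the area thresholds are illustrative values not taken from the patent:

```python
import numpy as np

def portrait_matting(original_image: np.ndarray, trimap_model, matting_model) -> np.ndarray:
    """End-to-end sketch of the fig. 4 flow; see the helper sketches above."""
    trimap = trimap_model(original_image)              # 402/403: generate trimap
    trimap = correct_trimap(trimap,                    # 404: corrected trimap
                            fg_area_threshold=500,
                            bg_area_threshold=500)     # illustrative thresholds
    mask = matting_model(original_image, trimap)       # 405/406: alpha mask in [0, 1]
    return crop_portrait(original_image, mask)         # 407: portrait image (RGBA)
```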
With further reference to fig. 5, as an implementation of the method shown in the foregoing drawings, the present application provides an embodiment of a portrait matting apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the portrait matting apparatus 500 of the present embodiment includes: an acquisition unit 501, an input unit 502, a determining unit 503, and a cropping unit 504. The acquisition unit 501 is configured to acquire an original image in which a portrait is presented; the input unit 502 is configured to input the original image into a pre-trained trimap image generation model to obtain a trimap of the original image; the determining unit 503 is configured to determine a mask of the original image based on the original image, the trimap of the original image, and a pre-trained matting model; and the cropping unit 504 is configured to crop the portrait image out of the original image using the mask of the original image.
In this embodiment, for the specific processing of the acquisition unit 501, the input unit 502, the determining unit 503, and the cropping unit 504 of the portrait matting apparatus 500, reference may be made to steps 201, 202, 203, and 204 in the embodiment corresponding to fig. 2.
In some optional implementations of this embodiment, the determining unit 503 may correct the trimap of the original image. As an example, if the execution subject is a terminal device, the determining unit 503 may present the original image and the trimap of the original image, and the user may correct the presented trimap of the original image. If the execution subject is a server, the determining unit 503 may send the original image and the trimap of the original image to a target terminal device (the terminal device of the person who corrects the trimap); the target terminal device may then present the original image and the trimap of the original image, and the user may correct the presented trimap of the original image; finally, the target terminal device may send the corrected trimap of the original image back to the execution subject. The determining unit 503 may then input the original image and the corrected trimap into a pre-trained matting model to obtain a mask of the original image. Here, the matting model may be used to characterize the correspondence between an original image together with its trimap and the mask of the original image, i.e., the matting model may generate the mask of the original image based on both the original image and the trimap of the original image. Such a matting model may be trained in a variety of ways. As an example, a correspondence table storing the correspondences between a plurality of sample images together with their trimaps and the masks of the sample images, generated by a technician based on statistics over a large number of sample images, trimaps of the sample images, and masks of the sample images, may be used as the matting model.
In some optional implementations of this embodiment, the determining unit 503 may correct the trimap of the original image as follows. The determining unit 503 may first determine at least one connected region of the foreground region in the trimap of the original image. In an image, the smallest unit is a pixel, and each pixel has 8 neighboring pixels around it. There are two common adjacency relations: 4-adjacency and 8-adjacency. In 4-adjacency, the four points above, below, to the left of, and to the right of a pixel are adjacent to it. In 8-adjacency, the eight points above, below, to the left, to the right, and at the four diagonal positions are adjacent to it. If pixel A is adjacent to pixel B, A and B are said to be connected; from this it follows, without proof, that if A is connected to B and B is connected to C, then A is connected to C. Visually, points that are connected to one another form one region, while points that are not connected form different regions. Such a set of mutually connected points is called a connected region. Here, the adjacency relation of the pixels in a connected region is generally 8-adjacency. Then, for each of the at least one connected region of the foreground region, the determining unit 503 may determine the area of that connected region, and determine a connected region whose area is smaller than a preset first area threshold as a first target connected region. Finally, the determining unit 503 may change the region attribute of the first target connected region in the trimap from foreground to unknown, obtaining a corrected trimap.
In some optional implementations of this embodiment, the determining unit 503 may correct the trimap of the original image as follows. The determining unit 503 may first determine at least one connected region of the background region in the trimap of the original image. Here, the adjacency relation of the pixels in a connected region is generally 8-adjacency. Then, for each of the at least one connected region of the background region, the determining unit 503 may determine the area of that connected region, and determine a connected region whose area is smaller than a preset second area threshold as a second target connected region. Finally, the determining unit 503 may change the region attribute of the second target connected region in the trimap from background to unknown, obtaining a corrected trimap.
In some optional implementations of this embodiment, the trimap image generation model may be trained by the above execution subject, or by another execution subject used for training the trimap generation model, through the following steps:
In step S1, a first training sample set may be acquired.
Here, a first training sample in the first training sample set may include a first sample image and a first sample trimap. The first sample trimap may be generated by performing morphological transformation on the mask corresponding to the first sample image. A mask delimits a selection: the region outside the selection is masked out, while the region inside the selection is retained. The mask here generally refers to a mask with an alpha channel. The alpha channel records the transparency (or semi-transparency) of each pixel of a picture.
The key problem of the matting technique is to solve the following formula (1):
I_p = α_p·F_p + (1 - α_p)·B_p (1)
where I_p is the pixel value of the p-th pixel of the image, which is a known quantity; α_p is the transparency of the p-th pixel, F_p is the foreground pixel value of the p-th pixel, and B_p is the background pixel value of the p-th pixel; these three variables are unknowns. Intuitively, the original image can be regarded as a superposition of the foreground and the background weighted by the transparency α_p. For pixels that are definitely foreground, α = 1; for pixels that are definitely background, α = 0; for pixels that cannot be determined to be foreground or background (the unknown region), α lies between 0 and 1.
Here, the mask corresponding to an image may be morphologically transformed to obtain a trimap of the image as follows: the mask corresponding to the image may be subjected to a dilation transformation and an erosion transformation to obtain the trimap of the image. It should be noted that dilation and erosion generally operate on the foreground region. Dilation expands the foreground region in the image, and the portion added by the dilation operation is taken as an unknown region; it is easy to see that the background region naturally becomes smaller after the dilation operation. Erosion shrinks the foreground region in the image, and the portion removed by the erosion operation is taken as an unknown region; it is likewise easy to see that the foreground region naturally becomes smaller after the erosion operation.
When the morphological transformation is performed with different parameters, the resulting trimaps differ. In order to enhance the effect of the trimaps generated by the trained trimap generation model at application time, each first sample trimap here is typically generated by morphologically transforming the mask corresponding to the first sample image with different parameters.
Step S2, a first sample image and a first sample trimap image in a first training sample in the first training sample set may be respectively used as an input and an expected output of a first initial model, and the first initial model may be trained by using a machine learning method, so as to obtain a trimap image generating model.
Here, the first sample image in a first training sample of the first training sample set may be input into the first initial model to obtain a trimap of the first sample image, the first sample trimap in that first training sample may be taken as the expected output of the first initial model, and the first initial model may be trained with a machine learning method. Specifically, a preset loss function may first be used to calculate the difference between the obtained trimap and the first sample trimap in the first training sample; for example, a standard cross-entropy loss function may be used as the loss function. Then, the network parameters of the first initial model may be adjusted based on the calculated difference, and training may be ended when a preset training end condition is satisfied. For example, the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold. The first initial model may be a convolutional neural network, a deep neural network, or the like.
Here, the trained trimap generation model may be based on a DeepLabV3+ and ResNet-50 structure, which may include an encoder that downsamples the input image by a factor of 16 and then upsamples by a factor of 4. DeepLabV3+ can adjust the receptive field of the filters and control the resolution of the feature responses computed by the convolutional neural network.
In some optional implementations of this embodiment, the matting model may be trained by the above execution subject, or by another execution subject used for training the matting model, through the following steps:
In step S1, a second training sample set may be acquired.
Here, a second training sample in the second training sample set may include a second sample image, a second sample trimap, and a sample mask. A portrait is typically presented in the second sample image. The second sample trimap may be generated by performing morphological transformation on the mask corresponding to the second sample image. Here, when the mask corresponding to the second sample image is morphologically transformed, the parameters applied during the transformation are generally set so as to enlarge the range of the morphological transformation.
Step S2, a second sample image and a second sample trimap image in a second training sample in the second training sample set may be used as input of a second initial model, a sample mask corresponding to the input second sample image and second sample trimap image may be used as expected output of the second initial model, and the second initial model may be trained by using a machine learning method to obtain a matting model.
Here, the second sample image and the second sample trimap in a second training sample of the second training sample set may be input into the second initial model to obtain a mask of the second sample image, the sample mask in that second training sample may be taken as the expected output of the second initial model, and the second initial model may be trained with a machine learning method. Specifically, a preset loss function may first be used to calculate the difference between the obtained mask and the sample mask in the second training sample; for example, a regression loss function may be used as the loss function. Then, the network parameters of the second initial model may be adjusted based on the calculated difference, and training may be ended when a preset training end condition is satisfied. For example, the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold. The second initial model may be a convolutional neural network, a deep neural network, or the like.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., server or terminal device of fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 6 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing means 601. It should be noted that, the computer readable medium according to the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Whereas in embodiments of the present disclosure, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be included in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an original image in which a portrait is presented; input the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image; determine a mask of the original image based on the original image, the trimap image of the original image, and a pre-trained matting model; and extract the portrait image from the original image by using the mask of the original image.
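For readers who want to see the above steps end to end, a minimal sketch is given below. It assumes PyTorch and two already-trained models exposed as callables; the names `trimap_model` and `matting_model`, the tensor shapes, and the trimap encoding are illustrative assumptions and are not fixed by the disclosure.

```python
# Minimal sketch of the portrait matting pipeline described above (assumptions noted in the lead-in).
import numpy as np
import torch

def matte_portrait(image: np.ndarray,
                   trimap_model: torch.nn.Module,
                   matting_model: torch.nn.Module) -> np.ndarray:
    """Return an RGBA portrait cut out of `image` (H x W x 3, uint8)."""
    # 1. Acquire the original image and convert it to a normalized tensor.
    x = torch.from_numpy(image).float().permute(2, 0, 1).unsqueeze(0) / 255.0

    with torch.no_grad():
        # 2. Predict the trimap (per-pixel class: 0=background, 1=unknown, 2=foreground).
        trimap_logits = trimap_model(x)                              # 1 x 3 x H x W
        trimap = trimap_logits.argmax(dim=1, keepdim=True).float() / 2.0

        # 3. Predict the alpha mask from the original image together with its trimap.
        alpha = matting_model(torch.cat([x, trimap], dim=1))         # 1 x 1 x H x W
        alpha = alpha.clamp(0.0, 1.0)

    # 4. Use the mask to cut the portrait out of the original image (RGBA output).
    alpha_np = alpha.squeeze().cpu().numpy()
    return np.dstack([image, (alpha_np * 255).astype(np.uint8)])
```

Whether the matting model consumes the trimap as a fourth input channel (as sketched here) or in some other encoding is a design choice; the disclosure only requires that the mask be determined from the original image, its trimap, and the matting model.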
Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor comprising an acquisition unit, an input unit, a determination unit, and an extraction unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit that acquires an original image in which a portrait is presented".
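As a purely illustrative view of this unit decomposition, the apparatus could be organized as below; the names follow the description, while the wiring between the units and the callable signatures are assumptions, not something mandated by the disclosure.

```python
# Illustrative decomposition of the portrait matting apparatus into the four units named above.
# The callables passed in are placeholders for the concrete implementations.
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class PortraitMattingApparatus:
    acquire: Callable[[], np.ndarray]                          # acquisition unit: fetch the original image
    to_trimap: Callable[[np.ndarray], np.ndarray]              # input unit: run the trimap generation model
    to_mask: Callable[[np.ndarray, np.ndarray], np.ndarray]    # determination unit: run the matting model
    extract: Callable[[np.ndarray, np.ndarray], np.ndarray]    # extraction unit: apply the mask to the image

    def run(self) -> np.ndarray:
        image = self.acquire()
        trimap = self.to_trimap(image)
        mask = self.to_mask(image, trimap)
        return self.extract(image, mask)
```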
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.
Claims (9)
1. A portrait matting method comprising:
acquiring an original image in which a portrait is presented;
inputting the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image, wherein the trimap image generation model is obtained by training in the following way: acquiring a first training sample set, wherein a first training sample comprises a first sample image and a first sample trimap image, the first sample trimap image is generated by performing a morphological transformation on a mask corresponding to the first sample image, and the mask comprises a channel characterizing the transparency and translucency of the image; and taking the first sample image and the first sample trimap image in a first training sample of the first training sample set as the input and the expected output of a first initial model, respectively, and training the first initial model by using a machine learning method to obtain the trimap image generation model;
determining a mask of the original image based on the original image, the trimap image of the original image, and a pre-trained matting model;
and extracting a portrait image from the original image by using the mask of the original image.
2. The method of claim 1, wherein the determining a mask of the original image based on the original image, the trimap image of the original image, and a pre-trained matting model comprises:
correcting the trimap image of the original image;
and inputting the original image and the corrected trimap image into the pre-trained matting model to obtain the mask of the original image.
3. The method of claim 2, wherein the correcting the trimap image of the original image comprises:
determining at least one connected region of a foreground region in the trimap image of the original image;
determining, among the at least one connected region of the foreground region, a connected region whose area is smaller than a preset first area threshold as a first target connected region;
and changing the first target connected region from the foreground region to an unknown region to obtain a corrected trimap image.
4. The method of claim 2, wherein the correcting the trimap image of the original image comprises:
determining at least one connected region of a background region in the trimap image of the original image;
determining, among the at least one connected region of the background region, a connected region whose area is smaller than a preset second area threshold as a second target connected region;
and changing the second target connected region from the background region to an unknown region to obtain a corrected trimap image.
5. The method of claim 1, wherein the matting model is obtained by training in the following way:
acquiring a second training sample set, wherein a second training sample comprises a second sample image, a second sample trimap image, and a sample mask, and a portrait is presented in the second sample image;
and taking the second sample image and the second sample trimap image in a second training sample of the second training sample set as the input of a second initial model, taking the sample mask corresponding to the input second sample image and second sample trimap image as the expected output of the second initial model, and training the second initial model by using a machine learning method to obtain the matting model.
6. A portrait matting apparatus comprising:
an acquisition unit configured to acquire an original image in which a portrait is presented;
an input unit configured to input the original image into a pre-trained trimap image generation model to obtain a trimap image of the original image, the trimap image generation model being obtained by training in the following way: acquiring a first training sample set, wherein a first training sample comprises a first sample image and a first sample trimap image, the first sample trimap image is generated by performing a morphological transformation on a mask corresponding to the first sample image, and the mask comprises a channel characterizing the transparency and translucency of the image; and taking the first sample image and the first sample trimap image in a first training sample of the first training sample set as the input and the expected output of a first initial model, respectively, and training the first initial model by using a machine learning method to obtain the trimap image generation model;
a determining unit configured to determine a mask of the original image based on the original image, a trimap image of the original image, and a pre-trained matting model;
and an extraction unit configured to extract a portrait image from the original image by using the mask of the original image.
7. The apparatus of claim 6, wherein the determining unit is further configured to determine the mask of the original image based on the original image, the trimap image of the original image, and the pre-trained matting model by:
correcting the trimap image of the original image;
and inputting the original image and the corrected trimap image into the pre-trained matting model to obtain the mask of the original image.
8. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
9. A computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.
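To make the two trimap-related operations in the claims concrete, the sketch below illustrates (a) how a first sample trimap image might be generated from a mask by morphological transformation, as in claims 1 and 6, and (b) how small connected foreground or background regions might be reassigned to the unknown region, as in claims 3 and 4. OpenCV and NumPy are assumed; the kernel size, area thresholds, and pixel codes are illustrative placeholders, not values taken from the disclosure.

```python
# Illustrative sketch only: trimap generation by morphological transformation (claims 1 and 6)
# and trimap correction via small connected regions (claims 3 and 4).
# Kernel size, thresholds, and pixel codes are arbitrary example values.
import cv2
import numpy as np

BG, UNKNOWN, FG = 0, 128, 255  # pixel codes used for the trimap in this sketch

def trimap_from_mask(mask: np.ndarray, kernel_size: int = 15) -> np.ndarray:
    """Derive a trimap from an alpha mask (H x W, values 0..255).

    Pixels that remain foreground after erosion become FG, pixels that remain
    background after dilation become BG, and everything in between is UNKNOWN.
    """
    binary = (mask > 127).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    eroded = cv2.erode(binary, kernel)
    dilated = cv2.dilate(binary, kernel)

    trimap = np.full(mask.shape, UNKNOWN, dtype=np.uint8)
    trimap[eroded == 1] = FG
    trimap[dilated == 0] = BG
    return trimap

def correct_trimap(trimap: np.ndarray,
                   fg_area_threshold: int = 500,
                   bg_area_threshold: int = 500) -> np.ndarray:
    """Reassign small connected foreground/background regions to UNKNOWN."""
    corrected = trimap.copy()
    for value, threshold in ((FG, fg_area_threshold), (BG, bg_area_threshold)):
        region = (trimap == value).astype(np.uint8)
        num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(region, connectivity=8)
        for label in range(1, num_labels):  # label 0 is the complement of the region
            if stats[label, cv2.CC_STAT_AREA] < threshold:
                corrected[labels == label] = UNKNOWN
    return corrected
```

As a usage note: during training of the trimap image generation model, a function like `trimap_from_mask` would supply the first sample trimap images from the corresponding masks, while at inference time a function like `correct_trimap` would be applied to the predicted trimap before it is passed, together with the original image, to the matting model.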
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910912853.5A CN111784726B (en) | 2019-09-25 | 2019-09-25 | Portrait matting method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910912853.5A CN111784726B (en) | 2019-09-25 | 2019-09-25 | Portrait matting method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111784726A CN111784726A (en) | 2020-10-16 |
CN111784726B (en) | 2024-09-24 |
Family
ID=72755044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910912853.5A Active CN111784726B (en) | 2019-09-25 | 2019-09-25 | Portrait matting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111784726B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112541927A (en) * | 2020-12-18 | 2021-03-23 | Oppo广东移动通信有限公司 | Method, device, equipment and storage medium for training and matting model |
CN115147434A (en) * | 2021-03-30 | 2022-10-04 | 武汉Tcl集团工业研究院有限公司 | Image processing method, device, terminal equipment and computer readable storage medium |
CN113469929B (en) * | 2021-09-03 | 2021-12-03 | 北京美摄网络科技有限公司 | Training data generation method and device, electronic equipment and computer readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107452010A (en) * | 2017-07-31 | 2017-12-08 | 中国科学院长春光学精密机械与物理研究所 | A kind of automatically stingy nomography and device |
CN108961279A (en) * | 2018-06-28 | 2018-12-07 | Oppo(重庆)智能科技有限公司 | Image processing method, device and mobile terminal |
CN109461167A (en) * | 2018-11-02 | 2019-03-12 | Oppo广东移动通信有限公司 | Image processing model training method, mapping method, device, medium and terminal |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577875B (en) * | 2013-11-20 | 2016-05-25 | 北京联合大学 | A kind of area of computer aided CAD demographic method based on FAST |
CN103996203A (en) * | 2014-06-13 | 2014-08-20 | 北京锐安科技有限公司 | Method and device for detecting whether face in image is sheltered |
CN104134234B (en) * | 2014-07-16 | 2017-07-25 | 中国科学技术大学 | A fully automatic 3D scene construction method based on a single image |
CN110148102B (en) * | 2018-02-12 | 2022-07-15 | 腾讯科技(深圳)有限公司 | Image synthesis method, advertisement material synthesis method and device |
CN109241973B (en) * | 2018-08-21 | 2022-02-08 | 南京工程学院 | A fully automatic soft segmentation method of characters under texture background |
CN109543595B (en) * | 2018-11-19 | 2021-09-07 | 上海交通大学 | Training method and detection method of electric wire based on deep separable convolutional neural network |
CN109712145B (en) * | 2018-11-28 | 2021-01-08 | 山东师范大学 | Image matting method and system |
- 2019-09-25: Application CN201910912853.5A filed in CN; granted as patent CN111784726B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN111784726A (en) | 2020-10-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102461919B1 (en) | Technology to capture and edit dynamic depth images | |
CN110021052B (en) | Method and apparatus for generating fundus image generation model | |
CN110796664B (en) | Image processing method, device, electronic equipment and computer readable storage medium | |
US11514263B2 (en) | Method and apparatus for processing image | |
CN110059623B (en) | Method and apparatus for generating information | |
CN111784726B (en) | Portrait matting method and device | |
CN109840939B (en) | Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and storage medium | |
CN110349107B (en) | Image enhancement method, device, electronic equipment and storage medium | |
CN112330788B (en) | Image processing method, device, readable medium and electronic device | |
CN112381717A (en) | Image processing method, model training method, device, medium, and apparatus | |
CN112714263B (en) | Video generation method, device, equipment and storage medium | |
CN112800276A (en) | Video cover determination method, device, medium and equipment | |
WO2024120446A1 (en) | Methods and apparatus for generating special effect item and special effect image, device, and storage medium | |
CN116596748A (en) | Image stylization processing method, apparatus, device, storage medium, and program product | |
CN110717467A (en) | Head pose estimation method, device, equipment and storage medium | |
CN114972020B (en) | Image processing method, device, storage medium and electronic device | |
CN110636331B (en) | Method and apparatus for processing video | |
EP3764326B1 (en) | Video lighting using depth and virtual lights | |
CN111461965B (en) | Picture processing method and device, electronic equipment and computer readable medium | |
CN110070482B (en) | Image processing method, apparatus and computer readable storage medium | |
CN113628097A (en) | Image special effect configuration method, image recognition method, image special effect configuration device and electronic equipment | |
CN111586295A (en) | Image generation method and device and electronic equipment | |
CN113066166B (en) | Image processing method, device and electronic device | |
CN111314627B (en) | Method and apparatus for processing video frames | |
CN112418233A (en) | Image processing method, device, readable medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |