CN112991371B

CN112991371B - Automatic image coloring method and system based on coloring overflow constraint

Info

Publication number: CN112991371B
Application number: CN202110423250.6A
Authority: CN
Inventors: 普园媛; 吕大华; 徐丹; 赵征鹏; 周浩; 袁国武; 钱文华
Original assignee: Yunnan University YNU
Current assignee: Yunnan University YNU
Priority date: 2021-04-20
Filing date: 2021-04-20
Publication date: 2024-01-05
Anticipated expiration: 2041-04-20
Also published as: CN112991371A

Abstract

The invention relates to an automatic image coloring method and system based on coloring overflow constraint. The method comprises the following steps: acquiring a plurality of original color images; converting each original color image into a gray level image, and converting each original color image into a line image by utilizing an edge detection algorithm; inputting the gray level image and the line image into a dual-channel generator at the same time to generate a predicted color image; constructing an image automatic coloring model according to the predicted color image and the original color image; the automatic image coloring model is used for automatically coloring any gray level image. The invention can achieve complete coloring effect.

Description

Automatic image coloring method and system based on coloring overflow constraint

Technical Field

The invention relates to the field of image processing, in particular to an automatic image coloring method and system based on coloring overflow constraint.

Background

As a means of image processing, color information is supplemented for gray level images, so that better ornamental effect and experience can be obtained. Gray image coloring is a simple task for the painter to easily color gray images, such as blue ocean and red sun. The colored objects can be in multiple modes as long as the common color knowledge of the normal life scene is used. The natural coloring image is the most realistic picture of human beings, and comprises the image coloring of the cartoon industry, so that people and scenes can be more vivid, and the appreciation degree of people is improved. With the improvement of the social coloring demand, the requirement on the automatic coloring technology of the image is also higher and higher, so that the image coloring technology has important application value and research significance for the living application of human beings and the development of related industries.

In general, the problem of image coloring focuses on the accuracy of the colored region and the rationality of color selection. To address these problems, a number of algorithms have been developed. They can be divided into three categories: based on the user-directed auxiliary coloring method, the user manually marks the region to be colored to obtain the coloring source and the colors are automatically distributed in space; an automatic coloring method based on a reference image, wherein coloring is performed according to the colors of categories and similar areas in the reference image; the full-automatic coloring method provides a reference for coloring images through a large amount of training data.

For the first auxiliary coloring method based on user guidance, human intervention is required to complete coloring, and the accuracy of coloring is closely related to the accuracy of human labeling. Levin proposes a framework optimized based on this method, but is limited to neighboring pixels with similar intensities to maintain similar colors, and is not suitable for a wide range of applications. Sykora proposes an optimized framework based on graphic cutting with a flexible hand-drawn cartoon coloring style, which is easy to apply to various painting styles, but does not take into account edge information for the coloring work, so that the result of these methods naturally leads to color boundary overflow. To prevent color boundary overflow of objects Huang improves the method by applying adaptive edge detection, thereby ensuring the integrity of the color transmission. In other words, such methods rely on the accuracy with which the user marks the painted area, which is prone to color errors, confusing results.

For the second automatic coloring method based on the reference image, it is necessary to transfer color information in the reference image into the target image. Welsh proposes a method of coloring based on a reference color histogram. There are also methods for mapping the color distribution of a reference image to an input image by calculating color statistics of the input image and the reference image and then establishing a mapping function, such as some color transfer techniques. The possibility of coloring errors is reduced compared to the first method, but it is still necessary to provide a reference image corresponding to coloring, which is time-consuming in practical work.

For the third fully automatic coloring method. After the back propagation has occurred, neural networks have begun to be used for a variety of tasks, and convolutional neural network models have also been successfully applied in various fields, including image coloring. The full-automatic coloring method attracts a great deal of research workers, and Lizuka proposes an automatic coloring method based on global and local images, which accomplishes classification tasks by training a large data set, and colors gray-scale images using the feature. Larsson uses a convolutional neural network model to predict the color histogram of each pixel in the gray-scale image, which typically generates a suitably colored image, with the result being in fact inconsistent compared to the color of the original image.

With the advent of generation of antagonism networks (Generative Adversarial Networks, GAN), many CNN-based applications have been replaced. The image translation work to generate the countermeasure network will be more efficient in the image rendering task than the CNN model, and the closer to the input image, the better the learning mapping effect. In Zhu et al work, the GAN network model includes a John-based generator and a arbiter PatchGAN, resNet architecture that acts as a transformation network to generate input-conditioned images, followed by successfully transforming between unpaired images in a similar scene or role. The method is particularly suitable for image coloring, and can color in similar scenes or actions of the model, but the automatic coloring method which is proposed previously is not realized aiming at coloring edges, so that the coloring effect overflows easily, the effect of complete coloring cannot be achieved, and the coloring effect is poor.

Disclosure of Invention

The invention aims to provide an automatic image coloring method and system based on coloring overflow constraint, which are used for solving the problems that the existing automatic coloring method is easy to cause coloring effect overflow, cannot achieve complete coloring effect and is poor in coloring effect.

In order to achieve the above object, the present invention provides the following solutions:

an automatic image coloring method based on coloring overflow constraint, comprising:

acquiring a plurality of original color images;

converting each original color image into a gray level image, and converting each original color image into a line image by utilizing an edge detection algorithm;

inputting the gray level image and the line image into a dual-channel generator at the same time to generate a predicted color image;

constructing an image automatic coloring model according to the predicted color image and the original color image; the automatic image coloring model is used for automatically coloring any gray level image.

Optionally, the acquiring a plurality of original color images further includes:

processing the original color image to generate an original color image with the size of 256 multiplied by 3; the dimensions 256×256×3 are the width pixel value, the length pixel value, and the number of channels of the original color image, respectively.

Optionally, the inputting the gray-scale image and the line image into the dual-channel generator simultaneously to generate the predicted color image specifically includes:

inputting the gray level image and the line image into a feature extractor in the dual-channel generator for three convolutions, and generating a convolved gray level image feature map and a convolved line image feature map;

Fusing the convolved gray image feature map and the convolved line image feature map to generate a feature fused image;

inputting gray features and line features in the image after feature fusion into a feature converter in the dual-channel generator to convert the gray features and the line features into color features;

and generating a predicted color image according to the color characteristics.

Optionally, the constructing an image automatic coloring model according to the predicted color image and the original color image specifically includes:

constructing a one-way mapping loss function according to the predicted color image and the original color image, and adjusting the predicted color image based on the one-way mapping loss function to generate a color image with adjusted tone;

extracting predicted color image features of the predicted color image and original color image features of the original color image, generating a predicted color image Gram matrix according to the predicted color image features, and generating an original color image Gram matrix according to the original color image features;

calculating a style loss function according to the predicted color image Gram matrix and the original color image Gram matrix, and adjusting the color image after tone adjustment according to the style loss function to generate a color image after style adjustment;

Inputting the color image with the adjusted style and the original color image into a discriminator based on a countermeasures loss function, and judging whether the similarity between the color image with the adjusted style and the original color image is larger than a similarity threshold;

if yes, determining the dual-channel generator as an automatic coloring model of the trained image;

if not, establishing a cyclic loss function, and based on the cyclic loss function, enabling the first generator to learn from the original color image to obtain color mapping until the similarity between the color image after style adjustment and the original color image is larger than a similarity threshold.

Optionally, the one-way mapping loss function is:

L _uml (G)＝E[||y ₁ -y|| ₁ ]the method comprises the steps of carrying out a first treatment on the surface of the Wherein the method comprises the steps of，L _uml For predicting colour image y ₁ And the expected value of the color image y; y is ₁ To predict a color image; y is the original color image; e is a desired function; i ₁ For predicting colour image y ₁ And the absolute value of the mean value of the color image y.

Optionally, the style loss function is:

wherein L is _style Is a style loss function; i is the i layer of the dual-channel generator; />A Gram matrix for predicting a color image; g (x, z) is a predicted color image generated by the dual-channel generator with a gray image x as a first input and a line image z as a second input; said- >A Gram matrix is an original color image; phi represents the characteristic diagram of the ith layer of the dual-channel generator.

Optionally, the cyclic loss function is:

wherein L is _cyc A first generator for generating a predicted color image for the cyclic loss function; f is a second generator for generating a gray level image;for the first cycle (x, z) →y ₁ →x ₂ Is a function of the desired function of (2); />For the second cycle y → x ₁ Then (x) ₁ ,z)→y ₂ Is a function of the desired function of (2); p (P) _data Data distribution; f (G (x, z)) is the first-cycle reconstructed grayscale image x ₂ The method comprises the steps of carrying out a first treatment on the surface of the G (F (y), z) is the color image y reconstructed in the second cycle ₂ F (y) is the predicted gray image x of the second generator ₁ 。

Optionally, the countermeasures loss function is:

wherein L is _GAN (G,D _Y X, Y, Z) is an counterloss function; dy is a discriminator probability judgment; x is the whole data set of the gray image; y is the whole data set of the line image; z is the whole data set of the original color image; d (D) _Y (G (x, z)) is a determination result of the predicted color image generated for the first time; d (D) _Y And (y) is a determination result of the original color image y.

An automatic image rendering system based on a rendering overflow constraint, comprising:

the original color image acquisition module is used for acquiring a plurality of original color images;

the original color image conversion module is used for converting each original color image into a gray level image and converting each original color image into a line image by utilizing an edge detection algorithm;

The prediction color image generation module is used for inputting the gray level image and the line image into the dual-channel generator at the same time to generate a prediction color image;

an image automatic coloring model construction module for constructing an image automatic coloring model according to the predicted color image and the original color image; the automatic image coloring model is used for automatically coloring any gray level image.

Optionally, the image automatic coloring model specifically comprises a feature extraction stage, a feature fusion stage, a feature conversion stage, an up-sampling calculation stage, an activation layer and an output layer which are sequentially connected;

the feature extraction stage comprises two input channels, wherein one input channel is used for inputting gray images, and the other input channel is used for inputting line images; each input channel comprises three convolution layers;

the feature fusion stage is used for fusing the convolved gray image feature map and the convolved line image feature map to generate a feature fused image;

the feature conversion stage comprises 4 densely connected DenseNet blocks;

the upsampling calculation phase comprises two upsampling layers and a convolution block.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects: the invention provides an automatic image coloring method and system based on coloring overflow constraint, which utilizes a double-channel generator, adopts a double-channel input mode to help line features to serve as constraint of gray image coloring and prevent coloring overflow.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of an automatic image coloring method based on coloring overflow constraint provided by the invention;

FIG. 2 is a schematic diagram of a network structure of a dual-channel generator provided by the present invention;

FIG. 3 is a schematic diagram of a network structure of a arbiter according to the present invention;

FIG. 4 is a graph showing the comparison of effects under cyclic mapping of different parameters according to the present invention;

FIG. 5 is a graph comparing the effect of the present invention with those tested on datasets by Zhu et al, isola et al, harrish et al and Yoo et al;

FIG. 6 is a block diagram of an image automatic shading system based on shading overflow constraint provided by the present invention;

fig. 7 is a schematic diagram of the whole network structure of the image automatic coloring method based on coloring overflow constraint provided by the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention aims to provide an automatic image coloring method and system based on coloring overflow constraint, which can achieve a complete coloring effect.

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

Fig. 1 is a flowchart of an automatic image coloring method based on coloring overflow constraint, as shown in fig. 1, and the automatic image coloring method based on coloring overflow constraint includes:

step 101: a plurality of original color images are acquired.

In order to train the coloring model, a large number of color images are selected to obtain images with the size of 256 multiplied by 3, wherein the images respectively correspond to the width, the height and the number of channels of the picture, and the color mode is RGB.

And converting each color image into a corresponding gray level image and converting the gray level image into a corresponding line image through an edge detection algorithm, and finally respectively matching the gray level image x, the color image y and the line image z as a training data set of the invention.

Step 102: and converting each original color image into a gray level image, and converting each original color image into a line image by utilizing an edge detection algorithm.

Step 103: and simultaneously inputting the gray level image and the line image into a dual-channel generator to generate a predicted color image.

The step 103 specifically includes: inputting the gray level image and the line image into a feature extractor in the dual-channel generator for three convolutions, and generating a convolved gray level image feature map and a convolved line image feature map; fusing the convolved gray image feature map and the convolved line image feature map to generate a feature fused image; inputting gray features and line features in the image after feature fusion into a feature converter in the dual-channel generator to convert the gray features and the line features into color features; and generating a predicted color image according to the color characteristics.

The method for constructing an image automatic coloring model according to the predicted color image and the original color image specifically comprises the following steps: constructing a one-way mapping loss function according to the predicted color image and the original color image, and adjusting the predicted color image based on the one-way mapping loss function so that the color tone of the predicted color image is consistent with that of the original color image, and generating a color image with adjusted color tone; extracting predicted color image features of the predicted color image and original color image features of the original color image, generating a predicted color image Gram matrix according to the predicted color image features, and generating an original color image Gram matrix according to the original color image features; calculating a style loss function according to the predicted color image Gram matrix and the original color image Gram matrix, and adjusting the color image after tone adjustment according to the style loss function so that the color image after tone adjustment is consistent with the style of the original color image, and generating a color image after style adjustment; inputting the color image after style adjustment and the original color image into a discriminator based on a countermeasure loss function, and judging whether the similarity between the color image after style adjustment and the original color image is larger than a similarity threshold value as shown in fig. 3; if yes, determining the dual-channel generator as an automatic coloring model of the trained image; if not, establishing a cyclic loss function, and based on the cyclic loss function, enabling the first generator to learn from the original color image to obtain color mapping until the similarity between the color image after style adjustment and the original color image is larger than a similarity threshold.

The color image y is taken as an input and is input into a dual-channel generator to form a cyclic mode (a first cyclic process is that a gray image is converted into a color image by a first generator and then the gray image is reconstructed by a second generator, (x, z) to y) ₁ →x ₂ The method comprises the steps of carrying out a first treatment on the surface of the The second cycle is a cycle process y-x of converting the color image into gray image by the second generator and reconstructing the color image by the first generator ₁ →y ₂ ) Generating a corresponding gray-scale image x ₁ 。

Step 104: constructing an image automatic coloring model according to the predicted color image and the original color image; the automatic image coloring model is used for automatically coloring any gray level image.

The image automatic coloring model is executed as follows: and respectively taking the gray level image x and the corresponding line image z in the training data set as the input of the automatic image coloring model, and executing a feature extraction stage, a feature fusion stage, a feature conversion stage and an up-sampling calculation stage to finally generate a corresponding color image.

The method comprises the following specific steps:

feature extraction:

the feature extraction stage consists of three convolution blocks, and the gray level image x and the line image z are mapped into a generated color image y ₁ As a first generator, the gray level image x and the line image z of two branches are taken as input of the generator and enter the feature extractor for three times of convolution, the convolution channels are respectively 32, 64 and 128, the feature images obtained by the two convolution branches are respectively subjected to the conventional example normalization processing, in addition, the normalized feature images are subjected to the conventional Relu activation function operation for accelerating the training speed, the formula is f (x) =max (0, x), x refers to the feature images after the normalization processing, and the gray level image feature images and the line image feature images after the normalization processing are obtained after the operation.

Feature fusion stage:

as shown in fig. 2, the jump connection is a marked concat part, the feature superposition is a plus part between two branches, the feature fusion stage is composed of two jump connections and feature superposition, the input of the convolution layer of the two branches is jump connected with the up-sampling layer, and the outputs of the feature extraction layers of the two branches are superposed to obtain a feature fusion result.

Feature conversion stage:

the feature conversion stage of the present invention consists of DenseNet, representing that the inputs of each layer are associated with the outputs of all the preceding layers, i.e. densely connected. As in the DenseNet part of fig. 2, each convolution layer is densely connected together with all previous convolution layers on the channel.

And (3) inputting the result after feature fusion into a feature converter of the first generator, and converting the result into color features at corresponding positions by using gray scale and line features. The calculation formula is as follows:

f _l ＝H _l ([f ₀ ，f ₁ ，...，f _l-1 ])

wherein H is _l Representative is a nonlinear transfer function that is combined from a batch normalization, a Relu activation function, and a 3 x 3 convolution operation. f (f) ₀ ，f ₁ And f _1-1 Features representing the first, second and 1-1 st convolution layers, f, respectively, in the feature transformer ₁ Is the output of the feature transformation stage.

Up-sampling calculation phase:

deconvolution consists of two upsamples and a convolution block, which, like the existing deconvolution module, corresponds in turn to the portions named deconv1, deconv2 and output of fig. 2 according to the data flow order.

Each layer outputs a characteristic graph with the number of channels of the size, the length and the width of 64 x 128, 32 x 256, and 256 x 256 sequentially, characteristic splicing is carried out through concat, and the final layer is convolved with the length and the width of 256 but the number of channels of 3, and a color image y with the length and the width of 3 x 256 is obtained through a Tanh activation function of a Relu layer ₁ 。

To ensure that the image y is generated ₁ The invention aims to establish a mapping relation between the two parts so as to lead the color image y to generate an image y by better detail processing and tone information consistent with the true color image y ₁ Guiding the mapping information again to enable the generated predicted color image y to be ₁ Some fine adjustments are automatically made and this calculation process is named one-way map loss. Color image y generated by up-sampling calculation ₁ In contrast to the color image y, a one-way mapping loss function is calculated, expressed as:

L _uml (G)＝E[||y ₁ -y|| ₁ ]

wherein y is ₁ Representing the generated predicted color image, y representing the color image of step 1, E representing the desired function. L (L) _uml Representing the generated colour image y ₁ And the expected value of the color image y, which is used for measuring the consistency of the tone of the color image y and the color image y, and optimizing y by the existing back propagation mode of the function result ₁ And (3) the distribution of the feature map is restrained to be consistent with the distribution of the y feature map as far as possible until the function loss converges.

The Gram matrix is composed of arbitrary k (k is less than or equal to n) vectors in the n-dimensional European space, wherein the vectors are mutually inner products, and the larger the numerical value of the inner product is, the larger the correlation is represented, and the more similar the vectors are. In the field of computer vision, gram matrices are often used to grasp the overall style of an image, are largely used as style characteristic losses in the work of image style migration, and the difference between Gram matrices of two images is minimized as an optimization target, so that the style of a baseline image is continuously adjusted to be close. In the present invention, image colorization is essentially a style migration task, so this style penalty will be added to calculate the distance of the generated image from the Gram matrix of the real image feature map, and by continuously optimizing the difference of the penalty to minimize it, encourages similarity in color style to the style of the real image, in order to ensure that the hue style of the generated image remains consistent with the hue style of the real image.

Using predictive colour image y ₁ Extracting features from the original color image y to form a Gram matrix for comparisonCalculating a style loss function, the style loss function expressed as:

wherein L is _style Representing the style loss function, i representing the i-th layer of the network, G' representing the glamer (Gram) matrix, Φ representing the map of the i-th layer of the network. The loss function represents the generated color image y ₁ The expected values of style characteristics and y style characteristics of the color image are used for measuring the correlation of the style characteristics and the y style characteristics of the color image, and the y is optimized by the function result through the existing back propagation mode ₁ And the style characteristics are Gram matrixes, the distance between the Gram matrixes and the matrix is restrained to be minimized until the function loss converges, and the style of the Gram matrixes and the matrix is consistent.

The loss function is derived from y ₁ And y, the distance between the Gram matrices of y is determined byMeasured by the expected value E of (a). The greater the distance, the greater the expected value, L _style The larger the style correlation of the two, and vice versa. The L obtained _style Optimizing y by existing back propagation means ₁ Is arranged so that the distance between the matrix and the Gram matrix of y is close to L _style The value is reduced until L after training for a period of time _style The value remains unchanged and stable, indicating that the two styles remain consistent.

The method is based on the HingeLoss loss function as the existing contrast loss of the GAN, and mainly aims to train the game effect between the generator and the arbiter, optimize the coefficients of the generator and the arbiter in a counter-propagation mode, minimize the contrast loss and finally achieve the dynamic balance of the generator and the arbiter. The loss function is expressed as:

wherein L is _GAN Represents countermeasures against losses, E represents expectations, P _data Representing the data distribution, it is immaterial that the first part is x and z as inputs and the second part is y as input. D (D) _y Representing the decision of the probability of the discriminator, 1 represents true, 0 represents false, and G (x, z) represents the generated color image y ₁ X and z represent an input gray-scale image and line image, respectively, and y represents a color image. The loss function is an antagonistic loss function used by existing GANs. Mainly for training the game effect between the generator and the arbiter, E of the first part _x,z I.e. the discrimination of the effect of the discriminator on the generator generation, E of the second part _y Is the discrimination of the original color image by the discriminator, the closer the result of the two E's is to 1, the better the effect is represented, and the added result is L _GAN I.e. representing the challenge loss function. The E obtained _x,z And E is _y Respectively optimizing the generation effect of the generator and the discrimination effect of the discriminator by the existing counter propagation mode to ensure that the corresponding expected value E approaches 1 until the result L of the counterloss function is obtained after training for a period of time _GAN And remains unchanged and stable, the representation generator and the arbiter reach dynamic balance.

Gray scale image x to be circularly reconstructed ₂ Comparing with the image x in the first step; conversely, the color image y to be circularly reconstructed ₂ Compared to the color image y. Their expected losses were calculated, expressed as:

wherein L is _cyc Represented as a cyclic loss function. F represents the gray image x reconstructed for the second generator cycle ₂ The result L of the function _CYC The magnitude of the difference is measured by the expected value of the difference determined from the difference between the reconstructed image and the real image. The larger the difference, the larger the expected value, L _CYC The larger the effect difference representing the reconstructed image and the real image is, the larger. The L obtained _cyc Optimizing distribution of reconstructed image by existing back propagation mode to make it match with realityThe distribution of the images is close to L _cyc The value is reduced until L after training for a period of time _cyc The values remain unchanged and stable, i.e. represent that the reconstructed image is close to the real image, and this looping pattern is smooth.

The function is mainly used to ensure that the cyclic pattern proceeds such that the reconstructed image is close to the real image. And minimizes the cyclic loss function with proper operation of the cyclic mode.

Training the training process according to a preset period. Letting the generated predictive color image y be made by existing back propagation techniques ₁ And (3) performing corresponding to the color image y in the corresponding step one, and performing iterative optimization on the loss function by using the existing gradient descent optimization algorithm to obtain a minimized loss function, so that the predicted color image effect is close to the original color image effect step by step, and obtaining a final model after the preset training times are completed, wherein in the optimization process, the anti-loss function value, the unidirectional mapping loss value, the style loss value and the anti-loss value are smaller and better.

In the obtained final model, a color image y with better coloring quality is obtained through inputting a gray level image x and a line image z ₁ And (5) completing the coloring of the image.

Taking a specific image as an example, the specific steps are as follows:

step 1: in order to train the coloring model, a large number of high-definition cartoon color images are selected, then an image sequence is derived at a speed of 12 frames per second by using Opencv, excessively torn and exaggerated images in a data set are deleted, and pictures of the same person and the same clothes are displayed as much as possible, so that a generator can know accurate distribution, and images with the size of 256 multiplied by 3 are obtained through processing, and the images correspond to the width, the height and the channel of the pictures respectively. And converting the RGB images into corresponding gray images through a de-coloring method and converting the RGB images into corresponding line images through a Canny edge detection algorithm, and finally respectively matching the gray images x, the original color images y and the line images z to be used as training data sets of the method.

Step 2: the training process of the automatic coloring model of the image is as follows:

the gray level image and the line image are taken as input to enter a generator shown in fig. 2, an image characteristic image is obtained through three-layer convolution, the characteristic images of the gray level image and the line image are fused through a superposition mode, the fused characteristic image enters a converter formed by Densenet, the fused characteristic image is converted into a characteristic image with color image characteristic information, the converted characteristic image is obtained through an up-sampling calculation stage, finally, a unidirectional mapping loss function is established between the generated color image and a real color image, the result of the function is the tone correlation of the generated color image and the real color image, the smaller the function represents the tone of the generated color image and the tone of the real color image, the distribution of the generated color image characteristic image is optimized through the existing counter propagation mode, the distribution of the function is as close as possible to the distribution of the real color image, the style characteristic loss function of the generated color image is identical, the style characteristic distribution of the generated color image is kept as far as possible, finally, the optimized generated color image enters a discriminator for true and false discrimination, the result of the contrast loss is minimized through the counter propagation mode, the result of the contrast loss is balanced, and the discrimination training process is finally completed. The execution process of the automatic image coloring model is as follows: respectively taking a gray level image x and a corresponding line image z in a training data set as input of the automatic image coloring model, and executing a feature extraction stage, a feature fusion stage, a feature conversion stage and an up-sampling calculation stage to finally generate a corresponding color image, wherein the step 2 specifically comprises the following steps:

Step 2.1: feature extraction stage

The feature extraction stage consists of three convolution blocks, wherein the convolution kernel of the first convolution layer is 7×7 in size, the step length is 1, the convolution kernels of the other two convolution layers are 3×3 in size, and the step length is 2. The gray level image x and the line image z are mapped into a color image y to be used as a first generator, the gray level images x and the line images z of the two branches are used as inputs of the generator and enter the feature extractor to be convolved for three times, convolution channels are sequentially 32, 64 and 128, the conventional example normalization processing is performed, in addition, the conventional Relu activation function operation is performed on the feature map after the normalization processing to speed up training, the formula is f (x) =max (0, x), and the processed gray level image feature map and the processed line image feature map are obtained after operation.

The automatic coloring method which is not realized aiming at coloring edges and is easy to cause coloring effect overflow is not realized, and the two-channel input mode of the invention is beneficial to taking line characteristics as the constraint of gray image coloring, preventing coloring overflow and achieving the effect of complete coloring.

Step 2.2: feature fusion phase

As in the generator shown in fig. 2, the jump connection is the indicated concat part, the features being superimposed as the plus part between the two branches. The feature fusion stage is composed of two jump connections and feature superposition, input of convolution layers of two branches is connected with an up-sampling layer through the jump connections, more feature information can be reserved in the process of extracting features, and the line features output by the line branches are superposed into the features of the first gray image branch to obtain a feature fusion result.

Step 2.3: feature transformation stage

The feature conversion stage of the present invention consists of DenseNet, representing that the inputs of each layer are associated with the outputs of all the preceding layers, i.e. densely connected. As in the DenseNet part of fig. 2, each convolution layer is densely connected together with all previous convolution layers on the channel. DenseNet can reduce the disappearance of gradients and enhance feature delivery capabilities. And (3) inputting the result after feature fusion into a feature converter of the generator, and converting the result into color features at corresponding positions by using gray scale and line features. The calculation formula is as follows:

f _l ＝H _l ([f ₀ ，f ₁ ，...，f _l-1 ]) (1)

wherein H is _l Representative is a nonlinear transfer function that is combined from a batch normalization, a Relu activation function, and a 3 x 3 convolution operation. f (f) ₀ ，f ₁ And f _1-1 Features representing the first, second and 1-1 st convolution layers, f, respectively, in the feature transformer ₁ The method is the output of the feature conversion stage, as shown in a formula (1), the modules are connected with each other by DenseNet, more shallow information can be obtained, and the information flow coupling capability between the modules is improved.

Step 2.4: up-sampling calculation stage

The up-sampling calculation stage consists of two up-sampling and one convolution block, and corresponds to the portions named deconvl, deconv2 and 0utput of fig. 2 in turn according to the data flow order, as in the existing deconvolution module. Each layer outputs a characteristic graph with the number of channels of the size, the length and the width of 64 x 128, 32 x 256, and 256 x 256 sequentially, characteristic splicing is carried out through concat, and the final layer is convolved with the length and the width of 256 but the number of channels of 3, and 3 x 256 predictive color images y are obtained through a Tanh activation function of a Relu layer ₁ 。

Step 3: mapping the color image y into a gray image as a second generator, and repeating the step 2 with the color image as an input to form a cyclic pattern (the first cyclic process is a cyclic process of converting the gray image into the color image by the first generator and reconstructing the gray image by the second generator, (x, z) →y) ₁ →x ₂ The method comprises the steps of carrying out a first treatment on the surface of the The second cycle is a cycle process y-x of converting the color image into gray image by the second generator and reconstructing the color image by the first generator ₁ →y ₂ ) Generating a corresponding gray-scale image x ₁ The method comprises the steps of carrying out a first treatment on the surface of the The purpose is to extract the color information of the color image to be added to the training phase so that the first generator can obtain this color information for image rendering.

Step 4: to ensure the generated predictive colour image y ₁ The invention aims to establish a mapping relation between the two parts so as to lead the original color image y to correspond to the generated predicted color image y ₁ Guiding the mapping information again to enable the generated predicted color image y to be ₁ Some fine adjustments are automatically made and this calculation process is named one-way map loss. The up-sampling calculation is used to generate a predictive color image y in step 2.4 ₁ Comparing with the original color image y, calculating a one-way mapping loss function, which is expressed as:

L _uml (G)＝E[||y ₁ -y|| ₁ ] (2)

wherein y is ₁ Representing the generated predicted color image, y representing the original image, and E representing the desired function. The original color image y and the corresponding generated predicted color image y ₁ And carrying out difference comparison, and measuring the difference between the two by using mathematical expectation, so as to restrict the reconstructed image to carry out proper small change according to the original image, and further ensure the generation quality of the image. L (L) _uml Representing a generated predictive color image y ₁ And the expected value of the original color image y, which is used for measuring the consistency of the tone of the original color image y and the original color image y, and optimizing y by the function result through the existing back propagation mode ₁ And (3) the distribution of the feature map is restrained to be consistent with the distribution of the y feature map as far as possible until the function loss converges.

The unidirectional mapping loss function can ensure that the generated image has better detail processing and consistent tone information. The prior method well keeps the color information of the color image, but easily causes the difference of tone in the color information, and the unidirectional mapping loss function is intended to establish a mapping relation between the reconstructed image and the color image, so that the color image carries out mapping information guidance on the reconstructed image again, and the reconstructed image is automatically subjected to some fine adjustment.

Step 5: the Gram matrix is composed of arbitrary k (k is less than or equal to n) vectors in the n-dimensional European space, wherein the vectors are mutually inner products, and the larger the numerical value of the inner product is, the larger the correlation is represented, and the more similar the vectors are. In the field of computer vision, gram matrices are often used to grasp the overall style of an image, are largely used as style characteristic losses in the work of image style migration, and the difference between Gram matrices of two images is minimized as an optimization target, so that the style of a baseline image is continuously adjusted to be close. In the present invention, image colorization is essentially a style migration task, so this style penalty will be added to calculate the distance of the generated image from the Gram matrix of the real image feature map, and by continuously optimizing the difference of the penalty to minimize it, encourages similarity in color style to the style of the real image, in order to ensure that the hue style of the generated image remains consistent with the hue style of the real image. And (3) extracting features by utilizing the color image y1 generated in the step (2.4) and the color image y in the step (1), forming a Gram matrix, comparing, and calculating a style loss function, wherein the style loss function is expressed as follows:

wherein L is _style Representing a style loss function, i representing the ith layer of the network, G' representing a glamer matrix, Φ representing a feature map of the ith layer of the network. The loss function represents the expected value of the generated predicted color image y1 style characteristic and the original color image y style characteristic, and is used for measuring the correlation of the two styles, the y1 style characteristic, namely the Gram matrix, is optimized by the existing back propagation mode through the function result, the distance between the matrix and the Gram matrix of y is restrained to be minimized until the function loss converges, the two styles are kept consistent, the style loss function uses a VGG19 network trained on an ImageNet, and the number of extracted layers is 2_2,3_4,4_4 and 5_2.

Step 6: based on the hangloss function as the counterloss of the existing GAN, the loss function is expressed as:

the loss function is an antagonistic loss function used by existing GANs. The method is mainly used for training the game effect between the generator and the arbiter, optimizing the coefficients of the generator and the arbiter in a counter propagation mode, minimizing the counter loss and finally achieving the dynamic balance of the generator and the arbiter. Wherein L is _GAN Representing the contrast loss, E representing the expectation, the expectation of the first part representing the pair of discriminators for the generated color image y ₁ The closer 1 means the more realistic, and the second part's expectation-representation discriminator judges the true image y, naturally also the expectation-approaching maximum 1.P (P) _data Representing data distribution, of no material significance, first partThe x and z are inputs and the second part is y as an input. The loss function may be expressed as an expected value in the case of distribution of the data P; d (D) _y Representing the decision of the probability of the discriminator, 1 represents true, 0 represents false, and G (x, z) represents the generated color image y ₁ X and z represent an input gray-scale image and a line image, respectively, and y represents an original color image.

Step 7: gray scale image x to be circularly reconstructed ₂ Comparing with the image x in the first step; conversely, the color image y to be circularly reconstructed ₂ Compared to the original color image y. Their expected losses were calculated, expressed as:

wherein L is _cyc Represented as a cyclic loss function. F represents the gray image x reconstructed for the second generator cycle ₂ The function is mainly used to ensure that the cyclic pattern is performed so that the reconstructed image is close to the real image. And minimizes the cyclic loss function with proper operation of the cyclic mode. First part of the equationRepresenting a greyscale image x reconstructed from a generated colour image ₂ Contrast with the gray scale image x. Second part->The color image y2 reconstructed from the generated gray-scale image is represented in comparison with the original color image y, and the two desired functions have weights in the cyclic loss function of 15 and 10, respectively. The two parts form a circular consistent countermeasure network, characteristic information is provided for mutual conversion, and fig. 4 is a comparison chart of effects under circular mapping of different parameters.

Step 8: training the training process according to a preset period. Letting the generated color image y be by existing back propagation techniques ₁ Color image corresponding to step one y corresponds to the parameter, the parameter is optimized by using the existing gradient descent optimization algorithm, and a final model is obtained after the preset training times are completed.

Step 9: in the obtained final model, a predicted color image y with better coloring quality is obtained through the input of a gray level image x and a line image z ₁ And (5) completing the coloring of the image.

The invention uses peak signal-to-noise ratio (Peak Signal to Noise Ratio, PSNR), structural similarity (Structural Similarity Index, SSIM) and French Lei Xie initiation distance (Fre chet inception distance, FID) to measure the image's performance. PSNR is an objective criterion for evaluating image quality, and higher PSNR values indicate smaller distortion and better image quality. SSIM measures the similarity of images from the three aspects of image brightness, contrast and structure, and the higher the SSIM value is, the higher the similarity is, and the better the image quality is. FID represents the distance between the feature vector of the generated image and the feature vector of the real image, and the smaller the FID value is, the closer the distance is, the better the generated image effect is. The comparison results are shown in Table one.

Table 1 shows the results of the comparison of the present invention with the prior art (PSNR, SSIM, FID) under three evaluation indexes, as shown in Table 1.

TABLE 1

As can be seen from Table 1, the present invention has higher PSNR and SSIM values, better Image quality, and the lowest FID value, and is shown best in comparison with the actual values measured on the dataset by Zhu et al, isola et al, harrish et al, and Yoo et al, where α is the specific gravity of the first cycle, β is the specific gravity of the second cycle, zhu et al shows the method of "Unpaired Image-to-Image translationusing cycle-consistent adversarialnetworks", isola et al shows the method of "Image-to-Image translationwith conditional adversarialnetworks", harrish et al shows "Automatic temporally coherentvideo colorization", yoo et al shows the method of "Coloring WithLimitedData: few-Shot ColorizationviaMemory-Augmantednworks". Therefore, the image coloring effect of the invention is obviously improved in the automatic image coloring effect compared with the aspects by using the technical proposal.

FIG. 6 is a block diagram of an image automatic coloring system based on coloring overflow constraint according to the present invention, as shown in FIG. 6, an image automatic coloring system based on coloring overflow constraint, comprising:

The original color image acquisition module 601 is configured to acquire a plurality of original color images.

An original color image conversion module 602, configured to convert each of the original color images into a gray scale image, and convert each of the original color images into a line image by using an edge detection algorithm.

The predicted color image generating module 603 is configured to input the grayscale image and the line image into the dual-channel generator simultaneously, and generate a predicted color image.

An image automatic shading model building module 604 for building an image automatic shading model from the predicted color image and the original color image; the automatic image coloring model is used for automatically coloring any gray level image.

The image automatic coloring model specifically comprises a feature extraction stage, a feature fusion stage, a feature conversion stage, an up-sampling calculation stage, an activation layer and an output layer which are connected in sequence; the feature extraction stage comprises two input channels, wherein one input channel is used for inputting gray images, and the other input channel is used for inputting line images; each input channel comprises three convolution layers with the step length of 2, and the number of the convolution channels is 32, 64 and 128 in sequence; the feature fusion stage is used for fusing the convolved gray image feature map and the convolved line image feature map to generate a feature fused image; the feature conversion stage comprises 4 densely connected DenseNet blocks; the upsampling calculation phase comprises two upsampling layers and a convolution block.

Fig. 7 is a schematic diagram of the whole network structure of the automatic coloring method based on the coloring overflow constraint, as shown in fig. 7, the invention adopts the cyclic generation countermeasure network of the two-channel model to construct an image corresponding to an original color image from a gray image, so that the image is lifelike like to the original color image. The invention combines the ideas of restricting coloring overflow and unidirectional mapping loss to keep the tone consistent by combining the line characteristics, thereby ensuring better coloring effect.

In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.

The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.

Claims

1. An automatic image coloring method based on coloring overflow constraint, comprising:

acquiring a plurality of original color images;

constructing an image automatic coloring model according to the predicted color image and the original color image; the automatic image coloring model is used for automatically coloring any gray level image; the image automatic coloring model specifically comprises a feature extraction stage, a feature fusion stage, a feature conversion stage, an up-sampling calculation stage, an activation layer and an output layer which are connected in sequence; the feature extraction stage comprises two input channels, wherein one input channel is used for inputting gray images, and the other input channel is used for inputting line images; each input channel comprises three convolution layers;

feature fusion stage:

the jump connection of the generator is a concat part, the features are overlapped into a plus part between two branches, the feature fusion stage is composed of two jump connections and feature overlapping, the input of the convolution layers of the two branches is connected with the up-sampling layer through the jump connection, and the line features output by the line branches are overlapped into the features of the first gray image branches to obtain a feature fusion result;

Feature conversion stage:

the feature conversion stage consists of DenseNet, and is represented as that the input of each layer is related to the output of all the previous layers, namely, dense connection; each convolution layer in the DenseNet part is densely connected with all previous convolution layers on a channel; the result after feature fusion is sent to a feature converter of a first generator, and the gray level and line features are converted into color features at corresponding positions;

up-sampling calculation phase:

deconvolution consists of two upsamples and one convolution block; each layer outputs a characteristic graph with the number of channels of the size, the length and the width of 64 x 128, 32 x 256, and 256 x 256 sequentially, characteristic splicing is carried out through concat, and the final layer is convolved with the length and the width of 256 but the number of channels of 3, and 3 x 256 predictive color images y are obtained through a Tanh activation function of a Relu layer ₁ 。

2. The method for automatically coloring an image based on a coloring overflow constraint according to claim 1, wherein said acquiring a plurality of original color images further comprises:

3. The automatic coloring method for an image based on coloring over-color constraint according to claim 1, wherein the inputting the gray image and the line image into a dual-channel generator simultaneously generates a predicted color image, specifically comprising:

the calculation formula of the feature conversion stage is as follows:

f _l ＝H _l ([f ₀ ,f ₁ ,...,f _l-1 ])

wherein H is _l Representing a nonlinear transfer function that is combined from a batch normalization, a Relu activation function, and a 3 x 3 convolution operation; f (f) ₀ ，f ₁ And f _l-1 Features representing the first, second and first-1 convolutional layers, f, respectively, in the feature transformer _l Is the output of the feature conversion stage;

and generating a predicted color image according to the color characteristics.

4. The automatic image coloring method based on coloring overflow constraint according to claim 1, wherein said constructing an automatic image coloring model from said predicted color image and said original color image specifically comprises:

if not, establishing a cyclic loss function, and enabling a first generator to learn from the original color image based on the cyclic loss function to obtain color mapping until the similarity between the color image with the adjusted style and the original color image is larger than a similarity threshold.

5. The method for automatically coloring an image based on a coloring overflow constraint according to claim 4, wherein the one-way mapping loss function is:

L _uml (G)＝E[||y ₁ -y|| ₁ ]the method comprises the steps of carrying out a first treatment on the surface of the Wherein L is _uml For predicting colour image y ₁ And the expected value of the original color image y; y is ₁ To predict a color image; y is the original color image; e is a desired function; i ₁ For predicting colour image y ₁ And the absolute value of the mean value of the original color image y; g is a first generator for generating a predicted color image.

6. The automatic image rendering method based on the coloring over-flow constraint of claim 5, wherein the style loss function is:

7. The method for automatically coloring an image based on a coloring overflow constraint according to claim 6, wherein the cyclic loss function is:

wherein L is _cyc A first generator for generating a predicted color image for the cyclic loss function; f is a second generator for generating a gray level image; />For the first cycle (x, z) →y ₁ →x ₂ Is a function of the desired function of (2); />For the second cycle y → x ₁ →y ₂ Is a function of the desired function of (2); p (P) _data Data distribution; f (G (x, z)) is a first-cycle reconstructed grayscale imagex ₂ The method comprises the steps of carrying out a first treatment on the surface of the G (F (y), z) is the color image y reconstructed in the second cycle ₂ F (y) is the predicted gray image x of the second generator ₁ 。

8. The method for automatically coloring an image based on coloring overflow constraint according to claim 7, wherein the contrast loss function is:

the method comprises the steps of carrying out a first treatment on the surface of the Wherein L is _GAN (G,D _Y X, Y, Z) is an counterloss function; d (D) _Y Judging probability for a discriminator; x is the whole data set of the gray image; y is the whole data set of the line image; z is the whole data set of the original color image; d (D) _Y (G (x, z)) is a determination result of the predicted color image generated for the first time; d (D) _Y And (y) is a determination result of the original color image y.

9. An automatic image rendering system based on a rendering overflow constraint, comprising:

an image automatic coloring model construction module for constructing an image automatic coloring model according to the predicted color image and the original color image; the automatic image coloring model is used for automatically coloring any gray level image; the image automatic coloring model specifically comprises a feature extraction stage, a feature fusion stage, a feature conversion stage, an up-sampling calculation stage, an activation layer and an output layer which are connected in sequence; the feature extraction stage comprises two input channels, wherein one input channel is used for inputting gray images, and the other input channel is used for inputting line images; each input channel comprises three convolution layers;

Feature fusion stage:

feature conversion stage:

up-sampling calculation phase: