
CN118250514A - Outdoor live video restoration method and device - Google Patents

Outdoor live video restoration method and device

Info

Publication number
CN118250514A
CN118250514A
Authority
CN
China
Prior art keywords
video
kernel
processing
convolution
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410345169.4A
Other languages
Chinese (zh)
Inventor
吴佳豪
唐若凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202410345169.4A priority Critical patent/CN118250514A/en
Publication of CN118250514A publication Critical patent/CN118250514A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/77 Retouching; Inpainting; Scratch removal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440218 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440263 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an outdoor live video restoration method and device, relating to the technical fields of artificial intelligence and video restoration. The method comprises the following steps: acquiring video raw data and preprocessing it; performing blur kernel estimation and deblurring on the preprocessed video frames using a lightweight-optimized kernel trajectory network model; performing multi-scale fusion processing and edge-preserving smoothing on the deblurred video frames to improve video quality; and adjusting the video format and size of the video frames subjected to the multi-scale fusion processing and edge-preserving smoothing according to the requirements of the mobile device, and outputting them. By adopting the lightweight-optimized kernel trajectory network model, the invention optimizes model performance and reduces processing time, ensuring the real-time performance of video restoration; in addition, the multi-scale fusion processing and edge-preserving smoothing improve video quality, so the deblurring can effectively improve the sharpness and visual quality of the video.

Description

Outdoor live video restoration method and device
Technical Field
The invention relates to the technical fields of artificial intelligence and video restoration, which can be applied in the financial field, and in particular to an outdoor live video restoration method and device.
Background
In modern society, outdoor live broadcasting has become a popular media form, widely used in fields such as news reporting, sports events, and outdoor adventure; at present, the financial industry has also begun to conduct business through outdoor live broadcasts. However, outdoor live broadcasting often faces many challenges, especially video quality problems, among which video blur is a common and troublesome one. Video blur may be caused by a variety of factors, including shaking of the handheld device, rapid movement of the subject, and shooting in a low-light environment. Video blur not only affects the viewing experience of the audience, but may also mask important details of the live content.
In an outdoor live broadcast scenario, shooting and transmission rely on mobile terminals (e.g., smartphones and tablets), whose computing resources are significantly limited compared with desktops and servers. Although existing video restoration techniques, such as deep learning models, have made significant progress in video deblurring, these models often require high computing power and memory resources and are difficult to run in real time directly on mobile terminals with limited computing power.
In addition, real-time performance is a key requirement for outdoor live broadcasting: the audience expects to see a picture as synchronized as possible with what is actually happening. However, existing video deblurring techniques require a long time to process a single frame, which is unacceptable in live scenes. In particular, when video restoration is performed using deep learning techniques, the forward propagation and post-processing steps of the model can introduce significant delays.
Finally, running a computationally intensive video deblurring algorithm on a mobile terminal also drains the device's battery quickly, which is especially detrimental to long outdoor live broadcasts. Therefore, how to reduce the algorithm's consumption of computing resources and energy while guaranteeing the video deblurring effect is also a problem to be solved.
Disclosure of Invention
In view of the foregoing, the present invention provides an outdoor live video restoration method and apparatus to solve at least one of the above-mentioned problems.
In order to achieve the above purpose, the present invention adopts the following scheme:
According to a first aspect of the present invention, there is provided an outdoor live video restoration method, the method comprising: acquiring video raw data and preprocessing it; performing blur kernel estimation and deblurring on the preprocessed video frames using a lightweight-optimized kernel trajectory network model; performing multi-scale fusion processing and edge-preserving smoothing on the deblurred video frames to improve video quality; and adjusting the video format and size of the video frames subjected to the multi-scale fusion processing and edge-preserving smoothing according to the requirements of the mobile device, and outputting them.
As an embodiment of the present invention, acquiring video raw data and preprocessing it in the above method includes: analyzing the illumination conditions of the video frames in the video raw data, and adjusting the brightness and contrast of low-illumination video frames; and dynamically adjusting the resolution of the video frames based on the processing capability and network load of the current mobile device.
As an embodiment of the present invention, the lightweight optimization of the kernel trajectory network model in the above method includes: replacing the original standard convolution layers in the kernel trajectory network with depthwise separable convolutions, where a depthwise separable convolution comprises a depthwise convolution and a pointwise convolution; the depthwise convolution independently applies one convolution kernel to each input channel to extract spatial features; the pointwise convolution convolves the spatial features output by the depthwise convolution with a 1×1 convolution kernel to achieve inter-channel information fusion.
As an embodiment of the present invention, the lightweight optimization of the kernel trajectory network model in the above method further includes: evaluating the convolution kernels in the kernel trajectory network model whose influence on the deblurring performance is less than a threshold; pruning those convolution kernels and adjusting the network architecture based on the evaluation result; fine-tuning the pruned kernel trajectory network model to restore the model expression capacity affected by the pruning operation; and evaluating and pruning the fine-tuned kernel trajectory network model again, iterating continuously until the pruning target is met.
As an embodiment of the present invention, performing multi-scale fusion processing on the deblurred video frames in the method includes: applying downsampling at different scales to each deblurred frame to generate image versions at multiple scales; extracting features independently from the image version at each scale; combining the features extracted at different scales using a preset fusion strategy; and reconstructing the final image based on the fused features.
As an embodiment of the present invention, performing edge-preserving smoothing on the deblurred video frames in the method includes: selecting a guided filter based on a local linear model for the edge-preserving smoothing; adjusting and determining the window size and regularization parameters of the guided filter according to the video content and the deblurring effect; and processing the deblurred video frames with the guided filter to preserve the dominant edges in the image while smoothing out noise and unimportant details.
As an embodiment of the present invention, the above method further includes performing deep supervision on the lightweight-optimized kernel trajectory network model, which specifically comprises: adding auxiliary output layers at multiple intermediate layers of the kernel trajectory network model; for each auxiliary output layer, calculating the auxiliary loss between its output and the corresponding real sharp image; calculating the main loss between the final deblurred image and the real sharp image; and optimizing the weights of the auxiliary losses by cross-validation to balance the contributions of the main and auxiliary losses to the total loss, the total loss being the weighted sum of the main and auxiliary losses.
According to a second aspect of the present invention, there is provided an outdoor live video restoration device, the device comprising: a preprocessing unit for acquiring video raw data and preprocessing it; a blur processing unit for performing blur kernel estimation and deblurring on the preprocessed video frames using a lightweight-optimized kernel trajectory network model; a post-processing unit for performing multi-scale fusion processing and edge-preserving smoothing on the deblurred video frames to improve video quality; and an output unit for adjusting the video format and size of the video frames subjected to the multi-scale fusion processing and edge-preserving smoothing according to the requirements of the mobile device, and outputting them.
As an embodiment of the present invention, the preprocessing unit includes: a brightness and contrast adjustment module for analyzing the illumination conditions of the video frames in the video raw data and adjusting the brightness and contrast of low-illumination video frames; and a resolution adjustment module for dynamically adjusting the resolution of the video frames according to the processing capability and network load of the current mobile device.
As an embodiment of the present invention, the lightweight optimization of the kernel trajectory network model in the above device includes: replacing the original standard convolution layers in the kernel trajectory network with depthwise separable convolutions, where a depthwise separable convolution comprises a depthwise convolution and a pointwise convolution; the depthwise convolution independently applies one convolution kernel to each input channel to extract spatial features; the pointwise convolution convolves the spatial features output by the depthwise convolution with a 1×1 convolution kernel to achieve inter-channel information fusion.
As an embodiment of the present invention, the lightweight optimization of the kernel trajectory network model in the above device further includes: evaluating the convolution kernels in the kernel trajectory network model whose influence on the deblurring performance is less than a threshold; pruning those convolution kernels and adjusting the network architecture based on the evaluation result; fine-tuning the pruned kernel trajectory network model to restore the model expression capacity affected by the pruning operation; and evaluating and pruning the fine-tuned kernel trajectory network model again, iterating continuously until the pruning target is met.
As an embodiment of the present invention, the post-processing unit includes a multi-scale processing module, which includes: an image generation sub-module for applying downsampling at different scales to each deblurred frame to generate image versions at multiple scales; a feature extraction sub-module for extracting features independently from the image version at each scale; a feature fusion sub-module for combining the features extracted at different scales using a preset fusion strategy; and an image reconstruction sub-module for reconstructing the final image based on the fused features.
As an embodiment of the present invention, the post-processing unit includes an edge-preserving processing module, which includes: a filter selection sub-module for selecting a guided filter based on a local linear model for edge-preserving smoothing; a parameter determination sub-module for adjusting and determining the window size and regularization parameters of the guided filter according to the video content and the deblurring effect; and an edge-preserving sub-module for processing the deblurred video frames with the guided filter to preserve the dominant edges in the image while smoothing out noise and unimportant details.
As an embodiment of the present invention, the above device further includes a deep supervision unit for performing deep supervision on the lightweight-optimized kernel trajectory network model, which specifically includes: an auxiliary layer addition module for adding auxiliary output layers at multiple intermediate layers of the kernel trajectory network model; an auxiliary loss calculation module for calculating the auxiliary loss between the output of each auxiliary output layer and the corresponding real sharp image; a main loss calculation module for calculating the main loss between the final deblurred image and the real sharp image; and an optimization module for optimizing the weights of the auxiliary losses by cross-validation to balance the contributions of the main and auxiliary losses to the total loss, the total loss being the weighted sum of the main and auxiliary losses.
According to a third aspect of the present invention there is provided an electronic device comprising a memory, a processor and a computer program stored on said memory and executable on said processor, the processor implementing the steps of the above method when executing said computer program.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.
According to the above technical solution, the outdoor live video restoration method and device provided by the invention optimize model performance and reduce processing time by adopting a lightweight-optimized kernel trajectory network model, thereby ensuring the real-time performance of video restoration. In addition, video quality is improved by multi-scale fusion processing and edge-preserving smoothing, so that the deblurring can better cope with different scenes and blur types and, particularly in complex and changeable outdoor live environments, effectively improve the sharpness and visual quality of the video. Finally, reducing model complexity and increasing processing speed directly reduce the algorithm's energy consumption; when the algorithm runs on a mobile device, lower energy consumption means longer service time, which is particularly important for long outdoor live broadcasts.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
fig. 1 is a schematic flow chart of an outdoor live video restoration method provided by an embodiment of the application;
fig. 2 is a schematic flow chart of preprocessing video original data according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of lightweight optimization of a kernel trajectory network model provided by an embodiment of the present application;
fig. 4 is a schematic flow chart of a video frame multi-scale fusion process according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of the multi-scale fusion process after adding the time dimension, according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of an edge preserving smoothing process according to an embodiment of the present application;
fig. 7 is a schematic flow chart of another outdoor live video restoration method according to an embodiment of the present application;
FIG. 8 is a schematic flow chart of a deep supervision process for a lightweight-optimized kernel trajectory network model according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an outdoor live video restoration device according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a preprocessing unit according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a multi-scale processing module according to an embodiment of the present application;
FIG. 12 is a schematic view of an edge preserving process module according to an embodiment of the present application;
Fig. 13 is a schematic block diagram of a system configuration of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
The acquisition, storage, use, and processing of data in the technical solution of the application all comply with the relevant provisions of national laws and regulations. The user information in the embodiments of the application is obtained through legal and compliant channels, and its acquisition, storage, use, and processing are carried out with the user's consent.
Fig. 1 is a schematic flow chart of an outdoor live video restoration method according to an embodiment of the present application, where the embodiment describes the present application from a processing unit side of a mobile device, and the method includes the following steps:
Step S101: acquiring video raw data and preprocessing it.
That is, the processing unit acquires real-time video data from the camera of the mobile device and then preprocesses it in real time. Through a series of operations on the raw video data, the preprocessing step creates more favorable conditions for the subsequent video restoration and enhancement, aiming to improve the quality of the final video output and to keep the processing flow efficient and effective.
Preferably, as shown in fig. 2, the present step may include the following sub-steps:
Step S1011: analyzing the illumination conditions of the video frames in the video raw data, and adjusting the brightness and contrast of low-illumination video frames.
Illumination conditions are critical to video deblurring during outdoor live broadcasting. By analyzing the illumination conditions of the video frames in the raw data and adjusting the brightness and contrast of low-illumination frames, video quality can be improved and made more suitable for the subsequent processing flow.
In this embodiment, the illumination condition may be analyzed using the luminance histogram of a video frame, which reflects the luminance distribution of the image; by analyzing the shape and distribution of the histogram, it can be determined whether the video frame was shot under low-illumination, normal-illumination, or overexposed conditions. When there is an obvious brightness difference within the same frame, local brightness detection is performed on the video frame to identify its bright and dark areas.
After the above illumination analysis, this embodiment can adjust the brightness and contrast of low-illumination video frames, for example with an image processing library such as OpenCV: brightness is changed by adjusting the V channel (the brightness channel in the HSV color space) of an image, and contrast and brightness are adjusted together by a linear transformation. Beyond basic linear adjustment, this embodiment may also employ more complex image processing algorithms, such as Retinex-theory models or local contrast enhancement, to adjust brightness and contrast more naturally while reducing the side effect of noise amplification. Through these steps, the quality of video frames shot under low-illumination conditions can be effectively improved, providing better input for subsequent video restoration and thus improving the viewing experience of the whole outdoor live video.
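As an illustration of this sub-step, the sketch below uses OpenCV and NumPy to judge low illumination from the luminance histogram and to apply a linear adjustment on the V channel; the threshold, gain (alpha), and offset (beta) values are illustrative assumptions, not parameters fixed by this application:

```python
import cv2
import numpy as np

def is_low_light(frame_bgr: np.ndarray, mean_threshold: float = 60.0) -> bool:
    """Judge low illumination from the luminance histogram of a BGR frame."""
    v = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)[:, :, 2]
    hist = cv2.calcHist([v], [0], None, [256], [0, 256]).ravel()
    mean_v = float((hist * np.arange(256)).sum() / hist.sum())
    return mean_v < mean_threshold  # histogram mass concentrated at the dark end

def adjust_brightness_contrast(frame_bgr: np.ndarray,
                               alpha: float = 1.3,   # contrast gain (assumed)
                               beta: float = 25.0    # brightness offset (assumed)
                               ) -> np.ndarray:
    """Linear transform v' = alpha * v + beta applied to the HSV V channel."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    v = hsv[:, :, 2].astype(np.float32)
    hsv[:, :, 2] = np.clip(alpha * v + beta, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```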
Step S1012: the resolution of the video frames is dynamically adjusted based on the processing power and network load of the current mobile device.
In this embodiment, metrics such as the CPU and GPU usage, memory occupancy, and currently available network bandwidth of the mobile device may be detected periodically to evaluate the current processing capability and network conditions. For resolution, a series of levels may be preset, e.g., from low to high: 144p, 360p, 480p, 720p, and 1080p. The most suitable resolution level is then selected dynamically according to the real-time evaluation of device processing capability and network load: a lower resolution is chosen when the device's processing capability is low or the network condition is poor, and a higher resolution otherwise. Video frames may also be adjusted in real time during the live broadcast. During resolution switching, measures are taken to keep the transition smooth and avoid an abrupt visual experience for the viewer, such as adjusting the resolution progressively (changing only one level at a time), or preloading lower-resolution video data into the buffer before reducing the resolution and, similarly, loading high-resolution data in advance before increasing it; all of these help avoid buffering or delay during the switch.
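A minimal sketch of this level-selection logic follows; the resolution list matches the levels named above, while the load thresholds and the one-level-per-adjustment rule are illustrative assumptions rather than values prescribed by the application:

```python
RESOLUTION_LEVELS = [144, 360, 480, 720, 1080]  # vertical resolution, low to high

def pick_next_level(cpu_load: float, net_mbps: float, current: int) -> int:
    """Choose the next resolution level from device load and bandwidth,
    moving at most one level per adjustment so switches stay smooth.
    `current` must be one of RESOLUTION_LEVELS."""
    # Illustrative mapping from load/bandwidth to a desired tier.
    if cpu_load > 0.85 or net_mbps < 1.5:
        desired = 360
    elif cpu_load > 0.60 or net_mbps < 4.0:
        desired = 480
    else:
        desired = 720
    cur = RESOLUTION_LEVELS.index(current)
    tgt = RESOLUTION_LEVELS.index(desired)
    step = (tgt > cur) - (tgt < cur)  # -1, 0, or +1: one level at a time
    return RESOLUTION_LEVELS[cur + step]
```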
By implementing these steps, the mobile device can dynamically adjust the resolution of the video frames according to its own processing capability and the current network conditions during an outdoor live broadcast, providing the best possible video quality while guaranteeing real-time performance and optimizing the user experience.
Step S102: performing blur kernel estimation and deblurring on the preprocessed video frames using a lightweight-optimized kernel trajectory network model.
The kernel trajectory network (Kernel Trajectory Network, KTN) is a deep learning model designed specifically for video deblurring. Its core idea is to estimate and exploit the blur kernels (i.e., the blur "trajectories") caused in video by camera motion, object motion, or other factors, in order to recover sharp video frames.
In an outdoor live environment built on mobile equipment, computing resources are at a premium; a lightweight model can significantly reduce the computational burden by reducing the number of model parameters and simplifying the computation, so that the model runs more efficiently on these devices. In addition, the lightweight kernel trajectory network model reduces not only the computation requirements but also the memory footprint, making the model easier to deploy on mobile devices with limited memory, while also lowering power consumption and extending the service time of the device, which is particularly critical for long outdoor live broadcasts. Furthermore, outdoor live broadcasting has extremely high real-time requirements, and thanks to its lower computational demands the lightweight-optimized kernel trajectory network model can respond to and process video data quickly, guaranteeing the continuity and real-time performance of the video stream and improving the audience's viewing experience.
However, the lightweight optimization of the kernel trajectory network model requires careful design so that its deblurring capability is not damaged.
Therefore, preferably, the application carries out the lightweight optimization of the kernel trajectory network model by replacing its original standard convolution layers with depthwise separable convolutions. Specifically, this is achieved by decomposing the standard convolution operation into two smaller operations: a depthwise convolution (Depthwise Convolution) and a pointwise convolution (Pointwise Convolution).
In the depthwise convolution, each input channel independently applies one convolution kernel: if there are M input channels, there are M convolution kernels, each of which convolves only its corresponding input channel, and the output likewise has M channels. This step is mainly responsible for extracting spatial features.
In the pointwise convolution, 1×1 convolution kernels are applied to the output of the depthwise convolution to achieve information fusion across channels; this adds no spatial information and only adjusts the features along the channel dimension.
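A minimal PyTorch sketch of this replacement follows; the framework choice and layer sizes are illustrative assumptions, since the application does not prescribe a specific implementation:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Conv2d(c_in, c_out, k) decomposed into a depthwise step (groups=c_in,
    one kernel per channel) and a pointwise 1x1 step for channel fusion,
    cutting the parameter count from c_in*c_out*k*k to c_in*k*k + c_in*c_out."""
    def __init__(self, c_in: int, c_out: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size,
                                   padding=kernel_size // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```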
When depthwise separable convolution is applied to lighten the kernel trajectory network model, the performance of the resulting model can be tested experimentally, i.e., by observing the deblurring effect and the change in processing speed. The specific parameters of the depthwise separable convolution (such as kernel size, stride, and padding mode) and its positions within the kernel trajectory network model are then adjusted to achieve the best balance of performance and efficiency.
Through the above steps, applying depthwise separable convolution for lightweight optimization significantly reduces the number of parameters and the computational complexity of the model, lowering the resource consumption of training and inference; the reduced complexity directly improves processing speed and meets the requirements of real-time video deblurring. Moreover, although the amount of computation is reduced, careful network design and parameter tuning allow the deblurring performance of the kernel trajectory network model to be maintained with at most a small degradation. Therefore, introducing depthwise separable convolution into the kernel trajectory network model not only preserves the deblurring effect but also improves computational efficiency, making it especially suitable for outdoor live video deblurring scenarios with strict requirements on real-time performance and resource consumption.
Further preferably, to lighten the kernel trajectory network model, a pruning algorithm can be introduced in the training stage of the model; thus, as shown in FIG. 3, the lightweight optimization of the kernel trajectory network model may further comprise the following steps:
Step S301: evaluating the convolution kernels in the kernel trajectory network model whose influence on the deblurring performance is less than a threshold.
The goal of this step is to identify the convolution kernels that contribute little to the final deblurring performance, which typically involves quantifying the importance or impact of each kernel. Specific evaluation methods may include weight analysis, activation analysis, and sensitivity analysis: weight analysis directly examines the weights of a convolution kernel, a kernel with smaller weights having less influence on the output; activation analysis observes how strongly the feature maps produced by each convolution kernel are activated, kernels whose feature maps are rarely activated being likely less important; sensitivity analysis experimentally removes some convolution kernels and evaluates the resulting performance change to determine their importance.
Step S302: pruning the convolution kernels and adjusting the network architecture based on the evaluation result.
Based on the evaluation result of step S301, a pruning operation removes the convolution kernels deemed unimportant. This step also requires adapting the network architecture accordingly to ensure the consistency and integrity of the network. After pruning, the number of parameters and the computational complexity of the network are reduced.
Step S303: fine-tuning the pruned kernel trajectory network model to restore the model expression capacity affected by the pruning operation.
Although pruning reduces the complexity of the model, it may also degrade its performance. The pruned model therefore needs fine-tuning to restore, or even enhance, its deblurring performance; fine-tuning is typically performed on the original dataset with a small learning rate to adjust the remaining parameters.
Step S304: evaluating and pruning the fine-tuned kernel trajectory network model again, iterating continuously until the pruning target is met.
Pruning and fine-tuning form an iterative process: the fine-tuned model is evaluated again to determine whether further pruning is required, and this is repeated until the model meets the given performance and complexity requirements.
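One common way to realize steps S301 to S304 is magnitude-based filter pruning. The PyTorch sketch below scores each convolution kernel by the L1 norm of its weights (one form of the weight analysis named in step S301) and masks the low-scoring filters; the keep ratio is an illustrative assumption, and the actual shrinking of the architecture is left out:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.7) -> torch.Tensor:
    """Score each output filter by the L1 norm of its weights and zero out
    the lowest-scoring ones; returns the boolean mask of kept filters so
    the architecture can be shrunk accordingly afterwards."""
    scores = conv.weight.abs().sum(dim=(1, 2, 3))   # one score per filter
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = scores.topk(k).values.min()         # k-th largest score
    mask = scores >= threshold
    conv.weight[~mask] = 0.0                        # prune filter weights
    if conv.bias is not None:
        conv.bias[~mask] = 0.0
    return mask
```

After such a round, fine-tuning with a small learning rate (step S303) would restore the lost accuracy before the next evaluation round.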
Thus, by introducing a pruning algorithm and optimizing iteratively, the embodiment of the application can effectively balance model performance against the use of computing resources, which is particularly suitable for outdoor live scenarios that require real-time video processing in resource-limited environments.
Step S103: performing multi-scale fusion processing and edge-preserving smoothing on the deblurred video frames to improve video quality.
The purpose of this step is to further improve the quality of the deblurred video: the multi-scale fusion processing aims to restore video detail, while the edge-preserving smoothing aims to preserve the edge sharpness of the video.
Preferably, FIG. 4 is a schematic flow chart of the multi-scale fusion processing of video frames according to an embodiment of the present application; the steps of the multi-scale fusion processing in this step include:
Step S401: applying downsampling at different scales to each deblurred frame to generate image versions at multiple scales.
By downsampling the deblurred video frame to different degrees, this step generates a series of image versions at different scales, representing visual information at levels from coarse to fine.
Step S402: extracting features independently from the image version at each scale.
Feature extraction is performed independently on the image version generated at each scale; the features may include edge, texture, and color information, reflecting the important visual characteristics of the image at that scale.
Step S403: combining the features extracted at different scales using a preset fusion strategy.
The preset fusion strategy in this step may be weighted averaging, maximum fusion, etc.; it combines the features extracted at different scales, and different weights may be assigned to the features of different scales during fusion to optimize the result.
Step S404: reconstructing the final image based on the fused features.
Multi-scale fusion processing can effectively improve the quality of the deblurred video and is especially effective for detail recovery and visual improvement.
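A compact OpenCV sketch of steps S401 to S404 follows, assuming a two-scale pyramid and a weighted-average fusion of high-frequency detail as illustrative stand-ins for the preset fusion strategy:

```python
import cv2
import numpy as np

def multiscale_fuse(frame: np.ndarray, weights=(0.6, 0.4)) -> np.ndarray:
    """Downsample (S401), extract per-scale high-frequency detail (S402),
    fuse by weighted average (S403), and reconstruct the image (S404)."""
    h, w = frame.shape[:2]
    scales = [frame, cv2.pyrDown(frame)]            # S401: two scale versions
    feats = []
    for img in scales:                              # S402: per-scale features
        blur = cv2.GaussianBlur(img, (5, 5), 0)
        detail = img.astype(np.float32) - blur.astype(np.float32)
        feats.append(cv2.resize(detail, (w, h)))
    fused = np.zeros_like(frame, dtype=np.float32)  # S403: weighted fusion
    for wgt, f in zip(weights, feats):
        fused += wgt * f
    base = cv2.GaussianBlur(frame, (5, 5), 0).astype(np.float32)
    return np.clip(base + fused, 0, 255).astype(np.uint8)  # S404: reconstruction
```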
Considering the real-time performance and continuity of outdoor live broadcasting, this embodiment may further take the time dimension into account in the multi-scale fusion process and employ spatio-temporal features to ensure smooth transitions and consistency between video frames; this may specifically include the steps shown in FIG. 5:
Step S501: employing a convolutional network with spatio-temporal perception to extract spatio-temporal features from the video frames. Such a network may be, for example, a 3D convolutional neural network (3D CNN) or a temporal convolutional network (TCN). Through this step, both spatial visual information and temporal dynamics can be captured, which helps to understand the continuity of the video content.
Step S502: estimating the motion between successive frames using an optical flow algorithm, and analyzing the motion trend and rate of change of the video frames. Based on the optical flow results, the weights between frames can be adjusted in the multi-scale fusion process to ensure the consistency and smoothness of motion.
Step S503: dynamically adjusting the fusion weights according to the content changes and motion information of successive video frames. In particular, in areas with less motion, spatial detail may be preferentially preserved, while in areas with greater movement or change the emphasis is on ensuring smooth transitions over time.
Step S504: introducing a consistency loss function during model training to penalize poor inter-frame consistency. Such a loss function may be defined based on a number of factors, such as visual differences between frames or the continuity of the optical flow field; two examples follow:
(1) Consistency loss function based on optical flow
Assume two consecutive frames $I_t$ and $I_{t+1}$, where $t$ denotes the time step. First, the optical flow $F_{t\to t+1}$ from $I_t$ to $I_{t+1}$ is computed with an optical flow estimation algorithm (such as the Farneback algorithm, DeepFlow, or PWC-Net). Then $I_t$ is remapped according to the flow to obtain $\hat{I}_{t+1}$, which should in theory be close to the true $I_{t+1}$. On this basis, a consistency loss function can be defined as

$$L = \lVert \hat{I}_{t+1} - I_{t+1} \rVert_1$$

where $L$ is the consistency loss and $\lVert\cdot\rVert_1$ denotes the L1 norm, i.e., the sum of the absolute pixel differences between the two frames. This loss encourages the model to reduce the difference between the flow-remapped frame and the real frame, thereby preserving visual consistency between frames.
(2) Structural similarity loss
The structural similarity (SSIM) index measures the visual similarity of two images, with emphasis on their structural information. The structural similarity loss $L_1$ may be defined as

$$L_1 = 1 - \mathrm{SSIM}(\hat{I}_{t+1},\, I_{t+1})$$

In practical applications, these two losses can also be combined into a comprehensive consistency loss:

$$L_{total} = \lambda_1 L + \lambda_2 L_1$$

where $\lambda_1$ and $\lambda_2$ are weight parameters that balance the two loss terms. In this way, the kernel trajectory network model is encouraged to maintain not only pixel-level consistency between frames but also the structural consistency of the images, which helps improve the spatio-temporal consistency and visual quality of the restored video.
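The two loss terms can be illustrated with OpenCV's Farneback flow and scikit-image's SSIM. This NumPy sketch computes the terms as metrics on grayscale frames (a training implementation would use a differentiable equivalent), and the λ values are illustrative assumptions:

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

def consistency_loss(prev_gray: np.ndarray, next_gray: np.ndarray,
                     lam1: float = 1.0, lam2: float = 0.1) -> float:
    """L_total = lam1 * L (flow-warp L1, averaged per pixel) + lam2 * L_1."""
    # Flow from frame t+1 back to frame t, so sampling I_t at the flowed
    # coordinates yields an estimate of I_{t+1} (backward warping).
    flow = cv2.calcOpticalFlowFarneback(next_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(prev_gray, map_x, map_y, cv2.INTER_LINEAR)  # ~ I_{t+1}
    l_flow = float(np.abs(warped.astype(np.float32)
                          - next_gray.astype(np.float32)).mean())  # L
    l_ssim = 1.0 - ssim(warped, next_gray, data_range=255)         # L_1
    return lam1 * l_flow + lam2 * l_ssim
```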
The edge-preserving smoothing in step S103 is further described below. FIG. 6 is a schematic flow chart of the edge-preserving smoothing process according to an embodiment of the present application, which includes:
Step S601: selecting a guided filter based on a local linear model for the edge-preserving smoothing.
The guided filter is a very effective edge-preserving smoothing technique that can remove noise while preserving the sharpness of image edges. Based on a local linear model, it smooths the image within a window around each pixel.
Step S602: adjusting and determining the window size and regularization parameter of the guided filter according to the video content and the deblurring effect.
The window size controls the extent of the local area considered by the filter: a larger window removes more noise but may also blur edges, while a smaller window better preserves detail and edges but denoises more weakly. The regularization parameter adjusts the degree of smoothing: a larger value makes the filter emphasize the smoothing effect, while a smaller value pays more attention to the structure of the original image.
Step S603: processing the deblurred video frames with the guided filter to preserve the dominant edges in the image while smoothing out noise and unimportant details.
Applying the parameter-tuned guided filter to each deblurred video frame not only smooths the noise in the image but also preserves its major edges and important details, such as textures and contours.
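As a concrete illustration, OpenCV's contrib module ships a guided-filter implementation (cv2.ximgproc, from the opencv-contrib-python package); the radius and regularization values below are illustrative starting points rather than parameters fixed by the application:

```python
import cv2  # requires opencv-contrib-python for cv2.ximgproc

def edge_preserving_smooth(frame_bgr, radius: int = 8, eps_norm: float = 0.02):
    """Self-guided filtering: the frame is its own guide, so its dominant
    edges steer the smoothing. radius sets the window size; eps_norm is the
    regularization parameter for intensities normalized to [0, 1] and is
    rescaled to 8-bit units (variance scales with 255**2)."""
    return cv2.ximgproc.guidedFilter(guide=frame_bgr, src=frame_bgr,
                                     radius=radius, eps=eps_norm * 255 ** 2)
```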
Through edge-preserving smoothing, the key visual features of the image, particularly its edges and details, can be preserved or even enhanced while noise is greatly reduced, markedly improving the visual quality and viewing experience of the video. This processing is particularly suitable for outdoor live video restoration and can effectively counter the blur caused by illumination changes and motion in outdoor environments.
Step S104: adjusting the video format and size of the video frames subjected to the multi-scale fusion processing and edge-preserving smoothing according to the requirements of the mobile device, and outputting them.
This step ensures that the processed video can be played on the target device with the best quality while taking storage and transmission efficiency into account. In practice, the video can be converted into a format widely supported by mobile devices, such as MP4 (with H.264 or H.265 encoding), which ensures compatibility on most mobile devices. Encoding parameters, such as the bit rate and key-frame interval, are adjusted to balance video quality against file size. The video size is adjusted to the screen resolution of the target mobile device, which reduces the file size and avoids extra scaling during playback, thereby improving playback efficiency and quality. To fit screens of different aspect ratios, the video may need to be cropped or padded; it is important to ensure that this step does not lose critical visual content.
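In practice this packaging step is often delegated to FFmpeg; the call below is a hedged sketch whose bitrate, key-frame interval (GOP), and 720p target are illustrative choices, not values fixed by the application:

```python
import subprocess

def export_for_mobile(src: str, dst: str, height: int = 720,
                      bitrate: str = "2500k", gop: int = 60) -> None:
    """Re-encode to an H.264 MP4, scaling to the target height while keeping
    the aspect ratio (scale=-2 derives an even width, as H.264 requires)."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264", "-b:v", bitrate,
        "-g", str(gop),                 # key-frame interval in frames
        "-movflags", "+faststart",      # move metadata up front for streaming
        dst,
    ], check=True)
```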
With the outdoor live video restoration method provided by the invention, adopting the lightweight-optimized kernel trajectory network model optimizes model performance and reduces processing time, ensuring the real-time performance of video restoration. In addition, the multi-scale fusion processing and edge-preserving smoothing improve video quality, so that the deblurring can better cope with different scenes and blur types and, particularly in complex and changeable outdoor live environments, effectively improve the sharpness and visual quality of the video. Finally, reducing model complexity and increasing processing speed directly reduce the algorithm's energy consumption; on a mobile device, lower energy consumption means longer service time, which is particularly important for long outdoor live broadcasts.
Fig. 7 is a schematic flow chart of another outdoor live video restoration method according to an embodiment of the present application, which includes the following steps:
Step S701: acquiring video raw data and preprocessing it.
Step S702: performing blur kernel estimation and deblurring on the preprocessed video frames using a kernel trajectory network model that has undergone lightweight optimization and deep supervision.
Step S703: performing multi-scale fusion processing and edge-preserving smoothing on the deblurred video frames to improve video quality.
Step S704: adjusting the video format and size of the video frames subjected to the multi-scale fusion processing and edge-preserving smoothing according to the requirements of the mobile device, and outputting them.
As can be seen from the above, the outdoor live video restoration method of this embodiment differs from the method of the embodiment corresponding to FIG. 1 in that the lightweight-optimized kernel trajectory network model is additionally subjected to deep supervision, which further improves the learning efficiency and deblurring performance of the kernel trajectory network model during its training.
Fig. 8 is a schematic flow chart of the deep supervision processing performed on the lightweight-optimized kernel trajectory network model according to an embodiment of the present application; the process includes:
Step S801: adding auxiliary output layers at multiple intermediate layers of the kernel trajectory network model.
The main purpose of adding auxiliary output layers at multiple intermediate layers of the kernel trajectory network model is to let the model receive supervision signals at multiple stages of training, promoting feature learning and model optimization. These auxiliary output layers are typically designed as simplified versions of the final output layer that can directly predict a deblurred image, or related attributes, from the intermediate-layer features.
Step S802: for each auxiliary output layer, calculating the auxiliary loss between its output and the corresponding real sharp image.
For each auxiliary output layer, the loss between its output and the real sharp image is calculated; this loss may be the mean squared error (MSE), the structural similarity (SSIM) index, a perceptual loss, etc., selected according to the specific task and desired effect. The auxiliary losses help guide the intermediate layers of the network to learn features useful for the final deblurring task, preventing the feature dissipation that occurs in deep networks.
Step S803: calculating the main loss between the final deblurred image and the real sharp image.
The loss between the final deblurred image and the real sharp image is the main loss, which is the primary objective of model optimization. Likewise, this loss may be computed with MSE, SSIM, or another suitable loss function.
Step S804: optimizing the weights of the auxiliary losses by cross-validation to balance the contributions of the main and auxiliary losses to the total loss, the total loss being the weighted sum of the main and auxiliary losses.
In this step the weights of the auxiliary losses, and the balance between the auxiliary and main losses, are optimized by methods such as cross-validation; weight optimization helps find the loss combination that most effectively promotes the learning and performance of the network. Ultimately, the total loss of the model is the weighted sum of the main loss and all auxiliary losses, and it is used for the network's backpropagation and parameter updates.
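In training code, the weighted total loss of steps S802 to S804 can be sketched as follows in PyTorch, assuming the model returns its final output together with a list of auxiliary outputs; MSE is used here as one of the loss choices the text names, and the weights are supplied externally (e.g., chosen by cross-validation):

```python
import torch
import torch.nn.functional as F

def deep_supervision_loss(final_out: torch.Tensor,
                          aux_outs: list,
                          sharp: torch.Tensor,
                          aux_weights: list) -> torch.Tensor:
    """Total loss = main loss + sum of weighted auxiliary losses; the
    aux_weights are tuned externally, e.g. by cross-validation."""
    total = F.mse_loss(final_out, sharp)                # main loss
    for w, out in zip(aux_weights, aux_outs):
        # Auxiliary heads may predict at lower resolution; match the target.
        target = F.interpolate(sharp, size=out.shape[-2:],
                               mode="bilinear", align_corners=False)
        total = total + w * F.mse_loss(out, target)     # weighted auxiliary loss
    return total
```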
Through deep supervision, the kernel trajectory network model can use the data more fully during training, improving training efficiency and the deblurring effect. Introducing auxiliary output layers and auxiliary losses lets the model obtain effective gradient signals in its deep layers and avoids the gradient vanishing problem common in deep networks, which helps the model learn more complex and fine deblurring mappings. Ultimately, when processing outdoor live video rich in dynamic scenes and details, the quality of the deblurring improves significantly.
Fig. 9 is a schematic structural diagram of an outdoor live video restoration device according to an embodiment of the present application; the device comprises a preprocessing unit 910, a blur processing unit 920, a post-processing unit 930, and an output unit 940, connected in sequence, wherein:
the preprocessing unit 910 is configured to acquire and preprocess video raw data.
The blur processing unit 920 is configured to perform blur kernel estimation and deblurring on the preprocessed video frames using the lightweight-optimized kernel trajectory network model.
The post-processing unit 930 is configured to perform multi-scale fusion processing and edge-preserving smoothing processing on the deblurred video frame to improve video quality.
The output unit 940 is configured to adjust the video format and size of the video frames subjected to the multi-scale fusion processing and edge-preserving smoothing according to the requirements of the mobile device, and to output them.
Preferably, as shown in FIG. 10, the preprocessing unit 910 includes: a brightness and contrast adjustment module 911 for analyzing the illumination conditions of the video frames in the video raw data and adjusting the brightness and contrast of low-illumination video frames; and a resolution adjustment module 912 for dynamically adjusting the resolution of the video frames based on the processing capability and network load of the current mobile device.
Preferably, the lightweight optimization of the kernel trajectory network model in the device includes: replacing the original standard convolution layers in the kernel trajectory network with depthwise separable convolutions, where a depthwise separable convolution comprises a depthwise convolution and a pointwise convolution; the depthwise convolution independently applies one convolution kernel to each input channel to extract spatial features; the pointwise convolution convolves the spatial features output by the depthwise convolution with a 1×1 convolution kernel to achieve inter-channel information fusion.
Preferably, the lightweight optimization of the kernel trajectory network model in the device further includes: evaluating the convolution kernels in the kernel trajectory network model whose influence on the deblurring performance is less than a threshold; pruning those convolution kernels and adjusting the network architecture based on the evaluation result; fine-tuning the pruned kernel trajectory network model to restore the model expression capacity affected by the pruning operation; and evaluating and pruning the fine-tuned kernel trajectory network model again, iterating continuously until the pruning target is met.
Preferably, the post-processing unit 930 includes a multi-scale processing module 931, as shown in Fig. 11, which includes: an image generation sub-module 9311 for applying downsampling at different scales to each deblurred frame to generate multiple scaled versions of the image; a feature extraction sub-module 9312 for extracting features independently from the image version at each scale; a feature fusion sub-module 9313 for combining the features extracted at different scales using a preset fusion strategy; and an image reconstruction sub-module 9314 for reconstructing the final image based on the fused features.
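The four sub-modules could be wired together roughly as follows; `extract_features` and `reconstruct` are placeholders for the per-scale feature extractor and the reconstruction network, and simple averaging stands in for the preset fusion strategy.

```python
import torch
import torch.nn.functional as F

def multiscale_fuse(frame, extract_features, reconstruct,
                    scales=(1.0, 0.5, 0.25)):
    """frame: deblurred frame as an NCHW tensor."""
    feats = []
    for s in scales:
        # Image generation: downsample the deblurred frame to each scale.
        img = frame if s == 1.0 else F.interpolate(
            frame, scale_factor=s, mode="bilinear", align_corners=False)
        # Feature extraction: run the extractor independently per scale.
        f = extract_features(img)
        # Bring every feature map back to full resolution before fusing.
        feats.append(F.interpolate(f, size=frame.shape[-2:],
                                   mode="bilinear", align_corners=False))
    # Feature fusion: plain averaging as a stand-in for the preset strategy.
    fused = torch.stack(feats).mean(dim=0)
    # Image reconstruction from the fused features.
    return reconstruct(fused)
```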
Preferably, the post-processing unit 930 further includes an edge-preserving processing module 932, as shown in Fig. 12, which includes: a filter selection sub-module 9321 for selecting a guided filter based on a local linear model for edge-preserving smoothing; a parameter determination sub-module 9322 for adjusting and determining the window size and regularization parameter of the guided filter according to the video content and deblurring effect; and an edge-preserving sub-module 9323 for processing the deblurred video frames with the guided filter to preserve the dominant edges in the image while smoothing out noise and unimportant details.
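For reference, the local-linear-model guided filter is compact enough to sketch directly; `radius` corresponds to the window size and `eps` to the regularization parameter mentioned above. The sketch assumes single-channel float images, with color frames processed per channel.

```python
import cv2
import numpy as np

def guided_filter(guide, src, radius=8, eps=1e-2):
    """Edge-preserving smoothing: fit p ~ a*I + b within each local window."""
    guide = guide.astype(np.float32)
    src = src.astype(np.float32)
    ksize = (2 * radius + 1, 2 * radius + 1)
    mean_i = cv2.blur(guide, ksize)
    mean_p = cv2.blur(src, ksize)
    corr_ip = cv2.blur(guide * src, ksize)
    corr_ii = cv2.blur(guide * guide, ksize)
    var_i = corr_ii - mean_i * mean_i
    cov_ip = corr_ip - mean_i * mean_p
    a = cov_ip / (var_i + eps)   # local linear coefficients; eps regularizes
    b = mean_p - a * mean_i      # flat regions (small var_i) toward smoothing
    mean_a = cv2.blur(a, ksize)
    mean_b = cv2.blur(b, ksize)
    return mean_a * guide + mean_b
```

Using the deblurred frame as its own guide (`guide = src`) smooths noise while keeping strong edges, since high-variance windows yield `a` close to 1 and pass edges through.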
Preferably, the apparatus further comprises a deep supervision unit for performing deep supervision on the lightweight-optimized kernel trajectory network model, specifically comprising: an auxiliary layer addition module for adding auxiliary output layers at a plurality of intermediate layers of the kernel trajectory network model; an auxiliary loss calculation module for calculating, for each auxiliary output layer, the auxiliary loss between its output and the corresponding real sharp image; a main loss calculation module for calculating the main loss between the final deblurred image and the real sharp image; and an optimization module for optimizing the weights of the auxiliary losses by cross validation to balance the contributions of the main and auxiliary losses to the total loss, the total loss being a weighted sum of the main loss and the auxiliary losses.
As can be seen from the above technical solution, the outdoor live video restoration device provided by the invention optimizes model performance and reduces processing time by adopting a lightweight-optimized kernel trajectory network model, thereby ensuring the real-time performance of video restoration. In addition, multi-scale fusion processing and edge-preserving smoothing processing improve video quality, so that the deblurring can better cope with different scenes and blur types; in the complex and changeable environment of outdoor live broadcasting in particular, the clarity and visual quality of the video are effectively improved. Finally, reducing model complexity and increasing processing speed directly lowers the algorithm's energy consumption; when the algorithm runs on a mobile device, lower energy consumption means longer operating time, which is especially important for long outdoor live broadcasts.
An embodiment of the invention also provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the above method when executing the program.
An embodiment of the invention also provides a computer-readable storage medium storing a computer program for executing the above method.
As shown in Fig. 13, the electronic device 600 may further include: a communication module 110, an input unit 120, an audio processor 130, a display 160, and a power supply 170. Note that the electronic device 600 need not include all of the components shown in Fig. 13; it may also include components not shown in Fig. 13, for which reference may be made to the related art.
As shown in Fig. 13, the central processor 100, sometimes also referred to as a controller or operation control, may include a microprocessor or other processor device and/or logic device; the central processor 100 receives inputs and controls the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, flash memory, a hard drive, removable media, volatile memory, non-volatile memory, or another suitable device. It may store the information described above as well as the programs that process it, and the central processor 100 may execute the programs stored in the memory 140 to realize information storage, processing, and the like.
The input unit 120 provides input to the central processor 100; it is, for example, a key or a touch input device. The power supply 170 supplies power to the electronic device 600. The display 160 displays objects such as images and text; it may be, for example, but is not limited to, an LCD display.
The memory 140 may be solid-state memory such as read-only memory (ROM), random-access memory (RAM), a SIM card, or the like; it may also be a memory that retains information even when powered down, can be selectively erased, and can be supplied with further data, an example of which is sometimes referred to as EPROM. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142 for storing application programs and function programs, or procedures by which the central processor 100 performs operations of the electronic device 600.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).
The communication module 110 is a transmitter/receiver that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and to receive audio input from the microphone 132 to implement usual telecommunication functions. The audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 100 so that sound can be recorded locally through the microphone 132 and so that sound stored locally can be played through the speaker 131.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to facilitate understanding of the method and its core ideas. Meanwhile, those skilled in the art may make variations to the specific embodiments and application scope in accordance with the ideas of the present invention; in view of the above, the contents of this description should not be construed as limiting the present invention.

Claims (10)

1. An outdoor live video restoration method, comprising:
acquiring video raw data and preprocessing the video raw data;
performing blur kernel estimation and deblurring on the preprocessed video frames using a lightweight-optimized kernel trajectory network model;
performing multi-scale fusion processing and edge-preserving smoothing processing on the deblurred video frames to improve video quality; and
adjusting the video format and size of the video frames subjected to the multi-scale fusion processing and the edge-preserving smoothing processing according to the requirements of the mobile device, and outputting the video frames.
2. The outdoor live video restoration method as recited in claim 1, wherein the acquiring video raw data and preprocessing the video raw data comprises:
analyzing the illumination conditions of video frames in the video raw data, and adjusting the brightness and contrast of low-illumination video frames; and
dynamically adjusting the resolution of the video frames based on the processing power and network load of the current mobile device.
3. The outdoor live video restoration method as recited in claim 1, wherein the lightweight optimization of the kernel trajectory network model comprises:
replacing the original standard convolution layers in the kernel trajectory network with depthwise separable convolutions, wherein a depthwise separable convolution comprises a depthwise convolution and a pointwise convolution;
the depthwise convolution comprises applying a convolution kernel independently to each input channel to extract spatial features; and
the pointwise convolution comprises convolving the spatial features output by the depthwise convolution with a 1 × 1 convolution kernel to achieve inter-channel information fusion.
4. The outdoor live video restoration method as recited in claim 3, wherein the lightweight optimization of the kernel trajectory network model further comprises:
identifying convolution kernels in the kernel trajectory network model whose influence on deblurring performance is below a threshold;
pruning those convolution kernels and adjusting the network architecture based on the evaluation result;
fine-tuning the pruned kernel trajectory network model to restore the expressive capacity affected by the pruning operation; and
evaluating and pruning the fine-tuned kernel trajectory network model again, iterating until the pruning target is met.
5. The outdoor live video restoration method as recited in claim 1, wherein said performing multi-scale fusion processing on the deblurred video frames comprises:
applying downsampling at different scales to each deblurred frame to generate multiple scaled image versions;
extracting features independently from the image version at each scale;
combining the features extracted at different scales using a preset fusion strategy; and
reconstructing a final image based on the fused features.
6. The outdoor live video restoration method as recited in claim 1, wherein said edge-preserving smoothing of the deblurred video frames comprises:
selecting a guided filter based on a local linear model for the edge-preserving smoothing;
adjusting and determining the window size and regularization parameter of the guided filter according to the video content and deblurring effect; and
processing the deblurred video frames with the guided filter to preserve the dominant edges in the image while smoothing out noise and unimportant details.
7. The outdoor live video restoration method as recited in claim 1, further comprising performing deep supervision on the lightweight-optimized kernel trajectory network model, specifically comprising:
adding auxiliary output layers at a plurality of intermediate layers of the kernel trajectory network model;
for each auxiliary output layer, calculating the auxiliary loss between its output and the corresponding real sharp image;
calculating the main loss between the final deblurred image and the real sharp image; and
optimizing the weights of the auxiliary losses by cross validation to balance the contributions of the main and auxiliary losses to the total loss, the total loss being a weighted sum of the main loss and the auxiliary losses.
8. An outdoor live video restoration device, the device comprising:
a preprocessing unit for acquiring video raw data and preprocessing the video raw data;
a blur processing unit for performing blur kernel estimation and deblurring on the preprocessed video frames using a lightweight-optimized kernel trajectory network model;
a post-processing unit for performing multi-scale fusion processing and edge-preserving smoothing processing on the deblurred video frames to improve video quality; and
an output unit for adjusting the video format and size of the video frames subjected to the multi-scale fusion processing and the edge-preserving smoothing processing according to the requirements of the mobile device, and outputting them.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202410345169.4A 2024-03-25 2024-03-25 Outdoor live video restoration method and device Pending CN118250514A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410345169.4A CN118250514A (en) 2024-03-25 2024-03-25 Outdoor live video restoration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410345169.4A CN118250514A (en) 2024-03-25 2024-03-25 Outdoor live video restoration method and device

Publications (1)

Publication Number Publication Date
CN118250514A true CN118250514A (en) 2024-06-25

Family

ID=91552354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410345169.4A Pending CN118250514A (en) 2024-03-25 2024-03-25 Outdoor live video restoration method and device

Country Status (1)

Country Link
CN (1) CN118250514A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118509646A (en) * 2024-07-16 2024-08-16 北京宏远智控技术有限公司 Video optimization method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination