Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present application, rather than all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort fall within the scope of the present application.
It should be noted that the terms "comprising" and "having", and any variations thereof, in the embodiments of the present application and the accompanying drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those listed steps or elements, but may include other steps or elements not listed or inherent to such a process, method, system, article, or apparatus.
It will be understood that the terms "first", "second", and the like as used herein may describe various elements, but these elements are not limited by these terms; the terms are only used to distinguish one element from another. For example, first pose data may be referred to as second pose data, and similarly, second pose data may be referred to as first pose data, without departing from the scope of the application. Both the first pose data and the second pose data are pose data, but they are not the same pose data. In addition, the term "plurality" as used in the embodiments of the present application means two or more.
Image anti-shake techniques mainly comprise optical anti-shake and digital anti-shake. Optical anti-shake compensates for instrument shake occurring during image capture by means of optical components (e.g., a movable lens assembly). Digital anti-shake is realized in software by analyzing the image acquired by the image sensor and performing shake compensation on the image. Common digital anti-shake methods include image feature matching, motion estimation from the acquired images, and motion compensation of the images according to the motion estimation results. However, existing digital anti-shake methods still suffer from limited application scenes, complex calculation processes, and poor anti-shake effects, so image anti-shake technology still needs to be improved.
Embodiments of the present application provide an image anti-shake method, an image anti-shake apparatus, an electronic device, and a computer-readable storage medium. Target pose data with shake eliminated, corresponding to the image to be processed at the current moment, can be accurately obtained through a pose processing model, and shake compensation is then performed on the image captured by the image acquisition device to obtain a shake-free target image. The blurring caused by shake can thereby be effectively eliminated, achieving an image anti-shake effect and improving the visual quality of the image.
An embodiment of the present application provides an electronic device, which may include, but is not limited to, a mobile phone, a smart wearable device, a tablet computer, a PC (Personal Computer), a vehicle-mounted terminal, a digital camera, and the like; the embodiments of the present application are not limited in this regard. The electronic device includes image processing circuitry, which may be implemented using hardware and/or software components and may include various processing units defining an ISP (Image Signal Processing) pipeline. FIG. 1 is a block diagram of an image processing circuit in one embodiment. For ease of illustration, FIG. 1 illustrates only the aspects of the image processing techniques associated with embodiments of the present application.
As shown in FIG. 1, the image processing circuit includes an ISP processor 140 and control logic 150. Image data captured by the imaging device 110 is first processed by the ISP processor 140, which analyzes the image data to capture image statistics that may be used to determine one or more control parameters of the imaging device 110. The imaging device 110 may include one or more lenses 112 and an image sensor 114. The image sensor 114 may include a color filter array (e.g., a Bayer filter); the image sensor 114 may acquire the light intensity and wavelength information captured by each imaging pixel and provide a set of raw image data that can be processed by the ISP processor 140. The pose sensor 120 (e.g., a tri-axis gyroscope, Hall sensor, accelerometer, etc.) may provide acquired image processing parameters (e.g., anti-shake parameters) to the ISP processor 140 based on the pose sensor 120 interface type. The pose sensor 120 interface may employ an SMIA (Standard Mobile Imaging Architecture) interface, another serial or parallel camera interface, or a combination of the above.
It should be noted that, although only one imaging device 110 is shown in FIG. 1, an embodiment of the present application may include at least two imaging devices 110, where each imaging device 110 may correspond to one image sensor 114, or a plurality of imaging devices 110 may correspond to one image sensor 114; this is not limited herein. Each imaging device 110 may operate as described above.
In addition, the image sensor 114 may also send raw image data to the pose sensor 120; the pose sensor 120 may provide the raw image data to the ISP processor 140 based on the pose sensor 120 interface type, or may store the raw image data in the image memory 130.
The ISP processor 140 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 140 may perform one or more image processing operations on the raw image data and collect statistical information about the image data. The image processing operations may be performed with the same or different bit-depth precision.
The ISP processor 140 may also receive image data from the image memory 130. For example, the pose sensor 120 interface sends raw image data to the image memory 130, and the raw image data in the image memory 130 is then provided to the ISP processor 140 for processing. The image memory 130 may be part of a memory device, a storage device, or a separate dedicated memory within the electronic device, and may include DMA (Direct Memory Access) features.
Upon receiving raw image data from the image sensor 114 interface, the pose sensor 120 interface, or the image memory 130, the ISP processor 140 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to the image memory 130 for additional processing before being displayed. The ISP processor 140 receives the processed data from the image memory 130 and performs image data processing in the raw domain and in the RGB and YCbCr color spaces. The image data processed by the ISP processor 140 may be output to a display 160 for viewing by a user and/or for further processing by a graphics engine or GPU (Graphics Processing Unit). In addition, the output of the ISP processor 140 may also be sent to the image memory 130, and the display 160 may read image data from the image memory 130. In one embodiment, the image memory 130 may be configured to implement one or more frame buffers.
The statistics determined by the ISP processor 140 may be sent to the control logic 150. For example, the statistics may include image sensor 114 statistics such as the vibration frequency of the gyroscope, auto-exposure, auto-white-balance, auto-focus, flicker detection, black level compensation, lens 112 shading correction, and the like. The control logic 150 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware) that determine control parameters of the imaging device 110 and of the ISP processor 140 based on the received statistics. For example, the control parameters of the imaging device 110 may include pose sensor 120 control parameters (e.g., gain, integration time for exposure control, anti-shake parameters, etc.), camera flash control parameters, camera anti-shake displacement parameters, lens 112 control parameters (e.g., focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balancing and color adjustment (e.g., during RGB processing), as well as lens 112 shading correction parameters.
By way of example, the image anti-shake method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present application are described with reference to the image processing circuit of FIG. 1. The pose sensor 120 may acquire pose data of the imaging device 110 in real time while the imaging device 110 acquires each frame of image data. When acquiring the image to be processed at the current moment, the ISP processor 140 may obtain the raw pose data, acquired by the pose sensor 120, corresponding to the image to be processed at the current moment, where the raw pose data may include at least first pose data of the imaging device 110 at the time the image to be processed was acquired. The ISP processor 140 may process the raw pose data through a pose processing model to obtain target pose data corresponding to the image to be processed, where the target pose data characterizes the pose of the imaging device 110 after shake is eliminated, and may then perform an image transformation on the image to be processed according to the target pose data and the first pose data to obtain a target image with the shake eliminated.
As shown in FIG. 2, in one embodiment, an image anti-shake method is provided, which can be applied to the above electronic device, and the method may include the following steps:
Step 210, acquiring raw pose data corresponding to an image to be processed at the current moment, where the raw pose data includes at least first pose data of the image acquisition device at the time the image to be processed was acquired.
The electronic device may acquire the image to be processed at the current moment, i.e., the image on which the electronic device needs to perform shake-removal processing at the current moment. The image to be processed may be an image acquired by the image acquisition device (i.e., the imaging device 110 described above, such as a camera) in real time, or an image stored in a memory. It should be noted that the image to be processed at the current moment is not necessarily the image acquired by the image acquisition device at the current moment: there is usually a certain time delay between the image acquisition device acquiring an image and the processor performing shake-removal processing on it. For example, the image acquisition device may be acquiring the 10th frame at the current moment while the image to be processed at the current moment is the 3rd frame. It will be appreciated that if the processor's performance is high enough, there may be no time delay between acquisition and shake-removal processing, in which case the image to be processed at the current moment is the image acquired by the image acquisition device at the current moment.
While the image acquisition device acquires images, the pose sensor in the electronic device can acquire pose data of the electronic device in real time, and the pose data acquired by the pose sensor can be used as the pose data of the image acquisition device. The pose sensor may include one or more of a gyroscope sensor, an acceleration sensor, a gravity sensor, and the like, but is not limited thereto. Optionally, the pose data may include an Euler-angle pose, and the Euler angular velocity acquired by the pose sensor at each acquisition time may be integrated to obtain the pose data corresponding to that acquisition time.
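By way of illustration, the integration step above can be sketched numerically as follows. This is a minimal sketch only: the function name `integrate_gyro` and the simple rectangle-rule integration are assumptions for illustration, not the actual implementation of the embodiments.

```python
import numpy as np

def integrate_gyro(angular_velocity, timestamps):
    """Integrate Euler angular-velocity samples (rad/s) over time to
    approximate the Euler-angle pose (rad) at each acquisition time."""
    angular_velocity = np.asarray(angular_velocity, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    # Rectangle-rule time steps; the first sample contributes nothing.
    dt = np.diff(timestamps, prepend=timestamps[0])
    return np.cumsum(angular_velocity * dt[:, None], axis=0)
```

For instance, a constant rotation rate about one axis yields a pose angle that grows linearly with time, as expected of an integral.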
Specifically, each frame of image acquired by the image acquisition device may carry a first timestamp (representing the acquisition time of the image), and each frame of pose data acquired by the pose sensor may carry a second timestamp (representing the acquisition time of the pose data). The first timestamp of each image frame can be matched against the second timestamps of the pose data frames to synchronize image acquisition with pose acquisition, and the pose data whose second timestamp equals the first timestamp of an image can be used as the pose data of the image acquisition device at the time that image was acquired.
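The timestamp matching above can be sketched as follows. As an assumption for illustration, nearest-timestamp matching is used, which reduces to exact equality in the ideal synchronized case described in the text; the function name and the `(timestamp, pose)` tuple layout are hypothetical.

```python
def match_pose_to_frame(first_timestamp, pose_samples):
    """Pick the pose sample whose second timestamp is closest to the
    image's first timestamp (equal timestamps in the ideal case)."""
    return min(pose_samples, key=lambda sample: abs(sample[0] - first_timestamp))
```

In practice, sensor and camera clocks rarely tick at identical instants, which is why a closest-timestamp rule is a common practical relaxation of the exact-equality criterion.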
The electronic device can acquire the raw pose data corresponding to the image to be processed at the current moment, where the raw pose data includes at least the first pose data of the image acquisition device at the time the image to be processed was acquired, and the second timestamp of the first pose data may be equal to the first timestamp of the image to be processed.
In some embodiments, in addition to the first pose data of the image acquisition device when the image to be processed is acquired, the raw pose data corresponding to the image to be processed at the current moment may further include second pose data respectively corresponding to M frames of history images acquired by the image acquisition device before the image to be processed, and/or third pose data respectively corresponding to N frames of future images acquired by the image acquisition device after the image to be processed, where M and N are each positive integers.
As an embodiment, M may be greater than N; for example, M may be 100 or 120 and N may be 10 or 15, or M may be 12 or 13 and N may be 2 or 5, etc. The specific values of M and N are not limited in the present application. Optionally, according to the first timestamp corresponding to the image to be processed at the current moment, the second pose data corresponding to the M frames of history images acquired by the image acquisition device before the first timestamp may be acquired, and/or the third pose data corresponding to the N frames of future images acquired by the image acquisition device after the first timestamp may be acquired.
For example, if the image to be processed at the current moment is the 20th frame and M is 10, the 10 frames acquired before the 20th frame (i.e., the 10th to 19th frames) are taken as history images, and the pose data respectively corresponding to the 10th to 19th frames are acquired. For another example, if the image to be processed at the current moment is the 20th frame and N is 5, the 5 frames acquired after the 20th frame (i.e., the 21st to 25th frames) are taken as future images, and the pose data respectively corresponding to the 21st to 25th frames are acquired.
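The window selection in the example above can be sketched as a simple slicing helper. This is an illustrative sketch only (0-based frame indices, so the "20th frame" of the example is index 19); the function name `pose_window` and the edge-clamping behavior are assumptions.

```python
def pose_window(poses, current_index, m, n):
    """Return (history, current, future) pose slices around the frame at
    current_index: up to m earlier frames and n later frames, clamped at
    the ends of the sequence."""
    start = max(0, current_index - m)
    end = min(len(poses), current_index + n + 1)
    history = poses[start:current_index]
    future = poses[current_index + 1:end]
    return history, poses[current_index], future
```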
In some embodiments, before the raw pose data corresponding to the image to be processed at the current moment is acquired, the method further comprises: caching the pose data corresponding to a first number of images acquired by the image acquisition device, where the first number of images includes at least the M frames of history images and/or the N frames of future images.
After the pose sensor collects pose data, the electronic device can cache multiple frames of pose data collected by the pose sensor. Alternatively, only the pose data corresponding to the first number of images acquired by the image acquisition device may be cached; for example, if the first number is 50, only the 50 frames of pose data whose second timestamps equal the first timestamps of the 50 images may be cached.
Optionally, the pose data acquired by the pose sensor within the target acquisition time period corresponding to the first number of images acquired by the image acquisition device may be cached; for example, if the first number is 50 and the target acquisition time period over which the image acquisition device acquires the 50 frames of images is a-b, then the pose data acquired by the pose sensor within the target acquisition time period a-b may be cached directly.
By caching the pose data corresponding to the first number of images acquired by the image acquisition device, the M frames of second pose data and the N frames of third pose data can be obtained when shake-removal processing is performed on each frame of image to be processed, meeting the processing requirements of the subsequent pose processing model.
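The fixed-size caching described above can be sketched with a bounded ring buffer. This is a minimal sketch under stated assumptions: the class name `PoseCache` and its methods are hypothetical, and Python's `collections.deque` with `maxlen` stands in for whatever buffer the ISP firmware would actually use.

```python
from collections import deque

class PoseCache:
    """Cache pose data for at most a first number of frames; the oldest
    entry is evicted automatically when the capacity is exceeded."""
    def __init__(self, first_number=50):
        self._buf = deque(maxlen=first_number)

    def push(self, second_timestamp, pose):
        self._buf.append((second_timestamp, pose))

    def latest(self, k):
        """Most recent k cached (timestamp, pose) entries."""
        return list(self._buf)[-k:]
```

Bounding the cache to the first number of frames is what keeps the memory overhead fixed regardless of how long the capture session runs.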
Step 220, processing the raw pose data through a pose processing model to obtain target pose data corresponding to the image to be processed, where the target pose data characterizes the pose of the image acquisition device after shake is eliminated.
After the raw pose data corresponding to the image to be processed at the current moment is obtained, it can be input into a pose processing model, which processes the raw pose data to obtain the target pose data corresponding to the image to be processed. The pose processing model can be trained on a plurality of original sample pose data and the smooth pose curves corresponding to them, where each smooth pose curve is obtained by smoothing the corresponding original sample pose data. Training the pose processing model with the original sample pose data and the corresponding smooth pose curves gives the model the ability to accurately predict shake-free target pose data from raw pose data. The target pose data obtained by the pose processing model for the image to be processed transitions smoothly with the pose data before and after the image to be processed, so correcting the image to be processed with the target pose data can eliminate the image blurring caused by shake.
The pose processing model may include, but is not limited to, network models such as a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), and the like; the specific architecture of the pose processing model is not limited in the embodiments of the present application.
As an implementation, the raw pose data may include the first pose data of the image acquisition device when acquiring the image to be processed, together with the second pose data and/or the third pose data. The pose processing model may analyze the plurality of pose data with consecutive acquisition times, extract features of the pose data, and obtain the target pose data corresponding to the image to be processed based on those features. Because the pose processing model can draw on more pose data in its analysis, the accuracy of the target pose data can be improved.
Step 230, performing an image transformation on the image to be processed according to the target pose data and the first pose data to obtain a target image.
The electronic device can use the target pose data to perform shake compensation on the image to be processed, performing an image transformation on the image to be processed according to the target pose data and the first pose data, so as to convert the image acquired by the image acquisition device under the first pose data into the target image that the image acquisition device would have captured under the target pose data.
As a specific implementation, pose difference data between the target pose data and the first pose data can be determined according to the target pose data and the first pose data, and a transformation matrix corresponding to the image to be processed can be calculated according to the pose difference data. The transformation matrix may include at least one of a rotation matrix and a translation matrix. The image to be processed can then be transformed according to this transformation matrix: a transformation operation is performed on the pixel coordinates of each pixel in the image to be processed according to the transformation matrix, and the transformed pixel coordinates of each pixel determine the target image.
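A rotation-only version of this transformation can be sketched as follows. This is a sketch under stated assumptions, not the embodiments' actual implementation: it assumes a pure rotational pose difference expressed as Euler angles, a known camera intrinsics matrix `intrinsics` (K), and the standard rotational homography H = K·R·K⁻¹; the function names are hypothetical.

```python
import numpy as np

def euler_to_rotation(rx, ry, rz):
    """Rotation matrix from Euler angles (rad), composed as Rz @ Ry @ Rx."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rot_z @ rot_y @ rot_x

def stabilizing_warp(first_pose, target_pose, intrinsics):
    """Transformation matrix built from the pose difference data: maps
    homogeneous pixel coordinates captured under first_pose to the
    coordinates they would have under target_pose."""
    diff = np.asarray(target_pose, dtype=float) - np.asarray(first_pose, dtype=float)
    rotation = euler_to_rotation(*diff)
    return intrinsics @ rotation @ np.linalg.inv(intrinsics)
```

Applying the returned matrix to each pixel's homogeneous coordinates (and dividing by the third component) gives the transformed pixel coordinates; a zero pose difference yields the identity warp.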
Because the electronic device obtains the target pose data corresponding to the image to be processed using the pose processing model, and the target pose data transitions smoothly with the pose data before and after the image to be processed, the target image obtained after shake compensation with the target pose data is more stable, and the image blurring caused by shake can be eliminated. Compared with filtering the raw pose data with a filtering algorithm as in the related art, more accurate target pose data can be obtained using the neural network; in particular, less pose data needs to be cached, which reduces memory overhead and improves the overall performance of the electronic device.
In the embodiments of the present application, since the pose processing model is trained on a plurality of original sample pose data and the smooth pose curves corresponding to them, the shake-free target pose data corresponding to the image to be processed at the current moment can be accurately obtained through the pose processing model. Shake compensation is then performed accurately using the target pose data to obtain the target image with shake eliminated, so the image blurring caused by shake can be effectively eliminated, an image anti-shake effect is achieved, and the visual quality of the image is improved.
In some embodiments, the pose processing model may be a CNN. Processing the raw pose data through the pose processing model to obtain the target pose data corresponding to the image to be processed may include: performing dilated convolution processing on the raw pose data through one or more dilated convolution layers in the pose processing model, and obtaining the target pose data corresponding to the image to be processed according to the result of the dilated convolution processing.
The pose processing model may include one or more dilated convolution layers; the raw pose data may be input into the pose processing model, and the features of the raw pose data may be extracted through the one or more dilated convolution layers. From the 2nd dilated convolution layer onwards, each dilated convolution layer can perform a dilated convolution operation on the features output by the previous layer, and the scale of the features output by each layer can remain the same. A dilated convolution layer enlarges the receptive field without reducing the feature scale or increasing the amount of calculation, which reduces the resources occupied by the pose processing model and improves its operating efficiency.
As an implementation, the raw pose data may include the first pose data of the image acquisition device when acquiring the image to be processed, together with the second pose data and/or the third pose data. The pose data contained in the raw pose data may be input into the pose processing model simultaneously, and the pose processing model may generate a feature (such as a feature vector or feature matrix) corresponding to the image to be processed from the input pose data and feed the generated feature into the 1st dilated convolution layer. The 1st dilated convolution layer performs a dilated convolution operation on the input feature and passes the resulting feature to the 2nd dilated convolution layer, and so on, up to the last dilated convolution layer; the target pose data corresponding to the image to be processed is then determined from the features output by the last dilated convolution layer.
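The two properties claimed above (constant feature scale, enlarged receptive field) can be seen in a minimal 1-D dilated convolution written out by hand. This is an illustrative sketch, not the model's actual layers: a real implementation would use a deep-learning framework's dilated convolution with learned kernels, whereas the kernel here is supplied explicitly.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D dilated convolution over a pose sequence: the
    output keeps the input's length (feature scale is not reduced) while
    the taps are spaced `dilation` samples apart (enlarged receptive field)."""
    x = np.asarray(x, dtype=float)
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = np.pad(x, pad)
    out = np.zeros_like(x)
    for i in range(len(x)):
        for j in range(k):
            out[i] += kernel[j] * padded[i + j * dilation]
    return out

def dilated_stack(x, kernel, dilations=(1, 2, 4)):
    """Stacking layers with growing dilation widens the receptive field
    exponentially while every layer keeps the same output length."""
    for d in dilations:
        x = dilated_conv1d(x, kernel, d)
    return x
```

With dilation 2 and a 3-tap kernel, each output already sees a span of 5 input samples, yet the output length equals the input length, matching the text's claim that the feature scale is preserved.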
FIG. 3 is a schematic diagram of a pose processing model in one embodiment. As shown in FIG. 3, the pose processing model 310 may include N dilated convolution layers; the M frames of second pose data, the first pose data, and the N frames of third pose data may be input into the pose processing model 310, and the pose processing model 310 may process the input pose data through the N dilated convolution layers to obtain the target pose data.
It should be noted that, in addition to the one or more dilated convolution layers, the pose processing model may further include other network modules, such as a pooling layer, a fully connected layer, and the like; it is not limited to the dilated convolution layers described above.
In some embodiments, the pose processing model may be an RNN. As shown in FIG. 4, processing the raw pose data through the pose processing model to obtain the target pose data corresponding to the image to be processed may include the following steps:
Step 402, sequentially inputting each pose data contained in the raw pose data into the pose processing model in chronological order of the acquisition times corresponding to the pose data.
In the embodiments of the present application, the raw pose data corresponding to the image to be processed at the current moment may include the first pose data of the image acquisition device when acquiring the image to be processed, the second pose data respectively corresponding to the M frames of history images acquired by the image acquisition device before the image to be processed, and/or the third pose data respectively corresponding to the N frames of future images acquired by the image acquisition device after the image to be processed. The pose data contained in the raw pose data can be input into the pose processing model as sequence information, in chronological order of acquisition time.
For example, if the image to be processed at the current moment is the 51st frame acquired by the image acquisition device, the raw pose data corresponding to it may include the pose data corresponding to the 1st to 60th frames acquired by the image acquisition device; the pose data corresponding to the 1st to 60th frames can then be sequentially input into the pose processing model as sequence information, in chronological order of the second timestamps (representing the acquisition times of the pose data) corresponding to the pose data of each frame.
Step 404, obtaining, through the pose processing model, the currently output predicted pose data according to the currently input pose data and the previously output predicted pose data.
Step 406, determining whether the currently input pose data is the pose data with the latest acquisition time in the raw pose data; if so, executing step 408; if not, taking the currently output predicted pose data as the previously output predicted pose data and continuing to execute step 404.
Step 408, determining the currently output predicted pose data as the target pose data corresponding to the image to be processed.
The pose processing model can iterate continuously along the time axis: it acquires the currently input pose data, processes it together with the previously output predicted pose data, and obtains the currently output predicted pose data. Whether the currently input pose data is the last pose data in the raw pose data by acquisition time can then be judged; if it is (i.e., it is the last input pose data of the raw pose data), the loop iteration can stop, and the currently output predicted pose data is determined as the target pose data corresponding to the image to be processed.
If the currently input pose data is not the last pose data in the raw pose data by acquisition time, the currently output predicted pose data can be used as the new previously output predicted pose data, the next input pose data is obtained, and the step of obtaining the currently output predicted pose data from the currently input pose data and the previously output predicted pose data through the pose processing model continues to be executed.
As shown in FIGS. 5A and 5B, FIG. 5A is a schematic diagram of a pose processing model in another embodiment, and FIG. 5B is a schematic diagram of the pose processing model in FIG. 5A unrolled along the time axis. The pose processing model 510 may include an input layer (not shown), a hidden layer C, and an output layer (not shown). In the order of the acquisition times corresponding to the pose data, the hidden layer C may sequentially combine the currently input pose data with the predicted pose data it previously output to obtain the predicted pose data it currently outputs. For example, the hidden layer C may combine the pose data input at time t with the predicted pose data it output at time t-1 to obtain the predicted pose data output at time t, which in turn may be used together with the pose data input at time t+1 to determine the predicted pose data output at time t+1.
As one embodiment, the hidden layer may include a network of neurons, which may include, but is not limited to, LSTM (Long Short-Term Memory) neurons and the like.
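The loop of steps 402-408 can be sketched as follows. This is a deliberately simplified stand-in for the recurrent model: the fixed blending weights `w_in` and `w_rec` are illustrative assumptions (a trained LSTM would learn its own gating parameters), so the sketch reduces to an exponential moving average over the pose sequence.

```python
import numpy as np

def recurrent_target_pose(pose_sequence, w_in=0.5, w_rec=0.5):
    """Sketch of steps 402-408: feed poses in chronological order; each
    step blends the current input with the previous prediction, and the
    prediction produced for the last input is the target pose data."""
    predicted = np.zeros_like(np.asarray(pose_sequence[0], dtype=float))
    for pose in pose_sequence:  # chronological order of acquisition times
        predicted = w_in * np.asarray(pose, dtype=float) + w_rec * predicted
    return predicted
```

Even this simplified recurrence shows the intended behavior: each prediction depends on the whole history through the previous prediction, so the final output transitions smoothly with the earlier poses rather than tracking any single shaky sample.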
The pose processing model may also be a neural network model other than the CNN and RNN described above.
In the embodiments of the present application, the shake-free target pose data corresponding to the image to be processed at the current moment can be accurately obtained through the pose processing model, effectively improving the shake-removal effect and the visual quality of the image.
In another embodiment, as shown in FIG. 6, an image anti-shake method is provided, which can be applied to the above electronic device, and the method may include the following steps:
Step 602, obtaining original sample pose data corresponding to a plurality of acquisition moments, and performing smoothing filtering on the original sample pose data to obtain a smooth pose curve corresponding to the original sample pose data.
Before the pose processing model is trained, a training data set containing a large amount of raw sample pose data may be generated. The training data set may include one or more groups of sample data, each group containing original sample pose data corresponding to a plurality of acquisition moments; smoothing filtering may be performed on the original sample pose data in each group to obtain a smooth pose curve corresponding to that group.
In some embodiments, the raw sample pose data may be smoothed by a smoothing filter, which may include, but is not limited to, any of a mean filter, a median filter, a weighted filter, and the like. Optionally, the original sample pose data may be arranged from earliest to latest acquisition time; the smoothing filter may then slide a target window over the arranged data and filter the samples inside the window, yielding filtered sample pose data. The filtered sample pose data corresponding to the original samples together form the smooth pose curve.
Alternatively, the filter parameters of the smoothing filter may be preset fixed parameters, obtained by measurement over multiple experiments. The filter parameters may include, but are not limited to, the window size of the target window, and determine the filtering strength of the smoothing filter: the larger the target window, the stronger the filtering, and the smaller the target window, the weaker the filtering.
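A minimal sketch of the sliding-window mean filter described above follows; the function name and default window size are illustrative, and a median or weighted filter could be substituted in the same frame:

```python
def smooth_pose_curve(raw_poses, window=5):
    """Mean-filter sketch: slide a target window over the pose samples
    (ordered by acquisition time) and average the samples inside each
    window position; a larger window gives stronger filtering."""
    half = window // 2
    smoothed = []
    for i in range(len(raw_poses)):
        lo = max(0, i - half)                 # clamp the window at the edges
        hi = min(len(raw_poses), i + half + 1)
        segment = raw_poses[lo:hi]
        smoothed.append(sum(segment) / len(segment))  # filtered sample
    return smoothed
```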
Alternatively, the filter parameters of the smoothing filter may instead be determined from the raw sample pose data itself: the filter parameters are derived from the original sample pose data, and a smoothing filter with those parameters is applied to the original sample pose data to obtain the corresponding smooth pose curve.
As an embodiment, the degree of data fluctuation between the original sample pose data at the plurality of acquisition times may be determined, and the filter parameters of the smoothing filter may be set according to that degree. Optionally, the difference between the raw sample pose data at each pair of adjacent acquisition times may be calculated, and the degree of data fluctuation determined from those differences: the larger the differences, the larger the fluctuation, and the smaller the differences, the smaller the fluctuation.
In some embodiments, the filtering strength corresponding to the determined filter parameters may be positively correlated with the degree of data fluctuation between the original sample pose data. That is, the greater the fluctuation, the stronger the filtering, ensuring a smoother pose curve; the smaller the fluctuation, the weaker the filtering, so that the smooth pose curve fits the actual original sample pose data more closely. Both the stability and the smoothness of the curve are thus ensured, improving the accuracy of the pose processing model.
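The fluctuation measure and its positive mapping to filtering strength might be sketched as follows; the mean-absolute-difference measure and the base/scale constants are illustrative assumptions, since the disclosure does not fix a formula:

```python
def fluctuation_degree(raw_poses):
    """Hypothetical fluctuation measure: the mean absolute difference
    between the pose data at each pair of adjacent acquisition times."""
    diffs = [abs(b - a) for a, b in zip(raw_poses, raw_poses[1:])]
    return sum(diffs) / len(diffs)

def window_from_fluctuation(degree, base=3, scale=2.0):
    """Map the fluctuation degree to a target-window size, keeping the
    positive correlation: more fluctuation, larger window, stronger
    filtering. base and scale are illustrative constants."""
    size = base + int(scale * degree)
    return size if size % 2 == 1 else size + 1  # keep the window odd
```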
Further, the original sample pose data corresponding to the plurality of acquisition moments may be divided into several groups, the acquisition times of the data within each group being consecutive. For example, 50 raw sample pose data may be divided into 3 groups, with consecutive acquisition times inside each group.
Alternatively, the raw sample pose data may be divided into groups of equal size directly, for example 50 raw sample pose data into 5 groups of 10 each.
Alternatively, the difference between the raw sample pose data at each pair of adjacent acquisition times may be calculated, and the data divided into groups based on those differences. For example, proceeding from the earliest to the latest acquisition time, it may be judged whether the difference between the current and the previous original sample pose data exceeds a difference threshold; if not, the current datum is placed in the same group as the previous one, and if so, a new group is started. The fluctuation within each group is thereby kept small, making the generated smooth sub-curves more accurate.
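The threshold-based grouping rule above may be sketched as follows; the function name and the example threshold are illustrative, not values from the disclosure:

```python
def split_into_groups(raw_poses, diff_threshold=2.0):
    """Sketch of the grouping rule: walk the samples in acquisition
    order and start a new group whenever the jump from the previous
    sample exceeds the difference threshold."""
    groups = [[raw_poses[0]]]
    for prev, cur in zip(raw_poses, raw_poses[1:]):
        if abs(cur - prev) > diff_threshold:
            groups.append([cur])        # large jump: open a new group
        else:
            groups[-1].append(cur)      # keep similar neighbours together
    return groups
```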
After the original sample pose data is divided into groups, the degree of data fluctuation of each group may be determined from the data it contains, the filter parameters may be determined from that degree, and the data in each group may be smoothed by the smoothing filter corresponding to that group, yielding a smooth sub-curve per group; the smooth pose curve is then determined from the smooth sub-curves.
The degree of data fluctuation of each group may be determined in a manner similar to that described above for the data across the plurality of acquisition moments, which is not repeated here. Setting the filter parameters of each group's smoothing filter according to that group's own fluctuation makes the filtering more targeted and improves the accuracy of the smooth pose curve.
Illustratively, FIG. 7 is a schematic diagram of a smooth pose curve in one embodiment. As shown in fig. 7, curve 720 is the original pose curve formed by the original sample pose data at a plurality of acquisition moments, and curve 710 is the smooth pose curve obtained by smoothing that data. Compared with the original pose curve, the smooth pose curve is smoother, with more natural transitions; compensating the images acquired by the image acquisition device with pose data on the smooth curve effectively eliminates image blurring and other artifacts caused by shake.
Step 604, inputting the original sample pose data and the smooth pose curve corresponding to the sample acquisition time into a pose processing model to be trained, and processing the original sample pose data corresponding to the sample acquisition time through the model to obtain predicted pose data.
Step 606, determining, by the pose processing model to be trained, an output error between the predicted pose data and the smooth pose data corresponding to the sample acquisition time in the smooth pose curve.
Step 608, adjusting parameters of the pose processing model to be trained according to the output error.
The electronic device may train the pose processing model with the training data set. The original sample pose data and the smooth pose curve corresponding to the sample acquisition time may be input into the model to be trained, which processes the original sample pose data to obtain the predicted pose data for that acquisition time. The model may then compare the predicted pose data with the smooth pose data corresponding to the same acquisition time in the smooth pose curve and compute their output error through a preset loss function. Parameters of the model are adjusted according to the output error, and the next training iteration is performed until a convergence condition is reached, giving the trained pose processing model.
Alternatively, the convergence condition may include, but is not limited to, the output error falling below an error threshold, the number of parameter iterations (i.e., the number of training rounds) reaching a count threshold, and the like.
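A toy version of this training loop may clarify the predict/error/adjust cycle; a one-parameter linear model stands in for the pose processing model and MSE for the preset loss function, both of which are assumptions (the real model is a neural network):

```python
def train_pose_model(raw_samples, smooth_targets, lr=0.01, epochs=200,
                     err_threshold=1e-4):
    """Toy training loop: a one-parameter model (pred = w * raw) stands
    in for the pose processing model. Each round computes predictions,
    measures the output error against the smooth pose curve with an MSE
    loss, and adjusts the parameter until the error threshold or the
    iteration budget (the convergence conditions) is reached."""
    n = len(raw_samples)
    w = 0.0  # the single trainable parameter of this stand-in model
    for _ in range(epochs):
        preds = [w * x for x in raw_samples]                 # forward pass
        errs = [p - t for p, t in zip(preds, smooth_targets)]
        err = sum(e * e for e in errs) / n                   # preset MSE loss
        if err < err_threshold:                              # convergence
            break
        grad = 2.0 * sum(e * x for e, x in zip(errs, raw_samples)) / n
        w -= lr * grad                                       # adjust parameter
    return w, err
```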
In some embodiments, to improve the accuracy of the pose processing model, more pose data may be used in each training step. The original sample pose data at the sample acquisition time, at the M acquisition times before it, and at the N acquisition times after it may all be input into the model to be trained, which processes them to obtain the predicted pose data for the sample acquisition time; the output error is then computed against the smooth pose data for that time. Training with more pose data per step improves the accuracy of the model and of its output, and thus the shake-removal effect on the image.
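Assembling the enlarged model input described above might look like this; the helper function and its signature are hypothetical:

```python
def build_training_window(poses, t, m=2, n=2):
    """Assemble the model input for sample time t: the pose at t, the
    M poses acquired before it, and the N poses acquired after it.
    Raises if the window would run off either end of the recording."""
    if t - m < 0 or t + n >= len(poses):
        raise ValueError("not enough context around sample time t")
    return poses[t - m : t + n + 1]
```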
Step 610, acquiring original pose data corresponding to the image to be processed at the current moment, the original pose data at least including first pose data of the image acquisition device at the time the image to be processed was acquired.
Step 612, processing the original pose data through the trained pose processing model to obtain target pose data corresponding to the image to be processed, the target pose data representing the pose of the image acquisition device after shake is eliminated.
Step 614, performing image transformation on the image to be processed according to the target pose data and the first pose data to obtain a target image.
The descriptions of steps 610 to 614 may refer to the related descriptions in the above embodiments and are not repeated here.
In the embodiment of the application, the pose processing model can be trained with the original sample pose data corresponding to a plurality of acquisition moments and the corresponding smooth pose curve, so that the trained model can output smooth target pose data, with which shake compensation of the image acquisition device is performed accurately, achieving the image anti-shake effect. Compared with obtaining target pose data directly through a filter as in the related art, the target pose data obtained through the pose processing model is more accurate, and excessive data need not be cached during anti-shake processing, which reduces memory overhead and improves the system performance of the electronic device. Power consumption during anti-shake processing can also be reduced.
As shown in fig. 8, in one embodiment, an image anti-shake apparatus 800 is provided, which is applicable to the above electronic device. The image anti-shake apparatus 800 may include a data acquisition module 810, a pose processing module 820, and a transformation module 830.
The data acquisition module 810 is configured to acquire original pose data corresponding to an image to be processed at the current moment, the original pose data at least including first pose data of the image acquisition device at the time the image to be processed was acquired.
In one embodiment, the original pose data further includes second pose data corresponding to M frames of history images acquired by the image acquisition device before the image to be processed, and/or third pose data corresponding to N frames of future images acquired after it, where M and N are positive integers.
In one embodiment, the image anti-shake apparatus 800 further includes a buffer module.
The buffer module is configured to buffer the pose data corresponding to a first number of images acquired by the image acquisition device, the first number of images at least including the M frames of history images and/or the N frames of future images.
The pose processing module 820 is configured to process the original pose data through a pose processing model to obtain target pose data corresponding to the image to be processed, the target pose data characterizing the pose of the image acquisition device after shake is eliminated. The pose processing model is trained from a plurality of raw sample pose data and the corresponding smooth pose curve, the smooth pose curve being obtained by smoothing the raw sample pose data.
The transformation module 830 is configured to perform image transformation on the image to be processed according to the target pose data and the first pose data to obtain a target image.
In one embodiment, the pose processing module 820 is further configured to perform dilated (hole) convolution on the original pose data through one or more dilated convolution layers in the pose processing model, and to obtain the target pose data corresponding to the image to be processed from the convolution result.
In one embodiment, the pose processing module 820 is further configured to: input each pose datum into the pose processing model in order of acquisition time, from earliest to latest; obtain, through the model, the predicted pose data output at this time from the pose data input at this time and the predicted pose data output last time; take the prediction output at this time as the new prediction output last time and repeat the step, until the pose data input at this time is the pose datum with the latest acquisition time in the original pose data; and determine the prediction output at that point as the target pose data corresponding to the image to be processed.
In one embodiment, the image anti-shake apparatus 800 further comprises a training module.
The training module is configured to train the pose processing model according to a plurality of raw sample pose data and the corresponding smooth pose curves.
The training module includes a filtering unit, a prediction unit, an error determination unit, and an adjustment unit.
The filtering unit is configured to acquire original sample pose data corresponding to a plurality of acquisition moments and perform smoothing filtering on the original sample pose data to obtain the corresponding smooth pose curve.
In one embodiment, the filtering unit is further configured to determine filter parameters of the smoothing filter according to the original sample pose data, and to smooth the original sample pose data with a smoothing filter using those parameters, obtaining the corresponding smooth pose curve.
In one embodiment, the filtering unit is further configured to: divide the raw sample pose data into groups whose acquisition times are consecutive within each group; determine the degree of data fluctuation of each group from the data it contains; determine filter parameters from that degree and set a smoothing filter for each group; smooth the data in each group with its corresponding filter to obtain a smooth sub-curve per group; and determine the smooth pose curve from the smooth sub-curves.
The prediction unit is configured to input the original sample pose data and the smooth pose curve corresponding to the sample acquisition time into the pose processing model to be trained, and to process the original sample pose data through the model to obtain predicted pose data.
In one embodiment, the prediction unit is further configured to input into the model to be trained the original sample pose data corresponding to the sample acquisition time, to the M acquisition times before it, and to the N acquisition times after it, and to process the input data through the model to obtain the predicted pose data corresponding to the sample acquisition time.
The error determination unit is configured to determine, through the pose processing model to be trained, an output error between the predicted pose data and the smooth pose data corresponding to the sample acquisition time in the smooth pose curve.
The adjustment unit is configured to adjust parameters of the pose processing model to be trained according to the output error.
In the embodiment of the application, the pose processing model can be trained with the original sample pose data corresponding to a plurality of acquisition moments and the corresponding smooth pose curve, so that the trained model can output smooth target pose data, with which shake compensation of the image acquisition device is performed accurately, achieving the image anti-shake effect. Compared with obtaining target pose data directly through a filter as in the related art, the target pose data obtained through the pose processing model is more accurate, and excessive data need not be cached during anti-shake processing, which reduces memory overhead and improves the system performance of the electronic device.
Fig. 9 is a block diagram of an electronic device in one embodiment. As shown in fig. 9, the electronic device 900 may include one or more processors 910 and a memory 920 coupled to the processors 910, where the memory 920 may store one or more computer programs that, when executed by the one or more processors 910, implement the methods described in the above embodiments.
Processor 910 may include one or more processing cores. The processor 910 connects the various parts of the electronic device 900 through various interfaces and lines, and performs the functions of the device and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 920 and invoking the data stored therein. Alternatively, the processor 910 may be implemented in hardware as at least one of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA). The processor 910 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, applications, and so on; the GPU renders and draws display content; and the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 910 and may instead be implemented by a separate communication chip.
The memory 920 may include Random Access Memory (RAM) or Read-Only Memory (ROM). Memory 920 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 920 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the method embodiments described above, and the like. The data storage area may store data created by the electronic device 900 in use, and the like.
It is to be appreciated that the electronic device 900 may include more or fewer structural elements than shown in the above block diagram, for example a power module, physical keys, a Wi-Fi (Wireless Fidelity) module, a speaker, a Bluetooth module, sensors, etc., without limitation herein.
The embodiment of the application discloses a computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method described in the above embodiments.
Embodiments of the present application disclose a computer program product comprising a non-transitory computer readable storage medium storing a computer program, which when executed by a processor, implements a method as described in the above embodiments.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a non-volatile computer readable storage medium, and where the program, when executed, may include processes in the embodiments of the methods described above. Wherein the storage medium may be a magnetic disk, an optical disk, a ROM, etc.
Any reference to memory, storage, a database, or other medium as used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), and Direct Rambus Dynamic RAM (DRDRAM).
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art will also appreciate that the embodiments described in the specification are alternative embodiments and that the acts and modules referred to are not necessarily required for the present application.
In various embodiments of the present application, it should be understood that the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not be construed as limiting the implementation of the embodiments of the present application.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The above embodiments of the present application disclose an image anti-shake method, an apparatus, an electronic device, and a computer readable storage medium, and specific examples are used herein to illustrate the principles and embodiments of the present application; the description of the above embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope in accordance with the ideas of the present application; in view of the above, the contents of this description should not be construed as limiting the present application.