Detailed Description
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may include other steps or elements not listed or inherent to such process, method, article, or apparatus in one possible example.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The electronic device according to the embodiments of the present application may include, but is not limited to: a smartphone, a tablet computer, a smart robot, a vehicle-mounted device, a wearable device, a computing device or other processing device connected to a wireless modem, as well as various forms of User Equipment (UE), Mobile Stations (MS), terminal devices, and the like, which are not limited herein. The electronic device may also be a server.
Referring to fig. 1A, fig. 1A is a schematic flowchart of an image fusion method provided in an embodiment of the present application, and as shown in the drawing, the image fusion method is applied to an electronic device, and includes:
101. And acquiring an infrared image through an infrared camera.
In the embodiment of the application, the infrared image may be any frame in a video shot by an infrared camera. In a specific implementation, the electronic device may shoot through the infrared camera to obtain an infrared image.
Optionally, in step 101, acquiring an infrared image by an infrared camera may include the following steps:
11. acquiring a target environment temperature;
12. determining a first shooting parameter corresponding to the target ambient temperature according to a mapping relation between a preset ambient temperature and the shooting parameter;
13. and shooting according to the first shooting parameter to obtain the infrared image.
In the embodiment of the present application, a mapping relationship between a preset ambient temperature and a shooting parameter may be pre-stored in the electronic device, and the shooting parameter may be at least one of the following: the focal length, the sensitivity, the infrared light brightness, the wavelength of the infrared light, the operating frequency of the infrared light, the emitting power of the infrared light, the operating current of the infrared camera, the operating voltage of the infrared camera, the operating power of the infrared camera, and the like, which are not limited herein.
In a specific implementation, the electronic device may acquire the target ambient temperature and determine the first shooting parameter corresponding to the target ambient temperature according to the mapping relationship between the preset ambient temperature and the shooting parameter, thereby obtaining a shooting parameter suited to that temperature. Shooting according to the first shooting parameter then yields the infrared image, which helps to improve the image quality of the infrared image.
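As an illustration of steps 11–13, a lookup of shooting parameters by ambient temperature might look as follows; the temperature ranges and parameter values are hypothetical placeholders, not values from the application:

```python
# Hypothetical sketch of steps 11-13: look up shooting parameters for the
# measured ambient temperature in a pre-stored mapping. The temperature
# breakpoints and parameter values below are illustrative only.
def select_shooting_params(ambient_temp_c, mapping):
    """Return the parameter set whose temperature range contains ambient_temp_c."""
    for (lo, hi), params in mapping.items():
        if lo <= ambient_temp_c < hi:
            return params
    # Fall back to the last entry when the temperature is out of range
    return list(mapping.values())[-1]

# Example mapping: temperature range (deg C) -> infrared shooting parameters
PRESET_MAPPING = {
    (-40, 0): {"ir_emit_power_mw": 900, "sensitivity_iso": 800},
    (0, 25):  {"ir_emit_power_mw": 700, "sensitivity_iso": 400},
    (25, 60): {"ir_emit_power_mw": 500, "sensitivity_iso": 200},
}
```

In practice the mapping could cover any of the parameters listed above (focal length, infrared emitting power, operating current, and so on); a range-keyed dictionary is just one simple realization.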
Optionally, in the step 11, obtaining the target ambient temperature may include the following steps:
111. acquiring a preview image through the infrared camera;
112. determining an average gray value of the preview image;
113. determining a reference temperature corresponding to the average gray level according to a mapping relation between a preset temperature and a gray value;
114. dividing the preview image into a plurality of areas, and determining the gray value of each area in the plurality of areas to obtain a plurality of gray values;
115. performing mean square error operation according to the plurality of gray values to obtain a target mean square error;
116. determining a target optimization factor corresponding to the target mean square error according to a mapping relation between a preset mean square error and the optimization factor;
117. and optimizing the reference temperature according to the target optimization factor to obtain the target environment temperature.
In a specific implementation, a mapping relationship between a preset temperature and a preset gray value, and a mapping relationship between a preset mean square error and an optimization factor may be stored in the electronic device in advance.
Specifically, the electronic device may acquire the preview image through the infrared camera and determine the average gray value of the preview image; because infrared imaging is based on temperature, the average gray value reflects, to a certain extent, the overall temperature of objects in the shooting scene. The electronic device may then determine the reference temperature corresponding to the average gray value according to the mapping relationship between preset temperatures and gray values. The electronic device may also divide the preview image into a plurality of areas, determine the gray value of each area to obtain a plurality of gray values, perform a mean square error operation on these gray values to obtain a target mean square error, and determine a target optimization factor corresponding to the target mean square error according to the mapping relationship between the preset mean square error and the optimization factor, where the value range of the optimization factor may be -0.2. Considering that temperature variation between neighborhoods in the image can affect the actual temperature of the shooting scene, measuring the degree of influence between neighborhoods by the mean square error and using it to optimize the reference temperature helps to evaluate the scene temperature accurately. Finally, the reference temperature is optimized according to the target optimization factor to obtain the target ambient temperature, specifically as follows:
target ambient temperature = (1 + target optimization factor) × reference temperature
Furthermore, the environmental temperature can be accurately estimated by the preview image.
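The temperature-estimation steps above (111–117) can be sketched as follows; the temperature–gray mapping and the mean-square-error–factor mapping are passed in as functions because their concrete values are illustrative placeholders, not values from the application:

```python
import numpy as np

# Sketch of steps 111-117: estimate scene temperature from an infrared preview.
# temp_of_gray and factor_of_mse stand in for the pre-stored mappings.
def estimate_ambient_temperature(preview, temp_of_gray, factor_of_mse, grid=4):
    mean_gray = preview.mean()                         # step 112
    reference_temp = temp_of_gray(mean_gray)           # step 113
    # Step 114: split the preview into grid x grid regions, one mean gray each
    h, w = preview.shape
    regions = [preview[i * h // grid:(i + 1) * h // grid,
                       j * w // grid:(j + 1) * w // grid].mean()
               for i in range(grid) for j in range(grid)]
    # Step 115: mean square error of the regional gray values
    mse = float(np.mean((np.array(regions) - np.mean(regions)) ** 2))
    factor = factor_of_mse(mse)                        # step 116
    # Step 117: target temperature = (1 + factor) * reference temperature
    return (1.0 + factor) * reference_temp
```

For a perfectly uniform preview the regional mean square error is zero, so the optimization factor leaves the reference temperature unchanged, matching the intuition that a homogeneous scene needs no neighborhood correction.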
102. And acquiring a visible light image through a visible light camera, wherein the infrared camera and the visible light camera correspond to the same shooting range.
In the embodiment of the application, the electronic device may include an infrared camera system and a visible light camera system, in which the infrared camera and the visible light camera are respectively arranged. That the infrared camera and the visible light camera correspond to the same shooting range means that the two cameras are calibrated and the preview pictures they capture are registered.
Optionally, in step 102, acquiring a visible light image through a visible light camera may include the following steps:
21. acquiring target environment parameters;
22. determining a second shooting parameter corresponding to the target environment parameter according to a mapping relation between a preset environment parameter and the shooting parameter;
23. and shooting according to the second shooting parameter to obtain the visible light image.
In a specific implementation, the environmental parameter may be at least one of: the ambient brightness, the ambient color temperature, the ambient humidity, the weather, and the like, which are not limited herein, the shooting parameter may be at least one of the following: sensitivity, exposure duration, focal length, zoom parameters, etc., without limitation.
In a specific implementation, the electronic device may pre-store the mapping relationship between the preset environment parameter and the shooting parameter. The electronic device may then obtain the target environment parameter, determine the second shooting parameter corresponding to the target environment parameter according to that mapping relationship, and shoot according to the second shooting parameter to obtain the visible light image, so that an image suited to the environment can be captured.
103. A first saliency map of the infrared image is determined.
In specific implementation, the infrared image only has gray information and is low in resolution, so that the significance detection of the infrared video sequence can be realized based on the global contrast.
Optionally, in the step 103, determining the first saliency map of the infrared image may include the following steps:
31. acquiring a histogram of the infrared image;
32. and determining a first saliency map of the infrared image according to the histogram.
In a specific implementation, for an infrared video sequence diagram, i.e. an infrared image, the significance calculation formula of each pixel is as follows:
V(p)=|Ip-I1|+|Ip-I2|+...+|Ip-IN|
where p represents a pixel position in the infrared image, I_p represents the gray value at the corresponding position in the infrared image, N represents the number of pixels in the infrared image, and V may represent the first saliency map. Furthermore, the above equation can be further simplified by using the image histogram distribution:

V(p) = h_0·|I_p − 0| + h_1·|I_p − 1| + ... + h_(L−1)·|I_p − (L−1)|

where L represents the number of gray levels in the image, which for 8-bit images is generally 256, and h_j represents the number of pixels with gray level j in the infrared image. After the saliency is calculated, the saliency map corresponding to the infrared image is obtained through normalization.
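A minimal NumPy sketch of this histogram-based global-contrast saliency (steps 31–32), normalized to [0, 1]; the 256-level table makes the per-pixel sum a simple lookup:

```python
import numpy as np

def infrared_saliency(img):
    """Global-contrast saliency via the histogram form
    V(p) = sum_j h_j * |I_p - j|, normalized to [0, 1]. img: uint8 array."""
    levels = np.arange(256)
    hist = np.bincount(img.ravel(), minlength=256)       # h_j
    # contrast[g] = sum_j h_j * |g - j| for every gray level g (256-entry table)
    contrast = np.abs(levels[:, None] - levels[None, :]) @ hist
    sal = contrast[img].astype(np.float64)               # per-pixel lookup
    rng = sal.max() - sal.min()
    return (sal - sal.min()) / rng if rng > 0 else np.zeros_like(sal)
```

Precomputing the 256-entry table keeps the cost independent of image size apart from one lookup per pixel, which is what makes the histogram simplification attractive on embedded hardware.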
104. A second saliency map of the visible light image is determined.
In a specific implementation, image saliency is an important visual feature of an image and represents the degree of attention human eyes pay to each region of the image. The electronic device may determine a second saliency map of the visible light image.
Optionally, in the step 104, determining the second saliency map of the visible light image may include the following steps:
41. acquiring color channel parameters of the visible light image;
42. determining a red-green color component, a blue-yellow color component, a brightness component and a motion component of the visible light image according to the color channel parameters;
43. determining a first reference expression of the visible light image according to the red-green color component, the blue-yellow color component, the luminance component and the motion component;
44. simplifying the reference expression to obtain a simplified expression;
45. carrying out quaternary Fourier transform on the simplified expression to obtain a frequency domain expression;
46. extracting target phase information according to the frequency domain expression, and performing inverse Fourier transform on the target phase information to obtain a second reference expression;
47. and carrying out filtering processing on the second reference expression to obtain the second saliency map.
In a specific implementation, in this embodiment of the present application, the color channel parameters may be the three color channel parameters r, g, and b. The visible light image may be any frame in a visible light video sequence, and saliency detection of the visible light video sequence proceeds as follows: the acquired picture is represented by four components, namely the red-green and blue-yellow color components, a luminance component, and a motion component, and a quaternion Fourier transform is then applied to obtain a phase spectrum. The formulas are as follows:

RG(t) = R(t) − G(t)

BY(t) = B(t) − Y(t)

M(t) = |I(t) − I(t−1)|

where t represents the current frame number of the video sequence; r, g, and b represent the three color channels of the input image; R(t), G(t), B(t), and Y(t) are broadly tuned red, green, blue, and yellow channels computed from r, g, and b (for example, as commonly used in phase-spectrum saliency detection: R = r − (g + b)/2, G = g − (r + b)/2, B = b − (r + g)/2, Y = (r + g)/2 − |r − g|/2 − b, with luminance I = (r + g + b)/3); and RG, BY, I, and M respectively represent the two color components (red-green, blue-yellow), the luminance component, and the motion component of the image. A quaternion can then be used to represent the current frame image, i.e., the first reference expression is obtained, specifically as follows:
q(t) = M(t) + RG(t)u1 + BY(t)u2 + I(t)u3
where u_i (i = 1, 2, 3) are the quaternion imaginary units, satisfying u1 ⊥ u2, u2 ⊥ u3, u1 ⊥ u3, and u3 = u1·u2.
Further, q (t) can be further simplified to the following formula, and a simplified expression of q (t) is obtained, specifically as follows:
q(t)=f1(t)+f2(t)u2
f1(t)=M(t)+RG(t)u1
f2(t)=BY(t)+I(t)u1
performing quaternary Fourier transform on q (t) to obtain a frequency domain expression, which is specifically as follows:
Q[u,v]=F1[u,v]+F2[u,v]u2
Let Q(t) be the frequency-domain representation of q(t); then Q(t) can be written in polar form:

Q(t) = ||Q(t)|| e^(uΦ(t))
where ||Q(t)|| is the amplitude spectrum of the Fourier transform and Φ(t) is the phase spectrum. Setting ||Q(t)|| = 1 extracts only the phase information of the spectrum of q(t), i.e., the target phase information; performing an inverse Fourier transform on the target phase information then yields the second reference expression, specifically as follows:
q′=ρ0+ρ1u1+ρ2u2+ρ3u3
where ρ_i (i = 0, 1, 2, 3) represents the components of the quaternion obtained after the inverse Fourier transform of the phase spectrum. Finally, Gaussian filtering may be performed on the inverse Fourier transform result to obtain the second saliency map sM(t), where sM is an abbreviation of saliency map, specifically as follows:
sM(t)=g*||q′(t)||2
where g denotes a two-dimensional Gaussian filter kernel and * denotes the convolution operation.
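The phase-spectrum computation of steps 41–47 can be sketched with two ordinary complex FFTs, since the symplectic decomposition q = f1 + f2·u2 splits the quaternion image into two complex parts (f1 = M + RG·u1, f2 = BY + I·u1); the kernel width and σ below are assumed values:

```python
import numpy as np

def pqft_saliency(rg, by, intensity, motion, sigma=3.0):
    """Approximate quaternion-FFT phase-spectrum saliency: each simplex part
    of q = f1 + f2*u2 is an ordinary complex image, so two standard 2-D FFTs
    stand in for the quaternion Fourier transform."""
    acc = np.zeros(rg.shape)
    for f in (motion + 1j * rg, by + 1j * intensity):    # f1, f2
        F = np.fft.fft2(f)
        F = F / (np.abs(F) + 1e-12)        # ||Q|| = 1: keep only the phase
        acc += np.abs(np.fft.ifft2(F)) ** 2
    # Separable Gaussian smoothing g * ||q'||^2 (no SciPy dependency)
    ax = np.arange(-8, 9)
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    g /= g.sum()
    acc = np.apply_along_axis(np.convolve, 1, acc, g, mode="same")
    acc = np.apply_along_axis(np.convolve, 0, acc, g, mode="same")
    return acc
```

A single bright pixel against a uniform background survives phase-only reconstruction while the uniform background is suppressed, which is exactly the "little periodicity or homogeneity" property the phase spectrum exploits.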
105. And determining a first weight of the infrared image and a second weight of the visible light image according to the first saliency map and the second saliency map.
In specific implementation, the image saliency is an important visual feature in an image, and represents the attention degree of human eyes to each region of the image. Further, the electronic device can determine a first weight of the infrared image and a second weight of the visible light image based on the first saliency map and the second saliency map.
In a specific implementation, the step 105 of determining the first weight of the infrared image and the second weight of the visible light image according to the first saliency map and the second saliency map may include the following steps:
51. determining a first weight of the infrared image according to the following formula:
W1 = V1 / (V1 + V2)

where W1 is the first weight, V1 is the first saliency map, and V2 is the second saliency map;
52. determining a second weight of the visible light image according to the following formula:
W2=1-W1
where W2 is the second weight.
The area of the detection target in the image can be determined by performing significance detection on the image, and the infrared and visible light images in the area are fused in the fusion process, so that the loss of scene details caused by pixel-level fusion of the infrared and visible light images is reduced. The proportion of the infrared image and the visible light image in fusion can be determined through the saliency map, and therefore the quality of the fusion image is improved.
106. And carrying out image fusion on the infrared image and the visible light image according to the first weight and the second weight to obtain a fused image.
In specific implementation, the electronic device may perform image fusion on the infrared image and the visible light image according to the first weight and the second weight, that is, perform weighting operation to obtain a fused image, which is specifically as follows:
fused image = first weight × infrared image + second weight × visible light image
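A minimal sketch of the weighting and fusion of steps 105–106, assuming the per-pixel weight W1 = V1/(V1 + V2), which is consistent with W2 = 1 − W1; the small epsilon guarding the division is an implementation assumption:

```python
import numpy as np

# Steps 105-106 in a few lines: per-pixel weights from the two saliency maps,
# then a weighted blend of the registered infrared and visible images.
def fuse(ir, vis, sal_ir, sal_vis, eps=1e-12):
    w1 = sal_ir / (sal_ir + sal_vis + eps)   # first weight (infrared)
    w2 = 1.0 - w1                            # second weight (visible light)
    return w1 * ir + w2 * vis
```

All four inputs are assumed to be float arrays of the same shape, i.e. the images have already been registered to the common shooting range.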
Through this image fusion method, fusion of single images and fusion between video sequences can both be realized. Furthermore, by exploiting the imaging characteristics of the infrared camera and the visible light camera, the images collected by the two cameras can be fused, which overcomes the problems that a single infrared camera lacks clear detail and a visible light camera is easily affected by fog and dim light, and yields richer scene information. In addition, embedded devices impose real-time and lightweight requirements on a visible light and infrared image acquisition and processing system: 1080p visible light images and 640 × 512 infrared images need to be processed at 25 frames per second with limited computing resources. Because the infrared and visible light images do not need multi-scale decomposition, image fusion efficiency can be improved.
In a specific implementation, the electronic device can calibrate and register the infrared camera system and the visible light camera system, calculate the saliency of the infrared video and the visible light video respectively, and fuse the two videos by weighting according to the saliency maps, so that occluded and camouflaged targets in the scene can be observed and detected more easily in the video, improving on the detection capability of a single visible light camera system.
According to the embodiment of the application, the saliency information of the complementary source images is obtained through a saliency extraction algorithm. The video saliency extraction algorithm can extract color, brightness, and motion information from the image, and the phase spectrum of the image spectrum after Fourier transform represents locations of the original image with little periodicity or homogeneity, thereby determining the positions of candidate objects. Fusing the region of a candidate object in the infrared image with the visible light image by using the saliency information avoids the scene-detail distortion caused when pixel-level fusion introduces non-target regions of the infrared image.
Motion information in video saliency is extracted by the inter-frame difference method. Inter-frame differences are very sensitive to moving targets, so performing saliency extraction on the inter-frame difference result gives a moving target a higher weight in the fusion stage; compared with detecting saliency on a single image, this effectively improves the detection capability for moving targets.
Of course, before step 101, the infrared camera and the visible light camera may be calibrated. In a specific implementation, as shown in fig. 1B, the infrared and visible light binocular system is calibrated using a heatable checkerboard calibration plate: the left image of fig. 1B shows the plate photographed by the infrared camera, and the right image shows the plate photographed by the visible light camera.
The imaging process of a monocular camera can be represented by a pinhole model, and the relationship between the monocular camera pixel coordinate system and the world coordinate system is as follows:

s·[x_p, y_p, 1]^T = K·[R T]·[X_w, Y_w, Z_w, 1]^T, with K = [[f_x, γ, x_0], [0, f_y, y_0], [0, 0, 1]]

where s represents the scale factor from the world coordinate system to the image coordinate system, (x_p, y_p) is the position in the pixel coordinate system, (X_w, Y_w, Z_w) are the coordinates of the spatial point in the world coordinate system, (f_x, f_y) are the equivalent focal lengths of the camera in the x and y directions, γ represents the non-orthogonality of the x and y directions, (x_0, y_0) is the translation from the pixel coordinate system to the camera coordinate system, K is the camera intrinsic matrix, and M = [R T] is the camera extrinsic matrix. The intrinsic and extrinsic parameters of the camera can be obtained by Zhang's calibration method. Assuming the checkerboard plane Z_w = 0, the mapping from space to image can be simplified as:

s·[x_p, y_p, 1]^T = K·[r1 r2 T]·[X_w, Y_w, 1]^T
where r1 and r2 respectively represent the first two columns of the extrinsic rotation matrix. Let H = K·[r1 r2 T]; that is, H is the homography matrix from the world plane to the image plane, and the mapping from space to image can be expressed as:

s·[x_p, y_p, 1]^T = H·[X_w, Y_w, 1]^T
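As a numeric check of the planar mapping s·[x_p, y_p, 1]^T = H·[X_w, Y_w, 1]^T, the following projects a world-plane point through an assumed intrinsic matrix and pose; every numeric value here is hypothetical, chosen only to illustrate dividing out the scale factor s:

```python
import numpy as np

# Hypothetical intrinsics: 800-pixel focal lengths, principal point (320, 256)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 256.0],
              [  0.0,   0.0,   1.0]])
# Identity rotation, camera 5 units from the checkerboard plane (assumed pose)
r1 = np.array([1.0, 0.0, 0.0])
r2 = np.array([0.0, 1.0, 0.0])
T = np.array([0.0, 0.0, 5.0])
H = K @ np.column_stack([r1, r2, T])     # H = K [r1 r2 T]

pw = np.array([0.5, -0.25, 1.0])         # (X_w, Y_w, 1) on the Z_w = 0 plane
uvw = H @ pw
xp, yp = uvw[0] / uvw[2], uvw[1] / uvw[2]  # divide out the scale factor s
```

The third component of H·[X_w, Y_w, 1]^T is exactly s, so normalizing by it recovers the pixel coordinates (x_p, y_p).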
After solving the intrinsic and extrinsic parameter matrices of the two cameras by least-squares optimization, for any point in space, let its coordinate in the world coordinate system be p_w and its coordinates in the left and right camera coordinate systems be p_l and p_r; then the following equations are satisfied:

p_l = R_l·p_w + T_l
p_r = R_r·p_w + T_r
Eliminating the world coordinate system gives:

p_l = R_l·R_r^(-1)·p_r + T_l − R_l·R_r^(-1)·T_r
from which the transformation relation between the coordinate systems of the two cameras can be obtained:

R = R_l·R_r^(-1), T = T_l − R_l·R_r^(-1)·T_r
Because the binocular system is generally used in the field, with targets tens of meters away, the translation between the infrared camera and the visible light camera can be ignored, and the initial value of image registration is obtained as the rotation-induced homography:

H_0 = K_vis·R·K_ir^(-1)

where K_ir and K_vis are the intrinsic matrices of the infrared and visible light cameras.
by utilizing the initial image registration value, the method of Scale Invariant Feature Transform (SIFT) key point detection, the method of least square matching and the like are further used for realizing the fine image registration, so that the fusion result has better visual effect.
For example, the image fusion method described in the embodiment of the present application includes the following specific implementation steps:
1. calibrating an infrared and visible light binocular system: acquiring a plurality of images containing the checkerboards by using a binocular system, and extracting the checkerboard angular points in the images;
2. calibrating an infrared camera and a visible light camera respectively by utilizing checkerboard angular point information in infrared images and visible light images to obtain internal parameters and external parameters of the infrared camera and the visible light camera;
3. calculating a conversion relation between coordinate systems of the infrared camera and the visible light camera by utilizing position information of corner points in checkerboard images shot at the same time;
4. obtaining a conversion relation of an image coordinate system according to internal and external parameters of the infrared camera and the visible light camera and a conversion relation between coordinate systems of the infrared camera and the visible light camera, and taking the conversion relation as an initial parameter of binocular system registration;
5. carrying out fine registration on the parameters of the initial registration by using a least square matching method;
6. calculating color, brightness and motion components RG, BY, I and M of the visible light color image;
7. calculating the saliency of the visible light image;
8. calculating histogram distribution of the infrared image, and obtaining a saliency map of the infrared image based on the histogram distribution;
9. for the obtained infrared and visible saliency maps V_1 and V_2, calculating the weight of the infrared image:

W_1 = V_1 / (V_1 + V_2)

the weight of the visible light image is W_2 = 1 − W_1; then, based on W_1 and W_2, performing image fusion on the infrared image and the visible light image.
According to the embodiment of the application, on one hand, the area of the detection target in the image is determined by performing significance detection on the image, and the infrared and visible light images in the area are fused in the fusion process, so that the loss of scene details caused by pixel level fusion of the infrared and visible light images is reduced. In addition, the fusion method based on video significance detection does not need to decompose images, reduces the use of operation memory and the calculation time, and can achieve better real-time performance on an embedded platform; on the other hand, on an embedded platform with limited computing resources, the saliency of the infrared video is computed, and salient objects are fused without reducing scene details. The method based on video significance fusion can provide candidate regions for a subsequent target detection algorithm, improves the calculation speed of a subsequent target detection system, and has the advantages of light weight, good real-time performance and strong expandability.
Optionally, before acquiring the infrared image by the infrared camera in step 101, the method may further include the following steps:
a1, carrying out binocular calibration on the infrared camera and the visible light camera by using a heatable checkerboard to obtain a calibration result;
a2, determining a perspective transformation matrix between the infrared camera and the visible light camera according to the calibration result;
in step 101, the infrared image is acquired by the infrared camera, and the following steps may be performed:
acquiring an infrared image through an infrared camera, and carrying out perspective transformation on the infrared image according to the perspective transformation matrix;
in step 102, the visible light image is obtained by the visible light camera, which may be implemented as follows:
and acquiring a visible light image through a visible light camera, and carrying out perspective transformation on the visible light image according to the perspective transformation matrix.
In a specific implementation, the electronic device may perform binocular calibration on the infrared camera and the visible light camera by using the heatable checkerboard. The calibration method may be as follows: after the intrinsic and extrinsic parameters are obtained by calibrating the two cameras with Zhang's calibration method, the extrinsic matrices of the two cameras are converted according to their correspondence to obtain the perspective transformation matrix. The infrared image is then acquired by the infrared camera and perspective-transformed according to the perspective transformation matrix, and the visible light image is acquired by the visible light camera and perspective-transformed according to the perspective transformation matrix. The saliency maps of the perspective-transformed infrared and visible light images are then determined on this basis.
It can be seen that the image fusion method described in the embodiment of the present application acquires an infrared image through an infrared camera and a visible light image through a visible light camera, where the two cameras correspond to the same shooting range; determines a first saliency map of the infrared image and a second saliency map of the visible light image; determines a first weight of the infrared image and a second weight of the visible light image according to the two saliency maps; and performs image fusion on the infrared image and the visible light image according to the first weight and the second weight to obtain a fused image. By performing saliency detection on the images, the region of the detection target can be determined, and the infrared and visible light images are fused in that region, thereby reducing the loss of scene details caused by pixel-level fusion of the infrared and visible light images. In addition, the fusion method based on video saliency detection does not need to decompose the images, which reduces memory usage and computation time; better real-time performance can thus be achieved on an embedded platform, improving image quality while ensuring real-time performance.
Referring to fig. 2, fig. 2 is a schematic flow chart of an image fusion method according to an embodiment of the present application, applied to an electronic device, the image fusion method including:
201. and carrying out binocular calibration on the infrared camera and the visible light camera by utilizing the heatable checkerboard to obtain a calibration result.
202. And determining a perspective transformation matrix between the infrared camera and the visible light camera according to the calibration result.
203. And acquiring an infrared image through the infrared camera, and carrying out perspective transformation on the infrared image according to the perspective transformation matrix.
204. And acquiring a visible light image through the visible light camera, and carrying out perspective transformation on the visible light image according to the perspective transformation matrix, wherein the infrared camera and the visible light camera correspond to the same shooting range.
205. A first saliency map of the infrared image is determined.
206. A second saliency map of the visible light image is determined.
207. And determining a first weight of the infrared image and a second weight of the visible light image according to the first saliency map and the second saliency map.
208. And carrying out image fusion on the infrared image and the visible light image according to the first weight and the second weight to obtain a fused image.
For the detailed description of the steps 201 to 208, reference may be made to the corresponding steps of the image fusion method described in the foregoing fig. 1A, and details are not repeated here.
It can be seen that the image fusion method described in the embodiment of the present application performs binocular calibration on the infrared camera and the visible light camera by using the heatable checkerboard to obtain a calibration result, and determines the perspective transformation matrix between the two cameras according to the calibration result. An infrared image is acquired through the infrared camera and perspective-transformed according to the perspective transformation matrix; a visible light image is acquired through the visible light camera and perspective-transformed likewise, the two cameras corresponding to the same shooting range. A first saliency map of the infrared image and a second saliency map of the visible light image are determined, a first weight of the infrared image and a second weight of the visible light image are determined according to the two saliency maps, and the infrared image and the visible light image are fused according to the first weight and the second weight to obtain a fused image. Further, by performing saliency detection on the images, the region of the detection target can be determined, and the infrared and visible light images are fused in that region, reducing the loss of scene details caused by pixel-level fusion. In addition, the fusion method based on video saliency detection does not need to decompose the images, which reduces memory usage and computation time; better real-time performance can thus be achieved on an embedded platform, improving image quality while ensuring real-time performance.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in the drawing, the electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and in an embodiment of the present application, the programs include instructions for performing the following steps:
acquiring an infrared image through an infrared camera;
acquiring a visible light image through a visible light camera, wherein the infrared camera and the visible light camera correspond to the same shooting range;
determining a first saliency map of the infrared image;
determining a second saliency map of the visible light image;
determining a first weight of the infrared image and a second weight of the visible light image according to the first saliency map and the second saliency map;
and carrying out image fusion on the infrared image and the visible light image according to the first weight and the second weight to obtain a fused image.
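The steps above can be sketched end-to-end as follows. This is an illustrative NumPy sketch, not the claimed implementation: the saliency maps are assumed to be given as inputs, and the normalization of the first weight by the sum of the two saliency maps is one common choice consistent with the second weight being the complement of the first.

```python
import numpy as np

def fuse(ir, vis, sal_ir, sal_vis, eps=1e-12):
    """Fuse pre-registered infrared and visible-light images using
    per-pixel weights derived from their saliency maps.

    ir, vis: float arrays of the same shape (same shooting range).
    sal_ir, sal_vis: their saliency maps, non-negative, same shape.
    """
    # First weight: normalized infrared saliency (assumed form).
    w1 = sal_ir / (sal_ir + sal_vis + eps)
    # Second weight: complement of the first weight.
    w2 = 1.0 - w1
    return w1 * ir + w2 * vis

# Toy 2x2 example: where the infrared saliency dominates, the fused
# pixel follows the infrared image, and vice versa.
ir = np.array([[200.0, 200.0], [200.0, 200.0]])
vis = np.array([[50.0, 50.0], [50.0, 50.0]])
sal_ir = np.array([[1.0, 0.0], [0.5, 0.0]])
sal_vis = np.array([[0.0, 1.0], [0.5, 1.0]])
fused = fuse(ir, vis, sal_ir, sal_vis)
```

Where both saliency maps agree (here 0.5 each), the fused pixel is the average of the two inputs.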
It can be seen that, in the electronic device described in the embodiment of the present application, an infrared image is acquired by an infrared camera and a visible light image is acquired by a visible light camera, the infrared camera and the visible light camera corresponding to the same shooting range. A first saliency map of the infrared image and a second saliency map of the visible light image are determined, a first weight of the infrared image and a second weight of the visible light image are determined according to the first saliency map and the second saliency map, and the infrared image and the visible light image are subjected to image fusion according to the first weight and the second weight to obtain a fused image. Further, the region of a detection target in an image can be determined by performing saliency detection on the image, and the infrared and visible light images are fused within that region, which reduces the loss of scene detail caused by pixel-level fusion of the infrared and visible light images. In addition, the fusion method based on video saliency detection does not need to decompose the image, which reduces memory usage and computation time; better real-time performance can therefore be achieved on an embedded platform, improving image quality while ensuring real-time performance.
Optionally, in the aspect of determining the first saliency map of the infrared image, the program includes instructions for:
acquiring a histogram of the infrared image;
and determining a first saliency map of the infrared image according to the histogram.
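The histogram-based step above can be illustrated with a global gray-level-contrast saliency measure: a pixel is salient when its gray level differs from most of the image. The exact formula is not spelled out in this description, so the scheme below (saliency of a gray level as its histogram-weighted distance to all other levels) is an assumption for illustration.

```python
import numpy as np

def histogram_saliency(gray):
    """Histogram-based saliency map for a uint8 single-channel image.

    Saliency of gray level i = sum over j of hist[j] * |i - j|,
    normalized to [0, 1]. Rare, high-contrast levels score highest.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    # 256-entry lookup table: contrast of each gray level to the image.
    sal_of_level = np.abs(levels[:, None] - levels[None, :]) @ hist
    sal = sal_of_level[gray]                  # map levels back to pixels
    return sal / sal.max() if sal.max() > 0 else sal

gray = np.zeros((4, 4), dtype=np.uint8)
gray[0, 0] = 255                              # one anomalous bright pixel
sal = histogram_saliency(gray)
```

The single bright pixel receives the maximum saliency of 1.0, while the uniform background scores low.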
Optionally, in the aspect of determining the second saliency map of the visible light image, the program includes instructions for:
acquiring color channel parameters of the visible light image;
determining a red-green color component, a blue-yellow color component, a brightness component and a motion component of the visible light image according to the color channel parameters;
determining a first reference expression of the visible light image according to the red-green color component, the blue-yellow color component, the luminance component and the motion component;
simplifying the first reference expression to obtain a simplified expression;
carrying out quaternion Fourier transform on the simplified expression to obtain a frequency domain expression;
extracting target phase information according to the frequency domain expression, and performing inverse Fourier transform on the target phase information to obtain a second reference expression;
and carrying out filtering processing on the second reference expression to obtain the second saliency map.
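The transform-phase-filter pipeline above can be illustrated with a single-channel phase-spectrum saliency sketch. This is a simplified stand-in: the quaternion Fourier transform described above operates jointly on the red-green, blue-yellow, brightness, and motion components, whereas the sketch below applies an ordinary 2-D FFT to one channel, keeps only the phase information, inverse-transforms, and smooths the squared magnitude (a box filter standing in for the usual Gaussian).

```python
import numpy as np

def phase_saliency(img):
    """Phase-spectrum saliency for a single-channel float image."""
    f = np.fft.fft2(img)
    phase_only = np.exp(1j * np.angle(f))  # discard magnitude, keep phase
    recon = np.fft.ifft2(phase_only)       # inverse Fourier transform
    sal = np.abs(recon) ** 2               # squared magnitude of reconstruction
    # 3x3 box filter as the final smoothing (filtering) step.
    pad = np.pad(sal, 1, mode='edge')
    h, w = sal.shape
    sal = sum(pad[i:i + h, j:j + w]
              for i in range(3) for j in range(3)) / 9.0
    return sal

img = np.zeros((16, 16))
img[4:8, 4:8] = 1.0                        # a small bright target
sal = phase_saliency(img)
```

Discarding the magnitude spectrum suppresses repeated, homogeneous structure, so the response concentrates around the distinctive target region.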
Optionally, in the aspect of determining the first weight of the infrared image and the second weight of the visible light image according to the first saliency map and the second saliency map, the program includes instructions for:
determining a first weight of the infrared image according to the following formula:
W1 = V1/(V1+V2)
wherein W1 is the first weight, V1 is the first saliency map, and V2 is the second saliency map;
determining a second weight of the visible light image according to the following formula:
W2 = 1-W1
wherein W2 is the second weight.
Optionally, before the infrared image is acquired by the infrared camera, the program further includes instructions for executing the following steps:
carrying out binocular calibration on the infrared camera and the visible light camera by using the heatable checkerboard to obtain a calibration result;
determining a perspective transformation matrix between the infrared camera and the visible light camera according to the calibration result;
wherein the acquiring of the infrared image through the infrared camera includes:
acquiring an infrared image through an infrared camera, and carrying out perspective transformation on the infrared image according to the perspective transformation matrix;
the acquiring of the visible light image through the visible light camera includes:
and acquiring a visible light image through a visible light camera, and carrying out perspective transformation on the visible light image according to the perspective transformation matrix.
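Applying the perspective transformation can be sketched as mapping pixel coordinates through a 3x3 homography matrix. The matrix below is purely illustrative (a translation); in practice it would come from the checkerboard calibration result described above.

```python
import numpy as np

def apply_perspective(points, H):
    """Map Nx2 pixel coordinates through a 3x3 perspective matrix H."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # de-homogenize

# Illustrative matrix: pure translation by (5, -3) pixels.
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[10.0, 10.0], [0.0, 0.0]])
out = apply_perspective(pts, H)
```

Warping a whole image uses the same matrix, applied to every pixel coordinate (with interpolation), so that the infrared and visible light images align over the same shooting range before fusion.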
The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It can be understood that, in order to implement the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the units and algorithm steps described in connection with the embodiments provided herein can be implemented in the present application by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the functional units may be divided according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 4 is a block diagram of functional units of an image fusion apparatus 400 according to an embodiment of the present application, where the apparatus 400 includes: an acquisition unit 401, a determination unit 402, and an image fusion unit 403, wherein,
the acquiring unit 401 is configured to acquire an infrared image through an infrared camera; acquiring a visible light image through a visible light camera, wherein the infrared camera and the visible light camera correspond to the same shooting range;
the determining unit 402 is configured to determine a first saliency map of the infrared image; determining a second saliency map of the visible light image; determining a first weight of the infrared image and a second weight of the visible light image according to the first saliency map and the second saliency map;
the image fusion unit 403 is configured to perform image fusion on the infrared image and the visible light image according to the first weight and the second weight to obtain a fusion image.
It can be seen that, in the image fusion apparatus described in the embodiment of the present application, an infrared image is acquired by an infrared camera and a visible light image is acquired by a visible light camera, the infrared camera and the visible light camera corresponding to the same shooting range. A first saliency map of the infrared image and a second saliency map of the visible light image are determined, a first weight of the infrared image and a second weight of the visible light image are determined according to the first saliency map and the second saliency map, and the infrared image and the visible light image are subjected to image fusion according to the first weight and the second weight to obtain a fused image. Further, the region of a detection target in an image can be determined by performing saliency detection on the image, and the infrared and visible light images are fused within that region, which reduces the loss of scene detail caused by pixel-level fusion of the infrared and visible light images. In addition, the fusion method based on video saliency detection does not need to decompose the image, which reduces memory usage and computation time; better real-time performance can therefore be achieved on an embedded platform, improving image quality while ensuring real-time performance.
Optionally, in the aspect of determining the first saliency map of the infrared image, the determining unit 402 is specifically configured to:
acquiring a histogram of the infrared image;
and determining a first saliency map of the infrared image according to the histogram.
Optionally, in the aspect of determining the second saliency map of the visible light image, the determining unit 402 is specifically configured to:
acquiring color channel parameters of the visible light image;
determining a red-green color component, a blue-yellow color component, a brightness component and a motion component of the visible light image according to the color channel parameters;
determining a first reference expression of the visible light image according to the red-green color component, the blue-yellow color component, the luminance component and the motion component;
simplifying the first reference expression to obtain a simplified expression;
carrying out quaternion Fourier transform on the simplified expression to obtain a frequency domain expression;
extracting target phase information according to the frequency domain expression, and performing inverse Fourier transform on the target phase information to obtain a second reference expression;
and carrying out filtering processing on the second reference expression to obtain the second saliency map.
Optionally, in the aspect that the first weight of the infrared image and the second weight of the visible light image are determined according to the first saliency map and the second saliency map, the determining unit 402 is specifically configured to:
determining a first weight of the infrared image according to the following formula:
W1 = V1/(V1+V2)
wherein W1 is the first weight, V1 is the first saliency map, and V2 is the second saliency map;
determining a second weight of the visible light image according to the following formula:
W2 = 1-W1
wherein W2 is the second weight.
Optionally, before the infrared image is acquired by the infrared camera, the apparatus 400 is further specifically configured to:
carrying out binocular calibration on the infrared camera and the visible light camera by using the heatable checkerboard to obtain a calibration result;
determining a perspective transformation matrix between the infrared camera and the visible light camera according to the calibration result;
wherein the acquiring of the infrared image through the infrared camera includes:
acquiring an infrared image through an infrared camera, and carrying out perspective transformation on the infrared image according to the perspective transformation matrix;
the acquiring of the visible light image through the visible light camera includes:
and acquiring a visible light image through a visible light camera, and carrying out perspective transformation on the visible light image according to the perspective transformation matrix.
It can be understood that the functions of each program module of the image fusion apparatus in this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; for instance, the above division of the units is only a division of logical functions, and other divisions may be used in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware, and the program may be stored in a computer-readable memory, which may include: a flash memory disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.