
CN111602139A - Image processing method and device, control terminal and mobile device - Google Patents

Image processing method and device, control terminal and mobile device

Info

Publication number
CN111602139A
Authority
CN
China
Prior art keywords
image
target object
depth information
matching
determining
Prior art date
Legal status
Pending
Application number
CN201980008862.XA
Other languages
Chinese (zh)
Inventor
蔡剑钊
赵峰
周游
Current Assignee
SZ DJI Technology Co Ltd
Shenzhen Dajiang Innovations Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN111602139A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

An image processing method, an image processing apparatus, a control terminal, and a mobile device. The method includes: acquiring a target image (101) that includes a target object; determining a target region (102) in the target image; determining, within the target region, a main body feature region (103) of the target object; determining depth information (104) of the target object according to the initial depth information of the main body feature region; and determining three-dimensional physical information (105) of the target object according to the depth information. Because the interference of the background, occlusions, and non-main-body parts of the target object in the target image is removed in the process of obtaining the depth information, the probability of introducing useless information into the depth-information calculation is reduced and the precision of the three-dimensional physical information is improved.

Description

Image processing method and device, control terminal and mobile device
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image processing method, an image processing device, a control terminal and a mobile device.
Background
Computer vision technology, an important field of intelligent computing, has been greatly developed and applied. It replaces the human visual organs with an imaging system, so that a target object can be tracked and located.
At present, tracking and locating a target object requires first knowing its depth information, that is, obtaining a depth information map (Depth Map) that represents the depth information of the target object. There are currently two ways of obtaining the depth information map. In the first scheme, referring to fig. 1, feature detection is performed on an image 1, obtained by an imaging system, that includes a target object 2; a feature frame 3 containing the target object 2 and part of a background picture 4 is defined, and the depth information of the target object 2 is calculated from all pixel points in the feature frame 3, so that the depth information map of the target object 2 is drawn. In the second scheme, the target object 2 in the image 1 is identified directly with an image semantic segmentation algorithm or a semantic parsing algorithm, and the depth information map of the target object 2 is drawn from the result.
However, in the first conventional scheme, since the defined feature frame 3 contains both the target object 2 and part of the background picture 4, a great deal of useless information is introduced when the depth information map is drawn from the feature frame 3, such as the depth information of the background picture 4 and the depth information of unimportant parts of the target object 2. The depth information map finally drawn therefore cannot accurately express the target object 2, and the tracking and positioning accuracy of the target object 2 is poor. In the second scheme, the algorithm is applied directly to the entire image 1, which requires large computing resources and leads to high processing cost.
Disclosure of Invention
The invention provides an image processing method, an image processing apparatus, a control terminal, and a movable device, to solve the problems in the prior art that determining the three-dimensional physical information of an object requires large computing resources, resulting in high processing cost, and that the tracking and positioning accuracy of the object is poor.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides an image processing method, where the method may include:
acquiring a target image including a target object;
determining a target area in the target image, the target object being at least partially within the target area;
in the target area, determining a main body characteristic area of the target object;
determining the depth information of the target object according to the initial depth information of the main body feature region;
and determining the three-dimensional physical information of the target object according to the depth information.
In a second aspect, an embodiment of the present invention provides an image processing apparatus, which may include:
the receiver is configured to perform: acquiring a target image including a target object;
the processor is configured to perform:
determining a target area in the target image, the target object being at least partially within the target area;
in the target area, determining a main body characteristic area of the target object;
determining the depth information of the target object according to the initial depth information of the main body feature region;
and determining the three-dimensional physical information of the target object according to the depth information.
In a third aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps of the image processing method described above are implemented.
In a fourth aspect of the embodiments of the present invention, there is provided a control terminal, including the image processing apparatus, a transmitting apparatus, and a receiving apparatus, where the transmitting apparatus transmits a shooting instruction to a mobile device, the receiving apparatus receives an image shot by the mobile device, and the image processing apparatus processes the image.
In a fifth aspect of the embodiments of the present invention, there is provided a mobile device, including a shooting device, and further including an image processing device, where the image processing device receives an image shot by the shooting device and performs image processing.
In the embodiment of the invention, a target image including a target object is acquired; a target area is determined in the target image, at least a main body part of the target object being located in the target area; a main body feature region of the target object is determined in the target area; the depth information of the target object is determined according to the initial depth information of the main body feature region, and the three-dimensional physical information of the target object is determined according to the depth information. In the process of obtaining the depth information, the interference of the background, occlusions, and non-main-body parts of the target object in the target image is removed, so that the probability of introducing useless information when calculating the depth information is reduced and the precision of the three-dimensional physical information is improved.
Drawings
FIG. 1 is a flowchart illustrating steps of an image processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a target image provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of another target image provided by embodiments of the present invention;
FIG. 4 is a flowchart illustrating specific steps of an image processing method according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating specific steps of another image processing method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another target image provided by embodiments of the present invention;
fig. 7 is a scene graph for acquiring initial depth information of a target object according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of another target image provided by embodiments of the present invention;
FIG. 9 is a probability distribution diagram of a timing matching operation according to an embodiment of the present invention;
fig. 10 is a block diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 11 is a block diagram of a removable device provided by an embodiment of the present invention;
fig. 12 is a schematic diagram of a hardware structure of a control terminal according to an embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF INVENTION
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart illustrating steps of an image processing method according to an embodiment of the present invention, where as shown in fig. 1, the method may include:
step 101, a target image including a target object is acquired.
In an embodiment of the present invention, the image processing method provided herein may be applied to a mobile device such as an unmanned aerial vehicle, an unmanned vehicle, an unmanned ship, or a handheld shooting device. A mobile device is usually provided with an image processing apparatus having a shooting function. In addition, the normal operation of the mobile device depends on the depth information of surrounding objects, which the image processing apparatus obtains by shooting and processing images of the objects around the mobile device.
For example, when an unmanned vehicle drives autonomously, the image processing apparatus arranged on the vehicle is required to acquire images of objects in the surrounding environment in real time, and the depth information of those objects is obtained by further processing the images.
In this step, a target image including the target object is acquired; specifically, one or more frames of the target image containing the target object may be captured by a camera in the image processing apparatus.
Step 102, determining a target area in the target image, wherein at least a main body part of the target object is located in the target area.
Specifically, in the embodiment of the present invention, after the target image including the target object is acquired, a target area in the target image may be further determined to achieve the purpose of detecting the object in the target image, where the target area may include at least a main portion of the target object, that is, the target area may completely overlap or partially overlap with at least the main portion of the target object.
Referring to fig. 2, which shows a schematic diagram of a target image according to an embodiment of the present invention, the target image 10 includes a human target object 11 and two street lamps 12 in the background. If the entire target image 10 were scanned directly to determine the depth information of the human target object 11, the amount of calculation would be too large; moreover, irrelevant information about the background and the street lamps 12 would be introduced into the determination of the depth information of the human target object 11, making errors in that depth information more likely.
Therefore, in the embodiment of the present invention, the region in which the human target object 11 is located may be roughly framed by the target region frame 13, and the entire human target object 11 and a small part of the background region may be included in the target region frame 13.
In addition, in the case where the size of the target object is relatively large or the shape is irregular, the main body portion of the target object may be determined first, and the target region may include at least the main body portion.
Compared with directly scanning the whole target image 10, first defining the target area frame 13 that includes the whole human target object 11 and then scanning only the area inside the frame reduces the amount of calculation to a certain extent; the two irrelevant street lamps 12 in the background are also filtered out by the target area frame 13, which reduces the probability of errors when calculating the depth information of the human target object 11.
Specifically, the region where the human target object 11 is located is selected through the target region frame 13, which may specifically adopt the following two implementation manners:
in the first mode, a target area frame 13 is generated by receiving a frame selection operation of a user, and the area where the human target object 11 is located is selected through the target area frame 13.
In the second mode, a recognition model capable of recognizing the human target object 11 in the target image 10 is obtained through deep learning and training, so that after the target image 10 is input into the recognition model, the model can automatically output the target region frame 13 that includes the human target object 11. This approach is similar to current face region location techniques and is not described in detail herein.
The shape of the target region is preferably rectangular, but it is needless to say that the shape of the target region may be circular, irregular, or the like according to actual needs, and the shape is not limited in the embodiment of the present invention.
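As a rough illustration of the second mode, the following sketch (not part of the patent text) uses OpenCV's built-in HOG pedestrian detector as a stand-in for the trained recognition model; the patent does not specify which detection model is used, so the detector, function name, and parameters here are assumptions.

    import cv2

    def detect_target_region(target_image):
        """Return one (x, y, w, h) target region frame for a human target, or None."""
        hog = cv2.HOGDescriptor()
        hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
        boxes, weights = hog.detectMultiScale(target_image, winStride=(8, 8))
        if len(boxes) == 0:
            return None
        # Keep the highest-confidence detection as the target region frame 13.
        best = max(zip(boxes, weights), key=lambda bw: float(bw[1]))[0]
        return tuple(best)

The first mode (a frame drawn by the user) would simply take the rectangle coordinates from the frame-selection operation instead of from a detector.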
And 103, determining a main characteristic region of the target object in the target region.
In the embodiment of the present invention, the main feature region may accurately reflect the position of the target object, for example, when the movable device performs motion trajectory positioning on the target object, the main feature region may represent the centroid of the entire target object, so that a trajectory generated by movement of the main feature region may be determined as a trajectory generated by movement of the target object.
Specifically, the determination of the main feature region of the target object may be implemented according to the type of the target object.
For example, referring to fig. 2, when the target object is a human, the limbs move over a large range during motion, which makes the variance of the measured depth information large; the trunk portion of the human (i.e., the area ABCD in fig. 2) may therefore be defined as the main feature area, so that the main feature area is further delimited within the target area frame 13 to reduce the variance produced when the depth information is subsequently calculated.
For example, referring to fig. 3, when the target object in the target image 20 is a car 21 and the captured target image 20 has a blocking area 22, the target area frame 23 defines the whole car 21 and a part of the blocking area 22, and since the car 21 is a relatively regular-shaped object, the area of the car 21 outside the blocking area 22 in the target area frame 23 can be defined as a main characteristic area to reduce the variance of the blocking area 22 when the depth information is calculated later.
And 104, determining the depth information of the target object according to the initial depth information of the main body characteristic region.
In embodiments of the present invention, the perception of depth information is a prerequisite for human to generate stereoscopic object vision, and depth information refers to the number of bits used to store each pixel in an image, which determines the number of colors each pixel of a color image may have, or determines the number of gray levels each pixel of a gray-scale image may have.
In reality, an object has a depth change from near to far in an observation range of human eyes, for example, a ruler is horizontally placed on a desktop, a user stands at one end of a scale starting point of the ruler to watch, the scale of the ruler tends to change from small to large in a visual range, and the distance between scales is continuously reduced along with the movement of a sight line to the other end of the ruler, which is the influence of depth information on human vision.
In the field of computer vision, the depth information of an object can be represented as a gray-scale image that contains the depth information of each pixel point; the magnitude of the depth information is expressed by the gray level, and the gray-scale image expresses the distance between the object and the camera through gray-level gradations.
Therefore, in the embodiment of the invention, the operations such as azimuth positioning, distance measurement and the like can be performed on the object near the movable equipment by acquiring the depth information of the object near the movable equipment, so that the intelligent experience of the movable equipment is improved.
Specifically, step 103 has determined the subject feature region of the target object, such that the depth information of the target object is further determined by the initial depth information of the subject feature region.
Specifically, there are multiple ways to obtain the initial depth information of the main feature region. In one implementation, the mobile device may be configured with a binocular camera module, so that the depth information of the target object can be obtained by a passive range-sensing method: two cameras a certain distance apart simultaneously acquire two images of the same target object, a stereo matching algorithm finds the pixel points corresponding to the main feature region in the two images, parallax information is then calculated according to the triangulation principle, and the parallax information is converted into the initial depth information that characterizes the main feature region in the scene.
In another implementation, the initial depth information of the main feature region can be acquired by an active range-sensing method. The most obvious difference between active and passive range sensing is that the device itself emits energy toward the scene to complete the acquisition of the depth information. Therefore, in the embodiment of the invention, continuous near-infrared pulses can be emitted toward the target object by the movable device, and the light pulses reflected by the target object are received by a sensor of the movable device; by comparing the phase difference between the emitted light pulses and the reflected light pulses, the transmission delay can be calculated, from which the distance between the target object and the emitter is obtained, and finally a depth image containing the initial depth information corresponding to the main feature region of the target object is obtained.
Furthermore, after the initial depth information of the main feature region is determined, the depth information corresponding to the target object can be obtained by averaging the initial depth information of the main feature region. In the process of obtaining the depth information, because the interference of the background, occlusions, and non-main-body parts of the target object in the target image has been removed, the probability of introducing useless information into the depth-information calculation is reduced and the precision of the depth information is improved.
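A minimal sketch of this averaging step, assuming the initial depth information is available as a per-pixel depth map and the main feature region as a boolean mask (both names are illustrative, not from the patent):

    import numpy as np

    def target_depth_from_region(depth_map: np.ndarray, region_mask: np.ndarray) -> float:
        """Average the initial depth values inside the main body feature region."""
        values = depth_map[region_mask]
        # Ignore invalid depths (e.g. 0 where no correspondence was found) -- an assumption.
        values = values[values > 0]
        if values.size == 0:
            raise ValueError("no valid depth values inside the main body feature region")
        return float(values.mean())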
And 105, determining the three-dimensional physical information of the target object according to the depth information.
In this step, if the target object is to be tracked, the three-dimensional physical information of the target object must be acquired, so the three-dimensional physical information of the target object can be further determined according to the depth information of the target object, and the three-dimensional physical information of the target object can be used to indicate the direction and the motion trajectory of the target object during motion.
Specifically, the depth information of an object may be a gray-scale map that contains the depth information of each pixel point; the magnitude of the depth information is expressed by the gray level, and the gray-scale map expresses the distance between the object and the camera through gray-level gradations. The depth information of the target object can therefore be converted into a gray-scale map, the gray-level gradation values in the map can be calculated, and the distance between the target object and the movable device can be determined from the correspondence between gray-level gradation values and distance, so that the position coordinates of the target object at different moments are determined. The position coordinates of the target object at different moments can then be associated with the corresponding moments to obtain the three-dimensional physical information of the target object; in addition, these coordinates and moments may be plotted on a specific map to obtain the three-dimensional physical information of the target object.
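As one possible (assumed) concrete form of this step, the sketch below back-projects a pixel of the target object with its recovered distance into camera coordinates using a pinhole model and tags it with a time stamp; the intrinsics fx, fy, cx, cy and the dictionary layout are illustrative and not prescribed by the patent:

    import numpy as np

    def position_at_time(u, v, depth_z, fx, fy, cx, cy, timestamp):
        """Back-project pixel (u, v) at depth Z into camera coordinates at a given moment."""
        x = (u - cx) * depth_z / fx
        y = (v - cy) * depth_z / fy
        return {"t": timestamp, "xyz": np.array([x, y, depth_z])}

    # Associating the coordinates with their moments yields the trajectory, i.e. the
    # three-dimensional physical information of the target object:
    # trajectory = [position_at_time(...), position_at_time(...), ...]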
To sum up, in the image processing method provided by the embodiment of the present invention, a target image including a target object is obtained; determining a target area in the target image, wherein at least a main body part of the target object is positioned in the target area; in the target area, determining a main body characteristic area of the target object; and determining the depth information of the target object according to the initial depth information of the main body characteristic region, and determining the three-dimensional physical information of the target object according to the depth information. In the process of obtaining the depth information, the invention removes the interference of the background, the obstruction and the non-main body part of the target object in the target image, thereby reducing the probability of introducing useless information in the process of calculating the depth information and improving the precision of the three-dimensional physical information.
Fig. 4 is a flowchart illustrating specific steps of an image processing method according to an embodiment of the present invention, and as shown in fig. 4, the method may include:
step 201, a target image including a target object is acquired.
This step may specifically refer to step 101, which is not described herein again.
Step 202, determining a target area in the target image, wherein at least a main body part of the target object is located in the target area.
This step may specifically refer to step 102, which is not described herein again.
Step 203, dividing the target area into a plurality of sub-areas by extracting edge features of the target area.
In the embodiment of the invention, edge features represent edges where an image changes markedly or regions are discontinuous. Since an edge is the boundary line between different regions of an image, an edge image can be a binary image, and the purpose of edge detection is to capture the areas where brightness changes sharply. Ideally, performing edge detection on the target region yields edge features composed of a series of continuous curves that represent the boundaries of objects, and the whole target region can be divided into a plurality of sub-regions by the intersections among the edge features.
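A small sketch of this division step, under the assumption that Canny edges stand in for the edge features and that connected components of the non-edge pixels stand in for the sub-regions bounded by those edges (the patent does not prescribe a specific edge detector or region-splitting rule):

    import cv2
    import numpy as np

    def split_into_subregions(target_region_gray: np.ndarray):
        """Split the target region into sub-regions bounded by detected edges."""
        edges = cv2.Canny(target_region_gray, 50, 150)   # binary edge image
        non_edge = (edges == 0).astype(np.uint8)          # pixels between edges
        num_labels, labels = cv2.connectedComponents(non_edge)
        return num_labels - 1, labels                     # label 0 covers the edge pixels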
And 204, determining the classification categories of the sub-regions through a classification model.
Optionally, the step 204 may also be implemented by determining classification categories of a plurality of sub-regions through a convolutional neural network model or determining classification categories of a plurality of sub-regions through a classifier.
Based on deep learning, a classification model can be obtained by training a training data set, the classification model is used for classifying the category of each sub-region, and specifically, the training process of the classification model may include: and training the classification model by adopting the corresponding relation between the region of the preset pattern and the belonged classification of the preset pattern, so that the classification model can achieve the purposes of inputting a certain region and outputting the belonged classification corresponding to the region.
In this step, a plurality of sub-regions of the target region may be input into a trained classification model, which outputs a classification category for each sub-region.
And step 205, combining the sub-regions corresponding to the target classification categories in the plurality of sub-regions to obtain the main body feature region.
In this step, a target classification category matched with the main body feature region may be determined first, and then sub-regions corresponding to the target classification category are connected to obtain the main body feature region.
For example, when the target object is a human, since the range of motion of the limbs of the human is large when the human moves, the variance of the measured depth information is large, the human trunk may be defined as a main body feature region, the target classification category may be determined as a human trunk category, and the sub-regions corresponding to the human trunk category are merged to obtain the main body feature region.
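A minimal sketch of the merging step, assuming the sub-regions are given as a label image and the classification results as a mapping from sub-region label to predicted category (both assumptions; the patent does not fix these data structures):

    import numpy as np

    def merge_target_subregions(labels: np.ndarray, categories: dict, target_category: str) -> np.ndarray:
        """Union of all sub-regions whose category equals the target category (e.g. human trunk)."""
        mask = np.zeros(labels.shape, dtype=bool)
        for label, category in categories.items():
            if category == target_category:
                mask |= (labels == label)
        return mask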
Optionally, when the target object is in a stressed or moving state, an offset of the contour of the main feature region is less than or equal to a preset threshold.
Specifically, the definition of the main feature region is that when the target object is in a stressed or moving state, the offset of the profile of the main feature region is less than or equal to a preset threshold, that is, the main feature region of the target object can keep a relatively stable state in the moving or stressed state of the target object, so as to avoid introducing too much useless information when calculating the depth information of the target object in the later stage.
In addition, measuring the offset of the contour of the main feature region may specifically include: at a fixed shooting angle of view, acquiring consecutive frame images that include the target object, and recording as the offset either the displacement difference of the main-feature-region contour between adjacent frames, or the displacement difference between the contour in one frame and the contour in a frame several frames earlier.
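One simple way to realize this measurement, sketched under the assumption that the contour is available as an array of (x, y) points per frame and that the centroid displacement is used as the offset (the patent does not fix a particular contour statistic or threshold value):

    import numpy as np

    def contour_offset(contour_prev: np.ndarray, contour_curr: np.ndarray) -> float:
        """Displacement (in pixels) of the contour centroid between two frames."""
        c_prev = contour_prev.reshape(-1, 2).mean(axis=0)
        c_curr = contour_curr.reshape(-1, 2).mean(axis=0)
        return float(np.linalg.norm(c_curr - c_prev))

    def is_stable_main_feature_region(offset: float, preset_threshold: float = 5.0) -> bool:
        # Threshold value is illustrative; the patent only requires offset <= preset threshold.
        return offset <= preset_threshold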
And step 206, determining the depth information of the target object according to the initial depth information of the main body feature region.
This step may specifically refer to step 104, which is not described herein again.
And step 207, determining the position coordinates of the target object at different moments according to the depth information.
In the embodiment of the invention, the depth information of the object can be a gray level map, the depth information of each pixel point is contained, the size of the depth information is represented by the depth of gray level, and the gray level map represents the distance between the object and the camera through gray level gradient.
Therefore, the depth information of the target object can be converted into a gray scale map, and the distance between the target object and the movable equipment can be determined by calculating the gray scale gradual change value in the gray scale map and utilizing the corresponding relation between the gray scale gradual change value and the distance. The depth information can be continuously updated when the target object moves, so that a new gray-scale image can be obtained according to the updated depth information, and the position coordinates of the target object at different moments are determined according to the distance between the target object and the movable equipment at different moments.
And 208, determining the three-dimensional physical information of the target object according to the position coordinates of the target object at different moments.
In the embodiment of the invention, the position coordinates of the target object at different moments can be associated with corresponding moments to obtain the three-dimensional physical information of the target object. In addition, the position coordinates of the target object at different times may be associated with corresponding times and embodied on a specific map to obtain three-dimensional physical information of the target object.
In summary, the image processing method provided by the embodiment of the present invention obtains the target image including the target object; determining a target area in the target image, wherein at least a main body part of the target object is positioned in the target area; in the target area, determining a main body characteristic area of the target object; and determining the depth information of the target object according to the initial depth information of the main body characteristic region, and determining the three-dimensional physical information of the target object according to the depth information. In the process of obtaining the depth information, the invention removes the interference of the background, the obstruction and the non-main body part of the target object in the target image, thereby reducing the probability of introducing useless information in the process of calculating the depth information and improving the precision of the three-dimensional physical information.
Fig. 5 is a flowchart illustrating specific steps of an image processing method according to an embodiment of the present invention, and as shown in fig. 5, the method may include:
step 301, at a preset time, acquiring a first image and a second image of the target object through a binocular camera module.
In the embodiment of the invention, the initial depth information of the target object can be determined with a binocular camera module. The binocular camera module comprises a first camera and a second camera whose optical centers, and the distance between those optical centers, are fixed; it is a device that acquires the three-dimensional geometric information of a target object from multiple images based on the binocular parallax principle. Specifically, referring to fig. 6, which shows a schematic diagram of a target image according to an embodiment of the present invention, a first image 30 of the target object may be acquired by the first camera of the binocular camera module at a preset time T1, while a second image 40 of the target object is acquired by the second camera at the same time.
Step 302, determining a target area in the first image and the second image, wherein at least a main body part of the target object is located in the target area.
In this step, referring to fig. 6, the first target region 31 in the first image 30 may be determined, and the second target region 41 in the second image 40 may be determined, and specifically, the method for determining the target region in the image may refer to step 102, which is not described herein again.
Step 303, in the target area, determining a main feature area of the target object.
In this step, referring to fig. 6, a first main feature region EFGH in the first target region 31 may be determined, and a second main feature region E′F′G′H′ in the second target region 41 may be determined. Specifically, the method for determining the main feature region of the target object within the target region may refer to step 103 and is not described herein again.
Step 304, performing matching processing on the first main feature region of the first image and the second image, and/or performing matching processing on the second main feature region of the second image and the first image, and calculating to obtain the initial depth information.
The actual operation of acquiring the initial depth information of the target object by utilizing the binocular camera module comprises 4 steps: camera calibration, binocular correction, binocular matching and depth information calculation.
Calibrating a camera: the camera calibration is a process of eliminating distortion of the camera due to the characteristics of the optical lens, and internal and external parameters and distortion parameters of the first camera and the second camera of the binocular camera module can be obtained through the camera calibration.
Binocular correction: after the first image and the second image are obtained, the distortion elimination and line alignment processing are carried out on the first image and the second image by utilizing the internal and external parameters and the distortion parameters of the first camera and the second camera which are obtained by calibrating the camera, so that the first image and the second image without distortion are obtained.
Binocular matching: and matching the first main feature region of the first image with the second image, and/or matching the second main feature region of the second image with the first image.
Specifically, referring to fig. 6, the pixel points in the first main feature region EFGH may be matched with the pixel points in the entire second image 40, the pixel points in the second main feature region E 'F' G 'H' may be matched with the pixel points in the entire first image 30, or both the pixel points in the first main feature region EFGH and the pixel points in the entire second image 40 may be matched, and the pixel points in the second main feature region E 'F' G 'H' may be matched with the pixel points in the entire first image 30. The binocular matching is used for matching corresponding pixel points of the same scene on left and right views (namely a first image and a second image), and the purpose of the binocular matching is to obtain a parallax value. After obtaining the disparity value, an operation of calculating depth information may be further performed.
In the embodiment of the invention, since the main characteristic region capable of accurately reflecting the centroid of the target object is determined, the first main characteristic region of the first image and the second image can be matched, and/or the second main characteristic region of the second image and the first image can be matched, and the purpose of binocular matching can be achieved by the two modes. Meanwhile, binocular matching processing is carried out on the main body characteristic region, compared with the mode that binocular matching is directly carried out on the whole first image and the whole second image, the calculated amount is reduced, and the processing efficiency is improved.
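A sketch of ROI-restricted binocular matching on a rectified image pair; OpenCV's semi-global block matcher stands in for the stereo matching algorithm, which the patent does not name, and the matcher parameters are assumptions:

    import cv2
    import numpy as np

    def region_disparity(rect_left_gray, rect_right_gray, region_mask):
        """Disparity values (in pixels) for pixels inside the main body feature region."""
        matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
        disparity = matcher.compute(rect_left_gray, rect_right_gray).astype(np.float32) / 16.0
        values = disparity[region_mask]
        return values[values > 0]  # drop pixels for which no valid correspondence was found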
Optionally, step 304 may specifically include:
substep 3041, matching the first main feature region of the first image with the second image, and/or matching the second main feature region of the second image with the first image, so as to obtain a disparity value.
In this step, the depth-information calculation involved in determining the depth information may be performed. Calculating the depth information first requires calculating the disparity value between the first camera and the second camera, specifically as follows:
in this step, referring to fig. 7, a scene graph for acquiring initial depth information of a target object according to an embodiment of the present invention is shown, where P is a certain point in a main feature region of the target object, OR and OT are optical centers of the first camera and the second camera, respectively, imaging points of the point P on photoreceptors of the two first cameras and the second camera are P and P '(an imaging plane of the camera is placed in front of the lens after being rotated), f is a camera focal length, B is a two-camera center distance, and if a disparity value between the point P and the point P' is dis: the disparity value dis ═ B- (Xr-Xt).
Optionally, the first main feature region of the first image is matched with the second image, and/or the second main feature region of the second image is matched with the first image, specifically, the feature pixel points extracted from the first feature region are matched in the second image; and/or matching the characteristic pixel points extracted from the second main characteristic region in the first image.
Optionally, the characteristic pixel point is a pixel point in the image where the change of the gray value is greater than a preset threshold value or the curvature on the edge of the image is greater than a preset curvature value.
In the embodiment of the invention, in order to further reduce the data processing amount in the initial depth information calculation process, the feature pixel points extracted from the first feature region can be further matched in the second image; and/or further performing matching processing on the characteristic pixel points extracted from the second main characteristic region in the first image. The characteristic pixel points are pixel points with the gray value change larger than a preset threshold value in the image or the curvature on the edge of the image larger than a preset curvature value, and can be corner points, boundary points and other points with violent change characteristics of the target object.
For example, referring to fig. 6, the feature pixel points extracted in the first feature region may be four corner points E, F, G, H of the human torso, and the feature pixel points extracted in the second feature region may be four corner points E ', F', G ', H' of the human torso.
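A sketch of extracting such characteristic pixel points, using OpenCV's Shi-Tomasi corner detector as one possible choice consistent with the definition above (the detector and its parameters are assumptions, not taken from the patent):

    import cv2
    import numpy as np

    def feature_pixels(region_gray: np.ndarray, region_mask: np.ndarray, max_points: int = 20):
        """Corner-like characteristic pixel points inside the main body feature region."""
        corners = cv2.goodFeaturesToTrack(
            region_gray,
            maxCorners=max_points,
            qualityLevel=0.05,
            minDistance=10,
            mask=region_mask.astype(np.uint8),
        )
        return [] if corners is None else corners.reshape(-1, 2)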
Substep 3042, calculating to obtain the initial depth information according to the disparity value.
In this step, referring to fig. 7, let the initial depth information be Z. After the disparity value dis = B - (Xr - Xt) has been obtained, the similar-triangles relation (B - (Xr - Xt)) / B = (Z - f) / Z gives the initial depth information Z = (fB) / (Xr - Xt).
Therefore, according to the focal length of the binocular camera module, the optical center distance and the parallax value of the first camera and the second camera, the initial depth information of the target object can be obtained through calculation.
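The relation above, written out as a small helper; here `disparity` denotes the quantity Xr - Xt that appears in Z = (fB) / (Xr - Xt), with the focal length expressed in pixels:

    def initial_depth(focal_length_px: float, baseline: float, disparity: float) -> float:
        """Initial depth Z = f * B / (Xr - Xt); raises if the disparity is not positive."""
        if disparity <= 0:
            raise ValueError("disparity must be positive to recover depth")
        return focal_length_px * baseline / disparity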
Optionally, after step 301, the method may further include:
and 305, acquiring a first image and a second image of the target object through the binocular camera module at a plurality of moments.
In the embodiment of the invention, the key feature pixel points in the main feature region can be determined through time sequence matching operation, the corresponding weight values are added to the key feature pixel points according to the confidence degrees of the key feature pixel points, and the weight values can be weighted in the initial depth information in the process of calculating the depth information of the target object according to the initial depth information, so that the calculated depth information of the target object is more stable and accurate.
Specifically, the key feature pixel points may be relatively stable points at different times and are not prone to relative position change. Therefore, in this step, first, the first image and the second image of the target object need to be acquired at a plurality of times by the binocular camera module.
For example, referring to fig. 8, which shows a schematic diagram of a target image provided by an embodiment of the present invention, a first image 30 of a target object may be acquired by a first camera of a binocular camera module at a time T1, while a second camera of the binocular camera module acquires a second image 40 of the target object; and at time T2, a third image 50 of the target object is acquired by the first camera of the binocular camera module, while a fourth image 60 of the target object is acquired by the second camera of the binocular camera module.
Further, a first target area 31 in the first image 30 may be determined, a second target area 41 in the second image 40 may be determined, a third target area 51 in the third image 50 may be determined, and a fourth target area 61 in the fourth image 60 may be determined.
Further, a first body feature area EFGH in the first target area 31, a second body feature area E′F′G′H′ in the second target area 41, a third body feature area IJKL in the third target area 51, and a fourth body feature area I′J′K′L′ in the fourth target area 61 may be determined.
Step 306, performing matching processing on the first main feature region of the first image and the second image acquired at the corresponding moment, and/or performing matching processing on the second main feature region of the second image and the first image acquired at the corresponding moment.
In this step, at a given moment, a matching operation may be performed between the image acquired by the first camera and the image acquired by the second camera to determine whether relatively stable key feature pixel points exist in the two images acquired at that moment. Specifically, the first main feature region of the first image may be matched against the second image acquired at the corresponding moment, and/or the second main feature region of the second image may be matched against the first image acquired at the corresponding moment. Referring to fig. 8, the pixel points in the first main feature region EFGH may be matched with the pixel points in the entire second image 40, the pixel points in the second main feature region E′F′G′H′ may be matched with the pixel points in the entire first image 30, or both matchings may be performed.
And 307, determining the first matching success times of the matching processing.
In this step, if the position coordinates of a pixel point in the image acquired by the first camera are not changed compared with the position coordinates of a corresponding pixel point in the image acquired by the second camera, it can be determined that the pixel point is successfully matched at the moment, and the confidence of the pixel point serving as the key feature pixel point is increased. The first matching success times can be obtained by counting the matching results of the matching processing of the pixel points at each moment.
For example, referring to fig. 8, if the relative positions of the pixel points E, F, G, H in the first main feature region EFGH and of the pixel points E′, F′, G′, H′ in the second main feature region E′F′G′H′ do not change, the first matching success count of the pixel points E, F, G, H is incremented.
And 308, matching the characteristic regions in the plurality of first images acquired at different moments.
In this step, matching processing may be performed between feature regions in the first image at different times.
For example, referring to fig. 8, the matching process may be performed between the pixel points in the first body feature region EFGH at time T1 and the pixel points in the third body feature region IJKL at time T2.
Step 309, determining a second matching success number of the matching process.
For example, referring to fig. 8, if the relative positions of the pixel points E, F, G, H in the first body feature region EFGH and of the pixel points I, J, K, L in the third body feature region IJKL do not change, the second matching success count of the pixel points E, F, G, H is incremented.
Optionally, after step 304 and step 309, the method may further include:
step 310, setting a weight value for the initial depth information according to the first matching success frequency and the second matching success frequency, wherein the weight value is positively correlated with the matching frequency.
In this step, the weight value Pt corresponding to the initial depth information may be determined from the first matching success count c1 and the second matching success count c2 according to the following Formula 1:
[Formula 1 is rendered as an image in the original publication and is not reproduced here.]
Referring specifically to fig. 7, at time T1, for a point E in the first image 30, a first matching success count c1 can be obtained from the matching operation between point E and point E′ in the second image; meanwhile, the initial depth information Ed of point E can be obtained through the binocular matching calculation, and the initial confidence of point E is set as:
[The initial-confidence expression is rendered as an image in the original publication.]
At time T2, for point E in the first image 30, a second matching success count c2 can be obtained from the matching operation between point E and point I in the third image, and the weight value Pt of point E is then calculated according to Formula 1 above.
In the same manner, the weight values Pt of points F, G, and H can be obtained. It should be noted that the parameters 60 and 5 in Formula 1 may be adjusted according to experience and requirements, which is not limited in the embodiment of the present invention.
And 311, performing weighted average calculation according to the initial depth information and a weight value corresponding to the initial depth information to obtain the depth information of the target object.
In the process of averaging the initial depth information of the main body feature region to obtain the depth information of the corresponding target object, the weight value Pt of point E is used to weight the initial depth information corresponding to point E in the main body feature region. The average is thus weighted according to the confidence of the pixel points, so that the calculated depth information of the target object is more stable and accurate.
For example, referring to fig. 8, assume that the pixel points E, F, G, H in the first main feature region EFGH are determined to be feature pixel points, and that a weight value Pt1 and initial depth information Ed are obtained for point E, a weight value Pt2 and initial depth information Fd for point F, a weight value Pt3 and initial depth information Gd for point G, and a weight value Pt4 and initial depth information Hd for point H. The depth information of the target object is finally obtained through a weighted average calculation:
[The weighted-average expression is rendered as an image in the original publication.]
Further, in practical application, referring to fig. 9, which shows a probability distribution diagram of the time-sequence matching operation provided by an embodiment of the present invention, the abscissa is the frame number, representing the consecutive frames of the first image and the second image of the target object acquired by the binocular shooting module; at any one of the moments, the first image and the second image acquired by the binocular shooting module can be regarded as one frame. The ordinate is the probability, representing the degree of confidence. In the process of matching the feature regions in the plurality of first images acquired at different moments, the longer the matching keeps succeeding, the higher the confidence and therefore the higher the weight; conversely, matching failures gradually decrease the confidence and therefore the weight. As shown in fig. 9, after consecutive successful matches the confidence gradually increases to a maximum of 100%, and once matching failures occur the confidence probability gradually decreases toward 0%.
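A minimal sketch of the weighted-average step described above, assuming the standard weighted-average form; the per-point weights Pt would come from Formula 1, which is given as an image in the original publication and is not reproduced here, so the weights and depths passed in are purely illustrative:

    def weighted_target_depth(initial_depths, weights):
        """Weighted average of per-point initial depths (e.g. Ed, Fd, Gd, Hd with Pt1..Pt4)."""
        if not initial_depths or len(initial_depths) != len(weights):
            raise ValueError("need one weight per initial depth value")
        total_weight = sum(weights)
        if total_weight == 0:
            raise ValueError("at least one weight must be non-zero")
        return sum(d * w for d, w in zip(initial_depths, weights)) / total_weight

    # Example with the four feature pixel points E, F, G, H:
    # depth = weighted_target_depth([Ed, Fd, Gd, Hd], [Pt1, Pt2, Pt3, Pt4])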
And step 312, determining the three-dimensional physical information of the target object according to the depth information.
The step may specifically refer to the step 105, and is not described herein again.
In summary, the image processing method provided by the embodiment of the present invention obtains the target image including the target object; determining a target area in the target image, wherein at least a main body part of the target object is positioned in the target area; in the target area, determining a main body characteristic area of the target object; and determining the depth information of the target object according to the initial depth information of the main body characteristic region, and determining the three-dimensional physical information of the target object according to the depth information. In the process of obtaining the depth information, the invention removes the interference of the background, the obstruction and the non-main body part of the target object in the target image, thereby reducing the probability of introducing useless information in the process of calculating the depth information and improving the precision of the three-dimensional physical information.
Fig. 10 is a block diagram of an image processing apparatus according to an embodiment of the present invention, and as shown in fig. 10, the image processing apparatus 400 may include: a receiver 401 and a processor 402;
the receiver 401 is configured to perform: acquiring a target image including a target object;
the processor 402 is configured to perform:
determining a target area in the target image, the target object being at least partially within the target area;
in the target area, determining a main body characteristic area of the target object;
determining the depth information of the target object according to the initial depth information of the main body feature region;
and determining the three-dimensional physical information of the target object according to the depth information.
Optionally, the processor 402 is further configured to perform:
dividing the target area into a plurality of sub-areas by extracting edge features of the target area;
determining classification categories of a plurality of the sub-regions through a classification model;
and combining the sub-regions corresponding to the target classification category in the plurality of sub-regions to obtain the main body feature region.
Optionally, when the target object is in a stressed or moving state, an offset of the contour of the main feature region is less than or equal to a preset threshold.
Optionally, the processor 402 is further configured to perform:
determining classification categories of the plurality of sub-regions through a convolutional neural network model;
or, determining classification categories of a plurality of the sub-regions through a classifier.
Optionally, the receiver 401 is further configured to perform:
and at a preset moment, acquiring a first image and a second image of the target object through a binocular camera module.
Optionally, the processor 402 is further configured to perform:
matching the first main feature region of the first image with the second image, and/or matching the second main feature region of the second image with the first image, and calculating to obtain the initial depth information;
and determining the depth information of the target object according to the initial depth information.
Optionally, the processor 402 is further configured to perform:
matching the first main feature region of the first image with the second image, and/or matching the second main feature region of the second image with the first image to obtain a parallax value;
and calculating to obtain the initial depth information according to the parallax value.
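Under the usual rectified binocular model, the parallax (disparity) value obtained from the matching processing relates to depth as depth = focal length × baseline / disparity. The following sketch shows only that conversion; the focal length and baseline values are placeholders rather than parameters of the binocular camera module described above.

```python
# Minimal sketch: convert a parallax (disparity) value in pixels into initial depth
# using depth = f * B / d for a rectified stereo pair. The focal length (in pixels)
# and baseline (in meters) below are illustrative placeholders.
def disparity_to_depth(disparity_px, focal_length_px=700.0, baseline_m=0.12):
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to yield a valid depth")
    return focal_length_px * baseline_m / disparity_px

initial_depth = disparity_to_depth(10.0)   # 700 * 0.12 / 10 = 8.4 meters
```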
Optionally, the processor 402 is further configured to perform:
matching, in the second image, the characteristic pixel points extracted from the first main feature region; and/or matching, in the first image, the characteristic pixel points extracted from the second main feature region.
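One possible realization of this feature-pixel matching, assuming the first and second images are rectified so that corresponding points lie on the same image row, is sketched below: each characteristic pixel of the first main feature region is searched for along its row in the second image by normalized cross-correlation, and the column offset of the best match is recorded as its parallax value. The window size and disparity search range are assumptions of this sketch.

```python
# Hypothetical sketch: match characteristic pixels of the first image's main feature
# region against the (rectified) second image along the same row, using normalized
# cross-correlation, and collect the resulting per-point parallax (disparity) values.
import cv2
import numpy as np

def match_disparities(img1_gray, img2_gray, feature_pts, win=7, max_disp=128):
    half = win // 2
    disparities = []
    for (u, v) in feature_pts:                     # (column, row) in the first image
        if not (half <= v < img1_gray.shape[0] - half and half <= u < img1_gray.shape[1] - half):
            continue
        tmpl = img1_gray[v - half:v + half + 1, u - half:u + half + 1]
        lo = max(half, u - max_disp)               # search window to the left of column u
        strip = img2_gray[v - half:v + half + 1, lo - half:u + half + 1]
        score = cv2.matchTemplate(strip, tmpl, cv2.TM_CCOEFF_NORMED)
        matched_u = lo + int(np.argmax(score))     # column of the best match in image 2
        disparities.append(u - matched_u)
    return disparities
```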
Optionally, a characteristic pixel point is a pixel point in the image whose gray-value change is greater than a preset threshold or whose curvature on an image edge is greater than a preset curvature value.
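As an illustration of how such characteristic pixel points could be selected, the sketch below keeps pixels whose local gray-value change (Sobel gradient magnitude) exceeds a threshold, or whose Harris corner response, used here as a stand-in for the curvature criterion on image edges, exceeds a second threshold. Both thresholds and the Harris substitution are assumptions of this sketch.

```python
# Hypothetical selection of characteristic pixel points: large gray-value change
# (Sobel gradient magnitude above a threshold) or a strong corner response
# (Harris measure, standing in for "curvature on the image edge").
import cv2
import numpy as np

def characteristic_pixels(gray, grad_thresh=80.0, corner_ratio=0.01):
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad_mag = cv2.magnitude(gx, gy)                          # gray-value change per pixel
    harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    keep = (grad_mag > grad_thresh) | (harris > corner_ratio * harris.max())
    ys, xs = np.nonzero(keep)
    return np.stack([xs, ys], axis=1)                         # (N, 2) as (column, row)
```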
Optionally, the receiver 401 is further configured to perform:
and at a plurality of moments, acquiring a first image and a second image of the target object through the binocular camera module.
Optionally, the processor 402 is further configured to perform:
matching the first main feature region of the first image with the second image acquired at the corresponding moment, and/or matching the second main feature region of the second image with the first image acquired at the corresponding moment;
determining a first matching success number of the matching process.
Optionally, the processor 402 is further configured to perform:
matching the characteristic regions in the plurality of first images acquired at different moments;
and determining second matching success times of the matching processing.
Optionally, the processor 402 is further configured to perform:
setting a weight value for the initial depth information according to the first matching success times and the second matching success times, wherein the weight value is positively correlated with the matching times;
and performing weighted average calculation according to the initial depth information and the weight value corresponding to the initial depth information to obtain the depth information of the target object.
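By way of example only, the weighting described above can be realized as below: each initial depth value receives a weight equal to the sum of its first and second matching success times, and the depth information of the target object is their weighted average. The specific choice of weight function is an assumption of this sketch; the scheme only requires the weight to increase with the matching success times.

```python
# Hypothetical weighted fusion of initial depth values: the weight of each value
# grows with its first (left-right) and second (cross-moment) matching success times.
def fuse_depth(initial_depths, first_success, second_success):
    weights = [f + s for f, s in zip(first_success, second_success)]
    total = sum(weights)
    if total == 0:
        raise ValueError("no successful matches, depth cannot be fused")
    return sum(d * w for d, w in zip(initial_depths, weights)) / total

# Example: initial depths from three moments with differing match statistics.
depth = fuse_depth([8.4, 8.1, 8.6], [12, 9, 15], [7, 5, 10])
```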
Optionally, the processor 402 is further configured to perform:
determining position coordinates of the target object at different moments according to the depth information;
and determining the three-dimensional physical information of the target object according to the position coordinates of the target object at different moments.
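To make this last step concrete, the sketch below back-projects the target's image coordinates and fused depth into 3D camera coordinates at two moments, then differentiates them to obtain one kind of three-dimensional physical information, namely velocity. The pinhole intrinsics (fx, fy, cx, cy) and the sample coordinates are illustrative placeholders, not values from the disclosure.

```python
# Minimal sketch: obtain 3D position coordinates from image location plus depth,
# then derive velocity (one kind of three-dimensional physical information)
# from the positions at two different moments. Intrinsics are placeholders.
import numpy as np

def backproject(u, v, depth, fx=700.0, fy=700.0, cx=320.0, cy=240.0):
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def velocity(p0, p1, dt):
    if dt <= 0:
        raise ValueError("time interval must be positive")
    v = (p1 - p0) / dt
    return v, float(np.linalg.norm(v))                   # velocity vector and speed

p_t0 = backproject(350.0, 260.0, 8.4)                    # position at moment t0
p_t1 = backproject(356.0, 259.0, 8.0)                    # position at moment t1
vel, speed = velocity(p_t0, p_t1, dt=0.1)
```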
To sum up, the image processing apparatus provided by the embodiment of the present invention acquires a target image including a target object; determines a target area in the target image, at least the main body part of the target object being located within the target area; determines, within the target area, a main body feature region of the target object; determines the depth information of the target object according to the initial depth information of the main body feature region; and determines the three-dimensional physical information of the target object according to the depth information. In the process of obtaining the depth information, the apparatus removes interference from the background, from occluding objects, and from non-main-body parts of the target object in the target image, thereby reducing the probability of introducing useless information into the depth calculation and improving the accuracy of the three-dimensional physical information.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the image processing method are implemented.
An embodiment of the present invention further provides a control terminal, including the above image processing apparatus, a transmitting device, and a receiving device, where the transmitting device transmits a shooting instruction to a movable device, the receiving device receives an image shot by the movable device, and the image processing apparatus processes the image.
Referring to fig. 11, an embodiment of the present invention further provides a mobile device 500, which includes a camera 501, and further includes the image processing apparatus 400 illustrated in fig. 10, where the image processing apparatus 400 receives an image captured by the camera 501 and performs image processing.
Optionally, the mobile device 500 further includes a controller 502 and a power system 503, and the controller 502 controls the power output of the power system 503 according to the processing result processed by the image processing apparatus 400.
Specifically, the power system includes a motor for driving the propellers and a motor for driving the pan/tilt head, so that the controller 502 can change the attitude of the mobile device 500 or the orientation of the pan/tilt head (i.e., the orientation of the camera 501) according to the image processing result.
Optionally, the image processing apparatus 400 is integrated in the controller 502.
Optionally, the mobile device 500 includes at least one of an unmanned aerial vehicle, an unmanned ship, and a handheld shooting device.
Fig. 12 is a schematic diagram of a hardware structure of a control terminal for implementing various embodiments of the present invention, where the control terminal 600 includes, but is not limited to: a radio frequency unit 601, a network module 602, an audio output unit 603, an input unit 604, a sensor 605, a display unit 606, a user input unit 607, an interface unit 608, a memory 609, a processor 610, and a power supply 611. Those skilled in the art will appreciate that the control terminal configuration shown in fig. 12 does not constitute a limitation of the control terminal, and that the control terminal may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the control terminal includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 601 may be used for receiving and sending signals during a message sending/receiving process or a call process; specifically, it receives downlink data from a base station and forwards the downlink data to the processor 610 for processing, and it transmits uplink data to the base station. In general, the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. Further, the radio frequency unit 601 may also communicate with a network and other devices through a wireless communication system.
The control terminal provides the user with wireless broadband Internet access via the network module 602, for example, helping the user to send and receive e-mail, browse web pages, and access streaming media.
The audio output unit 603 may convert audio data received by the radio frequency unit 601 or the network module 602, or stored in the memory 609, into an audio signal and output it as sound. Moreover, the audio output unit 603 may also provide audio output related to a specific function performed by the control terminal 600 (e.g., a call signal reception sound or a message reception sound). The audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.
The input unit 604 is used to receive audio or video signals. The input unit 604 may include a Graphics Processing Unit (GPU) 6041 and a microphone 6042. The graphics processor 6041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 606. The image frames processed by the graphics processor 6041 may be stored in the memory 609 (or other storage medium) or transmitted via the radio frequency unit 601 or the network module 602. The microphone 6042 can receive sound and process it into audio data. In a phone call mode, the processed audio data may be converted into a format transmittable to a mobile communication base station via the radio frequency unit 601 and then output.
The control terminal 600 also includes at least one sensor 605, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 6061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 6061 and/or the backlight when the control terminal 600 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the attitude of the control terminal (such as horizontal and vertical screen switching, related games, magnetometer attitude calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 605 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 606 is used to display information input by the user or information provided to the user. The Display unit 606 may include a Display panel 6061, and the Display panel 6061 may be configured by a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 607 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the control terminal. Specifically, the user input unit 607 includes a touch panel 6071 and other input devices 6072. The touch panel 6071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations by a user on or near the touch panel 6071 using a finger, a stylus, or any suitable object or accessory). The touch panel 6071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 610, and receives and executes commands sent by the processor 610. In addition, the touch panel 6071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 6071, the user input unit 607 may include other input devices 6072. Specifically, the other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described here again.
Further, the touch panel 6071 may be overlaid on the display panel 6061. When the touch panel 6071 detects a touch operation on or near it, the touch operation is transmitted to the processor 610 to determine the type of the touch event, and the processor 610 then provides a corresponding visual output on the display panel 6061 according to the type of the touch event. Although the touch panel 6071 and the display panel 6061 are shown as two independent components implementing the input and output functions of the control terminal, in some embodiments the touch panel 6071 and the display panel 6061 may be integrated to implement the input and output functions of the control terminal; this is not limited herein.
The interface unit 608 is an interface through which an external device is connected to the control terminal 600. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 608 may be used to receive input (e.g., data information or power) from an external device and transmit the received input to one or more elements within the control terminal 600, or may be used to transmit data between the control terminal 600 and the external device.
The memory 609 may be used to store software programs as well as various data. The memory 609 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data (such as audio data or a phonebook) created according to the use of the control terminal, and the like. Further, the memory 609 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 610 is a control center of the control terminal, connects various parts of the entire control terminal by using various interfaces and lines, and performs various functions of the control terminal and processes data by running or executing software programs and/or modules stored in the memory 609 and calling data stored in the memory 609, thereby performing overall monitoring of the control terminal. Processor 610 may include one or more processing units; preferably, the processor 610 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 610.
The control terminal 600 may further include a power supply 611 (such as a battery) for supplying power to various components, and preferably, the power supply 611 may be logically connected to the processor 610 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
In addition, the control terminal 600 includes some functional modules that are not shown, and are not described in detail herein.
Preferably, an embodiment of the present invention further provides a control terminal, which includes a processor 610, a memory 609, and a computer program stored in the memory 609 and capable of running on the processor 610, where the computer program is executed by the processor 610 to implement each process of the above-mentioned image processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the embodiment of the image processing method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
Referring to fig. 11, an embodiment of the present invention further provides a mobile device, including a shooting device and the image processing apparatus shown in fig. 10, where the image processing apparatus receives an image shot by the shooting device and performs image processing on the image.
Optionally, the mobile device further comprises a controller and a power system, wherein the controller controls power output of the power system according to a processing result processed by the image processing apparatus.
Optionally, the image processing apparatus is integrated in the controller.
Optionally, the mobile device includes at least one of an unmanned aerial vehicle, an unmanned ship, and a handheld shooting device.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, control terminal, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowcharts and/or block diagrams of the method, the terminal device (system), and the computer program product according to the application. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, a special purpose computer, an embedded processor, or another programmable data processing terminal to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The foregoing describes in detail the image processing method and apparatus, the control terminal, and the mobile device provided in the present application. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (35)

1. An image processing method, characterized in that the method comprises:
acquiring a target image including a target object;
determining a target area in the target image, the target object being at least partially within the target area;
in the target area, determining a main body characteristic area of the target object;
determining the depth information of the target object according to the initial depth information of the main body feature region;
and determining the three-dimensional physical information of the target object according to the depth information.
2. The method of claim 1, wherein the step of determining a subject feature region of the target object in the target region comprises:
dividing the target area into a plurality of sub-areas by extracting edge features of the target area;
determining classification categories of a plurality of the sub-regions through a classification model;
and combining the sub-regions corresponding to the target classification category in the plurality of sub-regions to obtain the main body feature region.
3. The method according to any one of claims 1 to 2, wherein when the target object is in a stressed or moving state, the amount of deviation of the profile of the main body feature region is less than or equal to a preset threshold.
4. The method of claim 2, wherein the step of determining classification categories for the plurality of sub-regions via a classification model comprises:
determining classification categories of the plurality of sub-regions through a convolutional neural network model;
or, determining classification categories of a plurality of the sub-regions through a classifier.
5. The method of claim 1, wherein the step of acquiring a target image including a target object comprises:
and at a preset moment, acquiring a first image and a second image of the target object through a binocular camera module.
6. The method according to claim 5, wherein the step of determining the depth information of the target object according to the initial depth information of the pixel points in the main feature region comprises:
matching the first main feature region of the first image with the second image, and/or matching the second main feature region of the second image with the first image, and calculating to obtain the initial depth information;
and determining the depth information of the target object according to the initial depth information.
7. The method according to claim 6, wherein the matching processing is performed on the first main feature region of the first image and the second image, and/or the matching processing is performed on the second main feature region of the second image and the first image, and the calculating of the initial depth information specifically includes:
matching the first main feature region of the first image with the second image, and/or matching the second main feature region of the second image with the first image to obtain a parallax value;
and calculating to obtain the initial depth information according to the parallax value.
8. The method according to claim 6 or 7, wherein matching the first subject feature region of the first image with the second image, and/or matching the second subject feature region of the second image with the first image specifically comprises:
matching, in the second image, the characteristic pixel points extracted from the first subject feature region; and/or matching, in the first image, the characteristic pixel points extracted from the second subject feature region.
9. The method of claim 8, wherein the characteristic pixel is a pixel having a gray value change greater than a predetermined threshold value or a curvature at an edge of the image greater than a predetermined curvature value.
10. The method according to claim 6, further comprising, after the matching the first subject feature region of the first image with the second image and/or the matching the second subject feature region of the second image with the first image to obtain the initial depth information, calculating:
and at a plurality of moments, acquiring a first image and a second image of the target object through the binocular camera module.
11. The method of claim 10, further comprising, after acquiring the first image and the second image of the target object by the binocular camera module at a plurality of times:
matching the first main feature region of the first image with the second image acquired at the corresponding moment, and/or matching the second main feature region of the second image with the first image acquired at the corresponding moment;
determining a first matching success number of the matching process.
12. The method of claim 11, wherein after acquiring the first image and the second image of the target object by the binocular camera module at a plurality of times, further comprising:
matching the characteristic regions in the plurality of first images acquired at different moments;
and determining second matching success times of the matching processing.
13. The method according to claim 12, wherein determining the depth information of the target object according to the initial depth information specifically comprises:
setting a weight value for the initial depth information according to the first matching success times and the second matching success times, wherein the weight value is positively correlated with the matching times;
and performing weighted average calculation according to the initial depth information and the weight value corresponding to the initial depth information to obtain the depth information of the target object.
14. The method of claim 1, wherein determining three-dimensional physical information of the target object from the depth information comprises:
determining position coordinates of the target object at different moments according to the depth information;
and determining the three-dimensional physical information of the target object according to the position coordinates of the target object at different moments.
15. An image processing apparatus, characterized in that the apparatus comprises: a receiver and a processor;
the receiver is configured to perform: acquiring a target image including a target object;
the processor is configured to perform:
determining a target area in the target image, the target object being at least partially within the target area;
in the target area, determining a main body characteristic area of the target object;
determining the depth information of the target object according to the initial depth information of the main body feature region;
and determining the three-dimensional physical information of the target object according to the depth information.
16. The apparatus of claim 15, wherein the processor is further configured to perform:
dividing the target area into a plurality of sub-areas by extracting edge features of the target area;
determining classification categories of a plurality of the sub-regions through a classification model;
and combining the sub-regions corresponding to the target classification category in the plurality of sub-regions to obtain the main body feature region.
17. The apparatus according to any one of claims 15 to 16, wherein when the target object is in a stressed or moving state, the amount of deviation of the profile of the main body feature region is less than or equal to a preset threshold.
18. The apparatus of claim 16, wherein the processor is further configured to perform:
determining classification categories of the plurality of sub-regions through a convolutional neural network model;
or, determining classification categories of a plurality of the sub-regions through a classifier.
19. The apparatus of claim 15, wherein the receiver is further configured to perform:
and at a preset moment, acquiring a first image and a second image of the target object through a binocular camera module.
20. The apparatus of claim 19, wherein the processor is further configured to perform:
matching the first main feature region of the first image with the second image, and/or matching the second main feature region of the second image with the first image, and calculating to obtain the initial depth information;
and determining the depth information of the target object according to the initial depth information.
21. The apparatus of claim 20, wherein the processor is further configured to perform:
matching the first main feature region of the first image with the second image, and/or matching the second main feature region of the second image with the first image to obtain a parallax value;
and calculating to obtain the initial depth information according to the parallax value.
22. The apparatus of claim 20 or 21, wherein the processor is further configured to perform:
matching the characteristic pixel points extracted from the first characteristic region in the second image; and/or matching the characteristic pixel points extracted from the second main characteristic region in the first image.
23. The apparatus of claim 22, wherein the characteristic pixel is a pixel in which a change in gray scale value in the image is greater than a predetermined threshold or a curvature on an edge of the image is greater than a predetermined curvature value.
24. The apparatus of claim 20, wherein the receiver is further configured to perform:
and at a plurality of moments, acquiring a first image and a second image of the target object through the binocular camera module.
25. The apparatus of claim 24, wherein the processor is further configured to perform:
matching the first main feature region of the first image with the second image acquired at the corresponding moment, and/or matching the second main feature region of the second image with the first image acquired at the corresponding moment;
determining a first matching success number of the matching process.
26. The apparatus of claim 25, wherein the processor is further configured to perform:
matching the characteristic regions in the plurality of first images acquired at different moments;
and determining second matching success times of the matching processing.
27. The apparatus of claim 26, wherein the processor is further configured to perform:
setting a weight value for the initial depth information according to the first matching success times and the second matching success times, wherein the weight value is positively correlated with the matching times;
and performing weighted average calculation according to the initial depth information and the weight value corresponding to the initial depth information to obtain the depth information of the target object.
28. The apparatus of claim 15, wherein the processor is further configured to perform:
determining position coordinates of the target object at different moments according to the depth information;
and determining the three-dimensional physical information of the target object according to the position coordinates of the target object at different moments.
29. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the image processing method of any one of claims 1 to 14.
30. A control terminal comprising the image processing apparatus according to any one of claims 15 to 28, transmitting means for transmitting a photographing instruction to a movable device, and receiving means for receiving an image photographed by the movable device, wherein the image processing means processes the image.
31. The control terminal of claim 30, wherein the movable device comprises at least one of an unmanned aerial vehicle, an unmanned vehicle, an unmanned ship, and a handheld shooting device.
32. A mobile device comprising a camera, characterized in that it further comprises an image processing device according to any one of claims 15 to 28, said image processing device receiving images taken by said camera and performing image processing.
33. The mobile device according to claim 32, further comprising a controller and a power system, the controller controlling a power output of the power system according to a processing result processed by the image processing apparatus.
34. The removable device of claim 33, wherein the image processing apparatus is integrated into the controller.
35. The mobile device according to claim 32, wherein the mobile device comprises at least one of an unmanned aerial vehicle, an unmanned vehicle, an unmanned ship, and a handheld shooting device.
Application publication date: 20200828