Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the first aspect of the present invention provides a multi-scale unmanned aerial vehicle aerial photographing target tracking method, which comprises:
Acquiring an unmanned aerial vehicle aerial video;
Inputting an initial frame and a current frame of the unmanned aerial vehicle aerial video into a Siamese (twin) tracking network constructed based on a G-ResNet network, and outputting three groups of first weighted feature maps and second weighted feature maps from the three convolution blocks layer2, layer3 and layer4 of the G-ResNet network, respectively, wherein the G-ResNet network is obtained by replacing the 3×3 convolution kernel of the residual module of each Bottleneck in the ResNet network with a plurality of convolution layer groups of the same topological structure stacked in parallel, and adding a dual multi-scale attention module behind each Bottleneck;
and carrying out weighted fusion on the three groups of first weighted feature maps and second weighted feature maps by utilizing a plurality of anchor-free region proposal networks, and tracking the target of the current frame according to the prediction boxes and predicted positions in the weighted fusion results.
Further, replacing the 3×3 convolution kernel of the residual module of each Bottleneck in the ResNet network with a plurality of convolution layer groups of the same topological structure stacked in parallel comprises:
In layer1, the 3×3 convolution kernel of 64 channels in the residual modules of the 3 Bottlenecks is divided, by group convolution, into 32 parallel stacked groups of 3×3 convolution kernels of 4 channels each;
In layer2, the 3×3 convolution kernels of 128 channels in the residual modules of the 4 Bottlenecks are divided, by group convolution, into 32 parallel stacked groups of 3×3 convolution kernels of 8 channels each;
In layer3, the 3×3 convolution kernels of 256 channels in the residual modules of the 6 Bottlenecks are divided, by group convolution, into 32 parallel stacked groups of 3×3 convolution kernels of 16 channels each;
in layer4, the 3×3 convolution kernels of 512 channels in the residual modules of the 3 Bottlenecks are divided, by group convolution, into 32 parallel stacked groups of 3×3 convolution kernels of 32 channels each.
Further, the first weighted feature map and the second weighted feature map are output from three convolution blocks of layer2, layer3 and layer4 of the G-ResNet network, respectively, including:
Extracting, through the dual multi-scale attention module, the first feature map and the second feature map output by the first Bottleneck of layer2, layer3 and layer4 of the template branch and of the search branch, respectively;
Grouping the first feature map and the second feature map respectively to obtain a plurality of grouping feature maps corresponding to the first feature map and the second feature map respectively;
Decomposing each grouping feature map into a first sub-feature map and a second sub-feature map;
processing the first sub-feature map with the position attention module and the second sub-feature map with the channel attention module to obtain, respectively, a third sub-feature map with a position attention response and a fourth sub-feature map with a channel attention response;
Performing channel fusion on the third sub-feature map and the fourth sub-feature map to obtain a fifth sub-feature map corresponding to the grouping feature map;
Acquiring a plurality of fifth sub-feature graphs corresponding to the plurality of grouping feature graphs;
shuffling the plurality of fifth sub-feature maps to obtain the weighted feature map output by the first Bottleneck of the template branch and of the search branch;
The weighted feature maps output from the first Bottleneck of the template branch and the search branch are sequentially propagated backward, and the first weighted feature map and the second weighted feature map are output from the last Bottleneck of layer2, layer3, and layer4, respectively.
Further, the expression of the position attention response comprises:

$$X'_{k1} = \sigma(W_1 \cdot IN(X_{k1}) + b_1) \cdot X_{k1}$$

wherein $X_{k1}$ represents the first sub-feature map, $IN(X_{k1})$ represents the spatial information statistics of $X_{k1}$ completed using instance normalization, $W_1$ and $b_1$ are parameters for reinforcing the representation of $IN(X_{k1})$, and $\sigma$ is the sigmoid nonlinear activation function.
Further, the expression of the channel attention response comprises:

$$X'_{k2} = \sigma\big(W_2 \cdot F_{gap}(X_{k2}) + b_2\big) \cdot X_{k2}, \quad F_{gap}(X_{k2}) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} X_{k2}(i,j)$$

wherein $H$ and $W$ represent the height and width of the second sub-feature map, respectively, $X_{k2}$ represents the second sub-feature map, $F_{gap}$ represents the global average pooling function, $W_2$ and $b_2$ perform the scaling and shifting operations on $s = F_{gap}(X_{k2})$, and $\sigma$ represents the sigmoid nonlinear activation function.
Further, using a plurality of anchor-free regional suggestion networks, performing weighted fusion on the three sets of first weighted feature maps and the second weighted feature maps, including:
An anchor-free RPN module is respectively arranged between the three convolution blocks layer2, layer3 and layer4 of the template branch and of the search branch of the G-ResNet network; the anchor-free RPN module comprises a classification branch and a regression branch, and the regression branch predicts the offset between a target pixel point and the real box;
respectively inputting the first weighted feature map and the second weighted feature map into the convolution networks of the regression branch and the classification branch of the anchor-free RPN module, the regression branch outputting two regression maps and the classification branch outputting two classification maps;
Performing a depth cross-correlation operation on the two regression maps output by the regression branch to obtain a regression result;
performing a depth cross-correlation operation on the two classification maps output by the classification branch to obtain a classification result;
acquiring the position of the maximum value of the classification result as the predicted position of the target;
and obtaining, from the regression result, the prediction bounding box corresponding to the predicted position as the target prediction box.
The invention also provides a multi-scale unmanned aerial vehicle aerial photographing target tracking device, which comprises:
The acquisition module is used for acquiring the aerial video of the unmanned aerial vehicle;
The processing module is used for inputting an initial frame and a current frame of the unmanned aerial vehicle aerial video into a template branch and a search branch of a Siamese (twin) tracking network constructed based on a G-ResNet network, and outputting three groups of first weighted feature maps and second weighted feature maps from the three convolution blocks layer2, layer3 and layer4 of the G-ResNet network, respectively, wherein the G-ResNet network is obtained by replacing the 3×3 convolution kernel of the residual module of each Bottleneck in the ResNet network with a plurality of convolution layer groups of the same topological structure stacked in parallel, and adding a dual multi-scale attention module behind each Bottleneck;
and the output module is used for carrying out weighted fusion on the three groups of first weighted feature maps and second weighted feature maps by utilizing a plurality of anchor-free region proposal networks, and tracking the target of the current frame according to the prediction box and predicted position in the weighted fusion result.
The invention also provides an electronic device comprising a processor and a memory, wherein at least one instruction, at least one program, code set or instruction set is stored in the memory, and the at least one instruction, the at least one program, code set or instruction set is loaded and executed by the processor to implement the multi-scale unmanned aerial vehicle aerial photographing target tracking method according to any one of the first aspect.
The present invention also provides a computer readable storage medium having stored therein at least one instruction, at least one program, code set or instruction set, the at least one instruction, at least one program, code set or instruction set being loaded and executed by a processor to implement the multi-scale unmanned aerial vehicle aerial target tracking method according to any of the first aspects.
The embodiment of the invention provides a multi-scale unmanned aerial vehicle aerial photographing target tracking method and device, which have the following beneficial effects compared with the prior art:
1) Using the split-transform-merge idea of subspace learning, a group residual network G-ResNet is designed that can extract deep semantic features and diversified features of the target, effectively cope with challenges such as target appearance change and motion blur, and enhance the representation capability for small targets.
2) A dual multi-scale attention module DMSAM is designed: feature maps are grouped to extract target feature information at different scales, dual attention is then used to extract local features of the target in the spatial and channel dimensions respectively and to establish the global dependency between the target and the background, and finally information exchange between different channels is established, enhancing the scale adaptation capability and anti-interference capability of the invention.
3) A region proposal module AF-RPN based on an anchor-free strategy is provided to replace predefined anchor boxes, distinguishing target from background pixel by pixel and achieving adaptive perception of the target scale. Multiple AF-RPNs are cascaded on the G-ResNet, so that complementary detail information and semantic information are effectively utilized to achieve robust tracking and accurate positioning of the tracked target. Meanwhile, the speed reaches 40.5 FPS, meeting the real-time requirement.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The present specification provides method operational steps as described in the examples or flowcharts, but may include more or fewer operational steps based on conventional or non-inventive labor. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment).
At present, target tracking algorithms are mainly divided into tracking algorithms based on correlation filtering and tracking algorithms based on deep learning. Correlation filter tracking algorithms use a correlation filter from the signal processing field to calculate the similarity between the template and the search image, and exploit the Fourier transform to accelerate the computation in the frequency domain, greatly reducing the amount of computation and raising the running speed to hundreds of frames per second. However, most correlation filtering algorithms represent the tracking target with traditional feature extraction algorithms, so their robustness and accuracy are insufficient and they cannot effectively handle target tracking tasks in complex scenes.
Due to its great potential in both precision and speed, the Siamese (twin) tracking algorithm has gradually become the mainstream algorithm in the field of target tracking, and most subsequent tracking algorithms are researched based on the Siamese structure. The working principle of the Siamese tracking algorithm can be expressed as formula (1); the algorithm mainly consists of a feature extraction part $\varphi$, a similarity calculation part (the cross-correlation $\star$), and a tracking result generation part.

$$f(z, x) = \varphi(z) \star \varphi(x) + b \cdot \mathbb{1} \quad (1)$$

In the formula, $f(z, x)$ is the similarity response map; $\varphi$ is the feature extraction part; $\star$ is the cross-correlation operation; $b$ is the deviation of each position; $\mathbb{1}$ is the identity (all-ones) matrix.
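The cross-correlation in formula (1) can be sketched in PyTorch by using the template features as a convolution kernel slid over the search features; the tensor sizes below are illustrative, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def xcorr(z_feat: torch.Tensor, x_feat: torch.Tensor) -> torch.Tensor:
    """Cross-correlate template features (used as the kernel) over search
    features, as in formula (1): f(z, x) = phi(z) * phi(x)."""
    # z_feat: (C, Hz, Wz) template features; x_feat: (C, Hx, Wx) search features
    kernel = z_feat.unsqueeze(0)   # (1, C, Hz, Wz): one output channel
    search = x_feat.unsqueeze(0)   # (1, C, Hx, Wx): batch of one
    return F.conv2d(search, kernel).squeeze(0).squeeze(0)  # (Hx-Hz+1, Wx-Wz+1)

z = torch.randn(256, 6, 6)    # template feature map
x = torch.randn(256, 22, 22)  # search feature map
response = xcorr(z, x)
print(response.shape)  # torch.Size([17, 17])
```

The position of the maximum of `response` is then taken as the predicted target position, as described in part 3) below.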
1) Feature extraction part: features are extracted by a Siamese neural network whose two branches are the template branch and the search branch. The template branch takes the target image $z$ of the initial frame as the template and outputs the template feature map $\varphi(z)$; the search branch takes the search image $x$ of a subsequent frame and outputs the search feature map $\varphi(x)$.
2) Similarity calculation part ($\star$): integrates the feature information of the feature maps of the two branches, calculates the similarity between the search feature map and the template feature map, and generates the similarity response map $f(z, x)$.
3) Tracking result generation part: predicts the target position on the search image from the obtained response map; the position of the maximum response is generally taken as the predicted target position, after which target scale estimation and bounding-box regression are performed.
The process of on-line tracking by the twin tracking algorithm mainly comprises the following steps:
Inputting the video sequence into the feature extraction part frame by frame;
If the frame is the first frame, extracting target features by a template branch to serve as template features;
if the frame is not the first frame, the searching branch extracts the target feature of the current frame as the searching feature;
the similarity calculation part calculates the similarity between the feature images and generates a response image;
The tracking result generating part predicts the target position in the current frame by using the similarity response diagram;
Repeating the steps 3-5 until the last frame of the video sequence.
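The online tracking steps above can be sketched as a generic loop; all function names here (`extract_template`, `extract_search`, `correlate`, `locate`) are illustrative stand-ins, not functions defined by the patent:

```python
def track_sequence(frames, extract_template, extract_search, correlate, locate):
    """Run the Siamese online-tracking loop over a video sequence."""
    template = None
    positions = []
    for i, frame in enumerate(frames):
        if i == 0:
            # First frame: the template branch extracts the template features.
            template = extract_template(frame)
            continue
        # Subsequent frames: the search branch extracts the search features.
        search = extract_search(frame)
        # Similarity calculation produces a response map.
        response = correlate(template, search)
        # The tracking result generator predicts the position from the response.
        positions.append(locate(response))
    return positions
```

Passing in real feature extractors and the cross-correlation from formula (1) would turn this skeleton into a working tracker.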
Fig. 1 is a flowchart of a multi-scale unmanned aerial vehicle aerial target tracking method provided by an embodiment of the present invention, where, as shown in fig. 1, the method includes:
Step 101, acquiring an unmanned aerial vehicle aerial video;
Step 102, inputting an initial frame and a current frame of the unmanned aerial vehicle aerial video into a template branch and a search branch of a Siamese (twin) tracking network constructed based on a G-ResNet network, and outputting three groups of first weighted feature maps and second weighted feature maps from layer2, layer3 and layer4 of the G-ResNet network, respectively, wherein the G-ResNet network is obtained by replacing the 3×3 convolution kernel of the residual module of each Bottleneck in the ResNet network with a plurality of convolution layer groups of the same topological structure stacked in parallel, and adding a dual multi-scale attention module behind each Bottleneck;
Step 103, carrying out weighted fusion on the three groups of first weighted feature maps and second weighted feature maps by utilizing a plurality of anchor-free region proposal networks, and tracking the target of the current frame according to the prediction boxes and predicted positions in the weighted fusion result.
Fig. 2 is a network model diagram of the unmanned aerial vehicle target tracking method based on the dual multi-scale attention module. As shown in fig. 2, first, a group residual network (Group Residual Network, G-ResNet) is designed: convolution blocks with the same topology are stacked in parallel to extract diversified features of the target, enhancing the characterization capability for the tracked target without increasing the network depth. Second, to better screen features, a dual multi-scale attention module (Dual Multi Scale Attention Module, DMSAM) is used to extract multi-scale feature information of the target and suppress interference information in both the channel and spatial dimensions. In the final tracking box generation stage, multiple anchor-free region proposal networks (Anchor Free Region Proposal Network, AF-RPN) adaptively perceive the scale change of the target, effectively addressing the scale variation problem. Experiments show that the method copes more effectively with problems such as scale change, small targets, motion blur and partial occlusion, improves the tracking effect on aerial targets, and reaches a speed of 40.5 FPS, meeting the real-time requirement.
In one possible embodiment, replacing the 3×3 convolution kernel of the residual module of each Bottleneck in the ResNet network with a plurality of convolution layer groups of the same topological structure stacked in parallel comprises:
In layer1, the 3×3 convolution kernel of 64 channels in the residual modules of the 3 Bottlenecks is divided, by group convolution, into 32 parallel stacked groups of 3×3 convolution kernels of 4 channels each;
In layer2, the 3×3 convolution kernels of 128 channels in the residual modules of the 4 Bottlenecks are divided, by group convolution, into 32 parallel stacked groups of 3×3 convolution kernels of 8 channels each;
In layer3, the 3×3 convolution kernels of 256 channels in the residual modules of the 6 Bottlenecks are divided, by group convolution, into 32 parallel stacked groups of 3×3 convolution kernels of 16 channels each;
in layer4, the 3×3 convolution kernels of 512 channels in the residual modules of the 3 Bottlenecks are divided, by group convolution, into 32 parallel stacked groups of 3×3 convolution kernels of 32 channels each.
In the embodiment provided by the invention, the cardinality is increased on ResNet-50, which has a deeper network layer number, to improve network performance. Increasing the cardinality of the network improves its feature description capability more effectively than increasing the number of network layers, while not increasing the number of network parameters. The design follows the split-transform-merge concept. As shown in FIG. 3, which gives the replacement example for layer1, the 3×3 convolution in the residual block is the main extraction part of the feature information, so the 3×3 convolution in the residual block is replaced by multiple convolutions of the same topology stacked in parallel. In ordinary convolution, one channel of the output feature map needs all channels of the input feature map to participate in the calculation. In the parallel stacking operation, group convolution (Group convolution) divides the 64-channel 3×3 convolution into 32 groups of 4-channel 3×3 convolutions. Different convolution groups can be regarded as different subspaces; the feature information learned by each subspace differs in emphasis, i.e., diversified feature information of the target is extracted.
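The parameter saving of group convolution can be illustrated in PyTorch; the 128-channel width below is an assumption for the demonstration (32 groups of 4 channels), not a value fixed by the patent:

```python
import torch
import torch.nn as nn

# A plain 3x3 convolution versus the same convolution split into 32
# parallel groups, as in the split-transform-merge design of G-ResNet.
plain = nn.Conv2d(128, 128, kernel_size=3, padding=1)
grouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32)

plain_params = sum(p.numel() for p in plain.parameters())
grouped_params = sum(p.numel() for p in grouped.parameters())
print(plain_params, grouped_params)  # 147584 4736: weights shrink by a factor of 32

x = torch.randn(1, 128, 31, 31)
# Same output size, far fewer parameters: each group sees only 4 input channels.
assert grouped(x).shape == plain(x).shape
```

Each of the 32 groups convolves only its own 4-channel slice of the input, which is exactly why the groups behave as independent subspaces.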
In one possible implementation, three sets of first and second weighted feature maps are output from three convolution blocks of layer2, layer3, and layer4, respectively, of the G-ResNet network, including:
Extracting, through the dual multi-scale attention module, the first feature map and the second feature map output by the first Bottleneck of layer2, layer3 and layer4 of the template branch and of the search branch, respectively;
Grouping the first feature map and the second feature map respectively to obtain a plurality of grouping feature maps corresponding to the first feature map and the second feature map respectively;
Decomposing each grouping feature map into a first sub-feature map and a second sub-feature map;
processing the first sub-feature map with the position attention module and the second sub-feature map with the channel attention module to obtain, respectively, a third sub-feature map with a position attention response and a fourth sub-feature map with a channel attention response;
Performing channel fusion on the third sub-feature map and the fourth sub-feature map to obtain a fifth sub-feature map corresponding to the grouping feature map;
Acquiring a plurality of fifth sub-feature graphs corresponding to the plurality of grouping feature graphs;
shuffling the plurality of fifth sub-feature maps to obtain the weighted feature map output by the first Bottleneck of the template branch and of the search branch;
The weighted feature maps output from the first Bottleneck of the template branch and the search branch are sequentially propagated backward, and the first weighted feature map and the second weighted feature map are output from the last Bottleneck of layer2, layer3, and layer4, respectively.
In the embodiment provided by the invention, the attention module can adaptively allocate weights and selectively screen feature map information, helping the network attend better to the target of interest and effectively making up for the deficiency of G-ResNet. Thus, to enhance the discrimination capability of the present invention, a dual multi-scale attention module (DMSAM) is introduced on G-ResNet. As shown in FIG. 4, to let the network learn feature information at different scales, the DMSAM first extracts and groups features of various scales, then uses the position and channel attention modules in parallel to adaptively capture local features and global dependencies, and finally fuses and shuffles the feature maps of all channels to strengthen the information exchange among different channels.
First assume that the input feature map is $X \in \mathbb{R}^{C \times H \times W}$, where $C$, $H$ and $W$ represent the number of channels, height and width of the feature map, respectively. To reduce the calculation cost, $X$ is divided along the channel dimension into $G$ groups of sub-feature maps $X_k \in \mathbb{R}^{C/G \times H \times W}$, $k = 1, \ldots, G$. Because the sub-feature maps are divided by channel, each sub-feature map can capture specific semantic information during training. Each $X_k$ is then split into two parts $X_{k1}, X_{k2} \in \mathbb{R}^{C/2G \times H \times W}$: one uses channel attention to capture the interrelationship between channels, and the other uses position attention to find the spatial relationship between features. Thus, through the weight allocation of the attention modules, the network knows better what to pay attention to and where attention is meaningful.
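The grouping and splitting step above can be sketched with basic tensor operations; the sizes here are illustrative choices, not values from the patent:

```python
import torch

# Split the input feature map X (C x H x W) into G channel groups, then
# halve each group into the position-attention part X_k1 and the
# channel-attention part X_k2, as in the DMSAM preprocessing step.
C, H, W, G = 256, 25, 25, 8
X = torch.randn(C, H, W)
groups = X.chunk(G, dim=0)        # G sub-feature maps of C/G channels each
Xk = groups[0]                    # one group: (C/G, H, W)
Xk1, Xk2 = Xk.chunk(2, dim=0)     # halves: (C/2G, H, W) each
print(Xk.shape, Xk1.shape)  # torch.Size([32, 25, 25]) torch.Size([16, 25, 25])
```

`Xk1` then feeds the position attention branch and `Xk2` the channel attention branch described next.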
In one possible embodiment, the expression of the position attention response is:

$$X'_{k1} = \sigma(W_1 \cdot IN(X_{k1}) + b_1) \cdot X_{k1} \quad (2)$$

wherein $X_{k1}$ represents the first sub-feature map, $IN(X_{k1})$ represents the spatial information statistics of $X_{k1}$ completed using instance normalization, $W_1$ and $b_1$ are parameters for reinforcing the representation of $IN(X_{k1})$, and $\sigma$ is the sigmoid nonlinear activation function.
In the embodiment provided by the invention, objects similar to the tracking target are always present during unmanned aerial vehicle tracking, so the feature map contains the feature information of the tracking target as well as that of similar objects. Position attention enhances the discrimination of similar objects and gives a greater degree of attention to the position of the target. The present invention uses instance normalization (Instance Normalization, IN) to complete the spatial information statistics on $X_{k1}$; the final position attention response $X'_{k1}$ is obtained from formula (3):

$$X'_{k1} = \sigma(W_1 \cdot IN(X_{k1}) + b_1) \cdot X_{k1} \quad (3)$$

wherein $W_1$ and $b_1$ strengthen the representation capability of $IN(X_{k1})$. The weight the position attention response assigns to each position of the feature map effectively suppresses interference from similar objects and makes the network focus clearly on where on the image the target is.
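A minimal sketch of the position attention of formula (3); the assumption that $W_1$ and $b_1$ are learnable per-channel scale and shift parameters is mine, since the patent does not fix their exact form:

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Sketch of formula (3): X'_k1 = sigmoid(W1 * IN(X_k1) + b1) * X_k1."""
    def __init__(self, channels: int):
        super().__init__()
        self.inorm = nn.InstanceNorm2d(channels)           # spatial statistics IN(.)
        self.w1 = nn.Parameter(torch.ones(1, channels, 1, 1))   # assumed scale W1
        self.b1 = nn.Parameter(torch.zeros(1, channels, 1, 1))  # assumed shift b1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.w1 * self.inorm(x) + self.b1)
        return gate * x  # re-weight every spatial position of X_k1

pa = PositionAttention(16)
out = pa(torch.randn(2, 16, 25, 25))
print(out.shape)  # torch.Size([2, 16, 25, 25])
```

The sigmoid gate keeps the output the same shape as the input, so the module drops into the branch without changing the feature map size.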
In one possible implementation, the channel attention response is expressed as:

$$s = F_{gap}(X_{k2}) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} X_{k2}(i,j) \quad (4)$$

$$X'_{k2} = \sigma(W_2 \cdot s + b_2) \cdot X_{k2} \quad (5)$$

wherein $H$ and $W$ represent the height and width of the second sub-feature map, respectively; $X_{k2}$ represents the second sub-feature map; $F_{gap}$ represents the global average pooling function; $W_2$ and $b_2$ perform the scaling and shifting operations on $s$; and $\sigma$ represents the sigmoid nonlinear activation function.
Different channels on the feature map of a deep network represent different semantic information. The process by which channel attention allocates weights can be seen as selecting semantic attributes for the different channels. The present invention uses global average pooling (GAP) to compress the feature layer on each channel of $X_{k2}$ to obtain the result $s$:

$$s = F_{gap}(X_{k2}) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} X_{k2}(i,j) \quad (6)$$

To learn the nonlinear relationship between channels, $s$ then passes through the sigmoid nonlinear activation function $\sigma$ to obtain weight coefficients, adaptively guiding the network to select the appropriate feature maps; the channel attention response $X'_{k2}$ is obtained from formula (7):

$$X'_{k2} = \sigma(W_2 \cdot s + b_2) \cdot X_{k2} \quad (7)$$

wherein $W_2$ and $b_2$ perform the scaling and shifting operations on $s$. Weights are allocated to the feature maps according to their different semantic information, with the largest weight on the channel where the target is located. In the cross-correlation operation the responses on the other channels are suppressed, making clear what class the network should attend to.
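A minimal sketch of the channel attention of formulas (6)-(7); as with the position branch, the per-channel scale $W_2$ and shift $b_2$ parameters are an assumed form:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of formulas (6)-(7): s = GAP(X_k2), then
    X'_k2 = sigmoid(W2 * s + b2) * X_k2."""
    def __init__(self, channels: int):
        super().__init__()
        self.w2 = nn.Parameter(torch.ones(1, channels, 1, 1))   # assumed scale W2
        self.b2 = nn.Parameter(torch.zeros(1, channels, 1, 1))  # assumed shift b2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = x.mean(dim=(2, 3), keepdim=True)          # global average pooling, eq. (6)
        gate = torch.sigmoid(self.w2 * s + self.b2)   # one weight per channel, eq. (7)
        return gate * x

ca = ChannelAttention(16)
out = ca(torch.randn(2, 16, 25, 25))
print(out.shape)  # torch.Size([2, 16, 25, 25])
```

Unlike the position branch, the gate here is constant over each channel's spatial extent: it scales whole channels up or down rather than individual positions.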
Before shuffling, the attention responses $X'_{k1}$ and $X'_{k2}$ are connected to obtain a new sub-feature map $X'_k$; all new sub-feature maps are stacked by channel and merged into the feature map $X'$, as shown in formula (8). The operation of formula (9) then applies channel shuffling (channel_shuffle), as shown in FIG. 5: $X'$ is first expanded into a four-dimensional matrix $(G, C/G, H, W)$; then, leaving the $H$ and $W$ dimensions unchanged, the $G$ and $C/G$ dimensions are transposed; finally the matrix dimensions are compressed back to obtain the output feature map. The shuffling operation effectively integrates the feature information on each channel and strengthens the information exchange between channels.

$$X' = [X'_1, X'_2, \ldots, X'_G], \quad X'_k = [X'_{k1}, X'_{k2}] \quad (8)$$

$$X_{out} = \mathrm{reshape}\Big(\mathrm{transpose}\big(\mathrm{reshape}(X', (G, C/G, H, W)), (1, 0, 2, 3)\big), (C, H, W)\Big) \quad (9)$$
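The expand-transpose-compress operation of formula (9) can be sketched directly; the tiny 8-channel example makes the interleaving visible:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Formula (9): reshape (N, C, H, W) -> (N, G, C/G, H, W), transpose the
    group and channel axes, then flatten back, mixing channels across groups."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)  # expand into (G, C/G) axes
    x = x.transpose(1, 2).contiguous()        # swap G and C/G, keep H and W
    return x.view(n, c, h, w)                 # compress back to (N, C, H, W)

x = torch.arange(8, dtype=torch.float32).view(1, 8, 1, 1)  # channels 0..7
shuffled = channel_shuffle(x, groups=2)
print(shuffled.flatten().tolist())  # [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0]
```

Channels from the two groups end up interleaved, which is exactly the cross-group information exchange the shuffle is meant to create.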
In the DMSAM, target feature information of different scales is extracted from the group feature map, then the double attentions are used for respectively extracting local features of the targets in the channel and space dimensions, establishing global dependency relationship between the targets and the background, finally establishing information communication between different channels, increasing the difference between the targets and interference information, and improving the scale adaptability and discrimination capability of the invention.
In one possible implementation, using a plurality of anchor-free regional suggestion networks, the weighted fusion of the three sets of first weighted feature maps and the second weighted feature map includes:
An anchor-free RPN module is respectively arranged between the three convolution blocks layer2, layer3 and layer4 of the template branch and of the search branch of the G-ResNet network; the anchor-free RPN module comprises a classification branch and a regression branch, and the regression branch predicts the offset between a target pixel point and the real box;
respectively inputting the first weighted feature map and the second weighted feature map into the convolution networks of the regression branch and the classification branch of the anchor-free RPN module, the regression branch outputting two regression maps and the classification branch outputting two classification maps;
Performing a depth cross-correlation operation on the two regression maps output by the regression branch to obtain a regression result;
performing a depth cross-correlation operation on the two classification maps output by the classification branch to obtain a classification result;
acquiring the position of the maximum value of the classification result as the predicted position of the target;
and obtaining, from the regression result, the prediction bounding box corresponding to the predicted position as the target prediction box.
In one possible implementation manner, a conventional RPN module predefines a set of anchor boxes of different scales to perform scale estimation. The prior information of these anchor boxes is obtained by analyzing the video, which runs against the starting point of the tracking task, and the tracking performance is sensitive to the anchor box parameters, which must be set manually and carefully. Therefore, to remove the excessive dependence on target prior information, the adaptive estimation of the target scale is completed in the RPN module using an anchor-free strategy. In the RPN module based on the anchor-free strategy (AF-RPN), the bounding box regression branch no longer regresses the size (length, width, center point position) of an anchor box; instead it predicts the offsets $l, t, b, r$ between a target pixel point and the real box (ground truth). A conventional anchor-based method judges whether the target in an anchor box is a positive sample by calculating the area intersection over union (Intersection over Union, IoU) of the anchor box and the real box. The anchor-free strategy therefore requires a new positive and negative sample discrimination method: the pixel points of the similarity response map are mapped back into the search image; those falling outside ellipse E1 are negative samples, and those falling inside ellipse E2 are positive samples, as shown in FIG. 6.
The classification result and the regression result are obtained by the depth cross-correlation of formula (10):

$$S_{w \times h \times 2} = [\varphi(x)]_{cls} \star [\varphi(z)]_{cls}, \qquad R_{w \times h \times 4} = [\varphi(x)]_{reg} \star [\varphi(z)]_{reg} \quad (10)$$

wherein $S$ and $R$ are the classification result and the regression result; $\star$ represents the depth cross-correlation operation; $\varphi$ is the feature extraction network; and $w$, $h$ and $c$ are the width, height and number of channels of the feature map.
The maximum value on the classification result $S$ is found; its position is the predicted position of the target, and the corresponding position in the regression result holds a predicted bounding box, which is taken as the prediction box of the target.
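This decoding step can be sketched as follows; the `decode_prediction` name, the `stride` mapping from response map coordinates back to search-image pixels, and the (l, t, r, b) channel order are illustrative assumptions, not values fixed by the patent:

```python
import torch

def decode_prediction(S: torch.Tensor, R: torch.Tensor, stride: int = 8):
    """S: (H, W) foreground scores; R: (4, H, W) offsets l, t, r, b.
    Returns the predicted box (x1, y1, x2, y2) at the classification maximum."""
    idx = torch.argmax(S)                    # flat index of the maximum score
    y, x = divmod(idx.item(), S.shape[1])    # predicted position on the map
    cx, cy = x * stride, y * stride          # map back to search-image pixels
    l, t, r, b = R[:, y, x].tolist()         # offsets to the box sides
    return (cx - l, cy - t, cx + r, cy + b)  # prediction box

S = torch.zeros(17, 17); S[5, 9] = 1.0       # toy classification map
R = torch.ones(4, 17, 17) * 10.0             # toy regression map
print(decode_prediction(S, R))  # (62.0, 30.0, 82.0, 50.0)
```

Because the box is read out per pixel rather than from a predefined anchor, the predicted size adapts freely to the target scale.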
The invention also provides a multi-scale unmanned aerial vehicle aerial photographing target tracking device 200, as shown in fig. 7, comprising:
an acquisition module 201, configured to acquire an aerial video of the unmanned aerial vehicle;
The processing module 202 is configured to input an initial frame and a current frame of the unmanned aerial vehicle aerial video into a template branch and a search branch of a Siamese (twin) tracking network constructed based on a G-ResNet network, and output three groups of first weighted feature maps and second weighted feature maps from the three convolution blocks layer2, layer3 and layer4 of the G-ResNet network, respectively, wherein the G-ResNet network is obtained by replacing the 3×3 convolution kernel of the residual module of each Bottleneck in the ResNet network with a plurality of convolution layer groups of the same topological structure stacked in parallel, and adding a dual multi-scale attention module behind each Bottleneck;
And the output module 203 is configured to perform weighted fusion on the three groups of first weighted feature maps and second weighted feature maps by using a plurality of anchor-free region proposal networks, and track the target of the current frame according to the prediction box and predicted position in the weighted fusion result.
In yet another embodiment of the present invention, there is further provided an apparatus, including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the multi-scale unmanned aerial vehicle aerial target tracking method described in the embodiments of the present invention.
In yet another embodiment of the present invention, a computer readable storage medium is provided, where at least one instruction, at least one section of program, a code set, or an instruction set is stored, where the at least one instruction, the at least one section of program, the code set, or the instruction set is loaded and executed by a processor to implement the multi-scale unmanned aerial vehicle aerial target tracking method described in the embodiments of the present invention.
In the above embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.