
CN111626208B - Method and device for detecting small objects - Google Patents


Info

Publication number
CN111626208B
CN111626208B (application CN202010461384.2A)
Authority
CN
China
Prior art keywords
detection model
training
small target
image
network
Prior art date
Legal status
Active
Application number
CN202010461384.2A
Other languages
Chinese (zh)
Other versions
CN111626208A (en)
Inventor
何刚
Current Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd filed Critical Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority to CN202010461384.2A (CN111626208B)
Publication of CN111626208A
Priority to JP2021051677A (JP7262503B2)
Priority to KR1020210040639A (KR102523886B1)
Application granted
Publication of CN111626208B
Legal status: Active


Classifications

    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06F9/54 Interprogram communication
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T7/11 Region-based segmentation
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V20/63 Scene text, e.g. street names
    • G06V2201/07 Target detection
    • G06V2201/09 Recognition of logos

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose methods and apparatus for detecting small targets. One embodiment of the method comprises the following steps: acquiring an original image including a small target; reducing the original image to a low-resolution image; identifying a candidate region containing the small target from the low-resolution image using a lightweight segmentation network; and taking the region of the original image corresponding to the candidate region as a region of interest, running a pre-trained detection model on the region of interest, and determining the position of the small target in the original image. This embodiment designs a two-stage detection method: the region of interest is first located by the lightweight segmentation network, and the detection model is then run only within that region, which greatly reduces the amount of computation.

Description

Method and device for detecting small objects
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for detecting small objects.
Background
Target detection is an important research direction in the field of autonomous driving. The targets it mainly detects fall into two categories: stationary objects and moving objects. Stationary objects include traffic lights, traffic signs, lanes, and obstacles; moving objects include vehicles, pedestrians, and non-motor vehicles. Traffic sign detection provides rich and necessary navigation information for a driverless vehicle during driving, and is fundamental work of great significance.
In applications such as AR navigation, detecting the traffic signs of the current road section in real time and giving corresponding prompts to the user is of great significance. In vehicle-mounted video, the sizes of traffic signs span a wide range and include a large number of small targets (below 20 pixels). Detecting these small targets not only tests the detection algorithm but also requires the image to retain a high resolution, which is a severe demand on the limited computing capability of the vehicle.
To ensure timely traffic sign recognition, most existing schemes train a YOLO model on input images and predict the class of the traffic sign from the resulting prediction values, thereby completing recognition. The training network of the YOLO model is a CNN comprising seven convolutional layers C1-C7 and two fully connected layers, so recognition can be completed very quickly. However, traffic signs usually occupy only a small part of the captured original image, and the feature map shrinks with every convolutional layer it passes through, so after multiple convolutional layers the conventional YOLO method easily loses the features of smaller objects, which hurts the success rate of traffic sign recognition.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatus for detecting small targets.
In a first aspect, embodiments of the present disclosure provide a method for detecting a small target, comprising: acquiring an original image including a small target; reducing the original image to a low-resolution image; identifying a candidate region containing the small target from the low-resolution image using a lightweight segmentation network; and taking the region of the original image corresponding to the candidate region as a region of interest, running a pre-trained detection model on the region of interest, and determining the position of the small target in the original image.
In some embodiments, the detection model is trained by the following method: determining a network structure of an initial detection model and initializing its network parameters; acquiring a training sample set, wherein a training sample comprises a sample image and annotation information characterizing the position of a small target in the sample image; enhancing the training samples by at least one of the following: copying, multi-scale change, and editing; taking the sample images and the annotation information of the training samples in the enhanced training sample set as the input and the expected output of the initial detection model, respectively, and training the initial detection model using a machine learning method; and determining the trained initial detection model as the pre-trained detection model.
In some embodiments, the training samples are edited by: cutting a small target out of the sample image; and pasting the small target, after scaling and/or rotation, to a random other position in the sample image to obtain a new sample image.
In some embodiments, the method further comprises: when making training samples for the segmentation network, setting the pixels inside the rectangular boxes originally annotated for the detection task as positive samples and the pixels outside as negative samples; expanding the rectangular boxes of small targets whose length and width are below a predetermined number of pixels; and setting the pixels inside the expanded rectangular boxes as positive samples.
In some embodiments, the detection model is a deep neural network.
In some embodiments, an attention module is introduced after the feature fusion of each prediction layer to learn an appropriate weight for the features of the different channels.
In a second aspect, embodiments of the present disclosure provide an apparatus for detecting a small target, comprising: an acquisition unit configured to acquire an original image including a small target; a reduction unit configured to reduce the original image to a low-resolution image; a first detection unit configured to identify a candidate region containing the small target from the low-resolution image using a lightweight segmentation network; and a second detection unit configured to take the region of the original image corresponding to the candidate region as a region of interest, run a pre-trained detection model on the region of interest, and determine the position of the small target in the original image.
In some embodiments, the apparatus further comprises a training unit configured to: determine a network structure of an initial detection model and initialize its network parameters; acquire a training sample set, wherein a training sample comprises a sample image and annotation information characterizing the position of a small target in the sample image; enhance the training samples by at least one of the following: copying, multi-scale change, and editing; take the sample images and the annotation information of the training samples in the enhanced training sample set as the input and the expected output of the initial detection model, respectively, and train the initial detection model using a machine learning method; and determine the trained initial detection model as the pre-trained detection model.
In some embodiments, the training unit is further configured to: cut a small target out of the sample image; and paste the small target, after scaling and/or rotation, to a random other position in the sample image to obtain a new sample image.
In some embodiments, the first detection unit is further configured to: when making training samples for the segmentation network, set the pixels inside the rectangular boxes originally annotated for the detection task as positive samples and the pixels outside as negative samples; expand the rectangular boxes of small targets whose length and width are below a predetermined number of pixels; and set the pixels inside the expanded rectangular boxes as positive samples.
In some embodiments, the detection model is a deep neural network.
In some embodiments, an attention module is introduced after the feature fusion of each prediction layer to learn an appropriate weight for the features of the different channels.
In a third aspect, embodiments of the present disclosure provide an electronic device for detecting a small target, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as in any of the first aspects.
In a fourth aspect, embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any of the first aspects.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the first aspects.
The method and apparatus for detecting a small target address the problem from three aspects: the training method, the model structure, and two-stage detection. The training method and the model structure mainly improve the model's ability to detect small targets, while two-stage detection reduces the computation spent on irrelevant regions of the picture and thereby increases the running speed.
The invention can provide a real-time traffic sign detection algorithm for AR navigation projects, performs well on small-target detection, and can improve the navigation experience of users.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method for detecting small objects according to the present disclosure;
FIG. 3 is a schematic illustration of one application scenario of a method for detecting small objects according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a method for detecting small objects according to the present disclosure;
FIG. 5 is a network block diagram of a detection model for a method of detecting small objects according to the present disclosure;
FIG. 6 is a schematic structural view of one embodiment of an apparatus for detecting small objects according to the present disclosure;
fig. 7 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the method for detecting small targets or the apparatus for detecting small targets of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include a vehicle 101 and a traffic sign 102.
The vehicle 101 may be a general motor vehicle or an unmanned vehicle. The vehicle 101 may have a controller 1011, a network 1012, and sensors 1013 installed therein. Network 1012 is the medium used to provide a communication link between controller 1011 and sensor 1013. Network 1012 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A controller (also known as an onboard brain) 1011 is responsible for intelligent control of the vehicle 101. The controller 1011 may be a separately provided controller such as a programmable logic controller (Programmable Logic Controller, PLC), a single chip microcomputer, an industrial controller, or the like; the device can also be equipment consisting of other electronic devices with input/output ports and operation control functions; but also a computer device installed with a vehicle driving control type application. The controller is provided with a trained segmentation network and a detection model.
The sensor 1013 may be any of various types of sensors, such as a video camera, a gravity sensor, a wheel speed sensor, a temperature sensor, a humidity sensor, a laser radar, a millimeter-wave radar, and the like. In some cases, a GNSS (Global Navigation Satellite System) device, a SINS (Strap-down Inertial Navigation System), and so on may also be installed in the vehicle 101.
The vehicle 101 captures a traffic sign 102 during travel. Whether the image is taken from far away or close up, the traffic sign in the image is a small target.
The vehicle 101 delivers the captured original image including the traffic sign to the controller for recognition to determine the location of the traffic sign. OCR can also be performed to recognize the content of the traffic sign, which is then output in the form of voice or text.
It should be noted that, the method for detecting a small target provided in the embodiment of the present application is generally performed by the controller 1011, and accordingly, the device for detecting a small target is generally disposed in the controller 1011.
It should be understood that the numbers of controllers, networks, and sensors in fig. 1 are merely illustrative. There may be any number of controllers, networks, and sensors, as required by the implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for detecting small objects according to the present disclosure is shown. The method for detecting a small target comprises the following steps:
In step 201, an original image including a small target is acquired.
In this embodiment, the execution body of the method for detecting a small target (e.g., the controller shown in fig. 1) may capture a front-view image through the vehicle-mounted camera, and the acquired original image includes a small target. A small target refers to an image of a target object whose length and width are below a predetermined number of pixels (e.g., 20).
Step 202, reducing the original image into a low resolution image.
In this embodiment, the length and width of the original image may each be divided by 4 (or another factor) to obtain a low-resolution image. The aspect ratio remains unchanged during shrinking.
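As a concrete illustration, a minimal Python sketch of this reduction step follows (OpenCV and the factor of 4 match the example above; the function name is an assumption for illustration):

```python
import cv2

def reduce_image(original, factor=4):
    """Shrink the image by the same factor in both directions,
    preserving the aspect ratio (factor=4 per the example above)."""
    h, w = original.shape[:2]
    return cv2.resize(original, (w // factor, h // factor),
                      interpolation=cv2.INTER_AREA)
```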
In step 203, a lightweight segmentation network is used to identify candidate regions from the low resolution image that include small objects.
In this embodiment, since the first detection stage only needs to locate the approximate position where a target may exist and does not need an accurate bounding box, it is implemented with a lightweight segmentation network; points in the heatmap finally output by the first stage that exceed a certain threshold are regarded as points where a target is suspected to exist. A segmentation network similar to U-Net may be used, with ShuffleNet as the backbone network for light weight.
When making training samples for the segmentation network, the pixels inside the rectangular boxes originally annotated for the detection task are set as positive samples, and the pixels outside are set as negative samples. Because the image is scaled in the length and width directions, to ensure recall on small targets, the rectangular boxes of targets whose length and width are below a predetermined value (e.g., 20 pixels) are doubled in size when the training samples are made, and the pixels inside the expanded boxes are then set as positive samples.
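A minimal sketch of how such segmentation labels might be built from detection boxes is given below; the 20-pixel cutoff and the doubling of small boxes follow the text above, while the function, box format, and array layout are illustrative assumptions:

```python
import numpy as np

def make_segmentation_labels(image_shape, boxes, small_side=20):
    """Build a binary label mask from detection boxes: pixels inside a
    box are positive (1), pixels outside are negative (0). Boxes whose
    width and height are both below `small_side` are doubled in size
    around their center before being painted, protecting recall on
    small targets once the image is downscaled."""
    h, w = image_shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        bw, bh = x2 - x1, y2 - y1
        if bw < small_side and bh < small_side:
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            x1, x2 = cx - bw, cx + bw   # doubled width
            y1, y2 = cy - bh, cy + bh   # doubled height
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        mask[y1:y2, x1:x2] = 1
    return mask
```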
Step 204, taking the region of the original image corresponding to the candidate region as the region of interest, running a pre-trained detection model on the region of interest, and determining the position of the small target in the original image.
In this embodiment, after noise points are filtered out of the segmentation network's output, the minimum enclosing rectangle surrounding all remaining suspected target points is formed, and the region corresponding to this rectangle in the unscaled high-resolution image is taken as the region of interest. The detection model is then run on the region of interest, so only part of the area of the high-resolution picture needs to be processed, which reduces the amount of computation.
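One way this stage-two region of interest could be derived from the stage-one heatmap is sketched below; the threshold value and the crude noise filter are assumptions, since the patent does not specify how noise points are filtered:

```python
import numpy as np

def heatmap_to_roi(heatmap, scale=4, threshold=0.5, min_points=3):
    """Threshold the segmentation heatmap, drop isolated noise, and
    return the minimum enclosing rectangle of all remaining suspected
    target points, mapped back to original-image coordinates
    (scale = the downscale factor used in stage one)."""
    ys, xs = np.nonzero(heatmap > threshold)
    if len(xs) < min_points:   # crude noise filter: too few points
        return None
    x1, x2 = xs.min(), xs.max() + 1
    y1, y2 = ys.min(), ys.max() + 1
    # map the rectangle back onto the full-resolution image
    return (x1 * scale, y1 * scale, x2 * scale, y2 * scale)
```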
As described above, detecting small objects well requires keeping the picture resolution high, but a large picture multiplies the amount of computation, making real-time processing difficult in a vehicle environment. On the other hand, traffic signs occupy a small proportion of the picture; most of it is background, the computation spent on background regions accounts for a large share of the total, and processing the background at high resolution is time-consuming and pointless. The invention therefore adopts a two-stage detection scheme: the approximate positions of suspected targets are located on the low-resolution picture by a lightweight segmentation network, the minimum enclosing rectangle containing all suspected targets is then obtained, and finally the detection model is run on the high-resolution image block corresponding to that rectangle, reducing the amount of computation while preserving the detection rate on small targets.
After this two-stage processing, the average computation of the detection model falls to about 25% of the original, and the average computation of the two models combined is about 45% of the original.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for detecting a small target according to the present embodiment. In the application scenario of fig. 3, the vehicle captures a front-view image in real time while driving. The length and width of the captured original image are each divided by 4, shrinking it into a low-resolution image. The low-resolution image is input into the lightweight segmentation network, which identifies a candidate region containing a traffic sign. The region of the original image corresponding to the candidate region is then found and taken as the region of interest. The image of the region of interest is cropped out and input into the pre-trained detection model, which determines the specific position of the traffic sign in the original image, as shown by the dotted-line box.
The method provided by this embodiment of the disclosure reduces the amount of computation and improves recognition speed and accuracy through two-stage detection.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for detecting small objects is shown. The process 400 of the method for detecting small objects comprises the steps of:
step 401, determining a network structure of an initial detection model and initializing network parameters of the initial detection model.
In this embodiment, the electronic device on which the method for detecting small objects runs (e.g., the controller shown in FIG. 1) may train the detection model. The detection model may also be trained by a third-party server and then installed into the controller of the vehicle. The detection model is a neural network model and may be any existing neural network for target detection.
In some alternative implementations of this embodiment, the detection model is a deep neural network, such as a YOLO-series network. YOLO (You Only Look Once) is an object recognition and localization algorithm based on a deep neural network; its greatest strength is its high running speed, which makes it suitable for real-time systems. YOLO has since evolved to version 3 (YOLOv3), each new version improving on the previous one. In the original structural design of YOLOv3, the low-resolution feature map is upsampled and fused with the high-resolution feature map. However, this fusion occurs only on the high-resolution feature map, so features of different scales are not fused sufficiently.
To better fuse the features of different layers, the invention first selects the features downsampled 8, 16, and 32 times in the backbone network as base features. Then, to predict targets of different sizes, the prediction feature maps are sized at 8-, 16-, and 32-times downsampling of the picture, and the features of each prediction feature map come from the 3 base feature layers, fused after being downsampled or upsampled to a common size. Taking the prediction layer at 16-times downsampling as an example: its features come from the 3 base feature layers, so to unify them to the same size, the 8-times-downsampled base feature layer is downsampled once and the 32-times-downsampled base feature layer is upsampled once, and the two resulting feature maps are then fused with the 16-times-downsampled base feature layer.
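A PyTorch sketch of this fusion for the 16-times-downsampled prediction layer follows; concatenation with a 1x1 projection is an assumed fusion operator, since the text only states that the three resampled base features are fused:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseToStride16(nn.Module):
    """Fuse the three base feature maps (8x, 16x, and 32x downsampling)
    into the 16x prediction feature: the 8x map is downsampled once,
    the 32x map is upsampled once, and the results are fused with the
    16x map."""

    def __init__(self, c8, c16, c32, out_channels):
        super().__init__()
        self.down = nn.Conv2d(c8, c8, kernel_size=3, stride=2, padding=1)
        self.proj = nn.Conv2d(c8 + c16 + c32, out_channels, kernel_size=1)

    def forward(self, f8, f16, f32):
        f8_down = self.down(f8)                       # stride 8 -> 16
        if f8_down.shape[2:] != f16.shape[2:]:        # guard odd sizes
            f8_down = F.interpolate(f8_down, size=f16.shape[2:])
        f32_up = F.interpolate(f32, size=f16.shape[2:],
                               mode="nearest")        # stride 32 -> 16
        fused = torch.cat([f8_down, f16, f32_up], dim=1)
        return self.proj(fused)
```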
If features of different scales were simply fused, their proportions in the 3 prediction layers would be identical, and the features could not be used with different emphasis for different prediction targets. Therefore, after the features of each prediction layer are fused, an attention module is introduced to learn an appropriate weight for the features of each channel, so that each prediction layer can emphasize the fused features according to the characteristics of the targets it must predict. The network structure is shown in fig. 5. The way the parameters of the attention module are learned is prior art and is not described further.
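A squeeze-and-excitation style module is one common realization of such channel attention; the sketch below is an assumed concrete choice, since the text only specifies an attention module that learns one weight per channel:

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Placed after the feature fusion of each prediction layer: learns
    one weight per channel so each prediction layer can emphasize the
    fused features that matter for its own target scale."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))  # global average pooling
        return x * weights.view(b, c, 1, 1)    # reweight each channel
```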
The present disclosure may use YOLOv3 as the detection network. In such anchor-based detection methods, the design and assignment of anchors is very important: because few anchors can be matched to a small target, the model may learn small targets insufficiently and thus fail to detect them well. A dynamic anchor matching mechanism is therefore adopted: the IoU (Intersection over Union) threshold for matching anchors to a ground truth is selected adaptively according to the size of the ground truth, and the threshold is lowered for smaller targets so that more small targets participate in training, improving the model's performance on small-target detection. When training samples are made, the size of the target is known, and the appropriate IoU threshold is then selected based on that size.
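A minimal sketch of such dynamic IoU threshold selection follows; the concrete cutoffs and values are illustrative assumptions, as the patent states only that the threshold is lowered for smaller targets:

```python
def dynamic_iou_threshold(gt_w, gt_h, lo=0.2, hi=0.5, small=20, large=80):
    """Pick the anchor-matching IoU threshold from the ground-truth box
    size: small boxes get a lower threshold so more anchors match them
    and more small targets take part in training."""
    side = max(gt_w, gt_h)
    if side <= small:
        return lo
    if side >= large:
        return hi
    # linear ramp between the two cutoffs
    return lo + (hi - lo) * (side - small) / (large - small)
```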
Step 402, a training sample set is obtained.
In this embodiment, the training sample includes a sample image and annotation information for characterizing the location of small objects in the sample image.
Step 403, enhancing the training samples by at least one of the following: copying, multi-scale change, and editing.
In this embodiment, these are strategies for the problem that the training data contain too few small targets. On the one hand, pictures containing small targets are copied multiple times within the data set, directly increasing the number of small targets in the data. On the other hand, small targets are cut out of the pictures, scaled and rotated, and then randomly pasted to other positions of the images; this not only increases the number of small targets but also introduces more variation, enriching the distribution of the training data.
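The copy-paste augmentation could look like the sketch below; the scale and rotation ranges are assumptions, and overlap checks against existing targets are omitted for brevity:

```python
import random
import cv2

def paste_small_target(image, box):
    """Cut the small-target patch out of the image, randomly scale and
    rotate it, and paste it at a random new location. Returns the new
    image and the pasted box."""
    x1, y1, x2, y2 = box
    patch = image[y1:y2, x1:x2].copy()
    h, w = patch.shape[:2]
    # random scale and rotation around the patch center
    m = cv2.getRotationMatrix2D((w / 2, h / 2),
                                random.uniform(-15, 15),
                                random.uniform(0.8, 1.2))
    patch = cv2.warpAffine(patch, m, (w, h))
    # random destination inside the image
    H, W = image.shape[:2]
    nx, ny = random.randint(0, W - w), random.randint(0, H - h)
    out = image.copy()
    out[ny:ny + h, nx:nx + w] = patch
    return out, (nx, ny, nx + w, ny + h)
```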
Optionally, the training pictures are scaled to different sizes during training, which enriches the range of target scales in the original data set and lets the model adapt to detecting targets of different scales.
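For instance, a multi-scale training input might be produced as follows (the scale set is an assumption):

```python
import random
import cv2

def random_scale(image, scales=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Rescale a training picture by a randomly chosen factor; the
    returned factor must also be applied to the annotation boxes."""
    s = random.choice(scales)
    h, w = image.shape[:2]
    return cv2.resize(image, (int(w * s), int(h * s))), s
```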
And step 404, respectively taking sample images and labeling information in the training samples in the enhanced training sample set as input and expected output of an initial detection model, and training the initial detection model by using a machine learning method.
In this embodiment, the execution body may input the sample image of a training sample in the training sample set into the initial detection model to obtain the position information of the small target in the sample image, take the annotation information of the training sample as the expected output of the initial detection model, and train the initial detection model using a machine learning method. Specifically, the difference between the obtained position information and the annotation information in the training sample may first be calculated with a preset loss function; for example, the L2 norm may be used as the loss function. Then, based on the calculated difference, the network parameters of the initial detection model may be adjusted, and training ends when a preset end condition is met. The preset end conditions may include, but are not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset count; the calculated difference is smaller than a preset difference threshold.
Here, various implementations may be used to adjust the network parameters of the initial detection model based on the difference between the generated position information and the annotation information in the training sample. For example, the BP (Back Propagation) algorithm or the SGD (Stochastic Gradient Descent) algorithm may be employed.
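Putting the above together, a skeleton of this training procedure might look as follows; the optimizer, hyperparameters, and end condition are illustrative assumptions consistent with the options listed above:

```python
import torch

def train(model, loader, epochs=10, lr=1e-3, loss_eps=1e-4):
    """Train the initial detection model: predicted positions against
    annotated positions under an L2 (MSE) loss, parameters adjusted by
    backpropagation with SGD, stopping when the loss falls below a
    preset threshold or the epoch budget runs out."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.MSELoss()       # L2-norm loss on positions
    for epoch in range(epochs):
        for images, targets in loader:
            opt.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()              # backpropagation (BP)
            opt.step()                   # stochastic gradient descent
        if loss.item() < loss_eps:       # preset end condition
            break
    return model
```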
Step 405, determining the initial detection model obtained by training as a pre-trained detection model.
In this embodiment, the execution subject of the training step may determine the initial detection model trained in step 404 as a pre-trained detection model.
With further reference to fig. 6, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of an apparatus for detecting small objects, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 6, the apparatus 600 for detecting a small target of the present embodiment includes: an acquisition unit 601, a reduction unit 602, a first detection unit 603, and a second detection unit 604. Wherein the acquisition unit 601 is configured to acquire an original image including a small target; a reduction unit 602 configured to reduce an original image to a low resolution image; a first detection unit 603 configured to identify a candidate region including a small target from the low resolution image using a lightweight segmentation network; the second detection unit 604 is configured to determine the position of the small target in the original image by using the region of the original image corresponding to the candidate region as the region of interest and running a pre-trained detection model on the region of interest.
In this embodiment, specific processes of the acquiring unit 601, the reducing unit 602, the first detecting unit 603, and the second detecting unit 604 of the apparatus 600 for detecting a small target may refer to steps 201, 202, 203, and 204 in the corresponding embodiment of fig. 2.
In some optional implementations of this embodiment, the apparatus 600 further includes a training unit (not shown in the drawings) configured to: determine a network structure of an initial detection model and initialize its network parameters; acquire a training sample set, wherein a training sample comprises a sample image and annotation information characterizing the position of a small target in the sample image; enhance the training samples by at least one of the following: copying, multi-scale change, and editing; take the sample images and the annotation information of the training samples in the enhanced training sample set as the input and the expected output of the initial detection model, respectively, and train the initial detection model using a machine learning method; and determine the trained initial detection model as the pre-trained detection model.
In some optional implementations of this embodiment, the training unit is further configured to: cut a small target out of the sample image; and paste the small target, after scaling and/or rotation, to a random other position in the sample image to obtain a new sample image.
In some optional implementations of this embodiment, the first detection unit is further configured to: when making training samples for the segmentation network, set the pixels inside the rectangular boxes originally annotated for the detection task as positive samples and the pixels outside as negative samples; expand the rectangular boxes of small targets whose length and width are below a predetermined number of pixels; and set the pixels inside the expanded rectangular boxes as positive samples.
In some alternative implementations of the present embodiment, the detection model is a deep neural network.
In some alternative implementations of this embodiment, an attention module is introduced after the feature fusion of each prediction layer to learn an appropriate weight for the features of the different channels.
Referring now to fig. 7, a schematic diagram of an electronic device (e.g., the controller of fig. 1) 700 suitable for use in implementing embodiments of the present disclosure is shown. The controller illustrated in fig. 7 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 700 may include a processing means (e.g., a central processor, a graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage means 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
In general, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 shows an electronic device 700 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 7 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication device 709, or installed from storage 708, or installed from ROM 702. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 701. It should be noted that, the computer readable medium according to the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Whereas in embodiments of the present disclosure, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring an original image including a small target; reducing the original image to a low resolution image; identifying candidate areas comprising small targets from the low-resolution image by adopting a lightweight segmentation network; and taking the region of the original image corresponding to the candidate region as an interest region, running a pre-trained detection model on the interest region, and determining the position of the small target in the original image.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, for example described as: a processor includes an acquisition unit, a reduction unit, a first detection unit, and a second detection unit. The names of these units do not in some cases limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires an original image including a small target".
The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention referred to in this disclosure is not limited to the specific combination of features described above, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the inventive concept, for example embodiments formed by substituting the above features with technical features of similar function disclosed in the present disclosure (but not limited thereto).

Claims (14)

1. A method for detecting a small target, comprising:
acquiring an original image comprising a small target through a vehicle-mounted camera;
reducing the original image to a low resolution image;
identifying a candidate region comprising the small target from the low-resolution image using a lightweight segmentation network, wherein a backbone network of the segmentation network adopts ShuffleNet;
and filtering noise points out of the candidate region to form a minimum enclosing rectangle surrounding all remaining suspected target points, taking the region corresponding to the rectangle in the unscaled high-resolution original image as a region of interest, running a pre-trained detection model on the region of interest, and determining the position of the small target in the original image.
2. The method of claim 1, wherein the detection model is trained by:
determining a network structure of an initial detection model and initializing network parameters of the initial detection model;
acquiring a training sample set, wherein a training sample comprises a sample image and annotation information for characterizing the position of a small target in the sample image;
enhancing the training samples by at least one of the following: copying, multi-scale change, and editing;
taking the sample images and the annotation information of the training samples in the enhanced training sample set as the input and the expected output of the initial detection model, respectively, and training the initial detection model using a machine learning method;
and determining the initial detection model obtained through training as the pre-trained detection model.
3. The method of claim 2, wherein the training samples are edited by:
cutting a small target out of the sample image;
and pasting the small target, after scaling and/or rotation, to a random other position in the sample image to obtain a new sample image.
4. The method of claim 1, wherein the method further comprises:
when making training samples for the segmentation network, setting the pixels inside the rectangular boxes originally annotated for the detection task as positive samples, and the pixels outside the rectangular boxes as negative samples;
expanding the rectangular boxes of small targets whose length and width are below a predetermined number of pixels;
and setting the pixels inside the expanded rectangular boxes as positive samples.
5. A method according to any of claims 1-3, wherein the detection model is a deep neural network.
6. The method of claim 5, wherein an attention module is introduced after each prediction layer's feature fusion to learn an appropriate weight for the features of the different channels.
7. An apparatus for detecting a small target, comprising:
an acquisition unit configured to acquire an original image including a small target through a vehicle-mounted camera;
a reduction unit configured to reduce the original image to a low resolution image;
a first detection unit configured to identify a candidate region comprising the small target from the low-resolution image using a lightweight segmentation network, wherein a backbone network of the segmentation network adopts ShuffleNet;
and a second detection unit configured to filter noise points out of the candidate region to form a minimum enclosing rectangle surrounding all remaining suspected target points, take the region corresponding to the rectangle in the unscaled high-resolution original image as a region of interest, run a pre-trained detection model on the region of interest, and determine the position of the small target in the original image.
8. The apparatus of claim 7, wherein the apparatus further comprises a training unit configured to:
determining a network structure of an initial detection model and initializing network parameters of the initial detection model;
acquiring a training sample set, wherein a training sample comprises a sample image and annotation information for characterizing the position of a small target in the sample image;
enhancing the training samples by at least one of the following: copying, multi-scale change, and editing;
taking the sample images and the annotation information of the training samples in the enhanced training sample set as the input and the expected output of the initial detection model, respectively, and training the initial detection model using a machine learning method;
and determining the initial detection model obtained through training as the pre-trained detection model.
9. The apparatus of claim 8, wherein the training unit is further configured to:
cutting a small target out of the sample image;
and pasting the small target, after scaling and/or rotation, to a random other position in the sample image to obtain a new sample image.
10. The apparatus of claim 7, wherein the first detection unit is further configured to:
when making training samples for the segmentation network, setting the pixels inside the rectangular boxes originally annotated for the detection task as positive samples, and the pixels outside the rectangular boxes as negative samples;
expanding the rectangular boxes of small targets whose length and width are below a predetermined number of pixels;
and setting the pixels inside the expanded rectangular boxes as positive samples.
11. The apparatus of one of claims 7-10, wherein the detection model is a deep neural network.
12. The apparatus of claim 11, wherein an attention module is introduced after each prediction layer's feature fusion to learn an appropriate weight for the features of the different channels.
13. An electronic device for detecting small objects, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
14. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-6.
CN202010461384.2A 2020-05-27 2020-05-27 Method and device for detecting small objects Active CN111626208B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010461384.2A CN111626208B (en) 2020-05-27 2020-05-27 Method and device for detecting small objects
JP2021051677A JP7262503B2 (en) 2020-05-27 2021-03-25 Method and apparatus, electronic device, computer readable storage medium and computer program for detecting small targets
KR1020210040639A KR102523886B1 (en) 2020-05-27 2021-03-29 A method and a device for detecting small target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010461384.2A CN111626208B (en) 2020-05-27 2020-05-27 Method and device for detecting small objects

Publications (2)

Publication Number Publication Date
CN111626208A CN111626208A (en) 2020-09-04
CN111626208B (en) 2023-06-13

Family

ID=72272663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010461384.2A Active CN111626208B (en) 2020-05-27 2020-05-27 Method and device for detecting small objects

Country Status (3)

Country Link
JP (1) JP7262503B2 (en)
KR (1) KR102523886B1 (en)
CN (1) CN111626208B (en)

Families Citing this family (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418345B (en) * 2020-12-07 2024-02-23 深圳小阳软件有限公司 Method and device for quickly identifying small targets with fine granularity
CN112633218B (en) * 2020-12-30 2023-10-13 深圳市优必选科技股份有限公司 Face detection method, face detection device, terminal equipment and computer readable storage medium
CN112801169B (en) * 2021-01-25 2024-02-06 中国人民解放军陆军工程大学 Camouflage target detection method, system, device and storage medium based on improved YOLO algorithm
CN113158743B (en) * 2021-01-29 2022-07-12 中国科学院自动化研究所 Small target real-time detection and positioning method, system and equipment based on priori knowledge
CN113011297B (en) * 2021-03-09 2024-07-19 全球能源互联网研究院有限公司 Power equipment detection method, device, equipment and server based on edge cloud cooperation
CN113223026A (en) * 2021-04-14 2021-08-06 山东师范大学 Contour-based target fruit image example segmentation method and system
CN113095434B (en) * 2021-04-27 2024-06-11 深圳市商汤科技有限公司 Target detection method and device, electronic equipment and storage medium
CN113139483B (en) * 2021-04-28 2023-09-29 北京百度网讯科技有限公司 Human behavior recognition method, device, apparatus, storage medium, and program product
CN113295298B (en) * 2021-05-19 2025-06-06 深圳市朗驰欣创科技股份有限公司 Temperature measurement method, temperature measurement device, terminal equipment and storage medium
CN113221823B (en) * 2021-05-31 2024-06-07 南通大学 Traffic signal lamp countdown identification method based on improved lightweight YOLOv3
CN113221925B (en) * 2021-06-18 2022-11-11 北京理工大学 Target detection method and device based on multi-scale image
CN113591569A (en) * 2021-06-28 2021-11-02 北京百度网讯科技有限公司 Obstacle detection method, obstacle detection device, electronic apparatus, and storage medium
CN113360791B (en) * 2021-06-29 2023-07-18 北京百度网讯科技有限公司 Interest point query method and device of electronic map, road side equipment and vehicle
CN113553979B (en) * 2021-07-30 2023-08-08 国电汉川发电有限公司 A safety clothing detection method and system based on improved YOLO V5
CN113673604B (en) * 2021-08-23 2025-02-28 浙江大华技术股份有限公司 Target detection method and device, storage medium and electronic device
CN113628208B (en) * 2021-08-30 2024-02-06 北京中星天视科技有限公司 Ship detection method, device, electronic equipment and computer readable medium
KR102660084B1 (en) * 2021-09-30 2024-04-22 연세대학교 산학협력단 Apparatus and Method for Detecting 3D Object
CN113989592A (en) * 2021-10-28 2022-01-28 三一建筑机器人(西安)研究院有限公司 A method, device and electronic device for extending semantically segmented image samples
CN114155466B (en) * 2021-11-30 2024-08-13 云控智行科技有限公司 Target recognition method and device based on deep learning
CN114241345B (en) * 2021-12-21 2025-04-22 中国农业科学院农业信息研究所 Method and device for detecting target in image
CN114387225B (en) * 2021-12-23 2024-12-10 沈阳东软智能医疗科技研究院有限公司 Bone joint image recognition method, device, electronic device and readable medium
CN114298952A (en) * 2021-12-29 2022-04-08 深存科技(无锡)有限公司 Label image generation method, device, equipment and storage medium
CN115294380B (en) * 2022-01-05 2025-03-25 邵阳学院 A dynamic training method for deep learning object detection
CN114387581B * 2022-01-12 2024-10-18 广州图元跃迁电子科技有限公司 Vehicle surrounding identifier recognition method, device, storage medium, and computer equipment
CN114973306A (en) * 2022-01-21 2022-08-30 昆明理工大学 Fine-scale embedded lightweight infrared real-time detection method and system
WO2023153781A1 (en) * 2022-02-08 2023-08-17 Samsung Electronics Co., Ltd. Method and electronic device for processing input frame for on-device ai model
CN114612739A * 2022-02-24 2022-06-10 江西裕丰智能农业科技有限公司 Binocular panoramic image target detection method, device, and computer equipment
CN114565942A (en) * 2022-02-26 2022-05-31 南京理工大学 Live pig face detection method based on compressed YOLOv5
CN114463854A (en) * 2022-03-04 2022-05-10 河北工程大学 Device and method for gesture recognition switch based on deep learning
CN114581523B (en) * 2022-03-04 2025-04-15 京东鲲鹏(江苏)科技有限公司 A method and device for determining annotation data for monocular 3D target detection
CN114595759B (en) * 2022-03-07 2024-12-20 卡奥斯工业智能研究院(青岛)有限公司 Protective gear identification method, device, electronic device and storage medium
CN114298912B (en) * 2022-03-08 2022-10-14 北京万里红科技有限公司 Image acquisition method and device, electronic equipment and storage medium
CN114863384A (en) * 2022-03-22 2022-08-05 上海电力大学 A traffic sign detection method based on YOLO v4 algorithm
CN115131281A * 2022-04-01 2022-09-30 腾讯科技(深圳)有限公司 Change detection model training and image change detection method, apparatus, and device
CN114926704B (en) * 2022-04-26 2025-05-23 南京信息工程大学 Target detection method based on deep learning
CN114821269B (en) * 2022-05-10 2024-11-26 安徽蔚来智驾科技有限公司 Multi-task target detection method, device, autonomous driving system and storage medium
CN114973288B * 2022-05-30 2024-08-30 成都人人互娱科技有限公司 Non-commodity image-text detection method, system, and computer storage medium
CN115493583A * 2022-07-06 2022-12-20 北京航空航天大学 Integrated method for astronomical target detection and accurate positioning
CN115063660A (en) * 2022-07-19 2022-09-16 北京京东乾石科技有限公司 Image detection method and device
CN117541771A (en) * 2022-08-01 2024-02-09 马上消费金融股份有限公司 Image recognition model training method and image recognition method
CN115294065A * 2022-08-08 2022-11-04 山东大学 FPC-BTB interface detection and positioning method and system based on tph-yolov5 deep learning
CN115620157B * 2022-09-21 2024-07-09 清华大学 Method and device for representation learning of satellite images
CN115546500B (en) * 2022-10-31 2025-07-18 西安交通大学 Infrared image small target detection method
CN115731243B (en) * 2022-11-29 2024-02-09 北京长木谷医疗科技股份有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism
CN115984084B (en) * 2022-12-19 2023-06-06 中国科学院空天信息创新研究院 A Remote Sensing Distributed Data Processing Method Based on Dynamic Separable Network
CN118279896A (en) * 2022-12-29 2024-07-02 北京图森智途科技有限公司 Three-dimensional object detection method, apparatus and computer-readable storage medium
CN117173423B (en) * 2023-08-09 2024-07-23 山东财经大学 Method, system, equipment and medium for detecting small image target
CN117078980A * 2023-08-28 2023-11-17 西安工业大学 Deep learning method for locating infrared dim and small targets
CN116912604B (en) * 2023-09-12 2024-01-16 浙江大华技术股份有限公司 Model training method, image recognition device and computer storage medium
CN117218505A (en) * 2023-09-25 2023-12-12 佳源科技股份有限公司 Substation state indicator lamp identification method based on deep learning
CN117671458B * 2023-12-20 2024-06-14 云南神火铝业有限公司 Construction method and application of a detection model for automatically identifying block anode scrap
CN117746191B * 2024-02-07 2024-05-10 浙江啄云智能科技有限公司 Image-search model training method and image-search method
CN117746028B (en) * 2024-02-08 2024-06-11 暗物智能科技(广州)有限公司 Visual detection method, device, equipment and medium for unlabeled articles
CN118504650B (en) * 2024-05-10 2025-02-28 广东电网有限责任公司 Ice detection model training method, device, electronic device and storage medium
CN118172547B (en) * 2024-05-16 2024-07-30 北京航空航天大学杭州创新研究院 Image target recognition method, device, electronic device and computer readable medium
CN118365990B * 2024-06-19 2024-08-30 浙江啄云智能科技有限公司 Model training method and device for contraband detection, and electronic equipment
CN118657927B (en) * 2024-07-08 2024-11-29 北京鼎星科技有限公司 Improved YOLOv n small target detection method based on feature fusion
CN118674723B (en) * 2024-08-23 2024-11-15 南京华视智能科技股份有限公司 Method for detecting virtual edges of coated ceramic area based on deep learning
CN118781499B (en) * 2024-09-10 2024-12-24 中国科学院自动化研究所 On-orbit real-time detection and identification method and device
CN119169081B (en) * 2024-11-18 2025-03-07 之江实验室 Geomagnetic adaptive area positioning method and device for geomagnetic navigation
CN119323743B (en) * 2024-12-19 2025-03-21 北京中科思创云智能科技有限公司 Visible light and thermal infrared image target fusion detection method from the perspective of UAV vision

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4420459B2 (en) * 2005-06-14 2010-02-24 キヤノン株式会社 Image processing apparatus and method
US10740607B2 (en) * 2017-08-18 2020-08-11 Autel Robotics Co., Ltd. Method for determining target through intelligent following of unmanned aerial vehicle, unmanned aerial vehicle and remote control
US10973486B2 (en) 2018-01-08 2021-04-13 Progenics Pharmaceuticals, Inc. Systems and methods for rapid neural network-based image segmentation and radiopharmaceutical uptake determination
CN110119734A (en) * 2018-02-06 2019-08-13 同方威视技术股份有限公司 Cutter detection method and device
US10936905B2 (en) 2018-07-06 2021-03-02 Tata Consultancy Services Limited Method and system for automatic object annotation using deep network
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detection method based on feature fusion and deep learning
CN110298226B * 2019-04-03 2023-01-06 复旦大学 Cascade detection method for objects carried on the human body in millimeter-wave images
CN110503112B (en) * 2019-08-27 2023-02-03 电子科技大学 A Small Target Detection and Recognition Method Based on Enhanced Feature Learning
CN110866925B (en) * 2019-10-18 2023-05-26 拜耳股份有限公司 Method and device for image segmentation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598912A (en) * 2015-01-23 2015-05-06 湖南科技大学 Traffic light detection and recognition method based on CPU and GPU cooperative computing
CN109829456A (en) * 2017-11-23 2019-05-31 腾讯科技(深圳)有限公司 Image recognition method, device, and terminal
CN108229575A (en) * 2018-01-19 2018-06-29 百度在线网络技术(北京)有限公司 Method and apparatus for detecting a target
WO2020020472A1 (en) * 2018-07-24 2020-01-30 Fundación Centro Tecnoloxico De Telecomunicacións De Galicia A computer-implemented method and system for detecting small objects on an image using convolutional neural networks
CN110909756A (en) * 2018-09-18 2020-03-24 苏宁 Convolutional neural network model training method and device for medical image recognition
CN109858472A (en) * 2019-04-09 2019-06-07 武汉领普科技有限公司 Real-time embedded humanoid detection method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on small-target detection technology for fabric defects based on deep learning; Zhao Yanan; China Masters' Theses Full-text Database, Engineering Science and Technology I (No. 2); B024-95 *
Small-target semantic segmentation algorithm combined with object detection; Hu Tai et al.; Journal of Nanjing University (Natural Science), Vol. 55, No. 1; 73-84 *

Also Published As

Publication number Publication date
KR102523886B1 (en) 2023-04-21
JP2021179971A (en) 2021-11-18
KR20210042275A (en) 2021-04-19
JP7262503B2 (en) 2023-04-21
CN111626208A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN111626208B (en) Method and device for detecting small objects
CN111931929B (en) Training method and device for a multi-task model, and storage medium
CN111582189B (en) Traffic signal lamp identification method and device, vehicle-mounted control terminal and motor vehicle
CN111401255B (en) Method and device for identifying bifurcation junctions
CN112307978B (en) Target detection method and device, electronic equipment and readable storage medium
CN111310770B (en) Target detection method and device
CN111860227A (en) Method, apparatus, and computer storage medium for training trajectory planning model
CN113963238A (en) Construction method of multitask perception recognition model and multitask perception recognition method
CN113409393B (en) Method and device for identifying traffic sign
CN119251785A (en) Target detection method, device, equipment and storage medium
CN111340880B (en) Method and apparatus for generating predictive model
CN115512336B (en) Vehicle positioning method and device based on street lamp light source and electronic equipment
CN108960160B (en) Method and device for predicting structured state quantity based on unstructured prediction model
CN112215042A (en) A parking space limiter identification method, system and computer equipment
CN116012814A (en) Signal lamp identification method, signal lamp identification device, electronic equipment and computer readable storage medium
CN115424068A (en) Pedestrian intention prediction method, system, electronic device and readable storage medium
CN113902047A (en) Image element matching method, device, equipment and storage medium
US12147232B2 (en) Method, system and computer program product for the automated locating of a vehicle
CN110807397A (en) Method and device for predicting motion state of target object
CN114612353B (en) Image processing method and device
CN115019278B (en) Lane line fitting method and device, electronic equipment and medium
US20240176008A1 (en) Enhanced Tracking and Speed Detection
CN119540899A (en) Multi-target detection method and model training method, edge computing device and medium
CN117058195A (en) Target detection method, device, equipment and storage medium
HK40038752A (en) Target detection method and apparatus, electronic device, and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211009

Address after: 100176 101, Floor 1, Building 1, Yard 7, Ruihe West 2nd Road, Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.

Address before: 2/F, Baidu Building, No. 10, Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

GR01 Patent grant