
CN114821778A - A method and device for dynamic recognition of underwater fish body posture - Google Patents

Info

Publication number
CN114821778A
CN114821778A
Authority
CN
China
Prior art keywords
fish body
image
target detection
fish
feature map
Prior art date
Legal status
Granted
Application number
CN202210432208.5A
Other languages
Chinese (zh)
Other versions
CN114821778B (en)
Inventor
石晨
张天野
李道亮
李振波
Current Assignee
China Agricultural University
Original Assignee
China Agricultural University
Priority date
Filing date
Publication date
Application filed by China Agricultural University
Priority to CN202210432208.5A
Publication of CN114821778A
Application granted
Publication of CN114821778B
Legal status: Active

Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2431: Classification techniques relating to multiple classes
    • G06F18/253: Fusion techniques of extracted features
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06T3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T5/90: Dynamic range modification of images or parts thereof
    • G06T2200/32: Indexing scheme involving image mosaicing
    • G06T2207/20081: Training; learning
    • G06T2207/20084: Artificial neural networks [ANN]

Abstract

The invention provides a method and a device for dynamic recognition of underwater fish body posture. The method comprises: determining a fish body image to be recognized; performing target detection on a stitched image based on a fish body posture recognition model to obtain a target detection frame image; down-sampling the target detection frame image; generating a feature pyramid based on the down-sampled target detection frame image; and determining the fish body posture based on the features in the feature pyramid. The invention can determine the posture of a fish in a moving state with high precision.

Description

A method and device for dynamic recognition of underwater fish body posture

Technical Field

The present invention relates to the technical field of image recognition, and in particular to a method and device for dynamic recognition of underwater fish body posture.

Background Art

Aquaculture is the practice of using waters suitable for farming to raise aquatic species of economic value, applying aquaculture techniques and facilities in accordance with the ecological habits of the farmed species and their requirements on the water environment. The swimming behaviour of a fish is closely related to its environment, so detecting and recognizing fish behaviour and posture helps assess fish health.

At present, fish body posture recognition mostly relies on traditional image processing techniques (such as binarization, shape fitting, and grey-value detection) and convolutional neural networks. However, these methods can only recognize the posture of stationary fish and cannot accurately recognize the posture of fish in a moving state.

Summary of the Invention

The present invention provides a method and device for dynamic recognition of underwater fish body posture, which address the low accuracy of posture recognition for moving fish in the prior art.

The present invention provides a method for dynamic recognition of underwater fish body posture, comprising:

determining a fish body image to be recognized;

performing target detection on a stitched image based on a fish body posture recognition model to obtain a target detection frame image; down-sampling the target detection frame image; generating a feature pyramid based on the down-sampled target detection frame image; and determining the fish body posture based on the features in the feature pyramid; wherein the stitched image is obtained by stitching data-augmented copies of the fish body image to be recognized;

wherein the fish body posture recognition model is trained on sample fish body images and their sample fish body posture labels.

According to the method for dynamic recognition of underwater fish body posture provided by the present invention, performing target detection on the stitched image based on the fish body posture recognition model to obtain the target detection frame image, down-sampling the target detection frame image, generating the feature pyramid based on the down-sampled target detection frame image, and determining the fish body posture based on the features in the feature pyramid comprises:

based on the input layer of the fish body posture recognition model, performing data augmentation on the fish body image to be recognized, stitching the resulting images to obtain the stitched image, and performing target detection on the stitched image to obtain the target detection frame image;

based on the sampling layer of the fish body posture recognition model, slicing and convolving the target detection frame image in sequence to obtain an initial feature map, and down-sampling the initial feature map multiple times to obtain a down-sampled feature map;

based on the pyramid layer of the fish body posture recognition model, up-sampling the feature map of each upper layer and fusing it with the feature map of the layer below to obtain a first feature pyramid, and down-sampling the feature map of each lower layer and fusing it with the feature map of the layer above to obtain a second feature pyramid, the feature map of the first layer being the down-sampled feature map; and

based on the prediction layer of the fish body posture recognition model, determining the fish body posture according to the features in the first feature pyramid and the second feature pyramid.

According to the method for dynamic recognition of underwater fish body posture provided by the present invention, before performing target detection on the stitched image, the method further comprises: scaling the stitched image to a preset standard size.
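The scaling to a preset standard size can be sketched as an aspect-preserving "letterbox" resize, as commonly used with YOLO-family models. This is an illustrative NumPy sketch, not the patent's implementation; the nearest-neighbour resize and the grey padding value 114 are assumptions chosen for brevity.

```python
import numpy as np

def letterbox(img: np.ndarray, size: int = 640, pad_value: int = 114) -> np.ndarray:
    """Scale an HxWxC image to size x size, keeping the aspect ratio and
    padding the short side (nearest-neighbour resize for brevity)."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour index maps into the source image
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    # centre the resized image on a padded square canvas
    out = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out
```

For a 1920*1080 frame (as captured above) this yields a 640x640 image with grey bands above and below.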

According to the method for dynamic recognition of underwater fish body posture provided by the present invention, down-sampling the initial feature map multiple times to obtain the down-sampled feature map comprises:

down-sampling the current feature map to obtain a current initial down-sampled feature map, and fusing the previous down-sampled feature map with the current initial down-sampled feature map based on the residual component of the sampling layer to obtain the current down-sampled feature map, the first current feature map being the initial feature map.

According to the method for dynamic recognition of underwater fish body posture provided by the present invention, after obtaining the down-sampled feature map, the method further comprises:

performing convolution, normalization, and max-pooling on the down-sampled feature map in sequence.

According to the method for dynamic recognition of underwater fish body posture provided by the present invention, determining the fish body posture based on the features in the feature pyramid comprises:

determining a plurality of candidate detection frames based on the features in the feature pyramid; and

selecting the fish body detection frame from the plurality of candidate detection frames using a non-maximum suppression algorithm, and performing posture recognition based on the fish body detection frame to determine the fish body posture.
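The non-maximum suppression step can be sketched in plain Python. This is a generic greedy NMS shown for illustration; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions, not values taken from the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop all
    remaining candidates that overlap it by more than `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep
```

Applied to overlapping candidate frames around the same fish, only the highest-confidence frame survives.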

According to the method for dynamic recognition of underwater fish body posture provided by the present invention, the loss value of the fish body posture recognition model is determined by the following formula:

Loss = -(1/n) ∑_i ( t[i]·log(o[i]) + (1 - t[i])·log(1 - o[i]) )

where Loss is the loss value of the fish body posture recognition model, o[i] is the predicted fish body posture produced by the model for the i-th sample fish body image, and t[i] is the corresponding sample fish body posture label.
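The formula above is the averaged binary cross-entropy. A minimal Python sketch follows; the clamping epsilon is an assumption added for numerical safety so that log(0) is never evaluated.

```python
import math

def bce_loss(t, o, eps=1e-12):
    """Loss = -(1/n) * sum(t[i]*log(o[i]) + (1-t[i])*log(1-o[i])),
    with predictions clamped to (eps, 1-eps)."""
    n = len(t)
    total = 0.0
    for ti, oi in zip(t, o):
        oi = min(max(oi, eps), 1.0 - eps)  # avoid log(0)
        total += ti * math.log(oi) + (1.0 - ti) * math.log(1.0 - oi)
    return -total / n
```

A perfect prediction gives a loss of (essentially) zero, and a maximally uncertain prediction of 0.5 gives log 2 per sample.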

The present invention also provides a device for dynamic recognition of underwater fish body posture, comprising:

a determining unit, configured to determine a fish body image to be recognized; and

a recognition unit, configured to perform target detection on a stitched image based on a fish body posture recognition model to obtain a target detection frame image, down-sample the target detection frame image, generate a feature pyramid based on the down-sampled target detection frame image, and determine the fish body posture based on the features in the feature pyramid; the stitched image being obtained by stitching data-augmented copies of the fish body image to be recognized;

wherein the fish body posture recognition model is trained on sample fish body images and their sample fish body posture labels.

The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any of the above methods for dynamic recognition of underwater fish body posture.

The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the above methods for dynamic recognition of underwater fish body posture.

The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements any of the above methods for dynamic recognition of underwater fish body posture.

In the method and device for dynamic recognition of underwater fish body posture provided by the present invention, target detection is performed on the stitched image based on the fish body posture recognition model, and the resulting target detection frame image is down-sampled, so that the down-sampled image retains the feature information of the original while reducing the model's computational parameters and improving its recognition efficiency. In addition, a feature pyramid is generated from the down-sampled target detection frame image, so that the fish body posture can be accurately determined from the multi-scale features in the pyramid.

Brief Description of the Drawings

To explain the technical solutions of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is the first schematic flowchart of the method for dynamic recognition of underwater fish body posture provided by the present invention;

Fig. 2 is a schematic structural diagram of the sampling layer of the fish body posture recognition model provided by the present invention;

Fig. 3 is a schematic structural diagram of the Focus structure in the sampling layer provided by the present invention;

Fig. 4 is a schematic structural diagram of CSP1_X in the sampling layer provided by the present invention;

Fig. 5 is a schematic structural diagram of the convolution block CBL provided by the present invention;

Fig. 6 is a schematic structural diagram of CSP2_X in the pyramid layer provided by the present invention;

Fig. 7 is a schematic structural diagram of the residual component provided by the present invention;

Fig. 8 is a schematic structural diagram of the SPP structure provided by the present invention;

Fig. 9 is a schematic diagram of the model performance index curves provided by the present invention;

Fig. 10 is the second schematic flowchart of the method for dynamic recognition of underwater fish body posture provided by the present invention;

Fig. 11 is a schematic structural diagram of the device for dynamic recognition of underwater fish body posture provided by the present invention;

Fig. 12 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

At present, fish body posture recognition mostly relies on traditional image processing techniques (such as binarization, shape fitting, and grey-value detection) and convolutional neural networks. However, these methods can only recognize the posture of stationary fish and cannot accurately recognize the posture of fish in a moving state.

In addition, support vector machines have also been used for fish body posture recognition, but this approach depends on complex procedures such as feature-vector extraction and sliding windows and is therefore inefficient; it also requires additional hardware and entails high maintenance costs.

In view of this, the present invention provides a method for dynamic recognition of underwater fish body posture. Fig. 1 is the first schematic flowchart of the method; as shown in Fig. 1, the method comprises the following steps:

Step 110: determine the fish body image to be recognized.

Here, the fish body image to be recognized is the image on which fish body posture recognition is to be performed. It may be captured by an image acquisition device such as a camera, or by a smart device with a camera such as a smartphone, tablet, or computer.

Step 120: based on the fish body posture recognition model, perform target detection on the stitched image to obtain a target detection frame image, down-sample the target detection frame image, generate a feature pyramid based on the down-sampled target detection frame image, and determine the fish body posture based on the features in the feature pyramid; the stitched image is obtained by stitching data-augmented copies of the fish body image to be recognized.

The fish body posture recognition model is trained on sample fish body images and their sample fish body posture labels.

Specifically, the stitched image is obtained by stitching data-augmented copies of the fish body image to be recognized. For example, after the fish body image is determined, data augmentation can be applied to it to obtain multiple augmented images, for instance by randomly scaling, randomly cropping, and randomly arranging the image with the Mosaic augmentation algorithm. After augmentation, the stitched image contains posture information of the fish at different angles and sizes, so that more detailed posture information of the moving fish can be obtained from it.
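The Mosaic-style stitching can be sketched with NumPy as follows. This is an illustrative simplification, not the patent's implementation: it only crops four images into a fixed 2x2 grid and omits the random scaling, label remapping, and variable split point of the full algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility

def mosaic4(images, out_size=640):
    """Stitch four images into one 2x2 mosaic: each quadrant is a
    randomly positioned crop taken from one source image."""
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    half = out_size // 2
    cells = [(0, 0), (0, half), (half, 0), (half, half)]  # (y, x) of each quadrant
    for img, (y, x) in zip(images, cells):
        h, w = img.shape[:2]
        top = rng.integers(0, max(1, h - half))
        left = rng.integers(0, max(1, w - half))
        crop = img[top:top + half, left:left + half]
        canvas[y:y + crop.shape[0], x:x + crop.shape[1]] = crop
    return canvas
```

With four 720x1280 frames the result is a single 640x640 training image containing content from all four.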

After the stitched image is obtained, target detection is performed on it, so that the target object requiring posture recognition can be accurately extracted from the stitched image, yielding the target detection frame image. Down-sampling this image preserves its feature information while reducing the model's computational parameters and improving its recognition efficiency.

After the target detection frame image is down-sampled, a feature pyramid is generated from it, and the fish body posture is determined from the features in the pyramid. The feature pyramid may combine a top-down FPN (Feature Pyramid Network) with a bottom-up PAN (Path Aggregation Network). The combination adds a bottom-up enhancement path on top of the FPN, so that the top-level features also obtain the position information of the bottom-level features, improving posture recognition accuracy.
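The combined top-down/bottom-up pyramid can be sketched with NumPy. For brevity this sketch fuses maps by element-wise addition and assumes all levels share the same channel count; real FPN/PAN implementations use convolutions, and YOLOv5 fuses by channel concatenation.

```python
import numpy as np

def upsample2(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2(x):
    """Stride-2 subsampling, a stand-in for a strided convolution."""
    return x[:, ::2, ::2]

def fpn_pan(c3, c4, c5):
    """Top-down (FPN) pass followed by a bottom-up (PAN) pass over
    three backbone maps at strides 8, 16, and 32."""
    # top-down: upsample the deeper map and fuse with the shallower one
    p5 = c5
    p4 = c4 + upsample2(p5)
    p3 = c3 + upsample2(p4)
    # bottom-up: downsample the shallower map and fuse it upward,
    # carrying bottom-level position information to the top levels
    n3 = p3
    n4 = p4 + downsample2(n3)
    n5 = p5 + downsample2(n4)
    return n3, n4, n5
```

The three outputs keep the resolutions of the inputs, so each scale can feed its own prediction head.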

It can be understood that, in the embodiment of the present invention, the fish body posture recognition model can be obtained by pre-training, specifically through the following steps: first, a large number of sample fish body images are collected, and their corresponding sample fish body posture labels are determined through manual annotation; then, an initial model is trained on the sample fish body images and their sample fish body posture labels, yielding the fish body posture recognition model.

As an optional embodiment, the sample fish body images come from a fish pond 5 meters in diameter and 80 cm deep; the water in the pond is filtered seawater heated to 15 degrees, and the camera is placed at the bottom of the pond. Video of the sample fish is recorded at 1920*1080 pixels, 25 frames per second, in mp4 format. One hour of video was recorded and processed with Python, capturing one frame every two seconds to obtain a series of initial sample fish body images in jpg format. A total of 1800 initial sample images were obtained and saved to disk; low-quality images, such as those with missing fish bodies, heavily overlapping fish, no fish, or blurred fish, were then deleted, leaving a final set of 800 sample fish body images.
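Sampling one frame every two seconds from a 25 fps recording amounts to grabbing every 50th frame, which over one hour gives exactly the 1800 initial images mentioned above. A small helper computing the frame indices (the actual frame grabbing would use something like cv2.VideoCapture, omitted here to keep the sketch dependency-free):

```python
def frame_indices(duration_s: int, fps: int = 25, every_s: int = 2):
    """Indices of the frames to grab when sampling one frame every
    `every_s` seconds from an `fps` video."""
    step = fps * every_s  # 25 fps * 2 s = every 50th frame
    return list(range(0, duration_s * fps, step))
```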

In addition, after the sample fish body images are obtained, they can be processed with the ACE (Automatic Color Equalization) image enhancement algorithm. ACE derives from the retinex algorithm: it adjusts and stretches image contrast and normalizes color and saturation, correcting pixel values from the relative light-dark relationship, computed by difference, between each target point and its surrounding pixels. It has a good enhancement effect, correcting color cast, dehazing, reducing noise, and brightening dark scenes. The processed sample images no longer show the blue-green color cast of the originals, their brightness is improved, and underwater blur is reduced. Furthermore, the sample images can be annotated with makesense, assigning different labels to fish in different postures; overlapping and partially missing fish are not annotated. This yields the sample fish body posture labels. The sample images are then divided with Python, for example randomly split into a training set and a test set at a ratio of 8:2.
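The 8:2 random division into training and test sets can be sketched as follows; the fixed seed is an assumption added for reproducibility, and the file names are hypothetical.

```python
import random

def train_test_split(paths, train_ratio=0.8, seed=42):
    """Shuffle image paths with a seeded RNG and split them at the
    given ratio into (train, test) lists."""
    shuffled = paths[:]                      # copy, leave the input untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

For the 800 retained images this gives 640 training and 160 test images.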

In the method for dynamic recognition of underwater fish body posture provided by the embodiment of the present invention, target detection is performed on the stitched image based on the fish body posture recognition model, and the resulting target detection frame image is down-sampled, so that the down-sampled image retains the feature information of the original while reducing the model's computational parameters and improving its recognition efficiency. In addition, a feature pyramid is generated from the down-sampled target detection frame image, so that the fish body posture can be accurately determined from the multi-scale features in the pyramid.

Based on the above embodiment, performing target detection on the stitched image based on the fish body posture recognition model to obtain the target detection frame image, down-sampling the target detection frame image, generating the feature pyramid based on the down-sampled target detection frame image, and determining the fish body posture based on the features in the feature pyramid comprises:

based on the input layer of the fish body posture recognition model, performing data augmentation on the fish body image to be recognized, stitching the resulting images to obtain the stitched image, and performing target detection on the stitched image to obtain the target detection frame image;

based on the sampling layer of the fish body posture recognition model, slicing and convolving the target detection frame image in sequence to obtain an initial feature map, and down-sampling the initial feature map multiple times to obtain a down-sampled feature map;
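The slicing operation of the sampling layer (the Focus structure) can be sketched with NumPy: the feature map is split into four stride-2 samples that are concatenated along the channel axis, halving the spatial resolution without losing any pixel information. The subsequent convolution is omitted from this sketch.

```python
import numpy as np

def focus_slice(x: np.ndarray) -> np.ndarray:
    """Focus slicing: turn a (C, H, W) map into (4C, H/2, W/2) by
    interleaved stride-2 sampling; every input value is kept."""
    return np.concatenate(
        [x[:, ::2, ::2],   # even rows, even cols
         x[:, 1::2, ::2],  # odd rows,  even cols
         x[:, ::2, 1::2],  # even rows, odd cols
         x[:, 1::2, 1::2]],  # odd rows, odd cols
        axis=0,
    )
```

A 3-channel 8x8 input becomes a 12-channel 4x4 output of the same total size, which the following convolution then processes at a quarter of the spatial cost.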

based on the pyramid layer of the fish body posture recognition model, up-sampling the feature map of each upper layer and fusing it with the feature map of the layer below to obtain a first feature pyramid, and down-sampling the feature map of each lower layer and fusing it with the feature map of the layer above to obtain a second feature pyramid, the feature map of the first layer being the down-sampled feature map; and

based on the prediction layer of the fish body posture recognition model, determining the fish body posture according to the features in the first feature pyramid and the second feature pyramid.

Specifically, the fish body posture recognition model may adopt the YOLOv5 architecture, which comprises four parts: an input layer, a sampling layer (Backbone), a pyramid layer (Neck), and a prediction layer (Prediction).

其中,输入层可以包含Mosaic数据增强算法,自适应图片缩放算法和自适应锚框计算算法。采样层可以包含focus结构,CSP1_X结构和SPP结构。金字塔层可以包含CSP2_X,FPN和PAN结构。预测层可以包含GIOU_Loss以及NMS非极大值抑制算法。Among them, the input layer can include Mosaic data enhancement algorithm, adaptive image scaling algorithm and adaptive anchor box calculation algorithm. The sampling layer can contain focus structure, CSP1_X structure and SPP structure. Pyramid layers can contain CSP2_X, FPN and PAN structures. The prediction layer can contain GIOU_Loss and NMS non-maximum suppression algorithm.

Based on the input layer of the fish pose recognition model, data augmentation is applied to the fish image to be recognized, the multiple augmented images are stitched into a single image, and target detection is performed on the stitched image to obtain a target detection box image. For augmentation, the Mosaic algorithm can be used: any four images are randomly scaled, randomly cropped, randomly arranged, and then stitched together. This improves detection accuracy for small and occluded targets, increases model robustness, reduces the model's memory requirements, and enriches the dataset. Mosaic9 extends Mosaic by increasing the number of randomly scaled, cropped, and arranged images to nine, further improving robustness and reducing memory demand. In the YOLO algorithm, each dataset has anchor boxes with preset initial widths and heights. During training of the fish pose recognition model, predicted boxes are output on the basis of the initial anchor boxes and compared with the ground-truth boxes; the gap between the two is computed and back-propagated to update the network parameters iteratively. At each training run, the optimal anchor box values for the training set are computed adaptively.
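The four-image stitching at the heart of Mosaic can be illustrated with a minimal sketch, using plain Python lists as stand-ins for single-channel images. This only shows the 2×2 paste; a real Mosaic augmentation also applies random scaling and cropping to each image and jitters the joint position before pasting.

```python
def mosaic_2x2(imgs):
    """Stitch four equally sized HxW 'images' (2-D lists) into one 2Hx2W image.

    imgs: [top_left, top_right, bottom_left, bottom_right].
    A real Mosaic augmentation would first randomly scale/crop each image
    and randomize the joint position; here the joint is fixed at the center.
    """
    tl, tr, bl, br = imgs
    top = [row_l + row_r for row_l, row_r in zip(tl, tr)]
    bottom = [row_l + row_r for row_l, row_r in zip(bl, br)]
    return top + bottom

# Four 2x2 toy images with distinct pixel values.
a = [[1, 1], [1, 1]]
b = [[2, 2], [2, 2]]
c = [[3, 3], [3, 3]]
d = [[4, 4], [4, 4]]
m = mosaic_2x2([a, b, c, d])
# m is a 4x4 image whose quadrants come from a, b, c, d.
```

Because the composite frame contains four scenes' worth of labeled objects at varied scales, each training batch effectively sees more targets, which is where the small-target and memory benefits described above come from.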

As shown in Figure 2, the sampling layer contains the Focus structure, CSP1_X structures, and an SPP structure. In the Focus structure, an original 608×608×3 image is sliced into a 304×304×12 feature map, which after a convolution with 32 kernels becomes a 304×304×32 feature map. The purpose of this structure is to avoid losing image feature information during down-sampling, so that the subsequent convolutions can extract richer features. The module performs the slicing operation before the backbone: as shown in Figure 3, it samples every other pixel, yielding four complementary sub-images without any information loss, and moves the spatial (W, H) information into the channel dimension. The channel space is expanded fourfold, from three channels to twelve, so that the following convolution produces a 2× down-sampled feature map with no information loss. The Focus structure reduces the cost of convolution and improves speed. In addition, as shown in Figures 4 and 5, a convolution layer with a 3×3 kernel and stride 2 is placed before each CSP1_X structure, halving the feature map size and thereby acting as a down-sampling step. The backbone contains four CSP1_X structures, and the final output feature map is 19×19.
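The every-other-pixel slicing of the Focus structure can be sketched in plain Python on a toy 4×4 single-channel image. The real module does this per channel, turning H×W×C into H/2×W/2×4C (e.g. 608×608×3 into 304×304×12); the sketch shows the key property that no pixel is dropped.

```python
def focus_slice(img):
    """Split an HxW image (2-D list, H and W even) into four H/2 x W/2
    complementary sub-images by taking every other pixel, as in the Focus
    module. The four slices are a permutation of the input pixels, so
    spatial information moves losslessly into the channel dimension."""
    return [[row[j::2] for row in img[i::2]] for i in (0, 1) for j in (0, 1)]

img = [[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]]
slices = focus_slice(img)
# Four 2x2 complementary sub-images; together they contain every input pixel exactly once.
```

Stacking the four slices as channels is the "2× down-sampling without information loss" described above: resolution halves, channel count quadruples, and the subsequent convolution sees all of the original pixels.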

Based on the pyramid layer of the fish pose recognition model, each upper-level feature map is up-sampled and fused with the feature map of the level below to obtain the first feature pyramid (the FPN pyramid), and each lower-level feature map is down-sampled and fused with the feature map of the level above to obtain the second feature pyramid (the PAN pyramid).

For example, the pyramid layer may use an FPN+PAN structure, combining down-sampling and up-sampling to generate feature pyramids and give the network scale invariance. The feature pyramid produced by the FPN structure is top-down: the feature maps extracted by the sampling layer are down-sampled into features of size 76×76×255, 38×38×255, and 19×19×255, where 255 is the output dimensionality, and FPN fuses the up-sampled result of each upper level with the feature map of the level below to form the pyramid. The embodiment of the present invention adds a PAN structure on top of FPN. In contrast to FPN, the PAN feature pyramid is bottom-up: the down-sampled result of each lower FPN level is fused with the feature map of the level above. FPN mainly improves target detection by fusing high-level and low-level features, and is especially effective for small targets. The PAN structure exploits the network's shallow features, which are important for target detection because detection is a pixel-level classification task and shallow features are dominated by edges and similar cues. Combining the two adds a bottom-up enhancement path on top of FPN, so that top-level features receive positional information from the bottom-level features, which improves detection of large objects. The pyramid layer also includes the CSP2_X structure shown in Figure 6.
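The fusion order of the FPN (top-down) and PAN (bottom-up) paths can be sketched with 1-D lists standing in for feature maps. The sketch uses nearest-neighbor upsampling, stride-2 downsampling, and elementwise addition as the fusion operation; the real YOLOv5 neck fuses by channel concatenation followed by CSP2_X blocks, so this is only an illustration of the data flow.

```python
def upsample(x):    # nearest-neighbor 2x upsample of a 1-D "feature map"
    return [v for v in x for _ in (0, 1)]

def downsample(x):  # stride-2 downsample
    return x[0::2]

def fuse(a, b):     # stand-in for concat + conv: elementwise add
    return [p + q for p, q in zip(a, b)]

def fpn_pan(c3, c4, c5):
    """Top-down (FPN) then bottom-up (PAN) fusion over three backbone levels.
    c3 is the shallowest/largest map, c5 the deepest/smallest."""
    # FPN: propagate semantic information top-down.
    p5 = c5
    p4 = fuse(upsample(p5), c4)
    p3 = fuse(upsample(p4), c3)
    # PAN: propagate localization information bottom-up.
    n3 = p3
    n4 = fuse(downsample(n3), p4)
    n5 = fuse(downsample(n4), p5)
    return (p3, p4, p5), (n3, n4, n5)

(p3, p4, p5), (n3, n4, n5) = fpn_pan([1] * 8, [1] * 4, [1] * 2)
```

Note how every output level of the PAN path (n3, n4, n5) has been touched by both the deepest and the shallowest features, which is the "top-level features receive positional information from the bottom" behavior described above.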

After the first and second pyramids are obtained, the prediction layer of the fish pose recognition model identifies and determines the fish pose from the features in the first and second feature pyramids.

Based on any of the above embodiments, before target detection is performed on the stitched image, the method further includes: scaling the stitched image to a preset standard size.

Specifically, the input layer may also include an adaptive image scaling algorithm, which scales the stitched image to a preset standard size before target detection so that as few black borders as possible are added, greatly improving the model's computation speed.
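The "as few black borders as possible" behavior can be sketched as follows: scale by the limiting side ratio, then pad the short side only up to the next multiple of the network stride instead of all the way to a square. This is a simplified version of the idea; YOLOv5's actual letterbox routine also handles half-pixel rounding and the padding color.

```python
def adaptive_scale(w, h, target=640, stride=32):
    """Return the resized dimensions and the total padding per axis needed
    to bring a w x h image toward the target size with minimal black border."""
    r = min(target / w, target / h)            # limiting scale ratio
    new_w, new_h = round(w * r), round(h * r)
    # Pad each axis only up to the next multiple of the stride,
    # rather than all the way to target x target.
    pad_w = (stride - new_w % stride) % stride
    pad_h = (stride - new_h % stride) % stride
    return new_w, new_h, pad_w, pad_h

# A 1280x720 frame: ratio 0.5 -> 640x360, then pad height 360 -> 384.
print(adaptive_scale(1280, 720))  # (640, 360, 0, 24)
```

Padding 360 to 384 instead of to 640 means far fewer wasted black pixels flow through the network, which is the source of the speedup claimed above.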

Based on any of the above embodiments, down-sampling the initial feature map multiple times to obtain down-sampled feature maps includes:

Down-sampling the current feature map to obtain a current initial down-sampled feature map, and fusing the previous down-sampled feature map with the current initial down-sampled feature map using the residual components of the sampling layer to obtain the current down-sampled feature map; the first current feature map is the initial feature map.

As shown in Figure 7, the sampling layer also includes multiple residual components and a tensor concatenation module, and uses SiLU as the activation function. The residual components use identity mappings to fuse the previous down-sampled feature map with the current initial down-sampled feature map, yielding the current down-sampled feature map. This prevents gradient explosion and network degradation during convolution, strengthening the model's learning capacity while reducing computational bottlenecks and memory consumption.
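The identity-mapping fusion performed by a residual component reduces to adding the block's input back to its transformed output. A minimal numeric sketch (the real CSP1_X components operate on tensors via convolutions, not on flat lists):

```python
def residual_block(x, f):
    """Residual connection with identity mapping: output = f(x) + x.
    Even if the learned transform f collapses toward zero, the block
    still passes x through unchanged, which is what protects deep
    stacks from degradation."""
    return [fx + xi for fx, xi in zip(f(x), x)]

halve = lambda v: [0.5 * t for t in v]   # a stand-in learned transform
out = residual_block([2.0, 4.0], halve)  # [3.0, 6.0]
```

The identity path also gives gradients a direct route back through the stack, which is why the text above associates the residual components with avoiding gradient problems.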

Based on any of the above embodiments, after the down-sampled feature map is obtained, the method further includes:

Applying convolution, normalization, and max-pooling to the down-sampled feature map in sequence.

Specifically, the sampling layer may also include an SPP structure. As shown in Figure 8, the SPP structure first applies a 1×1 convolution and then normalization. The main normalization steps are: compute the mean and variance of all the data, subtract the mean from each pixel value and divide by the standard deviation, and then apply a learnable offset factor and scale factor to control the normalized values, the factor values being learned by the model during training. Max-pooling with kernel sizes 5, 9, and 13 is then applied, the results are joined by concat tensor concatenation, and SiLU is used as the activation function. This effectively avoids problems such as image distortion caused by cropping and scaling operations on image regions, enlarges the receptive field, eliminates the model's repeated extraction of image features, greatly increases computation speed, and saves computational cost.
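The normalization step described above (subtract the mean, divide by the standard deviation, then apply a learned scale and offset) can be sketched numerically. The `gamma` and `beta` factors would be learned during training; they are fixed constants here purely for illustration.

```python
import math

def normalize(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of values to zero mean / unit variance, then apply
    a learnable scale (gamma) and shift (beta), as in batch normalization.
    eps keeps the division stable when the variance is tiny."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]

ys = normalize([1.0, 2.0, 3.0, 4.0])
# ys has (approximately) zero mean and unit variance.
```

With `gamma` and `beta` left learnable, the layer can undo the normalization where that helps, which is why the factors are trained rather than fixed.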

Based on any of the above embodiments, determining the fish pose from the features in the feature pyramid includes:

Determining multiple candidate detection boxes based on the features in the feature pyramid;

Screening the multiple candidate detection boxes with a non-maximum suppression algorithm to obtain the fish detection box, and performing pose recognition based on the fish detection box to determine the fish pose.

Specifically, multiple candidate detection boxes can be obtained from the features in the feature pyramid. These candidates usually cover the same content, so the best one must be selected as the fish detection box for pose recognition, which also simplifies the detection results. Accordingly, the embodiment of the present invention uses a non-maximum suppression (NMS) algorithm to select the fish detection box from the candidates. This brings some improvement for occluded and overlapping targets, and targets that were previously missed can be detected and recognized after the NMS operation.

When performing the non-maximum suppression operation, the corresponding loss can be computed as GIoU_loss = 1 − (IoU − |Ac − U| / |Ac|), where IoU is the intersection-over-union, Ac is the area of the smallest enclosing box of the predicted box and the ground-truth box, and U is the area of their union. In other words: first compute the area of the smallest region enclosing both boxes, then compute the fraction of that enclosing region not covered by either box, and subtract this fraction from the IoU to obtain the GIoU; the final bounding-box loss is 1 − GIoU.
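The GIoU loss can be written directly from the formula above, with axis-aligned boxes represented as (x1, y1, x2, y2). This is a sketch of the formula itself, not YOLOv5's vectorized implementation:

```python
def giou_loss(a, b):
    """GIoU loss for two boxes (x1, y1, x2, y2): 1 - (IoU - |Ac - U| / |Ac|),
    where Ac is the area of the smallest enclosing box and U the union area."""
    area = lambda box: max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])
    # Intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    iou = inter / union
    # Smallest enclosing box.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou

# Identical boxes: IoU = GIoU = 1, so the loss is 0.
print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
```

Unlike a plain 1 − IoU loss, the enclosing-box term keeps the loss informative even for non-overlapping boxes: the farther apart they are, the larger the enclosing box and the larger the loss.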

Based on any of the above embodiments, the loss value of the fish pose recognition model is determined by the following formula:

Loss = -(1/n) ∑ (t[i] log(o[i]) + (1 - t[i]) log(1 - o[i]))

where Loss is the loss value of the fish pose recognition model, o[i] is the predicted fish pose produced by the model for a sample fish image, and t[i] is the corresponding sample fish pose label.
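This is the standard binary cross-entropy averaged over n samples; a direct numeric sketch (the small `eps` guard against log(0) is an implementation detail added here, not part of the formula in the text):

```python
import math

def bce_loss(t, o, eps=1e-12):
    """Binary cross-entropy: -(1/n) * sum(t*log(o) + (1-t)*log(1-o)).
    t: ground-truth labels in {0, 1}; o: predicted probabilities in (0, 1).
    eps guards the logarithm against predictions of exactly 0 or 1."""
    n = len(t)
    return -sum(ti * math.log(oi + eps) + (1 - ti) * math.log(1 - oi + eps)
                for ti, oi in zip(t, o)) / n

# Confident correct predictions give a small loss; confident wrong ones a large loss.
good = bce_loss([1, 0], [0.9, 0.1])
bad = bce_loss([1, 0], [0.1, 0.9])
```

The loss is minimized when o[i] matches t[i] for every sample, which is what drives the model's predicted poses toward the labels during training.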

In addition, the SiLU activation function used in the embodiment of the present invention is unbounded above, bounded below, smooth, and non-monotonic; it is a special case of the Swish activation function: f(x) = x·σ(x), with f′(x) = f(x) + σ(x)(1 − f(x)).
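The SiLU function and the derivative identity above can be checked numerically; the sketch verifies the closed-form f′(x) = f(x) + σ(x)(1 − f(x)) against a central finite difference.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    return x * sigmoid(x)  # f(x) = x * sigma(x)

def silu_grad(x):
    # Closed-form derivative: f'(x) = f(x) + sigma(x) * (1 - f(x))
    return silu(x) + sigmoid(x) * (1.0 - silu(x))

# Check the closed-form derivative against a central finite difference.
x, h = 0.7, 1e-6
numeric = (silu(x + h) - silu(x - h)) / (2 * h)
assert abs(silu_grad(x) - numeric) < 1e-6
```

The "bounded below" property mentioned above is visible numerically as well: silu(x) never drops below roughly −0.28, while it grows without bound for large positive x.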

As an optional embodiment, the fish pose recognition model can be run on Google's Colab platform, with the data and YOLOv5 model files stored on Google Drive and imported into Colab for training. YOLOv5 version 6.1 is used with Python 3.7, and the required libraries are installed from the requirements file. The Drive capacity is 15 GB, the memory capacity is 8 GB, the GPU is a Tesla K80 with 11441 MB of video memory, and CUDA is enabled during training.

The YOLOv5 parameters are set as follows: the yolov5n model is used with hyperparameters optimized from scratch-low; epoch is set to 100, batch-size to 16, and the image size to 640; cache, agnostic-nms, augment, and FP16 half precision are enabled; the optimizer is set to AdamW; label-smoothing is enabled and set to 0.1; conf-thres is set to 0.5 and iou-thres to 0.5.

With cache enabled, the dataset is loaded into video memory before training, so it need not be read from disk on every access, which speeds up training. agnostic-nms enables class-agnostic non-maximum suppression, which reduces duplicate and redundant target boxes. iou-thres is the intersection-over-union threshold used during non-maximum suppression, and conf-thres is the confidence threshold: only boxes whose confidence exceeds this value appear in the training results. augment is a data augmentation method based on the HSV color space that expands the data volume to some extent and improves model robustness. AdamW is a gradient descent algorithm obtained by adding L2-style weight-decay regularization to Adam; it mitigates overfitting while converging noticeably faster than Adam and SGD. FP16 half precision greatly reduces memory usage and speeds up computation with essentially no effect on model accuracy. Label smoothing is a regularization strategy that injects noise through soft one-hot targets, reducing the weight of the true class label when computing the loss function; this suppresses overfitting and improves the model's generalization ability.

In addition, to speed up model convergence, improve training efficiency, and reduce the number of iterations, the YOLOv5 hyperparameters were obtained with the evolve hyperparameter evolution algorithm: 100 epochs of training were run for distinct, mutually orthogonal hyperparameters, with each epoch trained up to the 10th generation.

The initial hyperparameter values are those in the hyp.scratch-low file of YOLOv5, and the hyperparameter evaluation metric (fitness) is set to 0.1×mAP(0.5) + 0.9×mAP(0.5:0.95). The method obtains optimal hyperparameters by maximizing this fitness metric using a genetic algorithm: in each generation of the evolution process, the previous generation with the highest fitness score is selected for mutation, and every hyperparameter has roughly a 20% chance of mutating to another value.
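The fitness metric and the 20% mutation rule can be sketched as follows. This is a deliberately simplified illustration of one evolution step under assumed details (multiplicative Gaussian mutation, fixed seed); the real evolve routine also clips values to per-hyperparameter bounds and scales the mutation magnitude per parameter.

```python
import random

def fitness(map50, map50_95):
    """Weighted evaluation metric: 0.1 * mAP(0.5) + 0.9 * mAP(0.5:0.95)."""
    return 0.1 * map50 + 0.9 * map50_95

def mutate(hyps, p=0.2, sigma=0.2, rng=random.Random(0)):
    """Each hyperparameter mutates with probability p by a multiplicative
    Gaussian factor; the parent would be the best-fitness generation so far."""
    return {k: v * (1 + rng.gauss(0, sigma)) if rng.random() < p else v
            for k, v in hyps.items()}

parent = {"lr0": 0.008, "momentum": 0.85, "weight_decay": 0.0006}
child = mutate(parent)
```

The 0.9 weight on mAP(0.5:0.95) means evolution optimizes mainly for localization quality across strict IoU thresholds, with the looser mAP(0.5) contributing only a tenth of the score.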

The optimized values are: lr0 learning rate 0.008, lr1 learning rate 0.07, learning momentum 0.85, weight decay 0.0006, warm-up epochs 3.4, warm-up learning momentum 0.95, warm-up initial bias learning rate 0.09, anchor box size 0.065, IoU threshold 0.2, hue shift 0.001, saturation gain 0.63, brightness gain 0.43, image rotation probability 0.0009, image translation probability 0.12, image scaling probability 0.9, image flip probability 0.001, image mirror probability 0.5, mosaic augmentation probability 0.85, mixup probability 0.05, and 4.6 anchor boxes per layer.

Next, the images are trained on a computer with YOLOv5 6.1 deployed. The model includes its own image augmentation algorithms, which can expand the dataset by mirroring, cropping and stitching, rotating and translating, copy-mixing, and adjusting the brightness, hue, and saturation of the original images. The YOLOv5 algorithm iteratively updates the weight matrices and biases of the forward pass via gradient descent to reduce the loss between the predicted and ground-truth boxes, computes the value of the error cost function from the updated weights, and uses the optimizer to adjust the learning rate, finally yielding the fish pose recognition model.

The following metrics can be used to evaluate the model's performance:

FN: judged a negative sample, but actually a positive sample.

FP: judged a positive sample, but actually a negative sample.

TN: judged a negative sample, and actually a negative sample.

TP: judged a positive sample, and actually a positive sample.

Intersection-over-union: IoU = A∩B / A∪B.

Precision = TP / (TP + FP).

Recall = TP / (TP + FN).

F1-score = 2 × (Precision × Recall) / (Precision + Recall).

mAP = (1/m) ∑ AP(i), i ∈ [0, m), i ∈ ℕ, where m is the number of classes and AP(i) is the average precision of class i.

mAP(0.5) is the mAP at an IoU threshold of 0.5; here TP is the number of detection boxes with IoU > 0.5 and FP the number with IoU ≤ 0.5.

mAP(0.5:0.95) is the mean mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
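The metric definitions above translate directly into code; a sketch computing precision, recall, and F1 from TP/FP/FN counts, plus the class-averaged mAP:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)

def mean_ap(ap_per_class):
    """mAP = (1/m) * sum(AP(i)) over the m classes."""
    return sum(ap_per_class) / len(ap_per_class)

# Example: 90 correct detections, 10 false alarms, 30 missed targets.
p, r = precision(90, 10), recall(90, 30)
print(round(p, 2), round(r, 2), round(f1(p, r), 3))  # 0.9 0.75 0.818
```

F1 is the harmonic mean of precision and recall, so it rewards a balance between the two: a detector that never misses but floods the output with false alarms scores poorly, as does one that only fires when certain.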

YOLOv5 comes in five structures of increasing depth and width: v5n, v5s, v5m, v5l, and v5x. The embodiment of the present invention can adopt the simplest structure, v5n, which runs the fastest while showing no significant drop in prediction accuracy compared with the other four models.

As shown in Figure 9, the fish pose recognition model achieves 95.1% precision, 94.9% recall, a 97.8% mAP(0.5) score, and an 83.0% mAP(0.5:0.95) score, with all evaluation metrics essentially converged by epoch 100. The evaluation of the model's mAP scores further demonstrates that the method provided by the embodiment of the present invention has good robustness and accuracy; with a recognition speed of up to 120 FPS, it enables near-real-time, fully automated identification and localization of moving fish targets in underwater environments.

Based on any of the above embodiments, the present invention also provides a method for dynamic recognition of underwater fish pose. As shown in Figure 10, the method includes:

First, video image data is acquired, and image preprocessing and image annotation are performed on it in sequence. Next, the annotated video image data is randomly divided into a training set and a test set, and the training set is used to build and train the fish pose detection model (which may be a YOLOv5-F model).

After model training is complete, the test set is used to evaluate the model's performance and accuracy and to judge whether it meets the requirements. If so, pose recognition can be performed on real-time video data; if not, further data must be collected to continue training the model.

The device for dynamic recognition of underwater fish pose provided by the present invention is described below; the device described below and the method for dynamic recognition of underwater fish pose described above may be cross-referenced with each other.

Based on any of the above embodiments, the present invention also provides a device for dynamic recognition of underwater fish pose. As shown in Figure 11, the device includes:

a determining unit 1110, configured to determine the fish image to be recognized;

a recognition unit 1120, configured to perform target detection on the stitched image based on the fish pose recognition model to obtain a target detection box image, down-sample the target detection box image, generate a feature pyramid from the down-sampled target detection box image, and determine the fish pose from the features in the feature pyramid; the stitched image is obtained by stitching the augmented fish images to be recognized;

the fish pose recognition model is trained on sample fish images and their sample fish pose labels.

Based on any of the above embodiments, the recognition unit 1120 includes:

an input unit, configured to apply data augmentation to the fish image to be recognized based on the input layer of the fish pose recognition model, stitch the multiple augmented images to obtain the stitched image, and perform target detection on the stitched image to obtain the target detection box image;

a sampling unit, configured to apply slicing and convolution operations in turn to the target detection box image based on the sampling layer of the fish pose recognition model to obtain an initial feature map, and down-sample the initial feature map multiple times to obtain down-sampled feature maps;

a pyramid unit, configured to, based on the pyramid layer of the fish pose recognition model, up-sample each upper-level feature map and fuse it with the feature map of the level below to obtain a first feature pyramid, and down-sample each lower-level feature map and fuse it with the feature map of the level above to obtain a second feature pyramid, where the first-level feature map is the down-sampled feature map;

a prediction unit, configured to determine the fish pose from the features in the first feature pyramid and the second feature pyramid based on the prediction layer of the fish pose recognition model.

Based on any of the above embodiments, the device further includes a scaling unit, configured to scale the stitched image to a preset standard size before target detection is performed on it.

Based on any of the above embodiments, the sampling unit is configured to:

down-sample the current feature map to obtain a current initial down-sampled feature map, and fuse the previous down-sampled feature map with the current initial down-sampled feature map using the residual components of the sampling layer to obtain the current down-sampled feature map; the first current feature map is the initial feature map.

Based on any of the above embodiments, the device further includes:

a processing unit, configured to apply convolution, normalization, and max-pooling in sequence to the down-sampled feature map after it is obtained.

Based on any of the above embodiments, the recognition unit 1120 includes:

a candidate unit, configured to determine multiple candidate detection boxes based on the features in the feature pyramid;

a screening unit, configured to screen the multiple candidate detection boxes with a non-maximum suppression algorithm to obtain the fish detection box, and perform pose recognition based on the fish detection box to determine the fish pose.

Based on any of the above embodiments, the loss value of the fish pose recognition model is determined by the following formula:

Loss = -(1/n) ∑ (t[i] log(o[i]) + (1 - t[i]) log(1 - o[i]))

where Loss is the loss value of the fish pose recognition model, o[i] is the predicted fish pose produced by the model for a sample fish image, and t[i] is the sample fish pose label.

Figure 12 is a schematic structural diagram of the electronic device provided by the present invention. As shown in Figure 12, the electronic device may include a processor 1210, a memory 1220, a communications interface 1230, and a communication bus 1240, with the processor 1210, the memory 1220, and the communications interface 1230 communicating with one another through the communication bus 1240. The processor 1210 can invoke the logic instructions in the memory 1220 to execute the method for dynamic recognition of underwater fish pose, which includes: determining the fish image to be recognized; performing target detection on the stitched image based on the fish pose recognition model to obtain a target detection box image; down-sampling the target detection box image; generating a feature pyramid from the down-sampled target detection box image; and determining the fish pose from the features in the feature pyramid. The stitched image is obtained by stitching the augmented fish images to be recognized, and the fish pose recognition model is trained on sample fish images and their sample fish pose labels.

此外,上述的存储器1220中的逻辑指令可以通过软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。In addition, the above-mentioned logic instructions in the memory 1220 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product in essence, or the part that contributes to the prior art or the part of the technical solution. The computer software product is stored in a storage medium, including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes: U disk, mobile hard disk, Read-Only Memory (ROM, Read-Only Memory), Random Access Memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes .

另一方面,本发明还提供一种计算机程序产品,所述计算机程序产品包括存储在非暂态计算机可读存储介质上的计算机程序,所述计算机程序包括程序指令,当所述程序指令被计算机执行时,计算机能够执行上述各方法所提供的水下鱼体姿态动态识别方法,该方法包括:确定待识别鱼体图像;基于鱼体姿态识别模型,对拼接图像进行目标检测,得到目标检测框图像后,对所述目标检测框图像进行下采样,以及基于下采样后的目标检测框图像生成特征金字塔,并基于所述特征金字塔中的特征确定鱼体姿态;所述拼接图像是对数据增强后的待识别鱼体图像进行拼接后得到的;所述鱼体姿态识别模型基于样本鱼体图像及其样本鱼体姿态标签训练得到。In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, when the program instructions are executed by a computer When executing, the computer can execute the underwater fish body posture dynamic recognition method provided by the above methods, and the method includes: determining the fish body image to be recognized; based on the fish body posture recognition model, performing target detection on the spliced images to obtain a target detection frame After the image, the target detection frame image is downsampled, and a feature pyramid is generated based on the downsampled target detection frame image, and the fish body posture is determined based on the features in the feature pyramid; the stitched image is a data enhancement method. The fish body image to be recognized is obtained after splicing; the fish body gesture recognition model is obtained by training based on the sample fish body image and the sample fish body gesture label.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. The computer program, when executed by a processor, implements the underwater fish body posture dynamic recognition method provided above, the method including: determining a fish body image to be recognized; based on a fish body posture recognition model, performing target detection on a stitched image to obtain a target detection frame image, downsampling the target detection frame image, generating a feature pyramid based on the downsampled target detection frame image, and determining the fish body posture based on the features in the feature pyramid; wherein the stitched image is obtained by stitching data-augmented fish body images to be recognized, and the fish body posture recognition model is trained on sample fish body images and their sample fish body posture labels.
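The stitching step described above (combining several data-augmented images into a single mosaic before target detection) can be sketched as follows. This is a minimal illustration only, assuming four equally sized augmented images tiled into a 2x2 grid; it is not the patented implementation, and the function and variable names are hypothetical.

```python
import numpy as np

def stitch_mosaic(images):
    """Tile four equally sized H x W x C images into one 2H x 2W x C
    mosaic image, in the style of mosaic data augmentation."""
    if len(images) != 4:
        raise ValueError("expected exactly four augmented images")
    top = np.concatenate(images[:2], axis=1)     # top-left, top-right
    bottom = np.concatenate(images[2:], axis=1)  # bottom-left, bottom-right
    return np.concatenate([top, bottom], axis=0)

# Four dummy "augmented fish images" (64x64 RGB) standing in for real frames
augmented = [np.random.rand(64, 64, 3) for _ in range(4)]
mosaic = stitch_mosaic(augmented)
print(mosaic.shape)  # (128, 128, 3)
```

The mosaic is then what the model's input layer would pass on to target detection; in practice it would also be rescaled to a preset standard size, as claim 3 notes.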

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, and some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An underwater fish body posture dynamic recognition method, characterized by comprising:
determining a fish body image to be recognized; and
based on a fish body posture recognition model, performing target detection on a stitched image to obtain a target detection frame image, downsampling the target detection frame image, generating a feature pyramid based on the downsampled target detection frame image, and determining the fish body posture based on the features in the feature pyramid, wherein the stitched image is obtained by stitching data-augmented fish body images to be recognized;
wherein the fish body posture recognition model is trained on sample fish body images and their sample fish body posture labels.

2. The underwater fish body posture dynamic recognition method according to claim 1, characterized in that the performing target detection on the stitched image based on the fish body posture recognition model to obtain the target detection frame image, downsampling the target detection frame image, generating the feature pyramid based on the downsampled target detection frame image, and determining the fish body posture based on the features in the feature pyramid comprises:
based on an input layer of the fish body posture recognition model, performing data augmentation on the fish body image to be recognized, stitching the multiple images obtained after data augmentation to obtain the stitched image, and performing target detection on the stitched image to obtain the target detection frame image;
based on a sampling layer of the fish body posture recognition model, sequentially performing slicing and convolution operations on the target detection frame image to obtain an initial feature map, and downsampling the initial feature map multiple times to obtain a downsampled feature map;
based on a pyramid layer of the fish body posture recognition model, upsampling the feature map of an upper layer and fusing it with the feature map of the next layer to obtain a first feature pyramid, and downsampling the feature map of a lower layer and fusing it with the feature map of the upper layer to obtain a second feature pyramid, wherein the feature map of the first layer is the downsampled feature map; and
based on a prediction layer of the fish body posture recognition model, determining the fish body posture according to the features in the first feature pyramid and the second feature pyramid.

3. The underwater fish body posture dynamic recognition method according to claim 2, characterized in that, before the performing target detection on the stitched image, the method further comprises: scaling the stitched image to a preset standard size.

4. The underwater fish body posture dynamic recognition method according to claim 2, characterized in that the downsampling the initial feature map multiple times to obtain the downsampled feature map comprises:
downsampling the current feature map to obtain a current initial downsampled feature map, and fusing the previous downsampled feature map with the current initial downsampled feature map based on a residual component of the sampling layer to obtain the current downsampled feature map, wherein the first current feature map is the initial feature map.

5. The underwater fish body posture dynamic recognition method according to claim 2, characterized in that, after the obtaining the downsampled feature map, the method further comprises:
sequentially performing convolution, normalization, and max-pooling on the downsampled feature map.

6. The underwater fish body posture dynamic recognition method according to any one of claims 1 to 5, characterized in that the determining the fish body posture based on the features in the feature pyramid comprises:
determining a plurality of candidate detection frames based on the features in the feature pyramid; and
screening a fish body detection frame from the plurality of candidate detection frames based on a non-maximum suppression algorithm, and performing posture recognition based on the fish body detection frame to determine the fish body posture.

7. The underwater fish body posture dynamic recognition method according to any one of claims 1 to 5, characterized in that the loss value of the fish body posture recognition model is determined by the following formula:
Loss = -1/n * Σ( t[i]·log(o[i]) + (1 - t[i])·log(1 - o[i]) );
where Loss denotes the loss value of the fish body posture recognition model, o[i] denotes the sample predicted fish body posture obtained by the fish body posture recognition model for a sample fish body image, and t[i] denotes the sample fish body posture label.

8. An underwater fish body posture dynamic recognition device, characterized by comprising:
a determining unit, configured to determine a fish body image to be recognized; and
a recognition unit, configured to, based on a fish body posture recognition model, perform target detection on a stitched image to obtain a target detection frame image, downsample the target detection frame image, generate a feature pyramid based on the downsampled target detection frame image, and determine the fish body posture based on the features in the feature pyramid, wherein the stitched image is obtained by stitching data-augmented fish body images to be recognized;
wherein the fish body posture recognition model is trained on sample fish body images and their sample fish body posture labels.

9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the underwater fish body posture dynamic recognition method according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the underwater fish body posture dynamic recognition method according to any one of claims 1 to 7.
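The loss in claim 7 is the standard binary cross-entropy averaged over n samples. A minimal sketch of the computation follows; the names t and o mirror the claim's symbols, while the function name and example values are illustrative only.

```python
import math

def fish_pose_loss(t, o):
    """Binary cross-entropy as written in claim 7:
    Loss = -1/n * sum( t[i]*log(o[i]) + (1 - t[i])*log(1 - o[i]) )"""
    n = len(t)
    return -sum(ti * math.log(oi) + (1 - ti) * math.log(1 - oi)
                for ti, oi in zip(t, o)) / n

labels = [1.0, 0.0, 1.0]       # sample fish body posture labels t[i]
predictions = [0.9, 0.1, 0.8]  # model outputs o[i]
print(round(fish_pose_loss(labels, predictions), 4))  # 0.1446
```

Note the formula assumes o[i] is strictly between 0 and 1; production implementations typically clamp predictions away from 0 and 1 to keep the logarithms finite.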
CN202210432208.5A 2022-04-22 A method and device for dynamic recognition of underwater fish posture Active CN114821778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210432208.5A CN114821778B (en) 2022-04-22 A method and device for dynamic recognition of underwater fish posture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210432208.5A CN114821778B (en) 2022-04-22 A method and device for dynamic recognition of underwater fish posture

Publications (2)

Publication Number Publication Date
CN114821778A true CN114821778A (en) 2022-07-29
CN114821778B CN114821778B (en) 2025-04-18


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311626A (en) * 2022-08-30 2022-11-08 金锋馥(滁州)科技股份有限公司 Express package detection and identification algorithm based on deep learning
CN115645796A (en) * 2022-10-24 2023-01-31 上海应用技术大学 Automatic positioning jet fire extinguishing system and its detection and positioning method based on YOLOv5
CN116704555A (en) * 2023-08-09 2023-09-05 四川大学 Method and system for red panda recognition based on attitude adjustment
CN117809331A (en) * 2023-11-20 2024-04-02 中国水产科学研究院南海水产研究所 Method and device for detecting swimming posture of fish body, electronic equipment and storage medium
CN118898630A (en) * 2024-07-19 2024-11-05 青岛道万科技有限公司 Zooplankton scanning imaging analysis method, medium and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163283A (en) * 2011-05-25 2011-08-24 电子科技大学 Method for extracting face characteristic based on local three-value mode
WO2016092646A1 (en) * 2014-12-10 2016-06-16 株式会社ニレコ Fish type determination device and fish type determination method
CN111310622A (en) * 2020-02-05 2020-06-19 西北工业大学 A fish school target recognition method for intelligent operation of underwater robots
CN111339903A (en) * 2020-02-21 2020-06-26 河北工业大学 Multi-person human body posture estimation method
CN111881803A (en) * 2020-07-22 2020-11-03 安徽农业大学 Livestock face recognition method based on improved YOLOv3
CN112669348A (en) * 2020-12-18 2021-04-16 浙江大学 Fish body posture estimation and fish body phenotype data measurement method and device
CN113361353A (en) * 2021-05-28 2021-09-07 同济大学 Zebrafish morphological scoring method based on DeepLabV3Plus
WO2022000862A1 (en) * 2020-06-29 2022-01-06 苏州科达科技股份有限公司 Method and apparatus for detecting object in fisheye image, and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311626A (en) * 2022-08-30 2022-11-08 金锋馥(滁州)科技股份有限公司 Express package detection and identification algorithm based on deep learning
CN115645796A (en) * 2022-10-24 2023-01-31 上海应用技术大学 Automatic positioning jet fire extinguishing system and its detection and positioning method based on YOLOv5
CN116704555A (en) * 2023-08-09 2023-09-05 四川大学 Method and system for red panda recognition based on attitude adjustment
CN116704555B (en) * 2023-08-09 2023-10-13 四川大学 Panda identification method and system based on posture adjustment
CN117809331A (en) * 2023-11-20 2024-04-02 中国水产科学研究院南海水产研究所 Method and device for detecting swimming posture of fish body, electronic equipment and storage medium
CN118898630A (en) * 2024-07-19 2024-11-05 青岛道万科技有限公司 Zooplankton scanning imaging analysis method, medium and electronic device
CN118898630B (en) * 2024-07-19 2025-01-28 青岛道万科技有限公司 Zooplankton scanning imaging analysis method, medium and electronic device

Similar Documents

Publication Publication Date Title
US10671855B2 (en) Video object segmentation by reference-guided mask propagation
CN111915627B (en) Semantic segmentation method, network, device and computer storage medium
CN112132156A (en) Multi-depth feature fusion image saliency target detection method and system
CN110532871A (en) The method and apparatus of image procossing
JP6309549B2 (en) Deformable expression detector
WO2023070447A1 (en) Model training method, image processing method, computing processing device, and non-transitory computer readable medium
CN112927209B (en) A CNN-based saliency detection system and method
CN117690128B (en) Embryo cell multi-core target detection system, method and computer readable storage medium
CN111986204B (en) A polyp segmentation method, device and storage medium
CN115546622A (en) Fish detection method and system, electronic device and storage medium
CN115797658A (en) Underwater garbage detection method and system
CN112802010A (en) Cancer cell detection method, system and medium based on deep learning
CN114140844A (en) Face silence living body detection method and device, electronic equipment and storage medium
Wang et al. SERR‐U‐Net: Squeeze‐and‐Excitation Residual and Recurrent Block‐Based U‐Net for Automatic Vessel Segmentation in Retinal Image
CN111814693A (en) A deep learning-based method for marine ship recognition
CN112927250B (en) Edge detection system and method based on multi-granularity attention hierarchical network
CN114511702A (en) Remote sensing image segmentation method and system based on multi-scale weighted attention
CN111582057B (en) Face verification method based on local receptive field
CN104616323A (en) Space-time significance detecting method based on slow characteristic analysis
CN117649672A (en) Font type visual detection method and system based on active learning and transfer learning
CN114821778A (en) A method and device for dynamic recognition of underwater fish body posture
CN111047571A (en) An image salient object detection method with adaptive selection training process
CN114821778B (en) A method and device for dynamic recognition of underwater fish posture
CN116129417A (en) Digital instrument reading detection method based on low-quality image
CN116403068A (en) Lightweight monocular depth prediction method based on multi-scale attention fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant