CN113569921B

CN113569921B - A ship classification and recognition method and device based on GNN

Info

Publication number: CN113569921B
Application number: CN202110766734.0A
Authority: CN
Inventors: 李湉雨; 胥辉旗; 曾维贵; 张润萍; 程永茂; 刘亮; 刘明刚; 杨利斌
Original assignee: Coastal Defense College Of Naval Aviation University Of Chinese Pla
Current assignee: Coastal Defense College Of Naval Aviation University Of Chinese Pla
Priority date: 2021-07-07
Filing date: 2021-07-07
Publication date: 2025-02-18
Anticipated expiration: 2041-07-07
Also published as: CN113569921A

Abstract

The present invention provides a ship classification and identification method and device based on GNN, the method comprising: extracting the features of ship AIS data, constructing a sample set, the sample set being a three-dimensional matrix; converting the sample set into graph structure data, and dividing the sample set into a training set and a test set; training a GNN network model with the training set, inputting the features of the ship AIS data of all samples to be tested in the test set into the trained GNN network model to test the effectiveness of the GNN network, and classifying the ships to be classified using the GNN network that has passed the test, the GNN network model being a GNN neural network model with two layers of graph convolution layers. According to the scheme of the present invention, by using the ship trajectory, spatial features can be effectively extracted for machine learning, and the accuracy of ship trajectory classification and identification can be improved.

Description

Ship classification and identification method and device based on GNN

Technical Field

The invention relates to the field of pattern recognition, in particular to a ship classification recognition method and device based on GNN.

Background

Ship classification is widely used in both military and civilian fields, such as detecting illegal ships by related departments, vigilance to maritime terrorism, countering smuggling, etc. At present, the method for researching ship type classification at home and abroad mainly takes traditional radar identification and optical identification as main aspects, but has limitations, such as that the optical identification depends on video monitoring equipment, the visual field range is limited, the acting distance is short, the influence of weather factors such as rain and fog is easy to influence, and particularly the limitation on weather conditions such as high humidity and low cloud at sea is large. Although the radar identification is slightly affected by the environment, the radar identification has the problems of visibility and indistinguishability, and common-frequency interference clutter is easy to generate in a complex electromagnetic environment. The scheme for carrying out ship classification and identification according to the AIS is little in weather influence, can automatically identify the ship state in all weather, is high in data acquisition precision, can acquire static data such as voyage and ship self attribute, and therefore the AIS has important significance for ship classification and identification. The AIS data has the characteristics of large data volume and wide coverage area range, and brings certain challenges to classification and identification.

The traditional research method of ship classification mainly comprises a clustering algorithm based on the distance between track points, a machine learning algorithm after manually extracting features, and a neural network classification method. In recent years, deep neural networks are becoming a research hotspot compared to methods combining artificial feature extraction with machine learning. At present, the neural network used for classifying the ship track mainly comprises a CNN (computer numerical network), an MCDCNN (micro-channel digital network), a 1DCNN (computer numerical network), a convolutional neural network and other traditional cyclic neural networks, but the data processed by the convolutional neural network is in a matrix form, is based on a matrix formed by arranging samples, belongs to a Euclidean structure, and if the characteristics of the samples are regarded as vertexes, the vertexes of the traditional neural network are independent and do not utilize the connection between the vertexes, while the cyclic neural network is modeled based on a time sequence, the disadvantage is that the characteristics of the samples are insufficient and the connection between the recommended characteristics of different samples cannot be utilized, so that the characteristic learning is incomplete, and the classification result is not ideal.

Disclosure of Invention

In order to solve the technical problems, the invention provides a ship classification and identification method and device based on GNN, which are used for solving the technical problems that the prior art does not utilize the connection between data and the classification result is not ideal enough.

According to a first aspect of the present invention, there is provided a GNN-based ship classification recognition method, the method comprising the steps of:

Step S101, extracting characteristics of ship AIS data, and constructing a sample total set, wherein the sample total set is a three-dimensional matrix;

Step S102, training a GNN network model by a training set, inputting the characteristics of ship AIS data of all samples to be tested in a testing set into the trained GNN network model to test the effectiveness of the GNN network, and classifying ships to be classified by utilizing the GNN network passing the testing;

The method comprises the steps of taking each track as a sample, wherein the sample aggregate is a three-dimensional matrix, the first dimension of the three-dimensional matrix is the track number S= { S ₁,…,S_i,…,S_num } of AIS data, the second dimension is the track point number N on the track S _i of the AIS data, the third dimension is the attribute of each track point and comprises IMO, h, v, t-stamp, lat, lon, wherein IMO is the ship IMO code, h is the ship bow direction characteristic, v is the speed, t-stamp is a time stamp, lat is the track point latitude value, and lon is the track point longitude value;

Converting the sample total set into graph structure data G= (V, edge), wherein V is a vertex and Edge is an Edge connected with the vertex, constructing a vertex feature matrix M by taking a bow-to-ship feature h of a track point as a vertex feature, and constructing an adjacent matrix B by calculating the weight of the Edge connected with the vertex according to a navigational speed feature V;

The GNN network model is a GNN neural network model with two layers of graph convolution layers.

According to a second aspect of the present invention, there is provided a GNN-based ship classification recognition device, the device comprising:

the characteristic acquisition module is configured to extract characteristics of ship AIS data, and construct a sample total set, wherein the sample total set is a three-dimensional matrix;

the classification module is configured to train the GNN network model by the training set, input the characteristics of ship AIS data of all samples to be tested in the testing set into the trained GNN network model to test the effectiveness of the GNN network, and classify ships to be classified by utilizing the GNN network passing the test;

According to a third aspect of the present invention, there is provided a GNN-based ship classification recognition system, comprising:

a processor for executing a plurality of instructions;

A memory for storing a plurality of instructions;

wherein the plurality of instructions are for storing by the memory and loading and executing by the processor the GNN-based ship classification identification method as described above.

According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored therein a plurality of instructions for loading and executing by a processor a GNN based ship classification identification method as described hereinbefore.

According to the scheme of the invention, as the traditional neural network can only process the ordered Euclidean structure, the characteristic connection between different vertexes cannot be effectively utilized, the ship track has time-space domain characteristics, and the sample characteristics belong to the unordered non-Euclidean structure. According to the method, the connection relation information between the track points is extracted by utilizing time-space domain features, such as position features, distance features and speed features, included in the ship track, a topological association network is established, and space features can be effectively extracted for machine learning. Firstly, mapping of track point data and vertexes and edges in a graph structure is constructed, key characteristics of the vertexes are extracted, adjacent matrixes are constructed by weight assignment of the edges, the track point data is changed into the graph data structure, and the GNN is input for training, so that the method for classifying and identifying the ship types overcomes the defects, and the accuracy of classifying and identifying the ship tracks can be improved.

The foregoing description is only an overview of the present invention, and is intended to provide a better understanding of the present invention, as it is embodied in the following description, with reference to the preferred embodiments of the present invention and the accompanying drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention, illustrate the invention and together with the description serve to explain the invention. In the drawings:

FIG. 1 is a flow chart of a marine classification and identification method based on GNN of the present invention;

FIG. 2 is a diagram of data of one embodiment of the present invention;

FIG. 3 is a schematic diagram of a GNN network model according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of training a GNN network model, in accordance with one embodiment of the present invention;

Fig. 5 is a block diagram showing a configuration of a GNN-based ship classification/identification apparatus according to an embodiment of the present invention.

Detailed Description

First, a GNN-based ship classification recognition method according to an embodiment of the present invention will be described with reference to fig. 1. As shown in fig. 1, the method comprises the steps of:

The graph neural network (Graph Neural Networks, GNN) is a deep learning model for processing graph data, can effectively utilize the characteristic connection among different samples, and is popular in the fields of social networks, knowledge maps, molecular chemistry and the like.

As shown in FIG. 2, the Graph (Graph) is composed of edges (edges) denoted by e and vertices (vertices) denoted by v. Each vertex in the graph contains a respective feature, the feature of the vertex can be represented by a matrix M in the X Y dimension, and the edges represent the relationship between the respective vertices, and can form a matrix B in the X dimension, referred to as an adjacency matrix. M and B are inputs to the neural network model.

Before said step S101, a step S100 is included;

step S100, determining the proportion of a training set and a testing set in a sample total set;

In this embodiment, the data ratio in the training set and the test set is 8:2, the training set is used to train the network model, and the test set is input into the network to classify the data to be tested.

The method comprises the steps of S101, extracting characteristics of ship AIS data, constructing a sample total set, converting the sample total set into graph structure data, and dividing the sample total set into a training set and a testing set, wherein the sample total set is a three-dimensional matrix, and the method comprises the following steps:

The method comprises the steps of obtaining continuous N track points conforming to a preset rule from each track S _i, forming a track S _i 'by the continuous N track points, extracting attributes of track points of the track S _i', and constructing a sample total set by extracting the characteristics of all samples, wherein the IMO is the ship IMO code, h is the bow direction, v is the speed, t-stamp is the time stamp, lat is the track point latitude value, lon is the track point longitude value, 1.ltoreq.i.ltoreq.Num, num is the track total number of the same IMO, the extracted AIS data is characterized by a three-dimensional matrix M _Num*N*6, and the sample total set is divided into a training set and a test set.

In this embodiment, the preset rule is that a predetermined time interval threshold is set, and a time interval between every two adjacent track points in the continuous N track points is smaller than the predetermined time interval threshold.

Because the time interval of the collection of the port AIS data is irregular, some adjacent track points are spaced for a few seconds, some adjacent track points are spaced for a few minutes or even tens of minutes, and the longer the time interval of the collection data is, the worse the sample quality is, therefore, a proper time interval threshold TT (Time threshold) which is as small as possible is necessary to be selected, and the reliability of the data is ensured. The number N of the track points contained in the data segment determines the classification recognition accuracy, and the more N the recognition accuracy is, so that proper N is selected as much as possible, and the effectiveness of the sample is ensured. Generally, the larger the time interval threshold, the larger the number of tracks in the sample segment, i.e. the reliability and validity of the sample is a pair of contradictions, so it is necessary to find an equilibrium state, and to make the sample have more N under the premise of smaller TT. Through testing, in this embodiment, the parameter tt=20 (in seconds) is selected and n=160 can meet the requirement through multiple experimental comparisons.

And converting the ship characteristic data into topological structure diagram data, and constructing an adjacency matrix by utilizing the characteristic description vertexes and edges. Only by selecting proper characteristic data as the input of the neural network, the effectiveness of ship track classification can be improved. In the embodiment, the ship bow direction characteristic is taken as the vertex characteristic, and the navigational speed characteristic is taken as the weight of the side to construct the adjacency matrix.

In this embodiment, a trace is used as a sample, and a sample is transformed into a vertex of the graph structure.

Converting the sample aggregate into graph structure data, comprising:

Step S1011, determining the receptive field of the vertices in the graph structure, comprising:

calculating the average Hash distance between samples to be measured

Wherein, Is the Hash distance between the nth track point in track S _i and the nth track point in track S _j.

And sorting the Hash distances in order from small to large, reserving data with the Hash distance value of the first 5% so as to ensure strong connection relation among the vertexes, and setting a relation strength threshold, for example, setting data which is exactly equal to the 5% as the relation strength threshold. The strong connection relation is represented by 1, the weak connection relation is represented by 0, and a relation matrix R based on the characteristic of the spatial distance connection strength is constructed to determine the receptive field of the vertex. The dimension of the relation matrix I is X, X representing the number of samples, i.e. the number of vertices of the graph.

Where Thr represents the relationship strength threshold value,Is the average haar distance between the samples to be measured.

In this embodiment, points with small haar distances represent strong connection relations between the distance features between the vertices, and points with large haar distances represent weak connection relations between the distance features between the vertices.

Step S1012, calculating the two norms of the average navigational speed difference of any two samples according to the average navigational speed ave_v of all track points in the samples, taking the two norms as the weights of edges in the graph structure to obtain a weight matrix E, wherein the dimension of the weight matrix E is X multiplied by X,

Wherein the method comprises the steps ofFor the average navigational speed of the trajectory S _i,The average navigational speed of the track S _j is indicated.

Step S1013, constructing an adjacent matrix B based on the weight matrix E:

Multiplying the relation matrix R point based on the space distance connection strength characteristic by the weight matrix E of the edge to obtain an adjacent matrix B, wherein the dimension of the adjacent matrix B is X multiplied by X,

B=R·E

And normalizing the adjacent matrix B:

wherein min (B) is the minimum value in matrix B, max (B) is the maximum value in matrix, and B (i, j) is the normalized adjacent matrix;

step S1014, extracting the bow-to-ship characteristic of the track point as a vertex characteristic, and constructing a vertex characteristic matrix M, wherein the dimension of the vertex characteristic matrix M is X multiplied by 1.

Step S102, training a GNN network model by a training set, inputting the characteristics of ship AIS data of all samples to be tested in a testing set into the trained GNN network model to test the effectiveness of the GNN network, classifying ships to be classified by using the GNN network passing the test, and the method comprises the following steps:

the GNN network model is a GNN neural network model with two layers of graph convolution layers, and further:

In this embodiment, as shown in fig. 3, the dots represent the vertices of the graph, and have different labels, and the input graph structure data passes through the GNN network structure and outputs different classification results of the labels.

In this embodiment, the input-output relationship of the first layer graph convolution layer is:

In the formula, h _j is the characteristic value of the vertex j in the input data, h _i is the characteristic value of the vertex i of the output data, sigma is an activation function, n _j∈Neigh(n_i) represents the receptive field of which the value range of the vertex j is the vertex i, W _1τ is the convolution kernel of the convolution layer of the first layer of graph, Normalization for the laplace matrix:

LAPRAS[i,j]=A^-1/2B_ijA^1/2

Wherein b=b+i, B is the normalized adjacency matrix, I is the identity matrix, a is the degree matrix of B, and the formula is a _ij＝∑_jB_ij.

The input-output relationship of the convolution layer of the second layer graph is as follows:

In the formula, h _j is the characteristic value of the vertex j in the input data, h _i is the characteristic value of the vertex i of the output data, sigma is an activation function, n _j∈Neigh(n_i) represents the receptive field of which the value range of the vertex j is the vertex i, W _2τ is the convolution kernel of the convolution layer of the second layer graph, Normalizing the Laplace matrix;

LAPRAS[i,j]=A^-1/2B_ijA^1/2

For the normalized adjacency matrix to be added to the identity matrix,Is thatA diagonal matrix formed by summing each column of elements,For the i-th row element of the diagonal matrix,Representing the ith row and ith column elements of the diagonal matrix,To the power of the negative half of the diagonal matrix,To the power of half the diagonal matrix.

As shown in fig. 4, the training process of the GNN network model is as follows:

Step 301, acquiring a training set in a sample total set, selecting all types of ship AIS sample data in the training set, and extracting characteristics of the AIS data of each sample data;

in this embodiment, 80% of the total set of samples is used as a training set, and 20% is used as a test set.

Wherein the extracting the AIS data characteristic of each sample data comprises:

The ship IMO codes the same AIS data, the ship IMO codes comprise at least one track S _i, each track S _i comprises a plurality of track points, the track points in each track S _i are sequentially arranged according to time stamps, continuous N track points conforming to a preset rule are obtained from each track S _i, the track S _i 'is formed by the continuous N track points, the attribute of each track point of the track S _i' is extracted, the attribute comprises IMO, h, v, t-stamp, lat, lon, the IMO is the ship IMO codes, h is the bow direction, v is the speed, t-stamp is the time stamp, lat is the track point latitude value, lon is the track point longitude value, i is not less than 1 and not more than Num, num is the total track number of the same IMO, and the extracted AIS data is characterized by a three-dimensional matrix M _Num*N*6.

Step S302, inputting the characteristics of the AIS data of each sample data in a training set into the GNN network until a preset training stopping condition is met, so as to obtain a trained GNN network;

And S303, inputting the characteristics of the ship AIS data of all the samples to be tested in the test set into a trained GNN network model to conduct vertex classification so as to test the effectiveness of the GNN network, and classifying the ships to be classified by utilizing the GNN network passing the test.

In this embodiment, the graph neural network training process trains the convolution kernel.

Further, before step S100, the ship AIS data preprocessing is performed, including:

The method comprises the steps of S1, constructing a ship characteristic information table, wherein an IMO (inertial measurement unit), a time stamp, a ship fore direction, a navigational speed, a track point latitude and a track point longitude are extracted from AIS (automatic identification system) data to serve as values, the IMO of the ship is taken as a main key value, namely, the track characteristic of each ship is stored according to an IMO number, and the track point data of each IMO are arranged according to the time stamp sequence;

Step S2, carrying out data cleaning on ship AIS data, wherein the step comprises the following steps:

And discarding dirty data meeting the data cleaning condition through data analysis, wherein the dirty data comprises abnormal position data and redundant position data, the abnormal position data refers to that the distance difference between two adjacent track point data is larger than a first preset distance threshold value when the time interval is smaller than a first preset time interval, and the redundant position data refers to that the characteristic properties of the two adjacent track point data are identical.

In this embodiment, the data analyzed and processed is ship track data in a fixed sea area, and dirty data needs to be discarded in order to make the classification effect ideal.

The judgment strategy is as follows:

1. For the data of the same key, the time interval and the distance interval between the (i+1) th track point and the (i) th track point are calculated, and the distance calculation formula is HAVERSINE formula in consideration of the curvature of the earth.

The distance calculated by HAVERSINE formula is called Hash distance for short.

Where l represents the distance between the two track points, R represents the earth radius, typically 6371Km, x _lat1 represents the latitude of x ₁, y _lon1 represents the longitude of y ₁, x _lat2 represents the latitude of x ₂, y _lon2 represents the longitude of y ₂, and phi is the input of the HAVERSINE formula.

If the Hash distance between two track points is too large in a short time interval, the i+1th track point to the n track point of the IMO ship are abnormal data, and the abnormal data need to be discarded, wherein n represents all track points under the same time window of the IMO ship.

2. If the i+1th track point is identical to the i track point, the i+1th track point is redundant data and needs to be discarded.

The embodiment of the invention further provides a ship classification and identification device based on GNN, as shown in fig. 5, the device comprises:

The embodiment of the invention further provides a ship classification and identification system based on GNN, which comprises the following steps:

a processor for executing a plurality of instructions;

A memory for storing a plurality of instructions;

The embodiment of the invention further provides a computer readable storage medium, wherein a plurality of instructions are stored in the storage medium, and the instructions are used for loading and executing the ship classification identification method based on the GNN by a processor.

It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.

In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the elements is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.

The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium, and includes several instructions for making a computer device (which may be a personal computer, a physical machine Server, or a network cloud Server, etc., and need to install a Windows or Windows Server operating system) execute part of the steps of the methods described in the embodiments of the present invention. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.

The above description is only of the preferred embodiments of the present invention, and is not intended to limit the present invention in any way, but any simple modification, equivalent variation and modification made to the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims

1. A GNN-based ship classification method, the method comprising the steps of:

The GNN network model is a GNN neural network model with two layers of graph convolution layers;

the converting the sample total set into graph structure data includes:

calculating the average Hash distance between samples to be measured

Wherein, Is the Hash distance between the nth track point in track S _i and the nth track point in track S _j;

Setting a relation strength threshold, sequencing Hash distances from small to large, wherein distance features between vertexes represented by the Hash distances smaller than the relation strength threshold are strong connection relations, distance features between vertexes represented by the Hash distances larger than the relation strength threshold are weak connection relations, the strong connection relations are represented by 1, the weak connection relations are represented by 0, a relation matrix R based on the space distance connection strength features is constructed to determine receptive fields of the vertexes, and the dimension of the relation matrix I is X multiplied by X, wherein X represents the number of samples, namely the number of vertexes of a graph:

where Thr represents the relationship strength threshold value, The average Hash distance between samples to be measured;

step S1012, calculating the two norms of the average navigational speed difference of any two samples to be tested according to the average navigational speed ave_v of all track points in the samples to be tested, and taking the norms as the weights of edges in the graph structure to obtain a weight matrix E, wherein the dimension of the weight matrix E is X multiplied by X,

Wherein the method comprises the steps ofFor the average navigational speed of the trajectory S _i,Representing the average navigational speed of the track S _j;

step S1013, constructing an adjacent matrix B based on the weight matrix E:

B=R·E

And normalizing the adjacent matrix B:

wherein min (B) is the minimum value in matrix B, max (B) is the maximum value in matrix, The normalized adjacency matrix;

2. The GNN-based ship classification recognition method according to claim 1, wherein prior to step S101, performing data cleaning on the ship AIS data, comprises:

Discarding dirty data meeting the data cleaning condition, wherein the dirty data comprises abnormal position data and redundant position data, the abnormal position data refers to that the distance difference between two adjacent track point data is larger than a first preset distance threshold value when the time interval is smaller than a first preset time interval, and the redundant position data refers to that the characteristic properties of the two adjacent track point data are completely identical.

3. A GNN-based ship classification and identification device, the device comprising:

4. The GNN-based ship classification recognition device of claim 3, wherein the feature acquisition module comprises:

a receptive field determination submodule configured to determine receptive fields for vertices in the graph structure, comprising:

calculating the average Hash distance between samples to be measured

The weight matrix acquisition sub-module is configured to calculate the two norms of the average navigational speed difference of any two samples to be detected according to the average navigational speed ave_v of all track points in the samples to be detected, and the two norms are used as the weights of edges in a graph structure to obtain a weight matrix E, the dimension of which is X multiplied by X,

An adjacency matrix acquisition sub-module configured to construct an adjacency matrix B based on the weight matrix E:

B=R·E

And normalizing the adjacent matrix B:

and the track point extraction submodule is configured to extract the bow-direction characteristics of the track points as vertex characteristics and construct a vertex characteristic matrix M, and the dimension of the vertex characteristic matrix M is X multiplied by 1.

5. GNN-based ship classification recognition device according to claim 3 or 4, characterized in that the device comprises:

The data cleaning module is configured to discard dirty data meeting data cleaning conditions, the dirty data comprises abnormal position data and redundant position data, the abnormal position data refers to that the distance difference between two adjacent track point data is larger than a first preset distance threshold value when the time interval is smaller than a first preset time interval, and the redundant position data refers to that the characteristic properties of the two adjacent track point data are identical.

6. A GNN-based ship classification recognition system, comprising:

a processor for executing a plurality of instructions;

A memory for storing a plurality of instructions;

Wherein the plurality of instructions are for storage by the memory and loading and executing by the processor the GNN-based ship classification identification method of any one of claims 1-2.

7. A computer readable storage medium having stored therein a plurality of instructions for loading and executing by a processor the GNN-based ship classification recognition method of any one of claims 1-2.