CN115410048B - Training of image classification model, image classification method, device, equipment and medium - Google Patents
- Publication number
- CN115410048B (Application CN202211202259.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- training
- module
- frequency
- frequency band
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
Abstract
The disclosure provides a training method for an image classification model, an image classification method, and a corresponding device, equipment and medium, relating to the field of artificial intelligence and in particular to deep learning. The scheme is as follows: acquire a training image set in a target image format and, according to the key frequency band matched with the training image set, construct a bypass module that filters out all signals in that key frequency band; identify at least one low-dimensional image feature extraction layer in the image classification model to be trained, and connect the bypass module in parallel to the output of each such layer to obtain an improved model; train the improved model with the training image set, then remove every parallel bypass module from the trained improved model to obtain the target image classification model, thereby expanding the range of image formats to which the target model is adapted. This scheme improves the robustness of the image classification model.
Description
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to deep learning, and specifically to a training method for an image classification model, an image classification method, and a corresponding device, equipment and medium.
Background
With the development of deep learning, the robustness of deep learning models has become increasingly important in practice. In fields such as automatic driving and face recognition, model robustness can bear directly on personal safety. Likewise, when a model is deployed in an isolated environment and the type of input picture cannot be known in advance, the model may keep high prediction accuracy on a lossy compression format such as jpg, yet degrade sharply when the input suddenly switches to a bmp-format picture. Improving model robustness is therefore genuinely necessary.
In the prior art, model robustness is generally improved from four directions: the data, the model structure, the training and tuning scheme, and adversarial attack. However, each has drawbacks. Improving robustness from the data increases the training time of the model; improving it from the model structure requires the model designer to have abundant experience and lengthens the generation of adversarial samples; improving it through training optimization requires deep understanding of the business form corresponding to the model and ingenious mathematical design; and improving it through adversarial attack also increases model training time. How to improve the robustness of an image classification model effectively therefore remains a problem to be solved.
Disclosure of Invention
The disclosure provides a training method for an image classification model, an image classification method, and a corresponding device, equipment and medium.
According to an aspect of the present disclosure, there is provided a training method of an image classification model, including:
acquiring a training image set in a target image format, and constructing a bypass module for filtering all signals in a key frequency band according to the key frequency band matched with the training image set;
identifying at least one low-dimensional image feature extraction layer in an image classification model to be trained, and connecting the bypass module in parallel with the output end of the at least one low-dimensional image feature extraction layer to obtain an improved model;
and training the improved model by using the training image set, and removing each bypass module connected in parallel in the trained improved model to obtain a target image classification model, so as to expand the image formats to which the target image classification model is adapted.
According to another aspect of the present disclosure, there is provided an image classification method including:
acquiring an image to be classified in a first image format, and inputting the image to be classified into a pre-trained target image classification model;
the target image classification model is obtained by training an improved model with a training image set in a second image format, and the improved model is obtained by connecting, in the image classification model, a bypass module that is constructed according to a key frequency band matched with the training image set and is used for filtering all signals in the key frequency band;
And obtaining an image classification result which is output by the target image classification model and matched with the image to be classified.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method for an image classification model or the image classification method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the training method for an image classification model or the image classification method according to any embodiment of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method of training an image classification model provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of another method of training an image classification model provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a schematic structural view of an improved model provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a bypass module provided in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another bypass module provided in accordance with an embodiment of the present disclosure;
FIG. 6 is a flow chart of an alternative training method for an image classification model provided in accordance with an embodiment of the present disclosure;
FIG. 7 is a flow chart of an image classification method provided in accordance with an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a training device for an image classification model according to an embodiment of the present disclosure;
fig. 9 is a schematic structural view of an image classification apparatus according to an embodiment of the present disclosure;
fig. 10 is a block diagram of an electronic device for implementing a training method or image classification method of an image classification model in accordance with an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Since the data set used to train a model is itself small, while the scenarios to be predicted form an open set of millions, the model's predictions may fail on data it never saw during training. It is therefore desirable to make the model as robust as possible so that it can cope with more of the likely scenes.
In the prior art, methods for improving model robustness can generally be classified into four groups: improving robustness from the data, from the model structure, from the training optimization scheme, and from adversarial attack. 1) Improving robustness from the data means collecting more data, generating more data, scaling or transforming the data, selecting features, redefining the problem, resampling, and so on; it allows varied inputs without changing the data labels, but increases the training time of the model.
2) Improving robustness from the model structure covers model fusion, multi-scale fusion, grouped convolution, adding an attention mechanism, and selecting or improving activation functions; however, this not only requires the model designer to have abundant experience, but also increases the generation time of adversarial samples. 3) Improving robustness from the training optimization scheme covers the learning rate, loss selection, weight initialization, and batch and epoch counts; however, loss design requires deep understanding of the business form corresponding to the model and ingenious mathematical design. 4) Improving robustness from adversarial attack means, for example, running white-box attacks on the model to generate more adversarial samples, then training on original and adversarial samples together, possibly generating adversarial samples in step with training; however, this increases training time and may also decrease the accuracy of the model on the original samples.
Therefore, in order to improve model robustness effectively from the perspective of the frequency bands of the input information, the embodiments of the disclosure provide a training method for an image classification model.
Fig. 1 is a flowchart of a training method for an image classification model provided according to an embodiment of the present disclosure. The embodiment applies to training an image classification model and serves to improve its robustness. The method may be performed by a training apparatus for the image classification model, which may be implemented in hardware and/or software and may generally be integrated in an electronic device.
As shown in fig. 1, a training method for an image classification model according to an embodiment of the present disclosure includes the following specific steps:
s110: and acquiring a training image set in a target image format, and constructing a bypass module for filtering all signals in a key frequency band according to the key frequency band matched with the training image set.
The training image set may refer to a preset image data set used for training the image classification model. The target image format may refer to the image format of each training image in the set; it may be, for example, jpg or bmp, and may generally be defined according to actual service requirements, which the embodiments of the present disclosure do not limit.
The critical frequency band may refer to the characteristic frequency band of most training images in the training image set: when each training image is represented as a frequency-domain signal, the band in which those signals are mainly concentrated may be determined as the critical frequency band. Typically, a critical frequency band is represented by a starting frequency point and a terminating frequency point.
Typically, a training image in a target image format corresponds to a critical frequency band. The bypass module may refer to a preset module for filtering all signals in the critical frequency band.
S120: and identifying at least one low-dimensional image feature extraction layer in the image classification model to be trained, and connecting the bypass module in parallel with the output end of the at least one low-dimensional image feature extraction layer to obtain an improved model.
The low-dimensional image feature extraction layers may refer to the layers of the image classification model that contain shallow semantic information. In general, the model learns shallow semantic information in its first few layers, and the deeper the layer, the higher-level the semantic information it learns. The low-dimensional image feature extraction layers may therefore be set to the first few layers of the model (for example, the first three) according to specific business requirements, which the embodiments of the present disclosure do not limit.
The reason is as follows: the layers of the image classification model that extract high-level semantic information generally focus on the dominant image features of the training images; understood from the frequency-domain perspective, such layers may ignore image features outside the critical frequency band. In this embodiment, to let the model also learn those out-of-band features, the outputs of the low-dimensional feature extraction layers, which still retain features outside the critical frequency band, are connected in parallel with the bypass module, so that out-of-band image features are extracted for training the image classification model.
The improved model may refer to a classification model obtained by connecting the output ends of the low-dimensional image feature extraction layers in parallel with the bypass module in the image classification model to be trained.
S130: and training the improved model by using the training image set, and removing each bypass module connected in parallel in the trained improved model to obtain a target image classification model so as to expand an image format adapted to the target image classification model.
The target image classification model may refer to an image classification model with improved robustness after training.
Thus, in a specific example, suppose the image information of the training images is mainly concentrated in the low band of the critical frequency band matched with the training image set, while the image information of the input images at inference time is mainly concentrated in the high band; the scheme then avoids the poor generalization that would result from the model never having learned the high-frequency information.
According to this technical scheme, a training image set in a target image format is acquired, and a bypass module that filters all signals in the key frequency band is constructed according to the key frequency band matched with the set; then at least one low-dimensional image feature extraction layer is identified in the image classification model to be trained, and the bypass module is connected in parallel to the output of each such layer to obtain an improved model; finally, the improved model is trained with the training image set and every parallel bypass module is removed from the trained model, yielding the target image classification model. This expands the image formats to which the target model is adapted and improves the robustness of the image classification model.
FIG. 2 is a flow chart of another training method for an image classification model provided in accordance with an embodiment of the present disclosure. This embodiment refines the embodiments disclosed above: the operation of constructing a bypass module that filters all signals in the key frequency band according to the key frequency band matched with the training image set is specified as follows: obtain a standard branch module comprising a time-domain-to-frequency-domain unit, a filtering unit and a frequency-domain-to-time-domain unit connected in sequence; determine a defective frequency band from the key frequency band, and set the center frequency and cut-off frequency of the filtering unit according to the defective frequency band, so as to construct a bypass module that filters all signals in the key frequency band.
As shown in fig. 2, a training method for an image classification model according to an embodiment of the present disclosure includes the following specific steps:
s210: a training image set in a target image format is acquired.
S220: and performing time domain to frequency domain conversion on each training image in the training image set, acquiring frequency bands corresponding to each training image respectively, and acquiring a starting frequency point and a stopping frequency point corresponding to each frequency band respectively.
Time-domain-to-frequency-domain processing may refer to the operation of applying a time-frequency conversion to each training image; illustratively, the time-frequency transform (TFT) may be realized by a Fourier transform or the like.
The frequency band of a training image can be understood as the frequency-domain range in which the signal is mainly concentrated after the training image is converted from a time-domain signal to a frequency-domain signal. Each frequency band likewise corresponds to a starting frequency point and a terminating frequency point.
The starting frequency point may refer to a starting point of each frequency band. The termination frequency point may refer to a termination point of each frequency band.
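The per-image band determination of S220 can be sketched as follows. This is a minimal illustration assuming a radial-energy criterion; the disclosure does not fix how the concentration band is measured, and the function name `image_band` and the energy fraction are hypothetical:

```python
import numpy as np

def image_band(img, energy_frac=0.9):
    """Estimate the (start, stop) frequency band in which an image's
    spectral energy is concentrated. Hypothetical criterion: the radial
    band covering the middle `energy_frac` of total spectral energy."""
    spec = np.fft.fftshift(np.fft.fft2(img))  # time -> frequency, DC at center
    power = np.abs(spec) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)  # radial frequency index
    profile = np.bincount(r.ravel(), weights=power.ravel())
    cdf = np.cumsum(profile) / profile.sum()
    lo = (1.0 - energy_frac) / 2
    start = int(np.searchsorted(cdf, lo))
    stop = int(np.searchsorted(cdf, 1.0 - lo))
    return start, stop
```

For a purely low-frequency image the band collapses toward radius 0; each training image thus yields one starting and one terminating frequency point for the clustering step described next.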
S230: clustering processing is carried out on the starting frequency points to obtain a maximum starting frequency point cluster, and clustering processing is carried out on the ending frequency points to obtain a maximum ending frequency point cluster.
The maximum starting frequency point cluster refers to the cluster containing the largest number of starting frequency points; the maximum terminating frequency point cluster refers to the cluster containing the largest number of terminating frequency points.
S240: and determining a key frequency band matched with the training image set according to the clustering center of the maximum starting frequency point cluster and the clustering center of the maximum ending frequency point cluster.
Specifically, clustering is carried out on each initial frequency point, a maximum initial frequency point cluster is obtained, and a cluster center of the maximum initial frequency point cluster is determined; clustering all the termination frequency points to obtain a maximum termination frequency point cluster, and determining a cluster center of the maximum termination frequency point cluster; therefore, the key frequency band matched with the training image set can be determined through the clustering center of the maximum initial frequency point cluster and the clustering center of the maximum termination frequency point cluster.
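The clustering of S230 and S240 can be sketched as below. The disclosure does not name a clustering algorithm, so simple fixed-width bucketing stands in for it here, and the helper names `dominant_center` and `key_band` are hypothetical:

```python
from collections import Counter

def dominant_center(points, tol=2.0):
    """Bucket scalar frequency points into widths of `tol` and return the
    mean of the most populous bucket, a simple stand-in for clustering."""
    buckets = Counter(int(p // tol) for p in points)
    top, _ = buckets.most_common(1)[0]
    members = [p for p in points if int(p // tol) == top]
    return sum(members) / len(members)

def key_band(start_points, stop_points, tol=2.0):
    """Key frequency band: (center of the largest start-point cluster,
    center of the largest stop-point cluster)."""
    return dominant_center(start_points, tol), dominant_center(stop_points, tol)
```

For example, `key_band([1, 1, 2, 9], [30, 31, 31, 50])` takes the dense group of starting points near 1 and the dense group of terminating points near 31, ignoring the outliers 9 and 50.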
S250: and obtaining a standard branching module, wherein the standard branching module comprises a time domain-to-frequency domain unit, a filtering unit and a frequency domain-to-time domain unit which are sequentially connected.
The time-domain-to-frequency-domain unit may refer to a unit that performs time-domain-to-frequency-domain processing on each training image in the training image set. The filtering unit may refer to a unit that performs a filtering operation on an output result of the time-to-frequency domain unit. The frequency domain to time domain unit may refer to a unit performing a frequency domain to time domain operation on an output result of the filtering unit, and may, for example, implement the frequency domain to time domain operation by inverse fast fourier transform (Inverse Fast Fourier Transform, IFFT).
S260: and determining a defective frequency band according to the key frequency band, and setting the center frequency and the cut-off frequency of the filtering unit according to the defective frequency band so as to construct a bypass module for filtering all signals in the key frequency band.
The defective frequency band may refer to the band that the classification model has not learned during testing; it may be, for example, the portion of the full frequency domain outside the critical frequency band.
The center frequency may refer to the midpoint between the two 3 dB points of the filtering unit, usually represented by their arithmetic mean. The cut-off frequency may refer to the frequency at which, with the input amplitude held constant, the output signal falls to 0.707 times (1/√2, i.e. 3 dB below) its maximum value.
Therefore, the defect frequency band is determined through the key frequency band, and the center frequency and the cut-off frequency of the filtering unit in the standard branching module are set according to the defect frequency band, so that a bypass module for filtering all signals in the key frequency band can be constructed, and an effective basis is provided for subsequent operation.
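A minimal sketch of the constructed bypass module, assuming a 2-D FFT as the time-domain-to-frequency-domain unit, an ideal radial band-stop mask over the key band as the filtering unit, and an inverse FFT as the frequency-domain-to-time-domain unit (the ideal mask stands in for the center-frequency/cut-off filter described above; the function name `bypass` is hypothetical):

```python
import numpy as np

def bypass(feature, key_band):
    """Bypass module sketch: FFT -> band-stop over the key band -> IFFT.
    `key_band` is (start, stop) in radial-frequency units; everything
    inside it is zeroed, so only out-of-band components pass through."""
    start, stop = key_band
    spec = np.fft.fftshift(np.fft.fft2(feature))      # time-to-frequency-domain unit
    h, w = feature.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)
    spec[(r >= start) & (r <= stop)] = 0.0            # filtering unit (ideal band-stop)
    return np.fft.ifft2(np.fft.ifftshift(spec)).real  # frequency-to-time-domain unit
```

With the key band removed, the bypass output carries exactly the defective-band information the original branch tends to ignore.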
S270: at least one low-dimensional image feature extraction layer is identified in the image classification model to be trained.
S280: and the output end of each low-dimensional feature extraction layer in the image classification model to be trained is respectively connected with the bypass module and the feature merging module.
The feature merging module is used for merging the output features of each low-dimensional feature extraction layer with the output features of the bypass module accessed by each low-dimensional feature extraction layer and outputting the merged output features.
A schematic structural diagram of the improved model is shown in fig. 3. Specifically, the output of each low-dimensional feature extraction layer on the original branch of the image classification model to be trained is connected to both the bypass module and the feature merging module, and the bypass module is in turn connected to the feature merging module; the features of the original branch and of the bypass module are thus merged, yielding the improved model.
Fig. 4 and 5 are schematic diagrams of bypass modules. Specifically, at least one low-dimensional image feature extraction layer is identified in the image classification model to be trained; then a bypass module, comprising a time-domain-to-frequency-domain unit (fft), a filtering unit and a frequency-domain-to-time-domain unit (ifft) connected in sequence, and a feature merging module are connected to the output of each low-dimensional feature extraction layer. Each low-dimensional feature extraction layer may be a combination of a vector convolution operation (conv) with bn+relu, or may be conv alone.
Therefore, the bypass module and the feature merging module are respectively connected to the output end of each low-dimensional feature extraction layer in the image classification model to be trained, so that an improved model can be obtained, and an effective basis is provided for subsequent operation.
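The wiring of the improved model can be sketched as below, with elementwise addition assumed as the feature merging rule (the disclosure leaves the exact merge operation open) and `bypass_fn` standing for the bypass module:

```python
import numpy as np

def improved_forward(x, low_dim_layers, bypass_fn, merge=np.add):
    """Forward pass of the improved model: each low-dimensional feature
    extraction layer feeds both the main branch and the bypass module,
    and the feature merging module (elementwise addition here) combines
    the two before the next layer consumes the result."""
    for layer in low_dim_layers:
        y = layer(x)                 # original-branch features
        x = merge(y, bypass_fn(y))   # merge original + bypass branches
    return x
```

After training, removing the bypass modules as in S130 is equivalent to running only `layer(x)` at each step, so the deployed model keeps its original structure.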
S290: the current training image in the training image set is input into the improved model.
S2100: Low-dimensional time domain features are extracted from the input image features by the low-dimensional feature extraction layer, and the extracted low-dimensional time domain features are input into the connected bypass module and feature merging module, respectively.
The low-dimensional time domain features may refer to the representation of the input image features in the low-dimensional time domain.
S2110: and performing time domain to frequency domain processing on the input low-dimensional time domain features through a time-frequency domain conversion unit in the bypass module to obtain low-dimensional frequency domain features.
The low-dimensional frequency domain feature may refer to a feature obtained by processing a low-dimensional time domain feature from a time domain to a frequency domain.
Specifically, take the layers of the low-dimensional time domain features to be {L1, L2, …, Ln}. For a fixed layer Ln, take each of its output channels to obtain {Ln1, Ln2, …, Lnn}, and perform time-domain-to-frequency-domain processing on each channel to obtain {TFT(Ln1), TFT(Ln2), …, TFT(Lnn)}. A SHIFT operation may also be performed, such as shifting the low-frequency component to the center; defining this operation as TFT_SHIFT gives TFT_SHIFT{TFT(Ln1)}, …, TFT_SHIFT{TFT(Lnn)}, and TFT_SHIFT{TFT(Ln1)} may be denoted An1. Performing time-domain-to-frequency-domain processing on each layer and each output channel in turn yields the frequency bands {A1, …, An} corresponding to the low-dimensional time domain features, where An contains the time-domain-to-frequency-domain results {An1, …, Ann} of each channel of the n-th layer.
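As an illustration only, the TFT and SHIFT operations described above can be sketched in NumPy, assuming each layer's output is a stack of 2-D per-channel feature maps; the function name, array shapes and sizes are illustrative, not part of the disclosed design:

```python
import numpy as np

def tft_shift(features):
    """Per-channel 2-D FFT with the low-frequency component shifted to the center."""
    return np.array([np.fft.fftshift(np.fft.fft2(ch)) for ch in features])

# 4 output channels of one layer, each an 8x8 feature map (illustrative sizes);
# `spectrum` then plays the role of {An1, ..., Ann} for this layer.
features = np.random.rand(4, 8, 8)
spectrum = tft_shift(features)
```

After `fftshift`, the DC (zero-frequency) bin sits at the center of each map, matching the "low frequency bit shifted to the center" convention used in the text.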
S2120: and filtering the input low-dimensional frequency domain features through a filtering unit in the bypass module to obtain a frequency domain filtering result.
The frequency domain filtering result may refer to a result obtained by filtering the low-dimensional frequency domain feature.
Specifically, taking as an example a training image set whose key frequency band is the low band, the spectrum of An1 can be zeroed out in the region outside half of the central area, thereby realizing the filtering processing and obtaining the frequency domain filtering result Bn1 corresponding to An1.
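A minimal sketch of this filtering step, assuming a center-shifted spectrum and a rectangular central keep-region of half the map size (the `keep_ratio` parameter and function name are assumptions for illustration):

```python
import numpy as np

def low_pass(spectrum_channel, keep_ratio=0.5):
    """Zero the shifted spectrum outside the central region,
    keeping only the central (low-frequency) half by default."""
    h, w = spectrum_channel.shape
    ch, cw = h // 2, w // 2
    dh, dw = int(h * keep_ratio / 2), int(w * keep_ratio / 2)
    mask = np.zeros((h, w), dtype=bool)
    mask[ch - dh:ch + dh, cw - dw:cw + dw] = True
    return np.where(mask, spectrum_channel, 0)

# Toy spectrum of all ones: only the central 4x4 window survives filtering
filtered = low_pass(np.ones((8, 8)))
```

In this toy setting `filtered` plays the role of Bn1 obtained from An1.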
S2130: and performing frequency domain-time domain conversion processing on the input frequency domain filtering result through a frequency domain-time domain conversion unit in the bypass module to obtain a time domain filtering characteristic.
The time domain filtering feature may refer to a result obtained by performing a frequency domain to time domain processing on a frequency domain filtering result.
Specifically, the frequency domain filtering result Bn1 corresponding to An1 may be converted from the frequency domain back to the time domain by the inverse time-frequency transform, giving the time domain filtering feature corresponding to Bn1, which may be defined as ITFT(Bn1) = Mn1. Performing the frequency-domain-to-time-domain processing on each frequency domain filtering result in turn yields the time domain filtering features {M1, …, Mn}, where Mn contains the time domain filtering features of the n-th layer.
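The ITFT step can be sketched as the exact inverse of the shift-then-FFT sequence; when no filtering is applied, the round trip recovers the original feature map (names are illustrative):

```python
import numpy as np

def itft(filtered_spectrum):
    """Undo the center shift, invert the 2-D FFT, and keep the real part."""
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered_spectrum)))

# Round trip without filtering recovers the original feature map
x = np.random.rand(8, 8)
recovered = itft(np.fft.fftshift(np.fft.fft2(x)))
```

Taking the real part discards the vanishingly small imaginary residue that numerical inversion leaves on real-valued inputs.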
S2140: and carrying out feature combination on the input low-dimensional time domain features and the time domain filtering features through the feature combination module, and outputting the combined features.
Specifically, the current training image in the training image set is input into the improved model, low-dimensional time domain features are extracted from the input image features by the low-dimensional feature extraction layer, and the extracted low-dimensional time domain features are input into the connected bypass module and feature merging module, respectively. The time-frequency domain conversion unit in the bypass module then converts the input low-dimensional time domain features from the time domain to the frequency domain to obtain low-dimensional frequency domain features; the filtering unit in the bypass module filters the input low-dimensional frequency domain features to obtain a frequency domain filtering result; and the frequency-domain-to-time-domain unit in the bypass module converts the filtering result back to the time domain to obtain the time domain filtering features. Finally, the feature merging module merges the input low-dimensional time domain features with the time domain filtering features and outputs the merged features, so that training of the improved model is realized.
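The full bypass branch and the merging step can be sketched end to end; this sketch assumes the merge is an element-wise addition of the two branches (as Fig. 6's description suggests) and that the central half of the shifted spectrum is kept — both are illustrative assumptions:

```python
import numpy as np

def bypass_forward(features, keep_ratio=0.5):
    """Bypass branch: fft + shift -> central low-pass -> inverse shift + ifft, per channel."""
    out = []
    for ch_map in features:
        spec = np.fft.fftshift(np.fft.fft2(ch_map))
        h, w = spec.shape
        dh, dw = int(h * keep_ratio / 2), int(w * keep_ratio / 2)
        mask = np.zeros((h, w), dtype=bool)
        mask[h // 2 - dh:h // 2 + dh, w // 2 - dw:w // 2 + dw] = True
        out.append(np.real(np.fft.ifft2(np.fft.ifftshift(np.where(mask, spec, 0)))))
    return np.array(out)

# Feature merging sketched as element-wise addition of the two branches
features = np.random.rand(2, 8, 8)
merged = features + bypass_forward(features)
```

With `keep_ratio=1.0` the mask covers the whole spectrum and the bypass branch becomes an identity, which is a convenient sanity check on the transform chain.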
S2150: and removing all bypass modules and all feature merging modules which are connected in parallel in the improved model after training is finished to obtain a target image classification model so as to expand the image format adapted to the target image classification model.
Specifically, after the improved model is trained, each bypass module and each feature merging module which are connected in parallel in the improved model are removed, so that the integrity of the original classification model can be ensured.
According to the technical scheme of this embodiment, each training image in the training image set is converted from the time domain to the frequency domain to obtain the frequency band corresponding to each training image, and the start and stop frequency points of each band are acquired. The start frequency points are clustered to obtain a maximum start-frequency-point cluster, and the stop frequency points are clustered to obtain a maximum stop-frequency-point cluster; the key frequency band matched with the training image set is then determined from the cluster centers of these two maximum clusters. A standard branching module is acquired, a defective frequency band is determined according to the key frequency band, and a bypass module for filtering all signals in the key frequency band is constructed accordingly. At least one low-dimensional image feature extraction layer is then identified in the image classification model to be trained, and the bypass module and a feature merging module are connected to the output end of each low-dimensional feature extraction layer. Finally, the current training image in the training image set is input into the improved model: the low-dimensional feature extraction layer extracts low-dimensional time domain features and feeds them to the connected bypass module and feature merging module; the time-frequency domain conversion unit in the bypass module converts the features to the frequency domain to obtain low-dimensional frequency domain features; the filtering unit filters them to obtain a frequency domain filtering result; the frequency-domain-to-time-domain unit converts the result back to the time domain to obtain the time domain filtering features; and the feature merging module merges the low-dimensional time domain features with the time domain filtering features and outputs the result. After training, each bypass module and each feature merging module connected in parallel in the improved model are removed to obtain the target image classification model, which expands the image formats adapted to the target image classification model and improves its robustness.
FIG. 6 is a flow chart of an alternative training method for an image classification model provided according to an embodiment of the present disclosure. Specifically, the current training image in the training image set is input into the improved model, and low-dimensional time domain features are extracted from the input image features by the low-dimensional feature extraction layer to obtain the layer to be frequency-domain filtered. The bypass module then performs the time-domain-to-frequency-domain and filtering processing on the input low-dimensional time domain features, converts the frequency domain filtering result back to the time domain, and the time domain filtering features are added to the input low-dimensional time domain features. Iterative training then continues; for example, alternating training may be adopted, such as training the first 5 batches with the bypass branches and the next 5 batches with the original model structure without the side branches, until training converges to the required level. Afterwards, each bypass module and each feature merging module connected in parallel in the trained improved model are removed to obtain the target image classification model.
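The alternating schedule described above can be captured by a tiny helper that decides, per batch, whether the bypass branches are active; the 5-batch block size matches the example in the text, while the function name is an assumption:

```python
def use_bypass(batch_index, block=5):
    """True during the first `block` batches of every 2*block-batch cycle:
    bypass-branch training, then original-structure training, alternating."""
    return (batch_index // block) % 2 == 0

# First 5 batches with bypass, next 5 without, then repeat
schedule = [use_bypass(i) for i in range(12)]
```

In a real training loop this flag would select between the improved model's forward pass and the original branch alone for each batch.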
Fig. 7 is a flowchart of an image classification method provided according to an embodiment of the present disclosure. The embodiment of the disclosure can be applied to the case of generating an output result of an image classification model. The method may be performed by an image classification device, which may be implemented in hardware and/or software, and may be generally integrated in an electronic device.
As shown in fig. 7, an image classification method provided in an embodiment of the present disclosure includes the following specific steps:
S310: Acquiring an image to be classified in a first image format, and inputting the image to be classified into a pre-trained target image classification model.
The target image classification model is obtained by performing model training on the improved model using a training image set in a second image format, and the improved model is obtained by connecting in parallel a bypass module that is constructed according to a key frequency band matched with the training image set and is used for filtering all signals in the key frequency band.
The first image format may refer to an image format of an image to be classified when the target image classification model is tested. The second image format may refer to an image format of the training image when training the target image classification model. The first image format may be the same as or different from the second image format, which is not limited by the embodiments of the present disclosure.
It is noted that the embodiments of the present disclosure are particularly applicable to the case where the first image format differs from the second image format. Specifically, if the second image format is jpg, then after a prior-art classification model is trained with the jpg-format training image set, its accuracy drops sharply when an image to be classified in bmp format is input. However, if the improved model is trained with the jpg-format training image set to obtain the target image classification model, and the bmp-format image to be classified is then input into the pre-trained target image classification model, the accuracy of the model's output can be maintained.
S320: and obtaining an image classification result which is output by the target image classification model and matched with the image to be classified.
According to the technical scheme, the image to be classified in the first image format is obtained, the image to be classified is input into the pre-trained target image classification model, and then the image classification result which is output by the target image classification model and matched with the image to be classified is obtained, so that the image classification result corresponding to the image to be classified in the first image format can be obtained quickly and accurately.
As an implementation of the above-mentioned training method of each image classification model, the present disclosure further provides an optional embodiment of an execution apparatus that implements the above-mentioned training method of each image classification model.
FIG. 8 is a schematic structural diagram of a training device for an image classification model according to an embodiment of the present disclosure; as shown in fig. 8, the training apparatus of the image classification model includes: a bypass module construction module 410, an improvement model construction module 420, and a target model acquisition module 430;
the bypass module construction module 410 is configured to acquire a training image set in a target image format, and construct a bypass module for filtering all signals in a key frequency band according to the key frequency band matched with the training image set;
An improved model construction module 420, configured to identify at least one low-dimensional image feature extraction layer in an image classification model to be trained, and connect the bypass module in parallel with an output end of the at least one low-dimensional image feature extraction layer, so as to obtain an improved model;
and the target model obtaining module 430 is configured to train the improved model by using the training image set, and remove each bypass module connected in parallel in the improved model after training, so as to obtain a target image classification model, so as to expand an image format adapted to the target image classification model.
According to the technical scheme, a training image set in a target image format is obtained, and a bypass module for filtering all signals in a key frequency band is constructed according to the key frequency band matched with the training image set; further, in the image classification model to be trained, at least one low-dimensional image feature extraction layer is identified, and a bypass module is connected in parallel with the output end of the at least one low-dimensional image feature extraction layer to obtain an improved model; finally, training the improved model by using the training image set, removing each bypass module connected in parallel in the trained improved model to obtain the target image classification model, expanding the image format adapted to the target image classification model, and improving the robustness of the image classification model.
Optionally, the training device of the image classification model may further include:
the key frequency band acquisition module is used for carrying out time domain to frequency domain conversion on each training image in the training image set before constructing a bypass module for filtering all signals in the key frequency band according to the key frequency band matched with the training image set, acquiring frequency bands respectively corresponding to each training image, and acquiring a starting frequency point and a stopping frequency point respectively corresponding to each frequency band; clustering is carried out on each initial frequency point to obtain a maximum initial frequency point cluster, and clustering is carried out on each termination frequency point to obtain a maximum termination frequency point cluster; and determining a key frequency band matched with the training image set according to the clustering center of the maximum starting frequency point cluster and the clustering center of the maximum ending frequency point cluster.
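As an illustrative stand-in for the clustering step performed by the key frequency band acquisition module, a simple 1-D binning scheme can find the most populated cluster of start and stop frequency points and take its mean as the cluster center; the bin width `tol` and all data values are assumptions for the sketch:

```python
import numpy as np
from collections import Counter

def largest_cluster_center(points, tol=1.0):
    """Group 1-D frequency points into bins of width `tol`; return the mean
    of the most populated bin (a toy stand-in for the clustering step)."""
    bins = np.floor(np.asarray(points) / tol).astype(int)
    top = Counter(bins.tolist()).most_common(1)[0][0]
    return float(np.asarray(points)[bins == top].mean())

# Start/stop frequency points of each training image's band (illustrative values):
# three images agree on roughly [0.15, 3.05]; one outlier image is ignored.
starts = [0.10, 0.20, 0.15, 5.0]
stops = [3.0, 3.1, 2.9, 9.0]
key_band = (largest_cluster_center(starts), largest_cluster_center(stops))
```

A production implementation would more likely use a proper clustering algorithm (e.g. k-means) over the frequency points, but the principle — key band bounded by the centers of the largest start and stop clusters — is the same.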
Optionally, the bypass module construction module 410 may be specifically configured to:
the method comprises the steps of obtaining a standard branching module, wherein the standard branching module comprises a time domain-to-frequency domain unit, a filtering unit and a frequency domain-to-time domain unit which are sequentially connected;
and determining a defective frequency band according to the key frequency band, and setting the center frequency and the cut-off frequency of the filtering unit according to the defective frequency band so as to construct a bypass module for filtering all signals in the key frequency band.
Optionally, the improved model building module 420 may specifically be configured to:
the output end of each low-dimensional feature extraction layer in the image classification model to be trained is respectively connected with the bypass module and the feature merging module;
the feature merging module is used for merging the output features of each low-dimensional feature extraction layer with the output features of the bypass module accessed by each low-dimensional feature extraction layer and outputting the merged output features;
the object model obtaining module 430 may specifically be configured to: and removing each bypass module and each characteristic merging module which are connected in parallel in the improved model after training.
Optionally, the object model obtaining module 430 may specifically be configured to:
inputting a current training image in the training image set into the improved model;
extracting low-dimensional time domain features of the input image features through the low-dimensional feature extraction layer, and respectively inputting the extracted low-dimensional time domain features into the accessed bypass module and the feature merging module;
performing time domain to frequency domain processing on the input low-dimensional time domain features through a time-frequency domain conversion unit in the bypass module to obtain low-dimensional frequency domain features;
filtering the input low-dimensional frequency domain features through a filtering unit in the bypass module to obtain a frequency domain filtering result;
Performing frequency domain-to-time domain processing on the input frequency domain filtering result through a frequency domain-to-time domain unit in the bypass module to obtain a time domain filtering characteristic;
and carrying out feature combination on the input low-dimensional time domain features and the time domain filtering features through the feature combination module, and outputting the combined features.
The product can execute the method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of executing the method.
As an implementation of the above-mentioned image classification methods, the present disclosure also provides an optional embodiment of an execution apparatus that implements the above-mentioned image classification methods.
Fig. 9 is a schematic structural view of an image classification apparatus according to an embodiment of the present disclosure; as shown in fig. 9, the image classification apparatus includes: an image input module 510 and a result acquisition module 520;
the image input module 510 is configured to obtain an image to be classified in a first image format, and input the image to be classified into a pre-trained target image classification model; the target image classification model is obtained after model training is carried out on an improved model by using a training image set in a second image format, and the improved model is obtained by constructing a bypass module which is constructed according to a key frequency band matched with the training image set and is used for filtering all signals in the key frequency band;
And the result obtaining module 520 is configured to obtain an image classification result that is output by the target image classification model and matches the image to be classified.
According to the technical scheme, the image to be classified in the first image format is obtained, the image to be classified is input into the pre-trained target image classification model, and then the image classification result which is output by the target image classification model and matched with the image to be classified is obtained, so that the image classification result corresponding to the image to be classified in the first image format can be obtained quickly and accurately.
The product can execute the method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of executing the method.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the user's personal information comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 10 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, for example, the training method of an image classification model or the image classification method. For example, in some embodiments, the training method of the image classification model or the image classification method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the training method of the image classification model or the image classification method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the training method of the image classification model or the image classification method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above can be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that, when executed by the processor or controller, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system, overcoming the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS services. The server may also be a server of a distributed system, or a server combined with a blockchain.
Artificial intelligence is the discipline of studying how to make a computer mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning), covering both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Cloud computing refers to a technical system in which an elastically extensible pool of shared physical or virtual resources is accessed through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices and the like, and can be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capabilities for technical applications such as artificial intelligence and blockchain, as well as for model training.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions provided by the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (14)
1. A method of training an image classification model, comprising:
acquiring a training image set in a target image format, and constructing, according to a key frequency band matched with the training image set, a bypass module for filtering out all signals within the key frequency band; wherein the key frequency band is the frequency band in which the frequency-domain signals corresponding to the training images in the training image set are concentrated, and the bypass module is a preset module that filters out all signals within the key frequency band and outputs image features outside the key frequency band;
identifying at least one low-dimensional image feature extraction layer in an image classification model to be trained, and connecting the bypass module in parallel with the output end of the at least one low-dimensional image feature extraction layer to obtain an improved model;
and training the improved model by using the training image set, and removing each bypass module connected in parallel in the trained improved model to obtain a target image classification model so as to expand an image format adapted to the target image classification model.
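The attach-train-strip recipe of this claim can be sketched with toy stand-ins. Everything below is an illustrative assumption rather than the claimed implementation: the "layer" and "bypass" are plain functions, and the parallel connection merges by addition, which the claim does not fix.

```python
def make_improved(layer, bypass):
    """Parallel-connect `bypass` at `layer`'s output; merge by addition (assumed)."""
    def improved(x):
        feat = layer(x)
        return feat + bypass(feat)   # layer output merged with bypass output
    return improved

def deploy(layer):
    """After training, the bypass is removed: deploy the bare original layer."""
    return layer

layer = lambda x: 2.0 * x            # stand-in low-dimensional feature extraction layer
bypass = lambda f: 0.5 * f           # stand-in band-stop bypass branch
improved = make_improved(layer, bypass)

assert improved(1.0) == 3.0          # 2.0 + 0.5 * 2.0 during training
assert deploy(layer)(1.0) == 2.0     # bypass stripped at inference time
```

The point of the sketch is structural: the bypass only shapes what the layer learns during training, so removing it afterwards leaves a model with the original architecture.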
2. The method of claim 1, further comprising, prior to constructing a bypass module for filtering all signals within a critical frequency band based on the critical frequency band matching the training image set:
performing time-domain-to-frequency-domain conversion on each training image in the training image set, acquiring the frequency band corresponding to each training image, and acquiring the starting frequency point and ending frequency point of each frequency band;
clustering the starting frequency points to obtain a largest starting-frequency-point cluster, and clustering the ending frequency points to obtain a largest ending-frequency-point cluster;
and determining the key frequency band matched with the training image set according to the cluster center of the largest starting-frequency-point cluster and the cluster center of the largest ending-frequency-point cluster.
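The band-extraction and clustering steps above might look as follows; this is a minimal illustration only, assuming a 2-D FFT as the transform, a radial-energy quantile as the per-image band, and a histogram-mode grouping as the clustering step — none of which is fixed by the claim, and every function name is hypothetical.

```python
import numpy as np

def band_of_image(img, energy_fraction=0.90):
    """Return (start, stop) radial-frequency bins holding `energy_fraction` of spectral energy."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = spec.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)      # radial frequency bin per pixel
    energy = np.bincount(r.ravel(), weights=spec.ravel())   # spectral energy per radius
    cum = np.cumsum(energy) / energy.sum()
    start = int(np.searchsorted(cum, (1 - energy_fraction) / 2))
    stop = int(np.searchsorted(cum, 1 - (1 - energy_fraction) / 2))
    return start, stop

def key_band(images):
    """Cluster per-image start/stop points; return the largest cluster's center for each."""
    starts, stops = zip(*(band_of_image(im) for im in images))
    def mode_center(points):
        # 1-D grouping by histogram mode, a stand-in for the unspecified clustering
        hist, edges = np.histogram(points, bins=8)
        k = int(np.argmax(hist))                            # largest cluster
        members = [p for p in points if edges[k] <= p <= edges[k + 1]]
        return float(np.mean(members))                      # cluster center
    return mode_center(starts), mode_center(stops)

rng = np.random.default_rng(0)
imgs = [rng.standard_normal((32, 32)) for _ in range(4)]
lo, hi = key_band(imgs)
```

On real training images the start/stop points of each image would cluster tightly, and the two cluster centers bound the key frequency band used to configure the filter.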
3. The method of claim 1, wherein constructing a bypass module for filtering all signals within a critical frequency band according to the critical frequency band matched with the training image set, comprises:
acquiring a standard bypass module, wherein the standard bypass module comprises a time-domain-to-frequency-domain unit, a filtering unit, and a frequency-domain-to-time-domain unit connected in sequence;
and determining a stop band (the band to be filtered out) according to the key frequency band, and setting the center frequency and cut-off frequencies of the filtering unit according to the stop band, so as to construct a bypass module for filtering out all signals within the key frequency band.
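A minimal sketch of such a three-unit bypass module, assuming a 2-D FFT as the time-domain-to-frequency-domain unit and a radial band-stop mask as the filtering unit (the claim specifies neither the transform nor the filter family, and the name `bypass` is illustrative):

```python
import numpy as np

def bypass(features, band=(4, 10)):
    """Suppress all radial frequencies inside `band` (the key band); pass the rest."""
    spec = np.fft.fftshift(np.fft.fft2(features))           # time-domain -> frequency-domain unit
    h, w = spec.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)
    stop = (r >= band[0]) & (r <= band[1])                  # filtering unit: band-stop mask
    spec[stop] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))    # frequency-domain -> time-domain unit

x = np.random.default_rng(1).standard_normal((16, 16))
y = bypass(x)
```

The band `(4, 10)` stands in for the stop band derived from the key frequency band; in the claimed construction it would be set via the filter's center frequency and cut-off frequencies.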
4. The method of claim 1, wherein the bypass module is connected in parallel at an output of the at least one low-dimensional image feature extraction layer to obtain an improved model, comprising:
connecting the output end of each low-dimensional image feature extraction layer in the image classification model to be trained to the bypass module and to a feature merging module, respectively;
wherein the feature merging module is configured to merge the output features of each low-dimensional image feature extraction layer with the output features of the bypass module connected to that layer, and to output the merged features;
and wherein removing each bypass module connected in parallel in the improved model after training comprises:
removing each bypass module and each feature merging module connected in parallel in the improved model after training.
5. The method of claim 4, wherein training the improved model using the training image set comprises:
inputting a current training image in the training image set into the improved model;
extracting low-dimensional time-domain features from the input image features through the low-dimensional image feature extraction layer, and inputting the extracted low-dimensional time-domain features into the connected bypass module and feature merging module, respectively;
performing time-domain-to-frequency-domain processing on the input low-dimensional time-domain features through the time-domain-to-frequency-domain unit in the bypass module to obtain low-dimensional frequency-domain features;
filtering the input low-dimensional frequency-domain features through the filtering unit in the bypass module to obtain a frequency-domain filtering result;
performing frequency-domain-to-time-domain processing on the input frequency-domain filtering result through the frequency-domain-to-time-domain unit in the bypass module to obtain time-domain filtered features;
and merging the input low-dimensional time-domain features with the time-domain filtered features through the feature merging module, and outputting the merged features.
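The per-step forward pass above can be sketched end to end for a single layer. This is an illustration under stated assumptions only: a 3×3 blur stands in for the low-dimensional feature extraction layer, a radial FFT band-stop stands in for the bypass module, and the feature merging module is assumed to be addition.

```python
import numpy as np

def bandstop(x, band=(4, 10)):
    """Bypass branch: FFT, zero the key band, inverse FFT."""
    spec = np.fft.fftshift(np.fft.fft2(x))
    h, w = spec.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)
    spec[(r >= band[0]) & (r <= band[1])] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))

def conv_layer(x):
    """Stand-in low-dimensional feature extraction layer: a 3x3 mean blur."""
    k = np.ones((3, 3)) / 9.0
    h, w = x.shape
    pad = np.pad(x, 1)
    return np.array([[np.sum(pad[i:i + 3, j:j + 3] * k) for j in range(w)]
                     for i in range(h)])

def improved_layer(x):
    feat = conv_layer(x)        # low-dimensional time-domain features
    filtered = bandstop(feat)   # bypass: time->frequency, filter, frequency->time
    return feat + filtered      # feature merging module (assumed: elementwise sum)

x = np.random.default_rng(2).standard_normal((16, 16))
out = improved_layer(x)
```

During training the merged features flow onward to the higher layers of the model; at deployment only `conv_layer` remains.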
6. An image classification method, comprising:
acquiring an image to be classified in a first image format, and inputting the image to be classified into a pre-trained target image classification model;
wherein the pre-trained target image classification model is trained according to the method of any one of claims 1-5;
and acquiring an image classification result output by the target image classification model and matched with the image to be classified.
7. A training apparatus for an image classification model, comprising:
a bypass module construction module, configured to acquire a training image set in a target image format and to construct, according to a key frequency band matched with the training image set, a bypass module for filtering out all signals within the key frequency band; wherein the key frequency band is the frequency band in which the frequency-domain signals corresponding to the training images in the training image set are concentrated, and the bypass module is a preset module that filters out all signals within the key frequency band and outputs image features outside the key frequency band;
an improved model construction module, configured to identify at least one low-dimensional image feature extraction layer in the image classification model to be trained, and to connect the bypass module in parallel with the output end of the at least one low-dimensional image feature extraction layer to obtain an improved model;
and a target model acquisition module, configured to train the improved model using the training image set, and to remove each bypass module connected in parallel in the trained improved model to obtain a target image classification model, so as to expand the image formats adapted to the target image classification model.
8. The apparatus of claim 7, further comprising:
a key frequency band acquisition module, configured to, before the bypass module for filtering out all signals within the key frequency band is constructed according to the key frequency band matched with the training image set: perform time-domain-to-frequency-domain conversion on each training image in the training image set, acquire the frequency band corresponding to each training image, and acquire the starting frequency point and ending frequency point of each frequency band;
cluster the starting frequency points to obtain a largest starting-frequency-point cluster, and cluster the ending frequency points to obtain a largest ending-frequency-point cluster;
and determine the key frequency band matched with the training image set according to the cluster center of the largest starting-frequency-point cluster and the cluster center of the largest ending-frequency-point cluster.
9. The apparatus of claim 7, wherein the bypass module construction module is specifically configured to:
acquire a standard bypass module, wherein the standard bypass module comprises a time-domain-to-frequency-domain unit, a filtering unit, and a frequency-domain-to-time-domain unit connected in sequence;
and determine a stop band (the band to be filtered out) according to the key frequency band, and set the center frequency and cut-off frequencies of the filtering unit according to the stop band, so as to construct a bypass module for filtering out all signals within the key frequency band.
10. The apparatus of claim 7, wherein the improved model construction module is specifically configured to:
connect the output end of each low-dimensional image feature extraction layer in the image classification model to be trained to the bypass module and to a feature merging module, respectively;
wherein the feature merging module is configured to merge the output features of each low-dimensional image feature extraction layer with the output features of the bypass module connected to that layer, and to output the merged features;
the target model acquisition module is specifically configured to: remove each bypass module and each feature merging module connected in parallel in the improved model after training.
11. The apparatus of claim 10, wherein the target model acquisition module is specifically configured to:
inputting a current training image in the training image set into the improved model;
extracting low-dimensional time-domain features from the input image features through the low-dimensional image feature extraction layer, and inputting the extracted low-dimensional time-domain features into the connected bypass module and feature merging module, respectively;
performing time-domain-to-frequency-domain processing on the input low-dimensional time-domain features through the time-domain-to-frequency-domain unit in the bypass module to obtain low-dimensional frequency-domain features;
filtering the input low-dimensional frequency-domain features through the filtering unit in the bypass module to obtain a frequency-domain filtering result;
performing frequency-domain-to-time-domain processing on the input frequency-domain filtering result through the frequency-domain-to-time-domain unit in the bypass module to obtain time-domain filtered features;
and merging the input low-dimensional time-domain features with the time-domain filtered features through the feature merging module, and outputting the merged features.
12. An image classification apparatus comprising:
the image input module is used for acquiring an image to be classified in a first image format and inputting the image to be classified into a pre-trained target image classification model; wherein the pre-trained target image classification model is trained according to the method of any one of claims 1-5;
and the result acquisition module is used for acquiring an image classification result which is output by the target image classification model and matched with the image to be classified.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5 or the method of claim 6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5 or the method of claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211202259.5A CN115410048B (en) | 2022-09-29 | 2022-09-29 | Training of image classification model, image classification method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115410048A CN115410048A (en) | 2022-11-29 |
CN115410048B true CN115410048B (en) | 2024-03-19 |
Family
ID=84167042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211202259.5A Active CN115410048B (en) | 2022-09-29 | 2022-09-29 | Training of image classification model, image classification method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115410048B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118570609B (en) * | 2024-07-11 | 2025-02-25 | 北京中科戎威科技有限责任公司 | Image recognition method, device, equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020258913A1 (en) * | 2019-06-24 | 2020-12-30 | 华为技术有限公司 | Method and device for training classification model |
WO2021082743A1 (en) * | 2019-10-31 | 2021-05-06 | 北京金山云网络技术有限公司 | Video classification method and apparatus, and electronic device |
CN113177451A (en) * | 2021-04-21 | 2021-07-27 | 北京百度网讯科技有限公司 | Training method and device of image processing model, electronic equipment and storage medium |
CN113449784A (en) * | 2021-06-18 | 2021-09-28 | 宜通世纪科技股份有限公司 | Image multi-classification method, device, equipment and medium based on prior attribute map |
WO2021248733A1 (en) * | 2020-06-12 | 2021-12-16 | 浙江大学 | Live face detection system applying two-branch three-dimensional convolutional model, terminal and storage medium |
CN113963193A (en) * | 2021-09-22 | 2022-01-21 | 高新兴科技集团股份有限公司 | Method and device for generating vehicle body color classification model and storage medium |
CN114693970A (en) * | 2022-03-28 | 2022-07-01 | 北京百度网讯科技有限公司 | Object classification method, deep learning model training method, device and equipment |
CN114863162A (en) * | 2022-03-28 | 2022-08-05 | 北京百度网讯科技有限公司 | Object classification method, training method, apparatus and device for deep learning model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180330205A1 (en) * | 2017-05-15 | 2018-11-15 | Siemens Aktiengesellschaft | Domain adaptation and fusion using weakly supervised target-irrelevant data |
Non-Patent Citations (2)
Title |
---|
"Global Filter Networks for Image Classification";Yongming Rao等;《arXiv:2107.00645v2》;全文 * |
"基于BP神经网络和中值滤波的地质雷达图像分类";孙白等;《大连海事大学学报》;第33卷(第z1期);全文 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113657465B (en) | Pre-training model generation method and device, electronic equipment and storage medium | |
CN113657483A (en) | Model training method, target detection method, device, equipment and storage medium | |
CN112580733B (en) | Classification model training method, device, equipment and storage medium | |
JP2022003388A (en) | Method, device, apparatus and storage medium for testing response speed of on-vehicle apparatus | |
CN112634880B (en) | Method, apparatus, device, storage medium and program product for speaker identification | |
CN116705018A (en) | Voice control method, voice control device, electronic equipment and readable storage medium | |
KR20220116395A (en) | Method and apparatus for determining pre-training model, electronic device and storage medium | |
CN117671409B (en) | Sample generation, model training and image processing methods, devices, equipment and media | |
US20240127406A1 (en) | Image quality adjustment method and apparatus, device, and medium | |
CN113627536A (en) | Model training method, video classification method, device, equipment and storage medium | |
CN115410048B (en) | Training of image classification model, image classification method, device, equipment and medium | |
CN115457329B (en) | Training method of image classification model, image classification method and device | |
CN114120180B (en) | Time sequence nomination generation method, device, equipment and medium | |
CN117609870B (en) | Structure recognition model training, model structure recognition method, device and medium | |
CN111563591B (en) | Super network training method and device | |
CN112863548A (en) | Method for training audio detection model, audio detection method and device thereof | |
CN115660049B (en) | Model processing method, device, electronic equipment and storage medium | |
CN115186738B (en) | Model training method, device and storage medium | |
CN113920987B (en) | Voice recognition method, device, equipment and storage medium | |
CN112817463B (en) | Method, device and storage medium for acquiring audio data by input method | |
CN115985294A (en) | Sound event detection method, device, equipment and storage medium | |
CN115035911A (en) | Noise generation model training method, device, equipment and medium | |
CN112990327A (en) | Feature fusion method, device, apparatus, storage medium, and program product | |
CN113220933A (en) | Method and device for classifying audio segments and electronic equipment | |
CN114512136B (en) | Model training method, audio processing method, device, equipment, storage medium and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||