Image Processing Method, Image Processing Device and Storage Medium

This application claims priority to Chinese Patent Application No. 201910935208.5, filed on September 29, 2019, the entire disclosure of which is incorporated herein by reference as part of the present disclosure.

Technical Field

Embodiments of the present disclosure relate to an image processing method, an image processing device, and a storage medium.

Background

Image classification refers to automatically assigning an input image to one of a set of predefined categories according to certain classification rules. For example, based on the semantic information contained in an image, the input image can be subjected to object classification, scene classification, and the like. For instance, a preset target object contained in the input image can be recognized, and the image can be classified according to the recognized object. As another example, images with similar content can be grouped into the same category according to the semantic information in the input images.

Summary
At least one embodiment of the present disclosure provides an image processing device, including: a deep feature extractor configured to obtain a deep feature of an image to be recognized, the image to be recognized being a medical image; an expert feature extractor configured to obtain an expert feature of the image to be recognized; a fusion processor configured to fuse the deep feature and the expert feature to obtain a fused feature of the image to be recognized; and a classification processor configured to classify the image to be recognized according to the fused feature of the image to be recognized.

For example, the image processing device provided by at least one embodiment of the present disclosure further includes an unsupervised feature extractor configured to obtain an unsupervised feature of the image to be recognized; the fusion processor is further configured to fuse the deep feature, the expert feature, and the unsupervised feature to obtain the fused feature of the image to be recognized.

For example, in the image processing device provided by at least one embodiment of the present disclosure, the deep feature extractor is further configured to obtain the deep feature of the image to be recognized by using a deep neural network.

For example, in the image processing device provided by at least one embodiment of the present disclosure, the expert feature extractor is further configured to extract the expert feature of the image to be recognized based on empirical formulas, rules, and feature values obtained from medical image data.

For example, in the image processing device provided by at least one embodiment of the present disclosure, the category of the expert feature includes at least one of statistical, morphological, time-domain, and frequency-domain features.

For example, in the image processing device provided by at least one embodiment of the present disclosure, the unsupervised feature extractor is trained, before being used to acquire the unsupervised feature of the image to be recognized, by using at least one of principal component analysis, random projection, and a sequence autoencoder.

For example, in the image processing device provided by at least one embodiment of the present disclosure, the fusion processor is further configured to concatenate the deep feature, the expert feature, and the unsupervised feature to obtain the fused feature.

For example, in the image processing device provided by at least one embodiment of the present disclosure, the fusion processor is further configured to: perform a global pooling operation and a mean pooling operation on each of the deep feature, the expert feature, and the unsupervised feature, so as to obtain a global vector and a mean vector of the deep feature, a global vector and a mean vector of the expert feature, and a global vector and a mean vector of the unsupervised feature, respectively; and concatenate at least one of the global vector and the mean vector of the deep feature, at least one of the global vector and the mean vector of the expert feature, and at least one of the global vector and the mean vector of the unsupervised feature, to obtain the fused feature.

For example, in the image processing device provided by at least one embodiment of the present disclosure, the classification processor is further configured to determine, according to the fused feature of the image to be recognized, whether the image to be recognized contains an atrial fibrillation feature.
At least one embodiment of the present disclosure provides an image processing method, including: obtaining a deep feature of an image to be recognized based on a deep feature extractor, the image to be recognized being a medical image; obtaining an expert feature of the image to be recognized based on an expert feature extractor; fusing the deep feature and the expert feature to obtain a fused feature of the image to be recognized; and classifying the image to be recognized according to the fused feature of the image to be recognized.

For example, the image processing method provided by at least one embodiment of the present disclosure further includes: obtaining an unsupervised feature of the image to be recognized based on an unsupervised feature extractor; and fusing the deep feature, the expert feature, and the unsupervised feature to obtain the fused feature of the image to be recognized.

For example, in the image processing method provided by at least one embodiment of the present disclosure, fusing the deep feature, the expert feature, and the unsupervised feature to obtain the fused feature of the image to be recognized includes: concatenating the deep feature, the expert feature, and the unsupervised feature to obtain the fused feature.

For example, in the image processing method provided by at least one embodiment of the present disclosure, fusing the deep feature, the expert feature, and the unsupervised feature to obtain the fused feature of the image to be recognized includes: performing a global pooling operation and a mean pooling operation on each of the deep feature, the expert feature, and the unsupervised feature, so as to obtain a global vector and a mean vector of the deep feature, a global vector and a mean vector of the expert feature, and a global vector and a mean vector of the unsupervised feature, respectively; and concatenating at least one of the global vector and the mean vector of the deep feature, at least one of the global vector and the mean vector of the expert feature, and at least one of the global vector and the mean vector of the unsupervised feature, to obtain the fused feature.
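As an illustrative, non-limiting sketch of the pooling-and-concatenation fusion described above, the following Python/NumPy snippet pools each feature map into a "global" vector (here assumed to be per-channel max pooling) and a mean vector, then concatenates all pooled vectors. All shapes and names are assumptions for illustration, not the claimed implementation:

```python
import numpy as np

def pool_and_fuse(feature_maps):
    """Fuse a list of 2-D feature maps (each: time_steps x channels).

    For each feature map, compute a global vector (assumed here to be
    max pooling over the time axis) and a mean vector (average pooling
    over the time axis), then concatenate all pooled vectors into one
    fused feature vector.
    """
    pooled = []
    for fm in feature_maps:
        global_vec = fm.max(axis=0)   # global (max) pooling per channel
        mean_vec = fm.mean(axis=0)    # mean pooling per channel
        pooled.extend([global_vec, mean_vec])
    return np.concatenate(pooled)

# Example: deep, expert and unsupervised features with different shapes
deep = np.random.rand(128, 64)        # hypothetical deep features
expert = np.random.rand(30, 16)       # hypothetical expert features
unsup = np.random.rand(50, 32)        # hypothetical unsupervised features
fused = pool_and_fuse([deep, expert, unsup])
print(fused.shape)                    # (2*64 + 2*16 + 2*32,) = (224,)
```

Pooling each feature over its own length before concatenation lets features of different sizes contribute fixed-length vectors to the fused representation.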
At least one embodiment of the present disclosure further provides an image processing device, including: a processor; a memory; and one or more computer program modules, where the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules include instructions for implementing the image processing method provided by any embodiment of the present disclosure.

At least one embodiment of the present disclosure further provides a storage medium storing computer-readable instructions which, when executed by a processor, can perform the image processing method provided by any embodiment of the present disclosure.
Brief Description of the Drawings

In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure, and are not intended to limit the present disclosure.
FIG. 1A is a flowchart of an image processing method provided by at least one embodiment of the present disclosure;

FIG. 1B shows an exemplary scene diagram of an image processing system according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of deep feature extraction provided by at least one embodiment of the present disclosure;

FIG. 3A is a flowchart of a fusion operation provided by at least one embodiment of the present disclosure;

FIG. 3B is a schematic diagram of a fusion operation provided by at least one embodiment of the present disclosure;

FIG. 4 is a flowchart of another image processing method provided by at least one embodiment of the present disclosure;

FIG. 5 is a schematic diagram of another fusion operation provided by at least one embodiment of the present disclosure;

FIG. 6 is a schematic block diagram of an image processing device provided by at least one embodiment of the present disclosure;

FIG. 7 is a schematic block diagram of another image processing device provided by at least one embodiment of the present disclosure;

FIG. 8 is a schematic block diagram of another image processing device provided by at least one embodiment of the present disclosure;

FIG. 9 is a schematic diagram of an electronic device provided by at least one embodiment of the present disclosure; and

FIG. 10 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure.
Detailed Description

In order to make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present disclosure. Based on the described embodiments, all other embodiments obtained by those of ordinary skill in the art without creative labor fall within the protection scope of the present disclosure.

Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the ordinary meanings understood by persons of ordinary skill in the field to which the present disclosure belongs. The terms "first", "second", and similar words used in the present disclosure do not indicate any order, quantity, or importance, but are only used to distinguish different components. Similarly, words such as "a", "an", or "the" do not denote a limitation of quantity, but rather denote the existence of at least one. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
The electrocardiogram (ECG) is widely used in the diagnosis of various heart diseases. Although current medical devices such as clinical electrocardiographs and wearable automatic monitoring equipment provide some basic automatic ECG analysis functions (such as automatic measurement of waveform parameters, rhythm parameters, etc.), for some arrhythmia types such as atrial fibrillation, the error rate of automatic analysis and diagnosis by such devices is high, so the interpretation and diagnosis of arrhythmias such as atrial fibrillation still mainly rely on medical experts.
According to the feature extraction method used, existing atrial fibrillation recognition methods can be divided into feature-engineering-based methods and deep-learning-based methods. Traditional atrial fibrillation recognition methods basically adopt feature-engineering-based approaches. According to the underlying analysis mechanism, feature-engineering-based methods can be further divided into methods based on atrial activity analysis, methods based on ventricular response analysis, and methods combining atrial activity and ventricular response. Methods based on atrial activity analysis mainly focus on the disappearance of the P wave in atrial fibrillation or the appearance of F waves in the TQ interval, that is, on morphological changes of the ECG data caused by changes in atrial activity. If the ECG signal data has high resolution and almost no noise pollution, an atrial fibrillation detector based on atrial activity analysis can achieve high-precision detection, but it is strongly affected in real-time scenarios where there is noise interference and complex denoising operations are impractical. Methods based on ventricular response analysis mainly focus on changes in the time interval between heartbeats (the RR interval length) detected via the QRS complex. The RR interval is mainly determined from the peak position of the R wave, which has the largest amplitude in the ECG signal; methods based on ventricular response analysis suffer much less noise interference than methods based on atrial activity analysis and are therefore more suitable for real-time atrial fibrillation diagnosis. Methods combining atrial activity and ventricular response can provide stronger performance by combining periodic independent signals. Such combined methods include an RR-interval Markov model combining a P-wave morphological similarity measure with PR-interval variability, and a fuzzy-logic classification method combining RR-interval irregularity, P-wave absence, and F-wave appearance.
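As a minimal illustration of the RR-interval analysis discussed above, the following Python/NumPy sketch computes RR intervals from pre-detected R-peak sample positions and a simple irregularity statistic. The function names, the assumption that R peaks are already detected, and the use of the coefficient of variation as the irregularity measure are illustrative assumptions, not the method of this disclosure:

```python
import numpy as np

def rr_intervals(r_peaks, fs):
    """Compute RR intervals (in seconds) from R-peak sample indices.

    r_peaks: 1-D sequence of R-wave peak positions (sample indices),
             assumed already detected by a QRS detector.
    fs:      sampling frequency in Hz.
    """
    return np.diff(np.asarray(r_peaks, dtype=float)) / fs

def rr_irregularity(rr):
    """A simple irregularity statistic: the coefficient of variation of
    the RR intervals. Absolutely irregular rhythms (as in AF) tend to
    give larger values than regular sinus rhythm."""
    rr = np.asarray(rr)
    return rr.std() / rr.mean()

# Regular rhythm: nearly constant RR intervals
regular = rr_intervals([0, 200, 400, 600, 800], fs=200)
# Irregular rhythm: RR intervals vary beat to beat
irregular = rr_intervals([0, 150, 390, 520, 800], fs=200)
print(rr_irregularity(regular) < rr_irregularity(irregular))  # True
```

Because this statistic depends only on R-peak timing, it remains usable even when the finer P/F-wave morphology is obscured by noise, which is the robustness property noted above.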
Feature-engineering-based methods are closely tied to domain expert knowledge, and can therefore also be called domain-knowledge-based methods. Existing research on feature-engineering-based atrial fibrillation recognition models is limited in applicability and can only classify some arrhythmias. Because waveform variations between patients are complex and many non-AF signals may exhibit characteristics similar to those of AF signals (such as irregular RR intervals), manually designed expert features must be tailored case by case. Although relatively good recognition results can be achieved for a specific, limited set of arrhythmia types, it is difficult to accurately distinguish atrial fibrillation from other arrhythmia types in complex situations where various arrhythmias are mixed. The reasons are mainly twofold: on the one hand, it is difficult to guarantee that all relevant features are extracted, so the AF recognition model may discard much key information in the feature extraction stage; on the other hand, ECG signal data inevitably contains a large amount of noise, such as power-line interference, electrode contact noise, human motion artifacts, electromyographic interference, baseline drift, amplitude variations of the ECG signal, and instrument noise. Since it is difficult to accurately measure parameters and recognize waveforms such as the P wave, T wave, S wave, and F wave from a noisy electrocardiogram, feature-engineering-based methods are highly susceptible to noise pollution.
Owing to deep learning's powerful ability to automatically extract data features, the application of deep neural network models to biomedical signals has received extensive attention.

In recent years, deep-learning-based methods have achieved success in detecting atrial fibrillation from electrocardiograms. However, these methods still have a high misdiagnosis rate: only about 66% of atrial fibrillation cases can be correctly identified from ECG data in which various arrhythmias are mixed. Since a large amount of valuable domain knowledge has accumulated in practical application fields, deep neural networks at the present stage cannot replace this domain knowledge. Therefore, studying how to combine deep neural networks with domain knowledge to improve the accuracy of automatic atrial fibrillation detection is a very valuable research problem, and one that many current methods fail to address adequately.

Generally speaking, the quality of a deep neural network model largely depends on the quality of the training samples: the more accurate the labels and the more comprehensive the content of the training samples, the higher the quality of the trained atrial fibrillation recognition model. In practical applications, however, comprehensive and accurate training samples are difficult to obtain. In arrhythmia recognition applications, since the ECG is collected continuously and the human body's ECG signal is very weak, the training samples inevitably contain various kinds of noise, and these noisy training samples have an important impact on the final recognition results. Thus, in atrial fibrillation detection, deep neural network models have a relatively high error rate: on the one hand, because the amount of training data is insufficient; on the other hand, because of the semantic ambiguity caused by noisy segments in the ECG samples (for example, a noisy segment may include noise and rhythm segments of arrhythmia types other than the type to which the current ECG sample belongs), which is also a main reason for the low accuracy of arrhythmia recognition models. Using deep learning to automatically learn features from ECG signal data containing noisy segments causes wrong features to be mapped onto the data distribution of the current arrhythmia type, degrading the quality of the deep neural network model.

Therefore, in practical atrial fibrillation detection scenarios, existing methods give insufficient consideration to the challenges of modeling perspective and domain knowledge. As a result, deep learning methods are difficult for practitioners to accept because they lack domain knowledge, while domain-knowledge-based atrial fibrillation detection methods suffer from low accuracy when facing the low data quality of real scenarios; consequently, atrial fibrillation detection results obtained in research settings have remained difficult to apply in practice.
At least one embodiment of the present disclosure provides an image processing method, including: obtaining a deep feature of an image to be recognized based on a deep feature extractor, the image to be recognized being a medical image; obtaining an expert feature of the image to be recognized based on an expert feature extractor; fusing the deep feature and the expert feature to obtain a fused feature of the image to be recognized; and classifying the image to be recognized according to the fused feature of the image to be recognized.

Some embodiments of the present disclosure also provide an image processing device and a storage medium corresponding to the above image processing method.

The image processing method, image processing device, and storage medium provided by the above embodiments of the present disclosure represent and extract expert features based on domain knowledge and deep features based on deep neural networks, and represent and fuse the expert features and deep features within a unified framework, so as to improve the accuracy of automatic atrial fibrillation detection. This provides a high-precision auxiliary method for real-time, dynamic atrial fibrillation recognition and diagnosis, helps doctors diagnose in time and accurately discover the occurrence of a patient's atrial fibrillation, and helps patients keep track of changes in their condition in time, thereby improving the quality of medical care, reducing the incidence of life-threatening events such as sudden cardiac death, and ultimately reducing the health and economic burden on families and society.

The embodiments and examples of the present disclosure are described in detail below with reference to the accompanying drawings.
At least one embodiment of the present disclosure provides an image processing method; FIG. 1A is a flowchart of an example of the image processing method. The image processing method may be implemented in software, hardware, firmware, or any combination thereof, and loaded and executed by a processor in a device such as a mobile phone, notebook computer, desktop computer, network server, or digital camera, so as to classify an image to be recognized for processing in subsequent steps. FIG. 1B shows an exemplary scene diagram of an image processing system according to an embodiment of the present disclosure. The image processing method provided by at least one embodiment of the present disclosure is described below with reference to FIG. 1A and FIG. 1B. As shown in FIG. 1A, the image processing method includes steps S110 to S140.

Step S110: obtain a deep feature of the image to be recognized based on a deep feature extractor.

Step S120: obtain an expert feature of the image to be recognized based on an expert feature extractor.

Step S130: fuse the deep feature and the expert feature to obtain a fused feature of the image to be recognized.

Step S140: classify the image to be recognized according to the fused feature of the image to be recognized.

For example, the image to be recognized is a medical image. The medical image here may be an image collected by, for example, CT, MRI, ultrasound, X-ray, or radionuclide imaging (such as SPECT or PET), or an image displaying human physiological information, such as an electrocardiogram, an electroencephalogram, or an optical photograph. In the embodiments of the present disclosure, the medical image is described by taking an electrocardiogram image as an example, which is not limited by the embodiments of the present disclosure. In an electrocardiogram, an atrial fibrillation signal is characterized by the complete disappearance of the P wave, the complete replacement of the P wave by F waves, and an absolutely irregular heart rhythm (that is, no two adjacent RR intervals are the same).
For step S110, for example, in some examples, the deep feature extractor may be implemented as a deep neural network (for example, including a fully connected layer), and step S110 includes obtaining the deep feature of the image to be recognized by using the deep neural network. For example, the deep neural network may be a convolutional neural network, such as any one of the Inception series networks (e.g., GoogLeNet), the VGG series networks, and the ResNet series networks, or at least a part of any one of the above networks; the embodiments of the present disclosure are not limited in this respect.

In some applications, the vector dimension of the deep feature output by the deep neural network is high, while the dimensions of vectors such as the expert feature or the unsupervised feature are low. Because the dimension of the deep feature vector differs greatly from the dimensions of the expert or unsupervised feature vectors, the expert or unsupervised features play a negligible role in the final fused feature; their advantages are thus not reflected in the fused feature, the purpose of feature fusion is not achieved, and the accuracy of atrial fibrillation detection cannot be improved. Therefore, in order to reduce the gap between the vector dimension of the deep feature and the dimensions of vectors such as the expert feature or the unsupervised feature, the deep feature extractor may further include an activation function layer (for example, an Identity layer) connected to the deep neural network. Through the Identity layer, the dimension of the deep feature extracted by the deep neural network can be reduced, so as to obtain a dimension-reduced deep feature that matches the dimensions of the expert feature or the unsupervised feature. This solves the above problem and increases the diversity of the fused feature, thereby improving the accuracy of automatic atrial fibrillation detection.
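The dimension-matching idea described above can be sketched as follows. In this Python/NumPy snippet, a learned linear projection followed by an identity activation stands in for the described dimension reduction; all shapes, names, and the random projection matrix are illustrative assumptions, not the disclosure's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def identity(x):
    """Identity activation: passes features through unchanged, so the
    projected deep feature can be read out before classification."""
    return x

# Assumed dimensions for illustration only.
deep_feature = rng.standard_normal(2048)   # high-dimensional deep feature
expert_feature = rng.standard_normal(32)   # low-dimensional expert feature

# A (normally learned) projection matrix reduces the deep feature to a
# dimension comparable to the expert feature before fusion.
W_proj = rng.standard_normal((2048, 64)) / np.sqrt(2048)
reduced = identity(deep_feature @ W_proj)

fused = np.concatenate([reduced, expert_feature])
print(reduced.shape, fused.shape)          # (64,) (96,)
```

With the deep feature reduced from 2048 to 64 dimensions, the 32-dimensional expert feature now occupies a third of the fused vector instead of being drowned out.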
For example, the Identity activation function in the Identity layer can output the extracted deep feature (for example, the dimension-reduced deep feature) before the classification result is output.

For example, the parameters of the deep neural network can be obtained by training in the training stage S1 in FIG. 2.

For example, in the training stage S1, the deep neural network can be connected to a classifier. For example, the classifier is a Softmax classifier, an SVM (Support Vector Machine) classifier, or the like; in the embodiments of the present disclosure, the Softmax classifier is taken as an example for description, which is not limited by the embodiments of the present disclosure. The classifier can classify the input data of the deep neural network according to the extracted features. The classification result of the classifier is output through the output layer as the final output of the deep neural network model.
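As a minimal sketch of such a Softmax classification head, the following Python/NumPy snippet converts a fused feature vector into class probabilities; the weight values, the two-class (AF / non-AF) setup, and all names are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: converts raw scores (logits) into a
    probability distribution over classes."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
fused_feature = rng.standard_normal(96)        # fused feature vector
W = rng.standard_normal((96, 2)) * 0.1         # 2 classes: AF / non-AF
b = np.zeros(2)

probs = softmax(fused_feature @ W + b)         # e.g. P(non-AF), P(AF)
prediction = int(np.argmax(probs))             # index of the top class
print(probs.sum())                             # 1.0
```

The subtraction of `z.max()` before exponentiation is a standard trick to avoid overflow without changing the resulting probabilities.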
图2为本公开至少一实施例提供的一种深度特征的提取示意图。如图2所示,为了对任何架构的深度神经网络进行改造,使其能够自动提取深度特征,该深度特征的提取算法包括两个阶段:训练阶段和提取阶段。FIG. 2 is a schematic diagram of extracting a depth feature provided by at least one embodiment of the present disclosure. As shown in Figure 2, in order to transform the deep neural network of any architecture so that it can automatically extract deep features, the deep feature extraction algorithm includes two phases: training phase and extraction phase.
如图2所示,在训练阶段S1,首先基于某个具体的任务(例如,判断是否为房颤)训练一个深度神经网络,并保存该深度神经网络的架构和权值。As shown in Fig. 2, in the training stage S1, a deep neural network is first trained based on a specific task (for example, judging whether it is atrial fibrillation), and the architecture and weights of the deep neural network are saved.
在训练阶段S1,向该深度神经网络输入有标签(例如,标签包括是否为房颤)的数据,该深度神经网络可以输出深度特征,并将该深度特征输出至Softmax层,以在Softmax层(即,Softmax分类器所在的层)基于该深度神经网络提取的深度特征输出分类结果(例如,该待识别图像属于预设类别(例如,房颤)的预测概率),确定与预测结果对应的标签,即是否为房颤。In the training stage S1, data with labels (for example, labels indicating whether atrial fibrillation is present) is input to the deep neural network. The deep neural network outputs deep features to the Softmax layer (that is, the layer where the Softmax classifier is located), which outputs the classification result based on the deep features extracted by the deep neural network (for example, the predicted probability that the image to be recognized belongs to a preset category such as atrial fibrillation), thereby determining the label corresponding to the prediction result, that is, whether atrial fibrillation is present.
例如,该深度神经网络的训练过程中还可以包括优化器,优化器中的优化函数可以根据系统损失函数计算得到的系统损失值以计算该深度神经网络的参数的误差值,并根据该误差值对待训练的深度神经网络的参数进行修正,从而可以使得深度神经网络输出较准确的深度特征。例如,优化函数可以采用随机梯度下降(stochastic gradient descent,SGD)算法、批量梯度下降(batch gradient descent,BGD)算法等计算该深度神经网络的参数的误差值。For example, the training process of the deep neural network may further include an optimizer. The optimization function in the optimizer can calculate the error values of the parameters of the deep neural network according to the system loss value computed by the system loss function, and correct the parameters of the deep neural network to be trained according to the error values, so that the deep neural network outputs more accurate deep features. For example, the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (BGD) algorithm, or the like to calculate the error values of the parameters of the deep neural network.
例如,Softmax层是用于输出分类结果(例如,判断是否为房颤)的回归函数层。For example, the Softmax layer is a regression function layer for outputting classification results (for example, determining whether it is atrial fibrillation).
因此,在提取阶段S2,将与深度神经网络连接的最后一层Softmax层替代为Identity层(例如,输出层,采用Identity激活函数),并且保持该深度神经网络中其它的全连接层的网络结构和权值不变,即利用训练好的深度神经网络提取深度特征,这样可以得到精度较高的深度特征,并在Identity层对其进行降维以输出与专家特征或无监督特征匹配的深度特征。此时,向该深度神经网络输入新数据,就可以从Identity层输出深度特征,从而可以实现深度特征的表示与提取。Therefore, in the extraction stage S2, the last Softmax layer connected to the deep neural network is replaced with an Identity layer (for example, an output layer using the Identity activation function), while the network structure and weights of the other fully connected layers in the deep neural network remain unchanged; that is, the trained deep neural network is used to extract deep features. In this way, deep features with high accuracy can be obtained and then reduced in dimensionality at the Identity layer to output deep features that match the expert features or unsupervised features. At this point, when new data is input to the deep neural network, deep features can be output from the Identity layer, realizing the representation and extraction of deep features.
在该示例中,该深度特征的提取可以不依赖于深度神经网络的特殊架构,即,可以采用能够实现特征提取的任意网络架构而不限于一种网络。In this example, the deep feature extraction may not rely on the special architecture of the deep neural network, that is, any network architecture that can realize feature extraction can be used and is not limited to one type of network.
需要注意是,上述深度特征提取的方法不限于上述神经网络,还可以通过例如HOG+SVM等本领域内的常规方法实现,本公开的实施例对此不作限制。It should be noted that the above-mentioned deep feature extraction method is not limited to the above-mentioned neural network, and can also be implemented by conventional methods in the art, such as HOG+SVM, which is not limited in the embodiments of the present disclosure.
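作为上述两阶段方案的示意,下面给出一个仅依赖NumPy的最小化示例(网络结构、各层尺寸以及随机占位权值均为假设,仅代表在训练阶段S1中训练好的模型),展示将Softmax输出头替换为Identity输出头后,同一网络即可作为特征提取器使用。As an illustration of the two-stage scheme above, the following minimal NumPy-only sketch (the network, its sizes, and the random placeholder weights are all hypothetical, standing in for a model trained in stage S1) shows how swapping the Softmax head for an Identity head turns the same classifier into a feature extractor:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class TinyNet:
    """A minimal fully connected network; W1/W2 are random placeholders
    standing in for the weights saved after training stage S1."""
    def __init__(self, d_in, d_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (d_in, d_hidden))
        self.W2 = rng.normal(0.0, 0.1, (d_hidden, n_classes))

    def forward(self, x, head="softmax"):
        h = np.tanh(x @ self.W1)       # hidden representation (the deep feature)
        if head == "softmax":          # training stage S1: classification output
            return softmax(h @ self.W2)
        return h                       # extraction stage S2: Identity head outputs features

net = TinyNet(d_in=32, d_hidden=8, n_classes=2)
x = np.ones((1, 32))
probs = net.forward(x, head="softmax")    # class probabilities during training
feats = net.forward(x, head="identity")   # deep features during extraction
```

在训练阶段S1使用`head="softmax"`分支并以带标签数据训练;在提取阶段S2复用同样的权值而走Identity分支输出特征。In stage S1 the `head="softmax"` branch is trained with labeled data; in stage S2 the same weights are reused and the Identity branch outputs the features.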
另外,深度学习技术在增加了表征学习能力的同时也减少了可解释性能力。它们往往被当成黑盒模型来使用,即,仅能获得具体的输出结果(例如,是否为房颤),而无法有效解释在中间过程中产生的深度特征等数据,例如,无法解释哪一个深度特征对应哪个区域的房颤等。但在医疗领域,针对一个分析识别结果进行适当的解释对于医生与患者都是极其重要的,而目前的基于深度学习的房颤识别技术所提取的深度特征无法有效指导和辅助医生的诊断决策。针对该技术问题,本公开实施例中可以通过对深度特征进行相应地标注或结合具有可解释性的专家特征等方法来实现,从而提升房颤识别的精度。In addition, while deep learning techniques improve representation learning ability, they also reduce interpretability. They are often used as black-box models; that is, only concrete output results (for example, whether atrial fibrillation is present) can be obtained, while the deep features and other data produced in the intermediate process cannot be effectively explained; for example, it is impossible to explain which deep feature corresponds to atrial fibrillation in which region. In the medical field, however, an appropriate explanation of an analysis and recognition result is extremely important for both doctors and patients, and the deep features extracted by current deep-learning-based atrial fibrillation recognition techniques cannot effectively guide and assist doctors' diagnostic decisions. To address this technical problem, the embodiments of the present disclosure may annotate the deep features accordingly or combine them with interpretable expert features, thereby improving the accuracy of atrial fibrillation recognition.
对于步骤S120,例如,在一些示例中,该步骤S120包括:基于根据医学图像数据获得的经验公式、规则和特征值,提取待识别图像的专家特征。例如,该医学图像数据可以包括心电图数据。For step S120, for example, in some examples, step S120 includes: extracting expert features of the image to be recognized based on empirical formulas, rules, and feature values obtained from medical image data. For example, the medical image data may include electrocardiogram data.
例如,在基于领域知识的房颤自动检测场景中,领域专家会根据自己的知识体系,基于数据总结出"经验公式"、"规则"和"特征值"等,并基于它们进行房颤检测与识别。实际上,这些经验公式、规则和特征值都可以被表示为数值向量C∈R^d(其中,R^d表示维数为d的取值空间,该公式表示数值向量C属于该取值空间)的形式,进而可以被视为专家特征。For example, in the scenario of automatic atrial fibrillation detection based on domain knowledge, domain experts summarize "empirical formulas", "rules", and "feature values" from data according to their own knowledge systems, and perform atrial fibrillation detection and recognition based on them. In fact, these empirical formulas, rules, and feature values can all be expressed in the form of a numeric vector C ∈ R^d (where R^d denotes the value space of dimension d, and the formula indicates that the numeric vector C belongs to this value space), and can thus be regarded as expert features.
例如,该专家特征的类别可以包括统计、形态、时域和频域的至少之一,当然还可以包括其他的类别,本公开的实施例对此不作限制。For example, the category of the expert feature may include at least one of statistics, morphology, time domain, and frequency domain, and of course may also include other categories, which are not limited in the embodiments of the present disclosure.
具体地,统计类专家特征包括例如均值(Mean)、最大值(Maximum)、最小值(Minimum)、方差(Variance)、偏度(Skewness)、峰度(Kurtosis)、分位数(Percentile)和阈值(Threshold)等统计数值。例如,在心电监护时序数据上如果最大值差异较大,提示病人可能患有房颤。Specifically, statistical expert features include, for example, Mean, Maximum, Minimum, Variance, Skewness, Kurtosis, Percentile, and Statistical values such as Threshold. For example, if the maximum difference in the ECG monitoring time series data is large, it indicates that the patient may have atrial fibrillation.
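上述统计类专家特征可以直接计算,下面给出一个仅依赖NumPy的示例(示例信号和90分位数的选择均为假设),其中偏度与峰度采用标准的矩定义。The statistical expert features listed above can be computed directly; the NumPy-only sketch below (the toy signal and the choice of the 90th percentile are illustrative assumptions) uses the standard moment-based definitions of skewness and kurtosis:

```python
import numpy as np

def statistical_features(signal):
    """Compute the statistical expert features named in the text:
    mean, maximum, minimum, variance, skewness, kurtosis, percentile."""
    x = np.asarray(signal, dtype=float)
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma                 # standardized signal
    return {
        "mean": mu,
        "maximum": x.max(),
        "minimum": x.min(),
        "variance": x.var(),
        "skewness": (z ** 3).mean(),     # third standardized moment
        "kurtosis": (z ** 4).mean() - 3.0,  # excess kurtosis
        "percentile_90": np.percentile(x, 90),
    }

feats = statistical_features([0.1, 0.5, 0.4, 1.2, 0.3, 0.8])
```

例如,若某段心电数据的最大值或方差异常偏大,即可将相应数值作为提示房颤的专家特征之一。For example, an unusually large maximum or variance over an ECG segment would be one such expert feature suggesting possible atrial fibrillation.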
例如,形态的定义与具体领域有直接关系,例如,在房颤检测领域中,在心电监护时序数据出现P波段消失的形态时,则提示病人可能患有房颤。For example, the definition of morphology is directly related to specific areas. For example, in the field of atrial fibrillation detection, when the P-band disappears in the timing data of the ECG monitoring, it indicates that the patient may have atrial fibrillation.
由于很多时序数据具有一定的周期性,时域的分析注重时序数据在时间维度上的节律特性,例如,心电监护时序数据的RR间期如果不规则,提示病人可能患有房颤。Since many time series data have certain periodicity, time-domain analysis pays attention to the rhythmic characteristics of time series data in the time dimension. For example, if the RR interval of the time series data of ECG monitoring is irregular, it indicates that the patient may have atrial fibrillation.
例如,时序数据在频域上也会展现出一定的特性,例如,在心电监护时序数据中,如果大于50赫兹(Hz)的能量过高,则提示病人身体的肌肉电流较强,即在心电监护时序数据中存在肌电干扰等噪声,会对分析结果产生影响,此时,房颤识别的精度不高。For example, time series data also exhibits certain characteristics in the frequency domain. For example, in ECG monitoring time series data, if the energy above 50 hertz (Hz) is too high, it indicates that the muscle currents in the patient's body are strong, that is, noise such as electromyographic (EMG) interference is present in the ECG monitoring time series data, which affects the analysis results; in this case, the accuracy of atrial fibrillation recognition is low.
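作为该频域判断的一个示意(250 Hz采样率与合成的5 Hz/60 Hz信号均为举例假设),可以用FFT估计50 Hz以上频段的功率占比。As a sketch of this frequency-domain check (the 250 Hz sampling rate and the synthetic 5 Hz/60 Hz signals are assumptions for illustration), the fraction of spectral power above 50 Hz can be estimated with an FFT:

```python
import numpy as np

def high_freq_power_ratio(signal, fs, cutoff_hz=50.0):
    """Fraction of spectral power above cutoff_hz; a large value suggests
    EMG (muscle) noise contaminating the ECG, as described in the text."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total = spectrum.sum()
    return spectrum[freqs > cutoff_hz].sum() / total if total > 0 else 0.0

fs = 250.0                                        # assumed sampling rate
t = np.arange(0.0, 2.0, 1.0 / fs)
clean = np.sin(2 * np.pi * 5 * t)                 # 5 Hz component only
noisy = clean + 0.5 * np.sin(2 * np.pi * 60 * t)  # plus 60 Hz EMG-like noise
```

对纯5 Hz信号该比值接近0,而叠加60 Hz噪声后比值显著升高,可据此判断数据质量。For the clean 5 Hz signal the ratio is near 0, while adding 60 Hz noise raises it markedly, which can serve as a data-quality indicator.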
对于步骤S130,在一些示例中,该步骤S130可以包括:拼接深度特征和专家特征,以获得融合特征。For step S130, in some examples, the step S130 may include: stitching depth features and expert features to obtain fusion features.
例如,在一些示例中,为了融合不同来源的特征,使用全局池化(Max Pooling)和均值池化(Mean Pooling)操作。For example, in some examples, in order to integrate features from different sources, global pooling (Max Pooling) and mean pooling (Mean Pooling) operations are used.
图3A为本公开至少一实施例提供的一种融合操作的流程图。也就是说,图3A为图1B中所示的步骤S130的至少一个示例的流程图。例如,在图3A所示的示例中,该融合操作包括步骤S1311至步骤S1312。图3B为本公开至少一实施例提供的一种融合操作的示意图。下面,结合图3A和图3B对本公开至少一实施例提供的融合操作进行详细地介绍。FIG. 3A is a flowchart of a fusion operation provided by at least one embodiment of the present disclosure. That is, FIG. 3A is a flowchart of at least one example of step S130 shown in FIG. 1B. For example, in the example shown in FIG. 3A, the fusion operation includes step S1311 to step S1312. FIG. 3B is a schematic diagram of a fusion operation provided by at least one embodiment of the present disclosure. Hereinafter, the fusion operation provided by at least one embodiment of the present disclosure will be described in detail with reference to FIG. 3A and FIG. 3B.
步骤S1311:分别对深度特征和专家特征进行全局池化操作和均值池化操作,以分别获取深度特征的全局向量和均值向量以及专家特征的全局向量和均值向量。Step S1311: Perform a global pooling operation and an average pooling operation on the depth feature and the expert feature respectively to obtain the global vector and the mean vector of the depth feature and the global vector and the mean vector of the expert feature respectively.
例如,如图3B所示,深度特征提取器110提取的深度特征和专家特征提取器提取的专家特征分别表示为向量矩阵V=[v_0,...,v_i,...,v_(n-1)],其中,v_i∈R^d,表示向量v_i在该取值空间R^d内;V∈R^(n×d),表示向量矩阵V在该取值空间R^(n×d)内,其中,R^(n×d)表示维数为n*d的取值空间。For example, as shown in FIG. 3B, the deep features extracted by the deep feature extractor 110 and the expert features extracted by the expert feature extractor are each represented as a vector matrix V = [v_0, ..., v_i, ..., v_(n-1)], where v_i ∈ R^d means that the vector v_i is in the value space R^d, and V ∈ R^(n×d) means that the vector matrix V is in the value space R^(n×d), where R^(n×d) denotes the value space of dimension n*d.
例如,对深度特征和专家特征分别进行全局池化操作(例如,最大池化)和均值池化操作,以分别获取深度特征的全局向量v_max和均值向量v_mean以及专家特征的全局向量v_max和均值向量v_mean,其中,v_max∈R^d,v_mean∈R^d。例如,全局向量v_max是在向量矩阵V中例如按列计算每列向量的最大值得到的,均值向量v_mean是在向量矩阵V中例如按列计算每列向量的平均值得到的。For example, a global pooling operation (for example, max pooling) and a mean pooling operation are performed on the deep features and the expert features, respectively, to obtain the global vector v_max and the mean vector v_mean of the deep features as well as the global vector v_max and the mean vector v_mean of the expert features, where v_max ∈ R^d and v_mean ∈ R^d. For example, the global vector v_max is obtained by computing the column-wise maximum of the vector matrix V, and the mean vector v_mean is obtained by computing the column-wise average of the vector matrix V.
步骤S1312:拼接深度特征的全局向量和均值向量的至少之一以及专家特征的全局向量和均值向量的至少之一,以获得融合特征。Step S1312: Splice at least one of the global vector and the mean vector of the depth feature and at least one of the global vector and the mean vector of the expert feature to obtain a fusion feature.
例如,拼接所有不同来源的特征的全局向量和均值向量,例如,拼接来源于深度特征提取器110的深度特征的全局向量和均值向量以及来源于专家特征提取器120的专家特征的全局向量v_max和均值向量v_mean,以获得最终的融合特征,为房颤检测任务提供易用且准确的多源融合特征,从而提供准确的房颤检测的辅助方法。For example, the global vectors and mean vectors of all features from different sources are concatenated, for example, the global vector and mean vector of the deep features from the deep feature extractor 110 and the global vector v_max and mean vector v_mean of the expert features from the expert feature extractor 120, to obtain the final fusion feature. This provides easy-to-use and accurate multi-source fusion features for the atrial fibrillation detection task, and thus an accurate auxiliary method for atrial fibrillation detection.
例如,在本公开的实施例中,拼接操作为将不同来源的特征的全局向量和均值向量合并为1个向量,例如,1个1维向量。For example, in the embodiment of the present disclosure, the splicing operation is to merge the global vector and the mean vector of features from different sources into one vector, for example, one one-dimensional vector.
需要注意的是,还可以仅拼接深度特征的全局向量和专家特征的全局向量,或仅拼接深度特征的均值向量和专家特征的均值向量,或仅拼接深度特征的全局向量和专家特征的均值向量,或仅拼接深度特征的均值向量和专家特征的全局向量以获得融合特征,本公开的实施例对此不作限制。It should be noted that it is also possible to concatenate only the global vector of the deep feature and the global vector of the expert feature, or only the mean vector of the deep feature and the mean vector of the expert feature, or only the global vector of the deep feature and the mean vector of the expert feature, or only the mean vector of the deep feature and the global vector of the expert feature, to obtain the fusion feature, which is not limited in the embodiments of the present disclosure.
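步骤S1311和S1312所述的池化加拼接融合可以示意如下(仅依赖NumPy;其中的2×2示例矩阵为假设,实际中为n×d的特征矩阵)。The pooling-and-concatenation fusion of steps S1311 and S1312 can be sketched as follows (NumPy only; the 2×2 toy matrices are assumptions, and real n×d feature matrices would be used in practice):

```python
import numpy as np

def pool_and_fuse(deep_feat, expert_feat):
    """Column-wise max/mean pooling of each n×d feature matrix, then
    concatenation of all pooled vectors into one 1-D fusion vector."""
    pooled = []
    for V in (deep_feat, expert_feat):
        pooled.append(V.max(axis=0))    # global (max-pooled) vector v_max in R^d
        pooled.append(V.mean(axis=0))   # mean-pooled vector v_mean in R^d
    return np.concatenate(pooled)

deep = np.array([[1.0, 2.0], [3.0, 0.0]])      # toy 2x2 deep feature matrix
expert = np.array([[0.5, 0.5], [1.5, 2.5]])    # toy 2x2 expert feature matrix
fused = pool_and_fuse(deep, expert)            # one 1-D fusion vector of length 4d
```

若仅需部分向量(例如只拼接两个全局向量),在`pooled`中选择相应项即可。If only some of the vectors are needed (for example, only the two global vectors), the corresponding entries of `pooled` can be selected.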
例如,在另一些示例中,可以直接对不同来源的特征进行拼接。For example, in other examples, features from different sources can be spliced directly.
例如,在一些示例中,深度特征和专家特征可以具有多个通道。例如,深度特征可以是尺寸为H1*W1*C1的张量,其中H1可以是深度特征在第一方向(例如长度方向)上的尺寸,W1可以是深度特征在第二方向(例如宽度方向)上的尺寸,H1、W1可以是以像素数量为单位的尺寸,C1可以是深度特征的通道数。专家特征可以是尺寸为H2*W2*C2的张量,其中H2可以是专家特征在第一方向(例如长度方向)上的尺寸,W2可以是专家特征在第二方向(例如宽度方向)上的尺寸,H2、W2可以是以像素数量为单位的尺寸,C2可以是专家特征的通道数。这里,C1、C2是大于1的整数。For example, in some examples, the deep features and the expert features may have multiple channels. For example, a deep feature may be a tensor of size H1*W1*C1, where H1 may be the size of the deep feature in a first direction (for example, the length direction), W1 may be the size of the deep feature in a second direction (for example, the width direction), H1 and W1 may be sizes in units of the number of pixels, and C1 may be the number of channels of the deep feature. An expert feature may be a tensor of size H2*W2*C2, where H2 may be the size of the expert feature in the first direction (for example, the length direction), W2 may be the size of the expert feature in the second direction (for example, the width direction), H2 and W2 may be sizes in units of the number of pixels, and C2 may be the number of channels of the expert feature. Here, C1 and C2 are integers greater than 1.
例如,深度特征可以具有100个通道,例如,每个通道中的深度特征为一个1行*M列(M为大于1的整数)的一维向量,该100个通道的深度特征可以组合为一个100行*M列的向量矩阵。例如,专家特征也可以具有100个通道,例如,该100个通道的专家特征可以组合为一个100行*M列的向量矩阵。通过拼接深度特征和专家特征可以得到一个200个通道的融合特征,例如,该200个通道的融合特征为一个200行*M列的向量矩阵。该具有200个通道的融合特征融合有深度特征和专家特征的信息。For example, the depth feature can have 100 channels. For example, the depth feature in each channel is a one-dimensional vector of 1 row*M column (M is an integer greater than 1), and the depth features of the 100 channels can be combined into one A vector matrix of 100 rows*M columns. For example, the expert features may also have 100 channels. For example, the expert features of the 100 channels may be combined into a vector matrix with 100 rows*M columns. A fusion feature of 200 channels can be obtained by splicing depth features and expert features. For example, the fusion feature of 200 channels is a vector matrix of 200 rows*M columns. The fusion feature with 200 channels integrates the information of depth feature and expert feature.
需要注意的是,上述深度特征和专家特征的通道数量仅是示例性的,可以根据具体实施例设置,本公开的实施例对此不作限制。It should be noted that the number of channels of the above-mentioned depth feature and expert feature are only exemplary, and can be set according to specific embodiments, which are not limited in the embodiments of the present disclosure.
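上述按通道拼接(100个通道加100个通道得到200个通道)可以示意如下(列数M=16为假设值)。A channel-wise concatenation like the one above (100 channels plus 100 channels giving 200 channels) can be sketched as follows (the column count M=16 is an assumed value):

```python
import numpy as np

M = 16                                 # assumed length of each channel's 1-D vector
deep = np.zeros((100, M))              # 100-channel deep feature, one row per channel
expert = np.ones((100, M))             # 100-channel expert feature

# Stacking along the channel axis yields a 200-row * M-column fusion matrix
# that carries the information of both sources.
fused = np.concatenate([deep, expert], axis=0)
```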
例如,可以提供融合处理器,并通过该融合处理器根据融合深度特征以及专家特征,以获得待识别图像的融合特征;例如,也可以通过中央处理单元(CPU)、图像处理器(GPU)、张量处理器(TPU)、现场可编程逻辑门阵列(FPGA)或者具有数据处理能力和/或指令执行能力的其它形式的处理单元以及相应计算机指令来实现该融合处理器。例如,该处理单元可以为通用处理器或专用处理器,可以是基于X86或ARM架构的处理器等。For example, a fusion processor can be provided, and the fusion processor can obtain the fusion feature of the image to be recognized according to the fusion depth feature and the expert feature; for example, the central processing unit (CPU), image processor (GPU), A tensor processor (TPU), a field programmable logic gate array (FPGA), or other forms of processing units with data processing capabilities and/or instruction execution capabilities and corresponding computer instructions implement the fusion processor. For example, the processing unit may be a general-purpose processor or a special-purpose processor, and may be a processor based on the X86 or ARM architecture.
在本公开的实施例中,通过上述融合操作,使得融合特征融合有深度特征和专家特征的信息,可以将深度神经网络与领域知识结合,以克服深度神经网络的由于缺乏样本数据而导致的训练量不足的问题以及由于缺乏领域知识房颤识别精度低而很难被实际领域接受的问题,还可以克服面对实际场景下的低数据质量问题时,基于领域知识的房颤检测方法的精确度低等问题,从而可以提升房颤自动检测的精度。In the embodiments of the present disclosure, through the above fusion operation, the fusion feature integrates the information of the deep features and the expert features, so that the deep neural network can be combined with domain knowledge. This overcomes the problem that the deep neural network is insufficiently trained due to the lack of sample data and the problem that atrial fibrillation recognition is difficult to be accepted in practice due to its low accuracy in the absence of domain knowledge; it also overcomes the low accuracy of domain-knowledge-based atrial fibrillation detection methods when facing the low data quality of real-world scenarios, thereby improving the accuracy of automatic atrial fibrillation detection.
具体地,在本公开的实施例中,基于深度神经网络的房颤识别方法对噪声的自适应能力比较好,由于深度特征是自动提取而非人工提取,因此可以克服基于领域知识的方法很难在各类心律失常混杂的复杂情况下准确区分出房颤和其他心律失常类型的问题。但是,基于深度神经网络的房颤识别方法需要大规模的有良好标注的学习样本,这是非常困难的,因此,在本公开的实施例中将对良好标注样本量依赖性不强的专家特征(基于领域知识或特征工程的房颤识别方法)纳入建模过程中,使得深度特征和专家特征融合,从而能够在有良好标注的学习样本量不足的情况下依然能够提供较可靠的房颤预测精度。Specifically, in the embodiments of the present disclosure, the deep-neural-network-based atrial fibrillation recognition method adapts well to noise. Since the deep features are extracted automatically rather than manually, it overcomes the difficulty that domain-knowledge-based methods have in accurately distinguishing atrial fibrillation from other arrhythmia types in complex situations where various arrhythmias are mixed. However, deep-neural-network-based atrial fibrillation recognition requires a large number of well-annotated learning samples, which are very difficult to obtain. Therefore, in the embodiments of the present disclosure, expert features that do not depend strongly on the amount of well-annotated samples (atrial fibrillation recognition methods based on domain knowledge or feature engineering) are incorporated into the modeling process, so that deep features and expert features are fused, and reliable atrial fibrillation prediction accuracy can still be provided even when the number of well-annotated learning samples is insufficient.
图4为本公开至少一实施例提供的另一种图像处理方法的流程图。如图4所示,在图1所示的示例的基础上,该图像处理方法还包括步骤S150。FIG. 4 is a flowchart of another image processing method provided by at least one embodiment of the present disclosure. As shown in FIG. 4, based on the example shown in FIG. 1, the image processing method further includes step S150.
步骤S150:基于无监督特征提取器获取待识别图像的无监督特征。Step S150: Obtain unsupervised features of the image to be recognized based on the unsupervised feature extractor.
例如,在该示例中,在实际应用中获取标记数据比较困难的场景下,基于无监督特征提取器获取待识别图像的无监督特征之前,利用主成分分析法、随机投影法以及序列自动编码器中的至少之一无监督学习方法训练无监督特征提取器,学习无监督特征的自动表示与提取,将变换的低维数据提取为无监督特征。For example, in this example, in scenarios where it is difficult to obtain labeled data in practical applications, before the unsupervised features of the image to be recognized are obtained based on the unsupervised feature extractor, at least one unsupervised learning method among principal component analysis, random projection, and sequence autoencoders is used to train the unsupervised feature extractor, to learn the automatic representation and extraction of unsupervised features, and to extract the transformed low-dimensional data as unsupervised features.
与在标记数据上构建模型的有监督学习方法不同,无监督学习方法仅在未标记数据上构建模型。虽然由于缺乏标记数据,无监督学习到的特征不如有监督学习特征有效,但是,由于在实际应用中,获取标记数据可能比较困难,因此无监督特征的自动学习方法在这种场景下发挥了重要的作用。Unlike supervised learning methods, which build models on labeled data, unsupervised learning methods build models only on unlabeled data. Although features learned without supervision are less effective than supervised features due to the lack of labeled data, obtaining labeled data may be difficult in practical applications, so automatic learning of unsupervised features plays an important role in such scenarios.
例如,可以利用主成分分析法、随机投影法或序列自动编码器训练得到无监督特征提取器。For example, the unsupervised feature extractor can be obtained by using principal component analysis method, random projection method or sequential autoencoder training.
例如,主成分分析(Principle Component Analysis,PCA)法使用正交变换将相关变量转换为线性不相关的主成分,这些主成分的数量远远少于原始变量,同时也提供了更多信息,然后将变换的低维主成分提取为无监督特征。For example, the principal component analysis (PCA) method uses an orthogonal transformation to convert correlated variables into linearly uncorrelated principal components. The number of these principal components is far smaller than that of the original variables while still being highly informative, and the transformed low-dimensional principal components are then extracted as unsupervised features.
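下面给出一个基于SVD的PCA最小化示例(仅依赖NumPy;随机数据和k=3均为举例假设),展示如何将低维主成分得分提取为无监督特征。A minimal SVD-based PCA sketch follows (NumPy only; the random data and k=3 are illustrative assumptions), showing how low-dimensional principal-component scores could be extracted as unsupervised features:

```python
import numpy as np

def pca_features(X, k):
    """Project X (n_samples x n_dims) onto its top-k principal components
    via SVD of the centered data; the low-dimensional scores serve as
    unsupervised features."""
    Xc = X - X.mean(axis=0)                       # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # n_samples x k scores

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                     # hypothetical unlabeled data
Z = pca_features(X, k=3)                          # unsupervised features
```

所得各主成分得分彼此线性不相关,正对应正文中"线性不相关的主成分"的描述。The resulting score columns are mutually uncorrelated, matching the "linearly uncorrelated principal components" described in the text.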
例如,随机投影(Random Projection,RP)法通过将原始数据乘以随机投影矩阵,将高维数据投影到较低维度以得到低维数据,然后将变换的低维数据提取为无监督特征。该RP算法简单且计算效率高。For example, the Random Projection (RP) method multiplies the original data by a random projection matrix, projects high-dimensional data to lower dimensions to obtain low-dimensional data, and then extracts the transformed low-dimensional data as unsupervised features. The RP algorithm is simple and computationally efficient.
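高斯随机投影可以示意如下(数据维度与随机种子为假设;1/sqrt(k)的缩放是使两两距离近似保持的常用选择)。A Gaussian random projection can be sketched as follows (the dimensions and seed are assumptions; the 1/sqrt(k) scaling is the usual choice so that pairwise distances are roughly preserved):

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project high-dimensional X (n x d) to k dimensions by multiplying
    with a Gaussian random matrix; distances are approximately preserved
    (Johnson-Lindenstrauss)."""
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 1000))        # hypothetical high-dimensional data
Z = random_projection(X, k=50)         # low-dimensional unsupervised features
```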
例如,序列自动编码器(Sequence to Sequence Autoencoder,SeqAE)是自动编码器(AE)的变体,它将编码器和解码器中的全连接层替换为循环层。序列自动编码器首先将时序数据变换为隐藏层的表示,然后解码器将隐藏层的表示再次变换为时序数据,并尝试最小化原始序列与解码序列之间的距离。最后,将隐藏层的表示提取为无监督特征。For example, Sequence to Sequence Autoencoder (SeqAE) is a variant of Autoencoder (AE), which replaces the fully connected layer in the encoder and decoder with a cyclic layer. The sequence auto-encoder first transforms the time series data into the representation of the hidden layer, and then the decoder transforms the representation of the hidden layer into the time series data again, and tries to minimize the distance between the original sequence and the decoded sequence. Finally, the representation of the hidden layer is extracted as unsupervised features.
在该示例中,由于增加了无监督特征的获取,从而步骤S130可以相应地表示为:In this example, since the acquisition of unsupervised features is added, step S130 can be expressed as follows:
步骤S130:融合深度特征、专家特征以及无监督特征,以获得待识别图像的融合特征。Step S130: Fusion of depth features, expert features, and unsupervised features to obtain fusion features of the image to be recognized.
例如,在该示例中,如图5所示,将不同来源的特征向量进行融合,例如,可以通过拼接深度特征、专家特征以及无监督特征,以获得融合特征。For example, in this example, as shown in FIG. 5, feature vectors from different sources are fused, for example, depth features, expert features, and unsupervised features can be spliced to obtain fusion features.
例如,可以分别对深度特征、专家特征与无监督特征进行全局池化操作和均值池化操作,以分别获取深度特征的全局向量和均值向量、专家特征的全局向量和均值向量以及无监督特征的全局向量和均值向量;拼接深度特征的全局向量和均值向量的至少之一、专家特征的全局向量和均值向量的至少之一以及无监督特征的全局向量和均值向量的至少之 一,以获得融合特征。For example, you can perform global pooling operations and mean pooling operations on depth features, expert features, and unsupervised features, respectively, to obtain the global vector and mean vector of depth features, the global vector and mean vector of expert features, and unsupervised features. Global vector and mean vector; concatenate at least one of the global vector and the mean vector of the depth feature, at least one of the global vector and the mean vector of the expert feature, and at least one of the global vector and the mean vector of the unsupervised feature to obtain a fusion feature.
例如,具体的融合方法可参考图3B所示的融合过程的介绍,在此不再赘述。For example, the specific fusion method can refer to the introduction of the fusion process shown in FIG. 3B, which will not be repeated here.
需要注意的是,该步骤S130可以不限于上述特征的融合,还可以包括其余更多特征的融合,本公开的实施例对此不作限制。It should be noted that this step S130 may not be limited to the fusion of the above-mentioned features, and may also include the fusion of more other features, which is not limited in the embodiment of the present disclosure.
在本公开实施例中,融合特征融合了深度特征、专家特征以及无监督特征,实现数据降维,可以克服深度特征和专家特征中的噪声污染严重的问题,从而可以进一步提升房颤自动检测的精度。In the embodiments of the present disclosure, the fusion feature integrates deep features, expert features, and unsupervised features, achieving data dimensionality reduction, which can overcome the serious noise pollution problem in the deep features and the expert features, thereby further improving the accuracy of automatic atrial fibrillation detection.
具体地,传统的基于领域知识的方法虽然可以解决深度神经网络对有良好标注的样本量的依赖性强的问题,但是对噪声敏感,而无监督学习的方法虽然识别精确度较低,但具有自动降噪和不需要标签的作用,本公开实施例提供的图像处理方法同时融合了多种不同技术(深度神经网络、基于领域知识以及无监督学习)的优势,在有良好标注的学习样本量不足的情况下,针对有噪声的心电图数据,依然能够提供较可靠的房颤预测精度。Specifically, although traditional domain-knowledge-based methods can solve the problem that deep neural networks depend strongly on the amount of well-annotated samples, they are sensitive to noise; and although unsupervised learning methods have lower recognition accuracy, they provide automatic noise reduction and require no labels. The image processing method provided by the embodiments of the present disclosure combines the advantages of these different techniques (deep neural networks, domain knowledge, and unsupervised learning) and can still provide reliable atrial fibrillation prediction accuracy for noisy ECG data even when the number of well-annotated learning samples is insufficient.
需要注意的是,还可融合其他的技术,本公开的实施例对此不作限制。It should be noted that other technologies can also be integrated, which is not limited in the embodiments of the present disclosure.
对于步骤S140,在一些示例中,根据待识别图像的融合特征对待识别图像进行分类,包括:根据待识别图像的融合特征判断待识别图像是否包括房颤特征。For step S140, in some examples, classifying the image to be recognized according to the fusion feature of the image to be recognized includes: judging whether the image to be recognized includes atrial fibrillation features according to the fusion feature of the image to be recognized.
例如,当将上述图像处理方法应用于其他领域,例如,机械领域时,还可以用于检测周期性波形的变化。本公开的实施例对此不作限制。For example, when the above image processing method is applied to other fields, for example, the mechanical field, it can also be used to detect changes in periodic waveforms. The embodiments of the present disclosure do not limit this.
例如,将上述实施例中获取的融合特征(例如,图3B中获取的融合特征或图5中获取的融合特征)输入至房颤检测器中,以实现房颤识别检测。例如,该房颤检测器可以是神经网络分类器或SVM分类器等,本公开的实施例对此不作限制。For example, the fusion feature acquired in the above-mentioned embodiment (for example, the fusion feature acquired in FIG. 3B or the fusion feature acquired in FIG. 5) is input into the atrial fibrillation detector to realize atrial fibrillation recognition and detection. For example, the atrial fibrillation detector may be a neural network classifier or an SVM classifier, etc., which is not limited in the embodiment of the present disclosure.
例如,可以提供分类处理器,并通过该分类处理器根据待识别图像的融合特征对待识别图像进行分类;例如,也可以通过中央处理单元(CPU)、图像处理器(GPU)、张量处理器(TPU)、现场可编程逻辑门阵列(FPGA)或者具有数据处理能力和/或指令执行能力的其它形式的处理单元以及相应计算机指令来实现该分类处理器。For example, a classification processor can be provided, and the classification processor can classify the image to be recognized according to the fusion characteristics of the image to be recognized; for example, the central processing unit (CPU), image processor (GPU), or tensor processor can also be used to classify the image to be recognized. (TPU), Field Programmable Logic Gate Array (FPGA), or other forms of processing units with data processing capabilities and/or instruction execution capabilities and corresponding computer instructions to implement the classification processor.
需要说明的是,在本公开的实施例中,该图像处理方法的流程可以包括更多或更少的操作,这些操作可以顺序执行或并行执行。虽然上文描述的图像处理方法的流程包括特定顺序出现的多个操作,但是应该清楚地了解,多个操作的顺序并不受限制。上文描述的图像处理方法可以执行一次,也可以按照预定条件执行多次。It should be noted that, in the embodiments of the present disclosure, the flow of the image processing method may include more or fewer operations, and these operations may be executed sequentially or in parallel. Although the flow of the image processing method described above includes multiple operations appearing in a specific order, it should be clearly understood that the order of the multiple operations is not limited. The image processing method described above may be executed once, or may be executed multiple times according to predetermined conditions.
本公开上述实施例提供的图像处理方法,通过基于领域知识的专家特征的表示与提取,以及基于深度神经网络的深度特征的表示与提取,并采用统一的框架对专家特征和深度特征进行表示与融合,以达到提升房颤自动检测精度的目的,从而为实时、动态地房颤识别与诊断提供高精度的智能决策支持方法,帮助医生及时诊断和准确发现患者的房颤的发生,帮助病人及时了解病情的变化,从而提高医疗质量、降低心脏性猝死等危及生命情况的发生率,最终减少给家庭和社会带来的健康和经济负担。The image processing method provided by the above embodiments of the present disclosure represents and extracts expert features based on domain knowledge and deep features based on a deep neural network, and represents and fuses the expert features and deep features within a unified framework, so as to improve the accuracy of automatic atrial fibrillation detection. It thereby provides a high-precision intelligent decision support method for real-time, dynamic atrial fibrillation recognition and diagnosis, helping doctors diagnose in time and accurately detect the occurrence of atrial fibrillation in patients, and helping patients understand changes in their condition in time, thereby improving the quality of medical care, reducing the incidence of life-threatening conditions such as sudden cardiac death, and ultimately reducing the health and economic burden on families and society.
例如,上述图像处理方法可以通过图1B所示的图像处理系统实现。如图1B所示,该图像处理系统10可以包括用户终端11、网络12、服务器13以及数据库14。For example, the foregoing image processing method can be implemented by the image processing system shown in FIG. 1B. As shown in FIG. 1B, the image processing system 10 may include a user terminal 11, a network 12, a server 13, and a database 14.
用户终端11可以是例如图1B中示出的电脑11-1、手机11-2。可以理解的是,用户终端11可以是能够执行数据处理的任何其他类型的电子设备,其可以包括但不限于台式电脑、笔记本电脑、平板电脑、智能手机、智能家居设备、可穿戴设备、车载电子设备、监控设备等。用户终端也可以是设置有电子设备的任何装备,例如车辆、机器人等。The user terminal 11 may be, for example, the computer 11-1 and the mobile phone 11-2 shown in FIG. 1B. It is understandable that the user terminal 11 may be any other type of electronic device capable of performing data processing, which may include, but is not limited to, a desktop computer, a notebook computer, a tablet computer, a smart phone, a smart home device, a wearable device, and in-vehicle electronics. Equipment, monitoring equipment, etc. The user terminal may also be any equipment provided with electronic equipment, such as vehicles, robots, and so on.
根据本公开实施例提供的用户终端可以用于接收待识别图像,并利用本公开实施例提供的方法实现图像识别和分类。例如,用户终端11可以通过用户终端11上设置的图像采集设备(图中未示出,例如照相机、摄像机等)采集待识别图像。又例如,用户终端11也可以从独立设置的图像采集设备接收待识别图像。再例如,用户终端11也可以经由网络从服务器13接收待识别图像。这里所述的待识别图像可以是单独的图像,也可以是视频中的一帧。在待识别图像是医学图像的情况下,用户终端也可以从医学采集设备接收待识别图像。The user terminal provided according to the embodiments of the present disclosure may be used to receive the image to be recognized, and to realize image recognition and classification using the method provided by the embodiments of the present disclosure. For example, the user terminal 11 may collect the image to be recognized through an image acquisition device (not shown in the figure, such as a camera or video camera) provided on the user terminal 11. For another example, the user terminal 11 may also receive the image to be recognized from an independently provided image acquisition device. For yet another example, the user terminal 11 may also receive the image to be recognized from the server 13 via the network. The image to be recognized described here may be a single image or a frame of a video. In the case that the image to be recognized is a medical image, the user terminal may also receive the image to be recognized from a medical acquisition device.
在一些实施例中,可以利用用户终端11的处理单元执行本公开实施例提供的图像处理方法。在一些实现方式中,用户终端11可以利用用户终端11内置的应用程序执行图像处理方法。在另一些实现方式中,用户终端11可以通过调用用户终端11外部存储的应用程序执行本公开至少一实施例提供的图像处理方法。In some embodiments, the processing unit of the user terminal 11 may be used to execute the image processing method provided in the embodiments of the present disclosure. In some implementation manners, the user terminal 11 may use a built-in application program of the user terminal 11 to execute the image processing method. In other implementation manners, the user terminal 11 may execute the image processing method provided by at least one embodiment of the present disclosure by calling an application program stored externally of the user terminal 11.
在另一些实施例中,用户终端11将接收的待识别图像经由网络12发送至服务器13,并由服务器13执行图像处理方法。在一些实现方式中,服务器13可以利用服务器内置的应用程序执行图像处理方法。在另一些实现方式中,服务器13可以通过调用服务器13外部存储的应用程序执行图像处理方法。In other embodiments, the user terminal 11 sends the received image to be recognized to the server 13 via the network 12, and the server 13 executes the image processing method. In some implementation manners, the server 13 may execute the image processing method by using an application program built in the server. In other implementation manners, the server 13 may execute the image processing method by calling an application program stored externally of the server 13.
网络12可以是单个网络,或至少两个不同网络的组合。例如,网络12可以包括但不限于局域网、广域网、公用网络、专用网络等中的一种或几种的组合。The network 12 may be a single network, or a combination of at least two different networks. For example, the network 12 may include, but is not limited to, one or a combination of several of a local area network, a wide area network, a public network, and a private network.
服务器13可以是一个单独的服务器,或一个服务器群组,群组内的各个服务器通过有线的或无线的网络进行连接。一个服务器群组可以是集中式的,例如数据中心,也可以是分布式的。服务器13可以是本地的或远程的。The server 13 may be a single server or a server group, and each server in the group is connected through a wired or wireless network. A server group can be centralized, such as a data center, or distributed. The server 13 may be local or remote.
数据库14可以泛指具有存储功能的设备。数据库14主要用于存储用户终端11和服务器13在工作中所利用、产生和输出的各种数据。数据库14可以是本地的,或远程的。数据库14可以包括各种存储器、例如随机存取存储器(Random Access Memory(RAM))、只读存储器(Read Only Memory(ROM))等。以上提及的存储设备只是列举了一些例子,该系统可以使用的存储设备并不局限于此。The database 14 can generally refer to a device with a storage function. The database 14 is mainly used to store various data used, generated, and output by the user terminal 11 and the server 13 in operation. The database 14 can be local or remote. The database 14 may include various memories, such as a Random Access Memory (RAM), a Read Only Memory (ROM), and so on. The storage devices mentioned above are just a few examples, and the storage devices that can be used by the system are not limited to these.
数据库14可以经由网络12与服务器13或其一部分相互连接或通信,或直接与服务器13相互连接或通信,或是上述两种方式的结合。The database 14 may be connected to or communicate with the server 13 or a part thereof via the network 12, or may be directly connected to or communicate with the server 13, or a combination of the above two methods may be adopted.
在一些实施例中,数据库14可以是独立的设备。在另一些实施例中,数据库14也可以集成在用户终端11和服务器13中的至少一个中。例如,数据库14可以设置在用户终端11上,也可以设置在服务器13上。又例如,数据库14也可以是分布式的,其一部分设置在用户终端11上,另一部分设置在服务器13上。In some embodiments, the database 14 may be a stand-alone device. In other embodiments, the database 14 may also be integrated in at least one of the user terminal 11 and the server 13. For example, the database 14 may be set on the user terminal 11 or on the server 13. For another example, the database 14 may also be distributed, with one part set on the user terminal 11 and the other part set on the server 13.
图6为本公开至少一实施例提供的一种图像处理装置的示意框图。例如,在图6所示的示例中,该图像处理装置100包括深度特征提取器110、专家特征提取器120、融合处理器130和分类处理器140。例如,这些特征提取器和处理器可以通过硬件(例如电路)模块或软件模块等实现,以下各实施例与此相同,不再赘述。例如,可以通过中央处理单元(CPU)、图形处理器(GPU)、张量处理器(TPU)、现场可编程门阵列(FPGA)或者具有数据处理能力和/或指令执行能力的其它形式的处理单元以及相应计算机指令来实现这些处理器或提取器。FIG. 6 is a schematic block diagram of an image processing apparatus provided by at least one embodiment of the present disclosure. For example, in the example shown in FIG. 6, the image processing apparatus 100 includes a depth feature extractor 110, an expert feature extractor 120, a fusion processor 130, and a classification processor 140. For example, these feature extractors and processors can be implemented by hardware (for example, circuit) modules or software modules; the same applies to the embodiments below and will not be repeated. For example, these processors or extractors can be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), or another form of processing unit with data processing capability and/or instruction execution capability, together with corresponding computer instructions.
深度特征提取器110配置为获取待识别图像的深度特征。例如,待识别图像为医学图像。例如,该深度特征提取器110可以实现步骤S110,其具体实现方法可以参考步骤S110的相关描述,在此不再赘述。The depth feature extractor 110 is configured to obtain the depth feature of the image to be recognized. For example, the image to be recognized is a medical image. For example, the depth feature extractor 110 can implement step S110, and its specific implementation method can refer to the related description of step S110, which will not be repeated here.
专家特征提取器120配置为获取待识别图像的专家特征。例如,该专家特征提取器120可以实现步骤S120,其具体实现方法可以参考步骤S120的相关描述,在此不再赘述。The expert feature extractor 120 is configured to obtain expert features of the image to be recognized. For example, the expert feature extractor 120 can implement step S120, and its specific implementation method can refer to the related description of step S120, which will not be repeated here.
融合处理器130配置为融合深度特征以及专家特征,以获得待识别图像的融合特征。例如,该融合处理器130可以实现步骤S130,其具体实现方法可以参考步骤S130的相关描述,在此不再赘述。The fusion processor 130 is configured to fuse the depth feature and the expert feature to obtain the fusion feature of the image to be recognized. For example, the fusion processor 130 may implement step S130, and its specific implementation method can refer to the related description of step S130, which will not be repeated here.
分类处理器140配置为根据待识别图像的融合特征对待识别图像进行分类。例如,该分类处理器140可以实现步骤S140,其具体实现方法可以参考步骤S140的相关描述,在此不再赘述。The classification processor 140 is configured to classify the image to be recognized according to the fusion feature of the image to be recognized. For example, the classification processor 140 can implement step S140, and the specific implementation method can refer to the related description of step S140, which will not be repeated here.
例如,深度特征提取器110还配置为利用深度神经网络获取待识别图像的深度特征。For example, the depth feature extractor 110 is further configured to obtain the depth feature of the image to be recognized by using a deep neural network.
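深度神经网络提取深度特征的过程可以粗略示意如下;以下为一个假设性的草图,用极简的卷积加ReLU运算代替真实的深度神经网络,其中的图像尺寸与卷积核数量均为示例性假设,并非本公开的具体实现。The depth feature extraction by a deep neural network can be roughly sketched as follows; this is a hypothetical sketch in which a minimal convolution-plus-ReLU operation stands in for a real deep neural network, and the image size and number of kernels are illustrative assumptions rather than the specific implementation of this disclosure.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D "valid" cross-correlation, for illustration only."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def extract_depth_features(image, kernels):
    """One feature map per kernel, followed by a ReLU nonlinearity."""
    maps = [np.maximum(conv2d_valid(image, k), 0.0) for k in kernels]
    return np.stack(maps, axis=0)  # shape: (num_kernels, H', W')

rng = np.random.default_rng(0)
image = rng.random((8, 8))                            # hypothetical 8x8 single-channel image
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
depth_features = extract_depth_features(image, kernels)
```

实际实现中,该草图中的单个卷积层会被多层卷积、池化等堆叠的深度神经网络取代。In a real implementation, the single convolution layer in this sketch would be replaced by a deep neural network stacking multiple convolution, pooling, and other layers.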
例如,在本公开至少一实施例提供的图像处理装置中,专家特征提取器120还配置为基于根据医学图像数据获得的经验公式、规则和特征值,提取待识别图像的专家特征。For example, in the image processing apparatus provided by at least one embodiment of the present disclosure, the expert feature extractor 120 is further configured to extract the expert features of the image to be recognized based on empirical formulas, rules, and feature values obtained from medical image data.
例如,在本公开至少一实施例提供的图像处理装置中,专家特征的类别包括统计、形态、时域和频域中的至少之一。For example, in the image processing device provided by at least one embodiment of the present disclosure, the category of expert features includes at least one of statistics, morphology, time domain, and frequency domain.
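上述类别的专家特征可以示意性地计算如下;以下草图假设待识别图像为一维的心电样信号,所选的统计、时域与频域特征(均值、标准差、峰峰值、主频)仅为示例性假设,并非本公开限定的特征集合。The expert feature categories above can be sketched as follows; this sketch assumes the input is a 1-D ECG-like signal, and the chosen statistical, time-domain, and frequency-domain features (mean, standard deviation, peak-to-peak range, dominant frequency) are illustrative assumptions, not the feature set defined by this disclosure.

```python
import numpy as np

def extract_expert_features(signal, fs):
    """Hand-crafted features of a 1-D signal sampled at fs Hz:
    statistical (mean, std), time-domain (peak-to-peak range),
    and frequency-domain (dominant frequency via the FFT)."""
    mean = float(np.mean(signal))
    std = float(np.std(signal))
    ptp = float(np.ptp(signal))                    # time-domain range
    spectrum = np.abs(np.fft.rfft(signal - mean))  # remove DC before FFT
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    dominant = float(freqs[np.argmax(spectrum)])   # frequency-domain peak
    return np.array([mean, std, ptp, dominant])

fs = 100.0
t = np.arange(0, 2.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 5.0 * t)  # hypothetical 5 Hz sine test signal
features = extract_expert_features(signal, fs)
```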
例如,在本公开至少一实施例提供的图像处理装置中,分类处理器140还配置为:根据待识别图像的融合特征判断待识别图像是否包含房颤特征。For example, in the image processing device provided by at least one embodiment of the present disclosure, the classification processor 140 is further configured to determine whether the image to be identified contains atrial fibrillation features according to the fusion features of the image to be identified.
图7为本公开至少一实施例提供的另一种图像处理装置的示意框图。例如,如图7所示,在图6所示的示例的基础上,该图像处理装置100还包括无监督特征提取器150。FIG. 7 is a schematic block diagram of another image processing apparatus provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 7, based on the example shown in FIG. 6, the image processing apparatus 100 further includes an unsupervised feature extractor 150.
例如,无监督特征提取器150配置为获取待识别图像的无监督特征。例如,该无监督特征提取器150可以实现步骤S150,其具体实现方法可以参考步骤S150的相关描述,在此不再赘述。For example, the unsupervised feature extractor 150 is configured to obtain unsupervised features of the image to be recognized. For example, the unsupervised feature extractor 150 can implement step S150, and its specific implementation method can refer to the related description of step S150, which will not be repeated here.
例如,在本公开至少一实施例提供的图像处理装置中,无监督特征提取器150还配置为在基于无监督特征提取器获取待识别图像的无监督特征之前,利用主成分分析法、随机投影法和序列自动编码器中的至少之一训练得到无监督特征提取器。For example, in the image processing device provided by at least one embodiment of the present disclosure, the unsupervised feature extractor 150 is further configured to use principal component analysis and random projection before acquiring the unsupervised features of the image to be recognized based on the unsupervised feature extractor. At least one of the method and sequence autoencoder is trained to obtain an unsupervised feature extractor.
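以上述三种训练方法之一——主成分分析法为例,无监督特征提取器可作如下草图;其中的训练样本数、图像维度与主成分数量均为示例性假设。Taking principal component analysis, one of the three training methods listed above, as an example, an unsupervised feature extractor can be sketched as follows; the number of training samples, image dimensionality, and number of components are illustrative assumptions.

```python
import numpy as np

class PCAExtractor:
    """Unsupervised feature extractor trained by principal component
    analysis: fit on unlabeled training data, then project flattened
    images onto the leading principal axes."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        """Learn the principal axes from training samples (rows of X)."""
        self.mean_ = X.mean(axis=0)
        Xc = X - self.mean_
        # SVD of the centered data; rows of Vt are the principal axes.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self

    def transform(self, X):
        """Project samples onto the learned components."""
        return (X - self.mean_) @ self.components_.T

rng = np.random.default_rng(1)
train = rng.random((50, 64))   # 50 flattened 8x8 training images (assumed)
extractor = PCAExtractor(n_components=8).fit(train)
unsup_features = extractor.transform(rng.random((2, 64)))
```

随机投影法或序列自动编码器可按类似的 fit/transform 接口替换上面的PCA。A random projection or a sequence autoencoder could replace the PCA above behind the same fit/transform interface.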
例如,在该示例中,融合处理器130还配置为融合深度特征、专家特征以及无监督特征,以获得待识别图像的融合特征。For example, in this example, the fusion processor 130 is further configured to fuse the depth feature, the expert feature, and the unsupervised feature to obtain the fusion feature of the image to be recognized.
例如,在本公开至少一实施例提供的图像处理装置中,融合处理器130还配置为:拼接深度特征、专家特征以及无监督特征,以获得融合特征。For example, in the image processing device provided by at least one embodiment of the present disclosure, the fusion processor 130 is further configured to splice depth features, expert features, and unsupervised features to obtain fusion features.
例如,在本公开至少一实施例提供的图像处理装置中,融合处理器130还配置为:分别对深度特征、专家特征与无监督特征进行全局池化操作和均值池化操作,以分别获取深度特征的全局向量和均值向量、专家特征的全局向量和均值向量以及无监督特征的全局向量和均值向量;拼接深度特征的全局向量和均值向量的至少之一、专家特征的全局向量和均值向量的至少之一以及无监督特征的全局向量和均值向量的至少之一,以获得融合特征。For example, in the image processing apparatus provided by at least one embodiment of the present disclosure, the fusion processor 130 is further configured to: perform a global pooling operation and a mean pooling operation on the depth feature, the expert feature, and the unsupervised feature respectively, so as to obtain a global vector and a mean vector of the depth feature, a global vector and a mean vector of the expert feature, and a global vector and a mean vector of the unsupervised feature; and concatenate at least one of the global vector and the mean vector of the depth feature, at least one of the global vector and the mean vector of the expert feature, and at least one of the global vector and the mean vector of the unsupervised feature, to obtain the fusion feature.
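上述池化与拼接的融合过程可作如下草图;该草图仅对深度特征图做全局(最大)池化和均值池化,并假设专家特征与无监督特征已是向量,各维度均为示例性假设。The pooling-and-concatenation fusion described above can be sketched as follows; this sketch applies global (max) pooling and mean pooling only to the depth feature map, assumes the expert and unsupervised features are already vectors, and all dimensions are illustrative assumptions.

```python
import numpy as np

def fuse_features(depth_map, expert_vec, unsup_vec):
    """Global-max-pool and mean-pool the depth feature map per channel,
    then concatenate the pooled vectors with the expert and unsupervised
    feature vectors into a single fusion feature vector."""
    global_vec = depth_map.max(axis=(1, 2))   # global (max) pooling per channel
    mean_vec = depth_map.mean(axis=(1, 2))    # mean pooling per channel
    return np.concatenate([global_vec, mean_vec, expert_vec, unsup_vec])

depth_map = np.random.default_rng(2).random((4, 6, 6))  # (channels, H, W), assumed
expert_vec = np.array([0.1, 0.2, 0.3])                  # assumed expert features
unsup_vec = np.array([1.0, 2.0])                        # assumed unsupervised features
fused = fuse_features(depth_map, expert_vec, unsup_vec)  # length 4 + 4 + 3 + 2
```

拼接所得的融合特征随后可送入分类处理器140进行分类。The concatenated fusion feature can then be fed to the classification processor 140 for classification.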
需要注意的是,在本公开的实施例中,可以包括更多或更少的电路或单元,并且各个电路或单元之间的连接关系不受限制,可以根据实际需求而定。各个电路的具体构成方式不受限制,可以根据电路原理由模拟器件构成,也可以由数字芯片构成,或者以其他适用的方式构成。It should be noted that in the embodiments of the present disclosure, more or fewer circuits or units may be included, and the connection relationship between the respective circuits or units is not limited, and may be determined according to actual requirements. The specific structure of each circuit is not limited, and may be composed of analog devices according to the circuit principle, or may be composed of digital chips, or be composed in other suitable manners.
图8为本公开至少一实施例提供的又一种图像处理装置的示意框图。例如,如图8所示,该图像处理装置200包括处理器210、存储器220以及一个或多个计算机程序模块221。FIG. 8 is a schematic block diagram of another image processing apparatus provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 8, the image processing apparatus 200 includes a processor 210, a memory 220, and one or more computer program modules 221.
例如,处理器210与存储器220通过总线系统230连接。例如,一个或多个计算机程序模块221被存储在存储器220中。例如,一个或多个计算机程序模块221包括用于执行本公开任一实施例提供的图像处理方法的指令。例如,一个或多个计算机程序模块221中的指令可以由处理器210执行。例如,总线系统230可以是常用的串行、并行通信总线等,本公开的实施例对此不作限制。For example, the processor 210 and the memory 220 are connected through a bus system 230. For example, one or more computer program modules 221 are stored in the memory 220. For example, one or more computer program modules 221 include instructions for executing the image processing method provided by any embodiment of the present disclosure. For example, instructions in one or more computer program modules 221 may be executed by the processor 210. For example, the bus system 230 may be a commonly used serial or parallel communication bus, etc., which is not limited in the embodiments of the present disclosure.
例如,该处理器210可以是中央处理单元(CPU)、图形处理器(GPU)或者具有数据处理能力和/或指令执行能力的其它形式的处理单元,可以为通用处理器或专用处理器,并且可以控制图像处理装置200中的其它组件以执行期望的功能。For example, the processor 210 may be a central processing unit (CPU), a graphics processing unit (GPU), or another form of processing unit with data processing capability and/or instruction execution capability; it may be a general-purpose processor or a special-purpose processor, and may control other components in the image processing apparatus 200 to perform desired functions.
存储器220可以包括一个或多个计算机程序产品,该计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。该易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。该非易失性存储器例如可以包括只读存储器(ROM)、硬盘、闪存等。在计算机可读存储介质上可以存储一个或多个计算机程序指令,处理器210可以运行该程序指令,以实现本公开实施例中(由处理器210实现)的功能以及/或者其它期望的功能,例如图像处理方法等。在该计算机可读存储介质中还可以存储各种应用程序和各种数据,例如深度特征、专家特征以及应用程序使用和/或产生的各种数据等。The memory 220 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include random access memory (RAM) and/or cache memory (cache), for example. The non-volatile memory may include read-only memory (ROM), hard disk, flash memory, etc., for example. One or more computer program instructions may be stored on a computer-readable storage medium, and the processor 210 may run the program instructions to implement the functions (implemented by the processor 210) and/or other desired functions in the embodiments of the present disclosure, For example, image processing methods. Various application programs and various data, such as depth features, expert features, and various data used and/or generated by the application programs, can also be stored in the computer-readable storage medium.
需要说明的是,为表示清楚、简洁,本公开实施例并没有给出该图像处理装置200的全部组成单元。为实现图像处理装置200的必要功能,本领域技术人员可以根据具体需要提供、设置其他未示出的组成单元,本公开的实施例对此不作限制。It should be noted that, for the sake of clarity and conciseness, the embodiment of the present disclosure does not provide all the components of the image processing apparatus 200. In order to realize the necessary functions of the image processing apparatus 200, those skilled in the art can provide and set other unshown component units according to specific needs, and the embodiments of the present disclosure do not limit this.
关于不同实施例中的图像处理装置100和图像处理装置200的技术效果可以参考本公开的实施例中提供的图像处理方法的技术效果,这里不再赘述。Regarding the technical effects of the image processing device 100 and the image processing device 200 in different embodiments, reference may be made to the technical effects of the image processing method provided in the embodiments of the present disclosure, which will not be repeated here.
图像处理装置100和图像处理装置200可以用于各种适当的电子设备。图9为本公开至少一实施例提供的一种电子设备的示意图。例如,如图9所示,在一些示例中,电子设备300包括中央处理单元(CPU)301,其可以根据存储在只读存储器(ROM)302中的程序或者从存储装置308加载到随机访问存储器(RAM)303中的程序而执行各种适当的动作和处理。在RAM 303中,还存储有计算机系统操作所需的各种程序和数据。CPU 301、ROM 302以及RAM 303通过总线304彼此相连。输入/输出(I/O)接口305也连接至总线304。The image processing apparatus 100 and the image processing apparatus 200 can be used in various appropriate electronic devices. FIG. 9 is a schematic diagram of an electronic device provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 9, in some examples, the electronic device 300 includes a central processing unit (CPU) 301, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the computer system. The CPU 301, the ROM 302, and the RAM 303 are connected to each other via the bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
以下部件连接至I/O接口305:包括键盘、鼠标等的输入装置306;包括诸如液晶显示器(LCD)等以及扬声器等的输出装置307;包括硬盘等的存储装置308;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信装置309。通信装置309经由诸如因特网的网络执行通信处理。驱动器310也根据需要连接至I/O接口305。可拆卸介质311,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器310上,以便于从其上读出的计算机程序根据需要被安装入存储装置308。The following components are connected to the I/O interface 305: an input device 306 including a keyboard, a mouse, and the like; an output device 307 including a liquid crystal display (LCD), a speaker, and the like; a storage device 308 including a hard disk and the like; and a communication device 309 including a network interface card such as a LAN card or a modem. The communication device 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 310 as needed, so that the computer program read therefrom can be installed into the storage device 308 as needed.
例如,该电子设备300还可以进一步包括图像采集装置(图中未示出)和外设接口(图中未示出)等。例如,图像采集装置可以包括成像传感器以及镜头,该图像传感器可以为CMOS型或CCD型,镜头包括一个或多个透镜(凸透镜或凹透镜等)。该外设接口可以为各种类型的接口,例如为USB接口、闪电(lightning)接口等。该通信装置309可以通过无线通信来与网络和其他设备进行通信,该网络例如为因特网、内部网和/或诸如蜂窝电话网络之类的无线网络、无线局域网(LAN)和/或城域网(MAN)。无线通信可以使用多种通信标准、协议和技术中的任何一种,包括但不局限于全球移动通信系统(GSM)、增强型数据GSM环境(EDGE)、宽带码分多址(W-CDMA)、码分多址(CDMA)、时分多址(TDMA)、蓝牙、Wi-Fi(例如基于IEEE 802.11a、IEEE 802.11b、IEEE 802.11g和/或IEEE 802.11n标准)、基于因特网协议的语音传输(VoIP)、Wi-MAX,用于电子邮件、即时消息传递和/或短消息服务(SMS)的协议,或任何其他合适的通信协议。For example, the electronic device 300 may further include an image acquisition device (not shown in the figure), a peripheral interface (not shown in the figure), and the like. For example, the image acquisition device may include an imaging sensor and a lens; the image sensor may be of a CMOS type or a CCD type, and the lens may include one or more lenses (convex lenses, concave lenses, etc.). The peripheral interface can be various types of interfaces, such as a USB interface, a lightning interface, and the like. The communication device 309 can communicate with a network and other devices through wireless communication, such as the Internet, an intranet, and/or a wireless network such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN). Wireless communication can use any of a variety of communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (e.g. based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n standards), Voice over Internet Protocol (VoIP), Wi-MAX, protocols for e-mail, instant messaging and/or short message service (SMS), or any other suitable communication protocol.
例如,电子设备可以为手机、平板电脑、笔记本电脑、电子书、游戏机、电视机、数码相框、导航仪等任何设备,也可以为任意的电子设备及硬件的组合,本公开的实施例对此不作限制。For example, the electronic device may be any device such as a mobile phone, a tablet computer, a notebook computer, an e-book reader, a game console, a television, a digital photo frame, or a navigator, or may be any combination of an electronic device and hardware, which is not limited in the embodiments of the present disclosure.
例如,该电子设备可以是医疗电子设备。图像采集装置可以用于采集待识别图像,例如,医学图像。这里所说的医学图像可以是例如通过CT、MRI、超声、X光、核素显像(如SPECT、PET)等方法采集的医学图像,也可以是例如心电图、脑电图、光学摄影等显示人体生理信息的图像。For example, the electronic device may be a medical electronic device. The image acquisition device may be used to acquire an image to be recognized, for example, a medical image. The medical images mentioned here can be, for example, medical images collected by CT, MRI, ultrasound, X-ray, radionuclide imaging (such as SPECT, PET), etc., or can be displays such as electrocardiogram, electroencephalogram, optical photography, etc. Images of human body physiological information.
例如,在一些示例中,该医疗电子设备可以是CT、MRI、超声、X光仪器等任何医学成像设备。图像采集装置可以实现为上述医学成像设备的成像单元,图像处理装置 100/200可以通过医学成像设备的内部处理单元(例如处理器)实现。For example, in some examples, the medical electronic equipment may be any medical imaging equipment such as CT, MRI, ultrasound, X-ray equipment. The image acquisition device may be implemented as the imaging unit of the above-mentioned medical imaging device, and the image processing device 100/200 may be implemented by the internal processing unit (for example, a processor) of the medical imaging device.
本公开至少一实施例还提供一种存储介质。图10为本公开至少一实施例提供的一种存储介质的示意图。例如,如图10所示,该存储介质400非暂时性地存储有计算机可读指令401,当计算机可读指令由计算机(包括处理器)执行时可以执行本公开任一实施例提供的图像处理方法。At least one embodiment of the present disclosure further provides a storage medium. FIG. 10 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 10, the storage medium 400 non-transitorily stores computer-readable instructions 401; when the computer-readable instructions are executed by a computer (including a processor), the image processing method provided by any embodiment of the present disclosure can be executed.
例如,该存储介质可以是一个或多个计算机可读存储介质的任意组合,例如一个计算机可读存储介质包含提取待识别图像中的深度特征的计算机可读的程序代码,另一个计算机可读存储介质包含融合待识别图像的深度特征和专家特征以获取融合特征的计算机可读的程序代码。例如,当该程序代码由计算机读取时,计算机可以执行该计算机存储介质中存储的程序代码,执行例如本公开任一实施例提供的图像处理方法。For example, the storage medium may be any combination of one or more computer-readable storage media; for example, one computer-readable storage medium contains computer-readable program code for extracting depth features from the image to be recognized, and another computer-readable storage medium contains computer-readable program code for fusing the depth feature and the expert feature of the image to be recognized to obtain the fusion feature. For example, when the program code is read by a computer, the computer can execute the program code stored in the computer storage medium to execute, for example, the image processing method provided by any embodiment of the present disclosure.
例如,存储介质可以包括智能电话的存储卡、平板电脑的存储部件、个人计算机的硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、闪存、或者上述存储介质的任意组合,也可以为其他适用的存储介质。For example, the storage medium may include a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), Portable compact disk read-only memory (CD-ROM), flash memory, or any combination of the foregoing storage media may also be other suitable storage media.
有以下几点需要说明:The following points need to be explained:
(1)本公开实施例附图只涉及到与本公开实施例涉及到的结构,其他结构可参考通常设计。(1) The drawings of the embodiments of the present disclosure only refer to the structures related to the embodiments of the present disclosure, and other structures can refer to the usual design.
(2)在不冲突的情况下,本公开的实施例及实施例中的特征可以相互组合以得到新的实施例。(2) In the case of no conflict, the embodiments of the present disclosure and the features in the embodiments can be combined with each other to obtain new embodiments.
以上所述仅是本公开的示范性实施方式,而非用于限制本公开的保护范围,本公开的保护范围由所附的权利要求确定。The above are only exemplary implementations of the present disclosure, and are not used to limit the protection scope of the present disclosure, which is determined by the appended claims.