CN111260608A - Tongue region detection method and system based on deep learning - Google Patents
- Publication number
- CN111260608A (application No. CN202010017676.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0012 — Image analysis; biomedical image inspection
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/23213 — Non-hierarchical clustering techniques using statistics or function optimisation, with a fixed number of clusters, e.g. k-means clustering
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
Abstract
The invention discloses a tongue region detection method and system based on deep learning. The method comprises the following steps: labeling an acquired image data set containing the tongue, and preprocessing the labeled image data set to obtain a first image data set; setting the proportional sizes of multiple fixed reference frames and clustering them by k-means to obtain a plurality of cluster-centre reference frames; training based on the DarkNet network structure, determining the output layer dimension from the cluster-centre reference frames, and training on the first image data set to determine a first detection model; detecting an image data set containing no tongue with the first detection model to obtain a false-detection image data set; and adjusting the network structure of the first detection model, modifying the output layer dimension, and retraining with the first image data set and the false-detection image data set to determine a tongue detection model for detecting the tongue region.
Description
Technical Field
The present invention relates to the technical field of deep learning algorithms, and more particularly, to a tongue region detection method and system based on deep learning.
Background
At present, many tongue diagnosis algorithms based on traditional Chinese medicine theory require the subject to extend the tongue at a fixed distance and within a fixed area, held by a fixing device or equipment, before tongue diagnosis analysis is carried out; the pixels inside that fixed area of the picture are then analysed. In practice, different people extend their tongues to different positions and sizes, so with a fixed area the background pixels may outnumber the tongue-region pixels, which greatly degrades the actual tongue diagnosis analysis.
Furthermore, the application scenarios of the prior art are severely limited. The main problems are: 1. the method imposes many picture-shooting requirements on users, who must extend the tongue inside a prompt box on the shooting interface, which is very inconvenient and restricts the application scenarios; 2. the prompt-box area is used, very roughly, as the real tongue area for tongue diagnosis analysis, while in practice the background may occupy a very large proportion of its pixels, degrading the accuracy of the analysis.
Therefore, there is a need for a tongue region detection method to accurately and intelligently determine the tongue region.
Disclosure of Invention
The invention provides a tongue region detection method and system based on deep learning, aiming to determine the tongue region accurately and intelligently.
In order to solve the above problem, according to an aspect of the present invention, there is provided a tongue region detection method based on deep learning, the method including:
labeling the acquired image data set containing the tongue part, and preprocessing the labeled image data set to acquire a first image data set;
setting the proportional sizes of various fixed reference frames, and clustering by adopting a k-means clustering mode to obtain a plurality of clustering center reference frames;
training based on a DarkNet network structure, determining the dimension of an output layer according to the plurality of clustering center reference frames, and training according to the first image data set to determine a first detection model;
detecting an image data set containing no tongue with the first detection model to obtain a false-detection image data set;
and adjusting the network structure of the first detection model, modifying the dimensionality of an output layer, initializing the model by using the parameters of the first detection model except the last layer, and retraining by using the first image data set and the false detection image data set to determine a tongue detection model for detecting a tongue region.
Preferably, the labeling of the acquired image dataset containing the tongue comprises:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, marking the position of the tongue with a rectangular frame.
Preferably, the preprocessing the annotated image data set to obtain the first image data set comprises:
selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set;
and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
Preferably, wherein the data enhancement processing comprises:
at least one of: horizontal flipping; clockwise and anticlockwise rotation within a preset angle-range threshold; translation by a preset proportion threshold in the vertical and horizontal directions; random cropping of pixels within the picture edge according to a preset cropping-proportion threshold; Gaussian filtering; and scaling.
Preferably, the training based on the DarkNet network structure, determining the output layer dimension according to the plurality of cluster center reference frames, and training according to the first image data set to determine the first detection model includes:
the model backbone network adopts DarkNet-53 and fuses three feature maps of different scales, 13×13, 26×26 and 52×52, for target prediction; each scale of feature map uses 3 fixed reference frames of different sizes, giving 9 fixed reference frames in total;
determining the output layer dimension to be 3 × (1 + 5) = 18, where 3 denotes the three frames predicted per position, 1 denotes the single prediction category, and 5 denotes the predicted target's centre coordinates, width and height, and target score;
and selecting data from the first image data set according to a second preset quantity threshold as the training set, taking the remaining data as the verification set, and training the model to determine the first detection model.
According to another aspect of the present invention, there is provided a tongue region detection system based on deep learning, the system comprising:
the data processing unit is used for labeling the acquired image data set containing the tongue part and preprocessing the labeled image data set to acquire a first image data set;
the clustering unit is used for setting the proportional sizes of various fixed reference frames and clustering by adopting a k-means clustering mode to obtain a plurality of clustering center reference frames;
the first detection model determining unit is used for training based on a DarkNet network structure, determining the dimensionality of an output layer according to the multiple clustering center reference frames, and training according to the first image data set to determine a first detection model;
the false detection data acquisition unit is used for detecting an image data set containing no tongue with the first detection model, so as to obtain a false-detection image data set;
and the tongue detection model determining unit is used for adjusting the network structure of the first detection model, modifying the dimension of an output layer, initializing the model by using the parameters of the first detection model except the last layer, and retraining by using the first image data set and the false detection image data set to determine a tongue detection model for detecting a tongue region.
Preferably, the data processing unit labeling the acquired image data set including the tongue portion includes:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, marking the position of the tongue with a rectangular frame.
Preferably, the data processing unit, which pre-processes the annotated image data set to obtain the first image data set, comprises:
selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set;
and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
Preferably, wherein the data enhancement processing comprises:
at least one of: horizontal flipping; clockwise and anticlockwise rotation within a preset angle-range threshold; translation by a preset proportion threshold in the vertical and horizontal directions; random cropping of pixels within the picture edge according to a preset cropping-proportion threshold; Gaussian filtering; and scaling.
Preferably, the training of the first detection model determining unit based on the DarkNet network structure, the determination of the output layer dimension according to the plurality of cluster center reference frames, and the training of the first image data set to determine the first detection model includes:
the model backbone network adopts DarkNet-53 and fuses three feature maps of different scales, 13×13, 26×26 and 52×52, for target prediction; each scale of feature map uses 3 fixed reference frames of different sizes, giving 9 fixed reference frames in total;
determining the output layer dimension to be 3 × (1 + 5) = 18, where 3 denotes the three frames predicted per position, 1 denotes the single prediction category, and 5 denotes the predicted target's centre coordinates, width and height, and target score;
and selecting data from the first image data set according to a second preset quantity threshold as the training set, taking the remaining data as the verification set, and training the model to determine the first detection model.
The invention provides a tongue region detection method and system based on deep learning, in which a tongue detection model is trained with a deep convolutional network. Tongue pictures collected under different scenes, illumination, resolutions and image sizes can then be judged, determining whether a tongue is present and the size and position of the tongue region. With only one detection target and a relatively simple convolutional structure of just 53 layers, real-time detection can be achieved while accuracy is guaranteed, which is very convenient for applications on all kinds of devices; the shooting requirements on the user are reduced, and the tongue can be correctly detected at any position in the image.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
fig. 1 is a flowchart of a tongue region detection method 100 based on deep learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of cluster distance setting according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of determining a tongue detection model according to an embodiment of the invention; and
fig. 4 is a schematic structural diagram of a tongue region detection system 400 based on deep learning according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The invention may, however, be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that the disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
Fig. 1 is a flowchart of a tongue region detection method 100 based on deep learning according to an embodiment of the present invention. As shown in fig. 1, the method trains a tongue detection model with a deep convolutional network, so that tongue photographs acquired under different scenes, illumination, resolutions and image sizes can be judged, determining whether a tongue exists and the size and position of the tongue region. With only one detection target and a relatively simple convolutional structure of just 53 layers, real-time detection can be achieved while accuracy is guaranteed, which is very convenient for applications on all kinds of devices; the shooting requirements on the user are reduced, and the tongue can be correctly detected at any position in the image. The method 100 starts from step 101, in which the acquired image data set containing the tongue is labeled, and the labeled image data set is preprocessed to obtain a first image data set.
Preferably, the labeling of the acquired image dataset containing the tongue comprises:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, marking the position of the tongue with a rectangular frame.
Preferably, the preprocessing the annotated image data set to obtain the first image data set comprises:
selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set;
and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
Preferably, wherein the data enhancement processing comprises:
at least one of: horizontal flipping; clockwise and anticlockwise rotation within a preset angle-range threshold; translation by a preset proportion threshold in the vertical and horizontal directions; random cropping of pixels within the picture edge according to a preset cropping-proportion threshold; Gaussian filtering; and scaling.
In an embodiment of the invention, a picture data set A containing the tongue is acquired, the tongue region is labeled with the labeling tool LabelImg, and the position of the tongue is marked with a rectangular frame. Then, data are selected with a first preset quantity threshold of 40% for data enhancement, which includes horizontal flipping, rotation from 15° anticlockwise to 15° clockwise, translation of 1% to 10% up, down, left and right, random cropping of pixels within 20% of the picture edge, Gaussian filtering with filter sizes 3×3, 5×5, …, 17×17, scaling, and any combination of the above. Finally, the image is scaled so that its long edge becomes 416. If the scaled height is h and h is less than 416, pixel regions of height 208 − 0.5h and width 416 are filled above and below the image, with the pixel value fixed at 128; the coordinates of the labeling frame are adjusted synchronously. This determines the first image data set.
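The scale-and-pad step above (letterboxing to 416×416 with gray fill and synchronous box adjustment) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `letterbox` and the dependency-free nearest-neighbour resize are assumptions.

```python
import numpy as np

def letterbox(image, boxes, target=416, fill=128):
    """Scale so the long edge equals `target`, pad the short side with
    gray (128) pixels, and shift box coordinates accordingly.
    `image` is an HxWx3 uint8 array; `boxes` is an (N, 4) array of
    [x_min, y_min, x_max, y_max] pixel coordinates."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize keeps the sketch dependency-free.
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    canvas = np.full((target, target, 3), fill, dtype=np.uint8)
    # For h < 416 this pad is (416 - h') / 2, i.e. the 208 - 0.5h of the text.
    pad_y, pad_x = (target - new_h) // 2, (target - new_w) // 2
    canvas[pad_y:pad_y + new_h, pad_x:pad_x + new_w] = resized
    shifted = boxes * scale + np.array([pad_x, pad_y, pad_x, pad_y])
    return canvas, shifted
```

A 400×200 picture, for example, scales to 416×208 and receives 104 rows of gray padding above and below, with every box shifted down by 104 pixels.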
In step 102, the proportional sizes of the multiple fixed reference frames are set, and clustering is performed in a k-means clustering manner to obtain multiple clustering center reference frames.
In the embodiment of the invention, in order to detect tongues of different proportions and sizes in the picture, 9 fixed reference frames (anchors) of different proportions and sizes are set. Specifically, k-means clustering is adopted: the heights and widths of 9 labeling frames, as ratios to the height and width of the original image, are randomly selected from all labeled images as the initial cluster-centre anchors. The clustering distance is set as shown in fig. 2: the lengths and widths of B1 and B2 are the ratios of labeling-frame height and width to original-image height and width in different pictures; the centres of B1 and B2 are made to coincide, and the distance for k-means clustering is derived from the ratio of the intersection area of B1 and B2 to their union area. After multiple rounds of clustering, the 9 cluster-centre anchors are obtained.
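The anchor selection described above can be sketched as k-means over normalised (width, height) pairs. One hedge: the text names the intersection-over-union ratio as the distance basis; the sketch below uses the standard YOLOv2/v3 formulation d = 1 − IoU, which assigns each box to the anchor of maximum IoU. Function names are illustrative, not the patent's.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between centre-aligned boxes given only (w, h) pairs,
    normalised to the original image size. boxes: (N, 2); anchors: (K, 2)."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """k-means with distance d = 1 - IoU; returns k (w, h) cluster centres."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Minimising 1 - IoU is equivalent to maximising IoU.
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else anchors[j] for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors
```

Because widths and heights are stored as ratios to the original image, the resulting anchors transfer directly to the 416×416 network input.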
In step 103, training is performed based on the DarkNet network structure, an output layer dimension is determined according to the plurality of clustering center reference frames, and training is performed according to the first image data set to determine a first detection model.
Preferably, the training based on the DarkNet network structure, determining the output layer dimension according to the plurality of cluster center reference frames, and training according to the first image data set to determine the first detection model includes:
the model backbone network adopts DarkNet-53 and fuses three feature maps of different scales, 13×13, 26×26 and 52×52, for target prediction; each scale of feature map uses 3 fixed reference frames of different sizes, giving 9 fixed reference frames in total;
determining the output layer dimension to be 3 × (1 + 5) = 18, where 3 denotes the three frames predicted per position, 1 denotes the single prediction category, and 5 denotes the predicted target's centre coordinates, width and height, and target score;
and selecting data from the first image data set according to a second preset quantity threshold as the training set, taking the remaining data as the verification set, and training the model to determine the first detection model.
In the embodiment of the invention, the backbone of the training model adopts DarkNet-53, fusing feature maps at three different scales, 13×13, 26×26 and 52×52, for target prediction; each scale of feature map uses 3 anchors of different sizes, 9 anchors in total. Since there is only one target to detect, the output layer dimension is 3 × (1 + 5) = 18, where 3 denotes the three predicted bounding boxes, 1 denotes the single prediction category, and 5 denotes the predicted target's centre coordinates, width and height, and target score. Model training is then performed with 80% of the first image data set as the training set and the remaining 20% as the verification set, yielding the first detection model.
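The output-dimension arithmetic above generalises to A × (C + 5) channels per grid cell, where A is the number of anchors per scale and C the number of classes. A small sketch (the function name is illustrative):

```python
def head_dim(anchors_per_scale=3, num_classes=1):
    # Each anchor predicts 4 box values (centre x/y, width, height),
    # 1 objectness score, and one confidence per class.
    return anchors_per_scale * (num_classes + 4 + 1)

# Single-class tongue detector (first model): 3 * (1 + 5) = 18
print(head_dim(3, 1))                     # 18
# After the false-detection background class is added: 3 * (2 + 5) = 21
print(head_dim(3, 2))                     # 21
# Grid sizes for a 416x416 input at strides 32, 16, 8:
print([416 // s for s in (32, 16, 8)])    # [13, 26, 52]
```

This makes explicit why the 13×13, 26×26 and 52×52 maps all carry 18 channels in the first model and 21 after retraining.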
In step 104, an image data set containing no tongue is detected with the first detection model to obtain a false-detection image data set.
In step 105, the network structure of the first detection model is adjusted, the output layer dimension is modified, the model is initialized by using the parameters of the first detection model except the last layer, and the tongue detection model is determined by using the first image data set and the false detection image data set for detection of the tongue region.
The first detection model obtained by the embodiment of the invention has a high target-detection recall rate, but also a high false-detection rate on image data containing no tongue. Therefore, after normal image data containing no tongue are acquired, this data set is detected with the first detection model, the data falsely detected by the model are collected, and the falsely detected regions are set as a second target class for target detection.
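The false-detection mining described above amounts to a simple filtering loop, sketched below. `detect` is a hypothetical callable (not from the patent) standing in for a forward pass that returns (box, score) pairs for one image:

```python
def collect_false_detections(model, background_images, detect, conf_thresh=0.5):
    """Run the first detection model over images known to contain no
    tongue; every detection above the confidence threshold is by
    definition a false positive, and is collected as a training sample
    for the second (background) target class."""
    hard_negatives = []
    for image in background_images:
        false_boxes = [(box, score) for box, score in detect(model, image)
                       if score >= conf_thresh]
        if false_boxes:
            hard_negatives.append((image, false_boxes))
    return hard_negatives
```

The confidence threshold is an assumption; in practice it would match whatever threshold the deployed detector uses, so that exactly the detections a user would see are mined as hard negatives.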
Then, the network structure of the first detection model is adjusted: the output layer dimension is changed to 3 × (2 + 5) = 21 while the other structures are kept unchanged; a new model is initialized with the parameters of the first detection model except for the last layer; the false-detection data are added as training data; and the model is retrained to obtain the final tongue detection model.
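Initializing the new model from the first model's parameters, except for the last layer, can be sketched framework-agnostically with a name-to-array mapping. The `"head"` key, the shapes, and the random re-initialization scale are assumptions for illustration, not the patent's actual layer names or values:

```python
import numpy as np

def init_from_first_model(first_params, new_head_shape, head_key="head", seed=0):
    """Copy every parameter of the first detection model except the final
    detection layer, which is re-created with the enlarged output size
    3 * (2 + 5) = 21 for the added background class."""
    rng = np.random.default_rng(seed)
    new_params = {}
    for name, value in first_params.items():
        if name == head_key:
            # Last layer: output channels change from 18 to 21, so the
            # old weights cannot be reused and are re-initialized.
            new_params[name] = rng.normal(0.0, 0.01, size=new_head_shape)
        else:
            # Backbone and intermediate layers: transferred unchanged.
            new_params[name] = value.copy()
    return new_params
```

Transferring everything but the head lets the retraining start from features that already localise tongues well, so only the enlarged classifier needs to learn from scratch.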
FIG. 3 is a schematic diagram of determining a tongue detection model according to an embodiment of the invention. As shown in fig. 3, the process of determining the tongue detection model includes:
(1) a picture data set a containing the tongue and a normal data set B not containing the tongue are acquired.
(2) Data annotation. Data set A is labeled: the tongue region is labeled with the labeling tool LabelImg, and the position of the tongue is marked with a rectangular frame.
(3) Data enhancement. About 40% of the pictures in data set A undergo data enhancement processing to expand the existing data set.
(4) Data scaling. The image is scaled so that its long edge becomes 416. If the scaled height is h and h is less than 416, pixel regions of height 208 − 0.5h and width 416 are filled above and below the image, the pixel value is set to 128, and the coordinates of the labeling frame are adjusted synchronously.
(5) Model construction and training. In order to detect tongues of different proportions and sizes in the picture, 9 anchors of different proportions and sizes are set, and k-means clustering determines the 9 cluster-centre anchors. The backbone adopts DarkNet-53, fusing feature maps at three scales, 13×13, 26×26 and 52×52, for target prediction, with 3 anchors of different sizes per scale, 9 in total. Since there is only one target to detect, the output layer dimension is 3 × (1 + 5) = 18, where 3 denotes the three predicted bounding boxes, 1 denotes the single prediction category, and 5 denotes the predicted target's centre coordinates, width and height, and target score. Model training is performed using 80% of data set A as the training set and 10% as the validation set, yielding model M0.
(6) False-detection data determination. M0 has a high false-detection rate on the non-tongue data set B; the data falsely detected by M0 in data set B are collected, and the falsely detected regions are taken as the second target class for target detection.
(7) The network structure of M0 is modified: the output layer dimension is changed to 3 × (2 + 5) = 21, the other structures are left unchanged, and the new model is initialized with the parameters of M0 except for the last layer. The M0 false-detection data from data set B are added as training data, and the model is retrained to obtain model M1.
According to the embodiment of the invention, the target to be detected is first defined and the basic network model M0 is trained; the regions falsely detected by M0 are then used as a background target to fine-tune the model, iteratively generating a new model. The presence and position of the tongue in a picture are judged intelligently by the deep convolutional network, so real-time detection can be achieved, which is very convenient for use on all kinds of devices.
Fig. 4 is a schematic structural diagram of a tongue region detection system 400 based on deep learning according to an embodiment of the present invention. As shown in fig. 4, a tongue region detection system 400 based on deep learning according to an embodiment of the present invention includes: data processing section 401, clustering section 402, first detection model determining section 403, false detection data acquiring section 404, and tongue detection model determining section 405.
Preferably, the data processing unit 401 is configured to label the acquired image data set including the tongue portion, and pre-process the labeled image data set to acquire the first image data set.
Preferably, the data processing unit 401, labeling the acquired image data set containing the tongue, includes: labeling the tongue region in the acquired image data set with the labeling tool LabelImg, marking the position of the tongue with a rectangular frame.
Preferably, the data processing unit 401, performing preprocessing on the annotated image data set to obtain a first image data set, includes: selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set; and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
Preferably, the data enhancement processing comprises at least one of: horizontal flipping; clockwise or anticlockwise rotation within a preset angle range threshold; translation by a preset proportion threshold in the vertical and horizontal directions; random cropping of the picture within its edges according to a preset cropping proportion threshold; Gaussian filtering; and scaling.
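As one concrete example from the enhancement list above, a horizontal flip must mirror the labeling box along with the pixels; a minimal sketch (the function name is illustrative):

```python
# One concrete example from the enhancement list above: a horizontal flip must
# mirror the labeling box along with the pixels, or the annotation no longer
# matches the image. Function name is illustrative.
import numpy as np

def hflip_with_box(image, box):
    """Flip an HxWxC image left-right and mirror an (xmin, ymin, xmax, ymax) box."""
    w = image.shape[1]
    flipped = image[:, ::-1]
    xmin, ymin, xmax, ymax = box
    return flipped, (w - xmax, ymin, w - xmin, ymax)

img = np.zeros((480, 640, 3), dtype=np.uint8)
_, new_box = hflip_with_box(img, (100, 50, 300, 200))
print(new_box)  # (340, 50, 540, 200)
```

The rotation, translation, crop, Gaussian filter and scaling transforms would adjust the box coordinates in the same synchronized way.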
Preferably, the clustering unit 402 is configured to set proportional sizes of multiple fixed reference frames, and perform clustering by using a k-means clustering method to obtain multiple clustering center reference frames.
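The k-means step is commonly run on box widths and heights with a 1 - IoU distance rather than Euclidean distance, as in the YOLO family; the patent only names k-means, so this is an assumption. A sketch with made-up box sizes:

```python
# Sketch of clustering fixed reference-frame sizes with k-means, using the
# 1 - IoU distance between (width, height) pairs that the YOLO family uses
# (an assumption; the patent only names k-means). Box sizes are made up.
import numpy as np

def iou_wh(boxes, centers):
    """IoU between (w, h) shapes, treating boxes as sharing a corner."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centers[:, 0] * centers[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Nearest centre under 1 - IoU distance = highest IoU.
        assign = np.argmax(iou_wh(boxes, centers), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers

boxes = np.array([[30, 40], [32, 42], [100, 120], [98, 118], [200, 260], [210, 250]], float)
anchors = kmeans_anchors(boxes, k=3)
print(sorted(anchors.tolist()))  # three (w, h) cluster-centre reference frames
```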
Preferably, the first detection model determining unit 403 is configured to perform training based on a DarkNet network structure, determine an output layer dimension according to the plurality of clustering center reference frames, and perform training according to the first image data set to determine a first detection model.
Preferably, the training of the first detection model determining unit 403 based on the DarkNet network structure, determining the output layer dimension according to the plurality of cluster center reference frames, and training according to the first image data set to determine the first detection model includes:
the model backbone network adopts DarkNet-53, fusing three feature maps of different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction, with each scale using 3 fixed reference frames of different sizes, for 9 fixed reference frames in total;
determining the output layer dimension to be 3 × (1 + 5) = 18, wherein 3 denotes the three reference frames used for prediction at each scale, 1 denotes the single prediction category, and 5 denotes the center coordinates, the width and height, and the target score of the predicted target;
and selecting training data of a second preset quantity threshold from the first image data set, using the remaining data as a verification set, and performing model training to determine the first detection model.
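The output-layer arithmetic above can be written out directly; the three scale sizes assume a 416 × 416 input:

```python
# The output-layer arithmetic written out: with 3 reference frames per scale,
# 1 category and 5 box terms (center x, center y, width, height, target
# score), each grid cell predicts 3 * (1 + 5) = 18 values, and the three
# scales give 13x13x18, 26x26x18 and 52x52x18 tensors for a 416x416 input.
def output_layer_dims(anchors_per_scale=3, num_classes=1, scales=(13, 26, 52)):
    depth = anchors_per_scale * (num_classes + 5)
    return depth, [(s, s, depth) for s in scales]

depth, shapes = output_layer_dims()
print(depth)   # 18
print(shapes)  # [(13, 13, 18), (26, 26, 18), (52, 52, 18)]
```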
Preferably, the false detection data acquiring unit 404 is configured to detect an image data set that does not contain a tongue by using the first detection model, so as to acquire a false detection image data set.
Preferably, the tongue detection model determining unit 405 is configured to adjust the network structure of the first detection model, modify the output layer dimension, initialize the model with the parameters of the first detection model except the last layer, and retrain with the first image data set and the false detection image data set to determine a tongue detection model for detecting the tongue region.
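The warm start described here, copying every parameter except the re-shaped last layer, can be sketched framework-agnostically; treating the false-detection background as a second category would give a head depth of 3 × (2 + 5) = 21. All names and shapes below are illustrative:

```python
# Framework-agnostic sketch of the warm start described above: copy every
# parameter of the first model into the adjusted model except the last
# layer, which changed shape and keeps its fresh initialization. Treating
# the false-detection background as a second category gives a head depth of
# 3 * (2 + 5) = 21. Layer names and shapes are illustrative.
import numpy as np

def warm_start(old_params, new_params, last_layer="head"):
    """Return new_params with all layers but `last_layer` taken from old_params."""
    merged = dict(new_params)          # keep the re-initialized head
    for name, value in old_params.items():
        if name != last_layer:
            merged[name] = value       # reuse trained weights elsewhere
    return merged

rng = np.random.default_rng(0)
old = {"backbone": rng.normal(size=(4, 4)), "head": rng.normal(size=(18,))}
new = {"backbone": np.zeros((4, 4)), "head": np.zeros((21,))}
merged = warm_start(old, new)
print(merged["backbone"] is old["backbone"], merged["head"].shape)  # True (21,)
```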
The tongue region detection system 400 based on deep learning according to the embodiment of the present invention corresponds to the tongue region detection method 100 based on deep learning according to another embodiment of the present invention, and is not described herein again.
The invention has been described with reference to a few embodiments. However, as would be apparent to a person skilled in the art from the appended patent claims, other embodiments than those disclosed above are equally possible within the scope of the invention.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [ device, component, etc ]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the embodiments without departing from the spirit and scope of the invention, which is to be covered by the claims.
Claims (10)
1. A tongue region detection method based on deep learning is characterized by comprising the following steps:
labeling the acquired image data set containing the tongue part, and preprocessing the labeled image data set to acquire a first image data set;
setting the proportional sizes of various fixed reference frames, and clustering by adopting a k-means clustering mode to obtain a plurality of clustering center reference frames;
training based on a DarkNet network structure, determining the dimension of an output layer according to the plurality of clustering center reference frames, and training according to the first image data set to determine a first detection model;
detecting an image data set that does not contain a tongue by using the first detection model to obtain a false detection image data set;
and adjusting the network structure of the first detection model, modifying the dimensionality of an output layer, initializing the model by using the parameters of the first detection model except the last layer, and retraining by using the first image data set and the false detection image data set to determine a tongue detection model for detecting a tongue region.
2. The method of claim 1, wherein said labeling the acquired image dataset containing the tongue comprises:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, and marking the position of the tongue in the form of a rectangular box.
3. The method of claim 1, wherein pre-processing the annotated image dataset to obtain the first image dataset comprises:
selecting data from the labeled image data set according to a first preset quantity threshold and performing data enhancement on it to obtain an expanded data set;
and scaling the labeled image data set and the expanded data set in equal proportion to a preset image size with padding, while synchronously adjusting the coordinates of the labeling boxes in the images, to obtain the first image data set.
4. The method of claim 3, wherein the data enhancement process comprises:
at least one of: horizontal flipping; clockwise or anticlockwise rotation within a preset angle range threshold; translation by a preset proportion threshold in the vertical and horizontal directions; random cropping of the picture within its edges according to a preset cropping proportion threshold; Gaussian filtering; and scaling.
5. The method of claim 1, wherein the training based on a DarkNet network structure, determining output layer dimensions from the plurality of cluster center reference boxes, and training from the first image dataset to determine a first detection model comprises:
the model backbone network adopts DarkNet-53, fusing three feature maps of different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction, with each scale using 3 fixed reference frames of different sizes, for 9 fixed reference frames in total;
determining the output layer dimension to be 3 × (1 + 5) = 18, wherein 3 denotes the three reference frames used for prediction at each scale, 1 denotes the single prediction category, and 5 denotes the center coordinates, the width and height, and the target score of the predicted target;
and selecting training data of a second preset quantity threshold from the first image data set, using the remaining data as a verification set, and performing model training to determine the first detection model.
6. A deep learning based tongue region detection system, the system comprising:
the data processing unit is used for labeling the acquired image data set containing the tongue part and preprocessing the labeled image data set to acquire a first image data set;
the clustering unit is used for setting the proportional sizes of various fixed reference frames and clustering by adopting a k-means clustering mode to obtain a plurality of clustering center reference frames;
the first detection model determining unit is used for training based on a DarkNet network structure, determining the dimensionality of an output layer according to the multiple clustering center reference frames, and training according to the first image data set to determine a first detection model;
the false detection data acquisition unit is used for detecting the image dataset which does not contain the tongue part by using the first detection model so as to acquire a false detection image dataset for false detection;
and the tongue detection model determining unit is used for adjusting the network structure of the first detection model, modifying the dimension of an output layer, initializing the model by using the parameters of the first detection model except the last layer, and retraining by using the first image data set and the false detection image data set to determine a tongue detection model for detecting a tongue region.
7. The system of claim 6, wherein the data processing unit, labeling the acquired image dataset containing the tongue, comprises:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, and marking the position of the tongue in the form of a rectangular box.
8. The system of claim 6, wherein the data processing unit pre-processes the annotated image dataset to obtain the first image dataset, comprising:
selecting data from the labeled image data set according to a first preset quantity threshold and performing data enhancement on it to obtain an expanded data set;
and scaling the labeled image data set and the expanded data set in equal proportion to a preset image size with padding, while synchronously adjusting the coordinates of the labeling boxes in the images, to obtain the first image data set.
9. The system of claim 8, wherein the data enhancement process comprises:
at least one of: horizontal flipping; clockwise or anticlockwise rotation within a preset angle range threshold; translation by a preset proportion threshold in the vertical and horizontal directions; random cropping of the picture within its edges according to a preset cropping proportion threshold; Gaussian filtering; and scaling.
10. The system of claim 6, wherein the first detection model determination unit, trained based on a DarkNet network structure, determines output layer dimensions from the plurality of cluster center reference boxes, and is trained from the first image dataset to determine a first detection model, comprises:
the model backbone network adopts DarkNet-53, fusing three feature maps of different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction, with each scale using 3 fixed reference frames of different sizes, for 9 fixed reference frames in total;
determining the output layer dimension to be 3 × (1 + 5) = 18, wherein 3 denotes the three reference frames used for prediction at each scale, 1 denotes the single prediction category, and 5 denotes the center coordinates, the width and height, and the target score of the predicted target;
and selecting training data of a second preset quantity threshold from the first image data set, using the remaining data as a verification set, and performing model training to determine the first detection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010017676.7A CN111260608A (en) | 2020-01-08 | 2020-01-08 | Tongue region detection method and system based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111260608A true CN111260608A (en) | 2020-06-09 |
Family
ID=70954122
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010017676.7A Pending CN111260608A (en) | 2020-01-08 | 2020-01-08 | Tongue region detection method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111260608A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149684A (en) * | 2020-08-19 | 2020-12-29 | 北京豆牛网络科技有限公司 | Image processing method and image preprocessing method for target detection |
CN114445682A (en) * | 2022-01-28 | 2022-05-06 | 北京百度网讯科技有限公司 | Method, device, electronic equipment, storage medium and product for training model |
CN115512241A (en) * | 2022-11-01 | 2022-12-23 | 中国科学院半导体研究所 | Data enhancement method and device for target detection in large-scale sparse sample remote sensing images |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108470138A (en) * | 2018-01-24 | 2018-08-31 | 博云视觉(北京)科技有限公司 | Method for target detection and device |
CN108960076A (en) * | 2018-06-08 | 2018-12-07 | 东南大学 | Ear recognition and tracking based on convolutional neural networks |
CN109063594A (en) * | 2018-07-13 | 2018-12-21 | 吉林大学 | Remote sensing images fast target detection method based on YOLOv2 |
CN109522963A (en) * | 2018-11-26 | 2019-03-26 | 北京电子工程总体研究所 | A kind of the feature building object detection method and system of single-unit operation |
CN110210391A (en) * | 2019-05-31 | 2019-09-06 | 合肥云诊信息科技有限公司 | Tongue picture grain quantitative analysis method based on multiple dimensioned convolutional neural networks |
CN110263660A (en) * | 2019-05-27 | 2019-09-20 | 魏运 | A kind of traffic target detection recognition method of adaptive scene changes |
CN110490073A (en) * | 2019-07-15 | 2019-11-22 | 浙江省北大信息技术高等研究院 | Object detection method, device, equipment and storage medium |
WO2019233297A1 (en) * | 2018-06-08 | 2019-12-12 | Oppo广东移动通信有限公司 | Data set construction method, mobile terminal and readable storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200609 |