
CN111260608A - Tongue region detection method and system based on deep learning


Info

Publication number
CN111260608A
Authority
CN
China
Prior art keywords
data set
tongue
image data
processing
image
Prior art date
Legal status
Pending
Application number
CN202010017676.7A
Other languages
Chinese (zh)
Inventor
杨强
柴胜
刘华根
何韦澄
王玉鑫
Current Assignee
Laikang Technology Co Ltd
Original Assignee
Laikang Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Laikang Technology Co Ltd filed Critical Laikang Technology Co Ltd
Priority to CN202010017676.7A
Publication of CN111260608A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tongue region detection method and system based on deep learning. The method comprises the following steps: labeling an acquired image data set containing the tongue, and preprocessing the labeled image data set to obtain a first image data set; setting the proportional sizes of multiple fixed reference frames, and clustering them by k-means to obtain a plurality of cluster-center reference frames; training based on a DarkNet network structure, determining the output layer dimension according to the plurality of cluster-center reference frames, and training on the first image data set to determine a first detection model; detecting an image data set that does not contain the tongue with the first detection model, to obtain a data set of falsely detected images; and adjusting the network structure of the first detection model, modifying the output layer dimension, and retraining with the first image data set and the false-detection image data set to determine a tongue detection model for detecting the tongue region.

Description

Tongue region detection method and system based on deep learning
Technical Field
The present invention relates to the technical field of deep learning algorithms, and more particularly, to a tongue region detection method and system based on deep learning.
Background
At present, many tongue diagnosis algorithms based on traditional Chinese medicine theory require the subject, during tongue diagnosis analysis, to extend the tongue at a fixed distance and within a fixed area by means of a fixing device or apparatus; the pixels inside that fixed area of the picture are then analyzed. In the real world, however, different people extend the tongue in different poses and to different extents, so with a fixed area the background pixels may outnumber the tongue-region pixels, which greatly affects the actual tongue diagnosis analysis.
Furthermore, the application scenarios of the prior art are very limited. The main problems include: 1. heavy requirements are imposed on how the user photographs the tongue, since the user must extend the tongue inside a prompt box on the shooting interface, which is very inconvenient and restricts the application scenarios; 2. the prompt-box area is used, very roughly, as the true tongue area for tongue diagnosis analysis, while in practice the background area may occupy a very large pixel proportion, which can compromise the accuracy of the analysis.
Therefore, there is a need for a tongue region detection method to accurately and intelligently determine the tongue region.
Disclosure of Invention
The invention provides a tongue region detection method and system based on deep learning, aiming to solve the problem of accurately and intelligently determining the tongue region.
In order to solve the above problem, according to an aspect of the present invention, there is provided a tongue region detection method based on deep learning, the method including:
labeling the acquired image data set containing the tongue part, and preprocessing the labeled image data set to acquire a first image data set;
setting the proportional sizes of various fixed reference frames, and clustering by adopting a k-means clustering mode to obtain a plurality of clustering center reference frames;
training based on a DarkNet network structure, determining the dimension of an output layer according to the plurality of clustering center reference frames, and training according to the first image data set to determine a first detection model;
detecting an image data set that does not contain the tongue by using the first detection model, to obtain a data set of falsely detected images;
and adjusting the network structure of the first detection model, modifying the dimensionality of an output layer, initializing the model by using the parameters of the first detection model except the last layer, and retraining by using the first image data set and the false detection image data set to determine a tongue detection model for detecting a tongue region.
Preferably, the labeling of the acquired image dataset containing the tongue comprises:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, and marking the position of the tongue in the form of a rectangular frame.
Preferably, the preprocessing the annotated image data set to obtain the first image data set comprises:
selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set;
and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
Preferably, wherein the data enhancement processing comprises:
at least one of: horizontal flipping processing; clockwise and anticlockwise rotation processing within a preset angle range threshold; translation processing by a preset proportion threshold in the vertical and horizontal directions; random cropping of pixels within the picture edge according to a preset cropping proportion threshold; Gaussian filtering processing; and scaling processing.
Preferably, the training based on the DarkNet network structure, determining the output layer dimension according to the plurality of cluster center reference frames, and training according to the first image data set to determine the first detection model includes:
the model backbone network adopts DarkNet-53 and fuses feature maps at three different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction, with each scale using 3 fixed reference frames of different sizes, thereby determining 9 fixed reference frames;
determining the output layer dimension to be 3 × (1 + 5) = 18, where 3 denotes the three reference frames predicted per cell, 1 denotes the single prediction category, and 5 denotes the predicted target's center coordinates, width and height, and target score;
and selecting data from the first image data set according to a second preset quantity threshold as a training data set, and performing model training with the remaining data as a validation set, to determine the first detection model.
According to another aspect of the present invention, there is provided a tongue region detection system based on deep learning, the system comprising:
the data processing unit is used for labeling the acquired image data set containing the tongue part and preprocessing the labeled image data set to acquire a first image data set;
the clustering unit is used for setting the proportional sizes of various fixed reference frames and clustering by adopting a k-means clustering mode to obtain a plurality of clustering center reference frames;
the first detection model determining unit is used for training based on a DarkNet network structure, determining the dimensionality of an output layer according to the multiple clustering center reference frames, and training according to the first image data set to determine a first detection model;
the false detection data acquisition unit is used for detecting an image data set which does not contain the tongue by using the first detection model, to obtain a data set of falsely detected images;
and the tongue detection model determining unit is used for adjusting the network structure of the first detection model, modifying the dimension of an output layer, initializing the model by using the parameters of the first detection model except the last layer, and retraining by using the first image data set and the false detection image data set to determine a tongue detection model for detecting a tongue region.
Preferably, the data processing unit labeling the acquired image data set including the tongue portion includes:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, and marking the position of the tongue in the form of a rectangular frame.
Preferably, the data processing unit, which pre-processes the annotated image data set to obtain the first image data set, comprises:
selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set;
and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
Preferably, wherein the data enhancement processing comprises:
at least one of: horizontal flipping processing; clockwise and anticlockwise rotation processing within a preset angle range threshold; translation processing by a preset proportion threshold in the vertical and horizontal directions; random cropping of pixels within the picture edge according to a preset cropping proportion threshold; Gaussian filtering processing; and scaling processing.
Preferably, the training of the first detection model determining unit based on the DarkNet network structure, the determination of the output layer dimension according to the plurality of cluster center reference frames, and the training of the first image data set to determine the first detection model includes:
the model backbone network adopts DarkNet-53 and fuses feature maps at three different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction, with each scale using 3 fixed reference frames of different sizes, thereby determining 9 fixed reference frames;
determining the output layer dimension to be 3 × (1 + 5) = 18, where 3 denotes the three reference frames predicted per cell, 1 denotes the single prediction category, and 5 denotes the predicted target's center coordinates, width and height, and target score;
and selecting data from the first image data set according to a second preset quantity threshold as a training data set, and performing model training with the remaining data as a validation set, to determine the first detection model.
The invention provides a tongue region detection method and system based on deep learning, in which a tongue detection model is trained with a deep convolutional network so that tongue pictures collected under different scenes, illumination, pixel counts and image sizes can all be judged, determining whether a tongue is present and the size and position of the tongue region. With only one detection target and a relatively simple convolution structure of just 53 network layers, real-time detection can be achieved while accuracy is guaranteed, which is very convenient for applications on all kinds of devices; the shooting requirements on the user are reduced, and the tongue can be correctly detected at any position in the image.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
fig. 1 is a flowchart of a tongue region detection method 100 based on deep learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of cluster distance setting according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of determining a tongue detection model according to an embodiment of the invention; and
fig. 4 is a schematic structural diagram of a tongue region detection system 400 based on deep learning according to an embodiment of the present invention.
Detailed Description
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings; however, the present invention may be embodied in many different forms and is not limited to the embodiments described herein, which are provided for a full and complete disclosure of the present invention and to fully convey the scope of the present invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
Fig. 1 is a flowchart of a tongue region detection method 100 based on deep learning according to an embodiment of the present invention. As shown in fig. 1, the method trains a tongue detection model with a deep convolutional network, so that tongue photographs acquired under different scenes, illumination, pixel counts and image sizes can all be judged, determining whether a tongue is present and the size and position of the tongue region. With only one detection target and a relatively simple convolution structure of just 53 network layers, real-time detection can be achieved while accuracy is guaranteed, which is very convenient for applications on all kinds of devices; the shooting requirements on the user are reduced, and the tongue can be correctly detected at any position in the image. The method 100 starts from step 101, in which the acquired image data set containing the tongue is labeled, and the labeled image data set is preprocessed to obtain a first image data set.
Preferably, the labeling of the acquired image dataset containing the tongue comprises:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, and marking the position of the tongue in the form of a rectangular frame.
Preferably, the preprocessing the annotated image data set to obtain the first image data set comprises:
selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set;
and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
Preferably, wherein the data enhancement processing comprises:
at least one of: horizontal flipping processing; clockwise and anticlockwise rotation processing within a preset angle range threshold; translation processing by a preset proportion threshold in the vertical and horizontal directions; random cropping of pixels within the picture edge according to a preset cropping proportion threshold; Gaussian filtering processing; and scaling processing.
In an embodiment of the invention, a picture data set A containing the tongue is acquired, the tongue regions in the data set are labeled with the labeling tool LabelImg, and the position of the tongue is marked as a rectangular frame. Then, data are selected according to a first preset quantity threshold of 40% for data enhancement, which includes horizontal flipping, rotation from 15 degrees anticlockwise to 15 degrees clockwise, translation by 1% to 10% upward, downward, leftward and rightward, random cropping of pixels within 20% of the picture edge, Gaussian filtering with filter sizes 3×3, 5×5, ..., 17×17, scaling, and any combination of the above. Finally, each image is scaled so that its long edge becomes 416. If the scaled height h is less than 416, pixel regions of height 208 − 0.5h and width 416 are filled at the top and bottom of the image, with all pixel values set to the fixed value 128, and the coordinate positions of the annotation frames are adjusted synchronously, finally determining the first image data set.
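To make the preprocessing concrete, the sketch below implements the scaling-and-padding (letterbox) step with OpenCV and NumPy. The function name and the corner-box annotation format are assumptions of this sketch; the patent itself only fixes the 416 long edge, the padding value 128, and the synchronous adjustment of the annotation frames.

```python
import cv2
import numpy as np

def letterbox_to_416(image, boxes):
    """Scale so the long edge becomes 416, pad the short side to 416 with
    gray (value 128), and shift the annotation boxes accordingly.

    image: HxWx3 uint8 array; boxes: (N, 4) array of [x1, y1, x2, y2] in
    pixel coordinates (an assumed annotation format)."""
    h, w = image.shape[:2]
    scale = 416.0 / max(h, w)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(image, (new_w, new_h))

    # Symmetric padding: when the scaled height is less than 416 this fills
    # strips of height 208 - 0.5*h and width 416 at the top and bottom.
    pad_left = (416 - new_w) // 2
    pad_top = (416 - new_h) // 2
    canvas = np.full((416, 416, 3), 128, dtype=np.uint8)
    canvas[pad_top:pad_top + new_h, pad_left:pad_left + new_w] = resized

    # Adjust the annotation frames by the same scale and offsets.
    boxes = boxes.astype(np.float32) * scale
    boxes[:, [0, 2]] += pad_left
    boxes[:, [1, 3]] += pad_top
    return canvas, boxes
```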
In step 102, the proportional sizes of the multiple fixed reference frames are set, and clustering is performed in a k-means clustering manner to obtain multiple clustering center reference frames.
In the embodiment of the invention, in order to detect tongues of different proportions and sizes in the picture, 9 fixed reference frames (anchors) of different proportional sizes are set. Specifically, k-means clustering is adopted: the height-and-width ratios of 9 annotation frames relative to the original images are randomly selected from all the labeled images as initial cluster-center anchors. The cluster distance is set as shown in fig. 2, where the lengths and widths of B1 and B2 are the ratios of the annotation frame's height and width to the original image's height and width in different pictures. The centers of B1 and B2 are aligned, and one minus the ratio of their intersection area to their union area (1 − IoU) is taken as the k-means clustering distance. After multiple rounds of clustering, the 9 cluster-center anchors are obtained.
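A minimal sketch of this anchor clustering is given below, assuming the annotation boxes have already been converted to (width, height) ratios relative to their original images, and using the standard YOLO-style distance d = 1 − IoU between center-aligned boxes.

```python
import numpy as np

def iou_wh(box, centers):
    """IoU between one (w, h) box and k (w, h) centers, all center-aligned."""
    inter = np.minimum(box[0], centers[:, 0]) * np.minimum(box[1], centers[:, 1])
    union = box[0] * box[1] + centers[:, 0] * centers[:, 1] - inter
    return inter / union

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster normalized (w, h) annotation-box ratios into k anchors,
    using d = 1 - IoU of center-aligned boxes as the clustering distance."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to the center with the largest IoU (smallest distance).
        assign = np.array([np.argmax(iou_wh(b, centers)) for b in wh])
        # Recompute centers; keep the old center if a cluster goes empty.
        new_centers = np.array([
            wh[assign == i].mean(axis=0) if np.any(assign == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers
```

In the usual YOLOv3 arrangement, the 9 resulting anchors are sorted by area and assigned three per scale, the largest to the coarsest 13 × 13 map; the patent states only that each scale uses 3 of the 9 anchors.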
In step 103, training is performed based on the DarkNet network structure, an output layer dimension is determined according to the plurality of clustering center reference frames, and training is performed according to the first image data set to determine a first detection model.
Preferably, the training based on the DarkNet network structure, determining the output layer dimension according to the plurality of cluster center reference frames, and training according to the first image data set to determine the first detection model includes:
the model backbone network adopts DarkNet-53 and fuses feature maps at three different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction, with each scale using 3 fixed reference frames of different sizes, thereby determining 9 fixed reference frames;
determining the output layer dimension to be 3 × (1 + 5) = 18, where 3 denotes the three reference frames predicted per cell, 1 denotes the single prediction category, and 5 denotes the predicted target's center coordinates, width and height, and target score;
and selecting data from the first image data set according to a second preset quantity threshold as a training data set, and performing model training with the remaining data as a validation set, to determine the first detection model.
In the embodiment of the invention, the backbone of the training model adopts DarkNet-53 and fuses feature maps at three different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction; each scale uses 3 anchors of different sizes, for 9 anchors in total. Since there is only one target class to detect, the output layer dimension is 3 × (1 + 5) = 18, where 3 denotes the three bounding boxes predicted per cell, 1 denotes the single prediction category, and 5 denotes the predicted target's center coordinates, width and height, and target score. Model training is then performed using 80% of the first image data set as the training set and the remaining 20% as the validation set, resulting in the first detection model.
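The output-layer arithmetic can be checked with a few lines; the (N, C, H, W) tensor layout below is an assumption of this sketch, not something the patent specifies.

```python
# Per-cell output channels for the first (single-class) detection model.
NUM_ANCHORS_PER_SCALE = 3   # 3 fixed reference frames per feature-map scale
NUM_CLASSES = 1             # tongue is the only category
BOX_PARAMS = 5              # center x, center y, width, height, target score

channels = NUM_ANCHORS_PER_SCALE * (NUM_CLASSES + BOX_PARAMS)  # 3 * (1 + 5) = 18

# Prediction tensor shapes at the three fused scales, for a batch of N images.
for grid in (13, 26, 52):
    print(f"{grid}x{grid} head: (N, {channels}, {grid}, {grid})")
```

For the retrained two-class model described below, setting NUM_CLASSES = 2 gives 3 × (2 + 5) = 21 channels.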
In step 104, an image data set that does not contain the tongue is detected using the first detection model, to obtain a data set of falsely detected images.
In step 105, the network structure of the first detection model is adjusted, the output layer dimension is modified, the model is initialized with the parameters of the first detection model except the last layer, and the model is retrained using the first image data set and the false-detection image data set to determine the tongue detection model for tongue region detection.
The first detection model obtained in this way has a high recall rate for the target, but also a high false detection rate on image data that contain no tongue. Therefore, after normal image data not containing the tongue are acquired, this data set is run through the first detection model, the data falsely detected by the model are collected, and the falsely detected regions are set as a second object class for detection.
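A sketch of this false-detection collection step is shown below; the `model.detect` interface, its result format, and the 0.5 confidence threshold are all assumptions of the sketch, as the patent does not define the detector's API.

```python
def collect_false_detections(model, no_tongue_images, conf_thresh=0.5):
    """Run the first detection model over images known to contain no tongue;
    every detection above the threshold is by definition a false positive
    and is labeled as the second object class (class index 1) for retraining."""
    hard_negatives = []
    for path, image in no_tongue_images:
        # Assumed interface: detect() returns [(x1, y1, x2, y2, score), ...]
        detections = model.detect(image)
        false_boxes = [d for d in detections if d[4] >= conf_thresh]
        if false_boxes:
            hard_negatives.append((path, [(d[:4], 1) for d in false_boxes]))
    return hard_negatives
```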
Then, the network structure of the first detection model is adjusted by changing the output layer dimension to 3 × (2 + 5) = 21 while keeping the other structures unchanged; the new model is initialized with the parameters of the first detection model except the last layer, the false detection data are added as training data, and the model is retrained to obtain the final tongue detection model.
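One way to realize "initialize with the parameters of the first model except the last layer" is to copy every parameter whose name and shape still match, which automatically skips the output layer that grew from 18 to 21 channels. The PyTorch sketch below assumes the checkpoint is a plain state dict; the patent does not prescribe a framework.

```python
import torch

def init_from_first_model(new_model, first_model_ckpt):
    """Initialize the two-class model from the one-class model, skipping the
    final output layer whose channel count changed from 18 to 21."""
    old_state = torch.load(first_model_ckpt, map_location="cpu")
    new_state = new_model.state_dict()
    # Keep only parameters whose name and shape both still match; the
    # reshaped last layer fails the shape test and keeps its fresh init.
    transferred = {k: v for k, v in old_state.items()
                   if k in new_state and v.shape == new_state[k].shape}
    new_state.update(transferred)
    new_model.load_state_dict(new_state)
    return new_model
```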
FIG. 3 is a schematic diagram of determining a tongue detection model according to an embodiment of the invention. As shown in fig. 3, the process of determining the tongue detection model includes:
(1) A picture data set A containing the tongue and a normal data set B not containing the tongue are acquired.
(2) Data annotation. Data set A is labeled: the tongue regions are labeled with the labeling tool LabelImg, and the position of the tongue is marked as a rectangular frame.
(3) Data enhancement. Data enhancement processing is applied to about 40% of the pictures in data set A to expand the existing data set.
(4) Data scaling. Each image is scaled so that its long edge becomes 416. If the scaled height h is less than 416, pixel regions of height 208 − 0.5h and width 416 are filled at the top and bottom of the image, the pixel values are set to 128, and the coordinate positions of the annotation frames are adjusted synchronously.
(5) Model construction and training. In order to detect tongues of different proportions and sizes in the picture, 9 anchors of different proportional sizes are set, determined as 9 cluster centers by k-means clustering. The backbone adopts DarkNet-53 and fuses feature maps at three different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction; each scale uses 3 anchors of different sizes, for 9 anchors in total. Since there is only one target class to detect, the output layer dimension is 3 × (1 + 5) = 18, where 3 denotes the three bounding boxes predicted per cell, 1 denotes the single prediction category, and 5 denotes the predicted target's center coordinates, width and height, and target score. Model training is performed using 80% of the data in data set A as the training set and the remaining 20% as the validation set, resulting in model M0.
(6) Determining false detection data. M0 has a high false detection rate on the non-tongue data set B; the data falsely detected by M0 in data set B are collected, and the falsely detected regions are taken as a second object class for detection.
(7) The network structure of M0 is modified to change the output layer dimension to 3 × (2 + 5) = 21, the other structures are kept unchanged, and the new model is initialized with the parameters of M0 except the last layer. The data falsely detected by M0 in data set B are added as training data, and the model is retrained to obtain model M1.
According to the embodiment of the invention, the target to be detected is defined, the basic network model M0 is trained, the regions falsely detected by M0 are then used as a background target to fine-tune the model, and a new model is generated iteratively; the deep convolutional network is applied to intelligently judge whether a tongue is present in the picture and where it is, so that real-time detection can be achieved, which is very convenient for use on all kinds of devices.
Fig. 4 is a schematic structural diagram of a tongue region detection system 400 based on deep learning according to an embodiment of the present invention. As shown in fig. 4, a tongue region detection system 400 based on deep learning according to an embodiment of the present invention includes: data processing section 401, clustering section 402, first detection model determining section 403, false detection data acquiring section 404, and tongue detection model determining section 405.
Preferably, the data processing unit 401 is configured to label the acquired image data set including the tongue portion, and pre-process the labeled image data set to acquire the first image data set.
Preferably, the data processing unit 401, labeling the acquired image data set containing the tongue, includes: labeling the tongue region by using the labeling tool LabelImg, and marking the position of the tongue in the form of a rectangular frame.
Preferably, the data processing unit 401, performing preprocessing on the annotated image data set to obtain a first image data set, includes: selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set; and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
Preferably, wherein the data enhancement processing comprises at least one of: horizontal flipping processing; clockwise and anticlockwise rotation processing within a preset angle range threshold; translation processing by a preset proportion threshold in the vertical and horizontal directions; random cropping of pixels within the picture edge according to a preset cropping proportion threshold; Gaussian filtering processing; and scaling processing.
Preferably, the clustering unit 402 is configured to set proportional sizes of multiple fixed reference frames, and perform clustering by using a k-means clustering method to obtain multiple clustering center reference frames.
Preferably, the first detection model determining unit 403 is configured to perform training based on a DarkNet network structure, determine an output layer dimension according to the plurality of clustering center reference frames, and perform training according to the first image data set to determine a first detection model.
Preferably, the training of the first detection model determining unit 403 based on the DarkNet network structure, determining the output layer dimension according to the plurality of cluster center reference frames, and training according to the first image data set to determine the first detection model includes:
the model backbone network adopts DarkNet-53 and fuses feature maps at three different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction, with each scale using 3 fixed reference frames of different sizes, thereby determining 9 fixed reference frames;
determining the output layer dimension to be 3 × (1 + 5) = 18, where 3 denotes the three reference frames predicted per cell, 1 denotes the single prediction category, and 5 denotes the predicted target's center coordinates, width and height, and target score;
and selecting data from the first image data set according to a second preset quantity threshold as a training data set, and performing model training with the remaining data as a validation set, to determine the first detection model.
Preferably, the false detection data acquiring unit 404 is configured to detect an image data set that does not contain the tongue by using the first detection model, to obtain a data set of falsely detected images.
Preferably, the tongue detection model determining unit 405 is configured to adjust a network structure of the first detection model, modify an output layer dimension, initialize a model by using parameters of the first detection model except a last layer, and perform retraining by using the first image data set and the false detection image data set to determine a tongue detection model for detecting a tongue region.
The tongue region detection system 400 based on deep learning according to the embodiment of the present invention corresponds to the tongue region detection method 100 based on deep learning according to another embodiment of the present invention, and is not described herein again.
The invention has been described with reference to a few embodiments. However, other embodiments of the invention than the one disclosed above are equally possible within the scope of the invention, as would be apparent to a person skilled in the art from the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [device, component, etc.]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A tongue region detection method based on deep learning is characterized by comprising the following steps:
labeling the acquired image data set containing the tongue part, and preprocessing the labeled image data set to acquire a first image data set;
setting the proportional sizes of various fixed reference frames, and clustering by adopting a k-means clustering mode to obtain a plurality of clustering center reference frames;
training based on a DarkNet network structure, determining the dimension of an output layer according to the plurality of clustering center reference frames, and training according to the first image data set to determine a first detection model;
detecting an image data set that does not contain the tongue by using the first detection model, to obtain a data set of falsely detected images;
and adjusting the network structure of the first detection model, modifying the dimensionality of an output layer, initializing the model by using the parameters of the first detection model except the last layer, and retraining by using the first image data set and the false detection image data set to determine a tongue detection model for detecting a tongue region.
2. The method of claim 1, wherein said labeling the acquired image dataset containing the tongue comprises:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, and marking the position of the tongue in the form of a rectangular frame.
3. The method of claim 1, wherein pre-processing the annotated image dataset to obtain the first image dataset comprises:
selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set;
and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
4. The method of claim 3, wherein the data enhancement process comprises:
at least one of: horizontal flipping processing; clockwise and anticlockwise rotation processing within a preset angle range threshold; translation processing by a preset proportion threshold in the vertical and horizontal directions; random cropping of pixels within the picture edge according to a preset cropping proportion threshold; Gaussian filtering processing; and scaling processing.
5. The method of claim 1, wherein the training based on a DarkNet network structure, determining output layer dimensions from the plurality of cluster center reference boxes, and training from the first image dataset to determine a first detection model comprises:
the model backbone network adopts DarkNet-53 and fuses feature maps at three different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction, with each scale using 3 fixed reference frames of different sizes, thereby determining 9 fixed reference frames;
determining the output layer dimension to be 3 × (1 + 5) = 18, where 3 denotes the three reference frames predicted per cell, 1 denotes the single prediction category, and 5 denotes the predicted target's center coordinates, width and height, and target score;
and selecting data from the first image data set according to a second preset quantity threshold as a training data set, and performing model training with the remaining data as a validation set, to determine the first detection model.
6. A deep learning based tongue region detection system, the system comprising:
the data processing unit is used for labeling the acquired image data set containing the tongue part and preprocessing the labeled image data set to acquire a first image data set;
the clustering unit is used for setting the proportional sizes of various fixed reference frames and clustering by adopting a k-means clustering mode to obtain a plurality of clustering center reference frames;
the first detection model determining unit is used for training based on a DarkNet network structure, determining the dimensionality of an output layer according to the multiple clustering center reference frames, and training according to the first image data set to determine a first detection model;
the false detection data acquisition unit is used for detecting an image data set which does not contain the tongue by using the first detection model, to obtain a data set of falsely detected images;
and the tongue detection model determining unit is used for adjusting the network structure of the first detection model, modifying the dimension of an output layer, initializing the model by using the parameters of the first detection model except the last layer, and retraining by using the first image data set and the false detection image data set to determine a tongue detection model for detecting a tongue region.
7. The system of claim 6, wherein the data processing unit, labeling the acquired image dataset containing the tongue, comprises:
labeling the tongue region in the acquired image data set containing the tongue by using the labeling tool LabelImg, and marking the position of the tongue in the form of a rectangular frame.
8. The system of claim 6, wherein the data processing unit pre-processes the annotated image dataset to obtain the first image dataset, comprising:
selecting data in the labeled image data set according to a first preset quantity threshold value to perform data enhancement processing so as to obtain an expanded data set;
and carrying out equal-scale scaling and filling processing on the marked image data set and the expanded data set according to a preset image proportion, and synchronously adjusting the coordinate position of the marking frame in the image to obtain a first image data set.
9. The system of claim 8, wherein the data enhancement process comprises:
at least one of: horizontal flipping processing; clockwise and anticlockwise rotation processing within a preset angle range threshold; translation processing by a preset proportion threshold in the vertical and horizontal directions; random cropping of pixels within the picture edge according to a preset cropping proportion threshold; Gaussian filtering processing; and scaling processing.
10. The system of claim 6, wherein the first detection model determination unit, trained based on a DarkNet network structure, determines output layer dimensions from the plurality of cluster center reference boxes, and is trained from the first image dataset to determine a first detection model, comprises:
the model backbone network adopts DarkNet-53 and fuses feature maps at three different scales, 13 × 13, 26 × 26 and 52 × 52, for target prediction, with each scale using 3 fixed reference frames of different sizes, thereby determining 9 fixed reference frames;
determining the output layer dimension to be 3 × (1 + 5) = 18, where 3 denotes the three reference frames predicted per cell, 1 denotes the single prediction category, and 5 denotes the predicted target's center coordinates, width and height, and target score;
and selecting data from the first image data set according to a second preset quantity threshold as a training data set, and performing model training with the remaining data as a validation set, to determine the first detection model.
CN202010017676.7A 2020-01-08 2020-01-08 Tongue region detection method and system based on deep learning Pending CN111260608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010017676.7A CN111260608A (en) 2020-01-08 2020-01-08 Tongue region detection method and system based on deep learning


Publications (1)

Publication Number Publication Date
CN111260608A 2020-06-09

Family

ID=70954122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010017676.7A Pending CN111260608A (en) 2020-01-08 2020-01-08 Tongue region detection method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN111260608A (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470138A (en) * 2018-01-24 2018-08-31 博云视觉(北京)科技有限公司 Method for target detection and device
CN108960076A (en) * 2018-06-08 2018-12-07 东南大学 Ear recognition and tracking based on convolutional neural networks
WO2019233297A1 (en) * 2018-06-08 2019-12-12 Oppo广东移动通信有限公司 Data set construction method, mobile terminal and readable storage medium
CN109063594A (en) * 2018-07-13 2018-12-21 吉林大学 Remote sensing images fast target detection method based on YOLOv2
CN109522963A (en) * 2018-11-26 2019-03-26 北京电子工程总体研究所 A kind of the feature building object detection method and system of single-unit operation
CN110263660A (en) * 2019-05-27 2019-09-20 魏运 A kind of traffic target detection recognition method of adaptive scene changes
CN110210391A (en) * 2019-05-31 2019-09-06 合肥云诊信息科技有限公司 Tongue picture grain quantitative analysis method based on multiple dimensioned convolutional neural networks
CN110490073A (en) * 2019-07-15 2019-11-22 浙江省北大信息技术高等研究院 Object detection method, device, equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149684A (en) * 2020-08-19 2020-12-29 北京豆牛网络科技有限公司 Image processing method and image preprocessing method for target detection
CN112149684B (en) * 2020-08-19 2024-06-07 北京豆牛网络科技有限公司 Image processing method and image preprocessing method for target detection
CN114445682A (en) * 2022-01-28 2022-05-06 北京百度网讯科技有限公司 Method, device, electronic equipment, storage medium and product for training model
CN115512241A (en) * 2022-11-01 2022-12-23 中国科学院半导体研究所 Data enhancement method and device for target detection in large-scale sparse sample remote sensing images

Similar Documents

Publication Publication Date Title
CN110163198B (en) Table identification reconstruction method and device and storage medium
CN107358149B (en) Human body posture detection method and device
US10497099B2 (en) Automatic orientation adjustment of spherical panorama digital images
Saxena et al. Make3d: Learning 3d scene structure from a single still image
US7352881B2 (en) Method for tracking facial features in a video sequence
CN112633144A (en) Face occlusion detection method, system, device and storage medium
US11113571B2 (en) Target object position prediction and motion tracking
CN110197146A (en) Facial image analysis method, electronic device and storage medium based on deep learning
CN112016614A (en) Construction method of optical image target detection model, target detection method and device
US7995866B2 (en) Rotation angle detection apparatus, and control method and control program of rotation angle detection apparatus
CN111260608A (en) Tongue region detection method and system based on deep learning
CN108596098B (en) Human body part analysis method, system, device and storage medium
CN107301408A (en) Human body mask extracting method and device
US20180101981A1 (en) Smoothing 3d models of objects to mitigate artifacts
CN113570615A (en) An image processing method, electronic device and storage medium based on deep learning
CN110705366A (en) Real-time human head detection method based on stair scene
CN109544516B (en) Image detection method and device
KR20190080388A (en) Photo Horizon Correction Method based on convolutional neural network and residual network structure
CN111444803A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113705304B (en) Image processing method, device, storage medium and computer equipment
CN114638921B (en) Motion capture method, terminal device, and storage medium
CN112651351B (en) Data processing method and device
Elassal et al. Unsupervised crowd counting
CN110796680B (en) Target tracking method and device based on similar template updating
CN114842205A (en) Vehicle damage detection method, device, equipment and storage medium

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200609)