
CN113436064B - Method and equipment for training detection model of key points of target object and detection method and equipment - Google Patents


Info

Publication number
CN113436064B
CN113436064B (application CN202110986015.XA)
Authority
CN
China
Prior art keywords
detected
key point
target
image
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110986015.XA
Other languages
Chinese (zh)
Other versions
CN113436064A (en)
Inventor
王鹏程
高原
刘霄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd filed Critical Beijing Century TAL Education Technology Co Ltd
Priority to CN202110986015.XA
Publication of CN113436064A
Application granted
Publication of CN113436064B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and device for training a detection model of key points of a target object, and to a corresponding detection method and device. A video sample is obtained, and a plurality of first to-be-detected image samples contained in the video sample are input into a candidate key point detection network to obtain first candidate key points corresponding to each first to-be-detected image sample. The plurality of first to-be-detected image samples and their corresponding first candidate key points are then input into a self-encoding network to obtain a plurality of target generation image samples output by the self-encoding network. Parameters of the candidate key point detection network are updated based on stability results of the plurality of target generation image samples until the stability results meet a first preset condition, at which point the candidate key point detection network is determined to be the target key point detection network. That is, the candidate key point detection network is trained with the stability results of the plurality of target generation image samples as the convergence condition, so the stability of the target key point detection network is improved.

Description

Method and equipment for training detection model of key points of target object and detection method and equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and device for training a detection model of key points of a target object, and to a corresponding detection method and device.
Background
With the development of image processing technology, beautification features are widely used in application software such as short-video shooting and webcast livestreaming. For example, during a live webcast, key points (such as the nose and eyes) of a face image in the video are detected, and the face is beautified based on the detected key points.
An existing key point detection method generally obtains a video sample, manually labels the key points of the to-be-detected image samples in the video sample, trains on the labeled video sample to obtain a key point detection network, and then uses the key point detection network to obtain the key points of the to-be-detected images in a to-be-detected video.
However, with such prior-art methods, the stability of the detected key points is not high.
Disclosure of Invention
In order to solve, or at least partially solve, the above technical problem, the present disclosure provides a method and device for training a detection model of key points of a target object, and a corresponding detection method and device.
In a first aspect, the present disclosure provides a method for training a detection model of a key point of a target object, including:
obtaining a video sample, wherein the video sample comprises a plurality of first to-be-detected image samples containing a first target object, and the first target object in the first to-be-detected image samples is not labeled with key point information;
inputting a plurality of first to-be-detected image samples into a candidate key point detection network to obtain first candidate key points corresponding to each first to-be-detected image sample;
inputting the plurality of first image samples to be detected and the first candidate key points corresponding to each first image sample to be detected into a target self-encoding network to obtain a plurality of target generation image samples;
updating parameters of the candidate key point detection network according to stability results of the plurality of target generation image samples, and returning to the step of inputting the plurality of first to-be-detected image samples into the candidate key point detection network to obtain the first candidate key points corresponding to each first to-be-detected image sample; and, when the stability results meet a first preset condition, determining the candidate key point detection network to be the target key point detection network.
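The iterative procedure of the first aspect can be sketched as follows. This is a hypothetical Python/NumPy outline: the callables standing in for the candidate key point detection network, the target self-encoding network, the stability judgment, and the parameter update, as well as the threshold and iteration cap, are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

def train_keypoint_detector(frames, detect_keypoints, target_autoencoder,
                            stability_score, update_params,
                            stability_threshold=0.95, max_iters=100):
    """Sketch of the first-aspect loop: detect candidate key points per frame,
    regenerate each frame through the target self-encoding network, and iterate
    until the stability result meets the first preset condition."""
    for _ in range(max_iters):
        # Obtain first candidate key points for each first to-be-detected sample.
        keypoints = [detect_keypoints(f) for f in frames]
        # Obtain the target generation image samples from the self-encoding network.
        generated = [target_autoencoder(f, k) for f, k in zip(frames, keypoints)]
        # Stability of the generated samples is the convergence condition.
        score = stability_score(frames, generated)
        if score >= stability_threshold:  # first preset condition (assumed form)
            break
        update_params(score)  # otherwise update the candidate network's parameters
    return detect_keypoints  # the converged target key point detection network
```

With stub callables the loop exits as soon as the stability score clears the threshold, without ever updating parameters, which mirrors the "until the stability result meets a first preset condition" wording.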
Optionally, the target self-encoding network includes: a first encoder, a second encoder and a decoder;
inputting the plurality of first to-be-detected image samples and the first candidate keypoints corresponding to each first to-be-detected image sample into a target self-encoding network to obtain a plurality of target-generated image samples, including:
for each first image sample to be detected, performing the following steps to obtain a plurality of target generation image samples:
performing first encoding processing on a first candidate key point corresponding to the first image sample to be detected by using a first encoder to obtain a key point feature corresponding to the first candidate key point;
performing second encoding processing on the first image sample to be detected by using a second encoder to obtain image characteristics corresponding to the first image sample to be detected;
and obtaining a target generation image sample corresponding to the first image sample to be detected by using the decoder according to the key point characteristics and the image characteristics.
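The three encoding/decoding steps above can be wired together as in the following toy sketch. The linear "encoders", feature dimensions, and tensor shapes are assumptions purely to illustrate the data flow (first encoder for key points, second encoder for the image, concatenation, decoder); a real implementation would use trained neural networks.

```python
import numpy as np

class TargetAutoencoder:
    """Toy sketch of the target self-encoding network: a first encoder for the
    candidate key points, a second encoder for the image, and a decoder that
    reconstructs the target generation image sample from the joined features."""

    def __init__(self, img_pixels=16, n_keypoints=4, feat=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W_kp = rng.normal(size=(n_keypoints * 2, feat))   # first encoder
        self.W_img = rng.normal(size=(img_pixels, feat))       # second encoder
        self.W_dec = rng.normal(size=(2 * feat, img_pixels))   # decoder

    def __call__(self, image, keypoints):
        kp_feat = keypoints.reshape(-1) @ self.W_kp    # first encoding processing
        img_feat = image.reshape(-1) @ self.W_img      # second encoding processing
        fused = np.concatenate([kp_feat, img_feat])    # connect the two features
        return (fused @ self.W_dec).reshape(image.shape)  # target generation sample
```

The output keeps the input image's shape, matching the idea that the decoder reconstructs an image sample from the key point features and image features.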
Optionally, the target self-encoding network is obtained by training in the following manner:
inputting a plurality of second to-be-detected image samples containing a second target object into a first initial key point detection network to obtain first initial key points corresponding to each second to-be-detected image sample, wherein the second target object in the second to-be-detected image samples is not labeled with key point information;
inputting a plurality of second image samples to be detected and the first initial key points corresponding to each second image sample to be detected into a candidate self-coding network to obtain a candidate generated image;
inputting the candidate generated image into a second initial key point detection network to obtain a second initial key point corresponding to the candidate generated image, wherein the first initial key point detection network is the same as the second initial key point detection network;
performing mode consistency judgment according to the first initial key point and the second initial key point to obtain a judgment result;
and updating parameters of the candidate self-encoding network according to the judgment result, returning to the step of inputting the plurality of second to-be-detected image samples containing the second target object into the first initial key point detection network to obtain the first initial key points corresponding to each second to-be-detected image sample, and, when the judgment result meets a second preset condition, determining the candidate self-encoding network to be the target self-encoding network.
Optionally, the performing mode consistency judgment according to the first initial key points and the second initial key points to obtain a judgment result includes:
generating a first heatmap according to the first initial key points;
generating a second heatmap according to the second initial key points;
acquiring key point features corresponding to the first heatmap, and acquiring key point features corresponding to the second heatmap;
and obtaining the judgment result according to the key point features corresponding to the first heatmap and the key point features corresponding to the second heatmap.
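A sketch of this judgment step, assuming the diagrams are Gaussian key point heatmaps (热力图, sometimes machine-rendered as "thermodynamic diagram") and that the judgment result is a cosine similarity between the two heatmaps' features. Neither the rendering nor the similarity metric is fixed by the patent; both are common illustrative choices.

```python
import numpy as np

def keypoint_heatmap(keypoints, size=(16, 16), sigma=1.5):
    """Render each (x, y) key point as one Gaussian heatmap channel."""
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]
    channels = []
    for x, y in keypoints:
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        channels.append(np.exp(-d2 / (2.0 * sigma ** 2)))
    return np.stack(channels)  # shape: (num_keypoints, h, w)

def mode_consistency(heatmap_a, heatmap_b):
    """Illustrative judgment result: cosine similarity of the flattened
    heatmap features; 1.0 means the two sets of key points agree perfectly."""
    a, b = heatmap_a.ravel(), heatmap_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical key point sets yield a score of 1.0; the further the second initial key points drift from the first, the lower the score, which is what the second preset condition would be checked against.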
Optionally, the candidate self-coding network is obtained by:
and training with a plurality of third to-be-detected image samples containing a third target object to obtain the candidate self-encoding network, wherein the third target object in the plurality of third to-be-detected image samples is labeled with key point information.
Optionally, the training to obtain the candidate self-encoding network by using a plurality of third to-be-detected image samples including a third target object includes:
performing first coding processing on target key points marked on the third image sample to be detected by using a first coder to obtain target key point characteristics;
performing second coding processing on the third image sample to be detected by using a second coder to obtain a target image characteristic corresponding to the third image sample to be detected;
obtaining a target generation image according to the target key point characteristics and the target image characteristics by using the decoder;
and training an initial self-coding network based on the third image sample to be detected and the target generation image until the initial self-coding network is converged to obtain the candidate self-coding network.
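The final training step above can be sketched with a reconstruction objective. The mean-squared pixel error and the loss-plateau convergence test below are assumptions: the patent only requires training "until the initial self-coding network is converged" without naming a loss.

```python
import numpy as np

def reconstruction_loss(sample, generated):
    """Mean-squared pixel error between the third to-be-detected image sample
    and the target generation image (assumed loss; not specified by the patent)."""
    return float(np.mean((sample - generated) ** 2))

def converged(losses, tol=1e-4):
    """Simple illustrative convergence test: the last two recorded losses
    differ by less than a small tolerance."""
    return len(losses) >= 2 and abs(losses[-1] - losses[-2]) < tol
```

In a full training loop, `reconstruction_loss` would drive gradient updates of the initial self-encoding network, and `converged` would decide when to freeze it as the candidate self-encoding network.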
Optionally, the candidate keypoint detection network is obtained by training a plurality of fourth image samples to be detected containing fourth target objects, where the fourth target objects in the plurality of fourth image samples to be detected are labeled with keypoint information.
Optionally, before the updating of the parameters of the candidate key point detection network according to the stability results of the plurality of target generation image samples, the method further includes:
obtaining a generated video sequence based on the plurality of target generation image samples;
and acquiring a stability result by using a target video stability judging network according to the generated video sequence and the video sequence corresponding to the plurality of first to-be-detected image samples included in the video sample.
Optionally, the obtaining, by the target video stability judging network, a stability result according to the generated video sequence and the video sequence corresponding to the plurality of first to-be-detected image samples included in the video sample includes:
performing a three-dimensional convolution on the plurality of target generation images included in the generated video sequence by using a spatio-temporal convolution network to extract a first spatio-temporal feature, and performing a two-dimensional convolution operation on the first spatio-temporal feature to extract a first deepened spatial feature;
performing a three-dimensional convolution on the plurality of first to-be-detected image samples included in the video sample by using the spatio-temporal convolution network to extract a second spatio-temporal feature, and performing a two-dimensional convolution operation on the second spatio-temporal feature to extract a second deepened spatial feature;
and obtaining the stability result according to the first spatio-temporal feature, the first deepened spatial feature, the second spatio-temporal feature and the second deepened spatial feature.
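The feature extraction above can be sketched as follows. The naive fixed-kernel convolutions and the distance-based score are illustrative stand-ins for the learned spatio-temporal network; the patent does not specify kernel sizes or how the four features are combined.

```python
import numpy as np

def conv_nd(x, kernel):
    """Naive valid-mode n-dimensional correlation, standing in for a learned
    convolution layer (a real network uses trained kernels)."""
    out_shape = tuple(np.array(x.shape) - np.array(kernel.shape) + 1)
    out = np.zeros(out_shape)
    for idx in np.ndindex(out_shape):
        window = tuple(slice(i, i + s) for i, s in zip(idx, kernel.shape))
        out[idx] = np.sum(x[window] * kernel)
    return out

def stability_features(video, kernel3d, kernel2d):
    """3-D convolution over (time, height, width) extracts the spatio-temporal
    feature; a follow-up 2-D convolution on each resulting frame extracts the
    deepened spatial feature."""
    st = conv_nd(video, kernel3d)
    deep = np.stack([conv_nd(frame, kernel2d) for frame in st])
    return st, deep

def stability_result(st_gen, deep_gen, st_orig, deep_orig):
    """Illustrative stability score in (0, 1]: the closer the generated
    sequence's features are to the original sequence's, the higher the score."""
    dist = sum(float(np.mean((a - b) ** 2))
               for a, b in ((st_gen, st_orig), (deep_gen, deep_orig)))
    return 1.0 / (1.0 + dist)
```

Running both the generated video sequence and the original one through the same feature extractor and comparing the results captures the intuition that stable key points reproduce the original video's temporal dynamics.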
Optionally, the target object is a human face.
In a second aspect, the present disclosure provides a method for detecting key points of a target object, including:
acquiring a video to be detected, wherein the video to be detected comprises: a plurality of first images to be detected including a first target object;
and detecting the plurality of first to-be-detected images by using a target key point detection network to obtain the target object key points in each first to-be-detected image, wherein the target key point detection network is obtained by the method for training a detection model of key points of a target object of the first aspect.
In a third aspect, the present disclosure provides a training apparatus for a key point detection model of a target object, including:
an obtaining module, configured to obtain a video sample, where the video sample includes a plurality of first to-be-detected image samples containing a first target object, and the first target object in the first to-be-detected image samples is not labeled with key point information;
the processing module is used for inputting a plurality of first to-be-detected image samples into a candidate key point detection network to obtain first candidate key points corresponding to each first to-be-detected image sample;
the processing module is further configured to input the plurality of first to-be-detected image samples and the first candidate keypoints corresponding to each of the first to-be-detected image samples into a target self-encoding network, so as to obtain a plurality of target-generated image samples;
the processing module is further configured to update parameters of the candidate key point detection network according to stability results of the plurality of target generation image samples, and to return to inputting the plurality of first to-be-detected image samples into the candidate key point detection network to obtain the first candidate key points corresponding to each first to-be-detected image sample; and, when the stability results meet a first preset condition, to determine the candidate key point detection network to be the target key point detection network.
In a fourth aspect, the present disclosure provides an apparatus for detecting a key point of a target object, including:
the acquisition module is used for acquiring a video to be detected, wherein the video to be detected comprises: a plurality of first images to be detected including a first target object;
and a processing module, configured to detect the plurality of first to-be-detected images by using a target key point detection network to obtain the target object key points in each first to-be-detected image, wherein the target key point detection network is obtained by the method for training a detection model of key points of a target object of the first aspect.
In a fifth aspect, the present disclosure provides a computer device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of the first aspect or the steps of the method of any one of the second aspect when executing the computer program.
In a sixth aspect, the present disclosure provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any one of the first aspect or the second aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
A video sample is obtained, and a plurality of first to-be-detected image samples contained in the video sample are input into a candidate key point detection network to obtain first candidate key points corresponding to each first to-be-detected image sample. The plurality of first to-be-detected image samples and their corresponding first candidate key points are then input into the self-encoding network to obtain a plurality of target generation image samples output by the self-encoding network. Parameters of the candidate key point detection network are updated based on the stability results of the plurality of target generation image samples until the stability results meet a first preset condition, at which point the candidate key point detection network is determined to be the target key point detection network. That is, the candidate key point detection network is trained with the stability results of the plurality of target generation image samples as the convergence condition, so the stability of the target key point detection network is improved. In addition, in this process, the first target object in the plurality of first to-be-detected image samples is not labeled with key point information, which reduces the cost of model training.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of an embodiment of a method for training a detection model of a target object key point according to the present disclosure;
fig. 2 is a schematic structural diagram of a candidate keypoint detection network provided by the present disclosure;
fig. 3 is a schematic structural diagram of a target self-coding network provided by the present disclosure;
fig. 4 is a schematic structural diagram of a target self-coding network provided by the present disclosure;
FIG. 5 is a schematic flowchart of another embodiment of a method for training a detection model of a key point of a target object according to the present disclosure;
FIG. 6a is a schematic diagram of an architecture for target self-coding network training according to the present disclosure;
fig. 6b is a schematic diagram of a modality-consistent determination network according to the present disclosure;
FIG. 7 is a schematic flowchart illustrating an embodiment of a method for training a detection model of a key point of a target object according to the present disclosure;
FIG. 8 is a schematic flowchart of an embodiment of a method for training a detection model of a key point of a target object according to the present disclosure;
FIG. 9 is a schematic flow chart diagram illustrating a method for detecting key points of a target object according to the present disclosure;
FIG. 10 is a schematic structural diagram of an embodiment of a training apparatus for a key point detection model of a target object according to the present disclosure;
fig. 11 is a schematic structural diagram of a device for detecting key points of a target object according to the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
The target object in the present disclosure may be a specific portion in the image to be detected, for example, if the image to be detected includes a person, the target object may be a human face, an arm, a leg, or other portions, and the target object may also be other objects, for example, a license plate of a vehicle. The target object key point detection may be applied not only to image processing, such as beauty, but also to target tracking, etc., to which the present disclosure is not limited.
The detection of the key points is used as a basis for supporting applications such as image processing and target tracking, and the stability of the detected key points is very important. In the prior art, when the key point detection is performed on a target object in a video, the problem of key point jitter exists.
In the present disclosure, the key points corresponding to a to-be-detected image sample are obtained through a candidate key point detection network, and the to-be-detected image sample and the key points are input into a target self-encoding network. Relying on the image generation capability of the target self-encoding network, a target generation image sample is obtained, and the stability of the key points is judged based on the target generation image sample. The stability result serves as the convergence condition of the candidate key point detection network: the parameters of the candidate key point detection network are adjusted based on the stability result until the stability meets a certain condition, at which point the candidate key point detection network is determined to have converged, yielding the target key point detection network. Because stability is used as the convergence condition of model training, the stability of the key points output by the target key point detection network is improved.
The technical solutions of the present disclosure are described below in several specific embodiments; the same or similar concepts may be cross-referenced between embodiments and are not described repeatedly in each place.
Fig. 1 is a schematic flowchart of an embodiment of a method for training a detection model of a target object key point, as shown in fig. 1, the method of the embodiment is as follows:
s101: a video sample is obtained.
Wherein the video sample includes a plurality of first to-be-detected image samples, each containing a first target object that is not labeled with key point information. In general, key point information is added to a to-be-detected image sample by manual labeling; that is, after an original image is captured, the key point information must be manually labeled to form the image sample. In the technical solution of the present disclosure, however, the first to-be-detected image samples in the video samples do not need to be labeled with key point information, so a large number of video samples can be obtained in more ways to facilitate training of the network model. For example, the video samples may be obtained as follows:
A large number of video samples may be acquired from authorized video platforms based on crawler technology, or, with authorization, collected in places with heavy foot traffic such as shopping malls and roads, and the image sample of each frame may then be extracted from the video samples.
S103: and inputting a plurality of first to-be-detected image samples into a candidate key point detection network to obtain first candidate key points corresponding to each first to-be-detected image sample.
The network models for key point detection in the present disclosure include the candidate key point detection network and the target key point detection network. The two have the same network structure; the difference is that the candidate key point detection network refers to the key point detection network during training, while the target key point detection network refers to the key point detection network after convergence.
The structure diagram of the candidate keypoint detection network is shown in fig. 2, and the input of the candidate keypoint detection network is a first image sample to be detected, and the output of the candidate keypoint detection network is a first candidate keypoint.
The plurality of first to-be-detected image samples included in the video sample are input into the candidate key point detection network, which performs key point detection processing and outputs the first candidate key points corresponding to each first to-be-detected image sample.
After the first candidate keypoints corresponding to each first to-be-detected image sample are obtained, S105 is performed.
S105: and inputting the plurality of first image samples to be detected and the first candidate key points corresponding to each first image sample to be detected into a target self-encoding network to obtain a plurality of target generation image samples.
The structure of the target self-encoding network is shown in fig. 3: its inputs are the first to-be-detected image sample and the corresponding first candidate key points, and its output is the target generation image sample.
The target self-encoding network extracts image features from the first to-be-detected image sample, acquires key point features from the first candidate key points corresponding to the first to-be-detected image sample, and reconstructs an image based on the image features and the key point features to obtain the target generation image sample.
S107: and updating the parameters of the candidate key point detection network according to the stability results of the plurality of target generated image samples, and returning to execute S103.
The candidate key point detection network is a key point detection network in a training process, and if the stability of an output result of the candidate key point detection network cannot meet requirements, parameters of the candidate key point detection network need to be continuously adjusted through the training process, so that the stability of key points output by the candidate key point detection network is better and better.
The stability results of the plurality of target generation image samples are taken as the convergence condition, and the parameters of the candidate key point detection network are adjusted based on the stability results.
S109: and determining the candidate key point detection network as a target key point detection network until the stability result meets a first preset condition.
After multiple rounds of training, once the stability result meets the first preset condition, the candidate key point detection network is determined to have converged, and the converged candidate key point detection network is determined to be the target key point detection network.
In this embodiment, a video sample is obtained, and a plurality of first to-be-detected image samples included in the video sample are input into a candidate key point detection network to obtain first candidate key points corresponding to each first to-be-detected image sample. The plurality of first to-be-detected image samples and their corresponding first candidate key points are then input into the self-encoding network to obtain a plurality of target generation image samples output by the self-encoding network. Parameters of the candidate key point detection network are updated based on the stability results of the plurality of target generation image samples until the stability results meet a first preset condition, at which point the candidate key point detection network is determined to be the target key point detection network. That is, the candidate key point detection network is trained with the stability results of the plurality of target generation image samples as the convergence condition, so the stability of the target key point detection network is improved. In addition, in this process, the first target object in the plurality of first to-be-detected image samples is not labeled with key point information, which reduces the cost of model training.
One implementation of the structure of the target self-encoding network in the above embodiment is shown in fig. 4: the target self-encoding network 400 includes a first encoder 41, a second encoder 42, and a decoder 43. With reference to fig. 4, a possible implementation of S105 is as follows:
for each first image sample to be detected, performing the following steps to obtain a plurality of target generation image samples:
and performing first coding processing on a first candidate key point corresponding to the first to-be-detected image sample by using a first coder to obtain a key point feature corresponding to the first candidate key point. And carrying out second coding processing on the first image sample to be detected by using a second coder to obtain the image characteristics corresponding to the first image sample to be detected. After the key point features and the image features are obtained, the key point features and the image features are connected and input into a decoder. And obtaining a target generation image sample corresponding to the first image sample to be detected by using the decoder according to the key point characteristics and the image characteristics.
In this embodiment, two encoders are used to obtain the key point features and the image features respectively, and the decoder reconstructs the target generation image sample based on the key point features and the image features.
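The forward pass of this two-encoder structure can be sketched as follows; the linear layers and the class name `TwoStreamAutoencoder` are illustrative assumptions for this sketch, not the disclosed network.

```python
import numpy as np

class TwoStreamAutoencoder:
    """Sketch of fig. 4: a first encoder for the key points, a second encoder
    for the image, and a decoder over the concatenated features."""

    def __init__(self, img_dim, kp_dim, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_kp = rng.normal(size=(kp_dim, feat_dim)) / np.sqrt(kp_dim)
        self.w_img = rng.normal(size=(img_dim, feat_dim)) / np.sqrt(img_dim)
        self.w_dec = rng.normal(size=(2 * feat_dim, img_dim)) / np.sqrt(2 * feat_dim)

    def forward(self, image, keypoints):
        kp_feat = np.tanh(keypoints @ self.w_kp)     # first encoding processing
        img_feat = np.tanh(image @ self.w_img)       # second encoding processing
        joint = np.concatenate([kp_feat, img_feat])  # concatenate the two features
        return joint @ self.w_dec                    # decoder reconstructs the image
```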
Fig. 5 is a schematic flowchart of another embodiment of the method for training a detection model of key points of a target object provided by the present disclosure, and fig. 6a is a schematic diagram of an architecture for training the target self-coding network provided by the present disclosure. Fig. 5 shows, on the basis of the foregoing embodiment, the training process of the target self-coding network; that is, the target self-coding network can be obtained by training in the manner shown in fig. 5. With reference to fig. 5 and 6a, the training includes:
S501: inputting a plurality of second image samples to be detected containing second target objects into a first initial key point detection network to obtain first initial key points respectively corresponding to the second image samples to be detected.
And the second to-be-detected image sample comprises a to-be-detected image sample of which the second target object is not marked with key point information.
The first initial key point detection network is the initial state of the candidate key point detection network.
S502: and inputting a plurality of second image samples to be detected and the first initial key points corresponding to each second image sample to be detected into a candidate self-coding network to obtain a candidate generated image.
Wherein the input of the candidate self-coding network includes the second image sample to be detected and the first initial key point corresponding to the second image sample to be detected, and the output of the candidate self-coding network is the candidate generated image sample.
And the candidate self-coding network extracts image features from the second image sample to be detected, acquires key point features from the first initial key points corresponding to the second image sample to be detected, and reconstructs an image based on the image features and the key point features to obtain the candidate generated image.
S503: and inputting the candidate generated image into a second initial key point detection network to obtain a second initial key point corresponding to the candidate generated image.
Wherein the first initial keypoint detection network is the same as the second initial keypoint detection network.
S504: and judging the mode consistency according to the first initial key point and the second initial key point to obtain a judgment result.
Optionally, the mode consistency determination may be performed through the mode consistency determination network based on the first initial key point and the second initial key point, so as to obtain a determination result.
Fig. 6b is a schematic diagram of an architecture of a mode consistency judgment network provided by the present disclosure, which includes: a heatmap processing module, a convolutional neural network, and a discriminator. With reference to fig. 6b, the mode consistency judgment is implemented as follows:
The first initial key point is input into the heatmap processing module, and the heatmap processing module generates a first heatmap according to the first initial key point. Specifically, the coordinates of the key points are mapped onto an image, so that an image with a black background on which only the key point positions are marked is obtained according to the first initial key point.
Similarly, the second initial key point is input into the heatmap processing module, and the heatmap processing module generates a second heatmap according to the coordinates of the second initial key point.
The first heatmap and the second heatmap are input into a convolutional neural network, which acquires the key point features corresponding to the first heatmap and the key point features corresponding to the second heatmap.
The judgment result is then obtained according to the key point features corresponding to the first heatmap and the key point features corresponding to the second heatmap.
Optionally, the key point features corresponding to the first heatmap and the key point features corresponding to the second heatmap are input into the discriminator, and the judgment result is obtained according to: mode consistency loss = ||discriminator(key point features corresponding to the first heatmap) − discriminator(key point features corresponding to the second heatmap)||^2, where the mode consistency loss is the judgment result.
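The heatmap construction and the above loss can be sketched as follows; `keypoints_to_heatmap` and `feature_net` are illustrative stand-ins (the latter for the convolutional neural network together with the discriminator), not the disclosed implementations.

```python
import numpy as np

def keypoints_to_heatmap(keypoints, size=32):
    # Map normalized (x, y) coordinates in [0, 1) onto a black (all-zero)
    # image with the key point positions set to 1, as the heatmap
    # processing module does.
    heatmap = np.zeros((size, size))
    for x, y in keypoints:
        heatmap[int(y * size), int(x * size)] = 1.0
    return heatmap

def modality_consistency_loss(kp1, kp2, feature_net, size=32):
    # loss = ||D(features of first heatmap) - D(features of second heatmap)||^2
    f1 = feature_net(keypoints_to_heatmap(kp1, size))
    f2 = feature_net(keypoints_to_heatmap(kp2, size))
    return float(np.sum((f1 - f2) ** 2))
```

When the re-detected key points coincide with the original ones, the loss is zero, which is exactly the convergence direction of S505.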
S505: and updating the parameters of the candidate self-coding network according to the judgment result, and returning to execute the step S501.
S506: and determining the candidate self-coding network as a target self-coding network until the judgment result meets a second preset condition.
The second preset condition may be that the determination result is smaller than a first preset threshold, where the first preset threshold may be 0.012, 0.01 or other values close to 0.
In this embodiment, first initial key points corresponding to the second to-be-detected image samples are obtained first. The first initial key points and the second to-be-detected image samples are input into the candidate self-coding network to obtain candidate generated images. The second to-be-detected image sample is a real image, and the candidate generated image can be regarded as a synthetic image obtained based on the information of the real image. The candidate generated image is input into the same key point detection model to obtain second initial key points; the closer the synthetic image is to the real image, the higher the accuracy of the images generated by the self-coding network. The present disclosure judges how close the synthetic image is to the real image by comparing the mode consistency of the first initial key points and the second initial key points, and adjusts the parameters of the candidate self-coding network accordingly. When the mode consistency judgment result satisfies the second preset condition, the candidate self-coding network is considered to have converged, and the target self-coding network is obtained, so that the accuracy of the images generated by the target self-coding network is improved. In addition, since image samples in which the target object is not labeled with key point information are used in the process of training the target self-coding network, the cost of model training is reduced.
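The training loop of S501 to S506 can be sketched as follows; the scalar autoencoder, `keypoint_net` and `consistency` are illustrative toy stand-ins for the disclosed networks, and the update rule is a placeholder for real gradient descent.

```python
import numpy as np

def train_candidate_autoencoder(frames, ae_scale, keypoint_net, consistency,
                                threshold=0.01, lr=0.5, max_iters=100):
    """Reconstruct from the detected key points, re-detect on the
    reconstruction, and update until the judgment result meets the
    second preset condition."""
    worst = None
    for _ in range(max_iters):
        worst = 0.0
        for frame in frames:
            kp1 = keypoint_net(frame)                  # first initial key points
            generated = frame * ae_scale               # candidate generated image (stub)
            kp2 = keypoint_net(generated)              # second initial key points
            worst = max(worst, consistency(kp1, kp2))  # judgment result
        if worst < threshold:                          # second preset condition
            break                                      # candidate becomes the target
        ae_scale = ae_scale + lr * (1.0 - ae_scale)    # move toward the identity map
    return ae_scale, worst
```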
Fig. 7 is a schematic flowchart of an embodiment of the method for training a detection model of key points of a target object provided by the present disclosure. Optionally, the candidate self-coding network in the embodiment shown in fig. 5 may also be obtained by training with a plurality of third to-be-detected image samples containing third target objects, where the third target objects in the plurality of third to-be-detected image samples are labeled with key point information. That is, the candidate self-coding network is first trained with to-be-detected image samples labeled with key point information, and is then further trained with to-be-detected image samples not labeled with key points, which improves the convergence efficiency of the candidate self-coding network. One possible implementation of training to obtain the candidate self-coding network is shown in fig. 7.
S701: and carrying out first coding processing on the target key points marked on the third image sample to be detected by using a first coder to obtain the characteristics of the target key points.
S702: and carrying out second coding processing on the third image sample to be detected by using a second coder to obtain the target image characteristics corresponding to the third image sample to be detected.
S703: and obtaining a target generation image according to the target key point characteristics and the target image characteristics by using the decoder.
S704: and training an initial self-coding network based on the third image sample to be detected and the target generation image until the initial self-coding network is converged to obtain the candidate self-coding network.
Specifically, based on the difference between the third image sample to be detected and the target generation image, the parameters of the initial self-coding network are adjusted until the initial self-coding network converges, and the candidate self-coding network is obtained.
Optionally, a value of the first loss function is obtained according to reconstruction loss = MSE(target generation image, third image sample to be detected), where the reconstruction loss is the value of the first loss function, and MSE(target generation image, third image sample to be detected) is the mean of the squared differences of each pair of corresponding pixels between the target generation image and the third image sample to be detected.
Optionally, a value of the second loss function is obtained according to decoupling loss = −KL(key point features || image features), where the decoupling loss is the value of the second loss function, and KL(key point features || image features) is the divergence between the distribution of the key point features and the distribution of the image features; a smaller value of the second loss function indicates that the distribution of the key point features differs more from the distribution of the image features, that is, the key point features and the image features are better decoupled.
The convergence condition may be that the value of the first loss function is smaller than a second preset threshold and the value of the second loss function is smaller than a third preset threshold, the value of the second preset threshold may be 0.005, 0.01, 0.02 or other values close to 0, and the value of the third preset threshold may be 0.005, 0.01, 0.02 or other values close to 0, which is not limited by the disclosure.
Optionally, the first loss function may also be the mean of the absolute values of the differences of each pair of corresponding pixels between the target generation image and the third image sample to be detected; the choice of the first loss function may be determined according to the actual situation, and is not limited by the present disclosure.
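The two loss functions above can be sketched as follows; normalizing the feature vectors into discrete distributions before computing the KL term is an assumption made for this sketch only, since the disclosure does not specify the feature distributions.

```python
import numpy as np

def reconstruction_loss(generated, target):
    # First loss function: mean of the squared differences of each pair of
    # corresponding pixels (MSE); an L1 (mean absolute) variant is also possible.
    return float(np.mean((generated - target) ** 2))

def decoupling_loss(kp_feat, img_feat, eps=1e-8):
    # Second loss function: -KL(key point features || image features).
    # The feature vectors are normalized into discrete distributions here.
    p = np.abs(kp_feat) + eps
    p = p / p.sum()
    q = np.abs(img_feat) + eps
    q = q / q.sum()
    return float(-np.sum(p * np.log(p / q)))
```

The decoupling loss decreases (becomes more negative) as the two feature distributions move apart, which matches the decoupling objective.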
Optionally, in each of the above embodiments, the candidate keypoint detection network is obtained by training a plurality of fourth image samples to be detected including a fourth target object, where the fourth target object in the plurality of fourth image samples to be detected is labeled with keypoint information. Namely, the candidate key point detection network is obtained by training the to-be-detected image sample labeled with the key point information, and then the candidate key point detection network is continuously trained through the to-be-detected image sample not labeled with the key point, so that the convergence efficiency of the candidate key point detection network is improved.
Fig. 8 is a schematic flowchart of an embodiment of a method for training a detection model of a target object keypoint, provided by the present disclosure, where fig. 8 is based on the foregoing embodiments, and further includes, before S107:
s1061: generating image samples based on the plurality of targets, obtaining a generated video sequence.
S1062: and acquiring a stability result by using a target video stability judging network according to the generated video sequence and the video sequences corresponding to the plurality of first to-be-detected image samples included in the video samples.
One possible implementation is as follows:
performing three-dimensional convolution on a plurality of target generation images included in the generated video sequence by utilizing a space-time convolution network to extract a first space-time characteristic; and performing a two-dimensional convolution operation on the first space-time characteristic to extract a first deepened space characteristic.
Performing three-dimensional convolution on a plurality of first to-be-detected image samples included in the video samples by utilizing a space-time convolution network to extract second space-time characteristics; and performing two-dimensional convolution operation on the second space-time characteristic to extract a second deepened space characteristic.
And obtaining a stability result according to the first space-time characteristic, the first deepening space characteristic, the second space-time characteristic and the second deepening space characteristic.
stability loss = ||space-time convolution network(generated video sequence)||^2 + ||space-time convolution network(input video sequence) − 1||^2
Wherein the stability loss is the obtained stability result; the smaller the stability loss, the better the stability result, which indicates that the key points detected by the key point detection network are more stable, that is, the jitter is smaller.
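The stability loss can be sketched as follows; `spatio_temporal_score` is an illustrative stand-in for the space-time convolution network (a weighted temporal combination followed by a spatial reduction), not the disclosed implementation.

```python
import numpy as np

def spatio_temporal_score(frames, temporal_weights):
    # Stand-in for the space-time convolution network: a weighted temporal
    # combination (the "3-D convolution" step) followed by a spatial
    # reduction (the "2-D convolution" step), squashed into one score in (0, 1].
    spatio_temporal = np.tensordot(temporal_weights, np.stack(frames), axes=(0, 0))
    deepened = spatio_temporal ** 2        # crude spatial deepening
    return float(1.0 / (1.0 + deepened.mean()))

def stability_loss(generated_frames, input_frames, temporal_weights):
    # stability loss = ||D(generated sequence)||^2 + ||D(input sequence) - 1||^2
    d_gen = spatio_temporal_score(generated_frames, temporal_weights)
    d_in = spatio_temporal_score(input_frames, temporal_weights)
    return d_gen ** 2 + (d_in - 1.0) ** 2
```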
Fig. 9 is a schematic flowchart of another method for detecting key points of a target object provided by the present disclosure. The target key point detection network used in fig. 9 can be obtained by the above embodiments of the method for training a detection model of key points of a target object. As shown in fig. 9, the method of this embodiment is as follows:
s901: and acquiring the video to be detected.
The video to be detected comprises: a plurality of first images to be detected including a first target object;
s902: and detecting the plurality of first images to be detected by using a target key point detection network to obtain the key points of the target object in each first image to be detected.
The target key point detection network is obtained by the method for training a detection model of key points of a target object in each of the above embodiments.
In this embodiment, the target object key points in each first image to be detected are obtained by detecting the plurality of first images to be detected with the target key point detection network. Since the candidate key point detection network is trained with the stability results of the plurality of target generation image samples as the convergence condition, the stability of the target key point detection network is improved.
Fig. 10 is a schematic structural diagram of an embodiment of a training apparatus for a target object keypoint detection model provided by the present disclosure, and as shown in fig. 10, the apparatus of the present embodiment includes an obtaining module 1001 and a processing module 1002, wherein,
an obtaining module 1001, configured to obtain a video sample, where the video sample includes: a plurality of first to-be-detected image samples containing first target objects, where the first to-be-detected image samples include to-be-detected image samples in which the first target objects are not labeled with key point information;
a processing module 1002, configured to input a plurality of first to-be-detected image samples into a candidate keypoint detection network, so as to obtain first candidate keypoints corresponding to each first to-be-detected image sample;
the processing module 1002 is further configured to input the plurality of first to-be-detected image samples and the first candidate keypoints corresponding to each of the first to-be-detected image samples into a target self-encoding network, so as to obtain a plurality of target-generated image samples;
the processing module 1002 is further configured to update parameters of the candidate keypoint detection network according to the stability results of the image samples generated by the multiple targets, and return to execute inputting the multiple first to-be-detected image samples into the candidate keypoint detection network to obtain first candidate keypoints corresponding to each first to-be-detected image sample; and determining the candidate key point detection network as a target key point detection network until the stability result meets a first preset condition.
Optionally, the target self-encoding network includes: a first encoder, a second encoder and a decoder;
the processing module 1002 is specifically configured to, for each first to-be-detected image sample, perform first encoding processing on a first candidate keypoint corresponding to the first to-be-detected image sample by using a first encoder to obtain a keypoint feature corresponding to the first candidate keypoint; performing second encoding processing on the first image sample to be detected by using a second encoder to obtain image characteristics corresponding to the first image sample to be detected; and obtaining a target generation image sample corresponding to the first image sample to be detected by using the decoder according to the key point characteristics and the image characteristics.
Optionally, the processing module 1002 is specifically configured to train to obtain the target self-encoding network in the following manner:
inputting a plurality of second image samples to be detected containing second target objects into a first initial key point detection network to obtain first initial key points corresponding to each second image sample to be detected respectively, wherein the second image samples to be detected contain the image samples to be detected with the second target objects not labeled with key point information;
inputting a plurality of second image samples to be detected and the first initial key points corresponding to each second image sample to be detected into a candidate self-coding network to obtain a candidate generated image;
inputting the candidate generated image into a second initial key point detection network to obtain a second initial key point corresponding to the candidate generated image, wherein the first initial key point detection network is the same as the second initial key point detection network;
performing mode consistency judgment according to the first initial key point and the second initial key point to obtain a judgment result;
and updating parameters of the candidate self-coding network according to the judgment result, returning to execute the step of inputting a plurality of second image samples to be detected containing second target objects into a first initial key point detection network to obtain first initial key points corresponding to each second image sample to be detected until the judgment result meets a second preset condition, and determining the candidate self-coding network as the target self-coding network.
Optionally, the processing module 1002 is specifically configured to generate a first heatmap according to the first initial key point; generate a second heatmap according to the second initial key point; acquire key point features corresponding to the first heatmap, and acquire key point features corresponding to the second heatmap; and obtain the judgment result according to the key point features corresponding to the first heatmap and the key point features corresponding to the second heatmap.
Optionally, the processing module 1002 is specifically configured to obtain a candidate self-coding network by:
and training a plurality of third image samples to be detected containing third target objects to obtain the candidate self-coding network, wherein the third target objects in the plurality of third image samples to be detected are marked with key point information.
Optionally, the processing module 1002 is specifically configured to perform first encoding processing on a target key point labeled by the third image sample to be detected by using a first encoder, so as to obtain a target key point feature; performing second coding processing on the third image sample to be detected by using a second coder to obtain a target image characteristic corresponding to the third image sample to be detected; obtaining a target generation image according to the target key point characteristics and the target image characteristics by using the decoder; and training an initial self-coding network based on the third image sample to be detected and the target generation image until the initial self-coding network is converged to obtain the candidate self-coding network.
Optionally, the candidate keypoint detection network is obtained by training a plurality of fourth image samples to be detected containing fourth target objects, where the fourth target objects in the plurality of fourth image samples to be detected are labeled with keypoint information.
The processing module 1002 is further configured to obtain a generated video sequence based on the plurality of target generation image samples; and acquire a stability result by using a target video stability judging network according to the generated video sequence and the video sequences corresponding to the plurality of first to-be-detected image samples included in the video samples.
The processing module 1002 is further configured to perform three-dimensional convolution on a plurality of target generation images included in the generated video sequence by using a spatio-temporal convolution network to extract a first spatio-temporal feature; performing two-dimensional convolution operation on the first time-space characteristic to extract a first deepened space characteristic; performing three-dimensional convolution on a plurality of first to-be-detected image samples included in the video samples by utilizing a space-time convolution network to extract second space-time characteristics; performing two-dimensional convolution operation on the second space-time characteristic to extract a second deepened space characteristic; and obtaining a stability result according to the first space-time characteristic, the first deepening space characteristic, the second space-time characteristic and the second deepening space characteristic.
Optionally, the target object is a human face.
Fig. 11 is a schematic structural diagram of an apparatus for detecting key points of a target object according to the present disclosure, and the apparatus of this embodiment includes an obtaining module 1101 and a processing module 1102, wherein,
the obtaining module 1101 is configured to obtain a video to be detected, where the video to be detected includes: a plurality of first images to be detected including a first target object;
the processing module 1102 is configured to detect the multiple first images to be detected by using a target keypoint detection network to obtain a target object keypoint in each first image to be detected, where the target keypoint detection network is obtained by using the training method of the target object key detection model described in the foregoing embodiments.
The disclosed embodiment provides a computer device, including: the memory, the processor, and the computer program stored in the memory and capable of running on the processor, where the processor executes the computer program to implement the technical solution of any one of the method embodiments shown in fig. 1 to 9, and the implementation principle and the technical effect are similar, and are not described herein again.
The present disclosure also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the solution of the method embodiment shown in any one of fig. 1 to 9.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method for training a detection model of key points of a target object is characterized by comprising the following steps:
obtaining a video sample, wherein the video sample comprises: the method comprises the steps that a plurality of first to-be-detected image samples containing first target objects are obtained, wherein the first to-be-detected image samples contain to-be-detected image samples of which the first target objects are not marked with key point information;
inputting a plurality of first to-be-detected image samples into a candidate key point detection network to obtain first candidate key points corresponding to each first to-be-detected image sample;
inputting the plurality of first image samples to be detected and the first candidate key points corresponding to each first image sample to be detected into a target self-encoding network to obtain a plurality of target generation image samples;
obtaining a generated video sequence based on the plurality of target generation image samples;
obtaining a stability result by using a target video stability judging network according to the generated video sequence and the video sequences corresponding to the plurality of first to-be-detected image samples included in the video samples;
updating parameters of the candidate key point detection network according to the stability results of the image samples generated by the targets, and returning to execute the step of inputting the first to-be-detected image samples into the candidate key point detection network to obtain first candidate key points corresponding to each first to-be-detected image sample; and determining the candidate key point detection network as a target key point detection network until the stability result meets a first preset condition.
2. The method of claim 1, wherein the target self-encoded network comprises: a first encoder, a second encoder and a decoder;
inputting the plurality of first to-be-detected image samples and the first candidate keypoints corresponding to each first to-be-detected image sample into a target self-encoding network to obtain a plurality of target-generated image samples, including:
for each first image sample to be detected, performing the following steps to obtain a plurality of target generation image samples:
performing first encoding processing on a first candidate key point corresponding to the first image sample to be detected by using a first encoder to obtain a key point feature corresponding to the first candidate key point;
performing second encoding processing on the first image sample to be detected by using a second encoder to obtain image characteristics corresponding to the first image sample to be detected;
and obtaining a target generation image sample corresponding to the first image sample to be detected by using the decoder according to the key point characteristics and the image characteristics.
3. The method of claim 2, wherein the target self-coding network is obtained by training:
inputting a plurality of second image samples to be detected containing second target objects into a first initial key point detection network to obtain first initial key points corresponding to each second image sample to be detected respectively, wherein the second image samples to be detected contain the image samples to be detected with the second target objects not labeled with key point information;
inputting a plurality of second image samples to be detected and the first initial key points corresponding to each second image sample to be detected into a candidate self-coding network to obtain a candidate generated image;
inputting the candidate generated image into a second initial key point detection network to obtain a second initial key point corresponding to the candidate generated image, wherein the first initial key point detection network is the same as the second initial key point detection network;
performing mode consistency judgment according to the first initial key point and the second initial key point to obtain a judgment result;
and updating parameters of the candidate self-coding network according to the judgment result, returning to execute the step of inputting a plurality of second image samples to be detected containing second target objects into a first initial key point detection network to obtain first initial key points corresponding to each second image sample to be detected until the judgment result meets a second preset condition, and determining the candidate self-coding network as the target self-coding network.
4. The method according to claim 3, wherein the performing mode consistency judgment according to the first initial key point and the second initial key point to obtain a judgment result comprises:
generating a first heatmap according to the first initial key point;
generating a second heatmap according to the second initial key point;
acquiring key point features corresponding to the first heatmap, and acquiring key point features corresponding to the second heatmap;
and obtaining the judgment result according to the key point features corresponding to the first heatmap and the key point features corresponding to the second heatmap.
5. The method according to claim 3 or 4, wherein the candidate self-coding networks are obtained by:
and training a plurality of third image samples to be detected containing third target objects to obtain the candidate self-coding network, wherein the third target objects in the plurality of third image samples to be detected are marked with key point information.
6. The method according to claim 5, wherein the training with a plurality of third image samples to be detected containing a third target object to obtain the candidate self-coding network comprises:
performing first encoding processing on target keypoints labeled on the third image sample to be detected by using a first encoder to obtain target keypoint features;
performing second encoding processing on the third image sample to be detected by using a second encoder to obtain target image features corresponding to the third image sample to be detected;
obtaining a target generated image according to the target keypoint features and the target image features by using the decoder;
and training an initial self-coding network based on the third image sample to be detected and the target generated image until the initial self-coding network converges, to obtain the candidate self-coding network.
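Claim 6 describes a two-encoder/one-decoder self-coding network: one branch encodes the labeled keypoints, the other encodes the image, and a decoder reconstructs the image from the concatenated features. As a loose illustration of that data flow only (random linear maps stand in for the learned encoders and decoder, and all dimensions are invented), one might write:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear stand-ins for the two encoders and the decoder.
W_kp  = rng.normal(size=(16, 10))   # first encoder: keypoint coords -> keypoint features
W_img = rng.normal(size=(16, 64))   # second encoder: flattened image -> image features
W_dec = rng.normal(size=(64, 32))   # decoder: concatenated features -> generated image

def forward(keypoints, image):
    kp_feat  = W_kp @ keypoints               # target keypoint features
    img_feat = W_img @ image                  # target image features
    code = np.concatenate([kp_feat, img_feat])
    return W_dec @ code                       # target generated image

image     = rng.normal(size=64)
keypoints = rng.normal(size=10)
generated = forward(keypoints, image)
loss = float(np.mean((generated - image) ** 2))  # reconstruction objective
print(generated.shape, loss >= 0.0)
```

Training the initial self-coding network, per the claim, would minimize this reconstruction loss between the third image sample and the generated image until convergence.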
7. The method according to any one of claims 1 to 4, wherein the candidate keypoint detection network is obtained by training with a plurality of fourth image samples to be detected containing a fourth target object, wherein the fourth target object in each of the plurality of fourth image samples to be detected is labeled with keypoint information.
8. The method according to claim 1, wherein the obtaining, by using the target video stability determination network, the stability result according to the generated video sequence and the video sequence corresponding to the plurality of first image samples to be detected included in the video sample comprises:
performing three-dimensional convolution on a plurality of target generated images included in the generated video sequence by using a spatio-temporal convolutional network to extract a first spatio-temporal feature, and performing a two-dimensional convolution operation on the first spatio-temporal feature to extract a first deepened spatial feature;
performing three-dimensional convolution on the plurality of first image samples to be detected included in the video sample by using the spatio-temporal convolutional network to extract a second spatio-temporal feature, and performing a two-dimensional convolution operation on the second spatio-temporal feature to extract a second deepened spatial feature;
and obtaining the stability result according to the first spatio-temporal feature, the first deepened spatial feature, the second spatio-temporal feature and the second deepened spatial feature.
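The stability network of claim 8 stacks a 3-D convolution (extracting spatio-temporal features from the clip) with a 2-D convolution (deepening the spatial features). A naive, loop-based sketch of those two operations, assuming single-channel valid-mode convolutions and an arbitrary averaging kernel:

```python
import numpy as np

def conv3d(video, kernel):
    """Valid-mode 3-D convolution over a (T, H, W) clip: spatio-temporal features."""
    kt, kh, kw = kernel.shape
    T, H, W = video.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(video[t:t+kt, i:i+kh, j:j+kw] * kernel)
    return out

def conv2d(frame, kernel):
    """Valid-mode 2-D convolution deepening the spatial features of one frame."""
    kh, kw = kernel.shape
    H, W = frame.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i+kh, j:j+kw] * kernel)
    return out

clip = np.random.default_rng(0).normal(size=(8, 16, 16))   # generated video sequence
st_feat = conv3d(clip, np.ones((3, 3, 3)) / 27)            # first spatio-temporal feature
deep_feat = conv2d(st_feat[0], np.ones((3, 3)) / 9)        # first deepened spatial feature
print(st_feat.shape, deep_feat.shape)   # (6, 14, 14) (12, 12)
```

The same two passes would run on the original video sample to obtain the second spatio-temporal and second deepened spatial features, and the stability result would then compare the two feature sets.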
9. The method according to any one of claims 1 to 4, wherein the target object is a human face.
10. A method for detecting keypoints of a target object, characterized by comprising:
acquiring a video to be detected, wherein the video to be detected comprises: a plurality of first images to be detected containing a first target object;
and detecting the plurality of first images to be detected by using a target keypoint detection network to obtain target object keypoints in each first image to be detected, wherein the target keypoint detection network is obtained by the method for training a keypoint detection model of a target object according to any one of claims 1 to 9.
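At inference time, claim 10 reduces to running the trained detector frame by frame over the video to be detected. A minimal illustrative sketch (not part of the claims; `target_keypoint_network` is a hypothetical callable standing in for the trained network):

```python
def detect_video_keypoints(frames, target_keypoint_network):
    """Return the target object keypoints for each first image to be detected."""
    return [target_keypoint_network(frame) for frame in frames]

# Usage with a dummy detector that returns one (x, y) keypoint per frame.
keypoints_per_frame = detect_video_keypoints([0, 1, 2], lambda f: [(f, f)])
print(keypoints_per_frame)  # [[(0, 0)], [(1, 1)], [(2, 2)]]
```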
11. A training device for a keypoint detection model of a target object, characterized by comprising:
an obtaining module, configured to obtain a video sample, wherein the video sample comprises: a plurality of first image samples to be detected containing a first target object, and the plurality of first image samples to be detected include image samples in which the first target object is not labeled with keypoint information;
a processing module, configured to input the plurality of first image samples to be detected into a candidate keypoint detection network to obtain a first candidate keypoint corresponding to each first image sample to be detected;
the processing module is further configured to input the plurality of first image samples to be detected and the first candidate keypoint corresponding to each first image sample to be detected into a target self-coding network to obtain a plurality of target generated image samples;
the processing module is further configured to obtain a generated video sequence based on the plurality of target generated image samples, and to obtain a stability result by using a target video stability determination network according to the generated video sequence and the video sequence corresponding to the plurality of first image samples to be detected included in the video sample;
the processing module is further configured to update parameters of the candidate keypoint detection network according to the stability result, and return to inputting the plurality of first image samples to be detected into the candidate keypoint detection network to obtain the first candidate keypoint corresponding to each first image sample to be detected, until the stability result meets a first preset condition, and determine the candidate keypoint detection network as the target keypoint detection network.
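The iterative scheme of claims 1 and 11 — detect keypoints, generate a video sequence, score its stability, update the detector, repeat until the first preset condition is met — can be sketched informally as follows. All function names (`detect`, `generate`, `stability_score`, `update`) are hypothetical stand-ins for the candidate keypoint network, the target self-coding network, the stability determination network, and the parameter update, respectively.

```python
def train_keypoint_detector(samples, detect, generate, stability_score,
                            update, threshold, max_iters=100):
    """Iterate detect -> generate -> stability check until the
    stability result meets the first preset condition."""
    score = stability_score([], samples)
    for _ in range(max_iters):
        keypoints = [detect(s) for s in samples]              # first candidate keypoints
        generated = [generate(s, k) for s, k in zip(samples, keypoints)]
        score = stability_score(generated, samples)           # compare the two sequences
        if score >= threshold:                                # first preset condition
            return score          # candidate network is now the target network
        update(score)             # adjust detector parameters and loop again
    return score
```

In a real system the update step would be a gradient step on the detector's parameters; here it is deliberately abstract.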
12. An apparatus for detecting keypoints of a target object, characterized by comprising:
an acquisition module, configured to acquire a video to be detected, wherein the video to be detected comprises: a plurality of first images to be detected containing a first target object;
a processing module, configured to detect the plurality of first images to be detected by using a target keypoint detection network to obtain target object keypoints in each first image to be detected, wherein the target keypoint detection network is obtained by the method for training a keypoint detection model of a target object according to any one of claims 1 to 9.
13. A computer device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9 or the steps of the method according to claim 10.
14. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9 or the steps of the method according to claim 10.
CN202110986015.XA 2021-08-26 2021-08-26 Method and equipment for training detection model of key points of target object and detection method and equipment Active CN113436064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110986015.XA CN113436064B (en) 2021-08-26 2021-08-26 Method and equipment for training detection model of key points of target object and detection method and equipment

Publications (2)

Publication Number Publication Date
CN113436064A CN113436064A (en) 2021-09-24
CN113436064B (en) 2021-11-09

Family

ID=77798018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110986015.XA Active CN113436064B (en) 2021-08-26 2021-08-26 Method and equipment for training detection model of key points of target object and detection method and equipment

Country Status (1)

Country Link
CN (1) CN113436064B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533721A * 2019-08-27 2019-12-03 Hangzhou Normal University An indoor object 6D pose estimation method based on an enhanced autoencoder
CN110868598A * 2019-10-17 2020-03-06 Shanghai Jiao Tong University Video content replacement method and system based on an adversarial generative network
CN111523511A * 2020-05-08 2020-08-11 Hefei Institutes of Physical Science, Chinese Academy of Sciences Video image Chinese wolfberry branch detection method for a Chinese wolfberry harvesting and clamping device
CN112308770A * 2020-12-29 2021-02-02 Beijing Century TAL Education Technology Co Ltd Portrait conversion model generation method and portrait conversion method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10154281B2 (en) * 2016-01-22 2018-12-11 Mitsubishi Electric Research Laboratories, Inc. Method and apparatus for keypoint trajectory coding on compact descriptor for video analysis
CN110738071A * 2018-07-18 2020-01-31 Zhejiang Zhongzheng Intelligent Technology Co., Ltd. A face algorithm model training method based on deep learning and transfer learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zheng, Jingye et al.; "Boundary Adjusted Network Based on Cosine Similarity for Temporal Action Proposal Generation"; Neural Processing Letters; 2021-05-31; vol. 53, no. 4; pp. 2813-2828 *
Huang, Huaibo; "Face Image Synthesis and Analysis Based on Generative Models"; China Doctoral Dissertations Full-text Database (Information Science and Technology); 2020-02-15; no. 2; pp. I138-68 *

Similar Documents

Publication Publication Date Title
CN108549836B (en) Photo copying detection method, device, equipment and readable storage medium
JP5711387B2 (en) Method and apparatus for comparing pictures
CN107316035A Object recognition method and device based on a deep learning neural network
CN108921782A Image processing method, device and storage medium
CN111931567B (en) Human body identification method and device, electronic equipment and storage medium
JP2018028899A5 (en)
CN107220931A A high dynamic range image reconstruction method based on grayscale maps
WO2023035425A1 (en) Auto-encoder training method and component, and method and component for detecting abnormal image
CN111784624B (en) Target detection method, device, equipment and computer readable storage medium
Ulutas et al. Frame duplication/mirroring detection method with binary features
JP2021068056A (en) On-road obstacle detecting device, on-road obstacle detecting method, and on-road obstacle detecting program
CN112084939A (en) Image feature data management method and device, computer equipment and storage medium
Wu et al. Visual structural degradation based reduced-reference image quality assessment
Lecca et al. Comprehensive evaluation of image enhancement for unsupervised image description and matching
CN110245660B (en) Webpage glance path prediction method based on saliency feature fusion
CN116977674A (en) Image matching method, related device, storage medium and program product
CN113436064B (en) Method and equipment for training detection model of key points of target object and detection method and equipment
KR20210076660A (en) Method and Apparatus for Stereoscopic Image Quality Assessment Based on Convolutional Neural Network
CN115567736A Video content detection method, apparatus, device and storage medium
Saxena et al. Video inpainting detection and localization using inconsistencies in optical flow
KR102101481B1 Apparatus for learning portable security image based on artificial intelligence and method for the same
KR101394473B1 (en) Method for detecting moving object and surveillance system thereof
JP4879257B2 (en) Moving object tracking device, moving object tracking method, and moving object tracking program
Li et al. Detection of partially occluded pedestrians by an enhanced cascade detector
WO2010070128A1 (en) Method for multi-resolution motion estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant