
CN103413145A - Articulation point positioning method based on depth image - Google Patents

Articulation point positioning method based on depth image

Info

Publication number
CN103413145A
Authority
CN
China
Prior art keywords
point
feature
node
depth
joint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103742367A
Other languages
Chinese (zh)
Other versions
CN103413145B (en)
Inventor
刘亚洲
张艳
孙权森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN201310374236.7A priority Critical patent/CN103413145B/en
Publication of CN103413145A publication Critical patent/CN103413145A/en
Application granted granted Critical
Publication of CN103413145B publication Critical patent/CN103413145B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an articulation point positioning method based on a depth image. The method comprises a training process and a recognition process. The training process comprises: (1) calculating random features of the training samples; (2) training a decision tree classifier with those features. The recognition process comprises: (3) calculating random features of the test sample; (4) classifying the pixels of the target with the decision tree classifier to obtain the parts of the target belonging to different classes; (5) calculating the articulation point position of each part. The method reflects the local gradient information around each pixel, is computationally efficient, strengthens the rotation invariance of the features, and improves the accuracy of target recognition.

Description

Articulation point positioning method based on depth image
Technical field
The present invention relates to the fields of computer vision, pattern recognition and human-computer interaction, and more particularly to an articulation point positioning method based on a depth image.
Background technology
An articulation point positioning method based on a depth image is a method that determines the positions of the articulation points of a target contained in a depth image. The target here refers specifically to a human hand or a human body. By determining the articulation point positions of the target, the skeleton structure can be inferred, the computer can then respond to the target, and the goals of human-computer interaction or automatic processing and recognition by the computer are finally achieved.
A depth image takes the form of a two-dimensional grayscale image. Unlike a traditional gray-level image, however, the information carried by each pixel of a depth image is the distance from the target object to the camera, so the pixel values of a depth image are called depth values. Depth images have the following advantages: 1) they are not affected by factors such as illumination and shadow; 2) they directly provide the three-dimensional information of the target object, which greatly simplifies problems such as three-dimensional reconstruction, recognition and localization of objects.
Articulation point positioning involves two key steps: learning a classifier and locating the articulation points. Learning the classifier first depends on feature selection; whether the selected features have strong descriptive power for the current target directly determines the success of target recognition. Then, on the basis of the chosen features, a set of rules for classifying the current target is determined. Locating the articulation points means that, after the learned classifier has completed the classification and recognition of the target, the position of the articulation point is found within each part of the target.
In feature extraction from traditional visible-light images, gradient features and point features are the two common classes of features. Gradient features include the Canny operator, the Laplacian-of-Gaussian operator and the histogram of oriented gradients (HOG). The first two operators can detect edge points in an image reasonably well, but they tend to split the image into several disconnected regions. HOG is a classical method in human detection and recognition; its advantages are high accuracy and good detection performance, but its dimensionality is high and its computational cost is large, so real-time processing is hard to guarantee. On the other hand, common point features such as corner points and blob centers have low dimensionality, but they struggle to adapt to the highly variable shape of the human body against cluttered backgrounds, and they also require operations such as clustering, which increases the difficulty of the problem and leads to low detection accuracy. Therefore, using gradient features or point features alone is not a good solution.
Summary of the invention
The technical problem to be solved by the present invention is the poor real-time performance or low accuracy of target recognition caused by using a single type of feature as the recognition basis in the above target recognition techniques. The invention proposes a method that extracts random features from depth data, uses them to train a classifier, and finally completes the articulation point positioning.
The technical solution that realizes the object of the invention is as follows: the method comprises a training process and a recognition process.
The training process comprises the following steps:
1) calculating the random features of the training samples;
2) training a decision tree classifier according to the features.
The recognition process comprises the following steps:
3) calculating the random features of the test sample;
4) classifying each pixel of the target with the decision tree classifier to obtain the parts of the target belonging to different classes;
5) calculating the articulation point position of each part.
In the above method, the training samples in step 1) are depth images that retain only the target and carry ground-truth annotations.
In the above method, step 1) comprises the following specific steps:
11) calculating the centroid c(cx, cy) of the target with formula (1):
cx = (1/k)·Σ_{i=1}^{k} x_i,   cy = (1/k)·Σ_{i=1}^{k} y_i   (1)
where k is the total number of pixels on the target and (x_i, y_i) are the coordinates of each pixel on the target, i = 1, 2, ..., k;
12) taking an annotated point as the starting point, generating two different reference points pointed to by two random vectors, where the length of a random vector is rz = r1·α/valz, r1 is a randomly generated length, α is a coefficient, and valz is the depth value of the starting point; the angle of the random vector is β = θ + ο, where θ is the angle between the horizontal axis and the line connecting the starting point and the centroid, and ο is a randomly generated angle; if at least one of the two reference points is not on the image, the feature value of the starting point is 1; otherwise computing the depth difference of the two reference points, and if the depth difference is greater than a threshold chosen arbitrarily from a self-defined threshold set, the feature value of the starting point is 1, otherwise 0;
13) for each annotated point, repeating step 12) fn times to generate fn features, and numbering the features from 1 to fn in the order in which they were generated.
In the above method, step 2) comprises the following specific steps:
21) taking the root node of the decision tree classifier as the current node;
22) computing the information gain of each feature at the current node:
Gain(ε) = entropy(T) − Σ_{i=1}^{m} (|T_i|/|T|)·entropy(T_i),
where ε is the feature number, ε = 1, 2, ..., fn, T is the sample set of annotated points at the current node, T_i is a subset of the sample set, and m is the number of subsets; since the annotated points are divided into two subsets according to whether the value of feature ε is 0 or 1, m = 2; entropy(T) = −Σ_{j=1}^{s} p(C_j, T)·log2 p(C_j, T) is the information entropy of the sample set, p(C_j, T) is the frequency of class C_j in the sample set T, and s is the number of classes in T;
23) taking the number of the feature with the largest information gain as the number of the current node;
24) if the value of the feature whose number equals the current node's number is 0 for an annotated point, assigning that point to the left branch node of the current node, otherwise to the right branch node;
25) taking a branch node as the current node; if the information entropy of the current node is less than the entropy threshold hτ, or the number of levels of the decision tree reaches the maximum depth, or the number of annotated points at the current node is less than the minimum number of sample points small, stopping the split and taking the current node as a leaf node, otherwise repeating steps 22)~25);
26) computing the class probability distribution of the annotated points at each leaf node.
In the above method, the test sample in step 3) is a depth image from which the background has been removed, retaining only the target.
In the above method, step 3) comprises the following specific steps:
31) calculating the centroid of the target with formula (1);
32) taking a point on the target as the starting point, generating two different reference points pointed to by two random vectors, where the length of a random vector is rz = r1·α/valz, r1 is a randomly generated length, α is a coefficient, and valz is the depth value of the starting point; the angle of the random vector is β = θ + ο, where θ is the angle between the horizontal axis and the line connecting the starting point and the centroid, and ο is a randomly generated angle; if at least one of the two reference points is not on the image, the feature value of the starting point is 1; otherwise computing the depth difference of the two reference points, and if the depth difference is greater than a threshold chosen arbitrarily from a self-defined threshold set, the feature value of the starting point is 1, otherwise 0;
33) for each point, repeating step 32) fn times to generate fn features, and numbering the features from 1 to fn in the order in which they were generated.
In the above method, step 4) comprises the following specific steps:
41) taking the root node of the decision tree as the current node;
42) if the value of the pixel's feature whose number equals the current node's number is 0, assigning the pixel to the left branch node of the current node, otherwise to the right branch node;
43) taking the branch node to which the pixel has been assigned as the current node, and repeating steps 42) and 43) until the pixel reaches a leaf node;
44) if the maximum class probability at the leaf node is greater than the probability threshold pτ, assigning the class with the maximum probability to the pixel, otherwise discarding the pixel.
In the above method, step 5) comprises the following specific steps:
51) starting from each point q_i of the same class, finding the corresponding candidate articulation point position p_i, i = 1, 2, ..., r, where r is the total number of points of that class;
52) screening all candidate articulation point positions to find the articulation point position.
In the above method, step 51) comprises the following specific steps:
511) generating a rectangular feature region of size w × h centered on q_i;
512) calculating, with formula (1), the centroid of the points in the feature region that belong to the same class as the center point;
513) calculating the distance between the center point and the centroid;
514) if the distance is not greater than the distance threshold dτ, taking the centroid as the candidate articulation point position; otherwise generating a rectangular feature region of size w × h centered on the centroid and repeating steps 512)~514); if after a sufficient number of repetitions, for example 30, no candidate articulation point position has been found, taking the centroid obtained in the last iteration as the candidate articulation point position p_i.
In the above method, step 52) comprises the following specific steps:
521) taking p_1 as the initial scoring object and repeating the next step for p_i in the order i = 2, 3, ..., r;
522) in the order in which they became scoring objects, computing the distance between each scoring object and p_i; when the distance between a scoring object and p_i is less than the threshold disτ, adding 1 to the score of that scoring object and not computing the distances between p_i and the remaining scoring objects; if the distances between p_i and all scoring objects are not less than disτ, taking p_i as the next scoring object;
523) selecting the scoring object with the highest score tops; if tops is greater than the score threshold scoτ, that scoring object is the articulation point position; otherwise reducing scoτ until the articulation point position is found.
Compared with the prior art, the present invention has notable advantages. It exploits the properties of the depth image and uses the depth difference of two random points around a pixel as a random feature of that pixel; this feature reflects the local gradient information around the pixel and can be regarded as a good combination of a point feature and a gradient feature. The feature involves only simple arithmetic on pixel values, so its computational efficiency is high, which favors real-time processing. In addition, the deviation angle of the pixel relative to the target centroid is added to the random angle of the feature, which strengthens the rotational invariance of the feature and improves the accuracy of target recognition.
The accompanying drawing explanation
Fig. 1 is a flow chart of the articulation point positioning method based on a depth image.
Fig. 2 is a schematic diagram of a hand with ground-truth annotations.
Fig. 3 is a schematic diagram of reference point generation.
Fig. 4 is a schematic diagram of classifying pixels with the decision tree classifier.
Fig. 5 is a schematic diagram of the hand parts after classification.
Fig. 6 is a schematic diagram of the articulation point positions of the hand.
Embodiment
The overall workflow of the present invention is shown in Figure 1. The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The data source of the present invention is a depth image obtained from an image acquisition device, which can be a binocular vision device or a structured-light projection device. The value of each pixel of the depth image represents the distance from the corresponding object point to the projection center of the camera. From the depth image, the shape information and the three-dimensional position information of the object can be obtained.
The present invention uses depth maps from which the background has been removed and in which only the hand target is retained as samples. On this basis, depth maps with ground-truth annotations serve as training samples, and depth maps without ground-truth annotations serve as test samples. The 11 annotated articulation points of the hand are: the junction of the palm and the wrist, and the fingertip and the finger root of each of the five fingers. The labels 0~10 denote these 11 articulation points, i.e., the different classes, and a sufficient number of points are annotated for each articulation point, as shown in Figure 2.
The articulation point positioning method of the present invention comprises two main steps: training a classifier, and applying the classifier to recognize the target.
Training the classifier means learning the classification rules of the target from known training samples; it comprises two processes: calculating the random features of the training samples and training a decision tree classifier with those features.
Step 1: calculate the random features of the training samples.
Step 11: calculate the centroid of the target, and the angle between the horizontal axis and the line connecting each point on the target with the centroid.
The centroid c(cx, cy) of the target is calculated with formula (1):
cx = (1/k)·Σ_{i=1}^{k} x_i,   cy = (1/k)·Σ_{i=1}^{k} y_i   (1)
where k is the total number of pixels on the target and (x_i, y_i) are the coordinates of each pixel on the target, i = 1, 2, ..., k.
The angle between the horizontal axis and the line connecting each point p(x, y) of the target with c(cx, cy) is calculated with formula (2):
θ=arctan[(y-cy)/(x-cx)] (2)
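The following Python sketch illustrates formulas (1) and (2); it is an illustrative assumption of this description rather than part of the patented method, and it assumes the target is given as a list of (x, y) pixel coordinates. math.atan2 is used instead of a bare arctangent so that points directly above or below the centroid do not cause a division by zero.

```python
import math

def centroid(points):
    # Formula (1): centroid (cx, cy) of all pixels on the target.
    k = len(points)
    cx = sum(x for x, _ in points) / k
    cy = sum(y for _, y in points) / k
    return cx, cy

def angle_to_centroid(x, y, cx, cy):
    # Formula (2): angle between the horizontal axis and the line
    # joining the point (x, y) with the centroid (cx, cy).
    return math.atan2(y - cy, x - cx)
```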
Step 12: read the articulation point annotations of the target. For each annotated point l(x, y), compute fn features; fn may range from 1000 to 10000. Each feature is computed as follows:
Step 121: taking l(x, y) as the starting point, randomly generate two reference points.
Either generated reference point is denoted q(qx, qy); q(qx, qy) is the point pointed to by a vector that starts at l(x, y) and has random length rz and random angle β, namely
qx = x + rz·cosβ,   qy = y + rz·sinβ   (3)
where rz = r1·α/valz, r1 is a randomly generated length that may take any value in 0~1 each time, α is a coefficient whose value may range from 1000 to 10000, and valz is the depth value of l(x, y); β = θ + ο, where θ is the angle between the horizontal axis and the line connecting l(x, y) with the centroid c(cx, cy), calculated with formula (2), and ο is a randomly generated angle that may take any value in 0°~360° each time. The reference points are illustrated in Fig. 3.
Step 122: determine the feature value of l(x, y) according to the reference points.
If both reference points are on the image, let their depth values be d1 and d2 and compute the depth difference dd = d1 − d2. The feature value f of l(x, y) is determined by
f = 1 if dd > t, and f = 0 otherwise   (4)
where t is a randomly generated threshold that may be chosen arbitrarily from a self-defined threshold set.
If at least one of the two reference points is not on the image, f = 1.
Step 123: if this feature is the ε-th one computed, the feature of l(x, y) is numbered ε.
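A compact Python sketch of Steps 121~123 follows. It is a non-authoritative illustration: the function name, the depth-map layout and the way the threshold set is passed in are assumptions of this sketch, not values fixed by the patent; calling it fn times for a point yields the features numbered 1 to fn.

```python
import math
import random

def random_feature(depth, x, y, cx, cy, alpha, thresholds):
    """One binary random feature of the point (x, y) on a depth map.

    depth      -- 2D list of depth values, indexed as depth[y][x]
    (cx, cy)   -- target centroid from formula (1)
    alpha      -- coefficient, e.g. in the range 1000~10000
    thresholds -- self-defined threshold set from which t is drawn
    """
    h, w = len(depth), len(depth[0])
    valz = depth[y][x]
    theta = math.atan2(y - cy, x - cx)            # formula (2)

    refs = []
    for _ in range(2):                            # two random reference points
        r1 = random.random()                      # random length in 0~1
        rz = r1 * alpha / valz                    # longer for nearer points
        beta = theta + math.radians(random.uniform(0.0, 360.0))
        qx = int(round(x + rz * math.cos(beta)))  # formula (3)
        qy = int(round(y + rz * math.sin(beta)))
        refs.append((qx, qy))

    # If at least one reference point is outside the image, the feature is 1.
    if any(not (0 <= qx < w and 0 <= qy < h) for qx, qy in refs):
        return 1
    (q1x, q1y), (q2x, q2y) = refs
    dd = depth[q1y][q1x] - depth[q2y][q2x]        # depth difference dd = d1 - d2
    t = random.choice(thresholds)                 # t from the threshold set
    return 1 if dd > t else 0                     # formula (4)
```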
Step 2: train a decision tree classifier with the features extracted in Step 1.
The root node of the decision tree classifier corresponds to the training samples, i.e., the set of annotated points, and each branch node corresponds to a subset of that set. Taking the root node as the current node, the following steps are carried out:
Step 21: compute the information gain of each feature at the current node with the following formula:
Gain(ε) = entropy(T) − Σ_{i=1}^{m} (|T_i|/|T|)·entropy(T_i)   (5)
where ε is the feature number, ε = 1, 2, ..., fn, T is the sample set of annotated points at the current node, T_i is a subset of the sample set, and m is the number of subsets; since the annotated points are divided into two subsets according to whether the value of feature ε is 0 or 1, m = 2; entropy(T) = −Σ_{j=1}^{s} p(C_j, T)·log2 p(C_j, T) is the information entropy of the sample set, p(C_j, T) is the frequency of class C_j in the sample set T, and s is the number of classes in T; here s = 11.
Step 22: split the current node into branch nodes according to the feature with the largest information gain.
The number of the feature with the largest information gain becomes the number of the current node. If the value of the feature whose number equals the current node's number is 0 for an annotated point, that point is assigned to the left branch node of the current node, otherwise to the right branch node.
Step 23: take each branch node of the current node as the current node in turn, and check whether it meets any of the following stopping conditions:
A) the information entropy of the node is less than the entropy threshold hτ; here hτ may be set to 0.5;
B) the number of levels of the decision tree reaches the maximum depth; here depth may range from 10 to 30;
C) the number of annotated points at the node is less than the minimum number of sample points small; here small may range from 100 to 1000.
If the current node does not meet any stopping condition, Steps 22~23 are repeated; if it does, the current node becomes a leaf node, and the class distribution of the annotated points at the leaf node is converted into probabilities.
Suppose k leaf nodes are obtained when the training of the decision tree classifier finishes, and the number of annotated points at the i-th leaf node is n_i, i = 1, 2, ..., k. The number of annotated points of class j at the i-th leaf node is n_ij, j = 0, 1, ..., 10, so the probability of class j at the i-th leaf node is P_ij = n_ij / n_i. The maximum class probability at this leaf node is P(i, j_max) = max{P_ij, j = 0, 1, ..., 10}, where j_max denotes the class with the maximum probability.
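A minimal Python sketch of Steps 21~23 is given below. It assumes each training sample is a pair (features, label), where features is the 0/1 vector of its fn random features; the function name build_tree and the dictionary-based node representation are inventions of this sketch, not part of the patent.

```python
import math
from collections import Counter

def entropy(samples):
    # entropy(T) = -sum_j p(Cj, T) * log2 p(Cj, T)
    counts = Counter(label for _, label in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def build_tree(samples, fn, max_depth, h_tau=0.5, small=100, level=0):
    """samples: list of (features, label); features is a 0/1 list of length fn."""
    # Stopping conditions A)~C): low entropy, maximum depth, too few samples.
    if entropy(samples) < h_tau or level >= max_depth or len(samples) < small:
        counts = Counter(label for _, label in samples)
        probs = {j: c / len(samples) for j, c in counts.items()}  # P_ij = n_ij / n_i
        return {"leaf": True, "probs": probs}

    # Step 21: information gain of every feature; Steps 22~23: split on the best.
    base = entropy(samples)
    best_eps, best_gain = None, 0.0
    for eps in range(fn):
        left = [s for s in samples if s[0][eps] == 0]
        right = [s for s in samples if s[0][eps] == 1]
        if not left or not right:
            continue
        gain = base - (len(left) / len(samples)) * entropy(left) \
                    - (len(right) / len(samples)) * entropy(right)
        if gain > best_gain:
            best_eps, best_gain = eps, gain
    if best_eps is None:  # no split improves the entropy: make a leaf node
        counts = Counter(label for _, label in samples)
        return {"leaf": True,
                "probs": {j: c / len(samples) for j, c in counts.items()}}

    left = [s for s in samples if s[0][best_eps] == 0]
    right = [s for s in samples if s[0][best_eps] == 1]
    return {"leaf": False, "eps": best_eps,
            "left": build_tree(left, fn, max_depth, h_tau, small, level + 1),
            "right": build_tree(right, fn, max_depth, h_tau, small, level + 1)}
```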
Next, the decision tree classifier is used to classify and recognize the test sample. Classification and recognition comprises three processes: calculating the random features of the test sample, classifying the image with the decision tree classifier, and calculating the articulation point positions.
Step 3: calculate the random features of the test sample.
Step 31: calculate the centroid of the target with formula (1), and calculate, with formula (2), the angle between the horizontal axis and the line connecting each pixel of the target with the centroid.
Step 32: for each pixel p(x, y) of the target, compute fn features. Each feature is computed as follows:
Step 321: taking p(x, y) as the starting point, randomly generate two reference points.
Either generated reference point is denoted q(qx, qy); q(qx, qy) is the point pointed to by a vector that starts at p(x, y) and has random length rz and random angle β, and it is calculated with formula (3), where rz = r1·α/valz, r1 is a randomly generated length that may take any value in 0~1 each time, α is a coefficient whose value may range from 1000 to 10000, and valz is the depth value of p(x, y); β = θ + ο, where θ is the angle between the horizontal axis and the line connecting p(x, y) with the centroid c(cx, cy), calculated with formula (2), and ο is a randomly generated angle that may take any value in 0°~360° each time.
Step 322: determine the feature value of p(x, y) according to the reference points, in the same way as Step 122.
Step 323: if this feature is the ε-th one computed, the feature of p(x, y) is numbered ε.
Step 4: classify each pixel p(x, y) of the target with the decision tree classifier; the classification process is shown in Figure 4.
Step 41: take the root node of the decision tree as the current node and perform the following steps.
Step 42: depending on whether the value of the pixel's feature numbered with the current node's number is 0 or 1, assign the pixel to the left or right branch node of the current node, and take that branch node as the current node.
Step 43: repeat Step 42 until the current node is a leaf node. If P(i, j_max) of this leaf node is greater than the probability threshold pτ, j_max is the class label of the pixel; otherwise the pixel is discarded.
After all pixels of the target have been classified, the pixels belonging to the same class form one part of the hand. For pixels near the boundary between two parts the class is usually ambiguous; setting pτ ≥ 0.7 in Step 43 removes pixels whose class is not certain enough, so that to a greater extent the pixels of the same part belong to the same class and the pixels of the same class lie within the same part, which simplifies the subsequent determination of the articulation point position of each part. Fig. 5 shows the hand parts after classification; the black lines on the boundaries between two parts and the regions without a numeric label denote the rejected pixels whose class was ambiguous.
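Walking a pixel through the trained tree (Steps 41~43) can be sketched as follows. The sketch assumes the dictionary-based nodes produced by the build_tree sketch above and an illustrative pτ of 0.7; neither is prescribed by the patent.

```python
def classify_pixel(tree, features, p_tau=0.7):
    """Return the class label of a pixel, or None if the pixel is rejected.

    features -- the pixel's 0/1 random-feature vector, indexed by feature number
    """
    node = tree
    while not node["leaf"]:
        # Step 42: go to the left child on feature value 0, right child on 1.
        node = node["left"] if features[node["eps"]] == 0 else node["right"]
    # Step 43: keep the pixel only if the leaf's best class is confident enough.
    j_max, p_max = max(node["probs"].items(), key=lambda kv: kv[1])
    return j_max if p_max > p_tau else None
```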
Step 5: calculate the specific articulation point position of each part.
Step 51: starting from each point q_i of the same class, find the corresponding candidate articulation point position p_i, i = 1, 2, ..., r, where r is the total number of points of that class.
Step 511: generate a rectangular feature region W(x, y, w, h) of size w × h centered on q_i.
If W(x, y, w, h) does not lie entirely on the image, the overlap of the image and W(x, y, w, h) is used as the feature region.
Step 512: within the feature region, calculate with formula (1) the centroid c(cx, cy) of all points belonging to the same class as the center point q_i.
Step 513: calculate the distance between q_i and c(cx, cy) with dis = sqrt((cx − x)² + (cy − y)²).
Step 514: if dis ≤ dτ, take c(cx, cy) as the candidate articulation point position p_i; if dis > dτ, generate a rectangular feature region W(cx, cy, w, h) centered on c(cx, cy) and repeat Steps 512)~514). If after a sufficient number of repetitions, for example 30, no candidate articulation point position has been found, take the centroid obtained in the last iteration as the candidate articulation point position p_i. Here dτ is a distance threshold whose value may range from 0.1 to 0.3.
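Steps 511~514 amount to a mean-shift-style iteration of a w × h window towards the centroid of same-class pixels. The sketch below is illustrative: the helper name find_candidate and the argument layout are assumptions of this sketch, and the window is effectively clipped to the available same-class points as in Step 511.

```python
import math

def find_candidate(q, class_points, w, h, d_tau, max_iter=30):
    """Slide a w x h window from q towards the centroid of same-class points.

    q            -- (x, y) starting point of this class
    class_points -- all (x, y) pixels belonging to the same class
    Returns the candidate articulation point position p_i.
    """
    cx, cy = q
    for _ in range(max_iter):
        # Steps 511~512: centroid of the same-class points inside the window.
        inside = [(x, y) for x, y in class_points
                  if abs(x - cx) <= w / 2 and abs(y - cy) <= h / 2]
        if not inside:
            break
        nx = sum(x for x, _ in inside) / len(inside)
        ny = sum(y for _, y in inside) / len(inside)
        # Steps 513~514: stop when the shift is no larger than d_tau.
        if math.hypot(nx - cx, ny - cy) <= d_tau:
            return (nx, ny)
        cx, cy = nx, ny
    return (cx, cy)  # last centroid if no convergence within max_iter
```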
Step 52: screen all candidate articulation point positions p_i to find the articulation point position.
Step 521: take p_1 as the initial scoring object and repeat the next step for p_i in the order i = 2, 3, ..., r.
Step 522: in the order in which they became scoring objects, compute the distance between each scoring object and p_i. When the distance between a scoring object and p_i is less than the threshold disτ, add 1 to the score of that scoring object and do not compute the distances between p_i and the remaining scoring objects. If the distances between p_i and all scoring objects are not less than disτ, take p_i as the next scoring object. Here disτ may be set to 2~4.
Step 523: select the scoring object with the highest score tops. If tops is greater than the score threshold scoτ, that scoring object is the articulation point position; otherwise reduce scoτ until the articulation point position is found. Here scoτ may be set to 2~4.
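The voting of Steps 521~523 can be sketched in Python as follows; the function name select_joint and the default threshold are assumptions of this sketch. Since lowering scoτ in Step 523 ultimately accepts the highest-scoring object, the sketch simply returns it.

```python
import math

def select_joint(candidates, dis_tau=3.0):
    """Pick the articulation point among candidate positions p_1..p_r.

    candidates -- list of (x, y) candidate positions in the order p_1, p_2, ...
    """
    scorers = [[candidates[0], 0]]          # Step 521: p_1 is the first scoring object
    for p in candidates[1:]:
        for scorer in scorers:              # Step 522: vote for the first scoring
            (sx, sy), _ = scorer            # object that is close enough to p_i
            if math.hypot(sx - p[0], sy - p[1]) < dis_tau:
                scorer[1] += 1
                break
        else:                               # no scoring object is close: p_i
            scorers.append([p, 0])          # becomes the next scoring object

    # Step 523: the scoring object with the highest score wins.
    best = max(scorers, key=lambda s: s[1])
    return best[0]                          # its position is the articulation point
```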
The positions of all articulation points found in this way are shown in Figure 6.

Claims (10)

1. An articulation point positioning method based on a depth image, characterized by comprising a training process and a recognition process:
the training process comprises the following steps:
1) calculating random features of training samples;
2) training a decision tree classifier according to the random features;
the recognition process comprises the following steps:
3) calculating random features of a test sample;
4) classifying each pixel of the target with the decision tree classifier to obtain the parts of the target belonging to different classes;
5) calculating the articulation point position of each part.

2. The articulation point positioning method based on a depth image according to claim 1, characterized in that the training samples in step 1) are a set of depth images that retain only the target and carry ground-truth annotations.

3. The articulation point positioning method based on a depth image according to claim 1 or 2, characterized in that step 1) comprises the following specific steps:
11) calculating the centroid c(cx, cy) of the target with formula (1):
cx = (1/k)·Σ_{i=1}^{k} x_i,   cy = (1/k)·Σ_{i=1}^{k} y_i   (1)
where k is the total number of pixels on the target and (x_i, y_i) are the coordinates of each pixel on the target, i = 1, 2, ..., k;
12) taking an annotated point as the starting point, generating two different reference points pointed to by two random vectors, where the length of a random vector is rz = r1·α/valz, r1 is a randomly generated length, α is a coefficient, and valz is the depth value of the starting point; the angle of the random vector is β = θ + ο, where θ is the angle between the horizontal axis and the line connecting the starting point and the centroid, and ο is a randomly generated angle; if at least one of the two reference points is not on the image, the feature value of the starting point is 1; if both reference points are on the image, computing the depth difference of the two reference points, and if the depth difference is greater than a threshold chosen arbitrarily from a self-defined threshold set, the feature value of the starting point is 1, otherwise 0;
13) for each annotated point, repeating step 12) fn times to generate fn features, and numbering the features from 1 to fn in the order in which they were generated.

4. The articulation point positioning method based on a depth image according to claim 1, characterized in that step 2) comprises the following specific steps:
21) taking the root node of the decision tree classifier as the current node;
22) computing the information gain of each feature at the current node:
Gain(ε) = entropy(T) − Σ_{i=1}^{m} (|T_i|/|T|)·entropy(T_i),
where ε is the feature number, ε = 1, 2, ..., fn, T is the sample set of annotated points at the current node, T_i is a subset of the sample set, and m is the number of subsets; since the annotated points are divided into two subsets according to whether the value of feature ε is 0 or 1, m = 2; entropy(T) = −Σ_{j=1}^{s} p(C_j, T)·log2 p(C_j, T) is the information entropy of the sample set, p(C_j, T) is the frequency of class C_j in the sample set T, and s is the number of classes in T;
23) taking the number of the feature with the largest information gain as the number of the current node;
24) if the value of the feature whose number equals the current node's number is 0 for an annotated point, assigning that point to the left branch node of the current node, otherwise to the right branch node;
25) taking a branch node as the current node; if the information entropy of the current node is less than the entropy threshold hτ, or the number of levels of the decision tree reaches the maximum depth, or the number of annotated points at the current node is less than the minimum number of sample points small, stopping the split and taking the current node as a leaf node, otherwise repeating steps 22)~25);
26) computing the class probability distribution of the annotated points at each leaf node.

5. The articulation point positioning method based on a depth image according to claim 1, characterized in that the test sample in step 3) is a set of depth images that retain only the target.

6. The articulation point positioning method based on a depth image according to claim 1 or 5, characterized in that step 3) comprises the following specific steps:
31) calculating the centroid of the target with formula (1);
32) taking a point on the target as the starting point, generating two different reference points pointed to by two random vectors, where the length of a random vector is rz = r1·α/valz, r1 is a randomly generated length, α is a coefficient, and valz is the depth value of the starting point; the angle of the random vector is β = θ + ο, where θ is the angle between the horizontal axis and the line connecting the starting point and the centroid, and ο is a randomly generated angle; if at least one of the two reference points is not on the image, the feature value of the starting point is 1; if both reference points are on the image, computing the depth difference of the two reference points, and if the depth difference is greater than a threshold chosen arbitrarily from a self-defined threshold set, the feature value of the starting point is 1, otherwise 0;
33) for each point, repeating step 32) fn times to generate fn features, and numbering the features from 1 to fn in the order in which they were generated.

7. The articulation point positioning method based on a depth image according to claim 1, characterized in that step 4) comprises the following specific steps:
41) taking the root node of the decision tree classifier as the current node;
42) if the value of the pixel's feature whose number equals the current node's number is 0, assigning the pixel to the left branch node of the current node, otherwise to the right branch node;
43) taking the branch node to which the pixel has been assigned as the current node, and repeating steps 42) and 43) until the pixel reaches a leaf node;
44) if the maximum class probability at the leaf node is greater than the probability threshold pτ, assigning the class with the maximum probability to the pixel, otherwise discarding the pixel.

8. The articulation point positioning method based on a depth image according to claim 1, characterized in that step 5) comprises the following specific steps:
51) starting from each point q_i of the same class, finding the corresponding candidate articulation point position p_i, i = 1, 2, ..., r, where r is the total number of points of that class;
52) screening all candidate articulation point positions to find the articulation point position.

9. The articulation point positioning method based on a depth image according to claim 1 or 8, characterized in that step 51) comprises the following specific steps:
511) generating a rectangular feature region of size w × h centered on q_i;
512) calculating, with formula (1), the centroid of the points in the feature region that belong to the same class as the center point;
513) calculating the distance between the center point and the centroid;
514) if the distance is not greater than the distance threshold dτ, taking the centroid as the candidate articulation point position; otherwise generating a rectangular feature region of size w × h centered on the centroid and repeating steps 512)~514); if after a sufficient number of repetitions no candidate articulation point position has been found, taking the centroid obtained in the last iteration as the candidate articulation point position p_i; the sufficient number of repetitions is 30 or more.

10. The articulation point positioning method based on a depth image according to claim 1 or 8, characterized in that step 52) comprises the following specific steps:
521) taking p_1 as the initial scoring object and repeating the next step for p_i in the order i = 2, 3, ..., r;
522) in the order in which they became scoring objects, computing the distance between each scoring object and p_i; when the distance between a scoring object and p_i is less than the threshold disτ, adding 1 to the score of that scoring object and not computing the distances between p_i and the remaining scoring objects; if the distances between p_i and all scoring objects are not less than disτ, taking p_i as the next scoring object;
523) selecting the scoring object with the highest score tops; if tops is greater than the score threshold scoτ, that scoring object is the articulation point position; otherwise reducing scoτ until the articulation point position is found.
CN201310374236.7A 2013-08-23 2013-08-23 Articulation point positioning method based on depth image Expired - Fee Related CN103413145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310374236.7A CN103413145B (en) 2013-08-23 2013-08-23 Articulation point positioning method based on depth image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310374236.7A CN103413145B (en) 2013-08-23 2013-08-23 Articulation point positioning method based on depth image

Publications (2)

Publication Number Publication Date
CN103413145A true CN103413145A (en) 2013-11-27
CN103413145B CN103413145B (en) 2016-09-21

Family

ID=49606152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310374236.7A Expired - Fee Related CN103413145B (en) 2013-08-23 2013-08-23 Articulation point positioning method based on depth image

Country Status (1)

Country Link
CN (1) CN103413145B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101359372A (en) * 2008-09-26 2009-02-04 腾讯科技(深圳)有限公司 Training method and device of classifier, and method apparatus for recognising sensitization picture
CN102411711A (en) * 2012-01-04 2012-04-11 山东大学 A Finger Vein Recognition Method Based on Personalized Weights

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104091152A (en) * 2014-06-30 2014-10-08 南京理工大学 Method for detecting pedestrians in big data environment
CN104050460A (en) * 2014-06-30 2014-09-17 南京理工大学 Pedestrian detection method with multi-feature fusion
CN104050460B (en) * 2014-06-30 2017-08-04 南京理工大学 Pedestrian detection method based on multi-feature fusion
CN105389569B (en) * 2015-11-17 2019-03-26 北京工业大学 A kind of estimation method of human posture
CN105389569A (en) * 2015-11-17 2016-03-09 北京工业大学 Human body posture estimation method
CN105893970A (en) * 2016-03-31 2016-08-24 杭州电子科技大学 Nighttime road vehicle detection method based on luminance variance characteristics
CN107436679A (en) * 2016-05-27 2017-12-05 富泰华工业(深圳)有限公司 Gestural control system and method
CN107203756B (en) * 2016-06-06 2020-08-28 亮风台(上海)信息科技有限公司 Method and device for recognizing gestures
CN107203756A (en) * 2016-06-06 2017-09-26 亮风台(上海)信息科技有限公司 A kind of method and apparatus for recognizing gesture
CN106096551A (en) * 2016-06-14 2016-11-09 湖南拓视觉信息技术有限公司 The method and apparatus of face part Identification
CN106096551B (en) * 2016-06-14 2019-05-21 湖南拓视觉信息技术有限公司 The method and apparatus of face position identification
CN106558071A (en) * 2016-11-10 2017-04-05 张昊华 A kind of method and terminal for obtaining human synovial information
CN106558071B (en) * 2016-11-10 2019-04-23 张昊华 A method and terminal for acquiring human joint information
CN106846403B (en) * 2017-01-04 2020-03-27 北京未动科技有限公司 Method and device for positioning hand in three-dimensional space and intelligent equipment
CN106846403A (en) * 2017-01-04 2017-06-13 北京未动科技有限公司 The method of hand positioning, device and smart machine in a kind of three dimensions
CN109484935A (en) * 2017-09-13 2019-03-19 杭州海康威视数字技术股份有限公司 A kind of lift car monitoring method, apparatus and system
CN107766848A (en) * 2017-11-24 2018-03-06 广州鹰瞰信息科技有限公司 The pedestrian detection method and storage medium of vehicle front
CN108345869A (en) * 2018-03-09 2018-07-31 南京理工大学 Driver's gesture recognition method based on depth image and virtual data
CN110598510A (en) * 2018-06-13 2019-12-20 周秦娜 Vehicle-mounted gesture interaction technology
CN110598510B (en) * 2018-06-13 2023-07-04 深圳市点云智能科技有限公司 Vehicle-mounted gesture interaction technology
CN114529567A (en) * 2022-01-25 2022-05-24 上海卫星工程研究所 Harris and decision tree based corner detection method and system
CN117011835A (en) * 2023-07-14 2023-11-07 东风柳州汽车有限公司 Method, device, equipment and storage medium for assisting child sleeping

Also Published As

Publication number Publication date
CN103413145B (en) 2016-09-21

Similar Documents

Publication Publication Date Title
CN103413145A (en) Articulation point positioning method based on depth image
Zhao et al. Single image action recognition using semantic body part actions
CN109948425A (en) A pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching
CN110349122A (en) A kind of pavement crack recognition methods based on depth convolution fused neural network
CN112016605B (en) A Target Detection Method Based on Bounding Box Corner Alignment and Boundary Matching
CN103473571B (en) Human detection method
CN109766868B (en) A real scene occluded pedestrian detection network and detection method based on body key point detection
CN104809481B (en) A kind of natural scene Method for text detection based on adaptive Color-based clustering
CN102842045B (en) A kind of pedestrian detection method based on assemblage characteristic
CN103971102A (en) Static gesture recognition method based on finger contour and decision-making trees
CN110969166A (en) A small target recognition method and system in an inspection scene
CN106845487A (en) A kind of licence plate recognition method end to end
CN107742102A (en) A kind of gesture identification method based on depth transducer
CN109063768A (en) Vehicle recognition methods, apparatus and system again
CN105718912B (en) A kind of vehicle characteristics object detecting method based on deep learning
CN103390164A (en) Object detection method based on depth image and implementing device thereof
CN107766791A (en) A kind of pedestrian based on global characteristics and coarseness local feature recognition methods and device again
CN109410238A (en) A kind of fructus lycii identification method of counting based on PointNet++ network
CN109242047A (en) Bank card number detection and recognition methods based on K-means++ cluster and residual error network class
CN106874901A (en) A kind of driving license recognition methods and device
CN110008900A (en) A Region-to-target Candidate Target Extraction Method for Visible Light Remote Sensing Images
CN103279760A (en) Real-time classifying method of plant quarantine larvae
CN107992783A (en) Face image processing process and device
CN107609464B (en) A kind of real-time face rapid detection method
CN106886757A (en) A kind of multiclass traffic lights detection method and system based on prior probability image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160921

Termination date: 20200823