
CN117409077B - Chip attitude detection method based on multi-scale residual UNet segmentation

Chip attitude detection method based on multi-scale residual UNet segmentation

Info

Publication number
CN117409077B
Authority
CN
China
Prior art keywords
chip
pose
image
average
relative
Prior art date
Legal status
Active
Application number
CN202311347754.XA
Other languages
Chinese (zh)
Other versions
CN117409077A (en)
Inventor
王萍
吴静静
安聪颖
李天贺
Current Assignee
Wuxi Jiuxiao Technology Co ltd
Original Assignee
Wuxi Jiuxiao Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuxi Jiuxiao Technology Co ltd filed Critical Wuxi Jiuxiao Technology Co ltd
Priority to CN202311347754.XA priority Critical patent/CN117409077B/en
Publication of CN117409077A publication Critical patent/CN117409077A/en
Application granted granted Critical
Publication of CN117409077B publication Critical patent/CN117409077B/en


Classifications

    • G06T 7/74: Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T 7/001: Industrial image inspection using an image reference approach
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/28: Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V 10/454: Integrating biologically inspired filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/50: Extraction of image features by performing operations within image blocks or by using histograms, e.g. histogram of oriented gradients [HoG]
    • G06V 10/751: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 10/764: Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Recognition using pattern recognition or machine learning, using neural networks
    • G06V 20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30148: Industrial image inspection; Semiconductor; IC; Wafer
    • G06V 2201/06: Recognition of objects for industrial automation


Abstract

The invention discloses a chip attitude detection method based on multi-scale residual UNet segmentation, belonging to the technical field of digital image processing. The invention provides a multi-stage pose detection algorithm: for chips with large pose changes, a salient pose detection method based on an SVM classifier is provided; for chips with small pose changes, a precise positioning method based on salient feature points is provided, in which the relative distance between the chip and a tray key point is calculated to realize micro-pose detection. In addition, a lightweight multi-scale residual UNet semantic segmentation network, MR-UNet, is provided, which accurately segments chips of different scales and the tray under varying poses, and unifies the images of color-sprayed chips with complex appearance so that only the outline is retained. Test results show that the detection accuracy of the invention reaches 99.804%, with good robustness and real-time performance, meeting the requirement of mass production on the industrial site.

Description

Chip attitude detection method based on multi-scale residual UNet segmentation
Technical Field
The invention relates to a chip attitude detection method based on multi-scale residual UNet segmentation, belonging to the technical field of digital image processing.
Background
In the chip manufacturing process, chips need to be flipped onto a metal tray, and a color identification pattern is printed on their surface. To prevent chips from being crushed by the flipping device, which would add high production costs for the manufacturer, it is necessary to ensure that each chip is in the correct pose in the tray. At present, manual visual inspection is generally adopted; it is inefficient and its accuracy is easily affected by subjective factors. Research on a machine-vision-based automatic chip pose detection method to replace manual visual inspection is therefore a necessary trend.
Existing pose detection methods can be broadly divided into three categories: methods based on correspondence, on template matching, and on network regression. Correspondence-based pose detection extracts a certain class of key points from the image; when objects overlap or occlude each other, such key-point-based detection often performs poorly. Template-matching-based pose detection uses a CAD model of the object to render template images under varying distances and viewing angles, extracts template library features, and at detection time matches the template closest to the target to obtain the object's pose; it is robust to interference such as occlusion and clutter, but its accuracy drops when the surface features of the object vary in complex ways. Network-regression-based pose detection automatically extracts target features through a neural network and then detects the pose with the trained model; however, if only a neural network is used, model performance is easily affected by the ambiguity of similar target poses, leading to inaccurate detection results.
Therefore, a pose detection algorithm is needed that overcomes the above drawbacks, achieves high accuracy, and supports automatic, rapid model changeover for multi-model chips with complex surface features.
Disclosure of Invention
In order to improve the accuracy of chip pose detection, the invention provides a chip pose detection method based on multi-scale residual UNet segmentation, which comprises the following steps:
step 1: acquiring an image of a chip to be detected, and coarsely positioning the salient feature points of the chip to be detected and the key points of the tray according to the fixed coordinates;
step 2: after coarse positioning, image segmentation is carried out on the chip and the material tray by utilizing a multi-scale residual error UNet model MR-UNet;
step 3: after segmentation, the pose of the chip is accurately detected in real time by the multi-stage pose detection method, using an SVM classifier and a template matching algorithm.
Optionally, the MR-UNet model in step 2 replaces the second of the two 3×3 convolution layers in each original UNet encoder stage with a multi-scale residual convolution module MRC; before each 3×3 convolution, a pixel-filling operation with padding equal to 1 is performed around the image; a BN layer is introduced after each convolution operation to standardize the input data so that the data of each layer follows a distribution with mean 0 and variance 1;
then activating the feature map after the BN layer through a ReLU function;
finally halving the number of downsampling channels;
the multi-scale residual convolution module MRC first extracts features from the input image in parallel using four convolution kernels with receptive fields of different sizes, applying padding pixel-filling to the image;
then, the extracted features are each passed through a batch normalization operation and a ReLU activation function and fused, after which a 1×1 convolution is applied to the fused feature map to reduce the number of channels to that of the input image;
finally, the input feature map and the dimension-reduced feature map are added channel-wise, pixel by pixel, and the summed feature map is output after ReLU activation.
Optionally, the multi-stage pose detection method includes:
step 31: count the number of foreground pixels in the segmented image; if it is smaller than a preset threshold, judge the slot as the few-material pose; if it is greater than the preset threshold, continue detection with step 32;
step 32: classify the chip image to be detected with an SVM classifier, predicting the category from the classification model; if the output is 0, preliminarily judge it as the normal pose and continue detection with step 33; if the output is 1, the judgment result is the severely tilted pose;
step 33: for an image preliminarily judged as the normal pose in step 32, calculate the relative distance between the salient chip feature point and the tray key point; if the relative distance lies within the preset tolerance range, continue detection with step 34; if it exceeds the tolerance range, directly output the detection result as the offset pose;
step 34: for an image still preliminarily judged as the normal pose in step 33, obtain the minimum circumscribed rectangle of the image, calculate its rotation angle, and finally judge the chip as the rotated pose or the normal pose from the angle value.
Optionally, the calculation of the preset threshold in step 31 comprises: first selecting more than 100 segmented chip images in the few-material pose, counting the number of pixels with value equal to 255 in each image, summing these counts and dividing by the number of images to obtain the average foreground pixel count, which is used as the threshold parameter distinguishing the few-material pose from other poses.
Optionally, the training process of the SVM classifier in the step 32 includes:
firstly, the pictures of chips in the normal pose and the severely tilted pose are placed into two folders respectively; the pictures to be trained in the two folders are read and HOG feature descriptors are extracted; feature vectors of normal-pose chips are marked with class-0 labels and feature vectors of severely tilted chips with class-1 labels; finally, a classification model is output by combining the labels with the SVM classifier.
Optionally, the method based on the relative distance in step 33 includes:
firstly, precisely positioning a chip significant feature point p (x, y) and a tray key point q (x, y) by using a template matching algorithm;
then, the absolute coordinates of the chip salient feature point p(x, y) relative to the upper-left corner O1(x1, y1) of the chip region are added to the absolute coordinates of O1(x1, y1) relative to the image origin O(x, y), giving the absolute coordinates of p(x, y) relative to O(x, y); likewise, the absolute coordinates of the tray key point q(x, y) relative to the upper-left corner O2(x2, y2) of the tray region are added to the absolute coordinates of O2(x2, y2) relative to the image origin O(x, y), giving the absolute coordinates of q(x, y) relative to O(x, y);
finally, the absolute x and y coordinates of q(x, y) are subtracted from the absolute x and y coordinates of the chip salient feature point p(x, y) to obtain the relative distances x_relative and y_relative between the two;
the relative x and y distances at the same row-column position in all images are obtained by the above steps, summed, and divided by the number of images to obtain the average relative distances x_average and y_average;
because the chip can shake slightly in the clamping groove, a tolerance of n pixels is added on either side of x_average and y_average, so the tolerance ranges in the x and y directions are [x_average − n, x_average + n] and [y_average − n, y_average + n] respectively; the above steps are repeated to obtain the tolerance ranges for the remaining row-column positions in the image;
in the detection process, if both the x and y relative distances are within the tolerance ranges, the result is preliminarily judged as the normal pose; if either exceeds its tolerance range, the detection result "offset pose" is output directly.
Optionally, the step 34 specifically includes:
calculating the minimum circumscribed rectangle of the image and obtaining its rotation angle α, α ∈ (−90°, 0°);
computing the judgment angle β, β ∈ [0°, 45°];
when β > 3.5°, the chip is judged as the rotated pose; when β < 3.5°, as the normal pose.
Optionally, the tolerance n is set to 5 pixels.
A second object of the present invention is to provide a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement any one of the above chip pose detection methods.
The invention has the beneficial effects that:
the invention proposes the MR-UNet network, which effectively segments the tray and the chip to obtain a binary image with a clear edge contour. It further proposes a multi-stage pose detection method: few-material chips are detected by counting the foreground pixels of the segmented image, severely tilted chips are detected with a HOG+SVM classifier, offset chips are detected from relative distances, and finally normal and slightly rotated chips are distinguished using the minimum circumscribed rectangle, greatly improving detection accuracy. The effectiveness of the proposed MR-UNet model is demonstrated by a segmentation performance comparison against other deep learning networks; the feasibility and robustness of the chip pose detection method are demonstrated by an accuracy verification test of the multi-stage pose detection algorithm, which can basically meet the requirements of automated batch production of chips.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a chip.
FIG. 2 is a complex and varied chip image during production, wherein (a) chips with different characters on the surface are shown; (b) trays of different types; (c) chips exhibiting different poses.
FIG. 3 is a coarse positioning result diagram, wherein (a) shows a chip salient feature coarse positioning result diagram; (b) a map of the results of the coarse positioning of the key points of the tray is shown.
Fig. 4 is a flowchart of a chip gesture detection method based on multi-scale residual UNet segmentation of the present invention.
Fig. 5 is a diagram of the multi-scale residual UNet network model structure of the present invention.
Fig. 6 is a block diagram of a multi-scale residual convolution of the present invention.
Fig. 7 is a flowchart of the severely tilted chip detection of the present invention.
FIG. 8 is a schematic diagram of the offset chip pose detection of the present invention.
Fig. 9 is a visual representation of the segmentation effect of the present invention and other prior art methods.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Embodiment one:
This embodiment provides a chip multi-stage pose detection method based on multi-scale residual UNet segmentation. First, a multi-scale residual convolution module is introduced into the UNet semantic segmentation model and the number of channels is halved, so that chips with changeable poses and rich appearances, as well as the complex tray background, are effectively segmented while keeping the model lightweight. Second, a multi-stage pose detection method is proposed that accurately detects the chip pose in real time from the segmented image using an SVM classifier and a template matching algorithm, with strong real-time performance and robustness.
Referring to fig. 4, the flow of the chip pose detection method of the embodiment includes the following steps:
step 1: acquiring an image of a chip to be detected, and coarsely positioning the salient feature points of the chip to be detected and the key points of the tray according to the fixed coordinates;
step 2: after coarse positioning, image segmentation is carried out on the chip and the material tray by utilizing a multi-scale residual error UNet model MR-UNet;
step 3: after segmentation, the pose of the chip is accurately detected in real time by the multi-stage pose detection method, using an SVM classifier and a template matching algorithm.
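To make the step-3 flow concrete, the following is a minimal Python sketch of the four-stage decision cascade detailed below in embodiment two; the threshold value and the three helper callables are assumptions supplied by calibration and training, and the returned pose labels are illustrative:

```python
from typing import Callable
import numpy as np

def multi_stage_pose(mask: np.ndarray,
                     few_material_threshold: float,
                     svm_predict: Callable[[np.ndarray], int],
                     in_tolerance: Callable[[np.ndarray], bool],
                     rotation_angle: Callable[[np.ndarray], float],
                     angle_limit: float = 3.5) -> str:
    """Decide the chip pose from a segmented binary mask (foreground = 255)."""
    # Stage 1: few-material detection by foreground pixel count.
    if np.count_nonzero(mask == 255) < few_material_threshold:
        return "few-material pose"
    # Stage 2: HOG + SVM classifier flags severely tilted chips (label 1).
    if svm_predict(mask) == 1:
        return "severely tilted pose"
    # Stage 3: relative distance between chip feature point and tray key point.
    if not in_tolerance(mask):
        return "offset pose"
    # Stage 4: minimum circumscribed rectangle angle separates rotation from normal.
    if rotation_angle(mask) > angle_limit:
        return "rotated pose"
    return "normal pose"
```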
Embodiment two:
This embodiment provides a chip multi-stage pose detection method based on multi-scale residual UNet segmentation. First, a multi-scale residual convolution module is introduced into the UNet semantic segmentation model and the number of channels is halved, so that chips with changeable poses and rich appearances, as well as the complex tray background, are effectively segmented while keeping the model lightweight. Second, a multi-stage pose detection method is proposed that accurately detects the chip pose in real time from the segmented image using an SVM classifier and a template matching algorithm, with strong real-time performance and robustness.
Referring to fig. 4, the flow of the chip pose detection method of the embodiment includes the following steps:
step 1: coarse positioning is carried out on the remarkable characteristic points of the chip to be detected and the key points of the tray according to the fixed coordinates;
step 2: dividing the roughly positioned chip from the material tray based on the MR-UNet of the multi-scale residual UNet division;
step 3: after segmentation, the pose of the chip is accurately detected in real time by the multi-stage pose detection method, using an SVM classifier and a template matching algorithm.
In step 2, a lightweight UNet model, MR-UNet, fused with a multi-scale residual convolution module (Multiscale Residual Convolution Module, MRC) is proposed.
The model structure of MR-UNet is shown in fig. 5: the second of the two 3×3 convolution layers in each encoder stage of the original UNet is replaced by the multi-scale residual convolution module MRC proposed in this embodiment, in order to enrich and purify the multi-scale feature information of the image.
Before each 3×3 convolution, a pixel-filling operation with padding equal to 1 is performed around the image, so that the image size is unchanged after each convolution and partial loss of image information is avoided. A BN layer is introduced after each convolution operation to standardize the input data so that the data of each layer follows a distribution with mean 0 and variance 1; the feature map after BN is then activated by a ReLU function. Finally, the number of downsampling channels is halved from the original [64, 128, 256, 512, 1024] to [32, 64, 128, 256, 512], keeping the model lightweight while achieving a better segmentation effect.
The structure of the multi-scale residual convolution module MRC of this embodiment is shown in fig. 6. The MRC module first extracts features from the input image in parallel using convolution kernels with receptive fields of 1×1, 3×3, 5×5 and 7×7, applying padding pixel-filling of 0, 1, 2 and 3 respectively for the four kernels.
Regarding the choice of the four convolution kernel receptive fields and paddings, kernels of other sizes may also be selected, subject to the following considerations: 1. feature map size: the chosen kernel size should match the feature map size to ensure effective feature extraction; larger kernels can capture a larger range of features but also increase the amount of computation; 2. computing resources: larger kernels raise the computational cost, so available resources must be weighed against performance requirements.
The extracted features are each passed through batch normalization and a ReLU activation function and then fused; a 1×1 convolution is then applied to the fused feature map to reduce the number of channels to that of the input image. Finally, the input feature map and the dimension-reduced feature map are added channel-wise, pixel by pixel, and the summed feature map is output after ReLU activation, as sketched below.
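As a concrete reference, a PyTorch sketch of the MRC module under the above description follows; fusing the branch outputs by channel concatenation and all layer names are assumptions, since the text does not state the fusion operator explicitly:

```python
import torch
import torch.nn as nn

class MRC(nn.Module):
    """Multi-scale residual convolution module: four parallel branches,
    fusion, 1x1 channel reduction, and a residual addition."""
    def __init__(self, channels: int):
        super().__init__()
        # Four parallel kernels 1x1/3x3/5x5/7x7 with padding 0/1/2/3,
        # each followed by BN + ReLU, so spatial size is preserved.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=k, padding=p),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for k, p in [(1, 0), (3, 1), (5, 2), (7, 3)]
        ])
        # 1x1 convolution reduces the fused channels back to the input count.
        self.reduce = nn.Conv2d(4 * channels, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([b(x) for b in self.branches], dim=1)  # feature fusion
        out = self.reduce(fused)           # back to the input channel count
        return self.relu(out + x)          # pixel-wise residual addition
```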
Step 3 provides a multi-stage chip pose detection method, as shown in fig. 4, which applies a different detection method to chips in each pose class, specifically:
(1) For few-material slots, the background pixels of the segmented image account for a large proportion of the whole image, whereas for chips in other poses the foreground pixels account for a large proportion. This embodiment uses this property to detect the few-material pose. First, 100 segmented few-material chip images are selected, the number of pixels with value equal to 255 in each image is counted, and the counts are summed and divided by 100 to obtain the average foreground pixel count, which is used as the threshold parameter distinguishing the few-material pose from other poses. If the foreground count is below this parameter, the few-material pose is judged; if above it, the subsequent flow continues, as sketched below.
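A minimal sketch of this threshold computation and check, assuming the segmented masks are 8-bit binary images whose foreground pixels equal 255:

```python
import cv2
import numpy as np

def few_material_threshold(mask_paths: list[str]) -> float:
    """Average foreground pixel count over >= 100 few-material sample masks."""
    counts = []
    for path in mask_paths:
        mask = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        counts.append(int(np.count_nonzero(mask == 255)))
    return float(np.mean(counts))

def is_few_material(mask: np.ndarray, threshold: float) -> bool:
    """Stage-1 check: below the average foreground count means few material."""
    return np.count_nonzero(mask == 255) < threshold
```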
(2) A severely tilted chip varies in position and rotation angle within the slot and has distinctive features, so it is detected with a classification algorithm from machine learning. Among the many classification algorithms, the combination of HOG features (Histogram of Oriented Gradients, HOG) with an SVM classifier (Support Vector Machine, SVM) is widely used because it is easy to train and suitable for small samples, so this combination is used in this embodiment to detect severely tilted chips.
In the training stage, the pictures of chips in the normal pose and the severely tilted pose are first placed into two folders respectively; the pictures to be trained are read from the two folders, HOG feature descriptors are extracted, feature vectors of normal-pose chips are marked with class-0 labels and feature vectors of severely tilted chips with class-1 labels; finally, a classification model is output by combining the labels with the SVM classifier. In the prediction stage, the trained SVM classification model and the picture to be predicted are read, HOG features are extracted, and the category is predicted from the classification model. If the output is 0, the normal pose is preliminarily judged; if the output is 1, the result "severely tilted pose" is output directly. A sketch of both stages follows.
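A sketch of both stages using scikit-image and scikit-learn; the HOG parameters and the linear kernel are illustrative assumptions, and all input images are assumed to be equally sized grayscale crops:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(images: list[np.ndarray]) -> np.ndarray:
    """Extract a HOG descriptor per grayscale image (all images same size)."""
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for img in images
    ])

def train_tilt_classifier(normal: list[np.ndarray],
                          tilted: list[np.ndarray]) -> SVC:
    """Train on class-0 (normal) and class-1 (severely tilted) samples."""
    X = np.vstack([hog_features(normal), hog_features(tilted)])
    y = np.concatenate([np.zeros(len(normal)), np.ones(len(tilted))])
    clf = SVC(kernel="linear")
    clf.fit(X, y)
    return clf

def predict_pose(clf: SVC, image: np.ndarray) -> int:
    """Return 0 for a preliminary normal pose, 1 for severely tilted."""
    return int(clf.predict(hog_features([image]))[0])
```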
(3) For an offset chip the features are not obvious and the SVM classification model is prone to misclassification, so this embodiment proposes a detection method based on relative distance.
Firstly, precisely positioning a chip significant feature point p (x, y) and a tray key point q (x, y) by using a template matching algorithm;
then, using the chip salient feature point p (x, y) relative to O 1 (x 1 ,y 1 ) Absolute coordinates plus O 1 (x 1 ,y 1 ) Absolute coordinates of p (x, y) relative to O (x, y) are obtained relative to absolute coordinates of the image origin O (x, y); with respect to O by the tray key q (x, y) 2 (x 2 ,y 2 ) Absolute coordinates plus O 2 (x 2 ,y 2 ) Absolute coordinates of q (x, y) relative to O (x, y) are obtained relative to absolute coordinates of the image origin O (x, y);
finally, subtracting the x and y absolute coordinate values of q (x, y) from the x and y absolute coordinate values of the chip salient feature point p (x, y) to obtain an x relative distance x between the two relative And y relative distance y relative
Obtaining the relative distances x and y of the same row and column positions of all the images by the steps, adding the relative distances x and y and dividing the relative distances x by the number of the images to obtain the average relative distance x and the average relative distance y average And y average
Because the chip can shake slightly in the clamping groove, the chip can shake slightly in x average And y average Increasing the tolerance of n pixels back and forth on the basis of (a), then the tolerance ranges in x and y directions are respectively: [ x ] average -n,x average +n]、[y average -n,y average +n]Repeating the steps to obtain the tolerance ranges of the rest row and column positions in the image;
in the detection process, if both the x and y relative distances are within the tolerance ranges, the result is preliminarily judged as the normal pose; if either exceeds its tolerance range, the detection result "offset pose" is output directly, as sketched below.
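A sketch of this stage with OpenCV template matching; the region origins O1 and O2 and the per-position averages come from calibration and are passed in as assumptions:

```python
import cv2
import numpy as np

def locate(region: np.ndarray, template: np.ndarray) -> tuple[int, int]:
    """Top-left match position of `template` inside `region` (normalized CC)."""
    result = cv2.matchTemplate(region, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(result)
    return max_loc                                     # (x, y) within the region

def relative_distance(chip_region, chip_tpl, chip_origin,
                      tray_region, tray_tpl, tray_origin):
    """x_relative, y_relative between feature point p and key point q."""
    px, py = locate(chip_region, chip_tpl)
    qx, qy = locate(tray_region, tray_tpl)
    # Convert to absolute image coordinates by adding the region origins O1, O2.
    p = (px + chip_origin[0], py + chip_origin[1])
    q = (qx + tray_origin[0], qy + tray_origin[1])
    return p[0] - q[0], p[1] - q[1]

def in_tolerance(x_rel, y_rel, x_avg, y_avg, n=5):
    """Tolerance windows [avg - n, avg + n] in both directions (n = 5 px)."""
    return abs(x_rel - x_avg) <= n and abs(y_rel - y_avg) <= n
```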
(4) Because the rotation angle of a slightly rotated chip is small, the x and y relative distances between the matched chip salient feature point and the tray key point may still lie within the tolerance ranges, so slight rotations can be missed. Therefore, this embodiment obtains the minimum circumscribed rectangle of the image, calculates its rotation angle, and judges from the angle value whether the chip is in a slightly rotated pose. From the parameters returned by RectMin, the minimum circumscribed rectangle of the original image is solved and its rotation angle α, α ∈ (−90°, 0°), is obtained; the judgment angle β, β ∈ [0°, 45°], is then computed according to formula (1), as sketched below.
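A sketch of this stage with OpenCV; since formula (1) is not reproduced in the text, the fold of α into β ∈ [0°, 45°] below is a plausible reconstruction, and the handling of the newer OpenCV angle convention is an added assumption:

```python
import cv2
import numpy as np

def judgment_angle(mask: np.ndarray) -> float:
    """Judgment angle beta of the mask's minimum circumscribed rectangle."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    points = np.vstack([c.reshape(-1, 2) for c in contours])
    alpha = cv2.minAreaRect(points)[2]   # (-90, 0] in the older OpenCV convention
    if alpha > 0:                        # OpenCV >= 4.5 reports (0, 90]; remap
        alpha -= 90.0
    return min(-alpha, 90.0 + alpha)     # fold alpha into beta in [0, 45]

def judge_rotation(mask: np.ndarray, limit: float = 3.5) -> str:
    """beta > 3.5 degrees means rotated pose, otherwise normal pose."""
    return "rotated pose" if judgment_angle(mask) > limit else "normal pose"
```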
The experimental setup and training process of this embodiment are as follows:
step 1: the training set and the test set of the semantic segmentation model are divided. The input is a coarse positioned chip image, size 384 x 288. The data set comprises elements such as different material tray backgrounds, surfaces with different characters, different poses and the like, and is divided into 3040 training sets and 760 test sets, wherein each picture corresponds to a labeled label binary image (GT). Meanwhile, a cross-validation method is used in the training process, and 15% of training images are randomly extracted as a validation set during each training process so as to conduct finer training performance supervision.
Step 2: divide the training and test sets for the SVM. For the SVM data set, 200 normal-pose and 400 severely tilted chip images segmented by the MR-UNet model are selected and input to the classifier for training to obtain the classification model.
Step 3: define the evaluation indexes. In this embodiment, the model segmentation performance is evaluated by the Dice coefficient, the intersection-over-union (IoU), the F1-Score and the Sensitivity (SE), with calculation formulas (2) to (5). The parameter count (Params) and the computational cost (FLOPs) are used as model complexity indexes.
TP indicates that a chip foreground region output by the network is a real foreground region; TN indicates that a tray background region output by the network is a real background region; FP indicates that a chip foreground region output by the network is not a real foreground region, i.e. a background region erroneously segmented as foreground; FN indicates that a tray background region output by the network is not a real background region, i.e. a foreground region erroneously segmented as background. The larger these evaluation indexes, the better the segmentation effect of the model, and together they comprehensively reflect the quality of the network model.
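Formulas (2) to (5) are not reproduced in this text; assuming they follow the standard definitions of these metrics in terms of TP, TN, FP and FN, they read:

$$\mathrm{Dice}=\frac{2\,TP}{2\,TP+FP+FN} \tag{2}$$
$$\mathrm{IoU}=\frac{TP}{TP+FP+FN} \tag{3}$$
$$F1=\frac{2PR}{P+R},\qquad P=\frac{TP}{TP+FP},\quad R=\frac{TP}{TP+FN} \tag{4}$$
$$SE=\frac{TP}{TP+FN} \tag{5}$$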
To further illustrate the image segmentation performance of the MR-UNet model proposed in this embodiment, a comparative experiment was performed against several methods from the literature: UNet, UNet++, MSRD-ANet, CDWB-ASPP-UNet (CA-UNet) and RA-UNet, all run in the same hardware environment on the same data set. The test results are shown in Table 1 and fig. 9, with the best results in bold.
As can be seen from Table 1, the proposed model is optimal on all four segmentation performance indexes, with Dice, IoU, F1-Score and SE of 0.9851, 0.9708, 0.9875 and 0.9884 respectively. Compared with UNet, UNet++, MSRD-ANet, RA-UNet and CA-UNet, MR-UNet improves the Dice coefficient by 0.66%, 4.15%, 3.09%, 7.03% and 0.37% respectively; the IoU by 1.16%, 7.23%, 5.63%, 8.7% and 0.47%; the F1-Score by 1.27%, 2.39%, 6.71%, 4.88% and 0.43%; and the SE by 1.53%, 2.27%, 6.68%, 4.74% and 0.69%. In terms of complexity, the MR-UNet model has the smallest computational cost of the compared models, 50.42G, owing to the halved channel count, but because of the parallel convolutions in the MRC module its parameter count is 7.22M and 1.76M higher than UNet++ and RA-UNet respectively. Overall, the proposed MR-UNet achieves the highest comprehensive accuracy and the best overall performance.
Table 1 Performance comparison of different segmentation methods
To verify the accuracy and robustness of the proposed multi-stage pose detection algorithm, the poses of 3100 chips segmented by MR-UNet were identified and verified; the images include different tray backgrounds, different characters and other variations. The recognition results for 1000 normal-pose, 1000 severely tilted, 500 offset, 300 slightly rotated and 300 few-material chips are shown in Table 2.
Table 2 Statistics of pose detection results
The proposed multi-stage pose detection algorithm achieves over 99% detection accuracy for chips in the normal and few-material poses; for severely tilted, offset and slightly rotated chips the accuracy reaches 98.3%, 97.6% and 95.7% respectively, and no tilted pose was erroneously detected as a normal pose, effectively preventing chips from being crushed during production.
The effectiveness of the proposed MR-UNet model is demonstrated by the segmentation performance comparison against other deep learning networks; the feasibility and robustness of the proposed chip pose detection method are demonstrated by the accuracy verification of the multi-stage pose detection algorithm, which can basically meet the requirements of automated batch production of chips.
Some steps in the embodiments of the present invention may be implemented by using software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed; any modifications, equivalents and alternatives falling within the spirit and scope of the invention are intended to be included within its scope.

Claims (6)

1. A method for detecting a chip pose, the method comprising:
step 1: acquiring an image of a chip to be detected, and coarsely positioning the salient feature points of the chip to be detected and the key points of the tray according to the fixed coordinates;
step 2: after coarse positioning, image segmentation is carried out on the chip and the material tray by utilizing a multi-scale residual error UNet model MR-UNet;
step 3: after segmentation, the pose of the chip is accurately detected in real time by the multi-stage pose detection method, using an SVM classifier and a template matching algorithm;
the MR-UNet model in step 2 replaces the second of the two 3×3 convolution layers in each original UNet encoder stage with a multi-scale residual convolution module MRC; before each 3×3 convolution, a pixel-filling operation with padding equal to 1 is performed around the image; a BN layer is introduced after each convolution operation to standardize the input data so that the data of each layer follows a distribution with mean 0 and variance 1;
then activating the feature map after the BN layer through a ReLU function;
finally halving the number of downsampling channels;
the multi-scale residual convolution module MRC first extracts features from the input image in parallel using four convolution kernels with receptive fields of different sizes, applying padding pixel-filling to the image;
then, the extracted features are each passed through a batch normalization operation and a ReLU activation function and fused, after which a 1×1 convolution is applied to the fused feature map to reduce the number of channels to that of the input image;
finally, the input feature map and the dimension-reduced feature map are added channel-wise, pixel by pixel, and the summed feature map is output after ReLU activation;
the multi-stage pose detection method comprises the following steps:
step 31: count the number of foreground pixels in the segmented image; if it is smaller than a preset threshold, judge the slot as the few-material pose; if it is greater than the preset threshold, continue detection with step 32;
step 32: classify the chip image to be detected with an SVM classifier, predicting the category from the classification model; if the output is 0, preliminarily judge it as the normal pose and continue detection with step 33; if the output is 1, the judgment result is the severely tilted pose;
step 33: for an image preliminarily judged as the normal pose in step 32, calculate the relative distance between the salient chip feature point and the tray key point; if the relative distance lies within the preset tolerance range, continue detection with step 34; if it exceeds the tolerance range, directly output the detection result as the offset pose;
step 34: for an image still preliminarily judged as the normal pose in step 33, obtain the minimum circumscribed rectangle of the image, calculate its rotation angle, and finally judge the chip as the rotated pose or the normal pose from the angle value;
the relative distance-based method in step 33 includes:
firstly, precisely positioning a chip significant feature point p (x, y) and a tray key point q (x, y) by using a template matching algorithm;
then, the absolute coordinates of the chip salient feature point p(x, y) relative to the upper-left corner O1(x1, y1) of the chip region are added to the absolute coordinates of O1(x1, y1) relative to the image origin O(x, y), giving the absolute coordinates of p(x, y) relative to O(x, y); likewise, the absolute coordinates of the tray key point q(x, y) relative to the upper-left corner O2(x2, y2) of the tray region are added to the absolute coordinates of O2(x2, y2) relative to the image origin O(x, y), giving the absolute coordinates of q(x, y) relative to O(x, y);
finally, the absolute x and y coordinates of q(x, y) are subtracted from the absolute x and y coordinates of the chip salient feature point p(x, y) to obtain the relative distances x_relative and y_relative between the two;
the relative x and y distances at the same row-column position in all images are obtained by the above steps, summed, and divided by the number of images to obtain the average relative distances x_average and y_average;
because the chip can shake slightly in the clamping groove, a tolerance of n pixels is added on either side of x_average and y_average, so the tolerance ranges in the x and y directions are [x_average − n, x_average + n] and [y_average − n, y_average + n] respectively; the above steps are repeated to obtain the tolerance ranges for the remaining row-column positions in the image;
in the detection process, if both the x and y relative distances are within the tolerance ranges, the result is preliminarily judged as the normal pose; if either exceeds its tolerance range, the detection result "offset pose" is output directly.
2. The chip pose detection method according to claim 1, wherein the calculation of the preset threshold in step 31 comprises: first selecting more than 100 segmented chip images in the few-material pose, counting the number of pixels with value equal to 255 in each image, summing these counts and dividing by the number of images to obtain the average foreground pixel count, which is used as the threshold parameter distinguishing the few-material pose from other poses.
3. The method according to claim 1, wherein the training process of the SVM classifier in the step 32 includes:
firstly, the pictures of chips in the normal pose and the severely tilted pose are placed into two folders respectively; the pictures to be trained in the two folders are read and HOG feature descriptors are extracted; feature vectors of normal-pose chips are marked with class-0 labels and feature vectors of severely tilted chips with class-1 labels; finally, a classification model is output by combining the labels with the SVM classifier.
4. The chip pose detection method according to claim 1, wherein step 34 specifically comprises:
calculating the minimum circumscribed rectangle of the image and obtaining its rotation angle α, α ∈ (−90°, 0°);
computing the judgment angle β, β ∈ [0°, 45°];
when β > 3.5°, the chip is judged as the rotated pose; when β < 3.5°, as the normal pose.
5. The chip pose detection method of claim 1, wherein the tolerance n is set to 5 pixels.
6. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the chip pose detection method according to any one of claims 1 to 5.
CN202311347754.XA 2023-10-18 2023-10-18 Chip attitude detection method based on multi-scale residual UNet segmentation Active CN117409077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311347754.XA CN117409077B (en) 2023-10-18 2023-10-18 Chip attitude detection method based on multi-scale residual UNet segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311347754.XA CN117409077B (en) 2023-10-18 2023-10-18 Chip attitude detection method based on multi-scale residual UNet segmentation

Publications (2)

Publication Number Publication Date
CN117409077A (en) 2024-01-16
CN117409077B (en) 2024-04-05

Family

ID=89486465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311347754.XA Active CN117409077B (en) 2023-10-18 2023-10-18 Chip attitude detection method based on multi-scale residual UNet segmentation

Country Status (1)

Country Link
CN (1) CN117409077B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020025696A1 (en) * 2018-07-31 2020-02-06 Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts Method and system for augmented imaging using multispectral information
CN111862126A (en) * 2020-07-09 2020-10-30 北京航空航天大学 Non-cooperative target relative pose estimation method based on deep learning and geometric algorithm
CN113705521A (en) * 2021-09-05 2021-11-26 吉林大学第一医院 Head pose estimation method combined with YOLO-MobilenetV3 face detection
WO2022074643A1 (en) * 2020-10-08 2022-04-14 Edgy Bees Ltd. Improving geo-registration using machine-learning based object identification
WO2022101361A2 (en) * 2019-11-12 2022-05-19 Astrazeneca Ab Automated assessment of wound tissue
CN114972968A (en) * 2022-05-19 2022-08-30 长春市大众物流装配有限责任公司 Tray identification and pose estimation method based on multiple neural networks
EP4068220A1 (en) * 2021-03-30 2022-10-05 Canon Kabushiki Kaisha Image processing device, image processing method, moving device, and storage medium
CN115170804A (en) * 2022-07-26 2022-10-11 无锡九霄科技有限公司 Surface defect detection method, device, system and medium based on deep learning
CN115187666A (en) * 2022-07-14 2022-10-14 武汉大学 Deep learning and image processing combined side-scan sonar seabed elevation detection method
WO2022221147A1 (en) * 2021-04-15 2022-10-20 Intrinsic Innovation Llc Systems and methods for six-degree of freedom pose estimation of deformable objects
CN115588043A (en) * 2022-09-23 2023-01-10 湖南省国土资源规划院 Excavator operation pose monitoring method based on vision
CN115661943A (en) * 2022-12-22 2023-01-31 电子科技大学 Fall detection method based on lightweight attitude assessment network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于深度学习的哺乳期猪只目标检测与姿态识别";俞燃;《中国优秀硕士学位论文全文数据库农业科技辑》;20220315;第D050-248页 *

Also Published As

Publication number Publication date
CN117409077A (en) 2024-01-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant