Disclosure of Invention
In order to solve the above problems, the invention provides an automatic control system for the stacking rate of polyimide sintered wire. By integrating image recognition, semantic segmentation and deep learning, the system makes decisions on adjusting the advancing rate of the copper flat wire and the wrapping rate of the polyimide film, so that the stacking rate is minimized while the insulation quality of the copper flat wire is guaranteed, thereby reducing cost.
In order to achieve the above purpose, the invention adopts the following technical scheme:
The automatic stacking rate control system for polyimide sintered wire comprises a pre-checking module, a wrapping module, a stacking rate monitoring module, a sintering module, an annealing cooling module, a detection verification module and a stacking rate control module, which are connected in sequence; the stacking rate control module is additionally connected to the detection verification module and to the wrapping module;
The pre-checking module is used for inspecting the surface characterization of the copper flat wire by means of a moving light source and image recognition, and for confirming the characterization state of the copper flat wire, the characterization state being either "characterization defect present" or "characterization defect absent";
The wrapping module is used for controlling the wrapping speed of the polyimide film and the advancing speed of copper flat wire whose characterization state is "characterization defect absent";
The stacking rate monitoring module is used for performing semantic segmentation of the polyimide film and the copper flat wire in image data captured during wrapping, based on a semantic segmentation model, and for calculating the stacking rate of the polyimide film as the proportion of the film width that overlaps on the copper flat wire;
The sintering module is used for controlling the sintering temperature of the copper flat wire after it has been wrapped with the polyimide film; the annealing cooling module is used for controlling the annealing cooling temperature of the sintered copper flat wire; and the detection verification module is used for detecting the insulation rate of the finished copper flat wire and comparing it with a preset standard insulation rate;
The stacking rate control module is used for constructing a decision model through deep learning, taking as inputs the difference between the insulation rate of the copper flat wire and the preset standard insulation rate, the wrapping rate, the polyimide film width and the advancing rate of the copper flat wire, and for generating, through the decision model, decisions for adjusting the advancing rate of the copper flat wire and the wrapping rate.
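Why the decision model needs the wrapping rate, the film width and the advancing rate together can be seen from the idealized geometry of helical lapping, a standard approximation rather than a formula given in this document: if the wire advances a pitch P during one wrapping revolution and the film has width W, adjacent turns overlap by W − P, so the stacking rate is approximately (W − P)/W. The minimal sketch below assumes the wrapping rate is expressed in wrapping-head revolutions per second and ignores helix angle, film stretch and tension; the function names are hypothetical.

```python
def pitch_mm(advance_mm_per_s: float, wrap_rev_per_s: float) -> float:
    """Axial distance the wire advances during one wrapping revolution."""
    if wrap_rev_per_s <= 0:
        raise ValueError("wrapping rate must be positive")
    return advance_mm_per_s / wrap_rev_per_s


def ideal_stacking_rate(advance_mm_per_s: float, wrap_rev_per_s: float,
                        film_width_mm: float) -> float:
    """Idealized overlap fraction (W - P) / W; 0.0 when adjacent turns leave a gap."""
    p = pitch_mm(advance_mm_per_s, wrap_rev_per_s)
    return max(0.0, (film_width_mm - p) / film_width_mm)


# 10 mm film, 2 rev/s, wire advancing at 10 mm/s: pitch 5 mm, half of each turn overlaps
print(ideal_stacking_rate(10.0, 2.0, 10.0))  # 0.5
```

Under this approximation, raising the advancing rate or lowering the wrapping rate reduces the stacking rate, which is the quantity the decision model trades off against the insulation requirement.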
Further, inspecting the surface characterization of the copper flat wire by means of the moving light source and image recognition, and confirming the characterization state of the copper flat wire, comprises the following steps:
Controlling the moving light source to move along a preset trajectory while irradiating the surface of the copper flat wire;
Acquiring image data of the surface of the copper flat wire from a plurality of angles while the light source is moving;
Performing color value trend judgment on the multi-angle image data of the copper flat wire surface, and extracting abnormal points that do not conform to the color value trend;
Confirming whether the surface of the copper flat wire has a characterization defect according to the area occupied by the abnormal points.
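One way to make the color value trend judgment concrete, offered as a sketch under assumptions rather than as the patented algorithm: treat the per-angle median brightness as the trend that a defect-free surface follows as the light source moves, flag pixels whose residual from that trend is a robust outlier at any angle, and compare the fraction of flagged pixels with an area threshold. Grayscale frames, the median/MAD statistics, `k` and `area_threshold` are all assumptions.

```python
import numpy as np


def abnormal_point_mask(images: np.ndarray, k: float = 4.0) -> np.ndarray:
    """Flag pixels that break the color value trend in any of the angle images.

    images: (n_angles, H, W) grayscale stack, one frame per light-source position.
    Assumed trend model: at each light position a defect-free surface follows
    the frame-wide median brightness; residuals beyond k robust standard
    deviations (1.4826 * MAD) are treated as abnormal points.
    """
    imgs = images.astype(float)
    trend = np.median(imgs, axis=(1, 2), keepdims=True)        # one trend value per angle
    resid = imgs - trend
    mad = np.median(np.abs(resid), axis=(1, 2), keepdims=True) + 1e-9
    return np.any(np.abs(resid) > k * 1.4826 * mad, axis=0)   # (H, W) boolean mask


def has_characterization_defect(images: np.ndarray, area_threshold: float = 0.001,
                                k: float = 4.0) -> bool:
    """Defect present if abnormal points occupy more than area_threshold of the frame."""
    return bool(abnormal_point_mask(images, k).mean() > area_threshold)


# Synthetic frames: a smooth surface that brightens at each light position, plus a bright spot
clean = 100.0 + np.zeros((5, 64, 64)) + np.linspace(-5, 5, 64) + (np.arange(5) * 3.0)[:, None, None]
spotted = clean.copy()
spotted[:, 10:20, 10:20] += 50.0
print(has_characterization_defect(clean), has_characterization_defect(spotted))  # False True
```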
Further, the stacking rate monitoring module is configured to perform the following steps:
Acquiring image data of a copper flat wire wrapped by a polyimide film in a wrapping process;
Preprocessing image data, including image denoising, contrast enhancement and distortion correction;
Inputting the preprocessed image data into a pre-trained semantic segmentation model, and classifying the polyimide film and the copper flat wire in the image to obtain a segmentation mask of the polyimide film and the copper flat wire;
Extracting the width of the polyimide film through a segmentation mask of the polyimide film;
Calculating the stacking rate of the polyimide film as the ratio of the width of the overlapping area of the polyimide film on the copper flat wire to the width of the polyimide film.
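The width and overlap steps can be sketched as follows. Beyond what the text specifies, the sketch assumes that the segmented film has already been split into masks of two adjacent turns, sampled along one scan line parallel to the wire axis (for example by edge detection), and that each turn is contiguous on that line. Because the stacking rate is a ratio of two widths measured in the same image, the pixel-to-millimetre scale cancels. Function names are hypothetical.

```python
import numpy as np


def film_extent(mask_row: np.ndarray) -> tuple:
    """Half-open pixel interval [start, end) covered by film on one scan line."""
    idx = np.flatnonzero(mask_row)
    if idx.size == 0:
        raise ValueError("no film pixels on this scan line")
    return int(idx[0]), int(idx[-1]) + 1


def stacking_rate(turn_a: np.ndarray, turn_b: np.ndarray) -> float:
    """Width of the overlap between two adjacent turns divided by the film width."""
    a0, a1 = film_extent(turn_a)
    b0, b1 = film_extent(turn_b)
    overlap = max(0, min(a1, b1) - max(a0, b0))   # 0 when the turns leave a gap
    return overlap / (a1 - a0)


# Turn A covers pixels 10-49, the next turn B covers 30-69: 20 of 40 pixels overlap
a = np.zeros(100, dtype=bool)
a[10:50] = True
b = np.zeros(100, dtype=bool)
b[30:70] = True
print(stacking_rate(a, b))  # 0.5
```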
Further, the training steps of the semantic segmentation model include:
Acquiring an image data set of the polyimide film and the copper flat wire during the wrapping process, and annotating the polyimide film and the copper flat wire in the images at pixel level;
Constructing a semantic segmentation model comprising an encoder and a decoder, wherein the encoder downsamples the input image and extracts features, using convolution and pooling operations to reduce the spatial resolution of the feature maps and increase their number of channels, thereby capturing the semantic information of the image;
The decoder upsamples the features extracted by the encoder and performs semantic segmentation, restoring the spatial resolution of the feature maps through deconvolution layers and skip connections, and mapping the semantic information extracted by the encoder to a pixel-level segmentation result;
Inputting the pixel-annotated image data set into the semantic segmentation model for training, with a cross-entropy loss function as the optimization objective, and iteratively updating the model parameters by stochastic gradient descent.
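The training objective and update rule in the last step can be illustrated in a few lines of NumPy. To keep the sketch short, a per-pixel linear classifier (equivalent to a single 1×1 convolution) stands in for the encoder-decoder, and the two-dimensional pixel features and the three classes (background, copper flat wire, polyimide film) are synthetic assumptions; only the pixel-wise cross-entropy loss and the stochastic gradient descent step correspond to the text.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def pixel_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy over pixels; logits (N, C), labels (N,) integer class ids."""
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))


def sgd_step(W, b, feats, labels, rng, lr=0.1, batch=64):
    """One stochastic gradient descent update on a random mini-batch of pixels."""
    idx = rng.choice(len(labels), size=batch, replace=False)
    x, y = feats[idx], labels[idx]
    grad = softmax(x @ W + b)
    grad[np.arange(batch), y] -= 1.0                  # d(loss)/d(logits) for softmax + CE
    return W - lr * (x.T @ grad) / batch, b - lr * grad.mean(axis=0)


rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=600)                 # 0 background, 1 copper, 2 film
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
feats = centers[labels] + rng.normal(0.0, 0.5, size=(600, 2))
W, b = np.zeros((2, 3)), np.zeros(3)
loss_before = pixel_cross_entropy(feats @ W + b, labels)   # ln(3) for all-zero weights
for _ in range(200):
    W, b = sgd_step(W, b, feats, labels, rng)
loss_after = pixel_cross_entropy(feats @ W + b, labels)
print(round(loss_before, 4), loss_after < loss_before)
```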
Further, the convolution operation is formulated as follows:
$$F_l^{(k)}(X) = \sigma\left(W_{l-1,l}^{(k)} * F_{l-1}(X) + b_l\right), \quad k = 1, 2, \ldots, K;$$
Wherein, $X$ is the input image; $F_l(X)$ is the output feature of the input image $X$ at layer $l$, whose $k$-th channel is $F_l^{(k)}(X)$; $W_{l-1,l}^{(k)}$ is the weight matrix of the $k$-th convolution kernel connecting layer $l-1$ and layer $l$; $*$ denotes the convolution operation; $F_{l-1}(X)$ is the output feature of the input image $X$ at layer $l-1$; $b_l$ is the bias of layer $l$; $\sigma$ is the activation function; $K$ is the number of convolution kernels.
Further, the sintering temperature control and the annealing cooling temperature control are performed according to a preset sintering working temperature and a preset annealing cooling temperature, respectively, each maintained for a corresponding preset time.
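A preset temperature schedule of this kind can be expressed as a piecewise function of elapsed time. The sketch is illustrative only: the ramp rate, sintering temperature, hold time and the two annealing stages are placeholder values, not figures from this document, and the staged step-down shape is an assumption about how the preset annealing temperature and time might be organized.

```python
def preset_temperature(t: float, ambient: float = 25.0, sinter_temp: float = 400.0,
                       ramp_rate: float = 10.0, hold_s: float = 60.0,
                       anneal_stages: tuple = ((250.0, 60.0), (120.0, 60.0))) -> float:
    """Preset temperature (deg C) at time t (s) from the start of the sintering cycle.

    Ramp from ambient at ramp_rate (deg C/s) to sinter_temp, hold for hold_s, then step
    down through the (temperature, duration) annealing cooling stages; afterwards ambient.
    """
    ramp_s = (sinter_temp - ambient) / ramp_rate
    if t < ramp_s:
        return ambient + ramp_rate * t          # heating phase
    t -= ramp_s
    if t < hold_s:
        return sinter_temp                      # holding phase at sintering temperature
    t -= hold_s
    for stage_temp, stage_s in anneal_stages:   # staged annealing cooling
        if t < stage_s:
            return stage_temp
        t -= stage_s
    return ambient


print(preset_temperature(10.0), preset_temperature(50.0), preset_temperature(200.0))  # 125.0 400.0 120.0
```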
Further, the step of constructing a decision model by deep learning based on the difference value between the insulation rate of the copper flat wire and the preset standard insulation rate, the wrapping rate, the polyimide film width and the advancing rate of the copper flat wire comprises the following steps:
Defining a state space comprising the difference between the insulation rate of the copper flat wire and the preset standard insulation rate, the wrapping rate, the polyimide film width and the advancing rate of the copper flat wire;
Constructing a decision model based on the state space, an action space and a reward function, performing reinforcement learning training with historical data, and limiting the difference between the new and old policies during iterative training by means of a proximal policy optimization (PPO) algorithm.
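The state and action spaces named in this step might be encoded as below. The four state fields follow the text; the discrete action set, in which each of the wrapping rate and the advancing rate is decreased by 2 %, kept, or increased by 2 %, is a hypothetical choice made for illustration, as are the class and function names.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    insulation_diff: float   # measured insulation rate minus preset standard insulation rate
    wrapping_rate: float
    film_width: float
    advancing_rate: float

    def as_vector(self) -> list:
        return [self.insulation_diff, self.wrapping_rate, self.film_width, self.advancing_rate]


# Hypothetical action set: relative change applied to (wrapping rate, advancing rate)
STEPS = (-0.02, 0.0, 0.02)
ACTIONS = list(itertools.product(STEPS, STEPS))        # 9 actions, index 4 = no change


def apply_action(state: State, action_id: int) -> State:
    """Apply the chosen rate adjustments; film width is fixed, and insulation_diff
    is only updated later by the detection verification measurement."""
    d_wrap, d_adv = ACTIONS[action_id]
    return State(state.insulation_diff,
                 state.wrapping_rate * (1.0 + d_wrap),
                 state.film_width,
                 state.advancing_rate * (1.0 + d_adv))
```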
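Further, the objective function of the proximal policy optimization algorithm is as follows:
$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_t\right)\right];$$
Wherein, $L^{CLIP}(\theta)$ is the objective function; $\hat{\mathbb{E}}_t$ denotes the expected value over time step $t$; $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{old}}(a_t \mid s_t)$ is the probability ratio of the new policy to the old policy at time step $t$; $\hat{A}_t$ is the advantage function, calculated as the difference between the actual return and a baseline; $\epsilon$ is a hyperparameter used to limit the probability ratio; $\operatorname{clip}(\cdot)$ restricts $r_t(\theta)$ to the interval between $1-\epsilon$ and $1+\epsilon$; $\min$ denotes taking the minimum.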
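The clipped objective can be evaluated from the log-probabilities that the new and old policies assign to the actions actually taken. This NumPy sketch averages min(r·A, clip(r, 1 − ε, 1 + ε)·A) over time steps; `eps = 0.2` is a common default, not a value given in the text, and in training this quantity is maximized by gradient ascent on the policy parameters.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Empirical PPO clipped surrogate: mean_t of min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))


# Ratios 1.5 and 0.5 are clipped to 1.2 and 0.8, so the two terms are 1.2 and -0.8 (mean about 0.2)
print(ppo_clip_objective(np.log([0.75, 0.25]), np.log([0.5, 0.5]), [1.0, -1.0]))
```

Taking the minimum makes the objective pessimistic: a policy change is never rewarded for moving the ratio beyond the clip range, which is how the difference between new and old policies is limited.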
Further, the reward function includes:
If the difference between the insulation rate of the copper flat wire and the preset standard insulation rate is positive or zero, the reward is positive and negatively correlated with the stacking rate;
If the difference between the insulation rate of the copper flat wire and the preset standard insulation rate is negative, the reward is negative and independent of the stacking rate.
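A reward with these properties could look like the function below. The text fixes only the signs and the monotonicity; the form `scale / (1 + stacking_rate)`, which stays positive and decreases as the stacking rate grows, and the constant penalty of −1 are assumptions.

```python
def reward(insulation_diff: float, stacking_rate: float,
           scale: float = 1.0, penalty: float = -1.0) -> float:
    """Reward for one decision step.

    insulation_diff >= 0: positive reward that shrinks as the stacking rate grows.
    insulation_diff < 0: fixed negative reward, independent of the stacking rate.
    """
    if insulation_diff >= 0:
        return scale / (1.0 + stacking_rate)
    return penalty


print(reward(0.02, 0.25), reward(0.02, 0.5), reward(-0.01, 0.25))  # 0.8 0.6666666666666666 -1.0
```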
The invention has the following beneficial effects. Combining a moving light source with image recognition automatically detects possible defects on the surface of the copper flat wire, so that only copper flat wire in good surface condition enters the wrapping stage; this prevents, at the source, uneven wrapping caused by surface irregularities. Dynamically adjusting the wrapping speed and the advancing speed of the copper flat wire, in combination with the data provided by the pre-checking module, ensures precise lapping of the polyimide film. Because the process no longer depends on fixed mechanical settings but adjusts process parameters through real-time feedback, wrapping can adapt to changing production conditions while each wrap remains uniform and accurate. The stacking rate of the polyimide film on the copper flat wire is calculated by processing images captured in real time and accurately segmenting the polyimide film and the copper flat wire; this image-processing and artificial-intelligence approach overcomes the limitations of traditional methods on complex lapping patterns and enables accurate control of the stacking rate. The detection verification module measures the insulation rate of the copper flat wire that has completed the whole process and compares it with a preset standard, ensuring that the electrical performance of the final product meets the standard. The stacking rate control module introduces the proximal policy optimization (PPO) algorithm to establish a closed-loop control system based on reinforcement learning, which autonomously learns to adjust key parameters such as the wrapping speed and the advancing speed from multidimensional input data.
In this process, the system continuously optimizes the decision model according to actual production data and minimizes the stacking rate while meeting the insulation requirements, thereby reducing material consumption and cost.
Detailed Description
Referring to Figs. 1-2, the invention relates to an automatic stacking rate control system for polyimide sintered wire, which comprises a pre-checking module, a wrapping module, a stacking rate monitoring module, a sintering module, an annealing cooling module, a detection verification module and a stacking rate control module connected in sequence, wherein the stacking rate control module is additionally connected to the detection verification module and to the wrapping module;
The pre-checking module is used for inspecting the surface characterization of the copper flat wire by means of a moving light source and image recognition, and for confirming the characterization state of the copper flat wire, the characterization state being either "characterization defect present" or "characterization defect absent";
The wrapping module is used for controlling the wrapping speed of the polyimide film and the advancing speed of copper flat wire whose characterization state is "characterization defect absent";
The stacking rate monitoring module is used for performing semantic segmentation of the polyimide film and the copper flat wire in image data captured during wrapping, based on a semantic segmentation model, and for calculating the stacking rate of the polyimide film as the proportion of the film width that overlaps on the copper flat wire;
The sintering module is used for controlling the sintering temperature of the copper flat wire after it has been wrapped with the polyimide film; the annealing cooling module is used for controlling the annealing cooling temperature of the sintered copper flat wire; and the detection verification module is used for detecting the insulation rate of the finished copper flat wire and comparing it with a preset standard insulation rate;
The stacking rate control module is used for constructing a decision model through deep learning, taking as inputs the difference between the insulation rate of the copper flat wire and the preset standard insulation rate, the wrapping rate, the polyimide film width and the advancing rate of the copper flat wire, and for generating, through the decision model, decisions for adjusting the advancing rate of the copper flat wire and the wrapping rate.
In some embodiments, the system first detects the surface condition of the copper flat wire accurately by means of the pre-checking module. This module includes a high-resolution industrial camera and an adjustable moving light source. The light source illuminates the surface of the copper flat wire uniformly from multiple angles, and the industrial camera captures multi-angle image data in real time. Using an embedded processor, the system analyzes the data with image recognition algorithms to identify possible micro-defects on the surface, such as asperities, cracks or contaminants. The pre-checking module is mounted in an integrated frame that docks directly with the production line, and it transmits detection results to the control system in real time over a standard communication protocol. The wrapping module mainly comprises a high-precision servo motor, a guide rail system and a tension controller. The servo motor controls the wrapping speed of the polyimide film, and the guide rail system precisely controls the advancing speed of the copper flat wire. The tension controller keeps the film tension within a preset range during wrapping, preventing the film from becoming too loose or too tight. Through close linkage with the pre-checking module, the system dynamically adjusts the wrapping speed and the advancing speed, ensuring that the film is wrapped uniformly. All hardware of this module is monitored and regulated in real time by a central controller, ensuring fast and precise responses when production conditions change. The stacking rate monitoring module comprises an industrial camera and an image processing unit; the image processing unit is GPU-accelerated and can run a deep learning model in real time to perform semantic segmentation of the images.
Through the cooperation of this hardware, the system can quickly calculate the stacking rate and transmit the data to the central control system for further optimization and adjustment. The sintering module works together with the annealing cooling module, the two controlling the sintering and the cooling of the copper flat wire, respectively. The sintering module uses precision heating elements and temperature sensors to ensure that the polyimide film is uniformly sintered onto the surface of the copper flat wire. Its temperature control system maintains an accurate temperature profile in the high-temperature environment, ensuring consistent material properties. The annealing cooling module cools the sintered copper flat wire gradually by means of cooling fans and a liquid cooling system, preventing thermal stress in the material and improving the mechanical strength and insulation performance of the copper flat wire. The detection verification module integrates a high-precision insulation resistance tester and a dielectric strength tester to evaluate the insulation performance of the copper flat wire comprehensively. The insulation resistance tester measures the resistance of the outer polyimide film layer of the copper flat wire by applying a specified DC voltage. This resistance reflects the insulating effect of the polyimide film; an insulation resistance below the preset standard indicates that the film is defective or insufficiently lapped. To ensure accurate measurements, the system strictly controls the environmental conditions during testing: the temperature and humidity of the test environment are kept within the standard range, and a shielding box prevents external electromagnetic interference from affecting the measurement results.
During testing, the system first applies a low voltage as a pre-check to ensure that there is no arcing or obvious insulation failure, and then increases the voltage step by step until the specified test voltage is reached. After the test, the system automatically uploads the measurement data to the central control system, which compares the measured insulation resistance and breakdown voltage with the preset standards and calculates the difference between the insulation performance and the standards. If the difference is positive, the batch of copper flat wire is qualified.
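The comparison step might be sketched as follows. The text does not say how insulation resistance and breakdown voltage are combined into one difference value; this sketch assumes a relative margin per metric and takes the weaker of the two, so that a batch is qualified only if both exceed their standards. The voltage step list mirrors the low-voltage pre-check followed by a stepwise increase. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InsulationResult:
    resistance_mohm: float   # insulation resistance, megaohms
    breakdown_kv: float      # breakdown voltage, kilovolts


def voltage_steps(pre_kv: float, target_kv: float, step_kv: float) -> list:
    """Low pre-check voltage first, then equal increments up to the specified test voltage."""
    steps, v = [pre_kv], pre_kv
    while v + step_kv < target_kv:
        v += step_kv
        steps.append(v)
    steps.append(target_kv)
    return steps


def insulation_difference(measured: InsulationResult, standard: InsulationResult) -> float:
    """Relative margin of the weaker metric: min over metrics of (measured - standard) / standard."""
    return min(
        (measured.resistance_mohm - standard.resistance_mohm) / standard.resistance_mohm,
        (measured.breakdown_kv - standard.breakdown_kv) / standard.breakdown_kv,
    )


def batch_qualified(measured: InsulationResult, standard: InsulationResult) -> bool:
    return insulation_difference(measured, standard) > 0


standard = InsulationResult(resistance_mohm=1000.0, breakdown_kv=4.0)
print(voltage_steps(0.5, 3.0, 1.0))                               # [0.5, 1.5, 2.5, 3.0]
print(batch_qualified(InsulationResult(1200.0, 5.0), standard))   # True
```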
Further, inspecting the surface characterization of the copper flat wire by means of the moving light source and image recognition, and confirming the characterization state of the copper flat wire, comprises the following steps:
Controlling the moving light source to move along a preset trajectory while irradiating the surface of the copper flat wire;
Acquiring image data of the surface of the copper flat wire from a plurality of angles while the light source is moving;
Performing color value trend judgment on the multi-angle image data of the copper flat wire surface, and extracting abnormal points that do not conform to the color value trend;
Confirming whether the surface of the copper flat wire has a characterization defect according to the area occupied by the abnormal points.
In some embodiments, the high-resolution camera captures image data of the copper flat wire surface in real time as the light source moves along the set trajectory. The image data are passed directly to the image analysis algorithm by a data stream processing module. First, the system preprocesses each frame, including noise filtering, contrast enhancement and geometric correction, so that the input image data are of high and consistent quality. Next, the system analyzes the images with a color value trend judgment algorithm. Based on statistical and machine learning techniques, the algorithm identifies regions with abnormal color values by comparing the chromaticity histograms of the multi-angle images, and determines from a predetermined color value model which regions have color value distributions that deviate from the normal range. The system marks regions that deviate significantly as abnormal points. The algorithm then performs cluster analysis on the abnormal points and calculates their occupied area and distribution in the image. Clustering algorithms such as K-means or DBSCAN (Density-Based Spatial Clustering of Applications with Noise) group densely located abnormal points into candidate characterization defect regions according to their spatial distribution. The size of each region is determined by its pixel area in the image, and if the area exceeds a set threshold, the system marks the region as a potential defect. The system can additionally screen the clustering results a second time, using a rule-based or machine-learning classifier to judge the actual impact of each region.
For example, the shape, texture and distribution characteristics of these regions are further analyzed using a Support Vector Machine (SVM) or Convolutional Neural Network (CNN) to accurately determine whether the region belongs to a true characterization defect. Finally, the system generates a characterization report detailing all detected outliers and their corresponding characterization defect regions. This report will be used to determine whether the copper flat wire is suitable for entering a subsequent processing step. If a serious defect is found, the system automatically informs the operator to take corresponding action, such as adjusting process parameters or replacing materials.
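The grouping and area test can be sketched with 8-connected component labelling, a simpler technique swapped in here for the K-means or DBSCAN clustering named in the text: abnormal pixels that touch each other form one region, and only regions whose pixel area reaches a threshold are kept as potential defects. The `min_area` value and the region dictionary layout are assumptions.

```python
from collections import deque

import numpy as np


def defect_regions(abnormal: np.ndarray, min_area: int = 20) -> list:
    """Group 8-connected abnormal pixels and keep regions with pixel area >= min_area."""
    h, w = abnormal.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for r0, c0 in zip(*np.nonzero(abnormal)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, pixels = deque([(int(r0), int(c0))]), []
        while queue:                                   # breadth-first flood fill
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and abnormal[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
        if len(pixels) >= min_area:
            rows, cols = zip(*pixels)
            regions.append({"area": len(pixels),
                            "bbox": (min(rows), min(cols), max(rows), max(cols))})
    return regions


mask = np.zeros((64, 64), dtype=bool)
mask[10:20, 10:20] = True        # a 100-pixel cluster of abnormal points
mask[50, 50] = True              # an isolated abnormal pixel, discarded as noise
print(defect_regions(mask))      # [{'area': 100, 'bbox': (10, 10, 19, 19)}]
```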
Further, the stacking rate monitoring module is configured to perform the following steps:
Acquiring image data of a copper flat wire wrapped by a polyimide film in a wrapping process;
Preprocessing image data, including image denoising, contrast enhancement and distortion correction;
Inputting the preprocessed image data into a pre-trained semantic segmentation model, and classifying the polyimide film and the copper flat wire in the image to obtain a segmentation mask of the polyimide film and the copper flat wire;
Extracting the width of the polyimide film through a segmentation mask of the polyimide film;
Calculating the stacking rate of the polyimide film as the ratio of the width of the overlapping area of the polyimide film on the copper flat wire to the width of the polyimide film.
In some embodiments, the preprocessing stage applies several image processing algorithms, including denoising, contrast enhancement and distortion correction. Denoising uses a filtering algorithm such as Gaussian filtering (a convolution) or median filtering to remove random noise from the image. Contrast enhancement algorithms, such as histogram equalization, increase the contrast between the polyimide film and the copper flat wire in the image so that both remain clearly distinguishable in subsequent analysis. Distortion correction uses a geometric transformation to correct image deformation caused by optical or mechanical factors, ensuring the accuracy and consistency of the image. The preprocessed image data are then input into the pre-trained semantic segmentation model. The model adopts a convolutional neural network (CNN) architecture, specifically a deep network such as U-Net or SegNet, to classify the polyimide film and the copper flat wire in the image accurately. Having learned from a large amount of annotated data, the model identifies and segments the polyimide film and copper flat wire regions in the image and generates the corresponding segmentation masks. The segmentation mask assigns each pixel to a class such as polyimide film or copper flat wire; the mask of a single class is a binary image. This mask provides the basic data for subsequent analysis. From the generated segmentation mask, the system extracts the width of the polyimide film. The width extraction typically uses edge detection and connected-component analysis to measure the width of the polyimide film in the mask accurately. This step must take into account the spatial resolution of the image and the geometric distortion correction to ensure measurement accuracy.
Finally, the system determines the stacking rate as the ratio of the width of the overlapping area of the polyimide film on the copper flat wire to the total width of the film.
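Two of the preprocessing operations named above, median filtering and global histogram equalization, can be written in plain NumPy for 8-bit grayscale frames as in the sketch below. It is illustrative rather than production code (a library such as OpenCV would normally be used), and distortion correction is omitted because it depends on camera calibration data not given here. The synthetic frames are assumptions.

```python
import numpy as np


def median3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter with edge replication; removes isolated impulse noise."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    windows = np.stack([p[r:r + h, c:c + w] for r in range(3) for c in range(3)])
    return np.median(windows, axis=0).astype(img.dtype)


def equalize_hist(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization for a uint8 image via a cumulative-histogram lookup table."""
    cdf = np.bincount(img.ravel(), minlength=256).cumsum()
    cdf_min = cdf[cdf > 0][0]
    scale = max(int(cdf[-1] - cdf_min), 1)
    lut = np.clip(np.round((cdf - cdf_min) / scale * 255), 0, 255).astype(np.uint8)
    return lut[img]


noisy = np.full((20, 20), 100, dtype=np.uint8)
noisy[5, 5] = 255                                        # salt-noise pixel
low_contrast = np.tile(np.arange(100, 111, dtype=np.uint8), (11, 1))
print(median3x3(noisy).max(), equalize_hist(low_contrast).min(), equalize_hist(low_contrast).max())  # 100 0 255
```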
Further, the training steps of the semantic segmentation model include:
Acquiring an image data set of the polyimide film and the copper flat wire during the wrapping process, and annotating the polyimide film and the copper flat wire in the images at pixel level;
Constructing a semantic segmentation model comprising an encoder and a decoder, wherein the encoder downsamples the input image and extracts features, using convolution and pooling operations to reduce the spatial resolution of the feature maps and increase their number of channels, thereby capturing the semantic information of the image;
The decoder upsamples the features extracted by the encoder and performs semantic segmentation, restoring the spatial resolution of the feature maps through deconvolution layers and skip connections, and mapping the semantic information extracted by the encoder to a pixel-level segmentation result;
Inputting the pixel-annotated image data set into the semantic segmentation model for training, with a cross-entropy loss function as the optimization objective, and iteratively updating the model parameters by stochastic gradient descent.
In some embodiments, pixel-level annotation of the image data is typically performed manually by a practitioner using an annotation tool, with each pixel classified as polyimide film, copper flat wire or background. This fine annotation provides an accurate supervisory signal, allowing the model to learn small features in the images during training. For the semantic segmentation model, this embodiment adopts an encoder-decoder architecture. The encoder is responsible for downsampling the input image and extracting features. Specifically, the encoder extracts local image features through a series of convolution operations while reducing the spatial resolution of the feature maps through pooling. This not only reduces the amount of computation but also helps the model capture more abstract semantic information. The number of channels of the feature maps increases as the convolutional layers deepen, so that more complex features can be represented. The decoder upsamples the high-level features extracted by the encoder and restores them to the spatial resolution of the input image. The decoder typically enlarges the feature maps by deconvolution (transposed convolution) and uses skip connections to bring low-level detail information from the encoder into the decoder. This design combines high-level semantic information with low-level detail while restoring spatial resolution, producing accurate pixel-level segmentation results. During training, the annotated image data set is input into the semantic segmentation model, and the cross-entropy loss function serves as the optimization objective.
The cross-entropy loss measures the difference between the model's predictions and the actual annotations; the smaller the loss, the closer the predictions are to the annotations. In each iteration, the system updates the model parameters by stochastic gradient descent (SGD): it computes the gradient of the loss function with respect to the model parameters on a mini-batch of training data and moves the parameters a small step in the direction of the negative gradient, gradually approaching an optimal solution.
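The data flow through an encoder-decoder with a skip connection can be traced with the NumPy sketch below, which uses untrained random weights and therefore shows only how resolutions and channel counts change. Random 1×1 channel mixing stands in for learned 3×3 convolutions, and nearest-neighbour upsampling stands in for deconvolution; both substitutions, the channel counts and the function names are assumptions made to keep the example short.

```python
import numpy as np


def max_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling on a (C, H, W) feature map; halves H and W (both must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling on (C, H, W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def mix(x: np.ndarray, out_ch: int, rng, relu: bool = True) -> np.ndarray:
    """Random 1x1 channel mixing, a stand-in for a learned convolution."""
    y = np.einsum("oc,chw->ohw", rng.standard_normal((out_ch, x.shape[0])), x)
    return np.maximum(y, 0.0) if relu else y


def encoder_decoder(img: np.ndarray, n_classes: int = 3, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    e1 = mix(img, 16, rng)                         # encoder level 1: (16, H, W)
    e2 = mix(max_pool2(e1), 32, rng)               # level 2: (32, H/2, W/2), more channels
    d1 = np.concatenate([upsample2(e2), e1])       # decoder + skip connection: (48, H, W)
    logits = mix(d1, n_classes, rng, relu=False)   # per-pixel class scores
    return logits.argmax(axis=0)                   # (H, W) label map


labels = encoder_decoder(np.random.default_rng(1).random((3, 32, 32)))
print(labels.shape)   # (32, 32)
```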
Further, the convolution operation is formulated as follows:
$$F_l^{(k)}(X) = \sigma\left(W_{l-1,l}^{(k)} * F_{l-1}(X) + b_l\right), \quad k = 1, 2, \ldots, K;$$
Wherein, $X$ is the input image; $F_l(X)$ is the output feature of the input image $X$ at layer $l$, whose $k$-th channel is $F_l^{(k)}(X)$; $W_{l-1,l}^{(k)}$ is the weight matrix of the $k$-th convolution kernel connecting layer $l-1$ and layer $l$; $*$ denotes the convolution operation; $F_{l-1}(X)$ is the output feature of the input image $X$ at layer $l-1$; $b_l$ is the bias of layer $l$; $\sigma$ is the activation function; $K$ is the number of convolution kernels.
Specifically, $F_l(X)$ is the feature extraction result of the input image $X$ at layer $l$. Each convolution kernel $W_{l-1,l}^{(k)}$ is a matrix of fixed size (e.g., 3×3 or 5×5) used to extract features of a local region of the input feature map. The bias term $b_l$ is a scalar or matrix that shifts the result of the convolution operation, increasing the expressive power of the model. The activation function $\sigma$ is a nonlinear function, such as ReLU (Rectified Linear Unit), which introduces nonlinearity so that the model can learn more complex features. The number of convolution kernels $K$ indicates how many different convolution kernels are used in the current layer. Each kernel extracts a different feature, so the output feature map has multiple channels.
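For a single layer, the convolution formula can be computed directly as in this sketch, which loops over the K kernels and applies ReLU as the activation function. As in most deep-learning frameworks, the "convolution" is implemented as a cross-correlation (the kernel is not flipped), without padding and with stride 1; these choices, and the single scalar bias shared by the layer to match the per-layer bias in the definition, are assumptions of the sketch (frameworks usually keep one bias per kernel).

```python
import numpy as np


def conv_layer(f_prev: np.ndarray, kernels: np.ndarray, bias: float) -> np.ndarray:
    """Compute F_l^(k) = ReLU(W^(k) * F_{l-1} + b_l) for k = 1..K.

    f_prev:  (C_in, H, W) output feature map of layer l-1
    kernels: (K, C_in, kh, kw) weights, one kernel per output channel
    bias:    scalar bias b_l shared by the layer
    returns: (K, H-kh+1, W-kw+1) feature map of layer l, one channel per kernel
    """
    n_k, _, kh, kw = kernels.shape
    _, h, w = f_prev.shape
    out = np.empty((n_k, h - kh + 1, w - kw + 1))
    for k in range(n_k):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # weighted sum over the local kh x kw window across all input channels
                out[k, i, j] = np.sum(f_prev[:, i:i + kh, j:j + kw] * kernels[k]) + bias
    return np.maximum(out, 0.0)   # activation function sigma = ReLU


x = np.ones((1, 5, 5))
kernels = np.stack([np.ones((1, 3, 3)), -np.ones((1, 3, 3))])   # K = 2 kernels
y = conv_layer(x, kernels, bias=0.0)
print(y.shape, y[0, 0, 0], y[1, 0, 0])   # (2, 3, 3) 9.0 0.0
```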
Further, the sintering temperature control and the annealing cooling temperature control are performed according to a preset sintering working temperature and a preset annealing cooling temperature, respectively, each maintained for a corresponding preset time.
Specifically, the sintering temperature control module first sets a sintering working temperature within a certain range according to the material characteristics and process requirements. The setting is determined from preliminary experimental data and thermal performance analysis of the materials, and must generally ensure that the polyimide film forms a uniform, firmly adhered layer on the surface of the copper flat wire without damaging the copper flat wire. To achieve accurate temperature control, the system uses a closed-loop control strategy: high-precision temperature sensors installed in the sintering furnace monitor the temperature of the sintering zone in real time and feed the data back to a temperature controller, which adjusts the heater power output in real time according to the deviation between the actual and preset temperatures so that the temperature stays near the set value. The sintering process usually includes a preset heating phase and a holding phase, whose timing is determined by the thermal stability of the polyimide film and the thermal conductivity of the copper flat wire. In the heating phase, the system raises the temperature to the target sintering temperature at a preset rate; in the holding phase, it maintains that temperature for a period of time to ensure that the polyimide film fully melts and adheres to the surface of the copper flat wire. After sintering, the system immediately enters the annealing cooling stage. The annealing cooling module likewise sets a preset cooling temperature and a corresponding cooling time. The main purpose of annealing and cooling is to relieve stress in the material by reducing the temperature gradually, thereby improving the mechanical strength and durability of the polyimide film.
At this stage, the temperature controller adopts a staged cooling strategy according to the thermodynamic characteristics of the annealing process, preventing the thermal stress that would accumulate in the material if it cooled too quickly. Accurate control of the cooling process relies not only on real-time data from the temperature sensor but also on calculations from a heat conduction model, so that the rate of temperature decrease matches the cooling characteristics of the material. The entire sintering and annealing cooling process strictly follows the preset time nodes.
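The closed-loop idea described above, in which the heater power is adjusted from the deviation between the measured and preset temperatures, can be sketched with a proportional-integral (PI) controller driving a first-order furnace model. The controller type, the gains, the model parameters and the anti-windup rule (freeze the integral while the heater output is saturated) are all assumptions; the text does not specify the control law.

```python
def simulate_furnace(setpoint: float = 400.0, t_end: float = 1000.0, dt: float = 0.5,
                     kp: float = 0.02, ki: float = 0.001, ambient: float = 25.0,
                     gain: float = 600.0, tau: float = 50.0) -> float:
    """Return the furnace temperature (deg C) after t_end seconds of PI control.

    Assumed plant: dT/dt = (ambient + gain * u - T) / tau, with heater power u in [0, 1].
    """
    temp, integral = ambient, 0.0
    for _ in range(int(t_end / dt)):
        error = setpoint - temp
        u_raw = kp * error + ki * integral
        u = min(max(u_raw, 0.0), 1.0)          # heater power is physically limited
        if u == u_raw:                         # anti-windup: integrate only when unsaturated
            integral += error * dt
        temp += dt * (ambient + gain * u - temp) / tau   # explicit Euler step of the plant
    return temp


print(round(simulate_furnace(), 1))   # 400.0
```

The integral term removes the steady-state offset that a proportional-only controller would leave, since holding 400 °C in this model needs a constant heater power of (400 − 25)/600 = 0.625.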
Further, the step of constructing a decision model by deep learning based on the difference value between the insulation rate of the copper flat wire and the preset standard insulation rate, the wrapping rate, the polyimide film width and the advancing rate of the copper flat wire comprises the following steps:
Defining a state space comprising the difference between the insulation rate of the copper flat wire and the preset standard insulation rate, the wrapping rate, the polyimide film width and the advancing rate of the copper flat wire;
Constructing a decision model based on the state space, an action space and a reward function, performing reinforcement learning training with historical data, and limiting the difference between the new and old policies during iterative training by means of a proximal policy optimization (PPO) algorithm.
In some embodiments, a state space is defined first. The state space comprises the key parameters of the production process: the difference between the insulation rate of the copper flat wire and the preset standard insulation rate, the current wrapping rate, the width of the polyimide film and the advancing rate of the copper flat wire. Together, these parameters describe the current state of the production process and form the basic input data of the decision model. This definition ensures that the model can perceive all the relevant factors in the production process and therefore make accurate process adjustments. Next, an action space is defined. The action space covers the adjustment ranges of the wrapping rate and the advancing rate. In each state, the model may choose to adjust the wrapping rate, adjust the advancing rate, or leave both unchanged. The definition of the action space directly determines which adjustment strategies the model can take in different states, and therefore affects the production results. To enable the model to learn and improve during optimization, a reward function based on the insulation rate difference and the stacking rate is constructed. The reward function quantifies the effect of the model selecting a given action in a given state. Specifically, a positive reward is given when the insulation rate of the copper flat wire approaches or reaches the preset standard while the stacking rate is reduced, and a negative reward is given when the insulation rate deviates from the standard or the stacking rate is high. The reward function is designed to guide the model to minimize material waste and optimize the process parameters while guaranteeing insulation performance. The model is trained with the proximal policy optimization (PPO) algorithm from reinforcement learning.
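The state and action spaces described above can be written down concretely. The following is a minimal sketch assuming a discrete action set (keep, or step one of the two rates up or down); the class and function names, the units, the step sizes and the allowed ranges are hypothetical placeholders, since the text specifies only which quantities are involved.

```python
from dataclasses import dataclass, replace
from enum import Enum


@dataclass(frozen=True)
class State:
    insulation_diff: float  # measured insulation rate minus preset standard insulation rate
    wrapping_rate: float    # polyimide film wrapping rate (hypothetical unit)
    film_width: float       # polyimide film width, mm (hypothetical unit)
    advancing_rate: float   # copper flat wire advancing rate (hypothetical unit)

    def as_vector(self):
        """Feature vector fed to the decision model."""
        return [self.insulation_diff, self.wrapping_rate, self.film_width, self.advancing_rate]


class Action(Enum):
    KEEP = 0
    WRAP_UP = 1
    WRAP_DOWN = 2
    ADVANCE_UP = 3
    ADVANCE_DOWN = 4


# Hypothetical step sizes and adjustment ranges for the two controllable rates.
WRAP_STEP, WRAP_RANGE = 0.5, (5.0, 30.0)
ADV_STEP, ADV_RANGE = 0.2, (1.0, 10.0)


def clamp(x, bounds):
    lo, hi = bounds
    return min(max(x, lo), hi)


def apply_action(state, action):
    """Return the state with updated rate setpoints; insulation_diff is re-measured by the process."""
    wrap, adv = state.wrapping_rate, state.advancing_rate
    if action is Action.WRAP_UP:
        wrap += WRAP_STEP
    elif action is Action.WRAP_DOWN:
        wrap -= WRAP_STEP
    elif action is Action.ADVANCE_UP:
        adv += ADV_STEP
    elif action is Action.ADVANCE_DOWN:
        adv -= ADV_STEP
    return replace(state, wrapping_rate=clamp(wrap, WRAP_RANGE), advancing_rate=clamp(adv, ADV_RANGE))


s = State(insulation_diff=0.01, wrapping_rate=10.0, film_width=12.0, advancing_rate=5.0)
print(apply_action(s, Action.WRAP_UP).as_vector())
```

Clamping each action to its adjustment range keeps the policy from commanding rates outside what the wrapping module can deliver; a continuous action space would be an equally valid reading of the text.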
First, the system generates a large number of state-action pairs from historical production data and a simulation environment for the model's initial learning. In each iteration, the PPO algorithm limits the magnitude of the model update by computing the difference between the current policy and the old policy, avoiding the instability caused by excessively large policy changes. Through these small, controlled updates, the policy is optimized gradually over successive iterations and eventually converges to a stable and efficient policy.
Through continuous iterative optimization, training enables the model to predict and execute the optimal process adjustment strategy accurately under different production states. As training data accumulates, the model's decision-making capability improves, allowing real-time optimization during production and continuous improvement of product quality and efficiency.
Further, the objective function of the proximal policy optimization algorithm is as follows:
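$$L^{CLIP}(\theta)=\hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),1-\epsilon,1+\epsilon\right)\hat{A}_t\right)\right];$$
Wherein, $L^{CLIP}(\theta)$ is the objective function; $\hat{\mathbb{E}}_t$ denotes the expected value over time step $t$; $r_t(\theta)=\pi_\theta(a_t\mid s_t)/\pi_{\theta_{old}}(a_t\mid s_t)$ is the probability ratio of the new policy to the old policy at time step $t$, where $\pi$ is the policy; $\hat{A}_t$ is the advantage function, computed as the difference between the actual return and a baseline; $\epsilon$ is a hyperparameter that limits the probability ratio $r_t(\theta)$; $\operatorname{clip}(\cdot)$ restricts $r_t(\theta)$ to the interval between $1-\epsilon$ and $1+\epsilon$ through a clipping operation; $\min(\cdot)$ denotes taking the minimum.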
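The clipped objective can be evaluated directly from sampled data. This is a minimal sketch of the standard PPO clipped surrogate, assuming the policy supplies log-probabilities of the sampled actions under the new and old parameters; the function name `ppo_clip_objective` is hypothetical, and a practical implementation would compute the same quantity on tensors so it can be differentiated.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, epsilon=0.2):
    """Sample estimate of E_t[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)].

    new_logp / old_logp: log pi(a_t | s_t) under the new and old policies for each sampled step.
    """
    if not advantages or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                          # r_t(theta)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)  # clip(r_t, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)                   # take the more pessimistic term
    return total / len(advantages)                                 # empirical expectation over t


# A ratio of 1.5 on a positive advantage is capped at 1 + eps = 1.2.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

Taking the minimum means the clip only removes the incentive to move the ratio further in the direction that already improves the objective, which is what keeps each update close to the old policy.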
Specifically, the objective function first accounts for the change between the new policy and the old policy by computing the ratio of their probabilities of selecting an action in a given state. This ratio indicates whether the new policy deviates significantly from the old one, which helps keep policy updates stable. To prevent the policy from overshooting, PPO introduces a clipping mechanism: when computing the policy update, the objective function restricts the probability ratio of the new policy to a specific range. This range is controlled by a small hyperparameter $\epsilon$, which is usually set to 0.1 or 0.2. If the adjustment of the new policy falls outside this range, the algorithm clips it back into the range, avoiding drastic changes during the update. In addition, the PPO algorithm uses an advantage function, which measures the difference between the actual and expected performance of the current policy. By comparing the actual return of the model in a given state with the baseline return, the advantage function determines whether an action is better than average, which helps the model evaluate the effectiveness of each action more accurately during optimization. By combining the clipping mechanism and the advantage function in the objective function, the PPO algorithm achieves both stability and effectiveness in reinforcement learning. This design allows the model to improve its decision-making capability gradually during training without the instability or poor results caused by policy updates that are too fast or too slow. Over many iterations, the PPO algorithm enables the model to learn to make optimal decisions in a complex environment, raising the level of automation and optimization of the production process.
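The advantage used above can be sketched as the discounted return of each step minus a baseline estimate for that step. This is a minimal illustration assuming episodic data and baseline values supplied by some estimator (commonly a learned value function, which is an assumption here; the text only says "baseline"). The function names and the discount factor are hypothetical, and many PPO implementations use generalized advantage estimation instead of this plain return-minus-baseline form.

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards, baselines, gamma=0.99):
    """A_t = G_t - b(s_t): positive when the action did better than the baseline expected."""
    if len(rewards) != len(baselines):
        raise ValueError("one baseline value is needed per time step")
    return [g - b for g, b in zip(discounted_returns(rewards, gamma), baselines)]


print(advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5))
```

The resulting values would be passed as the advantage term of the clipped objective for each sampled step.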
Further, the reward function includes:
If the difference between the insulation rate of the copper flat wire and the preset standard insulation rate is positive or zero, the reward function gives a positive reward that is negatively correlated with the stacking rate;
and if the difference between the insulation rate of the copper flat wire and the preset standard insulation rate is negative, the reward function gives a negative reward that is independent of the stacking rate.
Specifically, when the difference between the insulation rate of the copper flat wire and the preset standard insulation rate is positive or zero, the insulation performance of the copper flat wire meets or exceeds the preset standard. In this case, the reward function gives a positive reward, representing positive feedback on the current policy. The reward value is negatively correlated with the stacking rate: the lower the stacking rate, the higher the reward. This design encourages the model to reduce the stacking rate as far as possible while guaranteeing insulation performance, thereby reducing material cost. When the difference is negative, the insulation performance of the copper flat wire has not reached the preset standard. To make the model prioritize insulation performance, the reward function gives a negative reward in this case, indicating that the current policy needs adjustment. Here the stacking rate does not affect the reward value, because the model's primary goal is to adjust the process parameters so that the insulation rate rises and product quality meets the standard.
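The two reward rules can be captured in a short piecewise function. This is a hedged sketch: the text fixes only the signs and the dependence on the stacking rate, so the specific shape `scale / (1 + stacking_rate)` (positive and decreasing for any non-negative stacking rate), the constant `penalty` and both default values are hypothetical choices, as is the function name `reward`.

```python
def reward(insulation_diff, stacking_rate, scale=1.0, penalty=-1.0):
    """Piecewise reward following the two rules in the text (scale and penalty are hypothetical).

    insulation_diff: measured insulation rate minus preset standard insulation rate.
    stacking_rate: film overlap ratio, assumed to be non-negative.
    """
    if stacking_rate < 0.0:
        raise ValueError("stacking_rate is assumed to be non-negative")
    if penalty >= 0.0:
        raise ValueError("penalty must be negative")
    if insulation_diff >= 0.0:
        # Standard met: positive reward that increases as the stacking rate decreases.
        return scale / (1.0 + stacking_rate)
    # Standard not met: fixed negative reward, independent of the stacking rate.
    return penalty


print(reward(0.02, 0.5), reward(-0.01, 0.5))
```

A penalty proportional to the insulation shortfall would be an alternative that still ignores the stacking rate, and may give the policy a clearer signal about how far it is from the standard.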
The above embodiments are merely illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solution of the present invention should fall within the scope of protection defined by the claims of the present invention without departing from the spirit of the design of the present invention.