
CN118378092B - Model training method, abnormality detection system, electronic device, and storage medium - Google Patents

Model training method, abnormality detection system, electronic device, and storage medium

Info

Publication number
CN118378092B
CN118378092B · CN202410805536.4A
Authority
CN
China
Prior art keywords
anomaly detection
sensor
token sequence
detection network
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410805536.4A
Other languages
Chinese (zh)
Other versions
CN118378092A (en)
Inventor
郭晏
冯云乔
张松涛
黄明
田鹏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Feitian Hangzhou Cloud Computing Technology Co ltd
Original Assignee
Alibaba Cloud Feitian Hangzhou Cloud Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Feitian Hangzhou Cloud Computing Technology Co ltd filed Critical Alibaba Cloud Feitian Hangzhou Cloud Computing Technology Co ltd
Priority to CN202410805536.4A priority Critical patent/CN118378092B/en
Publication of CN118378092A publication Critical patent/CN118378092A/en
Application granted granted Critical
Publication of CN118378092B publication Critical patent/CN118378092B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

In the embodiments of the present application, an anomaly detection network is built with a pre-trained language model as its base model and trained in an unsupervised manner, using normal sensor time-series data collected during product manufacturing as normal samples, to obtain an anomaly detection network with time-series reconstruction capability. The trained network has good accuracy and generalization capability and can perform anomaly analysis on time-series data efficiently and accurately. In particular, it can accurately and efficiently detect abnormal time-series data during semiconductor wafer processing, safeguarding the product yield of semiconductor wafers.

Description

Model training method, abnormality detection system, electronic device, and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a model training method, an anomaly detection method, a system, an electronic device, and a storage medium.
Background
A single semiconductor wafer costs between $1,000 and $10,000, and product yield in the industry is generally above 80%. For a factory producing 3,000 wafers per day, the losses attributable to yield exceed $30,000 per day, so improving product yield directly brings substantial economic benefit. Semiconductor wafer processing involves more than 2,000 process steps, each process tool carries 10 to 100 sensors, and massive amounts of sensor time-series data are therefore generated during processing. This sensor time-series data reflects the stability of the equipment during wafer processing and can, to a certain extent, reflect product yield.
At present, massive sensor time-series data is generally subjected to statistical analysis, whether the semiconductor wafer processing process is abnormal is judged from the statistical results, and equipment is overhauled in time when an abnormality is found so as to safeguard product yield. However, statistical analysis of massive sensor time-series data is increasingly difficult and prone to missed and false judgments, making the product yield of semiconductor wafers hard to guarantee.
Disclosure of Invention
Aspects of the present application provide a model training method, an anomaly detection method, a system, an electronic device, and a storage medium for efficiently and accurately performing anomaly analysis on time series data generated in a product manufacturing process.
An embodiment of the present application provides a model training method, including the following steps: segmenting sample sensor time-series data to obtain an original Token sequence, where the sample sensor time-series data is normal sensor time-series data collected during product manufacturing; inputting the original Token sequence into an anomaly detection network built on a target pre-trained language model, so that the anomaly detection network reconstructs the original Token sequence to obtain a reconstructed Token sequence; and adjusting network parameters of the anomaly detection network according to the reconstruction error between the original Token sequence and the reconstructed Token sequence.
An embodiment of the present application further provides an anomaly detection method, including the following steps: acquiring to-be-detected sensor time-series data collected by a sensor during product manufacturing; segmenting the to-be-detected sensor time-series data to obtain an original Token sequence; inputting the original Token sequence into an anomaly detection network, which reconstructs it to obtain a reconstructed Token sequence; determining an anomaly score from the original Token sequence and the reconstructed Token sequence; and determining that the to-be-detected sensor time-series data is abnormal if the anomaly score is greater than a preset score threshold.
An embodiment of the present application further provides an electronic device, comprising a memory and a processor; the memory stores a computer program, and the processor, coupled to the memory, executes the computer program to perform the steps of the model training method or the anomaly detection method.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the model training method or the anomaly detection method.
Embodiments of the present application further provide a computer program product comprising a computer program/instructions which, when executed by a processor, cause the processor to implement the steps of the model training method or the anomaly detection method.
In the embodiments of the present application, an anomaly detection network is built with a pre-trained language model as its base model and trained in an unsupervised manner, using normal sensor time-series data collected during product manufacturing as normal samples, to obtain an anomaly detection network with time-series reconstruction capability. The trained network has good accuracy and generalization capability and can perform anomaly analysis on time-series data efficiently and accurately. In particular, it can accurately and efficiently detect abnormal time-series data during semiconductor wafer processing, safeguarding the product yield of semiconductor wafers.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a production flow diagram of an exemplary semiconductor wafer processing process;
FIG. 2 is a schematic diagram of an FDC system for product yield analysis;
FIG. 3 is a flowchart of a model training method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an exemplary anomaly detection network provided by an embodiment of the present application;
FIG. 5 is a flowchart of an anomaly detection method according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an anomaly detection system according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an anomaly detection apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone, where A and B may be singular or plural. In the text of the present application, the character "/" generally indicates an "or" relationship between the associated objects. In addition, in the embodiments of the present application, "first", "second", "third", etc. merely distinguish different objects and carry no other special meaning.
Terms used in the embodiments of the present application are described below:
Time series: values of the same statistical indicator arranged in the chronological order of their occurrence.
Sensor time-series data: time-series data collected by a sensor, for example, time-series data collected by a pressure sensor, a temperature sensor, or a flow sensor.
Pre-trained language model (PLM): a model obtained by unsupervised or self-supervised pre-training on large-scale text data, learning general language structure and knowledge. A pre-trained language model captures rich linguistic structure and semantic information, and fine-tuning it for downstream tasks achieves excellent performance across a variety of them. Model structures of pre-trained language models include, for example, but are not limited to: Bidirectional Encoder Representations from Transformers (BERT), autoregressive language models, and the Generative Pre-trained Transformer (GPT). The Transformer module is a neural network structure based on a self-attention mechanism; through parallel processing and self-attention, it significantly improves model performance.
Parameter-Efficient Fine-Tuning (PEFT): a method for efficiently adapting a pre-trained language model to various downstream tasks by training only a small number of its parameters rather than fine-tuning all of them. PEFT fine-tunes only a small amount of the model parameters, significantly reducing computation and storage costs.
Large language model (LLM): a class of natural language processing models with very large parameter scales. Large language models are typically based on deep learning architectures, especially the Transformer architecture, and learn the complex structure of language and rich contextual information by pre-training on massive text data.
FIG. 1 is a production flow diagram of an exemplary semiconductor wafer processing process. Referring to FIG. 1, the production flow involves multiple process steps and multiple production facilities. Broadly, it can be divided into front-end processing, back-end processing, packaging, and testing.
The front-end process stage mainly includes: silicon wafer preparation, substrate preparation, oxidation and film build-up using oxidation and heat-treatment equipment, coating with a lithography machine, pre-baking, exposure with the lithography machine, post-baking, developing, ion implantation with ion implantation equipment, and etching with an etching machine. The front-end stage also uses equipment for inspecting wafers from the preceding process (front-end wafer inspection equipment), process control equipment, and the like.
The back-end process stage mainly includes: cleaning with cleaning equipment, ion doping with sputtering equipment, chemical vapor deposition with a Chemical Vapor Deposition (CVD) device, chemical mechanical polishing with a Chemical Mechanical Polishing (CMP) device, and physical vapor deposition with a Physical Vapor Deposition (PVD) device.
The packaging stage mainly includes: thinning the back side of the semiconductor wafer and then metallizing it (i.e., depositing a metal layer on the back side of the wafer). After packaging, the semiconductor wafer is tested, and wafers that pass the test can be installed in various electronic devices.
In practice, during semiconductor wafer processing, steps such as oxidation build-up, lithography coating, cleaning, ion doping, chemical vapor deposition, chemical mechanical polishing, and physical vapor deposition may be repeated several times using masks.
FIG. 2 is a schematic diagram of product yield analysis using an FDC system. Referring to FIG. 2, the semiconductor wafer processing time-series monitoring scenario is characterized by many sensor types, many sensors, many machines (i.e., pieces of equipment), many channels, many monitored indicators, and so on. For this scenario, high monitoring difficulty and poor transferability are the current pain points.
High monitoring difficulty: massive sensor time-series data is generated during semiconductor wafer processing. This data reflects the stability of the equipment and can, to a certain extent, reflect product yield; analyzing the sensor time series makes it possible to warn of unstable process factors in advance. At present, an FDC (Fault Detection and Classification) system is generally used to collect, process, and segment the sensor time series gathered at the sensor end, after which statistical analysis is applied, such as the mean, standard deviation, minimum, maximum, and slope of each time slice. Given the huge data volume, missed and false judgments arise easily.
Poor transferability: because the wafer processing flow is long and the machines are numerous, it is common that after a few machines achieve the expected effect, performance degrades when the solution is migrated to the remaining machines. Traditional statistical methods cannot adapt quickly to new data distribution patterns, so business expansion is difficult and scale effects are hard to realize.
The principle of the FDC-based anomaly detection method is roughly as follows: based on statistical methods, the different sensor time series are segmented, statistical features such as the mean, standard deviation, and slope are computed within each time slice, distribution statistics are run over batch data, and product quality is monitored by comparison against set thresholds. For example, referring to FIG. 2, the manually configured control parameters (including set thresholds) number more than 13 million, which entails a heavy workload. For any time slice (denoted Step N), the time-series data within the slice is statistically analyzed, for example the temperature time series in the slice (red curve in FIG. 2) and the pressure time series in the slice (green and black curves in FIG. 2), and whether the wafer processing process is abnormal is judged by comparing the statistical results against the corresponding set thresholds.
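To make this baseline concrete, the following sketch computes per-slice statistics and compares them against configured limits. It is an illustration, not the FDC system's actual implementation; the statistic names and the (lower, upper) limit format are assumptions:

```python
import numpy as np

def slice_statistics(x: np.ndarray) -> dict:
    """Per-time-slice statistics of the kind the FDC approach monitors."""
    t = np.arange(len(x))
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "min": float(x.min()),
        "max": float(x.max()),
        "slope": float(np.polyfit(t, x, 1)[0]),  # linear-fit slope of the slice
    }

def fdc_check(x: np.ndarray, limits: dict) -> bool:
    """Return True when every monitored statistic stays inside its
    manually configured (lower, upper) limits for this time slice."""
    stats = slice_statistics(x)
    return all(lo <= stats[name] <= hi for name, (lo, hi) in limits.items())
```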
The FDC-based anomaly detection method has significant drawbacks, specifically:
1) Substantial manual work is required to determine the slice ranges.
2) A large amount of defect data must be collected for model training or threshold tuning; sample accumulation is difficult, and the deployment cycle is long.
3) Statistical features are quite limited and discard many dynamic characteristics of the time series, so threshold-based monitoring fails.
4) Statistical methods have no direct transferability: replicating them across machines requires re-investing labor in monitoring-window setup, statistic analysis, threshold setting, and the like, and substantial optimization work remains after going online.
It follows that statistical analysis of massive sensor time-series data is increasingly difficult and prone to missed and false judgments, making the product yield of semiconductor wafers hard to guarantee.
To this end, in the embodiments of the present application, an anomaly detection network is built with a pre-trained language model as its base model and trained in an unsupervised manner, using normal sensor time-series data collected during product manufacturing as normal samples, to obtain an anomaly detection network with time-series reconstruction capability. The trained network has good accuracy and generalization capability and can perform anomaly analysis on time-series data efficiently and accurately. In particular, it can accurately and efficiently detect abnormal time-series data during semiconductor wafer processing, safeguarding the product yield of semiconductor wafers.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
FIG. 3 is a flowchart of a model training method according to an embodiment of the present application. Referring to FIG. 3, the method may include the following steps:
301. Segment the sample sensor time-series data to obtain an original Token sequence, where the sample sensor time-series data is normal sensor time-series data collected during product manufacturing.
For industrial manufacturing scenarios there are essentially no massive public data sets, so the conditions for large-scale pre-training are absent. In this embodiment, a small number of normal samples (also called qualified samples) are collected and used for unsupervised training, yielding an anomaly detection network capable of detecting anomalies in the time-series data generated during product manufacturing.
In practical applications, the industry to which the anomaly detection network is applied is not limited. For example, it may be used for anomaly detection during semiconductor wafer processing; as another example, for anomaly detection during the manufacturing of urban rail transit vehicles; as yet another example, for anomaly detection during the manufacturing of new-energy vehicles.
For the industry in which the anomaly detection network is applied, time-series data collected by sensors during that industry's product manufacturing is gathered; such data is referred to as sensor time-series data and includes, for example, but is not limited to, temperature, pressure, and flow time series. Normal sensor time-series data is screened out of the sensor time-series data generated during manufacturing. Normal data falls within a preset reasonable range, reflecting that the manufacturing process is fault-free (no anomaly) and hence that product yield meets expectations. Abnormal sensor time-series data is the opposite: it falls outside the preset reasonable range, reflecting that the manufacturing process has a fault (an anomaly exists) and hence that product yield does not meet expectations.
In this embodiment, model training does not depend on abnormal sensor time-series data (abnormal samples); instead, normal sensor time-series data (normal samples) serves as the sample sensor time-series data for training. Especially in high-end manufacturing, normal samples are easy to obtain while abnormal samples are relatively scarce, so training with normal samples improves the practical deployability of the anomaly detection network in industrial settings. In other words, unsupervised training with a small number of normal samples requires no accumulation of large quantities of abnormal samples and accelerates deployment of the anomaly detection network.
In practice, in the model training stage, the collected sample sensor time-series data can be segmented directly. Further optionally, it may be normalized before segmentation. Normalization reduces scale differences between the data, giving it certain distribution characteristics, which speeds up training convergence and improves model performance; any normalization scheme may be used, without limitation. Further optionally, instance normalization may be applied, which balances the data distribution and accelerates training. An exemplary instance normalization is: based on the sensor data at the multiple time instants contained in the sample sensor time-series data, compute the corresponding mean and/or standard deviation, and normalize the sensor data at those time instants accordingly.
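As a minimal sketch of the exemplary instance normalization just described (the array shapes and the epsilon constant are assumptions, not taken from the patent):

```python
import numpy as np

def instance_normalize(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Instance-normalize one sample of sensor time-series data.

    x: array of shape (T, n), i.e. T time instants and n sensor channels.
    Each channel is shifted by its own mean and scaled by its own standard
    deviation, so all channels end up on a comparable scale.
    """
    mean = x.mean(axis=0, keepdims=True)  # per-channel mean
    std = x.std(axis=0, keepdims=True)    # per-channel standard deviation
    return (x - mean) / (std + eps)       # eps avoids division by zero
```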
In practice, the slicing (patching) process cuts time-series data into multiple Tokens, where one Token can be regarded as the time-series data within one time slice (also called a time range or time window), i.e., the data the sensor collected within that slice. In natural language processing, a Token is a word element, the smallest unit of text segmentation, such as a word, punctuation mark, number, or special character. For a pre-trained language model, a token characterizing the time-series data within a time slice can likewise be treated as a Token.
In practice, the step size used for segmentation is not limited. For example, the step size may be a fixed duration, so that each Token covers time-series data of that duration, set flexibly as needed: segmenting a 10-second temperature series with a 2-second step yields 5 Tokens, each holding 2 seconds of temperature data. Alternatively, the step size may be a fixed count, so that each Token contains that many data points: segmenting a temperature series of 160 readings with a step of 16 yields 10 Tokens, each containing 16 temperatures.
In this embodiment, segmenting the sample sensor time-series data cuts it into multiple Tokens that form a Token sequence; for ease of distinction, the Token sequence produced by segmentation is called the original Token sequence. Segmentation also reduces data noise, alleviating problems such as difficult training and poor generalization caused by noisy data.
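A minimal segmentation sketch under the fixed-count step described above (a non-overlapping stride and the dropping of any incomplete trailing patch are assumptions):

```python
import numpy as np

def slice_into_tokens(x: np.ndarray, patch_len: int = 16) -> np.ndarray:
    """Slice a (T, n) sensor series into an original Token sequence.

    Returns an array of shape (L, patch_len, n), where each of the
    L = T // patch_len Tokens holds patch_len consecutive time instants.
    Trailing time steps that do not fill a whole patch are dropped.
    """
    T, n = x.shape
    L = T // patch_len
    return x[: L * patch_len].reshape(L, patch_len, n)

# e.g. 160 temperature readings with patch_len=16 yield 10 Tokens
tokens = slice_into_tokens(np.random.randn(160, 1))
assert tokens.shape == (10, 16, 1)
```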
302. Input the original Token sequence into an anomaly detection network built on a target pre-trained language model, so that the anomaly detection network reconstructs the original Token sequence to obtain a reconstructed Token sequence.
In this embodiment, the anomaly detection network is built on a target pre-trained language model and fine-tuned with a small amount of downstream-task data, giving it the ability to reconstruct and analyze time-series data.
In practice, the network structure of the anomaly detection network is not limited. For example, it may consist of an embedding layer followed by the target pre-trained language model, in which case the model's output still needs a feature-dimension mapping. Alternatively, it may consist of the target pre-trained language model followed by an output layer, in which case the original Token sequence must be embedded before entering the model. Further optionally, to improve generalization and accuracy, and referring to FIG. 4, the network may include an embedding layer, the target pre-trained language model, and an output layer connected in sequence. On that basis, step 302 is implemented as: input the original Token sequence into the embedding layer, which embeds it to produce an embedding vector; feed the embedding vector into the target pre-trained language model, which reconstructs from it an initial reconstructed Token sequence; and feed the initial reconstructed Token sequence into the output layer, which maps its feature dimension to produce the reconstructed Token sequence.
Specifically, the embedding layer performs embedding, converting high-dimensional sparse vectors into dense vectors convenient for downstream processing. Embedding includes, for example, but is not limited to: temporal embedding, positional embedding, and Token embedding. Temporal embedding encodes the acquisition time of the data; positional embedding encodes the ordering of data points in the series so the model can learn their sequential relations; Token embedding converts each Token into a fixed-length vector and can be understood here as a mapping of the Token's feature dimension, which may reduce or raise the dimension, without limitation. For example, Token embedding may use a 1D (one-dimensional) convolution to map the feature dimension of the time-series data from the input dimension n to the model dimension. Through learned convolution-kernel weights, the 1D convolution effectively encodes local features of the time series and converts them into the feature dimension the pre-trained language model requires, adapting the time-series data to the model's input. For instance, if the sample sensor time-series data consists of n channels collected by n sensors (n a positive integer), then each Token obtained by segmentation is likewise n-dimensional sensor data; with a model dimension of 4096, the 1D convolution maps each Token from input dimension n to model dimension 4096.
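A sketch of such a Token embedding in PyTorch (the framework, kernel size, and padding are assumptions; only the n-to-model-dimension 1D-convolution mapping comes from the text):

```python
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    """Map each Token from input dimension n to the model dimension
    (e.g. 4096) with a 1D convolution over the Token axis."""

    def __init__(self, n_features: int, d_model: int = 4096):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=n_features, out_channels=d_model,
                              kernel_size=3, padding=1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, L, n) -> (batch, n, L) for Conv1d -> (batch, L, d_model)
        return self.conv(tokens.transpose(1, 2)).transpose(1, 2)
```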
In practice, the embedding scheme of the embedding layer is not limited. For example, the embedding layer may apply only Token embedding to the original Token sequence, and the resulting Token embedding vector serves as the embedding vector. Alternatively, it may apply temporal embedding and Token embedding and derive the embedding vector from those two vectors. Alternatively, it may apply temporal embedding, positional embedding, and Token embedding and derive the embedding vector from all three.
Note that the temporal, positional, and Token embedding vectors may be combined by various operations such as vector addition, subtraction, or concatenation to obtain the embedding vector output by the embedding layer, without limitation.
Notably, an embedding vector built from the temporal, positional, and Token embeddings fuses timing, position, and Token semantic information, enabling the anomaly detection network to better understand and process the time-series data and effectively improving its accuracy.
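As an illustration of one fusion option named above, the sketch below builds a standard sinusoidal positional table and fuses the embeddings by element-wise addition (the sinusoidal form is an assumption; the patent does not fix the positional encoding):

```python
import math
import torch

def sinusoidal_positions(L: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal positional-embedding table of shape (L, d_model)."""
    pos = torch.arange(L, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(L, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

# Element-wise addition, one of the combination operations named above:
# embedding = token_emb + sinusoidal_positions(L, d_model) + temporal_emb
```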
In this embodiment, the original Token sequence is input to the embedding layer of the anomaly detection network, which outputs an embedding vector. The embedding vector is input into the target pre-trained language model of the network, which performs reconstruction based on it to obtain an initial reconstructed Token sequence.
In practice, the target pre-trained language model may be any existing complete pre-trained language model. Considering the enormous parameter count of a complete model, and to balance computational efficiency with generalization capability, several network modules may instead be selected from the complete model to form the target pre-trained language model; the target model is then an incomplete counterpart of the complete model. Different pre-trained language models comprise network modules of different structures. For example, from an existing complete pre-trained language model with 70 billion parameters, a number of network modules (e.g., 6) are selected to form the target model. A network module may be, for example, a Transformer-based block containing core components including, but not limited to, a self-attention mechanism module, layer normalization (LayerNorm), and a feed-forward neural network. A complete pre-trained language model contains many network modules, and selecting several consecutively connected ones yields the target pre-trained language model. With fewer network modules than the complete model, the target model is a pre-trained language model of smaller parameter scale; its performance differs little from that of the complete model, but its parameter count is greatly reduced, balancing computational efficiency and generalization capability.
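A minimal sketch of forming the target model from a complete one, under the assumption (true of most Transformer implementations) that the complete model exposes its stacked blocks as an nn.ModuleList:

```python
import torch.nn as nn

def build_target_model(blocks: nn.ModuleList, k: int = 6) -> nn.Sequential:
    """Take the first k consecutively connected network modules of the
    complete pre-trained language model as the target model."""
    return nn.Sequential(*list(blocks)[:k])
```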
In this embodiment, the target pre-trained language model has time-series reconstruction capability: from the embedding vector of the input original Token sequence it reconstructs the sequence, and the result is called the initial reconstructed Token sequence. Because the feature dimension of the initial reconstructed Token sequence differs from that of the original Token sequence, the initial reconstructed Token sequence is further fed into the output layer of the anomaly detection network, which maps its feature dimension to produce the reconstructed Token sequence, whose feature dimension is identical to that of the input original Token sequence.
303. Adjust the network parameters of the anomaly detection network according to the reconstruction error between the original Token sequence and the reconstructed Token sequence.
In this embodiment, any loss function may be used to compute the reconstruction error between the original Token sequence and the reconstructed Token sequence, including, for example, but not limited to: negative log-likelihood loss (NLL Loss), cross-entropy loss, and reconstruction-error loss.
During training, the network parameters of the anomaly detection network are adjusted with the aim of minimizing the reconstruction error. In practice, training proceeds iteratively: after each round, the network with adjusted parameters becomes the network to be trained next, and new normal samples drive the next round, until the end-of-training condition is met; the network from the final round is the final anomaly detection network. The condition may be that the number of training rounds reaches a specified count, or that the reconstruction error falls below a set threshold, without particular limitation.
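A minimal unsupervised training-loop sketch (PyTorch, and mean squared error as the reconstruction-error loss, are assumptions; the patent allows any loss function):

```python
import torch
import torch.nn as nn

def train_epoch(net: nn.Module, loader, optimizer) -> float:
    """One unsupervised epoch over normal samples only: reconstruct each
    original Token sequence and minimize the reconstruction error."""
    criterion = nn.MSELoss()
    total = 0.0
    for original_tokens in loader:            # batches of normal samples
        reconstructed = net(original_tokens)  # reconstructed Token sequence
        loss = criterion(reconstructed, original_tokens)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)
```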
In practice, the network parameters to adjust can be chosen flexibly as needed, without limitation. Further optionally, they may be adjusted in a parameter-efficient fine-tuning manner, i.e., only a small number of the network's parameters are tuned. Optionally, under parameter-efficient fine-tuning, LoRA (Low-Rank Adaptation) training may be applied to the self-attention module in the target pre-trained language model; LoRA is a lightweight fine-tuning method that effectively and efficiently adapts a pre-trained language model to downstream tasks. The network parameters of the embedding layer and/or output layer may also be adjusted during parameter-efficient fine-tuning, without limitation.
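A compact LoRA sketch under common conventions (the rank, scaling factor, and initialization are assumed hyperparameters, not taken from the patent): the pre-trained projection weight is frozen and only a low-rank additive update is trained, as is later done for Wq and Wk in the FIG. 4 example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear projection (e.g. Wq or Wk of a self-attention
    module) with a trainable low-rank update: W x + scale * (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze pre-trained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B is zero-initialized, so the update starts as a no-op
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```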
For a better understanding of the model training process of anomaly detection networks, the following description is presented in connection with FIG. 4.
First, normal samples are prepared; a normal sample comprises sample sensor time-series data, i.e., normal time-series data collected by sensors during product manufacturing. The sample may include normal series from one or more sensors: a feature dimension of 3, for example, means the sample includes normal series from each of 3 sensors. In the sample shown in FIG. 4, the data includes pressure series collected by a pressure sensor, temperature series collected by a temperature sensor, and torque series collected by a torque sensor, i.e., the feature dimension of the sample is 3.
Then, instance normalization is applied to the sample sensor time-series data, and the normalized data is segmented to obtain the original Token sequence. For example, with a step size of 16, each Token in the original Token sequence contains data collected by the sensors at 16 time instants.
Next, the original Token sequence is input into the embedding layer of the anomaly detection network, which applies temporal, positional, and Token embedding and adds or concatenates the three resulting vectors to obtain the embedding vector. The embedding vector is input into the target pre-trained language model, which outputs an initial reconstructed Token sequence; the output layer of the network then maps its feature dimension to obtain the reconstructed Token sequence, whose feature dimension is identical to that of the original Token sequence.
The target pre-trained language model shown in FIG. 4 consists of 6 network modules from the complete pre-trained language model. Each module may be, for example, a Transformer-based block containing core components including, but not limited to, a self-attention mechanism module, layer normalization (LayerNorm), and a feed-forward neural network.
The self-attention mechanism module produces its final output through the interaction of Query (Q), Key (K), and Value (V). Specifically, each Token of the input data in the hidden states is transformed by a weight matrix Wq into a corresponding query vector, by a weight matrix Wk into a corresponding key vector, and by a weight matrix Wv into a corresponding value vector; the module's output is determined from the query, key, and value vectors of every Token. For more details on the self-attention mechanism, see the related art; they are not repeated here.
Layer normalization: normalizes intermediate results to stabilize the training process.
Feed-forward neural network: typically consists of two linear transformations and an activation function; it applies a nonlinear transformation to the input to better capture complex patterns.
The basic flow of the network module shown in FIG. 4 is roughly: the input data passes through the self-attention module; its output and the input data are fused, e.g., by addition (Add), and the sum is normalized by layer normalization. The normalized result then passes through the feed-forward neural network, whose output is added to the normalized result to form a new sum; layer normalization of this new sum yields the output of the network module.
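A compact sketch of this network module, written as a post-norm Transformer block matching the add-then-normalize order just described (the head count and feed-forward width are assumptions):

```python
import torch
import torch.nn as nn

class NetworkModule(nn.Module):
    """Self-attention -> add & layer-norm -> feed-forward -> add & layer-norm,
    matching the processing order described above."""

    def __init__(self, d_model: int = 4096, n_heads: int = 8, d_ff: int = 16384):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)    # Q, K, V all derived from the input
        x = self.norm1(x + attn_out)        # add, then layer-normalize
        return self.norm2(x + self.ffn(x))  # feed-forward, add, normalize
```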
In the model training stage, the network parameters of the anomaly detection network are adjusted in the parameter-efficient fine-tuning manner according to the reconstruction error between the original and reconstructed Token sequences. Referring to FIG. 4, for example, the parameters of the embedding layer and the output layer are adjusted, along with part of the parameters of the target pre-trained language model. Specifically, the feed-forward network parameters are frozen, the layer-normalization parameters are frozen, the weight matrices of the self-attention module are frozen, and only the low-rank matrices added to the weight matrices Wq and Wk in the self-attention module are fine-tuned. Fine-tuning the low-rank matrices added to Wq and Wk is precisely LoRA training of the self-attention module.
Notably, the embodiments of the present application achieve end-to-end model development: essentially no hand-crafted features and no accumulation of large numbers of abnormal samples are needed (abnormal samples are rare in high-end manufacturing). Training with a small number of normal samples suffices for industrial time-series analysis, accelerating the adoption of pre-trained language models in industries such as semiconductors.
In addition, since training is unsupervised and uses only normal samples, dependence on abnormal samples is reduced, and the anomaly detection network learns to understand, encode, and reconstruct the distribution of normal samples.
Because the initialization weights provided by the pre-trained language model generalize well, an anomaly detection network built on it needs relatively little training data: compared with the data required to train an anomaly detection model built on a traditional small model, the amount can be reduced by 90%. Traditional small models include, for example, but are not limited to, PatchTST and TimesNet. PatchTST is a patch-based Transformer time-series model designed to capture complex temporal patterns in long sequences. TimesNet is an efficient time-series model that combines the strengths of convolutional neural networks and gated recurrent units, aiming to improve prediction accuracy while preserving model simplicity and computational efficiency.
In addition, the strong generalization of the pre-trained language model means that once the network is deployed on one device, migrating to a new device requires only a small amount of fine-tuning data to match the effect that a small model would achieve only with extensive fine-tuning, greatly improving migration efficiency. The dynamic time-series features extracted by the pre-trained language model also characterize the temporal process better than hand-crafted features, yielding better algorithmic results.
According to the technical solution provided by the embodiments of the present application, an anomaly detection network is built with a pre-trained language model as its base model and trained in an unsupervised manner, using normal sensor time-series data collected during product manufacturing as normal samples, to obtain an anomaly detection network with time-series reconstruction capability. The trained network has good accuracy and generalization capability and can perform anomaly analysis on time-series data efficiently and accurately. In particular, it can accurately and efficiently detect abnormal time-series data during semiconductor wafer processing, safeguarding the product yield of semiconductor wafers.
After training, the anomaly detection network can be applied to anomaly detection on industrial time-series data; the industry of application is not limited. In particular, when applied to anomaly detection in semiconductor wafer processing, it achieves a defect recall rate that meets expectations and is being rolled out at scale in semiconductor customers' wafer processing.
FIG. 5 is a flowchart of an anomaly detection method according to an embodiment of the present application. Referring to FIG. 5, the method may include the following steps:
501. Acquire the to-be-detected sensor time-series data collected by sensors during product manufacturing.
502. Segment the to-be-detected sensor time-series data to obtain an original Token sequence.
503. Input the original Token sequence into the anomaly detection network, which reconstructs it to obtain a reconstructed Token sequence, and determine an anomaly score from the original Token sequence and the reconstructed Token sequence.
504. If the anomaly score is greater than the preset score threshold, determine that the to-be-detected sensor time-series data is abnormal.
In practice, when the anomaly detection network is applied to analyze sensor time-series data, the to-be-detected sensor time-series data collected during product manufacturing is acquired first; this is the sensor time-series data on which anomaly detection is to be performed. It may include data collected by one or more sensors, for example temperature series from a temperature sensor, pressure series from a pressure sensor, and flow series from a flow sensor.
Then, the to-be-detected sensor time-series data is segmented to obtain an original Token sequence. In practice, it may be segmented directly, or normalized before segmentation, preferably by instance normalization; see the description of normalization in the foregoing embodiments, which is not repeated here.
Finally, the original Token sequence obtained by segmenting the to-be-detected sensor time-series data is input into the anomaly detection network, which reconstructs it to obtain a reconstructed Token sequence, and an anomaly score is determined from the original and reconstructed Token sequences. The anomaly score characterizes the probability that the to-be-detected data is abnormal and is proportional to that probability: the higher the score, the more likely the data is abnormal.
In this embodiment, the way the anomaly detection network determines the anomaly score from the original and reconstructed Token sequences is not limited. For example, the difference between the two sequences may serve directly as the anomaly score; or the difference plus a fixed value may serve as the score; or the score may be computed according to equation (1).
SCORE = ‖X − Y‖² / L (1)
where SCORE denotes the anomaly score, X the original Token sequence, and Y the reconstructed Token sequence; L is the sequence length, i.e., the number of Tokens in the Token sequence, and L is a positive integer.
For the input time-series data X = (x1, x2, x3, …, xL), each xt = (d1, d2, d3, …, dn) is one of the L Tokens, and every Token contains the data collected by each of the n sensors (n is a positive integer): for example, d1 is temperature data collected by a temperature sensor, d2 is pressure data collected by a pressure sensor, d3 is flow data collected by a flow sensor, …, and dn is torque data collected by a torque sensor, and so on.
For the reconstructed time-series data Y = (y1, y2, y3, …, yL), each yt = (m1, m2, m3, …, mn) is one of the L Tokens, and every Token contains the reconstructed data for each of the n sensors: for example, m1 is reconstructed temperature data, m2 is reconstructed pressure data, m3 is reconstructed flow data, …, and mn is reconstructed torque data, and so on.
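A sketch of equation (1), read as the mean squared reconstruction error over the L Tokens, together with the threshold comparison of step 504 (the threshold value shown is illustrative, not from the patent):

```python
import numpy as np

def anomaly_score(x: np.ndarray, y: np.ndarray) -> float:
    """SCORE = ||X - Y||^2 / L for the original sequence X and its
    reconstruction Y, both of shape (L, n)."""
    L = x.shape[0]
    return float(np.sum((x - y) ** 2) / L)

PRESET_SCORE_THRESHOLD = 0.5  # illustrative value; tuned per deployment

def is_abnormal(x: np.ndarray, y: np.ndarray,
                threshold: float = PRESET_SCORE_THRESHOLD) -> bool:
    """Flag the to-be-detected series as abnormal when the anomaly score
    exceeds the preset score threshold."""
    return anomaly_score(x, y) > threshold
```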
In some optional embodiments, the anomaly score is compared with a preset score threshold. If the score exceeds the threshold, the to-be-detected sensor time-series data is abnormal, and alarm information can be output to flag the abnormal manufacturing process so that personnel can promptly inspect the quality of the product being processed and overhaul the equipment, heading off yield problems in advance. If the score is less than or equal to the threshold, the data is normal, the manufacturing process is normal, and good product yield can be assured.
In practice, the preset score threshold can be set flexibly as needed. Further optionally, the trained anomaly detection network can be evaluated on a test set and the threshold optimized according to the test results; an optimized threshold gives more accurate early warning of anomalies in the product manufacturing process.
In practice, the test set may contain both normal and abnormal samples. When testing with a normal sample: if its anomaly score exceeds the current threshold, the normal sample is misidentified as abnormal; if the score is at or below the threshold, it is correctly identified as normal. When testing with an abnormal sample: if its score exceeds the threshold, it is correctly identified as abnormal; if not, it is misidentified as normal. Test results are tallied from these outcomes and include precision and/or recall. Precision reflects, among the samples predicted to be normal, the proportion that are actually normal; recall reflects, among the actually normal samples, the proportion correctly predicted to be normal.
In practical applications, the preset score threshold may be optimized with the goal of making the precision meet the requirement, with the goal of making the recall meet the requirement, or with the goal of balancing the precision and the recall, which is not limited herein.
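As a sketch of this threshold optimization, assuming the balance between precision and recall is measured with the F1 score (an assumption; the embodiment leaves the trade-off open) and following the convention above that "normal" is the positive class:

```python
import numpy as np

def tune_threshold(scores, is_normal, candidates):
    """Pick a score threshold from test-set results.

    scores:     anomaly scores produced by the trained anomaly detection network
    is_normal:  ground-truth labels (True = normal sample)
    candidates: threshold values to try

    A sample is predicted *normal* when its score is <= the threshold.
    F1-based balancing is an assumption for illustration.
    """
    best = None
    for t in candidates:
        pred_normal = scores <= t
        tp = np.sum(pred_normal & is_normal)     # normals correctly kept
        fp = np.sum(pred_normal & ~is_normal)    # anomalies missed
        fn = np.sum(~pred_normal & is_normal)    # normals wrongly flagged
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if best is None or f1 > best[0]:
            best = (f1, t, precision, recall)
    return best  # (f1, threshold, precision, recall)
```

The candidate thresholds may be taken, for example, from the quantiles of the anomaly scores observed on the test set.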
According to the technical solution provided by the embodiments of the present application, the anomaly detection network built on a pre-trained language model as its base model analyzes industrial time sequence data efficiently and accurately, which can effectively safeguard product yield in industrial manufacturing and puts the pre-trained language model into practical use in industrial scenarios.
Fig. 6 is a schematic structural diagram of an abnormality detection system according to an embodiment of the present application. Referring to fig. 6, the abnormality detection system may include: an internet of things system 10 and an intelligent manufacturing platform 20 deployed in a semiconductor production factory.
The internet of things system 10 is used for controlling sensors to acquire data during semiconductor wafer processing to obtain the sensor time sequence data to be detected, and for sending the sensor time sequence data to be detected to the intelligent manufacturing platform;
The intelligent manufacturing platform 20 is used for performing segmentation processing on the sensor time sequence data to be detected to obtain an original Token sequence; inputting the original Token sequence into an anomaly detection network, carrying out reconstruction processing on the original Token sequence by the anomaly detection network to obtain a reconstructed Token sequence, determining an anomaly score according to the original Token sequence and the reconstructed Token sequence, and determining that the time sequence data of the sensor to be detected is anomalous if the anomaly score is greater than a preset score threshold value.
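For illustration, the segmentation processing may be sketched as follows, where one Token is the n-sensor reading at a single time step (per the definition of X above); the window length L and the stride are hypothetical parameters, not fixed by this embodiment.

```python
import numpy as np

def segment_into_windows(stream, L, stride):
    """Segment a continuous sensor stream of shape (T, n) into original
    Token sequences of L Tokens each. L and stride are assumptions."""
    return [stream[i:i + L] for i in range(0, len(stream) - L + 1, stride)]

# Example: a stream of 10 time steps from 3 sensors, cut into windows of 4 Tokens
stream = np.random.rand(10, 3)
windows = segment_into_windows(stream, L=4, stride=2)
print(len(windows), windows[0].shape)  # 4 windows of shape (4, 3)
```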
For the training process and the inference process of the anomaly detection network, reference can be made to the related description of the foregoing embodiments, which is not repeated here.
In practical application, the internet of things system 10 may acquire sensor time sequence data acquired by various sensors such as a temperature sensor, a pressure sensor, a flow sensor, a torque sensor, a humidity sensor, and an air pressure sensor, and send the sensor time sequence data acquired by the various sensors to the intelligent manufacturing platform 20 as sensor time sequence data to be detected for processing.
The intelligent manufacturing platform 20 connects artificial intelligence and big data technologies into traditional production lines, helping manufacturing enterprises coordinate data flow, production flow, and control flow, improve production efficiency, reduce production cost, and achieve autonomous and controllable intelligent manufacturing. The intelligent manufacturing platform 20 may be composed of software and/or hardware, and includes, for example but not limited to: a distributed server, a regular server, a cloud server, a virtual server, an edge server, a high performance computing (HPC) server cluster, and the like.
The intelligent manufacturing platform 20 calls the anomaly detection network to perform anomaly detection on the sensor time sequence data to be detected, obtaining an anomaly detection result. The anomaly detection result includes, for example but not limited to: the anomaly score, the comparison result between the anomaly score and the preset score threshold, alarm information reflecting that the anomaly score is greater than the preset score threshold, and the like. The intelligent manufacturing platform 20 sends the anomaly detection result to a monitoring terminal 30, so that relevant personnel at the monitoring terminal 30 can learn the anomaly detection result in time and control the semiconductor wafer processing process based on it. The monitoring terminal 30 includes, for example but not limited to: a mobile phone, a desktop computer, a tablet computer, a wearable device, or a vehicle-mounted terminal.
In the abnormality detection system provided by the embodiment of the present application, the sensor time sequence data to be detected is collected by the internet of things system deployed in the semiconductor production factory, and the intelligent manufacturing platform calls the anomaly detection network to perform anomaly detection on it. Because the anomaly detection network is built on a pre-trained language model as its base model, the time sequence data of the semiconductor wafer processing process is analyzed efficiently and accurately, which can effectively safeguard product yield during semiconductor wafer processing and puts the pre-trained language model into practical use in the semiconductor wafer processing scenario.
Fig. 7 is a schematic structural diagram of a model training device according to an embodiment of the present application. Referring to fig. 7, the apparatus may include:
The slicing module 71 is configured to slice the sample sensor time sequence data to obtain an original Token sequence, where the sample sensor time sequence data is normal sensor time sequence data collected in a product manufacturing process;
The training module 72 is configured to input the original Token sequence into an anomaly detection network constructed based on the target pre-training language model, so that the anomaly detection network performs reconstruction processing on the original Token sequence to obtain a reconstructed Token sequence; and adjusting network parameters of the anomaly detection network according to the reconstruction errors between the original Token sequence and the reconstructed Token sequence.
Further optionally, the training module 72 is specifically configured to, when performing the reconstruction process: inputting the original Token sequence into an embedding layer of an anomaly detection network, so that the embedding layer performs embedding processing on the original Token sequence to obtain an embedded vector; inputting the embedded vector into a target pre-training language model of an anomaly detection network, and carrying out reconstruction processing by the target pre-training language model based on the embedded vector to obtain an initial reconstructed Token sequence; and inputting the initial reconstructed Token sequence into an output layer of the anomaly detection network, and carrying out mapping processing on characteristic dimensions of the initial reconstructed Token sequence by the output layer to obtain the reconstructed Token sequence.
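For illustration, this three-stage reconstruction pipeline (embedding layer, target pre-trained language model, output layer) can be sketched in PyTorch as follows. This is a minimal sketch, not the claimed implementation: a small TransformerEncoder merely stands in for the target pre-trained language model so the code runs stand-alone, the embedding layer is reduced to a single linear projection (the time/position/Token embedding composition is sketched separately below), and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AnomalyDetectionNet(nn.Module):
    """Embedding layer -> pre-trained backbone (stand-in) -> output layer."""

    def __init__(self, n_sensors: int, d_model: int = 128):
        super().__init__()
        # Embedding layer: one Token = the n-sensor reading at one time step
        self.embed = nn.Linear(n_sensors, d_model)
        # Stand-in for the target pre-trained language model (assumption)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Output layer: map the feature dimension back to the Token size
        self.head = nn.Linear(d_model, n_sensors)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, L, n_sensors) original Token sequence
        h = self.embed(tokens)      # embedded vector
        h = self.backbone(h)        # initial reconstructed Token sequence
        return self.head(h)         # reconstructed Token sequence
```

Training as described above then adjusts the network parameters against the reconstruction error, for example `loss = torch.nn.functional.mse_loss(net(tokens), tokens)` followed by a backward pass and an optimizer step.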
Further optionally, the apparatus further includes: and the normalization processing module is used for carrying out example normalization processing on the time sequence data of the sample sensor.
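For illustration, the example (instance) normalization may be sketched as follows, assuming per-sensor-channel statistics computed over the single sample only; the epsilon and the per-channel choice are assumptions.

```python
import numpy as np

def instance_normalize(x, eps=1e-5):
    """Instance-normalize one sample of sensor time sequence data.

    x: shape (L, n) -- L time steps, n sensors. Each sensor channel is
    normalized with this sample's own statistics, removing per-run
    offset and scale drift before slicing into Tokens (sketch only).
    """
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps)
```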
Further optionally, the training module 72 is specifically configured to, when performing the embedding process: performing time embedding processing on the original Token sequence by an embedding layer to obtain a time embedded vector; performing position embedding processing on the original Token sequence by an embedding layer to obtain a position embedding vector; performing Token embedding processing on the original Token sequence by an embedding layer to obtain a Token embedded vector; and obtaining an embedded vector according to the time embedded vector, the position embedded vector and the Token embedded vector.
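For illustration, the composition of the three embeddings may be sketched as follows, assuming they are combined by summation; the summation, the form of the time features (e.g., hour-of-day encodings per Token), and all dimensions are assumptions, as the embodiment only specifies that the embedded vector is obtained from the three embeddings.

```python
import torch
import torch.nn as nn

class SequenceEmbedding(nn.Module):
    """Sum of Token, position, and time embeddings for the original Token sequence."""

    def __init__(self, n_sensors: int, max_len: int, n_time_feats: int, d_model: int = 128):
        super().__init__()
        self.token_embed = nn.Linear(n_sensors, d_model)     # Token embedding
        self.pos_embed = nn.Embedding(max_len, d_model)      # position embedding
        self.time_embed = nn.Linear(n_time_feats, d_model)   # time embedding

    def forward(self, tokens, time_feats):
        # tokens: (batch, L, n_sensors); time_feats: (batch, L, n_time_feats)
        positions = torch.arange(tokens.size(1), device=tokens.device)
        return (self.token_embed(tokens)
                + self.pos_embed(positions)     # (L, d_model), broadcast over batch
                + self.time_embed(time_feats))
```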
Further optionally, when the training module 72 adjusts the network parameters of the anomaly detection network in a parameter efficient fine tuning manner, the training module is specifically configured to: and performing low-rank adaptation LoRA training on the self-attention mechanism module in the target pre-training language model.
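For illustration, low-rank adaptation (LoRA) of one projection inside the self-attention mechanism module may be sketched as follows; wrapping, for example, the query and value projections is a common choice, and the rank and scaling used here are assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adaptation of a frozen linear layer.

    The wrapped layer stands in for a projection inside the target
    pre-trained language model; only A and B are trainable, so
    fine-tuning touches a small fraction of the parameters.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False    # freeze the pre-trained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # W x + (B A) x * scale; B starts at zero, so the update starts at zero
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```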
Further optionally, the target pre-training language model is composed of a part of the network modules in the complete pre-training language model.
The apparatus shown in fig. 7 may perform the method shown in the embodiment shown in fig. 3, and its implementation principle and technical effects will not be repeated.
Fig. 8 is a schematic structural diagram of an abnormality detection apparatus according to an embodiment of the present application. Referring to fig. 8, the apparatus may include:
An acquisition module 81, configured to obtain the sensor time sequence data to be detected that is collected by a sensor during the product manufacturing process;
The segmentation module 82 is used for carrying out segmentation processing on the time sequence data of the sensor to be detected to obtain an original Token sequence;
The detection module 83 is configured to input the original Token sequence into the anomaly detection network, perform reconstruction processing on the original Token sequence by the anomaly detection network to obtain a reconstructed Token sequence, determine an anomaly score according to the original Token sequence and the reconstructed Token sequence, and determine that the time sequence data of the sensor to be detected is anomalous if the anomaly score is greater than a preset score threshold.
Further optionally, the detection module 83 is further configured to: output alarm information if the anomaly score is greater than the preset score threshold.
Further optionally, the apparatus further includes: the optimizing module is used for testing the trained anomaly detection network by using the testing set to obtain a testing result; optimizing a preset scoring threshold according to the test result.
Further alternatively, the anomaly detection network is applied to anomaly detection during semiconductor wafer processing.
The apparatus shown in fig. 8 may perform the method shown in the embodiment shown in fig. 5, and its implementation principle and technical effects will not be repeated.
It should be noted that, the execution subjects of each step of the method provided in the above embodiment may be the same device, or the method may also be executed by different devices. For example, the execution subject of steps 301 to 303 may be device a; for another example, the execution subject of steps 301 and 302 may be device a, and the execution subject of step 303 may be device B; etc.
In addition, some of the flows described in the above embodiments and drawings include a plurality of operations appearing in a specific order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or in parallel. Sequence numbers such as 301 and 302 are merely used to distinguish different operations and do not by themselves represent any order of execution. The flows may also include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that the terms "first", "second", and the like herein are used to distinguish different messages, devices, modules, etc.; they do not represent an order, nor do they require "first" and "second" to be of different types.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device includes: a memory 91 and a processor 92;
The memory 91 is used to store a computer program and may be configured to store various other data to support operations on the computing platform. Examples of such data include instructions for any application or method operating on the computing platform, contact data, phonebook data, messages, pictures, videos, and the like.
The memory 91 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as a static random-access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.
The processor 92 is coupled to the memory 91 and is configured to execute the computer program in the memory 91 to perform the model training method or the abnormality detection method described above.
Further optionally, as shown in fig. 9, the electronic device further includes: a communication component 93, a display 94, a power component 95, an audio component 96, and other components. Fig. 9 schematically shows only some of the components, which does not mean that the electronic device includes only the components shown in fig. 9. The components within the dashed box in fig. 9 are optional components rather than mandatory ones, depending on the product form of the electronic device. The electronic device of this embodiment may be implemented as a terminal device such as a desktop computer, a notebook computer, a smart phone, or an IoT (Internet of Things) device, or may be a server device such as a conventional server, a cloud server, or a server array. If the electronic device of this embodiment is implemented as a terminal device such as a desktop computer, a notebook computer, or a smart phone, it may include the components within the dashed box in fig. 9; if it is implemented as a server device such as a conventional server, a cloud server, or a server array, the components within the dashed box in fig. 9 may be omitted.
The detailed implementation process of each action performed by the processor may refer to the related description in the foregoing method embodiment or the apparatus embodiment, and will not be repeated herein.
Accordingly, the present application also provides a computer readable storage medium storing a computer program, where the computer program is executed to implement the steps executable by the electronic device in the above method embodiments.
Accordingly, embodiments of the present application also provide a computer program product comprising a computer program/instructions which, when executed by a processor, cause the processor to carry out the steps of the above-described method embodiments that are executable by an electronic device.
The communication component is configured to facilitate wired or wireless communication between the device where it is located and other devices. The device where the communication component is located may access a wireless network based on a communication standard, such as WiFi, a 2G, 3G, 4G/LTE, or 5G mobile communication network, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The display includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation.
The power supply component provides power for various components of equipment where the power supply component is located. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the devices in which the power components are located.
The audio component described above may be configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may be further stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and a memory.
The memory may include a non-volatile memory in a computer-readable medium, a random access memory (RAM), and/or a non-volatile memory, such as a read-only memory (ROM) or a flash RAM. The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of storage media for a computer include, but are not limited to, a phase change memory (PRAM), a static random-access memory (SRAM), a dynamic random-access memory (DRAM), other types of random-access memory (RAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory or other memory technology, a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD) or other optical storage, a magnetic cassette, magnetic tape or disk storage or other magnetic storage device, or any other non-transmission medium, which can be used to store information accessible by the computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (14)

1. A method of model training, comprising:
The method comprises the steps of segmenting sample sensor time sequence data to obtain an original Token sequence, wherein the sample sensor time sequence data are normal sensor time sequence data acquired in the manufacturing process of a product, and the Token refers to time sequence data in one time slice;
Inputting the original Token sequence into an anomaly detection network constructed based on a target pre-training language model, and carrying out reconstruction processing on the original Token sequence by the anomaly detection network to obtain a reconstructed Token sequence;
According to the reconstruction error between the original Token sequence and the reconstructed Token sequence, adjusting network parameters of the anomaly detection network;
the anomaly detection network comprises an embedding layer, the target pre-training language model, and an output layer which are sequentially connected; the model structure of the target pre-training language model is any one of the following: a bidirectional encoder representation from Transformers, an autoregressive language model, and a generative pre-training language model;
inputting the original Token sequence into an anomaly detection network constructed based on a target pre-training language model, so that the anomaly detection network carries out reconstruction processing on the original Token sequence to obtain a reconstructed Token sequence, wherein the method comprises the following steps of:
Inputting the original Token sequence into the embedding layer, so that the embedding layer performs embedding processing on the original Token sequence to obtain an embedded vector;
inputting the embedded vector into the target pre-training language model to obtain an initial reconstructed Token sequence by reconstruction processing of the target pre-training language model based on the embedded vector;
And inputting the initial reconstructed Token sequence into the output layer, and carrying out mapping processing on the characteristic dimension of the initial reconstructed Token sequence by the output layer to obtain the reconstructed Token sequence.
2. The method of claim 1, wherein the embedding the original Token sequence by the embedding layer results in an embedded vector, comprising:
Performing time embedding processing on the original Token sequence by the embedding layer to obtain a time embedded vector;
Performing position embedding processing on the original Token sequence by the embedding layer to obtain a position embedding vector;
performing Token embedding processing on the original Token sequence by the embedding layer to obtain a Token embedded vector;
and obtaining the embedded vector according to the time embedded vector, the position embedded vector and the Token embedded vector.
3. The method of claim 1, further comprising, prior to slicing the sample sensor timing data:
And carrying out example normalization processing on the time sequence data of the sample sensor.
4. A method according to any one of claims 1-3, wherein adjusting network parameters of the anomaly detection network comprises:
And adjusting the network parameters of the anomaly detection network in a parameter-efficient fine-tuning manner.
5. The method of claim 4, wherein adjusting the network parameters of the anomaly detection network in a parameter-efficient fine-tuning manner comprises:
and performing low-rank adaptation LoRA training on the self-attention mechanism module in the target pre-training language model.
6. A method according to any of claims 1-3, wherein the target pre-training language model consists of part of the network modules in the complete pre-training language model.
7. An abnormality detection method, comprising:
Acquiring time sequence data of a sensor to be detected, which is acquired by the sensor in the manufacturing process of the product;
Performing segmentation processing on the time sequence data of the sensor to be detected to obtain an original Token sequence;
Inputting the original Token sequence into an anomaly detection network, carrying out reconstruction processing on the original Token sequence by the anomaly detection network to obtain a reconstructed Token sequence, and determining an anomaly score according to the original Token sequence and the reconstructed Token sequence;
if the anomaly score is greater than a preset score threshold, determining that the time sequence data of the sensor to be detected is abnormal;
wherein the anomaly detection network is trained in accordance with the method of any one of claims 1-6.
8. The method as recited in claim 7, further comprising:
and if the anomaly score is greater than the preset score threshold, outputting alarm information.
9. The method as recited in claim 8, further comprising:
testing the trained anomaly detection network by using a test set to obtain a test result;
and optimizing the preset scoring threshold according to the test result.
10. The method according to any one of claims 7-9, wherein the anomaly detection network is applied for anomaly detection during semiconductor wafer processing.
11. An abnormality detection system, characterized in that the abnormality detection system includes: an internet of things system and an intelligent manufacturing platform deployed in a semiconductor production factory;
the system of the Internet of things is used for controlling a sensor to acquire data in the processing process of a semiconductor wafer, obtaining sensor time sequence data to be detected, and sending the sensor time sequence data to be detected to the intelligent manufacturing platform;
The intelligent manufacturing platform is used for carrying out segmentation processing on the time sequence data of the sensor to be detected to obtain an original Token sequence; inputting the original Token sequence into an anomaly detection network, carrying out reconstruction processing on the original Token sequence by the anomaly detection network to obtain a reconstructed Token sequence, determining an anomaly score according to the original Token sequence and the reconstructed Token sequence, and determining that the time sequence data of the sensor to be detected is anomalous if the anomaly score is greater than a preset score threshold value; wherein the anomaly detection network is trained in accordance with the method of any one of claims 1-6.
12. An electronic device, comprising: a memory and a processor; the memory is used for storing a computer program; the processor is coupled to the memory for executing the computer program for performing the steps of the method of any of claims 1-6 or 7-10.
13. A computer readable storage medium storing a computer program, which when executed by a processor causes the processor to carry out the steps of the method of any one of claims 1-6 or 7-10.
14. A computer program product comprising computer programs/instructions which, when executed by a processor, cause the processor to carry out the steps of the method of any of claims 1-6 or 7-10.