
CN109791627B - Semiconductor device modeling for training deep neural networks using input preprocessing and transformation targets - Google Patents


Info

Publication number
CN109791627B
Authority
CN
China
Prior art keywords
neural network
input
drain current
transistor
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201880001122.9A
Other languages
Chinese (zh)
Other versions
CN109791627A (en)
Inventor
雷源
霍晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hong Kong Applied Science and Technology Research Institute ASTRI
Original Assignee
Hong Kong Applied Science and Technology Research Institute ASTRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/011,787 (external priority: US11176447B2)
Application filed by Hong Kong Applied Science and Technology Research Institute ASTRI
Publication of CN109791627A
Application granted
Publication of CN109791627B

Landscapes

  • Insulated Gate Type Field-Effect Transistor (AREA)

Abstract

A semiconductor device modeling system and method based on a deep neural network. Training data is collected from measurements of test transistors, including gate and drain voltages, transistor width and length, and the drain current measured under those input conditions. The training data is transformed by an input preprocessing module, which may take the logarithm of the input data or perform principal component analysis (PCA). When training the deep neural network, the measured drain current is not used as the fitting target; instead, the transformed drain current produced by a target transformation module is used, where the transformation may be, for example, the derivative of the drain current with respect to the gate or drain voltage, or the logarithm of that derivative. The output of the deep neural network for the input data is compared with the transformed drain current to obtain a total error, and the network's weights are adjusted during training to steadily reduce that error until training is complete.

Description

Semiconductor Device Modeling Using Input Preprocessing and Transformed Targets for Training a Deep Neural Network

【Technical Field】

The present invention relates to semiconductor device modeling, and in particular to modeling devices using artificial neural networks.

【Background】

A single integrated circuit (IC) may contain a million transistors. Each transistor is typically a metal-oxide-semiconductor field-effect transistor (MOSFET), or a variant thereof, formed on a semiconductor substrate. During the IC design process, a netlist or schematic is created to detail the connections between these transistors and other components such as capacitors and resistors. The netlist can then be simulated with a circuit simulator, which uses a device model to simulate the operation of each transistor.

The device model estimates the electrical characteristics of a transistor, such as the drain current as a function of the gate and drain voltages. More accurate simulations can use more accurate models to estimate other parameters, such as parasitic capacitances, to better estimate delays and circuit timing.

An important simulator is the Simulation Program with Integrated Circuit Emphasis (SPICE), originally developed at the University of California, Berkeley in 1975. SPICE has since been extended and enhanced and has many variants. The Berkeley Short-channel IGFET Model (BSIM) is another model that is particularly well suited to small transistor devices.

Test circuits, such as transistors on an IC with test pads that can be probed manually, allow device engineers to probe and test these devices to measure current as a function of voltage. Using these test results, device engineers can determine device parameters for use in SPICE or BSIM device models and use those parameters to simulate larger-scale ICs. Although such manual measurements have been replaced by automated measurements, extracting device parameters for SPICE or BSIM models remains very time-consuming.

As device sizes shrink, basic first-order device models fail to accurately estimate the currents of smaller devices. Second-order effects caused by short channel lengths, buried layers, and sub-micron geometries require new parameters and more complex device-modeling equations. More test devices of different sizes and shapes must be added and tested to obtain values for these additional parameters. Automated measurement equipment allows device-model parameters to be extracted more quickly.

As device dimensions continue to shrink, devices with gate lengths of 10 nanometers or less present additional modeling challenges, because device dimensions approach the atomic dimensions of the semiconductor substrate. New semiconductor materials such as gallium nitride (GaN), gallium arsenide (GaAs), and silicon carbide (SiC) are being used, and their physical properties differ from those of silicon. Special devices such as fin field-effect transistors (FinFETs) and silicon-on-insulator (SOI) devices have three-dimensional current flow that cannot be modeled accurately with older two-dimensional current models. Measuring the actual currents of test devices of various sizes and shapes is critical to creating useful device models.

More recently, artificial neural networks (ANNs) have been used to generate device models and select model parameters. ANNs are particularly useful for processing large amounts of data in ways too complex to define with traditional computer programs. Instead of being programmed with instructions, the network is trained: training data is fed into the neural network and compared with the expected output, adjustments are made within the network, and the training data is processed and compared again to produce further adjustments. After many such training cycles, the trained neural network can effectively process data similar to the training data and expected outputs. A neural network is an example of machine learning, since the network learns how to generate the expected outputs for the training data. Actual data similar to the training data can then be fed into the neural network to process live data.

Figure 1 shows a prior-art neural network. Input nodes 102, 104, 106, 108 receive input data I1, I2, I3, ..., I4, while output nodes 103, 105, 107, 109 output the results of the network's operation as output data O1, O2, O3, ..., O4. This neural network contains three intermediate layers. First-layer nodes 110, 112, 114, 116, 118 each take inputs from one or more of input nodes 102, 104, 106, 108, perform some operation such as addition, subtraction, multiplication, or a more complex operation, and send their outputs to the second-layer nodes. Second-layer nodes 120, 122, 124, 126, 128, 129 likewise receive multiple inputs, combine them to generate outputs, and send those outputs to third-layer nodes 132, 134, 136, 138, 139, which similarly combine inputs and generate outputs.

The inputs to each layer are typically weighted, so a weighted sum (or other weighted result) is produced at each node. Each input to a node is assigned a weight; the input is multiplied by its weight, and the node then adds, multiplies, or otherwise combines all the weighted inputs to produce its output. These weights may be designated W31, W32, W33, ..., W41, and so on, and their values are adjusted during training. Through trial and error or other training procedures, paths that produce the expected output are eventually given higher weights, while paths that do not are assigned smaller weights. The machine learns which paths produce the desired output and assigns high weights to the inputs of those paths.
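To make the weighted-sum operation concrete, here is a minimal sketch in Python (an illustration, not from the patent; the three-input shape and the sigmoid activation are assumptions):

```python
import numpy as np

def node_output(inputs, weights, bias=0.0):
    """One node: weight each input, sum, and apply an activation."""
    z = np.dot(weights, inputs) + bias  # weighted sum of the node's inputs
    return 1.0 / (1.0 + np.exp(-z))     # sigmoid activation (one common choice)

# A node with three inputs and three trained weights.
x = np.array([0.5, -0.2, 0.8])
w = np.array([0.9, 0.1, -0.4])          # values adjusted during training
print(node_output(x, w))
```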

These weights can be stored in weight memory 100. Since neural networks typically have many nodes, many weights are stored in weight memory 100. Each weight may require multiple binary bits to represent the range of its possible values. Weights typically require 8 to 16 bits.

Figure 2 shows a transistor device model. A gate voltage Vg and a drain voltage Vd are applied to the transistor, while the source voltage Vs is usually grounded. The substrate or body voltage Vb may be grounded or set to another voltage, such as a reverse bias. The device model uses various parameters to predict the drain-to-source current Ids, which is a function of Vg, Vd, Vb, and Vs. Other inputs such as temperature T, gate width W, and gate length L also affect the predicted drain current, especially when L is very small.
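For context only, a textbook first-order square-law model predicts Ids from the same inputs listed above; this is standard device physics rather than the patent's model, and the parameter values (k, Vt) are illustrative assumptions:

```python
def square_law_ids(vgs, vds, w, l, k=2e-4, vt=0.5):
    """First-order NMOS drain current; k is mu*Cox (A/V^2), vt is the threshold (V)."""
    if vgs <= vt:
        return 0.0                               # cutoff (ignores subthreshold current)
    if vds < vgs - vt:                           # triode region
        return k * (w / l) * ((vgs - vt) * vds - vds**2 / 2)
    return 0.5 * k * (w / l) * (vgs - vt)**2     # saturation region

print(square_law_ids(vgs=1.2, vds=1.0, w=1e-6, l=0.1e-6))
```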

Figure 3 shows the overfitting problem when modeling a device. Measurement data 204, 206, 208 are input into a neural network to produce the model parameters that best fit them. Model current 202 is the best-fit model of measurement data 204, 206, 208. Measurement data 206, 208 are two outlier data points, possibly the result of some measurement error. Using a neural network to fit all data points, including outliers 206, 208, causes model current 202 to spike down to outlier 208, rise sharply to outlier 206, and then drop back down to measurement data 204. The outliers 206, 208 give model current 202 a negative conductance around the outlier points. Moreover, model current 202 beyond measurement data 204 may be unreliable and prone to erratic behavior; its scalability is poor.

Figures 4A-4B show poor modeling near the model's minimum. In Figure 4A, the drain current is plotted as a function of drain-source voltage. Training data 214 and test data 212 are measured data points; training data 214 is input to the neural network to generate weight values, while test data 212 is used to test the accuracy of those weights. Curves 218 are models generated for different input gate voltages Vg. Although the accuracy of model 218 appears good at larger currents, as shown in Figure 4A, it is poor at smaller currents, as shown in Figure 4B. Model 218 does not converge at the origin (0,0), but only near it. Model 218 fails for voltages and currents in the subthreshold region.

Figure 5 shows training a neural network to produce a device model using the drain current as the fitting target. Test transistors are measured, and the input voltages, temperature, channel width W, and channel length L are recorded as training data 34, while the measured drain current Ids is recorded as fitting-target data 38 corresponding to each combination of inputs. Neural network 36 receives training data 34 and a current set of weights and operates on the training data to produce a result. The result produced by neural network 36 is compared with fitting-target data 38 by loss function 42, which produces a loss value indicating the error between the produced result and the fitting target. The loss value produced by loss function 42 is used to adjust the weights applied to neural network 36. With loss function 42 applied to training data 34, the weights are iterated many times until a minimum loss value is found, and the final set of weights is used for the transistor model.

A device model is expected to be accurate over a wide range, from the subthreshold to the strong-inversion region. However, using a neural network leads to the overfitting problem of Figure 3 and the subthreshold-accuracy problem of Figure 4B. In addition, some circuit simulators use derivatives or slopes of the model, such as the conductance (Gds) and transconductance (Gm), but model-convergence problems can distort the extracted Gds and Gm values. The first derivative of the model may have poor accuracy, and monotonicity may be poor. To avoid overfitting and poor monotonicity, it may be necessary to limit the size of the hidden layers, which makes deep neural networks difficult to use. Shallow neural networks, however, cannot be applied to more complex models if an accurate model is still desired.

What is desired is a device model for semiconductor integrated circuits (ICs) that accurately models current over a wide range, including the subthreshold region. A device model produced by a neural network, but without the overfitting problem, is desirable, as is a device model that can accurately model conductance (Gds) and transconductance (Gm) values.

【Brief Description of the Drawings】

Figure 1 shows a prior-art neural network.

Figure 2 shows a transistor device model.

Figure 3 shows the overfitting problem when modeling a device.

Figures 4A-4B show models with poor accuracy near the minimum.

Figure 5 shows training a neural network using the drain current as the target to produce a device model.

Figure 6 is a schematic diagram of an artificial neural network that transforms the semiconductor drain current.

Figure 7 shows a deep neural network that operates on preprocessed inputs and adjusts its weights using a loss function with a transformed target.

Figure 8 shows how using the transformed drain current as the fitting target of a deep neural network solves the overfitting problem in device modeling.

Figures 9A-9B show that using the transformed drain current as the fitting target enables a deep neural network to better model subthreshold transistors.

Figure 10 shows a transistor simulator whose model and parameters are obtained from a deep neural network that operates on preprocessed inputs with the transformed drain current as the fitting target.

【Detailed Description】

The present invention relates to improvements in modeling semiconductor devices using artificial neural networks. The following description enables one of ordinary skill in the art to make and use the invention as provided in the context of a particular application and its requirements. Various modifications to the preferred embodiment will be apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Figure 6 is a schematic diagram of an artificial neural network that uses the transformed drain current of a semiconductor device as the fitting target. Although the drain current is approximately linear within certain ranges, it may be nonlinear over larger ranges. The inventors recognized that, unlike the first-order models commonly used for transistors, the drain current is nonlinear. The inventors believe that using the drain current itself as the neural network's fitting target produces subthreshold-accuracy problems, overfitting, poor monotonicity, and convergence problems.

The inventors have found that transforming the target aids model generation and improves the resulting model's accuracy. Instead of using the drain current Ids as the target, the drain current is transformed, and loss function 42 uses the transformed drain current to generate the loss and adjust the weights.

The measured drain current in target data 38 is converted by target transformation module 44 into a transformed drain current X_Ids. Target transformation module 44 transforms the drain current by differentiation: the transformation may be the derivative of the drain current with respect to the gate voltage, the drain voltage, the substrate voltage, or the transistor's dimensions or temperature. Taking the logarithm of such a derivative is another transformation. Loss function 42 computes the total difference between the neural network's output and its expected output, which is the transformed drain current X_Ids. The weights applied to deep neural network 50 are adjusted by an optimization algorithm, such as stochastic gradient descent, so that the total difference computed by loss function 42 becomes smaller and smaller. The difference computed by loss function 42 between the transformed drain current and the network output may be smaller than the difference between the raw drain current and the network output.

Training data 34 includes the drain-to-source voltage Vds, the gate-to-source voltage Vgs, the substrate-to-source voltage Vbs, the temperature, the transistor channel width W and length L, and the drain-to-source current Ids measured under those conditions. These input voltages and conditions are processed by input preprocessing module 40 to generate preprocessed input data, which serves as the input to deep neural network 50.

Input preprocessing module 40 can apply various kinds of preprocessing to training data 34, such as taking the natural logarithms of the input voltages: ln(Vgs), ln(Vds). Input preprocessing module 40 can also perform principal component analysis (PCA) on training data 34 to obtain the principal variables that most affect the transformed drain current. PCA detects which input variables most influence the drain current, and can use the eigenvectors of the covariance matrix to reduce the dimensionality of the variables.
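A minimal sketch of both preprocessing options, assuming the training data is an n-samples-by-d-features NumPy array with strictly positive columns and at least four columns (the column layout and component count are assumptions):

```python
import numpy as np

def preprocess(X, n_components=4):
    """Log-transform the inputs, then reduce dimensionality with PCA."""
    X = np.log(X)                           # e.g. ln(Vgs), ln(Vds); assumes positive inputs
    Xc = X - X.mean(axis=0)                 # center the data before PCA
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the inputs
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvectors of the covariance matrix
    order = np.argsort(eigvals)[::-1]       # strongest principal components first
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                  # project onto the principal variables
```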

Deep neural network 50 may be a neural network created by an engineer for a specific application, or a general-purpose neural network tuned for that application. For example, the number of intermediate or hidden layers can be adjusted for a particular application, and the types of operations performed in the nodes and the connections between nodes can be tuned for certain applications or problems, such as semiconductor device modeling. Whereas the drain current Ids is often modeled with a shallow neural network, deep neural network 50 contains at least five intermediate layers. A deep neural network 50 with more intermediate layers allows better modeling of second-order effects, such as buried layers in semiconductor devices and three-dimensional current flow in complex devices.
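A network of this shape can be sketched in plain NumPy as below; the five-hidden-layer structure follows the text, while the layer width of 32 and the tanh activation are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_network(n_inputs, hidden=(32, 32, 32, 32, 32), n_outputs=1):
    """Random initial weights in (-1.0, 1.0) for a five-hidden-layer network."""
    sizes = (n_inputs,) + hidden + (n_outputs,)
    return [rng.uniform(-1.0, 1.0, size=(a, b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Forward propagation: each layer applies its weights, then tanh."""
    for W in layers[:-1]:
        x = np.tanh(x @ W)
    return x @ layers[-1]                  # linear output layer
```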

The initial weights used in deep neural network 50 can be set to random values within an initial range such as (-1.0 to 1.0). Training data 34 is preprocessed by input preprocessing module 40, the preprocessed data is input to deep neural network 50 during training, and the network's output is then evaluated.

One way to evaluate the quality of deep neural network 50's output is to compute a loss. Loss function 42 computes a loss value that measures how close the current cycle's output is to the expected result. A mean squared error (MSE) is produced by squaring each output difference (the difference between an output and its expected value) and averaging or summing these squares over all outputs.

The goal of training is to find weight values that make the network output (the prediction) equal or close to the fitting target (the data). This process is complex, so the weights cannot be computed mathematically; instead, the computer learns them from the data. After preprocessing and target transformation, the data is split into inputs and fitting targets. The weights are first set to random initial values. When an input vector is presented to the neural network, its values propagate forward through the network layer by layer until they reach the output layer. The network's output is then compared with the fitting target using a loss function. For a single output, the loss is (1/2)|y - y'|^2, where y is the network output and y' is the fitting target. For n input samples, the loss E is the average of the individual losses: E = (1/(2n)) * Σ|y - y'|^2. After the loss over the n inputs is determined, an optimization algorithm adjusts the weights to minimize it. The optimization algorithm repeats a two-phase cycle: forward computation (propagation) and weight update. The forward computation calculates the total loss; in the second phase, an optimization method such as gradient descent updates the weights to try to reduce the loss. These cycles stop when the total loss falls below an acceptable value.
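The two-phase cycle can be shown with a deliberately tiny stand-in model. This sketch uses a one-layer linear "network" and finite-difference gradients instead of backpropagation, purely so the forward-computation and weight-update phases stay visible; the data, learning rate, and stopping threshold are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, X, y_target):
    """E = (1/(2n)) * sum |y - y'|^2 for a tiny linear model y = X @ w."""
    return np.sum((X @ w - y_target) ** 2) / (2 * len(X))

# Toy data: 64 samples, 3 preprocessed inputs, one transformed target each.
X = rng.normal(size=(64, 3))
y_target = X @ np.array([0.4, -0.7, 0.2])       # mapping the training should recover

w = rng.uniform(-1.0, 1.0, size=3)              # random initial weights
lr, eps = 0.1, 1e-6
while loss(w, X, y_target) > 1e-8:              # stop at an acceptably low loss
    base = loss(w, X, y_target)                 # phase 1: forward computation
    grad = np.array([(loss(w + eps * np.eye(3)[i], X, y_target) - base) / eps
                     for i in range(3)])
    w -= lr * grad                              # phase 2: weight update (gradient descent)
print(w)                                        # close to [0.4, -0.7, 0.2]
```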

The loss produced by loss function 42 can also include a complexity loss, which includes a weight-decay term (to prevent overfitting as the weight range is adjusted) and a sparsity term (to improve the structure and regularity within deep neural network 50). The complexity loss prevents the model from overfitting, for example to the anomalous measurements 206, 208 (Figure 3), since a model that fits the anomalous measurements 206, 208 is more complex than a smoother model that excludes them.
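A hedged sketch of such a combined loss, with a squared (L2) weight-decay term and an absolute-value (L1) sparsity term; the scaling factors are assumptions:

```python
import numpy as np

def total_loss(y, y_target, weights, decay=1e-4, sparsity=1e-5):
    """Accuracy loss plus a complexity loss (weight decay and sparsity terms)."""
    accuracy = np.mean((y - y_target) ** 2) / 2                    # fit to transformed target
    weight_decay = decay * sum(np.sum(W ** 2) for W in weights)    # keeps weights from growing
    sparse = sparsity * sum(np.sum(np.abs(W)) for W in weights)    # pushes weights toward zero
    return accuracy + weight_decay + sparse
```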

Both the accuracy loss and the complexity loss can be generated as part of loss function 42, which adjusts the weights for the next training cycle. The updated weights are applied to deep neural network 50, the training data 34 preprocessed by input preprocessing module 40 is input to the network again, and the network produces new outputs, for which loss function 42 generates a new set of loss values by comparing the outputs with the transformed drain current. The weights are adjusted and the loss recomputed over many cycles until the desired endpoint is reached. The final weights applied to deep neural network 50 can then be used to build the final device model.

Figure 7 shows a deep neural network that operates on preprocessed inputs and adjusts its weights using a loss function with a transformed fitting target. Training data 34 contains the measured input conditions, including the gate, drain, and body voltages, the temperature, and the transistor dimensions. The preprocessed data generated by input preprocessing module 40 from training data 34 is input to deep neural network 50. These preprocessed inputs can include the logarithm of the drain-source voltage, the logarithm of the gate-source voltage, and other inputs selected or combined by principal component analysis (PCA).

Deep neural network 50 operates on the preprocessed input data from input preprocessing module 40 using a set of weights adjusted by loss function 42 through an optimization algorithm. Deep neural network 50 can operate in the forward direction, in which each set of preprocessed inputs is operated on by each node (with weights between -1 and 1) within the network to produce one or more outputs, which loss function 42 analyzes to generate a new set of weights. The new weights are passed back to deep neural network 50, so that while the network's inputs remain fixed, its output changes as the weights change.

Loss function 42 does not compare the neural network output with the measured drain current. Instead, the drain current measured in target data 38 is converted by target transformation module 44 into a transformed drain current, and loss function 42 compares the transformed drain current with the network output.

As shown in Figure 7, the transformed drain current produced by target transformation module 44 and compared by loss function 42 with the output of deep neural network 50 can be a derivative of the drain current. These transformations can include the derivative of the drain current with respect to the drain-source voltage, d(Ids)/d(Vds); with respect to the gate-source voltage, d(Ids)/d(Vgs); with respect to the substrate-source voltage, d(Ids)/d(Vbs); with respect to temperature, d(Ids)/dT; with respect to the transistor channel length, d(Ids)/dL; and with respect to the transistor channel width, d(Ids)/dW.

The transformation of the drain current by target transformation module 44 can also be the logarithm of any of these derivatives, for example the natural logarithm of the derivative of the drain current with respect to the drain-source voltage, ln[d(Ids)/d(Vds)], or the logarithm of the derivative with respect to the gate-source voltage, ln[d(Ids)/d(Vgs)], and so on.
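One of these transformations, ln[d(Ids)/d(Vgs)], can be computed numerically from a measured sweep roughly as follows; the synthetic subthreshold-like sweep stands in for measured data, and the positive-derivative assumption is noted in the code:

```python
import numpy as np

def transform_target(ids, vgs):
    """Turn measured Ids samples (swept over Vgs) into ln[d(Ids)/d(Vgs)]."""
    d_ids = np.gradient(ids, vgs)   # numerical derivative along the Vgs sweep
    return np.log(d_ids)            # valid only where the derivative is positive

# Synthetic subthreshold-like sweep standing in for measured target data.
vgs = np.linspace(0.1, 0.5, 50)
ids = 1e-9 * np.exp(vgs / 0.06)
x_ids = transform_target(ids, vgs)  # fitting target X_Ids for the network
```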

Using a deep neural network 50 with more layers (a deeper network) allows more accurate modeling of features that may appear in sub-10-nm transistors and three-dimensional transistors such as fin field-effect transistors (FinFETs) and silicon-on-insulator (SOI) devices. Deep neural network 50 provides a framework for modeling more complex future semiconductor processes.

Figure 8 shows how using the transformed drain current as the fitting target of a deep neural network solves the overfitting problem in device modeling. Measurement data 204, 206, 208 are preprocessed by input preprocessing module 40 and input into deep neural network 50; the network's output is compared with the transformed drain current by loss function 42 to produce the model parameters that best fit measurement data 204, 206, 208. Modeled current 302 is the best-fit model of measurement data 204, 206, 208.

Measurement data 206, 208 are two outlier data points, possibly the result of some error or measurement mistake. The drain-current values at data points 206, 208 do not differ significantly from the drain currents of the other measurement data 204. Taking the derivative of the drain current, however, amplifies the error produced by loss function 42: although the current values of the data points do not differ much, the slope of the line connecting them jumps sharply near the poorly fitting points 206, 208 (see line 202 in Figure 3). Using the transformed drain current as the fitting target therefore magnifies the error produced by loss function 42. Regularization may have little effect on the small errors in the raw drain current, but acts strongly on the fitting-target errors magnified by differentiation.

Despite the anomalous measurements 206, 208, a smooth model 302 is obtained. There is no negative conductance around the outlier data points. Moreover, modeled current 302 beyond measurement data 204 is reliable, and its scalability is good.

Figures 9A-9B show that using the transformed drain current as the fitting target enables the deep neural network to better model the subthreshold region.

In Figure 9A, the measured drain current is plotted as a function of drain-source voltage. Training data 214 and test data 212 are measured data points; training data 214 is preprocessed and input to deep neural network 50 to generate the weights, while test data 212 is transformed by target transformation module 44 and used to test the accuracy of the network's weights.

Models 308 are generated for different gate voltages Vg. The accuracy of model 308 is good for the larger currents in Figure 9A, and also for the smaller currents in Figure 9B. Model 308 converges at the origin (0,0) and is valid for voltages and currents in the subthreshold region.

By targeting the transformed drain current rather than the drain current itself, model 308 can be derived to cover a wider range of input voltages. The model is more accurate in the subthreshold region and converges at the origin.

Figure 10 shows a transistor simulator whose model and parameters are obtained from a deep neural network that operates on preprocessed inputs with the transformed drain current as the fitting target. The set of weights obtained when the output of deep neural network 50 best matches the transformed drain current is the final set of weights, which is applied to deep neural network 50. Simulation input data 54 includes the voltages, temperature, and transistor width and length entered by the design engineer during circuit design. Simulation input data 54 is operated on by input preprocessing module 40 to obtain preprocessed inputs X1, X2, X3, X4, .... For example, the logarithms of the gate and drain voltages can be produced by input preprocessor 40 and input to deep neural network 50.

Deep neural network 50 produces the transformed drain current X_Ids from the preprocessed inputs and the final weights. Inverse target transformation module 58 performs the inverse of the transformation performed by target transformation module 44 (Figure 6). For example, when target transformation module 44 produces the derivative of the drain current with respect to the gate voltage, inverse target transformation module 58 integrates the transformed drain current over the gate voltage.

Inverse target transformation module 58 can integrate using Riemann sums, Newton-Cotes formulas, or linear or polynomial interpolation. When target transformation module 44 takes the logarithm of the derivative with respect to the drain voltage, inverse target transformation module 58 first exponentiates the transformed drain current and then integrates. Inverse target transformation module 58 reproduces the drain-current value as it was before transformation by target transformation module 44; this is the simulated drain current predicted by simulator 60.
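A sketch of the inverse transformation for the ln[d(Ids)/d(Vgs)] case, using a running trapezoid rule (one of the Newton-Cotes options mentioned above); the integration constant, the current at the start of the sweep, is assumed known:

```python
import numpy as np

def inverse_transform(x_ids, vgs, ids_at_start=0.0):
    """Undo ln[d(Ids)/d(Vgs)]: exponentiate, then integrate over the Vgs sweep."""
    d_ids = np.exp(x_ids)                                # inverse of the logarithm step
    steps = (d_ids[1:] + d_ids[:-1]) / 2 * np.diff(vgs)  # trapezoid area per interval
    return np.concatenate(([ids_at_start], ids_at_start + np.cumsum(steps)))
```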

During simulation, loss function 42 and target transformation module 44 are not needed, because the final weights are held constant and are not adjusted. Simulator 60 can be built from input preprocessor 40, deep neural network 50, and inverse target transformation module 58. The size of deep neural network 50 can be reduced, for example by deleting nodes whose weights are zero. When used inside simulator 60, deep neural network 50 can be a feed-forward network. In a larger circuit simulator, the SPICE, BSIM, or other device models can be replaced by simulator 60. For each transistor in the simulation, the larger circuit simulator generates the input voltages, looks up the transistor's W and L, reads the specified simulation temperature, and then calls simulator 60 to simulate that transistor's drain current.

By using deep neural network 50 with target transformation module 44 and input preprocessing module 40, the time and manpower needed to develop a device model can be significantly reduced. Measurement data can be obtained from devices on test chips fabricated with a new process and used as training data 34 (Vgs, Vds, ...) and fitting-target data 38 (Ids). Deep neural network 50 can operate forward and backward to adjust the weights until loss function 42 finds an acceptably low loss or a minimum. The final weights can then be applied to deep neural network 50, which, together with input preprocessing module 40 and inverse target transformation module 58, simulates transistors fabricated with the new process.

【Other Embodiments】

Several other embodiments are also contemplated by the inventors. For example, target transformation module 44, input preprocessor 40, and inverse target transformation module 58 may share the same computing hardware, or each may have dedicated hardware. The transformed drain current can be a derivative of the drain current, the logarithm of such a derivative, the conductance Gds, the transconductance Gm, the logarithm of the conductance or transconductance, or another transformation of the drain current. Several of these transformations can be tested to find the one that yields the lowest loss when used as the target. Similarly, input preprocessor 40 can preprocess some or all of the inputs in various ways. The logarithms can be natural logarithms, base-10 logarithms, or logarithms with other bases. Various combinations of transformation or preprocessing functions can also be substituted.

Some embodiments may not use all components, and other components can be added. Loss function 42 can use various error/loss generators, such as a weight-decay term that prevents the weights from growing too large over many training-optimization cycles, or a sparsity penalty that encourages nodes to zero their weights so that only a small fraction of the nodes is effectively used; that remaining small fraction of nodes is the most relevant. Although various loss and cost functions have been described, many alternatives, combinations, and variations are possible, and other kinds of loss or cost terms can be added to loss function 42. The relative scaling factors of the different cost functions can be adjusted to balance their effects.

Floating-point values can be converted to fixed-point or binary values. Although binary-valued weights have been shown, various encodings can be used, such as two's complement, Huffman coding, or truncated binary coding. The number of binary bits required to represent a weight value refers to the number of bits regardless of the encoding method, whether binary, Gray-code, fixed-point, offset, or the like.

Weights can be limited to a range of values. Although a range of -1 to 1 has been described, the range does not necessarily have to include 0, such as a range of 512 to 1. Weight values can be offset to fit a binary range; for example, weights ranging from 10511 to 10000 can be stored as 9-bit binary words, with an offset of 10000 added to the stored word to produce the actual weight value. The range can be adjusted during optimization, and the offset can be stored or hard-wired into the logic of deep neural network 50.
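The offset scheme in this paragraph can be illustrated directly; the encode/decode pair below follows the 10000-to-10511 example, where the 512 possible values fit in a 9-bit word:

```python
def encode_weight(w, offset=10000, bits=9):
    """Store a weight in [10000, 10511] as a 9-bit word by removing the offset."""
    word = w - offset
    assert 0 <= word < 2 ** bits    # 9 bits cover the 512 values in the range
    return word

def decode_weight(word, offset=10000):
    return word + offset            # add the offset back to recover the weight

print(decode_weight(encode_weight(10250)))  # -> 10250
```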

Many variations are possible for the training routines that run on deep neural network 50. Optimization can first determine the number of hidden or intermediate-layer nodes and then proceed to optimize the weights. The arrangement or connectivity of nodes can be determined by zeroing some weights to cut the connections between nodes. A sparsity cost can be used in the initial optimization cycles while the structure is being optimized, but omitted in later cycles when the weight values are being fine-tuned. A sigmoid function can be used to train the hidden layers within deep neural network 50, and lookup tables can be used to implement more complex functions instead of an arithmetic logic unit (ALU), to speed up processing. The activation function may differ from node to node, e.g., sigmoid, tanh, or relu.
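A lookup-table activation might look like the following sketch; the table size and input range are assumptions:

```python
import numpy as np

SIG_X = np.linspace(-8.0, 8.0, 1024)   # precomputed input grid
SIG_Y = 1.0 / (1.0 + np.exp(-SIG_X))   # sigmoid values stored in the table

def sigmoid_lut(z):
    """Approximate the sigmoid by linear interpolation into the table."""
    return np.interp(z, SIG_X, SIG_Y)  # clamps outside [-8, 8], where sigmoid is ~0 or ~1

print(sigmoid_lut(np.array([-2.0, 0.0, 2.0])))
```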

The loss-reduction process may differ for different applications and training data. A wide variety of structures, with different numbers and arrangements of hidden layers, can be used for deep neural network 50. For modeling a particular application and semiconductor process, a suitable neural network may be a specific network, a deep neural network 50 with a specific structure, or a general-purpose neural network used as the starting point for optimization. Deep neural network 50 can have at least seven intermediate layers and at least ten thousand weights.

Autoencoders, automax and softmax classifiers, and other kinds of layers can be inserted into the neural network. The entire optimization process can be repeated several times, for example with different initial conditions, such as different numbers of bits for quantizing floating-point values or other parameters, different precisions, different scaling factors, etc. Endpoints can be set for various combinations of conditions, such as a desired final accuracy, an accuracy-hardware-cost product, a target hardware cost, etc.

Although the actual cost of deep neural network 50 depends on many factors, such as the number of nodes, the weights, the interconnect, control, and interfaces, the inventors approximate the cost as proportional to the aggregate of the weights. The total number of binary bits used to represent all the weights in deep neural network 50 is a measure of the hardware cost, even if only an approximation. The gradient or slope of the hardware-complexity cost can be used, and gradient values can be scaled and altered before or after comparison.

IC semiconductor manufacturing processes can vary in many ways. Photomasks can be made with various specialized machines and processes, including direct writing to burn off a metallization layer rather than a photoresist. Many combinations of diffusion, oxide growth, etching, deposition, ion implantation, and other manufacturing steps can produce the resulting patterns on an IC controlled by the photomasks. Although modeling transistors, and drain currents in particular, has been described, other currents such as diode currents and substrate leakage currents can be modeled, and the method can be used to model other devices such as capacitors and resistors.

Deep neural network 50, loss function 42, target transformation module 44, inverse target transformation module 58, and other components can be implemented with various combinations of software, hardware, firmware, routines, modules, functions, and the like, and in various technologies. The final product, deep neural network 50 with its final weights together with input preprocessing module 40 and inverse target transformation module 58, can be implemented in an application-specific integrated circuit (ASIC) or other hardware to increase processing speed and lower power consumption when simulator 60 is used to simulate large circuits.

The Background section of this application may contain background information about the problem or environment of the invention rather than describe the prior art of others. Accordingly, the inclusion of material in the Background section is not an admission of prior art by the applicant.

Any methods or processes described herein are machine-implemented or computer-implemented and are intended to be performed by a machine, computer, or other device, not solely by a human without such machine assistance. Tangible results generated can include reports or other machine-generated displays on display devices such as computer monitors, projection devices, audio-generating devices, and related media devices, and can include hard-copy printouts that are also machine-generated. Computer control of other machines is another tangible result.

Any advantages and benefits described may not apply to all embodiments of the invention. When the word "means" is recited in a claim element, the applicant intends for the claim element to fall under 35 USC Sect. 112, paragraph 6. The word or words preceding "means" are a label intended to ease referencing of claim elements and are not intended to convey a structural limitation. Such means-plus-function claims are intended to cover not only the structures described herein for performing the function and their structural equivalents, but also equivalent structures. For example, although a nail and a screw have different structures, they are equivalent structures since they both perform the function of fastening. Claims that do not use the word "means" are not intended to fall under 35 USC Sect. 112, paragraph 6. Signals are typically electronic signals but may be optical signals, such as signals carried over a fiber-optic line.

The foregoing description of the embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.

Claims (18)

1.一种半导体器件建模系统,包括:1. A semiconductor device modeling system, comprising: 包含多个节点的深度神经网络,每个节点使用权重缩放其输入,产生的节点输出传输到所述多个节点中的其他节点;a deep neural network comprising a plurality of nodes, each node scaling its input using a weight, the resulting node output is transmitted to other nodes in the plurality of nodes; 输入模块,用于接收训练数据,所述训练数据表示晶体管的栅极电压和漏极电压,所述训练数据还包括所述晶体管的宽度和长度;an input module for receiving training data, the training data representing the gate voltage and the drain voltage of the transistor, the training data further including the width and length of the transistor; 输入预处理模块,其从所述输入模块接收所述训练数据,所述输入预处理模块将所述训练数据转换为预处理后的训练数据;an input preprocessing module that receives the training data from the input module, the input preprocessing module converts the training data into preprocessed training data; 其中所述预处理后的训练数据被应用于所述深度神经网络的输入;wherein the preprocessed training data is applied to the input of the deep neural network; 目标输入模块,其接收目标数据,所述目标数据表示,当所述训练数据表示的栅极电压和漏极电压被施加到所述晶体管时、测量的所述晶体管的漏极电流,而所述晶体管具有所述训练数据表示的晶体管宽度和晶体管长度;a target input module that receives target data representing the measured drain current of the transistor when the gate and drain voltages represented by the training data are applied to the transistor, and the a transistor having a transistor width and a transistor length represented by the training data; 目标转换模块,其从所述目标输入模块接收所述训练数据,所述目标转换模块将表示漏极电流的目标数据转换为一个转换漏极电流值;a target conversion module that receives the training data from the target input module, the target conversion module converts target data representing drain current into a converted drain current value; 损失函数产生模块,其根据所述预处理后的训练数据并使用所述权重,将所述目标转换模块产生的经过转换的漏极电流与所述深度神经网络产生的输出进行比较,其中所述损失函数产生模块通过调整所述神经网络的权重,将所述转换漏极电流值与所述深度神经网络产生的输出之间的损失函数值最小化;a loss function generation module that compares the transformed drain current generated by the target transformation module with the output generated by the deep neural network based on the preprocessed training data and using the weights, wherein the The loss function generation module minimizes the loss function value between the converted drain current value and the output generated by the deep neural network by adjusting the weight of the neural network; 其中在所述深度神经网络的训练期间输入并处理多个所述训练数据和多个所述目标数据,以产生多个权重和损失函数值;wherein a plurality of the training data and a plurality of the target data are input and processed during training of the deep neural network to generate a plurality of weights and loss function values; 其中在完成训练之后,选择最后一组权重,所述最后一组权重产生最小损失函数值;Wherein, after the training is completed, the last set of weights is selected, and the last set of weights generates the minimum loss function value; 其中所述最后一组权重界定了所述晶体管的器件模型,所述晶体管使用所述深度神经网络进行仿真;wherein the last set of weights defines a device model of the transistor that is simulated using the deep neural network; 其中所述目标转换模块将表示漏极电流的训练数据转换为所述漏极电流的导数;wherein the target conversion module converts the training data representing the drain current into a derivative of the drain current; 其中所述漏极电流的导数是所述转换漏极电流值,也是损失函数产生模块评估的所述深度神经网络的训练目标。The derivative of the drain current is the transformed drain current value, and is also the training target of the deep neural network evaluated by the loss function generation module. 2.根据权利要求1所述的半导体器件建模系统,其中所述漏极电流的导数是有关栅极电压的导数。2. The semiconductor device modeling system of claim 1, wherein the derivative of the drain current is a derivative with respect to the gate voltage. 3.根据权利要求1所述的半导体器件建模系统,其中所述漏极电流的导数是有关漏极电压的导数。3. 
The semiconductor device modeling system of claim 1, wherein the derivative of the drain current is a derivative with respect to the drain voltage. 4.根据权利要求1所述的半导体器件建模系统,其中所述漏极电流的导数是有关晶体管尺寸的导数。4. The semiconductor device modeling system of claim 1, wherein the derivative of the drain current is a derivative with respect to transistor size. 5.根据权利要求1所述的半导体器件建模系统,其中所述训练数据还包括温度:5. The semiconductor device modeling system of claim 1, wherein the training data further comprises temperature: 其中所述漏极电流的导数是有关所述温度的导数。wherein the derivative of the drain current is a derivative with respect to the temperature. 6.根据权利要求1所述的半导体器件建模系统,其中所述目标转换模块将表示漏极电流的训练数据转换为所述漏极电流的导数的对数;6. The semiconductor device modeling system of claim 1, wherein the target conversion module converts training data representing drain current into a logarithm of a derivative of the drain current; 其中所述漏极电流的导数的对数是所述转换漏极电流值,也是所述损失函数产生模块评估的所述深度神经网络的训练目标。wherein the logarithm of the derivative of the drain current is the transformed drain current value and is also the training target of the deep neural network evaluated by the loss function generation module. 7.根据权利要求1所述的半导体器件建模系统,其中在仿真过程中,将所述最后一组权重应用于所述深度神经网络,表示仿真电压的仿真训练数据由所述输入预处理模块进行预处理、继而由所述深度神经网络处理,通过使用最终的权重值产生仿真输出结果;7. The semiconductor device modeling system of claim 1 , wherein during simulation, the last set of weights is applied to the deep neural network, and simulation training data representing simulated voltages is provided by the input preprocessing module performing preprocessing and then processing by the deep neural network to generate a simulation output by using the final weight values; 反向目标转换模块,其在仿真期间从所述深度神经网络收到所述仿真输出,并产生漏极电流的仿真值,所述反向目标转换模块的操作是所述目标转换模块的逆转换操作;an inverse target conversion module that receives the simulation output from the deep neural network during simulation and generates a simulated value of drain current, the operation of the inverse target conversion module is the inverse conversion of the target conversion module operate; 由此,使用所述最后一组权重从所述深度神经网络产生漏极电流的仿真值。Thus, simulated values of drain current are generated from the deep neural network using the last set of weights. 8.根据权利要求7所述的半导体器件建模系统,其中所述反向目标转换模块包括积分模块,所述积分模块对所述深度神经网络的仿真输出在一定电压范围上进行积分,以生成所述漏极电流的仿真值。8. The semiconductor device modeling system of claim 7, wherein the inverse target conversion module includes an integration module that integrates the simulation output of the deep neural network over a range of voltages to generate The simulated value of the drain current. 9.根据权利要求1所述的半导体器件建模系统,其中所述输入预处理模块产生栅极电压的对数或漏极电压的对数,作为预处理训练数据作为所述深度神经网络的输入,9. The semiconductor device modeling system of claim 1, wherein the input preprocessing module generates a logarithm of gate voltage or a logarithm of drain voltage as preprocessing training data as input to the deep neural network , 由此,将对数电压输入应用到所述深度神经网络。Thus, a logarithmic voltage input is applied to the deep neural network. 10.根据权利要求1所述的半导体器件建模系统,其中所述输入预处理模块对所述训练数据执行主成分分析PCA,以选择主数据,其中所述主数据是应用于深度神经网络输入的预处理训练数据,10. The semiconductor device modeling system of claim 1, wherein the input preprocessing module performs a principal component analysis (PCA) on the training data to select master data, wherein the master data is applied to a deep neural network input The preprocessed training data, 由此,执行PCA后再应用于所述深度神经网络。Thus, PCA is performed and then applied to the deep neural network. 11.一种用于仿真模拟晶体管的计算机实施方法,包括:11. 
11. A computer-implemented method for simulating a modeled transistor, comprising:
receiving input data representing a gate voltage and a drain voltage applied to the modeled transistor by a circuit simulator, the input data further comprising a transistor width and a transistor length of the modeled transistor;
pre-processing the input data to generate pre-processed input data by generating a logarithm of the gate voltage or a logarithm of the drain voltage;
applying the pre-processed input data as input to a deep neural network, the deep neural network containing a set of weights, and operating the deep neural network on the pre-processed input data to produce a neural network output;
integrating the neural network output over the gate voltage or the drain voltage to generate a drain-current value for the modeled transistor;
outputting the drain-current value to the circuit simulator, the circuit simulator using the drain-current value to simulate operation of the transistor in a circuit;
wherein the deep neural network receives logarithms of voltages and produces a derivative of the drain-current value during simulation.
12. The computer-implemented method of claim 11, wherein the deep neural network has at least seven intermediate layers and at least ten thousand weights.
13. The computer-implemented method of claim 11, wherein the neural network output is a conductance value.
14. The computer-implemented method of claim 11, wherein the neural network output is a transconductance value.
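To make the recovery step of claims 8 and 11 concrete, the sketch below integrates the network output (a transconductance-like quantity, per claim 14) over the gate-voltage sweep to obtain the drain current. The `net` callable, the bias-point layout, and the zero integration constant at the first bias point are assumptions.

```python
import numpy as np

def simulate_drain_current(net, vg_sweep, vd, width, length, eps=1e-12):
    """Integrate the neural network output over Vg to recover Id(Vg)."""
    vg_sweep = np.asarray(vg_sweep, dtype=float)
    # Pre-process each bias point as during training (logarithm of
    # the voltages, per claim 11).
    X = np.column_stack([
        np.log(vg_sweep + eps),
        np.full_like(vg_sweep, np.log(vd + eps)),
        np.full_like(vg_sweep, width),
        np.full_like(vg_sweep, length),
    ])
    gm = np.asarray(net(X)).ravel()    # network output ~ dId/dVg
    # Claims 8 and 11: integrate over the voltage range (cumulative
    # trapezoid); Id at the first bias point is assumed to be 0 A.
    steps = 0.5 * (gm[1:] + gm[:-1]) * np.diff(vg_sweep)
    return np.concatenate([[0.0], np.cumsum(steps)])
```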
15. A non-transitory computer-readable medium storing computer-executable instructions that, when executed by a processor, implement a method comprising:
applying initial weights to connections between nodes in a neural network, the weights specifying strengths of the connections between the nodes in the neural network;
receiving input data representing a gate voltage and a drain voltage applied to a test transistor, the input data further comprising a transistor width and a transistor length of the test transistor;
pre-processing the input data to generate pre-processed input data;
executing a training routine that applies the pre-processed input data to the neural network;
receiving target data representing a drain current measured on the test transistor when the gate voltage and the drain voltage were applied to the test transistor during testing;
converting the target data into transformed target data;
generating a loss function from a comparison of the transformed target data with the neural network output when the pre-processed input data is applied to the inputs of the neural network by the training routine;
adjusting the initial weights using the loss function to generate updated weights;
when a target endpoint has not yet been reached, applying the updated weights to the neural network, executing another iteration, and further updating the updated weights;
when the target endpoint has been reached, outputting the neural network and the updated weights as a model of the test transistor;
wherein pre-processing the input data to generate the pre-processed input data further comprises generating a logarithm of the gate voltage or a logarithm of the drain voltage;
wherein the logarithms of the voltages are applied as inputs to the neural network.
16. The non-transitory computer-readable medium of claim 15, wherein converting the target data into transformed target data further comprises:
generating a derivative of the drain current as the transformed target data.
17. A non-transitory computer-readable medium storing computer-executable instructions that, when executed by a processor, implement a method comprising:
applying initial weights to connections between nodes in a neural network, the weights specifying strengths of the connections between the nodes in the neural network;
receiving input data representing a gate voltage and a drain voltage applied to a test transistor, the input data further comprising a transistor width and a transistor length of the test transistor;
pre-processing the input data to generate pre-processed input data;
executing a training routine that applies the pre-processed input data to the neural network;
receiving target data representing a drain current measured on the test transistor when the gate voltage and the drain voltage were applied to the test transistor during testing;
converting the target data into transformed target data;
generating a loss function from a comparison of the transformed target data with the neural network output when the pre-processed input data is applied to the inputs of the neural network by the training routine;
adjusting the initial weights using the loss function to generate updated weights;
when a target endpoint has not yet been reached, applying the updated weights to the neural network, executing another iteration, and further updating the updated weights;
when the target endpoint has been reached, outputting the neural network and the updated weights as a model of the test transistor;
wherein converting the target data into transformed target data further comprises:
generating a logarithm of a derivative of the drain current as the transformed target data.
18. A non-transitory computer-readable medium storing computer-executable instructions that, when executed by a processor, implement a method comprising:
applying initial weights to connections between nodes in a neural network, the weights specifying strengths of the connections between the nodes in the neural network;
receiving input data representing a gate voltage and a drain voltage applied to a test transistor, the input data further comprising a transistor width and a transistor length of the test transistor;
pre-processing the input data to generate pre-processed input data;
executing a training routine that applies the pre-processed input data to the neural network;
receiving target data representing a drain current measured on the test transistor when the gate voltage and the drain voltage were applied to the test transistor during testing;
converting the target data into transformed target data;
generating a loss function from a comparison of the transformed target data with the neural network output when the pre-processed input data is applied to the inputs of the neural network by the training routine;
adjusting the initial weights using the loss function to generate updated weights;
when a target endpoint has not yet been reached, applying the updated weights to the neural network, executing another iteration, and further updating the updated weights;
when the target endpoint has been reached, outputting the neural network and the updated weights as a model of the test transistor;
wherein the method further comprises, after the target endpoint has been reached:
applying the updated weights to the neural network;
receiving input data representing a gate voltage and a drain voltage applied to a modeled transistor, the input data further comprising a transistor width and a transistor length of the modeled transistor;
pre-processing the input data to generate pre-processed input data;
applying the pre-processed input data to the neural network, and operating on the pre-processed input data using the neural network and the updated weights to produce a neural network output;
integrating the neural network output to generate a simulated drain current for the modeled transistor.
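The training routine recited in claims 15 through 18 amounts to the familiar loop of forward pass, loss evaluation, weight update, and endpoint test. The sketch below mimics that loop with a small fully connected network; the architecture, the mean-squared-error loss, the stopping rule, and the last-layer-only gradient step (standing in for full back-propagation) are all simplifying assumptions and not the patent's own implementation.

```python
# Illustrative training loop for claims 15-18; per claim 12 the real
# deep network may have at least seven intermediate layers and at
# least ten thousand weights. Everything here is an assumption.
import numpy as np

rng = np.random.default_rng(0)

def init_weights(sizes):
    """Initial weights for the connections between nodes."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(weights, X):
    h = X
    for W, b in weights[:-1]:
        h = np.tanh(h @ W + b)      # each node scales inputs by weights
    W, b = weights[-1]
    return (h @ W + b).ravel()      # network output (transformed Id)

def train(X_pre, target_tr, sizes, lr=1e-3, max_iter=5000, tol=1e-6):
    weights = init_weights(sizes)
    loss = np.inf
    for _ in range(max_iter):       # iterate until the target endpoint
        out = forward(weights, X_pre)
        new_loss = np.mean((out - target_tr) ** 2)  # loss function
        if abs(loss - new_loss) < tol:              # endpoint reached
            break
        loss = new_loss
        # Weight update on the last layer only, an assumed stand-in
        # for full back-propagation to keep the sketch short.
        h = X_pre
        for Wi, bi in weights[:-1]:
            h = np.tanh(h @ Wi + bi)
        err = (out - target_tr)[:, None]
        W, b = weights[-1]
        weights[-1] = (W - lr * h.T @ err / len(X_pre),
                       b - lr * err.mean(axis=0))
    return weights, loss            # final weights define the model
```

Under these assumptions, `train(preprocess_inputs(X), transform_target(vg, id_meas), [3, 32, 32, 1])` would return the final set of weights that defines the device model.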
CN201880001122.9A 2018-06-19 2018-06-20 Semiconductor device modeling for training deep neural networks using input preprocessing and transformation targets Active CN109791627B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/011,787 US11176447B2 (en) 2018-06-19 2018-06-19 Semiconductor device modeling using input pre-processing and transformed targets for training a deep neural network
US16/011,787 2018-06-19
PCT/CN2018/092038 WO2019241937A1 (en) 2018-06-19 2018-06-20 Semiconductor device modeling using input pre-processing and transformed targets for training a deep neural network

Publications (2)

Publication Number Publication Date
CN109791627A CN109791627A (en) 2019-05-21
CN109791627B true CN109791627B (en) 2022-10-21

Family

ID=66499472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880001122.9A Active CN109791627B (en) 2018-06-19 2018-06-20 Semiconductor device modeling for training deep neural networks using input preprocessing and transformation targets

Country Status (1)

Country Link
CN (1) CN109791627B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112710907B (en) * 2019-10-24 2023-07-07 苏州华太电子技术股份有限公司 Test method, test system and computer readable storage medium for power amplifier
US20220091568A1 (en) * 2020-09-21 2022-03-24 Shenzhen Keya Medical Technology Corporation Methods and devices for predicting physical parameter based on input physical information
CN112580288B (en) * 2020-12-03 2022-04-12 复旦大学 Semiconductor device characteristic modeling method and system based on multi-gradient neural network
CN116151174A (en) * 2023-04-14 2023-05-23 四川省华盾防务科技股份有限公司 General device model optimization method and system
CN119227618B (en) * 2024-11-28 2025-03-14 珠海格力电器股份有限公司 Model generation method, simulation device, electronic equipment and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6844582B2 (en) * 2002-05-10 2005-01-18 Matsushita Electric Industrial Co., Ltd. Semiconductor device and learning method thereof
US8935146B2 (en) * 2007-03-05 2015-01-13 Fujitsu Semiconductor Limited Computer aided design apparatus, computer aided design program, computer aided design method for a semiconductor device and method of manufacturing a semiconductor circuit based on characteristic value and simulation parameter
FR2983664B1 (en) * 2011-12-05 2013-12-20 Commissariat Energie Atomique ANALOG-DIGITAL CONVERTER AND NEUROMORPHIC CIRCUIT USING SUCH CONVERTER

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN102663495A (en) * 2012-02-22 2012-09-12 天津大学 Neural net data generation method for nonlinear device modeling
CN103310285A (en) * 2013-06-17 2013-09-18 同济大学 Performance prediction method applicable to dynamic scheduling for semiconductor production line
CN103745273A (en) * 2014-01-06 2014-04-23 北京化工大学 Semiconductor fabrication process multi-performance prediction method
CN105138741A (en) * 2015-08-03 2015-12-09 重庆大学 Insulated gate bipolar transistor (IGBT) model parameter calibration system and method based on neural network
CN106446310A (en) * 2015-08-06 2017-02-22 新加坡国立大学 Transistor and system modeling methods based on artificial neural network
CN106777620A * 2016-12-05 2017-05-31 天津工业大学 A neural-network space-mapping modeling method for power transistors
CN107748809A * 2017-09-20 2018-03-02 苏州芯智瑞微电子有限公司 A semiconductor device modeling method based on neural network techniques

Non-Patent Citations (2)

Title
"Artificial Neural Network Compact Model for TFTs";Quan CHEN等;《2016 7th International Conference on computer Aided-Design for Thin-Film Transitor Technologies》;20161028;第11页 *
"Artificial neural network design for compact modeling of generic transistors";Lining Zhang等;《J Comput Electron》;20170409;第1-5页 *

Also Published As

Publication number Publication date
CN109791627A (en) 2019-05-21

Similar Documents

Publication Publication Date Title
US11176447B2 (en) Semiconductor device modeling using input pre-processing and transformed targets for training a deep neural network
CN109791627B (en) Semiconductor device modeling for training deep neural networks using input preprocessing and transformation targets
US11537841B2 (en) System and method for compact neural network modeling of transistors
US20190138897A1 (en) System and method for circuit simulation based on recurrent neural networks
US7409651B2 (en) Automated migration of analog and mixed-signal VLSI design
US12223246B2 (en) Systems, methods, and computer program products for transistor compact modeling using artificial neural networks
US9898566B2 (en) Method for automated assistance to design nonlinear analog circuit with transient solver
JP2008523516A (en) Stochastic analysis process optimization for integrated circuit design and manufacturing
Zhao et al. Efficient performance modeling for automated CMOS analog circuit synthesis
US7730433B2 (en) Analog design retargeting
US7979261B2 (en) Circuit simulation model generation apparatus, circuit simulation model generation method and circuit simulation apparatus
Chavez et al. Deep learning-based IV global parameter extraction for BSIM-CMG
Yang et al. Graph-based compact model (GCM) for efficient transistor parameter extraction: A machine learning approach on 12 nm FinFETs
US20080177517A1 (en) Techniques for calculating circuit block delay and transition times including transistor gate capacitance loading effects
WO2012081158A1 (en) Circuit simulation method and semiconductor integrated circuit
Viraraghavan et al. Statistical compact model extraction: A neural network approach
Husain et al. Gaussian process regression for small-signal modelling of GaN HEMTs
JP2007310873A (en) Parameter extraction method and computer-readable storage medium having program for executing parameter extraction method
Doronina et al. Autonomic closure for turbulent flows using Approximate Bayesian Computation
Papageorgiou et al. Mosfet model parameter extraction using reinforcement learning
JPWO2006107025A1 (en) Parameter adjustment device
Zhu et al. An enhanced Neuro-Space mapping method for nonlinear microwave device modeling
Dhabak et al. Adaptive sampling algorithm for ANN-based performance modeling of nano-scale CMOS inverter
JP2023082459A (en) Model creation method and program
Avci Neural network-based design approach for submicron MOS integrated circuits

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant