
CN114359674A - A Non-Intrusive Load Identification Method Based on Metric Learning

Info

Publication number: CN114359674A
Application number: CN202210032592.XA
Authority: CN
Inventors: 于淼; 王丙楠; 陆玲霞; 赵强; 包哲静; 程卫东; 魏萍
Original assignee: Zhejiang University ZJU; Holley Technology Co Ltd
Current assignee: Zhejiang University ZJU; Holley Technology Co Ltd
Priority date: 2022-01-12
Filing date: 2022-01-12
Publication date: 2022-04-15
Anticipated expiration: 2042-01-12
Also published as: CN114359674B

Abstract

The invention provides a non-intrusive load identification method based on metric learning. The method can effectively identify unknown loads and has strong generalization capability. In addition, metric learning, as one of the few-shot learning methods, reduces the dependence on training samples and is highly practical.

Description

A Non-Intrusive Load Identification Method Based on Metric Learning

Technical Field

The present invention relates to the field of non-intrusive load monitoring (NILM), and in particular to a non-intrusive load identification method based on metric learning.

Background

With the rapid development of society, the demand for energy keeps increasing, and electric energy, as the main secondary energy source, is one of the principal forms of energy use. How to improve the efficiency of electricity use and realize intelligent power consumption has attracted widespread attention, and obtaining detailed power consumption information on the user side is particularly important. Traditional intrusive load monitoring requires acquisition and communication devices to be installed on every electrical load to detect its status, which means modifying existing appliances or wiring; this is difficult and costly to implement. Non-intrusive load monitoring instead monitors the bus and analyzes the status of each load on the line from that measurement. It is versatile and low-cost, and has become a research hotspot in recent years.

Most existing research on non-intrusive load identification is based on optimization and pattern recognition methods, but these methods have certain shortcomings. First, most of these models rely on a large number of labeled samples for training, yet in practice sufficient labeled data are often unavailable or expensive to obtain. Second, these models usually assume that all loads in the scenario are known; after a new load is added, the original model cannot identify it, and the new load may even degrade identification of the original loads. Finally, these models generalize poorly: the model often has to be retrained after the scenario changes, and training and building a separate model for each scenario greatly increases maintenance complexity and cost.

Summary of the Invention

In view of the problems of the prior art, the purpose of the present invention is to propose a non-intrusive load identification method based on metric learning. The load current is used as the feature, and a convolutional neural network maps the current feature into a new metric space. During network training, the triplet loss is used as the network loss to achieve clustering, so that identification only requires a distance measurement between load feature vectors in the metric space. The method can effectively identify unknown loads and has strong generalization ability. In addition, metric learning, as one of the few-shot learning methods, reduces the dependence on training samples and is highly practical.

The technical scheme of the present invention is as follows:

A non-intrusive load identification method based on metric learning, comprising the following steps:

Step 1, use an event detection algorithm to monitor the power data of the power bus in real time, locate the time at which a load switching event occurs, and take the difference between the steady-state data before and after that time to separate out the voltage, current, and power data of the load switching event to be identified;

Step 2, collect one complete cycle of the current data starting at the positive-going zero crossing of the voltage, and resample and normalize the collected current information;

Step 3, use the trained feature extraction network to extract features from the current, obtaining the feature vector information of the load switching event to be identified;

Step 4, check whether a feature library exists. If it does not, create the feature library, store the feature vector information and power data of the load switching event to be identified, and assign a corresponding feature number;

If it exists, compute the similarity between the feature vector of each sample in the feature library and the feature vector of the load to be identified, one by one. For each sample that satisfies the feature vector similarity condition, further compute the similarity between the power of the sample and the power of the load to be identified. If the power similarity condition is also satisfied, the load switching event to be identified and that sample belong to the same load switching event, and the feature number of that sample is used as the feature number of the event to be identified. If no sample satisfies both the feature vector similarity condition and the power similarity condition, the load switching event to be identified is a new event, and its feature vector information and power data are stored in the feature library with a newly assigned feature number.

Step 5, map the obtained feature number to an actual load switching event, where the load switching event includes the load category and the state transition information of the load.

The feature number assigned to a sample newly added to the feature library is mapped to a load switching event as follows:

When a new sample is added to the feature library, the user is notified, and the user determines the actual load category and load state transition information of the new number from the historical switching records combined with the actual usage on that day.

The historical switching records can also be used, for example, to visualize the historical operating information of each type of load.

Further, the feature extraction network consists of multiple one-dimensional convolutional layers, an activation function layer, multiple residual modules, a global average pooling layer, a linear layer, and an L2 normalization layer, connected in sequence.

Further, the triplet loss is used as the network loss when training the feature extraction network.

Further, the training samples used to train the feature extraction network treat different working states of the same appliance as separate devices.

Further, in step 4, cosine similarity is used to compute the similarity between the feature vector of each sample in the feature library and the feature vector of the load to be identified, and a difference calculation is used to compute the similarity between the power of the sample and the power of the load to be identified.

Further, when mapping feature numbers to actual load switching events, single-state loads and multi-state loads are handled differently, as follows:

A single-state load has only two states, on and off, so its state transitions are simple and only the features of that transition need to be labeled. For a multi-state load, transitions may occur between any of its states, so in addition to the load type, the change of load state must also be labeled. Only when the feature information of each transition between states is known can the transition of a multi-state load and its current state be identified accurately.

The beneficial effects of the present invention are:

First, metric learning, as one of the few-shot learning methods, lowers the number of training samples that a non-intrusive load identification model requires, so good identification performance can be obtained after training on a small number of samples. Second, load identification is achieved by matching against the feature library, and unknown loads are automatically added to the feature library for identification and classification, pending later labeling of the device information by the user. Finally, the metric learning model generalizes well: no retraining or model rebuilding is needed when migrating to a new scenario, which greatly reduces migration cost and gives the method higher practical value. In addition, while mapping the feature numbers of newly added samples to load switching events, the method also labels the current state of the appliance or load, so more detailed and more accurate user-side power consumption information can be obtained.

Brief Description of the Drawings

Figure 1 shows the current resampling and normalization process.

Figure 2 is a schematic diagram of the structure of the current feature extraction network.

Figure 3 is a schematic diagram of the triplet loss calculation.

Figure 4 is a flow chart of the overall identification procedure of the present invention.

Figure 5 is a schematic diagram of state switching for a multi-state load.

Figure 6 shows the identification results (confusion matrix) for the model trained on house6 of PLAID and tested on some of the appliances in COOLL.

Detailed Description

To verify the features and effects of the present invention, the invention is further described below using the WHITED, PLAID, and COOLL data sets.

The invention provides a non-intrusive load identification method based on metric learning. Feature vector information is obtained through a trained feature extraction network, and the feature vector information and power information of the load switching event to be identified are compared with those of the samples in a feature library. The resulting similarities are used to decide whether the event is a new event and, if it is not, to determine its event type. In this embodiment, the feature extraction network is first constructed and trained, and the method of the present invention is then described in detail using the trained network.

Constructing and training the feature extraction network:

(1) Training set construction:

The training set of the present invention can use samples from existing public data sets in the field of non-intrusive load identification. Taking the samples of house6 in the PLAID data set as an example, the training set is constructed as follows:

house6 contains six appliances: fluorescent lamp, air conditioner, refrigerator, fan, hair dryer, and laptop computer. The air conditioner and the refrigerator have 3 and 2 working states, respectively. During model training, different working states are treated as independent load types.

When training samples are collected for each load type, the number of available instances differs between load types. To keep the samples balanced, when a load type has too few samples, the synthetic minority oversampling technique (SMOTE) is used to expand them. The main process is as follows:

1) Randomly select two samples x₀ and x₁ from the same class;

2) Obtain the new sample according to the following formula:

$$x_{new} = x_0 + \mathrm{rand}(0,1)\,(x_1 - x_0)$$

where rand(0,1) is a random function that generates a random number between 0 and 1;

3) Repeat steps 1) and 2) until the specified number of samples is obtained.
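The three oversampling steps above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the fixed seed, and the list-of-lists sample format are assumptions made here. Note that, as described in the text, the two samples are picked at random within a class, whereas the canonical SMOTE algorithm interpolates towards nearest neighbours.

```python
import random


def interpolate_sample(x0, x1, rng):
    """Return x0 + r * (x1 - x0) with r drawn uniformly from [0, 1)."""
    r = rng.random()
    return [a + r * (b - a) for a, b in zip(x0, x1)]


def oversample(samples, target_count, seed=0):
    """Extend `samples` (equal-length sequences of ONE class) to `target_count` items.

    `seed` is a hypothetical parameter added here for reproducibility.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples of the class to interpolate")
    rng = random.Random(seed)
    result = [list(s) for s in samples]
    while len(result) < target_count:
        # Step 1: pick two distinct original samples of the same class.
        x0, x1 = rng.sample(samples, 2)
        # Step 2: create a synthetic sample on the segment between them.
        result.append(interpolate_sample(x0, x1, rng))
    # Step 3 is the loop condition: stop once enough samples exist.
    return result
```

Every synthetic sample lies on the straight line between two real samples of the same class, so the class balance improves without introducing waveforms far from the observed ones.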

For each sample obtained, the current information is preprocessed: it is resampled to the specified frequency, normalized, and then used as the network input. The detailed steps are:

a) To retain the current phase information, start the current acquisition at the positive-going zero crossing of the voltage;

b) Record one complete cycle of current information;

c) Resample the current information, using linear interpolation to obtain the current at the specified sampling frequency. The calculation is as follows:

$$i'_n = (\mathrm{ceil}(loc) - loc)\cdot i_{\mathrm{floor}(loc)} + (loc - \mathrm{floor}(loc))\cdot i_{\mathrm{ceil}(loc)}$$

where N₀ and N₁ are the numbers of points per cycle in the original data and in the resampled data, respectively; Ts₀ and Ts₁ are the sampling periods before and after resampling; n is the index of the nth data point after resampling; loc is the position in the original sequence that corresponds to the nth data point; ceil and floor are the round-up and round-down functions; i_ceil(loc) and i_floor(loc) are the current values of the original sequence at positions ceil(loc) and floor(loc); and i'_n is the current value of the nth resampled data point obtained by linear interpolation.

d) Normalize the resampled current, scaling its amplitude into [-1, 1], that is, apply the following operation to the current:

$$I' = \frac{I}{\max(\mathrm{abs}(I))}$$

where I' is the normalized current, I = [i'₁, i'₂, …, i'_N₁] is the resampled current sequence, and max(abs(I)) is the maximum of the absolute values of the resampled current sequence.
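Steps a) to d) can be sketched in a few lines of Python. Several details are assumptions made for this sketch and are not stated in the text: the mapping `loc = n * N0 / N1`, the function names, and the clamping of the upper index at the end of the cycle. The interpolation weights are written as `1 - frac` and `frac`; these equal the weights in the formula above whenever loc is not an integer, and they return the original sample when loc is an integer (where ceil(loc) = floor(loc) would make both printed weights zero).

```python
import math


def first_positive_zero_crossing(voltage):
    """Index of the first sample where the voltage goes from negative to non-negative."""
    for k in range(1, len(voltage)):
        if voltage[k - 1] < 0 <= voltage[k]:
            return k
    raise ValueError("no positive-going zero crossing found")


def resample_cycle(cycle, n_out):
    """Linearly interpolate one cycle of N0 points onto n_out (= N1) points."""
    n_in = len(cycle)
    out = []
    for n in range(n_out):
        loc = n * n_in / n_out          # assumed mapping: loc = n * N0 / N1
        lo = math.floor(loc)
        hi = min(lo + 1, n_in - 1)      # clamp at the last sample of the cycle
        frac = loc - lo
        out.append((1 - frac) * cycle[lo] + frac * cycle[hi])
    return out


def normalize(current):
    """Scale the sequence into [-1, 1] by dividing by max(abs(I))."""
    peak = max(abs(v) for v in current)
    if peak == 0:
        return list(current)            # an all-zero cycle is left unchanged
    return [v / peak for v in current]


def preprocess(voltage, current, points_per_cycle_in, points_per_cycle_out=128):
    """Zero-crossing alignment, one-cycle extraction, resampling, normalization."""
    start = first_positive_zero_crossing(voltage)
    cycle = current[start:start + points_per_cycle_in]
    if len(cycle) < points_per_cycle_in:
        raise ValueError("not enough samples after the zero crossing for a full cycle")
    return normalize(resample_cycle(cycle, points_per_cycle_out))
```

Starting every cycle at the same voltage phase means that a phase shift between voltage and current shows up as a shift of the current waveform inside the fixed-length input, which is how the phase information reaches the network.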

(2) Building the feature extraction network model: the feature extraction network of the present invention can be any feature extraction network. In this embodiment it is built with reference to the ResNet model. It is a metric network model built from one-dimensional convolutional residual blocks, consisting of multiple one-dimensional convolutional layers, an activation function layer, multiple residual modules, a global average pooling layer, a linear layer, and an L2 normalization layer (not shown in the figure), connected in sequence. Figure 2 shows an exemplary structure. The structure first convolves the current with 64 one-dimensional convolution kernels of size 7, using a stride of 2 for downsampling, and then passes through three residual modules in turn for further downsampling and feature extraction. The output of the global average pooling layer is then flattened and passed through the linear layer to obtain the feature vector extracted from the load current. To eliminate the influence of the feature vector magnitude on the distance calculation, the L2 normalization layer scales the feature vector output by the network to unit length so that it lies on the surface of the unit hypersphere; in this case the Euclidean distance and the cosine distance can be regarded as equivalent. To speed up convergence, a batch normalization (BN) layer is added after every convolution in the convolutional layers and residual modules to standardize the feature values. As for the activation function, the network input is the sampled current in the range [-1, 1], so the Tanh function is selected as the activation function.
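The stated equivalence between Euclidean and cosine distance on the unit hypersphere can be checked with a short derivation. The symbols u, v, and θ are introduced here for illustration only: u and v are two unit-length feature vectors and θ is the angle between them.

```latex
\|u - v\|_2^2
  = \|u\|_2^2 + \|v\|_2^2 - 2\,u^{\top} v
  = 2 - 2\cos\theta ,
\qquad \text{since } \|u\|_2 = \|v\|_2 = 1 .
```

The squared Euclidean distance is therefore a strictly decreasing function of the cosine similarity, so ranking neighbours by either measure gives the same order. As a worked value, a cosine similarity of 0.8 corresponds to a Euclidean distance of √0.4 ≈ 0.632.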

The triplet loss function is used during network training. If every possible triplet of samples were fed into the network for training, the space complexity would be O(N³). Moreover, as training progresses, more and more triplets have a loss of 0, which pulls down the mean network loss, so training slows down and eventually gets stuck in a local optimum. To solve this problem, the triplets must be filtered each time training samples are constructed: a certain number of triplets whose loss is not 0 are selected as the training sample group, as shown in Figure 3, which accelerates network training and avoids getting stuck in a local optimum.

Specifically, the triplet loss is used as the loss function for network training. The triplet loss is as follows:

$$L = \max\big(d(a,p) - d(a,n) + margin,\ 0\big)$$

where the triplet is <a, p, n>: a is called the anchor; p is the positive example, which belongs to the same class as a; and n is the negative example, which belongs to a different class from a. d is the distance metric function. The Euclidean distance is used during training (because the vectors have been normalized, a larger distance on the unit hypersphere means a larger angle between the two vectors, so this is to a certain extent equivalent to the cosine distance). margin is a constant whose purpose is to prevent the loss from being 0 when the positive and negative samples are both very close to the anchor sample.
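The loss and the filtering of zero-loss triplets described above can be sketched as below. This is an illustrative brute-force version under stated assumptions: the margin value 0.2, the function names, and the optional `limit` parameter are hypothetical, since the text gives neither a margin value nor a selection procedure beyond "triplets whose loss is not 0".

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """L = max(d(a, p) - d(a, n) + margin, 0); the margin value is an assumption."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def select_active_triplets(embeddings, labels, margin=0.2, limit=None):
    """Return (anchor, positive, negative) index triples whose loss is strictly positive."""
    active = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue                      # positive must be another sample of a's class
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue                  # negative must come from a different class
                loss = triplet_loss(embeddings[a], embeddings[p], embeddings[neg], margin)
                if loss > 0:
                    active.append((a, p, neg))
                    if limit is not None and len(active) >= limit:
                        return active
    return active
```

Only the returned triplets contribute a gradient, so averaging the loss over them keeps the training signal from being diluted by triplets that are already well separated.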

After training is completed, the trained feature extraction network is obtained and used to implement the non-intrusive load identification method based on metric learning of the present invention, which, as shown in Figure 4, comprises the following steps:

Step 1, use an event detection algorithm to monitor the power data of the power bus in real time, locate the time at which a load switching event occurs, and take the difference between the steady-state data before and after that time to separate out the voltage, current, and power data of the load switching event to be identified;

Step 2, collect one complete cycle of current information from the voltage and current data, starting at the positive-going zero crossing of the voltage, and resample and normalize the collected current information;

Step 3, use the trained feature extraction network to extract features from the current, obtaining the feature vector information of the load switching event to be identified;

Step 4, check whether a feature library exists. If it does not, create the feature library, store the feature vector information and power data of the load switching event to be identified, and assign a corresponding feature number;

If it exists, compute the similarity between the feature vector of each sample in the feature library and the feature vector of the load to be identified, one by one. For each sample that satisfies the feature vector similarity condition, further compute the similarity between the power of the sample and the power of the load to be identified. If the power similarity condition is also satisfied, the load switching event to be identified and that sample belong to the same load switching event, and the feature number of that sample is used as the feature number of the event to be identified. If no sample satisfies both the feature vector similarity condition and the power similarity condition, the load switching event to be identified is a new event, and its feature vector information and power data are stored in the feature library with a newly assigned feature number.
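Step 1 above can be illustrated with a deliberately simple sketch. The text does not name a specific event detection algorithm, so the sample-to-sample threshold rule used here is a stand-in chosen for illustration, and the function names, the 30 W threshold, and the `window` and `guard` sizes are all hypothetical.

```python
def detect_events(power, threshold=30.0):
    """Indices k where the power jumps by more than `threshold` between samples k-1 and k.

    A stand-in detector: a real deployment would likely need something more robust.
    """
    return [k for k in range(1, len(power)) if abs(power[k] - power[k - 1]) > threshold]


def steady_state_delta(signal, event_index, window=5, guard=2):
    """Mean of `window` samples after the event minus the mean of `window` samples before it.

    `guard` samples on each side of the event are skipped to avoid the transient.
    """
    start_before = event_index - guard - window
    end_after = event_index + guard + window
    if start_before < 0 or end_after > len(signal):
        raise ValueError("not enough steady-state samples around the event")
    before = signal[start_before:event_index - guard]
    after = signal[event_index + guard:end_after]
    return sum(after) / window - sum(before) / window


def waveform_delta(cycle_before, cycle_after):
    """Point-by-point difference of two phase-aligned current cycles of equal length."""
    if len(cycle_before) != len(cycle_after):
        raise ValueError("cycles must have the same number of points")
    return [b - a for a, b in zip(cycle_before, cycle_after)]
```

Applying `steady_state_delta` to the power series gives the power of the switched load, and applying `waveform_delta` to one steady-state current cycle from each side of the event, both starting at a positive-going voltage zero crossing, gives the current waveform that the later steps consume.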

The result of the similarity calculation is used to judge similarity. Calculation methods such as cosine similarity, difference calculation, or Euclidean distance can be used. Preferably, this embodiment uses cosine similarity to compute the similarity between the feature vector of each sample in the feature library and the feature vector of the load to be identified, as follows:

$$\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

where x and y are the two feature vectors and n is the length of the feature vectors. The closer the value is to 1, the more similar the two features are. In practice, the similarity condition can be set as appropriate for the situation.
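As a quick illustration of the cosine similarity defined above, the following plain-Python function computes it for two equal-length vectors. The function name and the error handling for mismatched lengths and zero vectors are choices made for this sketch.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

Because the value depends only on direction, scaling either vector by a positive constant leaves the result unchanged, which is consistent with normalizing the network output to unit length.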

A difference calculation is used to compute the similarity between the power of the sample and the power of the load to be identified, as follows:

$$\text{relative difference} = \frac{P_{max} - P_{min}}{P_{min}}$$

where P_max and P_min are, respectively, the larger and the smaller of the power of the load switching event to be identified and the power of the sample in the feature library.

In practice, the similarity condition can be set as appropriate for the situation, and different thresholds can be set for different power ranges to improve identification accuracy.
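The two-stage matching against the feature library (feature vector similarity first, then power similarity) can be sketched as follows. The library layout (a list of dictionaries), the function names, the use of strict inequalities, and the restriction to positive power values are assumptions made for this sketch; the default thresholds 0.8 and 0.2 are the values the embodiments report using.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def relative_difference(p_a, p_b):
    """(P_max - P_min) / P_min for two positive power values."""
    p_max, p_min = max(p_a, p_b), min(p_a, p_b)
    if p_min <= 0:
        raise ValueError("this sketch assumes positive power magnitudes")
    return (p_max - p_min) / p_min


def identify(library, feature, power, sim_threshold=0.8, power_threshold=0.2):
    """Return the feature number of the event, adding a new entry if nothing matches.

    `library` is a list of dicts with keys "id", "feature" and "power"; an empty
    list plays the role of a feature library that does not exist yet.
    """
    for entry in library:
        # Stage 1: feature vector similarity condition.
        if cosine_similarity(entry["feature"], feature) <= sim_threshold:
            continue
        # Stage 2: power similarity condition.
        if relative_difference(entry["power"], power) < power_threshold:
            return entry["id"]
    # No sample satisfied both conditions: register a new event.
    new_id = len(library)
    library.append({"id": new_id, "feature": list(feature), "power": power})
    return new_id
```

A usage example: starting from `library = []`, the first call to `identify` creates entry 0; a later event with a nearly parallel feature vector and a power within 20 % returns 0 again, while an event with the same waveform shape but twice the power is registered as a new entry.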

When a new sample is added to the feature library, the user is notified, and the user determines the actual load category and load state transition information of the new feature number from the historical switching records combined with the actual usage on that day. A single-state load has only two states, on and off, so its state transitions are simple and only the features of that transition need to be labeled. For a multi-state load, as shown in Figure 5, transitions may occur between any of its states, and different states have different feature information. When labeling, in addition to the load type, the change of load state can also be labeled. Knowing the feature information of each transition between states makes it possible to identify accurately both the transition of a multi-state load and its current state, so that more detailed and more accurate user-side power consumption information is obtained.
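One way to picture the labeling described above is a table from feature number to an (appliance, from-state, to-state) triple, which can then be replayed over a sequence of identified events to track each appliance's current state. The data layout, the function names, and the example appliances and states are all hypothetical and serve only as an illustration.

```python
def annotate(labels, feature_id, appliance, from_state, to_state):
    """Record the user's label for a feature number as a state transition."""
    labels[feature_id] = (appliance, from_state, to_state)


def replay(labels, feature_ids):
    """Track the current state of each appliance from a sequence of feature numbers."""
    state = {}
    for fid in feature_ids:
        if fid not in labels:
            continue                      # event not yet labeled by the user
        appliance, _from_state, to_state = labels[fid]
        state[appliance] = to_state
    return state


labels = {}
# Single-state load: only the on and off transitions need labels.
annotate(labels, 0, "lamp", "off", "on")
annotate(labels, 1, "lamp", "on", "off")
# Multi-state load: each transition between states gets its own label.
annotate(labels, 2, "air conditioner", "off", "fan")
annotate(labels, 3, "air conditioner", "fan", "cooling")
```

Several feature numbers can carry the same label, so if one appliance ends up with more than one feature number, the user can map all of them to the same transition.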

To illustrate the advantages of the method of the present invention, two examples are constructed.

In both examples, the model parameters are chosen as follows. The current is resampled to 128 points per cycle, and the feature extracted by the metric network has 16 dimensions. For the cosine similarity test, the similarity threshold is set to 0.8: two features are considered similar when their cosine similarity is greater than 0.8, and dissimilar otherwise. Likewise, for power matching, the threshold on the relative power difference is set to 0.2.

The test results are measured with two commonly used metrics, the F₁ score and accuracy (Acc), which are calculated as follows:

$$precision = \frac{TP}{TP + FP}, \qquad recall = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \cdot precision \cdot recall}{precision + recall}, \qquad Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively; precision is the precision; and recall is the recall.
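The metrics above follow directly from the four counts. The sketch below uses the standard definitions; the function names and the convention of returning 0.0 when a denominator is zero are choices made here.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(tp, tn, fp, fn):
    """Fraction of all decisions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

For a multi-class problem such as appliance identification, these counts are usually taken per class (one class against the rest) and the per-class scores are then averaged; the text does not say which averaging was used.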

Example 1:

To illustrate the general identification ability of the present invention, 80% of the samples in the WHITED data set are used as the training set to train the feature extraction network, and testing is performed on the remaining samples. The method is compared with common V-I methods: reference [1] (De Baets L, Dhaene T, Deschrijver D, et al. VI-Based Appliance Classification Using Aggregated Power Consumption Data[C]//2018 IEEE International Conference on Smart Computing (SMARTCOMP). Taormina: IEEE, 2018: 179–186.), reference [2] (Wang Ying, Yang Wei, Xiao Xianyong, Zhang Shu. Non-intrusive residential load monitoring method based on refined identification of U-I trajectory curve[J]. Power System Technology, 2021, 45(10): 4104-4113.), and reference [3] (De Baets L, Ruyssinck J, Develder C, et al. Appliance Classification Using VI Trajectories and Convolutional Neural Networks[J]. Energy and Buildings, 2018, 158: 32–36.). The F₁ score is used as the performance metric, and the results are shown in Table 1.

Table 1 Comparison of general identification ability

It can be seen that, by using current data and power data together with a one-dimensional convolutional neural network, the present invention achieves better identification ability than the existing V-I methods (including two-stage identification based on V-I trajectory and power), while having a simpler network structure and lower computational complexity.

Example 2:

To verify the generality and few-shot learning ability of the method of the present invention, only the samples of house6 in the PLAID data set are used as the training set, and some of the appliances in the COOLL data set are selected for testing.

Nine appliances are selected from the COOLL data set: air conditioner, modem, charger, travel charger, drill, fan 1, fan 2, soldering iron, and vacuum cleaner. For each appliance, 20 sets of data are collected as test samples. The confusion matrix of the identification results is shown in Figure 6.

Because of signal noise or current fluctuation during steady-state operation, some appliances are identified under more than one number. As marked in the confusion matrix, appliance 1 and appliance 5 are each identified under 2 numbers: the within-class waveform difference exceeds the threshold set in the model, so some samples are identified as new devices. Overall, however, the identification results are good; the identification accuracy for each appliance is shown in Table 2. An appliance identified under several numbers can be handled during the labeling process that maps the feature numbers of samples added to the feature library to real load switching events, by mapping those numbers to the same type of appliance. The method of the present invention is therefore robust.

Table 2 Results of the cross-data-set generalization test

综上，从实施例中可以看出，与其他基于V-I轨迹的方法相比，本发明在通用识别性能方面更优，更重要的是，本发明在跨数据集的泛化能力和小样本学习测试中表现依然出色，只需经过有限样本训练(只使用了PLAID数据集中house6中负荷作为训练集)，并且在后期可将多个识别编号映射为同一个电器来进一步提升识别效果，具有较高的实用价值，同时本发明方法在新加入特征库的样本特征编号与负荷投切事件建立映射的过程中，还标注了当前电器/负荷的状态，可以获得更多更准确的用户侧详细用电信息。To sum up, it can be seen from the examples that, compared with other methods based on V-I trajectory, the present invention is better in general recognition performance, and more importantly, the present invention has the ability to generalize across datasets and small sample learning. The performance is still excellent in the test. It only needs to be trained with limited samples (only the load in house6 in the PLAID data set is used as the training set), and in the later stage, multiple identification numbers can be mapped to the same electrical appliance to further improve the identification effect. At the same time, the method of the present invention also marks the status of the current electrical appliance/load in the process of establishing the mapping between the sample feature number newly added to the feature database and the load switching event, so that more and more accurate detailed power consumption on the user side can be obtained. information.

Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art may make other changes or modifications in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or modifications derived therefrom remain within the protection scope of the present invention.

Claims

1. A non-intrusive load identification method based on metric learning, characterized by comprising the following steps:

step 1, monitoring the power data of the power bus in real time using an event detection algorithm, locating the time at which a load switching event occurs, and taking the difference between the steady-state data before and after that time to separate out the voltage, current and power data of the switched load to be identified;

step 2, acquiring one complete cycle of the current data starting at a positive-going voltage zero crossing, and resampling and normalizing the acquired current data;

step 3, extracting features from the current using the trained feature extraction network to obtain the feature vector information of the load switching event to be identified;

step 4, checking whether a feature library exists; if not, establishing the feature library, storing the feature vector information and power data of the load switching event to be identified, and setting a corresponding feature number;

if the load switching event to be identified is the same as a sample in the feature library, using the feature number of that sample as the feature number of the load switching event to be identified; and if, for every sample in the feature library, the feature vector similarity condition or the power similarity condition is not met, treating the load switching event to be identified as a new event, storing its feature vector information and power data in the feature library, and setting a corresponding feature number;

step 5, mapping the obtained feature number to an actual load switching event, wherein the load switching event comprises the load type and load state transition information.

The feature number set for a sample newly added to the feature library is mapped to a load switching event by the following method:

when a sample has been newly added to the feature library, the user is notified, and the user determines the actual load type and the load state transition information corresponding to the newly added number according to the historical switching record and the actual usage on that day.
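For illustration, the preprocessing in step 2 might be sketched as follows. This is a sketch under stated assumptions, not the implementation of the invention: the function name `extract_cycle`, the target length of 64 points, linear interpolation as the resampling method, and peak normalization are all assumptions, since the claim does not fix them. The test signal is synthetic.

```python
import numpy as np

def extract_cycle(voltage, current, n_points=64):
    """Return one current cycle that starts at a positive-going voltage zero
    crossing, resampled to n_points and scaled by its peak absolute value."""
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    # Sample indices where the voltage goes from negative to non-negative.
    crossings = np.where((v[:-1] < 0) & (v[1:] >= 0))[0] + 1
    if len(crossings) < 2:
        raise ValueError("need two positive zero crossings for a full cycle")
    start, end = crossings[0], crossings[1]
    cycle = i[start:end]
    # Resample onto a fixed-length grid by linear interpolation (assumption).
    old_x = np.linspace(0.0, 1.0, num=len(cycle), endpoint=False)
    new_x = np.linspace(0.0, 1.0, num=n_points, endpoint=False)
    resampled = np.interp(new_x, old_x, cycle)
    # Peak normalization (assumption); the claim only says "normalization".
    peak = np.max(np.abs(resampled))
    return resampled / peak if peak > 0 else resampled

# Synthetic 50 Hz example sampled at 3 kHz (values chosen for illustration).
fs, f = 3000, 50
t = np.arange(300) / fs
v = np.sin(2 * np.pi * f * t - 1.0)
i = 2.0 * np.sin(2 * np.pi * f * t - 1.3)
cycle = extract_cycle(v, i)
```

Starting every cycle at the same voltage phase keeps the waveforms of different events aligned, so that the feature extraction network sees phase differences between voltage and current rather than arbitrary time offsets.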

2. The non-intrusive load identification method based on metric learning as claimed in claim 1, wherein the feature extraction network consists of a plurality of one-dimensional convolution layers, an activation function layer, a plurality of residual modules, a global average pooling layer, a linear layer and an L2 regularization layer, which are connected in sequence.

3. The non-intrusive load identification method based on metric learning as claimed in claim 1, wherein the feature extraction network is trained using triplet loss as the network loss.
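As a reminder of what a triplet loss computes, a minimal NumPy sketch is given below. The claim states only that triplet loss is used; the squared Euclidean distance, the margin value of 0.2 and the averaging over the batch are assumptions made for this illustration, and actual training would use an automatic differentiation framework.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over a batch of embeddings (one embedding per row).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (margin value is an assumption).
    """
    a, p, n = (np.asarray(x, dtype=float) for x in (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2, axis=1)  # squared distance anchor-positive
    d_neg = np.sum((a - n) ** 2, axis=1)  # squared distance anchor-negative
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Well-separated triplet: positive coincides with anchor, negative is far away.
easy = triplet_loss([[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]])
# Violating triplet: positive is far away, negative coincides with anchor.
hard = triplet_loss([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0]])
```

Training with such a loss pulls embeddings of the same load together and pushes embeddings of different loads apart, which is what allows a similarity threshold to separate known loads from new ones at identification time.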

4. The non-intrusive load identification method based on metric learning as claimed in claim 1, wherein, in training the feature extraction network, the training samples treat different working states of the same electrical appliance as separate devices.

5. The non-intrusive load identification method based on metric learning as claimed in claim 1, wherein in step 4, cosine similarity is used to calculate the similarity between the feature vector of each sample in the feature library and the feature vector of the load to be identified, and a difference calculation is used to calculate the similarity between the power of the sample and the power of the load to be identified.
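The feature library lookup of step 4, using the cosine similarity and power difference named in claim 5, might be sketched as follows. This is an illustration under stated assumptions, not the implementation of the invention: the function names, the dictionary layout of the library, the thresholds `sim_threshold` and `power_tolerance`, the use of a relative power difference, and the choice of the most similar sample when several match are all assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(feature, power, library, sim_threshold=0.9, power_tolerance=0.2):
    """Return the feature number of an event; add a new entry if none matches.

    `library` is a list of dicts with keys "id", "feature" and "power".
    """
    best_id, best_sim = None, -1.0
    for sample in library:
        sim = cosine_similarity(feature, sample["feature"])
        # Relative power difference (assumption; the claim says "difference").
        diff = abs(power - sample["power"]) / max(abs(sample["power"]), 1e-9)
        # Both conditions must hold for the event to match this sample.
        if sim >= sim_threshold and diff <= power_tolerance and sim > best_sim:
            best_id, best_sim = sample["id"], sim
    if best_id is not None:
        return best_id
    # No sample satisfied both conditions: treat the event as a new one.
    new_id = len(library) + 1
    library.append({"id": new_id,
                    "feature": np.asarray(feature, dtype=float),
                    "power": float(power)})
    return new_id

library = []  # an empty list stands in for "the feature library does not exist"
id_a = identify([1.0, 0.0], 100.0, library)    # first event, new number
id_b = identify([0.99, 0.05], 105.0, library)  # similar shape and power
id_c = identify([0.0, 1.0], 100.0, library)    # different shape, new number
id_d = identify([1.0, 0.0], 500.0, library)    # same shape, different power
```

Requiring both conditions means that two loads with similar current waveforms but clearly different power levels still receive different feature numbers.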