CN118836523A

CN118836523A - Central air conditioner purifying method and system based on deep reinforcement learning

Info

Publication number: CN118836523A
Application number: CN202410957025.4A
Authority: CN
Inventors: 陈玉潘; 潘涛
Original assignee: Individual
Current assignee: Individual
Priority date: 2024-07-17
Filing date: 2024-07-17
Publication date: 2024-10-25

Abstract

The present invention relates to the field of energy management technology, specifically a central air conditioning purification method and system based on deep reinforcement learning. The specific implementation method is: obtaining indoor and outdoor environmental parameters and ASHRAE data sets, and preprocessing the obtained data; using the ASHRAE data set to train a feedforward neural network, and predicting the thermal comfort of individuals in the indoor environment through the trained neural network; accurately identifying and counting the distribution of people in different areas of the building through the population density detection method, and providing key crowd density information; defining the MDP model, including state space, action space, reward function, transfer function and discount factor; through the PRE-DDPG algorithm, the system dynamically adjusts the air conditioning operation parameters according to the real-time collected environmental data and personnel needs, so as to purify the indoor air quality and improve the comfort to the greatest extent, while effectively saving energy consumption, in line with the modern building management concept of energy conservation and emission reduction.

Description

A central air conditioning purification method and system based on deep reinforcement learning

技术领域Technical Field

本发明涉及能源管理技术领域，具体为一种基于深度强化学习的中央空调净化方法及系统。The present invention relates to the technical field of energy management, and specifically to a central air-conditioning purification method and system based on deep reinforcement learning.

背景技术Background Art

随着全球城市化进程的加快和建筑能耗问题的日益突出，建筑行业对于提升能源利用效率和改善室内环境质量的需求越来越迫切。中央空调系统作为建筑能耗的重要组成部分，在维持室内舒适度和提供空气净化功能方面面临诸多挑战。With the acceleration of global urbanization and the increasing prominence of building energy consumption, the construction industry has an increasingly urgent need to improve energy efficiency and indoor environmental quality. As an important component of building energy consumption, central air-conditioning systems face many challenges in maintaining indoor comfort and providing air purification functions.

传统的中央空调系统往往采用固定的控制策略和预设的运行模式，无法灵活应对不同季节、不同天气条件和变化频繁的室内空气质量需求，导致能耗浪费和空气净化效果不佳的问题。目前市场上已经出现了一些基于模型控制的空调优化方法及系统，Ambroziak根据当前运行的执行器信息，自动优化空气处理装置的比例积分微分控制设置，以提高其能源效率和无故障运行时间(Ambroziak A.The PID controller optimisation moduleusing fuzzy self-tuning PSO for air handling unit in continuousoperation.Engineering Applications of Artificial Intelligence,2023,117:105485)。Xie采用PID控制风机速度，并利用模糊推理来调整PID参数，根据性能指标设计的目标函数通过遗传算法对模糊规则进行优化，以提高终端风机的传热效率和节能效果(XieR.GA Optimized fuzzy PID control with modified smith predictor for HVACterminal fan system,IEEE,2022:1098-1104)。Traditional central air-conditioning systems often adopt fixed control strategies and preset operation modes, which cannot flexibly respond to different seasons, different weather conditions and frequently changing indoor air quality requirements, resulting in energy waste and poor air purification effects. At present, some air-conditioning optimization methods and systems based on model control have appeared on the market. Ambroziak automatically optimizes the proportional integral differential control settings of the air handling unit according to the current operating actuator information to improve its energy efficiency and trouble-free operation time (Ambroziak A. The PID controller optimisation module using fuzzy self-tuning PSO for air handling unit in continuous operation. Engineering Applications of Artificial Intelligence, 2023, 117: 105485). Xie used PID to control the fan speed and used fuzzy reasoning to adjust the PID parameters. The objective function designed according to the performance index was optimized through genetic algorithm to improve the heat transfer efficiency and energy saving effect of the terminal fan (Xie R.GA Optimized fuzzy PID control with modified smith predictor for HVACterminal fan system, IEEE, 2022: 1098-1104).

然而，随着科技的快速进步，中央空调系统面临日益增加的复杂性和动态性，因此迫切需要引入更智能、自适应的控制手段，推动建筑节能领域不断更新。Zeng提出了一个自适应的MPC架构，定期从数据中重新学习建筑动态和未测量的内部热负荷，并对原始的非凸问题进行凸近似，以在降低能耗的同时保持室内温度(Zeng T.An Adaptive ModelPredictive Control Scheme for Energy-Efficient Control of Building HVACSystems.Journal of Engineering for Sustainable Buildings and Cities,2021,2(3):1-10)。Tang建立冷水机组的模型，采用MPC方法，在预测部分采用一阶指数平滑法处理模型简化和不准确造成的预测误差，从而提升最终的控制效果(Tang R.Model predictivecontrol for thermal energy storage and thermal comfort optimizationofbuilding demandresponse in smart grids.Applied Energy,2019,242:873-882)。尽管MPC控制效果可能更好，但是在实践中构建一个简化的且足够准确的建筑模型并不容易。这是因为室内环境受到多种因素影响，如建筑结构、建筑布局、建筑内部热量和室外环境等。当模型无法准确描述建筑热动力学，并存在较大偏差时，控制性能可能会偏离预期。However, with the rapid advancement of science and technology, central air-conditioning systems are facing increasing complexity and dynamics. Therefore, it is urgent to introduce more intelligent and adaptive control methods to promote continuous updates in the field of building energy conservation. Zeng proposed an adaptive MPC architecture that regularly relearns building dynamics and unmeasured internal heat loads from data, and convexly approximates the original non-convex problem to maintain indoor temperature while reducing energy consumption (Zeng T. An Adaptive Model Predictive Control Scheme for Energy-Efficient Control of Building HVAC Systems. Journal of Engineering for Sustainable Buildings and Cities, 2021, 2 (3): 1-10). Tang established a model of the chiller, using the MPC method, and used the first-order exponential smoothing method in the prediction part to deal with the prediction errors caused by model simplification and inaccuracy, thereby improving the final control effect (Tang R. Model predictive control for thermal energy storage and thermal comfort optimization of building demand response in smart grids. Applied Energy, 2019, 242: 873-882). Although MPC control may be more effective, it is not easy to construct a simplified and sufficiently accurate building model in practice. This is because the indoor environment is affected by many factors, such as building structure, building layout, internal building heat, and outdoor environment. When the model cannot accurately describe the building thermal dynamics and there is a large deviation, the control performance may deviate from expectations.

目前的中央空调空气净化系统及方法缺乏灵活性，控制精确度不高。为了降低建筑能耗，根据室内环境特性来提高室内空气质量和室内舒适度，提高灵活性和精确度，是解决现有相关技术中存在的不足与问题的关键之一。为此，提出一种基于深度强化学习的中央空调净化系统及方法。The current central air conditioning air purification system and method lack flexibility and have low control accuracy. In order to reduce building energy consumption, improve indoor air quality and indoor comfort according to indoor environmental characteristics, improving flexibility and accuracy is one of the keys to solving the deficiencies and problems in existing related technologies. To this end, a central air conditioning purification system and method based on deep reinforcement learning is proposed.

发明内容Summary of the invention

本发明的目的在于提供一种基于深度强化学习的中央空调净化系统及方法，通过前馈神经网络精确预测室内环境中个体的热舒适度，从而实现对中央空调精细化的控制和优化；其次，通过人口密度计算公式，能够准确地识别和统计出建筑内不同区域的人员分布情况，还能够提供关键的人流密度信息；最后，提出了PRE-DDPG算法，系统能够根据实时收集的环境数据和人员需求动态调整空调运行参数，以最大程度地提升室内空气质量和舒适性，同时有效降低建筑能耗，符合节能减排的现代化建筑管理理念。The purpose of the present invention is to provide a central air-conditioning purification system and method based on deep reinforcement learning, which accurately predicts the thermal comfort of individuals in the indoor environment through a feedforward neural network, thereby realizing refined control and optimization of the central air-conditioning; secondly, through the population density calculation formula, it can accurately identify and count the distribution of people in different areas of the building, and can also provide key crowd density information; finally, a PRE-DDPG algorithm is proposed, and the system can dynamically adjust the air-conditioning operating parameters according to the real-time collected environmental data and personnel needs, so as to maximize the indoor air quality and comfort, while effectively reducing the building energy consumption, in line with the modern building management concept of energy conservation and emission reduction.

为实现上述目的，本发明提供如下技术方案：To achieve the above object, the present invention provides the following technical solutions:

一种基于深度强化学习的中央空调净化方法，其包括：A central air conditioning purification method based on deep reinforcement learning, comprising:

步骤S1、数据收集与整理，获取室内外环境参数和ASHRAE数据集，对所述室内外环境参数和ASHRAE数据集进行数据清洗、缺失值处理和数据归一化，将所述室内外环境参数和ASHRAE数据集分成训练集和测试集；Step S1, data collection and collation, obtaining indoor and outdoor environmental parameters and ASHRAE data sets, performing data cleaning, missing value processing and data normalization on the indoor and outdoor environmental parameters and ASHRAE data sets, and dividing the indoor and outdoor environmental parameters and ASHRAE data sets into a training set and a test set;

步骤S2、前馈神经网络预测PMV值，使用所述ASHRAE数据集的训练集离线训练所述前馈神经网络；通过迭代更新和多次训练，将所述ASHRAE数据集的测试集导入优化后的PRE-DDPG模型，得到预测准确的前馈神经网络模型；将所述室内外环境参数中的温度、湿度和风速输入至所述前馈神经网络，实时预测所述PMV值；Step S2, predicting the PMV value by a feedforward neural network, using the training set of the ASHRAE data set to train the feedforward neural network offline; importing the test set of the ASHRAE data set into the optimized PRE-DDPG model through iterative updating and multiple trainings to obtain an accurate feedforward neural network model; inputting the temperature, humidity and wind speed in the indoor and outdoor environmental parameters into the feedforward neural network to predict the PMV value in real time;

步骤S3、人口密度检测方法，通过人口密度检测公式计算和识别建筑内不同区域的人员分布情况，提供关键的人流密度信息；Step S3, a population density detection method, which calculates and identifies the distribution of people in different areas of the building through a population density detection formula, and provides key crowd density information;

步骤S4、设计MDP模型，包括状态空间、动作空间、奖励函数、转移函数和折扣因子，将所述PMV值输入到所述状态空间；Step S4, designing an MDP model, including a state space, an action space, a reward function, a transfer function and a discount factor, and inputting the PMV value into the state space;

步骤S5、运行PRE-DDPG算法，在所述MDP模型上利用所述人口密度检测方法和所述室内外环境参数的训练集，通过迭代更新和多次训练，动态调整中央空调运行参数，得到所述优化后的PRE-DDPG模型；导入所述室内外环境参数的测试集，得到建筑室内中央空调净化的最优调控策略。Step S5, running the PRE-DDPG algorithm, using the population density detection method and the training set of the indoor and outdoor environmental parameters on the MDP model, dynamically adjusting the central air-conditioning operating parameters through iterative updates and multiple trainings, and obtaining the optimized PRE-DDPG model; importing the test set of the indoor and outdoor environmental parameters to obtain the optimal control strategy for indoor central air-conditioning purification of the building.

进一步，所述步骤S1包括：Further, the step S1 comprises:

通过安装在建筑内的传感器，实时采集所述室内外环境参数，包括不同区域温度、湿度、CO₂浓度、空气供给速率、人数和时间，从所述ASHRAE数据集中获取包含环境参数和舒适度指标的数据；The indoor and outdoor environmental parameters are collected in real time through sensors installed in the building, including temperature, humidity, _CO2 concentration, air supply rate, number of people and time in different areas, and data including environmental parameters and comfort indicators are obtained from the ASHRAE data set;

进行所述数据清洗，使用均值填补法处理缺失值，确定数据集中缺失哪些值，对于每个有所述缺失值的特征，计算该特征的非缺失值的均值，并将所述缺失值替换为所述特征的均值；Perform the data cleaning, use the mean imputation method to process missing values, determine which values are missing in the data set, for each feature with the missing values, calculate the mean of the non-missing values of the feature, and replace the missing values with the mean of the feature;

使用三倍标准差法检测并处理数据中的异常值，如果所述特征的值超出所述均值三倍范围，则将其标记为异常值，并将所述异常替换为所述特征的均值；Use the triple standard deviation method to detect and process outliers in the data. If the value of the feature exceeds three times the mean, it is marked as an outlier and the anomaly is replaced with the mean of the feature.

将所有数据进行归一化，所述归一化公式为：All data are normalized, and the normalization formula is:

其中X是原始数据，X_norm是归一化后的值，X_min和X_max分别是特征的最小值和最大值。Where X is the original data, X _norm is the normalized value, X _min and X _max are the minimum and maximum values of the feature respectively.

进一步，所述步骤S2包括：Further, the step S2 comprises:

确定输入层的维度，即网络接收的特征数量，包括温度、湿度和风速；Determine the dimensionality of the input layer, i.e. the number of features the network receives, including temperature, humidity, and wind speed;

决定隐藏层的数量和每层的神经元数量；Determine the number of hidden layers and the number of neurons in each layer;

确定输出层的结构，回归任务只需要一个神经元来输出预测值；Determine the structure of the output layer. The regression task only requires one neuron to output the predicted value.

选择隐藏层激活函数为ReLU；Select the hidden layer activation function as ReLU;

选择损失函数为均方误差，用于衡量预测值与真实值之间的差距，选择Adam优化器用于优化网络的权重和偏置；Select the mean square error as the loss function to measure the gap between the predicted value and the true value, and select the Adam optimizer to optimize the weights and biases of the network;

使用所述ASHRAE数据中的训练集来训练所述前馈神经网络，根据性能调整模型的超参数，包括学习率、批大小和隐藏层的神经元数量；Using the training set in the ASHRAE data to train the feedforward neural network, adjusting the hyperparameters of the model according to the performance, including the learning rate, batch size, and the number of neurons in the hidden layer;

监控所述前馈神经网络的性能，检验所述前馈神经网络是否过拟合和欠拟合；Monitoring the performance of the feedforward neural network to check whether the feedforward neural network is overfitting or underfitting;

若发生所述过拟合，则减少神经网络的层数和神经元数量，若发生所述欠拟合，则增加训练时长，即增加训练轮数，若未发生所述过拟合和欠拟合，则不需要进一步调整；If the overfitting occurs, the number of layers and neurons of the neural network is reduced; if the underfitting occurs, the training time is increased, that is, the number of training rounds is increased; if the overfitting and underfitting do not occur, no further adjustment is required;

载入所述训练好的前馈神经网络，输入所述ASHRAE数据中的训练集，计算预测所述PMV值的准确率；Loading the trained feedforward neural network, inputting the training set in the ASHRAE data, and calculating the accuracy of predicting the PMV value;

将所述室内外环境参数中的温度、湿度和风速输入至所述前馈神经网络，实时预测所述PMV值。The temperature, humidity and wind speed among the indoor and outdoor environmental parameters are input into the feedforward neural network to predict the PMV value in real time.

进一步，所述步骤S3包括：Further, the step S3 comprises:

每间隔Δt时刻，获取所述室内外环境中的室内与室外CO₂浓度，并计算室内CO₂浓度的增加量ΔC，表示为：At each interval Δt, the indoor and outdoor _CO2 concentrations in the indoor and outdoor environment are obtained, and the increase in indoor _CO2 concentration ΔC is calculated, which is expressed as:

ΔC＝C-C₀；ΔC＝CC ₀ ;

其中，所述室内CO₂浓度为C，所述室外CO₂浓度为C₀；Wherein, the indoor CO ₂ concentration is C, and the outdoor CO ₂ concentration is C ₀ ;

所述CO₂浓度增加量ΔC，也表示为：The _CO2 concentration increase ΔC is also expressed as:

其中，I是人均CO₂产生量，K是人口密度，V是室内体积；Among them, I is the per capita CO ₂ production, K is the population density, and V is the indoor volume;

通过以上公式推出，人口密度公式K为：According to the above formula, the population density formula K is:

当所述人口密度超出最高人口阈值M时，发出警告。When the population density exceeds a maximum population threshold M, a warning is issued.

进一步，所述步骤S4中，设计强化学习MDP模型，所述MDP模型建模过程是根据目标问题，即净化室内空气质量和提升舒适性，同时有效节约能源消耗，将读取连续的状态变量和动作变量，使智能体通过不断与环境进行交互，利用不同状态下采取对应动作得到奖励值，经过多次迭代掌握怎么得到最高奖励的方案，表示为：Furthermore, in step S4, a reinforcement learning MDP model is designed. The modeling process of the MDP model is based on the target problem, that is, purifying indoor air quality and improving comfort, while effectively saving energy consumption. The continuous state variables and action variables are read, so that the intelligent agent can continuously interact with the environment and take corresponding actions in different states to obtain reward values. After multiple iterations, the solution of how to obtain the highest reward is mastered, which is expressed as:

其中，Φ为MDP模型，S为所述状态空间，A为智能体可采取的所述动作空间，P为所述转移函数，R为所述奖励函数，γ为所述折扣因子。Among them, Φ is the MDP model, S is the state space, A is the action space that the agent can take, P is the transfer function, R is the reward function, and γ is the discount factor.

进一步，所述状态空间表示为：Further, the state space is expressed as:

其中，S_t是t时刻状态空间，Temp_out,t是t时刻室外温度，是t时刻z区域室内温度，Hum_out,t是t时刻室外湿度，是t时刻z区域室内湿度，C_out,t是t时刻的CO₂浓度，是t时刻z区域的CO₂浓度，PMV_t ^z是t时刻热舒适预测值，是t时刻人数。Among them, S _t is the state space at time t, Temp _out,t is the outdoor temperature at time t, is the indoor temperature of area z at time t, Hum _out,t is the outdoor humidity at time t, is the indoor humidity in area z at time t, C _out,t is the CO ₂ concentration at time t, is the CO ₂ concentration in area z at time t, PMV _t ^z is the predicted value of thermal comfort at time t, is the number of people at time t.

进一步，所述动作空间表示为：Further, the action space is expressed as:

其中，A_t为t时刻动作空间，表示t时刻z区域的空气供给速率，每个区域的设定点是一个连续变量。Among them, A _t is the action space at time t, represents the air supply rate of zone z at time t, and the set point of each zone is a continuous variable.

进一步，所述奖励函数表示为：Furthermore, the reward function is expressed as:

其中，R_t为t时刻奖励函数，为t时刻z区域热舒适度奖励，R_energy,t为t时刻能耗奖励，为t时刻z区域空气质量奖励，a、b和d为权衡因子，n为区域总量；Among them, R _t is the reward function at time t, is the thermal comfort reward of area z at time t, R _energy,t is the energy consumption reward at time t, is the air quality reward for area z at time t, a, b and d are the weighting factors, and n is the total amount of the area;

所述热舒适度奖励表示为：The thermal comfort reward is expressed as:

其中，为t时刻z区域的热舒适度奖励，PMV_t ^z为t时刻z区域的PMV值，PMV_penalty为超出阈值M时的惩罚项，为t时刻z区域的人口密度，当检测到室内人口密度高于所述阈值M且高于最高舒适度阈值c时，则奖励为当前PMV值的绝对值和额外的惩罚项，当室内为其他情况时，奖励为当前PMV值的绝对值；in, is the thermal comfort reward of area z at time t, PMV _t ^z is the PMV value of area z at time t, and PMV _penalty is the penalty item when the threshold M is exceeded. is the population density of area z at time t. When it is detected that the indoor population density is higher than the threshold M and higher than the maximum comfort threshold c, the reward is the absolute value of the current PMV value and the additional penalty term. When the indoor situation is other, the reward is the absolute value of the current PMV value.

所述空气质量奖励表示为：The air quality reward is expressed as:

其中，为t时刻z区域的空气质量奖励，为t时刻z区域的CO₂浓度，C_high为CO₂浓度最高阈值，C_penalty为超出所述阈值C_high时的额外惩罚项，当检测到室内CO₂浓度超出固定标准时，则奖励为C_penalty。in, is the air quality reward for area z at time t, is the CO ₂ concentration in area z at time t, C _high is the highest threshold of CO ₂ concentration, C _penalty is the additional penalty item when the threshold C _high is exceeded, and when it is detected that the indoor CO ₂ concentration exceeds the fixed standard, the reward is C _penalty .

进一步，所述步骤S5中，PRE-DDPG算法运行，包括：Further, in step S5, the PRE-DDPG algorithm runs, including:

步骤S5-1、智能体观察到所述室内环境状态s_t；Step S5-1, the agent observes the indoor environment state s _t ;

步骤S5-2、智能体根据所述室内环境状态预测当前各个区域室内的PMV值；Step S5-2, the agent predicts the current PMV value of each area indoor according to the indoor environmental state;

步骤S5-3、智能体计算当前各个区域的人口密度 Step S5-3: The agent calculates the current population density of each area

步骤S5-4、智能体根据当前策略和探索噪声选择各个区域的控制动作即t时刻z区域的中央空调系统中的空气供给速率；Step S5-4: The agent selects control actions for each area based on the current strategy and exploration noise That is, the air supply rate in the central air conditioning system in area z at time t;

步骤S5-5、智能体在环境中执行所述动作计算所述奖励函数r_t，并依靠PRE-DDPG模型观察到下一状态s_t′；Step S5-5: The agent performs the action in the environment Calculate the reward function r _t and observe the next state s _t ′ based on the PRE-DDPG model;

步骤S5-6、将存储到经验回放池D，并令s_t＝s_t′；Step S5-6: Store it in the experience replay pool D, and let s _t = s _t ′;

步骤S5-7、如果经验回放池D已满，从D中随机采样由N个transition组成的小批量，其中j为样本号；Step S5-7: If the experience replay pool D is full, randomly sample N transitions from D. A small batch consisting of, where j is the sample number;

步骤S5-8、通过最小化损失函数更新Critic网络，使用策略梯度更新Actor网络；Step S5-8, update the Critic network by minimizing the loss function, and update the Actor network using the policy gradient;

步骤S5-9、使用软更新来更新目标Actor网络和目标Critic网络；Step S5-9, using soft update to update the target Actor network and the target Critic network;

步骤S5-10、若所述PRE-DDPG学习过程效果不好，可结合模型运行情况不断调整网络参数和折扣因子，使整个学习效果实现更好地收敛；Step S5-10: If the PRE-DDPG learning process is not effective, the network parameters and discount factors can be continuously adjusted based on the model operation status to achieve better convergence of the entire learning effect;

步骤S5-11、重复上述步骤S5-1至步骤S5-10共n次，直到学习得出最大累计奖励值的最优策略，即最优调控运行策略。Step S5-11, repeat the above steps S5-1 to S5-10 for a total of n times until the optimal strategy with the maximum cumulative reward value is learned, that is, the optimal control operation strategy.

步骤S5-12、将所述室内外环境参数的测试集导入已训练的PRE-DDPG模型，得到建筑室内中央空调净化的最优调控策略。Step S5-12: import the test set of indoor and outdoor environmental parameters into the trained PRE-DDPG model to obtain the optimal control strategy for indoor central air-conditioning purification of the building.

进一步，一种基于深度强化学习的中央空调净化系统，包括：Furthermore, a central air conditioning purification system based on deep reinforcement learning includes:

数据采集与处理单元，用于获取室内外环境参数和ASHRAE数据集，对所述室内外环境参数和ASHRAE数据集进行数据清洗、缺失值处理和数据归一化，将所述室内外环境参数和ASHRAE数据集分成训练集和测试集；A data acquisition and processing unit, used for acquiring indoor and outdoor environmental parameters and ASHRAE data sets, performing data cleaning, missing value processing and data normalization on the indoor and outdoor environmental parameters and ASHRAE data sets, and dividing the indoor and outdoor environmental parameters and ASHRAE data sets into a training set and a test set;

前馈神经网络单元，使用所述ASHRAE数据集的训练集离线训练所述前馈神经网络；训练完成后，将所述ASHRAE数据集的测试集导入已训练的PRE-DDPG模型，通过迭代更新和多次训练，得到预测准确的前馈神经网络模型；将所述室内外环境参数中的温度、湿度和风速输入至所述前馈神经网络，实时预测PMV值；A feedforward neural network unit is used to train the feedforward neural network offline using the training set of the ASHRAE data set; after the training is completed, the test set of the ASHRAE data set is imported into the trained PRE-DDPG model, and an accurate feedforward neural network model is obtained through iterative updates and multiple trainings; the temperature, humidity and wind speed in the indoor and outdoor environmental parameters are input into the feedforward neural network to predict the PMV value in real time;

人口密度检测单元，用于人口密度检测公式计算和识别建筑内不同区域的人员分布情况，提供关键的人流密度信息；Population density detection unit, which is used to calculate the population density detection formula and identify the distribution of people in different areas of the building, and provide key crowd density information;

深度强化学习单元，用于涉及MDP模型，包括状态空间、动作空间、奖励函数、转移函数和折扣因子，并将所述PMV值输入到所述状态空间；A deep reinforcement learning unit, for involving an MDP model, including a state space, an action space, a reward function, a transfer function, and a discount factor, and inputting the PMV value into the state space;

中央空调控制单元，用于运行PRE-DDPG算法，将所述PRE-DDPG算法在所述MDP模型上运行，使用所述人口密度检测方法，通过迭代更新和多次训练，利用所述室内外环境参数的训练集动态调整空调运行参数，得到所述已训练的PRE-DDPG模型；将所述室内外环境参数的测试集导入所述已训练的PRE-DDPG模型，得到建筑室内中央空调净化的最优调控策略。The central air-conditioning control unit is used to run the PRE-DDPG algorithm, run the PRE-DDPG algorithm on the MDP model, use the population density detection method, through iterative update and multiple training, use the training set of indoor and outdoor environmental parameters to dynamically adjust the air-conditioning operation parameters to obtain the trained PRE-DDPG model; import the test set of indoor and outdoor environmental parameters into the trained PRE-DDPG model to obtain the optimal control strategy for indoor central air-conditioning purification of the building.

进一步，所述人口密度检测单元根据所述数据采集与处理单元的结果响应，并应用于所述深度强化学习单元内，为所述中央空调控制单元提供最优决策方法。Furthermore, the population density detection unit responds according to the result of the data acquisition and processing unit and is applied to the deep reinforcement learning unit to provide an optimal decision-making method for the central air conditioning control unit.

与现有技术相比，本发明的有益效果为：Compared with the prior art, the present invention has the following beneficial effects:

1、本发明借助ASHRAE数据集训练前馈神经网络模型，通过实时环境数据预测不同区域的PMV值，使中央空调能够根据个体需求和不同区域的实时环境变化，个性化地调整策略，实现了更灵活性和精确化控制。1. The present invention uses the ASHRAE data set to train a feedforward neural network model and predicts the PMV values of different areas through real-time environmental data, so that the central air conditioner can adjust the strategy in a personalized manner according to individual needs and real-time environmental changes in different areas, thereby achieving more flexible and precise control.

2、本发明设计了一种人口密度检测方法，通过精确的人口密度检测和实时数据分析，有效管理和优化建筑内部的人流密度。在突发事件或人群聚集情况下，系统能够迅速做出响应，调整空调运行参数以确保安全和舒适，提升了中央空调控制的精确性与灵活性。2. The present invention designs a population density detection method, which effectively manages and optimizes the density of people flow inside the building through accurate population density detection and real-time data analysis. In the event of an emergency or crowd gathering, the system can respond quickly and adjust the air conditioning operating parameters to ensure safety and comfort, thereby improving the accuracy and flexibility of central air conditioning control.

3、本发明引入了一种基于深度强化学习的方法，设计了一种PRE-DDPG算法，该算法无需依赖精确的建筑模型，通过智能化的空调系统控制和优化，能够有效提高建筑的空气质量和热舒适性，降低能耗成本，实现更加灵活和精确性控制。3. This paper introduces a method based on deep reinforcement learning and designs a PRE-DDPG algorithm. This algorithm does not rely on an accurate building model. Through intelligent air-conditioning system control and optimization, it can effectively improve the air quality and thermal comfort of the building, reduce energy consumption costs, and achieve more flexible and precise control.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本发明实施例提供一种基于深度强化学习的中央空调净化方法示意图；FIG1 is a schematic diagram of a central air conditioning purification method based on deep reinforcement learning provided by an embodiment of the present invention;

图2为本发明实施例提供的多区域商业建筑中央空调净化系统示意图；FIG2 is a schematic diagram of a central air conditioning purification system for a multi-zone commercial building provided by an embodiment of the present invention;

图3为本发明实施例提供的前馈神经网络预测室内热舒适值的示意图；FIG3 is a schematic diagram of a feedforward neural network for predicting indoor thermal comfort values according to an embodiment of the present invention;

图4为本发明实施例提供的PRE-DDPG算法的网络结构示意图；FIG4 is a schematic diagram of a network structure of a PRE-DDPG algorithm provided in an embodiment of the present invention;

图5为本发明实施例提供一种基于深度强化学习的中央空调净化系统示意图。FIG5 is a schematic diagram of a central air conditioning purification system based on deep reinforcement learning provided in an embodiment of the present invention.

具体实施方式DETAILED DESCRIPTION

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The following will be combined with the drawings in the embodiments of the present invention to clearly and completely describe the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present invention.

目前的中央空调空气净化系统及方法缺乏灵活性，精确度不高。因此，为了降低建筑能耗，根据室内环境特性来提高室内空气质量和室内舒适度，是解决现有相关技术中存在的不足与问题的关键之一。The current central air conditioning air purification system and method lack flexibility and are not very accurate. Therefore, in order to reduce building energy consumption, improving indoor air quality and indoor comfort according to indoor environmental characteristics is one of the keys to solving the deficiencies and problems in existing related technologies.

实施例一Embodiment 1

参照图1所示，本发明提供一种基于深度强化学习的中央空调净化方法，技术方案如下：As shown in FIG1 , the present invention provides a central air conditioning purification method based on deep reinforcement learning, and the technical solution is as follows:

步骤S1、数据收集与整理，获取室内外环境参数和ASHRAE数据集，对所述室内外环境参数和ASHRAE数据集进行数据清洗、缺失值处理和数据归一化，将所述室内外环境参数和ASHRAE数据集分成训练集和测试集。Step S1, data collection and organization, obtaining indoor and outdoor environmental parameters and ASHRAE data sets, performing data cleaning, missing value processing and data normalization on the indoor and outdoor environmental parameters and ASHRAE data sets, and dividing the indoor and outdoor environmental parameters and ASHRAE data sets into a training set and a test set.

具体的，在分析室内空气环境和人体热舒适的影响因素时，需要识别便于实现的控制变量和被控制变量。值得注意的是，空气供给速率的调整也会直接影响能耗。因此，在优化室内环境和热舒适度时，必须同时考虑能耗效率。Specifically, when analyzing the factors affecting the indoor air environment and human thermal comfort, it is necessary to identify the control variables and controlled variables that are easy to implement. It is worth noting that the adjustment of the air supply rate will also directly affect energy consumption. Therefore, when optimizing the indoor environment and thermal comfort, energy efficiency must be considered at the same time.

参照图2所示，假设某商业建筑带有4个区域，中央空调包括气处理单元和各区域的可变风量箱。空气处理机组由风阀、冷却/加热盘管、变频驱动送风机组成。冷却/加热盘管可以冷却/加热混合空气，而变频驱动送风机将经过冷却/加热后的混合空气输送至各区域的可变风量箱中，通过空气处理单元输送空气并排出室内空气来净化室内空气质量，为简化控制操作且更明显体现控制效果，仅控制不同区域的空气供给速率。As shown in Figure 2, suppose a commercial building has 4 areas, and the central air conditioner includes an air handling unit and a variable air volume box in each area. The air handling unit consists of a damper, a cooling/heating coil, and a variable frequency drive blower. The cooling/heating coil can cool/heat the mixed air, and the variable frequency drive blower delivers the cooled/heated mixed air to the variable air volume box in each area, and delivers the air through the air handling unit and exhausts the indoor air to purify the indoor air quality. In order to simplify the control operation and more clearly reflect the control effect, only the air supply rate of different areas is controlled.

不同区域的温度、湿度、CO₂浓度和人数会随时间变化而变化，这些参数易于采集，因此可作为室内环境的关键参数。通过安装在建筑内的传感器，实时采集所述室内外环境参数，可以有效作为室内热环境的参考因素。The temperature, humidity, _CO2 concentration and number of people in different areas will change over time. These parameters are easy to collect and can therefore be used as key parameters of the indoor environment. By installing sensors in the building, the indoor and outdoor environmental parameters can be collected in real time and effectively used as reference factors for the indoor thermal environment.

ASHRAE数据集包含了详细且广泛的环境参数和舒适度指标，从ASHRAE数据集中获取包含环境参数和舒适度指标的数据，包括但不限于温度、湿度、风速和舒适度指标PMV值。The ASHRAE dataset contains detailed and extensive environmental parameters and comfort indicators. Data containing environmental parameters and comfort indicators are obtained from the ASHRAE dataset, including but not limited to temperature, humidity, wind speed, and comfort indicator PMV value.

进行所述数据清洗，另外数据中的缺失值可能会影响后续分析和建模的结果，因此需要对缺失值进行处理以保持数据的完整性和可用性，使用均值填补法处理缺失值，确定所述ASHRAE数据集和室内外环境参数中缺失哪些值，对于每个有所述缺失值的特征，计算该特征的非缺失值的均值，并将所述缺失值替换为所述特征的均值。The data cleaning is performed. In addition, missing values in the data may affect the results of subsequent analysis and modeling. Therefore, the missing values need to be processed to maintain the integrity and availability of the data. The mean imputation method is used to process the missing values, determine which values are missing in the ASHRAE data set and indoor and outdoor environmental parameters, and for each feature with the missing values, calculate the mean of the non-missing values of the feature, and replace the missing values with the mean of the feature.

使用三倍标准差法检测并处理数据中的异常值，如果所述特征的值超出所述均值三倍范围，则将其标记为异常值，并将所述异常替换为所述特征的均值。The three-times standard deviation method is used to detect and process outliers in the data. If the value of the feature is beyond three times the mean, it is marked as an outlier and the anomaly is replaced with the mean of the feature.

将所有数据进行归一化，以避免特征值差异过大导致的训练不稳定问题。所述归一化公式为：All data are normalized to avoid training instability caused by large differences in eigenvalues. The normalization formula is:

将预处理好的ASHRAE数据集和室内外环境参数，80％划分成训练集，20％划分成测试集，通过全面的数据预处理，提升了数据的质量和可靠性，为后续的建模和优化提供了坚实的基础，确保了模型能够更精准地预测和调控室内环境。The preprocessed ASHRAE data set and indoor and outdoor environmental parameters were divided into 80% training sets and 20% test sets. Through comprehensive data preprocessing, the quality and reliability of the data were improved, providing a solid foundation for subsequent modeling and optimization, and ensuring that the model can more accurately predict and control the indoor environment.

步骤S2、前馈神经网络预测PMV值，使用所述ASHRAE数据集的训练集离线训练所述前馈神经网络；通过迭代更新和多次训练，将所述ASHRAE数据集的测试集导入已训练的PRE-DDPG模型，得到预测准确的前馈神经网络模型；将所述室内外环境参数中的温度、湿度和风速输入至所述前馈神经网络，实时预测所述PMV值。Step S2, a feedforward neural network predicts the PMV value, using the training set of the ASHRAE data set to train the feedforward neural network offline; through iterative updates and multiple trainings, the test set of the ASHRAE data set is imported into the trained PRE-DDPG model to obtain an accurate feedforward neural network model; the temperature, humidity and wind speed in the indoor and outdoor environmental parameters are input into the feedforward neural network to predict the PMV value in real time.

具体的，风速对应于空气供给速率，而PMV值是一种评估人体热舒适度的指标，综合考虑了影响人体的多种因素。除了温度、湿度和风速外，PMV还包括辐射温度、活动水平和个体代谢率。温度、湿度和风速是影响人体热舒适度直接因素，对人体的舒适度影响较大，因此是PMV模型中重要的考虑因素。由于其他因素(如辐射温度、活动水平和个体代谢率)不易获取，暂时不在考虑范围之内。Specifically, wind speed corresponds to the air supply rate, and the PMV value is an indicator for evaluating human thermal comfort, which comprehensively considers multiple factors that affect the human body. In addition to temperature, humidity and wind speed, PMV also includes radiant temperature, activity level and individual metabolic rate. Temperature, humidity and wind speed are direct factors affecting human thermal comfort and have a great impact on human comfort, so they are important considerations in the PMV model. Since other factors (such as radiant temperature, activity level and individual metabolic rate) are not easy to obtain, they are not considered for the time being.

前馈神经网络具有强大的非线性建模能力，能够有效地捕捉这些复杂的关系，参照图3所示，输入层的维度，包括温度、湿度和风速，共3个特征。隐藏层的数量为2，每层包含20个神经元，以增强网络对非线性关系的学习能力，如果学习效果不好，可以随时后期进行调整。由于是回归任务，输出层只需一个神经元来输出预测的PMV值，这个值直接反映了当前环境条件对人体的舒适程度的预测。选择隐藏层激活函数为ReLU，能有效处理非线性关系并缓解梯度消失问题，使网络能够更好地学习复杂的特征。选择损失函数为均方误差用于衡量预测值与真实值之间的差距，选择Adam优化器用于优化网络的权重和偏置。The feedforward neural network has a strong nonlinear modeling capability and can effectively capture these complex relationships. As shown in Figure 3, the dimensions of the input layer include temperature, humidity, and wind speed, a total of 3 features. The number of hidden layers is 2, and each layer contains 20 neurons to enhance the network's ability to learn nonlinear relationships. If the learning effect is not good, it can be adjusted at any time. Since it is a regression task, the output layer only needs one neuron to output the predicted PMV value, which directly reflects the prediction of the comfort level of the human body under the current environmental conditions. The hidden layer activation function is selected as ReLU, which can effectively handle nonlinear relationships and alleviate the gradient disappearance problem, so that the network can better learn complex features. The loss function is selected as the mean square error to measure the gap between the predicted value and the true value, and the Adam optimizer is selected to optimize the weights and biases of the network.

设置学习率为0.001，批大小为64，训练500轮后，监控所述前馈神经网络在训练集上的表现，适时调整学习率、批大小和隐藏层神经元数量等超参数，以确保模型在实际应用中具有良好的泛化能力，避免过拟合或欠拟合的问题。The learning rate is set to 0.001 and the batch size is set to 64. After 500 rounds of training, the performance of the feedforward neural network on the training set is monitored, and hyperparameters such as the learning rate, batch size, and the number of neurons in the hidden layer are adjusted in a timely manner to ensure that the model has good generalization ability in practical applications and avoid overfitting or underfitting problems.

载入训练好的前馈神经网络，输入所述ASHRAE数据中的训练集，计算预测PMV的准确率，通过所述前馈神经网络，实现了人体热舒适度的精确预测，为中央空调系统的智能化控制提供了坚实的数据和模型基础。The trained feedforward neural network is loaded, the training set in the ASHRAE data is input, and the accuracy of the predicted PMV is calculated. Through the feedforward neural network, the accurate prediction of human thermal comfort is achieved, providing a solid data and model foundation for the intelligent control of the central air-conditioning system.

步骤S3、人口密度检测方法，通过人口密度检测公式计算和识别建筑内4个区域的人员分布情况，提供关键的人流密度信息。Step S3, a population density detection method, calculates and identifies the distribution of people in four areas of the building through a population density detection formula, and provides key crowd density information.

具体的，为了模拟这一过程，采用了离散的时间步方法，以每半小时为一个时间单位，用t＝0,1,2,...表示。每间隔半小时，测量室内与室外CO₂浓度，计算室内CO₂浓度的增加量ΔC：Specifically, in order to simulate this process, a discrete time step method is used, with half an hour as a time unit, represented by t = 0, 1, 2, .... Every half an hour, the indoor and outdoor _CO2 concentrations are measured, and the increase in indoor _CO2 concentration ΔC is calculated:

ΔC＝C-C₀；ΔC＝CC ₀ ;

其中，所述室内CO₂浓度为C，室外CO₂浓度为C₀；Wherein, the indoor CO ₂ concentration is C, and the outdoor CO ₂ concentration is C ₀ ;

CO₂浓度增加量为：The increase in CO ₂ concentration is:

通过以上公式可以推出，人口密度公式为：From the above formula, we can deduce that the population density formula is:

其中，K为所述人口密度，当所述人口密度超出最高人口阈值M时，发出警告。通过精确的人口密度检测和实时动态调整，提高中央空调系统的运行灵活性和精确性，以确保室内环境的舒适度和健康性。Wherein, K is the population density, and a warning is issued when the population density exceeds the maximum population threshold M. Through accurate population density detection and real-time dynamic adjustment, the operation flexibility and accuracy of the central air-conditioning system are improved to ensure the comfort and health of the indoor environment.

步骤S4、以所述商业建筑为环境，以PRE-DDPG算法为智能体，设计MDP模型，借助前馈神经网络预测PMV值，并输入至状态空间内，智能体在环境中的交互，通过学习过程不断优化其策略，以达到最大化累积奖励的目标。Step S4: Taking the commercial building as the environment and the PRE-DDPG algorithm as the agent, the MDP model is designed, the PMV value is predicted with the help of a feedforward neural network, and the PMV value is input into the state space. The agent interacts in the environment and continuously optimizes its strategy through the learning process to achieve the goal of maximizing the cumulative reward.

具体的，构建的MDP模型，表示为：Specifically, the constructed MDP model is expressed as:

其中，Φ为MDP模型，S表示所述状态空间，指的是智能体可以感知和处理的环境状态的集合；A表示智能体可采取的所述动作空间，指的是智能体可以执行的操作；P表示所述转移函数，指的是系统从一个状态转移到另一个状态的概率分布；R表示所述奖励函数，指的是智能体在环境中采取某个动作后所获得的即时反馈；γ表示所述折扣因子，用于衡量智能体对未来奖励的重视程度。Among them, Φ is the MDP model, S represents the state space, which refers to the set of environmental states that the agent can perceive and process; A represents the action space that the agent can take, which refers to the operations that the agent can perform; P represents the transfer function, which refers to the probability distribution of the system transferring from one state to another; R represents the reward function, which refers to the immediate feedback obtained after the agent takes an action in the environment; γ represents the discount factor, which is used to measure the importance of the agent to future rewards.

其中，S_t是t时刻状态空间，Temp_out,t是t时刻室外温度，是t时刻z区域室内温度，Hum_out,t是t时刻室外湿度，是t时刻z区域室内湿度，C_out,t是t时刻的CO₂浓度，是t时刻z区域的CO₂浓度，PMV_t ^z是t时刻热舒适预测值，是t时刻人数。利用所述状态空间的定义方法，智能体能够更全面地掌握和响应室内外环境的变化，提升中央空调系统的实时性和灵活性。Among them, S _t is the state space at time t, Temp _out,t is the outdoor temperature at time t, is the indoor temperature of area z at time t, Hum _out,t is the outdoor humidity at time t, is the indoor humidity in area z at time t, C _out,t is the CO ₂ concentration at time t, is the CO ₂ concentration in area z at time t, PMV _t ^z is the predicted value of thermal comfort at time t, is the number of people at time t. By using the state space definition method, the intelligent agent can more comprehensively grasp and respond to changes in the indoor and outdoor environments, and improve the real-time performance and flexibility of the central air-conditioning system.

进一步，所述动作空间A_t表示为：Further, the action space _At is expressed as:

其中，表示t时刻z区域的空气供给速率，每个区域的设定点是一个连续变量，通过对不同区域空气供给速率的精准调控，可以更好的应对不同的温度、湿度和人数变化。比如，针对不同区域的温度差异，调节空气供给速率，使得每个区域的温度都能维持在舒适范围内，避免过冷或过热的情况。in, It represents the air supply rate of the z zone at time t. The set point of each zone is a continuous variable. By accurately controlling the air supply rate of different zones, we can better cope with different temperature, humidity and number of people. For example, according to the temperature difference of different zones, the air supply rate is adjusted so that the temperature of each zone can be maintained within a comfortable range to avoid overcooling or overheating.

其中，R_t为t时刻奖励函数，为t时刻z区域热舒适度奖励，R_energy,t为t时刻能耗奖励，由环境自动计算得出，为t时刻z区域空气质量奖励，n为区域总量，a、b和d为权衡因子，通过动态调整所述权衡因子来考虑不同奖励的重要性；Among them, R _t is the reward function at time t, is the thermal comfort reward for area z at time t, R _energy,t is the energy consumption reward at time t, which is automatically calculated by the environment. is the air quality reward for area z at time t, n is the total area, a, b and d are trade-off factors, and the importance of different rewards is considered by dynamically adjusting the trade-off factors;

进一步，所述热舒适度奖励表示为：Further, the thermal comfort reward is expressed as:

其中，为t时刻z区域的热舒适度奖励，PMV_t ^z为t时刻所述前馈神经网络预测z区域的PMV值，PMV_penalty为超出阈值M时的惩罚项，PMV_penalty定为20，M定为3，为t时刻z区域的人口密度，当检测到室内人口密度高于所述阈值M且高于最高舒适度阈值c时，则奖励为当前PMV值的绝对值和额外的惩罚项，最高舒适度阈值c定为1，当检测到室内为其他情况时，奖励为当前PMV值的绝对值；in, is the thermal comfort reward of area z at time t, PMV _t ^z is the PMV value of area z predicted by the feedforward neural network at time t, PMV _penalty is the penalty term when the threshold M is exceeded, PMV _penalty is set to 20, M is set to 3, is the population density of area z at time t. When it is detected that the indoor population density is higher than the threshold M and higher than the maximum comfort threshold c, the reward is the absolute value of the current PMV value and an additional penalty term. The maximum comfort threshold c is set to 1. When other conditions are detected indoors, the reward is the absolute value of the current PMV value.

当检测到室内CO₂浓度超出最高阈值时，则增加负奖励，所述空气质量奖励具体表示如下：When it is detected that the indoor _CO2 concentration exceeds the maximum threshold, a negative reward is added. The air quality reward is specifically expressed as follows:

其中，为t时刻z区域的空气质量奖励，为t时刻z区域的CO₂浓度，C_high为CO₂浓度最高阈值，C_penalty为超出所述阈值C_high时的惩罚项，C_penalty定义为20，C_high定义为1,200ppm。通过综合考虑热舒适度、能耗和空气质量的优化目标，并动态调整权衡因子。比如，当系统检测到当前区域的PMV值偏离理想值时，权衡因子会增加对热舒适度的关注，促使系统优先调整相关参数。in, is the air quality reward for area z at time t, is the CO ₂ concentration in area z at time t, C _high is the maximum CO ₂ concentration threshold, C _penalty is the penalty term when the threshold C _high is exceeded, C _penalty is defined as 20, and C _high is defined as 1,200 ppm. By comprehensively considering the optimization goals of thermal comfort, energy consumption, and air quality, and dynamically adjusting the trade-off factors. For example, when the system detects that the PMV value of the current area deviates from the ideal value, the trade-off factor will increase the focus on thermal comfort, prompting the system to prioritize the adjustment of relevant parameters.

进一步，所述折扣因子用于衡量智能体对未来奖励的重视程度，在所述商业建筑系统中，考虑到舒适度优化与能源消耗之间的平衡，以实现长期的系统效益和用户满意度，这里把折扣因子定义为0.9，而转移函数由环境决定。Furthermore, the discount factor is used to measure the importance the agent places on future rewards. In the commercial building system, the balance between comfort optimization and energy consumption is taken into account to achieve long-term system benefits and user satisfaction. Here, the discount factor is defined as 0.9, and the transfer function is determined by the environment.

步骤S5、运行PRE-DDPG算法，所述PRE-DDPG算法在所述MDP模型上运行，使用所述人口密度检测方法，利用所述室内外环境参数的训练集动态调整空调运行参数，通过迭代更新和多次训练，得到已训练的PRE-DDPG模型；将所述室内外环境参数的测试集导入已训练的PRE-DDPG模型，得到建筑室内中央空调净化的最优调控策略。Step S5, running the PRE-DDPG algorithm, the PRE-DDPG algorithm runs on the MDP model, uses the population density detection method, and dynamically adjusts the air-conditioning operation parameters using the training set of the indoor and outdoor environmental parameters, and obtains the trained PRE-DDPG model through iterative updates and multiple trainings; imports the test set of the indoor and outdoor environmental parameters into the trained PRE-DDPG model to obtain the optimal control strategy for indoor central air-conditioning purification of the building.

在本实例中，商业建筑内的中央空调系统需要的状态空间非常庞大，因此深度神经网络的支持显得尤为重要，运行PRE-DDPG算法时，通过深度学习和强化学习的结合，能够自动化调控，降低对人为操作和监控的依赖，简化管理流程。In this example, the state space required by the central air-conditioning system in the commercial building is very large, so the support of deep neural networks is particularly important. When running the PRE-DDPG algorithm, the combination of deep learning and reinforcement learning can automate control, reduce dependence on manual operation and monitoring, and simplify the management process.

具体的，PRE-DDPG算法，参照图4所示，就网络结构来说，DDPG运用了两种类型的网络，分为Actor网络和Critic网络。同时还延续DQN使用固定目标网络的思想，每种网络都包含目标网络和估计网络。传统PG方法采用随机策略，每一次获取动作需要对当前最优策略的分布进行采样，而DDPG采取确定性策略，Actor网络的输入是当前状态，输出的是一个确定性的动作。Critic网络用来拟合状态动作价值函数，它的输入由当前状态和Actor网络生成的动作组成，输出是当前状态动作对Q值。这个Q值将被进一步用于更新Actor网络的参数。Specifically, the PRE-DDPG algorithm, as shown in Figure 4, in terms of network structure, DDPG uses two types of networks, namely Actor networks and Critic networks. At the same time, it also continues the idea of using a fixed target network in DQN, and each network contains a target network and an estimation network. The traditional PG method adopts a random strategy, and each time an action is obtained, the distribution of the current optimal strategy needs to be sampled, while DDPG adopts a deterministic strategy. The input of the Actor network is the current state, and the output is a deterministic action. The Critic network is used to fit the state-action value function. Its input consists of the current state and the action generated by the Actor network, and the output is the Q value of the current state-action pair. This Q value will be further used to update the parameters of the Actor network.

实施步骤包括：Implementation steps include:

步骤S5-1、随机初始化Actor网络m(s|q^u)和Critic网络Q(s,a|q^Q)，两个网络的参数分别用θ^μ和θ^Q表示；Step S5-1, randomly initialize the Actor network m(s|q ^u ) and the Critic network Q(s,a|q ^Q ), and the parameters of the two networks are represented by θ ^μ and θ ^Q respectively;

步骤S5-2、初始化目标网络：q^u′←q^u，q ^Q′←q^Q，两个目标网络的参数分别用q^u′和q^Q′表示，用于稳定训练过程中的目标值估计；Step S5-2, initializing the target network: q ^u′ ←q ^u , q ^Q′ ←q ^Q , the parameters of the two target networks are represented by q ^u′ and q ^Q′ respectively, which are used to stabilize the target value estimation in the training process;

步骤S5-3、初始化经验回放池，经验回放机制使得算法能够从历史经验中学习，提高样本效率和学习稳定性，尤其是在非平稳环境下表现良好；Step S5-3, initialize the experience replay pool. The experience replay mechanism enables the algorithm to learn from historical experience, improve sample efficiency and learning stability, especially perform well in non-stationary environments;

步骤S5-4、进行迭代。Step S5-4: perform iteration.

进一步，所述步骤S5-4的迭代过程如下：Further, the iterative process of step S5-4 is as follows:

步骤S5-4-1、智能体观察到初始状态{Temp_out,t＝0,Temp_in,t＝0,Hum_out,t＝0,Hum_in,t＝0}；Step S5-4-1, the agent observes the initial state {Temp _{out, t = 0} , Temp _{in, t = 0} , Hum _{out, t = 0} , Hum _{in, t = 0} };

步骤S5-4-2、智能体预测当前各个区域室内的初始热舒适值 Step S5-4-2: The agent predicts the initial thermal comfort value of each area indoors

步骤S5-4-3、智能体计算当前各个区域的人口密度表示人口密度的初始值；Step S5-4-3: The agent calculates the current population density of each area represents the initial value of population density;

步骤S5-4-4、智能体根据当前策略和探索噪声选择动作即t时刻z区域的中央空调系统中的空气供给速率调节；Step S5-4-4: The agent selects an action based on the current strategy and exploration noise That is, the air supply rate regulation in the central air conditioning system in area z at time t;

步骤S5-4-5、智能体在环境中执行动作按照所述奖励函数计算奖励r_t，并依靠PRE-DDPG模型观察下一状态s_t′；Step S5-4-5: The agent performs actions in the environment Calculate the reward r _t according to the reward function and observe the next state s _t ′ based on the PRE-DDPG model;

步骤S5-4-6、将存储到经验回放池D，用于后续的学习，并令s_t＝s_t′；Step S5-4-6: Store it in the experience replay pool D for subsequent learning, and let s _t = s _t ′;

步骤S5-4-7、如果经验回放池D已满，从D中随机采样由N个transition组成的小批量，令y_j＝r(s_j,a_j)+Q(s_j′,m(s_j′|q^m′)|q ^Q′)；Step S5-4-7: If the experience replay pool D is full, randomly sample N transitions from D. The small batch composed of , let y _j = r(s _j , a _j ) + Q(s _j ′, m(s _j ′|q ^m′ )|q ^Q′ );

步骤S5-4-8、通过最小化损失函数更新Critic网络Q(s,a|q ^Q)，使用策略梯度更新Actor网络m(s|q^u)；Step S5-4-8, update the Critic network Q(s,a|q ^Q ) by minimizing the loss function, and use the policy gradient to update the Actor network m(s|q ^u );

步骤S5-4-9、使用软更新来更新目标Actor网络和目标Critic网络，以稳定训练过程；Step S5-4-9, use soft update to update the target Actor network and the target Critic network to stabilize the training process;

步骤S5-4-10、环境模拟停止，算法停止。Step S5-4-10: Environmental simulation stops and the algorithm stops.

步骤S5-4-11、将所述室内外环境参数的测试集导入已训练的PRE-DDPG模型，得到建筑室内中央空调净化的最优调控策略。Step S5-4-11, import the test set of indoor and outdoor environmental parameters into the trained PRE-DDPG model to obtain the optimal control strategy for indoor central air-conditioning purification of the building.

为了展示本发明所提方法的有效性，两种对比方案被引入。具体而言，对比方案一不使用所述前馈神经网络模型，且使用采用传统的开/关方式对中央空调进行控制；对比方案二不使用所述前馈神经网络模型，使用DDPG对中央空调进行控制。In order to demonstrate the effectiveness of the method proposed in the present invention, two comparative schemes are introduced. Specifically, comparative scheme 1 does not use the feedforward neural network model, and uses the traditional on/off method to control the central air conditioner; comparative scheme 2 does not use the feedforward neural network model, and uses DDPG to control the central air conditioner.

表1性能对比Table 1 Performance comparison

PRE-DDPGPRE-DDPG 方案一Solution 1 方案二Solution 2 平均能耗(kWh)Average energy consumption (kWh) 150150 200200 180180 PMV值PMV value 0.80.8 1.11.1 1.21.2 空气质量(CO2ppm)Air quality (CO2ppm) 800800 10001000 900900

表1展示了所提算法与对比方案的性能。通过前馈神经网络精确预测室内环境中个体的热舒适度，所提PRE-DDPG算法热舒适PMV值离最理想舒适值0最近，从而实现对中央空调精细化的控制和优化；其次，通过人口密度计算公式，能够识别和统计出建筑内不同区域的人员分布情况，还能够提供关键的人流密度信息；最后，提出了PRE-DDPG算法，系统能够根据实时收集的环境数据和人员需求动态调整空调运行参数，所提算法具有最低的能耗，空气质量也最佳，能够最大程度地提升室内空气质量和舒适性，同时有效降低建筑能耗，符合节能减排的现代化建筑管理理念。Table 1 shows the performance of the proposed algorithm and the comparison scheme. The thermal comfort of individuals in the indoor environment is accurately predicted by the feedforward neural network. The thermal comfort PMV value of the proposed PRE-DDPG algorithm is closest to the ideal comfort value of 0, thereby realizing the refined control and optimization of the central air conditioner. Secondly, through the population density calculation formula, the distribution of people in different areas of the building can be identified and counted, and key crowd density information can also be provided. Finally, the PRE-DDPG algorithm is proposed. The system can dynamically adjust the air conditioning operation parameters according to the real-time collected environmental data and personnel needs. The proposed algorithm has the lowest energy consumption and the best air quality. It can maximize the improvement of indoor air quality and comfort, while effectively reducing building energy consumption, which is in line with the modern building management concept of energy conservation and emission reduction.

实施例二Embodiment 2

参照图5所示，一种基于深度强化学习的中央空调净化系统，包括：As shown in FIG5 , a central air conditioning purification system based on deep reinforcement learning includes:

数据采集与处理单元，用于获取室内外环境参数(包括温度、湿度、CO₂含量等)和ASHRAE数据集，对所述室内外环境参数和ASHRAE数据集进行数据清洗、缺失值处理和数据归一化，将所述室内外环境参数和ASHRAE数据集分成训练集和测试集，通过所述数据采集与处理单元能够精确数据准备和高效数据分配；A data acquisition and processing unit is used to obtain indoor and outdoor environmental parameters (including temperature, humidity, _CO2 content, etc.) and ASHRAE data sets, perform data cleaning, missing value processing and data normalization on the indoor and outdoor environmental parameters and ASHRAE data sets, and divide the indoor and outdoor environmental parameters and ASHRAE data sets into a training set and a test set. The data acquisition and processing unit can accurately prepare data and efficiently distribute data;

前馈神经网络单元，使用所述ASHRAE数据集的训练集离线训练所述前馈神经网络；训练完成后，将所述ASHRAE数据集的测试集导入已训练的PRE-DDPG模型，通过迭代更新和多次训练，得到预测准确的前馈神经网络模型；将所述室内外环境参数中的温度、湿度和风速输入至所述前馈神经网络，通过所述前馈神经网络单元能够实时预测PMV值；A feedforward neural network unit is used to train the feedforward neural network offline using the training set of the ASHRAE data set; after the training is completed, the test set of the ASHRAE data set is imported into the trained PRE-DDPG model, and an accurate feedforward neural network model is obtained through iterative updates and multiple trainings; the temperature, humidity and wind speed in the indoor and outdoor environmental parameters are input into the feedforward neural network, and the PMV value can be predicted in real time through the feedforward neural network unit;

人口密度检测单元，用于人口密度检测公式计算和识别建筑内不同区域的人员分布情况，提供关键的人流密度信息，通过所述人口密度检测单元能够有效提高中央空调系统的响应能力，比如，在人员较少的区域，系统会减少空气供给速率，从而降低能耗；而在人员密集的区域，则会增加空气供给，确保舒适度和空气质量；A population density detection unit is used to calculate and identify the distribution of people in different areas of the building using a population density detection formula, and to provide key crowd density information. The population density detection unit can effectively improve the responsiveness of the central air conditioning system. For example, in areas with fewer people, the system will reduce the air supply rate, thereby reducing energy consumption; while in areas with dense population, the system will increase the air supply to ensure comfort and air quality;

深度强化学习单元，用于涉及MDP模型，包括状态空间、动作空间、奖励函数、转移函数和折扣因子，并将所述PMV值输入到所述状态空间，通过动态调整空调运行参数，提高系统的智能化和自适应能力，能够根据实时数据进行灵活调控；A deep reinforcement learning unit is used to involve an MDP model, including a state space, an action space, a reward function, a transfer function and a discount factor, and input the PMV value into the state space, and improve the intelligence and adaptability of the system by dynamically adjusting the air-conditioning operating parameters, and can be flexibly adjusted according to real-time data;

中央空调控制单元，用于运行PRE-DDPG算法，将所述PRE-DDPG算法在所述MDP模型上运行，使用所述人口密度检测方法，利用所述室内外环境参数的训练集动态调整空调运行参数，得到所述已训练的PRE-DDPG模型；将所述室内外环境参数的测试集导入所述已训练的PRE-DDPG模型。A central air-conditioning control unit is used to run a PRE-DDPG algorithm, run the PRE-DDPG algorithm on the MDP model, use the population density detection method, and dynamically adjust the air-conditioning operation parameters using the training set of indoor and outdoor environmental parameters to obtain the trained PRE-DDPG model; and import the test set of indoor and outdoor environmental parameters into the trained PRE-DDPG model.

进一步，所述人口密度检测单元根据所述数据采集与处理单元的结果响应，计算空气质量和能耗，并利用前馈神经网络单元计算热舒适值，应用于所述深度强化学习单元内，根据获得的奖励函数为所述中央空调控制单元提供控制方法，经过多次迭代，输出中央空调的最优决策。通过多单元协同工作，实现了数据采集、处理、分析、预测、决策和控制的闭环管理，整体提高了中央空调系统的控制的精确性和灵活性。Furthermore, the population density detection unit calculates the air quality and energy consumption according to the result response of the data acquisition and processing unit, and uses the feedforward neural network unit to calculate the thermal comfort value, which is applied to the deep reinforcement learning unit, and provides a control method for the central air conditioning control unit according to the obtained reward function. After multiple iterations, the optimal decision of the central air conditioning is output. Through the collaborative work of multiple units, closed-loop management of data acquisition, processing, analysis, prediction, decision-making and control is realized, which improves the overall control accuracy and flexibility of the central air conditioning system.

尽管已经示出和描述了本发明的实施例，对于本领域的普通技术人员而言，可以理解在不脱离本发明的原理和精神的情况下可以对这些实施例进行多种变化、修改、替换和变型，本发明的范围由所附权利要求及其等同物限定。Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the present invention, and that the scope of the present invention is defined by the appended claims and their equivalents.

Claims

1. A central air conditioning purification method based on deep reinforcement learning, characterized by comprising:

Step S1, data collection and collation, obtaining indoor and outdoor environmental parameters and ASHRAE data sets, performing data cleaning, missing value processing and data normalization on the indoor and outdoor environmental parameters and ASHRAE data sets, and dividing the indoor and outdoor environmental parameters and ASHRAE data sets into a training set and a test set;

Step S2, predicting the PMV value by a feedforward neural network, using the training set of the ASHRAE data set to train the feedforward neural network offline; importing the test set of the ASHRAE data set into the trained feedforward neural network model through iterative updating and multiple trainings to obtain an accurate feedforward neural network model; inputting the indoor and outdoor environmental parameters into the feedforward neural network to predict the PMV value in real time;

Step S3: Detecting population density, and identifying the distribution of people in different areas of the building by calculating a population density detection formula;

Step S4, designing an MDP model, including a state space, an action space, a reward function, a transfer function and a discount factor, and inputting the PMV value into the state space;

Step S5, running the PRE-DDPG algorithm, using the population density detection method and the indoor and outdoor environmental parameter training set on the MDP model, dynamically adjusting the central air-conditioning parameters through multiple iterative training, and obtaining an optimized PRE-DDPG model; importing the test set of the indoor and outdoor environmental parameters to obtain the optimal central air-conditioning control strategy.

2. According to a central air conditioning purification method based on deep reinforcement learning according to claim 1, it is characterized in that the step S1 comprises:

Collect the indoor and outdoor environmental parameters in real time, and obtain data including environmental parameters and comfort indexes from the ASHRAE data set;

Perform the data cleaning, use the mean filling method to deal with the missing values, and use the triple standard deviation method to detect and deal with outliers in the data;

All data were normalized as described.

3. According to a central air conditioning purification method based on deep reinforcement learning according to claim 1, it is characterized in that the step S2 comprises:

Determine the dimensionality of the input layer, i.e. the number of features the network receives, including temperature, humidity, and wind speed;

Determine the number of hidden layers and the number of neurons in each layer;

Determine the structure of the output layer;

Select the hidden layer activation function as ReLU;

Select the mean square error as the loss function and the Adam optimizer to optimize the weights and biases of the network;

Training the feedforward neural network using a training set in the ASHRAE data, and adjusting the hyperparameters of the model based on performance;

Monitor the performance of the feedforward neural network to check whether it is overfitting or underfitting. If overfitting occurs, reduce the number of layers and neurons of the neural network. If underfitting occurs, increase the number of training rounds. If overfitting or underfitting does not occur, no further adjustment is required.

Load the trained feedforward neural network model, input the training set in the ASHRAE data, and calculate the accuracy;

The temperature, humidity and wind speed among the indoor and outdoor environmental parameters are input into the feedforward neural network model to predict the PMV value in real time.

4. According to a central air conditioning purification method based on deep reinforcement learning according to claim 1, it is characterized in that the step S3 comprises:

At each time interval Δt, the indoor and outdoor _CO2 concentrations in the indoor and outdoor environmental parameters are obtained, and the increase in indoor _CO2 concentration ΔC is calculated, which is expressed as:

ΔC＝CC ₀ ;

Wherein, the indoor CO ₂ concentration is C, and the outdoor CO ₂ concentration is C ₀ ;

The _CO2 concentration increase ΔC is expressed as:

Where I is the per capita _CO2 production, K is the population density, and V is the indoor volume;

It is deduced that the population density K is expressed as:

When the population density exceeds a maximum population threshold M, a warning is issued.

5. According to a central air conditioning purification method based on deep reinforcement learning according to claim 1, it is characterized in that the state space in step S4 is expressed as:

Among them, S _t is the state space at time t, Temp _out,t is the outdoor temperature at time t, is the indoor temperature of area z at time t, Hum _out,t is the outdoor humidity at time t, is the indoor humidity in area z at time t, C _out,t is the CO ₂ concentration at time t, is the CO ₂ concentration in region z at time t, is the predicted value of thermal comfort at time t, is the number of people at time t.

6. According to a central air conditioning purification method based on deep reinforcement learning in claim 1, it is characterized in that the action space _At in step S4 is expressed as:

in, represents the air supply rate of zone z at time t, and the set point of each zone is a continuous variable.

7. According to a central air conditioning purification method based on deep reinforcement learning according to claim 1, it is characterized in that the reward function in step S4 is expressed as:

Among them, R _t is the reward function at time t, is the thermal comfort reward of area z at time t, R _energy,t is the energy consumption reward at time t, is the air quality reward for area z at time t, a, b and d are the weighting factors, and n is the total amount of the area;

The thermal comfort reward is expressed as:

in, is the thermal comfort bonus for area z at time t, is the PMV value of area z at time t, and PMV _penalty is the penalty term when the maximum population threshold M is exceeded. is the population density of area z at time t, and c is the maximum comfort threshold;

The air quality reward is expressed as:

in, is the air quality reward for area z at time t, is the CO ₂ concentration in area z at time t, C _high is the maximum threshold of CO ₂ concentration, and C _penalty is the additional penalty item when exceeding the threshold C _high .

8. According to a central air conditioning purification method based on deep reinforcement learning according to claim 1, it is characterized in that the step S5 comprises:

Step S5-1, the agent observes the state s _t of each area;

Step S5-2, the agent predicts the current PMV value of each area according to the indoor environment state;

Step S5-3, the agent calculates the current population density of each area;

Step S5-4: The agent selects control actions for each area based on the current strategy and exploration noise That is, the air supply rate in area z at time t;

Step S5-5: The agent performs the action in the environment Calculate the reward function r _t to obtain the state s _t ′ at the next moment;

Step S5-6: Store it in the experience replay pool D, and let the s _t = s _t ′;

Step S5-7: When the experience replay pool D is full, randomly sample N A small batch consisting of, where j is the sample number;

Step S5-8, update the Critic network by minimizing the loss function, and update the Actor network using the policy gradient;

Step S5-9, using soft update to update the target Actor network and the target Critic network;

Step S5-10: If the learning process of the PRE-DDPG algorithm is not effective, the network parameters and discount factors are continuously adjusted based on the model operation status to achieve better convergence of the entire learning effect;

Step S5-11, importing the test set of indoor and outdoor environmental parameters into the optimized PRE-DDPG model to obtain the optimal control strategy for indoor central air-conditioning purification of the building.

9. A central air conditioning purification system based on deep reinforcement learning, characterized by comprising:

A data acquisition and processing unit, used for acquiring indoor and outdoor environmental parameters and ASHRAE data sets, performing data cleaning, missing value processing and data normalization on the indoor and outdoor environmental parameters and ASHRAE data sets, and dividing the indoor and outdoor environmental parameters and ASHRAE data sets into a training set and a test set;

A feedforward neural network unit is used to train the feedforward neural network offline using the training set of the ASHRAE data set; through iterative updates and multiple trainings, the test set of the ASHRAE data set is imported into the optimized PRE-DDPG model to obtain a feedforward neural network model with accurate prediction; the temperature, humidity and wind speed in the indoor and outdoor environmental parameters are input into the feedforward neural network model to predict the PMV value in real time;

Population density detection unit, used to calculate population density and identify the distribution of people in different areas of the building, providing key crowd density information;

A deep reinforcement learning unit, used for designing an MDP model, including a state space, an action space, a reward function, a transfer function and a discount factor, and inputting the PMV value into the state space;

The central air-conditioning control unit is used to run the PRE-DDPG algorithm, use the population density detection method and the indoor and outdoor environmental parameter training set on the MDP model, dynamically adjust the air-conditioning parameters through multiple iterative training, and obtain the optimized PRE-DDPG model; import the indoor and outdoor environmental parameters of the test set to obtain the best central air-conditioning control strategy.

10. According to claim 9, a central air conditioning purification system based on deep reinforcement learning is characterized in that the population density detection unit responds according to the result of the data acquisition and processing unit and is applied to the deep reinforcement learning unit to provide an optimal decision-making method for the central air conditioning control unit.