
CN119536413A - A large surface source blackbody parameter-free temperature control method, device and medium - Google Patents

A large surface source blackbody parameter-free temperature control method, device and medium Download PDF

Info

Publication number
CN119536413A
Authority
CN
China
Prior art keywords
temperature control
parameter
temperature
large surface
blackbody
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510105108.5A
Other languages
Chinese (zh)
Other versions
CN119536413B (en)
Inventor
亓洪兴
杨文航
刘世界
徐霖
张阳阳
朱首正
何欣
王建宇
李春来
金海军
金柯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Institute of Advanced Studies of UCAS
Original Assignee
Hangzhou Institute of Advanced Studies of UCAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Institute of Advanced Studies of UCAS filed Critical Hangzhou Institute of Advanced Studies of UCAS
Priority to CN202510105108.5A priority Critical patent/CN119536413B/en
Publication of CN119536413A publication Critical patent/CN119536413A/en
Application granted granted Critical
Publication of CN119536413B publication Critical patent/CN119536413B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D23/00 Control of temperature
    • G05D23/19 Control of temperature characterised by the use of electric means
    • G05D23/20 Control of temperature characterised by the use of electric means with sensing elements having variation of electric or magnetic properties with change of temperature

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Temperature (AREA)
  • Feedback Control In General (AREA)

Abstract


The present invention belongs to the technical field of temperature control and discloses a large-area source blackbody parameter-free temperature control method, device and medium, comprising: S1, acquiring a data set; S2, modeling the problem; S3, constructing a large-area source blackbody parameter-free temperature control model based on a deep DQN network, and then training the model with the training data set; S4, verification and adjustment, evaluating the model with the test data set to obtain the optimal large-area source blackbody parameter-free temperature control model; S5, applying the optimal large-area source blackbody parameter-free temperature control model in a large-area source blackbody temperature control system to control the temperature of the large-area source blackbody. Based on a deep DQN network, with precise state and action design and an optimized reward function, the present invention can quickly and accurately adjust the temperature to the target value within a minimum number of action steps.

Description

Large-area source black body parameter-free temperature control method, device and medium
Technical Field
The invention relates to the technical field of temperature control, in particular to a large-area source black body parameter-free temperature control method, a device and a medium.
Background
As calibration equipment with extremely demanding requirements on temperature-control response speed and precision, the large-area source blackbody is widely used in scenarios such as infrared detector calibration and thermal imaging system testing. Traditional blackbody temperature control methods rely on mathematical modeling and parametric design, and often require manual adjustment of multiple parameters to accommodate different experimental conditions. However, such methods suffer from low efficiency, insufficient accuracy and high regulation complexity when faced with dynamically varying, nonlinear temperature control requirements. Especially in high-precision infrared detection and measurement, the traditional methods respond slowly to temperature changes, so the testing and calibration process takes longer and the requirements on real-time performance and consistency cannot be met.
To solve these problems, reinforcement learning, as a data-driven intelligent algorithm, is an ideal tool for the complex temperature control problem because it can learn and optimize automatically without predefined parameters. The temperature control problem is modeled as a Markov Decision Process (MDP); combined with a deep DQN network, the large-area source blackbody parameter-free temperature control method can perceive the temperature state in real time and dynamically adjust the power output to achieve accurate temperature control. The method abandons the traditional dependence on parameters, completes the optimization of the deep DQN network through interaction between the agent and the environment, significantly improves the response speed and control precision of temperature control, and solves the difficult problem of temperature control under dynamic nonlinear conditions. Meanwhile, the reinforcement learning method also has good adaptive capability and scalability, providing a new research direction and application prospect for the development of temperature control technology.
Disclosure of Invention
The invention aims to provide a large-area source black body parameter-free temperature control method, a device and a medium, so as to solve the problems in the background technology.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a large-area source black body parameter-free temperature control method comprises the following steps:
S1, acquiring a data set, generating a simulation temperature data set through blackbody system simulation, constructing an actual temperature acquisition system, acquiring a real temperature data set, mixing the simulation temperature data set and the real temperature data set, and dividing the mixture into a training data set and a test data set according to a proportion;
S2, modeling a problem, namely modeling the parameter-free temperature control problem as a Markov decision process, defining an "agent-environment" interaction mechanism, and defining the relationship between states, actions and rewards;
S3, constructing a large-area source blackbody parameter-free temperature control model based on a deep DQN network, and training the large-area source blackbody parameter-free temperature control model by utilizing a training data set;
S4, verifying and adjusting, namely evaluating a large-area-source black body parameter-free temperature control model by using a test data set, comparing the deviation between the predicted temperature and the target temperature, and optimally adjusting parameters of the large-area-source black body parameter-free temperature control model according to a verification result to obtain an optimal large-area-source black body parameter-free temperature control model;
And S5, applying the optimal large-area source black body parameter-free temperature control model to a large-area source black body temperature control system to control the temperature of the large-area source black body.
Further, the step S1 includes the steps of:
S1.1, establishing a blackbody system simulation model, namely simulating the heating process of the blackbody system through the blackbody system simulation model to generate a simulation temperature data set covering different temperature ranges and heating rates;
Setting up a temperature acquisition system in an experimental environment, and acquiring actual blackbody temperature data by using a high-precision sensor to form an actual temperature data set which is compared with a simulation temperature data set;
S1.2, mixing the simulation temperature data set with the real temperature data set, and dividing the mixture into a training data set and a test data set in proportion.
Further, the step S2 includes the steps of:
S2.1, defining a state space, wherein the defined state space comprises the current temperature T(t), the target temperature T_goal, the ambient temperature T_E and the temperature change rate ΔT(t), i.e. the state S = [T(t), T_goal, T_E, ΔT(t)];
S2.2, defining an action space, wherein the action A represents the duty-cycle value selected by the agent of the large-area source blackbody parameter-free temperature control algorithm, and the action space is A = {10, 20, 30, ..., 100};
S2.3, defining a reward function, wherein the reward function simultaneously considers temperature error, waste heat effect, control efficiency and overshoot penalty.
Further, the step S2.3 further comprises: when the temperature control does not reach the target temperature, the reward function formula is:
wherein R represents the feedback reward of the environment for the current action, |T(t) − T_goal| represents the absolute deviation between the current temperature T(t) and the target temperature T_goal, N_step represents the number of action steps taken by the agent, T_residual represents the additional temperature rise caused by the waste heat effect, λ, γ and δ are the weight coefficients balancing the number of action steps, the waste heat effect and the overshoot penalty intensity, respectively, and max represents taking the maximum value;
If the temperature control reaches the target temperature T_goal, the reward function formula is R = R + C, where C is a fixed high reward value.
Further, the working process of the large-area source black body parameter-free temperature control model comprises the following steps:
S3.1, in the initialization stage of the large-area source blackbody parameter-free temperature control model, the parameters of the neural network are first randomly initialized, and then an experience replay buffer is initialized, which is used to store the (state, action, reward, next state) tuples generated during the interaction between the agent and the environment;
S3.2, adopting an ε-greedy strategy in the action selection stage of the large-area source blackbody parameter-free temperature control model: under this strategy, the agent randomly selects an action a with probability ε to explore, and with probability 1 − ε selects the action A with the largest Q value in the current state, i.e. A = argmax_a Q(S, a; θ);
S3.3, in the state updating stage of the large-area source blackbody parameter-free temperature control model, the agent calculates the influence of the duty cycle on the large-area source blackbody parameter-free temperature control model according to the selected action A and updates the state of the environment to obtain the next state S' = [T(t+1), T_goal, T_E, ΔT(t+1)],
wherein T(t+1) is the updated temperature value, T_E is the ambient temperature and ΔT(t+1) is the new temperature change; meanwhile, the agent calculates the instant reward R according to the feedback of the environment and the set reward function;
S3.4, in the action updating stage of the large-area source blackbody parameter-free temperature control model, the deep DQN network learns from past experience to minimize the error between the target Q value and the predicted Q value, thereby optimizing the large-area source blackbody parameter-free temperature control model and maximizing the reward;
the optimized deep DQN network loss function is formulated as follows and is used to train the deep DQN network to minimize the mean square error between the predicted value Q(s_t, a_t; θ) and the target value Q_target(s_t, a_t; θ⁻):
Loss(θ) = (1/N) Σ_{i=1}^{N} [Q_target(s_i, a_i; θ⁻) − Q(s_i, a_i; θ)]²,
where Q(s_t, a_t; θ) is the output of the current deep DQN network based on state s_t and action a_t, θ is a parameter of the deep DQN network, Q_target(s_t, a_t; θ⁻) is the target value calculated by the target network, and θ⁻ is a delayed-update copy of the deep DQN network parameters; the loss function is optimized over a batch of size N, and the average of the mean square errors is computed each time to adjust the parameter θ of the deep DQN network.
The invention also provides a large-area source black body parameter-free temperature control device which comprises one or more processors and is used for realizing the large-area source black body parameter-free temperature control method.
The invention also provides a readable storage medium, on which a program is stored, which, when executed by a processor, implements the large-area source blackbody parameter-free temperature control method as described above.
Compared with the prior art, the invention has the beneficial effects that:
The invention is based on a deep DQN network and, through precise state and action design and an optimized reward function, can quickly and accurately adjust the temperature to the target value within a minimum number of action steps. Compared with traditional temperature control methods, the deep DQN network does not depend on predefined control parameters but adaptively adjusts the heating duty cycle through environmental feedback, improving the response speed and accuracy of the temperature control process. In addition, through experience replay and Q-value updating, the deep DQN network can continuously optimize the control strategy, adapt to complex environmental changes, and ensure that the temperature control system always operates efficiently and stably. The invention requires no parameter input: the algorithm learns the thermodynamic characteristics of the device from heating, waste heat and cooling data, thereby achieving temperature control. An overshoot penalty is added to the constraint conditions of the reward function of the temperature control algorithm, avoiding overshoot during the temperature control process and thereby increasing the temperature control rate.
Aiming at the problems of complicated parameter adjustment, slow response and insufficient precision of traditional control algorithms in complex temperature control scenarios, the invention provides a large-area source blackbody parameter-free temperature control method. The method is simple to operate, easy to implement in practical applications, and can dynamically adjust the control strategy to significantly improve the temperature control rate. No complicated parameters need to be preset, and different temperature control requirements can be met through intelligent learning alone. It therefore has an extremely wide application prospect and important practical significance.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a diagram of an "agent-environment" interaction in a Markov decision process of the present invention.
Fig. 3 is a diagram of the deep DQN network of the invention.
FIG. 4 is a frame structure diagram of the large-area source black body parameter-free temperature control model of the invention.
FIG. 5 is a schematic diagram of a large-area source black body parameter-free temperature control device.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-4, a large-area source black body parameter-free temperature control method includes the following steps:
S1, acquiring a data set, generating a simulation temperature data set through blackbody system simulation, building an actual temperature acquisition system, acquiring a real temperature data set, mixing the simulation temperature data set and the real temperature data set, and dividing the mixture into a training data set and a test data set according to a proportion, thereby providing data support for model training and verification. The method specifically comprises the following steps:
S1.1, establishing a blackbody system simulation model, namely simulating the heating process of the blackbody system through the blackbody system simulation model, generating a simulation temperature data set covering different temperature ranges and heating rates, and providing basic data for subsequent model training and verification.
Establishing an actual temperature data acquisition system: a temperature acquisition system is set up in an experimental environment, and actual blackbody temperature data is collected using high-precision sensors to form a real temperature data set for comparison with the simulation temperature data set.
S1.2, mixing the simulation temperature data set with the real temperature data set, and dividing the mixture into a training data set and a test data set in proportion.
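As an illustration of S1.1–S1.2, the following sketch mixes a simulated heating data set with real measurements and splits them proportionally into training and test sets. The first-order heating model, the file name real_blackbody_log.csv and the 80/20 split ratio are assumptions for illustration only and are not specified by the invention.

```python
import os
import numpy as np

def simulate_heating_curve(t_start, t_ambient, duty, steps, dt=1.0, k=0.02, gain=0.8):
    """Toy first-order heating model (illustrative assumption, not the patent's plant model):
    dT/dt = gain*duty/100 - k*(T - T_ambient)."""
    temps = [t_start]
    for _ in range(steps):
        t = temps[-1]
        temps.append(t + dt * (gain * duty / 100.0 - k * (t - t_ambient)))
    return np.array(temps)

def build_dataset(real_log="real_blackbody_log.csv", split_ratio=0.8, seed=0):
    rng = np.random.default_rng(seed)
    # S1.1: simulated curves covering different duty cycles (heating rates)
    curves = [simulate_heating_curve(20.0, 20.0, duty, steps=300) for duty in range(10, 101, 10)]
    # S1.1: real temperature logs from the acquisition system, if available (file name is hypothetical)
    if os.path.exists(real_log):
        curves.append(np.loadtxt(real_log, delimiter=","))
    # S1.2: shuffle the mixed curves and split proportionally into training and test sets
    rng.shuffle(curves)
    n_train = int(split_ratio * len(curves))
    return curves[:n_train], curves[n_train:]

train_set, test_set = build_dataset()
print(len(train_set), "training curves,", len(test_set), "test curves")
```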
S2, modeling the problem: reinforcement learning is used for parameter-free temperature control, and the control problem is first defined as an agent-environment interaction problem in a Markov decision process (MDP). The parameter-free temperature control process is regarded as the environment, and the deep DQN network is regarded as the agent. The design of the system state, control actions and reward functions is shown in fig. 2.
S2.1, defining a state space: in the deep DQN-based network, the more comprehensive the factors considered in the state, the more complete the information received by the agent and the more accurate the prediction of the deep DQN network, so the input state should fully describe the current condition of the temperature control process. The state vector contains the current temperature T(t), the target temperature T_goal, the ambient temperature T_E and the temperature change rate ΔT(t), i.e. the state S = [T(t), T_goal, T_E, ΔT(t)], which comprehensively describes the current operating state of the temperature control method, see Table 1.
TABLE 1 System state variables of the large-area source blackbody parameter-free temperature control method
S2.2, defining an action space: the action A represents the duty-cycle value selected by the agent of the deep DQN network. Further, as shown in fig. 3, the deep DQN network comprises a data input layer, a fully connected layer FC1, a fully connected layer FC2, an activation function ReLU1 corresponding to FC1, an activation function ReLU2 corresponding to FC2, a fully connected layer FC3 and an output layer.
The data input layer receives 128-dimensional input data.
The two fully connected layers (FC1 and FC2) and the corresponding activation functions (ReLU1 and ReLU2) perform feature extraction and nonlinear transformation on the input data.
The fully connected layer FC3 further processes the output of the first two layers to obtain the final 1-dimensional output.
The output layer outputs the final 1-dimensional result.
The deep DQN network first receives 128-dimensional input data, which is then feature-extracted and nonlinearly transformed by the two fully connected layers (FC1 and FC2) and the corresponding ReLU activation functions. Through these two hidden layers, the data is converted into a 128-dimensional intermediate representation. Finally, the deep DQN network uses the third fully connected layer FC3 to compress these features to a 1-dimensional output. The whole network follows a typical fully connected neural network architecture, and the required output is obtained through multi-layer feature transformation. The action space is A = {10, 20, 30, ..., 100}, i.e. at each step the agent selects a fixed duty-cycle value according to the environmental state to adjust the power output of the large-area source blackbody parameter-free temperature control model. Through this discretized action design, the agent can accurately control the heating process of the system within a certain range, thereby achieving accurate control of the target temperature. A minimal sketch of such a network is given below.
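The following PyTorch sketch mirrors the layer dimensions described above (128-dimensional input, two FC+ReLU hidden layers and a 1-dimensional output, consistent with a state-action value Q(s, a; θ)). How the 4-dimensional state and the chosen duty cycle are encoded into the 128-dimensional input is not specified by the invention; the zero-padded encoding used here is an assumption for illustration only.

```python
import torch
import torch.nn as nn

class DQNNetwork(nn.Module):
    """Fully connected Q-network following the structure described for Fig. 3:
    input layer -> FC1 + ReLU1 -> FC2 + ReLU2 -> FC3 -> 1-dimensional output."""
    def __init__(self, input_dim=128, hidden_dim=128):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_dim, 1)  # single Q value for the encoded (state, action) pair

    def forward(self, x):
        x = self.relu1(self.fc1(x))
        x = self.relu2(self.fc2(x))
        return self.fc3(x)

def encode_state_action(state, duty, input_dim=128):
    """Hypothetical encoding: place [T(t), T_goal, T_E, dT(t), duty/100] in a zero-padded vector."""
    x = torch.zeros(input_dim)
    x[:4] = torch.tensor(state, dtype=torch.float32)
    x[4] = duty / 100.0
    return x

q_net = DQNNetwork()
x = encode_state_action([25.0, 60.0, 20.0, 0.5], duty=40)
print(q_net(x.unsqueeze(0)))  # predicted Q value for this state-action pair
```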
S2.3, the goal of the deep DQN network is to enable the large-area source blackbody parameter-free temperature control model to learn to bring the temperature quickly and accurately to the target temperature T_goal in the fewest action steps. At the same time, the waste heat effect and the overshoot problem are taken into account, expanding the task into a multi-objective optimal control problem, and the large-area source blackbody parameter-free temperature control model combines temperature error, waste heat effect, control efficiency and overshoot penalty into one reward function.
When the temperature control has not reached the target temperature, the reward function is designed as follows:
wherein R represents the feedback reward of the environment for the current action; |T(t) − T_goal| represents the absolute deviation between the current temperature T(t) and the target temperature T_goal and is used to penalize inaccurate temperature regulation; N_step is the number of action steps taken by the agent and is used to penalize an excessive number of actions, encouraging the agent to achieve accurate temperature regulation in fewer steps; T_residual represents the additional temperature rise caused by the waste heat effect and is used to quantify the influence of waste heat on the large-area source blackbody parameter-free temperature control model; λ, γ and δ are the weight coefficients balancing the number of action steps, the waste heat influence and the overshoot penalty intensity, respectively; and max represents taking the maximum value. This ensures that the large-area source blackbody parameter-free temperature control model pursues temperature accuracy, reduces redundant actions as much as possible, and at the same time avoids the risk of overshoot.
If the temperature control reaches the target temperature T_goal, the reward function formula is R = R + C, wherein C is a fixed high reward value indicating that the task is complete and the temperature has reached the desired target, motivating the agent to complete the task correctly. Through this reward function design, the temperature error of the large-area source blackbody parameter-free temperature control method is reduced and the overshoot phenomenon is strictly avoided, which further improves the efficiency of the large-area source blackbody parameter-free temperature control model, regulates the temperature to the target value in as few steps as possible, and effectively improves the control efficiency of the large-area source blackbody parameter-free temperature control model. A hedged sketch of such a reward function is given below.
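The exact reward formula is given in the patent drawings and is not reproduced here; the sketch below is only one plausible reading of the description above, combining the absolute temperature error, a step penalty λ·N_step, a waste-heat penalty γ·T_residual, an overshoot penalty δ·max(0, T(t) − T_goal) and the completion bonus C. The specific functional form, coefficient values and tolerance band are assumptions.

```python
def reward(t_current, t_goal, n_step, t_residual,
           lam=0.01, gamma=0.05, delta=0.5, bonus_c=100.0, tol=0.1):
    """Illustrative reward combining temperature error, control efficiency,
    waste heat effect and overshoot penalty, as described in S2.3 (form assumed)."""
    error = abs(t_current - t_goal)                 # |T(t) - T_goal|
    overshoot = max(0.0, t_current - t_goal)        # penalized only when above the target
    r = -error - lam * n_step - gamma * t_residual - delta * overshoot
    if error <= tol:                                # target reached: add fixed high reward C
        r += bonus_c
    return r

print(reward(t_current=59.95, t_goal=60.0, n_step=12, t_residual=0.3))
```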
S3, constructing a large-area source blackbody parameter-free temperature control model based on a deep DQN network and then training it with the training data set; based on the deep DQN network, the agent accurately adjusts the duty cycle through an ε-greedy strategy and achieves efficient target temperature control within a minimum number of steps, comprising the following steps:
Agent decision and environment interaction: at each time step, the agent of the deep DQN network selects an action A according to the current state, adjusts the power of the large-area source blackbody parameter-free temperature control model according to that action, updates the environment state to obtain the next state S', and calculates the instant reward R from the feedback.
Experience replay and policy optimization: an experience replay buffer stores the interaction data (S, A, R, S'), wherein S represents the current state, a comprehensive description of the current condition of the large-area source blackbody parameter-free temperature control model, such as the current temperature, target temperature, ambient temperature and temperature change rate; A represents the action selected by the agent in the current state, such as a specific duty-cycle value (e.g. 10% or 20%); R represents the feedback reward of the environment for the current action, used to measure the quality of the action, and may be related to factors such as the temperature error, the number of action steps and the overshoot penalty; and S' represents the next state after the action is executed, such as the updated temperature value and temperature change rate. By minimizing the error between the target Q value and the predicted Q value, the deep DQN network is optimized, achieving fast and accurate temperature regulation.
S3 specifically comprises the following steps:
S3.1, in the initialization stage of the large-area source blackbody parameter-free temperature control model, the parameters of the neural network are first randomly initialized, which provides initial weights for the subsequent learning process; then an experience replay buffer is initialized, which is used to store the (state, action, reward, next state) tuples generated during the interaction between the agent and the environment; the model then enters the action selection stage, in which an action is selected in each state according to the ε-greedy strategy.
S3.2, an ε-greedy strategy is adopted in the action selection stage of the large-area source blackbody parameter-free temperature control model: under this strategy, the agent randomly selects an action a with probability ε to explore, thereby trying different strategies, and with probability 1 − ε selects the action A with the largest Q value in the current state, i.e. A = argmax_a Q(S, a; θ).
In the latter case, the agent preferentially selects actions according to the learned value function, thereby maximizing the expected reward; a sketch of this selection rule is given below.
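A minimal sketch of the ε-greedy selection over the discrete duty-cycle action space A = {10, 20, ..., 100}, reusing the hypothetical DQNNetwork, q_net and encode_state_action helpers from the earlier network sketch; the value of ε and its decay schedule are not specified by the invention and are assumed here.

```python
import random
import torch

ACTIONS = list(range(10, 101, 10))  # duty-cycle values A = {10, 20, ..., 100}

def select_action(q_net, state, epsilon=0.1):
    """epsilon-greedy: explore with probability epsilon, otherwise pick argmax_a Q(S, a; theta)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)            # exploration
    with torch.no_grad():
        q_values = torch.stack([q_net(encode_state_action(state, a).unsqueeze(0)).squeeze()
                                for a in ACTIONS])
    return ACTIONS[int(torch.argmax(q_values))]  # exploitation: greedy action

duty = select_action(q_net, state=[25.0, 60.0, 20.0, 0.5], epsilon=0.1)
print("selected duty cycle:", duty)
```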
S3.3, in the state updating stage of the large-area source blackbody parameter-free temperature control model, the agent calculates the influence of the selected duty cycle (action A) on the temperature control process and updates the state of the environment to obtain the next state S' = [T(t+1), T_goal, T_E, ΔT(t+1)],
wherein T(t+1) is the updated temperature value, T_E is the ambient temperature and ΔT(t+1) is the new temperature change; meanwhile, the agent calculates the instant reward R according to the feedback of the environment and the set reward function, the reward reflecting the effect of the current action on the large-area source blackbody parameter-free temperature control model; the interaction data (S, A, R, S') are then stored in the experience replay buffer M to provide data support for subsequent learning. An illustrative environment step is sketched below.
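The following sketch shows one environment step: applying the chosen duty cycle, forming the next state S' and storing the transition in the replay buffer M. The first-order thermal update is the same illustrative model assumed in the dataset sketch and is not the actual blackbody dynamics, and the reward call refers to the hypothetical reward() helper above.

```python
from collections import deque

replay_buffer_M = deque(maxlen=10000)  # experience replay buffer M

def env_step(state, duty, n_step, dt=1.0, k=0.02, gain=0.8, residual=0.3):
    """One MDP transition: apply the duty cycle, update the temperature (toy first-order model),
    build S' = [T(t+1), T_goal, T_E, dT(t+1)] and compute the instant reward R."""
    t, t_goal, t_env, _ = state
    t_next = t + dt * (gain * duty / 100.0 - k * (t - t_env))  # assumed plant model
    next_state = [t_next, t_goal, t_env, t_next - t]
    r = reward(t_next, t_goal, n_step, t_residual=residual)
    replay_buffer_M.append((state, duty, r, next_state))        # store (S, A, R, S')
    return next_state, r

s = [25.0, 60.0, 20.0, 0.0]
s, r = env_step(s, duty=40, n_step=1)
print("next state:", s, "reward:", round(r, 3))
```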
S3.4, in the action updating stage of the large-area source blackbody parameter-free temperature control model, the deep DQN network learns from past experience to minimize the error between the target Q value and the predicted Q value, thereby optimizing the temperature control method and maximizing the reward.
The optimized deep DQN network loss function is formulated as follows and is used to train the deep DQN network to minimize the mean square error between the predicted value Q(s_t, a_t; θ) and the target value Q_target(s_t, a_t; θ⁻):
Loss(θ) = (1/N) Σ_{i=1}^{N} [Q_target(s_i, a_i; θ⁻) − Q(s_i, a_i; θ)]²,
wherein Q(s_t, a_t; θ) is the output of the current deep DQN network based on state s_t and action a_t, θ is a parameter of the deep DQN network, Q_target(s_t, a_t; θ⁻) is the target value calculated by the target network, and θ⁻ is a delayed-update copy of the deep DQN network parameters; the loss function is optimized over a batch of size N, and the average of the mean square errors is computed each time to adjust the parameter θ of the deep DQN network so that it gradually approximates the true value of each state-action pair. During training, historical data is sampled through the experience replay mechanism, which reduces the correlation between samples and improves the generalization ability of the method. A sketch of this update step follows.
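A minimal sketch of the batch update described above, sampling from the replay buffer, computing Q_target with a delayed copy θ⁻ of the network and minimizing the mean squared error; the discount factor, batch size, learning rate and target-network synchronization interval are assumptions, and the code reuses the hypothetical q_net, encode_state_action, ACTIONS and replay_buffer_M defined in the earlier sketches.

```python
import copy
import random
import torch
import torch.nn.functional as F

target_net = copy.deepcopy(q_net)               # delayed copy theta^- of the DQN parameters
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(batch_size=32, discount=0.99):
    """One optimization step: Loss = mean((Q_target - Q(s, a; theta))^2) over a sampled batch."""
    if len(replay_buffer_M) < batch_size:
        return None
    batch = random.sample(replay_buffer_M, batch_size)
    q_pred, q_tgt = [], []
    for s, a, r, s_next in batch:
        q_pred.append(q_net(encode_state_action(s, a).unsqueeze(0)).squeeze())
        with torch.no_grad():                    # bootstrap target from the delayed network
            q_next = max(target_net(encode_state_action(s_next, a2).unsqueeze(0)).squeeze()
                         for a2 in ACTIONS)
        q_tgt.append(r + discount * q_next)
    loss = F.mse_loss(torch.stack(q_pred), torch.stack(q_tgt))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

In a full training loop, dqn_update() would be called after each environment step, and q_net would be copied into target_net every few hundred steps to realize the delayed update of θ⁻; these schedule choices are likewise assumptions.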
By continuously learning the heating, waste heat and cooling characteristics in the simulated and real data sets, the deep DQN network gradually masters the optimal temperature control policy, and fast and accurate temperature adjustment can be achieved in practical applications.
S4, verification and adjustment: in the verification stage, the effect of the trained large-area source blackbody parameter-free temperature control model in practical application is evaluated on the test data set, the deviation between the predicted temperature and the target temperature is compared, and the method parameters are adjusted according to the results. The generalization ability of the model and its adaptability to the real environment are verified by comparing the simulated data with the actual data. Meanwhile, error analysis is carried out on the prediction deviation of the model, and the reward function and control strategy are adjusted to ensure that the model runs stably under changing environmental conditions. Finally, the performance of the model is optimized by adjusting the strategy so that its temperature control effect in practical application meets expectations.
S5, iterative optimization: steps S3 and S4 are repeated until the deep DQN network can accurately adjust the temperature to the target temperature T_goal in a minimum number of steps while keeping the large-area source blackbody temperature control model stable and efficient. During iterative optimization, training and verification are carried out continuously, and the decision-making ability of the deep DQN network is improved through experience replay and the ε-greedy strategy, ensuring the robustness and optimality of the temperature control model. Finally, the agent can quickly and accurately adjust the temperature according to different environmental changes and requirements, reducing redundant operation steps as much as possible while ensuring control precision, thereby improving the control efficiency and response speed of the model.
And finally, applying the optimal large-area source black body parameter-free temperature control model to a large-area source black body temperature control system to control the temperature of the large-area source black body.
Referring to fig. 5, the large-area-source black body parameter-free temperature control device provided by the invention comprises one or more processors, and is used for realizing the large-area-source black body parameter-free temperature control method in the embodiment.
The embodiment of the large-area source blackbody parameter-free temperature control device can be applied to any device with data processing capability, such as a computer or similar equipment. The device embodiment may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the device in a logical sense is formed by the processor of the device with data processing capability reading the corresponding computer program instructions from non-volatile storage into memory. In terms of hardware, fig. 5 shows a hardware structure diagram of the device with data processing capability in which the large-area source blackbody parameter-free temperature control device of the invention is located; in addition to the processor, memory, network interface and non-volatile storage shown in fig. 5, the device may further include other hardware depending on its actual function, which is not described herein.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they shall be regarded as falling within the scope of this description.
The invention also provides a readable storage medium, on which a program is stored, which when executed by a processor, implements a large-area blackbody parameter-free temperature control method in the above embodiment.
The readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the previous embodiments. The readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a smart media card (SMC), an SD card or a flash memory card provided on the device. Further, the readable storage medium may also include both an internal storage unit and an external storage device of any device with data processing capability. The readable storage medium is used to store the computer program and other programs and data required by the device with data processing capability, and may also be used to temporarily store data that has been output or is to be output.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A large-area source blackbody parameter-free temperature control method, characterized by comprising the following steps:
S1, acquiring a data set: generating a simulation temperature data set through blackbody system simulation, building an actual temperature acquisition system to acquire a real temperature data set, mixing the simulation temperature data set with the real temperature data set, and dividing the mixture proportionally into a training data set and a test data set;
S2, modeling the problem: modeling the parameter-free temperature control problem as a Markov decision process, defining the "agent-environment" interaction mechanism, and clarifying the relationship between states, actions and rewards;
S3, constructing a large-area source blackbody parameter-free temperature control model based on a deep DQN network and training it with the training data set; based on the deep DQN network, the agent accurately adjusts the duty cycle through a greedy strategy to achieve efficient target temperature control within a minimum number of steps;
S4, verification and adjustment: evaluating the large-area source blackbody parameter-free temperature control model with the test data set, comparing the deviation between the predicted temperature and the target temperature, and optimizing the parameters of the model according to the verification results to obtain the optimal large-area source blackbody parameter-free temperature control model;
S5, applying the optimal large-area source blackbody parameter-free temperature control model in a large-area source blackbody temperature control system to control the temperature of the large-area source blackbody.

2. The large-area source blackbody parameter-free temperature control method according to claim 1, characterized in that S1 comprises the following steps:
S1.1, establishing a blackbody system simulation model: simulating the heating process of the blackbody system through the simulation model to generate a simulation temperature data set covering different temperature ranges and heating rates;
establishing an actual temperature data acquisition system: building a temperature acquisition system in an experimental environment and collecting actual blackbody temperature data with high-precision sensors to form a real temperature data set for comparison with the simulation temperature data set;
S1.2, mixing the simulation temperature data set with the real temperature data set and dividing the mixture proportionally into a training data set and a test data set.

3. The large-area source blackbody parameter-free temperature control method according to claim 1, characterized in that S2 comprises the following steps:
S2.1, defining a state space: the state vector comprises the current temperature T(t), the target temperature T_goal, the ambient temperature T_E and the temperature change rate ΔT(t), i.e. the state S = [T(t), T_goal, T_E, ΔT(t)];
S2.2, defining an action space: the action A represents the duty-cycle value selected by the agent of the large-area source blackbody parameter-free temperature control algorithm, and the action space is A = {10, 20, 30, ..., 100};
S2.3, defining a reward function that simultaneously considers temperature error, waste heat effect, control efficiency and overshoot penalty.

4. The large-area source blackbody parameter-free temperature control method according to claim 3, characterized in that S2.3 further comprises: when the temperature control has not reached the target temperature, the reward function formula is:
wherein R represents the feedback reward of the environment for the current action, |T(t) − T_goal| represents the absolute deviation between the current temperature T(t) and the target temperature T_goal, N_step is the number of action steps taken by the agent, T_residual represents the additional temperature rise caused by the waste heat effect, λ, γ and δ are the weight coefficients balancing the number of action steps, the waste heat influence and the overshoot penalty intensity, respectively, and max represents taking the maximum value;
if the temperature control reaches the target temperature T_goal, the reward function formula is R = R + C, where C is a fixed high reward value.

5. The large-area source blackbody parameter-free temperature control method according to claim 4, characterized in that the working process of the large-area source blackbody parameter-free temperature control model comprises:
S3.1, in the initialization stage of the large-area source blackbody parameter-free temperature control model, the parameters of the neural network are first randomly initialized; then an experience replay buffer is initialized, which is used to store the (state, action, reward, next state) tuples generated during the interaction between the agent and the environment;
S3.2, in the action selection stage of the large-area source blackbody parameter-free temperature control model, an ε-greedy strategy is adopted: under this strategy, the agent randomly selects an action a with probability ε to explore, and with probability 1 − ε selects the action A with the largest Q value in the current state, i.e. A = argmax_a Q(S, a; θ);
S3.3, in the state update stage of the large-area source blackbody parameter-free temperature control model, the agent calculates the influence of the duty cycle on the model according to the selected action A and updates the state of the environment to obtain the next state S' = [T(t+1), T_goal, T_E, ΔT(t+1)], wherein T(t+1) is the updated temperature value, T_E is the ambient temperature and ΔT(t+1) is the new temperature change; meanwhile, the agent calculates the instant reward R according to the feedback of the environment and the set reward function; the interaction data (S, A, R, S') are then stored in the experience replay buffer M;
S3.4, in the action update stage of the large-area source blackbody parameter-free temperature control model, the deep DQN network learns from past experience to minimize the error between the target Q value and the predicted Q value, thereby optimizing the large-area source blackbody parameter-free temperature control model and maximizing the reward;
the optimized deep DQN network loss function is Loss(θ) = (1/N) Σ_{i=1}^{N} [Q_target(s_i, a_i; θ⁻) − Q(s_i, a_i; θ)]², which is used to train the deep DQN network to minimize the mean square error between the predicted value Q(s_t, a_t; θ) and the target value Q_target(s_t, a_t; θ⁻), wherein Q(s_t, a_t; θ) is the output of the current deep DQN network based on state s_t and action a_t, θ is a parameter of the deep DQN network, Q_target(s_t, a_t; θ⁻) is the target value calculated by the target network, and θ⁻ is a delayed-update copy of the deep DQN network parameters; the loss function is optimized over a batch of size N, and the average of the mean square errors is computed each time to adjust the parameter θ of the deep DQN network.

6. A large-area source blackbody parameter-free temperature control device, characterized by comprising one or more processors configured to implement the large-area source blackbody parameter-free temperature control method according to any one of claims 1-5.

7. A readable storage medium, characterized in that a program is stored thereon which, when executed by a processor, implements the large-area source blackbody parameter-free temperature control method according to any one of claims 1-5.
CN202510105108.5A 2025-01-23 2025-01-23 A large surface source blackbody parameter-free temperature control method, device and medium Active CN119536413B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510105108.5A CN119536413B (en) 2025-01-23 2025-01-23 A large surface source blackbody parameter-free temperature control method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510105108.5A CN119536413B (en) 2025-01-23 2025-01-23 A large surface source blackbody parameter-free temperature control method, device and medium

Publications (2)

Publication Number Publication Date
CN119536413A true CN119536413A (en) 2025-02-28
CN119536413B CN119536413B (en) 2025-05-06

Family

ID=94710813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510105108.5A Active CN119536413B (en) 2025-01-23 2025-01-23 A large surface source blackbody parameter-free temperature control method, device and medium

Country Status (1)

Country Link
CN (1) CN119536413B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018217903A1 (en) * 2017-05-24 2018-11-29 Relativity Space, Inc. Real-time adaptive control of additive manufacturing processes using machine learning
US20220253630A1 (en) * 2021-02-08 2022-08-11 Adobe Inc. Optimized policy-based active learning for content detection
CN116358114A (en) * 2023-05-06 2023-06-30 国网浙江省电力有限公司综合服务分公司 A temperature control method for air conditioners based on deep reinforcement learning
CN117171508A (en) * 2023-09-05 2023-12-05 石家庄铁道大学 Deep Q-learning bearing fault diagnosis method based on Bayesian optimization
CN117856258A (en) * 2023-11-23 2024-04-09 国网浙江省电力有限公司信息通信分公司 Target value competition-based multi-energy cooperative complementary optimization method, equipment and medium
CN118199078A (en) * 2024-03-22 2024-06-14 清华大学深圳国际研究生院 Robust reinforcement learning reactive power optimization method suitable for unobservable power distribution network
CN118821414A (en) * 2024-06-19 2024-10-22 国科大杭州高等研究院 A thrust prediction and control method, device and medium for electric thruster
CN119199619A (en) * 2024-11-27 2024-12-27 国网山东省电力公司潍坊供电公司 A method and system for intelligently evaluating the health status of a mobile energy storage power supply


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHAO, LIYUAN: "Deep reinforcement learning-based joint load scheduling for household multi-energy system", 《 APPLIED ENERGY》, vol. 324, 3 October 2022 (2022-10-03) *
檀朝东 et al.: "Intelligent decision-making of screw pump drainage and production parameters for coalbed methane wells based on reinforcement learning", Oil Drilling & Production Technology, vol. 42, no. 1, 31 January 2020 (2020-01-31), pages 62-69 *
罗丹峰: "Research on indoor environment control technology based on deep reinforcement learning", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 04, 15 April 2024 (2024-04-15), pages 038-870 *

Also Published As

Publication number Publication date
CN119536413B (en) 2025-05-06

Similar Documents

Publication Publication Date Title
US11915142B2 (en) Creating equipment control sequences from constraint data
CN111795484B (en) Intelligent air conditioner control method and system
CN112632860B (en) A method for identifying parameters of power transmission system model based on reinforcement learning
CN114391150B (en) Generative design based on machine learning with inverse and forward modeling
Deb et al. A machine learning-based framework for cost-optimal building retrofit
CN111337258B (en) Device and method for online calibration of engine control parameters by combining genetic algorithm and extremum search algorithm
CN116149166B (en) Unmanned rescue boat course control method based on improved beluga algorithm
CN102314533B (en) Methods and systems for matching a computed curve to a target curve
KR20160013012A (en) Methods for ascertaining a model of a starting variable of a technical system
CN116880164B (en) A method and device for determining the operation strategy of the terminal air-conditioning system of a data center
CN115562403A (en) Poultry house environment intelligent regulation and control method based on model prediction and improved particle swarm algorithm
CN118466173A (en) Environment temperature control optimization method for canine house
KR20220032861A (en) Neural architecture search method and attaratus considering performance in hardware
CN116243604A (en) Self-adaptive neural network sliding mode control method, device and medium for sewage denitrification process
Chakrabarty Optimizing closed-loop performance with data from similar systems: A Bayesian meta-learning approach
Duddeti et al. Constrained search space selection based optimization approach for enhanced reduced order approximation of interconnected power system models
CN119536413B (en) A large surface source blackbody parameter-free temperature control method, device and medium
CN114117778A (en) Control parameter determination method and device, electronic equipment and storage medium
CN118227442A (en) Processor performance model construction method and system based on micro-architecture parameters
CN114879644B (en) Rapid calibration method for controller parameters of automobile adaptive cruise control system
CN113673665B (en) Method, system, device and medium for optimizing wireless energy supply system of capsule robot
Burger et al. ARX model of a residential heating system with backpropagation parameter estimation algorithm
CN119597064B (en) Multi-channel balanced heating method, device and medium for large-area source black body
CN113267998B (en) A high-precision modeling and control method for atomic gyroscope temperature control system
US20230196149A1 (en) System and Method for Calibrating Digital Twins using Probabilistic Meta-Learning and Multi-Source Data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant