Disclosure of Invention
The invention aims to provide a large-area source black body parameter-free temperature control method, device and medium, so as to solve the problems described in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a large-area source black body parameter-free temperature control method comprises the following steps:
S1, acquiring a data set, namely generating a simulation temperature data set through blackbody system simulation, constructing an actual temperature acquisition system, acquiring a real temperature data set, mixing the simulation temperature data set and the real temperature data set, and dividing the mixture into a training data set and a test data set in proportion;
S2, modeling the problem, namely modeling the parameter-free temperature control problem as a Markov decision process, defining the agent-environment interaction mechanism, and defining the relationships among states, actions and rewards;
S3, constructing a large-area source blackbody parameter-free temperature control model based on a deep DQN network, and training the large-area source blackbody parameter-free temperature control model with the training data set;
S4, verifying and adjusting, namely evaluating the large-area source black body parameter-free temperature control model with the test data set, comparing the deviation between the predicted temperature and the target temperature, and optimally adjusting the parameters of the model according to the verification result to obtain an optimal large-area source black body parameter-free temperature control model;
And S5, applying the optimal large-area source black body parameter-free temperature control model to a large-area source black body temperature control system to control the temperature of the large-area source black body.
Further, the step S1 includes the steps of:
S1.1, establishing a blackbody system simulation model, namely simulating the heating process of the blackbody system through the blackbody system simulation model to generate a simulation temperature data set covering different temperature ranges and heating rates;
Establishing an actual temperature data acquisition system, namely setting up a temperature acquisition system in an experimental environment and acquiring actual blackbody temperature data with a high-precision sensor, forming an actual temperature data set that can be compared with the simulation temperature data set;
S1.2, mixing the simulation temperature data set with the real temperature data set, and dividing the mixture into a training data set and a test data set in proportion.
Further, the step S2 includes the steps of:
S2.1, defining a state space, wherein the defined state space comprises the current temperature T(t), the target temperature T_goal, the ambient temperature T_E and the temperature change rate ΔT(t), namely the state S = [T(t), T_goal, T_E, ΔT(t)];
S2.2, defining an action space, wherein the action A represents a duty ratio value selected by the agent of the large-area source black body parameter-free temperature control algorithm, and the action space is A = {10, 20, 30, ..., 100};
S2.3, defining a reward function, wherein the reward function simultaneously considers the temperature error, the waste heat effect, the control efficiency and an overshoot penalty.
Further, the step S2.3 further comprises: when the temperature control has not reached the target temperature, the reward function is formulated as:
R = −|T(t) − T_goal| − λ·N_step − γ·T_residual − δ·max(0, T(t) − T_goal)
wherein R represents the feedback reward of the environment for the current action, |T(t) − T_goal| represents the absolute deviation between the current temperature T(t) and the target temperature T_goal, N_step represents the number of action steps taken by the agent, T_residual represents the additional temperature rise caused by the waste heat effect, λ, γ and δ are respectively the weight coefficients balancing the number of action steps, the waste heat effect and the overshoot penalty intensity, and max represents taking the maximum value;
If the temperature control reaches the target temperature T_goal, the reward function formula is R = R + C, wherein C is a fixed high reward value.
Further, the working process of the large-area source black body parameter-free temperature control model comprises the following steps:
S3.1, in the initialization stage of the large-area source blackbody parameter-free temperature control model, firstly, the parameters of the neural network are randomly initialized, and then an experience replay buffer is initialized, wherein the experience replay buffer is used for storing the quadruples of state, action, reward and next state (S, A, R, S′) generated in the interaction process between the agent and the environment;
S3.2, an ε-greedy strategy is adopted in the action selection stage of the large-area source blackbody parameter-free temperature control model; under the ε-greedy strategy, the agent randomly selects an action A with probability ε to explore, and with probability 1 − ε selects the action A with the largest Q value in the current state, namely: A = argmax_A Q(S, A);
S3.3, in the state updating stage of the large-area source blackbody parameter-free temperature control model, the agent calculates the influence of the duty ratio on the large-area source blackbody parameter-free temperature control model according to the selected action A, updates the state of the environment, and obtains the next state S′:
S′ = [T(t+1), T_goal, T_E, ΔT(t+1)]
wherein T(t+1) is the updated temperature value, T_E is the ambient temperature, and ΔT(t+1) is the new temperature change rate; meanwhile, the agent calculates an instant reward R according to the feedback of the environment and the set reward function;
S3.4, in the action updating stage of the large-area source black body parameter-free temperature control model, the deep DQN network learns from past experience to minimize the error between the target Q value and the predicted Q value, so that the large-area source black body parameter-free temperature control model is optimized and the reward is maximized;
the loss function Loss of the optimized deep DQN network is formulated as follows, and is used to train the deep DQN network to minimize the mean square error between the predicted value Q(s_t, a_t; θ) and the target value Q_target(s_t, a_t; θ⁻):
Loss(θ) = (1/N) · Σ_{i=1}^{N} (Q_target(s_i, a_i; θ⁻) − Q(s_i, a_i; θ))²
wherein Q(s_t, a_t; θ) is the output of the current deep DQN network based on the state s_t and the action a_t, θ is a parameter of the deep DQN network, Q_target(s_t, a_t; θ⁻) is the target value calculated with θ⁻, a delayed-update version of the network parameters; the loss function is optimized over a batch of size N, and the average of the squared errors is calculated each time to adjust the parameter θ of the deep DQN network.
The invention also provides a large-area source black body parameter-free temperature control device which comprises one or more processors and is used for realizing the large-area source black body parameter-free temperature control method.
The invention also provides a readable storage medium, on which a program is stored, which, when executed by a processor, implements the large-area source black body parameter-free temperature control method as described above.
Compared with the prior art, the invention has the beneficial effects that:
The invention is based on a deep DQN network and, through precise state and action design and an optimized reward function, can quickly and accurately adjust the temperature to the target value in a minimum number of action steps. Compared with traditional temperature control methods, the deep DQN network does not depend on predefined control parameters, but adaptively adjusts the heating duty ratio through environmental feedback, thereby improving the response speed and accuracy of the temperature control process. In addition, through experience replay and Q-value updating, the deep DQN network can continuously optimize the control strategy and adapt to complex environmental changes, ensuring that the temperature control system always maintains efficient and stable operation. According to the invention, no parameter input is needed; the algorithm learns the thermodynamic characteristics of the device through three data sets of heating, waste heat and cooling, thereby realizing temperature control. An overshoot penalty is added to the constraint conditions of the reward function of the temperature control algorithm to avoid overshoot in the temperature control process, thereby accelerating the temperature control rate.
Aiming at the problems of complicated parameter adjustment, slow response speed and insufficient precision of traditional control algorithms in complex temperature control scenarios, the invention provides a large-area source black body parameter-free temperature control method. The method is simple to operate and easy to realize in practical application, and can dynamically adjust the control strategy to significantly improve the temperature control rate. No complicated parameters need to be preset, and different temperature control requirements can be met solely by means of intelligent learning. The method therefore has an extremely wide application prospect and important practical significance.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without making any inventive effort fall within the scope of the invention.
Referring to fig. 1-4, a large-area source black body parameter-free temperature control method includes the following steps:
S1, acquiring a data set, generating a simulation temperature data set through blackbody system simulation, building an actual temperature acquisition system, acquiring a real temperature data set, mixing the simulation temperature data set and the real temperature data set, and dividing the mixture into a training data set and a test data set according to a proportion, thereby providing data support for model training and verification. The method specifically comprises the following steps:
S1.1, establishing a blackbody system simulation model, namely simulating the heating process of the blackbody system through the blackbody system simulation model, generating a simulation temperature data set covering different temperature ranges and heating rates, and providing basic data for subsequent model training and verification.
Establishing an actual temperature data acquisition system, namely setting up the temperature acquisition system in an experimental environment and acquiring actual blackbody temperature data with a high-precision sensor, forming an actual temperature data set that can be compared with the simulation temperature data set.
S1.2, mixing the simulation temperature data set with the real temperature data set, and dividing the mixture into a training data set and a test data set in proportion.
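As a non-limiting illustration, the data mixing and splitting of step S1.2 may be sketched in Python as follows; the 80/20 ratio, the fixed seed and the function name are assumptions, since the text only states that the mixture is divided "in proportion":

```python
import random

def mix_and_split(sim_data, real_data, train_ratio=0.8, seed=0):
    """Mix simulated and measured temperature samples, then split in proportion.

    train_ratio=0.8 and seed=0 are illustrative assumptions.
    """
    mixed = list(sim_data) + list(real_data)
    # Shuffle so that both simulated and real samples appear in each split.
    random.Random(seed).shuffle(mixed)
    cut = int(len(mixed) * train_ratio)
    return mixed[:cut], mixed[cut:]

# Usage: 6 simulated + 4 measured samples -> 8 training and 2 test samples.
train_set, test_set = mix_and_split(range(6), range(100, 104))
```

Shuffling before the cut keeps the two data sources interleaved, so the test set also contains a mixture of simulated and real samples.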
S2, modeling the problem, namely applying reinforcement learning to parameter-free temperature control: the control problem is first defined as an agent-environment interaction problem in a Markov decision process (MDP). The parameter-free temperature control system is regarded as the environment, and the deep DQN network is regarded as the agent. The design of the system states, control actions and reward function is shown in fig. 2.
S2.1, defining a state space. In the deep DQN network, the more comprehensive the influencing factors considered in the state, the more complete the information received by the agent and the more accurate the prediction of the deep DQN network; the input state should therefore fully describe the current condition of the temperature control method. The state vector contains the current temperature T(t), the target temperature T_goal, the ambient temperature T_E and the temperature change rate ΔT(t), i.e., the state S = [T(t), T_goal, T_E, ΔT(t)], which comprehensively describes the current operating state of the temperature control method, see table 1.
TABLE 1 System state variables of the large-area source blackbody parameter-free temperature control method
S2.2, defining an action space, wherein the action A represents the duty ratio value selected by the agent of the deep DQN network. Further, as shown in fig. 3, the deep DQN network comprises a data input layer, a fully connected layer FC1, a fully connected layer FC2, an activation function ReLU1 corresponding to FC1, an activation function ReLU2 corresponding to FC2, a fully connected layer FC3 and an output layer.
The data input layer receives 128-dimensional input data.
The two fully connected layers (FC1 and FC2) and the corresponding activation functions (ReLU1 and ReLU2) perform feature extraction and nonlinear transformation on the input data.
The fully connected layer FC3 further processes the output of the first two layers to obtain the final 1-dimensional output.
The output layer outputs the final 1-dimensional result.
The deep DQN network first receives 128-dimensional input data, which is then feature-extracted and nonlinearly transformed by the two fully connected layers (FC1 and FC2) and the corresponding ReLU activation functions. Through the processing of these two hidden layers, the data is converted into a 128-dimensional intermediate representation. Finally, the deep DQN network uses the third fully connected layer FC3 to compress these features to a 1-dimensional output. The whole network follows a typical fully connected neural network architecture, obtaining the required output through multi-layer feature transformation. The action space is A = {10, 20, 30, ..., 100}, i.e., each time the agent selects a fixed duty ratio value according to the environmental state to adjust the power output of the large-area source black body parameter-free temperature control model. Through this discretized action design, the agent can precisely control the temperature rising process of the system within a certain range, thereby achieving accurate control of the target temperature.
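The FC1-ReLU1-FC2-ReLU2-FC3 stack described above may be sketched as a plain-NumPy forward pass; the 128-dimensional input and 1-dimensional output follow the text, while the hidden width of 128 and the weight initialization are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Elementwise ReLU activation."""
    return np.maximum(0.0, x)

class DQNet:
    """Sketch of the described network: input -> FC1 -> ReLU1 -> FC2 -> ReLU2 -> FC3.

    in_dim=128 and out_dim=1 come from the text; hidden=128 is an assumption.
    """
    def __init__(self, in_dim=128, hidden=128, out_dim=1):
        self.W1 = rng.normal(0, 0.1, (in_dim, hidden))   # FC1
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, hidden))   # FC2
        self.b2 = np.zeros(hidden)
        self.W3 = rng.normal(0, 0.1, (hidden, out_dim))  # FC3
        self.b3 = np.zeros(out_dim)

    def forward(self, x):
        h1 = relu(x @ self.W1 + self.b1)   # ReLU1 after FC1
        h2 = relu(h1 @ self.W2 + self.b2)  # ReLU2 after FC2, 128-dim intermediate
        return h2 @ self.W3 + self.b3      # FC3 compresses to the 1-dim output

net = DQNet()
out = net.forward(np.zeros((1, 128)))      # one 128-dimensional input sample
```

A batch of B inputs of shape (B, 128) yields an output of shape (B, 1), matching the multi-layer feature transformation described above.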
S2.3, the deep DQN network aims to enable the large-area source black body parameter-free temperature control model to learn to quickly and accurately bring the temperature to the target temperature T_goal in the fewest action steps. Meanwhile, considering the waste heat effect and the overshoot problem, the task is extended into a multi-objective optimal control problem, and the large-area source blackbody parameter-free temperature control model combines the temperature error, the waste heat effect, the control efficiency and an overshoot penalty into a single reward function.
When the temperature control has not reached the target temperature, the reward function is designed as:
R = −|T(t) − T_goal| − λ·N_step − γ·T_residual − δ·max(0, T(t) − T_goal)
wherein R represents the feedback reward of the environment for the current action; |T(t) − T_goal| represents the absolute deviation between the current temperature T(t) and the target temperature T_goal, penalizing inaccurate temperature regulation; N_step is the number of action steps taken by the agent, penalizing an excessive number of actions and encouraging the agent to achieve accurate temperature regulation in fewer steps; T_residual represents the additional temperature rise caused by the waste heat effect, quantifying the influence of waste heat on the large-area source black body parameter-free temperature control model; λ, γ and δ are the weight coefficients balancing the number of action steps, the waste heat influence and the overshoot penalty intensity, respectively; and max represents taking the maximum value. This ensures that the large-area source black body parameter-free temperature control model pursues temperature precision, reduces redundant actions as much as possible, and at the same time avoids the risk of overshoot.
If the temperature control reaches the target temperature T_goal, the reward function formula is R = R + C, wherein C is a fixed high reward value indicating that the task is complete and the temperature has reached exactly the desired target, motivating the agent to complete the task correctly. Through this reward function design, the temperature error of the large-area source blackbody parameter-free temperature control method is reduced, the overshoot phenomenon is strictly avoided, the efficiency of the large-area source blackbody parameter-free temperature control model is further improved, the target temperature is reached accurately in as few steps as possible, and the control efficiency of the large-area source blackbody parameter-free temperature control model is effectively improved.
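The piecewise reward described above may be sketched as follows; the weight values lam, gamma, delta, the fixed bonus C and the tolerance tol used to decide that the target has been reached are illustrative assumptions, not values given by the invention:

```python
def reward(T, T_goal, n_step, T_residual,
           lam=0.1, gamma=0.05, delta=1.0, C=100.0, tol=0.1):
    """Sketch of the reward: error, step, waste-heat and overshoot terms,
    plus a fixed bonus C once the target temperature is reached.

    lam, gamma, delta, C and tol are illustrative assumptions.
    """
    error = abs(T - T_goal)                  # temperature error term |T(t) - T_goal|
    overshoot = max(0.0, T - T_goal)         # overshoot penalty term max(0, T(t) - T_goal)
    R = -error - lam * n_step - gamma * T_residual - delta * overshoot
    if error <= tol:                         # target reached: R = R + C
        R += C
    return R
```

For example, overshooting the target by 1 degree with no waste heat and zero steps is penalized twice, once through the error term and once through the overshoot term.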
S3, constructing a large-area source blackbody parameter-free temperature control model based on a deep DQN network, and then training the model with the training data set, so that the agent can accurately adjust the duty ratio through an ε-greedy strategy based on the deep DQN network and realize efficient target temperature control in a minimum number of steps, comprising the following steps:
Agent decision and environment interaction: in each time step, the agent of the deep DQN network selects an action A according to the current state, adjusts the power of the large-area source black body parameter-free temperature control model according to the action, updates the environment state to obtain the next state S′, and calculates the instant reward R through feedback.
Experience replay and policy optimization: the interaction data (S, A, R, S′) are stored in an experience replay buffer, wherein S represents the current state, a comprehensive description of the current condition of the large-area source blackbody parameter-free temperature control model, such as the current temperature, target temperature, ambient temperature and temperature change rate; A represents the action selected by the agent in the current state, such as a specific duty ratio value (e.g., 10% or 20%); R represents the feedback reward of the environment for the current action, used to measure the quality of the action, which may relate to factors such as the temperature error, the number of action steps and the overshoot penalty; and S′ represents the next state after the action is executed, such as the updated temperature value and temperature change rate. By minimizing the error between the target Q value and the predicted Q value, the deep DQN network is optimized, and rapid and accurate temperature regulation is realized.
S3 specifically comprises the following steps:
S3.1, in the initialization stage of the large-area source blackbody parameter-free temperature control model, firstly, the parameters of the neural network are randomly initialized, providing initial weights for the subsequent learning process; then, an experience replay buffer is initialized, which is used to store the quadruples of state, action, reward and next state generated in the interaction process between the agent and the environment; the process then enters the action selection stage, where the large-area source blackbody parameter-free temperature control model selects an action in each state according to an ε-greedy strategy.
S3.2, an ε-greedy strategy is adopted in the action selection stage of the large-area source blackbody parameter-free temperature control model; under this strategy, the agent randomly selects an action A with probability ε to explore, thereby trying different strategies, and with probability 1 − ε selects the action A with the largest Q value in the current state, namely: A = argmax_A Q(S, A). In the latter case, the agent preferentially selects actions using the learned greedy policy, thereby maximizing the expected reward.
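The ε-greedy selection over the discrete duty-ratio actions may be sketched as follows; the toy Q table peaking at a 40% duty ratio is an illustrative assumption:

```python
import random

def epsilon_greedy(q_values, actions, epsilon, rng=random):
    """Sketch of epsilon-greedy selection: with probability epsilon explore a
    random action, otherwise exploit the action with the largest Q value."""
    if rng.random() < epsilon:
        return rng.choice(actions)                    # explore
    return max(actions, key=lambda a: q_values[a])    # exploit: argmax_a Q(S, a)

duty_cycles = list(range(10, 101, 10))       # action space A = {10, 20, ..., 100}
q = {a: -abs(a - 40) for a in duty_cycles}   # toy Q table with its maximum at 40%
```

With epsilon = 0 the agent always exploits and returns the 40% duty ratio here; with epsilon = 1 it always explores, returning a uniformly random element of the action space.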
S3.3, in the state updating stage of the large-area source blackbody parameter-free temperature control model, the agent calculates the influence of the duty ratio on the large-area source blackbody parameter-free temperature control according to the selected action A, updates the state of the environment, and obtains the next state S′:
S′ = [T(t+1), T_goal, T_E, ΔT(t+1)]
wherein T(t+1) is the updated temperature value, T_E is the ambient temperature, and ΔT(t+1) is the new temperature change rate. Meanwhile, the agent calculates an instant reward R according to the feedback of the environment and the set reward function; this reward reflects the effect of the current action on the large-area source blackbody parameter-free temperature control model. The interaction data (S, A, R, S′) are stored in the experience replay buffer M to provide data support for subsequent learning.
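The experience replay buffer M holding (S, A, R, S′) quadruples may be sketched as follows; the capacity of 10000 is an illustrative assumption:

```python
import random
from collections import deque

class ReplayBuffer:
    """Sketch of the experience replay buffer M storing (S, A, R, S') quadruples.

    capacity=10000 is an illustrative assumption; old entries are evicted
    automatically once the buffer is full.
    """
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        """Store one interaction quadruple."""
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw a random minibatch, breaking temporal correlation between samples."""
        return random.sample(list(self.buffer), batch_size)

# Usage: store one transition S = [T(t), T_goal, T_E, dT(t)] -> S' and sample it back.
M = ReplayBuffer()
M.push([300.0, 350.0, 25.0, 0.0], 40, -50.0, [305.0, 350.0, 25.0, 5.0])
batch = M.sample(1)
```

Sampling uniformly from M rather than replaying transitions in order is what reduces the correlation among training samples mentioned below.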
S3.4, in the action updating stage of the large-area source blackbody parameter-free temperature control model, the deep DQN network learns from past experience to minimize the error between the target Q value and the predicted Q value, so that the temperature control method is optimized and the reward is maximized.
The loss function Loss of the optimized deep DQN network is formulated as follows, and is used to train the deep DQN network to minimize the mean square error between the predicted value Q(s_t, a_t; θ) and the target value Q_target(s_t, a_t; θ⁻):
Loss(θ) = (1/N) · Σ_{i=1}^{N} (Q_target(s_i, a_i; θ⁻) − Q(s_i, a_i; θ))²
wherein Q(s_t, a_t; θ) is the output of the current deep DQN network based on the state s_t and the action a_t, θ is a parameter of the deep DQN network, and Q_target(s_t, a_t; θ⁻) is the target value calculated with θ⁻, a delayed-update version of the network parameters. The loss function is optimized over a batch of size N, and the average of the squared errors is calculated each time to adjust the parameter θ of the deep DQN network so as to gradually approximate the true value of each state-action pair. During training, historical data are sampled through the experience replay mechanism, which reduces the correlation among samples and improves the generalization capability of the method.
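The batched mean-square-error loss above, together with one common way of forming the target value from the delayed parameters θ⁻, may be sketched as follows; the discount factor and the bootstrapped target formula are assumptions, since the text does not spell out how Q_target is computed:

```python
import numpy as np

def dqn_loss(q_pred, q_target):
    """Mean square error over a batch of size N, as in the formula above:
    Loss = (1/N) * sum_i (Q_target_i - Q_pred_i)^2."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_target = np.asarray(q_target, dtype=float)
    return float(np.mean((q_target - q_pred) ** 2))

def td_target(r, q_next_max, discount=0.99):
    """One common (assumed) form of the target computed with theta^-:
    Q_target = r + discount * max_a' Q(s', a'; theta^-)."""
    return r + discount * q_next_max
```

In a training step, predictions Q(s_i, a_i; θ) come from the online network while q_next_max comes from the delayed network θ⁻, and the gradient of dqn_loss adjusts only θ.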
By continuously learning the heating, waste heat and cooling characteristics in the simulation and real data set, the deep DQN network can gradually master the optimal temperature control method, and quick and accurate temperature adjustment can be realized in practical application.
S4, verifying and adjusting, namely in the verification stage, evaluating the effect of the trained large-area source blackbody parameter-free temperature control model in practical application on the test data set, comparing the deviation between the predicted temperature and the target temperature, and adjusting the method parameters according to the result. The generalization capability of the model and its adaptability in the real environment are verified by comparing the simulation data with the actual data. Meanwhile, error analysis is carried out on the prediction deviation of the model, and the reward function and control strategy are adjusted to ensure that the model can run stably under changing environmental conditions. Finally, the performance of the model is optimized through strategy adjustment, so that the temperature control effect of the model in practical application meets expectations.
S5, iterative optimization, namely repeating the steps S3 and S4 until the deep DQN network can accurately adjust the temperature to the target temperature T_goal in a minimum number of steps while maintaining the stability and efficiency of the large-area source blackbody temperature control model. In the iterative optimization process, training and verification are carried out continuously, and the decision capability of the deep DQN network is improved through experience replay and the ε-greedy strategy, ensuring the robustness and optimality of the temperature control model. Finally, the agent can quickly and accurately adjust the temperature according to different environmental changes and requirements, and redundant operation steps are reduced as much as possible while the control precision is ensured, thereby improving the control efficiency and response speed of the model.
And finally, applying the optimal large-area source black body parameter-free temperature control model to a large-area source black body temperature control system to control the temperature of the large-area source black body.
Referring to fig. 5, the large-area source black body parameter-free temperature control device provided by the invention comprises one or more processors, and is used for implementing the large-area source black body parameter-free temperature control method of the embodiment.
The embodiment of the large-area source black body parameter-free temperature control device can be applied to any device with data processing capability, such as a computer. The apparatus embodiments may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the device with data processing capability reading the corresponding computer program instructions from a nonvolatile memory into memory. In terms of hardware, fig. 5 shows a hardware structure diagram of a device with data processing capability in which the large-area source black body parameter-free temperature control apparatus of the invention is located; in addition to the processor, memory, network interface and nonvolatile memory shown in fig. 5, the device in which the apparatus of the embodiment is located may, according to its actual function, further include other hardware, which is not described herein again.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The invention also provides a readable storage medium, on which a program is stored, which, when executed by a processor, implements the large-area source black body parameter-free temperature control method of the above embodiment.
The readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any of the data processing devices described in any of the previous embodiments. The readable storage medium may also be an external storage device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), an SD card or a flash memory card (Flash Card) provided on the device. Further, the readable storage medium may also include both an internal storage unit and an external storage device of any device with data processing capability. The readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used for temporarily storing data that has been output or is to be output.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.