Disclosure of Invention
The invention aims to provide a large-area source black body parameter-free temperature control method, device and medium, so as to solve the problems described in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a large-area source black body parameter-free temperature control method comprises the following steps:
S1, acquiring a data set, namely generating a simulation temperature data set through blackbody system simulation, constructing an actual temperature acquisition system, acquiring a real temperature data set, mixing the simulation temperature data set and the real temperature data set, and dividing the mixture into a training data set and a test data set in proportion;
S2, modeling the problem, namely modeling the parameter-free temperature control problem as a Markov decision process, defining the agent-environment interaction mechanism, and defining the relationships among states, actions and rewards;
S3, constructing a large-area source blackbody parameter-free temperature control model based on a deep DQN network, and training the large-area source blackbody parameter-free temperature control model with the training data set;
S4, verifying and adjusting, namely evaluating the large-area source black body parameter-free temperature control model with the test data set, comparing the deviation between the predicted temperature and the target temperature, and optimally adjusting the parameters of the model according to the verification result to obtain an optimal large-area source black body parameter-free temperature control model;
And S5, applying the optimal large-area source black body parameter-free temperature control model to a large-area source black body temperature control system to control the temperature of the large-area source black body.
Further, the step S1 includes the steps of:
S1.1, establishing a blackbody system simulation model, namely simulating the heating process of the blackbody system through the blackbody system simulation model to generate a simulation temperature data set covering different temperature ranges and heating rates;
Establishing an actual temperature data acquisition system, namely setting up a temperature acquisition system in an experimental environment and acquiring actual blackbody temperature data with a high-precision sensor, forming an actual temperature data set that can be compared with the simulation temperature data set;
S1.2, mixing the simulation temperature data set with the real temperature data set, and dividing the mixture into a training data set and a test data set in proportion.
Further, the step S2 includes the steps of:
S2.1, defining a state space, wherein the defined state space comprises the current temperature T(t), the target temperature T_goal, the ambient temperature T_E and the temperature change rate ΔT(t), namely the state S = [T(t), T_goal, T_E, ΔT(t)];
S2.2, defining an action space, wherein the action A represents a duty ratio value selected by the agent of the large-area source black body parameter-free temperature control algorithm, and the action space is A = {10, 20, 30, ..., 100};
S2.3, defining a reward function, wherein the reward function simultaneously considers the temperature error, the waste heat effect, the control efficiency and an overshoot penalty.
Further, the step S2.3 further comprises: when the temperature control has not reached the target temperature, the reward function is formulated as:
R = −|T(t) − T_goal| − λ·N_step − γ·T_residual − δ·max(0, T(t) − T_goal)
wherein R represents the feedback reward of the environment for the current action, |T(t) − T_goal| represents the absolute deviation between the current temperature T(t) and the target temperature T_goal, N_step represents the number of action steps taken by the agent, T_residual represents the additional temperature rise caused by the waste heat effect, λ, γ and δ are respectively the weight coefficients balancing the number of action steps, the waste heat effect and the overshoot penalty intensity, and max represents taking the maximum value;
If the temperature control reaches the target temperature T_goal, the reward function formula is R = R + C, wherein C is a fixed high reward value.
Further, the working process of the large-area source black body parameter-free temperature control model comprises the following steps:
S3.1, in the initialization stage of the large-area source blackbody parameter-free temperature control model, firstly, the parameters of the neural network are randomly initialized, and then an experience replay buffer is initialized, wherein the experience replay buffer is used for storing the quadruples of state, action, reward and next state (S, A, R, S′) generated in the interaction process between the agent and the environment;
S3.2, an ε-greedy strategy is adopted in the action selection stage of the large-area source blackbody parameter-free temperature control model; under the ε-greedy strategy, the agent randomly selects an action A with probability ε to explore, and with probability 1 − ε selects the action A with the largest Q value in the current state, namely: A = argmax_A Q(S, A);
S3.3, in the state updating stage of the large-area source blackbody parameter-free temperature control model, the agent calculates the influence of the duty ratio on the large-area source blackbody parameter-free temperature control model according to the selected action A, updates the state of the environment, and obtains the next state S′:
S′ = [T(t+1), T_goal, T_E, ΔT(t+1)]
wherein T(t+1) is the updated temperature value, T_E is the ambient temperature, and ΔT(t+1) is the new temperature change rate; meanwhile, the agent calculates an instant reward R according to the feedback of the environment and the set reward function;
S3.4, in the action updating stage of the large-area source black body parameter-free temperature control model, the deep DQN network learns from past experience to minimize the error between the target Q value and the predicted Q value, so that the large-area source black body parameter-free temperature control model is optimized and the reward is maximized;
the loss function Loss of the optimized deep DQN network is formulated as follows, and is used to train the deep DQN network to minimize the mean square error between the predicted value Q(s_t, a_t; θ) and the target value Q_target(s_t, a_t; θ⁻):
Loss(θ) = (1/N) · Σ_{i=1}^{N} (Q_target(s_i, a_i; θ⁻) − Q(s_i, a_i; θ))²
wherein Q(s_t, a_t; θ) is the output of the current deep DQN network based on the state s_t and the action a_t, θ is a parameter of the deep DQN network, Q_target(s_t, a_t; θ⁻) is the target value calculated with θ⁻, a delayed-update version of the network parameters; the loss function is optimized over a batch of size N, and the average of the squared errors is calculated each time to adjust the parameter θ of the deep DQN network.
The invention also provides a large-area source black body parameter-free temperature control device which comprises one or more processors and is used for realizing the large-area source black body parameter-free temperature control method.
The invention also provides a readable storage medium, on which a program is stored, which, when executed by a processor, implements the large-area source black body parameter-free temperature control method as described above.
Compared with the prior art, the invention has the beneficial effects that:
The invention is based on a deep DQN network and, through precise state and action design and an optimized reward function, can quickly and accurately adjust the temperature to the target value in a minimum number of action steps. Compared with traditional temperature control methods, the deep DQN network does not depend on predefined control parameters, but adaptively adjusts the heating duty ratio through environmental feedback, thereby improving the response speed and accuracy of the temperature control process. In addition, through experience replay and Q-value updating, the deep DQN network can continuously optimize the control strategy and adapt to complex environmental changes, ensuring that the temperature control system always maintains efficient and stable operation. According to the invention, no parameter input is needed; the algorithm learns the thermodynamic characteristics of the device through three data sets of heating, waste heat and cooling, thereby realizing temperature control. An overshoot penalty is added to the constraint conditions of the reward function of the temperature control algorithm to avoid overshoot in the temperature control process, thereby accelerating the temperature control rate.
Aiming at the problems of complicated parameter adjustment, slow response speed and insufficient precision of traditional control algorithms in complex temperature control scenarios, the invention provides a large-area source black body parameter-free temperature control method. The method is simple to operate and easy to realize in practical application, and can dynamically adjust the control strategy to significantly improve the temperature control rate. No complicated parameters need to be preset, and different temperature control requirements can be met solely by means of intelligent learning. The method therefore has an extremely wide application prospect and important practical significance.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without making any inventive effort fall within the scope of the invention.
Referring to fig. 1-4, a large-area source black body parameter-free temperature control method includes the following steps:
S1, acquiring a data set, generating a simulation temperature data set through blackbody system simulation, building an actual temperature acquisition system, acquiring a real temperature data set, mixing the simulation temperature data set and the real temperature data set, and dividing the mixture into a training data set and a test data set according to a proportion, thereby providing data support for model training and verification. The method specifically comprises the following steps:
S1.1, establishing a blackbody system simulation model, namely simulating the heating process of the blackbody system through the blackbody system simulation model, generating a simulation temperature data set covering different temperature ranges and heating rates, and providing basic data for subsequent model training and verification.
Establishing an actual temperature data acquisition system, namely setting up the temperature acquisition system in an experimental environment and acquiring actual blackbody temperature data with a high-precision sensor, forming an actual temperature data set that can be compared with the simulation temperature data set.
S1.2, mixing the simulation temperature data set with the real temperature data set, and dividing the mixture into a training data set and a test data set in proportion.
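As a non-limiting illustration, the data mixing and splitting of step S1.2 may be sketched in Python as follows; the 80/20 ratio, the fixed seed and the function name are assumptions, since the text only states that the mixture is divided "in proportion":

```python
import random

def mix_and_split(sim_data, real_data, train_ratio=0.8, seed=0):
    """Mix simulated and measured temperature samples, then split in proportion.

    train_ratio=0.8 and seed=0 are illustrative assumptions.
    """
    mixed = list(sim_data) + list(real_data)
    # Shuffle so that both simulated and real samples appear in each split.
    random.Random(seed).shuffle(mixed)
    cut = int(len(mixed) * train_ratio)
    return mixed[:cut], mixed[cut:]

# Usage: 6 simulated + 4 measured samples -> 8 training and 2 test samples.
train_set, test_set = mix_and_split(range(6), range(100, 104))
```

Shuffling before the cut keeps the two data sources interleaved, so the test set also contains a mixture of simulated and real samples.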
S2, modeling the problem, namely applying reinforcement learning to parameter-free temperature control: the control problem is first defined as an agent-environment interaction problem in a Markov decision process (MDP). The parameter-free temperature control system is regarded as the environment, and the deep DQN network is regarded as the agent. The design of the system states, control actions and reward function is shown in fig. 2.
S2.1, defining a state space. In the deep DQN network, the more comprehensive the influencing factors considered in the state, the more complete the information received by the agent and the more accurate the prediction of the deep DQN network; the input state should therefore fully describe the current condition of the temperature control method. The state vector contains the current temperature T(t), the target temperature T_goal, the ambient temperature T_E and the temperature change rate ΔT(t), i.e., the state S = [T(t), T_goal, T_E, ΔT(t)], which comprehensively describes the current operating state of the temperature control method, see table 1.
TABLE 1 System state variables of the large-area source blackbody parameter-free temperature control method
S2.2, defining an action space, wherein the action A represents the duty ratio value selected by the agent of the deep DQN network. Further, as shown in fig. 3, the deep DQN network comprises a data input layer, a fully connected layer FC1, a fully connected layer FC2, an activation function ReLU1 corresponding to FC1, an activation function ReLU2 corresponding to FC2, a fully connected layer FC3 and an output layer.
The data input layer receives 128-dimensional input data.
The two fully connected layers (FC1 and FC2) and the corresponding activation functions (ReLU1 and ReLU2) perform feature extraction and nonlinear transformation on the input data.
The fully connected layer FC3 further processes the output of the first two layers to obtain the final 1-dimensional output.
The output layer outputs the final 1-dimensional result.
The deep DQN network first receives 128-dimensional input data, which is then feature-extracted and nonlinearly transformed by the two fully connected layers (FC1 and FC2) and the corresponding ReLU activation functions. Through the processing of these two hidden layers, the data is converted into a 128-dimensional intermediate representation. Finally, the deep DQN network uses the third fully connected layer FC3 to compress these features to a 1-dimensional output. The whole network follows a typical fully connected neural network architecture, obtaining the required output through multi-layer feature transformation. The action space is A = {10, 20, 30, ..., 100}, i.e., each time the agent selects a fixed duty ratio value according to the environmental state to adjust the power output of the large-area source black body parameter-free temperature control model. Through this discretized action design, the agent can precisely control the temperature rising process of the system within a certain range, thereby achieving accurate control of the target temperature.
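The FC1-ReLU1-FC2-ReLU2-FC3 stack described above may be sketched as a plain-NumPy forward pass; the 128-dimensional input and 1-dimensional output follow the text, while the hidden width of 128 and the weight initialization are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Elementwise ReLU activation."""
    return np.maximum(0.0, x)

class DQNet:
    """Sketch of the described network: input -> FC1 -> ReLU1 -> FC2 -> ReLU2 -> FC3.

    in_dim=128 and out_dim=1 come from the text; hidden=128 is an assumption.
    """
    def __init__(self, in_dim=128, hidden=128, out_dim=1):
        self.W1 = rng.normal(0, 0.1, (in_dim, hidden))   # FC1
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, hidden))   # FC2
        self.b2 = np.zeros(hidden)
        self.W3 = rng.normal(0, 0.1, (hidden, out_dim))  # FC3
        self.b3 = np.zeros(out_dim)

    def forward(self, x):
        h1 = relu(x @ self.W1 + self.b1)   # ReLU1 after FC1
        h2 = relu(h1 @ self.W2 + self.b2)  # ReLU2 after FC2, 128-dim intermediate
        return h2 @ self.W3 + self.b3      # FC3 compresses to the 1-dim output

net = DQNet()
out = net.forward(np.zeros((1, 128)))      # one 128-dimensional input sample
```

A batch of B inputs of shape (B, 128) yields an output of shape (B, 1), matching the multi-layer feature transformation described above.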
S2.3, the deep DQN network aims to enable the large-area source black body parameter-free temperature control model to learn to quickly and accurately bring the temperature to the target temperature T_goal in the fewest action steps. Meanwhile, considering the waste heat effect and the overshoot problem, the task is extended into a multi-objective optimal control problem, and the large-area source blackbody parameter-free temperature control model combines the temperature error, the waste heat effect, the control efficiency and an overshoot penalty into a single reward function.
When the temperature control has not reached the target temperature, the reward function is designed as:
R = −|T(t) − T_goal| − λ·N_step − γ·T_residual − δ·max(0, T(t) − T_goal)
wherein R represents the feedback reward of the environment for the current action; |T(t) − T_goal| represents the absolute deviation between the current temperature T(t) and the target temperature T_goal, penalizing inaccurate temperature regulation; N_step is the number of action steps taken by the agent, penalizing an excessive number of actions and encouraging the agent to achieve accurate temperature regulation in fewer steps; T_residual represents the additional temperature rise caused by the waste heat effect, quantifying the influence of waste heat on the large-area source black body parameter-free temperature control model; λ, γ and δ are the weight coefficients balancing the number of action steps, the waste heat influence and the overshoot penalty intensity, respectively; and max represents taking the maximum value. This ensures that the large-area source black body parameter-free temperature control model pursues temperature precision, reduces redundant actions as much as possible, and at the same time avoids the risk of overshoot.
If the temperature control reaches the target temperature T_goal, the reward function formula is R = R + C, wherein C is a fixed high reward value indicating that the task is complete and the temperature has reached exactly the desired target, motivating the agent to complete the task correctly. Through this reward function design, the temperature error of the large-area source blackbody parameter-free temperature control method is reduced, the overshoot phenomenon is strictly avoided, the efficiency of the large-area source blackbody parameter-free temperature control model is further improved, the target temperature is reached accurately in as few steps as possible, and the control efficiency of the large-area source blackbody parameter-free temperature control model is effectively improved.
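The piecewise reward described above may be sketched as follows; the weight values lam, gamma, delta, the fixed bonus C and the tolerance tol used to decide that the target has been reached are illustrative assumptions, not values given by the invention:

```python
def reward(T, T_goal, n_step, T_residual,
           lam=0.1, gamma=0.05, delta=1.0, C=100.0, tol=0.1):
    """Sketch of the reward: error, step, waste-heat and overshoot terms,
    plus a fixed bonus C once the target temperature is reached.

    lam, gamma, delta, C and tol are illustrative assumptions.
    """
    error = abs(T - T_goal)                  # temperature error term |T(t) - T_goal|
    overshoot = max(0.0, T - T_goal)         # overshoot penalty term max(0, T(t) - T_goal)
    R = -error - lam * n_step - gamma * T_residual - delta * overshoot
    if error <= tol:                         # target reached: R = R + C
        R += C
    return R
```

For example, overshooting the target by 1 degree with no waste heat and zero steps is penalized twice, once through the error term and once through the overshoot term.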
S3, constructing a large-area source blackbody parameter-free temperature control model based on a deep DQN network, and then training the model with the training data set, so that the agent can accurately adjust the duty ratio through an ε-greedy strategy based on the deep DQN network and realize efficient target temperature control in a minimum number of steps, comprising the following steps:
Agent decision and environment interaction: in each time step, the agent of the deep DQN network selects an action A according to the current state, adjusts the power of the large-area source black body parameter-free temperature control model according to the action, updates the environment state to obtain the next state S′, and calculates the instant reward R through feedback.
Experience replay and policy optimization: the interaction data (S, A, R, S′) are stored in an experience replay buffer, wherein S represents the current state, a comprehensive description of the current condition of the large-area source blackbody parameter-free temperature control model, such as the current temperature, target temperature, ambient temperature and temperature change rate; A represents the action selected by the agent in the current state, such as a specific duty ratio value (e.g., 10% or 20%); R represents the feedback reward of the environment for the current action, used to measure the quality of the action, which may relate to factors such as the temperature error, the number of action steps and the overshoot penalty; and S′ represents the next state after the action is executed, such as the updated temperature value and temperature change rate. By minimizing the error between the target Q value and the predicted Q value, the deep DQN network is optimized, and rapid and accurate temperature regulation is realized.
S3 specifically comprises the following steps:
S3.1, in the initialization stage of the large-area source blackbody parameter-free temperature control model, firstly, the parameters of the neural network are randomly initialized, providing initial weights for the subsequent learning process; then, an experience replay buffer is initialized, which is used to store the quadruples of state, action, reward and next state generated in the interaction process between the agent and the environment; the process then enters the action selection stage, where the large-area source blackbody parameter-free temperature control model selects an action in each state according to an ε-greedy strategy.
S3.2, an ε-greedy strategy is adopted in the action selection stage of the large-area source blackbody parameter-free temperature control model; under this strategy, the agent randomly selects an action A with probability ε to explore, thereby trying different strategies, and with probability 1 − ε selects the action A with the largest Q value in the current state, namely: A = argmax_A Q(S, A). In the latter case, the agent preferentially selects actions using the learned greedy policy, thereby maximizing the expected reward.
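The ε-greedy selection over the discrete duty-ratio actions may be sketched as follows; the toy Q table peaking at a 40% duty ratio is an illustrative assumption:

```python
import random

def epsilon_greedy(q_values, actions, epsilon, rng=random):
    """Sketch of epsilon-greedy selection: with probability epsilon explore a
    random action, otherwise exploit the action with the largest Q value."""
    if rng.random() < epsilon:
        return rng.choice(actions)                    # explore
    return max(actions, key=lambda a: q_values[a])    # exploit: argmax_a Q(S, a)

duty_cycles = list(range(10, 101, 10))       # action space A = {10, 20, ..., 100}
q = {a: -abs(a - 40) for a in duty_cycles}   # toy Q table with its maximum at 40%
```

With epsilon = 0 the agent always exploits and returns the 40% duty ratio here; with epsilon = 1 it always explores, returning a uniformly random element of the action space.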
S3.3, in the state updating stage of the large-area source blackbody parameter-free temperature control model, the agent calculates the influence of the duty ratio on the large-area source blackbody parameter-free temperature control according to the selected action A, updates the state of the environment, and obtains the next state S′:
S′ = [T(t+1), T_goal, T_E, ΔT(t+1)]
wherein T(t+1) is the updated temperature value, T_E is the ambient temperature, and ΔT(t+1) is the new temperature change rate. Meanwhile, the agent calculates an instant reward R according to the feedback of the environment and the set reward function; this reward reflects the effect of the current action on the large-area source blackbody parameter-free temperature control model. The interaction data (S, A, R, S′) are stored in the experience replay buffer M to provide data support for subsequent learning.
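The experience replay buffer M holding (S, A, R, S′) quadruples may be sketched as follows; the capacity of 10000 is an illustrative assumption:

```python
import random
from collections import deque

class ReplayBuffer:
    """Sketch of the experience replay buffer M storing (S, A, R, S') quadruples.

    capacity=10000 is an illustrative assumption; old entries are evicted
    automatically once the buffer is full.
    """
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        """Store one interaction quadruple."""
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw a random minibatch, breaking temporal correlation between samples."""
        return random.sample(list(self.buffer), batch_size)

# Usage: store one transition S = [T(t), T_goal, T_E, dT(t)] -> S' and sample it back.
M = ReplayBuffer()
M.push([300.0, 350.0, 25.0, 0.0], 40, -50.0, [305.0, 350.0, 25.0, 5.0])
batch = M.sample(1)
```

Sampling uniformly from M rather than replaying transitions in order is what reduces the correlation among training samples mentioned below.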
S3.4, in the action updating stage of the large-area source blackbody parameter-free temperature control model, the deep DQN network learns from past experience to minimize the error between the target Q value and the predicted Q value, so that the temperature control method is optimized and the reward is maximized.
The loss function Loss of the optimized deep DQN network is formulated as follows, and is used to train the deep DQN network to minimize the mean square error between the predicted value Q(s_t, a_t; θ) and the target value Q_target(s_t, a_t; θ⁻):
Loss(θ) = (1/N) · Σ_{i=1}^{N} (Q_target(s_i, a_i; θ⁻) − Q(s_i, a_i; θ))²
wherein Q(s_t, a_t; θ) is the output of the current deep DQN network based on the state s_t and the action a_t, θ is a parameter of the deep DQN network, and Q_target(s_t, a_t; θ⁻) is the target value calculated with θ⁻, a delayed-update version of the network parameters. The loss function is optimized over a batch of size N, and the average of the squared errors is calculated each time to adjust the parameter θ of the deep DQN network so as to gradually approximate the true value of each state-action pair. During training, historical data are sampled through the experience replay mechanism, which reduces the correlation among samples and improves the generalization capability of the method.
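The batched mean-square-error loss above, together with one common way of forming the target value from the delayed parameters θ⁻, may be sketched as follows; the discount factor and the bootstrapped target formula are assumptions, since the text does not spell out how Q_target is computed:

```python
import numpy as np

def dqn_loss(q_pred, q_target):
    """Mean square error over a batch of size N, as in the formula above:
    Loss = (1/N) * sum_i (Q_target_i - Q_pred_i)^2."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_target = np.asarray(q_target, dtype=float)
    return float(np.mean((q_target - q_pred) ** 2))

def td_target(r, q_next_max, discount=0.99):
    """One common (assumed) form of the target computed with theta^-:
    Q_target = r + discount * max_a' Q(s', a'; theta^-)."""
    return r + discount * q_next_max
```

In a training step, predictions Q(s_i, a_i; θ) come from the online network while q_next_max comes from the delayed network θ⁻, and the gradient of dqn_loss adjusts only θ.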
By continuously learning the heating, waste heat and cooling characteristics in the simulation and real data set, the deep DQN network can gradually master the optimal temperature control method, and quick and accurate temperature adjustment can be realized in practical application.
S4, verifying and adjusting, namely in the verification stage, evaluating the effect of the trained large-area source blackbody parameter-free temperature control model in practical application on the test data set, comparing the deviation between the predicted temperature and the target temperature, and adjusting the method parameters according to the result. The generalization capability of the model and its adaptability in the real environment are verified by comparing the simulation data with the actual data. Meanwhile, error analysis is carried out on the prediction deviation of the model, and the reward function and control strategy are adjusted to ensure that the model can run stably under changing environmental conditions. Finally, the performance of the model is optimized through strategy adjustment, so that the temperature control effect of the model in practical application meets expectations.
S5, iterative optimization, namely repeating the steps S3 and S4 until the deep DQN network can accurately adjust the temperature to the target temperature T_goal in a minimum number of steps while maintaining the stability and efficiency of the large-area source blackbody temperature control model. In the iterative optimization process, training and verification are carried out continuously, and the decision capability of the deep DQN network is improved through experience replay and the ε-greedy strategy, ensuring the robustness and optimality of the temperature control model. Finally, the agent can quickly and accurately adjust the temperature according to different environmental changes and requirements, and redundant operation steps are reduced as much as possible while the control precision is ensured, thereby improving the control efficiency and response speed of the model.
And finally, applying the optimal large-area source black body parameter-free temperature control model to a large-area source black body temperature control system to control the temperature of the large-area source black body.
Referring to fig. 5, the large-area source black body parameter-free temperature control device provided by the invention comprises one or more processors, and is used for implementing the large-area source black body parameter-free temperature control method of the embodiment.
The embodiment of the large-area source black body parameter-free temperature control device can be applied to any device with data processing capability, such as a computer. The apparatus embodiments may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the device with data processing capability reading the corresponding computer program instructions from a nonvolatile memory into memory. In terms of hardware, fig. 5 shows a hardware structure diagram of a device with data processing capability in which the large-area source black body parameter-free temperature control apparatus of the invention is located; in addition to the processor, memory, network interface and nonvolatile memory shown in fig. 5, the device in which the apparatus of the embodiment is located may, according to its actual function, further include other hardware, which is not described herein again.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The invention also provides a readable storage medium, on which a program is stored, which, when executed by a processor, implements the large-area source black body parameter-free temperature control method of the above embodiment.
The readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any of the data processing devices described in any of the previous embodiments. The readable storage medium may also be an external storage device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), an SD card or a flash memory card (Flash Card) provided on the device. Further, the readable storage medium may also include both an internal storage unit and an external storage device of any device with data processing capability. The readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used for temporarily storing data that has been output or is to be output.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.