
CN113888327B - Energy internet transaction method and system based on reinforcement learning block chain enabling - Google Patents


Info

Publication number
CN113888327B
CN113888327B CN202111164320.7A CN202111164320A CN113888327B CN 113888327 B CN113888327 B CN 113888327B CN 202111164320 A CN202111164320 A CN 202111164320A CN 113888327 B CN113888327 B CN 113888327B
Authority
CN
China
Prior art keywords
energy
retailer
blockchain
price
utility
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111164320.7A
Other languages
Chinese (zh)
Other versions
CN113888327A (en
Inventor
曹一凡
仇超
任晓旭
王晓飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202111164320.7A
Publication of CN113888327A
Application granted
Publication of CN113888327B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an energy internet transaction method and system based on reinforcement learning blockchain enabling, comprising the following steps: constructing a three-stage game model among an operator, retailers and prosumers based on their energy trading relationships on a blockchain trading platform; solving the game equilibrium point of the three-stage game model with a distributed hierarchical policy gradient algorithm, the game equilibrium point comprising the optimal unit service price, the optimal unit energy price and the optimal energy demand; and conducting energy transactions among the operator, retailers and prosumers according to the game equilibrium point. The invention helps the operator and retailers achieve higher utility, while prosumers also achieve good utility.

Description

Energy internet transaction method and system based on reinforcement learning block chain enabling
Technical Field
The invention belongs to the technical field of energy Internet, and particularly relates to an energy Internet transaction method and system based on reinforcement learning blockchain enabling.
Background
With the trend toward distributed energy, the energy internet (EI) is rapidly becoming a focus of attention. However, because distributed energy sources are intermittent and uncertain, the influx of large numbers of such sources, combined with traditional control methods, has hampered the development of the energy internet. The advent of software-defined networking (SDN) brings the reliability and flexibility needed to address these issues. Owing to reasonable prices and efficient transmission, distributed energy markets are emerging in the energy internet. Such a market allows traditional energy consumers to become energy retailers with the capability to produce, store and sell distributed energy, a mode that can reduce power transmission loss and lower load peaks in the energy internet.
On the other hand, with the rise of the Internet of Things, edge computing is widely applied in various network computing architectures owing to its advantages in network latency, scalability and reliability. To keep providing reliable computing, storage and communication services, the energy utilization and supply of devices such as edge servers and gateways is an urgent problem to be explored.
To meet the energy demands of both the emerging energy retailers and the various edge devices in the energy internet, an energy trading market serving edge computing needs to be constructed. Although this model can effectively meet the demands of both parties, several problems remain: (1) credit crisis: a lack of trust among different trading entities prevents energy trading from being conducted reliably; (2) imperfect market modeling: existing energy trading markets model each role incompletely, so no trading process with mutual interaction and mutual constraint is formed; and (3) unbalanced utility: the optimization mechanisms in existing energy trading focus on maximizing the utility of one party and do not consider utility balance among multiple parties. In addition, most current research uses game theory to model the interactions between parties in the transaction process in order to achieve utility balance. Conventional approaches typically assume a centralized organization that collects users' information and helps them develop policies; these are targeted optimizations built on complete information and ignore the protection of users' private parameters. In practice, complete information about an individual, particularly private parameters, is hard to obtain, so information collection becomes difficult and the conventional approaches cannot be applied.
Disclosure of Invention
Aiming at the problem that edge-computing-based energy trading in the prior art cannot achieve both user privacy protection and utility balance, the invention provides an energy internet transaction method and system based on reinforcement learning blockchain enabling. To solve this technical problem, the invention adopts the following technical scheme:
An energy internet transaction method based on reinforcement learning blockchain enabling comprises the following steps:
S1, constructing a three-stage game model among an operator, a retailer and a prosumer based on the energy trading relationship of the operator, the retailer and the prosumer on a blockchain trading platform;
S2, solving the game equilibrium point of the three-stage game model by using a distributed hierarchical policy gradient algorithm, wherein the game equilibrium point comprises an optimal unit service price, an optimal unit energy price and an optimal energy demand;
S3, the operator, the retailer and the prosumer conduct energy transactions according to the game equilibrium point obtained in step S2.
Step S2 includes the following steps:
S2.1, setting the network parameters of the three-stage game model;
S2.2, initializing the weight parameters of the three-stage game model;
S2.3, acquiring the state of the operator, the state of the retailer and the state of each prosumer respectively; using a Markov decision process, each edge server in the blockchain trading platform in turn selects a suitable unit service price $\eta$, with the operator utility $U_o(\eta)$ as the reward function, to maximize $U_o(\eta)$; a suitable unit energy price $p$, with the retailer utility $U_r(p)$ as the reward function, to maximize $U_r(p)$; and a suitable energy demand $q_j$, with the respective prosumer utility as the reward function, to maximize the prosumer utility.
In step S2.3, the state of the operator is given by an expression in which $p_{t-1}$ represents the unit energy price at step $t-1$ and the demand term represents the energy demand submitted by the prosumer to the local retailer through edge server $j$ at step $t-1$.
The maximization of the operator utility $U_o(\eta)$ is given by an expression in which $U_m$ represents the extra reward the operator obtains for the trusted blockchain service it provides in each energy transaction, $\phi$ represents the transmission loss rate, $C_t$ represents the unit transmission cost, $C_o$ represents the fixed operation and maintenance cost, $\eta_{min}$ represents the lowest unit service price, $\eta_{max}$ represents the highest unit service price, $q_j$ represents the energy demand submitted by the prosumer to the local retailer through edge server $j$, and $N$ represents the set of all edge servers in the blockchain trading platform.
The bonus $U_m$ is calculated as:
$$U_m = (R_f + r s)\lambda$$
where $R_f$ represents a fixed block reward, $r$ represents the blockchain service fee paid to the operator by the prosumer in each energy transaction, $s$ represents a block parameter, and $\lambda$ represents a probability factor in the blockchain.
The state of the retailer is given by an expression in which $\eta_t$ denotes the unit service price at step $t$ and the demand term represents the energy demand submitted by the prosumer to the local retailer through edge server $j$ at step $t-1$.
The maximization of the retailer utility $U_r(p)$ is given by an expression in which $C_g$ represents the production cost the retailer bears to produce energy, $C_s$ represents the storage cost the retailer bears to store energy, $p_{min}$ represents the lowest unit energy price, $p_{max}$ represents the highest unit energy price, $q_j$ represents the energy demand submitted by the prosumer to the local retailer through edge server $j$, and $N$ represents the set of all edge servers in the blockchain trading platform.
The production cost $C_g$ is given by a formula in which $a$, $b$ and $k$ are weighting factors of the retailer's power generation cost and $\phi$ represents the transmission loss rate.
The storage cost $C_s$ is given by a formula in which $c_s$ represents the unit cost of the retailer's energy storage, $\zeta_c$ represents the charging efficiency of the energy storage device, and $\zeta_d$ represents the discharging efficiency of the energy storage device.
The state of the prosumer and the maximization of the prosumer utility are each given by an expression; in the latter, $\delta$ represents a conversion factor, $w_j$ represents the usage scenario of edge server $j$ in terms of energy utilization, $q_{min}$ represents the minimum energy demand, $q_{max}$ represents the maximum energy demand, $q_j$ represents the energy demand submitted by the prosumer to the local retailer through edge server $j$, and $r$ represents the blockchain service fee paid to the operator by the prosumer in each energy transaction.
An energy internet transaction system based on reinforcement learning blockchain enabling comprises an energy application layer, an energy data layer and an edge control layer. The energy application layer interacts with the edge control layer through a smart contract interface, and the energy data layer interacts with the edge control layer. The energy application layer comprises retailers and prosumers, who interact through a blockchain trading platform. The edge control layer comprises edge servers and distributed SDN controllers maintained by the operator, and each edge server serves as a node in the blockchain trading platform. The energy data layer comprises switches and energy routers; each switch is connected with the distributed SDN controllers, receives the scheduling instructions they send and forwards them to the corresponding energy router, and each energy router senses the state of the energy lines and reports it to the edge servers.
The smart contract interface is built on a smart contract system comprising a user registration module, an energy transaction module, an energy transmission module, an energy recording module and an information query module. After the operator, the prosumer and the retailer each register an account through the user registration module, the prosumer places orders through the energy transaction module according to its own needs; energy is transmitted from the retailer to the prosumer through the energy transmission module; the energy recording module records the electricity information of the retailer and the prosumer; and the information query module lets each party query its own account information.
The beneficial effects of the invention are as follows:
The hierarchical design of the actions and learning processes of the multiple agents helps each agent learn its own strategy in response to competing strategies, which improves the overall performance of the transaction system. Compared with popular deep reinforcement learning algorithms, the method helps the operator and retailers achieve higher utility, while prosumers also achieve good utility. Under the unified pricing mechanism, the convergence order of the different entities is consistent with the action order of the three stages of the Stackelberg game, so the leader in the game is more likely than a follower to obtain better benefits.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a transaction system according to the present invention.
FIG. 2 is a schematic diagram of the three-stage Stackelberg game model.
FIG. 3 is a block diagram of the hierarchical policy gradient algorithm.
FIG. 4 is a block diagram of the smart contract system.
FIG. 5 shows the convergence of the game under HDPG.
FIG. 6 is a performance comparison of various algorithms.
FIG. 7 is a schematic diagram of utility under different parameters.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
Embodiment 1: An energy internet transaction method based on reinforcement learning blockchain enabling, comprising the following steps:
S1, constructing a three-stage game model among an operator, a retailer and a prosumer based on the energy trading relationship of the operator, the retailer and the prosumer on a blockchain trading platform;
As shown in FIG. 2 and FIG. 3, the three-stage game model includes three policy networks corresponding to the operator, the retailer and the prosumer, respectively. The prosumer submits energy demands to the operator through the blockchain trading platform, and the operator, acting as an intermediary, assists in completing energy transactions and transmission between the retailer and the prosumer through the platform. The blockchain trading platform is composed of a plurality of edge servers; each edge server serves as a node in the platform and performs accounting, broadcasting, verification and consensus functions. The set of edge servers is represented as N= {1, 2.
S2, solving the game equilibrium point of the three-stage game model by using the distributed hierarchical policy gradient algorithm (Hierarchical Distributed Policy Gradient, HDPG);
Considering a Stackelberg game under incomplete information, the game equilibrium point of the three-stage game model is solved by using a Markov decision process, where one Markov decision process corresponds to the whole process of one energy transaction and transmission. The game equilibrium point comprises an optimal unit service price, an optimal unit energy price and an optimal energy demand, so that the overall benefit is maximized under the condition that the operator utility $U_o(\eta)$, the retailer utility $U_r(p)$ and the prosumer utility are each maximized. Here $\eta$ represents the unit service price, $p$ represents the unit energy price, $q_j$ represents the energy demand submitted by the prosumer to the local retailer through edge server $j$, and $j \in N$.
Step S2 includes the following steps:
S2.1, setting the network parameters of the three policy networks;
The network parameters include the learning rate $\alpha_r$ of the retailer policy network, the learning rate $\alpha_p$ of the prosumer policy network, the learning rate $\alpha_o$ of the operator policy network, and a discount factor $\gamma$.
S2.2, randomly initializing the weight parameters of the three policy networks;
S2.3, acquiring the state of the operator, the state of the retailer and the state of each prosumer respectively; each edge server in turn selects a suitable unit service price $\eta$, with the operator utility $U_o(\eta)$ as the reward function, to maximize $U_o(\eta)$; a suitable unit energy price $p$, with the retailer utility $U_r(p)$ as the reward function, to maximize $U_r(p)$; and a suitable energy demand, with the respective prosumer utility as the reward function, to maximize the prosumer utility, where $\eta_t \in A_o$ and $A_o$ represents the operator's action space, $p_t \in A_r$ and $A_r$ represents the retailer's action space, and $A_p$ represents the action space of the prosumer;
The state of the operator is given by an expression in which $p_{t-1}$ represents the unit energy price at step $t-1$ and the demand term represents the energy demand submitted by the prosumer to the local retailer through edge server $j$ at step $t-1$.
The maximization of the operator utility $U_o(\eta)$ is given by an expression in which $U_m$ represents the extra reward the operator obtains for the trusted blockchain service it provides in each energy transaction, $\phi$ represents the transmission loss rate, $C_t$ represents the unit transmission cost, $C_o$ represents the fixed operation and maintenance cost, $\eta_{min}$ represents the lowest unit service price, and $\eta_{max}$ represents the highest unit service price.
The bonus $U_m$ is calculated as:
$$U_m = (R_f + r s)\lambda$$
where $R_f$ represents a fixed block reward, $r$ represents the blockchain service fee paid to the operator by the prosumer in each energy transaction, $s$ represents a block parameter, and $\lambda$ represents a probability factor in the blockchain. The bonus $U_m$ further motivates the operator to maintain the blockchain.
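The bonus formula can be checked numerically with a minimal sketch such as the following. The function name, argument names and sample values are illustrative assumptions and do not come from the patent; only the relation $U_m = (R_f + r s)\lambda$ is taken from the text.

```python
def blockchain_bonus(fixed_block_reward, service_fee, block_parameter, probability_factor):
    """Return U_m = (R_f + r * s) * lambda.

    Argument names are illustrative; the patent only names the symbols
    R_f, r, s and lambda.
    """
    return (fixed_block_reward + service_fee * block_parameter) * probability_factor


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    print(blockchain_bonus(fixed_block_reward=2.0, service_fee=0.5,
                           block_parameter=4.0, probability_factor=0.25))
```

With these sample values the fee term $r s$ equals the fixed reward, so the bonus is simply the probability factor times twice the fixed reward.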
The state of the retailer is given by an expression in terms of the unit service price $\eta_t$ at step $t$ and the energy demand submitted by the prosumer to the local retailer through edge server $j$ at step $t-1$.
The retailer utility $U_r(p)$ is determined by the unit energy price $p$ and the total energy demand. Its maximization is given by an expression in which $C_g$ represents the production cost the retailer bears to produce energy, $C_s$ represents the storage cost the retailer bears to store energy, $p_{min}$ represents the lowest unit energy price, and $p_{max}$ represents the highest unit energy price.
The production cost $C_g$ is given by a formula in which $a$, $b$ and $k$ are weighting factors of the power generation cost.
The storage cost $C_s$ is given by a formula in which $c_s$ represents the unit cost of the retailer's energy storage, $\zeta_c$ represents the charging efficiency of the energy storage device, $\zeta_d$ represents the discharging efficiency of the energy storage device, and the remaining terms represent the energy the retailer actually needs to produce and the energy stored, taking into account the energy lost during transmission.
The state of the prosumer and the maximization of the prosumer utility are each given by an expression; in the latter, $\delta$ denotes a conversion factor, $w_j$ denotes the usage scenario of edge server $j$ in terms of energy utilization, $q_{min}$ denotes the minimum energy demand, $q_{max}$ denotes the maximum energy demand, $p q_j$ denotes the payment made by the prosumer to the retailer for energy, and the remaining term represents the benefit that edge server $j$ derives from its purchased energy in actual production.
The specific flow of the Markov decision process is prior art and is not described again here. The time complexities of the outer loop (the total number of iterations of the Markov decision process), the middle loop (the number of iteration steps of each policy network) and the inner loop (the number of prosumers in the prosumer policy network) are $O(E)$, $O(T)$ and $O(N)$, respectively. Each policy network includes two fully connected layers; in the expression for the time complexity of each fully connected layer, $K_l$ refers to the number of fully connected neural units and $L$ represents the number of layers of a policy network. Because each policy network contains two fully connected layers to generate policies, the overall time complexity of the algorithm is $O(ETN(T_f))$.
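The nesting of the three loops and the leader-to-follower action order can be pictured with the following minimal sketch. It is not the patented HDPG algorithm: the three `toy_*` policies are hypothetical stand-ins for the policy networks, the initial price and demands are placeholders, and the assumption that each prosumer observes only the current unit energy price is ours, since the prosumer state expression is not reproduced in this text.

```python
def toy_operator_policy(prev_price, prev_demands):
    # Hypothetical rule: unit service price eta_t from p_{t-1} (demands unused here).
    return 0.1 * prev_price


def toy_retailer_policy(service_price, prev_demands):
    # Hypothetical rule: unit energy price p_t from eta_t (demands unused here).
    return 1.0 + service_price


def toy_prosumer_policy(j, price):
    # Hypothetical rule: energy demand of edge server j from the current price.
    return 2.0 / price


def run_three_stage_loop(num_episodes, num_steps, num_prosumers,
                         operator_policy, retailer_policy, prosumer_policy):
    history = []
    prosumer_decisions = 0
    for _ in range(num_episodes):                    # outer loop, O(E)
        price = 1.0                                  # placeholder initial price
        demands = [1.0] * num_prosumers              # placeholder initial demands
        for _ in range(num_steps):                   # middle loop, O(T)
            service_price = operator_policy(price, demands)   # stage 1: operator
            price = retailer_policy(service_price, demands)   # stage 2: retailer
            new_demands = []
            for j in range(num_prosumers):           # inner loop, O(N)
                new_demands.append(prosumer_policy(j, price))  # stage 3: prosumer
                prosumer_decisions += 1
            demands = new_demands
            history.append((service_price, price, list(demands)))
    return history, prosumer_decisions


if __name__ == "__main__":
    trace, count = run_three_stage_loop(2, 3, 4, toy_operator_policy,
                                        toy_retailer_policy, toy_prosumer_policy)
    print(count, trace[0])
```

The counter returned alongside the history grows as $E \cdot T \cdot N$, which is the loop part of the complexity stated above; the per-layer network cost is deliberately left out of the sketch.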
The operator policy network updates its weight parameter $\theta_o$ by a formula whose terms are the policy of the operator at step $t$, the reward of the operator at step $t$ and the state of the operator at step $t$; the state belongs to the operator's state space $S_o$, and $\alpha_o$ represents the learning rate of the operator policy network.
The retailer policy network updates its weight parameter $\theta_r$ by a formula whose terms are the policy of the retailer at step $t$, the reward of the retailer at step $t$ and the state of the retailer at step $t$; the state belongs to the retailer's state space $S_r$, and $\alpha_r$ represents the learning rate of the retailer policy network.
The prosumer policy network updates its weight parameters by a formula whose terms are the policy of the prosumer at step $t$, the reward of the prosumer at step $t$ and the state of the prosumer at step $t$; the state belongs to the prosumer's state space $S_p$, and $\alpha_p$ represents the learning rate of the prosumer policy network.
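The update formulas themselves are not reproduced in this text. As a rough illustration of what a policy gradient weight update driven by a policy, a reward, a state and a learning rate can look like, the sketch below uses a standard REINFORCE-style step, $\theta \leftarrow \theta + \alpha \, r_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)$, with a one-parameter Gaussian policy. The choice of REINFORCE, the Gaussian policy, the toy reward and every name below are assumptions made for illustration, not the patent's formula.

```python
import random


def grad_log_gaussian(theta, state, action, sigma):
    """Derivative w.r.t. theta of log N(action; mean=theta*state, std=sigma)."""
    return (action - theta * state) * state / (sigma ** 2)


def policy_gradient_step(theta, state, action, reward, learning_rate, sigma):
    """One ascent step: theta + alpha * reward * grad log pi(action | state)."""
    return theta + learning_rate * reward * grad_log_gaussian(theta, state, action, sigma)


def train(num_steps=4000, learning_rate=0.005, sigma=0.5, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    state = 1.0                                    # fixed toy state
    for _ in range(num_steps):
        action = rng.gauss(theta * state, sigma)   # sample an action (e.g. a price)
        reward = -(action - 2.0) ** 2              # hypothetical utility, peaked at 2.0
        theta = policy_gradient_step(theta, state, action, reward, learning_rate, sigma)
    return theta


if __name__ == "__main__":
    print(train())
```

In the hierarchical setting described above, each of the three policy networks would apply its own step with its own learning rate ($\alpha_o$, $\alpha_r$, $\alpha_p$) and its own utility as the reward. With this toy reward the parameter tends to drift toward the peak, although the exact trajectory depends on the seed and step size.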
In this embodiment, prosumers are distributed energy users who cannot produce energy themselves or whose production cannot meet their consumption; they can purchase energy from retailers of public energy companies or BSDEI according to their energy demand and the unit energy price.
Retailers are energy users with distributed power generation and energy storage devices whose total generation exceeds their total consumption; they profit by supplying energy to various distributed applications. On the other hand, they bear the costs of distributed power generation and energy storage, and pay the operator for transmission routing services.
The operator is an intermediary between the prosumer and the retailer that assists in completing the energy trading process. To provide more convenient service with lower latency, the operator deploys hardware such as edge servers and distributed SDN controllers on the edge control layer, enabling edge-to-edge coordination among devices. In return, it charges the retailer for transmission routing services and the prosumer for trusted blockchain services.
Embodiment 2: An energy internet transaction system based on reinforcement learning blockchain enabling. As shown in FIG. 1, the system comprises an energy application layer, an energy data layer and an edge control layer, which together form the energy trading service system of a distributed energy market. The three layers are independent of yet related to one another, and decouple energy routing from scheduling control in the blockchain-enabled energy internet (Blockchain-Assisted Energy Internet, BEI). The energy application layer interacts with the edge control layer through a smart contract interface, and the energy data layer interacts with the edge control layer through the standard interface OpenFlow.
The energy application layer comprises retailers and prosumers, who exchange information directly through the blockchain trading platform. This direct trading process encourages retailers and prosumers to participate more actively in the distributed energy market, and the blockchain trading platform provides a reliable and stable third-party service platform for energy trading in the energy application layer.
The edge control layer comprises edge servers and distributed SDN controllers maintained by the operator. The distributed SDN controllers schedule and control the energy routing of the energy data layer. Each edge server serves as a node in the blockchain trading platform and performs accounting, broadcasting, verification and consensus functions; the set of edge servers is expressed as $N = \{1, 2, \ldots, j, \ldots, N'\}$. Smart contracts provide reliable and automatic process control for energy trading on the blockchain trading platform.
The energy data layer comprises switches and energy routers. Each switch receives scheduling instructions from the distributed SDN controllers of the edge control layer and sends them to the corresponding energy router. Each energy router senses the state of the energy line, including the electric energy value, voltage and current on the line, and reports the real-time state to the edge servers, which helps the distributed SDN controllers revise their scheduling instructions. In addition, the energy router may receive a command from a distributed SDN controller to change its own state.
As shown in FIG. 4, the smart contract interface is built on a smart contract system comprising a user registration module, an energy transaction module, an energy transmission module, an energy recording module and an information query module. After the operator, the prosumer and the retailer each register an account through the user registration module, the prosumer places orders through the energy transaction module according to its own needs; the energy transmission module makes energy flow from the retailer to the prosumer; the energy recording module records the electricity information of the retailer and the prosumer; and the information query module lets each party query its own account information.
Specifically, in the user registration module, the smart contract deployer is the initial administrator of the transaction system and initializes some common parameters. Each participant registers an account through the user registration module with a user name, an account address and an account type. The information for each account, including available energy and energy coins and the total energy generated and used, is then initialized. To address the cold start problem, participants can acquire energy coins through the energy transaction module. The prosumer then confirms the exact amount of energy required and places an order; the energy transaction module completes the transaction, and the whole energy transaction proceeds according to the method of Embodiment 1. The energy transaction module verifies the authority of the account and ensures that the balance is sufficient, then invokes the energy transmission module, which both settles the transaction between the retailer and the operator and activates the energy scheduling switch so that energy flows from the retailer to the prosumer. The energy recording module updates the total generated and total used electricity according to smart meter data at the premises of the retailer and the prosumer, and costs are then accounted according to the electricity quantities. The information query module provides six types of interfaces for participants to query their own account information.
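The interplay of the five modules can be pictured with the following in-memory sketch. It is not on-chain contract code and not the patent's implementation: the class name, method names, the `deposit` helper for the cold start step and the simplified settlement (the prosumer pays the retailer for energy and pays the operator a service fee) are all illustrative assumptions.

```python
class ToyEnergyContract:
    """In-memory stand-in for the five modules; illustrative only."""

    def __init__(self):
        self.accounts = {}

    def register(self, name, address, account_type):
        # User registration module: create the account with zeroed balances.
        if address in self.accounts:
            raise ValueError("account already registered")
        self.accounts[address] = {
            "name": name, "type": account_type,
            "energy": 0.0, "coins": 0.0,
            "generated": 0.0, "used": 0.0,
        }

    def deposit(self, address, energy=0.0, coins=0.0):
        # Hypothetical helper for the cold start step (initial coins or energy).
        self.accounts[address]["energy"] += energy
        self.accounts[address]["coins"] += coins

    def place_order(self, prosumer, retailer, operator, amount, unit_price, service_fee):
        # Energy transaction module: check account types and balances first.
        buyer, seller = self.accounts[prosumer], self.accounts[retailer]
        if buyer["type"] != "prosumer" or seller["type"] != "retailer":
            raise PermissionError("wrong account type for this order")
        cost = amount * unit_price + service_fee
        if buyer["coins"] < cost or seller["energy"] < amount:
            raise ValueError("insufficient balance")
        self._transmit(buyer, seller, self.accounts[operator],
                       amount, unit_price, service_fee)

    def _transmit(self, buyer, seller, operator, amount, unit_price, service_fee):
        # Energy transmission module: settle coins, then move the energy.
        buyer["coins"] -= amount * unit_price + service_fee
        seller["coins"] += amount * unit_price
        operator["coins"] += service_fee
        seller["energy"] -= amount
        buyer["energy"] += amount

    def record(self, address, generated=0.0, used=0.0):
        # Energy recording module: accumulate smart meter readings.
        self.accounts[address]["generated"] += generated
        self.accounts[address]["used"] += used

    def query(self, address):
        # Information query module: return a copy of the account record.
        return dict(self.accounts[address])
```

A typical sequence under these assumptions is: register the three parties, fund the retailer with energy and the prosumer with coins, call `place_order`, and then read the balances back with `query`. The balance check runs before any transfer, so a failed order leaves every account unchanged.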
The performance of the present invention is illustrated below in terms of convergence under a unified pricing mechanism, with 1 operator, 1 retailer and 10 edge servers. Since the optimal demands of the edge servers are quite similar under the unified pricing mechanism, one of the edge servers is selected for presentation, as shown in fig. 5. Both the operator and the retailer achieve good utility. Under the operator's high-level policy and the retailer's middle-level policy, the prosumer can quickly converge to a relatively good solution. The convergence order of the different entities is consistent with the action order of the three stages of the Stackelberg game, so the leader is more likely to gain better benefits than the follower.
To demonstrate its performance, the present invention was compared with several popular deep reinforcement learning algorithms from the perspective of economic analysis, as shown in fig. 6, where each data point is the average of 10 experimental runs to reduce random error. As is evident from fig. 6a, HDPG obtains a higher total reward than the three deep reinforcement learning algorithms PPO (proximal policy optimization), SAC (soft actor-critic) and DQN (deep Q-network). In addition, different algorithms have their own characteristics; for example, SAC helps the retailer achieve the highest utility but performs poorly on the operator's policy. In contrast, HDPG helps both the operator and the retailer achieve higher utility. Edge devices using HDPG also achieve better utility. The hierarchical design of the actions and learning processes of the multiple agents helps each agent learn its own strategy in response to its competitors' strategies, which is a potential reason for the better performance of HDPG.
As shown in fig. 7, the parameter sensitivity of the utility as a function of the number of prosumers was analyzed. Since the utility of an edge server is not very sensitive to the number of participants, a box plot is used to describe the impact of edge server energy usage on the utility of an edge server. As shown in fig. 7a, it is sensible for an edge server engaged in high-value production to participate in the energy market. As can be seen from fig. 7b and 7c, the transmission loss rate largely determines the utility of the operator and the retailer. Adopting more advanced technology in the energy transmission and distribution network to reduce the transmission loss rate is therefore of great significance. As the number of edge servers increases, the utility of the operator and the retailer steadily increases, while the utility of each edge server tends to decrease slightly. It should be noted that an increase in the number of edge servers may affect the policy of each edge server, resulting in some fluctuation in the utility of all entities. Fig. 7d shows the trend of retailer utility for different energy storage efficiencies. Low energy storage efficiency may lead to negative utility, while the higher the efficiency of the energy storage device, the better the utility of the retailer. However, the cost and technical difficulty of reducing the transmission loss rate and increasing the energy storage efficiency remain challenges in the energy trading market.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (4)

1. An energy internet transaction method enabled by reinforcement learning and blockchain, characterized by comprising the following steps:
S1, constructing a three-stage game model among an operator, a retailer and a prosumer based on the energy trading relationship of the operator, the retailer and the prosumer on a blockchain transaction platform;
S2, solving for the game equilibrium points of the three-stage game model by using a distributed hierarchical policy gradient algorithm, wherein the game equilibrium points comprise an optimal unit service price, an optimal unit energy price and an optimal energy demand;
S3, the operator, the retailer and the prosumer conduct energy transactions according to the game equilibrium points obtained in step S2;
wherein the blockchain transaction platform comprises a plurality of interconnected edge servers, the edge servers are connected with a distributed SDN controller through smart contract interfaces, the retailer and the prosumer carry out energy transactions through the edge servers and the smart contract interfaces, and the operator provides blockchain services for the retailer and the prosumer through the edge servers and the smart contract interfaces;
the step S2 comprises the following steps:
S2.1, setting network parameters of the three-stage game model;
S2.2, initializing weight parameters of the three-stage game model;
S2.3, respectively acquiring the state of the operator, the state of the retailer and the state of the prosumer; using a Markov decision process, each edge server in the blockchain transaction platform sequentially selects an appropriate unit service price $\eta$, with the operator utility $U_o(\eta)$ as the reward function, so as to maximize the operator utility $U_o(\eta)$; selects an appropriate unit energy price $p$, with the retailer utility $U_r(p)$ as the reward function, so as to maximize the retailer utility $U_r(p)$; and selects an appropriate energy demand $q_j$, with the respective prosumer utility as the reward function, so as to maximize the prosumer utility;
the maximization of the operator utility $U_o(\eta)$ is expressed as:
wherein $U_m$ represents the additional incentive obtained by the operator for the trusted blockchain service provided for each energy transaction, $\phi$ represents the transmission loss rate, $C_t$ represents the unit transmission cost, $C_o$ represents the fixed operation and maintenance cost, $\eta_{\min}$ represents the minimum unit service price, $\eta_{\max}$ represents the maximum unit service price, $q_j$ represents the energy demand submitted by the prosumer to the local retailer through edge server $j$, and $N$ represents the set of all edge servers in the blockchain transaction platform;
the additional incentive $U_m$ is calculated as:
$U_m = (R_f + r s)\lambda$;
wherein $R_f$ represents a fixed block reward, $r$ represents the blockchain service fee paid to the operator by the prosumer for each energy transaction, $s$ represents a block parameter, and $\lambda$ represents a probability factor in the blockchain;
the maximization of the retailer utility $U_r(p)$ is expressed as:
wherein $C_g$ represents the production cost borne by the retailer when producing energy, $C_s$ represents the storage cost borne by the retailer when storing energy, $p_{\min}$ represents the lowest unit energy price, and $p_{\max}$ represents the highest unit energy price;
the maximization of the prosumer utility is expressed as:
wherein $\delta$ represents a conversion factor, $w_j$ represents the energy usage scenario of edge server $j$, $q_{\min}$ represents the minimum energy demand, and $q_{\max}$ represents the maximum energy demand;
the production cost $C_g$ is calculated as:
wherein $a$, $b$ and $k$ are weighting factors of the power generation cost incurred when the retailer produces energy, and $\phi$ represents the transmission loss rate;
the storage cost $C_s$ is calculated as:
wherein $c_s$ represents the unit cost of the retailer's energy storage, $\zeta_c$ represents the charging efficiency of the energy storage device, and $\zeta_d$ represents the discharging efficiency of the energy storage device.
2. The energy internet transaction method enabled by reinforcement learning and blockchain according to claim 1, wherein in step S2.3, the state of the operator is expressed as:
wherein $p^{t-1}$ represents the unit energy price at step $t-1$, and $q_j^{t-1}$ represents the energy demand submitted by the prosumer to the local retailer through edge server $j$ at step $t-1$.
3. The energy internet transaction method enabled by reinforcement learning and blockchain according to claim 1, wherein the state of the retailer is expressed as:
wherein $\eta^t$ represents the unit service price at step $t$, and $q_j^{t-1}$ represents the energy demand submitted by the prosumer to the local retailer through edge server $j$ at step $t-1$.
4. The energy internet transaction method enabled by reinforcement learning and blockchain according to claim 1, wherein the state of the prosumer is expressed as:
wherein $p^t$ represents the unit energy price at step $t$.
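For illustration only, the sequential decision of step S2.3 and the additional incentive of claim 1 may be sketched in Python as follows. The state layouts follow claims 2 to 4 as read here (operator: previous unit energy price and previous demands; retailer: current unit service price and previous demands; prosumer: current unit energy price). The function names, the `bounds` dictionary, the clipping of each action to its minimum and maximum, and the placeholder policies are assumptions of this sketch; the policies stand in for the trained policy networks, which are not reproduced here.

```python
def block_reward(fixed_reward, service_fee, block_param, prob_factor):
    """Additional incentive U_m = (R_f + r * s) * lambda from claim 1."""
    return (fixed_reward + service_fee * block_param) * prob_factor


def clip(value, low, high):
    """Keep an action inside its [min, max] range (assumed handling)."""
    return max(low, min(high, value))


def three_stage_step(prev_price, prev_demands,
                     operator_policy, retailer_policy, prosumer_policies,
                     bounds):
    """One decision step in leader-to-follower order.

    Each policy is a hypothetical callable mapping a state tuple to an action.
    """
    prev_demands = tuple(prev_demands)

    # Stage 1: the operator observes (p at t-1, demands at t-1) and sets eta.
    operator_state = (prev_price, prev_demands)
    eta = clip(operator_policy(operator_state), *bounds["eta"])

    # Stage 2: the retailer observes (eta at t, demands at t-1) and sets p.
    retailer_state = (eta, prev_demands)
    price = clip(retailer_policy(retailer_state), *bounds["p"])

    # Stage 3: each prosumer observes (p at t) and sets its demand q_j.
    prosumer_state = (price,)
    demands = [clip(policy(prosumer_state), *bounds["q"])
               for policy in prosumer_policies]

    return eta, price, demands


if __name__ == "__main__":
    bounds = {"eta": (0.1, 1.0), "p": (1.0, 3.0), "q": (0.0, 5.0)}
    eta, price, demands = three_stage_step(
        prev_price=2.0,
        prev_demands=[3.0, 3.0],
        operator_policy=lambda state: 0.5,              # placeholder
        retailer_policy=lambda state: state[0] + 1.0,   # placeholder
        prosumer_policies=[lambda state: 6.0 / state[0]] * 2,
        bounds=bounds,
    )
    print(eta, price, demands)
    print(block_reward(2.0, 0.5, 4, 0.5))
```

The ordering in `three_stage_step` reflects the leader-follower structure of the three-stage game: each later stage observes the action already chosen by the stage before it in the same step.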
CN202111164320.7A 2021-09-30 2021-09-30 Energy internet transaction method and system based on reinforcement learning block chain enabling Active CN113888327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111164320.7A CN113888327B (en) 2021-09-30 2021-09-30 Energy internet transaction method and system based on reinforcement learning block chain enabling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111164320.7A CN113888327B (en) 2021-09-30 2021-09-30 Energy internet transaction method and system based on reinforcement learning block chain enabling

Publications (2)

Publication Number Publication Date
CN113888327A CN113888327A (en) 2022-01-04
CN113888327B true CN113888327B (en) 2024-12-27

Family

ID=79004931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111164320.7A Active CN113888327B (en) 2021-09-30 2021-09-30 Energy internet transaction method and system based on reinforcement learning block chain enabling

Country Status (1)

Country Link
CN (1) CN113888327B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529107A (en) * 2022-04-21 2022-05-24 南方电网数字电网研究院有限公司 Energy transaction data processing method and device, computer equipment and storage medium
CN115496521A (en) * 2022-08-04 2022-12-20 贵州大学 Multidimensional Data Utility Evaluation and Pricing Method Based on Stackelberg Game
CN115660896B (en) * 2022-11-11 2024-09-10 深圳市人工智能与机器人研究院 Incentive mechanism configuration method and related equipment for energy system based on blockchain
CN116362327A (en) * 2023-03-30 2023-06-30 北京天弛网络有限公司 A model training method, system and electronic device
CN120045310B (en) * 2024-12-18 2026-01-06 东莞理工学院 Blockchain-based resource exchange methods, devices, equipment, media, and products

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460358A (en) * 2020-03-23 2020-07-28 四川大学 Park operator energy transaction optimization decision method based on supply and demand game interaction

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7043457B1 (en) * 2000-06-28 2006-05-09 Probuild, Inc. System and method for managing and evaluating network commodities purchasing
CN109784926A (en) * 2019-01-22 2019-05-21 华北电力大学(保定) A virtual power plant internal market transaction method and system based on alliance blockchain
CN111107506B (en) * 2020-01-02 2022-05-10 南京邮电大学 A secure sharing method of network resources based on blockchain and bidding game
CN111556508B (en) * 2020-05-20 2023-03-10 南京大学 A Stackelberg game multi-operator dynamic spectrum sharing method for large-scale IoT access

Also Published As

Publication number Publication date
CN113888327A (en) 2022-01-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant