JP2022018880A

JP2022018880A - Optimization method of wireless communication system, wireless communication system, and program for wireless communication system

Info

Publication number: JP2022018880A
Application number: JP2020122301A
Authority: JP
Inventors: 笑子篠原; Emiko Shinohara; 保彦井上; Yasuhiko Inoue; 裕介淺井; Yusuke Asai; 泰司鷹取; Taiji Takatori; 啓史大関; Hiroshi Ozeki; 義哲成末; Yoshiaki Narusue; 博之森川; Hiroyuki Morikawa
Original assignee: Nippon Telegraph and Telephone Corp; University of Tokyo NUC
Current assignee: University of Tokyo NUC; NTT Inc
Priority date: 2020-07-16
Filing date: 2020-07-16
Publication date: 2022-01-27
Anticipated expiration: 2040-07-16
Also published as: JP7388634B2

Abstract

To provide an optimization method of a wireless communication system, a wireless communication system, and a program, in which a setting parameter and a usage condition are calculated by a computer in a situation in which a plurality of wireless communication systems with different requirements coexist and interfere with each other in the same environment.

SOLUTION: In a reinforcement learning model, wireless environment information including a state related to wireless communication is detected from wireless communication terminals belonging to a plurality of wireless communication systems. The wireless environment information is provided to agents 12-1, 12-2, and 12-3, each prepared in correspondence with a condition imposed on one of the plurality of wireless communication systems. A computer is caused to execute, for each of the agents 12-1, 12-2, and 12-3, reinforcement learning to which the condition and the wireless environment information are applied. For each of the plurality of wireless communication systems, the computer is caused to calculate an optimum control parameter under the wireless communication environment on the basis of the result of the reinforcement learning.

SELECTED DRAWING: Figure 4

Description

The present invention relates to a method for optimizing a wireless communication system, a wireless communication system, and a program for a wireless communication system, and in particular to such a method, system, and program that optimize the communication state by using learning with multi-stage evaluation.

More specifically, the present invention aims to optimize communication in an environment in which different wireless communication systems coexist and interfere with each other. When a different requirement is imposed on each wireless communication system, optimization is performed with each of those requirements taken into account. This optimization is carried out for one or more wireless communication systems by computer-based learning such as machine learning or reinforcement learning.

Wireless LAN is a wireless communication system that can be used inexpensively in unlicensed bands. For this reason it has spread rapidly, and large numbers of wireless LAN terminals now coexist in the same area. As a result, interference among wireless LAN terminals has become a problem. In response, many techniques have been proposed for minimizing the influence of interference between wireless LAN terminals and expanding the capacity of individual systems or of the system as a whole.

For example, FIG. 1 shows a case in which wireless communication terminals 1 to N are wireless LAN base stations (APs: Access Points) that interfere with each other. The wireless communication terminals N+1 to N+M shown in the lower part of FIG. 1 are user terminals, such as smartphones, that establish communication with these APs. In this example, each of the wireless communication terminals 1 to N functioning as an AP acquires interference information for its surroundings and information on the success or failure of connections with the wireless communication terminals N+1 to N+M, and transmits this to the control server 10 as wireless environment information.

The control server 10 calculates an allocation of frequency channels and transmission power values that maximizes the throughput of the AP group including the wireless communication terminals 1 to N, and returns the result to each AP as control information.

In applications and devices that use wireless LAN, the items that matter most may differ depending on the usage scene. For example, in a wireless communication system that includes IoT sensors, communication speed is not important. On the other hand, when a large number of IoT sensors are installed in an area, it is important to increase the number of communications that can be established in that area.

In such a case, rather than aiming to expand system capacity, it is necessary to select a frequency channel with little interference, even if it is narrowband. When communication over a wide area is desired, what is needed is not control that focuses on interference from other wireless communication systems, but control that maximizes transmission power within a range that does not affect other wireless communication systems. Thus, the required control policy differs depending on the application for which the wireless communication system is used.

In particular, multiple wireless communication systems coexist in the 920 MHz band, which is currently open for RFID and IoT use in Japan. Specifically, this band is used by systems such as the following.
1. Wireless communication systems that periodically transmit sensor information such as position and temperature
2. Wireless communication systems that transmit video from surveillance cameras
3. Wireless communication systems that require network construction over a wide area such as mountainous regions or the ocean

These wireless communication systems each have different requirements and, at the same time, are expected to coexist on the same frequency channels. The control information for them must therefore be calculated on the premise that they coexist on the same frequency resources.

In the configuration example of the wireless communication system shown in FIG. 1, a plurality of wireless communication terminals 1 to N are connected to the control server 10 in an environment in which they interfere with each other. The wireless communication terminals 1 to N can also perform data communication with the other wireless communication terminals N+1 to N+M by wireless communication.

In this system configuration, each of the wireless communication terminals 1 to N transmits wireless environment information to the control server 10. In the case of wireless LAN communication, for example, the wireless environment information consists of frequency channel usage information, such as the SSID and the channel utilization rate, together with the parameters set in the wireless communication terminal. These parameters include the frequency channel in use, the channel bandwidth, and the transmission power value. Each of the wireless communication terminals 1 to N acquires the wireless environment information by performing carrier sense on its own surroundings.
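As an illustration of the wireless environment information described above, one report from an AP could be modeled as a small record. This is only a sketch: the class name, the field names, the units, and the validation rule are assumptions introduced here, not definitions from the specification.

```python
from dataclasses import dataclass, asdict


@dataclass
class WirelessEnvironmentInfo:
    """Hypothetical report sent by one AP to the control server."""
    terminal_id: int            # index of the reporting AP (1 to N)
    ssid: str                   # SSID observed or used by the AP
    channel_utilization: float  # fraction of time the channel is busy, 0.0 to 1.0
    channel: int                # frequency channel currently in use
    bandwidth_mhz: float        # channel bandwidth currently set
    tx_power_dbm: float         # transmission power currently set


def is_valid(info: WirelessEnvironmentInfo) -> bool:
    """Basic sanity check before the report is handed to the optimizer."""
    return 0.0 <= info.channel_utilization <= 1.0 and info.bandwidth_mhz > 0


report = WirelessEnvironmentInfo(
    terminal_id=1, ssid="sensor-net", channel_utilization=0.35,
    channel=2, bandwidth_mhz=1.0, tx_power_dbm=13.0,
)
payload = asdict(report)  # plain dict, e.g. for serialization to the server
```

Keeping the usage information and the configured parameters in one record mirrors the description, in which both are sent together as wireless environment information.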

In the conventional method, after collecting the wireless environment information, the control server 10 performs an optimization calculation on the assumption that all of the wireless communication terminals 1 to N have the same specifications and require communication capacity in the same way. The optimization calculation is performed on, for example, the position and width of the frequency channel and the transmission power. The result of the calculation is transmitted from the control server 10 to the wireless communication terminals 1 to N as control information. On receiving the control information, the wireless communication terminals 1 to N change the corresponding settings accordingly.

When control is executed periodically or in response to some trigger, the optimum parameters are recalculated using, in addition to the method used to calculate the initial values, updated information such as the frequency channel usage information. The control information determined by that calculation is then transmitted to each of the wireless communication terminals 1 to N.

FIG. 2 is a flowchart of a conventional control example. In S100, the wireless environment information of each of the wireless communication terminals 1 to N functioning as APs is collected.

In S102, the optimum parameters are calculated from the collected information. In conventional control, all of the wireless communication terminals 1 to N are judged to require the same kind of communication capacity. The same evaluation function is therefore used for all of the wireless communication terminals 1 to N, and the wireless communication parameters considered optimal are calculated by a heuristic method such as iterative calculation or a genetic algorithm.

The calculated parameter information is transmitted to the wireless communication terminals 1 to N as control information (S104).

Thereafter, the process waits (S110) until a control trigger is detected (S106), unless the end of control is determined (S108). When a control trigger occurs, the processing from S100 onward is executed again. However, because heuristic methods alone have difficulty taking sufficient account of the environments and control information of multiple wireless communication systems, optimization using machine learning or reinforcement learning has also been proposed.
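The conventional flow of FIG. 2 (S100 to S110) can be sketched as a simple loop. The function name and all six callables below are hypothetical stand-ins for server-side operations; the specification does not define their interfaces.

```python
def conventional_control_loop(collect, optimize, send,
                              trigger_fired, should_stop, wait):
    """Sketch of the FIG. 2 flow; returns the number of optimization rounds."""
    rounds = 0
    while True:
        env_info = collect()          # S100: collect wireless environment information
        params = optimize(env_info)   # S102: one evaluation function for all APs
        send(params)                  # S104: send control information to the APs
        rounds += 1
        while not trigger_fired():    # S106: has a control trigger occurred?
            if should_stop():         # S108: has the end of control been determined?
                return rounds
            wait()                    # S110: standby


# Usage with stub callables: one trigger fires, then control ends.
triggers = iter([False, True, False, False])
stops = iter([False, False, True])
sent = []
rounds = conventional_control_loop(
    collect=lambda: {"ap1": 0.4},
    optimize=lambda info: {ap: 1 for ap in info},
    send=sent.append,
    trigger_fired=lambda: next(triggers),
    should_stop=lambda: next(stops),
    wait=lambda: None,
)
```

Passing the operations in as callables keeps the sketch independent of any particular transport or optimizer.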

Liang, Le et al., "Multi-Agent Reinforcement Learning for Spectrum Sharing in Vehicular Networks", 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 1-5, 2019.
Cheng Wu, Kaushik Chowdhury, Marco Di Felice, and Waleed Meleis, "Spectrum management of cognitive radio using multi-agent reinforcement learning", In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Industry track (AAMAS '10), 1705-1712, 2010.

For diverse communication devices and for services and applications that use wireless communication, methods incorporating iterative calculation or machine learning have been proposed for optimizing wireless communication resources. These methods presuppose that all wireless communication systems share a unified objective, such as expanding communication capacity.

In practice, however, when a plurality of different wireless communication systems coexist in one environment, the same requirements are not necessarily imposed on each system. In other words, the purpose or goal of optimization may in reality differ from system to system. Conventional methods that aim at a single, uniform optimization are inadequate for such situations. What is needed is optimization that reflects the requirements of each wireless device, or the requirements of each usage scene.

An object of the present invention is to provide a method for optimizing a wireless communication system, and the like, in which the setting parameters and usage conditions of a wireless communication system are calculated by computer-based learning in a situation where a plurality of wireless communication systems subject to different constraints or different requirements coexist and interfere with each other in the same environment.

To achieve the above object, the first invention is a method for optimizing a wireless communication system, executed in a wireless communication environment in which a plurality of different wireless communication systems coexist, and desirably includes: a step of detecting, from wireless communication terminals belonging to the plurality of wireless communication systems, wireless environment information including a state related to wireless communication; a step of providing the wireless environment information to each of the agents prepared in correspondence with the conditions imposed on each of the plurality of wireless communication systems; a step of causing a computer to perform, for each of the agents, reinforcement learning to which the condition and the wireless environment information are applied; a step of causing the computer to calculate, for each of the plurality of wireless communication systems, an optimum control parameter under the wireless communication environment on the basis of the result of the reinforcement learning; and a step of providing the control parameter to the wireless communication terminals belonging to the corresponding wireless communication system.

The second invention is a wireless communication system that operates in a wireless communication environment in which a plurality of different wireless communication systems coexist, and includes a control server that receives wireless environment information from the plurality of wireless communication systems and provides control information to them. The control server desirably executes: a process of detecting, from wireless communication terminals belonging to the plurality of wireless communication systems, wireless environment information including a state related to wireless communication; a process of providing the wireless environment information to each of the agents prepared in correspondence with the conditions imposed on each of the plurality of wireless communication systems; a process of performing, for each of the agents, reinforcement learning to which the condition and the wireless environment information are applied; a process of calculating, for each of the plurality of wireless communication systems, an optimum control parameter under the wireless communication environment on the basis of the result of the reinforcement learning; and a process of providing the control parameter to the wireless communication terminals belonging to the corresponding wireless communication system.

The third invention is a program for a wireless communication system, installed on a control server that receives wireless environment information from a plurality of wireless communication systems and provides control information to them. The program desirably causes the control server to execute: a process of detecting, from wireless communication terminals belonging to the plurality of wireless communication systems, wireless environment information including a state related to wireless communication; a process of providing the wireless environment information to each of the agents prepared in correspondence with the conditions imposed on each of the plurality of wireless communication systems; a process of performing, for each of the agents, reinforcement learning to which the condition and the wireless environment information are applied; a process of calculating, for each of the plurality of wireless communication systems, an optimum control parameter under the wireless communication environment in which the plurality of wireless communication systems are operating, on the basis of the result of the reinforcement learning; and a process of providing the control parameter to the wireless communication terminals belonging to the corresponding wireless communication system.

According to the present invention, when a plurality of different wireless communication systems coexist in a wireless communication environment, an agent for reinforcement learning is prepared for each condition imposed on the wireless communication systems. Reinforcement learning can then proceed with the corresponding condition applied to each agent. Therefore, according to the present invention, in a situation where a plurality of wireless communication systems coexist and interfere with each other in the same environment, the setting parameters and usage conditions of the wireless communication systems can be optimized for each condition.

FIG. 1 is a diagram showing a configuration example of a wireless communication system.
FIG. 2 is a flowchart of a control example of a wireless communication system.
FIG. 3 is a diagram for explaining an example of a conventional reinforcement learning model.
FIG. 4 is a diagram for explaining an example of the reinforcement learning model used in Embodiment 1 of the present invention.
FIG. 5 is a diagram for explaining an example of an environment to which the model shown in FIG. 4 is applied.
FIG. 6 is a diagram for explaining the forms of aggregation permitted in the environment shown in FIG. 5.
FIG. 7 is a diagram showing an example of channel allocation determined by a conventional method.
FIG. 8 is a diagram showing an example of channel allocation determined by the method of Embodiment 1 of the present invention.
FIG. 9 is a diagram for explaining an example of the reinforcement learning model used in Embodiment 2 of the present invention.

Embodiment 1.
[Configuration of Embodiment 1]
The wireless communication system of Embodiment 1 of the present invention can be realized by the configuration example shown in FIG. 1. In FIG. 1, the wireless communication terminals 1 to N shown in the middle row each function as an access point (AP). They can communicate with the wireless communication terminals N+1 to N+M shown in the lower part of FIG. 1. The wireless communication terminals N+1 to N+M are smartphones, IoT sensors, smart meters, and the like. The configuration shown in FIG. 1 thus includes a plurality of wireless communication systems that share the same frequency resources but differ in standards and specifications.

The wireless communication system of this embodiment includes a control server 10. The control server 10 includes hardware such as a communication interface, a processor unit, and a memory. The control server 10 realizes the functions described later by having this hardware carry out processing according to a program stored in the memory.

The control server 10 can provide control information to the wireless communication terminals 1 to N functioning as APs. The control information includes, for example, information on available frequency resources and transmission power. The wireless communication terminals 1 to N, in turn, can transmit wireless environment information to the control server 10. The wireless environment information includes interference information for the surroundings of each of the wireless communication terminals 1 to N and information on the success or failure of connections with the wireless communication terminals N+1 to N+M.

The control server 10 also has a learning function for optimizing the various parameters to be included in the control information, based on the wireless environment information and the like, and a function for determining those parameters based on the result of that learning.

[Outline of reinforcement learning]

In this embodiment, reinforcement learning is used to optimize the various parameters included in the control information. FIG. 3 shows a model diagram of general reinforcement learning. In the model shown in FIG. 3, an agent 12 exists as the entity that learns. With t denoting the observation timing of an event, the agent 12, within a single environment 14, calculates and executes an action A(t+1) from the current state S(t) and reward R(t). As a result, the state S(t+1) is realized. From this state S(t+1), a reward R(t+1) evaluating the action is obtained, and the next action is calculated.

In the following description, s and S denote states, a and A denote actions, and r and R denote rewards. Lowercase letters denote parameters for an individual agent (optimization target), and uppercase letters denote parameters for the set of agents. The subscript t on a parameter indicates that it is the value at observation timing t; S_t, A_t, and R_t are the same as S(t), A(t), and R(t), respectively.

The reinforcement learning shown in FIG. 3 proceeds by repeating the following steps.
1. The agent 12 receives the state S(t) and the reward R(t) from the environment 14, and returns to the environment 14 the action A(t) determined according to the policy π.
2. The environment 14 changes to the next state S(t+1) based on the action A(t) received from the agent 12 and the current state S(t), and provides the post-transition state S(t+1) and the reward R(t+1) to the agent 12.
The reward R is a scalar quantity indicating how good or bad the immediately preceding action A was.
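The two-step interaction above can be sketched as follows. The toy environment, its reward rule, and the random policy are assumptions chosen only to make the loop runnable; they do not model the wireless environment of the specification.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: action 1 is rewarded, any other action is not."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Step 2: move to S(t+1) and compute the scalar reward R(t+1).
        reward = 1.0 if action == 1 else 0.0
        self.state = action
        return self.state, reward


class RandomAgent:
    """Placeholder policy pi that ignores its inputs and picks an action at random."""

    def __init__(self, n_actions, seed=0):
        self.rng = random.Random(seed)
        self.n_actions = n_actions

    def act(self, state, reward):
        return self.rng.randrange(self.n_actions)


def run(env, agent, steps):
    """Repeat steps 1 and 2 and return the sum of the rewards received."""
    state, reward, total = env.state, 0.0, 0.0
    for _ in range(steps):
        action = agent.act(state, reward)  # step 1: agent returns A(t)
        state, reward = env.step(action)   # step 2: environment returns S(t+1), R(t+1)
        total += reward
    return total


total_reward = run(ToyEnvironment(), RandomAgent(n_actions=2), steps=10)
```

A learning agent would replace `RandomAgent.act` with a policy that is updated from the rewards it receives.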

When the agent takes action A in a given state S, the sum of the rewards R obtainable from the present into the infinite future, that is, the return G, is given by the following equation.
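The equation itself is not reproduced in this text. A standard form of the discounted return that is consistent with the surrounding definitions (the reward R(t+1) follows the action A(t), and γ weights future rewards) would be the following; the time index on G and the summation index k are notational assumptions.

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```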

Here, γ satisfies 0 ≤ γ ≤ 1 and is a parameter that adjusts how strongly future rewards are counted toward the return.
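The effect of γ can be seen in a short numerical sketch. The function name and the finite reward list are assumptions; the return in the description is taken over an infinite horizon, so this only illustrates a truncated version.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] over a finite list of future rewards."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma <= 1")
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


future_rewards = [1.0, 1.0, 1.0]
short_sighted = discounted_return(future_rewards, gamma=0.0)  # only the first reward counts
balanced = discounted_return(future_rewards, gamma=0.5)       # 1 + 0.5 + 0.25
far_sighted = discounted_return(future_rewards, gamma=1.0)    # all rewards count equally
```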

In Q-learning, a form of reinforcement learning, the value of an action a is evaluated by the following function.
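The function is not reproduced in this text. Written in the usual action-value form, and assuming the return is denoted G_t, it would read:

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\, G_t \mid s_t = s,\ a_t = a \,\right]
```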

Here, E denotes the expectation. Q^π is a value function (hereinafter, the "Q function") representing the expected return when an agent that takes action a in state s thereafter acts according to the policy π.

The reinforcement learning shown in FIG. 3 proceeds so as to maximize this Q function. The learning can be carried out, for example, by obtaining the Q function, which estimates the return G when action a is taken in state s, with the algorithm of the following equation.
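The update equation is not reproduced in this text. The standard Q-learning update, written with the learning rate p and the maximum over next actions that the description refers to, is shown here as an assumed reconstruction:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + p \left[\, r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \,\right]
```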

Here, p is a parameter called the learning rate, a constant chosen by the designer of the machine learning system. It is usually set to a small value less than 1. max Q denotes the maximum value of the Q function that could ideally be obtained. In learning the Q function, at each time t the Q values obtainable by the actions that can be taken at the next time t+1 are all estimated, and the largest of them is used to update the Q value.
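A minimal tabular sketch of this update is shown below, assuming the standard Q-learning rule. The function name, the dictionary-based Q table, and the default values of p and gamma are illustrative choices, not values taken from the specification.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, p=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Estimate the Q values of every action available at time t+1; keep the largest.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate toward the target by the learning rate p.
    q[(state, action)] += p * (target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q_table, state=0, action=1, reward=1.0,
                     next_state=1, actions=[0, 1])
```

With all entries initially zero, the first update moves Q(0, 1) from 0 to p times the reward.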

[Features of Embodiment 1]
FIG. 4 shows the reinforcement learning model used in the wireless communication system of this embodiment. In this embodiment, optimization targets a plurality of wireless communication systems with different conditions. The wireless communication systems can be grouped according to their conditions. In the model shown in FIG. 4 there are three groups, and an agent exists for each group.

The agents 12-1, 12-2, and 12-3 shown in FIG. 4 each evaluate the actions of the users i belonging to their respective groups. For example, the agent 12-1 can calculate the reward R from the state S of a user i in group 1 and then calculate the action A. Likewise, the agents 12-2 and 12-3 each calculate the reward R from the state S of the users i belonging to their groups and determine the action A.

The three agents 12-1, 12-2, and 12-3 shown in FIG. 4 may each output a different action A based on the reward R and the state S provided to it. In the model shown in FIG. 4, the same environment 14 may provide different rewards R and different states S to each of the agents 12-1, 12-2, and 12-3.
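The idea that one shared environment hands each agent its own state and its own reward can be sketched as follows. The state definition (number of other agents on the same channel) and the three reward rules are toy assumptions for illustration; the specification defines the actual rewards per requirement.

```python
def step_shared_environment(actions, reward_fns):
    """actions: {agent_id: chosen channel}. Returns {agent_id: (state, reward)}."""
    results = {}
    for agent_id, channel in actions.items():
        # State seen by this agent: how many other agents chose the same channel.
        interferers = sum(1 for other, ch in actions.items()
                          if other != agent_id and ch == channel)
        state = {"channel": channel, "interferers": interferers}
        # Each agent is scored by its own reward function.
        results[agent_id] = (state, reward_fns[agent_id](state))
    return results


# Hypothetical per-agent reward rules, one per group condition.
reward_fns = {
    "12-1": lambda s: -float(s["interferers"]),
    "12-2": lambda s: 1.0 if s["interferers"] == 0 else 0.0,
    "12-3": lambda s: 1.0 / (1 + s["interferers"]),
}
results = step_shared_environment({"12-1": 1, "12-2": 1, "12-3": 3}, reward_fns)
```

Here agents 12-1 and 12-2 observe the same situation (one interferer) but receive different rewards, while agent 12-3 observes a different state altogether.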

The description continues below with reference to FIGS. 5 and 6, taking as an example a case in which there are four different wireless communication terminals that conform to the same wireless communication standard. FIG. 5 summarizes the requirements and other attributes of the four terminals to be controlled in this example. FIG. 6 shows examples of the aggregation permitted in this example.

In this example, the allocation of frequency resources to all terminals (frequency channel position and frequency channel width) is controlled. The frequency resources consist of four unit channels, channels 1 to 4. These channels can be used with two or four of them aggregated.
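The resulting action space can be enumerated with a short sketch. It assumes that a unit channel may also be used alone and that aggregated blocks are contiguous and aligned to their width (channels 1-2, 3-4, or 1-4); FIG. 6 is not reproduced here, so the permitted combinations may differ from this assumption.

```python
def allowed_allocations(n_unit_channels=4, widths=(1, 2, 4)):
    """List channel blocks as tuples of unit-channel numbers.

    Assumption: blocks are contiguous and start on a multiple of their width.
    """
    allocations = []
    for width in widths:
        for start in range(1, n_unit_channels + 1, width):
            if start + width - 1 <= n_unit_channels:
                allocations.append(tuple(range(start, start + width)))
    return allocations


options = allowed_allocations()
# Each tuple encodes both the channel position and the channel width.
```

Under these assumptions each terminal chooses among seven blocks: four single channels, two pairs, and the full band.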

The requirements for the four wireless communication terminals are as follows.
1. The wireless communication terminal 1 is used as a base unit in a sensor network, so its requirement is "maximizing the success rate of uplink transmissions from a large number of terminals (sensors)".
2. The wireless communication terminal 2 is used in a wide-area sensor network, so its requirement is "maximizing the transmission reach".
3. The wireless communication terminals 3 and 4 are used as base units for data distribution, so their requirement is "maximizing the downlink throughput to subordinate terminals".

Each wireless communication terminal is, of course, subject to several requirements in addition to those above. The requirements listed above are the ones that, among the several conditions imposed on each terminal, should be given the highest priority in light of that terminal's nature.

When wireless communication terminal 1 is treated as agent 1, wireless communication terminal 2 as agent 2, and wireless communication terminals 3 and 4 as agent 3, the reward calculation for each agent can be set as follows.

As described above, the requirement of agent 1 is "maximization of the uplink transmission success rate from a large number of terminals". Agent 1 therefore determines the action A so as to maximize the transmission success rate of the transmitting terminals that perform uplink communication under wireless communication terminal 1. In this example, it is assumed that all terminals can perform carrier sensing as in a wireless LAN. In this case, the transmission success rate can be expressed by the following equation.

Here, N in the above equation is the total number of transmitting terminals present in the same channel. For uplink traffic, the terminals under all master units in the same channel are transmitting terminals. For downlink traffic, all master units in the same channel are transmitting terminals. N is the sum of the number of uplink transmitting terminals and the number of downlink transmitting terminals. τ in the above equation is the probability that each transmitting terminal transmits; this probability can be calculated from the total number N. Finally, n' in the above equation is the number of transmitting terminals connected to the master unit being controlled.
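The equation itself is not reproduced in this text, so the following is only one plausible way to turn the symbols N, τ, and n' into a reward. It assumes a slotted contention model in which every one of the N transmitting terminals transmits independently with probability τ, and it measures the probability that a busy slot carries exactly one transmission from one of the n' terminals of the controlled master unit. Both the formula and the function name `success_rate` are assumptions, not the expression used in this document.

```python
def success_rate(n_total, n_own, tau):
    """Assumed success probability for the controlled master unit.

    n_total: N, all transmitting terminals in the channel
    n_own:   n', transmitting terminals attached to the controlled master unit
    tau:     per-terminal transmission probability (taken as an input here)
    """
    if n_total < 1 or not 0 <= n_own <= n_total:
        raise ValueError("need 0 <= n_own <= n_total and n_total >= 1")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    # Probability that at least one terminal transmits in a slot.
    p_busy = 1.0 - (1.0 - tau) ** n_total
    # Probability that exactly one terminal transmits and it is one of ours.
    p_own_alone = n_own * tau * (1.0 - tau) ** (n_total - 1)
    return p_own_alone / p_busy
```

With n' held fixed, this value falls as N grows, which is consistent with the later observation that terminals with many contending uplink senders benefit from a channel of their own.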

The requirement of agent 2 is "maximization of transmission reach". Agent 2 therefore takes into account propagation characteristics, power density, frame error rate, and the like. When a selection can be made from a plurality of frequency channels, the reward that reflects the propagation characteristics, namely the transmission reach D, can be expressed by the following formula.

Here, L in the above formula is a function that gives the attenuation due to propagation, and L^-1 is its inverse function. fc is the center frequency of the transmitted signal, and B is the bandwidth.
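The formula for D is not reproduced in this text. As a hedged illustration of how an attenuation function L and its inverse L^-1 could yield a reach from fc and B, the sketch below assumes free-space path loss for L and a simple link budget in which the noise floor grows with bandwidth. The default transmit power, required SNR, and noise figure are illustrative values, and all function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def path_loss_db(distance_m, fc_hz):
    """Assumed L: free-space path loss in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * fc_hz / SPEED_OF_LIGHT)


def inverse_path_loss_m(loss_db, fc_hz):
    """Assumed L^-1: distance at which free-space loss equals loss_db."""
    return SPEED_OF_LIGHT * 10.0 ** (loss_db / 20.0) / (4.0 * math.pi * fc_hz)


def reach_m(fc_hz, bandwidth_hz, tx_dbm=13.0, required_snr_db=5.0, noise_figure_db=6.0):
    """Reach D from an assumed link budget (all defaults are illustrative)."""
    # Thermal noise floor: -174 dBm/Hz scaled by bandwidth, plus receiver noise figure.
    noise_dbm = -174.0 + 10.0 * math.log10(bandwidth_hz) + noise_figure_db
    allowable_loss_db = tx_dbm - (noise_dbm + required_snr_db)
    return inverse_path_loss_m(allowable_loss_db, fc_hz)
```

In this model the reach scales as 1/fc and as 1/sqrt(B), so a low-frequency, narrow unit channel gives the longest reach. Free-space loss is optimistic; a real deployment would substitute a measured or empirical propagation model for L.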

For agent 3, the requirement is "maximization of downlink throughput to subordinate terminals". Agent 3 therefore evaluates, for example, the throughput under the assumption that all terminals can perform carrier sensing as in a wireless LAN, as in the case of agent 1. In this case, the throughput can be calculated by the following equation.

Here, E[L] in the above equation is the average number of bits per successful transmission, and E[I] is the average waiting time. Ts is the average transmission frame time, and Tc is the average time wasted in a collision.
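Because the equation is not reproduced in this text, the sketch below shows only one common way these four quantities are combined: a renewal-cycle view in which each successful frame costs an average waiting time, an average collision time, and one frame time. Treating E[I] and Tc as per-success averages is an assumption, as is the function name `throughput_bps`.

```python
def throughput_bps(mean_bits, mean_wait_s, ts_s, tc_s):
    """Assumed throughput: delivered bits divided by the average cycle time.

    mean_bits:   E[L], average bits per successful transmission
    mean_wait_s: E[I], average waiting time per successful transmission
    ts_s:        Ts, average transmission frame time
    tc_s:        Tc, average time wasted in collisions per successful transmission
    """
    cycle_s = mean_wait_s + ts_s + tc_s
    if cycle_s <= 0.0:
        raise ValueError("cycle time must be positive")
    return mean_bits / cycle_s
```

If doubling the bandwidth roughly halves Ts while E[I] and Tc stay small, the value returned here rises, which matches the motivation for giving downlink-heavy terminals an aggregated channel.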

Conventional optimization control aims to maximize throughput equally for all terminals. In that case, channel allocation is determined so that no interference occurs between terminals. More specifically, as shown in FIG. 7, one channel is assigned to each of wireless communication terminals 1 to 4. As a result, wireless communication terminals 3 and 4, whose traffic is mainly downlink, are allocated frequency resources with a narrow bandwidth even though there are almost no collisions between transmitting terminals. The throughput of wireless communication terminals 3 and 4 therefore falls below the throughput they could otherwise achieve.

In contrast, in this embodiment, channel allocation is determined as shown in FIG. 8, for example. Here, channel 2 is assigned to wireless communication terminal 1 and channel 1 to wireless communication terminal 2. Because wireless communication terminals 1 and 2 are components of sensor networks, they are susceptible to external interference. For this reason, terminals 1 and 2 are each assigned a unit channel with a narrow bandwidth. In addition, wireless communication terminal 2 is required to communicate over a wide area. Signal propagation loss decreases as the center frequency decreases. Channel 1, assigned to wireless communication terminal 2, has the lowest frequency and is therefore the channel expected to have the smallest propagation loss.

On the other hand, wireless communication terminals 3 and 4, for which maximizing throughput is important, are assigned channel 3+4, an aggregation of two unit channels. With this allocation, wireless communication terminals 3 and 4 interfere with each other. However, because the main traffic of both is downlink, the transmitting terminals are mainly the two master units. In this case, the probability of collisions caused by coexistence in the same frequency channel is low. Therefore, even though the two terminals coexist in the same channel, widening the bandwidth through aggregation can be expected to increase instantaneous throughput, unless the scenario is one in which the two constantly contend for the channel.
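The two allocations can be compared with a small data-structure sketch. The FIG. 8 mapping follows the text above. For FIG. 7 the text says only that each terminal gets one channel, so the specific mapping of terminal k to channel k is an assumption. The helper names are hypothetical.

```python
from itertools import combinations

# Terminal number -> set of unit channels it occupies.
FIG7_ALLOCATION = {1: {1}, 2: {2}, 3: {3}, 4: {4}}        # assumed one-to-one mapping
FIG8_ALLOCATION = {1: {2}, 2: {1}, 3: {3, 4}, 4: {3, 4}}  # as described in the text


def channel_widths(allocation):
    """Number of unit channels available to each terminal."""
    return {terminal: len(channels) for terminal, channels in allocation.items()}


def sharing_pairs(allocation):
    """Pairs of terminals whose allocations overlap and may therefore interfere."""
    return [(a, b) for a, b in combinations(sorted(allocation), 2)
            if allocation[a] & allocation[b]]
```

In the FIG. 8 allocation only terminals 3 and 4 share spectrum, and each of them has twice the bandwidth it would have with one unit channel.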

As described above, the wireless communication system of this embodiment can set a different agent for each of a plurality of wireless communication terminals with different requirements and proceed with reinforcement learning for optimization. The behavior of each terminal can then be optimized individually and independently according to its own requirements. Consequently, in an area where heterogeneous terminals with different requirements coexist, the wireless communication system of this embodiment allows each terminal to achieve the best possible performance with respect to the requirement placed on it.

Because this example is a simple one, it could also be modeled as, for example, a Markov process. Real environments, however, are complex and difficult to model because of effects such as hidden terminals. Reinforcement learning is therefore needed when a real environment is assumed. When the environment is complex, methods that use simulation or real-world measurement results, or methods that measure states and rewards using a database, may be used instead of a mathematical model.

This example has shown control that allocates frequency resources, but the present invention is not limited to this. Other resources and settings used for wireless communication can also be controlled, including transmission power, transmission frequency (or the average waiting time required for random access), wireless communication system parameters such as the RTS/CTS setting of a wireless LAN, the number of terminals allowed to connect, and power consumption.

Embodiment 2.
Next, a second embodiment of the present invention will be described with reference to FIG. 9 together with FIG. 1. The wireless communication system of this embodiment can be realized by the configuration shown in FIG. 1, as in the first embodiment.

FIG. 9 shows the reinforcement learning model implemented in the wireless communication system of this embodiment. As in the first embodiment, the model shown in FIG. 9 includes a plurality of agents 12-1, 12-2, and 12-3. This model assumes that a plurality of different environments are subject to evaluation. More specifically, in the model shown in FIG. 9, there are a plurality of environments to which the actions selected by agents 12-1, 12-2, and 12-3 may be returned. It therefore becomes necessary to select an environment in order to evaluate a selected action.

When agents 12-1, 12-2, and 12-3 are defined by differences in requirements, the environments can be defined by differences in standards or communication systems. For example, wireless communication systems for IoT include IEEE 802.11ah, Wi-SUN, and LoRa. These systems differ in their specified frequency bandwidths and in their modulation and demodulation schemes. As a result, even when the received power and the interference power are the same, the probability that a transmitted frame is received in error differs from system to system.

For this reason, in an area where different wireless communication systems coexist and interfere with one another, the effect of interference on one system may need to be estimated as larger than its effect on another. A similar situation arises, for example, in the evaluation of power consumption. In actual devices, the hardware configurations of wireless communication terminals are not necessarily uniform, and terminals with a large battery capacity and terminals with a small battery capacity may coexist in the same area. The effect of power consumption needs to be estimated as larger for a terminal with a small battery capacity than for one with a large battery capacity.
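One simple way to make the power-consumption term depend on battery capacity is to normalize the energy used by the capacity of the terminal. The sketch below is an assumption for illustration only: the source does not specify how the weighting is done, and the function names and the linear penalty are hypothetical.

```python
def power_penalty(energy_used_mwh, battery_capacity_mwh):
    """Fraction of the battery consumed; larger for small batteries."""
    if battery_capacity_mwh <= 0:
        raise ValueError("battery capacity must be positive")
    return energy_used_mwh / battery_capacity_mwh


def reward_with_power(base_reward, energy_used_mwh, battery_capacity_mwh, weight=1.0):
    """Environment-specific reward: base reward minus a capacity-scaled penalty."""
    return base_reward - weight * power_penalty(energy_used_mwh, battery_capacity_mwh)
```

For the same action and the same energy use, a terminal with a 500 mWh battery is then penalized ten times more heavily than one with a 5000 mWh battery.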

Furthermore, when a wireless communication terminal with multi-RF capability is a control target, the environment needs to be evaluated for each frequency band. For example, for a tri-band wireless communication terminal that operates at 920 MHz, 2.4 GHz, and 5 GHz, the environment evaluation method needs to be switched according to which of those frequency bands the terminal is operating in.

The model shown in FIG. 9 has three environments 14-1, 14-2, and 14-3. These environments are organized so as to cover the environments that may hold for the plurality of wireless communication terminals to be controlled. In this embodiment, therefore, every wireless communication terminal managed by the control server 10 operates under one of environments 14-1, 14-2, and 14-3.

The model shown in FIG. 9 includes an environment selection unit 16. The environment selection unit 16 selects the environment in which agent 12-1 is placed from among the three environments 14-1, 14-2, and 14-3, and provides the action A_1 to the selected environment. As a result, even when a plurality of different environments coexist, the action A_1 of agent 12-1 is returned to the correct environment. The environment selection unit 16 performs the same environment selection for agents 12-2 and 12-3, so the actions A_2 and A_3 selected by agents 12-2 and 12-3 are likewise returned to the appropriate environments. The function of the environment selection unit 16 is realized by the control server 10 determining, based on the wireless environment information, the environment in which each of wireless communication terminals 1 to N is placed.

The model shown in FIG. 9 further includes an agent selection unit 18. The agent selection unit 18 provides the states S_1, S_2, and S_3 and the rewards R_1, R_2, and R_3 supplied by environments 14-1, 14-2, and 14-3, respectively, to the appropriate agents. The function of the agent selection unit 18 is realized by the control server 10 providing control information to the appropriate ones of wireless communication terminals 1 to N.
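The routing performed by the environment selection unit 16 and the agent selection unit 18 can be sketched as follows. This is a minimal illustration under assumptions: the class names, the `classify` callback that maps wireless environment information to an environment, the `"standard"` key, and the representation of an environment as a callable returning `(state, reward)` are all hypothetical.

```python
from collections import defaultdict


class EnvironmentSelector:
    """Sketch of unit 16: send each action to the environment the agent is in."""

    def __init__(self, environments, classify):
        self.environments = environments  # env id -> callable(action) -> (state, reward)
        self.classify = classify          # wireless environment info -> env id

    def submit(self, action, radio_info):
        env_id = self.classify(radio_info)
        state, reward = self.environments[env_id](action)
        return env_id, state, reward


class AgentSelector:
    """Sketch of unit 18: hand each (state, reward) back to the right agent."""

    def __init__(self):
        self.inbox = defaultdict(list)    # agent id -> list of (state, reward)

    def deliver(self, agent_id, state, reward):
        self.inbox[agent_id].append((state, reward))


def control_cycle(actions, radio_infos, env_selector, agent_selector):
    """One pass: route every agent's action out and its feedback back."""
    for agent_id, action in actions.items():
        _, state, reward = env_selector.submit(action, radio_infos[agent_id])
        agent_selector.deliver(agent_id, state, reward)
    return agent_selector.inbox
```

A `classify` function could, for example, key on the standard in use or on the frequency band currently active in a multi-RF terminal, so that the same action is scored differently depending on the environment it lands in.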

As described above, when optimization control is performed for a plurality of wireless communication systems coexisting within the same frequency resource, the model shown in FIG. 9 makes it possible to evaluate the selected action A while switching among a plurality of environments as appropriate. Similarly, even when the control targets include a wireless communication terminal that operates by switching among a plurality of frequency bands, the model shown in FIG. 9 can properly evaluate the usefulness of the selected action A by switching among a plurality of environments.

10 Control server
12, 12-1, 12-2, 12-3 Agent
14, 14-1, 14-2, 14-3 Environment
16 Environment selection unit
18 Agent selection unit

Claims

A method of optimizing a wireless communication system, executed in a wireless communication environment in which a plurality of different wireless communication systems coexist, the method including:
a step of detecting wireless environment information, including a state related to wireless communication, from wireless communication terminals belonging to the plurality of wireless communication systems;
a step of providing the wireless environment information to each of agents prepared in correspondence with the conditions imposed on each of the plurality of wireless communication systems;
a step of causing a computer to execute, for each of the agents, reinforcement learning to which the conditions and the wireless environment information are applied;
a step of causing a computer to calculate, for each of the plurality of wireless communication systems, the optimum control parameters under the wireless communication environment based on the result of the reinforcement learning; and
a step of providing the control parameters to the wireless communication terminals belonging to the corresponding wireless communication system.

The plurality of wireless communication systems include wireless communication systems that are subject to different conditions.
The optimization method according to claim 1, wherein in the reinforcement learning, the reward, state, and action of the agent are set for each set of identical conditions, and optimization is pursued.

The conditions imposed on each of the plurality of wireless communication systems include a plurality of requirements.
The optimization method according to claim 2, wherein in the reinforcement learning, the requirements that should be given the highest priority for each of the wireless communication systems are applied to the corresponding agents, and optimization is aimed at.

In the wireless communication environment, a plurality of wireless communication systems conforming to different wireless communication standards are mixed.
The optimization method according to claim 2, wherein in the reinforcement learning, different conditions are set for each wireless communication standard and optimization is aimed at.

The wireless communication environment is a mixture of different environments.
The optimization method according to claim 1, wherein in the reinforcement learning, an environmental evaluation corresponding to each of the agents is set for each of the plurality of environments, and optimization is aimed at.

A wireless communication system that operates in a wireless communication environment in which a plurality of different wireless communication systems coexist, the wireless communication system including
a control server that receives wireless environment information from the plurality of wireless communication systems and provides control information to the plurality of wireless communication systems,
wherein the control server executes:
a process of detecting wireless environment information, including a state related to wireless communication, from wireless communication terminals belonging to the plurality of wireless communication systems;
a process of providing the wireless environment information to each of agents prepared in correspondence with the conditions imposed on each of the plurality of wireless communication systems;
a process of performing reinforcement learning by applying the conditions and the wireless environment information to each of the agents;
a process of calculating, for each of the plurality of wireless communication systems, the optimum control parameters under the wireless communication environment based on the result of the reinforcement learning; and
a process of providing the control parameters to the wireless communication terminals belonging to the corresponding wireless communication system.

A program for a wireless communication system, implemented in a control server that receives wireless environment information from a plurality of wireless communication systems and provides control information to the plurality of wireless communication systems,
the program causing the control server to execute:
a process of detecting wireless environment information, including a state related to wireless communication, from wireless communication terminals belonging to the plurality of wireless communication systems;
a process of providing the wireless environment information to each of agents prepared in correspondence with the conditions imposed on each of the plurality of wireless communication systems;
a process of performing reinforcement learning by applying the conditions and the wireless environment information to each of the agents;
a process of calculating, for each of the plurality of wireless communication systems, the optimum control parameters under the wireless communication environment in which the plurality of wireless communication systems are operating, based on the result of the reinforcement learning; and
a process of providing the control parameters to the wireless communication terminals belonging to the corresponding wireless communication system.