JP7645640B2

JP7645640B2 - Vehicle control device, vehicle control method, and program

Info

Publication number: JP7645640B2
Application number: JP2021002159A
Authority: JP
Inventors: 建後藤
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2021-01-08
Filing date: 2021-01-08
Publication date: 2025-03-14
Anticipated expiration: 2041-01-08
Also published as: JP2022107296A

Description

The present invention relates to a vehicle control device, a vehicle control method, and a program.

Technology for automatically controlling vehicles (automated driving) is being put into practical use. Research is under way on performing the various kinds of control involved in automated driving not only through feedback control based on the current environment but also through modeling that takes future states into account. Patent Documents 1 and 2 describe generating an action plan for automated driving through modeling that uses a Markov decision process.

Patent Document 1: Japanese Translation of PCT Application Publication No. 2020-510570
Patent Document 2: International Publication No. 2019/167457

Vehicle speed control involves various factors that must be taken into account, and an incorrect solution may be derived unless appropriate constraints are set. There is also a concern that unnecessary trial and error will occur and make the processing load excessive.

The present invention was made in view of these circumstances, and one of its objects is to provide a vehicle control device, a vehicle control method, and a program that can obtain a suitable solution by setting appropriate constraints when performing speed control that takes future states into account.

The vehicle control device, vehicle control method, and program according to the present invention employ the following configurations.
(1): A vehicle control device according to one aspect of the present invention includes: a state value calculation unit that calculates a state value for each of a plurality of state paths from a start time point to a target time point, each state path being defined, in a state space whose axes are a plurality of elements that include time and relate to the movement of a vehicle, by selecting from a plurality of candidates the action amount that occurs between the plurality of time points from the start time point to the target time point; and a speed determination unit that determines a future speed transition of the vehicle in accordance with a state path having a high state value. The state value calculation unit calculates, for each time point from the start time point to the target time point, a reward function value based on one or more evaluation target quantities, and calculates the state value by summing the reward function values in time series. The state value calculation unit calculates the state value in accordance with a first region that lowers the state value and a second region that lowers the reward function value, the regions being set in some or all of a plurality of subspaces each having two or more of the plurality of elements as axes.

(2): In aspect (1) above, the plurality of elements include some or all of time, position in the traveling direction of the vehicle, speed, and acceleration.

(3): In aspect (1) or (2) above, the action amount is jerk.

(4): In any of aspects (1) to (3) above, the one or more evaluation target quantities include the speed, acceleration, and jerk of the vehicle.

(5): In any of aspects (1) to (4) above, the vehicle control device further includes a recognition unit that recognizes the surrounding situation of the vehicle, and the state value calculation unit sets one or both of the first region and the second region in some or all of the plurality of subspaces based on the surrounding situation of the vehicle.

(6): In aspect (5) above, when the surrounding situation of the vehicle includes a static target, the state value calculation unit sets the first region and the second region in a subspace whose axes are position and speed.

(7): In aspect (5) or (6) above, when the surrounding situation of the vehicle includes a dynamic target, the state value calculation unit sets the first region and the second region in a subspace whose axes are position and time.

(8): In any of aspects (5) to (7) above, the state value calculation unit sets the first region and the second region adjacent to each other.

(9): A vehicle control method according to another aspect of the present invention is a method in which a vehicle control device: calculates a state value for each of a plurality of state paths from a start time point to a target time point, each state path being defined, in a state space whose axes are a plurality of elements that include time and relate to the movement of a vehicle, by selecting from a plurality of candidates the action amount that occurs between the plurality of time points from the start time point to the target time point; determines a future speed transition of the vehicle in accordance with a state path having a high state value; when calculating the state value, calculates, for each time point from the start time point to the target time point, a reward function value based on one or more evaluation target quantities, and calculates the state value by summing the reward function values in time series; and calculates the state value in accordance with a first region that lowers the state value and a second region that lowers the reward function value, the regions being set in some or all of a plurality of subspaces each having two or more of the plurality of elements as axes.

(10): A program according to another aspect of the present invention causes a computer to calculate a state value for each of a plurality of state paths from a start time point to a target time point, each state path being defined, in a state space whose axes are a plurality of elements that include time and relate to the movement of a vehicle, by selecting from a plurality of candidates the action amount that occurs between the plurality of time points from the start time point to the target time point, and to determine a future speed transition of the vehicle in accordance with a state path having a high state value. When calculating the state value, the program causes the computer to calculate, for each time point from the start time point to the target time point, a reward function value based on one or more evaluation target quantities, to calculate the state value by summing the reward function values in time series, and to calculate the state value in accordance with a first region that lowers the state value and a second region that lowers the reward function value, the regions being set in some or all of a plurality of subspaces each having two or more of the plurality of elements as axes.

According to aspects (1) to (10) above, a suitable solution can be obtained by setting appropriate constraints when performing speed control that takes future states into account.
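The calculation described in aspects (1) to (8) can be illustrated with a small sketch. The following Python example is written under stated assumptions and is not the patented implementation: the time step, the jerk candidates, the reward weights, the penalty sizes, the exhaustive enumeration of paths, and the representation of regions as predicate functions are all hypothetical. In particular, it assumes that entering a first region subtracts a large fixed penalty from the summed state value, while a second region subtracts a smaller amount from the reward function value at each time point that falls inside it.

```python
from itertools import product

# All constants below are illustrative assumptions, not values from the source.
DT = 1.0                             # interval between time points [s]
JERK_CANDIDATES = (-1.0, 0.0, 1.0)   # candidate action amounts [m/s^3]
N_STEPS = 4                          # time points between start and target
TARGET_SPEED = 10.0                  # desired speed [m/s]
FIRST_REGION_PENALTY = 1000.0        # subtracted once from the state value
SECOND_REGION_PENALTY = 5.0          # subtracted from the per-step reward


def step(state, jerk):
    """Advance a state (t, s, v, a) by one interval under constant jerk."""
    t, s, v, a = state
    s_next = s + v * DT + 0.5 * a * DT**2 + jerk * DT**3 / 6.0
    v_next = v + a * DT + 0.5 * jerk * DT**2
    a_next = a + jerk * DT
    return (t + DT, s_next, v_next, a_next)


def reward(state, jerk, second_regions):
    """Reward function value from speed, acceleration, and jerk."""
    _, _, v, a = state
    r = -abs(v - TARGET_SPEED) - 0.5 * abs(a) - 0.2 * abs(jerk)
    # A second region lowers the reward function value at this time point.
    if any(region(state) for region in second_regions):
        r -= SECOND_REGION_PENALTY
    return r


def state_value(initial, jerks, first_regions, second_regions):
    """Sum the rewards in time series, then apply the first-region penalty."""
    state, total, hit_first = initial, 0.0, False
    for jerk in jerks:
        state = step(state, jerk)
        total += reward(state, jerk, second_regions)
        hit_first = hit_first or any(region(state) for region in first_regions)
    # A first region lowers the state value itself (assumed: fixed penalty).
    return total - (FIRST_REGION_PENALTY if hit_first else 0.0)


def best_path(initial, first_regions, second_regions):
    """Return the jerk sequence whose state path has the highest state value."""
    paths = product(JERK_CANDIDATES, repeat=N_STEPS)
    return max(
        paths,
        key=lambda js: state_value(initial, js, first_regions, second_regions),
    )
```

Under these assumptions, a stop line 30 m ahead (a static target) could be expressed in the position-speed subspace as `first = lambda st: st[1] > 30.0 and st[2] > 0.0`, with an adjacent second region `second = lambda st: 20.0 < st[1] <= 30.0 and st[2] > 5.0`, and evaluated with `best_path((0.0, 0.0, 10.0, 0.0), [first], [second])`. A dynamic target would use predicates over position and time (`st[1]` and `st[0]`) instead.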

FIG. 1 is a configuration diagram of a vehicle system 1 that uses a vehicle control device according to an embodiment.
FIG. 2 is a functional configuration diagram of a first control unit 120 and a second control unit 160.
FIG. 3 is a diagram for explaining the definition of a state path.
FIG. 4 is a diagram showing a first example of a subspace constraint map 146.
FIG. 5 is a diagram showing a second example of the subspace constraint map 146.
FIG. 6 is a flowchart showing an example of the flow of processing executed by a state value calculation unit 142 and a speed determination unit 144.

Embodiments of the vehicle control device, vehicle control method, and program of the present invention are described below with reference to the drawings.

[Overall configuration]
FIG. 1 is a configuration diagram of a vehicle system 1 that uses a vehicle control device according to an embodiment. The vehicle on which the vehicle system 1 is mounted is, for example, a two-wheeled, three-wheeled, or four-wheeled vehicle, and its drive source is an internal combustion engine such as a diesel engine or a gasoline engine, an electric motor, or a combination of these. The electric motor operates using power generated by a generator connected to the internal combustion engine, or power discharged from a secondary battery or a fuel cell.

The vehicle system 1 includes, for example, a camera 10, a radar device 12, a LIDAR (Light Detection and Ranging) 14, an object recognition device 16, a communication device 20, an HMI (Human Machine Interface) 30, a vehicle sensor 40, a navigation device 50, an MPU (Map Positioning Unit) 60, a driving operator 80, an automatic driving control device 100, a driving force output device 200, a brake device 210, and a steering device 220. These devices and pieces of equipment are connected to one another by a multiplex communication line such as a CAN (Controller Area Network) communication line, a serial communication line, a wireless communication network, or the like. The configuration shown in FIG. 1 is merely an example; part of the configuration may be omitted, and other components may be added.

The camera 10 is, for example, a digital camera that uses a solid-state imaging element such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor). The camera 10 is attached to any location on the vehicle in which the vehicle system 1 is mounted (hereinafter, host vehicle M). To capture images ahead of the vehicle, the camera 10 is attached to the upper part of the front windshield, the back of the rearview mirror, or the like. The camera 10, for example, periodically and repeatedly captures images of the surroundings of host vehicle M. The camera 10 may be a stereo camera.

The radar device 12 emits radio waves such as millimeter waves around host vehicle M and detects the radio waves reflected by an object (reflected waves) to detect at least the position (distance and direction) of the object. The radar device 12 is attached to any location on host vehicle M. The radar device 12 may detect the position and speed of an object using the FM-CW (Frequency Modulated Continuous Wave) method.
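As background on how an FM-CW radar can yield both position and speed, the sketch below uses the textbook triangular-modulation relations. It is an illustration under assumptions, not the radar device 12's actual processing: the function names, the parameter set, and the sign convention (an approaching target lowers the up-ramp beat frequency and raises the down-ramp beat frequency) are all hypothetical choices.

```python
C = 299_792_458.0  # speed of light [m/s]


def beat_frequencies(range_m, closing_speed_mps, bandwidth_hz, ramp_time_s, carrier_hz):
    """Forward model: beat frequencies (up-ramp, down-ramp) for one target.

    Assumed convention: closing_speed_mps > 0 means the target approaches.
    """
    f_range = 2.0 * range_m * bandwidth_hz / (C * ramp_time_s)
    f_doppler = 2.0 * closing_speed_mps * carrier_hz / C
    return f_range - f_doppler, f_range + f_doppler


def range_and_speed(f_up_hz, f_down_hz, bandwidth_hz, ramp_time_s, carrier_hz):
    """Inverse model: recover range [m] and closing speed [m/s]."""
    f_range = (f_up_hz + f_down_hz) / 2.0    # range-dependent component
    f_doppler = (f_down_hz - f_up_hz) / 2.0  # Doppler component
    range_m = C * ramp_time_s * f_range / (2.0 * bandwidth_hz)
    speed_mps = C * f_doppler / (2.0 * carrier_hz)
    return range_m, speed_mps
```

Averaging the two beat frequencies isolates the range term, and halving their difference isolates the Doppler term, which is why a single sweep pair is enough to separate distance from speed in this idealized single-target case.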

The LIDAR 14 irradiates the surroundings of host vehicle M with light (or electromagnetic waves with a wavelength close to that of light) and measures the scattered light. The LIDAR 14 detects the distance to a target based on the time from light emission to light reception. The irradiated light is, for example, pulsed laser light. The LIDAR 14 is attached to any location on host vehicle M.
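The time-of-flight relation behind this distance measurement can be written in a few lines. This is a minimal sketch; the function name is hypothetical, and it assumes propagation at the vacuum speed of light with no correction for the medium or for timing offsets.

```python
SPEED_OF_LIGHT = 299_792_458.0  # [m/s]


def tof_distance(round_trip_time_s: float) -> float:
    """Distance to the target from the emission-to-reception time.

    The light travels to the target and back, hence the division by two.
    """
    if round_trip_time_s < 0.0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0
```

A round trip of 200 ns corresponds to a target roughly 30 m away.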

The object recognition device 16 performs sensor fusion processing on the detection results from some or all of the camera 10, the radar device 12, and the LIDAR 14 to recognize the position, type, speed, and so on of an object. The object recognition device 16 outputs the recognition results to the automatic driving control device 100. The object recognition device 16 may output the detection results of the camera 10, the radar device 12, and the LIDAR 14 to the automatic driving control device 100 as they are. The object recognition device 16 may be omitted from the vehicle system 1.

The communication device 20 communicates with other vehicles in the vicinity of host vehicle M using, for example, a cellular network, a Wi-Fi network, Bluetooth (registered trademark), or DSRC (Dedicated Short Range Communication), or communicates with various server devices via a wireless base station.

The HMI 30 presents various kinds of information to the occupants of host vehicle M and accepts input operations from the occupants. The HMI 30 includes various display devices, speakers, buzzers, touch panels, switches, keys, and the like.

The vehicle sensor 40 includes a vehicle speed sensor that detects the speed of host vehicle M, an acceleration sensor that detects acceleration, a yaw rate sensor that detects angular velocity around a vertical axis, a direction sensor that detects the orientation of host vehicle M, and the like.

The navigation device 50 includes, for example, a GNSS (Global Navigation Satellite System) receiver 51, a navigation HMI 52, and a route determination unit 53. The navigation device 50 holds first map information 54 in a storage device such as an HDD (Hard Disk Drive) or a flash memory. The GNSS receiver 51 identifies the position of host vehicle M based on signals received from GNSS satellites. The position of host vehicle M may be identified or complemented by an INS (Inertial Navigation System) that uses the output of the vehicle sensor 40. The navigation HMI 52 includes a display device, a speaker, a touch panel, keys, and the like. The navigation HMI 52 may be partially or entirely shared with the HMI 30 described above. The route determination unit 53 determines, for example, a route (hereinafter, route on the map) from the position of host vehicle M identified by the GNSS receiver 51 (or any input position) to a destination input by an occupant using the navigation HMI 52, by referring to the first map information 54. The first map information 54 is, for example, information in which road shapes are expressed by links indicating roads and nodes connected by the links. The first map information 54 may also include road curvature, POI (Point Of Interest) information, and the like. The route on the map is output to the MPU 60. The navigation device 50 may perform route guidance using the navigation HMI 52 based on the route on the map. The navigation device 50 may be realized by, for example, the functions of a terminal device such as a smartphone or tablet terminal owned by an occupant. The navigation device 50 may transmit the current position and destination to a navigation server via the communication device 20 and obtain a route equivalent to the route on the map from the navigation server.
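The source does not state which search method the route determination unit 53 uses. As one possible illustration of determining a route over a link-and-node map, the sketch below applies Dijkstra's algorithm to a small graph. The data layout (a dictionary from node to outgoing links with lengths in meters) and the function name are assumptions made for this example.

```python
import heapq


def shortest_route(links, start, goal):
    """Return (total_length_m, [nodes]) for the shortest route, or None.

    links: dict mapping a node to a list of (neighbor, length_m) tuples.
    Dijkstra's algorithm is used here as an assumed stand-in for the
    unspecified route search.
    """
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length in links.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (dist + length, neighbor, path + [neighbor]))
    return None  # goal is unreachable from start
```

Link lengths are used as the cost here; travel time or another cost could be substituted without changing the structure.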

The MPU 60 includes, for example, a recommended lane determination unit 61, and holds second map information 62 in a storage device such as an HDD or a flash memory. The recommended lane determination unit 61 divides the route on the map provided by the navigation device 50 into a plurality of blocks (for example, every 100 [m] in the vehicle traveling direction) and determines a recommended lane for each block by referring to the second map information 62. The recommended lane determination unit 61 determines, for example, in which lane counted from the left the vehicle should travel. When a branch point exists on the route on the map, the recommended lane determination unit 61 determines the recommended lane so that host vehicle M can travel along a reasonable route for proceeding to the branch destination.
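The block division and per-block lane choice can be sketched as follows. This is a simplified illustration under assumptions: the route is treated as a one-dimensional distance, lanes are indexed from the left starting at 0, and the rule "move to the branch lane within `prepare_m` meters before the branch" is a hypothetical stand-in for whatever the recommended lane determination unit 61 actually evaluates.

```python
def split_into_blocks(route_length_m, block_length_m=100.0):
    """Divide a route into consecutive (start_m, end_m) blocks."""
    blocks = []
    start = 0.0
    while start < route_length_m:
        end = min(start + block_length_m, route_length_m)
        blocks.append((start, end))
        start = end
    return blocks


def recommended_lanes(blocks, branch_at_m, branch_lane, default_lane, prepare_m=300.0):
    """Pick a lane index (counted from the left) for each block.

    Blocks overlapping the assumed preparation zone before the branch
    get branch_lane; all other blocks keep default_lane.
    """
    lanes = []
    for start, end in blocks:
        near_branch = start < branch_at_m and end > branch_at_m - prepare_m
        lanes.append(branch_lane if near_branch else default_lane)
    return lanes
```

For a 500 m route with a branch at 450 m and a 200 m preparation zone, the last three 100 m blocks are assigned the branch lane.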

The second map information 62 is map information with higher accuracy than the first map information 54. The second map information 62 includes, for example, information on lane centers or information on lane boundaries. The second map information 62 may also include road information, traffic regulation information, address information (addresses and postal codes), facility information, telephone number information, and the like. The second map information 62 may be updated at any time through communication between the communication device 20 and other devices.

The driving operator 80 includes, for example, an accelerator pedal, a brake pedal, a shift lever, a steering wheel, an irregularly shaped steering wheel, a joystick, and other operators. A sensor that detects the amount of operation or the presence or absence of operation is attached to the driving operator 80, and the detection results are output to the automatic driving control device 100, or to some or all of the driving force output device 200, the brake device 210, and the steering device 220.

The automatic driving control device 100 includes, for example, a first control unit 120 and a second control unit 160. The first control unit 120 and the second control unit 160 are each realized by, for example, a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may be realized by hardware (including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation between software and hardware. The program may be stored in advance in a storage device (a storage device having a non-transitory storage medium) such as an HDD or a flash memory of the automatic driving control device 100, or may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or a CD-ROM and installed in the HDD or flash memory of the automatic driving control device 100 when the storage medium is loaded into a drive device. The automatic driving control device 100 is an example of a "vehicle control device," and the combination of the action plan generation unit 140 and the second control unit 160 is an example of a "driving control unit."

FIG. 2 is a functional configuration diagram of the first control unit 120 and the second control unit 160. The first control unit 120 includes, for example, a recognition unit 130 and an action plan generation unit 140. The action plan generation unit 140 includes a state value calculation unit 142 and a speed determination unit 144. The state value calculation unit 142 includes a state path definition unit 142A and a calculation unit 142B. One or more subspace constraint maps 146 are stored in a memory area that the action plan generation unit 140 can reference. The state value calculation unit 142, the speed determination unit 144, and the subspace constraint maps 146 are described later.

The first control unit 120 realizes, for example, a function based on AI (Artificial Intelligence) and a function based on a model given in advance in parallel. For example, the function of "recognizing an intersection" may be realized by executing, in parallel, intersection recognition by deep learning or the like and recognition based on conditions given in advance (such as traffic signals and road markings that can be pattern matched), scoring both, and evaluating them comprehensively. This ensures the reliability of automated driving.
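The source says only that both recognizers are scored and evaluated comprehensively, without giving the scoring rule. A weighted average with a decision threshold is one simple way this could look; the weights, the threshold, and the function name below are assumptions for illustration.

```python
def fuse_recognition(learned_score, rule_score, learned_weight=0.5, threshold=0.6):
    """Combine a learned-model score and a rule-based score, both in [0, 1].

    Returns (combined_score, recognized). The weighted average and the
    threshold are illustrative assumptions, not the source's method.
    """
    if not 0.0 <= learned_weight <= 1.0:
        raise ValueError("learned_weight must be within [0, 1]")
    combined = learned_weight * learned_score + (1.0 - learned_weight) * rule_score
    return combined, combined >= threshold
```

With equal weights, a confident learned recognizer (0.9) and a lukewarm rule-based one (0.5) still clear a 0.6 threshold, whereas two weak scores do not.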

The recognition unit 130 recognizes the position of objects around host vehicle M and states such as their speed and acceleration, based on information input from the camera 10, the radar device 12, and the LIDAR 14 via the object recognition device 16. The position of an object is recognized, for example, as a position on absolute coordinates whose origin is a representative point of host vehicle M (such as the center of gravity or the center of the drive shaft), and is used for control. The position of an object may be represented by a representative point such as the center of gravity or a corner of the object, or by a region. The "state" of an object may include the acceleration or jerk of the object, or its "behavioral state" (for example, whether it is changing lanes or is about to change lanes).

The recognition unit 130 also recognizes, for example, the lane in which host vehicle M is traveling (the driving lane). For example, the recognition unit 130 recognizes the driving lane by comparing the pattern of road dividing lines obtained from the second map information 62 (for example, the arrangement of solid and dashed lines) with the pattern of road dividing lines around host vehicle M recognized from images captured by the camera 10. The recognition unit 130 may recognize the driving lane by recognizing not only road dividing lines but also travel path boundaries (road boundaries) including road dividing lines, road shoulders, curbs, median strips, guardrails, and the like. In this recognition, the position of host vehicle M obtained from the navigation device 50 and the results of processing by the INS may be taken into account. The recognition unit 130 also recognizes stop lines, obstacles, red lights, toll booths, and other road events.
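The comparison of dividing-line patterns can be sketched as a lookup of the observed left and right line types within the map's left-to-right sequence. The string labels, the lane indexing, and the rule of returning `None` on an ambiguous match are assumptions; a real system would presumably resolve ambiguity with the position information mentioned above.

```python
def match_driving_lane(map_line_types, camera_left, camera_right):
    """Return the lane index (0 = leftmost) matching the observed lines.

    map_line_types: dividing-line types from left to right across the road;
    lane i lies between lines i and i + 1. Returns None when no lane or
    more than one lane matches the observed pair.
    """
    matches = [
        i
        for i in range(len(map_line_types) - 1)
        if map_line_types[i] == camera_left and map_line_types[i + 1] == camera_right
    ]
    return matches[0] if len(matches) == 1 else None
```

On a two-lane road described as `["solid", "dashed", "solid"]`, seeing a dashed line on the left and a solid line on the right identifies lane 1.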

When recognizing the driving lane, the recognition unit 130 recognizes the position and attitude of host vehicle M relative to the driving lane. For example, the recognition unit 130 may recognize, as the relative position and attitude of host vehicle M with respect to the driving lane, the deviation of a reference point of host vehicle M from the lane center and the angle that the traveling direction of host vehicle M forms with a line connecting the lane centers. Alternatively, the recognition unit 130 may recognize the position of the reference point of host vehicle M relative to either side edge of the driving lane (a road dividing line or a road boundary) as the relative position of host vehicle M with respect to the driving lane.
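The two quantities named here, the deviation from the lane center and the angle to the lane-center line, can be computed from plane geometry. The sketch below assumes a locally straight center line given by two points, a 2D Cartesian frame, and a sign convention in which positions to the left of the center line are positive; the function name and conventions are hypothetical.

```python
import math


def lane_relative_pose(ref_point, heading_rad, center_a, center_b):
    """Return (lateral_offset_m, heading_error_rad) relative to the lane center.

    The lane center line is approximated by the segment center_a -> center_b.
    Assumed convention: lateral offset is positive to the left of the line.
    """
    dx, dy = center_b[0] - center_a[0], center_b[1] - center_a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("center line points must differ")
    px, py = ref_point[0] - center_a[0], ref_point[1] - center_a[1]
    # 2D cross product divided by the segment length gives the signed distance.
    lateral = (dx * py - dy * px) / length
    lane_heading = math.atan2(dy, dx)
    # Wrap the heading difference into [-pi, pi).
    error = (heading_rad - lane_heading + math.pi) % (2.0 * math.pi) - math.pi
    return lateral, error
```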

In principle, the action plan generation unit 140 generates a target trajectory along which host vehicle M will travel in the future automatically (without depending on the driver's operation), so that host vehicle M travels in the recommended lane determined by the recommended lane determination unit 61 and can also respond to its surrounding situation. The target trajectory includes, for example, a speed element. For example, the target trajectory is expressed as a sequence of points (trajectory points) that host vehicle M should reach. A trajectory point is a point that host vehicle M should reach at each predetermined travel distance along the road (for example, about every several [m]); separately from this, a target speed and a target acceleration for each predetermined sampling time (for example, about a few tenths of a [sec]) are generated as part of the target trajectory. Alternatively, a trajectory point may be the position that host vehicle M should reach at each sampling time, for each predetermined sampling time. In this case, the information on target speed and target acceleration is expressed by the spacing between trajectory points.
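When trajectory points are placed one per sampling time, the spacing between consecutive points encodes the target speed. The sketch below recovers those speeds; it assumes 2D points, a constant sampling time, and straight-line distance between points, and the function name is hypothetical.

```python
import math


def speeds_from_trajectory_points(points, sampling_time_s):
    """Return the speed [m/s] implied by each pair of consecutive points.

    points: list of (x_m, y_m) positions, one per sampling time.
    """
    if sampling_time_s <= 0.0:
        raise ValueError("sampling time must be positive")
    speeds = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        speeds.append(math.hypot(x1 - x0, y1 - y0) / sampling_time_s)
    return speeds
```

Target accelerations follow in the same way from differences between consecutive speeds divided by the sampling time.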

The action plan generation unit 140 may set automated driving events when generating the target trajectory. Automated driving events include a constant speed driving event, a low speed following driving event, a lane change event, a branching event, a merging event, a takeover event, and the like. The action plan generation unit 140 generates a target trajectory according to the activated event.

The second control unit 160 controls the driving force output device 200, the brake device 210, and the steering device 220 so that host vehicle M passes along the target trajectory generated by the action plan generation unit 140 at the scheduled times.

Returning to FIG. 2, the second control unit 160 includes, for example, an acquisition unit 162, a speed control unit 164, and a steering control unit 166. The acquisition unit 162 acquires information on the target trajectory (trajectory points) generated by the action plan generation unit 140 and stores it in a memory (not shown). The speed control unit 164 controls the driving force output device 200 or the brake device 210 based on the speed element associated with the target trajectory stored in the memory. The steering control unit 166 controls the steering device 220 according to the curvature of the target trajectory stored in the memory. The processing of the speed control unit 164 and the steering control unit 166 is realized, for example, by a combination of feedforward control and feedback control. As one example, the steering control unit 166 combines feedforward control according to the curvature of the road ahead of host vehicle M with feedback control based on the deviation from the target trajectory.
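The combination of curvature-based feedforward and deviation-based feedback can be sketched as below. This is an illustration under assumptions, not the steering control unit 166's actual control law: the feedforward term uses the kinematic bicycle model, the feedback term is a plain proportional gain on lateral error, and the wheelbase, gain, limit, and sign convention (positive values to the left) are hypothetical.

```python
import math


def steering_command(road_curvature, lateral_error_m,
                     wheelbase_m=2.7, feedback_gain=0.1, max_angle_rad=0.5):
    """Return a steering angle [rad] from feedforward plus feedback.

    road_curvature: curvature of the road ahead [1/m], positive for left turns.
    lateral_error_m: deviation from the target trajectory, positive to the left.
    """
    # Feedforward: angle that tracks the curvature in a kinematic bicycle model.
    feedforward = math.atan(wheelbase_m * road_curvature)
    # Feedback: steer back toward the trajectory in proportion to the error.
    feedback = -feedback_gain * lateral_error_m
    angle = feedforward + feedback
    # Clamp to the assumed actuator range.
    return max(-max_angle_rad, min(max_angle_rad, angle))
```

The feedforward term handles the known road shape without waiting for an error to build up, while the feedback term corrects whatever deviation remains.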

走行駆動力出力装置２００は、車両が走行するための走行駆動力（トルク）を駆動輪に出力する。走行駆動力出力装置２００は、例えば、内燃機関、電動機、および変速機などの組み合わせと、これらを制御するＥＣＵ（Electronic Control Unit）とを備える。ＥＣＵは、第２制御部１６０から入力される情報、或いは運転操作子８０から入力される情報に従って、上記の構成を制御する。 The driving force output device 200 outputs a driving force (torque) to the drive wheels for the vehicle to travel. The driving force output device 200 includes, for example, a combination of an internal combustion engine, an electric motor, and a transmission, and an ECU (Electronic Control Unit) that controls these. The ECU controls the above configuration according to information input from the second control unit 160 or information input from the driving operator 80.

ブレーキ装置２１０は、例えば、ブレーキキャリパーと、ブレーキキャリパーに油圧を伝達するシリンダと、シリンダに油圧を発生させる電動モータと、ブレーキＥＣＵとを備える。ブレーキＥＣＵは、第２制御部１６０から入力される情報、或いは運転操作子８０から入力される情報に従って電動モータを制御し、制動操作に応じたブレーキトルクが各車輪に出力されるようにする。ブレーキ装置２１０は、運転操作子８０に含まれるブレーキペダルの操作によって発生させた油圧を、マスターシリンダを介してシリンダに伝達する機構をバックアップとして備えてよい。なお、ブレーキ装置２１０は、上記説明した構成に限らず、第２制御部１６０から入力される情報に従ってアクチュエータを制御して、マスターシリンダの油圧をシリンダに伝達する電子制御式油圧ブレーキ装置であってもよい。 The brake device 210 includes, for example, a brake caliper, a cylinder that transmits hydraulic pressure to the brake caliper, an electric motor that generates hydraulic pressure in the cylinder, and a brake ECU. The brake ECU controls the electric motor according to information input from the second control unit 160 or information input from the driving operator 80, so that a brake torque corresponding to the braking operation is output to each wheel. As a backup, the brake device 210 may include a mechanism that transmits hydraulic pressure generated by operating the brake pedal included in the driving operator 80 to the cylinder via a master cylinder. Note that the brake device 210 is not limited to the configuration described above, and may be an electronically controlled hydraulic brake device that controls an actuator according to information input from the second control unit 160 to transmit the hydraulic pressure of the master cylinder to the cylinder.

ステアリング装置２２０は、例えば、ステアリングＥＣＵと、電動モータとを備える。電動モータは、例えば、ラックアンドピニオン機構に力を作用させて転舵輪の向きを変更する。ステアリングＥＣＵは、第２制御部１６０から入力される情報、或いは運転操作子８０から入力される情報に従って、電動モータを駆動し、転舵輪の向きを変更させる。 The steering device 220 includes, for example, a steering ECU and an electric motor. The electric motor changes the direction of the steered wheels by, for example, applying a force to a rack and pinion mechanism. The steering ECU drives the electric motor according to information input from the second control unit 160 or information input from the driving operator 80, to change the direction of the steered wheels.

［状態価値に基づく速度制御］ [Speed control based on state value]
以下、状態価値算出部１４２と速度決定部１４４による速度制御について説明する。状態価値算出部１４２と速度決定部１４４は、動的計画法、より具体的にはマルコフ決定過程を用いて自車両Ｍの将来の速度推移（以下、速度プロファイル）を決定する。 The following describes the speed control performed by the state value calculation unit 142 and the speed determination unit 144. The state value calculation unit 142 and the speed determination unit 144 determine a future speed transition (hereinafter, speed profile) of the host vehicle M using dynamic programming, more specifically, a Markov decision process.

状態価値算出部１４２の状態パス定義部１４２Ａは、車両の移動に関連する複数の要素であって、時間を含む複数の要素を軸とする状態空間において、開始時点から目標時点までの間の複数の時点間に生じる行動量を複数の候補の中から選択することで状態パスを定義する。「開始時点」とは、例えば現時点である（制御遅れを考慮して微小時間後の時点でもよい）。「複数の要素」は、例えば、｛時間、進行方向（道路長手方向）に関する位置、速度、加速度｝である。時間以外の要素が複数の要素から除外されてもよいし、別の要素が複数の要素に追加されてもよい。以下、各要素が具体的な値をとることで決定されるものを「状態」と称する。「行動量」は、速度に関する物理量であればよく、例えばジャーク（躍度）である。この処理において、時間は所定幅（例えば１［ｓｅｃ］）刻みで進行する。図３は、状態パスの定義について説明するための図である。状態パス定義部１４２Ａは、例えば、開始時点から目標時点までの間の複数の時点間（１［ｓｅｃ］経過するごと）に生じるジャークの候補を例えば（０．５）、（０．３）、（０．１）、（０）、（－０．１）、（－０．３）、（－０．５）のように複数個用意し、その時点間でジャークの候補から選択した一つのジャークで定ジャーク走行した場合の次の時点の状態を算出する。状態パス定義部１４２Ａは、これを時点が進行するのに応じて波及的に実行し、目標時点（例えば開始時点の数［ｓｅｃ］後～十数［ｓｅｃ］後）まで行う。図中、ＳＰは状態パスのうち一つを表している。状態パスとは、開始時点から順に辿れる状態を、各時点で一つずつ選択することで決定される、一連の状態をいう。ジャークの候補がｋ個用意され、開始時点から目標時点までの時間がｈ［ｓｅｃ］であるとすると、ｋのｈ乗の状態パスが生成される。 The state path definition unit 142A of the state value calculation unit 142 defines a state path in a state space whose axes are a plurality of elements that relate to the movement of the vehicle and include time, by selecting, from among a plurality of candidates, the amount of action occurring between successive time points from the start time point to the target time point. The "start time point" is, for example, the current time (or a time point a short time later, to allow for control delay). The "plurality of elements" are, for example, {time, position in the traveling direction (longitudinal direction of the road), speed, acceleration}. Elements other than time may be excluded from the plurality of elements, and other elements may be added. Hereinafter, what is determined when each element takes a specific value is referred to as a "state". The "amount of action" may be any physical quantity related to speed, for example jerk (the time derivative of acceleration). In this process, time advances in steps of a predetermined width (for example, 1 [sec]). FIG. 3 is a diagram for explaining the definition of a state path. The state path definition unit 142A prepares a plurality of candidates for the jerk occurring between time points (each time 1 [sec] elapses) from the start time point to the target time point, such as (0.5), (0.3), (0.1), (0), (-0.1), (-0.3), and (-0.5), and calculates the state at the next time point when the vehicle travels at constant jerk using one jerk selected from the candidates. The state path definition unit 142A repeats this in a cascading manner as time advances, up to the target time point (for example, several [sec] to a dozen or so [sec] after the start time point). In the figure, SP represents one of the state paths. A state path is a series of states determined by selecting, one at each time point, from the states that can be reached in order from the start time point. If k jerk candidates are prepared and the time from the start time point to the target time point is h [sec], k to the power of h state paths are generated.
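The state path definition described above can be sketched as a brute-force enumeration. This is a sketch under stated assumptions rather than the implementation of the state path definition unit 142A: the function names `step` and `enumerate_state_paths`, the state tuple layout `(t, x, v, a)`, and the SI units are assumptions. The jerk candidates and the 1 [sec] step width are the example values from the text.

```python
from itertools import product

# Example jerk candidates from the description (units assumed to be m/s^3).
JERK_CANDIDATES = (0.5, 0.3, 0.1, 0.0, -0.1, -0.3, -0.5)
DT = 1.0  # step width [s], as in the example


def step(state, jerk, dt=DT):
    """Advance a state (t, x, v, a) by dt under constant jerk."""
    t, x, v, a = state
    x_next = x + v * dt + 0.5 * a * dt ** 2 + jerk * dt ** 3 / 6.0
    v_next = v + a * dt + 0.5 * jerk * dt ** 2
    a_next = a + jerk * dt
    return (t + dt, x_next, v_next, a_next)


def enumerate_state_paths(start, horizon_steps, candidates=JERK_CANDIDATES):
    """Yield (jerks, states) for every sequence of jerk choices.

    With k candidates and h steps this yields k**h state paths.
    """
    for jerks in product(candidates, repeat=horizon_steps):
        states = [start]
        for jerk in jerks:
            states.append(step(states[-1], jerk))
        yield jerks, states
```

With the seven example candidates and a three-step horizon this produces 7^3 = 343 paths. The count grows exponentially with the horizon, which is one reason the constraints described later matter for processing load. The sketch does not clip negative speeds, so a practical version would need an additional rule for that case.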

算出部１４２Ｂは、状態パスごとに、開始時点から目標時点までの間の各時点について、一以上の評価対象量に基づく報酬関数値を算出する。評価対象量は、車両の移動に関連する物理量であり、例えば、速度、加速度、およびジャークを含む。報酬関数値を算出するための報酬関数ｆ（ｉ，ｔ）は、例えば式（１）で表される。式中、ｉは状態パスの識別情報であり、ｔは時点である。報酬関数ｆ（ｉ，ｔ）は、例えば、原則的に、速度が高いほど高い値を返し、加速度の絶対値が高いほど低い値を返し、ジャークの絶対値が高いほど低い値を返す関数である。速度が高く、加速度やジャークが低いということは、自車両Ｍが不要な加減速をせずに走行できているということを表すので、報酬関数ｆ（ｉ，ｔ）は、そのような場合に高い値（良好であることを示す値）を返すように定義されている。また、報酬関数ｆ（ｉ，ｔ）が出力する値の最小値は、ゼロになるように設定されている。 The calculation unit 142B calculates a reward function value based on one or more evaluation target quantities for each state path at each time point between the start time point and the target time point. The evaluation target quantities are physical quantities related to the movement of the vehicle, and include, for example, speed, acceleration, and jerk. The reward function f(i,t) for calculating the reward function value is expressed, for example, by formula (1). In the formula, i is identification information of the state path, and t is the time point. The reward function f(i,t) is, for example, a function that returns a higher value as the speed is higher, returns a lower value as the absolute value of the acceleration is higher, and returns a lower value as the absolute value of the jerk is higher. A high speed and low acceleration or jerk indicate that the host vehicle M is able to travel without unnecessary acceleration or deceleration, so the reward function f(i,t) is defined to return a high value (a value indicating good) in such a case. In addition, the minimum value of the value output by the reward function f(i,t) is set to be zero.

報酬関数＝ｆ（ｉ，ｔ）｛速度，加速度，ジャーク｝ …（１） Reward function = f(i,t) {velocity, acceleration, jerk} ... (1)

そして、算出部１４２Ｂは、報酬関数値を時系列に合計することで、状態パスｉごとの状態価値ＳＶ（ｉ）を算出する。状態価値は、例えば式（２）で表される。 Then, the calculation unit 142B calculates the state value SV(i) for each state path i by summing the reward function values in a time series. The state value is expressed by, for example, formula (2).

$SV(i) = \sum_{t=0}^{h} f(i, t)$ …(2)
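Formulas (1) and (2) can be illustrated with a small sketch. The specification does not give the concrete form of f(i, t), so the linear form, the weights `w_v`, `w_a`, `w_j`, and the function names below are hypothetical. The sketch only reproduces the stated properties: higher speed raises the reward, larger absolute acceleration or jerk lowers it, and the minimum output is zero.

```python
def reward(speed, accel, jerk, w_v=1.0, w_a=2.0, w_j=2.0):
    """Hypothetical reward: grows with speed, shrinks with |accel| and |jerk|.

    The result is clamped so that the minimum value is zero.
    """
    return max(0.0, w_v * speed - w_a * abs(accel) - w_j * abs(jerk))


def state_value(samples):
    """Sum the reward over the time points t = 0..h of one state path.

    `samples` is an iterable of (speed, accel, jerk), one per time point.
    """
    return sum(reward(v, a, j) for v, a, j in samples)
```

For example, `state_value([(10, 0, 0), (10, 1, 0.5)])` evaluates to 10 + (10 - 2 - 1) = 17.0 with the default weights.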

上記の計算において、算出部１４２Ｂは、認識部１３０の認識結果に基づいて部分空間制約マップ１４６を設定し、それを反映させて報酬関数値や状態価値を求める。部分空間制約マップ１４６は、複数の「要素」のうち二以上の要素を軸とする複数の部分空間のうち一部または全部に設定されるものである。算出部１４２Ｂは、部分空間制約マップ１４６に、状態価値ＳＶ（ｉ）を低下させる（例えばゼロにする）禁止領域（第１領域）と、報酬関数値を低下させる（例えばゼロにする）非推奨領域（第２領域）とのうち一方または双方を設定する。 In the above calculations, the calculation unit 142B sets the subspace constraint map 146 based on the recognition result of the recognition unit 130, and calculates the reward function value and the state value by reflecting this. The subspace constraint map 146 is set in some or all of the multiple subspaces whose axes are two or more of the multiple "elements". The calculation unit 142B sets, in the subspace constraint map 146, one or both of a prohibited region (first region) in which the state value SV(i) is reduced (e.g., set to zero) and a non-recommended region (second region) in which the reward function value is reduced (e.g., set to zero).

図４は、部分空間制約マップ１４６の第１例を示す図である。この部分空間制約マップ１４６（１）は、静的目標物に対する制約を規定したものであり、位置と速度を軸とした平面で定義されている。図中、Ａ１は禁止領域であり、Ａ２は非推奨領域である。また、Ｖ１は法定速度などの制限速度であり、Ｘ１は「その位置で停止する（速度をゼロにする）べき位置」である。例えば、信号機の手前の停止線の位置などがＸ１として設定される。算出部１４２Ｂは、位置と速度を軸とした平面において速度がＶ１以上の領域を禁止領域Ａ１に設定する。また、算出部１４２Ｂは、想定される最大限の減速をしても位置Ｘ１で停止できない領域を禁止領域Ａ１に設定する。また、算出部１４２Ｂは、禁止領域Ａ１以外の領域において禁止領域Ａ１に近い境界部の領域を、非推奨領域Ａ２に設定する。つまり、禁止領域Ａ１と非推奨領域は隣接している。これらを統合すると、図４に示す部分空間制約マップ１４６（１）となる。禁止領域Ａ１の境界線である直線部Ａ１ａの傾きは、自動運転として許容される最大の減速度に基づいている。非推奨領域Ａ２の境界線である直線部Ａ２ａの傾きは、自動運転として推奨される程度の減速度（最大の減速度よりも絶対値が小さい）に基づいている。 FIG. 4 is a diagram showing a first example of the subspace constraint map 146. This subspace constraint map 146(1) defines constraints for a static target and is defined on a plane with position and speed as axes. In the figure, A1 is a prohibited area and A2 is a non-recommended area. V1 is a speed limit such as the legal speed limit, and X1 is "a position at which the vehicle should stop (bring its speed to zero)". For example, the position of a stop line in front of a traffic light is set as X1. On the plane with position and speed as axes, the calculation unit 142B sets the area where the speed is V1 or higher as the prohibited area A1. The calculation unit 142B also sets, as the prohibited area A1, the area from which the vehicle cannot stop at the position X1 even with the maximum assumed deceleration. Furthermore, the calculation unit 142B sets, as the non-recommended area A2, the boundary area that lies outside the prohibited area A1 but close to it. In other words, the prohibited area A1 and the non-recommended area A2 are adjacent to each other. Combining these yields the subspace constraint map 146(1) shown in FIG. 4. The slope of the straight line portion A1a, which is a boundary line of the prohibited area A1, is based on the maximum deceleration permitted for autonomous driving. The slope of the straight line portion A2a, which is a boundary line of the non-recommended area A2, is based on the deceleration recommended for autonomous driving (whose absolute value is smaller than that of the maximum deceleration).
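The first example of the subspace constraint map can be sketched as a classifier over (position, speed) states. Several points are assumptions and not from the specification: the function name `classify_static`, the deceleration values `a_max` and `a_rec`, the speed margin `v_margin` used for the band below V1, and the treatment of positions past X1 as prohibited. The sketch also swaps in the constant-deceleration stopping distance v²/(2a) for the "cannot stop at X1" test, which gives a curved boundary, whereas the text describes the boundaries A1a and A2a as straight line portions.

```python
def classify_static(x, v, x_stop, v_limit, a_max=4.0, a_rec=2.0, v_margin=1.0):
    """Label a (position, speed) state for a static target.

    x_stop plays the role of X1 and v_limit the role of V1. a_max and a_rec
    are assumed magnitudes [m/s^2] of the maximum and recommended
    decelerations; v_margin [m/s] is an assumed width of the band below V1.
    """
    remaining = x_stop - x  # distance left before the stop position
    # Prohibited area A1: at or above the speed limit, past the stop
    # position, or unable to stop at X1 even with maximum deceleration.
    if v >= v_limit or remaining < 0 or v * v / (2.0 * a_max) > remaining:
        return "prohibited"
    # Non-recommended area A2: the band adjacent to A1.
    if v >= v_limit - v_margin or v * v / (2.0 * a_rec) > remaining:
        return "non_recommended"
    return "free"
```

For instance, at 10 m/s with 20 m remaining the vehicle can still stop at the assumed maximum deceleration (12.5 m needed) but not at the recommended one (25 m needed), so the state falls in the non-recommended area.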

図５は、部分空間制約マップ１４６の第２例を示す図である。この部分空間制約マップ１４６（２）は、動的目標物に対する制約を規定したものであり、時間と位置を軸とした平面で定義されている。部分空間制約マップ１４６（２）が生成されるのは、自車両Ｍが前走車両に追従して走行する場面である。前走車両とは、自車両Ｍと同じ車線上において、自車両Ｍの直前（間に車両が存在しないことを意味する）を自車両Ｍと同じ方向に走行する車両である。算出部１４２Ｂは、前走車両の将来の位置を定速モデル、定加速度モデル、定ジャークモデル、カルマンフィルタ等で予測した上で、将来の前走車両の占める領域にマージン領域を加えた領域を禁止領域Ａ１に、禁止領域Ａ１の境界線Ａ１ｂから、境界線Ａ１ｂを目標車間距離ＴＤだけ平行移動させた線までの領域を非推奨領域Ａ２に設定する。 FIG. 5 is a diagram showing a second example of the subspace constraint map 146. This subspace constraint map 146(2) defines constraints for a dynamic target and is defined on a plane with time and position as axes. The subspace constraint map 146(2) is generated in a scene where the host vehicle M travels following a preceding vehicle. The preceding vehicle is a vehicle traveling in the same lane and in the same direction as the host vehicle M, immediately ahead of it (meaning that no other vehicle is in between). The calculation unit 142B predicts the future position of the preceding vehicle using a constant speed model, a constant acceleration model, a constant jerk model, a Kalman filter, or the like. It then sets, as the prohibited area A1, the area the preceding vehicle will occupy in the future plus a margin area, and sets, as the non-recommended area A2, the area from the boundary line A1b of the prohibited area A1 to the line obtained by translating the boundary line A1b by the target inter-vehicle distance TD.
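The second example can be sketched in the same style, here with the constant speed model, which is the simplest of the prediction models listed above. The function name `classify_dynamic`, the default `margin` and `target_gap` values, and the choice to treat every position at or beyond the boundary A1b as prohibited are assumptions made for illustration.

```python
def classify_dynamic(t, x, lead_rear0, lead_speed, margin=2.0, target_gap=20.0):
    """Label a (time, position) state when following a preceding vehicle.

    lead_rear0 is the position of the rear end of the preceding vehicle at
    t = 0, and lead_speed is its speed under the constant speed model.
    target_gap plays the role of the target inter-vehicle distance TD.
    """
    # Boundary A1b: predicted rear end of the preceding vehicle minus a margin.
    boundary = lead_rear0 + lead_speed * t - margin
    if x >= boundary:
        return "prohibited"
    # Non-recommended area A2: from A1b back to A1b shifted by TD.
    if x >= boundary - target_gap:
        return "non_recommended"
    return "free"
```

Because the boundary moves with the predicted preceding vehicle, the same host position can be free at one time point and non-recommended at another.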

算出部１４２Ｂは、図４および図５で例示した場面ごとに、道路事象（停止線、車両などの交通参加者、信号の状態）に応じてどのように禁止領域Ａ１や非推奨領域Ａ２を定義するかを決定するためのテーブル情報を保有しており、行動計画生成部１４０が起動するイベントに応じてテーブル情報から禁止領域Ａ１や非推奨領域Ａ２の定義規則を取得し、禁止領域Ａ１や非推奨領域Ａ２を定義する。 The calculation unit 142B holds table information for determining how to define the prohibited area A1 and the non-recommended area A2 according to road events (stop lines, traffic participants such as vehicles, and traffic signal status) for each of the scenes illustrated in Figures 4 and 5, and obtains definition rules for the prohibited area A1 and the non-recommended area A2 from the table information according to the event activated by the action plan generation unit 140, and defines the prohibited area A1 and the non-recommended area A2.

算出部１４２Ｂは、禁止領域Ａ１を一度でも通る状態パスｉについて、状態価値ＳＶ（ｉ）をゼロ（最低値）に固定する。また、算出部１４２Ｂは、非推奨領域Ａ２に存在する状態（ｉ，ｔ）について、その状態に関する報酬関数ｆ（ｉ，ｔ）をゼロにする。これによって状態価値ＳＶ（ｉ）も低下するが、状態価値ＳＶ（ｉ）がゼロになる訳では無く、非推奨領域Ａ２に状態（ｉ，ｔ）が存在する状態パスｉが選択される可能性もある。なお、部分空間制約マップ１４６は、平面で定義されるのに限らず、三次元以上の空間で定義されてもよい。 The calculation unit 142B fixes the state value SV(i) to zero (the minimum value) for a state path i that passes through the prohibited area A1 at least once. Furthermore, the calculation unit 142B sets the reward function f(i, t) for a state (i, t) that exists in the non-recommended area A2 to zero. This also reduces the state value SV(i), but this does not mean that the state value SV(i) becomes zero, and there is a possibility that a state path i whose state (i, t) exists in the non-recommended area A2 may be selected. Note that the subspace constraint map 146 is not limited to being defined on a plane, and may be defined in a space of three or more dimensions.
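The two effects described above differ in scope: the prohibited area acts on the whole path, while the non-recommended area acts on a single time point. A minimal sketch, assuming each state of a path has already been labeled with one of the hypothetical strings "prohibited", "non_recommended", or "free":

```python
def constrained_state_value(rewards, labels):
    """Compute SV(i) for one path from per-time-point rewards and labels.

    rewards[t] is the unconstrained f(i, t) and labels[t] is the area that
    the state (i, t) falls in.
    """
    # Passing through the prohibited area even once fixes SV(i) to zero.
    if "prohibited" in labels:
        return 0.0
    # A state in the non-recommended area only loses its own reward.
    return sum(0.0 if label == "non_recommended" else r
               for r, label in zip(rewards, labels))
```

A path with one non-recommended state out of three equal rewards keeps two thirds of its value, so it can still be selected if the alternatives score lower.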

速度決定部１４４は、部分空間制約マップ１４６を反映させて計算した状態価値ＳＶ（ｉ）の高い状態パスの各時点におけるジャークに従って、自車両Ｍの将来の速度を決定する。これによって、無駄な加減速が抑制されると共に、場面に応じた禁止領域Ａ１を通らず、非推奨領域Ａ２をなるべく通らないような速度プロファイルが決定される。 The speed determination unit 144 determines the future speed of the host vehicle M according to the jerk at each time point of the state path with a high state value SV(i) calculated by reflecting the subspace constraint map 146. This suppresses unnecessary acceleration and deceleration, and determines a speed profile that does not pass through the prohibited area A1 according to the scene and avoids passing through the non-recommended area A2 as much as possible.

図６は、状態価値算出部１４２および速度決定部１４４により実行される処理の流れの一例を示すフローチャートである。まず、状態価値算出部１４２は、認識部１３０から自車両Ｍの周辺状況を取得し（ステップＳ１００）、周辺状況に応じた部分空間制約マップ１４６を生成する（ステップＳ１０２）。 Figure 6 is a flowchart showing an example of the flow of processing executed by the state value calculation unit 142 and the speed determination unit 144. First, the state value calculation unit 142 acquires the surrounding conditions of the host vehicle M from the recognition unit 130 (step S100), and generates a subspace constraint map 146 according to the surrounding conditions (step S102).

次に、状態価値算出部１４２は、前述した手法で複数の状態パスを生成し（ステップＳ１０４）、状態パスごとに、部分空間制約マップ１４６に従って報酬関数ｆ（ｉ，ｔ）を算出し、次いで状態価値ＳＶ（ｉ）を算出する（ステップＳ１０６）。そして、速度決定部１４４が、状態価値ＳＶ（ｉ）の高い状態パスの各時点におけるジャークに従って、自車両Ｍの速度プロファイルを決定する（ステップＳ１０８）。 Next, the state value calculation unit 142 generates multiple state paths using the method described above (step S104), calculates the reward function f(i,t) for each state path according to the subspace constraint map 146, and then calculates the state value SV(i) (step S106). Then, the speed determination unit 144 determines the speed profile of the host vehicle M according to the jerk at each time point of the state path with the high state value SV(i) (step S108).
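The flow of steps S100 to S108 can be summarized as a short driver function. The four callables are hypothetical stand-ins for the recognition unit 130, the map generation, the state path definition, and the state value calculation. Taking the single maximum is one possible reading of "a state path with a high state value" and is an assumption.

```python
def plan_speed_profile(get_surroundings, build_map, generate_paths, score_path):
    """Run S100 to S108 once and return (best_path, best_value)."""
    surroundings = get_surroundings()                         # S100
    constraint_map = build_map(surroundings)                  # S102
    paths = list(generate_paths())                            # S104
    values = [score_path(p, constraint_map) for p in paths]   # S106
    best = max(range(len(paths)), key=values.__getitem__)     # S108
    return paths[best], values[best]
```

Passing the steps in as callables keeps the driver independent of how the constraint map and the reward function are actually defined.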

以上説明した実施形態によれば、将来の状態を考慮した速度制御を行う際に、適切な制約を設定することで好適な解を得ることができる。 According to the embodiment described above, when performing speed control that takes future conditions into account, a suitable solution can be obtained by setting appropriate constraints.

上記説明した実施形態は、以下のように表現することができる。
プログラムを記憶した記憶装置と、
ハードウェアプロセッサと、を備え、
前記ハードウェアプロセッサが前記記憶装置に記憶されたプログラムを実行することにより、
時間を含み、車両の移動に関連する複数の要素を軸とする状態空間において、開始時点から目標時点までの間の複数の時点間に生じる行動量を複数の候補の中から選択することで定義される、前記開始時点から前記目標時点までの間の複数の状態パスのそれぞれについて、状態価値を算出し、
前記状態価値の高い状態パスに従って前記車両の将来の速度推移を決定し、
前記状態価値を算出する際に、
開始時点から目標時点までの間の各時点について、一以上の評価対象量に基づく報酬関数値を算出し、前記報酬関数値を時系列に合計することで前記状態価値を算出し、
前記複数の要素のうち二以上の要素を軸とする複数の部分空間のうち一部または全部に設定される、前記状態価値を低下させる第１領域と、前記報酬関数値を低下させる第２領域とに従って前記状態価値を算出する、
ように構成されている、車両制御装置。
The above-described embodiment can be expressed as follows.
A vehicle control device comprising:
a storage device storing a program; and
a hardware processor,
wherein the hardware processor, by executing the program stored in the storage device, is configured to:
calculate a state value for each of a plurality of state paths between a start time point and a target time point, each state path being defined by selecting, from a plurality of candidates, an amount of action occurring between a plurality of time points between the start time point and the target time point, in a state space whose axes are a plurality of elements that include time and relate to the movement of a vehicle;
determine a future speed transition of the vehicle according to a state path having a high state value; and
when calculating the state value,
calculate a reward function value based on one or more evaluation target quantities for each time point between the start time point and the target time point, and calculate the state value by summing the reward function values in time series, and
calculate the state value according to a first region that reduces the state value and a second region that reduces the reward function value, the first region and the second region being set in some or all of a plurality of subspaces each having two or more of the plurality of elements as axes.

以上、本発明を実施するための形態について実施形態を用いて説明したが、本発明はこうした実施形態に何等限定されるものではなく、本発明の要旨を逸脱しない範囲内において種々の変形及び置換を加えることができる。 The above describes the form for carrying out the present invention using an embodiment, but the present invention is not limited to such an embodiment, and various modifications and substitutions can be made without departing from the spirit of the present invention.

１００ 自動運転制御装置 100 Automatic driving control device
１３０ 認識部 130 Recognition unit
１４０ 行動計画生成部 140 Action plan generation unit
１４２ 状態価値算出部 142 State value calculation unit
１４２Ａ 状態パス定義部 142A State path definition unit
１４２Ｂ 算出部 142B Calculation unit
１４４ 速度決定部 144 Speed determination unit
１４６ 部分空間制約マップ 146 Subspace constraint map

Claims

A vehicle control device comprising:
a state evaluation amount calculation unit that calculates a state evaluation amount for each of a plurality of state paths between a start time point and a target time point, each state path being defined by selecting, from a plurality of candidates, an amount of action occurring between a plurality of time points between the start time point and the target time point, in a state space whose axes are a plurality of elements that include time and relate to the movement of a vehicle; and
a speed determination unit that determines a future speed transition of the vehicle according to a state path having a high state evaluation amount,
wherein the state evaluation amount calculation unit
calculates, for each time point between the start time point and the target time point, a reward function value based on one or more evaluation target quantities, and calculates the state evaluation amount by summing the reward function values in time series, and
calculates the state evaluation amount according to a first region that reduces the state evaluation amount and a second region that reduces the reward function value, the first region and the second region being set in some or all of a plurality of subspaces each having two or more of the plurality of elements as axes.

The plurality of elements include some or all of time, position in the direction of travel of the vehicle, velocity, and acceleration.
The vehicle control device according to claim 1.

The amount of action is jerk.
The vehicle control device according to claim 1 or 2.

The one or more evaluation target quantities include the speed, acceleration, and jerk of the vehicle.
The vehicle control device according to any one of claims 1 to 3.

The vehicle control device further comprises a recognition unit that recognizes a surrounding situation of the vehicle,
the state evaluation amount calculation unit sets one or both of the first region and the second region in some or all of the plurality of subspaces based on the surrounding situation of the vehicle.
The vehicle control device according to any one of claims 1 to 4.

the state evaluation amount calculation unit sets the first region and the second region in a subspace having position and speed as axes when a static target is included in the surrounding situation of the vehicle;
The vehicle control device according to claim 5.

the state evaluation amount calculation unit sets the first region and the second region in a subspace having position and time as axes when a dynamic target is included in the surrounding situation of the vehicle;
The vehicle control device according to claim 5 or 6.

The vehicle control device according to claim 5 , wherein the state evaluation amount calculation unit sets the first region and the second region adjacent to each other.

A vehicle control method in which a vehicle control device:
calculates a state evaluation amount for each of a plurality of state paths between a start time point and a target time point, each state path being defined by selecting, from a plurality of candidates, an amount of action occurring between a plurality of time points between the start time point and the target time point, in a state space whose axes are a plurality of elements that include time and relate to the movement of a vehicle;
determines a future speed transition of the vehicle according to a state path having a high state evaluation amount; and
when calculating the state evaluation amount,
calculates a reward function value based on one or more evaluation target quantities for each time point between the start time point and the target time point, and calculates the state evaluation amount by summing the reward function values in time series, and
calculates the state evaluation amount according to a first region that reduces the state evaluation amount and a second region that reduces the reward function value, the first region and the second region being set in some or all of a plurality of subspaces each having two or more of the plurality of elements as axes.

A program that causes a computer to:
calculate a state evaluation amount for each of a plurality of state paths between a start time point and a target time point, each state path being defined by selecting, from a plurality of candidates, an amount of action occurring between a plurality of time points between the start time point and the target time point, in a state space whose axes are a plurality of elements that include time and relate to the movement of a vehicle;
determine a future speed transition of the vehicle according to a state path having a high state evaluation amount; and
when calculating the state evaluation amount,
calculate a reward function value based on one or more evaluation target quantities for each time point between the start time point and the target time point, and calculate the state evaluation amount by summing the reward function values in time series, and
calculate the state evaluation amount according to a first region that reduces the state evaluation amount and a second region that reduces the reward function value, the first region and the second region being set in some or all of a plurality of subspaces each having two or more of the plurality of elements as axes.