
CN114328547A - Vehicle-mounted high-definition map data source selection method and device

Info

Publication number: CN114328547A
Application number: CN202111399669.9A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 吴帆, 任炬, 张尧学
Assignee: Tsinghua University
Application filed by Tsinghua University
Legal status: Pending
Prior art keywords: data source, network, vehicle, high-definition map, map data

Classifications

  • Traffic Control Systems (AREA)

Abstract

The invention provides a method and device for selecting a vehicle-mounted high-definition map data source. The method uses reinforcement learning to build an asynchronous data source selection framework divided into an offline training part and an online selection part. The offline part trains a neural network model with a deep reinforcement learning algorithm, while the online part selects data sources using the neural network parameters synchronized from the offline part, so that data source selection, experience trajectory collection, and model training execute in parallel. The invention avoids the throughput degradation caused during data source transmission, avoids frequent data source switching, and effectively selects the optimal vehicle-mounted high-definition map data source.

Description

A method and device for selecting a vehicle-mounted high-definition map data source

Technical Field

The present invention relates to the technical field of deep learning, and in particular to a method and device for selecting a vehicle-mounted high-definition map data source.

Background Art

With the widespread deployment of information infrastructure and the rapid development of in-vehicle sensing technology, autonomous driving has emerged as a promising direction for revolutionizing current automotive technology. Autonomous driving is the future trend of intelligent vehicle technology: it uses a large number of sensors to form a perception system that senses the environment around the vehicle. Based on the road structure, vehicle position, obstacle status, and other information obtained by the perception system, an automatic electronic control system controls the speed and direction of the vehicle so that it drives safely and reliably on the road. Unlike traditional electronic maps, autonomous vehicles require high-definition (HD) maps to support lane-level navigation. An HD map is a thematic map that can be divided into three layers: a road model layer, a lane model layer, and a localization model layer. Specifically, the road model is used for navigation planning; the lane model is used for route planning based on perception of the current road and traffic conditions; and the localization model is used to locate the vehicle on the map, since the lane model can assist vehicle perception only when the vehicle is accurately positioned on the map. HD maps are an essential component of autonomous driving, but compared with traditional electronic maps their data volume is much larger. It is therefore impractical to store complete HD maps on the vehicle; moreover, road and traffic information changes in real time, so HD maps should be distributed in real time with low latency and high reliability.

The traditional way of selecting and distributing HD maps judges data sources by the RTT metric. However, as the number of vehicles within coverage grows, the throughput of a traditional scheme that selects data sources over a vehicle-to-infrastructure (V2I) or vehicle-to-vehicle (V2V) communication model drops significantly. In addition, a traditional scheme selects a data source only by measuring the round-trip time (RTT) between the data source and the vehicle; since the vehicle state changes in real time, especially in complex mobile scenarios, and other kinds of vehicle information (such as speed and direction) are not considered, the RTT metric alone cannot guarantee the best selection result. Moreover, because of mobility, traditional schemes suffer frequent data source switching, which leads to frequent RTT updates and inefficient data transmission. In short, existing schemes cannot effectively judge the quality of the currently selected data source, and frequent vehicle movement causes inaccurate RTT measurement and inefficient data transmission.

Summary of the Invention

The present invention provides a method and device for selecting a vehicle-mounted high-definition map data source, aiming to select the optimal data source for autonomous driving.

To this end, the first object of the present invention is to propose a method for selecting a vehicle-mounted high-definition map data source, including:

constructing a vehicle-mounted high-definition map data source selection network, the network comprising an offline training network and an online selection network;

training the offline training network with the state information of different existing vehicle-mounted high-definition map data sources as the training data set, and applying the network parameters of the offline training network to the online selection network after training is completed;

while the autonomous vehicle is driving, inputting the state information of multiple vehicle-mounted high-definition map data sources received in real time into the trained online selection network, the output being the data source selection result.

The offline training network and the online selection network of the selection network both adopt a DDQN neural network, and training the offline training network includes the following steps:

collecting vehicle-mounted high-definition map data sources and extracting their state information through the collector of the offline training network, and dividing the data into a training set and a test set as the training data of the offline training network;

constructing the offline training network, which includes two reinforcement learning networks (DQNs) trained synchronously;

inputting the training set into the two DQNs for training, optimizing and updating the parameters of the first DQN, finding the action with the maximum Q value in the first DQN, and using the second DQN to compute the Q value that meets the requirement;

constructing a loss function for the offline training network that characterizes the difference between the Q value learned in real time and the target Q value, judging that training is complete when the loss function is minimized, and inputting the test set into the trained network to verify training accuracy.

After offline training is completed, the network parameters of the offline training network are applied to the online selection network; after the online selection network selects a data source for the vehicle-mounted high-definition map, it generates experience information and sends it to the offline training network for further training iterations.

Receiving the state information of multiple vehicle-mounted high-definition map data sources in real time includes:

the autonomous vehicle broadcasting probe Interest packets to discover potential data sources and their state information;

after the data sources respond, filtering them through the filter of the online selection network to obtain a data source set as the input of the online selection network.

The formula by which the first DQN finds the action with the maximum Q value is expressed as:

$a_{\max}(s', \omega) = \arg\max_{a'} Q(s', a', \omega)$  (1)

where $s$ denotes the state, $a$ the action, and $\omega$ the weight parameters;

the action $a_{\max}(s', \omega)$ is used for the computation in the second DQN, yielding the target Q value, expressed as:

$y = r + \gamma\, Q'(s', \arg\max_{a'} Q(s', a', \omega), \omega^{-})$  (2)

where $\gamma$ denotes the discount factor and $r$ the reward.

The loss function of the offline training network is expressed as:

$L = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(s_j, a_j, \omega)\right)^2$  (3)

where $m$ denotes the number of sampled training experiences;

the weights of the second DQN are updated with a balanced update method; the smooth update is computed as in formula (4):

$\omega^{-} \leftarrow l \cdot \omega + (1 - l)\,\omega^{-}$  (4)

where $l$ denotes the update rate, $l \ll 1$, and $\omega^{-}$ is the weight parameter of the second DQN.
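As a worked illustration of formulas (1), (2), and (4), the following sketch reduces the two DQNs to toy Q-tables and computes a DDQN target plus a soft update; the NumPy implementation and the table sizes are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def ddqn_target(q_current, q_target, s_next, r, gamma=0.99):
    """Formulas (1)-(2): select a_max with the current network,
    evaluate it with the target network."""
    a_max = np.argmax(q_current[s_next])        # (1): argmax_a' Q(s', a', w)
    return r + gamma * q_target[s_next, a_max]  # (2): y = r + g * Q'(s', a_max, w-)

def soft_update(w, w_minus, l=0.01):
    """Formula (4): w- <- l*w + (1-l)*w-, with update rate l << 1."""
    return l * w + (1.0 - l) * w_minus

# Toy example: 8 discrete states, 64 actions (the action-space size used later).
rng = np.random.default_rng(0)
q_cur, q_tgt = rng.random((8, 64)), rng.random((8, 64))
y = ddqn_target(q_cur, q_tgt, s_next=2, r=1.0)
q_tgt = soft_update(q_cur, q_tgt)
```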

Before data source selection is performed, a step of triggering the selection is included. The reasons for triggering data source selection include at least: selecting a data source for the autonomous vehicle at initialization; or selecting a new data source when the quality of the connected link deteriorates or the connection is lost.

When judging whether the link quality has deteriorated or the connection is lost, the packet loss rate of the data source currently connected to the autonomous vehicle is computed, the current timestamp is used as a random seed, and a random function computes the hit-switching probability:

[formula (5): the hit-switching probability derived from the packet loss rate $P$; the original equation image is not recoverable]

A random number within 100 is generated by the rand() function; if it falls below the hit-switching threshold, a probability hit occurs and the data source is switched. The switching flag is expressed as:

$H = \begin{cases} 1, & P \ge P_{\max} \text{ or a probability hit occurs} \\ 0, & \text{otherwise} \end{cases}$  (6)

where $H$ denotes the data source switching flag and $P_{\max}$ the maximum packet loss rate; that is, once the link packet loss rate exceeds the maximum packet loss rate, the probability calculation is no longer used and the data source is switched directly.
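A minimal sketch of this trigger, assuming that the unrecoverable formula (5) maps the packet loss rate P onto a percentage threshold compared against rand() (an assumption; only formula (6)'s behavior is described in the text):

```python
import random
import time

P_MAX = 0.30  # maximum packet loss rate; the description's default is 30%

def should_switch(packet_loss_rate: float) -> bool:
    """Formula (6): switch directly above P_max; otherwise switch on a
    probability hit derived from the loss rate."""
    if packet_loss_rate >= P_MAX:
        return True
    random.seed(time.time())  # current timestamp as the random seed
    return random.randrange(100) < packet_loss_rate * 100

# Evaluated once per 100 ms period (the IEEE 802.11 beacon interval).
print(should_switch(0.12))
```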

After the vehicle-mounted high-definition map data sources are received, the step of filtering them through the filter of the online selection network includes:

the selector of the online selection network receiving the state information of the i-th data source at time t, $s_{t,i} = (N_{t,i}, M_{t,i}, V_{t,i}, D_{t,i})$, where $N_{t,i}$ denotes the round-trip time (RTT) between the i-th data source and the vehicle at time t, $M_{t,i}$ the interval at which the i-th data source sends data packets, $V_{t,i}$ the driving speed of the i-th data source at time t, and $D_{t,i}$ the distance between the i-th data source and the requesting vehicle at time t;

$N_{t,i}$ denotes the smoothed RTT value; the smaller the RTT, the smaller the round-trip delay of the data source and the better the network performance. When RTT values from the same data source are obtained multiple times, smoothing them gives higher confidence, using the smoothing method of the Jacobson/Karels algorithm:

$N_{t,i} = u \cdot N_{t-1,i} + e \cdot (R_{t,i} - N_{t-1,i})$  (7)

where $R_{t,i}$ denotes the currently observed instantaneous RTT value, $u = 1$, and $e = 0.125$;

$M_{t,i}$ denotes the smoothed sending interval; with equal bandwidth, the larger this interval, the more idle the data source and the larger its remaining available bandwidth, and conversely, the smaller the interval, the smaller the remaining available bandwidth:

$M_{t,i} = (1-\sigma) \cdot M_{t-1,i} + \sigma \cdot (\mathrm{Data}_{t,i} - \mathrm{Data}_{t-1,i})$  (8)

where $\sigma = 0.5$, $\mathrm{Data}_{t,i}$ denotes the sending time of the current state information, $\mathrm{Data}_{t-1,i}$ that of the previous state information, and their difference is the interval;

$V_{t,i}$ denotes the vehicle speed: $V_{t,i} > 0$ means the data source travels in the same direction as the requesting autonomous vehicle, and $V_{t,i} < 0$ means it travels toward it; the lower the speed, the more stable the data source;

$D_{t,i}$ denotes the distance between the i-th data source and the requesting autonomous vehicle at time t: $D_{t,i} > 0$ means the data source is ahead of the vehicle, and $D_{t,i} < 0$ behind it; the closer the distance, the more stable the data source;

the data sources are then ranked by each of these state components in turn to filter the ones with the best state; each of the four components yields one best data source, and the four best data sources contain 4 groups of 16 state values in total, which serve as the input of the online selection network (see the sketch below).
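A sketch of the smoothing and filtering steps above; the dataclass and helper names are illustrative, not from the patent.

```python
from dataclasses import dataclass

U, E = 1.0, 0.125  # Jacobson/Karels coefficients from formula (7)
SIGMA = 0.5        # interval smoothing coefficient from formula (8)

@dataclass
class SourceState:
    rtt: float       # N_{t,i}: smoothed RTT
    interval: float  # M_{t,i}: smoothed packet-sending interval
    speed: float     # V_{t,i}: signed speed (>0 means same direction)
    distance: float  # D_{t,i}: signed distance (>0 means ahead)

def smooth_rtt(prev: float, observed: float) -> float:
    return U * prev + E * (observed - prev)               # formula (7)

def smooth_interval(prev: float, t_now: float, t_prev: float) -> float:
    return (1 - SIGMA) * prev + SIGMA * (t_now - t_prev)  # formula (8)

def build_input(sources: list) -> list:
    """Pick the best source per metric (lowest RTT, largest interval,
    lowest |speed|, smallest |distance|): 4 sources x 4 values = 16 inputs.
    With fewer than 4 candidates, pad by repeating found sources."""
    while len(sources) < 4:
        sources = sources + [sources[0]]
    best = [
        min(sources, key=lambda s: s.rtt),
        max(sources, key=lambda s: s.interval),
        min(sources, key=lambda s: abs(s.speed)),
        min(sources, key=lambda s: abs(s.distance)),
    ]
    return [v for s in best for v in (s.rtt, s.interval, s.speed, s.distance)]
```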

Ranking the data sources by each state component in turn to filter the data source with the best state includes:

computing the optimal value of each state component:

[formula (9): the optimum of each state component over the candidate data sources; the original equation image is not recoverable]

computing the score of selecting a data source when performing action $a_t$:

[formula (10): the score $G_{t,i}$, a combination of the normalized state components weighted by the action parameters; the original equation image is not recoverable]

where $\mathrm{Max}\{G_{t,i}\}$ denotes the highest score among the data sources i selected when action $a_t$ is performed in state $s_t$;

the state parameters are normalized by adjusting the network parameters of the online selection network, the value range of the action parameters is adjusted, a mapping between data sources and state parameter values is constructed, and the data source with the highest score is determined as the final selection result.
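Since the score equations (9) and (10) did not survive extraction, the sketch below assumes $G_{t,i}$ is a weighted sum of min-max-normalized state components, with the weight vector supplied by the chosen action; this particular weighting form is an assumption.

```python
import numpy as np

def normalize(col: np.ndarray, smaller_is_better: bool) -> np.ndarray:
    """Min-max normalize one state column; invert where smaller is better
    (RTT, |speed|, |distance|), so that 1.0 always means best."""
    span = col.max() - col.min()
    z = (col - col.min()) / span if span > 0 else np.zeros_like(col)
    return 1.0 - z if smaller_is_better else z

def select_source(states: np.ndarray, weights: np.ndarray) -> int:
    """states: (n_sources, 4) with columns [RTT, interval, |speed|, |distance|];
    weights: the action parameters output by the network.
    Returns argmax_i of the assumed score G_{t,i}."""
    cols = np.stack([
        normalize(states[:, 0], True),
        normalize(states[:, 1], False),
        normalize(states[:, 2], True),
        normalize(states[:, 3], True),
    ], axis=1)
    return int(np.argmax(cols @ weights))

states = np.array([[20.0, 5.0, 3.0, 10.0],
                   [35.0, 9.0, 1.0, 40.0],
                   [15.0, 2.0, 8.0, 25.0],
                   [28.0, 7.0, 2.0, 15.0]])
print(select_source(states, weights=np.array([0.5, 0.2, 0.5, 0.5])))
```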

After the step of generating the experience information and sending it to the offline training network for training iterations, a step of setting the reward for the corresponding data source is included. The reward function is expressed as:

[formula (11): the reward combining link throughput, link duration, and the RTT of the currently connected data source; the original equation image is not recoverable]

where the three terms denote the link throughput, the link duration, and the RTT value of the currently connected data source (the smoothed RTT computed from $N_{t,i}$); the ranges of the remaining coefficients are $1 \ll \rho \le 2$ and $0 \ll \phi \le 0.5$ (the range of the third coefficient is not recoverable).
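Because formula (11) is not recoverable, the sketch below assumes the linear form r_t = throughput + ρ·duration − φ·RTT, which matches the stated design principles (reward throughput and link duration, penalize latency); the exact functional form and the units are assumptions.

```python
RHO = 1.5  # duration coefficient, from the stated range 1 << rho <= 2
PHI = 0.4  # RTT penalty coefficient, from the stated range 0 << phi <= 0.5

def reward(throughput_mbps: float, link_duration_s: float, smoothed_rtt_ms: float) -> float:
    """Assumed form of formula (11): favor fast, stable links; penalize delay."""
    return throughput_mbps + RHO * link_duration_s - PHI * smoothed_rtt_ms

print(reward(12.0, 8.0, 35.0))  # e.g. a 12 Mbps link held 8 s with 35 ms smoothed RTT
```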

The second object of the present invention is to propose a device for selecting a vehicle-mounted high-definition map data source, including:

a network construction module for constructing a vehicle-mounted high-definition map data source selection network, the network comprising an offline training network and an online selection network;

a network training module for training the offline training network with the state information of different existing vehicle-mounted high-definition map data sources as the training data set, and applying the network parameters of the offline training network to the online selection network after training is completed;

a data source selection module for inputting, while the autonomous vehicle is driving, the state information of multiple vehicle-mounted high-definition map data sources received in real time into the trained online selection network, the output being the data source selection result.

In contrast to the prior art, the method for selecting a vehicle-mounted high-definition map data source provided by the present invention uses the content distribution mechanism and forwarding strategy of the NDN architecture and combines them with reinforcement learning to construct an asynchronous data source selection framework. The framework is divided into an offline training part and an online selection part: the offline part trains the neural network model with a deep reinforcement learning algorithm, while the online part selects data sources using the neural network parameters synchronized from the offline part, so that data source selection, experience trajectory collection, and model training execute in parallel. The invention avoids the throughput degradation caused during data source transmission, avoids frequent data source switching, and effectively selects the optimal vehicle-mounted high-definition map data source.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic flowchart of a method for selecting a vehicle-mounted high-definition map data source provided by the present invention.

FIG. 2 is a schematic structural diagram of the vehicle-mounted high-definition map data source selection network in the method provided by the present invention.

FIG. 3 is a schematic diagram of the packet structures of the probe Interest packet and the Data packet in the method provided by the present invention.

FIG. 4 is a schematic structural diagram of a device for selecting a vehicle-mounted high-definition map data source provided by the present invention.

Detailed Description of the Embodiments

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements or elements with the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and intended to explain the present invention; they should not be construed as limiting it.

The present invention proposes a method for selecting a vehicle-mounted high-definition map data source that uses a deep reinforcement learning algorithm to train a neural network from collected experience data and thereby generate a selection policy for data source selection. The vehicle-mounted HD map selection network built on deep reinforcement learning (DRL) consists of four main parts: state information, action, policy, and reward. To model the dynamics of vehicular scenarios, the vehicle speed, driving direction, sending interval, and smoothed RTT represent the state information of a data source. To evaluate the performance of the action of selecting a data source, a reward function is defined that takes link throughput, link duration, and RTT into account. To run the selection network, an offline training and online decision mechanism is proposed for finding suitable data sources in vehicular scenarios, meaning that the neural network training process runs offline on the collected experience data and is iteratively updated to assist online selection. Within this mechanism, an asynchronous reinforcement learning algorithm is designed to decouple trajectory collection from neural network training, so that data source selection, experience trajectory collection, and model training execute in parallel. The details are as follows:

FIG. 1 is a schematic flowchart of a method for selecting a vehicle-mounted high-definition map data source according to an embodiment of the present invention. The method includes the following steps:

Step 101: construct a vehicle-mounted high-definition map data source selection network, the network comprising an offline training network and an online selection network.

To support real-time data source selection, the vehicle-mounted HD map selection network constructed by the present invention includes an offline training network and an online selection network, as shown in FIG. 2. Its main purpose is to decouple data collection from model training, so that the online algorithm performs data source selection and data collection in real time while the offline algorithm trains and iterates the model in parallel. In the offline part, a collector gathers experience information from the interactions between the selector and the environment and stores it in a replay buffer, and a trainer uses a deep Q-network to train the policy from the collected experience. In the online part, the selector observes the environment to obtain data source information as the state and takes actions according to the policy. After each iteration, the selector shares its experience with the offline collector and synchronizes the weights of the policy neural network (a sketch of this split follows).
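A minimal sketch of this offline/online split; the class names, the thread, and the deque-backed replay buffer are illustrative choices, not the patent's implementation.

```python
import random
import threading
from collections import deque

class ReplayBuffer:
    """Stores the experience tuples {s, a, r, s_next, end} shared between
    the online selector and the offline trainer."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)
        self.lock = threading.Lock()

    def push(self, experience):
        with self.lock:
            self.buf.append(experience)

    def sample(self, m):
        with self.lock:
            return random.sample(list(self.buf), min(m, len(self.buf)))

def trainer_loop(buffer: ReplayBuffer, stop: threading.Event):
    """Offline part: repeatedly sample experiences and update the model."""
    while not stop.is_set():
        batch = buffer.sample(32)
        if batch:
            pass  # a train_step(batch) would run here, then publish new weights

buffer, stop = ReplayBuffer(), threading.Event()
threading.Thread(target=trainer_loop, args=(buffer, stop), daemon=True).start()
buffer.push(([0.0] * 16, 3, 1.0, [0.0] * 16, False))  # selector sharing one experience
stop.set()
```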

Specifically, the network model constructed by the present invention is structured as follows:

Agent: the agent on an autonomous vehicle needing map updates that executes data source selection and switching decisions. When selection is triggered, the agent collects and processes the data source state information, executes the selection action obtained from the neural network output, and waits for the reward; the agent learns by interacting with the environment.

State ($s_t$): because data sources are heterogeneous (RSUs or vehicles), state information of several kinds of data sources can be obtained; this invention uses four parameters to represent the current state of a data source. The agent uses $s_{t,i}$ to denote the state of the i-th data source obtained when selection is triggered at time t, i.e. $s_{t,i} = (N_{t,i}, M_{t,i}, V_{t,i}, D_{t,i})$, where $N_{t,i}$ denotes the RTT value of the i-th data source at time t, $M_{t,i}$ the packet-sending interval of the i-th data source, $V_{t,i}$ the driving speed of the i-th data source at time t, and $D_{t,i}$ the distance between the i-th data source and the requesting vehicle at time t; the specific computations are developed later.

Action ($a_t$): $a_t$ denotes the action the agent performs at time t. In the design of the present invention, $a_t$ does not select a specific data source directly but provides the agent with a selection method: the neural network determines the specific action parameters, which form a discrete set, and the concrete action is then mapped to the corresponding data source selection; the specific computation is developed later.

Reward ($r_t$): after the agent performs action $a_t$, if data source selection is triggered again at time t+1 and the current data source state $s_{t+1}$ is obtained, the agent can compute the reward $r(s_t, a_t)$ of the previous action from the link state. The agent's goal is to maximize the cumulative reward, i.e. the expectation $\mathbb{E}\left[\sum_{t} \gamma^{t} r_t\right]$, called the $\gamma$-discounted cumulative reward, with discount factor $\gamma \in (0, 1]$; the later the time, the lower the reward weight, and $\mathbb{E}[\cdot]$ is the expectation over all random variables. The cumulative reward judges the quality of a policy: the better the policy, the higher the cumulative reward. The specific computation and the definition of the reward function are developed later.

Policy ($\pi$): the agent learns, through continuous interaction with the environment, a policy $\pi$ that achieves good performance and guides its next action selection; the quality of a policy is judged by the cumulative reward described above. The present invention uses a deterministic policy as the policy search method, i.e. $\pi(s_t) = a_t$, meaning the agent performs action $a_t$ when it recognizes state $s_t$. The data source selection process can thus be expressed as a series of state-action mapping pairs, through which the agent selects the best action in the corresponding state. Because the invention uses a deep reinforcement learning algorithm, no mapping table needs to be built; the action selection neural network of FIG. 1 represents the policy and conveniently processes the input states and the output set of action parameters.
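A minimal sketch of such an action selection network, assuming the 16 state inputs and 64 discrete actions described elsewhere in this text; the PyTorch hidden-layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class SelectionQNetwork(nn.Module):
    """Maps the 16 state values (4 metrics x 4 candidate sources) to Q values
    over the 64 discrete action-parameter combinations."""
    def __init__(self, state_dim: int = 16, n_actions: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q = SelectionQNetwork()
state = torch.randn(1, 16)
action = int(q(state).argmax(dim=1))  # deterministic policy: pi(s_t) = a_t
```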

Step 102: train the offline training network with the state information of different existing vehicle-mounted high-definition map data sources as the training data set, and apply the network parameters of the offline training network to the online selection network after training is completed.

The offline training network and the online selection network both adopt a DDQN neural network, and training the offline training network includes:

collecting vehicle-mounted high-definition map data sources and extracting their state information through the collector of the offline training network, and dividing the data into a training set and a test set as training data;

constructing the offline training network, which includes two reinforcement learning networks (DQNs) trained synchronously;

inputting the training set into the two DQNs for training, optimizing and updating the parameters of the first DQN, finding the action with the maximum Q value in the first DQN, and using the second DQN to compute the Q value that meets the requirement;

constructing a loss function for the offline training network that characterizes the difference between the Q value learned in real time and the target Q value, judging that training is complete when the loss function is minimized, and inputting the test set into the trained network to verify training accuracy.

In the training part, the present invention uses the DDQN algorithm as the reinforcement learning method to train a Q-learning-based model and reduce overestimation. Q-learning has two cores: off-policy learning and temporal difference (TD). Off-policy means the policy that selects actions and the policy that updates Q values are not the same: actions are selected with a greedy policy, while Q values are updated with a deterministic policy, i.e. the action with the maximum Q value is chosen. Temporal difference means the TD target is used to update the current value function; the TD target is the sum of discounted future gains. First, to improve the convergence of the algorithm, the invention designs two neural networks trained synchronously: the current network Q (weight parameters $\omega$) is responsible for updating the model weights, while the target network Q′ (weight parameters $\omega^{-}$) is responsible for computing Q values. In addition, to reduce the overestimation caused by value iteration or parameter updates (i.e. the estimated value function exceeding the true value function and eventually biasing the model), the invention first finds the action with the maximum Q value in the current network Q:

$a_{\max}(s', \omega) = \arg\max_{a'} Q(s', a', \omega)$  (1)

Then the action $a_{\max}(s', \omega)$ is evaluated in the target network Q′, finally yielding the target Q value $y$ that meets the requirement:

$y = r + \gamma\, Q'(s', \arg\max_{a'} Q(s', a', \omega), \omega^{-})$  (2)

where $\gamma$ denotes the discount factor and $r$ the reward.

In the offline training part, the experience replay buffer stores past experience groups $\{s_j, a_j, r_j, s_{j+1}, \mathrm{end}_j\}$, recording the past experience of every iteration, which is essential for model training in a real-time vehicular network. For each iteration, the current network is trained by randomly sampling m groups of experiences from the replay buffer, and there is no temporal correlation between these m groups. The objective of the loss function $L$ is to minimize the difference between the learned Q value and the target Q value:

$L = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(s_j, a_j, \omega)\right)^2$  (3)

In addition, the invention updates the weights $\omega^{-}$ of the target network Q′ with a balanced update method to improve the stability of Q′; the smooth update is computed as:

$\omega^{-} \leftarrow l \cdot \omega + (1 - l)\,\omega^{-}$  (4)

where $l$ denotes the update rate, $l \ll 1$.
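A sketch of one offline training iteration combining replay sampling with formulas (1)-(4); it reuses the SelectionQNetwork sketch from above, and the Adam optimizer and hyperparameter values are illustrative assumptions.

```python
import random
import torch
import torch.nn.functional as F

GAMMA, L_RATE, M = 0.99, 0.01, 32  # discount, soft-update rate l << 1, batch size m

current = SelectionQNetwork()
target = SelectionQNetwork()
target.load_state_dict(current.state_dict())
optimizer = torch.optim.Adam(current.parameters(), lr=1e-3)

def train_step(replay):
    batch = random.sample(replay, min(M, len(replay)))
    s, a, r, s2, end = (torch.tensor([e[k] for e in batch], dtype=torch.float32)
                        for k in range(5))
    a = a.long()
    with torch.no_grad():
        a_max = current(s2).argmax(dim=1)                        # formula (1)
        q_next = target(s2).gather(1, a_max.unsqueeze(1)).squeeze(1)
        y = r + GAMMA * (1 - end) * q_next                       # formula (2)
    q = current(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)                                      # formula (3)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                                        # formula (4)
        for w, w_minus in zip(current.parameters(), target.parameters()):
            w_minus.mul_(1 - L_RATE).add_(L_RATE * w)

replay = [([0.0] * 16, 5, 1.0, [0.0] * 16, 0.0) for _ in range(64)]
train_step(replay)
```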

In the online selection part, the vehicle takes actions during operation by observing the environment state in real-world vehicular scenarios, and the experiences are gathered by the collector for offline model training. The selection policy network has the same structure as the offline neural network: the input is the environment state and the output is the action value. Each time a data source acquisition or switching request is initiated, the selector first obtains the weights $\omega$ trained by the current network Q from the application-layer trainer and synchronizes them to the action selection network A (weight parameters $\omega_A$). Once the data selection mechanism is triggered, the user first sends probe Interest packets to discover potential data sources and state information; the filter in the selector screens the data sources and their states, removes those that do not meet the requirements, and obtains the data source set DS = {DS1, DS2, DS3, DS4}. For each action selection, after the selection policy network A outputs the action value, the ε-greedy algorithm decides whether to explore randomly: with probability (1 − ε) the action with the maximum value function is chosen, and with probability ε an action is chosen at random. In the initial stage this explores the action space and avoids getting stuck in local optima. As training and exploration proceed, frequent exploration is no longer needed, and the number of explorations is reduced with a shrinking factor δ; the decreasing exploration rate ε, where ε = ε − δ·ε, helps the offline training algorithm converge. After the selector takes action $a_t$ in state $s_t$, it obtains the reward $r_t$ and the next state $s_{t+1}$ and synchronizes the experience {s_t, a_t, r_t, s_{t+1}, end} to the offline replay buffer. In addition, the online selector performs the concrete data request actions at the network layer and obtains the corresponding rewards. The asynchronous design of offline training and online selection improves training efficiency (a sketch of the exploration schedule follows).
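A small sketch of the ε-greedy choice with the multiplicative decay ε ← ε − δ·ε; the starting values of ε and δ are illustrative.

```python
import random

class EpsilonGreedy:
    def __init__(self, epsilon: float = 1.0, delta: float = 0.01, n_actions: int = 64):
        self.epsilon, self.delta, self.n_actions = epsilon, delta, n_actions

    def choose(self, q_values) -> int:
        """With probability epsilon explore a random action,
        otherwise exploit the action with the maximum value."""
        if random.random() < self.epsilon:
            action = random.randrange(self.n_actions)
        else:
            action = max(range(self.n_actions), key=lambda i: q_values[i])
        self.epsilon -= self.delta * self.epsilon  # epsilon = epsilon - delta * epsilon
        return action

policy = EpsilonGreedy()
print(policy.choose([0.0] * 63 + [1.0]))
```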

Step S103: while the autonomous vehicle is driving, input the state information of multiple vehicle-mounted high-definition map data sources received in real time into the trained online selection network; the output is the data source selection result.

In the present invention, there are two reasons for triggering a vehicle's data source selection: 1) at initialization, a data source must be selected for the vehicle (by default, the minimum RTT is the selection basis); 2) the quality of the connected link deteriorates or the connection is lost. The entire data transmission process is divided into consecutive time periods of equal length, using the IEEE 802.11 Beacon frame interval, i.e. one period is 100 ms. The packet loss rate of a link is the most direct way to judge its quality, but switching data sources as soon as any packet is lost would cause frequent switching and hurt link throughput. The invention therefore designs a periodic probabilistic triggering scheme based on the packet loss rate, whose purpose is to decide from the loss rate whether to switch, so that the vehicle can switch to a new data source when link quality is poor while avoiding the extra overhead of frequent switching. In each period, the packet loss rate P from the current vehicle to the connected data source is computed first, and then the current UNIX timestamp is used as a random seed to compute the hit-switching probability with a random function:

[formula (5): the hit-switching probability derived from the packet loss rate P; the original equation image is not recoverable]

A random number within 100 is generated by the rand() function; if it falls below the hit-switching threshold, a probability hit occurs and the data source is switched:

$H = \begin{cases} 1, & P \ge P_{\max} \text{ or a probability hit occurs} \\ 0, & \text{otherwise} \end{cases}$  (6)

where H denotes the data source switching flag and $P_{\max}$ the maximum packet loss rate (30% by default); that is, once the link packet loss rate exceeds the maximum, the probability calculation is skipped and the data source is switched directly.

If the selection mechanism is triggered, the vehicle sends probe Interest packets and the selector collects state information from the received packets, meaning the data source selection mechanism is activated. The selector takes the collected states as the network input and the selected data source as the action output, and finally obtains and records the corresponding reward in the vehicle. In each period, the selector aims to select the best data source in the current environment. If the selection mechanism is not triggered, the selector does not select a new data source and continues transferring the current data from the data source, meaning the data source selection method is not activated.

Whenever data source selection is successfully triggered, the requesting vehicle first broadcasts probe Interest packets. Data sources are stored on other vehicles or on infrastructure deployed along the traffic route. The original link is not interrupted while probe Interests are being sent; however, once a new data source is selected, the original link is disconnected and switched to the new source. When a data source receives a probe Interest, it adds additional state information (i.e. a single-hop label, interval time, distance, and speed) to the returned Data packet. After sending the probe Interests, the requesting vehicle sets a waiting timer (50 ms by default); when the timer expires, the selector is triggered to perform data source selection. The invention therefore extends the existing probe Interest and Data packets with new state fields for information collection, as shown in FIG. 3. For probe Interests, the vehicle sends a probe packet with additional information, i.e. its position and driving direction. When a data source responds to the vehicle, it first computes its distance from the requesting vehicle and adds the additional information, i.e. interval time and speed, to the returned packet.

After the vehicle-mounted high-definition map data sources are received, the step of filtering them through the filter of the online selection network includes:

the selector of the online selection network receiving the state information of the i-th data source at time t, $s_{t,i} = (N_{t,i}, M_{t,i}, V_{t,i}, D_{t,i})$, where $N_{t,i}$ denotes the round-trip time (RTT) between the i-th data source and the vehicle at time t, $M_{t,i}$ the interval at which the i-th data source sends data packets, $V_{t,i}$ the driving speed of the i-th data source at time t, and $D_{t,i}$ the distance between the i-th data source and the requesting vehicle at time t;

$N_{t,i}$ denotes the smoothed RTT value; the smaller the RTT, the smaller the round-trip delay of the data source and the better the network performance. When RTT values from the same data source are obtained multiple times, smoothing them gives higher confidence, using the smoothing method of the Jacobson/Karels algorithm:

$N_{t,i} = u \cdot N_{t-1,i} + e \cdot (R_{t,i} - N_{t-1,i})$  (7)

where $R_{t,i}$ denotes the currently observed instantaneous RTT value, $u = 1$, and $e = 0.125$;

$M_{t,i}$ denotes the smoothed sending interval; with equal bandwidth, the larger this interval, the more idle the data source and the larger its remaining available bandwidth, and conversely, the smaller the interval, the smaller the remaining available bandwidth:

$M_{t,i} = (1-\sigma) \cdot M_{t-1,i} + \sigma \cdot (\mathrm{Data}_{t,i} - \mathrm{Data}_{t-1,i})$  (8)

where $\sigma = 0.5$, $\mathrm{Data}_{t,i}$ denotes the sending time of the current state information, $\mathrm{Data}_{t-1,i}$ that of the previous state information, and their difference is the interval;

$V_{t,i}$ denotes the vehicle speed: $V_{t,i} > 0$ means the data source travels in the same direction as the requesting autonomous vehicle, and $V_{t,i} < 0$ means it travels toward it; the lower the speed, the more stable the data source;

$D_{t,i}$ denotes the distance between the i-th data source and the requesting autonomous vehicle at time t: $D_{t,i} > 0$ means the data source is ahead of the vehicle, and $D_{t,i} < 0$ behind it; the closer the distance, the more stable the data source;

The data source states serve as the input of the neural network, whose number of inputs must be fixed; however, the number of data sources that may be detected each time is not fixed, so the number of data source states is not fixed either. The method adopted by the present invention is to first screen the data sources, ranking them by each state component in turn. The data source with the best value of a component passes the screening (for example, data source DS1 has the smallest RTT and passes), so each of the four components yields one best data source. The four best data sources contain 4 groups of 16 states in total, which the selector takes as input. If fewer than four data sources are selected, data sources are chosen at random to complete the input; for example, if a vehicle obtains only one data source, the selector copies its 4 states four times to form the 16 input states. The main purpose of this approach is to fix the number of input states.

Data source selection action design: even though there are only two types of data sources (infrastructure RSUs and vehicles), a vehicle acting as a data source may change dynamically as it moves, which means the action space is not fixed across vehicular scenarios. However, the number of data sources is limited by the coverage of vehicles and RSUs within the link duration. The invention therefore defines actions using the idea of ranking. Based on the state definitions above, the optimal value of each state component is computed first:

[formula (9): the optimum of each state component over the candidate data sources; the original equation image is not recoverable]

and the score of selecting a data source when performing action $a_t$ is then computed:

[formula (10): the score $G_{t,i}$, a combination of the normalized state components weighted by the action parameters; the original equation image is not recoverable]

where $\mathrm{Max}\{G_{t,i}\}$ denotes the highest score among the data sources i selected when action $a_t$ is performed in state $s_t$. To evaluate the action score $G_t$ of selecting data source i, the state parameters are normalized by adjusting the corresponding parameter values ($\beta_t$, $\theta_t$, $\mu_t$, and a fourth weight whose symbol is not recoverable denote the corresponding state parameters, i.e. the action parameters).

In the present invention the value ranges of the action parameters are set to $\beta_t \in \{0, 0.2, 0.8\}$ and $\theta_t, \mu_t \in \{0, 0.5\}$ (the value set of the remaining parameter is not recoverable); in this case, as the action parameters of each state vary, the four sets yield 64 combinations in total, meaning the agent has 64 actions to choose from. A mapping between data sources and action parameter values is then built from the output of the selection neural network. Finally, when the vehicle performs action $a_t$ in state $s_t$, the selected data source i is the one with the highest score $G_{t,i}$: the selection network outputs the action parameters, the scores $G_{t,i}$ of the selectable data sources are computed and mapped to the data source set DS = {DS1, DS2, DS3, DS4}, and the specific data source is finally selected (see the sketch below).

Reward function design: to ensure the network can learn from past experience, a corresponding reward is returned after every action, representing the overall gain of the agent following the policy. To shape the rewards of data sources, the invention follows these design principles: 1) increase throughput as much as possible, since throughput is the most basic indicator of map data transmission and means the vehicle can acquire map data quickly and efficiently; 2) extend the link duration, to avoid the extra overhead of the vehicle frequently switching data sources, since keeping the link stable increases throughput; 3) reduce transmission delay, since HD map distribution in autonomous driving scenarios places higher demands on latency, and low delay means the data source can respond quickly to vehicle requests and packet queuing time is reduced. The invention therefore designs the following reward function:

[formula (11): the reward combining link throughput, link duration, and the RTT of the currently connected data source; the original equation image is not recoverable]

where the three terms denote the link throughput, the link duration, and the RTT value of the currently connected data source (the smoothed RTT computed from $N_{t,i}$); the ranges of the remaining coefficients are $1 \ll \rho \le 2$ and $0 \ll \phi \le 0.5$ (the range of the third coefficient is not recoverable).

The purpose of the reward function designed in the present invention is to maximize the expected cumulative discounted reward, i.e. the expectation $\mathbb{E}\left[\sum_{t} \gamma^{t} r_t\right]$, called the $\gamma$-discounted cumulative reward. The discount factor $\gamma \in (0, 1]$ determines the time scale of the reward: the later the time, the lower the reward weight, and $\mathbb{E}[\cdot]$ is the expectation over all random variables (a sketch follows).
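A one-function sketch of the γ-discounted cumulative reward over a recorded reward trajectory:

```python
def discounted_return(rewards, gamma: float = 0.99) -> float:
    """Sum of gamma^t * r_t over a trajectory; later rewards weigh less."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 0.5, 2.0]))
```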

为了实现上述实施例,本发明还提出一种车载高清地图数据源选择装置,如图4所示,包括:In order to realize the above embodiment, the present invention also proposes a vehicle high-definition map data source selection device, as shown in FIG. 4 , including:

网络构建模块310,用于构建车载高清地图数据源选择网络,所述车载高清地图数据源选择网络包括离线训练网络和在线选择网络;a network construction module 310, configured to construct a vehicle high-definition map data source selection network, the vehicle high-definition map data source selection network includes an offline training network and an online selection network;

a network training module 320, configured to train the offline training network using the state information of different existing vehicle-mounted high-definition map data sources as the training data set, and, after training is completed, to apply the network parameters of the offline training network to the online selection network;

a data source selection module 330, configured to, while the autonomous vehicle is driving, feed the state information of multiple vehicle-mounted high-definition map data sources received in real time into the trained online selection network, whose output is the selection result for the vehicle-mounted high-definition map data source.
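A minimal sketch of how the offline and online parts could interact is given below, assuming a user-supplied train_step and policy (both hypothetical names): the offline part consumes experience trajectories and publishes fresh network parameters, while the online part selects with the latest synced parameters and feeds experience back, so that data source selection, experience collection, and model training run in parallel.

```python
import copy
import queue
import threading

experience_q = queue.Queue()   # online part feeds experience trajectories back
shared = {"weights": None}     # parameters synced from the offline part
lock = threading.Lock()

def offline_trainer(train_step, weights, steps):
    # Offline part: consume experience, run DRL updates, publish parameters.
    for _ in range(steps):
        batch = experience_q.get()              # one experience trajectory
        weights = train_step(weights, batch)    # one training update
        with lock:                              # sync parameters to online part
            shared["weights"] = copy.deepcopy(weights)
    return weights

def online_select(policy, state):
    # Online part: select a data source with the latest synced parameters,
    # then queue the transition as experience for further training iterations.
    with lock:
        weights = shared["weights"]
    action = policy(weights, state)
    experience_q.put((state, action))
    return action
```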

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic references to the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features thereof, provided they do not contradict each other.

In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless otherwise expressly and specifically defined.

Any process or method description in the flowcharts, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.

The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (an electronic device), a portable computer disk cartridge (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then storing it in a computer memory.

It should be understood that various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the embodiments described above, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one of the following techniques known in the art, or a combination thereof, may be used: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.

Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (10)

1. A vehicle-mounted high-definition map data source selection method, characterized by comprising:

constructing a vehicle-mounted high-definition map data source selection network, the selection network including an offline training network and an online selection network;

training the offline training network using the state information of different existing vehicle-mounted high-definition map data sources as the training data set, and applying the network parameters of the offline training network to the online selection network after training is completed;

while the autonomous vehicle is driving, feeding the state information of multiple vehicle-mounted high-definition map data sources received in real time into the trained online selection network, the output of which is the selection result for the vehicle-mounted high-definition map data source.

2. The vehicle-mounted high-definition map data source selection method according to claim 1, characterized in that the offline training network and the online selection network of the selection network adopt DDQN neural networks, and the step of training the offline training network comprises:

collecting vehicle-mounted high-definition map data sources and extracting state information through the collector of the offline training network, and dividing the data into a training set and a test set as the training data of the offline training network;

constructing the offline training network, which includes two reinforcement learning networks (DQNs) that are trained synchronously;

feeding the training set into the two DQNs for training, optimizing and updating the parameters of the first DQN, finding the action with the maximum Q value in the first DQN, and computing the required Q value with the second DQN;

constructing the loss function of the offline training network, which characterizes the difference between the Q value learned in real time and the target Q value; determining that training of the offline training network is completed when the loss function is minimized, and feeding the test set into the trained offline training network to verify training accuracy.
3. The vehicle-mounted high-definition map data source selection method according to claim 2, characterized in that, after training of the offline training network is completed, the network parameters of the offline training network are applied to the online selection network; after the online selection network performs a data source selection among the vehicle-mounted high-definition map data sources, it generates experience information and sends it to the offline training network for training iterations of the offline training network.

4. The vehicle-mounted high-definition map data source selection method according to claim 1, characterized in that the step of receiving the state information of multiple vehicle-mounted high-definition map data sources in real time comprises:

the autonomous vehicle sending out probe interest packets to discover potential vehicle-mounted high-definition map data sources and their state information;

after vehicle-mounted high-definition map data sources are received, filtering them through the filter of the online selection network to obtain the set of vehicle-mounted high-definition map data sources used as the input of the online selection network.

5. The vehicle-mounted high-definition map data source selection method according to claim 2, characterized in that the action formula by which the first DQN finds the maximum Q value is expressed as:

amax(s′, ω) = argmaxa′ Q(s′, a′, ω)   (1)

where s denotes the state, a denotes the action, and ω is the weight parameter;

the action amax(s′, ω) is then evaluated in the second DQN to obtain the target Q value, expressed as:

y = r + γQ′(s′, argmaxa′ Q(s′, a′, ω), ω−)   (2)

where γ denotes the discount factor and r denotes the reward.

6. The vehicle-mounted high-definition map data source selection method according to claim 2, characterized in that the loss function of the offline training network is expressed as:
L(ω) = (1/m) Σj (yj − Q(sj, aj, ω))²   (3)

[the loss is rendered as an image (FDA0003364649750000021) in the source; the mean-squared-error form above is reconstructed from the surrounding definitions]

where m denotes the number of training samples;

the weight of the second DQN is updated with a balanced update method, and the smooth update is computed as shown in formula (4):

ω− ← l·ω + (1 − l)·ω−   (4)

where l denotes the update rate, l << 1, and ω− is the weight parameter of the second DQN.
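The following Python sketch (using NumPy) restates the computations of claims 5 and 6: the first DQN picks the greedy action (equation (1)), the second DQN evaluates that action to form the target (equation (2)), the loss is the mean-squared-error form assumed for equation (3), and the target weights are soft-updated per equation (4). The callables q_online and q_target are hypothetical stand-ins for the two DQNs.

```python
import numpy as np

def ddqn_target(q_online, q_target, s_next, r, gamma=0.9):
    # Eq. (1): the first (online) DQN picks the greedy action a_max.
    a_max = int(np.argmax(q_online(s_next)))
    # Eq. (2): the second (target) DQN evaluates that action.
    return r + gamma * q_target(s_next)[a_max]

def loss(targets, q_values):
    # Eq. (3), assumed form: mean squared error between the target Q values
    # y_j and the learned Q(s_j, a_j, w) over m samples.
    t, q = np.asarray(targets), np.asarray(q_values)
    return float(np.mean((t - q) ** 2))

def soft_update(w_target, w_online, l=0.01):
    # Eq. (4): w_target <- l * w_online + (1 - l) * w_target, with l << 1.
    return l * w_online + (1 - l) * w_target
```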
7. The vehicle-mounted high-definition map data source selection method according to claim 1, characterized in that, before the vehicle-mounted high-definition map data source selection is performed, it comprises a step of triggering the vehicle-mounted high-definition map data source selection; the conditions determined to trigger the selection at least include: selecting a data source for the autonomous vehicle when the vehicle is initialized; or selecting a new data source for the autonomous vehicle when the quality of the connected link deteriorates or the connection is broken;

when determining whether the quality of the connected link has deteriorated or the connection has broken, the packet loss rate of the data source currently connected to the autonomous vehicle is computed, the current timestamp is used as the random seed, and a random function is used to compute the hit-switching probability, expressed as:
[Equation (5): rendered as an image (FDA0003364649750000022) in the source; the hit-switching probability]
a random number within 100 is generated by the rand() function; if the condition rendered as an image (FDA0003364649750000023) in the source holds, the probability hit is taken to have occurred and the data source switch is executed, expressed as:

[Equation (6): rendered as an image (FDA0003364649750000024) in the source; the switching decision flag H]

where H denotes the data source switching decision flag and Pmax denotes the maximum packet loss rate; that is, once the link packet loss rate exceeds the maximum packet loss rate, the probability computation is no longer used for the switching decision and the data source switch is performed directly.
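A sketch of this trigger logic follows; since equations (5) and (6) are rendered as images in the source, the exact probability form is an assumption, as is the value chosen for Pmax.

```python
import random
import time

def should_switch(loss_rate, p_max=0.3):
    # Above the maximum packet loss rate P_max (0.3 is an assumed value),
    # switch directly without any probability computation.
    if loss_rate > p_max:
        return True
    # Otherwise seed with the current timestamp and draw a number in [0, 100):
    # switching is triggered when the draw falls below loss_rate * 100
    # (an assumed reading of the image-rendered equations (5)/(6)).
    random.seed(time.time())
    return random.randrange(100) < loss_rate * 100

print(should_switch(0.12))
```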
8. The vehicle-mounted high-definition map data source selection method according to claim 4, characterized in that the step of filtering through the filter of the online selection network after vehicle-mounted high-definition map data sources are received comprises:

the selector of the online selection network receiving the state information st,i = (Nt,i, Mt,i, Vt,i, Dt,i) of the i-th data source at time t, where Nt,i denotes the round-trip time (RTT) between the i-th data source and the vehicle at time t; Mt,i denotes the interval at which the i-th data source sends data packets; Vt,i denotes the driving speed of the i-th data source at time t; and Dt,i denotes the distance between the i-th data source and the requesting vehicle at time t;

Nt,i denotes the smoothed RTT value: the smaller the RTT, the smaller the round-trip delay of the data source and the better the network performance; when RTT values from the same data source are obtained multiple times, smoothing them yields a more credible estimate, which can be computed with the smoothing method of the Jacobson/Karels algorithm as follows:

Nt,i = u·Nt−1,i + e·(Rt,i − Nt−1,i)   (7)

where Rt,i denotes the currently observed instantaneous RTT value, u = 1, and e = 0.125;

Mt,i denotes the smoothed inter-packet interval: for the same bandwidth, the larger this interval, the more idle the data source and the larger its remaining available bandwidth; conversely, the smaller this interval, the smaller the remaining available bandwidth of the data source. It is computed as:

Mt,i = (1 − σ)·Mt−1,i + σ·(Datat,i − Datat−1,i)   (8)

where σ = 0.5, Datat,i denotes the sending time of the current state information data, Datat−1,i denotes the sending time of the previous state information data, and their difference is the time interval;

Vt,i denotes the vehicle driving speed: Vt,i > 0 means the corresponding data source travels in the same direction as the requesting autonomous vehicle, and Vt,i < 0 means it travels toward it; the lower the speed, the more stable the data source;

Dt,i denotes the distance between the i-th data source and the requesting autonomous vehicle at time t: Dt,i > 0 means the corresponding data source is in front of the autonomous vehicle, and Dt,i < 0 means it is behind it; the closer the distance, the higher the stability of the data source;

the data sources are then ranked using each of the contained state indicators in turn, and the data source with the best value of each state is selected; each of the 4 states selects one best data source, and the 4 best data sources together contain 4 groups totaling 16 pieces of state information, which serve as the input of the online selection network.

9. The vehicle-mounted high-definition map data source selection method according to claim 8, characterized in that the step of ranking using each of the contained state indicators in turn and selecting the data source with the best state comprises:

computing the optimal value of each state, i.e.
[Equation: rendered as an image (FDA0003364649750000031) in the source; the optimal value of each state]
computing the score for selecting a data source when executing action at:
[Equation: rendered as an image (FDA0003364649750000032) in the source; the score Gt,i for selecting a data source under action at]
where Max{Gt,i} denotes the highest score of the data source i selected when executing action at in state st;

normalizing the state parameters by adjusting the network parameters of the online selection network, adjusting the value range of the action parameters, constructing the mapping relationship between data sources and state parameter values, and determining the data source with the highest score as the final selection result;

the method also comprises a step of setting the reward for the corresponding data source, with the reward function expressed as:
[Reward function: rendered as an image (FDA0003364649750000041) in the source; a weighted combination of the three link metrics below]
其中,
Figure FDA0003364649750000042
表示链路吞吐量,
Figure FDA0003364649750000043
表示链路持续时间,
Figure FDA0003364649750000044
表示当前连接数据源RTT值,通过Nt,i计算平滑RTT;其它指标系数取值范围为1<<ρ≤2、
Figure FDA0003364649750000045
0<<φ≤0.5。
in,
Figure FDA0003364649750000042
represents the link throughput,
Figure FDA0003364649750000043
represents the link duration,
Figure FDA0003364649750000044
Indicates the RTT value of the currently connected data source, and calculates the smoothed RTT by N t, i ; the value range of other index coefficients is 1<<ρ≤2,
Figure FDA0003364649750000045
0<<φ≤0.5.
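The state smoothing and per-metric filtering of claims 8 and 9 can be sketched as follows; the filtering directions (lower smoothed RTT, larger send interval, lower absolute speed, smaller absolute distance are better) follow the textual descriptions, and the sample values are illustrative.

```python
def smooth_rtt(prev, sample, u=1.0, e=0.125):
    # Eq. (7), Jacobson/Karels smoothing: N_t = u*N_{t-1} + e*(R_t - N_{t-1}).
    return u * prev + e * (sample - prev)

def smooth_interval(prev, t_now, t_prev, sigma=0.5):
    # Eq. (8): exponentially smoothed interval between state packets.
    return (1 - sigma) * prev + sigma * (t_now - t_prev)

def best_per_metric(sources):
    # One best source per indicator: lowest smoothed RTT N, largest send
    # interval M, lowest absolute speed V, smallest absolute distance D
    # (directions follow the textual descriptions in claim 8).
    keys = [
        lambda s: s["N"],
        lambda s: -s["M"],
        lambda s: abs(s["V"]),
        lambda s: abs(s["D"]),
    ]
    return [min(sources, key=k) for k in keys]

sources = [
    {"id": 1, "N": 20.0, "M": 0.5, "V": 3.0, "D": 15.0},
    {"id": 2, "N": 35.0, "M": 0.9, "V": -1.0, "D": -8.0},
]
print([s["id"] for s in best_per_metric(sources)])
```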
10. A vehicle-mounted high-definition map data source selection apparatus, characterized by comprising:

a network construction module, configured to construct a vehicle-mounted high-definition map data source selection network, the selection network including an offline training network and an online selection network;

a network training module, configured to train the offline training network using the state information of different existing vehicle-mounted high-definition map data sources as the training data set, and to apply the network parameters of the offline training network to the online selection network after training is completed;

a data source selection module, configured to, while the autonomous vehicle is driving, feed the state information of multiple vehicle-mounted high-definition map data sources received in real time into the trained online selection network, whose output is the selection result for the vehicle-mounted high-definition map data source.
CN202111399669.9A 2021-11-19 2021-11-19 Vehicle-mounted high-definition map data source selection method and device Pending CN114328547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111399669.9A CN114328547A (en) 2021-11-19 2021-11-19 Vehicle-mounted high-definition map data source selection method and device


Publications (1)

Publication Number Publication Date
CN114328547A (en) 2022-04-12

Family

ID=81046138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111399669.9A Pending CN114328547A (en) 2021-11-19 2021-11-19 Vehicle-mounted high-definition map data source selection method and device

Country Status (1)

Country Link
CN (1) CN114328547A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109733415A (en) * 2019-01-08 2019-05-10 同济大学 An anthropomorphic autonomous driving car-following model based on deep reinforcement learning
US20200175333A1 (en) * 2018-12-04 2020-06-04 Here Global B.V. Method and apparatus for culling training data
CN112034834A (en) * 2019-05-15 2020-12-04 百度(美国)有限责任公司 Offline agent for accelerating trajectory planning for autonomous vehicles using reinforcement learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FAN WU et al.: "RLSS: A Reinforcement Learning Scheme for HD Map Data Source Selection in Vehicular NDN", IEEE Xplore, 17 November 2021, pages 2-5 *
PENG Jun et al.: "A Fast Deep Q-Learning Network Edge-Cloud Migration Strategy for Vehicular Services", Journal of Electronics & Information Technology, no. 01, 15 January 2020 *

Similar Documents

Publication Publication Date Title
CN112099496B (en) Automatic driving training method, device, equipment and medium
US11938957B2 (en) Driving scenario sampling for training/tuning machine learning models for vehicles
CN112249032A (en) An automatic driving decision-making method, system, device and computer storage medium
JP2019001450A (en) Real-time vehicle state trajectory prediction for vehicle energy management and autonomous drive
CN110646009A (en) DQN-based vehicle automatic driving path planning method and device
WO2020052480A1 (en) Unmanned driving behaviour decision making and model training
WO2019071909A1 (en) Automatic driving system and method based on relative-entropy deep inverse reinforcement learning
CN108399752A (en) A kind of driving infractions pre-judging method, device, server and medium
WO2023011331A1 (en) Method and apparatus for controlling formation driving, medium, and electronic device
WO2021051930A1 (en) Signal adjustment method and apparatus based on action prediction model, and computer device
CN113362600B (en) Traffic state estimation method and system
CN118097989B (en) Multi-agent traffic area signal control method based on digital twin
CN116895158B (en) A traffic signal control method for urban road networks based on multi-agent Actor-Critic and GRU
CN114639233B (en) Congestion state prediction method and device, electronic equipment and storage medium
CN115454082B (en) Vehicle obstacle avoidance method and system, computer-readable storage medium and electronic device
JP6155723B2 (en) Radar apparatus and program
CN116915726A (en) Client selection method and device for split federal learning
KR102256644B1 (en) Artificial intelligence traffic signal host server using BIM object model and control system comprising it and method of controlling traffic signal
CN116453343A (en) Intelligent traffic signal control optimization algorithm, software and system based on flow prediction in intelligent networking environment
CN114328547A (en) Vehicle-mounted high-definition map data source selection method and device
CN112381056B (en) A cross-domain pedestrian re-identification method and system integrating multiple source domains
US12261652B2 (en) Communication speed prediction apparatus, communication speed prediction method and recording medium
CN114332384A (en) A method and device for distributing content of vehicle high-definition map data source
CN106910349B (en) Traffic signal lamp control method and system
JP2000057483A (en) Method and device for predicting traffic condition and recording medium storing traffic condition prediction program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination