
CN116881707B - Autonomous driving models, training methods, devices, and vehicles - Google Patents

Autonomous driving models, training methods, devices, and vehicles

Info

Publication number: CN116881707B
Application number: CN202310274740.3A
Authority: CN (China)
Prior art keywords: information, real, autonomous driving, layer, model
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN116881707A
Inventors: 王凡, 黄际洲
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310274740.3A
Publication of application CN116881707A and granted patent CN116881707B

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/09 Arrangements for giving variable traffic instructions
    • G08G1/0962 Arrangements for giving variable traffic instructions having an indicator mounted inside the vehicle, e.g. giving voice messages
    • G08G1/0967 Systems involving transmission of highway information, e.g. weather, speed limits
    • G08G1/096708 Systems involving transmission of highway information where the received information might be used to generate an automatic action on the vehicle control
    • G08G1/096725 Systems involving transmission of highway information where the received information generates an automatic action on the vehicle control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G08G1/0137 Measuring and analyzing of parameters relative to traffic conditions for specific applications

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Atmospheric Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Traffic Control Systems (AREA)

Abstract


The present disclosure provides an autonomous driving model, a training method, a device, and a vehicle, relating to the field of autonomous driving technology. The model includes a multimodal encoding layer and a decision control layer connected to form an end-to-end neural network model. The method includes: obtaining real driving data; obtaining real autonomous driving strategy information based on real perception information; inputting the real driving data into the multimodal encoding layer and the output of the multimodal encoding layer into the decision control layer to predict autonomous driving strategy information; optionally, inputting the output of the multimodal encoding layer into a future prediction layer to obtain future prediction information about the vehicle's surrounding environment; and adjusting the parameters of the autonomous driving model based on the predicted and real autonomous driving strategy information and the future prediction information. As a result, the autonomous driving model can be trained on unlabeled real driving data while preserving decision-making efficiency, the autonomous driving behavior aligns well with the preferences of human passengers, and the lengthy learning process of a cold start is avoided.

Description

Automatic driving model, training method, training device and vehicle
Technical Field
The present disclosure relates to the field of computer technology, in particular to autonomous driving and artificial intelligence, and more specifically to a training method for an autonomous driving model, an autonomous driving method implemented using an autonomous driving model, a training apparatus for an autonomous driving model, an autonomous driving apparatus based on an autonomous driving model, an electronic device, a computer-readable storage medium, a computer program product, and an autonomous driving vehicle.
Background
Artificial intelligence is the discipline that studies how to make computers mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, and planning); it spans both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Autonomous driving technology integrates recognition, decision-making, positioning, communication security, human-machine interaction, and other technologies. Autonomous driving strategies can be learned with the assistance of artificial intelligence.
High-precision maps, also known as high-definition (HD) maps, are maps used by autonomous vehicles. An HD map contains accurate vehicle position information and rich road element data, and can help a vehicle anticipate complex road surface information such as gradient, curvature, and heading, thereby better avoiding potential risks. In other words, autonomous driving technology depends strongly on HD maps.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The present disclosure provides a training method of an automatic driving model, an automatic driving method implemented using an automatic driving model, a training apparatus of an automatic driving model, an automatic driving apparatus based on an automatic driving model, an electronic device, a computer-readable storage medium, a computer program product, and an automatic driving vehicle.
According to one aspect of the present disclosure, a method of training an autopilot model is provided. The automatic driving model comprises a multi-mode coding layer and a decision control layer, wherein the multi-mode coding layer and the decision control layer are connected to form an end-to-end neural network model, so that the decision control layer predicts automatic driving strategy information directly based on the output of the multi-mode coding layer. The training method of the automatic driving model comprises the steps of performing first training on the multi-mode coding layer and the decision control layer, wherein the performing first training on the multi-mode coding layer and the decision control layer comprises the following steps: acquiring first real driving data in the running process of a vehicle, wherein the first real driving data comprises first navigation information of the vehicle and first real perception information aiming at the surrounding environment of the vehicle, and the first real perception information comprises current perception information and historical perception information aiming at the surrounding environment of the vehicle; acquiring first real automatic driving strategy information corresponding to the first real driving data based on the first real perception information; inputting first sample input information comprising the first real driving data into the multi-modal coding layer to obtain a first sample implicit representation output by the multi-modal coding layer; inputting first intermediate sample input information comprising an implicit representation of the first sample into the decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer; and adjusting parameters of the multi-mode coding layer and the decision control layer based on the first predicted automatic driving strategy information and the first real automatic driving strategy information.
According to another aspect of the disclosure, an autopilot model obtained by training using the training method described above is provided, including a multi-modal coding layer and a decision control layer, where the multi-modal coding layer and the decision control layer are connected to form an end-to-end neural network model, so that the decision control layer predicts autopilot strategy information directly based on an output of the multi-modal coding layer. The first input information of the multi-modal encoding layer comprises navigation information of a target vehicle and perception information of the surrounding environment of the target vehicle obtained by using a sensor, the perception information comprises current perception information and historical perception information aiming at the surrounding environment of the target vehicle in the driving process of the vehicle, the multi-modal encoding layer is configured to acquire an implicit representation corresponding to the first input information, the second input information of the decision control layer comprises the implicit representation, and the decision control layer is configured to acquire target automatic driving strategy information based on the second input information.
According to another aspect of the present disclosure, there is provided an automatic driving method implemented using an automatic driving model, including: controlling the target vehicle to execute automatic driving by using the automatic driving model; and acquiring real driving data in an automatic driving process, wherein the real driving data comprises navigation information of the target vehicle, real perception information aiming at the surrounding environment of the target vehicle and real automatic driving strategy information, and the real driving data is used for carrying out iterative training on the automatic driving model.
According to another aspect of the disclosure, there is provided a training apparatus of an autopilot model, the autopilot model including a multi-modal coding layer and a decision control layer, the multi-modal coding layer and the decision control layer being connected to form an end-to-end neural network model, such that the decision control layer predicts autopilot strategy information directly based on an output of the multi-modal coding layer. The apparatus is configured to first train the multi-modal coding layer and decision control layer, and includes: a first real driving data acquisition unit configured to acquire first real driving data during running of the vehicle, the first real driving data including first navigation information of the vehicle and first real perception information for a surrounding environment of the vehicle, the first real perception information including current perception information and history perception information for the surrounding environment of the vehicle; a real automatic driving strategy information acquisition unit configured to acquire first real automatic driving strategy information corresponding to the first real driving data based on first real perception information; a multi-modal encoding layer training unit configured to input first sample input information including the first real driving data into the multi-modal encoding layer to obtain a first sample implicit representation output by the multi-modal encoding layer; a decision control layer training unit configured to input first intermediate sample input information including an implicit representation of the first sample into the decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer; and a parameter adjustment unit configured to adjust parameters of the multi-modal encoding layer and the decision control layer based on the first predicted automatic driving strategy information and the first real automatic driving strategy information.
According to another aspect of the present disclosure, there is provided an automatic driving apparatus based on an automatic driving model, including: a control unit configured to control the target vehicle to perform automatic driving using the above-described automatic driving model; and a second real driving data acquisition unit configured to acquire real driving data during automatic driving, the real driving data including navigation information of the target vehicle, real perception information for a surrounding environment of the target vehicle, and real automatic driving strategy information, the real driving data being used for iterative training of the automatic driving model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the above-described method.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the above method.
According to another aspect of the present disclosure, there is provided an autonomous vehicle including: one of a training apparatus of an autopilot model, an autopilot apparatus, and an electronic device according to embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, in accordance with an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of an autopilot model in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of a method of training an autopilot model in accordance with an embodiment of the present disclosure;
FIG. 4 illustrates a flow chart of a portion of a training method of an autopilot model in accordance with an embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of a portion of a training method of an autopilot model in accordance with an embodiment of the present disclosure;
FIG. 6 illustrates a flow chart of a method of training an autopilot model in accordance with another embodiment of the present disclosure;
FIG. 7 illustrates a flow chart of an autopilot method in accordance with an embodiment of the present disclosure;
FIG. 8 illustrates a flow chart of an autopilot method according to another embodiment of the present disclosure;
FIG. 9 shows a block diagram of a training device of an autopilot model in accordance with an embodiment of the present disclosure;
FIG. 10 illustrates a block diagram of an autopilot in accordance with an embodiment of the present disclosure; and
Fig. 11 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another element. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
In the technical solutions of the present disclosure, the collection, storage, and application of any personal user information involved comply with the relevant laws and regulations and do not violate public order and good morals.
In the related art, optimization-based and rule-based algorithms in autonomous driving typically rely on high-definition (HD) maps and on algorithm tuning for different scenarios. HD maps provide accurate vehicle position information and rich road element data, helping vehicles anticipate complex road surface information such as gradient, curvature, and heading, and thereby avoid potential risks. Consequently, the applicability of such algorithms is limited to very localized areas; they may fail due to map errors and have difficulty covering the large number of long-tail situations. Furthermore, the algorithms in the related art rely on a large amount of manual labeling, which on the one hand consumes substantial human effort and on the other hand is aimed only at perception. For example, a driving scene contains a great deal of background information as well as distant obstacles irrelevant to driving (e.g., non-motorized vehicles in the opposite lane). When labeling for perception purposes, it is difficult for annotators to determine which obstacles should be identified and which should be ignored, and such labels struggle to directly serve strategy optimization and driving decisions for autonomous driving.
In the related art, driverless technology mainly relies on cooperation between a perception module and a planning-and-control module. The working process comprises three stages. First, unstructured information obtained by sensors such as cameras or radars is converted into structured information (obstacle information, other-vehicle information, pedestrian and non-motorized-vehicle information, lane line information, traffic light information, other static road surface information, and so on); this information can be combined and matched with an HD map to accurately obtain the vehicle's position on that map. Second, predictions and decisions are made based on the structured information and the related observation history: prediction covers how the surrounding structured environment will change over a future period, and decision-making generates structured information (e.g., lane change, cutting in, waiting) usable for subsequent trajectory planning. Third, a trajectory of the target vehicle for a future period, such as a planned trajectory or control information (e.g., planned speed and position), is planned based on the structured decision information and the changes in the surrounding structured environment.
It has been found through research that perception-prediction-planning autonomous driving technology may face several technical problems. First, error accumulation: perception is not directly responsible for decision-making, so it does not necessarily capture the information most critical to decisions; moreover, perception errors are hard to compensate for in subsequent stages (e.g., an obstacle within an area may go unidentified), making a correct decision difficult when a critical obstacle is missed. Second, the coupling between prediction and planning cannot be resolved: the behavior of surrounding obstacles, especially critical obstacles interacting with the target vehicle, may itself be affected by the target vehicle. In other words, while the model runs, the prediction and planning modules are coupled, so pipelined decisions affect the final driving performance. Third, structured information has representation defects: it is entirely limited by manually predefined standards, and algorithms tend to fail once a situation outside those definitions is encountered (e.g., unknown obstacles, or unknown behavior of vehicles and pedestrians). Finally, there is the dependence on high-cost maps (such as HD maps): the related art mainly relies on HD-map point clouds and similar information to localize the vehicle, yet in practice HD maps are available only in limited areas, which restricts where autonomous driving can actually be deployed; moreover, updating an HD map is extremely expensive, and once the map no longer matches the actual road, decision failures easily follow.
In view of the above, the present disclosure provides a training method for an autonomous driving model, an autonomous driving method implemented using the model, a training apparatus, an autonomous driving apparatus based on the model, an electronic device, a computer-readable storage medium, a computer program product, and an autonomous driving vehicle. A perception-decision integrated driving technique is adopted so that perception is directly responsible for decision-making. This helps perception capture the information that plays a key role in decisions, reduces error accumulation, and resolves the coupling between prediction and decision-making found in the related art. Because perception is directly responsible for decisions, the approach also avoids the failures caused by structured prediction information being limited to manually predefined standards, avoids decision failures caused by untimely HD-map updates and limited map coverage, and, by removing the dependence on HD maps, saves their update cost. In addition, when the collected real driving data carries only limited annotations of real autonomous driving strategy information, such strategy information can be derived from the perception information in the real driving data, allowing the model training process to be completed accordingly. This realizes a perception-heavy, map-light autonomous driving technique: training the model on a large amount of real driving data ensures decision-making efficiency, aligns the driving behavior well with the preferences of human passengers, improves user experience, and avoids the lengthy learning process of a cold start.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes a motor vehicle 110, a server 120, and one or more communication networks 130 coupling the motor vehicle 110 to the server 120.
In an embodiment of the present disclosure, motor vehicle 110 may include a computing device in accordance with an embodiment of the present disclosure and/or be configured to perform a method in accordance with an embodiment of the present disclosure.
The server 120 may run one or more services or software applications that enable autopilot. In some embodiments, server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user of motor vehicle 110 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.
In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from motor vehicle 110. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of motor vehicle 110.
Network 130 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, the one or more networks 130 may be a satellite communications network, a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a blockchain network, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (including, for example, Bluetooth and WiFi), and/or any combination of these with other networks.
The system 100 may also include one or more databases 150. In some embodiments, these databases may be used to store data and other information. For example, one or more of databases 150 may be used to store information such as audio files and video files. The data store 150 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The data store 150 may be of different types. In some embodiments, the data store used by server 120 may be a database, such as a relational database. One or more of these databases may store, update, and retrieve the databases and data from the databases in response to the commands.
In some embodiments, one or more of databases 150 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.
Motor vehicle 110 may include a sensor 111 for sensing the surrounding environment. The sensors 111 may include one or more of the following: visual cameras, infrared cameras, ultrasonic sensors, millimeter wave radar, and laser radar (LiDAR). Different sensors may provide different detection accuracy and range. The camera may be mounted in front of, behind or other locations on the vehicle. The vision cameras can capture the conditions inside and outside the vehicle in real time and present them to the driver and/or passengers. In addition, by analyzing the captured images of the visual camera, information such as traffic light indication, intersection situation, other vehicle running state, etc. can be acquired. The infrared camera can capture objects under night vision. The ultrasonic sensor can be arranged around the vehicle and is used for measuring the distance between an object outside the vehicle and the vehicle by utilizing the characteristics of strong ultrasonic directivity and the like. The millimeter wave radar may be installed in front of, behind, or other locations of the vehicle for measuring the distance of an object outside the vehicle from the vehicle using the characteristics of electromagnetic waves. Lidar may be mounted in front of, behind, or other locations on the vehicle for detecting object edges, shape information for object identification and tracking. The radar apparatus may also measure a change in the speed of the vehicle and the moving object due to the doppler effect.
Motor vehicle 110 may also include a communication device 112. The communication device 112 may include a satellite positioning module capable of receiving satellite positioning signals (e.g., BeiDou, GPS, GLONASS, and Galileo) from satellites 141 and generating coordinates based on these signals. The communication device 112 may also include a module for communicating with the mobile communication base station 142; the mobile communication network may implement any suitable current or evolving wireless communication technology, such as GSM/GPRS, CDMA, LTE, or 5G. The communication device 112 may also have a Vehicle-to-Everything (V2X) module configured to enable, for example, Vehicle-to-Vehicle (V2V) communication with other vehicles 143 and Vehicle-to-Infrastructure (V2I) communication with infrastructure 144. In addition, the communication device 112 may have a module configured to communicate with a user terminal 145 (including but not limited to a smartphone, tablet computer, or wearable device such as a watch), for example via a wireless local area network under the IEEE 802.11 standard or Bluetooth. With the communication device 112, the motor vehicle 110 can also access the server 120 via the network 130.
Motor vehicle 110 may also include a control device 113. The control device 113 may include a processor, such as a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), or other special purpose processor, etc., in communication with various types of computer readable storage devices or mediums. The control device 113 may include an autopilot system for automatically controlling various actuators in the vehicle. The autopilot system is configured to control a powertrain, steering system, braking system, etc. of a motor vehicle 110 (not shown) via a plurality of actuators in response to inputs from a plurality of sensors 111 or other input devices to control acceleration, steering, and braking, respectively, without human intervention or limited human intervention. Part of the processing functions of the control device 113 may be implemented by cloud computing. For example, some of the processing may be performed using an onboard processor while other processing may be performed using cloud computing resources. The control device 113 may be configured to perform a method according to the present disclosure. Furthermore, the control means 113 may be implemented as one example of a computing device on the motor vehicle side (client) according to the present disclosure.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
According to one aspect of the present disclosure, a method of training an autopilot model is provided. FIG. 2 shows a schematic diagram of an autopilot model 200 in accordance with an embodiment of the present disclosure; and fig. 3 shows a flowchart of a training method 300 of an autopilot model in accordance with an embodiment of the present disclosure.
Referring first to fig. 2, the autopilot model 200 includes a multi-modal encoding layer 210 and a decision control layer 220, the multi-modal encoding layer 210 and the decision control layer 220 being connected to form an end-to-end neural network model such that the decision control layer 220 predicts autopilot strategy information directly based on the output of the multi-modal encoding layer 210.
As described above, in the related art, prediction is performed based on perception information to obtain future prediction information, and the decision control layer then performs planning based on that future prediction information rather than directly on the perception information. In the embodiments of the present application, by contrast, the decision control layer 220 predicts automatic driving strategy information directly based on the output of the multimodal encoding layer 210, and the multimodal encoding layer 210 performs encoding computation on the perception information; this is equivalent to the decision control layer 220 planning directly on the perception information to predict the automatic driving strategy information. In other words, the training method of the embodiments of the present application can learn an autonomous driving technique in which perception is directly responsible for decision-making.
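For concreteness, the following is a minimal sketch, in PyTorch, of such an end-to-end structure. It assumes the perception and navigation inputs have already been tokenized into fixed-width feature vectors and that the strategy information is a short planned trajectory; the module choices, dimensions, and names are illustrative assumptions rather than the patent's specification.

```python
import torch
import torch.nn as nn

class AutoDrivingModel(nn.Module):
    """Minimal sketch of the end-to-end structure described above: a
    multimodal encoding layer feeding a decision control layer directly,
    with no intermediate structured-information stage. All dimensions
    and sub-module choices are illustrative assumptions."""

    def __init__(self, d_model: int = 256, horizon: int = 8):
        super().__init__()
        # Multimodal encoding layer: fuses camera/radar/navigation tokens
        # into one implicit representation e_t (a plain Transformer encoder here).
        self.multimodal_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        # Decision control layer: predicts a planned trajectory (x, y per step)
        # directly from the encoder output.
        self.decision_head = nn.Linear(d_model, horizon * 2)
        self.horizon = horizon

    def forward(self, fused_tokens: torch.Tensor) -> torch.Tensor:
        e_t = self.multimodal_encoder(fused_tokens)   # implicit representation
        plan = self.decision_head(e_t.mean(dim=1))    # pool tokens, then decide
        return plan.view(-1, self.horizon, 2)         # (batch, horizon, 2)

# Usage: two samples, each with 16 fused sensor/navigation tokens of width 256.
model = AutoDrivingModel()
trajectory = model(torch.randn(2, 16, 256))  # -> shape (2, 8, 2)
```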
The training method 300 of the autopilot model includes a first training of the multimodal coding layer 210 and the decision control layer 220. As shown in fig. 3, the first training includes:
Step S310, acquiring first real driving data in the running process of the vehicle, wherein the first real driving data comprises first navigation information of the vehicle and first real perception information aiming at the surrounding environment of the vehicle, and the first real perception information comprises current perception information and historical perception information aiming at the surrounding environment of the vehicle;
Step S320, acquiring first real automatic driving strategy information corresponding to the first real driving data based on the first real perception information;
Step S330, inputting first sample input information comprising the first real driving data into the multimodal encoding layer to obtain a first sample implicit representation output by the multimodal encoding layer;
Step S340, inputting first intermediate sample input information including implicit representation of a first sample into a decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer; and
Step S350, adjusting parameters of the multimodal encoding layer and the decision control layer based on the first predicted automatic driving strategy information and the first real automatic driving strategy information.
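Under the same assumptions as the sketch above, one step of the first training (steps S310 through S350) could look as follows; the batch keys and the mean-squared trajectory loss are assumptions, since the source does not fix a concrete loss function for this supervised stage.

```python
import torch
import torch.nn.functional as F

def first_training_step(model, optimizer, batch):
    """One sketched step of the first training (S310-S350). Assumes the
    batch already carries the pseudo-labelled real strategy information
    obtained in step S320; key names are illustrative."""
    fused_tokens = batch["fused_tokens"]      # S310: navigation + perception, tokenized
    real_strategy = batch["real_strategy"]    # S320: (batch, horizon, 2) pseudo labels

    predicted_strategy = model(fused_tokens)  # S330 + S340: encode, then decide

    # S350: adjust encoder and decision-control parameters jointly.
    loss = F.mse_loss(predicted_strategy, real_strategy)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```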
In an example, the first training may be offline pre-training; that is, during the first training the autonomous driving model 200 is not deployed on a real vehicle traveling in a real road scene, but is instead trained on a large amount of collected real driving data, avoiding the lengthy learning process of a cold start.
The first real driving data may include driving data collected during unmanned driving and/or driving data collected while a human driver drives a vehicle equipped with the relevant sensors. Some of these driving data are not annotated with real automatic driving strategy information, such as trajectories or control signals (e.g., throttle, brake, steering amplitude), and therefore cannot be used directly for model training. According to the embodiments of the present application, the corresponding automatic driving strategy information is derived from the sensor-based perception information in the driving data, thereby realizing pseudo-labeling of the driving data with corresponding real automatic driving strategy information.
In an example, the autopilot model 200 may employ a Transformer network structure with an encoder (Encoder) and a decoder (Decoder). It is understood that the autopilot model 200 may be another neural network model based on the Transformer network structure, which is not limited herein. The Transformer architecture can compute implicit representations of model inputs and outputs through a self-attention mechanism; in other words, it may be an Encoder-Decoder model built on this self-attention mechanism.
In an example, the first navigation information In1 of the vehicle in the first real driving data may include vectorized navigation information and vectorized map information, which may be obtained by vectorizing one or more of lane-level or road-level navigation information and coarse positioning information.
In an example, the first real perception information for the surrounding environment of the vehicle in the first real driving data (which may, for example but without limitation, include In2, In3, and In4; the description below assumes it includes all three) may include perception information In2 from one or more cameras on the vehicle, perception information In3 from one or more lidars, and perception information In4 from one or more millimeter-wave radars. It is to be understood that the perception information of the surroundings is not limited to this form; it may, for example, include only the perception information In2 of multiple cameras without the lidar information In3 and the millimeter-wave radar information In4. The perception information In2 acquired by a camera may take the form of pictures or video, and the perception information In3 acquired by a lidar may take the form of a radar point cloud (e.g., a three-dimensional point cloud). In an example, the perception information includes current perception information x_t for the surrounding environment of the target vehicle during driving and historical perception information x_{t-Δt} corresponding to a plurality of historical moments, where Δt may span a preset duration.
In an example, the multimodal encoding layer 210 may perform encoding calculations on the first real driving data to generate a corresponding implicit representation, for example an implicit representation in a Bird's Eye View (BEV) space. For example, the perception information of the cameras can first be input to a shared backbone network (Backbone) to extract per-camera data features. The perception information of the multiple cameras is then fused and converted to the BEV space. Next, cross-modal fusion can be performed in the BEV space, fusing the pixel-level visual data with the lidar point cloud. Finally, temporal fusion is carried out to form an implicit representation e_t of the BEV space.
In one example, projection of the input information of multiple cameras into an implicit representation of the BEV space may be achieved using a Transformer Encoder structure that fuses the spatio-temporal information. For example, the spatio-temporal information may be utilized by a grid-partitioned BEV query mechanism (BEV queries) that presets parameters. The BEV query mechanism is enabled to extract features from multiple camera views of interest by using a spatial cross-attention mechanism (i.e., the BEV query mechanism extracts required spatial features from multiple camera features through the attention mechanism), thereby aggregating spatial information; in addition, the historical information is fused by a time-series self-attention mechanism (i.e., each time-series generated BEV feature obtains the required time-series information from the BEV feature at the previous time), thereby aggregating the time-series information.
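The following is a minimal sketch of such a grid BEV-query mechanism: preset BEV queries pull spatial features from multi-camera tokens through cross-attention, and temporal context from the previous frame's BEV features through a second attention step. Grid size, widths, and token counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BEVQueryFusion(nn.Module):
    """Sketch of grid-partitioned BEV queries with spatial cross-attention
    over camera features and temporal attention over the previous BEV
    features, loosely following the mechanism described above."""

    def __init__(self, d_model: int = 256, grid: int = 50):
        super().__init__()
        # Preset, learnable BEV queries, one per grid cell.
        self.bev_queries = nn.Parameter(torch.randn(grid * grid, d_model))
        self.spatial_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, cam_tokens: torch.Tensor, prev_bev: torch.Tensor = None):
        q = self.bev_queries.unsqueeze(0).expand(cam_tokens.size(0), -1, -1)
        # Spatial cross-attention: each BEV query gathers camera features.
        bev, _ = self.spatial_attn(q, cam_tokens, cam_tokens)
        if prev_bev is not None:
            # Temporal attention: current BEV queries read the previous BEV.
            bev, _ = self.temporal_attn(bev, prev_bev, prev_bev)
        return bev  # implicit representation e_t in BEV space

fusion = BEVQueryFusion()
e_t = fusion(torch.randn(2, 6 * 100, 256))  # e.g. 6 cameras x 100 tokens each
```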
Accordingly, the decision control layer 220 obtains the first predicted automatic driving strategy information based on the input implicit representation e_t. The first predicted automatic driving strategy information may include, for example, a planned trajectory Out1 or a control signal Out2 for the vehicle (e.g., signals controlling throttle, brake, steering amplitude, etc.). In an example, the decision control layer 220 may include a decoder of a Transformer.
Since the multimodal encoding layer 210 and the decision control layer 220 of the model to be trained are connected to form an end-to-end neural network model, the perception information in the sample input information (including the real driving data) can be directly responsible for decisions, which resolves the coupling between prediction and planning in the related art. Encoding the real driving data into an implicit representation also avoids the algorithm failures caused by the representation defects of structured information in the related art. Moreover, because the perception information is directly responsible for decisions, training lets perception capture the information critical to decision-making, reducing the error accumulation caused by perception errors during model training. Finally, since perception is directly responsible for decisions, a perception-heavy, map-light autonomous driving technique is realized; this avoids the decision-learning failures caused by untimely HD-map updates and limited map coverage, and removing the dependence on HD maps saves their update cost.
In addition, because the first real automatic driving strategy information is acquired based on the first real perception information, even when only limited sample input information is annotated with real automatic driving strategy information, the first real automatic driving strategy information corresponding to the first real driving data can be obtained from the perception information in the sample input information, and the model training process can be completed accordingly. In other words, when training the autonomous driving model, if the sample data contains no or only a small amount of vehicle trajectory data or control signal data, the first real automatic driving strategy information (e.g., pseudo-labeled trajectory data) can be acquired from the first real perception information, completing the model training process. This realizes a perception-heavy, map-light autonomous driving technique in which training on a large amount of real driving data ensures decision-making efficiency, aligns the driving behavior well with the preferences of human passengers, improves user experience and safety, and avoids the lengthy learning process of a cold start.
According to some embodiments, step S320 may include: inputting the first real perception information into a driving strategy prediction model (not shown in the drawings) to obtain the first real automatic driving strategy information output by the driving strategy prediction model.
In an example, the first real perception information (x_1, …, x_t) (e.g., sensor perception information) may be input to a driving strategy prediction model to predict a corresponding trajectory plan (y_1, …, y_t). The predicted trajectory plan (y_1, …, y_t) can then serve as the first real automatic driving strategy information when training the multimodal encoding layer and the decision control layer, thereby realizing pseudo-labeling of the first real driving data.
Because the first real perception information (e.g., images acquired by cameras or point clouds acquired by radars) comprises both current and historical perception information, the automatic driving strategy information (e.g., the driving trajectory) of the vehicle is implicit in it. A driving strategy prediction model can therefore be trained on a small amount of labeled data (i.e., driving data annotated with real trajectories) and then used to predict a trajectory plan (y_1, …, y_t) from the first real perception information (x_1, …, x_t), thereby giving trajectory-unlabeled perception data a pseudo-labeled trajectory annotation.
It will be appreciated that the driving strategy prediction model may be a model independent of the autopilot model 200.
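A minimal sketch of this pseudo-labeling step follows, assuming the driving strategy prediction model is a separate network already trained on the small labeled subset; function and key names are illustrative.

```python
import torch

@torch.no_grad()
def pseudo_label_driving_data(strategy_model, unlabeled_perception_seqs):
    """Sketch of step S320: a separately trained driving strategy prediction
    model maps perception sequences (x_1, ..., x_t) to trajectory plans
    (y_1, ..., y_t) that stand in for real strategy labels."""
    labelled = []
    for perception_seq in unlabeled_perception_seqs:
        trajectory = strategy_model(perception_seq)  # predicted (y_1, ..., y_t)
        labelled.append({
            "perception": perception_seq,
            "real_strategy": trajectory,  # pseudo ground truth for training
        })
    return labelled
```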
According to some embodiments, the autopilot model 200 may also include an evaluation feedback layer 230, and the first training of the multimodal encoding layer 210 and the decision control layer 220 may further include: inputting the first sample implicit representation e_t into the evaluation feedback layer 230 to obtain first sample evaluation feedback information Out3, output by the evaluation feedback layer 230, for the first predicted automatic driving strategy information. Step S350 may then include: adjusting parameters of the multimodal encoding layer 210 and the decision control layer 220 based on the first sample evaluation feedback information Out3 for the first predicted automatic driving strategy information, the first predicted automatic driving strategy information, and the first real automatic driving strategy information.
In an example, the evaluation feedback layer 230 may include a decoder of a Transformer.
Thus, by introducing the evaluation feedback layer 230 into the automatic driving model 200, the model can learn whether the current driving behavior comes from a human driver or a model, whether the current driving is comfortable, whether it violates traffic rules, whether it constitutes dangerous driving, and so on, thereby improving user experience. In addition, during model training the multimodal encoding layer 210 can be further trained by the evaluation feedback layer 230 in addition to the decision control layer 220, so that the multimodal encoding layer 210 encodes more accurately and the trained decision control layer 220 can predict better-optimized automatic driving strategy information.
In an example, parameters of the multimodal encoding layer and the decision control layer may be adjusted using reinforcement learning. For example, reinforcement learning may be performed based on the first predicted automatic driving strategy information (y_1, …, y_t), the first real automatic driving strategy information (ȳ_1, …, ȳ_t), and the first sample evaluation feedback information (r_1, …, r_t).
In an example, the reinforcement learning may be performed using a PPO (Proximal Policy Optimization) algorithm or an SAC (Soft Actor-Critic) algorithm.
In an example, the parameters of the multimodal encoding layer and the decision control layer may be adjusted using the objective function in the following equation (1):
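The printed formula for equation (1) did not survive text extraction. A plausible reconstruction, consistent with the advantage term A_t and the hyperparameter α described below, is a policy-gradient objective with an α-weighted imitation term toward the real strategy; this form is an assumption, not the patent's verbatim equation:

```latex
% Hedged reconstruction of equation (1): advantage-weighted policy-gradient
% term plus an alpha-weighted imitation term toward the real strategy.
\mathcal{L}(\theta) = -\sum_{t} A_t \,\log \pi_{\theta}\!\left(y_t \mid e_t\right)
  + \alpha \sum_{t} \left\lVert y_t - \bar{y}_t \right\rVert^{2}
  \qquad \text{equation (1)}
```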
where A_t denotes the advantage function at time t, which may be derived from the first sample evaluation feedback information (r_1, …, r_t), and α is a hyperparameter used to adjust the magnitude of the loss value.
When reinforcement learning training is performed on a real vehicle, the autonomous driving model may need to produce some erroneous or failed results, and the target vehicle may even need to collide with surrounding obstacles, in order to learn from the erroneous or collision experience. However, for cost and safety reasons, an autonomous vehicle cannot be allowed to actually collide during real-vehicle driving.
According to some embodiments, the first real driving data may further comprise a first intervention identification capable of characterizing whether the first real automatic driving strategy information is driving strategy information produced under human intervention. Step S350 may then include: adjusting parameters of the multimodal encoding layer 210 and the decision control layer 220 based on the first intervention identification (i_1, …, i_T), the first sample evaluation feedback information (r_1, …, r_t), the first predicted automatic driving strategy information (y_1, …, y_t), and the first real automatic driving strategy information (ȳ_1, …, ȳ_t).
While the real vehicle drives autonomously, a safety driver can intervene at any critical moment and take over control of the autonomous vehicle, avoiding the unacceptable training cost that a collision during real-vehicle driving would incur. After the crisis passes, control is returned to the autonomous vehicle. The first intervention identification characterizes whether the first real automatic driving strategy information was produced under human intervention. In other words, by introducing the first intervention identification, the model can learn from the intervention strategy of the safety driver, so that the learned driving behavior aligns well with the preferences of human passengers, improving user experience and safety. Human-in-the-loop reinforcement learning can gradually reduce the adverse situations that require intervention. This mechanism improves reinforcement-learning efficiency and reduces the influence of poor-quality experience on the learning process, further improving the robustness of the trained model.
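One hedged way to use the intervention identification in the loss is sketched below: steps where the safety driver took over are treated as expert behavior to imitate, while autonomous steps contribute an advantage-weighted policy-gradient term. This particular weighting scheme is an assumption, not a formula stated in the patent.

```python
import torch

def masked_policy_loss(logp: torch.Tensor,
                       advantage: torch.Tensor,
                       intervention: torch.Tensor) -> torch.Tensor:
    """Sketch of using (i_1, ..., i_T): logp holds the log-probability the
    policy assigns to the action actually taken at each step; intervention
    is 1 where the safety driver took over and 0 otherwise."""
    on_policy = (intervention == 0).float()
    # Reinforce autonomous steps, weighted by their advantage.
    pg_loss = -(on_policy * advantage * logp).sum()
    # On intervened steps the taken action is the human's: imitate it.
    imitation_loss = -((1.0 - on_policy) * logp).sum()
    return pg_loss + imitation_loss
```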
FIG. 4 shows a flow chart of a portion of a training method of an autopilot model in accordance with an embodiment of the present disclosure. According to some embodiments, as shown in fig. 4, the training process of the evaluation feedback layer 230 may include:
Step S410, acquiring second sample input information and real evaluation feedback information for the second sample input information, wherein the second sample input information comprises navigation information of a sample vehicle and current and historical perception information for the surrounding environment of the sample vehicle;
Step S420, inputting second sample input information into the multi-mode coding layer to obtain second sample implicit expression output by the multi-mode coding layer;
Step S430, inputting the second sample implicit representation into the evaluation feedback layer 230 to obtain the predicted evaluation feedback information, output by the evaluation feedback layer 230, for the second sample input information; and
Step S440, adjusting parameters of the multi-mode coding layer and the evaluation feedback layer 230 based on the real evaluation feedback information and the predicted evaluation feedback information.
The second sample input information may be an input sample acquired by the autonomous vehicle during autonomous driving (e.g., L4 level autonomous driving) or during manual driving, or may be acquired in a simulation environment. For example, the second sample input information may include sensor (e.g., camera, radar) perception information, and navigation information, as well as other information such as lane-level maps.
The real evaluation feedback information (r̄_1, …, r̄_t) may be manually provided evaluation feedback (a passenger's or driver's evaluation of the driving experience of the autonomous vehicle); it can indicate, for example, whether the current driving behavior comes from a human driver or a model, whether the current driving is comfortable, whether it violates traffic regulations, whether it constitutes dangerous driving, and so on.
Accordingly, the predictive evaluation feedback information (r t) is the prediction result output by the evaluation feedback layer 230.
In an example, the parameters of the multi-modal encoding layer 210 and the evaluation feedback layer 230 may be adjusted using the objective function in equation (2).
In an example, feedback modeling may be utilized to learn a function for estimating the evaluation feedback information. In other words, the model itself may be made to estimate the expected benefit of the current driving trajectory (i.e., the prediction result output by the evaluation feedback layer 230 described above). For example, r_t can be determined using the following equation (3):
r_t = R(x_t, …, x_{t-l+1})    equation (3)
where (x_t, …, x_{t-l+1}) may be the second sample input information over a window of l recent time steps.
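To make the feedback modeling concrete, the sketch below implements a feedback estimator of the form r_t = R(x_t, …, x_{t-l+1}) operating on the implicit representations from the multi-modal encoding layer. Since equation (2) is not reproduced in the text, plain regression to the human feedback label is assumed; all class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackEstimator(nn.Module):
    """Schematic R(x_t, ..., x_{t-l+1}): maps the implicit representations
    of the last l steps to a scalar evaluation feedback estimate r_t."""

    def __init__(self, hidden_dim: int, window: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_dim * window, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, encodings: torch.Tensor) -> torch.Tensor:
        # encodings: (B, l, hidden_dim), the second sample implicit
        # representations produced by the multi-modal encoding layer.
        return self.head(encodings.flatten(1)).squeeze(-1)  # (B,)

def feedback_training_step(model, optimizer, encodings, true_feedback):
    # Plain regression to the manually fed-back label; equation (2) itself
    # is not reproduced in the text, so MSE is only a stand-in.
    pred = model(encodings)
    loss = F.mse_loss(pred, true_feedback)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```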
FIG. 5 shows a flowchart of part of a training method of an automatic driving model according to an embodiment of the present disclosure. According to some embodiments, the autopilot model 200 may further include a future prediction layer, and the first real perception information may include future perception information for the vehicle surroundings. As shown in FIG. 5, performing the first training on the multi-modal coding layer and the decision control layer may further include:
Step S510, acquiring future real information for the surrounding environment of the vehicle based on the future perception information;
Step S520, inputting the first sample implicit representation into the future prediction layer to obtain future prediction information output by the future prediction layer; and
Step S530, parameters of the multi-mode coding layer and the future prediction layer are adjusted based on the future real information and the future prediction information.
The future perception information x_{t+Δt} may be the perception information that follows the current perception information x_t, where Δt is a preset time span. Accordingly, the future real information may include annotated information (e.g., detection boxes) corresponding to the future perception information.
The future prediction layer may include a Transformer decoder.
Therefore, during model training, the multi-modal coding layer 210 can be further trained through the future prediction layer in addition to the decision control layer 220, so that the multi-modal coding layer 210 encodes more accurately and the decision control layer 220 can predict more optimized automatic driving strategy information.
According to some embodiments, the future prediction layer and the decision control layer may share the same network structure, i.e., the outputs of that network structure may include both the future prediction information and the autopilot strategy information. Illustratively, the future prediction layer and the decision control layer may share the same Transformer decoder.
According to some embodiments, the future prediction information may include at least one of: future predicted perception information for the vehicle surroundings (e.g., sensor information at a future moment, where the sensor information at the future moment comprises camera input information or radar input information at that moment); and a future prediction implicit representation corresponding to the future predicted perception information (e.g., an implicit representation in BEV space of the sensor information corresponding to a future point in time).
According to some embodiments, with further reference to FIG. 2, the future prediction layer may include at least one of a first future prediction layer 240 and a second future prediction layer 250. The first future prediction layer 240 may be configured to output future predicted perception information Out4 (e.g., sensor information at a future moment, comprising camera input information or radar input information at that moment) based on the input first sample implicit representation e_t; the second future prediction layer 250 may be configured to output a future prediction implicit representation Out5 (e.g., an implicit representation of BEV space at a future moment) based on the input first sample implicit representation e_t.
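A minimal sketch of the two future prediction heads is given below. The text suggests a Transformer decoder for this role; simple linear heads are used here only for brevity, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class FuturePredictionHeads(nn.Module):
    """Illustrative first/second future prediction layers (240/250); both
    consume the first sample implicit representation e_t. Linear heads are
    used for brevity where the text suggests a Transformer decoder."""

    def __init__(self, hidden_dim: int, sensor_dim: int):
        super().__init__()
        # First head (Out4): predicts future perception information,
        # e.g. a flattened camera/radar observation at t + Δt.
        self.sensor_head = nn.Linear(hidden_dim, sensor_dim)
        # Second head (Out5): predicts the future implicit representation,
        # e.g. the BEV-space encoding at t + Δt.
        self.latent_head = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, e_t: torch.Tensor):
        return self.sensor_head(e_t), self.latent_head(e_t)
```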
FIG. 6 illustrates a flowchart of a method 600 of training an autopilot model in accordance with another embodiment of the present disclosure. The training method 600 includes performing the first training of the multi-modal coding layer 210 and the decision control layer 220 with the training method 300 described above. According to some embodiments, the training method 600 may further include:
Step S610, acquiring second real driving data during a process of controlling a target vehicle to perform automatic driving using the automatic driving model obtained by the first training, the second real driving data including second navigation information of the target vehicle, second real perception information for the surrounding environment of the target vehicle, and second real automatic driving strategy information; and
performing a second training on the automatic driving model based on the second real driving data, the second training including the following steps S621 to S623:
Step S621, inputting third sample input information including the second real driving data into the multi-modal coding layer to obtain a third sample implicit representation output by the multi-modal coding layer;
Step S622, inputting second intermediate sample input information including the third sample implicit representation into the decision control layer to obtain second predicted automatic driving strategy information (y_1, …, y_t) output by the decision control layer; and
Step S623, adjusting parameters of the multi-modal coding layer and the decision control layer based on the second predicted automatic driving strategy information (y_1, …, y_t) and the second real automatic driving strategy information.
In an example, the first training may be offline pre-training, i.e., during the first training process, the autopilot model 200 is not deployed on a real vehicle traveling on a real road scene or a simulated vehicle traveling on a simulated road scene. Accordingly, the second training may be training performed by acquiring second real driving data collected during driving of the vehicle controlled by the autopilot model obtained by the first training, that is, during the second training, the autopilot model 200 is deployed on a real vehicle traveling on a real road scene or a simulated vehicle traveling on a simulated road scene.
In the second training process, the second navigation information of the target vehicle in the second real driving data may include vectorized navigation information and vectorized map information, which may be obtained by vectorizing one or more of lane-level or road-level navigation information and coarse positioning information. The second real perception information may include perception information of one or more cameras, one or more lidars, and one or more millimeter-wave radars on the vehicle in the real road scene. It is to be understood that the perception information of the surroundings of the target vehicle is not limited to this form; it may, for example, include only the perception information of a plurality of cameras, without the perception information of lidars and millimeter-wave radars. The perception information obtained by a camera may be in the form of pictures or video, and the perception information obtained by a lidar may be in the form of a radar point cloud (e.g., a three-dimensional point cloud). The second real autopilot strategy information may include planned trajectories of the autonomous vehicle, or control signals for the vehicle (e.g., signals controlling throttle, brake, steering amplitude, etc.), acquired in the real road scene.
Therefore, the automatic driving model can train in a real road scene, a simulated road scene and an offline pre-training scene, so that training of mass data and multiple scenes is realized, and the accuracy of a model training result is further improved while the model training efficiency is improved.
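A schematic second-training (online) update corresponding to steps S621 to S623 might look as follows; the attribute names on `model` and the batch keys are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def second_training_step(model, optimizer, batch):
    """One schematic online update for steps S621-S623. The attribute names
    on `model` and the batch keys are assumptions for illustration."""
    # S621: encode the third sample input information (second real driving
    # data) into the third sample implicit representation.
    e = model.encoder(batch["navigation"], batch["perception"])
    # S622: decode the implicit representation into the second predicted
    # automatic driving strategy information (y_1, ..., y_t).
    pred = model.decision_control(e)
    # S623: supervise against the real strategy collected on the vehicle.
    loss = F.mse_loss(pred, batch["real_strategy"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```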
According to some embodiments, when the autopilot model includes the evaluation feedback layer 230, performing the second training of the autopilot model based on the second real driving data may further include: inputting the third sample implicit representation into the evaluation feedback layer 230 to obtain second sample evaluation feedback information (r_1, …, r_t) for the second predicted automatic driving strategy information output by the evaluation feedback layer 230. The parameters of the multi-modal coding layer 210 and the decision control layer 220 are then adjusted based on the second sample evaluation feedback information (r_1, …, r_t) for the second predicted automatic driving strategy information, the second predicted automatic driving strategy information (y_1, …, y_t), and the second real automatic driving strategy information.
The second real autopilot strategy information may be manual driving trajectory data. Accordingly, the second predicted autopilot strategy information (y_1, …, y_t) is the prediction result (trajectory plan) output by the decision control layer 220.
The second sample evaluation feedback information (r_1, …, r_t) may indicate, for example, whether the current driving behavior originates from a human driver or from the model, whether the current driving is comfortable, whether it violates traffic rules, whether it constitutes dangerous driving, and the like.
Therefore, by further utilizing the sample evaluation feedback information to perform cooperative parameter adjustment on the multi-mode coding layer 210 and the decision control layer 220, the learning effect of the multi-mode coding layer 210 and the decision control layer 220 can be further improved, so that the user experience is improved.
In an example, parameters of the multi-modal coding layer and the decision control layer may be adjusted using reinforcement learning. For example, reinforcement learning may be performed based on data including the second predicted automatic driving strategy information (y_1, …, y_t), the second real automatic driving strategy information, and the second sample evaluation feedback information (r_1, …, r_t).
In an example, the reinforcement learning may be performed using a PPO algorithm or a SAC algorithm.
In an example, the parameters of the multi-modal coding layer 210 and the decision control layer 220 may be adjusted using the objective function in equation (4).
Here A_t may denote the advantage function for time t, and A_t may be derived based on the second sample evaluation feedback information (r_1, …, r_t). α may be a hyperparameter for adjusting the magnitude of the loss value.
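For reference, a standard clipped PPO surrogate of the kind such training could use is sketched below; since the patent's objective function is not reproduced in the text, this only illustrates how the advantage A_t and the scaling hyperparameter α enter the update.

```python
import torch

def ppo_surrogate(logp_new, logp_old, advantages, clip_eps=0.2, alpha=1.0):
    """Standard clipped PPO surrogate; A_t enters as `advantages` and the
    loss-scale hyperparameter α as `alpha`."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Negated because optimizers minimize; PPO maximizes the surrogate.
    return -alpha * torch.mean(torch.min(ratio * advantages, clipped * advantages))
```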
According to some embodiments, when the autopilot model includes an evaluation feedback layer, the second real driving data may further include a second intervention identifier (i_1, …, i_T), the second intervention identifier being capable of characterizing whether the second real autopilot strategy information is autopilot strategy information with human intervention. Performing the second training of the autopilot model based on the second real driving data may further include: inputting the third sample implicit representation into the evaluation feedback layer to obtain second sample evaluation feedback information output by the evaluation feedback layer for the second predicted autopilot strategy information. The parameters of the multi-modal coding layer 210 and the decision control layer 220 are then adjusted based on the second intervention identifier (i_1, …, i_T), the second sample evaluation feedback information (r_1, …, r_t) for the second predicted autopilot strategy information, the second predicted autopilot strategy information (y_1, …, y_t), and the second real autopilot strategy information.
In an example, parameters of the multi-modal coding layer and the decision control layer may be adjusted using feedback-based reinforcement learning with human-in-the-loop learning. For example, learning may be performed based on quintuple data including the second sample evaluation feedback information (r_1, …, r_t), the second intervention identifier (i_1, …, i_t), the second predicted autopilot strategy information (y_1, …, y_t), the second real autopilot strategy information, and the second sample input information (x_1, …, x_t).
When the second intervention identifier (i_1, …, i_T) is true, it indicates that the autonomous vehicle is controlled manually and is no longer controlled by the control signals issued by the autopilot model; when the intervention identifier is not true, it indicates that the autonomous vehicle is controlled by the control signals issued by the autopilot model rather than manually.
In an example, the parameters of the multi-modal coding layer and the evaluation feedback layer may be adjusted using the objective function in equation (5).
Here λ_1 and λ_2 may be hyperparameters indicating the weights of the respective components; the intervention identifier (i_1, …, i_T) takes the value 1 when true and 0 otherwise; and A_t may denote the advantage function for time t, which may be derived based on the second sample evaluation feedback information (r_1, …, r_t).
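Since the body of equation (5) is likewise not reproduced in the text, the following sketch shows one plausible λ_1/λ_2-weighted combination of an imitation term on intervened steps and an advantage-weighted policy term on autonomous steps; it is an assumption-laden stand-in, not the patent's formula.

```python
import torch

def human_in_loop_objective(logp, advantages, imitation_nll, intervened,
                            lam1=1.0, lam2=1.0):
    """Hypothetical stand-in for equation (5): lam1 weights an imitation
    term on intervened steps (i_t = 1), lam2 an advantage-weighted policy
    term on autonomous steps (i_t = 0). All shapes are (T,)."""
    i = intervened.float()
    # Imitate the human takeover where it occurred.
    imitation = (i * imitation_nll).sum() / i.sum().clamp(min=1.0)
    # Policy-gradient-style term on the model-controlled steps.
    policy = -((1.0 - i) * advantages * logp).mean()
    return lam1 * imitation + lam2 * policy
```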
According to further embodiments, when the autopilot model comprises an evaluation feedback layer, the second real driving data may further comprise a second intervention identifier (i_1, …, i_t), the second intervention identifier being capable of characterizing whether the second real autopilot strategy information is autopilot strategy information with human intervention, and the second real driving data may comprise real evaluation feedback information for the second real autopilot strategy information. Performing the second training of the autopilot model based on the second real driving data may further include: inputting the third sample implicit representation into the evaluation feedback layer to obtain second sample evaluation feedback information output by the evaluation feedback layer for the second predicted autopilot strategy information; and adjusting parameters of the multi-modal coding layer and the decision control layer based on the second intervention identifier, the second sample evaluation feedback information and the real evaluation feedback information for the second predicted autopilot strategy information, and the second predicted autopilot strategy information and the second real autopilot strategy information.
The real evaluation feedback information for the second real autopilot strategy information in the second real driving data may be evaluation feedback information that is manually fed back (an evaluation of the driving experience of the autonomous vehicle by a passenger or driver); for example, it may indicate whether the current driving behavior originates from a human driver or from the model, whether the current driving is comfortable, whether it violates traffic rules, whether it constitutes dangerous driving, and the like.
In an example, the parameters of the multi-modal encoding layer 210 and the evaluation feedback layer 230 may be adjusted using an objective function as in equation (5) above.
In addition, feedback modeling may be used to learn a function for estimating the feedback information. In other words, the model itself may be made to estimate the expected benefit of the current driving trajectory (i.e., the prediction result output by the evaluation feedback layer 230 described above). For example, r_t may be determined using equation (3) above. The parameters of the multi-modal coding layer 210 and the evaluation feedback layer 230 may then be adjusted using the objective function in equation (5) above.
It will be appreciated that the real driving data collected while driving with the autopilot model may include navigation information of the vehicle, current and historical perception information for the vehicle surroundings, and real autopilot strategy information, as well as evaluation feedback information and intervention identifiers, wherein the evaluation feedback information may be that of a safety officer/driver or that predicted by the evaluation feedback layer.
According to some embodiments, the evaluation feedback information (e.g., the real evaluation feedback information, the first sample evaluation feedback information, or the second sample evaluation feedback information) may include at least one of: driving comfort information, driving safety information, driving efficiency, whether running lights are used in a civilized manner, driving behavior source information, and whether traffic regulations are violated.
According to some embodiments, second real driving data in the process of controlling the target vehicle to perform the automatic driving by using the automatic driving model obtained by the first training may be acquired at preset time intervals, and the automatic driving model may be subjected to the second training again based on the newly acquired second real driving data.
The preset time may be, for example, half a day, one day, half a month, one month, etc., and may be set according to actual requirements, without limitation.
Thus, the autopilot model may be continuously iteratively trained based on driving data in real-vehicle driving and/or in simulation scenarios, thereby continuously optimizing the autopilot model. It will be appreciated that while the second training is performed iteratively on-line, the autopilot model may also be trained off-line based on sample input information employed by the first training.
According to some embodiments, the model training method may further comprise: after a second training of the autopilot model based on the second real driving data, the first training is performed again on the autopilot model comprising at least a multimodal coding layer and a decision control layer.
In an example, offline pre-training may be performed first, followed by online training; then, based on the model obtained by the online training, offline training and online training may be performed again. In this way, the model is continuously optimized through multiple rounds of iterative training.
According to some embodiments, obtaining second real driving data for controlling the target vehicle to perform the automatic driving using the automatic driving model obtained by the first training may include:
acquiring second real driving data in the process of controlling the target vehicle to execute automatic driving by using the automatic driving model obtained by the first training under the real driving scene, and/or
And acquiring second real driving data in the process of controlling the target vehicle to execute automatic driving by using the automatic driving model acquired by the first training under the simulated driving scene.
In an example, the second real driving data obtained by performing automatic driving in the real driving scene may be used as the primary source, with the second real driving data obtained in the simulated driving scene used as a supplement. The input information of simulation samples can be set as required, so that the simulation environment can be used to mine long-tail samples and expand the richness of the training set. In other words, the amount of real-vehicle driving data used in the training process of the automatic driving model may be greater than the amount of simulated-vehicle driving data.
It will be appreciated that training based on driving data in a simulation environment may be included, whether in an offline pre-training phase or an online training phase.
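A trivial sketch of such a data mixture, in which real-vehicle data dominates and simulated long-tail scenarios supplement it, could look as follows; the `real_fraction` knob and the pool structures are assumptions.

```python
import random

def sample_training_batch(real_pool, sim_pool, batch_size, real_fraction=0.8):
    """Illustrative sampler: real-vehicle driving data dominates, simulated
    long-tail scenarios supplement it. `real_fraction` is an assumed knob."""
    n_real = min(int(batch_size * real_fraction), len(real_pool))
    batch = random.sample(real_pool, n_real)
    batch += random.sample(sim_pool, batch_size - n_real)
    random.shuffle(batch)
    return batch
```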
According to some embodiments, the autopilot strategy information may include a target planned trajectory.
According to another aspect of the present disclosure, an autopilot model is provided. The autopilot model may be trained using a model training method according to embodiments of the present disclosure.
As shown in fig. 2, the autopilot model 200 includes a multi-modal encoding layer 210 and a decision control layer 220, where the multi-modal encoding layer 210 and the decision control layer 220 are connected to form an end-to-end neural network model, so that the decision control layer 220 directly obtains autopilot strategy information based on the output of the multi-modal encoding layer 210. The first input information of the multi-modal encoding layer 210 comprises navigation information of the target vehicle and perception information of the surroundings of the target vehicle obtained with the sensor, the perception information comprising current perception information and historical perception information for the surroundings of the target vehicle during driving of the vehicle, the multi-modal encoding layer 210 being configured to obtain an implicit representation corresponding to the first input information. The second input information of the decision control layer 220 comprises an implicit representation, the decision control layer being configured to obtain target autopilot strategy information based on the second input information.
Details of the autopilot model 200 are described above and are not repeated here. Because the multi-modal coding layer 210 and the decision control layer 220 of the model to be trained are connected to form an end-to-end neural network model, the perception information in the sample information directly serves the decision, which avoids the coupling problem between prediction and planning in the trained automatic driving model. In addition, the introduction of the implicit representation can overcome the tendency of algorithms to fail due to the representation defects of structured information. Moreover, since perception directly serves the decision, perception can capture the information that is critical to the decision, reducing the error accumulation caused by perception errors in the trained model. Furthermore, having perception directly serve the decision realizes a perception-heavy, map-light automatic driving technique, which can avoid decision failures caused by untimely updates and the limited coverage of high-precision maps and removes the dependence on them, thereby reducing the cost of keeping such maps up to date.
According to some embodiments, the autopilot model 200 may further include a future prediction layer configured to obtain future prediction information for the target vehicle surroundings based on the implicit representation corresponding to the entered first input information.
According to some embodiments, with further reference to FIG. 2, the future prediction layer may include at least one of a first future prediction layer 240 and a second future prediction layer 250. The first future prediction layer 240 may be configured to output future predicted perception information Out4 (e.g., sensor information at a future moment, comprising camera input information or radar input information at that moment) based on the input first sample implicit representation e_t; the second future prediction layer 250 may be configured to output a future prediction implicit representation Out5 (e.g., an implicit representation of BEV space at a future moment) based on the input first sample implicit representation e_t.
Illustratively, the future prediction layer may include, but is not limited to, a Transformer decoder.
In FIG. 2, the future prediction layer and the decision control layer 220 are shown as two independent network structures; it can be understood that the future prediction layer and the decision control layer may also share the same network structure, i.e., the outputs of that network structure may include both the future prediction information and the autopilot strategy information. Illustratively, the future prediction layer and the decision control layer may share the same Transformer decoder.
According to some embodiments, the future prediction information may include at least one of: future predicted perception information for the vehicle surroundings (e.g., sensor information at a future moment, where the sensor information at the future moment comprises camera input information or radar input information at that moment); and a future prediction implicit representation corresponding to the future predicted perception information (e.g., an implicit representation in BEV space of the sensor information corresponding to a future point in time).
According to another aspect of the present disclosure, an autopilot method implemented using an autopilot model is provided.
Fig. 7 shows a flow chart of an autopilot method 700 according to an embodiment of the present disclosure. As shown in fig. 7, the automatic driving method 700 includes:
step S710, controlling the target vehicle to perform automatic driving by using the automatic driving model 200; and
Step S720, obtaining real driving data in the automatic driving process, wherein the real driving data comprises navigation information of a target vehicle, real perception information aiming at the surrounding environment of the target vehicle and real automatic driving strategy information, and the real driving data is used for carrying out iterative training on an automatic driving model.
The navigation information of the target vehicle in the real driving data may include vectorized navigation information and vectorized map information, which may be obtained by vectorizing one or more of lane-level, or road-level navigation information and coarse positioning information. The real perception information may include perception information of one or more cameras, perception information of one or more lidars, and perception information of one or more millimeter wave radars on a vehicle in a real road scene. It is to be understood that the perception information of the surroundings of the target vehicle is not limited to the above-described one form, and may include, for example, only the perception information of a plurality of cameras, but not the perception information of one or more lidars and the perception information of one or more millimeter wave radars. The perceived information obtained by the camera may be perceived information in the form of a picture or video, and the perceived information obtained by the lidar may be perceived information in the form of a radar point cloud (e.g., a three-dimensional point cloud). The actual autopilot strategy information may include planned trajectories of the autopilot vehicle or control signals for the vehicle (e.g., signals to control throttle, brake, steering amplitude, etc.) collected in an actual road scene.
According to some embodiments of the application, the target vehicle may be controlled to perform autopilot using autopilot strategy information (e.g., planned trajectories) predicted by an autopilot model.
Fig. 8 shows a flow chart of an autopilot method 800 according to another embodiment of the present disclosure. According to some embodiments, as shown in fig. 8, the autopilot method 800 includes steps S810, S820 similar to steps S710, S720, respectively, in the autopilot method 700.
Step S810, controlling the target vehicle to perform automatic driving using the automatic driving model 200 obtained by iterative training; and
Step S820, obtaining real driving data in the automatic driving process, wherein the real driving data comprises navigation information of a target vehicle, real perception information aiming at the surrounding environment of the target vehicle and real automatic driving strategy information, and the real driving data is used for carrying out iterative training on an automatic driving model; and
Step S830, controlling the target vehicle to perform automatic driving again using the automatic driving model obtained through iterative training. In this way, the automatic driving task and the model training task can proceed synchronously while the real vehicle is running. In an example, the planned trajectory predicted by the autopilot model 200, or a control signal for the vehicle (e.g., a signal controlling throttle, brake, steering amplitude, etc.), may be used to control the target vehicle to perform automatic driving again. For example, a trajectory plan may be interpreted by a control strategy module in the autonomous vehicle to obtain control signals for the vehicle; alternatively, a neural network may directly output control signals for the vehicle based on the implicit representation.
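As an illustration of the control strategy module mentioned above, the sketch below interprets a planned trajectory into throttle/brake/steering signals using a pure-pursuit-style lateral rule and a proportional longitudinal rule; these particular control laws are assumptions, not specified by the patent.

```python
import math

def trajectory_to_controls(traj, state, lookahead=5.0, wheelbase=2.8, kp=0.5):
    """Interprets a planned trajectory (list of (x, y, v) waypoints in the
    vehicle frame) into steering/throttle/brake commands. Pure-pursuit
    steering and proportional speed control are assumed control laws."""
    # Pick the first waypoint at least `lookahead` metres away.
    target = next((p for p in traj if math.hypot(p[0], p[1]) >= lookahead),
                  traj[-1])
    x, y, v_ref = target
    # Pure pursuit: steering angle toward the target point.
    ld = math.hypot(x, y)
    steering = math.atan2(2.0 * wheelbase * y, ld * ld)
    # Proportional longitudinal control toward the reference speed.
    accel = kp * (v_ref - state["speed"])
    throttle, brake = (accel, 0.0) if accel >= 0.0 else (0.0, -accel)
    return {"steering": steering, "throttle": throttle, "brake": brake}
```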
According to some embodiments, in the automatic driving method 700 or the automatic driving method 800, real driving data in the process of performing automatic driving by using the automatic driving model control target vehicle may be acquired at preset time intervals, and the automatic driving model may be iteratively trained based on the newly acquired real driving data.
According to some embodiments, in the autopilot method 700 or the autopilot method 800, the real driving data may include an intervention identifier that can characterize whether the real autopilot strategy information is autopilot strategy information with human intervention.
During real-vehicle operation, a safety officer can intervene at any critical moment and take over control of the autonomous vehicle, avoiding possible collisions during real-vehicle driving. After the crisis passes, control is returned to the autonomous vehicle. The intervention identifier is used to characterize whether the real autopilot strategy information is autopilot strategy information produced under human intervention. In other words, by introducing the intervention identifier, the model can learn the driving strategy demonstrated by the safety officer's interventions, the driving behavior learned by the model can be well aligned with the preferences of human passengers, and user experience and safety are improved. Human-in-the-loop reinforcement learning can gradually learn to reduce the adverse events that require intervention. Through this mechanism, reinforcement learning efficiency can be improved and the influence of poor-quality experience on the learning process can be reduced, further improving the robustness of the trained model.
The training method of the automatic driving model provided by the embodiment of the application has the following advantages:
Annotated data for driverless driving are scarce; the model is rapidly pre-trained using existing L2+ and L4 driving data and the like, so that it can reach a certain standard when first deployed on the vehicle. After deployment, further update iterations are performed through continuous reinforcement learning. Compared with traditional reinforcement learning, this scheme has the following advantages:
a. Risk-free training (Risk-Free Learning). Unlike traditional reinforcement learning, which must experience some costly behaviors in order to learn, the HRL technique can, under the protection of a safety officer, learn to avoid risks such as collisions and violations entirely without incurring them, so the whole training process can proceed not only in a simulation environment but also synchronously in the real environment.
b. The driving behavior can be well aligned with the preferences of human passengers, comprehensively weighing efficiency, comfort, safety, and the like, to provide an optimal scheme.
c. Extremely low training cost. Once the entire pipeline is tuned, the training and migration costs are very small. When the driverless vehicle is migrated to different cities, only road test data need to be collected in the different areas.
d. Massive data + large model advantage. Massive data are used for pre-training, avoiding a long cold-start learning process, so that a larger model can be fully utilized and a better effect obtained.
According to another aspect of the present disclosure, a training apparatus for an autopilot model is provided. The automatic driving model comprises a multi-mode coding layer and a decision control layer, wherein the multi-mode coding layer and the decision control layer are connected to form an end-to-end neural network model, so that the decision control layer directly obtains automatic driving strategy information based on the output of the multi-mode coding layer.
Fig. 9 shows a block diagram of a training apparatus 900 of an autopilot model in accordance with an embodiment of the present disclosure. The training apparatus 900 of the autopilot model is configured to perform a first training of the multimodal coding layer and the decision control layer, and includes:
a first real driving data acquisition unit 910 configured to acquire first real driving data during running of the vehicle, the first real driving data including first navigation information of the vehicle and first real perception information for a surrounding environment of the vehicle, the first real perception information including current perception information and history perception information for the surrounding environment of the vehicle;
A real automatic driving strategy information acquisition unit 920 configured to acquire first real automatic driving strategy information corresponding to the first real driving data based on first real perception information;
A multi-modal coding layer training unit 930 configured to input first sample input information including the first real driving data into the multi-modal coding layer to obtain a first sample implicit representation output by the multi-modal coding layer;
A decision control layer training unit 940 configured to input first intermediate sample input information including an implicit representation of the first sample into the decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer; and
A parameter adjustment unit 950 configured to adjust parameters of the multi-modal encoding layer and the decision control layer based on the first predicted automatic driving strategy information and the first real automatic driving strategy information.
According to another aspect of the present disclosure, an autopilot apparatus based on an autopilot model is provided.
FIG. 10 shows a block diagram of an autopilot apparatus 1000 in accordance with an embodiment of the present disclosure. As shown in FIG. 10, the automatic driving apparatus 1000 includes:
A control unit 1010 configured to control the target vehicle to perform autopilot using the autopilot model 200 described above; and
A second real driving data acquisition unit 1020 configured to acquire real driving data during automatic driving, the real driving data including navigation information of the target vehicle, real perception information for a surrounding environment of the target vehicle, and real automatic driving strategy information, the real driving data being used for iterative training of the automatic driving model.
It should be appreciated that the various modules or units of the apparatus 900 shown in fig. 9 may correspond to the various steps in the method 300 described with reference to fig. 3. Thus, the operations, features and advantages described above with respect to method 300 apply equally to apparatus 900 and the modules and units comprised thereof; and the various modules or units of the apparatus 1000 shown in fig. 10 may correspond to the various steps in the method 700 described with reference to fig. 7. Thus, the operations, features and advantages described above with respect to method 700 apply equally to apparatus 1000 and the modules and units comprised thereof. For brevity, certain operations, features and advantages are not described in detail herein.
Although specific functions are discussed above with reference to specific modules, it should be noted that the functions of the various units discussed herein may be divided into multiple units and/or at least some of the functions of the multiple units may be combined into a single unit.
It should also be appreciated that various techniques may be described herein in the general context of software/hardware elements or program modules. The various units described above with respect to FIGS. 9 and 10 may be implemented in hardware or in hardware combined with software and/or firmware. For example, the units may be implemented as computer program code/instructions configured to be executed by one or more processors and stored in a computer-readable storage medium. Alternatively, these units may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the units 910-950 and 1010-1020 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip comprising one or more components of a processor (e.g., a Central Processing Unit (CPU), microcontroller, microprocessor, Digital Signal Processor (DSP), etc.), memory, one or more communication interfaces, and/or other circuitry, and may optionally execute received program code and/or include embedded firmware to perform functions.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform an autopilot method or a training method of an autopilot model in accordance with embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method of automated driving or a method of training an automated driving model according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements a method of automatic driving or a method of training an automatic driving model according to embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided an autonomous vehicle including one of: the training apparatus 900 of the autonomous driving model according to an embodiment of the present disclosure, the autonomous driving apparatus 1000, and the electronic device described above.
Referring to fig. 11, a block diagram of an electronic device 1100 that may be a server or a client of the present disclosure, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the electronic device 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the electronic device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106, an output unit 1107, a storage unit 1108, and a communication unit 1109. The input unit 1106 may be any type of device capable of inputting information to the electronic device 1100; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 1107 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 1108 may include, but is not limited to, magnetic disks and optical disks. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the various methods and processes described above, such as the methods (or processes) 300-800. For example, in some embodiments, the methods (or processes) 300-800 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, some or all of the computer programs may be loaded and/or installed onto electronic device 1100 via ROM 1102 and/or communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the methods (or processes) 300 to 800 described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the methods (or processes) 300-800 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, the one or more computer programs being executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is limited not by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalents. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (29)

1.一种自动驾驶模型的训练方法,所述自动驾驶模型包括多模态编码层和决策控制层,所述多模态编码层和决策控制层连接组成端到端的神经网络模型,以使得所述决策控制层直接基于所述多模态编码层的输出预测自动驾驶策略信息,所述方法包括对所述多模态编码层和决策控制层进行第一训练,1. A training method for an autonomous driving model, the autonomous driving model comprising a multimodal coding layer and a decision control layer, the multimodal coding layer and the decision control layer being connected to form an end-to-end neural network model, so that the decision control layer directly predicts autonomous driving strategy information based on an output of the multimodal coding layer, the method comprising first training the multimodal coding layer and the decision control layer, 其中,所述对所述多模态编码层和决策控制层进行第一训练包括:The first training of the multimodal coding layer and the decision control layer includes: 获取车辆行驶过程中的第一真实驾驶数据,所述第一真实驾驶数据包括车辆的第一导航信息和针对车辆周围环境的第一真实感知信息,所述第一真实感知信息包括针对车辆周围环境的当前感知信息和历史感知信息;Acquire first real driving data during the driving of the vehicle, the first real driving data including first navigation information of the vehicle and first real perception information of the vehicle's surrounding environment, the first real perception information including current perception information and historical perception information of the vehicle's surrounding environment; 基于第一真实感知信息获取所述第一真实驾驶数据相对应的第一真实自动驾驶策略信息;Acquire first real autonomous driving strategy information corresponding to the first real driving data based on the first real perception information; 将包括所述第一真实驾驶数据的第一样本输入信息输入所述多模态编码层,以获取所述多模态编码层所输出的融合了空间与时序信息的第一样本隐式表示;Inputting first sample input information including the first real driving data into the multimodal coding layer to obtain a first sample implicit representation output by the multimodal coding layer that combines spatial and temporal information; 将包括所述第一样本隐式表示的第一中间样本输入信息输入所述决策控制层,以获取所述决策控制层所输出的第一预测自动驾驶策略信息;以及Inputting first intermediate sample input information including the implicit representation of the first sample into the decision control layer to obtain first predicted autonomous driving strategy information output by the decision control layer; and 基于所述第一预测自动驾驶策略信息和第一真实自动驾驶策略信息,调整所述多模态编码层和决策控制层的参数。Based on the first predicted autonomous driving strategy information and the first real autonomous driving strategy information, the parameters of the multimodal coding layer and the decision control layer are adjusted. 2.根据权利要求1所述的方法,其中,所述自动驾驶模型还包括未来预测层,所述第一真实感知信息包括针对车辆周围环境的未来感知信息,2. The method according to claim 1, wherein the autonomous driving model further comprises a future prediction layer, and the first real perception information comprises future perception information of the vehicle's surrounding environment, 其中,对所述多模态编码层和决策控制层进行第一训练还包括:The first training of the multimodal coding layer and the decision control layer further includes: 基于所述未来感知信息获取针对车辆周围环境的未来真实信息;Acquire future real information about the vehicle's surrounding environment based on the future perception information; 将所述第一样本隐式表示输入所述未来预测层,以获取所述未来预测层所输出的未来预测信息;以及Inputting the first sample implicit representation into the future prediction layer to obtain future prediction information output by the future prediction layer; and 基于所述未来真实信息和未来预测信息,调整所述多模态编码层和所述未来预测层的参数。Based on the future real information and the future predicted information, parameters of the multimodal coding layer and the future prediction layer are adjusted. 3.根据权利要求2所述的方法,其中,未来预测信息包括以下各项中的至少一者:3. 
The method according to claim 2, wherein the future prediction information comprises at least one of the following: 针对所述车辆周围环境的未来预测感知信息;以及Future predicted perception information for the vehicle's surroundings; and 与所述未来预测感知信息相对应的未来预测隐式表示。A future prediction implicit representation corresponding to the future prediction perceptual information. 4.根据权利要求3所述的方法,其中,未来预测层包括第一未来预测层和第二未来预测层中的至少一者,4. The method of claim 3, wherein the future prediction layer comprises at least one of a first future prediction layer and a second future prediction layer, 其中,所述第一未来预测层被配置用于基于输入的第一样本隐式表示输出所述未来预测感知信息,所述第二未来预测层被配置用于基于输入的第一样本隐式表示输出所述未来预测隐式表示。The first future prediction layer is configured to output the future prediction perceptual information based on the input first sample implicit representation, and the second future prediction layer is configured to output the future prediction implicit representation based on the input first sample implicit representation. 5.根据权利要求2-4中任一项所述的方法,其中,所述未来预测层和所述决策控制层共用同一网络结构。5. The method according to any one of claims 2-4, wherein the future prediction layer and the decision control layer share the same network structure. 6.根据权利要求1-4中任一项所述的方法,其中,基于第一真实感知信息获取所述第一真实驾驶数据的第一真实自动驾驶策略信息包括:6. The method according to any one of claims 1 to 4, wherein acquiring first real autonomous driving strategy information of the first real driving data based on the first real perception information comprises: 将所述第一真实感知信息输入驾驶策略预测模型,以获取所述驾驶策略预测模型所输出的第一真实自动驾驶策略信息。The first real perception information is input into a driving strategy prediction model to obtain first real automatic driving strategy information output by the driving strategy prediction model. 7.根据权利要求1-4中任一项所述的方法,其中,所述自动驾驶模型还包括评价反馈层,对所述多模态编码层和决策控制层进行第一训练还包括:7. The method according to any one of claims 1 to 4, wherein the autonomous driving model further comprises an evaluation feedback layer, and the first training of the multimodal encoding layer and the decision control layer further comprises: 将所述第一样本隐式表示输入所述评价反馈层,以获取所述评价反馈层所输出的针对所述第一预测自动驾驶策略信息的第一样本评价反馈信息,inputting the first sample implicit representation into the evaluation feedback layer to obtain first sample evaluation feedback information output by the evaluation feedback layer for the first predicted autonomous driving strategy information, 其中,基于针对所述第一预测自动驾驶策略信息的第一样本评价反馈信息、所述第一预测自动驾驶策略信息和第一真实自动驾驶策略信息,调整所述多模态编码层和决策控制层的参数。Among them, based on the first sample evaluation feedback information for the first predicted autonomous driving strategy information, the first predicted autonomous driving strategy information and the first real autonomous driving strategy information, the parameters of the multimodal coding layer and the decision control layer are adjusted. 8.根据权利要求7所述的方法,其中,所述第一真实驾驶数据还包括第一干预标识,所述第一干预标识能够表征所述第一真实自动驾驶策略信息是否为存在人为干预的驾驶策略信息,8. The method according to claim 7, wherein the first real driving data further includes a first intervention flag, and the first intervention flag can indicate whether the first real autonomous driving strategy information is driving strategy information with human intervention. 
其中,基于所述第一干预标识、针对所述第一预测自动驾驶策略信息的第一样本评价反馈信息、所述第一预测自动驾驶策略信息和第一真实自动驾驶策略信息,调整所述多模态编码层和决策控制层的参数。Among them, based on the first intervention identifier, the first sample evaluation feedback information for the first predicted autonomous driving strategy information, the first predicted autonomous driving strategy information and the first real autonomous driving strategy information, the parameters of the multimodal coding layer and the decision control layer are adjusted. 9.根据权利要求7所述的方法,其中,所述评价反馈层的训练过程包括:9. The method according to claim 7, wherein the training process of the evaluation feedback layer comprises: 获取第二样本输入信息以及针对所述第二样本输入信息的真实评价反馈信息;Acquire second sample input information and true evaluation feedback information for the second sample input information; 将第二样本输入信息输入所述多模态编码层,以获取所述多模态编码层所输出的第二样本隐式表示;Inputting second sample input information into the multimodal coding layer to obtain a second sample implicit representation output by the multimodal coding layer; 将所述第二样本隐式表示输入所述评价反馈层,以获取所述评价反馈层所输出的针对所述第二样本输入信息的预测评价反馈信息;以及Inputting the second sample implicit representation into the evaluation feedback layer to obtain predicted evaluation feedback information output by the evaluation feedback layer for the second sample input information; and 基于所述真实评价反馈信息和预测评价反馈信息,调整所述多模态编码层和所述评价反馈层的参数。Based on the real evaluation feedback information and the predicted evaluation feedback information, parameters of the multimodal coding layer and the evaluation feedback layer are adjusted. 10.根据权利要求1-4中任一项所述的方法,还包括:10. The method according to any one of claims 1 to 4, further comprising: 获取利用所述第一训练所获得的自动驾驶模型控制目标车辆执行自动驾驶过程中的第二真实驾驶数据,所述第二真实驾驶数据包括所述目标车辆的第二导航信息、针对目标车辆周围环境的第二真实感知信息、以及第二真实自动驾驶策略信息;以及Acquire second real driving data in a process of controlling a target vehicle to perform automatic driving using the automatic driving model obtained by the first training, wherein the second real driving data includes second navigation information of the target vehicle, second real perception information of the surrounding environment of the target vehicle, and second real automatic driving strategy information; and 基于所述第二真实驾驶数据对所述自动驾驶模型进行第二训练,包括:Performing a second training on the autonomous driving model based on the second real driving data includes: 将包括所述第二真实驾驶数据的第三样本输入信息输入所述多模态编码层,以获取所述多模态编码层所输出的第三样本隐式表示;Inputting third sample input information including the second real driving data into the multimodal coding layer to obtain a third sample implicit representation output by the multimodal coding layer; 将包括所述第三样本隐式表示的第二中间样本输入信息输入所述决策控制层,以获取所述决策控制层所输出的第二预测自动驾驶策略信息;以及Inputting the second intermediate sample input information including the implicit representation of the third sample into the decision control layer to obtain the second predicted automatic driving strategy information output by the decision control layer; and 基于所述第二预测自动驾驶策略信息和第二真实自动驾驶策略信息,调整所述多模态编码层和决策控制层的参数。Based on the second predicted autonomous driving strategy information and the second real autonomous driving strategy information, parameters of the multimodal coding layer and the decision control layer are adjusted. 11.根据权利要求10所述的方法,其中,当所述自动驾驶模型包括评价反馈层时,基于所述第二真实驾驶数据对所述自动驾驶模型进行第二训练还包括:11. 
11. The method according to claim 10, wherein, when the autonomous driving model includes an evaluation feedback layer, performing the second training on the autonomous driving model based on the second real driving data further comprises:
inputting the third sample implicit representation into the evaluation feedback layer to obtain second sample evaluation feedback information, output by the evaluation feedback layer, for the second predicted autonomous driving strategy information,
wherein the parameters of the multimodal encoding layer and the decision control layer are adjusted based on the second sample evaluation feedback information for the second predicted autonomous driving strategy information, the second predicted autonomous driving strategy information, and the second real autonomous driving strategy information.

12. The method according to claim 10, wherein, when the autonomous driving model includes an evaluation feedback layer, the second real driving data further includes a second intervention flag, the second intervention flag being capable of indicating whether the second real autonomous driving strategy information is autonomous driving strategy information involving human intervention,
wherein performing the second training on the autonomous driving model based on the second real driving data further comprises:
inputting the third sample implicit representation into the evaluation feedback layer to obtain second sample evaluation feedback information, output by the evaluation feedback layer, for the second predicted autonomous driving strategy information,
wherein the parameters of the multimodal encoding layer and the decision control layer are adjusted based on the second intervention flag, the second sample evaluation feedback information for the second predicted autonomous driving strategy information, the second predicted autonomous driving strategy information, and the second real autonomous driving strategy information.

13. The method according to claim 10, wherein, when the autonomous driving model includes an evaluation feedback layer, the second real driving data further includes a second intervention flag, the second intervention flag being capable of indicating whether the second real autonomous driving strategy information is autonomous driving strategy information involving human intervention, and the second real driving data includes real evaluation feedback information for the second real autonomous driving strategy information,
wherein performing the second training on the autonomous driving model based on the second real driving data further comprises:
inputting the third sample implicit representation into the evaluation feedback layer to obtain second sample evaluation feedback information, output by the evaluation feedback layer, for the second predicted autonomous driving strategy information,
wherein the parameters of the multimodal encoding layer and the decision control layer are adjusted based on the second intervention flag, the second sample evaluation feedback information and the real evaluation feedback information for the second predicted autonomous driving strategy information, and the second predicted autonomous driving strategy information and the second real autonomous driving strategy information.
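Claims 12 and 13 above leave open how the second intervention flag enters the parameter adjustment. One plausible reading, sketched below, is to weight the per-sample imitation loss by the evaluation feedback and boost samples where a human intervened; the weighting scheme itself is an assumption, not something the claims specify.

```python
# Hypothetical intervention-flag-weighted loss (one reading of claims 12-13).
import torch

def weighted_strategy_loss(pred, real, feedback_score, intervened, boost: float = 2.0):
    """Per-sample imitation loss, scaled by the evaluation feedback score and
    boosted where the human intervened (intervened is a 0/1 tensor)."""
    per_sample = (pred - real).pow(2).mean(dim=-1)
    weight = torch.sigmoid(feedback_score.squeeze(-1)) * (1.0 + boost * intervened)
    return (weight * per_sample).mean()

pred = torch.randn(8, 16)                 # second predicted strategy info
real = torch.randn(8, 16)                 # second real strategy info
score = torch.randn(8, 1)                 # second sample evaluation feedback info
flag = torch.randint(0, 2, (8,)).float()  # second intervention flag
loss = weighted_strategy_loss(pred, real, score, flag)
```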
14. The method according to claim 13, wherein the evaluation feedback information comprises at least one of the following:
driving comfort information, driving safety information, driving efficiency, information on whether vehicle lights are used courteously, driving behavior source information, and information on whether traffic rules are violated.

15. The method according to claim 10, wherein the second real driving data generated while the autonomous driving model obtained through the first training controls the target vehicle to perform autonomous driving is acquired at preset time intervals, and the second training is performed again on the autonomous driving model based on the newly acquired second real driving data.

16. The method according to claim 10, further comprising:
performing, after the second training of the autonomous driving model based on the second real driving data, the first training again on the autonomous driving model including at least the multimodal encoding layer and the decision control layer.

17. The method according to claim 10, wherein acquiring the second real driving data generated while the autonomous driving model obtained through the first training controls the target vehicle to perform autonomous driving comprises:
acquiring second real driving data generated while the autonomous driving model obtained through the first training controls the target vehicle to perform autonomous driving in a real driving scenario, and/or
acquiring second real driving data generated while the autonomous driving model obtained through the first training controls the target vehicle to perform autonomous driving in a simulated driving scenario.

18. The method according to any one of claims 1-4, wherein the autonomous driving strategy information includes a target planned trajectory.
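Claims 15 and 16 above together imply a loop: collect on-policy driving data at preset intervals, run the second training, then run the first training again. A minimal sketch of that schedule; collect_driving_data, first_training, and second_training are hypothetical callables standing in for the procedures of the earlier claims.

```python
# Hypothetical training schedule implied by claims 15-16.
import time

def training_schedule(model, interval_s: float, rounds: int,
                      collect_driving_data, first_training, second_training):
    for _ in range(rounds):
        time.sleep(interval_s)                 # preset time interval (claim 15)
        data = collect_driving_data(model)     # second real driving data
        second_training(model, data)           # second training (claim 10)
        first_training(model)                  # first training again (claim 16)
    return model
```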
19. An autonomous driving model trained by the training method according to any one of claims 1-18, comprising a multimodal encoding layer and a decision control layer, the multimodal encoding layer and the decision control layer being connected to form an end-to-end neural network model such that the decision control layer predicts autonomous driving strategy information directly based on the output of the multimodal encoding layer,
wherein the first input information of the multimodal encoding layer includes navigation information of a target vehicle and perception information of the target vehicle's surroundings obtained by sensors, the perception information including current perception information and historical perception information of the target vehicle's surroundings during driving, and the multimodal encoding layer is configured to obtain an implicit representation corresponding to the first input information, and
the second input information of the decision control layer includes the implicit representation, and the decision control layer is configured to obtain target autonomous driving strategy information based on the second input information.

20. An autonomous driving method implemented using an autonomous driving model, comprising:
controlling a target vehicle to perform autonomous driving using the autonomous driving model according to claim 19; and
acquiring real driving data during the autonomous driving process, the real driving data including navigation information of the target vehicle, real perception information of the target vehicle's surroundings, and real autonomous driving strategy information, the real driving data being used to iteratively train the autonomous driving model.

21. The method according to claim 20, further comprising:
controlling the target vehicle to perform autonomous driving again using the iteratively trained autonomous driving model.

22. The method according to claim 20 or 21, wherein real driving data generated while the autonomous driving model controls the target vehicle to perform autonomous driving is acquired at preset time intervals, and the autonomous driving model is iteratively trained based on the newly acquired real driving data.

23. The method according to claim 20 or 21, wherein the real driving data includes an intervention flag, the intervention flag being capable of indicating whether the real autonomous driving strategy information is autonomous driving strategy information involving human intervention.
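The end-to-end structure of claim 19 above, read narrowly, is an encoder over current plus historical perception and navigation inputs, followed by a decision head that emits the target strategy directly. A minimal sketch assuming a GRU for the temporal fusion; the claim does not specify the layer structure at this granularity, and every name and dimension below is hypothetical.

```python
# Hypothetical end-to-end model in the shape of claim 19.
import torch
import torch.nn as nn

class EndToEndDrivingModel(nn.Module):
    def __init__(self, obs_dim=64, nav_dim=8, hidden=128, strategy_dim=16):
        super().__init__()
        self.temporal = nn.GRU(obs_dim, hidden, batch_first=True)  # history + current perception
        self.fuse = nn.Linear(hidden + nav_dim, hidden)            # mix in navigation info
        self.decision = nn.Linear(hidden, strategy_dim)            # decision control layer

    def forward(self, perception_seq, navigation):
        _, h = self.temporal(perception_seq)   # implicit representation with temporal info
        fused = torch.relu(self.fuse(torch.cat([h[-1], navigation], dim=-1)))
        return self.decision(fused)            # target autonomous driving strategy info

model = EndToEndDrivingModel()
strategy = model(torch.randn(2, 10, 64), torch.randn(2, 8))  # 10-step perception history
```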
24. A training apparatus for an autonomous driving model, the autonomous driving model comprising a multimodal encoding layer and a decision control layer, the multimodal encoding layer and the decision control layer being connected to form an end-to-end neural network model such that the decision control layer predicts autonomous driving strategy information directly based on the output of the multimodal encoding layer, the apparatus being configured to perform a first training on the multimodal encoding layer and the decision control layer and comprising:
a first real driving data acquisition unit configured to acquire first real driving data generated during vehicle driving, the first real driving data including first navigation information of the vehicle and first real perception information of the vehicle's surroundings, the first real perception information including current perception information and historical perception information of the vehicle's surroundings;
a real autonomous driving strategy information acquisition unit configured to acquire, based on the first real perception information, first real autonomous driving strategy information corresponding to the first real driving data;
a multimodal encoding layer training unit configured to input first sample input information including the first real driving data into the multimodal encoding layer to obtain a first sample implicit representation, output by the multimodal encoding layer, that fuses spatial and temporal information;
a decision control layer training unit configured to input first intermediate sample input information including the first sample implicit representation into the decision control layer to obtain first predicted autonomous driving strategy information output by the decision control layer; and
a parameter adjustment unit configured to adjust the parameters of the multimodal encoding layer and the decision control layer based on the first predicted autonomous driving strategy information and the first real autonomous driving strategy information.

25. An autonomous driving apparatus based on an autonomous driving model, comprising:
a control unit configured to control a target vehicle to perform autonomous driving using the autonomous driving model according to claim 19; and
a second real driving data acquisition unit configured to acquire real driving data during the autonomous driving process, the real driving data including navigation information of the target vehicle, real perception information of the target vehicle's surroundings, and real autonomous driving strategy information, the real driving data being used to iteratively train the autonomous driving model.
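The apparatus of claim 24 above decomposes the first training into five units. A structural sketch only, with each claimed unit modeled as a thin callable; the unit bodies are placeholders, and only the decomposition mirrors the claim.

```python
# Hypothetical decomposition of the claim-24 training apparatus.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class TrainingApparatus:
    acquire_first_real_driving_data: Callable[[], Any]   # first real driving data acquisition unit
    acquire_real_strategy_info: Callable[[Any], Any]     # real strategy info acquisition unit
    encode_multimodal: Callable[[Any], Any]              # multimodal encoding layer training unit
    predict_strategy: Callable[[Any], Any]               # decision control layer training unit
    adjust_parameters: Callable[[Any, Any], None]        # parameter adjustment unit

    def first_training_step(self) -> None:
        data = self.acquire_first_real_driving_data()
        real = self.acquire_real_strategy_info(data)
        implicit = self.encode_multimodal(data)    # first sample implicit representation
        pred = self.predict_strategy(implicit)     # first predicted strategy info
        self.adjust_parameters(pred, real)
```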
26. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor,
wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-18 or 20-23.

27. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-18 or 20-23.

28. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-18 or 20-23.

29. An autonomous driving vehicle, comprising one of:
the training apparatus for an autonomous driving model according to claim 24, the autonomous driving apparatus according to claim 25, and the electronic device according to claim 26.
CN202310274740.3A 2023-03-17 2023-03-17 Autonomous driving models, training methods, devices, and vehicles Active CN116881707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310274740.3A CN116881707B (en) 2023-03-17 2023-03-17 Autonomous driving models, training methods, devices, and vehicles

Publications (2)

Publication Number Publication Date
CN116881707A CN116881707A (en) 2023-10-13
CN116881707B (en) 2024-11-22

Family

ID=88255603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310274740.3A Active CN116881707B (en) 2023-03-17 2023-03-17 Autonomous driving models, training methods, devices, and vehicles

Country Status (1)

Country Link
CN (1) CN116881707B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI842641B (en) * 2023-10-19 2024-05-11 財團法人車輛研究測試中心 Sensor fusion and object tracking system and method thereof
CN117539260B (en) * 2023-12-07 2025-04-22 北京百度网讯科技有限公司 Automatic driving model, method and vehicle based on time sequence recursion autoregressive reasoning
CN118393876A (en) * 2024-04-18 2024-07-26 北京百度网讯科技有限公司 Training method of automatic driving model and control information acquisition method
CN118397862B (en) * 2024-04-23 2025-07-22 北京百度网讯科技有限公司 Control signal acquisition method and automatic driving model training method
CN118657044A (en) * 2024-05-28 2024-09-17 北京百度网讯科技有限公司 Method, device and electronic device for training autonomous driving model
CN118323198B (en) * 2024-06-13 2024-08-27 新石器慧通(北京)科技有限公司 Training and using method and device of decision model in automatic driving vehicle and vehicle
CN118393973B (en) * 2024-06-26 2024-09-17 山东海量信息技术研究院 Automatic driving control method, device, system, equipment and storage medium
CN118597196B (en) * 2024-06-27 2025-02-18 武汉大学 High-precision map human-in-the-loop method and system for autonomous driving decision control
CN118651213B (en) * 2024-07-23 2024-12-03 江苏海丰交通设备科技有限公司 Intelligent cooperative control system and method for semitrailer based on multi-mode sensing
CN119540929B (en) * 2025-01-23 2025-06-03 北京全路通信信号研究设计院集团有限公司 Intelligent analysis method and system for shunting safety
CN119975417B (en) * 2025-04-17 2025-07-18 吉林大学 End-to-end automatic driving decision-making method integrating priori knowledge

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021226921A1 (en) * 2020-05-14 2021-11-18 Harman International Industries, Incorporated Method and system of data processing for autonomous driving
CN115578876A (en) * 2022-10-14 2023-01-06 浪潮(北京)电子信息产业有限公司 Automatic driving method, system, equipment and storage medium of vehicle

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108773373B (en) * 2016-09-14 2020-04-24 北京百度网讯科技有限公司 Method and device for operating an autonomous vehicle
CN113401144B (en) * 2021-07-27 2022-10-11 阿波罗智能技术(北京)有限公司 Control method, apparatus, device and medium for autonomous vehicle
CN114194211B (en) * 2021-11-30 2023-04-25 浪潮(北京)电子信息产业有限公司 An automatic driving method, device, electronic equipment, and storage medium
CN114358128B (en) * 2021-12-06 2024-07-12 深圳先进技术研究院 Method for training end-to-end automatic driving strategy

Similar Documents

Publication Publication Date Title
CN116881707B (en) Autonomous driving models, training methods, devices, and vehicles
CN116880462B (en) Automatic driving model, training method, automatic driving method and vehicle
JP7222868B2 (en) Real-time prediction of object behavior
CN111252061B (en) Real-time decision-making for autonomous vehicles
US11410315B2 (en) High quality instance segmentation
CN116991157B (en) Automatic driving model with human expert driving capability, training method and vehicle
CN115366920B (en) Decision-making method, device, equipment and medium for automatic driving vehicle
CN116859724B (en) Automatic driving model for simultaneous decision and prediction of time sequence autoregressive and training method thereof
CN118163808B (en) World knowledge enhanced autopilot model, training method and autopilot method
CN116776151A (en) Autonomous driving models and training methods that can autonomously interact with people outside the vehicle
CN114758502B (en) Dual-vehicle combined track prediction method and device, electronic equipment and automatic driving vehicle
CN117539260B (en) Automatic driving model, method and vehicle based on time sequence recursion autoregressive reasoning
CN117539253B (en) Automatic driving method and device capable of achieving autonomous escape following instruction and vehicle
CN116882122B (en) Method and device for constructing a simulation environment for autonomous driving
CN117010265B (en) Autonomous driving model capable of natural language interaction and its training method
CN117519206A (en) Automatic driving model, method and device based on generated diffusion model and vehicle
CN117035032B (en) Method for model training by fusing text data and automatic driving data and vehicle
CN116861230A (en) Automatic driving model, training method and device for outputting interpretation information and vehicle
CN114212108A (en) Automatic driving method, device, vehicle, storage medium and product
CN116560377B (en) Automatic driving model for predicting position track and training method thereof
CN117707172A (en) Decision-making method and device for automatic driving vehicle, equipment and medium
CN117034732B (en) Automatic driving model training method based on true and simulated countermeasure learning
CN116872962A (en) Automatic driving model containing manual intervention prediction, training method, training equipment and vehicle
CN119354195A (en) Trajectory prediction method, trajectory prediction model training method, medium and device
CN118551806A (en) Automatic driving model, automatic driving method and device based on state node prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant