Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another element. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
In the technical solution of the present disclosure, the acquisition, storage, application, and other processing of the user personal information involved all comply with the relevant laws and regulations and do not violate public order and good morals.
In the related art, optimization-based and rule-based algorithms in autopilot technology typically rely on high-precision maps and on algorithm tuning for different scenarios. A high-precision map, also known as a high-definition (HD) map, is a map used by autopilot vehicles. The high-precision map contains accurate vehicle position information and rich road element data, and can help the vehicle anticipate complex road surface information such as gradient, curvature, and heading, so that potential risks are better avoided. However, the application of such algorithms is limited to very localized areas, may fail during autopilot due to map errors, and has difficulty coping with the large number of long-tail situations. Furthermore, the algorithms in the related art rely on a large amount of manual labeling, which on the one hand consumes considerable manual effort, and on the other hand is oriented toward perception rather than decision making. For example, during driving there is a great deal of background information, as well as remote obstacles irrelevant to driving (e.g., non-motor vehicles in opposite lanes). When labeling for perception purposes, it is difficult for labeling personnel to determine which obstacles should be identified and which should not be focused on, and such labels are difficult to use directly for the policy optimization and driving decisions of automatic driving.
In the related art, driverless technology mainly relies on the cooperation of a perception module and a planning control module. The working process of autopilot comprises three phases. First, unstructured information obtained by sensors such as cameras or radar is converted into structured information (structured information includes obstacle information, other-vehicle information, pedestrian and non-motor-vehicle information, lane line information, traffic light information, other static road surface information, and the like). This information can be combined and matched with the high-precision map, so that accurate position information on the high-precision map can be obtained. Second, predictions and decisions are made based on the structured information and the related observation history, where prediction comprises predicting the change of the surrounding structured environment over a future period of time, and decisions include generating structured information (e.g., lane change, cut-in, waiting) that can be used for subsequent trajectory planning. Third, a trajectory of the target vehicle for a future period of time, such as a planned trajectory or control information (e.g., planned speed and position), is planned based on the structured decision information and the change of the surrounding structured environment.
It has been found through research that autopilot technology based on perception-prediction-planning may face several technical problems. First, there is the problem of error accumulation: perception is not directly responsible for decision making, so perception does not necessarily capture the information critical to decision making; moreover, because perception errors are difficult to make up for in subsequent stages (e.g., an obstacle within an area may not be identified), it may be difficult to make a correct decision when a critical obstacle is missed. Second, the coupling between prediction and planning cannot be resolved: the behavior of surrounding obstacles, especially critical obstacles interacting with the target vehicle, may be affected by the target vehicle itself. In other words, during the running of the autopilot model, there is coupling between the prediction and planning modules, so that upstream decisions have an impact on the final autopilot effect. Furthermore, there is the problem of representation defects in the structured information, which is entirely limited by manually predefined criteria; the algorithms are prone to failure once a new pattern is encountered that is not well defined (e.g., the occurrence of unknown obstacles, or unknown behaviors of vehicles and pedestrians). Finally, there is the problem of dependence on high-cost maps (such as high-precision maps): the related art mainly relies on information such as the point clouds of the high-precision map to position the vehicle; however, in practice, high-precision maps are only available in limited areas, which limits the practical application area of automatic driving. In addition, the updating cost of a high-precision map is huge, and once the map does not match the actual road, decision failure is likely to result.
Based on the above, the present disclosure provides a training method of an autopilot model, an autopilot method implemented by using the autopilot model, a training apparatus of the autopilot model, an autopilot apparatus based on the autopilot model, an electronic device, a computer-readable storage medium, a computer program product, and an autopilot vehicle. A perception-decision integrated driving technique is adopted, so that perception is directly responsible for decision making; this facilitates perception capturing the information that plays a key role in decision making, reduces error accumulation, and resolves the coupling problem between prediction and decision making in the related art. In addition, because perception is directly responsible for decision making, the problem that an algorithm easily fails due to structured prediction information being limited by manually predefined criteria can be overcome, as can the problem of decision failure caused by untimely high-precision map updates and limited map coverage; eliminating the dependence on high-precision maps also saves their updating cost. Furthermore, in the case where the obtained real driving data is only partially annotated with real automatic driving strategy information, the real automatic driving strategy information corresponding to the real driving data can be obtained based on the perception information in the real driving data, and the model training process can be completed accordingly. This realizes a perception-heavy, map-light automatic driving technique; training the automatic driving model with a large amount of real driving data can ensure decision efficiency, align the automatic driving behavior well to the preferences of human passengers, improve user experience, and avoid the long learning process of a cold start.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes a motor vehicle 110, a server 120, and one or more communication networks 130 coupling the motor vehicle 110 to the server 120.
In an embodiment of the present disclosure, motor vehicle 110 may include a computing device in accordance with an embodiment of the present disclosure and/or be configured to perform a method in accordance with an embodiment of the present disclosure.
The server 120 may run one or more services or software applications that enable autopilot. In some embodiments, server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user of motor vehicle 110 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.
In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from motor vehicle 110. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of motor vehicle 110.
Network 130 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, the one or more networks 130 may be a satellite communications network, a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a blockchain network, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (including, for example, Bluetooth and WiFi), and/or any combination of these with other networks.
The system 100 may also include one or more databases 150. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 150 may be used to store information such as audio files and video files. The databases 150 may reside in various locations. For example, a database used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with the server 120 via a network-based or dedicated connection. The databases 150 may be of different types. In some embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to commands.
In some embodiments, one or more of databases 150 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.
Motor vehicle 110 may include a sensor 111 for sensing the surrounding environment. The sensors 111 may include one or more of the following: visual cameras, infrared cameras, ultrasonic sensors, millimeter wave radar, and laser radar (LiDAR). Different sensors may provide different detection accuracy and range. The camera may be mounted in front of, behind or other locations on the vehicle. The vision cameras can capture the conditions inside and outside the vehicle in real time and present them to the driver and/or passengers. In addition, by analyzing the captured images of the visual camera, information such as traffic light indication, intersection situation, other vehicle running state, etc. can be acquired. The infrared camera can capture objects under night vision. The ultrasonic sensor can be arranged around the vehicle and is used for measuring the distance between an object outside the vehicle and the vehicle by utilizing the characteristics of strong ultrasonic directivity and the like. The millimeter wave radar may be installed in front of, behind, or other locations of the vehicle for measuring the distance of an object outside the vehicle from the vehicle using the characteristics of electromagnetic waves. Lidar may be mounted in front of, behind, or other locations on the vehicle for detecting object edges, shape information for object identification and tracking. The radar apparatus may also measure a change in the speed of the vehicle and the moving object due to the doppler effect.
Motor vehicle 110 may also include a communication device 112. The communication device 112 may include a satellite positioning module capable of receiving satellite positioning signals (e.g., BeiDou, GPS, GLONASS, and GALILEO) from satellites 141 and generating coordinates based on these signals. The communication device 112 may also include a module for communicating with the mobile communication base station 142, and the mobile communication network may implement any suitable communication technology, such as current or evolving wireless communication technologies (e.g., GSM/GPRS, CDMA, LTE, 5G). The communication device 112 may also have a Vehicle-to-Everything (V2X) module configured to enable, for example, Vehicle-to-Vehicle (V2V) communication with other vehicles 143 and Vehicle-to-Infrastructure (V2I) communication with infrastructure 144. In addition, the communication device 112 may also have a module configured to communicate with a user terminal 145 (including but not limited to a smart phone, tablet computer, or wearable device such as a watch), for example, by using a wireless local area network under the IEEE 802.11 standard or Bluetooth. With the communication device 112, the motor vehicle 110 can also access the server 120 via the network 130.
Motor vehicle 110 may also include a control device 113. The control device 113 may include a processor, such as a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), or other special purpose processor, etc., in communication with various types of computer readable storage devices or mediums. The control device 113 may include an autopilot system for automatically controlling various actuators in the vehicle. The autopilot system is configured to control a powertrain, steering system, braking system, etc. of a motor vehicle 110 (not shown) via a plurality of actuators in response to inputs from a plurality of sensors 111 or other input devices to control acceleration, steering, and braking, respectively, without human intervention or limited human intervention. Part of the processing functions of the control device 113 may be implemented by cloud computing. For example, some of the processing may be performed using an onboard processor while other processing may be performed using cloud computing resources. The control device 113 may be configured to perform a method according to the present disclosure. Furthermore, the control means 113 may be implemented as one example of a computing device on the motor vehicle side (client) according to the present disclosure.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
According to one aspect of the present disclosure, a method of training an autopilot model is provided. FIG. 2 shows a schematic diagram of an autopilot model 200 in accordance with an embodiment of the present disclosure; and fig. 3 shows a flowchart of a training method 300 of an autopilot model in accordance with an embodiment of the present disclosure.
Referring first to fig. 2, the autopilot model 200 includes a multi-modal encoding layer 210 and a decision control layer 220, the multi-modal encoding layer 210 and the decision control layer 220 being connected to form an end-to-end neural network model such that the decision control layer 220 predicts autopilot strategy information directly based on the output of the multi-modal encoding layer 210.
As described above, in the related art, prediction is first performed based on the perception information to obtain future prediction information, and the decision control layer then performs planning prediction based on that future prediction information; that is, the decision control layer performs planning prediction based on future prediction information rather than directly on the perception information. In embodiments of the present disclosure, the decision control layer 220 may predict the automatic driving strategy information directly based on the output of the multi-modal encoding layer 210, and the multi-modal encoding layer 210 performs encoding calculations on the perception information, which is equivalent to the decision control layer 220 planning directly on the basis of the perception information to predict the automatic driving strategy information. In other words, the training method in the embodiments of the present disclosure can learn an automatic driving technique in which perception is directly responsible for decision making.
The training method 300 of the autopilot model includes a first training of the multimodal coding layer 210 and the decision control layer 220. As shown in fig. 3, the first training includes:
Step S310, acquiring first real driving data in the running process of the vehicle, wherein the first real driving data comprises first navigation information of the vehicle and first real perception information aiming at the surrounding environment of the vehicle, and the first real perception information comprises current perception information and historical perception information aiming at the surrounding environment of the vehicle;
Step S320, acquiring first real automatic driving strategy information corresponding to the first real driving data based on the first real perception information;
Step S330, inputting first sample input information comprising first real driving data into the multi-mode coding layer to obtain a first sample implicit representation output by the multi-mode coding layer;
Step S340, inputting first intermediate sample input information including the first sample implicit representation into the decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer; and
Step S350, adjusting parameters of the multi-modal encoding layer and the decision control layer based on the first predicted automatic driving strategy information and the first real automatic driving strategy information.
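As a concrete illustration of steps S330 through S350, the following minimal sketch assumes a PyTorch implementation in which the encoding layer is reduced to a small fully connected network; all module names, dimensions, and tensors are hypothetical stand-ins rather than the disclosed architecture:

```python
import torch
import torch.nn as nn

class AutopilotModel(nn.Module):
    """End-to-end model: multi-modal encoding layer + decision control layer."""
    def __init__(self, enc_dim=256, plan_horizon=6):
        super().__init__()
        # Stand-in for the multi-modal encoding layer (camera/lidar/radar fusion).
        self.encoder = nn.Sequential(nn.Linear(512, enc_dim), nn.ReLU(),
                                     nn.Linear(enc_dim, enc_dim))
        # Decision control layer: predicts a planned trajectory (x, y per
        # future step) directly from the implicit representation.
        self.decision_head = nn.Linear(enc_dim, plan_horizon * 2)

    def forward(self, sample_input):
        e_t = self.encoder(sample_input)   # first sample implicit representation (S330)
        plan = self.decision_head(e_t)     # first predicted strategy information (S340)
        return e_t, plan

model = AutopilotModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Hypothetical batch: fused sensor features and pseudo-labeled trajectories.
sample_input = torch.randn(8, 512)   # first sample input information (S310)
real_plan = torch.randn(8, 12)       # first real strategy information (S320)

_, pred_plan = model(sample_input)
loss = nn.functional.mse_loss(pred_plan, real_plan)
optimizer.zero_grad()
loss.backward()                      # S350: adjust both layers jointly
optimizer.step()
```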
In an example, the first training may be offline pre-training, i.e., during the first training, the autopilot model 200 is not deployed on a real vehicle traveling in a real road scene; rather, the model is trained with a large amount of collected real driving data, avoiding the lengthy learning process of a cold start.
The first real driving data may include driving data collected during unmanned driving, and/or driving data collected while a human driver drives a vehicle equipped with the relevant sensors. Some of these driving data are not annotated with real autopilot strategy information, such as trajectories or control signals (e.g., throttle, brake, steering amplitude, etc.), and therefore cannot be directly used for model training. According to embodiments of the present disclosure, the corresponding automatic driving strategy information is acquired based on the perception information obtained by the sensors in the driving data, thereby realizing pseudo-labeling of the driving data with corresponding real automatic driving strategy information.
In an example, the autopilot model 200 may employ a Transformer network structure with an encoder (Encoder) and a decoder (Decoder). It is understood that the autopilot model 200 may be another neural network model based on the Transformer network structure, which is not limited herein. The Transformer architecture can compute implicit representations of model inputs and outputs through a self-attention mechanism. In other words, the Transformer architecture may be an Encoder-Decoder model built on this self-attention mechanism.
In an example, the first navigation information In1 of the vehicle in the first real driving data may include vectorized navigation information and vectorized map information, which may be obtained by vectorizing one or more of lane-level or road-level navigation information and coarse positioning information.
In an example, the first real perception information for the surrounding environment of the vehicle in the first real driving data (which may, for example but without limitation, include In2, In3, and In4; the description below takes the case where the perception information includes In2, In3, and In4) may include perception information In2 of one or more cameras on the vehicle, perception information In3 of one or more lidars, and perception information In4 of one or more millimeter wave radars. It is to be understood that the perception information of the surroundings of the vehicle is not limited to the above form, and may include, for example, only the perception information In2 of the plurality of cameras, without the perception information In3 of the one or more lidars and the perception information In4 of the one or more millimeter wave radars. The perception information In2 acquired by a camera may be perception information in the form of a picture or a video, and the perception information In3 acquired by a lidar may be perception information in the form of a radar point cloud (e.g., a three-dimensional point cloud). In an example, the perception information includes current perception information x_t for the surrounding environment of the target vehicle during the running of the vehicle and historical perception information x_{t-Δt} corresponding to a plurality of historical moments, where Δt may be a preset duration.
In an example, the multi-modal encoding layer 210 may perform encoding calculations on the first real driving data to generate a corresponding implicit representation. The implicit representation may be, for example, an implicit representation in a Bird's Eye View (BEV) space. For example, the perception information of the cameras may first be input to a shared backbone network (Backbone) to extract the data features of each camera. The perception information of the plurality of cameras is then fused and converted into the BEV space. Cross-modal fusion can then be performed in the BEV space, fusing the pixel-level visual data with the lidar point cloud. Finally, temporal fusion is performed to form an implicit representation e_t of the BEV space.
In one example, projection of the input information of multiple cameras into an implicit representation of the BEV space may be achieved using a Transformer Encoder structure that fuses spatio-temporal information. For example, the spatio-temporal information may be utilized through preset, grid-partitioned BEV queries (BEV queries). A spatial cross-attention mechanism enables the BEV queries to extract features from the camera views of interest (i.e., the BEV queries extract the required spatial features from the multiple camera features through the attention mechanism), thereby aggregating spatial information; in addition, historical information is fused by a temporal self-attention mechanism (i.e., the BEV features generated at each time step obtain the required temporal information from the BEV features at the previous time step), thereby aggregating temporal information.
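The spatial and temporal attention described above can be sketched as follows, assuming PyTorch and illustrative grid and feature sizes (the module names and shapes are hypothetical, not the disclosed implementation):

```python
import torch
import torch.nn as nn

bev_h, bev_w, dim, num_cams = 50, 50, 256, 6

# Preset, grid-partitioned BEV queries (learned parameters).
bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim))

# Spatial cross-attention: BEV queries attend to multi-camera features.
spatial_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
# Temporal self-attention: current BEV features attend to the previous BEV features.
temporal_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

cam_feats = torch.randn(1, num_cams * 100, dim)  # flattened per-camera backbone features
prev_bev = torch.randn(1, bev_h * bev_w, dim)    # BEV features from the previous time step

q = bev_queries.unsqueeze(0)
bev, _ = spatial_attn(q, cam_feats, cam_feats)   # aggregate spatial information
bev, _ = temporal_attn(bev, prev_bev, prev_bev)  # aggregate temporal information
e_t = bev                                        # implicit representation of the BEV space
```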
Accordingly, the decision control layer 220 obtains the first predicted autopilot strategy information based on the input implicit representation e_t. The first predicted automatic driving strategy information may include, for example, a planned trajectory Out1 or a control signal Out2 for the vehicle (e.g., signals controlling throttle, brake, steering amplitude, etc.). In an example, the decision control layer 220 may include a Transformer decoder.
Since the multi-modal encoding layer 210 and the decision control layer 220 of the model to be trained are connected to form an end-to-end neural network model, the perception information in the sample input information (which includes the real driving data) can be directly responsible for the decision, and the coupling problem between prediction and planning in the related art can be overcome. In addition, encoding the real driving data into a corresponding implicit representation overcomes the problem in the related art that algorithms easily fail due to the representation defects of structured information. Moreover, because the perception information in the sample input information is directly responsible for decision making, perception can learn to capture the information critical to the decision, reducing the error accumulation caused by perception errors during model training. Furthermore, since perception is directly responsible for decisions, a perception-heavy, map-light automatic driving technique is realized; this overcomes the problem of decision failure caused by untimely high-precision map updates and limited map coverage, and eliminating the dependence on high-precision maps saves their updating cost.
In addition, because the first real automatic driving strategy information is acquired based on the first real perception information, even when only a limited portion of the sample input information is annotated with real automatic driving strategy information, the first real automatic driving strategy information corresponding to the first real driving data can be acquired through the perception information in the sample input information, and the model training process can be completed accordingly. In other words, when training the autopilot model, if there is no (or only a small amount of) vehicle trajectory data or control signal data in the sample data, the first real autopilot strategy information (e.g., pseudo-labeled trajectory data) can be acquired based on the first real perception information, thereby completing the model training process. In this way, a perception-heavy, map-light automatic driving technique can be realized, and training the automatic driving model with a large amount of real driving data can ensure decision efficiency, align the automatic driving behavior well to the preferences of human passengers, improve user experience and safety, and avoid the long learning process of a cold start.
According to some embodiments, the step S320 may include: inputting the first real perception information into a driving strategy prediction model (not shown in the drawings) to obtain the first real automatic driving strategy information output by the driving strategy prediction model.
In an example, the first real perception information (x_1, …, x_t) (e.g., sensor perception information) may be input to the driving strategy prediction model to predict a corresponding trajectory plan (y_1, …, y_t). The predicted trajectory plan (y_1, …, y_t) can be used as the first real automatic driving strategy information in the process of training the multi-modal encoding layer and the decision control layer, thereby realizing pseudo-labeling of the first real driving data.
Because the first real perception information (for example, images acquired by a camera or point clouds acquired by a radar) comprises current perception information and historical perception information, the automatic driving strategy information (for example, the driving trajectory) of the autonomous vehicle is implicit in the current and historical perception information. A driving strategy prediction model can therefore be trained based on a small amount of labeled data (i.e., driving data annotated with real trajectories), and a corresponding trajectory plan (y_1, …, y_t) can be predicted based on the first real perception information (x_1, …, x_t), so that the first real perception information without trajectory annotations is given pseudo-label trajectory annotations.
It will be appreciated that the driving strategy prediction model may be a model independent of the autopilot model 200.
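A minimal sketch of the pseudo-labeling flow of step S320, assuming the driving strategy prediction model is such an independent, separately trained network (the function and parameter names are hypothetical):

```python
import torch

def pseudo_label(strategy_model, perception_seq):
    """Predict a trajectory plan (y_1, ..., y_t) from real perception
    information (x_1, ..., x_t) that carries no trajectory annotation."""
    strategy_model.eval()
    with torch.no_grad():
        # The predicted plan then serves as the first real automatic driving
        # strategy information when training the end-to-end model.
        return strategy_model(perception_seq)
```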
According to some embodiments, the autopilot model 200 may also include an evaluation feedback layer 230, and the first training of the multi-modal encoding layer 210 and the decision control layer 220 may further include: inputting the first sample implicit representation e_t to the evaluation feedback layer 230 to obtain first sample evaluation feedback information Out3, output by the evaluation feedback layer 230, for the first predicted automatic driving strategy information. The step S350 may then include: adjusting parameters of the multi-modal encoding layer 210 and the decision control layer 220 based on the first sample evaluation feedback information Out3 for the first predicted automatic driving strategy information, the first predicted automatic driving strategy information, and the first real automatic driving strategy information.
In an example, the evaluation feedback layer 230 may include a Transformer decoder.
Thus, by introducing the evaluation feedback layer 230 into the automatic driving model 200, it is possible to evaluate whether the current driving behavior originates from a human driver or from the model, whether the current driving is comfortable, whether it violates traffic rules, whether it constitutes dangerous driving, and the like, thereby improving user experience. In addition, during model training, the multi-modal encoding layer 210 can be further trained by the evaluation feedback layer 230 in addition to the decision control layer 220, so that the encoding of the multi-modal encoding layer 210 becomes more accurate and the trained decision control layer 220 can predict more optimized automatic driving strategy information.
In an example, parameters of the multi-modal encoding layer and the decision control layer may be adjusted using reinforcement learning. For example, reinforcement learning may be performed based on data including the first predicted automatic driving strategy information (y_1, …, y_t), the first real automatic driving strategy information (ȳ_1, …, ȳ_t), and the first sample evaluation feedback information (r_1, …, r_t).
In an example, the reinforcement learning may be performed using a PPO algorithm or a SAC algorithm.
In an example, the parameters of the multi-modal encoding layer and the decision control layer may be adjusted using the objective function in the following equation (1):
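One plausible form of such an objective, assuming an advantage-weighted policy-gradient term on the predicted strategy combined with an α-scaled imitation term toward the real strategy (an assumed reconstruction; the disclosed formula may differ in detail), is:

L_1(θ) = −Σ_t A_t · log π_θ(y_t | e_t) + α · Σ_t ‖y_t − ȳ_t‖²    equation (1)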
where A_t may denote the advantage function (Advantage Function) at time t, and A_t may be derived based on the first sample evaluation feedback information (r_1, …, r_t); α may be a hyperparameter for adjusting the magnitude of the loss value.
When reinforcement learning training is performed on a real vehicle, the autopilot model may need to predict some erroneous or failed results, and may even need the target vehicle to collide with surrounding obstacles, in order to learn from the erroneous or collision experience. However, for cost and safety reasons, an autonomous vehicle cannot be allowed to actually collide during real-vehicle driving.
According to some embodiments, the first real driving data may further include a first intervention identification, the first intervention identification being capable of characterizing whether the first real automatic driving strategy information is driving strategy information with human intervention. The step S350 may then include: adjusting parameters of the multi-modal encoding layer 210 and the decision control layer 220 based on the first intervention identification (i_1, …, i_T), the first sample evaluation feedback information (r_1, …, r_t) for the first predicted automatic driving strategy information, the first predicted automatic driving strategy information (y_1, …, y_t), and the first real automatic driving strategy information (ȳ_1, …, ȳ_t).
During the automatic running of a real vehicle, a safety operator/driver can intervene at any critical moment and take over control of the autonomous vehicle, avoiding the unacceptable training cost that a collision during real-vehicle driving would incur. After the crisis has passed, control is returned to the autonomous vehicle. The first intervention identification is used to characterize whether the first real autopilot strategy information is autopilot strategy information produced under human intervention. In other words, by introducing the first intervention identification, the model can learn the automatic driving strategy under safety-operator intervention, the driving behavior learned by the model can be well aligned to the preferences of human passengers, and user experience and safety are improved. Through human-in-the-loop reinforcement learning, the model can gradually learn to reduce the adverse situations that require intervention. Through this mechanism, the efficiency of reinforcement learning can be improved and the influence of low-quality experience on the learning process can be reduced, thereby further improving the robustness of the trained model.
FIG. 4 shows a flowchart of part of a training method of an autopilot model according to an embodiment of the present disclosure. According to some embodiments, as shown in fig. 4, the training process of the evaluation feedback layer 230 may include:
Step S410, acquiring second sample input information and real evaluation feedback information for the second sample input information, wherein the second sample input information comprises navigation information of a sample vehicle, and current perception information and historical perception information for the surrounding environment of the sample vehicle;
Step S420, inputting the second sample input information into the multi-modal encoding layer to obtain a second sample implicit representation output by the multi-modal encoding layer;
Step S430, inputting the second sample implicit representation into the evaluation feedback layer 230 to obtain predicted evaluation feedback information, output by the evaluation feedback layer 230, for the second sample input information; and
Step S440, adjusting parameters of the multi-modal encoding layer and the evaluation feedback layer 230 based on the real evaluation feedback information and the predicted evaluation feedback information.
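Steps S410 through S440 can be sketched as follows, assuming PyTorch and a simple squared-error regression between predicted and real evaluation feedback (module names and dimensions are hypothetical):

```python
import torch
import torch.nn as nn

encoder = nn.Linear(512, 256)      # stand-in multi-modal encoding layer
feedback_head = nn.Linear(256, 1)  # stand-in evaluation feedback layer 230
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(feedback_head.parameters()), lr=1e-4)

x2 = torch.randn(8, 512)           # second sample input information (S410)
r_bar = torch.randn(8, 1)          # real evaluation feedback information (S410)

e2 = encoder(x2)                   # second sample implicit representation (S420)
r_pred = feedback_head(e2)         # predicted evaluation feedback (S430)
loss = nn.functional.mse_loss(r_pred, r_bar)
optimizer.zero_grad()
loss.backward()                    # S440: adjust both layers
optimizer.step()
```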
The second sample input information may be an input sample acquired by the autonomous vehicle during autonomous driving (e.g., L4 level autonomous driving) or during manual driving, or may be acquired in a simulation environment. For example, the second sample input information may include sensor (e.g., camera, radar) perception information, and navigation information, as well as other information such as lane-level maps.
The real evaluation feedback information (r̄_1, …, r̄_t) may be evaluation feedback information fed back manually (an evaluation by a passenger or driver of the driving experience of the autonomous vehicle); for example, it can indicate whether the current driving behavior originates from a human driver or from the model, whether the current driving is comfortable, whether it violates traffic regulations, whether it constitutes dangerous driving, and the like.
Accordingly, the predicted evaluation feedback information r_t is the prediction result output by the evaluation feedback layer 230.
In an example, the parameters of the multi-modal encoding layer 210 and the evaluation feedback layer 230 may be adjusted using the objective function in the following equation (2):
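One plausible form, assuming a squared-error regression of the predicted evaluation feedback onto the real evaluation feedback (an assumed reconstruction), is:

L_2(θ) = Σ_t ‖r_t − r̄_t‖²    equation (2)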
In an example, feedback modeling may be utilized to learn a function that estimates the evaluation feedback information. In other words, the model itself may be made to estimate the expected benefit obtained by the current driving trajectory (i.e., the prediction result output by the above-mentioned evaluation feedback layer 230). For example, r_t can be determined using the following equation (3):
r_t = R(x_t, …, x_{t−l+1})    equation (3)
where (x_t, …, x_{t−l+1}) may be the second sample input information.
Fig. 5 shows a flowchart of part of a training method of an automatic driving model according to an embodiment of the present disclosure. According to some embodiments, the autopilot model 200 may further include a future prediction layer, and the first real perception information may include future perception information for the vehicle surroundings. As shown in fig. 5, performing the first training on the multi-modal encoding layer and the decision control layer may further include:
Step S510, acquiring future real information for the surrounding environment of the vehicle based on the future perception information;
Step S520, inputting the first sample implicit representation into the future prediction layer to acquire future prediction information output by the future prediction layer; and
Step S530, adjusting parameters of the multi-modal encoding layer and the future prediction layer based on the future real information and the future prediction information.
The future perception information x_{t+Δt} may be perception information subsequent to the current perception information x_t, where Δt may be a preset duration. Accordingly, the future real information may include annotated information (e.g., detection boxes) corresponding to the future perception information.
The future prediction layer may include a Transformer decoder.
Therefore, during model training, the multi-modal encoding layer 210 can be further trained through the future prediction layer in addition to the decision control layer 220, so that the multi-modal encoding layer 210 encodes more accurately and the decision control layer 220 can predict more optimized automatic driving strategy information.
According to some embodiments, the future prediction layer and the decision control layer may share the same network structure, i.e., the output of the network structure may include both future prediction information and autopilot strategy information. Illustratively, the future prediction layer and the decision control layer may share the same Transformer decoder.
According to some embodiments, the future prediction information may include at least one of: future predicted perception information for the surrounding environment of the vehicle (e.g., sensor information at some future moment, where the sensor information at a future moment comprises camera input information or radar input information at that moment); and a future prediction implicit representation corresponding to the future predicted perception information (e.g., an implicit representation, in BEV space, of the sensor information corresponding to a future moment).
According to some embodiments, with further reference to fig. 2, the future prediction layer may include at least one of a first future prediction layer 240 and a second future prediction layer 250. The first future prediction layer 240 may be configured to output future predicted perception information Out4 (e.g., sensor information at some future moment, comprising camera input information or radar input information at that moment) based on the input first sample implicit representation e_t; the second future prediction layer 250 may be configured to output a future prediction implicit representation Out5 (e.g., an implicit representation of the BEV space at some future moment) based on the input first sample implicit representation e_t.
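The two optional heads can be sketched as follows, assuming PyTorch and illustrative output shapes (a small future camera frame and a future BEV feature; none of these dimensions come from the disclosure):

```python
import torch
import torch.nn as nn

dim = 256

# First future prediction layer 240: reconstructs future sensor
# information (Out4) from the first sample implicit representation e_t.
sensor_head = nn.Linear(dim, 3 * 64 * 64)

# Second future prediction layer 250: predicts the future BEV implicit
# representation (Out5) from e_t.
bev_head = nn.Linear(dim, dim)

e_t = torch.randn(1, dim)
out4 = sensor_head(e_t).view(1, 3, 64, 64)  # future predicted perception information
out5 = bev_head(e_t)                        # future prediction implicit representation
```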
Fig. 6 illustrates a flowchart of a training method 600 of an autopilot model according to another embodiment of the present disclosure. The training method 600 includes first training the multi-modal encoding layer 210 and the decision control layer 220 using the training method 300 of the autopilot model described above. According to some embodiments, the training method 600 may further include:
Step S610, acquiring second real driving data in the process of controlling the target vehicle to perform automatic driving by using the autopilot model obtained by the first training, wherein the second real driving data comprises second navigation information of the target vehicle, second real perception information for the surrounding environment of the target vehicle, and second real automatic driving strategy information (ȳ_1, …, ȳ_t); and
performing second training on the automatic driving model based on the second real driving data, the second training including the following steps S621 to S623:
Step S621, inputting third sample input information comprising the second real driving data into the multi-modal encoding layer to obtain a third sample implicit representation output by the multi-modal encoding layer;
Step S622, inputting second intermediate sample input information including the third sample implicit representation into the decision control layer to obtain second predicted automatic driving strategy information (y_1, …, y_t) output by the decision control layer; and
Step S623, adjusting parameters of the multi-modal encoding layer and the decision control layer based on the second predicted automatic driving strategy information (y_1, …, y_t) and the second real automatic driving strategy information (ȳ_1, …, ȳ_t).
In an example, the first training may be offline pre-training, i.e., during the first training, the autopilot model 200 is deployed neither on a real vehicle traveling in a real road scene nor on a simulated vehicle traveling in a simulated road scene. Accordingly, the second training may be training performed on second real driving data collected while a vehicle is controlled by the autopilot model obtained by the first training; that is, during the second training, the autopilot model 200 is deployed on a real vehicle traveling in a real road scene or on a simulated vehicle traveling in a simulated road scene.
During the second training, the second navigation information of the target vehicle in the second real driving data may include vectorized navigation information and vectorized map information, which may be obtained by vectorizing one or more of lane-level or road-level navigation information and coarse positioning information. The second real perception information may include perception information of one or more cameras, one or more lidars, and one or more millimeter wave radars on the vehicle in the real road scene. It is to be understood that the perception information of the surroundings of the target vehicle is not limited to the above form and may include, for example, only the perception information of a plurality of cameras, without the perception information of the one or more lidars and the one or more millimeter wave radars. The perception information obtained by a camera may be perception information in the form of a picture or video, and the perception information obtained by a lidar may be perception information in the form of a radar point cloud (e.g., a three-dimensional point cloud). The second real autopilot strategy information (ȳ_1, …, ȳ_t) may include planned trajectories of the autonomous vehicle or control signals for the vehicle (e.g., signals controlling throttle, brake, steering amplitude, etc.) acquired in the real road scene.
Therefore, the automatic driving model can be trained in real road scenes, simulated road scenes, and offline pre-training settings, realizing training over massive data and multiple scenes, which improves both model training efficiency and the accuracy of the training result.
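An illustrative collection loop for step S610 is sketched below; the vehicle/simulator interface shown (observe, plan, apply) is a hypothetical API, not part of the disclosure:

```python
def collect_second_real_driving_data(model, vehicle, num_steps):
    """Roll out the first-trained model on a real or simulated vehicle and
    log (navigation, perception, executed strategy) samples for the second training."""
    buffer = []
    for _ in range(num_steps):
        nav, perception = vehicle.observe()         # second navigation / perception info
        plan = model.plan(nav, perception)          # model-proposed strategy
        executed = vehicle.apply(plan)              # may be overridden by a safety driver
        buffer.append((nav, perception, executed))  # executed plan = real strategy info
    return buffer
```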
According to some embodiments, when the autopilot model includes the evaluation feedback layer 230, performing the second training on the autopilot model based on the second real driving data may further include: inputting the third sample implicit representation into the evaluation feedback layer 230 to obtain second sample evaluation feedback information (r_1, …, r_t), output by the evaluation feedback layer 230, for the second predicted automatic driving strategy information. The parameters of the multi-modal encoding layer 210 and the decision control layer 220 are then adjusted based on the second sample evaluation feedback information (r_1, …, r_t) for the second predicted automatic driving strategy information, the second predicted automatic driving strategy information (y_1, …, y_t), and the second real automatic driving strategy information (ȳ_1, …, ȳ_t).
The second real autopilot strategy information (ȳ_1, …, ȳ_t) may be manual driving trajectory data. Accordingly, the second predicted autopilot strategy information (y_1, …, y_t) is the prediction result (trajectory plan) output by the decision control layer 220.
The second sample evaluation feedback information (r_1, …, r_t) may indicate, for example, whether the current driving behavior originates from a human driver or from the model, whether the current driving is comfortable, whether it violates traffic rules, whether it constitutes dangerous driving, and the like.
Therefore, by further using the sample evaluation feedback information for collaborative parameter adjustment of the multi-modal encoding layer 210 and the decision control layer 220, the learning effect of both layers can be further improved, thereby improving user experience.
In an example, parameters of the multi-modal encoding layer and the decision control layer may be adjusted using reinforcement learning. For example, reinforcement learning may be performed based on data including the second predicted automatic driving strategy information (y_1, …, y_t), the second real automatic driving strategy information (ȳ_1, …, ȳ_t), and the second sample evaluation feedback information (r_1, …, r_t).
In an example, the reinforcement learning may be performed using a PPO algorithm or a SAC algorithm.
In an example, the parameters of the multi-modal encoding layer 210 and the decision control layer 220 may be adjusted using the objective function in the following equation (4):
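One plausible form, mirroring equation (1) with the second-training quantities (an assumed reconstruction; the disclosed formula may differ), is:

L_4(θ) = −Σ_t A_t · log π_θ(y_t | e_t) + α · Σ_t ‖y_t − ȳ_t‖²    equation (4)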
where A_t may denote the advantage function (Advantage Function) at time t, and A_t may be derived based on the second sample evaluation feedback information (r_1, …, r_t); α may be a hyperparameter for adjusting the magnitude of the loss value.
According to some embodiments, when the autopilot model includes the evaluation feedback layer, the second real driving data may further include a second intervention identification (i_1, …, i_T), the second intervention identification being capable of characterizing whether the second real autopilot strategy information is autopilot strategy information with human intervention. Performing the second training on the autopilot model based on the second real driving data may further include: inputting the third sample implicit representation into the evaluation feedback layer to obtain second sample evaluation feedback information, output by the evaluation feedback layer, for the second predicted automatic driving strategy information. The parameters of the multi-modal encoding layer 210 and the decision control layer 220 are then adjusted based on the second intervention identification (i_1, …, i_T), the second sample evaluation feedback information (r_1, …, r_t) for the second predicted automatic driving strategy information, the second predicted automatic driving strategy information (y_1, …, y_t), and the second real automatic driving strategy information (ȳ_1, …, ȳ_t).
In an example, parameters of the multi-modal encoding layer and the decision control layer may be adjusted using feedback reinforcement learning with human-in-the-loop learning. For example, reinforcement learning may be performed based on quintuple data including the second sample evaluation feedback information (r_1, …, r_t), the second intervention identification (i_1, …, i_t), the second predicted automatic driving strategy information (y_1, …, y_t), the second real automatic driving strategy information (ȳ_1, …, ȳ_t), and the second sample input information (x_1, …, x_t).
When the second intervention identification (i_1, …, i_T) is true, it indicates that the autonomous vehicle is under manual control rather than being controlled by control signals from the automatic driving model; when the intervention identification is false, it indicates that the autonomous vehicle is controlled by control signals from the automatic driving model rather than by manual control.
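Such a quintuple could be represented as follows (a hypothetical record type used only for illustration, not the disclosed data format):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class HumanInLoopSample:
    r: Any        # sample evaluation feedback information
    i: bool       # intervention identification (True = human has taken over)
    y: Any        # predicted automatic driving strategy information
    y_bar: Any    # real automatic driving strategy information
    x: Any        # sample input information
```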
In an example, the parameters of the multi-modal encoding layer and the evaluation feedback layer may be adjusted using the objective function in the following equation (5):
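One plausible form, assuming the advantage-weighted term is applied at non-intervened steps and an imitation term toward the human strategy is applied at intervened steps (an assumed reconstruction), is:

L_5(θ) = −λ_1 · Σ_t (1 − i_t) · A_t · log π_θ(y_t | e_t) + λ_2 · Σ_t i_t · ‖y_t − ȳ_t‖²    equation (5)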
where λ_1 and λ_2 may each be a hyperparameter indicating the weight of the corresponding component; the intervention identification (i_1, …, i_T) takes the value 1 when true and 0 when false; and A_t may denote the advantage function (Advantage Function) at time t, which may be derived based on the second sample evaluation feedback information (r_1, …, r_t).
According to further embodiments, when the autopilot model includes the evaluation feedback layer, the second real driving data may further include a second intervention identification (i_1, …, i_t) capable of characterizing whether the second real autopilot strategy information is autopilot strategy information with human intervention, and the second real driving data may include real evaluation feedback information for the second real autopilot strategy information. Performing the second training on the autopilot model based on the second real driving data may further include: inputting the third sample implicit representation into the evaluation feedback layer to obtain second sample evaluation feedback information, output by the evaluation feedback layer, for the second predicted automatic driving strategy information; and adjusting parameters of the multi-modal encoding layer and the decision control layer based on the second intervention identification, the second sample evaluation feedback information and the real evaluation feedback information for the second predicted automatic driving strategy information, the second predicted automatic driving strategy information, and the second real automatic driving strategy information.
The real evaluation feedback information (r̄_1, …, r̄_t) for the second real automatic driving strategy information in the second real driving data may be evaluation feedback information fed back manually (an evaluation by a passenger or driver of the driving experience of the autonomous vehicle); for example, it may indicate whether the current driving behavior originates from a human driver or from the model, whether the current driving is comfortable, whether it violates traffic rules, whether it constitutes dangerous driving, and the like.
In an example, the parameters of the multi-modal encoding layer 210 and the evaluation feedback layer 230 may be adjusted using an objective function as in equation (5) above.
In addition, feedback modeling may be used to learn a function that estimates the evaluation feedback information. In other words, the model itself may be made to estimate the expected benefit obtained by the current driving trajectory (i.e., the prediction result output by the above-mentioned evaluation feedback layer 230). For example, r_t may be determined using equation (3) above. The parameters of the multi-modal encoding layer 210 and the evaluation feedback layer 230 may then be adjusted using the objective function as in equation (5) above.
It will be appreciated that the real driving data collected while driving with the autopilot model may include navigation information of the vehicle, current and historical perception information for the vehicle's surroundings, real autopilot strategy information, evaluation feedback information, and intervention identifications, wherein the evaluation feedback information may be that of a safety operator/driver or that predicted by the evaluation feedback layer.
According to some embodiments, the evaluation feedback information (e.g., the real evaluation feedback information, the first sample evaluation feedback information, or the second sample evaluation feedback information) may include at least one of: driving comfort information, driving safety information, driving efficiency, whether lights are used in a civilized manner, driving behavior source information, and whether traffic regulations are violated.
According to some embodiments, second real driving data collected while the autopilot model obtained by the first training controls the target vehicle to perform automatic driving may be acquired at preset time intervals, and the autopilot model may be subjected to the second training again based on the newly acquired second real driving data.
The preset time may be, for example, half a day, one day, half a month, one month, etc., and may be set according to actual requirements, without limitation.
Thus, the autopilot model may be continuously iteratively trained based on driving data in real-vehicle driving and/or in simulation scenarios, thereby continuously optimizing the autopilot model. It will be appreciated that while the second training is performed iteratively on-line, the autopilot model may also be trained off-line based on sample input information employed by the first training.
According to some embodiments, the model training method may further include: after performing the second training on the autopilot model based on the second real driving data, performing the first training again on the autopilot model including at least the multi-modal encoding layer and the decision control layer.
In an example, offline pre-training may be performed first, followed by online training; based on the model obtained by the online training, the offline training and the online training may then be performed again, as sketched below. Thus, the model is continually optimized through multiple rounds of iterative training.
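A compact sketch of this alternation is given below; `first_training`, `second_training`, and `collect_online_data` are placeholder callables standing in for the procedures described above, not a real API.

```python
def train_autopilot(model, offline_data, collect_online_data,
                    first_training, second_training, rounds: int = 3):
    # Offline pre-training (first training) on existing driving data.
    first_training(model, offline_data)
    for _ in range(rounds):
        # Drive with the current model and log second real driving data.
        online_data = collect_online_data(model)
        # Online training (second training) on the newly collected data.
        second_training(model, online_data)
        # Optionally refresh with another offline pass before the next round.
        first_training(model, offline_data)
    return model
```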
According to some embodiments, obtaining the second real driving data for controlling the target vehicle to perform automatic driving using the automatic driving model obtained by the first training may include:
acquiring second real driving data while the automatic driving model obtained by the first training controls the target vehicle to perform automatic driving in a real driving scenario; and/or
acquiring second real driving data while the automatic driving model obtained by the first training controls the target vehicle to perform automatic driving in a simulated driving scenario.
In an example, the second real driving data obtained by performing automatic driving in the real driving scenario may serve as the primary source, supplemented by the second real driving data obtained in the simulated driving scenario. The input information of simulation samples can be set as needed, so the simulation environment can be used to mine many long-tail samples and enrich the training set (see the mixing sketch below). In other words, the amount of real-vehicle driving data used in training the automatic driving model may exceed the amount of simulated driving data.
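The generator below illustrates one way to realize that ratio, oversampling real-vehicle data relative to simulation data; the 80/20 split and the function name are assumptions.

```python
import random

def mixed_batches(real_samples, sim_samples,
                  real_fraction: float = 0.8, batch_size: int = 64):
    # Each batch is mostly real-vehicle data, topped up with simulated
    # long-tail samples mined from the simulation environment.
    n_real = int(batch_size * real_fraction)
    while True:
        batch = (random.sample(real_samples, n_real) +
                 random.sample(sim_samples, batch_size - n_real))
        random.shuffle(batch)
        yield batch
```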
It will be appreciated that training based on driving data from a simulation environment may be included in both the offline pre-training phase and the online training phase.
According to some embodiments, the autopilot strategy information may include a target planned trajectory.
According to another aspect of the present disclosure, an autopilot model is provided. The autopilot model may be trained using a model training method according to embodiments of the present disclosure.
As shown in fig. 2, the autopilot model 200 includes a multi-modal encoding layer 210 and a decision control layer 220, where the multi-modal encoding layer 210 and the decision control layer 220 are connected to form an end-to-end neural network model, so that the decision control layer 220 directly obtains autopilot strategy information based on the output of the multi-modal encoding layer 210. The first input information of the multi-modal encoding layer 210 comprises navigation information of the target vehicle and perception information of the surroundings of the target vehicle obtained with the sensor, the perception information comprising current perception information and historical perception information for the surroundings of the target vehicle during driving of the vehicle, the multi-modal encoding layer 210 being configured to obtain an implicit representation corresponding to the first input information. The second input information of the decision control layer 220 comprises an implicit representation, the decision control layer being configured to obtain target autopilot strategy information based on the second input information.
Details of the autopilot model 200 are described above and are not repeated here. Because the multi-modal encoding layer 210 and the decision control layer 220 of the model to be trained are connected to form an end-to-end neural network model, the perception information in the sample information directly serves the decision, which avoids the coupling problem between prediction and planning in a conventionally trained automatic driving model. In addition, introducing the implicit representation overcomes algorithm failures caused by representation defects of structured information. Moreover, because perception directly serves the decision, perception can capture the information most critical to the decision, reducing the error accumulation caused by perception errors in the trained model. Furthermore, having perception directly serve the decision realizes a perception-heavy, map-light automatic driving approach, which avoids decision failures caused by untimely high-precision map updates and limited map coverage; removing the dependence on high-precision maps also reduces map-update costs. A minimal architecture sketch follows.
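The sketch below is a deliberately minimal, assumption-laden rendering of this architecture in PyTorch: stand-in linear encoders replace real camera/radar backbones, a GRU stands in for whatever fusion the multi-modal encoding layer 210 actually uses, and the decision control layer 220 is reduced to a waypoint regressor.

```python
import torch
from torch import nn

class AutopilotModel(nn.Module):
    def __init__(self, d: int = 256, horizon: int = 30):
        super().__init__()
        self.horizon = horizon
        self.camera_enc = nn.Linear(512, d)  # stand-in for an image backbone
        self.radar_enc = nn.Linear(128, d)   # stand-in for a point-cloud network
        self.nav_enc = nn.Linear(32, d)      # vectorized navigation information
        self.fuse = nn.GRU(d, d, batch_first=True)  # fuses current + history
        self.decision = nn.Linear(d, horizon * 2)   # decision control layer

    def forward(self, cam_seq, radar_seq, nav):
        # First input information: navigation plus current and historical
        # perception; cam_seq/radar_seq are (B, T, feat) feature sequences.
        tokens = (self.camera_enc(cam_seq) + self.radar_enc(radar_seq)
                  + self.nav_enc(nav).unsqueeze(1))
        _, h = self.fuse(tokens)
        e_t = h[-1]                      # implicit representation
        traj = self.decision(e_t)        # strategy obtained directly from e_t
        return e_t, traj.view(-1, self.horizon, 2)  # (x, y) waypoints
```

Because the encoder and the decision head sit in one computation graph, a single loss on the waypoints back-propagates through both layers end to end.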
According to some embodiments, the autopilot model 200 may further include a future prediction layer configured to obtain future prediction information for the surroundings of the target vehicle based on the implicit representation corresponding to the input first input information.
According to some embodiments, with further reference to fig. 2, the future prediction layer may include at least one of a first future prediction layer 240 and a second future prediction layer 250. The first future prediction layer 240 may be configured to output future predicted perception information Out4 (e.g., sensor information at a future moment, where the sensor information at the future moment comprises camera input information or radar input information at that moment) based on the input first sample implicit representation e_t; the second future prediction layer 250 may be configured to output a future prediction implicit representation Out5 (e.g., an implicit representation of BEV space at a future moment) based on the input first sample implicit representation e_t.
Illustratively, the future prediction layer may include, but is not limited to, a decoder in a Transformer.
In fig. 2, the future prediction layer and the decision control layer 220 are two independent network structures. It can be understood, however, that the future prediction layer and the decision control layer may also share the same network structure, i.e., the output of that structure may include both the future prediction information and the autopilot strategy information. Illustratively, the future prediction layer and the decision control layer may share the same decoder in a Transformer, as sketched below.
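The following sketch shows one plausible shared-decoder arrangement, with learned query tokens reading out the future prediction information and the strategy information from the same Transformer decoder; the query scheme and the projection heads are assumptions.

```python
import torch
from torch import nn

class SharedDecoderHead(nn.Module):
    def __init__(self, d: int = 256, horizon: int = 30):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.queries = nn.Parameter(torch.randn(2, d))  # [future, decision]
        self.future_proj = nn.Linear(d, d)           # future prediction info
        self.traj_proj = nn.Linear(d, horizon * 2)   # autopilot strategy info

    def forward(self, e_tokens: torch.Tensor):
        # e_tokens: (B, L, d) implicit-representation tokens from the encoder.
        q = self.queries.unsqueeze(0).expand(e_tokens.size(0), -1, -1)
        out = self.decoder(tgt=q, memory=e_tokens)
        # One network structure, two outputs read from separate queries.
        return self.future_proj(out[:, 0]), self.traj_proj(out[:, 1])
```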
According to some embodiments, the future prediction information may include at least one of: future predicted perception information for the surrounding environment of the vehicle (e.g., sensor information at a future moment, where the sensor information at the future moment comprises camera input information or radar input information at that moment); and a future prediction implicit representation corresponding to the future predicted perception information (e.g., an implicit representation, in BEV space, of sensor information corresponding to a future point in time).
According to another aspect of the present disclosure, an autopilot method implemented using an autopilot model is provided.
Fig. 7 shows a flow chart of an autopilot method 700 according to an embodiment of the present disclosure. As shown in fig. 7, the automatic driving method 700 includes:
step S710, controlling the target vehicle to perform automatic driving by using the automatic driving model 200; and
Step S720, obtaining real driving data in the automatic driving process, wherein the real driving data comprises navigation information of a target vehicle, real perception information aiming at the surrounding environment of the target vehicle and real automatic driving strategy information, and the real driving data is used for carrying out iterative training on an automatic driving model.
The navigation information of the target vehicle in the real driving data may include vectorized navigation information and vectorized map information, which may be obtained by vectorizing one or more of lane-level or road-level navigation information and coarse positioning information. The real perception information may include perception information of one or more cameras, one or more lidars, and one or more millimeter-wave radars on a vehicle in a real road scene. It is to be understood that the perception information of the surroundings of the target vehicle is not limited to these forms; it may, for example, include only the perception information of a plurality of cameras, without that of lidars or millimeter-wave radars. The perception information obtained by a camera may take the form of pictures or video, and the perception information obtained by a lidar may take the form of a radar point cloud (e.g., a three-dimensional point cloud). The real autopilot strategy information may include planned trajectories of the autonomous vehicle, or control signals for the vehicle (e.g., signals controlling throttle, brake, steering amplitude, etc.), collected in a real road scene.
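As an illustration of the vectorization step only, the helper below rotates lane-level route points into the ego frame to produce vectorized navigation information; the (dx, dy, relative-heading) encoding is an assumption, not the disclosure's format.

```python
import math

def vectorize_navigation(route_points, ego_pose):
    # route_points: iterable of (x, y, heading) lane-level waypoints in map frame;
    # ego_pose: (x, y, yaw) coarse positioning of the target vehicle.
    ex, ey, eyaw = ego_pose
    cos_y, sin_y = math.cos(-eyaw), math.sin(-eyaw)
    vectors = []
    for px, py, pyaw in route_points:
        dx, dy = px - ex, py - ey
        # Rotate into the ego frame so the encoding is pose-invariant.
        vectors.append((dx * cos_y - dy * sin_y,
                        dx * sin_y + dy * cos_y,
                        pyaw - eyaw))
    return vectors
```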
According to some embodiments of the present disclosure, the target vehicle may be controlled to perform automatic driving using the autopilot strategy information (e.g., planned trajectories) predicted by the autopilot model.
Fig. 8 shows a flow chart of an autopilot method 800 according to another embodiment of the present disclosure. According to some embodiments, as shown in fig. 8, the autopilot method 800 includes steps S810, S820 similar to steps S710, S720, respectively, in the autopilot method 700.
Step S810, controlling the target vehicle to perform automatic driving using the automatic driving model 200 obtained by iterative training; and
Step S820, obtaining real driving data in the automatic driving process, wherein the real driving data comprises navigation information of a target vehicle, real perception information aiming at the surrounding environment of the target vehicle and real automatic driving strategy information, and the real driving data is used for carrying out iterative training on an automatic driving model; and
Step S830, controlling the target vehicle to perform automatic driving again using the automatic driving model obtained through iterative training. Thus, the automatic driving task and the model training task can proceed in parallel while the real vehicle is running. In an example, the planned trajectory predicted by the autopilot model 200, or a control signal for the vehicle (e.g., a signal controlling throttle, brake, or steering amplitude), may be used to control the target vehicle to perform automatic driving again. For example, a control strategy module in the autonomous vehicle may convert the planned trajectory into control signals for the vehicle (a minimal sketch follows), or a neural network may be used to output control signals directly based on the implicit representation.
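A minimal pure-pursuit-style sketch of such a control strategy module is given below. Pure pursuit is a stand-in technique chosen for illustration; the look-ahead rule, gains, and wheelbase are assumptions, and waypoints are assumed to be ego-frame (x forward, y left) points sampled every `dt` seconds.

```python
import math

def trajectory_to_controls(waypoints, current_speed,
                           dt: float = 0.1, wheelbase: float = 2.8,
                           max_steer: float = 0.5):
    # Pick a look-ahead point a few steps down the planned trajectory.
    lx, ly = waypoints[min(5, len(waypoints) - 1)]
    ld = math.hypot(lx, ly) or 1e-6
    # Pure-pursuit steering: delta = atan(2 * L * y / ld^2), clamped.
    steer = max(-max_steer, min(max_steer,
                math.atan2(2.0 * wheelbase * ly, ld * ld)))
    # Speed implied by the spacing of consecutive planned waypoints.
    target_speed = math.hypot(*waypoints[1]) / dt if len(waypoints) > 1 else 0.0
    accel = 0.5 * (target_speed - current_speed)  # simple proportional control
    throttle, brake = (accel, 0.0) if accel >= 0 else (0.0, -accel)
    return throttle, brake, steer
```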
According to some embodiments, in the automatic driving method 700 or the automatic driving method 800, real driving data may be acquired at preset time intervals while the automatic driving model controls the target vehicle to perform automatic driving, and the automatic driving model may be iteratively trained based on the newly acquired real driving data.
According to some embodiments, in the autopilot method 700 or the autopilot method 800, the real driving data may include an intervention identifier that can characterize whether the real autopilot strategy information is autopilot strategy information with human intervention.
While the real vehicle is running, a safety officer can intervene at any critical moment, taking control of the autonomous vehicle and thereby avoiding possible collisions; after the crisis passes, control is returned to the autonomous vehicle. The intervention identifier characterizes whether the real automatic driving strategy information is automatic driving strategy information with human intervention. In other words, by introducing the intervention identifier, the model can learn the automatic driving strategy under safety-officer intervention, so that the driving behavior learned by the model aligns well with the preferences of human passengers, improving user experience and safety. Through human-in-the-loop reinforcement learning, the model can gradually learn to reduce the adverse events that trigger intervention. This mechanism improves reinforcement-learning efficiency and reduces the influence of poor experiences on the learning process, further improving the robustness of the trained model.
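One simple, hedged realization of this human-in-the-loop signal is to treat every takeover as a strong negative reward, as below; the penalty magnitude is an assumption.

```python
def intervention_rewards(feedback_scores, interventions, penalty: float = 1.0):
    # feedback_scores: per-step evaluation feedback; interventions: 0/1 flags
    # marking steps where the safety officer took control.
    return [r - penalty * i for r, i in zip(feedback_scores, interventions)]

# The third step triggered a takeover and is penalized accordingly.
print(intervention_rewards([0.9, 0.8, 0.7], [0, 0, 1]))
# -> [0.9, 0.8, -0.3] (up to float rounding)
```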
The training method of the automatic driving model provided by the embodiments of the present disclosure has the following advantages:
Annotation data for unmanned driving are scarce. The model is rapidly pre-trained using existing L2+ and L4 driving data, so that it already meets a certain standard when first deployed on a vehicle; after deployment, it is further updated and iterated through continuous reinforcement learning. Compared with traditional reinforcement learning, this scheme has the following advantages:
a. Risk-free training (Risk-Free Learning). Unlike traditional reinforcement learning, which must experience some costly behaviors in order to learn, the HRL (human-in-the-loop reinforcement learning) technique can, under the protection of a safety officer, learn to avoid risks such as collisions and violations entirely without risk, so the whole training process can run not only in a simulation environment but also synchronously in the real environment.
b. Driving behavior can be well aligned with the preferences of human passengers; efficiency, comfort, safety, and similar considerations can be weighed comprehensively to provide an optimal scheme.
c. Extremely low training cost. Once the entire pipeline is tuned, training and migration costs are very small; when the unmanned vehicle is migrated to a different city, only road-test data in the new area need to be collected.
d. Massive data + large-model advantage. Massive data is used for pre-training, avoiding the long learning process of a cold start, so that a larger model can be fully utilized to obtain better results.
According to another aspect of the present disclosure, a training apparatus for an autopilot model is provided. The automatic driving model comprises a multi-mode coding layer and a decision control layer, wherein the multi-mode coding layer and the decision control layer are connected to form an end-to-end neural network model, so that the decision control layer directly obtains automatic driving strategy information based on the output of the multi-mode coding layer.
Fig. 9 shows a block diagram of a training apparatus 900 of an autopilot model in accordance with an embodiment of the present disclosure. The training apparatus 900 of the autopilot model is configured to perform a first training of the multimodal coding layer and the decision control layer, and includes:
a first real driving data acquisition unit 910 configured to acquire first real driving data during running of the vehicle, the first real driving data including first navigation information of the vehicle and first real perception information for the surrounding environment of the vehicle, the first real perception information including current perception information and historical perception information for the surrounding environment of the vehicle;
A real automatic driving strategy information acquisition unit 920 configured to acquire first real automatic driving strategy information corresponding to the first real driving data based on first real perception information;
A multi-modal coding layer training unit 930 configured to input first sample input information including the first real driving data into the multi-modal coding layer to obtain a first sample implicit representation output by the multi-modal coding layer;
A decision control layer training unit 940 configured to input first intermediate sample input information including an implicit representation of the first sample into the decision control layer to obtain first predicted automatic driving strategy information output by the decision control layer; and
A parameter adjustment unit 950 configured to adjust parameters of the multi-modal encoding layer and the decision control layer based on the first predicted automatic driving strategy information and the first real automatic driving strategy information.
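Read together, units 930-950 amount to one gradient step of the first training; the sketch below wires them up around the hypothetical `AutopilotModel` from the earlier architecture sketch, with the batch keys as assumptions.

```python
import torch
import torch.nn.functional as F

def first_training_step(model, optimizer, batch):
    # Units 930/940: encode the first sample input and decode a predicted
    # strategy from the first sample implicit representation.
    e_t, pred_traj = model(batch["camera"], batch["radar"], batch["nav"])
    # Unit 950: compare predicted and real automatic driving strategy information.
    loss = F.mse_loss(pred_traj, batch["real_trajectory"])
    optimizer.zero_grad()
    loss.backward()   # gradients flow end to end through both layers
    optimizer.step()
    return loss.item()
```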
According to another aspect of the present disclosure, an automatic driving apparatus based on an autopilot model is provided.
Fig. 10 shows a block diagram of an autopilot 1000 in accordance with an embodiment of the present disclosure. As shown in fig. 10, the automatic driving apparatus 1000 includes:
A control unit 1010 configured to control the target vehicle to perform autopilot using the autopilot model 200 described above; and
A second real driving data acquisition unit 1020 configured to acquire real driving data during automatic driving, the real driving data including navigation information of the target vehicle, real perception information for a surrounding environment of the target vehicle, and real automatic driving strategy information, the real driving data being used for iterative training of the automatic driving model.
It should be appreciated that the various modules or units of the apparatus 900 shown in fig. 9 may correspond to the various steps in the method 300 described with reference to fig. 3. Thus, the operations, features and advantages described above with respect to method 300 apply equally to apparatus 900 and the modules and units comprised thereof; and the various modules or units of the apparatus 1000 shown in fig. 10 may correspond to the various steps in the method 700 described with reference to fig. 7. Thus, the operations, features and advantages described above with respect to method 700 apply equally to apparatus 1000 and the modules and units comprised thereof. For brevity, certain operations, features and advantages are not described in detail herein.
Although specific functions are discussed above with reference to specific modules, it should be noted that the functions of the various units discussed herein may be divided into multiple units and/or at least some of the functions of the multiple units may be combined into a single unit.
It should also be appreciated that various techniques may be described herein in the general context of software or hardware elements or program modules. The various units described above with respect to fig. 9 and 10 may be implemented in hardware or in hardware combined with software and/or firmware. For example, the units may be implemented as computer program code/instructions configured to be executed by one or more processors and stored in a computer-readable storage medium. Alternatively, these units may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the units 910-950 and the units 1010-1020 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip comprising one or more components of a processor (e.g., a Central Processing Unit (CPU), microcontroller, microprocessor, Digital Signal Processor (DSP), etc.), memory, one or more communication interfaces, and/or other circuitry, and may optionally execute received program code and/or include embedded firmware to perform functions.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform an autopilot method or a training method of an autopilot model in accordance with embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method of automated driving or a method of training an automated driving model according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements a method of automatic driving or a method of training an automatic driving model according to embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided an autonomous vehicle including at least one of: the training apparatus 900 of the autonomous driving model according to an embodiment of the present disclosure, the automatic driving apparatus 1000, and the electronic device described above.
Referring to fig. 11, a block diagram of an electronic device 1100 that may be a server or a client of the present disclosure, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the electronic device 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the electronic device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106, an output unit 1107, a storage unit 1108, and a communication unit 1109. The input unit 1106 may be any type of device capable of inputting information to the electronic device 1100, the input unit 1106 may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 1107 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. Storage unit 1108 may include, but is not limited to, magnetic disks, optical disks. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices through computer networks such as the internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as bluetooth devices, 802.11 devices, wiFi devices, wiMax devices, cellular communication devices, and/or the like.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the various methods and processes described above, such as the methods (or processes) 300-800. For example, in some embodiments, the methods (or processes) 300-800 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, some or all of the computer programs may be loaded and/or installed onto electronic device 1100 via ROM 1102 and/or communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the methods (or processes) 300 to 800 described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the methods (or processes) 300-800 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present disclosure is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements. Furthermore, the steps may be performed in an order different from that described in the present disclosure. Further, the various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.